* refactor(connectors): split KtxDialect into core and KtxSqlDialect
Separate the dialect contract into a driver-agnostic core (display/ref
formatting and type mapping) and a SQL-only extension (query generators).
The catalog and entity-details paths resolve the core dialect for any
snapshot driver, so it must stay free of SQL generation; this is the
prerequisite refactor for adding non-SQL primary sources.
- KtxDialect keeps type, formatDisplayRef, parseDisplayRef,
columnDisplayTablePartCount, mapDataType, mapToDimensionType
- KtxSqlDialect extends it with quoteIdentifier, formatTableName, and the
query/sample/statistics generators; the 7 SQL dialects implement it
- add getSqlDialectForDriver for SQL drivers; the 7 connectors and the
relationship-benchmark harness consume it
- thread the relationship pipeline (profiling/validation/composite/
discovery) as KtxSqlDialect | null so a non-SQL source skips coverage SQL
and its candidates stay in review; local-enrichment builds the SQL
dialect only when the connector advertises readOnlySql
Pure extraction: no behavior change for the existing 7 drivers.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(connectors): add MongoDB connector for issue #305
Add a read-only MongoDB connector that treats a database as a primary
context source: collections map to tables and inferred top-level fields to
columns. MongoDB is the first non-SQL source (readOnlySql: false), so
ktx sql and metric compilation do not apply, but its collections flow
through ingest, descriptions, and relationship discovery.
- schema-inference: infer a flat column schema from the most recent
sample_size documents (by _id desc, or order_by for non-ObjectId keys).
Union BSON types per field, mark multi-type fields mixed (string), keep
sub-documents/arrays as a single opaque json column, derive nullability
from presence, treat _id as the primary key
- connector: KtxMongoDbScanConnector behind an injectable client seam;
strictly read-only (find/listCollections/estimatedDocumentCount only),
no executeReadOnly; resolves env:/file: via resolveKtxConfigReference
- core-only KtxMongoDbDialect and a live-database introspection adapter
- wire the mongodb driver: driver union, dialect registry, driver
registration (scopeConfigKey databases), mongodbConnectionSchema,
connection-drivers, normalizeDriver, the live-database route, and the
ktx setup picker. ktx sql is refused by the read-only SQL capability gate
- tests: schema inference, connector snapshot via a fake client, dialect,
driver-schema parsing, and the ktx sql rejection
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs(integrations): document the MongoDB primary source
Add a MongoDB section to the primary-sources reference: connection config
(url, databases, enabled_tables, sample_size, order_by), mongodb+srv/TLS/
Atlas notes, the schema-inference explainer, a features matrix, and the
non-SQL caveat. Update the frontmatter and connection field reference.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(connectors): address review blockers on the MongoDB connector
- introspect: skip estimatedDocumentCount for views. The count command is
rejected on a MongoDB view (CommandNotSupportedOnView), so counting a view
aborted introspect for the whole connection; compute estimatedRows only for
real collections, as ClickHouse does.
- sl: refuse a semantic-layer query against a non-SQL connection instead of
defaulting it to the Postgres dialect. compileLocalSlQuery (the shared CLI +
MCP path) now rejects a driver with no SQL dialect via the new
isSqlQueryableDriver authority, keeping MongoDB context-only per issue #305.
- tests: cover input.tableScope and the empty-scope skip for the Mongo
connector (the scan layer does not post-filter), the view no-count path, and
the ktx sl query refusal for a mongodb connection.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* polish(mongodb): compute sampled nullCount and document sampling caveats
Address the non-blocking review notes:
- sampleColumn now counts null/absent values over the sampled window instead of
returning nullCount: null, since the documents are already in hand
- warn that a custom order_by must be indexed (an unindexed sort hits MongoDB's
in-memory sort limit on large collections) in the connection schema and docs
- note that sampled values for nested fields are stringified, not faithfully
serialized, so the json opacity is deliberate
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs(examples): add a MongoDB connector example
A manual, container-backed example mirroring examples/postgres-historic:
- docker-compose.yml + init/seed.js seed a representative dataset (nested
documents, arrays, a Decimal128, a mixed-type field, a nullable field, an
ObjectId reference, and a view) on first container start
- scripts/smoke.sh + introspect-smoke.mjs assert the connector's inferred
schema with no LLM credentials — the same introspection entry point ktx
ingest's database-schema stage uses, including the view-no-count path
- README.md documents the smoke and a full keyless ktx ingest run
(claude-code LLM + managed sentence-transformers embeddings)
Works with Docker Compose or podman compose. Verified end to end.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* chore: ignore examples/** in knip to fix dead-code false positives
The MongoDB connector example files (examples/mongodb/init/seed.js and
examples/mongodb/scripts/introspect-smoke.mjs) are used at runtime but were
flagged as unused by knip. Add examples/** to the ignore array, matching the
existing .context/** entry.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0114qQV8fJ5a5ME3XbMVRzbL
* fix(mongodb): refuse non-SQL connections before SQL analysis
`ktx sql` and the MCP sql_execution tool resolved a SQL-analysis dialect
(falling back to Postgres for a non-SQL driver) and ran read-only
validation before the connector capability gate refused the connection.
For a MongoDB connection that spun up the parser/daemon and produced
Postgres parser diagnostics instead of a clean non-SQL refusal.
Route both entry points through a shared assertSqlQueryableConnection
guard before dialect selection, mirroring compileLocalSlQuery. The
federated duckdb path has no driver and is exempted at each call site.
Add CLI and MCP regression tests asserting validation/connector work
never starts for a MongoDB connection.
* fix(mongodb): pass CI gates (dialect boundary, secrets, setup test)
Three latent failures in the connector surfaced once CI ran on the branch:
- connector.ts imported the concrete KtxMongoDbDialect, which the connector
dialect-import boundary forbids. Route it through getDialectForDriver('mongodb')
and widen inferKtxMongoCollectionColumns to the base KtxDialect (it only uses
mapDataType/mapToDimensionType).
- detect-secrets flagged a test ObjectId hex and the mongodb+srv example URL;
annotate both with allowlist pragmas.
- the "shows every supported database" setup test omitted the new MongoDB option.
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Luca Martial <48870843+luca-martial@users.noreply.github.com>
Co-authored-by: Luca Martial <lucamrtl@gmail.com>
Co-authored-by: Andrey Avtomonov <andreybavt@gmail.com>
* fix(cli): double the height of the setup banner t crossbar
* fix(cli): unify setup multi-select hints and make Tab the select key
The six interactive multi-select surfaces in `ktx setup` documented three
different hint voices, one had no hint at all, and they named two different
select keys (Space vs Tab). Tab is the only key that can toggle selection
without colliding with type-to-search input, so make it the single documented
select key everywhere and compose every hint from one shared fragment
vocabulary in prompt-navigation.ts.
- Register `updateSettings({ aliases: { tab: 'space' } })` so Tab toggles flat
multiselects; the alias applies only to non-text prompts, leaving typed
search input (schema/Notion) untouched.
- Add the missing hint to the agent-targets prompt and drop the stray
"Space to select … Esc …" info line plus the now-dead writeSetupInfo helper.
- Replace the schema-scope ad-hoc hint with the searchable-multiselect voice
and standardize "filter" -> "search" vocabulary.
- Delete DEFAULT_TREE_PICKER_HELP_TEXT and the unused TreePickerChrome.helpText
seam; render the shared tree hint instead.
* refactor(cli): show LLM check progress for every setup backend
Rename runLlmHealthCheckWithProgress to validateModelWithProgress and
wrap the Claude subscription and Codex auth probes in the same spinner
progress as the Anthropic API and Vertex backends, so each backend shows
consistent "Checking <provider> LLM" output during setup.
* feat(cli): add ktx-orange progress spinners to setup steps
Add a shared runWithCliSpinner helper and a TTY-aware createCliSpinner:
an animated clack spinner in a terminal, and a static stderr-only spinner
before raw-mode pickers (the table tree picker and demo tour), where the
animated spinner's stdin grab would otherwise corrupt the next prompt.
Wrap the slow setup waits in progress spinners: managed runtime install,
embedding daemon start + first-run model download, embeddings health
check, the connection-test gate, and source validation / dbt clone /
Metabase discovery. Recolor every spinner frame from clack's magenta to
the ktx mascot orange (#FF8A4C) via the static helper and clack's
styleFrame option.
Align the tree with AGENTS.md/CLAUDE.md conventions:
- Rewrite user-facing strings, docs, and tests to lowercase `ktx`
(no bare uppercase `KTX` tokens remain outside literal identifiers).
- Drop the legacy `historicSql` migration path and its now-unused
helpers, per the no-backward-compat rule.
- Remove `as unknown as` / `any` casts: narrow `BaseTool` generics to
`z.ZodObject`, add a typed `createLookerClient`, and delete the dead
`getParametersSchema`/`toAnthropicFormat` pre-AI-SDK helpers.
- Use `InvalidArgumentError` for Commander parse failures.
- Finish the adapter→connector prose conversion in the `ktx.yaml` docs
while keeping the literal `adapters` config key.
Setup wizard flow tweaks:
- Add a reveal-tail password prompt (reveal-password-prompt.ts) that unmasks
the last few characters of a typed/pasted secret, and wire it into the setup
prompt adapter in place of clack's password(); adds the @clack/core dep.
- Reorder wizard select options: surface "Paste a key" before the
environment-variable option across embeddings/models/sources, promote
Metabase/Notion in the source list, put Git URL before Local path, reorder
the Notion crawl-mode choices, and relabel the sources "Done" action.
Query-history filter picker output:
- Collapse the per-template parse-failure lines into a single count in the
setup output and route the full template-id list to --debug stderr.
- Model parse failures as a structured parseFailedTemplateIds field instead of
warning strings.
- Add a privacy-safe query_history_filter_completed telemetry event
(counts/enums only), mirrored into the Python daemon schema.
* feat(cli): block context build when a required connection fails its live test
A context build can take several minutes, so a connection that is
unreachable or misconfigured should stop the build up front instead of
failing partway through. Before the build starts, run a live connection
test for every primary- and context-source connection the build depends
on.
Each test's output is captured in a discarded buffer so raw error text
(and database paths) never reach the user; failures are surfaced only by
connection id and connector type, with a pointer to `ktx connection test
<id>` for the underlying error.
- Interactive setup lets the user fix the connection and retry without
restarting, re-resolving targets so an added/removed/reconfigured
connection is honored.
- `--no-input` exits non-zero and writes a failed context state with a
failureReason, so scripts stop early and setup never reads as ready.
Extract the buffered command IO helper out of setup-databases into
src/io/buffered-command-io.ts so both call sites share one implementation.
* feat(cli): use recovery primitive for database setup
* feat(cli): use recovery primitive for source setup
* docs: document setup connection recovery
* fix(cli): close database recovery gaps
* fix(cli): target failing project in gate hint and preserve missing-input
Address two review findings on the connection-recovery work:
- The connection-gate failure hint emitted `ktx connection test <id>` with no
--project-dir, so a setup run started with `--project-dir ./analytics` pointed
users at cwd/KTX_PROJECT_DIR instead of the project that just failed. Emit the
resolved project dir, matching the contextBuildCommands convention.
- The non-interactive database configure path returned `cancelled`, which the
recovery primitive collapses to `failed`. Sibling paths still report
`missing-input` for absent flags, so incomplete-flag runs were
indistinguishable from real connection failures. The database wrapper now
tracks the configure missing-input signal and restores the `missing-input`
step status; the shared primitive keeps its four outcomes.
Fast mode (the ktx ingest --fast/--deep database-ingest depth toggle) is removed.
ktx ingest now always builds the full enriched ("deep") context. There is no
structural fallback: a database connection without a configured model and
embeddings fails the enrichment-readiness preflight before any work runs, with
a 'Run ktx setup to configure a model and embeddings' hint.
- Remove --fast/--deep flags, the per-connection context.depth field, and the
ktx setup depth prompt (delete setup-database-context-depth.ts).
- Rename ingest-depth.ts -> connection-drivers.ts; ingest always requests scan
mode 'enriched'; readiness gate (enrichmentReadinessGaps) runs for every
database target.
- Drop the database-context-depth telemetry step (Node + Python schema mirrors
regenerated).
- Update CLI, setup, context-build view, docs, the public ktx skill, and the
release-smoke / artifacts scripts (now assert the no-LLM guard failure).
ktx status --fast (a separate network-probe flag) is unchanged.
Follow-ups: KLO-726 (live progress for ktx ingest --all), KLO-727 (restore
credentialed successful-ingest release smoke coverage).
* feat(cli): define full warehouse dialect contract
* test(cli): keep dialect edge tests focused
* fix(cli): stabilize dialect contract foundation
* refactor(connectors): own read-only query preparation
* refactor(connectors): resolve dialects through registry
* refactor(connectors): keep concrete dialect classes internal
* chore(workspace): enforce dialect import boundary
* refactor(cli): resolve relationship dialect at scan boundary
* refactor(cli): use dialect display parsing for entity details
* refactor(cli): use dialect display parsing for warehouse catalog
* refactor(cli): use dialect SQL in relationship workflows
* test(cli): verify solid dialect scan workflow closure
* test: split cli tests from source tree
* refactor(cli): standardize BigQuery scope listing
* feat(sqlite): implement connector scope listing
* test(connectors): cover required table listing
* feat(cli): add warehouse driver registry
* refactor(setup): route scope discovery through driver registry
* refactor(cli): route local query execution through driver registry
* refactor(historic-sql): route dialect support through driver registry
* refactor(cli): test warehouse connections through driver registry
* fix(cli): close driver registry type export gaps
* Improve setup daemon diagnostics
* refactor(setup): centralize rail-prefixed diagnostics + query-history fallback
Extract errorMessage, writePrefixedLines, and flushPrefixedBufferedCommandOutput
into clack.ts so the setup wizard, managed daemons, and embedding/agent steps
share one rail-formatted writer. setup-databases.ts also adds a
"disable query history and retry" option when the schema-context build fails
and query history is the likely culprit, surfaced via a new
failed-query-history-unavailable status.
* fix(cli): carry catalog through the picker so BigQuery/Snowflake/SQL Server scope filters match
The setup picker's KtxTableListEntry was a 2-level { schema, name }, so
qualifiedTableId always wrote db.name into enabled_tables. When BigQuery,
Snowflake, or SQL Server later ran fast ingest, their introspect step filtered
the scope set with scopedTableNames(scope, { catalog: projectId|database, db })
— catalog was non-null on the introspect side but null in the scope refs, so
every entry was rejected, the live-database adapter staged zero table files,
and detect() failed with 'Adapter "live-database" did not recognize fetched
source output'.
Align the picker boundary with the canonical 3-level KtxTableRef:
- Add catalog: string | null to KtxTableListEntry.
- BigQuery/Snowflake/SQL Server listTables populate catalog from the
resolved projectId / database; Postgres/MySQL/ClickHouse/SQLite set null.
- qualifiedTableId emits catalog.schema.name when catalog is non-null
(resolveEnabledTables already accepts the 3-part shape) and
schemasFromEnabledTables now goes through parseDottedTableEntry so it
recovers the schema correctly from both 2-part and 3-part entries.
- Export parseDottedTableEntry from enabled-tables.ts (@internal) for picker
reuse.
Update listTables expectations in all seven connector tests and the setup /
picker test fixtures. Add a picker regression test that covers the
catalog-bearing round-trip (save + refine).
* fix(cli): allow debug telemetry under opt-out env
2026-05-26 08:49:05 +02:00
Renamed from packages/cli/src/setup-databases.test.ts (Browse further)