* refactor(connectors): split KtxDialect into core and KtxSqlDialect
Separate the dialect contract into a driver-agnostic core (display/ref
formatting and type mapping) and a SQL-only extension (query generators).
The catalog and entity-details paths resolve the core dialect for any
snapshot driver, so it must stay free of SQL generation; this is the
prerequisite refactor for adding non-SQL primary sources.
- KtxDialect keeps type, formatDisplayRef, parseDisplayRef,
columnDisplayTablePartCount, mapDataType, mapToDimensionType
- KtxSqlDialect extends it with quoteIdentifier, formatTableName, and the
query/sample/statistics generators; the 7 SQL dialects implement it
- add getSqlDialectForDriver for SQL drivers; the 7 connectors and the
relationship-benchmark harness consume it
- thread the relationship pipeline (profiling/validation/composite/
discovery) as KtxSqlDialect | null so a non-SQL source skips coverage SQL
and its candidates stay in review; local-enrichment builds the SQL
dialect only when the connector advertises readOnlySql
Pure extraction: no behavior change for the existing 7 drivers.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(connectors): add MongoDB connector for issue #305
Add a read-only MongoDB connector that treats a database as a primary
context source: collections map to tables and inferred top-level fields to
columns. MongoDB is the first non-SQL source (readOnlySql: false), so
ktx sql and metric compilation do not apply, but its collections flow
through ingest, descriptions, and relationship discovery.
- schema-inference: infer a flat column schema from the most recent
sample_size documents (by _id desc, or order_by for non-ObjectId keys).
Union BSON types per field, mark multi-type fields mixed (string), keep
sub-documents/arrays as a single opaque json column, derive nullability
from presence, treat _id as the primary key
- connector: KtxMongoDbScanConnector behind an injectable client seam;
strictly read-only (find/listCollections/estimatedDocumentCount only),
no executeReadOnly; resolves env:/file: via resolveKtxConfigReference
- core-only KtxMongoDbDialect and a live-database introspection adapter
- wire the mongodb driver: driver union, dialect registry, driver
registration (scopeConfigKey databases), mongodbConnectionSchema,
connection-drivers, normalizeDriver, the live-database route, and the
ktx setup picker. ktx sql is refused by the read-only SQL capability gate
- tests: schema inference, connector snapshot via a fake client, dialect,
driver-schema parsing, and the ktx sql rejection
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs(integrations): document the MongoDB primary source
Add a MongoDB section to the primary-sources reference: connection config
(url, databases, enabled_tables, sample_size, order_by), mongodb+srv/TLS/
Atlas notes, the schema-inference explainer, a features matrix, and the
non-SQL caveat. Update the frontmatter and connection field reference.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(connectors): address review blockers on the MongoDB connector
- introspect: skip estimatedDocumentCount for views. The count command is
rejected on a MongoDB view (CommandNotSupportedOnView), so counting a view
aborted introspect for the whole connection; compute estimatedRows only for
real collections, as ClickHouse does.
- sl: refuse a semantic-layer query against a non-SQL connection instead of
defaulting it to the Postgres dialect. compileLocalSlQuery (the shared CLI +
MCP path) now rejects a driver with no SQL dialect via the new
isSqlQueryableDriver authority, keeping MongoDB context-only per issue #305.
- tests: cover input.tableScope and the empty-scope skip for the Mongo
connector (the scan layer does not post-filter), the view no-count path, and
the ktx sl query refusal for a mongodb connection.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* polish(mongodb): compute sampled nullCount and document sampling caveats
Address the non-blocking review notes:
- sampleColumn now counts null/absent values over the sampled window instead of
returning nullCount: null, since the documents are already in hand
- warn that a custom order_by must be indexed (an unindexed sort hits MongoDB's
in-memory sort limit on large collections) in the connection schema and docs
- note that sampled values for nested fields are stringified, not faithfully
serialized, so the json opacity is deliberate
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* docs(examples): add a MongoDB connector example
A manual, container-backed example mirroring examples/postgres-historic:
- docker-compose.yml + init/seed.js seed a representative dataset (nested
documents, arrays, a Decimal128, a mixed-type field, a nullable field, an
ObjectId reference, and a view) on first container start
- scripts/smoke.sh + introspect-smoke.mjs assert the connector's inferred
schema with no LLM credentials — the same introspection entry point ktx
ingest's database-schema stage uses, including the view-no-count path
- README.md documents the smoke and a full keyless ktx ingest run
(claude-code LLM + managed sentence-transformers embeddings)
Works with Docker Compose or podman compose. Verified end to end.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* chore: ignore examples/** in knip to fix dead-code false positives
The MongoDB connector example files (examples/mongodb/init/seed.js and
examples/mongodb/scripts/introspect-smoke.mjs) are used at runtime but were
flagged as unused by knip. Add examples/** to the ignore array, matching the
existing .context/** entry.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0114qQV8fJ5a5ME3XbMVRzbL
* fix(mongodb): refuse non-SQL connections before SQL analysis
`ktx sql` and the MCP sql_execution tool resolved a SQL-analysis dialect
(falling back to Postgres for a non-SQL driver) and ran read-only
validation before the connector capability gate refused the connection.
For a MongoDB connection that spun up the parser/daemon and produced
Postgres parser diagnostics instead of a clean non-SQL refusal.
Route both entry points through a shared assertSqlQueryableConnection
guard before dialect selection, mirroring compileLocalSlQuery. The
federated duckdb path has no driver and is exempted at each call site.
Add CLI and MCP regression tests asserting validation/connector work
never starts for a MongoDB connection.
* fix(mongodb): pass CI gates (dialect boundary, secrets, setup test)
Three latent failures in the connector surfaced once CI ran on the branch:
- connector.ts imported the concrete KtxMongoDbDialect, which the connector
dialect-import boundary forbids. Route it through getDialectForDriver('mongodb')
and widen inferKtxMongoCollectionColumns to the base KtxDialect (it only uses
mapDataType/mapToDimensionType).
- detect-secrets flagged a test ObjectId hex and the mongodb+srv example URL;
annotate both with allowlist pragmas.
- the "shows every supported database" setup test omitted the new MongoDB option.
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Luca Martial <48870843+luca-martial@users.noreply.github.com>
Co-authored-by: Luca Martial <lucamrtl@gmail.com>
Co-authored-by: Andrey Avtomonov <andreybavt@gmail.com>
Align the tree with AGENTS.md/CLAUDE.md conventions:
- Rewrite user-facing strings, docs, and tests to lowercase `ktx`
(no bare uppercase `KTX` tokens remain outside literal identifiers).
- Drop the legacy `historicSql` migration path and its now-unused
helpers, per the no-backward-compat rule.
- Remove `as unknown as` / `any` casts: narrow `BaseTool` generics to
`z.ZodObject`, add a typed `createLookerClient`, and delete the dead
`getParametersSchema`/`toAnthropicFormat` pre-AI-SDK helpers.
- Use `InvalidArgumentError` for Commander parse failures.
- Finish the adapter→connector prose conversion in the `ktx.yaml` docs
while keeping the literal `adapters` config key.
* chore: standardize daemon naming on "KTX daemon"
Replace inconsistent names ("KTX Python daemon", "KTX local embeddings
daemon", "KTX managed daemon", "Python daemon") with the single name
"KTX daemon" in CLI output, errors, command descriptions, test
assertions, smoke scripts, docs, AGENTS.md, issue templates, and
codecov flags. The daemon is a portable compute server with endpoints
for SQL analysis, semantic layer, LookML, database introspection, and
embeddings; the previous labels misrepresented it as embeddings-only or
exposed implementation details ("Python", "managed").
The "KTX Python runtime" concept (installed interpreter + packages) is
deliberately left as-is — it is a separate concept from the daemon
process.
* refactor(release): drop release-policy.json runtime dep and next branch
Strips the release-policy.json fallback from release-version.ts so the CLI
reads its version straight from packages/cli/package.json. dev → 0.0.0-private,
installed @kaelio/ktx → the real semver baked into the published package.json.
KtxCliPackageInfo collapses to { name, version, contextPackageName }; /health
no longer depends on version files surviving past a CI run.
Replaces the dual-branch (main + next) semantic-release model with a single-
branch model on main. rcs and stables interleave on the same branch via
{ name: 'main', prerelease: 'rc', channel: 'next' } / ['main']. Drops
@semantic-release/git and @semantic-release/changelog (nothing is committed
back to the repo on any channel) and the workflow's "Prepare next prerelease
branch" step plus the KTX_PRERELEASE_BRANCH plumbing. The git tag plus the
published npm artifact carry the version forward.
Updates docs/release.md, removes the two now-unused devDeps, regenerates
pnpm-lock.yaml. 611/611 @ktx/cli tests, 173/173 script tests, type-check,
biome, knip all clean.
* fix(release): don't throw on non-main branches at config-load time
knip loads .releaserc.cjs on every PR run, where GITHUB_REF_NAME is the
merge ref (e.g. 180/merge). The previous version of releaseBranches threw
immediately when the branch wasn't main, which made knip fail to evaluate
the config and then mis-flag @semantic-release/exec as an unused dep.
semantic-release already refuses to publish when the current branch doesn't
match a configured release branch, so the explicit throw was redundant.
Drop it (and the unused currentBranch helper) and replace the
"rejects releases from non-main" assertion with one that exercises a CI-
shaped GITHUB_REF_NAME and confirms the config loads.
* docs: add CLI component reuse guidance
* docs: add unified ingest ux design
* Refine unified ingest UX design after adversarial review iteration 1
* Refine unified ingest UX design after adversarial review iteration 2
* Refine unified ingest UX design after adversarial review iteration 3
* feat(cli): route public connection ingest command
* feat(cli): hide standalone scan from public help
* feat(cli): plan public ingest depth and query history
* feat(cli): execute public database ingest facets
* feat(ingest): read connection query history config
* fix(cli): use public ingest wording
* fix(config): stop generating ingest adapter allow lists
* docs: document public ingest command
* test: align ingest surface expectations
* docs: add unified ingest public CLI surface plan
* feat(cli): preflight deep public ingest readiness
* feat(setup): store query history in connection context
* feat(setup): store database context depth
* feat(setup): verify context readiness by database depth
* fix(setup): keep context build foreground only
* fix(config): reject reserved ingest connection ids
* test: close unified ingest v1 expectations
* docs: add unified ingest v1 closure plan
* fix(ingest): bypass adapter allow-list for public source ingest
* fix(ingest): honor query history window intent
* fix(ingest): hide scan internals from public database ingest
* feat(ingest): use foreground view for interactive public ingest
* fix(setup): use schema context and query history wording
* test(cli): verify unified ingest public output
* docs: add unified ingest v1 public output closure plan
* fix(setup): forward query history flags
* fix(setup): prompt for postgres query history
* fix(status): report query history readiness
* fix(ingest): remove legacy public guidance
* fix(ingest): polish foreground retry copy
* docs(examples): use unified query history wording
* chore(ingest): finish public query history cleanup
* docs: add unified ingest v1 query history status cleanup plan
* test(docs): cover unified ingest public docs
* docs: align ingest CLI reference with unified UX
* docs: update context build guides for unified ingest
* docs: update setup and primary source ingest wording
* docs: stop advertising adapter-backed example ingest
* docs: close unified ingest public docs gaps
* docs: add unified ingest v1 docs site closure plan
* fix: render unified ingest foreground warnings
* fix: explain query history schema order
* fix: add public ingest retry guidance
* fix: align setup next steps with unified ingest
* fix: remove scan wording from demo progress
* test: verify unified ingest ux closure
* docs: add unified ingest v1 foreground and retry closure plan
* fix(cli): preserve query-history pull config in public ingest
* fix(cli): omit hidden commands from docs command tree
* test(cli): close unified ingest final public surface checks
* docs: add unified ingest v1 final public surface closure plan
* fix(cli): use public source labels in ingest reports
* fix(cli): suppress low-level public ingest output
* test(cli): verify unified ingest public plain output
* docs: add unified ingest v1 public plain output closure plan
* fix(cli): add public ingest copy sanitizers
* fix(cli): sanitize public ingest progress copy
* fix(cli): rename setup schema scope prompt
* docs(plan): add progress copy closure; test: align setup back-nav fixture
Adds the iter9 plan and updates the setup back-navigation test fixture
to pass disableQueryHistory plus listSchemas/listTables stubs that the
unified ingest setup step now requires.
* docs(plan): add final ux labels plan with narrowed label scans
* fix(cli): aggregate unsupported query-history warnings
* fix(cli): align setup database labels
* test(cli): fix setup database test type-check
* fix(cli): remove primary-source wording from setup output
* test(cli): verify unified ingest setup closure
* docs(plan): add unified ingest v1 verification copy closure plan
* fix(cli): remove top-level scan command
* fix(cli): remove legacy ingest and wiki commands
* Merge scan into ingest flow
* feat(cli): split ingest progress into per-phase rows, rename work units to tasks
Each database target in the unified ingest dashboard now renders one row per
real subprocess (Schema, then Query history when enabled) instead of a single
combined bar. Each phase has its own monotonic 0-100% bar so the progress
never snaps back to zero when historic-sql starts after scan completes.
Completed phases keep their final bar, summary, and elapsed time visible as
an inline audit trail; queued and skipped phases are shown explicitly.
Also rename user-facing "work units" / "Failed work units" to "tasks" /
"Failed tasks" in ingest output and parseIngestSummary. The parser still
accepts the legacy "Work units:" wording in captured output for backward
compat. Internal memory-flow event names and type fields are left alone.
* Fix test harness failures
* Fix CI smoke checks
---------
Co-authored-by: Andrey Avtomonov <7889985+andreybavt@users.noreply.github.com>