From 2afab614175cfb3ae868cf389161f824c2ee46bf Mon Sep 17 00:00:00 2001 From: Pintouch <5173681+Pintouch@users.noreply.github.com> Date: Mon, 29 Jun 2026 15:17:56 +0200 Subject: [PATCH] feat(connectors): add MongoDB connector (#305) (#310) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * refactor(connectors): split KtxDialect into core and KtxSqlDialect Separate the dialect contract into a driver-agnostic core (display/ref formatting and type mapping) and a SQL-only extension (query generators). The catalog and entity-details paths resolve the core dialect for any snapshot driver, so it must stay free of SQL generation; this is the prerequisite refactor for adding non-SQL primary sources. - KtxDialect keeps type, formatDisplayRef, parseDisplayRef, columnDisplayTablePartCount, mapDataType, mapToDimensionType - KtxSqlDialect extends it with quoteIdentifier, formatTableName, and the query/sample/statistics generators; the 7 SQL dialects implement it - add getSqlDialectForDriver for SQL drivers; the 7 connectors and the relationship-benchmark harness consume it - thread the relationship pipeline (profiling/validation/composite/ discovery) as KtxSqlDialect | null so a non-SQL source skips coverage SQL and its candidates stay in review; local-enrichment builds the SQL dialect only when the connector advertises readOnlySql Pure extraction: no behavior change for the existing 7 drivers. Co-Authored-By: Claude Opus 4.8 (1M context) * feat(connectors): add MongoDB connector for issue #305 Add a read-only MongoDB connector that treats a database as a primary context source: collections map to tables and inferred top-level fields to columns. MongoDB is the first non-SQL source (readOnlySql: false), so ktx sql and metric compilation do not apply, but its collections flow through ingest, descriptions, and relationship discovery. - schema-inference: infer a flat column schema from the most recent sample_size documents (by _id desc, or order_by for non-ObjectId keys). Union BSON types per field, mark multi-type fields mixed (string), keep sub-documents/arrays as a single opaque json column, derive nullability from presence, treat _id as the primary key - connector: KtxMongoDbScanConnector behind an injectable client seam; strictly read-only (find/listCollections/estimatedDocumentCount only), no executeReadOnly; resolves env:/file: via resolveKtxConfigReference - core-only KtxMongoDbDialect and a live-database introspection adapter - wire the mongodb driver: driver union, dialect registry, driver registration (scopeConfigKey databases), mongodbConnectionSchema, connection-drivers, normalizeDriver, the live-database route, and the ktx setup picker. ktx sql is refused by the read-only SQL capability gate - tests: schema inference, connector snapshot via a fake client, dialect, driver-schema parsing, and the ktx sql rejection Co-Authored-By: Claude Opus 4.8 (1M context) * docs(integrations): document the MongoDB primary source Add a MongoDB section to the primary-sources reference: connection config (url, databases, enabled_tables, sample_size, order_by), mongodb+srv/TLS/ Atlas notes, the schema-inference explainer, a features matrix, and the non-SQL caveat. Update the frontmatter and connection field reference. Co-Authored-By: Claude Opus 4.8 (1M context) * fix(connectors): address review blockers on the MongoDB connector - introspect: skip estimatedDocumentCount for views. The count command is rejected on a MongoDB view (CommandNotSupportedOnView), so counting a view aborted introspect for the whole connection; compute estimatedRows only for real collections, as ClickHouse does. - sl: refuse a semantic-layer query against a non-SQL connection instead of defaulting it to the Postgres dialect. compileLocalSlQuery (the shared CLI + MCP path) now rejects a driver with no SQL dialect via the new isSqlQueryableDriver authority, keeping MongoDB context-only per issue #305. - tests: cover input.tableScope and the empty-scope skip for the Mongo connector (the scan layer does not post-filter), the view no-count path, and the ktx sl query refusal for a mongodb connection. Co-Authored-By: Claude Opus 4.8 (1M context) * polish(mongodb): compute sampled nullCount and document sampling caveats Address the non-blocking review notes: - sampleColumn now counts null/absent values over the sampled window instead of returning nullCount: null, since the documents are already in hand - warn that a custom order_by must be indexed (an unindexed sort hits MongoDB's in-memory sort limit on large collections) in the connection schema and docs - note that sampled values for nested fields are stringified, not faithfully serialized, so the json opacity is deliberate Co-Authored-By: Claude Opus 4.8 (1M context) * docs(examples): add a MongoDB connector example A manual, container-backed example mirroring examples/postgres-historic: - docker-compose.yml + init/seed.js seed a representative dataset (nested documents, arrays, a Decimal128, a mixed-type field, a nullable field, an ObjectId reference, and a view) on first container start - scripts/smoke.sh + introspect-smoke.mjs assert the connector's inferred schema with no LLM credentials — the same introspection entry point ktx ingest's database-schema stage uses, including the view-no-count path - README.md documents the smoke and a full keyless ktx ingest run (claude-code LLM + managed sentence-transformers embeddings) Works with Docker Compose or podman compose. Verified end to end. Co-Authored-By: Claude Opus 4.8 (1M context) * chore: ignore examples/** in knip to fix dead-code false positives The MongoDB connector example files (examples/mongodb/init/seed.js and examples/mongodb/scripts/introspect-smoke.mjs) are used at runtime but were flagged as unused by knip. Add examples/** to the ignore array, matching the existing .context/** entry. Co-Authored-By: Claude Opus 4.8 (1M context) Claude-Session: https://claude.ai/code/session_0114qQV8fJ5a5ME3XbMVRzbL * fix(mongodb): refuse non-SQL connections before SQL analysis `ktx sql` and the MCP sql_execution tool resolved a SQL-analysis dialect (falling back to Postgres for a non-SQL driver) and ran read-only validation before the connector capability gate refused the connection. For a MongoDB connection that spun up the parser/daemon and produced Postgres parser diagnostics instead of a clean non-SQL refusal. Route both entry points through a shared assertSqlQueryableConnection guard before dialect selection, mirroring compileLocalSlQuery. The federated duckdb path has no driver and is exempted at each call site. Add CLI and MCP regression tests asserting validation/connector work never starts for a MongoDB connection. * fix(mongodb): pass CI gates (dialect boundary, secrets, setup test) Three latent failures in the connector surfaced once CI ran on the branch: - connector.ts imported the concrete KtxMongoDbDialect, which the connector dialect-import boundary forbids. Route it through getDialectForDriver('mongodb') and widen inferKtxMongoCollectionColumns to the base KtxDialect (it only uses mapDataType/mapToDimensionType). - detect-secrets flagged a test ObjectId hex and the mongodb+srv example URL; annotate both with allowlist pragmas. - the "shows every supported database" setup test omitted the new MongoDB option. --------- Co-authored-by: Claude Opus 4.8 (1M context) Co-authored-by: Luca Martial <48870843+luca-martial@users.noreply.github.com> Co-authored-by: Luca Martial Co-authored-by: Andrey Avtomonov --- .../docs/integrations/primary-sources.mdx | 87 +++- examples/README.md | 8 + examples/mongodb/README.md | 127 ++++++ examples/mongodb/docker-compose.yml | 14 + examples/mongodb/init/seed.js | 64 +++ examples/mongodb/scripts/introspect-smoke.mjs | 53 +++ examples/mongodb/scripts/smoke.sh | 37 ++ knip.json | 3 +- packages/cli/package.json | 1 + packages/cli/src/connection-drivers.ts | 1 + .../cli/src/connectors/bigquery/connector.ts | 4 +- .../cli/src/connectors/bigquery/dialect.ts | 4 +- .../src/connectors/clickhouse/connector.ts | 4 +- .../cli/src/connectors/clickhouse/dialect.ts | 4 +- .../cli/src/connectors/mongodb/connector.ts | 413 ++++++++++++++++++ .../cli/src/connectors/mongodb/dialect.ts | 64 +++ .../mongodb/live-database-introspection.ts | 44 ++ .../connectors/mongodb/schema-inference.ts | 129 ++++++ .../cli/src/connectors/mysql/connector.ts | 4 +- packages/cli/src/connectors/mysql/dialect.ts | 4 +- .../cli/src/connectors/postgres/connector.ts | 4 +- .../cli/src/connectors/postgres/dialect.ts | 4 +- .../cli/src/connectors/snowflake/connector.ts | 4 +- .../cli/src/connectors/snowflake/dialect.ts | 4 +- .../cli/src/connectors/sqlite/connector.ts | 4 +- packages/cli/src/connectors/sqlite/dialect.ts | 4 +- .../cli/src/connectors/sqlserver/connector.ts | 4 +- .../cli/src/connectors/sqlserver/dialect.ts | 4 +- .../cli/src/context/connections/dialects.ts | 81 +++- .../cli/src/context/connections/drivers.ts | 21 + .../src/context/mcp/local-project-ports.ts | 4 + .../cli/src/context/project/driver-schemas.ts | 36 ++ .../cli/src/context/scan/local-enrichment.ts | 6 +- packages/cli/src/context/scan/local-scan.ts | 5 +- .../context/scan/relationship-benchmarks.ts | 8 +- .../scan/relationship-composite-candidates.ts | 16 +- .../context/scan/relationship-discovery.ts | 12 +- .../context/scan/relationship-profiling.ts | 28 +- .../context/scan/relationship-validation.ts | 11 +- packages/cli/src/context/scan/types.ts | 3 +- packages/cli/src/context/sl/local-query.ts | 7 + packages/cli/src/local-adapters.ts | 11 + packages/cli/src/setup-databases.ts | 55 ++- packages/cli/src/sql.ts | 4 + .../test/connectors/mongodb/connector.test.ts | 241 ++++++++++ .../test/connectors/mongodb/dialect.test.ts | 33 ++ .../mongodb/schema-inference.test.ts | 103 +++++ .../test/context/connections/dialects.test.ts | 12 +- .../test/context/connections/drivers.test.ts | 6 + .../context/mcp/local-project-ports.test.ts | 27 ++ .../context/project/driver-schemas.test.ts | 22 + .../relationship-composite-candidates.test.ts | 7 +- .../scan/relationship-discovery.test.ts | 20 +- .../scan/relationship-profiling.test.ts | 29 +- .../scan/relationship-validation.test.ts | 34 +- .../cli/test/context/sl/local-query.test.ts | 12 + packages/cli/test/setup-databases.test.ts | 1 + packages/cli/test/sql.test.ts | 34 ++ pnpm-lock.yaml | 110 +++++ 59 files changed, 1971 insertions(+), 129 deletions(-) create mode 100644 examples/mongodb/README.md create mode 100644 examples/mongodb/docker-compose.yml create mode 100644 examples/mongodb/init/seed.js create mode 100644 examples/mongodb/scripts/introspect-smoke.mjs create mode 100755 examples/mongodb/scripts/smoke.sh create mode 100644 packages/cli/src/connectors/mongodb/connector.ts create mode 100644 packages/cli/src/connectors/mongodb/dialect.ts create mode 100644 packages/cli/src/connectors/mongodb/live-database-introspection.ts create mode 100644 packages/cli/src/connectors/mongodb/schema-inference.ts create mode 100644 packages/cli/test/connectors/mongodb/connector.test.ts create mode 100644 packages/cli/test/connectors/mongodb/dialect.test.ts create mode 100644 packages/cli/test/connectors/mongodb/schema-inference.test.ts diff --git a/docs-site/content/docs/integrations/primary-sources.mdx b/docs-site/content/docs/integrations/primary-sources.mdx index 6cb2d26f..bb0ce12a 100644 --- a/docs-site/content/docs/integrations/primary-sources.mdx +++ b/docs-site/content/docs/integrations/primary-sources.mdx @@ -1,6 +1,6 @@ --- title: Primary Sources -description: Connect ktx to PostgreSQL, Snowflake, BigQuery, MySQL, ClickHouse, SQL Server, or SQLite. +description: Connect ktx to PostgreSQL, Snowflake, BigQuery, MySQL, ClickHouse, SQL Server, SQLite, or MongoDB. --- **ktx** connects to your data warehouse or database to build schema context, @@ -26,10 +26,12 @@ Agents should prefer environment or file references over literal secrets. | Field | Required | Applies to | Description | |-------|----------|------------|-------------| -| `driver` | Yes | all connections | Connector driver such as `postgres`, `snowflake`, `bigquery`, `mysql`, `clickhouse`, `sqlserver`, or `sqlite` | +| `driver` | Yes | all connections | Connector driver such as `postgres`, `snowflake`, `bigquery`, `mysql`, `clickhouse`, `sqlserver`, `sqlite`, or `mongodb` | | `url` | One of the connection methods | URL-style connectors | Database URL, `env:NAME`, or `file:/path/to/secret` | | `host`, `port`, `database`, `username`, `password` | One of the connection methods | PostgreSQL, MySQL, SQL Server | Field-by-field connection values | | `schema` or `schemas` | No | schema-aware warehouses | Single schema or list of schemas to scan | +| `databases` | No | ClickHouse, MongoDB | List of databases to scan | +| `sample_size`, `order_by` | No | MongoDB | Schema-inference sampling controls (recent documents, sort field) | | `context.queryHistory` | No | PostgreSQL, Snowflake, BigQuery | Enables query-history ingestion when the warehouse supports it | | `path` | Yes for path-style SQLite | SQLite | Local SQLite database path or `env:NAME` reference | | `max_bytes_billed` | No | BigQuery | Maximum bytes billed per query job | @@ -510,6 +512,87 @@ No authentication required - SQLite is file-based. The file must be readable by - Foreign key enforcement requires explicit `PRAGMA foreign_keys = ON` - Database file must exist before `ktx connection test` or ingest runs +--- + +## MongoDB + +Connects to MongoDB as a primary context source. **ktx** treats each collection +as a table and each inferred top-level field as a column. MongoDB is a non-SQL +source: `ktx sql` and semantic-layer metric compilation do not apply to a MongoDB +connection, but its collections still flow through `ktx ingest`, descriptions, and +relationship discovery. + +### Connection config + +```yaml title="ktx.yaml" +connections: + mongo-prod: + driver: mongodb + url: env:MONGO_URL + databases: [app] + enabled_tables: [app.users, app.orders] # optional collection allowlist + sample_size: 1000 + # order_by: createdAt # only when _id is not an ObjectId +``` + +Standard `mongodb://` and `mongodb+srv://` connection strings are supported, +including TLS and MongoDB Atlas — pass the full connection string (with its +query parameters) as `url`. The `databases` list selects which databases to +introspect; if omitted, **ktx** uses the database in the URL path. `ktx setup` +also offers MongoDB and stores the selected databases under +`connections..databases`. + +### Authentication + +| Method | Config | +|--------|--------| +| Connection string | `url: env:MONGO_URL` or `url: file:/path/to/secret` | +| Atlas / TLS | Use a `mongodb+srv://` URL with the credentials and TLS options Atlas provides | + +### Schema inference + +MongoDB has no fixed schema, so **ktx** infers one by sampling the most recent +`sample_size` documents per collection (default 1000), sorted by `_id` +descending. Because an ObjectId embeds its creation time, this captures the +collection's current shape with zero configuration. When `_id` is not an +ObjectId (custom string or UUID keys), set `order_by` to a timestamp field such +as `createdAt` so "most recent" is well-defined. A custom `order_by` field +should be indexed — an unindexed sort hits MongoDB's in-memory sort limit and +fails on large collections (`_id`, the default, is always indexed). + +For each top-level field, **ktx** unions the BSON types seen and derives +nullability from how often the field is present: + +- Scalar BSON types map to `string`, `number`, `time`, or `boolean` +- A field seen with more than one type is recorded as `mixed` and treated as a string +- Sub-documents and arrays become a single opaque `json` column (no dotted-path + columns); their sampled values are stringified, not faithfully serialized +- `_id` is the primary key + +### Features + +| Feature | Supported | Notes | +|---------|-----------|-------| +| Collections (as tables) | Yes | Via `listCollections`; `system.*` collections are excluded | +| Primary keys | Yes | `_id` | +| Foreign keys | No | MongoDB has no formal foreign keys | +| Row count estimates | Yes | Via `estimatedDocumentCount` | +| Column statistics | No | - | +| Query history | No | - | +| Table sampling | Yes | Reads the most recent documents | +| Nested analysis | Yes | Sub-documents and arrays modeled as opaque `json` | +| Read-only SQL (`ktx sql`) | No | MongoDB is not a SQL source | + +### Dialect notes + +- Strictly read-only: the connector only issues `find`, `listCollections`, + `estimatedDocumentCount`, and read aggregations +- Sampling rides the `_id` index and uses a server-side time limit so large + collections do not stall a run; a custom `order_by` must be indexed for the + same guarantee +- `sample_size` trades inference coverage for speed; raise it for collections + with highly variable documents + ## Common errors | Error or symptom | Likely cause | Recovery | diff --git a/examples/README.md b/examples/README.md index e84594fe..56e3043b 100644 --- a/examples/README.md +++ b/examples/README.md @@ -23,6 +23,14 @@ at the Orbit-style no-declared-constraint relationship fixture and verifies that relationship enrichment writes nine accepted joins without requiring a local warehouse credential. +## mongodb + +`mongodb/` is a manual container-backed example for the MongoDB connector. It +seeds a representative dataset (nested documents, arrays, a mixed-type field, a +nullable field, and a view), then exercises the connector as a fast no-LLM +introspection smoke (`scripts/smoke.sh`) and documents a full keyless +`ktx ingest` run. Works with Docker Compose or `podman compose`. + ## postgres-historic `postgres-historic/` is a manual Docker-backed smoke for Postgres diff --git a/examples/mongodb/README.md b/examples/mongodb/README.md new file mode 100644 index 00000000..f51bf0bd --- /dev/null +++ b/examples/mongodb/README.md @@ -0,0 +1,127 @@ +# MongoDB Connector Example + +A manual, self-contained example for the **ktx** MongoDB connector. It starts a +local MongoDB, seeds a representative dataset, and exercises the connector both +as a fast no-LLM introspection smoke and as a full `ktx ingest` run. + +MongoDB is a **context-only** primary source: collections become tables and +inferred top-level fields become columns, but `ktx sql` and semantic-layer +metric compilation do not apply. See +[`docs-site/content/docs/integrations/primary-sources.mdx`](../../docs-site/content/docs/integrations/primary-sources.mdx). + +## Prerequisites + +- Docker with Compose v2, or Podman with `podman compose` +- Node and pnpm matching the **ktx** workspace +- The built CLI: `pnpm --filter @kaelio/ktx run build` +- For the full ingest only: `uv` on `PATH` and a usable local Claude Code + session (the keyless `claude-code` LLM backend) + +## What the seed contains + +[`init/seed.js`](init/seed.js) creates the `app` database with: + +- `users` — `_id` (ObjectId), scalar fields, a nested `address`, an array + `tags`, a `Decimal128` `balance`, a `ref` field that holds more than one type + (inferred `mixed`), and an `age` field absent from one document (nullable) +- `orders` — an ObjectId `user_id` reference for relationship discovery +- `active_users` — a **view** (to confirm introspection never runs a count + command on a view) + +MongoDB applies the script once on first container start. Apply it by hand with: + +```bash +mongosh "mongodb://localhost:27117" < examples/mongodb/init/seed.js +``` + +## Smoke (no LLM credentials) + +From the **ktx** repository root: + +```bash +examples/mongodb/scripts/smoke.sh +``` + +It starts MongoDB on `127.0.0.1:27117`, seeds it, and asserts the connector's +inferred schema (collections → tables, nested → `json`, `mixed`, nullability, +`_id` primary key, and a view introspected with `estimatedRows: null`). This +drives the same entry point `ktx ingest`'s "database schema" stage uses, without +needing an LLM or embeddings. + +Podman: + +```bash +KTX_MONGODB_COMPOSE="podman compose" examples/mongodb/scripts/smoke.sh +``` + +Set `KTX_MONGODB_KEEP=1` to leave the container running after the script exits. + +## Full `ktx ingest` + +The public database-ingest path requires a configured model and embeddings. +This runs entirely locally with the keyless `claude-code` LLM backend and the +**ktx**-managed `sentence-transformers` embedding daemon — no API keys. + +Start MongoDB and create a project: + +```bash +docker compose -f examples/mongodb/docker-compose.yml up -d --wait # or: podman compose +node packages/cli/dist/bin.js admin init /tmp/ktx-mongodb-example +``` + +Add the connection and a keyless enrichment stack to +`/tmp/ktx-mongodb-example/ktx.yaml`: + +```yaml +connections: + mongo-prod: + driver: mongodb + url: mongodb://localhost:27117/app + databases: + - app +llm: + provider: + backend: claude-code + models: + default: sonnet +scan: + enrichment: + mode: llm + embeddings: + backend: sentence-transformers + model: all-MiniLM-L6-v2 + dimensions: 384 + sentenceTransformers: + base_url: "" +``` + +Test the connection and ingest: + +```bash +node packages/cli/dist/bin.js connection test mongo-prod --project-dir /tmp/ktx-mongodb-example +node packages/cli/dist/bin.js ingest mongo-prod --project-dir /tmp/ktx-mongodb-example --yes --plain +``` + +The first ingest starts the **ktx** embedding daemon and downloads the +`all-MiniLM-L6-v2` model. Expected final state: `Database schema: done`. + +Inspect the result: + +- `raw-sources/mongo-prod/live-database//tables/*.json` — one per + collection, including the `active_users` view with `estimatedRows: null` +- `raw-sources/mongo-prod/live-database//enrichment/relationships.json` — + inferred relationships sit in `review` (a non-SQL source has no read-only SQL + coverage validation), with `accepted: []` +- `semantic-layer/mongo-prod/_schema/app.yaml` — the schema with per-column AI + descriptions + +`ktx sql -c mongo-prod "SELECT 1"` is refused by the read-only SQL capability +gate, and `ktx sl query -c mongo-prod ...` is refused because MongoDB is not a +SQL source. + +## Cleanup + +```bash +docker compose -f examples/mongodb/docker-compose.yml down -v # or: podman compose +rm -rf /tmp/ktx-mongodb-example +``` diff --git a/examples/mongodb/docker-compose.yml b/examples/mongodb/docker-compose.yml new file mode 100644 index 00000000..d753ec43 --- /dev/null +++ b/examples/mongodb/docker-compose.yml @@ -0,0 +1,14 @@ +services: + mongodb: + image: mongo:7 + ports: + # Non-default host port so the example does not clash with a local MongoDB. + - "27117:27017" + healthcheck: + test: ["CMD-SHELL", "mongosh --quiet --eval \"db.runCommand({ ping: 1 }).ok\" | grep -q 1"] + interval: 2s + timeout: 5s + retries: 30 + volumes: + # MongoDB runs *.js here once, on first start, against an empty data dir. + - ./init:/docker-entrypoint-initdb.d:ro diff --git a/examples/mongodb/init/seed.js b/examples/mongodb/init/seed.js new file mode 100644 index 00000000..0e76928a --- /dev/null +++ b/examples/mongodb/init/seed.js @@ -0,0 +1,64 @@ +// Seed a representative MongoDB dataset for the ktx connector example. +// +// MongoDB runs this once on first container start (it is mounted into +// /docker-entrypoint-initdb.d). It can also be applied by hand: +// mongosh "mongodb://localhost:27117" < examples/mongodb/init/seed.js +// +// The shapes here exercise the connector's schema inference end to end: +// scalar BSON types, a nested sub-document, an array, Decimal128, dates, a +// field with more than one type (-> "mixed"), an absent field (-> nullable), +// an ObjectId reference for relationship discovery, and a view (to confirm +// introspection never runs a count command on a view). +const app = db.getSiblingDB('app'); + +app.users.drop(); +app.orders.drop(); + +app.users.insertMany([ + { + email: 'ada@example.com', + age: 31, + active: true, + created: new Date('2026-01-04T10:00:00Z'), + balance: NumberDecimal('120.50'), + address: { city: 'NY', zip: '10001' }, + tags: ['admin', 'early-access'], + ref: 'abc', + }, + { + email: 'grace@example.com', + active: false, + created: new Date('2026-02-11T08:30:00Z'), + balance: NumberDecimal('0.00'), + address: { city: 'SF', zip: '94016' }, + tags: [], + ref: 42, // a second type for this field -> inferred "mixed" + // age intentionally absent -> inferred nullable + }, + { + email: 'linus@example.com', + age: 27, + active: true, + created: new Date('2026-03-01T12:00:00Z'), + balance: NumberDecimal('9.99'), + address: { city: 'Austin', zip: '73301' }, + tags: ['beta'], + ref: null, + }, +]); + +const userIds = app.users.find({}, { _id: 1 }).toArray().map((u) => u._id); + +app.orders.insertMany([ + { user_id: userIds[0], total: 120.5, status: 'paid', placed: new Date('2026-03-02T09:00:00Z') }, + { user_id: userIds[0], total: 9.99, status: 'pending', placed: new Date('2026-03-05T14:00:00Z') }, + { user_id: userIds[1], total: 50.25, status: 'paid', placed: new Date('2026-03-06T16:00:00Z') }, +]); + +// A view, to confirm introspection does not issue a count command on it +// (MongoDB rejects count on a view with CommandNotSupportedOnView). +app.createView('active_users', 'users', [{ $match: { active: true } }]); + +print('users: ' + app.users.countDocuments()); +print('orders: ' + app.orders.countDocuments()); +print('collections: ' + app.getCollectionNames().join(', ')); diff --git a/examples/mongodb/scripts/introspect-smoke.mjs b/examples/mongodb/scripts/introspect-smoke.mjs new file mode 100644 index 00000000..511681fc --- /dev/null +++ b/examples/mongodb/scripts/introspect-smoke.mjs @@ -0,0 +1,53 @@ +// Deterministic, no-LLM smoke for the MongoDB connector. Drives the same +// introspection entry point ktx ingest's "database schema" stage uses, against +// the seeded example database, and asserts the inferred schema. +// +// Usage: node introspect-smoke.mjs [mongoUrl] +import { fileURLToPath } from 'node:url'; +import { dirname, resolve } from 'node:path'; + +const here = dirname(fileURLToPath(import.meta.url)); +const ktxRoot = resolve(here, '../../..'); +const connectorUrl = `file://${resolve( + ktxRoot, + 'packages/cli/dist/connectors/mongodb/live-database-introspection.js', +)}`; + +const mongoUrl = process.argv[2] ?? 'mongodb://localhost:27117/app'; + +const { createMongoDbLiveDatabaseIntrospection } = await import(connectorUrl); + +function assert(condition, message) { + if (!condition) { + throw new Error(`assertion failed: ${message}`); + } +} + +const port = createMongoDbLiveDatabaseIntrospection({ + connections: { 'mongo-example': { driver: 'mongodb', url: mongoUrl, databases: ['app'] } }, +}); + +const snapshot = await port.extractSchema('mongo-example'); +const tables = new Map(snapshot.tables.map((table) => [table.name, table])); + +assert(snapshot.driver === 'mongodb', 'snapshot driver is mongodb'); +assert(['orders', 'users'].every((name) => tables.has(name)), 'users and orders collections introspected'); + +const users = tables.get('users'); +const columns = new Map(users.columns.map((column) => [column.name, column])); +assert(columns.get('_id')?.primaryKey === true && columns.get('_id')?.nullable === false, '_id is the non-null primary key'); +assert(columns.get('age')?.nullable === true, 'age is nullable (absent in one document)'); +assert(columns.get('email')?.nullable === false, 'email is non-nullable (present in every document)'); +assert(columns.get('address')?.normalizedType === 'json', 'nested address maps to opaque json'); +assert(columns.get('tags')?.normalizedType === 'json', 'array tags maps to opaque json'); +assert(columns.get('ref')?.nativeType === 'mixed', 'ref with two types is inferred as mixed'); + +const view = tables.get('active_users'); +assert(view?.kind === 'view', 'active_users is a view'); +assert(view?.estimatedRows === null, 'a view is introspected without a count (estimatedRows null)'); + +console.log(`OK: introspected ${snapshot.tables.length} collections from ${mongoUrl}`); +for (const table of snapshot.tables) { + console.log(` - ${table.db}.${table.name} (${table.kind}, ${table.columns.length} columns)`); +} +process.exit(0); diff --git a/examples/mongodb/scripts/smoke.sh b/examples/mongodb/scripts/smoke.sh new file mode 100755 index 00000000..4c747083 --- /dev/null +++ b/examples/mongodb/scripts/smoke.sh @@ -0,0 +1,37 @@ +#!/usr/bin/env bash +set -euo pipefail + +# Manual smoke for the MongoDB connector: start MongoDB, seed it, and assert the +# connector's schema introspection (the deterministic, no-LLM half of ktx ingest's +# "database schema" stage). The full enrichment ingest is documented in README.md. + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +EXAMPLE_DIR="$(cd "$SCRIPT_DIR/.." && pwd)" +KTX_ROOT="$(cd "$EXAMPLE_DIR/../.." && pwd)" +COMPOSE_FILE="$EXAMPLE_DIR/docker-compose.yml" +CONNECTOR="$KTX_ROOT/packages/cli/dist/connectors/mongodb/live-database-introspection.js" +MONGO_URL="${KTX_MONGODB_URL:-mongodb://localhost:27117/app}" + +# Compose engine: docker by default, override for podman: +# KTX_MONGODB_COMPOSE="podman compose" examples/mongodb/scripts/smoke.sh +COMPOSE="${KTX_MONGODB_COMPOSE:-docker compose}" + +cleanup() { + if [[ "${KTX_MONGODB_KEEP:-0}" != "1" ]]; then + $COMPOSE -f "$COMPOSE_FILE" down -v >/dev/null 2>&1 || true + fi +} +trap cleanup EXIT + +if [[ ! -f "$CONNECTOR" ]]; then + echo "Build the CLI first: pnpm --filter @kaelio/ktx run build" >&2 + exit 1 +fi + +echo "Starting MongoDB and seeding (${COMPOSE})…" +$COMPOSE -f "$COMPOSE_FILE" up -d --wait + +echo "Asserting connector introspection against ${MONGO_URL}…" +node "$SCRIPT_DIR/introspect-smoke.mjs" "$MONGO_URL" + +echo "Smoke passed." diff --git a/knip.json b/knip.json index 65b1a0a2..a2d16ca8 100644 --- a/knip.json +++ b/knip.json @@ -38,7 +38,8 @@ "conventional-changelog-conventionalcommits" ], "ignore": [ - ".context/**" + ".context/**", + "examples/**" ], "ignoreBinaries": [ "uv", diff --git a/packages/cli/package.json b/packages/cli/package.json index a5f9e581..9aa57d23 100644 --- a/packages/cli/package.json +++ b/packages/cli/package.json @@ -72,6 +72,7 @@ "ink": "^7.0.3", "lookml-parser": "7.1.0", "minimatch": "^10.2.5", + "mongodb": "^6.12.0", "mssql": "^12.5.4", "mysql2": "^3.22.3", "openai": "^6.38.0", diff --git a/packages/cli/src/connection-drivers.ts b/packages/cli/src/connection-drivers.ts index 4f10e663..be98746b 100644 --- a/packages/cli/src/connection-drivers.ts +++ b/packages/cli/src/connection-drivers.ts @@ -8,6 +8,7 @@ const KTX_DATABASE_DRIVER_IDS = new Set([ 'sqlserver', 'bigquery', 'snowflake', + 'mongodb', ]); export function normalizeConnectionDriver(connection: KtxProjectConnectionConfig): string { diff --git a/packages/cli/src/connectors/bigquery/connector.ts b/packages/cli/src/connectors/bigquery/connector.ts index 0b30c025..e4e284b6 100644 --- a/packages/cli/src/connectors/bigquery/connector.ts +++ b/packages/cli/src/connectors/bigquery/connector.ts @@ -1,6 +1,6 @@ import { BigQuery, type TableField } from '@google-cloud/bigquery'; import { normalizeBigQueryProjectId, normalizeBigQueryRegion } from '../../context/connections/bigquery-identifiers.js'; -import { getDialectForDriver } from '../../context/connections/dialects.js'; +import { getSqlDialectForDriver } from '../../context/connections/dialects.js'; import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js'; import { tryConstraintQuery } from '../../context/scan/constraint-discovery.js'; import { scopedTableNames } from '../../context/scan/table-ref.js'; @@ -291,7 +291,7 @@ export class KtxBigQueryScanConnector implements KtxScanConnector { private readonly now: () => Date; private readonly maxBytesBilled?: number | string; private readonly queryTimeoutMs?: number; - private readonly dialect = getDialectForDriver('bigquery'); + private readonly dialect = getSqlDialectForDriver('bigquery'); private client: KtxBigQueryClient | null = null; constructor(options: KtxBigQueryScanConnectorOptions) { diff --git a/packages/cli/src/connectors/bigquery/dialect.ts b/packages/cli/src/connectors/bigquery/dialect.ts index 0e2f883e..b63391fe 100644 --- a/packages/cli/src/connectors/bigquery/dialect.ts +++ b/packages/cli/src/connectors/bigquery/dialect.ts @@ -1,4 +1,4 @@ -import type { KtxDialect } from '../../context/connections/dialects.js'; +import type { KtxSqlDialect } from '../../context/connections/dialects.js'; import { columnDisplayPartCount, formatDialectDisplayRef, @@ -11,7 +11,7 @@ import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/typ type BigQueryTableNameRef = Pick & Partial>; /** @internal */ -export class KtxBigQueryDialect implements KtxDialect { +export class KtxBigQueryDialect implements KtxSqlDialect { readonly type = 'bigquery' as const; private readonly typeMappings: Record = { diff --git a/packages/cli/src/connectors/clickhouse/connector.ts b/packages/cli/src/connectors/clickhouse/connector.ts index 38a477e7..e08cc732 100644 --- a/packages/cli/src/connectors/clickhouse/connector.ts +++ b/packages/cli/src/connectors/clickhouse/connector.ts @@ -1,5 +1,5 @@ import { createClient } from '@clickhouse/client'; -import { getDialectForDriver } from '../../context/connections/dialects.js'; +import { getSqlDialectForDriver } from '../../context/connections/dialects.js'; import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js'; import { connectorTestFailure, createKtxConnectorCapabilities, type KtxConnectorTestResult, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableRef, type KtxTableSampleInput, type KtxTableListEntry, type KtxTableSampleResult } from '../../context/scan/types.js'; import { scopedTableNames } from '../../context/scan/table-ref.js'; @@ -284,7 +284,7 @@ export class KtxClickHouseScanConnector implements KtxScanConnector { private readonly clientFactory: KtxClickHouseClientFactory; private readonly endpointResolver?: KtxClickHouseEndpointResolver; private readonly now: () => Date; - private readonly dialect = getDialectForDriver('clickhouse'); + private readonly dialect = getSqlDialectForDriver('clickhouse'); private client: KtxClickHouseClient | null = null; private resolvedEndpoint: KtxClickHouseResolvedEndpoint | null = null; diff --git a/packages/cli/src/connectors/clickhouse/dialect.ts b/packages/cli/src/connectors/clickhouse/dialect.ts index 9e470cae..8f3f5bb1 100644 --- a/packages/cli/src/connectors/clickhouse/dialect.ts +++ b/packages/cli/src/connectors/clickhouse/dialect.ts @@ -1,4 +1,4 @@ -import type { KtxDialect } from '../../context/connections/dialects.js'; +import type { KtxSqlDialect } from '../../context/connections/dialects.js'; import { columnDisplayPartCount, formatDialectDisplayRef, @@ -11,7 +11,7 @@ import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/typ type ClickHouseTableNameRef = Pick & Partial>; /** @internal */ -export class KtxClickHouseDialect implements KtxDialect { +export class KtxClickHouseDialect implements KtxSqlDialect { readonly type = 'clickhouse' as const; private readonly typeMappings: Record = { diff --git a/packages/cli/src/connectors/mongodb/connector.ts b/packages/cli/src/connectors/mongodb/connector.ts new file mode 100644 index 00000000..f46d6038 --- /dev/null +++ b/packages/cli/src/connectors/mongodb/connector.ts @@ -0,0 +1,413 @@ +import { MongoClient } from 'mongodb'; +import { resolveKtxConfigReference } from '../../context/core/config-reference.js'; +import { + connectorTestFailure, + createKtxConnectorCapabilities, + type KtxColumnSampleInput, + type KtxColumnSampleResult, + type KtxConnectorTestResult, + type KtxScanConnector, + type KtxScanContext, + type KtxScanInput, + type KtxSchemaSnapshot, + type KtxSchemaTable, + type KtxTableListEntry, + type KtxTableRef, + type KtxTableSampleInput, + type KtxTableSampleResult, +} from '../../context/scan/types.js'; +import { scopedTableNames } from '../../context/scan/table-ref.js'; +import { getDialectForDriver } from '../../context/connections/dialects.js'; +import { inferKtxMongoCollectionColumns, type KtxMongoDocument, MONGO_ID_FIELD } from './schema-inference.js'; + +const DEFAULT_SAMPLE_SIZE = 1000; +const SAMPLE_MAX_TIME_MS = 30_000; + +export interface KtxMongoDbConnectionConfig { + driver?: string; + url?: string; + database?: string; + databases?: string[]; + enabled_tables?: string[]; + sample_size?: number; + order_by?: string; + [key: string]: unknown; +} + +export interface KtxMongoListedCollection { + name: string; + type?: string; +} + +interface KtxMongoFindOptions { + sort: Record; + limit: number; + projection?: Record; +} + +/** Driver-agnostic seam over the `mongodb` client so the connector is unit-testable without a server. */ +export interface KtxMongoClient { + listCollections(databaseName: string): Promise; + estimatedDocumentCount(databaseName: string, collectionName: string): Promise; + find(databaseName: string, collectionName: string, options: KtxMongoFindOptions): Promise; + ping(databaseName: string): Promise; + close(): Promise; +} + +export interface KtxMongoClientFactory { + create(url: string): KtxMongoClient; +} + +export interface KtxMongoDbScanConnectorOptions { + connectionId: string; + connection: KtxMongoDbConnectionConfig | undefined; + clientFactory?: KtxMongoClientFactory; + env?: NodeJS.ProcessEnv; + now?: () => Date; +} + +class DefaultMongoClient implements KtxMongoClient { + private readonly client: MongoClient; + private connected = false; + + constructor(url: string) { + this.client = new MongoClient(url); + } + + private async connectedClient(): Promise { + if (!this.connected) { + await this.client.connect(); + this.connected = true; + } + return this.client; + } + + async listCollections(databaseName: string): Promise { + const client = await this.connectedClient(); + const collections = await client.db(databaseName).listCollections({}, { nameOnly: false }).toArray(); + return collections.map((collection) => ({ name: collection.name, type: collection.type })); + } + + async estimatedDocumentCount(databaseName: string, collectionName: string): Promise { + const client = await this.connectedClient(); + return client.db(databaseName).collection(collectionName).estimatedDocumentCount(); + } + + async find( + databaseName: string, + collectionName: string, + options: KtxMongoFindOptions, + ): Promise { + const client = await this.connectedClient(); + return client + .db(databaseName) + .collection(collectionName) + .find({}, { sort: options.sort, limit: options.limit, maxTimeMS: SAMPLE_MAX_TIME_MS, ...(options.projection ? { projection: options.projection } : {}) }) + .toArray() as Promise; + } + + async ping(databaseName: string): Promise { + const client = await this.connectedClient(); + await client.db(databaseName).command({ ping: 1 }); + } + + async close(): Promise { + if (this.connected) { + await this.client.close(); + this.connected = false; + } + } +} + +class DefaultMongoClientFactory implements KtxMongoClientFactory { + create(url: string): KtxMongoClient { + return new DefaultMongoClient(url); + } +} + +export function isKtxMongoDbConnectionConfig( + connection: KtxMongoDbConnectionConfig | undefined, +): connection is KtxMongoDbConnectionConfig { + return String(connection?.driver ?? '').toLowerCase() === 'mongodb'; +} + +function databaseFromUrl(url: string): string | undefined { + try { + const path = new URL(url).pathname.replace(/^\/+/, ''); + const database = path.split('/')[0]; + return database && database.length > 0 ? decodeURIComponent(database) : undefined; + } catch { + return undefined; + } +} + +function configuredDatabases(connection: KtxMongoDbConnectionConfig, fallback: string | undefined): string[] { + if (Array.isArray(connection.databases)) { + const selected = connection.databases + .filter((database): database is string => typeof database === 'string' && database.trim().length > 0) + .map((database) => database.trim()); + if (selected.length > 0) { + return [...new Set(selected)]; + } + } + const single = typeof connection.database === 'string' && connection.database.trim().length > 0 + ? connection.database.trim() + : fallback; + return single ? [single] : []; +} + +function positiveInteger(value: unknown, fallback: number): number { + return typeof value === 'number' && Number.isInteger(value) && value > 0 ? value : fallback; +} + +function normalizeSampleValue(value: unknown): unknown { + if (value === null || value === undefined) { + return null; + } + if (value instanceof Date) { + return value.toISOString(); + } + if (typeof value === 'object') { + const bsontype = (value as { _bsontype?: unknown })._bsontype; + return typeof bsontype === 'string' ? String(value) : JSON.stringify(value); + } + return value; +} + +function unionDocumentKeys(documents: readonly KtxMongoDocument[]): string[] { + const keys: string[] = []; + const seen = new Set(); + for (const document of documents) { + for (const key of Object.keys(document)) { + if (!seen.has(key)) { + seen.add(key); + keys.push(key); + } + } + } + return keys; +} + +export class KtxMongoDbScanConnector implements KtxScanConnector { + readonly id: string; + readonly driver = 'mongodb' as const; + readonly capabilities = createKtxConnectorCapabilities({ + tableSampling: true, + columnSampling: true, + columnStats: false, + readOnlySql: false, + nestedAnalysis: true, + formalForeignKeys: false, + estimatedRowCounts: true, + }); + + private readonly connectionId: string; + private readonly connection: KtxMongoDbConnectionConfig; + private readonly url: string; + private readonly databases: string[]; + private readonly sampleSize: number; + private readonly orderBy: string; + private readonly enabledTables: ReadonlySet | null; + private readonly clientFactory: KtxMongoClientFactory; + private readonly now: () => Date; + private readonly dialect = getDialectForDriver('mongodb'); + private client: KtxMongoClient | null = null; + + constructor(options: KtxMongoDbScanConnectorOptions) { + const connection = options.connection ?? {}; + const inputDriver = connection.driver ?? 'unknown'; + if (!isKtxMongoDbConnectionConfig(connection)) { + throw new Error(`Native MongoDB connector cannot run driver "${inputDriver}"`); + } + const env = options.env ?? process.env; + const url = resolveKtxConfigReference( + typeof connection.url === 'string' ? connection.url.trim() : undefined, + env, + ); + if (!url) { + throw new Error(`Native MongoDB connector requires connections.${options.connectionId}.url`); + } + const databases = configuredDatabases(connection, databaseFromUrl(url)); + if (databases.length === 0) { + throw new Error( + `Native MongoDB connector requires connections.${options.connectionId}.databases (or a database in the URL)`, + ); + } + const enabledTables = Array.isArray(connection.enabled_tables) + ? new Set( + connection.enabled_tables + .filter((table): table is string => typeof table === 'string' && table.trim().length > 0) + .map((table) => table.trim()), + ) + : null; + + this.connectionId = options.connectionId; + this.connection = connection; + this.url = url; + this.databases = databases; + this.sampleSize = positiveInteger(connection.sample_size, DEFAULT_SAMPLE_SIZE); + this.orderBy = typeof connection.order_by === 'string' && connection.order_by.trim().length > 0 + ? connection.order_by.trim() + : MONGO_ID_FIELD; + this.enabledTables = enabledTables && enabledTables.size > 0 ? enabledTables : null; + this.clientFactory = options.clientFactory ?? new DefaultMongoClientFactory(); + this.now = options.now ?? (() => new Date()); + this.id = `mongodb:${options.connectionId}`; + } + + async testConnection(): Promise { + try { + await this.clientForQuery().ping(this.databases[0]!); + return { success: true }; + } catch (error) { + return connectorTestFailure(error); + } + } + + async introspect(input: KtxScanInput, _ctx: KtxScanContext): Promise { + this.assertConnection(input.connectionId); + const client = this.clientForQuery(); + const tables: KtxSchemaTable[] = []; + + for (const database of this.databases) { + const scopedNames = input.tableScope + ? new Set(scopedTableNames(input.tableScope, { catalog: null, db: database })) + : null; + const collections = await client.listCollections(database); + for (const collection of collections) { + if (collection.name.startsWith('system.')) { + continue; + } + if (scopedNames && !scopedNames.has(collection.name)) { + continue; + } + if (this.enabledTables && !this.enabledTables.has(`${database}.${collection.name}`)) { + continue; + } + tables.push(await this.introspectCollection(client, database, collection)); + } + } + + return { + connectionId: this.connectionId, + driver: 'mongodb', + extractedAt: this.now().toISOString(), + scope: { schemas: this.databases }, + metadata: { + databases: this.databases, + sample_size: this.sampleSize, + order_by: this.orderBy, + table_count: tables.length, + total_columns: tables.reduce((sum, table) => sum + table.columns.length, 0), + }, + tables, + }; + } + + private async introspectCollection( + client: KtxMongoClient, + database: string, + collection: KtxMongoListedCollection, + ): Promise { + const isView = collection.type === 'view'; + // estimatedDocumentCount issues a count command, which MongoDB rejects on a + // view (CommandNotSupportedOnView); only count real collections. + const estimatedRows = isView ? null : await client.estimatedDocumentCount(database, collection.name); + const documents = await client.find(database, collection.name, { + sort: { [this.orderBy]: -1 }, + limit: this.sampleSize, + }); + return { + catalog: null, + db: database, + name: collection.name, + kind: isView ? 'view' : 'table', + comment: null, + estimatedRows, + columns: inferKtxMongoCollectionColumns(documents, this.dialect), + foreignKeys: [], + }; + } + + async sampleTable(input: KtxTableSampleInput, _ctx: KtxScanContext): Promise { + this.assertConnection(input.connectionId); + const { database, collection } = this.resolveTableRef(input.table); + const documents = await this.clientForQuery().find(database, collection, { + sort: { [this.orderBy]: -1 }, + limit: input.limit, + }); + const headers = input.columns && input.columns.length > 0 ? input.columns : unionDocumentKeys(documents); + const rows = documents.map((document) => headers.map((header) => normalizeSampleValue(document[header]))); + return { headers, rows, totalRows: documents.length }; + } + + async sampleColumn(input: KtxColumnSampleInput, _ctx: KtxScanContext): Promise { + this.assertConnection(input.connectionId); + const { database, collection } = this.resolveTableRef(input.table); + const documents = await this.clientForQuery().find(database, collection, { + sort: { [this.orderBy]: -1 }, + limit: input.limit, + projection: { [input.column]: 1 }, + }); + const values: unknown[] = []; + let nullCount = 0; + for (const document of documents) { + const value = document[input.column]; + if (value === null || value === undefined) { + nullCount += 1; + continue; + } + values.push(normalizeSampleValue(value)); + } + return { values, nullCount, distinctCount: null }; + } + + async listSchemas(): Promise { + return [...this.databases]; + } + + async listTables(schemas?: string[]): Promise { + const client = this.clientForQuery(); + const databases = schemas && schemas.length > 0 ? schemas : this.databases; + const entries: KtxTableListEntry[] = []; + for (const database of databases) { + const collections = await client.listCollections(database); + for (const collection of collections) { + if (collection.name.startsWith('system.')) { + continue; + } + entries.push({ + catalog: null, + schema: database, + name: collection.name, + kind: collection.type === 'view' ? 'view' : 'table', + }); + } + } + return entries; + } + + async cleanup(): Promise { + if (this.client) { + await this.client.close(); + this.client = null; + } + } + + private resolveTableRef(table: KtxTableRef): { database: string; collection: string } { + return { database: table.db ?? this.databases[0]!, collection: table.name }; + } + + private clientForQuery(): KtxMongoClient { + if (!this.client) { + this.client = this.clientFactory.create(this.url); + } + return this.client; + } + + private assertConnection(connectionId: string): void { + if (connectionId !== this.connectionId) { + throw new Error(`ktx MongoDB connector ${this.id} cannot serve connection ${connectionId}`); + } + } +} diff --git a/packages/cli/src/connectors/mongodb/dialect.ts b/packages/cli/src/connectors/mongodb/dialect.ts new file mode 100644 index 00000000..3bda71b7 --- /dev/null +++ b/packages/cli/src/connectors/mongodb/dialect.ts @@ -0,0 +1,64 @@ +import type { KtxDialect } from '../../context/connections/dialects.js'; +import { + columnDisplayPartCount, + formatDialectDisplayRef, + parseDialectDisplayRef, +} from '../../context/connections/dialect-helpers.js'; +import type { KtxDialectTableRef } from '../../context/connections/dialect-helpers.js'; +import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/types.js'; + +/** + * Display/type-mapping half of the dialect contract for MongoDB. Collections map + * to `db.collection` display refs (ansi two-part shape). MongoDB is a non-SQL + * source, so it implements {@link KtxDialect} only — never {@link KtxSqlDialect}. + */ +export class KtxMongoDbDialect implements KtxDialect { + readonly type = 'mongodb' as const; + + private readonly typeMappings: Record = { + objectid: 'string', + string: 'string', + uuid: 'string', + binary: 'string', + regex: 'string', + int: 'number', + long: 'number', + double: 'number', + decimal: 'number', + bool: 'boolean', + date: 'time', + timestamp: 'time', + object: 'string', + array: 'string', + json: 'string', + null: 'string', + mixed: 'string', + }; + + formatDisplayRef(table: KtxDialectTableRef): string { + return formatDialectDisplayRef(table, 'ansi'); + } + + parseDisplayRef(display: string): KtxTableRef | null { + return parseDialectDisplayRef(display, 'ansi'); + } + + columnDisplayTablePartCount(): 1 | 2 | 3 { + return columnDisplayPartCount('ansi'); + } + + mapDataType(nativeType: string): string { + const normalized = nativeType.toLowerCase().trim(); + if (normalized === 'object' || normalized === 'array') { + return 'json'; + } + return normalized || 'mixed'; + } + + mapToDimensionType(nativeType: string): KtxSchemaDimensionType { + if (!nativeType) { + return 'string'; + } + return this.typeMappings[nativeType.toLowerCase().trim()] ?? 'string'; + } +} diff --git a/packages/cli/src/connectors/mongodb/live-database-introspection.ts b/packages/cli/src/connectors/mongodb/live-database-introspection.ts new file mode 100644 index 00000000..24cf3ff2 --- /dev/null +++ b/packages/cli/src/connectors/mongodb/live-database-introspection.ts @@ -0,0 +1,44 @@ +import type { + LiveDatabaseIntrospectionOptions, + LiveDatabaseIntrospectionPort, +} from '../../context/ingest/adapters/live-database/types.js'; +import type { KtxProjectConnectionConfig } from '../../context/project/config.js'; +import { + KtxMongoDbScanConnector, + type KtxMongoClientFactory, + type KtxMongoDbConnectionConfig, +} from './connector.js'; + +interface CreateMongoDbLiveDatabaseIntrospectionOptions { + connections: Record; + clientFactory?: KtxMongoClientFactory; + now?: () => Date; +} + +export function createMongoDbLiveDatabaseIntrospection( + options: CreateMongoDbLiveDatabaseIntrospectionOptions, +): LiveDatabaseIntrospectionPort { + return { + async extractSchema(connectionId: string, introspectionOptions?: LiveDatabaseIntrospectionOptions) { + const connection = options.connections[connectionId] as KtxMongoDbConnectionConfig | undefined; + const connector = new KtxMongoDbScanConnector({ + connectionId, + connection, + clientFactory: options.clientFactory, + now: options.now, + }); + try { + return await connector.introspect( + { + connectionId, + driver: 'mongodb', + ...(introspectionOptions?.tableScope ? { tableScope: introspectionOptions.tableScope } : {}), + }, + { runId: `mongodb-${connectionId}` }, + ); + } finally { + await connector.cleanup(); + } + }, + }; +} diff --git a/packages/cli/src/connectors/mongodb/schema-inference.ts b/packages/cli/src/connectors/mongodb/schema-inference.ts new file mode 100644 index 00000000..5ad645a2 --- /dev/null +++ b/packages/cli/src/connectors/mongodb/schema-inference.ts @@ -0,0 +1,129 @@ +import type { KtxSchemaColumn } from '../../context/scan/types.js'; +import type { KtxDialect } from '../../context/connections/dialects.js'; + +export type KtxMongoDocument = Record; + +/** Top-level field name MongoDB guarantees on every document; used as the primary key. */ +export const MONGO_ID_FIELD = '_id'; + +const BSON_TYPE_NAMES: Record = { + objectid: 'objectId', + int32: 'int', + long: 'long', + double: 'double', + decimal128: 'decimal', + binary: 'binary', + uuid: 'uuid', + timestamp: 'timestamp', + bsonregexp: 'regex', + bsonsymbol: 'string', +}; + +/** + * Canonical BSON type name for a sampled value as the `mongodb` driver hydrates + * it: BSON wrapper objects expose `_bsontype`; everything else maps from the JS + * runtime type. Sub-documents and arrays collapse to opaque `object`/`array`. + * @internal + */ +export function bsonTypeOf(value: unknown): string { + if (value === null || value === undefined) { + return 'null'; + } + if (typeof value === 'string') { + return 'string'; + } + if (typeof value === 'boolean') { + return 'bool'; + } + if (typeof value === 'number') { + return Number.isInteger(value) ? 'int' : 'double'; + } + if (typeof value === 'bigint') { + return 'long'; + } + if (value instanceof Date) { + return 'date'; + } + if (Array.isArray(value)) { + return 'array'; + } + if (typeof value === 'object') { + const bsontype = (value as { _bsontype?: unknown })._bsontype; + if (typeof bsontype === 'string') { + return BSON_TYPE_NAMES[bsontype.toLowerCase()] ?? bsontype; + } + return 'object'; + } + return 'mixed'; +} + +interface FieldAccumulator { + types: Set; + present: number; + nullSeen: boolean; +} + +function resolveNativeType(types: ReadonlySet): string { + if (types.size === 0) { + return 'null'; + } + if (types.size > 1) { + return 'mixed'; + } + return [...types][0]!; +} + +/** + * Infer a flat column schema from sampled documents. Each top-level field becomes + * one column: BSON types are unioned (a field seen with >1 type is `mixed` and + * treated as a string), nullability is derived from field presence and observed + * nulls, and sub-documents/arrays remain a single opaque `json` column. `_id` is + * the non-nullable primary key. + */ +export function inferKtxMongoCollectionColumns( + documents: readonly KtxMongoDocument[], + dialect: KtxDialect, +): KtxSchemaColumn[] { + const total = documents.length; + const order: string[] = []; + const fields = new Map(); + + for (const document of documents) { + if (!document || typeof document !== 'object') { + continue; + } + for (const [name, value] of Object.entries(document)) { + let accumulator = fields.get(name); + if (!accumulator) { + accumulator = { types: new Set(), present: 0, nullSeen: false }; + fields.set(name, accumulator); + order.push(name); + } + accumulator.present += 1; + const bsonType = bsonTypeOf(value); + if (bsonType === 'null') { + accumulator.nullSeen = true; + } else { + accumulator.types.add(bsonType); + } + } + } + + return order.map((name) => { + const accumulator = fields.get(name)!; + const nativeType = resolveNativeType(accumulator.types); + const isId = name === MONGO_ID_FIELD; + const nullable = isId + ? false + : accumulator.present < total || accumulator.nullSeen || accumulator.types.size === 0; + return { + name, + nativeType, + normalizedType: dialect.mapDataType(nativeType), + dimensionType: dialect.mapToDimensionType(nativeType), + nullable, + primaryKey: isId, + comment: null, + }; + }); +} diff --git a/packages/cli/src/connectors/mysql/connector.ts b/packages/cli/src/connectors/mysql/connector.ts index 5bddec53..f3631d5b 100644 --- a/packages/cli/src/connectors/mysql/connector.ts +++ b/packages/cli/src/connectors/mysql/connector.ts @@ -1,5 +1,5 @@ import mysql, { type FieldPacket, type Pool, type RowDataPacket } from 'mysql2/promise'; -import { getDialectForDriver } from '../../context/connections/dialects.js'; +import { getSqlDialectForDriver } from '../../context/connections/dialects.js'; import { resolveStringReference } from '../shared/string-reference.js'; import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js'; import { @@ -391,7 +391,7 @@ export class KtxMysqlScanConnector implements KtxScanConnector { private readonly poolFactory: KtxMysqlPoolFactory; private readonly endpointResolver?: KtxMysqlEndpointResolver; private readonly now: () => Date; - private readonly dialect = getDialectForDriver('mysql'); + private readonly dialect = getSqlDialectForDriver('mysql'); private pool: KtxMysqlPool | null = null; private resolvedEndpoint: KtxMysqlResolvedEndpoint | null = null; diff --git a/packages/cli/src/connectors/mysql/dialect.ts b/packages/cli/src/connectors/mysql/dialect.ts index 6b26c97a..2deca459 100644 --- a/packages/cli/src/connectors/mysql/dialect.ts +++ b/packages/cli/src/connectors/mysql/dialect.ts @@ -1,4 +1,4 @@ -import type { KtxDialect } from '../../context/connections/dialects.js'; +import type { KtxSqlDialect } from '../../context/connections/dialects.js'; import { columnDisplayPartCount, formatDialectDisplayRef, @@ -11,7 +11,7 @@ import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/typ type MysqlTableNameRef = Pick & Partial>; /** @internal */ -export class KtxMysqlDialect implements KtxDialect { +export class KtxMysqlDialect implements KtxSqlDialect { readonly type = 'mysql' as const; private readonly typeMappings: Record = { diff --git a/packages/cli/src/connectors/postgres/connector.ts b/packages/cli/src/connectors/postgres/connector.ts index 1a2fcd40..f863ed94 100644 --- a/packages/cli/src/connectors/postgres/connector.ts +++ b/packages/cli/src/connectors/postgres/connector.ts @@ -1,5 +1,5 @@ import { resolveStringReference } from '../shared/string-reference.js'; -import { getDialectForDriver } from '../../context/connections/dialects.js'; +import { getSqlDialectForDriver } from '../../context/connections/dialects.js'; import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js'; import { tryConstraintQuery } from '../../context/scan/constraint-discovery.js'; import { scopedTableNames } from '../../context/scan/table-ref.js'; @@ -412,7 +412,7 @@ export class KtxPostgresScanConnector implements KtxScanConnector { private readonly poolFactory: KtxPostgresPoolFactory; private readonly endpointResolver?: KtxPostgresEndpointResolver; private readonly now: () => Date; - private readonly dialect = getDialectForDriver('postgres'); + private readonly dialect = getSqlDialectForDriver('postgres'); private pool: KtxPostgresPool | null = null; private lastIdlePoolError: Error | null = null; private resolvedEndpoint: KtxPostgresResolvedEndpoint | null = null; diff --git a/packages/cli/src/connectors/postgres/dialect.ts b/packages/cli/src/connectors/postgres/dialect.ts index 49d5677d..289c7152 100644 --- a/packages/cli/src/connectors/postgres/dialect.ts +++ b/packages/cli/src/connectors/postgres/dialect.ts @@ -1,4 +1,4 @@ -import type { KtxDialect } from '../../context/connections/dialects.js'; +import type { KtxSqlDialect } from '../../context/connections/dialects.js'; import { columnDisplayPartCount, formatDialectDisplayRef, @@ -11,7 +11,7 @@ import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/typ type PostgresTableNameRef = Pick & Partial>; /** @internal */ -export class KtxPostgresDialect implements KtxDialect { +export class KtxPostgresDialect implements KtxSqlDialect { readonly type = 'postgres' as const; private readonly typeMappings: Record = { diff --git a/packages/cli/src/connectors/snowflake/connector.ts b/packages/cli/src/connectors/snowflake/connector.ts index 5f016675..5b1c5bfa 100644 --- a/packages/cli/src/connectors/snowflake/connector.ts +++ b/packages/cli/src/connectors/snowflake/connector.ts @@ -1,5 +1,5 @@ import { createPrivateKey } from 'node:crypto'; -import { getDialectForDriver } from '../../context/connections/dialects.js'; +import { getSqlDialectForDriver } from '../../context/connections/dialects.js'; import { resolveStringReference } from '../shared/string-reference.js'; import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js'; import { tryConstraintQuery } from '../../context/scan/constraint-discovery.js'; @@ -547,7 +547,7 @@ export class KtxSnowflakeScanConnector implements KtxScanConnector { private readonly resolved: KtxSnowflakeResolvedConnectionConfig; private readonly driverFactory: KtxSnowflakeDriverFactory; - private readonly dialect = getDialectForDriver('snowflake'); + private readonly dialect = getSqlDialectForDriver('snowflake'); private readonly now: () => Date; private driverInstance: KtxSnowflakeDriver | null = null; diff --git a/packages/cli/src/connectors/snowflake/dialect.ts b/packages/cli/src/connectors/snowflake/dialect.ts index 3fe04101..c0277a43 100644 --- a/packages/cli/src/connectors/snowflake/dialect.ts +++ b/packages/cli/src/connectors/snowflake/dialect.ts @@ -1,4 +1,4 @@ -import type { KtxDialect } from '../../context/connections/dialects.js'; +import type { KtxSqlDialect } from '../../context/connections/dialects.js'; import { columnDisplayPartCount, formatDialectDisplayRef, @@ -11,7 +11,7 @@ import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/typ type SnowflakeTableNameRef = Pick & Partial>; /** @internal */ -export class KtxSnowflakeDialect implements KtxDialect { +export class KtxSnowflakeDialect implements KtxSqlDialect { readonly type = 'snowflake' as const; private readonly typeMappings: Record = { diff --git a/packages/cli/src/connectors/sqlite/connector.ts b/packages/cli/src/connectors/sqlite/connector.ts index 4cae8f99..c46bc2dd 100644 --- a/packages/cli/src/connectors/sqlite/connector.ts +++ b/packages/cli/src/connectors/sqlite/connector.ts @@ -3,7 +3,7 @@ import { existsSync, readFileSync, statSync } from 'node:fs'; import { homedir } from 'node:os'; import { isAbsolute, resolve } from 'node:path'; import { fileURLToPath } from 'node:url'; -import { getDialectForDriver } from '../../context/connections/dialects.js'; +import { getSqlDialectForDriver } from '../../context/connections/dialects.js'; import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js'; import { normalizeQueryRows } from '../../context/connections/query-executor.js'; import { connectorTestFailure, createKtxConnectorCapabilities, type KtxConnectorTestResult, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaForeignKey, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableListEntry, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js'; @@ -133,7 +133,7 @@ export class KtxSqliteScanConnector implements KtxScanConnector { private readonly connectionId: string; private readonly dbPath: string; private readonly now: () => Date; - private readonly dialect = getDialectForDriver('sqlite'); + private readonly dialect = getSqlDialectForDriver('sqlite'); private db: Database.Database | null = null; constructor(options: KtxSqliteScanConnectorOptions) { diff --git a/packages/cli/src/connectors/sqlite/dialect.ts b/packages/cli/src/connectors/sqlite/dialect.ts index 5472b674..38c4a220 100644 --- a/packages/cli/src/connectors/sqlite/dialect.ts +++ b/packages/cli/src/connectors/sqlite/dialect.ts @@ -1,4 +1,4 @@ -import type { KtxDialect } from '../../context/connections/dialects.js'; +import type { KtxSqlDialect } from '../../context/connections/dialects.js'; import { columnDisplayPartCount, formatDialectDisplayRef, @@ -11,7 +11,7 @@ import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/typ type SqliteTableNameRef = Pick & Partial>; /** @internal */ -export class KtxSqliteDialect implements KtxDialect { +export class KtxSqliteDialect implements KtxSqlDialect { readonly type = 'sqlite' as const; private readonly typeMappings: Record = { diff --git a/packages/cli/src/connectors/sqlserver/connector.ts b/packages/cli/src/connectors/sqlserver/connector.ts index b70d6a8c..9f101578 100644 --- a/packages/cli/src/connectors/sqlserver/connector.ts +++ b/packages/cli/src/connectors/sqlserver/connector.ts @@ -1,5 +1,5 @@ import { assertReadOnlySql, hoistLeadingCte, stripTrailingSqlNoise } from '../../context/connections/read-only-sql.js'; -import { getDialectForDriver } from '../../context/connections/dialects.js'; +import { getSqlDialectForDriver } from '../../context/connections/dialects.js'; import { tryConstraintQuery } from '../../context/scan/constraint-discovery.js'; import { scopedTableNames } from '../../context/scan/table-ref.js'; import { @@ -353,7 +353,7 @@ export class KtxSqlServerScanConnector implements KtxScanConnector { private readonly poolFactory: KtxSqlServerPoolFactory; private readonly endpointResolver?: KtxSqlServerEndpointResolver; private readonly now: () => Date; - private readonly dialect = getDialectForDriver('sqlserver'); + private readonly dialect = getSqlDialectForDriver('sqlserver'); private pool: KtxSqlServerPool | null = null; private resolvedEndpoint: KtxSqlServerResolvedEndpoint | null = null; diff --git a/packages/cli/src/connectors/sqlserver/dialect.ts b/packages/cli/src/connectors/sqlserver/dialect.ts index 6b1804f4..18cf2a15 100644 --- a/packages/cli/src/connectors/sqlserver/dialect.ts +++ b/packages/cli/src/connectors/sqlserver/dialect.ts @@ -1,4 +1,4 @@ -import type { KtxDialect } from '../../context/connections/dialects.js'; +import type { KtxSqlDialect } from '../../context/connections/dialects.js'; import { columnDisplayPartCount, formatDialectDisplayRef, @@ -11,7 +11,7 @@ import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/typ type SqlServerTableNameRef = Pick & Partial>; /** @internal */ -export class KtxSqlServerDialect implements KtxDialect { +export class KtxSqlServerDialect implements KtxSqlDialect { readonly type = 'sqlserver' as const; private readonly typeMappings: Record = { diff --git a/packages/cli/src/context/connections/dialects.ts b/packages/cli/src/context/connections/dialects.ts index c7929cea..30608f3e 100644 --- a/packages/cli/src/context/connections/dialects.ts +++ b/packages/cli/src/context/connections/dialects.ts @@ -1,20 +1,38 @@ import { KtxBigQueryDialect } from '../../connectors/bigquery/dialect.js'; import { KtxClickHouseDialect } from '../../connectors/clickhouse/dialect.js'; +import { KtxMongoDbDialect } from '../../connectors/mongodb/dialect.js'; import { KtxMysqlDialect } from '../../connectors/mysql/dialect.js'; import { KtxPostgresDialect } from '../../connectors/postgres/dialect.js'; import { KtxSqliteDialect } from '../../connectors/sqlite/dialect.js'; import { KtxSnowflakeDialect } from '../../connectors/snowflake/dialect.js'; import { KtxSqlServerDialect } from '../../connectors/sqlserver/dialect.js'; +import { KtxExpectedError } from '../../errors.js'; import type { KtxConnectionDriver, KtxSchemaDimensionType, KtxTableRef } from '../scan/types.js'; import type { KtxDialectTableRef } from './dialect-helpers.js'; +/** + * Driver-agnostic dialect surface every connection implements, including + * non-SQL sources like MongoDB: display/ref formatting and type mapping. The + * catalog and entity-details paths resolve this for any snapshot driver, so it + * must stay free of SQL generation. + */ export interface KtxDialect { readonly type: KtxConnectionDriver; - quoteIdentifier(identifier: string): string; - formatTableName(table: KtxDialectTableRef): string; formatDisplayRef(table: KtxDialectTableRef): string; parseDisplayRef(display: string): KtxTableRef | null; columnDisplayTablePartCount(): 1 | 2 | 3; + mapToDimensionType(nativeType: string): KtxSchemaDimensionType; + mapDataType(nativeType: string): string; +} + +/** + * SQL query generation, implemented only by SQL warehouse drivers. The relationship + * profiling/validation pipeline is the sole caller and is gated on the + * `readOnlySql` capability, so these methods are unreachable for a non-SQL source. + */ +export interface KtxSqlDialect extends KtxDialect { + quoteIdentifier(identifier: string): string; + formatTableName(table: KtxDialectTableRef): string; getLimitOffsetClause(limit: number, offset?: number): string; getTopClause(limit: number): string; getRandomSampleFilter(samplePct: number): string; @@ -30,21 +48,11 @@ export interface KtxDialect { getDistinctCountExpression(column: string): string; textLengthExpression(columnSql: string): string; castToText(columnSql: string): string; - mapToDimensionType(nativeType: string): KtxSchemaDimensionType; - mapDataType(nativeType: string): string; } -const supportedDrivers: KtxConnectionDriver[] = [ - 'bigquery', - 'clickhouse', - 'mysql', - 'postgres', - 'sqlite', - 'snowflake', - 'sqlserver', -]; +type KtxSqlDriver = Exclude; -const dialectFactories: Record KtxDialect> = { +const sqlDialectFactories: Record KtxSqlDialect> = { bigquery: () => new KtxBigQueryDialect(), clickhouse: () => new KtxClickHouseDialect(), mysql: () => new KtxMysqlDialect(), @@ -54,11 +62,54 @@ const dialectFactories: Record KtxDialect> = { sqlserver: () => new KtxSqlServerDialect(), }; +const dialectFactories: Record KtxDialect> = { + ...sqlDialectFactories, + mongodb: () => new KtxMongoDbDialect(), +}; + +const supportedSqlDrivers = Object.keys(sqlDialectFactories).sort(); + export function getDialectForDriver(driver: string): KtxDialect { const normalized = driver.toLowerCase().trim(); const factory = dialectFactories[normalized as KtxConnectionDriver]; if (factory) { return factory(); } - throw new Error(`Unsupported warehouse driver "${driver}". Supported drivers: ${supportedDrivers.join(', ')}`); + throw new Error( + `Unsupported driver "${driver}". Supported drivers: ${Object.keys(dialectFactories).sort().join(', ')}`, + ); +} + +export function getSqlDialectForDriver(driver: string): KtxSqlDialect { + const normalized = driver.toLowerCase().trim(); + const factory = sqlDialectFactories[normalized as KtxSqlDriver]; + if (factory) { + return factory(); + } + throw new Error(`Driver "${driver}" has no SQL dialect. SQL drivers: ${supportedSqlDrivers.join(', ')}`); +} + +/** + * Whether a driver can generate and execute SQL. Single source of truth for the + * SQL/non-SQL boundary: a driver is SQL-queryable iff it has a SQL dialect, so + * non-SQL sources (e.g. mongodb) are excluded without a hand-maintained list. + */ +export function isSqlQueryableDriver(driver: string | undefined): boolean { + const normalized = (driver ?? '').toLowerCase().trim(); + return Object.prototype.hasOwnProperty.call(sqlDialectFactories, normalized); +} + +/** + * Refuse a non-SQL connection (e.g. mongodb) at a read-only-SQL entry point before + * any dialect selection or parser/daemon work, so it is never validated as Postgres. + * The federated `duckdb` connection has no driver — callers skip this guard for it. + */ +export function assertSqlQueryableConnection(connectionId: string, driver: string | undefined): void { + if (!isSqlQueryableDriver(driver)) { + throw new KtxExpectedError( + `Connection '${connectionId}' uses the non-SQL driver '${driver ?? 'unknown'}'. ` + + 'Read-only SQL (ktx sql, the sql_execution tool) requires a SQL warehouse connection; ' + + 'MongoDB and other context-only sources are searchable and ingestable, not SQL-queryable.', + ); + } } diff --git a/packages/cli/src/context/connections/drivers.ts b/packages/cli/src/context/connections/drivers.ts index 3fbeb058..76ac408e 100644 --- a/packages/cli/src/context/connections/drivers.ts +++ b/packages/cli/src/context/connections/drivers.ts @@ -68,6 +68,27 @@ export const driverRegistrations: Record { + const m = await import('../../connectors/mongodb/connector.js'); + return { + isConnectionConfig: (connection) => { + const typedConnection = connection as Parameters[0]; + return m.isKtxMongoDbConnectionConfig(typedConnection); + }, + createScanConnector: ({ connectionId, connection }) => { + const typedConnection = connection as Parameters[0]; + if (!m.isKtxMongoDbConnectionConfig(typedConnection)) { + throw invalidConnectionConfig('mongodb'); + } + return new m.KtxMongoDbScanConnector({ connectionId, connection: typedConnection }); + }, + }; + }, + }, mysql: { driver: 'mysql', scopeConfigKey: 'schemas', diff --git a/packages/cli/src/context/mcp/local-project-ports.ts b/packages/cli/src/context/mcp/local-project-ports.ts index bf1af94a..6348bfa4 100644 --- a/packages/cli/src/context/mcp/local-project-ports.ts +++ b/packages/cli/src/context/mcp/local-project-ports.ts @@ -2,6 +2,7 @@ import type { KtxSqlQueryExecutorPort } from '../../context/connections/query-ex import { KtxExpectedError, KtxQueryError, isNativeProgrammingFault } from '../../errors.js'; import { executeProjectReadOnlySql } from '../../context/connections/project-sql-executor.js'; import { FEDERATED_CONNECTION_ID, federatedConnectionListing } from '../../context/connections/federation.js'; +import { assertSqlQueryableConnection } from '../../context/connections/dialects.js'; import { resolveConfiguredConnection } from '../../context/connections/resolve-connection.js'; import { type LocalConnectionInfo, @@ -48,6 +49,9 @@ async function executeValidatedReadOnlySql( const isFederated = input.connectionId === FEDERATED_CONNECTION_ID; const connectionId = isFederated ? input.connectionId : assertSafeConnectionId(input.connectionId); const connection = isFederated ? undefined : resolveConfiguredConnection(project.config, connectionId); + if (!isFederated) { + assertSqlQueryableConnection(connectionId, connection!.driver); + } const dialect = sqlAnalysisDialectForDriver(isFederated ? 'duckdb' : connection!.driver); const validation = await options.sqlAnalysis.validateReadOnly(input.sql, dialect); diff --git a/packages/cli/src/context/project/driver-schemas.ts b/packages/cli/src/context/project/driver-schemas.ts index 1b2434c6..52c2bf3a 100644 --- a/packages/cli/src/context/project/driver-schemas.ts +++ b/packages/cli/src/context/project/driver-schemas.ts @@ -48,6 +48,41 @@ const warehouseConnectionSchemas = [ warehouseConnectionSchema('sqlserver'), ] as const; +const mongodbConnectionSchema = z + .looseObject({ + driver: z.literal('mongodb'), + url: z + .string() + .min(1) + .describe( + 'MongoDB connection string (mongodb:// or mongodb+srv://, including TLS/Atlas); may contain a reference like env:MONGO_URL.', + ), + database: z.string().min(1).optional().describe('Single database to introspect when not using databases or a URL path.'), + databases: z + .array(z.string().min(1)) + .optional() + .describe('Databases whose collections ktx introspects as tables. Falls back to the URL path database.'), + enabled_tables: z + .array(z.string().min(1)) + .optional() + .describe('Optional allowlist of "database.collection" names to introspect.'), + sample_size: z + .number() + .int() + .min(1) + .optional() + .describe('How many recent documents to sample per collection when inferring the schema (default 1000).'), + order_by: z + .string() + .min(1) + .optional() + .describe( + 'Field to sort by descending when sampling. Defaults to _id; set this when _id is not an ObjectId. ' + + 'Should be indexed — an unindexed sort hits MongoDB\'s in-memory sort limit on large collections.', + ), + }) + .describe('MongoDB primary-source connection. Schema is inferred by sampling the most recent documents.'); + const positiveIntKeyMessage = (field: string) => `${field} keys must be positive-integer strings (e.g. "1", "42")`; const positiveIntKeyRegex = /^[1-9]\d*$/; @@ -210,6 +245,7 @@ const metricflowConnectionSchema = z export const connectionConfigSchema = z.discriminatedUnion('driver', [ ...warehouseConnectionSchemas, + mongodbConnectionSchema, metabaseConnectionSchema, lookerConnectionSchema, lookmlConnectionSchema, diff --git a/packages/cli/src/context/scan/local-enrichment.ts b/packages/cli/src/context/scan/local-enrichment.ts index 833cb5b1..6addba4a 100644 --- a/packages/cli/src/context/scan/local-enrichment.ts +++ b/packages/cli/src/context/scan/local-enrichment.ts @@ -1,6 +1,6 @@ import pLimit from 'p-limit'; import type { KtxLlmRuntimePort } from '../../context/llm/runtime-port.js'; -import { getDialectForDriver } from '../connections/dialects.js'; +import { getSqlDialectForDriver } from '../connections/dialects.js'; import { buildDefaultKtxProjectConfig, type KtxScanRelationshipConfig } from '../project/config.js'; import { KtxDescriptionGenerator } from './description-generation.js'; import { buildKtxColumnEmbeddingText } from './embedding-text.js'; @@ -486,7 +486,9 @@ export async function runLocalScanEnrichment( snapshot, connectionId: input.connectionId, }); - const dialect = getDialectForDriver(snapshot.driver); + const dialect = input.connector.capabilities.readOnlySql + ? getSqlDialectForDriver(snapshot.driver) + : null; const now = input.now ?? (() => new Date()); const state = completedKtxScanEnrichmentStateSummary(); const syncId = input.syncId ?? input.context.runId; diff --git a/packages/cli/src/context/scan/local-scan.ts b/packages/cli/src/context/scan/local-scan.ts index 8529b7b7..cc4c47f6 100644 --- a/packages/cli/src/context/scan/local-scan.ts +++ b/packages/cli/src/context/scan/local-scan.ts @@ -131,12 +131,13 @@ function normalizeDriver(driver: string | undefined): KtxConnectionDriver { normalized === 'clickhouse' || normalized === 'sqlserver' || normalized === 'bigquery' || - normalized === 'snowflake' + normalized === 'snowflake' || + normalized === 'mongodb' ) { return normalized; } throw new Error( - `Standalone ktx scan supports postgres/sqlite/mysql/clickhouse/sqlserver/bigquery/snowflake in this phase, received "${driver ?? 'unknown'}"`, + `Standalone ktx scan supports postgres/sqlite/mysql/clickhouse/sqlserver/bigquery/snowflake/mongodb in this phase, received "${driver ?? 'unknown'}"`, ); } diff --git a/packages/cli/src/context/scan/relationship-benchmarks.ts b/packages/cli/src/context/scan/relationship-benchmarks.ts index a07221a3..f4ce8cea 100644 --- a/packages/cli/src/context/scan/relationship-benchmarks.ts +++ b/packages/cli/src/context/scan/relationship-benchmarks.ts @@ -6,7 +6,7 @@ import { gunzipSync } from 'node:zlib'; import Database from 'better-sqlite3'; import YAML from 'yaml'; import { z } from 'zod'; -import { getDialectForDriver } from '../connections/dialects.js'; +import { getSqlDialectForDriver } from '../connections/dialects.js'; import type { KtxLlmRuntimePort } from '../llm/runtime-port.js'; import type { KtxEnrichedRelationship, KtxEnrichedSchema, KtxRelationshipType } from './enrichment-types.js'; import { snapshotToKtxEnrichedSchema } from './local-enrichment.js'; @@ -537,7 +537,7 @@ export function ktxRelationshipBenchmarkDetectorWithLlm( const formalLinks = formalMetadata.accepted.map((relationship) => relationshipToBenchmarkLink(relationship)); const acceptedKeys = new Set(formalLinks.map(fkKey)); const sqliteDataAvailable = Boolean(input.dataPath && input.snapshot.driver === 'sqlite'); - const dialect = getDialectForDriver(input.snapshot.driver); + const dialect = getSqlDialectForDriver(input.snapshot.driver); const profilingExecutor = sqliteDataAvailable && input.mode !== 'profiling_disabled' ? new KtxRelationshipBenchmarkSqliteExecutor(input.dataPath as string) @@ -552,6 +552,7 @@ export function ktxRelationshipBenchmarkDetectorWithLlm( }) : await profileKtxRelationshipSchema({ connectionId: input.snapshot.connectionId, + driver: input.snapshot.driver, dialect, schema: input.schema, executor: profilingExecutor, @@ -673,7 +674,7 @@ export function currentKtxRelationshipBenchmarkDetector(): KtxRelationshipBenchm const formalLinks = formalMetadata.accepted.map((relationship) => relationshipToBenchmarkLink(relationship)); const acceptedKeys = new Set(formalLinks.map(fkKey)); const sqliteDataAvailable = Boolean(input.dataPath && input.snapshot.driver === 'sqlite'); - const dialect = getDialectForDriver(input.snapshot.driver); + const dialect = getSqlDialectForDriver(input.snapshot.driver); const profilingExecutor = sqliteDataAvailable && input.mode !== 'profiling_disabled' ? new KtxRelationshipBenchmarkSqliteExecutor(input.dataPath as string) @@ -688,6 +689,7 @@ export function currentKtxRelationshipBenchmarkDetector(): KtxRelationshipBenchm }) : await profileKtxRelationshipSchema({ connectionId: input.snapshot.connectionId, + driver: input.snapshot.driver, dialect, schema: input.schema, executor: profilingExecutor, diff --git a/packages/cli/src/context/scan/relationship-composite-candidates.ts b/packages/cli/src/context/scan/relationship-composite-candidates.ts index 28263a15..047e08cb 100644 --- a/packages/cli/src/context/scan/relationship-composite-candidates.ts +++ b/packages/cli/src/context/scan/relationship-composite-candidates.ts @@ -1,4 +1,4 @@ -import type { KtxDialect } from '../connections/dialects.js'; +import type { KtxSqlDialect } from '../connections/dialects.js'; import type { KtxEnrichedColumn, KtxEnrichedSchema, KtxEnrichedTable, KtxRelationshipType } from './enrichment-types.js'; import { type KtxRelationshipProfileArtifact, @@ -56,7 +56,7 @@ export interface KtxCompositeRelationshipCandidate { export interface DiscoverKtxCompositeRelationshipsInput { connectionId: string; - dialect: KtxDialect; + dialect: KtxSqlDialect; schema: KtxEnrichedSchema; profiles: KtxRelationshipProfileArtifact; executor: KtxRelationshipReadOnlyExecutor | null; @@ -227,11 +227,11 @@ function sqlSuffix(fragment: string): string { return fragment ? ` ${fragment}` : ''; } -function aliasedTupleSelect(dialect: KtxDialect, columns: readonly string[]): string { +function aliasedTupleSelect(dialect: KtxSqlDialect, columns: readonly string[]): string { return columns.map((column, index) => `${dialect.quoteIdentifier(column)} AS c${index}`).join(', '); } -function nonNullPredicate(dialect: KtxDialect, columns: readonly string[]): string { +function nonNullPredicate(dialect: KtxSqlDialect, columns: readonly string[]): string { return columns.map((column) => `${dialect.quoteIdentifier(column)} IS NOT NULL`).join(' AND '); } @@ -242,7 +242,7 @@ function tupleEquality(columns: number): string { } function buildTupleDistinctSql(input: { - dialect: KtxDialect; + dialect: KtxSqlDialect; table: KtxTableRef; columns: readonly string[]; }): string { @@ -257,7 +257,7 @@ function buildTupleDistinctSql(input: { } function buildCompositeCoverageSql(input: { - dialect: KtxDialect; + dialect: KtxSqlDialect; childTable: KtxTableRef; childColumns: readonly string[]; parentTable: KtxTableRef; @@ -322,7 +322,7 @@ function hasAcceptedSubset( async function detectCompositePrimaryKeys(input: { connectionId: string; - dialect: KtxDialect; + dialect: KtxSqlDialect; table: KtxEnrichedTable; profiles: KtxRelationshipProfileArtifact; executor: KtxRelationshipReadOnlyExecutor; @@ -426,7 +426,7 @@ function compatibleTuple(sourceColumns: readonly KtxEnrichedColumn[], targetColu async function validateCompositeRelationship(input: { connectionId: string; - dialect: KtxDialect; + dialect: KtxSqlDialect; sourceTable: KtxEnrichedTable; sourceColumns: readonly KtxEnrichedColumn[]; targetKey: KtxCompositePrimaryKeyCandidate; diff --git a/packages/cli/src/context/scan/relationship-discovery.ts b/packages/cli/src/context/scan/relationship-discovery.ts index c2917aa8..2052d5b7 100644 --- a/packages/cli/src/context/scan/relationship-discovery.ts +++ b/packages/cli/src/context/scan/relationship-discovery.ts @@ -1,5 +1,5 @@ import type { KtxLlmRuntimePort } from '../../context/llm/runtime-port.js'; -import type { KtxDialect } from '../connections/dialects.js'; +import type { KtxSqlDialect } from '../connections/dialects.js'; import type { KtxScanRelationshipConfig } from '../project/config.js'; import type { KtxEnrichedRelationship, KtxEnrichedSchema, KtxRelationshipUpdate } from './enrichment-types.js'; import { @@ -34,7 +34,7 @@ import type { export interface DiscoverKtxRelationshipsInput { connectionId: string; - dialect: KtxDialect; + dialect: KtxSqlDialect | null; connector: KtxScanConnector; schema: KtxEnrichedSchema; context: KtxScanContext; @@ -122,20 +122,21 @@ function compositeSummary(relationships: readonly KtxCompositeRelationshipCandid async function detectCompositeRelationships(input: { connectionId: string; - dialect: KtxDialect; + dialect: KtxSqlDialect | null; schema: KtxEnrichedSchema; profile: KtxRelationshipProfileArtifact; executor: KtxRelationshipReadOnlyExecutor | null; context: DiscoverKtxRelationshipsInput['context']; warnings: KtxScanWarning[]; }): Promise { - if (!input.executor || !input.profile.sqlAvailable) { + if (!input.executor || !input.profile.sqlAvailable || !input.dialect) { return []; } + const dialect = input.dialect; try { const compositeDetection = await discoverKtxCompositeRelationships({ connectionId: input.connectionId, - dialect: input.dialect, + dialect, schema: input.schema, profiles: input.profile, executor: input.executor, @@ -223,6 +224,7 @@ export async function discoverKtxRelationships( const profileCache = createKtxRelationshipProfileCache(); const profile = await profileKtxRelationshipSchema({ connectionId: input.connectionId, + driver: input.connector.driver, dialect: input.dialect, schema: input.schema, executor, diff --git a/packages/cli/src/context/scan/relationship-profiling.ts b/packages/cli/src/context/scan/relationship-profiling.ts index 0f22c21c..d547e350 100644 --- a/packages/cli/src/context/scan/relationship-profiling.ts +++ b/packages/cli/src/context/scan/relationship-profiling.ts @@ -1,4 +1,4 @@ -import type { KtxDialect } from '../connections/dialects.js'; +import type { KtxSqlDialect } from '../connections/dialects.js'; import type { KtxEnrichedColumn, KtxEnrichedSchema, KtxEnrichedTable } from './enrichment-types.js'; import { mapWithConcurrency } from './relationship-validation.js'; import type { @@ -56,7 +56,8 @@ export interface KtxRelationshipProfileCache { export interface ProfileKtxRelationshipSchemaInput { connectionId: string; - dialect: KtxDialect; + driver: KtxConnectionDriver; + dialect: KtxSqlDialect | null; schema: KtxEnrichedSchema; executor: KtxRelationshipReadOnlyExecutor | null; ctx: KtxScanContext; @@ -123,7 +124,7 @@ function columnKey(table: KtxEnrichedTable, column: KtxEnrichedColumn): string { function tableProfileCacheKey(input: { connectionId: string; - dialect: KtxDialect; + dialect: KtxSqlDialect; ctx: KtxScanContext; table: KtxTableRef; sampleValuesPerColumn: number; @@ -149,7 +150,7 @@ function sqlSuffix(fragment: string): string { return fragment ? ` ${fragment}` : ''; } -function sampledTableSql(dialect: KtxDialect, tableSql: string, limit: number): string { +function sampledTableSql(dialect: KtxSqlDialect, tableSql: string, limit: number): string { const top = dialect.getTopClause(limit); if (top) { return `(SELECT ${top} * FROM ${tableSql}) AS relationship_profile_sample`; @@ -158,7 +159,7 @@ function sampledTableSql(dialect: KtxDialect, tableSql: string, limit: number): } function sampleValuesSql(input: { - dialect: KtxDialect; + dialect: KtxSqlDialect; tableSql: string; columnSql: string; limit: number; @@ -175,7 +176,7 @@ function sampleValuesSql(input: { } function columnProfileSelectSql(input: { - dialect: KtxDialect; + dialect: KtxSqlDialect; tableSql: string; profileTableSql: string; column: KtxEnrichedColumn; @@ -218,7 +219,7 @@ function splitSampleValues(value: unknown): string[] { async function queryCount(input: { connectionId: string; - dialect: KtxDialect; + dialect: KtxSqlDialect; table: KtxTableRef; executor: KtxRelationshipReadOnlyExecutor; ctx: KtxScanContext; @@ -233,7 +234,7 @@ async function queryCount(input: { async function queryTableProfile(input: { connectionId: string; - dialect: KtxDialect; + dialect: KtxSqlDialect; table: KtxEnrichedTable; executor: KtxRelationshipReadOnlyExecutor; ctx: KtxScanContext; @@ -320,10 +321,10 @@ type TableProfileResult = export async function profileKtxRelationshipSchema( input: ProfileKtxRelationshipSchemaInput, ): Promise { - if (!input.executor) { + if (!input.executor || !input.dialect) { return { connectionId: input.connectionId, - driver: input.dialect.type, + driver: input.driver, sqlAvailable: false, queryCount: 0, tables: [], @@ -337,6 +338,7 @@ export async function profileKtxRelationshipSchema( const columns: Record = {}; const warnings: string[] = []; const executor = input.executor; + const dialect = input.dialect; const enabledTables = input.schema.tables.filter((candidate) => candidate.enabled); const tableResults = await mapWithConcurrency( @@ -347,7 +349,7 @@ export async function profileKtxRelationshipSchema( const profileSampleRows = input.profileSampleRows ?? 10000; const cacheKey = tableProfileCacheKey({ connectionId: input.connectionId, - dialect: input.dialect, + dialect, ctx: input.ctx, table: table.ref, sampleValuesPerColumn, @@ -361,7 +363,7 @@ export async function profileKtxRelationshipSchema( try { const tableProfile = await queryTableProfile({ connectionId: input.connectionId, - dialect: input.dialect, + dialect, table, executor, ctx: input.ctx, @@ -403,7 +405,7 @@ export async function profileKtxRelationshipSchema( return { connectionId: input.connectionId, - driver: input.dialect.type, + driver: input.driver, sqlAvailable: true, queryCount: queryTotal, tables, diff --git a/packages/cli/src/context/scan/relationship-validation.ts b/packages/cli/src/context/scan/relationship-validation.ts index 5fc0f3fb..ac985eec 100644 --- a/packages/cli/src/context/scan/relationship-validation.ts +++ b/packages/cli/src/context/scan/relationship-validation.ts @@ -1,4 +1,4 @@ -import type { KtxDialect } from '../connections/dialects.js'; +import type { KtxSqlDialect } from '../connections/dialects.js'; import type { KtxRelationshipEndpoint } from './enrichment-types.js'; import { applyKtxRelationshipValidationBudget, type KtxRelationshipValidationBudget } from './relationship-budget.js'; import type { KtxRelationshipDiscoveryCandidate } from './relationship-candidates.js'; @@ -44,7 +44,7 @@ export interface KtxValidatedRelationshipDiscoveryCandidate export interface ValidateKtxRelationshipDiscoveryCandidatesInput { connectionId: string; - dialect: KtxDialect; + dialect: KtxSqlDialect | null; candidates: readonly KtxRelationshipDiscoveryCandidate[]; profiles: KtxRelationshipProfileArtifact; executor: KtxRelationshipReadOnlyExecutor | null; @@ -108,7 +108,7 @@ function sqlSuffix(fragment: string): string { } function buildCoverageSql(input: { - dialect: KtxDialect; + dialect: KtxSqlDialect; childTable: KtxTableRef; childColumn: string; parentTable: KtxTableRef; @@ -237,13 +237,14 @@ export async function validateKtxRelationshipDiscoveryCandidates( input: ValidateKtxRelationshipDiscoveryCandidatesInput, ): Promise { const settings = mergeSettings(input.settings); - if (!input.executor || !input.profiles.sqlAvailable) { + if (!input.executor || !input.profiles.sqlAvailable || !input.dialect) { return input.candidates.map((candidate) => reviewWithoutValidation(candidate, input.profiles, 'validation_unavailable'), ); } const executor = input.executor; + const dialect = input.dialect; async function validateCandidate( candidate: KtxRelationshipDiscoveryCandidate, @@ -260,7 +261,7 @@ export async function validateKtxRelationshipDiscoveryCandidates( { connectionId: input.connectionId, sql: buildCoverageSql({ - dialect: input.dialect, + dialect, childTable: candidate.from.table, childColumn: sourceColumn, parentTable: candidate.to.table, diff --git a/packages/cli/src/context/scan/types.ts b/packages/cli/src/context/scan/types.ts index fc445b5e..148203ef 100644 --- a/packages/cli/src/context/scan/types.ts +++ b/packages/cli/src/context/scan/types.ts @@ -7,7 +7,8 @@ export type KtxConnectionDriver = | 'bigquery' | 'snowflake' | 'mysql' - | 'clickhouse'; + | 'clickhouse' + | 'mongodb'; export type KtxScanMode = 'structural' | 'relationships' | 'enriched'; diff --git a/packages/cli/src/context/sl/local-query.ts b/packages/cli/src/context/sl/local-query.ts index 384b4762..fb8b9439 100644 --- a/packages/cli/src/context/sl/local-query.ts +++ b/packages/cli/src/context/sl/local-query.ts @@ -2,6 +2,7 @@ import type { KtxSqlQueryExecutorPort } from '../../context/connections/query-ex import type { KtxSemanticLayerComputePort } from '../../context/daemon/semantic-layer-compute.js'; import type { KtxMcpProgressCallback } from '../mcp/types.js'; import type { KtxLocalProject } from '../../context/project/project.js'; +import { isSqlQueryableDriver } from '../connections/dialects.js'; import { FEDERATED_CONNECTION_ID } from '../connections/federation.js'; import { resolveRequiredConnectionId } from '../connections/resolve-connection.js'; import { sqlAnalysisDialectForDriver } from '../sql-analysis/dialect.js'; @@ -83,6 +84,12 @@ export async function compileLocalSlQuery( await options.onProgress?.({ progress: 0, message: 'Compiling query' }); const connectionId = resolveLocalConnectionId(project, options.connectionId); const driver = project.config.connections[connectionId]?.driver; + if (!isSqlQueryableDriver(driver)) { + throw new Error( + `Semantic-layer queries require a SQL warehouse connection; '${connectionId}' uses the non-SQL driver ` + + `'${driver ?? 'unknown'}'. MongoDB and other context-only sources are searchable and ingestable, not SL-queryable.`, + ); + } const dialect = sqlAnalysisDialectForDriver(driver); const sources = await loadComputableSources(project, connectionId); diff --git a/packages/cli/src/local-adapters.ts b/packages/cli/src/local-adapters.ts index 671bce67..d923a68b 100644 --- a/packages/cli/src/local-adapters.ts +++ b/packages/cli/src/local-adapters.ts @@ -122,6 +122,17 @@ function createKtxCliLiveDatabaseIntrospection( return { async extractSchema(connectionId: string, options?: LiveDatabaseIntrospectionOptions) { const connection = project.config.connections[connectionId]; + if (String(connection?.driver ?? '').toLowerCase() === 'mongodb') { + const { createMongoDbLiveDatabaseIntrospection } = await import('./connectors/mongodb/live-database-introspection.js'); + const { isKtxMongoDbConnectionConfig } = await import('./connectors/mongodb/connector.js'); + if (!isKtxMongoDbConnectionConfig(connection)) { + return daemon.extractSchema(connectionId, options); + } + const mongodb = createMongoDbLiveDatabaseIntrospection({ + connections: project.config.connections, + }); + return mongodb.extractSchema(connectionId, options); + } if (isKtxPostgresConnectionConfig(connection)) { return postgres.extractSchema(connectionId, options); } diff --git a/packages/cli/src/setup-databases.ts b/packages/cli/src/setup-databases.ts index 9f9f828d..548c03a2 100644 --- a/packages/cli/src/setup-databases.ts +++ b/packages/cli/src/setup-databases.ts @@ -69,7 +69,8 @@ export type KtxSetupDatabaseDriver = | 'clickhouse' | 'sqlserver' | 'bigquery' - | 'snowflake'; + | 'snowflake' + | 'mongodb'; export interface KtxSetupDatabasesArgs { projectDir: string; @@ -156,6 +157,7 @@ const DRIVER_OPTIONS: Array<{ value: KtxSetupDatabaseDriver; label: string }> = { value: 'mysql', label: 'MySQL' }, { value: 'clickhouse', label: 'ClickHouse' }, { value: 'sqlserver', label: 'SQL Server' }, + { value: 'mongodb', label: 'MongoDB' }, { value: 'sqlite', label: 'SQLite' }, ]; @@ -178,6 +180,7 @@ const DEFAULT_CONNECTION_IDS: Record = { sqlserver: 'sqlserver-warehouse', bigquery: 'bigquery-warehouse', snowflake: 'snowflake-warehouse', + mongodb: 'mongodb-source', }; interface ScopeDiscoverySpec { @@ -231,6 +234,13 @@ const SCOPE_DISCOVERY_SPECS: Partial { + if (input.args.inputMode === 'disabled' && !input.args.databaseUrl) return null; + + const rawUrl = + input.args.databaseUrl ?? + (await promptText( + input.prompts, + 'MongoDB connection URL\nFor example mongodb+srv://user:pass@cluster.example.net/app.', // pragma: allowlist secret + stringConfigField(input.existingConnection, 'url'), + )); + if (rawUrl === undefined) return 'back'; + if (!rawUrl) return null; + + const url = normalizeInputReference(rawUrl); + const scope = scriptedScopeConfigForDriver('mongodb', input.args.databaseSchemas); + + if (url.startsWith('env:') || url.startsWith('file:')) { + return { driver: 'mongodb', url, ...scope }; + } + if (urlHasCredentials(url)) { + const ref = await writeProjectLocalSecretReference({ + projectDir: input.args.projectDir, + fileName: `${input.connectionId}-url`, + value: url, + }); + return { driver: 'mongodb', url: ref, ...scope }; + } + return { driver: 'mongodb', url, ...scope }; +} + async function buildConnectionConfig(input: { driver: KtxSetupDatabaseDriver; connectionId: string; @@ -777,6 +822,14 @@ async function buildConnectionConfig(input: { existingConnection: input.existingConnection, }); } + if (driver === 'mongodb') { + return await buildMongoConnectionConfig({ + connectionId: input.connectionId, + args, + prompts, + existingConnection: input.existingConnection, + }); + } if (driver === 'bigquery') { const credentialsPath = await promptText( prompts, diff --git a/packages/cli/src/sql.ts b/packages/cli/src/sql.ts index 8a5431eb..78cfb796 100644 --- a/packages/cli/src/sql.ts +++ b/packages/cli/src/sql.ts @@ -2,6 +2,7 @@ import { executeFederatedQuery } from './connectors/duckdb/federated-executor.js import { FEDERATED_CONNECTION_ID } from './context/connections/federation.js'; import { executeProjectReadOnlySql } from './context/connections/project-sql-executor.js'; import type { KtxSqlQueryExecutionResult } from './context/connections/query-executor.js'; +import { assertSqlQueryableConnection } from './context/connections/dialects.js'; import { resolveConfiguredConnection } from './context/connections/resolve-connection.js'; import { loadKtxProject, type KtxLocalProject } from './context/project/project.js'; import { sqlAnalysisDialectForDriver } from './context/sql-analysis/dialect.js'; @@ -136,6 +137,9 @@ export async function runKtxSql(args: KtxSqlArgs, io: KtxCliIo = process, deps: const connection = isFederated ? undefined : resolveConfiguredConnection(project.config, args.connectionId); driver = isFederated ? 'duckdb' : String(connection?.driver ?? 'unknown').toLowerCase(); demoConnection = isFederated ? false : isDemoConnection(args.connectionId, connection); + if (!isFederated) { + assertSqlQueryableConnection(args.connectionId, connection?.driver); + } const createSqlAnalysis = deps.createSqlAnalysis ?? diff --git a/packages/cli/test/connectors/mongodb/connector.test.ts b/packages/cli/test/connectors/mongodb/connector.test.ts new file mode 100644 index 00000000..43d60013 --- /dev/null +++ b/packages/cli/test/connectors/mongodb/connector.test.ts @@ -0,0 +1,241 @@ +import { describe, expect, it, vi } from 'vitest'; +import { + isKtxMongoDbConnectionConfig, + KtxMongoDbScanConnector, + type KtxMongoClient, + type KtxMongoClientFactory, + type KtxMongoDbConnectionConfig, + type KtxMongoListedCollection, +} from '../../../src/connectors/mongodb/connector.js'; +import { createMongoDbLiveDatabaseIntrospection } from '../../../src/connectors/mongodb/live-database-introspection.js'; +import { executeProjectReadOnlySql } from '../../../src/context/connections/project-sql-executor.js'; +import { tableRefSet } from '../../../src/context/scan/table-ref.js'; +import type { KtxLocalProject } from '../../../src/context/project/project.js'; +import type { KtxMongoDocument } from '../../../src/connectors/mongodb/schema-inference.js'; + +function objectId(hex: string): unknown { + return { _bsontype: 'ObjectId', toString: () => hex }; +} + +const COLLECTIONS: Record = { + app: [ + { name: 'users' }, + { name: 'orders' }, + { name: 'system.views' }, + ], +}; + +const DOCUMENTS: Record = { + users: [ + { _id: objectId('a1'), email: 'a@x.com', age: 31, address: { city: 'NY' } }, + { _id: objectId('a2'), email: 'b@x.com' }, + ], + orders: [{ _id: objectId('b1'), total: 9.99 }], +}; + +function fakeClientFactory(): { factory: KtxMongoClientFactory; client: KtxMongoClient } { + const client: KtxMongoClient = { + listCollections: vi.fn(async (databaseName: string) => COLLECTIONS[databaseName] ?? []), + estimatedDocumentCount: vi.fn(async (_databaseName: string, collectionName: string) => + collectionName === 'users' ? 2 : 1, + ), + find: vi.fn(async (_databaseName: string, collectionName: string) => DOCUMENTS[collectionName] ?? []), + ping: vi.fn(async () => undefined), + close: vi.fn(async () => undefined), + }; + return { factory: { create: vi.fn(() => client) }, client }; +} + +function connector(connection: KtxMongoDbConnectionConfig, factory: KtxMongoClientFactory): KtxMongoDbScanConnector { + return new KtxMongoDbScanConnector({ connectionId: 'mongo-prod', connection, clientFactory: factory }); +} + +const baseConnection: KtxMongoDbConnectionConfig = { + driver: 'mongodb', + url: 'mongodb://localhost:27017/app', + databases: ['app'], +}; + +describe('isKtxMongoDbConnectionConfig', () => { + it('matches only the mongodb driver', () => { + expect(isKtxMongoDbConnectionConfig({ driver: 'mongodb' })).toBe(true); + expect(isKtxMongoDbConnectionConfig({ driver: 'postgres' })).toBe(false); + expect(isKtxMongoDbConnectionConfig(undefined)).toBe(false); + }); +}); + +describe('KtxMongoDbScanConnector capabilities', () => { + it('advertises a non-SQL, sampling-capable source and exposes no SQL execution', () => { + const { factory } = fakeClientFactory(); + const c = connector(baseConnection, factory); + expect(c.driver).toBe('mongodb'); + expect(c.capabilities.readOnlySql).toBe(false); + expect(c.capabilities.formalForeignKeys).toBe(false); + expect(c.capabilities.tableSampling).toBe(true); + expect(c.capabilities.columnSampling).toBe(true); + expect(c.capabilities.nestedAnalysis).toBe(true); + expect('executeReadOnly' in c).toBe(false); + }); +}); + +describe('KtxMongoDbScanConnector construction', () => { + it('requires a url', () => { + const { factory } = fakeClientFactory(); + expect(() => connector({ driver: 'mongodb', databases: ['app'] }, factory)).toThrow(/requires connections\.mongo-prod\.url/); + }); + + it('resolves env: url references and derives the database from the url path', () => { + const { factory, client } = fakeClientFactory(); + const c = new KtxMongoDbScanConnector({ + connectionId: 'mongo-prod', + connection: { driver: 'mongodb', url: 'env:MONGO_URL' }, + clientFactory: factory, + env: { MONGO_URL: 'mongodb://localhost:27017/app' }, + }); + return c.introspect({ connectionId: 'mongo-prod', driver: 'mongodb' }, { runId: 't' }).then((snapshot) => { + expect(snapshot.scope.schemas).toEqual(['app']); + expect(client.listCollections).toHaveBeenCalledWith('app'); + }); + }); + + it('refuses a non-mongodb driver', () => { + const { factory } = fakeClientFactory(); + expect(() => connector({ driver: 'postgres' } as KtxMongoDbConnectionConfig, factory)).toThrow(/cannot run driver "postgres"/); + }); +}); + +describe('KtxMongoDbScanConnector.introspect', () => { + it('maps collections to tables, infers columns, and excludes system collections', async () => { + const { factory } = fakeClientFactory(); + const snapshot = await connector(baseConnection, factory).introspect( + { connectionId: 'mongo-prod', driver: 'mongodb' }, + { runId: 't' }, + ); + + expect(snapshot.driver).toBe('mongodb'); + expect(snapshot.tables.map((table) => table.name).sort()).toEqual(['orders', 'users']); + + const users = snapshot.tables.find((table) => table.name === 'users')!; + expect(users.db).toBe('app'); + expect(users.estimatedRows).toBe(2); + expect(users.foreignKeys).toEqual([]); + + const idColumn = users.columns.find((column) => column.name === '_id')!; + expect(idColumn.primaryKey).toBe(true); + expect(idColumn.nullable).toBe(false); + + const email = users.columns.find((column) => column.name === 'email')!; + expect(email.nullable).toBe(false); // present in every sampled document + + const age = users.columns.find((column) => column.name === 'age')!; + expect(age.nullable).toBe(true); // missing from the second document + + const address = users.columns.find((column) => column.name === 'address')!; + expect(address.normalizedType).toBe('json'); + }); + + it('honors the enabled_tables allowlist', async () => { + const { factory } = fakeClientFactory(); + const snapshot = await connector( + { ...baseConnection, enabled_tables: ['app.users'] }, + factory, + ).introspect({ connectionId: 'mongo-prod', driver: 'mongodb' }, { runId: 't' }); + expect(snapshot.tables.map((table) => table.name)).toEqual(['users']); + }); + + it('restricts introspection to input.tableScope (the scan layer does not post-filter)', async () => { + const { factory } = fakeClientFactory(); + const snapshot = await connector(baseConnection, factory).introspect( + { + connectionId: 'mongo-prod', + driver: 'mongodb', + tableScope: tableRefSet([{ catalog: null, db: 'app', name: 'users' }]), + }, + { runId: 't' }, + ); + expect(snapshot.tables.map((table) => table.name)).toEqual(['users']); + }); + + it('yields zero tables for a database whose scoped set is empty', async () => { + const { factory } = fakeClientFactory(); + const snapshot = await connector(baseConnection, factory).introspect( + { + connectionId: 'mongo-prod', + driver: 'mongodb', + tableScope: tableRefSet([{ catalog: null, db: 'other', name: 'users' }]), + }, + { runId: 't' }, + ); + expect(snapshot.tables).toEqual([]); + }); + + it('does not count documents on a view (estimatedDocumentCount fails on views)', async () => { + const collections: KtxMongoListedCollection[] = [ + { name: 'users' }, + { name: 'active_users', type: 'view' }, + ]; + const client: KtxMongoClient = { + listCollections: vi.fn(async () => collections), + // Real MongoDB rejects a count command on a view with CommandNotSupportedOnView. + estimatedDocumentCount: vi.fn(async (_db: string, name: string) => { + if (name === 'active_users') { + throw new Error('CommandNotSupportedOnView: count is not supported on a view'); + } + return 2; + }), + find: vi.fn(async (_db: string, name: string) => DOCUMENTS[name] ?? DOCUMENTS.users!), + ping: vi.fn(async () => undefined), + close: vi.fn(async () => undefined), + }; + const snapshot = await connector(baseConnection, { create: vi.fn(() => client) }).introspect( + { connectionId: 'mongo-prod', driver: 'mongodb' }, + { runId: 't' }, + ); + const view = snapshot.tables.find((table) => table.name === 'active_users')!; + expect(view.kind).toBe('view'); + expect(view.estimatedRows).toBeNull(); + expect(client.estimatedDocumentCount).not.toHaveBeenCalledWith('app', 'active_users'); + }); +}); + +describe('KtxMongoDbScanConnector sampling', () => { + it('samples a column, flattening nested values and counting nulls over the window', async () => { + const { factory } = fakeClientFactory(); + const result = await connector(baseConnection, factory).sampleColumn( + { connectionId: 'mongo-prod', table: { catalog: null, db: 'app', name: 'users' }, column: 'address', limit: 10 }, + { runId: 't' }, + ); + expect(result.values).toEqual([JSON.stringify({ city: 'NY' })]); + // address is present in the first sampled document and absent from the second + expect(result.nullCount).toBe(1); + }); +}); + +describe('createMongoDbLiveDatabaseIntrospection', () => { + it('extracts a schema through the live-database port', async () => { + const { factory } = fakeClientFactory(); + const port = createMongoDbLiveDatabaseIntrospection({ + connections: { + 'mongo-prod': { driver: 'mongodb', url: 'mongodb://localhost:27017/app', databases: ['app'] }, + }, + clientFactory: factory, + }); + const snapshot = await port.extractSchema('mongo-prod'); + expect(snapshot.driver).toBe('mongodb'); + expect(snapshot.tables.length).toBe(2); + }); +}); + +describe('ktx sql against a MongoDB connection', () => { + it('is rejected by the read-only SQL capability gate', async () => { + const { factory } = fakeClientFactory(); + const project = { config: { connections: {} }, projectDir: '/tmp' } as unknown as KtxLocalProject; + await expect( + executeProjectReadOnlySql({ + project, + input: { connectionId: 'mongo-prod', connection: undefined, sql: 'SELECT 1', maxRows: 1 }, + createConnector: () => connector(baseConnection, factory), + }), + ).rejects.toThrow(/does not support read-only SQL execution/); + }); +}); diff --git a/packages/cli/test/connectors/mongodb/dialect.test.ts b/packages/cli/test/connectors/mongodb/dialect.test.ts new file mode 100644 index 00000000..7403e27c --- /dev/null +++ b/packages/cli/test/connectors/mongodb/dialect.test.ts @@ -0,0 +1,33 @@ +import { describe, expect, it } from 'vitest'; +import { KtxMongoDbDialect } from '../../../src/connectors/mongodb/dialect.js'; +import { getDialectForDriver, getSqlDialectForDriver } from '../../../src/context/connections/dialects.js'; + +const dialect = new KtxMongoDbDialect(); + +describe('KtxMongoDbDialect', () => { + it('formats and parses db.collection display refs', () => { + expect(dialect.formatDisplayRef({ catalog: null, db: 'app', name: 'users' })).toBe('app.users'); + expect(dialect.parseDisplayRef('app.users')).toEqual({ catalog: null, db: 'app', name: 'users' }); + expect(dialect.columnDisplayTablePartCount()).toBe(2); + }); + + it('maps nested types to opaque json and scalars to dimension types', () => { + expect(dialect.mapDataType('object')).toBe('json'); + expect(dialect.mapDataType('array')).toBe('json'); + expect(dialect.mapDataType('objectId')).toBe('objectid'); + expect(dialect.mapToDimensionType('date')).toBe('time'); + expect(dialect.mapToDimensionType('long')).toBe('number'); + expect(dialect.mapToDimensionType('mixed')).toBe('string'); + expect(dialect.mapToDimensionType('object')).toBe('string'); + }); +}); + +describe('dialect registry', () => { + it('resolves a core dialect for mongodb', () => { + expect(getDialectForDriver('mongodb').type).toBe('mongodb'); + }); + + it('refuses a SQL dialect for mongodb', () => { + expect(() => getSqlDialectForDriver('mongodb')).toThrow(/no SQL dialect/); + }); +}); diff --git a/packages/cli/test/connectors/mongodb/schema-inference.test.ts b/packages/cli/test/connectors/mongodb/schema-inference.test.ts new file mode 100644 index 00000000..0f44359e --- /dev/null +++ b/packages/cli/test/connectors/mongodb/schema-inference.test.ts @@ -0,0 +1,103 @@ +import { describe, expect, it } from 'vitest'; +import { KtxMongoDbDialect } from '../../../src/connectors/mongodb/dialect.js'; +import { + bsonTypeOf, + inferKtxMongoCollectionColumns, + type KtxMongoDocument, +} from '../../../src/connectors/mongodb/schema-inference.js'; + +const dialect = new KtxMongoDbDialect(); + +function objectId(): unknown { + return { _bsontype: 'ObjectId', toString: () => '64b7f0c2a1b2c3d4e5f60718' }; // pragma: allowlist secret +} + +function decimal128(value: string): unknown { + return { _bsontype: 'Decimal128', toString: () => value }; +} + +function infer(documents: KtxMongoDocument[]) { + const columns = inferKtxMongoCollectionColumns(documents, dialect); + return new Map(columns.map((column) => [column.name, column])); +} + +describe('bsonTypeOf', () => { + it('maps JS and BSON runtime values to canonical type names', () => { + expect(bsonTypeOf(objectId())).toBe('objectId'); + expect(bsonTypeOf('hi')).toBe('string'); + expect(bsonTypeOf(7)).toBe('int'); + expect(bsonTypeOf(7.5)).toBe('double'); + expect(bsonTypeOf(true)).toBe('bool'); + expect(bsonTypeOf(new Date())).toBe('date'); + expect(bsonTypeOf(decimal128('1.5'))).toBe('decimal'); + expect(bsonTypeOf({ city: 'NY' })).toBe('object'); + expect(bsonTypeOf([1, 2])).toBe('array'); + expect(bsonTypeOf(null)).toBe('null'); + }); +}); + +describe('inferKtxMongoCollectionColumns', () => { + it('treats _id as the non-nullable primary key', () => { + const columns = infer([{ _id: objectId(), name: 'a' }]); + const id = columns.get('_id')!; + expect(id.primaryKey).toBe(true); + expect(id.nullable).toBe(false); + expect(id.dimensionType).toBe('string'); + expect(id.normalizedType).toBe('objectid'); + }); + + it('derives nullability from field presence and observed nulls', () => { + const columns = infer([ + { _id: objectId(), email: 'a@x.com', deleted_at: null }, + { _id: objectId(), email: 'b@x.com' }, + ]); + // present in every document, never null -> not nullable + expect(columns.get('email')!.nullable).toBe(false); + // missing in one document and null in another -> nullable + expect(columns.get('deleted_at')!.nullable).toBe(true); + }); + + it('maps scalar BSON types to dimension types', () => { + const columns = infer([ + { _id: objectId(), age: 30, score: 9.5, active: true, created: new Date(), balance: decimal128('10.00') }, + ]); + expect(columns.get('age')!.dimensionType).toBe('number'); + expect(columns.get('score')!.dimensionType).toBe('number'); + expect(columns.get('active')!.dimensionType).toBe('boolean'); + expect(columns.get('created')!.dimensionType).toBe('time'); + expect(columns.get('balance')!.dimensionType).toBe('number'); + }); + + it('marks a field seen with more than one type as mixed and treats it as a string', () => { + const columns = infer([ + { _id: objectId(), ref: 'abc' }, + { _id: objectId(), ref: 123 }, + ]); + const ref = columns.get('ref')!; + expect(ref.nativeType).toBe('mixed'); + expect(ref.normalizedType).toBe('mixed'); + expect(ref.dimensionType).toBe('string'); + }); + + it('keeps sub-documents and arrays as a single opaque json column', () => { + const columns = infer([ + { _id: objectId(), address: { city: 'NY', zip: '10001' }, tags: ['a', 'b'] }, + ]); + const address = columns.get('address')!; + expect(address.nativeType).toBe('object'); + expect(address.normalizedType).toBe('json'); + expect(address.dimensionType).toBe('string'); + + const tags = columns.get('tags')!; + expect(tags.nativeType).toBe('array'); + expect(tags.normalizedType).toBe('json'); + }); + + it('preserves first-seen field order', () => { + const columns = inferKtxMongoCollectionColumns( + [{ _id: objectId(), b: 1, a: 2 }], + dialect, + ); + expect(columns.map((column) => column.name)).toEqual(['_id', 'b', 'a']); + }); +}); diff --git a/packages/cli/test/context/connections/dialects.test.ts b/packages/cli/test/context/connections/dialects.test.ts index 217be1eb..cc7eaa59 100644 --- a/packages/cli/test/context/connections/dialects.test.ts +++ b/packages/cli/test/context/connections/dialects.test.ts @@ -1,5 +1,5 @@ import { describe, expect, it } from 'vitest'; -import { getDialectForDriver } from '../../../src/context/connections/dialects.js'; +import { getDialectForDriver, getSqlDialectForDriver } from '../../../src/context/connections/dialects.js'; import type { KtxConnectionDriver, KtxTableRef } from '../../../src/context/scan/types.js'; interface DialectFixture { @@ -248,8 +248,8 @@ const fixtures: DialectFixture[] = [ ]; describe('getDialectForDriver', () => { - it.each(fixtures)('returns a full KtxDialect for $driver', (fixture) => { - const dialect = getDialectForDriver(fixture.driver); + it.each(fixtures)('returns a full KtxSqlDialect for $driver', (fixture) => { + const dialect = getSqlDialectForDriver(fixture.driver); const column = dialect.quoteIdentifier('status'); expect(dialect.type).toBe(fixture.driver); @@ -305,12 +305,12 @@ describe('getDialectForDriver', () => { it('throws with a supported-driver list for unknown drivers', () => { expect(() => getDialectForDriver('oracle')).toThrow( - 'Unsupported warehouse driver "oracle". Supported drivers: bigquery, clickhouse, mysql, postgres, sqlite, snowflake, sqlserver', + 'Unsupported driver "oracle". Supported drivers: bigquery, clickhouse, mongodb, mysql, postgres, snowflake, sqlite, sqlserver', ); }); it('rejects legacy driver aliases', () => { - expect(() => getDialectForDriver('postgresql')).toThrow('Unsupported warehouse driver "postgresql"'); - expect(() => getDialectForDriver('sqlite3')).toThrow('Unsupported warehouse driver "sqlite3"'); + expect(() => getDialectForDriver('postgresql')).toThrow('Unsupported driver "postgresql"'); + expect(() => getDialectForDriver('sqlite3')).toThrow('Unsupported driver "sqlite3"'); }); }); diff --git a/packages/cli/test/context/connections/drivers.test.ts b/packages/cli/test/context/connections/drivers.test.ts index 5d59db4e..65bbca4b 100644 --- a/packages/cli/test/context/connections/drivers.test.ts +++ b/packages/cli/test/context/connections/drivers.test.ts @@ -22,6 +22,11 @@ const connectionFixtures: Record = { schemas: ['public'], }), sqlite: () => ({ driver: 'sqlite', path: 'warehouse.db' }), + mongodb: () => ({ + driver: 'mongodb', + url: 'mongodb://localhost:27017/app', + databases: ['app'], + }), mysql: () => ({ driver: 'mysql', host: 'localhost', @@ -96,6 +101,7 @@ describe('driverRegistrations', () => { expect(listSupportedDrivers()).toEqual([ 'bigquery', 'clickhouse', + 'mongodb', 'mysql', 'postgres', 'snowflake', diff --git a/packages/cli/test/context/mcp/local-project-ports.test.ts b/packages/cli/test/context/mcp/local-project-ports.test.ts index 702cff64..d4484775 100644 --- a/packages/cli/test/context/mcp/local-project-ports.test.ts +++ b/packages/cli/test/context/mcp/local-project-ports.test.ts @@ -282,6 +282,33 @@ describe('createLocalProjectMcpContextPorts', () => { expect(createConnector).not.toHaveBeenCalled(); }); + it('refuses sql_execution against a non-SQL (MongoDB) connection before SQL analysis', async () => { + const project = await initKtxProject({ projectDir: tempDir }); + project.config.connections.mongo = { driver: 'mongodb', url: 'mongodb://localhost:27017/app' }; + const createConnector = vi.fn(async () => testConnector()); + const sqlAnalysis = { + analyzeForFingerprint: vi.fn(), + analyzeBatch: vi.fn(), + validateReadOnly: vi.fn(async () => ({ ok: true, error: null })), + }; + const ports = createLocalProjectMcpContextPorts(project, { + sqlAnalysis, + localScan: { createConnector }, + embeddingService: null, + }); + + const execution = ports.sqlExecution?.execute({ + connectionId: 'mongo', + sql: 'select 1', + maxRows: 5, + }); + await expect(execution).rejects.toBeInstanceOf(KtxExpectedError); + await expect(execution).rejects.toThrow("non-SQL driver 'mongodb'"); + // Refused before the parser dialect is chosen and before any connector is built. + expect(sqlAnalysis.validateReadOnly).not.toHaveBeenCalled(); + expect(createConnector).not.toHaveBeenCalled(); + }); + it('emits sql_execution progress stages from local MCP ports', async () => { const project = await initKtxProject({ projectDir: tempDir }); project.config.connections.warehouse = { diff --git a/packages/cli/test/context/project/driver-schemas.test.ts b/packages/cli/test/context/project/driver-schemas.test.ts index df85a0db..1c589134 100644 --- a/packages/cli/test/context/project/driver-schemas.test.ts +++ b/packages/cli/test/context/project/driver-schemas.test.ts @@ -28,6 +28,28 @@ describe('connectionConfigSchema (driver discriminated union)', () => { }); }); + it('parses a mongodb connection with sampling fields', () => { + const parsed = connectionConfigSchema.parse({ + driver: 'mongodb', + url: 'env:MONGO_URL', + databases: ['app'], + enabled_tables: ['app.users'], + sample_size: 500, + order_by: 'createdAt', + }); + expect(parsed).toMatchObject({ + driver: 'mongodb', + url: 'env:MONGO_URL', + databases: ['app'], + sample_size: 500, + order_by: 'createdAt', + }); + }); + + it('rejects a mongodb connection without a url', () => { + expect(() => connectionConfigSchema.parse({ driver: 'mongodb', databases: ['app'] })).toThrow(); + }); + it('rejects an unknown driver', () => { expect(() => connectionConfigSchema.parse({ driver: 'nope', url: 'x' })).toThrow(); }); diff --git a/packages/cli/test/context/scan/relationship-composite-candidates.test.ts b/packages/cli/test/context/scan/relationship-composite-candidates.test.ts index e0a9ca6c..3820f588 100644 --- a/packages/cli/test/context/scan/relationship-composite-candidates.test.ts +++ b/packages/cli/test/context/scan/relationship-composite-candidates.test.ts @@ -1,7 +1,7 @@ import Database from 'better-sqlite3'; import { join } from 'node:path'; import { describe, expect, it } from 'vitest'; -import { getDialectForDriver } from '../../../src/context/connections/dialects.js'; +import { getSqlDialectForDriver } from '../../../src/context/connections/dialects.js'; import { snapshotToKtxEnrichedSchema } from '../../../src/context/scan/local-enrichment.js'; import { loadKtxRelationshipBenchmarkFixture, maskKtxRelationshipBenchmarkSnapshot } from '../../../src/context/scan/relationship-benchmarks.js'; import { discoverKtxCompositeRelationships } from '../../../src/context/scan/relationship-composite-candidates.js'; @@ -42,7 +42,8 @@ describe('composite relationship discovery detector', () => { const executor = new TestSqliteExecutor(fixture.dataPath ?? ''); const profiles = await profileKtxRelationshipSchema({ connectionId: snapshot.connectionId, - dialect: getDialectForDriver(snapshot.driver), + driver: snapshot.driver, + dialect: getSqlDialectForDriver(snapshot.driver), schema, executor, ctx: { runId: 'test:composite-profile' }, @@ -50,7 +51,7 @@ describe('composite relationship discovery detector', () => { const result = await discoverKtxCompositeRelationships({ connectionId: snapshot.connectionId, - dialect: getDialectForDriver(snapshot.driver), + dialect: getSqlDialectForDriver(snapshot.driver), schema, profiles, executor, diff --git a/packages/cli/test/context/scan/relationship-discovery.test.ts b/packages/cli/test/context/scan/relationship-discovery.test.ts index f840a08e..cebb2969 100644 --- a/packages/cli/test/context/scan/relationship-discovery.test.ts +++ b/packages/cli/test/context/scan/relationship-discovery.test.ts @@ -1,7 +1,7 @@ import Database from 'better-sqlite3'; import { afterEach, describe, expect, it, vi } from 'vitest'; import type { KtxLlmRuntimePort } from '../../../src/context/llm/runtime-port.js'; -import { getDialectForDriver } from '../../../src/context/connections/dialects.js'; +import { getSqlDialectForDriver } from '../../../src/context/connections/dialects.js'; import { buildDefaultKtxProjectConfig } from '../../../src/context/project/config.js'; import { snapshotToKtxEnrichedSchema } from '../../../src/context/scan/local-enrichment.js'; import { @@ -311,7 +311,7 @@ describe('production relationship discovery', () => { const result = await discoverKtxRelationships({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + dialect: getSqlDialectForDriver('sqlite'), connector: connector(executor), schema: snapshotToKtxEnrichedSchema(snapshot()), context: { runId: 'relationship-run-1' }, @@ -350,7 +350,7 @@ describe('production relationship discovery', () => { const schema = naturalKeySnapshot(); const result = await discoverKtxRelationships({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + dialect: getSqlDialectForDriver('sqlite'), connector: { ...connector(executor), introspect: async () => schema, @@ -400,7 +400,7 @@ describe('production relationship discovery', () => { const result = await discoverKtxRelationships({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + dialect: getSqlDialectForDriver('sqlite'), connector: { ...connector(executor), introspect: async () => sourceSnapshot, @@ -433,7 +433,7 @@ describe('production relationship discovery', () => { it('keeps candidates review-only when read-only SQL is unavailable', async () => { const result = await discoverKtxRelationships({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + dialect: getSqlDialectForDriver('sqlite'), connector: connector(null), schema: snapshotToKtxEnrichedSchema(snapshot()), context: { runId: 'relationship-run-no-sql' }, @@ -459,7 +459,7 @@ describe('production relationship discovery', () => { const sourceSnapshot = declaredForeignKeySnapshot(); const result = await discoverKtxRelationships({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + dialect: getSqlDialectForDriver('sqlite'), connector: connector(null), schema: snapshotToKtxEnrichedSchema(sourceSnapshot), context: { runId: 'formal-metadata-no-sql' }, @@ -506,7 +506,7 @@ describe('production relationship discovery', () => { const result = await discoverKtxRelationships({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + dialect: getSqlDialectForDriver('sqlite'), connector: connector(executor), schema: snapshotToKtxEnrichedSchema(llmOnlyRelationshipSnapshot()), context: { runId: 'llm-relationship-orchestrator' }, @@ -546,7 +546,7 @@ describe('production relationship discovery', () => { const result = await discoverKtxRelationships({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + dialect: getSqlDialectForDriver('sqlite'), connector: connector(executor), schema: snapshotToKtxEnrichedSchema(snapshot()), context: { runId: 'configured-thresholds' }, @@ -607,7 +607,7 @@ describe('production relationship discovery', () => { const result = await discoverKtxRelationships({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + dialect: getSqlDialectForDriver('sqlite'), connector: { ...connector(executor), introspect: async () => richSnapshot, @@ -663,7 +663,7 @@ describe('production relationship discovery', () => { const result = await discoverKtxRelationships({ connectionId: maskedSnapshot.connectionId, - dialect: getDialectForDriver(maskedSnapshot.driver), + dialect: getSqlDialectForDriver(maskedSnapshot.driver), connector: testConnector, schema: snapshotToKtxEnrichedSchema(maskedSnapshot, new Map()), context: { runId: 'test:production-composite' }, diff --git a/packages/cli/test/context/scan/relationship-profiling.test.ts b/packages/cli/test/context/scan/relationship-profiling.test.ts index 7983d958..461092b2 100644 --- a/packages/cli/test/context/scan/relationship-profiling.test.ts +++ b/packages/cli/test/context/scan/relationship-profiling.test.ts @@ -2,7 +2,7 @@ import { readFile } from 'node:fs/promises'; import { join } from 'node:path'; import Database from 'better-sqlite3'; import { afterEach, describe, expect, it, vi } from 'vitest'; -import { getDialectForDriver } from '../../../src/context/connections/dialects.js'; +import { getSqlDialectForDriver } from '../../../src/context/connections/dialects.js'; import type { KtxEnrichedColumn, KtxEnrichedSchema, KtxEnrichedTable } from '../../../src/context/scan/enrichment-types.js'; import { snapshotToKtxEnrichedSchema } from '../../../src/context/scan/local-enrichment.js'; import { loadKtxRelationshipBenchmarkFixture, maskKtxRelationshipBenchmarkSnapshot } from '../../../src/context/scan/relationship-benchmarks.js'; @@ -121,7 +121,8 @@ describe('relationship profiling', () => { const result = await profileKtxRelationshipSchema({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + driver: 'sqlite', + dialect: getSqlDialectForDriver('sqlite'), schema: schema([ table('accounts', [ column('accounts', 'id', { primaryKey: false, nullable: false }), @@ -183,7 +184,8 @@ describe('relationship profiling', () => { const result = await profileKtxRelationshipSchema({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + driver: 'sqlite', + dialect: getSqlDialectForDriver('sqlite'), schema: schema([ table('accounts', [ column('accounts', 'id', { nullable: false }), @@ -226,7 +228,8 @@ describe('relationship profiling', () => { const profiles = await profileKtxRelationshipSchema({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + driver: 'sqlite', + dialect: getSqlDialectForDriver('sqlite'), schema: schema([ table('accounts', [ column('accounts', 'id', { nullable: false }), @@ -277,7 +280,8 @@ describe('relationship profiling', () => { const first = await profileKtxRelationshipSchema({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + driver: 'sqlite', + dialect: getSqlDialectForDriver('sqlite'), schema: relationshipSchema, executor, ctx: { runId: 'profile-cache-run' }, @@ -285,7 +289,8 @@ describe('relationship profiling', () => { }); const second = await profileKtxRelationshipSchema({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + driver: 'sqlite', + dialect: getSqlDialectForDriver('sqlite'), schema: relationshipSchema, executor, ctx: { runId: 'profile-cache-run' }, @@ -293,7 +298,8 @@ describe('relationship profiling', () => { }); const third = await profileKtxRelationshipSchema({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + driver: 'sqlite', + dialect: getSqlDialectForDriver('sqlite'), schema: relationshipSchema, executor, ctx: { runId: 'profile-cache-fresh-run' }, @@ -322,7 +328,8 @@ describe('relationship profiling', () => { try { const result = await profileKtxRelationshipSchema({ connectionId: fixture.snapshot.connectionId, - dialect: getDialectForDriver(fixture.snapshot.driver), + driver: fixture.snapshot.driver, + dialect: getSqlDialectForDriver(fixture.snapshot.driver), schema: snapshotToKtxEnrichedSchema(maskedSnapshot, new Map()), executor: scaleExecutor, ctx: { runId: 'scale-stress-profile-query-count' }, @@ -367,7 +374,8 @@ describe('relationship profiling', () => { await profileKtxRelationshipSchema({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + driver: 'sqlite', + dialect: getSqlDialectForDriver('sqlite'), schema: schemaWithTables(['accounts', 'orders', 'payments', 'refunds']), executor, ctx: { runId: 'profile-concurrency' }, @@ -403,7 +411,8 @@ describe('relationship profiling', () => { const result = await profileKtxRelationshipSchema({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + driver: 'sqlite', + dialect: getSqlDialectForDriver('sqlite'), schema: schemaWithTables(['accounts', 'orders']), executor, ctx: { runId: 'profile-error-isolated' }, diff --git a/packages/cli/test/context/scan/relationship-validation.test.ts b/packages/cli/test/context/scan/relationship-validation.test.ts index 7e876421..98826aed 100644 --- a/packages/cli/test/context/scan/relationship-validation.test.ts +++ b/packages/cli/test/context/scan/relationship-validation.test.ts @@ -1,6 +1,6 @@ import Database from 'better-sqlite3'; import { afterEach, describe, expect, it } from 'vitest'; -import { getDialectForDriver } from '../../../src/context/connections/dialects.js'; +import { getSqlDialectForDriver } from '../../../src/context/connections/dialects.js'; import type { KtxEnrichedColumn, KtxEnrichedSchema, KtxEnrichedTable } from '../../../src/context/scan/enrichment-types.js'; import { generateKtxRelationshipDiscoveryCandidates } from '../../../src/context/scan/relationship-candidates.js'; import type { KtxRelationshipProfileArtifact } from '../../../src/context/scan/relationship-profiling.js'; @@ -102,7 +102,8 @@ describe('relationship validation', () => { const testSchema = schema(); const profiles = await profileKtxRelationshipSchema({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + driver: 'sqlite', + dialect: getSqlDialectForDriver('sqlite'), schema: testSchema, executor, ctx: { runId: 'validate-test' }, @@ -113,7 +114,7 @@ describe('relationship validation', () => { const validated = await validateKtxRelationshipDiscoveryCandidates({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + dialect: getSqlDialectForDriver('sqlite'), candidates, profiles, executor, @@ -151,7 +152,8 @@ describe('relationship validation', () => { const testSchema = schema(); const profiles = await profileKtxRelationshipSchema({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + driver: 'sqlite', + dialect: getSqlDialectForDriver('sqlite'), schema: testSchema, executor, ctx: { runId: 'validate-test' }, @@ -162,7 +164,7 @@ describe('relationship validation', () => { const validated = await validateKtxRelationshipDiscoveryCandidates({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + dialect: getSqlDialectForDriver('sqlite'), candidates, profiles, executor, @@ -201,7 +203,8 @@ describe('relationship validation', () => { const testSchema = schema(); const profiles = await profileKtxRelationshipSchema({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + driver: 'sqlite', + dialect: getSqlDialectForDriver('sqlite'), schema: testSchema, executor, ctx: { runId: 'validate-budget-profile' }, @@ -214,7 +217,7 @@ describe('relationship validation', () => { const validated = await validateKtxRelationshipDiscoveryCandidates({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + dialect: getSqlDialectForDriver('sqlite'), candidates, profiles, executor, @@ -256,7 +259,8 @@ describe('relationship validation', () => { ]); const profiles = await profileKtxRelationshipSchema({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + driver: 'sqlite', + dialect: getSqlDialectForDriver('sqlite'), schema: testSchema, executor, ctx: { runId: 'validate-zero-budget-profile' }, @@ -266,7 +270,7 @@ describe('relationship validation', () => { const validated = await validateKtxRelationshipDiscoveryCandidates({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + dialect: getSqlDialectForDriver('sqlite'), candidates, profiles, executor, @@ -303,7 +307,8 @@ describe('relationship validation', () => { ]); const profiles = await profileKtxRelationshipSchema({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + driver: 'sqlite', + dialect: getSqlDialectForDriver('sqlite'), schema: testSchema, executor, ctx: { runId: 'llm-rejected-validation' }, @@ -332,7 +337,7 @@ describe('relationship validation', () => { const [validated] = await validateKtxRelationshipDiscoveryCandidates({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + dialect: getSqlDialectForDriver('sqlite'), candidates: [llmCandidate], profiles, executor, @@ -377,7 +382,8 @@ describe('relationship validation', () => { ]); const profiles = await profileKtxRelationshipSchema({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + driver: 'sqlite', + dialect: getSqlDialectForDriver('sqlite'), schema: testSchema, executor, ctx: { runId: 'validation-concurrency-profile' }, @@ -386,7 +392,7 @@ describe('relationship validation', () => { await validateKtxRelationshipDiscoveryCandidates({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + dialect: getSqlDialectForDriver('sqlite'), candidates, profiles, executor: throttled, @@ -478,7 +484,7 @@ describe('relationship validation', () => { const [validated] = await validateKtxRelationshipDiscoveryCandidates({ connectionId: 'warehouse', - dialect: getDialectForDriver('sqlite'), + dialect: getSqlDialectForDriver('sqlite'), candidates: [candidate], profiles, executor, diff --git a/packages/cli/test/context/sl/local-query.test.ts b/packages/cli/test/context/sl/local-query.test.ts index d1db5503..5a5891e4 100644 --- a/packages/cli/test/context/sl/local-query.test.ts +++ b/packages/cli/test/context/sl/local-query.test.ts @@ -67,6 +67,18 @@ grain: [] await rm(tempDir, { recursive: true, force: true }); }); + it('refuses a non-SQL (context-only) connection instead of compiling it as Postgres', async () => { + project.config.connections['mongo-prod'] = { driver: 'mongodb', url: 'mongodb://localhost:27017/app' }; + await expect( + compileLocalSlQuery(project, { + connectionId: 'mongo-prod', + query: { measures: ['orders.order_count'], dimensions: ['orders.status'], limit: 25 }, + compute, + }), + ).rejects.toThrow(/non-SQL driver 'mongodb'|require a SQL warehouse connection/); + expect(compute.query).not.toHaveBeenCalled(); + }); + it('compiles a local semantic-layer query with computable sources only', async () => { const result = await compileLocalSlQuery(project, { connectionId: 'warehouse', diff --git a/packages/cli/test/setup-databases.test.ts b/packages/cli/test/setup-databases.test.ts index 7f0a5523..3f363805 100644 --- a/packages/cli/test/setup-databases.test.ts +++ b/packages/cli/test/setup-databases.test.ts @@ -243,6 +243,7 @@ describe('setup databases step', () => { { value: 'mysql', label: 'MySQL' }, { value: 'clickhouse', label: 'ClickHouse' }, { value: 'sqlserver', label: 'SQL Server' }, + { value: 'mongodb', label: 'MongoDB' }, { value: 'sqlite', label: 'SQLite' }, ], required: true, diff --git a/packages/cli/test/sql.test.ts b/packages/cli/test/sql.test.ts index e1222e47..4ffc56ce 100644 --- a/packages/cli/test/sql.test.ts +++ b/packages/cli/test/sql.test.ts @@ -348,6 +348,40 @@ describe('runKtxSql', () => { expect(io.stderr()).toContain('does not support read-only SQL execution.'); }); + it('refuses a non-SQL (MongoDB) connection before invoking SQL analysis', async () => { + const projectDir = join(tempDir, 'project'); + await initKtxProject({ projectDir }); + await writeConnections(projectDir, { mongo: { driver: 'mongodb', url: 'mongodb://localhost:27017/app' } }); + const sqlAnalysis = makeSqlAnalysis({ ok: true, error: null }); + const createSqlAnalysis = vi.fn(() => sqlAnalysis); + const createScanConnector = vi.fn(async () => makeConnector()); + const io = makeIo(); + + await expect( + runKtxSql( + { + command: 'execute', + projectDir, + connectionId: 'mongo', + sql: 'select 1', + maxRows: 1000, + output: 'pretty', + json: false, + cliVersion: '0.0.0-test', + }, + io.io, + { createSqlAnalysis, createScanConnector }, + ), + ).resolves.toBe(1); + + // The non-SQL boundary is enforced before any SQL parser/daemon work, so a + // MongoDB connection never reaches dialect selection or read-only validation. + expect(createSqlAnalysis).not.toHaveBeenCalled(); + expect(sqlAnalysis.validateReadOnly).not.toHaveBeenCalled(); + expect(createScanConnector).not.toHaveBeenCalled(); + expect(io.stderr()).toContain("non-SQL driver 'mongodb'"); + }); + it('routes _ktx_federated through the shared federated executor', async () => { const projectDir = join(tempDir, 'project'); await initKtxProject({ projectDir }); diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml index 2eea9996..c0e2b062 100644 --- a/pnpm-lock.yaml +++ b/pnpm-lock.yaml @@ -194,6 +194,9 @@ importers: minimatch: specifier: ^10.2.5 version: 10.2.5 + mongodb: + specifier: ^6.12.0 + version: 6.21.0 mssql: specifier: ^12.5.4 version: 12.5.4(@azure/core-client@1.10.1) @@ -1240,6 +1243,9 @@ packages: '@cfworker/json-schema': optional: true + '@mongodb-js/saslprep@1.4.11': + resolution: {integrity: sha512-o9rAHc0IpIjuPSxRutWpE1F62x7n+4mVS4rCNHkzhIUMQcc18bb6xEq5wd2NdN0WjepIyXIppRshYI2kQDOZVA==} + '@napi-rs/wasm-runtime@1.1.4': resolution: {integrity: sha512-3NQNNgA1YSlJb/kMH1ildASP9HW7/7kYnRI2szWJaofaS1hWmbGI4H+d3+22aGzXXN9IJ+n+GiFVcGipJP18ow==} peerDependencies: @@ -2575,6 +2581,12 @@ packages: '@types/unist@3.0.3': resolution: {integrity: sha512-ko/gIFJRv177XgZsZcBwnqJN5x/Gien8qNOn0D5bQU/zAzVf9Zt3BlcUiLqhV9y4ARk0GbT3tnUiPNgnTXzc/Q==} + '@types/webidl-conversions@7.0.3': + resolution: {integrity: sha512-CiJJvcRtIgzadHCYXw7dqEnMNRjhGZlYK05Mj9OyktqV8uVT8fD2BFOB7S1uwBE3Kj2Z+4UyPmFw/Ixgw/LAlA==} + + '@types/whatwg-url@11.0.5': + resolution: {integrity: sha512-coYR071JRaHa+xoEvvYqvnIHaVqaYrLPbsufM9BF63HkwI5Lgmy2QR8Q5K/lYDYo5AK82wOvSOS0UsLTpTG7uQ==} + '@typespec/ts-http-runtime@0.3.5': resolution: {integrity: sha512-yURCknZhvywvQItHMMmFSo+fq5arCUIyz/CVk7jD89MSai7dkaX8ufjCWp3NttLojoTVbcE72ri+be/TnEbMHw==} engines: {node: '>=20.0.0'} @@ -2846,6 +2858,10 @@ packages: resolution: {integrity: sha512-yQbXgO/OSZVD2IsiLlro+7Hf6Q18EJrKSEsdoMzKePKXct3gvD8oLcOQdIzGupr5Fj+EDe8gO/lxc1BzfMpxvA==} engines: {node: '>=8'} + bson@6.10.4: + resolution: {integrity: sha512-WIsKqkSC0ABoBJuT1LEX+2HEvNmNKKgnTAyd0fL8qzK4SH2i9NXg+t08YtdZp/V9IZ33cxe3iV4yM0qg8lMQng==} + engines: {node: '>=16.20.1'} + buffer-equal-constant-time@1.0.1: resolution: {integrity: sha512-zRpUiDwd/xk6ADqPMATG8vc9VPrkck7T07OIx0gnjmJAnHnTVXNQG3vfvWNuiZIkwu9KrKdA1iJKfsfTVxE6NA==} @@ -4381,6 +4397,9 @@ packages: resolution: {integrity: sha512-aisnrDP4GNe06UcKFnV5bfMNPBUw4jsLGaWwWfnH3v02GnBuXX2MCVn5RbrWo0j3pczUilYblq7fQ7Nw2t5XKw==} engines: {node: '>= 0.8'} + memory-pager@1.5.0: + resolution: {integrity: sha512-ZS4Bp4r/Zoeq6+NLJpP+0Zzm0pR8whtGPf1XExKLJBAczGMnSi3It14OiNCStjQjM6NU1okjQGSxgEZN8eBYKg==} + meow@13.2.0: resolution: {integrity: sha512-pxQJQzB6djGPXh08dacEloMFopsOqGVRKFPYvPOt9XDZ1HasbgDZA74CJGreSU4G3Ak7EFJGoiH2auq+yXISgA==} engines: {node: '>=18'} @@ -4556,6 +4575,36 @@ packages: moment@2.30.1: resolution: {integrity: sha512-uEmtNhbDOrWPFS+hdjFCBfy9f2YoyzRpwcl+DqpC6taX21FzsTLQVbMV/W7PzNSX6x/bhC1zA3c2UQ5NzH6how==} + mongodb-connection-string-url@3.0.2: + resolution: {integrity: sha512-rMO7CGo/9BFwyZABcKAWL8UJwH/Kc2x0g72uhDWzG48URRax5TCIcJ7Rc3RZqffZzO/Gwff/jyKwCU9TN8gehA==} + + mongodb@6.21.0: + resolution: {integrity: sha512-URyb/VXMjJ4da46OeSXg+puO39XH9DeQpWCslifrRn9JWugy0D+DvvBvkm2WxmHe61O/H19JM66p1z7RHVkZ6A==} + engines: {node: '>=16.20.1'} + peerDependencies: + '@aws-sdk/credential-providers': ^3.188.0 + '@mongodb-js/zstd': ^1.1.0 || ^2.0.0 + gcp-metadata: ^5.2.0 + kerberos: ^2.0.1 + mongodb-client-encryption: '>=6.0.0 <7' + snappy: ^7.3.2 + socks: ^2.7.1 + peerDependenciesMeta: + '@aws-sdk/credential-providers': + optional: true + '@mongodb-js/zstd': + optional: true + gcp-metadata: + optional: true + kerberos: + optional: true + mongodb-client-encryption: + optional: true + snappy: + optional: true + socks: + optional: true + motion-dom@12.40.0: resolution: {integrity: sha512-HxU3ZaBwNPVQUBQf1xxgq+7JrPNZvjLVxgbpEZL7RrWJnsxOf0/OM+yrHG9ogLQ31Do/r57Oz2gQWPK+6q62mg==} @@ -5068,6 +5117,10 @@ packages: pump@3.0.4: resolution: {integrity: sha512-VS7sjc6KR7e1ukRFhQSY5LM2uBWAUPiOPa/A3mkKmiMwSmRFUITt0xuj+/lesgnCv+dPIEYlkzrcyXgquIHMcA==} + punycode@2.3.1: + resolution: {integrity: sha512-vYt7UD1U9Wg6138shLtLOvdAu+8DsC/ilFtEVHcH+wydcSpNE20AfSOduf6MkRFahL5FY7X1oU7nKVZFtfq8Fg==} + engines: {node: '>=6'} + qs@6.15.2: resolution: {integrity: sha512-Rzq0KEyX/w/tEybncDgdkZrJgVUsUMk3xjh3t5bv3S1HTAtg+uOYt72+ZfwiQwKdysThkTBdL/rTi6HDmX9Ddw==} engines: {node: '>=0.6'} @@ -5394,6 +5447,9 @@ packages: space-separated-tokens@2.0.2: resolution: {integrity: sha512-PEGlAwrG8yXGXRjW32fGbg66JAlOAwbObuqVoJpv/mRgoWDQfgH1wDPvtzWyUSNAXBGSk8h755YDbbcEy3SH2Q==} + sparse-bitfield@3.0.3: + resolution: {integrity: sha512-kvzhi7vqKTfkh0PZU+2D2PIllw2ymqJKujUcyPMd9Y75Nv4nPbGJZXNhxsgdQab2BmlDct1YnfQCguEvHr7VsQ==} + spawn-error-forwarder@1.0.0: resolution: {integrity: sha512-gRjMgK5uFjbCvdibeGJuy3I5OYz6VLoVdsOJdA6wV0WlfQVLFueoqMxwwYD9RODdgb6oUIvlRlsyFSiQkMKu0g==} @@ -5634,6 +5690,10 @@ packages: toml@3.0.0: resolution: {integrity: sha512-y/mWCZinnvxjTKYhJ+pYxwD0mRLVvOtdS2Awbgxln6iEnt4rk0yBxeSBHkGJcPucRiG0e55mwWp+g/05rsrd6w==} + tr46@5.1.1: + resolution: {integrity: sha512-hdF5ZgjTqgAntKkklYw0R03MG2x/bSzTtkxmIRw/sTNV8YXsCJ1tfLAX23lhxhHJlEf3CRCOCGGWw3vI3GaSPw==} + engines: {node: '>=18'} + traverse@0.6.8: resolution: {integrity: sha512-aXJDbk6SnumuaZSANd21XAo15ucCDE38H4fkqiGsc3MhCK+wOlZvLP9cB/TvpHT0mOyWgC4Z8EwRlzqYSUzdsA==} engines: {node: '>= 0.4'} @@ -5907,6 +5967,14 @@ packages: web-worker@1.5.0: resolution: {integrity: sha512-RiMReJrTAiA+mBjGONMnjVDP2u3p9R1vkcGz6gDIrOMT3oGuYwX2WRMYI9ipkphSuE5XKEhydbhNEJh4NY9mlw==} + webidl-conversions@7.0.0: + resolution: {integrity: sha512-VwddBukDzu71offAQR975unBIGqfKZpM+8ZX6ySk8nYhVoo5CYaZyzt3YBvYtRtO+aoGlqxPg/B87NGVZ/fu6g==} + engines: {node: '>=12'} + + whatwg-url@14.2.0: + resolution: {integrity: sha512-De72GdQZzNTUBBChsXueQUnPKDkg/5A5zp7pFDuQAj5UFoENpiACU0wlCvzpAGnTkj++ihpKwKyYewn/XNUbKw==} + engines: {node: '>=18'} + which@2.0.2: resolution: {integrity: sha512-BLI3Tl1TW3Pvl70l3yq3Y64i+awpwXqsGBYWkkqMtnbXgrMD+yj7rhW0kuEDxzJaYXGjEW5ogapKNMEKNMjibA==} engines: {node: '>= 8'} @@ -7209,6 +7277,10 @@ snapshots: transitivePeerDependencies: - supports-color + '@mongodb-js/saslprep@1.4.11': + dependencies: + sparse-bitfield: 3.0.3 + '@napi-rs/wasm-runtime@1.1.4(@emnapi/core@1.10.0)(@emnapi/runtime@1.10.0)': dependencies: '@emnapi/core': 1.10.0 @@ -8440,6 +8512,12 @@ snapshots: '@types/unist@3.0.3': {} + '@types/webidl-conversions@7.0.3': {} + + '@types/whatwg-url@11.0.5': + dependencies: + '@types/webidl-conversions': 7.0.3 + '@typespec/ts-http-runtime@0.3.5': dependencies: http-proxy-agent: 7.0.2 @@ -8737,6 +8815,8 @@ snapshots: dependencies: fill-range: 7.1.1 + bson@6.10.4: {} + buffer-equal-constant-time@1.0.1: {} buffer@5.7.1: @@ -10450,6 +10530,8 @@ snapshots: media-typer@1.1.0: {} + memory-pager@1.5.0: {} + meow@13.2.0: {} merge-descriptors@2.0.0: {} @@ -10765,6 +10847,17 @@ snapshots: moment@2.30.1: {} + mongodb-connection-string-url@3.0.2: + dependencies: + '@types/whatwg-url': 11.0.5 + whatwg-url: 14.2.0 + + mongodb@6.21.0: + dependencies: + '@mongodb-js/saslprep': 1.4.11 + bson: 6.10.4 + mongodb-connection-string-url: 3.0.2 + motion-dom@12.40.0: dependencies: motion-utils: 12.39.0 @@ -11219,6 +11312,8 @@ snapshots: end-of-stream: 1.4.5 once: 1.4.0 + punycode@2.3.1: {} + qs@6.15.2: dependencies: side-channel: 1.1.0 @@ -11743,6 +11838,10 @@ snapshots: space-separated-tokens@2.0.2: {} + sparse-bitfield@3.0.3: + dependencies: + memory-pager: 1.5.0 + spawn-error-forwarder@1.0.0: {} spdx-correct@3.2.0: @@ -11982,6 +12081,10 @@ snapshots: toml@3.0.0: {} + tr46@5.1.1: + dependencies: + punycode: 2.3.1 + traverse@0.6.8: {} trim-lines@3.0.1: {} @@ -12185,6 +12288,13 @@ snapshots: web-worker@1.5.0: {} + webidl-conversions@7.0.0: {} + + whatwg-url@14.2.0: + dependencies: + tr46: 5.1.1 + webidl-conversions: 7.0.0 + which@2.0.2: dependencies: isexe: 2.0.0