feat(connectors): add MongoDB connector (#305) (#310)

* refactor(connectors): split KtxDialect into core and KtxSqlDialect

Separate the dialect contract into a driver-agnostic core (display/ref
formatting and type mapping) and a SQL-only extension (query generators).
The catalog and entity-details paths resolve the core dialect for any
snapshot driver, so it must stay free of SQL generation; this is the
prerequisite refactor for adding non-SQL primary sources.

- KtxDialect keeps type, formatDisplayRef, parseDisplayRef,
  columnDisplayTablePartCount, mapDataType, mapToDimensionType
- KtxSqlDialect extends it with quoteIdentifier, formatTableName, and the
  query/sample/statistics generators; the 7 SQL dialects implement it
- add getSqlDialectForDriver for SQL drivers; the 7 connectors and the
  relationship-benchmark harness consume it
- thread the relationship pipeline (profiling/validation/composite/
  discovery) as KtxSqlDialect | null so a non-SQL source skips coverage SQL
  and its candidates stay in review; local-enrichment builds the SQL
  dialect only when the connector advertises readOnlySql

Pure extraction: no behavior change for the existing 7 drivers.
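
The split can be pictured with a trimmed-down sketch. The names below (`CoreDialect`, `SqlDialect`, `coreDialectFor`, `sqlDialectOrNull`, `coverageSql`) are hypothetical stand-ins, not the repository's identifiers, and the real interfaces carry more members. The sketch only illustrates the idea that every driver resolves a core dialect while SQL generation is reachable solely through the SQL extension:

```typescript
// Hypothetical, reduced shapes for illustration only.
interface CoreDialect {
  readonly type: string;
  mapDataType(nativeType: string): string;
}

interface SqlDialect extends CoreDialect {
  quoteIdentifier(name: string): string;
}

const postgres: SqlDialect = {
  type: 'postgres',
  mapDataType: (nativeType) => nativeType.toLowerCase(),
  quoteIdentifier: (name) => `"${name.replace(/"/g, '""')}"`,
};

// A non-SQL source implements the core contract and nothing else.
const mongodb: CoreDialect = {
  type: 'mongodb',
  mapDataType: (nativeType) => nativeType.toLowerCase(),
};

const coreDialects: Record<string, CoreDialect | undefined> = { postgres, mongodb };
const sqlDialects: Record<string, SqlDialect | undefined> = { postgres };

// Catalog-style callers: any snapshot driver resolves a core dialect.
function coreDialectFor(driver: string): CoreDialect {
  const dialect = coreDialects[driver];
  if (!dialect) throw new Error(`Unknown driver "${driver}"`);
  return dialect;
}

// Pipeline-style callers: a non-SQL driver yields null, so SQL steps are skipped.
function sqlDialectOrNull(driver: string): SqlDialect | null {
  return sqlDialects[driver] ?? null;
}

function coverageSql(driver: string, table: string): string | null {
  const dialect = sqlDialectOrNull(driver);
  return dialect ? `SELECT COUNT(*) FROM ${dialect.quoteIdentifier(table)}` : null;
}
```

Typing the pipeline input as "SQL dialect or null" moves the "is this source SQL-capable?" decision to one place instead of scattering driver checks across the profiling and validation steps.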

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(connectors): add MongoDB connector for issue #305

Add a read-only MongoDB connector that treats a database as a primary
context source: collections map to tables and inferred top-level fields to
columns. MongoDB is the first non-SQL source (readOnlySql: false), so
ktx sql and metric compilation do not apply, but its collections flow
through ingest, descriptions, and relationship discovery.

- schema-inference: infer a flat column schema from the most recent
  sample_size documents (by _id desc, or order_by for non-ObjectId keys).
  Union BSON types per field, mark multi-type fields mixed (string), keep
  sub-documents/arrays as a single opaque json column, derive nullability
  from presence, treat _id as the primary key
- connector: KtxMongoDbScanConnector behind an injectable client seam;
  strictly read-only (find/listCollections/estimatedDocumentCount only),
  no executeReadOnly; resolves env:/file: via resolveKtxConfigReference
- core-only KtxMongoDbDialect and a live-database introspection adapter
- wire the mongodb driver: driver union, dialect registry, driver
  registration (scopeConfigKey databases), mongodbConnectionSchema,
  connection-drivers, normalizeDriver, the live-database route, and the
  ktx setup picker. ktx sql is refused by the read-only SQL capability gate
- tests: schema inference, connector snapshot via a fake client, dialect,
  driver-schema parsing, and the ktx sql rejection
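
The inference rules in the first bullet can be sketched in a few lines. This is a simplified illustration, not the connector's code: `typeTag` and `inferColumns` are hypothetical names, and the type tagging ignores BSON wrapper objects, which the real implementation also recognizes.

```typescript
type SampleDoc = Record<string, unknown>;

interface InferredColumn {
  name: string;
  nativeType: string;
  nullable: boolean;
  primaryKey: boolean;
}

// Simplified tagging for plain JS values only (assumption: no BSON wrappers).
function typeTag(value: unknown): string {
  if (value === null || value === undefined) return 'null';
  if (Array.isArray(value)) return 'array';
  if (value instanceof Date) return 'date';
  if (typeof value === 'number') return Number.isInteger(value) ? 'int' : 'double';
  if (typeof value === 'boolean') return 'bool';
  return typeof value === 'object' ? 'object' : 'string';
}

function inferColumns(docs: readonly SampleDoc[]): InferredColumn[] {
  const seen = new Map<string, { types: Set<string>; present: number; nullSeen: boolean }>();
  for (const doc of docs) {
    for (const [name, value] of Object.entries(doc)) {
      const acc = seen.get(name) ?? { types: new Set<string>(), present: 0, nullSeen: false };
      seen.set(name, acc);
      acc.present += 1;
      const tag = typeTag(value);
      if (tag === 'null') acc.nullSeen = true;
      else acc.types.add(tag);
    }
  }
  return Array.from(seen.entries()).map(([name, acc]) => {
    // No observed type -> 'null'; more than one -> 'mixed'; otherwise the single type.
    let nativeType = 'null';
    if (acc.types.size > 1) nativeType = 'mixed';
    else if (acc.types.size === 1) nativeType = Array.from(acc.types)[0]!;
    const isId = name === '_id';
    return {
      name,
      nativeType,
      // Nullable when missing from some documents, explicitly null, or never typed.
      nullable: !isId && (acc.present < docs.length || acc.nullSeen || acc.types.size === 0),
      primaryKey: isId,
    };
  });
}

const columns = inferColumns([
  { _id: 1, sku: 'A-1', qty: 2, tags: ['new'] },
  { _id: 2, sku: 42, qty: null },
]);
```

Here `sku` is seen as both a string and an integer, so it resolves to `mixed`. `qty` is nullable because of the explicit null, and `tags` is nullable because the second document omits it.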

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(integrations): document the MongoDB primary source

Add a MongoDB section to the primary-sources reference: connection config
(url, databases, enabled_tables, sample_size, order_by), mongodb+srv/TLS/
Atlas notes, the schema-inference explainer, a features matrix, and the
non-SQL caveat. Update the frontmatter and connection field reference.
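
For orientation, the documented fields could be written out as a typed object like the one below. The field names come from this commit, but the interface name, the values, and the exact `env:` reference spelling are illustrative assumptions. A project's real configuration file format may differ.

```typescript
// Hypothetical mirror of the connection fields listed above.
interface MongoConnectionConfig {
  driver: 'mongodb';
  url: string;
  databases?: string[];
  enabled_tables?: string[];
  sample_size?: number;
  order_by?: string;
}

const shopConnection: MongoConnectionConfig = {
  driver: 'mongodb',
  // Assumed reference syntax; the commit only states that env:/file: references are resolved.
  url: 'env:MONGODB_URL',
  databases: ['shop'],
  // Entries are matched as `<database>.<collection>`.
  enabled_tables: ['shop.orders', 'shop.customers'],
  // Number of most recent documents sampled per collection for schema inference.
  sample_size: 500,
  // Sort key used instead of _id; placeholder field name.
  order_by: 'created_at',
};
```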

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(connectors): address review blockers on the MongoDB connector

- introspect: skip estimatedDocumentCount for views. The count command is
  rejected on a MongoDB view (CommandNotSupportedOnView), so counting a view
  aborted introspect for the whole connection; compute estimatedRows only for
  real collections, as ClickHouse does.
- sl: refuse a semantic-layer query against a non-SQL connection instead of
  defaulting it to the Postgres dialect. compileLocalSlQuery (the shared CLI +
  MCP path) now rejects a driver with no SQL dialect via the new
  isSqlQueryableDriver authority, keeping MongoDB context-only per issue #305.
- tests: cover input.tableScope and the empty-scope skip for the Mongo
  connector (the scan layer does not post-filter), the view no-count path, and
  the ktx sl query refusal for a mongodb connection.
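
The view fix amounts to never issuing the count for a view. A minimal synchronous sketch follows, with hypothetical names (`ListedCollection`, `Counter`, `estimatedRowsFor`) and a stand-in counter that throws for the view, the way the server rejects the command:

```typescript
interface ListedCollection {
  name: string;
  type?: string;
}

// Hypothetical synchronous stand-in for an estimated document count call.
type Counter = (collection: string) => number;

function estimatedRowsFor(collection: ListedCollection, count: Counter): number | null {
  // Views reject the count command, so they report no estimate at all.
  return collection.type === 'view' ? null : count(collection.name);
}

const counter: Counter = (name) => {
  if (name === 'recent_orders') throw new Error('CommandNotSupportedOnView');
  return 120;
};

const rows = [
  { name: 'orders', type: 'collection' },
  { name: 'recent_orders', type: 'view' },
].map((collection) => estimatedRowsFor(collection, counter));
```

Because the counter is never called for `recent_orders`, one view can no longer abort introspection for the whole connection.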

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* polish(mongodb): compute sampled nullCount and document sampling caveats

Address the non-blocking review notes:

- sampleColumn now counts null/absent values over the sampled window instead of
  returning nullCount: null, since the documents are already in hand
- warn that a custom order_by must be indexed (an unindexed sort hits MongoDB's
  in-memory sort limit on large collections) in the connection schema and docs
- note that sampled values for nested fields are stringified, not faithfully
  serialized, so the json opacity is deliberate
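
A reduced sketch of the sampled null count is shown below. `sampleColumnValues` is a hypothetical helper, and its stringification is simplified to `JSON.stringify` for every object, whereas the connector treats dates and BSON wrappers separately.

```typescript
type SampledDoc = Record<string, unknown>;

interface ColumnSample {
  values: unknown[];
  nullCount: number;
}

function sampleColumnValues(docs: readonly SampledDoc[], column: string): ColumnSample {
  const values: unknown[] = [];
  let nullCount = 0;
  for (const doc of docs) {
    const value = doc[column];
    if (value === null || value === undefined) {
      // An absent field and an explicit null both count as null.
      nullCount += 1;
      continue;
    }
    // Nested values are stringified for display; the result is not meant to round-trip.
    values.push(typeof value === 'object' ? JSON.stringify(value) : value);
  }
  return { values, nullCount };
}

const sample = sampleColumnValues(
  [{ note: 'gift' }, { note: null }, {}, { note: { lang: 'en' } }],
  'note',
);
```

The count covers only the sampled window, so it describes the most recent documents rather than the whole collection.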

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(examples): add a MongoDB connector example

A manual, container-backed example mirroring examples/postgres-historic:

- docker-compose.yml + init/seed.js seed a representative dataset (nested
  documents, arrays, a Decimal128, a mixed-type field, a nullable field, an
  ObjectId reference, and a view) on first container start
- scripts/smoke.sh + introspect-smoke.mjs assert the connector's inferred
  schema with no LLM credentials — the same introspection entry point ktx
  ingest's database-schema stage uses, including the view-no-count path
- README.md documents the smoke and a full keyless ktx ingest run
  (claude-code LLM + managed sentence-transformers embeddings)

Works with Docker Compose or podman compose. Verified end to end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore: ignore examples/** in knip to fix dead-code false positives

The MongoDB connector example files (examples/mongodb/init/seed.js and
examples/mongodb/scripts/introspect-smoke.mjs) are used at runtime but were
flagged as unused by knip. Add examples/** to the ignore array, matching the
existing .context/** entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0114qQV8fJ5a5ME3XbMVRzbL

* fix(mongodb): refuse non-SQL connections before SQL analysis

`ktx sql` and the MCP sql_execution tool resolved a SQL-analysis dialect
(falling back to Postgres for a non-SQL driver) and ran read-only
validation before the connector capability gate refused the connection.
For a MongoDB connection, that spun up the parser/daemon and produced
Postgres parser diagnostics instead of a clean non-SQL refusal.

Route both entry points through a shared assertSqlQueryableConnection
guard before dialect selection, mirroring compileLocalSlQuery. The
federated duckdb path has no driver and is exempted at each call site.
Add CLI and MCP regression tests asserting validation/connector work
never starts for a MongoDB connection.
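
The ordering matters more than the check itself. A sketch under assumed names follows: `assertSqlQueryable`, `runSql`, the driver subset, and the error text are all hypothetical. It records each step so the "nothing started" property can be observed:

```typescript
// Illustrative subset of SQL-capable drivers; not the project's registry.
const SQL_DRIVERS = new Set(['postgres', 'mysql', 'bigquery']);

class NonSqlConnectionError extends Error {}

function assertSqlQueryable(connectionId: string, driver: string): void {
  if (!SQL_DRIVERS.has(driver)) {
    throw new NonSqlConnectionError(`Connection "${connectionId}" (${driver}) does not support SQL queries`);
  }
}

// Records the work that would normally follow the guard.
const steps: string[] = [];

function runSql(connectionId: string, driver: string, sql: string): string {
  // The guard runs before dialect selection and validation, so neither starts on refusal.
  assertSqlQueryable(connectionId, driver);
  steps.push(`select-dialect:${driver}`);
  steps.push(`validate:${sql}`);
  return 'ok';
}

let refusal = '';
try {
  runSql('analytics', 'mongodb', 'SELECT 1');
} catch (error) {
  refusal = error instanceof Error ? error.message : String(error);
}
```

After the refused call, `steps` is still empty, which is the property the regression tests assert for the CLI and MCP entry points.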

* fix(mongodb): pass CI gates (dialect boundary, secrets, setup test)

Three latent failures in the connector surfaced once CI ran on the branch:

- connector.ts imported the concrete KtxMongoDbDialect, which the connector
  dialect-import boundary forbids. Route it through getDialectForDriver('mongodb')
  and widen inferKtxMongoCollectionColumns to the base KtxDialect (it only uses
  mapDataType/mapToDimensionType).
- detect-secrets flagged a test ObjectId hex and the mongodb+srv example URL;
  annotate both with allowlist pragmas.
- the "shows every supported database" setup test omitted the new MongoDB option.

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Luca Martial <48870843+luca-martial@users.noreply.github.com>
Co-authored-by: Luca Martial <lucamrtl@gmail.com>
Co-authored-by: Andrey Avtomonov <andreybavt@gmail.com>
Pintouch 2026-06-29 15:17:56 +02:00 committed by GitHub
parent 4f084186f1
commit 2afab61417
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
59 changed files with 1971 additions and 129 deletions

@ -8,6 +8,7 @@ const KTX_DATABASE_DRIVER_IDS = new Set([
'sqlserver',
'bigquery',
'snowflake',
+'mongodb',
]);
export function normalizeConnectionDriver(connection: KtxProjectConnectionConfig): string {

@ -1,6 +1,6 @@
import { BigQuery, type TableField } from '@google-cloud/bigquery';
import { normalizeBigQueryProjectId, normalizeBigQueryRegion } from '../../context/connections/bigquery-identifiers.js';
-import { getDialectForDriver } from '../../context/connections/dialects.js';
+import { getSqlDialectForDriver } from '../../context/connections/dialects.js';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { tryConstraintQuery } from '../../context/scan/constraint-discovery.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
@ -291,7 +291,7 @@ export class KtxBigQueryScanConnector implements KtxScanConnector {
private readonly now: () => Date;
private readonly maxBytesBilled?: number | string;
private readonly queryTimeoutMs?: number;
-private readonly dialect = getDialectForDriver('bigquery');
+private readonly dialect = getSqlDialectForDriver('bigquery');
private client: KtxBigQueryClient | null = null;
constructor(options: KtxBigQueryScanConnectorOptions) {

@ -1,4 +1,4 @@
-import type { KtxDialect } from '../../context/connections/dialects.js';
+import type { KtxSqlDialect } from '../../context/connections/dialects.js';
import {
columnDisplayPartCount,
formatDialectDisplayRef,
@ -11,7 +11,7 @@ import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/typ
type BigQueryTableNameRef = Pick<KtxTableRef, 'name'> & Partial<Pick<KtxTableRef, 'catalog' | 'db'>>;
/** @internal */
-export class KtxBigQueryDialect implements KtxDialect {
+export class KtxBigQueryDialect implements KtxSqlDialect {
readonly type = 'bigquery' as const;
private readonly typeMappings: Record<string, KtxSchemaDimensionType> = {

@ -1,5 +1,5 @@
import { createClient } from '@clickhouse/client';
-import { getDialectForDriver } from '../../context/connections/dialects.js';
+import { getSqlDialectForDriver } from '../../context/connections/dialects.js';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { connectorTestFailure, createKtxConnectorCapabilities, type KtxConnectorTestResult, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableRef, type KtxTableSampleInput, type KtxTableListEntry, type KtxTableSampleResult } from '../../context/scan/types.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
@ -284,7 +284,7 @@ export class KtxClickHouseScanConnector implements KtxScanConnector {
private readonly clientFactory: KtxClickHouseClientFactory;
private readonly endpointResolver?: KtxClickHouseEndpointResolver;
private readonly now: () => Date;
-private readonly dialect = getDialectForDriver('clickhouse');
+private readonly dialect = getSqlDialectForDriver('clickhouse');
private client: KtxClickHouseClient | null = null;
private resolvedEndpoint: KtxClickHouseResolvedEndpoint | null = null;

@ -1,4 +1,4 @@
-import type { KtxDialect } from '../../context/connections/dialects.js';
+import type { KtxSqlDialect } from '../../context/connections/dialects.js';
import {
columnDisplayPartCount,
formatDialectDisplayRef,
@ -11,7 +11,7 @@ import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/typ
type ClickHouseTableNameRef = Pick<KtxTableRef, 'name'> & Partial<Pick<KtxTableRef, 'catalog' | 'db'>>;
/** @internal */
-export class KtxClickHouseDialect implements KtxDialect {
+export class KtxClickHouseDialect implements KtxSqlDialect {
readonly type = 'clickhouse' as const;
private readonly typeMappings: Record<string, KtxSchemaDimensionType> = {

@ -0,0 +1,413 @@
import { MongoClient } from 'mongodb';
import { resolveKtxConfigReference } from '../../context/core/config-reference.js';
import {
connectorTestFailure,
createKtxConnectorCapabilities,
type KtxColumnSampleInput,
type KtxColumnSampleResult,
type KtxConnectorTestResult,
type KtxScanConnector,
type KtxScanContext,
type KtxScanInput,
type KtxSchemaSnapshot,
type KtxSchemaTable,
type KtxTableListEntry,
type KtxTableRef,
type KtxTableSampleInput,
type KtxTableSampleResult,
} from '../../context/scan/types.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import { getDialectForDriver } from '../../context/connections/dialects.js';
import { inferKtxMongoCollectionColumns, type KtxMongoDocument, MONGO_ID_FIELD } from './schema-inference.js';
const DEFAULT_SAMPLE_SIZE = 1000;
const SAMPLE_MAX_TIME_MS = 30_000;
export interface KtxMongoDbConnectionConfig {
driver?: string;
url?: string;
database?: string;
databases?: string[];
enabled_tables?: string[];
sample_size?: number;
order_by?: string;
[key: string]: unknown;
}
export interface KtxMongoListedCollection {
name: string;
type?: string;
}
interface KtxMongoFindOptions {
sort: Record<string, 1 | -1>;
limit: number;
projection?: Record<string, 1>;
}
/** Driver-agnostic seam over the `mongodb` client so the connector is unit-testable without a server. */
export interface KtxMongoClient {
listCollections(databaseName: string): Promise<KtxMongoListedCollection[]>;
estimatedDocumentCount(databaseName: string, collectionName: string): Promise<number>;
find(databaseName: string, collectionName: string, options: KtxMongoFindOptions): Promise<KtxMongoDocument[]>;
ping(databaseName: string): Promise<void>;
close(): Promise<void>;
}
export interface KtxMongoClientFactory {
create(url: string): KtxMongoClient;
}
export interface KtxMongoDbScanConnectorOptions {
connectionId: string;
connection: KtxMongoDbConnectionConfig | undefined;
clientFactory?: KtxMongoClientFactory;
env?: NodeJS.ProcessEnv;
now?: () => Date;
}
class DefaultMongoClient implements KtxMongoClient {
private readonly client: MongoClient;
private connected = false;
constructor(url: string) {
this.client = new MongoClient(url);
}
private async connectedClient(): Promise<MongoClient> {
if (!this.connected) {
await this.client.connect();
this.connected = true;
}
return this.client;
}
async listCollections(databaseName: string): Promise<KtxMongoListedCollection[]> {
const client = await this.connectedClient();
const collections = await client.db(databaseName).listCollections({}, { nameOnly: false }).toArray();
return collections.map((collection) => ({ name: collection.name, type: collection.type }));
}
async estimatedDocumentCount(databaseName: string, collectionName: string): Promise<number> {
const client = await this.connectedClient();
return client.db(databaseName).collection(collectionName).estimatedDocumentCount();
}
async find(
databaseName: string,
collectionName: string,
options: KtxMongoFindOptions,
): Promise<KtxMongoDocument[]> {
const client = await this.connectedClient();
return client
.db(databaseName)
.collection(collectionName)
.find({}, { sort: options.sort, limit: options.limit, maxTimeMS: SAMPLE_MAX_TIME_MS, ...(options.projection ? { projection: options.projection } : {}) })
.toArray() as Promise<KtxMongoDocument[]>;
}
async ping(databaseName: string): Promise<void> {
const client = await this.connectedClient();
await client.db(databaseName).command({ ping: 1 });
}
async close(): Promise<void> {
if (this.connected) {
await this.client.close();
this.connected = false;
}
}
}
class DefaultMongoClientFactory implements KtxMongoClientFactory {
create(url: string): KtxMongoClient {
return new DefaultMongoClient(url);
}
}
export function isKtxMongoDbConnectionConfig(
connection: KtxMongoDbConnectionConfig | undefined,
): connection is KtxMongoDbConnectionConfig {
return String(connection?.driver ?? '').toLowerCase() === 'mongodb';
}
function databaseFromUrl(url: string): string | undefined {
try {
const path = new URL(url).pathname.replace(/^\/+/, '');
const database = path.split('/')[0];
return database && database.length > 0 ? decodeURIComponent(database) : undefined;
} catch {
return undefined;
}
}
function configuredDatabases(connection: KtxMongoDbConnectionConfig, fallback: string | undefined): string[] {
if (Array.isArray(connection.databases)) {
const selected = connection.databases
.filter((database): database is string => typeof database === 'string' && database.trim().length > 0)
.map((database) => database.trim());
if (selected.length > 0) {
return [...new Set(selected)];
}
}
const single = typeof connection.database === 'string' && connection.database.trim().length > 0
? connection.database.trim()
: fallback;
return single ? [single] : [];
}
function positiveInteger(value: unknown, fallback: number): number {
return typeof value === 'number' && Number.isInteger(value) && value > 0 ? value : fallback;
}
function normalizeSampleValue(value: unknown): unknown {
if (value === null || value === undefined) {
return null;
}
if (value instanceof Date) {
return value.toISOString();
}
if (typeof value === 'object') {
const bsontype = (value as { _bsontype?: unknown })._bsontype;
return typeof bsontype === 'string' ? String(value) : JSON.stringify(value);
}
return value;
}
function unionDocumentKeys(documents: readonly KtxMongoDocument[]): string[] {
const keys: string[] = [];
const seen = new Set<string>();
for (const document of documents) {
for (const key of Object.keys(document)) {
if (!seen.has(key)) {
seen.add(key);
keys.push(key);
}
}
}
return keys;
}
export class KtxMongoDbScanConnector implements KtxScanConnector {
readonly id: string;
readonly driver = 'mongodb' as const;
readonly capabilities = createKtxConnectorCapabilities({
tableSampling: true,
columnSampling: true,
columnStats: false,
readOnlySql: false,
nestedAnalysis: true,
formalForeignKeys: false,
estimatedRowCounts: true,
});
private readonly connectionId: string;
private readonly connection: KtxMongoDbConnectionConfig;
private readonly url: string;
private readonly databases: string[];
private readonly sampleSize: number;
private readonly orderBy: string;
private readonly enabledTables: ReadonlySet<string> | null;
private readonly clientFactory: KtxMongoClientFactory;
private readonly now: () => Date;
private readonly dialect = getDialectForDriver('mongodb');
private client: KtxMongoClient | null = null;
constructor(options: KtxMongoDbScanConnectorOptions) {
const connection = options.connection ?? {};
const inputDriver = connection.driver ?? 'unknown';
if (!isKtxMongoDbConnectionConfig(connection)) {
throw new Error(`Native MongoDB connector cannot run driver "${inputDriver}"`);
}
const env = options.env ?? process.env;
const url = resolveKtxConfigReference(
typeof connection.url === 'string' ? connection.url.trim() : undefined,
env,
);
if (!url) {
throw new Error(`Native MongoDB connector requires connections.${options.connectionId}.url`);
}
const databases = configuredDatabases(connection, databaseFromUrl(url));
if (databases.length === 0) {
throw new Error(
`Native MongoDB connector requires connections.${options.connectionId}.databases (or a database in the URL)`,
);
}
const enabledTables = Array.isArray(connection.enabled_tables)
? new Set(
connection.enabled_tables
.filter((table): table is string => typeof table === 'string' && table.trim().length > 0)
.map((table) => table.trim()),
)
: null;
this.connectionId = options.connectionId;
this.connection = connection;
this.url = url;
this.databases = databases;
this.sampleSize = positiveInteger(connection.sample_size, DEFAULT_SAMPLE_SIZE);
this.orderBy = typeof connection.order_by === 'string' && connection.order_by.trim().length > 0
? connection.order_by.trim()
: MONGO_ID_FIELD;
this.enabledTables = enabledTables && enabledTables.size > 0 ? enabledTables : null;
this.clientFactory = options.clientFactory ?? new DefaultMongoClientFactory();
this.now = options.now ?? (() => new Date());
this.id = `mongodb:${options.connectionId}`;
}
async testConnection(): Promise<KtxConnectorTestResult> {
try {
await this.clientForQuery().ping(this.databases[0]!);
return { success: true };
} catch (error) {
return connectorTestFailure(error);
}
}
async introspect(input: KtxScanInput, _ctx: KtxScanContext): Promise<KtxSchemaSnapshot> {
this.assertConnection(input.connectionId);
const client = this.clientForQuery();
const tables: KtxSchemaTable[] = [];
for (const database of this.databases) {
const scopedNames = input.tableScope
? new Set(scopedTableNames(input.tableScope, { catalog: null, db: database }))
: null;
const collections = await client.listCollections(database);
for (const collection of collections) {
if (collection.name.startsWith('system.')) {
continue;
}
if (scopedNames && !scopedNames.has(collection.name)) {
continue;
}
if (this.enabledTables && !this.enabledTables.has(`${database}.${collection.name}`)) {
continue;
}
tables.push(await this.introspectCollection(client, database, collection));
}
}
return {
connectionId: this.connectionId,
driver: 'mongodb',
extractedAt: this.now().toISOString(),
scope: { schemas: this.databases },
metadata: {
databases: this.databases,
sample_size: this.sampleSize,
order_by: this.orderBy,
table_count: tables.length,
total_columns: tables.reduce((sum, table) => sum + table.columns.length, 0),
},
tables,
};
}
private async introspectCollection(
client: KtxMongoClient,
database: string,
collection: KtxMongoListedCollection,
): Promise<KtxSchemaTable> {
const isView = collection.type === 'view';
// estimatedDocumentCount issues a count command, which MongoDB rejects on a
// view (CommandNotSupportedOnView); only count real collections.
const estimatedRows = isView ? null : await client.estimatedDocumentCount(database, collection.name);
const documents = await client.find(database, collection.name, {
sort: { [this.orderBy]: -1 },
limit: this.sampleSize,
});
return {
catalog: null,
db: database,
name: collection.name,
kind: isView ? 'view' : 'table',
comment: null,
estimatedRows,
columns: inferKtxMongoCollectionColumns(documents, this.dialect),
foreignKeys: [],
};
}
async sampleTable(input: KtxTableSampleInput, _ctx: KtxScanContext): Promise<KtxTableSampleResult> {
this.assertConnection(input.connectionId);
const { database, collection } = this.resolveTableRef(input.table);
const documents = await this.clientForQuery().find(database, collection, {
sort: { [this.orderBy]: -1 },
limit: input.limit,
});
const headers = input.columns && input.columns.length > 0 ? input.columns : unionDocumentKeys(documents);
const rows = documents.map((document) => headers.map((header) => normalizeSampleValue(document[header])));
return { headers, rows, totalRows: documents.length };
}
async sampleColumn(input: KtxColumnSampleInput, _ctx: KtxScanContext): Promise<KtxColumnSampleResult> {
this.assertConnection(input.connectionId);
const { database, collection } = this.resolveTableRef(input.table);
const documents = await this.clientForQuery().find(database, collection, {
sort: { [this.orderBy]: -1 },
limit: input.limit,
projection: { [input.column]: 1 },
});
const values: unknown[] = [];
let nullCount = 0;
for (const document of documents) {
const value = document[input.column];
if (value === null || value === undefined) {
nullCount += 1;
continue;
}
values.push(normalizeSampleValue(value));
}
return { values, nullCount, distinctCount: null };
}
async listSchemas(): Promise<string[]> {
return [...this.databases];
}
async listTables(schemas?: string[]): Promise<KtxTableListEntry[]> {
const client = this.clientForQuery();
const databases = schemas && schemas.length > 0 ? schemas : this.databases;
const entries: KtxTableListEntry[] = [];
for (const database of databases) {
const collections = await client.listCollections(database);
for (const collection of collections) {
if (collection.name.startsWith('system.')) {
continue;
}
entries.push({
catalog: null,
schema: database,
name: collection.name,
kind: collection.type === 'view' ? 'view' : 'table',
});
}
}
return entries;
}
async cleanup(): Promise<void> {
if (this.client) {
await this.client.close();
this.client = null;
}
}
private resolveTableRef(table: KtxTableRef): { database: string; collection: string } {
return { database: table.db ?? this.databases[0]!, collection: table.name };
}
private clientForQuery(): KtxMongoClient {
if (!this.client) {
this.client = this.clientFactory.create(this.url);
}
return this.client;
}
private assertConnection(connectionId: string): void {
if (connectionId !== this.connectionId) {
throw new Error(`ktx MongoDB connector ${this.id} cannot serve connection ${connectionId}`);
}
}
}
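
The three filters applied inside `introspect` can be isolated as a pure function, which makes their interaction easier to see. This is a sketch with hypothetical names (`FilterInput`, `selectCollections`). The checks mirror the loop above, but the function is not part of the connector.

```typescript
interface FilterInput {
  database: string;
  collections: string[];
  // Per-run table scope, already resolved to collection names; null means "no scope".
  scopedNames: ReadonlySet<string> | null;
  // Connection-level allowlist of `<database>.<collection>` entries; null means "all".
  enabledTables: ReadonlySet<string> | null;
}

function selectCollections(input: FilterInput): string[] {
  return input.collections.filter((name) => {
    // Internal collections are never introspected.
    if (name.startsWith('system.')) return false;
    if (input.scopedNames && !input.scopedNames.has(name)) return false;
    if (input.enabledTables && !input.enabledTables.has(`${input.database}.${name}`)) return false;
    return true;
  });
}

const selected = selectCollections({
  database: 'shop',
  collections: ['orders', 'customers', 'system.views', 'audit_log'],
  scopedNames: null,
  enabledTables: new Set(['shop.orders', 'shop.customers', 'crm.audit_log']),
});
```

The allowlist is keyed by database, so `crm.audit_log` does not enable `audit_log` in `shop`.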

@ -0,0 +1,64 @@
import type { KtxDialect } from '../../context/connections/dialects.js';
import {
columnDisplayPartCount,
formatDialectDisplayRef,
parseDialectDisplayRef,
} from '../../context/connections/dialect-helpers.js';
import type { KtxDialectTableRef } from '../../context/connections/dialect-helpers.js';
import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/types.js';
/**
* Display/type-mapping half of the dialect contract for MongoDB. Collections map
* to `db.collection` display refs (ansi two-part shape). MongoDB is a non-SQL
* source, so it implements {@link KtxDialect} only, never {@link KtxSqlDialect}.
*/
export class KtxMongoDbDialect implements KtxDialect {
readonly type = 'mongodb' as const;
private readonly typeMappings: Record<string, KtxSchemaDimensionType> = {
objectid: 'string',
string: 'string',
uuid: 'string',
binary: 'string',
regex: 'string',
int: 'number',
long: 'number',
double: 'number',
decimal: 'number',
bool: 'boolean',
date: 'time',
timestamp: 'time',
object: 'string',
array: 'string',
json: 'string',
null: 'string',
mixed: 'string',
};
formatDisplayRef(table: KtxDialectTableRef): string {
return formatDialectDisplayRef(table, 'ansi');
}
parseDisplayRef(display: string): KtxTableRef | null {
return parseDialectDisplayRef(display, 'ansi');
}
columnDisplayTablePartCount(): 1 | 2 | 3 {
return columnDisplayPartCount('ansi');
}
mapDataType(nativeType: string): string {
const normalized = nativeType.toLowerCase().trim();
if (normalized === 'object' || normalized === 'array') {
return 'json';
}
return normalized || 'mixed';
}
mapToDimensionType(nativeType: string): KtxSchemaDimensionType {
if (!nativeType) {
return 'string';
}
return this.typeMappings[nativeType.toLowerCase().trim()] ?? 'string';
}
}
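
To make the two mappings concrete, here is a standalone sketch that mirrors their rules with a reduced lookup table. `normalizeType`, `dimensionType`, and the four-member `DimensionType` union are assumptions for illustration. The project's `KtxSchemaDimensionType` may define a different set.

```typescript
// Assumed dimension kinds; only the four used by the mappings above.
type DimensionType = 'string' | 'number' | 'boolean' | 'time';

// Reduced table: anything missing falls back to 'string'.
const DIMENSION_TYPES: Record<string, DimensionType | undefined> = {
  objectid: 'string',
  int: 'number',
  long: 'number',
  double: 'number',
  decimal: 'number',
  bool: 'boolean',
  date: 'time',
  timestamp: 'time',
};

// Sub-documents and arrays collapse to an opaque 'json' type; empty input becomes 'mixed'.
function normalizeType(nativeType: string): string {
  const normalized = nativeType.toLowerCase().trim();
  if (normalized === 'object' || normalized === 'array') return 'json';
  return normalized || 'mixed';
}

function dimensionType(nativeType: string): DimensionType {
  return DIMENSION_TYPES[nativeType.toLowerCase().trim()] ?? 'string';
}

const mapped = ['objectId', 'decimal', 'array', 'date', ''].map(
  (nativeType) => `${normalizeType(nativeType)}/${dimensionType(nativeType)}`,
);
```

Both lookups lowercase their input, which is why the camel-cased `objectId` produced by inference still resolves.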

@ -0,0 +1,44 @@
import type {
LiveDatabaseIntrospectionOptions,
LiveDatabaseIntrospectionPort,
} from '../../context/ingest/adapters/live-database/types.js';
import type { KtxProjectConnectionConfig } from '../../context/project/config.js';
import {
KtxMongoDbScanConnector,
type KtxMongoClientFactory,
type KtxMongoDbConnectionConfig,
} from './connector.js';
interface CreateMongoDbLiveDatabaseIntrospectionOptions {
connections: Record<string, KtxProjectConnectionConfig>;
clientFactory?: KtxMongoClientFactory;
now?: () => Date;
}
export function createMongoDbLiveDatabaseIntrospection(
options: CreateMongoDbLiveDatabaseIntrospectionOptions,
): LiveDatabaseIntrospectionPort {
return {
async extractSchema(connectionId: string, introspectionOptions?: LiveDatabaseIntrospectionOptions) {
const connection = options.connections[connectionId] as KtxMongoDbConnectionConfig | undefined;
const connector = new KtxMongoDbScanConnector({
connectionId,
connection,
clientFactory: options.clientFactory,
now: options.now,
});
try {
return await connector.introspect(
{
connectionId,
driver: 'mongodb',
...(introspectionOptions?.tableScope ? { tableScope: introspectionOptions.tableScope } : {}),
},
{ runId: `mongodb-${connectionId}` },
);
} finally {
await connector.cleanup();
}
},
};
}

@ -0,0 +1,129 @@
import type { KtxSchemaColumn } from '../../context/scan/types.js';
import type { KtxDialect } from '../../context/connections/dialects.js';
export type KtxMongoDocument = Record<string, unknown>;
/** Top-level field name MongoDB guarantees on every document; used as the primary key. */
export const MONGO_ID_FIELD = '_id';
const BSON_TYPE_NAMES: Record<string, string> = {
objectid: 'objectId',
int32: 'int',
long: 'long',
double: 'double',
decimal128: 'decimal',
binary: 'binary',
uuid: 'uuid',
timestamp: 'timestamp',
bsonregexp: 'regex',
bsonsymbol: 'string',
};
/**
* Canonical BSON type name for a sampled value as the `mongodb` driver hydrates
* it: BSON wrapper objects expose `_bsontype`; everything else maps from the JS
* runtime type. Sub-documents and arrays collapse to opaque `object`/`array`.
* @internal
*/
export function bsonTypeOf(value: unknown): string {
if (value === null || value === undefined) {
return 'null';
}
if (typeof value === 'string') {
return 'string';
}
if (typeof value === 'boolean') {
return 'bool';
}
if (typeof value === 'number') {
return Number.isInteger(value) ? 'int' : 'double';
}
if (typeof value === 'bigint') {
return 'long';
}
if (value instanceof Date) {
return 'date';
}
if (Array.isArray(value)) {
return 'array';
}
if (typeof value === 'object') {
const bsontype = (value as { _bsontype?: unknown })._bsontype;
if (typeof bsontype === 'string') {
return BSON_TYPE_NAMES[bsontype.toLowerCase()] ?? bsontype;
}
return 'object';
}
return 'mixed';
}
interface FieldAccumulator {
types: Set<string>;
present: number;
nullSeen: boolean;
}
function resolveNativeType(types: ReadonlySet<string>): string {
if (types.size === 0) {
return 'null';
}
if (types.size > 1) {
return 'mixed';
}
return [...types][0]!;
}
/**
* Infer a flat column schema from sampled documents. Each top-level field becomes
* one column: BSON types are unioned (a field seen with >1 type is `mixed` and
* treated as a string), nullability is derived from field presence and observed
* nulls, and sub-documents/arrays remain a single opaque `json` column. `_id` is
* the non-nullable primary key.
*/
export function inferKtxMongoCollectionColumns(
documents: readonly KtxMongoDocument[],
dialect: KtxDialect,
): KtxSchemaColumn[] {
const total = documents.length;
const order: string[] = [];
const fields = new Map<string, FieldAccumulator>();
for (const document of documents) {
if (!document || typeof document !== 'object') {
continue;
}
for (const [name, value] of Object.entries(document)) {
let accumulator = fields.get(name);
if (!accumulator) {
accumulator = { types: new Set(), present: 0, nullSeen: false };
fields.set(name, accumulator);
order.push(name);
}
accumulator.present += 1;
const bsonType = bsonTypeOf(value);
if (bsonType === 'null') {
accumulator.nullSeen = true;
} else {
accumulator.types.add(bsonType);
}
}
}
return order.map((name) => {
const accumulator = fields.get(name)!;
const nativeType = resolveNativeType(accumulator.types);
const isId = name === MONGO_ID_FIELD;
const nullable = isId
? false
: accumulator.present < total || accumulator.nullSeen || accumulator.types.size === 0;
return {
name,
nativeType,
normalizedType: dialect.mapDataType(nativeType),
dimensionType: dialect.mapToDimensionType(nativeType),
nullable,
primaryKey: isId,
comment: null,
};
});
}
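
The wrapper detection in `bsonTypeOf` relies only on a string `_bsontype` property, so it can be exercised without the `mongodb` package. The sketch below uses plain stand-in objects and a hypothetical `tagValue` helper with a two-entry name table. It also shows one consequence of classifying JS numbers: an integer-valued double is tagged `int`.

```typescript
// Reduced name table; unknown wrapper names pass through unchanged.
const WRAPPER_NAMES: Record<string, string | undefined> = {
  objectid: 'objectId',
  decimal128: 'decimal',
};

function tagValue(value: unknown): string {
  if (typeof value === 'number') return Number.isInteger(value) ? 'int' : 'double';
  if (typeof value === 'object' && value !== null) {
    const bsontype = (value as { _bsontype?: unknown })._bsontype;
    if (typeof bsontype === 'string') return WRAPPER_NAMES[bsontype.toLowerCase()] ?? bsontype;
    return 'object';
  }
  return typeof value;
}

// Plain stand-ins for driver wrapper objects (assumption: only `_bsontype` is inspected).
const fakeObjectId = { _bsontype: 'ObjectId' };
const fakeDecimal = { _bsontype: 'Decimal128' };

const tags = [fakeObjectId, fakeDecimal, { nested: true }, 5.0, 5.5].map(tagValue);
```

A field that stores whole-number doubles alongside fractional ones would therefore be seen with two types and resolve to `mixed`, which may be worth keeping in mind when reading inferred schemas.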

@ -1,5 +1,5 @@
import mysql, { type FieldPacket, type Pool, type RowDataPacket } from 'mysql2/promise';
-import { getDialectForDriver } from '../../context/connections/dialects.js';
+import { getSqlDialectForDriver } from '../../context/connections/dialects.js';
import { resolveStringReference } from '../shared/string-reference.js';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import {
@ -391,7 +391,7 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
private readonly poolFactory: KtxMysqlPoolFactory;
private readonly endpointResolver?: KtxMysqlEndpointResolver;
private readonly now: () => Date;
-private readonly dialect = getDialectForDriver('mysql');
+private readonly dialect = getSqlDialectForDriver('mysql');
private pool: KtxMysqlPool | null = null;
private resolvedEndpoint: KtxMysqlResolvedEndpoint | null = null;

@ -1,4 +1,4 @@
-import type { KtxDialect } from '../../context/connections/dialects.js';
+import type { KtxSqlDialect } from '../../context/connections/dialects.js';
import {
columnDisplayPartCount,
formatDialectDisplayRef,
@ -11,7 +11,7 @@ import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/typ
type MysqlTableNameRef = Pick<KtxTableRef, 'name'> & Partial<Pick<KtxTableRef, 'catalog' | 'db'>>;
/** @internal */
export class KtxMysqlDialect implements KtxDialect {
export class KtxMysqlDialect implements KtxSqlDialect {
readonly type = 'mysql' as const;
private readonly typeMappings: Record<string, KtxSchemaDimensionType> = {

@ -1,5 +1,5 @@
import { resolveStringReference } from '../shared/string-reference.js';
import { getDialectForDriver } from '../../context/connections/dialects.js';
import { getSqlDialectForDriver } from '../../context/connections/dialects.js';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { tryConstraintQuery } from '../../context/scan/constraint-discovery.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
@ -412,7 +412,7 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
private readonly poolFactory: KtxPostgresPoolFactory;
private readonly endpointResolver?: KtxPostgresEndpointResolver;
private readonly now: () => Date;
private readonly dialect = getDialectForDriver('postgres');
private readonly dialect = getSqlDialectForDriver('postgres');
private pool: KtxPostgresPool | null = null;
private lastIdlePoolError: Error | null = null;
private resolvedEndpoint: KtxPostgresResolvedEndpoint | null = null;

@ -1,4 +1,4 @@
import type { KtxDialect } from '../../context/connections/dialects.js';
import type { KtxSqlDialect } from '../../context/connections/dialects.js';
import {
columnDisplayPartCount,
formatDialectDisplayRef,
@ -11,7 +11,7 @@ import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/typ
type PostgresTableNameRef = Pick<KtxTableRef, 'name'> & Partial<Pick<KtxTableRef, 'catalog' | 'db'>>;
/** @internal */
export class KtxPostgresDialect implements KtxDialect {
export class KtxPostgresDialect implements KtxSqlDialect {
readonly type = 'postgres' as const;
private readonly typeMappings: Record<string, KtxSchemaDimensionType> = {

@ -1,5 +1,5 @@
import { createPrivateKey } from 'node:crypto';
import { getDialectForDriver } from '../../context/connections/dialects.js';
import { getSqlDialectForDriver } from '../../context/connections/dialects.js';
import { resolveStringReference } from '../shared/string-reference.js';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { tryConstraintQuery } from '../../context/scan/constraint-discovery.js';
@ -547,7 +547,7 @@ export class KtxSnowflakeScanConnector implements KtxScanConnector {
private readonly resolved: KtxSnowflakeResolvedConnectionConfig;
private readonly driverFactory: KtxSnowflakeDriverFactory;
private readonly dialect = getDialectForDriver('snowflake');
private readonly dialect = getSqlDialectForDriver('snowflake');
private readonly now: () => Date;
private driverInstance: KtxSnowflakeDriver | null = null;

@ -1,4 +1,4 @@
import type { KtxDialect } from '../../context/connections/dialects.js';
import type { KtxSqlDialect } from '../../context/connections/dialects.js';
import {
columnDisplayPartCount,
formatDialectDisplayRef,
@ -11,7 +11,7 @@ import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/typ
type SnowflakeTableNameRef = Pick<KtxTableRef, 'name'> & Partial<Pick<KtxTableRef, 'catalog' | 'db'>>;
/** @internal */
export class KtxSnowflakeDialect implements KtxDialect {
export class KtxSnowflakeDialect implements KtxSqlDialect {
readonly type = 'snowflake' as const;
private readonly typeMappings: Record<string, KtxSchemaDimensionType> = {

@ -3,7 +3,7 @@ import { existsSync, readFileSync, statSync } from 'node:fs';
import { homedir } from 'node:os';
import { isAbsolute, resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
import { getDialectForDriver } from '../../context/connections/dialects.js';
import { getSqlDialectForDriver } from '../../context/connections/dialects.js';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { normalizeQueryRows } from '../../context/connections/query-executor.js';
import { connectorTestFailure, createKtxConnectorCapabilities, type KtxConnectorTestResult, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaForeignKey, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableListEntry, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js';
@ -133,7 +133,7 @@ export class KtxSqliteScanConnector implements KtxScanConnector {
private readonly connectionId: string;
private readonly dbPath: string;
private readonly now: () => Date;
private readonly dialect = getDialectForDriver('sqlite');
private readonly dialect = getSqlDialectForDriver('sqlite');
private db: Database.Database | null = null;
constructor(options: KtxSqliteScanConnectorOptions) {

@ -1,4 +1,4 @@
import type { KtxDialect } from '../../context/connections/dialects.js';
import type { KtxSqlDialect } from '../../context/connections/dialects.js';
import {
columnDisplayPartCount,
formatDialectDisplayRef,
@ -11,7 +11,7 @@ import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/typ
type SqliteTableNameRef = Pick<KtxTableRef, 'name'> & Partial<Pick<KtxTableRef, 'catalog' | 'db'>>;
/** @internal */
export class KtxSqliteDialect implements KtxDialect {
export class KtxSqliteDialect implements KtxSqlDialect {
readonly type = 'sqlite' as const;
private readonly typeMappings: Record<string, KtxSchemaDimensionType> = {

@ -1,5 +1,5 @@
import { assertReadOnlySql, hoistLeadingCte, stripTrailingSqlNoise } from '../../context/connections/read-only-sql.js';
import { getDialectForDriver } from '../../context/connections/dialects.js';
import { getSqlDialectForDriver } from '../../context/connections/dialects.js';
import { tryConstraintQuery } from '../../context/scan/constraint-discovery.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import {
@ -353,7 +353,7 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
private readonly poolFactory: KtxSqlServerPoolFactory;
private readonly endpointResolver?: KtxSqlServerEndpointResolver;
private readonly now: () => Date;
private readonly dialect = getDialectForDriver('sqlserver');
private readonly dialect = getSqlDialectForDriver('sqlserver');
private pool: KtxSqlServerPool | null = null;
private resolvedEndpoint: KtxSqlServerResolvedEndpoint | null = null;

@ -1,4 +1,4 @@
import type { KtxDialect } from '../../context/connections/dialects.js';
import type { KtxSqlDialect } from '../../context/connections/dialects.js';
import {
columnDisplayPartCount,
formatDialectDisplayRef,
@ -11,7 +11,7 @@ import type { KtxSchemaDimensionType, KtxTableRef } from '../../context/scan/typ
type SqlServerTableNameRef = Pick<KtxTableRef, 'name'> & Partial<Pick<KtxTableRef, 'catalog' | 'db'>>;
/** @internal */
export class KtxSqlServerDialect implements KtxDialect {
export class KtxSqlServerDialect implements KtxSqlDialect {
readonly type = 'sqlserver' as const;
private readonly typeMappings: Record<string, KtxSchemaDimensionType> = {

@ -1,20 +1,38 @@
import { KtxBigQueryDialect } from '../../connectors/bigquery/dialect.js';
import { KtxClickHouseDialect } from '../../connectors/clickhouse/dialect.js';
import { KtxMongoDbDialect } from '../../connectors/mongodb/dialect.js';
import { KtxMysqlDialect } from '../../connectors/mysql/dialect.js';
import { KtxPostgresDialect } from '../../connectors/postgres/dialect.js';
import { KtxSqliteDialect } from '../../connectors/sqlite/dialect.js';
import { KtxSnowflakeDialect } from '../../connectors/snowflake/dialect.js';
import { KtxSqlServerDialect } from '../../connectors/sqlserver/dialect.js';
import { KtxExpectedError } from '../../errors.js';
import type { KtxConnectionDriver, KtxSchemaDimensionType, KtxTableRef } from '../scan/types.js';
import type { KtxDialectTableRef } from './dialect-helpers.js';
/**
* Driver-agnostic dialect surface every connection implements, including
* non-SQL sources like MongoDB: display/ref formatting and type mapping. The
* catalog and entity-details paths resolve this for any snapshot driver, so it
* must stay free of SQL generation.
*/
export interface KtxDialect {
readonly type: KtxConnectionDriver;
quoteIdentifier(identifier: string): string;
formatTableName(table: KtxDialectTableRef): string;
formatDisplayRef(table: KtxDialectTableRef): string;
parseDisplayRef(display: string): KtxTableRef | null;
columnDisplayTablePartCount(): 1 | 2 | 3;
mapToDimensionType(nativeType: string): KtxSchemaDimensionType;
mapDataType(nativeType: string): string;
}
/**
* SQL query generation, implemented only by SQL warehouse drivers. The relationship
* profiling/validation pipeline is the sole caller and is gated on the
* `readOnlySql` capability, so these methods are unreachable for a non-SQL source.
*/
export interface KtxSqlDialect extends KtxDialect {
quoteIdentifier(identifier: string): string;
formatTableName(table: KtxDialectTableRef): string;
getLimitOffsetClause(limit: number, offset?: number): string;
getTopClause(limit: number): string;
getRandomSampleFilter(samplePct: number): string;
@ -30,21 +48,11 @@ export interface KtxDialect {
getDistinctCountExpression(column: string): string;
textLengthExpression(columnSql: string): string;
castToText(columnSql: string): string;
mapToDimensionType(nativeType: string): KtxSchemaDimensionType;
mapDataType(nativeType: string): string;
}
const supportedDrivers: KtxConnectionDriver[] = [
'bigquery',
'clickhouse',
'mysql',
'postgres',
'sqlite',
'snowflake',
'sqlserver',
];
type KtxSqlDriver = Exclude<KtxConnectionDriver, 'mongodb'>;
const dialectFactories: Record<KtxConnectionDriver, () => KtxDialect> = {
const sqlDialectFactories: Record<KtxSqlDriver, () => KtxSqlDialect> = {
bigquery: () => new KtxBigQueryDialect(),
clickhouse: () => new KtxClickHouseDialect(),
mysql: () => new KtxMysqlDialect(),
@ -54,11 +62,54 @@ const dialectFactories: Record<KtxConnectionDriver, () => KtxDialect> = {
sqlserver: () => new KtxSqlServerDialect(),
};
const dialectFactories: Record<KtxConnectionDriver, () => KtxDialect> = {
...sqlDialectFactories,
mongodb: () => new KtxMongoDbDialect(),
};
const supportedSqlDrivers = Object.keys(sqlDialectFactories).sort();
export function getDialectForDriver(driver: string): KtxDialect {
const normalized = driver.toLowerCase().trim();
const factory = dialectFactories[normalized as KtxConnectionDriver];
if (factory) {
return factory();
}
throw new Error(`Unsupported warehouse driver "${driver}". Supported drivers: ${supportedDrivers.join(', ')}`);
throw new Error(
`Unsupported driver "${driver}". Supported drivers: ${Object.keys(dialectFactories).sort().join(', ')}`,
);
}
export function getSqlDialectForDriver(driver: string): KtxSqlDialect {
const normalized = driver.toLowerCase().trim();
const factory = sqlDialectFactories[normalized as KtxSqlDriver];
if (factory) {
return factory();
}
throw new Error(`Driver "${driver}" has no SQL dialect. SQL drivers: ${supportedSqlDrivers.join(', ')}`);
}
/**
* Whether a driver can generate and execute SQL. Single source of truth for the
* SQL/non-SQL boundary: a driver is SQL-queryable iff it has a SQL dialect, so
* non-SQL sources (e.g. mongodb) are excluded without a hand-maintained list.
*/
export function isSqlQueryableDriver(driver: string | undefined): boolean {
const normalized = (driver ?? '').toLowerCase().trim();
return Object.prototype.hasOwnProperty.call(sqlDialectFactories, normalized);
}
/**
* Refuse a non-SQL connection (e.g. mongodb) at a read-only-SQL entry point before
* any dialect selection or parser/daemon work, so it is never validated as Postgres.
* The federated `duckdb` connection has no driver, so callers skip this guard for it.
*/
export function assertSqlQueryableConnection(connectionId: string, driver: string | undefined): void {
if (!isSqlQueryableDriver(driver)) {
throw new KtxExpectedError(
`Connection '${connectionId}' uses the non-SQL driver '${driver ?? 'unknown'}'. ` +
'Read-only SQL (ktx sql, the sql_execution tool) requires a SQL warehouse connection; ' +
'MongoDB and other context-only sources are searchable and ingestable, not SQL-queryable.',
);
}
}
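The idea that the SQL factory map is the single source of truth for the SQL/non-SQL boundary can be sketched on its own. This is an illustration under stated assumptions: `Dialect`, `SqlDialect`, `sqlFactories`, `allFactories`, `isSqlQueryable`, and `sqlDialectFor` are reduced, hypothetical stand-ins for the names in this file, and the lookup reuses the own-property check, which is a choice of this example rather than something the diff does.

```typescript
// Reduced stand-ins: the real dialect interfaces carry many more methods.
interface Dialect {
  readonly type: string;
}
interface SqlDialect extends Dialect {
  quoteIdentifier(identifier: string): string;
}

const doubleQuote = (identifier: string): string => `"${identifier.replace(/"/g, '""')}"`;

// Only SQL drivers are registered here; membership in this map defines "SQL-queryable".
const sqlFactories: Record<string, () => SqlDialect> = {
  postgres: () => ({ type: 'postgres', quoteIdentifier: doubleQuote }),
  sqlite: () => ({ type: 'sqlite', quoteIdentifier: doubleQuote }),
};

// The full registry spreads the SQL map and adds non-SQL sources on top.
const allFactories: Record<string, () => Dialect> = {
  ...sqlFactories,
  mongodb: () => ({ type: 'mongodb' }),
};

function isSqlQueryable(driver: string | undefined): boolean {
  const normalized = (driver ?? '').toLowerCase().trim();
  // Own-property check: inherited keys such as "constructor" must not count as drivers.
  return Object.prototype.hasOwnProperty.call(sqlFactories, normalized);
}

function sqlDialectFor(driver: string): SqlDialect {
  const normalized = driver.toLowerCase().trim();
  const factory = isSqlQueryable(normalized) ? sqlFactories[normalized] : undefined;
  if (!factory) {
    const known = Object.keys(sqlFactories).sort().join(', ');
    throw new Error(`Driver "${driver}" has no SQL dialect. SQL drivers: ${known}`);
  }
  return factory();
}

console.log(isSqlQueryable('mongodb'), isSqlQueryable(' Postgres '));
console.log(allFactories['mongodb']?.().type);
```

Adding a new SQL driver to `sqlFactories` makes it SQL-queryable everywhere at once, and a new non-SQL source only needs an entry in the full registry.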

@ -68,6 +68,27 @@ export const driverRegistrations: Record<KtxConnectionDriver, KtxDriverRegistrat
};
},
},
mongodb: {
driver: 'mongodb',
scopeConfigKey: 'databases',
hasHistoricSqlReader: false,
load: async () => {
const m = await import('../../connectors/mongodb/connector.js');
return {
isConnectionConfig: (connection) => {
const typedConnection = connection as Parameters<typeof m.isKtxMongoDbConnectionConfig>[0];
return m.isKtxMongoDbConnectionConfig(typedConnection);
},
createScanConnector: ({ connectionId, connection }) => {
const typedConnection = connection as Parameters<typeof m.isKtxMongoDbConnectionConfig>[0];
if (!m.isKtxMongoDbConnectionConfig(typedConnection)) {
throw invalidConnectionConfig('mongodb');
}
return new m.KtxMongoDbScanConnector({ connectionId, connection: typedConnection });
},
};
},
},
mysql: {
driver: 'mysql',
scopeConfigKey: 'schemas',

@ -2,6 +2,7 @@ import type { KtxSqlQueryExecutorPort } from '../../context/connections/query-ex
import { KtxExpectedError, KtxQueryError, isNativeProgrammingFault } from '../../errors.js';
import { executeProjectReadOnlySql } from '../../context/connections/project-sql-executor.js';
import { FEDERATED_CONNECTION_ID, federatedConnectionListing } from '../../context/connections/federation.js';
import { assertSqlQueryableConnection } from '../../context/connections/dialects.js';
import { resolveConfiguredConnection } from '../../context/connections/resolve-connection.js';
import {
type LocalConnectionInfo,
@ -48,6 +49,9 @@ async function executeValidatedReadOnlySql(
const isFederated = input.connectionId === FEDERATED_CONNECTION_ID;
const connectionId = isFederated ? input.connectionId : assertSafeConnectionId(input.connectionId);
const connection = isFederated ? undefined : resolveConfiguredConnection(project.config, connectionId);
if (!isFederated) {
assertSqlQueryableConnection(connectionId, connection!.driver);
}
const dialect = sqlAnalysisDialectForDriver(isFederated ? 'duckdb' : connection!.driver);
const validation = await options.sqlAnalysis.validateReadOnly(input.sql, dialect);

@ -48,6 +48,41 @@ const warehouseConnectionSchemas = [
warehouseConnectionSchema('sqlserver'),
] as const;
const mongodbConnectionSchema = z
.looseObject({
driver: z.literal('mongodb'),
url: z
.string()
.min(1)
.describe(
'MongoDB connection string (mongodb:// or mongodb+srv://, including TLS/Atlas); may contain a reference like env:MONGO_URL.',
),
database: z.string().min(1).optional().describe('Single database to introspect when not using databases or a URL path.'),
databases: z
.array(z.string().min(1))
.optional()
.describe('Databases whose collections ktx introspects as tables. Falls back to the URL path database.'),
enabled_tables: z
.array(z.string().min(1))
.optional()
.describe('Optional allowlist of "database.collection" names to introspect.'),
sample_size: z
.number()
.int()
.min(1)
.optional()
.describe('How many recent documents to sample per collection when inferring the schema (default 1000).'),
order_by: z
.string()
.min(1)
.optional()
.describe(
'Field to sort by descending when sampling. Defaults to _id; set this when _id is not an ObjectId. ' +
'Should be indexed — an unindexed sort hits MongoDB\'s in-memory sort limit on large collections.',
),
})
.describe('MongoDB primary-source connection. Schema is inferred by sampling the most recent documents.');
const positiveIntKeyMessage = (field: string) => `${field} keys must be positive-integer strings (e.g. "1", "42")`;
const positiveIntKeyRegex = /^[1-9]\d*$/;
@ -210,6 +245,7 @@ const metricflowConnectionSchema = z
export const connectionConfigSchema = z.discriminatedUnion('driver', [
...warehouseConnectionSchemas,
mongodbConnectionSchema,
metabaseConnectionSchema,
lookerConnectionSchema,
lookmlConnectionSchema,
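For orientation, a connection entry that would satisfy the field constraints described in `mongodbConnectionSchema` might look like the sketch below. Everything here is illustrative: the interface is a hand-written mirror of the schema fields, `checkMongoConnection` is a hypothetical helper that restates the constraints as plain checks instead of using zod, and the database, collection, and field names are invented.

```typescript
// Hand-written mirror of the fields in mongodbConnectionSchema. The real schema is a
// zod looseObject, so additional keys are tolerated there.
interface MongoConnectionConfig {
  driver: 'mongodb';
  url: string;
  database?: string;
  databases?: string[];
  enabled_tables?: string[];
  sample_size?: number;
  order_by?: string;
}

// Hypothetical values: "shop", "orders", "customers", and "createdAt" are invented.
const exampleConnection: MongoConnectionConfig = {
  driver: 'mongodb',
  url: 'env:MONGO_URL', // reference form mentioned in the schema description
  databases: ['shop'],
  enabled_tables: ['shop.orders', 'shop.customers'], // "database.collection" allowlist
  sample_size: 500, // omitted means the documented default of 1000
  order_by: 'createdAt', // sorted descending; should be an indexed field
};

// Hypothetical helper restating the schema's constraints as plain checks.
function checkMongoConnection(config: MongoConnectionConfig): string[] {
  const problems: string[] = [];
  if (config.url.length < 1) problems.push('url must be a non-empty string');
  if (config.database !== undefined && config.database.length < 1) {
    problems.push('database must be a non-empty string');
  }
  for (const key of ['databases', 'enabled_tables'] as const) {
    for (const entry of config[key] ?? []) {
      if (entry.length < 1) problems.push(`${key} entries must be non-empty strings`);
    }
  }
  if (config.sample_size !== undefined && (!Number.isInteger(config.sample_size) || config.sample_size < 1)) {
    problems.push('sample_size must be an integer >= 1');
  }
  if (config.order_by !== undefined && config.order_by.length < 1) {
    problems.push('order_by must be a non-empty string');
  }
  return problems;
}

console.log(checkMongoConnection(exampleConnection));
```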

@ -1,6 +1,6 @@
import pLimit from 'p-limit';
import type { KtxLlmRuntimePort } from '../../context/llm/runtime-port.js';
import { getDialectForDriver } from '../connections/dialects.js';
import { getSqlDialectForDriver } from '../connections/dialects.js';
import { buildDefaultKtxProjectConfig, type KtxScanRelationshipConfig } from '../project/config.js';
import { KtxDescriptionGenerator } from './description-generation.js';
import { buildKtxColumnEmbeddingText } from './embedding-text.js';
@ -486,7 +486,9 @@ export async function runLocalScanEnrichment(
snapshot,
connectionId: input.connectionId,
});
const dialect = getDialectForDriver(snapshot.driver);
const dialect = input.connector.capabilities.readOnlySql
? getSqlDialectForDriver(snapshot.driver)
: null;
const now = input.now ?? (() => new Date());
const state = completedKtxScanEnrichmentStateSummary();
const syncId = input.syncId ?? input.context.runId;

@ -131,12 +131,13 @@ function normalizeDriver(driver: string | undefined): KtxConnectionDriver {
normalized === 'clickhouse' ||
normalized === 'sqlserver' ||
normalized === 'bigquery' ||
normalized === 'snowflake'
normalized === 'snowflake' ||
normalized === 'mongodb'
) {
return normalized;
}
throw new Error(
`Standalone ktx scan supports postgres/sqlite/mysql/clickhouse/sqlserver/bigquery/snowflake in this phase, received "${driver ?? 'unknown'}"`,
`Standalone ktx scan supports postgres/sqlite/mysql/clickhouse/sqlserver/bigquery/snowflake/mongodb in this phase, received "${driver ?? 'unknown'}"`,
);
}

@ -6,7 +6,7 @@ import { gunzipSync } from 'node:zlib';
import Database from 'better-sqlite3';
import YAML from 'yaml';
import { z } from 'zod';
import { getDialectForDriver } from '../connections/dialects.js';
import { getSqlDialectForDriver } from '../connections/dialects.js';
import type { KtxLlmRuntimePort } from '../llm/runtime-port.js';
import type { KtxEnrichedRelationship, KtxEnrichedSchema, KtxRelationshipType } from './enrichment-types.js';
import { snapshotToKtxEnrichedSchema } from './local-enrichment.js';
@ -537,7 +537,7 @@ export function ktxRelationshipBenchmarkDetectorWithLlm(
const formalLinks = formalMetadata.accepted.map((relationship) => relationshipToBenchmarkLink(relationship));
const acceptedKeys = new Set(formalLinks.map(fkKey));
const sqliteDataAvailable = Boolean(input.dataPath && input.snapshot.driver === 'sqlite');
const dialect = getDialectForDriver(input.snapshot.driver);
const dialect = getSqlDialectForDriver(input.snapshot.driver);
const profilingExecutor =
sqliteDataAvailable && input.mode !== 'profiling_disabled'
? new KtxRelationshipBenchmarkSqliteExecutor(input.dataPath as string)
@ -552,6 +552,7 @@ export function ktxRelationshipBenchmarkDetectorWithLlm(
})
: await profileKtxRelationshipSchema({
connectionId: input.snapshot.connectionId,
driver: input.snapshot.driver,
dialect,
schema: input.schema,
executor: profilingExecutor,
@ -673,7 +674,7 @@ export function currentKtxRelationshipBenchmarkDetector(): KtxRelationshipBenchm
const formalLinks = formalMetadata.accepted.map((relationship) => relationshipToBenchmarkLink(relationship));
const acceptedKeys = new Set(formalLinks.map(fkKey));
const sqliteDataAvailable = Boolean(input.dataPath && input.snapshot.driver === 'sqlite');
const dialect = getDialectForDriver(input.snapshot.driver);
const dialect = getSqlDialectForDriver(input.snapshot.driver);
const profilingExecutor =
sqliteDataAvailable && input.mode !== 'profiling_disabled'
? new KtxRelationshipBenchmarkSqliteExecutor(input.dataPath as string)
@ -688,6 +689,7 @@ export function currentKtxRelationshipBenchmarkDetector(): KtxRelationshipBenchm
})
: await profileKtxRelationshipSchema({
connectionId: input.snapshot.connectionId,
driver: input.snapshot.driver,
dialect,
schema: input.schema,
executor: profilingExecutor,

@ -1,4 +1,4 @@
import type { KtxDialect } from '../connections/dialects.js';
import type { KtxSqlDialect } from '../connections/dialects.js';
import type { KtxEnrichedColumn, KtxEnrichedSchema, KtxEnrichedTable, KtxRelationshipType } from './enrichment-types.js';
import {
type KtxRelationshipProfileArtifact,
@ -56,7 +56,7 @@ export interface KtxCompositeRelationshipCandidate {
export interface DiscoverKtxCompositeRelationshipsInput {
connectionId: string;
dialect: KtxDialect;
dialect: KtxSqlDialect;
schema: KtxEnrichedSchema;
profiles: KtxRelationshipProfileArtifact;
executor: KtxRelationshipReadOnlyExecutor | null;
@ -227,11 +227,11 @@ function sqlSuffix(fragment: string): string {
return fragment ? ` ${fragment}` : '';
}
function aliasedTupleSelect(dialect: KtxDialect, columns: readonly string[]): string {
function aliasedTupleSelect(dialect: KtxSqlDialect, columns: readonly string[]): string {
return columns.map((column, index) => `${dialect.quoteIdentifier(column)} AS c${index}`).join(', ');
}
function nonNullPredicate(dialect: KtxDialect, columns: readonly string[]): string {
function nonNullPredicate(dialect: KtxSqlDialect, columns: readonly string[]): string {
return columns.map((column) => `${dialect.quoteIdentifier(column)} IS NOT NULL`).join(' AND ');
}
@ -242,7 +242,7 @@ function tupleEquality(columns: number): string {
}
function buildTupleDistinctSql(input: {
dialect: KtxDialect;
dialect: KtxSqlDialect;
table: KtxTableRef;
columns: readonly string[];
}): string {
@ -257,7 +257,7 @@ function buildTupleDistinctSql(input: {
}
function buildCompositeCoverageSql(input: {
dialect: KtxDialect;
dialect: KtxSqlDialect;
childTable: KtxTableRef;
childColumns: readonly string[];
parentTable: KtxTableRef;
@ -322,7 +322,7 @@ function hasAcceptedSubset(
async function detectCompositePrimaryKeys(input: {
connectionId: string;
dialect: KtxDialect;
dialect: KtxSqlDialect;
table: KtxEnrichedTable;
profiles: KtxRelationshipProfileArtifact;
executor: KtxRelationshipReadOnlyExecutor;
@ -426,7 +426,7 @@ function compatibleTuple(sourceColumns: readonly KtxEnrichedColumn[], targetColu
async function validateCompositeRelationship(input: {
connectionId: string;
dialect: KtxDialect;
dialect: KtxSqlDialect;
sourceTable: KtxEnrichedTable;
sourceColumns: readonly KtxEnrichedColumn[];
targetKey: KtxCompositePrimaryKeyCandidate;

@ -1,5 +1,5 @@
import type { KtxLlmRuntimePort } from '../../context/llm/runtime-port.js';
import type { KtxDialect } from '../connections/dialects.js';
import type { KtxSqlDialect } from '../connections/dialects.js';
import type { KtxScanRelationshipConfig } from '../project/config.js';
import type { KtxEnrichedRelationship, KtxEnrichedSchema, KtxRelationshipUpdate } from './enrichment-types.js';
import {
@ -34,7 +34,7 @@ import type {
export interface DiscoverKtxRelationshipsInput {
connectionId: string;
dialect: KtxDialect;
dialect: KtxSqlDialect | null;
connector: KtxScanConnector;
schema: KtxEnrichedSchema;
context: KtxScanContext;
@ -122,20 +122,21 @@ function compositeSummary(relationships: readonly KtxCompositeRelationshipCandid
async function detectCompositeRelationships(input: {
connectionId: string;
dialect: KtxDialect;
dialect: KtxSqlDialect | null;
schema: KtxEnrichedSchema;
profile: KtxRelationshipProfileArtifact;
executor: KtxRelationshipReadOnlyExecutor | null;
context: DiscoverKtxRelationshipsInput['context'];
warnings: KtxScanWarning[];
}): Promise<KtxCompositeRelationshipCandidate[]> {
if (!input.executor || !input.profile.sqlAvailable) {
if (!input.executor || !input.profile.sqlAvailable || !input.dialect) {
return [];
}
const dialect = input.dialect;
try {
const compositeDetection = await discoverKtxCompositeRelationships({
connectionId: input.connectionId,
dialect: input.dialect,
dialect,
schema: input.schema,
profiles: input.profile,
executor: input.executor,
@ -223,6 +224,7 @@ export async function discoverKtxRelationships(
const profileCache = createKtxRelationshipProfileCache();
const profile = await profileKtxRelationshipSchema({
connectionId: input.connectionId,
driver: input.connector.driver,
dialect: input.dialect,
schema: input.schema,
executor,

@ -1,4 +1,4 @@
import type { KtxDialect } from '../connections/dialects.js';
import type { KtxSqlDialect } from '../connections/dialects.js';
import type { KtxEnrichedColumn, KtxEnrichedSchema, KtxEnrichedTable } from './enrichment-types.js';
import { mapWithConcurrency } from './relationship-validation.js';
import type {
@ -56,7 +56,8 @@ export interface KtxRelationshipProfileCache {
export interface ProfileKtxRelationshipSchemaInput {
connectionId: string;
dialect: KtxDialect;
driver: KtxConnectionDriver;
dialect: KtxSqlDialect | null;
schema: KtxEnrichedSchema;
executor: KtxRelationshipReadOnlyExecutor | null;
ctx: KtxScanContext;
@ -123,7 +124,7 @@ function columnKey(table: KtxEnrichedTable, column: KtxEnrichedColumn): string {
function tableProfileCacheKey(input: {
connectionId: string;
dialect: KtxDialect;
dialect: KtxSqlDialect;
ctx: KtxScanContext;
table: KtxTableRef;
sampleValuesPerColumn: number;
@ -149,7 +150,7 @@ function sqlSuffix(fragment: string): string {
return fragment ? ` ${fragment}` : '';
}
function sampledTableSql(dialect: KtxDialect, tableSql: string, limit: number): string {
function sampledTableSql(dialect: KtxSqlDialect, tableSql: string, limit: number): string {
const top = dialect.getTopClause(limit);
if (top) {
return `(SELECT ${top} * FROM ${tableSql}) AS relationship_profile_sample`;
@ -158,7 +159,7 @@ function sampledTableSql(dialect: KtxDialect, tableSql: string, limit: number):
}
function sampleValuesSql(input: {
dialect: KtxDialect;
dialect: KtxSqlDialect;
tableSql: string;
columnSql: string;
limit: number;
@ -175,7 +176,7 @@ function sampleValuesSql(input: {
}
function columnProfileSelectSql(input: {
dialect: KtxDialect;
dialect: KtxSqlDialect;
tableSql: string;
profileTableSql: string;
column: KtxEnrichedColumn;
@ -218,7 +219,7 @@ function splitSampleValues(value: unknown): string[] {
async function queryCount(input: {
connectionId: string;
dialect: KtxDialect;
dialect: KtxSqlDialect;
table: KtxTableRef;
executor: KtxRelationshipReadOnlyExecutor;
ctx: KtxScanContext;
@ -233,7 +234,7 @@ async function queryCount(input: {
async function queryTableProfile(input: {
connectionId: string;
dialect: KtxDialect;
dialect: KtxSqlDialect;
table: KtxEnrichedTable;
executor: KtxRelationshipReadOnlyExecutor;
ctx: KtxScanContext;
@ -320,10 +321,10 @@ type TableProfileResult =
export async function profileKtxRelationshipSchema(
input: ProfileKtxRelationshipSchemaInput,
): Promise<KtxRelationshipProfileArtifact> {
if (!input.executor) {
if (!input.executor || !input.dialect) {
return {
connectionId: input.connectionId,
driver: input.dialect.type,
driver: input.driver,
sqlAvailable: false,
queryCount: 0,
tables: [],
@ -337,6 +338,7 @@ export async function profileKtxRelationshipSchema(
const columns: Record<string, KtxRelationshipColumnProfile> = {};
const warnings: string[] = [];
const executor = input.executor;
const dialect = input.dialect;
const enabledTables = input.schema.tables.filter((candidate) => candidate.enabled);
const tableResults = await mapWithConcurrency<KtxEnrichedTable, TableProfileResult>(
@ -347,7 +349,7 @@ export async function profileKtxRelationshipSchema(
const profileSampleRows = input.profileSampleRows ?? 10000;
const cacheKey = tableProfileCacheKey({
connectionId: input.connectionId,
dialect: input.dialect,
dialect,
ctx: input.ctx,
table: table.ref,
sampleValuesPerColumn,
@ -361,7 +363,7 @@ export async function profileKtxRelationshipSchema(
try {
const tableProfile = await queryTableProfile({
connectionId: input.connectionId,
dialect: input.dialect,
dialect,
table,
executor,
ctx: input.ctx,
@ -403,7 +405,7 @@ export async function profileKtxRelationshipSchema(
return {
connectionId: input.connectionId,
driver: input.dialect.type,
driver: input.driver,
sqlAvailable: true,
queryCount: queryTotal,
tables,

@ -1,4 +1,4 @@
import type { KtxDialect } from '../connections/dialects.js';
import type { KtxSqlDialect } from '../connections/dialects.js';
import type { KtxRelationshipEndpoint } from './enrichment-types.js';
import { applyKtxRelationshipValidationBudget, type KtxRelationshipValidationBudget } from './relationship-budget.js';
import type { KtxRelationshipDiscoveryCandidate } from './relationship-candidates.js';
@ -44,7 +44,7 @@ export interface KtxValidatedRelationshipDiscoveryCandidate
export interface ValidateKtxRelationshipDiscoveryCandidatesInput {
connectionId: string;
dialect: KtxDialect;
dialect: KtxSqlDialect | null;
candidates: readonly KtxRelationshipDiscoveryCandidate[];
profiles: KtxRelationshipProfileArtifact;
executor: KtxRelationshipReadOnlyExecutor | null;
@ -108,7 +108,7 @@ function sqlSuffix(fragment: string): string {
}
function buildCoverageSql(input: {
dialect: KtxDialect;
dialect: KtxSqlDialect;
childTable: KtxTableRef;
childColumn: string;
parentTable: KtxTableRef;
@ -237,13 +237,14 @@ export async function validateKtxRelationshipDiscoveryCandidates(
input: ValidateKtxRelationshipDiscoveryCandidatesInput,
): Promise<KtxValidatedRelationshipDiscoveryCandidate[]> {
const settings = mergeSettings(input.settings);
if (!input.executor || !input.profiles.sqlAvailable) {
if (!input.executor || !input.profiles.sqlAvailable || !input.dialect) {
return input.candidates.map((candidate) =>
reviewWithoutValidation(candidate, input.profiles, 'validation_unavailable'),
);
}
const executor = input.executor;
const dialect = input.dialect;
async function validateCandidate(
candidate: KtxRelationshipDiscoveryCandidate,
@ -260,7 +261,7 @@ export async function validateKtxRelationshipDiscoveryCandidates(
{
connectionId: input.connectionId,
sql: buildCoverageSql({
dialect: input.dialect,
dialect,
childTable: candidate.from.table,
childColumn: sourceColumn,
parentTable: candidate.to.table,
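The nullable-dialect pattern threaded through the relationship pipeline can be shown in a reduced form. This sketch makes several assumptions: it is synchronous for brevity (the real functions are async), `Candidate`, `Reviewed`, `CoverageExecutor`, and `validateCandidates` are hypothetical names, the SQL string is a placeholder rather than the real coverage query, and the 0.9 acceptance threshold is invented.

```typescript
interface SqlDialect {
  quoteIdentifier(identifier: string): string;
}
interface Candidate {
  fromColumn: string;
  toColumn: string;
}
interface Reviewed extends Candidate {
  status: 'review' | 'accepted';
  reason?: string;
}
// Hypothetical executor: takes SQL and returns a coverage ratio between 0 and 1.
type CoverageExecutor = (sql: string) => number;

function validateCandidates(input: {
  dialect: SqlDialect | null;
  executor: CoverageExecutor | null;
  sqlAvailable: boolean;
  candidates: readonly Candidate[];
}): Reviewed[] {
  // A non-SQL source arrives with dialect === null: no SQL is generated and every
  // candidate stays in review instead of being validated.
  if (!input.executor || !input.sqlAvailable || !input.dialect) {
    return input.candidates.map(
      (candidate): Reviewed => ({ ...candidate, status: 'review', reason: 'validation_unavailable' }),
    );
  }
  // Bind the narrowed values to locals so the closure below sees non-null types.
  const executor = input.executor;
  const dialect = input.dialect;
  return input.candidates.map((candidate): Reviewed => {
    // Placeholder SQL only; the real pipeline builds a coverage query here.
    const sql = `SELECT ${dialect.quoteIdentifier(candidate.fromColumn)}`;
    const coverage = executor(sql);
    return coverage >= 0.9 // invented threshold
      ? { ...candidate, status: 'accepted' }
      : { ...candidate, status: 'review', reason: 'low_coverage' };
  });
}

const candidates: Candidate[] = [{ fromColumn: 'customer_id', toColumn: 'id' }];
const executedSql: string[] = [];
const fakeExecutor: CoverageExecutor = (sql) => {
  executedSql.push(sql);
  return 1;
};

const mongoResult = validateCandidates({ dialect: null, executor: fakeExecutor, sqlAvailable: true, candidates });
const sqlResult = validateCandidates({
  dialect: { quoteIdentifier: (identifier) => `"${identifier}"` },
  executor: fakeExecutor,
  sqlAvailable: true,
  candidates,
});
console.log(mongoResult, sqlResult, executedSql);
```

Copying `input.dialect` into a local after the guard mirrors what the diff does: TypeScript does not carry property narrowing into nested callbacks, so the local keeps the non-null type without assertions.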

@ -7,7 +7,8 @@ export type KtxConnectionDriver =
| 'bigquery'
| 'snowflake'
| 'mysql'
| 'clickhouse';
| 'clickhouse'
| 'mongodb';
export type KtxScanMode = 'structural' | 'relationships' | 'enriched';

@ -2,6 +2,7 @@ import type { KtxSqlQueryExecutorPort } from '../../context/connections/query-ex
import type { KtxSemanticLayerComputePort } from '../../context/daemon/semantic-layer-compute.js';
import type { KtxMcpProgressCallback } from '../mcp/types.js';
import type { KtxLocalProject } from '../../context/project/project.js';
import { isSqlQueryableDriver } from '../connections/dialects.js';
import { FEDERATED_CONNECTION_ID } from '../connections/federation.js';
import { resolveRequiredConnectionId } from '../connections/resolve-connection.js';
import { sqlAnalysisDialectForDriver } from '../sql-analysis/dialect.js';
@ -83,6 +84,12 @@ export async function compileLocalSlQuery(
await options.onProgress?.({ progress: 0, message: 'Compiling query' });
const connectionId = resolveLocalConnectionId(project, options.connectionId);
const driver = project.config.connections[connectionId]?.driver;
if (!isSqlQueryableDriver(driver)) {
throw new Error(
`Semantic-layer queries require a SQL warehouse connection; '${connectionId}' uses the non-SQL driver ` +
`'${driver ?? 'unknown'}'. MongoDB and other context-only sources are searchable and ingestable, not SL-queryable.`,
);
}
const dialect = sqlAnalysisDialectForDriver(driver);
const sources = await loadComputableSources(project, connectionId);

@ -122,6 +122,17 @@ function createKtxCliLiveDatabaseIntrospection(
return {
async extractSchema(connectionId: string, options?: LiveDatabaseIntrospectionOptions) {
const connection = project.config.connections[connectionId];
if (String(connection?.driver ?? '').toLowerCase() === 'mongodb') {
const { createMongoDbLiveDatabaseIntrospection } = await import('./connectors/mongodb/live-database-introspection.js');
const { isKtxMongoDbConnectionConfig } = await import('./connectors/mongodb/connector.js');
if (!isKtxMongoDbConnectionConfig(connection)) {
return daemon.extractSchema(connectionId, options);
}
const mongodb = createMongoDbLiveDatabaseIntrospection({
connections: project.config.connections,
});
return mongodb.extractSchema(connectionId, options);
}
if (isKtxPostgresConnectionConfig(connection)) {
return postgres.extractSchema(connectionId, options);
}


@@ -69,7 +69,8 @@ export type KtxSetupDatabaseDriver =
| 'clickhouse'
| 'sqlserver'
| 'bigquery'
- | 'snowflake';
+ | 'snowflake'
+ | 'mongodb';
export interface KtxSetupDatabasesArgs {
projectDir: string;
@@ -156,6 +157,7 @@ const DRIVER_OPTIONS: Array<{ value: KtxSetupDatabaseDriver; label: string }> =
{ value: 'mysql', label: 'MySQL' },
{ value: 'clickhouse', label: 'ClickHouse' },
{ value: 'sqlserver', label: 'SQL Server' },
{ value: 'mongodb', label: 'MongoDB' },
{ value: 'sqlite', label: 'SQLite' },
];
@@ -178,6 +180,7 @@ const DEFAULT_CONNECTION_IDS: Record<KtxSetupDatabaseDriver, string> = {
sqlserver: 'sqlserver-warehouse',
bigquery: 'bigquery-warehouse',
snowflake: 'snowflake-warehouse',
mongodb: 'mongodb-source',
};
interface ScopeDiscoverySpec {
@@ -231,6 +234,13 @@ const SCOPE_DISCOVERY_SPECS: Partial<Record<KtxSetupDatabaseDriver, ScopeDiscove
configArrayField: 'databases',
suggest: defaultSuggest,
},
mongodb: {
noun: 'database',
nounPlural: 'databases',
promptLabel: 'MongoDB databases',
configArrayField: 'databases',
suggest: defaultSuggest,
},
sqlserver: {
noun: 'schema',
nounPlural: 'schemas',
@@ -748,6 +758,41 @@ async function buildUrlConnectionConfig(input: {
}
}
async function buildMongoConnectionConfig(input: {
connectionId: string;
args: KtxSetupDatabasesArgs;
prompts: KtxSetupDatabasesPromptAdapter;
existingConnection?: KtxProjectConnectionConfig;
}): Promise<KtxProjectConnectionConfig | null | 'back'> {
if (input.args.inputMode === 'disabled' && !input.args.databaseUrl) return null;
const rawUrl =
input.args.databaseUrl ??
(await promptText(
input.prompts,
'MongoDB connection URL\nFor example mongodb+srv://user:pass@cluster.example.net/app.', // pragma: allowlist secret
stringConfigField(input.existingConnection, 'url'),
));
if (rawUrl === undefined) return 'back';
if (!rawUrl) return null;
const url = normalizeInputReference(rawUrl);
const scope = scriptedScopeConfigForDriver('mongodb', input.args.databaseSchemas);
if (url.startsWith('env:') || url.startsWith('file:')) {
return { driver: 'mongodb', url, ...scope };
}
if (urlHasCredentials(url)) {
const ref = await writeProjectLocalSecretReference({
projectDir: input.args.projectDir,
fileName: `${input.connectionId}-url`,
value: url,
});
return { driver: 'mongodb', url: ref, ...scope };
}
return { driver: 'mongodb', url, ...scope };
}
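`buildMongoConnectionConfig` stores the URL in one of three ways: as an existing `env:`/`file:` reference, as a project-local secret reference when the URL embeds credentials, or inline otherwise. The sketch below illustrates that decision in isolation. `urlHasCredentialsSketch` and `classifyMongoUrlStorage` are hypothetical names, and the credential check is an assumed implementation, not the project's `urlHasCredentials`. It uses a regular expression rather than `new URL(...)` because multi-host MongoDB URLs (for example `host1:27017,host2:27017`) do not parse as WHATWG URLs.

```typescript
// Hypothetical sketch: classifies how a MongoDB URL would be persisted.
// The three outcomes mirror the three return branches in the function above.
type UrlStorage = 'reference' | 'local-secret' | 'inline';

// Assumed implementation: a URL carries credentials when its authority
// component (between "scheme://" and the first "/", "?" or "#") contains
// a non-empty userinfo part before "@". Reserved characters in the
// username or password are expected to be percent-encoded.
function urlHasCredentialsSketch(url: string): boolean {
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^/?#]*)/i.exec(url);
  const authority = match?.[1] ?? '';
  return authority.lastIndexOf('@') > 0;
}

function classifyMongoUrlStorage(url: string): UrlStorage {
  // Already an indirection: keep the reference as-is.
  if (url.startsWith('env:') || url.startsWith('file:')) return 'reference';
  // Credentials present: move the value out of the project config.
  return urlHasCredentialsSketch(url) ? 'local-secret' : 'inline';
}
```

Under these assumptions, `classifyMongoUrlStorage('env:MONGO_URL')` yields `'reference'`, while a URL such as `mongodb://localhost:27017/app` stays inline because it contains no userinfo.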
async function buildConnectionConfig(input: {
driver: KtxSetupDatabaseDriver;
connectionId: string;
@@ -777,6 +822,14 @@ async function buildConnectionConfig(input: {
existingConnection: input.existingConnection,
});
}
if (driver === 'mongodb') {
return await buildMongoConnectionConfig({
connectionId: input.connectionId,
args,
prompts,
existingConnection: input.existingConnection,
});
}
if (driver === 'bigquery') {
const credentialsPath = await promptText(
prompts,


@@ -2,6 +2,7 @@ import { executeFederatedQuery } from './connectors/duckdb/federated-executor.js
import { FEDERATED_CONNECTION_ID } from './context/connections/federation.js';
import { executeProjectReadOnlySql } from './context/connections/project-sql-executor.js';
import type { KtxSqlQueryExecutionResult } from './context/connections/query-executor.js';
import { assertSqlQueryableConnection } from './context/connections/dialects.js';
import { resolveConfiguredConnection } from './context/connections/resolve-connection.js';
import { loadKtxProject, type KtxLocalProject } from './context/project/project.js';
import { sqlAnalysisDialectForDriver } from './context/sql-analysis/dialect.js';
@@ -136,6 +137,9 @@ export async function runKtxSql(args: KtxSqlArgs, io: KtxCliIo = process, deps:
const connection = isFederated ? undefined : resolveConfiguredConnection(project.config, args.connectionId);
driver = isFederated ? 'duckdb' : String(connection?.driver ?? 'unknown').toLowerCase();
demoConnection = isFederated ? false : isDemoConnection(args.connectionId, connection);
if (!isFederated) {
assertSqlQueryableConnection(args.connectionId, connection?.driver);
}
const createSqlAnalysis =
deps.createSqlAnalysis ??