mirror of
https://github.com/Kaelio/ktx.git
synced 2026-06-19 08:28:06 +02:00
chore(workspace): gate dead-code with knip production mode (#196)
* refactor(workspace): relocate @ktx/llm source into packages/cli/src/llm

* refactor(workspace): rewrite @ktx/llm imports to relative paths

* refactor(workspace): fold internal packages into cli

* chore(workspace): gate dead-code with knip production mode

  Turn on production-mode knip plus an autofix run in pre-commit and the `pnpm dead-code` script, document the `/** @internal */` convention for test-only exports in AGENTS.md, annotate test-only exports across the CLI with that JSDoc, and drop dead exports/wrappers the new gate surfaced (e.g. `cli-project.ts`, `lookerRuntimeSourceToFileAdapterSource`, `createLocalScanEnrichmentProvidersFromConfig`, `PGLITE_OWNER_PROCESS_BACKEND_CAPABILITIES`, stale type re-exports). Replace the loose `ignoreIssues` allowlist in `knip.json` with explicit production entries so cross-package barrel leaks are caught.

* refactor(cli): delete internal barrel index.ts files

  The 34 `index.ts` re-export barrels inside `packages/cli/src/` were holdovers from the pre-fold multi-workspace structure. Post-fold-in they served no production purpose: external consumers go through the single package main entry, and in-repo callers mostly imported through them only because the path was short. Internally, knip flagged most barrel re-exports as production-dead (only reached via tests).

  This change:
  - Deletes every internal barrel except `packages/cli/src/index.ts` (the published package entry).
  - Rewrites ~270 source/test files to import each name directly from the file that defines it.
  - Moves `tools/warehouse-verification/index.ts` to `create-warehouse-verification-tools.ts` (the function it defined locally) and updates its single consumer.
  - Renames `search/backend-conformance.ts` → `.test-utils.ts` to match the existing test-helper file convention.
  - Deletes 13 dead test-only chains (dbt-descriptions/*, live-database/extracted-schema, live-database/structural-sync, relationship-* feedback/review chain) plus their tests and a cascading orphan integration test.
  - Updates test mocks that pointed at deleted barrel paths (notion-client, connector barrels in scan/local-scan-connectors tests) to mock the source files instead.
  - Points the maintainer benchmark script (`scripts/relationship-benchmark-report.mjs`) at source files instead of `dist/context/scan/index.js`.
  - Drops the barrel `!` entries from `knip.json`; adds explicit production entries only for the benchmark code reached via dist by the maintainer script.

  Net: 413 files changed, ~1.2k insertions, ~9.4k deletions. `pnpm run dead-code` (Biome + knip default + knip production) and `pnpm run type-check` are clean; 2277 tests pass.

* refactor(workspace): rename @ktx/cli to @kaelio/ktx and pack it directly

  Promote the CLI workspace package to the public name `@kaelio/ktx` and drop the separate `scripts/build-public-npm-package.mjs` wrapper. The CLI package is now publishable in place (`publishConfig.access: public`, `provenance: true`), so artifact packing uses `pnpm pack` against `packages/cli/` instead of assembling a parallel package tree. Updates all workspace filter invocations, docs, tests, and release readiness checks to reference the new package name, and folds the tarball-name helper into `scripts/public-npm-release-metadata.mjs`.

* docs: align "agent clients" and "data agents" terminology

  Replace "client agents" with "agent clients" and "database agents" with "data agents" across AGENTS.md, README.md, the docs-site copy, and the matching setup-agents test description, matching the canonical vocabulary in docs/terminology.md. Also moves packages/cli/tsconfig.json's tsBuildInfoFile from node_modules/.cache/ to dist/.tsbuildinfo so incremental builds survive node_modules reinstalls.

* refactor(release): single source of truth for package version

  Make packages/cli/package.json the single source of truth for the @kaelio/ktx version. publicNpmPackageVersion() now reads it directly, so artifact filenames, release-readiness checks, and the Python wheel version all derive from one field. The duplicate release-policy.json.publicNpmPackageVersion is removed. Previously the two fields could drift: tarballs were named kaelio-ktx-0.4.1.tgz while internally containing @kaelio/ktx@0.0.0-private.

  - update-public-release-version.mjs rewrites both Python pyproject.toml files (ktx-daemon, ktx-sl) alongside the npm package.jsons, normalizing the version for PEP 440 (e.g. 0.1.0-rc.2 -> 0.1.0rc2).
  - semantic-release-config.cjs adds the two pyproject.toml files to @semantic-release/git assets so the release commit back to main carries every version source in lockstep.
  - The six "?? '0.0.0-private'" fallback literals across the CLI are replaced with "?? getKtxCliPackageInfo().version", and createDefaultKtxMcpServer makes its version arg required.
  - docs/release.md describes the actual commit-back model: the dev tree always reflects the most recent release; no sentinel pin to maintain.

  Verified: pnpm run artifacts:build now produces kaelio-ktx-0.4.1.tgz and kaelio_ktx-0.4.1-py3-none-any.whl with @kaelio/ktx@0.4.1 inside. Full type-check, dead-code, and 2287 vitests + 173 script tests pass.

* refactor(cli): inject embedding provider resolution and detect sentence-transformers runtime

  Make resolveProjectEmbeddingProvider and runtimeIo injectable in ingest and scan command entrypoints so tests can stub them, and teach resolvePublicIngestRuntimeRequirements to flag the local-embeddings runtime feature when ktx.yaml selects sentence-transformers.

* chore(cli): mark buildLocalStatsStatus and LocalStatsStatus as @internal

  Both symbols are consumed only by status-project.test.ts. Annotating with /** @internal */ keeps knip's production-mode check clean without changing runtime behavior.

* fix(cli): use real package metadata in print-command-tree

  The stubbed package name embedded a forbidden product identifier that tripped the boundary check in CI. Read the metadata from package.json instead; this keeps the rendered tree unchanged and removes a duplicate source of truth.

* feat(cli): show embedding coverage in `ktx status`, drop duplicate disk counts

  Inline `(N embedded)` next to the Wiki scope counts and Semantic-layer source counts, computed with `SUM(embedding_json IS NOT NULL)` over `knowledge_pages` and `local_sl_sources`. Rename the "Knowledge" label to "Wiki" (canonical per `docs/terminology.md`) and rename the matching `localStats.knowledgePages` field to `localStats.wikiPages`. Drop `wiki=N md` and `semantic-layer=N yaml` from the Disk row; those duplicated the per-surface rows above. Disk now reports only actual byte usage (db, cache, raw-sources). The unused `wikiGlobalMarkdownCount` / `semanticLayerYamlCount` fields, the `isMarkdownEntry` / `isYamlEntry` helpers, and the `filter` arg on `summarizeDir` are removed.
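The PEP 440 normalization mentioned in the release refactor (e.g. 0.1.0-rc.2 -> 0.1.0rc2) can be sketched roughly as follows. This is an illustration only, not the code in update-public-release-version.mjs: the name `toPep440Version` and the label mapping are assumptions, and only plain releases and `alpha`/`beta`/`rc` prereleases with a numeric suffix are handled.

```typescript
// Hypothetical helper (not from the repository): semver -> PEP 440.
// Assumed mapping of semver prerelease labels to PEP 440 segments.
const PRERELEASE_LABELS: Record<string, string> = { alpha: 'a', beta: 'b', rc: 'rc' };

function toPep440Version(semver: string): string {
  const match = /^(\d+\.\d+\.\d+)(?:-(alpha|beta|rc)\.(\d+))?$/.exec(semver);
  if (!match) {
    throw new Error(`Unsupported version for PEP 440 conversion: ${semver}`);
  }
  const release = String(match[1]);
  const label = match[2];
  const serial = match[3];
  // Plain releases pass through; prereleases lose the "-" and "." separators.
  return label && serial ? `${release}${PRERELEASE_LABELS[label]}${serial}` : release;
}

console.log(toPep440Version('0.1.0-rc.2')); // 0.1.0rc2
console.log(toPep440Version('0.4.1')); // 0.4.1
```

Throwing on anything the pattern does not recognize is a deliberate choice in this sketch: a silently mis-normalized version would reintroduce the drift the commit is trying to remove.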
parent a1cfb03d73
commit 2366b00301
1002 changed files with 2286 additions and 12051 deletions
27
packages/cli/src/context/connections/connection-type.ts
Normal file
@@ -0,0 +1,27 @@
import { z } from 'zod';

export const connectionTypeSchema = z.enum([
  'POSTGRESQL',
  'SQLITE',
  'SQLSERVER',
  'BIGQUERY',
  'SNOWFLAKE',
  'CENTRALREACH',
  'EPIC',
  'CERNER',
  'ATHENA',
  'QUICKBOOKS',
  'WORKDAY',
  'REST',
  'S3',
  'SLACK',
  'METABASE',
  'LOOKER',
  'NOTION',
  'MYSQL',
  'CLICKHOUSE',
  'PLAIN',
  'BETTERSTACK',
]);

export type ConnectionType = z.infer<typeof connectionTypeSchema>;
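For readers less familiar with zod, the following dependency-free sketch approximates what `z.enum` plus `z.infer` provide in connection-type.ts: a runtime membership check and a matching string-literal union. The names `CONNECTION_TYPES` and `isConnectionType` are hypothetical, and the list is abbreviated to three of the values in the file.

```typescript
// Hypothetical, abbreviated stand-in for connectionTypeSchema (no zod required).
const CONNECTION_TYPES = ['POSTGRESQL', 'SQLITE', 'BIGQUERY'] as const;

// Equivalent in spirit to z.infer<typeof connectionTypeSchema>.
type ConnectionType = (typeof CONNECTION_TYPES)[number];

// Runtime check that also narrows the static type.
function isConnectionType(value: unknown): value is ConnectionType {
  return typeof value === 'string' && (CONNECTION_TYPES as readonly string[]).includes(value);
}

console.log(isConnectionType('SQLITE')); // true
console.log(isConnectionType('sqlite')); // false: values are case-sensitive
```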
30
packages/cli/src/context/connections/dialects.test.ts
Normal file
@@ -0,0 +1,30 @@
import { describe, expect, it } from 'vitest';
import { getDialectForDriver } from './dialects.js';

describe('getDialectForDriver', () => {
  it.each([
    ['postgres', '"public"."orders"'],
    ['postgresql', '"public"."orders"'],
    ['mysql', '`public`.`orders`'],
    ['clickhouse', '`public`.`orders`'],
    ['sqlite', '"orders"'],
    ['snowflake', '"analytics"."public"."orders"'],
    ['bigquery', '`analytics`.`public`.`orders`'],
    ['sqlserver', '[analytics].[public].[orders]'],
  ] as const)('formats table names for %s', (driver, expected) => {
    const dialect = getDialectForDriver(driver);
    expect(
      dialect.formatTableName({
        catalog: driver === 'snowflake' || driver === 'bigquery' || driver === 'sqlserver' ? 'analytics' : null,
        db: driver === 'sqlite' ? null : 'public',
        name: 'orders',
      }),
    ).toBe(expected);
  });

  it('throws with a supported-driver list for unknown drivers', () => {
    expect(() => getDialectForDriver('oracle')).toThrow(
      'Unsupported warehouse driver "oracle". Supported drivers: bigquery, clickhouse, mysql, postgres, postgresql, sqlite, sqlite3, snowflake, sqlserver',
    );
  });
});
102
packages/cli/src/context/connections/dialects.ts
Normal file
@@ -0,0 +1,102 @@
import type { KtxSchemaDimensionType, KtxTableRef } from '../scan/types.js';

type SupportedDriver =
  | 'postgres'
  | 'postgresql'
  | 'mysql'
  | 'sqlserver'
  | 'snowflake'
  | 'bigquery'
  | 'clickhouse'
  | 'sqlite'
  | 'sqlite3';

export interface KtxDialect {
  readonly type: SupportedDriver;
  quoteIdentifier(identifier: string): string;
  formatTableName(table: KtxTableRef): string;
  mapToDimensionType(nativeType: string): KtxSchemaDimensionType;
}

const supportedDrivers: SupportedDriver[] = [
  'bigquery',
  'clickhouse',
  'mysql',
  'postgres',
  'postgresql',
  'sqlite',
  'sqlite3',
  'snowflake',
  'sqlserver',
];

function doubleQuoted(identifier: string): string {
  return `"${identifier.replace(/"/g, '""')}"`;
}

function backtickQuoted(identifier: string): string {
  return `\`${identifier.replace(/`/g, '``')}\``;
}

function bigQueryQuoted(identifier: string): string {
  return `\`${identifier.replace(/`/g, '\\`')}\``;
}

function bracketQuoted(identifier: string): string {
  return `[${identifier.replace(/\]/g, ']]')}]`;
}

function inferDimensionType(nativeType: string): KtxSchemaDimensionType {
  const normalized = nativeType.toLowerCase().trim();
  if (normalized.includes('date') || normalized.includes('time')) {
    return 'time';
  }
  if (
    normalized.includes('int') ||
    normalized.includes('num') ||
    normalized.includes('dec') ||
    normalized.includes('float') ||
    normalized.includes('double') ||
    normalized.includes('real')
  ) {
    return 'number';
  }
  if (normalized.includes('bool') || normalized === 'bit') {
    return 'boolean';
  }
  return 'string';
}

function formatWithParts(table: KtxTableRef, quote: (identifier: string) => string, sqlite = false): string {
  const parts = sqlite ? [table.name] : [table.catalog, table.db, table.name].filter((part): part is string => !!part);
  return parts.map(quote).join('.');
}

function createDialect(type: SupportedDriver, quote: (identifier: string) => string, sqlite = false): KtxDialect {
  return {
    type,
    quoteIdentifier: quote,
    formatTableName: (table) => formatWithParts(table, quote, sqlite),
    mapToDimensionType: inferDimensionType,
  };
}

const dialects: Record<SupportedDriver, KtxDialect> = {
  postgres: createDialect('postgres', doubleQuoted),
  postgresql: createDialect('postgresql', doubleQuoted),
  mysql: createDialect('mysql', backtickQuoted),
  clickhouse: createDialect('clickhouse', backtickQuoted),
  sqlite: createDialect('sqlite', doubleQuoted, true),
  sqlite3: createDialect('sqlite3', doubleQuoted, true),
  snowflake: createDialect('snowflake', doubleQuoted),
  bigquery: createDialect('bigquery', bigQueryQuoted),
  sqlserver: createDialect('sqlserver', bracketQuoted),
};

export function getDialectForDriver(driver: string): KtxDialect {
  const normalized = driver.toLowerCase().trim();
  if (normalized in dialects) {
    return dialects[normalized as SupportedDriver];
  }
  throw new Error(`Unsupported warehouse driver "${driver}". Supported drivers: ${supportedDrivers.join(', ')}`);
}
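The tests in this commit only quote well-behaved identifiers, so the escaping rules in dialects.ts are easy to miss. The sketch below copies the four quoting helpers from the diff and runs them on identifiers that contain their own closing quote character. The sample identifiers are made up for illustration.

```typescript
// The four helpers are copied from dialects.ts as shown in the diff.
function doubleQuoted(identifier: string): string {
  return `"${identifier.replace(/"/g, '""')}"`; // double the embedded quote
}

function backtickQuoted(identifier: string): string {
  return `\`${identifier.replace(/`/g, '``')}\``; // double the embedded backtick
}

function bigQueryQuoted(identifier: string): string {
  return `\`${identifier.replace(/`/g, '\\`')}\``; // backslash-escape the backtick
}

function bracketQuoted(identifier: string): string {
  return `[${identifier.replace(/\]/g, ']]')}]`; // double the closing bracket only
}

console.log(doubleQuoted('order "items"')); // "order ""items"""
console.log(backtickQuoted('odd`name')); // `odd``name`
console.log(bigQueryQuoted('odd`name')); // `odd\`name`
console.log(bracketQuoted('odd]name')); // [odd]]name]
```

Note that MySQL/ClickHouse and BigQuery both wrap in backticks but escape differently, which is why the diff keeps `backtickQuoted` and `bigQueryQuoted` as separate functions.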
@@ -0,0 +1,59 @@
import { describe, expect, it, vi } from 'vitest';
import { createDefaultLocalQueryExecutor } from './local-query-executor.js';

describe('createDefaultLocalQueryExecutor', () => {
  it('dispatches postgres and sqlite drivers to their executors', async () => {
    const postgres = {
      execute: vi.fn(async () => ({
        headers: ['pg'],
        rows: [[1]],
        totalRows: 1,
        command: 'SELECT',
        rowCount: 1,
      })),
    };
    const sqlite = {
      execute: vi.fn(async () => ({
        headers: ['sqlite'],
        rows: [[2]],
        totalRows: 1,
        command: 'SELECT',
        rowCount: 1,
      })),
    };
    const executor = createDefaultLocalQueryExecutor({ postgres, sqlite });

    await expect(
      executor.execute({
        connectionId: 'pg',
        connection: { driver: 'postgres' },
        sql: 'select 1',
      }),
    ).resolves.toMatchObject({ headers: ['pg'] });
    await expect(
      executor.execute({
        connectionId: 'local',
        connection: { driver: 'sqlite' },
        sql: 'select 1',
      }),
    ).resolves.toMatchObject({ headers: ['sqlite'] });

    expect(postgres.execute).toHaveBeenCalledTimes(1);
    expect(sqlite.execute).toHaveBeenCalledTimes(1);
  });

  it('rejects unsupported local execution drivers', async () => {
    const executor = createDefaultLocalQueryExecutor({
      postgres: { execute: vi.fn() },
      sqlite: { execute: vi.fn() },
    });

    await expect(
      executor.execute({
        connectionId: 'warehouse',
        connection: { driver: 'snowflake' },
        sql: 'select 1',
      }),
    ).rejects.toThrow('No local query executor is configured for driver "snowflake".');
  });
});
34
packages/cli/src/context/connections/local-query-executor.ts
Normal file
@@ -0,0 +1,34 @@
import { createPostgresQueryExecutor } from './postgres-query-executor.js';
import type {
  KtxSqlQueryExecutionInput,
  KtxSqlQueryExecutionResult,
  KtxSqlQueryExecutorPort,
} from './query-executor.js';
import { createSqliteQueryExecutor } from './sqlite-query-executor.js';

export interface DefaultLocalQueryExecutorOptions {
  postgres?: KtxSqlQueryExecutorPort;
  sqlite?: KtxSqlQueryExecutorPort;
}

function driverFor(input: KtxSqlQueryExecutionInput): string {
  return String(input.connection?.driver ?? '').toLowerCase();
}

export function createDefaultLocalQueryExecutor(options: DefaultLocalQueryExecutorOptions = {}): KtxSqlQueryExecutorPort {
  const postgres = options.postgres ?? createPostgresQueryExecutor();
  const sqlite = options.sqlite ?? createSqliteQueryExecutor();

  return {
    async execute(input: KtxSqlQueryExecutionInput): Promise<KtxSqlQueryExecutionResult> {
      const driver = driverFor(input);
      if (driver === 'postgres' || driver === 'postgresql') {
        return postgres.execute(input);
      }
      if (driver === 'sqlite' || driver === 'sqlite3') {
        return sqlite.execute(input);
      }
      throw new Error(`No local query executor is configured for driver "${input.connection?.driver ?? 'unknown'}".`);
    },
  };
}
@@ -0,0 +1,71 @@
import { describe, expect, it } from 'vitest';
import {
  localConnectionInfoFromConfig,
  localConnectionToWarehouseDescriptor,
  localConnectionTypeForConfig,
} from './local-warehouse-descriptor.js';

describe('localConnectionToWarehouseDescriptor', () => {
  it('maps local Postgres URLs to canonical warehouse descriptors', () => {
    expect(
      localConnectionToWarehouseDescriptor('warehouse', {
        driver: 'postgres',
        url: 'postgresql://readonly@db.example.test/analytics',
      }),
    ).toMatchObject({
      id: 'warehouse',
      connection_type: 'POSTGRESQL',
      host: 'db.example.test',
      database: 'analytics',
    });
  });

  it('maps BigQuery project and dataset from explicit fields', () => {
    expect(
      localConnectionToWarehouseDescriptor('bq', {
        driver: 'bigquery',
        project_id: 'acme',
        dataset_id: 'warehouse',
      }),
    ).toMatchObject({
      id: 'bq',
      connection_type: 'BIGQUERY',
      project_id: 'acme',
      dataset_id: 'warehouse',
    });
  });

  it('returns null for non-warehouse adapters', () => {
    expect(
      localConnectionToWarehouseDescriptor('looker', {
        driver: 'looker',
        base_url: 'https://looker.example.com',
        client_id: 'client',
      }),
    ).toBeNull();
  });
});

describe('local connection info helpers', () => {
  it('returns canonical warehouse connection types for local catalogs', () => {
    expect(localConnectionTypeForConfig('warehouse', { driver: 'postgres' })).toBe('POSTGRESQL');
    expect(localConnectionTypeForConfig('bq', { driver: 'bigquery', project_id: 'acme' })).toBe('BIGQUERY');
    expect(localConnectionTypeForConfig('snowflake', { driver: 'snowflake' })).toBe('SNOWFLAKE');
  });

  it('keeps non-warehouse adapter labels for display-only local connection surfaces', () => {
    expect(localConnectionTypeForConfig('prod-metabase', { driver: 'metabase', api_url: 'https://metabase.example.com' })).toBe(
      'metabase',
    );
    expect(localConnectionTypeForConfig('missing-driver', {} as never)).toBe('unknown');
  });

  it('builds nullable local connection info records', () => {
    expect(localConnectionInfoFromConfig('warehouse', { driver: 'postgres' })).toEqual({
      id: 'warehouse',
      name: 'warehouse',
      connectionType: 'POSTGRESQL',
    });
    expect(localConnectionInfoFromConfig('missing', undefined)).toBeNull();
  });
});
@@ -0,0 +1,102 @@
import type { KtxProjectConnectionConfig } from '../project/config.js';
import type { ConnectionType } from './connection-type.js';

export interface LocalWarehouseDescriptor {
  id: string;
  connection_type: ConnectionType;
  host?: string | null;
  database?: string | null;
  account?: string | null;
  project_id?: string | null;
  dataset_id?: string | null;
  connection_params: Record<string, unknown>;
}

export interface LocalConnectionInfo {
  id: string;
  name: string;
  connectionType: string;
}

const DRIVER_TO_CONNECTION_TYPE: Record<string, ConnectionType> = {
  postgres: 'POSTGRESQL',
  postgresql: 'POSTGRESQL',
  sqlite: 'SQLITE',
  sqlserver: 'SQLSERVER',
  mssql: 'SQLSERVER',
  mysql: 'MYSQL',
  clickhouse: 'CLICKHOUSE',
  snowflake: 'SNOWFLAKE',
  bigquery: 'BIGQUERY',
};

export function localConnectionToWarehouseDescriptor(
  id: string,
  connection: KtxProjectConnectionConfig | undefined,
): LocalWarehouseDescriptor | null {
  if (!connection) {
    return null;
  }
  const connectionType = DRIVER_TO_CONNECTION_TYPE[String(connection.driver ?? '').toLowerCase()];
  if (!connectionType) {
    return null;
  }

  const info: LocalWarehouseDescriptor = {
    id,
    connection_type: connectionType,
    connection_params: { ...connection },
  };
  const url = typeof connection.url === 'string' ? connection.url : null;
  if (url && !url.startsWith('env:') && !url.startsWith('file:')) {
    try {
      const parsed = new URL(url);
      info.host = parsed.hostname || null;
      if (parsed.pathname.length > 1) {
        const [first, second] = parsed.pathname.slice(1).split('/');
        if (connectionType === 'BIGQUERY') {
          info.project_id = stringField(connection.project_id) ?? parsed.hostname ?? first ?? null;
          info.dataset_id = stringField(connection.dataset_id) ?? second ?? null;
        } else {
          info.database = first ?? null;
        }
      }
    } catch {
      info.host = stringField(connection.host);
    }
  }

  info.host = stringField(connection.host) ?? info.host ?? null;
  info.database = stringField(connection.database) ?? info.database ?? null;
  info.account = stringField(connection.account) ?? null;
  info.project_id = stringField(connection.project_id) ?? info.project_id ?? null;
  info.dataset_id = stringField(connection.dataset_id) ?? info.dataset_id ?? null;
  return info;
}

export function localConnectionTypeForConfig(id: string, connection: KtxProjectConnectionConfig | undefined): string {
  const descriptor = localConnectionToWarehouseDescriptor(id, connection);
  if (descriptor) {
    return descriptor.connection_type;
  }
  const driver = typeof connection?.driver === 'string' ? connection.driver.trim() : '';
  return driver.length > 0 ? driver : 'unknown';
}

export function localConnectionInfoFromConfig(
  id: string,
  connection: KtxProjectConnectionConfig | undefined,
): LocalConnectionInfo | null {
  if (!connection) {
    return null;
  }
  return {
    id,
    name: id,
    connectionType: localConnectionTypeForConfig(id, connection),
  };
}

function stringField(value: unknown): string | null {
  return typeof value === 'string' && value.trim().length > 0 ? value.trim() : null;
}
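The URL branch of `localConnectionToWarehouseDescriptor` leans on the WHATWG `URL` parser for host and database extraction. The sketch below isolates that step for non-BigQuery drivers. `hostAndDatabaseFromUrl` is a hypothetical name, not a function in the diff, and it assumes a runtime with a global `URL` (Node.js or a browser).

```typescript
// Hypothetical helper isolating the URL-parsing step (non-BigQuery case only).
function hostAndDatabaseFromUrl(url: string): { host: string | null; database: string | null } {
  // "env:NAME" and "file:/path" are secret references, not connection URLs.
  // The explicit check matters because new URL('env:NAME') would parse without throwing.
  if (url.startsWith('env:') || url.startsWith('file:')) {
    return { host: null, database: null };
  }
  try {
    const parsed = new URL(url);
    // The first path segment is treated as the database name.
    const first = parsed.pathname.length > 1 ? parsed.pathname.slice(1).split('/')[0] : undefined;
    return { host: parsed.hostname || null, database: first || null };
  } catch {
    // Strings that are not URLs at all yield no host or database.
    return { host: null, database: null };
  }
}

console.log(hostAndDatabaseFromUrl('postgresql://readonly@db.example.test/analytics'));
// { host: 'db.example.test', database: 'analytics' }
```

In the diff itself, explicit `host` and `database` fields on the connection take precedence over anything parsed from the URL; this sketch leaves that merge step out.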
157
packages/cli/src/context/connections/notion-config.test.ts
Normal file
@@ -0,0 +1,157 @@
import { mkdtemp, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import {
  notionConnectionToPullConfig,
  parseNotionConnectionConfig,
  redactNotionConnectionConfig,
  resolveNotionAuthToken,
} from './notion-config.js';

describe('standalone Notion connection config', () => {
  let tempDir: string;

  beforeEach(async () => {
    tempDir = await mkdtemp(join(tmpdir(), 'ktx-notion-config-'));
  });

  afterEach(async () => {
    await rm(tempDir, { recursive: true, force: true });
  });

  it('parses selected-root Notion config with safe defaults', () => {
    const parsed = parseNotionConnectionConfig({
      driver: 'notion',
      auth_token_ref: 'env:NOTION_TOKEN',
      crawl_mode: 'selected_roots',
      root_page_ids: ['page-1'],
    });

    expect(parsed).toEqual({
      driver: 'notion',
      auth_token: null,
      auth_token_ref: 'env:NOTION_TOKEN',
      crawl_mode: 'selected_roots',
      root_page_ids: ['page-1'],
      root_database_ids: [],
      root_data_source_ids: [],
      max_pages_per_run: 1000,
      max_knowledge_creates_per_run: 25,
      max_knowledge_updates_per_run: 20,
    });
    expect(parsed).not.toHaveProperty('last_successful_cursor');
  });

  it('parses inline Notion auth tokens without requiring auth_token_ref', () => {
    const parsed = parseNotionConnectionConfig({
      driver: 'notion',
      auth_token: ' ntn_inline_token ',
      crawl_mode: 'selected_roots',
      root_page_ids: ['page-1'],
    });

    expect(parsed).toMatchObject({
      driver: 'notion',
      auth_token: 'ntn_inline_token',
      auth_token_ref: null,
      crawl_mode: 'selected_roots',
      root_page_ids: ['page-1'],
    });
  });

  it('redacts token references from display output', () => {
    expect(
      redactNotionConnectionConfig(
        parseNotionConnectionConfig({
          driver: 'notion',
          auth_token_ref: 'file:/Users/example/.config/notion-token',
          crawl_mode: 'all_accessible',
          max_pages_per_run: 80,
        }),
      ),
    ).toEqual({
      driver: 'notion',
      hasAuthToken: true,
      crawlMode: 'all_accessible',
      rootPageIds: [],
      rootDatabaseIds: [],
      rootDataSourceIds: [],
      maxPagesPerRun: 80,
      maxKnowledgeCreatesPerRun: 25,
      maxKnowledgeUpdatesPerRun: 20,
      warning: 'Anything accessible to this Notion integration can become organization knowledge.',
    });
  });

  it('requires at least one selected root in selected_roots mode', () => {
    expect(() =>
      parseNotionConnectionConfig({
        driver: 'notion',
        auth_token_ref: 'env:NOTION_TOKEN',
        crawl_mode: 'selected_roots',
      }),
    ).toThrow('selected_roots requires at least one root page, database, or data source id');
  });

  it('resolves env and file token references without exposing the reference in errors', async () => {
    const tokenPath = join(tempDir, 'notion-token.txt');
    await writeFile(tokenPath, 'ntn_file_token\n', 'utf-8');

    await expect(
      resolveNotionAuthToken('env:NOTION_TOKEN', {
        env: { NOTION_TOKEN: 'ntn_env_token' },
      }),
    ).resolves.toBe('ntn_env_token');
    await expect(resolveNotionAuthToken(`file:${tokenPath}`)).resolves.toBe('ntn_file_token');
    await expect(resolveNotionAuthToken('env:MISSING_NOTION_TOKEN', { env: {} })).rejects.toThrow(
      'Notion token environment variable MISSING_NOTION_TOKEN is not set',
    );
  });

  it('converts standalone config into adapter pull config', async () => {
    const pullConfig = await notionConnectionToPullConfig(
      parseNotionConnectionConfig({
        driver: 'notion',
        auth_token_ref: 'env:NOTION_TOKEN',
        crawl_mode: 'all_accessible',
        max_pages_per_run: 12,
        max_knowledge_creates_per_run: 2,
        max_knowledge_updates_per_run: 7,
        last_successful_cursor: '{"phase":"all_accessible_pages","cursor":"cursor-1"}',
      }),
      { env: { NOTION_TOKEN: 'ntn_env_token' } },
    );

    expect(pullConfig).toEqual({
      authToken: 'ntn_env_token',
      crawlMode: 'all_accessible',
      rootPageIds: [],
      rootDatabaseIds: [],
      rootDataSourceIds: [],
      maxPagesPerRun: 12,
      maxKnowledgeCreatesPerRun: 2,
      maxKnowledgeUpdatesPerRun: 7,
      lastSuccessfulCursor: null,
    });
  });

  it('uses inline Notion auth_token when building adapter pull config', async () => {
    const pullConfig = await notionConnectionToPullConfig(
      parseNotionConnectionConfig({
        driver: 'notion',
        auth_token: 'ntn_inline_token',
        auth_token_ref: 'env:STALE_NOTION_TOKEN',
        crawl_mode: 'all_accessible',
      }),
      {
        env: {},
        readTextFile: async () => {
          throw new Error('readTextFile should not be called for inline auth_token');
        },
      },
    );

    expect(pullConfig.authToken).toBe('ntn_inline_token');
  });
});
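The tests above pin down how `env:NAME` and `file:/path` token references are expected to behave. The sketch below is one way those expectations could be met; it is not the repository's `resolveNotionAuthToken`, whose body is not shown in this excerpt. `resolveTokenRef` is a hypothetical name, and the empty-object `env` default and the required `readTextFile` option are assumptions made to keep the sketch free of Node imports.

```typescript
interface ResolveOptions {
  env?: Record<string, string | undefined>;
  readTextFile?: (path: string) => Promise<string>;
}

// Hypothetical sketch of env:/file: token reference resolution.
async function resolveTokenRef(ref: string, options: ResolveOptions = {}): Promise<string> {
  if (ref.startsWith('env:')) {
    const name = ref.slice('env:'.length);
    const value = (options.env ?? {})[name]?.trim();
    if (!value) {
      // Names the variable but never echoes a token value.
      throw new Error(`Notion token environment variable ${name} is not set`);
    }
    return value;
  }
  if (ref.startsWith('file:')) {
    if (!options.readTextFile) {
      throw new Error('A readTextFile function is required for file: references');
    }
    // Trim so a trailing newline in the token file is ignored.
    return (await options.readTextFile(ref.slice('file:'.length))).trim();
  }
  throw new Error('Notion auth_token_ref must use env:NAME or file:/path');
}

resolveTokenRef('env:NOTION_TOKEN', { env: { NOTION_TOKEN: 'ntn_env_token' } }).then((token) => {
  console.log(token); // ntn_env_token
});
```

Passing `env` and `readTextFile` in as options mirrors the injectable shape the tests rely on, so both branches can be exercised without touching the real environment or filesystem.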
224
packages/cli/src/context/connections/notion-config.ts
Normal file
@@ -0,0 +1,224 @@
import { readFile } from 'node:fs/promises';
import { homedir } from 'node:os';
import { resolve } from 'node:path';
import {
  NOTION_DEFAULT_MAX_KNOWLEDGE_CREATES_PER_RUN,
  type NotionPullConfig,
  notionPullConfigSchema,
} from '../ingest/adapters/notion/types.js';
import type { KtxProjectConnectionConfig } from '../project/config.js';

const KTX_NOTION_ORG_KNOWLEDGE_WARNING =
  'Anything accessible to this Notion integration can become organization knowledge.';

type KtxNotionCrawlMode = 'all_accessible' | 'selected_roots';

type RawKtxNotionConnectionConfig = Extract<KtxProjectConnectionConfig, { driver: 'notion' }>;

export type KtxNotionConnectionConfig = Omit<
  RawKtxNotionConnectionConfig,
  | 'auth_token'
  | 'auth_token_ref'
  | 'crawl_mode'
  | 'root_page_ids'
  | 'root_database_ids'
  | 'root_data_source_ids'
  | 'max_pages_per_run'
  | 'max_knowledge_creates_per_run'
  | 'max_knowledge_updates_per_run'
> & {
  driver: 'notion';
  auth_token: string | null;
  auth_token_ref: string | null;
  crawl_mode: KtxNotionCrawlMode;
  root_page_ids: string[];
  root_database_ids: string[];
  root_data_source_ids: string[];
  max_pages_per_run: number;
  max_knowledge_creates_per_run: number;
  max_knowledge_updates_per_run: number;
};

/** @internal */
export interface RedactedKtxNotionConnectionConfig {
  driver: 'notion';
  hasAuthToken: boolean;
  crawlMode: KtxNotionCrawlMode;
  rootPageIds: string[];
  rootDatabaseIds: string[];
  rootDataSourceIds: string[];
  maxPagesPerRun: number;
  maxKnowledgeCreatesPerRun: number;
  maxKnowledgeUpdatesPerRun: number;
  warning: typeof KTX_NOTION_ORG_KNOWLEDGE_WARNING;
}

interface ResolveNotionTokenOptions {
  env?: Record<string, string | undefined>;
  readTextFile?: (path: string) => Promise<string>;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function record(value: unknown): Record<string, unknown> {
  if (!isRecord(value)) {
    throw new Error('Notion connection config must be an object');
  }
  return value;
}

function stringValue(value: unknown, fallback: string): string {
  return typeof value === 'string' && value.trim().length > 0 ? value.trim() : fallback;
}

function optionalString(value: unknown): string | null {
  return typeof value === 'string' && value.trim().length > 0 ? value.trim() : null;
}

function stringArray(value: unknown): string[] {
  if (!Array.isArray(value)) {
    return [];
  }
  return value.filter((item): item is string => typeof item === 'string' && item.trim().length > 0);
}

function integerWithFallback(value: unknown, fallback: number, name: string): number {
  if (value === undefined || value === null) {
    return fallback;
  }
  if (typeof value !== 'number' || !Number.isInteger(value)) {
    throw new Error(`${name} must be an integer`);
  }
  return value;
}

function boundedInteger(value: unknown, fallback: number, name: string, min: number, max: number): number {
  const parsed = integerWithFallback(value, fallback, name);
  if (parsed < min || parsed > max) {
    throw new Error(`${name} must be between ${min} and ${max}`);
  }
  return parsed;
}

export function parseNotionConnectionConfig(raw: unknown): KtxNotionConnectionConfig {
  const input = record(raw);
  if (input.driver !== 'notion') {
    throw new Error('Notion connection config requires driver: notion');
  }
  const authToken = optionalString(input.auth_token);
  const authTokenRef = optionalString(input.auth_token_ref);
  if (!authToken && !authTokenRef) {
    throw new Error('Notion connection config requires auth_token or auth_token_ref');
  }
  if (authTokenRef && !authTokenRef.startsWith('env:') && !authTokenRef.startsWith('file:')) {
    throw new Error('Notion auth_token_ref must use env:NAME or file:/path');
  }

  const crawlMode = stringValue(input.crawl_mode, 'selected_roots');
  if (crawlMode !== 'selected_roots' && crawlMode !== 'all_accessible') {
    throw new Error(`Unsupported Notion crawl_mode: ${crawlMode}`);
  }
  const rootPageIds = stringArray(input.root_page_ids);
  const rootDatabaseIds = stringArray(input.root_database_ids);
  const rootDataSourceIds = stringArray(input.root_data_source_ids);
  if (crawlMode === 'selected_roots' && rootPageIds.length + rootDatabaseIds.length + rootDataSourceIds.length === 0) {
    throw new Error('selected_roots requires at least one root page, database, or data source id');
  }

  return {
    driver: 'notion',
    auth_token: authToken,
    auth_token_ref: authTokenRef,
    crawl_mode: crawlMode,
    root_page_ids: rootPageIds,
    root_database_ids: rootDatabaseIds,
||||
root_data_source_ids: rootDataSourceIds,
|
||||
max_pages_per_run: boundedInteger(input.max_pages_per_run, 1000, 'max_pages_per_run', 1, 10_000),
|
||||
max_knowledge_creates_per_run: boundedInteger(
|
||||
input.max_knowledge_creates_per_run,
|
||||
NOTION_DEFAULT_MAX_KNOWLEDGE_CREATES_PER_RUN,
|
||||
'max_knowledge_creates_per_run',
|
||||
0,
|
||||
25,
|
||||
),
|
||||
max_knowledge_updates_per_run: boundedInteger(
|
||||
input.max_knowledge_updates_per_run,
|
||||
20,
|
||||
'max_knowledge_updates_per_run',
|
||||
0,
|
||||
100,
|
||||
),
|
||||
};
|
||||
}
|
||||
|
||||
/** @internal */
|
||||
export function redactNotionConnectionConfig(config: KtxNotionConnectionConfig): RedactedKtxNotionConnectionConfig {
|
||||
return {
|
||||
driver: 'notion',
|
||||
hasAuthToken: Boolean(config.auth_token ?? config.auth_token_ref),
|
||||
crawlMode: config.crawl_mode,
|
||||
rootPageIds: config.root_page_ids,
|
||||
rootDatabaseIds: config.root_database_ids,
|
||||
rootDataSourceIds: config.root_data_source_ids,
|
||||
maxPagesPerRun: config.max_pages_per_run,
|
||||
maxKnowledgeCreatesPerRun: config.max_knowledge_creates_per_run,
|
||||
maxKnowledgeUpdatesPerRun: config.max_knowledge_updates_per_run,
|
||||
warning: KTX_NOTION_ORG_KNOWLEDGE_WARNING,
|
||||
};
|
||||
}
|
||||
|
||||
function expandHome(path: string): string {
|
||||
return path === '~' || path.startsWith('~/') ? resolve(homedir(), path.slice(2)) : path;
|
||||
}
|
||||
|
||||
/** @internal */
|
||||
export async function resolveNotionAuthToken(
|
||||
authTokenRef: string,
|
||||
options: ResolveNotionTokenOptions = {},
|
||||
): Promise<string> {
|
||||
if (authTokenRef.startsWith('env:')) {
|
||||
const envName = authTokenRef.slice('env:'.length);
|
||||
const value = (options.env ?? process.env)[envName];
|
||||
if (!value) {
|
||||
throw new Error(`Notion token environment variable ${envName} is not set`);
|
||||
}
|
||||
return value.trim();
|
||||
}
|
||||
if (authTokenRef.startsWith('file:')) {
|
||||
const path = expandHome(authTokenRef.slice('file:'.length));
|
||||
const readTextFile = options.readTextFile ?? ((filePath: string) => readFile(filePath, 'utf-8'));
|
||||
const value = (await readTextFile(path)).trim();
|
||||
if (!value) {
|
||||
throw new Error(`Notion token file is empty: ${path}`);
|
||||
}
|
||||
return value;
|
||||
}
|
||||
throw new Error('Notion auth_token_ref must use env:NAME or file:/path');
|
||||
}
|
||||
|
||||
export async function resolveNotionConnectionAuthToken(
|
||||
config: Pick<KtxNotionConnectionConfig, 'auth_token' | 'auth_token_ref'>,
|
||||
options: ResolveNotionTokenOptions = {},
|
||||
): Promise<string> {
|
||||
return config.auth_token ?? (await resolveNotionAuthToken(config.auth_token_ref ?? '', options));
|
||||
}
|
||||
|
||||
export async function notionConnectionToPullConfig(
|
||||
config: KtxNotionConnectionConfig,
|
||||
options: ResolveNotionTokenOptions = {},
|
||||
): Promise<NotionPullConfig> {
|
||||
const authToken = await resolveNotionConnectionAuthToken(config, options);
|
||||
return notionPullConfigSchema.parse({
|
||||
authToken,
|
||||
crawlMode: config.crawl_mode,
|
||||
rootPageIds: config.root_page_ids,
|
||||
rootDatabaseIds: config.root_database_ids,
|
||||
rootDataSourceIds: config.root_data_source_ids,
|
||||
maxPagesPerRun: config.max_pages_per_run,
|
||||
maxKnowledgeCreatesPerRun: config.max_knowledge_creates_per_run,
|
||||
maxKnowledgeUpdatesPerRun: config.max_knowledge_updates_per_run,
|
||||
lastSuccessfulCursor: null,
|
||||
});
|
||||
}
|
||||
|
|
@ -0,0 +1,103 @@
import { describe, expect, it, vi } from 'vitest';
import { createPostgresQueryExecutor } from './postgres-query-executor.js';

function makeClient() {
  const calls: unknown[] = [];
  const client = {
    connect: vi.fn(async () => undefined),
    query: vi.fn(async (input: unknown) => {
      calls.push(input);
      if (input === 'BEGIN READ ONLY') {
        return { rows: [], fields: [], rowCount: null, command: 'BEGIN' };
      }
      if (input === 'COMMIT') {
        return { rows: [], fields: [], rowCount: null, command: 'COMMIT' };
      }
      return {
        rows: [
          ['paid', 2],
          ['open', 1],
        ],
        fields: [{ name: 'status' }, { name: 'order_count' }],
        rowCount: 2,
        command: 'SELECT',
      };
    }),
    end: vi.fn(async () => undefined),
  };
  return { client, calls };
}

describe('createPostgresQueryExecutor', () => {
  it('runs a read-only transaction in array row mode and closes the client', async () => {
    const { client, calls } = makeClient();
    const executor = createPostgresQueryExecutor({
      clientFactory: vi.fn(() => client),
    });

    const result = await executor.execute({
      connectionId: 'warehouse',
      connection: { driver: 'postgres', url: 'postgres://example/db' },
      sql: 'select status, count(*) as order_count from public.orders group by status',
      maxRows: 50,
    });

    expect(client.connect).toHaveBeenCalledTimes(1);
    expect(calls[0]).toBe('BEGIN READ ONLY');
    expect(calls[1]).toEqual({
      text: 'select * from (select status, count(*) as order_count from public.orders group by status) as ktx_query_result limit 50',
      rowMode: 'array',
    });
    expect(calls[2]).toBe('COMMIT');
    expect(client.end).toHaveBeenCalledTimes(1);
    expect(result).toEqual({
      headers: ['status', 'order_count'],
      rows: [
        ['paid', 2],
        ['open', 1],
      ],
      totalRows: 2,
      command: 'SELECT',
      rowCount: 2,
    });
  });

  it('rolls back and closes the client when query execution fails', async () => {
    const client = {
      connect: vi.fn(async () => undefined),
      query: vi.fn(async (input: unknown) => {
        if (input === 'BEGIN READ ONLY' || input === 'ROLLBACK') {
          return { rows: [], fields: [], rowCount: null, command: String(input) };
        }
        throw new Error('syntax error');
      }),
      end: vi.fn(async () => undefined),
    };
    const executor = createPostgresQueryExecutor({
      clientFactory: vi.fn(() => client),
    });

    await expect(
      executor.execute({
        connectionId: 'warehouse',
        connection: { driver: 'postgres', url: 'postgres://example/db' },
        sql: 'select * from broken',
        maxRows: 10,
      }),
    ).rejects.toThrow('syntax error');
    expect(client.query).toHaveBeenCalledWith('ROLLBACK');
    expect(client.end).toHaveBeenCalledTimes(1);
  });

  it('requires a Postgres url', async () => {
    const executor = createPostgresQueryExecutor({ clientFactory: vi.fn() });

    await expect(
      executor.execute({
        connectionId: 'warehouse',
        connection: { driver: 'postgres' },
        sql: 'select 1',
      }),
    ).rejects.toThrow('Local Postgres execution requires connections.warehouse.url');
  });
});
@ -0,0 +1,78 @@
import { Client, type ClientConfig } from 'pg';
import type {
  KtxSqlQueryExecutionInput,
  KtxSqlQueryExecutionResult,
  KtxSqlQueryExecutorPort,
} from './query-executor.js';
import { limitSqlForExecution } from './read-only-sql.js';

interface PgClientLike {
  connect(): Promise<unknown>;
  query(input: string | { text: string; rowMode: 'array' }): Promise<{
    fields: Array<{ name: string }>;
    rows: unknown[][];
    command: string;
    rowCount: number | null;
  }>;
  end(): Promise<void>;
}

interface PostgresQueryExecutorOptions {
  statementTimeoutMs?: number;
  queryTimeoutMs?: number;
  connectionTimeoutMs?: number;
  clientFactory?: (config: ClientConfig) => PgClientLike;
}

function connectionDriver(input: KtxSqlQueryExecutionInput): string {
  return String(input.connection?.driver ?? '').toLowerCase();
}

function createDefaultClient(config: ClientConfig): PgClientLike {
  return new Client(config);
}

export function createPostgresQueryExecutor(options: PostgresQueryExecutorOptions = {}): KtxSqlQueryExecutorPort {
  const clientFactory = options.clientFactory ?? createDefaultClient;
  return {
    async execute(input: KtxSqlQueryExecutionInput): Promise<KtxSqlQueryExecutionResult> {
      const driver = connectionDriver(input);
      const connection = input.connection;
      if (driver !== 'postgres' && driver !== 'postgresql') {
        throw new Error(`Local Postgres execution cannot run driver "${connection?.driver ?? 'unknown'}".`);
      }
      if (typeof connection?.url !== 'string' || connection.url.trim().length === 0) {
        throw new Error(`Local Postgres execution requires connections.${input.connectionId}.url.`);
      }

      const client = clientFactory({
        connectionString: connection.url,
        statement_timeout: options.statementTimeoutMs ?? 30_000,
        query_timeout: options.queryTimeoutMs ?? 35_000,
        connectionTimeoutMillis: options.connectionTimeoutMs ?? 5_000,
        application_name: 'ktx-local-query',
      });
      await client.connect();
      try {
        await client.query('BEGIN READ ONLY');
        const result = await client.query({
          text: limitSqlForExecution(input.sql, input.maxRows),
          rowMode: 'array',
        });
        await client.query('COMMIT');
        return {
          headers: result.fields.map((field) => field.name),
          rows: result.rows,
          totalRows: result.rows.length,
          command: result.command,
          rowCount: result.rowCount,
        };
      } catch (error) {
        await client.query('ROLLBACK').catch(() => undefined);
        throw error;
      } finally {
        await client.end();
      }
    },
  };
}
25
packages/cli/src/context/connections/query-executor.ts
Normal file
@ -0,0 +1,25 @@
import type { KtxProjectConnectionConfig } from '../../context/project/config.js';

export interface KtxSqlQueryExecutionInput {
  connectionId: string;
  projectDir?: string;
  connection: KtxProjectConnectionConfig | undefined;
  sql: string;
  maxRows?: number;
}

export interface KtxSqlQueryExecutionResult {
  headers: string[];
  rows: unknown[][];
  totalRows: number;
  command: string;
  rowCount: number | null;
}

export interface KtxSqlQueryExecutorPort {
  execute(input: KtxSqlQueryExecutionInput): Promise<KtxSqlQueryExecutionResult>;
}

export function normalizeQueryRows(rows: unknown[]): unknown[][] {
  return rows.map((row) => (Array.isArray(row) ? row : Object.values(row as Record<string, unknown>)));
}
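
A minimal sketch of what `normalizeQueryRows` does with the two row shapes a driver may return (object rows and array rows). The helper is copied verbatim so the snippet runs on its own; the sample rows are illustrative and not taken from the repository.

```typescript
// Standalone copy of the helper from query-executor.ts.
function normalizeQueryRows(rows: unknown[]): unknown[][] {
  return rows.map((row) => (Array.isArray(row) ? row : Object.values(row as Record<string, unknown>)));
}

// Illustrative sample data (assumption: not from the repository).
const objectRows: unknown[] = [
  { status: 'paid', order_count: 2 },
  { status: 'open', order_count: 1 },
];
const arrayRows: unknown[] = [
  ['paid', 2],
  ['open', 1],
];

// Object rows are flattened to their values in key insertion order.
const fromObjects = normalizeQueryRows(objectRows);
// Array rows pass through unchanged.
const fromArrays = normalizeQueryRows(arrayRows);
// Both results are [['paid', 2], ['open', 1]].
```

Because object rows rely on key insertion order, the result lines up with the headers only when the driver emits keys in column order.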
30
packages/cli/src/context/connections/read-only-sql.test.ts
Normal file
@ -0,0 +1,30 @@
import { describe, expect, it } from 'vitest';
import { assertReadOnlySql, limitSqlForExecution } from './read-only-sql.js';

describe('assertReadOnlySql', () => {
  it('allows select and with queries', () => {
    expect(assertReadOnlySql('select * from orders')).toBe('select * from orders');
    expect(assertReadOnlySql('with paid as (select * from orders) select * from paid')).toContain('with paid');
  });

  it('rejects mutating statements before opening a database connection', () => {
    expect(() => assertReadOnlySql('delete from orders')).toThrow(
      'Only read-only SELECT/WITH queries can be executed locally',
    );
    expect(() => assertReadOnlySql('create table x(id int)')).toThrow(
      'Only read-only SELECT/WITH queries can be executed locally',
    );
  });
});

describe('limitSqlForExecution', () => {
  it('wraps compiled SQL and strips trailing semicolons', () => {
    expect(limitSqlForExecution('select * from public.orders; ', 25)).toBe(
      'select * from (select * from public.orders) as ktx_query_result limit 25',
    );
  });

  it('returns the trimmed SQL when no maxRows value is provided', () => {
    expect(limitSqlForExecution('select * from orders; ', undefined)).toBe('select * from orders');
  });
});
22
packages/cli/src/context/connections/read-only-sql.ts
Normal file
@ -0,0 +1,22 @@
const MUTATING_SQL =
  /^\s*(insert|update|delete|merge|alter|drop|create|truncate|grant|revoke|copy|call|do|vacuum|analyze|refresh)\b/i;
const READ_SQL = /^\s*(select|with)\b/i;

export function assertReadOnlySql(sql: string): string {
  const trimmed = sql.trim();
  if (!READ_SQL.test(trimmed) || MUTATING_SQL.test(trimmed)) {
    throw new Error('Only read-only SELECT/WITH queries can be executed locally.');
  }
  return trimmed;
}

export function limitSqlForExecution(sql: string, maxRows: number | undefined): string {
  const trimmed = assertReadOnlySql(sql).replace(/;+\s*$/, '');
  if (!maxRows) {
    return trimmed;
  }
  if (!Number.isInteger(maxRows) || maxRows <= 0) {
    throw new Error('maxRows must be a positive integer.');
  }
  return `select * from (${trimmed}) as ktx_query_result limit ${maxRows}`;
}
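
A short sketch of how the guard and the row limiter behave, with both helpers copied so the snippet is self-contained. `isRejected` is a hypothetical demo helper, not part of the repository. The last case illustrates a limitation that follows from the regexes as written: only the leading keyword is inspected, so a data-modifying CTE passes the prefix check. The Postgres executor additionally wraps the statement in `BEGIN READ ONLY`, and the SQLite executor opens the database with `readonly: true`.

```typescript
// Standalone copies of the helpers from read-only-sql.ts.
const MUTATING_SQL =
  /^\s*(insert|update|delete|merge|alter|drop|create|truncate|grant|revoke|copy|call|do|vacuum|analyze|refresh)\b/i;
const READ_SQL = /^\s*(select|with)\b/i;

function assertReadOnlySql(sql: string): string {
  const trimmed = sql.trim();
  if (!READ_SQL.test(trimmed) || MUTATING_SQL.test(trimmed)) {
    throw new Error('Only read-only SELECT/WITH queries can be executed locally.');
  }
  return trimmed;
}

function limitSqlForExecution(sql: string, maxRows: number | undefined): string {
  const trimmed = assertReadOnlySql(sql).replace(/;+\s*$/, '');
  if (!maxRows) {
    return trimmed;
  }
  if (!Number.isInteger(maxRows) || maxRows <= 0) {
    throw new Error('maxRows must be a positive integer.');
  }
  return `select * from (${trimmed}) as ktx_query_result limit ${maxRows}`;
}

// Hypothetical demo helper: reports whether the guard rejects a statement.
function isRejected(sql: string): boolean {
  try {
    assertReadOnlySql(sql);
    return false;
  } catch {
    return true;
  }
}

// Trailing semicolon is stripped, then the query is wrapped with a LIMIT.
const limited = limitSqlForExecution('select * from orders; ', 25);
// Without maxRows the trimmed statement is returned unchanged.
const unlimited = limitSqlForExecution('select * from orders; ', undefined);
// A leading DELETE is rejected by the prefix check.
const rejectsDelete = isRejected('delete from orders');
// A DELETE nested inside a CTE is NOT rejected by the prefix check alone.
const rejectsCteDelete = isRejected('with d as (delete from orders returning *) select * from d');
```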
@ -0,0 +1,139 @@
import { mkdtemp, rm } from 'node:fs/promises';
import { writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import Database from 'better-sqlite3';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { createSqliteQueryExecutor, sqliteDatabasePathFromConnection } from './sqlite-query-executor.js';

describe('createSqliteQueryExecutor', () => {
  let tempDir: string;
  let dbPath: string;

  beforeEach(async () => {
    tempDir = await mkdtemp(join(tmpdir(), 'ktx-sqlite-query-'));
    dbPath = join(tempDir, 'warehouse.db');
    const db = new Database(dbPath);
    db.exec(`
      CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL,
        amount INTEGER NOT NULL
      );
      INSERT INTO orders (status, amount) VALUES
        ('paid', 20),
        ('paid', 30),
        ('open', 10);
    `);
    db.close();
  });

  afterEach(async () => {
    await rm(tempDir, { recursive: true, force: true });
  });

  it('executes read-only SELECT SQL against a relative SQLite path', async () => {
    const executor = createSqliteQueryExecutor();

    const result = await executor.execute({
      connectionId: 'warehouse',
      projectDir: tempDir,
      connection: { driver: 'sqlite', path: 'warehouse.db' },
      sql: 'select status, count(*) as order_count from orders group by status order by status',
      maxRows: 10,
    });

    expect(result).toEqual({
      headers: ['status', 'order_count'],
      rows: [
        ['open', 1],
        ['paid', 2],
      ],
      totalRows: 2,
      command: 'SELECT',
      rowCount: 2,
    });
  });

  it('supports file urls for SQLite database paths', async () => {
    expect(
      sqliteDatabasePathFromConnection({
        connectionId: 'warehouse',
        projectDir: tempDir,
        connection: { driver: 'sqlite', url: `file://${dbPath}` },
        sql: 'select 1',
      }),
    ).toBe(dbPath);
  });

  it('resolves file references for SQLite path fields', async () => {
    const pointerPath = join(tempDir, 'sqlite-path.txt');
    writeFileSync(pointerPath, dbPath, 'utf-8');

    expect(
      sqliteDatabasePathFromConnection({
        connectionId: 'warehouse',
        projectDir: tempDir,
        connection: { driver: 'sqlite', path: `file:${pointerPath}` },
        sql: 'select 1',
      }),
    ).toBe(dbPath);
  });

  it('resolves env references for SQLite database urls', async () => {
    const originalDatabaseUrl = process.env.KTX_SQLITE_TEST_URL;
    process.env.KTX_SQLITE_TEST_URL = `sqlite:${dbPath}`;

    try {
      expect(
        sqliteDatabasePathFromConnection({
          connectionId: 'warehouse',
          projectDir: tempDir,
          connection: { driver: 'sqlite', url: 'env:KTX_SQLITE_TEST_URL' },
          sql: 'select 1',
        }),
      ).toBe(dbPath);
    } finally {
      if (originalDatabaseUrl === undefined) {
        delete process.env.KTX_SQLITE_TEST_URL;
      } else {
        process.env.KTX_SQLITE_TEST_URL = originalDatabaseUrl;
      }
    }
  });

  it('rejects mutating SQL before opening the database', async () => {
    const executor = createSqliteQueryExecutor();

    await expect(
      executor.execute({
        connectionId: 'warehouse',
        projectDir: tempDir,
        connection: { driver: 'sqlite', path: 'warehouse.db' },
        sql: 'delete from orders',
      }),
    ).rejects.toThrow('Only read-only SELECT/WITH queries can be executed locally');
  });

  it('requires a SQLite driver and a database path', async () => {
    const executor = createSqliteQueryExecutor();

    await expect(
      executor.execute({
        connectionId: 'warehouse',
        projectDir: tempDir,
        connection: { driver: 'postgres', path: 'warehouse.db' },
        sql: 'select 1',
      }),
    ).rejects.toThrow('Local SQLite execution cannot run driver "postgres"');

    await expect(
      executor.execute({
        connectionId: 'warehouse',
        projectDir: tempDir,
        connection: { driver: 'sqlite' },
        sql: 'select 1',
      }),
    ).rejects.toThrow('Local SQLite execution requires connections.warehouse.path or connections.warehouse.url');
  });
});
@ -0,0 +1,92 @@
import { isAbsolute, resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
import Database from 'better-sqlite3';
import { readFileSync } from 'node:fs';
import { homedir } from 'node:os';
import type {
  KtxSqlQueryExecutionInput,
  KtxSqlQueryExecutionResult,
  KtxSqlQueryExecutorPort,
} from './query-executor.js';
import { normalizeQueryRows } from './query-executor.js';
import { limitSqlForExecution } from './read-only-sql.js';

type SqliteConnectionConfig = Record<string, unknown> | undefined;

function connectionDriver(input: KtxSqlQueryExecutionInput): string {
  return String(input.connection?.driver ?? '').toLowerCase();
}

function stringConfigValue(connection: SqliteConnectionConfig, key: string): string | undefined {
  const value = connection?.[key];
  return typeof value === 'string' && value.trim().length > 0 ? resolveStringReference(key, value.trim()) : undefined;
}

function resolveStringReference(key: string, value: string): string {
  if (value.startsWith('env:')) {
    // An unset variable resolves to '' so callers can treat it as missing.
    return process.env[value.slice('env:'.length)] ?? '';
  }
  if (key !== 'url' && value.startsWith('file:')) {
    const rawPath = value.slice('file:'.length);
    // Expand only '~' and '~/...'. Slicing a single character would leave a leading '/',
    // which path.resolve treats as absolute and which would discard the home directory.
    const path = rawPath === '~' || rawPath.startsWith('~/') ? resolve(homedir(), rawPath.slice(2)) : rawPath;
    return readFileSync(path, 'utf-8').trim();
  }
  return value;
}

function sqlitePathFromUrl(url: string): string {
  if (url.startsWith('file:')) {
    return fileURLToPath(url);
  }

  if (url.startsWith('sqlite:')) {
    const parsed = new URL(url);
    if (parsed.pathname.length > 0) {
      return decodeURIComponent(parsed.pathname);
    }
  }

  return url;
}

/** @internal */
export function sqliteDatabasePathFromConnection(input: KtxSqlQueryExecutionInput): string {
  const driver = connectionDriver(input);
  if (driver !== 'sqlite' && driver !== 'sqlite3') {
    throw new Error(`Local SQLite execution cannot run driver "${input.connection?.driver ?? 'unknown'}".`);
  }

  const pathValue = stringConfigValue(input.connection, 'path');
  const urlValue = stringConfigValue(input.connection, 'url');
  if (!pathValue && !urlValue) {
    throw new Error(
      `Local SQLite execution requires connections.${input.connectionId}.path or connections.${input.connectionId}.url.`,
    );
  }

  // Use || rather than ??: pathValue can be '' when an env reference is unset,
  // and in that case the url must be used instead.
  const candidate = pathValue || sqlitePathFromUrl(urlValue as string);
  return isAbsolute(candidate) ? candidate : resolve(input.projectDir ?? process.cwd(), candidate);
}

export function createSqliteQueryExecutor(): KtxSqlQueryExecutorPort {
  return {
    async execute(input: KtxSqlQueryExecutionInput): Promise<KtxSqlQueryExecutionResult> {
      const sql = limitSqlForExecution(input.sql, input.maxRows);
      const dbPath = sqliteDatabasePathFromConnection(input);
      const db = new Database(dbPath, { readonly: true, fileMustExist: true });
      try {
        const statement = db.prepare(sql);
        const rows = statement.all() as unknown[];
        return {
          headers: statement.columns().map((column) => column.name),
          rows: normalizeQueryRows(rows),
          totalRows: rows.length,
          command: 'SELECT',
          rowCount: rows.length,
        };
      } finally {
        db.close();
      }
    },
  };
}
34
packages/cli/src/context/core/config-reference.test.ts
Normal file
@ -0,0 +1,34 @@
import { mkdir, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it } from 'vitest';
import { resolveKtxConfigReference, resolveKtxHomePath } from './config-reference.js';

describe('KTX config references', () => {
  it('resolves env references without returning empty values', () => {
    expect(resolveKtxConfigReference('env:AI_GATEWAY_API_KEY', { AI_GATEWAY_API_KEY: ' gateway-key ' })).toBe(
      'gateway-key',
    );
    expect(resolveKtxConfigReference('env:AI_GATEWAY_API_KEY', { AI_GATEWAY_API_KEY: ' ' })).toBeUndefined();
    expect(resolveKtxConfigReference('env:AI_GATEWAY_API_KEY', {})).toBeUndefined();
  });

  it('resolves file references and trims file content', async () => {
    const dir = join(tmpdir(), `ktx-config-reference-${process.pid}`);
    await mkdir(dir, { recursive: true });
    const keyPath = join(dir, 'gateway-key.txt');
    await writeFile(keyPath, 'file-gateway-key\n', 'utf8');

    expect(resolveKtxConfigReference(`file:${keyPath}`, {})).toBe('file-gateway-key');
  });

  it('returns literal values unchanged after trimming blank-only values', () => {
    expect(resolveKtxConfigReference('provider/model', {})).toBe('provider/model');
    expect(resolveKtxConfigReference(' ', {})).toBeUndefined();
    expect(resolveKtxConfigReference(undefined, {})).toBeUndefined();
  });

  it('resolves home-prefixed paths', () => {
    expect(resolveKtxHomePath('~/ktx/key.txt')).toContain('/ktx/key.txt');
  });
});
37
packages/cli/src/context/core/config-reference.ts
Normal file
@ -0,0 +1,37 @@
import { readFileSync } from 'node:fs';
import { homedir } from 'node:os';
import { resolve } from 'node:path';

/** @internal */
export function resolveKtxHomePath(path: string): string {
  if (path === '~') {
    return homedir();
  }

  if (path.startsWith('~/')) {
    return resolve(homedir(), path.slice(2));
  }

  return resolve(path);
}

export function resolveKtxConfigReference(value: string | undefined, env: NodeJS.ProcessEnv): string | undefined {
  if (!value) {
    return undefined;
  }

  if (value.startsWith('env:')) {
    const envName = value.slice('env:'.length).trim();
    const envValue = env[envName];
    return envValue && envValue.trim().length > 0 ? envValue.trim() : undefined;
  }

  if (value.startsWith('file:')) {
    const filePath = resolveKtxHomePath(value.slice('file:'.length).trim());
    const fileValue = readFileSync(filePath, 'utf8').trim();
    return fileValue.length > 0 ? fileValue : undefined;
  }

  const trimmed = value.trim();
  return trimmed.length > 0 ? trimmed : undefined;
}
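
A reduced sketch of the `env:` and literal branches of `resolveKtxConfigReference`, mirroring the cases in the test file. `resolveEnvOrLiteral` and the `Env` type are hypothetical names introduced for this illustration, and the `file:` branch is deliberately left out so the snippet needs no filesystem access.

```typescript
// Hypothetical environment type for the sketch (the real function takes NodeJS.ProcessEnv).
type Env = Record<string, string | undefined>;

// Hypothetical reduced variant: handles env: references and literals only.
function resolveEnvOrLiteral(value: string | undefined, env: Env): string | undefined {
  if (!value) {
    return undefined;
  }
  if (value.startsWith('env:')) {
    const envValue = env[value.slice('env:'.length).trim()];
    // Blank or missing variables resolve to undefined, never to ''.
    return envValue && envValue.trim().length > 0 ? envValue.trim() : undefined;
  }
  const trimmed = value.trim();
  return trimmed.length > 0 ? trimmed : undefined;
}

// Surrounding whitespace in the variable is trimmed: 'gateway-key'.
const fromEnv = resolveEnvOrLiteral('env:AI_GATEWAY_API_KEY', { AI_GATEWAY_API_KEY: ' gateway-key ' });
// Whitespace-only variable: undefined.
const blankEnv = resolveEnvOrLiteral('env:AI_GATEWAY_API_KEY', { AI_GATEWAY_API_KEY: ' ' });
// Unset variable: undefined.
const missingEnv = resolveEnvOrLiteral('env:AI_GATEWAY_API_KEY', {});
// A literal is returned as-is: 'provider/model'.
const literal = resolveEnvOrLiteral('provider/model', {});
```

Returning `undefined` instead of an empty string lets callers apply a fallback with `??`.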
42
packages/cli/src/context/core/config.ts
Normal file
@ -0,0 +1,42 @@
interface KtxStorageConfig {
  configDir?: string;
  homeDir?: string;
  worktreesDir?: string;
}

interface KtxGitConfig {
  userName: string;
  userEmail: string;
  bootstrapMessage?: string;
  bootstrapAuthor?: string;
  bootstrapAuthorEmail?: string;
}

export interface KtxCoreConfig {
  storage: KtxStorageConfig;
  git: KtxGitConfig;
}

export interface KtxLogger {
  debug(message: string): void;
  log(message: string): void;
  warn(message: string): void;
  error(message: string, error?: unknown): void;
}

export const noopLogger: KtxLogger = {
  debug: () => undefined,
  log: () => undefined,
  warn: () => undefined,
  error: () => undefined,
};

export function resolveConfigDir(config: KtxCoreConfig): string {
  const homeDir = config.storage.homeDir ?? '/tmp';
  return config.storage.configDir ?? `${homeDir}/ktx/config`;
}

export function resolveWorktreesDir(config: KtxCoreConfig): string {
  const homeDir = config.storage.homeDir ?? '/tmp';
  return config.storage.worktreesDir ?? `${homeDir}/.worktrees`;
}
5
packages/cli/src/context/core/embedding.ts
Normal file
@ -0,0 +1,5 @@
export interface KtxEmbeddingPort {
  maxBatchSize: number;
  computeEmbedding(text: string): Promise<number[]>;
  computeEmbeddingsBulk(texts: string[]): Promise<number[][]>;
}
43
packages/cli/src/context/core/file-store.ts
Normal file
@ -0,0 +1,43 @@
export interface KtxFileWriteResult {
  commitHash?: string | null;
  [key: string]: unknown;
}

export interface KtxFileReadResult {
  content: string;
  [key: string]: unknown;
}

export interface KtxFileListResult {
  files: string[];
}

export interface KtxFileHistoryEntry {
  sha?: string;
  message?: string;
  author?: string;
  date?: string | Date;
  [key: string]: unknown;
}

export interface KtxFileStorePort<TSelf = unknown> {
  writeFile(
    path: string,
    content: string,
    author: string,
    authorEmail: string,
    commitMessage: string,
    options?: { skipLock?: boolean },
  ): Promise<KtxFileWriteResult>;
  readFile(path: string): Promise<KtxFileReadResult>;
  deleteFile(
    path: string,
    author: string,
    authorEmail: string,
    commitMessage: string,
    options?: { skipLock?: boolean },
  ): Promise<KtxFileWriteResult | null>;
  listFiles(path: string, recursive?: boolean): Promise<KtxFileListResult>;
  getFileHistory(path: string): Promise<KtxFileHistoryEntry[] | unknown>;
  forWorktree(workdir: string): TSelf;
}
29
packages/cli/src/context/core/git-env.ts
Normal file
@ -0,0 +1,29 @@
import { simpleGit, type SimpleGit } from 'simple-git';

const GIT_HOOK_ENV_KEYS = [
  'GIT_ALTERNATE_OBJECT_DIRECTORIES',
  'GIT_DIR',
  'GIT_INDEX_FILE',
  'GIT_OBJECT_DIRECTORY',
  'GIT_PREFIX',
  'GIT_QUARANTINE_PATH',
  'GIT_WORK_TREE',
  'GIT_EDITOR',
  'GIT_EXEC_PATH',
  'GIT_PAGER',
  'PAGER',
  'VISUAL',
  'EDITOR',
] as const;

function sanitizedGitEnv(env: NodeJS.ProcessEnv = process.env): NodeJS.ProcessEnv {
  const sanitized = { ...env };
  for (const key of GIT_HOOK_ENV_KEYS) {
    delete sanitized[key];
  }
  return sanitized;
}

export function createSimpleGit(baseDir: string): SimpleGit {
  return simpleGit({ baseDir, unsafe: { allowUnsafeAskPass: true } }).env(sanitizedGitEnv());
}
@ -0,0 +1,75 @@
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import type { SimpleGit } from 'simple-git';
import type { KtxCoreConfig } from './config.js';
import { createSimpleGit } from './git-env.js';
import { GitService } from './git.service.js';

describe('GitService.assertWorktreeClean', () => {
  let workdir: string;
  let git: SimpleGit;
  let gitService: GitService;

  beforeEach(async () => {
    workdir = await mkdtemp(join(tmpdir(), 'gitsvc-clean-'));
    git = createSimpleGit(workdir);
    await git.init();
    await git.addConfig('user.email', 't@test');
    await git.addConfig('user.name', 'Test');
    await writeFile(join(workdir, 'init'), 'init');
    await git.add('.');
    await git.commit('init');
    const coreConfig: KtxCoreConfig = {
      storage: { configDir: workdir, homeDir: workdir },
      git: { userName: 'Test', userEmail: 't@test' },
    };
    gitService = new GitService(coreConfig);
    (gitService as any).git = git;
    (gitService as any).configDir = workdir;
  });

  afterEach(async () => rm(workdir, { recursive: true, force: true }));

  it('does not throw on a clean worktree', async () => {
    await expect(gitService.assertWorktreeClean()).resolves.toBeUndefined();
  });

  it('throws when MERGE_HEAD exists', async () => {
    await writeFile(join(workdir, '.git', 'MERGE_HEAD'), 'deadbeef\n');
    await expect(gitService.assertWorktreeClean()).rejects.toThrow(/MERGE_HEAD/);
  });

  it('throws when CHERRY_PICK_HEAD exists', async () => {
    await writeFile(join(workdir, '.git', 'CHERRY_PICK_HEAD'), 'deadbeef\n');
    await expect(gitService.assertWorktreeClean()).rejects.toThrow(/CHERRY_PICK_HEAD/);
  });

  it('throws when REVERT_HEAD exists', async () => {
    await writeFile(join(workdir, '.git', 'REVERT_HEAD'), 'deadbeef\n');
    await expect(gitService.assertWorktreeClean()).rejects.toThrow(/REVERT_HEAD/);
  });

  it('throws when sequencer/todo exists (interrupted multi-commit revert/cherry-pick)', async () => {
    await mkdir(join(workdir, '.git', 'sequencer'), { recursive: true });
    await writeFile(join(workdir, '.git', 'sequencer', 'todo'), 'pick deadbeef foo\n');
    await expect(gitService.assertWorktreeClean()).rejects.toThrow(/sequencer/);
|
||||
});
|
||||
|
||||
it('throws when the index has unmerged paths', async () => {
|
||||
await git.checkoutLocalBranch('a');
|
||||
await writeFile(join(workdir, 'shared'), 'A version');
|
||||
await git.add('.');
|
||||
await git.commit('a');
|
||||
await git.checkout('master').catch(() => git.checkout('main'));
|
||||
await git.checkoutLocalBranch('b');
|
||||
await writeFile(join(workdir, 'shared'), 'B version');
|
||||
await git.add('.');
|
||||
await git.commit('b');
|
||||
|
||||
await git.raw(['merge', 'a']).catch(() => undefined);
|
||||
|
||||
await expect(gitService.assertWorktreeClean()).rejects.toThrow();
|
||||
});
|
||||
});
|
||||
|
|
@ -0,0 +1,78 @@
|
|||
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
|
||||
import { mkdir, mkdtemp, readdir, rm, writeFile } from 'node:fs/promises';
|
||||
import { tmpdir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
import type { SimpleGit } from 'simple-git';
|
||||
import type { KtxCoreConfig } from './config.js';
|
||||
import { createSimpleGit } from './git-env.js';
|
||||
import { GitService } from './git.service.js';
|
||||
|
||||
describe('GitService.deleteDirectories', () => {
|
||||
let workdir: string;
|
||||
let git: SimpleGit;
|
||||
let gitService: GitService;
|
||||
|
||||
beforeEach(async () => {
|
||||
workdir = await mkdtemp(join(tmpdir(), 'gitsvc-dd-'));
|
||||
git = createSimpleGit(workdir);
|
||||
await git.init();
|
||||
await git.addConfig('user.email', 't@test');
|
||||
await git.addConfig('user.name', 'Test');
|
||||
await writeFile(join(workdir, 'keep'), 'k');
|
||||
await git.add('.');
|
||||
await git.commit('init');
|
||||
|
||||
const coreConfig: KtxCoreConfig = {
|
||||
storage: { configDir: workdir, homeDir: workdir },
|
||||
git: { userName: 'Test', userEmail: 't@test' },
|
||||
};
|
||||
gitService = new GitService(coreConfig);
|
||||
(gitService as any).git = git;
|
||||
(gitService as any).configDir = workdir;
|
||||
});
|
||||
|
||||
afterEach(async () => rm(workdir, { recursive: true, force: true }));
|
||||
|
||||
it('removes multiple directories in a single commit', async () => {
|
||||
for (const name of ['a', 'b', 'c']) {
|
||||
await mkdir(join(workdir, name), { recursive: true });
|
||||
await writeFile(join(workdir, name, 'f.txt'), name);
|
||||
}
|
||||
await git.add('.');
|
||||
await git.commit('seed 3 dirs');
|
||||
const beforeCommits = (await git.log()).total;
|
||||
|
||||
const result = await gitService.deleteDirectories(['a', 'b'], 'gc: drop a+b', 'System User', 'system@example.com');
|
||||
expect(result.commitHash).toBeTruthy();
|
||||
|
||||
const entries = await readdir(workdir);
|
||||
expect(entries).not.toContain('a');
|
||||
expect(entries).not.toContain('b');
|
||||
expect(entries).toContain('c');
|
||||
|
||||
const afterCommits = (await git.log()).total;
|
||||
expect(afterCommits).toBe(beforeCommits + 1);
|
||||
});
|
||||
|
||||
it('no-ops and returns an empty commit hash when the input list is empty', async () => {
|
||||
const result = await gitService.deleteDirectories([], 'empty', 'X', 'x@example.com');
|
||||
expect(result.commitHash).toBe('');
|
||||
expect(result.created).toBe(false);
|
||||
});
|
||||
|
||||
it('ignores paths that have already been deleted — commits only the remaining ones', async () => {
|
||||
await mkdir(join(workdir, 'stale'), { recursive: true });
|
||||
await writeFile(join(workdir, 'stale', 'x'), 'x');
|
||||
await git.add('.');
|
||||
await git.commit('seed stale');
|
||||
const result = await gitService.deleteDirectories(
|
||||
['stale', 'missing'],
|
||||
'gc: drop stale + missing',
|
||||
'System User',
|
||||
'system@example.com',
|
||||
);
|
||||
expect(result.commitHash).toBeTruthy();
|
||||
const entries = await readdir(workdir);
|
||||
expect(entries).not.toContain('stale');
|
||||
});
|
||||
});
|
||||
45
packages/cli/src/context/core/git.service.patch.test.ts
Normal file
|
|
@ -0,0 +1,45 @@
|
|||
import { mkdir, mkdtemp, readFile, writeFile } from 'node:fs/promises';
|
||||
import { tmpdir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
import { describe, expect, it } from 'vitest';
|
||||
import { GitService } from './git.service.js';
|
||||
|
||||
async function makeGit() {
|
||||
const homeDir = await mkdtemp(join(tmpdir(), 'ktx-git-patch-'));
|
||||
const configDir = join(homeDir, 'config');
|
||||
const git = new GitService({
|
||||
storage: { configDir, homeDir },
|
||||
git: {
|
||||
userName: 'System User',
|
||||
userEmail: 'system@example.com',
|
||||
bootstrapMessage: 'init',
|
||||
bootstrapAuthor: 'system',
|
||||
bootstrapAuthorEmail: 'system@example.com',
|
||||
},
|
||||
});
|
||||
await git.onModuleInit();
|
||||
return { homeDir, configDir, git };
|
||||
}
|
||||
|
||||
describe('GitService patch helpers', () => {
|
||||
it('collects binary-safe no-rename patches and applies them with --3way --index', async () => {
|
||||
const { homeDir, configDir, git } = await makeGit();
|
||||
await mkdir(join(configDir, 'wiki/global'), { recursive: true });
|
||||
await writeFile(join(configDir, 'wiki/global/page.md'), 'old\n');
|
||||
await git.commitFiles(['wiki/global/page.md'], 'add page', 'System User', 'system@example.com');
|
||||
const base = await git.revParseHead();
|
||||
|
||||
await writeFile(join(configDir, 'wiki/global/page.md'), 'new\n');
|
||||
await git.commitFiles(['wiki/global/page.md'], 'edit page', 'System User', 'system@example.com');
|
||||
const patchPath = join(homeDir, 'proposal.patch');
|
||||
await git.writeBinaryNoRenamePatch(base, 'HEAD', patchPath);
|
||||
|
||||
const targetDir = join(homeDir, 'target');
|
||||
await git.addWorktree(targetDir, 'target', base);
|
||||
const targetGit = git.forWorktree(targetDir);
|
||||
await targetGit.applyPatchFile3WayIndex(patchPath);
|
||||
await targetGit.commitStaged('apply proposal', 'System User', 'system@example.com');
|
||||
|
||||
await expect(readFile(join(targetDir, 'wiki/global/page.md'), 'utf-8')).resolves.toBe('new\n');
|
||||
});
|
||||
});
|
||||
56
packages/cli/src/context/core/git.service.reset-hard.test.ts
Normal file
|
|
@ -0,0 +1,56 @@
|
|||
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
|
||||
import { mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
|
||||
import { tmpdir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
import type { SimpleGit } from 'simple-git';
|
||||
import type { KtxCoreConfig } from './config.js';
|
||||
import { createSimpleGit } from './git-env.js';
|
||||
import { GitService } from './git.service.js';
|
||||
|
||||
describe('GitService.resetHardTo', () => {
|
||||
let workdir: string;
|
||||
let git: SimpleGit;
|
||||
let gitService: GitService;
|
||||
|
||||
beforeEach(async () => {
|
||||
workdir = await mkdtemp(join(tmpdir(), 'gitsvc-reset-'));
|
||||
git = createSimpleGit(workdir);
|
||||
await git.init();
|
||||
await git.addConfig('user.email', 't@test');
|
||||
await git.addConfig('user.name', 'Test');
|
||||
await writeFile(join(workdir, 'init'), 'init');
|
||||
await git.add('.');
|
||||
await git.commit('init');
|
||||
const coreConfig: KtxCoreConfig = {
|
||||
storage: { configDir: workdir, homeDir: workdir },
|
||||
git: { userName: 'Test', userEmail: 't@test' },
|
||||
};
|
||||
gitService = new GitService(coreConfig);
|
||||
(gitService as any).git = git;
|
||||
(gitService as any).configDir = workdir;
|
||||
});
|
||||
|
||||
afterEach(async () => rm(workdir, { recursive: true, force: true }));
|
||||
|
||||
it('rewinds HEAD to the target SHA, removing later commits and their files', async () => {
|
||||
const baseSha = (await git.revparse(['HEAD'])).trim();
|
||||
await writeFile(join(workdir, 'a'), 'a1');
|
||||
await git.add('.');
|
||||
await git.commit('a');
|
||||
await writeFile(join(workdir, 'b'), 'b1');
|
||||
await git.add('.');
|
||||
await git.commit('b');
|
||||
|
||||
await gitService.resetHardTo(baseSha);
|
||||
|
||||
expect((await git.revparse(['HEAD'])).trim()).toBe(baseSha);
|
||||
expect(await readFile(join(workdir, 'a'), 'utf-8').catch(() => null)).toBeNull();
|
||||
expect(await readFile(join(workdir, 'b'), 'utf-8').catch(() => null)).toBeNull();
|
||||
});
|
||||
|
||||
it('is a no-op when target SHA equals current HEAD', async () => {
|
||||
const sha = (await git.revparse(['HEAD'])).trim();
|
||||
await gitService.resetHardTo(sha);
|
||||
expect((await git.revparse(['HEAD'])).trim()).toBe(sha);
|
||||
});
|
||||
});
|
||||
450
packages/cli/src/context/core/git.service.test.ts
Normal file
|
|
@ -0,0 +1,450 @@
|
|||
import { mkdtemp, readFile, realpath, rm, writeFile } from 'node:fs/promises';
|
||||
import { tmpdir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
|
||||
import type { KtxCoreConfig } from './config.js';
|
||||
import { GitService } from './git.service.js';
|
||||
|
||||
// These tests drive a real git repo inside a temp directory — simple-git shells out to the
|
||||
// system `git` binary. They are fast enough to run as unit tests and catch real issues that
|
||||
// would be invisible with mocked git.
|
||||
describe('GitService', () => {
|
||||
let service: GitService;
|
||||
let tempDir: string;
|
||||
|
||||
beforeEach(async () => {
|
||||
tempDir = await mkdtemp(join(tmpdir(), 'git-service-spec-'));
|
||||
|
||||
const coreConfig: KtxCoreConfig = {
|
||||
storage: { configDir: tempDir, homeDir: tempDir },
|
||||
git: {
|
||||
userName: 'Test User',
|
||||
userEmail: 'test@example.com',
|
||||
bootstrapMessage: 'Initialize test config repo',
|
||||
bootstrapAuthor: 'test-system',
|
||||
bootstrapAuthorEmail: 'system@example.com',
|
||||
},
|
||||
};
|
||||
|
||||
service = new GitService(coreConfig);
|
||||
await service.onModuleInit();
|
||||
});
|
||||
|
||||
afterEach(async () => {
|
||||
await rm(tempDir, { recursive: true, force: true });
|
||||
});
|
||||
|
||||
const writeAndCommit = async (filePath: string, content: string, message = 'msg') => {
|
||||
await writeFile(join(tempDir, filePath), content, 'utf-8');
|
||||
return service.commitFile(filePath, message, 'Test', 'test@example.com');
|
||||
};
|
||||
|
||||
describe('cold-start bootstrap commit', () => {
|
||||
it('writes an empty commit on init so HEAD always resolves', async () => {
|
||||
// beforeEach already ran onModuleInit() against an empty temp dir.
|
||||
const head = await service.revParseHead();
|
||||
expect(head).toMatch(/^[0-9a-f]{40}$/);
|
||||
});
|
||||
|
||||
it('does not double-commit when re-initialized', async () => {
|
||||
const before = await service.revParseHead();
|
||||
await service.onModuleInit();
|
||||
const after = await service.revParseHead();
|
||||
expect(after).toBe(before);
|
||||
});
|
||||
|
||||
it('keeps git auto-maintenance attached for deterministic cleanup', async () => {
|
||||
const config = await readFile(join(tempDir, '.git', 'config'), 'utf-8');
|
||||
|
||||
expect(config).toMatch(/\[gc]\n\s+autoDetach = false/);
|
||||
expect(config).toMatch(/\[maintenance]\n\s+autoDetach = false/);
|
||||
});
|
||||
|
||||
it('initializes when release automation sets GIT_ASKPASS', async () => {
|
||||
const releaseEnvDir = await mkdtemp(join(tmpdir(), 'git-service-release-env-'));
|
||||
const previousAskPass = process.env.GIT_ASKPASS;
|
||||
process.env.GIT_ASKPASS = 'echo';
|
||||
|
||||
try {
|
||||
const releaseEnvService = new GitService({
|
||||
storage: { configDir: releaseEnvDir, homeDir: releaseEnvDir },
|
||||
git: {
|
||||
userName: 'Test User',
|
||||
userEmail: 'test@example.com',
|
||||
bootstrapMessage: 'Initialize test config repo',
|
||||
bootstrapAuthor: 'test-system',
|
||||
bootstrapAuthorEmail: 'system@example.com',
|
||||
},
|
||||
});
|
||||
|
||||
await expect(releaseEnvService.onModuleInit()).resolves.toBeUndefined();
|
||||
} finally {
|
||||
if (previousAskPass === undefined) {
|
||||
delete process.env.GIT_ASKPASS;
|
||||
} else {
|
||||
process.env.GIT_ASKPASS = previousAskPass;
|
||||
}
|
||||
await rm(releaseEnvDir, { recursive: true, force: true });
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
describe('commitFile `created` flag', () => {
|
||||
it('is true for a real commit', async () => {
|
||||
const info = await writeAndCommit('a.md', '# Hello');
|
||||
expect(info.created).toBe(true);
|
||||
});
|
||||
|
||||
it('is false on a no-op write (content unchanged)', async () => {
|
||||
await writeAndCommit('a.md', '# Hello');
|
||||
const second = await writeAndCommit('a.md', '# Hello', 'unused');
|
||||
expect(second.created).toBe(false);
|
||||
});
|
||||
});
|
||||
|
||||
describe('addNote / getNote', () => {
|
||||
it('attaches a note and reads it back', async () => {
|
||||
const info = await writeAndCommit('a.md', '# Hello');
|
||||
await service.addNote(info.commitHash, 'Rich message from LLM');
|
||||
expect(await service.getNote(info.commitHash)).toBe('Rich message from LLM');
|
||||
});
|
||||
|
||||
it('returns undefined when no note exists', async () => {
|
||||
const info = await writeAndCommit('a.md', '# Hello');
|
||||
expect(await service.getNote(info.commitHash)).toBeUndefined();
|
||||
});
|
||||
|
||||
it('overwrites an existing note (idempotent retries)', async () => {
|
||||
const info = await writeAndCommit('a.md', '# Hello');
|
||||
await service.addNote(info.commitHash, 'First');
|
||||
await service.addNote(info.commitHash, 'Second');
|
||||
expect(await service.getNote(info.commitHash)).toBe('Second');
|
||||
});
|
||||
|
||||
it('skips empty/whitespace messages silently', async () => {
|
||||
const info = await writeAndCommit('a.md', '# Hello');
|
||||
await service.addNote(info.commitHash, ' ');
|
||||
expect(await service.getNote(info.commitHash)).toBeUndefined();
|
||||
});
|
||||
});
|
||||
|
||||
describe('getFileHistory', () => {
|
||||
it('surfaces enhancedMessage when a note is present', async () => {
|
||||
const info = await writeAndCommit('a.md', '# Hello');
|
||||
await service.addNote(info.commitHash, 'Note body');
|
||||
|
||||
const history = await service.getFileHistory('a.md');
|
||||
expect(history[0]?.enhancedMessage).toBe('Note body');
|
||||
});
|
||||
|
||||
it('leaves enhancedMessage undefined when no note is attached', async () => {
|
||||
await writeAndCommit('a.md', '# Hello');
|
||||
const history = await service.getFileHistory('a.md');
|
||||
expect(history[0]?.enhancedMessage).toBeUndefined();
|
||||
});
|
||||
});
|
||||
|
||||
describe('getCommitDiff', () => {
|
||||
it('returns the patch scoped to the requested path', async () => {
|
||||
const info = await writeAndCommit('a.md', '# Hello');
|
||||
const diff = await service.getCommitDiff(info.commitHash, 'a.md');
|
||||
expect(diff).toContain('diff --git');
|
||||
expect(diff).toContain('Hello');
|
||||
});
|
||||
|
||||
it('handles the repository initial commit without throwing', async () => {
|
||||
const info = await writeAndCommit('first.md', 'first');
|
||||
await expect(service.getCommitDiff(info.commitHash, 'first.md')).resolves.toBeDefined();
|
||||
});
|
||||
});
|
||||
|
||||
describe('squashTo', () => {
|
||||
const writeAsSystem = async (filePath: string, content: string, message = 'msg') => {
|
||||
await writeFile(join(tempDir, filePath), content, 'utf-8');
|
||||
return service.commitFile(filePath, message, 'System User', 'system@example.com');
|
||||
};
|
||||
|
||||
it('collapses 3 commits after preHead into a single commit', async () => {
|
||||
const pre = await writeAsSystem('a.md', 'v1');
|
||||
const preHead = pre.commitHash;
|
||||
|
||||
await writeAsSystem('b.md', 'b', 'add b');
|
||||
await writeAsSystem('c.md', 'c', 'add c');
|
||||
await writeAsSystem('a.md', 'v2', 'update a');
|
||||
|
||||
const result = await service.squashTo(preHead, {
|
||||
message: 'Ingest: bundle 3 writes',
|
||||
author: 'System User',
|
||||
authorEmail: 'system@example.com',
|
||||
});
|
||||
|
||||
expect(result.squashed).toBe(true);
|
||||
expect(result.squashedCount).toBe(3);
|
||||
expect(result.commitHash).toBeTruthy();
|
||||
expect(result.commitHash).not.toBe(preHead);
|
||||
const commitHash = result.commitHash;
|
||||
if (!commitHash) {
|
||||
throw new Error('Expected squash commit hash');
|
||||
}
|
||||
|
||||
// The squashed commit should preserve the final tree state.
|
||||
const fileAtSquash = await service.getFileAtCommit('a.md', commitHash);
|
||||
expect(fileAtSquash).toBe('v2');
|
||||
const bAtSquash = await service.getFileAtCommit('b.md', commitHash);
|
||||
expect(bAtSquash).toBe('b');
|
||||
});
|
||||
|
||||
it('is a no-op when preHead equals HEAD', async () => {
|
||||
const pre = await writeAsSystem('a.md', 'v1');
|
||||
|
||||
const result = await service.squashTo(pre.commitHash, {
|
||||
message: 'nothing to squash',
|
||||
author: 'System User',
|
||||
authorEmail: 'system@example.com',
|
||||
});
|
||||
|
||||
expect(result.squashed).toBe(false);
|
||||
expect(result.commitHash).toBe(pre.commitHash);
|
||||
});
|
||||
|
||||
it('skips squash when a foreign-author commit sits between preHead and HEAD', async () => {
|
||||
const pre = await writeAsSystem('a.md', 'v1');
|
||||
const preHead = pre.commitHash;
|
||||
|
||||
await writeAsSystem('b.md', 'from us', 'ours');
|
||||
// Foreign commit
|
||||
await writeAndCommit('c.md', 'from someone else', 'foreign');
|
||||
await writeAsSystem('d.md', 'ours again', 'ours 2');
|
||||
|
||||
const result = await service.squashTo(preHead, {
|
||||
message: 'should be skipped',
|
||||
author: 'System User',
|
||||
authorEmail: 'system@example.com',
|
||||
});
|
||||
|
||||
expect(result.squashed).toBe(false);
|
||||
expect(result.reason).toContain('foreign');
|
||||
expect(result.squashedCount).toBe(3);
|
||||
});
|
||||
|
||||
it('returns cleanly when preHead is empty (no starting commit)', async () => {
|
||||
const result = await service.squashTo('', {
|
||||
message: 'would have squashed',
|
||||
author: 'System User',
|
||||
authorEmail: 'system@example.com',
|
||||
});
|
||||
|
||||
expect(result.squashed).toBe(false);
|
||||
expect(result.commitHash).toBeNull();
|
||||
});
|
||||
});
|
||||
|
||||
describe('worktree lifecycle', () => {
|
||||
// macOS canonicalizes tmp paths (/var/folders → /private/var/folders) when git
|
||||
// returns them from `worktree list`. Resolve through realpath() before comparing.
|
||||
const canonicalSiblingPath = async (suffix: string): Promise<string> => {
|
||||
const parent = await realpath(join(tempDir, '..'));
|
||||
return join(parent, `wt-${Date.now()}-${suffix}`);
|
||||
};
|
||||
|
||||
it('addWorktree creates a branch + directory at the given startSha', async () => {
|
||||
const { commitHash } = await writeAndCommit('seed.md', 'seed');
|
||||
const wtDir = await canonicalSiblingPath('add');
|
||||
await service.addWorktree(wtDir, 'session/alpha', commitHash);
|
||||
const list = await service.listWorktrees();
|
||||
expect(list.find((e) => e.path === wtDir && e.branch === 'refs/heads/session/alpha')).toBeTruthy();
|
||||
await service.removeWorktree(wtDir).catch(() => undefined);
|
||||
await rm(wtDir, { recursive: true, force: true }).catch(() => undefined);
|
||||
});
|
||||
|
||||
it('removeWorktree detaches the worktree entry', async () => {
|
||||
const { commitHash } = await writeAndCommit('seed.md', 'seed');
|
||||
const wtDir = await canonicalSiblingPath('rm');
|
||||
await service.addWorktree(wtDir, 'session/beta', commitHash);
|
||||
await service.removeWorktree(wtDir);
|
||||
const list = await service.listWorktrees();
|
||||
expect(list.find((e) => e.path === wtDir)).toBeFalsy();
|
||||
});
|
||||
|
||||
it('deleteBranch removes a branch ref', async () => {
|
||||
const { commitHash } = await writeAndCommit('seed.md', 'seed');
|
||||
const wtDir = await canonicalSiblingPath('br');
|
||||
await service.addWorktree(wtDir, 'session/gamma', commitHash);
|
||||
await service.removeWorktree(wtDir);
|
||||
await service.deleteBranch('session/gamma', true);
|
||||
const branches = await (service as unknown as { git: import('simple-git').SimpleGit }).git.branchLocal();
|
||||
expect(branches.all).not.toContain('session/gamma');
|
||||
await rm(wtDir, { recursive: true, force: true }).catch(() => undefined);
|
||||
});
|
||||
});
|
||||
|
||||
describe('forWorktree', () => {
|
||||
it('returns a GitService whose operations run inside the given worktree', async () => {
|
||||
const { commitHash } = await writeAndCommit('seed.md', 'seed');
|
||||
const parent = await realpath(join(tempDir, '..'));
|
||||
const wtDir = join(parent, `wt-${Date.now()}-fw`);
|
||||
await service.addWorktree(wtDir, 'session/delta', commitHash);
|
||||
|
||||
const scoped = service.forWorktree(wtDir);
|
||||
expect(await scoped.revParseHead()).toBe(commitHash);
|
||||
|
||||
await service.removeWorktree(wtDir).catch(() => undefined);
|
||||
await rm(wtDir, { recursive: true, force: true }).catch(() => undefined);
|
||||
});
|
||||
|
||||
it('serializes concurrent commits from scoped services targeting the same worktree', async () => {
|
||||
const { commitHash } = await writeAndCommit('seed.md', 'seed');
|
||||
const parent = await realpath(join(tempDir, '..'));
|
||||
const wtDir = join(parent, `wt-${Date.now()}-fw-concurrent`);
|
||||
await service.addWorktree(wtDir, 'session/concurrent', commitHash);
|
||||
|
||||
const first = service.forWorktree(wtDir);
|
||||
const second = service.forWorktree(wtDir);
|
||||
await writeFile(join(wtDir, 'a.md'), 'a\n', 'utf-8');
|
||||
await writeFile(join(wtDir, 'b.md'), 'b\n', 'utf-8');
|
||||
|
||||
const [a, b] = await Promise.all([
|
||||
first.commitFile('a.md', 'add a', 'System User', 'system@example.com'),
|
||||
second.commitFile('b.md', 'add b', 'System User', 'system@example.com'),
|
||||
]);
|
||||
|
||||
expect(a.commitHash).toMatch(/^[0-9a-f]{40}$/);
|
||||
expect(b.commitHash).toMatch(/^[0-9a-f]{40}$/);
|
||||
await expect(first.getFileAtCommit('a.md', a.commitHash)).resolves.toBe('a\n');
|
||||
await expect(second.getFileAtCommit('b.md', b.commitHash)).resolves.toBe('b\n');
|
||||
|
||||
await service.removeWorktree(wtDir).catch(() => undefined);
|
||||
await rm(wtDir, { recursive: true, force: true }).catch(() => undefined);
|
||||
});
|
||||
});
|
||||
|
||||
describe('squashMergeIntoMain', () => {
|
||||
it('merges a session branch as one commit on main, returning the new SHA + touched paths', async () => {
|
||||
const { commitHash: baseSha } = await writeAndCommit('seed.md', 'seed');
|
||||
const parent = await realpath(join(tempDir, '..'));
|
||||
const wtDir = join(parent, `wt-${Date.now()}-sm`);
|
||||
await service.addWorktree(wtDir, 'session/happy', baseSha);
|
||||
|
||||
const scoped = service.forWorktree(wtDir);
|
||||
await writeFile(join(wtDir, 'a.yaml'), 'one: 1\n', 'utf-8');
|
||||
await scoped.commitFile('a.yaml', 'wip a', 'System User', 'system@example.com');
|
||||
await writeFile(join(wtDir, 'b.yaml'), 'two: 2\n', 'utf-8');
|
||||
await scoped.commitFile('b.yaml', 'wip b', 'System User', 'system@example.com');
|
||||
|
||||
const result = await service.squashMergeIntoMain(
|
||||
'session/happy',
|
||||
'System User',
|
||||
'system@example.com',
|
||||
'Memory capture: 2 files [chat=abcd1234]',
|
||||
);
|
||||
|
||||
expect(result.ok).toBe(true);
|
||||
if (!result.ok) {
|
||||
throw new Error('unreachable');
|
||||
}
|
||||
expect(result.squashSha).toMatch(/^[0-9a-f]{40}$/);
|
||||
expect(result.touchedPaths.sort()).toEqual(['a.yaml', 'b.yaml']);
|
||||
|
||||
const mainHead = await service.revParseHead();
|
||||
expect(mainHead).toBe(result.squashSha);
|
||||
expect(mainHead).not.toBe(baseSha);
|
||||
|
||||
await service.removeWorktree(wtDir).catch(() => undefined);
|
||||
await rm(wtDir, { recursive: true, force: true }).catch(() => undefined);
|
||||
});
|
||||
|
||||
it('returns ok with empty touchedPaths when the session branch has no diff vs main', async () => {
|
||||
const { commitHash: baseSha } = await writeAndCommit('seed.md', 'seed');
|
||||
const parent = await realpath(join(tempDir, '..'));
|
||||
const wtDir = join(parent, `wt-${Date.now()}-sm-empty`);
|
||||
await service.addWorktree(wtDir, 'session/empty', baseSha);
|
||||
|
||||
const result = await service.squashMergeIntoMain(
|
||||
'session/empty',
|
||||
'System User',
|
||||
'system@example.com',
|
||||
'should be a no-op',
|
||||
);
|
||||
|
||||
expect(result.ok).toBe(true);
|
||||
if (!result.ok) {
|
||||
throw new Error('unreachable');
|
||||
}
|
||||
expect(result.touchedPaths).toEqual([]);
|
||||
expect(result.squashSha).toBe(baseSha);
|
||||
|
||||
await service.removeWorktree(wtDir).catch(() => undefined);
|
||||
await rm(wtDir, { recursive: true, force: true }).catch(() => undefined);
|
||||
});
|
||||
|
||||
it('returns conflict=true and leaves main clean when session+main touched same file differently', async () => {
|
||||
await writeAndCommit('shared.yaml', 'base\n');
|
||||
const base = await service.revParseHead();
|
||||
if (!base) {
|
||||
throw new Error('no base head');
|
||||
}
|
||||
|
||||
const parent = await realpath(join(tempDir, '..'));
|
||||
const wtDir = join(parent, `wt-${Date.now()}-conf`);
|
||||
await service.addWorktree(wtDir, 'session/conf', base);
|
||||
const scoped = service.forWorktree(wtDir);
|
||||
await writeFile(join(wtDir, 'shared.yaml'), 'session-edit\n', 'utf-8');
|
||||
await scoped.commitFile('shared.yaml', 'session edit', 'System User', 'system@example.com');
|
||||
|
||||
// Main edits the same file a different way, after the session branched.
|
||||
await writeAndCommit('shared.yaml', 'main-edit\n');
|
||||
|
||||
const result = await service.squashMergeIntoMain(
|
||||
'session/conf',
|
||||
'System User',
|
||||
'system@example.com',
|
||||
'Memory capture: 1 file [chat=dead1234]',
|
||||
);
|
||||
|
||||
expect(result.ok).toBe(false);
|
||||
if (result.ok) {
|
||||
throw new Error('unreachable');
|
||||
}
|
||||
expect(result.conflict).toBe(true);
|
||||
expect(result.conflictPaths).toContain('shared.yaml');
|
||||
|
||||
const status = await (service as unknown as { git: import('simple-git').SimpleGit }).git.status();
|
||||
expect(status.isClean()).toBe(true);
|
||||
|
||||
await service.removeWorktree(wtDir).catch(() => undefined);
|
||||
await rm(wtDir, { recursive: true, force: true }).catch(() => undefined);
|
||||
});
|
||||
|
||||
it('reports untracked files that would be overwritten by the squash merge', async () => {
|
||||
const { commitHash: baseSha } = await writeAndCommit('seed.md', 'seed');
|
||||
const parent = await realpath(join(tempDir, '..'));
|
||||
const wtDir = join(parent, `wt-${Date.now()}-untracked`);
|
||||
await service.addWorktree(wtDir, 'session/untracked', baseSha);
|
||||
|
||||
const scoped = service.forWorktree(wtDir);
|
||||
await writeFile(join(wtDir, 'knowledge.md'), 'session version\n', 'utf-8');
|
||||
await scoped.commitFile('knowledge.md', 'session write', 'System User', 'system@example.com');
|
||||
await writeFile(join(tempDir, 'knowledge.md'), 'untracked local version\n', 'utf-8');
|
||||
|
||||
const result = await service.squashMergeIntoMain(
|
||||
'session/untracked',
|
||||
'System User',
|
||||
'system@example.com',
|
||||
'Memory capture: 1 file [chat=untracked]',
|
||||
);
|
||||
|
||||
expect(result.ok).toBe(false);
|
||||
if (result.ok) {
|
||||
throw new Error('unreachable');
|
||||
}
|
||||
expect(result.conflict).toBe(true);
|
||||
expect(result.conflictPaths).toEqual(['knowledge.md']);
|
||||
|
||||
const status = await (service as unknown as { git: import('simple-git').SimpleGit }).git.status();
|
||||
expect(status.not_added).toContain('knowledge.md');
|
||||
|
||||
await service.removeWorktree(wtDir).catch(() => undefined);
|
||||
await rm(wtDir, { recursive: true, force: true }).catch(() => undefined);
|
||||
});
|
||||
});
|
||||
});
|
||||
1062
packages/cli/src/context/core/git.service.ts
Normal file
File diff suppressed because it is too large
48
packages/cli/src/context/core/redaction.ts
Normal file
@@ -0,0 +1,48 @@
/** @internal */
export const REDACTED_KTX_CREDENTIAL_VALUE = '<redacted>';

const SENSITIVE_FIELD_NAME = /(password|secret|token|api[_-]?key|private[_-]?key|passphrase|credential|authorization|url)/i;
const URL_CREDENTIAL_PATTERN = /([a-z][a-z0-9+.-]*:\/\/[^:\s/@]+:)([^@\s/]+)(@)/gi;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function isSensitiveField(key: string): boolean {
  return SENSITIVE_FIELD_NAME.test(key);
}

export function redactKtxSensitiveValue(key: string, value: unknown): unknown {
  if (isSensitiveField(key)) {
    return REDACTED_KTX_CREDENTIAL_VALUE;
  }
  if (Array.isArray(value)) {
    return value.map((item) => redactKtxSensitiveValue(key, item));
  }
  if (isRecord(value)) {
    return redactKtxSensitiveMetadata(value);
  }
  return value;
}

export function redactKtxSensitiveMetadata(metadata: Record<string, unknown>): Record<string, unknown> {
  const redacted: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(metadata)) {
    if (Array.isArray(value)) {
      redacted[key] = value.map((item) =>
        isRecord(item) ? redactKtxSensitiveMetadata(item) : redactKtxSensitiveValue(key, item),
      );
      continue;
    }
    if (isRecord(value)) {
      redacted[key] = redactKtxSensitiveValue(key, value);
      continue;
    }
    redacted[key] = redactKtxSensitiveValue(key, value);
  }
  return redacted;
}

export function redactKtxSensitiveText(value: string): string {
  return value.replace(URL_CREDENTIAL_PATTERN, `$1${REDACTED_KTX_CREDENTIAL_VALUE}$3`);
}
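A minimal usage sketch for the redaction helpers in `redaction.ts`. It re-declares the two regular expressions from that file so it runs standalone; `redactText` and `isSensitiveKey` are hypothetical stand-ins for `redactKtxSensitiveText` and the private `isSensitiveField`, and the connection string is invented for illustration.

```typescript
// Standalone sketch: patterns copied from redaction.ts; helper names are illustrative only.
const REDACTED = '<redacted>';
const SENSITIVE_FIELD_NAME =
  /(password|secret|token|api[_-]?key|private[_-]?key|passphrase|credential|authorization|url)/i;
const URL_CREDENTIAL_PATTERN = /([a-z][a-z0-9+.-]*:\/\/[^:\s/@]+:)([^@\s/]+)(@)/gi;

// Mirrors redactKtxSensitiveText: only the password part of scheme://user:password@host is replaced.
function redactText(value: string): string {
  return value.replace(URL_CREDENTIAL_PATTERN, `$1${REDACTED}$3`);
}

// Mirrors isSensitiveField: a case-insensitive substring match on the key name.
function isSensitiveKey(key: string): boolean {
  return SENSITIVE_FIELD_NAME.test(key);
}

const redactedLog = redactText('connect postgres://alice:s3cret@db.example.com:5432/app failed');
const keyChecks = {
  databaseUrl: isSensitiveKey('databaseUrl'), // true: the key contains "url"
  API_KEY: isSensitiveKey('API_KEY'), // true: matches api[_-]?key
  host: isSensitiveKey('host'), // false: no sensitive fragment
};
console.log(redactedLog, keyChecks);
```

Because key matching is substring-based, any key containing `url` is redacted wholesale, even when the URL itself carries no credentials.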
124
packages/cli/src/context/core/session-worktree.service.test.ts
Normal file
@@ -0,0 +1,124 @@
import { mkdtemp, realpath, rm, stat } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import type { KtxCoreConfig } from './config.js';
import { GitService } from './git.service.js';
import { SessionWorktreeService, type WorktreeConfigPort } from './session-worktree.service.js';

interface TestWorktreeConfig extends WorktreeConfigPort<TestWorktreeConfig> {
  workdir?: string;
}

// SessionWorktreeService glues a real GitService to a scoped config adapter.
describe('SessionWorktreeService', () => {
  let sessionService: SessionWorktreeService<TestWorktreeConfig>;
  let gitService: GitService;
  let homeDir: string;

  beforeEach(async () => {
    homeDir = await mkdtemp(join(tmpdir(), 'sws-spec-'));
    homeDir = await realpath(homeDir);

    const coreConfig: KtxCoreConfig = {
      storage: { configDir: homeDir, homeDir },
      git: {
        userName: 'System User',
        userEmail: 'system@example.com',
        bootstrapMessage: 'Initialize test config repo',
        bootstrapAuthor: 'test-system',
        bootstrapAuthorEmail: 'system@example.com',
      },
    };

    gitService = new GitService(coreConfig);
    await gitService.onModuleInit();
    const configService: TestWorktreeConfig = {
      forWorktree: vi.fn(
        (workdir: string): TestWorktreeConfig => ({ workdir, forWorktree: configService.forWorktree }),
      ),
    };
    sessionService = new SessionWorktreeService({
      coreConfig,
      gitService,
      configService,
    });
  });

  afterEach(async () => {
    await rm(homeDir, { recursive: true, force: true });
  });

  describe('create', () => {
    it('creates a worktree + branch and returns scoped services', async () => {
      const baseSha = await gitService.revParseHead();
      if (!baseSha) {
        throw new Error('no base sha');
      }

      const session = await sessionService.create('chat-abc', baseSha);

      expect(session.workdir).toBe(join(homeDir, '.worktrees', 'session-chat-abc'));
      expect(session.branch).toBe('session/chat-abc');
      expect(session.baseSha).toBe(baseSha);
      const stats = await stat(session.workdir);
      expect(stats.isDirectory()).toBe(true);

      // Scoped git instance reports the worktree's HEAD (= baseSha at creation time).
      expect(await session.git.revParseHead()).toBe(baseSha);

      const list = await gitService.listWorktrees();
      expect(list.find((e) => e.path === session.workdir)).toBeTruthy();
    });

    it('appends a timestamp suffix when the primary dir already exists', async () => {
      const baseSha = await gitService.revParseHead();
      if (!baseSha) {
        throw new Error('no base sha');
      }

      const first = await sessionService.create('chat-dup', baseSha);
      const second = await sessionService.create('chat-dup', baseSha);

      expect(first.workdir).not.toBe(second.workdir);
      expect(second.branch).toMatch(/^session\/chat-dup-\d+$/);
    });
  });

  describe('cleanup', () => {
    it('success removes the worktree dir and deletes the branch', async () => {
      const baseSha = await gitService.revParseHead();
      if (!baseSha) {
        throw new Error('no base sha');
      }

      const session = await sessionService.create('chat-cleanup-ok', baseSha);
      await sessionService.cleanup(session, 'success');

      const list = await gitService.listWorktrees();
      expect(list.find((e) => e.path === session.workdir)).toBeFalsy();
      await expect(stat(session.workdir)).rejects.toThrow();
    });

    it('conflict keeps the worktree and writes a sentinel file', async () => {
      const baseSha = await gitService.revParseHead();
      if (!baseSha) {
        throw new Error('no base sha');
      }

      const session = await sessionService.create('chat-cleanup-conflict', baseSha);
      await sessionService.cleanup(session, 'conflict', { conflictPaths: ['shared.yaml'] });

      // Dir still exists.
      await expect(stat(session.workdir)).resolves.toBeTruthy();

      const { readFile } = await import('node:fs/promises');
      const raw = await readFile(join(session.workdir, '.ktx-outcome'), 'utf-8');
      const parsed = JSON.parse(raw);
      expect(parsed.outcome).toBe('conflict');
      expect(parsed.chatId).toBe('chat-cleanup-conflict');
      expect(parsed.conflictPaths).toEqual(['shared.yaml']);
      expect(typeof parsed.at).toBe('string');
    });
  });
});
113
packages/cli/src/context/core/session-worktree.service.ts
Normal file
@@ -0,0 +1,113 @@
import { mkdir, stat, writeFile } from 'node:fs/promises';
import { join } from 'node:path';
import { noopLogger, resolveWorktreesDir, type KtxCoreConfig, type KtxLogger } from './config.js';
import { GitService } from './git.service.js';

export type SessionOutcome = 'success' | 'empty' | 'conflict' | 'crash';

interface SentinelPayload {
  outcome: SessionOutcome;
  at: string;
  chatId: string;
  baseSha: string;
  conflictPaths?: string[];
}

export interface WorktreeConfigPort<TConfig> {
  forWorktree(workdir: string): TConfig;
}

export interface SessionWorktree<TConfig> {
  chatId: string;
  workdir: string;
  branch: string;
  baseSha: string;
  createdAt: Date;
  git: GitService;
  config: TConfig;
}

export interface SessionWorktreeServiceDeps<TConfig extends WorktreeConfigPort<TConfig>> {
  coreConfig: KtxCoreConfig;
  gitService: GitService;
  configService: TConfig;
  logger?: KtxLogger;
}

export class SessionWorktreeService<TConfig extends WorktreeConfigPort<TConfig> = WorktreeConfigPort<never>> {
  private readonly logger: KtxLogger;
  private readonly worktreesRoot: string;

  constructor(private readonly deps: SessionWorktreeServiceDeps<TConfig>) {
    this.logger = deps.logger ?? noopLogger;
    this.worktreesRoot = resolveWorktreesDir(deps.coreConfig);
  }

  async create(sessionKey: string, baseSha: string): Promise<SessionWorktree<TConfig>> {
    await mkdir(this.worktreesRoot, { recursive: true });

    let dirName = `session-${sessionKey}`;
    let branch = `session/${sessionKey}`;
    let workdir = join(this.worktreesRoot, dirName);

    try {
      await stat(workdir);
      const suffix = Date.now().toString();
      dirName = `session-${sessionKey}-${suffix}`;
      branch = `session/${sessionKey}-${suffix}`;
      workdir = join(this.worktreesRoot, dirName);
      this.logger.warn(`session worktree collision for key=${sessionKey}; using suffix ${suffix}`);
    } catch {
      // no collision: primary name is free
    }

    await this.deps.gitService.addWorktree(workdir, branch, baseSha);

    return {
      chatId: sessionKey,
      workdir,
      branch,
      baseSha,
      createdAt: new Date(),
      git: this.deps.gitService.forWorktree(workdir),
      config: this.deps.configService.forWorktree(workdir),
    };
  }

  async cleanup(
    session: SessionWorktree<TConfig>,
    outcome: SessionOutcome,
    extra?: { conflictPaths?: string[] },
  ): Promise<void> {
    if (outcome === 'success' || outcome === 'empty') {
      try {
        await this.deps.gitService.removeWorktree(session.workdir);
        await this.deps.gitService.deleteBranch(session.branch, true);
      } catch (error) {
        this.logger.warn(
          `cleanup(${outcome}) failed for ${session.chatId}: ${
            error instanceof Error ? error.message : String(error)
          }`,
        );
      }
      return;
    }

    const payload: SentinelPayload = {
      outcome,
      at: new Date().toISOString(),
      chatId: session.chatId,
      baseSha: session.baseSha,
      ...(extra?.conflictPaths ? { conflictPaths: extra.conflictPaths } : {}),
    };
    try {
      await writeFile(join(session.workdir, '.ktx-outcome'), JSON.stringify(payload, null, 2), 'utf-8');
    } catch (error) {
      this.logger.warn(
        `cleanup(${outcome}) failed to write sentinel for ${session.chatId}: ${
          error instanceof Error ? error.message : String(error)
        }`,
      );
    }
  }
}
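The naming fallback in `SessionWorktreeService.create` can be sketched as a pure function. This is an illustration under assumptions, not code from the commit: `deriveSessionNames`, its `primaryExists` flag (standing in for the `stat` probe), and the injected `now` value are all hypothetical.

```typescript
// Hypothetical pure helper: the commit performs this logic inline in SessionWorktreeService.create.
interface SessionNames {
  dirName: string;
  branch: string;
}

function deriveSessionNames(sessionKey: string, primaryExists: boolean, now: number): SessionNames {
  if (!primaryExists) {
    // No collision: use the primary directory and branch names.
    return { dirName: `session-${sessionKey}`, branch: `session/${sessionKey}` };
  }
  // Collision: append the millisecond timestamp to both the directory and the branch.
  const suffix = now.toString();
  return { dirName: `session-${sessionKey}-${suffix}`, branch: `session/${sessionKey}-${suffix}` };
}

const primary = deriveSessionNames('chat-abc', false, 1700000000000);
const fallback = deriveSessionNames('chat-abc', true, 1700000000000);
console.log(primary, fallback);
```

Passing the clock in as `now` keeps the sketch deterministic; the service itself calls `Date.now()` directly, so two collisions for the same key within one millisecond would presumably derive the same suffix.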
339
packages/cli/src/context/daemon/semantic-layer-compute.test.ts
Normal file
@@ -0,0 +1,339 @@
import { once } from 'node:events';
import { createServer } from 'node:http';
import { describe, expect, it, vi } from 'vitest';
import { createHttpSemanticLayerComputePort, createPythonSemanticLayerComputePort } from './semantic-layer-compute.js';

const source = {
  name: 'orders',
  table: 'public.orders',
  grain: ['id'],
  columns: [{ name: 'id', type: 'number' }],
  joins: [],
  measures: [{ name: 'order_count', expr: 'count(*)' }],
};

const sourceGenerationInput = {
  tables: [
    {
      name: 'orders',
      db: 'public',
      comment: 'Orders table',
      columns: [
        { name: 'id', type: 'integer', primaryKey: true, nullable: false, comment: 'Order ID' },
        { name: 'customer_id', type: 'integer' },
        { name: 'amount', type: 'decimal', comment: 'Order amount' },
      ],
    },
    {
      name: 'customers',
      db: 'public',
      columns: [
        { name: 'id', type: 'integer', primaryKey: true },
        { name: 'email', type: 'varchar' },
      ],
    },
  ],
  links: [
    {
      fromTable: 'orders',
      fromColumn: 'customer_id',
      toTable: 'customers',
      toColumn: 'id',
      relationshipType: 'MANY_TO_ONE',
    },
  ],
  dialect: 'postgres',
};

const sourceGenerationDaemonPayload = {
  tables: [
    {
      name: 'orders',
      db: 'public',
      comment: 'Orders table',
      columns: [
        { name: 'id', type: 'integer', primary_key: true, nullable: false, comment: 'Order ID' },
        { name: 'customer_id', type: 'integer' },
        { name: 'amount', type: 'decimal', comment: 'Order amount' },
      ],
    },
    {
      name: 'customers',
      db: 'public',
      columns: [
        { name: 'id', type: 'integer', primary_key: true },
        { name: 'email', type: 'varchar' },
      ],
    },
  ],
  links: [
    {
      from_table: 'orders',
      from_column: 'customer_id',
      to_table: 'customers',
      to_column: 'id',
      relationship_type: 'MANY_TO_ONE',
    },
  ],
  dialect: 'postgres',
};

const sourceGenerationDaemonResponse = {
  source_count: 2,
  sources: [
    {
      name: 'orders',
      table: 'public.orders',
      grain: ['id'],
      columns: [{ name: 'id', type: 'number' }],
      joins: [
        {
          to: 'customers',
          on: 'customer_id = customers.id',
          relationship: 'many_to_one',
        },
      ],
      measures: [{ name: 'record_count', expr: 'count(id)' }],
    },
  ],
};

describe('createPythonSemanticLayerComputePort', () => {
  it('calls the semantic-query stdio command', async () => {
    const runJson = vi.fn(async () => ({
      sql: 'select count(*) from public.orders',
      dialect: 'postgres',
      columns: [{ name: 'orders.order_count' }],
      plan: { sources_used: ['orders'] },
    }));
    const port = createPythonSemanticLayerComputePort({ runJson });

    await expect(
      port.query({
        sources: [source],
        dialect: 'postgres',
        query: { measures: ['orders.order_count'], dimensions: [] },
      }),
    ).resolves.toEqual({
      sql: 'select count(*) from public.orders',
      dialect: 'postgres',
      columns: [{ name: 'orders.order_count' }],
      plan: { sources_used: ['orders'] },
    });

    expect(runJson).toHaveBeenCalledWith('semantic-query', {
      sources: [source],
      dialect: 'postgres',
      query: { measures: ['orders.order_count'], dimensions: [] },
    });
  });

  it('calls the semantic-validate stdio command', async () => {
    const runJson = vi.fn(async () => ({
      valid: true,
      errors: [],
      warnings: [],
      per_source_warnings: {},
    }));
    const port = createPythonSemanticLayerComputePort({ runJson });

    await expect(
      port.validateSources({
        sources: [source],
        dialect: 'postgres',
        recentlyTouched: ['orders'],
      }),
    ).resolves.toEqual({
      valid: true,
      errors: [],
      warnings: [],
      perSourceWarnings: {},
    });

    expect(runJson).toHaveBeenCalledWith('semantic-validate', {
      sources: [source],
      dialect: 'postgres',
      recently_touched: ['orders'],
    });
  });

  it('calls the semantic-generate-sources stdio command', async () => {
    const runJson = vi.fn(async () => sourceGenerationDaemonResponse);
    const port = createPythonSemanticLayerComputePort({ runJson });

    await expect(port.generateSources(sourceGenerationInput)).resolves.toEqual({
      sourceCount: 2,
      sources: sourceGenerationDaemonResponse.sources,
    });

    expect(runJson).toHaveBeenCalledWith('semantic-generate-sources', sourceGenerationDaemonPayload);
  });
});

describe('createHttpSemanticLayerComputePort', () => {
  it('calls semantic query and validate HTTP endpoints through an injected runner', async () => {
    const requestJson = vi.fn(async (path: string) => {
      if (path === '/semantic-layer/query') {
        return {
          sql: 'select count(*) from public.orders',
          dialect: 'postgres',
          columns: [{ name: 'orders.order_count' }],
          plan: { sources_used: ['orders'] },
        };
      }
      return {
        valid: true,
        errors: [],
        warnings: [],
        per_source_warnings: {},
      };
    });
    const port = createHttpSemanticLayerComputePort({ baseUrl: 'http://127.0.0.1:8765/', requestJson });

    await expect(
      port.query({
        sources: [source],
        dialect: 'postgres',
        query: { measures: ['orders.order_count'], dimensions: [] },
      }),
    ).resolves.toEqual({
      sql: 'select count(*) from public.orders',
      dialect: 'postgres',
      columns: [{ name: 'orders.order_count' }],
      plan: { sources_used: ['orders'] },
    });

    await expect(
      port.validateSources({
        sources: [source],
        dialect: 'postgres',
        recentlyTouched: ['orders'],
      }),
    ).resolves.toEqual({
      valid: true,
      errors: [],
      warnings: [],
      perSourceWarnings: {},
    });

    expect(requestJson).toHaveBeenNthCalledWith(1, '/semantic-layer/query', {
      sources: [source],
      dialect: 'postgres',
      query: { measures: ['orders.order_count'], dimensions: [] },
    });
    expect(requestJson).toHaveBeenNthCalledWith(2, '/semantic-layer/validate', {
      sources: [source],
      dialect: 'postgres',
      recently_touched: ['orders'],
    });
  });

  it('calls the semantic source-generation HTTP endpoint through an injected runner', async () => {
    const requestJson = vi.fn(async () => sourceGenerationDaemonResponse);
    const port = createHttpSemanticLayerComputePort({ baseUrl: 'http://127.0.0.1:8765/', requestJson });

    await expect(port.generateSources(sourceGenerationInput)).resolves.toEqual({
      sourceCount: 2,
      sources: sourceGenerationDaemonResponse.sources,
    });

    expect(requestJson).toHaveBeenCalledWith('/semantic-layer/generate-sources', sourceGenerationDaemonPayload);
  });

  it('posts JSON to a running HTTP daemon endpoint', async () => {
    const requests: Array<{ url: string | undefined; body: unknown }> = [];
    const server = createServer((request, response) => {
      const chunks: Buffer[] = [];
      request.on('data', (chunk: Buffer) => chunks.push(chunk));
      request.on('end', () => {
        requests.push({
          url: request.url,
          body: JSON.parse(Buffer.concat(chunks).toString('utf8')),
        });
        response.writeHead(200, { 'content-type': 'application/json' });
        response.end(
          JSON.stringify({
            sql: 'select count(*) from public.orders',
            dialect: 'postgres',
            columns: [{ name: 'orders.order_count' }],
            plan: { sources_used: ['orders'] },
          }),
        );
      });
    });

    server.listen(0, '127.0.0.1');
    await once(server, 'listening');
    try {
      const address = server.address();
      if (!address || typeof address === 'string') {
        throw new Error('expected TCP server address');
      }
      const port = createHttpSemanticLayerComputePort({ baseUrl: `http://127.0.0.1:${address.port}` });

      await expect(
        port.query({
          sources: [source],
          dialect: 'postgres',
          query: { measures: ['orders.order_count'], dimensions: [] },
        }),
      ).resolves.toMatchObject({
        sql: 'select count(*) from public.orders',
        dialect: 'postgres',
      });

      expect(requests).toEqual([
        {
          url: '/semantic-layer/query',
          body: {
            sources: [source],
            dialect: 'postgres',
            query: { measures: ['orders.order_count'], dimensions: [] },
          },
        },
      ]);
    } finally {
      server.close();
    }
  });

  it('posts source-generation JSON to a running HTTP daemon endpoint', async () => {
    const requests: Array<{ url: string | undefined; body: unknown }> = [];
    const server = createServer((request, response) => {
      const chunks: Buffer[] = [];
      request.on('data', (chunk: Buffer) => chunks.push(chunk));
      request.on('end', () => {
        requests.push({
          url: request.url,
          body: JSON.parse(Buffer.concat(chunks).toString('utf8')),
        });
        response.writeHead(200, { 'content-type': 'application/json' });
        response.end(JSON.stringify(sourceGenerationDaemonResponse));
      });
    });

    server.listen(0, '127.0.0.1');
    await once(server, 'listening');
    try {
      const address = server.address();
      if (!address || typeof address === 'string') {
        throw new Error('expected TCP server address');
      }
      const port = createHttpSemanticLayerComputePort({ baseUrl: `http://127.0.0.1:${address.port}` });

      await expect(port.generateSources(sourceGenerationInput)).resolves.toEqual({
        sourceCount: 2,
        sources: sourceGenerationDaemonResponse.sources,
      });

      expect(requests).toEqual([
        {
          url: '/semantic-layer/generate-sources',
          body: sourceGenerationDaemonPayload,
        },
      ]);
    } finally {
      server.close();
    }
  });
});
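The fixtures in this test pair a camelCase input (`sourceGenerationInput`) with the snake_case payload the daemon receives (`sourceGenerationDaemonPayload`). A minimal sketch of that mapping for a single column follows; `ColumnInput` and `toDaemonColumn` are hypothetical names introduced for illustration, modelled on the conditional-spread pattern in `sourceGenerationPayload`.

```typescript
// Hypothetical column shape, modelled on KtxSemanticLayerSourceGenerationColumnInput.
interface ColumnInput {
  name: string;
  type: string;
  primaryKey?: boolean;
  nullable?: boolean;
  comment?: string | null;
}

// Optional fields are copied only when they are not undefined, so absent keys stay absent
// in the payload, while an explicit null is preserved.
function toDaemonColumn(column: ColumnInput): Record<string, unknown> {
  return {
    name: column.name,
    type: column.type,
    ...(column.primaryKey !== undefined ? { primary_key: column.primaryKey } : {}),
    ...(column.nullable !== undefined ? { nullable: column.nullable } : {}),
    ...(column.comment !== undefined ? { comment: column.comment } : {}),
  };
}

const full = toDaemonColumn({ name: 'id', type: 'integer', primaryKey: true, nullable: false, comment: 'Order ID' });
const sparse = toDaemonColumn({ name: 'customer_id', type: 'integer' });
const nulled = toDaemonColumn({ name: 'email', type: 'varchar', comment: null });
console.log(full, sparse, nulled);
```

Omitting undefined keys, rather than sending them as `null`, is what lets the strict `toEqual` comparisons against `sourceGenerationDaemonPayload` pass for columns such as `customer_id`.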
314
packages/cli/src/context/daemon/semantic-layer-compute.ts
Normal file
@@ -0,0 +1,314 @@
import { request as httpRequest } from 'node:http';
import { request as httpsRequest } from 'node:https';
import { URL } from 'node:url';
import { spawn } from 'node:child_process';
import type { ResolvedSemanticLayerSource, SemanticLayerQueryInput } from '../sl/types.js';

interface KtxSemanticLayerComputeQueryResult {
  sql: string;
  dialect: string;
  columns: Array<Record<string, unknown>>;
  plan: Record<string, unknown>;
}

interface KtxSemanticLayerComputeValidationResult {
  valid: boolean;
  errors: string[];
  warnings: string[];
  perSourceWarnings: Record<string, string[]>;
}

interface KtxSemanticLayerSourceGenerationColumnInput {
  name: string;
  type: string;
  primaryKey?: boolean;
  nullable?: boolean;
  comment?: string | null;
}

interface KtxSemanticLayerSourceGenerationTableInput {
  name: string;
  catalog?: string | null;
  db?: string | null;
  comment?: string | null;
  columns: KtxSemanticLayerSourceGenerationColumnInput[];
}

interface KtxSemanticLayerSourceGenerationLinkInput {
  fromTable: string;
  fromColumn: string;
  toTable: string;
  toColumn: string;
  relationshipType: string;
}

interface KtxSemanticLayerSourceGenerationInput {
  tables: KtxSemanticLayerSourceGenerationTableInput[];
  links: KtxSemanticLayerSourceGenerationLinkInput[];
  dialect?: string;
}

interface KtxSemanticLayerSourceGenerationResult {
  sources: Array<Record<string, unknown>>;
  sourceCount: number;
}

export interface KtxSemanticLayerComputePort {
  /**
   * Callers must pass sources sanitized through toResolvedWire. The Python
   * daemon rejects authoring-only fields such as usage and inherits_columns_from.
   */
  query(input: {
    sources: ResolvedSemanticLayerSource[];
    query: SemanticLayerQueryInput;
    dialect: string;
  }): Promise<KtxSemanticLayerComputeQueryResult>;
  /**
   * Callers must pass sources sanitized through toResolvedWire. The Python
   * daemon rejects authoring-only fields such as usage and inherits_columns_from.
   */
  validateSources(input: {
    sources: ResolvedSemanticLayerSource[];
    dialect: string;
    recentlyTouched?: string[];
  }): Promise<KtxSemanticLayerComputeValidationResult>;
  generateSources(input: KtxSemanticLayerSourceGenerationInput): Promise<KtxSemanticLayerSourceGenerationResult>;
}

type KtxDaemonCommand = 'semantic-query' | 'semantic-validate' | 'semantic-generate-sources';

type KtxDaemonJsonRunner = (
  subcommand: KtxDaemonCommand,
  payload: Record<string, unknown>,
) => Promise<Record<string, unknown>>;

type KtxDaemonHttpJsonRunner = (path: string, payload: Record<string, unknown>) => Promise<Record<string, unknown>>;

export interface PythonSemanticLayerComputeOptions {
  command?: string;
  args?: string[];
  cwd?: string;
  env?: NodeJS.ProcessEnv;
  runJson?: KtxDaemonJsonRunner;
}

/** @internal */
export interface HttpSemanticLayerComputeOptions {
  baseUrl: string;
  requestJson?: KtxDaemonHttpJsonRunner;
}

function parseJsonObject(raw: string, subcommand: string): Record<string, unknown> {
  const parsed = JSON.parse(raw) as unknown;
  if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) {
    throw new Error(`ktx-daemon ${subcommand} returned non-object JSON`);
  }
  return parsed as Record<string, unknown>;
}

function runProcessJson(
  options: Required<Pick<PythonSemanticLayerComputeOptions, 'command' | 'args'>> &
    Pick<PythonSemanticLayerComputeOptions, 'cwd' | 'env'>,
): KtxDaemonJsonRunner {
  return async (subcommand: KtxDaemonCommand, payload: Record<string, unknown>): Promise<Record<string, unknown>> =>
    new Promise((resolve, reject) => {
      const child = spawn(options.command, [...options.args, subcommand], {
        cwd: options.cwd,
        env: { ...process.env, ...options.env },
        stdio: ['pipe', 'pipe', 'pipe'],
      });
      const stdout: Buffer[] = [];
      const stderr: Buffer[] = [];

      child.stdout.on('data', (chunk: Buffer) => stdout.push(chunk));
      child.stderr.on('data', (chunk: Buffer) => stderr.push(chunk));
      child.on('error', reject);
      child.on('close', (code) => {
        const stdoutText = Buffer.concat(stdout).toString('utf8').trim();
        const stderrText = Buffer.concat(stderr).toString('utf8').trim();
        if (code !== 0) {
          reject(new Error(`ktx-daemon ${subcommand} failed: ${stderrText || `exit code ${code}`}`));
          return;
        }
        try {
          resolve(parseJsonObject(stdoutText, subcommand));
        } catch (error) {
          reject(error);
        }
      });
      child.stdin.end(`${JSON.stringify(payload)}\n`);
    });
}

function normalizedBaseUrl(baseUrl: string): string {
  return baseUrl.endsWith('/') ? baseUrl : `${baseUrl}/`;
}

function postJson(baseUrl: string): KtxDaemonHttpJsonRunner {
  return async (path, payload) =>
    new Promise((resolve, reject) => {
      const target = new URL(path.replace(/^\//, ''), normalizedBaseUrl(baseUrl));
      const body = JSON.stringify(payload);
      const client = target.protocol === 'https:' ? httpsRequest : httpRequest;
      const request = client(
        target,
        {
          method: 'POST',
          headers: {
            accept: 'application/json',
            'content-type': 'application/json',
            'content-length': Buffer.byteLength(body),
          },
        },
        (response) => {
          const chunks: Buffer[] = [];
          response.on('data', (chunk: Buffer) => chunks.push(chunk));
          response.on('end', () => {
            const text = Buffer.concat(chunks).toString('utf8');
            const statusCode = response.statusCode ?? 0;
            if (statusCode < 200 || statusCode >= 300) {
              reject(new Error(`ktx-daemon HTTP ${path} failed with ${statusCode}: ${text}`));
              return;
            }
            try {
              resolve(parseJsonObject(text, path));
            } catch (error) {
              reject(error);
            }
          });
        },
      );
      request.on('error', reject);
      request.end(body);
    });
}

function stringArray(value: unknown): string[] {
  return Array.isArray(value) ? value.filter((item): item is string => typeof item === 'string') : [];
}

function recordValue(value: unknown): Record<string, unknown> {
  return value && typeof value === 'object' && !Array.isArray(value) ? (value as Record<string, unknown>) : {};
}

function recordArray(value: unknown): Array<Record<string, unknown>> {
  return Array.isArray(value)
    ? value.filter(
        (item): item is Record<string, unknown> => item !== null && typeof item === 'object' && !Array.isArray(item),
      )
    : [];
}

function sourceGenerationPayload(input: KtxSemanticLayerSourceGenerationInput): Record<string, unknown> {
  return {
    tables: input.tables.map((table) => ({
      name: table.name,
      ...(table.catalog !== undefined ? { catalog: table.catalog } : {}),
      ...(table.db !== undefined ? { db: table.db } : {}),
      ...(table.comment !== undefined ? { comment: table.comment } : {}),
      columns: table.columns.map((column) => ({
        name: column.name,
        type: column.type,
        ...(column.primaryKey !== undefined ? { primary_key: column.primaryKey } : {}),
        ...(column.nullable !== undefined ? { nullable: column.nullable } : {}),
        ...(column.comment !== undefined ? { comment: column.comment } : {}),
      })),
    })),
    links: input.links.map((link) => ({
      from_table: link.fromTable,
      from_column: link.fromColumn,
      to_table: link.toTable,
      to_column: link.toColumn,
      relationship_type: link.relationshipType,
    })),
    dialect: input.dialect ?? 'postgres',
  };
}

function sourceGenerationResult(raw: Record<string, unknown>): KtxSemanticLayerSourceGenerationResult {
  return {
    sources: recordArray(raw.sources),
    sourceCount: typeof raw.source_count === 'number' ? raw.source_count : recordArray(raw.sources).length,
  };
}

export function createPythonSemanticLayerComputePort(
  options: PythonSemanticLayerComputeOptions = {},
): KtxSemanticLayerComputePort {
  const command = options.command ?? 'python';
  const args = options.args ?? ['-m', 'ktx_daemon'];
  const runJson = options.runJson ?? runProcessJson({ command, args, cwd: options.cwd, env: options.env });

  return {
    async query(input) {
      const raw = await runJson('semantic-query', {
        sources: input.sources,
        dialect: input.dialect,
        query: input.query,
      });
      return {
        sql: typeof raw.sql === 'string' ? raw.sql : '',
        dialect: typeof raw.dialect === 'string' ? raw.dialect : input.dialect,
        columns: recordArray(raw.columns),
        plan: recordValue(raw.plan),
      };
    },
    async validateSources(input) {
      const raw = await runJson('semantic-validate', {
        sources: input.sources,
        dialect: input.dialect,
        recently_touched: input.recentlyTouched,
      });
      return {
        valid: raw.valid === true,
        errors: stringArray(raw.errors),
        warnings: stringArray(raw.warnings),
        perSourceWarnings: recordValue(raw.per_source_warnings) as Record<string, string[]>,
      };
    },
    async generateSources(input) {
      const raw = await runJson('semantic-generate-sources', sourceGenerationPayload(input));
      return sourceGenerationResult(raw);
    },
  };
}

/** @internal */
export function createHttpSemanticLayerComputePort(
  options: HttpSemanticLayerComputeOptions,
): KtxSemanticLayerComputePort {
  const requestJson = options.requestJson ?? postJson(options.baseUrl);

  return {
    async query(input) {
      const raw = await requestJson('/semantic-layer/query', {
        sources: input.sources,
        dialect: input.dialect,
        query: input.query,
      });
      return {
        sql: typeof raw.sql === 'string' ? raw.sql : '',
        dialect: typeof raw.dialect === 'string' ? raw.dialect : input.dialect,
        columns: recordArray(raw.columns),
        plan: recordValue(raw.plan),
      };
    },
    async validateSources(input) {
      const raw = await requestJson('/semantic-layer/validate', {
        sources: input.sources,
        dialect: input.dialect,
        recently_touched: input.recentlyTouched,
      });
      return {
        valid: raw.valid === true,
        errors: stringArray(raw.errors),
        warnings: stringArray(raw.warnings),
        perSourceWarnings: recordValue(raw.per_source_warnings) as Record<string, string[]>,
      };
    },
    async generateSources(input) {
      const raw = await requestJson('/semantic-layer/generate-sources', sourceGenerationPayload(input));
      return sourceGenerationResult(raw);
    },
  };
}
196
packages/cli/src/context/index-sync/reindex.test.ts
Normal file
|
|
@ -0,0 +1,196 @@
|
|||
import { mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises';
|
||||
import { tmpdir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
|
||||
import type { KtxEmbeddingPort } from '../../context/core/embedding.js';
|
||||
import { initKtxProject, loadKtxProject, type KtxLocalProject } from '../../context/project/project.js';
|
||||
import { SqliteKnowledgeIndex } from '../wiki/sqlite-knowledge-index.js';
|
||||
import { reindexLocalIndexes } from './reindex.js';
|
||||
|
||||
class FakeEmbeddingPort implements KtxEmbeddingPort {
|
||||
readonly maxBatchSize = 8;
|
||||
|
||||
async computeEmbedding(text: string): Promise<number[]> {
|
||||
return [text.length, 1];
|
||||
}
|
||||
|
||||
async computeEmbeddingsBulk(texts: string[]): Promise<number[][]> {
|
||||
return texts.map((text) => [text.length, 1]);
|
||||
}
|
||||
}
|
||||
|
||||
async function createProject(tempDir: string): Promise<KtxLocalProject> {
|
||||
await initKtxProject({ projectDir: tempDir, force: true });
|
||||
return loadKtxProject({ projectDir: tempDir });
|
||||
}
|
||||
|
||||
describe('reindexLocalIndexes', () => {
|
||||
let tempDir: string;
|
||||
|
||||
beforeEach(async () => {
|
||||
tempDir = await mkdtemp(join(tmpdir(), 'ktx-reindex-'));
|
||||
});
|
||||
|
||||
afterEach(async () => {
|
||||
await rm(tempDir, { recursive: true, force: true });
|
||||
});
|
||||
|
||||
it('returns an empty summary when no wiki or semantic-layer directories exist', async () => {
|
||||
const project = await createProject(tempDir);
|
||||
await rm(join(project.projectDir, 'wiki'), { recursive: true, force: true });
|
||||
await rm(join(project.projectDir, 'semantic-layer'), { recursive: true, force: true });
|
||||
|
||||
await expect(reindexLocalIndexes(project, { force: false, embeddingService: null })).resolves.toMatchObject({
|
||||
scopes: [],
|
||||
totals: { scanned: 0, updated: 0, deleted: 0, embeddingsRecomputed: 0, embeddingsFailed: 0 },
|
||||
force: false,
|
||||
embeddingsAvailable: false,
|
||||
});
|
||||
});
|
||||
|
||||
it('discovers empty directories as zero-row scopes', async () => {
|
||||
const project = await createProject(tempDir);
|
||||
await mkdir(join(project.projectDir, 'wiki/user/local'), { recursive: true });
|
||||
await mkdir(join(project.projectDir, 'semantic-layer/warehouse'), { recursive: true });
|
||||
|
||||
const summary = await reindexLocalIndexes(project, { force: false, embeddingService: null });
|
||||
|
||||
expect(summary.scopes.map((scope) => scope.label)).toEqual(['global', 'user/local', 'warehouse']);
|
||||
expect(summary.totals.scanned).toBe(0);
|
||||
});
|
||||
|
||||
it('indexes mixed wiki and SL sources and reports totals', async () => {
|
||||
const project = await createProject(tempDir);
|
||||
await writeFile(
|
||||
join(project.projectDir, 'wiki/global/revenue.md'),
|
||||
'---\nsummary: Revenue\nusage_mode: auto\n---\n\nPaid orders.\n',
|
||||
'utf-8',
|
||||
);
|
||||
await mkdir(join(project.projectDir, 'semantic-layer/warehouse'), { recursive: true });
|
||||
await writeFile(
|
||||
join(project.projectDir, 'semantic-layer/warehouse/orders.yaml'),
|
||||
'name: orders\ntable: public.orders\ngrain: [id]\ncolumns:\n - name: id\n type: number\njoins: []\nmeasures: []\n',
|
||||
'utf-8',
|
||||
);
|
||||
|
||||
const summary = await reindexLocalIndexes(project, {
|
||||
force: false,
|
||||
embeddingService: new FakeEmbeddingPort(),
|
||||
});
|
||||
|
||||
expect(summary.scopes).toHaveLength(2);
|
||||
expect(summary.totals).toMatchObject({ scanned: 2, updated: 2, deleted: 0, embeddingsRecomputed: 2 });
|
||||
expect(summary.embeddingsAvailable).toBe(true);
|
||||
});
|
||||
|
||||
it('does not report unchanged lexical-only rows as updated on repeated runs', async () => {
|
||||
const project = await createProject(tempDir);
|
||||
await writeFile(
|
||||
join(project.projectDir, 'wiki/global/revenue.md'),
|
||||
'---\nsummary: Revenue\nusage_mode: auto\n---\n\nPaid orders.\n',
|
||||
'utf-8',
|
||||
);
|
||||
await mkdir(join(project.projectDir, 'semantic-layer/warehouse'), { recursive: true });
|
||||
await writeFile(
|
||||
join(project.projectDir, 'semantic-layer/warehouse/orders.yaml'),
|
||||
'name: orders\ntable: public.orders\ngrain: [id]\ncolumns:\n - name: id\n type: number\njoins: []\nmeasures: []\n',
|
||||
'utf-8',
|
||||
);
|
||||
|
||||
const first = await reindexLocalIndexes(project, { force: false, embeddingService: null });
|
||||
expect(first.totals).toMatchObject({
|
||||
scanned: 2,
|
||||
updated: 2,
|
||||
deleted: 0,
|
||||
embeddingsRecomputed: 0,
|
||||
embeddingsFailed: 0,
|
||||
});
|
||||
|
||||
const second = await reindexLocalIndexes(project, { force: false, embeddingService: null });
|
||||
|
||||
expect(second.totals).toMatchObject({
|
||||
scanned: 2,
|
||||
updated: 0,
|
||||
deleted: 0,
|
||||
embeddingsRecomputed: 0,
|
||||
embeddingsFailed: 0,
|
||||
});
|
||||
expect(second.scopes.map((scope) => [scope.label, scope.updated])).toEqual([
|
||||
['global', 0],
|
||||
['warehouse', 0],
|
||||
]);
|
||||
});
|
||||
|
||||
it('force clears stale rows before rebuilding each discovered scope', async () => {
|
||||
const project = await createProject(tempDir);
|
||||
const wikiIndex = new SqliteKnowledgeIndex({ dbPath: join(project.projectDir, '.ktx/db.sqlite') });
|
||||
wikiIndex.sync([
|
||||
{
|
||||
path: 'wiki/global/stale.md',
|
||||
key: 'stale',
|
||||
scope: 'GLOBAL',
|
||||
scopeId: null,
|
||||
summary: 'Stale',
|
||||
content: 'Stale content',
|
||||
tags: [],
|
||||
embedding: [1, 0],
|
||||
},
|
||||
]);
|
||||
await writeFile(
|
||||
join(project.projectDir, 'wiki/global/revenue.md'),
|
||||
'---\nsummary: Revenue\nusage_mode: auto\n---\n\nPaid orders.\n',
|
||||
'utf-8',
|
||||
);
|
||||
|
||||
const summary = await reindexLocalIndexes(project, {
|
||||
force: true,
|
||||
embeddingService: new FakeEmbeddingPort(),
|
||||
});
|
||||
|
||||
expect(summary.force).toBe(true);
|
||||
expect(summary.totals).toMatchObject({ scanned: 1, updated: 1, deleted: 0 });
|
||||
expect(wikiIndex.search('Stale', 10)).toEqual([]);
|
||||
});
|
||||
|
||||
it('captures a per-scope error and continues other scopes', async () => {
|
||||
const project = await createProject(tempDir);
|
||||
await writeFile(
|
||||
join(project.projectDir, 'wiki/global/revenue.md'),
|
||||
'---\nsummary: Revenue\nusage_mode: auto\n---\n\nPaid orders.\n',
|
||||
'utf-8',
|
||||
);
|
||||
await mkdir(join(project.projectDir, 'semantic-layer/warehouse'), { recursive: true });
|
||||
await writeFile(join(project.projectDir, 'semantic-layer/warehouse/broken.yaml'), 'not: [valid', 'utf-8');
|
||||
|
||||
const summary = await reindexLocalIndexes(project, { force: false, embeddingService: null });
|
||||
|
||||
expect(summary.scopes.find((scope) => scope.label === 'global')?.error).toBeUndefined();
|
||||
expect(summary.scopes.find((scope) => scope.label === 'warehouse')?.error).toContain('YAML');
|
||||
});
|
||||
|
||||
it('marks a scope errored when configured embeddings fail', async () => {
|
||||
const project = await createProject(tempDir);
|
||||
await writeFile(
|
||||
join(project.projectDir, 'wiki/global/revenue.md'),
|
||||
'---\nsummary: Revenue\nusage_mode: auto\n---\n\nPaid orders.\n',
|
||||
'utf-8',
|
||||
);
|
||||
const embeddingService: KtxEmbeddingPort = {
|
||||
maxBatchSize: 8,
|
||||
async computeEmbedding() {
|
||||
throw new Error('embedding provider unavailable');
|
||||
},
|
||||
async computeEmbeddingsBulk() {
|
||||
throw new Error('embedding provider unavailable');
|
||||
},
|
||||
};
|
||||
|
||||
const summary = await reindexLocalIndexes(project, { force: false, embeddingService });
|
||||
|
||||
expect(summary.scopes[0]).toMatchObject({
|
||||
label: 'global',
|
||||
embeddingsFailed: 1,
|
||||
error: '1 embedding recomputation failed',
|
||||
});
|
||||
});
|
||||
});
|
||||
166
packages/cli/src/context/index-sync/reindex.ts
Normal file
|
|
@ -0,0 +1,166 @@
import { readdir, stat } from 'node:fs/promises';
import { join, relative } from 'node:path';
import { ktxLocalStateDbPath } from '../../context/project/local-state-db.js';
import type { KtxLocalProject } from '../../context/project/project.js';
import { loadLocalSlSourceRecords } from '../../context/sl/local-sl.js';
import { SlSearchService } from '../../context/sl/sl-search.service.js';
import { SqliteSlSourcesIndex } from '../../context/sl/sqlite-sl-sources-index.js';
import { KnowledgeWikiService } from '../../context/wiki/knowledge-wiki.service.js';
import { SqliteKnowledgeIndex } from '../../context/wiki/sqlite-knowledge-index.js';
import type { ReindexOptions, ReindexScopeResult, ReindexSummary, ReindexWorkResult } from './types.js';

type DiscoveredScope =
  | { kind: 'wiki'; scope: 'GLOBAL'; scopeId: null; label: 'global' }
  | { kind: 'wiki'; scope: 'USER'; scopeId: string; label: `user/${string}` }
  | { kind: 'sl'; connectionId: string; label: string };

const ZERO: ReindexWorkResult = {
  scanned: 0,
  updated: 0,
  deleted: 0,
  embeddingsRecomputed: 0,
  embeddingsFailed: 0,
};

async function directoryExists(path: string): Promise<boolean> {
  try {
    return (await stat(path)).isDirectory();
  } catch {
    return false;
  }
}

async function childDirectories(path: string): Promise<string[]> {
  try {
    const entries = await readdir(path, { withFileTypes: true });
    return entries
      .filter((entry) => entry.isDirectory())
      .map((entry) => entry.name)
      .sort((left, right) => left.localeCompare(right));
  } catch (error) {
    if ((error as NodeJS.ErrnoException).code === 'ENOENT') {
      return [];
    }
    throw error;
  }
}

async function discoverReindexScopes(project: KtxLocalProject): Promise<DiscoveredScope[]> {
  const scopes: DiscoveredScope[] = [];
  if (await directoryExists(join(project.projectDir, 'wiki/global'))) {
    scopes.push({ kind: 'wiki', scope: 'GLOBAL', scopeId: null, label: 'global' });
  }
  for (const userId of await childDirectories(join(project.projectDir, 'wiki/user'))) {
    scopes.push({ kind: 'wiki', scope: 'USER', scopeId: userId, label: `user/${userId}` });
  }
  for (const connectionId of await childDirectories(join(project.projectDir, 'semantic-layer'))) {
    if (connectionId !== '_schema') {
      scopes.push({ kind: 'sl', connectionId, label: connectionId });
    }
  }
  return scopes;
}

function errorMessage(error: unknown): string {
  if (!(error instanceof Error)) {
    return String(error);
  }
  return error.name && error.name !== 'Error' ? `${error.name}: ${error.message}` : error.message;
}

function addTotals(left: ReindexWorkResult, right: ReindexWorkResult): ReindexWorkResult {
  return {
    scanned: left.scanned + right.scanned,
    updated: left.updated + right.updated,
    deleted: left.deleted + right.deleted,
    embeddingsRecomputed: left.embeddingsRecomputed + right.embeddingsRecomputed,
    embeddingsFailed: left.embeddingsFailed + right.embeddingsFailed,
  };
}

function durationSince(startedAt: bigint): number {
  return Number((process.hrtime.bigint() - startedAt) / 1_000_000n);
}

function embeddingFailureError(work: ReindexWorkResult): string | undefined {
  if (work.embeddingsFailed === 0) {
    return undefined;
  }
  return `${work.embeddingsFailed} embedding recomputation${work.embeddingsFailed === 1 ? '' : 's'} failed`;
}

export async function reindexLocalIndexes(
  project: KtxLocalProject,
  options: ReindexOptions,
): Promise<ReindexSummary> {
  const startedAt = process.hrtime.bigint();
  const dbPath = ktxLocalStateDbPath(project);
  const scopes = await discoverReindexScopes(project);
  const wikiIndex = new SqliteKnowledgeIndex({ dbPath });
  const slIndex = new SqliteSlSourcesIndex({ dbPath });
  const wikiService = new KnowledgeWikiService(project.fileStore, options.embeddingService, wikiIndex, project.git);
  const slService = new SlSearchService(options.embeddingService, slIndex);
  const results: ReindexScopeResult[] = [];

  for (const scope of scopes) {
    const scopeStartedAt = process.hrtime.bigint();
    try {
      let work: ReindexWorkResult;
      if (scope.kind === 'wiki') {
        if (options.force) {
          wikiIndex.clear(scope.scope, scope.scopeId);
        }
        work = await wikiService.syncIndex(scope.scope, scope.scopeId);
        results.push({
          kind: 'wiki',
          label: scope.label,
          scope: scope.scope === 'GLOBAL' ? 'global' : 'user',
          scopeId: scope.scopeId,
          ...work,
          ...(options.force ? { deleted: 0 } : {}),
          ...(options.embeddingService && work.embeddingsFailed > 0 ? { error: embeddingFailureError(work) } : {}),
          durationMs: durationSince(scopeStartedAt),
        });
        continue;
      }

      if (options.force) {
        await slIndex.clear(scope.connectionId);
      }
      const records = await loadLocalSlSourceRecords(project, { connectionId: scope.connectionId });
      work = await slService.indexSources(
        scope.connectionId,
        records.map((record) => record.source),
      );
      results.push({
        kind: 'sl',
        label: scope.label,
        connectionId: scope.connectionId,
        ...work,
        ...(options.force ? { deleted: 0 } : {}),
        ...(options.embeddingService && work.embeddingsFailed > 0 ? { error: embeddingFailureError(work) } : {}),
        durationMs: durationSince(scopeStartedAt),
      });
    } catch (error) {
      results.push({
        kind: scope.kind,
        label: scope.label,
        ...(scope.kind === 'wiki'
          ? { scope: scope.scope === 'GLOBAL' ? 'global' : 'user', scopeId: scope.scopeId }
          : { connectionId: scope.connectionId }),
        ...ZERO,
        durationMs: durationSince(scopeStartedAt),
        error: errorMessage(error),
      });
    }
  }

  return {
    scopes: results,
    totals: results.reduce(addTotals, ZERO),
    dbPath: relative(project.projectDir, dbPath) || dbPath,
    force: options.force,
    embeddingsAvailable: options.embeddingService !== null,
    durationMs: durationSince(startedAt),
  };
}
|
||||
33
packages/cli/src/context/index-sync/types.ts
Normal file
|
|
@ -0,0 +1,33 @@
import type { KtxEmbeddingPort } from '../../context/core/embedding.js';

export interface ReindexOptions {
  force: boolean;
  embeddingService: KtxEmbeddingPort | null;
}

export interface ReindexWorkResult {
  scanned: number;
  updated: number;
  deleted: number;
  embeddingsRecomputed: number;
  embeddingsFailed: number;
}

export interface ReindexScopeResult extends ReindexWorkResult {
  kind: 'wiki' | 'sl';
  label: string;
  scope?: 'global' | 'user';
  scopeId?: string | null;
  connectionId?: string;
  durationMs: number;
  error?: string;
}

export interface ReindexSummary {
  scopes: ReindexScopeResult[];
  totals: ReindexWorkResult;
  dbPath: string;
  force: boolean;
  embeddingsAvailable: boolean;
  durationMs: number;
}
|
||||
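A caller of `reindexLocalIndexes` receives a `ReindexSummary` in which failures are recorded per scope rather than thrown. The sketch below shows one way such a summary might be rendered and checked. The interfaces are local copies of the shapes in `types.ts`, trimmed to the fields used here, and `formatReindexSummary` and `hasReindexErrors` are hypothetical helpers, not part of this commit.

```typescript
// Local copies of the shapes from types.ts, trimmed to the fields used here.
interface ReindexWorkResult {
  scanned: number;
  updated: number;
  deleted: number;
  embeddingsRecomputed: number;
  embeddingsFailed: number;
}

interface ReindexScopeResult extends ReindexWorkResult {
  kind: 'wiki' | 'sl';
  label: string;
  durationMs: number;
  error?: string;
}

interface ReindexSummary {
  scopes: ReindexScopeResult[];
  totals: ReindexWorkResult;
  dbPath: string;
  force: boolean;
  embeddingsAvailable: boolean;
  durationMs: number;
}

// Hypothetical helper: one line per scope, followed by a totals line.
function formatReindexSummary(summary: ReindexSummary): string[] {
  const lines = summary.scopes.map((scope) =>
    scope.error
      ? `${scope.kind}:${scope.label} failed: ${scope.error}`
      : `${scope.kind}:${scope.label} scanned=${scope.scanned} updated=${scope.updated}`,
  );
  const { scanned, updated, deleted } = summary.totals;
  lines.push(`total scanned=${scanned} updated=${updated} deleted=${deleted}`);
  return lines;
}

// Hypothetical helper: a run counts as failed when any scope recorded an error.
function hasReindexErrors(summary: ReindexSummary): boolean {
  return summary.scopes.some((scope) => scope.error !== undefined);
}
```

Because an errored scope still appears in `scopes`, a CLI wrapper could print every line and then choose its exit code from `hasReindexErrors`.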
42
packages/cli/src/context/ingest/action-identity.test.ts
Normal file
|
|
@ -0,0 +1,42 @@
import { describe, expect, it } from 'vitest';
import { actionTargetConnectionId, memoryActionIdentity } from './action-identity.js';

describe('memory action target identity', () => {
  it('keys SL actions by target connection and wiki actions by run connection', () => {
    expect(
      memoryActionIdentity(
        { target: 'sl', type: 'created', key: 'orders', detail: '', targetConnectionId: 'warehouse-b' },
        'looker-run',
      ),
    ).toBe('sl:warehouse-b:orders');

    expect(memoryActionIdentity({ target: 'sl', type: 'created', key: 'orders', detail: '' }, 'warehouse-a')).toBe(
      'sl:warehouse-a:orders',
    );

    expect(
      memoryActionIdentity(
        {
          target: 'wiki',
          type: 'created',
          key: 'wiki/global/orders.md',
          detail: '',
          targetConnectionId: 'ignored',
        },
        'looker-run',
      ),
    ).toBe('wiki:looker-run:wiki/global/orders.md');
  });

  it('resolves action target connection only for SL actions', () => {
    expect(
      actionTargetConnectionId(
        { target: 'sl', type: 'updated', key: 'orders', detail: '', targetConnectionId: 'warehouse-b' },
        'looker-run',
      ),
    ).toBe('warehouse-b');
    expect(actionTargetConnectionId({ target: 'wiki', type: 'updated', key: 'orders', detail: '' }, 'looker-run')).toBe(
      'looker-run',
    );
  });
});
|
||||
9
packages/cli/src/context/ingest/action-identity.ts
Normal file
|
|
@ -0,0 +1,9 @@
import type { MemoryAction } from '../../context/memory/types.js';

export function actionTargetConnectionId(action: MemoryAction, runConnectionId: string): string {
  return action.target === 'sl' ? (action.targetConnectionId ?? runConnectionId) : runConnectionId;
}

export function memoryActionIdentity(action: MemoryAction, runConnectionId: string): string {
  return `${action.target}:${actionTargetConnectionId(action, runConnectionId)}:${action.key}`;
}
|
||||
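The identity string produced by `memoryActionIdentity` lends itself to deduplicating actions collected during a run. The following is a minimal sketch of that use, not code from this commit: the `MemoryAction` shape is an assumed subset of the real type in `context/memory/types.ts`, the two functions are repeated so the block stands alone, and `dedupeActions` is a hypothetical helper.

```typescript
// Assumed minimal shape; the real MemoryAction type is defined elsewhere in the repo.
interface MemoryAction {
  target: 'sl' | 'wiki';
  type: string;
  key: string;
  detail: string;
  targetConnectionId?: string;
}

// SL actions may point at another connection; wiki actions always use the run connection.
function actionTargetConnectionId(action: MemoryAction, runConnectionId: string): string {
  return action.target === 'sl' ? (action.targetConnectionId ?? runConnectionId) : runConnectionId;
}

function memoryActionIdentity(action: MemoryAction, runConnectionId: string): string {
  return `${action.target}:${actionTargetConnectionId(action, runConnectionId)}:${action.key}`;
}

// Hypothetical helper: keeps the last action recorded for each identity,
// in the order the identities were first seen.
function dedupeActions(actions: MemoryAction[], runConnectionId: string): MemoryAction[] {
  const byIdentity = new Map<string, MemoryAction>();
  for (const action of actions) {
    byIdentity.set(memoryActionIdentity(action, runConnectionId), action);
  }
  return Array.from(byIdentity.values());
}
```

Keeping the last action per identity is a design choice made for this sketch; a caller that cares about the first action instead would skip keys already present in the map.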
|
|
@ -0,0 +1,214 @@
|
|||
import { describe, expect, it } from 'vitest';
|
||||
import { parseDbtSchemaFile, parseDbtSchemaFiles } from './parse-schema.js';
|
||||
|
||||
describe('dbt descriptions schema parser', () => {
|
||||
it('resolves shared dbt vars and defaults before parsing schema YAML', () => {
|
||||
const result = parseDbtSchemaFile(
|
||||
`
|
||||
version: 2
|
||||
sources:
|
||||
- name: raw
|
||||
database: "{{ var('database') }}"
|
||||
schema: "{{ var('schema', 'fallback_schema') }}"
|
||||
tables:
|
||||
- name: orders
|
||||
identifier: fct_orders
|
||||
description: "Orders from {{ var('database') }}"
|
||||
columns:
|
||||
- name: customer_id
|
||||
description: "Customer id"
|
||||
tests:
|
||||
- relationships:
|
||||
to: ref('customers')
|
||||
field: id
|
||||
models:
|
||||
- name: "{{ var('model_name', 'orders_model') }}"
|
||||
schema: "{{ var('model_schema') }}"
|
||||
columns:
|
||||
- name: id
|
||||
description: "Order id"
|
||||
`,
|
||||
{ path: 'models/schema.yml', variables: new Map([['database', 'analytics'], ['model_schema', 'mart']]) },
|
||||
);
|
||||
|
||||
expect(result.tables).toEqual([
|
||||
{
|
||||
name: 'fct_orders',
|
||||
description: 'Orders from analytics',
|
||||
database: 'analytics',
|
||||
schema: 'fallback_schema',
|
||||
columns: [
|
||||
{
|
||||
name: 'customer_id',
|
||||
description: 'Customer id',
|
||||
dataType: null,
|
||||
dataTests: [{ name: 'relationships', package: 'dbt', kwargs: { to: "ref('customers')", field: 'id' } }],
|
||||
},
|
||||
],
|
||||
resourceType: 'source',
|
||||
},
|
||||
{
|
||||
name: 'orders_model',
|
||||
description: null,
|
||||
database: null,
|
||||
schema: 'mart',
|
||||
columns: [{ name: 'id', description: 'Order id', dataType: null }],
|
||||
resourceType: 'model',
|
||||
},
|
||||
]);
|
||||
expect(result.relationships).toEqual([
|
||||
{
|
||||
fromTable: 'fct_orders',
|
||||
fromColumn: 'customer_id',
|
||||
toTable: 'customers',
|
||||
toColumn: 'id',
|
||||
fromSchema: 'fallback_schema',
|
||||
},
|
||||
]);
|
||||
});
|
||||
|
||||
it('deduplicates tables by database schema and name while merging columns', () => {
|
||||
const result = parseDbtSchemaFiles([
|
||||
{
|
||||
path: 'models/a.yml',
|
||||
content: `
|
||||
version: 2
|
||||
models:
|
||||
- name: orders
|
||||
description: Orders
|
||||
columns:
|
||||
- name: id
|
||||
description: Primary key
|
||||
`,
|
||||
},
|
||||
{
|
||||
path: 'models/b.yml',
|
||||
content: `
|
||||
version: 2
|
||||
models:
|
||||
- name: orders
|
||||
columns:
|
||||
- name: status
|
||||
description: Status
|
||||
- name: id
|
||||
data_type: integer
|
||||
`,
|
||||
},
|
||||
]);
|
||||
|
||||
expect(result.tables).toEqual([
|
||||
{
|
||||
name: 'orders',
|
||||
description: 'Orders',
|
||||
database: null,
|
||||
schema: null,
|
||||
resourceType: 'model',
|
||||
columns: [
|
||||
{ name: 'id', description: 'Primary key', dataType: 'integer' },
|
||||
{ name: 'status', description: 'Status', dataType: null },
|
||||
],
|
||||
},
|
||||
]);
|
||||
});
|
||||
|
||||
it('returns an empty result for malformed YAML and preserves unresolved Jinja text', () => {
|
||||
expect(parseDbtSchemaFile('{{{{ invalid yaml', { path: 'broken.yml' })).toEqual({
|
||||
projectName: null,
|
||||
dbtVersion: null,
|
||||
tables: [],
|
||||
relationships: [],
|
||||
});
|
||||
|
||||
const unresolved = parseDbtSchemaFile(
|
||||
`
|
||||
version: 2
|
||||
models:
|
||||
- name: "{{ var('missing_model') }}"
|
||||
`,
|
||||
{ variables: new Map() },
|
||||
);
|
||||
expect(unresolved.tables[0]?.name).toBe("{{ var('missing_model') }}");
|
||||
});
|
||||
|
||||
it('extracts data tests, constraints, enum values, tags, and freshness', () => {
|
||||
const result = parseDbtSchemaFile(`
|
||||
version: 2
|
||||
sources:
|
||||
- name: raw
|
||||
schema: jaffle
|
||||
tags: ["raw"]
|
||||
tables:
|
||||
- name: customers
|
||||
tags: ["core"]
|
||||
loaded_at_field: updated_at
|
||||
freshness:
|
||||
warn_after: { count: 12, period: hour }
|
||||
columns:
|
||||
- name: id
|
||||
tests:
|
||||
- not_null
|
||||
- unique
|
||||
- name: status
|
||||
data_tests:
|
||||
- accepted_values:
|
||||
values: ['active', 'inactive']
|
||||
models:
|
||||
- name: orders
|
||||
tags: ["finance"]
|
||||
loaded_at_field: run_at
|
||||
columns:
|
||||
- name: status
|
||||
data_tests:
|
||||
- dbt_utils.expression_is_true:
|
||||
expression: "status is not null"
|
||||
- accepted_values: ['placed', 'shipped']
|
||||
`);
|
||||
|
||||
const customers = result.tables.find((table) => table.name === 'customers');
|
||||
expect(customers?.tagsDbt).toEqual(['raw', 'core']);
|
||||
expect(customers?.freshnessDbt?.loadedAtField).toBe('updated_at');
|
||||
expect(customers?.freshnessDbt?.raw).toBeDefined();
|
||||
const id = customers?.columns.find((column) => column.name === 'id');
|
||||
expect(id?.constraints?.dbt).toEqual({ not_null: true, unique: true });
|
||||
const status = customers?.columns.find((column) => column.name === 'status');
|
||||
expect(status?.enumValuesDbt).toEqual(['active', 'inactive']);
|
||||
|
||||
const orders = result.tables.find((table) => table.name === 'orders');
|
||||
expect(orders?.tagsDbt).toEqual(['finance']);
|
||||
expect(orders?.freshnessDbt?.loadedAtField).toBe('run_at');
|
||||
const ordersStatus = orders?.columns.find((column) => column.name === 'status');
|
||||
expect(ordersStatus?.enumValuesDbt).toEqual(['placed', 'shipped']);
|
||||
expect(ordersStatus?.dataTests).toEqual(
|
||||
expect.arrayContaining([
|
||||
expect.objectContaining({ package: 'dbt_utils', name: 'expression_is_true' }),
|
||||
expect.objectContaining({ package: 'dbt', name: 'accepted_values' }),
|
||||
]),
|
||||
);
|
||||
});
|
||||
|
||||
it('parses relationships from model column data tests', () => {
|
||||
const result = parseDbtSchemaFile(`
|
||||
version: 2
|
||||
models:
|
||||
- name: orders
|
||||
schema: public
|
||||
columns:
|
||||
- name: customer_id
|
||||
data_tests:
|
||||
- relationships:
|
||||
arguments:
|
||||
to: "ref('customers')"
|
||||
field: id
|
||||
`);
|
||||
|
||||
expect(result.relationships).toEqual([
|
||||
{
|
||||
fromTable: 'orders',
|
||||
fromColumn: 'customer_id',
|
||||
toTable: 'customers',
|
||||
toColumn: 'id',
|
||||
fromSchema: 'public',
|
||||
},
|
||||
]);
|
||||
});
|
||||
});
|
||||
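The tests above rely on dbt `var()` expressions being substituted before the YAML is parsed, with unresolved expressions left in place. The repository's `resolveJinjaVariables` is not shown in this diff, so the block below is only a sketch of how such a step could behave. `resolveVarsSketch` is a hypothetical name, it handles only quoted string arguments, and its `{ content, unresolvedVars }` return shape mirrors what the parser reads from the real function.

```typescript
interface ResolvedVars {
  content: string;
  unresolvedVars: string[];
}

// Hypothetical stand-in for resolveJinjaVariables. Handles only
// {{ var('name') }} and {{ var('name', 'default') }} with quoted string arguments.
function resolveVarsSketch(content: string, variables: Map<string, string>): ResolvedVars {
  const unresolved = new Set<string>();
  const pattern = /\{\{\s*var\(\s*['"]([^'"]+)['"]\s*(?:,\s*['"]([^'"]*)['"]\s*)?\)\s*\}\}/g;
  const resolved = content.replace(pattern, (match: string, name: string, fallback?: string) => {
    // A supplied variable wins over the inline default.
    const value = variables.get(name) ?? fallback;
    if (value === undefined) {
      unresolved.add(name);
      return match; // leave the Jinja text untouched
    }
    return value;
  });
  return { content: resolved, unresolvedVars: Array.from(unresolved) };
}
```

Returning the unresolved names alongside the content lets the caller log a warning and still attempt to parse the file, which matches the behavior the tests expect for `{{ var('missing_model') }}`.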
|
|
@ -0,0 +1,649 @@
|
|||
import { createHash } from 'node:crypto';
|
||||
import { parse as parseYaml } from 'yaml';
|
||||
import { type KtxLogger, noopLogger } from '../../../../context/core/config.js';
|
||||
import { resolveJinjaVariables } from '../../dbt-shared/project-vars.js';
|
||||
|
||||
interface DbtParsedColumn {
|
||||
name: string;
|
||||
description: string | null;
|
||||
dataType: string | null;
|
||||
dataTests?: DbtDataTestRef[];
|
||||
constraints?: DbtColumnConstraints;
|
||||
enumValuesDbt?: string[];
|
||||
}
|
||||
|
||||
interface DbtDataTestRef {
|
||||
name: string;
|
||||
package: string;
|
||||
kwargs?: Record<string, unknown>;
|
||||
}
|
||||
|
||||
interface DbtColumnConstraints {
|
||||
dbt: {
|
||||
not_null?: boolean;
|
||||
unique?: boolean;
|
||||
};
|
||||
}
|
||||
|
||||
interface DbtParsedRelationship {
|
||||
fromTable: string;
|
||||
fromColumn: string;
|
||||
toTable: string;
|
||||
toColumn: string;
|
||||
fromSchema?: string;
|
||||
toSchema?: string;
|
||||
description?: string;
|
||||
}
|
||||
|
||||
interface DbtParsedTable {
|
||||
name: string;
|
||||
description: string | null;
|
||||
database: string | null;
|
||||
schema: string | null;
|
||||
columns: DbtParsedColumn[];
|
||||
resourceType?: 'source' | 'model';
|
||||
tagsDbt?: string[];
|
||||
freshnessDbt?: {
|
||||
raw?: unknown;
|
||||
loadedAtField?: string | null;
|
||||
};
|
||||
}
|
||||
|
||||
export interface DbtSchemaParseResult {
|
||||
projectName: string | null;
|
||||
dbtVersion: string | null;
|
||||
tables: DbtParsedTable[];
|
||||
relationships: DbtParsedRelationship[];
|
||||
}
|
||||
|
||||
export interface DbtSchemaFile {
|
||||
content: string;
|
||||
path: string;
|
||||
}
|
||||
|
||||
interface ParseDbtSchemaOptions {
|
||||
path?: string;
|
||||
variables?: Map<string, string>;
|
||||
projectName?: string | null;
|
||||
logger?: KtxLogger;
|
||||
}
|
||||
|
||||
interface DbtSchemaYaml {
|
||||
version?: number;
|
||||
sources?: DbtSchemaSource[];
|
||||
models?: DbtSchemaModel[];
|
||||
}
|
||||
|
||||
interface DbtSchemaSource {
|
||||
name: string;
|
||||
description?: string;
|
||||
database?: string;
|
||||
schema?: string;
|
||||
tags?: string[];
|
||||
tables?: DbtSchemaTable[];
|
||||
}
|
||||
|
||||
interface DbtSchemaTable {
|
||||
name: string;
|
||||
description?: string;
|
||||
identifier?: string;
|
||||
tags?: string[];
|
||||
loaded_at_field?: string;
|
||||
freshness?: unknown;
|
||||
columns?: DbtSchemaColumn[];
|
||||
}
|
||||
|
||||
interface DbtSchemaModel {
|
||||
name: string;
|
||||
description?: string;
|
||||
database?: string;
|
||||
schema?: string;
|
||||
tags?: string[];
|
||||
loaded_at_field?: string;
|
||||
freshness?: unknown;
|
||||
columns?: DbtSchemaColumn[];
|
||||
}
|
||||
|
||||
interface DbtSchemaColumn {
|
||||
name: string;
|
||||
description?: string;
|
||||
data_type?: string;
|
||||
data_tests?: DbtSchemaDataTest[];
|
||||
tests?: DbtSchemaDataTest[];
|
||||
}
|
||||
|
||||
type DbtSchemaDataTest =
|
||||
| string
|
||||
| {
|
||||
relationships?: {
|
||||
to?: string;
|
||||
field?: string;
|
||||
arguments?: { to?: string; field?: string };
|
||||
};
|
||||
not_null?: unknown;
|
||||
unique?: unknown;
|
||||
accepted_values?: { values?: unknown } | unknown;
|
||||
[key: string]: unknown;
|
||||
};
|
||||
|
||||
/** @internal */
|
||||
export function parseDbtSchemaFile(content: string, options: ParseDbtSchemaOptions = {}): DbtSchemaParseResult {
|
||||
return new DbtSchemaParser(options.logger ?? noopLogger).parseFile(content, options);
|
||||
}
|
||||
|
||||
export function parseDbtSchemaFiles(
|
||||
files: DbtSchemaFile[],
|
||||
variables?: Map<string, string>,
|
||||
options: { projectName?: string | null; logger?: KtxLogger } = {},
|
||||
): DbtSchemaParseResult {
|
||||
return new DbtSchemaParser(options.logger ?? noopLogger).parseFiles(files, variables, options.projectName ?? null);
|
||||
}
|
||||
|
||||
|
||||
class DbtSchemaParser {
|
||||
constructor(private readonly logger: KtxLogger) {}
|
||||
|
||||
parseFile(yamlContent: string, options: ParseDbtSchemaOptions = {}): DbtSchemaParseResult {
|
||||
this.logger.debug(`Parsing schema file: ${options.path ?? 'unknown'}`);
|
||||
|
||||
const resolved = options.variables
|
||||
? resolveJinjaVariables(yamlContent, options.variables)
|
||||
: { content: yamlContent, unresolvedVars: [] };
|
||||
if (resolved.unresolvedVars.length > 0) {
|
||||
this.logger.warn(
|
||||
`Unresolved dbt variables in ${options.path ?? 'schema file'}: ${resolved.unresolvedVars.join(', ')}`,
|
||||
);
|
||||
}
|
||||
|
||||
let schema: DbtSchemaYaml;
|
||||
try {
|
||||
schema = parseYaml(resolved.content) as DbtSchemaYaml;
|
||||
} catch (error) {
|
||||
this.logger.warn(`Failed to parse YAML${options.path ? ` at ${options.path}` : ''}: ${error}`);
|
||||
return this.emptyResult(options.projectName ?? null);
|
||||
}
|
||||
|
||||
if (!schema || typeof schema !== 'object') {
|
||||
return this.emptyResult(options.projectName ?? null);
|
||||
}
|
||||
|
||||
const tables = [...this.parseSources(schema.sources), ...this.parseModels(schema.models)];
|
||||
const relationships = [
|
||||
...this.parseSourceRelationships(schema.sources),
|
||||
...this.parseModelRelationships(schema.models),
|
||||
];
|
||||
|
||||
return {
|
||||
projectName: options.projectName ?? null,
|
||||
dbtVersion: null,
|
||||
tables,
|
||||
relationships,
|
||||
};
|
||||
}
|
||||
|
||||
parseFiles(
|
||||
files: DbtSchemaFile[],
|
||||
variables?: Map<string, string>,
|
||||
projectName: string | null = null,
|
||||
): DbtSchemaParseResult {
|
||||
const allTables: DbtParsedTable[] = [];
|
||||
const allRelationships: DbtParsedRelationship[] = [];
|
||||
|
||||
for (const file of files) {
|
||||
const result = this.parseFile(file.content, { path: file.path, variables, projectName });
|
||||
allTables.push(...result.tables);
|
||||
allRelationships.push(...result.relationships);
|
||||
}
|
||||
|
||||
return {
|
||||
projectName,
|
||||
dbtVersion: null,
|
||||
tables: this.deduplicateTables(allTables),
|
||||
relationships: this.deduplicateRelationships(allRelationships),
|
||||
};
|
||||
}
|
||||
|
||||
  private parseSources(sources: DbtSchemaSource[] | undefined): DbtParsedTable[] {
    if (!sources || !Array.isArray(sources)) {
      return [];
    }

    const tables: DbtParsedTable[] = [];

    for (const source of sources) {
      const sourceSchema = source.schema ?? source.name;
      const sourceDatabase = source.database ?? null;
      const sourceTags = this.normalizeTagList(source.tags);

      if (!source.tables || !Array.isArray(source.tables)) {
        continue;
      }

      for (const table of source.tables) {
        const tagsDbt = this.mergeTagsDbt(sourceTags, this.normalizeTagList(table.tags));
        const freshnessDbt = this.buildFreshnessDbt(table.freshness, table.loaded_at_field);
        tables.push({
          name: table.identifier ?? table.name,
          description: this.normalizeDescription(table.description),
          database: sourceDatabase,
          schema: sourceSchema,
          columns: this.parseColumns(table.columns),
          resourceType: 'source',
          ...(tagsDbt ? { tagsDbt } : {}),
          ...(freshnessDbt ? { freshnessDbt } : {}),
        });
      }
    }

    return tables;
  }

  private parseModels(models: DbtSchemaModel[] | undefined): DbtParsedTable[] {
    if (!models || !Array.isArray(models)) {
      return [];
    }

    const tables: DbtParsedTable[] = [];

    for (const model of models) {
      if (!model.name) {
        continue;
      }

      const tagsDbt = this.mergeTagsDbt(this.normalizeTagList(model.tags));
      const freshnessDbt = this.buildFreshnessDbt(model.freshness, model.loaded_at_field);
      tables.push({
        name: model.name,
        description: this.normalizeDescription(model.description),
        database: model.database ?? null,
        schema: model.schema ?? null,
        columns: this.parseColumns(model.columns),
        resourceType: 'model',
        ...(tagsDbt ? { tagsDbt } : {}),
        ...(freshnessDbt ? { freshnessDbt } : {}),
      });
    }

    return tables;
  }

  private parseColumns(columns: DbtSchemaColumn[] | undefined): DbtParsedColumn[] {
    if (!columns || !Array.isArray(columns)) {
      return [];
    }

    return columns.map((column) => {
      const { refs, constraints, enumValues } = this.parseDataTests(column.data_tests ?? column.tests);
      return {
        name: column.name,
        description: this.normalizeDescription(column.description),
        dataType: column.data_type ?? null,
        ...(refs.length > 0 ? { dataTests: refs } : {}),
        ...(constraints ? { constraints } : {}),
        ...(enumValues.length > 0 ? { enumValuesDbt: enumValues } : {}),
      };
    });
  }

  private parseDataTests(tests: DbtSchemaDataTest[] | undefined): {
    refs: DbtDataTestRef[];
    constraints: DbtColumnConstraints | undefined;
    enumValues: string[];
  } {
    const refs: DbtDataTestRef[] = [];
    const dbt: { not_null?: boolean; unique?: boolean } = {};
    const enumValues: string[] = [];
    if (!tests?.length) {
      return { refs, constraints: undefined, enumValues };
    }

    for (const test of tests) {
      if (typeof test === 'string') {
        const parsed = this.parseTestNameString(test);
        refs.push(parsed);
        if (parsed.package === 'dbt' && parsed.name === 'not_null') {
          dbt.not_null = true;
        }
        if (parsed.package === 'dbt' && parsed.name === 'unique') {
          dbt.unique = true;
        }
        continue;
      }

      for (const [key, value] of Object.entries(test)) {
        if (key === 'relationships') {
          refs.push({
            name: 'relationships',
            package: 'dbt',
            ...(value && typeof value === 'object' && !Array.isArray(value)
              ? { kwargs: value as Record<string, unknown> }
              : {}),
          });
          continue;
        }
        if (key === 'not_null') {
          refs.push({ name: 'not_null', package: 'dbt' });
          dbt.not_null = true;
          continue;
        }
        if (key === 'unique') {
          refs.push({ name: 'unique', package: 'dbt' });
          dbt.unique = true;
          continue;
        }
        if (key === 'accepted_values') {
          if (Array.isArray(value)) {
            enumValues.push(...value.map((item) => String(item)));
            refs.push({ name: 'accepted_values', package: 'dbt', kwargs: { values: value } });
            continue;
          }
          if (value && typeof value === 'object' && !Array.isArray(value)) {
            const values = (value as { values?: unknown }).values;
            if (Array.isArray(values)) {
              enumValues.push(...values.map((item) => String(item)));
            }
            refs.push({ name: 'accepted_values', package: 'dbt', kwargs: value as Record<string, unknown> });
            continue;
          }
        }
        refs.push({
          ...this.parseTestNameString(key),
          ...(value && typeof value === 'object' && !Array.isArray(value)
            ? { kwargs: value as Record<string, unknown> }
            : {}),
        });
      }
    }

    const constraints = dbt.not_null || dbt.unique ? { dbt } : undefined;
    return { refs, constraints, enumValues };
  }

  private parseTestNameString(value: string): { name: string; package: string } {
    const parts = value.split('.');
    if (parts.length >= 2) {
      return { package: parts[0]!, name: parts.slice(1).join('.') };
    }
    return { package: 'dbt', name: value };
  }

  private parseSourceRelationships(sources: DbtSchemaSource[] | undefined): DbtParsedRelationship[] {
    if (!sources || !Array.isArray(sources)) {
      return [];
    }

    const relationships: DbtParsedRelationship[] = [];

    for (const source of sources) {
      const sourceSchema = source.schema ?? source.name;

      if (!source.tables || !Array.isArray(source.tables)) {
        continue;
      }

      for (const table of source.tables) {
        const tableName = table.identifier ?? table.name;

        if (!table.columns || !Array.isArray(table.columns)) {
          continue;
        }

        for (const column of table.columns) {
          const tests = column.data_tests ?? column.tests ?? [];

          for (const test of tests) {
            const relationship = this.parseRelationshipTest(test, tableName, column.name, sourceSchema);
            if (relationship) {
              relationships.push(relationship);
            }
          }
        }
      }
    }

    return relationships;
  }

  private parseModelRelationships(models: DbtSchemaModel[] | undefined): DbtParsedRelationship[] {
    if (!models || !Array.isArray(models)) {
      return [];
    }

    const relationships: DbtParsedRelationship[] = [];

    for (const model of models) {
      if (!model.name || !model.columns || !Array.isArray(model.columns)) {
        continue;
      }

      for (const column of model.columns) {
        const tests = column.data_tests ?? column.tests ?? [];

        for (const test of tests) {
          const relationship = this.parseRelationshipTest(test, model.name, column.name, model.schema ?? undefined);
          if (relationship) {
            relationships.push(relationship);
          }
        }
      }
    }

    return relationships;
  }

  private parseRelationshipTest(
    test: DbtSchemaDataTest,
    fromTable: string,
    fromColumn: string,
    fromSchema?: string,
  ): DbtParsedRelationship | null {
    if (typeof test === 'string' || !test.relationships) {
      return null;
    }

    const relationship = test.relationships;
    const toRef = relationship.to ?? relationship.arguments?.to;
    const toColumn = relationship.field ?? relationship.arguments?.field;

    if (!toRef || !toColumn) {
      this.logger.debug(`Skipping incomplete relationship test for ${fromTable}.${fromColumn}`);
      return null;
    }

    const toTable = this.parseRef(toRef);
    if (!toTable) {
      this.logger.debug(`Could not parse ref: ${toRef}`);
      return null;
    }

    return {
      fromTable,
      fromColumn,
      toTable,
      toColumn,
      ...(fromSchema ? { fromSchema } : {}),
    };
  }

  private parseRef(refString: string): string | null {
    const refMatch = refString.match(/ref\s*\(\s*['"]([^'"]+)['"]\s*\)/);
    if (refMatch) {
      return refMatch[1];
    }

    const sourceMatch = refString.match(/source\s*\(\s*['"][^'"]+['"]\s*,\s*['"]([^'"]+)['"]\s*\)/);
    if (sourceMatch) {
      return sourceMatch[1];
    }

    return null;
  }

  private normalizeDescription(description: string | undefined): string | null {
    if (!description) {
      return null;
    }
    const trimmed = description.trim();
    return trimmed.length > 0 ? trimmed : null;
  }

  private normalizeTagList(tags: string[] | undefined): string[] {
    if (!tags || !Array.isArray(tags)) {
      return [];
    }
    return tags.map((tag) => String(tag));
  }

  private mergeTagsDbt(...lists: Array<string[] | undefined>): string[] | undefined {
    const merged: string[] = [];
    const seen = new Set<string>();
    for (const list of lists) {
      for (const item of list ?? []) {
        if (!seen.has(item)) {
          seen.add(item);
          merged.push(item);
        }
      }
    }
    return merged.length > 0 ? merged : undefined;
  }

  private buildFreshnessDbt(freshness: unknown, loadedAtField: string | undefined): DbtParsedTable['freshnessDbt'] {
    const loadedTrim = loadedAtField?.trim();
    const hasFreshness = freshness !== undefined && freshness !== null;
    if (!hasFreshness && !loadedTrim) {
      return undefined;
    }
    return {
      ...(hasFreshness ? { raw: freshness } : {}),
      ...(hasFreshness ? { loadedAtField: loadedTrim ?? null } : loadedTrim ? { loadedAtField: loadedTrim } : {}),
    };
  }

  private deduplicateTables(tables: DbtParsedTable[]): DbtParsedTable[] {
    const seen = new Map<string, DbtParsedTable>();

    for (const table of tables) {
      const key = `${table.database ?? ''}.${table.schema ?? ''}.${table.name}`.toLowerCase();
      const existing = seen.get(key);

      if (!existing) {
        seen.set(key, table);
        continue;
      }

      seen.set(key, {
        ...existing,
        description: existing.description ?? table.description,
        columns: this.mergeColumns(existing.columns, table.columns),
        tagsDbt: this.mergeTagsDbt(existing.tagsDbt, table.tagsDbt),
        freshnessDbt: this.mergeFreshnessDbt(existing.freshnessDbt, table.freshnessDbt),
      });
    }

    return Array.from(seen.values());
  }

  private mergeColumns(existing: DbtParsedColumn[], incoming: DbtParsedColumn[]): DbtParsedColumn[] {
    const seen = new Map<string, DbtParsedColumn>();

    for (const column of existing) {
      seen.set(column.name.toLowerCase(), column);
    }

    for (const column of incoming) {
      const key = column.name.toLowerCase();
      const existingColumn = seen.get(key);

      if (!existingColumn) {
        seen.set(key, column);
        continue;
      }

      seen.set(key, {
        ...existingColumn,
        description: existingColumn.description ?? column.description,
        dataType: existingColumn.dataType ?? column.dataType,
        dataTests: this.mergeDbtDataTests(existingColumn.dataTests, column.dataTests),
        constraints: this.mergeDbtConstraints(existingColumn.constraints, column.constraints),
        enumValuesDbt: this.mergeStringList(existingColumn.enumValuesDbt, column.enumValuesDbt),
      });
    }

    return Array.from(seen.values());
  }

  private deduplicateRelationships(relationships: DbtParsedRelationship[]): DbtParsedRelationship[] {
    const seen = new Set<string>();
    const result: DbtParsedRelationship[] = [];

    for (const relationship of relationships) {
      const key =
        `${relationship.fromTable}.${relationship.fromColumn}->${relationship.toTable}.${relationship.toColumn}`.toLowerCase();
      if (!seen.has(key)) {
        seen.add(key);
        result.push(relationship);
      }
    }

    return result;
  }

  private mergeFreshnessDbt(
    existing?: DbtParsedTable['freshnessDbt'],
    incoming?: DbtParsedTable['freshnessDbt'],
  ): DbtParsedTable['freshnessDbt'] {
    if (!existing && !incoming) {
      return undefined;
    }
    const raw = existing?.raw !== undefined ? existing.raw : incoming?.raw;
    const loadedAtField = existing?.loadedAtField ?? incoming?.loadedAtField;
    return {
      ...(raw !== undefined ? { raw } : {}),
      ...(loadedAtField !== undefined ? { loadedAtField } : {}),
    };
  }

  private mergeDbtConstraints(
    existing?: DbtColumnConstraints,
    incoming?: DbtColumnConstraints,
  ): DbtColumnConstraints | undefined {
    const notNull = !!(existing?.dbt.not_null || incoming?.dbt.not_null);
    const unique = !!(existing?.dbt.unique || incoming?.dbt.unique);
    if (!notNull && !unique) {
      return undefined;
    }
    return { dbt: { ...(notNull ? { not_null: true } : {}), ...(unique ? { unique: true } : {}) } };
  }

  private mergeStringList(existing?: string[], incoming?: string[]): string[] | undefined {
    return this.mergeTagsDbt(existing, incoming);
  }

  private mergeDbtDataTests(existing?: DbtDataTestRef[], incoming?: DbtDataTestRef[]): DbtDataTestRef[] | undefined {
    if (!existing?.length) {
      return incoming?.length ? [...incoming] : undefined;
    }
    if (!incoming?.length) {
      return [...existing];
    }
    const tests = new Map<string, DbtDataTestRef>();
    for (const test of [...existing, ...incoming]) {
      const kwargsKey =
        test.kwargs && Object.keys(test.kwargs).length > 0
          ? `:${createHash('sha256').update(JSON.stringify(test.kwargs)).digest('hex').slice(0, 16)}`
          : '';
      tests.set(`${test.package}:${test.name}${kwargsKey}`, test);
    }
    return [...tests.values()];
  }

  private emptyResult(projectName: string | null): DbtSchemaParseResult {
    return {
      projectName,
      dbtVersion: null,
      tables: [],
      relationships: [],
    };
  }
}
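The relationship parsing in the class above rests on two small string helpers: `parseTestNameString` splits an optional package prefix off a test name, and `parseRef` extracts the target table from a `ref(...)` or `source(...)` expression. The following is a minimal standalone sketch of that behavior, assuming free functions in place of the private methods. The names `splitTestName` and `extractRefTarget` are hypothetical and do not appear in the commit; the regular expressions are copied from the diff.

```typescript
// Hypothetical free-function versions of the parser's private helpers.
// Only the regular expressions and the split rule come from the diff.

interface TestName {
  package: string;
  name: string;
}

// "dbt_utils.some_test" yields package "dbt_utils"; bare names default to package "dbt".
function splitTestName(value: string): TestName {
  const parts = value.split('.');
  if (parts.length >= 2) {
    return { package: parts[0] ?? 'dbt', name: parts.slice(1).join('.') };
  }
  return { package: 'dbt', name: value };
}

// Returns the model name from ref('x'), or the table name from source('s', 't').
function extractRefTarget(refString: string): string | null {
  const refMatch = refString.match(/ref\s*\(\s*['"]([^'"]+)['"]\s*\)/);
  if (refMatch) {
    return refMatch[1] ?? null;
  }
  const sourceMatch = refString.match(/source\s*\(\s*['"][^'"]+['"]\s*,\s*['"]([^'"]+)['"]\s*\)/);
  if (sourceMatch) {
    return sourceMatch[1] ?? null;
  }
  return null;
}

console.log(splitTestName('dbt_utils.not_accepted_values'));
console.log(extractRefTarget("ref('customers')"));
console.log(extractRefTarget("source('raw', 'orders')"));
```

One consequence of these patterns worth keeping in mind: a two-argument `ref('package', 'model')` matches neither expression, so under this sketch it yields `null` and the relationship would be skipped.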
36
packages/cli/src/context/ingest/adapters/dbt/chunk.test.ts
Normal file
@@ -0,0 +1,36 @@
import { describe, expect, it } from 'vitest';
import { chunkDbtProject } from './chunk.js';

describe('chunkDbtProject', () => {
  const diffSet = (modified: string[]) => ({ added: [], modified, deleted: [], unchanged: [] });

  it('caps peerFileIndex when the project has very many yaml files', () => {
    const modelPaths = Array.from({ length: 201 }, (_, i) => `models/m${i}.yml`);
    const allPaths = ['dbt_project.yml', ...modelPaths].sort();
    const { workUnits } = chunkDbtProject({ allPaths });
    const [first] = workUnits;
    expect(first).toBeDefined();
    expect(first?.peerFileIndex).toHaveLength(200);
    expect(first?.notes).toMatch(/capped at 200/);
  });

  it('keeps large-project model work units when dbt_project.yml changes', () => {
    const modelPaths = Array.from({ length: 30 }, (_, i) => `models/m${i}.yml`);
    const allPaths = ['dbt_project.yml', ...modelPaths].sort();
    const { workUnits } = chunkDbtProject({ allPaths }, { diffSet: diffSet(['dbt_project.yml']) });

    expect(workUnits).toHaveLength(30);
    expect(workUnits[0]?.rawFiles).toEqual(['models/m0.yml']);
    expect(workUnits[0]?.dependencyPaths).toContain('dbt_project.yml');
  });

  it('keeps large-project model work units when non-model yaml peers change', () => {
    const modelPaths = Array.from({ length: 30 }, (_, i) => `models/m${i}.yml`);
    const allPaths = ['dbt_project.yml', 'seeds/seed_properties.yml', ...modelPaths].sort();
    const { workUnits } = chunkDbtProject({ allPaths }, { diffSet: diffSet(['seeds/seed_properties.yml']) });

    expect(workUnits).toHaveLength(30);
    expect(workUnits[0]?.rawFiles).toEqual(['models/m0.yml']);
    expect(workUnits[0]?.dependencyPaths).toContain('seeds/seed_properties.yml');
  });
});
130
packages/cli/src/context/ingest/adapters/dbt/chunk.ts
Normal file
@@ -0,0 +1,130 @@
import type { ChunkResult, DiffSet, WorkUnit } from '../../types.js';
import type { ParsedDbtProject } from './parse.js';

interface ChunkOptions {
  diffSet?: DiffSet;
}

/**
 * Per-model work units (when the project has more than 25 YAML files) only name `rawFiles` under
 * `models/**`. Other `.yml` (e.g. some `seeds/` or custom layouts) still appear in `peerFileIndex`
 * or in the small-project / no-models fallbacks — v1 does not emit one WU per non-models file.
 */
const MODELS_PREFIX = 'models/';

/** `peerFileIndex` is a hint only (agents may not read those paths). Cap to limit prompt size. */
const MAX_PEER_FILE_INDEX = 200;

function projectYamlPath(allPaths: string[]): string | undefined {
  if (allPaths.includes('dbt_project.yml')) {
    return 'dbt_project.yml';
  }
  if (allPaths.includes('dbt_project.yaml')) {
    return 'dbt_project.yaml';
  }
  return undefined;
}

function modelRelativePaths(allPaths: string[]): string[] {
  return allPaths.filter((p) => p.replace(/\\/g, '/').startsWith(MODELS_PREFIX)).sort();
}

function unitKeyForModelFile(mf: string): string {
  const base = mf
    .replace(/\.(ya?ml)$/i, '')
    .replace(/\\/g, '/')
    .replace(/[^a-zA-Z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
  return `dbt-${base.toLowerCase()}`;
}

function emitFirstRunWorkUnits(allPaths: string[], dbtDep: string | undefined): WorkUnit[] {
  if (allPaths.length === 0) {
    return [];
  }

  if (allPaths.length <= 25) {
    return [
      {
        unitKey: 'dbt-all',
        displayLabel: 'dbt project (all yaml)',
        rawFiles: [...allPaths],
        peerFileIndex: [],
        dependencyPaths: [],
        notes: 'dbt project — all YAML in one WorkUnit (≤25 files)',
      },
    ];
  }

  const modelFiles = modelRelativePaths(allPaths);
  if (modelFiles.length === 0) {
    return [
      {
        unitKey: 'dbt-all',
        displayLabel: 'dbt project (all yaml, no models/**)',
        rawFiles: [...allPaths],
        peerFileIndex: [],
        dependencyPaths: dbtDep ? [dbtDep] : [],
        notes: 'dbt: no models/**/*.yml — single slice with dbt_project as dependency if present',
      },
    ];
  }

  return modelFiles.map((mf) => {
    const allPeers = allPaths.filter((p) => p !== mf).sort();
    const truncated = allPeers.length > MAX_PEER_FILE_INDEX;
    const peerFileIndex = truncated ? allPeers.slice(0, MAX_PEER_FILE_INDEX) : allPeers;
    const dependencyPaths = dbtDep && allPaths.includes(dbtDep) && mf !== dbtDep ? [dbtDep].sort() : [];
    const notes = truncated
      ? `dbt model schema slice (peer index capped at ${MAX_PEER_FILE_INDEX} of ${allPeers.length} paths)`
      : 'dbt model schema slice';
    return {
      unitKey: unitKeyForModelFile(mf),
      displayLabel: `dbt ${mf}`,
      rawFiles: [mf],
      peerFileIndex,
      dependencyPaths: dependencyPaths,
      notes,
    };
  });
}

function applyDiffSet(firstRunUnits: WorkUnit[], diffSet: DiffSet): ChunkResult {
  const touched = new Set([...diffSet.added, ...diffSet.modified]);
  const kept: WorkUnit[] = [];

  for (const wu of firstRunUnits) {
    const touchedRawFiles = wu.rawFiles.filter((p) => touched.has(p));
    const touchedDependencies = wu.dependencyPaths.filter((p) => touched.has(p));
    const touchedPeerFiles = wu.peerFileIndex.filter((p) => touched.has(p));
    if (touchedRawFiles.length === 0 && touchedDependencies.length === 0 && touchedPeerFiles.length === 0) {
      continue;
    }

    const rawFiles = touchedRawFiles.length > 0 ? touchedRawFiles : wu.rawFiles;
    const unchangedRaw = touchedRawFiles.length > 0 ? wu.rawFiles.filter((p) => !touched.has(p)) : [];
    for (const p of wu.rawFiles) {
      if (!rawFiles.includes(p) && !unchangedRaw.includes(p)) {
        unchangedRaw.push(p);
      }
    }
    const combinedDeps = new Set<string>([...wu.dependencyPaths, ...unchangedRaw, ...touchedPeerFiles]);
    kept.push({
      ...wu,
      rawFiles: rawFiles.sort(),
      dependencyPaths: [...combinedDeps].sort(),
    });
  }

  const eviction = diffSet.deleted.length > 0 ? { deletedRawPaths: [...diffSet.deleted].sort() } : undefined;
  return { workUnits: kept, eviction };
}

export function chunkDbtProject(project: ParsedDbtProject, opts: ChunkOptions = {}): ChunkResult {
  const dbtDep = projectYamlPath(project.allPaths);
  const firstRun = emitFirstRunWorkUnits(project.allPaths, dbtDep);
  if (!opts.diffSet) {
    return { workUnits: firstRun };
  }
  return applyDiffSet(firstRun, opts.diffSet);
}
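Two rules in `chunk.ts` are easy to misread from the diff alone: how a model file path becomes a `unitKey`, and how the peer list is capped. The sketch below restates both as standalone functions so they can be tried in isolation. It assumes nothing beyond what the diff shows; the identifiers ending in `Sketch` are hypothetical stand-ins for `unitKeyForModelFile`, `MAX_PEER_FILE_INDEX`, and the peer-selection lines inside `emitFirstRunWorkUnits`.

```typescript
// Illustrative restatement of two rules from chunk.ts.
// Identifiers ending in "Sketch" are hypothetical and not part of the commit.
const MAX_PEERS_SKETCH = 200;

// Slugify a model YAML path into a work-unit key:
// drop the extension, unify separators, collapse non-alphanumerics to "-".
function unitKeySketch(modelFile: string): string {
  const base = modelFile
    .replace(/\.(ya?ml)$/i, '')
    .replace(/\\/g, '/')
    .replace(/[^a-zA-Z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
  return `dbt-${base.toLowerCase()}`;
}

// Every other path is a peer; the list is sorted first, then capped.
function peerIndexSketch(allPaths: string[], modelFile: string): { peers: string[]; truncated: boolean } {
  const allPeers = allPaths.filter((p) => p !== modelFile).sort();
  const truncated = allPeers.length > MAX_PEERS_SKETCH;
  return { peers: truncated ? allPeers.slice(0, MAX_PEERS_SKETCH) : allPeers, truncated };
}

const samplePaths = ['dbt_project.yml', ...Array.from({ length: 201 }, (_, i) => `models/m${i}.yml`)];
const sample = peerIndexSketch(samplePaths, 'models/m0.yml');
console.log(unitKeySketch('models/marts/fct_orders.yml'), sample.peers.length, sample.truncated);
```

Because underscores count as non-alphanumeric here, `fct_orders` and `fct-orders` in the same directory would map to the same key; whether that matters depends on how the runner uses `unitKey`, which this diff does not show.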
@@ -0,0 +1,57 @@
import { mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import type { SourceAdapter } from '../../types.js';
import { DbtSourceAdapter } from './dbt.adapter.js';

describe('DbtSourceAdapter', () => {
  let stagedDir: string;
  let adapter: SourceAdapter;

  beforeEach(async () => {
    stagedDir = await mkdtemp(join(tmpdir(), 'dbt-adapter-'));
    adapter = new DbtSourceAdapter();
  });

  afterEach(async () => {
    await rm(stagedDir, { recursive: true, force: true });
  });

  it('declares the expected source key and skill list', () => {
    expect(adapter.source).toBe('dbt');
    expect(adapter.skillNames).toEqual(['dbt_ingest']);
  });

  it('detects a staged dbt project root (dbt_project.yml)', async () => {
    await writeFile(join(stagedDir, 'dbt_project.yml'), "name: 'jaffle'\nversion: '1.0.0'\n", 'utf-8');
    expect(await adapter.detect(stagedDir)).toBe(true);
  });

  it('chunk: dbt_project.yml + models/a.yml yields one WU (≤25 files)', async () => {
    await writeFile(join(stagedDir, 'dbt_project.yml'), "name: 'jaffle'\n", 'utf-8');
    await mkdir(join(stagedDir, 'models'), { recursive: true });
    await writeFile(
      join(stagedDir, 'models/a.yml'),
      'version: 2\nmodels:\n  - name: orders\n    description: Orders\n',
      'utf-8',
    );
    const result = await adapter.chunk(stagedDir);
    expect(result.workUnits).toHaveLength(1);
    expect(result.workUnits[0].unitKey).toBe('dbt-all');
    expect(result.parseArtifacts).toMatchObject({
      projectName: 'jaffle',
      tables: [{ name: 'orders', description: 'Orders' }],
    });
  });

  it('implements fetch() for git-backed dbt source setup', () => {
    expect(adapter.fetch).toBeTypeOf('function');
  });

  it('reports mapped warehouse targets for bundle SL discovery', async () => {
    adapter = new DbtSourceAdapter({ targetConnectionIds: ['postgres-warehouse', 'postgres-warehouse'] });

    await expect(adapter.listTargetConnectionIds?.(stagedDir)).resolves.toEqual(['postgres-warehouse']);
  });
});
53
packages/cli/src/context/ingest/adapters/dbt/dbt.adapter.ts
Normal file
@@ -0,0 +1,53 @@
import { join } from 'node:path';
import type { ChunkResult, DiffSet, SourceAdapter } from '../../types.js';
import type { FetchContext } from '../../types.js';
import { loadProjectInfo } from '../../dbt-shared/project-vars.js';
import { loadDbtSchemaFiles } from '../../dbt-shared/schema-files.js';
import { parseDbtSchemaFiles } from '../dbt-descriptions/parse-schema.js';
import { chunkDbtProject } from './chunk.js';
import { detectDbtStagedDir } from './detect.js';
import { fetchDbtRepo, type DbtPullConfig } from './fetch.js';
import { parseDbtStagedDir } from './parse.js';

interface DbtSourceAdapterOptions {
  homeDir?: string;
  targetConnectionIds?: string[];
}

export class DbtSourceAdapter implements SourceAdapter {
  readonly source = 'dbt' as const;
  /** Runner merges: ingest_triage, sl_capture, wiki_capture (see ingest-bundle.runner.ts) */
  readonly skillNames: string[] = ['dbt_ingest'];

  constructor(private readonly options: DbtSourceAdapterOptions = {}) {}

  detect(stagedDir: string): Promise<boolean> {
    return detectDbtStagedDir(stagedDir);
  }

  async listTargetConnectionIds(_stagedDir: string): Promise<string[]> {
    return [...new Set(this.options.targetConnectionIds ?? [])].sort((left, right) => left.localeCompare(right));
  }

  async fetch(pullConfig: unknown, stagedDir: string, ctx: FetchContext): Promise<void> {
    const config = pullConfig as DbtPullConfig | undefined;
    if (!config?.repoUrl) {
      throw new Error('dbt fetch requires repoUrl');
    }
    await fetchDbtRepo({
      config,
      cacheDir: join(this.options.homeDir ?? '.ktx/cache', 'dbt', ctx.connectionId),
      stagedDir,
    });
  }

  async chunk(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
    const project = await parseDbtStagedDir(stagedDir);
    const projectInfo = await loadProjectInfo(stagedDir);
    const schemaFiles = await loadDbtSchemaFiles(stagedDir);
    const parseArtifacts = parseDbtSchemaFiles(schemaFiles, projectInfo.variables, {
      projectName: projectInfo.projectName,
    });
    return { ...chunkDbtProject(project, { diffSet }), parseArtifacts };
  }
}
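The adapter's `listTargetConnectionIds` returns its configured ids deduplicated and in a stable order, which is what the accompanying test checks with a repeated `postgres-warehouse` entry. A minimal sketch of that one-liner as a free function follows; `uniqueSortedIds` is a hypothetical name, and `Array.from` is used in place of spread purely so the snippet compiles under older TypeScript targets.

```typescript
// Hypothetical helper mirroring the body of listTargetConnectionIds.
// A Set drops duplicates; localeCompare gives a stable, locale-aware order.
function uniqueSortedIds(ids: string[] | undefined): string[] {
  return Array.from(new Set(ids ?? [])).sort((left, right) => left.localeCompare(right));
}

console.log(uniqueSortedIds(['snowflake-prod', 'postgres-warehouse', 'postgres-warehouse']));
console.log(uniqueSortedIds(undefined));
```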
12
packages/cli/src/context/ingest/adapters/dbt/detect.ts
Normal file
@@ -0,0 +1,12 @@
import { access } from 'node:fs/promises';
import { join } from 'node:path';

export async function detectDbtStagedDir(stagedDir: string): Promise<boolean> {
  for (const name of ['dbt_project.yml', 'dbt_project.yaml'] as const) {
    try {
      await access(join(stagedDir, name));
      return true;
    } catch {}
  }
  return false;
}
38
packages/cli/src/context/ingest/adapters/dbt/fetch.test.ts
Normal file
@@ -0,0 +1,38 @@
import { mkdir, mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import { fetchDbtRepo } from './fetch.js';

describe('fetchDbtRepo', () => {
  let tempDir: string;

  beforeEach(async () => {
    tempDir = await mkdtemp(join(tmpdir(), 'ktx-dbt-fetch-'));
  });

  afterEach(async () => {
    await rm(tempDir, { recursive: true, force: true });
  });

  it('copies dbt yaml files from a fetched repo subpath into staged dir', async () => {
    const cacheDir = join(tempDir, 'cache');
    const stagedDir = join(tempDir, 'staged');
    await mkdir(join(cacheDir, 'analytics', 'models'), { recursive: true });
    await writeFile(join(cacheDir, 'analytics', 'dbt_project.yml'), 'name: analytics\n', 'utf-8');
    await writeFile(join(cacheDir, 'analytics', 'models', 'orders.yml'), 'models: []\n', 'utf-8');
    const cloneOrPull = vi.fn(async () => ({ commitHash: 'abc123' }));

    await expect(
      fetchDbtRepo({
        config: { repoUrl: 'https://github.com/acme/dbt.git', path: 'analytics' },
        cacheDir,
        stagedDir,
        deps: { cloneOrPull },
      }),
    ).resolves.toEqual({ commitHash: 'abc123', filesCopied: 2 });

    await expect(readFile(join(stagedDir, 'dbt_project.yml'), 'utf-8')).resolves.toContain('analytics');
    await expect(readFile(join(stagedDir, 'models', 'orders.yml'), 'utf-8')).resolves.toContain('models');
  });
});
60
packages/cli/src/context/ingest/adapters/dbt/fetch.ts
Normal file
@@ -0,0 +1,60 @@
import { access, copyFile, mkdir, readdir } from 'node:fs/promises';
import { dirname, join, relative } from 'node:path';
import { cloneOrPull, sanitizeRepoError } from '../../repo-fetch.js';

export interface DbtPullConfig {
  repoUrl: string;
  branch?: string;
  path?: string;
  authToken?: string | null;
}

export interface FetchDbtRepoParams {
  config: DbtPullConfig;
  cacheDir: string;
  stagedDir: string;
  deps?: {
    cloneOrPull?: typeof cloneOrPull;
  };
}

export async function fetchDbtRepo(params: FetchDbtRepoParams): Promise<{ commitHash: string; filesCopied: number }> {
  try {
    const runCloneOrPull = params.deps?.cloneOrPull ?? cloneOrPull;
    const { commitHash } = await runCloneOrPull({
      repoUrl: params.config.repoUrl,
      authToken: params.config.authToken,
      cacheDir: params.cacheDir,
      branch: params.config.branch ?? 'main',
    });
    const sourceRoot = params.config.path ? join(params.cacheDir, params.config.path) : params.cacheDir;
    const filesCopied = await copyYamlFilesRecursive(sourceRoot, params.stagedDir);
    return { commitHash, filesCopied };
  } catch (error) {
    throw new Error(sanitizeRepoError(error, params.config.authToken));
  }
}

async function copyYamlFilesRecursive(sourceRoot: string, destRoot: string): Promise<number> {
  try {
    await access(sourceRoot);
  } catch {
    return 0;
  }

  await mkdir(destRoot, { recursive: true });
  const entries = await readdir(sourceRoot, { withFileTypes: true, recursive: true });
  let copied = 0;
  for (const entry of entries) {
    if (!entry.isFile() || !/\.ya?ml$/i.test(entry.name)) {
      continue;
    }
    const absSrc = join(entry.parentPath, entry.name);
    const rel = relative(sourceRoot, absSrc);
    const dest = join(destRoot, rel);
    await mkdir(dirname(dest), { recursive: true });
    await copyFile(absSrc, dest);
    copied += 1;
  }
  return copied;
}
@@ -0,0 +1,8 @@
import { describe, expect, it } from 'vitest';
import { normalizeDbtPath } from './parse.js';

describe('normalizeDbtPath', () => {
  it('normalizes Windows separators to POSIX separators', () => {
    expect(normalizeDbtPath('models\\marts\\orders.yml')).toBe('models/marts/orders.yml');
  });
});
33
packages/cli/src/context/ingest/adapters/dbt/parse.ts
Normal file
@@ -0,0 +1,33 @@
import { readdir } from 'node:fs/promises';
import { join, relative } from 'node:path';

const YAML_EXT_RE = /\.(ya?ml)$/i;

/** @internal */
export function normalizeDbtPath(path: string): string {
  return path.replaceAll('\\', '/');
}

async function collectYamlFiles(stagedDir: string): Promise<string[]> {
  const entries = await readdir(stagedDir, { withFileTypes: true, recursive: true });
  const paths: string[] = [];
  for (const entry of entries) {
    if (!entry.isFile() || !YAML_EXT_RE.test(entry.name)) {
      continue;
    }
    const abs = join(entry.parentPath, entry.name);
    paths.push(normalizeDbtPath(relative(stagedDir, abs)));
  }
  paths.sort();
  return paths;
}

export interface ParsedDbtProject {
  /** All `.yml` / `.yaml` paths under stagedDir, relative + sorted. */
  allPaths: string[];
}

export async function parseDbtStagedDir(stagedDir: string): Promise<ParsedDbtProject> {
  const allPaths = await collectYamlFiles(stagedDir);
  return { allPaths };
}
|
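The YAML collection step in `parse.ts` can be sketched without touching the filesystem. This is a minimal illustration, not the adapter's code: `toPosixPath` and `selectYamlPaths` are hypothetical names, and the sketch tests the extension on the whole relative path, whereas the original tests `entry.name`.

```typescript
// Same extension pattern as parse.ts: matches .yml and .yaml, case-insensitively.
const YAML_EXT_RE = /\.(ya?ml)$/i;

// Hypothetical helper: converts Windows separators to POSIX separators.
function toPosixPath(path: string): string {
  return path.replace(/\\/g, '/');
}

// Hypothetical helper: keeps YAML paths, normalizes separators, and sorts
// so the result is stable across platforms and directory listing order.
function selectYamlPaths(relativePaths: string[]): string[] {
  return relativePaths
    .filter((p) => YAML_EXT_RE.test(p))
    .map(toPosixPath)
    .sort();
}

const selectedYaml = selectYamlPaths([
  'models\\marts\\orders.yml',
  'README.md',
  'models/staging/stg_orders.YAML',
  'dbt_project.yml',
]);
// selectedYaml is:
// ['dbt_project.yml', 'models/marts/orders.yml', 'models/staging/stg_orders.YAML']
```

Normalizing before sorting matters: sorting raw Windows paths would order `\` differently from `/`, so the same project could yield a different path order on different hosts.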
|
@ -0,0 +1,48 @@
|
|||
import { readdir } from 'node:fs/promises';
|
||||
import { join, relative } from 'node:path';
|
||||
import type { ChunkResult, DiffSet, SourceAdapter, WorkUnit } from '../../types.js';
|
||||
|
||||
export class FakeSourceAdapter implements SourceAdapter {
|
||||
readonly source = 'fake';
|
||||
readonly skillNames: string[] = [];
|
||||
|
||||
detect(): Promise<boolean> {
|
||||
return Promise.resolve(true);
|
||||
}
|
||||
|
||||
async chunk(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
|
||||
const subDirs = (await readdir(stagedDir, { withFileTypes: true }))
|
||||
.filter((e) => e.isDirectory())
|
||||
.map((e) => e.name)
|
||||
.sort();
|
||||
|
||||
const workUnits: WorkUnit[] = [];
|
||||
for (const subDir of subDirs) {
|
||||
const entries = await readdir(join(stagedDir, subDir), { withFileTypes: true, recursive: true });
|
||||
const rawFiles = entries
|
||||
.filter((e) => e.isFile())
|
||||
.map((e) => relative(stagedDir, join(e.parentPath, e.name)))
|
||||
.sort();
|
||||
if (rawFiles.length === 0) {
|
||||
continue;
|
||||
}
|
||||
if (diffSet) {
|
||||
const touched = new Set([...diffSet.added, ...diffSet.modified]);
|
||||
const anyTouched = rawFiles.some((p) => touched.has(p));
|
||||
if (!anyTouched) {
|
||||
continue;
|
||||
}
|
||||
}
|
||||
workUnits.push({
|
||||
unitKey: `fake-${subDir}`,
|
||||
displayLabel: subDir,
|
||||
rawFiles,
|
||||
peerFileIndex: [],
|
||||
dependencyPaths: [],
|
||||
});
|
||||
}
|
||||
|
||||
const eviction = diffSet && diffSet.deleted.length > 0 ? { deletedRawPaths: [...diffSet.deleted] } : undefined;
|
||||
return { workUnits, eviction };
|
||||
}
|
||||
}
|
||||
|
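The diff-set filter in `FakeSourceAdapter.chunk` reduces to one rule: a subdirectory yields a work unit only if it has files and, when a diff set is supplied, at least one of those files was added or modified. A minimal in-memory sketch of that rule follows; `DiffSetLike` and `selectTouchedGroups` are hypothetical names, and the directory listing is replaced by a plain object.

```typescript
// Hypothetical stand-in for the DiffSet fields the filter reads.
interface DiffSetLike {
  added: string[];
  modified: string[];
  deleted: string[];
}

// Returns the sorted group names that would produce a work unit.
function selectTouchedGroups(groups: Record<string, string[]>, diffSet?: DiffSetLike): string[] {
  const touched = diffSet ? new Set([...diffSet.added, ...diffSet.modified]) : null;
  return Object.keys(groups)
    .sort()
    .filter((groupName) => {
      const files = groups[groupName] ?? [];
      if (files.length === 0) {
        return false; // Empty groups never produce a unit.
      }
      // Without a diff set every non-empty group is processed.
      return touched === null || files.some((file) => touched.has(file));
    });
}

const exampleGroups: Record<string, string[]> = {
  b: ['b/1.txt', 'b/2.txt'],
  a: ['a/1.txt'],
  empty: [],
};

const fullRun = selectTouchedGroups(exampleGroups);
const incrementalRun = selectTouchedGroups(exampleGroups, {
  added: [],
  modified: ['b/2.txt'],
  deleted: ['a/old.txt'],
});
// fullRun is ['a', 'b']; incrementalRun is ['b'].
```

Deleted paths do not select a group here; in the adapter they are reported separately through `eviction.deletedRawPaths`.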
|
@ -0,0 +1,158 @@
|
|||
import { describe, expect, it, vi } from 'vitest';
|
||||
import { BigQueryHistoricSqlQueryHistoryReader } from './bigquery-query-history-reader.js';
|
||||
import { HistoricSqlGrantsMissingError } from './errors.js';
|
||||
|
||||
interface FakeQueryResult {
|
||||
headers: string[];
|
||||
rows: unknown[][];
|
||||
totalRows: number;
|
||||
error?: string;
|
||||
}
|
||||
|
||||
function queryClient(results: FakeQueryResult[]) {
|
||||
const executeQuery = vi.fn(async (_query: string) => {
|
||||
const next = results.shift();
|
||||
if (!next) {
|
||||
throw new Error('unexpected query');
|
||||
}
|
||||
return next;
|
||||
});
|
||||
return { executeQuery };
|
||||
}
|
||||
|
||||
function firstQuery(client: ReturnType<typeof queryClient>): string {
|
||||
const call = client.executeQuery.mock.calls[0];
|
||||
if (!call) {
|
||||
throw new Error('expected query client to be called');
|
||||
}
|
||||
return call[0];
|
||||
}
|
||||
|
||||
describe('BigQueryHistoricSqlQueryHistoryReader', () => {
|
||||
it('probes region-qualified INFORMATION_SCHEMA.JOBS_BY_PROJECT', async () => {
|
||||
const client = queryClient([{ headers: ['1'], rows: [[1]], totalRows: 1 }]);
|
||||
const reader = new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'project-1', region: 'US' });
|
||||
|
||||
await expect(reader.probe(client)).resolves.toEqual({ warnings: [], info: [] });
|
||||
|
||||
expect(client.executeQuery).toHaveBeenCalledWith(
|
||||
'SELECT 1 FROM `project-1.region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` LIMIT 1',
|
||||
);
|
||||
});
|
||||
|
||||
it('turns probe result errors into HistoricSqlGrantsMissingError', async () => {
|
||||
const client = queryClient([{ headers: [], rows: [], totalRows: 0, error: 'Access Denied: jobs.listAll' }]);
|
||||
const reader = new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'project-1', region: 'us-central1' });
|
||||
|
||||
await expect(reader.probe(client)).rejects.toMatchObject({
|
||||
name: 'HistoricSqlGrantsMissingError',
|
||||
dialect: 'bigquery',
|
||||
remediation:
|
||||
'Grant roles/bigquery.resourceViewer on the BigQuery project, or grant a custom role containing bigquery.jobs.listAll.',
|
||||
});
|
||||
});
|
||||
|
||||
it('turns thrown probe failures into HistoricSqlGrantsMissingError', async () => {
|
||||
const client = {
|
||||
executeQuery: vi.fn(async () => {
|
||||
throw new Error('permission denied');
|
||||
}),
|
||||
};
|
||||
const reader = new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'project-1', region: 'US' });
|
||||
|
||||
await expect(reader.probe(client)).rejects.toBeInstanceOf(HistoricSqlGrantsMissingError);
|
||||
});
|
||||
|
||||
it('fetches aggregated BigQuery query templates', async () => {
|
||||
const client = queryClient([
|
||||
{
|
||||
headers: [
|
||||
'template_id',
|
||||
'canonical_sql',
|
||||
'executions',
|
||||
'distinct_users',
|
||||
'first_seen',
|
||||
'last_seen',
|
||||
'p50_ms',
|
||||
'p95_ms',
|
||||
'error_rate',
|
||||
'rows_produced',
|
||||
'top_users',
|
||||
],
|
||||
rows: [
|
||||
[
|
||||
'hash-1',
|
||||
'select status from orders',
|
||||
42,
|
||||
3,
|
||||
'2026-05-01T00:00:00.000Z',
|
||||
'2026-05-11T00:00:00.000Z',
|
||||
12,
|
||||
40,
|
||||
0.05,
|
||||
null,
|
||||
JSON.stringify([{ user: 'analyst@example.test', executions: 1 }]),
|
||||
],
|
||||
],
|
||||
totalRows: 1,
|
||||
},
|
||||
]);
|
||||
const reader = new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'demo', region: 'us' });
|
||||
|
||||
const rows = [];
|
||||
for await (const row of reader.fetchAggregated(
|
||||
client,
|
||||
{ start: new Date('2026-02-10T00:00:00.000Z'), end: new Date('2026-05-11T00:00:00.000Z') },
|
||||
{ dialect: 'bigquery', minExecutions: 5, windowDays: 90, enabledTables: [], filters: { dropTrivialProbes: true }, redactionPatterns: [], staleArchiveAfterDays: 90 },
|
||||
)) {
|
||||
rows.push(row);
|
||||
}
|
||||
|
||||
const sql = firstQuery(client);
|
||||
expect(sql).toContain('COUNT(*) AS executions');
|
||||
expect(sql).toContain('COUNT(DISTINCT user_email) AS distinct_users');
|
||||
expect(sql).toContain('GROUP BY query_hash');
|
||||
expect(sql).toContain('HAVING COUNT(*) >= 5');
|
||||
expect(rows).toMatchObject([
|
||||
{
|
||||
templateId: 'hash-1',
|
||||
stats: {
|
||||
executions: 42,
|
||||
errorRate: 0.05,
|
||||
},
|
||||
topUsers: [{ user: 'analyst@example.test', executions: 1 }],
|
||||
},
|
||||
]);
|
||||
});
|
||||
|
||||
it('throws a clear error when the query client cannot execute SQL', async () => {
|
||||
const reader = new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'project-1', region: 'US' });
|
||||
|
||||
await expect(async () => {
|
||||
for await (const _row of reader.fetchAggregated(
|
||||
{},
|
||||
{ start: new Date(), end: new Date() },
|
||||
{
|
||||
dialect: 'bigquery',
|
||||
minExecutions: 5,
|
||||
windowDays: 90,
|
||||
enabledTables: [],
|
||||
filters: { dropTrivialProbes: true },
|
||||
redactionPatterns: [],
|
||||
staleArchiveAfterDays: 90,
|
||||
},
|
||||
)) {
|
||||
throw new Error('unreachable');
|
||||
}
|
||||
}).rejects.toThrow('Historic SQL BigQuery reader requires a query client with executeQuery(query)');
|
||||
});
|
||||
|
||||
it('rejects unsafe project and region identifiers before building SQL', () => {
|
||||
expect(() => new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'project`1', region: 'US' })).toThrow(
|
||||
'Invalid BigQuery project id for historic-SQL ingest: project`1',
|
||||
);
|
||||
expect(() => new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'project-1', region: 'US;DROP' })).toThrow(
|
||||
'Invalid BigQuery region for historic-SQL ingest: US;DROP',
|
||||
);
|
||||
});
|
||||
});
|
||||
|
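The probe and identifier tests above rely on the reader validating both identifiers before interpolating them into a backtick-quoted view path. The sketch below restates that validation in isolation. `jobsViewPath` is a hypothetical wrapper; the two `normalize*` functions mirror the ones in `bigquery-query-history-reader.ts`.

```typescript
function normalizeProjectId(value: string): string {
  // Only letters, digits, underscore, and hyphen: a backtick cannot pass.
  if (!/^[A-Za-z0-9_-]+$/.test(value)) {
    throw new Error(`Invalid BigQuery project id for historic-SQL ingest: ${value}`);
  }
  return value;
}

function normalizeRegion(value: string): string {
  // Accepts "US", "us", or "region-us" and reduces all of them to "us".
  const region = value.trim().toLowerCase().replace(/^region-/, '');
  if (!/^[a-z0-9-]+$/.test(region)) {
    throw new Error(`Invalid BigQuery region for historic-SQL ingest: ${value}`);
  }
  return region;
}

// Hypothetical wrapper: builds the quoted, region-qualified view path.
function jobsViewPath(projectId: string, region: string): string {
  return `\`${normalizeProjectId(projectId)}.region-${normalizeRegion(region)}.INFORMATION_SCHEMA.JOBS_BY_PROJECT\``;
}

const usPath = jobsViewPath('project-1', 'US');
const prefixedPath = jobsViewPath('demo', 'region-EU');
```

Because validation is an allowlist rather than an escape routine, any identifier that could terminate the backtick quoting is rejected before SQL is built.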
|
@ -0,0 +1,247 @@
|
|||
import { HistoricSqlGrantsMissingError } from './errors.js';
|
||||
import {
|
||||
aggregatedTemplateSchema,
|
||||
type AggregatedTemplate,
|
||||
type HistoricSqlTimeWindow,
|
||||
type HistoricSqlUnifiedPullConfig,
|
||||
} from './types.js';
|
||||
|
||||
interface QueryResultLike {
|
||||
headers: string[];
|
||||
rows: unknown[][];
|
||||
totalRows: number;
|
||||
error?: string;
|
||||
}
|
||||
|
||||
interface QueryClientLike {
|
||||
executeQuery(query: string): Promise<QueryResultLike>;
|
||||
}
|
||||
|
||||
export interface BigQueryHistoricSqlQueryHistoryReaderOptions {
|
||||
projectId: string;
|
||||
region: string;
|
||||
}
|
||||
|
||||
const BIGQUERY_GRANTS_REMEDIATION =
|
||||
'Grant roles/bigquery.resourceViewer on the BigQuery project, or grant a custom role containing bigquery.jobs.listAll.';
|
||||
|
||||
function queryClient(client: unknown): QueryClientLike {
|
||||
if (
|
||||
client &&
|
||||
typeof client === 'object' &&
|
||||
'executeQuery' in client &&
|
||||
typeof (client as { executeQuery?: unknown }).executeQuery === 'function'
|
||||
) {
|
||||
return client as QueryClientLike;
|
||||
}
|
||||
throw new Error('Historic SQL BigQuery reader requires a query client with executeQuery(query)');
|
||||
}
|
||||
|
||||
function grantsError(cause: unknown): HistoricSqlGrantsMissingError {
|
||||
const message =
|
||||
cause instanceof Error
|
||||
? cause.message
|
||||
: typeof cause === 'string'
|
||||
? cause
|
||||
: 'BigQuery principal cannot query INFORMATION_SCHEMA.JOBS_BY_PROJECT.';
|
||||
return new HistoricSqlGrantsMissingError({
|
||||
dialect: 'bigquery',
|
||||
message: `Missing BigQuery audit grants for historic-SQL ingest: ${message}`,
|
||||
remediation: BIGQUERY_GRANTS_REMEDIATION,
|
||||
cause,
|
||||
});
|
||||
}
|
||||
|
||||
function normalizeProjectId(value: string): string {
|
||||
if (!/^[A-Za-z0-9_-]+$/.test(value)) {
|
||||
throw new Error(`Invalid BigQuery project id for historic-SQL ingest: ${value}`);
|
||||
}
|
||||
return value;
|
||||
}
|
||||
|
||||
function normalizeRegion(value: string): string {
|
||||
const region = value.trim().toLowerCase().replace(/^region-/, ''); // Accept "US", "us", or "region-us".
|
||||
if (!/^[a-z0-9-]+$/.test(region)) {
|
||||
throw new Error(`Invalid BigQuery region for historic-SQL ingest: ${value}`);
|
||||
}
|
||||
return region;
|
||||
}
|
||||
|
||||
function timestampExpression(value: Date | string): string {
|
||||
const date = value instanceof Date ? value : new Date(value);
|
||||
if (Number.isNaN(date.getTime())) {
|
||||
throw new Error(`Invalid BigQuery query-history timestamp: ${String(value)}`);
|
||||
}
|
||||
return `TIMESTAMP('${date.toISOString().replace(/'/g, "\\'")}')`; // toISOString() never emits quotes; the escape is purely defensive.
|
||||
}
|
||||
|
||||
function indexByHeader(headers: string[]): Map<string, number> {
|
||||
const out = new Map<string, number>();
|
||||
headers.forEach((header, index) => {
|
||||
out.set(header.toUpperCase(), index);
|
||||
});
|
||||
return out;
|
||||
}
|
||||
|
||||
function value(row: unknown[], indexes: Map<string, number>, name: string): unknown {
|
||||
const index = indexes.get(name.toUpperCase());
|
||||
return index === undefined ? null : row[index];
|
||||
}
|
||||
|
||||
function nullableString(raw: unknown): string | null {
|
||||
if (raw === null || raw === undefined) {
|
||||
return null;
|
||||
}
|
||||
const text = String(raw);
|
||||
return text.length > 0 ? text : null;
|
||||
}
|
||||
|
||||
function requiredString(raw: unknown, field: string): string {
|
||||
const text = nullableString(raw);
|
||||
if (!text) {
|
||||
throw new Error(`BigQuery JOBS_BY_PROJECT row is missing ${field}`);
|
||||
}
|
||||
return text;
|
||||
}
|
||||
|
||||
function nullableNumber(raw: unknown): number | null {
|
||||
if (raw === null || raw === undefined || raw === '') {
|
||||
return null;
|
||||
}
|
||||
const number = typeof raw === 'number' ? raw : Number(raw);
|
||||
if (!Number.isFinite(number)) {
|
||||
return null;
|
||||
}
|
||||
return Math.max(0, number); // Clamp negative values to 0.
|
||||
}
|
||||
|
||||
function requiredNumber(raw: unknown, field: string): number {
|
||||
const number = nullableNumber(raw);
|
||||
if (number === null) {
|
||||
throw new Error(`BigQuery JOBS_BY_PROJECT row has invalid ${field}: ${String(raw)}`);
|
||||
}
|
||||
return number;
|
||||
}
|
||||
|
||||
function requiredInteger(raw: unknown, field: string): number {
|
||||
return Math.trunc(requiredNumber(raw, field));
|
||||
}
|
||||
|
||||
function nullableInteger(raw: unknown): number | null {
|
||||
const number = nullableNumber(raw);
|
||||
return number === null ? null : Math.trunc(number);
|
||||
}
|
||||
|
||||
function isoTimestamp(raw: unknown, field: string): string {
|
||||
if (raw instanceof Date) {
|
||||
return raw.toISOString();
|
||||
}
|
||||
const text = requiredString(raw, field);
|
||||
const date = new Date(text);
|
||||
if (Number.isNaN(date.getTime())) {
|
||||
throw new Error(`BigQuery JOBS_BY_PROJECT row has invalid ${field}: ${text}`);
|
||||
}
|
||||
return date.toISOString();
|
||||
}
|
||||
|
||||
function parseTopUsers(raw: unknown): Array<{ user: string | null; executions: number }> {
|
||||
const text = nullableString(raw);
|
||||
if (!text) {
|
||||
return [];
|
||||
}
|
||||
try {
|
||||
const parsed = JSON.parse(text) as unknown;
|
||||
if (!Array.isArray(parsed)) {
|
||||
return [];
|
||||
}
|
||||
return parsed.flatMap((entry) => {
|
||||
if (!entry || typeof entry !== 'object') {
|
||||
return [];
|
||||
}
|
||||
const user = nullableString((entry as { user?: unknown }).user);
|
||||
const executions = nullableInteger((entry as { executions?: unknown }).executions);
|
||||
return executions === null ? [] : [{ user, executions }];
|
||||
});
|
||||
} catch {
|
||||
return [];
|
||||
}
|
||||
}
|
||||
|
||||
function mapAggregatedRow(row: unknown[], indexes: Map<string, number>): AggregatedTemplate {
|
||||
return aggregatedTemplateSchema.parse({
|
||||
templateId: requiredString(value(row, indexes, 'template_id'), 'template_id'),
|
||||
canonicalSql: requiredString(value(row, indexes, 'canonical_sql'), 'canonical_sql'),
|
||||
dialect: 'bigquery',
|
||||
stats: {
|
||||
executions: requiredInteger(value(row, indexes, 'executions'), 'executions'),
|
||||
distinctUsers: requiredInteger(value(row, indexes, 'distinct_users'), 'distinct_users'),
|
||||
firstSeen: isoTimestamp(value(row, indexes, 'first_seen'), 'first_seen'),
|
||||
lastSeen: isoTimestamp(value(row, indexes, 'last_seen'), 'last_seen'),
|
||||
p50RuntimeMs: nullableNumber(value(row, indexes, 'p50_ms')),
|
||||
p95RuntimeMs: nullableNumber(value(row, indexes, 'p95_ms')),
|
||||
errorRate: requiredNumber(value(row, indexes, 'error_rate'), 'error_rate'),
|
||||
rowsProduced: nullableInteger(value(row, indexes, 'rows_produced')),
|
||||
},
|
||||
topUsers: parseTopUsers(value(row, indexes, 'top_users')),
|
||||
});
|
||||
}
|
||||
|
||||
export class BigQueryHistoricSqlQueryHistoryReader {
|
||||
private readonly viewPath: string;
|
||||
|
||||
constructor(options: BigQueryHistoricSqlQueryHistoryReaderOptions) {
|
||||
const projectId = normalizeProjectId(options.projectId);
|
||||
const region = normalizeRegion(options.region);
|
||||
this.viewPath = `\`${projectId}.region-${region}.INFORMATION_SCHEMA.JOBS_BY_PROJECT\``;
|
||||
}
|
||||
|
||||
async probe(client: unknown): Promise<{ warnings: string[]; info: string[] }> {
|
||||
let result: QueryResultLike;
|
||||
try {
|
||||
result = await queryClient(client).executeQuery(`SELECT 1 FROM ${this.viewPath} LIMIT 1`);
|
||||
} catch (error) {
|
||||
throw grantsError(error);
|
||||
}
|
||||
if (result.error) {
|
||||
throw grantsError(result.error);
|
||||
}
|
||||
return { warnings: [], info: [] };
|
||||
}
|
||||
|
||||
async *fetchAggregated(
|
||||
client: unknown,
|
||||
window: HistoricSqlTimeWindow,
|
||||
config: HistoricSqlUnifiedPullConfig,
|
||||
): AsyncIterable<AggregatedTemplate> {
|
||||
const sql = `
|
||||
SELECT
|
||||
query_hash AS template_id,
|
||||
MIN(query) AS canonical_sql,
|
||||
COUNT(*) AS executions,
|
||||
COUNT(DISTINCT user_email) AS distinct_users,
|
||||
MIN(creation_time) AS first_seen,
|
||||
MAX(creation_time) AS last_seen,
|
||||
APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(50)] AS p50_ms,
|
||||
APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(95)] AS p95_ms,
|
||||
SAFE_DIVIDE(COUNTIF(error_result IS NOT NULL), COUNT(*)) AS error_rate,
|
||||
CAST(NULL AS INT64) AS rows_produced,
|
||||
TO_JSON_STRING(ARRAY_AGG(STRUCT(user_email AS user, 1 AS executions) ORDER BY creation_time DESC LIMIT 5)) AS top_users
|
||||
FROM ${this.viewPath}
|
||||
WHERE job_type = 'QUERY'
|
||||
AND statement_type IN ('SELECT', 'MERGE')
|
||||
AND creation_time >= ${timestampExpression(window.start)}
|
||||
AND creation_time < ${timestampExpression(window.end)}
|
||||
AND query IS NOT NULL
|
||||
GROUP BY query_hash
|
||||
HAVING COUNT(*) >= ${config.minExecutions}
|
||||
ORDER BY executions DESC`.trim();
|
||||
const result = await queryClient(client).executeQuery(sql);
|
||||
if (result.error) {
|
||||
throw grantsError(result.error);
|
||||
}
|
||||
const indexes = indexByHeader(result.headers);
|
||||
for (const row of result.rows) {
|
||||
yield mapAggregatedRow(row, indexes);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
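Row mapping in the reader depends on two small behaviors: header lookup is case-insensitive, and numeric cells are coerced leniently (strings are parsed, non-finite values become `null`, negatives are clamped to 0). A minimal sketch of both, assuming the same result shape as `QueryResultLike`; `cell` and `toNullableNumber` are hypothetical names for the reader's `value` and `nullableNumber` helpers.

```typescript
// Maps upper-cased header names to their column positions.
function indexByHeader(headers: string[]): Map<string, number> {
  const out = new Map<string, number>();
  headers.forEach((header, index) => {
    out.set(header.toUpperCase(), index);
  });
  return out;
}

// Looks a cell up by column name, ignoring case; missing columns yield null.
function cell(row: unknown[], indexes: Map<string, number>, columnName: string): unknown {
  const index = indexes.get(columnName.toUpperCase());
  return index === undefined ? null : row[index];
}

// Lenient numeric coercion: null for empty or non-finite input, never negative.
function toNullableNumber(raw: unknown): number | null {
  if (raw === null || raw === undefined || raw === '') {
    return null;
  }
  const parsed = typeof raw === 'number' ? raw : Number(raw);
  return Number.isFinite(parsed) ? Math.max(0, parsed) : null;
}

const exampleIndexes = indexByHeader(['Template_ID', 'p95_ms']);
const exampleRow: unknown[] = ['hash-1', '40'];

const templateId = cell(exampleRow, exampleIndexes, 'template_id'); // 'hash-1'
const p95 = toNullableNumber(cell(exampleRow, exampleIndexes, 'P95_MS')); // 40
const missing = toNullableNumber(cell(exampleRow, exampleIndexes, 'rows_produced')); // null
```

One consequence of the clamp is that a negative duration, for instance from clock skew between `creation_time` and `end_time`, is reported as 0 rather than rejected.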
|
@ -0,0 +1,59 @@
|
|||
import { describe, expect, it } from 'vitest';
|
||||
import {
|
||||
bucketDistinctUsers,
|
||||
bucketErrorRate,
|
||||
bucketExecutions,
|
||||
bucketFrequency,
|
||||
bucketP95Runtime,
|
||||
bucketRecency,
|
||||
} from './buckets.js';
|
||||
|
||||
describe('historic-sql bucket helpers', () => {
|
||||
it('uses stable execution buckets', () => {
|
||||
expect([0, 9, 10, 99, 100, 999, 1000, 4999, 5000, 49999, 50000].map(bucketExecutions)).toEqual([
|
||||
'<10',
|
||||
'<10',
|
||||
'10-100',
|
||||
'10-100',
|
||||
'100-1k',
|
||||
'100-1k',
|
||||
'1k-5k',
|
||||
'1k-5k',
|
||||
'5k-50k',
|
||||
'5k-50k',
|
||||
'>50k',
|
||||
]);
|
||||
});
|
||||
|
||||
it('uses stable distinct-user, error-rate, runtime, and recency buckets', () => {
|
||||
expect([0, 1, 2, 5, 6, 10, 11].map(bucketDistinctUsers)).toEqual([
|
||||
'0',
|
||||
'1',
|
||||
'2-5',
|
||||
'2-5',
|
||||
'5-10',
|
||||
'5-10',
|
||||
'>10',
|
||||
]);
|
||||
expect([0, 0.01, 0.05, 0.2].map(bucketErrorRate)).toEqual(['none', 'low', 'low', 'high']);
|
||||
expect([null, 99, 100, 999, 1000, 9999, 10000].map(bucketP95Runtime)).toEqual([
|
||||
'unknown',
|
||||
'<100ms',
|
||||
'100ms-1s',
|
||||
'100ms-1s',
|
||||
'1s-10s',
|
||||
'1s-10s',
|
||||
'>10s',
|
||||
]);
|
||||
expect(bucketRecency('2026-05-11T00:00:00.000Z', new Date('2026-05-11T12:00:00.000Z'))).toBe('current');
|
||||
expect(bucketRecency('2026-04-20T00:00:00.000Z', new Date('2026-05-11T12:00:00.000Z'))).toBe('recent');
|
||||
expect(bucketRecency('2026-01-01T00:00:00.000Z', new Date('2026-05-11T12:00:00.000Z'))).toBe('stale');
|
||||
});
|
||||
|
||||
it('maps frequency counts to high, mid, and low labels', () => {
|
||||
expect(bucketFrequency(80, 100)).toBe('high');
|
||||
expect(bucketFrequency(20, 100)).toBe('mid');
|
||||
expect(bucketFrequency(1, 100)).toBe('low');
|
||||
expect(bucketFrequency(0, 0)).toBe('low');
|
||||
});
|
||||
});
|
||||
|
|
@ -0,0 +1,49 @@
|
|||
export function bucketExecutions(value: number): string {
|
||||
if (value < 10) return '<10';
|
||||
if (value < 100) return '10-100';
|
||||
if (value < 1000) return '100-1k';
|
||||
if (value < 5000) return '1k-5k';
|
||||
if (value < 50000) return '5k-50k';
|
||||
return '>50k';
|
||||
}
|
||||
|
||||
export function bucketDistinctUsers(value: number): string {
|
||||
if (value <= 0) return '0';
|
||||
if (value === 1) return '1';
|
||||
if (value <= 5) return '2-5';
|
||||
if (value <= 10) return '5-10'; // Covers 6 through 10 despite the '5-10' label.
|
||||
return '>10';
|
||||
}
|
||||
|
||||
export function bucketErrorRate(value: number): string {
|
||||
if (value <= 0) return 'none';
|
||||
if (value < 0.1) return 'low';
|
||||
return 'high';
|
||||
}
|
||||
|
||||
export function bucketP95Runtime(value: number | null): string {
|
||||
if (value === null) return 'unknown';
|
||||
if (value < 100) return '<100ms';
|
||||
if (value < 1000) return '100ms-1s';
|
||||
if (value < 10000) return '1s-10s';
|
||||
return '>10s';
|
||||
}
|
||||
|
||||
export function bucketRecency(lastSeen: string, now: Date): string {
|
||||
const parsed = new Date(lastSeen);
|
||||
if (Number.isNaN(parsed.getTime())) {
|
||||
return 'unknown';
|
||||
}
|
||||
const ageDays = (now.getTime() - parsed.getTime()) / (24 * 60 * 60 * 1000);
|
||||
if (ageDays <= 7) return 'current';
|
||||
if (ageDays <= 45) return 'recent';
|
||||
return 'stale';
|
||||
}
|
||||
|
||||
export function bucketFrequency(count: number, total: number): 'high' | 'mid' | 'low' {
|
||||
if (total <= 0 || count <= 0) return 'low';
|
||||
const ratio = count / total;
|
||||
if (ratio >= 0.5) return 'high';
|
||||
if (ratio >= 0.1) return 'mid';
|
||||
return 'low';
|
||||
}
|
||||
|
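The bucket helpers above all follow the same shape: walk ascending exclusive upper bounds and return the first label whose bound exceeds the value. As an illustration only, that shape can be written table-driven. `bucketByThresholds` and `EXECUTION_THRESHOLDS` are hypothetical names and are not part of `buckets.ts`; the labels are copied from `bucketExecutions`.

```typescript
// Each entry is [exclusive upper bound, label], in ascending order.
type Threshold = readonly [number, string];

// Returns the label of the first threshold the value falls under,
// or the overflow label when it exceeds every bound.
function bucketByThresholds(value: number, thresholds: readonly Threshold[], overflow: string): string {
  for (const [upperExclusive, label] of thresholds) {
    if (value < upperExclusive) {
      return label;
    }
  }
  return overflow;
}

const EXECUTION_THRESHOLDS: readonly Threshold[] = [
  [10, '<10'],
  [100, '10-100'],
  [1000, '100-1k'],
  [5000, '1k-5k'],
  [50000, '5k-50k'],
];

const executionLabels = [0, 9, 10, 99, 100, 999, 1000, 4999, 5000, 49999, 50000].map((n) =>
  bucketByThresholds(n, EXECUTION_THRESHOLDS, '>50k'),
);
// executionLabels matches the expectations in buckets.test.ts:
// ['<10', '<10', '10-100', '10-100', '100-1k', '100-1k',
//  '1k-5k', '1k-5k', '5k-50k', '5k-50k', '>50k']
```

The explicit `if` chains in `buckets.ts` are arguably easier to read at this size; a table mainly pays off if thresholds need to become configurable.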
|
@ -0,0 +1,182 @@
|
|||
import { mkdir, mkdtemp, writeFile } from 'node:fs/promises';
|
||||
import { tmpdir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
import { describe, expect, it } from 'vitest';
|
||||
import { chunkHistoricSqlUnifiedStagedDir, describeHistoricSqlUnifiedScope } from './chunk-unified.js';
|
||||
|
||||
async function tempDir(): Promise<string> {
|
||||
return mkdtemp(join(tmpdir(), 'historic-sql-unified-chunk-'));
|
||||
}
|
||||
|
||||
async function writeJson(root: string, relPath: string, value: unknown): Promise<void> {
|
||||
const target = join(root, relPath);
|
||||
await mkdir(join(target, '..'), { recursive: true });
|
||||
await writeFile(target, `${JSON.stringify(value, null, 2)}\n`, 'utf-8');
|
||||
}
|
||||
|
||||
async function writeUnifiedStagedDir(root: string): Promise<void> {
|
||||
await writeJson(root, 'manifest.json', {
|
||||
source: 'historic-sql',
|
||||
connectionId: 'warehouse',
|
||||
dialect: 'postgres',
|
||||
fetchedAt: '2026-05-11T00:00:00.000Z',
|
||||
windowStart: '2026-02-10T00:00:00.000Z',
|
||||
windowEnd: '2026-05-11T00:00:00.000Z',
|
||||
snapshotRowCount: 1,
|
||||
touchedTableCount: 1,
|
||||
parseFailures: 0,
|
||||
warnings: [],
|
||||
probeWarnings: [],
|
||||
});
|
||||
await writeJson(root, 'tables/public.orders.json', {
|
||||
table: 'public.orders',
|
||||
stats: {
|
||||
executionsBucket: '10-100',
|
||||
distinctUsersBucket: '2-5',
|
||||
errorRateBucket: 'none',
|
||||
p95RuntimeBucket: '<100ms',
|
||||
recencyBucket: 'current',
|
||||
},
|
||||
columnsByClause: { select: [['status', 'high']] },
|
||||
observedJoins: [],
|
||||
topTemplates: [{ id: 'orders', canonicalSql: 'select * from public.orders', topUsers: [{ user: 'analyst' }] }],
|
||||
});
|
||||
await writeJson(root, 'patterns-input.json', {
|
||||
templates: [
|
||||
{
|
||||
id: 'orders',
|
||||
canonicalSql: 'select * from public.orders join public.customers on true',
|
||||
tablesTouched: ['public.orders', 'public.customers'],
|
||||
executionsBucket: '10-100',
|
||||
distinctUsersBucket: '2-5',
|
||||
dialect: 'postgres',
|
||||
},
|
||||
],
|
||||
});
|
||||
await writeJson(root, 'patterns-input/part-0001.json', {
|
||||
templates: [
|
||||
{
|
||||
id: 'orders',
|
||||
canonicalSql: 'select * from public.orders join public.customers on true',
|
||||
tablesTouched: ['public.orders', 'public.customers'],
|
||||
executionsBucket: '10-100',
|
||||
distinctUsersBucket: '2-5',
|
||||
dialect: 'postgres',
|
||||
},
|
||||
],
|
||||
});
|
||||
}
|
||||
|
||||
describe('chunkHistoricSqlUnifiedStagedDir', () => {
|
||||
it('emits one table WorkUnit plus one patterns WorkUnit', async () => {
|
||||
const stagedDir = await tempDir();
|
||||
await writeUnifiedStagedDir(stagedDir);
|
||||
|
||||
const result = await chunkHistoricSqlUnifiedStagedDir(stagedDir);
|
||||
|
||||
expect(result.workUnits).toEqual([
|
||||
expect.objectContaining({
|
||||
unitKey: 'historic-sql-table-public-orders',
|
||||
displayLabel: 'Historic SQL usage: public.orders',
|
||||
rawFiles: ['tables/public.orders.json'],
|
||||
dependencyPaths: ['manifest.json'],
|
||||
notes: expect.stringContaining('historic_sql_table_digest'),
|
||||
}),
|
||||
expect.objectContaining({
|
||||
unitKey: 'historic-sql-patterns-part-0001',
|
||||
displayLabel: 'Historic SQL cross-table patterns: part-0001',
|
||||
rawFiles: ['patterns-input/part-0001.json'],
|
||||
dependencyPaths: ['manifest.json'],
|
||||
notes: expect.stringContaining('patterns-input/part-0001.json'),
|
||||
}),
|
||||
]);
|
||||
expect(result.workUnits[0]?.notes).toContain('emit_historic_sql_evidence');
|
||||
expect(result.workUnits[1]?.notes).toContain('emit_historic_sql_evidence');
|
||||
expect(result.reconcileNotes).toEqual(['Historic-SQL touched tables=1 parseFailures=0']);
|
||||
});
|
||||
|
||||
it('respects diff sets for unchanged table and patterns files', async () => {
|
||||
const stagedDir = await tempDir();
|
||||
await writeUnifiedStagedDir(stagedDir);
|
||||
|
||||
await expect(
|
||||
chunkHistoricSqlUnifiedStagedDir(stagedDir, {
|
||||
added: [],
|
||||
modified: ['tables/public.orders.json'],
|
||||
deleted: [],
|
||||
unchanged: ['manifest.json', 'patterns-input.json', 'patterns-input/part-0001.json'],
|
||||
}),
|
||||
).resolves.toMatchObject({
|
||||
workUnits: [expect.objectContaining({ unitKey: 'historic-sql-table-public-orders' })],
|
||||
});
|
||||
|
||||
await expect(
|
||||
chunkHistoricSqlUnifiedStagedDir(stagedDir, {
|
||||
added: [],
|
||||
modified: ['patterns-input/part-0001.json'],
|
||||
deleted: [],
|
||||
unchanged: ['manifest.json', 'patterns-input.json', 'tables/public.orders.json'],
|
||||
}),
|
||||
).resolves.toMatchObject({
|
||||
workUnits: [expect.objectContaining({ unitKey: 'historic-sql-patterns-part-0001' })],
|
||||
});
|
||||
|
||||
await expect(
|
||||
chunkHistoricSqlUnifiedStagedDir(stagedDir, {
|
||||
added: [],
|
||||
modified: ['patterns-input.json'],
|
||||
deleted: [],
|
||||
unchanged: ['manifest.json', 'patterns-input/part-0001.json', 'tables/public.orders.json'],
|
||||
}),
|
||||
).resolves.toMatchObject({
|
||||
workUnits: [],
|
||||
});
|
||||
});
|
||||
|
||||
it('describes unified staged scope', async () => {
|
||||
const stagedDir = await tempDir();
|
||||
await writeUnifiedStagedDir(stagedDir);
|
||||
|
||||
const scope = await describeHistoricSqlUnifiedScope(stagedDir);
|
||||
|
||||
expect(scope.isPathInScope('manifest.json')).toBe(true);
|
||||
expect(scope.isPathInScope('patterns-input.json')).toBe(true);
|
||||
expect(scope.isPathInScope('patterns-input/part-0001.json')).toBe(true);
|
||||
expect(scope.isPathInScope('patterns-input/part-1.json')).toBe(false);
|
||||
expect(scope.isPathInScope('tables/public.orders.json')).toBe(true);
|
||||
expect(scope.isPathInScope('templates/old/page.md')).toBe(false);
|
||||
});
|
||||
|
||||
it('emits one patterns WorkUnit per changed shard', async () => {
|
||||
const stagedDir = await tempDir();
|
||||
await writeUnifiedStagedDir(stagedDir);
|
||||
await writeJson(stagedDir, 'patterns-input/part-0002.json', {
|
||||
templates: [
|
||||
{
|
||||
id: 'line-items',
|
||||
canonicalSql: 'select * from public.orders join public.line_items on true',
|
||||
tablesTouched: ['public.orders', 'public.line_items'],
|
||||
executionsBucket: '10-100',
|
||||
distinctUsersBucket: '2-5',
|
||||
dialect: 'postgres',
|
||||
},
|
||||
],
|
||||
});
|
||||
|
||||
const result = await chunkHistoricSqlUnifiedStagedDir(stagedDir, {
|
||||
added: ['patterns-input/part-0002.json'],
|
||||
modified: ['patterns-input/part-0001.json'],
|
||||
deleted: [],
|
||||
unchanged: ['manifest.json', 'patterns-input.json', 'tables/public.orders.json'],
|
||||
});
|
||||
|
||||
expect(result.workUnits.map((unit) => unit.unitKey)).toEqual([
|
||||
'historic-sql-patterns-part-0001',
|
||||
'historic-sql-patterns-part-0002',
|
||||
]);
|
||||
expect(result.workUnits.map((unit) => unit.rawFiles)).toEqual([
|
||||
['patterns-input/part-0001.json'],
|
||||
['patterns-input/part-0002.json'],
|
||||
]);
|
||||
});
|
||||
});
|
||||
|
|
@ -0,0 +1,99 @@
|
|||
import { createHash } from 'node:crypto';
|
||||
import { readFile, readdir } from 'node:fs/promises';
|
||||
import { join, relative } from 'node:path';
|
||||
import type { ChunkResult, DiffSet, ScopeDescriptor, WorkUnit } from '../../types.js';
|
||||
import { isHistoricSqlPatternInputShardPath } from './pattern-inputs.js';
|
||||
import { stagedManifestSchema, stagedPatternsInputSchema, stagedTableInputSchema } from './types.js';
|
||||
|
||||
async function walk(root: string): Promise<string[]> {
|
||||
const entries = await readdir(root, { withFileTypes: true, recursive: true });
|
||||
return entries
|
||||
.filter((entry) => entry.isFile())
|
||||
.map((entry) => relative(root, join(entry.parentPath, entry.name)).replace(/\\/g, '/'))
|
||||
.sort();
|
||||
}
|
||||
|
||||
async function readJson<T>(stagedDir: string, relPath: string): Promise<T> {
|
||||
return JSON.parse(await readFile(join(stagedDir, relPath), 'utf-8')) as T;
|
||||
}
|
||||
|
||||
function safeUnitKey(value: string): string {
|
||||
return value.replace(/[^a-zA-Z0-9]+/g, '-').replace(/^-+|-+$/g, '');
|
||||
}
|
||||
|
||||
function touchedPath(path: string, touched: Set<string> | null): boolean {
|
||||
return !touched || touched.has(path);
|
||||
}
|
||||
|
||||
export async function chunkHistoricSqlUnifiedStagedDir(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
|
||||
const files = await walk(stagedDir);
|
||||
const manifest = stagedManifestSchema.parse(await readJson(stagedDir, 'manifest.json'));
|
||||
const touched = diffSet ? new Set([...diffSet.added, ...diffSet.modified]) : null;
|
||||
const workUnits: WorkUnit[] = [];
|
||||
|
||||
  for (const path of files.filter((file) => /^tables\/.+\.json$/.test(file))) {
    if (!touchedPath(path, touched)) {
      continue;
    }
    const table = stagedTableInputSchema.parse(await readJson(stagedDir, path));
    workUnits.push({
      unitKey: `historic-sql-table-${safeUnitKey(table.table)}`,
      displayLabel: `Historic SQL usage: ${table.table}`,
      rawFiles: [path],
      dependencyPaths: ['manifest.json'],
      peerFileIndex: files.filter((file) => file !== path && file !== 'manifest.json').sort(),
      notes:
        'Use historic_sql_table_digest. Read this table usage JSON and emit exactly one table_usage object with emit_historic_sql_evidence. Do not call wiki_write or sl_write_source.',
    });
  }

  for (const path of files.filter(isHistoricSqlPatternInputShardPath)) {
    if (!touchedPath(path, touched)) {
      continue;
    }
    stagedPatternsInputSchema.parse(await readJson(stagedDir, path));
    const shardLabel = path.replace(/^patterns-input\//, '').replace(/\.json$/, '');
    workUnits.push({
      unitKey: `historic-sql-patterns-${safeUnitKey(shardLabel)}`,
      displayLabel: `Historic SQL cross-table patterns: ${shardLabel}`,
      rawFiles: [path],
      dependencyPaths: ['manifest.json'],
      peerFileIndex: files.filter((file) => file !== path && file !== 'manifest.json').sort(),
      notes:
        `Use historic_sql_patterns. Read ${path} and emit pattern objects with emit_historic_sql_evidence using rawPath "${path}". Do not call wiki_write or sl_write_source.`,
    });
  }

  const deleted = diffSet?.deleted
    .filter((path) => isHistoricSqlPatternInputShardPath(path) || /^tables\/.+\.json$/.test(path))
    .sort();
  return {
    workUnits,
    eviction: deleted && deleted.length > 0 ? { deletedRawPaths: deleted } : undefined,
    reconcileNotes: [`Historic-SQL touched tables=${manifest.touchedTableCount} parseFailures=${manifest.parseFailures}`],
    contextReport: {
      capped: false,
      warnings: [...manifest.probeWarnings, ...manifest.warnings],
    },
  };
}

export async function describeHistoricSqlUnifiedScope(stagedDir: string): Promise<ScopeDescriptor> {
  const manifest = stagedManifestSchema.parse(await readJson(stagedDir, 'manifest.json'));
  const fingerprint = createHash('sha256')
    .update(JSON.stringify({
      connectionId: manifest.connectionId,
      dialect: manifest.dialect,
      windowStart: manifest.windowStart,
      windowEnd: manifest.windowEnd,
    }))
    .digest('hex');
  return {
    fingerprint,
    isPathInScope: (rawPath) =>
      rawPath === 'manifest.json' ||
      rawPath === 'patterns-input.json' ||
      isHistoricSqlPatternInputShardPath(rawPath) ||
      /^tables\/.+\.json$/.test(rawPath),
  };
}
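The scope fingerprint in `describeHistoricSqlUnifiedScope` hashes only the connection, dialect, and window bounds. The following is a minimal standalone sketch of that idea, not the repository's code: the `ScopeInputs` interface and the `scopeFingerprint` helper are hypothetical names introduced here for illustration, and only `createHash` from `node:crypto` is a real API.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical stand-in for the four manifest fields that feed the fingerprint.
interface ScopeInputs {
  connectionId: string;
  dialect: string;
  windowStart: string;
  windowEnd: string;
}

// Hash the identity fields in a fixed key order so that other manifest
// fields (row counts, warnings) cannot influence the result.
function scopeFingerprint(inputs: ScopeInputs): string {
  return createHash('sha256')
    .update(
      JSON.stringify({
        connectionId: inputs.connectionId,
        dialect: inputs.dialect,
        windowStart: inputs.windowStart,
        windowEnd: inputs.windowEnd,
      }),
    )
    .digest('hex');
}
```

Under these assumptions, two runs with the same connection, dialect, and window produce the same hex digest, while changing any one of the four fields produces a different one.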
@@ -0,0 +1,57 @@
import { mkdir, mkdtemp, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it } from 'vitest';
import { detectHistoricSqlStagedDir } from './detect.js';
import { HISTORIC_SQL_SOURCE_KEY, stagedManifestSchema } from './types.js';

async function tempDir(): Promise<string> {
  return mkdtemp(join(tmpdir(), 'historic-sql-detect-'));
}

async function writeJson(root: string, relPath: string, value: unknown): Promise<void> {
  const target = join(root, relPath);
  await mkdir(join(target, '..'), { recursive: true });
  await writeFile(target, `${JSON.stringify(value, null, 2)}\n`, 'utf-8');
}

function manifest() {
  return stagedManifestSchema.parse({
    source: HISTORIC_SQL_SOURCE_KEY,
    connectionId: 'conn_1',
    dialect: 'postgres',
    fetchedAt: '2026-05-04T12:00:00.000Z',
    windowStart: '2026-02-03T12:00:00.000Z',
    windowEnd: '2026-05-04T12:00:00.000Z',
    snapshotRowCount: 0,
    touchedTableCount: 0,
    parseFailures: 0,
    warnings: [],
    probeWarnings: [],
  });
}

describe('historic-sql staged dir detection', () => {
  it('detects manifest source', async () => {
    const stagedDir = await tempDir();
    await writeJson(stagedDir, 'manifest.json', manifest());

    await expect(detectHistoricSqlStagedDir(stagedDir)).resolves.toBe(true);
  });

  it('detects unified table and patterns structure without manifest', async () => {
    const stagedDir = await tempDir();
    await writeFile(join(stagedDir, 'not-a-match.txt'), 'x', 'utf-8');
    await writeJson(stagedDir, 'patterns-input.json', { templates: [] });
    await writeJson(stagedDir, 'tables/public.orders.json', { table: 'public.orders' });

    await expect(detectHistoricSqlStagedDir(stagedDir)).resolves.toBe(true);
  });

  it('does not detect unrelated directories', async () => {
    const stagedDir = await tempDir();
    await writeJson(stagedDir, 'manifest.json', { source: 'notion' });

    await expect(detectHistoricSqlStagedDir(stagedDir)).resolves.toBe(false);
  });
});
@@ -0,0 +1,25 @@
import { readFile, readdir } from 'node:fs/promises';
import { join } from 'node:path';
import { HISTORIC_SQL_SOURCE_KEY } from './types.js';

export async function detectHistoricSqlStagedDir(stagedDir: string): Promise<boolean> {
  try {
    const manifest = JSON.parse(await readFile(join(stagedDir, 'manifest.json'), 'utf-8')) as { source?: unknown };
    if (manifest.source === HISTORIC_SQL_SOURCE_KEY) {
      return true;
    }
    if (manifest.source !== undefined) {
      return false;
    }
  } catch {
    // Fall through to structural detection for stage-only fixtures.
  }

  try {
    await readFile(join(stagedDir, 'patterns-input.json'), 'utf-8');
    const entries = await readdir(join(stagedDir, 'tables'), { withFileTypes: true });
    return entries.some((entry) => entry.isFile() && entry.name.endsWith('.json'));
  } catch {
    return false;
  }
}
@@ -0,0 +1,61 @@
import type { HistoricSqlDialect } from './types.js';

interface HistoricSqlGrantsMissingErrorOptions {
  dialect: HistoricSqlDialect;
  message: string;
  remediation: string;
  cause?: unknown;
}

export class HistoricSqlGrantsMissingError extends Error {
  readonly dialect: HistoricSqlDialect;
  readonly remediation: string;

  constructor(options: HistoricSqlGrantsMissingErrorOptions) {
    super(options.message, options.cause === undefined ? undefined : { cause: options.cause });
    this.name = 'HistoricSqlGrantsMissingError';
    this.dialect = options.dialect;
    this.remediation = options.remediation;
  }
}

interface HistoricSqlExtensionMissingErrorOptions {
  dialect: HistoricSqlDialect;
  message: string;
  remediation: string;
  cause?: unknown;
}

export class HistoricSqlExtensionMissingError extends Error {
  readonly dialect: HistoricSqlDialect;
  readonly remediation: string;

  constructor(options: HistoricSqlExtensionMissingErrorOptions) {
    super(options.message, options.cause === undefined ? undefined : { cause: options.cause });
    this.name = 'HistoricSqlExtensionMissingError';
    this.dialect = options.dialect;
    this.remediation = options.remediation;
  }
}

interface HistoricSqlVersionUnsupportedErrorOptions {
  dialect: HistoricSqlDialect;
  detectedVersion: string;
  minimumVersion: string;
}

export class HistoricSqlVersionUnsupportedError extends Error {
  readonly dialect: HistoricSqlDialect;
  readonly detectedVersion: string;
  readonly minimumVersion: string;

  constructor(options: HistoricSqlVersionUnsupportedErrorOptions) {
    super(
      `Unsupported ${options.dialect} version for historic-SQL ingest: detected ${options.detectedVersion}; requires ${options.minimumVersion} or newer.`,
    );
    this.name = 'HistoricSqlVersionUnsupportedError';
    this.dialect = options.dialect;
    this.detectedVersion = options.detectedVersion;
    this.minimumVersion = options.minimumVersion;
  }
}
@@ -0,0 +1,89 @@
import { describe, expect, it, vi } from 'vitest';
import { asSchema } from 'ai';
import { createEmitHistoricSqlEvidenceTool } from './evidence-tool.js';

describe('emit_historic_sql_evidence tool', () => {
  it('exposes an AI SDK v6 tool input schema with top-level object type', async () => {
    const tool = createEmitHistoricSqlEvidenceTool();

    expect(await asSchema(tool.inputSchema).jsonSchema).toMatchObject({
      type: 'object',
    });
  });

  it('writes table usage evidence to the ignored run evidence directory', async () => {
    const writeFile = vi.fn(async () => ({ success: true, commitHash: null }));
    const tool = createEmitHistoricSqlEvidenceTool();

    const result = await tool.execute!(
      {
        kind: 'table_usage',
        table: 'public.orders',
        rawPath: 'tables/public.orders.json',
        usage: {
          narrative: 'Orders are repeatedly queried by paid status.',
          frequencyTier: 'high',
          commonFilters: ['status'],
          commonJoins: [],
          staleSince: null,
        },
      },
      {
        toolCallId: 'call-1',
        messages: [],
        abortSignal: new AbortController().signal,
        experimental_context: {
          connectionId: 'warehouse',
          session: {
            ingest: { runId: 'run-1', jobId: 'job-1', syncId: 'sync-1', sourceKey: 'historic-sql' },
            configService: { writeFile },
          },
        },
      } as never,
    );

    expect(result).toBe('Recorded historic-SQL table_usage evidence for public.orders.');
    expect(writeFile).toHaveBeenCalledWith(
      '.ktx/ingest-evidence/historic-sql/run-1/historic-sql-table-public-orders.json',
      expect.stringContaining('"kind": "table_usage"'),
      'System User',
      'system@example.com',
      'Record historic-SQL evidence: historic-sql-table-public-orders',
      { skipLock: true },
    );
  });

  it('rejects non-historic ingest sessions', async () => {
    const tool = createEmitHistoricSqlEvidenceTool();

    await expect(
      tool.execute!(
        {
          kind: 'pattern',
          rawPath: 'patterns-input.json',
          pattern: {
            slug: 'orders',
            title: 'Orders',
            narrative: 'Orders pattern.',
            definitionSql: 'select * from public.orders',
            tablesInvolved: ['public.orders'],
            slRefs: ['orders'],
            constituentTemplateIds: ['pg:1'],
          },
        },
        {
          toolCallId: 'call-1',
          messages: [],
          abortSignal: new AbortController().signal,
          experimental_context: {
            connectionId: 'warehouse',
            session: {
              ingest: { runId: 'run-1', jobId: 'job-1', syncId: 'sync-1', sourceKey: 'notion' },
              configService: { writeFile: vi.fn() },
            },
          },
        } as never,
      ),
    ).resolves.toContain('Error: emit_historic_sql_evidence is only available during historic-sql ingest');
  });
});
@@ -0,0 +1,121 @@
import { tool } from 'ai';
import { z } from 'zod';
import { historicSqlEvidencePath, serializeHistoricSqlEvidence } from './evidence.js';
import { patternOutputSchema, tableUsageOutputSchema } from './skill-schemas.js';

const SYSTEM_AUTHOR = 'System User';
const SYSTEM_EMAIL = 'system@example.com';

const emitHistoricSqlEvidenceInputSchema = z
  .object({
    kind: z.enum(['table_usage', 'pattern']),
    table: z.string().min(1).optional(),
    rawPath: z.string().min(1),
    usage: tableUsageOutputSchema.optional(),
    pattern: patternOutputSchema.optional(),
  })
  .superRefine((input, ctx) => {
    if (input.kind === 'table_usage') {
      if (!input.table) {
        ctx.addIssue({
          code: 'custom',
          path: ['table'],
          message: 'table is required when kind is table_usage',
        });
      }
      if (!input.usage) {
        ctx.addIssue({
          code: 'custom',
          path: ['usage'],
          message: 'usage is required when kind is table_usage',
        });
      }
    }
    if (input.kind === 'pattern' && !input.pattern) {
      ctx.addIssue({
        code: 'custom',
        path: ['pattern'],
        message: 'pattern is required when kind is pattern',
      });
    }
  });

type EmitHistoricSqlEvidenceInput = z.infer<typeof emitHistoricSqlEvidenceInputSchema>;

interface EmitHistoricSqlEvidenceToolContext {
  connectionId?: string | null;
  session?: {
    ingest?: { runId: string; sourceKey: string };
    configService?: {
      writeFile(
        path: string,
        content: string,
        author: string,
        authorEmail: string,
        commitMessage: string,
        options?: { skipLock?: boolean },
      ): Promise<unknown>;
    };
  };
}

function unitKeyForEvidence(input: EmitHistoricSqlEvidenceInput): string {
  if (input.kind === 'table_usage') {
    return `historic-sql-table-${String(input.table).replace(/[^a-zA-Z0-9]+/g, '-').replace(/^-+|-+$/g, '')}`;
  }
  return `historic-sql-pattern-${String(input.pattern?.slug).replace(/[^a-zA-Z0-9]+/g, '-').replace(/^-+|-+$/g, '')}`;
}

function evidenceEnvelope(input: EmitHistoricSqlEvidenceInput, connectionId: string) {
  if (input.kind === 'table_usage') {
    if (!input.table || !input.usage) {
      throw new Error('Invalid historic-SQL table usage evidence input.');
    }
    return {
      kind: 'table_usage' as const,
      connectionId,
      table: input.table,
      rawPath: input.rawPath,
      usage: input.usage,
    };
  }
  if (!input.pattern) {
    throw new Error('Invalid historic-SQL pattern evidence input.');
  }
  return {
    kind: 'pattern' as const,
    connectionId,
    rawPath: input.rawPath,
    pattern: input.pattern,
  };
}

export function createEmitHistoricSqlEvidenceTool(defaultContext?: EmitHistoricSqlEvidenceToolContext) {
  return tool({
    description:
      'Record typed historic-SQL evidence for deterministic projection. Use this instead of wiki_write, sl_write_source, sl_edit_source, or context_candidate_write during historic-SQL WorkUnits.',
    inputSchema: emitHistoricSqlEvidenceInputSchema,
    execute: async (input, options): Promise<string> => {
      const context = (options.experimental_context as EmitHistoricSqlEvidenceToolContext | undefined) ?? defaultContext;
      const ingest = context?.session?.ingest;
      const configService = context?.session?.configService;
      if (!ingest || ingest.sourceKey !== 'historic-sql' || !configService || !context?.connectionId) {
        return 'Error: emit_historic_sql_evidence is only available during historic-sql ingest.';
      }

      const unitKey = unitKeyForEvidence(input);
      const evidence = evidenceEnvelope(input, context.connectionId);
      const content = serializeHistoricSqlEvidence(evidence);
      await configService.writeFile(
        historicSqlEvidencePath(ingest.runId, unitKey),
        content,
        SYSTEM_AUTHOR,
        SYSTEM_EMAIL,
        `Record historic-SQL evidence: ${unitKey}`,
        { skipLock: true },
      );
      const label = evidence.kind === 'table_usage' ? evidence.table : evidence.pattern.slug;
      return `Recorded historic-SQL ${input.kind} evidence for ${label}.`;
    },
  });
}
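The tool's input schema makes `table`, `usage`, and `pattern` optional at the object level and then enforces them per `kind` in `superRefine`. The sketch below restates that per-kind rule in plain TypeScript without Zod, purely as an illustration. `SketchEvidenceInput`, `SketchIssue`, and `checkEvidenceInput` are hypothetical names, and the payloads are typed as `unknown` rather than the real `tableUsageOutputSchema` / `patternOutputSchema` shapes.

```typescript
type SketchEvidenceKind = 'table_usage' | 'pattern';

// Hypothetical input shape: payload fields are optional, as in the object schema.
interface SketchEvidenceInput {
  kind: SketchEvidenceKind;
  rawPath: string;
  table?: string;
  usage?: unknown;
  pattern?: unknown;
}

interface SketchIssue {
  path: string;
  message: string;
}

// Collect one issue per missing field instead of failing on the first,
// which is the same reporting style the refinement uses.
function checkEvidenceInput(input: SketchEvidenceInput): SketchIssue[] {
  const issues: SketchIssue[] = [];
  if (input.kind === 'table_usage') {
    if (!input.table) {
      issues.push({ path: 'table', message: 'table is required when kind is table_usage' });
    }
    if (!input.usage) {
      issues.push({ path: 'usage', message: 'usage is required when kind is table_usage' });
    }
  }
  if (input.kind === 'pattern' && !input.pattern) {
    issues.push({ path: 'pattern', message: 'pattern is required when kind is pattern' });
  }
  return issues;
}
```

In this sketch a `table_usage` input that omits both `table` and `usage` yields two issues, while a `pattern` input only needs `pattern` to be present.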
@@ -0,0 +1,57 @@
import { describe, expect, it } from 'vitest';
import {
  historicSqlEvidenceEnvelopeSchema,
  historicSqlEvidencePath,
  historicSqlPatternEvidenceSchema,
  historicSqlTableUsageEvidenceSchema,
} from './evidence.js';

describe('historic-sql evidence contracts', () => {
  it('validates table usage evidence emitted by table digest WorkUnits', () => {
    const parsed = historicSqlTableUsageEvidenceSchema.parse({
      kind: 'table_usage',
      connectionId: 'warehouse',
      table: 'public.orders',
      rawPath: 'tables/public.orders.json',
      usage: {
        narrative: 'Orders are repeatedly queried for paid/refunded lifecycle analysis.',
        frequencyTier: 'high',
        commonFilters: ['status', 'created_at'],
        commonGroupBys: ['status'],
        commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
        staleSince: null,
      },
    });

    expect(parsed.table).toBe('public.orders');
    expect(parsed.usage.frequencyTier).toBe('high');
  });

  it('validates pattern evidence emitted by the patterns WorkUnit', () => {
    const parsed = historicSqlPatternEvidenceSchema.parse(
      historicSqlEvidenceEnvelopeSchema.parse({
        kind: 'pattern',
        connectionId: 'warehouse',
        rawPath: 'patterns-input.json',
        pattern: {
          slug: 'order-lifecycle-analysis',
          title: 'Order Lifecycle Analysis',
          narrative: 'Analysts compare order status changes by customer segment.',
          definitionSql: 'select status, count(*) from public.orders group by status',
          tablesInvolved: ['public.orders', 'public.customers'],
          slRefs: ['orders', 'customers'],
          constituentTemplateIds: ['pg:1', 'pg:2'],
        },
      }),
    );

    expect(parsed.kind).toBe('pattern');
    expect(parsed.pattern.slug).toBe('order-lifecycle-analysis');
  });

  it('builds a stable ignored evidence path from run and WorkUnit identity', () => {
    expect(historicSqlEvidencePath('run-1', 'historic-sql-table-public-orders')).toBe(
      '.ktx/ingest-evidence/historic-sql/run-1/historic-sql-table-public-orders.json',
    );
  });
});
@@ -0,0 +1,41 @@
import { z } from 'zod';
import { patternOutputSchema, tableUsageOutputSchema } from './skill-schemas.js';

function safeEvidenceSegment(value: string): string {
  const segment = value.replace(/[^a-zA-Z0-9._-]+/g, '-').replace(/^-+|-+$/g, '');
  if (!segment) {
    throw new Error(`Invalid historic-SQL evidence path segment: ${value}`);
  }
  return segment;
}

/** @internal */
export const historicSqlTableUsageEvidenceSchema = z.object({
  kind: z.literal('table_usage'),
  connectionId: z.string().min(1),
  table: z.string().min(1),
  rawPath: z.string().min(1),
  usage: tableUsageOutputSchema,
});

/** @internal */
export const historicSqlPatternEvidenceSchema = z.object({
  kind: z.literal('pattern'),
  connectionId: z.string().min(1),
  rawPath: z.string().min(1),
  pattern: patternOutputSchema,
});

export const historicSqlEvidenceEnvelopeSchema = z.discriminatedUnion('kind', [
  historicSqlTableUsageEvidenceSchema,
  historicSqlPatternEvidenceSchema,
]);
export type HistoricSqlEvidenceEnvelope = z.infer<typeof historicSqlEvidenceEnvelopeSchema>;

export function historicSqlEvidencePath(runId: string, unitKey: string): string {
  return `.ktx/ingest-evidence/historic-sql/${safeEvidenceSegment(runId)}/${safeEvidenceSegment(unitKey)}.json`;
}

export function serializeHistoricSqlEvidence(evidence: HistoricSqlEvidenceEnvelope): string {
  return `${JSON.stringify(historicSqlEvidenceEnvelopeSchema.parse(evidence), null, 2)}\n`;
}
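Two different sanitizers meet when an evidence file is written: the unit key replaces every non-alphanumeric run with `-`, while the path segment helper also keeps `.`, `_`, and `-`. The sketch below reproduces both regular expressions from this diff in standalone helpers to show how a table name becomes a file path. The `sketch*` function names are hypothetical and exist only for this illustration.

```typescript
// Unit keys: every run of non-alphanumeric characters becomes a single '-',
// so 'public.orders' turns into 'public-orders'.
function sketchTableUnitKey(table: string): string {
  const slug = table.replace(/[^a-zA-Z0-9]+/g, '-').replace(/^-+|-+$/g, '');
  return `historic-sql-table-${slug}`;
}

// Path segments: '.', '_' and '-' survive; anything else collapses to '-'.
// An input that sanitizes to the empty string is rejected.
function sketchSafeSegment(value: string): string {
  const segment = value.replace(/[^a-zA-Z0-9._-]+/g, '-').replace(/^-+|-+$/g, '');
  if (!segment) {
    throw new Error(`Invalid path segment: ${value}`);
  }
  return segment;
}

// Join the sanitized run id and unit key under the evidence directory.
function sketchEvidencePath(runId: string, unitKey: string): string {
  return `.ktx/ingest-evidence/historic-sql/${sketchSafeSegment(runId)}/${sketchSafeSegment(unitKey)}.json`;
}
```

With these helpers, `sketchEvidencePath('run-1', sketchTableUnitKey('public.orders'))` produces the same path string that the tests in this diff expect for the `public.orders` table.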
@@ -0,0 +1,110 @@
import { mkdtemp } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it } from 'vitest';
import type { SqlAnalysisPort } from '../../../../context/sql-analysis/ports.js';
import type { SourceAdapter } from '../../types.js';
import { HistoricSqlSourceAdapter } from './historic-sql.adapter.js';
import type { HistoricSqlReader } from './types.js';

async function tempDir(): Promise<string> {
  return mkdtemp(join(tmpdir(), 'historic-sql-adapter-'));
}

const sqlAnalysis: SqlAnalysisPort = {
  async analyzeForFingerprint() {
    throw new Error('analyzeForFingerprint must not be used');
  },
  async analyzeBatch() {
    return new Map();
  },
  async validateReadOnly() {
    return { ok: true };
  },
};

const reader: HistoricSqlReader = {
  async probe() {
    return { warnings: [], info: [] };
  },
  async *fetchAggregated() {},
};

describe('HistoricSqlSourceAdapter', () => {
  it('declares canonical adapter metadata', () => {
    const adapter = new HistoricSqlSourceAdapter({ sqlAnalysis, reader, queryClient: {} });

    expect(adapter.source).toBe('historic-sql');
    expect(adapter.skillNames).toEqual(['historic_sql_table_digest', 'historic_sql_patterns']);
    expect(adapter.reconcileSkillNames).toEqual([]);
    expect((adapter as SourceAdapter).evidenceIndexing).toBeUndefined();
    expect(adapter.triageSupported).toBe(false);
  });

  it('fetches a unified aggregate snapshot and emits unified WorkUnits', async () => {
    const stagedDir = await tempDir();
    const aggregateReader: HistoricSqlReader = {
      async probe() {
        return { warnings: [], info: [] };
      },
      async *fetchAggregated() {
        yield {
          templateId: 'pg:1',
          canonicalSql:
            'select o.status, count(*) from public.orders o join public.customers c on c.id = o.customer_id group by o.status',
          dialect: 'postgres',
          stats: {
            executions: 25,
            distinctUsers: 3,
            firstSeen: '2026-05-01T00:00:00.000Z',
            lastSeen: '2026-05-11T00:00:00.000Z',
            p50RuntimeMs: 10,
            p95RuntimeMs: 20,
            errorRate: 0,
            rowsProduced: 10,
          },
          topUsers: [{ user: 'analyst', executions: 25 }],
        };
      },
    };
    const batchSqlAnalysis: SqlAnalysisPort = {
      async analyzeForFingerprint() {
        throw new Error('analyzeForFingerprint must not be used');
      },
      async analyzeBatch() {
        return new Map([
          [
            'pg:1',
            {
              tablesTouched: ['public.orders', 'public.customers'],
              columnsByClause: { select: ['status'], join: ['customer_id', 'id'], groupBy: ['status'] },
            },
          ],
        ]);
      },
      async validateReadOnly() {
        return { ok: true };
      },
    };
    const adapter = new HistoricSqlSourceAdapter({
      sqlAnalysis: batchSqlAnalysis,
      reader: aggregateReader,
      queryClient: {},
      now: () => new Date('2026-05-11T00:00:00.000Z'),
    });

    await adapter.fetch({ dialect: 'postgres', minExecutions: 5 }, stagedDir, {
      connectionId: 'warehouse',
      sourceKey: 'historic-sql',
    });

    await expect(adapter.detect(stagedDir)).resolves.toBe(true);
    await expect(adapter.chunk(stagedDir)).resolves.toMatchObject({
      workUnits: [
        { unitKey: 'historic-sql-table-public-customers' },
        { unitKey: 'historic-sql-table-public-orders' },
        { unitKey: 'historic-sql-patterns-part-0001' },
      ],
    });
  });
});
@@ -0,0 +1,65 @@
import type {
  ChunkResult,
  DeterministicFinalizationContext,
  DiffSet,
  FetchContext,
  FinalizationResult,
  ScopeDescriptor,
  SourceAdapter,
} from '../../types.js';
import { chunkHistoricSqlUnifiedStagedDir, describeHistoricSqlUnifiedScope } from './chunk-unified.js';
import { detectHistoricSqlStagedDir } from './detect.js';
import { projectHistoricSqlEvidence } from './projection.js';
import { stageHistoricSqlAggregatedSnapshot } from './stage-unified.js';
import { type HistoricSqlSourceAdapterDeps } from './types.js';

export class HistoricSqlSourceAdapter implements SourceAdapter {
  readonly source = 'historic-sql';
  readonly skillNames = ['historic_sql_table_digest', 'historic_sql_patterns'];
  readonly reconcileSkillNames: string[] = [];
  readonly triageSupported = false;

  constructor(private readonly deps: HistoricSqlSourceAdapterDeps) {}

  detect(stagedDir: string): Promise<boolean> {
    return detectHistoricSqlStagedDir(stagedDir);
  }

  async fetch(pullConfig: unknown, stagedDir: string, ctx: FetchContext): Promise<void> {
    await stageHistoricSqlAggregatedSnapshot({
      stagedDir,
      connectionId: ctx.connectionId,
      queryClient: this.deps.queryClient,
      reader: this.deps.reader,
      sqlAnalysis: this.deps.sqlAnalysis,
      pullConfig,
      now: this.deps.now?.(),
    });
  }

  chunk(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
    return chunkHistoricSqlUnifiedStagedDir(stagedDir, diffSet);
  }

  describeScope(stagedDir: string): Promise<ScopeDescriptor> {
    return describeHistoricSqlUnifiedScope(stagedDir);
  }

  async finalize(ctx: DeterministicFinalizationContext): Promise<FinalizationResult> {
    const projection = await projectHistoricSqlEvidence({
      workdir: ctx.workdir,
      connectionId: ctx.connectionId,
      syncId: ctx.syncId,
      runId: ctx.runId,
      overrideReplay: ctx.overrideReplay,
    });
    return {
      result: projection,
      warnings: projection.warnings,
      errors: [],
      touchedSources: projection.touchedSources,
      changedWikiPageKeys: projection.changedWikiPageKeys,
      actions: projection.actions,
    };
  }
}
@ -0,0 +1,286 @@
|
|||
import { mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
|
||||
import { tmpdir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
import YAML from 'yaml';
|
||||
import type { AgentRunnerPort, RunLoopParams } from '../../../../context/llm/runtime-port.js';
|
||||
import { initKtxProject, loadKtxProject, type KtxLocalProject } from '../../../../context/project/project.js';
|
||||
import type { SqlAnalysisBatchItem, SqlAnalysisBatchResult, SqlAnalysisDialect, SqlAnalysisPort } from '../../../../context/sql-analysis/ports.js';
|
||||
import { searchLocalSlSources } from '../../../sl/local-sl.js';
|
||||
import { searchLocalKnowledgePages } from '../../../wiki/local-knowledge.js';
|
||||
import { runLocalIngest } from '../../local-ingest.js';
|
||||
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
|
||||
import { HistoricSqlSourceAdapter } from './historic-sql.adapter.js';
|
||||
import type { AggregatedTemplate, HistoricSqlReader, HistoricSqlUnifiedPullConfig } from './types.js';
|
||||
|
||||
class AcceptanceHistoricSqlReader implements HistoricSqlReader {
|
||||
async probe() {
|
||||
return { warnings: [], info: [] };
|
||||
}
|
||||
|
||||
async *fetchAggregated(
|
||||
_client: unknown,
|
||||
_window: { start: Date; end: Date },
|
||||
_config: HistoricSqlUnifiedPullConfig,
|
||||
): AsyncIterable<AggregatedTemplate> {
|
||||
yield {
|
||||
templateId: 'pg:orders-lifecycle',
|
||||
canonicalSql:
|
||||
'select o.status, c.segment, count(*) from public.orders o join public.customers c on c.id = o.customer_id where o.status = $1 group by o.status, c.segment',
|
||||
dialect: 'postgres',
|
||||
stats: {
|
||||
executions: 42,
|
||||
distinctUsers: 4,
|
||||
firstSeen: '2026-05-01T00:00:00.000Z',
|
||||
lastSeen: '2026-05-11T00:00:00.000Z',
|
||||
p50RuntimeMs: 18,
|
||||
p95RuntimeMs: 84,
|
||||
errorRate: 0,
|
||||
rowsProduced: 420,
|
||||
},
|
||||
topUsers: [{ user: 'analyst@example.test', executions: 42 }],
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
class HistoricSqlAcceptanceAgentRunner implements AgentRunnerPort {
|
||||
runLoop = vi.fn(async (params: RunLoopParams) => {
|
||||
if (params.telemetryTags?.operationName !== 'ingest-bundle-wu') {
|
||||
return { stopReason: 'natural' as const };
|
||||
}
|
||||
|
||||
const emitEvidence = params.toolSet.emit_historic_sql_evidence;
|
||||
if (!emitEvidence?.execute) {
|
||||
throw new Error('emit_historic_sql_evidence tool was not available to the historic-SQL WorkUnit');
|
||||
}
|
||||
|
||||
if (params.telemetryTags.unitKey === 'historic-sql-table-public-orders') {
|
||||
const result = await emitEvidence.execute({
|
||||
kind: 'table_usage',
|
||||
table: 'public.orders',
|
||||
rawPath: 'tables/public.orders.json',
|
||||
usage: {
|
||||
narrative: 'Analysts repeatedly inspect paid order lifecycle by customer segment.',
|
||||
frequencyTier: 'high',
|
||||
commonFilters: ['status'],
|
||||
commonGroupBys: ['status', 'segment'],
|
||||
commonJoins: [{ table: 'public.customers', on: ['customer_id', 'id'] }],
|
||||
staleSince: null,
|
||||
},
|
||||
});
|
||||
if (!result.markdown.includes('Recorded historic-SQL table_usage evidence')) {
|
||||
throw new Error(`Unexpected orders evidence result: ${result.markdown}`);
|
||||
}
|
||||
}
|
||||
|
||||
if (params.telemetryTags.unitKey === 'historic-sql-table-public-customers') {
|
||||
const result = await emitEvidence.execute({
|
||||
kind: 'table_usage',
|
||||
table: 'public.customers',
|
||||
rawPath: 'tables/public.customers.json',
|
||||
usage: {
|
||||
narrative: 'Customers provide segment context for paid order lifecycle analysis.',
|
||||
frequencyTier: 'mid',
|
||||
commonFilters: [],
|
||||
commonGroupBys: ['segment'],
|
||||
commonJoins: [{ table: 'public.orders', on: ['id', 'customer_id'] }],
|
||||
staleSince: null,
|
||||
},
|
||||
});
|
||||
if (!result.markdown.includes('Recorded historic-SQL table_usage evidence')) {
|
||||
throw new Error(`Unexpected customers evidence result: ${result.markdown}`);
|
||||
}
|
||||
}
|
||||
|
||||
if (params.telemetryTags.unitKey === 'historic-sql-patterns-part-0001') {
|
||||
const result = await emitEvidence.execute({
|
||||
kind: 'pattern',
|
||||
rawPath: 'patterns-input/part-0001.json',
|
||||
pattern: {
|
||||
slug: 'paid-order-lifecycle',
|
||||
title: 'Paid Order Lifecycle',
|
||||
narrative: 'Analysts join orders and customers to compare paid order lifecycle by segment.',
|
||||
definitionSql:
|
||||
'select o.status, c.segment, count(*) from public.orders o join public.customers c on c.id = o.customer_id group by o.status, c.segment',
|
||||
tablesInvolved: ['public.orders', 'public.customers'],
|
||||
slRefs: ['orders', 'customers'],
|
||||
constituentTemplateIds: ['pg:orders-lifecycle'],
|
||||
},
|
||||
});
|
||||
if (!result.markdown.includes('Recorded historic-SQL pattern evidence')) {
|
||||
throw new Error(`Unexpected pattern evidence result: ${result.markdown}`);
|
||||
}
|
||||
}
|
||||
|
||||
return { stopReason: 'natural' as const };
|
||||
});
|
||||
}
|
||||
|
||||
function acceptanceSqlAnalysis(): SqlAnalysisPort {
|
||||
return {
|
||||
analyzeForFingerprint: async () => {
|
||||
throw new Error('analyzeForFingerprint should not be used by unified historic-SQL ingest');
|
||||
},
|
||||
analyzeBatch: vi.fn(
|
||||
async (
|
||||
items: SqlAnalysisBatchItem[],
|
||||
_dialect: SqlAnalysisDialect,
|
||||
): Promise<Map<string, SqlAnalysisBatchResult>> => {
|
||||
return new Map(
|
||||
items.map((item) => [
|
||||
item.id,
|
||||
{
|
||||
tablesTouched: ['public.orders', 'public.customers'],
|
||||
columnsByClause: {
|
||||
select: ['status', 'segment'],
|
||||
where: ['status'],
|
||||
join: ['customer_id', 'id'],
|
||||
groupBy: ['status', 'segment'],
|
||||
},
|
||||
},
|
||||
]),
|
||||
);
|
||||
},
|
||||
),
|
||||
validateReadOnly: vi.fn(async () => ({ ok: true })),
|
||||
};
|
||||
}
|
||||
|
||||
async function writeHistoricSqlProject(project: KtxLocalProject): Promise<KtxLocalProject> {
|
||||
await writeFile(
|
||||
join(project.projectDir, 'ktx.yaml'),
|
||||
[
|
||||
'connections:',
|
||||
' warehouse:',
|
||||
' driver: postgres',
|
||||
' historicSql:',
|
||||
' enabled: true',
|
||||
' dialect: postgres',
|
||||
' minExecutions: 2',
|
||||
'ingest:',
|
||||
' adapters:',
|
||||
' - historic-sql',
|
||||
' embeddings:',
|
||||
' backend: none',
|
||||
'storage:',
|
||||
' state: sqlite',
|
||||
' search: sqlite-fts5',
|
||||
' git:',
|
||||
' auto_commit: false',
|
||||
' author: KTX Test <system@ktx.local>',
|
||||
'',
|
||||
].join('\n'),
|
||||
'utf-8',
|
||||
);
|
||||
|
||||
const loaded = await loadKtxProject({ projectDir: project.projectDir });
|
||||
await loaded.fileStore.writeFile(
|
||||
'semantic-layer/warehouse/_schema/public.yaml',
|
||||
YAML.stringify({
|
||||
tables: {
|
||||
orders: {
|
||||
table: 'public.orders',
|
||||
columns: [
|
||||
{ name: 'id', type: 'string' },
|
||||
{ name: 'status', type: 'string' },
|
||||
{ name: 'customer_id', type: 'string' },
|
||||
],
|
||||
},
|
||||
customers: {
|
||||
table: 'public.customers',
|
||||
columns: [
|
||||
{ name: 'id', type: 'string' },
|
||||
{ name: 'segment', type: 'string' },
|
||||
],
|
||||
},
|
||||
},
|
||||
}),
|
||||
'KTX Test',
|
||||
'system@ktx.local',
|
||||
'Seed schema shard',
|
||||
);
|
||||
return loaded;
|
||||
}
|
||||
|
||||
describe('historic-SQL local ingest retrieval acceptance', () => {
  let tempDir: string;

  beforeEach(async () => {
    tempDir = await mkdtemp(join(tmpdir(), 'ktx-historic-sql-acceptance-'));
  });

  afterEach(async () => {
    await rm(tempDir, { recursive: true, force: true });
  });

  it('projects table and pattern evidence into semantic-layer and wiki retrieval surfaces', async () => {
    const initialized = await initKtxProject({ projectDir: join(tempDir, 'project') });
    const project = await writeHistoricSqlProject(initialized);
    const sqlAnalysis = acceptanceSqlAnalysis();
    const agentRunner = new HistoricSqlAcceptanceAgentRunner();
    const adapter = new HistoricSqlSourceAdapter({
      reader: new AcceptanceHistoricSqlReader(),
      queryClient: {},
      sqlAnalysis,
      now: () => new Date('2026-05-11T00:00:00.000Z'),
    });

    const result = await runLocalIngest({
      project,
      adapters: [adapter],
      adapter: 'historic-sql',
      connectionId: 'warehouse',
      jobId: 'historic-sql-retrieval-acceptance',
      agentRunner,
    });

    expect(sqlAnalysis.analyzeBatch).toHaveBeenCalledTimes(1);
    expect(result.result.failedWorkUnits).toEqual([]);
    expect(result.result.workUnitCount).toBe(3);
    expect(agentRunner.runLoop).toHaveBeenCalledTimes(3);
    const finalization = result.report.body.finalization;
    expect(finalization).toBeDefined();
    if (!finalization) {
      throw new Error('Expected historic-SQL finalization result');
    }
    expect(finalization).toMatchObject({
      sourceKey: 'historic-sql',
      status: 'success',
      result: {
        tableUsageMerged: 2,
        patternPagesWritten: 1,
      },
    });
    expect(finalization.declaredTouchedSources).toEqual(
      expect.arrayContaining([
        { connectionId: 'warehouse', sourceName: 'customers' },
        { connectionId: 'warehouse', sourceName: 'orders' },
      ]),
    );

    await expect(readFile(join(project.projectDir, 'semantic-layer/warehouse/_schema/public.yaml'), 'utf-8')).resolves
      .toContain('Analysts repeatedly inspect paid order lifecycle by customer segment.');
    await expect(readFile(join(project.projectDir, 'wiki/global/historic-sql-paid-order-lifecycle.md'), 'utf-8'))
      .resolves.toContain('Paid Order Lifecycle');

    const reloaded = await loadKtxProject({ projectDir: project.projectDir });
    await expect(
      searchLocalSlSources(reloaded, { connectionId: 'warehouse', query: 'paid order lifecycle', limit: 5 }),
    ).resolves.toEqual(expect.arrayContaining([
      expect.objectContaining({
        name: 'orders',
        frequencyTier: 'high',
        snippet: expect.stringContaining('<mark>'),
        matchReasons: expect.arrayContaining(['lexical']),
      }),
    ]));
    await expect(
      searchLocalKnowledgePages(reloaded, { query: 'paid order lifecycle', userId: 'local', limit: 5 }),
    ).resolves.toEqual([
      expect.objectContaining({
        key: 'historic-sql-paid-order-lifecycle',
        summary: 'Paid Order Lifecycle',
        matchReasons: expect.arrayContaining(['lexical']),
      }),
    ]);
  });
});

@@ -0,0 +1,89 @@
import { describe, expect, it } from 'vitest';
import {
  HISTORIC_SQL_PATTERN_WORKUNIT_MAX_BYTES,
  isHistoricSqlPatternInputShardPath,
  serializedStagedPatternsInputByteLength,
  splitHistoricSqlPatternInputs,
} from './pattern-inputs.js';
import type { StagedPatternsInput } from './types.js';

type PatternTemplate = StagedPatternsInput['templates'][number];

function template(id: string, tablesTouched: string[], canonicalSql = 'select 1'): PatternTemplate {
  return {
    id,
    canonicalSql,
    tablesTouched,
    executionsBucket: '10-100',
    distinctUsersBucket: '2-5',
    dialect: 'postgres',
  };
}

describe('historic-SQL pattern input sharding', () => {
  it('keeps the audit input complete while sharding only cross-table pattern candidates', () => {
    const largeSql = `select * from public.orders join public.customers on true where marker = '${'x'.repeat(260)}'`;
    const input: StagedPatternsInput = {
      templates: [
        template('single-table-orders', ['public.orders']),
        template('orders-customers-2', ['public.orders', 'public.customers'], largeSql),
        template('orders-customers-1', ['public.customers', 'public.orders'], largeSql),
        template('orders-customers-payments', ['public.orders', 'public.customers', 'public.payments'], largeSql),
      ],
    };

    const result = splitHistoricSqlPatternInputs(input, { maxBytes: 760 });

    expect(result.auditInput.templates.map((entry) => entry.id)).toEqual([
      'orders-customers-1',
      'orders-customers-2',
      'orders-customers-payments',
      'single-table-orders',
    ]);
    expect(result.shards.length).toBeGreaterThan(1);
    expect(result.shards.map((shard) => shard.path)).toEqual([
      'patterns-input/part-0001.json',
      'patterns-input/part-0002.json',
      'patterns-input/part-0003.json',
    ]);
    expect(result.shards.flatMap((shard) => shard.input.templates.map((entry) => entry.id))).toEqual([
      'orders-customers-payments',
      'orders-customers-1',
      'orders-customers-2',
    ]);
    expect(result.shards.every((shard) => shard.byteLength <= 760)).toBe(true);
    expect(result.shards.flatMap((shard) => shard.input.templates).some((entry) => entry.id === 'single-table-orders')).toBe(false);
    expect(result.warnings).toEqual([]);
  });

  it('omits a single oversized template from shards and reports a manifest warning', () => {
    const input: StagedPatternsInput = {
      templates: [
        template(
          'oversized-cross-table',
          ['public.orders', 'public.customers'],
          `select * from public.orders join public.customers on true where payload = '${'x'.repeat(500)}'`,
        ),
      ],
    };

    const result = splitHistoricSqlPatternInputs(input, { maxBytes: 240 });

    expect(result.auditInput.templates.map((entry) => entry.id)).toEqual(['oversized-cross-table']);
    expect(result.shards).toEqual([]);
    expect(result.warnings).toEqual(['patterns_input_template_too_large:oversized-cross-table']);
  });

  it('recognizes only generated pattern shard paths', () => {
    expect(isHistoricSqlPatternInputShardPath('patterns-input/part-0001.json')).toBe(true);
    expect(isHistoricSqlPatternInputShardPath('patterns-input/part-0012.json')).toBe(true);
    expect(isHistoricSqlPatternInputShardPath('patterns-input.json')).toBe(false);
    expect(isHistoricSqlPatternInputShardPath('patterns-input/part-1.json')).toBe(false);
    expect(isHistoricSqlPatternInputShardPath('patterns-input/readme.md')).toBe(false);
  });

  it('uses a production byte budget below read_raw_file maximum size', () => {
    expect(HISTORIC_SQL_PATTERN_WORKUNIT_MAX_BYTES).toBeLessThan(120_000);
    expect(serializedStagedPatternsInputByteLength({ templates: [] })).toBeGreaterThan(0);
  });
});

@@ -0,0 +1,101 @@
import { Buffer } from 'node:buffer';
import type { StagedPatternsInput } from './types.js';

const HISTORIC_SQL_PATTERN_WORKUNIT_DIR = 'patterns-input';
/** @internal */
export const HISTORIC_SQL_PATTERN_WORKUNIT_MAX_BYTES = 110_000;
const HISTORIC_SQL_PATTERN_WORKUNIT_PATH_RE = /^patterns-input\/part-\d{4}\.json$/;

type PatternTemplate = StagedPatternsInput['templates'][number];

interface HistoricSqlPatternInputShard {
  path: string;
  input: StagedPatternsInput;
  byteLength: number;
}

export interface HistoricSqlPatternInputSplitResult {
  auditInput: StagedPatternsInput;
  shards: HistoricSqlPatternInputShard[];
  warnings: string[];
}

export interface HistoricSqlPatternInputSplitOptions {
  maxBytes?: number;
}

export function isHistoricSqlPatternInputShardPath(path: string): boolean {
  return HISTORIC_SQL_PATTERN_WORKUNIT_PATH_RE.test(path);
}

function serializeStagedPatternsInput(input: StagedPatternsInput): string {
  return `${JSON.stringify(input, null, 2)}\n`;
}

/** @internal */
export function serializedStagedPatternsInputByteLength(input: StagedPatternsInput): number {
  return Buffer.byteLength(serializeStagedPatternsInput(input), 'utf-8');
}

function sortedAuditTemplates(templates: readonly PatternTemplate[]): PatternTemplate[] {
  return [...templates].sort((left, right) => left.id.localeCompare(right.id));
}

function sortedPatternCandidates(templates: readonly PatternTemplate[]): PatternTemplate[] {
  return [...templates]
    .filter((template) => template.tablesTouched.length >= 2)
    .map((template) => ({ ...template, tablesTouched: [...template.tablesTouched].sort() }))
    .sort((left, right) => {
      const cardinality = right.tablesTouched.length - left.tablesTouched.length;
      if (cardinality !== 0) return cardinality;
      const tableSignature = left.tablesTouched.join('\0').localeCompare(right.tablesTouched.join('\0'));
      if (tableSignature !== 0) return tableSignature;
      return left.id.localeCompare(right.id);
    });
}

function shardPath(index: number): string {
  return `${HISTORIC_SQL_PATTERN_WORKUNIT_DIR}/part-${String(index).padStart(4, '0')}.json`;
}

export function splitHistoricSqlPatternInputs(
  input: StagedPatternsInput,
  options: HistoricSqlPatternInputSplitOptions = {},
): HistoricSqlPatternInputSplitResult {
  const maxBytes = options.maxBytes ?? HISTORIC_SQL_PATTERN_WORKUNIT_MAX_BYTES;
  const auditInput: StagedPatternsInput = { templates: sortedAuditTemplates(input.templates) };
  const warnings: string[] = [];
  const shards: HistoricSqlPatternInputShard[] = [];
  let current: PatternTemplate[] = [];

  const flush = () => {
    if (current.length === 0) {
      return;
    }
    const shardInput: StagedPatternsInput = { templates: current };
    shards.push({
      path: shardPath(shards.length + 1),
      input: shardInput,
      byteLength: serializedStagedPatternsInputByteLength(shardInput),
    });
    current = [];
  };

  for (const template of sortedPatternCandidates(input.templates)) {
    const singleInput: StagedPatternsInput = { templates: [template] };
    if (serializedStagedPatternsInputByteLength(singleInput) > maxBytes) {
      warnings.push(`patterns_input_template_too_large:${template.id}`);
      continue;
    }

    const nextInput: StagedPatternsInput = { templates: [...current, template] };
    if (current.length > 0 && serializedStagedPatternsInputByteLength(nextInput) > maxBytes) {
      flush();
    }
    current.push(template);
  }

  flush();

  return { auditInput, shards, warnings };
}
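
The split function above packs templates greedily under a byte budget. A minimal standalone sketch of that idea follows, assuming a simplified `Item` type and the hypothetical helper names `byteLength` and `packByBytes`, none of which are part of the repository. It uses `TextEncoder` rather than `Buffer` so that it runs without Node-specific imports.

```typescript
// Hypothetical, simplified stand-in for a staged template; not the repository's type.
interface Item {
  id: string;
  payload: string;
}

// UTF-8 size of the pretty-printed JSON plus a trailing newline.
function byteLength(items: Item[]): number {
  return new TextEncoder().encode(`${JSON.stringify({ items }, null, 2)}\n`).length;
}

// Greedy packing: keep appending to the current shard until adding the next
// item would push the serialized shard over the budget, then start a new one.
function packByBytes(items: Item[], maxBytes: number): { shards: Item[][]; skipped: string[] } {
  const shards: Item[][] = [];
  const skipped: string[] = [];
  let current: Item[] = [];
  for (const item of items) {
    if (byteLength([item]) > maxBytes) {
      skipped.push(item.id); // too large to fit even in a shard of its own
      continue;
    }
    if (current.length > 0 && byteLength([...current, item]) > maxBytes) {
      shards.push(current);
      current = [];
    }
    current.push(item);
  }
  if (current.length > 0) {
    shards.push(current);
  }
  return { shards, skipped };
}

const demo = packByBytes(
  [
    { id: 'a', payload: 'x'.repeat(40) },
    { id: 'b', payload: 'x'.repeat(40) },
    { id: 'huge', payload: 'x'.repeat(400) },
    { id: 'c', payload: 'x'.repeat(40) },
  ],
  250,
);
console.log(demo.shards.map((shard) => shard.map((item) => item.id)), demo.skipped);
```

Measuring the serialized candidate shard, rather than summing per-item sizes, keeps the budget honest about JSON punctuation and indentation. The cost is that each shard is re-serialized once per appended item.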

@@ -0,0 +1,242 @@
import { describe, expect, it, vi } from 'vitest';
import {
  HistoricSqlExtensionMissingError,
  HistoricSqlGrantsMissingError,
  HistoricSqlVersionUnsupportedError,
} from './errors.js';
import { PostgresPgssReader } from './postgres-pgss-reader.js';

interface FakeQueryResult {
  headers: string[];
  rows: unknown[][];
  totalRows?: number;
  error?: string;
}

function queryClient(results: Array<FakeQueryResult | Error>) {
  const executeQuery = vi.fn(async (_query: string, _params?: unknown[]) => {
    const next = results.shift();
    if (!next) {
      throw new Error('unexpected query');
    }
    if (next instanceof Error) {
      throw next;
    }
    return next;
  });
  return { executeQuery };
}

function executedSql(client: ReturnType<typeof queryClient>, index: number): string {
  const call = client.executeQuery.mock.calls[index];
  if (!call) {
    throw new Error(`expected query client call ${index}`);
  }
  return call[0];
}

describe('PostgresPgssReader aggregate path', () => {
  it('probes version, extension presence, grants, and tracking state', async () => {
    const client = queryClient([
      {
        headers: ['server_version_num', 'server_version'],
        rows: [[160004, 'PostgreSQL 16.4 on x86_64-apple-darwin']],
      },
      { headers: ['?column?'], rows: [[1]] },
      { headers: ['has_role'], rows: [[true]] },
      { headers: ['track'], rows: [['top']] },
      { headers: ['max'], rows: [['5000']] },
    ]);
    const reader = new PostgresPgssReader();

    await expect(reader.probe(client)).resolves.toEqual({
      pgServerVersion: 'PostgreSQL 16.4 on x86_64-apple-darwin',
      warnings: [],
      info: [],
    });

    expect(executedSql(client, 0)).toContain("current_setting('server_version_num')::int");
    expect(executedSql(client, 1)).toBe('SELECT 1 FROM pg_stat_statements LIMIT 1');
    expect(executedSql(client, 2)).toBe(
      "SELECT pg_has_role(current_user, 'pg_read_all_stats', 'USAGE') AS has_role",
    );
    expect(executedSql(client, 3)).toBe("SELECT current_setting('pg_stat_statements.track') AS track");
    expect(executedSql(client, 4)).toBe("SELECT current_setting('pg_stat_statements.max') AS max");
  });

  it('rejects PostgreSQL versions older than 14 without probing the extension', async () => {
    const client = queryClient([
      {
        headers: ['server_version_num', 'server_version'],
        rows: [[130012, 'PostgreSQL 13.12']],
      },
    ]);
    const reader = new PostgresPgssReader();

    const promise = reader.probe(client);
    await expect(promise).rejects.toMatchObject({
      name: 'HistoricSqlVersionUnsupportedError',
      dialect: 'postgres',
      detectedVersion: 'PostgreSQL 13.12',
      minimumVersion: 'PostgreSQL 14',
    });
    await expect(promise).rejects.toBeInstanceOf(HistoricSqlVersionUnsupportedError);
    expect(client.executeQuery).toHaveBeenCalledTimes(1);
  });

  it('maps a missing pg_stat_statements relation to HistoricSqlExtensionMissingError', async () => {
    const client = queryClient([
      {
        headers: ['server_version_num', 'server_version'],
        rows: [[160004, 'PostgreSQL 16.4']],
      },
      new Error('relation "pg_stat_statements" does not exist'),
    ]);
    const reader = new PostgresPgssReader();

    const promise = reader.probe(client);
    await expect(promise).rejects.toMatchObject({
      name: 'HistoricSqlExtensionMissingError',
      dialect: 'postgres',
    });
    await expect(promise).rejects.toBeInstanceOf(HistoricSqlExtensionMissingError);
  });

  it('maps pg_stat_statements preload failures to HistoricSqlExtensionMissingError with preload remediation', async () => {
    const client = queryClient([
      {
        headers: ['server_version_num', 'server_version'],
        rows: [[160004, 'PostgreSQL 16.4']],
      },
      new Error('pg_stat_statements must be loaded via shared_preload_libraries'),
    ]);
    const reader = new PostgresPgssReader();

    const promise = reader.probe(client);
    await expect(promise).rejects.toMatchObject({
      name: 'HistoricSqlExtensionMissingError',
      dialect: 'postgres',
      message: 'pg_stat_statements is installed but not loaded via shared_preload_libraries.',
      remediation: expect.stringContaining("shared_preload_libraries includes 'pg_stat_statements'"),
    });
    await expect(promise).rejects.toBeInstanceOf(HistoricSqlExtensionMissingError);
  });

  it('maps missing pg_read_all_stats membership to HistoricSqlGrantsMissingError', async () => {
    const client = queryClient([
      {
        headers: ['server_version_num', 'server_version'],
        rows: [[160004, 'PostgreSQL 16.4']],
      },
      { headers: ['?column?'], rows: [[1]] },
      { headers: ['has_role'], rows: [[false]] },
    ]);
    const reader = new PostgresPgssReader();

    const promise = reader.probe(client);
    await expect(promise).rejects.toMatchObject({
      name: 'HistoricSqlGrantsMissingError',
      dialect: 'postgres',
      remediation: 'GRANT pg_read_all_stats TO <connection role>;',
    });
    await expect(promise).rejects.toBeInstanceOf(HistoricSqlGrantsMissingError);
  });

  it('returns a warning instead of failing when pg_stat_statements.track is none', async () => {
    const client = queryClient([
      {
        headers: ['server_version_num', 'server_version'],
        rows: [[160004, 'PostgreSQL 16.4']],
      },
      { headers: ['?column?'], rows: [[1]] },
      { headers: ['has_role'], rows: [[true]] },
      { headers: ['track'], rows: [['none']] },
      { headers: ['max'], rows: [['5000']] },
    ]);
    const reader = new PostgresPgssReader();

    await expect(reader.probe(client)).resolves.toEqual({
      pgServerVersion: 'PostgreSQL 16.4',
      warnings: [
        "pg_stat_statements.track is none; set it to top or all in the Postgres parameter group or config",
      ],
      info: [],
    });
  });

  it('returns an info note when pg_stat_statements.max is below the recommended floor', async () => {
    const client = queryClient([
      {
        headers: ['server_version_num', 'server_version'],
        rows: [[160004, 'PostgreSQL 16.4']],
      },
      { headers: ['?column?'], rows: [[1]] },
      { headers: ['has_role'], rows: [[true]] },
      { headers: ['track'], rows: [['top']] },
      { headers: ['max'], rows: [['1000']] },
    ]);
    const reader = new PostgresPgssReader();

    await expect(reader.probe(client)).resolves.toEqual({
      pgServerVersion: 'PostgreSQL 16.4',
      warnings: [],
      info: [
        'pg_stat_statements.max is 1000; set it to at least 5000 to reduce query-template eviction churn',
      ],
    });
  });

  it('aggregates pg_stat_statements rows by queryid and query', async () => {
    const executeQuery = vi.fn(async (sql: string, params?: unknown[]) => {
      if (sql.includes('pg_stat_statements_info')) {
        return { headers: ['stats_reset', 'dealloc'], rows: [['2026-05-01T00:00:00.000Z', 1]] };
      }
      expect(sql).toContain('GROUP BY queryid, query');
      expect(sql).toContain('HAVING SUM(calls) >= $1');
      expect(params).toEqual([5]);
      return {
        headers: ['template_id', 'canonical_sql', 'executions', 'distinct_users', 'mean_ms', 'rows_produced', 'top_users'],
        rows: [
          [
            '123',
            'select status from public.orders',
            '42',
            '3',
            '11.5',
            '100',
            JSON.stringify([{ user: 'analyst', executions: 40 }]),
          ],
        ],
      };
    });

    const reader = new PostgresPgssReader();
    const rows = [];
    for await (const row of reader.fetchAggregated(
      { executeQuery },
      { start: new Date('2026-02-10T00:00:00.000Z'), end: new Date('2026-05-11T00:00:00.000Z') },
      { dialect: 'postgres', minExecutions: 5, enabledTables: [], filters: { dropTrivialProbes: true }, redactionPatterns: [], staleArchiveAfterDays: 90 },
    )) {
      rows.push(row);
    }

    expect(rows).toEqual([
      {
        templateId: '123',
        canonicalSql: 'select status from public.orders',
        dialect: 'postgres',
        stats: {
          executions: 42,
          distinctUsers: 3,
          firstSeen: '2026-05-01T00:00:00.000Z',
          lastSeen: '2026-05-11T00:00:00.000Z',
          p50RuntimeMs: 11.5,
          p95RuntimeMs: 11.5,
          errorRate: 0,
          rowsProduced: 100,
        },
        topUsers: [{ user: 'analyst', executions: 40 }],
      },
    ]);
  });
});

@@ -0,0 +1,293 @@
import {
  HistoricSqlExtensionMissingError,
  HistoricSqlGrantsMissingError,
  HistoricSqlVersionUnsupportedError,
} from './errors.js';
import {
  aggregatedTemplateSchema,
  type AggregatedTemplate,
  type HistoricSqlTimeWindow,
  type HistoricSqlUnifiedPullConfig,
  type KtxPostgresQueryClient,
  type PostgresPgssProbeResult,
} from './types.js';

interface QueryResultLike {
  headers: string[];
  rows: unknown[][];
  totalRows?: number;
  error?: string;
}

const STATS_INFO_SQL = 'SELECT stats_reset, dealloc FROM pg_stat_statements_info';
const VERSION_SQL = `
  SELECT current_setting('server_version_num')::int AS server_version_num,
         version() AS server_version
`.trim();
const EXTENSION_PROBE_SQL = 'SELECT 1 FROM pg_stat_statements LIMIT 1';
const GRANTS_PROBE_SQL = "SELECT pg_has_role(current_user, 'pg_read_all_stats', 'USAGE') AS has_role";
const TRACKING_PROBE_SQL = "SELECT current_setting('pg_stat_statements.track') AS track";
const MAX_SETTING_PROBE_SQL = "SELECT current_setting('pg_stat_statements.max') AS max";
const RECOMMENDED_PGSS_MAX = 5000;

const AGGREGATE_SQL = `
  SELECT queryid::text AS template_id,
         query AS canonical_sql,
         SUM(calls)::bigint AS executions,
         COUNT(DISTINCT userid) AS distinct_users,
         SUM(total_exec_time) / NULLIF(SUM(calls), 0) AS mean_ms,
         SUM(rows)::bigint AS rows_produced,
         COALESCE(
           json_agg(json_build_object('user', rolname, 'executions', calls) ORDER BY calls DESC)
             FILTER (WHERE userid IS NOT NULL),
           '[]'::json
         )::text AS top_users
  FROM pg_stat_statements
  LEFT JOIN pg_roles ON pg_roles.oid = pg_stat_statements.userid
  WHERE toplevel = true
  GROUP BY queryid, query
  HAVING SUM(calls) >= $1
  ORDER BY SUM(total_exec_time) DESC
`.trim();

const POSTGRES_EXTENSION_REMEDIATION = [
  'Run CREATE EXTENSION pg_stat_statements; against the connection database.',
  "Ensure shared_preload_libraries includes 'pg_stat_statements' in the Postgres parameter group or config.",
].join(' ');

const POSTGRES_GRANTS_REMEDIATION = 'GRANT pg_read_all_stats TO <connection role>;';

function queryClient(client: unknown): KtxPostgresQueryClient {
  if (
    client &&
    typeof client === 'object' &&
    'executeQuery' in client &&
    typeof (client as { executeQuery?: unknown }).executeQuery === 'function'
  ) {
    return client as KtxPostgresQueryClient;
  }
  throw new Error('Historic SQL Postgres PGSS reader requires a query client with executeQuery(sql, params?)');
}

async function execute(client: KtxPostgresQueryClient, sql: string, params?: unknown[]): Promise<QueryResultLike> {
  const result = await client.executeQuery(sql, params);
  if ('error' in result && typeof result.error === 'string' && result.error.length > 0) {
    throw new Error(result.error);
  }
  return result;
}

function indexByHeader(headers: string[]): Map<string, number> {
  const out = new Map<string, number>();
  headers.forEach((header, index) => out.set(header.toLowerCase(), index));
  return out;
}

function value(row: unknown[], headerIndexes: Map<string, number>, header: string): unknown {
  const index = headerIndexes.get(header.toLowerCase());
  return index === undefined ? null : row[index];
}

function nullableString(raw: unknown): string | null {
  if (raw === null || raw === undefined) {
    return null;
  }
  const text = String(raw);
  return text.length > 0 ? text : null;
}

function requiredString(raw: unknown, field: string): string {
  const text = nullableString(raw);
  if (!text) {
    throw new Error(`Postgres pg_stat_statements row is missing ${field}`);
  }
  return text;
}

function requiredFiniteNumber(raw: unknown, field: string): number {
  const number = typeof raw === 'number' ? raw : Number(raw);
  if (!Number.isFinite(number)) {
    throw new Error(`Postgres pg_stat_statements row has invalid ${field}: ${String(raw)}`);
  }
  return number;
}

function requiredInteger(raw: unknown, field: string): number {
  return Math.trunc(requiredFiniteNumber(raw, field));
}

function nullableNumber(raw: unknown): number | null {
  if (raw === null || raw === undefined || raw === '') {
    return null;
  }
  const number = typeof raw === 'number' ? raw : Number(raw);
  return Number.isFinite(number) ? number : null;
}

function nullableInteger(raw: unknown): number | null {
  const number = nullableNumber(raw);
  return number === null ? null : Math.trunc(number);
}

function nullableIsoTimestamp(raw: unknown): string | null {
  if (raw === null || raw === undefined || raw === '') {
    return null;
  }
  if (raw instanceof Date) {
    return raw.toISOString();
  }
  const date = new Date(String(raw));
  return Number.isNaN(date.getTime()) ? null : date.toISOString();
}

function firstRow(result: QueryResultLike, context: string): { row: unknown[]; headers: Map<string, number> } {
  const row = result.rows[0];
  if (!row) {
    throw new Error(`Postgres historic-SQL ${context} query returned no rows`);
  }
  return { row, headers: indexByHeader(result.headers) };
}

function isMissingPgssRelation(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return /relation ["']?pg_stat_statements["']? does not exist/i.test(message);
}

function isPgssPreloadRequired(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return /pg_stat_statements.*shared_preload_libraries/i.test(message);
}

function extensionMissingError(cause: unknown, message?: string): HistoricSqlExtensionMissingError {
  return new HistoricSqlExtensionMissingError({
    dialect: 'postgres',
    message: message ?? 'pg_stat_statements extension is not installed in the connection database.',
    remediation: POSTGRES_EXTENSION_REMEDIATION,
    cause,
  });
}

function grantsMissingError(): HistoricSqlGrantsMissingError {
  return new HistoricSqlGrantsMissingError({
    dialect: 'postgres',
    message: 'Postgres connection role lacks pg_read_all_stats for historic-SQL ingest.',
    remediation: POSTGRES_GRANTS_REMEDIATION,
  });
}

function parseTopUsers(raw: unknown): Array<{ user: string | null; executions: number }> {
  const text = nullableString(raw);
  if (!text) {
    return [];
  }
  try {
    const parsed = JSON.parse(text) as unknown;
    if (!Array.isArray(parsed)) {
      return [];
    }
    return parsed.flatMap((entry) => {
      if (!entry || typeof entry !== 'object') {
        return [];
      }
      const user = nullableString((entry as { user?: unknown }).user);
      const executions = nullableInteger((entry as { executions?: unknown }).executions);
      return executions === null ? [] : [{ user, executions }];
    });
  } catch {
    return [];
  }
}

export class PostgresPgssReader {
  async probe(client: unknown): Promise<PostgresPgssProbeResult> {
    const pgClient = queryClient(client);
    const versionResult = await execute(pgClient, VERSION_SQL);
    const { row: versionRow, headers: versionHeaders } = firstRow(versionResult, 'version probe');
    const serverVersionNum = requiredFiniteNumber(
      value(versionRow, versionHeaders, 'server_version_num'),
      'server_version_num',
    );
    const pgServerVersion = requiredString(value(versionRow, versionHeaders, 'server_version'), 'server_version');

    if (serverVersionNum < 140000) {
      throw new HistoricSqlVersionUnsupportedError({
        dialect: 'postgres',
        detectedVersion: pgServerVersion,
        minimumVersion: 'PostgreSQL 14',
      });
    }

    try {
      await execute(pgClient, EXTENSION_PROBE_SQL);
    } catch (error) {
      if (isMissingPgssRelation(error)) {
        throw extensionMissingError(error);
      }
      if (isPgssPreloadRequired(error)) {
        throw extensionMissingError(
          error,
          'pg_stat_statements is installed but not loaded via shared_preload_libraries.',
        );
      }
      throw error;
    }

    const grantsResult = await execute(pgClient, GRANTS_PROBE_SQL);
    const { row: grantsRow, headers: grantsHeaders } = firstRow(grantsResult, 'grant probe');
    if (value(grantsRow, grantsHeaders, 'has_role') !== true) {
      throw grantsMissingError();
    }

    const trackingResult = await execute(pgClient, TRACKING_PROBE_SQL);
    const { row: trackingRow, headers: trackingHeaders } = firstRow(trackingResult, 'tracking probe');
    const track = nullableString(value(trackingRow, trackingHeaders, 'track'));

    const maxResult = await execute(pgClient, MAX_SETTING_PROBE_SQL);
    const { row: maxRow, headers: maxHeaders } = firstRow(maxResult, 'max-setting probe');
    const pgssMax = nullableInteger(value(maxRow, maxHeaders, 'max'));

    const warnings: string[] = [];
    const info: string[] = [];
    if (track === 'none') {
      warnings.push('pg_stat_statements.track is none; set it to top or all in the Postgres parameter group or config');
    }
    if (pgssMax !== null && pgssMax < RECOMMENDED_PGSS_MAX) {
      info.push(
        `pg_stat_statements.max is ${pgssMax}; set it to at least ${RECOMMENDED_PGSS_MAX} to reduce query-template eviction churn`,
      );
    }

    return { pgServerVersion, warnings, info };
  }

async *fetchAggregated(
|
||||
client: unknown,
|
||||
window: HistoricSqlTimeWindow,
|
||||
config: HistoricSqlUnifiedPullConfig,
|
||||
): AsyncIterable<AggregatedTemplate> {
|
||||
const pgClient = queryClient(client);
|
||||
const statsResult = await execute(pgClient, STATS_INFO_SQL);
|
||||
const { row: statsRow, headers: statsHeaders } = firstRow(statsResult, 'stats-info');
|
||||
const firstSeen = nullableIsoTimestamp(value(statsRow, statsHeaders, 'stats_reset')) ?? window.start.toISOString();
|
||||
const result = await execute(pgClient, AGGREGATE_SQL, [config.minExecutions]);
|
||||
const indexes = indexByHeader(result.headers);
|
||||
for (const row of result.rows) {
|
||||
yield aggregatedTemplateSchema.parse({
|
||||
templateId: requiredString(value(row, indexes, 'template_id'), 'template_id'),
|
||||
canonicalSql: requiredString(value(row, indexes, 'canonical_sql'), 'canonical_sql'),
|
||||
dialect: 'postgres',
|
||||
stats: {
|
||||
executions: requiredInteger(value(row, indexes, 'executions'), 'executions'),
|
||||
distinctUsers: requiredInteger(value(row, indexes, 'distinct_users'), 'distinct_users'),
|
||||
firstSeen,
|
||||
lastSeen: window.end.toISOString(),
|
||||
p50RuntimeMs: nullableNumber(value(row, indexes, 'mean_ms')),
|
||||
p95RuntimeMs: nullableNumber(value(row, indexes, 'mean_ms')),
|
||||
errorRate: 0,
|
||||
rowsProduced: nullableInteger(value(row, indexes, 'rows_produced')),
|
||||
},
|
||||
topUsers: parseTopUsers(value(row, indexes, 'top_users')),
|
||||
});
|
||||
}
|
||||
}
|
||||
}
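
The probe above separates a hard misconfiguration (`track` set to `none`, reported as a warning) from a tuning hint (`max` below the recommended floor, reported as info). A minimal sketch of that split as a pure function follows. `classifyPgssSettings`, `ProbeFindings`, and the `recommendedMax` parameter are hypothetical names for illustration, and the value 5000 is a placeholder: the real value of `RECOMMENDED_PGSS_MAX` is not shown in this diff.

```typescript
// Hypothetical helper mirroring the warning/info split in the probe above.
interface ProbeFindings {
  warnings: string[];
  info: string[];
}

function classifyPgssSettings(
  track: string | null,
  pgssMax: number | null,
  recommendedMax: number,
): ProbeFindings {
  const warnings: string[] = [];
  const info: string[] = [];
  // track = none disables statement statistics collection entirely.
  if (track === 'none') {
    warnings.push('pg_stat_statements.track is none; set it to top or all');
  }
  // A null max means the probe could not read the setting, so nothing is reported.
  if (pgssMax !== null && pgssMax < recommendedMax) {
    info.push(`pg_stat_statements.max is ${pgssMax}; set it to at least ${recommendedMax}`);
  }
  return { warnings, info };
}

// 5000 is an assumed placeholder, not the project's constant.
const findings = classifyPgssSettings('none', 1000, 5000);
const clean = classifyPgssSettings('top', null, 5000);
console.log(findings.warnings.length, findings.info.length, clean.warnings.length);
```

Keeping the classification free of I/O would let it be unit-tested without a Postgres connection; whether the project factors it this way is not visible here.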
@@ -0,0 +1,457 @@
import { mkdir, mkdtemp, readFile, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import YAML from 'yaml';
import { describe, expect, it } from 'vitest';
import { projectHistoricSqlEvidence } from './projection.js';

async function tempWorkdir(): Promise<string> {
  return mkdtemp(join(tmpdir(), 'historic-sql-projection-'));
}

async function writeText(root: string, relPath: string, content: string): Promise<void> {
  const target = join(root, relPath);
  await mkdir(join(target, '..'), { recursive: true });
  await writeFile(target, content, 'utf-8');
}

async function writeJson(root: string, relPath: string, value: unknown): Promise<void> {
  await writeText(root, relPath, `${JSON.stringify(value, null, 2)}\n`);
}

describe('projectHistoricSqlEvidence', () => {
  it('merges table usage into matching _schema shards and preserves external usage keys', async () => {
    const workdir = await tempWorkdir();
    await writeText(
      workdir,
      'semantic-layer/warehouse/_schema/public.yaml',
      YAML.stringify({
        tables: {
          orders: {
            table: 'public.orders',
            usage: {
              narrative: 'Old generated usage.',
              frequencyTier: 'low',
              commonFilters: ['old_status'],
              commonJoins: [],
              ownerNote: 'keep me',
            },
            columns: [{ name: 'id', type: 'string' }],
          },
        },
      }),
    );
    await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/manifest.json', {
      source: 'historic-sql',
      connectionId: 'warehouse',
      dialect: 'postgres',
      fetchedAt: '2026-05-11T00:00:00.000Z',
      windowStart: '2026-02-10T00:00:00.000Z',
      windowEnd: '2026-05-11T00:00:00.000Z',
      snapshotRowCount: 1,
      touchedTableCount: 1,
      parseFailures: 0,
      warnings: [],
      probeWarnings: [],
      staleArchiveAfterDays: 90,
    });
    await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/tables/public.orders.json', { table: 'public.orders' });
    await writeJson(workdir, '.ktx/ingest-evidence/historic-sql/run-1/orders.json', {
      kind: 'table_usage',
      connectionId: 'warehouse',
      table: 'public.orders',
      rawPath: 'tables/public.orders.json',
      usage: {
        narrative: 'Orders are repeatedly queried for lifecycle analysis.',
        frequencyTier: 'high',
        commonFilters: ['status', 'created_at'],
        commonGroupBys: ['status'],
        commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
        staleSince: null,
      },
    });

    const result = await projectHistoricSqlEvidence({ workdir, connectionId: 'warehouse', syncId: 'sync-1', runId: 'run-1' });

    expect(result.touchedSources).toEqual([{ connectionId: 'warehouse', sourceName: 'orders' }]);
    expect(result.actions).toEqual(
      expect.arrayContaining([
        expect.objectContaining({
          target: 'sl',
          key: 'orders',
          rawPaths: ['tables/public.orders.json'],
        }),
      ]),
    );
    const shard = YAML.parse(await readFile(join(workdir, 'semantic-layer/warehouse/_schema/public.yaml'), 'utf-8'));
    expect(shard.tables.orders.usage).toEqual({
      ownerNote: 'keep me',
      narrative: 'Orders are repeatedly queried for lifecycle analysis.',
      frequencyTier: 'high',
      commonFilters: ['status', 'created_at'],
      commonGroupBys: ['status'],
      commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
      staleSince: null,
    });
  });

  it('writes pattern pages, reuses similar slugs, and marks missing old pattern pages stale', async () => {
    const workdir = await tempWorkdir();
    await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/manifest.json', {
      source: 'historic-sql',
      connectionId: 'warehouse',
      dialect: 'postgres',
      fetchedAt: '2026-05-11T00:00:00.000Z',
      windowStart: '2026-02-10T00:00:00.000Z',
      windowEnd: '2026-05-11T00:00:00.000Z',
      snapshotRowCount: 2,
      touchedTableCount: 2,
      parseFailures: 0,
      warnings: [],
      probeWarnings: [],
      staleArchiveAfterDays: 90,
    });
    await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/tables/public.orders.json', { table: 'public.orders' });
    await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/tables/public.customers.json', { table: 'public.customers' });
    await writeText(
      workdir,
      'wiki/global/historic-sql-old-order-lifecycle.md',
      [
        '---',
        YAML.stringify({
          summary: 'Old order lifecycle page',
          tags: ['historic-sql', 'pattern'],
          refs: [],
          sl_refs: ['orders'],
          usage_mode: 'auto',
          source: 'historic-sql',
          tables: ['public.orders', 'public.customers'],
          fingerprints: ['pg:1'],
        }).trimEnd(),
        '---',
        '',
        'Old body',
        '',
      ].join('\n'),
    );
    await writeText(
      workdir,
      'wiki/global/historic-sql-retired-pattern.md',
      [
        '---',
        YAML.stringify({
          summary: 'Retired pattern',
          tags: ['historic-sql', 'pattern'],
          refs: [],
          sl_refs: [],
          usage_mode: 'auto',
          source: 'historic-sql',
          tables: ['public.tickets'],
          fingerprints: ['pg:9'],
        }).trimEnd(),
        '---',
        '',
        'Retired body',
        '',
      ].join('\n'),
    );
    await writeJson(workdir, '.ktx/ingest-evidence/historic-sql/run-1/pattern.json', {
      kind: 'pattern',
      connectionId: 'warehouse',
      rawPath: 'patterns-input.json',
      pattern: {
        slug: 'order-lifecycle-analysis',
        title: 'Order Lifecycle Analysis',
        narrative: 'Analysts compare order status with customer segment.',
        definitionSql: 'select * from public.orders join public.customers on customers.id = orders.customer_id',
        tablesInvolved: ['public.orders', 'public.customers'],
        slRefs: ['orders', 'customers'],
        constituentTemplateIds: ['pg:1', 'pg:2'],
      },
    });

    const result = await projectHistoricSqlEvidence({ workdir, connectionId: 'warehouse', syncId: 'sync-1', runId: 'run-1' });

    expect(result.patternPagesWritten).toBe(1);
    expect(result.changedWikiPageKeys).toContain('historic-sql-old-order-lifecycle');
    expect(result.actions).toEqual(
      expect.arrayContaining([
        expect.objectContaining({
          target: 'wiki',
          key: 'historic-sql-old-order-lifecycle',
          rawPaths: ['patterns-input.json'],
        }),
      ]),
    );
    await expect(readFile(join(workdir, 'wiki/global/historic-sql-old-order-lifecycle.md'), 'utf-8')).resolves.toContain(
      'Order Lifecycle Analysis',
    );
    await expect(readFile(join(workdir, 'wiki/global/historic-sql-retired-pattern.md'), 'utf-8')).resolves.toContain(
      'stale_since: "2026-05-11T00:00:00.000Z"',
    );
  });

  it('rewrites a reappearing archived pattern at the flat slug', async () => {
    const workdir = await tempWorkdir();
    await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/manifest.json', {
      source: 'historic-sql',
      connectionId: 'warehouse',
      dialect: 'postgres',
      fetchedAt: '2026-05-11T00:00:00.000Z',
      windowStart: '2026-02-10T00:00:00.000Z',
      windowEnd: '2026-05-11T00:00:00.000Z',
      snapshotRowCount: 2,
      touchedTableCount: 2,
      parseFailures: 0,
      warnings: [],
      probeWarnings: [],
      staleArchiveAfterDays: 30,
    });
    await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/tables/public.orders.json', { table: 'public.orders' });
    await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/tables/public.customers.json', { table: 'public.customers' });
    await writeText(
      workdir,
      'wiki/global/historic-sql-order-lifecycle-analysis.md',
      [
        '---',
        YAML.stringify({
          summary: 'Archived order lifecycle page',
          tags: ['historic-sql', 'pattern', 'archived'],
          refs: [],
          sl_refs: ['orders'],
          usage_mode: 'auto',
          source: 'historic-sql',
          tables: ['public.orders', 'public.customers'],
          fingerprints: ['pg:1'],
          stale_since: '2026-01-01T00:00:00.000Z',
        }).trimEnd(),
        '---',
        '',
        'Archived body',
        '',
      ].join('\n'),
    );
    await writeJson(workdir, '.ktx/ingest-evidence/historic-sql/run-1/pattern.json', {
      kind: 'pattern',
      connectionId: 'warehouse',
      rawPath: 'patterns-input.json',
      pattern: {
        slug: 'order-lifecycle-analysis',
        title: 'Order Lifecycle Analysis',
        narrative: 'Analysts compare order status with customer segment again.',
        definitionSql: 'select * from public.orders join public.customers on customers.id = orders.customer_id',
        tablesInvolved: ['public.orders', 'public.customers'],
        slRefs: ['orders', 'customers'],
        constituentTemplateIds: ['pg:1', 'pg:2'],
      },
    });

    const result = await projectHistoricSqlEvidence({ workdir, connectionId: 'warehouse', syncId: 'sync-1', runId: 'run-1' });

    expect(result.patternPagesWritten).toBe(1);
    const page = await readFile(join(workdir, 'wiki/global/historic-sql-order-lifecycle-analysis.md'), 'utf-8');
    expect(page).toContain('Analysts compare order status with customer segment again.');
    expect(page).not.toContain('Archived body');
    expect(page).not.toContain('archived');
  });

  it('leaves already archived pattern pages stable when they are still absent', async () => {
    const workdir = await tempWorkdir();
    await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/manifest.json', {
      source: 'historic-sql',
      connectionId: 'warehouse',
      dialect: 'postgres',
      fetchedAt: '2026-05-11T00:00:00.000Z',
      windowStart: '2026-02-10T00:00:00.000Z',
      windowEnd: '2026-05-11T00:00:00.000Z',
      snapshotRowCount: 0,
      touchedTableCount: 0,
      parseFailures: 0,
      warnings: [],
      probeWarnings: [],
      staleArchiveAfterDays: 30,
    });
    await writeText(
      workdir,
      'wiki/global/historic-sql-retired-pattern.md',
      [
        '---',
        YAML.stringify({
          summary: 'Retired pattern',
          tags: ['historic-sql', 'pattern', 'archived'],
          refs: [],
          sl_refs: [],
          usage_mode: 'auto',
          source: 'historic-sql',
          tables: ['public.tickets'],
          fingerprints: ['pg:9'],
          stale_since: '2026-01-01T00:00:00.000Z',
        }).trimEnd(),
        '---',
        '',
        'Archived retired body',
        '',
      ].join('\n'),
    );

    const result = await projectHistoricSqlEvidence({ workdir, connectionId: 'warehouse', syncId: 'sync-1', runId: 'run-1' });

    expect(result.archivedPatternPages).toBe(0);
    expect(result.stalePatternPagesMarked).toBe(0);
    await expect(readFile(join(workdir, 'wiki/global/historic-sql-retired-pattern.md'), 'utf-8')).resolves.toContain(
      'Archived retired body',
    );
  });

  it('marks missing table usage stale without deleting old query pages', async () => {
    const workdir = await tempWorkdir();
    await writeText(
      workdir,
      'semantic-layer/warehouse/_schema/public.yaml',
      YAML.stringify({
        tables: {
          orders: {
            table: 'public.orders',
            usage: {
              narrative: 'Orders were active before.',
              frequencyTier: 'high',
              commonFilters: ['status'],
              commonGroupBys: ['status'],
              commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
              ownerNote: 'keep analyst annotation',
            },
            columns: [{ name: 'id', type: 'string' }],
          },
        },
      }),
    );
    await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/manifest.json', {
      source: 'historic-sql',
      connectionId: 'warehouse',
      dialect: 'postgres',
      fetchedAt: '2026-05-11T00:00:00.000Z',
      windowStart: '2026-02-10T00:00:00.000Z',
      windowEnd: '2026-05-11T00:00:00.000Z',
      snapshotRowCount: 0,
      touchedTableCount: 0,
      parseFailures: 0,
      warnings: [],
      probeWarnings: [],
      staleArchiveAfterDays: 90,
    });
    await writeJson(workdir, '.ktx/ingest-evidence/historic-sql/run-1/customers.json', {
      kind: 'table_usage',
      connectionId: 'warehouse',
      table: 'public.customers',
      rawPath: 'tables/public.customers.json',
      usage: {
        narrative: 'Customers were queried.',
        frequencyTier: 'low',
        commonFilters: [],
        commonJoins: [],
        staleSince: null,
      },
    });
    await writeText(
      workdir,
      'wiki/global/historic-sql-old-template.md',
      [
        '---',
        YAML.stringify({
          summary: 'Old template page',
          tags: ['historic-sql', 'query-pattern'],
          refs: [],
          sl_refs: ['orders'],
          usage_mode: 'auto',
          source: 'historic-sql',
          tables: ['public.orders'],
          fingerprints: ['old:1'],
        }).trimEnd(),
        '---',
        '',
        'Old body',
        '',
      ].join('\n'),
    );

    const result = await projectHistoricSqlEvidence({ workdir, connectionId: 'warehouse', syncId: 'sync-1', runId: 'run-1' });

    expect(result.staleTablesMarked).toBe(1);
    expect(result.touchedSources).toEqual([{ connectionId: 'warehouse', sourceName: 'orders' }]);
    const staleAction = result.actions.find((action) => action.target === 'sl' && action.key === 'orders');
    expect(staleAction).toEqual(expect.objectContaining({ target: 'sl', key: 'orders' }));
    expect(staleAction?.rawPaths).toBeUndefined();
    const shard = YAML.parse(await readFile(join(workdir, 'semantic-layer/warehouse/_schema/public.yaml'), 'utf-8'));
    expect(shard.tables.orders.usage).toEqual({
      ownerNote: 'keep analyst annotation',
      narrative: 'No recent historic SQL usage was observed in the latest snapshot.',
      frequencyTier: 'unused',
      commonFilters: [],
      commonGroupBys: [],
      commonJoins: [],
      staleSince: '2026-05-11T00:00:00.000Z',
    });
    await expect(readFile(join(workdir, 'wiki/global/historic-sql-old-template.md'), 'utf-8')).resolves.toContain(
      'Old body',
    );
  });

  it('does not mark stale or archive pages when override replay has no current-run evidence', async () => {
    const workdir = await tempWorkdir();
    await writeText(
      workdir,
      'semantic-layer/warehouse/_schema/public.yaml',
      YAML.stringify({
        tables: {
          orders: {
            table: 'public.orders',
            usage: {
              narrative: 'Orders were active before.',
              frequencyTier: 'high',
              commonFilters: ['status'],
              commonGroupBys: ['status'],
              commonJoins: [],
            },
            columns: [{ name: 'id', type: 'string' }],
          },
        },
      }),
    );
    await writeJson(workdir, 'raw-sources/warehouse/historic-sql/override-sync/manifest.json', {
      source: 'historic-sql',
      connectionId: 'warehouse',
      dialect: 'postgres',
      fetchedAt: '2026-05-11T00:00:00.000Z',
      windowStart: '2026-02-10T00:00:00.000Z',
      windowEnd: '2026-05-11T00:00:00.000Z',
      snapshotRowCount: 0,
      touchedTableCount: 0,
      parseFailures: 0,
      warnings: [],
      probeWarnings: [],
      staleArchiveAfterDays: 90,
    });

    const result = await projectHistoricSqlEvidence({
      workdir,
      connectionId: 'warehouse',
      syncId: 'override-sync',
      runId: 'override-run',
      overrideReplay: {
        priorJobId: 'prior-job',
        priorRunId: 'prior-run',
        priorSyncId: 'prior-sync',
        evictionRawPaths: ['tables/public/orders.json'],
      },
    });

    expect(result.tableUsageMerged).toBe(0);
    expect(result.staleTablesMarked).toBe(0);
    expect(result.patternPagesWritten).toBe(0);
    expect(result.stalePatternPagesMarked).toBe(0);
    expect(result.archivedPatternPages).toBe(0);
    expect(result.touchedSources).toEqual([]);
    expect(result.changedWikiPageKeys).toEqual([]);
    expect(result.actions).toEqual([]);
  });
});
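
The second test expects the incoming pattern (slug `order-lifecycle-analysis`) to land on the existing key `historic-sql-old-order-lifecycle` while the retired page is left to go stale. That outcome follows from the overlap check in `projection.ts`, which reuses a page when at least 60% of the incoming signals (tables plus template ids) also appear on the existing page. The sketch below reproduces that arithmetic with the fixture values from the test; `REUSE_THRESHOLD` is a name introduced here for the literal `0.6` in the source.

```typescript
// Fraction of `left` entries that also occur in `right` (0 when `left` is empty).
function overlapRatio(left: string[], right: string[]): number {
  const rightSet = new Set(right);
  const intersection = left.filter((value) => rightSet.has(value)).length;
  return left.length === 0 ? 0 : intersection / left.length;
}

// Name introduced for this sketch; the source compares against the literal 0.6.
const REUSE_THRESHOLD = 0.6;

// Incoming evidence: tablesInvolved followed by constituentTemplateIds.
const incoming = ['public.orders', 'public.customers', 'pg:1', 'pg:2'];
// Existing pages: frontmatter `tables` followed by `fingerprints`.
const oldLifecyclePage = ['public.orders', 'public.customers', 'pg:1'];
const retiredPage = ['public.tickets', 'pg:9'];

const oldRatio = overlapRatio(incoming, oldLifecyclePage); // 3 of 4 signals match
const retiredRatio = overlapRatio(incoming, retiredPage); // no signals match

console.log(oldRatio >= REUSE_THRESHOLD, retiredRatio >= REUSE_THRESHOLD);
```

The ratio is asymmetric: it is normalized by the number of incoming signals, so a small incoming pattern can match a much larger existing page.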
@@ -0,0 +1,385 @@
import { access, mkdir, readdir, readFile, rename, writeFile } from 'node:fs/promises';
import { dirname, join, relative } from 'node:path';
import YAML from 'yaml';
import type { MemoryAction } from '../../../../context/memory/types.js';
import { rawSourcesDirForSync } from '../../raw-sources-paths.js';
import type { FinalizationOverrideReplay } from '../../types.js';
import { mergeUsagePreservingExternal } from '../live-database/manifest.js';
import { historicSqlEvidenceEnvelopeSchema, type HistoricSqlEvidenceEnvelope } from './evidence.js';
import type { TableUsageOutput } from './skill-schemas.js';
import { stagedManifestSchema } from './types.js';

export interface HistoricSqlProjectionInput {
  workdir: string;
  connectionId: string;
  syncId: string;
  runId: string;
  overrideReplay?: FinalizationOverrideReplay;
}

export interface HistoricSqlProjectionResult {
  tableUsageMerged: number;
  staleTablesMarked: number;
  patternPagesWritten: number;
  stalePatternPagesMarked: number;
  archivedPatternPages: number;
  touchedSources: Array<{ connectionId: string; sourceName: string }>;
  changedWikiPageKeys: string[];
  actions: MemoryAction[];
  warnings: string[];
}

interface ManifestShard {
  tables?: Record<string, { table?: string; usage?: Record<string, unknown>; columns?: unknown[]; [key: string]: unknown }>;
}

interface HistoricSqlPatternPage {
  key: string;
  path: string;
  frontmatter: Record<string, unknown>;
  content: string;
}

function safeKnowledgeSlug(value: string): string {
  return value.toLowerCase().replace(/[^a-z0-9_-]+/g, '-').replace(/^-+|-+$/g, '');
}

async function pathExists(path: string): Promise<boolean> {
  try {
    await access(path);
    return true;
  } catch {
    return false;
  }
}

async function walkFiles(root: string): Promise<string[]> {
  if (!(await pathExists(root))) return [];
  const result: string[] = [];
  async function visit(dir: string): Promise<void> {
    const entries = await readdir(dir, { withFileTypes: true });
    for (const entry of entries) {
      const absolute = join(dir, entry.name);
      if (entry.isDirectory()) {
        await visit(absolute);
      } else if (entry.isFile()) {
        result.push(relative(root, absolute).replace(/\\/g, '/'));
      }
    }
  }
  await visit(root);
  return result.sort();
}

async function readJson(path: string): Promise<unknown> {
  return JSON.parse(await readFile(path, 'utf-8')) as unknown;
}

async function writeYamlAtomic(path: string, value: unknown): Promise<void> {
  await mkdir(dirname(path), { recursive: true });
  const tmp = `${path}.tmp`;
  await writeFile(tmp, YAML.stringify(value, { indent: 2, lineWidth: 0, version: '1.1' }), 'utf-8');
  await rename(tmp, path);
}

function tableSourceName(tableRef: string): string {
  return tableRef.split('.').filter(Boolean).at(-1) ?? tableRef;
}

function staleUsage(fetchedAt: string) {
  return {
    narrative: 'No recent historic SQL usage was observed in the latest snapshot.',
    frequencyTier: 'unused' as const,
    commonFilters: [],
    commonGroupBys: [],
    commonJoins: [],
    staleSince: fetchedAt,
  };
}

async function loadEvidence(workdir: string, runId: string): Promise<HistoricSqlEvidenceEnvelope[]> {
  const root = join(workdir, '.ktx/ingest-evidence/historic-sql', runId);
  const files = await walkFiles(root);
  const evidence: HistoricSqlEvidenceEnvelope[] = [];
  for (const file of files.filter((candidate) => candidate.endsWith('.json'))) {
    evidence.push(historicSqlEvidenceEnvelopeSchema.parse(await readJson(join(root, file))));
  }
  return evidence;
}

function renderPatternMarkdown(pattern: HistoricSqlEvidenceEnvelope & { kind: 'pattern' }): string {
  return [
    `# ${pattern.pattern.title}`,
    '',
    pattern.pattern.narrative,
    '',
    '## Representative SQL',
    '',
    '```sql',
    pattern.pattern.definitionSql,
    '```',
    '',
    '## Tables',
    '',
    ...pattern.pattern.tablesInvolved.map((table) => `- ${table}`),
    '',
    '## Constituent Templates',
    '',
    ...pattern.pattern.constituentTemplateIds.map((id) => `- ${id}`),
    '',
  ].join('\n');
}

function overlapRatio(left: string[], right: string[]): number {
  const rightSet = new Set(right);
  const intersection = left.filter((value) => rightSet.has(value)).length;
  return left.length === 0 ? 0 : intersection / left.length;
}

function parseMarkdownPage(key: string, path: string, raw: string): HistoricSqlPatternPage | null {
  const match = raw.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!match) return null;
  return {
    key,
    path,
    frontmatter: (YAML.parse(match[1] ?? '') ?? {}) as Record<string, unknown>,
    content: match[2] ?? '',
  };
}

function isHistoricPatternPage(page: HistoricSqlPatternPage): boolean {
  const tags = Array.isArray(page.frontmatter.tags) ? page.frontmatter.tags : [];
  return (
    page.frontmatter.source === 'historic-sql' &&
    tags.includes('historic-sql') &&
    tags.includes('pattern')
  );
}

function isArchivedPatternPage(page: HistoricSqlPatternPage): boolean {
  const tags = Array.isArray(page.frontmatter.tags) ? page.frontmatter.tags : [];
  return tags.includes('archived');
}

function stringArray(value: unknown): string[] {
  return Array.isArray(value) ? value.filter((entry): entry is string => typeof entry === 'string') : [];
}

function renderMarkdownPage(frontmatter: Record<string, unknown>, content: string): string {
  let yaml = YAML.stringify(frontmatter, { indent: 2, lineWidth: 0 }).trimEnd();
  const staleSince = frontmatter.stale_since;
  if (typeof staleSince === 'string') {
    yaml = yaml.replace(`stale_since: ${staleSince}`, `stale_since: "${staleSince}"`);
  }
  return `---\n${yaml}\n---\n\n${content.trim()}\n`;
}

function existingPageSignals(page: HistoricSqlPatternPage): string[] {
  return [...stringArray(page.frontmatter.tables), ...stringArray(page.frontmatter.fingerprints)];
}

function shouldArchive(staleSince: unknown, fetchedAt: string, days: number): boolean {
  if (typeof staleSince !== 'string') return false;
  const staleTime = Date.parse(staleSince);
  const fetchedTime = Date.parse(fetchedAt);
  if (!Number.isFinite(staleTime) || !Number.isFinite(fetchedTime)) return false;
  return fetchedTime - staleTime > days * 24 * 60 * 60 * 1000;
}

async function loadPatternPages(root: string): Promise<HistoricSqlPatternPage[]> {
  const files = await walkFiles(root);
  const pages: HistoricSqlPatternPage[] = [];
  for (const file of files.filter((candidate) => candidate.endsWith('.md'))) {
    if (file.includes('/')) {
      continue;
    }
    const key = file.replace(/\.md$/, '');
    const path = join(root, file);
    const page = parseMarkdownPage(key, path, await readFile(path, 'utf-8'));
    if (page) {
      pages.push(page);
    }
  }
  return pages;
}

function historicSqlFlatKey(slug: string): string {
  return `historic-sql-${safeKnowledgeSlug(slug)}`;
}

async function currentStagedTables(rawDir: string): Promise<Set<string>> {
  const tablesRoot = join(rawDir, 'tables');
  const files = await walkFiles(tablesRoot);
  const tables = new Set<string>();
  for (const file of files.filter((candidate) => candidate.endsWith('.json'))) {
    const value = await readJson(join(tablesRoot, file));
    if (typeof value === 'object' && value !== null && 'table' in value && typeof value.table === 'string') {
      tables.add(value.table);
    }
  }
  return tables;
}

export async function projectHistoricSqlEvidence(input: HistoricSqlProjectionInput): Promise<HistoricSqlProjectionResult> {
  const result: HistoricSqlProjectionResult = {
    tableUsageMerged: 0,
    staleTablesMarked: 0,
    patternPagesWritten: 0,
    stalePatternPagesMarked: 0,
    archivedPatternPages: 0,
    touchedSources: [],
    changedWikiPageKeys: [],
    actions: [],
    warnings: [],
  };
  const touchedKeys = new Set<string>();
  const rawDir = join(input.workdir, rawSourcesDirForSync(input.connectionId, 'historic-sql', input.syncId));
  const manifest = stagedManifestSchema.parse(await readJson(join(rawDir, 'manifest.json')));
  const currentTables = await currentStagedTables(rawDir);
  const evidence = await loadEvidence(input.workdir, input.runId);
  if (input.overrideReplay && evidence.length === 0) {
    result.warnings.push(
      'historic-sql finalization skipped stale/archive cleanup during override replay without current-run evidence',
    );
    return result;
  }
  if (evidence.length === 0) {
    result.warnings.push('historic-sql finalization skipped because no current-run evidence was emitted');
    return result;
  }
  const tableEvidence = evidence.filter((entry): entry is HistoricSqlEvidenceEnvelope & { kind: 'table_usage' } => entry.kind === 'table_usage');
  const patternEvidence = evidence.filter((entry): entry is HistoricSqlEvidenceEnvelope & { kind: 'pattern' } => entry.kind === 'pattern');

  const schemaRoot = join(input.workdir, 'semantic-layer', input.connectionId, '_schema');
  for (const file of (await walkFiles(schemaRoot)).filter((candidate) => candidate.endsWith('.yaml') || candidate.endsWith('.yml'))) {
    const path = join(schemaRoot, file);
    const before = await readFile(path, 'utf-8');
    const shard = (YAML.parse(before) ?? {}) as ManifestShard;
    if (!shard.tables) continue;
    for (const [tableName, entry] of Object.entries(shard.tables)) {
      const tableRef = entry.table ?? tableName;
      const matchingEvidence = tableEvidence.find(
        (candidate) => candidate.table === tableRef || tableSourceName(candidate.table) === tableName,
      );
      if (matchingEvidence) {
        const merged = mergeUsagePreservingExternal(entry.usage as TableUsageOutput | undefined, matchingEvidence.usage);
        if (JSON.stringify(entry.usage ?? null) !== JSON.stringify(merged ?? null)) {
          entry.usage = merged as Record<string, unknown>;
          result.tableUsageMerged += 1;
          const sourceName = tableSourceName(matchingEvidence.table);
          const key = `${input.connectionId}:${sourceName}`;
          if (!touchedKeys.has(key)) {
            touchedKeys.add(key);
            result.touchedSources.push({ connectionId: input.connectionId, sourceName });
          }
          result.actions.push({
            target: 'sl',
            type: 'updated',
            key: sourceName,
            targetConnectionId: input.connectionId,
            detail: `Merged historic-SQL usage for ${matchingEvidence.table}`,
            rawPaths: [matchingEvidence.rawPath],
          });
        }
      } else if (entry.usage && !currentTables.has(tableRef)) {
        const merged = mergeUsagePreservingExternal(entry.usage as TableUsageOutput | undefined, staleUsage(manifest.fetchedAt));
        if (JSON.stringify(entry.usage ?? null) !== JSON.stringify(merged ?? null)) {
          entry.usage = merged as Record<string, unknown>;
          result.staleTablesMarked += 1;
          const sourceName = tableSourceName(tableRef);
          const key = `${input.connectionId}:${sourceName}`;
          if (!touchedKeys.has(key)) {
            touchedKeys.add(key);
            result.touchedSources.push({ connectionId: input.connectionId, sourceName });
          }
          result.actions.push({
            target: 'sl',
            type: 'updated',
            key: sourceName,
            targetConnectionId: input.connectionId,
            detail: `Marked historic-SQL usage stale for ${tableRef}`,
          });
        }
      }
    }
    const after = YAML.stringify(shard, { indent: 2, lineWidth: 0, version: '1.1' });
    if (after !== before) {
      await writeYamlAtomic(path, shard);
    }
  }

  const wikiRoot = join(input.workdir, 'wiki/global');
  await mkdir(wikiRoot, { recursive: true });
  const allPages = await loadPatternPages(wikiRoot);
  const activePages = allPages.filter((page) => !isArchivedPatternPage(page));
  const patternPages = activePages.filter(isHistoricPatternPage);
  const writtenKeys = new Set<string>();

  for (const pattern of patternEvidence) {
    const incomingSignals = [...pattern.pattern.tablesInvolved, ...pattern.pattern.constituentTemplateIds];
    const reusable = patternPages.find((page) => overlapRatio(incomingSignals, existingPageSignals(page)) >= 0.6);
    const key = reusable?.key ?? historicSqlFlatKey(pattern.pattern.slug);
    const pagePath = join(wikiRoot, `${key}.md`);
    const frontmatter = {
      summary: pattern.pattern.title,
      tags: ['historic-sql', 'pattern'],
      refs: [],
      sl_refs: pattern.pattern.slRefs,
      usage_mode: 'auto',
      source: 'historic-sql',
      tables: pattern.pattern.tablesInvolved,
      representative_sql: pattern.pattern.definitionSql,
      fingerprints: pattern.pattern.constituentTemplateIds,
    };
    await mkdir(dirname(pagePath), { recursive: true });
    await writeFile(pagePath, renderMarkdownPage(frontmatter, renderPatternMarkdown(pattern)), 'utf-8');
    writtenKeys.add(key);
    result.patternPagesWritten += 1;
    result.changedWikiPageKeys.push(key);
    result.actions.push({
      target: 'wiki',
      type: reusable ? 'updated' : 'created',
      key,
      detail: `Projected historic-SQL pattern ${pattern.pattern.title}`,
      rawPaths: [pattern.rawPath],
    });
  }

  for (const page of patternPages) {
    if (writtenKeys.has(page.key)) continue;
    if (shouldArchive(page.frontmatter.stale_since, manifest.fetchedAt, manifest.staleArchiveAfterDays)) {
      const tags = [...new Set([...stringArray(page.frontmatter.tags), 'archived'])];
      await writeFile(
        page.path,
        renderMarkdownPage({ ...page.frontmatter, tags, archived_since: manifest.fetchedAt }, page.content),
        'utf-8',
      );
      result.archivedPatternPages += 1;
      result.changedWikiPageKeys.push(page.key);
      result.actions.push({
        target: 'wiki',
        type: 'updated',
        key: page.key,
        detail: `Archived stale historic-SQL pattern page ${page.key}`,
      });
      continue;
    }
    const tags = [...new Set([...stringArray(page.frontmatter.tags), 'stale'])];
    await writeFile(
      page.path,
      renderMarkdownPage({ ...page.frontmatter, tags, stale_since: manifest.fetchedAt }, page.content),
      'utf-8',
    );
    result.stalePatternPagesMarked += 1;
    result.changedWikiPageKeys.push(page.key);
    result.actions.push({
      target: 'wiki',
      type: 'updated',
      key: page.key,
      detail: `Marked historic-SQL pattern page ${page.key} stale`,
    });
  }

  result.changedWikiPageKeys = [...new Set(result.changedWikiPageKeys)].sort();
  return result;
}
|
||||
|
|
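The projection loop above reuses an existing wiki page when `overlapRatio(incomingSignals, existingPageSignals(page)) >= 0.6`. The definition of `overlapRatio` is not part of this diff, so the following is only a sketch of one plausible reading (shared signals divided by the size of the smaller set) to illustrate how the 0.6 threshold behaves. `overlapRatioSketch` and the sample signal lists are hypothetical, not the repository's code.

```typescript
// Hypothetical helper: the real `overlapRatio` is not shown in this diff.
// Assumption: overlap = |A ∩ B| / min(|A|, |B|), with duplicates ignored.
function overlapRatioSketch(incoming: readonly string[], existing: readonly string[]): number {
  const a = new Set(incoming);
  const b = new Set(existing);
  if (a.size === 0 || b.size === 0) {
    return 0;
  }
  let shared = 0;
  a.forEach((signal) => {
    if (b.has(signal)) shared += 1;
  });
  return shared / Math.min(a.size, b.size);
}

// Signals are table names plus template ids, as in `incomingSignals` above.
const incoming = ['public.orders', 'public.customers', 'template_1', 'template_2'];
const existingPage = ['public.orders', 'public.customers', 'template_1', 'template_9'];

const ratio = overlapRatioSketch(incoming, existingPage); // 3 shared signals out of 4
const reusePage = ratio >= 0.6; // same threshold as the projection loop
```

Under this assumed definition, a page that shares three of four signals would be updated in place rather than duplicated under a new key.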
@@ -0,0 +1,36 @@
import { describe, expect, it } from 'vitest';
import { compileHistoricSqlRedactionPatterns, redactHistoricSqlText } from './redaction.js';

describe('historic-SQL redaction', () => {
  it('redacts regex matches and supports the (?i) case-insensitive prefix', () => {
    const redactors = compileHistoricSqlRedactionPatterns([
      'sk_live_[A-Za-z0-9]+',
      '(?i)secret_token_[a-z0-9]+',
    ]);

    const sql =
      "select * from public.api_events where api_key = 'sk_live_abc123' and note = 'Secret_Token_9f'"; // pragma: allowlist secret

    expect(redactHistoricSqlText(sql, redactors)).toBe(
      "select * from public.api_events where api_key = '[REDACTED]' and note = '[REDACTED]'",
    );
  });

  it('returns the original SQL text when no redaction patterns are configured', () => {
    const sql = "select * from public.orders where status = 'paid'";

    expect(redactHistoricSqlText(sql, compileHistoricSqlRedactionPatterns([]))).toBe(sql);
  });

  it('throws a config-focused error for invalid redaction regex patterns', () => {
    expect(() => compileHistoricSqlRedactionPatterns(['[broken'])).toThrow(
      'Invalid historicSql.redactionPatterns entry "[broken"',
    );
  });

  it('throws a config-focused error for empty redaction regex patterns', () => {
    expect(() => compileHistoricSqlRedactionPatterns([' '])).toThrow(
      'Invalid historicSql.redactionPatterns entry " "',
    );
  });
});
@@ -0,0 +1,37 @@
export interface HistoricSqlRedactionPattern {
  pattern: string;
  expression: RegExp;
}

const CASE_INSENSITIVE_PREFIX = '(?i)';
const REDACTION_TOKEN = '[REDACTED]';

export function compileHistoricSqlRedactionPatterns(patterns: readonly string[]): HistoricSqlRedactionPattern[] {
  return patterns.map((pattern) => {
    const trimmed = pattern.trim();
    const caseInsensitive = trimmed.startsWith(CASE_INSENSITIVE_PREFIX);
    const source = caseInsensitive ? trimmed.slice(CASE_INSENSITIVE_PREFIX.length) : trimmed;
    if (source.length === 0) {
      throw new Error(`Invalid historicSql.redactionPatterns entry "${pattern}": pattern must not be empty`);
    }

    try {
      return {
        pattern,
        expression: new RegExp(source, caseInsensitive ? 'gi' : 'g'),
      };
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      throw new Error(`Invalid historicSql.redactionPatterns entry "${pattern}": ${reason}`);
    }
  });
}

export function redactHistoricSqlText(text: string, redactors: readonly HistoricSqlRedactionPattern[]): string {
  let next = text;
  for (const redactor of redactors) {
    redactor.expression.lastIndex = 0;
    next = next.replace(redactor.expression, REDACTION_TOKEN);
  }
  return next;
}
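`redactHistoricSqlText` resets `lastIndex` on each compiled expression before using it. The sketch below illustrates why that kind of reset is a reasonable precaution for a RegExp compiled with the `g` flag and shared between calls: such an object carries its search position from one call to the next. The pattern and sample strings are made up for illustration.

```typescript
// A global (`g`) RegExp stores its search position in `lastIndex` between calls.
const shared = /secret_[a-z0-9]+/g;

const firstTest = shared.test('token secret_abc'); // match found; lastIndex moves past the match
const indexAfterFirst = shared.lastIndex;
const secondTest = shared.test('token secret_abc'); // search resumes at lastIndex, so nothing is found

// Resetting before reuse removes any dependence on what earlier callers did.
shared.lastIndex = 0;
const redacted = 'a secret_abc and secret_def'.replace(shared, '[REDACTED]');
```

`String.prototype.replace` with a global expression scans from the start of the string on its own, so the explicit reset in the redaction helper reads as a defensive measure in case the same expression is also used with `test` or `exec` elsewhere.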
@@ -0,0 +1,74 @@
import { describe, expect, it } from 'vitest';
import { z } from 'zod';
import {
  patternOutputSchema,
  patternsArraySchema,
  tableUsageOutputSchema,
} from './skill-schemas.js';

describe('historic-sql skill schemas', () => {
  it('accepts table usage output and preserves future keys', () => {
    const parsed = tableUsageOutputSchema.parse({
      narrative: 'Orders are queried for paid/refunded lifecycle analysis.',
      frequencyTier: 'high',
      commonFilters: ['status', 'created_at'],
      commonGroupBys: ['status'],
      commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
      staleSince: null,
      analystNote: 'preserve me',
    });

    expect(parsed).toMatchObject({
      narrative: 'Orders are queried for paid/refunded lifecycle analysis.',
      frequencyTier: 'high',
      commonFilters: ['status', 'created_at'],
      commonGroupBys: ['status'],
      commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
      staleSince: null,
      analystNote: 'preserve me',
    });
  });

  it('rejects invalid frequency tiers', () => {
    const result = tableUsageOutputSchema.safeParse({
      narrative: 'Orders are queried often.',
      frequencyTier: 'sometimes',
      commonFilters: [],
      commonJoins: [],
    });

    expect(result.success).toBe(false);
  });

  it('accepts pattern outputs used for wiki projection', () => {
    const parsed = patternsArraySchema.parse([
      {
        slug: 'order-lifecycle-analysis',
        title: 'Order Lifecycle Analysis',
        narrative: 'Teams inspect order status by customer and month.',
        definitionSql: 'select status, count(*) from public.orders group by status',
        tablesInvolved: ['public.orders', 'public.customers'],
        slRefs: ['orders', 'customers'],
        constituentTemplateIds: ['template_1', 'template_2'],
      },
    ]);

    expect(parsed[0]).toEqual({
      slug: 'order-lifecycle-analysis',
      title: 'Order Lifecycle Analysis',
      narrative: 'Teams inspect order status by customer and month.',
      definitionSql: 'select status, count(*) from public.orders group by status',
      tablesInvolved: ['public.orders', 'public.customers'],
      slRefs: ['orders', 'customers'],
      constituentTemplateIds: ['template_1', 'template_2'],
    });
  });

  it('exports zod schemas that can produce JSON schema for prompt prefixes', () => {
    const tableUsageJsonSchema = z.toJSONSchema(tableUsageOutputSchema);
    const patternJsonSchema = z.toJSONSchema(patternOutputSchema);

    expect(tableUsageJsonSchema).toMatchObject({ type: 'object' });
    expect(patternJsonSchema).toMatchObject({ type: 'object' });
  });
});
@@ -0,0 +1,31 @@
import { z } from 'zod';

export const tableUsageOutputSchema = z
  .object({
    narrative: z.string(),
    frequencyTier: z.enum(['high', 'mid', 'low', 'unused']),
    commonFilters: z.array(z.string()),
    commonGroupBys: z.array(z.string()).optional(),
    commonJoins: z.array(
      z.object({
        table: z.string(),
        on: z.array(z.string()),
      }),
    ),
    staleSince: z.iso.datetime().nullable().optional(),
  })
  .passthrough();
export type TableUsageOutput = z.infer<typeof tableUsageOutputSchema>;

export const patternOutputSchema = z.object({
  slug: z.string(),
  title: z.string(),
  narrative: z.string(),
  definitionSql: z.string(),
  tablesInvolved: z.array(z.string()),
  slRefs: z.array(z.string()),
  constituentTemplateIds: z.array(z.string()),
});

/** @internal */
export const patternsArraySchema = z.array(patternOutputSchema);
@@ -0,0 +1,148 @@
import { describe, expect, it, vi } from 'vitest';
import { HistoricSqlGrantsMissingError } from './errors.js';
import { SnowflakeHistoricSqlQueryHistoryReader } from './snowflake-query-history-reader.js';

interface FakeQueryResult {
  headers: string[];
  rows: unknown[][];
  totalRows: number;
  error?: string;
}

function queryClient(results: FakeQueryResult[]) {
  const executeQuery = vi.fn(async (_query: string) => {
    const next = results.shift();
    if (!next) {
      throw new Error('unexpected query');
    }
    return next;
  });
  return { executeQuery };
}

function firstQuery(client: ReturnType<typeof queryClient>): string {
  const call = client.executeQuery.mock.calls[0];
  if (!call) {
    throw new Error('expected query client to be called');
  }
  return call[0];
}

describe('SnowflakeHistoricSqlQueryHistoryReader', () => {
  it('probes SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY', async () => {
    const client = queryClient([{ headers: ['1'], rows: [[1]], totalRows: 1 }]);
    const reader = new SnowflakeHistoricSqlQueryHistoryReader();

    await expect(reader.probe(client)).resolves.toEqual({ warnings: [], info: [] });

    expect(client.executeQuery).toHaveBeenCalledWith(
      'SELECT 1 FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY LIMIT 1',
    );
  });

  it('turns probe result errors into HistoricSqlGrantsMissingError', async () => {
    const client = queryClient([{ headers: [], rows: [], totalRows: 0, error: 'Object does not exist or not authorized' }]);
    const reader = new SnowflakeHistoricSqlQueryHistoryReader();

    await expect(reader.probe(client)).rejects.toMatchObject({
      name: 'HistoricSqlGrantsMissingError',
      dialect: 'snowflake',
      remediation: 'GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE <connection role>;',
    });
  });

  it('turns thrown probe failures into HistoricSqlGrantsMissingError', async () => {
    const client = {
      executeQuery: vi.fn(async () => {
        throw new Error('permission denied');
      }),
    };
    const reader = new SnowflakeHistoricSqlQueryHistoryReader();

    await expect(reader.probe(client)).rejects.toBeInstanceOf(HistoricSqlGrantsMissingError);
  });

  it('fetches aggregated Snowflake query templates', async () => {
    const client = queryClient([
      {
        headers: [
          'template_id',
          'canonical_sql',
          'executions',
          'distinct_users',
          'first_seen',
          'last_seen',
          'p50_ms',
          'p95_ms',
          'error_rate',
          'rows_produced',
          'top_users',
        ],
        rows: [
          [
            'hash-1',
            'select status from orders',
            42,
            3,
            '2026-05-01T00:00:00.000Z',
            '2026-05-11T00:00:00.000Z',
            12,
            40,
            0.05,
            100,
            JSON.stringify([{ user: 'ANALYST', executions: 1 }]),
          ],
        ],
        totalRows: 1,
      },
    ]);
    const reader = new SnowflakeHistoricSqlQueryHistoryReader();

    const rows = [];
    for await (const row of reader.fetchAggregated(
      client,
      { start: new Date('2026-02-10T00:00:00.000Z'), end: new Date('2026-05-11T00:00:00.000Z') },
      { dialect: 'snowflake', minExecutions: 5, windowDays: 90, enabledTables: [], filters: { dropTrivialProbes: true }, redactionPatterns: [], staleArchiveAfterDays: 90 },
    )) {
      rows.push(row);
    }

    const sql = firstQuery(client);
    expect(sql).toContain('SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY');
    expect(sql).toContain('COUNT(*) AS executions');
    expect(sql).toContain('GROUP BY query_hash');
    expect(sql).toContain('HAVING COUNT(*) >= 5');
    expect(rows).toMatchObject([
      {
        templateId: 'hash-1',
        stats: {
          executions: 42,
          errorRate: 0.05,
        },
        topUsers: [{ user: 'ANALYST', executions: 1 }],
      },
    ]);
  });

  it('throws a clear error when the query client cannot execute SQL', async () => {
    const reader = new SnowflakeHistoricSqlQueryHistoryReader();

    await expect(async () => {
      for await (const _row of reader.fetchAggregated(
        {},
        { start: new Date(), end: new Date() },
        {
          dialect: 'snowflake',
          minExecutions: 5,
          windowDays: 90,
          enabledTables: [],
          filters: { dropTrivialProbes: true },
          redactionPatterns: [],
          staleArchiveAfterDays: 90,
        },
      )) {
        throw new Error('unreachable');
      }
    }).rejects.toThrow('Historic SQL Snowflake reader requires a query client with executeQuery(query)');
  });
});
@@ -0,0 +1,220 @@
import { HistoricSqlGrantsMissingError } from './errors.js';
import {
  aggregatedTemplateSchema,
  type AggregatedTemplate,
  type HistoricSqlTimeWindow,
  type HistoricSqlUnifiedPullConfig,
} from './types.js';

interface QueryResultLike {
  headers: string[];
  rows: unknown[][];
  totalRows: number;
  error?: string;
}

interface QueryClientLike {
  executeQuery(query: string): Promise<QueryResultLike>;
}

const PROBE_SQL = 'SELECT 1 FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY LIMIT 1';

const SNOWFLAKE_GRANTS_REMEDIATION =
  'GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE <connection role>;';

function queryClient(client: unknown): QueryClientLike {
  if (
    client &&
    typeof client === 'object' &&
    'executeQuery' in client &&
    typeof (client as { executeQuery?: unknown }).executeQuery === 'function'
  ) {
    return client as QueryClientLike;
  }
  throw new Error('Historic SQL Snowflake reader requires a query client with executeQuery(query)');
}

function grantsError(cause: unknown): HistoricSqlGrantsMissingError {
  const message =
    cause instanceof Error
      ? cause.message
      : typeof cause === 'string'
        ? cause
        : 'Snowflake role cannot query SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY.';
  return new HistoricSqlGrantsMissingError({
    dialect: 'snowflake',
    message: `Missing Snowflake audit grants for historic-SQL ingest: ${message}`,
    remediation: SNOWFLAKE_GRANTS_REMEDIATION,
    cause,
  });
}

function timestampLiteral(value: Date | string): string {
  const date = value instanceof Date ? value : new Date(value);
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Invalid Snowflake query-history timestamp: ${String(value)}`);
  }
  return `'${date.toISOString().replace(/'/g, "''")}'::TIMESTAMP_TZ`;
}

function indexByHeader(headers: string[]): Map<string, number> {
  const out = new Map<string, number>();
  headers.forEach((header, index) => {
    out.set(header.toUpperCase(), index);
  });
  return out;
}

function value(row: unknown[], indexes: Map<string, number>, name: string): unknown {
  const index = indexes.get(name.toUpperCase());
  return index === undefined ? null : row[index];
}

function nullableString(raw: unknown): string | null {
  if (raw === null || raw === undefined) {
    return null;
  }
  const text = String(raw);
  return text.length > 0 ? text : null;
}

function requiredString(raw: unknown, field: string): string {
  const text = nullableString(raw);
  if (!text) {
    throw new Error(`Snowflake QUERY_HISTORY row is missing ${field}`);
  }
  return text;
}

function nullableNumber(raw: unknown): number | null {
  if (raw === null || raw === undefined || raw === '') {
    return null;
  }
  const number = typeof raw === 'number' ? raw : Number(raw);
  if (!Number.isFinite(number)) {
    return null;
  }
  return number;
}

function requiredNumber(raw: unknown, field: string): number {
  const number = nullableNumber(raw);
  if (number === null) {
    throw new Error(`Snowflake QUERY_HISTORY row has invalid ${field}: ${String(raw)}`);
  }
  return number;
}

function requiredInteger(raw: unknown, field: string): number {
  return Math.trunc(requiredNumber(raw, field));
}

function nullableInteger(raw: unknown): number | null {
  const number = nullableNumber(raw);
  return number === null ? null : Math.trunc(number);
}

function isoTimestamp(raw: unknown, field: string): string {
  if (raw instanceof Date) {
    return raw.toISOString();
  }
  const text = requiredString(raw, field);
  const date = new Date(text);
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Snowflake QUERY_HISTORY row has invalid ${field}: ${text}`);
  }
  return date.toISOString();
}

function parseTopUsers(raw: unknown): Array<{ user: string | null; executions: number }> {
  const text = nullableString(raw);
  if (!text) {
    return [];
  }
  try {
    const parsed = JSON.parse(text) as unknown;
    if (!Array.isArray(parsed)) {
      return [];
    }
    return parsed.flatMap((entry) => {
      if (!entry || typeof entry !== 'object') {
        return [];
      }
      const user = nullableString((entry as { user?: unknown }).user);
      const executions = nullableInteger((entry as { executions?: unknown }).executions);
      return executions === null ? [] : [{ user, executions }];
    });
  } catch {
    return [];
  }
}

function mapAggregatedRow(row: unknown[], indexes: Map<string, number>): AggregatedTemplate {
  return aggregatedTemplateSchema.parse({
    templateId: requiredString(value(row, indexes, 'template_id'), 'template_id'),
    canonicalSql: requiredString(value(row, indexes, 'canonical_sql'), 'canonical_sql'),
    dialect: 'snowflake',
    stats: {
      executions: requiredInteger(value(row, indexes, 'executions'), 'executions'),
      distinctUsers: requiredInteger(value(row, indexes, 'distinct_users'), 'distinct_users'),
      firstSeen: isoTimestamp(value(row, indexes, 'first_seen'), 'first_seen'),
      lastSeen: isoTimestamp(value(row, indexes, 'last_seen'), 'last_seen'),
      p50RuntimeMs: nullableNumber(value(row, indexes, 'p50_ms')),
      p95RuntimeMs: nullableNumber(value(row, indexes, 'p95_ms')),
      errorRate: requiredNumber(value(row, indexes, 'error_rate'), 'error_rate'),
      rowsProduced: nullableInteger(value(row, indexes, 'rows_produced')),
    },
    topUsers: parseTopUsers(value(row, indexes, 'top_users')),
  });
}

export class SnowflakeHistoricSqlQueryHistoryReader {
  async probe(client: unknown): Promise<{ warnings: string[]; info: string[] }> {
    let result: QueryResultLike;
    try {
      result = await queryClient(client).executeQuery(PROBE_SQL);
    } catch (error) {
      throw grantsError(error);
    }
    if (result.error) {
      throw grantsError(result.error);
    }
    return { warnings: [], info: [] };
  }

  async *fetchAggregated(
    client: unknown,
    window: HistoricSqlTimeWindow,
    config: HistoricSqlUnifiedPullConfig,
  ): AsyncIterable<AggregatedTemplate> {
    const sql = `
      SELECT
        query_hash AS template_id,
        MIN(query_text) AS canonical_sql,
        COUNT(*) AS executions,
        COUNT(DISTINCT user_name) AS distinct_users,
        MIN(start_time) AS first_seen,
        MAX(start_time) AS last_seen,
        APPROX_PERCENTILE(total_elapsed_time, 0.50) AS p50_ms,
        APPROX_PERCENTILE(total_elapsed_time, 0.95) AS p95_ms,
        DIV0(COUNT_IF(execution_status != 'SUCCESS'), COUNT(*)) AS error_rate,
        SUM(rows_produced) AS rows_produced,
        ARRAY_AGG(OBJECT_CONSTRUCT('user', user_name, 'executions', 1)) WITHIN GROUP (ORDER BY start_time DESC)::string AS top_users
      FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
      WHERE query_text IS NOT NULL
        AND query_type IN ('SELECT', 'MERGE')
        AND start_time >= ${timestampLiteral(window.start)}
        AND start_time < ${timestampLiteral(window.end)}
      GROUP BY query_hash
      HAVING COUNT(*) >= ${config.minExecutions}
      ORDER BY executions DESC`.trim();
    const result = await queryClient(client).executeQuery(sql);
    if (result.error) {
      throw grantsError(result.error);
    }
    const indexes = indexByHeader(result.headers);
    for (const row of result.rows) {
      yield mapAggregatedRow(row, indexes);
    }
  }
}
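The reader above upper-cases both the result headers and the lookup names before reading a cell. Snowflake folds unquoted identifiers to upper case, so an alias written as `template_id` will usually come back as `TEMPLATE_ID`, while the unit test's fake client returns lower-case headers. The sketch below shows how a case-normalized index serves both cases. `buildHeaderIndex` and `cell` are hypothetical stand-ins written for this example, not the file's own helpers.

```typescript
// Hypothetical stand-ins for the header-lookup helpers, shown for illustration only.
function buildHeaderIndex(headers: readonly string[]): Map<string, number> {
  const out = new Map<string, number>();
  headers.forEach((header, index) => {
    out.set(header.toUpperCase(), index);
  });
  return out;
}

function cell(row: readonly unknown[], indexes: Map<string, number>, name: string): unknown {
  const index = indexes.get(name.toUpperCase());
  // Columns that are absent from the result read as null rather than throwing.
  return index === undefined ? null : row[index];
}

// Upper-case headers, as a Snowflake driver would typically report them.
const upperIndexes = buildHeaderIndex(['TEMPLATE_ID', 'EXECUTIONS']);
// Lower-case headers, as a test double might report them.
const lowerIndexes = buildHeaderIndex(['template_id', 'executions']);

const sampleRow: unknown[] = ['hash-1', 42];
const fromUpper = cell(sampleRow, upperIndexes, 'template_id');
const fromLower = cell(sampleRow, lowerIndexes, 'template_id');
const missingColumn = cell(sampleRow, upperIndexes, 'p95_ms');
```

Returning `null` for an absent column leaves the decision to the per-field parsers: nullable fields accept it, and required fields can raise an error that names the field.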
@@ -0,0 +1,436 @@
import { mkdtemp, readFile, readdir } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it, vi } from 'vitest';
import type { SqlAnalysisPort } from '../../../../context/sql-analysis/ports.js';
import { stageHistoricSqlAggregatedSnapshot } from './stage-unified.js';
import type { AggregatedTemplate, HistoricSqlReader } from './types.js';

async function tempDir(): Promise<string> {
  return mkdtemp(join(tmpdir(), 'historic-sql-unified-stage-'));
}

async function readJson<T>(root: string, relPath: string): Promise<T> {
  return JSON.parse(await readFile(join(root, relPath), 'utf-8')) as T;
}

function aggregate(overrides: Partial<AggregatedTemplate> & { templateId: string; canonicalSql: string }): AggregatedTemplate {
  return {
    templateId: overrides.templateId,
    canonicalSql: overrides.canonicalSql,
    dialect: overrides.dialect ?? 'postgres',
    stats: overrides.stats ?? {
      executions: 42,
      distinctUsers: 3,
      firstSeen: '2026-05-01T00:00:00.000Z',
      lastSeen: '2026-05-11T00:00:00.000Z',
      p50RuntimeMs: 20,
      p95RuntimeMs: 80,
      errorRate: 0,
      rowsProduced: 100,
    },
    topUsers: overrides.topUsers ?? [{ user: 'analyst', executions: 40 }],
  };
}

describe('stageHistoricSqlAggregatedSnapshot', () => {
  it('batch parses templates and writes stable table and patterns artifacts', async () => {
    const stagedDir = await tempDir();
    const reader: HistoricSqlReader = {
      async probe() {
        return { warnings: ['pg_stat_statements.track is none; aggregation still proceeds'], info: [] };
      },
      async *fetchAggregated() {
        yield aggregate({
          templateId: 'orders-by-status',
          canonicalSql: 'select o.status, count(*) from public.orders o join public.customers c on c.id = o.customer_id where o.created_at >= $1 group by o.status',
        });
        yield aggregate({
          templateId: 'service-account-only',
          canonicalSql: 'select * from public.orders where id = $1',
          stats: {
            executions: 20,
            distinctUsers: 1,
            firstSeen: '2026-05-01T00:00:00.000Z',
            lastSeen: '2026-05-11T00:00:00.000Z',
            p50RuntimeMs: 5,
            p95RuntimeMs: 10,
            errorRate: 0,
            rowsProduced: 1,
          },
          topUsers: [{ user: 'svc_loader', executions: 20 }],
        });
        yield aggregate({
          templateId: 'bad-parse',
          canonicalSql: 'select broken from',
        });
      },
    };
    const sqlAnalysis: SqlAnalysisPort = {
      analyzeForFingerprint: vi.fn(),
      analyzeBatch: vi.fn(async () => new Map([
        [
          'orders-by-status',
          {
            tablesTouched: ['public.orders', 'public.customers'],
            columnsByClause: {
              select: ['status'],
              where: ['created_at'],
              join: ['customer_id'],
              groupBy: ['status'],
            },
          },
        ],
        ['bad-parse', { tablesTouched: [], columnsByClause: {}, error: 'parse failed' }],
      ])),
      validateReadOnly: vi.fn(async () => ({ ok: true })),
    };

    await stageHistoricSqlAggregatedSnapshot({
      stagedDir,
      connectionId: 'warehouse',
      queryClient: {},
      reader,
      sqlAnalysis,
      pullConfig: {
        dialect: 'postgres',
        filters: {
          serviceAccounts: { patterns: ['^svc_'], mode: 'exclude' },
        },
      },
      now: new Date('2026-05-11T12:00:00.000Z'),
    });

    expect(sqlAnalysis.analyzeBatch).toHaveBeenCalledTimes(1);
    expect(sqlAnalysis.analyzeBatch).toHaveBeenCalledWith(
      [
        {
          id: 'orders-by-status',
          sql: 'select o.status, count(*) from public.orders o join public.customers c on c.id = o.customer_id where o.created_at >= $1 group by o.status',
        },
        { id: 'bad-parse', sql: 'select broken from' },
      ],
      'postgres',
    );

    expect(await readdir(join(stagedDir, 'tables'))).toEqual(['public.customers.json', 'public.orders.json']);

    const manifest = await readJson<Record<string, unknown>>(stagedDir, 'manifest.json');
    expect(manifest).toMatchObject({
      source: 'historic-sql',
      connectionId: 'warehouse',
      dialect: 'postgres',
      snapshotRowCount: 3,
      touchedTableCount: 2,
      parseFailures: 1,
      warnings: ['parse_failed:bad-parse'],
      probeWarnings: ['pg_stat_statements.track is none; aggregation still proceeds'],
      staleArchiveAfterDays: 90,
    });

    const orders = await readJson<Record<string, any>>(stagedDir, 'tables/public.orders.json');
    expect(orders).toMatchObject({
      table: 'public.orders',
      stats: {
        executionsBucket: '10-100',
        distinctUsersBucket: '2-5',
        errorRateBucket: 'none',
        p95RuntimeBucket: '<100ms',
        recencyBucket: 'current',
      },
      columnsByClause: {
        select: [['status', 'high']],
        where: [['created_at', 'high']],
        join: [['customer_id', 'high']],
        groupBy: [['status', 'high']],
      },
      observedJoins: [{ withTable: 'public.customers', on: ['customer_id'], freq: 'high' }],
      topTemplates: [
        {
          id: 'orders-by-status',
          topUsers: [{ user: 'analyst' }],
        },
      ],
    });
    expect(orders.topTemplates[0].canonicalSql).toContain('group by o.status');

    const patterns = await readJson<Record<string, any>>(stagedDir, 'patterns-input.json');
    expect(patterns.templates).toEqual([
      {
        id: 'orders-by-status',
        canonicalSql: expect.stringContaining('public.orders'),
        tablesTouched: ['public.customers', 'public.orders'],
        executionsBucket: '10-100',
        distinctUsersBucket: '2-5',
        dialect: 'postgres',
      },
    ]);
  });

  it('redacts configured SQL substrings in staged artifacts while analyzing original SQL', async () => {
    const stagedDir = await tempDir();
    const originalSql =
      "select * from public.api_events where api_key = 'sk_live_abc123' and note = 'Secret_Token_9f'"; // pragma: allowlist secret
    const reader: HistoricSqlReader = {
      async probe() {
        return { warnings: [], info: [] };
      },
      async *fetchAggregated() {
        yield aggregate({
          templateId: 'api-events-with-secret',
          canonicalSql: originalSql,
          stats: {
            executions: 15,
            distinctUsers: 2,
            firstSeen: '2026-05-01T00:00:00.000Z',
            lastSeen: '2026-05-11T00:00:00.000Z',
            p50RuntimeMs: 12,
            p95RuntimeMs: 25,
            errorRate: 0,
            rowsProduced: 15,
          },
        });
      },
    };
    const sqlAnalysis: SqlAnalysisPort = {
      analyzeForFingerprint: vi.fn(),
      analyzeBatch: vi.fn(async () => new Map([
        [
          'api-events-with-secret',
          {
            tablesTouched: ['public.api_events'],
            columnsByClause: {
              select: [],
              where: ['api_key', 'note'],
              join: [],
              groupBy: [],
            },
          },
        ],
      ])),
      validateReadOnly: vi.fn(async () => ({ ok: true })),
    };

    await stageHistoricSqlAggregatedSnapshot({
      stagedDir,
      connectionId: 'warehouse',
      queryClient: {},
      reader,
      sqlAnalysis,
      pullConfig: {
        dialect: 'postgres',
        redactionPatterns: ['sk_live_[A-Za-z0-9]+', '(?i)secret_token_[a-z0-9]+'],
      },
      now: new Date('2026-05-11T12:00:00.000Z'),
    });

    expect(sqlAnalysis.analyzeBatch).toHaveBeenCalledWith(
      [{ id: 'api-events-with-secret', sql: originalSql }],
      'postgres',
    );

    const tableJson = await readFile(join(stagedDir, 'tables/public.api_events.json'), 'utf-8');
    const patternsJson = await readFile(join(stagedDir, 'patterns-input.json'), 'utf-8');
    expect(tableJson).not.toContain('sk_live_abc123');
    expect(tableJson).not.toContain('Secret_Token_9f');
    expect(patternsJson).not.toContain('sk_live_abc123');
    expect(patternsJson).not.toContain('Secret_Token_9f');
    expect(tableJson).toContain('[REDACTED]');
    expect(patternsJson).toContain('[REDACTED]');
  });

  it('limits staged table artifacts to configured enabled tables', async () => {
    const stagedDir = await tempDir();
    const reader: HistoricSqlReader = {
      async probe() {
        return { warnings: [], info: [] };
      },
      async *fetchAggregated() {
        yield aggregate({
          templateId: 'selected-qualified',
          canonicalSql: 'select count(*) from orbit_analytics.int_active_contract_arr',
        });
        yield aggregate({
          templateId: 'selected-unqualified',
          canonicalSql: 'select count(*) from int_customer_health_signals',
        });
        yield aggregate({
          templateId: 'unselected',
          canonicalSql: 'select count(*) from orbit_raw.accounts',
        });
      },
    };
    const sqlAnalysis: SqlAnalysisPort = {
      analyzeForFingerprint: vi.fn(),
      analyzeBatch: vi.fn(async () => new Map([
        [
          'selected-qualified',
          {
            tablesTouched: ['orbit_analytics.int_active_contract_arr'],
            columnsByClause: { select: [], where: [], join: [], groupBy: [] },
          },
        ],
        [
          'selected-unqualified',
          {
            tablesTouched: ['int_customer_health_signals'],
            columnsByClause: { select: [], where: [], join: [], groupBy: [] },
          },
        ],
        [
          'unselected',
          {
            tablesTouched: ['orbit_raw.accounts'],
            columnsByClause: { select: [], where: [], join: [], groupBy: [] },
          },
        ],
      ])),
      validateReadOnly: vi.fn(async () => ({ ok: true })),
    };

    await stageHistoricSqlAggregatedSnapshot({
      stagedDir,
      connectionId: 'warehouse',
      queryClient: {},
      reader,
      sqlAnalysis,
      pullConfig: {
        dialect: 'postgres',
        enabledTables: [
          'orbit_analytics.int_active_contract_arr',
          'orbit_analytics.int_customer_health_signals',
        ],
      },
      now: new Date('2026-05-11T12:00:00.000Z'),
    });

    expect(await readdir(join(stagedDir, 'tables'))).toEqual([
      'int_customer_health_signals.json',
      'orbit_analytics.int_active_contract_arr.json',
    ]);
    const manifest = await readJson<Record<string, any>>(stagedDir, 'manifest.json');
    expect(manifest.touchedTableCount).toBe(2);
    const patterns = await readJson<Record<string, any>>(stagedDir, 'patterns-input.json');
    expect(patterns.templates.map((entry: any) => entry.id)).toEqual(['selected-qualified', 'selected-unqualified']);
  });

it('preserves full patterns audit input and writes bounded cross-table pattern shards', async () => {
|
||||
const stagedDir = await tempDir();
|
||||
const largeSql = `select * from public.orders o join public.customers c on c.id = o.customer_id where payload = '${'x'.repeat(8000)}'`;
|
||||
const reader: HistoricSqlReader = {
|
||||
async probe() {
|
||||
return { warnings: [], info: [] };
|
||||
},
|
||||
async *fetchAggregated() {
|
||||
yield aggregate({
|
||||
templateId: 'orders-customers-a',
|
||||
canonicalSql: largeSql,
|
||||
stats: {
|
||||
executions: 25,
|
||||
distinctUsers: 4,
|
||||
firstSeen: '2026-05-01T00:00:00.000Z',
|
||||
lastSeen: '2026-05-11T00:00:00.000Z',
|
||||
p50RuntimeMs: 15,
|
||||
p95RuntimeMs: 90,
|
||||
errorRate: 0,
|
||||
rowsProduced: 250,
|
||||
},
|
||||
});
|
||||
yield aggregate({
|
||||
templateId: 'orders-customers-b',
|
||||
canonicalSql: largeSql.replace('payload', 'payload_b'),
|
||||
stats: {
|
||||
executions: 22,
|
||||
distinctUsers: 3,
|
||||
firstSeen: '2026-05-01T00:00:00.000Z',
|
||||
lastSeen: '2026-05-11T00:00:00.000Z',
|
||||
p50RuntimeMs: 20,
|
||||
p95RuntimeMs: 95,
|
||||
errorRate: 0,
|
||||
rowsProduced: 220,
|
||||
},
|
||||
});
|
||||
yield aggregate({
|
||||
templateId: 'orders-single-table',
|
||||
canonicalSql: 'select count(*) from public.orders',
|
||||
stats: {
|
||||
executions: 30,
|
||||
distinctUsers: 2,
|
||||
firstSeen: '2026-05-01T00:00:00.000Z',
|
||||
lastSeen: '2026-05-11T00:00:00.000Z',
|
||||
p50RuntimeMs: 10,
|
||||
p95RuntimeMs: 20,
|
||||
errorRate: 0,
|
||||
rowsProduced: 30,
|
||||
},
|
||||
});
|
||||
},
|
||||
};
|
||||
const sqlAnalysis: SqlAnalysisPort = {
|
||||
analyzeForFingerprint: vi.fn(),
|
||||
analyzeBatch: vi.fn(async () => new Map([
|
||||
[
|
||||
'orders-customers-a',
|
||||
{
|
||||
tablesTouched: ['public.orders', 'public.customers'],
|
||||
columnsByClause: {
|
||||
select: [],
|
||||
where: ['payload'],
|
||||
join: ['customer_id', 'id'],
|
||||
groupBy: [],
|
||||
},
|
||||
},
|
||||
],
|
||||
[
|
||||
'orders-customers-b',
|
||||
{
|
||||
tablesTouched: ['public.orders', 'public.customers'],
|
||||
columnsByClause: {
|
||||
select: [],
|
||||
where: ['payload_b'],
|
||||
join: ['customer_id', 'id'],
|
||||
groupBy: [],
|
||||
},
|
||||
},
|
||||
],
|
||||
[
|
||||
'orders-single-table',
|
||||
{
|
||||
tablesTouched: ['public.orders'],
|
||||
columnsByClause: {
|
||||
select: [],
|
||||
where: [],
|
||||
join: [],
|
||||
groupBy: [],
|
||||
},
|
||||
},
|
||||
],
|
||||
])),
|
||||
validateReadOnly: vi.fn(async () => ({ ok: true })),
|
||||
};
|
||||
|
||||
await stageHistoricSqlAggregatedSnapshot({
|
||||
stagedDir,
|
||||
connectionId: 'warehouse',
|
||||
queryClient: {},
|
||||
reader,
|
||||
sqlAnalysis,
|
||||
pullConfig: { dialect: 'postgres' },
|
||||
now: new Date('2026-05-11T12:00:00.000Z'),
|
||||
});
|
||||
|
||||
const audit = await readJson<Record<string, any>>(stagedDir, 'patterns-input.json');
|
||||
expect(audit.templates.map((entry: any) => entry.id)).toEqual([
|
||||
'orders-customers-a',
|
||||
'orders-customers-b',
|
||||
'orders-single-table',
|
||||
]);
|
||||
|
||||
const firstShard = await readJson<Record<string, any>>(stagedDir, 'patterns-input/part-0001.json');
|
||||
expect(firstShard.templates.map((entry: any) => entry.id)).toEqual(['orders-customers-a', 'orders-customers-b']);
|
||||
expect(firstShard.templates.some((entry: any) => entry.id === 'orders-single-table')).toBe(false);
|
||||
|
||||
const manifest = await readJson<Record<string, any>>(stagedDir, 'manifest.json');
|
||||
expect(manifest.warnings).toEqual([]);
|
||||
});
|
||||
});
|
||||
|
|
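Note on the enabled-table test above: it expects both the schema-qualified `orbit_analytics.int_active_contract_arr` and the bare `int_customer_health_signals` to be staged, while `orbit_raw.accounts` is skipped. The sketch below is one way to read that matching rule. It is a standalone illustration, not the module's exported API; the names `buildFilter` and `isEnabled` are hypothetical stand-ins for the private helpers in the staging file that follows.

```typescript
// Illustrative sketch only: mirrors the exact-or-unique-unqualified matching
// rule. buildFilter and isEnabled are hypothetical names, not exported helpers.
interface EnabledTableFilter {
  exact: Set<string>;
  uniqueUnqualified: Set<string>;
}

const normalize = (value: string): string => value.trim().toLowerCase();

// Last dot-separated segment of a table identifier, e.g. "schema.table" -> "table".
const unqualified = (value: string): string => {
  const parts = normalize(value).split('.').filter(Boolean);
  return parts.length > 0 ? parts[parts.length - 1] : '';
};

function buildFilter(enabledTables: string[]): EnabledTableFilter | null {
  // An empty allowlist means "no filtering": every table is enabled.
  if (enabledTables.length === 0) return null;
  const exact = new Set(enabledTables.map(normalize).filter((v) => v.length > 0));
  const counts = new Map<string, number>();
  exact.forEach((table) => {
    const bare = unqualified(table);
    if (bare.length > 0) counts.set(bare, (counts.get(bare) ?? 0) + 1);
  });
  // Only bare names that map to exactly one enabled table may match unqualified.
  const uniqueUnqualified = new Set<string>();
  counts.forEach((count, bare) => {
    if (count === 1) uniqueUnqualified.add(bare);
  });
  return { exact, uniqueUnqualified };
}

function isEnabled(table: string, filter: EnabledTableFilter | null): boolean {
  if (!filter) return true;
  const normalized = normalize(table);
  return filter.exact.has(normalized) || filter.uniqueUnqualified.has(unqualified(normalized));
}
```

Under this reading, a bare name that is ambiguous across enabled schemas (for example `orders` when both `a.orders` and `b.orders` are enabled) does not match, whereas a table qualified with a different schema still matches whenever its bare name is unique in the allowlist.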
@ -0,0 +1,360 @@
|
|||
import { mkdir, writeFile } from 'node:fs/promises';
|
||||
import { dirname, join } from 'node:path';
|
||||
import type { SqlAnalysisPort } from '../../../../context/sql-analysis/ports.js';
|
||||
import {
|
||||
bucketDistinctUsers,
|
||||
bucketErrorRate,
|
||||
bucketExecutions,
|
||||
bucketFrequency,
|
||||
bucketP95Runtime,
|
||||
bucketRecency,
|
||||
} from './buckets.js';
|
||||
import { splitHistoricSqlPatternInputs } from './pattern-inputs.js';
|
||||
import {
|
||||
compileHistoricSqlRedactionPatterns,
|
||||
redactHistoricSqlText,
|
||||
type HistoricSqlRedactionPattern,
|
||||
} from './redaction.js';
|
||||
import {
|
||||
HISTORIC_SQL_SOURCE_KEY,
|
||||
aggregatedTemplateSchema,
|
||||
historicSqlUnifiedPullConfigSchema,
|
||||
type AggregatedTemplate,
|
||||
type HistoricSqlReader,
|
||||
type HistoricSqlUnifiedPullConfig,
|
||||
type StagedPatternsInput,
|
||||
type StagedTableInput,
|
||||
} from './types.js';
|
||||
|
||||
interface StageHistoricSqlAggregatedSnapshotInput {
|
||||
stagedDir: string;
|
||||
connectionId: string;
|
||||
queryClient: unknown;
|
||||
reader: HistoricSqlReader;
|
||||
sqlAnalysis: SqlAnalysisPort;
|
||||
pullConfig: unknown;
|
||||
now?: Date;
|
||||
}
|
||||
|
||||
interface ParsedTemplate {
|
||||
template: AggregatedTemplate;
|
||||
tablesTouched: string[];
|
||||
includedTables: string[];
|
||||
columnsByClause: Record<string, string[]>;
|
||||
}
|
||||
|
||||
interface EnabledTableFilter {
|
||||
exact: Set<string>;
|
||||
uniqueUnqualified: Set<string>;
|
||||
}
|
||||
|
||||
interface TableAccumulator {
|
||||
table: string;
|
||||
executions: number;
|
||||
distinctUsers: number;
|
||||
errorRateNumerator: number;
|
||||
p95RuntimeMs: number | null;
|
||||
lastSeen: string;
|
||||
columnsByClause: Map<string, Map<string, number>>;
|
||||
observedJoins: Map<string, Map<string, number>>;
|
||||
topTemplates: AggregatedTemplate[];
|
||||
}
|
||||
|
||||
const TRIVIAL_SQL_RE = /^\s*SELECT\s+(1|NOW\(\)|CURRENT_TIMESTAMP|VERSION\(\))\s*;?\s*$/i;
|
||||
const NOISE_PREFIX_RE = /^\s*(SHOW|DESCRIBE|DESC|EXPLAIN|USE|SET)\b/i;
|
||||
const SYSTEM_TABLE_RE = /\b(INFORMATION_SCHEMA|SNOWFLAKE\.ACCOUNT_USAGE|pg_|system\.)/i;
|
||||
|
||||
function writeJson(root: string, relPath: string, value: unknown): Promise<void> {
|
||||
const target = join(root, relPath);
|
||||
return mkdir(dirname(target), { recursive: true }).then(() =>
|
||||
writeFile(target, `${JSON.stringify(value, null, 2)}\n`, 'utf-8'),
|
||||
);
|
||||
}
|
||||
|
||||
function compilePatterns(patterns: string[]): RegExp[] {
|
||||
return patterns.map((pattern) => new RegExp(pattern));
|
||||
}
|
||||
|
||||
function matchesAny(value: string | null, patterns: RegExp[]): boolean {
|
||||
return !!value && patterns.some((pattern) => pattern.test(value));
|
||||
}
|
||||
|
||||
function shouldDropBySql(sql: string, config: HistoricSqlUnifiedPullConfig): boolean {
|
||||
if (NOISE_PREFIX_RE.test(sql) || SYSTEM_TABLE_RE.test(sql)) return true;
|
||||
if (config.filters.dropTrivialProbes !== false && TRIVIAL_SQL_RE.test(sql)) return true;
|
||||
return false;
|
||||
}
|
||||
|
||||
function shouldDropByUsers(template: AggregatedTemplate, config: HistoricSqlUnifiedPullConfig): boolean {
|
||||
const service = config.filters.serviceAccounts;
|
||||
if (!service || service.mode === 'mark-only' || service.patterns.length === 0) return false;
|
||||
const patterns = compilePatterns(service.patterns);
|
||||
const matchingExecutions = template.topUsers
|
||||
.filter((entry) => matchesAny(entry.user, patterns))
|
||||
.reduce((sum, entry) => sum + entry.executions, 0);
|
||||
const allExecutions = template.topUsers.reduce((sum, entry) => sum + entry.executions, 0);
|
||||
const serviceOnly = allExecutions > 0 && matchingExecutions >= allExecutions;
|
||||
return service.mode === 'exclude' ? serviceOnly : !serviceOnly;
|
||||
}
|
||||
|
||||
function shouldDropByFailure(template: AggregatedTemplate, config: HistoricSqlUnifiedPullConfig): boolean {
|
||||
const failed = config.filters.dropFailedBelow;
|
||||
return !!failed && template.stats.errorRate > failed.errorRate && template.stats.executions < failed.executions;
|
||||
}
|
||||
|
||||
function shouldDropTemplate(template: AggregatedTemplate, config: HistoricSqlUnifiedPullConfig): boolean {
|
||||
if (shouldDropBySql(template.canonicalSql, config)) return true;
|
||||
if (shouldDropByUsers(template, config)) return true;
|
||||
if (shouldDropByFailure(template, config)) return true;
|
||||
return false;
|
||||
}
|
||||
|
||||
function normalizeTableIdentifier(value: string): string {
|
||||
return value.trim().toLowerCase();
|
||||
}
|
||||
|
||||
function unqualifiedTableIdentifier(value: string): string {
|
||||
const parts = normalizeTableIdentifier(value).split('.').filter(Boolean);
|
||||
return parts.at(-1) ?? '';
|
||||
}
|
||||
|
||||
function buildEnabledTableFilter(enabledTables: string[]): EnabledTableFilter | null {
|
||||
if (enabledTables.length === 0) {
|
||||
return null;
|
||||
}
|
||||
const exact = new Set(enabledTables.map(normalizeTableIdentifier).filter((value) => value.length > 0));
|
||||
const unqualifiedCounts = new Map<string, number>();
|
||||
for (const table of exact) {
|
||||
const unqualified = unqualifiedTableIdentifier(table);
|
||||
if (unqualified.length > 0) {
|
||||
unqualifiedCounts.set(unqualified, (unqualifiedCounts.get(unqualified) ?? 0) + 1);
|
||||
}
|
||||
}
|
||||
return {
|
||||
exact,
|
||||
uniqueUnqualified: new Set(
|
||||
[...unqualifiedCounts.entries()]
|
||||
.filter(([, count]) => count === 1)
|
||||
.map(([table]) => table),
|
||||
),
|
||||
};
|
||||
}
|
||||
|
||||
function isEnabledTable(table: string, filter: EnabledTableFilter | null): boolean {
|
||||
if (!filter) {
|
||||
return true;
|
||||
}
|
||||
const normalized = normalizeTableIdentifier(table);
|
||||
return filter.exact.has(normalized) || filter.uniqueUnqualified.has(unqualifiedTableIdentifier(normalized));
|
||||
}
|
||||
|
||||
function historicSqlWindowDays(config: HistoricSqlUnifiedPullConfig): number {
|
||||
return 'windowDays' in config ? config.windowDays : 90; // postgres configs carry no windowDays, so fall back to 90 days
|
||||
}
|
||||
|
||||
function redactTemplateSql(
|
||||
template: AggregatedTemplate,
|
||||
redactors: readonly HistoricSqlRedactionPattern[],
|
||||
): AggregatedTemplate {
|
||||
if (redactors.length === 0) {
|
||||
return template;
|
||||
}
|
||||
return {
|
||||
...template,
|
||||
canonicalSql: redactHistoricSqlText(template.canonicalSql, redactors),
|
||||
};
|
||||
}
|
||||
|
||||
function recordColumn(acc: TableAccumulator, clause: string, column: string, executions: number): void {
|
||||
const byColumn = acc.columnsByClause.get(clause) ?? new Map<string, number>();
|
||||
byColumn.set(column, (byColumn.get(column) ?? 0) + executions);
|
||||
acc.columnsByClause.set(clause, byColumn);
|
||||
}
|
||||
|
||||
function recordJoin(acc: TableAccumulator, otherTable: string, columns: string[], executions: number): void {
|
||||
const byColumns = acc.observedJoins.get(otherTable) ?? new Map<string, number>();
|
||||
const key = [...new Set(columns)].sort().join(',');
|
||||
if (key.length > 0) {
|
||||
byColumns.set(key, (byColumns.get(key) ?? 0) + executions);
|
||||
acc.observedJoins.set(otherTable, byColumns);
|
||||
}
|
||||
}
|
||||
|
||||
function accumulatorFor(table: string): TableAccumulator {
|
||||
return {
|
||||
table,
|
||||
executions: 0,
|
||||
distinctUsers: 0,
|
||||
errorRateNumerator: 0,
|
||||
p95RuntimeMs: null,
|
||||
lastSeen: '1970-01-01T00:00:00.000Z',
|
||||
columnsByClause: new Map(),
|
||||
observedJoins: new Map(),
|
||||
topTemplates: [],
|
||||
};
|
||||
}
|
||||
|
||||
function addTemplate(acc: TableAccumulator, parsed: ParsedTemplate): void {
|
||||
const executions = parsed.template.stats.executions;
|
||||
acc.executions += executions;
|
||||
acc.distinctUsers = Math.max(acc.distinctUsers, parsed.template.stats.distinctUsers); // lower bound: per-template user sets may overlap, so counts are not summed
|
||||
acc.errorRateNumerator += parsed.template.stats.errorRate * executions;
|
||||
acc.p95RuntimeMs =
|
||||
acc.p95RuntimeMs === null
|
||||
? parsed.template.stats.p95RuntimeMs
|
||||
: parsed.template.stats.p95RuntimeMs === null
|
||||
? acc.p95RuntimeMs
|
||||
: Math.max(acc.p95RuntimeMs, parsed.template.stats.p95RuntimeMs);
|
||||
acc.lastSeen = parsed.template.stats.lastSeen > acc.lastSeen ? parsed.template.stats.lastSeen : acc.lastSeen;
|
||||
for (const [clause, columns] of Object.entries(parsed.columnsByClause)) {
|
||||
for (const column of columns) {
|
||||
recordColumn(acc, clause, column, executions);
|
||||
}
|
||||
}
|
||||
const joinColumns = parsed.columnsByClause.join ?? [];
|
||||
for (const otherTable of parsed.tablesTouched.filter((table) => table !== acc.table)) {
|
||||
recordJoin(acc, otherTable, joinColumns, executions);
|
||||
}
|
||||
acc.topTemplates.push(parsed.template);
|
||||
}
|
||||
|
||||
function toStagedTable(acc: TableAccumulator, now: Date): StagedTableInput {
|
||||
const errorRate = acc.executions > 0 ? acc.errorRateNumerator / acc.executions : 0;
|
||||
const columnsByClause: Record<string, Array<[string, string]>> = Object.fromEntries(
|
||||
[...acc.columnsByClause.entries()]
|
||||
.sort(([left], [right]) => left.localeCompare(right))
|
||||
.map(([clause, counts]) => [
|
||||
clause,
|
||||
[...counts.entries()]
|
||||
.sort((left, right) => right[1] - left[1] || left[0].localeCompare(right[0]))
|
||||
.map(([column, count]) => [column, bucketFrequency(count, acc.executions)] as [string, string]),
|
||||
]),
|
||||
);
|
||||
const observedJoins = [...acc.observedJoins.entries()]
|
||||
.flatMap(([withTable, byColumns]) =>
|
||||
[...byColumns.entries()].map(([columns, count]) => ({
|
||||
withTable,
|
||||
on: columns.split(',').filter(Boolean),
|
||||
freq: bucketFrequency(count, acc.executions),
|
||||
})),
|
||||
)
|
||||
.sort((left, right) => left.withTable.localeCompare(right.withTable) || left.on.join(',').localeCompare(right.on.join(',')));
|
||||
const topTemplates = [...acc.topTemplates]
|
||||
.sort((left, right) => right.stats.executions - left.stats.executions || left.templateId.localeCompare(right.templateId))
|
||||
.slice(0, 5)
|
||||
.map((template) => ({
|
||||
id: template.templateId,
|
||||
canonicalSql: template.canonicalSql,
|
||||
topUsers: template.topUsers.slice(0, 5).map((entry) => ({ user: entry.user })),
|
||||
}));
|
||||
|
||||
return {
|
||||
table: acc.table,
|
||||
stats: {
|
||||
executionsBucket: bucketExecutions(acc.executions),
|
||||
distinctUsersBucket: bucketDistinctUsers(acc.distinctUsers),
|
||||
errorRateBucket: bucketErrorRate(errorRate),
|
||||
p95RuntimeBucket: bucketP95Runtime(acc.p95RuntimeMs),
|
||||
recencyBucket: bucketRecency(acc.lastSeen, now),
|
||||
},
|
||||
columnsByClause,
|
||||
observedJoins,
|
||||
topTemplates,
|
||||
};
|
||||
}
|
||||
|
||||
function toPatternsInput(parsedTemplates: ParsedTemplate[]): StagedPatternsInput {
|
||||
return {
|
||||
templates: parsedTemplates
|
||||
.map(({ template, tablesTouched }) => ({
|
||||
id: template.templateId,
|
||||
canonicalSql: template.canonicalSql,
|
||||
tablesTouched: [...tablesTouched].sort(),
|
||||
executionsBucket: bucketExecutions(template.stats.executions),
|
||||
distinctUsersBucket: bucketDistinctUsers(template.stats.distinctUsers),
|
||||
dialect: template.dialect,
|
||||
}))
|
||||
.sort((left, right) => left.id.localeCompare(right.id)),
|
||||
};
|
||||
}
|
||||
|
||||
export async function stageHistoricSqlAggregatedSnapshot(input: StageHistoricSqlAggregatedSnapshotInput): Promise<void> {
|
||||
const config = historicSqlUnifiedPullConfigSchema.parse(input.pullConfig);
|
||||
const enabledTableFilter = buildEnabledTableFilter(config.enabledTables);
|
||||
const redactors = compileHistoricSqlRedactionPatterns(config.redactionPatterns);
|
||||
const now = input.now ?? new Date();
|
||||
const windowStart = new Date(now.getTime() - historicSqlWindowDays(config) * 24 * 60 * 60 * 1000);
|
||||
const probe = await input.reader.probe(input.queryClient);
|
||||
const snapshot: AggregatedTemplate[] = [];
|
||||
let snapshotRowCount = 0;
|
||||
|
||||
for await (const row of input.reader.fetchAggregated(input.queryClient, { start: windowStart, end: now }, config)) {
|
||||
snapshotRowCount += 1;
|
||||
const parsed = aggregatedTemplateSchema.parse(row);
|
||||
if (!shouldDropTemplate(parsed, config)) {
|
||||
snapshot.push(parsed);
|
||||
}
|
||||
}
|
||||
|
||||
const analysis = await input.sqlAnalysis.analyzeBatch(
|
||||
snapshot.map((template) => ({ id: template.templateId, sql: template.canonicalSql })),
|
||||
config.dialect,
|
||||
);
|
||||
const warnings: string[] = [];
|
||||
const parsedTemplates: ParsedTemplate[] = [];
|
||||
for (const template of snapshot) {
|
||||
const parsed = analysis.get(template.templateId);
|
||||
if (!parsed || parsed.error) {
|
||||
warnings.push(`parse_failed:${template.templateId}`);
|
||||
continue;
|
||||
}
|
||||
const tablesTouched = [...new Set(parsed.tablesTouched)].filter((table) => table.length > 0).sort();
|
||||
const includedTables = tablesTouched.filter((table) => isEnabledTable(table, enabledTableFilter));
|
||||
if (includedTables.length === 0) {
|
||||
continue;
|
||||
}
|
||||
parsedTemplates.push({
|
||||
template: redactTemplateSql(template, redactors),
|
||||
tablesTouched,
|
||||
includedTables,
|
||||
columnsByClause: Object.fromEntries(
|
||||
Object.entries(parsed.columnsByClause).map(([clause, columns]) => [clause, [...new Set(columns)].sort()]),
|
||||
),
|
||||
});
|
||||
}
|
||||
|
||||
const byTable = new Map<string, TableAccumulator>();
|
||||
for (const parsed of parsedTemplates) {
|
||||
for (const table of parsed.includedTables) {
|
||||
const acc = byTable.get(table) ?? accumulatorFor(table);
|
||||
addTemplate(acc, parsed);
|
||||
byTable.set(table, acc);
|
||||
}
|
||||
}
|
||||
|
||||
await mkdir(input.stagedDir, { recursive: true });
|
||||
for (const [table, acc] of [...byTable.entries()].sort(([left], [right]) => left.localeCompare(right))) {
|
||||
await writeJson(input.stagedDir, `tables/${table}.json`, toStagedTable(acc, now));
|
||||
}
|
||||
const patternsInput = toPatternsInput(parsedTemplates);
|
||||
const patternInputSplit = splitHistoricSqlPatternInputs(patternsInput);
|
||||
const allWarnings = [...warnings, ...patternInputSplit.warnings];
|
||||
await writeJson(input.stagedDir, 'patterns-input.json', patternInputSplit.auditInput);
|
||||
for (const shard of patternInputSplit.shards) {
|
||||
await writeJson(input.stagedDir, shard.path, shard.input);
|
||||
}
|
||||
await writeJson(input.stagedDir, 'manifest.json', {
|
||||
source: HISTORIC_SQL_SOURCE_KEY,
|
||||
connectionId: input.connectionId,
|
||||
dialect: config.dialect,
|
||||
fetchedAt: now.toISOString(),
|
||||
windowStart: windowStart.toISOString(),
|
||||
windowEnd: now.toISOString(),
|
||||
snapshotRowCount,
|
||||
touchedTableCount: byTable.size,
|
||||
parseFailures: allWarnings.filter((warning) => warning.startsWith('parse_failed:')).length,
|
||||
warnings: allWarnings,
|
||||
probeWarnings: probe.warnings,
|
||||
staleArchiveAfterDays: config.staleArchiveAfterDays,
|
||||
});
|
||||
}
|
||||
|
|
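Note on the service-account filter in the staging file above: the three modes are easy to misread, so here is a minimal standalone sketch of the decision as it appears in `shouldDropByUsers`. The function name `shouldDropByServiceAccounts` and the flattened parameter list are assumptions made for illustration; the real helper takes the template and the parsed config.

```typescript
// Illustrative sketch only: a flattened version of the service-account
// drop decision. The function name and signature are hypothetical.
type FilterMode = 'exclude' | 'include' | 'mark-only';

interface UserExecutions {
  user: string | null;
  executions: number;
}

function shouldDropByServiceAccounts(
  topUsers: UserExecutions[],
  patterns: string[],
  mode: FilterMode,
): boolean {
  // mark-only never drops, and without patterns there is nothing to match.
  if (mode === 'mark-only' || patterns.length === 0) return false;
  const compiled = patterns.map((pattern) => new RegExp(pattern));
  const isService = (user: string | null): boolean =>
    !!user && compiled.some((re) => re.test(user));

  let matching = 0;
  let total = 0;
  for (const entry of topUsers) {
    total += entry.executions;
    if (isService(entry.user)) matching += entry.executions;
  }
  // A template is "service only" when every recorded execution came from a matching user.
  const serviceOnly = total > 0 && matching >= total;
  // exclude: drop service-only templates; include: drop everything else.
  return mode === 'exclude' ? serviceOnly : !serviceOnly;
}
```

One consequence worth checking against intent: with an empty `topUsers` list the template is never service-only, so under this logic `exclude` keeps it and `include` drops it.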
@ -0,0 +1,110 @@
|
|||
import { describe, expect, it } from 'vitest';
|
||||
import {
|
||||
aggregatedTemplateSchema,
|
||||
historicSqlUnifiedPullConfigSchema,
|
||||
stagedManifestSchema,
|
||||
stagedPatternsInputSchema,
|
||||
stagedTableInputSchema,
|
||||
} from './types.js';
|
||||
|
||||
describe('historic-sql unified contracts', () => {
|
||||
it('parses minExecutions and service-account filters', () => {
|
||||
expect(historicSqlUnifiedPullConfigSchema.parse({ dialect: 'postgres', minExecutions: 9 })).toMatchObject({
|
||||
dialect: 'postgres',
|
||||
minExecutions: 9,
|
||||
redactionPatterns: [],
|
||||
staleArchiveAfterDays: 90,
|
||||
});
|
||||
expect(historicSqlUnifiedPullConfigSchema.parse({ dialect: 'postgres', minExecutions: 9 })).not.toHaveProperty(
|
||||
'windowDays',
|
||||
);
|
||||
expect(historicSqlUnifiedPullConfigSchema.parse({ dialect: 'postgres', minExecutions: 9 })).not.toHaveProperty(
|
||||
'concurrency',
|
||||
);
|
||||
|
||||
const parsed = historicSqlUnifiedPullConfigSchema.parse({
|
||||
dialect: 'postgres',
|
||||
minExecutions: 7,
|
||||
filters: {
|
||||
serviceAccounts: { patterns: ['^svc_'], mode: 'exclude' },
|
||||
},
|
||||
});
|
||||
expect(parsed.minExecutions).toBe(7);
|
||||
expect(parsed.filters.serviceAccounts).toEqual({ patterns: ['^svc_'], mode: 'exclude' });
|
||||
});
|
||||
|
||||
it('validates aggregate templates from warehouse readers', () => {
|
||||
const parsed = aggregatedTemplateSchema.parse({
|
||||
templateId: 'pg:123',
|
||||
canonicalSql: 'select status, count(*) from public.orders group by status',
|
||||
dialect: 'postgres',
|
||||
stats: {
|
||||
executions: 42,
|
||||
distinctUsers: 3,
|
||||
firstSeen: '2026-05-01T00:00:00.000Z',
|
||||
lastSeen: '2026-05-11T00:00:00.000Z',
|
||||
p50RuntimeMs: 12.5,
|
||||
p95RuntimeMs: 40,
|
||||
errorRate: 0,
|
||||
rowsProduced: 100,
|
||||
},
|
||||
topUsers: [{ user: 'analyst', executions: 40 }],
|
||||
});
|
||||
|
||||
expect(parsed.templateId).toBe('pg:123');
|
||||
expect(parsed.topUsers).toEqual([{ user: 'analyst', executions: 40 }]);
|
||||
});
|
||||
|
||||
it('validates staged table, patterns, and manifest artifacts', () => {
|
||||
expect(
|
||||
stagedTableInputSchema.parse({
|
||||
table: 'public.orders',
|
||||
stats: {
|
||||
executionsBucket: '10-100',
|
||||
distinctUsersBucket: '2-5',
|
||||
errorRateBucket: 'none',
|
||||
p95RuntimeBucket: '<100ms',
|
||||
recencyBucket: 'current',
|
||||
},
|
||||
columnsByClause: {
|
||||
select: [['status', 'high']],
|
||||
where: [['created_at', 'mid']],
|
||||
},
|
||||
observedJoins: [{ withTable: 'public.customers', on: ['customer_id'], freq: 'high' }],
|
||||
topTemplates: [{ id: 'pg:123', canonicalSql: 'select * from public.orders', topUsers: [{ user: 'analyst' }] }],
|
||||
}).table,
|
||||
).toBe('public.orders');
|
||||
|
||||
expect(
|
||||
stagedPatternsInputSchema.parse({
|
||||
templates: [
|
||||
{
|
||||
id: 'pg:123',
|
||||
canonicalSql: 'select * from public.orders',
|
||||
tablesTouched: ['public.orders'],
|
||||
executionsBucket: '10-100',
|
||||
distinctUsersBucket: '2-5',
|
||||
dialect: 'postgres',
|
||||
},
|
||||
],
|
||||
}).templates,
|
||||
).toHaveLength(1);
|
||||
|
||||
expect(
|
||||
stagedManifestSchema.parse({
|
||||
source: 'historic-sql',
|
||||
connectionId: 'warehouse',
|
||||
dialect: 'postgres',
|
||||
fetchedAt: '2026-05-11T00:00:00.000Z',
|
||||
windowStart: '2026-02-10T00:00:00.000Z',
|
||||
windowEnd: '2026-05-11T00:00:00.000Z',
|
||||
snapshotRowCount: 2,
|
||||
touchedTableCount: 1,
|
||||
parseFailures: 1,
|
||||
warnings: ['parse_failed:bad'],
|
||||
probeWarnings: [],
|
||||
staleArchiveAfterDays: 90,
|
||||
}).staleArchiveAfterDays,
|
||||
).toBe(90);
|
||||
});
|
||||
});
|
||||
153
packages/cli/src/context/ingest/adapters/historic-sql/types.ts
Normal file
|
|
@ -0,0 +1,153 @@
|
|||
import { z } from 'zod';
|
||||
import type { SqlAnalysisPort } from '../../../../context/sql-analysis/ports.js';
|
||||
|
||||
export const HISTORIC_SQL_SOURCE_KEY = 'historic-sql' as const;
|
||||
|
||||
const historicSqlDialectSchema = z.enum(['snowflake', 'bigquery', 'postgres']);
|
||||
export type HistoricSqlDialect = z.infer<typeof historicSqlDialectSchema>;
|
||||
|
||||
const filterModeSchema = z.enum(['exclude', 'include', 'mark-only']);
|
||||
|
||||
const historicSqlCommonPullConfigSchema = z.object({
|
||||
minExecutions: z.number().int().nonnegative().default(5),
|
||||
enabledTables: z.array(z.string().min(1)).default([]),
|
||||
filters: z.object({
|
||||
serviceAccounts: z.object({
|
||||
patterns: z.array(z.string()).default([]),
|
||||
mode: filterModeSchema.default('exclude'),
|
||||
}).optional(),
|
||||
orchestrators: z.object({
|
||||
mode: filterModeSchema.default('mark-only'),
|
||||
}).optional(),
|
||||
dropTrivialProbes: z.boolean().default(true),
|
||||
dropFailedBelow: z.object({
|
||||
errorRate: z.number().min(0).max(1),
|
||||
executions: z.number().int().nonnegative(),
|
||||
}).optional(),
|
||||
}).default({ dropTrivialProbes: true }),
|
||||
redactionPatterns: z.array(z.string()).default([]),
|
||||
staleArchiveAfterDays: z.number().int().positive().default(90),
|
||||
});
|
||||
|
||||
const historicSqlWindowedPullConfigSchema = historicSqlCommonPullConfigSchema.extend({
|
||||
dialect: z.enum(['snowflake', 'bigquery']),
|
||||
windowDays: z.number().int().positive().default(90),
|
||||
});
|
||||
|
||||
const historicSqlPostgresPullConfigSchema = historicSqlCommonPullConfigSchema.extend({
|
||||
dialect: z.literal('postgres'),
|
||||
});
|
||||
|
||||
export const historicSqlUnifiedPullConfigSchema = z.discriminatedUnion('dialect', [
|
||||
historicSqlWindowedPullConfigSchema,
|
||||
historicSqlPostgresPullConfigSchema,
|
||||
]);
|
||||
|
||||
export type HistoricSqlUnifiedPullConfig = z.infer<typeof historicSqlUnifiedPullConfigSchema>;
|
||||
|
||||
export const aggregatedTemplateSchema = z.object({
|
||||
templateId: z.string().min(1),
|
||||
canonicalSql: z.string().min(1),
|
||||
dialect: historicSqlDialectSchema,
|
||||
stats: z.object({
|
||||
executions: z.number().int().nonnegative(),
|
||||
distinctUsers: z.number().int().nonnegative(),
|
||||
firstSeen: z.iso.datetime(),
|
||||
lastSeen: z.iso.datetime(),
|
||||
p50RuntimeMs: z.number().nonnegative().nullable(),
|
||||
p95RuntimeMs: z.number().nonnegative().nullable(),
|
||||
errorRate: z.number().min(0).max(1),
|
||||
rowsProduced: z.number().int().nonnegative().nullable(),
|
||||
}),
|
||||
topUsers: z.array(z.object({
|
||||
user: z.string().nullable(),
|
||||
executions: z.number().int().nonnegative(),
|
||||
})).default([]),
|
||||
});
|
||||
export type AggregatedTemplate = z.infer<typeof aggregatedTemplateSchema>;
|
||||
|
||||
export const stagedTableInputSchema = z.object({
|
||||
table: z.string().min(1),
|
||||
stats: z.object({
|
||||
executionsBucket: z.string(),
|
||||
distinctUsersBucket: z.string(),
|
||||
errorRateBucket: z.string(),
|
||||
p95RuntimeBucket: z.string(),
|
||||
recencyBucket: z.string(),
|
||||
}),
|
||||
columnsByClause: z.record(z.string(), z.array(z.tuple([z.string(), z.string()]))),
|
||||
observedJoins: z.array(z.object({
|
||||
withTable: z.string(),
|
||||
on: z.array(z.string()),
|
||||
freq: z.string(),
|
||||
})),
|
||||
topTemplates: z.array(z.object({
|
||||
id: z.string(),
|
||||
canonicalSql: z.string(),
|
||||
topUsers: z.array(z.object({ user: z.string().nullable() })),
|
||||
})),
|
||||
});
|
||||
export type StagedTableInput = z.infer<typeof stagedTableInputSchema>;
|
||||
|
||||
export const stagedPatternsInputSchema = z.object({
|
||||
templates: z.array(z.object({
|
||||
id: z.string(),
|
||||
canonicalSql: z.string(),
|
||||
tablesTouched: z.array(z.string()),
|
||||
executionsBucket: z.string(),
|
||||
distinctUsersBucket: z.string(),
|
||||
dialect: historicSqlDialectSchema,
|
||||
})),
|
||||
});
|
||||
export type StagedPatternsInput = z.infer<typeof stagedPatternsInputSchema>;
|
||||
|
||||
export const stagedManifestSchema = z.object({
|
||||
source: z.literal(HISTORIC_SQL_SOURCE_KEY),
|
||||
connectionId: z.string().min(1),
|
||||
dialect: historicSqlDialectSchema,
|
||||
fetchedAt: z.iso.datetime(),
|
||||
windowStart: z.iso.datetime(),
|
||||
windowEnd: z.iso.datetime(),
|
||||
snapshotRowCount: z.number().int().nonnegative(),
|
||||
touchedTableCount: z.number().int().nonnegative(),
|
||||
parseFailures: z.number().int().nonnegative(),
|
||||
warnings: z.array(z.string()),
|
||||
probeWarnings: z.array(z.string()),
|
||||
staleArchiveAfterDays: z.number().int().positive().default(90),
|
||||
});
|
||||
|
||||
interface HistoricSqlProbeResult {
|
||||
warnings: string[];
|
||||
info?: string[];
|
||||
}
|
||||
|
||||
export interface HistoricSqlReader {
|
||||
probe(client: unknown): Promise<HistoricSqlProbeResult>;
|
||||
fetchAggregated(
|
||||
client: unknown,
|
||||
window: HistoricSqlTimeWindow,
|
||||
config: HistoricSqlUnifiedPullConfig,
|
||||
): AsyncIterable<AggregatedTemplate>;
|
||||
}
|
||||
|
||||
export interface HistoricSqlTimeWindow {
|
||||
start: Date;
|
||||
end: Date;
|
||||
}
|
||||
|
||||
export interface KtxPostgresQueryClient {
|
||||
executeQuery(sql: string, params?: unknown[]): Promise<{ headers: string[]; rows: unknown[][]; totalRows?: number }>;
|
||||
}
|
||||
|
||||
export interface PostgresPgssProbeResult extends HistoricSqlProbeResult {
|
||||
pgServerVersion: string;
|
||||
warnings: string[];
|
||||
info: string[];
|
||||
}
|
||||
|
||||
export interface HistoricSqlSourceAdapterDeps {
|
||||
sqlAnalysis: SqlAnalysisPort;
|
||||
reader: HistoricSqlReader;
|
||||
queryClient: unknown;
|
||||
now?: () => Date;
|
||||
}
|
||||
|
|
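Note on the config union in `types.ts` above: only the Snowflake and BigQuery variants carry `windowDays`, and the staging code falls back to 90 days for Postgres. A minimal sketch of how the fetch window could be derived from such a union follows. The `PullConfigLike` type and `resolveWindow` function are hypothetical simplifications; the real config has more fields and is parsed with zod.

```typescript
// Illustrative sketch only: PullConfigLike and resolveWindow are hypothetical,
// reduced to the fields needed to compute the fetch window.
type PullConfigLike =
  | { dialect: 'snowflake' | 'bigquery'; windowDays: number }
  | { dialect: 'postgres' };

interface TimeWindow {
  start: Date;
  end: Date;
}

const DEFAULT_WINDOW_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function resolveWindow(config: PullConfigLike, now: Date): TimeWindow {
  // The `in` check narrows the union to the windowed variants.
  const days = 'windowDays' in config ? config.windowDays : DEFAULT_WINDOW_DAYS;
  return { start: new Date(now.getTime() - days * MS_PER_DAY), end: now };
}

const now = new Date('2026-05-11T12:00:00.000Z');
const postgresWindow = resolveWindow({ dialect: 'postgres' }, now);
const snowflakeWindow = resolveWindow({ dialect: 'snowflake', windowDays: 30 }, now);
console.log(postgresWindow.start.toISOString()); // 2026-02-10T12:00:00.000Z
console.log(snowflakeWindow.start.toISOString()); // 2026-04-11T12:00:00.000Z
```

The arithmetic is done in UTC milliseconds, so the window is a fixed number of 24-hour days and is unaffected by local daylight-saving shifts.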
@ -0,0 +1,107 @@
|
|||
import { mkdtemp } from 'node:fs/promises';
|
||||
import { tmpdir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
import { describe, expect, it } from 'vitest';
|
||||
import type { KtxSchemaSnapshot } from '../../../scan/types.js';
|
||||
import { chunkLiveDatabaseStagedDir } from './chunk.js';
|
||||
import { liveDatabaseTablePath, writeLiveDatabaseSnapshot } from './stage.js';
|
||||
|
||||
function snapshot(): KtxSchemaSnapshot {
|
||||
return {
|
||||
connectionId: 'conn-1',
|
||||
driver: 'postgres',
|
||||
extractedAt: '2026-04-27T00:00:00.000Z',
|
||||
scope: { schemas: ['public'] },
|
||||
metadata: {},
|
||||
tables: [
|
||||
{
|
||||
name: 'orders',
|
||||
catalog: null,
|
||||
db: 'public',
|
||||
kind: 'table',
|
||||
comment: null,
|
||||
estimatedRows: null,
|
||||
columns: [
|
||||
{
|
||||
name: 'id',
|
||||
            nativeType: 'integer',
            normalizedType: 'integer',
            dimensionType: 'number',
            nullable: false,
            primaryKey: true,
            comment: null,
          },
        ],
        foreignKeys: [],
      },
      {
        name: 'customers',
        catalog: null,
        db: 'public',
        kind: 'table',
        comment: null,
        estimatedRows: null,
        columns: [
          {
            name: 'id',
            nativeType: 'integer',
            normalizedType: 'integer',
            dimensionType: 'number',
            nullable: false,
            primaryKey: true,
            comment: null,
          },
        ],
        foreignKeys: [],
      },
    ],
  };
}

describe('chunkLiveDatabaseStagedDir', () => {
  it('emits one work unit per table on the first run', async () => {
    const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-chunk-'));
    await writeLiveDatabaseSnapshot(dir, snapshot());

    const result = await chunkLiveDatabaseStagedDir(dir);
    expect(result.workUnits.map((wu) => wu.unitKey)).toEqual([
      'live-database-public-customers',
      'live-database-public-orders',
    ]);
    expect(result.workUnits[0]?.dependencyPaths).toEqual(['connection.json', 'foreign-keys.json']);
    expect(result.workUnits[0]?.peerFileIndex).toContain(
      liveDatabaseTablePath({ catalog: null, db: 'public', name: 'orders' }),
    );
  });

  it('keeps only changed tables during incremental syncs and records table evictions', async () => {
    const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-diff-'));
    await writeLiveDatabaseSnapshot(dir, snapshot());
    const ordersPath = liveDatabaseTablePath({ catalog: null, db: 'public', name: 'orders' });
    const customersPath = liveDatabaseTablePath({ catalog: null, db: 'public', name: 'customers' });

    const result = await chunkLiveDatabaseStagedDir(dir, {
      added: [],
      modified: [ordersPath],
      deleted: [customersPath],
      unchanged: ['connection.json', 'foreign-keys.json'],
    });

    expect(result.workUnits.map((wu) => wu.unitKey)).toEqual(['live-database-public-orders']);
    expect(result.eviction?.deletedRawPaths).toEqual([customersPath]);
  });

  it('fans out all table work units when the foreign-key index changes', async () => {
    const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-fk-'));
    await writeLiveDatabaseSnapshot(dir, snapshot());

    const result = await chunkLiveDatabaseStagedDir(dir, {
      added: [],
      modified: ['foreign-keys.json'],
      deleted: [],
      unchanged: [],
    });

    expect(result.workUnits).toHaveLength(2);
  });
});
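The three cases exercised by the tests above (first run, incremental sync, changed global file) can be sketched as a standalone selection function. This is an illustrative sketch only, not the repository's implementation: `selectTablePaths`, the `DiffLike` type, and the `tables/...` paths are hypothetical names chosen for the example, while `connection.json` and `foreign-keys.json` are taken from the test assertions.

```typescript
// Hypothetical, simplified stand-in for the DiffSet type used by the tests.
interface DiffLike {
  added: string[];
  modified: string[];
  deleted: string[];
  unchanged: string[];
}

// Files every table work unit depends on (names taken from the test assertions).
const GLOBAL_FILES = ['connection.json', 'foreign-keys.json'];

// Returns the table paths that need a work unit for this run.
function selectTablePaths(allTablePaths: string[], diff?: DiffLike): string[] {
  // First run: no diff is available, so every table is processed.
  if (!diff) return [...allTablePaths];
  const touched = new Set([...diff.added, ...diff.modified]);
  // A change to any global file invalidates every table.
  if (GLOBAL_FILES.some((file) => touched.has(file))) return [...allTablePaths];
  // Otherwise keep only tables whose own file was added or modified.
  return allTablePaths.filter((path) => touched.has(path));
}

// Illustrative paths; the real layout is produced by liveDatabaseTablePath.
const tables = ['tables/public.customers.json', 'tables/public.orders.json'];
const incremental = selectTablePaths(tables, {
  added: [],
  modified: ['tables/public.orders.json'],
  deleted: ['tables/public.customers.json'],
  unchanged: GLOBAL_FILES,
});
console.log(incremental); // only the orders path is selected
```

Deleted paths are deliberately not handled here; in the tests they surface separately through `eviction.deletedRawPaths`.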
@@ -0,0 +1,58 @@
import type { ChunkResult, DiffSet, WorkUnit } from '../../types.js';
import type { KtxSchemaTable } from '../../../scan/types.js';
import { LIVE_DATABASE_FOREIGN_KEYS_FILE, LIVE_DATABASE_META_FILE, readLiveDatabaseTableFiles } from './stage.js';

function unitKey(table: KtxSchemaTable): string {
  const parts = [table.catalog, table.db, table.name]
    .filter((part): part is string => typeof part === 'string' && part.length > 0)
    .map((part) =>
      part
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, '-')
        .replace(/^-+|-+$/g, ''),
    )
    .filter(Boolean);
  return `live-database-${parts.join('-') || 'table'}`;
}

function displayName(table: KtxSchemaTable): string {
  return [table.catalog, table.db, table.name].filter(Boolean).join('.');
}

function isTablePath(path: string): boolean {
  return path.startsWith('tables/') && path.endsWith('.json');
}

export async function chunkLiveDatabaseStagedDir(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
  const tableFiles = await readLiveDatabaseTableFiles(stagedDir);
  const allTablePaths = tableFiles.map((file) => file.path);
  const globalDeps = [LIVE_DATABASE_META_FILE, LIVE_DATABASE_FOREIGN_KEYS_FILE];
  const touched = diffSet ? new Set([...diffSet.added, ...diffSet.modified]) : null;
  const globalTouched = Boolean(
    touched && (touched.has(LIVE_DATABASE_META_FILE) || touched.has(LIVE_DATABASE_FOREIGN_KEYS_FILE)),
  );

  const workUnits: WorkUnit[] = [];
  for (const file of tableFiles) {
    if (touched && !globalTouched && !touched.has(file.path)) {
      continue;
    }
    const peers = allTablePaths.filter((path) => path !== file.path).sort();
    workUnits.push({
      unitKey: unitKey(file.table),
      displayLabel: `Live database table ${displayName(file.table)}`,
      rawFiles: [file.path],
      peerFileIndex: peers,
      dependencyPaths: globalDeps,
      notes: `Database catalog snapshot for ${displayName(file.table)} with ${file.table.columns.length} column${
        file.table.columns.length === 1 ? '' : 's'
      }.`,
    });
  }

  const deletedRawPaths = diffSet ? diffSet.deleted.filter(isTablePath).sort() : [];
  return {
    workUnits,
    ...(deletedRawPaths.length > 0 ? { eviction: { deletedRawPaths } } : {}),
  };
}
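To make the slug rules in `unitKey` concrete, here is a standalone sketch that copies its logic onto a minimal table shape. `TableRef` and `sketchUnitKey` are hypothetical names used only for this illustration, and the sample tables other than `public.orders` are invented.

```typescript
// Hypothetical minimal shape; the real KtxSchemaTable carries more fields.
interface TableRef {
  catalog: string | null;
  db: string | null;
  name: string;
}

// Same steps as unitKey above: drop empty parts, lowercase, collapse every run of
// non-alphanumeric characters to a single '-', trim leading/trailing '-', then join.
function sketchUnitKey(table: TableRef): string {
  const parts = [table.catalog, table.db, table.name]
    .filter((part): part is string => typeof part === 'string' && part.length > 0)
    .map((part) =>
      part
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, '-')
        .replace(/^-+|-+$/g, ''),
    )
    .filter(Boolean);
  return `live-database-${parts.join('-') || 'table'}`;
}

console.log(sketchUnitKey({ catalog: null, db: 'public', name: 'orders' }));
// live-database-public-orders
console.log(sketchUnitKey({ catalog: 'ANALYTICS', db: 'CORE', name: 'Order Lines!' }));
// live-database-analytics-core-order-lines
console.log(sketchUnitKey({ catalog: null, db: null, name: '???' }));
// live-database-table (nothing alphanumeric survives, so the fallback is used)
```

One consequence worth keeping in mind: names that differ only in punctuation or case, such as `order_lines` and `order-lines`, map to the same key under these rules.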
@@ -0,0 +1,255 @@
import { once } from 'node:events';
import { createServer } from 'node:http';
import { describe, expect, it, vi } from 'vitest';
import { createDaemonLiveDatabaseIntrospection } from './daemon-introspection.js';

const daemonResponse = {
  connection_id: 'warehouse',
  extracted_at: '2026-04-28T10:00:00+00:00',
  metadata: { driver: 'postgres', schemas: ['public'] },
  tables: [
    {
      catalog: 'warehouse',
      db: 'public',
      name: 'customers',
      comment: null,
      columns: [{ name: 'id', type: 'integer', nullable: false, primary_key: true, comment: null }],
      foreign_keys: [],
    },
    {
      catalog: 'warehouse',
      db: 'public',
      name: 'orders',
      comment: 'Order facts',
      columns: [
        { name: 'id', type: 'integer', nullable: false, primary_key: true, comment: 'Order id' },
        { name: 'customer_id', type: 'integer', nullable: false, primary_key: false, comment: null },
      ],
      foreign_keys: [
        {
          from_column: 'customer_id',
          to_table: 'customers',
          to_column: 'id',
          constraint_name: 'orders_customer_id_fkey',
        },
      ],
    },
  ],
};

describe('createDaemonLiveDatabaseIntrospection', () => {
  it('calls the database-introspect daemon command and maps the snapshot response', async () => {
    const runJson = vi.fn(async () => daemonResponse);
    const introspection = createDaemonLiveDatabaseIntrospection({
      connections: {
        warehouse: {
          driver: 'postgres',
          url: 'postgres://localhost:5432/warehouse',
        },
      },
      schemas: ['public'],
      runJson,
    });

    await expect(introspection.extractSchema('warehouse')).resolves.toEqual({
      connectionId: 'warehouse',
      driver: 'postgres',
      extractedAt: '2026-04-28T10:00:00+00:00',
      scope: { schemas: ['public'] },
      metadata: { driver: 'postgres', schemas: ['public'] },
      tables: [
        {
          catalog: 'warehouse',
          db: 'public',
          name: 'customers',
          kind: 'table',
          comment: null,
          estimatedRows: null,
          columns: [
            {
              name: 'id',
              nativeType: 'integer',
              normalizedType: 'integer',
              dimensionType: 'number',
              nullable: false,
              primaryKey: true,
              comment: null,
            },
          ],
          foreignKeys: [],
        },
        {
          catalog: 'warehouse',
          db: 'public',
          name: 'orders',
          kind: 'table',
          comment: 'Order facts',
          estimatedRows: null,
          columns: [
            {
              name: 'id',
              nativeType: 'integer',
              normalizedType: 'integer',
              dimensionType: 'number',
              nullable: false,
              primaryKey: true,
              comment: 'Order id',
            },
            {
              name: 'customer_id',
              nativeType: 'integer',
              normalizedType: 'integer',
              dimensionType: 'number',
              nullable: false,
              primaryKey: false,
              comment: null,
            },
          ],
          foreignKeys: [
            {
              fromColumn: 'customer_id',
              toCatalog: null,
              toDb: null,
              toTable: 'customers',
              toColumn: 'id',
              constraintName: 'orders_customer_id_fkey',
            },
          ],
        },
      ],
    });

    expect(runJson).toHaveBeenCalledWith('database-introspect', {
      connection_id: 'warehouse',
      driver: 'postgres',
      url: 'postgres://localhost:5432/warehouse',
      schemas: ['public'],
      statement_timeout_ms: 30_000,
      connection_timeout_seconds: 5,
    });
  });

  it('calls a running daemon HTTP endpoint when baseUrl is configured', async () => {
    const requests: Array<{ url: string | undefined; body: unknown }> = [];
    const server = createServer((request, response) => {
      const chunks: Buffer[] = [];
      request.on('data', (chunk: Buffer) => chunks.push(chunk));
      request.on('end', () => {
        requests.push({
          url: request.url,
          body: JSON.parse(Buffer.concat(chunks).toString('utf8')),
        });
        response.writeHead(200, { 'content-type': 'application/json' });
        response.end(JSON.stringify(daemonResponse));
      });
    });

    server.listen(0, '127.0.0.1');
    await once(server, 'listening');
    try {
      const address = server.address();
      if (!address || typeof address === 'string') {
        throw new Error('expected TCP server address');
      }
      const introspection = createDaemonLiveDatabaseIntrospection({
        connections: {
          warehouse: {
            driver: 'postgresql',
            url: 'postgres://localhost:5432/warehouse',
          },
        },
        baseUrl: `http://127.0.0.1:${address.port}`,
      });

      await expect(introspection.extractSchema('warehouse')).resolves.toMatchObject({
        connectionId: 'warehouse',
        tables: [{ name: 'customers' }, { name: 'orders' }],
      });

      expect(requests).toEqual([
        {
          url: '/database/introspect',
          body: {
            connection_id: 'warehouse',
            driver: 'postgres',
            url: 'postgres://localhost:5432/warehouse',
            schemas: ['public'],
            statement_timeout_ms: 30_000,
            connection_timeout_seconds: 5,
          },
        },
      ]);
    } finally {
      server.close();
    }
  });

  it('requires a configured postgres connection with a url', async () => {
    const introspection = createDaemonLiveDatabaseIntrospection({
      connections: {
        warehouse: {
          driver: 'postgres',
        },
      },
      runJson: vi.fn(async () => daemonResponse),
    });

    await expect(introspection.extractSchema('warehouse')).rejects.toThrow(
      'Local live-database ingest requires connections.warehouse.url.',
    );
  });

  it('rejects unsupported local connection drivers before calling the daemon', async () => {
    const runJson = vi.fn(async () => daemonResponse);
    const introspection = createDaemonLiveDatabaseIntrospection({
      connections: {
        warehouse: {
          driver: 'snowflake',
          url: 'snowflake://example',
        },
      },
      runJson,
    });

    await expect(introspection.extractSchema('warehouse')).rejects.toThrow(
      'Local live-database ingest cannot run driver "snowflake".',
    );
    expect(runJson).not.toHaveBeenCalled();
  });

  it('filters out tables not on the enabled_tables allowlist', async () => {
    const runJson = vi.fn(async () => daemonResponse);
    const introspection = createDaemonLiveDatabaseIntrospection({
      connections: {
        warehouse: {
          driver: 'postgres',
          url: 'postgres://localhost:5432/warehouse',
          enabled_tables: ['public.orders'],
        },
      },
      schemas: ['public'],
      runJson,
    });

    const snapshot = await introspection.extractSchema('warehouse');
    expect(snapshot.tables.map((table) => `${table.db}.${table.name}`)).toEqual(['public.orders']);
  });

  it('passes through every table when enabled_tables is omitted or empty', async () => {
    const runJson = vi.fn(async () => daemonResponse);
    const introspection = createDaemonLiveDatabaseIntrospection({
      connections: {
        warehouse: {
          driver: 'postgres',
          url: 'postgres://localhost:5432/warehouse',
          enabled_tables: [],
        },
      },
      schemas: ['public'],
      runJson,
    });

    const snapshot = await introspection.extractSchema('warehouse');
    expect(snapshot.tables.map((table) => table.name)).toEqual(['customers', 'orders']);
  });
});
@@ -0,0 +1,256 @@
import { spawn } from 'node:child_process';
import { request as httpRequest } from 'node:http';
import { request as httpsRequest } from 'node:https';
import { URL } from 'node:url';
import type { KtxProjectConnectionConfig } from '../../../project/config.js';
import { filterSnapshotTables, resolveEnabledTables } from '../../../scan/enabled-tables.js';
import type { KtxSchemaColumn, KtxSchemaForeignKey, KtxSchemaSnapshot, KtxSchemaTable } from '../../../scan/types.js';
import { inferKtxDimensionType, normalizeKtxNativeType } from '../../../scan/type-normalization.js';
import type { LiveDatabaseIntrospectionPort } from './types.js';

type KtxDaemonDatabaseIntrospectionCommand = 'database-introspect';

type KtxDaemonDatabaseJsonRunner = (
  subcommand: KtxDaemonDatabaseIntrospectionCommand,
  payload: Record<string, unknown>,
) => Promise<Record<string, unknown>>;

export type KtxDaemonDatabaseHttpJsonRunner = (
  path: string,
  payload: Record<string, unknown>,
) => Promise<Record<string, unknown>>;

export interface DaemonLiveDatabaseIntrospectionOptions {
  connections: Record<string, KtxProjectConnectionConfig>;
  schemas?: string[];
  statementTimeoutMs?: number;
  connectionTimeoutSeconds?: number;
  command?: string;
  args?: string[];
  cwd?: string;
  env?: NodeJS.ProcessEnv;
  baseUrl?: string;
  runJson?: KtxDaemonDatabaseJsonRunner;
  requestJson?: KtxDaemonDatabaseHttpJsonRunner;
  now?: () => Date;
}

const DEFAULT_SCHEMAS = ['public'];

function parseJsonObject(raw: string, subcommand: string): Record<string, unknown> {
  const parsed = JSON.parse(raw) as unknown;
  if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) {
    throw new Error(`ktx-daemon ${subcommand} returned non-object JSON`);
  }
  return parsed as Record<string, unknown>;
}

function runProcessJson(
  options: Required<Pick<DaemonLiveDatabaseIntrospectionOptions, 'command' | 'args'>> &
    Pick<DaemonLiveDatabaseIntrospectionOptions, 'cwd' | 'env'>,
): KtxDaemonDatabaseJsonRunner {
  return async (subcommand, payload) =>
    new Promise((resolve, reject) => {
      const child = spawn(options.command, [...options.args, subcommand], {
        cwd: options.cwd,
        env: { ...process.env, ...options.env },
        stdio: ['pipe', 'pipe', 'pipe'],
      });
      const stdout: Buffer[] = [];
      const stderr: Buffer[] = [];

      child.stdout.on('data', (chunk: Buffer) => stdout.push(chunk));
      child.stderr.on('data', (chunk: Buffer) => stderr.push(chunk));
      child.on('error', reject);
      child.on('close', (code) => {
        const stdoutText = Buffer.concat(stdout).toString('utf8').trim();
        const stderrText = Buffer.concat(stderr).toString('utf8').trim();
        if (code !== 0) {
          reject(new Error(`ktx-daemon ${subcommand} failed: ${stderrText || `exit code ${code}`}`));
          return;
        }
        try {
          resolve(parseJsonObject(stdoutText, subcommand));
        } catch (error) {
          reject(error);
        }
      });
      child.stdin.end(`${JSON.stringify(payload)}\n`);
    });
}

function normalizedBaseUrl(baseUrl: string): string {
  return baseUrl.endsWith('/') ? baseUrl : `${baseUrl}/`;
}

function postJson(baseUrl: string): KtxDaemonDatabaseHttpJsonRunner {
  return async (path, payload) =>
    new Promise((resolve, reject) => {
      const target = new URL(path.replace(/^\//, ''), normalizedBaseUrl(baseUrl));
      const body = JSON.stringify(payload);
      const client = target.protocol === 'https:' ? httpsRequest : httpRequest;
      const request = client(
        target,
        {
          method: 'POST',
          headers: {
            accept: 'application/json',
            'content-type': 'application/json',
            'content-length': Buffer.byteLength(body),
          },
        },
        (response) => {
          const chunks: Buffer[] = [];
          response.on('data', (chunk: Buffer) => chunks.push(chunk));
          response.on('end', () => {
            const text = Buffer.concat(chunks).toString('utf8');
            const statusCode = response.statusCode ?? 0;
            if (statusCode < 200 || statusCode >= 300) {
              reject(new Error(`ktx-daemon HTTP ${path} failed with ${statusCode}: ${text}`));
              return;
            }
            try {
              resolve(parseJsonObject(text, path));
            } catch (error) {
              reject(error);
            }
          });
        },
      );
      request.on('error', reject);
      request.end(body);
    });
}

function recordValue(value: unknown): Record<string, unknown> {
  return value && typeof value === 'object' && !Array.isArray(value) ? (value as Record<string, unknown>) : {};
}

function recordArray(value: unknown): Array<Record<string, unknown>> {
  return Array.isArray(value)
    ? value.filter(
        (item): item is Record<string, unknown> => item !== null && typeof item === 'object' && !Array.isArray(item),
      )
    : [];
}

function requiredString(value: unknown, field: string): string {
  if (typeof value !== 'string' || value.length === 0) {
    throw new Error(`ktx-daemon database introspection response is missing string field ${field}`);
  }
  return value;
}

function nullableString(value: unknown): string | null {
  return typeof value === 'string' ? value : null;
}

function optionalString(value: unknown): string | undefined {
  return typeof value === 'string' ? value : undefined;
}

function normalizeDriver(driver: unknown): string {
  const normalized = String(driver ?? '').trim().toLowerCase();
  return normalized === 'postgresql' ? 'postgres' : normalized;
}

function requirePostgresConnection(
  connections: Record<string, KtxProjectConnectionConfig>,
  connectionId: string,
): KtxProjectConnectionConfig & { url: string } {
  const connection = connections[connectionId];
  const driver = normalizeDriver(connection?.driver);
  if (driver !== 'postgres') {
    throw new Error(`Local live-database ingest cannot run driver "${connection?.driver ?? 'unknown'}".`);
  }
  if (typeof connection.url !== 'string' || connection.url.trim().length === 0) {
    throw new Error(`Local live-database ingest requires connections.${connectionId}.url.`);
  }
  return connection as KtxProjectConnectionConfig & { url: string };
}

function mapColumn(raw: Record<string, unknown>): KtxSchemaColumn {
  const nativeType = requiredString(raw.type, 'tables[].columns[].type');
  return {
    name: requiredString(raw.name, 'tables[].columns[].name'),
    nativeType,
    normalizedType: normalizeKtxNativeType(nativeType),
    dimensionType: inferKtxDimensionType(nativeType),
    nullable: raw.nullable !== false ? true : false,
    primaryKey: raw.primary_key === true,
    comment: nullableString(raw.comment),
  };
}

function mapForeignKey(raw: Record<string, unknown>): KtxSchemaForeignKey {
  return {
    fromColumn: requiredString(raw.from_column, 'tables[].foreign_keys[].from_column'),
    toCatalog: null,
    toDb: null,
    toTable: requiredString(raw.to_table, 'tables[].foreign_keys[].to_table'),
    toColumn: requiredString(raw.to_column, 'tables[].foreign_keys[].to_column'),
    constraintName: nullableString(raw.constraint_name),
  };
}

function mapTable(raw: Record<string, unknown>): KtxSchemaTable {
  return {
    catalog: nullableString(raw.catalog),
    db: nullableString(raw.db),
    name: requiredString(raw.name, 'tables[].name'),
    kind: 'table',
    comment: nullableString(raw.comment),
    estimatedRows: null,
    columns: recordArray(raw.columns).map(mapColumn),
    foreignKeys: recordArray(raw.foreign_keys).map(mapForeignKey),
  };
}

function mapDaemonSnapshot(
  raw: Record<string, unknown>,
  input: { connectionId: string; extractedAt: string; schemas: string[] },
): KtxSchemaSnapshot {
  return {
    connectionId: requiredString(raw.connection_id, 'connection_id') || input.connectionId,
    driver: 'postgres',
    extractedAt: optionalString(raw.extracted_at) ?? input.extractedAt,
    scope: { schemas: input.schemas },
    metadata: recordValue(raw.metadata),
    tables: recordArray(raw.tables).map(mapTable),
  };
}

export function createDaemonLiveDatabaseIntrospection(
  options: DaemonLiveDatabaseIntrospectionOptions,
): LiveDatabaseIntrospectionPort {
  const schemas = options.schemas ?? DEFAULT_SCHEMAS;
  const command = options.command ?? 'python';
  const args = options.args ?? ['-m', 'ktx_daemon'];
  const runJson = options.runJson ?? runProcessJson({ command, args, cwd: options.cwd, env: options.env });
  const requestJson = options.requestJson ?? (options.baseUrl ? postJson(options.baseUrl) : undefined);
  const now = options.now ?? (() => new Date());

  return {
    async extractSchema(connectionId: string): Promise<KtxSchemaSnapshot> {
      const connection = requirePostgresConnection(options.connections, connectionId);
      const payload = {
        connection_id: connectionId,
        driver: normalizeDriver(connection.driver),
        url: connection.url,
        schemas,
        statement_timeout_ms: options.statementTimeoutMs ?? 30_000,
        connection_timeout_seconds: options.connectionTimeoutSeconds ?? 5,
      };
      const raw = requestJson
        ? await requestJson('/database/introspect', payload)
        : await runJson('database-introspect', payload);
      const snapshot = mapDaemonSnapshot(raw, {
        connectionId,
        extractedAt: now().toISOString(),
        schemas,
      });
      const enabledTables = resolveEnabledTables(connection);
      return enabledTables ? filterSnapshotTables(snapshot, enabledTables) : snapshot;
    },
  };
}
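One detail of `postJson` that is easy to miss is why it strips the leading slash from `path` and appends a trailing slash to `baseUrl` before calling `new URL`. A plausible reading is that this keeps any path prefix on the base URL intact under WHATWG URL resolution rules. The sketch below illustrates that behavior; `resolveDaemonUrl`, port `8000`, and the `/api` prefix are hypothetical and chosen only for the example.

```typescript
// Copy of the helper from the diff so the snippet runs on its own.
function normalizedBaseUrl(baseUrl: string): string {
  return baseUrl.endsWith('/') ? baseUrl : `${baseUrl}/`;
}

// Hypothetical wrapper around the URL construction used inside postJson.
function resolveDaemonUrl(baseUrl: string, path: string): string {
  // A relative reference is resolved against the base directory, keeping its prefix.
  return new URL(path.replace(/^\//, ''), normalizedBaseUrl(baseUrl)).toString();
}

const prefixed = resolveDaemonUrl('http://127.0.0.1:8000/api', '/database/introspect');
// An absolute-path reference replaces the whole path of the base, dropping '/api'.
const naive = new URL('/database/introspect', 'http://127.0.0.1:8000/api/').toString();
const plain = resolveDaemonUrl('http://127.0.0.1:8000', '/database/introspect');

console.log(prefixed); // http://127.0.0.1:8000/api/database/introspect
console.log(naive); // http://127.0.0.1:8000/database/introspect
console.log(plain); // http://127.0.0.1:8000/database/introspect
```

With a base URL that has no prefix, as in the HTTP test above, both approaches produce the same request path, `/database/introspect`.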
@@ -0,0 +1,59 @@
import { mkdtemp } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it, vi } from 'vitest';
import { LiveDatabaseSourceAdapter } from './live-database.adapter.js';

describe('LiveDatabaseSourceAdapter', () => {
  it('fetches a schema snapshot through the introspection port', async () => {
    const extractSchema = vi.fn().mockResolvedValue({
      connectionId: 'conn-1',
      driver: 'postgres',
      extractedAt: '2026-04-27T00:00:00.000Z',
      scope: { schemas: ['public'] },
      metadata: {},
      tables: [
        {
          name: 'orders',
          catalog: null,
          db: 'public',
          kind: 'table',
          comment: null,
          estimatedRows: null,
          columns: [
            {
              name: 'id',
              nativeType: 'integer',
              normalizedType: 'integer',
              dimensionType: 'number',
              nullable: false,
              primaryKey: true,
              comment: null,
            },
          ],
          foreignKeys: [],
        },
      ],
    });
    const adapter = new LiveDatabaseSourceAdapter({
      introspection: { extractSchema },
      now: () => new Date('2026-04-27T00:00:00.000Z'),
    });
    const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-adapter-'));

    await adapter.fetch(undefined, dir, { connectionId: 'conn-1', sourceKey: 'live-database' });

    expect(extractSchema).toHaveBeenCalledWith('conn-1');
    await expect(adapter.detect(dir)).resolves.toBe(true);
    const chunked = await adapter.chunk(dir);
    expect(chunked.workUnits.map((wu) => wu.unitKey)).toEqual(['live-database-public-orders']);
  });

  it('declares the live database source and skill', () => {
    const adapter = new LiveDatabaseSourceAdapter({
      introspection: { extractSchema: vi.fn() },
    });
    expect(adapter.source).toBe('live-database');
    expect(adapter.skillNames).toEqual(['live_database_ingest']);
  });
});
@@ -0,0 +1,28 @@
import type { ChunkResult, DiffSet, FetchContext, SourceAdapter } from '../../types.js';
import { chunkLiveDatabaseStagedDir } from './chunk.js';
import { detectLiveDatabaseStagedDir, writeLiveDatabaseSnapshot } from './stage.js';
import type { LiveDatabaseSourceAdapterDeps } from './types.js';

export class LiveDatabaseSourceAdapter implements SourceAdapter {
  readonly source = 'live-database';
  readonly skillNames = ['live_database_ingest'];

  constructor(private readonly deps: LiveDatabaseSourceAdapterDeps) {}

  detect(stagedDir: string): Promise<boolean> {
    return detectLiveDatabaseStagedDir(stagedDir);
  }

  async fetch(_pullConfig: unknown, stagedDir: string, ctx: FetchContext): Promise<void> {
    const snapshot = await this.deps.introspection.extractSchema(ctx.connectionId);
    await writeLiveDatabaseSnapshot(stagedDir, {
      ...snapshot,
      connectionId: ctx.connectionId,
      extractedAt: snapshot.extractedAt ?? (this.deps.now ?? (() => new Date()))().toISOString(),
    });
  }

  chunk(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
    return chunkLiveDatabaseStagedDir(stagedDir, diffSet);
  }
}
@ -0,0 +1,308 @@
|
|||
import { describe, expect, it } from 'vitest';
|
||||
import {
|
||||
buildLiveDatabaseManifestShards,
|
||||
type LiveDatabaseManifestExistingDescriptions,
|
||||
type LiveDatabaseManifestJoinEntry,
|
||||
type LiveDatabaseManifestShard,
|
||||
} from './manifest.js';
|
||||
|
||||
function shardObject(shards: Map<string, LiveDatabaseManifestShard>): Record<string, LiveDatabaseManifestShard> {
|
||||
return Object.fromEntries([...shards.entries()].sort(([a], [b]) => a.localeCompare(b)));
|
||||
}
|
||||
|
||||
describe('buildLiveDatabaseManifestShards', () => {
|
||||
it('builds shard objects with generated joins and preserved external descriptions', () => {
|
||||
const existingDescriptions = new Map<string, LiveDatabaseManifestExistingDescriptions>([
|
||||
[
|
||||
'orders',
|
||||
{
|
||||
table: { user: 'Pinned analyst description', db: 'Old db description' },
|
||||
columns: new Map([['id', { user: 'Pinned id description', db: 'Old id description' }]]),
|
||||
},
|
||||
],
|
||||
]);
|
||||
|
||||
const preservedJoins = new Map<string, LiveDatabaseManifestJoinEntry[]>([
|
||||
[
|
||||
'orders',
|
||||
[
|
||||
{
|
||||
to: 'customers',
|
||||
on: 'orders.account_id = customers.id',
|
||||
relationship: 'many_to_one',
|
||||
source: 'manual',
|
||||
},
|
||||
{
|
||||
to: 'missing_accounts',
|
||||
on: 'orders.account_id = missing_accounts.id',
|
||||
relationship: 'many_to_one',
|
||||
source: 'manual',
|
||||
},
|
||||
],
|
||||
],
|
||||
]);
|
||||
|
||||
const result = buildLiveDatabaseManifestShards({
|
||||
connectionType: 'POSTGRESQL',
|
||||
mapColumnType: (nativeType) => nativeType.toLowerCase(),
|
||||
existingDescriptions,
|
||||
existingPreservedJoins: preservedJoins,
|
||||
tables: [
|
||||
{
|
||||
name: 'orders',
|
||||
catalog: null,
|
||||
db: 'public',
|
||||
descriptions: { db: 'Fresh db description', ai: 'Generated AI description' },
|
||||
columns: [
|
||||
{
|
||||
name: 'id',
|
||||
type: 'INTEGER',
|
||||
pk: true,
|
||||
nullable: false,
|
||||
descriptions: { db: 'Fresh id description' },
|
||||
},
|
||||
{
|
||||
name: 'customer_id',
|
||||
type: 'INTEGER',
|
||||
},
|
||||
],
|
||||
},
|
||||
{
|
||||
name: 'customers',
|
||||
catalog: null,
|
||||
db: 'public',
|
||||
columns: [
|
||||
{
|
||||
name: 'id',
|
||||
type: 'INTEGER',
|
||||
pk: true,
|
||||
nullable: false,
|
||||
},
|
||||
],
|
||||
},
|
||||
],
|
||||
joins: [
|
||||
{
|
||||
fromTable: 'orders',
|
||||
fromColumns: ['customer_id'],
|
||||
toTable: 'customers',
|
||||
toColumns: ['id'],
|
||||
relationship: 'MANY_TO_ONE',
|
||||
source: 'formal',
|
||||
},
|
||||
],
|
||||
});
|
||||
|
||||
expect(result.tablesProcessed).toBe(2);
|
||||
expect(shardObject(result.shards)).toEqual({
|
||||
public: {
|
||||
tables: {
|
||||
orders: {
|
||||
table: 'public.orders',
|
||||
descriptions: {
|
||||
user: 'Pinned analyst description',
|
||||
db: 'Fresh db description',
|
||||
ai: 'Generated AI description',
|
||||
},
|
||||
columns: [
|
||||
{
|
||||
name: 'id',
|
||||
type: 'integer',
|
||||
pk: true,
|
||||
nullable: false,
|
||||
descriptions: {
|
||||
user: 'Pinned id description',
|
||||
db: 'Fresh id description',
|
||||
},
|
||||
},
|
||||
{
|
||||
name: 'customer_id',
|
||||
type: 'integer',
|
||||
},
|
||||
],
|
||||
joins: [
|
||||
{
|
||||
to: 'customers',
|
||||
on: 'orders.customer_id = customers.id',
|
||||
relationship: 'many_to_one',
|
||||
source: 'formal',
|
||||
},
|
||||
{
|
||||
to: 'customers',
|
||||
on: 'orders.account_id = customers.id',
|
||||
relationship: 'many_to_one',
|
||||
source: 'manual',
|
||||
},
|
||||
],
|
||||
},
|
||||
customers: {
|
||||
table: 'public.customers',
|
||||
columns: [
|
||||
{
|
||||
name: 'id',
|
||||
type: 'integer',
|
||||
pk: true,
|
||||
nullable: false,
|
||||
},
|
||||
],
|
||||
joins: [
|
||||
{
|
||||
to: 'orders',
|
||||
on: 'customers.id = orders.customer_id',
|
||||
relationship: 'one_to_many',
|
||||
source: 'formal',
|
||||
},
|
||||
],
|
||||
},
|
||||
},
|
||||
},
|
||||
});
|
||||
});
|
||||
|
||||
it('uses warehouse and schema shard keys for snowflake-style connections', () => {
|
||||
const result = buildLiveDatabaseManifestShards({
|
||||
connectionType: 'SNOWFLAKE',
|
||||
mapColumnType: (nativeType) => nativeType.toLowerCase(),
|
||||
tables: [
|
||||
{
|
||||
name: 'accounts',
|
||||
catalog: 'ANALYTICS',
|
||||
db: 'CORE',
|
||||
columns: [{ name: 'id', type: 'NUMBER' }],
|
||||
},
|
||||
],
|
||||
joins: [],
|
||||
});
|
||||
|
||||
expect(shardObject(result.shards)).toEqual({
|
||||
'ANALYTICS.CORE': {
|
||||
tables: {
|
||||
accounts: {
|
||||
table: 'ANALYTICS.CORE.accounts',
|
||||
columns: [{ name: 'id', type: 'number' }],
|
||||
},
|
||||
},
|
||||
},
|
||||
});
|
||||
});
|
||||
|
||||
it('preserves external usage keys while replacing historic SQL managed keys', () => {
|
||||
const existingUsage = new Map([
|
||||
[
|
||||
'orders',
|
||||
{
|
||||
narrative: 'Old generated usage narrative.',
|
||||
frequencyTier: 'low' as const,
|
||||
commonFilters: ['old_status'],
|
||||
commonJoins: [],
|
||||
ownerNote: 'Pinned analyst note',
|
||||
},
|
||||
],
|
||||
]);
|
||||
|
||||
const result = buildLiveDatabaseManifestShards({
|
||||
connectionType: 'POSTGRESQL',
|
||||
mapColumnType: (nativeType) => nativeType.toLowerCase(),
|
||||
existingUsage,
|
||||
tables: [
|
||||
{
|
||||
name: 'orders',
|
||||
catalog: null,
|
||||
db: 'public',
|
||||
usage: {
|
||||
narrative: 'Fresh generated usage narrative.',
|
||||
frequencyTier: 'high',
|
||||
commonFilters: ['status'],
|
||||
commonGroupBys: ['created_at'],
|
||||
commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
|
||||
},
|
||||
columns: [{ name: 'id', type: 'INTEGER' }],
|
||||
},
|
||||
],
|
||||
joins: [],
|
||||
});
|
||||
|
||||
expect(shardObject(result.shards)).toEqual({
|
||||
public: {
|
||||
tables: {
|
||||
orders: {
|
||||
table: 'public.orders',
|
||||
usage: {
|
||||
ownerNote: 'Pinned analyst note',
|
||||
narrative: 'Fresh generated usage narrative.',
|
||||
frequencyTier: 'high',
|
||||
commonFilters: ['status'],
|
||||
commonGroupBys: ['created_at'],
|
||||
commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
|
||||
},
|
||||
columns: [{ name: 'id', type: 'integer' }],
|
||||
},
|
||||
},
|
||||
},
|
||||
});
|
||||
});
|
||||
|
||||
it('renders ordered multi-column joins in both directions', () => {
|
||||
const result = buildLiveDatabaseManifestShards({
|
||||
connectionType: 'POSTGRESQL',
|
||||
mapColumnType: (nativeType) => nativeType,
|
||||
tables: [
|
||||
{
|
||||
name: 'order_lines',
|
||||
catalog: null,
|
||||
db: 'public',
|
||||
columns: [
|
||||
{ name: 'order_id', type: 'integer' },
|
||||
{ name: 'line_number', type: 'integer' },
|
||||
],
|
||||
},
|
||||
{
|
||||
name: 'order_line_allocations',
|
||||
catalog: null,
|
||||
db: 'public',
|
||||
columns: [
|
||||
{ name: 'order_id', type: 'integer' },
|
||||
{ name: 'line_number', type: 'integer' },
|
||||
],
|
||||
},
|
||||
],
|
||||
joins: [
|
||||
{
|
||||
fromTable: 'order_line_allocations',
|
||||
fromColumns: ['order_id', 'line_number'],
|
||||
toTable: 'order_lines',
|
||||
toColumns: ['order_id', 'line_number'],
|
||||
relationship: 'many_to_one',
|
||||
source: 'inferred',
|
||||
},
|
||||
],
|
||||
});
|
||||
|
||||
expect(shardObject(result.shards)).toMatchObject({
|
||||
public: {
|
||||
tables: {
|
||||
order_line_allocations: {
|
||||
joins: [
|
||||
{
|
||||
to: 'order_lines',
|
||||
on: 'order_line_allocations.order_id = order_lines.order_id AND order_line_allocations.line_number = order_lines.line_number',
|
||||
relationship: 'many_to_one',
|
||||
source: 'inferred',
|
||||
},
|
||||
],
|
||||
},
|
||||
order_lines: {
|
||||
joins: [
|
||||
{
|
||||
to: 'order_line_allocations',
|
||||
on: 'order_lines.order_id = order_line_allocations.order_id AND order_lines.line_number = order_line_allocations.line_number',
|
||||
relationship: 'one_to_many',
|
||||
source: 'inferred',
|
||||
},
|
||||
],
|
||||
},
|
||||
},
|
||||
},
|
||||
});
|
||||
});
|
||||
});
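The usage-merge test above expects externally authored keys (such as `ownerNote`) to survive a rescan while generator-managed keys are replaced by fresh values. A minimal sketch of that rule follows. It assumes a simplified `Usage` record in place of the real `TableUsageOutput` type, which is not shown in this diff; `mergeUsage` and `MANAGED_KEYS` are illustrative names, not the module's exports.

```typescript
// Hypothetical stand-in for TableUsageOutput; the real type is imported from
// '../historic-sql/skill-schemas.js' and is not visible in this diff.
type Usage = Record<string, unknown>;

// Keys owned by the generator. Mirrors HISTORIC_SQL_MANAGED_USAGE_KEYS from the diff.
const MANAGED_KEYS = new Set<string>([
  'narrative',
  'frequencyTier',
  'commonFilters',
  'commonGroupBys',
  'commonJoins',
  'staleSince',
]);

function mergeUsage(existing: Usage | undefined, incoming: Usage | undefined): Usage | undefined {
  if (!existing && !incoming) {
    return undefined;
  }
  if (!incoming) {
    // Nothing fresh was generated: keep a shallow copy of what was there.
    return existing ? { ...existing } : undefined;
  }
  const result: Usage = {};
  // Carry over only the keys the generator does not own.
  for (const [key, value] of Object.entries(existing ?? {})) {
    if (!MANAGED_KEYS.has(key)) {
      result[key] = value;
    }
  }
  // Fresh values win for every key they define.
  Object.assign(result, incoming);
  return Object.keys(result).length > 0 ? result : undefined;
}

const merged = mergeUsage(
  { ownerNote: 'Pinned analyst note', narrative: 'Old narrative', staleSince: '2026-01-01' },
  { narrative: 'Fresh generated usage narrative.', commonFilters: ['status'] },
);
console.log(merged);
```

Note that a managed key present only in the existing record (here `staleSince`) is dropped once any incoming usage is supplied, because managed keys are never copied from the old record.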
|
||||
|
|
@ -0,0 +1,310 @@
|
|||
import type { TableUsageOutput } from '../historic-sql/skill-schemas.js';
|
||||
|
||||
const RELATIONSHIP_MAP: Record<string, string> = {
|
||||
MANY_TO_ONE: 'many_to_one',
|
||||
ONE_TO_MANY: 'one_to_many',
|
||||
ONE_TO_ONE: 'one_to_one',
|
||||
};
|
||||
|
||||
const RELATIONSHIP_INVERSE: Record<string, string> = {
|
||||
many_to_one: 'one_to_many',
|
||||
one_to_many: 'many_to_one',
|
||||
one_to_one: 'one_to_one',
|
||||
};
|
||||
|
||||
const SCAN_MANAGED_DESCRIPTION_KEYS = new Set(['db', 'ai']);
|
||||
const HISTORIC_SQL_MANAGED_USAGE_KEYS = new Set([
|
||||
'narrative',
|
||||
'frequencyTier',
|
||||
'commonFilters',
|
||||
'commonGroupBys',
|
||||
'commonJoins',
|
||||
'staleSince',
|
||||
]);
|
||||
|
||||
interface LiveDatabaseManifestColumn {
|
||||
name: string;
|
||||
type: string;
|
||||
pk?: boolean;
|
||||
nullable?: boolean;
|
||||
descriptions?: Record<string, string>;
|
||||
}
|
||||
|
||||
export interface LiveDatabaseManifestJoinEntry {
|
||||
to: string;
|
||||
on: string;
|
||||
relationship: string;
|
||||
source: string;
|
||||
}
|
||||
|
||||
interface LiveDatabaseManifestTableEntry {
|
||||
table: string;
|
||||
descriptions?: Record<string, string>;
|
||||
usage?: TableUsageOutput;
|
||||
columns: LiveDatabaseManifestColumn[];
|
||||
joins?: LiveDatabaseManifestJoinEntry[];
|
||||
}
|
||||
|
||||
export interface LiveDatabaseManifestShard {
|
||||
tables: Record<string, LiveDatabaseManifestTableEntry>;
|
||||
}
|
||||
|
||||
export interface LiveDatabaseManifestTableData {
|
||||
name: string;
|
||||
catalog: string | null;
|
||||
db: string | null;
|
||||
descriptions?: Record<string, string>;
|
||||
usage?: TableUsageOutput;
|
||||
columns: Array<{
|
||||
name: string;
|
||||
type: string;
|
||||
pk?: boolean;
|
||||
nullable?: boolean;
|
||||
descriptions?: Record<string, string>;
|
||||
}>;
|
||||
}
|
||||
|
||||
export interface LiveDatabaseManifestJoinData {
|
||||
fromTable: string;
|
||||
fromColumns: string[];
|
||||
toTable: string;
|
||||
toColumns: string[];
|
||||
relationship: string;
|
||||
source: 'formal' | 'inferred' | 'manual';
|
||||
}
|
||||
|
||||
export interface LiveDatabaseManifestExistingDescriptions {
|
||||
table?: Record<string, string>;
|
||||
columns: Map<string, Record<string, string>>;
|
||||
}
|
||||
|
||||
export interface BuildLiveDatabaseManifestShardsInput {
|
||||
connectionType: string;
|
||||
tables: LiveDatabaseManifestTableData[];
|
||||
joins: LiveDatabaseManifestJoinData[];
|
||||
mapColumnType: (nativeType: string) => string;
|
||||
existingPreservedJoins?: Map<string, LiveDatabaseManifestJoinEntry[]>;
|
||||
existingDescriptions?: Map<string, LiveDatabaseManifestExistingDescriptions>;
|
||||
existingUsage?: Map<string, TableUsageOutput>;
|
||||
}
|
||||
|
||||
export interface BuildLiveDatabaseManifestShardsResult {
|
||||
shards: Map<string, LiveDatabaseManifestShard>;
|
||||
tablesProcessed: number;
|
||||
}
|
||||
|
||||
function mergeDescriptionsPreservingExternal(
|
||||
existing: Record<string, string> | undefined,
|
||||
incoming: Record<string, string> | undefined,
|
||||
): Record<string, string> | undefined {
|
||||
if (!existing && !incoming) {
|
||||
return undefined;
|
||||
}
|
||||
const result: Record<string, string> = {};
|
||||
if (existing) {
|
||||
for (const [key, value] of Object.entries(existing)) {
|
||||
if (!SCAN_MANAGED_DESCRIPTION_KEYS.has(key)) {
|
||||
result[key] = value;
|
||||
}
|
||||
}
|
||||
}
|
||||
if (incoming) {
|
||||
Object.assign(result, incoming);
|
||||
}
|
||||
return Object.keys(result).length > 0 ? result : undefined;
|
||||
}
|
||||
|
||||
export function mergeUsagePreservingExternal(
|
||||
existing: TableUsageOutput | undefined,
|
||||
incoming: TableUsageOutput | undefined,
|
||||
): TableUsageOutput | undefined {
|
||||
if (!existing && !incoming) {
|
||||
return undefined;
|
||||
}
|
||||
if (!incoming) {
|
||||
return existing ? { ...existing } : undefined;
|
||||
}
|
||||
const result: Record<string, unknown> = {};
|
||||
if (existing) {
|
||||
for (const [key, value] of Object.entries(existing)) {
|
||||
if (!HISTORIC_SQL_MANAGED_USAGE_KEYS.has(key)) {
|
||||
result[key] = value;
|
||||
}
|
||||
}
|
||||
}
|
||||
Object.assign(result, incoming);
|
||||
return Object.keys(result).length > 0 ? (result as TableUsageOutput) : undefined;
|
||||
}
|
||||
|
||||
function getShardKey(connectionType: string, catalog: string | null, db: string | null): string {
|
||||
const normalized = connectionType.toUpperCase();
|
||||
|
||||
switch (normalized) {
|
||||
case 'SNOWFLAKE':
|
||||
case 'DATABRICKS': {
|
||||
const catalogPart = catalog ?? 'default';
|
||||
const schemaPart = db ?? 'public';
|
||||
return `${catalogPart}.${schemaPart}`;
|
||||
}
|
||||
case 'BIGQUERY': {
|
||||
return db ?? catalog ?? 'default';
|
||||
}
|
||||
case 'MYSQL':
|
||||
case 'CLICKHOUSE': {
|
||||
return db ?? catalog ?? 'default';
|
||||
}
|
||||
default: {
|
||||
return db ?? 'public';
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
function buildTableRef(name: string, catalog: string | null, db: string | null): string {
|
||||
const parts: string[] = [];
|
||||
if (catalog) {
|
||||
parts.push(catalog);
|
||||
}
|
||||
if (db) {
|
||||
parts.push(db);
|
||||
}
|
||||
parts.push(name);
|
||||
return parts.join('.');
|
||||
}
|
||||
|
||||
function addJoinOnce(
|
||||
joinsByTable: Map<string, LiveDatabaseManifestJoinEntry[]>,
|
||||
tableName: string,
|
||||
join: LiveDatabaseManifestJoinEntry,
|
||||
): void {
|
||||
const joins = joinsByTable.get(tableName) ?? [];
|
||||
const exists = joins.some((candidate) => candidate.to === join.to && candidate.on === join.on);
|
||||
if (!exists) {
|
||||
joins.push(join);
|
||||
}
|
||||
joinsByTable.set(tableName, joins);
|
||||
}
|
||||
|
||||
function joinCondition(
|
||||
leftTable: string,
|
||||
leftColumns: readonly string[],
|
||||
rightTable: string,
|
||||
rightColumns: readonly string[],
|
||||
): string {
|
||||
if (leftColumns.length === 0 || leftColumns.length !== rightColumns.length) {
|
||||
throw new Error(`Invalid relationship join from ${leftTable} to ${rightTable}: column tuple widths differ`);
|
||||
}
|
||||
return leftColumns
|
||||
.map((leftColumn, index) => {
|
||||
const rightColumn = rightColumns[index];
|
||||
if (!rightColumn) {
|
||||
throw new Error(`Invalid relationship join from ${leftTable} to ${rightTable}: missing target column`);
|
||||
}
|
||||
return `${leftTable}.${leftColumn} = ${rightTable}.${rightColumn}`;
|
||||
})
|
||||
.join(' AND ');
|
||||
}
|
||||
|
||||
function buildJoinsByTable(
|
||||
tableNames: Set<string>,
|
||||
joins: LiveDatabaseManifestJoinData[],
|
||||
preservedJoins: Map<string, LiveDatabaseManifestJoinEntry[]>,
|
||||
): Map<string, LiveDatabaseManifestJoinEntry[]> {
|
||||
const joinsByTable = new Map<string, LiveDatabaseManifestJoinEntry[]>();
|
||||
|
||||
for (const join of joins) {
|
||||
if (!tableNames.has(join.fromTable) || !tableNames.has(join.toTable)) {
|
||||
continue;
|
||||
}
|
||||
const relationship = RELATIONSHIP_MAP[join.relationship] ?? join.relationship;
|
||||
addJoinOnce(joinsByTable, join.fromTable, {
|
||||
to: join.toTable,
|
||||
on: joinCondition(join.fromTable, join.fromColumns, join.toTable, join.toColumns),
|
||||
relationship,
|
||||
source: join.source,
|
||||
});
|
||||
|
||||
const reverseRelationship = RELATIONSHIP_INVERSE[relationship] ?? 'one_to_many';
|
||||
addJoinOnce(joinsByTable, join.toTable, {
|
||||
to: join.fromTable,
|
||||
on: joinCondition(join.toTable, join.toColumns, join.fromTable, join.fromColumns),
|
||||
relationship: reverseRelationship,
|
||||
source: join.source,
|
||||
});
|
||||
}
|
||||
|
||||
for (const [tableName, tableJoins] of preservedJoins) {
|
||||
if (!tableNames.has(tableName)) {
|
||||
continue;
|
||||
}
|
||||
for (const join of tableJoins) {
|
||||
if (tableNames.has(join.to)) {
|
||||
addJoinOnce(joinsByTable, tableName, join);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return joinsByTable;
|
||||
}
|
||||
|
||||
export function buildLiveDatabaseManifestShards(
|
||||
input: BuildLiveDatabaseManifestShardsInput,
|
||||
): BuildLiveDatabaseManifestShardsResult {
|
||||
const tableNames = new Set(input.tables.map((table) => table.name));
|
||||
const joinsByTable = buildJoinsByTable(tableNames, input.joins, input.existingPreservedJoins ?? new Map());
|
||||
const shards = new Map<string, LiveDatabaseManifestShard>();
|
||||
|
||||
for (const table of input.tables) {
|
||||
const shardKey = getShardKey(input.connectionType, table.catalog, table.db);
|
||||
const shard = shards.get(shardKey) ?? { tables: {} };
|
||||
const existingDescriptions = input.existingDescriptions?.get(table.name);
|
||||
|
||||
const columns: LiveDatabaseManifestColumn[] = table.columns.map((column) => {
|
||||
const manifestColumn: LiveDatabaseManifestColumn = {
|
||||
name: column.name,
|
||||
type: input.mapColumnType(column.type),
|
||||
};
|
||||
if (column.pk) {
|
||||
manifestColumn.pk = true;
|
||||
}
|
||||
if (column.nullable === false) {
|
||||
manifestColumn.nullable = false;
|
||||
}
|
||||
const descriptions = mergeDescriptionsPreservingExternal(
|
||||
existingDescriptions?.columns.get(column.name),
|
||||
column.descriptions,
|
||||
);
|
||||
if (descriptions) {
|
||||
manifestColumn.descriptions = descriptions;
|
||||
}
|
||||
return manifestColumn;
|
||||
});
|
||||
|
||||
const entry: LiveDatabaseManifestTableEntry = {
|
||||
table: buildTableRef(table.name, table.catalog, table.db),
|
||||
columns,
|
||||
};
|
||||
|
||||
const tableDescriptions = mergeDescriptionsPreservingExternal(existingDescriptions?.table, table.descriptions);
|
||||
if (tableDescriptions) {
|
||||
entry.descriptions = tableDescriptions;
|
||||
}
|
||||
|
||||
const usage = mergeUsagePreservingExternal(input.existingUsage?.get(table.name), table.usage);
|
||||
if (usage) {
|
||||
entry.usage = usage;
|
||||
}
|
||||
|
||||
const tableJoins = joinsByTable.get(table.name);
|
||||
if (tableJoins && tableJoins.length > 0) {
|
||||
entry.joins = tableJoins;
|
||||
}
|
||||
|
||||
shard.tables[table.name] = entry;
|
||||
shards.set(shardKey, shard);
|
||||
}
|
||||
|
||||
return {
|
||||
shards,
|
||||
tablesProcessed: input.tables.length,
|
||||
};
|
||||
}
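The join handling in this file renders each relationship twice: once on the source table and once, with the relationship inverted and the column order swapped, on the target table. The sketch below reproduces only that part under simplified assumptions. `renderBothDirections` and `JoinEntry` are hypothetical names introduced for illustration, and deduplication, preserved joins, and the `source` field are omitted.

```typescript
// Inverse lookup, mirroring RELATIONSHIP_INVERSE from the diff.
const INVERSE: Record<string, string> = {
  many_to_one: 'one_to_many',
  one_to_many: 'many_to_one',
  one_to_one: 'one_to_one',
};

// Simplified join entry for illustration only.
interface JoinEntry {
  to: string;
  on: string;
  relationship: string;
}

// Builds "a.x = b.x AND a.y = b.y" from two equally sized, ordered column tuples.
function joinCondition(
  left: string,
  leftCols: readonly string[],
  right: string,
  rightCols: readonly string[],
): string {
  if (leftCols.length === 0 || leftCols.length !== rightCols.length) {
    throw new Error(`Invalid join from ${left} to ${right}: column tuples are empty or differ in width`);
  }
  return leftCols.map((col, i) => `${left}.${col} = ${right}.${rightCols[i]}`).join(' AND ');
}

// Hypothetical helper: returns the forward and reverse entries for one relationship.
function renderBothDirections(
  fromTable: string,
  fromCols: readonly string[],
  toTable: string,
  toCols: readonly string[],
  relationship: string,
): { forward: JoinEntry; reverse: JoinEntry } {
  return {
    forward: {
      to: toTable,
      on: joinCondition(fromTable, fromCols, toTable, toCols),
      relationship,
    },
    reverse: {
      to: fromTable,
      on: joinCondition(toTable, toCols, fromTable, fromCols),
      // Unknown relationship names fall back to 'one_to_many', as in the diff.
      relationship: INVERSE[relationship] ?? 'one_to_many',
    },
  };
}

const rendered = renderBothDirections(
  'order_line_allocations',
  ['order_id', 'line_number'],
  'order_lines',
  ['order_id', 'line_number'],
  'many_to_one',
);
console.log(rendered.forward.on);
console.log(rendered.reverse.relationship);
```

Keeping the column tuples ordered is what lets the multi-column test earlier in the commit assert on the exact `AND`-joined condition in both directions.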
|
||||
|
|
@ -0,0 +1,152 @@
|
|||
import { mkdtemp, readFile } from 'node:fs/promises';
|
||||
import { tmpdir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
import { describe, expect, it } from 'vitest';
|
||||
import {
|
||||
detectLiveDatabaseStagedDir,
|
||||
LIVE_DATABASE_FOREIGN_KEYS_FILE,
|
||||
LIVE_DATABASE_META_FILE,
|
||||
liveDatabaseTablePath,
|
||||
readLiveDatabaseTableFiles,
|
||||
writeLiveDatabaseSnapshot,
|
||||
} from './stage.js';
|
||||
import type { KtxSchemaSnapshot } from '../../../scan/types.js';
|
||||
|
||||
function snapshot(): KtxSchemaSnapshot {
|
||||
return {
|
||||
connectionId: 'conn-1',
|
||||
driver: 'postgres',
|
||||
extractedAt: '2026-04-27T00:00:00.000Z',
|
||||
scope: { schemas: ['public'] },
|
||||
metadata: { dialect: 'postgres' },
|
||||
tables: [
|
||||
{
|
||||
name: 'orders',
|
||||
catalog: null,
|
||||
db: 'public',
|
||||
kind: 'table',
|
||||
comment: 'Orders placed by customers',
|
||||
estimatedRows: 200,
|
||||
columns: [
|
||||
{
|
||||
name: 'id',
|
||||
nativeType: 'integer',
|
||||
normalizedType: 'integer',
|
||||
dimensionType: 'number',
|
||||
nullable: false,
|
||||
primaryKey: true,
|
||||
comment: null,
|
||||
},
|
||||
{
|
||||
name: 'customer_id',
|
||||
nativeType: 'integer',
|
||||
normalizedType: 'integer',
|
||||
dimensionType: 'number',
|
||||
nullable: false,
|
||||
primaryKey: false,
|
||||
comment: null,
|
||||
},
|
||||
{
|
||||
name: 'total',
|
||||
nativeType: 'numeric',
|
||||
normalizedType: 'numeric',
|
||||
dimensionType: 'number',
|
||||
nullable: false,
|
||||
primaryKey: false,
|
||||
comment: null,
|
||||
},
|
||||
],
|
||||
foreignKeys: [
|
||||
{
|
||||
fromColumn: 'customer_id',
|
||||
toCatalog: null,
|
||||
toDb: 'public',
|
||||
toTable: 'customers',
|
||||
toColumn: 'id',
|
||||
constraintName: null,
|
||||
},
|
||||
],
|
||||
},
|
||||
{
|
||||
name: 'customers',
|
||||
catalog: null,
|
||||
db: 'public',
|
||||
kind: 'table',
|
||||
comment: null,
|
||||
estimatedRows: 50,
|
||||
columns: [
|
||||
{
|
||||
name: 'id',
|
||||
nativeType: 'integer',
|
||||
normalizedType: 'integer',
|
||||
dimensionType: 'number',
|
||||
nullable: false,
|
||||
primaryKey: true,
|
||||
comment: null,
|
||||
},
|
||||
],
|
||||
foreignKeys: [],
|
||||
},
|
||||
],
|
||||
};
|
||||
}
|
||||
|
||||
describe('live-database staged snapshot files', () => {
|
||||
it('writes deterministic metadata, table, and foreign-key files', async () => {
|
||||
const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-stage-'));
|
||||
await writeLiveDatabaseSnapshot(dir, snapshot());
|
||||
|
||||
await expect(readFile(join(dir, LIVE_DATABASE_META_FILE), 'utf8')).resolves.toContain('"connectionId": "conn-1"');
|
||||
await expect(readFile(join(dir, LIVE_DATABASE_FOREIGN_KEYS_FILE), 'utf8')).resolves.toContain(
|
||||
'"fromTable": "orders"',
|
||||
);
|
||||
const connectionJson = await readFile(join(dir, LIVE_DATABASE_META_FILE), 'utf8');
|
||||
expect(connectionJson).toContain('"driver": "postgres"');
|
||||
expect(connectionJson).toContain('"schemas"');
|
||||
|
||||
const ordersPath = liveDatabaseTablePath({ catalog: null, db: 'public', name: 'orders' });
|
||||
const customersPath = liveDatabaseTablePath({ catalog: null, db: 'public', name: 'customers' });
|
||||
expect(ordersPath).toMatch(/^tables\/[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.json$/);
|
||||
await expect(readFile(join(dir, ordersPath), 'utf8')).resolves.toContain('"name": "orders"');
|
||||
await expect(readFile(join(dir, customersPath), 'utf8')).resolves.toContain('"name": "customers"');
|
||||
const ordersJson = await readFile(join(dir, ordersPath), 'utf8');
|
||||
expect(ordersJson).toContain('"kind": "table"');
|
||||
expect(ordersJson).toContain('"estimatedRows": 200');
|
||||
expect(ordersJson).toContain('"nativeType": "integer"');
|
||||
expect(ordersJson).toContain('"normalizedType": "integer"');
|
||||
expect(ordersJson).not.toContain('"type": "integer"');
|
||||
|
||||
const tableFiles = await readLiveDatabaseTableFiles(dir);
|
||||
expect(tableFiles.map((file) => file.table.name)).toEqual(['customers', 'orders']);
|
||||
expect(await detectLiveDatabaseStagedDir(dir)).toBe(true);
|
||||
});
|
||||
|
||||
it('redacts sensitive snapshot metadata before writing connection metadata', async () => {
|
||||
const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-redacted-stage-'));
|
||||
await writeLiveDatabaseSnapshot(dir, {
|
||||
...snapshot(),
|
||||
metadata: {
|
||||
dialect: 'postgres',
|
||||
url: 'postgres://reader:secret@example.test/db', // pragma: allowlist secret
|
||||
serviceAccountJson: {
|
||||
client_email: 'reader@example.test',
|
||||
private_key: 'pem-value', // pragma: allowlist secret
|
||||
},
|
||||
},
|
||||
});
|
||||
|
||||
const connectionJson = await readFile(join(dir, LIVE_DATABASE_META_FILE), 'utf8');
|
||||
|
||||
expect(connectionJson).toContain('"dialect": "postgres"');
|
||||
expect(connectionJson).toContain('"client_email": "reader@example.test"');
|
||||
expect(connectionJson).toContain('"url": "<redacted>"');
|
||||
expect(connectionJson).toContain('"private_key": "<redacted>"');
|
||||
expect(connectionJson).not.toContain('postgres://reader:secret@example.test/db'); // pragma: allowlist secret
|
||||
expect(connectionJson).not.toContain('pem-value');
|
||||
});
|
||||
|
||||
it('returns false for a directory that is missing live database metadata', async () => {
|
||||
const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-empty-'));
|
||||
expect(await detectLiveDatabaseStagedDir(dir)).toBe(false);
|
||||
});
|
||||
});
|
||||
139
packages/cli/src/context/ingest/adapters/live-database/stage.ts
Normal file
|
|
@ -0,0 +1,139 @@
|
|||
import { Buffer } from 'node:buffer';
|
||||
import type { Dirent } from 'node:fs';
|
||||
import { mkdir, readdir, readFile, writeFile } from 'node:fs/promises';
|
||||
import { join, relative } from 'node:path';
|
||||
import { redactKtxSensitiveMetadata } from '../../../core/redaction.js';
|
||||
import type { KtxSchemaSnapshot, KtxSchemaTable, KtxTableRef } from '../../../scan/types.js';
|
||||
|
||||
export const LIVE_DATABASE_META_FILE = 'connection.json';
|
||||
export const LIVE_DATABASE_FOREIGN_KEYS_FILE = 'foreign-keys.json';
|
||||
const LIVE_DATABASE_TABLES_DIR = 'tables';
|
||||
|
||||
interface LiveDatabaseTableFile {
|
||||
path: string;
|
||||
table: KtxSchemaTable;
|
||||
}
|
||||
|
||||
interface ForeignKeyIndexEntry {
|
||||
fromTable: string;
|
||||
fromTablePath: string;
|
||||
fromColumn: string;
|
||||
toCatalog: string | null;
|
||||
toDb: string | null;
|
||||
toTable: string;
|
||||
toColumn: string;
|
||||
constraintName: string | null;
|
||||
}
|
||||
|
||||
function encodePathPart(value: string | null | undefined): string {
|
||||
return Buffer.from(value ?? '_', 'utf8').toString('base64url');
|
||||
}
|
||||
|
||||
function tableSortKey(table: KtxTableRef): string {
|
||||
return `${table.catalog ?? ''}\u0000${table.db ?? ''}\u0000${table.name}`;
|
||||
}
|
||||
|
||||
/** @internal */
|
||||
export function liveDatabaseTablePath(table: KtxTableRef): string {
|
||||
return `${LIVE_DATABASE_TABLES_DIR}/${encodePathPart(table.catalog)}.${encodePathPart(table.db)}.${encodePathPart(
|
||||
table.name,
|
||||
)}.json`;
|
||||
}
|
||||
|
||||
async function walkFiles(root: string, dir = root): Promise<string[]> {
|
||||
let entries: Dirent[];
|
||||
try {
|
||||
entries = await readdir(dir, { withFileTypes: true });
|
||||
} catch {
|
||||
return [];
|
||||
}
|
||||
const files: string[] = [];
|
||||
for (const entry of entries) {
|
||||
const absolute = join(dir, entry.name);
|
||||
if (entry.isDirectory()) {
|
||||
files.push(...(await walkFiles(root, absolute)));
|
||||
} else if (entry.isFile()) {
|
||||
files.push(relative(root, absolute).replace(/\\/g, '/'));
|
||||
}
|
||||
}
|
||||
return files.sort();
|
||||
}
|
||||
|
||||
function stableJson(value: unknown): string {
|
||||
return `${JSON.stringify(value, null, 2)}\n`;
|
||||
}
|
||||
|
||||
function foreignKeyIndex(snapshot: KtxSchemaSnapshot): ForeignKeyIndexEntry[] {
|
||||
const entries: ForeignKeyIndexEntry[] = [];
|
||||
for (const table of snapshot.tables) {
|
||||
for (const fk of table.foreignKeys) {
|
||||
entries.push({
|
||||
fromTable: table.name,
|
||||
fromTablePath: liveDatabaseTablePath(table),
|
||||
fromColumn: fk.fromColumn,
|
||||
toCatalog: fk.toCatalog,
|
||||
toDb: fk.toDb,
|
||||
toTable: fk.toTable,
|
||||
toColumn: fk.toColumn,
|
||||
constraintName: fk.constraintName,
|
||||
});
|
||||
}
|
||||
}
|
||||
entries.sort(
|
||||
(a, b) =>
|
||||
a.fromTable.localeCompare(b.fromTable) ||
|
||||
a.fromColumn.localeCompare(b.fromColumn) ||
|
||||
a.toTable.localeCompare(b.toTable) ||
|
||||
a.toColumn.localeCompare(b.toColumn),
|
||||
);
|
||||
return entries;
|
||||
}
|
||||
|
||||
export async function writeLiveDatabaseSnapshot(stagedDir: string, snapshot: KtxSchemaSnapshot): Promise<void> {
|
||||
await mkdir(join(stagedDir, LIVE_DATABASE_TABLES_DIR), { recursive: true });
|
||||
const sortedTables = [...snapshot.tables].sort((a, b) => tableSortKey(a).localeCompare(tableSortKey(b)));
|
||||
const metadata = {
|
||||
connectionId: snapshot.connectionId,
|
||||
driver: snapshot.driver,
|
||||
extractedAt: snapshot.extractedAt,
|
||||
scope: snapshot.scope,
|
||||
metadata: redactKtxSensitiveMetadata(snapshot.metadata),
|
||||
tableCount: sortedTables.length,
|
||||
};
|
||||
await writeFile(join(stagedDir, LIVE_DATABASE_META_FILE), stableJson(metadata));
|
||||
await writeFile(
|
||||
join(stagedDir, LIVE_DATABASE_FOREIGN_KEYS_FILE),
|
||||
stableJson({ foreignKeys: foreignKeyIndex(snapshot) }),
|
||||
);
|
||||
for (const table of sortedTables) {
|
||||
await writeFile(join(stagedDir, liveDatabaseTablePath(table)), stableJson(table));
|
||||
}
|
||||
}
|
||||
|
||||
export async function readLiveDatabaseTableFiles(stagedDir: string): Promise<LiveDatabaseTableFile[]> {
|
||||
const files = await walkFiles(join(stagedDir, LIVE_DATABASE_TABLES_DIR));
|
||||
const out: LiveDatabaseTableFile[] = [];
|
||||
for (const file of files.filter((path) => path.endsWith('.json'))) {
|
||||
const path = `${LIVE_DATABASE_TABLES_DIR}/${file}`;
|
||||
const raw = await readFile(join(stagedDir, path), 'utf8');
|
||||
const parsed = JSON.parse(raw) as KtxSchemaTable;
|
||||
if (parsed && typeof parsed.name === 'string' && Array.isArray(parsed.columns)) {
|
||||
out.push({ path, table: parsed });
|
||||
}
|
||||
}
|
||||
out.sort((a, b) => tableSortKey(a.table).localeCompare(tableSortKey(b.table)));
|
||||
return out;
|
||||
}
|
||||
|
||||
export async function detectLiveDatabaseStagedDir(stagedDir: string): Promise<boolean> {
|
||||
try {
|
||||
const meta = JSON.parse(await readFile(join(stagedDir, LIVE_DATABASE_META_FILE), 'utf8')) as unknown;
|
||||
if (!meta || typeof meta !== 'object' || Array.isArray(meta)) {
|
||||
return false;
|
||||
}
|
||||
const files = await readLiveDatabaseTableFiles(stagedDir);
|
||||
return files.length > 0;
|
||||
} catch {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,10 @@
import type { KtxSchemaSnapshot } from '../../../scan/types.js';

export interface LiveDatabaseIntrospectionPort {
  extractSchema(connectionId: string): Promise<KtxSchemaSnapshot>;
}

export interface LiveDatabaseSourceAdapterDeps {
  introspection: LiveDatabaseIntrospectionPort;
  now?: () => Date;
}
154
packages/cli/src/context/ingest/adapters/looker/chunk.test.ts
Normal file
|
|
@ -0,0 +1,154 @@
|
|||
import { mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises';
|
||||
import { tmpdir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
|
||||
import { chunkLookerStagedDir } from './chunk.js';
|
||||
import { writeLookerEvidenceDocuments } from './evidence-documents.js';
|
||||
|
||||
async function writeJson(stagedDir: string, relPath: string, value: unknown): Promise<void> {
|
||||
const abs = join(stagedDir, relPath);
|
||||
await mkdir(join(abs, '..'), { recursive: true });
|
||||
await writeFile(abs, `${JSON.stringify(value, null, 2)}\n`, 'utf-8');
|
||||
}
|
||||
|
||||
async function writeSmallFixture(stagedDir: string): Promise<void> {
|
||||
await writeJson(stagedDir, 'sync-config.json', {
|
||||
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
|
||||
fetchedAt: '2026-04-30T12:30:00.000Z',
|
||||
});
|
||||
await writeJson(stagedDir, 'lookml_models.json', {
|
||||
models: [{ name: 'b2b', label: 'B2B', explores: [{ name: 'sales_pipeline', label: 'Sales Pipeline' }] }],
|
||||
});
|
||||
await writeJson(stagedDir, 'explores/b2b/sales_pipeline.json', {
|
||||
modelName: 'b2b',
|
||||
exploreName: 'sales_pipeline',
|
||||
label: 'Sales Pipeline',
|
||||
description: null,
|
||||
fields: { dimensions: [{ name: 'opportunities.id' }], measures: [{ name: 'opportunities.arr' }] },
|
||||
joins: [],
|
||||
});
|
||||
await writeJson(stagedDir, 'dashboards/10.json', {
|
||||
lookerId: '10',
|
||||
title: 'Sales Pipeline',
|
||||
description: null,
|
||||
folderId: '7',
|
||||
ownerId: '3',
|
||||
updatedAt: '2026-04-30T12:00:00.000Z',
|
||||
tiles: [{ id: '100', title: 'ARR', lookId: null, query: { model: 'b2b', view: 'sales_pipeline' } }],
|
||||
});
|
||||
await writeJson(stagedDir, 'looks/20.json', {
|
||||
lookerId: '20',
|
||||
title: 'Open Pipeline',
|
||||
description: null,
|
||||
folderId: '7',
|
||||
ownerId: '3',
|
||||
updatedAt: '2026-04-30T12:00:00.000Z',
|
||||
query: { model: 'b2b', view: 'sales_pipeline', fields: ['opportunities.arr'] },
|
||||
});
|
||||
await writeJson(stagedDir, 'folders/tree.json', {
|
||||
folders: [{ id: '7', name: 'Sandbox', parentId: null, path: ['Sandbox'] }],
|
||||
});
|
||||
await writeJson(stagedDir, 'users/3.json', { id: '3', displayName: 'Ada Lovelace', email: null });
|
||||
await writeJson(stagedDir, 'signals/dashboard_usage.json', [
|
||||
{ contentId: '10', queryCount30d: 50, uniqueUsers30d: 8 },
|
||||
]);
|
||||
await writeJson(stagedDir, 'signals/look_usage.json', [{ contentId: '20', queryCount30d: 20, uniqueUsers30d: 5 }]);
|
||||
await writeJson(stagedDir, 'signals/scheduled_plans.json', [
|
||||
{ contentId: '10', contentType: 'dashboard', isScheduled: true, scheduleCount: 1, recipientCount: 3 },
|
||||
]);
|
||||
await writeJson(stagedDir, 'signals/favorites.json', [
|
||||
{ contentId: '10', contentType: 'dashboard', favoriteCount: 4 },
|
||||
]);
|
||||
await writeLookerEvidenceDocuments(stagedDir);
|
||||
}
|
||||
|
||||
describe('chunkLookerStagedDir', () => {
|
||||
let stagedDir: string;
|
||||
|
||||
beforeEach(async () => {
|
||||
stagedDir = await mkdtemp(join(tmpdir(), 'looker-chunk-'));
|
||||
await writeSmallFixture(stagedDir);
|
||||
});
|
||||
|
||||
afterEach(async () => {
|
||||
await rm(stagedDir, { recursive: true, force: true });
|
||||
});
|
||||
|
||||
it('emits one WU per explore, dashboard, and Look with readable dependencies', async () => {
|
||||
const result = await chunkLookerStagedDir(stagedDir);
|
||||
expect(result.reconcileNotes).toEqual([
|
||||
expect.stringContaining('emit_artifact_resolution with actionType="subsumed"'),
|
||||
]);
|
||||
expect(result.workUnits.map((wu) => wu.unitKey).sort()).toEqual([
|
||||
'looker-dashboard-10',
|
||||
'looker-explore-b2b-sales_pipeline',
|
||||
'looker-look-20',
|
||||
]);
|
||||
|
||||
const dashboard = result.workUnits.find((wu) => wu.unitKey === 'looker-dashboard-10');
|
||||
expect(dashboard?.rawFiles).toEqual([
|
||||
'dashboards/10.json',
|
||||
'evidence/dashboards/10/metadata.json',
|
||||
'evidence/dashboards/10/page.md',
|
||||
]);
|
||||
expect(dashboard?.notes).toContain('context_candidate_write');
|
||||
expect(dashboard?.notes).not.toContain('wiki_write');
|
||||
expect(dashboard?.dependencyPaths.sort()).toEqual([
|
||||
'explores/b2b/sales_pipeline.json',
|
||||
'folders/tree.json',
|
||||
'signals/dashboard_usage.json',
|
||||
'signals/favorites.json',
|
||||
'signals/scheduled_plans.json',
|
||||
'users/3.json',
|
||||
]);
|
||||
|
||||
const explore = result.workUnits.find((wu) => wu.unitKey === 'looker-explore-b2b-sales_pipeline');
|
||||
expect(explore?.rawFiles).toEqual([
|
||||
'explores/b2b/sales_pipeline.json',
|
||||
'evidence/explores/b2b/sales_pipeline/metadata.json',
|
||||
'evidence/explores/b2b/sales_pipeline/page.md',
|
||||
]);
|
||||
expect(explore?.dependencyPaths).toEqual(['lookml_models.json']);
|
||||
});
|
||||
|
||||
it('keeps downstream dashboard and Look WUs when an explore dependency changes', async () => {
|
||||
const result = await chunkLookerStagedDir(stagedDir, {
|
||||
added: [],
|
||||
modified: ['explores/b2b/sales_pipeline.json'],
|
||||
deleted: [],
|
||||
unchanged: [
|
||||
'dashboards/10.json',
|
||||
'looks/20.json',
|
||||
'lookml_models.json',
|
||||
'folders/tree.json',
|
||||
'users/3.json',
|
||||
'signals/dashboard_usage.json',
|
||||
'signals/look_usage.json',
|
||||
'signals/scheduled_plans.json',
|
||||
'signals/favorites.json',
|
||||
],
|
||||
});
|
||||
|
||||
expect(result.workUnits.map((wu) => wu.unitKey).sort()).toEqual([
|
||||
'looker-dashboard-10',
|
||||
'looker-explore-b2b-sales_pipeline',
|
||||
'looker-look-20',
|
||||
]);
|
||||
expect(result.workUnits.find((wu) => wu.unitKey === 'looker-dashboard-10')?.rawFiles).toEqual([
|
||||
'dashboards/10.json',
|
||||
'evidence/dashboards/10/metadata.json',
|
||||
'evidence/dashboards/10/page.md',
|
||||
]);
|
||||
});
|
||||
|
||||
it('returns an EvictionUnit for deleted runtime entity raw paths', async () => {
|
||||
const result = await chunkLookerStagedDir(stagedDir, {
|
||||
added: [],
|
||||
modified: [],
|
||||
deleted: ['looks/20.json'],
|
||||
unchanged: ['dashboards/10.json', 'explores/b2b/sales_pipeline.json'],
|
||||
});
|
||||
|
||||
expect(result.eviction).toEqual({ deletedRawPaths: ['looks/20.json'] });
|
||||
});
|
||||
});
|
||||
198
packages/cli/src/context/ingest/adapters/looker/chunk.ts
Normal file
|
|
@ -0,0 +1,198 @@
|
|||
import { readdir, readFile } from 'node:fs/promises';
|
||||
import { join, relative } from 'node:path';
|
||||
import type { ChunkResult, DiffSet, WorkUnit } from '../../types.js';
|
||||
import { buildLookerReconcileNotes } from './reconcile.js';
|
||||
import {
|
||||
STAGED_FILES,
|
||||
type StagedDashboardFile,
|
||||
type StagedLookerQuery,
|
||||
type StagedLookFile,
|
||||
stagedDashboardFileSchema,
|
||||
stagedExploreFileSchema,
|
||||
stagedLookFileSchema,
|
||||
} from './types.js';
|
||||
|
||||
interface LoadedLookerProject {
|
||||
allPaths: string[];
|
||||
dashboardsByPath: Map<string, StagedDashboardFile>;
|
||||
looksByPath: Map<string, StagedLookFile>;
|
||||
explorePaths: string[];
|
||||
}
|
||||
|
||||
async function walk(root: string): Promise<string[]> {
|
||||
const entries = await readdir(root, { withFileTypes: true, recursive: true });
|
||||
return entries
|
||||
.filter((entry) => entry.isFile())
|
||||
.map((entry) => relative(root, join(entry.parentPath, entry.name)).replace(/\\/g, '/'))
|
||||
.sort();
|
||||
}
|
||||
|
||||
async function loadProject(stagedDir: string): Promise<LoadedLookerProject> {
|
||||
const allPaths = await walk(stagedDir);
|
||||
const dashboardsByPath = new Map<string, StagedDashboardFile>();
|
||||
const looksByPath = new Map<string, StagedLookFile>();
|
||||
const explorePaths: string[] = [];
|
||||
|
||||
for (const path of allPaths) {
|
||||
if (/^dashboards\/[^/]+\.json$/.test(path)) {
|
||||
dashboardsByPath.set(
|
||||
path,
|
||||
stagedDashboardFileSchema.parse(JSON.parse(await readFile(join(stagedDir, path), 'utf-8'))),
|
||||
);
|
||||
continue;
|
||||
}
|
||||
if (/^looks\/[^/]+\.json$/.test(path)) {
|
||||
looksByPath.set(path, stagedLookFileSchema.parse(JSON.parse(await readFile(join(stagedDir, path), 'utf-8'))));
|
||||
continue;
|
||||
}
|
||||
if (/^explores\/[^/]+\/[^/]+\.json$/.test(path)) {
|
||||
const explore = stagedExploreFileSchema.parse(JSON.parse(await readFile(join(stagedDir, path), 'utf-8')));
|
||||
explorePaths.push(explorePath(explore.modelName, explore.exploreName));
|
||||
}
|
||||
}
|
||||
|
||||
return { allPaths, dashboardsByPath, looksByPath, explorePaths: [...new Set(explorePaths)].sort() };
|
||||
}
|
||||
|
||||
export async function chunkLookerStagedDir(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
|
||||
const project = await loadProject(stagedDir);
|
||||
const firstRunUnits = emitFirstRunWorkUnits(project);
|
||||
const result = diffSet ? applyDiffSet(firstRunUnits, diffSet) : { workUnits: firstRunUnits };
|
||||
const eviction =
|
||||
diffSet && diffSet.deleted.length > 0 ? { deletedRawPaths: [...diffSet.deleted].sort() } : result.eviction;
|
||||
return {
|
||||
...result,
|
||||
eviction,
|
||||
reconcileNotes: result.workUnits.length > 0 || eviction ? buildLookerReconcileNotes() : [],
|
||||
};
|
||||
}
|
||||
|
||||
function emitFirstRunWorkUnits(project: LoadedLookerProject): WorkUnit[] {
|
||||
const units: WorkUnit[] = [];
|
||||
|
||||
for (const path of project.explorePaths) {
|
||||
const parts = /^explores\/([^/]+)\/([^/]+)\.json$/.exec(path);
|
||||
if (!parts) {
|
||||
continue;
|
||||
}
|
||||
const deps = project.allPaths.includes(STAGED_FILES.lookmlModels) ? [STAGED_FILES.lookmlModels] : [];
|
||||
units.push(
|
||||
buildUnit(project, {
|
||||
unitKey: `looker-explore-${parts[1]}-${parts[2]}`,
|
||||
displayLabel: `Looker explore ${parts[1]}.${parts[2]}`,
|
||||
rawFiles: [path, ...evidencePathsForExplore(project, parts[1], parts[2])],
|
||||
dependencyPaths: deps,
|
||||
notes: `Write API-derived SL source looker__${parts[1]}__${parts[2]} and durable domain knowledge for this Looker explore.`,
|
||||
}),
|
||||
);
|
||||
}
|
||||
|
||||
for (const [path, dashboard] of [...project.dashboardsByPath.entries()].sort(([a], [b]) => a.localeCompare(b))) {
|
||||
const deps = new Set<string>();
|
||||
addIfPresent(project, deps, STAGED_FILES.foldersTree);
|
||||
addIfPresent(project, deps, STAGED_FILES.signals.dashboardUsage);
|
||||
addIfPresent(project, deps, STAGED_FILES.signals.scheduledPlans);
|
||||
addIfPresent(project, deps, STAGED_FILES.signals.favorites);
|
||||
if (dashboard.ownerId) {
|
||||
addIfPresent(project, deps, `users/${dashboard.ownerId}.json`);
|
||||
}
|
||||
for (const tile of dashboard.tiles) {
|
||||
addExploreDependency(project, deps, tile.query);
|
||||
}
|
||||
|
||||
units.push(
|
||||
buildUnit(project, {
|
||||
unitKey: `looker-dashboard-${dashboard.lookerId}`,
|
||||
displayLabel: `Looker dashboard "${dashboard.title}"`,
|
||||
rawFiles: [path, ...evidencePathsForDashboard(project, dashboard.lookerId)],
|
||||
dependencyPaths: [...deps].sort(),
|
||||
notes:
|
||||
'Extract generalizable metric, segment, and domain knowledge from this dashboard. Treat usage, owner, and folder data as prioritization/provenance context only. Use context_evidence_search/context_evidence_read and context_candidate_write for wiki-bound knowledge; do not write wiki pages directly from this WorkUnit.',
|
||||
}),
|
||||
);
|
||||
}
|
||||
|
||||
for (const [path, look] of [...project.looksByPath.entries()].sort(([a], [b]) => a.localeCompare(b))) {
|
||||
const deps = new Set<string>();
|
||||
addIfPresent(project, deps, STAGED_FILES.foldersTree);
|
||||
addIfPresent(project, deps, STAGED_FILES.signals.lookUsage);
|
||||
addIfPresent(project, deps, STAGED_FILES.signals.scheduledPlans);
|
||||
addIfPresent(project, deps, STAGED_FILES.signals.favorites);
|
||||
if (look.ownerId) {
|
||||
addIfPresent(project, deps, `users/${look.ownerId}.json`);
|
||||
}
|
||||
addExploreDependency(project, deps, look.query);
|
||||
|
||||
units.push(
|
||||
buildUnit(project, {
|
||||
unitKey: `looker-look-${look.lookerId}`,
|
||||
displayLabel: `Looker Look "${look.title}"`,
|
||||
rawFiles: [path, ...evidencePathsForLook(project, look.lookerId)],
|
||||
dependencyPaths: [...deps].sort(),
|
||||
notes:
|
||||
'Extract generalizable metric, segment, and domain knowledge from this Look. Treat usage, owner, and folder data as prioritization/provenance context only. Use context_evidence_search/context_evidence_read and context_candidate_write for wiki-bound knowledge; do not write wiki pages directly from this WorkUnit.',
|
||||
}),
|
||||
);
|
||||
}
|
||||
|
||||
return units.sort((a, b) => a.unitKey.localeCompare(b.unitKey));
|
||||
}
|
||||
|
||||
function buildUnit(
|
||||
project: LoadedLookerProject,
|
||||
input: Pick<WorkUnit, 'unitKey' | 'displayLabel' | 'rawFiles' | 'dependencyPaths' | 'notes'>,
|
||||
): WorkUnit {
|
||||
const excluded = new Set([...input.rawFiles, ...input.dependencyPaths]);
|
||||
return {
|
||||
...input,
|
||||
peerFileIndex: project.allPaths.filter((path) => !excluded.has(path)).sort(),
|
||||
};
|
||||
}
|
||||
|
||||
function applyDiffSet(firstRunUnits: WorkUnit[], diffSet: DiffSet): ChunkResult {
  const touched = new Set([...diffSet.added, ...diffSet.modified]);
  const workUnits = firstRunUnits.filter((wu) => {
    const readablePaths = [...wu.rawFiles, ...wu.dependencyPaths];
    return readablePaths.some((path) => touched.has(path));
  });
  return { workUnits };
}

function addIfPresent(project: LoadedLookerProject, deps: Set<string>, path: string): void {
  if (project.allPaths.includes(path)) {
    deps.add(path);
  }
}

function addExploreDependency(project: LoadedLookerProject, deps: Set<string>, query: StagedLookerQuery | null): void {
  if (!query) {
    return;
  }
  addIfPresent(project, deps, explorePath(query.model, query.view));
}

function evidencePathsForExplore(project: LoadedLookerProject, modelName: string, exploreName: string): string[] {
  return existingPaths(project, [
    `evidence/explores/${modelName}/${exploreName}/metadata.json`,
    `evidence/explores/${modelName}/${exploreName}/page.md`,
  ]);
}

function evidencePathsForDashboard(project: LoadedLookerProject, dashboardId: string): string[] {
  return existingPaths(project, [
    `evidence/dashboards/${dashboardId}/metadata.json`,
    `evidence/dashboards/${dashboardId}/page.md`,
  ]);
}

function evidencePathsForLook(project: LoadedLookerProject, lookId: string): string[] {
  return existingPaths(project, [`evidence/looks/${lookId}/metadata.json`, `evidence/looks/${lookId}/page.md`]);
}

function existingPaths(project: LoadedLookerProject, paths: string[]): string[] {
  return paths.filter((path) => project.allPaths.includes(path));
}

function explorePath(modelName: string, exploreName: string): string {
  return `explores/${modelName}/${exploreName}.json`;
}
|
||||
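As a rough illustration of the incremental path through `chunkLookerStagedDir`, the sketch below re-implements the `applyDiffSet` filter in isolation. The `SketchWorkUnit` and `SketchDiffSet` types, the `selectTouchedUnits` name, and the sample paths are simplified stand-ins assumed for this example; they are not the real `WorkUnit` and `DiffSet` definitions.

```typescript
// Simplified stand-ins (assumptions) for the real WorkUnit and DiffSet types.
interface SketchWorkUnit {
  unitKey: string;
  rawFiles: string[];
  dependencyPaths: string[];
}

interface SketchDiffSet {
  added: string[];
  modified: string[];
  deleted: string[];
}

// Keep only the units that read at least one added or modified path.
function selectTouchedUnits(units: SketchWorkUnit[], diff: SketchDiffSet): SketchWorkUnit[] {
  const touched = new Set([...diff.added, ...diff.modified]);
  return units.filter((unit) => [...unit.rawFiles, ...unit.dependencyPaths].some((path) => touched.has(path)));
}

// Made-up sample data: a dashboard that depends on an explore, and an unrelated Look.
const sampleUnits: SketchWorkUnit[] = [
  {
    unitKey: 'looker-dashboard-10',
    rawFiles: ['dashboards/10.json'],
    dependencyPaths: ['explores/b2b/sales_pipeline.json'],
  },
  { unitKey: 'looker-look-30', rawFiles: ['looks/30.json'], dependencyPaths: [] },
];

const selectedKeys = selectTouchedUnits(sampleUnits, {
  added: [],
  modified: ['explores/b2b/sales_pipeline.json'],
  deleted: [],
}).map((unit) => unit.unitKey);

console.log(selectedKeys);
```

In this sketch a change to an explore file re-selects the dashboard that lists it as a dependency, while the untouched Look is skipped. Deleted paths never select a unit here; in the source they are reported separately through `eviction.deletedRawPaths`.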
|
|
@ -0,0 +1,14 @@
|
|||
import { readFile } from 'node:fs/promises';
import { describe, expect, it } from 'vitest';

describe('LookerClient boundary', () => {
  it('does not import server or NestJS modules', async () => {
    const source = await readFile(new URL('./client.ts', import.meta.url), 'utf-8');

    expect(source).not.toMatch(/@nestjs\/common/);
    expect(source).not.toMatch(/DataSourceClient/);
    expect(source).not.toMatch(/\.\.\/interfaces/);
    expect(source).not.toMatch(/\.\.\/types/);
    expect(source).not.toMatch(/server\/src/);
  });
});
|
||||
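The boundary test reads `client.ts` as plain text and asserts that no forbidden module specifier appears in it. A minimal sketch of the same idea as a reusable helper follows; the `findForbiddenImports` name and the sample source string are hypothetical and are not part of the repository.

```typescript
// Hypothetical helper: report which forbidden patterns occur in a source string.
function findForbiddenImports(source: string, forbidden: RegExp[]): string[] {
  return forbidden.filter((pattern) => pattern.test(source)).map((pattern) => pattern.source);
}

// The same patterns the boundary test checks for.
const forbiddenPatterns = [/@nestjs\/common/, /DataSourceClient/, /\.\.\/interfaces/, /\.\.\/types/, /server\/src/];

// Made-up source text used only to exercise the helper.
const sampleSource = [
  "import { Injectable } from '@nestjs/common';",
  "import type { StagedLookFile } from './types.js';",
].join('\n');

const violations = findForbiddenImports(sampleSource, forbiddenPatterns);
console.log(violations);
```

Returning the offending patterns instead of a boolean would make a failing assertion name the import that crossed the boundary. Note that a sibling import such as `./types.js` is allowed, because the pattern only targets the parent-directory form `../types`.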
473
packages/cli/src/context/ingest/adapters/looker/client.test.ts
Normal file
|
|
@ -0,0 +1,473 @@
|
|||
import { describe, expect, it, vi } from 'vitest';
|
||||
import { LookerClient, type LookerSdkPort } from './client.js';
|
||||
|
||||
const clientSecretParam = 'client_secret'; // pragma: allowlist secret
|
||||
|
||||
function params(): Record<string, unknown> {
|
||||
return {
|
||||
base_url: 'https://example.looker.com',
|
||||
client_id: 'id',
|
||||
[clientSecretParam]: 'credential', // pragma: allowlist secret
|
||||
};
|
||||
}
|
||||
|
||||
function sdk(overrides: Partial<LookerSdkPort> = {}): LookerSdkPort {
|
||||
const port: LookerSdkPort = {
|
||||
me: vi.fn().mockResolvedValue({ id: '1', display_name: 'API User', email: 'api@example.com' }),
|
||||
search_dashboards: vi.fn().mockResolvedValue([{ id: '10' }]),
|
||||
dashboard: vi.fn().mockResolvedValue({
|
||||
id: '10',
|
||||
title: 'Revenue Dashboard',
|
||||
description: 'Revenue concepts',
|
||||
folder_id: '20',
|
||||
user_id: '1',
|
||||
updated_at: '2026-04-30T00:00:00.000Z',
|
||||
dashboard_elements: [
|
||||
{
|
||||
id: '99',
|
||||
title: 'ARR',
|
||||
look_id: null,
|
||||
query: {
|
||||
id: 'q1',
|
||||
model: 'b2b',
|
||||
view: 'sales_pipeline',
|
||||
fields: ['opportunities.arr', 'opportunities.stage'],
|
||||
filters: { 'opportunities.stage': 'open' },
|
||||
sorts: ['opportunities.arr desc'],
|
||||
limit: '500',
|
||||
},
|
||||
},
|
||||
],
|
||||
}),
|
||||
search_looks: vi.fn().mockResolvedValue([{ id: '30' }]),
|
||||
search_scheduled_plans: vi.fn().mockResolvedValue([]),
|
||||
look: vi.fn().mockResolvedValue({
|
||||
id: '30',
|
||||
title: 'Open Pipeline ARR',
|
||||
description: 'ARR for open opportunities',
|
||||
folder_id: '20',
|
||||
user_id: '1',
|
||||
updated_at: '2026-04-30T00:00:00.000Z',
|
||||
query: {
|
||||
id: 'q2',
|
||||
model: 'b2b',
|
||||
view: 'sales_pipeline',
|
||||
fields: ['opportunities.arr'],
|
||||
filters: { 'opportunities.stage': 'open' },
|
||||
},
|
||||
}),
|
||||
all_folders: vi.fn().mockResolvedValue([{ id: '20', name: 'Executive', parent_id: null }]),
|
||||
all_users: vi.fn().mockResolvedValue([{ id: '1', display_name: 'API User', email: 'api@example.com' }]),
|
||||
all_groups: vi.fn().mockResolvedValue([{ id: '2', name: 'Finance' }]),
|
||||
all_connections: vi.fn().mockResolvedValue([
|
||||
{
|
||||
name: 'b2b_sandbox_bq',
|
||||
host: 'warehouse.example.com',
|
||||
database: 'analytics',
|
||||
schema: 'public',
|
||||
dialect_name: 'bigquery_standard_sql',
|
||||
},
|
||||
]),
|
||||
all_lookml_models: vi
|
||||
.fn()
|
||||
.mockResolvedValue([
|
||||
{ name: 'b2b', label: 'B2B', explores: [{ name: 'sales_pipeline', label: 'Sales Pipeline' }] },
|
||||
]),
|
||||
lookml_model_explore: vi.fn().mockResolvedValue({
|
||||
name: 'sales_pipeline',
|
||||
label: 'Sales Pipeline',
|
||||
description: 'Opportunity pipeline',
|
||||
sql_table_name: 'proj.dataset.opportunities AS opportunities',
|
||||
connection_name: 'b2b_sandbox_bq',
|
||||
view_name: 'opportunities',
|
||||
fields: {
|
||||
dimensions: [{ name: 'opportunities.stage', label: 'Stage', type: 'string', sql: '$' + '{TABLE}.stage' }],
|
||||
measures: [{ name: 'opportunities.arr', label: 'ARR', type: 'sum', sql: '$' + '{TABLE}.arr' }],
|
||||
},
|
||||
joins: [
|
||||
{
|
||||
name: 'accounts',
|
||||
type: 'left_outer',
|
||||
relationship: 'many_to_one',
|
||||
sql_table_name: 'proj.dataset.accounts',
|
||||
sql_on: '$' + '{opportunities.account_id} = $' + '{accounts.id}',
|
||||
from: null,
|
||||
},
|
||||
],
|
||||
}),
|
||||
run_inline_query: vi.fn().mockResolvedValue('[]'),
|
||||
logout: vi.fn().mockResolvedValue(undefined),
|
||||
...overrides,
|
||||
};
|
||||
return port;
|
||||
}
|
||||
|
||||
describe('LookerClient', () => {
|
||||
it('validates credentials with me()', async () => {
|
||||
const client = new LookerClient(params(), { sdkFactory: () => sdk() });
|
||||
|
||||
await expect(client.testConnection()).resolves.toEqual({
|
||||
success: true,
|
||||
metadata: { userId: '1', displayName: 'API User', email: 'api@example.com' },
|
||||
});
|
||||
});
|
||||
|
||||
it('does not warn to console when optional prioritization inputs fail by default', async () => {
|
||||
const warn = vi.spyOn(console, 'warn').mockImplementation(() => undefined);
|
||||
const fakeSdk = sdk({
|
||||
search_dashboards: vi.fn().mockRejectedValue(new Error('dashboards unavailable')),
|
||||
search_looks: vi.fn().mockRejectedValue(new Error('looks unavailable')),
|
||||
});
|
||||
const client = new LookerClient(params(), { sdkFactory: () => fakeSdk });
|
||||
|
||||
await expect(client.getSignals()).resolves.toMatchObject({
|
||||
dashboardUsage: [],
|
||||
lookUsage: [],
|
||||
scheduledPlans: [],
|
||||
favorites: [],
|
||||
});
|
||||
|
||||
expect(warn).not.toHaveBeenCalled();
|
||||
});
|
||||
|
||||
it('maps dashboards, looks, folders, models, explores, users, and groups to staged DTOs', async () => {
|
||||
const fakeSdk = sdk();
|
||||
const client = new LookerClient(params(), { sdkFactory: () => fakeSdk });
|
||||
|
||||
await expect(client.listDashboards()).resolves.toEqual([{ id: '10', updatedAt: null }]);
|
||||
await expect(client.getDashboard('10')).resolves.toMatchObject({
|
||||
lookerId: '10',
|
||||
title: 'Revenue Dashboard',
|
||||
tiles: [{ id: '99', query: { model: 'b2b', view: 'sales_pipeline' } }],
|
||||
});
|
||||
await expect(client.listLooks()).resolves.toEqual([{ id: '30', updatedAt: null }]);
|
||||
await expect(client.getLook('30')).resolves.toMatchObject({
|
||||
lookerId: '30',
|
||||
title: 'Open Pipeline ARR',
|
||||
query: { model: 'b2b', view: 'sales_pipeline' },
|
||||
});
|
||||
await expect(client.listFolders()).resolves.toEqual({
|
||||
folders: [{ id: '20', name: 'Executive', parentId: null, path: ['Executive'] }],
|
||||
});
|
||||
await expect(client.listLookmlModels()).resolves.toEqual({
|
||||
models: [{ name: 'b2b', label: 'B2B', explores: [{ name: 'sales_pipeline', label: 'Sales Pipeline' }] }],
|
||||
});
|
||||
await expect(client.listLookerConnections()).resolves.toEqual([
|
||||
{
|
||||
name: 'b2b_sandbox_bq',
|
||||
host: 'warehouse.example.com',
|
||||
database: 'analytics',
|
||||
schema: 'public',
|
||||
dialect: 'bigquery_standard_sql',
|
||||
},
|
||||
]);
|
||||
await expect(client.getExplore('b2b', 'sales_pipeline')).resolves.toMatchObject({
|
||||
modelName: 'b2b',
|
||||
exploreName: 'sales_pipeline',
|
||||
rawSqlTableName: 'proj.dataset.opportunities AS opportunities',
|
||||
connectionName: 'b2b_sandbox_bq',
|
||||
viewName: 'opportunities',
|
||||
fields: { dimensions: [{ name: 'opportunities.stage' }], measures: [{ name: 'opportunities.arr' }] },
|
||||
joins: [
|
||||
{
|
||||
name: 'accounts',
|
||||
rawSqlTableName: 'proj.dataset.accounts',
|
||||
sqlOn: '$' + '{opportunities.account_id} = $' + '{accounts.id}',
|
||||
from: null,
|
||||
targetTable: null,
|
||||
},
|
||||
],
|
||||
targetWarehouseConnectionId: null,
|
||||
targetTable: null,
|
||||
});
|
||||
expect(fakeSdk.dashboard).toHaveBeenCalledWith(
|
||||
'10',
|
||||
'id,title,description,folder_id,user_id,updated_at,dashboard_elements(id,title,look_id,query(id,model,view,fields,filters,sorts,limit,dynamic_fields))',
|
||||
);
|
||||
expect(fakeSdk.look).toHaveBeenCalledWith(
|
||||
'30',
|
||||
'id,title,description,folder_id,user_id,updated_at,query(id,model,view,fields,filters,sorts,limit,dynamic_fields)',
|
||||
);
|
||||
expect(fakeSdk.lookml_model_explore).toHaveBeenCalledWith(
|
||||
'b2b',
|
||||
'sales_pipeline',
|
||||
'name,label,description,sql_table_name,connection_name,view_name,fields,joins(name,type,relationship,sql_table_name,sql_on,from)',
|
||||
);
|
||||
expect(fakeSdk.all_connections).toHaveBeenCalledWith('name,host,database,schema,dialect_name');
|
||||
});
|
||||
|
||||
it('returns empty usage signals when system activity access fails', async () => {
|
||||
const client = new LookerClient(params(), {
|
||||
sdkFactory: () =>
|
||||
sdk({
|
||||
run_inline_query: vi.fn().mockRejectedValue(new Error('access denied')),
|
||||
search_dashboards: vi.fn().mockResolvedValue([{ id: '10', favorite_count: 4 }]),
|
||||
search_looks: vi.fn().mockResolvedValue([{ id: '30', favorite_count: 2 }]),
|
||||
search_scheduled_plans: vi.fn().mockResolvedValue([]),
|
||||
}),
|
||||
});
|
||||
|
||||
await expect(client.getSignals()).resolves.toEqual({
|
||||
dashboardUsage: [],
|
||||
lookUsage: [],
|
||||
scheduledPlans: [],
|
||||
favorites: [
|
||||
{ contentId: '10', contentType: 'dashboard', favoriteCount: 4 },
|
||||
{ contentId: '30', contentType: 'look', favoriteCount: 2 },
|
||||
],
|
||||
});
|
||||
});
|
||||
|
||||
it('paginates dashboard and Look searches', async () => {
|
||||
const dashboardPageOne = Array.from({ length: 500 }, (_, index) => ({ id: String(index + 1) }));
|
||||
const lookPageOne = Array.from({ length: 500 }, (_, index) => ({ id: String(index + 1001) }));
|
||||
const fakeSdk = sdk({
|
||||
search_dashboards: vi
|
||||
.fn()
|
||||
.mockResolvedValueOnce(dashboardPageOne)
|
||||
.mockResolvedValueOnce([{ id: '501' }]),
|
||||
search_looks: vi
|
||||
.fn()
|
||||
.mockResolvedValueOnce(lookPageOne)
|
||||
.mockResolvedValueOnce([{ id: '1501' }]),
|
||||
});
|
||||
const client = new LookerClient(params(), { sdkFactory: () => fakeSdk });
|
||||
|
||||
await expect(client.listDashboards()).resolves.toHaveLength(501);
|
||||
await expect(client.listLooks()).resolves.toHaveLength(501);
|
||||
|
||||
expect(fakeSdk.search_dashboards).toHaveBeenNthCalledWith(
|
||||
1,
|
||||
expect.objectContaining({
|
||||
deleted: false,
|
||||
fields: 'id,updated_at',
|
||||
limit: 500,
|
||||
offset: 0,
|
||||
sorts: 'id',
|
||||
}),
|
||||
);
|
||||
expect(fakeSdk.search_dashboards).toHaveBeenNthCalledWith(
|
||||
2,
|
||||
expect.objectContaining({
|
||||
limit: 500,
|
||||
offset: 500,
|
||||
}),
|
||||
);
|
||||
expect(fakeSdk.search_looks).toHaveBeenNthCalledWith(
|
||||
1,
|
||||
expect.objectContaining({
|
||||
deleted: false,
|
||||
fields: 'id,updated_at',
|
||||
limit: 500,
|
||||
offset: 0,
|
||||
sorts: 'id',
|
||||
}),
|
||||
);
|
||||
expect(fakeSdk.search_looks).toHaveBeenNthCalledWith(
|
||||
2,
|
||||
expect.objectContaining({
|
||||
limit: 500,
|
||||
offset: 500,
|
||||
}),
|
||||
);
|
||||
});
|
||||
|
||||
it('returns updatedAt cursors from dashboard and Look listing rows', async () => {
|
||||
const fakeSdk = sdk({
|
||||
search_dashboards: vi.fn().mockResolvedValue([{ id: '10', updated_at: '2026-04-30T12:00:00.000Z' }]),
|
||||
search_looks: vi.fn().mockResolvedValue([{ id: '30', updated_at: '2026-04-30T11:00:00.000Z' }]),
|
||||
});
|
||||
const client = new LookerClient(params(), { sdkFactory: () => fakeSdk });
|
||||
|
||||
await expect(client.listDashboards()).resolves.toEqual([{ id: '10', updatedAt: '2026-04-30T12:00:00.000Z' }]);
|
||||
await expect(client.listLooks()).resolves.toEqual([{ id: '30', updatedAt: '2026-04-30T11:00:00.000Z' }]);
|
||||
});
|
||||
|
||||
it('logs out the SDK session during cleanup', async () => {
|
||||
const fakeSdk = sdk();
|
||||
const client = new LookerClient(params(), { sdkFactory: () => fakeSdk });
|
||||
|
||||
await client.testConnection();
|
||||
await client.cleanup();
|
||||
|
||||
expect(fakeSdk.logout).toHaveBeenCalledTimes(1);
|
||||
});
|
||||
|
||||
it('aggregates usage, scheduled-plan, and favorite signals', async () => {
|
||||
const runInlineQuery = vi
|
||||
.fn()
|
||||
.mockResolvedValueOnce(
|
||||
JSON.stringify([
|
||||
{
|
||||
'dashboard.id': '10',
|
||||
'history.query_run_count': 3,
|
||||
'history.created_date': '2026-04-30',
|
||||
'user.id': 'user-1',
|
||||
},
|
||||
{
|
||||
'dashboard.id': '10',
|
||||
'history.query_run_count': '2',
|
||||
'history.created_date': '2026-04-29',
|
||||
'user.id': 'user-2',
|
||||
},
|
||||
]),
|
||||
)
|
||||
.mockResolvedValueOnce(
|
||||
JSON.stringify([
|
||||
{
|
||||
'look.id': '30',
|
||||
'history.query_run_count': 7,
|
||||
'history.created_date': '2026-04-28',
|
||||
'user.id': 'user-1',
|
||||
},
|
||||
]),
|
||||
);
|
||||
const fakeSdk = sdk({
|
||||
run_inline_query: runInlineQuery,
|
||||
search_dashboards: vi.fn().mockResolvedValueOnce([{ id: '10', favorite_count: 4 }]),
|
||||
search_looks: vi.fn().mockResolvedValueOnce([{ id: '30', favorite_count: 2 }]),
|
||||
search_scheduled_plans: vi.fn().mockResolvedValueOnce([
|
||||
{
|
||||
id: 'sp-dashboard',
|
||||
dashboard_id: '10',
|
||||
look_id: null,
|
||||
enabled: true,
|
||||
scheduled_plan_destination: [{ id: 'dest-1' }, { id: 'dest-2' }],
|
||||
},
|
||||
{
|
||||
id: 'sp-look',
|
||||
dashboard_id: null,
|
||||
look_id: '30',
|
||||
enabled: true,
|
||||
scheduled_plan_destination: [{ id: 'dest-3' }],
|
||||
},
|
||||
]),
|
||||
});
|
||||
const client = new LookerClient(params(), { sdkFactory: () => fakeSdk });
|
||||
|
||||
await expect(client.getSignals()).resolves.toEqual({
|
||||
dashboardUsage: [
|
||||
{
|
||||
contentId: '10',
|
||||
queryCount30d: 5,
|
||||
uniqueUsers30d: 2,
|
||||
lastRunAt: '2026-04-30',
|
||||
topUsers: ['user-1', 'user-2'],
|
||||
},
|
||||
],
|
||||
lookUsage: [
|
||||
{
|
||||
contentId: '30',
|
||||
queryCount30d: 7,
|
||||
uniqueUsers30d: 1,
|
||||
lastRunAt: '2026-04-28',
|
||||
topUsers: ['user-1'],
|
||||
},
|
||||
],
|
||||
scheduledPlans: [
|
||||
{
|
||||
contentId: '10',
|
||||
contentType: 'dashboard',
|
||||
isScheduled: true,
|
||||
scheduleCount: 1,
|
||||
recipientCount: 2,
|
||||
},
|
||||
{
|
||||
contentId: '30',
|
||||
contentType: 'look',
|
||||
isScheduled: true,
|
||||
scheduleCount: 1,
|
||||
recipientCount: 1,
|
||||
},
|
||||
],
|
||||
favorites: [
|
||||
{ contentId: '10', contentType: 'dashboard', favoriteCount: 4 },
|
||||
{ contentId: '30', contentType: 'look', favoriteCount: 2 },
|
||||
],
|
||||
});
|
||||
|
||||
expect(runInlineQuery).toHaveBeenNthCalledWith(
|
||||
1,
|
||||
expect.objectContaining({
|
||||
result_format: 'json',
|
||||
body: expect.objectContaining({
|
||||
model: 'system__activity',
|
||||
view: 'history',
|
||||
fields: ['dashboard.id', 'history.query_run_count', 'history.created_date', 'user.id'],
|
||||
}),
|
||||
}),
|
||||
);
|
||||
expect(fakeSdk.search_scheduled_plans).toHaveBeenCalledWith(
|
||||
expect.objectContaining({
|
||||
all_users: true,
|
||||
fields: 'id,dashboard_id,look_id,enabled,scheduled_plan_destination',
|
||||
limit: 500,
|
||||
offset: 0,
|
||||
sorts: 'id',
|
||||
}),
|
||||
);
|
||||
});
|
||||
|
||||
it('retries a 429 response once using Retry-After seconds', async () => {
|
||||
const sleep = vi.fn().mockResolvedValue(undefined);
|
||||
const rateLimitError = new Error('rate limited');
|
||||
Object.assign(rateLimitError, { statusCode: 429, headers: { 'retry-after': '2' } });
|
||||
const fakeSdk = sdk({
|
||||
search_dashboards: vi
|
||||
.fn()
|
||||
.mockRejectedValueOnce(rateLimitError)
|
||||
.mockResolvedValueOnce([{ id: '10' }]),
|
||||
});
|
||||
const client = new LookerClient(params(), { sdkFactory: () => fakeSdk, sleep });
|
||||
|
||||
await expect(client.listDashboards()).resolves.toEqual([{ id: '10', updatedAt: null }]);
|
||||
|
||||
expect(sleep).toHaveBeenCalledWith(2000);
|
||||
expect(fakeSdk.search_dashboards).toHaveBeenCalledTimes(2);
|
||||
});
|
||||
|
||||
it('does not retry non-429 errors', async () => {
|
||||
const sleep = vi.fn().mockResolvedValue(undefined);
|
||||
const error = new Error('broken dashboard');
|
||||
Object.assign(error, { statusCode: 500 });
|
||||
const fakeSdk = sdk({ dashboard: vi.fn().mockRejectedValue(error) });
|
||||
const client = new LookerClient(params(), { sdkFactory: () => fakeSdk, sleep });
|
||||
|
||||
await expect(client.getDashboard('10')).rejects.toThrow('broken dashboard');
|
||||
|
||||
expect(sleep).not.toHaveBeenCalled();
|
||||
expect(fakeSdk.dashboard).toHaveBeenCalledTimes(1);
|
||||
});
|
||||
|
||||
it('initializes the real @looker/sdk-node SDK with inline credentials without throwing', async () => {
|
||||
const client = new LookerClient(params());
|
||||
|
||||
const result = await client.testConnection();
|
||||
|
||||
// Without injected sdkFactory the real SDK is constructed via InlineLookerSettings.
|
||||
// This used to throw "Missing required configuration values like base_url" because
|
||||
// the parent NodeSettingsIniFile constructor validated config before the override
|
||||
// could supply credentials. Whatever happens now (auth/network failure against the
|
||||
// bogus example URL is fine) — what must NOT happen is a synchronous SDK-init throw.
|
||||
expect(result.success).toBe(false);
|
||||
expect(result.error).toBeDefined();
|
||||
expect(result.error).not.toMatch(/Missing required configuration values/i);
|
||||
|
||||
await client.cleanup();
|
||||
});
|
||||
|
||||
it('strips trailing /api/4.0 from base_url so the SDK does not double-prefix it', async () => {
|
||||
const clientWithSuffix = new LookerClient({
|
||||
base_url: 'https://example.looker.com/api/4.0',
|
||||
client_id: 'id',
|
||||
[clientSecretParam]: 'credential', // pragma: allowlist secret
|
||||
});
|
||||
const result = await clientWithSuffix.testConnection();
|
||||
expect(result.success).toBe(false);
|
||||
// If base_url is double-prefixed the SDK would hit /api/4.0/api/4.0/login. Either
|
||||
// the URL is correctly normalized (transport-level network failure) or we'd see a
|
||||
// 404/HTML response — either way the stack must not be a config-validation throw.
|
||||
expect(result.error).not.toMatch(/Missing required configuration values/i);
|
||||
await clientWithSuffix.cleanup();
|
||||
});
|
||||
});
|
||||
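The retry tests above pin down the observable behavior: a 429 is retried once after sleeping for the `Retry-After` value in seconds, and other errors propagate immediately. The sketch below is one way such a wrapper could look, not the client's actual `withRateLimitRetry`. The `retryOnceOn429` and `runRetryDemo` names, the `HttpLikeError` shape, and the one-second fallback are assumptions made for illustration.

```typescript
// Hypothetical error shape, mirroring the fields the tests attach to their Error objects.
interface HttpLikeError {
  statusCode?: number;
  headers?: Record<string, string>;
}

// Run the operation; on a 429, wait for Retry-After seconds and try exactly once more.
async function retryOnceOn429<T>(operation: () => Promise<T>, sleep: (ms: number) => Promise<void>): Promise<T> {
  try {
    return await operation();
  } catch (error) {
    const details = error as HttpLikeError;
    if (details?.statusCode !== 429) {
      throw error; // Non-429 errors are not retried.
    }
    const seconds = Number(details.headers?.['retry-after'] ?? '1');
    // Fall back to one second (an assumed default) when the header is missing or malformed.
    await sleep(Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : 1000);
    return operation();
  }
}

// Demo: the first call is rate limited, the second succeeds; sleeps are recorded instead of awaited.
async function runRetryDemo(): Promise<{ result: string; calls: number; sleeps: number[] }> {
  const sleeps: number[] = [];
  let calls = 0;
  const result = await retryOnceOn429(
    async () => {
      calls += 1;
      if (calls === 1) {
        throw Object.assign(new Error('rate limited'), { statusCode: 429, headers: { 'retry-after': '2' } });
      }
      return 'ok';
    },
    async (ms) => {
      sleeps.push(ms);
    },
  );
  return { result, calls, sleeps };
}
```

Injecting `sleep` keeps the sketch testable without real delays, which is the same reason the tests pass a `sleep` mock through `LookerClientDeps`.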
732
packages/cli/src/context/ingest/adapters/looker/client.ts
Normal file
|
|
@ -0,0 +1,732 @@
|
|||
import type {
|
||||
IRequestRunInlineQuery,
|
||||
IRequestSearchDashboards,
|
||||
IRequestSearchLooks,
|
||||
IRequestSearchScheduledPlans,
|
||||
} from '@looker/sdk';
|
||||
import type { IApiSection, IApiSettings } from '@looker/sdk-rtl';
|
||||
import { LookerNodeSDK, NodeSettings } from '@looker/sdk-node';
|
||||
import type { LookerRuntimeClient } from './fetch.js';
|
||||
import type {
|
||||
StagedDashboardFile,
|
||||
StagedExploreFile,
|
||||
StagedFoldersTreeFile,
|
||||
StagedGroupFile,
|
||||
StagedLookerQuery,
|
||||
StagedLookerSignalsFile,
|
||||
StagedLookFile,
|
||||
StagedLookmlModelsFile,
|
||||
StagedUserFile,
|
||||
} from './types.js';
|
||||
|
||||
type LookerRecord = Record<string, unknown>;
|
||||
|
||||
export interface TestConnectionResult {
|
||||
success: boolean;
|
||||
error?: string;
|
||||
metadata?: Record<string, unknown>;
|
||||
}
|
||||
|
||||
export interface LookerConnectionParams extends Record<string, unknown> {
|
||||
base_url: string;
|
||||
client_id: string;
|
||||
client_secret: string;
|
||||
}
|
||||
|
||||
export interface LookerWarehouseConnectionInfo {
|
||||
name: string;
|
||||
host: string | null;
|
||||
database: string | null;
|
||||
schema: string | null;
|
||||
dialect: string | null;
|
||||
}
|
||||
|
||||
const LOOKER_PAGE_SIZE = 500;
|
||||
const LOOKER_DASHBOARD_FIELDS =
|
||||
'id,title,description,folder_id,user_id,updated_at,dashboard_elements(id,title,look_id,query(id,model,view,fields,filters,sorts,limit,dynamic_fields))';
|
||||
const LOOKER_LOOK_FIELDS =
|
||||
'id,title,description,folder_id,user_id,updated_at,query(id,model,view,fields,filters,sorts,limit,dynamic_fields)';
|
||||
const LOOKER_EXPLORE_FIELDS =
|
||||
'name,label,description,sql_table_name,connection_name,view_name,fields,joins(name,type,relationship,sql_table_name,sql_on,from)';
|
||||
|
||||
export interface LookerSdkPort {
|
||||
me(fields?: string): Promise<LookerRecord>;
|
||||
search_dashboards(request?: LookerRecord): Promise<LookerRecord[]>;
|
||||
dashboard(id: string, fields?: string): Promise<LookerRecord>;
|
||||
search_looks(request?: LookerRecord): Promise<LookerRecord[]>;
|
||||
search_scheduled_plans(request?: LookerRecord): Promise<LookerRecord[]>;
|
||||
look(id: string, fields?: string): Promise<LookerRecord>;
|
||||
all_folders(fields?: string): Promise<LookerRecord[]>;
|
||||
all_users(fields?: string): Promise<LookerRecord[]>;
|
||||
all_groups(fields?: string): Promise<LookerRecord[]>;
|
||||
all_connections(fields?: string): Promise<LookerRecord[]>;
|
||||
all_lookml_models(fields?: string): Promise<LookerRecord[]>;
|
||||
lookml_model_explore(modelName: string, exploreName: string, fields?: string): Promise<LookerRecord>;
|
||||
run_inline_query(request: IRequestRunInlineQuery): Promise<string>;
|
||||
logout(): Promise<void>;
|
||||
}
|
||||
|
||||
export interface LookerClientLogger {
|
||||
log(message: string): void;
|
||||
warn(message: string): void;
|
||||
error(message: string): void;
|
||||
debug?(message: string): void;
|
||||
}
|
||||
|
||||
export interface LookerClientDeps {
|
||||
sdkFactory?: (params: LookerConnectionParams) => LookerSdkPort;
|
||||
sleep?: (ms: number) => Promise<void>;
|
||||
logger?: LookerClientLogger;
|
||||
}
|
||||
|
||||
const defaultLogger: LookerClientLogger = {
|
||||
log: () => undefined,
|
||||
warn: () => undefined,
|
||||
error: () => undefined,
|
||||
debug: () => undefined,
|
||||
};
|
||||
|
||||
class InlineLookerSettings extends NodeSettings {
|
||||
constructor(private readonly params: LookerConnectionParams) {
|
||||
super('', {
|
||||
base_url: normalizeBaseUrl(params.base_url),
|
||||
client_id: params.client_id,
|
||||
client_secret: params.client_secret, // pragma: allowlist secret
|
||||
verify_ssl: 'true',
|
||||
timeout: '120',
|
||||
} as unknown as IApiSettings);
|
||||
}
|
||||
|
||||
override readConfig(_section?: string): IApiSection {
|
||||
return {
|
||||
base_url: normalizeBaseUrl(this.params.base_url),
|
||||
client_id: this.params.client_id,
|
||||
client_secret: this.params.client_secret, // pragma: allowlist secret
|
||||
verify_ssl: 'true',
|
||||
timeout: '120',
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
function createLookerSdkPort(params: LookerConnectionParams): LookerSdkPort {
|
||||
const sdk = LookerNodeSDK.init40(new InlineLookerSettings(params));
|
||||
return {
|
||||
me: (fields) => sdk.ok(sdk.me(fields)).then(toRecord),
|
||||
search_dashboards: (request) =>
|
||||
sdk.ok(sdk.search_dashboards((request ?? {}) as IRequestSearchDashboards)).then(toRecordArray),
|
||||
dashboard: (id, fields) => sdk.ok(sdk.dashboard(id, fields)).then(toRecord),
|
||||
search_looks: (request) => sdk.ok(sdk.search_looks((request ?? {}) as IRequestSearchLooks)).then(toRecordArray),
|
||||
search_scheduled_plans: (request) =>
|
||||
sdk.ok(sdk.search_scheduled_plans((request ?? {}) as IRequestSearchScheduledPlans)).then(toRecordArray),
|
||||
look: (id, fields) => sdk.ok(sdk.look(id, fields)).then(toRecord),
|
||||
all_folders: (fields) => sdk.ok(sdk.all_folders(fields)).then(toRecordArray),
|
||||
all_users: (fields) => sdk.ok(sdk.all_users({ fields })).then(toRecordArray),
|
||||
all_groups: (fields) => sdk.ok(sdk.all_groups({ fields })).then(toRecordArray),
|
||||
all_connections: (fields) => sdk.ok(sdk.all_connections(fields)).then(toRecordArray),
|
||||
all_lookml_models: (fields) => sdk.ok(sdk.all_lookml_models({ fields })).then(toRecordArray),
|
||||
lookml_model_explore: (modelName, exploreName, fields) =>
|
||||
sdk
|
||||
.ok(sdk.lookml_model_explore({ lookml_model_name: modelName, explore_name: exploreName, fields }))
|
||||
.then(toRecord),
|
||||
run_inline_query: (request) => sdk.ok(sdk.run_inline_query(request)),
|
||||
logout: async () => {
|
||||
await sdk.authSession.logout();
|
||||
},
|
||||
};
|
||||
}
|
||||
|
||||
export class LookerClient implements LookerRuntimeClient {
  private readonly logger: LookerClientLogger;
  private readonly params: LookerConnectionParams;
  private sdkInstance: LookerSdkPort | null = null;

  constructor(
    connectionParams: Record<string, unknown>,
    private readonly deps: LookerClientDeps = {},
  ) {
    this.logger = deps.logger ?? defaultLogger;
    this.params = parseLookerConnectionParams(connectionParams);
  }

  get dataSourceType(): string {
    return 'LOOKER';
  }

  async testConnection(): Promise<TestConnectionResult> {
    try {
      const me = await this.withRateLimitRetry(() => this.sdk().me('id,display_name,email'));
      return {
        success: true,
        metadata: {
          userId: stringValue(me.id),
          displayName: nullableString(me.display_name),
          email: nullableString(me.email),
        },
      };
    } catch (error) {
      return { success: false, error: error instanceof Error ? error.message : String(error) };
    }
  }

  async listDashboards(): Promise<Array<{ id: string; updatedAt: string | null }>> {
    const dashboards = await this.collectPaged((offset) =>
      this.sdk().search_dashboards({
        deleted: false,
        fields: 'id,updated_at',
        limit: LOOKER_PAGE_SIZE,
        offset,
        sorts: 'id',
      }),
    );
    return dashboards.flatMap(entityRef);
  }

  async getDashboard(id: string): Promise<StagedDashboardFile> {
    const dashboard = await this.withRateLimitRetry(() => this.sdk().dashboard(id, LOOKER_DASHBOARD_FIELDS));
    const elements = arrayValue(dashboard.dashboard_elements);
    return {
      lookerId: stringValue(dashboard.id),
      title: stringValue(dashboard.title),
      description: nullableString(dashboard.description),
      folderId: nullableString(dashboard.folder_id),
      ownerId: nullableString(dashboard.user_id),
      updatedAt: nullableString(dashboard.updated_at),
      tiles: elements.map((tile) => ({
        id: stringValue(tile.id),
        title: nullableString(tile.title),
        lookId: nullableString(tile.look_id),
        query: queryValue(tile.query),
      })),
    };
  }

  async listLooks(): Promise<Array<{ id: string; updatedAt: string | null }>> {
    const looks = await this.collectPaged((offset) =>
      this.sdk().search_looks({
        deleted: false,
        fields: 'id,updated_at',
        limit: LOOKER_PAGE_SIZE,
        offset,
        sorts: 'id',
      }),
    );
    return looks.flatMap(entityRef);
  }

  async getLook(id: string): Promise<StagedLookFile> {
    const look = await this.withRateLimitRetry(() => this.sdk().look(id, LOOKER_LOOK_FIELDS));
    return {
      lookerId: stringValue(look.id),
      title: stringValue(look.title),
      description: nullableString(look.description),
      folderId: nullableString(look.folder_id),
      ownerId: nullableString(look.user_id),
      updatedAt: nullableString(look.updated_at),
      query: queryValue(look.query),
    };
  }

  async listFolders(): Promise<StagedFoldersTreeFile> {
    const folders = await this.withRateLimitRetry(() => this.sdk().all_folders('id,name,parent_id'));
    const byId = new Map<string, LookerRecord>();
    for (const folder of folders) {
      byId.set(stringValue(folder.id), folder);
    }
    return {
      folders: folders.map((folder) => ({
        id: stringValue(folder.id),
        name: stringValue(folder.name),
        parentId: nullableString(folder.parent_id),
        path: folderPath(folder, byId),
      })),
    };
  }

  async listUsers(): Promise<StagedUserFile[]> {
    const users = await this.withRateLimitRetry(() => this.sdk().all_users('id,display_name,email'));
    return users.map((user) => ({
      id: stringValue(user.id),
      displayName: nullableString(user.display_name),
      email: nullableString(user.email),
    }));
  }

  async listGroups(): Promise<StagedGroupFile[]> {
    const groups = await this.withRateLimitRetry(() => this.sdk().all_groups('id,name'));
    return groups.map((group) => ({
      id: stringValue(group.id),
      name: stringValue(group.name),
    }));
  }

  async listLookmlModels(): Promise<StagedLookmlModelsFile> {
    const models = await this.withRateLimitRetry(() => this.sdk().all_lookml_models('name,label,explores'));
    return {
      models: models.map((model) => ({
        name: stringValue(model.name),
        label: nullableString(model.label),
        explores: arrayValue(model.explores).map((explore) => ({
          name: stringValue(explore.name),
          label: nullableString(explore.label),
        })),
      })),
    };
  }

  async listLookerConnections(): Promise<LookerWarehouseConnectionInfo[]> {
    const connections = await this.withRateLimitRetry(() =>
      this.sdk().all_connections('name,host,database,schema,dialect_name'),
    );
    return connections.map((connection) => ({
      name: stringValue(connection.name),
      host: nullableString(connection.host),
      database: nullableString(connection.database),
      schema: nullableString(connection.schema),
      dialect: nullableString(connection.dialect_name ?? connection.dialect),
    }));
  }

  async getExplore(modelName: string, exploreName: string): Promise<StagedExploreFile> {
    const explore = await this.withRateLimitRetry(() =>
      this.sdk().lookml_model_explore(modelName, exploreName, LOOKER_EXPLORE_FIELDS),
    );
    const fields = recordValue(explore.fields);
    return {
      modelName,
      exploreName: stringValue(explore.name),
      label: nullableString(explore.label),
      description: nullableString(explore.description),
      rawSqlTableName: nullableString(explore.sql_table_name ?? explore.sqlTableName),
      connectionName: nullableString(explore.connection_name ?? explore.connectionName),
      viewName: nullableString(explore.view_name ?? explore.viewName),
      fields: {
        dimensions: arrayValue(fields.dimensions).map(stagedField),
        measures: arrayValue(fields.measures).map(stagedField),
      },
      joins: arrayValue(explore.joins).map((join) => ({
        name: stringValue(join.name),
        type: nullableString(join.type),
        relationship: nullableString(join.relationship),
        rawSqlTableName: nullableString(join.sql_table_name ?? join.sqlTableName),
        sqlOn: nullableString(join.sql_on ?? join.sqlOn),
        from: nullableString(join.from),
        targetTable: null,
      })),
      targetWarehouseConnectionId: null,
      targetTable: null,
    };
  }

  async getSignals(): Promise<StagedLookerSignalsFile> {
    const [dashboardUsage, lookUsage, scheduledPlans, favorites] = await Promise.all([
      this.getUsageSignals('dashboard').catch((error) =>
        this.warnAndReturnEmpty('Looker system__activity dashboard usage unavailable', error),
      ),
      this.getUsageSignals('look').catch((error) =>
        this.warnAndReturnEmpty('Looker system__activity Look usage unavailable', error),
      ),
      this.getScheduledPlanSignals().catch((error) =>
        this.warnAndReturnEmpty('Looker scheduled-plan signals unavailable', error),
      ),
      this.getFavoriteSignals().catch((error) => this.warnAndReturnEmpty('Looker favorite signals unavailable', error)),
    ]);

    return { dashboardUsage, lookUsage, scheduledPlans, favorites };
  }

  async cleanup(): Promise<void> {
    const sdk = this.sdkInstance;
    if (!sdk) {
      return;
    }
    await sdk.logout();
    this.sdkInstance = null;
  }

  private async getUsageSignals(contentType: 'dashboard' | 'look'): Promise<StagedLookerSignalsFile['dashboardUsage']> {
    const idField = contentType === 'dashboard' ? 'dashboard.id' : 'look.id';
    const raw = await this.withRateLimitRetry(() =>
      this.sdk().run_inline_query({
        result_format: 'json',
        body: {
          model: 'system__activity',
          view: 'history',
          fields: [idField, 'history.query_run_count', 'history.created_date', 'user.id'],
          filters: {
            'history.created_date': '30 days',
            [idField]: '-NULL',
          },
          sorts: ['history.query_run_count desc'],
          limit: '5000',
        },
      }),
    );

    return aggregateUsageRows(parseJsonRows(raw), idField);
  }

  private async getScheduledPlanSignals(): Promise<StagedLookerSignalsFile['scheduledPlans']> {
    const plans = await this.collectPaged((offset) =>
      this.sdk().search_scheduled_plans({
        all_users: true,
        fields: 'id,dashboard_id,look_id,enabled,scheduled_plan_destination',
        limit: LOOKER_PAGE_SIZE,
        offset,
        sorts: 'id',
      }),
    );
    const byContent = new Map<
      string,
      {
        contentId: string;
        contentType: 'dashboard' | 'look';
        isScheduled: boolean;
        scheduleCount: number;
        recipientCount: number;
      }
    >();

    for (const plan of plans) {
      const dashboardId = nullableString(plan.dashboard_id);
      const lookId = nullableString(plan.look_id);
      const contentType = dashboardId ? 'dashboard' : lookId ? 'look' : null;
      const contentId = dashboardId ?? lookId;
      if (!contentType || !contentId) {
        continue;
      }
      const key = `${contentType}:${contentId}`;
      const current =
        byContent.get(key) ??
        ({
          contentId,
          contentType,
          isScheduled: false,
          scheduleCount: 0,
          recipientCount: 0,
        } satisfies StagedLookerSignalsFile['scheduledPlans'][number]);
      if (plan.enabled !== false) {
        current.isScheduled = true;
        current.scheduleCount += 1;
        current.recipientCount += arrayValue(plan.scheduled_plan_destination).length;
      }
      byContent.set(key, current);
    }

    return [...byContent.values()].filter((signal) => signal.scheduleCount > 0).sort(compareContentSignals);
  }

  private async getFavoriteSignals(): Promise<StagedLookerSignalsFile['favorites']> {
    const dashboards = await this.collectPaged((offset) =>
      this.sdk().search_dashboards({
        deleted: false,
        fields: 'id,favorite_count',
        limit: LOOKER_PAGE_SIZE,
        offset,
        sorts: 'id',
      }),
    );
    const looks = await this.collectPaged((offset) =>
      this.sdk().search_looks({
        deleted: false,
        fields: 'id,favorite_count',
        limit: LOOKER_PAGE_SIZE,
        offset,
        sorts: 'id',
      }),
    );

    return [
      ...dashboards.flatMap((dashboard) => favoriteSignal(dashboard, 'dashboard')),
      ...looks.flatMap((look) => favoriteSignal(look, 'look')),
    ].sort(compareContentSignals);
  }

  private warnAndReturnEmpty(message: string, error: unknown): never[] {
    this.logger.warn(`${message}; continuing without that prioritization input: ${errorMessage(error)}`);
    return [];
  }

  private async collectPaged(loadPage: (offset: number) => Promise<LookerRecord[]>): Promise<LookerRecord[]> {
    const rows: LookerRecord[] = [];
    for (let offset = 0; ; offset += LOOKER_PAGE_SIZE) {
      const page = await this.withRateLimitRetry(() => loadPage(offset));
      rows.push(...page);
      if (page.length < LOOKER_PAGE_SIZE) {
        return rows;
      }
    }
  }

  private async withRateLimitRetry<T>(load: () => Promise<T>): Promise<T> {
    try {
      return await load();
    } catch (error) {
      if (lookerStatusCode(error) !== 429) {
        throw error;
      }
      await (this.deps.sleep ?? sleep)(retryAfterMs(error));
      return load();
    }
  }

  private sdk(): LookerSdkPort {
    if (!this.sdkInstance) {
      this.sdkInstance = this.deps.sdkFactory?.(this.params) ?? createLookerSdkPort(this.params);
    }
    return this.sdkInstance;
  }
}

function parseLookerConnectionParams(raw: Record<string, unknown>): LookerConnectionParams {
  const baseUrl = raw.base_url;
  const clientId = raw.client_id;
  const apiCredential = raw.client_secret; // pragma: allowlist secret
  if (typeof baseUrl !== 'string' || baseUrl.trim() === '') {
    throw new Error('Looker base_url is required');
  }
  if (typeof clientId !== 'string' || clientId.trim() === '') {
    throw new Error('Looker client_id is required');
  }
  if (typeof apiCredential !== 'string' || apiCredential.trim() === '') {
    throw new Error('Looker client_secret is required'); // pragma: allowlist secret
  }
  return { base_url: baseUrl, client_id: clientId, client_secret: apiCredential }; // pragma: allowlist secret
}

function toRecord(value: object): LookerRecord {
  return value as LookerRecord;
}

function toRecordArray(values: object[]): LookerRecord[] {
  return values.map(toRecord);
}

function normalizeBaseUrl(baseUrl: string): string {
  return baseUrl
    .trim()
    .replace(/\/+$/, '')
    .replace(/\/api\/(4\.0|3\.1)$/, '');
}

function entityRef(row: LookerRecord): Array<{ id: string; updatedAt: string | null }> {
  if (row.id === null || row.id === undefined) {
    return [];
  }
  return [{ id: String(row.id), updatedAt: nullableString(row.updated_at) }];
}

function queryValue(value: unknown): StagedLookerQuery | null {
  if (!value || typeof value !== 'object') {
    return null;
  }
  const record = value as LookerRecord;
  if (typeof record.model !== 'string' || typeof record.view !== 'string') {
    return null;
  }
  return {
    id: nullableString(record.id) ?? undefined,
    model: record.model,
    view: record.view,
    fields: stringArray(record.fields),
    filters: recordValue(record.filters),
    sorts: stringArray(record.sorts),
    limit: typeof record.limit === 'string' || typeof record.limit === 'number' ? record.limit : null,
    dynamicFields: nullableString(record.dynamic_fields ?? record.dynamicFields),
    targetWarehouseConnectionId: null,
    targetTable: null,
  };
}

function parseJsonRows(raw: string): LookerRecord[] {
  const parsed = JSON.parse(raw) as unknown;
  return Array.isArray(parsed) ? parsed.filter((row): row is LookerRecord => !!row && typeof row === 'object') : [];
}

function aggregateUsageRows(
  rows: LookerRecord[],
  idField: 'dashboard.id' | 'look.id',
): StagedLookerSignalsFile['dashboardUsage'] {
  const byContent = new Map<
    string,
    {
      contentId: string;
      queryCount30d: number;
      lastRunAt: string | null;
      users: Set<string>;
    }
  >();

  for (const row of rows) {
    const contentId = nullableString(row[idField]);
    if (!contentId) {
      continue;
    }
    const current = byContent.get(contentId) ?? {
      contentId,
      queryCount30d: 0,
      lastRunAt: null,
      users: new Set<string>(),
    };
    current.queryCount30d += numberValue(row['history.query_run_count']);
    const userId = nullableString(row['user.id']);
    if (userId) {
      current.users.add(userId);
    }
    const lastRunAt = nullableString(row['history.created_date']);
    if (lastRunAt && (!current.lastRunAt || lastRunAt > current.lastRunAt)) {
      current.lastRunAt = lastRunAt;
    }
    byContent.set(contentId, current);
  }

  return [...byContent.values()]
    .map((signal) => ({
      contentId: signal.contentId,
      queryCount30d: signal.queryCount30d,
      uniqueUsers30d: signal.users.size,
      lastRunAt: signal.lastRunAt,
      topUsers: [...signal.users].sort().slice(0, 5),
    }))
    .sort((a, b) => a.contentId.localeCompare(b.contentId));
}

function favoriteSignal(row: LookerRecord, contentType: 'dashboard' | 'look'): StagedLookerSignalsFile['favorites'] {
  const contentId = nullableString(row.id);
  if (!contentId) {
    return [];
  }
  return [{ contentId, contentType, favoriteCount: numberValue(row.favorite_count) }];
}

function compareContentSignals(
  a: { contentType?: string; contentId: string },
  b: { contentType?: string; contentId: string },
): number {
  return `${a.contentType ?? ''}:${a.contentId}`.localeCompare(`${b.contentType ?? ''}:${b.contentId}`);
}

function numberValue(value: unknown): number {
  if (typeof value === 'number' && Number.isFinite(value)) {
    return value;
  }
  if (typeof value === 'string' && value.trim() !== '') {
    const parsed = Number(value);
    return Number.isFinite(parsed) ? parsed : 0;
  }
  return 0;
}

function errorMessage(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}

async function sleep(ms: number): Promise<void> {
  await new Promise((resolve) => setTimeout(resolve, ms));
}

function lookerStatusCode(error: unknown): number | null {
  if (!error || typeof error !== 'object') {
    return null;
  }
  const record = error as Record<string, unknown>;
  const direct = record.statusCode ?? record.status;
  if (typeof direct === 'number') {
    return direct;
  }
  if (typeof direct === 'string') {
    const parsed = Number(direct);
    return Number.isFinite(parsed) ? parsed : null;
  }
  const response = record.response;
  if (response && typeof response === 'object') {
    return lookerStatusCode(response);
  }
  return null;
}

function retryAfterMs(error: unknown): number {
  const value = retryAfterHeader(error);
  if (!value) {
    return 1000;
  }
  const seconds = Number(value);
  if (Number.isFinite(seconds)) {
    return Math.max(0, seconds * 1000);
  }
  const dateMs = Date.parse(value);
  return Number.isFinite(dateMs) ? Math.max(0, dateMs - Date.now()) : 1000;
}

function retryAfterHeader(error: unknown): string | null {
  if (!error || typeof error !== 'object') {
    return null;
  }
  const record = error as Record<string, unknown>;
  const response = record.response;
  const responseRecord = response && typeof response === 'object' ? (response as Record<string, unknown>) : null;
  const headers = record.headers ?? responseRecord?.headers;
  if (!headers || typeof headers !== 'object') {
    return null;
  }
  const getter = (headers as { get?: unknown }).get;
  if (typeof getter === 'function') {
    const value = getter.call(headers, 'retry-after');
    return typeof value === 'string' ? value : null;
  }
  const headerRecord = headers as Record<string, unknown>;
  const direct = headerRecord['retry-after'] ?? headerRecord['Retry-After'];
  return typeof direct === 'string' ? direct : null;
}

function stagedField(value: LookerRecord) {
  return {
    name: stringValue(value.name),
    label: nullableString(value.label),
    type: nullableString(value.type),
    sql: nullableString(value.sql),
    description: nullableString(value.description),
  };
}

function folderPath(folder: LookerRecord, byId: Map<string, LookerRecord>): string[] {
  const path: string[] = [];
  let current: LookerRecord | undefined = folder;
  const seen = new Set<string>();
  while (current) {
    const id = stringValue(current.id);
    if (seen.has(id)) {
      break;
    }
    seen.add(id);
    path.unshift(stringValue(current.name));
    const parentId = nullableString(current.parent_id);
    current = parentId ? byId.get(parentId) : undefined;
  }
  return path;
}

function arrayValue(value: unknown): LookerRecord[] {
  return Array.isArray(value) ? value.filter((item): item is LookerRecord => !!item && typeof item === 'object') : [];
}

function recordValue(value: unknown): Record<string, unknown> {
  return value && typeof value === 'object' && !Array.isArray(value) ? { ...(value as Record<string, unknown>) } : {};
}

function stringArray(value: unknown): string[] {
  return Array.isArray(value) ? value.filter((item): item is string => typeof item === 'string') : [];
}

function stringValue(value: unknown): string {
  if (value === null || value === undefined) {
    return '';
  }
  return String(value);
}

function nullableString(value: unknown): string | null {
  if (value === null || value === undefined) {
    return null;
  }
  return String(value);
}
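
The `retryAfterMs` helper in the Looker client above accepts both forms a `Retry-After` header can take: a number of seconds or an HTTP date. The following is a minimal standalone sketch of that parsing, not the code from this commit. The name `parseRetryAfterMs` and the `nowMs` and `fallbackMs` parameters are hypothetical, introduced so the clock can be injected, and unlike the original it also treats a whitespace-only header as missing.

```typescript
// Hypothetical helper for illustration; not part of the diff above.
// `nowMs` replaces Date.now() so the result is deterministic.
function parseRetryAfterMs(header: string | null, nowMs: number, fallbackMs = 1000): number {
  if (header === null || header.trim() === '') {
    return fallbackMs; // no usable header: wait a fixed default
  }
  const seconds = Number(header);
  if (Number.isFinite(seconds)) {
    return Math.max(0, seconds * 1000); // delta-seconds form, clamped at zero
  }
  const dateMs = Date.parse(header);
  // HTTP-date form: wait until that instant, or fall back if unparseable.
  return Number.isFinite(dateMs) ? Math.max(0, dateMs - nowMs) : fallbackMs;
}

const now = Date.parse('Wed, 21 Oct 2015 07:28:00 GMT');
const waitSeconds = parseRetryAfterMs('120', now);
const waitDate = parseRetryAfterMs('Wed, 21 Oct 2015 07:28:30 GMT', now);
const waitGarbage = parseRetryAfterMs('soon', now);
console.log(waitSeconds, waitDate, waitGarbage);
```

Passing the current time as an argument is a design choice of this sketch only; it keeps the date branch testable without mocking the clock.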
@@ -0,0 +1,44 @@
import { describe, expect, it, vi } from 'vitest';
import { createDaemonLookerTableIdentifierParser } from './daemon-table-identifier-parser.js';

describe('createDaemonLookerTableIdentifierParser', () => {
  it('posts parse items to the daemon endpoint', async () => {
    const requestJson = vi.fn(async () => ({
      results: {
        orders: {
          ok: true,
          catalog: null,
          schema: 'public',
          name: 'orders',
          canonical_table: 'public.orders',
        },
      },
    }));
    const parser = createDaemonLookerTableIdentifierParser({
      baseUrl: 'http://127.0.0.1:8765',
      requestJson,
    });

    await expect(parser.parse([{ key: 'orders', sql_table_name: 'public.orders', dialect: 'postgres' }])).resolves.toEqual({
      orders: {
        ok: true,
        catalog: null,
        schema: 'public',
        name: 'orders',
        canonical_table: 'public.orders',
      },
    });
    expect(requestJson).toHaveBeenCalledWith('/sql/parse-table-identifier', {
      items: [{ key: 'orders', sql_table_name: 'public.orders', dialect: 'postgres' }],
    });
  });

  it('rejects non-object daemon responses', async () => {
    const parser = createDaemonLookerTableIdentifierParser({
      baseUrl: 'http://127.0.0.1:8765',
      requestJson: async () => ({ results: null }),
    });

    await expect(parser.parse([])).rejects.toThrow('ktx-daemon table identifier parser returned invalid results');
  });
});
@@ -0,0 +1,81 @@
import { request as httpRequest } from 'node:http';
import { request as httpsRequest } from 'node:https';
import { URL } from 'node:url';
import type {
  LookerParsedIdentifier,
  LookerTableIdentifierParseItem,
  LookerTableIdentifierParser,
} from './mapping.js';

export type KtxDaemonTableIdentifierHttpJsonRunner = (
  path: string,
  payload: Record<string, unknown>,
) => Promise<Record<string, unknown>>;

export interface DaemonLookerTableIdentifierParserOptions {
  baseUrl: string;
  requestJson?: KtxDaemonTableIdentifierHttpJsonRunner;
}

export function createDaemonLookerTableIdentifierParser(
  options: DaemonLookerTableIdentifierParserOptions,
): LookerTableIdentifierParser {
  const requestJson = options.requestJson ?? postJson(options.baseUrl);
  return {
    async parse(items: LookerTableIdentifierParseItem[]): Promise<Record<string, LookerParsedIdentifier>> {
      const raw = await requestJson('/sql/parse-table-identifier', { items });
      if (!raw.results || typeof raw.results !== 'object' || Array.isArray(raw.results)) {
        throw new Error('ktx-daemon table identifier parser returned invalid results');
      }
      return raw.results as Record<string, LookerParsedIdentifier>;
    },
  };
}

function normalizedBaseUrl(baseUrl: string): string {
  return baseUrl.endsWith('/') ? baseUrl : `${baseUrl}/`;
}

function postJson(baseUrl: string): KtxDaemonTableIdentifierHttpJsonRunner {
  return async (path, payload) =>
    new Promise((resolve, reject) => {
      const target = new URL(path.replace(/^\//, ''), normalizedBaseUrl(baseUrl));
      const body = JSON.stringify(payload);
      const client = target.protocol === 'https:' ? httpsRequest : httpRequest;
      const request = client(
        target,
        {
          method: 'POST',
          headers: {
            accept: 'application/json',
            'content-type': 'application/json',
            'content-length': Buffer.byteLength(body),
          },
        },
        (response) => {
          const chunks: Buffer[] = [];
          response.on('data', (chunk: Buffer) => chunks.push(chunk));
          response.on('end', () => {
            const text = Buffer.concat(chunks).toString('utf8');
            const statusCode = response.statusCode ?? 0;
            if (statusCode < 200 || statusCode >= 300) {
              reject(new Error(`ktx-daemon HTTP ${path} failed with ${statusCode}: ${text}`));
              return;
            }
            try {
              const parsed = JSON.parse(text) as unknown;
              if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) {
                reject(new Error(`ktx-daemon HTTP ${path} returned non-object JSON`));
                return;
              }
              resolve(parsed as Record<string, unknown>);
            } catch (error) {
              reject(error);
            }
          });
        },
      );
      request.on('error', reject);
      request.end(body);
    });
}
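
In `postJson` above, the request target is built by stripping the leading slash from `path` and appending a trailing slash to the base URL before calling `new URL`. A plausible reason, sketched here as an assumption and not a statement of the author's intent, is that WHATWG URL resolution would otherwise discard any path prefix on the base. The helper name `resolveDaemonUrl` and the `/daemon` prefix are hypothetical.

```typescript
// Hypothetical helper mirroring the two normalizations used when building the
// request target; it relies on the global WHATWG `URL` available in Node.js.
function resolveDaemonUrl(baseUrl: string, path: string): string {
  const base = baseUrl.endsWith('/') ? baseUrl : `${baseUrl}/`;
  return new URL(path.replace(/^\//, ''), base).toString();
}

// Base without a path prefix: the endpoint lands directly under the origin.
const plain = resolveDaemonUrl('http://127.0.0.1:8765', '/sql/parse-table-identifier');

// Base with an assumed path prefix: the prefix is kept.
const prefixed = resolveDaemonUrl('http://127.0.0.1:8765/daemon', '/sql/parse-table-identifier');

// Without the normalizations, an absolute path replaces the base path entirely.
const naive = new URL('/sql/parse-table-identifier', 'http://127.0.0.1:8765/daemon').toString();

console.log(plain);
console.log(prefixed);
console.log(naive);
```

With a bare host and port, as in the test file, both approaches resolve to the same URL; the difference only shows up when the base URL carries a path.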
Some files were not shown because too many files have changed in this diff