mirror of
https://github.com/Kaelio/ktx.git
synced 2026-06-25 08:48:08 +02:00
chore(workspace): gate dead-code with knip production mode (#196)
* refactor(workspace): relocate @ktx/llm source into packages/cli/src/llm

* refactor(workspace): rewrite @ktx/llm imports to relative paths

* refactor(workspace): fold internal packages into cli

* chore(workspace): gate dead-code with knip production mode

  Turn on production-mode knip plus an autofix run in pre-commit and the `pnpm dead-code` script, document the `/** @internal */` convention for test-only exports in AGENTS.md, annotate test-only exports across the CLI with that JSDoc, and drop dead exports/wrappers the new gate surfaced (e.g. `cli-project.ts`, `lookerRuntimeSourceToFileAdapterSource`, `createLocalScanEnrichmentProvidersFromConfig`, `PGLITE_OWNER_PROCESS_BACKEND_CAPABILITIES`, stale type re-exports). Replace the loose `ignoreIssues` allowlist in `knip.json` with explicit production entries so cross-package barrel leaks are caught.

* refactor(cli): delete internal barrel index.ts files

  The 34 `index.ts` re-export barrels inside `packages/cli/src/` were holdovers from the pre-fold multi-workspace structure. Post-fold-in they served no production purpose: external consumers go through the single package main entry, and in-repo callers mostly imported through them only because the path was short. Internally, knip flagged most barrel re-exports as production-dead (only reached via tests).

  This change:
  - Deletes every internal barrel except `packages/cli/src/index.ts` (the published package entry).
  - Rewrites ~270 source/test files to import each name directly from the file that defines it.
  - Moves `tools/warehouse-verification/index.ts` to `create-warehouse-verification-tools.ts` (the function it defined locally) and updates its single consumer.
  - Renames `search/backend-conformance.ts` → `.test-utils.ts` to match the existing test-helper file convention.
  - Deletes 13 dead test-only chains (dbt-descriptions/*, live-database/extracted-schema, live-database/structural-sync, relationship-* feedback/review chain) plus their tests and a cascading orphan integration test.
  - Updates test mocks that pointed at deleted barrel paths (notion-client, connector barrels in scan/local-scan-connectors tests) to mock the source files instead.
  - Points the maintainer benchmark script (`scripts/relationship-benchmark-report.mjs`) at source files instead of `dist/context/scan/index.js`.
  - Drops the barrel `!` entries from `knip.json`; adds explicit production entries only for the benchmark code reached via dist by the maintainer script.

  Net: 413 files changed, ~1.2k insertions, ~9.4k deletions. `pnpm run dead-code` (Biome + knip default + knip production) and `pnpm run type-check` are clean; 2277 tests pass.

* refactor(workspace): rename @ktx/cli to @kaelio/ktx and pack it directly

  Promote the CLI workspace package to the public name `@kaelio/ktx` and drop the separate `scripts/build-public-npm-package.mjs` wrapper. The CLI package is now publishable in place (`publishConfig.access: public`, `provenance: true`), so artifact packing uses `pnpm pack` against `packages/cli/` instead of assembling a parallel package tree. Updates all workspace filter invocations, docs, tests, and release readiness checks to reference the new package name, and folds the tarball-name helper into `scripts/public-npm-release-metadata.mjs`.

* docs: align "agent clients" and "data agents" terminology

  Replace "client agents" with "agent clients" and "database agents" with "data agents" across AGENTS.md, README.md, the docs-site copy, and the matching setup-agents test description, matching the canonical vocabulary in docs/terminology.md.

  Also moves packages/cli/tsconfig.json's tsBuildInfoFile from node_modules/.cache/ to dist/.tsbuildinfo so incremental builds survive node_modules reinstalls.

* refactor(release): single source of truth for package version

  Make packages/cli/package.json the single source of truth for the @kaelio/ktx version. publicNpmPackageVersion() now reads it directly, so artifact filenames, release-readiness checks, and the Python wheel version all derive from one field. The duplicate release-policy.json.publicNpmPackageVersion is removed. Previously the two fields could drift: tarballs were named kaelio-ktx-0.4.1.tgz while internally containing @kaelio/ktx@0.0.0-private.

  - update-public-release-version.mjs rewrites both Python pyproject.toml files (ktx-daemon, ktx-sl) alongside the npm package.jsons, normalizing the version for PEP 440 (e.g. 0.1.0-rc.2 -> 0.1.0rc2).
  - semantic-release-config.cjs adds the two pyproject.toml files to @semantic-release/git assets so the release commit back to main carries every version source in lockstep.
  - The six "?? '0.0.0-private'" fallback literals across the CLI are replaced with "?? getKtxCliPackageInfo().version", and createDefaultKtxMcpServer makes its version arg required.
  - docs/release.md describes the actual commit-back model: the dev tree always reflects the most recent release; no sentinel pin to maintain.

  Verified: pnpm run artifacts:build now produces kaelio-ktx-0.4.1.tgz and kaelio_ktx-0.4.1-py3-none-any.whl with @kaelio/ktx@0.4.1 inside. Full type-check, dead-code, and 2287 vitests + 173 script tests pass.

* refactor(cli): inject embedding provider resolution and detect sentence-transformers runtime

  Make resolveProjectEmbeddingProvider and runtimeIo injectable in ingest and scan command entrypoints so tests can stub them, and teach resolvePublicIngestRuntimeRequirements to flag the local-embeddings runtime feature when ktx.yaml selects sentence-transformers.

* chore(cli): mark buildLocalStatsStatus and LocalStatsStatus as @internal

  Both symbols are consumed only by status-project.test.ts. Annotating with /** @internal */ keeps knip's production-mode check clean without changing runtime behavior.

* fix(cli): use real package metadata in print-command-tree

  The stubbed package name embedded a forbidden product identifier that tripped the boundary check in CI. Read the metadata from package.json instead, which keeps the rendered tree unchanged and removes a duplicate source of truth.

* feat(cli): show embedding coverage in `ktx status`, drop duplicate disk counts

  Inline `(N embedded)` next to the Wiki scope counts and Semantic-layer source counts, computed with `SUM(embedding_json IS NOT NULL)` over `knowledge_pages` and `local_sl_sources`. Rename the "Knowledge" label to "Wiki" (canonical per `docs/terminology.md`) and rename the matching `localStats.knowledgePages` field to `localStats.wikiPages`.

  Drop `wiki=N md` and `semantic-layer=N yaml` from the Disk row, since those duplicated the per-surface rows above. Disk now reports only actual byte usage (db, cache, raw-sources). The unused `wikiGlobalMarkdownCount` / `semanticLayerYamlCount` fields, the `isMarkdownEntry` / `isYamlEntry` helpers, and the `filter` arg on `summarizeDir` are removed.
This commit is contained in:
parent
a1cfb03d73
commit
2366b00301
1002 changed files with 2286 additions and 12051 deletions
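The commit message mentions normalizing the npm version for PEP 440 (e.g. 0.1.0-rc.2 -> 0.1.0rc2). A minimal sketch of what such a conversion could look like is below. The function name `toPep440Version` and the accepted input grammar are assumptions for illustration; this is not the code in update-public-release-version.mjs, which is not shown in this diff.

```typescript
// Hypothetical helper (name and accepted grammar are assumptions, not the repo's code).
// Converts "MAJOR.MINOR.PATCH" or "MAJOR.MINOR.PATCH-<alpha|beta|rc>.N" into the
// PEP 440 spelling, where pre-release labels are written as a / b / rc with no separators.
const PEP440_LABELS: Record<string, string> = { alpha: 'a', beta: 'b', rc: 'rc' };

function toPep440Version(semver: string): string {
  const match = /^(\d+\.\d+\.\d+)(?:-(alpha|beta|rc)\.(\d+))?$/.exec(semver);
  if (!match) {
    throw new Error(`Unsupported version format: ${semver}`);
  }
  const release = match[1] ?? '';
  const label = match[2];
  const serial = match[3];
  if (label === undefined || serial === undefined) {
    // Plain release versions are already valid PEP 440.
    return release;
  }
  return `${release}${PEP440_LABELS[label] ?? label}${serial}`;
}

console.log(toPep440Version('0.1.0-rc.2'));
console.log(toPep440Version('0.4.1'));
```

The sketch rejects anything outside that narrow grammar instead of guessing, which seems the safer default when one version string feeds both npm and Python artifacts.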
42
packages/cli/src/context/ingest/action-identity.test.ts
Normal file
@@ -0,0 +1,42 @@
import { describe, expect, it } from 'vitest';
import { actionTargetConnectionId, memoryActionIdentity } from './action-identity.js';

describe('memory action target identity', () => {
  it('keys SL actions by target connection and wiki actions by run connection', () => {
    expect(
      memoryActionIdentity(
        { target: 'sl', type: 'created', key: 'orders', detail: '', targetConnectionId: 'warehouse-b' },
        'looker-run',
      ),
    ).toBe('sl:warehouse-b:orders');

    expect(memoryActionIdentity({ target: 'sl', type: 'created', key: 'orders', detail: '' }, 'warehouse-a')).toBe(
      'sl:warehouse-a:orders',
    );

    expect(
      memoryActionIdentity(
        {
          target: 'wiki',
          type: 'created',
          key: 'wiki/global/orders.md',
          detail: '',
          targetConnectionId: 'ignored',
        },
        'looker-run',
      ),
    ).toBe('wiki:looker-run:wiki/global/orders.md');
  });

  it('resolves action target connection only for SL actions', () => {
    expect(
      actionTargetConnectionId(
        { target: 'sl', type: 'updated', key: 'orders', detail: '', targetConnectionId: 'warehouse-b' },
        'looker-run',
      ),
    ).toBe('warehouse-b');
    expect(actionTargetConnectionId({ target: 'wiki', type: 'updated', key: 'orders', detail: '' }, 'looker-run')).toBe(
      'looker-run',
    );
  });
});
9
packages/cli/src/context/ingest/action-identity.ts
Normal file
@@ -0,0 +1,9 @@
import type { MemoryAction } from '../../context/memory/types.js';

export function actionTargetConnectionId(action: MemoryAction, runConnectionId: string): string {
  return action.target === 'sl' ? (action.targetConnectionId ?? runConnectionId) : runConnectionId;
}

export function memoryActionIdentity(action: MemoryAction, runConnectionId: string): string {
  return `${action.target}:${actionTargetConnectionId(action, runConnectionId)}:${action.key}`;
}
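As an illustration of how the identity string from action-identity.ts might be used, the sketch below deduplicates a list of actions by identity. The two functions are copied from the diff; the `MemoryAction` shape is an assumed minimal version inferred from the tests (the real type lives in context/memory/types.js), and `dedupeByIdentity` is a hypothetical helper that does not appear in this commit.

```typescript
// Assumed minimal shape, inferred from the test fixtures; not the real type definition.
interface MemoryAction {
  target: 'sl' | 'wiki';
  type: 'created' | 'updated';
  key: string;
  detail: string;
  targetConnectionId?: string;
}

// Copied from the diff: only SL actions honor targetConnectionId.
function actionTargetConnectionId(action: MemoryAction, runConnectionId: string): string {
  return action.target === 'sl' ? (action.targetConnectionId ?? runConnectionId) : runConnectionId;
}

function memoryActionIdentity(action: MemoryAction, runConnectionId: string): string {
  return `${action.target}:${actionTargetConnectionId(action, runConnectionId)}:${action.key}`;
}

// Hypothetical helper: keep the last action seen for each identity.
// Map preserves the insertion position of the first occurrence of a key.
function dedupeByIdentity(actions: MemoryAction[], runConnectionId: string): MemoryAction[] {
  const latest = new Map<string, MemoryAction>();
  for (const action of actions) {
    latest.set(memoryActionIdentity(action, runConnectionId), action);
  }
  return Array.from(latest.values());
}

const deduped = dedupeByIdentity(
  [
    { target: 'sl', type: 'created', key: 'orders', detail: '', targetConnectionId: 'warehouse-b' },
    { target: 'sl', type: 'updated', key: 'orders', detail: '', targetConnectionId: 'warehouse-b' },
    { target: 'wiki', type: 'created', key: 'wiki/global/orders.md', detail: '', targetConnectionId: 'ignored' },
  ],
  'looker-run',
);
console.log(deduped.map((action) => memoryActionIdentity(action, 'looker-run')));
```

Because the connection id is part of the key, the same SL source name written to two different warehouses stays as two distinct entries, while wiki pages are always scoped to the run connection.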
@@ -0,0 +1,214 @@
import { describe, expect, it } from 'vitest';
import { parseDbtSchemaFile, parseDbtSchemaFiles } from './parse-schema.js';

describe('dbt descriptions schema parser', () => {
  it('resolves shared dbt vars and defaults before parsing schema YAML', () => {
    const result = parseDbtSchemaFile(
      `
version: 2
sources:
  - name: raw
    database: "{{ var('database') }}"
    schema: "{{ var('schema', 'fallback_schema') }}"
    tables:
      - name: orders
        identifier: fct_orders
        description: "Orders from {{ var('database') }}"
        columns:
          - name: customer_id
            description: "Customer id"
            tests:
              - relationships:
                  to: ref('customers')
                  field: id
models:
  - name: "{{ var('model_name', 'orders_model') }}"
    schema: "{{ var('model_schema') }}"
    columns:
      - name: id
        description: "Order id"
`,
      { path: 'models/schema.yml', variables: new Map([['database', 'analytics'], ['model_schema', 'mart']]) },
    );

    expect(result.tables).toEqual([
      {
        name: 'fct_orders',
        description: 'Orders from analytics',
        database: 'analytics',
        schema: 'fallback_schema',
        columns: [
          {
            name: 'customer_id',
            description: 'Customer id',
            dataType: null,
            dataTests: [{ name: 'relationships', package: 'dbt', kwargs: { to: "ref('customers')", field: 'id' } }],
          },
        ],
        resourceType: 'source',
      },
      {
        name: 'orders_model',
        description: null,
        database: null,
        schema: 'mart',
        columns: [{ name: 'id', description: 'Order id', dataType: null }],
        resourceType: 'model',
      },
    ]);
    expect(result.relationships).toEqual([
      {
        fromTable: 'fct_orders',
        fromColumn: 'customer_id',
        toTable: 'customers',
        toColumn: 'id',
        fromSchema: 'fallback_schema',
      },
    ]);
  });

  it('deduplicates tables by database schema and name while merging columns', () => {
    const result = parseDbtSchemaFiles([
      {
        path: 'models/a.yml',
        content: `
version: 2
models:
  - name: orders
    description: Orders
    columns:
      - name: id
        description: Primary key
`,
      },
      {
        path: 'models/b.yml',
        content: `
version: 2
models:
  - name: orders
    columns:
      - name: status
        description: Status
      - name: id
        data_type: integer
`,
      },
    ]);

    expect(result.tables).toEqual([
      {
        name: 'orders',
        description: 'Orders',
        database: null,
        schema: null,
        resourceType: 'model',
        columns: [
          { name: 'id', description: 'Primary key', dataType: 'integer' },
          { name: 'status', description: 'Status', dataType: null },
        ],
      },
    ]);
  });

  it('returns an empty result for malformed YAML and preserves unresolved Jinja text', () => {
    expect(parseDbtSchemaFile('{{{{ invalid yaml', { path: 'broken.yml' })).toEqual({
      projectName: null,
      dbtVersion: null,
      tables: [],
      relationships: [],
    });

    const unresolved = parseDbtSchemaFile(
      `
version: 2
models:
  - name: "{{ var('missing_model') }}"
`,
      { variables: new Map() },
    );
    expect(unresolved.tables[0]?.name).toBe("{{ var('missing_model') }}");
  });

  it('extracts data tests, constraints, enum values, tags, and freshness', () => {
    const result = parseDbtSchemaFile(`
version: 2
sources:
  - name: raw
    schema: jaffle
    tags: ["raw"]
    tables:
      - name: customers
        tags: ["core"]
        loaded_at_field: updated_at
        freshness:
          warn_after: { count: 12, period: hour }
        columns:
          - name: id
            tests:
              - not_null
              - unique
          - name: status
            data_tests:
              - accepted_values:
                  values: ['active', 'inactive']
models:
  - name: orders
    tags: ["finance"]
    loaded_at_field: run_at
    columns:
      - name: status
        data_tests:
          - dbt_utils.expression_is_true:
              expression: "status is not null"
          - accepted_values: ['placed', 'shipped']
`);

    const customers = result.tables.find((table) => table.name === 'customers');
    expect(customers?.tagsDbt).toEqual(['raw', 'core']);
    expect(customers?.freshnessDbt?.loadedAtField).toBe('updated_at');
    expect(customers?.freshnessDbt?.raw).toBeDefined();
    const id = customers?.columns.find((column) => column.name === 'id');
    expect(id?.constraints?.dbt).toEqual({ not_null: true, unique: true });
    const status = customers?.columns.find((column) => column.name === 'status');
    expect(status?.enumValuesDbt).toEqual(['active', 'inactive']);

    const orders = result.tables.find((table) => table.name === 'orders');
    expect(orders?.tagsDbt).toEqual(['finance']);
    expect(orders?.freshnessDbt?.loadedAtField).toBe('run_at');
    const ordersStatus = orders?.columns.find((column) => column.name === 'status');
    expect(ordersStatus?.enumValuesDbt).toEqual(['placed', 'shipped']);
    expect(ordersStatus?.dataTests).toEqual(
      expect.arrayContaining([
        expect.objectContaining({ package: 'dbt_utils', name: 'expression_is_true' }),
        expect.objectContaining({ package: 'dbt', name: 'accepted_values' }),
      ]),
    );
  });

  it('parses relationships from model column data tests', () => {
    const result = parseDbtSchemaFile(`
version: 2
models:
  - name: orders
    schema: public
    columns:
      - name: customer_id
        data_tests:
          - relationships:
              arguments:
                to: "ref('customers')"
                field: id
`);

    expect(result.relationships).toEqual([
      {
        fromTable: 'orders',
        fromColumn: 'customer_id',
        toTable: 'customers',
        toColumn: 'id',
        fromSchema: 'public',
      },
    ]);
  });
});
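The first and third tests above pin down the observable behavior of dbt variable resolution: `{{ var('name') }}` is replaced from the variables map, `{{ var('name', 'default') }}` falls back to its default, and an unresolved variable is left as literal Jinja text. The sketch below shows one way that behavior could be implemented. It is a hypothetical stand-in, not the real `resolveJinjaVariables` from dbt-shared/project-vars.js (which is not part of this diff); only the `{ content, unresolvedVars }` return shape is taken from how the parser consumes it.

```typescript
// Hypothetical stand-in for resolveJinjaVariables; names and regex are assumptions.
interface ResolvedTemplate {
  content: string;
  unresolvedVars: string[];
}

function resolveVarsSketch(content: string, variables: Map<string, string>): ResolvedTemplate {
  const unresolvedVars: string[] = [];
  // Matches {{ var('name') }} and {{ var('name', 'default') }} with either quote style.
  const pattern = /\{\{\s*var\(\s*['"]([^'"]+)['"]\s*(?:,\s*['"]([^'"]*)['"]\s*)?\)\s*\}\}/g;
  const resolved = content.replace(pattern, (whole: string, name: string, fallback?: string) => {
    const value = variables.get(name) ?? fallback;
    if (value === undefined) {
      // No value and no default: record the name and keep the Jinja text untouched.
      unresolvedVars.push(name);
      return whole;
    }
    return value;
  });
  return { content: resolved, unresolvedVars };
}

const sample = resolveVarsSketch(
  `database: "{{ var('database') }}" schema: "{{ var('schema', 'fallback_schema') }}" name: "{{ var('missing_model') }}"`,
  new Map([['database', 'analytics']]),
);
console.log(sample.content);
console.log(sample.unresolvedVars);
```

Resolving variables on the raw text before YAML parsing, as the test title describes, means the substituted values go through the normal YAML scalar rules; the sketch makes no attempt to handle non-string defaults or nested Jinja expressions.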
@@ -0,0 +1,649 @@
import { createHash } from 'node:crypto';
import { parse as parseYaml } from 'yaml';
import { type KtxLogger, noopLogger } from '../../../../context/core/config.js';
import { resolveJinjaVariables } from '../../dbt-shared/project-vars.js';

interface DbtParsedColumn {
  name: string;
  description: string | null;
  dataType: string | null;
  dataTests?: DbtDataTestRef[];
  constraints?: DbtColumnConstraints;
  enumValuesDbt?: string[];
}

interface DbtDataTestRef {
  name: string;
  package: string;
  kwargs?: Record<string, unknown>;
}

interface DbtColumnConstraints {
  dbt: {
    not_null?: boolean;
    unique?: boolean;
  };
}

interface DbtParsedRelationship {
  fromTable: string;
  fromColumn: string;
  toTable: string;
  toColumn: string;
  fromSchema?: string;
  toSchema?: string;
  description?: string;
}

interface DbtParsedTable {
  name: string;
  description: string | null;
  database: string | null;
  schema: string | null;
  columns: DbtParsedColumn[];
  resourceType?: 'source' | 'model';
  tagsDbt?: string[];
  freshnessDbt?: {
    raw?: unknown;
    loadedAtField?: string | null;
  };
}

export interface DbtSchemaParseResult {
  projectName: string | null;
  dbtVersion: string | null;
  tables: DbtParsedTable[];
  relationships: DbtParsedRelationship[];
}

export interface DbtSchemaFile {
  content: string;
  path: string;
}

interface ParseDbtSchemaOptions {
  path?: string;
  variables?: Map<string, string>;
  projectName?: string | null;
  logger?: KtxLogger;
}

interface DbtSchemaYaml {
  version?: number;
  sources?: DbtSchemaSource[];
  models?: DbtSchemaModel[];
}

interface DbtSchemaSource {
  name: string;
  description?: string;
  database?: string;
  schema?: string;
  tags?: string[];
  tables?: DbtSchemaTable[];
}

interface DbtSchemaTable {
  name: string;
  description?: string;
  identifier?: string;
  tags?: string[];
  loaded_at_field?: string;
  freshness?: unknown;
  columns?: DbtSchemaColumn[];
}

interface DbtSchemaModel {
  name: string;
  description?: string;
  database?: string;
  schema?: string;
  tags?: string[];
  loaded_at_field?: string;
  freshness?: unknown;
  columns?: DbtSchemaColumn[];
}

interface DbtSchemaColumn {
  name: string;
  description?: string;
  data_type?: string;
  data_tests?: DbtSchemaDataTest[];
  tests?: DbtSchemaDataTest[];
}

type DbtSchemaDataTest =
  | string
  | {
      relationships?: {
        to?: string;
        field?: string;
        arguments?: { to?: string; field?: string };
      };
      not_null?: unknown;
      unique?: unknown;
      accepted_values?: { values?: unknown } | unknown;
      [key: string]: unknown;
    };

/** @internal */
export function parseDbtSchemaFile(content: string, options: ParseDbtSchemaOptions = {}): DbtSchemaParseResult {
  return new DbtSchemaParser(options.logger ?? noopLogger).parseFile(content, options);
}

export function parseDbtSchemaFiles(
  files: DbtSchemaFile[],
  variables?: Map<string, string>,
  options: { projectName?: string | null; logger?: KtxLogger } = {},
): DbtSchemaParseResult {
  return new DbtSchemaParser(options.logger ?? noopLogger).parseFiles(files, variables, options.projectName ?? null);
}

class DbtSchemaParser {
  constructor(private readonly logger: KtxLogger) {}

  parseFile(yamlContent: string, options: ParseDbtSchemaOptions = {}): DbtSchemaParseResult {
    this.logger.debug(`Parsing schema file: ${options.path ?? 'unknown'}`);

    const resolved = options.variables
      ? resolveJinjaVariables(yamlContent, options.variables)
      : { content: yamlContent, unresolvedVars: [] };
    if (resolved.unresolvedVars.length > 0) {
      this.logger.warn(
        `Unresolved dbt variables in ${options.path ?? 'schema file'}: ${resolved.unresolvedVars.join(', ')}`,
      );
    }

    let schema: DbtSchemaYaml;
    try {
      schema = parseYaml(resolved.content) as DbtSchemaYaml;
    } catch (error) {
      this.logger.warn(`Failed to parse YAML${options.path ? ` at ${options.path}` : ''}: ${error}`);
      return this.emptyResult(options.projectName ?? null);
    }

    if (!schema || typeof schema !== 'object') {
      return this.emptyResult(options.projectName ?? null);
    }

    const tables = [...this.parseSources(schema.sources), ...this.parseModels(schema.models)];
    const relationships = [
      ...this.parseSourceRelationships(schema.sources),
      ...this.parseModelRelationships(schema.models),
    ];

    return {
      projectName: options.projectName ?? null,
      dbtVersion: null,
      tables,
      relationships,
    };
  }

  parseFiles(
    files: DbtSchemaFile[],
    variables?: Map<string, string>,
    projectName: string | null = null,
  ): DbtSchemaParseResult {
    const allTables: DbtParsedTable[] = [];
    const allRelationships: DbtParsedRelationship[] = [];

    for (const file of files) {
      const result = this.parseFile(file.content, { path: file.path, variables, projectName });
      allTables.push(...result.tables);
      allRelationships.push(...result.relationships);
    }

    return {
      projectName,
      dbtVersion: null,
      tables: this.deduplicateTables(allTables),
      relationships: this.deduplicateRelationships(allRelationships),
    };
  }

  private parseSources(sources: DbtSchemaSource[] | undefined): DbtParsedTable[] {
    if (!sources || !Array.isArray(sources)) {
      return [];
    }

    const tables: DbtParsedTable[] = [];

    for (const source of sources) {
      const sourceSchema = source.schema ?? source.name;
      const sourceDatabase = source.database ?? null;
      const sourceTags = this.normalizeTagList(source.tags);

      if (!source.tables || !Array.isArray(source.tables)) {
        continue;
      }

      for (const table of source.tables) {
        const tagsDbt = this.mergeTagsDbt(sourceTags, this.normalizeTagList(table.tags));
        const freshnessDbt = this.buildFreshnessDbt(table.freshness, table.loaded_at_field);
        tables.push({
          name: table.identifier ?? table.name,
          description: this.normalizeDescription(table.description),
          database: sourceDatabase,
          schema: sourceSchema,
          columns: this.parseColumns(table.columns),
          resourceType: 'source',
          ...(tagsDbt ? { tagsDbt } : {}),
          ...(freshnessDbt ? { freshnessDbt } : {}),
        });
      }
    }

    return tables;
  }

  private parseModels(models: DbtSchemaModel[] | undefined): DbtParsedTable[] {
    if (!models || !Array.isArray(models)) {
      return [];
    }

    const tables: DbtParsedTable[] = [];

    for (const model of models) {
      if (!model.name) {
        continue;
      }

      const tagsDbt = this.mergeTagsDbt(this.normalizeTagList(model.tags));
      const freshnessDbt = this.buildFreshnessDbt(model.freshness, model.loaded_at_field);
      tables.push({
        name: model.name,
        description: this.normalizeDescription(model.description),
        database: model.database ?? null,
        schema: model.schema ?? null,
        columns: this.parseColumns(model.columns),
        resourceType: 'model',
        ...(tagsDbt ? { tagsDbt } : {}),
        ...(freshnessDbt ? { freshnessDbt } : {}),
      });
    }

    return tables;
  }

  private parseColumns(columns: DbtSchemaColumn[] | undefined): DbtParsedColumn[] {
    if (!columns || !Array.isArray(columns)) {
      return [];
    }

    return columns.map((column) => {
      const { refs, constraints, enumValues } = this.parseDataTests(column.data_tests ?? column.tests);
      return {
        name: column.name,
        description: this.normalizeDescription(column.description),
        dataType: column.data_type ?? null,
        ...(refs.length > 0 ? { dataTests: refs } : {}),
        ...(constraints ? { constraints } : {}),
        ...(enumValues.length > 0 ? { enumValuesDbt: enumValues } : {}),
      };
    });
  }

  private parseDataTests(tests: DbtSchemaDataTest[] | undefined): {
    refs: DbtDataTestRef[];
    constraints: DbtColumnConstraints | undefined;
    enumValues: string[];
  } {
    const refs: DbtDataTestRef[] = [];
    const dbt: { not_null?: boolean; unique?: boolean } = {};
    const enumValues: string[] = [];
    if (!tests?.length) {
      return { refs, constraints: undefined, enumValues };
    }

    for (const test of tests) {
      if (typeof test === 'string') {
        const parsed = this.parseTestNameString(test);
        refs.push(parsed);
        if (parsed.package === 'dbt' && parsed.name === 'not_null') {
          dbt.not_null = true;
        }
        if (parsed.package === 'dbt' && parsed.name === 'unique') {
          dbt.unique = true;
        }
        continue;
      }

      for (const [key, value] of Object.entries(test)) {
        if (key === 'relationships') {
          refs.push({
            name: 'relationships',
            package: 'dbt',
            ...(value && typeof value === 'object' && !Array.isArray(value)
              ? { kwargs: value as Record<string, unknown> }
              : {}),
          });
          continue;
        }
        if (key === 'not_null') {
          refs.push({ name: 'not_null', package: 'dbt' });
          dbt.not_null = true;
          continue;
        }
        if (key === 'unique') {
          refs.push({ name: 'unique', package: 'dbt' });
          dbt.unique = true;
          continue;
        }
        if (key === 'accepted_values') {
          if (Array.isArray(value)) {
            enumValues.push(...value.map((item) => String(item)));
            refs.push({ name: 'accepted_values', package: 'dbt', kwargs: { values: value } });
            continue;
          }
          if (value && typeof value === 'object' && !Array.isArray(value)) {
            const values = (value as { values?: unknown }).values;
            if (Array.isArray(values)) {
              enumValues.push(...values.map((item) => String(item)));
            }
            refs.push({ name: 'accepted_values', package: 'dbt', kwargs: value as Record<string, unknown> });
            continue;
          }
        }
        refs.push({
          ...this.parseTestNameString(key),
          ...(value && typeof value === 'object' && !Array.isArray(value)
            ? { kwargs: value as Record<string, unknown> }
            : {}),
        });
      }
    }

    const constraints = dbt.not_null || dbt.unique ? { dbt } : undefined;
    return { refs, constraints, enumValues };
  }

  private parseTestNameString(value: string): { name: string; package: string } {
    const parts = value.split('.');
    if (parts.length >= 2) {
      return { package: parts[0]!, name: parts.slice(1).join('.') };
    }
    return { package: 'dbt', name: value };
  }

  private parseSourceRelationships(sources: DbtSchemaSource[] | undefined): DbtParsedRelationship[] {
    if (!sources || !Array.isArray(sources)) {
      return [];
    }

    const relationships: DbtParsedRelationship[] = [];

    for (const source of sources) {
      const sourceSchema = source.schema ?? source.name;

      if (!source.tables || !Array.isArray(source.tables)) {
        continue;
      }

      for (const table of source.tables) {
        const tableName = table.identifier ?? table.name;

        if (!table.columns || !Array.isArray(table.columns)) {
          continue;
        }

        for (const column of table.columns) {
          const tests = column.data_tests ?? column.tests ?? [];

          for (const test of tests) {
            const relationship = this.parseRelationshipTest(test, tableName, column.name, sourceSchema);
            if (relationship) {
              relationships.push(relationship);
            }
          }
        }
      }
    }

    return relationships;
  }

  private parseModelRelationships(models: DbtSchemaModel[] | undefined): DbtParsedRelationship[] {
    if (!models || !Array.isArray(models)) {
      return [];
    }

    const relationships: DbtParsedRelationship[] = [];

    for (const model of models) {
      if (!model.name || !model.columns || !Array.isArray(model.columns)) {
        continue;
      }

      for (const column of model.columns) {
        const tests = column.data_tests ?? column.tests ?? [];

        for (const test of tests) {
          const relationship = this.parseRelationshipTest(test, model.name, column.name, model.schema ?? undefined);
          if (relationship) {
            relationships.push(relationship);
          }
        }
      }
    }

    return relationships;
  }

  private parseRelationshipTest(
    test: DbtSchemaDataTest,
    fromTable: string,
    fromColumn: string,
    fromSchema?: string,
  ): DbtParsedRelationship | null {
    if (typeof test === 'string' || !test.relationships) {
      return null;
    }

    const relationship = test.relationships;
    const toRef = relationship.to ?? relationship.arguments?.to;
    const toColumn = relationship.field ?? relationship.arguments?.field;

    if (!toRef || !toColumn) {
      this.logger.debug(`Skipping incomplete relationship test for ${fromTable}.${fromColumn}`);
      return null;
    }

    const toTable = this.parseRef(toRef);
    if (!toTable) {
      this.logger.debug(`Could not parse ref: ${toRef}`);
      return null;
    }

    return {
      fromTable,
      fromColumn,
      toTable,
      toColumn,
      ...(fromSchema ? { fromSchema } : {}),
    };
  }

  private parseRef(refString: string): string | null {
    const refMatch = refString.match(/ref\s*\(\s*['"]([^'"]+)['"]\s*\)/);
    if (refMatch) {
      return refMatch[1];
    }

    const sourceMatch = refString.match(/source\s*\(\s*['"][^'"]+['"]\s*,\s*['"]([^'"]+)['"]\s*\)/);
    if (sourceMatch) {
      return sourceMatch[1];
    }

    return null;
  }

  private normalizeDescription(description: string | undefined): string | null {
    if (!description) {
      return null;
    }
    const trimmed = description.trim();
    return trimmed.length > 0 ? trimmed : null;
  }

  private normalizeTagList(tags: string[] | undefined): string[] {
    if (!tags || !Array.isArray(tags)) {
      return [];
    }
    return tags.map((tag) => String(tag));
  }

  private mergeTagsDbt(...lists: Array<string[] | undefined>): string[] | undefined {
    const merged: string[] = [];
    const seen = new Set<string>();
    for (const list of lists) {
      for (const item of list ?? []) {
        if (!seen.has(item)) {
          seen.add(item);
          merged.push(item);
        }
      }
    }
    return merged.length > 0 ? merged : undefined;
  }

  private buildFreshnessDbt(freshness: unknown, loadedAtField: string | undefined): DbtParsedTable['freshnessDbt'] {
    const loadedTrim = loadedAtField?.trim();
    const hasFreshness = freshness !== undefined && freshness !== null;
    if (!hasFreshness && !loadedTrim) {
      return undefined;
    }
    return {
      ...(hasFreshness ? { raw: freshness } : {}),
      ...(hasFreshness ? { loadedAtField: loadedTrim ?? null } : loadedTrim ? { loadedAtField: loadedTrim } : {}),
    };
  }

  private deduplicateTables(tables: DbtParsedTable[]): DbtParsedTable[] {
    const seen = new Map<string, DbtParsedTable>();

    for (const table of tables) {
      const key = `${table.database ?? ''}.${table.schema ?? ''}.${table.name}`.toLowerCase();
      const existing = seen.get(key);

      if (!existing) {
        seen.set(key, table);
        continue;
      }

      seen.set(key, {
        ...existing,
        description: existing.description ?? table.description,
        columns: this.mergeColumns(existing.columns, table.columns),
        tagsDbt: this.mergeTagsDbt(existing.tagsDbt, table.tagsDbt),
        freshnessDbt: this.mergeFreshnessDbt(existing.freshnessDbt, table.freshnessDbt),
      });
    }

    return Array.from(seen.values());
  }

  private mergeColumns(existing: DbtParsedColumn[], incoming: DbtParsedColumn[]): DbtParsedColumn[] {
    const seen = new Map<string, DbtParsedColumn>();

    for (const column of existing) {
      seen.set(column.name.toLowerCase(), column);
    }

    for (const column of incoming) {
      const key = column.name.toLowerCase();
      const existingColumn = seen.get(key);

      if (!existingColumn) {
        seen.set(key, column);
        continue;
      }

      seen.set(key, {
        ...existingColumn,
        description: existingColumn.description ?? column.description,
        dataType: existingColumn.dataType ?? column.dataType,
        dataTests: this.mergeDbtDataTests(existingColumn.dataTests, column.dataTests),
        constraints: this.mergeDbtConstraints(existingColumn.constraints, column.constraints),
        enumValuesDbt: this.mergeStringList(existingColumn.enumValuesDbt, column.enumValuesDbt),
      });
    }

    return Array.from(seen.values());
  }

  private deduplicateRelationships(relationships: DbtParsedRelationship[]): DbtParsedRelationship[] {
    const seen = new Set<string>();
    const result: DbtParsedRelationship[] = [];

    for (const relationship of relationships) {
      const key =
        `${relationship.fromTable}.${relationship.fromColumn}->${relationship.toTable}.${relationship.toColumn}`.toLowerCase();
      if (!seen.has(key)) {
        seen.add(key);
        result.push(relationship);
      }
    }

    return result;
  }

  private mergeFreshnessDbt(
    existing?: DbtParsedTable['freshnessDbt'],
    incoming?: DbtParsedTable['freshnessDbt'],
  ): DbtParsedTable['freshnessDbt'] {
    if (!existing && !incoming) {
      return undefined;
    }
    const raw = existing?.raw !== undefined ? existing.raw : incoming?.raw;
    const loadedAtField = existing?.loadedAtField ?? incoming?.loadedAtField;
    return {
      ...(raw !== undefined ? { raw } : {}),
      ...(loadedAtField !== undefined ? { loadedAtField } : {}),
    };
  }

  private mergeDbtConstraints(
    existing?: DbtColumnConstraints,
    incoming?: DbtColumnConstraints,
  ): DbtColumnConstraints | undefined {
    const notNull = !!(existing?.dbt.not_null || incoming?.dbt.not_null);
    const unique = !!(existing?.dbt.unique || incoming?.dbt.unique);
    if (!notNull && !unique) {
      return undefined;
    }
    return { dbt: { ...(notNull ? { not_null: true } : {}), ...(unique ? { unique: true } : {}) } };
  }

  private mergeStringList(existing?: string[], incoming?: string[]): string[] | undefined {
    return this.mergeTagsDbt(existing, incoming);
  }

  private mergeDbtDataTests(existing?: DbtDataTestRef[], incoming?: DbtDataTestRef[]): DbtDataTestRef[] | undefined {
    if (!existing?.length) {
      return incoming?.length ? [...incoming] : undefined;
    }
    if (!incoming?.length) {
      return [...existing];
    }
    const tests = new Map<string, DbtDataTestRef>();
    for (const test of [...existing, ...incoming]) {
      const kwargsKey =
        test.kwargs && Object.keys(test.kwargs).length > 0
          ? `:${createHash('sha256').update(JSON.stringify(test.kwargs)).digest('hex').slice(0, 16)}`
          : '';
      tests.set(`${test.package}:${test.name}${kwargsKey}`, test);
    }
    return [...tests.values()];
  }

  private emptyResult(projectName: string | null): DbtSchemaParseResult {
    return {
      projectName,
      dbtVersion: null,
      tables: [],
      relationships: [],
    };
  }
}
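The `parseRef` method above can be tried in isolation. The following is a minimal sketch that copies its two regular expressions into a standalone function; the name `parseDbtRef` and the sample inputs are illustrative assumptions, not part of the commit.

```typescript
// Hypothetical standalone copy of the private `parseRef` logic shown above.
function parseDbtRef(refString: string): string | null {
  // ref('model') or ref("model"): capture the single quoted argument.
  const refMatch = refString.match(/ref\s*\(\s*['"]([^'"]+)['"]\s*\)/);
  if (refMatch) {
    return refMatch[1] ?? null;
  }
  // source('source_name', 'table'): capture the second quoted argument.
  const sourceMatch = refString.match(/source\s*\(\s*['"][^'"]+['"]\s*,\s*['"]([^'"]+)['"]\s*\)/);
  if (sourceMatch) {
    return sourceMatch[1] ?? null;
  }
  return null;
}

// Illustrative inputs only; none of these strings come from the repository.
const examples = ["ref('orders')", 'source("raw", "customers")', "ref('pkg', 'orders')", 'orders'];
for (const example of examples) {
  console.log(example, '->', parseDbtRef(example));
}
```

With these patterns, a two-argument `ref('pkg', 'orders')` and a bare table name both yield `null`, so a relationship test written that way would take the "Could not parse ref" branch.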
36
packages/cli/src/context/ingest/adapters/dbt/chunk.test.ts
Normal file
@@ -0,0 +1,36 @@
import { describe, expect, it } from 'vitest';
import { chunkDbtProject } from './chunk.js';

describe('chunkDbtProject', () => {
  const diffSet = (modified: string[]) => ({ added: [], modified, deleted: [], unchanged: [] });

  it('caps peerFileIndex when the project has very many yaml files', () => {
    const modelPaths = Array.from({ length: 201 }, (_, i) => `models/m${i}.yml`);
    const allPaths = ['dbt_project.yml', ...modelPaths].sort();
    const { workUnits } = chunkDbtProject({ allPaths });
    const [first] = workUnits;
    expect(first).toBeDefined();
    expect(first?.peerFileIndex).toHaveLength(200);
    expect(first?.notes).toMatch(/capped at 200/);
  });

  it('keeps large-project model work units when dbt_project.yml changes', () => {
    const modelPaths = Array.from({ length: 30 }, (_, i) => `models/m${i}.yml`);
    const allPaths = ['dbt_project.yml', ...modelPaths].sort();
    const { workUnits } = chunkDbtProject({ allPaths }, { diffSet: diffSet(['dbt_project.yml']) });

    expect(workUnits).toHaveLength(30);
    expect(workUnits[0]?.rawFiles).toEqual(['models/m0.yml']);
    expect(workUnits[0]?.dependencyPaths).toContain('dbt_project.yml');
  });

  it('keeps large-project model work units when non-model yaml peers change', () => {
    const modelPaths = Array.from({ length: 30 }, (_, i) => `models/m${i}.yml`);
    const allPaths = ['dbt_project.yml', 'seeds/seed_properties.yml', ...modelPaths].sort();
    const { workUnits } = chunkDbtProject({ allPaths }, { diffSet: diffSet(['seeds/seed_properties.yml']) });

    expect(workUnits).toHaveLength(30);
    expect(workUnits[0]?.rawFiles).toEqual(['models/m0.yml']);
    expect(workUnits[0]?.dependencyPaths).toContain('seeds/seed_properties.yml');
  });
});
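The first test above relies on the peer index being cut off at 200 entries. As a rough sketch of that behaviour only, the helper below sorts the peers of one file and truncates the list; `capPeerIndex`, `CappedPeers`, and the default of 200 are assumptions made for illustration and are not the exported API.

```typescript
// Hypothetical helper; mirrors the cap that the tests above assert on.
interface CappedPeers {
  peerFileIndex: string[];
  truncated: boolean;
  totalPeers: number;
}

function capPeerIndex(allPaths: string[], self: string, maxPeers = 200): CappedPeers {
  // Every other YAML path is a peer of the file that owns the work unit.
  const peers = allPaths.filter((p) => p !== self).sort();
  const truncated = peers.length > maxPeers;
  return {
    peerFileIndex: truncated ? peers.slice(0, maxPeers) : peers,
    truncated,
    totalPeers: peers.length,
  };
}

// Same fixture shape as the first test: dbt_project.yml plus 201 model files.
const modelPaths = Array.from({ length: 201 }, (_, i) => `models/m${i}.yml`);
const capped = capPeerIndex(['dbt_project.yml', ...modelPaths], 'models/m0.yml');
console.log(capped.peerFileIndex.length, capped.truncated, capped.totalPeers);
```

Because the slice is taken after sorting, which peers are dropped depends on lexicographic order rather than on relevance to the model file.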
130
packages/cli/src/context/ingest/adapters/dbt/chunk.ts
Normal file
@@ -0,0 +1,130 @@
import type { ChunkResult, DiffSet, WorkUnit } from '../../types.js';
import type { ParsedDbtProject } from './parse.js';

interface ChunkOptions {
  diffSet?: DiffSet;
}

/**
 * Per-model work units (when the project has more than 25 YAML files) only name `rawFiles` under
 * `models/**`. Other `.yml` (e.g. some `seeds/` or custom layouts) still appear in `peerFileIndex`
 * or in the small-project / no-models fallbacks — v1 does not emit one WU per non-models file.
 */
const MODELS_PREFIX = 'models/';

/** `peerFileIndex` is a hint only (agents may not read those paths). Cap to limit prompt size. */
const MAX_PEER_FILE_INDEX = 200;

function projectYamlPath(allPaths: string[]): string | undefined {
  if (allPaths.includes('dbt_project.yml')) {
    return 'dbt_project.yml';
  }
  if (allPaths.includes('dbt_project.yaml')) {
    return 'dbt_project.yaml';
  }
  return undefined;
}

function modelRelativePaths(allPaths: string[]): string[] {
  return allPaths.filter((p) => p.replace(/\\/g, '/').startsWith(MODELS_PREFIX)).sort();
}

function unitKeyForModelFile(mf: string): string {
  const base = mf
    .replace(/\.(ya?ml)$/i, '')
    .replace(/\\/g, '/')
    .replace(/[^a-zA-Z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
  return `dbt-${base.toLowerCase()}`;
}

function emitFirstRunWorkUnits(allPaths: string[], dbtDep: string | undefined): WorkUnit[] {
  if (allPaths.length === 0) {
    return [];
  }

  if (allPaths.length <= 25) {
    return [
      {
        unitKey: 'dbt-all',
        displayLabel: 'dbt project (all yaml)',
        rawFiles: [...allPaths],
        peerFileIndex: [],
        dependencyPaths: [],
        notes: 'dbt project — all YAML in one WorkUnit (≤25 files)',
      },
    ];
  }

  const modelFiles = modelRelativePaths(allPaths);
  if (modelFiles.length === 0) {
    return [
      {
        unitKey: 'dbt-all',
        displayLabel: 'dbt project (all yaml, no models/**)',
        rawFiles: [...allPaths],
        peerFileIndex: [],
        dependencyPaths: dbtDep ? [dbtDep] : [],
        notes: 'dbt: no models/**/*.yml — single slice with dbt_project as dependency if present',
      },
    ];
  }

  return modelFiles.map((mf) => {
    const allPeers = allPaths.filter((p) => p !== mf).sort();
    const truncated = allPeers.length > MAX_PEER_FILE_INDEX;
    const peerFileIndex = truncated ? allPeers.slice(0, MAX_PEER_FILE_INDEX) : allPeers;
    const dependencyPaths = dbtDep && allPaths.includes(dbtDep) && mf !== dbtDep ? [dbtDep].sort() : [];
    const notes = truncated
      ? `dbt model schema slice (peer index capped at ${MAX_PEER_FILE_INDEX} of ${allPeers.length} paths)`
      : 'dbt model schema slice';
    return {
      unitKey: unitKeyForModelFile(mf),
      displayLabel: `dbt ${mf}`,
      rawFiles: [mf],
      peerFileIndex,
      dependencyPaths: dependencyPaths,
      notes,
    };
  });
}

function applyDiffSet(firstRunUnits: WorkUnit[], diffSet: DiffSet): ChunkResult {
  const touched = new Set([...diffSet.added, ...diffSet.modified]);
  const kept: WorkUnit[] = [];

  for (const wu of firstRunUnits) {
    const touchedRawFiles = wu.rawFiles.filter((p) => touched.has(p));
    const touchedDependencies = wu.dependencyPaths.filter((p) => touched.has(p));
    const touchedPeerFiles = wu.peerFileIndex.filter((p) => touched.has(p));
    if (touchedRawFiles.length === 0 && touchedDependencies.length === 0 && touchedPeerFiles.length === 0) {
      continue;
    }

    const rawFiles = touchedRawFiles.length > 0 ? touchedRawFiles : wu.rawFiles;
    const unchangedRaw = touchedRawFiles.length > 0 ? wu.rawFiles.filter((p) => !touched.has(p)) : [];
    for (const p of wu.rawFiles) {
      if (!rawFiles.includes(p) && !unchangedRaw.includes(p)) {
        unchangedRaw.push(p);
      }
    }
    const combinedDeps = new Set<string>([...wu.dependencyPaths, ...unchangedRaw, ...touchedPeerFiles]);
    kept.push({
      ...wu,
      rawFiles: rawFiles.sort(),
      dependencyPaths: [...combinedDeps].sort(),
    });
  }

  const eviction = diffSet.deleted.length > 0 ? { deletedRawPaths: [...diffSet.deleted].sort() } : undefined;
  return { workUnits: kept, eviction };
}

export function chunkDbtProject(project: ParsedDbtProject, opts: ChunkOptions = {}): ChunkResult {
  const dbtDep = projectYamlPath(project.allPaths);
  const firstRun = emitFirstRunWorkUnits(project.allPaths, dbtDep);
  if (!opts.diffSet) {
    return { workUnits: firstRun };
  }
  return applyDiffSet(firstRun, opts.diffSet);
}
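To see what unit keys the per-model branch produces, the slug chain from `unitKeyForModelFile` can be run on a few paths. This is a sketch under the assumption that the four replacements are applied exactly as written above; `slugForModelFile` and the sample paths are hypothetical.

```typescript
// Hypothetical copy of the slug logic in `unitKeyForModelFile` above.
function slugForModelFile(modelFile: string): string {
  const base = modelFile
    .replace(/\.(ya?ml)$/i, '') // drop the .yml / .yaml extension
    .replace(/\\/g, '/') // treat Windows separators like POSIX ones
    .replace(/[^a-zA-Z0-9]+/g, '-') // collapse every non-alphanumeric run to one dash
    .replace(/^-+|-+$/g, ''); // trim leading and trailing dashes
  return `dbt-${base.toLowerCase()}`;
}

// Illustrative paths only.
for (const path of ['models/marts/Orders.yml', 'models\\staging\\stg_orders.yaml', 'models/a_b.yml', 'models/a-b.yml']) {
  console.log(path, '->', slugForModelFile(path));
}
```

One consequence worth noting: since every non-alphanumeric run collapses to a dash, `models/a_b.yml` and `models/a-b.yml` map to the same key. Whether that matters depends on how unit keys are consumed downstream, which this diff does not show.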
@@ -0,0 +1,57 @@
import { mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import type { SourceAdapter } from '../../types.js';
import { DbtSourceAdapter } from './dbt.adapter.js';

describe('DbtSourceAdapter', () => {
  let stagedDir: string;
  let adapter: SourceAdapter;

  beforeEach(async () => {
    stagedDir = await mkdtemp(join(tmpdir(), 'dbt-adapter-'));
    adapter = new DbtSourceAdapter();
  });

  afterEach(async () => {
    await rm(stagedDir, { recursive: true, force: true });
  });

  it('declares the expected source key and skill list', () => {
    expect(adapter.source).toBe('dbt');
    expect(adapter.skillNames).toEqual(['dbt_ingest']);
  });

  it('detects a staged dbt project root (dbt_project.yml)', async () => {
    await writeFile(join(stagedDir, 'dbt_project.yml'), "name: 'jaffle'\nversion: '1.0.0'\n", 'utf-8');
    expect(await adapter.detect(stagedDir)).toBe(true);
  });

  it('chunk: dbt_project.yml + models/a.yml yields one WU (≤25 files)', async () => {
    await writeFile(join(stagedDir, 'dbt_project.yml'), "name: 'jaffle'\n", 'utf-8');
    await mkdir(join(stagedDir, 'models'), { recursive: true });
    await writeFile(
      join(stagedDir, 'models/a.yml'),
      'version: 2\nmodels:\n  - name: orders\n    description: Orders\n',
      'utf-8',
    );
    const result = await adapter.chunk(stagedDir);
    expect(result.workUnits).toHaveLength(1);
    expect(result.workUnits[0].unitKey).toBe('dbt-all');
    expect(result.parseArtifacts).toMatchObject({
      projectName: 'jaffle',
      tables: [{ name: 'orders', description: 'Orders' }],
    });
  });

  it('implements fetch() for git-backed dbt source setup', () => {
    expect(adapter.fetch).toBeTypeOf('function');
  });

  it('reports mapped warehouse targets for bundle SL discovery', async () => {
    adapter = new DbtSourceAdapter({ targetConnectionIds: ['postgres-warehouse', 'postgres-warehouse'] });

    await expect(adapter.listTargetConnectionIds?.(stagedDir)).resolves.toEqual(['postgres-warehouse']);
  });
});
53
packages/cli/src/context/ingest/adapters/dbt/dbt.adapter.ts
Normal file
@@ -0,0 +1,53 @@
import { join } from 'node:path';
import type { ChunkResult, DiffSet, SourceAdapter } from '../../types.js';
import type { FetchContext } from '../../types.js';
import { loadProjectInfo } from '../../dbt-shared/project-vars.js';
import { loadDbtSchemaFiles } from '../../dbt-shared/schema-files.js';
import { parseDbtSchemaFiles } from '../dbt-descriptions/parse-schema.js';
import { chunkDbtProject } from './chunk.js';
import { detectDbtStagedDir } from './detect.js';
import { fetchDbtRepo, type DbtPullConfig } from './fetch.js';
import { parseDbtStagedDir } from './parse.js';

interface DbtSourceAdapterOptions {
  homeDir?: string;
  targetConnectionIds?: string[];
}

export class DbtSourceAdapter implements SourceAdapter {
  readonly source = 'dbt' as const;
  /** Runner merges: ingest_triage, sl_capture, wiki_capture (see ingest-bundle.runner.ts) */
  readonly skillNames: string[] = ['dbt_ingest'];

  constructor(private readonly options: DbtSourceAdapterOptions = {}) {}

  detect(stagedDir: string): Promise<boolean> {
    return detectDbtStagedDir(stagedDir);
  }

  async listTargetConnectionIds(_stagedDir: string): Promise<string[]> {
    return [...new Set(this.options.targetConnectionIds ?? [])].sort((left, right) => left.localeCompare(right));
  }

  async fetch(pullConfig: unknown, stagedDir: string, ctx: FetchContext): Promise<void> {
    const config = pullConfig as DbtPullConfig | undefined;
    if (!config?.repoUrl) {
      throw new Error('dbt fetch requires repoUrl');
    }
    await fetchDbtRepo({
      config,
      cacheDir: join(this.options.homeDir ?? '.ktx/cache', 'dbt', ctx.connectionId),
      stagedDir,
    });
  }

  async chunk(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
    const project = await parseDbtStagedDir(stagedDir);
    const projectInfo = await loadProjectInfo(stagedDir);
    const schemaFiles = await loadDbtSchemaFiles(stagedDir);
    const parseArtifacts = parseDbtSchemaFiles(schemaFiles, projectInfo.variables, {
      projectName: projectInfo.projectName,
    });
    return { ...chunkDbtProject(project, { diffSet }), parseArtifacts };
  }
}
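`listTargetConnectionIds` above deduplicates the configured IDs and returns them in a stable order. A minimal sketch of that idiom on its own, with `uniqueSorted` as a hypothetical name:

```typescript
// Hypothetical helper showing the dedupe-then-sort idiom used by the adapter.
function uniqueSorted(ids: readonly string[] = []): string[] {
  // A Set drops repeated IDs; sorting makes the result independent of config order.
  return Array.from(new Set(ids)).sort((left, right) => left.localeCompare(right));
}

console.log(uniqueSorted(['postgres-warehouse', 'postgres-warehouse']));
console.log(uniqueSorted(['b', 'a', 'b']));
console.log(uniqueSorted());
```

Returning a sorted list presumably keeps downstream output deterministic when the same connection is listed more than once; the diff itself only shows the test that expects a single `postgres-warehouse` entry.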
12
packages/cli/src/context/ingest/adapters/dbt/detect.ts
Normal file
@@ -0,0 +1,12 @@
import { access } from 'node:fs/promises';
import { join } from 'node:path';

export async function detectDbtStagedDir(stagedDir: string): Promise<boolean> {
  for (const name of ['dbt_project.yml', 'dbt_project.yaml'] as const) {
    try {
      await access(join(stagedDir, name));
      return true;
    } catch {}
  }
  return false;
}
38
packages/cli/src/context/ingest/adapters/dbt/fetch.test.ts
Normal file
@@ -0,0 +1,38 @@
import { mkdir, mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import { fetchDbtRepo } from './fetch.js';

describe('fetchDbtRepo', () => {
  let tempDir: string;

  beforeEach(async () => {
    tempDir = await mkdtemp(join(tmpdir(), 'ktx-dbt-fetch-'));
  });

  afterEach(async () => {
    await rm(tempDir, { recursive: true, force: true });
  });

  it('copies dbt yaml files from a fetched repo subpath into staged dir', async () => {
    const cacheDir = join(tempDir, 'cache');
    const stagedDir = join(tempDir, 'staged');
    await mkdir(join(cacheDir, 'analytics', 'models'), { recursive: true });
    await writeFile(join(cacheDir, 'analytics', 'dbt_project.yml'), 'name: analytics\n', 'utf-8');
    await writeFile(join(cacheDir, 'analytics', 'models', 'orders.yml'), 'models: []\n', 'utf-8');
    const cloneOrPull = vi.fn(async () => ({ commitHash: 'abc123' }));

    await expect(
      fetchDbtRepo({
        config: { repoUrl: 'https://github.com/acme/dbt.git', path: 'analytics' },
        cacheDir,
        stagedDir,
        deps: { cloneOrPull },
      }),
    ).resolves.toEqual({ commitHash: 'abc123', filesCopied: 2 });

    await expect(readFile(join(stagedDir, 'dbt_project.yml'), 'utf-8')).resolves.toContain('analytics');
    await expect(readFile(join(stagedDir, 'models', 'orders.yml'), 'utf-8')).resolves.toContain('models');
  });
});
60
packages/cli/src/context/ingest/adapters/dbt/fetch.ts
Normal file
@@ -0,0 +1,60 @@
import { access, copyFile, mkdir, readdir } from 'node:fs/promises';
import { dirname, join, relative } from 'node:path';
import { cloneOrPull, sanitizeRepoError } from '../../repo-fetch.js';

export interface DbtPullConfig {
  repoUrl: string;
  branch?: string;
  path?: string;
  authToken?: string | null;
}

export interface FetchDbtRepoParams {
  config: DbtPullConfig;
  cacheDir: string;
  stagedDir: string;
  deps?: {
    cloneOrPull?: typeof cloneOrPull;
  };
}

export async function fetchDbtRepo(params: FetchDbtRepoParams): Promise<{ commitHash: string; filesCopied: number }> {
  try {
    const runCloneOrPull = params.deps?.cloneOrPull ?? cloneOrPull;
    const { commitHash } = await runCloneOrPull({
      repoUrl: params.config.repoUrl,
      authToken: params.config.authToken,
      cacheDir: params.cacheDir,
      branch: params.config.branch ?? 'main',
    });
    const sourceRoot = params.config.path ? join(params.cacheDir, params.config.path) : params.cacheDir;
    const filesCopied = await copyYamlFilesRecursive(sourceRoot, params.stagedDir);
    return { commitHash, filesCopied };
  } catch (error) {
    throw new Error(sanitizeRepoError(error, params.config.authToken));
  }
}

async function copyYamlFilesRecursive(sourceRoot: string, destRoot: string): Promise<number> {
  try {
    await access(sourceRoot);
  } catch {
    return 0;
  }

  await mkdir(destRoot, { recursive: true });
  const entries = await readdir(sourceRoot, { withFileTypes: true, recursive: true });
  let copied = 0;
  for (const entry of entries) {
    if (!entry.isFile() || !/\.ya?ml$/i.test(entry.name)) {
      continue;
    }
    const absSrc = join(entry.parentPath, entry.name);
    const rel = relative(sourceRoot, absSrc);
    const dest = join(destRoot, rel);
    await mkdir(dirname(dest), { recursive: true });
    await copyFile(absSrc, dest);
    copied += 1;
  }
  return copied;
}
@@ -0,0 +1,8 @@
import { describe, expect, it } from 'vitest';
import { normalizeDbtPath } from './parse.js';

describe('normalizeDbtPath', () => {
  it('normalizes Windows separators to POSIX separators', () => {
    expect(normalizeDbtPath('models\\marts\\orders.yml')).toBe('models/marts/orders.yml');
  });
});
33
packages/cli/src/context/ingest/adapters/dbt/parse.ts
Normal file
@@ -0,0 +1,33 @@
import { readdir } from 'node:fs/promises';
import { join, relative } from 'node:path';

const YAML_EXT_RE = /\.(ya?ml)$/i;

/** @internal */
export function normalizeDbtPath(path: string): string {
  return path.replaceAll('\\', '/');
}

async function collectYamlFiles(stagedDir: string): Promise<string[]> {
  const entries = await readdir(stagedDir, { withFileTypes: true, recursive: true });
  const paths: string[] = [];
  for (const entry of entries) {
    if (!entry.isFile() || !YAML_EXT_RE.test(entry.name)) {
      continue;
    }
    const abs = join(entry.parentPath, entry.name);
    paths.push(normalizeDbtPath(relative(stagedDir, abs)));
  }
  paths.sort();
  return paths;
}

export interface ParsedDbtProject {
  /** All `.yml` / `.yaml` paths under stagedDir, relative + sorted. */
  allPaths: string[];
}

export async function parseDbtStagedDir(stagedDir: string): Promise<ParsedDbtProject> {
  const allPaths = await collectYamlFiles(stagedDir);
  return { allPaths };
}
@@ -0,0 +1,48 @@
import { readdir } from 'node:fs/promises';
import { join, relative } from 'node:path';
import type { ChunkResult, DiffSet, SourceAdapter, WorkUnit } from '../../types.js';

export class FakeSourceAdapter implements SourceAdapter {
  readonly source = 'fake';
  readonly skillNames: string[] = [];

  detect(): Promise<boolean> {
    return Promise.resolve(true);
  }

  async chunk(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
    const subDirs = (await readdir(stagedDir, { withFileTypes: true }))
      .filter((e) => e.isDirectory())
      .map((e) => e.name)
      .sort();

    const workUnits: WorkUnit[] = [];
    for (const subDir of subDirs) {
      const entries = await readdir(join(stagedDir, subDir), { withFileTypes: true, recursive: true });
      const rawFiles = entries
        .filter((e) => e.isFile())
        .map((e) => relative(stagedDir, join(e.parentPath, e.name)))
        .sort();
      if (rawFiles.length === 0) {
        continue;
      }
      if (diffSet) {
        const touched = new Set([...diffSet.added, ...diffSet.modified]);
        const anyTouched = rawFiles.some((p) => touched.has(p));
        if (!anyTouched) {
          continue;
        }
      }
      workUnits.push({
        unitKey: `fake-${subDir}`,
        displayLabel: subDir,
        rawFiles,
        peerFileIndex: [],
        dependencyPaths: [],
      });
    }

    const eviction = diffSet && diffSet.deleted.length > 0 ? { deletedRawPaths: [...diffSet.deleted] } : undefined;
    return { workUnits, eviction };
  }
}
@@ -0,0 +1,158 @@
import { describe, expect, it, vi } from 'vitest';
import { BigQueryHistoricSqlQueryHistoryReader } from './bigquery-query-history-reader.js';
import { HistoricSqlGrantsMissingError } from './errors.js';

interface FakeQueryResult {
  headers: string[];
  rows: unknown[][];
  totalRows: number;
  error?: string;
}

function queryClient(results: FakeQueryResult[]) {
  const executeQuery = vi.fn(async (_query: string) => {
    const next = results.shift();
    if (!next) {
      throw new Error('unexpected query');
    }
    return next;
  });
  return { executeQuery };
}

function firstQuery(client: ReturnType<typeof queryClient>): string {
  const call = client.executeQuery.mock.calls[0];
  if (!call) {
    throw new Error('expected query client to be called');
  }
  return call[0];
}

describe('BigQueryHistoricSqlQueryHistoryReader', () => {
  it('probes region-qualified INFORMATION_SCHEMA.JOBS_BY_PROJECT', async () => {
    const client = queryClient([{ headers: ['1'], rows: [[1]], totalRows: 1 }]);
    const reader = new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'project-1', region: 'US' });

    await expect(reader.probe(client)).resolves.toEqual({ warnings: [], info: [] });

    expect(client.executeQuery).toHaveBeenCalledWith(
      'SELECT 1 FROM `project-1.region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` LIMIT 1',
    );
  });

  it('turns probe result errors into HistoricSqlGrantsMissingError', async () => {
    const client = queryClient([{ headers: [], rows: [], totalRows: 0, error: 'Access Denied: jobs.listAll' }]);
    const reader = new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'project-1', region: 'us-central1' });

    await expect(reader.probe(client)).rejects.toMatchObject({
      name: 'HistoricSqlGrantsMissingError',
      dialect: 'bigquery',
      remediation:
        'Grant roles/bigquery.resourceViewer on the BigQuery project, or grant a custom role containing bigquery.jobs.listAll.',
    });
  });

  it('turns thrown probe failures into HistoricSqlGrantsMissingError', async () => {
    const client = {
      executeQuery: vi.fn(async () => {
        throw new Error('permission denied');
      }),
    };
    const reader = new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'project-1', region: 'US' });

    await expect(reader.probe(client)).rejects.toBeInstanceOf(HistoricSqlGrantsMissingError);
  });

  it('fetches aggregated BigQuery query templates', async () => {
    const client = queryClient([
      {
        headers: [
          'template_id',
          'canonical_sql',
          'executions',
          'distinct_users',
          'first_seen',
          'last_seen',
          'p50_ms',
          'p95_ms',
          'error_rate',
          'rows_produced',
          'top_users',
        ],
        rows: [
          [
            'hash-1',
            'select status from orders',
            42,
            3,
            '2026-05-01T00:00:00.000Z',
            '2026-05-11T00:00:00.000Z',
            12,
            40,
            0.05,
            null,
            JSON.stringify([{ user: 'analyst@example.test', executions: 1 }]),
          ],
        ],
        totalRows: 1,
      },
    ]);
    const reader = new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'demo', region: 'us' });

    const rows = [];
    for await (const row of reader.fetchAggregated(
      client,
      { start: new Date('2026-02-10T00:00:00.000Z'), end: new Date('2026-05-11T00:00:00.000Z') },
      { dialect: 'bigquery', minExecutions: 5, windowDays: 90, enabledTables: [], filters: { dropTrivialProbes: true }, redactionPatterns: [], staleArchiveAfterDays: 90 },
    )) {
      rows.push(row);
    }

    const sql = firstQuery(client);
    expect(sql).toContain('COUNT(*) AS executions');
    expect(sql).toContain('COUNT(DISTINCT user_email) AS distinct_users');
    expect(sql).toContain('GROUP BY query_hash');
    expect(sql).toContain('HAVING COUNT(*) >= 5');
    expect(rows).toMatchObject([
      {
        templateId: 'hash-1',
        stats: {
          executions: 42,
          errorRate: 0.05,
        },
        topUsers: [{ user: 'analyst@example.test', executions: 1 }],
      },
    ]);
  });

  it('throws a clear error when the query client cannot execute SQL', async () => {
    const reader = new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'project-1', region: 'US' });

    await expect(async () => {
      for await (const _row of reader.fetchAggregated(
        {},
        { start: new Date(), end: new Date() },
        {
          dialect: 'bigquery',
          minExecutions: 5,
          windowDays: 90,
          enabledTables: [],
          filters: { dropTrivialProbes: true },
          redactionPatterns: [],
          staleArchiveAfterDays: 90,
        },
      )) {
        throw new Error('unreachable');
      }
    }).rejects.toThrow('Historic SQL BigQuery reader requires a query client with executeQuery(query)');
  });

  it('rejects unsafe project and region identifiers before building SQL', () => {
    expect(() => new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'project`1', region: 'US' })).toThrow(
      'Invalid BigQuery project id for historic-SQL ingest: project`1',
    );
    expect(() => new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'project-1', region: 'US;DROP' })).toThrow(
      'Invalid BigQuery region for historic-SQL ingest: US;DROP',
    );
  });
});
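The probe test above expects region `US` to end up as `region-us` inside a backtick-quoted table name, and the last test expects unsafe identifiers to be rejected before any SQL is built. The sketch below shows one way to get both properties with allow-list validation. `jobsByProjectTable` and its error messages are hypothetical; how the reader actually assembles the name is not visible in the tests.

```typescript
// Hypothetical helper: validate identifiers, then build the region-qualified view name.
function jobsByProjectTable(projectId: string, region: string): string {
  // Allow-list characters so nothing can break out of the backtick-quoted identifier.
  if (!/^[A-Za-z0-9_-]+$/.test(projectId)) {
    throw new Error(`Invalid BigQuery project id: ${projectId}`);
  }
  // Accept "US", "us", or "region-us" and normalise all of them to "us".
  const normalized = region.trim().toLowerCase().replace(/^region-/, '');
  if (!/^[a-z0-9-]+$/.test(normalized)) {
    throw new Error(`Invalid BigQuery region: ${region}`);
  }
  return `\`${projectId}.region-${normalized}.INFORMATION_SCHEMA.JOBS_BY_PROJECT\``;
}

console.log(`SELECT 1 FROM ${jobsByProjectTable('project-1', 'US')} LIMIT 1`);
```

Validating with an allow-list rather than escaping keeps the check simple: table identifiers cannot be bound as query parameters, so anything that reaches the template string has to be known-safe already.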
|
||||
|
|
@@ -0,0 +1,247 @@
import { HistoricSqlGrantsMissingError } from './errors.js';
import {
  aggregatedTemplateSchema,
  type AggregatedTemplate,
  type HistoricSqlTimeWindow,
  type HistoricSqlUnifiedPullConfig,
} from './types.js';

interface QueryResultLike {
  headers: string[];
  rows: unknown[][];
  totalRows: number;
  error?: string;
}

interface QueryClientLike {
  executeQuery(query: string): Promise<QueryResultLike>;
}

export interface BigQueryHistoricSqlQueryHistoryReaderOptions {
  projectId: string;
  region: string;
}

const BIGQUERY_GRANTS_REMEDIATION =
  'Grant roles/bigquery.resourceViewer on the BigQuery project, or grant a custom role containing bigquery.jobs.listAll.';

function queryClient(client: unknown): QueryClientLike {
  if (
    client &&
    typeof client === 'object' &&
    'executeQuery' in client &&
    typeof (client as { executeQuery?: unknown }).executeQuery === 'function'
  ) {
    return client as QueryClientLike;
  }
  throw new Error('Historic SQL BigQuery reader requires a query client with executeQuery(query)');
}

function grantsError(cause: unknown): HistoricSqlGrantsMissingError {
  const message =
    cause instanceof Error
      ? cause.message
      : typeof cause === 'string'
        ? cause
        : 'BigQuery principal cannot query INFORMATION_SCHEMA.JOBS_BY_PROJECT.';
  return new HistoricSqlGrantsMissingError({
    dialect: 'bigquery',
    message: `Missing BigQuery audit grants for historic-SQL ingest: ${message}`,
    remediation: BIGQUERY_GRANTS_REMEDIATION,
    cause,
  });
}

function normalizeProjectId(value: string): string {
  if (!/^[A-Za-z0-9_-]+$/.test(value)) {
    throw new Error(`Invalid BigQuery project id for historic-SQL ingest: ${value}`);
  }
  return value;
}

function normalizeRegion(value: string): string {
  const region = value.trim().toLowerCase().replace(/^region-/, '');
  if (!/^[a-z0-9-]+$/.test(region)) {
    throw new Error(`Invalid BigQuery region for historic-SQL ingest: ${value}`);
  }
  return region;
}

function timestampExpression(value: Date | string): string {
  const date = value instanceof Date ? value : new Date(value);
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Invalid BigQuery query-history timestamp: ${String(value)}`);
  }
  return `TIMESTAMP('${date.toISOString().replace(/'/g, "\\'")}')`;
}

function indexByHeader(headers: string[]): Map<string, number> {
  const out = new Map<string, number>();
  headers.forEach((header, index) => {
    out.set(header.toUpperCase(), index);
  });
  return out;
}

function value(row: unknown[], indexes: Map<string, number>, name: string): unknown {
  const index = indexes.get(name.toUpperCase());
  return index === undefined ? null : row[index];
}

function nullableString(raw: unknown): string | null {
  if (raw === null || raw === undefined) {
    return null;
  }
  const text = String(raw);
  return text.length > 0 ? text : null;
}

function requiredString(raw: unknown, field: string): string {
  const text = nullableString(raw);
  if (!text) {
    throw new Error(`BigQuery JOBS_BY_PROJECT row is missing ${field}`);
  }
  return text;
}

function nullableNumber(raw: unknown): number | null {
  if (raw === null || raw === undefined || raw === '') {
    return null;
  }
  const number = typeof raw === 'number' ? raw : Number(raw);
  if (!Number.isFinite(number)) {
    return null;
  }
  return Math.max(0, number);
}

function requiredNumber(raw: unknown, field: string): number {
  const number = nullableNumber(raw);
  if (number === null) {
    throw new Error(`BigQuery JOBS_BY_PROJECT row has invalid ${field}: ${String(raw)}`);
  }
  return number;
}

function requiredInteger(raw: unknown, field: string): number {
  return Math.trunc(requiredNumber(raw, field));
}

function nullableInteger(raw: unknown): number | null {
  const number = nullableNumber(raw);
  return number === null ? null : Math.trunc(number);
}

function isoTimestamp(raw: unknown, field: string): string {
  if (raw instanceof Date) {
    return raw.toISOString();
  }
  const text = requiredString(raw, field);
  const date = new Date(text);
  if (Number.isNaN(date.getTime())) {
    throw new Error(`BigQuery JOBS_BY_PROJECT row has invalid ${field}: ${text}`);
  }
  return date.toISOString();
}

function parseTopUsers(raw: unknown): Array<{ user: string | null; executions: number }> {
  const text = nullableString(raw);
  if (!text) {
    return [];
  }
  try {
    const parsed = JSON.parse(text) as unknown;
    if (!Array.isArray(parsed)) {
      return [];
    }
    return parsed.flatMap((entry) => {
      if (!entry || typeof entry !== 'object') {
        return [];
      }
      const user = nullableString((entry as { user?: unknown }).user);
      const executions = nullableInteger((entry as { executions?: unknown }).executions);
      return executions === null ? [] : [{ user, executions }];
    });
  } catch {
    return [];
  }
}

function mapAggregatedRow(row: unknown[], indexes: Map<string, number>): AggregatedTemplate {
  return aggregatedTemplateSchema.parse({
    templateId: requiredString(value(row, indexes, 'template_id'), 'template_id'),
    canonicalSql: requiredString(value(row, indexes, 'canonical_sql'), 'canonical_sql'),
    dialect: 'bigquery',
    stats: {
      executions: requiredInteger(value(row, indexes, 'executions'), 'executions'),
      distinctUsers: requiredInteger(value(row, indexes, 'distinct_users'), 'distinct_users'),
      firstSeen: isoTimestamp(value(row, indexes, 'first_seen'), 'first_seen'),
      lastSeen: isoTimestamp(value(row, indexes, 'last_seen'), 'last_seen'),
      p50RuntimeMs: nullableNumber(value(row, indexes, 'p50_ms')),
      p95RuntimeMs: nullableNumber(value(row, indexes, 'p95_ms')),
      errorRate: requiredNumber(value(row, indexes, 'error_rate'), 'error_rate'),
      rowsProduced: nullableInteger(value(row, indexes, 'rows_produced')),
    },
    topUsers: parseTopUsers(value(row, indexes, 'top_users')),
  });
}

export class BigQueryHistoricSqlQueryHistoryReader {
  private readonly viewPath: string;

  constructor(options: BigQueryHistoricSqlQueryHistoryReaderOptions) {
    const projectId = normalizeProjectId(options.projectId);
    const region = normalizeRegion(options.region);
    this.viewPath = `\`${projectId}.region-${region}.INFORMATION_SCHEMA.JOBS_BY_PROJECT\``;
  }

  async probe(client: unknown): Promise<{ warnings: string[]; info: string[] }> {
    let result: QueryResultLike;
    try {
      result = await queryClient(client).executeQuery(`SELECT 1 FROM ${this.viewPath} LIMIT 1`);
    } catch (error) {
      throw grantsError(error);
    }
    if (result.error) {
      throw grantsError(result.error);
    }
    return { warnings: [], info: [] };
  }

  async *fetchAggregated(
    client: unknown,
    window: HistoricSqlTimeWindow,
    config: HistoricSqlUnifiedPullConfig,
  ): AsyncIterable<AggregatedTemplate> {
    const sql = `
      SELECT
        query_hash AS template_id,
        MIN(query) AS canonical_sql,
        COUNT(*) AS executions,
        COUNT(DISTINCT user_email) AS distinct_users,
        MIN(creation_time) AS first_seen,
        MAX(creation_time) AS last_seen,
        APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(50)] AS p50_ms,
        APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(95)] AS p95_ms,
        SAFE_DIVIDE(COUNTIF(error_result IS NOT NULL), COUNT(*)) AS error_rate,
        CAST(NULL AS INT64) AS rows_produced,
        TO_JSON_STRING(ARRAY_AGG(STRUCT(user_email AS user, 1 AS executions) ORDER BY creation_time DESC LIMIT 5)) AS top_users
      FROM ${this.viewPath}
      WHERE job_type = 'QUERY'
        AND statement_type IN ('SELECT', 'MERGE')
        AND creation_time >= ${timestampExpression(window.start)}
        AND creation_time < ${timestampExpression(window.end)}
        AND query IS NOT NULL
      GROUP BY query_hash
      HAVING COUNT(*) >= ${config.minExecutions}
      ORDER BY executions DESC`.trim();
    const result = await queryClient(client).executeQuery(sql);
    if (result.error) {
      throw grantsError(result.error);
    }
    const indexes = indexByHeader(result.headers);
    for (const row of result.rows) {
      yield mapAggregatedRow(row, indexes);
    }
  }
}
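The reader above validates `projectId` and `region` before interpolating them into the backtick-quoted view path, which is what the "rejects unsafe project and region identifiers" test exercises. A minimal standalone sketch of that check follows. `buildJobsViewPath` is a hypothetical helper name used only for illustration; the regular expressions and error messages mirror `normalizeProjectId` and `normalizeRegion`.

```typescript
// Hypothetical helper (not part of the commit): validate, normalize, then build the view path.
function buildJobsViewPath(projectId: string, region: string): string {
  // Only letters, digits, underscores, and hyphens may reach the SQL text.
  if (!/^[A-Za-z0-9_-]+$/.test(projectId)) {
    throw new Error(`Invalid BigQuery project id for historic-SQL ingest: ${projectId}`);
  }
  // Accept "US", "us", or "region-us" and normalize to the bare lowercase region.
  const normalized = region.trim().toLowerCase().replace(/^region-/, '');
  if (!/^[a-z0-9-]+$/.test(normalized)) {
    throw new Error(`Invalid BigQuery region for historic-SQL ingest: ${region}`);
  }
  return `\`${projectId}.region-${normalized}.INFORMATION_SCHEMA.JOBS_BY_PROJECT\``;
}
```

Because a backtick or semicolon fails the allow-list, the identifier cannot close the quoted path or append a second statement.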
@@ -0,0 +1,59 @@
import { describe, expect, it } from 'vitest';
import {
  bucketDistinctUsers,
  bucketErrorRate,
  bucketExecutions,
  bucketFrequency,
  bucketP95Runtime,
  bucketRecency,
} from './buckets.js';

describe('historic-sql bucket helpers', () => {
  it('uses stable execution buckets', () => {
    expect([0, 9, 10, 99, 100, 999, 1000, 4999, 5000, 49999, 50000].map(bucketExecutions)).toEqual([
      '<10',
      '<10',
      '10-100',
      '10-100',
      '100-1k',
      '100-1k',
      '1k-5k',
      '1k-5k',
      '5k-50k',
      '5k-50k',
      '>50k',
    ]);
  });

  it('uses stable distinct-user, error-rate, runtime, and recency buckets', () => {
    expect([0, 1, 2, 5, 6, 10, 11].map(bucketDistinctUsers)).toEqual([
      '0',
      '1',
      '2-5',
      '2-5',
      '5-10',
      '5-10',
      '>10',
    ]);
    expect([0, 0.01, 0.05, 0.2].map(bucketErrorRate)).toEqual(['none', 'low', 'low', 'high']);
    expect([null, 99, 100, 999, 1000, 9999, 10000].map(bucketP95Runtime)).toEqual([
      'unknown',
      '<100ms',
      '100ms-1s',
      '100ms-1s',
      '1s-10s',
      '1s-10s',
      '>10s',
    ]);
    expect(bucketRecency('2026-05-11T00:00:00.000Z', new Date('2026-05-11T12:00:00.000Z'))).toBe('current');
    expect(bucketRecency('2026-04-20T00:00:00.000Z', new Date('2026-05-11T12:00:00.000Z'))).toBe('recent');
    expect(bucketRecency('2026-01-01T00:00:00.000Z', new Date('2026-05-11T12:00:00.000Z'))).toBe('stale');
  });

  it('maps frequency counts to high, mid, and low labels', () => {
    expect(bucketFrequency(80, 100)).toBe('high');
    expect(bucketFrequency(20, 100)).toBe('mid');
    expect(bucketFrequency(1, 100)).toBe('low');
    expect(bucketFrequency(0, 0)).toBe('low');
  });
});
@@ -0,0 +1,49 @@
export function bucketExecutions(value: number): string {
  if (value < 10) return '<10';
  if (value < 100) return '10-100';
  if (value < 1000) return '100-1k';
  if (value < 5000) return '1k-5k';
  if (value < 50000) return '5k-50k';
  return '>50k';
}

export function bucketDistinctUsers(value: number): string {
  if (value <= 0) return '0';
  if (value === 1) return '1';
  if (value <= 5) return '2-5';
  if (value <= 10) return '5-10';
  return '>10';
}

export function bucketErrorRate(value: number): string {
  if (value <= 0) return 'none';
  if (value < 0.1) return 'low';
  return 'high';
}

export function bucketP95Runtime(value: number | null): string {
  if (value === null) return 'unknown';
  if (value < 100) return '<100ms';
  if (value < 1000) return '100ms-1s';
  if (value < 10000) return '1s-10s';
  return '>10s';
}

export function bucketRecency(lastSeen: string, now: Date): string {
  const parsed = new Date(lastSeen);
  if (Number.isNaN(parsed.getTime())) {
    return 'unknown';
  }
  const ageDays = (now.getTime() - parsed.getTime()) / (24 * 60 * 60 * 1000);
  if (ageDays <= 7) return 'current';
  if (ageDays <= 45) return 'recent';
  return 'stale';
}

export function bucketFrequency(count: number, total: number): 'high' | 'mid' | 'low' {
  if (total <= 0 || count <= 0) return 'low';
  const ratio = count / total;
  if (ratio >= 0.5) return 'high';
  if (ratio >= 0.1) return 'mid';
  return 'low';
}
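The execution buckets above are half-open ranges: each label covers values strictly below its upper bound, so 10 falls in `10-100` and 50000 falls in `>50k`. The same rule can be written as a table-driven lookup, sketched below. `Bucket`, `EXECUTION_BUCKETS`, and `bucketByUpperBound` are illustrative names, not part of the commit.

```typescript
// Illustrative table-driven variant of bucketExecutions; names are assumptions, not from the commit.
type Bucket = { below: number; label: string };

const EXECUTION_BUCKETS: Bucket[] = [
  { below: 10, label: '<10' },
  { below: 100, label: '10-100' },
  { below: 1000, label: '100-1k' },
  { below: 5000, label: '1k-5k' },
  { below: 50000, label: '5k-50k' },
];

// Returns the label of the first bucket whose exclusive upper bound exceeds the value.
function bucketByUpperBound(value: number, buckets: Bucket[], overflow: string): string {
  for (const bucket of buckets) {
    if (value < bucket.below) return bucket.label;
  }
  return overflow;
}
```

For example, `bucketByUpperBound(4999, EXECUTION_BUCKETS, '>50k')` picks the `1k-5k` row, matching the expectations in the bucket test.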
@@ -0,0 +1,182 @@
import { mkdir, mkdtemp, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it } from 'vitest';
import { chunkHistoricSqlUnifiedStagedDir, describeHistoricSqlUnifiedScope } from './chunk-unified.js';

async function tempDir(): Promise<string> {
  return mkdtemp(join(tmpdir(), 'historic-sql-unified-chunk-'));
}

async function writeJson(root: string, relPath: string, value: unknown): Promise<void> {
  const target = join(root, relPath);
  await mkdir(join(target, '..'), { recursive: true });
  await writeFile(target, `${JSON.stringify(value, null, 2)}\n`, 'utf-8');
}

async function writeUnifiedStagedDir(root: string): Promise<void> {
  await writeJson(root, 'manifest.json', {
    source: 'historic-sql',
    connectionId: 'warehouse',
    dialect: 'postgres',
    fetchedAt: '2026-05-11T00:00:00.000Z',
    windowStart: '2026-02-10T00:00:00.000Z',
    windowEnd: '2026-05-11T00:00:00.000Z',
    snapshotRowCount: 1,
    touchedTableCount: 1,
    parseFailures: 0,
    warnings: [],
    probeWarnings: [],
  });
  await writeJson(root, 'tables/public.orders.json', {
    table: 'public.orders',
    stats: {
      executionsBucket: '10-100',
      distinctUsersBucket: '2-5',
      errorRateBucket: 'none',
      p95RuntimeBucket: '<100ms',
      recencyBucket: 'current',
    },
    columnsByClause: { select: [['status', 'high']] },
    observedJoins: [],
    topTemplates: [{ id: 'orders', canonicalSql: 'select * from public.orders', topUsers: [{ user: 'analyst' }] }],
  });
  await writeJson(root, 'patterns-input.json', {
    templates: [
      {
        id: 'orders',
        canonicalSql: 'select * from public.orders join public.customers on true',
        tablesTouched: ['public.orders', 'public.customers'],
        executionsBucket: '10-100',
        distinctUsersBucket: '2-5',
        dialect: 'postgres',
      },
    ],
  });
  await writeJson(root, 'patterns-input/part-0001.json', {
    templates: [
      {
        id: 'orders',
        canonicalSql: 'select * from public.orders join public.customers on true',
        tablesTouched: ['public.orders', 'public.customers'],
        executionsBucket: '10-100',
        distinctUsersBucket: '2-5',
        dialect: 'postgres',
      },
    ],
  });
}

describe('chunkHistoricSqlUnifiedStagedDir', () => {
  it('emits one table WorkUnit plus one patterns WorkUnit', async () => {
    const stagedDir = await tempDir();
    await writeUnifiedStagedDir(stagedDir);

    const result = await chunkHistoricSqlUnifiedStagedDir(stagedDir);

    expect(result.workUnits).toEqual([
      expect.objectContaining({
        unitKey: 'historic-sql-table-public-orders',
        displayLabel: 'Historic SQL usage: public.orders',
        rawFiles: ['tables/public.orders.json'],
        dependencyPaths: ['manifest.json'],
        notes: expect.stringContaining('historic_sql_table_digest'),
      }),
      expect.objectContaining({
        unitKey: 'historic-sql-patterns-part-0001',
        displayLabel: 'Historic SQL cross-table patterns: part-0001',
        rawFiles: ['patterns-input/part-0001.json'],
        dependencyPaths: ['manifest.json'],
        notes: expect.stringContaining('patterns-input/part-0001.json'),
      }),
    ]);
    expect(result.workUnits[0]?.notes).toContain('emit_historic_sql_evidence');
    expect(result.workUnits[1]?.notes).toContain('emit_historic_sql_evidence');
    expect(result.reconcileNotes).toEqual(['Historic-SQL touched tables=1 parseFailures=0']);
  });

  it('respects diff sets for unchanged table and patterns files', async () => {
    const stagedDir = await tempDir();
    await writeUnifiedStagedDir(stagedDir);

    await expect(
      chunkHistoricSqlUnifiedStagedDir(stagedDir, {
        added: [],
        modified: ['tables/public.orders.json'],
        deleted: [],
        unchanged: ['manifest.json', 'patterns-input.json', 'patterns-input/part-0001.json'],
      }),
    ).resolves.toMatchObject({
      workUnits: [expect.objectContaining({ unitKey: 'historic-sql-table-public-orders' })],
    });

    await expect(
      chunkHistoricSqlUnifiedStagedDir(stagedDir, {
        added: [],
        modified: ['patterns-input/part-0001.json'],
        deleted: [],
        unchanged: ['manifest.json', 'patterns-input.json', 'tables/public.orders.json'],
      }),
    ).resolves.toMatchObject({
      workUnits: [expect.objectContaining({ unitKey: 'historic-sql-patterns-part-0001' })],
    });

    await expect(
      chunkHistoricSqlUnifiedStagedDir(stagedDir, {
        added: [],
        modified: ['patterns-input.json'],
        deleted: [],
        unchanged: ['manifest.json', 'patterns-input/part-0001.json', 'tables/public.orders.json'],
      }),
    ).resolves.toMatchObject({
      workUnits: [],
    });
  });

  it('describes unified staged scope', async () => {
    const stagedDir = await tempDir();
    await writeUnifiedStagedDir(stagedDir);

    const scope = await describeHistoricSqlUnifiedScope(stagedDir);

    expect(scope.isPathInScope('manifest.json')).toBe(true);
    expect(scope.isPathInScope('patterns-input.json')).toBe(true);
    expect(scope.isPathInScope('patterns-input/part-0001.json')).toBe(true);
    expect(scope.isPathInScope('patterns-input/part-1.json')).toBe(false);
    expect(scope.isPathInScope('tables/public.orders.json')).toBe(true);
    expect(scope.isPathInScope('templates/old/page.md')).toBe(false);
  });

  it('emits one patterns WorkUnit per changed shard', async () => {
    const stagedDir = await tempDir();
    await writeUnifiedStagedDir(stagedDir);
    await writeJson(stagedDir, 'patterns-input/part-0002.json', {
      templates: [
        {
          id: 'line-items',
          canonicalSql: 'select * from public.orders join public.line_items on true',
          tablesTouched: ['public.orders', 'public.line_items'],
          executionsBucket: '10-100',
          distinctUsersBucket: '2-5',
          dialect: 'postgres',
        },
      ],
    });

    const result = await chunkHistoricSqlUnifiedStagedDir(stagedDir, {
      added: ['patterns-input/part-0002.json'],
      modified: ['patterns-input/part-0001.json'],
      deleted: [],
      unchanged: ['manifest.json', 'patterns-input.json', 'tables/public.orders.json'],
    });

    expect(result.workUnits.map((unit) => unit.unitKey)).toEqual([
      'historic-sql-patterns-part-0001',
      'historic-sql-patterns-part-0002',
    ]);
    expect(result.workUnits.map((unit) => unit.rawFiles)).toEqual([
      ['patterns-input/part-0001.json'],
      ['patterns-input/part-0002.json'],
    ]);
  });
});
@@ -0,0 +1,99 @@
import { createHash } from 'node:crypto';
import { readFile, readdir } from 'node:fs/promises';
import { join, relative } from 'node:path';
import type { ChunkResult, DiffSet, ScopeDescriptor, WorkUnit } from '../../types.js';
import { isHistoricSqlPatternInputShardPath } from './pattern-inputs.js';
import { stagedManifestSchema, stagedPatternsInputSchema, stagedTableInputSchema } from './types.js';

async function walk(root: string): Promise<string[]> {
  const entries = await readdir(root, { withFileTypes: true, recursive: true });
  return entries
    .filter((entry) => entry.isFile())
    .map((entry) => relative(root, join(entry.parentPath, entry.name)).replace(/\\/g, '/'))
    .sort();
}

async function readJson<T>(stagedDir: string, relPath: string): Promise<T> {
  return JSON.parse(await readFile(join(stagedDir, relPath), 'utf-8')) as T;
}

function safeUnitKey(value: string): string {
  return value.replace(/[^a-zA-Z0-9]+/g, '-').replace(/^-+|-+$/g, '');
}

function touchedPath(path: string, touched: Set<string> | null): boolean {
  return !touched || touched.has(path);
}

export async function chunkHistoricSqlUnifiedStagedDir(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
  const files = await walk(stagedDir);
  const manifest = stagedManifestSchema.parse(await readJson(stagedDir, 'manifest.json'));
  const touched = diffSet ? new Set([...diffSet.added, ...diffSet.modified]) : null;
  const workUnits: WorkUnit[] = [];

  for (const path of files.filter((file) => /^tables\/.+\.json$/.test(file))) {
    if (!touchedPath(path, touched)) {
      continue;
    }
    const table = stagedTableInputSchema.parse(await readJson(stagedDir, path));
    workUnits.push({
      unitKey: `historic-sql-table-${safeUnitKey(table.table)}`,
      displayLabel: `Historic SQL usage: ${table.table}`,
      rawFiles: [path],
      dependencyPaths: ['manifest.json'],
      peerFileIndex: files.filter((file) => file !== path && file !== 'manifest.json').sort(),
      notes:
        'Use historic_sql_table_digest. Read this table usage JSON and emit exactly one table_usage object with emit_historic_sql_evidence. Do not call wiki_write or sl_write_source.',
    });
  }

  for (const path of files.filter(isHistoricSqlPatternInputShardPath)) {
    if (!touchedPath(path, touched)) {
      continue;
    }
    stagedPatternsInputSchema.parse(await readJson(stagedDir, path));
    const shardLabel = path.replace(/^patterns-input\//, '').replace(/\.json$/, '');
    workUnits.push({
      unitKey: `historic-sql-patterns-${safeUnitKey(shardLabel)}`,
      displayLabel: `Historic SQL cross-table patterns: ${shardLabel}`,
      rawFiles: [path],
      dependencyPaths: ['manifest.json'],
      peerFileIndex: files.filter((file) => file !== path && file !== 'manifest.json').sort(),
      notes:
        `Use historic_sql_patterns. Read ${path} and emit pattern objects with emit_historic_sql_evidence using rawPath "${path}". Do not call wiki_write or sl_write_source.`,
    });
  }

  const deleted = diffSet?.deleted
    .filter((path) => isHistoricSqlPatternInputShardPath(path) || /^tables\/.+\.json$/.test(path))
    .sort();
  return {
    workUnits,
    eviction: deleted && deleted.length > 0 ? { deletedRawPaths: deleted } : undefined,
    reconcileNotes: [`Historic-SQL touched tables=${manifest.touchedTableCount} parseFailures=${manifest.parseFailures}`],
    contextReport: {
      capped: false,
      warnings: [...manifest.probeWarnings, ...manifest.warnings],
    },
  };
}

export async function describeHistoricSqlUnifiedScope(stagedDir: string): Promise<ScopeDescriptor> {
  const manifest = stagedManifestSchema.parse(await readJson(stagedDir, 'manifest.json'));
  const fingerprint = createHash('sha256')
    .update(JSON.stringify({
      connectionId: manifest.connectionId,
      dialect: manifest.dialect,
      windowStart: manifest.windowStart,
      windowEnd: manifest.windowEnd,
    }))
    .digest('hex');
  return {
    fingerprint,
    isPathInScope: (rawPath) =>
      rawPath === 'manifest.json' ||
      rawPath === 'patterns-input.json' ||
      isHistoricSqlPatternInputShardPath(rawPath) ||
      /^tables\/.+\.json$/.test(rawPath),
  };
}
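The chunker above emits a WorkUnit only for files that appear in the diff set's `added` or `modified` lists, and treats a missing diff set as "everything is touched". A minimal sketch of that selection rule and of the unit-key sanitizing follows. `DiffSetLike` is an assumed shape inferred from the test fixtures (the real `DiffSet` type lives in `../../types.js`), and `selectTouched` is a hypothetical helper name.

```typescript
// Assumed shape, inferred from the tests; the real DiffSet type is defined elsewhere in the repo.
interface DiffSetLike {
  added: string[];
  modified: string[];
  deleted: string[];
  unchanged: string[];
}

// Hypothetical helper: keep every file when no diff set is given, else only added or modified ones.
function selectTouched(files: string[], diffSet?: DiffSetLike): string[] {
  if (!diffSet) return [...files];
  const touched = new Set([...diffSet.added, ...diffSet.modified]);
  return files.filter((file) => touched.has(file));
}

// Same sanitizing as safeUnitKey above: collapse non-alphanumeric runs to "-" and trim the ends.
function safeUnitKey(value: string): string {
  return value.replace(/[^a-zA-Z0-9]+/g, '-').replace(/^-+|-+$/g, '');
}
```

This is why a change to `patterns-input.json` alone yields no work units in the test: only `tables/*.json` files and pattern shards are candidates, and neither is in the touched set.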
@@ -0,0 +1,57 @@
import { mkdir, mkdtemp, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it } from 'vitest';
import { detectHistoricSqlStagedDir } from './detect.js';
import { HISTORIC_SQL_SOURCE_KEY, stagedManifestSchema } from './types.js';

async function tempDir(): Promise<string> {
  return mkdtemp(join(tmpdir(), 'historic-sql-detect-'));
}

async function writeJson(root: string, relPath: string, value: unknown): Promise<void> {
  const target = join(root, relPath);
  await mkdir(join(target, '..'), { recursive: true });
  await writeFile(target, `${JSON.stringify(value, null, 2)}\n`, 'utf-8');
}

function manifest() {
  return stagedManifestSchema.parse({
    source: HISTORIC_SQL_SOURCE_KEY,
    connectionId: 'conn_1',
    dialect: 'postgres',
    fetchedAt: '2026-05-04T12:00:00.000Z',
    windowStart: '2026-02-03T12:00:00.000Z',
    windowEnd: '2026-05-04T12:00:00.000Z',
    snapshotRowCount: 0,
    touchedTableCount: 0,
    parseFailures: 0,
    warnings: [],
    probeWarnings: [],
  });
}

describe('historic-sql staged dir detection', () => {
  it('detects manifest source', async () => {
    const stagedDir = await tempDir();
    await writeJson(stagedDir, 'manifest.json', manifest());

    await expect(detectHistoricSqlStagedDir(stagedDir)).resolves.toBe(true);
  });

  it('detects unified table and patterns structure without manifest', async () => {
    const stagedDir = await tempDir();
    await writeFile(join(stagedDir, 'not-a-match.txt'), 'x', 'utf-8');
    await writeJson(stagedDir, 'patterns-input.json', { templates: [] });
    await writeJson(stagedDir, 'tables/public.orders.json', { table: 'public.orders' });

    await expect(detectHistoricSqlStagedDir(stagedDir)).resolves.toBe(true);
  });

  it('does not detect unrelated directories', async () => {
    const stagedDir = await tempDir();
    await writeJson(stagedDir, 'manifest.json', { source: 'notion' });

    await expect(detectHistoricSqlStagedDir(stagedDir)).resolves.toBe(false);
  });
});
@@ -0,0 +1,25 @@
import { readFile, readdir } from 'node:fs/promises';
import { join } from 'node:path';
import { HISTORIC_SQL_SOURCE_KEY } from './types.js';

export async function detectHistoricSqlStagedDir(stagedDir: string): Promise<boolean> {
  try {
    const manifest = JSON.parse(await readFile(join(stagedDir, 'manifest.json'), 'utf-8')) as { source?: unknown };
    if (manifest.source === HISTORIC_SQL_SOURCE_KEY) {
      return true;
    }
    if (manifest.source !== undefined) {
      return false;
    }
  } catch {
    // Fall through to structural detection for stage-only fixtures.
  }

  try {
    await readFile(join(stagedDir, 'patterns-input.json'), 'utf-8');
    const entries = await readdir(join(stagedDir, 'tables'), { withFileTypes: true });
    return entries.some((entry) => entry.isFile() && entry.name.endsWith('.json'));
  } catch {
    return false;
  }
}
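Detection above runs in two stages: a manifest with a `source` field decides the answer outright, and only a missing, unreadable, or source-less manifest falls through to the structural check (`patterns-input.json` plus at least one `tables/*.json` file). The decision order can be restated as a pure function, sketched below. `StagedDirFacts` and `looksLikeHistoricSql` are hypothetical names, and the default `'historic-sql'` key is an assumption based on the fixtures in this diff rather than on the definition of `HISTORIC_SQL_SOURCE_KEY`, which is not shown.

```typescript
// Hypothetical pure restatement of the decision order; the real function reads these facts from disk.
interface StagedDirFacts {
  // undefined when manifest.json is missing, unreadable, or has no source field
  manifestSource?: unknown;
  hasPatternsInput: boolean;
  tableJsonCount: number;
}

// sourceKey defaults to 'historic-sql', an assumed value for HISTORIC_SQL_SOURCE_KEY.
function looksLikeHistoricSql(facts: StagedDirFacts, sourceKey = 'historic-sql'): boolean {
  if (facts.manifestSource === sourceKey) return true;
  // Any other declared source wins over structure, so a Notion dir is never misdetected.
  if (facts.manifestSource !== undefined) return false;
  return facts.hasPatternsInput && facts.tableJsonCount > 0;
}
```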
@@ -0,0 +1,61 @@
import type { HistoricSqlDialect } from './types.js';

interface HistoricSqlGrantsMissingErrorOptions {
  dialect: HistoricSqlDialect;
  message: string;
  remediation: string;
  cause?: unknown;
}

export class HistoricSqlGrantsMissingError extends Error {
  readonly dialect: HistoricSqlDialect;
  readonly remediation: string;

  constructor(options: HistoricSqlGrantsMissingErrorOptions) {
    super(options.message, options.cause === undefined ? undefined : { cause: options.cause });
    this.name = 'HistoricSqlGrantsMissingError';
    this.dialect = options.dialect;
    this.remediation = options.remediation;
  }
}

interface HistoricSqlExtensionMissingErrorOptions {
  dialect: HistoricSqlDialect;
  message: string;
  remediation: string;
  cause?: unknown;
}

export class HistoricSqlExtensionMissingError extends Error {
  readonly dialect: HistoricSqlDialect;
  readonly remediation: string;

  constructor(options: HistoricSqlExtensionMissingErrorOptions) {
    super(options.message, options.cause === undefined ? undefined : { cause: options.cause });
    this.name = 'HistoricSqlExtensionMissingError';
    this.dialect = options.dialect;
    this.remediation = options.remediation;
  }
}

interface HistoricSqlVersionUnsupportedErrorOptions {
  dialect: HistoricSqlDialect;
  detectedVersion: string;
  minimumVersion: string;
}

export class HistoricSqlVersionUnsupportedError extends Error {
  readonly dialect: HistoricSqlDialect;
  readonly detectedVersion: string;
  readonly minimumVersion: string;

  constructor(options: HistoricSqlVersionUnsupportedErrorOptions) {
    super(
      `Unsupported ${options.dialect} version for historic-SQL ingest: detected ${options.detectedVersion}; requires ${options.minimumVersion} or newer.`,
    );
    this.name = 'HistoricSqlVersionUnsupportedError';
    this.dialect = options.dialect;
    this.detectedVersion = options.detectedVersion;
    this.minimumVersion = options.minimumVersion;
  }
}
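The grants and extension errors above carry a `remediation` string next to the message, so a caller can show the fix alongside the failure. A hedged sketch of how a caller might surface it follows. `GrantsMissingError` is a local stand-in with a simplified constructor so the block is self-contained, and `describeFailure` is a hypothetical helper that does not appear in the commit.

```typescript
// Local stand-in for HistoricSqlGrantsMissingError; the constructor is simplified for illustration.
class GrantsMissingError extends Error {
  readonly remediation: string;

  constructor(message: string, remediation: string) {
    super(message);
    this.name = 'GrantsMissingError';
    this.remediation = remediation;
  }
}

// Hypothetical helper: append the remediation hint when the error provides one.
function describeFailure(error: unknown): string {
  if (error instanceof GrantsMissingError) {
    return `${error.message}\nFix: ${error.remediation}`;
  }
  return error instanceof Error ? error.message : String(error);
}
```

Keeping the remediation as a separate field, rather than baking it into the message, lets callers decide whether to print it, log it, or map it to a UI hint.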
@@ -0,0 +1,89 @@
import { describe, expect, it, vi } from 'vitest';
import { asSchema } from 'ai';
import { createEmitHistoricSqlEvidenceTool } from './evidence-tool.js';

describe('emit_historic_sql_evidence tool', () => {
  it('exposes an AI SDK v6 tool input schema with top-level object type', async () => {
    const tool = createEmitHistoricSqlEvidenceTool();

    expect(await asSchema(tool.inputSchema).jsonSchema).toMatchObject({
      type: 'object',
    });
  });

  it('writes table usage evidence to the ignored run evidence directory', async () => {
    const writeFile = vi.fn(async () => ({ success: true, commitHash: null }));
    const tool = createEmitHistoricSqlEvidenceTool();

    const result = await tool.execute!(
      {
        kind: 'table_usage',
        table: 'public.orders',
        rawPath: 'tables/public.orders.json',
        usage: {
          narrative: 'Orders are repeatedly queried by paid status.',
          frequencyTier: 'high',
          commonFilters: ['status'],
          commonJoins: [],
          staleSince: null,
        },
      },
      {
        toolCallId: 'call-1',
        messages: [],
        abortSignal: new AbortController().signal,
        experimental_context: {
          connectionId: 'warehouse',
          session: {
            ingest: { runId: 'run-1', jobId: 'job-1', syncId: 'sync-1', sourceKey: 'historic-sql' },
            configService: { writeFile },
          },
        },
      } as never,
    );

    expect(result).toBe('Recorded historic-SQL table_usage evidence for public.orders.');
    expect(writeFile).toHaveBeenCalledWith(
      '.ktx/ingest-evidence/historic-sql/run-1/historic-sql-table-public-orders.json',
      expect.stringContaining('"kind": "table_usage"'),
      'System User',
      'system@example.com',
      'Record historic-SQL evidence: historic-sql-table-public-orders',
      { skipLock: true },
    );
  });

  it('rejects non-historic ingest sessions', async () => {
    const tool = createEmitHistoricSqlEvidenceTool();

    await expect(
      tool.execute!(
        {
          kind: 'pattern',
          rawPath: 'patterns-input.json',
          pattern: {
            slug: 'orders',
            title: 'Orders',
            narrative: 'Orders pattern.',
            definitionSql: 'select * from public.orders',
            tablesInvolved: ['public.orders'],
            slRefs: ['orders'],
            constituentTemplateIds: ['pg:1'],
          },
        },
        {
          toolCallId: 'call-1',
          messages: [],
          abortSignal: new AbortController().signal,
          experimental_context: {
            connectionId: 'warehouse',
            session: {
              ingest: { runId: 'run-1', jobId: 'job-1', syncId: 'sync-1', sourceKey: 'notion' },
              configService: { writeFile: vi.fn() },
            },
          },
        } as never,
      ),
    ).resolves.toContain('Error: emit_historic_sql_evidence is only available during historic-sql ingest');
  });
});
@@ -0,0 +1,121 @@
import { tool } from 'ai';
import { z } from 'zod';
import { historicSqlEvidencePath, serializeHistoricSqlEvidence } from './evidence.js';
import { patternOutputSchema, tableUsageOutputSchema } from './skill-schemas.js';

const SYSTEM_AUTHOR = 'System User';
const SYSTEM_EMAIL = 'system@example.com';

const emitHistoricSqlEvidenceInputSchema = z
  .object({
    kind: z.enum(['table_usage', 'pattern']),
    table: z.string().min(1).optional(),
    rawPath: z.string().min(1),
    usage: tableUsageOutputSchema.optional(),
    pattern: patternOutputSchema.optional(),
  })
  .superRefine((input, ctx) => {
    if (input.kind === 'table_usage') {
      if (!input.table) {
        ctx.addIssue({
          code: 'custom',
          path: ['table'],
          message: 'table is required when kind is table_usage',
        });
      }
      if (!input.usage) {
        ctx.addIssue({
          code: 'custom',
          path: ['usage'],
          message: 'usage is required when kind is table_usage',
        });
      }
    }
    if (input.kind === 'pattern' && !input.pattern) {
      ctx.addIssue({
        code: 'custom',
        path: ['pattern'],
        message: 'pattern is required when kind is pattern',
      });
    }
  });

type EmitHistoricSqlEvidenceInput = z.infer<typeof emitHistoricSqlEvidenceInputSchema>;

interface EmitHistoricSqlEvidenceToolContext {
  connectionId?: string | null;
  session?: {
    ingest?: { runId: string; sourceKey: string };
    configService?: {
      writeFile(
        path: string,
        content: string,
        author: string,
        authorEmail: string,
        commitMessage: string,
        options?: { skipLock?: boolean },
      ): Promise<unknown>;
    };
  };
}

function unitKeyForEvidence(input: EmitHistoricSqlEvidenceInput): string {
  if (input.kind === 'table_usage') {
    return `historic-sql-table-${String(input.table).replace(/[^a-zA-Z0-9]+/g, '-').replace(/^-+|-+$/g, '')}`;
  }
  return `historic-sql-pattern-${String(input.pattern?.slug).replace(/[^a-zA-Z0-9]+/g, '-').replace(/^-+|-+$/g, '')}`;
}

function evidenceEnvelope(input: EmitHistoricSqlEvidenceInput, connectionId: string) {
  if (input.kind === 'table_usage') {
    if (!input.table || !input.usage) {
      throw new Error('Invalid historic-SQL table usage evidence input.');
    }
    return {
      kind: 'table_usage' as const,
      connectionId,
      table: input.table,
      rawPath: input.rawPath,
      usage: input.usage,
    };
  }
  if (!input.pattern) {
    throw new Error('Invalid historic-SQL pattern evidence input.');
  }
  return {
    kind: 'pattern' as const,
    connectionId,
    rawPath: input.rawPath,
    pattern: input.pattern,
  };
}

export function createEmitHistoricSqlEvidenceTool(defaultContext?: EmitHistoricSqlEvidenceToolContext) {
  return tool({
    description:
      'Record typed historic-SQL evidence for deterministic projection. Use this instead of wiki_write, sl_write_source, sl_edit_source, or context_candidate_write during historic-SQL WorkUnits.',
    inputSchema: emitHistoricSqlEvidenceInputSchema,
    execute: async (input, options): Promise<string> => {
      const context = (options.experimental_context as EmitHistoricSqlEvidenceToolContext | undefined) ?? defaultContext;
      const ingest = context?.session?.ingest;
      const configService = context?.session?.configService;
      if (!ingest || ingest.sourceKey !== 'historic-sql' || !configService || !context?.connectionId) {
        return 'Error: emit_historic_sql_evidence is only available during historic-sql ingest.';
      }

      const unitKey = unitKeyForEvidence(input);
      const evidence = evidenceEnvelope(input, context.connectionId);
      const content = serializeHistoricSqlEvidence(evidence);
      await configService.writeFile(
        historicSqlEvidencePath(ingest.runId, unitKey),
        content,
        SYSTEM_AUTHOR,
        SYSTEM_EMAIL,
        `Record historic-SQL evidence: ${unitKey}`,
        { skipLock: true },
      );
      const label = evidence.kind === 'table_usage' ? evidence.table : evidence.pattern.slug;
      return `Recorded historic-SQL ${input.kind} evidence for ${label}.`;
    },
  });
}
@@ -0,0 +1,57 @@
import { describe, expect, it } from 'vitest';
import {
  historicSqlEvidenceEnvelopeSchema,
  historicSqlEvidencePath,
  historicSqlPatternEvidenceSchema,
  historicSqlTableUsageEvidenceSchema,
} from './evidence.js';

describe('historic-sql evidence contracts', () => {
  it('validates table usage evidence emitted by table digest WorkUnits', () => {
    const parsed = historicSqlTableUsageEvidenceSchema.parse({
      kind: 'table_usage',
      connectionId: 'warehouse',
      table: 'public.orders',
      rawPath: 'tables/public.orders.json',
      usage: {
        narrative: 'Orders are repeatedly queried for paid/refunded lifecycle analysis.',
        frequencyTier: 'high',
        commonFilters: ['status', 'created_at'],
        commonGroupBys: ['status'],
        commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
        staleSince: null,
      },
    });

    expect(parsed.table).toBe('public.orders');
    expect(parsed.usage.frequencyTier).toBe('high');
  });

  it('validates pattern evidence emitted by the patterns WorkUnit', () => {
    const parsed = historicSqlPatternEvidenceSchema.parse(
      historicSqlEvidenceEnvelopeSchema.parse({
        kind: 'pattern',
        connectionId: 'warehouse',
        rawPath: 'patterns-input.json',
        pattern: {
          slug: 'order-lifecycle-analysis',
          title: 'Order Lifecycle Analysis',
          narrative: 'Analysts compare order status changes by customer segment.',
          definitionSql: 'select status, count(*) from public.orders group by status',
          tablesInvolved: ['public.orders', 'public.customers'],
          slRefs: ['orders', 'customers'],
          constituentTemplateIds: ['pg:1', 'pg:2'],
        },
      }),
    );

    expect(parsed.kind).toBe('pattern');
    expect(parsed.pattern.slug).toBe('order-lifecycle-analysis');
  });

  it('builds a stable ignored evidence path from run and WorkUnit identity', () => {
    expect(historicSqlEvidencePath('run-1', 'historic-sql-table-public-orders')).toBe(
      '.ktx/ingest-evidence/historic-sql/run-1/historic-sql-table-public-orders.json',
    );
  });
});
@@ -0,0 +1,41 @@
import { z } from 'zod';
import { patternOutputSchema, tableUsageOutputSchema } from './skill-schemas.js';

function safeEvidenceSegment(value: string): string {
  const segment = value.replace(/[^a-zA-Z0-9._-]+/g, '-').replace(/^-+|-+$/g, '');
  if (!segment) {
    throw new Error(`Invalid historic-SQL evidence path segment: ${value}`);
  }
  return segment;
}

/** @internal */
export const historicSqlTableUsageEvidenceSchema = z.object({
  kind: z.literal('table_usage'),
  connectionId: z.string().min(1),
  table: z.string().min(1),
  rawPath: z.string().min(1),
  usage: tableUsageOutputSchema,
});

/** @internal */
export const historicSqlPatternEvidenceSchema = z.object({
  kind: z.literal('pattern'),
  connectionId: z.string().min(1),
  rawPath: z.string().min(1),
  pattern: patternOutputSchema,
});

export const historicSqlEvidenceEnvelopeSchema = z.discriminatedUnion('kind', [
  historicSqlTableUsageEvidenceSchema,
  historicSqlPatternEvidenceSchema,
]);
export type HistoricSqlEvidenceEnvelope = z.infer<typeof historicSqlEvidenceEnvelopeSchema>;

export function historicSqlEvidencePath(runId: string, unitKey: string): string {
  return `.ktx/ingest-evidence/historic-sql/${safeEvidenceSegment(runId)}/${safeEvidenceSegment(unitKey)}.json`;
}

export function serializeHistoricSqlEvidence(evidence: HistoricSqlEvidenceEnvelope): string {
  return `${JSON.stringify(historicSqlEvidenceEnvelopeSchema.parse(evidence), null, 2)}\n`;
}
|
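Reviewer sketch (not part of the commit): the evidence file name is derived in two steps, a unit key built with the `[^a-zA-Z0-9]+` rule in the tool and a path segment check with the `[^a-zA-Z0-9._-]+` rule in the evidence module. The snippet below is a minimal, dependency-free approximation of that flow. The names `toUnitKey`, `safeSegment`, and `evidencePath` are hypothetical stand-ins chosen for illustration; only the two regular expressions and the path prefix are taken from the diff.

```typescript
// Hypothetical helper mirroring the unit-key rule in the diff: every run of
// characters outside [a-zA-Z0-9] becomes one '-', then edge dashes are trimmed.
function toUnitKey(prefix: string, name: string): string {
  const slug = name.replace(/[^a-zA-Z0-9]+/g, '-').replace(/^-+|-+$/g, '');
  return `${prefix}-${slug}`;
}

// Hypothetical helper mirroring the path-segment rule: '.', '_' and '-' are
// kept, and a value that sanitizes to the empty string is rejected.
function safeSegment(value: string): string {
  const segment = value.replace(/[^a-zA-Z0-9._-]+/g, '-').replace(/^-+|-+$/g, '');
  if (!segment) {
    throw new Error(`Invalid path segment: ${value}`);
  }
  return segment;
}

// Assumed composition of the two rules into the on-disk evidence path.
function evidencePath(runId: string, unitKey: string): string {
  return `.ktx/ingest-evidence/historic-sql/${safeSegment(runId)}/${safeSegment(unitKey)}.json`;
}

const unitKey = toUnitKey('historic-sql-table', 'public.orders');
console.log(evidencePath('run-1', unitKey));
```

Because the segment rule keeps dots, a value such as `..` would pass through `safeSegment` unchanged in this sketch; it may be worth confirming that run IDs are constrained before they reach the path builder.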
@@ -0,0 +1,110 @@
import { mkdtemp } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it } from 'vitest';
import type { SqlAnalysisPort } from '../../../../context/sql-analysis/ports.js';
import type { SourceAdapter } from '../../types.js';
import { HistoricSqlSourceAdapter } from './historic-sql.adapter.js';
import type { HistoricSqlReader } from './types.js';

async function tempDir(): Promise<string> {
  return mkdtemp(join(tmpdir(), 'historic-sql-adapter-'));
}

const sqlAnalysis: SqlAnalysisPort = {
  async analyzeForFingerprint() {
    throw new Error('analyzeForFingerprint must not be used');
  },
  async analyzeBatch() {
    return new Map();
  },
  async validateReadOnly() {
    return { ok: true };
  },
};

const reader: HistoricSqlReader = {
  async probe() {
    return { warnings: [], info: [] };
  },
  async *fetchAggregated() {},
};

describe('HistoricSqlSourceAdapter', () => {
  it('declares canonical adapter metadata', () => {
    const adapter = new HistoricSqlSourceAdapter({ sqlAnalysis, reader, queryClient: {} });

    expect(adapter.source).toBe('historic-sql');
    expect(adapter.skillNames).toEqual(['historic_sql_table_digest', 'historic_sql_patterns']);
    expect(adapter.reconcileSkillNames).toEqual([]);
    expect((adapter as SourceAdapter).evidenceIndexing).toBeUndefined();
    expect(adapter.triageSupported).toBe(false);
  });

  it('fetches a unified aggregate snapshot and emits unified WorkUnits', async () => {
    const stagedDir = await tempDir();
    const aggregateReader: HistoricSqlReader = {
      async probe() {
        return { warnings: [], info: [] };
      },
      async *fetchAggregated() {
        yield {
          templateId: 'pg:1',
          canonicalSql:
            'select o.status, count(*) from public.orders o join public.customers c on c.id = o.customer_id group by o.status',
          dialect: 'postgres',
          stats: {
            executions: 25,
            distinctUsers: 3,
            firstSeen: '2026-05-01T00:00:00.000Z',
            lastSeen: '2026-05-11T00:00:00.000Z',
            p50RuntimeMs: 10,
            p95RuntimeMs: 20,
            errorRate: 0,
            rowsProduced: 10,
          },
          topUsers: [{ user: 'analyst', executions: 25 }],
        };
      },
    };
    const batchSqlAnalysis: SqlAnalysisPort = {
      async analyzeForFingerprint() {
        throw new Error('analyzeForFingerprint must not be used');
      },
      async analyzeBatch() {
        return new Map([
          [
            'pg:1',
            {
              tablesTouched: ['public.orders', 'public.customers'],
              columnsByClause: { select: ['status'], join: ['customer_id', 'id'], groupBy: ['status'] },
            },
          ],
        ]);
      },
      async validateReadOnly() {
        return { ok: true };
      },
    };
    const adapter = new HistoricSqlSourceAdapter({
      sqlAnalysis: batchSqlAnalysis,
      reader: aggregateReader,
      queryClient: {},
      now: () => new Date('2026-05-11T00:00:00.000Z'),
    });

    await adapter.fetch({ dialect: 'postgres', minExecutions: 5 }, stagedDir, {
      connectionId: 'warehouse',
      sourceKey: 'historic-sql',
    });

    await expect(adapter.detect(stagedDir)).resolves.toBe(true);
    await expect(adapter.chunk(stagedDir)).resolves.toMatchObject({
      workUnits: [
        { unitKey: 'historic-sql-table-public-customers' },
        { unitKey: 'historic-sql-table-public-orders' },
        { unitKey: 'historic-sql-patterns-part-0001' },
      ],
    });
  });
});
@@ -0,0 +1,65 @@
import type {
  ChunkResult,
  DeterministicFinalizationContext,
  DiffSet,
  FetchContext,
  FinalizationResult,
  ScopeDescriptor,
  SourceAdapter,
} from '../../types.js';
import { chunkHistoricSqlUnifiedStagedDir, describeHistoricSqlUnifiedScope } from './chunk-unified.js';
import { detectHistoricSqlStagedDir } from './detect.js';
import { projectHistoricSqlEvidence } from './projection.js';
import { stageHistoricSqlAggregatedSnapshot } from './stage-unified.js';
import { type HistoricSqlSourceAdapterDeps } from './types.js';

export class HistoricSqlSourceAdapter implements SourceAdapter {
  readonly source = 'historic-sql';
  readonly skillNames = ['historic_sql_table_digest', 'historic_sql_patterns'];
  readonly reconcileSkillNames: string[] = [];
  readonly triageSupported = false;

  constructor(private readonly deps: HistoricSqlSourceAdapterDeps) {}

  detect(stagedDir: string): Promise<boolean> {
    return detectHistoricSqlStagedDir(stagedDir);
  }

  async fetch(pullConfig: unknown, stagedDir: string, ctx: FetchContext): Promise<void> {
    await stageHistoricSqlAggregatedSnapshot({
      stagedDir,
      connectionId: ctx.connectionId,
      queryClient: this.deps.queryClient,
      reader: this.deps.reader,
      sqlAnalysis: this.deps.sqlAnalysis,
      pullConfig,
      now: this.deps.now?.(),
    });
  }

  chunk(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
    return chunkHistoricSqlUnifiedStagedDir(stagedDir, diffSet);
  }

  describeScope(stagedDir: string): Promise<ScopeDescriptor> {
    return describeHistoricSqlUnifiedScope(stagedDir);
  }

  async finalize(ctx: DeterministicFinalizationContext): Promise<FinalizationResult> {
    const projection = await projectHistoricSqlEvidence({
      workdir: ctx.workdir,
      connectionId: ctx.connectionId,
      syncId: ctx.syncId,
      runId: ctx.runId,
      overrideReplay: ctx.overrideReplay,
    });
    return {
      result: projection,
      warnings: projection.warnings,
      errors: [],
      touchedSources: projection.touchedSources,
      changedWikiPageKeys: projection.changedWikiPageKeys,
      actions: projection.actions,
    };
  }
}
@@ -0,0 +1,286 @@
import { mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import YAML from 'yaml';
import type { AgentRunnerPort, RunLoopParams } from '../../../../context/llm/runtime-port.js';
import { initKtxProject, loadKtxProject, type KtxLocalProject } from '../../../../context/project/project.js';
import type { SqlAnalysisBatchItem, SqlAnalysisBatchResult, SqlAnalysisDialect, SqlAnalysisPort } from '../../../../context/sql-analysis/ports.js';
import { searchLocalSlSources } from '../../../sl/local-sl.js';
import { searchLocalKnowledgePages } from '../../../wiki/local-knowledge.js';
import { runLocalIngest } from '../../local-ingest.js';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import { HistoricSqlSourceAdapter } from './historic-sql.adapter.js';
import type { AggregatedTemplate, HistoricSqlReader, HistoricSqlUnifiedPullConfig } from './types.js';

class AcceptanceHistoricSqlReader implements HistoricSqlReader {
  async probe() {
    return { warnings: [], info: [] };
  }

  async *fetchAggregated(
    _client: unknown,
    _window: { start: Date; end: Date },
    _config: HistoricSqlUnifiedPullConfig,
  ): AsyncIterable<AggregatedTemplate> {
    yield {
      templateId: 'pg:orders-lifecycle',
      canonicalSql:
        'select o.status, c.segment, count(*) from public.orders o join public.customers c on c.id = o.customer_id where o.status = $1 group by o.status, c.segment',
      dialect: 'postgres',
      stats: {
        executions: 42,
        distinctUsers: 4,
        firstSeen: '2026-05-01T00:00:00.000Z',
        lastSeen: '2026-05-11T00:00:00.000Z',
        p50RuntimeMs: 18,
        p95RuntimeMs: 84,
        errorRate: 0,
        rowsProduced: 420,
      },
      topUsers: [{ user: 'analyst@example.test', executions: 42 }],
    };
  }
}

class HistoricSqlAcceptanceAgentRunner implements AgentRunnerPort {
  runLoop = vi.fn(async (params: RunLoopParams) => {
    if (params.telemetryTags?.operationName !== 'ingest-bundle-wu') {
      return { stopReason: 'natural' as const };
    }

    const emitEvidence = params.toolSet.emit_historic_sql_evidence;
    if (!emitEvidence?.execute) {
      throw new Error('emit_historic_sql_evidence tool was not available to the historic-SQL WorkUnit');
    }

    if (params.telemetryTags.unitKey === 'historic-sql-table-public-orders') {
      const result = await emitEvidence.execute({
        kind: 'table_usage',
        table: 'public.orders',
        rawPath: 'tables/public.orders.json',
        usage: {
          narrative: 'Analysts repeatedly inspect paid order lifecycle by customer segment.',
          frequencyTier: 'high',
          commonFilters: ['status'],
          commonGroupBys: ['status', 'segment'],
          commonJoins: [{ table: 'public.customers', on: ['customer_id', 'id'] }],
          staleSince: null,
        },
      });
      if (!result.markdown.includes('Recorded historic-SQL table_usage evidence')) {
        throw new Error(`Unexpected orders evidence result: ${result.markdown}`);
      }
    }

    if (params.telemetryTags.unitKey === 'historic-sql-table-public-customers') {
      const result = await emitEvidence.execute({
        kind: 'table_usage',
        table: 'public.customers',
        rawPath: 'tables/public.customers.json',
        usage: {
          narrative: 'Customers provide segment context for paid order lifecycle analysis.',
          frequencyTier: 'mid',
          commonFilters: [],
          commonGroupBys: ['segment'],
          commonJoins: [{ table: 'public.orders', on: ['id', 'customer_id'] }],
          staleSince: null,
        },
      });
      if (!result.markdown.includes('Recorded historic-SQL table_usage evidence')) {
        throw new Error(`Unexpected customers evidence result: ${result.markdown}`);
      }
    }

    if (params.telemetryTags.unitKey === 'historic-sql-patterns-part-0001') {
      const result = await emitEvidence.execute({
        kind: 'pattern',
        rawPath: 'patterns-input/part-0001.json',
        pattern: {
          slug: 'paid-order-lifecycle',
          title: 'Paid Order Lifecycle',
          narrative: 'Analysts join orders and customers to compare paid order lifecycle by segment.',
          definitionSql:
            'select o.status, c.segment, count(*) from public.orders o join public.customers c on c.id = o.customer_id group by o.status, c.segment',
          tablesInvolved: ['public.orders', 'public.customers'],
          slRefs: ['orders', 'customers'],
          constituentTemplateIds: ['pg:orders-lifecycle'],
        },
      });
      if (!result.markdown.includes('Recorded historic-SQL pattern evidence')) {
        throw new Error(`Unexpected pattern evidence result: ${result.markdown}`);
      }
    }

    return { stopReason: 'natural' as const };
  });
}

function acceptanceSqlAnalysis(): SqlAnalysisPort {
  return {
    analyzeForFingerprint: async () => {
      throw new Error('analyzeForFingerprint should not be used by unified historic-SQL ingest');
    },
    analyzeBatch: vi.fn(
      async (
        items: SqlAnalysisBatchItem[],
        _dialect: SqlAnalysisDialect,
      ): Promise<Map<string, SqlAnalysisBatchResult>> => {
        return new Map(
          items.map((item) => [
            item.id,
            {
              tablesTouched: ['public.orders', 'public.customers'],
              columnsByClause: {
                select: ['status', 'segment'],
                where: ['status'],
                join: ['customer_id', 'id'],
                groupBy: ['status', 'segment'],
              },
            },
          ]),
        );
      },
    ),
    validateReadOnly: vi.fn(async () => ({ ok: true })),
  };
}

async function writeHistoricSqlProject(project: KtxLocalProject): Promise<KtxLocalProject> {
  await writeFile(
    join(project.projectDir, 'ktx.yaml'),
    [
      'connections:',
      ' warehouse:',
      ' driver: postgres',
      ' historicSql:',
      ' enabled: true',
      ' dialect: postgres',
      ' minExecutions: 2',
      'ingest:',
      ' adapters:',
      ' - historic-sql',
      ' embeddings:',
      ' backend: none',
      'storage:',
      ' state: sqlite',
      ' search: sqlite-fts5',
      ' git:',
      ' auto_commit: false',
      ' author: KTX Test <system@ktx.local>',
      '',
    ].join('\n'),
    'utf-8',
  );

  const loaded = await loadKtxProject({ projectDir: project.projectDir });
  await loaded.fileStore.writeFile(
    'semantic-layer/warehouse/_schema/public.yaml',
    YAML.stringify({
      tables: {
        orders: {
          table: 'public.orders',
          columns: [
            { name: 'id', type: 'string' },
            { name: 'status', type: 'string' },
            { name: 'customer_id', type: 'string' },
          ],
        },
        customers: {
          table: 'public.customers',
          columns: [
            { name: 'id', type: 'string' },
            { name: 'segment', type: 'string' },
          ],
        },
      },
    }),
    'KTX Test',
    'system@ktx.local',
    'Seed schema shard',
  );
  return loaded;
}

describe('historic-SQL local ingest retrieval acceptance', () => {
  let tempDir: string;

  beforeEach(async () => {
    tempDir = await mkdtemp(join(tmpdir(), 'ktx-historic-sql-acceptance-'));
  });

  afterEach(async () => {
    await rm(tempDir, { recursive: true, force: true });
  });

  it('projects table and pattern evidence into semantic-layer and wiki retrieval surfaces', async () => {
    const initialized = await initKtxProject({ projectDir: join(tempDir, 'project') });
    const project = await writeHistoricSqlProject(initialized);
    const sqlAnalysis = acceptanceSqlAnalysis();
    const agentRunner = new HistoricSqlAcceptanceAgentRunner();
    const adapter = new HistoricSqlSourceAdapter({
      reader: new AcceptanceHistoricSqlReader(),
      queryClient: {},
      sqlAnalysis,
      now: () => new Date('2026-05-11T00:00:00.000Z'),
    });

    const result = await runLocalIngest({
      project,
      adapters: [adapter],
      adapter: 'historic-sql',
      connectionId: 'warehouse',
      jobId: 'historic-sql-retrieval-acceptance',
      agentRunner,
    });

    expect(sqlAnalysis.analyzeBatch).toHaveBeenCalledTimes(1);
    expect(result.result.failedWorkUnits).toEqual([]);
    expect(result.result.workUnitCount).toBe(3);
    expect(agentRunner.runLoop).toHaveBeenCalledTimes(3);
    const finalization = result.report.body.finalization;
    expect(finalization).toBeDefined();
    if (!finalization) {
      throw new Error('Expected historic-SQL finalization result');
    }
    expect(finalization).toMatchObject({
      sourceKey: 'historic-sql',
      status: 'success',
      result: {
        tableUsageMerged: 2,
        patternPagesWritten: 1,
      },
    });
    expect(finalization.declaredTouchedSources).toEqual(
      expect.arrayContaining([
        { connectionId: 'warehouse', sourceName: 'customers' },
        { connectionId: 'warehouse', sourceName: 'orders' },
      ]),
    );

    await expect(readFile(join(project.projectDir, 'semantic-layer/warehouse/_schema/public.yaml'), 'utf-8')).resolves
      .toContain('Analysts repeatedly inspect paid order lifecycle by customer segment.');
    await expect(readFile(join(project.projectDir, 'wiki/global/historic-sql-paid-order-lifecycle.md'), 'utf-8'))
      .resolves.toContain('Paid Order Lifecycle');

    const reloaded = await loadKtxProject({ projectDir: project.projectDir });
    await expect(
      searchLocalSlSources(reloaded, { connectionId: 'warehouse', query: 'paid order lifecycle', limit: 5 }),
    ).resolves.toEqual(expect.arrayContaining([
      expect.objectContaining({
        name: 'orders',
        frequencyTier: 'high',
        snippet: expect.stringContaining('<mark>'),
        matchReasons: expect.arrayContaining(['lexical']),
      }),
    ]));
    await expect(
      searchLocalKnowledgePages(reloaded, { query: 'paid order lifecycle', userId: 'local', limit: 5 }),
    ).resolves.toEqual([
      expect.objectContaining({
        key: 'historic-sql-paid-order-lifecycle',
        summary: 'Paid Order Lifecycle',
        matchReasons: expect.arrayContaining(['lexical']),
      }),
    ]);
  });
});
@@ -0,0 +1,89 @@
import { describe, expect, it } from 'vitest';
import {
  HISTORIC_SQL_PATTERN_WORKUNIT_MAX_BYTES,
  isHistoricSqlPatternInputShardPath,
  serializedStagedPatternsInputByteLength,
  splitHistoricSqlPatternInputs,
} from './pattern-inputs.js';
import type { StagedPatternsInput } from './types.js';

type PatternTemplate = StagedPatternsInput['templates'][number];

function template(id: string, tablesTouched: string[], canonicalSql = 'select 1'): PatternTemplate {
  return {
    id,
    canonicalSql,
    tablesTouched,
    executionsBucket: '10-100',
    distinctUsersBucket: '2-5',
    dialect: 'postgres',
  };
}

describe('historic-SQL pattern input sharding', () => {
  it('keeps the audit input complete while sharding only cross-table pattern candidates', () => {
    const largeSql = `select * from public.orders join public.customers on true where marker = '${'x'.repeat(260)}'`;
    const input: StagedPatternsInput = {
      templates: [
        template('single-table-orders', ['public.orders']),
        template('orders-customers-2', ['public.orders', 'public.customers'], largeSql),
        template('orders-customers-1', ['public.customers', 'public.orders'], largeSql),
        template('orders-customers-payments', ['public.orders', 'public.customers', 'public.payments'], largeSql),
      ],
    };

    const result = splitHistoricSqlPatternInputs(input, { maxBytes: 760 });

    expect(result.auditInput.templates.map((entry) => entry.id)).toEqual([
      'orders-customers-1',
      'orders-customers-2',
      'orders-customers-payments',
      'single-table-orders',
    ]);
    expect(result.shards.length).toBeGreaterThan(1);
    expect(result.shards.map((shard) => shard.path)).toEqual([
      'patterns-input/part-0001.json',
      'patterns-input/part-0002.json',
      'patterns-input/part-0003.json',
    ]);
    expect(result.shards.flatMap((shard) => shard.input.templates.map((entry) => entry.id))).toEqual([
      'orders-customers-payments',
      'orders-customers-1',
      'orders-customers-2',
    ]);
    expect(result.shards.every((shard) => shard.byteLength <= 760)).toBe(true);
    expect(result.shards.flatMap((shard) => shard.input.templates).some((entry) => entry.id === 'single-table-orders')).toBe(false);
    expect(result.warnings).toEqual([]);
  });

  it('omits a single oversized template from shards and reports a manifest warning', () => {
    const input: StagedPatternsInput = {
      templates: [
        template(
          'oversized-cross-table',
          ['public.orders', 'public.customers'],
          `select * from public.orders join public.customers on true where payload = '${'x'.repeat(500)}'`,
        ),
      ],
    };

    const result = splitHistoricSqlPatternInputs(input, { maxBytes: 240 });

    expect(result.auditInput.templates.map((entry) => entry.id)).toEqual(['oversized-cross-table']);
    expect(result.shards).toEqual([]);
    expect(result.warnings).toEqual(['patterns_input_template_too_large:oversized-cross-table']);
  });

  it('recognizes only generated pattern shard paths', () => {
    expect(isHistoricSqlPatternInputShardPath('patterns-input/part-0001.json')).toBe(true);
    expect(isHistoricSqlPatternInputShardPath('patterns-input/part-0012.json')).toBe(true);
    expect(isHistoricSqlPatternInputShardPath('patterns-input.json')).toBe(false);
    expect(isHistoricSqlPatternInputShardPath('patterns-input/part-1.json')).toBe(false);
    expect(isHistoricSqlPatternInputShardPath('patterns-input/readme.md')).toBe(false);
  });

  it('uses a production byte budget below read_raw_file maximum size', () => {
    expect(HISTORIC_SQL_PATTERN_WORKUNIT_MAX_BYTES).toBeLessThan(120_000);
    expect(serializedStagedPatternsInputByteLength({ templates: [] })).toBeGreaterThan(0);
  });
});
|
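Reviewer sketch (not part of the commit): the tests above describe a greedy, byte-budgeted packing in which each shard is filled until the next template would push the serialized payload over `maxBytes`, and a template that cannot fit on its own is skipped with a warning. The snippet below is a simplified illustration of that idea under stated assumptions. `Item`, `Shard`, `packByByteBudget`, and the `too_large:` warning prefix are hypothetical names, `TextEncoder` is swapped in for `Buffer.byteLength` to keep the example dependency-free, and the candidate filtering and sorting done by the real module are deliberately left out.

```typescript
// Hypothetical, trimmed-down stand-in for a staged pattern template.
interface Item {
  id: string;
  sql: string;
}

interface Shard {
  path: string;
  items: Item[];
  byteLength: number;
}

// UTF-8 size of the pretty-printed JSON payload plus a trailing newline.
function serializedByteLength(items: Item[]): number {
  const text = `${JSON.stringify({ templates: items }, null, 2)}\n`;
  return new TextEncoder().encode(text).length;
}

// Greedy packing: keep appending to the current shard until the next item
// would exceed the budget, then start a new shard. Items that are too large
// even when alone are reported in `warnings` and never placed in a shard.
function packByByteBudget(items: Item[], maxBytes: number): { shards: Shard[]; warnings: string[] } {
  const shards: Shard[] = [];
  const warnings: string[] = [];
  let current: Item[] = [];

  const flush = (): void => {
    if (current.length === 0) return;
    shards.push({
      path: `patterns-input/part-${String(shards.length + 1).padStart(4, '0')}.json`,
      items: current,
      byteLength: serializedByteLength(current),
    });
    current = [];
  };

  for (const item of items) {
    if (serializedByteLength([item]) > maxBytes) {
      warnings.push(`too_large:${item.id}`);
      continue;
    }
    if (current.length > 0 && serializedByteLength([...current, item]) > maxBytes) {
      flush();
    }
    current.push(item);
  }
  flush(); // emit the final, partially filled shard
  return { shards, warnings };
}

const make = (id: string, size: number): Item => ({ id, sql: 'x'.repeat(size) });
const result = packByByteBudget([make('a', 100), make('huge', 1000), make('b', 100), make('c', 100)], 320);
console.log(result.shards.map((shard) => shard.path), result.warnings);
```

Measuring the serialized candidate shard on every step costs repeated `JSON.stringify` calls, but it guarantees that the recorded `byteLength` is the size of what would actually be written, which is the property the `byteLength <= maxBytes` assertion in the test relies on.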
@@ -0,0 +1,101 @@
|
|||
import { Buffer } from 'node:buffer';
|
||||
import type { StagedPatternsInput } from './types.js';
|
||||
|
||||
const HISTORIC_SQL_PATTERN_WORKUNIT_DIR = 'patterns-input';
|
||||
/** @internal */
|
||||
export const HISTORIC_SQL_PATTERN_WORKUNIT_MAX_BYTES = 110_000;
|
||||
const HISTORIC_SQL_PATTERN_WORKUNIT_PATH_RE = /^patterns-input\/part-\d{4}\.json$/;
|
||||
|
||||
type PatternTemplate = StagedPatternsInput['templates'][number];
|
||||
|
||||
interface HistoricSqlPatternInputShard {
|
||||
path: string;
|
||||
input: StagedPatternsInput;
|
||||
byteLength: number;
|
||||
}
|
||||
|
||||
export interface HistoricSqlPatternInputSplitResult {
|
||||
auditInput: StagedPatternsInput;
|
||||
shards: HistoricSqlPatternInputShard[];
|
||||
warnings: string[];
|
||||
}
|
||||
|
||||
export interface HistoricSqlPatternInputSplitOptions {
|
||||
maxBytes?: number;
|
||||
}
|
||||
|
||||
export function isHistoricSqlPatternInputShardPath(path: string): boolean {
  return HISTORIC_SQL_PATTERN_WORKUNIT_PATH_RE.test(path);
}

function serializeStagedPatternsInput(input: StagedPatternsInput): string {
  return `${JSON.stringify(input, null, 2)}\n`;
}

/** @internal */
export function serializedStagedPatternsInputByteLength(input: StagedPatternsInput): number {
  return Buffer.byteLength(serializeStagedPatternsInput(input), 'utf-8');
}

function sortedAuditTemplates(templates: readonly PatternTemplate[]): PatternTemplate[] {
  return [...templates].sort((left, right) => left.id.localeCompare(right.id));
}

function sortedPatternCandidates(templates: readonly PatternTemplate[]): PatternTemplate[] {
  return [...templates]
    .filter((template) => template.tablesTouched.length >= 2)
    .map((template) => ({ ...template, tablesTouched: [...template.tablesTouched].sort() }))
    .sort((left, right) => {
      const cardinality = right.tablesTouched.length - left.tablesTouched.length;
      if (cardinality !== 0) return cardinality;
      const tableSignature = left.tablesTouched.join('\0').localeCompare(right.tablesTouched.join('\0'));
      if (tableSignature !== 0) return tableSignature;
      return left.id.localeCompare(right.id);
    });
}

function shardPath(index: number): string {
  return `${HISTORIC_SQL_PATTERN_WORKUNIT_DIR}/part-${String(index).padStart(4, '0')}.json`;
}

export function splitHistoricSqlPatternInputs(
  input: StagedPatternsInput,
  options: HistoricSqlPatternInputSplitOptions = {},
): HistoricSqlPatternInputSplitResult {
  const maxBytes = options.maxBytes ?? HISTORIC_SQL_PATTERN_WORKUNIT_MAX_BYTES;
  const auditInput: StagedPatternsInput = { templates: sortedAuditTemplates(input.templates) };
  const warnings: string[] = [];
  const shards: HistoricSqlPatternInputShard[] = [];
  let current: PatternTemplate[] = [];

  const flush = () => {
    if (current.length === 0) {
      return;
    }
    const shardInput: StagedPatternsInput = { templates: current };
    shards.push({
      path: shardPath(shards.length + 1),
      input: shardInput,
      byteLength: serializedStagedPatternsInputByteLength(shardInput),
    });
    current = [];
  };

  for (const template of sortedPatternCandidates(input.templates)) {
    const singleInput: StagedPatternsInput = { templates: [template] };
    if (serializedStagedPatternsInputByteLength(singleInput) > maxBytes) {
      warnings.push(`patterns_input_template_too_large:${template.id}`);
      continue;
    }

    const nextInput: StagedPatternsInput = { templates: [...current, template] };
    if (current.length > 0 && serializedStagedPatternsInputByteLength(nextInput) > maxBytes) {
      flush();
    }
    current.push(template);
  }

  flush();

  return { auditInput, shards, warnings };
}
|
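Reviewer note: the splitter above packs templates greedily into shards whose serialized size stays under a byte budget, and skips any single template that cannot fit on its own. The following is a minimal standalone sketch of that packing loop, not the code from this commit. The `Template` shape, `packGreedily`, and `serializedByteLength` are hypothetical names, and `TextEncoder` stands in for the `Buffer.byteLength` call used in the real file.

```typescript
// Hypothetical, simplified template shape; the real type comes from ./types.js.
interface Template {
  id: string;
  tablesTouched: string[];
}

// Stand-in for Buffer.byteLength(serialized, 'utf-8') in the file above.
const encoder = new TextEncoder();

// Measures the pretty-printed JSON payload, including the trailing newline.
function serializedByteLength(templates: Template[]): number {
  return encoder.encode(`${JSON.stringify({ templates }, null, 2)}\n`).length;
}

interface PackResult {
  shards: Template[][];
  skippedIds: string[];
}

// Greedy packing: keep appending to the current shard until the next
// template would push the serialized shard over maxBytes, then start a new one.
function packGreedily(templates: Template[], maxBytes: number): PackResult {
  const shards: Template[][] = [];
  const skippedIds: string[] = [];
  let current: Template[] = [];

  for (const template of templates) {
    // A template that is too large on its own can never fit; record and skip it.
    if (serializedByteLength([template]) > maxBytes) {
      skippedIds.push(template.id);
      continue;
    }
    if (current.length > 0 && serializedByteLength([...current, template]) > maxBytes) {
      shards.push(current);
      current = [];
    }
    current.push(template);
  }
  if (current.length > 0) {
    shards.push(current);
  }
  return { shards, skippedIds };
}
```

Because every candidate shard is re-serialized before the size check, the budget is enforced on the exact bytes that would be written, at the cost of repeated serialization work per template.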
@@ -0,0 +1,242 @@
import { describe, expect, it, vi } from 'vitest';
import {
  HistoricSqlExtensionMissingError,
  HistoricSqlGrantsMissingError,
  HistoricSqlVersionUnsupportedError,
} from './errors.js';
import { PostgresPgssReader } from './postgres-pgss-reader.js';

interface FakeQueryResult {
  headers: string[];
  rows: unknown[][];
  totalRows?: number;
  error?: string;
}

function queryClient(results: Array<FakeQueryResult | Error>) {
  const executeQuery = vi.fn(async (_query: string, _params?: unknown[]) => {
    const next = results.shift();
    if (!next) {
      throw new Error('unexpected query');
    }
    if (next instanceof Error) {
      throw next;
    }
    return next;
  });
  return { executeQuery };
}

function executedSql(client: ReturnType<typeof queryClient>, index: number): string {
  const call = client.executeQuery.mock.calls[index];
  if (!call) {
    throw new Error(`expected query client call ${index}`);
  }
  return call[0];
}

describe('PostgresPgssReader aggregate path', () => {
  it('probes version, extension presence, grants, and tracking state', async () => {
    const client = queryClient([
      {
        headers: ['server_version_num', 'server_version'],
        rows: [[160004, 'PostgreSQL 16.4 on x86_64-apple-darwin']],
      },
      { headers: ['?column?'], rows: [[1]] },
      { headers: ['has_role'], rows: [[true]] },
      { headers: ['track'], rows: [['top']] },
      { headers: ['max'], rows: [['5000']] },
    ]);
    const reader = new PostgresPgssReader();

    await expect(reader.probe(client)).resolves.toEqual({
      pgServerVersion: 'PostgreSQL 16.4 on x86_64-apple-darwin',
      warnings: [],
      info: [],
    });

    expect(executedSql(client, 0)).toContain("current_setting('server_version_num')::int");
    expect(executedSql(client, 1)).toBe('SELECT 1 FROM pg_stat_statements LIMIT 1');
    expect(executedSql(client, 2)).toBe(
      "SELECT pg_has_role(current_user, 'pg_read_all_stats', 'USAGE') AS has_role",
    );
    expect(executedSql(client, 3)).toBe("SELECT current_setting('pg_stat_statements.track') AS track");
    expect(executedSql(client, 4)).toBe("SELECT current_setting('pg_stat_statements.max') AS max");
  });

  it('rejects PostgreSQL versions older than 14 without probing the extension', async () => {
    const client = queryClient([
      {
        headers: ['server_version_num', 'server_version'],
        rows: [[130012, 'PostgreSQL 13.12']],
      },
    ]);
    const reader = new PostgresPgssReader();

    const promise = reader.probe(client);
    await expect(promise).rejects.toMatchObject({
      name: 'HistoricSqlVersionUnsupportedError',
      dialect: 'postgres',
      detectedVersion: 'PostgreSQL 13.12',
      minimumVersion: 'PostgreSQL 14',
    });
    await expect(promise).rejects.toBeInstanceOf(HistoricSqlVersionUnsupportedError);
    expect(client.executeQuery).toHaveBeenCalledTimes(1);
  });

  it('maps a missing pg_stat_statements relation to HistoricSqlExtensionMissingError', async () => {
    const client = queryClient([
      {
        headers: ['server_version_num', 'server_version'],
        rows: [[160004, 'PostgreSQL 16.4']],
      },
      new Error('relation "pg_stat_statements" does not exist'),
    ]);
    const reader = new PostgresPgssReader();

    const promise = reader.probe(client);
    await expect(promise).rejects.toMatchObject({
      name: 'HistoricSqlExtensionMissingError',
      dialect: 'postgres',
    });
    await expect(promise).rejects.toBeInstanceOf(HistoricSqlExtensionMissingError);
  });

  it('maps pg_stat_statements preload failures to HistoricSqlExtensionMissingError with preload remediation', async () => {
    const client = queryClient([
      {
        headers: ['server_version_num', 'server_version'],
        rows: [[160004, 'PostgreSQL 16.4']],
      },
      new Error('pg_stat_statements must be loaded via shared_preload_libraries'),
    ]);
    const reader = new PostgresPgssReader();

    const promise = reader.probe(client);
    await expect(promise).rejects.toMatchObject({
      name: 'HistoricSqlExtensionMissingError',
      dialect: 'postgres',
      message: 'pg_stat_statements is installed but not loaded via shared_preload_libraries.',
      remediation: expect.stringContaining("shared_preload_libraries includes 'pg_stat_statements'"),
    });
    await expect(promise).rejects.toBeInstanceOf(HistoricSqlExtensionMissingError);
  });

  it('maps missing pg_read_all_stats membership to HistoricSqlGrantsMissingError', async () => {
    const client = queryClient([
      {
        headers: ['server_version_num', 'server_version'],
        rows: [[160004, 'PostgreSQL 16.4']],
      },
      { headers: ['?column?'], rows: [[1]] },
      { headers: ['has_role'], rows: [[false]] },
    ]);
    const reader = new PostgresPgssReader();

    const promise = reader.probe(client);
    await expect(promise).rejects.toMatchObject({
      name: 'HistoricSqlGrantsMissingError',
      dialect: 'postgres',
      remediation: 'GRANT pg_read_all_stats TO <connection role>;',
    });
    await expect(promise).rejects.toBeInstanceOf(HistoricSqlGrantsMissingError);
  });

  it('returns a warning instead of failing when pg_stat_statements.track is none', async () => {
    const client = queryClient([
      {
        headers: ['server_version_num', 'server_version'],
        rows: [[160004, 'PostgreSQL 16.4']],
      },
      { headers: ['?column?'], rows: [[1]] },
      { headers: ['has_role'], rows: [[true]] },
      { headers: ['track'], rows: [['none']] },
      { headers: ['max'], rows: [['5000']] },
    ]);
    const reader = new PostgresPgssReader();

    await expect(reader.probe(client)).resolves.toEqual({
      pgServerVersion: 'PostgreSQL 16.4',
      warnings: [
        "pg_stat_statements.track is none; set it to top or all in the Postgres parameter group or config",
      ],
      info: [],
    });
  });

  it('returns an info note when pg_stat_statements.max is below the recommended floor', async () => {
    const client = queryClient([
      {
        headers: ['server_version_num', 'server_version'],
        rows: [[160004, 'PostgreSQL 16.4']],
      },
      { headers: ['?column?'], rows: [[1]] },
      { headers: ['has_role'], rows: [[true]] },
      { headers: ['track'], rows: [['top']] },
      { headers: ['max'], rows: [['1000']] },
    ]);
    const reader = new PostgresPgssReader();

    await expect(reader.probe(client)).resolves.toEqual({
      pgServerVersion: 'PostgreSQL 16.4',
      warnings: [],
      info: [
        'pg_stat_statements.max is 1000; set it to at least 5000 to reduce query-template eviction churn',
      ],
    });
  });

  it('aggregates pg_stat_statements rows by queryid and query', async () => {
    const executeQuery = vi.fn(async (sql: string, params?: unknown[]) => {
      if (sql.includes('pg_stat_statements_info')) {
        return { headers: ['stats_reset', 'dealloc'], rows: [['2026-05-01T00:00:00.000Z', 1]] };
      }
      expect(sql).toContain('GROUP BY queryid, query');
      expect(sql).toContain('HAVING SUM(calls) >= $1');
      expect(params).toEqual([5]);
      return {
        headers: ['template_id', 'canonical_sql', 'executions', 'distinct_users', 'mean_ms', 'rows_produced', 'top_users'],
        rows: [
          [
            '123',
            'select status from public.orders',
            '42',
            '3',
            '11.5',
            '100',
            JSON.stringify([{ user: 'analyst', executions: 40 }]),
          ],
        ],
      };
    });

    const reader = new PostgresPgssReader();
    const rows = [];
    for await (const row of reader.fetchAggregated(
      { executeQuery },
      { start: new Date('2026-02-10T00:00:00.000Z'), end: new Date('2026-05-11T00:00:00.000Z') },
      { dialect: 'postgres', minExecutions: 5, enabledTables: [], filters: { dropTrivialProbes: true }, redactionPatterns: [], staleArchiveAfterDays: 90 },
    )) {
      rows.push(row);
    }

    expect(rows).toEqual([
      {
        templateId: '123',
        canonicalSql: 'select status from public.orders',
        dialect: 'postgres',
        stats: {
          executions: 42,
          distinctUsers: 3,
          firstSeen: '2026-05-01T00:00:00.000Z',
          lastSeen: '2026-05-11T00:00:00.000Z',
          p50RuntimeMs: 11.5,
          p95RuntimeMs: 11.5,
          errorRate: 0,
          rowsProduced: 100,
        },
        topUsers: [{ user: 'analyst', executions: 40 }],
      },
    ]);
  });
});
|
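Reviewer note: the version-gate test above feeds `server_version_num` values such as `130012` and `160004`. As a small illustrative sketch (not code from this commit), the helper below shows how that integer decodes for PostgreSQL 10 and later, where the value is `major * 10000 + minor`. The names `decodeServerVersionNum`, `isSupportedServerVersion`, and `MIN_SUPPORTED_VERSION_NUM` are hypothetical; only the `140000` threshold is taken from the reader in this commit.

```typescript
// Hypothetical constant name; the reader in this commit compares against 140000 inline.
const MIN_SUPPORTED_VERSION_NUM = 140000;

interface DecodedVersion {
  major: number;
  minor: number;
}

// Valid for PostgreSQL 10+, where server_version_num = major * 10000 + minor.
// Older releases used a three-part scheme and are rejected here for simplicity.
function decodeServerVersionNum(versionNum: number): DecodedVersion {
  if (!Number.isInteger(versionNum) || versionNum < 100000) {
    throw new Error(`unsupported server_version_num: ${versionNum}`);
  }
  return { major: Math.floor(versionNum / 10000), minor: versionNum % 10000 };
}

// Mirrors the gate exercised by the test: anything below PostgreSQL 14 is refused.
function isSupportedServerVersion(versionNum: number): boolean {
  return versionNum >= MIN_SUPPORTED_VERSION_NUM;
}
```

Comparing the raw integer avoids parsing the free-form `version()` string, which varies by platform and build.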
@@ -0,0 +1,293 @@
import {
  HistoricSqlExtensionMissingError,
  HistoricSqlGrantsMissingError,
  HistoricSqlVersionUnsupportedError,
} from './errors.js';
import {
  aggregatedTemplateSchema,
  type AggregatedTemplate,
  type HistoricSqlTimeWindow,
  type HistoricSqlUnifiedPullConfig,
  type KtxPostgresQueryClient,
  type PostgresPgssProbeResult,
} from './types.js';

interface QueryResultLike {
  headers: string[];
  rows: unknown[][];
  totalRows?: number;
  error?: string;
}

const STATS_INFO_SQL = 'SELECT stats_reset, dealloc FROM pg_stat_statements_info';
const VERSION_SQL = `
  SELECT current_setting('server_version_num')::int AS server_version_num,
         version() AS server_version
`.trim();
const EXTENSION_PROBE_SQL = 'SELECT 1 FROM pg_stat_statements LIMIT 1';
const GRANTS_PROBE_SQL = "SELECT pg_has_role(current_user, 'pg_read_all_stats', 'USAGE') AS has_role";
const TRACKING_PROBE_SQL = "SELECT current_setting('pg_stat_statements.track') AS track";
const MAX_SETTING_PROBE_SQL = "SELECT current_setting('pg_stat_statements.max') AS max";
const RECOMMENDED_PGSS_MAX = 5000;

const AGGREGATE_SQL = `
  SELECT queryid::text AS template_id,
         query AS canonical_sql,
         SUM(calls)::bigint AS executions,
         COUNT(DISTINCT userid) AS distinct_users,
         SUM(total_exec_time) / NULLIF(SUM(calls), 0) AS mean_ms,
         SUM(rows)::bigint AS rows_produced,
         COALESCE(
           json_agg(json_build_object('user', rolname, 'executions', calls) ORDER BY calls DESC)
             FILTER (WHERE userid IS NOT NULL),
           '[]'::json
         )::text AS top_users
  FROM pg_stat_statements
  LEFT JOIN pg_roles ON pg_roles.oid = pg_stat_statements.userid
  WHERE toplevel = true
  GROUP BY queryid, query
  HAVING SUM(calls) >= $1
  ORDER BY SUM(total_exec_time) DESC
`.trim();

const POSTGRES_EXTENSION_REMEDIATION = [
  'Run CREATE EXTENSION pg_stat_statements; against the connection database.',
  "Ensure shared_preload_libraries includes 'pg_stat_statements' in the Postgres parameter group or config.",
].join(' ');

const POSTGRES_GRANTS_REMEDIATION = 'GRANT pg_read_all_stats TO <connection role>;';

function queryClient(client: unknown): KtxPostgresQueryClient {
  if (
    client &&
    typeof client === 'object' &&
    'executeQuery' in client &&
    typeof (client as { executeQuery?: unknown }).executeQuery === 'function'
  ) {
    return client as KtxPostgresQueryClient;
  }
  throw new Error('Historic SQL Postgres PGSS reader requires a query client with executeQuery(sql, params?)');
}

async function execute(client: KtxPostgresQueryClient, sql: string, params?: unknown[]): Promise<QueryResultLike> {
  const result = await client.executeQuery(sql, params);
  if ('error' in result && typeof result.error === 'string' && result.error.length > 0) {
    throw new Error(result.error);
  }
  return result;
}

function indexByHeader(headers: string[]): Map<string, number> {
  const out = new Map<string, number>();
  headers.forEach((header, index) => out.set(header.toLowerCase(), index));
  return out;
}

function value(row: unknown[], headerIndexes: Map<string, number>, header: string): unknown {
  const index = headerIndexes.get(header.toLowerCase());
  return index === undefined ? null : row[index];
}

function nullableString(raw: unknown): string | null {
  if (raw === null || raw === undefined) {
    return null;
  }
  const text = String(raw);
  return text.length > 0 ? text : null;
}

function requiredString(raw: unknown, field: string): string {
  const text = nullableString(raw);
  if (!text) {
    throw new Error(`Postgres pg_stat_statements row is missing ${field}`);
  }
  return text;
}

function requiredFiniteNumber(raw: unknown, field: string): number {
  const number = typeof raw === 'number' ? raw : Number(raw);
  if (!Number.isFinite(number)) {
    throw new Error(`Postgres pg_stat_statements row has invalid ${field}: ${String(raw)}`);
  }
  return number;
}

function requiredInteger(raw: unknown, field: string): number {
  return Math.trunc(requiredFiniteNumber(raw, field));
}

function nullableNumber(raw: unknown): number | null {
  if (raw === null || raw === undefined || raw === '') {
    return null;
  }
  const number = typeof raw === 'number' ? raw : Number(raw);
  return Number.isFinite(number) ? number : null;
}

function nullableInteger(raw: unknown): number | null {
  const number = nullableNumber(raw);
  return number === null ? null : Math.trunc(number);
}

function nullableIsoTimestamp(raw: unknown): string | null {
  if (raw === null || raw === undefined || raw === '') {
    return null;
  }
  if (raw instanceof Date) {
    return raw.toISOString();
  }
  const date = new Date(String(raw));
  return Number.isNaN(date.getTime()) ? null : date.toISOString();
}

function firstRow(result: QueryResultLike, context: string): { row: unknown[]; headers: Map<string, number> } {
  const row = result.rows[0];
  if (!row) {
    throw new Error(`Postgres historic-SQL ${context} query returned no rows`);
  }
  return { row, headers: indexByHeader(result.headers) };
}

function isMissingPgssRelation(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return /relation ["']?pg_stat_statements["']? does not exist/i.test(message);
}

function isPgssPreloadRequired(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return /pg_stat_statements.*shared_preload_libraries/i.test(message);
}

function extensionMissingError(cause: unknown, message?: string): HistoricSqlExtensionMissingError {
  return new HistoricSqlExtensionMissingError({
    dialect: 'postgres',
    message: message ?? 'pg_stat_statements extension is not installed in the connection database.',
    remediation: POSTGRES_EXTENSION_REMEDIATION,
    cause,
  });
}

function grantsMissingError(): HistoricSqlGrantsMissingError {
  return new HistoricSqlGrantsMissingError({
    dialect: 'postgres',
    message: 'Postgres connection role lacks pg_read_all_stats for historic-SQL ingest.',
    remediation: POSTGRES_GRANTS_REMEDIATION,
  });
}

function parseTopUsers(raw: unknown): Array<{ user: string | null; executions: number }> {
  const text = nullableString(raw);
  if (!text) {
    return [];
  }
  try {
    const parsed = JSON.parse(text) as unknown;
    if (!Array.isArray(parsed)) {
      return [];
    }
    return parsed.flatMap((entry) => {
      if (!entry || typeof entry !== 'object') {
        return [];
      }
      const user = nullableString((entry as { user?: unknown }).user);
      const executions = nullableInteger((entry as { executions?: unknown }).executions);
      return executions === null ? [] : [{ user, executions }];
    });
  } catch {
    return [];
  }
}

export class PostgresPgssReader {
  async probe(client: unknown): Promise<PostgresPgssProbeResult> {
    const pgClient = queryClient(client);
    const versionResult = await execute(pgClient, VERSION_SQL);
    const { row: versionRow, headers: versionHeaders } = firstRow(versionResult, 'version probe');
    const serverVersionNum = requiredFiniteNumber(
      value(versionRow, versionHeaders, 'server_version_num'),
      'server_version_num',
    );
    const pgServerVersion = requiredString(value(versionRow, versionHeaders, 'server_version'), 'server_version');

    if (serverVersionNum < 140000) {
      throw new HistoricSqlVersionUnsupportedError({
        dialect: 'postgres',
        detectedVersion: pgServerVersion,
        minimumVersion: 'PostgreSQL 14',
      });
    }

    try {
      await execute(pgClient, EXTENSION_PROBE_SQL);
    } catch (error) {
      if (isMissingPgssRelation(error)) {
        throw extensionMissingError(error);
      }
      if (isPgssPreloadRequired(error)) {
        throw extensionMissingError(
          error,
          'pg_stat_statements is installed but not loaded via shared_preload_libraries.',
        );
      }
      throw error;
    }

    const grantsResult = await execute(pgClient, GRANTS_PROBE_SQL);
    const { row: grantsRow, headers: grantsHeaders } = firstRow(grantsResult, 'grant probe');
    if (value(grantsRow, grantsHeaders, 'has_role') !== true) {
      throw grantsMissingError();
    }

    const trackingResult = await execute(pgClient, TRACKING_PROBE_SQL);
    const { row: trackingRow, headers: trackingHeaders } = firstRow(trackingResult, 'tracking probe');
    const track = nullableString(value(trackingRow, trackingHeaders, 'track'));

    const maxResult = await execute(pgClient, MAX_SETTING_PROBE_SQL);
    const { row: maxRow, headers: maxHeaders } = firstRow(maxResult, 'max-setting probe');
    const pgssMax = nullableInteger(value(maxRow, maxHeaders, 'max'));

    const warnings: string[] = [];
    const info: string[] = [];
    if (track === 'none') {
      warnings.push('pg_stat_statements.track is none; set it to top or all in the Postgres parameter group or config');
    }
    if (pgssMax !== null && pgssMax < RECOMMENDED_PGSS_MAX) {
      info.push(
        `pg_stat_statements.max is ${pgssMax}; set it to at least ${RECOMMENDED_PGSS_MAX} to reduce query-template eviction churn`,
      );
    }

    return { pgServerVersion, warnings, info };
  }

  async *fetchAggregated(
    client: unknown,
    window: HistoricSqlTimeWindow,
    config: HistoricSqlUnifiedPullConfig,
  ): AsyncIterable<AggregatedTemplate> {
    const pgClient = queryClient(client);
    const statsResult = await execute(pgClient, STATS_INFO_SQL);
    const { row: statsRow, headers: statsHeaders } = firstRow(statsResult, 'stats-info');
    const firstSeen = nullableIsoTimestamp(value(statsRow, statsHeaders, 'stats_reset')) ?? window.start.toISOString();
    const result = await execute(pgClient, AGGREGATE_SQL, [config.minExecutions]);
    const indexes = indexByHeader(result.headers);
    for (const row of result.rows) {
      yield aggregatedTemplateSchema.parse({
        templateId: requiredString(value(row, indexes, 'template_id'), 'template_id'),
        canonicalSql: requiredString(value(row, indexes, 'canonical_sql'), 'canonical_sql'),
        dialect: 'postgres',
        stats: {
          executions: requiredInteger(value(row, indexes, 'executions'), 'executions'),
          distinctUsers: requiredInteger(value(row, indexes, 'distinct_users'), 'distinct_users'),
          firstSeen,
          lastSeen: window.end.toISOString(),
          p50RuntimeMs: nullableNumber(value(row, indexes, 'mean_ms')),
          p95RuntimeMs: nullableNumber(value(row, indexes, 'mean_ms')),
          errorRate: 0,
          rowsProduced: nullableInteger(value(row, indexes, 'rows_produced')),
        },
        topUsers: parseTopUsers(value(row, indexes, 'top_users')),
      });
    }
  }
}
|
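Reviewer note: the reader above never relies on column order. It builds a case-insensitive header index and coerces each cell, because the test fixtures in this commit return numeric columns as strings (for example `'42'` and `'11.5'`). A minimal standalone sketch of that access pattern follows; `indexHeaders`, `cell`, and `toFiniteNumber` are hypothetical names that only approximate the private helpers in the file.

```typescript
type Row = unknown[];

// Maps lower-cased header names to their column positions.
function indexHeaders(headers: string[]): Map<string, number> {
  const out = new Map<string, number>();
  headers.forEach((header, index) => out.set(header.toLowerCase(), index));
  return out;
}

// Looks a cell up by header name; unknown headers yield null instead of throwing.
function cell(row: Row, indexes: Map<string, number>, header: string): unknown {
  const index = indexes.get(header.toLowerCase());
  return index === undefined ? null : row[index];
}

// Accepts numbers or numeric strings; anything else (including '') becomes null.
function toFiniteNumber(raw: unknown): number | null {
  if (raw === null || raw === undefined || raw === '') {
    return null;
  }
  const parsed = typeof raw === 'number' ? raw : Number(raw);
  return Number.isFinite(parsed) ? parsed : null;
}

// Usage: headers may arrive in any case and numerics may arrive as strings.
const exampleIndexes = indexHeaders(['Template_ID', 'executions', 'mean_ms']);
const exampleRow: Row = ['123', '42', '11.5'];
const exampleExecutions = toFiniteNumber(cell(exampleRow, exampleIndexes, 'EXECUTIONS'));
```

Returning `null` for a missing header pushes the decision about whether a field is required to the caller, which is how the reader distinguishes its `required*` and `nullable*` helpers.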
@@ -0,0 +1,457 @@
import { mkdir, mkdtemp, readFile, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import YAML from 'yaml';
import { describe, expect, it } from 'vitest';
import { projectHistoricSqlEvidence } from './projection.js';

async function tempWorkdir(): Promise<string> {
  return mkdtemp(join(tmpdir(), 'historic-sql-projection-'));
}

async function writeText(root: string, relPath: string, content: string): Promise<void> {
  const target = join(root, relPath);
  await mkdir(join(target, '..'), { recursive: true });
  await writeFile(target, content, 'utf-8');
}

async function writeJson(root: string, relPath: string, value: unknown): Promise<void> {
  await writeText(root, relPath, `${JSON.stringify(value, null, 2)}\n`);
}

describe('projectHistoricSqlEvidence', () => {
  it('merges table usage into matching _schema shards and preserves external usage keys', async () => {
    const workdir = await tempWorkdir();
    await writeText(
      workdir,
      'semantic-layer/warehouse/_schema/public.yaml',
      YAML.stringify({
        tables: {
          orders: {
            table: 'public.orders',
            usage: {
              narrative: 'Old generated usage.',
              frequencyTier: 'low',
              commonFilters: ['old_status'],
              commonJoins: [],
              ownerNote: 'keep me',
            },
            columns: [{ name: 'id', type: 'string' }],
          },
        },
      }),
    );
    await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/manifest.json', {
      source: 'historic-sql',
      connectionId: 'warehouse',
      dialect: 'postgres',
      fetchedAt: '2026-05-11T00:00:00.000Z',
      windowStart: '2026-02-10T00:00:00.000Z',
      windowEnd: '2026-05-11T00:00:00.000Z',
      snapshotRowCount: 1,
      touchedTableCount: 1,
      parseFailures: 0,
      warnings: [],
      probeWarnings: [],
      staleArchiveAfterDays: 90,
    });
    await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/tables/public.orders.json', { table: 'public.orders' });
    await writeJson(workdir, '.ktx/ingest-evidence/historic-sql/run-1/orders.json', {
      kind: 'table_usage',
      connectionId: 'warehouse',
      table: 'public.orders',
      rawPath: 'tables/public.orders.json',
      usage: {
        narrative: 'Orders are repeatedly queried for lifecycle analysis.',
        frequencyTier: 'high',
        commonFilters: ['status', 'created_at'],
        commonGroupBys: ['status'],
        commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
        staleSince: null,
      },
    });

    const result = await projectHistoricSqlEvidence({ workdir, connectionId: 'warehouse', syncId: 'sync-1', runId: 'run-1' });

    expect(result.touchedSources).toEqual([{ connectionId: 'warehouse', sourceName: 'orders' }]);
    expect(result.actions).toEqual(
      expect.arrayContaining([
        expect.objectContaining({
          target: 'sl',
          key: 'orders',
          rawPaths: ['tables/public.orders.json'],
        }),
      ]),
    );
    const shard = YAML.parse(await readFile(join(workdir, 'semantic-layer/warehouse/_schema/public.yaml'), 'utf-8'));
    expect(shard.tables.orders.usage).toEqual({
      ownerNote: 'keep me',
      narrative: 'Orders are repeatedly queried for lifecycle analysis.',
      frequencyTier: 'high',
      commonFilters: ['status', 'created_at'],
      commonGroupBys: ['status'],
      commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
      staleSince: null,
    });
  });

  it('writes pattern pages, reuses similar slugs, and marks missing old pattern pages stale', async () => {
    const workdir = await tempWorkdir();
    await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/manifest.json', {
      source: 'historic-sql',
      connectionId: 'warehouse',
      dialect: 'postgres',
      fetchedAt: '2026-05-11T00:00:00.000Z',
      windowStart: '2026-02-10T00:00:00.000Z',
      windowEnd: '2026-05-11T00:00:00.000Z',
      snapshotRowCount: 2,
      touchedTableCount: 2,
      parseFailures: 0,
      warnings: [],
      probeWarnings: [],
      staleArchiveAfterDays: 90,
    });
    await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/tables/public.orders.json', { table: 'public.orders' });
    await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/tables/public.customers.json', { table: 'public.customers' });
    await writeText(
      workdir,
      'wiki/global/historic-sql-old-order-lifecycle.md',
      [
        '---',
        YAML.stringify({
          summary: 'Old order lifecycle page',
          tags: ['historic-sql', 'pattern'],
          refs: [],
          sl_refs: ['orders'],
          usage_mode: 'auto',
          source: 'historic-sql',
          tables: ['public.orders', 'public.customers'],
          fingerprints: ['pg:1'],
        }).trimEnd(),
        '---',
        '',
        'Old body',
        '',
      ].join('\n'),
    );
    await writeText(
      workdir,
      'wiki/global/historic-sql-retired-pattern.md',
      [
        '---',
        YAML.stringify({
          summary: 'Retired pattern',
          tags: ['historic-sql', 'pattern'],
          refs: [],
          sl_refs: [],
          usage_mode: 'auto',
          source: 'historic-sql',
          tables: ['public.tickets'],
          fingerprints: ['pg:9'],
        }).trimEnd(),
        '---',
        '',
        'Retired body',
        '',
      ].join('\n'),
    );
    await writeJson(workdir, '.ktx/ingest-evidence/historic-sql/run-1/pattern.json', {
      kind: 'pattern',
      connectionId: 'warehouse',
      rawPath: 'patterns-input.json',
      pattern: {
        slug: 'order-lifecycle-analysis',
        title: 'Order Lifecycle Analysis',
        narrative: 'Analysts compare order status with customer segment.',
        definitionSql: 'select * from public.orders join public.customers on customers.id = orders.customer_id',
        tablesInvolved: ['public.orders', 'public.customers'],
        slRefs: ['orders', 'customers'],
        constituentTemplateIds: ['pg:1', 'pg:2'],
      },
    });

    const result = await projectHistoricSqlEvidence({ workdir, connectionId: 'warehouse', syncId: 'sync-1', runId: 'run-1' });

    expect(result.patternPagesWritten).toBe(1);
    expect(result.changedWikiPageKeys).toContain('historic-sql-old-order-lifecycle');
    expect(result.actions).toEqual(
      expect.arrayContaining([
        expect.objectContaining({
          target: 'wiki',
          key: 'historic-sql-old-order-lifecycle',
          rawPaths: ['patterns-input.json'],
        }),
      ]),
    );
    await expect(readFile(join(workdir, 'wiki/global/historic-sql-old-order-lifecycle.md'), 'utf-8')).resolves.toContain(
      'Order Lifecycle Analysis',
    );
    await expect(readFile(join(workdir, 'wiki/global/historic-sql-retired-pattern.md'), 'utf-8')).resolves.toContain(
      'stale_since: "2026-05-11T00:00:00.000Z"',
    );
  });

  it('rewrites a reappearing archived pattern at the flat slug', async () => {
    const workdir = await tempWorkdir();
    await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/manifest.json', {
      source: 'historic-sql',
      connectionId: 'warehouse',
      dialect: 'postgres',
      fetchedAt: '2026-05-11T00:00:00.000Z',
      windowStart: '2026-02-10T00:00:00.000Z',
      windowEnd: '2026-05-11T00:00:00.000Z',
      snapshotRowCount: 2,
      touchedTableCount: 2,
      parseFailures: 0,
      warnings: [],
      probeWarnings: [],
      staleArchiveAfterDays: 30,
    });
    await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/tables/public.orders.json', { table: 'public.orders' });
    await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/tables/public.customers.json', { table: 'public.customers' });
    await writeText(
      workdir,
      'wiki/global/historic-sql-order-lifecycle-analysis.md',
      [
        '---',
        YAML.stringify({
          summary: 'Archived order lifecycle page',
          tags: ['historic-sql', 'pattern', 'archived'],
          refs: [],
          sl_refs: ['orders'],
          usage_mode: 'auto',
          source: 'historic-sql',
          tables: ['public.orders', 'public.customers'],
          fingerprints: ['pg:1'],
          stale_since: '2026-01-01T00:00:00.000Z',
        }).trimEnd(),
        '---',
        '',
        'Archived body',
        '',
      ].join('\n'),
    );
    await writeJson(workdir, '.ktx/ingest-evidence/historic-sql/run-1/pattern.json', {
      kind: 'pattern',
      connectionId: 'warehouse',
      rawPath: 'patterns-input.json',
      pattern: {
        slug: 'order-lifecycle-analysis',
        title: 'Order Lifecycle Analysis',
        narrative: 'Analysts compare order status with customer segment again.',
        definitionSql: 'select * from public.orders join public.customers on customers.id = orders.customer_id',
        tablesInvolved: ['public.orders', 'public.customers'],
        slRefs: ['orders', 'customers'],
        constituentTemplateIds: ['pg:1', 'pg:2'],
      },
    });

    const result = await projectHistoricSqlEvidence({ workdir, connectionId: 'warehouse', syncId: 'sync-1', runId: 'run-1' });

    expect(result.patternPagesWritten).toBe(1);
    const page = await readFile(join(workdir, 'wiki/global/historic-sql-order-lifecycle-analysis.md'), 'utf-8');
    expect(page).toContain('Analysts compare order status with customer segment again.');
    expect(page).not.toContain('Archived body');
    expect(page).not.toContain('archived');
  });

it('leaves already archived pattern pages stable when they are still absent', async () => {
|
||||
const workdir = await tempWorkdir();
|
||||
await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/manifest.json', {
|
||||
source: 'historic-sql',
|
||||
connectionId: 'warehouse',
|
||||
dialect: 'postgres',
|
||||
fetchedAt: '2026-05-11T00:00:00.000Z',
|
||||
windowStart: '2026-02-10T00:00:00.000Z',
|
||||
windowEnd: '2026-05-11T00:00:00.000Z',
|
||||
snapshotRowCount: 0,
|
||||
touchedTableCount: 0,
|
||||
parseFailures: 0,
|
||||
warnings: [],
|
||||
probeWarnings: [],
|
||||
staleArchiveAfterDays: 30,
|
||||
});
|
||||
await writeText(
|
||||
workdir,
|
||||
'wiki/global/historic-sql-retired-pattern.md',
|
||||
[
|
||||
'---',
|
||||
YAML.stringify({
|
||||
summary: 'Retired pattern',
|
||||
tags: ['historic-sql', 'pattern', 'archived'],
|
||||
refs: [],
|
||||
sl_refs: [],
|
||||
usage_mode: 'auto',
|
||||
source: 'historic-sql',
|
||||
tables: ['public.tickets'],
|
||||
fingerprints: ['pg:9'],
|
||||
stale_since: '2026-01-01T00:00:00.000Z',
|
||||
}).trimEnd(),
|
||||
'---',
|
||||
'',
|
||||
'Archived retired body',
|
||||
'',
|
||||
].join('\n'),
|
||||
);
|
||||
|
||||
const result = await projectHistoricSqlEvidence({ workdir, connectionId: 'warehouse', syncId: 'sync-1', runId: 'run-1' });
|
||||
|
||||
expect(result.archivedPatternPages).toBe(0);
|
||||
expect(result.stalePatternPagesMarked).toBe(0);
|
||||
await expect(readFile(join(workdir, 'wiki/global/historic-sql-retired-pattern.md'), 'utf-8')).resolves.toContain(
|
||||
'Archived retired body',
|
||||
);
|
||||
});
|
||||
|
||||
it('marks missing table usage stale without deleting old query pages', async () => {
|
||||
const workdir = await tempWorkdir();
|
||||
await writeText(
|
||||
workdir,
|
||||
'semantic-layer/warehouse/_schema/public.yaml',
|
||||
YAML.stringify({
|
||||
tables: {
|
||||
orders: {
|
||||
table: 'public.orders',
|
||||
usage: {
|
||||
narrative: 'Orders were active before.',
|
||||
frequencyTier: 'high',
|
||||
commonFilters: ['status'],
|
||||
commonGroupBys: ['status'],
|
||||
commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
|
||||
ownerNote: 'keep analyst annotation',
|
||||
},
|
||||
columns: [{ name: 'id', type: 'string' }],
|
||||
},
|
||||
},
|
||||
}),
|
||||
);
|
||||
await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/manifest.json', {
|
||||
source: 'historic-sql',
|
||||
connectionId: 'warehouse',
|
||||
dialect: 'postgres',
|
||||
fetchedAt: '2026-05-11T00:00:00.000Z',
|
||||
windowStart: '2026-02-10T00:00:00.000Z',
|
||||
windowEnd: '2026-05-11T00:00:00.000Z',
|
||||
snapshotRowCount: 0,
|
||||
touchedTableCount: 0,
|
||||
parseFailures: 0,
|
||||
warnings: [],
|
||||
probeWarnings: [],
|
||||
staleArchiveAfterDays: 90,
|
||||
});
|
||||
await writeJson(workdir, '.ktx/ingest-evidence/historic-sql/run-1/customers.json', {
|
||||
kind: 'table_usage',
|
||||
connectionId: 'warehouse',
|
||||
table: 'public.customers',
|
||||
rawPath: 'tables/public.customers.json',
|
||||
usage: {
|
||||
narrative: 'Customers were queried.',
|
||||
frequencyTier: 'low',
|
||||
commonFilters: [],
|
||||
commonJoins: [],
|
||||
staleSince: null,
|
||||
},
|
||||
});
|
||||
await writeText(
|
||||
workdir,
|
||||
'wiki/global/historic-sql-old-template.md',
|
||||
[
|
||||
'---',
|
||||
YAML.stringify({
|
||||
summary: 'Old template page',
|
||||
tags: ['historic-sql', 'query-pattern'],
|
||||
refs: [],
|
||||
sl_refs: ['orders'],
|
||||
usage_mode: 'auto',
|
||||
source: 'historic-sql',
|
||||
tables: ['public.orders'],
|
||||
fingerprints: ['old:1'],
|
||||
}).trimEnd(),
|
||||
'---',
|
||||
'',
|
||||
'Old body',
|
||||
'',
|
||||
].join('\n'),
|
||||
);
|
||||
|
||||
const result = await projectHistoricSqlEvidence({ workdir, connectionId: 'warehouse', syncId: 'sync-1', runId: 'run-1' });
|
||||
|
||||
expect(result.staleTablesMarked).toBe(1);
|
||||
expect(result.touchedSources).toEqual([{ connectionId: 'warehouse', sourceName: 'orders' }]);
|
||||
const staleAction = result.actions.find((action) => action.target === 'sl' && action.key === 'orders');
|
||||
expect(staleAction).toEqual(expect.objectContaining({ target: 'sl', key: 'orders' }));
|
||||
expect(staleAction?.rawPaths).toBeUndefined();
|
||||
const shard = YAML.parse(await readFile(join(workdir, 'semantic-layer/warehouse/_schema/public.yaml'), 'utf-8'));
|
||||
expect(shard.tables.orders.usage).toEqual({
|
||||
ownerNote: 'keep analyst annotation',
|
||||
narrative: 'No recent historic SQL usage was observed in the latest snapshot.',
|
||||
frequencyTier: 'unused',
|
||||
commonFilters: [],
|
||||
commonGroupBys: [],
|
||||
commonJoins: [],
|
||||
staleSince: '2026-05-11T00:00:00.000Z',
|
||||
});
|
||||
await expect(readFile(join(workdir, 'wiki/global/historic-sql-old-template.md'), 'utf-8')).resolves.toContain(
|
||||
'Old body',
|
||||
);
|
||||
});
|
||||
|
||||
it('does not mark stale or archive pages when override replay has no current-run evidence', async () => {
|
||||
const workdir = await tempWorkdir();
|
||||
await writeText(
|
||||
workdir,
|
||||
'semantic-layer/warehouse/_schema/public.yaml',
|
||||
YAML.stringify({
|
||||
tables: {
|
||||
orders: {
|
||||
table: 'public.orders',
|
||||
usage: {
|
||||
narrative: 'Orders were active before.',
|
||||
frequencyTier: 'high',
|
||||
commonFilters: ['status'],
|
||||
commonGroupBys: ['status'],
|
||||
commonJoins: [],
|
||||
},
|
||||
columns: [{ name: 'id', type: 'string' }],
|
||||
},
|
||||
},
|
||||
}),
|
||||
);
|
||||
await writeJson(workdir, 'raw-sources/warehouse/historic-sql/override-sync/manifest.json', {
|
||||
source: 'historic-sql',
|
||||
connectionId: 'warehouse',
|
||||
dialect: 'postgres',
|
||||
fetchedAt: '2026-05-11T00:00:00.000Z',
|
||||
windowStart: '2026-02-10T00:00:00.000Z',
|
||||
windowEnd: '2026-05-11T00:00:00.000Z',
|
||||
snapshotRowCount: 0,
|
||||
touchedTableCount: 0,
|
||||
parseFailures: 0,
|
||||
warnings: [],
|
||||
probeWarnings: [],
|
||||
staleArchiveAfterDays: 90,
|
||||
});
|
||||
|
||||
const result = await projectHistoricSqlEvidence({
|
||||
workdir,
|
||||
connectionId: 'warehouse',
|
||||
syncId: 'override-sync',
|
||||
runId: 'override-run',
|
||||
overrideReplay: {
|
||||
priorJobId: 'prior-job',
|
||||
priorRunId: 'prior-run',
|
||||
priorSyncId: 'prior-sync',
|
||||
evictionRawPaths: ['tables/public/orders.json'],
|
||||
},
|
||||
});
|
||||
|
||||
expect(result.tableUsageMerged).toBe(0);
|
||||
expect(result.staleTablesMarked).toBe(0);
|
||||
expect(result.patternPagesWritten).toBe(0);
|
||||
expect(result.stalePatternPagesMarked).toBe(0);
|
||||
expect(result.archivedPatternPages).toBe(0);
|
||||
expect(result.touchedSources).toEqual([]);
|
||||
expect(result.changedWikiPageKeys).toEqual([]);
|
||||
expect(result.actions).toEqual([]);
|
||||
});
|
||||
});
|
||||
|
|
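The stale-usage test above expects `ownerNote` to survive while every usage field owned by historic SQL is reset. The real `mergeUsagePreservingExternal` lives in `../live-database/manifest.js` and is not shown in this diff, so the following is only a minimal sketch of behaviour consistent with that assertion. The function name `mergePreservingExternalSketch` and the `OWNED_KEYS` list are assumptions made for illustration, not the project's implementation.

```typescript
// Hypothetical list of keys the historic-SQL projection owns (assumption:
// taken from the fields of the stale usage object asserted in the test).
const OWNED_KEYS = [
  'narrative',
  'frequencyTier',
  'commonFilters',
  'commonGroupBys',
  'commonJoins',
  'staleSince',
] as const;

type Usage = Record<string, unknown>;

// Sketch only: keep keys the projection does not own, then overlay the incoming usage.
function mergePreservingExternalSketch(existing: Usage | undefined, incoming: Usage): Usage {
  const owned = new Set<string>(OWNED_KEYS);
  const external: Usage = {};
  for (const [key, value] of Object.entries(existing ?? {})) {
    if (!owned.has(key)) external[key] = value;
  }
  return { ...external, ...incoming };
}

// Values mirror the fixture in the test above.
const existingUsage: Usage = {
  narrative: 'Orders were active before.',
  frequencyTier: 'high',
  commonFilters: ['status'],
  commonGroupBys: ['status'],
  commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
  ownerNote: 'keep analyst annotation',
};

const staleUsageSketch: Usage = {
  narrative: 'No recent historic SQL usage was observed in the latest snapshot.',
  frequencyTier: 'unused',
  commonFilters: [],
  commonGroupBys: [],
  commonJoins: [],
  staleSince: '2026-05-11T00:00:00.000Z',
};

const mergedUsage = mergePreservingExternalSketch(existingUsage, staleUsageSketch);
console.log(mergedUsage);
```

Under these assumptions the analyst-owned `ownerNote` is carried over untouched, while `frequencyTier` becomes `'unused'` and `staleSince` takes the manifest's `fetchedAt` value.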
@ -0,0 +1,385 @@
|
|||
import { access, mkdir, readdir, readFile, rename, writeFile } from 'node:fs/promises';
import { dirname, join, relative } from 'node:path';
import YAML from 'yaml';
import type { MemoryAction } from '../../../../context/memory/types.js';
import { rawSourcesDirForSync } from '../../raw-sources-paths.js';
import type { FinalizationOverrideReplay } from '../../types.js';
import { mergeUsagePreservingExternal } from '../live-database/manifest.js';
import { historicSqlEvidenceEnvelopeSchema, type HistoricSqlEvidenceEnvelope } from './evidence.js';
import type { TableUsageOutput } from './skill-schemas.js';
import { stagedManifestSchema } from './types.js';

export interface HistoricSqlProjectionInput {
  workdir: string;
  connectionId: string;
  syncId: string;
  runId: string;
  overrideReplay?: FinalizationOverrideReplay;
}

export interface HistoricSqlProjectionResult {
  tableUsageMerged: number;
  staleTablesMarked: number;
  patternPagesWritten: number;
  stalePatternPagesMarked: number;
  archivedPatternPages: number;
  touchedSources: Array<{ connectionId: string; sourceName: string }>;
  changedWikiPageKeys: string[];
  actions: MemoryAction[];
  warnings: string[];
}

interface ManifestShard {
  tables?: Record<string, { table?: string; usage?: Record<string, unknown>; columns?: unknown[]; [key: string]: unknown }>;
}

interface HistoricSqlPatternPage {
  key: string;
  path: string;
  frontmatter: Record<string, unknown>;
  content: string;
}

function safeKnowledgeSlug(value: string): string {
  return value.toLowerCase().replace(/[^a-z0-9_-]+/g, '-').replace(/^-+|-+$/g, '');
}

async function pathExists(path: string): Promise<boolean> {
  try {
    await access(path);
    return true;
  } catch {
    return false;
  }
}

async function walkFiles(root: string): Promise<string[]> {
  if (!(await pathExists(root))) return [];
  const result: string[] = [];
  async function visit(dir: string): Promise<void> {
    const entries = await readdir(dir, { withFileTypes: true });
    for (const entry of entries) {
      const absolute = join(dir, entry.name);
      if (entry.isDirectory()) {
        await visit(absolute);
      } else if (entry.isFile()) {
        result.push(relative(root, absolute).replace(/\\/g, '/'));
      }
    }
  }
  await visit(root);
  return result.sort();
}

async function readJson(path: string): Promise<unknown> {
  return JSON.parse(await readFile(path, 'utf-8')) as unknown;
}

async function writeYamlAtomic(path: string, value: unknown): Promise<void> {
  await mkdir(dirname(path), { recursive: true });
  const tmp = `${path}.tmp`;
  await writeFile(tmp, YAML.stringify(value, { indent: 2, lineWidth: 0, version: '1.1' }), 'utf-8');
  await rename(tmp, path);
}

function tableSourceName(tableRef: string): string {
  return tableRef.split('.').filter(Boolean).at(-1) ?? tableRef;
}

function staleUsage(fetchedAt: string) {
  return {
    narrative: 'No recent historic SQL usage was observed in the latest snapshot.',
    frequencyTier: 'unused' as const,
    commonFilters: [],
    commonGroupBys: [],
    commonJoins: [],
    staleSince: fetchedAt,
  };
}

async function loadEvidence(workdir: string, runId: string): Promise<HistoricSqlEvidenceEnvelope[]> {
  const root = join(workdir, '.ktx/ingest-evidence/historic-sql', runId);
  const files = await walkFiles(root);
  const evidence: HistoricSqlEvidenceEnvelope[] = [];
  for (const file of files.filter((candidate) => candidate.endsWith('.json'))) {
    evidence.push(historicSqlEvidenceEnvelopeSchema.parse(await readJson(join(root, file))));
  }
  return evidence;
}

function renderPatternMarkdown(pattern: HistoricSqlEvidenceEnvelope & { kind: 'pattern' }): string {
  return [
    `# ${pattern.pattern.title}`,
    '',
    pattern.pattern.narrative,
    '',
    '## Representative SQL',
    '',
    '```sql',
    pattern.pattern.definitionSql,
    '```',
    '',
    '## Tables',
    '',
    ...pattern.pattern.tablesInvolved.map((table) => `- ${table}`),
    '',
    '## Constituent Templates',
    '',
    ...pattern.pattern.constituentTemplateIds.map((id) => `- ${id}`),
    '',
  ].join('\n');
}

function overlapRatio(left: string[], right: string[]): number {
  const rightSet = new Set(right);
  const intersection = left.filter((value) => rightSet.has(value)).length;
  return left.length === 0 ? 0 : intersection / left.length;
}

function parseMarkdownPage(key: string, path: string, raw: string): HistoricSqlPatternPage | null {
  const match = raw.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!match) return null;
  return {
    key,
    path,
    frontmatter: (YAML.parse(match[1] ?? '') ?? {}) as Record<string, unknown>,
    content: match[2] ?? '',
  };
}

function isHistoricPatternPage(page: HistoricSqlPatternPage): boolean {
  const tags = Array.isArray(page.frontmatter.tags) ? page.frontmatter.tags : [];
  return (
    page.frontmatter.source === 'historic-sql' &&
    tags.includes('historic-sql') &&
    tags.includes('pattern')
  );
}

function isArchivedPatternPage(page: HistoricSqlPatternPage): boolean {
  const tags = Array.isArray(page.frontmatter.tags) ? page.frontmatter.tags : [];
  return tags.includes('archived');
}

function stringArray(value: unknown): string[] {
  return Array.isArray(value) ? value.filter((entry): entry is string => typeof entry === 'string') : [];
}

function renderMarkdownPage(frontmatter: Record<string, unknown>, content: string): string {
  let yaml = YAML.stringify(frontmatter, { indent: 2, lineWidth: 0 }).trimEnd();
  const staleSince = frontmatter.stale_since;
  if (typeof staleSince === 'string') {
    yaml = yaml.replace(`stale_since: ${staleSince}`, `stale_since: "${staleSince}"`);
  }
  return `---\n${yaml}\n---\n\n${content.trim()}\n`;
}

function existingPageSignals(page: HistoricSqlPatternPage): string[] {
  return [...stringArray(page.frontmatter.tables), ...stringArray(page.frontmatter.fingerprints)];
}

function shouldArchive(staleSince: unknown, fetchedAt: string, days: number): boolean {
  if (typeof staleSince !== 'string') return false;
  const staleTime = Date.parse(staleSince);
  const fetchedTime = Date.parse(fetchedAt);
  if (!Number.isFinite(staleTime) || !Number.isFinite(fetchedTime)) return false;
  return fetchedTime - staleTime > days * 24 * 60 * 60 * 1000;
}

async function loadPatternPages(root: string): Promise<HistoricSqlPatternPage[]> {
  const files = await walkFiles(root);
  const pages: HistoricSqlPatternPage[] = [];
  for (const file of files.filter((candidate) => candidate.endsWith('.md'))) {
    if (file.includes('/')) {
      continue;
    }
    const key = file.replace(/\.md$/, '');
    const path = join(root, file);
    const page = parseMarkdownPage(key, path, await readFile(path, 'utf-8'));
    if (page) {
      pages.push(page);
    }
  }
  return pages;
}

function historicSqlFlatKey(slug: string): string {
  return `historic-sql-${safeKnowledgeSlug(slug)}`;
}

async function currentStagedTables(rawDir: string): Promise<Set<string>> {
  const tablesRoot = join(rawDir, 'tables');
  const files = await walkFiles(tablesRoot);
  const tables = new Set<string>();
  for (const file of files.filter((candidate) => candidate.endsWith('.json'))) {
    const value = await readJson(join(tablesRoot, file));
    if (typeof value === 'object' && value !== null && 'table' in value && typeof value.table === 'string') {
      tables.add(value.table);
    }
  }
  return tables;
}

export async function projectHistoricSqlEvidence(input: HistoricSqlProjectionInput): Promise<HistoricSqlProjectionResult> {
  const result: HistoricSqlProjectionResult = {
    tableUsageMerged: 0,
    staleTablesMarked: 0,
    patternPagesWritten: 0,
    stalePatternPagesMarked: 0,
    archivedPatternPages: 0,
    touchedSources: [],
    changedWikiPageKeys: [],
    actions: [],
    warnings: [],
  };
  const touchedKeys = new Set<string>();
  const rawDir = join(input.workdir, rawSourcesDirForSync(input.connectionId, 'historic-sql', input.syncId));
  const manifest = stagedManifestSchema.parse(await readJson(join(rawDir, 'manifest.json')));
  const currentTables = await currentStagedTables(rawDir);
  const evidence = await loadEvidence(input.workdir, input.runId);
  if (input.overrideReplay && evidence.length === 0) {
    result.warnings.push(
      'historic-sql finalization skipped stale/archive cleanup during override replay without current-run evidence',
    );
    return result;
  }
  if (evidence.length === 0) {
    result.warnings.push('historic-sql finalization skipped because no current-run evidence was emitted');
    return result;
  }
  const tableEvidence = evidence.filter((entry): entry is HistoricSqlEvidenceEnvelope & { kind: 'table_usage' } => entry.kind === 'table_usage');
  const patternEvidence = evidence.filter((entry): entry is HistoricSqlEvidenceEnvelope & { kind: 'pattern' } => entry.kind === 'pattern');

  const schemaRoot = join(input.workdir, 'semantic-layer', input.connectionId, '_schema');
  for (const file of (await walkFiles(schemaRoot)).filter((candidate) => candidate.endsWith('.yaml') || candidate.endsWith('.yml'))) {
    const path = join(schemaRoot, file);
    const before = await readFile(path, 'utf-8');
    const shard = (YAML.parse(before) ?? {}) as ManifestShard;
    if (!shard.tables) continue;
    for (const [tableName, entry] of Object.entries(shard.tables)) {
      const tableRef = entry.table ?? tableName;
      const matchingEvidence = tableEvidence.find(
        (candidate) => candidate.table === tableRef || tableSourceName(candidate.table) === tableName,
      );
      if (matchingEvidence) {
        const merged = mergeUsagePreservingExternal(entry.usage as TableUsageOutput | undefined, matchingEvidence.usage);
        if (JSON.stringify(entry.usage ?? null) !== JSON.stringify(merged ?? null)) {
          entry.usage = merged as Record<string, unknown>;
          result.tableUsageMerged += 1;
          const sourceName = tableSourceName(matchingEvidence.table);
          const key = `${input.connectionId}:${sourceName}`;
          if (!touchedKeys.has(key)) {
            touchedKeys.add(key);
            result.touchedSources.push({ connectionId: input.connectionId, sourceName });
          }
          result.actions.push({
            target: 'sl',
            type: 'updated',
            key: sourceName,
            targetConnectionId: input.connectionId,
            detail: `Merged historic-SQL usage for ${matchingEvidence.table}`,
            rawPaths: [matchingEvidence.rawPath],
          });
        }
      } else if (entry.usage && !currentTables.has(tableRef)) {
        const merged = mergeUsagePreservingExternal(entry.usage as TableUsageOutput | undefined, staleUsage(manifest.fetchedAt));
        if (JSON.stringify(entry.usage ?? null) !== JSON.stringify(merged ?? null)) {
          entry.usage = merged as Record<string, unknown>;
          result.staleTablesMarked += 1;
          const sourceName = tableSourceName(tableRef);
          const key = `${input.connectionId}:${sourceName}`;
          if (!touchedKeys.has(key)) {
            touchedKeys.add(key);
            result.touchedSources.push({ connectionId: input.connectionId, sourceName });
          }
          result.actions.push({
            target: 'sl',
            type: 'updated',
            key: sourceName,
            targetConnectionId: input.connectionId,
            detail: `Marked historic-SQL usage stale for ${tableRef}`,
          });
        }
      }
    }
    const after = YAML.stringify(shard, { indent: 2, lineWidth: 0, version: '1.1' });
    if (after !== before) {
      await writeYamlAtomic(path, shard);
    }
  }

  const wikiRoot = join(input.workdir, 'wiki/global');
  await mkdir(wikiRoot, { recursive: true });
  const allPages = await loadPatternPages(wikiRoot);
  const activePages = allPages.filter((page) => !isArchivedPatternPage(page));
  const patternPages = activePages.filter(isHistoricPatternPage);
  const writtenKeys = new Set<string>();

  for (const pattern of patternEvidence) {
    const incomingSignals = [...pattern.pattern.tablesInvolved, ...pattern.pattern.constituentTemplateIds];
    const reusable = patternPages.find((page) => overlapRatio(incomingSignals, existingPageSignals(page)) >= 0.6);
    const key = reusable?.key ?? historicSqlFlatKey(pattern.pattern.slug);
    const pagePath = join(wikiRoot, `${key}.md`);
    const frontmatter = {
      summary: pattern.pattern.title,
      tags: ['historic-sql', 'pattern'],
      refs: [],
      sl_refs: pattern.pattern.slRefs,
      usage_mode: 'auto',
      source: 'historic-sql',
      tables: pattern.pattern.tablesInvolved,
      representative_sql: pattern.pattern.definitionSql,
      fingerprints: pattern.pattern.constituentTemplateIds,
    };
    await mkdir(dirname(pagePath), { recursive: true });
    await writeFile(pagePath, renderMarkdownPage(frontmatter, renderPatternMarkdown(pattern)), 'utf-8');
    writtenKeys.add(key);
    result.patternPagesWritten += 1;
    result.changedWikiPageKeys.push(key);
    result.actions.push({
      target: 'wiki',
      type: reusable ? 'updated' : 'created',
      key,
      detail: `Projected historic-SQL pattern ${pattern.pattern.title}`,
      rawPaths: [pattern.rawPath],
    });
  }

  for (const page of patternPages) {
    if (writtenKeys.has(page.key)) continue;
    if (shouldArchive(page.frontmatter.stale_since, manifest.fetchedAt, manifest.staleArchiveAfterDays)) {
      const tags = [...new Set([...stringArray(page.frontmatter.tags), 'archived'])];
      await writeFile(
        page.path,
        renderMarkdownPage({ ...page.frontmatter, tags, archived_since: manifest.fetchedAt }, page.content),
        'utf-8',
      );
      result.archivedPatternPages += 1;
      result.changedWikiPageKeys.push(page.key);
      result.actions.push({
        target: 'wiki',
        type: 'updated',
        key: page.key,
        detail: `Archived stale historic-SQL pattern page ${page.key}`,
      });
      continue;
    }
    const tags = [...new Set([...stringArray(page.frontmatter.tags), 'stale'])];
    await writeFile(
      page.path,
      renderMarkdownPage({ ...page.frontmatter, tags, stale_since: manifest.fetchedAt }, page.content),
      'utf-8',
    );
    result.stalePatternPagesMarked += 1;
    result.changedWikiPageKeys.push(page.key);
    result.actions.push({
      target: 'wiki',
      type: 'updated',
      key: page.key,
      detail: `Marked historic-SQL pattern page ${page.key} stale`,
    });
  }

  result.changedWikiPageKeys = [...new Set(result.changedWikiPageKeys)].sort();
  return result;
}
|
||||
|
|
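The projection above reuses an existing wiki page when at least 60% of the incoming pattern's signals (tables plus template ids) already appear on that page, and it archives a stale page once `stale_since` is more than `staleArchiveAfterDays` older than the manifest's `fetchedAt`. The sketch below copies the two helper functions from the file so that it runs on its own and feeds them values borrowed from the test fixtures. The variable names are illustrative, and pairing these particular signal lists is an assumption made only to show the arithmetic.

```typescript
// Copied from the projection module so the example is self-contained.
function overlapRatio(left: string[], right: string[]): number {
  const rightSet = new Set(right);
  const intersection = left.filter((value) => rightSet.has(value)).length;
  return left.length === 0 ? 0 : intersection / left.length;
}

function shouldArchive(staleSince: unknown, fetchedAt: string, days: number): boolean {
  if (typeof staleSince !== 'string') return false;
  const staleTime = Date.parse(staleSince);
  const fetchedTime = Date.parse(fetchedAt);
  if (!Number.isFinite(staleTime) || !Number.isFinite(fetchedTime)) return false;
  return fetchedTime - staleTime > days * 24 * 60 * 60 * 1000;
}

// Incoming pattern signals: tablesInvolved followed by constituentTemplateIds.
const incomingSignals = ['public.orders', 'public.customers', 'pg:1', 'pg:2'];
// Signals stored on a hypothetical existing page: frontmatter tables plus fingerprints.
const pageSignals = ['public.orders', 'public.customers', 'pg:1'];

// 3 of the 4 incoming signals are already on the page.
const ratio = overlapRatio(incomingSignals, pageSignals);
const reusePage = ratio >= 0.6;

// The ratio is measured against the incoming side, so it is not symmetric.
const reversedRatio = overlapRatio(pageSignals, incomingSignals);

// 2026-01-01 to 2026-05-11 is 130 days.
const archiveAt30 = shouldArchive('2026-01-01T00:00:00.000Z', '2026-05-11T00:00:00.000Z', 30);
const archiveAt130 = shouldArchive('2026-01-01T00:00:00.000Z', '2026-05-11T00:00:00.000Z', 130);

console.log({ ratio, reusePage, reversedRatio, archiveAt30, archiveAt130 });
```

Because the comparison in `shouldArchive` is a strict `>`, a page whose age equals the threshold exactly is marked stale again rather than archived.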
@ -0,0 +1,36 @@
|
|||
import { describe, expect, it } from 'vitest';
import { compileHistoricSqlRedactionPatterns, redactHistoricSqlText } from './redaction.js';

describe('historic-SQL redaction', () => {
  it('redacts regex matches and supports the (?i) case-insensitive prefix', () => {
    const redactors = compileHistoricSqlRedactionPatterns([
      'sk_live_[A-Za-z0-9]+',
      '(?i)secret_token_[a-z0-9]+',
    ]);

    const sql =
      "select * from public.api_events where api_key = 'sk_live_abc123' and note = 'Secret_Token_9f'"; // pragma: allowlist secret

    expect(redactHistoricSqlText(sql, redactors)).toBe(
      "select * from public.api_events where api_key = '[REDACTED]' and note = '[REDACTED]'",
    );
  });

  it('returns the original SQL text when no redaction patterns are configured', () => {
    const sql = "select * from public.orders where status = 'paid'";

    expect(redactHistoricSqlText(sql, compileHistoricSqlRedactionPatterns([]))).toBe(sql);
  });

  it('throws a config-focused error for invalid redaction regex patterns', () => {
    expect(() => compileHistoricSqlRedactionPatterns(['[broken'])).toThrow(
      'Invalid historicSql.redactionPatterns entry "[broken"',
    );
  });

  it('throws a config-focused error for empty redaction regex patterns', () => {
    expect(() => compileHistoricSqlRedactionPatterns([' '])).toThrow(
      'Invalid historicSql.redactionPatterns entry " "',
    );
  });
});
|
||||
|
|
@ -0,0 +1,37 @@
|
|||
export interface HistoricSqlRedactionPattern {
  pattern: string;
  expression: RegExp;
}

const CASE_INSENSITIVE_PREFIX = '(?i)';
const REDACTION_TOKEN = '[REDACTED]';

export function compileHistoricSqlRedactionPatterns(patterns: readonly string[]): HistoricSqlRedactionPattern[] {
  return patterns.map((pattern) => {
    const trimmed = pattern.trim();
    const caseInsensitive = trimmed.startsWith(CASE_INSENSITIVE_PREFIX);
    const source = caseInsensitive ? trimmed.slice(CASE_INSENSITIVE_PREFIX.length) : trimmed;
    if (source.length === 0) {
      throw new Error(`Invalid historicSql.redactionPatterns entry "${pattern}": pattern must not be empty`);
    }

    try {
      return {
        pattern,
        expression: new RegExp(source, caseInsensitive ? 'gi' : 'g'),
      };
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      throw new Error(`Invalid historicSql.redactionPatterns entry "${pattern}": ${reason}`);
    }
  });
}

export function redactHistoricSqlText(text: string, redactors: readonly HistoricSqlRedactionPattern[]): string {
  let next = text;
  for (const redactor of redactors) {
    redactor.expression.lastIndex = 0;
    next = next.replace(redactor.expression, REDACTION_TOKEN);
  }
  return next;
}
|
||||
|
|
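The redaction module above strips a leading `(?i)` from each configured pattern and turns it into the `i` flag, since JavaScript's `RegExp` constructor takes flags as a separate argument. Below is a condensed, self-contained restatement of that idea, using the same inputs as the accompanying test. The helper name `compileSketch` is hypothetical, and validation of empty or invalid patterns is deliberately left out.

```typescript
const CASE_INSENSITIVE_PREFIX = '(?i)';
const REDACTION_TOKEN = '[REDACTED]';

// Hypothetical condensed helper: map an optional "(?i)" prefix to the "i" flag.
// The "g" flag is always set so every occurrence is replaced, not just the first.
function compileSketch(pattern: string): RegExp {
  const trimmed = pattern.trim();
  const caseInsensitive = trimmed.startsWith(CASE_INSENSITIVE_PREFIX);
  const source = caseInsensitive ? trimmed.slice(CASE_INSENSITIVE_PREFIX.length) : trimmed;
  return new RegExp(source, caseInsensitive ? 'gi' : 'g');
}

const expressions = ['sk_live_[A-Za-z0-9]+', '(?i)secret_token_[a-z0-9]+'].map(compileSketch);

const sql =
  "select * from public.api_events where api_key = 'sk_live_abc123' and note = 'Secret_Token_9f'";

// Apply each expression in order to the running text.
const redacted = expressions.reduce((text, expression) => text.replace(expression, REDACTION_TOKEN), sql);

console.log(redacted);
// select * from public.api_events where api_key = '[REDACTED]' and note = '[REDACTED]'
```

The second pattern only matches `Secret_Token_9f` because of the `i` flag: its character class `[a-z0-9]` and the literal `secret_token_` are both lowercase.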
@ -0,0 +1,74 @@
|
|||
import { describe, expect, it } from 'vitest';
import { z } from 'zod';
import {
  patternOutputSchema,
  patternsArraySchema,
  tableUsageOutputSchema,
} from './skill-schemas.js';

describe('historic-sql skill schemas', () => {
  it('accepts table usage output and preserves future keys', () => {
    const parsed = tableUsageOutputSchema.parse({
      narrative: 'Orders are queried for paid/refunded lifecycle analysis.',
      frequencyTier: 'high',
      commonFilters: ['status', 'created_at'],
      commonGroupBys: ['status'],
      commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
      staleSince: null,
      analystNote: 'preserve me',
    });

    expect(parsed).toMatchObject({
      narrative: 'Orders are queried for paid/refunded lifecycle analysis.',
      frequencyTier: 'high',
      commonFilters: ['status', 'created_at'],
      commonGroupBys: ['status'],
      commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
      staleSince: null,
      analystNote: 'preserve me',
    });
  });

  it('rejects invalid frequency tiers', () => {
    const result = tableUsageOutputSchema.safeParse({
      narrative: 'Orders are queried often.',
      frequencyTier: 'sometimes',
      commonFilters: [],
      commonJoins: [],
    });

    expect(result.success).toBe(false);
  });

  it('accepts pattern outputs used for wiki projection', () => {
    const parsed = patternsArraySchema.parse([
      {
        slug: 'order-lifecycle-analysis',
        title: 'Order Lifecycle Analysis',
        narrative: 'Teams inspect order status by customer and month.',
        definitionSql: 'select status, count(*) from public.orders group by status',
        tablesInvolved: ['public.orders', 'public.customers'],
        slRefs: ['orders', 'customers'],
        constituentTemplateIds: ['template_1', 'template_2'],
      },
    ]);

    expect(parsed[0]).toEqual({
      slug: 'order-lifecycle-analysis',
      title: 'Order Lifecycle Analysis',
      narrative: 'Teams inspect order status by customer and month.',
      definitionSql: 'select status, count(*) from public.orders group by status',
      tablesInvolved: ['public.orders', 'public.customers'],
      slRefs: ['orders', 'customers'],
      constituentTemplateIds: ['template_1', 'template_2'],
    });
  });

  it('exports zod schemas that can produce JSON schema for prompt prefixes', () => {
    const tableUsageJsonSchema = z.toJSONSchema(tableUsageOutputSchema);
    const patternJsonSchema = z.toJSONSchema(patternOutputSchema);

    expect(tableUsageJsonSchema).toMatchObject({ type: 'object' });
    expect(patternJsonSchema).toMatchObject({ type: 'object' });
  });
});
|
||||
|
|
@ -0,0 +1,31 @@
|
|||
import { z } from 'zod';

export const tableUsageOutputSchema = z
  .object({
    narrative: z.string(),
    frequencyTier: z.enum(['high', 'mid', 'low', 'unused']),
    commonFilters: z.array(z.string()),
    commonGroupBys: z.array(z.string()).optional(),
    commonJoins: z.array(
      z.object({
        table: z.string(),
        on: z.array(z.string()),
      }),
    ),
    staleSince: z.iso.datetime().nullable().optional(),
  })
  .passthrough();
export type TableUsageOutput = z.infer<typeof tableUsageOutputSchema>;

export const patternOutputSchema = z.object({
  slug: z.string(),
  title: z.string(),
  narrative: z.string(),
  definitionSql: z.string(),
  tablesInvolved: z.array(z.string()),
  slRefs: z.array(z.string()),
  constituentTemplateIds: z.array(z.string()),
});

/** @internal */
export const patternsArraySchema = z.array(patternOutputSchema);
|
||||
|
|
@ -0,0 +1,148 @@
|
|||
import { describe, expect, it, vi } from 'vitest';
import { HistoricSqlGrantsMissingError } from './errors.js';
import { SnowflakeHistoricSqlQueryHistoryReader } from './snowflake-query-history-reader.js';

interface FakeQueryResult {
  headers: string[];
  rows: unknown[][];
  totalRows: number;
  error?: string;
}

function queryClient(results: FakeQueryResult[]) {
  const executeQuery = vi.fn(async (_query: string) => {
    const next = results.shift();
    if (!next) {
      throw new Error('unexpected query');
    }
    return next;
  });
  return { executeQuery };
}

function firstQuery(client: ReturnType<typeof queryClient>): string {
  const call = client.executeQuery.mock.calls[0];
  if (!call) {
    throw new Error('expected query client to be called');
  }
  return call[0];
}

describe('SnowflakeHistoricSqlQueryHistoryReader', () => {
  it('probes SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY', async () => {
    const client = queryClient([{ headers: ['1'], rows: [[1]], totalRows: 1 }]);
    const reader = new SnowflakeHistoricSqlQueryHistoryReader();

    await expect(reader.probe(client)).resolves.toEqual({ warnings: [], info: [] });

    expect(client.executeQuery).toHaveBeenCalledWith(
      'SELECT 1 FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY LIMIT 1',
    );
  });

  it('turns probe result errors into HistoricSqlGrantsMissingError', async () => {
    const client = queryClient([{ headers: [], rows: [], totalRows: 0, error: 'Object does not exist or not authorized' }]);
    const reader = new SnowflakeHistoricSqlQueryHistoryReader();

    await expect(reader.probe(client)).rejects.toMatchObject({
      name: 'HistoricSqlGrantsMissingError',
      dialect: 'snowflake',
      remediation: 'GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE <connection role>;',
    });
  });

  it('turns thrown probe failures into HistoricSqlGrantsMissingError', async () => {
    const client = {
      executeQuery: vi.fn(async () => {
        throw new Error('permission denied');
      }),
    };
    const reader = new SnowflakeHistoricSqlQueryHistoryReader();

    await expect(reader.probe(client)).rejects.toBeInstanceOf(HistoricSqlGrantsMissingError);
  });

  it('fetches aggregated Snowflake query templates', async () => {
    const client = queryClient([
      {
        headers: [
          'template_id',
          'canonical_sql',
          'executions',
          'distinct_users',
          'first_seen',
          'last_seen',
          'p50_ms',
          'p95_ms',
          'error_rate',
          'rows_produced',
          'top_users',
        ],
        rows: [
          [
            'hash-1',
            'select status from orders',
            42,
            3,
            '2026-05-01T00:00:00.000Z',
            '2026-05-11T00:00:00.000Z',
            12,
            40,
            0.05,
            100,
            JSON.stringify([{ user: 'ANALYST', executions: 1 }]),
          ],
        ],
        totalRows: 1,
      },
    ]);
    const reader = new SnowflakeHistoricSqlQueryHistoryReader();

    const rows = [];
    for await (const row of reader.fetchAggregated(
      client,
      { start: new Date('2026-02-10T00:00:00.000Z'), end: new Date('2026-05-11T00:00:00.000Z') },
      { dialect: 'snowflake', minExecutions: 5, windowDays: 90, enabledTables: [], filters: { dropTrivialProbes: true }, redactionPatterns: [], staleArchiveAfterDays: 90 },
    )) {
      rows.push(row);
    }

    const sql = firstQuery(client);
    expect(sql).toContain('SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY');
    expect(sql).toContain('COUNT(*) AS executions');
    expect(sql).toContain('GROUP BY query_hash');
    expect(sql).toContain('HAVING COUNT(*) >= 5');
    expect(rows).toMatchObject([
      {
        templateId: 'hash-1',
        stats: {
          executions: 42,
          errorRate: 0.05,
        },
        topUsers: [{ user: 'ANALYST', executions: 1 }],
      },
    ]);
  });

  it('throws a clear error when the query client cannot execute SQL', async () => {
    const reader = new SnowflakeHistoricSqlQueryHistoryReader();

    await expect(async () => {
      for await (const _row of reader.fetchAggregated(
        {},
        { start: new Date(), end: new Date() },
        {
          dialect: 'snowflake',
          minExecutions: 5,
          windowDays: 90,
          enabledTables: [],
          filters: { dropTrivialProbes: true },
          redactionPatterns: [],
          staleArchiveAfterDays: 90,
        },
      )) {
        throw new Error('unreachable');
      }
    }).rejects.toThrow('Historic SQL Snowflake reader requires a query client with executeQuery(query)');
  });
});

@@ -0,0 +1,220 @@
import { HistoricSqlGrantsMissingError } from './errors.js';
import {
  aggregatedTemplateSchema,
  type AggregatedTemplate,
  type HistoricSqlTimeWindow,
  type HistoricSqlUnifiedPullConfig,
} from './types.js';

interface QueryResultLike {
  headers: string[];
  rows: unknown[][];
  totalRows: number;
  error?: string;
}

interface QueryClientLike {
  executeQuery(query: string): Promise<QueryResultLike>;
}

const PROBE_SQL = 'SELECT 1 FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY LIMIT 1';

const SNOWFLAKE_GRANTS_REMEDIATION =
  'GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE <connection role>;';

function queryClient(client: unknown): QueryClientLike {
  if (
    client &&
    typeof client === 'object' &&
    'executeQuery' in client &&
    typeof (client as { executeQuery?: unknown }).executeQuery === 'function'
  ) {
    return client as QueryClientLike;
  }
  throw new Error('Historic SQL Snowflake reader requires a query client with executeQuery(query)');
}

function grantsError(cause: unknown): HistoricSqlGrantsMissingError {
  const message =
    cause instanceof Error
      ? cause.message
      : typeof cause === 'string'
        ? cause
        : 'Snowflake role cannot query SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY.';
  return new HistoricSqlGrantsMissingError({
    dialect: 'snowflake',
    message: `Missing Snowflake audit grants for historic-SQL ingest: ${message}`,
    remediation: SNOWFLAKE_GRANTS_REMEDIATION,
    cause,
  });
}

function timestampLiteral(value: Date | string): string {
  const date = value instanceof Date ? value : new Date(value);
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Invalid Snowflake query-history timestamp: ${String(value)}`);
  }
  return `'${date.toISOString().replace(/'/g, "''")}'::TIMESTAMP_TZ`;
}

function indexByHeader(headers: string[]): Map<string, number> {
  const out = new Map<string, number>();
  headers.forEach((header, index) => {
    out.set(header.toUpperCase(), index);
  });
  return out;
}

function value(row: unknown[], indexes: Map<string, number>, name: string): unknown {
  const index = indexes.get(name.toUpperCase());
  return index === undefined ? null : row[index];
}

function nullableString(raw: unknown): string | null {
  if (raw === null || raw === undefined) {
    return null;
  }
  const text = String(raw);
  return text.length > 0 ? text : null;
}

function requiredString(raw: unknown, field: string): string {
  const text = nullableString(raw);
  if (!text) {
    throw new Error(`Snowflake QUERY_HISTORY row is missing ${field}`);
  }
  return text;
}

function nullableNumber(raw: unknown): number | null {
  if (raw === null || raw === undefined || raw === '') {
    return null;
  }
  const number = typeof raw === 'number' ? raw : Number(raw);
  if (!Number.isFinite(number)) {
    return null;
  }
  return number;
}

function requiredNumber(raw: unknown, field: string): number {
  const number = nullableNumber(raw);
  if (number === null) {
    throw new Error(`Snowflake QUERY_HISTORY row has invalid ${field}: ${String(raw)}`);
  }
  return number;
}

function requiredInteger(raw: unknown, field: string): number {
  return Math.trunc(requiredNumber(raw, field));
}

function nullableInteger(raw: unknown): number | null {
  const number = nullableNumber(raw);
  return number === null ? null : Math.trunc(number);
}

function isoTimestamp(raw: unknown, field: string): string {
  if (raw instanceof Date) {
    return raw.toISOString();
  }
  const text = requiredString(raw, field);
  const date = new Date(text);
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Snowflake QUERY_HISTORY row has invalid ${field}: ${text}`);
  }
  return date.toISOString();
}

function parseTopUsers(raw: unknown): Array<{ user: string | null; executions: number }> {
  const text = nullableString(raw);
  if (!text) {
    return [];
  }
  try {
    const parsed = JSON.parse(text) as unknown;
    if (!Array.isArray(parsed)) {
      return [];
    }
    return parsed.flatMap((entry) => {
      if (!entry || typeof entry !== 'object') {
        return [];
      }
      const user = nullableString((entry as { user?: unknown }).user);
      const executions = nullableInteger((entry as { executions?: unknown }).executions);
      return executions === null ? [] : [{ user, executions }];
    });
  } catch {
    return [];
  }
}

function mapAggregatedRow(row: unknown[], indexes: Map<string, number>): AggregatedTemplate {
  return aggregatedTemplateSchema.parse({
    templateId: requiredString(value(row, indexes, 'template_id'), 'template_id'),
    canonicalSql: requiredString(value(row, indexes, 'canonical_sql'), 'canonical_sql'),
    dialect: 'snowflake',
    stats: {
      executions: requiredInteger(value(row, indexes, 'executions'), 'executions'),
      distinctUsers: requiredInteger(value(row, indexes, 'distinct_users'), 'distinct_users'),
      firstSeen: isoTimestamp(value(row, indexes, 'first_seen'), 'first_seen'),
      lastSeen: isoTimestamp(value(row, indexes, 'last_seen'), 'last_seen'),
      p50RuntimeMs: nullableNumber(value(row, indexes, 'p50_ms')),
      p95RuntimeMs: nullableNumber(value(row, indexes, 'p95_ms')),
      errorRate: requiredNumber(value(row, indexes, 'error_rate'), 'error_rate'),
      rowsProduced: nullableInteger(value(row, indexes, 'rows_produced')),
    },
    topUsers: parseTopUsers(value(row, indexes, 'top_users')),
  });
}

export class SnowflakeHistoricSqlQueryHistoryReader {
  async probe(client: unknown): Promise<{ warnings: string[]; info: string[] }> {
    let result: QueryResultLike;
    try {
      result = await queryClient(client).executeQuery(PROBE_SQL);
    } catch (error) {
      throw grantsError(error);
    }
    if (result.error) {
      throw grantsError(result.error);
    }
    return { warnings: [], info: [] };
  }

  async *fetchAggregated(
    client: unknown,
    window: HistoricSqlTimeWindow,
    config: HistoricSqlUnifiedPullConfig,
  ): AsyncIterable<AggregatedTemplate> {
    const sql = `
      SELECT
        query_hash AS template_id,
        MIN(query_text) AS canonical_sql,
        COUNT(*) AS executions,
        COUNT(DISTINCT user_name) AS distinct_users,
        MIN(start_time) AS first_seen,
        MAX(start_time) AS last_seen,
        APPROX_PERCENTILE(total_elapsed_time, 0.50) AS p50_ms,
        APPROX_PERCENTILE(total_elapsed_time, 0.95) AS p95_ms,
        DIV0(COUNT_IF(execution_status != 'SUCCESS'), COUNT(*)) AS error_rate,
        SUM(rows_produced) AS rows_produced,
        ARRAY_AGG(OBJECT_CONSTRUCT('user', user_name, 'executions', 1)) WITHIN GROUP (ORDER BY start_time DESC)::string AS top_users
      FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
      WHERE query_text IS NOT NULL
        AND query_type IN ('SELECT', 'MERGE')
        AND start_time >= ${timestampLiteral(window.start)}
        AND start_time < ${timestampLiteral(window.end)}
      GROUP BY query_hash
      HAVING COUNT(*) >= ${config.minExecutions}
      ORDER BY executions DESC`.trim();
    const result = await queryClient(client).executeQuery(sql);
    if (result.error) {
      throw grantsError(result.error);
    }
    const indexes = indexByHeader(result.headers);
    for (const row of result.rows) {
      yield mapAggregatedRow(row, indexes);
    }
  }
}

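The reader above looks up result columns by name rather than by position, upper-casing header names first, and coerces numeric cells that may arrive as strings. The sketch below isolates that lookup under the assumption that a driver may return headers in either case; `indexHeaders`, `cell`, `toNumberOrNull`, and the sample row are hypothetical illustrations, not exports of this module.

```typescript
// Hypothetical helper: build a case-insensitive header -> column index map.
function indexHeaders(headers: string[]): Map<string, number> {
  const out = new Map<string, number>();
  headers.forEach((header, index) => {
    out.set(header.toUpperCase(), index);
  });
  return out;
}

// Hypothetical helper: read a cell by column name; a missing column reads as null.
function cell(row: unknown[], indexes: Map<string, number>, name: string): unknown {
  const index = indexes.get(name.toUpperCase());
  return index === undefined ? null : row[index];
}

// Hypothetical helper: coerce a number or numeric string to number, else null.
function toNumberOrNull(raw: unknown): number | null {
  if (raw === null || raw === undefined || raw === '') {
    return null;
  }
  const parsed = typeof raw === 'number' ? raw : Number(raw);
  return Number.isFinite(parsed) ? parsed : null;
}

// Made-up sample: upper-cased headers, a numeric value returned as a string.
const sampleHeaders = ['TEMPLATE_ID', 'EXECUTIONS', 'P95_MS'];
const sampleRow: unknown[] = ['hash-1', '42', null];
const sampleIndexes = indexHeaders(sampleHeaders);

const templateId = cell(sampleRow, sampleIndexes, 'template_id');
const executions = toNumberOrNull(cell(sampleRow, sampleIndexes, 'executions'));
const p95 = toNumberOrNull(cell(sampleRow, sampleIndexes, 'p95_ms'));
const missingColumn = cell(sampleRow, sampleIndexes, 'rows_produced');
```

Looking up by normalized name keeps the mapping independent of column order and of whether the warehouse folds aliases to upper case.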
@@ -0,0 +1,436 @@
import { mkdtemp, readFile, readdir } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it, vi } from 'vitest';
import type { SqlAnalysisPort } from '../../../../context/sql-analysis/ports.js';
import { stageHistoricSqlAggregatedSnapshot } from './stage-unified.js';
import type { AggregatedTemplate, HistoricSqlReader } from './types.js';

async function tempDir(): Promise<string> {
  return mkdtemp(join(tmpdir(), 'historic-sql-unified-stage-'));
}

async function readJson<T>(root: string, relPath: string): Promise<T> {
  return JSON.parse(await readFile(join(root, relPath), 'utf-8')) as T;
}

function aggregate(overrides: Partial<AggregatedTemplate> & { templateId: string; canonicalSql: string }): AggregatedTemplate {
  return {
    templateId: overrides.templateId,
    canonicalSql: overrides.canonicalSql,
    dialect: overrides.dialect ?? 'postgres',
    stats: overrides.stats ?? {
      executions: 42,
      distinctUsers: 3,
      firstSeen: '2026-05-01T00:00:00.000Z',
      lastSeen: '2026-05-11T00:00:00.000Z',
      p50RuntimeMs: 20,
      p95RuntimeMs: 80,
      errorRate: 0,
      rowsProduced: 100,
    },
    topUsers: overrides.topUsers ?? [{ user: 'analyst', executions: 40 }],
  };
}

describe('stageHistoricSqlAggregatedSnapshot', () => {
  it('batch parses templates and writes stable table and patterns artifacts', async () => {
    const stagedDir = await tempDir();
    const reader: HistoricSqlReader = {
      async probe() {
        return { warnings: ['pg_stat_statements.track is none; aggregation still proceeds'], info: [] };
      },
      async *fetchAggregated() {
        yield aggregate({
          templateId: 'orders-by-status',
          canonicalSql: 'select o.status, count(*) from public.orders o join public.customers c on c.id = o.customer_id where o.created_at >= $1 group by o.status',
        });
        yield aggregate({
          templateId: 'service-account-only',
          canonicalSql: 'select * from public.orders where id = $1',
          stats: {
            executions: 20,
            distinctUsers: 1,
            firstSeen: '2026-05-01T00:00:00.000Z',
            lastSeen: '2026-05-11T00:00:00.000Z',
            p50RuntimeMs: 5,
            p95RuntimeMs: 10,
            errorRate: 0,
            rowsProduced: 1,
          },
          topUsers: [{ user: 'svc_loader', executions: 20 }],
        });
        yield aggregate({
          templateId: 'bad-parse',
          canonicalSql: 'select broken from',
        });
      },
    };
    const sqlAnalysis: SqlAnalysisPort = {
      analyzeForFingerprint: vi.fn(),
      analyzeBatch: vi.fn(async () => new Map([
        [
          'orders-by-status',
          {
            tablesTouched: ['public.orders', 'public.customers'],
            columnsByClause: {
              select: ['status'],
              where: ['created_at'],
              join: ['customer_id'],
              groupBy: ['status'],
            },
          },
        ],
        ['bad-parse', { tablesTouched: [], columnsByClause: {}, error: 'parse failed' }],
      ])),
      validateReadOnly: vi.fn(async () => ({ ok: true })),
    };

    await stageHistoricSqlAggregatedSnapshot({
      stagedDir,
      connectionId: 'warehouse',
      queryClient: {},
      reader,
      sqlAnalysis,
      pullConfig: {
        dialect: 'postgres',
        filters: {
          serviceAccounts: { patterns: ['^svc_'], mode: 'exclude' },
        },
      },
      now: new Date('2026-05-11T12:00:00.000Z'),
    });

    expect(sqlAnalysis.analyzeBatch).toHaveBeenCalledTimes(1);
    expect(sqlAnalysis.analyzeBatch).toHaveBeenCalledWith(
      [
        {
          id: 'orders-by-status',
          sql: 'select o.status, count(*) from public.orders o join public.customers c on c.id = o.customer_id where o.created_at >= $1 group by o.status',
        },
        { id: 'bad-parse', sql: 'select broken from' },
      ],
      'postgres',
    );

    expect(await readdir(join(stagedDir, 'tables'))).toEqual(['public.customers.json', 'public.orders.json']);

    const manifest = await readJson<Record<string, unknown>>(stagedDir, 'manifest.json');
    expect(manifest).toMatchObject({
      source: 'historic-sql',
      connectionId: 'warehouse',
      dialect: 'postgres',
      snapshotRowCount: 3,
      touchedTableCount: 2,
      parseFailures: 1,
      warnings: ['parse_failed:bad-parse'],
      probeWarnings: ['pg_stat_statements.track is none; aggregation still proceeds'],
      staleArchiveAfterDays: 90,
    });

    const orders = await readJson<Record<string, any>>(stagedDir, 'tables/public.orders.json');
    expect(orders).toMatchObject({
      table: 'public.orders',
      stats: {
        executionsBucket: '10-100',
        distinctUsersBucket: '2-5',
        errorRateBucket: 'none',
        p95RuntimeBucket: '<100ms',
        recencyBucket: 'current',
      },
      columnsByClause: {
        select: [['status', 'high']],
        where: [['created_at', 'high']],
        join: [['customer_id', 'high']],
        groupBy: [['status', 'high']],
      },
      observedJoins: [{ withTable: 'public.customers', on: ['customer_id'], freq: 'high' }],
      topTemplates: [
        {
          id: 'orders-by-status',
          topUsers: [{ user: 'analyst' }],
        },
      ],
    });
    expect(orders.topTemplates[0].canonicalSql).toContain('group by o.status');

    const patterns = await readJson<Record<string, any>>(stagedDir, 'patterns-input.json');
    expect(patterns.templates).toEqual([
      {
        id: 'orders-by-status',
        canonicalSql: expect.stringContaining('public.orders'),
        tablesTouched: ['public.customers', 'public.orders'],
        executionsBucket: '10-100',
        distinctUsersBucket: '2-5',
        dialect: 'postgres',
      },
    ]);
  });

  it('redacts configured SQL substrings in staged artifacts while analyzing original SQL', async () => {
    const stagedDir = await tempDir();
    const originalSql =
      "select * from public.api_events where api_key = 'sk_live_abc123' and note = 'Secret_Token_9f'"; // pragma: allowlist secret
    const reader: HistoricSqlReader = {
      async probe() {
        return { warnings: [], info: [] };
      },
      async *fetchAggregated() {
        yield aggregate({
          templateId: 'api-events-with-secret',
          canonicalSql: originalSql,
          stats: {
            executions: 15,
            distinctUsers: 2,
            firstSeen: '2026-05-01T00:00:00.000Z',
            lastSeen: '2026-05-11T00:00:00.000Z',
            p50RuntimeMs: 12,
            p95RuntimeMs: 25,
            errorRate: 0,
            rowsProduced: 15,
          },
        });
      },
    };
    const sqlAnalysis: SqlAnalysisPort = {
      analyzeForFingerprint: vi.fn(),
      analyzeBatch: vi.fn(async () => new Map([
        [
          'api-events-with-secret',
          {
            tablesTouched: ['public.api_events'],
            columnsByClause: {
              select: [],
              where: ['api_key', 'note'],
              join: [],
              groupBy: [],
            },
          },
        ],
      ])),
      validateReadOnly: vi.fn(async () => ({ ok: true })),
    };

    await stageHistoricSqlAggregatedSnapshot({
      stagedDir,
      connectionId: 'warehouse',
      queryClient: {},
      reader,
      sqlAnalysis,
      pullConfig: {
        dialect: 'postgres',
        redactionPatterns: ['sk_live_[A-Za-z0-9]+', '(?i)secret_token_[a-z0-9]+'],
      },
      now: new Date('2026-05-11T12:00:00.000Z'),
    });

    expect(sqlAnalysis.analyzeBatch).toHaveBeenCalledWith(
      [{ id: 'api-events-with-secret', sql: originalSql }],
      'postgres',
    );

    const tableJson = await readFile(join(stagedDir, 'tables/public.api_events.json'), 'utf-8');
    const patternsJson = await readFile(join(stagedDir, 'patterns-input.json'), 'utf-8');
    expect(tableJson).not.toContain('sk_live_abc123');
    expect(tableJson).not.toContain('Secret_Token_9f');
    expect(patternsJson).not.toContain('sk_live_abc123');
    expect(patternsJson).not.toContain('Secret_Token_9f');
    expect(tableJson).toContain('[REDACTED]');
    expect(patternsJson).toContain('[REDACTED]');
  });

  it('limits staged table artifacts to configured enabled tables', async () => {
    const stagedDir = await tempDir();
    const reader: HistoricSqlReader = {
      async probe() {
        return { warnings: [], info: [] };
      },
      async *fetchAggregated() {
        yield aggregate({
          templateId: 'selected-qualified',
          canonicalSql: 'select count(*) from orbit_analytics.int_active_contract_arr',
        });
        yield aggregate({
          templateId: 'selected-unqualified',
          canonicalSql: 'select count(*) from int_customer_health_signals',
        });
        yield aggregate({
          templateId: 'unselected',
          canonicalSql: 'select count(*) from orbit_raw.accounts',
        });
      },
    };
    const sqlAnalysis: SqlAnalysisPort = {
      analyzeForFingerprint: vi.fn(),
      analyzeBatch: vi.fn(async () => new Map([
        [
          'selected-qualified',
          {
            tablesTouched: ['orbit_analytics.int_active_contract_arr'],
            columnsByClause: { select: [], where: [], join: [], groupBy: [] },
          },
        ],
        [
          'selected-unqualified',
          {
            tablesTouched: ['int_customer_health_signals'],
            columnsByClause: { select: [], where: [], join: [], groupBy: [] },
          },
        ],
        [
          'unselected',
          {
            tablesTouched: ['orbit_raw.accounts'],
            columnsByClause: { select: [], where: [], join: [], groupBy: [] },
          },
        ],
      ])),
      validateReadOnly: vi.fn(async () => ({ ok: true })),
    };

    await stageHistoricSqlAggregatedSnapshot({
      stagedDir,
      connectionId: 'warehouse',
      queryClient: {},
      reader,
      sqlAnalysis,
      pullConfig: {
        dialect: 'postgres',
        enabledTables: [
          'orbit_analytics.int_active_contract_arr',
          'orbit_analytics.int_customer_health_signals',
        ],
      },
      now: new Date('2026-05-11T12:00:00.000Z'),
    });

    expect(await readdir(join(stagedDir, 'tables'))).toEqual([
      'int_customer_health_signals.json',
      'orbit_analytics.int_active_contract_arr.json',
    ]);
    const manifest = await readJson<Record<string, any>>(stagedDir, 'manifest.json');
    expect(manifest.touchedTableCount).toBe(2);
    const patterns = await readJson<Record<string, any>>(stagedDir, 'patterns-input.json');
    expect(patterns.templates.map((entry: any) => entry.id)).toEqual(['selected-qualified', 'selected-unqualified']);
  });

  it('preserves full patterns audit input and writes bounded cross-table pattern shards', async () => {
    const stagedDir = await tempDir();
    const largeSql = `select * from public.orders o join public.customers c on c.id = o.customer_id where payload = '${'x'.repeat(8000)}'`;
    const reader: HistoricSqlReader = {
      async probe() {
        return { warnings: [], info: [] };
      },
      async *fetchAggregated() {
        yield aggregate({
          templateId: 'orders-customers-a',
          canonicalSql: largeSql,
          stats: {
            executions: 25,
            distinctUsers: 4,
            firstSeen: '2026-05-01T00:00:00.000Z',
            lastSeen: '2026-05-11T00:00:00.000Z',
            p50RuntimeMs: 15,
            p95RuntimeMs: 90,
            errorRate: 0,
            rowsProduced: 250,
          },
        });
        yield aggregate({
          templateId: 'orders-customers-b',
          canonicalSql: largeSql.replace('payload', 'payload_b'),
          stats: {
            executions: 22,
            distinctUsers: 3,
            firstSeen: '2026-05-01T00:00:00.000Z',
            lastSeen: '2026-05-11T00:00:00.000Z',
            p50RuntimeMs: 20,
            p95RuntimeMs: 95,
            errorRate: 0,
            rowsProduced: 220,
          },
        });
        yield aggregate({
          templateId: 'orders-single-table',
          canonicalSql: 'select count(*) from public.orders',
          stats: {
            executions: 30,
            distinctUsers: 2,
            firstSeen: '2026-05-01T00:00:00.000Z',
            lastSeen: '2026-05-11T00:00:00.000Z',
            p50RuntimeMs: 10,
            p95RuntimeMs: 20,
            errorRate: 0,
            rowsProduced: 30,
          },
        });
      },
    };
    const sqlAnalysis: SqlAnalysisPort = {
      analyzeForFingerprint: vi.fn(),
      analyzeBatch: vi.fn(async () => new Map([
        [
          'orders-customers-a',
          {
            tablesTouched: ['public.orders', 'public.customers'],
            columnsByClause: {
              select: [],
              where: ['payload'],
              join: ['customer_id', 'id'],
              groupBy: [],
            },
          },
        ],
        [
          'orders-customers-b',
          {
            tablesTouched: ['public.orders', 'public.customers'],
            columnsByClause: {
              select: [],
              where: ['payload_b'],
              join: ['customer_id', 'id'],
              groupBy: [],
            },
          },
        ],
        [
          'orders-single-table',
          {
            tablesTouched: ['public.orders'],
            columnsByClause: {
              select: [],
              where: [],
              join: [],
              groupBy: [],
            },
          },
        ],
      ])),
      validateReadOnly: vi.fn(async () => ({ ok: true })),
    };

    await stageHistoricSqlAggregatedSnapshot({
      stagedDir,
      connectionId: 'warehouse',
      queryClient: {},
      reader,
      sqlAnalysis,
      pullConfig: { dialect: 'postgres' },
      now: new Date('2026-05-11T12:00:00.000Z'),
    });

    const audit = await readJson<Record<string, any>>(stagedDir, 'patterns-input.json');
    expect(audit.templates.map((entry: any) => entry.id)).toEqual([
      'orders-customers-a',
      'orders-customers-b',
      'orders-single-table',
    ]);

    const firstShard = await readJson<Record<string, any>>(stagedDir, 'patterns-input/part-0001.json');
    expect(firstShard.templates.map((entry: any) => entry.id)).toEqual(['orders-customers-a', 'orders-customers-b']);
    expect(firstShard.templates.some((entry: any) => entry.id === 'orders-single-table')).toBe(false);

    const manifest = await readJson<Record<string, any>>(stagedDir, 'manifest.json');
    expect(manifest.warnings).toEqual([]);
  });
});

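The 'limits staged table artifacts to configured enabled tables' test above expects an unqualified table name to match an enabled schema-qualified entry. A minimal sketch of one way to get that behavior follows, assuming case-insensitive comparison and that a bare name only matches when exactly one enabled entry ends with it. `TableFilter`, `buildFilter`, and `isEnabled` are hypothetical names, and the sketch omits the "empty list enables everything" case.

```typescript
// Hypothetical filter shape: exact qualified names plus unambiguous short names.
interface TableFilter {
  exact: Set<string>;
  uniqueUnqualified: Set<string>;
}

function normalize(name: string): string {
  return name.trim().toLowerCase();
}

// Last dot-separated segment, e.g. 'sales.orders' -> 'orders'.
function lastSegment(name: string): string {
  const parts = normalize(name).split('.').filter((part) => part.length > 0);
  return parts[parts.length - 1] ?? '';
}

function buildFilter(enabled: string[]): TableFilter {
  const exact = new Set<string>();
  enabled.forEach((name) => {
    const normalized = normalize(name);
    if (normalized.length > 0) {
      exact.add(normalized);
    }
  });
  // Count how many enabled entries share each short name.
  const counts = new Map<string, number>();
  exact.forEach((name) => {
    const short = lastSegment(name);
    counts.set(short, (counts.get(short) ?? 0) + 1);
  });
  // Keep only short names that identify a single enabled table.
  const uniqueUnqualified = new Set<string>();
  counts.forEach((count, short) => {
    if (count === 1) {
      uniqueUnqualified.add(short);
    }
  });
  return { exact, uniqueUnqualified };
}

function isEnabled(table: string, filter: TableFilter): boolean {
  const normalized = normalize(table);
  return filter.exact.has(normalized) || filter.uniqueUnqualified.has(lastSegment(normalized));
}

const filter = buildFilter([
  'orbit_analytics.int_active_contract_arr',
  'orbit_analytics.int_customer_health_signals',
]);
// Two enabled tables share the short name 'orders', so the bare name is ambiguous.
const ambiguous = buildFilter(['sales.orders', 'archive.orders']);
```

Dropping ambiguous short names trades recall for precision: a bare `orders` reference is ignored rather than attributed to the wrong schema.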
@@ -0,0 +1,360 @@
import { mkdir, writeFile } from 'node:fs/promises';
import { dirname, join } from 'node:path';
import type { SqlAnalysisPort } from '../../../../context/sql-analysis/ports.js';
import {
  bucketDistinctUsers,
  bucketErrorRate,
  bucketExecutions,
  bucketFrequency,
  bucketP95Runtime,
  bucketRecency,
} from './buckets.js';
import { splitHistoricSqlPatternInputs } from './pattern-inputs.js';
import {
  compileHistoricSqlRedactionPatterns,
  redactHistoricSqlText,
  type HistoricSqlRedactionPattern,
} from './redaction.js';
import {
  HISTORIC_SQL_SOURCE_KEY,
  aggregatedTemplateSchema,
  historicSqlUnifiedPullConfigSchema,
  type AggregatedTemplate,
  type HistoricSqlReader,
  type HistoricSqlUnifiedPullConfig,
  type StagedPatternsInput,
  type StagedTableInput,
} from './types.js';

interface StageHistoricSqlAggregatedSnapshotInput {
  stagedDir: string;
  connectionId: string;
  queryClient: unknown;
  reader: HistoricSqlReader;
  sqlAnalysis: SqlAnalysisPort;
  pullConfig: unknown;
  now?: Date;
}

interface ParsedTemplate {
  template: AggregatedTemplate;
  tablesTouched: string[];
  includedTables: string[];
  columnsByClause: Record<string, string[]>;
}

interface EnabledTableFilter {
  exact: Set<string>;
  uniqueUnqualified: Set<string>;
}

interface TableAccumulator {
  table: string;
  executions: number;
  distinctUsers: number;
  errorRateNumerator: number;
  p95RuntimeMs: number | null;
  lastSeen: string;
  columnsByClause: Map<string, Map<string, number>>;
  observedJoins: Map<string, Map<string, number>>;
  topTemplates: AggregatedTemplate[];
}

const TRIVIAL_SQL_RE = /^\s*SELECT\s+(1|NOW\(\)|CURRENT_TIMESTAMP|VERSION\(\))\s*;?\s*$/i;
const NOISE_PREFIX_RE = /^\s*(SHOW|DESCRIBE|DESC|EXPLAIN|USE|SET)\b/i;
const SYSTEM_TABLE_RE = /\b(INFORMATION_SCHEMA|SNOWFLAKE\.ACCOUNT_USAGE|pg_|system\.)/i;

function writeJson(root: string, relPath: string, value: unknown): Promise<void> {
  const target = join(root, relPath);
  return mkdir(dirname(target), { recursive: true }).then(() =>
    writeFile(target, `${JSON.stringify(value, null, 2)}\n`, 'utf-8'),
  );
}

function compilePatterns(patterns: string[]): RegExp[] {
  return patterns.map((pattern) => new RegExp(pattern));
}

function matchesAny(value: string | null, patterns: RegExp[]): boolean {
  return !!value && patterns.some((pattern) => pattern.test(value));
}

function shouldDropBySql(sql: string, config: HistoricSqlUnifiedPullConfig): boolean {
  if (NOISE_PREFIX_RE.test(sql) || SYSTEM_TABLE_RE.test(sql)) return true;
  if (config.filters.dropTrivialProbes !== false && TRIVIAL_SQL_RE.test(sql)) return true;
  return false;
}

function shouldDropByUsers(template: AggregatedTemplate, config: HistoricSqlUnifiedPullConfig): boolean {
  const service = config.filters.serviceAccounts;
  if (!service || service.mode === 'mark-only' || service.patterns.length === 0) return false;
  const patterns = compilePatterns(service.patterns);
  const matchingExecutions = template.topUsers
    .filter((entry) => matchesAny(entry.user, patterns))
    .reduce((sum, entry) => sum + entry.executions, 0);
  const allExecutions = template.topUsers.reduce((sum, entry) => sum + entry.executions, 0);
  const serviceOnly = allExecutions > 0 && matchingExecutions >= allExecutions;
  return service.mode === 'exclude' ? serviceOnly : !serviceOnly;
}

function shouldDropByFailure(template: AggregatedTemplate, config: HistoricSqlUnifiedPullConfig): boolean {
  const failed = config.filters.dropFailedBelow;
  return !!failed && template.stats.errorRate > failed.errorRate && template.stats.executions < failed.executions;
}

function shouldDropTemplate(template: AggregatedTemplate, config: HistoricSqlUnifiedPullConfig): boolean {
  if (shouldDropBySql(template.canonicalSql, config)) return true;
  if (shouldDropByUsers(template, config)) return true;
  if (shouldDropByFailure(template, config)) return true;
  return false;
}

function normalizeTableIdentifier(value: string): string {
  return value.trim().toLowerCase();
}

function unqualifiedTableIdentifier(value: string): string {
  const parts = normalizeTableIdentifier(value).split('.').filter(Boolean);
  return parts.at(-1) ?? '';
}

function buildEnabledTableFilter(enabledTables: string[]): EnabledTableFilter | null {
  if (enabledTables.length === 0) {
    return null;
  }
  const exact = new Set(enabledTables.map(normalizeTableIdentifier).filter((value) => value.length > 0));
  const unqualifiedCounts = new Map<string, number>();
  for (const table of exact) {
    const unqualified = unqualifiedTableIdentifier(table);
    if (unqualified.length > 0) {
      unqualifiedCounts.set(unqualified, (unqualifiedCounts.get(unqualified) ?? 0) + 1);
    }
  }
  return {
    exact,
    uniqueUnqualified: new Set(
      [...unqualifiedCounts.entries()]
        .filter(([, count]) => count === 1)
        .map(([table]) => table),
    ),
  };
}

function isEnabledTable(table: string, filter: EnabledTableFilter | null): boolean {
  if (!filter) {
    return true;
  }
  const normalized = normalizeTableIdentifier(table);
  return filter.exact.has(normalized) || filter.uniqueUnqualified.has(unqualifiedTableIdentifier(normalized));
}

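The enabled-table filter matches either the full normalized identifier or the last dotted segment, but only when that segment is unambiguous across the allowlist. The following is a minimal standalone sketch of that behaviour; the `Sketch` names and the sample table names are illustrative and not part of the adapter.

```typescript
interface TableFilterSketch {
  exact: Set<string>;
  uniqueUnqualified: Set<string>;
}

const normalizeSketch = (value: string): string => value.trim().toLowerCase();

// Last dotted segment of a normalized identifier, or '' when there is none.
function lastSegmentSketch(value: string): string {
  const parts = normalizeSketch(value).split('.').filter(Boolean);
  return parts.length > 0 ? parts[parts.length - 1] : '';
}

function buildFilterSketch(enabledTables: string[]): TableFilterSketch | null {
  if (enabledTables.length === 0) return null; // empty allowlist means "allow everything"
  const exact = new Set(enabledTables.map(normalizeSketch).filter((value) => value.length > 0));
  const counts = new Map<string, number>();
  exact.forEach((table) => {
    const name = lastSegmentSketch(table);
    if (name.length > 0) counts.set(name, (counts.get(name) ?? 0) + 1);
  });
  // Keep only unqualified names that appear exactly once in the allowlist.
  const uniqueUnqualified = new Set<string>();
  counts.forEach((count, name) => {
    if (count === 1) uniqueUnqualified.add(name);
  });
  return { exact, uniqueUnqualified };
}

function isEnabledSketch(table: string, filter: TableFilterSketch | null): boolean {
  if (!filter) return true;
  const normalized = normalizeSketch(table);
  return filter.exact.has(normalized) || filter.uniqueUnqualified.has(lastSegmentSketch(normalized));
}

const sketchFilter = buildFilterSketch(['public.orders', 'sales.orders', 'public.customers']);
```

Here `orders` appears under two schemas, so only the exact forms match, while `customers` is unique and therefore also matches under any other qualification.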
function historicSqlWindowDays(config: HistoricSqlUnifiedPullConfig): number {
  return 'windowDays' in config ? config.windowDays : 90;
}

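`historicSqlWindowDays` relies on the config being a union in which only some members carry `windowDays`, so the `in` check narrows the type. A small sketch of that narrowing, together with the window-start arithmetic used later in the staging function, might look like this; `PullConfigSketch` is a hypothetical stand-in for the Zod-inferred config type.

```typescript
// Hypothetical stand-in for the inferred config union: only the windowed
// dialects carry windowDays.
type PullConfigSketch =
  | { dialect: 'snowflake' | 'bigquery'; windowDays: number }
  | { dialect: 'postgres' };

function windowDaysSketch(config: PullConfigSketch): number {
  // The `in` check narrows the union to the member that has windowDays.
  return 'windowDays' in config ? config.windowDays : 90;
}

function windowStartSketch(now: Date, config: PullConfigSketch): Date {
  const millisPerDay = 24 * 60 * 60 * 1000;
  return new Date(now.getTime() - windowDaysSketch(config) * millisPerDay);
}

const sketchNow = new Date('2026-05-11T00:00:00.000Z');
const postgresStart = windowStartSketch(sketchNow, { dialect: 'postgres' });
const snowflakeStart = windowStartSketch(sketchNow, { dialect: 'snowflake', windowDays: 30 });
```

With the 90-day fallback, a `now` of 2026-05-11 gives a window start of 2026-02-10, which is the pair of dates used in the manifest contract test in this commit.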
function redactTemplateSql(
  template: AggregatedTemplate,
  redactors: readonly HistoricSqlRedactionPattern[],
): AggregatedTemplate {
  if (redactors.length === 0) {
    return template;
  }
  return {
    ...template,
    canonicalSql: redactHistoricSqlText(template.canonicalSql, redactors),
  };
}

function recordColumn(acc: TableAccumulator, clause: string, column: string, executions: number): void {
  const byColumn = acc.columnsByClause.get(clause) ?? new Map<string, number>();
  byColumn.set(column, (byColumn.get(column) ?? 0) + executions);
  acc.columnsByClause.set(clause, byColumn);
}

function recordJoin(acc: TableAccumulator, otherTable: string, columns: string[], executions: number): void {
  const byColumns = acc.observedJoins.get(otherTable) ?? new Map<string, number>();
  const key = [...new Set(columns)].sort().join(',');
  if (key.length > 0) {
    byColumns.set(key, (byColumns.get(key) ?? 0) + executions);
    acc.observedJoins.set(otherTable, byColumns);
  }
}

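`recordJoin` canonicalizes the join columns into a single key (deduplicated, sorted, comma-joined) so that the same join observed in a different column order accumulates into one counter. A standalone sketch of that idea follows; the `Sketch` names and sample values are hypothetical.

```typescript
type JoinCountsSketch = Map<string, Map<string, number>>;

// Order-insensitive, duplicate-free key for a set of join columns.
function joinKeySketch(columns: string[]): string {
  return Array.from(new Set(columns)).sort().join(',');
}

function recordJoinSketch(
  joins: JoinCountsSketch,
  otherTable: string,
  columns: string[],
  executions: number,
): void {
  const key = joinKeySketch(columns);
  if (key.length === 0) return; // no join columns observed: nothing to record
  const byColumns = joins.get(otherTable) ?? new Map<string, number>();
  byColumns.set(key, (byColumns.get(key) ?? 0) + executions);
  joins.set(otherTable, byColumns);
}

const joinsSketch: JoinCountsSketch = new Map();
recordJoinSketch(joinsSketch, 'public.customers', ['customer_id', 'id'], 10);
recordJoinSketch(joinsSketch, 'public.customers', ['id', 'customer_id', 'id'], 5);
recordJoinSketch(joinsSketch, 'public.customers', [], 99);
```

One consequence of the comma-joined key is that a column name containing a comma would be split into two entries when the key is later read back with `split(',')`.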
function accumulatorFor(table: string): TableAccumulator {
  return {
    table,
    executions: 0,
    distinctUsers: 0,
    errorRateNumerator: 0,
    p95RuntimeMs: null,
    lastSeen: '1970-01-01T00:00:00.000Z',
    columnsByClause: new Map(),
    observedJoins: new Map(),
    topTemplates: [],
  };
}

function addTemplate(acc: TableAccumulator, parsed: ParsedTemplate): void {
  const executions = parsed.template.stats.executions;
  acc.executions += executions;
  acc.distinctUsers = Math.max(acc.distinctUsers, parsed.template.stats.distinctUsers);
  acc.errorRateNumerator += parsed.template.stats.errorRate * executions;
  acc.p95RuntimeMs =
    acc.p95RuntimeMs === null
      ? parsed.template.stats.p95RuntimeMs
      : parsed.template.stats.p95RuntimeMs === null
        ? acc.p95RuntimeMs
        : Math.max(acc.p95RuntimeMs, parsed.template.stats.p95RuntimeMs);
  acc.lastSeen = parsed.template.stats.lastSeen > acc.lastSeen ? parsed.template.stats.lastSeen : acc.lastSeen;
  for (const [clause, columns] of Object.entries(parsed.columnsByClause)) {
    for (const column of columns) {
      recordColumn(acc, clause, column, executions);
    }
  }
  const joinColumns = parsed.columnsByClause.join ?? [];
  for (const otherTable of parsed.tablesTouched.filter((table) => table !== acc.table)) {
    recordJoin(acc, otherTable, joinColumns, executions);
  }
  acc.topTemplates.push(parsed.template);
}

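`addTemplate` accumulates `errorRate * executions` per template, and `toStagedTable` later divides by the total executions, which gives an execution-weighted mean error rate for the table. A minimal sketch of that arithmetic, with a hypothetical `TemplateStatsSketch` shape:

```typescript
interface TemplateStatsSketch {
  executions: number;
  errorRate: number; // fraction in [0, 1]
}

// Execution-weighted mean of per-template error rates.
function weightedErrorRateSketch(stats: TemplateStatsSketch[]): number {
  let executions = 0;
  let numerator = 0;
  for (const entry of stats) {
    executions += entry.executions;
    numerator += entry.errorRate * entry.executions;
  }
  // Guard against division by zero when nothing was executed.
  return executions > 0 ? numerator / executions : 0;
}

const tableErrorRate = weightedErrorRateSketch([
  { executions: 90, errorRate: 0 },
  { executions: 10, errorRate: 0.5 },
]);
```

In this example 5 of 100 executions failed, so the table-level rate is 0.05 rather than the unweighted average of 0.25. By contrast, `distinctUsers` and `p95RuntimeMs` are combined with `Math.max` in `addTemplate`, so they behave as lower bounds rather than exact table-level figures.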
function toStagedTable(acc: TableAccumulator, now: Date): StagedTableInput {
  const errorRate = acc.executions > 0 ? acc.errorRateNumerator / acc.executions : 0;
  const columnsByClause: Record<string, Array<[string, string]>> = Object.fromEntries(
    [...acc.columnsByClause.entries()]
      .sort(([left], [right]) => left.localeCompare(right))
      .map(([clause, counts]) => [
        clause,
        [...counts.entries()]
          .sort((left, right) => right[1] - left[1] || left[0].localeCompare(right[0]))
          .map(([column, count]) => [column, bucketFrequency(count, acc.executions)] as [string, string]),
      ]),
  );
  const observedJoins = [...acc.observedJoins.entries()]
    .flatMap(([withTable, byColumns]) =>
      [...byColumns.entries()].map(([columns, count]) => ({
        withTable,
        on: columns.split(',').filter(Boolean),
        freq: bucketFrequency(count, acc.executions),
      })),
    )
    .sort((left, right) => left.withTable.localeCompare(right.withTable) || left.on.join(',').localeCompare(right.on.join(',')));
  const topTemplates = [...acc.topTemplates]
    .sort((left, right) => right.stats.executions - left.stats.executions || left.templateId.localeCompare(right.templateId))
    .slice(0, 5)
    .map((template) => ({
      id: template.templateId,
      canonicalSql: template.canonicalSql,
      topUsers: template.topUsers.slice(0, 5).map((entry) => ({ user: entry.user })),
    }));

  return {
    table: acc.table,
    stats: {
      executionsBucket: bucketExecutions(acc.executions),
      distinctUsersBucket: bucketDistinctUsers(acc.distinctUsers),
      errorRateBucket: bucketErrorRate(errorRate),
      p95RuntimeBucket: bucketP95Runtime(acc.p95RuntimeMs),
      recencyBucket: bucketRecency(acc.lastSeen, now),
    },
    columnsByClause,
    observedJoins,
    topTemplates,
  };
}

function toPatternsInput(parsedTemplates: ParsedTemplate[]): StagedPatternsInput {
  return {
    templates: parsedTemplates
      .map(({ template, tablesTouched }) => ({
        id: template.templateId,
        canonicalSql: template.canonicalSql,
        tablesTouched: [...tablesTouched].sort(),
        executionsBucket: bucketExecutions(template.stats.executions),
        distinctUsersBucket: bucketDistinctUsers(template.stats.distinctUsers),
        dialect: template.dialect,
      }))
      .sort((left, right) => left.id.localeCompare(right.id)),
  };
}

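Every list in the staged output is sorted with an explicit tie-break, which keeps the written JSON stable between runs over the same data. The sketch below isolates the top-templates ordering from `toStagedTable` (executions descending, then template id ascending); `RankedTemplateSketch` and the sample ids are hypothetical.

```typescript
interface RankedTemplateSketch {
  templateId: string;
  executions: number;
}

// Highest execution count first; ties broken by id so the order is deterministic.
function topTemplatesSketch(templates: RankedTemplateSketch[], limit: number): string[] {
  return [...templates] // copy so the caller's array is not reordered
    .sort((left, right) => right.executions - left.executions || left.templateId.localeCompare(right.templateId))
    .slice(0, limit)
    .map((template) => template.templateId);
}

const rankedInput: RankedTemplateSketch[] = [
  { templateId: 'c', executions: 5 },
  { templateId: 'b', executions: 9 },
  { templateId: 'a', executions: 9 },
];
const topTwo = topTemplatesSketch(rankedInput, 2);
```

Without the id tie-break, two templates with equal execution counts could swap places depending on input order, which would show up as spurious changes in the staged files.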
export async function stageHistoricSqlAggregatedSnapshot(input: StageHistoricSqlAggregatedSnapshotInput): Promise<void> {
  const config = historicSqlUnifiedPullConfigSchema.parse(input.pullConfig);
  const enabledTableFilter = buildEnabledTableFilter(config.enabledTables);
  const redactors = compileHistoricSqlRedactionPatterns(config.redactionPatterns);
  const now = input.now ?? new Date();
  const windowStart = new Date(now.getTime() - historicSqlWindowDays(config) * 24 * 60 * 60 * 1000);
  const probe = await input.reader.probe(input.queryClient);
  const snapshot: AggregatedTemplate[] = [];
  let snapshotRowCount = 0;

  for await (const row of input.reader.fetchAggregated(input.queryClient, { start: windowStart, end: now }, config)) {
    snapshotRowCount += 1;
    const parsed = aggregatedTemplateSchema.parse(row);
    if (!shouldDropTemplate(parsed, config)) {
      snapshot.push(parsed);
    }
  }

  const analysis = await input.sqlAnalysis.analyzeBatch(
    snapshot.map((template) => ({ id: template.templateId, sql: template.canonicalSql })),
    config.dialect,
  );
  const warnings: string[] = [];
  const parsedTemplates: ParsedTemplate[] = [];
  for (const template of snapshot) {
    const parsed = analysis.get(template.templateId);
    if (!parsed || parsed.error) {
      warnings.push(`parse_failed:${template.templateId}`);
      continue;
    }
    const tablesTouched = [...new Set(parsed.tablesTouched)].filter((table) => table.length > 0).sort();
    const includedTables = tablesTouched.filter((table) => isEnabledTable(table, enabledTableFilter));
    if (includedTables.length === 0) {
      continue;
    }
    parsedTemplates.push({
      template: redactTemplateSql(template, redactors),
      tablesTouched,
      includedTables,
      columnsByClause: Object.fromEntries(
        Object.entries(parsed.columnsByClause).map(([clause, columns]) => [clause, [...new Set(columns)].sort()]),
      ),
    });
  }

  const byTable = new Map<string, TableAccumulator>();
  for (const parsed of parsedTemplates) {
    for (const table of parsed.includedTables) {
      const acc = byTable.get(table) ?? accumulatorFor(table);
      addTemplate(acc, parsed);
      byTable.set(table, acc);
    }
  }

  await mkdir(input.stagedDir, { recursive: true });
  for (const [table, acc] of [...byTable.entries()].sort(([left], [right]) => left.localeCompare(right))) {
    await writeJson(input.stagedDir, `tables/${table}.json`, toStagedTable(acc, now));
  }
  const patternsInput = toPatternsInput(parsedTemplates);
  const patternInputSplit = splitHistoricSqlPatternInputs(patternsInput);
  const allWarnings = [...warnings, ...patternInputSplit.warnings];
  await writeJson(input.stagedDir, 'patterns-input.json', patternInputSplit.auditInput);
  for (const shard of patternInputSplit.shards) {
    await writeJson(input.stagedDir, shard.path, shard.input);
  }
  await writeJson(input.stagedDir, 'manifest.json', {
    source: HISTORIC_SQL_SOURCE_KEY,
    connectionId: input.connectionId,
    dialect: config.dialect,
    fetchedAt: now.toISOString(),
    windowStart: windowStart.toISOString(),
    windowEnd: now.toISOString(),
    snapshotRowCount,
    touchedTableCount: byTable.size,
    parseFailures: allWarnings.filter((warning) => warning.startsWith('parse_failed:')).length,
    warnings: allWarnings,
    probeWarnings: probe.warnings,
    staleArchiveAfterDays: config.staleArchiveAfterDays,
  });
}
@@ -0,0 +1,110 @@
import { describe, expect, it } from 'vitest';
|
||||
import {
|
||||
aggregatedTemplateSchema,
|
||||
historicSqlUnifiedPullConfigSchema,
|
||||
stagedManifestSchema,
|
||||
stagedPatternsInputSchema,
|
||||
stagedTableInputSchema,
|
||||
} from './types.js';
|
||||
|
||||
describe('historic-sql unified contracts', () => {
|
||||
it('parses minExecutions and service-account filters', () => {
|
||||
expect(historicSqlUnifiedPullConfigSchema.parse({ dialect: 'postgres', minExecutions: 9 })).toMatchObject({
|
||||
dialect: 'postgres',
|
||||
minExecutions: 9,
|
||||
redactionPatterns: [],
|
||||
staleArchiveAfterDays: 90,
|
||||
});
|
||||
expect(historicSqlUnifiedPullConfigSchema.parse({ dialect: 'postgres', minExecutions: 9 })).not.toHaveProperty(
|
||||
'windowDays',
|
||||
);
|
||||
expect(historicSqlUnifiedPullConfigSchema.parse({ dialect: 'postgres', minExecutions: 9 })).not.toHaveProperty(
|
||||
'concurrency',
|
||||
);
|
||||
|
||||
const parsed = historicSqlUnifiedPullConfigSchema.parse({
|
||||
dialect: 'postgres',
|
||||
minExecutions: 7,
|
||||
filters: {
|
||||
serviceAccounts: { patterns: ['^svc_'], mode: 'exclude' },
|
||||
},
|
||||
});
|
||||
expect(parsed.minExecutions).toBe(7);
|
||||
expect(parsed.filters.serviceAccounts).toEqual({ patterns: ['^svc_'], mode: 'exclude' });
|
||||
});
|
||||
|
||||
it('validates aggregate templates from warehouse readers', () => {
|
||||
const parsed = aggregatedTemplateSchema.parse({
|
||||
templateId: 'pg:123',
|
||||
canonicalSql: 'select status, count(*) from public.orders group by status',
|
||||
dialect: 'postgres',
|
||||
stats: {
|
||||
executions: 42,
|
||||
distinctUsers: 3,
|
||||
firstSeen: '2026-05-01T00:00:00.000Z',
|
||||
lastSeen: '2026-05-11T00:00:00.000Z',
|
||||
p50RuntimeMs: 12.5,
|
||||
p95RuntimeMs: 40,
|
||||
errorRate: 0,
|
||||
rowsProduced: 100,
|
||||
},
|
||||
topUsers: [{ user: 'analyst', executions: 40 }],
|
||||
});
|
||||
|
||||
expect(parsed.templateId).toBe('pg:123');
|
||||
expect(parsed.topUsers).toEqual([{ user: 'analyst', executions: 40 }]);
|
||||
});
|
||||
|
||||
it('validates staged table, patterns, and manifest artifacts', () => {
|
||||
expect(
|
||||
stagedTableInputSchema.parse({
|
||||
table: 'public.orders',
|
||||
stats: {
|
||||
executionsBucket: '10-100',
|
||||
distinctUsersBucket: '2-5',
|
||||
errorRateBucket: 'none',
|
||||
p95RuntimeBucket: '<100ms',
|
||||
recencyBucket: 'current',
|
||||
},
|
||||
columnsByClause: {
|
||||
select: [['status', 'high']],
|
||||
where: [['created_at', 'mid']],
|
||||
},
|
||||
observedJoins: [{ withTable: 'public.customers', on: ['customer_id'], freq: 'high' }],
|
||||
topTemplates: [{ id: 'pg:123', canonicalSql: 'select * from public.orders', topUsers: [{ user: 'analyst' }] }],
|
||||
}).table,
|
||||
).toBe('public.orders');
|
||||
|
||||
expect(
|
||||
stagedPatternsInputSchema.parse({
|
||||
templates: [
|
||||
{
|
||||
id: 'pg:123',
|
||||
canonicalSql: 'select * from public.orders',
|
||||
tablesTouched: ['public.orders'],
|
||||
executionsBucket: '10-100',
|
||||
distinctUsersBucket: '2-5',
|
||||
dialect: 'postgres',
|
||||
},
|
||||
],
|
||||
}).templates,
|
||||
).toHaveLength(1);
|
||||
|
||||
expect(
|
||||
stagedManifestSchema.parse({
|
||||
source: 'historic-sql',
|
||||
connectionId: 'warehouse',
|
||||
dialect: 'postgres',
|
||||
fetchedAt: '2026-05-11T00:00:00.000Z',
|
||||
windowStart: '2026-02-10T00:00:00.000Z',
|
||||
windowEnd: '2026-05-11T00:00:00.000Z',
|
||||
snapshotRowCount: 2,
|
||||
touchedTableCount: 1,
|
||||
parseFailures: 1,
|
||||
warnings: ['parse_failed:bad'],
|
||||
probeWarnings: [],
|
||||
staleArchiveAfterDays: 90,
|
||||
}).staleArchiveAfterDays,
|
||||
).toBe(90);
|
||||
});
|
||||
});
|
||||
153
packages/cli/src/context/ingest/adapters/historic-sql/types.ts
Normal file
@@ -0,0 +1,153 @@
import { z } from 'zod';
import type { SqlAnalysisPort } from '../../../../context/sql-analysis/ports.js';

export const HISTORIC_SQL_SOURCE_KEY = 'historic-sql' as const;

const historicSqlDialectSchema = z.enum(['snowflake', 'bigquery', 'postgres']);
export type HistoricSqlDialect = z.infer<typeof historicSqlDialectSchema>;

const filterModeSchema = z.enum(['exclude', 'include', 'mark-only']);

const historicSqlCommonPullConfigSchema = z.object({
  minExecutions: z.number().int().nonnegative().default(5),
  enabledTables: z.array(z.string().min(1)).default([]),
  filters: z.object({
    serviceAccounts: z.object({
      patterns: z.array(z.string()).default([]),
      mode: filterModeSchema.default('exclude'),
    }).optional(),
    orchestrators: z.object({
      mode: filterModeSchema.default('mark-only'),
    }).optional(),
    dropTrivialProbes: z.boolean().default(true),
    dropFailedBelow: z.object({
      errorRate: z.number().min(0).max(1),
      executions: z.number().int().nonnegative(),
    }).optional(),
  }).default({ dropTrivialProbes: true }),
  redactionPatterns: z.array(z.string()).default([]),
  staleArchiveAfterDays: z.number().int().positive().default(90),
});

const historicSqlWindowedPullConfigSchema = historicSqlCommonPullConfigSchema.extend({
  dialect: z.enum(['snowflake', 'bigquery']),
  windowDays: z.number().int().positive().default(90),
});

const historicSqlPostgresPullConfigSchema = historicSqlCommonPullConfigSchema.extend({
  dialect: z.literal('postgres'),
});

export const historicSqlUnifiedPullConfigSchema = z.discriminatedUnion('dialect', [
  historicSqlWindowedPullConfigSchema,
  historicSqlPostgresPullConfigSchema,
]);

export type HistoricSqlUnifiedPullConfig = z.infer<typeof historicSqlUnifiedPullConfigSchema>;

export const aggregatedTemplateSchema = z.object({
  templateId: z.string().min(1),
  canonicalSql: z.string().min(1),
  dialect: historicSqlDialectSchema,
  stats: z.object({
    executions: z.number().int().nonnegative(),
    distinctUsers: z.number().int().nonnegative(),
    firstSeen: z.iso.datetime(),
    lastSeen: z.iso.datetime(),
    p50RuntimeMs: z.number().nonnegative().nullable(),
    p95RuntimeMs: z.number().nonnegative().nullable(),
    errorRate: z.number().min(0).max(1),
    rowsProduced: z.number().int().nonnegative().nullable(),
  }),
  topUsers: z.array(z.object({
    user: z.string().nullable(),
    executions: z.number().int().nonnegative(),
  })).default([]),
});
export type AggregatedTemplate = z.infer<typeof aggregatedTemplateSchema>;

export const stagedTableInputSchema = z.object({
  table: z.string().min(1),
  stats: z.object({
    executionsBucket: z.string(),
    distinctUsersBucket: z.string(),
    errorRateBucket: z.string(),
    p95RuntimeBucket: z.string(),
    recencyBucket: z.string(),
  }),
  columnsByClause: z.record(z.string(), z.array(z.tuple([z.string(), z.string()]))),
  observedJoins: z.array(z.object({
    withTable: z.string(),
    on: z.array(z.string()),
    freq: z.string(),
  })),
  topTemplates: z.array(z.object({
    id: z.string(),
    canonicalSql: z.string(),
    topUsers: z.array(z.object({ user: z.string().nullable() })),
  })),
});
export type StagedTableInput = z.infer<typeof stagedTableInputSchema>;

export const stagedPatternsInputSchema = z.object({
  templates: z.array(z.object({
    id: z.string(),
    canonicalSql: z.string(),
    tablesTouched: z.array(z.string()),
    executionsBucket: z.string(),
    distinctUsersBucket: z.string(),
    dialect: historicSqlDialectSchema,
  })),
});
export type StagedPatternsInput = z.infer<typeof stagedPatternsInputSchema>;

export const stagedManifestSchema = z.object({
  source: z.literal(HISTORIC_SQL_SOURCE_KEY),
  connectionId: z.string().min(1),
  dialect: historicSqlDialectSchema,
  fetchedAt: z.iso.datetime(),
  windowStart: z.iso.datetime(),
  windowEnd: z.iso.datetime(),
  snapshotRowCount: z.number().int().nonnegative(),
  touchedTableCount: z.number().int().nonnegative(),
  parseFailures: z.number().int().nonnegative(),
  warnings: z.array(z.string()),
  probeWarnings: z.array(z.string()),
  staleArchiveAfterDays: z.number().int().positive().default(90),
});

interface HistoricSqlProbeResult {
  warnings: string[];
  info?: string[];
}

export interface HistoricSqlReader {
  probe(client: unknown): Promise<HistoricSqlProbeResult>;
  fetchAggregated(
    client: unknown,
    window: HistoricSqlTimeWindow,
    config: HistoricSqlUnifiedPullConfig,
  ): AsyncIterable<AggregatedTemplate>;
}

export interface HistoricSqlTimeWindow {
  start: Date;
  end: Date;
}

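Because `fetchAggregated` returns an `AsyncIterable`, a reader can be written as an async generator and consumed with `for await`, which is how the staging function reads it. The sketch below uses deliberately reduced, hypothetical interfaces (`ReaderSketch`, `TemplateSketch`) rather than the real ones, and the in-memory reader's filtering on `lastSeen` is an assumption made for illustration, not a statement about how the warehouse readers apply the window.

```typescript
interface TimeWindowSketch {
  start: Date;
  end: Date;
}

interface TemplateSketch {
  templateId: string;
  lastSeen: string; // ISO 8601 timestamp
}

interface ReaderSketch {
  fetchAggregated(window: TimeWindowSketch): AsyncIterable<TemplateSketch>;
}

// Hypothetical in-memory reader; a real reader would query the warehouse.
function createInMemoryReaderSketch(rows: TemplateSketch[]): ReaderSketch {
  return {
    async *fetchAggregated(window: TimeWindowSketch) {
      for (const row of rows) {
        const seen = new Date(row.lastSeen).getTime();
        // Assumption: only rows last seen inside the window are yielded.
        if (seen >= window.start.getTime() && seen <= window.end.getTime()) {
          yield row;
        }
      }
    },
  };
}

async function collectIdsSketch(reader: ReaderSketch, window: TimeWindowSketch): Promise<string[]> {
  const ids: string[] = [];
  for await (const row of reader.fetchAggregated(window)) {
    ids.push(row.templateId);
  }
  return ids;
}

const collectedIds = collectIdsSketch(
  createInMemoryReaderSketch([
    { templateId: 'pg:1', lastSeen: '2026-05-01T00:00:00.000Z' },
    { templateId: 'pg:2', lastSeen: '2025-01-01T00:00:00.000Z' },
  ]),
  { start: new Date('2026-02-10T00:00:00.000Z'), end: new Date('2026-05-11T00:00:00.000Z') },
);
```

Streaming rows this way lets the consumer validate and filter each row as it arrives instead of holding the full result set in memory first.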
export interface KtxPostgresQueryClient {
  executeQuery(sql: string, params?: unknown[]): Promise<{ headers: string[]; rows: unknown[][]; totalRows?: number }>;
}

export interface PostgresPgssProbeResult extends HistoricSqlProbeResult {
  pgServerVersion: string;
  warnings: string[];
  info: string[];
}

export interface HistoricSqlSourceAdapterDeps {
  sqlAnalysis: SqlAnalysisPort;
  reader: HistoricSqlReader;
  queryClient: unknown;
  now?: () => Date;
}
@@ -0,0 +1,107 @@
import { mkdtemp } from 'node:fs/promises';
|
||||
import { tmpdir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
import { describe, expect, it } from 'vitest';
|
||||
import type { KtxSchemaSnapshot } from '../../../scan/types.js';
|
||||
import { chunkLiveDatabaseStagedDir } from './chunk.js';
|
||||
import { liveDatabaseTablePath, writeLiveDatabaseSnapshot } from './stage.js';
|
||||
|
||||
function snapshot(): KtxSchemaSnapshot {
|
||||
return {
|
||||
connectionId: 'conn-1',
|
||||
driver: 'postgres',
|
||||
extractedAt: '2026-04-27T00:00:00.000Z',
|
||||
scope: { schemas: ['public'] },
|
||||
metadata: {},
|
||||
tables: [
|
||||
{
|
||||
name: 'orders',
|
||||
catalog: null,
|
||||
db: 'public',
|
||||
kind: 'table',
|
||||
comment: null,
|
||||
estimatedRows: null,
|
||||
columns: [
|
||||
{
|
||||
name: 'id',
|
||||
nativeType: 'integer',
|
||||
normalizedType: 'integer',
|
||||
dimensionType: 'number',
|
||||
nullable: false,
|
||||
primaryKey: true,
|
||||
comment: null,
|
||||
},
|
||||
],
|
||||
foreignKeys: [],
|
||||
},
|
||||
{
|
||||
name: 'customers',
|
||||
catalog: null,
|
||||
db: 'public',
|
||||
kind: 'table',
|
||||
comment: null,
|
||||
estimatedRows: null,
|
||||
columns: [
|
||||
{
|
||||
name: 'id',
|
||||
nativeType: 'integer',
|
||||
normalizedType: 'integer',
|
||||
dimensionType: 'number',
|
||||
nullable: false,
|
||||
primaryKey: true,
|
||||
comment: null,
|
||||
},
|
||||
],
|
||||
foreignKeys: [],
|
||||
},
|
||||
],
|
||||
};
|
||||
}
|
||||
|
||||
describe('chunkLiveDatabaseStagedDir', () => {
|
||||
it('emits one work unit per table on the first run', async () => {
|
||||
const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-chunk-'));
|
||||
await writeLiveDatabaseSnapshot(dir, snapshot());
|
||||
|
||||
const result = await chunkLiveDatabaseStagedDir(dir);
|
||||
expect(result.workUnits.map((wu) => wu.unitKey)).toEqual([
|
||||
'live-database-public-customers',
|
||||
'live-database-public-orders',
|
||||
]);
|
||||
expect(result.workUnits[0]?.dependencyPaths).toEqual(['connection.json', 'foreign-keys.json']);
|
||||
expect(result.workUnits[0]?.peerFileIndex).toContain(
|
||||
liveDatabaseTablePath({ catalog: null, db: 'public', name: 'orders' }),
|
||||
);
|
||||
});
|
||||
|
||||
it('keeps only changed tables during incremental syncs and records table evictions', async () => {
|
||||
const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-diff-'));
|
||||
await writeLiveDatabaseSnapshot(dir, snapshot());
|
||||
const ordersPath = liveDatabaseTablePath({ catalog: null, db: 'public', name: 'orders' });
|
||||
const customersPath = liveDatabaseTablePath({ catalog: null, db: 'public', name: 'customers' });
|
||||
|
||||
const result = await chunkLiveDatabaseStagedDir(dir, {
|
||||
added: [],
|
||||
modified: [ordersPath],
|
||||
deleted: [customersPath],
|
||||
unchanged: ['connection.json', 'foreign-keys.json'],
|
||||
});
|
||||
|
||||
expect(result.workUnits.map((wu) => wu.unitKey)).toEqual(['live-database-public-orders']);
|
||||
expect(result.eviction?.deletedRawPaths).toEqual([customersPath]);
|
||||
});
|
||||
|
||||
it('fans out all table work units when the foreign-key index changes', async () => {
|
||||
const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-fk-'));
|
||||
await writeLiveDatabaseSnapshot(dir, snapshot());
|
||||
|
||||
const result = await chunkLiveDatabaseStagedDir(dir, {
|
||||
added: [],
|
||||
modified: ['foreign-keys.json'],
|
||||
deleted: [],
|
||||
unchanged: [],
|
||||
});
|
||||
|
||||
expect(result.workUnits).toHaveLength(2);
|
||||
});
|
||||
});
|
||||
@@ -0,0 +1,58 @@
import type { ChunkResult, DiffSet, WorkUnit } from '../../types.js';
import type { KtxSchemaTable } from '../../../scan/types.js';
import { LIVE_DATABASE_FOREIGN_KEYS_FILE, LIVE_DATABASE_META_FILE, readLiveDatabaseTableFiles } from './stage.js';

function unitKey(table: KtxSchemaTable): string {
  const parts = [table.catalog, table.db, table.name]
    .filter((part): part is string => typeof part === 'string' && part.length > 0)
    .map((part) =>
      part
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, '-')
        .replace(/^-+|-+$/g, ''),
    )
    .filter(Boolean);
  return `live-database-${parts.join('-') || 'table'}`;
}

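`unitKey` turns the catalog, schema, and table name into a lowercase, hyphen-separated slug with a `live-database-` prefix. The standalone sketch below reproduces that transformation on a hypothetical `TableRefSketch` shape so the edge cases are easy to see.

```typescript
// Hypothetical reduced shape; the real KtxSchemaTable has more fields.
interface TableRefSketch {
  catalog: string | null;
  db: string | null;
  name: string;
}

function unitKeySketch(table: TableRefSketch): string {
  const parts = [table.catalog, table.db, table.name]
    // Skip null or empty segments.
    .filter((part): part is string => typeof part === 'string' && part.length > 0)
    // Lowercase, collapse runs of non-alphanumerics to '-', trim edge hyphens.
    .map((part) =>
      part
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, '-')
        .replace(/^-+|-+$/g, ''),
    )
    // Drop segments that became empty after slugging.
    .filter(Boolean);
  return `live-database-${parts.join('-') || 'table'}`;
}

const plainKey = unitKeySketch({ catalog: null, db: 'public', name: 'orders' });
const messyKey = unitKeySketch({ catalog: 'My Warehouse', db: 'Sales_2024', name: 'Order Items!' });
const fallbackKey = unitKeySketch({ catalog: null, db: null, name: '???' });
```

Since every non-alphanumeric run collapses to a single hyphen, distinct names such as `a_b` and `a-b` map to the same key; whether that matters depends on the naming in the target database.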
function displayName(table: KtxSchemaTable): string {
  return [table.catalog, table.db, table.name].filter(Boolean).join('.');
}

function isTablePath(path: string): boolean {
  return path.startsWith('tables/') && path.endsWith('.json');
}

export async function chunkLiveDatabaseStagedDir(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
  const tableFiles = await readLiveDatabaseTableFiles(stagedDir);
  const allTablePaths = tableFiles.map((file) => file.path);
  const globalDeps = [LIVE_DATABASE_META_FILE, LIVE_DATABASE_FOREIGN_KEYS_FILE];
  const touched = diffSet ? new Set([...diffSet.added, ...diffSet.modified]) : null;
  const globalTouched = Boolean(
    touched && (touched.has(LIVE_DATABASE_META_FILE) || touched.has(LIVE_DATABASE_FOREIGN_KEYS_FILE)),
  );

  const workUnits: WorkUnit[] = [];
  for (const file of tableFiles) {
    if (touched && !globalTouched && !touched.has(file.path)) {
      continue;
    }
    const peers = allTablePaths.filter((path) => path !== file.path).sort();
    workUnits.push({
      unitKey: unitKey(file.table),
      displayLabel: `Live database table ${displayName(file.table)}`,
      rawFiles: [file.path],
      peerFileIndex: peers,
      dependencyPaths: globalDeps,
      notes: `Database catalog snapshot for ${displayName(file.table)} with ${file.table.columns.length} column${
        file.table.columns.length === 1 ? '' : 's'
      }.`,
    });
  }

  const deletedRawPaths = diffSet ? diffSet.deleted.filter(isTablePath).sort() : [];
  return {
    workUnits,
    ...(deletedRawPaths.length > 0 ? { eviction: { deletedRawPaths } } : {}),
  };
}
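The selection rule in `chunkLiveDatabaseStagedDir` has three cases: no diff (every table), a diff that touches a global file (every table again), and a diff that touches only table files (just those tables), with deleted table files reported for eviction. The sketch below isolates that rule. The global file names are taken from the `dependencyPaths` expectation in the accompanying test; the `tables/...` paths and all `Sketch` names are hypothetical, since the real layout comes from `liveDatabaseTablePath`, which is not shown here.

```typescript
interface DiffSetSketch {
  added: string[];
  modified: string[];
  deleted: string[];
  unchanged: string[];
}

const GLOBAL_FILES_SKETCH = ['connection.json', 'foreign-keys.json'];

function selectTablePathsSketch(
  allTablePaths: string[],
  diff?: DiffSetSketch,
): { selected: string[]; evicted: string[] } {
  // First run: no diff, so every table becomes a work unit.
  if (!diff) return { selected: [...allTablePaths], evicted: [] };
  const touched = new Set([...diff.added, ...diff.modified]);
  // A change to a global dependency fans out to every table.
  const globalTouched = GLOBAL_FILES_SKETCH.some((file) => touched.has(file));
  const selected = allTablePaths.filter((path) => globalTouched || touched.has(path));
  // Only deleted table files count as evictions.
  const evicted = diff.deleted.filter((path) => path.startsWith('tables/') && path.endsWith('.json')).sort();
  return { selected, evicted };
}

const ordersPathSketch = 'tables/public.orders.json';
const customersPathSketch = 'tables/public.customers.json';
const allPathsSketch = [customersPathSketch, ordersPathSketch];

const firstRun = selectTablePathsSketch(allPathsSketch);
const incremental = selectTablePathsSketch(allPathsSketch, {
  added: [],
  modified: [ordersPathSketch],
  deleted: [customersPathSketch],
  unchanged: GLOBAL_FILES_SKETCH,
});
const fanOut = selectTablePathsSketch(allPathsSketch, {
  added: [],
  modified: ['foreign-keys.json'],
  deleted: [],
  unchanged: [],
});
```

The fan-out case exists because each work unit lists the global files as dependencies, so a foreign-key change can alter the interpretation of any table.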
@@ -0,0 +1,255 @@
import { once } from 'node:events';
|
||||
import { createServer } from 'node:http';
|
||||
import { describe, expect, it, vi } from 'vitest';
|
||||
import { createDaemonLiveDatabaseIntrospection } from './daemon-introspection.js';
|
||||
|
||||
const daemonResponse = {
|
||||
connection_id: 'warehouse',
|
||||
extracted_at: '2026-04-28T10:00:00+00:00',
|
||||
metadata: { driver: 'postgres', schemas: ['public'] },
|
||||
tables: [
|
||||
{
|
||||
catalog: 'warehouse',
|
||||
db: 'public',
|
||||
name: 'customers',
|
||||
comment: null,
|
||||
columns: [{ name: 'id', type: 'integer', nullable: false, primary_key: true, comment: null }],
|
||||
foreign_keys: [],
|
||||
},
|
||||
{
|
||||
catalog: 'warehouse',
|
||||
db: 'public',
|
||||
name: 'orders',
|
||||
comment: 'Order facts',
|
||||
columns: [
|
||||
{ name: 'id', type: 'integer', nullable: false, primary_key: true, comment: 'Order id' },
|
||||
{ name: 'customer_id', type: 'integer', nullable: false, primary_key: false, comment: null },
|
||||
],
|
||||
foreign_keys: [
|
||||
{
|
||||
from_column: 'customer_id',
|
||||
to_table: 'customers',
|
||||
to_column: 'id',
|
||||
constraint_name: 'orders_customer_id_fkey',
|
||||
},
|
||||
],
|
||||
},
|
||||
],
|
||||
};
|
||||
|
||||
describe('createDaemonLiveDatabaseIntrospection', () => {
|
||||
it('calls the database-introspect daemon command and maps the snapshot response', async () => {
|
||||
const runJson = vi.fn(async () => daemonResponse);
|
||||
const introspection = createDaemonLiveDatabaseIntrospection({
|
||||
connections: {
|
||||
warehouse: {
|
||||
driver: 'postgres',
|
||||
url: 'postgres://localhost:5432/warehouse',
|
||||
},
|
||||
},
|
||||
schemas: ['public'],
|
||||
runJson,
|
||||
});
|
||||
|
||||
await expect(introspection.extractSchema('warehouse')).resolves.toEqual({
|
||||
connectionId: 'warehouse',
|
||||
driver: 'postgres',
|
||||
extractedAt: '2026-04-28T10:00:00+00:00',
|
||||
scope: { schemas: ['public'] },
|
||||
metadata: { driver: 'postgres', schemas: ['public'] },
|
||||
tables: [
|
||||
{
|
||||
catalog: 'warehouse',
|
||||
db: 'public',
|
||||
name: 'customers',
|
||||
kind: 'table',
|
||||
comment: null,
|
||||
estimatedRows: null,
|
||||
columns: [
|
||||
{
|
||||
name: 'id',
|
||||
nativeType: 'integer',
|
||||
normalizedType: 'integer',
|
||||
dimensionType: 'number',
|
||||
nullable: false,
|
||||
primaryKey: true,
|
||||
comment: null,
|
||||
},
|
||||
],
|
||||
foreignKeys: [],
|
||||
},
|
||||
{
|
||||
catalog: 'warehouse',
|
||||
db: 'public',
|
||||
name: 'orders',
|
||||
kind: 'table',
|
||||
comment: 'Order facts',
|
||||
estimatedRows: null,
|
||||
columns: [
|
||||
{
|
||||
name: 'id',
|
||||
nativeType: 'integer',
|
||||
normalizedType: 'integer',
|
||||
dimensionType: 'number',
|
||||
nullable: false,
|
||||
primaryKey: true,
|
||||
comment: 'Order id',
|
||||
},
|
||||
{
|
||||
name: 'customer_id',
|
||||
nativeType: 'integer',
|
||||
normalizedType: 'integer',
|
||||
dimensionType: 'number',
|
||||
nullable: false,
|
||||
primaryKey: false,
|
||||
comment: null,
|
||||
},
|
||||
],
|
||||
foreignKeys: [
|
||||
{
|
||||
fromColumn: 'customer_id',
|
||||
toCatalog: null,
|
||||
toDb: null,
|
||||
toTable: 'customers',
|
||||
toColumn: 'id',
|
||||
constraintName: 'orders_customer_id_fkey',
|
||||
},
|
||||
],
|
||||
},
|
||||
],
|
||||
});
|
||||
|
||||
expect(runJson).toHaveBeenCalledWith('database-introspect', {
|
||||
connection_id: 'warehouse',
|
||||
driver: 'postgres',
|
||||
url: 'postgres://localhost:5432/warehouse',
|
||||
schemas: ['public'],
|
||||
statement_timeout_ms: 30_000,
|
||||
connection_timeout_seconds: 5,
|
||||
});
|
||||
});
|
||||
|
||||
it('calls a running daemon HTTP endpoint when baseUrl is configured', async () => {
|
||||
const requests: Array<{ url: string | undefined; body: unknown }> = [];
|
||||
const server = createServer((request, response) => {
|
||||
const chunks: Buffer[] = [];
|
||||
request.on('data', (chunk: Buffer) => chunks.push(chunk));
|
||||
request.on('end', () => {
|
||||
requests.push({
|
||||
url: request.url,
|
||||
body: JSON.parse(Buffer.concat(chunks).toString('utf8')),
|
||||
});
|
||||
response.writeHead(200, { 'content-type': 'application/json' });
|
||||
response.end(JSON.stringify(daemonResponse));
|
||||
});
|
||||
});
|
||||
|
||||
server.listen(0, '127.0.0.1');
|
||||
await once(server, 'listening');
|
||||
try {
|
||||
const address = server.address();
|
||||
if (!address || typeof address === 'string') {
|
||||
throw new Error('expected TCP server address');
|
||||
}
|
||||
const introspection = createDaemonLiveDatabaseIntrospection({
|
||||
connections: {
|
||||
warehouse: {
|
||||
driver: 'postgresql',
|
||||
url: 'postgres://localhost:5432/warehouse',
|
||||
},
|
||||
},
|
||||
baseUrl: `http://127.0.0.1:${address.port}`,
|
||||
});
|
||||
|
||||
await expect(introspection.extractSchema('warehouse')).resolves.toMatchObject({
|
||||
connectionId: 'warehouse',
|
||||
tables: [{ name: 'customers' }, { name: 'orders' }],
|
||||
});
|
||||
|
||||
expect(requests).toEqual([
|
||||
{
|
||||
url: '/database/introspect',
|
||||
body: {
|
||||
connection_id: 'warehouse',
|
||||
driver: 'postgres',
|
||||
url: 'postgres://localhost:5432/warehouse',
|
||||
schemas: ['public'],
|
||||
statement_timeout_ms: 30_000,
|
||||
connection_timeout_seconds: 5,
|
||||
},
|
||||
},
|
||||
]);
|
||||
} finally {
|
||||
server.close();
|
||||
}
|
||||
});
|
||||
|
||||
it('requires a configured postgres connection with a url', async () => {
|
||||
const introspection = createDaemonLiveDatabaseIntrospection({
|
||||
connections: {
|
||||
warehouse: {
|
||||
driver: 'postgres',
|
||||
},
|
||||
},
|
||||
runJson: vi.fn(async () => daemonResponse),
|
||||
});
|
||||
|
||||
await expect(introspection.extractSchema('warehouse')).rejects.toThrow(
|
||||
'Local live-database ingest requires connections.warehouse.url.',
|
||||
);
|
||||
});
|
||||
|
||||
it('rejects unsupported local connection drivers before calling the daemon', async () => {
|
||||
const runJson = vi.fn(async () => daemonResponse);
|
||||
const introspection = createDaemonLiveDatabaseIntrospection({
|
||||
connections: {
|
||||
warehouse: {
|
||||
driver: 'snowflake',
|
||||
url: 'snowflake://example',
|
||||
},
|
||||
},
|
||||
runJson,
|
||||
});
|
||||
|
||||
await expect(introspection.extractSchema('warehouse')).rejects.toThrow(
|
||||
'Local live-database ingest cannot run driver "snowflake".',
|
||||
);
|
||||
expect(runJson).not.toHaveBeenCalled();
|
||||
});
|
||||
|
||||
it('filters out tables not on the enabled_tables allowlist', async () => {
|
||||
const runJson = vi.fn(async () => daemonResponse);
|
||||
const introspection = createDaemonLiveDatabaseIntrospection({
|
||||
connections: {
|
||||
warehouse: {
|
||||
driver: 'postgres',
|
||||
url: 'postgres://localhost:5432/warehouse',
|
||||
enabled_tables: ['public.orders'],
|
||||
},
|
||||
},
|
||||
schemas: ['public'],
|
||||
runJson,
|
||||
});
|
||||
|
||||
const snapshot = await introspection.extractSchema('warehouse');
|
||||
expect(snapshot.tables.map((table) => `${table.db}.${table.name}`)).toEqual(['public.orders']);
|
||||
});
|
||||
|
||||
it('passes through every table when enabled_tables is empty', async () => {
|
||||
const runJson = vi.fn(async () => daemonResponse);
|
||||
const introspection = createDaemonLiveDatabaseIntrospection({
|
||||
connections: {
|
||||
warehouse: {
|
||||
driver: 'postgres',
|
||||
url: 'postgres://localhost:5432/warehouse',
|
||||
enabled_tables: [],
|
||||
},
|
||||
},
|
||||
schemas: ['public'],
|
||||
runJson,
|
||||
});
|
||||
|
||||
const snapshot = await introspection.extractSchema('warehouse');
|
||||
expect(snapshot.tables.map((table) => table.name)).toEqual(['customers', 'orders']);
|
||||
});
|
||||
});
|
||||
|
|
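The last two tests pin down the `enabled_tables` semantics: a non-empty list acts as an allowlist of `schema.table` names, while an empty list leaves the snapshot untouched. The real logic lives in `resolveEnabledTables` and `filterSnapshotTables`, which are not shown in this diff, so the following is only a minimal sketch of that behavior. `TableRef`, `resolveAllowlist`, and `filterTables` are hypothetical names, not the project's API.

```typescript
// Hypothetical shape for illustration; the project's real snapshot types are richer.
interface TableRef {
  db: string | null;
  name: string;
}

// An omitted or empty list means "no allowlist", mirroring the pass-through test.
function resolveAllowlist(enabledTables?: string[]): Set<string> | undefined {
  return enabledTables && enabledTables.length > 0 ? new Set(enabledTables) : undefined;
}

// Keep only tables whose qualified name is on the allowlist (assumed format: `schema.table`).
function filterTables<T extends TableRef>(tables: T[], allowlist?: Set<string>): T[] {
  if (!allowlist) {
    return tables;
  }
  return tables.filter((table) => allowlist.has(table.db ? `${table.db}.${table.name}` : table.name));
}

const tables: TableRef[] = [
  { db: 'public', name: 'customers' },
  { db: 'public', name: 'orders' },
];

const filtered = filterTables(tables, resolveAllowlist(['public.orders']));
const untouched = filterTables(tables, resolveAllowlist([]));
```

Returning `undefined` for an empty list keeps "no filter configured" distinct from "filter that matches nothing", which is the distinction the two tests exercise.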
@ -0,0 +1,256 @@
|
|||
import { spawn } from 'node:child_process';
|
||||
import { request as httpRequest } from 'node:http';
|
||||
import { request as httpsRequest } from 'node:https';
|
||||
import { URL } from 'node:url';
|
||||
import type { KtxProjectConnectionConfig } from '../../../project/config.js';
|
||||
import { filterSnapshotTables, resolveEnabledTables } from '../../../scan/enabled-tables.js';
|
||||
import type { KtxSchemaColumn, KtxSchemaForeignKey, KtxSchemaSnapshot, KtxSchemaTable } from '../../../scan/types.js';
|
||||
import { inferKtxDimensionType, normalizeKtxNativeType } from '../../../scan/type-normalization.js';
|
||||
import type { LiveDatabaseIntrospectionPort } from './types.js';
|
||||
|
||||
type KtxDaemonDatabaseIntrospectionCommand = 'database-introspect';
|
||||
|
||||
type KtxDaemonDatabaseJsonRunner = (
|
||||
subcommand: KtxDaemonDatabaseIntrospectionCommand,
|
||||
payload: Record<string, unknown>,
|
||||
) => Promise<Record<string, unknown>>;
|
||||
|
||||
export type KtxDaemonDatabaseHttpJsonRunner = (
|
||||
path: string,
|
||||
payload: Record<string, unknown>,
|
||||
) => Promise<Record<string, unknown>>;
|
||||
|
||||
export interface DaemonLiveDatabaseIntrospectionOptions {
|
||||
connections: Record<string, KtxProjectConnectionConfig>;
|
||||
schemas?: string[];
|
||||
statementTimeoutMs?: number;
|
||||
connectionTimeoutSeconds?: number;
|
||||
command?: string;
|
||||
args?: string[];
|
||||
cwd?: string;
|
||||
env?: NodeJS.ProcessEnv;
|
||||
baseUrl?: string;
|
||||
runJson?: KtxDaemonDatabaseJsonRunner;
|
||||
requestJson?: KtxDaemonDatabaseHttpJsonRunner;
|
||||
now?: () => Date;
|
||||
}
|
||||
|
||||
const DEFAULT_SCHEMAS = ['public'];
|
||||
|
||||
function parseJsonObject(raw: string, subcommand: string): Record<string, unknown> {
|
||||
const parsed = JSON.parse(raw) as unknown;
|
||||
if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) {
|
||||
throw new Error(`ktx-daemon ${subcommand} returned non-object JSON`);
|
||||
}
|
||||
return parsed as Record<string, unknown>;
|
||||
}
|
||||
|
||||
function runProcessJson(
|
||||
options: Required<Pick<DaemonLiveDatabaseIntrospectionOptions, 'command' | 'args'>> &
|
||||
Pick<DaemonLiveDatabaseIntrospectionOptions, 'cwd' | 'env'>,
|
||||
): KtxDaemonDatabaseJsonRunner {
|
||||
return async (subcommand, payload) =>
|
||||
new Promise((resolve, reject) => {
|
||||
const child = spawn(options.command, [...options.args, subcommand], {
|
||||
cwd: options.cwd,
|
||||
env: { ...process.env, ...options.env },
|
||||
stdio: ['pipe', 'pipe', 'pipe'],
|
||||
});
|
||||
const stdout: Buffer[] = [];
|
||||
const stderr: Buffer[] = [];
|
||||
|
||||
child.stdout.on('data', (chunk: Buffer) => stdout.push(chunk));
|
||||
child.stderr.on('data', (chunk: Buffer) => stderr.push(chunk));
|
||||
child.on('error', reject);
|
||||
child.on('close', (code) => {
|
||||
const stdoutText = Buffer.concat(stdout).toString('utf8').trim();
|
||||
const stderrText = Buffer.concat(stderr).toString('utf8').trim();
|
||||
if (code !== 0) {
|
||||
reject(new Error(`ktx-daemon ${subcommand} failed: ${stderrText || `exit code ${code}`}`));
|
||||
return;
|
||||
}
|
||||
try {
|
||||
resolve(parseJsonObject(stdoutText, subcommand));
|
||||
} catch (error) {
|
||||
reject(error);
|
||||
}
|
||||
});
|
||||
child.stdin.end(`${JSON.stringify(payload)}\n`);
|
||||
});
|
||||
}
|
||||
|
||||
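// Ensure a trailing slash so relative paths resolve under the base path instead of replacing its last segment.
function normalizedBaseUrl(baseUrl: string): string {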
|
||||
return baseUrl.endsWith('/') ? baseUrl : `${baseUrl}/`;
|
||||
}
|
||||
|
||||
function postJson(baseUrl: string): KtxDaemonDatabaseHttpJsonRunner {
|
||||
return async (path, payload) =>
|
||||
new Promise((resolve, reject) => {
|
||||
const target = new URL(path.replace(/^\//, ''), normalizedBaseUrl(baseUrl));
|
||||
const body = JSON.stringify(payload);
|
||||
const client = target.protocol === 'https:' ? httpsRequest : httpRequest;
|
||||
const request = client(
|
||||
target,
|
||||
{
|
||||
method: 'POST',
|
||||
headers: {
|
||||
accept: 'application/json',
|
||||
'content-type': 'application/json',
|
||||
'content-length': Buffer.byteLength(body),
|
||||
},
|
||||
},
|
||||
(response) => {
|
||||
const chunks: Buffer[] = [];
|
||||
response.on('data', (chunk: Buffer) => chunks.push(chunk));
|
||||
response.on('end', () => {
|
||||
const text = Buffer.concat(chunks).toString('utf8');
|
||||
const statusCode = response.statusCode ?? 0;
|
||||
if (statusCode < 200 || statusCode >= 300) {
|
||||
reject(new Error(`ktx-daemon HTTP ${path} failed with ${statusCode}: ${text}`));
|
||||
return;
|
||||
}
|
||||
try {
|
||||
resolve(parseJsonObject(text, path));
|
||||
} catch (error) {
|
||||
reject(error);
|
||||
}
|
||||
});
|
||||
},
|
||||
);
|
||||
request.on('error', reject);
|
||||
request.end(body);
|
||||
});
|
||||
}
|
||||
|
||||
function recordValue(value: unknown): Record<string, unknown> {
|
||||
return value && typeof value === 'object' && !Array.isArray(value) ? (value as Record<string, unknown>) : {};
|
||||
}
|
||||
|
||||
function recordArray(value: unknown): Array<Record<string, unknown>> {
|
||||
return Array.isArray(value)
|
||||
? value.filter(
|
||||
(item): item is Record<string, unknown> => item !== null && typeof item === 'object' && !Array.isArray(item),
|
||||
)
|
||||
: [];
|
||||
}
|
||||
|
||||
function requiredString(value: unknown, field: string): string {
|
||||
if (typeof value !== 'string' || value.length === 0) {
|
||||
throw new Error(`ktx-daemon database introspection response is missing string field ${field}`);
|
||||
}
|
||||
return value;
|
||||
}
|
||||
|
||||
function nullableString(value: unknown): string | null {
|
||||
return typeof value === 'string' ? value : null;
|
||||
}
|
||||
|
||||
function optionalString(value: unknown): string | undefined {
|
||||
return typeof value === 'string' ? value : undefined;
|
||||
}
|
||||
|
||||
// Trim and lowercase the driver name, and map the "postgresql" alias to "postgres".
function normalizeDriver(driver: unknown): string {
|
||||
const normalized = String(driver ?? '').trim().toLowerCase();
|
||||
return normalized === 'postgresql' ? 'postgres' : normalized;
|
||||
}
|
||||
|
||||
function requirePostgresConnection(
|
||||
connections: Record<string, KtxProjectConnectionConfig>,
|
||||
connectionId: string,
|
||||
): KtxProjectConnectionConfig & { url: string } {
|
||||
const connection = connections[connectionId];
|
||||
const driver = normalizeDriver(connection?.driver);
|
||||
if (driver !== 'postgres') {
|
||||
throw new Error(`Local live-database ingest cannot run driver "${connection?.driver ?? 'unknown'}".`);
|
||||
}
|
||||
if (typeof connection.url !== 'string' || connection.url.trim().length === 0) {
|
||||
throw new Error(`Local live-database ingest requires connections.${connectionId}.url.`);
|
||||
}
|
||||
return connection as KtxProjectConnectionConfig & { url: string };
|
||||
}
|
||||
|
||||
function mapColumn(raw: Record<string, unknown>): KtxSchemaColumn {
|
||||
const nativeType = requiredString(raw.type, 'tables[].columns[].type');
|
||||
return {
|
||||
name: requiredString(raw.name, 'tables[].columns[].name'),
|
||||
nativeType,
|
||||
normalizedType: normalizeKtxNativeType(nativeType),
|
||||
dimensionType: inferKtxDimensionType(nativeType),
|
||||
nullable: raw.nullable !== false,
|
||||
primaryKey: raw.primary_key === true,
|
||||
comment: nullableString(raw.comment),
|
||||
};
|
||||
}
|
||||
|
||||
function mapForeignKey(raw: Record<string, unknown>): KtxSchemaForeignKey {
|
||||
return {
|
||||
fromColumn: requiredString(raw.from_column, 'tables[].foreign_keys[].from_column'),
|
||||
toCatalog: null,
|
||||
toDb: null,
|
||||
toTable: requiredString(raw.to_table, 'tables[].foreign_keys[].to_table'),
|
||||
toColumn: requiredString(raw.to_column, 'tables[].foreign_keys[].to_column'),
|
||||
constraintName: nullableString(raw.constraint_name),
|
||||
};
|
||||
}
|
||||
|
||||
function mapTable(raw: Record<string, unknown>): KtxSchemaTable {
|
||||
return {
|
||||
catalog: nullableString(raw.catalog),
|
||||
db: nullableString(raw.db),
|
||||
name: requiredString(raw.name, 'tables[].name'),
|
||||
kind: 'table',
|
||||
comment: nullableString(raw.comment),
|
||||
estimatedRows: null,
|
||||
columns: recordArray(raw.columns).map(mapColumn),
|
||||
foreignKeys: recordArray(raw.foreign_keys).map(mapForeignKey),
|
||||
};
|
||||
}
|
||||
|
||||
function mapDaemonSnapshot(
|
||||
raw: Record<string, unknown>,
|
||||
input: { connectionId: string; extractedAt: string; schemas: string[] },
|
||||
): KtxSchemaSnapshot {
|
||||
return {
|
||||
// requiredString throws on a missing or empty value, so no fallback to input.connectionId is reachable here.
connectionId: requiredString(raw.connection_id, 'connection_id'),
|
||||
driver: 'postgres',
|
||||
extractedAt: optionalString(raw.extracted_at) ?? input.extractedAt,
|
||||
scope: { schemas: input.schemas },
|
||||
metadata: recordValue(raw.metadata),
|
||||
tables: recordArray(raw.tables).map(mapTable),
|
||||
};
|
||||
}
|
||||
|
||||
export function createDaemonLiveDatabaseIntrospection(
|
||||
options: DaemonLiveDatabaseIntrospectionOptions,
|
||||
): LiveDatabaseIntrospectionPort {
|
||||
const schemas = options.schemas ?? DEFAULT_SCHEMAS;
|
||||
const command = options.command ?? 'python';
|
||||
const args = options.args ?? ['-m', 'ktx_daemon'];
|
||||
const runJson = options.runJson ?? runProcessJson({ command, args, cwd: options.cwd, env: options.env });
|
||||
const requestJson = options.requestJson ?? (options.baseUrl ? postJson(options.baseUrl) : undefined);
|
||||
const now = options.now ?? (() => new Date());
|
||||
|
||||
return {
|
||||
async extractSchema(connectionId: string): Promise<KtxSchemaSnapshot> {
|
||||
const connection = requirePostgresConnection(options.connections, connectionId);
|
||||
const payload = {
|
||||
connection_id: connectionId,
|
||||
driver: normalizeDriver(connection.driver),
|
||||
url: connection.url,
|
||||
schemas,
|
||||
statement_timeout_ms: options.statementTimeoutMs ?? 30_000,
|
||||
connection_timeout_seconds: options.connectionTimeoutSeconds ?? 5,
|
||||
};
|
||||
const raw = requestJson
|
||||
? await requestJson('/database/introspect', payload)
|
||||
: await runJson('database-introspect', payload);
|
||||
const snapshot = mapDaemonSnapshot(raw, {
|
||||
connectionId,
|
||||
extractedAt: now().toISOString(),
|
||||
schemas,
|
||||
});
|
||||
const enabledTables = resolveEnabledTables(connection);
|
||||
return enabledTables ? filterSnapshotTables(snapshot, enabledTables) : snapshot;
|
||||
},
|
||||
};
|
||||
}
|
||||
|
|
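`postJson` strips the leading slash from the request path and adds a trailing slash to the base URL before handing both to `new URL`. Under standard URL resolution this keeps any path prefix on `baseUrl` instead of replacing it. A minimal sketch of the difference follows; `resolveDaemonUrl` and the `http://daemon.local/ktx` address are hypothetical, chosen only for illustration.

```typescript
function normalizedBaseUrl(baseUrl: string): string {
  return baseUrl.endsWith('/') ? baseUrl : `${baseUrl}/`;
}

// Hypothetical helper mirroring the URL construction inside postJson.
function resolveDaemonUrl(baseUrl: string, path: string): string {
  return new URL(path.replace(/^\//, ''), normalizedBaseUrl(baseUrl)).toString();
}

// An absolute path replaces the base path entirely, dropping the `/ktx` prefix.
const naive = new URL('/database/introspect', 'http://daemon.local/ktx').toString();

// A relative path plus a trailing slash on the base keeps the prefix.
const prefixed = resolveDaemonUrl('http://daemon.local/ktx', '/database/introspect');

// Without a prefix on the base URL the result is the plain endpoint.
const plain = resolveDaemonUrl('http://127.0.0.1:8080', '/database/introspect');
```

This matters mainly when the daemon sits behind a reverse proxy or is mounted under a sub-path; for a bare `host:port` base, as in the test server above, both forms produce the same URL.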
@ -0,0 +1,59 @@
|
|||
import { mkdtemp } from 'node:fs/promises';
|
||||
import { tmpdir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
import { describe, expect, it, vi } from 'vitest';
|
||||
import { LiveDatabaseSourceAdapter } from './live-database.adapter.js';
|
||||
|
||||
describe('LiveDatabaseSourceAdapter', () => {
|
||||
it('fetches a schema snapshot through the introspection port', async () => {
|
||||
const extractSchema = vi.fn().mockResolvedValue({
|
||||
connectionId: 'conn-1',
|
||||
driver: 'postgres',
|
||||
extractedAt: '2026-04-27T00:00:00.000Z',
|
||||
scope: { schemas: ['public'] },
|
||||
metadata: {},
|
||||
tables: [
|
||||
{
|
||||
name: 'orders',
|
||||
catalog: null,
|
||||
db: 'public',
|
||||
kind: 'table',
|
||||
comment: null,
|
||||
estimatedRows: null,
|
||||
columns: [
|
||||
{
|
||||
name: 'id',
|
||||
nativeType: 'integer',
|
||||
normalizedType: 'integer',
|
||||
dimensionType: 'number',
|
||||
nullable: false,
|
||||
primaryKey: true,
|
||||
comment: null,
|
||||
},
|
||||
],
|
||||
foreignKeys: [],
|
||||
},
|
||||
],
|
||||
});
|
||||
const adapter = new LiveDatabaseSourceAdapter({
|
||||
introspection: { extractSchema },
|
||||
now: () => new Date('2026-04-27T00:00:00.000Z'),
|
||||
});
|
||||
const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-adapter-'));
|
||||
|
||||
await adapter.fetch(undefined, dir, { connectionId: 'conn-1', sourceKey: 'live-database' });
|
||||
|
||||
expect(extractSchema).toHaveBeenCalledWith('conn-1');
|
||||
await expect(adapter.detect(dir)).resolves.toBe(true);
|
||||
const chunked = await adapter.chunk(dir);
|
||||
expect(chunked.workUnits.map((wu) => wu.unitKey)).toEqual(['live-database-public-orders']);
|
||||
});
|
||||
|
||||
it('declares the live database source and skill', () => {
|
||||
const adapter = new LiveDatabaseSourceAdapter({
|
||||
introspection: { extractSchema: vi.fn() },
|
||||
});
|
||||
expect(adapter.source).toBe('live-database');
|
||||
expect(adapter.skillNames).toEqual(['live_database_ingest']);
|
||||
});
|
||||
});
|
||||
|
|
@ -0,0 +1,28 @@
|
|||
import type { ChunkResult, DiffSet, FetchContext, SourceAdapter } from '../../types.js';
|
||||
import { chunkLiveDatabaseStagedDir } from './chunk.js';
|
||||
import { detectLiveDatabaseStagedDir, writeLiveDatabaseSnapshot } from './stage.js';
|
||||
import type { LiveDatabaseSourceAdapterDeps } from './types.js';
|
||||
|
||||
export class LiveDatabaseSourceAdapter implements SourceAdapter {
|
||||
readonly source = 'live-database';
|
||||
readonly skillNames = ['live_database_ingest'];
|
||||
|
||||
constructor(private readonly deps: LiveDatabaseSourceAdapterDeps) {}
|
||||
|
||||
detect(stagedDir: string): Promise<boolean> {
|
||||
return detectLiveDatabaseStagedDir(stagedDir);
|
||||
}
|
||||
|
||||
async fetch(_pullConfig: unknown, stagedDir: string, ctx: FetchContext): Promise<void> {
|
||||
const snapshot = await this.deps.introspection.extractSchema(ctx.connectionId);
|
||||
await writeLiveDatabaseSnapshot(stagedDir, {
|
||||
...snapshot,
|
||||
connectionId: ctx.connectionId,
|
||||
extractedAt: snapshot.extractedAt ?? (this.deps.now ?? (() => new Date()))().toISOString(),
|
||||
});
|
||||
}
|
||||
|
||||
chunk(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
|
||||
return chunkLiveDatabaseStagedDir(stagedDir, diffSet);
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,308 @@
|
|||
import { describe, expect, it } from 'vitest';
|
||||
import {
|
||||
buildLiveDatabaseManifestShards,
|
||||
type LiveDatabaseManifestExistingDescriptions,
|
||||
type LiveDatabaseManifestJoinEntry,
|
||||
type LiveDatabaseManifestShard,
|
||||
} from './manifest.js';
|
||||
|
||||
function shardObject(shards: Map<string, LiveDatabaseManifestShard>): Record<string, LiveDatabaseManifestShard> {
|
||||
return Object.fromEntries([...shards.entries()].sort(([a], [b]) => a.localeCompare(b)));
|
||||
}
|
||||
|
||||
describe('buildLiveDatabaseManifestShards', () => {
|
||||
it('builds shard objects with generated joins and preserved external descriptions', () => {
|
||||
const existingDescriptions = new Map<string, LiveDatabaseManifestExistingDescriptions>([
|
||||
[
|
||||
'orders',
|
||||
{
|
||||
table: { user: 'Pinned analyst description', db: 'Old db description' },
|
||||
columns: new Map([['id', { user: 'Pinned id description', db: 'Old id description' }]]),
|
||||
},
|
||||
],
|
||||
]);
|
||||
|
||||
const preservedJoins = new Map<string, LiveDatabaseManifestJoinEntry[]>([
|
||||
[
|
||||
'orders',
|
||||
[
|
||||
{
|
||||
to: 'customers',
|
||||
on: 'orders.account_id = customers.id',
|
||||
relationship: 'many_to_one',
|
||||
source: 'manual',
|
||||
},
|
||||
{
|
||||
to: 'missing_accounts',
|
||||
on: 'orders.account_id = missing_accounts.id',
|
||||
relationship: 'many_to_one',
|
||||
source: 'manual',
|
||||
},
|
||||
],
|
||||
],
|
||||
]);
|
||||
|
||||
const result = buildLiveDatabaseManifestShards({
|
||||
connectionType: 'POSTGRESQL',
|
||||
mapColumnType: (nativeType) => nativeType.toLowerCase(),
|
||||
existingDescriptions,
|
||||
existingPreservedJoins: preservedJoins,
|
||||
tables: [
|
||||
{
|
||||
name: 'orders',
|
||||
catalog: null,
|
||||
db: 'public',
|
||||
descriptions: { db: 'Fresh db description', ai: 'Generated AI description' },
|
||||
columns: [
|
||||
{
|
||||
name: 'id',
|
||||
type: 'INTEGER',
|
||||
pk: true,
|
||||
nullable: false,
|
||||
descriptions: { db: 'Fresh id description' },
|
||||
},
|
||||
{
|
||||
name: 'customer_id',
|
||||
type: 'INTEGER',
|
||||
},
|
||||
],
|
||||
},
|
||||
{
|
||||
name: 'customers',
|
||||
catalog: null,
|
||||
db: 'public',
|
||||
columns: [
|
||||
{
|
||||
name: 'id',
|
||||
type: 'INTEGER',
|
||||
pk: true,
|
||||
nullable: false,
|
||||
},
|
||||
],
|
||||
},
|
||||
],
|
||||
joins: [
|
||||
{
|
||||
fromTable: 'orders',
|
||||
fromColumns: ['customer_id'],
|
||||
toTable: 'customers',
|
||||
toColumns: ['id'],
|
||||
relationship: 'MANY_TO_ONE',
|
||||
source: 'formal',
|
||||
},
|
||||
],
|
||||
});
|
||||
|
||||
expect(result.tablesProcessed).toBe(2);
|
||||
expect(shardObject(result.shards)).toEqual({
|
||||
public: {
|
||||
tables: {
|
||||
orders: {
|
||||
table: 'public.orders',
|
||||
descriptions: {
|
||||
user: 'Pinned analyst description',
|
||||
db: 'Fresh db description',
|
||||
ai: 'Generated AI description',
|
||||
},
|
||||
columns: [
|
||||
{
|
||||
name: 'id',
|
||||
type: 'integer',
|
||||
pk: true,
|
||||
nullable: false,
|
||||
descriptions: {
|
||||
user: 'Pinned id description',
|
||||
db: 'Fresh id description',
|
||||
},
|
||||
},
|
||||
{
|
||||
name: 'customer_id',
|
||||
type: 'integer',
|
||||
},
|
||||
],
|
||||
joins: [
|
||||
{
|
||||
to: 'customers',
|
||||
on: 'orders.customer_id = customers.id',
|
||||
relationship: 'many_to_one',
|
||||
source: 'formal',
|
||||
},
|
||||
{
|
||||
to: 'customers',
|
||||
on: 'orders.account_id = customers.id',
|
||||
relationship: 'many_to_one',
|
||||
source: 'manual',
|
||||
},
|
||||
],
|
||||
},
|
||||
customers: {
|
||||
table: 'public.customers',
|
||||
columns: [
|
||||
{
|
||||
name: 'id',
|
||||
type: 'integer',
|
||||
pk: true,
|
||||
nullable: false,
|
||||
},
|
||||
],
|
||||
joins: [
|
||||
{
|
||||
to: 'orders',
|
||||
on: 'customers.id = orders.customer_id',
|
||||
relationship: 'one_to_many',
|
||||
source: 'formal',
|
||||
},
|
||||
],
|
||||
},
|
||||
},
|
||||
},
|
||||
});
|
||||
});
|
||||
|
||||
it('uses warehouse and schema shard keys for snowflake-style connections', () => {
|
||||
const result = buildLiveDatabaseManifestShards({
|
||||
connectionType: 'SNOWFLAKE',
|
||||
mapColumnType: (nativeType) => nativeType.toLowerCase(),
|
||||
tables: [
|
||||
{
|
||||
name: 'accounts',
|
||||
catalog: 'ANALYTICS',
|
||||
db: 'CORE',
|
||||
columns: [{ name: 'id', type: 'NUMBER' }],
|
||||
},
|
||||
],
|
||||
joins: [],
|
||||
});
|
||||
|
||||
expect(shardObject(result.shards)).toEqual({
|
||||
'ANALYTICS.CORE': {
|
||||
tables: {
|
||||
accounts: {
|
||||
table: 'ANALYTICS.CORE.accounts',
|
||||
columns: [{ name: 'id', type: 'number' }],
|
||||
},
|
||||
},
|
||||
},
|
||||
});
|
||||
});
|
||||
|
||||
it('preserves external usage keys while replacing historic SQL managed keys', () => {
|
||||
const existingUsage = new Map([
|
||||
[
|
||||
'orders',
|
||||
{
|
||||
narrative: 'Old generated usage narrative.',
|
||||
frequencyTier: 'low' as const,
|
||||
commonFilters: ['old_status'],
|
||||
commonJoins: [],
|
||||
ownerNote: 'Pinned analyst note',
|
||||
},
|
||||
],
|
||||
]);
|
||||
|
||||
const result = buildLiveDatabaseManifestShards({
|
||||
connectionType: 'POSTGRESQL',
|
||||
mapColumnType: (nativeType) => nativeType.toLowerCase(),
|
||||
existingUsage,
|
||||
tables: [
|
||||
{
|
||||
name: 'orders',
|
||||
catalog: null,
|
||||
db: 'public',
|
||||
usage: {
|
||||
narrative: 'Fresh generated usage narrative.',
|
||||
frequencyTier: 'high',
|
||||
commonFilters: ['status'],
|
||||
commonGroupBys: ['created_at'],
|
||||
commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
|
||||
},
|
||||
columns: [{ name: 'id', type: 'INTEGER' }],
|
||||
},
|
||||
],
|
||||
joins: [],
|
||||
});
|
||||
|
||||
expect(shardObject(result.shards)).toEqual({
|
||||
public: {
|
||||
tables: {
|
||||
orders: {
|
||||
table: 'public.orders',
|
||||
usage: {
|
||||
ownerNote: 'Pinned analyst note',
|
||||
narrative: 'Fresh generated usage narrative.',
|
||||
frequencyTier: 'high',
|
||||
commonFilters: ['status'],
|
||||
commonGroupBys: ['created_at'],
|
||||
commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
|
||||
},
|
||||
columns: [{ name: 'id', type: 'integer' }],
|
||||
},
|
||||
},
|
||||
},
|
||||
});
|
||||
});
|
||||
|
||||
it('renders ordered multi-column joins in both directions', () => {
|
||||
const result = buildLiveDatabaseManifestShards({
|
||||
connectionType: 'POSTGRESQL',
|
||||
mapColumnType: (nativeType) => nativeType,
|
||||
tables: [
|
||||
{
|
||||
name: 'order_lines',
|
||||
catalog: null,
|
||||
db: 'public',
|
||||
columns: [
|
||||
{ name: 'order_id', type: 'integer' },
|
||||
{ name: 'line_number', type: 'integer' },
|
||||
],
|
||||
},
|
||||
{
|
||||
name: 'order_line_allocations',
|
||||
catalog: null,
|
||||
db: 'public',
|
||||
columns: [
|
||||
{ name: 'order_id', type: 'integer' },
|
||||
{ name: 'line_number', type: 'integer' },
|
||||
],
|
||||
},
|
||||
],
|
||||
joins: [
|
||||
{
|
||||
fromTable: 'order_line_allocations',
|
||||
fromColumns: ['order_id', 'line_number'],
|
||||
toTable: 'order_lines',
|
||||
toColumns: ['order_id', 'line_number'],
|
||||
relationship: 'many_to_one',
|
||||
source: 'inferred',
|
||||
},
|
||||
],
|
||||
});
|
||||
|
||||
expect(shardObject(result.shards)).toMatchObject({
|
||||
public: {
|
||||
tables: {
|
||||
order_line_allocations: {
|
||||
joins: [
|
||||
{
|
||||
to: 'order_lines',
|
||||
on: 'order_line_allocations.order_id = order_lines.order_id AND order_line_allocations.line_number = order_lines.line_number',
|
||||
relationship: 'many_to_one',
|
||||
source: 'inferred',
|
||||
},
|
||||
],
|
||||
},
|
||||
order_lines: {
|
||||
joins: [
|
||||
{
|
||||
to: 'order_line_allocations',
|
||||
on: 'order_lines.order_id = order_line_allocations.order_id AND order_lines.line_number = order_line_allocations.line_number',
|
||||
relationship: 'one_to_many',
|
||||
source: 'inferred',
|
||||
},
|
||||
],
|
||||
},
|
||||
},
|
||||
},
|
||||
});
|
||||
});
|
||||
});
|
||||
|
|
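The tests above rely on two small rules: existing descriptions under the scan-managed keys (`db`, `ai`) are dropped in favor of incoming values while other keys such as `user` survive, and each join is rendered once per direction with its column pairs kept in order. The sketch below restates both rules in isolation. `mergeDescriptions` and `renderJoin` are simplified stand-ins written for illustration, not the functions exported by `manifest.js`.

```typescript
const MANAGED_KEYS = new Set(['db', 'ai']);

// Keep externally authored keys from `existing`, then apply `incoming` on top.
function mergeDescriptions(
  existing: Record<string, string>,
  incoming: Record<string, string>,
): Record<string, string> {
  const result: Record<string, string> = {};
  for (const [key, value] of Object.entries(existing)) {
    if (!MANAGED_KEYS.has(key)) {
      result[key] = value;
    }
  }
  return Object.assign(result, incoming);
}

// Render `left.col = right.col` pairs joined with AND; tuples must have equal, non-zero width.
function renderJoin(left: string, leftCols: string[], right: string, rightCols: string[]): string {
  if (leftCols.length === 0 || leftCols.length !== rightCols.length) {
    throw new Error(`Invalid join from ${left} to ${right}`);
  }
  return leftCols.map((col, i) => `${left}.${col} = ${right}.${rightCols[i]}`).join(' AND ');
}

const merged = mergeDescriptions(
  { user: 'Pinned analyst description', db: 'Old db description' },
  { db: 'Fresh db description', ai: 'Generated AI description' },
);

const keyColumns = ['order_id', 'line_number'];
const forward = renderJoin('order_line_allocations', keyColumns, 'order_lines', keyColumns);
const reverse = renderJoin('order_lines', keyColumns, 'order_line_allocations', keyColumns);
```

Rendering the reverse direction by swapping the arguments, rather than rewriting the forward string, keeps the column order identical on both sides of a composite key.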
@ -0,0 +1,310 @@
|
|||
import type { TableUsageOutput } from '../historic-sql/skill-schemas.js';
|
||||
|
||||
const RELATIONSHIP_MAP: Record<string, string> = {
|
||||
MANY_TO_ONE: 'many_to_one',
|
||||
ONE_TO_MANY: 'one_to_many',
|
||||
ONE_TO_ONE: 'one_to_one',
|
||||
};
|
||||
|
||||
const RELATIONSHIP_INVERSE: Record<string, string> = {
|
||||
many_to_one: 'one_to_many',
|
||||
one_to_many: 'many_to_one',
|
||||
one_to_one: 'one_to_one',
|
||||
};
|
||||
|
||||
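// Description keys owned by the scan: existing values under these keys are dropped on merge; all other keys are preserved.
const SCAN_MANAGED_DESCRIPTION_KEYS = new Set(['db', 'ai']);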
|
||||
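// Usage keys owned by historic-SQL ingest: existing values under these keys are dropped whenever incoming usage is provided.
const HISTORIC_SQL_MANAGED_USAGE_KEYS = new Set([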
|
||||
'narrative',
|
||||
'frequencyTier',
|
||||
'commonFilters',
|
||||
'commonGroupBys',
|
||||
'commonJoins',
|
||||
'staleSince',
|
||||
]);
|
||||
|
||||
interface LiveDatabaseManifestColumn {
|
||||
name: string;
|
||||
type: string;
|
||||
pk?: boolean;
|
||||
nullable?: boolean;
|
||||
descriptions?: Record<string, string>;
|
||||
}
|
||||
|
||||
export interface LiveDatabaseManifestJoinEntry {
|
||||
to: string;
|
||||
on: string;
|
||||
relationship: string;
|
||||
source: string;
|
||||
}
|
||||
|
||||
interface LiveDatabaseManifestTableEntry {
|
||||
table: string;
|
||||
descriptions?: Record<string, string>;
|
||||
usage?: TableUsageOutput;
|
||||
columns: LiveDatabaseManifestColumn[];
|
||||
joins?: LiveDatabaseManifestJoinEntry[];
|
||||
}
|
||||
|
||||
export interface LiveDatabaseManifestShard {
|
||||
tables: Record<string, LiveDatabaseManifestTableEntry>;
|
||||
}
|
||||
|
||||
export interface LiveDatabaseManifestTableData {
|
||||
name: string;
|
||||
catalog: string | null;
|
||||
db: string | null;
|
||||
descriptions?: Record<string, string>;
|
||||
usage?: TableUsageOutput;
|
||||
columns: Array<{
|
||||
name: string;
|
||||
type: string;
|
||||
pk?: boolean;
|
||||
nullable?: boolean;
|
||||
descriptions?: Record<string, string>;
|
||||
}>;
|
||||
}
|
||||
|
||||
export interface LiveDatabaseManifestJoinData {
|
||||
fromTable: string;
|
||||
fromColumns: string[];
|
||||
toTable: string;
|
||||
toColumns: string[];
|
||||
relationship: string;
|
||||
source: 'formal' | 'inferred' | 'manual';
|
||||
}
|
||||
|
||||
export interface LiveDatabaseManifestExistingDescriptions {
|
||||
table?: Record<string, string>;
|
||||
columns: Map<string, Record<string, string>>;
|
||||
}
|
||||
|
||||
export interface BuildLiveDatabaseManifestShardsInput {
|
||||
connectionType: string;
|
||||
tables: LiveDatabaseManifestTableData[];
|
||||
joins: LiveDatabaseManifestJoinData[];
|
||||
mapColumnType: (nativeType: string) => string;
|
||||
existingPreservedJoins?: Map<string, LiveDatabaseManifestJoinEntry[]>;
|
||||
existingDescriptions?: Map<string, LiveDatabaseManifestExistingDescriptions>;
|
||||
existingUsage?: Map<string, TableUsageOutput>;
|
||||
}
|
||||
|
||||
export interface BuildLiveDatabaseManifestShardsResult {
|
||||
shards: Map<string, LiveDatabaseManifestShard>;
|
||||
tablesProcessed: number;
|
||||
}
|
||||
|
||||
function mergeDescriptionsPreservingExternal(
|
||||
existing: Record<string, string> | undefined,
|
||||
incoming: Record<string, string> | undefined,
|
||||
): Record<string, string> | undefined {
|
||||
if (!existing && !incoming) {
|
||||
return undefined;
|
||||
}
|
||||
const result: Record<string, string> = {};
|
||||
if (existing) {
|
||||
for (const [key, value] of Object.entries(existing)) {
|
||||
if (!SCAN_MANAGED_DESCRIPTION_KEYS.has(key)) {
|
||||
result[key] = value;
|
||||
}
|
||||
}
|
||||
}
|
||||
if (incoming) {
|
||||
Object.assign(result, incoming);
|
||||
}
|
||||
return Object.keys(result).length > 0 ? result : undefined;
|
||||
}
|
||||
|
||||
export function mergeUsagePreservingExternal(
|
||||
existing: TableUsageOutput | undefined,
|
||||
incoming: TableUsageOutput | undefined,
|
||||
): TableUsageOutput | undefined {
|
||||
if (!existing && !incoming) {
|
||||
return undefined;
|
||||
}
|
||||
if (!incoming) {
|
||||
return existing ? { ...existing } : undefined;
|
||||
}
|
||||
const result: Record<string, unknown> = {};
|
||||
if (existing) {
|
||||
for (const [key, value] of Object.entries(existing)) {
|
||||
if (!HISTORIC_SQL_MANAGED_USAGE_KEYS.has(key)) {
|
||||
result[key] = value;
|
||||
}
|
||||
}
|
||||
}
|
||||
Object.assign(result, incoming);
|
||||
return Object.keys(result).length > 0 ? (result as TableUsageOutput) : undefined;
|
||||
}
|
||||
|
||||
function getShardKey(connectionType: string, catalog: string | null, db: string | null): string {
|
||||
const normalized = connectionType.toUpperCase();
|
||||
|
||||
switch (normalized) {
|
||||
case 'SNOWFLAKE':
|
||||
case 'DATABRICKS': {
|
||||
const catalogPart = catalog ?? 'default';
|
||||
const schemaPart = db ?? 'public';
|
||||
return `${catalogPart}.${schemaPart}`;
|
||||
}
|
||||
case 'BIGQUERY': {
|
||||
return db ?? catalog ?? 'default';
|
||||
}
|
||||
case 'MYSQL':
|
||||
case 'CLICKHOUSE': {
|
||||
return db ?? catalog ?? 'default';
|
||||
}
|
||||
default: {
|
||||
return db ?? 'public';
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
function buildTableRef(name: string, catalog: string | null, db: string | null): string {
|
||||
const parts: string[] = [];
|
||||
if (catalog) {
|
||||
parts.push(catalog);
|
||||
}
|
||||
if (db) {
|
||||
parts.push(db);
|
||||
}
|
||||
parts.push(name);
|
||||
return parts.join('.');
|
||||
}
|
||||
|
||||
function addJoinOnce(
|
||||
joinsByTable: Map<string, LiveDatabaseManifestJoinEntry[]>,
|
||||
tableName: string,
|
||||
join: LiveDatabaseManifestJoinEntry,
|
||||
): void {
|
||||
const joins = joinsByTable.get(tableName) ?? [];
|
||||
const exists = joins.some((candidate) => candidate.to === join.to && candidate.on === join.on);
|
||||
if (!exists) {
|
||||
joins.push(join);
|
||||
}
|
||||
joinsByTable.set(tableName, joins);
|
||||
}
|
||||
|
||||
function joinCondition(
|
||||
leftTable: string,
|
||||
leftColumns: readonly string[],
|
||||
rightTable: string,
|
||||
rightColumns: readonly string[],
|
||||
): string {
|
||||
if (leftColumns.length === 0 || leftColumns.length !== rightColumns.length) {
|
||||
throw new Error(`Invalid relationship join from ${leftTable} to ${rightTable}: column tuple widths differ`);
|
||||
}
|
||||
return leftColumns
|
||||
.map((leftColumn, index) => {
|
||||
const rightColumn = rightColumns[index];
|
||||
if (!rightColumn) {
|
||||
throw new Error(`Invalid relationship join from ${leftTable} to ${rightTable}: missing target column`);
|
||||
}
|
||||
return `${leftTable}.${leftColumn} = ${rightTable}.${rightColumn}`;
|
||||
})
|
||||
.join(' AND ');
|
||||
}
|
||||
|
||||
function buildJoinsByTable(
|
||||
tableNames: Set<string>,
|
||||
joins: LiveDatabaseManifestJoinData[],
|
||||
preservedJoins: Map<string, LiveDatabaseManifestJoinEntry[]>,
|
||||
): Map<string, LiveDatabaseManifestJoinEntry[]> {
|
||||
const joinsByTable = new Map<string, LiveDatabaseManifestJoinEntry[]>();
|
||||
|
||||
for (const join of joins) {
|
||||
if (!tableNames.has(join.fromTable) || !tableNames.has(join.toTable)) {
|
||||
continue;
|
||||
}
|
||||
const relationship = RELATIONSHIP_MAP[join.relationship] ?? join.relationship;
|
||||
addJoinOnce(joinsByTable, join.fromTable, {
|
||||
to: join.toTable,
|
||||
on: joinCondition(join.fromTable, join.fromColumns, join.toTable, join.toColumns),
|
||||
relationship,
|
||||
source: join.source,
|
||||
});
|
||||
|
||||
const reverseRelationship = RELATIONSHIP_INVERSE[relationship] ?? 'one_to_many';
|
||||
addJoinOnce(joinsByTable, join.toTable, {
|
||||
to: join.fromTable,
|
||||
on: joinCondition(join.toTable, join.toColumns, join.fromTable, join.fromColumns),
|
||||
relationship: reverseRelationship,
|
||||
source: join.source,
|
||||
});
|
||||
}
|
||||
|
||||
for (const [tableName, tableJoins] of preservedJoins) {
|
||||
if (!tableNames.has(tableName)) {
|
||||
continue;
|
||||
}
|
||||
for (const join of tableJoins) {
|
||||
if (tableNames.has(join.to)) {
|
||||
addJoinOnce(joinsByTable, tableName, join);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return joinsByTable;
|
||||
}
|
||||
|
||||
export function buildLiveDatabaseManifestShards(
|
||||
input: BuildLiveDatabaseManifestShardsInput,
|
||||
): BuildLiveDatabaseManifestShardsResult {
|
||||
const tableNames = new Set(input.tables.map((table) => table.name));
|
||||
const joinsByTable = buildJoinsByTable(tableNames, input.joins, input.existingPreservedJoins ?? new Map());
|
||||
const shards = new Map<string, LiveDatabaseManifestShard>();
|
||||
|
||||
for (const table of input.tables) {
|
||||
const shardKey = getShardKey(input.connectionType, table.catalog, table.db);
|
||||
const shard = shards.get(shardKey) ?? { tables: {} };
|
||||
const existingDescriptions = input.existingDescriptions?.get(table.name);
|
||||
|
||||
const columns: LiveDatabaseManifestColumn[] = table.columns.map((column) => {
|
||||
const manifestColumn: LiveDatabaseManifestColumn = {
|
||||
name: column.name,
|
||||
type: input.mapColumnType(column.type),
|
||||
};
|
||||
if (column.pk) {
|
||||
manifestColumn.pk = true;
|
||||
}
|
||||
if (column.nullable === false) {
|
||||
manifestColumn.nullable = false;
|
||||
}
|
||||
const descriptions = mergeDescriptionsPreservingExternal(
|
||||
existingDescriptions?.columns.get(column.name),
|
||||
column.descriptions,
|
||||
);
|
||||
if (descriptions) {
|
||||
manifestColumn.descriptions = descriptions;
|
||||
}
|
||||
return manifestColumn;
|
||||
});
|
||||
|
||||
const entry: LiveDatabaseManifestTableEntry = {
|
||||
table: buildTableRef(table.name, table.catalog, table.db),
|
||||
columns,
|
||||
};
|
||||
|
||||
const tableDescriptions = mergeDescriptionsPreservingExternal(existingDescriptions?.table, table.descriptions);
|
||||
if (tableDescriptions) {
|
||||
entry.descriptions = tableDescriptions;
|
||||
}
|
||||
|
||||
const usage = mergeUsagePreservingExternal(input.existingUsage?.get(table.name), table.usage);
|
||||
if (usage) {
|
||||
entry.usage = usage;
|
||||
}
|
||||
|
||||
const tableJoins = joinsByTable.get(table.name);
|
||||
if (tableJoins && tableJoins.length > 0) {
|
||||
entry.joins = tableJoins;
|
||||
}
|
||||
|
||||
shard.tables[table.name] = entry;
|
||||
shards.set(shardKey, shard);
|
||||
}
|
||||
|
||||
return {
|
||||
shards,
|
||||
tablesProcessed: input.tables.length,
|
||||
};
|
||||
}
|
||||
|
|
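The join-building code above pairs source and target columns by position and registers each relationship from both sides. The following is a minimal sketch of that pairing, not the file's full implementation: the parameter types are assumptions inferred from the call sites in `buildJoinsByTable`, and the table and column names are hypothetical.

```typescript
// Sketch only: parameter types are assumed from the call sites, since the
// function signature is not visible in this part of the diff.
function joinCondition(
  leftTable: string,
  leftColumns: string[],
  rightTable: string,
  rightColumns: string[],
): string {
  return leftColumns
    .map((leftColumn, index) => {
      // Columns are matched by position; a missing partner is an error.
      const rightColumn = rightColumns[index];
      if (!rightColumn) {
        throw new Error(`Invalid relationship join from ${leftTable} to ${rightTable}: missing target column`);
      }
      return `${leftTable}.${leftColumn} = ${rightTable}.${rightColumn}`;
    })
    .join(' AND ');
}

// Hypothetical composite key, rendered in both directions the way
// buildJoinsByTable registers a forward and a reverse entry.
const forward = joinCondition('order_items', ['order_id', 'tenant_id'], 'orders', ['id', 'tenant_id']);
const reverse = joinCondition('orders', ['id', 'tenant_id'], 'order_items', ['order_id', 'tenant_id']);

console.log(forward);
console.log(reverse);
```

Because the mapping iterates over the left-hand columns only, surplus right-hand columns would be ignored rather than rejected; only a shorter right-hand list raises the error.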
@ -0,0 +1,152 @@
|
|||
import { mkdtemp, readFile } from 'node:fs/promises';
|
||||
import { tmpdir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
import { describe, expect, it } from 'vitest';
|
||||
import {
|
||||
detectLiveDatabaseStagedDir,
|
||||
LIVE_DATABASE_FOREIGN_KEYS_FILE,
|
||||
LIVE_DATABASE_META_FILE,
|
||||
liveDatabaseTablePath,
|
||||
readLiveDatabaseTableFiles,
|
||||
writeLiveDatabaseSnapshot,
|
||||
} from './stage.js';
|
||||
import type { KtxSchemaSnapshot } from '../../../scan/types.js';
|
||||
|
||||
function snapshot(): KtxSchemaSnapshot {
|
||||
return {
|
||||
connectionId: 'conn-1',
|
||||
driver: 'postgres',
|
||||
extractedAt: '2026-04-27T00:00:00.000Z',
|
||||
scope: { schemas: ['public'] },
|
||||
metadata: { dialect: 'postgres' },
|
||||
tables: [
|
||||
{
|
||||
name: 'orders',
|
||||
catalog: null,
|
||||
db: 'public',
|
||||
kind: 'table',
|
||||
comment: 'Orders placed by customers',
|
||||
estimatedRows: 200,
|
||||
columns: [
|
||||
{
|
||||
name: 'id',
|
||||
nativeType: 'integer',
|
||||
normalizedType: 'integer',
|
||||
dimensionType: 'number',
|
||||
nullable: false,
|
||||
primaryKey: true,
|
||||
comment: null,
|
||||
},
|
||||
{
|
||||
name: 'customer_id',
|
||||
nativeType: 'integer',
|
||||
normalizedType: 'integer',
|
||||
dimensionType: 'number',
|
||||
nullable: false,
|
||||
primaryKey: false,
|
||||
comment: null,
|
||||
},
|
||||
{
|
||||
name: 'total',
|
||||
nativeType: 'numeric',
|
||||
normalizedType: 'numeric',
|
||||
dimensionType: 'number',
|
||||
nullable: false,
|
||||
primaryKey: false,
|
||||
comment: null,
|
||||
},
|
||||
],
|
||||
foreignKeys: [
|
||||
{
|
||||
fromColumn: 'customer_id',
|
||||
toCatalog: null,
|
||||
toDb: 'public',
|
||||
toTable: 'customers',
|
||||
toColumn: 'id',
|
||||
constraintName: null,
|
||||
},
|
||||
],
|
||||
},
|
||||
{
|
||||
name: 'customers',
|
||||
catalog: null,
|
||||
db: 'public',
|
||||
kind: 'table',
|
||||
comment: null,
|
||||
estimatedRows: 50,
|
||||
columns: [
|
||||
{
|
||||
name: 'id',
|
||||
nativeType: 'integer',
|
||||
normalizedType: 'integer',
|
||||
dimensionType: 'number',
|
||||
nullable: false,
|
||||
primaryKey: true,
|
||||
comment: null,
|
||||
},
|
||||
],
|
||||
foreignKeys: [],
|
||||
},
|
||||
],
|
||||
};
|
||||
}
|
||||
|
||||
describe('live-database staged snapshot files', () => {
|
||||
it('writes deterministic metadata, table, and foreign-key files', async () => {
|
||||
const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-stage-'));
|
||||
await writeLiveDatabaseSnapshot(dir, snapshot());
|
||||
|
||||
await expect(readFile(join(dir, LIVE_DATABASE_META_FILE), 'utf8')).resolves.toContain('"connectionId": "conn-1"');
|
||||
await expect(readFile(join(dir, LIVE_DATABASE_FOREIGN_KEYS_FILE), 'utf8')).resolves.toContain(
|
||||
'"fromTable": "orders"',
|
||||
);
|
||||
const connectionJson = await readFile(join(dir, LIVE_DATABASE_META_FILE), 'utf8');
|
||||
expect(connectionJson).toContain('"driver": "postgres"');
|
||||
expect(connectionJson).toContain('"schemas"');
|
||||
|
||||
const ordersPath = liveDatabaseTablePath({ catalog: null, db: 'public', name: 'orders' });
|
||||
const customersPath = liveDatabaseTablePath({ catalog: null, db: 'public', name: 'customers' });
|
||||
expect(ordersPath).toMatch(/^tables\/[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.json$/);
|
||||
await expect(readFile(join(dir, ordersPath), 'utf8')).resolves.toContain('"name": "orders"');
|
||||
await expect(readFile(join(dir, customersPath), 'utf8')).resolves.toContain('"name": "customers"');
|
||||
const ordersJson = await readFile(join(dir, ordersPath), 'utf8');
|
||||
expect(ordersJson).toContain('"kind": "table"');
|
||||
expect(ordersJson).toContain('"estimatedRows": 200');
|
||||
expect(ordersJson).toContain('"nativeType": "integer"');
|
||||
expect(ordersJson).toContain('"normalizedType": "integer"');
|
||||
expect(ordersJson).not.toContain('"type": "integer"');
|
||||
|
||||
const tableFiles = await readLiveDatabaseTableFiles(dir);
|
||||
expect(tableFiles.map((file) => file.table.name)).toEqual(['customers', 'orders']);
|
||||
expect(await detectLiveDatabaseStagedDir(dir)).toBe(true);
|
||||
});
|
||||
|
||||
it('redacts sensitive snapshot metadata before writing connection metadata', async () => {
|
||||
const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-redacted-stage-'));
|
||||
await writeLiveDatabaseSnapshot(dir, {
|
||||
...snapshot(),
|
||||
metadata: {
|
||||
dialect: 'postgres',
|
||||
url: 'postgres://reader:secret@example.test/db', // pragma: allowlist secret
|
||||
serviceAccountJson: {
|
||||
client_email: 'reader@example.test',
|
||||
private_key: 'pem-value', // pragma: allowlist secret
|
||||
},
|
||||
},
|
||||
});
|
||||
|
||||
const connectionJson = await readFile(join(dir, LIVE_DATABASE_META_FILE), 'utf8');
|
||||
|
||||
expect(connectionJson).toContain('"dialect": "postgres"');
|
||||
expect(connectionJson).toContain('"client_email": "reader@example.test"');
|
||||
expect(connectionJson).toContain('"url": "<redacted>"');
|
||||
expect(connectionJson).toContain('"private_key": "<redacted>"');
|
||||
expect(connectionJson).not.toContain('postgres://reader:secret@example.test/db'); // pragma: allowlist secret
|
||||
expect(connectionJson).not.toContain('pem-value');
|
||||
});
|
||||
|
||||
it('returns false for a directory that is missing live database metadata', async () => {
|
||||
const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-empty-'));
|
||||
expect(await detectLiveDatabaseStagedDir(dir)).toBe(false);
|
||||
});
|
||||
});
|
||||
139
packages/cli/src/context/ingest/adapters/live-database/stage.ts
Normal file
|
|
@ -0,0 +1,139 @@
|
|||
import { Buffer } from 'node:buffer';
|
||||
import type { Dirent } from 'node:fs';
|
||||
import { mkdir, readdir, readFile, writeFile } from 'node:fs/promises';
|
||||
import { join, relative } from 'node:path';
|
||||
import { redactKtxSensitiveMetadata } from '../../../core/redaction.js';
|
||||
import type { KtxSchemaSnapshot, KtxSchemaTable, KtxTableRef } from '../../../scan/types.js';
|
||||
|
||||
export const LIVE_DATABASE_META_FILE = 'connection.json';
|
||||
export const LIVE_DATABASE_FOREIGN_KEYS_FILE = 'foreign-keys.json';
|
||||
const LIVE_DATABASE_TABLES_DIR = 'tables';
|
||||
|
||||
interface LiveDatabaseTableFile {
|
||||
path: string;
|
||||
table: KtxSchemaTable;
|
||||
}
|
||||
|
||||
interface ForeignKeyIndexEntry {
|
||||
fromTable: string;
|
||||
fromTablePath: string;
|
||||
fromColumn: string;
|
||||
toCatalog: string | null;
|
||||
toDb: string | null;
|
||||
toTable: string;
|
||||
toColumn: string;
|
||||
constraintName: string | null;
|
||||
}
|
||||
|
||||
function encodePathPart(value: string | null | undefined): string {
|
||||
return Buffer.from(value ?? '_', 'utf8').toString('base64url');
|
||||
}
|
||||
|
||||
function tableSortKey(table: KtxTableRef): string {
|
||||
return `${table.catalog ?? ''}\u0000${table.db ?? ''}\u0000${table.name}`;
|
||||
}
|
||||
|
||||
/** @internal */
|
||||
export function liveDatabaseTablePath(table: KtxTableRef): string {
|
||||
return `${LIVE_DATABASE_TABLES_DIR}/${encodePathPart(table.catalog)}.${encodePathPart(table.db)}.${encodePathPart(
|
||||
table.name,
|
||||
)}.json`;
|
||||
}
|
||||
|
||||
async function walkFiles(root: string, dir = root): Promise<string[]> {
|
||||
let entries: Dirent[];
|
||||
try {
|
||||
entries = await readdir(dir, { withFileTypes: true });
|
||||
} catch {
|
||||
return [];
|
||||
}
|
||||
const files: string[] = [];
|
||||
for (const entry of entries) {
|
||||
const absolute = join(dir, entry.name);
|
||||
if (entry.isDirectory()) {
|
||||
files.push(...(await walkFiles(root, absolute)));
|
||||
} else if (entry.isFile()) {
|
||||
files.push(relative(root, absolute).replace(/\\/g, '/'));
|
||||
}
|
||||
}
|
||||
return files.sort();
|
||||
}
|
||||
|
||||
function stableJson(value: unknown): string {
|
||||
return `${JSON.stringify(value, null, 2)}\n`;
|
||||
}
|
||||
|
||||
function foreignKeyIndex(snapshot: KtxSchemaSnapshot): ForeignKeyIndexEntry[] {
|
||||
const entries: ForeignKeyIndexEntry[] = [];
|
||||
for (const table of snapshot.tables) {
|
||||
for (const fk of table.foreignKeys) {
|
||||
entries.push({
|
||||
fromTable: table.name,
|
||||
fromTablePath: liveDatabaseTablePath(table),
|
||||
fromColumn: fk.fromColumn,
|
||||
toCatalog: fk.toCatalog,
|
||||
toDb: fk.toDb,
|
||||
toTable: fk.toTable,
|
||||
toColumn: fk.toColumn,
|
||||
constraintName: fk.constraintName,
|
||||
});
|
||||
}
|
||||
}
|
||||
entries.sort(
|
||||
(a, b) =>
|
||||
a.fromTable.localeCompare(b.fromTable) ||
|
||||
a.fromColumn.localeCompare(b.fromColumn) ||
|
||||
a.toTable.localeCompare(b.toTable) ||
|
||||
a.toColumn.localeCompare(b.toColumn),
|
||||
);
|
||||
return entries;
|
||||
}
|
||||
|
||||
export async function writeLiveDatabaseSnapshot(stagedDir: string, snapshot: KtxSchemaSnapshot): Promise<void> {
|
||||
await mkdir(join(stagedDir, LIVE_DATABASE_TABLES_DIR), { recursive: true });
|
||||
const sortedTables = [...snapshot.tables].sort((a, b) => tableSortKey(a).localeCompare(tableSortKey(b)));
|
||||
const metadata = {
|
||||
connectionId: snapshot.connectionId,
|
||||
driver: snapshot.driver,
|
||||
extractedAt: snapshot.extractedAt,
|
||||
scope: snapshot.scope,
|
||||
metadata: redactKtxSensitiveMetadata(snapshot.metadata),
|
||||
tableCount: sortedTables.length,
|
||||
};
|
||||
await writeFile(join(stagedDir, LIVE_DATABASE_META_FILE), stableJson(metadata));
|
||||
await writeFile(
|
||||
join(stagedDir, LIVE_DATABASE_FOREIGN_KEYS_FILE),
|
||||
stableJson({ foreignKeys: foreignKeyIndex(snapshot) }),
|
||||
);
|
||||
for (const table of sortedTables) {
|
||||
await writeFile(join(stagedDir, liveDatabaseTablePath(table)), stableJson(table));
|
||||
}
|
||||
}
|
||||
|
||||
export async function readLiveDatabaseTableFiles(stagedDir: string): Promise<LiveDatabaseTableFile[]> {
|
||||
const files = await walkFiles(join(stagedDir, LIVE_DATABASE_TABLES_DIR));
|
||||
const out: LiveDatabaseTableFile[] = [];
|
||||
for (const file of files.filter((path) => path.endsWith('.json'))) {
|
||||
const path = `${LIVE_DATABASE_TABLES_DIR}/${file}`;
|
||||
const raw = await readFile(join(stagedDir, path), 'utf8');
|
||||
const parsed = JSON.parse(raw) as KtxSchemaTable;
|
||||
if (parsed && typeof parsed.name === 'string' && Array.isArray(parsed.columns)) {
|
||||
out.push({ path, table: parsed });
|
||||
}
|
||||
}
|
||||
out.sort((a, b) => tableSortKey(a.table).localeCompare(tableSortKey(b.table)));
|
||||
return out;
|
||||
}
|
||||
|
||||
export async function detectLiveDatabaseStagedDir(stagedDir: string): Promise<boolean> {
|
||||
try {
|
||||
const meta = JSON.parse(await readFile(join(stagedDir, LIVE_DATABASE_META_FILE), 'utf8')) as unknown;
|
||||
if (!meta || typeof meta !== 'object' || Array.isArray(meta)) {
|
||||
return false;
|
||||
}
|
||||
const files = await readLiveDatabaseTableFiles(stagedDir);
|
||||
return files.length > 0;
|
||||
} catch {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
|
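The staged table path in `stage.ts` encodes each of catalog, schema, and table name with base64url, so names containing `/`, `.`, or other awkward characters still yield a flat, filesystem-safe file name. A small sketch of that encoding follows. `TableRef` is a simplified stand-in for `KtxTableRef`, and `decodePathPart` is a hypothetical inverse added for illustration; it is not part of the commit.

```typescript
import { Buffer } from 'node:buffer';

// Simplified stand-in for KtxTableRef (assumption: only these three fields matter here).
interface TableRef {
  catalog: string | null;
  db: string | null;
  name: string;
}

// Same encoding as stage.ts: null or undefined parts fall back to '_'.
function encodePathPart(value: string | null | undefined): string {
  return Buffer.from(value ?? '_', 'utf8').toString('base64url');
}

// Hypothetical inverse, for illustration only.
function decodePathPart(part: string): string {
  return Buffer.from(part, 'base64url').toString('utf8');
}

function tablePath(table: TableRef): string {
  return `tables/${encodePathPart(table.catalog)}.${encodePathPart(table.db)}.${encodePathPart(table.name)}.json`;
}

const ordersPath = tablePath({ catalog: null, db: 'public', name: 'orders' });
console.log(ordersPath); // tables/Xw.cHVibGlj.b3JkZXJz.json
console.log(decodePathPart('cHVibGlj')); // public
```

One consequence of the `'_'` fallback worth keeping in mind: a `null` catalog and a catalog literally named `_` encode to the same path part, so the encoding is not strictly reversible for that one value.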
@ -0,0 +1,10 @@
import type { KtxSchemaSnapshot } from '../../../scan/types.js';

export interface LiveDatabaseIntrospectionPort {
  extractSchema(connectionId: string): Promise<KtxSchemaSnapshot>;
}

export interface LiveDatabaseSourceAdapterDeps {
  introspection: LiveDatabaseIntrospectionPort;
  now?: () => Date;
}
|
||||
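The port and dependency interfaces above keep database access and the clock injectable, which makes the adapter straightforward to test without a live connection. As a hedged illustration, an in-memory fake might look like the sketch below. `SnapshotLike`, `IntrospectionPort`, `AdapterDeps`, and `fakeDeps` are hypothetical names; the real `KtxSchemaSnapshot` carries more fields than this stand-in.

```typescript
// Simplified stand-ins for the real types (assumptions, not the commit's definitions).
interface SnapshotLike {
  connectionId: string;
  extractedAt: string;
  tables: unknown[];
}

interface IntrospectionPort {
  extractSchema(connectionId: string): Promise<SnapshotLike>;
}

interface AdapterDeps {
  introspection: IntrospectionPort;
  now?: () => Date;
}

// Builds dependencies backed by a fixed clock and an empty in-memory schema.
function fakeDeps(fixedNow: Date): AdapterDeps {
  return {
    introspection: {
      async extractSchema(connectionId: string): Promise<SnapshotLike> {
        return { connectionId, extractedAt: fixedNow.toISOString(), tables: [] };
      },
    },
    now: () => fixedNow,
  };
}

const deps = fakeDeps(new Date('2026-04-27T00:00:00.000Z'));
const fixedTimestamp = deps.now ? deps.now().toISOString() : null;
console.log(fixedTimestamp); // 2026-04-27T00:00:00.000Z
```

Pinning `now` this way keeps timestamps stable across test runs, which matters when staged files are compared byte for byte.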
154
packages/cli/src/context/ingest/adapters/looker/chunk.test.ts
Normal file
|
|
@ -0,0 +1,154 @@
|
|||
import { mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises';
|
||||
import { tmpdir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
|
||||
import { chunkLookerStagedDir } from './chunk.js';
|
||||
import { writeLookerEvidenceDocuments } from './evidence-documents.js';
|
||||
|
||||
async function writeJson(stagedDir: string, relPath: string, value: unknown): Promise<void> {
|
||||
const abs = join(stagedDir, relPath);
|
||||
await mkdir(join(abs, '..'), { recursive: true });
|
||||
await writeFile(abs, `${JSON.stringify(value, null, 2)}\n`, 'utf-8');
|
||||
}
|
||||
|
||||
async function writeSmallFixture(stagedDir: string): Promise<void> {
|
||||
await writeJson(stagedDir, 'sync-config.json', {
|
||||
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
|
||||
fetchedAt: '2026-04-30T12:30:00.000Z',
|
||||
});
|
||||
await writeJson(stagedDir, 'lookml_models.json', {
|
||||
models: [{ name: 'b2b', label: 'B2B', explores: [{ name: 'sales_pipeline', label: 'Sales Pipeline' }] }],
|
||||
});
|
||||
await writeJson(stagedDir, 'explores/b2b/sales_pipeline.json', {
|
||||
modelName: 'b2b',
|
||||
exploreName: 'sales_pipeline',
|
||||
label: 'Sales Pipeline',
|
||||
description: null,
|
||||
fields: { dimensions: [{ name: 'opportunities.id' }], measures: [{ name: 'opportunities.arr' }] },
|
||||
joins: [],
|
||||
});
|
||||
await writeJson(stagedDir, 'dashboards/10.json', {
|
||||
lookerId: '10',
|
||||
title: 'Sales Pipeline',
|
||||
description: null,
|
||||
folderId: '7',
|
||||
ownerId: '3',
|
||||
updatedAt: '2026-04-30T12:00:00.000Z',
|
||||
tiles: [{ id: '100', title: 'ARR', lookId: null, query: { model: 'b2b', view: 'sales_pipeline' } }],
|
||||
});
|
||||
await writeJson(stagedDir, 'looks/20.json', {
|
||||
lookerId: '20',
|
||||
title: 'Open Pipeline',
|
||||
description: null,
|
||||
folderId: '7',
|
||||
ownerId: '3',
|
||||
updatedAt: '2026-04-30T12:00:00.000Z',
|
||||
query: { model: 'b2b', view: 'sales_pipeline', fields: ['opportunities.arr'] },
|
||||
});
|
||||
await writeJson(stagedDir, 'folders/tree.json', {
|
||||
folders: [{ id: '7', name: 'Sandbox', parentId: null, path: ['Sandbox'] }],
|
||||
});
|
||||
await writeJson(stagedDir, 'users/3.json', { id: '3', displayName: 'Ada Lovelace', email: null });
|
||||
await writeJson(stagedDir, 'signals/dashboard_usage.json', [
|
||||
{ contentId: '10', queryCount30d: 50, uniqueUsers30d: 8 },
|
||||
]);
|
||||
await writeJson(stagedDir, 'signals/look_usage.json', [{ contentId: '20', queryCount30d: 20, uniqueUsers30d: 5 }]);
|
||||
await writeJson(stagedDir, 'signals/scheduled_plans.json', [
|
||||
{ contentId: '10', contentType: 'dashboard', isScheduled: true, scheduleCount: 1, recipientCount: 3 },
|
||||
]);
|
||||
await writeJson(stagedDir, 'signals/favorites.json', [
|
||||
{ contentId: '10', contentType: 'dashboard', favoriteCount: 4 },
|
||||
]);
|
||||
await writeLookerEvidenceDocuments(stagedDir);
|
||||
}
|
||||
|
||||
describe('chunkLookerStagedDir', () => {
|
||||
let stagedDir: string;
|
||||
|
||||
beforeEach(async () => {
|
||||
stagedDir = await mkdtemp(join(tmpdir(), 'looker-chunk-'));
|
||||
await writeSmallFixture(stagedDir);
|
||||
});
|
||||
|
||||
afterEach(async () => {
|
||||
await rm(stagedDir, { recursive: true, force: true });
|
||||
});
|
||||
|
||||
it('emits one WU per explore, dashboard, and Look with readable dependencies', async () => {
|
||||
const result = await chunkLookerStagedDir(stagedDir);
|
||||
expect(result.reconcileNotes).toEqual([
|
||||
expect.stringContaining('emit_artifact_resolution with actionType="subsumed"'),
|
||||
]);
|
||||
expect(result.workUnits.map((wu) => wu.unitKey).sort()).toEqual([
|
||||
'looker-dashboard-10',
|
||||
'looker-explore-b2b-sales_pipeline',
|
||||
'looker-look-20',
|
||||
]);
|
||||
|
||||
const dashboard = result.workUnits.find((wu) => wu.unitKey === 'looker-dashboard-10');
|
||||
expect(dashboard?.rawFiles).toEqual([
|
||||
'dashboards/10.json',
|
||||
'evidence/dashboards/10/metadata.json',
|
||||
'evidence/dashboards/10/page.md',
|
||||
]);
|
||||
expect(dashboard?.notes).toContain('context_candidate_write');
|
||||
expect(dashboard?.notes).not.toContain('wiki_write');
|
||||
expect(dashboard?.dependencyPaths.sort()).toEqual([
|
||||
'explores/b2b/sales_pipeline.json',
|
||||
'folders/tree.json',
|
||||
'signals/dashboard_usage.json',
|
||||
'signals/favorites.json',
|
||||
'signals/scheduled_plans.json',
|
||||
'users/3.json',
|
||||
]);
|
||||
|
||||
const explore = result.workUnits.find((wu) => wu.unitKey === 'looker-explore-b2b-sales_pipeline');
|
||||
expect(explore?.rawFiles).toEqual([
|
||||
'explores/b2b/sales_pipeline.json',
|
||||
'evidence/explores/b2b/sales_pipeline/metadata.json',
|
||||
'evidence/explores/b2b/sales_pipeline/page.md',
|
||||
]);
|
||||
expect(explore?.dependencyPaths).toEqual(['lookml_models.json']);
|
||||
});
|
||||
|
||||
it('keeps downstream dashboard and Look WUs when an explore dependency changes', async () => {
|
||||
const result = await chunkLookerStagedDir(stagedDir, {
|
||||
added: [],
|
||||
modified: ['explores/b2b/sales_pipeline.json'],
|
||||
deleted: [],
|
||||
unchanged: [
|
||||
'dashboards/10.json',
|
||||
'looks/20.json',
|
||||
'lookml_models.json',
|
||||
'folders/tree.json',
|
||||
'users/3.json',
|
||||
'signals/dashboard_usage.json',
|
||||
'signals/look_usage.json',
|
||||
'signals/scheduled_plans.json',
|
||||
'signals/favorites.json',
|
||||
],
|
||||
});
|
||||
|
||||
expect(result.workUnits.map((wu) => wu.unitKey).sort()).toEqual([
|
||||
'looker-dashboard-10',
|
||||
'looker-explore-b2b-sales_pipeline',
|
||||
'looker-look-20',
|
||||
]);
|
||||
expect(result.workUnits.find((wu) => wu.unitKey === 'looker-dashboard-10')?.rawFiles).toEqual([
|
||||
'dashboards/10.json',
|
||||
'evidence/dashboards/10/metadata.json',
|
||||
'evidence/dashboards/10/page.md',
|
||||
]);
|
||||
});
|
||||
|
||||
it('returns an EvictionUnit for deleted runtime entity raw paths', async () => {
|
||||
const result = await chunkLookerStagedDir(stagedDir, {
|
||||
added: [],
|
||||
modified: [],
|
||||
deleted: ['looks/20.json'],
|
||||
unchanged: ['dashboards/10.json', 'explores/b2b/sales_pipeline.json'],
|
||||
});
|
||||
|
||||
expect(result.eviction).toEqual({ deletedRawPaths: ['looks/20.json'] });
|
||||
});
|
||||
});
|
||||
198
packages/cli/src/context/ingest/adapters/looker/chunk.ts
Normal file
|
|
@ -0,0 +1,198 @@
|
|||
import { readdir, readFile } from 'node:fs/promises';
|
||||
import { join, relative } from 'node:path';
|
||||
import type { ChunkResult, DiffSet, WorkUnit } from '../../types.js';
|
||||
import { buildLookerReconcileNotes } from './reconcile.js';
|
||||
import {
|
||||
STAGED_FILES,
|
||||
type StagedDashboardFile,
|
||||
type StagedLookerQuery,
|
||||
type StagedLookFile,
|
||||
stagedDashboardFileSchema,
|
||||
stagedExploreFileSchema,
|
||||
stagedLookFileSchema,
|
||||
} from './types.js';
|
||||
|
||||
interface LoadedLookerProject {
|
||||
allPaths: string[];
|
||||
dashboardsByPath: Map<string, StagedDashboardFile>;
|
||||
looksByPath: Map<string, StagedLookFile>;
|
||||
explorePaths: string[];
|
||||
}
|
||||
|
||||
async function walk(root: string): Promise<string[]> {
|
||||
const entries = await readdir(root, { withFileTypes: true, recursive: true });
|
||||
return entries
|
||||
.filter((entry) => entry.isFile())
|
||||
.map((entry) => relative(root, join(entry.parentPath, entry.name)).replace(/\\/g, '/'))
|
||||
.sort();
|
||||
}
|
||||
|
||||
async function loadProject(stagedDir: string): Promise<LoadedLookerProject> {
|
||||
const allPaths = await walk(stagedDir);
|
||||
const dashboardsByPath = new Map<string, StagedDashboardFile>();
|
||||
const looksByPath = new Map<string, StagedLookFile>();
|
||||
const explorePaths: string[] = [];
|
||||
|
||||
for (const path of allPaths) {
|
||||
if (/^dashboards\/[^/]+\.json$/.test(path)) {
|
||||
dashboardsByPath.set(
|
||||
path,
|
||||
stagedDashboardFileSchema.parse(JSON.parse(await readFile(join(stagedDir, path), 'utf-8'))),
|
||||
);
|
||||
continue;
|
||||
}
|
||||
if (/^looks\/[^/]+\.json$/.test(path)) {
|
||||
looksByPath.set(path, stagedLookFileSchema.parse(JSON.parse(await readFile(join(stagedDir, path), 'utf-8'))));
|
||||
continue;
|
||||
}
|
||||
if (/^explores\/[^/]+\/[^/]+\.json$/.test(path)) {
|
||||
const explore = stagedExploreFileSchema.parse(JSON.parse(await readFile(join(stagedDir, path), 'utf-8')));
|
||||
explorePaths.push(explorePath(explore.modelName, explore.exploreName));
|
||||
}
|
||||
}
|
||||
|
||||
return { allPaths, dashboardsByPath, looksByPath, explorePaths: [...new Set(explorePaths)].sort() };
|
||||
}
|
||||
|
||||
export async function chunkLookerStagedDir(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
|
||||
const project = await loadProject(stagedDir);
|
||||
const firstRunUnits = emitFirstRunWorkUnits(project);
|
||||
const result = diffSet ? applyDiffSet(firstRunUnits, diffSet) : { workUnits: firstRunUnits };
|
||||
const eviction =
|
||||
diffSet && diffSet.deleted.length > 0 ? { deletedRawPaths: [...diffSet.deleted].sort() } : result.eviction;
|
||||
return {
|
||||
...result,
|
||||
eviction,
|
||||
reconcileNotes: result.workUnits.length > 0 || eviction ? buildLookerReconcileNotes() : [],
|
||||
};
|
||||
}
|
||||
|
||||
function emitFirstRunWorkUnits(project: LoadedLookerProject): WorkUnit[] {
|
||||
const units: WorkUnit[] = [];
|
||||
|
||||
for (const path of project.explorePaths) {
|
||||
const parts = /^explores\/([^/]+)\/([^/]+)\.json$/.exec(path);
|
||||
if (!parts) {
|
||||
continue;
|
||||
}
|
||||
const deps = project.allPaths.includes(STAGED_FILES.lookmlModels) ? [STAGED_FILES.lookmlModels] : [];
|
||||
units.push(
|
||||
buildUnit(project, {
|
||||
unitKey: `looker-explore-${parts[1]}-${parts[2]}`,
|
||||
displayLabel: `Looker explore ${parts[1]}.${parts[2]}`,
|
||||
rawFiles: [path, ...evidencePathsForExplore(project, parts[1], parts[2])],
|
||||
dependencyPaths: deps,
|
||||
notes: `Write API-derived SL source looker__${parts[1]}__${parts[2]} and durable domain knowledge for this Looker explore.`,
|
||||
}),
|
||||
);
|
||||
}
|
||||
|
||||
for (const [path, dashboard] of [...project.dashboardsByPath.entries()].sort(([a], [b]) => a.localeCompare(b))) {
|
||||
const deps = new Set<string>();
|
||||
addIfPresent(project, deps, STAGED_FILES.foldersTree);
|
||||
addIfPresent(project, deps, STAGED_FILES.signals.dashboardUsage);
|
||||
addIfPresent(project, deps, STAGED_FILES.signals.scheduledPlans);
|
||||
addIfPresent(project, deps, STAGED_FILES.signals.favorites);
|
||||
if (dashboard.ownerId) {
|
||||
addIfPresent(project, deps, `users/${dashboard.ownerId}.json`);
|
||||
}
|
||||
for (const tile of dashboard.tiles) {
|
||||
addExploreDependency(project, deps, tile.query);
|
||||
}
|
||||
|
||||
units.push(
|
||||
buildUnit(project, {
|
||||
unitKey: `looker-dashboard-${dashboard.lookerId}`,
|
||||
displayLabel: `Looker dashboard "${dashboard.title}"`,
|
||||
rawFiles: [path, ...evidencePathsForDashboard(project, dashboard.lookerId)],
|
||||
dependencyPaths: [...deps].sort(),
|
||||
notes:
|
||||
'Extract generalizable metric, segment, and domain knowledge from this dashboard. Treat usage, owner, and folder data as prioritization/provenance context only. Use context_evidence_search/context_evidence_read and context_candidate_write for wiki-bound knowledge; do not write wiki pages directly from this WorkUnit.',
|
||||
}),
|
||||
);
|
||||
}
|
||||
|
||||
for (const [path, look] of [...project.looksByPath.entries()].sort(([a], [b]) => a.localeCompare(b))) {
|
||||
const deps = new Set<string>();
|
||||
addIfPresent(project, deps, STAGED_FILES.foldersTree);
|
||||
addIfPresent(project, deps, STAGED_FILES.signals.lookUsage);
|
||||
addIfPresent(project, deps, STAGED_FILES.signals.scheduledPlans);
|
||||
addIfPresent(project, deps, STAGED_FILES.signals.favorites);
|
||||
if (look.ownerId) {
|
||||
addIfPresent(project, deps, `users/${look.ownerId}.json`);
|
||||
}
|
||||
addExploreDependency(project, deps, look.query);
|
||||
|
||||
units.push(
|
||||
buildUnit(project, {
|
||||
unitKey: `looker-look-${look.lookerId}`,
|
||||
displayLabel: `Looker Look "${look.title}"`,
|
||||
rawFiles: [path, ...evidencePathsForLook(project, look.lookerId)],
|
||||
dependencyPaths: [...deps].sort(),
|
||||
notes:
|
||||
'Extract generalizable metric, segment, and domain knowledge from this Look. Treat usage, owner, and folder data as prioritization/provenance context only. Use context_evidence_search/context_evidence_read and context_candidate_write for wiki-bound knowledge; do not write wiki pages directly from this WorkUnit.',
|
||||
}),
|
||||
);
|
||||
}
|
||||
|
||||
return units.sort((a, b) => a.unitKey.localeCompare(b.unitKey));
|
||||
}
|
||||
|
||||
function buildUnit(
|
||||
project: LoadedLookerProject,
|
||||
input: Pick<WorkUnit, 'unitKey' | 'displayLabel' | 'rawFiles' | 'dependencyPaths' | 'notes'>,
|
||||
): WorkUnit {
|
||||
const excluded = new Set([...input.rawFiles, ...input.dependencyPaths]);
|
||||
return {
|
||||
...input,
|
||||
peerFileIndex: project.allPaths.filter((path) => !excluded.has(path)).sort(),
|
||||
};
|
||||
}
|
||||
|
||||
function applyDiffSet(firstRunUnits: WorkUnit[], diffSet: DiffSet): ChunkResult {
|
||||
const touched = new Set([...diffSet.added, ...diffSet.modified]);
|
||||
const workUnits = firstRunUnits.filter((wu) => {
|
||||
const readablePaths = [...wu.rawFiles, ...wu.dependencyPaths];
|
||||
return readablePaths.some((path) => touched.has(path));
|
||||
});
|
||||
return { workUnits };
|
||||
}
|
||||
|
||||
function addIfPresent(project: LoadedLookerProject, deps: Set<string>, path: string): void {
|
||||
if (project.allPaths.includes(path)) {
|
||||
deps.add(path);
|
||||
}
|
||||
}
|
||||
|
||||
function addExploreDependency(project: LoadedLookerProject, deps: Set<string>, query: StagedLookerQuery | null): void {
|
||||
if (!query) {
|
||||
return;
|
||||
}
|
||||
addIfPresent(project, deps, explorePath(query.model, query.view));
|
||||
}
|
||||
|
||||
function evidencePathsForExplore(project: LoadedLookerProject, modelName: string, exploreName: string): string[] {
|
||||
return existingPaths(project, [
|
||||
`evidence/explores/${modelName}/${exploreName}/metadata.json`,
|
||||
`evidence/explores/${modelName}/${exploreName}/page.md`,
|
||||
]);
|
||||
}
|
||||
|
||||
function evidencePathsForDashboard(project: LoadedLookerProject, dashboardId: string): string[] {
|
||||
return existingPaths(project, [
|
||||
`evidence/dashboards/${dashboardId}/metadata.json`,
|
||||
`evidence/dashboards/${dashboardId}/page.md`,
|
||||
]);
|
||||
}
|
||||
|
||||
function evidencePathsForLook(project: LoadedLookerProject, lookId: string): string[] {
|
||||
return existingPaths(project, [`evidence/looks/${lookId}/metadata.json`, `evidence/looks/${lookId}/page.md`]);
|
||||
}
|
||||
|
||||
function existingPaths(project: LoadedLookerProject, paths: string[]): string[] {
|
||||
return paths.filter((path) => project.allPaths.includes(path));
|
||||
}
|
||||
|
||||
function explorePath(modelName: string, exploreName: string): string {
|
||||
return `explores/${modelName}/${exploreName}.json`;
|
||||
}
|
||||
|
|
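On incremental runs, `applyDiffSet` keeps a work unit whenever any of its raw files or dependency paths was added or modified, which is why a changed explore also re-emits the dashboards and Looks that read it. The sketch below isolates that selection rule. `DiffSetLike`, `WorkUnitLike`, and `selectTouchedUnits` are simplified, hypothetical stand-ins for the real `DiffSet`, `WorkUnit`, and `applyDiffSet`, and the sample units are trimmed-down versions of the test fixture.

```typescript
// Simplified stand-ins for DiffSet and WorkUnit (assumptions for illustration).
interface DiffSetLike {
  added: string[];
  modified: string[];
  deleted: string[];
  unchanged: string[];
}

interface WorkUnitLike {
  unitKey: string;
  rawFiles: string[];
  dependencyPaths: string[];
}

// Keep a unit if any file it reads (raw or dependency) was added or modified.
function selectTouchedUnits(units: WorkUnitLike[], diff: DiffSetLike): WorkUnitLike[] {
  const touched = new Set([...diff.added, ...diff.modified]);
  return units.filter((wu) => [...wu.rawFiles, ...wu.dependencyPaths].some((path) => touched.has(path)));
}

const units: WorkUnitLike[] = [
  {
    unitKey: 'looker-dashboard-10',
    rawFiles: ['dashboards/10.json'],
    dependencyPaths: ['explores/b2b/sales_pipeline.json', 'users/3.json'],
  },
  {
    unitKey: 'looker-explore-b2b-sales_pipeline',
    rawFiles: ['explores/b2b/sales_pipeline.json'],
    dependencyPaths: ['lookml_models.json'],
  },
  {
    unitKey: 'looker-look-20',
    rawFiles: ['looks/20.json'],
    dependencyPaths: ['users/3.json'],
  },
];

// Only the user file changed: both units that depend on it are selected.
const selected = selectTouchedUnits(units, {
  added: [],
  modified: ['users/3.json'],
  deleted: [],
  unchanged: [],
}).map((wu) => wu.unitKey);

console.log(selected); // [ 'looker-dashboard-10', 'looker-look-20' ]
```

Deleted paths play no part in this filter; in `chunkLookerStagedDir` they are reported separately through the eviction result.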
@ -0,0 +1,14 @@
import { readFile } from 'node:fs/promises';
import { describe, expect, it } from 'vitest';

describe('LookerClient boundary', () => {
  it('does not import server or NestJS modules', async () => {
    const source = await readFile(new URL('./client.ts', import.meta.url), 'utf-8');

    expect(source).not.toMatch(/@nestjs\/common/);
    expect(source).not.toMatch(/DataSourceClient/);
    expect(source).not.toMatch(/\.\.\/interfaces/);
    expect(source).not.toMatch(/\.\.\/types/);
    expect(source).not.toMatch(/server\/src/);
  });
});
|
||||
473
packages/cli/src/context/ingest/adapters/looker/client.test.ts
Normal file
|
|
@ -0,0 +1,473 @@
|
|||
import { describe, expect, it, vi } from 'vitest';
|
||||
import { LookerClient, type LookerSdkPort } from './client.js';
|
||||
|
||||
const clientSecretParam = 'client_secret'; // pragma: allowlist secret
|
||||
|
||||
function params(): Record<string, unknown> {
|
||||
return {
|
||||
base_url: 'https://example.looker.com',
|
||||
client_id: 'id',
|
||||
[clientSecretParam]: 'credential', // pragma: allowlist secret
|
||||
};
|
||||
}
|
||||
|
||||
function sdk(overrides: Partial<LookerSdkPort> = {}): LookerSdkPort {
|
||||
const port: LookerSdkPort = {
|
||||
me: vi.fn().mockResolvedValue({ id: '1', display_name: 'API User', email: 'api@example.com' }),
|
||||
search_dashboards: vi.fn().mockResolvedValue([{ id: '10' }]),
|
||||
dashboard: vi.fn().mockResolvedValue({
|
||||
id: '10',
|
||||
title: 'Revenue Dashboard',
|
||||
description: 'Revenue concepts',
|
||||
folder_id: '20',
|
||||
user_id: '1',
|
||||
updated_at: '2026-04-30T00:00:00.000Z',
|
||||
dashboard_elements: [
|
||||
{
|
||||
id: '99',
|
||||
title: 'ARR',
|
||||
look_id: null,
|
||||
query: {
|
||||
id: 'q1',
|
||||
model: 'b2b',
|
||||
view: 'sales_pipeline',
|
||||
fields: ['opportunities.arr', 'opportunities.stage'],
|
||||
filters: { 'opportunities.stage': 'open' },
|
||||
sorts: ['opportunities.arr desc'],
|
||||
limit: '500',
|
||||
},
|
||||
},
|
||||
],
|
||||
}),
|
||||
search_looks: vi.fn().mockResolvedValue([{ id: '30' }]),
|
||||
search_scheduled_plans: vi.fn().mockResolvedValue([]),
|
||||
look: vi.fn().mockResolvedValue({
|
||||
id: '30',
|
||||
title: 'Open Pipeline ARR',
|
||||
description: 'ARR for open opportunities',
|
||||
folder_id: '20',
|
||||
user_id: '1',
|
||||
updated_at: '2026-04-30T00:00:00.000Z',
|
||||
query: {
|
||||
id: 'q2',
|
||||
model: 'b2b',
|
||||
view: 'sales_pipeline',
|
||||
fields: ['opportunities.arr'],
|
||||
filters: { 'opportunities.stage': 'open' },
|
||||
},
|
||||
}),
|
||||
all_folders: vi.fn().mockResolvedValue([{ id: '20', name: 'Executive', parent_id: null }]),
|
||||
all_users: vi.fn().mockResolvedValue([{ id: '1', display_name: 'API User', email: 'api@example.com' }]),
|
||||
all_groups: vi.fn().mockResolvedValue([{ id: '2', name: 'Finance' }]),
|
||||
all_connections: vi.fn().mockResolvedValue([
|
||||
{
|
||||
name: 'b2b_sandbox_bq',
|
||||
host: 'warehouse.example.com',
|
||||
database: 'analytics',
|
||||
schema: 'public',
|
||||
dialect_name: 'bigquery_standard_sql',
|
||||
},
|
||||
]),
|
||||
all_lookml_models: vi
|
||||
.fn()
|
||||
.mockResolvedValue([
|
||||
{ name: 'b2b', label: 'B2B', explores: [{ name: 'sales_pipeline', label: 'Sales Pipeline' }] },
|
||||
]),
|
||||
lookml_model_explore: vi.fn().mockResolvedValue({
|
||||
name: 'sales_pipeline',
|
||||
label: 'Sales Pipeline',
|
||||
description: 'Opportunity pipeline',
|
||||
sql_table_name: 'proj.dataset.opportunities AS opportunities',
|
||||
connection_name: 'b2b_sandbox_bq',
|
||||
view_name: 'opportunities',
|
||||
fields: {
|
||||
dimensions: [{ name: 'opportunities.stage', label: 'Stage', type: 'string', sql: '$' + '{TABLE}.stage' }],
|
||||
measures: [{ name: 'opportunities.arr', label: 'ARR', type: 'sum', sql: '$' + '{TABLE}.arr' }],
|
||||
},
|
||||
joins: [
|
||||
{
|
||||
name: 'accounts',
|
||||
type: 'left_outer',
|
||||
relationship: 'many_to_one',
|
||||
sql_table_name: 'proj.dataset.accounts',
|
||||
sql_on: '$' + '{opportunities.account_id} = $' + '{accounts.id}',
|
||||
from: null,
|
||||
},
|
||||
],
|
||||
}),
|
||||
run_inline_query: vi.fn().mockResolvedValue('[]'),
|
||||
logout: vi.fn().mockResolvedValue(undefined),
|
||||
...overrides,
|
||||
};
|
||||
return port;
|
||||
}
|
||||
|
||||
describe('LookerClient', () => {
|
||||
  it('validates credentials with me()', async () => {
    const client = new LookerClient(params(), { sdkFactory: () => sdk() });

    await expect(client.testConnection()).resolves.toEqual({
      success: true,
      metadata: { userId: '1', displayName: 'API User', email: 'api@example.com' },
    });
  });
|
||||
|
||||
  it('does not warn to console when optional prioritization inputs fail by default', async () => {
    const warn = vi.spyOn(console, 'warn').mockImplementation(() => undefined);
    const fakeSdk = sdk({
      search_dashboards: vi.fn().mockRejectedValue(new Error('dashboards unavailable')),
      search_looks: vi.fn().mockRejectedValue(new Error('looks unavailable')),
    });
    const client = new LookerClient(params(), { sdkFactory: () => fakeSdk });

    await expect(client.getSignals()).resolves.toMatchObject({
      dashboardUsage: [],
      lookUsage: [],
      scheduledPlans: [],
      favorites: [],
    });

    expect(warn).not.toHaveBeenCalled();
  });
|
||||
|
||||
it('maps dashboards, looks, folders, models, explores, users, and groups to staged DTOs', async () => {
|
||||
const fakeSdk = sdk();
|
||||
const client = new LookerClient(params(), { sdkFactory: () => fakeSdk });
|
||||
|
||||
await expect(client.listDashboards()).resolves.toEqual([{ id: '10', updatedAt: null }]);
|
||||
await expect(client.getDashboard('10')).resolves.toMatchObject({
|
||||
lookerId: '10',
|
||||
title: 'Revenue Dashboard',
|
||||
tiles: [{ id: '99', query: { model: 'b2b', view: 'sales_pipeline' } }],
|
||||
});
|
||||
await expect(client.listLooks()).resolves.toEqual([{ id: '30', updatedAt: null }]);
|
||||
await expect(client.getLook('30')).resolves.toMatchObject({
|
||||
lookerId: '30',
|
||||
title: 'Open Pipeline ARR',
|
||||
query: { model: 'b2b', view: 'sales_pipeline' },
|
||||
});
|
||||
await expect(client.listFolders()).resolves.toEqual({
|
||||
folders: [{ id: '20', name: 'Executive', parentId: null, path: ['Executive'] }],
|
||||
});
|
||||
await expect(client.listLookmlModels()).resolves.toEqual({
|
||||
models: [{ name: 'b2b', label: 'B2B', explores: [{ name: 'sales_pipeline', label: 'Sales Pipeline' }] }],
|
||||
});
|
||||
await expect(client.listLookerConnections()).resolves.toEqual([
|
||||
{
|
||||
name: 'b2b_sandbox_bq',
|
||||
host: 'warehouse.example.com',
|
||||
database: 'analytics',
|
||||
schema: 'public',
|
||||
dialect: 'bigquery_standard_sql',
|
||||
},
|
||||
]);
|
||||
await expect(client.getExplore('b2b', 'sales_pipeline')).resolves.toMatchObject({
|
||||
modelName: 'b2b',
|
||||
exploreName: 'sales_pipeline',
|
||||
rawSqlTableName: 'proj.dataset.opportunities AS opportunities',
|
||||
connectionName: 'b2b_sandbox_bq',
|
||||
viewName: 'opportunities',
|
||||
fields: { dimensions: [{ name: 'opportunities.stage' }], measures: [{ name: 'opportunities.arr' }] },
|
||||
joins: [
|
||||
{
|
||||
name: 'accounts',
|
||||
rawSqlTableName: 'proj.dataset.accounts',
|
||||
sqlOn: '$' + '{opportunities.account_id} = $' + '{accounts.id}',
|
||||
from: null,
|
||||
targetTable: null,
|
||||
},
|
||||
],
|
||||
targetWarehouseConnectionId: null,
|
||||
targetTable: null,
|
||||
});
|
||||
expect(fakeSdk.dashboard).toHaveBeenCalledWith(
|
||||
'10',
|
||||
'id,title,description,folder_id,user_id,updated_at,dashboard_elements(id,title,look_id,query(id,model,view,fields,filters,sorts,limit,dynamic_fields))',
|
||||
);
|
||||
expect(fakeSdk.look).toHaveBeenCalledWith(
|
||||
'30',
|
||||
'id,title,description,folder_id,user_id,updated_at,query(id,model,view,fields,filters,sorts,limit,dynamic_fields)',
|
||||
);
|
||||
expect(fakeSdk.lookml_model_explore).toHaveBeenCalledWith(
|
||||
'b2b',
|
||||
'sales_pipeline',
|
||||
'name,label,description,sql_table_name,connection_name,view_name,fields,joins(name,type,relationship,sql_table_name,sql_on,from)',
|
||||
);
|
||||
expect(fakeSdk.all_connections).toHaveBeenCalledWith('name,host,database,schema,dialect_name');
|
||||
});
|
||||
|
||||
it('returns empty usage signals when system activity access fails', async () => {
|
||||
const client = new LookerClient(params(), {
|
||||
sdkFactory: () =>
|
||||
sdk({
|
||||
run_inline_query: vi.fn().mockRejectedValue(new Error('access denied')),
|
||||
search_dashboards: vi.fn().mockResolvedValue([{ id: '10', favorite_count: 4 }]),
|
||||
search_looks: vi.fn().mockResolvedValue([{ id: '30', favorite_count: 2 }]),
|
||||
search_scheduled_plans: vi.fn().mockResolvedValue([]),
|
||||
}),
|
||||
});
|
||||
|
||||
await expect(client.getSignals()).resolves.toEqual({
|
||||
dashboardUsage: [],
|
||||
lookUsage: [],
|
||||
scheduledPlans: [],
|
||||
favorites: [
|
||||
{ contentId: '10', contentType: 'dashboard', favoriteCount: 4 },
|
||||
{ contentId: '30', contentType: 'look', favoriteCount: 2 },
|
||||
],
|
||||
});
|
||||
});
|
||||
|
||||
it('paginates dashboard and Look searches', async () => {
|
||||
const dashboardPageOne = Array.from({ length: 500 }, (_, index) => ({ id: String(index + 1) }));
|
||||
const lookPageOne = Array.from({ length: 500 }, (_, index) => ({ id: String(index + 1001) }));
|
||||
const fakeSdk = sdk({
|
||||
search_dashboards: vi
|
||||
.fn()
|
||||
.mockResolvedValueOnce(dashboardPageOne)
|
||||
.mockResolvedValueOnce([{ id: '501' }]),
|
||||
search_looks: vi
|
||||
.fn()
|
||||
.mockResolvedValueOnce(lookPageOne)
|
||||
.mockResolvedValueOnce([{ id: '1501' }]),
|
||||
});
|
||||
const client = new LookerClient(params(), { sdkFactory: () => fakeSdk });
|
||||
|
||||
await expect(client.listDashboards()).resolves.toHaveLength(501);
|
||||
await expect(client.listLooks()).resolves.toHaveLength(501);
|
||||
|
||||
expect(fakeSdk.search_dashboards).toHaveBeenNthCalledWith(
|
||||
1,
|
||||
expect.objectContaining({
|
||||
deleted: false,
|
||||
fields: 'id,updated_at',
|
||||
limit: 500,
|
||||
offset: 0,
|
||||
sorts: 'id',
|
||||
}),
|
||||
);
|
||||
expect(fakeSdk.search_dashboards).toHaveBeenNthCalledWith(
|
||||
2,
|
||||
expect.objectContaining({
|
||||
limit: 500,
|
||||
offset: 500,
|
||||
}),
|
||||
);
|
||||
expect(fakeSdk.search_looks).toHaveBeenNthCalledWith(
|
||||
1,
|
||||
expect.objectContaining({
|
||||
deleted: false,
|
||||
fields: 'id,updated_at',
|
||||
limit: 500,
|
||||
offset: 0,
|
||||
sorts: 'id',
|
||||
}),
|
||||
);
|
||||
expect(fakeSdk.search_looks).toHaveBeenNthCalledWith(
|
||||
2,
|
||||
expect.objectContaining({
|
||||
limit: 500,
|
||||
offset: 500,
|
||||
}),
|
||||
);
|
||||
});
|
||||
|
||||
  it('returns updatedAt cursors from dashboard and Look listing rows', async () => {
    const fakeSdk = sdk({
      search_dashboards: vi.fn().mockResolvedValue([{ id: '10', updated_at: '2026-04-30T12:00:00.000Z' }]),
      search_looks: vi.fn().mockResolvedValue([{ id: '30', updated_at: '2026-04-30T11:00:00.000Z' }]),
    });
    const client = new LookerClient(params(), { sdkFactory: () => fakeSdk });

    await expect(client.listDashboards()).resolves.toEqual([{ id: '10', updatedAt: '2026-04-30T12:00:00.000Z' }]);
    await expect(client.listLooks()).resolves.toEqual([{ id: '30', updatedAt: '2026-04-30T11:00:00.000Z' }]);
  });
|
||||
|
||||
  it('logs out the SDK session during cleanup', async () => {
    const fakeSdk = sdk();
    const client = new LookerClient(params(), { sdkFactory: () => fakeSdk });

    await client.testConnection();
    await client.cleanup();

    expect(fakeSdk.logout).toHaveBeenCalledTimes(1);
  });
|
||||
|
||||
it('aggregates usage, scheduled-plan, and favorite signals', async () => {
|
||||
const runInlineQuery = vi
|
||||
.fn()
|
||||
.mockResolvedValueOnce(
|
||||
JSON.stringify([
|
||||
{
|
||||
'dashboard.id': '10',
|
||||
'history.query_run_count': 3,
|
||||
'history.created_date': '2026-04-30',
|
||||
'user.id': 'user-1',
|
||||
},
|
||||
{
|
||||
'dashboard.id': '10',
|
||||
'history.query_run_count': '2',
|
||||
'history.created_date': '2026-04-29',
|
||||
'user.id': 'user-2',
|
||||
},
|
||||
]),
|
||||
)
|
||||
.mockResolvedValueOnce(
|
||||
JSON.stringify([
|
||||
{
|
||||
'look.id': '30',
|
||||
'history.query_run_count': 7,
|
||||
'history.created_date': '2026-04-28',
|
||||
'user.id': 'user-1',
|
||||
},
|
||||
]),
|
||||
);
|
||||
const fakeSdk = sdk({
|
||||
run_inline_query: runInlineQuery,
|
||||
search_dashboards: vi.fn().mockResolvedValueOnce([{ id: '10', favorite_count: 4 }]),
|
||||
search_looks: vi.fn().mockResolvedValueOnce([{ id: '30', favorite_count: 2 }]),
|
||||
search_scheduled_plans: vi.fn().mockResolvedValueOnce([
|
||||
{
|
||||
id: 'sp-dashboard',
|
||||
dashboard_id: '10',
|
||||
look_id: null,
|
||||
enabled: true,
|
||||
scheduled_plan_destination: [{ id: 'dest-1' }, { id: 'dest-2' }],
|
||||
},
|
||||
{
|
||||
id: 'sp-look',
|
||||
dashboard_id: null,
|
||||
look_id: '30',
|
||||
enabled: true,
|
||||
scheduled_plan_destination: [{ id: 'dest-3' }],
|
||||
},
|
||||
]),
|
||||
});
|
||||
const client = new LookerClient(params(), { sdkFactory: () => fakeSdk });
|
||||
|
||||
await expect(client.getSignals()).resolves.toEqual({
|
||||
dashboardUsage: [
|
||||
{
|
||||
contentId: '10',
|
||||
queryCount30d: 5,
|
||||
uniqueUsers30d: 2,
|
||||
lastRunAt: '2026-04-30',
|
||||
topUsers: ['user-1', 'user-2'],
|
||||
},
|
||||
],
|
||||
lookUsage: [
|
||||
{
|
||||
contentId: '30',
|
||||
queryCount30d: 7,
|
||||
uniqueUsers30d: 1,
|
||||
lastRunAt: '2026-04-28',
|
||||
topUsers: ['user-1'],
|
||||
},
|
||||
],
|
||||
scheduledPlans: [
|
||||
{
|
||||
contentId: '10',
|
||||
contentType: 'dashboard',
|
||||
isScheduled: true,
|
||||
scheduleCount: 1,
|
||||
recipientCount: 2,
|
||||
},
|
||||
{
|
||||
contentId: '30',
|
||||
contentType: 'look',
|
||||
isScheduled: true,
|
||||
scheduleCount: 1,
|
||||
recipientCount: 1,
|
||||
},
|
||||
],
|
||||
favorites: [
|
||||
{ contentId: '10', contentType: 'dashboard', favoriteCount: 4 },
|
||||
{ contentId: '30', contentType: 'look', favoriteCount: 2 },
|
||||
],
|
||||
});
|
||||
|
||||
expect(runInlineQuery).toHaveBeenNthCalledWith(
|
||||
1,
|
||||
expect.objectContaining({
|
||||
result_format: 'json',
|
||||
body: expect.objectContaining({
|
||||
model: 'system__activity',
|
||||
view: 'history',
|
||||
fields: ['dashboard.id', 'history.query_run_count', 'history.created_date', 'user.id'],
|
||||
}),
|
||||
}),
|
||||
);
|
||||
expect(fakeSdk.search_scheduled_plans).toHaveBeenCalledWith(
|
||||
expect.objectContaining({
|
||||
all_users: true,
|
||||
fields: 'id,dashboard_id,look_id,enabled,scheduled_plan_destination',
|
||||
limit: 500,
|
||||
offset: 0,
|
||||
sorts: 'id',
|
||||
}),
|
||||
);
|
||||
});
|
||||
|
||||
  it('retries a 429 response once using Retry-After seconds', async () => {
    const sleep = vi.fn().mockResolvedValue(undefined);
    const rateLimitError = new Error('rate limited');
    Object.assign(rateLimitError, { statusCode: 429, headers: { 'retry-after': '2' } });
    const fakeSdk = sdk({
      search_dashboards: vi
        .fn()
        .mockRejectedValueOnce(rateLimitError)
        .mockResolvedValueOnce([{ id: '10' }]),
    });
    const client = new LookerClient(params(), { sdkFactory: () => fakeSdk, sleep });

    await expect(client.listDashboards()).resolves.toEqual([{ id: '10', updatedAt: null }]);

    expect(sleep).toHaveBeenCalledWith(2000);
    expect(fakeSdk.search_dashboards).toHaveBeenCalledTimes(2);
  });
|
||||
|
||||
  it('does not retry non-429 errors', async () => {
    const sleep = vi.fn().mockResolvedValue(undefined);
    const error = new Error('broken dashboard');
    Object.assign(error, { statusCode: 500 });
    const fakeSdk = sdk({ dashboard: vi.fn().mockRejectedValue(error) });
    const client = new LookerClient(params(), { sdkFactory: () => fakeSdk, sleep });

    await expect(client.getDashboard('10')).rejects.toThrow('broken dashboard');

    expect(sleep).not.toHaveBeenCalled();
    expect(fakeSdk.dashboard).toHaveBeenCalledTimes(1);
  });
|
||||
|
||||
  it('initializes the real @looker/sdk-node SDK with inline credentials without throwing', async () => {
    const client = new LookerClient(params());

    const result = await client.testConnection();

    // Without an injected sdkFactory, the real SDK is constructed via InlineLookerSettings.
    // This used to throw "Missing required configuration values like base_url" because
    // the parent NodeSettingsIniFile constructor validated config before the override
    // could supply credentials. Any outcome is acceptable now (an auth or network failure
    // against the bogus example URL is fine); what must NOT happen is a synchronous
    // SDK-init throw.
    expect(result.success).toBe(false);
    expect(result.error).toBeDefined();
    expect(result.error).not.toMatch(/Missing required configuration values/i);

    await client.cleanup();
  });
|
||||
|
||||
  it('strips trailing /api/4.0 from base_url so the SDK does not double-prefix it', async () => {
    const clientWithSuffix = new LookerClient({
      base_url: 'https://example.looker.com/api/4.0',
      client_id: 'id',
      [clientSecretParam]: 'credential', // pragma: allowlist secret
    });
    const result = await clientWithSuffix.testConnection();
    expect(result.success).toBe(false);
    // If base_url were double-prefixed, the SDK would hit /api/4.0/api/4.0/login. Whether
    // the URL is correctly normalized (transport-level network failure) or not (a 404/HTML
    // response), the error must not be a config-validation throw.
    expect(result.error).not.toMatch(/Missing required configuration values/i);
    await clientWithSuffix.cleanup();
  });
|
||||
});
|
||||
732
packages/cli/src/context/ingest/adapters/looker/client.ts
Normal file
@@ -0,0 +1,732 @@
import type {
  IRequestRunInlineQuery,
  IRequestSearchDashboards,
  IRequestSearchLooks,
  IRequestSearchScheduledPlans,
} from '@looker/sdk';
import type { IApiSection, IApiSettings } from '@looker/sdk-rtl';
import { LookerNodeSDK, NodeSettings } from '@looker/sdk-node';
import type { LookerRuntimeClient } from './fetch.js';
import type {
  StagedDashboardFile,
  StagedExploreFile,
  StagedFoldersTreeFile,
  StagedGroupFile,
  StagedLookerQuery,
  StagedLookerSignalsFile,
  StagedLookFile,
  StagedLookmlModelsFile,
  StagedUserFile,
} from './types.js';
|
||||
|
||||
type LookerRecord = Record<string, unknown>;
|
||||
|
||||
export interface TestConnectionResult {
|
||||
success: boolean;
|
||||
error?: string;
|
||||
metadata?: Record<string, unknown>;
|
||||
}
|
||||
|
||||
export interface LookerConnectionParams extends Record<string, unknown> {
|
||||
base_url: string;
|
||||
client_id: string;
|
||||
client_secret: string;
|
||||
}
|
||||
|
||||
export interface LookerWarehouseConnectionInfo {
|
||||
name: string;
|
||||
host: string | null;
|
||||
database: string | null;
|
||||
schema: string | null;
|
||||
dialect: string | null;
|
||||
}
|
||||
|
||||
const LOOKER_PAGE_SIZE = 500;
|
||||
const LOOKER_DASHBOARD_FIELDS =
|
||||
'id,title,description,folder_id,user_id,updated_at,dashboard_elements(id,title,look_id,query(id,model,view,fields,filters,sorts,limit,dynamic_fields))';
|
||||
const LOOKER_LOOK_FIELDS =
|
||||
'id,title,description,folder_id,user_id,updated_at,query(id,model,view,fields,filters,sorts,limit,dynamic_fields)';
|
||||
const LOOKER_EXPLORE_FIELDS =
|
||||
'name,label,description,sql_table_name,connection_name,view_name,fields,joins(name,type,relationship,sql_table_name,sql_on,from)';
|
||||
|
||||
export interface LookerSdkPort {
|
||||
me(fields?: string): Promise<LookerRecord>;
|
||||
search_dashboards(request?: LookerRecord): Promise<LookerRecord[]>;
|
||||
dashboard(id: string, fields?: string): Promise<LookerRecord>;
|
||||
search_looks(request?: LookerRecord): Promise<LookerRecord[]>;
|
||||
search_scheduled_plans(request?: LookerRecord): Promise<LookerRecord[]>;
|
||||
look(id: string, fields?: string): Promise<LookerRecord>;
|
||||
all_folders(fields?: string): Promise<LookerRecord[]>;
|
||||
all_users(fields?: string): Promise<LookerRecord[]>;
|
||||
all_groups(fields?: string): Promise<LookerRecord[]>;
|
||||
all_connections(fields?: string): Promise<LookerRecord[]>;
|
||||
all_lookml_models(fields?: string): Promise<LookerRecord[]>;
|
||||
lookml_model_explore(modelName: string, exploreName: string, fields?: string): Promise<LookerRecord>;
|
||||
run_inline_query(request: IRequestRunInlineQuery): Promise<string>;
|
||||
logout(): Promise<void>;
|
||||
}
|
||||
|
||||
export interface LookerClientLogger {
|
||||
log(message: string): void;
|
||||
warn(message: string): void;
|
||||
error(message: string): void;
|
||||
debug?(message: string): void;
|
||||
}
|
||||
|
||||
export interface LookerClientDeps {
|
||||
sdkFactory?: (params: LookerConnectionParams) => LookerSdkPort;
|
||||
sleep?: (ms: number) => Promise<void>;
|
||||
logger?: LookerClientLogger;
|
||||
}
|
||||
|
||||
const defaultLogger: LookerClientLogger = {
|
||||
log: () => undefined,
|
||||
warn: () => undefined,
|
||||
error: () => undefined,
|
||||
debug: () => undefined,
|
||||
};
|
||||
|
||||
class InlineLookerSettings extends NodeSettings {
  constructor(private readonly params: LookerConnectionParams) {
    super('', {
      base_url: normalizeBaseUrl(params.base_url),
      client_id: params.client_id,
      client_secret: params.client_secret, // pragma: allowlist secret
      verify_ssl: 'true',
      timeout: '120',
    } as unknown as IApiSettings);
  }

  override readConfig(_section?: string): IApiSection {
    return {
      base_url: normalizeBaseUrl(this.params.base_url),
      client_id: this.params.client_id,
      client_secret: this.params.client_secret, // pragma: allowlist secret
      verify_ssl: 'true',
      timeout: '120',
    };
  }
}
|
||||
|
||||
function createLookerSdkPort(params: LookerConnectionParams): LookerSdkPort {
|
||||
const sdk = LookerNodeSDK.init40(new InlineLookerSettings(params));
|
||||
return {
|
||||
me: (fields) => sdk.ok(sdk.me(fields)).then(toRecord),
|
||||
search_dashboards: (request) =>
|
||||
sdk.ok(sdk.search_dashboards((request ?? {}) as IRequestSearchDashboards)).then(toRecordArray),
|
||||
dashboard: (id, fields) => sdk.ok(sdk.dashboard(id, fields)).then(toRecord),
|
||||
search_looks: (request) => sdk.ok(sdk.search_looks((request ?? {}) as IRequestSearchLooks)).then(toRecordArray),
|
||||
search_scheduled_plans: (request) =>
|
||||
sdk.ok(sdk.search_scheduled_plans((request ?? {}) as IRequestSearchScheduledPlans)).then(toRecordArray),
|
||||
look: (id, fields) => sdk.ok(sdk.look(id, fields)).then(toRecord),
|
||||
all_folders: (fields) => sdk.ok(sdk.all_folders(fields)).then(toRecordArray),
|
||||
all_users: (fields) => sdk.ok(sdk.all_users({ fields })).then(toRecordArray),
|
||||
all_groups: (fields) => sdk.ok(sdk.all_groups({ fields })).then(toRecordArray),
|
||||
all_connections: (fields) => sdk.ok(sdk.all_connections(fields)).then(toRecordArray),
|
||||
all_lookml_models: (fields) => sdk.ok(sdk.all_lookml_models({ fields })).then(toRecordArray),
|
||||
lookml_model_explore: (modelName, exploreName, fields) =>
|
||||
sdk
|
||||
.ok(sdk.lookml_model_explore({ lookml_model_name: modelName, explore_name: exploreName, fields }))
|
||||
.then(toRecord),
|
||||
run_inline_query: (request) => sdk.ok(sdk.run_inline_query(request)),
|
||||
logout: async () => {
|
||||
await sdk.authSession.logout();
|
||||
},
|
||||
};
|
||||
}
|
||||
|
||||
export class LookerClient implements LookerRuntimeClient {
|
||||
private readonly logger: LookerClientLogger;
|
||||
private readonly params: LookerConnectionParams;
|
||||
private sdkInstance: LookerSdkPort | null = null;
|
||||
|
||||
constructor(
|
||||
connectionParams: Record<string, unknown>,
|
||||
private readonly deps: LookerClientDeps = {},
|
||||
) {
|
||||
this.logger = deps.logger ?? defaultLogger;
|
||||
this.params = parseLookerConnectionParams(connectionParams);
|
||||
}
|
||||
|
||||
get dataSourceType(): string {
|
||||
return 'LOOKER';
|
||||
}
|
||||
|
||||
async testConnection(): Promise<TestConnectionResult> {
|
||||
try {
|
||||
const me = await this.withRateLimitRetry(() => this.sdk().me('id,display_name,email'));
|
||||
return {
|
||||
success: true,
|
||||
metadata: {
|
||||
userId: stringValue(me.id),
|
||||
displayName: nullableString(me.display_name),
|
||||
email: nullableString(me.email),
|
||||
},
|
||||
};
|
||||
} catch (error) {
|
||||
return { success: false, error: error instanceof Error ? error.message : String(error) };
|
||||
}
|
||||
}
|
||||
|
||||
async listDashboards(): Promise<Array<{ id: string; updatedAt: string | null }>> {
|
||||
const dashboards = await this.collectPaged((offset) =>
|
||||
this.sdk().search_dashboards({
|
||||
deleted: false,
|
||||
fields: 'id,updated_at',
|
||||
limit: LOOKER_PAGE_SIZE,
|
||||
offset,
|
||||
sorts: 'id',
|
||||
}),
|
||||
);
|
||||
return dashboards.flatMap(entityRef);
|
||||
}
|
||||
|
||||
async getDashboard(id: string): Promise<StagedDashboardFile> {
|
||||
const dashboard = await this.withRateLimitRetry(() => this.sdk().dashboard(id, LOOKER_DASHBOARD_FIELDS));
|
||||
const elements = arrayValue(dashboard.dashboard_elements);
|
||||
return {
|
||||
lookerId: stringValue(dashboard.id),
|
||||
title: stringValue(dashboard.title),
|
||||
description: nullableString(dashboard.description),
|
||||
folderId: nullableString(dashboard.folder_id),
|
||||
ownerId: nullableString(dashboard.user_id),
|
||||
updatedAt: nullableString(dashboard.updated_at),
|
||||
tiles: elements.map((tile) => ({
|
||||
id: stringValue(tile.id),
|
||||
title: nullableString(tile.title),
|
||||
lookId: nullableString(tile.look_id),
|
||||
query: queryValue(tile.query),
|
||||
})),
|
||||
};
|
||||
}
|
||||
|
||||
async listLooks(): Promise<Array<{ id: string; updatedAt: string | null }>> {
|
||||
const looks = await this.collectPaged((offset) =>
|
||||
this.sdk().search_looks({
|
||||
deleted: false,
|
||||
fields: 'id,updated_at',
|
||||
limit: LOOKER_PAGE_SIZE,
|
||||
offset,
|
||||
sorts: 'id',
|
||||
}),
|
||||
);
|
||||
return looks.flatMap(entityRef);
|
||||
}
|
||||
|
||||
async getLook(id: string): Promise<StagedLookFile> {
|
||||
const look = await this.withRateLimitRetry(() => this.sdk().look(id, LOOKER_LOOK_FIELDS));
|
||||
return {
|
||||
lookerId: stringValue(look.id),
|
||||
title: stringValue(look.title),
|
||||
description: nullableString(look.description),
|
||||
folderId: nullableString(look.folder_id),
|
||||
ownerId: nullableString(look.user_id),
|
||||
updatedAt: nullableString(look.updated_at),
|
||||
query: queryValue(look.query),
|
||||
};
|
||||
}
|
||||
|
||||
async listFolders(): Promise<StagedFoldersTreeFile> {
|
||||
const folders = await this.withRateLimitRetry(() => this.sdk().all_folders('id,name,parent_id'));
|
||||
const byId = new Map<string, LookerRecord>();
|
||||
for (const folder of folders) {
|
||||
byId.set(stringValue(folder.id), folder);
|
||||
}
|
||||
return {
|
||||
folders: folders.map((folder) => ({
|
||||
id: stringValue(folder.id),
|
||||
name: stringValue(folder.name),
|
||||
parentId: nullableString(folder.parent_id),
|
||||
path: folderPath(folder, byId),
|
||||
})),
|
||||
};
|
||||
}
|
||||
|
||||
async listUsers(): Promise<StagedUserFile[]> {
|
||||
const users = await this.withRateLimitRetry(() => this.sdk().all_users('id,display_name,email'));
|
||||
return users.map((user) => ({
|
||||
id: stringValue(user.id),
|
||||
displayName: nullableString(user.display_name),
|
||||
email: nullableString(user.email),
|
||||
}));
|
||||
}
|
||||
|
||||
async listGroups(): Promise<StagedGroupFile[]> {
|
||||
const groups = await this.withRateLimitRetry(() => this.sdk().all_groups('id,name'));
|
||||
return groups.map((group) => ({
|
||||
id: stringValue(group.id),
|
||||
name: stringValue(group.name),
|
||||
}));
|
||||
}
|
||||
|
||||
async listLookmlModels(): Promise<StagedLookmlModelsFile> {
|
||||
const models = await this.withRateLimitRetry(() => this.sdk().all_lookml_models('name,label,explores'));
|
||||
return {
|
||||
models: models.map((model) => ({
|
||||
name: stringValue(model.name),
|
||||
label: nullableString(model.label),
|
||||
explores: arrayValue(model.explores).map((explore) => ({
|
||||
name: stringValue(explore.name),
|
||||
label: nullableString(explore.label),
|
||||
})),
|
||||
})),
|
||||
};
|
||||
}
|
||||
|
||||
async listLookerConnections(): Promise<LookerWarehouseConnectionInfo[]> {
|
||||
const connections = await this.withRateLimitRetry(() =>
|
||||
this.sdk().all_connections('name,host,database,schema,dialect_name'),
|
||||
);
|
||||
return connections.map((connection) => ({
|
||||
name: stringValue(connection.name),
|
||||
host: nullableString(connection.host),
|
||||
database: nullableString(connection.database),
|
||||
schema: nullableString(connection.schema),
|
||||
dialect: nullableString(connection.dialect_name ?? connection.dialect),
|
||||
}));
|
||||
}
|
||||
|
||||
async getExplore(modelName: string, exploreName: string): Promise<StagedExploreFile> {
|
||||
const explore = await this.withRateLimitRetry(() =>
|
||||
this.sdk().lookml_model_explore(modelName, exploreName, LOOKER_EXPLORE_FIELDS),
|
||||
);
|
||||
const fields = recordValue(explore.fields);
|
||||
return {
|
||||
modelName,
|
||||
exploreName: stringValue(explore.name),
|
||||
label: nullableString(explore.label),
|
||||
description: nullableString(explore.description),
|
||||
rawSqlTableName: nullableString(explore.sql_table_name ?? explore.sqlTableName),
|
||||
connectionName: nullableString(explore.connection_name ?? explore.connectionName),
|
||||
viewName: nullableString(explore.view_name ?? explore.viewName),
|
||||
fields: {
|
||||
dimensions: arrayValue(fields.dimensions).map(stagedField),
|
||||
measures: arrayValue(fields.measures).map(stagedField),
|
||||
},
|
||||
joins: arrayValue(explore.joins).map((join) => ({
|
||||
name: stringValue(join.name),
|
||||
type: nullableString(join.type),
|
||||
relationship: nullableString(join.relationship),
|
||||
rawSqlTableName: nullableString(join.sql_table_name ?? join.sqlTableName),
|
||||
sqlOn: nullableString(join.sql_on ?? join.sqlOn),
|
||||
from: nullableString(join.from),
|
||||
targetTable: null,
|
||||
})),
|
||||
targetWarehouseConnectionId: null,
|
||||
targetTable: null,
|
||||
};
|
||||
}
|
||||
|
||||
async getSignals(): Promise<StagedLookerSignalsFile> {
|
||||
const [dashboardUsage, lookUsage, scheduledPlans, favorites] = await Promise.all([
|
||||
this.getUsageSignals('dashboard').catch((error) =>
|
||||
this.warnAndReturnEmpty('Looker system__activity dashboard usage unavailable', error),
|
||||
),
|
||||
this.getUsageSignals('look').catch((error) =>
|
||||
this.warnAndReturnEmpty('Looker system__activity Look usage unavailable', error),
|
||||
),
|
||||
this.getScheduledPlanSignals().catch((error) =>
|
||||
this.warnAndReturnEmpty('Looker scheduled-plan signals unavailable', error),
|
||||
),
|
||||
this.getFavoriteSignals().catch((error) => this.warnAndReturnEmpty('Looker favorite signals unavailable', error)),
|
||||
]);
|
||||
|
||||
return { dashboardUsage, lookUsage, scheduledPlans, favorites };
|
||||
}
|
||||
|
||||
  async cleanup(): Promise<void> {
    const sdk = this.sdkInstance;
    if (!sdk) {
      return;
    }
    try {
      await sdk.logout();
    } finally {
      // Drop the cached session even when logout rejects, so a later call builds a fresh SDK.
      this.sdkInstance = null;
    }
  }
|
||||
|
||||
private async getUsageSignals(contentType: 'dashboard' | 'look'): Promise<StagedLookerSignalsFile['dashboardUsage']> {
|
||||
const idField = contentType === 'dashboard' ? 'dashboard.id' : 'look.id';
|
||||
const raw = await this.withRateLimitRetry(() =>
|
||||
this.sdk().run_inline_query({
|
||||
result_format: 'json',
|
||||
body: {
|
||||
model: 'system__activity',
|
||||
view: 'history',
|
||||
fields: [idField, 'history.query_run_count', 'history.created_date', 'user.id'],
|
||||
filters: {
|
||||
'history.created_date': '30 days',
|
||||
[idField]: '-NULL',
|
||||
},
|
||||
sorts: ['history.query_run_count desc'],
|
||||
limit: '5000',
|
||||
},
|
||||
}),
|
||||
);
|
||||
|
||||
return aggregateUsageRows(parseJsonRows(raw), idField);
|
||||
}
|
||||
|
||||
  private async getScheduledPlanSignals(): Promise<StagedLookerSignalsFile['scheduledPlans']> {
    const plans = await this.collectPaged((offset) =>
      this.sdk().search_scheduled_plans({
        all_users: true,
        fields: 'id,dashboard_id,look_id,enabled,scheduled_plan_destination',
        limit: LOOKER_PAGE_SIZE,
        offset,
        sorts: 'id',
      }),
    );
    const byContent = new Map<
      string,
      {
        contentId: string;
        contentType: 'dashboard' | 'look';
        isScheduled: boolean;
        scheduleCount: number;
        recipientCount: number;
      }
    >();

    for (const plan of plans) {
      const dashboardId = nullableString(plan.dashboard_id);
      const lookId = nullableString(plan.look_id);
      const contentType = dashboardId ? 'dashboard' : lookId ? 'look' : null;
      const contentId = dashboardId ?? lookId;
      if (!contentType || !contentId) {
        continue;
      }
      const key = `${contentType}:${contentId}`;
      const current =
        byContent.get(key) ??
        ({
          contentId,
          contentType,
          isScheduled: false,
          scheduleCount: 0,
          recipientCount: 0,
        } satisfies StagedLookerSignalsFile['scheduledPlans'][number]);
      if (plan.enabled !== false) {
        current.isScheduled = true;
        current.scheduleCount += 1;
        current.recipientCount += arrayValue(plan.scheduled_plan_destination).length;
      }
      byContent.set(key, current);
    }

    return [...byContent.values()].filter((signal) => signal.scheduleCount > 0).sort(compareContentSignals);
  }

  private async getFavoriteSignals(): Promise<StagedLookerSignalsFile['favorites']> {
    const dashboards = await this.collectPaged((offset) =>
      this.sdk().search_dashboards({
        deleted: false,
        fields: 'id,favorite_count',
        limit: LOOKER_PAGE_SIZE,
        offset,
        sorts: 'id',
      }),
    );
    const looks = await this.collectPaged((offset) =>
      this.sdk().search_looks({
        deleted: false,
        fields: 'id,favorite_count',
        limit: LOOKER_PAGE_SIZE,
        offset,
        sorts: 'id',
      }),
    );

    return [
      ...dashboards.flatMap((dashboard) => favoriteSignal(dashboard, 'dashboard')),
      ...looks.flatMap((look) => favoriteSignal(look, 'look')),
    ].sort(compareContentSignals);
  }

  private warnAndReturnEmpty(message: string, error: unknown): never[] {
    this.logger.warn(`${message}; continuing without that prioritization input: ${errorMessage(error)}`);
    return [];
  }

  private async collectPaged(loadPage: (offset: number) => Promise<LookerRecord[]>): Promise<LookerRecord[]> {
    const rows: LookerRecord[] = [];
    for (let offset = 0; ; offset += LOOKER_PAGE_SIZE) {
      const page = await this.withRateLimitRetry(() => loadPage(offset));
      rows.push(...page);
      if (page.length < LOOKER_PAGE_SIZE) {
        return rows;
      }
    }
  }

  private async withRateLimitRetry<T>(load: () => Promise<T>): Promise<T> {
    try {
      return await load();
    } catch (error) {
      if (lookerStatusCode(error) !== 429) {
        throw error;
      }
      await (this.deps.sleep ?? sleep)(retryAfterMs(error));
      return load();
    }
  }

  private sdk(): LookerSdkPort {
    if (!this.sdkInstance) {
      this.sdkInstance = this.deps.sdkFactory?.(this.params) ?? createLookerSdkPort(this.params);
    }
    return this.sdkInstance;
  }
}

function parseLookerConnectionParams(raw: Record<string, unknown>): LookerConnectionParams {
  const baseUrl = raw.base_url;
  const clientId = raw.client_id;
  const apiCredential = raw.client_secret; // pragma: allowlist secret
  if (typeof baseUrl !== 'string' || baseUrl.trim() === '') {
    throw new Error('Looker base_url is required');
  }
  if (typeof clientId !== 'string' || clientId.trim() === '') {
    throw new Error('Looker client_id is required');
  }
  if (typeof apiCredential !== 'string' || apiCredential.trim() === '') {
    throw new Error('Looker client_secret is required'); // pragma: allowlist secret
  }
  return { base_url: baseUrl, client_id: clientId, client_secret: apiCredential }; // pragma: allowlist secret
}

function toRecord(value: object): LookerRecord {
  return value as LookerRecord;
}

function toRecordArray(values: object[]): LookerRecord[] {
  return values.map(toRecord);
}

function normalizeBaseUrl(baseUrl: string): string {
  return baseUrl
    .trim()
    .replace(/\/+$/, '')
    .replace(/\/api\/(4\.0|3\.1)$/, '');
}

function entityRef(row: LookerRecord): Array<{ id: string; updatedAt: string | null }> {
  if (row.id === null || row.id === undefined) {
    return [];
  }
  return [{ id: String(row.id), updatedAt: nullableString(row.updated_at) }];
}

function queryValue(value: unknown): StagedLookerQuery | null {
  if (!value || typeof value !== 'object') {
    return null;
  }
  const record = value as LookerRecord;
  if (typeof record.model !== 'string' || typeof record.view !== 'string') {
    return null;
  }
  return {
    id: nullableString(record.id) ?? undefined,
    model: record.model,
    view: record.view,
    fields: stringArray(record.fields),
    filters: recordValue(record.filters),
    sorts: stringArray(record.sorts),
    limit: typeof record.limit === 'string' || typeof record.limit === 'number' ? record.limit : null,
    dynamicFields: nullableString(record.dynamic_fields ?? record.dynamicFields),
    targetWarehouseConnectionId: null,
    targetTable: null,
  };
}

function parseJsonRows(raw: string): LookerRecord[] {
  const parsed = JSON.parse(raw) as unknown;
  return Array.isArray(parsed) ? parsed.filter((row): row is LookerRecord => !!row && typeof row === 'object') : [];
}

function aggregateUsageRows(
  rows: LookerRecord[],
  idField: 'dashboard.id' | 'look.id',
): StagedLookerSignalsFile['dashboardUsage'] {
  const byContent = new Map<
    string,
    {
      contentId: string;
      queryCount30d: number;
      lastRunAt: string | null;
      users: Set<string>;
    }
  >();

  for (const row of rows) {
    const contentId = nullableString(row[idField]);
    if (!contentId) {
      continue;
    }
    const current = byContent.get(contentId) ?? {
      contentId,
      queryCount30d: 0,
      lastRunAt: null,
      users: new Set<string>(),
    };
    current.queryCount30d += numberValue(row['history.query_run_count']);
    const userId = nullableString(row['user.id']);
    if (userId) {
      current.users.add(userId);
    }
    const lastRunAt = nullableString(row['history.created_date']);
    if (lastRunAt && (!current.lastRunAt || lastRunAt > current.lastRunAt)) {
      current.lastRunAt = lastRunAt;
    }
    byContent.set(contentId, current);
  }

  return [...byContent.values()]
    .map((signal) => ({
      contentId: signal.contentId,
      queryCount30d: signal.queryCount30d,
      uniqueUsers30d: signal.users.size,
      lastRunAt: signal.lastRunAt,
      topUsers: [...signal.users].sort().slice(0, 5),
    }))
    .sort((a, b) => a.contentId.localeCompare(b.contentId));
}

function favoriteSignal(row: LookerRecord, contentType: 'dashboard' | 'look'): StagedLookerSignalsFile['favorites'] {
  const contentId = nullableString(row.id);
  if (!contentId) {
    return [];
  }
  return [{ contentId, contentType, favoriteCount: numberValue(row.favorite_count) }];
}

function compareContentSignals(
  a: { contentType?: string; contentId: string },
  b: { contentType?: string; contentId: string },
): number {
  return `${a.contentType ?? ''}:${a.contentId}`.localeCompare(`${b.contentType ?? ''}:${b.contentId}`);
}

function numberValue(value: unknown): number {
  if (typeof value === 'number' && Number.isFinite(value)) {
    return value;
  }
  if (typeof value === 'string' && value.trim() !== '') {
    const parsed = Number(value);
    return Number.isFinite(parsed) ? parsed : 0;
  }
  return 0;
}

function errorMessage(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}

async function sleep(ms: number): Promise<void> {
  await new Promise((resolve) => setTimeout(resolve, ms));
}

function lookerStatusCode(error: unknown): number | null {
  if (!error || typeof error !== 'object') {
    return null;
  }
  const record = error as Record<string, unknown>;
  const direct = record.statusCode ?? record.status;
  if (typeof direct === 'number') {
    return direct;
  }
  if (typeof direct === 'string') {
    const parsed = Number(direct);
    return Number.isFinite(parsed) ? parsed : null;
  }
  const response = record.response;
  if (response && typeof response === 'object') {
    return lookerStatusCode(response);
  }
  return null;
}

function retryAfterMs(error: unknown): number {
  const value = retryAfterHeader(error);
  if (!value) {
    return 1000;
  }
  const seconds = Number(value);
  if (Number.isFinite(seconds)) {
    return Math.max(0, seconds * 1000);
  }
  const dateMs = Date.parse(value);
  return Number.isFinite(dateMs) ? Math.max(0, dateMs - Date.now()) : 1000;
}

function retryAfterHeader(error: unknown): string | null {
  if (!error || typeof error !== 'object') {
    return null;
  }
  const record = error as Record<string, unknown>;
  const response = record.response;
  const responseRecord = response && typeof response === 'object' ? (response as Record<string, unknown>) : null;
  const headers = record.headers ?? responseRecord?.headers;
  if (!headers || typeof headers !== 'object') {
    return null;
  }
  const getter = (headers as { get?: unknown }).get;
  if (typeof getter === 'function') {
    const value = getter.call(headers, 'retry-after');
    return typeof value === 'string' ? value : null;
  }
  const headerRecord = headers as Record<string, unknown>;
  const direct = headerRecord['retry-after'] ?? headerRecord['Retry-After'];
  return typeof direct === 'string' ? direct : null;
}

function stagedField(value: LookerRecord) {
  return {
    name: stringValue(value.name),
    label: nullableString(value.label),
    type: nullableString(value.type),
    sql: nullableString(value.sql),
    description: nullableString(value.description),
  };
}

function folderPath(folder: LookerRecord, byId: Map<string, LookerRecord>): string[] {
  const path: string[] = [];
  let current: LookerRecord | undefined = folder;
  const seen = new Set<string>();
  while (current) {
    const id = stringValue(current.id);
    if (seen.has(id)) {
      break;
    }
    seen.add(id);
    path.unshift(stringValue(current.name));
    const parentId = nullableString(current.parent_id);
    current = parentId ? byId.get(parentId) : undefined;
  }
  return path;
}

function arrayValue(value: unknown): LookerRecord[] {
  return Array.isArray(value) ? value.filter((item): item is LookerRecord => !!item && typeof item === 'object') : [];
}

function recordValue(value: unknown): Record<string, unknown> {
  return value && typeof value === 'object' && !Array.isArray(value) ? { ...(value as Record<string, unknown>) } : {};
}

function stringArray(value: unknown): string[] {
  return Array.isArray(value) ? value.filter((item): item is string => typeof item === 'string') : [];
}

function stringValue(value: unknown): string {
  if (value === null || value === undefined) {
    return '';
  }
  return String(value);
}

function nullableString(value: unknown): string | null {
  if (value === null || value === undefined) {
    return null;
  }
  return String(value);
}
@@ -0,0 +1,44 @@
import { describe, expect, it, vi } from 'vitest';
import { createDaemonLookerTableIdentifierParser } from './daemon-table-identifier-parser.js';

describe('createDaemonLookerTableIdentifierParser', () => {
  it('posts parse items to the daemon endpoint', async () => {
    const requestJson = vi.fn(async () => ({
      results: {
        orders: {
          ok: true,
          catalog: null,
          schema: 'public',
          name: 'orders',
          canonical_table: 'public.orders',
        },
      },
    }));
    const parser = createDaemonLookerTableIdentifierParser({
      baseUrl: 'http://127.0.0.1:8765',
      requestJson,
    });

    await expect(parser.parse([{ key: 'orders', sql_table_name: 'public.orders', dialect: 'postgres' }])).resolves.toEqual({
      orders: {
        ok: true,
        catalog: null,
        schema: 'public',
        name: 'orders',
        canonical_table: 'public.orders',
      },
    });
    expect(requestJson).toHaveBeenCalledWith('/sql/parse-table-identifier', {
      items: [{ key: 'orders', sql_table_name: 'public.orders', dialect: 'postgres' }],
    });
  });

  it('rejects non-object daemon responses', async () => {
    const parser = createDaemonLookerTableIdentifierParser({
      baseUrl: 'http://127.0.0.1:8765',
      requestJson: async () => ({ results: null }),
    });

    await expect(parser.parse([])).rejects.toThrow('ktx-daemon table identifier parser returned invalid results');
  });
});
@@ -0,0 +1,81 @@
import { request as httpRequest } from 'node:http';
import { request as httpsRequest } from 'node:https';
import { URL } from 'node:url';
import type {
  LookerParsedIdentifier,
  LookerTableIdentifierParseItem,
  LookerTableIdentifierParser,
} from './mapping.js';

export type KtxDaemonTableIdentifierHttpJsonRunner = (
  path: string,
  payload: Record<string, unknown>,
) => Promise<Record<string, unknown>>;

export interface DaemonLookerTableIdentifierParserOptions {
  baseUrl: string;
  requestJson?: KtxDaemonTableIdentifierHttpJsonRunner;
}

export function createDaemonLookerTableIdentifierParser(
  options: DaemonLookerTableIdentifierParserOptions,
): LookerTableIdentifierParser {
  const requestJson = options.requestJson ?? postJson(options.baseUrl);
  return {
    async parse(items: LookerTableIdentifierParseItem[]): Promise<Record<string, LookerParsedIdentifier>> {
      const raw = await requestJson('/sql/parse-table-identifier', { items });
      if (!raw.results || typeof raw.results !== 'object' || Array.isArray(raw.results)) {
        throw new Error('ktx-daemon table identifier parser returned invalid results');
      }
      return raw.results as Record<string, LookerParsedIdentifier>;
    },
  };
}

function normalizedBaseUrl(baseUrl: string): string {
  return baseUrl.endsWith('/') ? baseUrl : `${baseUrl}/`;
}

function postJson(baseUrl: string): KtxDaemonTableIdentifierHttpJsonRunner {
  return async (path, payload) =>
    new Promise((resolve, reject) => {
      const target = new URL(path.replace(/^\//, ''), normalizedBaseUrl(baseUrl));
      const body = JSON.stringify(payload);
      const client = target.protocol === 'https:' ? httpsRequest : httpRequest;
      const request = client(
        target,
        {
          method: 'POST',
          headers: {
            accept: 'application/json',
            'content-type': 'application/json',
            'content-length': Buffer.byteLength(body),
          },
        },
        (response) => {
          const chunks: Buffer[] = [];
          response.on('data', (chunk: Buffer) => chunks.push(chunk));
          response.on('end', () => {
            const text = Buffer.concat(chunks).toString('utf8');
            const statusCode = response.statusCode ?? 0;
            if (statusCode < 200 || statusCode >= 300) {
              reject(new Error(`ktx-daemon HTTP ${path} failed with ${statusCode}: ${text}`));
              return;
            }
            try {
              const parsed = JSON.parse(text) as unknown;
              if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) {
                reject(new Error(`ktx-daemon HTTP ${path} returned non-object JSON`));
                return;
              }
              resolve(parsed as Record<string, unknown>);
            } catch (error) {
              reject(error);
            }
          });
        },
      );
      request.on('error', reject);
      request.end(body);
    });
}
@@ -0,0 +1,47 @@
import { mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { detectLookerStagedDir } from './detect.js';

async function touch(stagedDir: string, relPath: string, body = '{}\n'): Promise<void> {
  const abs = join(stagedDir, relPath);
  await mkdir(join(abs, '..'), { recursive: true });
  await writeFile(abs, body, 'utf-8');
}

describe('detectLookerStagedDir', () => {
  let stagedDir: string;

  beforeEach(async () => {
    stagedDir = await mkdtemp(join(tmpdir(), 'looker-detect-'));
  });

  afterEach(async () => {
    await rm(stagedDir, { recursive: true, force: true });
  });

  it('returns true when sync-config.json and at least one runtime entity are present', async () => {
    await touch(stagedDir, 'sync-config.json');
    await touch(stagedDir, 'explores/b2b/sales_pipeline.json');
    expect(await detectLookerStagedDir(stagedDir)).toBe(true);
  });

  it('returns true for dashboard-only staged dirs', async () => {
    await touch(stagedDir, 'sync-config.json');
    await touch(stagedDir, 'dashboards/10.json');
    expect(await detectLookerStagedDir(stagedDir)).toBe(true);
  });

  it('returns false without sync-config.json', async () => {
    await touch(stagedDir, 'looks/20.json');
    expect(await detectLookerStagedDir(stagedDir)).toBe(false);
  });

  it('returns false when only control files are present', async () => {
    await touch(stagedDir, 'sync-config.json');
    await touch(stagedDir, 'lookml_models.json');
    await touch(stagedDir, 'signals/dashboard_usage.json', '[]\n');
    expect(await detectLookerStagedDir(stagedDir)).toBe(false);
  });
});
28
packages/cli/src/context/ingest/adapters/looker/detect.ts
Normal file
@@ -0,0 +1,28 @@
import { readdir, stat } from 'node:fs/promises';
import { join, relative } from 'node:path';
import { STAGED_FILES } from './types.js';

const LOOKER_ENTITY_FILE_RE = /^(explores\/[^/]+\/[^/]+|dashboards\/[^/]+|looks\/[^/]+)\.json$/;

async function walk(root: string): Promise<string[]> {
  const entries = await readdir(root, { withFileTypes: true, recursive: true });
  return entries
    .filter((entry) => entry.isFile())
    .map((entry) => relative(root, join(entry.parentPath, entry.name)).replace(/\\/g, '/'))
    .sort();
}

export async function detectLookerStagedDir(stagedDir: string): Promise<boolean> {
  try {
    await stat(join(stagedDir, STAGED_FILES.syncConfig));
  } catch {
    return false;
  }

  try {
    const paths = await walk(stagedDir);
    return paths.some((path) => LOOKER_ENTITY_FILE_RE.test(path));
  } catch {
    return false;
  }
}
@@ -0,0 +1,188 @@
import { mkdir, mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { dirname, join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { getLookerTriageSignals, writeLookerEvidenceDocuments } from './evidence-documents.js';

async function writeJson(root: string, relPath: string, value: unknown): Promise<void> {
  const target = join(root, relPath);
  await mkdir(dirname(target), { recursive: true });
  await writeFile(target, `${JSON.stringify(value, null, 2)}\n`, 'utf-8');
}

async function readJson<T>(root: string, relPath: string): Promise<T> {
  return JSON.parse(await readFile(join(root, relPath), 'utf-8')) as T;
}

describe('Looker evidence documents', () => {
  let stagedDir: string;

  beforeEach(async () => {
    stagedDir = await mkdtemp(join(tmpdir(), 'looker-evidence-docs-'));
    await writeJson(stagedDir, 'explores/b2b/sales_pipeline.json', {
      modelName: 'b2b',
      exploreName: 'sales_pipeline',
      label: 'Sales Pipeline',
      description: 'Pipeline analysis explore.',
      fields: {
        dimensions: [
          { name: 'opportunities.stage', label: 'Stage', type: 'string', sql: '${TABLE}.stage', description: null },
        ],
        measures: [
          {
            name: 'opportunities.arr',
            label: 'ARR',
            type: 'sum',
            sql: '${TABLE}.arr',
            description: 'Annual recurring revenue.',
          },
        ],
      },
      joins: [{ name: 'accounts', type: 'left_outer', relationship: 'many_to_one' }],
    });
    await writeJson(stagedDir, 'dashboards/10.json', {
      lookerId: '10',
      title: 'Sales Pipeline Overview',
      description: 'Executive dashboard for open pipeline ARR.',
      folderId: '7',
      ownerId: '3',
      updatedAt: '2026-04-30T10:00:00.000Z',
      tiles: [
        {
          id: '100',
          title: 'Open Pipeline ARR',
          lookId: null,
          query: {
            model: 'b2b',
            view: 'sales_pipeline',
            fields: ['opportunities.arr', 'opportunities.stage'],
            filters: { 'opportunities.stage': 'open' },
            sorts: ['opportunities.arr desc'],
            limit: '500',
          },
        },
      ],
    });
    await writeJson(stagedDir, 'looks/20.json', {
      lookerId: '20',
      title: 'Active Opportunity Pipeline',
      description: 'Saved Look for active opportunity pipeline review.',
      folderId: '7',
      ownerId: '3',
      updatedAt: '2026-04-30T11:00:00.000Z',
      query: {
        model: 'b2b',
        view: 'sales_pipeline',
        fields: ['opportunities.arr'],
        filters: { 'opportunities.stage': 'open' },
        sorts: [],
        limit: '500',
      },
    });
    await writeJson(stagedDir, 'signals/dashboard_usage.json', [
      {
        contentId: '10',
        queryCount30d: 80,
        uniqueUsers30d: 12,
        lastRunAt: '2026-04-30T09:00:00.000Z',
        topUsers: ['3'],
      },
    ]);
    await writeJson(stagedDir, 'signals/look_usage.json', [
      {
        contentId: '20',
        queryCount30d: 2,
        uniqueUsers30d: 1,
        lastRunAt: '2026-04-29T09:00:00.000Z',
        topUsers: ['3'],
      },
    ]);
    await writeJson(stagedDir, 'signals/scheduled_plans.json', [
      { contentId: '10', contentType: 'dashboard', isScheduled: true, scheduleCount: 2, recipientCount: 5 },
    ]);
    await writeJson(stagedDir, 'signals/favorites.json', [
      { contentId: '10', contentType: 'dashboard', favoriteCount: 4 },
    ]);
  });

  afterEach(async () => {
    await rm(stagedDir, { recursive: true, force: true });
  });

  it('writes indexable metadata and markdown for explores, dashboards, and Looks', async () => {
    await writeLookerEvidenceDocuments(stagedDir);

    await expect(readJson(stagedDir, 'evidence/explores/b2b/sales_pipeline/metadata.json')).resolves.toMatchObject({
      objectType: 'looker_explore',
      id: 'looker:explore:b2b.sales_pipeline',
      title: 'Sales Pipeline',
      path: 'Looker / Explores / b2b.sales_pipeline',
      properties: {
        rawPath: 'explores/b2b/sales_pipeline.json',
        modelName: 'b2b',
        exploreName: 'sales_pipeline',
      },
    });
    await expect(readJson(stagedDir, 'evidence/dashboards/10/metadata.json')).resolves.toMatchObject({
      objectType: 'looker_dashboard',
      id: 'looker:dashboard:10',
      title: 'Sales Pipeline Overview',
      path: 'Looker / Dashboards / Sales Pipeline Overview',
      lastEditedAt: '2026-04-30T10:00:00.000Z',
      properties: {
        rawPath: 'dashboards/10.json',
        lookerId: '10',
      },
    });
    await expect(readJson(stagedDir, 'evidence/looks/20/metadata.json')).resolves.toMatchObject({
      objectType: 'looker_look',
      id: 'looker:look:20',
      title: 'Active Opportunity Pipeline',
      path: 'Looker / Looks / Active Opportunity Pipeline',
      properties: {
        rawPath: 'looks/20.json',
        lookerId: '20',
      },
    });

    const dashboardMarkdown = await readFile(join(stagedDir, 'evidence/dashboards/10/page.md'), 'utf-8');
    expect(dashboardMarkdown).toContain('# Sales Pipeline Overview');
    expect(dashboardMarkdown).toContain('Executive dashboard for open pipeline ARR.');
    expect(dashboardMarkdown).toContain('## Tile: Open Pipeline ARR');
    expect(dashboardMarkdown).toContain('- model: b2b');
    expect(dashboardMarkdown).toContain('- explore: sales_pipeline');
    expect(dashboardMarkdown).toContain('- opportunities.stage = open');
    expect(dashboardMarkdown).not.toContain('80');
    expect(dashboardMarkdown).not.toContain('queryCount30d');
    expect(dashboardMarkdown).not.toContain('recipient');
    expect(dashboardMarkdown).not.toContain('favorite');
    expect(dashboardMarkdown).not.toContain('owner');
  });

  it('returns usage-aware triage signals without exposing usage as document prose', async () => {
    await writeLookerEvidenceDocuments(stagedDir);

    await expect(getLookerTriageSignals(stagedDir, 'looker:dashboard:10')).resolves.toEqual({
      objectType: 'looker_dashboard',
      propertyHints: {
        contentType: 'dashboard',
        queryCount30d: '80',
        uniqueUsers30d: '12',
        isScheduled: 'true',
        favoriteCount: '4',
      },
      lastEditedAt: '2026-04-30T10:00:00.000Z',
    });
    await expect(getLookerTriageSignals(stagedDir, 'looker:look:20')).resolves.toEqual({
      objectType: 'looker_look',
      propertyHints: {
        contentType: 'look',
        queryCount30d: '2',
        uniqueUsers30d: '1',
        isScheduled: 'false',
        favoriteCount: '0',
      },
      lastEditedAt: '2026-04-30T11:00:00.000Z',
    });
  });
});
@@ -0,0 +1,378 @@
|
|||
import { mkdir, readdir, readFile, writeFile } from 'node:fs/promises';
|
||||
import { dirname, join, relative } from 'node:path';
|
||||
import type { TriageSignals } from '../../types.js';
|
||||
import {
|
||||
STAGED_FILES,
|
||||
type StagedDashboardFile,
|
||||
type StagedExploreFile,
|
||||
type StagedLookerSignalsFile,
|
||||
type StagedLookFile,
|
||||
stagedDashboardFileSchema,
|
||||
stagedExploreFileSchema,
|
||||
stagedLookerSignalsFileSchema,
|
||||
stagedLookFileSchema,
|
||||
} from './types.js';
|
||||
|
||||
type JsonObject = Record<string, unknown>;
|
||||
|
||||
interface EvidenceDocument {
|
||||
relDir: string;
|
||||
metadata: JsonObject;
|
||||
markdown: string;
|
||||
}
|
||||
|
||||
export async function writeLookerEvidenceDocuments(stagedDir: string): Promise<void> {
|
||||
const paths = await walkJson(stagedDir);
|
||||
const signals = await readSignals(stagedDir);
|
||||
const documents: EvidenceDocument[] = [];
|
||||
|
||||
for (const relPath of paths) {
|
||||
if (/^explores\/[^/]+\/[^/]+\.json$/.test(relPath)) {
|
||||
const explore = await readJson(stagedDir, relPath, stagedExploreFileSchema);
|
||||
documents.push(renderExploreEvidence(relPath, explore));
|
||||
continue;
|
||||
}
|
||||
if (/^dashboards\/[^/]+\.json$/.test(relPath)) {
|
||||
const dashboard = await readJson(stagedDir, relPath, stagedDashboardFileSchema);
|
||||
documents.push(renderDashboardEvidence(relPath, dashboard));
|
||||
continue;
|
||||
}
|
||||
if (/^looks\/[^/]+\.json$/.test(relPath)) {
|
||||
const look = await readJson(stagedDir, relPath, stagedLookFileSchema);
|
||||
documents.push(renderLookEvidence(relPath, look));
|
||||
}
|
||||
}
|
||||
|
||||
for (const document of documents) {
|
||||
await writeJson(stagedDir, join(document.relDir, 'metadata.json'), document.metadata);
|
||||
await writeText(stagedDir, join(document.relDir, 'page.md'), document.markdown);
|
||||
}
|
||||
|
||||
await writeJson(stagedDir, join(STAGED_FILES.evidenceRoot, 'signals-summary.json'), {
|
||||
dashboardUsageCount: signals.dashboardUsage.length,
|
||||
lookUsageCount: signals.lookUsage.length,
|
||||
scheduledPlanCount: signals.scheduledPlans.length,
|
||||
favoriteCount: signals.favorites.length,
|
||||
});
|
||||
}
|
||||
|
||||
export async function getLookerTriageSignals(stagedDir: string, externalId: string): Promise<TriageSignals> {
|
||||
const signals = await readSignals(stagedDir);
|
||||
const dashboardId = /^looker:dashboard:(.+)$/.exec(externalId)?.[1];
|
||||
if (dashboardId) {
|
||||
const dashboard = await readOptionalJson(
|
||||
stagedDir,
|
||||
`dashboards/${safePathSegment(dashboardId)}.json`,
|
||||
stagedDashboardFileSchema,
|
||||
);
|
||||
const usage = signals.dashboardUsage.find((item) => item.contentId === dashboardId);
|
||||
const schedule = signals.scheduledPlans.find(
|
||||
(item) => item.contentType === 'dashboard' && item.contentId === dashboardId,
|
||||
);
|
||||
const favorite = signals.favorites.find(
|
||||
(item) => item.contentType === 'dashboard' && item.contentId === dashboardId,
|
||||
);
|
||||
return {
|
||||
objectType: 'looker_dashboard',
|
||||
lastEditedAt: dashboard?.updatedAt ?? usage?.lastRunAt ?? undefined,
|
||||
propertyHints: {
|
||||
contentType: 'dashboard',
|
||||
queryCount30d: String(usage?.queryCount30d ?? 0),
|
||||
uniqueUsers30d: String(usage?.uniqueUsers30d ?? 0),
|
||||
isScheduled: String(schedule?.isScheduled ?? false),
|
||||
favoriteCount: String(favorite?.favoriteCount ?? 0),
|
||||
},
|
||||
};
|
||||
}
|
||||
|
||||
const lookId = /^looker:look:(.+)$/.exec(externalId)?.[1];
|
||||
if (lookId) {
|
||||
const look = await readOptionalJson(stagedDir, `looks/${safePathSegment(lookId)}.json`, stagedLookFileSchema);
|
||||
const usage = signals.lookUsage.find((item) => item.contentId === lookId);
|
||||
const schedule = signals.scheduledPlans.find((item) => item.contentType === 'look' && item.contentId === lookId);
|
||||
const favorite = signals.favorites.find((item) => item.contentType === 'look' && item.contentId === lookId);
|
||||
return {
|
||||
objectType: 'looker_look',
|
||||
lastEditedAt: look?.updatedAt ?? usage?.lastRunAt ?? undefined,
|
||||
propertyHints: {
|
||||
contentType: 'look',
|
||||
queryCount30d: String(usage?.queryCount30d ?? 0),
|
||||
uniqueUsers30d: String(usage?.uniqueUsers30d ?? 0),
|
||||
isScheduled: String(schedule?.isScheduled ?? false),
|
||||
favoriteCount: String(favorite?.favoriteCount ?? 0),
|
||||
},
|
||||
};
|
||||
}
|
||||
|
||||
const explore = /^looker:explore:([^.]+)\.(.+)$/.exec(externalId);
|
||||
if (explore) {
|
||||
return {
|
||||
objectType: 'looker_explore',
|
||||
propertyHints: {
|
||||
contentType: 'explore',
|
||||
modelName: explore[1],
|
||||
exploreName: explore[2],
|
||||
},
|
||||
};
|
||||
}
|
||||
|
||||
return { objectType: 'looker_runtime' };
|
||||
}
|
||||
|
||||
function renderExploreEvidence(rawPath: string, explore: StagedExploreFile): EvidenceDocument {
  const title = explore.label ?? `${explore.modelName}.${explore.exploreName}`;
  const relDir = join(
    STAGED_FILES.evidenceRoot,
    'explores',
    safePathSegment(explore.modelName),
    safePathSegment(explore.exploreName),
  );
  const lines = [
    `# ${title}`,
    '',
    explore.description ? explore.description : '',
    '',
    '## Explore',
    '',
    `- model: ${explore.modelName}`,
    `- explore: ${explore.exploreName}`,
    '',
    '## Dimensions',
    '',
    ...fieldLines(explore.fields.dimensions),
    '',
    '## Measures',
    '',
    ...fieldLines(explore.fields.measures),
    '',
    '## Joins',
    '',
    ...(explore.joins.length === 0
      ? ['- none']
      : explore.joins.map((item) => `- ${item.name}${item.relationship ? ` (${item.relationship})` : ''}`)),
  ];
  return {
    relDir,
    metadata: {
      objectType: 'looker_explore',
      id: `looker:explore:${explore.modelName}.${explore.exploreName}`,
      title,
      path: `Looker / Explores / ${explore.modelName}.${explore.exploreName}`,
      url: null,
      parentId: null,
      databaseId: null,
      dataSourceId: null,
      lastEditedAt: null,
      lastEditedBy: null,
      properties: {
        rawPath,
        modelName: explore.modelName,
        exploreName: explore.exploreName,
      },
    },
    markdown: normalizeMarkdown(lines),
  };
}

function renderDashboardEvidence(rawPath: string, dashboard: StagedDashboardFile): EvidenceDocument {
  const relDir = join(STAGED_FILES.evidenceRoot, 'dashboards', safePathSegment(dashboard.lookerId));
  const lines = [
    `# ${dashboard.title}`,
    '',
    dashboard.description ?? '',
    '',
    '## Dashboard Queries',
    '',
    ...dashboard.tiles.flatMap((tile) => [
      `## Tile: ${tile.title ?? tile.id}`,
      '',
      ...(tile.query ? queryLines(tile.query) : ['- no inline query captured']),
      '',
    ]),
  ];
  return {
    relDir,
    metadata: {
      objectType: 'looker_dashboard',
      id: `looker:dashboard:${dashboard.lookerId}`,
      title: dashboard.title,
      path: `Looker / Dashboards / ${dashboard.title}`,
      url: null,
      parentId: dashboard.folderId,
      databaseId: null,
      dataSourceId: null,
      lastEditedAt: dashboard.updatedAt,
      lastEditedBy: null,
      properties: {
        rawPath,
        lookerId: dashboard.lookerId,
      },
    },
    markdown: normalizeMarkdown(lines),
  };
}

function renderLookEvidence(rawPath: string, look: StagedLookFile): EvidenceDocument {
  const relDir = join(STAGED_FILES.evidenceRoot, 'looks', safePathSegment(look.lookerId));
  const lines = [
    `# ${look.title}`,
    '',
    look.description ?? '',
    '',
    '## Look Query',
    '',
    ...(look.query ? queryLines(look.query) : ['- no query captured']),
  ];
  return {
    relDir,
    metadata: {
      objectType: 'looker_look',
      id: `looker:look:${look.lookerId}`,
      title: look.title,
      path: `Looker / Looks / ${look.title}`,
      url: null,
      parentId: look.folderId,
      databaseId: null,
      dataSourceId: null,
      lastEditedAt: look.updatedAt,
      lastEditedBy: null,
      properties: {
        rawPath,
        lookerId: look.lookerId,
      },
    },
    markdown: normalizeMarkdown(lines),
  };
}

function fieldLines(
  fields: Array<{
    name: string;
    label: string | null;
    type: string | null;
    sql: string | null;
    description: string | null;
  }>,
): string[] {
  if (fields.length === 0) {
    return ['- none'];
  }
  return fields.map((field) => {
    const parts = [
      field.name,
      field.label ? `label: ${field.label}` : null,
      field.type ? `type: ${field.type}` : null,
      field.description ? `description: ${field.description}` : null,
    ].filter(Boolean);
    return `- ${parts.join('; ')}`;
  });
}

function queryLines(query: StagedDashboardFile['tiles'][number]['query']): string[] {
  if (!query) {
    return ['- no query captured'];
  }
  return [
    `- model: ${query.model}`,
    `- explore: ${query.view}`,
    '',
    '### Fields',
    '',
    ...(query.fields.length === 0 ? ['- none'] : query.fields.map((field) => `- ${field}`)),
    '',
    '### Filters',
    '',
    ...filterLines(query.filters),
  ];
}

function filterLines(filters: Record<string, unknown>): string[] {
  const entries = Object.entries(filters).filter(
    ([, value]) => value !== null && value !== undefined && String(value).trim() !== '',
  );
  if (entries.length === 0) {
    return ['- none'];
  }
  return entries.map(([field, value]) => `- ${field} = ${String(value)}`);
}

async function readSignals(stagedDir: string): Promise<StagedLookerSignalsFile> {
  const [dashboardUsage, lookUsage, scheduledPlans, favorites] = await Promise.all([
    readOptionalArray(stagedDir, STAGED_FILES.signals.dashboardUsage),
    readOptionalArray(stagedDir, STAGED_FILES.signals.lookUsage),
    readOptionalArray(stagedDir, STAGED_FILES.signals.scheduledPlans),
    readOptionalArray(stagedDir, STAGED_FILES.signals.favorites),
  ]);
  return stagedLookerSignalsFileSchema.parse({ dashboardUsage, lookUsage, scheduledPlans, favorites });
}

async function readOptionalArray(stagedDir: string, relPath: string): Promise<unknown[]> {
  try {
    const parsed = JSON.parse(await readFile(join(stagedDir, relPath), 'utf-8')) as unknown;
    return Array.isArray(parsed) ? parsed : [];
  } catch (error) {
    if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
      return [];
    }
    throw error;
  }
}

async function readOptionalJson<T>(
  stagedDir: string,
  relPath: string,
  schema: { parse(value: unknown): T },
): Promise<T | null> {
  try {
    return await readJson(stagedDir, relPath, schema);
  } catch (error) {
    if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
      return null;
    }
    throw error;
  }
}

async function readJson<T>(stagedDir: string, relPath: string, schema: { parse(value: unknown): T }): Promise<T> {
  return schema.parse(JSON.parse(await readFile(join(stagedDir, relPath), 'utf-8')));
}

async function writeJson(stagedDir: string, relPath: string, value: unknown): Promise<void> {
  await writeText(stagedDir, relPath, `${JSON.stringify(value, null, 2)}\n`);
}

async function writeText(stagedDir: string, relPath: string, body: string): Promise<void> {
  const target = join(stagedDir, relPath);
  await mkdir(dirname(target), { recursive: true });
  await writeFile(target, body, 'utf-8');
}

async function walkJson(root: string, dir = root): Promise<string[]> {
  const entries = await readdir(dir, { withFileTypes: true });
  const paths: string[] = [];
  for (const entry of entries) {
    const absPath = join(dir, entry.name);
    if (entry.isDirectory()) {
      paths.push(...(await walkJson(root, absPath)));
      continue;
    }
    if (entry.isFile() && entry.name.endsWith('.json')) {
      paths.push(relative(root, absPath).replace(/\\/g, '/'));
    }
  }
  return paths.sort();
}

function safePathSegment(value: string): string {
  if (!/^[a-zA-Z0-9_-]+$/.test(value)) {
    throw new Error(`Unsafe Looker evidence path segment: ${value}`);
  }
  return value;
}

function normalizeMarkdown(lines: string[]): string {
  return `${lines
    .filter((line, index, all) => line !== '' || all[index - 1] !== '')
    .join('\n')
    .trim()}\n`;
}
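
The two helpers that close this file, `safePathSegment` and `normalizeMarkdown`, are easiest to understand from their effect on concrete input. The sketch below copies both functions verbatim from the diff so it can run standalone; the sample inputs (`'sales_pipeline'`, `'../secrets'`, and the heading lines) are illustrative values, not data from the repository.

```typescript
// Standalone copies of the two helpers from the diff above, for illustration only.
function safePathSegment(value: string): string {
  if (!/^[a-zA-Z0-9_-]+$/.test(value)) {
    throw new Error(`Unsafe Looker evidence path segment: ${value}`);
  }
  return value;
}

function normalizeMarkdown(lines: string[]): string {
  return `${lines
    .filter((line, index, all) => line !== '' || all[index - 1] !== '')
    .join('\n')
    .trim()}\n`;
}

// Runs of blank lines collapse to a single blank line, and the result
// always ends with exactly one trailing newline.
const markdown = normalizeMarkdown(['# Title', '', '', '## Explore', '', '- model: b2b', '']);

// Segments made of letters, digits, underscores, and hyphens pass through unchanged.
const accepted = safePathSegment('sales_pipeline');

// Anything else (dots, slashes, spaces) is rejected before it reaches join().
let rejected = false;
try {
  safePathSegment('../secrets');
} catch {
  rejected = true;
}
```

Because `safePathSegment` throws rather than sanitizing, an explore or dashboard whose identifier contains a dot or slash would abort rendering instead of being written under a rewritten name; whether that is the desired behavior for every Looker instance is a question for the maintainers.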
@@ -0,0 +1,74 @@
import { describe, expect, it, vi } from 'vitest';
import type { FetchContext } from '../../types.js';
import type { LookerSdkPort } from './client.js';
import {
  DefaultLookerClientFactory,
  DefaultLookerConnectionClientFactory,
  type LookerCredentialResolver,
} from './factory.js';
import type { LookerRuntimeClient } from './fetch.js';
import type { LookerPullConfig } from './types.js';

function sdk(): LookerSdkPort {
  return {
    me: vi.fn().mockResolvedValue({ id: '1', display_name: 'API User', email: 'api@example.com' }),
    search_dashboards: vi.fn().mockResolvedValue([{ id: '10' }]),
    dashboard: vi.fn(),
    search_looks: vi.fn().mockResolvedValue([]),
    search_scheduled_plans: vi.fn().mockResolvedValue([]),
    look: vi.fn(),
    all_folders: vi.fn().mockResolvedValue([]),
    all_users: vi.fn().mockResolvedValue([]),
    all_groups: vi.fn().mockResolvedValue([]),
    all_connections: vi.fn().mockResolvedValue([]),
    all_lookml_models: vi.fn().mockResolvedValue([]),
    lookml_model_explore: vi.fn(),
    run_inline_query: vi.fn().mockResolvedValue('[]'),
    logout: vi.fn().mockResolvedValue(undefined),
  };
}

describe('DefaultLookerConnectionClientFactory', () => {
  it('resolves credentials by Looker connection id and creates a KTX Looker client', async () => {
    const fakeSdk = sdk();
    const resolver: LookerCredentialResolver = {
      resolve: vi.fn().mockResolvedValue({
        base_url: 'https://example.looker.com',
        client_id: 'id',
        client_secret: 'credential', // pragma: allowlist secret
      }),
    };
    const factory = new DefaultLookerConnectionClientFactory(resolver, { sdkFactory: () => fakeSdk });

    const client = await factory.createClient('prod-looker');

    await expect(client.listDashboards()).resolves.toEqual([{ id: '10', updatedAt: null }]);
    expect(resolver.resolve).toHaveBeenCalledWith('prod-looker');
  });
});

describe('DefaultLookerClientFactory', () => {
  const ctx: FetchContext = { connectionId: 'ctx-looker', sourceKey: 'looker' };

  it('uses pullConfig.lookerConnectionId when present', async () => {
    const runtimeClient = { listDashboards: vi.fn() } as unknown as LookerRuntimeClient;
    const inner = { createClient: vi.fn().mockResolvedValue(runtimeClient) };
    const factory = new DefaultLookerClientFactory(inner);
    const config = { lookerConnectionId: 'prod-looker' } as LookerPullConfig;

    await expect(factory.createClient(config, ctx)).resolves.toBe(runtimeClient);

    expect(inner.createClient).toHaveBeenCalledWith('prod-looker');
  });

  it('falls back to ctx.connectionId when pullConfig.lookerConnectionId is absent', async () => {
    const runtimeClient = { listDashboards: vi.fn() } as unknown as LookerRuntimeClient;
    const inner = { createClient: vi.fn().mockResolvedValue(runtimeClient) };
    const factory = new DefaultLookerClientFactory(inner);
    const config = {} as LookerPullConfig;

    await expect(factory.createClient(config, ctx)).resolves.toBe(runtimeClient);

    expect(inner.createClient).toHaveBeenCalledWith('ctx-looker');
  });
});
34
packages/cli/src/context/ingest/adapters/looker/factory.ts
Normal file
@@ -0,0 +1,34 @@
import type { FetchContext } from '../../types.js';
import { LookerClient, type LookerClientDeps, type LookerConnectionParams } from './client.js';
import type { LookerClientFactory, LookerRuntimeClient } from './fetch.js';
import type { LookerPullConfig } from './types.js';

export interface LookerCredentialResolver {
  resolve(lookerConnectionId: string): Promise<LookerConnectionParams>;
}

/** @internal */
export interface LookerConnectionClientFactory {
  createClient(lookerConnectionId: string): Promise<LookerRuntimeClient>;
}

export class DefaultLookerConnectionClientFactory implements LookerConnectionClientFactory {
  constructor(
    private readonly resolver: LookerCredentialResolver,
    private readonly deps: LookerClientDeps = {},
  ) {}

  async createClient(lookerConnectionId: string): Promise<LookerRuntimeClient> {
    const credentials = await this.resolver.resolve(lookerConnectionId);
    return new LookerClient(credentials, this.deps);
  }
}

/** @internal */
export class DefaultLookerClientFactory implements LookerClientFactory {
  constructor(private readonly inner: LookerConnectionClientFactory) {}

  async createClient(config: LookerPullConfig, ctx: FetchContext): Promise<LookerRuntimeClient> {
    return this.inner.createClient(config.lookerConnectionId ?? ctx.connectionId);
  }
}
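
The connection-id fallback in `DefaultLookerClientFactory.createClient` depends on the nullish coalescing operator, which can be isolated in a few lines. This is a minimal sketch, not code from the commit: `resolveLookerConnectionId`, `PullConfigLike`, and `FetchContextLike` are hypothetical stand-ins that mirror only the two fields the factory reads.

```typescript
// Hypothetical minimal shapes; the real LookerPullConfig and FetchContext carry more fields.
interface PullConfigLike {
  lookerConnectionId?: string | null;
}

interface FetchContextLike {
  connectionId: string;
}

// Same expression the factory passes to inner.createClient(...).
function resolveLookerConnectionId(config: PullConfigLike, ctx: FetchContextLike): string {
  return config.lookerConnectionId ?? ctx.connectionId;
}

const ctx: FetchContextLike = { connectionId: 'ctx-looker' };

// An explicit id on the pull config wins.
const explicit = resolveLookerConnectionId({ lookerConnectionId: 'prod-looker' }, ctx);

// A missing (undefined) or null id falls back to the fetch context.
const fallback = resolveLookerConnectionId({}, ctx);

// `??` only treats null and undefined as absent, so an empty string is kept as-is.
const emptyString = resolveLookerConnectionId({ lookerConnectionId: '' }, ctx);
```

The last case is worth noting when reading the factory: if an upstream schema ever allowed an empty `lookerConnectionId`, the fallback would not trigger and the empty string would be handed to the credential resolver.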
@@ -0,0 +1,77 @@
import { mkdtemp, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { readLookerFetchReport, writeLookerFetchReport } from './fetch-report.js';

describe('Looker staged fetch report', () => {
  let stagedDir: string;

  beforeEach(async () => {
    stagedDir = await mkdtemp(join(tmpdir(), 'looker-fetch-report-'));
  });

  afterEach(async () => {
    await rm(stagedDir, { recursive: true, force: true });
  });

  it('returns null when a staged bundle has no fetch report', async () => {
    await expect(readLookerFetchReport(stagedDir)).resolves.toBeNull();
  });

  it('round-trips partial fetch issues', async () => {
    await writeLookerFetchReport(stagedDir, {
      status: 'partial',
      retryRecommended: true,
      skipped: [
        {
          rawPath: 'dashboards/10.json',
          entityType: 'dashboard',
          entityId: '10',
          severity: 'error',
          statusCode: 429,
          message: 'Looker API rate limit remained after retry',
          retryRecommended: true,
        },
      ],
      warnings: [
        {
          rawPath: 'signals/dashboard_usage.json',
          entityType: 'signals',
          entityId: null,
          severity: 'warning',
          statusCode: 403,
          message: 'system__activity unavailable',
          retryRecommended: false,
        },
      ],
    });

    await expect(readLookerFetchReport(stagedDir)).resolves.toEqual({
      status: 'partial',
      retryRecommended: true,
      skipped: [
        {
          rawPath: 'dashboards/10.json',
          entityType: 'dashboard',
          entityId: '10',
          severity: 'error',
          statusCode: 429,
          message: 'Looker API rate limit remained after retry',
          retryRecommended: true,
        },
      ],
      warnings: [
        {
          rawPath: 'signals/dashboard_usage.json',
          entityType: 'signals',
          entityId: null,
          severity: 'warning',
          statusCode: 403,
          message: 'system__activity unavailable',
          retryRecommended: false,
        },
      ],
    });
  });
});
@@ -0,0 +1,22 @@
import { mkdir, readFile, writeFile } from 'node:fs/promises';
import { dirname, join } from 'node:path';
import { STAGED_FILES, type StagedLookerFetchReport, stagedLookerFetchReportSchema } from './types.js';

export async function readLookerFetchReport(stagedDir: string): Promise<StagedLookerFetchReport | null> {
  try {
    const raw = await readFile(join(stagedDir, STAGED_FILES.fetchReport), 'utf-8');
    return stagedLookerFetchReportSchema.parse(JSON.parse(raw));
  } catch (error) {
    if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
      return null;
    }
    throw error;
  }
}

export async function writeLookerFetchReport(stagedDir: string, report: StagedLookerFetchReport): Promise<void> {
  const parsed = stagedLookerFetchReportSchema.parse(report);
  const target = join(stagedDir, STAGED_FILES.fetchReport);
  await mkdir(dirname(target), { recursive: true });
  await writeFile(target, `${JSON.stringify(parsed, null, 2)}\n`, 'utf-8');
}
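
`readLookerFetchReport` uses the same inline `ENOENT` check as `readOptionalArray` and `readOptionalJson` earlier in this diff: a missing file maps to an empty result, and every other error is rethrown. The sketch below pulls that check into a predicate to show which errors it matches. The name `isMissingFileError` is hypothetical and not part of the commit, which keeps the check inline at each call site.

```typescript
// Hypothetical helper: the same condition the diff repeats inline in each catch block.
function isMissingFileError(error: unknown): boolean {
  return (
    typeof error === 'object' &&
    error !== null &&
    'code' in error &&
    (error as { code: unknown }).code === 'ENOENT'
  );
}

// Node's fs errors carry a string `code`; these objects imitate that shape.
const missing = Object.assign(new Error('no such file or directory'), { code: 'ENOENT' });
const denied = Object.assign(new Error('permission denied'), { code: 'EACCES' });

const results = {
  missing: isMissingFileError(missing), // treated as "file absent"
  denied: isMissingFileError(denied), // would be rethrown by the readers
  plain: isMissingFileError(new SyntaxError('bad JSON')), // no `code`, rethrown
  nonObject: isMissingFileError('ENOENT'), // strings are not matched
};
```

Only a missing file is swallowed. Malformed JSON and schema validation failures still surface as errors, which matches the first test in `fetch-report.test.ts`, where an empty staged directory resolves to `null`.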
645
packages/cli/src/context/ingest/adapters/looker/fetch.test.ts
Normal file
@@ -0,0 +1,645 @@
import { mkdtemp, readdir, readFile, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import { chunkLookerStagedDir } from './chunk.js';
import { fetchLookerRuntimeBundle, type LookerRuntimeClient } from './fetch.js';

const connectionId = '11111111-1111-4111-8111-111111111111';

function makeClient(): LookerRuntimeClient {
  return {
    listDashboards: vi.fn().mockResolvedValue([{ id: '10' }]),
    getDashboard: vi.fn().mockResolvedValue({
      lookerId: '10',
      title: 'Sales Pipeline',
      description: 'Pipeline health',
      folderId: '7',
      ownerId: '3',
      updatedAt: '2026-04-30T12:00:00.000Z',
      tiles: [{ id: '100', title: 'ARR', lookId: null, query: { model: 'b2b', view: 'sales_pipeline' } }],
    }),
    listLooks: vi.fn().mockResolvedValue([{ id: '20' }]),
    getLook: vi.fn().mockResolvedValue({
      lookerId: '20',
      title: 'Open Pipeline',
      description: null,
      folderId: '7',
      ownerId: '3',
      updatedAt: '2026-04-30T12:00:00.000Z',
      query: { model: 'b2b', view: 'sales_pipeline', fields: ['opportunities.arr'] },
    }),
    listFolders: vi
      .fn()
      .mockResolvedValue({ folders: [{ id: '7', name: 'Sandbox', parentId: null, path: ['Sandbox'] }] }),
    listUsers: vi.fn().mockResolvedValue([{ id: '3', displayName: 'Ada Lovelace', email: null }]),
    listGroups: vi.fn().mockResolvedValue([{ id: '4', name: 'Sales' }]),
    listLookmlModels: vi.fn().mockResolvedValue({
      models: [{ name: 'b2b', label: 'B2B', explores: [{ name: 'sales_pipeline', label: 'Sales Pipeline' }] }],
    }),
    getExplore: vi.fn().mockResolvedValue({
      modelName: 'b2b',
      exploreName: 'sales_pipeline',
      label: 'Sales Pipeline',
      description: null,
      fields: { dimensions: [{ name: 'opportunities.id' }], measures: [{ name: 'opportunities.arr' }] },
      joins: [],
    }),
    getSignals: vi.fn().mockResolvedValue({
      dashboardUsage: [{ contentId: '10', queryCount30d: 50, uniqueUsers30d: 8, lastRunAt: null, topUsers: ['3'] }],
      lookUsage: [{ contentId: '20', queryCount30d: 20, uniqueUsers30d: 5, lastRunAt: null, topUsers: ['3'] }],
      scheduledPlans: [
        { contentId: '10', contentType: 'dashboard', isScheduled: true, scheduleCount: 1, recipientCount: 3 },
      ],
      favorites: [{ contentId: '10', contentType: 'dashboard', favoriteCount: 4 }],
    }),
    cleanup: vi.fn().mockResolvedValue(undefined),
  };
}

describe('fetchLookerRuntimeBundle', () => {
  let stagedDir: string;

  beforeEach(async () => {
    stagedDir = await mkdtemp(join(tmpdir(), 'looker-fetch-'));
  });

  afterEach(async () => {
    await rm(stagedDir, { recursive: true, force: true });
  });

  it('writes dashboards, looks, folders, users, groups, models, explores, signals, and sync config', async () => {
    const client = makeClient();
    await fetchLookerRuntimeBundle({
      pullConfig: { lookerConnectionId: connectionId, instanceBaseUrl: 'https://example.looker.com' },
      stagedDir,
      ctx: { connectionId, sourceKey: 'looker' },
      clientFactory: { createClient: vi.fn().mockResolvedValue(client) },
      now: () => new Date('2026-04-30T12:30:00.000Z'),
    });

    expect(await readdir(join(stagedDir, 'dashboards'))).toEqual(['10.json']);
    expect(await readdir(join(stagedDir, 'looks'))).toEqual(['20.json']);
    expect(await readdir(join(stagedDir, 'users'))).toEqual(['3.json']);
    expect(await readdir(join(stagedDir, 'groups'))).toEqual(['4.json']);
    expect(await readdir(join(stagedDir, 'explores/b2b'))).toEqual(['sales_pipeline.json']);

    const syncConfig = JSON.parse(await readFile(join(stagedDir, 'sync-config.json'), 'utf-8'));
    expect(syncConfig).toEqual({
      lookerConnectionId: connectionId,
      fetchedAt: '2026-04-30T12:30:00.000Z',
      instanceBaseUrl: 'https://example.looker.com',
      previousCursors: {
        dashboardsLastSyncedAt: null,
        looksLastSyncedAt: null,
      },
      nextCursors: {
        dashboardsLastSyncedAt: null,
        looksLastSyncedAt: null,
      },
    });

    const scope = JSON.parse(await readFile(join(stagedDir, 'looker-scope.json'), 'utf-8'));
    expect(scope).toEqual({
      mode: 'full',
      knownCurrentRawPaths: ['dashboards/10.json', 'looks/20.json'],
      fetchedRawPaths: ['dashboards/10.json', 'looks/20.json'],
    });

    const dashboardUsage = JSON.parse(await readFile(join(stagedDir, 'signals/dashboard_usage.json'), 'utf-8'));
    expect(dashboardUsage).toEqual([
      { contentId: '10', queryCount30d: 50, uniqueUsers30d: 8, lastRunAt: null, topUsers: ['3'] },
    ]);

    const lookUsage = JSON.parse(await readFile(join(stagedDir, 'signals/look_usage.json'), 'utf-8'));
    const scheduledPlans = JSON.parse(await readFile(join(stagedDir, 'signals/scheduled_plans.json'), 'utf-8'));
    const favorites = JSON.parse(await readFile(join(stagedDir, 'signals/favorites.json'), 'utf-8'));

    expect(lookUsage).toEqual([
      { contentId: '20', queryCount30d: 20, uniqueUsers30d: 5, lastRunAt: null, topUsers: ['3'] },
    ]);
    expect(scheduledPlans).toEqual([
      { contentId: '10', contentType: 'dashboard', isScheduled: true, scheduleCount: 1, recipientCount: 3 },
    ]);
    expect(favorites).toEqual([{ contentId: '10', contentType: 'dashboard', favoriteCount: 4 }]);
  });

  it('stages only changed Dashboard and Look entity bodies during incremental pulls', async () => {
    const client = makeClient();
    vi.mocked(client.listDashboards).mockResolvedValue([
      { id: '10', updatedAt: '2026-04-30T12:00:00.000Z' },
      { id: '11', updatedAt: '2026-04-30T12:10:00.000Z' },
    ]);
    vi.mocked(client.getDashboard).mockImplementation(async (id: string) => ({
      lookerId: id,
      title: `Dashboard ${id}`,
      description: null,
      folderId: '7',
      ownerId: '3',
      updatedAt: id === '11' ? '2026-04-30T12:10:00.000Z' : '2026-04-30T12:00:00.000Z',
      tiles: [],
    }));
    vi.mocked(client.listLooks).mockResolvedValue([
      { id: '20', updatedAt: '2026-04-30T11:00:00.000Z' },
      { id: '21', updatedAt: null },
    ]);
    vi.mocked(client.getLook).mockImplementation(async (id: string) => ({
      lookerId: id,
      title: `Look ${id}`,
      description: null,
      folderId: '7',
      ownerId: '3',
      updatedAt: id === '21' ? null : '2026-04-30T11:00:00.000Z',
      query: null,
    }));

    await fetchLookerRuntimeBundle({
      pullConfig: {
        lookerConnectionId: connectionId,
        dashboardUpdatedSince: '2026-04-30T12:00:00.000Z',
        lookUpdatedSince: '2026-04-30T11:00:00.000Z',
      },
      stagedDir,
      ctx: { connectionId, sourceKey: 'looker' },
      clientFactory: { createClient: vi.fn().mockResolvedValue(client) },
      now: () => new Date('2026-04-30T12:30:00.000Z'),
    });

    expect(client.getDashboard).toHaveBeenCalledTimes(1);
    expect(client.getDashboard).toHaveBeenCalledWith('11');
    expect(client.getLook).toHaveBeenCalledTimes(1);
    expect(client.getLook).toHaveBeenCalledWith('21');

    await expect(readdir(join(stagedDir, 'dashboards'))).resolves.toEqual(['11.json']);
    await expect(readdir(join(stagedDir, 'looks'))).resolves.toEqual(['21.json']);

    const syncConfig = JSON.parse(await readFile(join(stagedDir, 'sync-config.json'), 'utf-8'));
    expect(syncConfig.previousCursors).toEqual({
      dashboardsLastSyncedAt: '2026-04-30T12:00:00.000Z',
      looksLastSyncedAt: '2026-04-30T11:00:00.000Z',
    });
    expect(syncConfig.nextCursors).toEqual({
      dashboardsLastSyncedAt: '2026-04-30T12:10:00.000Z',
      looksLastSyncedAt: '2026-04-30T11:00:00.000Z',
    });

    const scope = JSON.parse(await readFile(join(stagedDir, 'looker-scope.json'), 'utf-8'));
    expect(scope).toEqual({
      mode: 'incremental',
      knownCurrentRawPaths: ['dashboards/10.json', 'dashboards/11.json', 'looks/20.json', 'looks/21.json'],
      fetchedRawPaths: ['dashboards/11.json', 'looks/21.json'],
    });
  });

  it('falls back to empty signal files when the client has no signal support', async () => {
    const client = makeClient();
    delete client.getSignals;

    await fetchLookerRuntimeBundle({
      pullConfig: { lookerConnectionId: connectionId },
      stagedDir,
      ctx: { connectionId, sourceKey: 'looker' },
      clientFactory: { createClient: vi.fn().mockResolvedValue(client) },
      now: () => new Date('2026-04-30T12:30:00.000Z'),
    });

    expect(JSON.parse(await readFile(join(stagedDir, 'signals/look_usage.json'), 'utf-8'))).toEqual([]);
  });

  it('stamps explore warehouse targets from pull config and reports unmapped Looker connections', async () => {
    const client = makeClient();
    const warehouseConnectionId = '22222222-2222-4222-8222-222222222222';
    vi.mocked(client.listLookmlModels).mockResolvedValue({
      models: [
        {
          name: 'b2b',
          label: 'B2B',
          explores: [
            { name: 'sales_pipeline', label: 'Sales Pipeline' },
            { name: 'marketing', label: 'Marketing' },
          ],
        },
      ],
    });
    vi.mocked(client.getExplore).mockImplementation(async (_modelName: string, exploreName: string) => {
      if (exploreName === 'marketing') {
        return {
          modelName: 'b2b',
          exploreName: 'marketing',
          label: 'Marketing',
          description: null,
          rawSqlTableName: 'proj.dataset.marketing',
          connectionName: 'missing_mapping',
          viewName: 'marketing',
          fields: {
            dimensions: [{ name: 'marketing.id', label: null, type: null, sql: null, description: null }],
            measures: [{ name: 'marketing.spend', label: null, type: null, sql: null, description: null }],
          },
          joins: [],
          targetWarehouseConnectionId: null,
          targetTable: null,
        };
      }
      return {
        modelName: 'b2b',
        exploreName: 'sales_pipeline',
        label: 'Sales Pipeline',
        description: null,
        rawSqlTableName: 'proj.dataset.opportunities AS opportunities',
        connectionName: 'b2b_sandbox_bq',
        viewName: 'opportunities',
        fields: {
          dimensions: [{ name: 'opportunities.id', label: null, type: null, sql: null, description: null }],
          measures: [{ name: 'opportunities.arr', label: null, type: null, sql: null, description: null }],
        },
        joins: [
          {
            name: 'accounts',
            type: 'left_outer',
            relationship: 'many_to_one',
            rawSqlTableName: 'proj.dataset.accounts',
            sqlOn: '$' + '{opportunities.account_id} = $' + '{accounts.id}',
            from: null,
            targetTable: null,
          },
        ],
        targetWarehouseConnectionId: null,
        targetTable: null,
      };
    });

    await fetchLookerRuntimeBundle({
      pullConfig: {
        lookerConnectionId: connectionId,
        connectionMappings: { b2b_sandbox_bq: warehouseConnectionId },
        connectionTypes: { b2b_sandbox_bq: 'BIGQUERY' },
        parsedTargetTables: {
          'b2b.sales_pipeline': {
            ok: true,
            catalog: 'proj',
            schema: 'dataset',
            name: 'opportunities',
            canonicalTable: 'proj.dataset.opportunities',
          },
          'b2b.sales_pipeline.accounts': {
            ok: true,
            catalog: 'proj',
            schema: 'dataset',
            name: 'accounts',
            canonicalTable: 'proj.dataset.accounts',
          },
        },
      },
      stagedDir,
      ctx: { connectionId, sourceKey: 'looker' },
      clientFactory: { createClient: vi.fn().mockResolvedValue(client) },
      now: () => new Date('2026-04-30T12:30:00.000Z'),
    });

    const salesPipeline = JSON.parse(await readFile(join(stagedDir, 'explores/b2b/sales_pipeline.json'), 'utf-8'));
    expect(salesPipeline).toMatchObject({
      connectionName: 'b2b_sandbox_bq',
      targetWarehouseConnectionId: warehouseConnectionId,
      targetTable: {
        ok: true,
        catalog: 'proj',
        schema: 'dataset',
        name: 'opportunities',
        canonicalTable: 'proj.dataset.opportunities',
      },
      joins: [
        {
          name: 'accounts',
          targetTable: {
            ok: true,
            catalog: 'proj',
            schema: 'dataset',
            name: 'accounts',
            canonicalTable: 'proj.dataset.accounts',
          },
        },
      ],
    });

    const marketing = JSON.parse(await readFile(join(stagedDir, 'explores/b2b/marketing.json'), 'utf-8'));
    expect(marketing).toMatchObject({
      connectionName: 'missing_mapping',
      targetWarehouseConnectionId: null,
      targetTable: {
        ok: false,
        reason: 'no_connection_mapping',
      },
    });

    const report = JSON.parse(await readFile(join(stagedDir, 'looker-fetch-report.json'), 'utf-8'));
    expect(report.status).toBe('partial');
    expect(report.skipped).toEqual([]);
    expect(report.warnings).toEqual([
      {
        rawPath: 'looker_connection_mappings/missing_mapping',
        entityType: 'looker_connection_mapping',
        entityId: 'missing_mapping',
        severity: 'warning',
        statusCode: null,
        message: 'Looker connection missing_mapping is not mapped to a warehouse connection; 1 explore will be wiki-only.',
        retryRecommended: false,
        kind: 'unmapped_looker_connection',
        details: {
          lookerConnectionName: 'missing_mapping',
          affectedExplores: ['b2b.marketing'],
        },
      },
    ]);
  });

  it('reports parsed target table failures without retrying the Looker fetch', async () => {
    const client = makeClient();
    const warehouseConnectionId = '22222222-2222-4222-8222-222222222222';
    vi.mocked(client.getExplore).mockResolvedValue({
      modelName: 'b2b',
      exploreName: 'sales_pipeline',
      label: 'Sales Pipeline',
      description: null,
      rawSqlTableName: '$' + '{derived.SQL_TABLE_NAME}',
      connectionName: 'b2b_sandbox_bq',
      viewName: 'opportunities',
      fields: {
        dimensions: [{ name: 'opportunities.id', label: null, type: null, sql: null, description: null }],
        measures: [{ name: 'opportunities.arr', label: null, type: null, sql: null, description: null }],
      },
      joins: [],
      targetWarehouseConnectionId: null,
      targetTable: null,
    });

    await fetchLookerRuntimeBundle({
      pullConfig: {
        lookerConnectionId: connectionId,
        connectionMappings: { b2b_sandbox_bq: warehouseConnectionId },
        connectionTypes: { b2b_sandbox_bq: 'BIGQUERY' },
        parsedTargetTables: {
          'b2b.sales_pipeline': {
            ok: false,
            reason: 'looker_template_unresolved',
            detail: 'Looker template markers cannot be resolved before parsing.',
          },
        },
      },
      stagedDir,
      ctx: { connectionId, sourceKey: 'looker' },
      clientFactory: { createClient: vi.fn().mockResolvedValue(client) },
      now: () => new Date('2026-04-30T12:30:00.000Z'),
    });

    const explore = JSON.parse(await readFile(join(stagedDir, 'explores/b2b/sales_pipeline.json'), 'utf-8'));
    expect(explore).toMatchObject({
      targetWarehouseConnectionId: warehouseConnectionId,
      targetTable: {
        ok: false,
        reason: 'looker_template_unresolved',
      },
    });

    const report = JSON.parse(await readFile(join(stagedDir, 'looker-fetch-report.json'), 'utf-8'));
    expect(report).toMatchObject({
      status: 'partial',
      retryRecommended: false,
      skipped: [],
      warnings: [
        {
          rawPath: 'looker_connection_mappings/b2b_sandbox_bq',
          entityType: 'looker_connection_mapping',
          entityId: 'b2b_sandbox_bq',
          severity: 'warning',
          statusCode: null,
          message:
            'Looker explore b2b.sales_pipeline has sql_table_name that cannot be mapped to a physical warehouse table: looker_template_unresolved.',
          retryRecommended: false,
          kind: 'looker_template_unresolved',
          details: {
            lookerConnectionName: 'b2b_sandbox_bq',
            rawSqlTableName: '$' + '{derived.SQL_TABLE_NAME}',
            reason: 'looker_template_unresolved',
          },
        },
      ],
    });
  });

  it('propagates parent explore warehouse targets onto Dashboard tile and Look queries', async () => {
    const client = makeClient();
    const warehouseConnectionId = '22222222-2222-4222-8222-222222222222';
    vi.mocked(client.getExplore).mockResolvedValue({
      modelName: 'b2b',
      exploreName: 'sales_pipeline',
      label: 'Sales Pipeline',
      description: null,
      rawSqlTableName: 'proj.dataset.opportunities AS opportunities',
      connectionName: 'b2b_sandbox_bq',
      viewName: 'opportunities',
      fields: {
        dimensions: [{ name: 'opportunities.id', label: null, type: null, sql: null, description: null }],
        measures: [{ name: 'opportunities.arr', label: null, type: null, sql: null, description: null }],
      },
      joins: [],
      targetWarehouseConnectionId: null,
      targetTable: null,
    });

    await fetchLookerRuntimeBundle({
      pullConfig: {
        lookerConnectionId: connectionId,
        connectionMappings: { b2b_sandbox_bq: warehouseConnectionId },
        connectionTypes: { b2b_sandbox_bq: 'BIGQUERY' },
        parsedTargetTables: {
          'b2b.sales_pipeline': {
            ok: true,
            catalog: 'proj',
            schema: 'dataset',
            name: 'opportunities',
            canonicalTable: 'proj.dataset.opportunities',
          },
        },
      },
      stagedDir,
      ctx: { connectionId, sourceKey: 'looker' },
      clientFactory: { createClient: vi.fn().mockResolvedValue(client) },
      now: () => new Date('2026-04-30T12:30:00.000Z'),
    });

    const dashboard = JSON.parse(await readFile(join(stagedDir, 'dashboards/10.json'), 'utf-8'));
    expect(dashboard.tiles[0].query).toMatchObject({
      model: 'b2b',
      view: 'sales_pipeline',
      targetWarehouseConnectionId: warehouseConnectionId,
      targetTable: {
        ok: true,
        catalog: 'proj',
        schema: 'dataset',
        name: 'opportunities',
        canonicalTable: 'proj.dataset.opportunities',
      },
    });

    const look = JSON.parse(await readFile(join(stagedDir, 'looks/20.json'), 'utf-8'));
    expect(look.query).toMatchObject({
      model: 'b2b',
      view: 'sales_pipeline',
      targetWarehouseConnectionId: warehouseConnectionId,
      targetTable: {
        ok: true,
        catalog: 'proj',
        schema: 'dataset',
        name: 'opportunities',
        canonicalTable: 'proj.dataset.opportunities',
      },
    });
  });

  it('records skipped detail entities and keeps cursors pinned for affected entity types', async () => {
    const client = makeClient();
    vi.mocked(client.listDashboards).mockResolvedValue([
      { id: '10', updatedAt: '2026-04-30T12:00:00.000Z' },
      { id: '11', updatedAt: '2026-04-30T12:10:00.000Z' },
    ]);
    vi.mocked(client.getDashboard).mockImplementation(async (id: string) => {
      if (id === '11') {
        const error = new Error('Looker API rate limit remained after retry');
        Object.assign(error, { statusCode: 429 });
        throw error;
      }
      return {
        lookerId: id,
        title: `Dashboard ${id}`,
        description: null,
        folderId: '7',
        ownerId: '3',
        updatedAt: '2026-04-30T12:00:00.000Z',
        tiles: [],
      };
    });
    vi.mocked(client.listLooks).mockResolvedValue([{ id: '20', updatedAt: '2026-04-30T11:15:00.000Z' }]);
    vi.mocked(client.getLook).mockResolvedValue({
      lookerId: '20',
      title: 'Look 20',
      description: null,
      folderId: '7',
      ownerId: '3',
      updatedAt: '2026-04-30T11:15:00.000Z',
      query: null,
    });

    await fetchLookerRuntimeBundle({
      pullConfig: {
        lookerConnectionId: connectionId,
        dashboardUpdatedSince: '2026-04-30T12:00:00.000Z',
        lookUpdatedSince: '2026-04-30T11:00:00.000Z',
      },
      stagedDir,
      ctx: { connectionId, sourceKey: 'looker' },
      clientFactory: { createClient: vi.fn().mockResolvedValue(client) },
      now: () => new Date('2026-04-30T12:30:00.000Z'),
    });

    await expect(readdir(join(stagedDir, 'dashboards'))).rejects.toMatchObject({ code: 'ENOENT' });
    await expect(readdir(join(stagedDir, 'looks'))).resolves.toEqual(['20.json']);

    const syncConfig = JSON.parse(await readFile(join(stagedDir, 'sync-config.json'), 'utf-8'));
    expect(syncConfig.nextCursors).toEqual({
      dashboardsLastSyncedAt: '2026-04-30T12:00:00.000Z',
      looksLastSyncedAt: '2026-04-30T11:15:00.000Z',
    });

    const report = JSON.parse(await readFile(join(stagedDir, 'looker-fetch-report.json'), 'utf-8'));
    expect(report).toEqual({
      status: 'partial',
      retryRecommended: true,
      skipped: [
        {
          rawPath: 'dashboards/11.json',
          entityType: 'dashboard',
          entityId: '11',
          severity: 'error',
          statusCode: 429,
          message: 'Looker API rate limit remained after retry',
          retryRecommended: true,
        },
      ],
      warnings: [],
    });
  });

  it('continues without explore bootstrap when LookML model listing is denied', async () => {
    const client = makeClient();
    const error = new Error('LookML model access denied');
    Object.assign(error, { statusCode: 403 });
    vi.mocked(client.listLookmlModels).mockRejectedValue(error);

    await fetchLookerRuntimeBundle({
      pullConfig: { lookerConnectionId: connectionId },
      stagedDir,
      ctx: { connectionId, sourceKey: 'looker' },
      clientFactory: { createClient: vi.fn().mockResolvedValue(client) },
      now: () => new Date('2026-04-30T12:30:00.000Z'),
    });

    await expect(readdir(join(stagedDir, 'dashboards'))).resolves.toEqual(['10.json']);
    await expect(readdir(join(stagedDir, 'looks'))).resolves.toEqual(['20.json']);
    await expect(readFile(join(stagedDir, 'lookml_models.json'), 'utf-8')).resolves.toBe('{\n  "models": []\n}\n');
    await expect(readdir(join(stagedDir, 'explores'))).rejects.toMatchObject({ code: 'ENOENT' });
    expect(client.getExplore).not.toHaveBeenCalled();

    const report = JSON.parse(await readFile(join(stagedDir, 'looker-fetch-report.json'), 'utf-8'));
    expect(report).toEqual({
      status: 'success',
      retryRecommended: false,
      skipped: [],
      warnings: [
        {
          rawPath: 'lookml_models.json',
          entityType: 'lookml_models',
          entityId: null,
          severity: 'warning',
          statusCode: 403,
          message: 'LookML model access denied',
          retryRecommended: false,
        },
      ],
    });

    const chunked = await chunkLookerStagedDir(stagedDir);
    expect(chunked.workUnits.map((wu) => wu.unitKey).sort()).toEqual(['looker-dashboard-10', 'looker-look-20']);
    expect(chunked.workUnits.flatMap((wu) => wu.dependencyPaths)).not.toContain('explores/b2b/sales_pipeline.json');
  });

  it('cleans up the Looker client after a successful fetch', async () => {
    const client = makeClient();

    await fetchLookerRuntimeBundle({
      pullConfig: { lookerConnectionId: connectionId },
      stagedDir,
      ctx: { connectionId, sourceKey: 'looker' },
      clientFactory: { createClient: vi.fn().mockResolvedValue(client) },
      now: () => new Date('2026-04-30T12:30:00.000Z'),
    });

    expect(client.cleanup).toHaveBeenCalledTimes(1);
  });

  it('cleans up the Looker client when fetch throws', async () => {
    const client = makeClient();
    vi.mocked(client.listDashboards).mockRejectedValue(new Error('Looker API unavailable'));

    await expect(
      fetchLookerRuntimeBundle({
        pullConfig: { lookerConnectionId: connectionId },
        stagedDir,
        ctx: { connectionId, sourceKey: 'looker' },
        clientFactory: { createClient: vi.fn().mockResolvedValue(client) },
        now: () => new Date('2026-04-30T12:30:00.000Z'),
      }),
    ).rejects.toThrow('Looker API unavailable');

    expect(client.cleanup).toHaveBeenCalledTimes(1);
  });
});
555
packages/cli/src/context/ingest/adapters/looker/fetch.ts
Normal file
@ -0,0 +1,555 @@
import { mkdir, writeFile } from 'node:fs/promises';
import { dirname, join } from 'node:path';
import type { ParsedTargetTable } from '../../parsed-target-table.js';
import type { FetchContext } from '../../types.js';
import { writeLookerEvidenceDocuments } from './evidence-documents.js';
import { writeLookerFetchReport } from './fetch-report.js';
import {
  type LookerPullConfig,
  parseLookerPullConfig,
  STAGED_FILES,
  type StagedDashboardFile,
  type StagedExploreFile,
  type StagedFoldersTreeFile,
  type StagedGroupFile,
  type StagedLookerFetchIssue,
  type StagedLookerFetchReport,
  type StagedLookerQuery,
  type StagedLookerSignalsFile,
  type StagedLookFile,
  type StagedLookmlModelsFile,
  type StagedUserFile,
  stagedDashboardFileSchema,
  stagedExploreFileSchema,
  stagedFoldersTreeFileSchema,
  stagedGroupFileSchema,
  stagedLookerScopeFileSchema,
  stagedLookerSignalsFileSchema,
  stagedLookFileSchema,
  stagedLookmlModelsFileSchema,
  stagedSyncConfigSchema,
  stagedUserFileSchema,
} from './types.js';

interface LookerEntityRef {
  id: string;
  updatedAt?: string | null;
}

export interface LookerRuntimeClient {
  listDashboards(): Promise<LookerEntityRef[]>;
  getDashboard(id: string): Promise<StagedDashboardFile>;
  listLooks(): Promise<LookerEntityRef[]>;
  getLook(id: string): Promise<StagedLookFile>;
  listFolders(): Promise<StagedFoldersTreeFile>;
  listUsers(): Promise<StagedUserFile[]>;
  listGroups(): Promise<StagedGroupFile[]>;
  listLookmlModels(): Promise<StagedLookmlModelsFile>;
  getExplore(modelName: string, exploreName: string): Promise<StagedExploreFile>;
  getSignals?(): Promise<StagedLookerSignalsFile>;
  cleanup?(): Promise<void>;
}

export interface LookerClientFactory {
  createClient(config: LookerPullConfig, ctx: FetchContext): Promise<LookerRuntimeClient> | LookerRuntimeClient;
}

interface ExploreTargetSummary {
  targetWarehouseConnectionId: string | null;
  targetTable: ParsedTargetTable | null;
}

interface StampedExploreResult {
  explore: StagedExploreFile;
  targetSummary: ExploreTargetSummary;
}

interface StagedJsonFile<T> {
  rawPath: string;
  value: T;
}

type ParsedTargetTableFailureReason = Extract<ParsedTargetTable, { ok: false }>['reason'];

interface FetchLookerRuntimeBundleParams {
  pullConfig: unknown;
  stagedDir: string;
  ctx: FetchContext;
  clientFactory: LookerClientFactory;
  now?: () => Date;
}

export async function fetchLookerRuntimeBundle(params: FetchLookerRuntimeBundleParams): Promise<void> {
  const config = parseLookerPullConfig(params.pullConfig);
  const connectionId = config.lookerConnectionId ?? params.ctx.connectionId;
  const client = await params.clientFactory.createClient(config, params.ctx);
  try {
    const now = params.now ?? (() => new Date());
    const skipped: StagedLookerFetchIssue[] = [];
    const warnings: StagedLookerFetchIssue[] = [];
    let dashboardFetchHadSkips = false;
    let lookFetchHadSkips = false;
    const fetchedDashboards: Array<StagedJsonFile<StagedDashboardFile>> = [];
    const fetchedLooks: Array<StagedJsonFile<StagedLookFile>> = [];

    const previousCursors = {
      dashboardsLastSyncedAt: config.dashboardUpdatedSince ?? null,
      looksLastSyncedAt: config.lookUpdatedSince ?? null,
    };

    const dashboards = await client.listDashboards();
    const dashboardRawPaths = dashboards.map((dashboardRef) => `dashboards/${safePathSegment(dashboardRef.id)}.json`);
    const dashboardsToFetch = dashboards.filter((dashboardRef) =>
      shouldFetchEntity(dashboardRef, previousCursors.dashboardsLastSyncedAt),
    );
    const fetchedRawPaths: string[] = [];
    for (const dashboardRef of dashboardsToFetch) {
      const rawPath = `dashboards/${safePathSegment(dashboardRef.id)}.json`;
      try {
        const dashboard = stagedDashboardFileSchema.parse(await client.getDashboard(dashboardRef.id));
        const dashboardRawPath = `dashboards/${safePathSegment(dashboard.lookerId)}.json`;
        fetchedRawPaths.push(dashboardRawPath);
        fetchedDashboards.push({ rawPath: dashboardRawPath, value: dashboard });
      } catch (error) {
        dashboardFetchHadSkips = true;
        skipped.push(issueForFetchError({ rawPath, entityType: 'dashboard', entityId: dashboardRef.id, error }));
      }
    }

    const looks = await client.listLooks();
    const lookRawPaths = looks.map((lookRef) => `looks/${safePathSegment(lookRef.id)}.json`);
    const looksToFetch = looks.filter((lookRef) => shouldFetchEntity(lookRef, previousCursors.looksLastSyncedAt));
    for (const lookRef of looksToFetch) {
      const rawPath = `looks/${safePathSegment(lookRef.id)}.json`;
      try {
        const look = stagedLookFileSchema.parse(await client.getLook(lookRef.id));
        const lookRawPath = `looks/${safePathSegment(look.lookerId)}.json`;
        fetchedRawPaths.push(lookRawPath);
        fetchedLooks.push({ rawPath: lookRawPath, value: look });
      } catch (error) {
        lookFetchHadSkips = true;
        skipped.push(issueForFetchError({ rawPath, entityType: 'look', entityId: lookRef.id, error }));
      }
    }

    const nextCursors = {
      dashboardsLastSyncedAt: dashboardFetchHadSkips
        ? previousCursors.dashboardsLastSyncedAt
        : maxUpdatedAt(dashboards, previousCursors.dashboardsLastSyncedAt),
      looksLastSyncedAt: lookFetchHadSkips
        ? previousCursors.looksLastSyncedAt
        : maxUpdatedAt(looks, previousCursors.looksLastSyncedAt),
    };
    const fetchMode =
      previousCursors.dashboardsLastSyncedAt || previousCursors.looksLastSyncedAt ? 'incremental' : 'full';

    await writeJson(
      params.stagedDir,
      STAGED_FILES.syncConfig,
      stagedSyncConfigSchema.parse({
        lookerConnectionId: connectionId,
        fetchedAt: now().toISOString(),
        ...(config.instanceBaseUrl ? { instanceBaseUrl: config.instanceBaseUrl } : {}),
        previousCursors,
        nextCursors,
      }),
    );

    await writeJson(
      params.stagedDir,
      STAGED_FILES.scope,
      stagedLookerScopeFileSchema.parse({
        mode: fetchMode,
        knownCurrentRawPaths: [...dashboardRawPaths, ...lookRawPaths].sort(),
        fetchedRawPaths: fetchedRawPaths.sort(),
      }),
    );

    const folders = stagedFoldersTreeFileSchema.parse(await client.listFolders());
    await writeJson(params.stagedDir, STAGED_FILES.foldersTree, folders);

    const users = await client.listUsers();
    for (const rawUser of users) {
      const user = stagedUserFileSchema.parse(rawUser);
      await writeJson(params.stagedDir, `users/${safePathSegment(user.id)}.json`, user);
    }

    const groups = await client.listGroups();
    for (const rawGroup of groups) {
      const group = stagedGroupFileSchema.parse(rawGroup);
      await writeJson(params.stagedDir, `groups/${safePathSegment(group.id)}.json`, group);
    }

    let models: StagedLookmlModelsFile;
    try {
      models = stagedLookmlModelsFileSchema.parse(await client.listLookmlModels());
    } catch (error) {
      warnings.push(
        issueForFetchError({
          rawPath: STAGED_FILES.lookmlModels,
          entityType: 'lookml_models',
          entityId: null,
          error,
          severity: 'warning',
        }),
      );
      models = stagedLookmlModelsFileSchema.parse({ models: [] });
    }
    await writeJson(params.stagedDir, STAGED_FILES.lookmlModels, models);
    const exploreTargetsByKey = new Map<string, ExploreTargetSummary>();
    const stagedExplores: StagedExploreFile[] = [];
    for (const model of models.models) {
      for (const exploreRef of model.explores) {
        const rawPath = `explores/${safePathSegment(model.name)}/${safePathSegment(exploreRef.name)}.json`;
        try {
          const result = stampExploreWarehouseTarget(await client.getExplore(model.name, exploreRef.name), config);
          stagedExplores.push(result.explore);
          exploreTargetsByKey.set(exploreKey(result.explore.modelName, result.explore.exploreName), result.targetSummary);
          await writeJson(
            params.stagedDir,
            `explores/${safePathSegment(result.explore.modelName)}/${safePathSegment(result.explore.exploreName)}.json`,
            result.explore,
          );
        } catch (error) {
          skipped.push(
            issueForFetchError({
              rawPath,
              entityType: 'explore',
              entityId: `${model.name}.${exploreRef.name}`,
              error,
            }),
          );
        }
      }
    }
    warnings.push(...warehouseTargetWarnings(stagedExplores));

    for (const dashboard of fetchedDashboards) {
      await writeJson(params.stagedDir, dashboard.rawPath, stampDashboardQueries(dashboard.value, exploreTargetsByKey));
    }

    for (const look of fetchedLooks) {
      await writeJson(params.stagedDir, look.rawPath, stampLookQuery(look.value, exploreTargetsByKey));
    }

    let signals: StagedLookerSignalsFile;
    try {
      signals = stagedLookerSignalsFileSchema.parse(client.getSignals ? await client.getSignals() : {});
    } catch (error) {
      warnings.push(
        issueForFetchError({
          rawPath: STAGED_FILES.signals.dashboardUsage,
          entityType: 'signals',
          entityId: null,
          error,
        }),
      );
      signals = stagedLookerSignalsFileSchema.parse({});
    }
    await writeJson(params.stagedDir, STAGED_FILES.signals.dashboardUsage, signals.dashboardUsage);
    await writeJson(params.stagedDir, STAGED_FILES.signals.lookUsage, signals.lookUsage);
    await writeJson(params.stagedDir, STAGED_FILES.signals.scheduledPlans, signals.scheduledPlans);
    await writeJson(params.stagedDir, STAGED_FILES.signals.favorites, signals.favorites);

    await writeLookerEvidenceDocuments(params.stagedDir);
    await writeLookerFetchReport(params.stagedDir, buildFetchReport(skipped, warnings));
  } finally {
    await client.cleanup?.();
  }
}

async function writeJson(stagedDir: string, relPath: string, value: unknown): Promise<void> {
  const abs = join(stagedDir, relPath);
  await mkdir(dirname(abs), { recursive: true });
  await writeFile(abs, `${JSON.stringify(value, null, 2)}\n`, 'utf-8');
}

function safePathSegment(value: string): string {
  if (!/^[a-zA-Z0-9_-]+$/.test(value)) {
    throw new Error(`Unsafe Looker staged path segment: ${value}`);
  }
  return value;
}

function shouldFetchEntity(ref: LookerEntityRef, updatedSince: string | null): boolean {
  if (!updatedSince) {
    return true;
  }
  if (!ref.updatedAt) {
    return true;
  }
  return Date.parse(ref.updatedAt) > Date.parse(updatedSince);
}

function maxUpdatedAt(refs: LookerEntityRef[], fallback: string | null): string | null {
  let max = fallback;
  for (const ref of refs) {
    if (!ref.updatedAt) {
      continue;
    }
    if (!max || Date.parse(ref.updatedAt) > Date.parse(max)) {
      max = ref.updatedAt;
    }
  }
  if (!max) {
    return null;
  }
  const ms = Date.parse(max);
  return Number.isNaN(ms) ? null : new Date(ms).toISOString();
}

function stampExploreWarehouseTarget(rawExplore: unknown, config: LookerPullConfig): StampedExploreResult {
  const parsed = stagedExploreFileSchema.parse(rawExplore);
  const key = exploreKey(parsed.modelName, parsed.exploreName);
  const targetWarehouseConnectionId = connectionMappingFor(parsed.connectionName, config);
  const targetTable = targetTableFor({
    key,
    rawSqlTableName: parsed.rawSqlTableName,
    targetWarehouseConnectionId,
    config,
    entityLabel: `Looker explore ${key}`,
  });

  const explore = stagedExploreFileSchema.parse({
    ...parsed,
    targetWarehouseConnectionId,
    targetTable,
    joins: parsed.joins.map((join) => ({
      ...join,
      targetTable: join.rawSqlTableName
        ? targetTableFor({
            key: `${key}.${join.name}`,
            rawSqlTableName: join.rawSqlTableName,
            targetWarehouseConnectionId,
            config,
            entityLabel: `Looker join ${key}.${join.name}`,
          })
        : null,
    })),
  });

  return {
    explore,
    targetSummary: {
      targetWarehouseConnectionId: explore.targetWarehouseConnectionId,
      targetTable: explore.targetTable,
    },
  };
}

function connectionMappingFor(connectionName: string | null, config: LookerPullConfig): string | null {
  if (!connectionName) {
    return null;
  }
  return config.connectionMappings[connectionName] ?? null;
}

function targetTableFor(input: {
  key: string;
  rawSqlTableName: string | null;
  targetWarehouseConnectionId: string | null;
  config: LookerPullConfig;
  entityLabel: string;
}): ParsedTargetTable | null {
  if (!input.rawSqlTableName && !input.targetWarehouseConnectionId) {
    return null;
  }

  if (!input.targetWarehouseConnectionId) {
    return {
      ok: false,
      reason: 'no_connection_mapping',
      detail: `${input.entityLabel} has no mapped warehouse connection.`,
    };
  }

  const parsed = input.config.parsedTargetTables[input.key];
  if (parsed) {
    return parsed;
  }

  if (!input.rawSqlTableName) {
    return null;
  }

  return {
    ok: false,
    reason: 'parse_error',
    detail: `${input.entityLabel} has raw sql_table_name but no parsedTargetTables entry for key ${input.key}.`,
  };
}

function exploreKey(modelName: string, exploreName: string): string {
  return `${modelName}.${exploreName}`;
}

function stampQueryWarehouseTarget(
  query: StagedLookerQuery | null,
  exploreTargetsByKey: Map<string, ExploreTargetSummary>,
): StagedLookerQuery | null {
  if (!query) {
    return null;
  }

  const target = exploreTargetsByKey.get(exploreKey(query.model, query.view));
  if (!target) {
    return query;
  }

  return {
    ...query,
    targetWarehouseConnectionId: target.targetWarehouseConnectionId,
    targetTable: target.targetTable,
  };
}

function stampDashboardQueries(
  dashboard: StagedDashboardFile,
  exploreTargetsByKey: Map<string, ExploreTargetSummary>,
): StagedDashboardFile {
  return stagedDashboardFileSchema.parse({
    ...dashboard,
    tiles: dashboard.tiles.map((tile) => ({
      ...tile,
      query: stampQueryWarehouseTarget(tile.query, exploreTargetsByKey),
    })),
  });
}

function stampLookQuery(look: StagedLookFile, exploreTargetsByKey: Map<string, ExploreTargetSummary>): StagedLookFile {
  return stagedLookFileSchema.parse({
    ...look,
    query: stampQueryWarehouseTarget(look.query, exploreTargetsByKey),
  });
}

function warehouseTargetWarnings(explores: StagedExploreFile[]): StagedLookerFetchIssue[] {
  const unmapped = new Map<string, string[]>();
  const warnings: StagedLookerFetchIssue[] = [];

  for (const explore of explores) {
    const targetTable = explore.targetTable;
    if (!targetTable || targetTable.ok) {
      continue;
    }

    const sourceKey = exploreKey(explore.modelName, explore.exploreName);
    const lookerConnectionName = explore.connectionName ?? 'missing_connection_name';

    if (targetTable.reason === 'no_connection_mapping') {
      const existing = unmapped.get(lookerConnectionName) ?? [];
      existing.push(sourceKey);
      unmapped.set(lookerConnectionName, existing);
      continue;
    }

    warnings.push({
      rawPath: `looker_connection_mappings/${safeWarningPathSegment(lookerConnectionName)}`,
      entityType: 'looker_connection_mapping',
      entityId: explore.connectionName,
      severity: 'warning',
      statusCode: null,
      message: `Looker explore ${sourceKey} has sql_table_name that cannot be mapped to a physical warehouse table: ${targetTable.reason}.`,
      retryRecommended: false,
      kind: warningKindForReason(targetTable.reason),
      details: {
        lookerConnectionName,
        rawSqlTableName: explore.rawSqlTableName,
        reason: targetTable.reason,
      },
    });
  }

  for (const [lookerConnectionName, affectedExplores] of [...unmapped.entries()].sort(([a], [b]) =>
    a.localeCompare(b),
  )) {
    const sortedAffectedExplores = [...affectedExplores].sort();
    warnings.push({
      rawPath: `looker_connection_mappings/${safeWarningPathSegment(lookerConnectionName)}`,
      entityType: 'looker_connection_mapping',
      entityId: lookerConnectionName === 'missing_connection_name' ? null : lookerConnectionName,
      severity: 'warning',
      statusCode: null,
      message: `Looker connection ${lookerConnectionName} is not mapped to a warehouse connection; ${sortedAffectedExplores.length} explore${sortedAffectedExplores.length === 1 ? '' : 's'} will be wiki-only.`,
      retryRecommended: false,
      kind: 'unmapped_looker_connection',
      details: {
        lookerConnectionName,
        affectedExplores: sortedAffectedExplores,
      },
    });
  }

  return warnings;
}

function warningKindForReason(reason: ParsedTargetTableFailureReason): StagedLookerFetchIssue['kind'] {
  if (reason === 'looker_template_unresolved') {
    return 'looker_template_unresolved';
  }
  if (reason === 'derived_table_not_supported') {
    return 'derived_table_not_supported';
  }
  return 'unparseable_sql_table_name';
}

function safeWarningPathSegment(value: string): string {
  return value.replace(/[^a-zA-Z0-9_-]+/g, '_');
}

function issueForFetchError(input: {
  rawPath: string;
  entityType: StagedLookerFetchIssue['entityType'];
  entityId: string | null;
  error: unknown;
  severity?: StagedLookerFetchIssue['severity'];
}): StagedLookerFetchIssue {
  const statusCode = errorStatusCode(input.error);
  return {
    rawPath: input.rawPath,
    entityType: input.entityType,
    entityId: input.entityId,
    severity: input.severity ?? (input.entityType === 'signals' ? 'warning' : 'error'),
    statusCode,
    message: errorMessage(input.error),
    retryRecommended: statusCode === 429,
  };
}

function errorMessage(error: unknown): string {
  return error instanceof Error ? error.message : String(error);
}

function errorStatusCode(error: unknown): number | null {
  if (!error || typeof error !== 'object') {
    return null;
  }
  const record = error as Record<string, unknown>;
  const direct = record.statusCode ?? record.status;
  if (typeof direct === 'number') {
    return direct;
  }
  if (typeof direct === 'string') {
    const parsed = Number(direct);
    return Number.isFinite(parsed) ? parsed : null;
  }
  const response = record.response;
  if (response && typeof response === 'object') {
    return errorStatusCode(response);
  }
  return null;
}

function buildFetchReport(
  skipped: StagedLookerFetchIssue[],
  warnings: StagedLookerFetchIssue[],
): StagedLookerFetchReport {
  const retryRecommended = [...skipped, ...warnings].some((issue) => issue.retryRecommended);
  const hasWarehouseTargetWarnings = warnings.some((issue) => issue.entityType === 'looker_connection_mapping');
  return {
    status: skipped.length > 0 || hasWarehouseTargetWarnings ? 'partial' : 'success',
    retryRecommended,
    skipped,
    warnings,
  };
}
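Reviewer note: the cursor handling in `fetch.ts` above can be hard to follow inside the long function body, so here is a minimal standalone sketch of the rule it appears to implement. An entity is refetched only when its `updatedAt` is strictly newer than the stored cursor, and the cursor for an entity type stays pinned whenever any detail fetch of that type was skipped. The names `EntityRef`, `shouldFetch`, and `nextCursor` are hypothetical stand-ins, not the shipped helpers, and the sample data simply mirrors the values used in the test for skipped detail entities.

```typescript
// Hypothetical standalone sketch; it mirrors the idea, not the shipped code.
interface EntityRef {
  id: string;
  updatedAt?: string | null;
}

// Fetch when there is no cursor, no timestamp, or the entity is strictly newer.
function shouldFetch(ref: EntityRef, updatedSince: string | null): boolean {
  if (!updatedSince || !ref.updatedAt) return true;
  return Date.parse(ref.updatedAt) > Date.parse(updatedSince);
}

// Keep the previous cursor when anything was skipped; otherwise advance to the newest updatedAt.
function nextCursor(refs: EntityRef[], previous: string | null, hadSkips: boolean): string | null {
  if (hadSkips) return previous;
  let max = previous;
  for (const ref of refs) {
    if (ref.updatedAt && (!max || Date.parse(ref.updatedAt) > Date.parse(max))) {
      max = ref.updatedAt;
    }
  }
  return max;
}

const dashboards: EntityRef[] = [
  { id: '10', updatedAt: '2026-04-30T12:00:00.000Z' },
  { id: '11', updatedAt: '2026-04-30T12:10:00.000Z' },
];
const cursor = '2026-04-30T12:00:00.000Z';

const toFetch = dashboards.filter((ref) => shouldFetch(ref, cursor)).map((ref) => ref.id);
const pinned = nextCursor(dashboards, cursor, true);
const advanced = nextCursor(dashboards, cursor, false);
console.log({ toFetch, pinned, advanced });
```

Pinning trades a little redundant work on the next run for the guarantee that a skipped entity is listed again, which seems to be the intent behind `dashboardFetchHadSkips` and `lookFetchHadSkips`.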
@ -0,0 +1,53 @@
import type { KtxLocalProject } from '../../../../context/project/project.js';
import type { KtxProjectConnectionConfig } from '../../../../context/project/config.js';
import {
  type LookerCredentialResolver,
} from './factory.js';

function stringField(value: unknown): string | null {
  return typeof value === 'string' && value.trim().length > 0 ? value.trim() : null;
}

function resolveEnvReference(ref: string, env: NodeJS.ProcessEnv): string | null {
  if (!ref.startsWith('env:')) {
    return null;
  }
  return stringField(env[ref.slice('env:'.length)]);
}

export function lookerCredentialsFromLocalConnection(
  connectionId: string,
  connection: KtxProjectConnectionConfig | undefined,
  env: NodeJS.ProcessEnv = process.env,
) {
  if (!connection || String(connection.driver).toLowerCase() !== 'looker') {
    throw new Error(`Connection "${connectionId}" is not a Looker connection`);
  }
  const baseUrl = stringField(connection.base_url);
  const clientId = stringField(connection.client_id);
  const clientSecret =
    stringField(connection.client_secret) ??
    (stringField(connection.client_secret_ref) ? resolveEnvReference(String(connection.client_secret_ref), env) : null);

  if (!baseUrl) {
    throw new Error(`Connection "${connectionId}" is missing Looker base_url`);
  }
  if (!clientId) {
    throw new Error(`Connection "${connectionId}" is missing Looker client_id`);
  }
  if (!clientSecret) {
    throw new Error(`Connection "${connectionId}" is missing Looker client_secret or client_secret_ref`);
  }
  return { base_url: baseUrl, client_id: clientId, client_secret: clientSecret };
}

export function createLocalLookerCredentialResolver(
  project: KtxLocalProject,
  env: NodeJS.ProcessEnv = process.env,
): LookerCredentialResolver {
  return {
    async resolve(lookerConnectionId) {
      return lookerCredentialsFromLocalConnection(lookerConnectionId, project.config.connections[lookerConnectionId], env);
    },
  };
}
|
|
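The secret lookup in the file above follows a simple precedence: an inline `client_secret` wins, otherwise a `client_secret_ref` of the form `env:NAME` is read from the environment, and any other reference scheme resolves to nothing. The following is a minimal standalone sketch of that rule, not the project's code: the `Env` type, the `resolveSecret` helper, and the `LOOKER_SECRET` variable name are hypothetical and exist only for illustration.

```typescript
// Hypothetical restatement of the lookup rule; names below are illustrative only.
type Env = Record<string, string | undefined>;

// Returns the trimmed string, or null for non-strings and blank strings.
function stringField(value: unknown): string | null {
  return typeof value === 'string' && value.trim().length > 0 ? value.trim() : null;
}

// Only the "env:" scheme is understood; anything else yields null.
function resolveEnvReference(ref: string, env: Env): string | null {
  if (!ref.startsWith('env:')) {
    return null;
  }
  return stringField(env[ref.slice('env:'.length)]);
}

// Inline secret first, then the environment reference.
function resolveSecret(
  connection: { client_secret?: unknown; client_secret_ref?: unknown },
  env: Env,
): string | null {
  const ref = stringField(connection.client_secret_ref);
  return stringField(connection.client_secret) ?? (ref ? resolveEnvReference(ref, env) : null);
}

const env: Env = { LOOKER_SECRET: '  s3cret  ' };
const inlineWins = resolveSecret({ client_secret: 'inline', client_secret_ref: 'env:LOOKER_SECRET' }, env);
const fromEnv = resolveSecret({ client_secret_ref: 'env:LOOKER_SECRET' }, env);
const unsupported = resolveSecret({ client_secret_ref: 'vault:LOOKER_SECRET' }, env);
```

Passing `env` as a parameter instead of reading `process.env` directly is what lets the lookup be exercised without touching the real environment.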
@ -0,0 +1,116 @@
import { mkdtemp } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it } from 'vitest';
import { LocalLookerRuntimeStore } from './local-runtime-store.js';

describe('LocalLookerRuntimeStore', () => {
  async function store() {
    const dir = await mkdtemp(join(tmpdir(), 'ktx-looker-store-'));
    return new LocalLookerRuntimeStore({
      dbPath: join(dir, 'db.sqlite'),
      now: () => new Date('2026-05-05T12:00:00.000Z'),
    });
  }

  it('stores cursors and connection mappings', async () => {
    const local = await store();

    await local.setCursors('prod-looker', {
      dashboardsLastSyncedAt: '2026-05-01T00:00:00.000Z',
      looksLastSyncedAt: null,
    });
    await local.upsertConnectionMapping({
      lookerConnectionId: 'prod-looker',
      lookerConnectionName: 'bq_reporting',
      ktxConnectionId: 'prod-warehouse',
      source: 'cli',
    });

    await expect(local.readCursors('prod-looker')).resolves.toEqual({
      dashboardsLastSyncedAt: '2026-05-01T00:00:00.000Z',
      looksLastSyncedAt: null,
    });
    await expect(local.readMappings('prod-looker')).resolves.toEqual([
      {
        lookerConnectionName: 'bq_reporting',
        ktxConnectionId: 'prod-warehouse',
        lookerHost: null,
        lookerDatabase: null,
        lookerDialect: null,
      },
    ]);
  });

  it('refreshes discovered metadata without dropping local targets', async () => {
    const local = await store();
    await local.upsertConnectionMapping({
      lookerConnectionId: 'prod-looker',
      lookerConnectionName: 'bq_reporting',
      ktxConnectionId: 'prod-warehouse',
      source: 'cli',
    });

    await local.refreshDiscoveredConnections({
      lookerConnectionId: 'prod-looker',
      discovered: [
        {
          name: 'bq_reporting',
          host: 'bigquery.googleapis.com',
          database: 'analytics',
          schema: null,
          dialect: 'bigquery_standard_sql',
        },
      ],
    });

    await expect(local.listConnectionMappings('prod-looker')).resolves.toEqual([
      {
        lookerConnectionName: 'bq_reporting',
        ktxConnectionId: 'prod-warehouse',
        lookerHost: 'bigquery.googleapis.com',
        lookerDatabase: 'analytics',
        lookerDialect: 'bigquery_standard_sql',
        source: 'refresh',
      },
    ]);
  });

  it('applies yaml mapping intent while preserving refresh metadata and cli overrides', async () => {
    const local = await store();
    await local.refreshDiscoveredConnections({
      lookerConnectionId: 'prod-looker',
      discovered: [{ name: 'analytics', host: 'looker-db.test', database: 'warehouse', schema: null, dialect: 'postgres' }],
    });
    await local.upsertConnectionMapping({
      lookerConnectionId: 'prod-looker',
      lookerConnectionName: 'manual',
      ktxConnectionId: 'cli-warehouse',
      source: 'cli',
    });

    await local.applyYamlBootstrap({
      lookerConnectionId: 'prod-looker',
      mappings: [
        { lookerConnectionName: 'analytics', ktxConnectionId: 'yaml-warehouse' },
        { lookerConnectionName: 'manual', ktxConnectionId: 'yaml-warehouse' },
      ],
    });

    await expect(local.listConnectionMappings('prod-looker')).resolves.toMatchObject([
      {
        lookerConnectionName: 'analytics',
        ktxConnectionId: 'yaml-warehouse',
        lookerHost: 'looker-db.test',
        lookerDatabase: 'warehouse',
        lookerDialect: 'postgres',
        source: 'ktx.yaml',
      },
      {
        lookerConnectionName: 'manual',
        ktxConnectionId: 'cli-warehouse',
        source: 'cli',
      },
    ]);
  });
});
@ -0,0 +1,280 @@
import { mkdirSync } from 'node:fs';
import { dirname } from 'node:path';
import Database from 'better-sqlite3';
import type { LookerWarehouseConnectionInfo } from './client.js';
import type { LookerConnectionMapping } from './mapping.js';
import type { LookerRuntimeCursors } from './types.js';

type LocalLookerMappingSource = 'ktx.yaml' | 'cli' | 'refresh';

interface LocalLookerRuntimeStoreOptions {
  dbPath: string;
  now?: () => Date;
}

export interface LocalLookerConnectionMappingListRow extends LookerConnectionMapping {
  source: LocalLookerMappingSource;
}

export interface UpsertLocalLookerConnectionMappingInput {
  lookerConnectionId: string;
  lookerConnectionName: string;
  ktxConnectionId: string | null;
  source: LocalLookerMappingSource;
}

interface ApplyLocalLookerYamlBootstrapInput {
  lookerConnectionId: string;
  mappings: Array<{
    lookerConnectionName: string;
    ktxConnectionId: string | null;
  }>;
}

export interface RefreshLocalLookerDiscoveredConnectionsInput {
  lookerConnectionId: string;
  discovered: LookerWarehouseConnectionInfo[];
}

export interface ClearLocalLookerMappingsInput {
  lookerConnectionId: string;
  lookerConnectionName?: string;
}

interface LookerSourceStateReader {
  readMappings(lookerConnectionId: string): Promise<LookerConnectionMapping[]>;
  readCursors(lookerConnectionId: string): Promise<LookerRuntimeCursors>;
}

export class LocalLookerRuntimeStore implements LookerSourceStateReader {
  private readonly db: Database.Database;
  private readonly now: () => Date;

  constructor(options: LocalLookerRuntimeStoreOptions) {
    mkdirSync(dirname(options.dbPath), { recursive: true });
    this.db = new Database(options.dbPath);
    this.db.pragma('journal_mode = WAL');
    this.db.pragma('foreign_keys = ON');
    this.now = options.now ?? (() => new Date());
    this.db.exec(`
      CREATE TABLE IF NOT EXISTS local_looker_runtime_config (
        looker_connection_id TEXT PRIMARY KEY,
        dashboards_last_synced_at TEXT,
        looks_last_synced_at TEXT,
        updated_at TEXT NOT NULL
      );

      CREATE TABLE IF NOT EXISTS local_looker_connection_mappings (
        looker_connection_id TEXT NOT NULL,
        looker_connection_name TEXT NOT NULL,
        ktx_connection_id TEXT,
        looker_host TEXT,
        looker_database TEXT,
        looker_dialect TEXT,
        source TEXT NOT NULL,
        updated_at TEXT NOT NULL,
        PRIMARY KEY (looker_connection_id, looker_connection_name)
      );
    `);
  }

  async applyYamlBootstrap(input: ApplyLocalLookerYamlBootstrapInput): Promise<void> {
    const timestamp = this.now().toISOString();
    const apply = this.db.transaction(() => {
      const existing = this.db.prepare(`
        SELECT ktx_connection_id, source
        FROM local_looker_connection_mappings
        WHERE looker_connection_id = ? AND looker_connection_name = ?
      `);
      const insert = this.db.prepare(`
        INSERT INTO local_looker_connection_mappings (
          looker_connection_id,
          looker_connection_name,
          ktx_connection_id,
          looker_host,
          looker_database,
          looker_dialect,
          source,
          updated_at
        )
        VALUES (?, ?, ?, NULL, NULL, NULL, 'ktx.yaml', ?)
      `);
      const updateRefreshRow = this.db.prepare(`
        UPDATE local_looker_connection_mappings
        SET ktx_connection_id = ?,
            source = 'ktx.yaml',
            updated_at = ?
        WHERE looker_connection_id = ?
          AND looker_connection_name = ?
          AND source = 'refresh'
          AND ktx_connection_id IS NULL
      `);

      for (const mapping of input.mappings) {
        const row = existing.get(input.lookerConnectionId, mapping.lookerConnectionName) as
          | { ktx_connection_id: string | null; source: LocalLookerMappingSource }
          | undefined;
        if (!row) {
          insert.run(input.lookerConnectionId, mapping.lookerConnectionName, mapping.ktxConnectionId, timestamp);
          continue;
        }
        if (row.source === 'refresh' && row.ktx_connection_id === null) {
          updateRefreshRow.run(mapping.ktxConnectionId, timestamp, input.lookerConnectionId, mapping.lookerConnectionName);
        }
      }
    });

    apply();
  }

  async readCursors(lookerConnectionId: string): Promise<LookerRuntimeCursors> {
    const row = this.db
      .prepare(
        `
          SELECT dashboards_last_synced_at, looks_last_synced_at
          FROM local_looker_runtime_config
          WHERE looker_connection_id = ?
        `,
      )
      .get(lookerConnectionId) as { dashboards_last_synced_at: string | null; looks_last_synced_at: string | null } | undefined;

    return {
      dashboardsLastSyncedAt: row?.dashboards_last_synced_at ?? null,
      looksLastSyncedAt: row?.looks_last_synced_at ?? null,
    };
  }

  async setCursors(lookerConnectionId: string, cursors: LookerRuntimeCursors): Promise<void> {
    this.db
      .prepare(
        `
          INSERT INTO local_looker_runtime_config (
            looker_connection_id,
            dashboards_last_synced_at,
            looks_last_synced_at,
            updated_at
          )
          VALUES (?, ?, ?, ?)
          ON CONFLICT(looker_connection_id) DO UPDATE SET
            dashboards_last_synced_at = excluded.dashboards_last_synced_at,
            looks_last_synced_at = excluded.looks_last_synced_at,
            updated_at = excluded.updated_at
        `,
      )
      .run(lookerConnectionId, cursors.dashboardsLastSyncedAt, cursors.looksLastSyncedAt, this.now().toISOString());
  }

  async readMappings(lookerConnectionId: string): Promise<LookerConnectionMapping[]> {
    return (await this.listConnectionMappings(lookerConnectionId)).map(({ source: _source, ...mapping }) => mapping);
  }

  async listConnectionMappings(lookerConnectionId: string): Promise<LocalLookerConnectionMappingListRow[]> {
    const rows = this.db
      .prepare(
        `
          SELECT
            looker_connection_name,
            ktx_connection_id,
            looker_host,
            looker_database,
            looker_dialect,
            source
          FROM local_looker_connection_mappings
          WHERE looker_connection_id = ?
          ORDER BY looker_connection_name
        `,
      )
      .all(lookerConnectionId) as Array<{
      looker_connection_name: string;
      ktx_connection_id: string | null;
      looker_host: string | null;
      looker_database: string | null;
      looker_dialect: string | null;
      source: LocalLookerMappingSource;
    }>;

    return rows.map((row) => ({
      lookerConnectionName: row.looker_connection_name,
      ktxConnectionId: row.ktx_connection_id,
      lookerHost: row.looker_host,
      lookerDatabase: row.looker_database,
      lookerDialect: row.looker_dialect,
      source: row.source,
    }));
  }

  async upsertConnectionMapping(input: UpsertLocalLookerConnectionMappingInput): Promise<void> {
    this.db
      .prepare(
        `
          INSERT INTO local_looker_connection_mappings (
            looker_connection_id,
            looker_connection_name,
            ktx_connection_id,
            looker_host,
            looker_database,
            looker_dialect,
            source,
            updated_at
          )
          VALUES (?, ?, ?, NULL, NULL, NULL, ?, ?)
          ON CONFLICT(looker_connection_id, looker_connection_name) DO UPDATE SET
            ktx_connection_id = excluded.ktx_connection_id,
            source = excluded.source,
            updated_at = excluded.updated_at
        `,
      )
      .run(input.lookerConnectionId, input.lookerConnectionName, input.ktxConnectionId, input.source, this.now().toISOString());
  }

  async refreshDiscoveredConnections(input: RefreshLocalLookerDiscoveredConnectionsInput): Promise<void> {
    const timestamp = this.now().toISOString();
    const update = this.db.transaction(() => {
      const upsert = this.db.prepare(`
        INSERT INTO local_looker_connection_mappings (
          looker_connection_id,
          looker_connection_name,
          ktx_connection_id,
          looker_host,
          looker_database,
          looker_dialect,
          source,
          updated_at
        )
        VALUES (?, ?, NULL, ?, ?, ?, 'refresh', ?)
        ON CONFLICT(looker_connection_id, looker_connection_name) DO UPDATE SET
          looker_host = excluded.looker_host,
          looker_database = excluded.looker_database,
          looker_dialect = excluded.looker_dialect,
          source = excluded.source,
          updated_at = excluded.updated_at
      `);
      for (const connection of input.discovered) {
        upsert.run(
          input.lookerConnectionId,
          connection.name,
          connection.host,
          connection.database,
          connection.dialect,
          timestamp,
        );
      }
    });
    update();
  }

  async clearConnectionMappings(input: ClearLocalLookerMappingsInput): Promise<void> {
    if (input.lookerConnectionName) {
      this.db
        .prepare(
          `
            DELETE FROM local_looker_connection_mappings
            WHERE looker_connection_id = ? AND looker_connection_name = ?
          `,
        )
        .run(input.lookerConnectionId, input.lookerConnectionName);
      return;
    }
    this.db.prepare('DELETE FROM local_looker_connection_mappings WHERE looker_connection_id = ?').run(input.lookerConnectionId);
  }
}
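The three write paths in the store above interact through the `source` and `ktx_connection_id` columns: a discovery refresh updates metadata but keeps the mapped target, a CLI upsert overwrites the target but keeps metadata, and the YAML bootstrap only inserts missing rows or claims refresh placeholders that have no target yet. A minimal in-memory sketch of those rules follows. It is an illustration under simplifying assumptions, not the store itself: the `Row` shape, the `rows` map, and the three function names are hypothetical, and only the host column stands in for the discovered metadata.

```typescript
// Illustrative in-memory model; the real store persists these rows in SQLite.
type Source = 'ktx.yaml' | 'cli' | 'refresh';

interface Row {
  ktxConnectionId: string | null;
  lookerHost: string | null;
  source: Source;
}

const rows = new Map<string, Row>();

// Discovery refresh: updates metadata and source, never touches the mapped target.
function refresh(name: string, host: string): void {
  const existing = rows.get(name);
  rows.set(name, { ktxConnectionId: existing?.ktxConnectionId ?? null, lookerHost: host, source: 'refresh' });
}

// CLI upsert: overwrites target and source, keeps previously discovered metadata.
function upsertFromCli(name: string, ktxConnectionId: string | null): void {
  const existing = rows.get(name);
  rows.set(name, { ktxConnectionId, lookerHost: existing?.lookerHost ?? null, source: 'cli' });
}

// YAML bootstrap: inserts missing rows and only claims unmapped refresh placeholders.
function applyYaml(name: string, ktxConnectionId: string | null): void {
  const existing = rows.get(name);
  if (!existing) {
    rows.set(name, { ktxConnectionId, lookerHost: null, source: 'ktx.yaml' });
    return;
  }
  if (existing.source === 'refresh' && existing.ktxConnectionId === null) {
    rows.set(name, { ...existing, ktxConnectionId, source: 'ktx.yaml' });
  }
}

refresh('analytics', 'looker-db.test');
upsertFromCli('manual', 'cli-warehouse');
applyYaml('analytics', 'yaml-warehouse');
applyYaml('manual', 'yaml-warehouse');
```

After this sequence the `analytics` row carries the YAML target together with the discovered host, while the `manual` row keeps its CLI target, which matches the third store test.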
@ -0,0 +1,125 @@
import { mkdir, mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import type { LookerRuntimeClient } from './fetch.js';
import { LookerSourceAdapter } from './looker.adapter.js';

const connectionId = '11111111-1111-4111-8111-111111111111';

function makeClient(): LookerRuntimeClient {
  return {
    listDashboards: vi.fn().mockResolvedValue([]),
    getDashboard: vi.fn(),
    listLooks: vi.fn().mockResolvedValue([]),
    getLook: vi.fn(),
    listFolders: vi.fn().mockResolvedValue({ folders: [] }),
    listUsers: vi.fn().mockResolvedValue([]),
    listGroups: vi.fn().mockResolvedValue([]),
    listLookmlModels: vi.fn().mockResolvedValue({
      models: [{ name: 'b2b', label: 'B2B', explores: [{ name: 'sales_pipeline', label: 'Sales Pipeline' }] }],
    }),
    getExplore: vi.fn().mockResolvedValue({
      modelName: 'b2b',
      exploreName: 'sales_pipeline',
      label: 'Sales Pipeline',
      description: null,
      fields: { dimensions: [], measures: [] },
      joins: [],
    }),
  };
}

describe('LookerSourceAdapter', () => {
  let stagedDir: string;

  beforeEach(async () => {
    stagedDir = await mkdtemp(join(tmpdir(), 'looker-adapter-'));
  });

  afterEach(async () => {
    await rm(stagedDir, { recursive: true, force: true });
  });

  it('exposes source="looker" and skillNames=["looker_ingest"]', () => {
    const adapter = new LookerSourceAdapter({ clientFactory: { createClient: () => makeClient() } });
    expect(adapter.source).toBe('looker');
    expect(adapter.skillNames).toEqual(['looker_ingest']);
  });

  it('enables context evidence indexing and delegates triage signals', async () => {
    const adapter = new LookerSourceAdapter({ clientFactory: { createClient: () => makeClient() } });

    expect(adapter.evidenceIndexing).toBe('documents');
    expect(adapter.triageSupported).toBe(true);
    await expect(adapter.getTriageSignals?.(stagedDir, 'looker:dashboard:10')).resolves.toMatchObject({
      objectType: 'looker_dashboard',
    });
  });

  it('fetches, detects, and chunks a runtime bundle through the composed adapter', async () => {
    const adapter = new LookerSourceAdapter({
      clientFactory: { createClient: vi.fn().mockResolvedValue(makeClient()) },
      now: () => new Date('2026-04-30T12:30:00.000Z'),
    });

    await mkdir(stagedDir, { recursive: true });
    await adapter.fetch({ lookerConnectionId: connectionId }, stagedDir, { connectionId, sourceKey: 'looker' });

    expect(await adapter.detect(stagedDir)).toBe(true);
    expect(await readFile(join(stagedDir, 'explores/b2b/sales_pipeline.json'), 'utf-8')).toContain('sales_pipeline');

    const result = await adapter.chunk(stagedDir);
    expect(result.workUnits.map((wu) => wu.unitKey)).toEqual(['looker-explore-b2b-sales_pipeline']);
  });

  it('passes pull success notifications to the server callback', async () => {
    const onPullSucceeded = vi.fn().mockResolvedValue(undefined);
    const adapter = new LookerSourceAdapter({
      clientFactory: { createClient: () => makeClient() },
      onPullSucceeded,
    });
    const completedAt = new Date('2026-04-30T12:00:00.000Z');

    await adapter.onPullSucceeded({
      connectionId,
      sourceKey: 'looker',
      syncId: 'sync-1',
      trigger: 'scheduled_pull',
      completedAt,
      stagedDir: '/tmp/staged',
    });

    expect(onPullSucceeded).toHaveBeenCalledWith({
      connectionId,
      sourceKey: 'looker',
      syncId: 'sync-1',
      trigger: 'scheduled_pull',
      completedAt,
      stagedDir: '/tmp/staged',
    });
  });

  it('describes incremental fetch scope from the staged scope file', async () => {
    await mkdir(join(stagedDir, 'dashboards'), { recursive: true });
    await writeFile(
      join(stagedDir, 'looker-scope.json'),
      JSON.stringify(
        {
          mode: 'incremental',
          knownCurrentRawPaths: ['dashboards/10.json', 'dashboards/11.json'],
          fetchedRawPaths: ['dashboards/11.json'],
        },
        null,
        2,
      ),
    );
    const adapter = new LookerSourceAdapter({ clientFactory: { createClient: () => makeClient() } });

    const scope = await adapter.describeScope(stagedDir);

    expect(scope.isPathInScope('dashboards/10.json')).toBe(false);
    expect(scope.isPathInScope('dashboards/11.json')).toBe(true);
    expect(scope.isPathInScope('dashboards/12.json')).toBe(true);
  });
});
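The last test above pins down how an incremental scope behaves: a path that is known to exist upstream but was not re-fetched in this run is out of scope, while fetched paths and paths the scope file has never heard of stay in scope. The implementation of `describeLookerScope` is not part of this diff, so the following is only a sketch inferred from those three expectations. The `LookerScopeFile` type, the `describeScopeSketch` name, and the treatment of a `'full'` mode as "everything in scope" are assumptions.

```typescript
// Hypothetical shape of looker-scope.json, inferred from the test fixture.
interface LookerScopeFile {
  mode: 'full' | 'incremental';
  knownCurrentRawPaths: string[];
  fetchedRawPaths: string[];
}

interface ScopeSketch {
  isPathInScope(path: string): boolean;
}

// Assumed rule: in incremental mode, only "known but not re-fetched" paths fall out of scope.
function describeScopeSketch(scope: LookerScopeFile): ScopeSketch {
  if (scope.mode !== 'incremental') {
    // Assumption: a full fetch treats every path as in scope.
    return { isPathInScope: () => true };
  }
  const fetched = new Set(scope.fetchedRawPaths);
  const carriedOver = new Set(scope.knownCurrentRawPaths.filter((path) => !fetched.has(path)));
  return { isPathInScope: (path) => !carriedOver.has(path) };
}

const sketch = describeScopeSketch({
  mode: 'incremental',
  knownCurrentRawPaths: ['dashboards/10.json', 'dashboards/11.json'],
  fetchedRawPaths: ['dashboards/11.json'],
});
const unchangedPath = sketch.isPathInScope('dashboards/10.json');
const refetchedPath = sketch.isPathInScope('dashboards/11.json');
const unknownPath = sketch.isPathInScope('dashboards/12.json');
```

Under this reading, leaving untouched paths out of scope lets a downstream diff step skip objects that still exist upstream but were simply not re-fetched.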
@ -0,0 +1,70 @@
import type { ChunkResult, DiffSet, FetchContext, IngestTrigger, ScopeDescriptor, SourceAdapter } from '../../types.js';
import { chunkLookerStagedDir } from './chunk.js';
import { detectLookerStagedDir } from './detect.js';
import { getLookerTriageSignals } from './evidence-documents.js';
import { fetchLookerRuntimeBundle, type LookerClientFactory } from './fetch.js';
import { readLookerFetchReport } from './fetch-report.js';
import { describeLookerScope } from './scope.js';
import { listLookerTargetConnectionIds } from './target-connections.js';

interface LookerPullSucceededContext {
  connectionId: string;
  sourceKey: string;
  syncId: string;
  trigger: IngestTrigger;
  completedAt: Date;
  stagedDir: string;
}

export interface LookerSourceAdapterDeps {
  clientFactory: LookerClientFactory;
  now?: () => Date;
  onPullSucceeded?: (ctx: LookerPullSucceededContext) => Promise<void>;
}

export class LookerSourceAdapter implements SourceAdapter {
  readonly source = 'looker';
  readonly skillNames: string[] = ['looker_ingest'];
  readonly evidenceIndexing = 'documents' as const;
  readonly triageSupported = true;

  constructor(private readonly deps: LookerSourceAdapterDeps) {}

  detect(stagedDir: string): Promise<boolean> {
    return detectLookerStagedDir(stagedDir);
  }

  fetch(pullConfig: unknown, stagedDir: string, ctx: FetchContext): Promise<void> {
    return fetchLookerRuntimeBundle({
      pullConfig,
      stagedDir,
      ctx,
      clientFactory: this.deps.clientFactory,
      now: this.deps.now,
    });
  }

  chunk(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
    return chunkLookerStagedDir(stagedDir, diffSet);
  }

  readFetchReport(stagedDir: string) {
    return readLookerFetchReport(stagedDir);
  }

  listTargetConnectionIds(stagedDir: string): Promise<string[]> {
    return listLookerTargetConnectionIds(stagedDir);
  }

  getTriageSignals(stagedDir: string, externalId: string) {
    return getLookerTriageSignals(stagedDir, externalId);
  }

  describeScope(stagedDir: string): Promise<ScopeDescriptor> {
    return describeLookerScope(stagedDir);
  }

  async onPullSucceeded(ctx: LookerPullSucceededContext): Promise<void> {
    await this.deps.onPullSucceeded?.(ctx);
  }
}
384
packages/cli/src/context/ingest/adapters/looker/mapping.test.ts
Normal file
@ -0,0 +1,384 @@
import { describe, expect, it, vi } from 'vitest';
import type { StagedExploreFile, StagedLookmlModelsFile } from './types.js';
import {
  buildLookerPullConfigFromInputs,
  collectExploreParseItems,
  computeLookerMappingDrift,
  discoverLookerConnections,
  lookerDialectToConnectionType,
  projectParsedIdentifier,
  refreshLookerMappingPlaceholders,
  sqlglotDialectForConnectionType,
  suggestKtxConnectionForLookerConnection,
  validateLookerMappings,
  validateLookerWarehouseTarget,
} from './mapping.js';

const liveConnections = [
  {
    name: 'b2b_sandbox_bq',
    host: 'warehouse.example.com',
    database: 'analytics',
    schema: null,
    dialect: 'bigquery_standard_sql',
  },
  {
    name: 'pg_runtime',
    host: 'pg.internal:5432',
    database: 'app',
    schema: 'public',
    dialect: 'postgres',
  },
];

const mappedExplore: StagedExploreFile = {
  modelName: 'b2b',
  exploreName: 'sales_pipeline',
  label: 'Sales Pipeline',
  description: null,
  rawSqlTableName: 'proj.analytics.opportunities AS opportunities',
  connectionName: 'b2b_sandbox_bq',
  viewName: 'opportunities',
  fields: { dimensions: [], measures: [] },
  joins: [
    {
      name: 'accounts',
      type: 'left_outer',
      relationship: 'many_to_one',
      rawSqlTableName: 'proj.analytics.accounts',
      sqlOn: null,
      from: null,
      targetTable: null,
    },
  ],
  targetWarehouseConnectionId: null,
  targetTable: null,
};

const models: StagedLookmlModelsFile = {
  models: [{ name: 'b2b', label: 'B2B', explores: [{ name: 'sales_pipeline', label: 'Sales Pipeline' }] }],
};

describe('discoverLookerConnections', () => {
  it('delegates to the runtime client connection discovery method', async () => {
    const client = { listLookerConnections: vi.fn().mockResolvedValue(liveConnections) };

    await expect(discoverLookerConnections(client)).resolves.toEqual(liveConnections);
    expect(client.listLookerConnections).toHaveBeenCalledTimes(1);
  });
});

describe('looker dialect and target validation helpers', () => {
  it('maps Looker dialect names to KTX connection types', () => {
    expect(lookerDialectToConnectionType('bigquery_standard_sql')).toBe('BIGQUERY');
    expect(lookerDialectToConnectionType('postgres')).toBe('POSTGRESQL');
    expect(lookerDialectToConnectionType('mssql')).toBe('SQLSERVER');
    expect(lookerDialectToConnectionType('unknown')).toBeNull();
  });

  it('maps supported warehouse connection types to sqlglot dialects', () => {
    expect(sqlglotDialectForConnectionType('BIGQUERY')).toBe('bigquery');
    expect(sqlglotDialectForConnectionType('POSTGRESQL')).toBe('postgres');
    expect(sqlglotDialectForConnectionType('LOOKER')).toBeNull();
  });

  it('returns a structured failure for unsupported Looker warehouse targets', () => {
    expect(validateLookerWarehouseTarget('LOOKER')).toEqual({
      ok: false,
      reason: 'Connection type LOOKER cannot be used as a Looker warehouse mapping target',
    });
  });
});

describe('suggestKtxConnectionForLookerConnection', () => {
  it('returns the single deterministic target with matching type, host, and database', () => {
    expect(
      suggestKtxConnectionForLookerConnection({
        lookerConnection: liveConnections[1],
        candidateConnections: [
          {
            id: 'wrong-type',
            connection_type: 'MYSQL',
            connection_params: { host: 'pg.internal', database: 'app' },
          },
          {
            id: 'pg-target',
            connection_type: 'POSTGRESQL',
            connection_params: { host: 'PG.INTERNAL', database: 'APP' },
          },
        ],
      }),
    ).toBe('pg-target');
  });

  it('returns null when more than one target matches', () => {
    expect(
      suggestKtxConnectionForLookerConnection({
        lookerConnection: liveConnections[1],
        candidateConnections: [
          {
            id: 'first',
            connection_type: 'POSTGRESQL',
            connection_params: { host: 'pg.internal', database: 'app' },
          },
          {
            id: 'second',
            connection_type: 'POSTGRESQL',
            connection_params: { host: 'pg.internal:5432', database: 'APP' },
          },
        ],
      }),
    ).toBeNull();
  });
});

describe('refreshLookerMappingPlaceholders', () => {
  it('adds newly discovered placeholders and refreshes live metadata without dropping saved targets', () => {
    expect(
      refreshLookerMappingPlaceholders({
        stored: [
          {
            lookerConnectionName: 'b2b_sandbox_bq',
            ktxConnectionId: 'warehouse',
            lookerHost: null,
            lookerDatabase: null,
            lookerDialect: null,
          },
        ],
        live: liveConnections,
      }),
    ).toEqual({
      changed: true,
      mappings: [
        {
          lookerConnectionName: 'b2b_sandbox_bq',
          ktxConnectionId: 'warehouse',
          lookerHost: 'warehouse.example.com',
          lookerDatabase: 'analytics',
          lookerDialect: 'bigquery_standard_sql',
        },
        {
          lookerConnectionName: 'pg_runtime',
          ktxConnectionId: null,
          lookerHost: 'pg.internal:5432',
          lookerDatabase: 'app',
          lookerDialect: 'postgres',
        },
      ],
    });
  });
});

describe('computeLookerMappingDrift and validateLookerMappings', () => {
  it('reports unmapped live connections, stale stored mappings, and in-sync mappings', () => {
    expect(
      computeLookerMappingDrift({
        storedMappings: [
          {
            lookerConnectionName: 'b2b_sandbox_bq',
            ktxConnectionId: 'warehouse',
            lookerHost: null,
            lookerDatabase: null,
            lookerDialect: null,
          },
          {
            lookerConnectionName: 'stale_runtime',
            ktxConnectionId: 'warehouse',
            lookerHost: null,
            lookerDatabase: null,
            lookerDialect: null,
          },
        ],
        discovered: liveConnections,
      }),
    ).toEqual({
      unmappedDiscovered: [liveConnections[1]],
      staleMappings: [{ lookerConnectionName: 'stale_runtime', reason: 'looker_connection_not_found' }],
      inSync: [{ lookerConnectionName: 'b2b_sandbox_bq', ktxConnectionId: 'warehouse' }],
    });
  });

  it('validates missing and unsupported target connection ids', () => {
    expect(
      validateLookerMappings({
        mappings: [
          {
            lookerConnectionName: 'b2b_sandbox_bq',
            ktxConnectionId: 'missing',
            lookerHost: null,
            lookerDatabase: null,
            lookerDialect: null,
          },
          {
            lookerConnectionName: 'pg_runtime',
            ktxConnectionId: 'looker-target',
            lookerHost: null,
            lookerDatabase: null,
            lookerDialect: null,
          },
        ],
        knownKtxConnectionIds: new Set(['looker-target']),
        knownConnectionTypes: new Map([['looker-target', 'LOOKER']]),
      }),
    ).toEqual({
      ok: false,
      errors: [
        { key: 'b2b_sandbox_bq', reason: 'KTX connection missing does not exist' },
        {
          key: 'pg_runtime',
          reason: 'Connection type LOOKER cannot be used as a Looker warehouse mapping target',
        },
      ],
    });
  });
});

describe('collectExploreParseItems and projectParsedIdentifier', () => {
  it('collects base explore and join parser inputs for mapped explores', () => {
    expect(
      collectExploreParseItems({
        explore: mappedExplore,
        connectionMappings: { b2b_sandbox_bq: 'warehouse' },
        targetConnections: new Map([['warehouse', { id: 'warehouse', connection_type: 'BIGQUERY' }]]),
      }),
    ).toEqual({
      parsedTargetTables: {},
      parseItems: [
        {
          key: 'b2b.sales_pipeline',
          sql_table_name: 'proj.analytics.opportunities AS opportunities',
          dialect: 'bigquery',
        },
        {
          key: 'b2b.sales_pipeline.accounts',
          sql_table_name: 'proj.analytics.accounts',
          dialect: 'bigquery',
        },
      ],
    });
  });

  it('projects successful and failed parser rows into KTX parsed target tables', () => {
    expect(
      projectParsedIdentifier({
        ok: true,
        catalog: 'proj',
        schema: 'analytics',
        name: 'accounts',
        canonical_table: 'proj.analytics.accounts',
      }),
    ).toEqual({
      ok: true,
      catalog: 'proj',
      schema: 'analytics',
      name: 'accounts',
      canonicalTable: 'proj.analytics.accounts',
    });

    expect(projectParsedIdentifier({ ok: false, reason: 'derived_table_not_supported' })).toEqual({
      ok: false,
      reason: 'derived_table_not_supported',
    });
  });
});

describe('buildLookerPullConfigFromInputs', () => {
  it('builds the hosted-equivalent Looker pull config from caller-loaded inputs', async () => {
    const parser = {
      parse: vi.fn().mockResolvedValue({
        'b2b.sales_pipeline': {
          ok: true,
          catalog: 'proj',
          schema: 'analytics',
          name: 'opportunities',
          canonical_table: 'proj.analytics.opportunities',
        },
        'b2b.sales_pipeline.accounts': {
          ok: true,
          catalog: 'proj',
          schema: 'analytics',
          name: 'accounts',
          canonical_table: 'proj.analytics.accounts',
        },
      }),
    };
    const client = {
      listLookmlModels: vi.fn().mockResolvedValue(models),
      getExplore: vi.fn().mockResolvedValue(mappedExplore),
    };

    await expect(
      buildLookerPullConfigFromInputs({
        lookerConnectionId: 'prod-looker',
        cursors: {
          dashboardsLastSyncedAt: '2026-05-01T00:00:00.000Z',
          looksLastSyncedAt: null,
        },
        refreshedMappings: [
          {
            lookerConnectionName: 'b2b_sandbox_bq',
            ktxConnectionId: 'warehouse',
            lookerHost: 'warehouse.example.com',
            lookerDatabase: 'analytics',
            lookerDialect: 'bigquery_standard_sql',
          },
        ],
        targetConnections: new Map([['warehouse', { id: 'warehouse', connection_type: 'BIGQUERY' }]]),
        client,
        parser,
      }),
    ).resolves.toEqual({
      lookerConnectionId: 'prod-looker',
      dashboardUpdatedSince: '2026-05-01T00:00:00.000Z',
      lookUpdatedSince: null,
      connectionMappings: { b2b_sandbox_bq: 'warehouse' },
      connectionTypes: { b2b_sandbox_bq: 'BIGQUERY' },
      parsedTargetTables: {
        'b2b.sales_pipeline': {
          ok: true,
          catalog: 'proj',
          schema: 'analytics',
          name: 'opportunities',
          canonicalTable: 'proj.analytics.opportunities',
        },
        'b2b.sales_pipeline.accounts': {
          ok: true,
          catalog: 'proj',
          schema: 'analytics',
          name: 'accounts',
          canonicalTable: 'proj.analytics.accounts',
        },
      },
    });
  });

  it('marks parser failures as parse_error without blocking pull-config construction', async () => {
    const parser = { parse: vi.fn().mockRejectedValue(new Error('python unavailable')) };
    const client = {
      listLookmlModels: vi.fn().mockResolvedValue(models),
      getExplore: vi.fn().mockResolvedValue(mappedExplore),
    };

    const config = await buildLookerPullConfigFromInputs({
      lookerConnectionId: 'prod-looker',
      cursors: { dashboardsLastSyncedAt: null, looksLastSyncedAt: null },
      refreshedMappings: [
        {
          lookerConnectionName: 'b2b_sandbox_bq',
          ktxConnectionId: 'warehouse',
          lookerHost: null,
          lookerDatabase: null,
|
||||
lookerDialect: null,
|
||||
},
|
||||
],
|
||||
targetConnections: new Map([['warehouse', { id: 'warehouse', connection_type: 'BIGQUERY' }]]),
|
||||
client,
|
||||
parser,
|
||||
});
|
||||
|
||||
expect(config.parsedTargetTables).toMatchObject({
|
||||
'b2b.sales_pipeline': { ok: false, reason: 'parse_error' },
|
||||
'b2b.sales_pipeline.accounts': { ok: false, reason: 'parse_error' },
|
||||
});
|
||||
});
|
||||
});
|
||||
446
packages/cli/src/context/ingest/adapters/looker/mapping.ts
Normal file
@ -0,0 +1,446 @@
import type { ParsedTargetTable } from '../../parsed-target-table.js';
import type { LookerWarehouseConnectionInfo } from './client.js';
import type { LookerPullConfig, LookerRuntimeCursors, StagedExploreFile, StagedLookmlModelsFile } from './types.js';

const LOOKER_DIALECT_TO_CONNECTION_TYPE = {
  bigquery: 'BIGQUERY',
  bigquery_standard_sql: 'BIGQUERY',
  snowflake: 'SNOWFLAKE',
  postgres: 'POSTGRESQL',
  postgresql: 'POSTGRESQL',
  mysql: 'MYSQL',
  sqlite: 'SQLITE',
  sqlserver: 'SQLSERVER',
  mssql: 'SQLSERVER',
  tsql: 'SQLSERVER',
  clickhouse: 'CLICKHOUSE',
} as const;

/** @internal */
export type LookerWarehouseTargetConnectionType =
  (typeof LOOKER_DIALECT_TO_CONNECTION_TYPE)[keyof typeof LOOKER_DIALECT_TO_CONNECTION_TYPE];

export interface LookerConnectionMapping {
  lookerConnectionName: string;
  ktxConnectionId: string | null;
  lookerHost: string | null;
  lookerDatabase: string | null;
  lookerDialect: string | null;
}

export interface LookerTargetConnection {
  id: string;
  connection_type: string;
  connection_params?: Record<string, unknown> | null;
}

/** @internal */
export interface LookerMappingCandidateConnection extends LookerTargetConnection {}

export interface LookerMappingDrift {
  unmappedDiscovered: LookerWarehouseConnectionInfo[];
  staleMappings: Array<{ lookerConnectionName: string; reason: 'looker_connection_not_found' }>;
  inSync: Array<{ lookerConnectionName: string; ktxConnectionId: string }>;
}

export type LookerMappingValidationResult =
  | { ok: true }
  | { ok: false; errors: Array<{ key: string; reason: string }> };

export interface LookerTableIdentifierParseItem {
  key: string;
  sql_table_name: string;
  dialect: string;
}

type ParsedTargetTableFailureReason = Extract<ParsedTargetTable, { ok: false }>['reason'];

export interface LookerParsedIdentifier {
  ok: boolean;
  catalog?: string | null;
  schema?: string | null;
  name?: string | null;
  canonical_table?: string | null;
  reason?: ParsedTargetTableFailureReason | null;
  detail?: string | null;
}

export interface LookerTableIdentifierParser {
  parse(items: LookerTableIdentifierParseItem[]): Promise<Record<string, LookerParsedIdentifier>>;
}

export interface LookerMappingClient {
  listLookerConnections(): Promise<LookerWarehouseConnectionInfo[]>;
  listLookmlModels(): Promise<StagedLookmlModelsFile>;
  getExplore(modelName: string, exploreName: string): Promise<StagedExploreFile>;
}

const SQLGLOT_DIALECT_BY_CONNECTION_TYPE: Partial<Record<LookerWarehouseTargetConnectionType, string>> = {
  BIGQUERY: 'bigquery',
  SNOWFLAKE: 'snowflake',
  POSTGRESQL: 'postgres',
  MYSQL: 'mysql',
  SQLITE: 'sqlite',
  SQLSERVER: 'tsql',
  CLICKHOUSE: 'clickhouse',
};

export async function discoverLookerConnections(
  client: Pick<LookerMappingClient, 'listLookerConnections'>,
): Promise<LookerWarehouseConnectionInfo[]> {
  return client.listLookerConnections();
}

/** @internal */
export function lookerDialectToConnectionType(dialect: string | null): LookerWarehouseTargetConnectionType | null {
  if (!dialect) {
    return null;
  }
  return (
    LOOKER_DIALECT_TO_CONNECTION_TYPE[dialect.toLowerCase() as keyof typeof LOOKER_DIALECT_TO_CONNECTION_TYPE] ?? null
  );
}

/** @internal */
export function sqlglotDialectForConnectionType(connectionType: string): string | null {
  return SQLGLOT_DIALECT_BY_CONNECTION_TYPE[connectionType as LookerWarehouseTargetConnectionType] ?? null;
}

/** @internal */
export function validateLookerWarehouseTarget(connectionType: string): { ok: true } | { ok: false; reason: string } {
  return sqlglotDialectForConnectionType(connectionType)
    ? { ok: true }
    : {
        ok: false,
        reason: `Connection type ${connectionType} cannot be used as a Looker warehouse mapping target`,
      };
}

function extractWarehouseHost(params: unknown, connectionType: string): string | null {
  const record = isRecord(params) ? params : {};
  switch (connectionType) {
    case 'POSTGRESQL':
    case 'SQLSERVER':
    case 'MYSQL':
    case 'CLICKHOUSE':
      return readString(record, 'host');
    case 'SNOWFLAKE':
      return readString(record, 'account');
    default:
      return null;
  }
}

function extractWarehouseDatabase(params: unknown, connectionType: string): string | null {
  const record = isRecord(params) ? params : {};
  switch (connectionType) {
    case 'POSTGRESQL':
    case 'SQLSERVER':
    case 'MYSQL':
    case 'CLICKHOUSE':
    case 'SNOWFLAKE':
      return readString(record, 'database');
    case 'BIGQUERY':
      return readString(record, 'dataset_id');
    default:
      return null;
  }
}

function normalizeHost(value: string | null): string | null {
  return value ? value.toLowerCase().replace(/:\d+$/, '') : null;
}

function normalizeName(value: string | null): string | null {
  return value ? value.toLowerCase() : null;
}

/** @internal */
export function suggestKtxConnectionForLookerConnection(args: {
  lookerConnection: LookerWarehouseConnectionInfo;
  candidateConnections: LookerMappingCandidateConnection[];
}): string | null {
  const expectedType = lookerDialectToConnectionType(args.lookerConnection.dialect);
  if (!expectedType || !args.lookerConnection.host || !args.lookerConnection.database || !args.lookerConnection.dialect) {
    return null;
  }

  const matches = args.candidateConnections.filter((connection) => {
    if (connection.connection_type !== expectedType) {
      return false;
    }
    return (
      normalizeHost(extractWarehouseHost(connection.connection_params, connection.connection_type)) ===
        normalizeHost(args.lookerConnection.host) &&
      normalizeName(extractWarehouseDatabase(connection.connection_params, connection.connection_type)) ===
        normalizeName(args.lookerConnection.database)
    );
  });

  return matches.length === 1 ? matches[0].id : null;
}
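The suggestion logic above only proposes a KTX connection when exactly one candidate matches on type, normalized host, and normalized database. A minimal sketch of that rule follows, assuming a flattened, hypothetical `Candidate` shape in place of the real `connection_params` lookup; the names `Candidate` and `suggest` are illustrative and not part of the commit.

```typescript
// Hypothetical, flattened candidate shape (the real code reads host/database
// out of connection_params depending on the connection type).
interface Candidate {
  id: string;
  type: string;
  host: string | null;
  database: string | null;
}

// Lowercase and drop a trailing ":port", mirroring normalizeHost in the diff.
function normalizeHost(value: string | null): string | null {
  return value ? value.toLowerCase().replace(/:\d+$/, '') : null;
}

function suggest(
  looker: { type: string; host: string; database: string },
  candidates: Candidate[],
): string | null {
  const matches = candidates.filter(
    (c) =>
      c.type === looker.type &&
      normalizeHost(c.host) === normalizeHost(looker.host) &&
      (c.database ? c.database.toLowerCase() : null) === looker.database.toLowerCase(),
  );
  // Only an unambiguous single match is suggested; zero or several yield null.
  const [only] = matches;
  return matches.length === 1 && only ? only.id : null;
}

const candidates: Candidate[] = [
  { id: 'a', type: 'POSTGRESQL', host: 'DB.example.com:5432', database: 'Analytics' },
  { id: 'b', type: 'POSTGRESQL', host: 'other.example.com', database: 'analytics' },
];
const lookerSide = { type: 'POSTGRESQL', host: 'db.example.com', database: 'analytics' };

const unique = suggest(lookerSide, candidates);
const ambiguous = suggest(lookerSide, [
  ...candidates,
  { id: 'c', type: 'POSTGRESQL', host: 'db.example.com', database: 'analytics' },
]);
console.log(unique, ambiguous);
```

Returning `null` on ambiguity presumably leaves the choice to the caller rather than guessing between equally plausible warehouses.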
export function computeLookerMappingDrift(args: {
  storedMappings: LookerConnectionMapping[];
  discovered: LookerWarehouseConnectionInfo[];
}): LookerMappingDrift {
  const discoveredByName = new Map(args.discovered.map((connection) => [connection.name, connection]));
  const storedByName = new Map(args.storedMappings.map((mapping) => [mapping.lookerConnectionName, mapping]));

  return {
    unmappedDiscovered: args.discovered.filter((connection) => !storedByName.get(connection.name)?.ktxConnectionId),
    staleMappings: args.storedMappings
      .filter((mapping) => !discoveredByName.has(mapping.lookerConnectionName))
      .map((mapping) => ({
        lookerConnectionName: mapping.lookerConnectionName,
        reason: 'looker_connection_not_found' as const,
      })),
    inSync: args.storedMappings
      .filter((mapping) => discoveredByName.has(mapping.lookerConnectionName) && mapping.ktxConnectionId)
      .map((mapping) => ({
        lookerConnectionName: mapping.lookerConnectionName,
        ktxConnectionId: mapping.ktxConnectionId as string,
      })),
  };
}
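The drift computation above sorts connections into three buckets: discovered but unmapped, stored but no longer discovered, and in sync. The sketch below replays that classification on a small made-up data set, using simplified hypothetical types (`Mapping`, `Discovered`) and returning only connection names so the buckets are easy to read.

```typescript
// Simplified, hypothetical shapes; the real types carry host/database/dialect too.
interface Mapping {
  lookerConnectionName: string;
  ktxConnectionId: string | null;
}
interface Discovered {
  name: string;
}

function drift(stored: Mapping[], discovered: Discovered[]) {
  const discoveredNames = new Set(discovered.map((c) => c.name));
  const storedByName = new Map(stored.map((m) => [m.lookerConnectionName, m] as const));
  return {
    // Discovered in Looker, but no stored mapping with a KTX connection id.
    unmapped: discovered.filter((c) => !storedByName.get(c.name)?.ktxConnectionId).map((c) => c.name),
    // Stored locally, but Looker no longer reports the connection.
    stale: stored.filter((m) => !discoveredNames.has(m.lookerConnectionName)).map((m) => m.lookerConnectionName),
    // Present on both sides and mapped to a KTX connection.
    inSync: stored
      .filter((m) => discoveredNames.has(m.lookerConnectionName) && m.ktxConnectionId)
      .map((m) => m.lookerConnectionName),
  };
}

const result = drift(
  [
    { lookerConnectionName: 'bq', ktxConnectionId: 'warehouse' },
    { lookerConnectionName: 'old_pg', ktxConnectionId: 'pg' },
    { lookerConnectionName: 'snow', ktxConnectionId: null },
  ],
  [{ name: 'bq' }, { name: 'snow' }, { name: 'new_mysql' }],
);
console.log(result);
```

Note that a placeholder mapping with a `null` KTX connection id (`snow` here) counts as unmapped, not in sync.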
export function validateLookerMappings(args: {
  mappings: LookerConnectionMapping[];
  knownKtxConnectionIds: Set<string>;
  knownConnectionTypes: ReadonlyMap<string, string>;
}): LookerMappingValidationResult {
  const errors: Array<{ key: string; reason: string }> = [];
  for (const mapping of args.mappings) {
    if (!mapping.ktxConnectionId) {
      continue;
    }
    if (!args.knownKtxConnectionIds.has(mapping.ktxConnectionId)) {
      errors.push({
        key: mapping.lookerConnectionName,
        reason: `KTX connection ${mapping.ktxConnectionId} does not exist`,
      });
      continue;
    }
    const connectionType = args.knownConnectionTypes.get(mapping.ktxConnectionId);
    const validation = validateLookerWarehouseTarget(connectionType ?? 'unknown');
    if (!validation.ok) {
      errors.push({ key: mapping.lookerConnectionName, reason: validation.reason });
    }
  }
  return errors.length === 0 ? { ok: true } : { ok: false, errors };
}

/** @internal */
export function refreshLookerMappingPlaceholders(args: {
  stored: LookerConnectionMapping[];
  live: LookerWarehouseConnectionInfo[];
}): { mappings: LookerConnectionMapping[]; changed: boolean } {
  const byName = new Map(args.stored.map((mapping) => [mapping.lookerConnectionName, mapping]));
  let changed = false;

  for (const live of args.live) {
    const existing = byName.get(live.name);
    if (!existing) {
      byName.set(live.name, {
        lookerConnectionName: live.name,
        ktxConnectionId: null,
        lookerHost: live.host,
        lookerDatabase: live.database,
        lookerDialect: live.dialect,
      });
      changed = true;
      continue;
    }

    const refreshed: LookerConnectionMapping = {
      ...existing,
      lookerHost: live.host,
      lookerDatabase: live.database,
      lookerDialect: live.dialect,
    };
    if (
      refreshed.lookerHost !== existing.lookerHost ||
      refreshed.lookerDatabase !== existing.lookerDatabase ||
      refreshed.lookerDialect !== existing.lookerDialect
    ) {
      byName.set(live.name, refreshed);
      changed = true;
    }
  }

  return { mappings: [...byName.values()], changed };
}

/** @internal */
export function collectExploreParseItems(args: {
  explore: StagedExploreFile;
  connectionMappings: Record<string, string>;
  targetConnections: ReadonlyMap<string, Pick<LookerTargetConnection, 'id' | 'connection_type'>>;
}): { parsedTargetTables: Record<string, ParsedTargetTable>; parseItems: LookerTableIdentifierParseItem[] } {
  const parsedTargetTables: Record<string, ParsedTargetTable> = {};
  const parseItems: LookerTableIdentifierParseItem[] = [];
  const lookerConnectionName = args.explore.connectionName;
  const targetConnectionId = lookerConnectionName ? args.connectionMappings[lookerConnectionName] : undefined;

  if (!lookerConnectionName || !targetConnectionId) {
    return { parsedTargetTables, parseItems };
  }

  const targetConnection = args.targetConnections.get(targetConnectionId);
  const dialect = targetConnection ? sqlglotDialectForConnectionType(targetConnection.connection_type) : null;
  const key = `${args.explore.modelName}.${args.explore.exploreName}`;

  if (!dialect) {
    parsedTargetTables[key] = {
      ok: false,
      reason: 'unsupported_dialect',
      detail: `Connection type ${targetConnection?.connection_type ?? 'unknown'} does not map to a supported sqlglot dialect.`,
    };
    return { parsedTargetTables, parseItems };
  }

  if (args.explore.rawSqlTableName) {
    parseItems.push({ key, sql_table_name: args.explore.rawSqlTableName, dialect });
  }

  for (const join of args.explore.joins) {
    if (!join.rawSqlTableName) {
      continue;
    }
    parseItems.push({
      key: `${key}.${join.name}`,
      sql_table_name: join.rawSqlTableName,
      dialect,
    });
  }

  return { parsedTargetTables, parseItems };
}

/** @internal */
export function projectParsedIdentifier(row: LookerParsedIdentifier | undefined): ParsedTargetTable {
  if (!row) {
    return { ok: false, reason: 'parse_error', detail: 'Python parser response was missing this key.' };
  }
  if (row.ok && row.name && row.canonical_table) {
    return {
      ok: true,
      catalog: row.catalog ?? null,
      schema: row.schema ?? null,
      name: row.name,
      canonicalTable: row.canonical_table,
    };
  }
  return {
    ok: false,
    reason: row.reason ?? 'parse_error',
    detail: row.reason ? undefined : 'Python parser returned an invalid success row without name or canonical_table.',
  };
}
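The projection above has an edge case that is easy to miss: a parser row that claims `ok: true` but lacks `name` or `canonical_table` is downgraded to a `parse_error` failure. The sketch below reproduces the three outcomes with simplified, hypothetical types (`Row`, `Parsed`, with `reason` widened to `string`); it is an illustration of the rule, not the module's own code.

```typescript
// Hypothetical simplified types; the real failure reasons are a closed union.
interface Row {
  ok: boolean;
  catalog?: string | null;
  schema?: string | null;
  name?: string | null;
  canonical_table?: string | null;
  reason?: string | null;
}
type Parsed =
  | { ok: true; catalog: string | null; schema: string | null; name: string; canonicalTable: string }
  | { ok: false; reason: string; detail?: string };

function project(row: Row | undefined): Parsed {
  if (!row) {
    return { ok: false, reason: 'parse_error', detail: 'missing key' };
  }
  if (row.ok && row.name && row.canonical_table) {
    // snake_case wire field becomes camelCase in the projected shape.
    return {
      ok: true,
      catalog: row.catalog ?? null,
      schema: row.schema ?? null,
      name: row.name,
      canonicalTable: row.canonical_table,
    };
  }
  // Covers explicit failures and "success" rows that are missing required fields.
  return { ok: false, reason: row.reason ?? 'parse_error' };
}

const success = project({ ok: true, name: 'accounts', canonical_table: 'proj.analytics.accounts' });
const brokenSuccess = project({ ok: true, name: 'accounts' });
const missing = project(undefined);
const explicitFailure = project({ ok: false, reason: 'derived_table_not_supported' });
console.log(success, brokenSuccess, missing, explicitFailure);
```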
export async function buildLookerPullConfigFromInputs(args: {
  lookerConnectionId: string;
  cursors: LookerRuntimeCursors;
  refreshedMappings: LookerConnectionMapping[];
  targetConnections: ReadonlyMap<string, Pick<LookerTargetConnection, 'id' | 'connection_type'>>;
  client: Pick<LookerMappingClient, 'listLookmlModels' | 'getExplore'>;
  parser: LookerTableIdentifierParser;
}): Promise<LookerPullConfig> {
  const connectionMappings: Record<string, string> = {};
  const connectionTypes: Record<string, LookerWarehouseTargetConnectionType> = {};

  for (const mapping of args.refreshedMappings) {
    if (!mapping.ktxConnectionId) {
      continue;
    }
    const target = args.targetConnections.get(mapping.ktxConnectionId);
    if (!target || !validateLookerWarehouseTarget(target.connection_type).ok) {
      continue;
    }
    connectionMappings[mapping.lookerConnectionName] = mapping.ktxConnectionId;
    connectionTypes[mapping.lookerConnectionName] = target.connection_type as LookerWarehouseTargetConnectionType;
  }

  const parsedTargetTables = await parseExploreTargets({
    client: args.client,
    connectionMappings,
    targetConnections: args.targetConnections,
    parser: args.parser,
  });

  return {
    lookerConnectionId: args.lookerConnectionId,
    dashboardUpdatedSince: args.cursors.dashboardsLastSyncedAt,
    lookUpdatedSince: args.cursors.looksLastSyncedAt,
    connectionMappings,
    connectionTypes,
    parsedTargetTables,
  };
}

async function parseExploreTargets(args: {
  client: Pick<LookerMappingClient, 'listLookmlModels' | 'getExplore'>;
  connectionMappings: Record<string, string>;
  targetConnections: ReadonlyMap<string, Pick<LookerTargetConnection, 'id' | 'connection_type'>>;
  parser: LookerTableIdentifierParser;
}): Promise<Record<string, ParsedTargetTable>> {
  const parsedTargetTables: Record<string, ParsedTargetTable> = {};
  const parseItems: LookerTableIdentifierParseItem[] = [];

  let models: StagedLookmlModelsFile;
  try {
    models = await args.client.listLookmlModels();
  } catch {
    return parsedTargetTables;
  }

  for (const model of models.models) {
    for (const exploreRef of model.explores) {
      let explore: StagedExploreFile;
      try {
        explore = await args.client.getExplore(model.name, exploreRef.name);
      } catch {
        continue;
      }
      const collected = collectExploreParseItems({
        explore,
        connectionMappings: args.connectionMappings,
        targetConnections: args.targetConnections,
      });
      Object.assign(parsedTargetTables, collected.parsedTargetTables);
      parseItems.push(...collected.parseItems);
    }
  }

  if (parseItems.length === 0) {
    return parsedTargetTables;
  }

  let results: Record<string, LookerParsedIdentifier>;
  try {
    results = await args.parser.parse(parseItems);
  } catch {
    for (const item of parseItems) {
      parsedTargetTables[item.key] = {
        ok: false,
        reason: 'parse_error',
        detail: 'Python parse-table-identifier failed during Looker pull-config projection.',
      };
    }
    return parsedTargetTables;
  }

  for (const item of parseItems) {
    parsedTargetTables[item.key] = projectParsedIdentifier(results[item.key]);
  }
  return parsedTargetTables;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

function readString(record: Record<string, unknown>, key: string): string | null {
  const value = record[key];
  return typeof value === 'string' ? value : null;
}
@ -0,0 +1,13 @@
import { describe, expect, it } from 'vitest';
import { buildLookerReconcileNotes } from './reconcile.js';

describe('buildLookerReconcileNotes', () => {
  it('instructs reconciliation to record subsumed provenance', () => {
    expect(buildLookerReconcileNotes()).toEqual([
      [
        'Looker runtime API-derived SL sources use looker__<model>__<explore>.',
        'If the unprefixed file-adapter source <model>__<explore> exists, prefer it in wiki sl_refs, delete or avoid the API-derived source, and call emit_artifact_resolution with actionType="subsumed" for the API raw explore path.',
      ].join(' '),
    ]);
  });
});
@ -0,0 +1,8 @@
export function buildLookerReconcileNotes(): string[] {
  return [
    [
      'Looker runtime API-derived SL sources use looker__<model>__<explore>.',
      'If the unprefixed file-adapter source <model>__<explore> exists, prefer it in wiki sl_refs, delete or avoid the API-derived source, and call emit_artifact_resolution with actionType="subsumed" for the API raw explore path.',
    ].join(' '),
  ];
}
101
packages/cli/src/context/ingest/adapters/looker/scope.test.ts
Normal file
@ -0,0 +1,101 @@
import { mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { describeLookerScope, hashLookerScope, isPathInLookerScope } from './scope.js';

async function writeJson(stagedDir: string, relPath: string, value: unknown): Promise<void> {
  const abs = join(stagedDir, relPath);
  await mkdir(join(abs, '..'), { recursive: true });
  await writeFile(abs, `${JSON.stringify(value, null, 2)}\n`, 'utf-8');
}

describe('Looker runtime fetch scope', () => {
  let stagedDir: string;

  beforeEach(async () => {
    stagedDir = await mkdtemp(join(tmpdir(), 'looker-scope-'));
  });

  afterEach(async () => {
    await rm(stagedDir, { recursive: true, force: true });
  });

  it('keeps omitted known-current entity files out of the deletion baseline', () => {
    const scope = {
      mode: 'incremental' as const,
      knownCurrentRawPaths: ['dashboards/10.json', 'dashboards/11.json', 'looks/20.json'],
      fetchedRawPaths: ['dashboards/11.json'],
    };

    expect(isPathInLookerScope('dashboards/10.json', scope)).toBe(false);
    expect(isPathInLookerScope('looks/20.json', scope)).toBe(false);
    expect(isPathInLookerScope('dashboards/11.json', scope)).toBe(true);
    expect(isPathInLookerScope('looks/21.json', scope)).toBe(true);
    expect(isPathInLookerScope('signals/dashboard_usage.json', scope)).toBe(true);
    expect(isPathInLookerScope('explores/b2b/sales_pipeline.json', scope)).toBe(true);
  });

  it('keeps omitted unchanged evidence documents out of incremental delete scope', () => {
    const scope = {
      mode: 'incremental' as const,
      knownCurrentRawPaths: ['dashboards/10.json', 'looks/20.json'],
      fetchedRawPaths: ['dashboards/10.json'],
    };

    expect(isPathInLookerScope('evidence/dashboards/10/page.md', scope)).toBe(true);
    expect(isPathInLookerScope('evidence/dashboards/10/metadata.json', scope)).toBe(true);
    expect(isPathInLookerScope('evidence/looks/20/page.md', scope)).toBe(false);
    expect(isPathInLookerScope('evidence/looks/20/metadata.json', scope)).toBe(false);
  });

  it('treats full scope as all raw paths in scope', () => {
    const scope = {
      mode: 'full' as const,
      knownCurrentRawPaths: ['dashboards/10.json'],
      fetchedRawPaths: ['dashboards/10.json'],
    };

    expect(isPathInLookerScope('dashboards/10.json', scope)).toBe(true);
    expect(isPathInLookerScope('dashboards/99.json', scope)).toBe(true);
    expect(isPathInLookerScope('looks/20.json', scope)).toBe(true);
  });

  it('hashes scope order-insensitively', () => {
    const a = hashLookerScope({
      mode: 'incremental',
      knownCurrentRawPaths: ['looks/20.json', 'dashboards/10.json'],
      fetchedRawPaths: ['dashboards/10.json'],
    });
    const b = hashLookerScope({
      mode: 'incremental',
      knownCurrentRawPaths: ['dashboards/10.json', 'looks/20.json'],
      fetchedRawPaths: ['dashboards/10.json'],
    });

    expect(a).toBe(b);
    expect(a).toMatch(/^[0-9a-f]{64}$/);
  });

  it('reads staged scope and returns a SourceAdapter ScopeDescriptor', async () => {
    await writeJson(stagedDir, 'looker-scope.json', {
      mode: 'incremental',
      knownCurrentRawPaths: ['dashboards/10.json', 'looks/20.json'],
      fetchedRawPaths: ['dashboards/10.json'],
    });

    const descriptor = await describeLookerScope(stagedDir);

    expect(descriptor.fingerprint).toMatch(/^[0-9a-f]{64}$/);
    expect(descriptor.isPathInScope('dashboards/10.json')).toBe(true);
    expect(descriptor.isPathInScope('looks/20.json')).toBe(false);
    expect(descriptor.isPathInScope('looks/99.json')).toBe(true);
  });

  it('falls back to full scope when old fixtures do not have a scope file', async () => {
    const descriptor = await describeLookerScope(stagedDir);

    expect(descriptor.isPathInScope('dashboards/10.json')).toBe(true);
    expect(descriptor.isPathInScope('looks/20.json')).toBe(true);
  });
});
65
packages/cli/src/context/ingest/adapters/looker/scope.ts
Normal file
@ -0,0 +1,65 @@
import { createHash } from 'node:crypto';
import { readFile } from 'node:fs/promises';
import { join } from 'node:path';
import type { ScopeDescriptor } from '../../types.js';
import { STAGED_FILES, type StagedLookerScopeFile, stagedLookerScopeFileSchema } from './types.js';

const LOOKER_ENTITY_PATH_RE = /^(dashboards|looks)\/[^/]+\.json$/;
const LOOKER_EVIDENCE_ENTITY_PATH_RE = /^evidence\/(dashboards|looks)\/([^/]+)\/(?:metadata\.json|page\.md)$/;

export async function describeLookerScope(stagedDir: string): Promise<ScopeDescriptor> {
  const scope = await readLookerScope(stagedDir);
  return {
    fingerprint: hashLookerScope(scope),
    isPathInScope: (rawPath) => isPathInLookerScope(rawPath, scope),
  };
}

async function readLookerScope(stagedDir: string): Promise<StagedLookerScopeFile> {
  try {
    const body = await readFile(join(stagedDir, STAGED_FILES.scope), 'utf-8');
    return stagedLookerScopeFileSchema.parse(JSON.parse(body));
  } catch (error) {
    if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
      return { mode: 'full', knownCurrentRawPaths: [], fetchedRawPaths: [] };
    }
    throw error;
  }
}

/** @internal */
export function hashLookerScope(scope: StagedLookerScopeFile): string {
  const canonical = JSON.stringify({
    mode: scope.mode,
    knownCurrentRawPaths: [...scope.knownCurrentRawPaths].sort(),
    fetchedRawPaths: [...scope.fetchedRawPaths].sort(),
  });
  return createHash('sha256').update(canonical).digest('hex');
}
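The fingerprint above is order-insensitive because both path lists are copied and sorted before serialization. A small sketch of that canonicalize-then-hash pattern follows, using a hypothetical `Scope` type and `fingerprint` helper; it also checks two properties the diff's tests do not state explicitly: the caller's arrays are left untouched, and a different `mode` yields a different hash.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical stand-in for the staged scope file shape.
interface Scope {
  mode: 'full' | 'incremental';
  knownCurrentRawPaths: string[];
  fetchedRawPaths: string[];
}

function fingerprint(scope: Scope): string {
  // Fixed key order plus sorted copies gives one canonical string per logical scope.
  const canonical = JSON.stringify({
    mode: scope.mode,
    knownCurrentRawPaths: [...scope.knownCurrentRawPaths].sort(),
    fetchedRawPaths: [...scope.fetchedRawPaths].sort(),
  });
  return createHash('sha256').update(canonical).digest('hex');
}

const unsorted: Scope = {
  mode: 'incremental',
  knownCurrentRawPaths: ['looks/20.json', 'dashboards/10.json'],
  fetchedRawPaths: ['dashboards/10.json'],
};
const sorted: Scope = {
  mode: 'incremental',
  knownCurrentRawPaths: ['dashboards/10.json', 'looks/20.json'],
  fetchedRawPaths: ['dashboards/10.json'],
};

const sameHash = fingerprint(unsorted) === fingerprint(sorted);
const modeChangesHash = fingerprint(sorted) !== fingerprint({ ...sorted, mode: 'full' });
// The spread copy means sort() never reorders the caller's array.
const inputUntouched = unsorted.knownCurrentRawPaths[0] === 'looks/20.json';
console.log(sameHash, modeChangesHash, inputUntouched);
```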
/** @internal */
export function isPathInLookerScope(rawPath: string, scope: StagedLookerScopeFile): boolean {
  if (scope.mode === 'full') {
    return true;
  }

  const entityRawPath = scopedEntityRawPath(rawPath);
  if (!entityRawPath) {
    return true;
  }

  const knownCurrent = new Set(scope.knownCurrentRawPaths);
  const fetched = new Set(scope.fetchedRawPaths);
  return fetched.has(entityRawPath) || !knownCurrent.has(entityRawPath);
}

function scopedEntityRawPath(rawPath: string): string | null {
  if (LOOKER_ENTITY_PATH_RE.test(rawPath)) {
    return rawPath;
  }
  const evidence = LOOKER_EVIDENCE_ENTITY_PATH_RE.exec(rawPath);
  if (evidence) {
    return `${evidence[1]}/${evidence[2]}.json`;
  }
  return null;
}
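In incremental mode, the scope check above reduces to one rule per entity file: a path is in scope if its owning entity was fetched this run, or if the entity is not known to be current. Evidence documents inherit the decision of the dashboard or Look they belong to, and everything else is always in scope. The sketch below walks a few sample paths through that rule; `Scope`, `owningEntity`, and `inScope` are illustrative names, and arrays are used instead of sets for brevity.

```typescript
// Hypothetical stand-in for the staged scope file shape.
interface Scope {
  mode: 'full' | 'incremental';
  knownCurrentRawPaths: string[];
  fetchedRawPaths: string[];
}

const ENTITY_RE = /^(dashboards|looks)\/[^/]+\.json$/;
const EVIDENCE_RE = /^evidence\/(dashboards|looks)\/([^/]+)\/(?:metadata\.json|page\.md)$/;

// Map a raw path to the entity file that owns it, or null for non-entity paths.
function owningEntity(rawPath: string): string | null {
  if (ENTITY_RE.test(rawPath)) {
    return rawPath;
  }
  const match = EVIDENCE_RE.exec(rawPath);
  return match ? `${match[1]}/${match[2]}.json` : null;
}

function inScope(rawPath: string, scope: Scope): boolean {
  if (scope.mode === 'full') {
    return true;
  }
  const entity = owningEntity(rawPath);
  if (!entity) {
    return true; // signals, explores, and other files are always in scope
  }
  return scope.fetchedRawPaths.includes(entity) || !scope.knownCurrentRawPaths.includes(entity);
}

const scope: Scope = {
  mode: 'incremental',
  knownCurrentRawPaths: ['dashboards/10.json', 'dashboards/11.json', 'looks/20.json'],
  fetchedRawPaths: ['dashboards/11.json'],
};
const paths = [
  'dashboards/10.json', // known current, not fetched
  'dashboards/11.json', // fetched this run
  'evidence/looks/20/page.md', // evidence for a known-current, unfetched Look
  'looks/99.json', // not known current
  'signals/dashboard_usage.json', // not an entity path
];
const decisions = paths.map((path) => inScope(path, scope));
console.log(decisions);
```

Keeping known-current but unfetched entities out of scope is what prevents an incremental run from treating unchanged dashboards and Looks as deleted.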
@ -0,0 +1,86 @@
import { mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { listLookerTargetConnectionIds } from './target-connections.js';

describe('listLookerTargetConnectionIds', () => {
  let stagedDir: string;

  beforeEach(async () => {
    stagedDir = await mkdtemp(join(tmpdir(), 'looker-targets-'));
  });

  afterEach(async () => {
    await rm(stagedDir, { recursive: true, force: true });
  });

  it('collects unique target warehouse IDs from explores, dashboard queries, and Look queries', async () => {
    await mkdir(join(stagedDir, 'explores', 'b2b'), { recursive: true });
    await mkdir(join(stagedDir, 'dashboards'), { recursive: true });
    await mkdir(join(stagedDir, 'looks'), { recursive: true });

    await writeFile(
      join(stagedDir, 'explores', 'b2b', 'sales_pipeline.json'),
      JSON.stringify({
        modelName: 'b2b',
        exploreName: 'sales_pipeline',
        label: null,
        description: null,
        fields: { dimensions: [], measures: [] },
        joins: [],
        targetWarehouseConnectionId: '22222222-2222-4222-8222-222222222222',
      }),
    );
    await writeFile(
      join(stagedDir, 'dashboards', '1.json'),
      JSON.stringify({
        lookerId: '1',
        title: 'Pipeline',
        description: null,
        folderId: null,
        ownerId: null,
        updatedAt: null,
        tiles: [
          {
            id: '11',
            title: 'ARR',
            lookId: null,
            query: {
              model: 'b2b',
              view: 'sales_pipeline',
              fields: [],
              filters: {},
              sorts: [],
              targetWarehouseConnectionId: '33333333-3333-4333-8333-333333333333',
            },
          },
        ],
      }),
    );
    await writeFile(
      join(stagedDir, 'looks', '2.json'),
      JSON.stringify({
        lookerId: '2',
        title: 'Customers',
        description: null,
        folderId: null,
        ownerId: null,
        updatedAt: null,
        query: {
          model: 'b2b',
          view: 'sales_pipeline',
          fields: [],
          filters: {},
          sorts: [],
          targetWarehouseConnectionId: '22222222-2222-4222-8222-222222222222',
        },
      }),
    );

    await expect(listLookerTargetConnectionIds(stagedDir)).resolves.toEqual([
      '22222222-2222-4222-8222-222222222222',
      '33333333-3333-4333-8333-333333333333',
    ]);
  });
});
@ -0,0 +1,41 @@
|
|||
import { readdir, readFile } from 'node:fs/promises';
|
||||
import { join, relative } from 'node:path';
|
||||
import { stagedDashboardFileSchema, stagedExploreFileSchema, stagedLookFileSchema } from './types.js';
|
||||
|
||||
async function walk(root: string): Promise<string[]> {
|
||||
const entries = await readdir(root, { withFileTypes: true, recursive: true });
|
||||
return entries
|
||||
.filter((entry) => entry.isFile())
|
||||
.map((entry) => relative(root, join(entry.parentPath, entry.name)).replace(/\\/g, '/'))
|
||||
.sort();
|
||||
}
|
||||
|
||||
function addTarget(targets: Set<string>, value: string | null | undefined): void {
|
||||
if (value) {
|
||||
targets.add(value);
|
||||
}
|
||||
}
|
||||
|
||||
export async function listLookerTargetConnectionIds(stagedDir: string): Promise<string[]> {
|
||||
const targets = new Set<string>();
|
||||
for (const path of await walk(stagedDir)) {
|
||||
const fullPath = join(stagedDir, path);
|
||||
if (/^explores\/[^/]+\/[^/]+\.json$/.test(path)) {
|
||||
const explore = stagedExploreFileSchema.parse(JSON.parse(await readFile(fullPath, 'utf-8')));
|
||||
addTarget(targets, explore.targetWarehouseConnectionId);
|
||||
continue;
|
||||
}
|
||||
if (/^dashboards\/[^/]+\.json$/.test(path)) {
|
||||
const dashboard = stagedDashboardFileSchema.parse(JSON.parse(await readFile(fullPath, 'utf-8')));
|
||||
for (const tile of dashboard.tiles) {
|
||||
addTarget(targets, tile.query?.targetWarehouseConnectionId);
|
||||
}
|
||||
continue;
|
||||
}
|
||||
if (/^looks\/[^/]+\.json$/.test(path)) {
|
||||
const look = stagedLookFileSchema.parse(JSON.parse(await readFile(fullPath, 'utf-8')));
|
||||
addTarget(targets, look.query?.targetWarehouseConnectionId);
|
||||
}
|
||||
}
|
||||
return [...targets].sort();
|
||||
}
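The path matching and de-duplication in `listLookerTargetConnectionIds` above can be sketched without touching the filesystem. This is a minimal illustration, not the adapter's code: `classifyStagedPath` and `collectTargets` are hypothetical names introduced here, and only the three regular expressions and the "skip empty, de-duplicate, sort" behaviour are taken from the file above.

```typescript
// Hypothetical helper (not in the diff): classifies a path relative to the staging root
// using the same three patterns as listLookerTargetConnectionIds.
type StagedKind = 'explore' | 'dashboard' | 'look' | 'other';

function classifyStagedPath(path: string): StagedKind {
  // explores/<model>/<explore>.json
  if (/^explores\/[^/]+\/[^/]+\.json$/.test(path)) return 'explore';
  // dashboards/<id>.json
  if (/^dashboards\/[^/]+\.json$/.test(path)) return 'dashboard';
  // looks/<id>.json
  if (/^looks\/[^/]+\.json$/.test(path)) return 'look';
  return 'other';
}

// Hypothetical helper (not in the diff): mirrors addTarget plus the final sort.
// Null, undefined, and empty strings are ignored; duplicates collapse in the Set.
function collectTargets(values: Array<string | null | undefined>): string[] {
  const targets = new Set<string>();
  for (const value of values) {
    if (value) {
      targets.add(value);
    }
  }
  return [...targets].sort();
}

console.log(classifyStagedPath('explores/b2b/sales_pipeline.json'));
console.log(classifyStagedPath('dashboards/nested/1.json'));
console.log(collectTargets(['3333', null, '2222', '3333', undefined, '']));
```

Because the patterns are anchored and allow no extra `/`, files nested deeper than expected fall through as `other` and contribute no connection ids.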
|
||||
|
|
@ -0,0 +1,243 @@
|
|||
import { describe, expect, it } from 'vitest';
|
||||
import type { ToolOutput } from '../../../../../context/tools/base-tool.js';
|
||||
import { buildLookerSlProposal, createLookerQueryToSlTool, type LookerSlProposal } from './looker-query-to-sl.tool.js';
|
||||
|
||||
describe('buildLookerSlProposal', () => {
|
||||
it('suggests a measure and segment for an aggregated filtered Looker query', () => {
|
||||
const proposal = buildLookerSlProposal({
|
||||
contentTitle: 'Open Pipeline ARR',
|
||||
contentType: 'look',
|
||||
usage: { queryCount30d: 42, uniqueUsers30d: 7 },
|
||||
query: {
|
||||
model: 'b2b',
|
||||
view: 'sales_pipeline',
|
||||
fields: ['opportunities.arr', 'opportunities.stage'],
|
||||
filters: { 'opportunities.stage': 'open' },
|
||||
sorts: ['opportunities.arr desc'],
|
||||
limit: '500',
|
||||
},
|
||||
});
|
||||
|
||||
expect(proposal.sourceName).toBe('looker__b2b__sales_pipeline');
|
||||
expect(proposal.triageLane).toBe('full');
|
||||
expect(proposal.decision).toBe('measure_added');
|
||||
expect(proposal.measures).toEqual([
|
||||
{
|
||||
name: 'arr',
|
||||
lookerField: 'opportunities.arr',
|
||||
expr: 'sum(opportunities.arr)',
|
||||
description: 'Suggested from Looker look "Open Pipeline ARR"; verify against explore field SQL before writing.',
|
||||
},
|
||||
]);
|
||||
expect(proposal.dimensions).toEqual([{ name: 'stage', lookerField: 'opportunities.stage' }]);
|
||||
expect(proposal.segments).toEqual([
|
||||
{
|
||||
name: 'open_pipeline_arr',
|
||||
filters: { 'opportunities.stage': 'open' },
|
||||
suggestedPredicate: "opportunities.stage = 'open'",
|
||||
description: 'Reusable filter candidate from Looker look "Open Pipeline ARR".',
|
||||
},
|
||||
]);
|
||||
expect(proposal.notes).toContain(
|
||||
'Usage signals can raise priority, but query counts, users, owners, and folders must not be written as wiki narrative.',
|
||||
);
|
||||
});
|
||||
|
||||
it('keeps simple saved views as wiki-only candidates', () => {
|
||||
const proposal = buildLookerSlProposal({
|
||||
contentTitle: 'Accounts By Region',
|
||||
query: {
|
||||
model: 'b2b',
|
||||
view: 'accounts',
|
||||
fields: ['accounts.region', 'accounts.segment'],
|
||||
filters: {},
|
||||
},
|
||||
});
|
||||
|
||||
expect(proposal.sourceName).toBe('looker__b2b__accounts');
|
||||
expect(proposal.triageLane).toBe('light');
|
||||
expect(proposal.decision).toBe('wiki_only');
|
||||
expect(proposal.measures).toEqual([]);
|
||||
expect(proposal.dimensions).toEqual([
|
||||
{ name: 'region', lookerField: 'accounts.region' },
|
||||
{ name: 'segment', lookerField: 'accounts.segment' },
|
||||
]);
|
||||
expect(proposal.segments).toEqual([]);
|
||||
});
|
||||
|
||||
it('promotes high-usage filter-only queries as derived-source candidates', () => {
|
||||
const proposal = buildLookerSlProposal({
|
||||
contentTitle: 'Active Customers',
|
||||
usage: { queryCount30d: 15, uniqueUsers30d: 4 },
|
||||
query: {
|
||||
model: 'b2b',
|
||||
view: 'customers',
|
||||
fields: ['customers.id', 'customers.name'],
|
||||
filters: { 'customers.status': 'active', 'customers.is_test': '-yes' },
|
||||
},
|
||||
});
|
||||
|
||||
expect(proposal.sourceName).toBe('looker__b2b__customers');
|
||||
expect(proposal.decision).toBe('source_created');
|
||||
expect(proposal.segments).toEqual([
|
||||
{
|
||||
name: 'active_customers',
|
||||
filters: { 'customers.status': 'active', 'customers.is_test': '-yes' },
|
||||
suggestedPredicate: "customers.status = 'active' AND customers.is_test != 'yes'",
|
||||
description: 'Reusable filter candidate from Looker look "Active Customers".',
|
||||
},
|
||||
]);
|
||||
});
|
||||
|
||||
it('surfaces mapped warehouse target metadata for direct SL writes', () => {
|
||||
const proposal = buildLookerSlProposal({
|
||||
contentTitle: 'Open Pipeline ARR',
|
||||
contentType: 'dashboard_tile',
|
||||
usage: { queryCount30d: 42, uniqueUsers30d: 7 },
|
||||
query: {
|
||||
model: 'b2b',
|
||||
view: 'sales_pipeline',
|
||||
fields: ['opportunities.arr', 'opportunities.stage'],
|
||||
filters: { 'opportunities.stage': 'open' },
|
||||
targetWarehouseConnectionId: '22222222-2222-4222-8222-222222222222',
|
||||
targetTable: {
|
||||
ok: true,
|
||||
catalog: 'proj',
|
||||
schema: 'dataset',
|
||||
name: 'opportunities',
|
||||
canonicalTable: 'proj.dataset.opportunities',
|
||||
},
|
||||
},
|
||||
});
|
||||
|
||||
expect(proposal.sourceName).toBe('looker__b2b__sales_pipeline');
|
||||
expect(proposal.targetStatus).toBe('mapped');
|
||||
expect(proposal.targetWarehouseConnectionId).toBe('22222222-2222-4222-8222-222222222222');
|
||||
expect(proposal.sourceTable).toBe('proj.dataset.opportunities');
|
||||
expect(proposal.canWriteStandaloneSource).toBe(true);
|
||||
expect(proposal.targetTable).toEqual({
|
||||
ok: true,
|
||||
catalog: 'proj',
|
||||
schema: 'dataset',
|
||||
name: 'opportunities',
|
||||
canonicalTable: 'proj.dataset.opportunities',
|
||||
});
|
||||
expect(proposal.notes).toContain(
|
||||
'targetTable.ok is true: write or edit SL on targetWarehouseConnectionId using targetTable.canonicalTable as source.table.',
|
||||
);
|
||||
});
|
||||
|
||||
it('surfaces unmapped and unparseable target reasons for wiki-only fallback', () => {
|
||||
const unmapped = buildLookerSlProposal({
|
||||
contentTitle: 'Revenue Trend',
|
||||
query: {
|
||||
model: 'b2b',
|
||||
view: 'revenue',
|
||||
fields: ['revenue.arr'],
|
||||
filters: {},
|
||||
targetWarehouseConnectionId: null,
|
||||
targetTable: {
|
||||
ok: false,
|
||||
reason: 'no_connection_mapping',
|
||||
},
|
||||
},
|
||||
});
|
||||
|
||||
expect(unmapped.targetStatus).toBe('unmapped');
|
||||
expect(unmapped.targetWarehouseConnectionId).toBeNull();
|
||||
expect(unmapped.sourceTable).toBeNull();
|
||||
expect(unmapped.canWriteStandaloneSource).toBe(false);
|
||||
expect(unmapped.notes).toContain(
|
||||
'targetTable.ok is false (no_connection_mapping): keep this query wiki-only and pass the reason through emit_unmapped_fallback.',
|
||||
);
|
||||
|
||||
const unparseable = buildLookerSlProposal({
|
||||
contentTitle: 'Templated Source',
|
||||
query: {
|
||||
model: 'b2b',
|
||||
view: 'templated',
|
||||
fields: ['templated.count'],
|
||||
filters: {},
|
||||
targetWarehouseConnectionId: '22222222-2222-4222-8222-222222222222',
|
||||
targetTable: {
|
||||
ok: false,
|
||||
reason: 'looker_template_unresolved',
|
||||
detail: 'The sql_table_name contains ${derived.SQL_TABLE_NAME}.',
|
||||
},
|
||||
},
|
||||
});
|
||||
|
||||
expect(unparseable.targetStatus).toBe('unparseable');
|
||||
expect(unparseable.targetWarehouseConnectionId).toBe('22222222-2222-4222-8222-222222222222');
|
||||
expect(unparseable.sourceTable).toBeNull();
|
||||
expect(unparseable.canWriteStandaloneSource).toBe(false);
|
||||
expect(unparseable.notes).toContain(
|
||||
'targetTable.ok is false (looker_template_unresolved): keep this query wiki-only and pass the reason through emit_unmapped_fallback.',
|
||||
);
|
||||
});
|
||||
});
|
||||
|
||||
describe('createLookerQueryToSlTool', () => {
|
||||
it('returns markdown plus the structured proposal', async () => {
|
||||
const lookerQueryToSl = createLookerQueryToSlTool();
|
||||
if (!lookerQueryToSl.execute) {
|
||||
throw new Error('looker_query_to_sl tool must be executable');
|
||||
}
|
||||
const output = (await lookerQueryToSl.execute(
|
||||
{
|
||||
contentTitle: 'Revenue Trend',
|
||||
contentType: 'dashboard_tile',
|
||||
query: {
|
||||
model: 'finance',
|
||||
view: 'orders',
|
||||
fields: ['orders.total_revenue', 'orders.created_month'],
|
||||
filters: { 'orders.status': 'paid' },
|
||||
sorts: [],
|
||||
targetWarehouseConnectionId: null,
|
||||
targetTable: null,
|
||||
},
|
||||
},
|
||||
{ toolCallId: 'call-1', messages: [] } as never,
|
||||
)) as ToolOutput<LookerSlProposal>;
|
||||
|
||||
expect(output.markdown).toContain('Looker query SL proposal');
|
||||
expect(output.markdown).toContain('looker__finance__orders');
|
||||
expect(output.structured.sourceName).toBe('looker__finance__orders');
|
||||
expect(output.structured.measures[0]?.name).toBe('total_revenue');
|
||||
});
|
||||
|
||||
it('prints target connection and canonical table in markdown output', async () => {
|
||||
const lookerQueryToSl = createLookerQueryToSlTool();
|
||||
if (!lookerQueryToSl.execute) {
|
||||
throw new Error('looker_query_to_sl tool must be executable');
|
||||
}
|
||||
|
||||
const output = (await lookerQueryToSl.execute(
|
||||
{
|
||||
contentTitle: 'Revenue Trend',
|
||||
contentType: 'dashboard_tile',
|
||||
query: {
|
||||
model: 'finance',
|
||||
view: 'orders',
|
||||
fields: ['orders.total_revenue', 'orders.created_month'],
|
||||
filters: { 'orders.status': 'paid' },
|
||||
sorts: [],
|
||||
targetWarehouseConnectionId: '33333333-3333-4333-8333-333333333333',
|
||||
targetTable: {
|
||||
ok: true,
|
||||
catalog: 'proj',
|
||||
schema: 'finance',
|
||||
name: 'orders',
|
||||
canonicalTable: 'proj.finance.orders',
|
||||
},
|
||||
},
|
||||
},
|
||||
{ toolCallId: 'call-1', messages: [] } as never,
|
||||
)) as ToolOutput<LookerSlProposal>;
|
||||
|
||||
expect(output.markdown).toContain('- targetStatus: mapped');
|
||||
expect(output.markdown).toContain('- targetWarehouseConnectionId: 33333333-3333-4333-8333-333333333333');
|
||||
expect(output.markdown).toContain('- sourceTable: proj.finance.orders');
|
||||
expect(output.structured.canWriteStandaloneSource).toBe(true);
|
||||
});
|
||||
});
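The segment assertions in the tests above (for example `'-yes'` becoming `!= 'yes'`) depend on how a Looker filter value is turned into a suggested predicate. The sketch below condenses that translation into a standalone form so the cases can be tried in isolation. The name `toPredicate` is hypothetical; the branch order follows `filterValueToPredicate` in `looker-query-to-sl.tool.ts` from this same diff, and the output is only a suggestion to verify, not a full Looker filter-expression parser.

```typescript
// Quotes a value as a SQL string literal, doubling embedded single quotes.
function sqlLiteral(value: unknown): string {
  if (typeof value === 'number' || typeof value === 'boolean') {
    return String(value);
  }
  return `'${String(value).replace(/'/g, "''")}'`;
}

// Hypothetical standalone version of the filter-to-predicate translation.
function toPredicate(field: string, value: unknown): string {
  // Arrays become IN lists.
  if (Array.isArray(value)) {
    return `${field} IN (${value.map((item) => sqlLiteral(item)).join(', ')})`;
  }
  // Numbers and booleans are compared unquoted.
  if (typeof value === 'number' || typeof value === 'boolean') {
    return `${field} = ${String(value)}`;
  }
  const raw = String(value).trim();
  // Unquoted comma-separated strings become IN lists.
  if (raw.includes(',') && !raw.includes('"') && !raw.includes("'")) {
    const parts = raw.split(',').map((part) => sqlLiteral(part.trim()));
    return `${field} IN (${parts.join(', ')})`;
  }
  // A leading "-" is read as negation.
  if (raw.startsWith('-') && raw.length > 1) {
    return `${field} != ${sqlLiteral(raw.slice(1).trim())}`;
  }
  // A "%" wildcard suggests LIKE.
  if (raw.includes('%')) {
    return `${field} LIKE ${sqlLiteral(raw)}`;
  }
  return `${field} = ${sqlLiteral(raw)}`;
}

console.log(toPredicate('opportunities.stage', 'open'));
console.log(toPredicate('customers.is_test', '-yes'));
console.log(toPredicate('orders.status', 'paid,shipped'));
```

One consequence of the branch order worth keeping in mind: a comma-separated negated list such as `-a,-b` reaches the IN branch before the negation branch, so the leading dashes end up inside the literals. That is one reason the proposal labels the predicate as suggested.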
|
||||
|
|
@ -0,0 +1,307 @@
|
|||
import { tool } from 'ai';
|
||||
import { z } from 'zod';
|
||||
import type { ToolOutput } from '../../../../../context/tools/base-tool.js';
|
||||
import type { ParsedTargetTable } from '../../../parsed-target-table.js';
|
||||
import { stagedLookerQuerySchema } from '../types.js';
|
||||
|
||||
const lookerUsageInputSchema = z.object({
|
||||
queryCount30d: z.number().int().nonnegative().default(0),
|
||||
uniqueUsers30d: z.number().int().nonnegative().default(0),
|
||||
});
|
||||
|
||||
const lookerQueryToSlInputSchema = z.object({
|
||||
query: stagedLookerQuerySchema,
|
||||
contentTitle: z.string().min(1).optional(),
|
||||
contentType: z.enum(['look', 'dashboard_tile']).default('look'),
|
||||
usage: lookerUsageInputSchema.optional(),
|
||||
});
|
||||
|
||||
export type LookerQueryToSlInput = z.input<typeof lookerQueryToSlInputSchema>;
|
||||
|
||||
type LookerTargetStatus = 'mapped' | 'unmapped' | 'unparseable' | 'missing_target_table';
|
||||
|
||||
interface LookerSlFieldProposal {
|
||||
name: string;
|
||||
lookerField: string;
|
||||
}
|
||||
|
||||
interface LookerSlMeasureProposal extends LookerSlFieldProposal {
|
||||
expr: string;
|
||||
description: string;
|
||||
}
|
||||
|
||||
interface LookerSlSegmentProposal {
|
||||
name: string;
|
||||
filters: Record<string, unknown>;
|
||||
suggestedPredicate: string;
|
||||
description: string;
|
||||
}
|
||||
|
||||
export interface LookerSlProposal {
|
||||
sourceName: string;
|
||||
targetWarehouseConnectionId: string | null;
|
||||
targetTable: ParsedTargetTable | null;
|
||||
targetStatus: LookerTargetStatus;
|
||||
sourceTable: string | null;
|
||||
canWriteStandaloneSource: boolean;
|
||||
triageLane: 'skip' | 'light' | 'full';
|
||||
decision: 'wiki_only' | 'measure_added' | 'source_created';
|
||||
dimensions: LookerSlFieldProposal[];
|
||||
measures: LookerSlMeasureProposal[];
|
||||
segments: LookerSlSegmentProposal[];
|
||||
notes: string[];
|
||||
}
|
||||
|
||||
const MEASURE_FIELD_RE =
|
||||
/\b(count|sum|total|revenue|arr|mrr|amount|avg|average|rate|ratio|percent|pct|margin|profit|value|score)\b/i;
|
||||
|
||||
function targetStatus(
|
||||
targetWarehouseConnectionId: string | null,
|
||||
targetTable: ParsedTargetTable | null,
|
||||
): LookerTargetStatus {
|
||||
if (targetTable?.ok === true && targetWarehouseConnectionId) {
|
||||
return 'mapped';
|
||||
}
|
||||
if (targetTable?.ok === false && targetTable.reason === 'no_connection_mapping') {
|
||||
return 'unmapped';
|
||||
}
|
||||
if (targetTable?.ok === false) {
|
||||
return 'unparseable';
|
||||
}
|
||||
return 'missing_target_table';
|
||||
}
|
||||
|
||||
function targetNotes(status: LookerTargetStatus, targetTable: ParsedTargetTable | null): string[] {
|
||||
if (status === 'mapped') {
|
||||
return [
|
||||
'targetTable.ok is true: write or edit SL on targetWarehouseConnectionId using targetTable.canonicalTable as source.table.',
|
||||
'Use targetTable.catalog, targetTable.schema, and targetTable.name only for source_tables preflight matching.',
|
||||
'Never use rawSqlTableName as source.table; it may contain aliases, templates, or derived-table SQL.',
|
||||
];
|
||||
}
|
||||
if (targetTable?.ok === false) {
|
||||
return [
|
||||
`targetTable.ok is false (${targetTable.reason}): keep this query wiki-only and pass the reason through emit_unmapped_fallback.`,
|
||||
];
|
||||
}
|
||||
return [
|
||||
'No targetTable was staged for this query; read the parent explore dependency before attempting any SL write.',
|
||||
];
|
||||
}
|
||||
|
||||
/** @internal */
|
||||
export function buildLookerSlProposal(raw: LookerQueryToSlInput): LookerSlProposal {
|
||||
const input = lookerQueryToSlInputSchema.parse(raw);
|
||||
const sourceName = `looker__${toSlName(input.query.model)}__${toSlName(input.query.view)}`;
|
||||
const usage = input.usage;
|
||||
const targetWarehouseConnectionId = input.query.targetWarehouseConnectionId ?? null;
|
||||
const targetTable = input.query.targetTable ?? null;
|
||||
const status = targetStatus(targetWarehouseConnectionId, targetTable);
|
||||
const sourceTable = targetTable?.ok === true ? targetTable.canonicalTable : null;
|
||||
const canWriteStandaloneSource = status === 'mapped';
|
||||
const triageLane =
|
||||
usage && usage.queryCount30d === 0 && usage.uniqueUsers30d === 0 ? 'skip' : isHighUsage(usage) ? 'full' : 'light';
|
||||
const dimensions: LookerSlFieldProposal[] = [];
|
||||
const measures: LookerSlMeasureProposal[] = [];
|
||||
|
||||
for (const field of input.query.fields) {
|
||||
const proposal = { name: toSlName(fieldLeaf(field)), lookerField: field };
|
||||
if (isMeasureLikeField(field)) {
|
||||
measures.push({
|
||||
...proposal,
|
||||
expr: suggestedMeasureExpr(field),
|
||||
description: `Suggested from Looker ${contentLabel(input)}; verify against explore field SQL before writing.`,
|
||||
});
|
||||
} else {
|
||||
dimensions.push(proposal);
|
||||
}
|
||||
}
|
||||
|
||||
const filters = nonEmptyFilters(input.query.filters);
|
||||
const segments =
|
||||
Object.keys(filters).length === 0
|
||||
? []
|
||||
: [
|
||||
{
|
||||
name: toSlName(input.contentTitle ?? Object.keys(filters).map(fieldLeaf).join('_')),
|
||||
filters,
|
||||
suggestedPredicate: Object.entries(filters)
|
||||
.map(([field, value]) => filterValueToPredicate(field, value))
|
||||
.join(' AND '),
|
||||
description: `Reusable filter candidate from Looker ${contentLabel(input)}.`,
|
||||
},
|
||||
];
|
||||
|
||||
const decision =
|
||||
measures.length > 0 ? 'measure_added' : segments.length > 0 && isHighUsage(usage) ? 'source_created' : 'wiki_only';
|
||||
|
||||
const notes = [
|
||||
...targetNotes(status, targetTable),
|
||||
'Treat this as a proposal, not an instruction to write SL blindly.',
|
||||
'Verify field SQL, source shape, and existing SL overlap with sl_discover/sl_read_source before sl_write_source or sl_edit_source.',
|
||||
'Usage signals can raise priority, but query counts, users, owners, and folders must not be written as wiki narrative.',
|
||||
];
|
||||
if (triageLane === 'skip') {
|
||||
notes.push('Zero recent usage is a skip signal unless the raw content clearly defines durable business semantics.');
|
||||
}
|
||||
|
||||
return {
|
||||
sourceName,
|
||||
targetWarehouseConnectionId,
|
||||
targetTable,
|
||||
targetStatus: status,
|
||||
sourceTable,
|
||||
canWriteStandaloneSource,
|
||||
triageLane,
|
||||
decision,
|
||||
dimensions,
|
||||
measures,
|
||||
segments,
|
||||
notes,
|
||||
};
|
||||
}
|
||||
|
||||
export function createLookerQueryToSlTool() {
|
||||
return tool({
|
||||
description:
|
||||
'Given one staged Looker query JSON, return a conservative proposal for SL measures, dimensions, reusable filters, and triage priority. The proposal is advisory; verify with SL tools before writing.',
|
||||
inputSchema: lookerQueryToSlInputSchema,
|
||||
execute: async (input): Promise<ToolOutput<LookerSlProposal>> => {
|
||||
const structured = buildLookerSlProposal(input);
|
||||
return {
|
||||
markdown: formatLookerSlProposal(structured),
|
||||
structured,
|
||||
};
|
||||
},
|
||||
toModelOutput: ({ output }) => {
|
||||
const markdown =
|
||||
output && typeof output === 'object' && 'markdown' in output
|
||||
? String((output as { markdown: unknown }).markdown)
|
||||
: String(output);
|
||||
return { type: 'content', value: [{ type: 'text', text: markdown }] };
|
||||
},
|
||||
});
|
||||
}
|
||||
|
||||
function formatLookerSlProposal(proposal: LookerSlProposal): string {
|
||||
const lines = [
|
||||
'## Looker query SL proposal',
|
||||
'',
|
||||
`- sourceName: ${proposal.sourceName}`,
|
||||
`- targetStatus: ${proposal.targetStatus}`,
|
||||
`- targetWarehouseConnectionId: ${proposal.targetWarehouseConnectionId ?? '(none)'}`,
|
||||
`- sourceTable: ${proposal.sourceTable ?? '(none)'}`,
|
||||
`- canWriteStandaloneSource: ${proposal.canWriteStandaloneSource}`,
|
||||
`- triageLane: ${proposal.triageLane}`,
|
||||
`- decision: ${proposal.decision}`,
|
||||
'',
|
||||
'### Measures',
|
||||
...(proposal.measures.length === 0
|
||||
? ['- (none)']
|
||||
: proposal.measures.map((measure) => `- ${measure.name}: ${measure.expr} (${measure.lookerField})`)),
|
||||
'',
|
||||
'### Dimensions',
|
||||
...(proposal.dimensions.length === 0
|
||||
? ['- (none)']
|
||||
: proposal.dimensions.map((dimension) => `- ${dimension.name}: ${dimension.lookerField}`)),
|
||||
'',
|
||||
'### Segments',
|
||||
...(proposal.segments.length === 0
|
||||
? ['- (none)']
|
||||
: proposal.segments.map((segment) => `- ${segment.name}: ${segment.suggestedPredicate}`)),
|
||||
'',
|
||||
'### Notes',
|
||||
...proposal.notes.map((note) => `- ${note}`),
|
||||
];
|
||||
return lines.join('\n');
|
||||
}
|
||||
|
||||
function isHighUsage(usage: z.infer<typeof lookerUsageInputSchema> | undefined): boolean {
|
||||
return !!usage && (usage.queryCount30d >= 10 || usage.uniqueUsers30d >= 3);
|
||||
}
|
||||
|
||||
function isMeasureLikeField(field: string): boolean {
|
||||
return MEASURE_FIELD_RE.test(fieldLeaf(field).replace(/_/g, ' '));
|
||||
}
|
||||
|
||||
function suggestedMeasureExpr(field: string): string {
|
||||
const leaf = fieldLeaf(field);
|
||||
if (/\b(count|count_distinct)\b/i.test(leaf.replace(/_/g, ' '))) {
|
||||
return `count(${field})`;
|
||||
}
|
||||
if (/\b(avg|average|rate|ratio|percent|pct|margin|score)\b/i.test(leaf.replace(/_/g, ' '))) {
|
||||
return `avg(${field})`;
|
||||
}
|
||||
return `sum(${field})`;
|
||||
}
|
||||
|
||||
function fieldLeaf(field: string): string {
|
||||
const parts = field.split('.');
|
||||
return parts[parts.length - 1] || field;
|
||||
}
|
||||
|
||||
function nonEmptyFilters(filters: Record<string, unknown>): Record<string, unknown> {
|
||||
return Object.fromEntries(
|
||||
Object.entries(filters).filter(([, value]) => {
|
||||
if (value === null || value === undefined) {
|
||||
return false;
|
||||
}
|
||||
if (typeof value === 'string') {
|
||||
return value.trim().length > 0;
|
||||
}
|
||||
if (Array.isArray(value)) {
|
||||
return value.length > 0;
|
||||
}
|
||||
return true;
|
||||
}),
|
||||
);
|
||||
}
|
||||
|
||||
function filterValueToPredicate(field: string, value: unknown): string {
|
||||
if (Array.isArray(value)) {
|
||||
return `${field} IN (${value.map(sqlLiteral).join(', ')})`;
|
||||
}
|
||||
if (typeof value === 'number' || typeof value === 'boolean') {
|
||||
return `${field} = ${String(value)}`;
|
||||
}
|
||||
const raw = String(value).trim();
|
||||
if (raw.includes(',') && !raw.includes('"') && !raw.includes("'")) {
|
||||
return `${field} IN (${raw
|
||||
.split(',')
|
||||
.map((part) => sqlLiteral(part.trim()))
|
||||
.join(', ')})`;
|
||||
}
|
||||
if (raw.startsWith('-') && raw.length > 1) {
|
||||
return `${field} != ${sqlLiteral(raw.slice(1).trim())}`;
|
||||
}
|
||||
if (raw.includes('%')) {
|
||||
return `${field} LIKE ${sqlLiteral(raw)}`;
|
||||
}
|
||||
return `${field} = ${sqlLiteral(raw)}`;
|
||||
}
|
||||
|
||||
function sqlLiteral(value: unknown): string {
|
||||
if (typeof value === 'number' || typeof value === 'boolean') {
|
||||
return String(value);
|
||||
}
|
||||
return `'${String(value).replace(/'/g, "''")}'`;
|
||||
}
|
||||
|
||||
function contentLabel(input: z.infer<typeof lookerQueryToSlInputSchema>): string {
|
||||
const noun = input.contentType === 'dashboard_tile' ? 'dashboard tile' : 'look';
|
||||
return input.contentTitle ? `${noun} "${input.contentTitle}"` : noun;
|
||||
}
|
||||
|
||||
function toSlName(value: string): string {
|
||||
const normalized = value
|
||||
.trim()
|
||||
.replace(/([a-z0-9])([A-Z])/g, '$1_$2')
|
||||
.toLowerCase()
|
||||
.replace(/[^a-z0-9]+/g, '_')
|
||||
.replace(/^_+|_+$/g, '')
|
||||
.replace(/_+/g, '_');
|
||||
if (!normalized) {
|
||||
throw new Error(`Cannot derive semantic-layer name from empty Looker value`);
|
||||
}
|
||||
return /^[0-9]/.test(normalized) ? `n_${normalized}` : normalized;
|
||||
}
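The triage lane and decision in `buildLookerSlProposal` above reduce to a few threshold checks, which may be easier to follow in isolation. The sketch below is a simplified restatement under that assumption: `triageLaneFor` and `decide` are hypothetical names, while the thresholds (at least 10 queries or at least 3 users in 30 days) and the precedence of measures over segments come from the file above.

```typescript
interface Usage {
  queryCount30d: number;
  uniqueUsers30d: number;
}

type TriageLane = 'skip' | 'light' | 'full';
type Decision = 'wiki_only' | 'measure_added' | 'source_created';

// Same thresholds as isHighUsage in the file above.
function isHighUsage(usage: Usage | undefined): boolean {
  return !!usage && (usage.queryCount30d >= 10 || usage.uniqueUsers30d >= 3);
}

// Hypothetical helper: explicit zero usage means "skip"; missing usage falls back to "light".
function triageLaneFor(usage: Usage | undefined): TriageLane {
  if (usage && usage.queryCount30d === 0 && usage.uniqueUsers30d === 0) {
    return 'skip';
  }
  return isHighUsage(usage) ? 'full' : 'light';
}

// Hypothetical helper: any measure wins; otherwise a segment only creates a source
// when usage is high; everything else stays wiki-only.
function decide(measureCount: number, segmentCount: number, usage: Usage | undefined): Decision {
  if (measureCount > 0) {
    return 'measure_added';
  }
  return segmentCount > 0 && isHighUsage(usage) ? 'source_created' : 'wiki_only';
}

console.log(triageLaneFor({ queryCount30d: 42, uniqueUsers30d: 7 }));
console.log(triageLaneFor(undefined));
console.log(decide(0, 1, { queryCount30d: 15, uniqueUsers30d: 4 }));
```

Note the asymmetry between absent and zero usage: a query with no usage data is triaged as `light`, whereas one reported with zero queries and zero users is triaged as `skip`.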
|
||||
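Source, measure, and segment names in the proposal all pass through the `toSlName` helper at the end of the tool file above. The sketch below copies that normalization into a standalone function, named `toSnakeName` here purely for illustration, so a few inputs can be traced by hand.

```typescript
// Standalone copy of the name normalization for experimentation.
// toSnakeName is a hypothetical name; the steps follow toSlName in the tool file above.
function toSnakeName(value: string): string {
  const normalized = value
    .trim()
    // Split camelCase boundaries: "salesPipeline" -> "sales_Pipeline".
    .replace(/([a-z0-9])([A-Z])/g, '$1_$2')
    .toLowerCase()
    // Collapse every run of non-alphanumeric characters into one underscore.
    .replace(/[^a-z0-9]+/g, '_')
    // Drop leading and trailing underscores.
    .replace(/^_+|_+$/g, '')
    .replace(/_+/g, '_');
  if (!normalized) {
    throw new Error('Cannot derive semantic-layer name from empty Looker value');
  }
  // Names may not start with a digit, so prefix them.
  return /^[0-9]/.test(normalized) ? `n_${normalized}` : normalized;
}

console.log(toSnakeName('Open Pipeline ARR'));
console.log(toSnakeName('salesPipeline'));
console.log(toSnakeName('2024 Revenue'));
```

Runs of capitals such as `ARR` are not split, because the camelCase rule only fires when a lowercase letter or digit is followed by an uppercase letter. A value made up only of punctuation or whitespace throws rather than producing an empty name.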
329
packages/cli/src/context/ingest/adapters/looker/types.test.ts
Normal file
|
|
@ -0,0 +1,329 @@
|
|||
import { describe, expect, it } from 'vitest';
|
||||
import { parsedTargetTableSchema } from '../../parsed-target-table.js';
|
||||
import {
|
||||
lookerPullConfigSchema,
|
||||
parseLookerPullConfig,
|
||||
stagedDashboardFileSchema,
|
||||
stagedExploreFileSchema,
|
||||
stagedLookerFetchIssueSchema,
|
||||
stagedLookerQuerySchema,
|
||||
stagedLookerScopeFileSchema,
|
||||
stagedLookerSignalsFileSchema,
|
||||
stagedLookFileSchema,
|
||||
stagedSyncConfigSchema,
|
||||
} from './types.js';
|
||||
|
||||
describe('Looker staged runtime schemas', () => {
|
||||
it('parses pull config and staged sync config', () => {
|
||||
expect(
|
||||
lookerPullConfigSchema.parse({
|
||||
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
|
||||
instanceBaseUrl: 'https://example.looker.com',
|
||||
}),
|
||||
).toEqual({
|
||||
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
|
||||
instanceBaseUrl: 'https://example.looker.com',
|
||||
connectionMappings: {},
|
||||
connectionTypes: {},
|
||||
parsedTargetTables: {},
|
||||
});
|
||||
|
||||
expect(
|
||||
stagedSyncConfigSchema.parse({
|
||||
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
|
||||
fetchedAt: '2026-04-30T12:00:00.000Z',
|
||||
instanceBaseUrl: 'https://example.looker.com',
|
||||
}),
|
||||
).toMatchObject({
|
||||
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
|
||||
instanceBaseUrl: 'https://example.looker.com',
|
||||
});
|
||||
});
|
||||
|
||||
it('parses incremental pull cursors and scope manifests', () => {
|
||||
expect(
|
||||
parseLookerPullConfig({
|
||||
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
|
||||
dashboardUpdatedSince: '2026-04-30T10:00:00.000Z',
|
||||
lookUpdatedSince: '2026-04-30T11:00:00.000Z',
|
||||
}),
|
||||
).toEqual({
|
||||
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
|
||||
dashboardUpdatedSince: '2026-04-30T10:00:00.000Z',
|
||||
lookUpdatedSince: '2026-04-30T11:00:00.000Z',
|
||||
connectionMappings: {},
|
||||
connectionTypes: {},
|
||||
parsedTargetTables: {},
|
||||
});
|
||||
|
||||
expect(
|
||||
stagedLookerScopeFileSchema.parse({
|
||||
mode: 'incremental',
|
||||
knownCurrentRawPaths: ['dashboards/10.json', 'looks/20.json'],
|
||||
fetchedRawPaths: ['dashboards/10.json'],
|
||||
}),
|
||||
).toEqual({
|
||||
mode: 'incremental',
|
||||
knownCurrentRawPaths: ['dashboards/10.json', 'looks/20.json'],
|
||||
fetchedRawPaths: ['dashboards/10.json'],
|
||||
});
|
||||
|
||||
expect(
|
||||
stagedSyncConfigSchema.parse({
|
||||
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
|
||||
fetchedAt: '2026-04-30T12:30:00.000Z',
|
||||
previousCursors: {
|
||||
dashboardsLastSyncedAt: null,
|
||||
looksLastSyncedAt: '2026-04-30T11:00:00.000Z',
|
||||
},
|
||||
nextCursors: {
|
||||
dashboardsLastSyncedAt: '2026-04-30T12:00:00.000Z',
|
||||
looksLastSyncedAt: '2026-04-30T11:00:00.000Z',
|
||||
},
|
||||
}).nextCursors,
|
||||
).toEqual({
|
||||
dashboardsLastSyncedAt: '2026-04-30T12:00:00.000Z',
|
||||
looksLastSyncedAt: '2026-04-30T11:00:00.000Z',
|
||||
});
|
||||
});
|
||||
|
||||
it('normalizes numeric Looker ids to strings', () => {
|
||||
const dashboard = stagedDashboardFileSchema.parse({
|
||||
lookerId: 10,
|
||||
title: 'Sales Pipeline',
|
||||
description: null,
|
||||
folderId: 7,
|
||||
ownerId: 3,
|
||||
updatedAt: '2026-04-30T12:00:00.000Z',
|
||||
tiles: [{ id: 100, title: 'ARR', lookId: null, query: { model: 'b2b', view: 'sales_pipeline' } }],
|
||||
});
|
||||
|
||||
expect(dashboard.lookerId).toBe('10');
|
||||
expect(dashboard.folderId).toBe('7');
|
||||
expect(dashboard.ownerId).toBe('3');
|
||||
expect(dashboard.tiles[0].id).toBe('100');
|
||||
});
|
||||
|
||||
it('parses explores, looks, and signal files with defaults', () => {
|
||||
expect(
|
||||
stagedExploreFileSchema.parse({
|
||||
modelName: 'b2b',
|
||||
exploreName: 'sales_pipeline',
|
||||
label: 'Sales Pipeline',
|
||||
description: null,
|
||||
fields: {
|
||||
dimensions: [{ name: 'opportunities.id', label: 'Opportunity ID', type: 'number', sql: '${TABLE}.id' }],
|
||||
measures: [{ name: 'opportunities.arr', label: 'ARR', type: 'sum', sql: '${TABLE}.arr' }],
|
||||
},
|
||||
joins: [{ name: 'accounts', type: 'left_outer', relationship: 'many_to_one' }],
|
||||
}),
|
||||
).toMatchObject({
|
||||
modelName: 'b2b',
|
||||
exploreName: 'sales_pipeline',
|
||||
fields: { dimensions: [{ name: 'opportunities.id' }], measures: [{ name: 'opportunities.arr' }] },
|
||||
});
|
||||
|
||||
expect(
|
||||
stagedLookFileSchema.parse({
|
||||
lookerId: '20',
|
||||
title: 'Open Pipeline',
|
||||
description: null,
|
||||
folderId: null,
|
||||
ownerId: null,
|
||||
updatedAt: null,
|
||||
query: { model: 'b2b', view: 'sales_pipeline', fields: ['opportunities.arr'] },
|
||||
}),
|
||||
).toMatchObject({ lookerId: '20', query: { fields: ['opportunities.arr'] } });
|
||||
|
||||
expect(stagedLookerSignalsFileSchema.parse({}).dashboardUsage).toEqual([]);
|
||||
});
|
||||
|
||||
it('parses warehouse SL mapping pull config and staged target table fields', () => {
|
||||
const targetConnectionId = '22222222-2222-4222-8222-222222222222';
|
||||
const parsedTargetTable = {
|
||||
ok: true as const,
|
||||
catalog: 'proj',
|
||||
schema: 'dataset',
|
||||
name: 'opportunities',
|
||||
canonicalTable: 'proj.dataset.opportunities',
|
||||
};
|
||||
|
||||
expect(parsedTargetTableSchema.parse(parsedTargetTable)).toEqual(parsedTargetTable);
|
||||
|
||||
expect(
|
||||
parseLookerPullConfig({
|
||||
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
|
||||
connectionMappings: { b2b_sandbox_bq: targetConnectionId },
|
||||
connectionTypes: { b2b_sandbox_bq: 'BIGQUERY' },
|
||||
parsedTargetTables: { 'b2b.sales_pipeline': parsedTargetTable },
|
||||
}),
|
||||
).toEqual({
|
||||
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
|
||||
connectionMappings: { b2b_sandbox_bq: targetConnectionId },
|
||||
connectionTypes: { b2b_sandbox_bq: 'BIGQUERY' },
|
||||
parsedTargetTables: { 'b2b.sales_pipeline': parsedTargetTable },
|
||||
});
|
||||
|
||||
expect(
|
||||
stagedExploreFileSchema.parse({
|
||||
modelName: 'b2b',
|
||||
exploreName: 'sales_pipeline',
|
||||
label: 'Sales Pipeline',
|
||||
description: null,
|
||||
rawSqlTableName: 'proj.dataset.opportunities AS opportunities',
|
||||
connectionName: 'b2b_sandbox_bq',
|
||||
viewName: 'opportunities',
|
||||
fields: {
|
||||
dimensions: [{ name: 'opportunities.id', label: 'Opportunity ID', type: 'number', sql: '${TABLE}.id' }],
|
||||
measures: [{ name: 'opportunities.arr', label: 'ARR', type: 'sum', sql: '${TABLE}.arr' }],
|
||||
},
|
||||
joins: [
|
||||
{
|
||||
name: 'accounts',
|
||||
type: 'left_outer',
|
||||
relationship: 'many_to_one',
|
||||
rawSqlTableName: 'proj.dataset.accounts',
|
||||
sqlOn: '${opportunities.account_id} = ${accounts.id}',
|
||||
from: null,
|
||||
targetTable: {
|
||||
ok: true,
|
||||
catalog: 'proj',
|
||||
schema: 'dataset',
|
||||
name: 'accounts',
|
||||
canonicalTable: 'proj.dataset.accounts',
|
||||
},
|
||||
},
|
||||
],
|
||||
targetWarehouseConnectionId: targetConnectionId,
|
||||
targetTable: parsedTargetTable,
|
||||
}),
|
||||
).toMatchObject({
|
||||
modelName: 'b2b',
|
||||
exploreName: 'sales_pipeline',
|
||||
connectionName: 'b2b_sandbox_bq',
|
||||
targetWarehouseConnectionId: targetConnectionId,
|
||||
targetTable: parsedTargetTable,
|
||||
joins: [{ name: 'accounts', targetTable: { ok: true, name: 'accounts' } }],
|
||||
});
|
||||
});
|
||||
|
||||
it('parses structured Looker mapping fetch warnings', () => {
|
||||
expect(
|
||||
stagedLookerFetchIssueSchema.parse({
|
||||
rawPath: 'looker_connection_mappings/b2b_sandbox_bq',
|
||||
entityType: 'looker_connection_mapping',
|
||||
entityId: 'b2b_sandbox_bq',
|
||||
severity: 'warning',
|
||||
statusCode: null,
|
||||
message: 'Looker connection b2b_sandbox_bq is not mapped to a warehouse connection.',
|
||||
retryRecommended: false,
|
||||
kind: 'unmapped_looker_connection',
|
||||
details: {
|
||||
lookerConnectionName: 'b2b_sandbox_bq',
|
||||
affectedExplores: ['b2b.sales_pipeline'],
|
||||
},
|
||||
}),
|
||||
).toMatchObject({
|
||||
entityType: 'looker_connection_mapping',
|
||||
kind: 'unmapped_looker_connection',
|
||||
details: {
|
||||
lookerConnectionName: 'b2b_sandbox_bq',
|
||||
affectedExplores: ['b2b.sales_pipeline'],
|
||||
},
|
||||
});
|
||||
});
|
||||
|
||||
it('parses LookML model listing warnings in fetch reports', () => {
|
||||
expect(
|
||||
stagedLookerFetchIssueSchema.parse({
|
||||
rawPath: 'lookml_models.json',
|
||||
entityType: 'lookml_models',
|
||||
entityId: null,
|
||||
severity: 'warning',
|
||||
statusCode: 403,
|
||||
message: 'LookML model access denied',
|
||||
retryRecommended: false,
|
||||
}),
|
||||
).toEqual({
|
||||
rawPath: 'lookml_models.json',
|
||||
entityType: 'lookml_models',
|
||||
entityId: null,
|
||||
severity: 'warning',
|
||||
statusCode: 403,
|
||||
message: 'LookML model access denied',
|
||||
retryRecommended: false,
|
||||
});
|
||||
});
|
||||
|
||||
it('accepts slug-shaped connection ids inside KTX Looker runtime schemas', () => {
|
||||
const parsedTargetTable = {
|
||||
ok: true as const,
|
||||
catalog: 'proj',
|
||||
schema: 'dataset',
|
||||
name: 'opportunities',
|
||||
canonicalTable: 'proj.dataset.opportunities',
|
||||
};
|
||||
|
||||
expect(
|
||||
parseLookerPullConfig({
|
||||
lookerConnectionId: 'prod-looker',
|
||||
connectionMappings: { b2b_sandbox_bq: 'prod-warehouse' },
|
||||
connectionTypes: { b2b_sandbox_bq: 'BIGQUERY' },
|
||||
parsedTargetTables: { 'b2b.sales_pipeline': parsedTargetTable },
|
||||
}),
|
||||
).toMatchObject({
|
||||
lookerConnectionId: 'prod-looker',
|
||||
connectionMappings: { b2b_sandbox_bq: 'prod-warehouse' },
|
||||
});
|
||||
|
||||
expect(
|
||||
stagedSyncConfigSchema.parse({
|
||||
lookerConnectionId: 'prod-looker',
|
||||
fetchedAt: '2026-04-30T12:00:00.000Z',
|
||||
}),
|
||||
).toMatchObject({
|
||||
lookerConnectionId: 'prod-looker',
|
||||
});
|
||||
|
||||
expect(
|
||||
stagedLookerQuerySchema.parse({
|
||||
model: 'b2b',
|
||||
view: 'sales_pipeline',
|
||||
targetWarehouseConnectionId: 'prod-warehouse',
|
||||
targetTable: parsedTargetTable,
|
||||
}),
|
||||
).toMatchObject({
|
||||
targetWarehouseConnectionId: 'prod-warehouse',
|
||||
targetTable: parsedTargetTable,
|
||||
});
|
||||
|
||||
expect(
|
||||
stagedExploreFileSchema.parse({
|
||||
modelName: 'b2b',
|
||||
exploreName: 'sales_pipeline',
|
||||
label: 'Sales Pipeline',
|
||||
description: null,
|
||||
fields: { dimensions: [], measures: [] },
|
||||
targetWarehouseConnectionId: 'prod-warehouse',
|
||||
targetTable: parsedTargetTable,
|
||||
}),
|
||||
).toMatchObject({
|
||||
targetWarehouseConnectionId: 'prod-warehouse',
|
||||
targetTable: parsedTargetTable,
|
||||
});
|
||||
});
|
||||
|
||||
it('rejects unsafe KTX Looker connection ids', () => {
|
||||
expect(() =>
|
||||
parseLookerPullConfig({
|
||||
lookerConnectionId: '../prod-looker',
|
||||
}),
|
||||
).toThrow();
|
||||
|
||||
expect(() =>
|
||||
parseLookerPullConfig({
|
||||
connectionMappings: { b2b_sandbox_bq: 'prod/warehouse' },
|
||||
}),
|
||||
).toThrow();
|
||||
});
|
||||
});
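The last two tests above exercise the slug rule for connection ids. As a rough, dependency-free sketch of what that rule accepts and rejects: the regex is copied from `lookerConnectionIdSchema` in `types.ts`, while the helper name `isSafeConnectionId` and the `connectionIdChecks` object are hypothetical and exist only for this illustration.

```typescript
// Same character class as lookerConnectionIdSchema: letters, digits, "_" and "-".
const CONNECTION_ID_RE = /^[A-Za-z0-9_-]+$/;

// Hypothetical helper (not part of the codebase): true for slug-shaped ids only.
function isSafeConnectionId(value: string): boolean {
  return value.length >= 1 && CONNECTION_ID_RE.test(value);
}

// Inputs taken from the tests above, plus the empty string.
const connectionIdChecks = {
  slug: isSafeConnectionId('prod-looker'), // accepted
  traversal: isSafeConnectionId('../prod-looker'), // rejected: "." and "/" are not allowed
  slash: isSafeConnectionId('prod/warehouse'), // rejected: "/" is not allowed
  empty: isSafeConnectionId(''), // rejected: at least one character is required
};
```

Because the pattern is anchored at both ends, path separators and dots are rejected anywhere in the id, which is presumably why the tests label such ids "unsafe".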
|
||||
255
packages/cli/src/context/ingest/adapters/looker/types.ts
Normal file
|
|
@ -0,0 +1,255 @@
|
|||
import { z } from 'zod';
|
||||
import { connectionTypeSchema } from '../../../connections/connection-type.js';
|
||||
import { parsedTargetTableSchema } from '../../parsed-target-table.js';
|
||||
|
||||
const lookerIdSchema = z.union([z.string(), z.number().int()]).transform(String);
|
||||
const nullableLookerIdSchema = z.union([lookerIdSchema, z.null()]).default(null);
|
||||
|
||||
const lookerConnectionIdSchema = z.string().min(1).regex(/^[A-Za-z0-9_-]+$/);
|
||||
|
||||
const lookerRuntimeCursorsSchema = z.object({
|
||||
dashboardsLastSyncedAt: z.iso.datetime().nullable().default(null),
|
||||
looksLastSyncedAt: z.iso.datetime().nullable().default(null),
|
||||
});
|
||||
|
||||
export type LookerRuntimeCursors = z.infer<typeof lookerRuntimeCursorsSchema>;
|
||||
|
||||
/** @internal */
|
||||
export const lookerPullConfigSchema = z.object({
|
||||
lookerConnectionId: lookerConnectionIdSchema.optional(),
|
||||
instanceBaseUrl: z.url().optional(),
|
||||
dashboardUpdatedSince: z.iso.datetime().nullable().optional(),
|
||||
lookUpdatedSince: z.iso.datetime().nullable().optional(),
|
||||
connectionMappings: z.record(z.string(), lookerConnectionIdSchema).default({}),
|
||||
connectionTypes: z.record(z.string(), connectionTypeSchema).default({}),
|
||||
parsedTargetTables: z.record(z.string(), parsedTargetTableSchema).default({}),
|
||||
});
|
||||
|
||||
export type LookerPullConfig = z.infer<typeof lookerPullConfigSchema>;
|
||||
|
||||
export function parseLookerPullConfig(raw: unknown): LookerPullConfig {
|
||||
return lookerPullConfigSchema.parse(raw ?? {});
|
||||
}
|
||||
|
||||
export const stagedSyncConfigSchema = z.object({
|
||||
lookerConnectionId: lookerConnectionIdSchema,
|
||||
fetchedAt: z.iso.datetime(),
|
||||
instanceBaseUrl: z.url().optional(),
|
||||
previousCursors: lookerRuntimeCursorsSchema.default({
|
||||
dashboardsLastSyncedAt: null,
|
||||
looksLastSyncedAt: null,
|
||||
}),
|
||||
nextCursors: lookerRuntimeCursorsSchema.default({
|
||||
dashboardsLastSyncedAt: null,
|
||||
looksLastSyncedAt: null,
|
||||
}),
|
||||
});
|
||||
|
||||
export const stagedLookerQuerySchema = z.object({
|
||||
id: lookerIdSchema.optional(),
|
||||
model: z.string(),
|
||||
view: z.string(),
|
||||
fields: z.array(z.string()).default([]),
|
||||
filters: z.record(z.string(), z.unknown()).default({}),
|
||||
sorts: z.array(z.string()).default([]),
|
||||
limit: z.union([z.string(), z.number()]).optional().nullable(),
|
||||
dynamicFields: z.string().optional().nullable(),
|
||||
targetWarehouseConnectionId: lookerConnectionIdSchema.nullable().default(null),
|
||||
targetTable: parsedTargetTableSchema.nullable().default(null),
|
||||
});
|
||||
|
||||
export type StagedLookerQuery = z.infer<typeof stagedLookerQuerySchema>;
|
||||
|
||||
const stagedDashboardTileSchema = z.object({
|
||||
id: lookerIdSchema,
|
||||
title: z.string().nullable().default(null),
|
||||
lookId: nullableLookerIdSchema,
|
||||
query: stagedLookerQuerySchema.nullable().default(null),
|
||||
});
|
||||
|
||||
export const stagedDashboardFileSchema = z.object({
|
||||
lookerId: lookerIdSchema,
|
||||
title: z.string(),
|
||||
description: z.string().nullable(),
|
||||
folderId: nullableLookerIdSchema,
|
||||
ownerId: nullableLookerIdSchema,
|
||||
updatedAt: z.string().nullable(),
|
||||
tiles: z.array(stagedDashboardTileSchema).default([]),
|
||||
});
|
||||
|
||||
export type StagedDashboardFile = z.infer<typeof stagedDashboardFileSchema>;
|
||||
|
||||
export const stagedLookFileSchema = z.object({
|
||||
lookerId: lookerIdSchema,
|
||||
title: z.string(),
|
||||
description: z.string().nullable(),
|
||||
folderId: nullableLookerIdSchema,
|
||||
ownerId: nullableLookerIdSchema,
|
||||
updatedAt: z.string().nullable(),
|
||||
query: stagedLookerQuerySchema.nullable().default(null),
|
||||
});
|
||||
|
||||
export type StagedLookFile = z.infer<typeof stagedLookFileSchema>;
|
||||
|
||||
const stagedFolderSchema = z.object({
|
||||
id: lookerIdSchema,
|
||||
name: z.string(),
|
||||
parentId: nullableLookerIdSchema,
|
||||
path: z.array(z.string()).default([]),
|
||||
});
|
||||
|
||||
export const stagedFoldersTreeFileSchema = z.object({
|
||||
folders: z.array(stagedFolderSchema),
|
||||
});
|
||||
|
||||
export type StagedFoldersTreeFile = z.infer<typeof stagedFoldersTreeFileSchema>;
|
||||
|
||||
export const stagedUserFileSchema = z.object({
|
||||
id: lookerIdSchema,
|
||||
displayName: z.string().nullable(),
|
||||
email: z.string().nullable().default(null),
|
||||
});
|
||||
|
||||
export type StagedUserFile = z.infer<typeof stagedUserFileSchema>;
|
||||
|
||||
export const stagedGroupFileSchema = z.object({
|
||||
id: lookerIdSchema,
|
||||
name: z.string(),
|
||||
});
|
||||
|
||||
export type StagedGroupFile = z.infer<typeof stagedGroupFileSchema>;
|
||||
|
||||
const stagedLookmlModelSchema = z.object({
|
||||
name: z.string(),
|
||||
label: z.string().nullable().default(null),
|
||||
explores: z.array(z.object({ name: z.string(), label: z.string().nullable().default(null) })),
|
||||
});
|
||||
|
||||
export const stagedLookmlModelsFileSchema = z.object({
|
||||
models: z.array(stagedLookmlModelSchema),
|
||||
});
|
||||
|
||||
export type StagedLookmlModelsFile = z.infer<typeof stagedLookmlModelsFileSchema>;
|
||||
|
||||
const stagedLookerFieldSchema = z.object({
|
||||
name: z.string(),
|
||||
label: z.string().nullable().default(null),
|
||||
type: z.string().nullable().default(null),
|
||||
sql: z.string().nullable().default(null),
|
||||
description: z.string().nullable().default(null),
|
||||
});
|
||||
|
||||
const stagedLookerJoinSchema = z.object({
|
||||
name: z.string(),
|
||||
type: z.string().nullable().default(null),
|
||||
relationship: z.string().nullable().default(null),
|
||||
rawSqlTableName: z.string().nullable().default(null),
|
||||
sqlOn: z.string().nullable().default(null),
|
||||
from: z.string().nullable().default(null),
|
||||
targetTable: parsedTargetTableSchema.nullable().default(null),
|
||||
});
|
||||
|
||||
export const stagedExploreFileSchema = z.object({
|
||||
modelName: z.string(),
|
||||
exploreName: z.string(),
|
||||
label: z.string().nullable().default(null),
|
||||
description: z.string().nullable().default(null),
|
||||
rawSqlTableName: z.string().nullable().default(null),
|
||||
connectionName: z.string().nullable().default(null),
|
||||
viewName: z.string().nullable().default(null),
|
||||
fields: z.object({
|
||||
dimensions: z.array(stagedLookerFieldSchema).default([]),
|
||||
measures: z.array(stagedLookerFieldSchema).default([]),
|
||||
}),
|
||||
joins: z.array(stagedLookerJoinSchema).default([]),
|
||||
targetWarehouseConnectionId: lookerConnectionIdSchema.nullable().default(null),
|
||||
targetTable: parsedTargetTableSchema.nullable().default(null),
|
||||
});
|
||||
|
||||
export type StagedExploreFile = z.infer<typeof stagedExploreFileSchema>;
|
||||
|
||||
const stagedUsageSignalSchema = z.object({
|
||||
contentId: lookerIdSchema,
|
||||
queryCount30d: z.number().int().nonnegative().default(0),
|
||||
uniqueUsers30d: z.number().int().nonnegative().default(0),
|
||||
lastRunAt: z.string().nullable().default(null),
|
||||
topUsers: z.array(lookerIdSchema).default([]),
|
||||
});
|
||||
|
||||
const stagedScheduledPlanSignalSchema = z.object({
|
||||
contentId: lookerIdSchema,
|
||||
contentType: z.enum(['dashboard', 'look']),
|
||||
isScheduled: z.boolean(),
|
||||
scheduleCount: z.number().int().nonnegative().default(0),
|
||||
recipientCount: z.number().int().nonnegative().default(0),
|
||||
});
|
||||
|
||||
const stagedFavoriteSignalSchema = z.object({
|
||||
contentId: lookerIdSchema,
|
||||
contentType: z.enum(['dashboard', 'look']),
|
||||
favoriteCount: z.number().int().nonnegative().default(0),
|
||||
});
|
||||
|
||||
export const stagedLookerSignalsFileSchema = z.object({
|
||||
dashboardUsage: z.array(stagedUsageSignalSchema).default([]),
|
||||
lookUsage: z.array(stagedUsageSignalSchema).default([]),
|
||||
scheduledPlans: z.array(stagedScheduledPlanSignalSchema).default([]),
|
||||
favorites: z.array(stagedFavoriteSignalSchema).default([]),
|
||||
});
|
||||
|
||||
export type StagedLookerSignalsFile = z.infer<typeof stagedLookerSignalsFileSchema>;
|
||||
|
||||
export const stagedLookerScopeFileSchema = z.object({
|
||||
mode: z.enum(['full', 'incremental']),
|
||||
knownCurrentRawPaths: z.array(z.string()).default([]),
|
||||
fetchedRawPaths: z.array(z.string()).default([]),
|
||||
});
|
||||
|
||||
export type StagedLookerScopeFile = z.infer<typeof stagedLookerScopeFileSchema>;
|
||||
|
||||
const stagedLookerFetchIssueKindSchema = z.enum([
|
||||
'unmapped_looker_connection',
|
||||
'unparseable_sql_table_name',
|
||||
'looker_template_unresolved',
|
||||
'derived_table_not_supported',
|
||||
'lookml_connection_mismatch',
|
||||
]);
|
||||
|
||||
/** @internal */
|
||||
export const stagedLookerFetchIssueSchema = z.object({
|
||||
rawPath: z.string().min(1),
|
||||
entityType: z.enum(['dashboard', 'look', 'explore', 'signals', 'lookml_models', 'looker_connection_mapping']),
|
||||
entityId: z.string().nullable().default(null),
|
||||
severity: z.enum(['warning', 'error']),
|
||||
statusCode: z.number().int().nullable().default(null),
|
||||
message: z.string().min(1),
|
||||
retryRecommended: z.boolean().default(false),
|
||||
kind: stagedLookerFetchIssueKindSchema.optional(),
|
||||
details: z.record(z.string(), z.unknown()).optional(),
|
||||
});
|
||||
|
||||
export type StagedLookerFetchIssue = z.infer<typeof stagedLookerFetchIssueSchema>;
|
||||
|
||||
export const stagedLookerFetchReportSchema = z.object({
|
||||
status: z.enum(['success', 'partial']),
|
||||
retryRecommended: z.boolean().default(false),
|
||||
skipped: z.array(stagedLookerFetchIssueSchema).default([]),
|
||||
warnings: z.array(stagedLookerFetchIssueSchema).default([]),
|
||||
});
|
||||
|
||||
export type StagedLookerFetchReport = z.infer<typeof stagedLookerFetchReportSchema>;
|
||||
|
||||
export const STAGED_FILES = {
|
||||
syncConfig: 'sync-config.json',
|
||||
scope: 'looker-scope.json',
|
||||
fetchReport: 'looker-fetch-report.json',
|
||||
evidenceRoot: 'evidence',
|
||||
lookmlModels: 'lookml_models.json',
|
||||
foldersTree: 'folders/tree.json',
|
||||
signals: {
|
||||
dashboardUsage: 'signals/dashboard_usage.json',
|
||||
lookUsage: 'signals/look_usage.json',
|
||||
scheduledPlans: 'signals/scheduled_plans.json',
|
||||
favorites: 'signals/favorites.json',
|
||||
},
|
||||
} as const;
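Many of the schemas above reuse `lookerIdSchema` and `nullableLookerIdSchema`, which accept either a string or an integer and always yield a string (or `null`). The following is a minimal sketch of that normalization without zod. The function names are hypothetical stand-ins, and the behavior is only an approximation of what the real schemas do.

```typescript
// Hypothetical stand-in for lookerIdSchema: a string or an integer in, a string out.
function normalizeLookerId(value: unknown): string {
  if (typeof value === 'string') return value;
  if (typeof value === 'number' && Number.isInteger(value)) return String(value);
  throw new TypeError('Looker id must be a string or an integer');
}

// Hypothetical stand-in for nullableLookerIdSchema: a missing or null value becomes null.
function normalizeNullableLookerId(value: unknown): string | null {
  return value === undefined || value === null ? null : normalizeLookerId(value);
}

// Numeric and string ids end up in the same representation.
const idExamples: Array<string | null> = [
  normalizeLookerId(42),
  normalizeLookerId('42'),
  normalizeNullableLookerId(undefined),
];
```

Normalizing to strings up front means downstream code can compare ids from different Looker endpoints without caring whether the API returned a number or a string.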
|
||||
230
packages/cli/src/context/ingest/adapters/lookml/chunk.test.ts
Normal file
|
|
@ -0,0 +1,230 @@
|
|||
import { join } from 'node:path';
|
||||
import { describe, expect, it } from 'vitest';
|
||||
import { chunkLookmlProject } from './chunk.js';
|
||||
import { type ParsedLookmlProject, parseLookmlStagedDir } from './parse.js';
|
||||
|
||||
const FIXTURE_ROOT = join(__dirname, '../../../../test/fixtures/lookml');
|
||||
|
||||
describe('chunkLookmlProject — first run', () => {
|
||||
it('single-model bundle → 1 WU with model + all views in rawFiles', async () => {
|
||||
const stagedDir = join(FIXTURE_ROOT, 'single-model');
|
||||
const project = await parseLookmlStagedDir(stagedDir);
|
||||
const result = chunkLookmlProject(project);
|
||||
expect(result.workUnits).toHaveLength(1);
|
||||
const wu = result.workUnits[0];
|
||||
expect(wu.unitKey).toBe('lookml-orders');
|
||||
expect(wu.rawFiles.sort()).toEqual(['orders.model.lkml', 'views/customers.view.lkml', 'views/orders.view.lkml']);
|
||||
expect(wu.peerFileIndex).toEqual([]);
|
||||
expect(wu.dependencyPaths).toEqual([]);
|
||||
expect(result.eviction).toBeUndefined();
|
||||
});
|
||||
|
||||
it('multi-model bundle → 1 WU per model; shared view owned by lex-first model; others see it in dependencyPaths + peerFileIndex is a paths-only index', async () => {
|
||||
const stagedDir = join(FIXTURE_ROOT, 'multi-model');
|
||||
const project = await parseLookmlStagedDir(stagedDir);
|
||||
const result = chunkLookmlProject(project);
|
||||
expect(result.workUnits).toHaveLength(2);
|
||||
const marketing = result.workUnits.find((wu) => wu.unitKey === 'lookml-marketing');
|
||||
const orders = result.workUnits.find((wu) => wu.unitKey === 'lookml-orders');
|
||||
expect(marketing).toBeDefined();
|
||||
expect(orders).toBeDefined();
|
||||
if (!marketing || !orders) {
|
||||
throw new Error('expected marketing and orders work units');
|
||||
}
|
||||
|
||||
// marketing sorts before orders → marketing owns shared_dims
|
||||
expect(marketing.rawFiles).toContain('views/shared_dims.view.lkml');
|
||||
expect(marketing.rawFiles).toContain('views/campaigns.view.lkml');
|
||||
expect(marketing.rawFiles).toContain('marketing.model.lkml');
|
||||
expect(marketing.rawFiles).not.toContain('views/orders.view.lkml');
|
||||
expect(marketing.dependencyPaths).toEqual([]);
|
||||
|
||||
// orders does NOT own shared_dims — it's in dependencyPaths (read-only upstream).
|
||||
expect(orders.rawFiles).not.toContain('views/shared_dims.view.lkml');
|
||||
expect(orders.dependencyPaths).toEqual(['views/shared_dims.view.lkml']);
|
||||
expect(orders.rawFiles).toContain('views/orders.view.lkml');
|
||||
expect(orders.rawFiles).toContain('orders.model.lkml');
|
||||
|
||||
// Each WU's peerFileIndex lists the OTHER model's files (paths-only index).
|
||||
expect(orders.peerFileIndex).toContain('marketing.model.lkml');
|
||||
expect(orders.peerFileIndex).toContain('views/campaigns.view.lkml');
|
||||
// Dependency paths should not be duplicated into peerFileIndex.
|
||||
expect(orders.peerFileIndex).not.toContain('views/shared_dims.view.lkml');
|
||||
});
|
||||
|
||||
it('extends-chain fixture: single WU contains base + orders + orders_ext; chain order visible via graph', async () => {
|
||||
const stagedDir = join(FIXTURE_ROOT, 'extends-chain');
|
||||
const project = await parseLookmlStagedDir(stagedDir);
|
||||
const result = chunkLookmlProject(project);
|
||||
// One model ("orders") includes views/*.view.lkml — so all three views land in its WU.
|
||||
expect(result.workUnits).toHaveLength(1);
|
||||
const wu = result.workUnits[0];
|
||||
expect(wu.unitKey).toBe('lookml-orders');
|
||||
expect(wu.rawFiles.sort()).toEqual([
|
||||
'orders.model.lkml',
|
||||
'views/base.view.lkml',
|
||||
'views/orders.view.lkml',
|
||||
'views/orders_ext.view.lkml',
|
||||
]);
|
||||
expect(wu.dependencyPaths).toEqual([]); // all ancestors already in rawFiles on first run
|
||||
expect(wu.notes).toMatch(/orders/);
|
||||
});
|
||||
|
||||
it('is deterministic: two calls on the same project return structurally identical WorkUnits', async () => {
|
||||
const stagedDir = join(FIXTURE_ROOT, 'multi-model');
|
||||
const project = await parseLookmlStagedDir(stagedDir);
|
||||
const r1 = chunkLookmlProject(project);
|
||||
const r2 = chunkLookmlProject(project);
|
||||
expect(r1.workUnits).toEqual(r2.workUnits);
|
||||
});
|
||||
|
||||
it('unitKey is model-name-derived (stable across parse+chunk cycles and across re-syncs)', async () => {
|
||||
const project = await parseLookmlStagedDir(join(FIXTURE_ROOT, 'multi-model'));
|
||||
const { workUnits } = chunkLookmlProject(project);
|
||||
expect(workUnits.map((wu) => wu.unitKey).sort()).toEqual(['lookml-marketing', 'lookml-orders']);
|
||||
});
|
||||
|
||||
it('marks mismatched model WorkUnits as SL-disallowed and keeps wiki ingest enabled', () => {
|
||||
const project: ParsedLookmlProject = {
|
||||
models: [
|
||||
{
|
||||
path: 'b2b.model.lkml',
|
||||
name: 'b2b',
|
||||
includes: ['views/orders.view.lkml'],
|
||||
explores: ['orders'],
|
||||
connectionName: 'wrong_connection',
|
||||
},
|
||||
],
|
||||
views: [{ path: 'views/orders.view.lkml', name: 'orders', extendsFrom: [], rawSqlTableName: 'public.orders' }],
|
||||
dashboards: [],
|
||||
allPaths: ['b2b.model.lkml', 'views/orders.view.lkml'],
|
||||
};
|
||||
|
||||
const result = chunkLookmlProject(project, { mismatchedModelNames: new Set(['b2b']) });
|
||||
const wu = result.workUnits[0];
|
||||
|
||||
expect(wu.unitKey).toBe('lookml-b2b');
|
||||
expect(wu.rawFiles).toEqual(['b2b.model.lkml', 'views/orders.view.lkml']);
|
||||
expect(wu.slDisallowed).toBe(true);
|
||||
expect(wu.slDisallowedReason).toBe('lookml_connection_mismatch');
|
||||
expect(wu.notes).toContain('[LOOKML SL WRITES DISALLOWED]');
|
||||
expect(wu.notes).toContain('reason: lookml_connection_mismatch');
|
||||
expect(wu.notes).toContain('Do not call sl_write_source or sl_edit_source for this WorkUnit.');
|
||||
});
|
||||
});
|
||||
|
||||
describe('chunkLookmlProject — re-sync', () => {
|
||||
it("modified file in one model only emits that model's WU", async () => {
|
||||
const stagedDir = join(FIXTURE_ROOT, 'multi-model');
|
||||
const project = await parseLookmlStagedDir(stagedDir);
|
||||
const result = chunkLookmlProject(project, {
|
||||
diffSet: {
|
||||
added: [],
|
||||
modified: ['views/campaigns.view.lkml'],
|
||||
deleted: [],
|
||||
unchanged: [
|
||||
'marketing.model.lkml',
|
||||
'orders.model.lkml',
|
||||
'views/orders.view.lkml',
|
||||
'views/shared_dims.view.lkml',
|
||||
],
|
||||
},
|
||||
});
|
||||
expect(result.workUnits).toHaveLength(1);
|
||||
expect(result.workUnits[0].unitKey).toBe('lookml-marketing');
|
||||
});
|
||||
|
||||
it("added file under a model emits that model's WU with the new path in rawFiles", async () => {
|
||||
const stagedDir = join(FIXTURE_ROOT, 'single-model');
|
||||
const project = await parseLookmlStagedDir(stagedDir);
|
||||
const result = chunkLookmlProject(project, {
|
||||
diffSet: {
|
||||
added: ['views/customers.view.lkml'],
|
||||
modified: [],
|
||||
deleted: [],
|
||||
unchanged: ['orders.model.lkml', 'views/orders.view.lkml'],
|
||||
},
|
||||
});
|
||||
expect(result.workUnits).toHaveLength(1);
|
||||
expect(result.workUnits[0].rawFiles).toContain('views/customers.view.lkml');
|
||||
});
|
||||
|
||||
it('widens dependencyPaths with transitive extends ancestors on re-sync', async () => {
|
||||
const stagedDir = join(FIXTURE_ROOT, 'extends-chain');
|
||||
const project = await parseLookmlStagedDir(stagedDir);
|
||||
// Only orders_ext is touched; base and orders are upstream ancestors.
|
||||
// Because the single-model WU's rawFiles ALREADY include all three on first run,
|
||||
// they remain in rawFiles — dependencyPaths stays empty. Widening matters when
|
||||
// re-sync drops some files from rawFiles, which doesn't apply for a monolithic
|
||||
// single-model WU. Assert the baseline invariant.
|
||||
const result = chunkLookmlProject(project, {
|
||||
diffSet: {
|
||||
added: [],
|
||||
modified: ['views/orders_ext.view.lkml'],
|
||||
deleted: [],
|
||||
unchanged: ['orders.model.lkml', 'views/base.view.lkml', 'views/orders.view.lkml'],
|
||||
},
|
||||
});
|
||||
expect(result.workUnits).toHaveLength(1);
|
||||
const wu = result.workUnits[0];
|
||||
expect(wu.rawFiles).toContain('views/orders_ext.view.lkml');
|
||||
// Ancestors already in rawFiles → not duplicated into dependencyPaths.
|
||||
expect(wu.dependencyPaths).toEqual([]);
|
||||
});
|
||||
|
||||
it('widens dependencyPaths when an ancestor is OUTSIDE the WU (synthesized cross-model case)', () => {
|
||||
// Synthesize a scenario in-memory: two models, "a" owns base.view.lkml,
|
||||
// "b" owns derived.view.lkml which extends base. A diff that only touches
|
||||
// derived.view.lkml should widen b's WU with base.view.lkml in dependencyPaths
|
||||
// if base lives outside b's rawFiles. In practice with the current emit rules,
|
||||
// base.view.lkml would already be in dependencyPaths because model b lists
|
||||
// base.view.lkml under its `include:`. Here we confirm the widening is idempotent.
|
||||
const project: ParsedLookmlProject = {
|
||||
models: [
|
||||
{ path: 'a.model.lkml', name: 'a', includes: ['views/base.view.lkml'], explores: [], connectionName: null },
|
||||
{
|
||||
path: 'b.model.lkml',
|
||||
name: 'b',
|
||||
includes: ['views/base.view.lkml', 'views/derived.view.lkml'],
|
||||
explores: [],
|
||||
connectionName: null,
|
||||
},
|
||||
],
|
||||
views: [
|
||||
{ path: 'views/base.view.lkml', name: 'base', extendsFrom: [], rawSqlTableName: null },
|
||||
{ path: 'views/derived.view.lkml', name: 'derived', extendsFrom: ['base'], rawSqlTableName: null },
|
||||
],
|
||||
dashboards: [],
|
||||
allPaths: ['a.model.lkml', 'b.model.lkml', 'views/base.view.lkml', 'views/derived.view.lkml'],
|
||||
};
|
||||
const result = chunkLookmlProject(project, {
|
||||
diffSet: {
|
||||
added: [],
|
||||
modified: ['views/derived.view.lkml'],
|
||||
deleted: [],
|
||||
unchanged: ['a.model.lkml', 'b.model.lkml', 'views/base.view.lkml'],
|
||||
},
|
||||
});
|
||||
const b = result.workUnits.find((wu) => wu.unitKey === 'lookml-b');
|
||||
expect(b).toBeDefined();
|
||||
if (!b) {
|
||||
throw new Error('expected lookml-b work unit');
|
||||
}
|
||||
expect(b.dependencyPaths).toContain('views/base.view.lkml');
|
||||
});
|
||||
|
||||
it('passes through diffSet.deleted as an EvictionUnit', async () => {
|
||||
const project = await parseLookmlStagedDir(join(FIXTURE_ROOT, 'single-model'));
|
||||
const result = chunkLookmlProject(project, {
|
||||
diffSet: {
|
||||
added: [],
|
||||
modified: [],
|
||||
deleted: ['views/zombie.view.lkml'],
|
||||
unchanged: ['orders.model.lkml', 'views/customers.view.lkml', 'views/orders.view.lkml'],
|
||||
},
|
||||
});
|
||||
expect(result.eviction).toEqual({ deletedRawPaths: ['views/zombie.view.lkml'] });
|
||||
// No WU emitted because no current files are touched.
|
||||
expect(result.workUnits).toEqual([]);
|
||||
});
|
||||
});
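The re-sync tests above boil down to two rules: keep only the work units whose `rawFiles` intersect the added or modified paths, and report deleted paths as a single eviction. A simplified sketch of those two rules follows. The `Sketch*` types and `filterByDiff` are hypothetical names for this example, and the dependency widening that the real `applyDiffSet` also performs is left out on purpose.

```typescript
// Hypothetical, trimmed-down shapes; the real WorkUnit and DiffSet carry more fields.
interface SketchUnit {
  unitKey: string;
  rawFiles: string[];
}
interface SketchDiffSet {
  added: string[];
  modified: string[];
  deleted: string[];
  unchanged: string[];
}
interface SketchResult {
  workUnits: SketchUnit[];
  eviction: { deletedRawPaths: string[] } | undefined;
}

function filterByDiff(units: SketchUnit[], diff: SketchDiffSet): SketchResult {
  // A unit is kept when at least one of its files was added or modified.
  const touched = new Set([...diff.added, ...diff.modified]);
  const workUnits = units.filter((unit) => unit.rawFiles.some((path) => touched.has(path)));
  // Deleted paths never select a unit; they are only reported for eviction.
  const eviction = diff.deleted.length > 0 ? { deletedRawPaths: [...diff.deleted].sort() } : undefined;
  return { workUnits, eviction };
}

const sketchUnits: SketchUnit[] = [
  { unitKey: 'lookml-marketing', rawFiles: ['marketing.model.lkml', 'views/campaigns.view.lkml'] },
  { unitKey: 'lookml-orders', rawFiles: ['orders.model.lkml', 'views/orders.view.lkml'] },
];

const resyncResult = filterByDiff(sketchUnits, {
  added: [],
  modified: ['views/campaigns.view.lkml'],
  deleted: ['views/zombie.view.lkml'],
  unchanged: ['marketing.model.lkml', 'orders.model.lkml', 'views/orders.view.lkml'],
});
```

In this example only the marketing unit survives the filter, and the deleted path shows up in `eviction` without producing a work unit of its own.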
|
||||
159
packages/cli/src/context/ingest/adapters/lookml/chunk.ts
Normal file
|
|
@ -0,0 +1,159 @@
|
|||
import type { ChunkResult, DiffSet, WorkUnit } from '../../types.js';
|
||||
import { buildLookmlGraph, type LookmlGraph } from './graph.js';
|
||||
import type { ParsedLookmlProject } from './parse.js';
|
||||
|
||||
interface ChunkOptions {
|
||||
diffSet?: DiffSet;
|
||||
mismatchedModelNames?: Set<string>;
|
||||
}
|
||||
|
||||
function lookmlSlDisallowedNotes(modelName: string, existingNotes: string): string {
|
||||
return [
|
||||
'[LOOKML SL WRITES DISALLOWED]',
|
||||
'reason: lookml_connection_mismatch',
|
||||
`model: ${modelName}`,
|
||||
'Do not call sl_write_source or sl_edit_source for this WorkUnit.',
|
||||
'Continue wiki extraction and context candidates from the raw LookML files.',
|
||||
'[/LOOKML SL WRITES DISALLOWED]',
|
||||
'',
|
||||
existingNotes,
|
||||
].join('\n');
|
||||
}
|
||||
|
||||
/**
|
||||
* Emit WorkUnits for a parsed LookML project.
|
||||
*
|
||||
* First run (no diffSet): one WU per model + `lookml-orphans` (if any non-owned views)
|
||||
* + `lookml-dashboard-<name>` per dashboard file.
|
||||
*
|
||||
* Re-sync (diffSet provided): filter to WUs whose rawFiles intersect added∪modified;
|
||||
* widen dependencyPaths with the paths of all transitive
|
||||
* `extends` ancestors of the views in the WU's rawFiles that are not already in rawFiles.
|
||||
* Emit a single EvictionUnit for diffSet.deleted.
|
||||
*/
|
||||
export function chunkLookmlProject(project: ParsedLookmlProject, opts: ChunkOptions = {}): ChunkResult {
|
||||
const graph = buildLookmlGraph(project);
|
||||
const firstRunUnits = emitFirstRunWorkUnits(project, graph, opts);
|
||||
if (!opts.diffSet) {
|
||||
return { workUnits: firstRunUnits };
|
||||
}
|
||||
return applyDiffSet(firstRunUnits, project, graph, opts.diffSet);
|
||||
}
|
||||
|
||||
function emitFirstRunWorkUnits(project: ParsedLookmlProject, graph: LookmlGraph, opts: ChunkOptions): WorkUnit[] {
|
||||
const allModelPaths = [...new Set(project.models.map((m) => m.path))].sort();
|
||||
const allDashboardPaths = [...new Set(project.dashboards.map((d) => d.path))].sort();
|
||||
// Dedupe: a .view.lkml with multiple `view:` blocks produces multiple ParsedLookmlView
|
||||
// entries sharing one path.
|
||||
const allViewPaths = [...new Set(project.views.map((v) => v.path))].sort();
|
||||
|
||||
const workUnits: WorkUnit[] = [];
|
||||
|
||||
// Per-model WU, sorted by model name for determinism.
|
||||
const sortedModels = [...project.models].sort((a, b) => a.name.localeCompare(b.name));
|
||||
|
||||
for (const model of sortedModels) {
|
||||
const includedViewPaths = (graph.viewsIncludedByModel.get(model.name) ?? []).filter((p) =>
|
||||
allViewPaths.includes(p),
|
||||
);
|
||||
// Views the model includes and which this model ALSO owns (first-includer-wins).
|
||||
const ownedViewPaths = includedViewPaths.filter((p) => graph.ownerByViewPath.get(p) === model.name);
|
||||
// Views the model includes but that another lexicographically-earlier model owns.
|
||||
// These land in dependencyPaths so this WU's agent can READ them, but the "canonical
|
||||
// write" for those views happens in the owner's WU.
|
||||
const nonOwnedDepViewPaths = includedViewPaths.filter((p) => graph.ownerByViewPath.get(p) !== model.name).sort();
|
||||
|
||||
const rawFiles = [model.path, ...ownedViewPaths].sort();
|
||||
const peerFileIndex = [
|
||||
...allModelPaths.filter((p) => p !== model.path),
|
||||
...allViewPaths.filter((p) => !rawFiles.includes(p) && !nonOwnedDepViewPaths.includes(p)),
|
||||
...allDashboardPaths,
|
||||
].sort();
|
||||
|
||||
const isMismatched = opts.mismatchedModelNames?.has(model.name) ?? false;
|
||||
const notes =
|
||||
model.explores.length > 0
|
||||
? `LookML model "${model.name}" (explores: ${model.explores.join(', ')})`
|
||||
: `LookML model "${model.name}"`;
|
||||
|
||||
workUnits.push({
|
||||
unitKey: `lookml-${model.name}`,
|
||||
displayLabel: `LookML model "${model.name}"`,
|
||||
rawFiles,
|
||||
peerFileIndex,
|
||||
dependencyPaths: nonOwnedDepViewPaths,
|
||||
notes: isMismatched ? lookmlSlDisallowedNotes(model.name, notes) : notes,
|
||||
slDisallowed: isMismatched ? true : undefined,
|
||||
slDisallowedReason: isMismatched ? 'lookml_connection_mismatch' : undefined,
|
||||
});
|
||||
}
|
||||
|
||||
// Orphan view WU — views that no model includes. Skip entirely if none.
|
||||
const orphanViewPaths = allViewPaths.filter((p) => !graph.ownerByViewPath.has(p)).sort();
|
||||
if (orphanViewPaths.length > 0) {
|
||||
workUnits.push({
|
||||
unitKey: 'lookml-orphans',
|
||||
displayLabel: 'LookML orphan views',
|
||||
rawFiles: orphanViewPaths,
|
||||
peerFileIndex: [...allModelPaths, ...allDashboardPaths].sort(),
|
||||
dependencyPaths: [],
|
||||
notes: 'Views not referenced by any .model.lkml (orphaned)',
|
||||
});
|
||||
}
|
||||
|
||||
// One WU per dashboard file.
|
||||
for (const dashboard of [...project.dashboards].sort((a, b) => a.name.localeCompare(b.name))) {
|
||||
workUnits.push({
|
||||
unitKey: `lookml-dashboard-${dashboard.name}`,
|
||||
displayLabel: `LookML dashboard "${dashboard.name}"`,
|
||||
rawFiles: [dashboard.path],
|
||||
peerFileIndex: [...allModelPaths, ...allViewPaths].sort(),
|
||||
dependencyPaths: [],
|
||||
notes: `LookML dashboard "${dashboard.name}"`,
|
||||
});
|
||||
}
|
||||
|
||||
return workUnits;
|
||||
}
|
||||
|
||||
function applyDiffSet(
|
||||
firstRunUnits: WorkUnit[],
|
||||
_project: ParsedLookmlProject,
|
||||
graph: LookmlGraph,
|
||||
diffSet: DiffSet,
|
||||
): ChunkResult {
|
||||
const touched = new Set([...diffSet.added, ...diffSet.modified]);
|
||||
const keptUnits: WorkUnit[] = [];
|
||||
|
||||
for (const wu of firstRunUnits) {
|
||||
const anyTouched = wu.rawFiles.some((p) => touched.has(p));
|
||||
if (!anyTouched) {
|
||||
continue;
|
||||
}
|
||||
|
||||
// Widen dependencyPaths: for every view in rawFiles, add paths of all transitive
|
||||
// extends ancestors (if known in the graph) that aren't already in rawFiles.
|
||||
const existingDeps = new Set(wu.dependencyPaths);
|
||||
for (const rawPath of wu.rawFiles) {
|
||||
const viewNames = graph.viewNamesByPath.get(rawPath) ?? [];
|
||||
for (const viewName of viewNames) {
|
||||
const ancestors = graph.extendsAncestorsByViewName.get(viewName) ?? [];
|
||||
for (const ancestorName of ancestors) {
|
||||
const ancestorPaths = graph.pathsByViewName.get(ancestorName) ?? [];
|
||||
for (const ancestorPath of ancestorPaths) {
|
||||
if (!wu.rawFiles.includes(ancestorPath)) {
|
||||
existingDeps.add(ancestorPath);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
keptUnits.push({
|
||||
...wu,
|
||||
dependencyPaths: [...existingDeps].sort(),
|
||||
});
|
||||
}
|
||||
|
||||
const eviction = diffSet.deleted.length > 0 ? { deletedRawPaths: [...diffSet.deleted].sort() } : undefined;
|
||||
return { workUnits: keptUnits, eviction };
|
||||
}
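The ownership rule that `emitFirstRunWorkUnits` relies on ("first-includer-wins", with models visited in name order) lives in `graph.ts`, which is not shown here. The sketch below is therefore an assumption about how `ownerByViewPath` could be derived, not the actual implementation. `SketchModel`, `assignOwners`, and `splitViews` are hypothetical names, and `includes` is assumed to hold already-resolved view paths rather than glob patterns.

```typescript
// Hypothetical, trimmed-down model shape with resolved include paths.
interface SketchModel {
  name: string;
  includes: string[];
}

// Assumption: the first model in name order that includes a view becomes its owner.
function assignOwners(models: SketchModel[]): Map<string, string> {
  const ownerByViewPath = new Map<string, string>();
  const sorted = [...models].sort((a, b) => a.name.localeCompare(b.name));
  for (const model of sorted) {
    for (const viewPath of model.includes) {
      if (!ownerByViewPath.has(viewPath)) {
        ownerByViewPath.set(viewPath, model.name);
      }
    }
  }
  return ownerByViewPath;
}

// Owned views go to rawFiles; views owned by another model become read-only dependencies.
function splitViews(model: SketchModel, owners: Map<string, string>): { owned: string[]; dependencies: string[] } {
  const owned = model.includes.filter((p) => owners.get(p) === model.name).sort();
  const dependencies = model.includes.filter((p) => owners.get(p) !== model.name).sort();
  return { owned, dependencies };
}

const sketchModels: SketchModel[] = [
  { name: 'orders', includes: ['views/orders.view.lkml', 'views/shared_dims.view.lkml'] },
  { name: 'marketing', includes: ['views/campaigns.view.lkml', 'views/shared_dims.view.lkml'] },
];
const sketchOwners = assignOwners(sketchModels);
const ordersSplit = splitViews(sketchModels[0], sketchOwners);
```

With these inputs the shared view is owned by `marketing`, since it sorts before `orders`, and `orders` sees it only as a dependency. This matches what the multi-model test in `chunk.test.ts` asserts.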
|
||||
|
|
@@ -0,0 +1,46 @@
import { mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { detectLookmlStagedDir } from './detect.js';

describe('detectLookmlStagedDir', () => {
  let stagedDir: string;

  beforeEach(async () => {
    stagedDir = await mkdtemp(join(tmpdir(), 'lkml-detect-'));
  });

  afterEach(async () => rm(stagedDir, { recursive: true, force: true }));

  it('returns true when a .model.lkml is present at root', async () => {
    await writeFile(join(stagedDir, 'orders.model.lkml'), 'include: "views/*"\n', 'utf-8');
    expect(await detectLookmlStagedDir(stagedDir)).toBe(true);
  });

  it('returns true when only a .view.lkml is present (no model)', async () => {
    await writeFile(join(stagedDir, 'x.view.lkml'), 'view: x {}\n', 'utf-8');
    expect(await detectLookmlStagedDir(stagedDir)).toBe(true);
  });

  it('returns true when .lkml files are nested under any subdirectory', async () => {
    await mkdir(join(stagedDir, 'nested', 'deeper'), { recursive: true });
    await writeFile(join(stagedDir, 'nested', 'deeper', 'x.view.lkml'), 'view: x {}\n', 'utf-8');
    expect(await detectLookmlStagedDir(stagedDir)).toBe(true);
  });

  it('accepts the .lookml extension as well as .lkml', async () => {
    await writeFile(join(stagedDir, 'x.view.lookml'), 'view: x {}\n', 'utf-8');
    expect(await detectLookmlStagedDir(stagedDir)).toBe(true);
  });

  it('returns false for a bundle with no .lkml files at all', async () => {
    await writeFile(join(stagedDir, 'README.md'), '# hi\n', 'utf-8');
    await writeFile(join(stagedDir, 'config.yaml'), 'a: 1\n', 'utf-8');
    expect(await detectLookmlStagedDir(stagedDir)).toBe(false);
  });

  it('returns false for an empty directory', async () => {
    expect(await detectLookmlStagedDir(stagedDir)).toBe(false);
  });
});
13
packages/cli/src/context/ingest/adapters/lookml/detect.ts
Normal file
@@ -0,0 +1,13 @@
import { readdir } from 'node:fs/promises';

const LKML_EXT_RE = /\.(lkml|lookml)$/i;

export async function detectLookmlStagedDir(stagedDir: string): Promise<boolean> {
  const entries = await readdir(stagedDir, { withFileTypes: true, recursive: true });
  for (const entry of entries) {
    if (entry.isFile() && LKML_EXT_RE.test(entry.name)) {
      return true;
    }
  }
  return false;
}
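As a rough illustration of the extension check that `detectLookmlStagedDir` relies on, the sketch below applies the same regular expression to bare file names. `isLookmlFileName` and the sample names are hypothetical and not part of the diff; the diff itself inlines the test inside the `readdir` loop.

```typescript
// Same pattern as LKML_EXT_RE in detect.ts: case-insensitive and anchored at the end of the name.
const LKML_EXT_RE = /\.(lkml|lookml)$/i;

// Hypothetical helper for illustration only.
function isLookmlFileName(name: string): boolean {
  return LKML_EXT_RE.test(name);
}

// Made-up sample names: three LookML files, one README, one backup file.
const samples = ['orders.model.lkml', 'x.view.lookml', 'LEGACY.LKML', 'README.md', 'notes.lkml.bak'];
const detected = samples.filter((name) => isLookmlFileName(name));
```

Because the pattern is anchored with `$`, a name such as `notes.lkml.bak` is not treated as LookML, while the `i` flag accepts upper-case extensions.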
@@ -0,0 +1,113 @@
import { mkdtemp, readFile, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import type { ParsedLookmlProject } from './parse.js';
import {
  LOOKML_FETCH_REPORT_FILE,
  LOOKML_MISMATCHED_MODELS_FILE,
  buildLookmlValidationArtifacts,
  readLookmlFetchReport,
  readLookmlMismatchedModelNames,
  writeLookmlValidationArtifacts,
} from './fetch-report.js';

function project(models: ParsedLookmlProject['models']): ParsedLookmlProject {
  return { models, views: [], dashboards: [], allPaths: models.map((m) => m.path) };
}

describe('LookML validation fetch report', () => {
  let stagedDir: string;

  beforeEach(async () => {
    stagedDir = await mkdtemp(join(tmpdir(), 'lookml-report-'));
  });

  afterEach(async () => rm(stagedDir, { recursive: true, force: true }));

  it('emits partial warning artifacts for mismatched model connection names', async () => {
    const artifacts = buildLookmlValidationArtifacts(
      project([
        {
          path: 'b2b.model.lkml',
          name: 'b2b',
          includes: [],
          explores: ['orders'],
          connectionName: 'staging_pg',
        },
        {
          path: 'finance.model.lkml',
          name: 'finance',
          includes: [],
          explores: ['revenue'],
          connectionName: 'b2b_sandbox_bq',
        },
      ]),
      { expectedLookerConnectionName: 'b2b_sandbox_bq' },
    );

    expect(artifacts.mismatchedModelNames).toEqual(['b2b']);
    expect(artifacts.report.status).toBe('partial');
    expect(artifacts.report.warnings).toEqual([
      {
        rawPath: 'b2b.model.lkml',
        entityType: 'lookml_models',
        entityId: 'b2b',
        severity: 'warning',
        statusCode: null,
        message:
          'LookML model b2b declares connection staging_pg but this warehouse expects b2b_sandbox_bq; SL writes are disabled for this model.',
        retryRecommended: false,
        kind: 'lookml_connection_mismatch',
        details: { model: 'b2b', declared: 'staging_pg', expected: 'b2b_sandbox_bq' },
      },
    ]);
  });

  it('emits success when no expected connection is configured', () => {
    const artifacts = buildLookmlValidationArtifacts(
      project([
        {
          path: 'b2b.model.lkml',
          name: 'b2b',
          includes: [],
          explores: [],
          connectionName: 'staging_pg',
        },
      ]),
      { expectedLookerConnectionName: null },
    );

    expect(artifacts.mismatchedModelNames).toEqual([]);
    expect(artifacts.report).toEqual({
      status: 'success',
      retryRecommended: false,
      skipped: [],
      warnings: [],
    });
  });

  it('round-trips the fetch report and mismatched model sidecar', async () => {
    const artifacts = buildLookmlValidationArtifacts(
      project([
        {
          path: 'orders.model.lkml',
          name: 'orders',
          includes: [],
          explores: [],
          connectionName: 'wrong',
        },
      ]),
      { expectedLookerConnectionName: 'expected' },
    );

    await writeLookmlValidationArtifacts(stagedDir, artifacts);

    await expect(readFile(join(stagedDir, LOOKML_FETCH_REPORT_FILE), 'utf-8')).resolves.toContain(
      'lookml_connection_mismatch',
    );
    await expect(readFile(join(stagedDir, LOOKML_MISMATCHED_MODELS_FILE), 'utf-8')).resolves.toContain('orders');
    await expect(readLookmlFetchReport(stagedDir)).resolves.toEqual(artifacts.report);
    await expect(readLookmlMismatchedModelNames(stagedDir)).resolves.toEqual(new Set(['orders']));
  });
});
127
packages/cli/src/context/ingest/adapters/lookml/fetch-report.ts
Normal file
@@ -0,0 +1,127 @@
import { mkdir, readFile, writeFile } from 'node:fs/promises';
import { dirname, join } from 'node:path';
import * as z from 'zod';
import type { SourceFetchReport } from '../../types.js';
import type { ParsedLookmlProject } from './parse.js';

/** @internal */
export const LOOKML_FETCH_REPORT_FILE = 'lookml-fetch-report.json';
/** @internal */
export const LOOKML_MISMATCHED_MODELS_FILE = 'lookml-mismatched-models.json';

const fetchIssueKindSchema = z.enum([
  'unmapped_looker_connection',
  'unparseable_sql_table_name',
  'looker_template_unresolved',
  'derived_table_not_supported',
  'lookml_connection_mismatch',
]);

const fetchIssueSchema = z.object({
  rawPath: z.string().min(1),
  entityType: z.string().min(1),
  entityId: z.string().nullable(),
  severity: z.enum(['warning', 'error']),
  statusCode: z.number().int().nullable(),
  message: z.string().min(1),
  retryRecommended: z.boolean(),
  kind: fetchIssueKindSchema.optional(),
  details: z.record(z.string(), z.unknown()).optional(),
});

const fetchReportSchema = z.object({
  status: z.enum(['success', 'partial']),
  retryRecommended: z.boolean(),
  skipped: z.array(fetchIssueSchema),
  warnings: z.array(fetchIssueSchema),
});

const mismatchedModelsSchema = z.object({
  modelNames: z.array(z.string().min(1)).default([]),
});

interface LookmlValidationArtifacts {
  report: SourceFetchReport;
  mismatchedModelNames: string[];
}

export function buildLookmlValidationArtifacts(
  project: ParsedLookmlProject,
  config: { expectedLookerConnectionName: string | null },
): LookmlValidationArtifacts {
  const expected = config.expectedLookerConnectionName;
  if (!expected) {
    return {
      report: { status: 'success', retryRecommended: false, skipped: [], warnings: [] },
      mismatchedModelNames: [],
    };
  }

  const mismatched = project.models
    .filter((model) => model.connectionName !== null && model.connectionName !== expected)
    .sort((a, b) => a.name.localeCompare(b.name));

  const warnings = mismatched.map((model) => {
    const declared = model.connectionName ?? '(none)';
    return {
      rawPath: model.path,
      entityType: 'lookml_models',
      entityId: model.name,
      severity: 'warning' as const,
      statusCode: null,
      message: `LookML model ${model.name} declares connection ${declared} but this warehouse expects ${expected}; SL writes are disabled for this model.`,
      retryRecommended: false,
      kind: 'lookml_connection_mismatch' as const,
      details: { model: model.name, declared, expected },
    };
  });

  return {
    report: {
      status: warnings.length > 0 ? 'partial' : 'success',
      retryRecommended: false,
      skipped: [],
      warnings,
    },
    mismatchedModelNames: mismatched.map((model) => model.name),
  };
}

export async function writeLookmlValidationArtifacts(
  stagedDir: string,
  artifacts: LookmlValidationArtifacts,
): Promise<void> {
  const reportPath = join(stagedDir, LOOKML_FETCH_REPORT_FILE);
  await mkdir(dirname(reportPath), { recursive: true });
  await writeFile(reportPath, `${JSON.stringify(fetchReportSchema.parse(artifacts.report), null, 2)}\n`, 'utf-8');
  await writeFile(
    join(stagedDir, LOOKML_MISMATCHED_MODELS_FILE),
    `${JSON.stringify({ modelNames: artifacts.mismatchedModelNames }, null, 2)}\n`,
    'utf-8',
  );
}

export async function readLookmlFetchReport(stagedDir: string): Promise<SourceFetchReport | null> {
  try {
    const raw = await readFile(join(stagedDir, LOOKML_FETCH_REPORT_FILE), 'utf-8');
    return fetchReportSchema.parse(JSON.parse(raw));
  } catch (error) {
    if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
      return null;
    }
    throw error;
  }
}

export async function readLookmlMismatchedModelNames(stagedDir: string): Promise<Set<string>> {
  try {
    const raw = await readFile(join(stagedDir, LOOKML_MISMATCHED_MODELS_FILE), 'utf-8');
    const parsed = mismatchedModelsSchema.parse(JSON.parse(raw));
    return new Set(parsed.modelNames);
  } catch (error) {
    if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
      return new Set();
    }
    throw error;
  }
}
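The mismatch rule in `buildLookmlValidationArtifacts` can be read in isolation as a small pure function. The sketch below is a simplified restatement under stated assumptions: `ModelLike` and `findMismatchedModelNames` are hypothetical names, and the model shape is cut down to the two fields the filter actually reads.

```typescript
// Simplified stand-in for the model entries of ParsedLookmlProject (assumption: only these fields matter here).
interface ModelLike {
  name: string;
  connectionName: string | null;
}

// Returns the sorted names of models whose declared connection differs from the expected one.
// As in the diff, a missing expectation disables the check, and models that declare
// no connection at all are never reported as mismatched.
function findMismatchedModelNames(models: ModelLike[], expected: string | null): string[] {
  if (!expected) {
    return [];
  }
  return models
    .filter((model) => model.connectionName !== null && model.connectionName !== expected)
    .map((model) => model.name)
    .sort((a, b) => a.localeCompare(b));
}

// Sample data loosely based on the test fixtures, plus one model without a connection.
const sampleModels: ModelLike[] = [
  { name: 'finance', connectionName: 'b2b_sandbox_bq' },
  { name: 'b2b', connectionName: 'staging_pg' },
  { name: 'scratch', connectionName: null },
];

const mismatched = findMismatchedModelNames(sampleModels, 'b2b_sandbox_bq');
const unchecked = findMismatchedModelNames(sampleModels, null);
```

With this input only `b2b` is flagged. One consequence of the filter is that the `?? '(none)'` fallback in the warning builder is never reached for a mismatched model, since null connections are excluded beforehand.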
146
packages/cli/src/context/ingest/adapters/lookml/fetch.test.ts
Normal file
@@ -0,0 +1,146 @@
import { mkdir, mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { makeLocalGitRepo } from '../../../test/make-local-git-repo.js';
import { fetchLookmlRepo } from './fetch.js';
import type { LookmlPullConfig } from './pull-config.js';

const FIXTURE_ROOT = join(__dirname, '../../../../test/fixtures/lookml');

function pullConfig(overrides: Partial<LookmlPullConfig> & Pick<LookmlPullConfig, 'repoUrl'>): LookmlPullConfig {
  return {
    branch: 'main',
    path: null,
    authToken: null,
    expectedLookerConnectionName: null,
    parsedTargetTables: {},
    ...overrides,
  };
}

describe('fetchLookmlRepo', () => {
  let tmpRoot: string;

  beforeEach(async () => {
    tmpRoot = await mkdtemp(join(tmpdir(), 'fetch-lookml-'));
  });

  afterEach(async () => rm(tmpRoot, { recursive: true, force: true }));

  it('clones a local file:// repo and materializes only .lkml/.lookml files into stagedDir', async () => {
    const repo = await makeLocalGitRepo(join(FIXTURE_ROOT, 'single-model'), join(tmpRoot, 'origin'));
    // Add a non-LookML file to prove we filter it out.
    await repo.writeFile('README.md', '# readme\n');
    await repo.commit('add readme');

    const stagedDir = join(tmpRoot, 'staged');
    const cacheDir = join(tmpRoot, 'cache', 'conn-1');
    await mkdir(stagedDir, { recursive: true });

    const result = await fetchLookmlRepo({
      config: pullConfig({ repoUrl: repo.repoUrl }),
      cacheDir,
      stagedDir,
    });

    expect(result.filesCopied).toBe(3); // orders.model.lkml + 2 views
    expect(result.commitHash).toMatch(/^[0-9a-f]{40}$/);
    await expect(readFile(join(stagedDir, 'orders.model.lkml'), 'utf-8')).resolves.toMatch(/connection:/);
    await expect(readFile(join(stagedDir, 'views', 'orders.view.lkml'), 'utf-8')).resolves.toMatch(/view: orders/);
    // README.md is present in the cache but NOT in stagedDir.
    await expect(readFile(join(stagedDir, 'README.md'), 'utf-8')).rejects.toThrow();
    await expect(readFile(join(cacheDir, 'README.md'), 'utf-8')).resolves.toMatch(/readme/);
  });

  it('pulls an existing cache dir (second call) and surfaces the new commit', async () => {
    const repo = await makeLocalGitRepo(join(FIXTURE_ROOT, 'single-model'), join(tmpRoot, 'origin'));
    const stagedDir1 = join(tmpRoot, 'staged-1');
    const stagedDir2 = join(tmpRoot, 'staged-2');
    const cacheDir = join(tmpRoot, 'cache', 'conn-1');
    await mkdir(stagedDir1, { recursive: true });
    await mkdir(stagedDir2, { recursive: true });

    const r1 = await fetchLookmlRepo({
      config: pullConfig({ repoUrl: repo.repoUrl }),
      cacheDir,
      stagedDir: stagedDir1,
    });

    // Commit a new revision in the origin — a modified view.
    await repo.writeFile('views/orders.view.lkml', 'view: orders { sql_table_name: public.orders_v2 ;; }\n');
    await repo.commit('bump');

    const r2 = await fetchLookmlRepo({
      config: pullConfig({ repoUrl: repo.repoUrl }),
      cacheDir,
      stagedDir: stagedDir2,
    });
    expect(r2.commitHash).not.toBe(r1.commitHash);
    await expect(readFile(join(stagedDir2, 'views', 'orders.view.lkml'), 'utf-8')).resolves.toMatch(/orders_v2/);
  });

  it('respects config.path — only files under that subtree land in stagedDir', async () => {
    // Build a multi-subdir repo: models/... + views/...
    const originRoot = join(tmpRoot, 'origin');
    await mkdir(originRoot, { recursive: true });
    await mkdir(join(originRoot, 'fixture-src', 'models'), { recursive: true });
    await mkdir(join(originRoot, 'fixture-src', 'views'), { recursive: true });
    await writeFile(join(originRoot, 'fixture-src', 'models', 'orders.model.lkml'), 'connection: "c"\n', 'utf-8');
    await writeFile(join(originRoot, 'fixture-src', 'views', 'orders.view.lkml'), 'view: orders {}\n', 'utf-8');
    const repo = await makeLocalGitRepo(join(originRoot, 'fixture-src'), join(originRoot, 'git'));

    const stagedDir = join(tmpRoot, 'staged');
    const cacheDir = join(tmpRoot, 'cache', 'conn-path');
    await mkdir(stagedDir, { recursive: true });

    const result = await fetchLookmlRepo({
      config: pullConfig({ repoUrl: repo.repoUrl, path: 'views' }),
      cacheDir,
      stagedDir,
    });
    expect(result.filesCopied).toBe(1);
    await expect(readFile(join(stagedDir, 'orders.view.lkml'), 'utf-8')).resolves.toMatch(/view: orders/);
    // The model under `models/` is NOT copied because we scoped to `views/`.
    await expect(readFile(join(stagedDir, 'orders.model.lkml'), 'utf-8')).rejects.toThrow();
  });

  it('falls back to fresh clone when the cache dir is corrupt', async () => {
    const repo = await makeLocalGitRepo(join(FIXTURE_ROOT, 'single-model'), join(tmpRoot, 'origin'));
    const stagedDir = join(tmpRoot, 'staged');
    const cacheDir = join(tmpRoot, 'cache', 'conn-bad');
    await mkdir(stagedDir, { recursive: true });

    // Pre-create a cacheDir that looks like a git repo but is corrupt.
    await mkdir(join(cacheDir, '.git'), { recursive: true });
    await writeFile(join(cacheDir, '.git', 'HEAD'), 'garbage\n', 'utf-8');

    const result = await fetchLookmlRepo({
      config: pullConfig({ repoUrl: repo.repoUrl }),
      cacheDir,
      stagedDir,
    });
    expect(result.filesCopied).toBeGreaterThan(0);
  });

  it('sanitizes auth tokens out of error messages when clone fails', async () => {
    const stagedDir = join(tmpRoot, 'staged');
    const cacheDir = join(tmpRoot, 'cache', 'conn-bad-url');
    await mkdir(stagedDir, { recursive: true });

    await expect(
      fetchLookmlRepo({
        config: pullConfig({
          repoUrl: 'http://definitely-not-a-real-host.test/r.git',
          authToken: 'supersecret-token',
        }),
        cacheDir,
        stagedDir,
      }),
    ).rejects.toThrow(
      // Error is thrown with sanitized message — the token is replaced by '***'.
      // The exact message depends on simple-git's failure mode; we assert the token does NOT appear.
      expect.objectContaining({ message: expect.not.stringContaining('supersecret-token') }),
    );
  });
});
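The last test above only asserts that the token is absent from the error message. One way such redaction could work is sketched below. `redactToken` and the sample message are made up for illustration; the real `sanitizeRepoError` imported from `../../repo-fetch.js` is not shown in this part of the diff and may behave differently.

```typescript
// Hypothetical redaction helper: replaces every occurrence of the token with '***'.
function redactToken(message: string, token: string | null): string {
  if (!token) {
    return message;
  }
  // A token embedded in a URL may appear percent-encoded, so both spellings are redacted.
  const variants = [token, encodeURIComponent(token)];
  let redacted = message;
  for (const variant of variants) {
    // split/join replaces all occurrences without treating the token as a regex.
    redacted = redacted.split(variant).join('***');
  }
  return redacted;
}

// Made-up error text, not real git or simple-git output.
const rawMessage = 'clone failed for http://user:supersecret-token@host.test/r.git';
const safeMessage = redactToken(rawMessage, 'supersecret-token');
```

Using `split`/`join` rather than a regular expression avoids having to escape tokens that contain regex metacharacters.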
75
packages/cli/src/context/ingest/adapters/lookml/fetch.ts
Normal file
@@ -0,0 +1,75 @@
import { access, copyFile, mkdir, readdir } from 'node:fs/promises';
import { join, relative } from 'node:path';
import { cloneOrPull, sanitizeRepoError } from '../../repo-fetch.js';
import type { LookmlPullConfig } from './pull-config.js';

export interface FetchLookmlRepoParams {
  config: LookmlPullConfig;
  /** Persistent cache directory (typically per-connection). Cloned here once, pulled on subsequent calls. */
  cacheDir: string;
  /** Per-job staged directory that the adapter writes `.lkml`/`.lookml` files into. */
  stagedDir: string;
}

export interface FetchLookmlRepoResult {
  /** SHA of the repo HEAD after the pull. */
  commitHash: string;
  /** Number of LookML files copied into `stagedDir`. */
  filesCopied: number;
}

const LKML_EXT_RE = /\.(lkml|lookml)$/i;

export async function fetchLookmlRepo(params: FetchLookmlRepoParams): Promise<FetchLookmlRepoResult> {
  const { config, cacheDir, stagedDir } = params;
  const branch = config.branch || 'main';

  try {
    const { commitHash } = await cloneOrPull({
      repoUrl: config.repoUrl,
      authToken: config.authToken,
      cacheDir,
      branch,
    });

    const sourceRoot = config.path ? join(cacheDir, config.path) : cacheDir;
    const filesCopied = await copyLkmlFilesRecursive(sourceRoot, stagedDir);

    return { commitHash, filesCopied };
  } catch (err) {
    throw new Error(sanitizeRepoError(err, config.authToken));
  }
}

async function copyLkmlFilesRecursive(sourceRoot: string, destRoot: string): Promise<number> {
  if (!(await dirExists(sourceRoot))) {
    return 0;
  }
  await mkdir(destRoot, { recursive: true });
  const entries = await readdir(sourceRoot, { withFileTypes: true, recursive: true });
  let copied = 0;
  for (const entry of entries) {
    if (!entry.isFile()) {
      continue;
    }
    if (!LKML_EXT_RE.test(entry.name)) {
      continue;
    }
    const absSrc = join(entry.parentPath, entry.name);
    const rel = relative(sourceRoot, absSrc);
    const dest = join(destRoot, rel);
    await mkdir(join(dest, '..'), { recursive: true });
    await copyFile(absSrc, dest);
    copied++;
  }
  return copied;
}

async function dirExists(path: string): Promise<boolean> {
  try {
    await access(path);
    return true;
  } catch {
    return false;
  }
}
118
packages/cli/src/context/ingest/adapters/lookml/graph.test.ts
Normal file
@@ -0,0 +1,118 @@
import { describe, expect, it } from 'vitest';
import { buildLookmlGraph } from './graph.js';
import type { ParsedLookmlProject } from './parse.js';

type LooseParsedLookmlProject = Omit<Partial<ParsedLookmlProject>, 'models' | 'views'> & {
  models?: Array<Omit<ParsedLookmlProject['models'][number], 'connectionName'> & { connectionName?: string | null }>;
  views?: Array<Omit<ParsedLookmlProject['views'][number], 'rawSqlTableName'> & { rawSqlTableName?: string | null }>;
};

const mkProject = (overrides: LooseParsedLookmlProject): ParsedLookmlProject => ({
  dashboards: [],
  allPaths: [],
  ...overrides,
  models: (overrides.models ?? []).map((model) => ({ connectionName: null, ...model })),
  views: (overrides.views ?? []).map((view) => ({ rawSqlTableName: null, ...view })),
});

describe('buildLookmlGraph', () => {
  it('assigns a single model as owner of all its included views', () => {
    const project = mkProject({
      models: [{ path: 'orders.model.lkml', name: 'orders', includes: ['views/*.view.lkml'], explores: ['orders'] }],
      views: [
        { path: 'views/orders.view.lkml', name: 'orders', extendsFrom: [] },
        { path: 'views/customers.view.lkml', name: 'customers', extendsFrom: [] },
      ],
      allPaths: ['orders.model.lkml', 'views/customers.view.lkml', 'views/orders.view.lkml'],
    });
    const graph = buildLookmlGraph(project);
    expect(graph.ownerByViewPath.get('views/orders.view.lkml')).toBe('orders');
    expect(graph.ownerByViewPath.get('views/customers.view.lkml')).toBe('orders');
    expect(graph.viewsIncludedByModel.get('orders')?.sort()).toEqual([
      'views/customers.view.lkml',
      'views/orders.view.lkml',
    ]);
  });

  it('assigns shared views to the lexicographically-first model that includes them', () => {
    const project = mkProject({
      models: [
        { path: 'marketing.model.lkml', name: 'marketing', includes: ['views/shared.view.lkml'], explores: [] },
        {
          path: 'orders.model.lkml',
          name: 'orders',
          includes: ['views/shared.view.lkml', 'views/orders.view.lkml'],
          explores: [],
        },
      ],
      views: [
        { path: 'views/shared.view.lkml', name: 'shared', extendsFrom: [] },
        { path: 'views/orders.view.lkml', name: 'orders', extendsFrom: [] },
      ],
      allPaths: ['marketing.model.lkml', 'orders.model.lkml', 'views/orders.view.lkml', 'views/shared.view.lkml'],
    });
    const graph = buildLookmlGraph(project);
    // "marketing" sorts before "orders", so marketing owns the shared view.
    expect(graph.ownerByViewPath.get('views/shared.view.lkml')).toBe('marketing');
    expect(graph.ownerByViewPath.get('views/orders.view.lkml')).toBe('orders');
    // Both models list the shared view in their include set:
    expect(graph.includersByViewPath.get('views/shared.view.lkml')?.sort()).toEqual(['marketing', 'orders']);
  });

  it('resolves transitive extends chains into dependency paths', () => {
    const project = mkProject({
      models: [{ path: 'orders.model.lkml', name: 'orders', includes: ['views/*.view.lkml'], explores: [] }],
      views: [
        { path: 'views/base.view.lkml', name: 'base', extendsFrom: [] },
        { path: 'views/orders.view.lkml', name: 'orders', extendsFrom: ['base'] },
        { path: 'views/orders_ext.view.lkml', name: 'orders_ext', extendsFrom: ['orders'] },
      ],
      allPaths: ['orders.model.lkml', 'views/base.view.lkml', 'views/orders.view.lkml', 'views/orders_ext.view.lkml'],
    });
    const graph = buildLookmlGraph(project);
    expect(graph.extendsAncestorsByViewName.get('orders_ext')?.sort()).toEqual(['base', 'orders']);
    expect(graph.extendsAncestorsByViewName.get('orders')?.sort()).toEqual(['base']);
    expect(graph.extendsAncestorsByViewName.get('base')?.sort()).toEqual([]);
  });

  it('resolves glob-style include patterns (views/*.view.lkml) against allPaths', () => {
    const project = mkProject({
      models: [{ path: 'orders.model.lkml', name: 'orders', includes: ['views/*.view.lkml'], explores: [] }],
      views: [
        { path: 'views/a.view.lkml', name: 'a', extendsFrom: [] },
        { path: 'views/sub/b.view.lkml', name: 'b', extendsFrom: [] },
      ],
      allPaths: ['orders.model.lkml', 'views/a.view.lkml', 'views/sub/b.view.lkml'],
    });
    const graph = buildLookmlGraph(project);
    // Single-star glob matches one path segment — "views/sub/b.view.lkml" is NOT matched.
    expect(graph.viewsIncludedByModel.get('orders')?.sort()).toEqual(['views/a.view.lkml']);
  });

  it('resolves double-star include patterns (views/**/*.view.lkml) recursively', () => {
    const project = mkProject({
      models: [{ path: 'orders.model.lkml', name: 'orders', includes: ['views/**/*.view.lkml'], explores: [] }],
      views: [
        { path: 'views/a.view.lkml', name: 'a', extendsFrom: [] },
        { path: 'views/sub/b.view.lkml', name: 'b', extendsFrom: [] },
      ],
      allPaths: ['orders.model.lkml', 'views/a.view.lkml', 'views/sub/b.view.lkml'],
    });
    const graph = buildLookmlGraph(project);
    expect(graph.viewsIncludedByModel.get('orders')?.sort()).toEqual(['views/a.view.lkml', 'views/sub/b.view.lkml']);
  });

  it('leaves a view ownerless when no model includes it', () => {
    const project = mkProject({
      models: [{ path: 'other.model.lkml', name: 'other', includes: ['views/included.view.lkml'], explores: [] }],
      views: [
        { path: 'views/included.view.lkml', name: 'included', extendsFrom: [] },
        { path: 'views/orphan.view.lkml', name: 'orphan', extendsFrom: [] },
      ],
      allPaths: ['other.model.lkml', 'views/included.view.lkml', 'views/orphan.view.lkml'],
    });
    const graph = buildLookmlGraph(project);
    expect(graph.ownerByViewPath.has('views/orphan.view.lkml')).toBe(false);
    expect(graph.ownerByViewPath.get('views/included.view.lkml')).toBe('other');
  });
});
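The tests above pin down two behaviors: a single `*` stays within one path segment while `**/` recurses, and a shared view belongs to the lexicographically first model that includes it. The sketch below shows one way those rules could be implemented. `graph.ts` itself is not shown in this part of the diff, so `includePatternToRegExp`, `assignOwners`, and `IncludingModel` are hypothetical names and the real implementation may differ.

```typescript
// Hypothetical helper: turns an include pattern into an anchored RegExp.
// `*` matches within one path segment; `**/` matches zero or more whole directories.
function includePatternToRegExp(pattern: string): RegExp {
  let source = '';
  let i = 0;
  while (i < pattern.length) {
    if (pattern.slice(i, i + 3) === '**/') {
      source += '(?:[^/]+/)*';
      i += 3;
    } else if (pattern.charAt(i) === '*') {
      source += '[^/]*';
      i += 1;
    } else {
      // Escape regex metacharacters so "." in ".view.lkml" matches literally.
      source += pattern.charAt(i).replace(/[.+?^${}()|[\]\\]/g, '\\$&');
      i += 1;
    }
  }
  return new RegExp('^' + source + '$');
}

// Cut-down model shape (assumption: only the name and include patterns matter here).
interface IncludingModel {
  name: string;
  includes: string[];
}

// Assigns each view path to the lexicographically first model whose includes match it.
// Views that no model includes are simply absent from the result.
function assignOwners(models: IncludingModel[], viewPaths: string[]): Record<string, string> {
  const owners: Record<string, string> = {};
  const sorted = models.slice().sort((a, b) => a.name.localeCompare(b.name));
  for (const model of sorted) {
    const matchers = model.includes.map((pattern) => includePatternToRegExp(pattern));
    for (const viewPath of viewPaths) {
      if (!(viewPath in owners) && matchers.some((re) => re.test(viewPath))) {
        owners[viewPath] = model.name;
      }
    }
  }
  return owners;
}

const owners = assignOwners(
  [
    { name: 'orders', includes: ['views/shared.view.lkml', 'views/orders.view.lkml'] },
    { name: 'marketing', includes: ['views/shared.view.lkml'] },
  ],
  ['views/shared.view.lkml', 'views/orders.view.lkml', 'views/orphan.view.lkml'],
);
```

Sorting the models once up front means the first match wins without any later tie-breaking, which is the deterministic behavior the ownership test expects.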
Some files were not shown because too many files have changed in this diff.