chore(workspace): gate dead-code with knip production mode (#196)

* refactor(workspace): relocate @ktx/llm source into packages/cli/src/llm

* refactor(workspace): rewrite @ktx/llm imports to relative paths

* refactor(workspace): fold internal packages into cli

* chore(workspace): gate dead-code with knip production mode

Turn on production-mode knip plus an autofix run in pre-commit and the
`pnpm dead-code` script, document the `/** @internal */` convention for
test-only exports in AGENTS.md, annotate test-only exports across the
CLI with that JSDoc, and drop dead exports/wrappers the new gate
surfaced (e.g. `cli-project.ts`, `lookerRuntimeSourceToFileAdapterSource`,
`createLocalScanEnrichmentProvidersFromConfig`,
`PGLITE_OWNER_PROCESS_BACKEND_CAPABILITIES`, stale type re-exports).
Replace the loose `ignoreIssues` allowlist in `knip.json` with explicit
production entries so cross-package barrel leaks are caught.
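
A minimal sketch of the `/** @internal */` convention, assuming a hypothetical module (`formatRowCount` and `pluralize` are invented for illustration and are not part of this change). The helper is exported only so a test can import it, and the annotation is what is expected to keep the production-mode knip run from reporting that export as unused:

```typescript
// Hypothetical module; neither function exists in the repository.

/** Production export: imported by production code, so knip sees a real consumer. */
export function formatRowCount(count: number): string {
  return `${count} ${pluralize('row', count)}`;
}

// Exported only so a unit test can call it directly. The one-line JSDoc tag
// below is the marker that production-mode knip is expected to honor.
/** @internal */
export function pluralize(noun: string, count: number): string {
  return count === 1 ? noun : `${noun}s`;
}
```

Keeping the tag on the export itself, rather than in a central allowlist, means the exemption moves or disappears together with the symbol.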

* refactor(cli): delete internal barrel index.ts files

The 34 `index.ts` re-export barrels inside `packages/cli/src/` were
holdovers from the pre-fold multi-workspace structure. After the fold they
served no production purpose: external consumers go through the single
package main entry, and in-repo callers mostly imported through them
only because the path was short. knip flagged most barrel re-exports as
production-dead (reached only via tests).

This change:
- Deletes every internal barrel except `packages/cli/src/index.ts`
  (the published package entry).
- Rewrites ~270 source/test files to import each name directly from
  the file that defines it.
- Moves `tools/warehouse-verification/index.ts` to
  `create-warehouse-verification-tools.ts` (the function it defined
  locally) and updates its single consumer.
- Renames `search/backend-conformance.ts` → `.test-utils.ts` to match
  the existing test-helper file convention.
- Deletes 13 dead test-only chains (dbt-descriptions/*,
  live-database/extracted-schema, live-database/structural-sync,
  relationship-* feedback/review chain) plus their tests and a
  cascading orphan integration test.
- Updates test mocks that pointed at deleted barrel paths
  (notion-client, connector barrels in scan/local-scan-connectors
  tests) to mock the source files instead.
- Points the maintainer benchmark script
  (`scripts/relationship-benchmark-report.mjs`) at source files
  instead of `dist/context/scan/index.js`.
- Drops the barrel `!` entries from `knip.json`; adds explicit
  production entries only for the benchmark code reached via dist by
  the maintainer script.
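
The import rewrite described in the list above amounts to regrouping each imported name by the file that defines it. The sketch below illustrates that rule only; `rewriteBarrelImport` and the `definitions` index are hypothetical, and the commit does not say what tooling performed the rewrite:

```typescript
// Hypothetical lookup from an exported name to the file that defines it.
type DefinitionIndex = ReadonlyMap<string, string>;

// Turns the names once imported from a barrel into one import line per defining file.
function rewriteBarrelImport(names: readonly string[], definitions: DefinitionIndex): string[] {
  const byFile = new Map<string, string[]>();
  for (const name of names) {
    const file = definitions.get(name);
    if (file === undefined) {
      throw new Error(`No defining file known for ${name}`);
    }
    const group = byFile.get(file) ?? [];
    group.push(name);
    byFile.set(file, group);
  }
  // Sort files and names with plain string comparison so the output is deterministic.
  return [...byFile.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([file, group]) => `import { ${[...group].sort().join(', ')} } from '${file}';`);
}

// Example index; the file names are illustrative.
const exampleDefinitions: DefinitionIndex = new Map([
  ['parseDbtSchemaFiles', './parse-schema.js'],
  ['chunkDbtProject', './chunk.js'],
]);
const exampleImports = rewriteBarrelImport(['parseDbtSchemaFiles', 'chunkDbtProject'], exampleDefinitions);
```

Throwing on an unknown name, rather than leaving the barrel import in place, makes a missed definition visible immediately.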

Net: 413 files changed, ~1.2k insertions, ~9.4k deletions.

`pnpm run dead-code` (Biome + knip default + knip production) and
`pnpm run type-check` are clean; 2277 tests pass.

* refactor(workspace): rename @ktx/cli to @kaelio/ktx and pack it directly

Promote the CLI workspace package to the public name `@kaelio/ktx` and
drop the separate `scripts/build-public-npm-package.mjs` wrapper. The
CLI package is now publishable in place (`publishConfig.access: public`,
`provenance: true`), so artifact packing uses `pnpm pack` against
`packages/cli/` instead of assembling a parallel package tree.

Updates all workspace filter invocations, docs, tests, and release
readiness checks to reference the new package name, and folds the
tarball-name helper into `scripts/public-npm-release-metadata.mjs`.

* docs: align "agent clients" and "data agents" terminology

Replace "client agents" with "agent clients" and "database agents" with
"data agents" across AGENTS.md, README.md, the docs-site copy, and the
matching setup-agents test description, matching the canonical
vocabulary in docs/terminology.md.

Also moves packages/cli/tsconfig.json's tsBuildInfoFile from
node_modules/.cache/ to dist/.tsbuildinfo so incremental builds survive
node_modules reinstalls.

* refactor(release): single source of truth for package version

Make packages/cli/package.json the single source of truth for the
@kaelio/ktx version. publicNpmPackageVersion() now reads it directly,
so artifact filenames, release-readiness checks, and the Python wheel
version all derive from one field. The duplicate
release-policy.json.publicNpmPackageVersion is removed.

Previously the two fields could drift: tarballs were named
kaelio-ktx-0.4.1.tgz while internally containing
@kaelio/ktx@0.0.0-private.

- update-public-release-version.mjs rewrites both Python pyproject.toml
  files (ktx-daemon, ktx-sl) alongside the npm package.jsons,
  normalizing the version for PEP 440 (e.g. 0.1.0-rc.2 -> 0.1.0rc2).
- semantic-release-config.cjs adds the two pyproject.toml files to
  @semantic-release/git assets so the release commit back to main
  carries every version source in lockstep.
- The six "?? '0.0.0-private'" fallback literals across the CLI are
  replaced with "?? getKtxCliPackageInfo().version", and
  createDefaultKtxMcpServer makes its version arg required.
- docs/release.md describes the actual commit-back model: the dev tree
  always reflects the most recent release; no sentinel pin to
  maintain.
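
A rough sketch of the PEP 440 normalization mentioned in the list above, assuming versions of the form `MAJOR.MINOR.PATCH` with an optional `-<label>.<n>` suffix. `toPep440` is a hypothetical helper, and the `alpha` and `beta` mappings are assumptions; the commit only shows the `rc` case:

```typescript
// Hypothetical helper; the real script's implementation is not shown in this commit.
// PEP 440 spells pre-release segments as aN, bN, and rcN.
const PRE_RELEASE_LABELS: Record<string, string> = { alpha: 'a', beta: 'b', rc: 'rc' };

function toPep440(version: string): string {
  // Accepts MAJOR.MINOR.PATCH with an optional "-<label>.<n>" pre-release suffix.
  const match = /^(\d+\.\d+\.\d+)(?:-(alpha|beta|rc)\.(\d+))?$/.exec(version);
  if (!match) {
    throw new Error(`Unsupported version format: ${version}`);
  }
  const release = match[1]!;
  const label = match[2];
  const serial = match[3];
  if (label === undefined || serial === undefined) {
    return release; // A plain release is already valid PEP 440.
  }
  return `${release}${PRE_RELEASE_LABELS[label]}${serial}`;
}
```

With this sketch, `toPep440('0.1.0-rc.2')` yields `0.1.0rc2`, matching the example in the commit message, and a plain `0.4.1` passes through unchanged.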

Verified: pnpm run artifacts:build now produces
kaelio-ktx-0.4.1.tgz and kaelio_ktx-0.4.1-py3-none-any.whl with
@kaelio/ktx@0.4.1 inside. Full type-check, dead-code, and
2287 vitests + 173 script tests pass.

* refactor(cli): inject embedding provider resolution and detect sentence-transformers runtime

Make resolveProjectEmbeddingProvider and runtimeIo injectable in ingest and
scan command entrypoints so tests can stub them, and teach
resolvePublicIngestRuntimeRequirements to flag the local-embeddings runtime
feature when ktx.yaml selects sentence-transformers.
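
The injection pattern can be sketched as follows. Every shape here is an assumption made for illustration: the real signatures of `resolveProjectEmbeddingProvider` and `resolvePublicIngestRuntimeRequirements` are not shown in this commit, and `runIngest`, `IngestDeps`, and `requiredRuntimeFeatures` are hypothetical names:

```typescript
// All shapes below are assumptions; the real signatures are not shown in this commit.
interface EmbeddingProvider {
  readonly name: string;
}

interface IngestDeps {
  resolveProjectEmbeddingProvider: (projectDir: string) => EmbeddingProvider;
}

// Stand-in for the production resolver.
const defaultDeps: IngestDeps = {
  resolveProjectEmbeddingProvider: () => ({ name: 'default-provider' }),
};

// The entrypoint takes its collaborators as a defaulted parameter, so production
// callers pass nothing and tests pass a stub.
function runIngest(projectDir: string, deps: IngestDeps = defaultDeps): string {
  const provider = deps.resolveProjectEmbeddingProvider(projectDir);
  return `ingest ${projectDir} using ${provider.name}`;
}

// Assumed config shape: only the selected embedding backend matters for this check.
function requiredRuntimeFeatures(config: { embeddingBackend?: string }): string[] {
  return config.embeddingBackend === 'sentence-transformers' ? ['local-embeddings'] : [];
}
```

A test can then call `runIngest('proj', { resolveProjectEmbeddingProvider: () => ({ name: 'stub' }) })` without any module mocking.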

* chore(cli): mark buildLocalStatsStatus and LocalStatsStatus as @internal

Both symbols are consumed only by status-project.test.ts. Annotating with
/** @internal */ keeps knip's production-mode check clean without changing
runtime behavior.

* fix(cli): use real package metadata in print-command-tree

The stubbed package name embedded a forbidden product identifier that
tripped the boundary check in CI. Read the metadata from package.json
instead, which keeps the rendered tree unchanged and removes a duplicate
source of truth.

* feat(cli): show embedding coverage in `ktx status`, drop duplicate disk counts

Inline `(N embedded)` next to the Wiki scope counts and Semantic-layer
source counts, computed with `SUM(embedding_json IS NOT NULL)` over
`knowledge_pages` and `local_sl_sources`. Rename the "Knowledge" label to
"Wiki" (canonical per `docs/terminology.md`) and rename the matching
`localStats.knowledgePages` field to `localStats.wikiPages`.
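
As a rough in-memory illustration of what the `SUM(embedding_json IS NOT NULL)` aggregate computes, the sketch below counts rows whose `embedding_json` is set. `EmbeddableRow`, `embeddingCoverage`, and `formatCoverage` are hypothetical, and the rendered layout is an assumption; only the `(N embedded)` suffix comes from this commit:

```typescript
// Hypothetical row shape: only the column used by the aggregate is modeled.
interface EmbeddableRow {
  embedding_json: string | null;
}

// In-memory counterpart of COUNT(*) plus SUM(embedding_json IS NOT NULL).
// Note: SQL's SUM yields NULL over an empty set, whereas this sketch returns 0.
function embeddingCoverage(rows: readonly EmbeddableRow[]): { total: number; embedded: number } {
  let embedded = 0;
  for (const row of rows) {
    if (row.embedding_json !== null) {
      embedded += 1;
    }
  }
  return { total: rows.length, embedded };
}

// Assumed rendering; only the "(N embedded)" suffix is taken from the commit message.
function formatCoverage(label: string, coverage: { total: number; embedded: number }): string {
  return `${label}: ${coverage.total} (${coverage.embedded} embedded)`;
}
```

Reporting the embedded count next to the total makes partially embedded projects visible at a glance.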

Drop `wiki=N md` and `semantic-layer=N yaml` from the Disk row, since those
duplicated the per-surface rows above. Disk now reports only actual byte
usage (db, cache, raw-sources). The unused `wikiGlobalMarkdownCount` /
`semanticLayerYamlCount` fields, the `isMarkdownEntry` / `isYamlEntry`
helpers, and the `filter` arg on `summarizeDir` are removed.
Andrey Avtomonov 2026-05-21 15:28:58 +02:00 committed by GitHub
parent a1cfb03d73
commit 2366b00301
1002 changed files with 2286 additions and 12051 deletions

@@ -0,0 +1,42 @@
import { describe, expect, it } from 'vitest';

import { actionTargetConnectionId, memoryActionIdentity } from './action-identity.js';

describe('memory action target identity', () => {
  it('keys SL actions by target connection and wiki actions by run connection', () => {
    expect(
      memoryActionIdentity(
        { target: 'sl', type: 'created', key: 'orders', detail: '', targetConnectionId: 'warehouse-b' },
        'looker-run',
      ),
    ).toBe('sl:warehouse-b:orders');
    expect(memoryActionIdentity({ target: 'sl', type: 'created', key: 'orders', detail: '' }, 'warehouse-a')).toBe(
      'sl:warehouse-a:orders',
    );
    expect(
      memoryActionIdentity(
        {
          target: 'wiki',
          type: 'created',
          key: 'wiki/global/orders.md',
          detail: '',
          targetConnectionId: 'ignored',
        },
        'looker-run',
      ),
    ).toBe('wiki:looker-run:wiki/global/orders.md');
  });

  it('resolves action target connection only for SL actions', () => {
    expect(
      actionTargetConnectionId(
        { target: 'sl', type: 'updated', key: 'orders', detail: '', targetConnectionId: 'warehouse-b' },
        'looker-run',
      ),
    ).toBe('warehouse-b');
    expect(actionTargetConnectionId({ target: 'wiki', type: 'updated', key: 'orders', detail: '' }, 'looker-run')).toBe(
      'looker-run',
    );
  });
});

@@ -0,0 +1,9 @@
import type { MemoryAction } from '../../context/memory/types.js';

export function actionTargetConnectionId(action: MemoryAction, runConnectionId: string): string {
  return action.target === 'sl' ? (action.targetConnectionId ?? runConnectionId) : runConnectionId;
}

export function memoryActionIdentity(action: MemoryAction, runConnectionId: string): string {
  return `${action.target}:${actionTargetConnectionId(action, runConnectionId)}:${action.key}`;
}

@@ -0,0 +1,214 @@
import { describe, expect, it } from 'vitest';

import { parseDbtSchemaFile, parseDbtSchemaFiles } from './parse-schema.js';

describe('dbt descriptions schema parser', () => {
  it('resolves shared dbt vars and defaults before parsing schema YAML', () => {
    const result = parseDbtSchemaFile(
      `
version: 2
sources:
  - name: raw
    database: "{{ var('database') }}"
    schema: "{{ var('schema', 'fallback_schema') }}"
    tables:
      - name: orders
        identifier: fct_orders
        description: "Orders from {{ var('database') }}"
        columns:
          - name: customer_id
            description: "Customer id"
            tests:
              - relationships:
                  to: ref('customers')
                  field: id
models:
  - name: "{{ var('model_name', 'orders_model') }}"
    schema: "{{ var('model_schema') }}"
    columns:
      - name: id
        description: "Order id"
`,
      { path: 'models/schema.yml', variables: new Map([['database', 'analytics'], ['model_schema', 'mart']]) },
    );
    expect(result.tables).toEqual([
      {
        name: 'fct_orders',
        description: 'Orders from analytics',
        database: 'analytics',
        schema: 'fallback_schema',
        columns: [
          {
            name: 'customer_id',
            description: 'Customer id',
            dataType: null,
            dataTests: [{ name: 'relationships', package: 'dbt', kwargs: { to: "ref('customers')", field: 'id' } }],
          },
        ],
        resourceType: 'source',
      },
      {
        name: 'orders_model',
        description: null,
        database: null,
        schema: 'mart',
        columns: [{ name: 'id', description: 'Order id', dataType: null }],
        resourceType: 'model',
      },
    ]);
    expect(result.relationships).toEqual([
      {
        fromTable: 'fct_orders',
        fromColumn: 'customer_id',
        toTable: 'customers',
        toColumn: 'id',
        fromSchema: 'fallback_schema',
      },
    ]);
  });
  it('deduplicates tables by database schema and name while merging columns', () => {
    const result = parseDbtSchemaFiles([
      {
        path: 'models/a.yml',
        content: `
version: 2
models:
  - name: orders
    description: Orders
    columns:
      - name: id
        description: Primary key
`,
      },
      {
        path: 'models/b.yml',
        content: `
version: 2
models:
  - name: orders
    columns:
      - name: status
        description: Status
      - name: id
        data_type: integer
`,
      },
    ]);
    expect(result.tables).toEqual([
      {
        name: 'orders',
        description: 'Orders',
        database: null,
        schema: null,
        resourceType: 'model',
        columns: [
          { name: 'id', description: 'Primary key', dataType: 'integer' },
          { name: 'status', description: 'Status', dataType: null },
        ],
      },
    ]);
  });
  it('returns an empty result for malformed YAML and preserves unresolved Jinja text', () => {
    expect(parseDbtSchemaFile('{{{{ invalid yaml', { path: 'broken.yml' })).toEqual({
      projectName: null,
      dbtVersion: null,
      tables: [],
      relationships: [],
    });
    const unresolved = parseDbtSchemaFile(
      `
version: 2
models:
  - name: "{{ var('missing_model') }}"
`,
      { variables: new Map() },
    );
    expect(unresolved.tables[0]?.name).toBe("{{ var('missing_model') }}");
  });
  it('extracts data tests, constraints, enum values, tags, and freshness', () => {
    const result = parseDbtSchemaFile(`
version: 2
sources:
  - name: raw
    schema: jaffle
    tags: ["raw"]
    tables:
      - name: customers
        tags: ["core"]
        loaded_at_field: updated_at
        freshness:
          warn_after: { count: 12, period: hour }
        columns:
          - name: id
            tests:
              - not_null
              - unique
          - name: status
            data_tests:
              - accepted_values:
                  values: ['active', 'inactive']
models:
  - name: orders
    tags: ["finance"]
    loaded_at_field: run_at
    columns:
      - name: status
        data_tests:
          - dbt_utils.expression_is_true:
              expression: "status is not null"
          - accepted_values: ['placed', 'shipped']
`);
    const customers = result.tables.find((table) => table.name === 'customers');
    expect(customers?.tagsDbt).toEqual(['raw', 'core']);
    expect(customers?.freshnessDbt?.loadedAtField).toBe('updated_at');
    expect(customers?.freshnessDbt?.raw).toBeDefined();
    const id = customers?.columns.find((column) => column.name === 'id');
    expect(id?.constraints?.dbt).toEqual({ not_null: true, unique: true });
    const status = customers?.columns.find((column) => column.name === 'status');
    expect(status?.enumValuesDbt).toEqual(['active', 'inactive']);
    const orders = result.tables.find((table) => table.name === 'orders');
    expect(orders?.tagsDbt).toEqual(['finance']);
    expect(orders?.freshnessDbt?.loadedAtField).toBe('run_at');
    const ordersStatus = orders?.columns.find((column) => column.name === 'status');
    expect(ordersStatus?.enumValuesDbt).toEqual(['placed', 'shipped']);
    expect(ordersStatus?.dataTests).toEqual(
      expect.arrayContaining([
        expect.objectContaining({ package: 'dbt_utils', name: 'expression_is_true' }),
        expect.objectContaining({ package: 'dbt', name: 'accepted_values' }),
      ]),
    );
  });
  it('parses relationships from model column data tests', () => {
    const result = parseDbtSchemaFile(`
version: 2
models:
  - name: orders
    schema: public
    columns:
      - name: customer_id
        data_tests:
          - relationships:
              arguments:
                to: "ref('customers')"
                field: id
`);
    expect(result.relationships).toEqual([
      {
        fromTable: 'orders',
        fromColumn: 'customer_id',
        toTable: 'customers',
        toColumn: 'id',
        fromSchema: 'public',
      },
    ]);
  });
});

@@ -0,0 +1,649 @@
import { createHash } from 'node:crypto';
import { parse as parseYaml } from 'yaml';
import { type KtxLogger, noopLogger } from '../../../../context/core/config.js';
import { resolveJinjaVariables } from '../../dbt-shared/project-vars.js';
interface DbtParsedColumn {
name: string;
description: string | null;
dataType: string | null;
dataTests?: DbtDataTestRef[];
constraints?: DbtColumnConstraints;
enumValuesDbt?: string[];
}
interface DbtDataTestRef {
name: string;
package: string;
kwargs?: Record<string, unknown>;
}
interface DbtColumnConstraints {
dbt: {
not_null?: boolean;
unique?: boolean;
};
}
interface DbtParsedRelationship {
fromTable: string;
fromColumn: string;
toTable: string;
toColumn: string;
fromSchema?: string;
toSchema?: string;
description?: string;
}
interface DbtParsedTable {
name: string;
description: string | null;
database: string | null;
schema: string | null;
columns: DbtParsedColumn[];
resourceType?: 'source' | 'model';
tagsDbt?: string[];
freshnessDbt?: {
raw?: unknown;
loadedAtField?: string | null;
};
}
export interface DbtSchemaParseResult {
projectName: string | null;
dbtVersion: string | null;
tables: DbtParsedTable[];
relationships: DbtParsedRelationship[];
}
export interface DbtSchemaFile {
content: string;
path: string;
}
interface ParseDbtSchemaOptions {
path?: string;
variables?: Map<string, string>;
projectName?: string | null;
logger?: KtxLogger;
}
interface DbtSchemaYaml {
version?: number;
sources?: DbtSchemaSource[];
models?: DbtSchemaModel[];
}
interface DbtSchemaSource {
name: string;
description?: string;
database?: string;
schema?: string;
tags?: string[];
tables?: DbtSchemaTable[];
}
interface DbtSchemaTable {
name: string;
description?: string;
identifier?: string;
tags?: string[];
loaded_at_field?: string;
freshness?: unknown;
columns?: DbtSchemaColumn[];
}
interface DbtSchemaModel {
name: string;
description?: string;
database?: string;
schema?: string;
tags?: string[];
loaded_at_field?: string;
freshness?: unknown;
columns?: DbtSchemaColumn[];
}
interface DbtSchemaColumn {
name: string;
description?: string;
data_type?: string;
data_tests?: DbtSchemaDataTest[];
tests?: DbtSchemaDataTest[];
}
type DbtSchemaDataTest =
| string
| {
relationships?: {
to?: string;
field?: string;
arguments?: { to?: string; field?: string };
};
not_null?: unknown;
unique?: unknown;
accepted_values?: { values?: unknown } | unknown;
[key: string]: unknown;
};
/** @internal */
export function parseDbtSchemaFile(content: string, options: ParseDbtSchemaOptions = {}): DbtSchemaParseResult {
return new DbtSchemaParser(options.logger ?? noopLogger).parseFile(content, options);
}
export function parseDbtSchemaFiles(
files: DbtSchemaFile[],
variables?: Map<string, string>,
options: { projectName?: string | null; logger?: KtxLogger } = {},
): DbtSchemaParseResult {
return new DbtSchemaParser(options.logger ?? noopLogger).parseFiles(files, variables, options.projectName ?? null);
}
class DbtSchemaParser {
constructor(private readonly logger: KtxLogger) {}
parseFile(yamlContent: string, options: ParseDbtSchemaOptions = {}): DbtSchemaParseResult {
this.logger.debug(`Parsing schema file: ${options.path ?? 'unknown'}`);
const resolved = options.variables
? resolveJinjaVariables(yamlContent, options.variables)
: { content: yamlContent, unresolvedVars: [] };
if (resolved.unresolvedVars.length > 0) {
this.logger.warn(
`Unresolved dbt variables in ${options.path ?? 'schema file'}: ${resolved.unresolvedVars.join(', ')}`,
);
}
let schema: DbtSchemaYaml;
try {
schema = parseYaml(resolved.content) as DbtSchemaYaml;
} catch (error) {
this.logger.warn(`Failed to parse YAML${options.path ? ` at ${options.path}` : ''}: ${error}`);
return this.emptyResult(options.projectName ?? null);
}
if (!schema || typeof schema !== 'object') {
return this.emptyResult(options.projectName ?? null);
}
const tables = [...this.parseSources(schema.sources), ...this.parseModels(schema.models)];
const relationships = [
...this.parseSourceRelationships(schema.sources),
...this.parseModelRelationships(schema.models),
];
return {
projectName: options.projectName ?? null,
dbtVersion: null,
tables,
relationships,
};
}
parseFiles(
files: DbtSchemaFile[],
variables?: Map<string, string>,
projectName: string | null = null,
): DbtSchemaParseResult {
const allTables: DbtParsedTable[] = [];
const allRelationships: DbtParsedRelationship[] = [];
for (const file of files) {
const result = this.parseFile(file.content, { path: file.path, variables, projectName });
allTables.push(...result.tables);
allRelationships.push(...result.relationships);
}
return {
projectName,
dbtVersion: null,
tables: this.deduplicateTables(allTables),
relationships: this.deduplicateRelationships(allRelationships),
};
}
private parseSources(sources: DbtSchemaSource[] | undefined): DbtParsedTable[] {
if (!sources || !Array.isArray(sources)) {
return [];
}
const tables: DbtParsedTable[] = [];
for (const source of sources) {
const sourceSchema = source.schema ?? source.name;
const sourceDatabase = source.database ?? null;
const sourceTags = this.normalizeTagList(source.tags);
if (!source.tables || !Array.isArray(source.tables)) {
continue;
}
for (const table of source.tables) {
const tagsDbt = this.mergeTagsDbt(sourceTags, this.normalizeTagList(table.tags));
const freshnessDbt = this.buildFreshnessDbt(table.freshness, table.loaded_at_field);
tables.push({
name: table.identifier ?? table.name,
description: this.normalizeDescription(table.description),
database: sourceDatabase,
schema: sourceSchema,
columns: this.parseColumns(table.columns),
resourceType: 'source',
...(tagsDbt ? { tagsDbt } : {}),
...(freshnessDbt ? { freshnessDbt } : {}),
});
}
}
return tables;
}
private parseModels(models: DbtSchemaModel[] | undefined): DbtParsedTable[] {
if (!models || !Array.isArray(models)) {
return [];
}
const tables: DbtParsedTable[] = [];
for (const model of models) {
if (!model.name) {
continue;
}
const tagsDbt = this.mergeTagsDbt(this.normalizeTagList(model.tags));
const freshnessDbt = this.buildFreshnessDbt(model.freshness, model.loaded_at_field);
tables.push({
name: model.name,
description: this.normalizeDescription(model.description),
database: model.database ?? null,
schema: model.schema ?? null,
columns: this.parseColumns(model.columns),
resourceType: 'model',
...(tagsDbt ? { tagsDbt } : {}),
...(freshnessDbt ? { freshnessDbt } : {}),
});
}
return tables;
}
private parseColumns(columns: DbtSchemaColumn[] | undefined): DbtParsedColumn[] {
if (!columns || !Array.isArray(columns)) {
return [];
}
return columns.map((column) => {
const { refs, constraints, enumValues } = this.parseDataTests(column.data_tests ?? column.tests);
return {
name: column.name,
description: this.normalizeDescription(column.description),
dataType: column.data_type ?? null,
...(refs.length > 0 ? { dataTests: refs } : {}),
...(constraints ? { constraints } : {}),
...(enumValues.length > 0 ? { enumValuesDbt: enumValues } : {}),
};
});
}
private parseDataTests(tests: DbtSchemaDataTest[] | undefined): {
refs: DbtDataTestRef[];
constraints: DbtColumnConstraints | undefined;
enumValues: string[];
} {
const refs: DbtDataTestRef[] = [];
const dbt: { not_null?: boolean; unique?: boolean } = {};
const enumValues: string[] = [];
if (!tests?.length) {
return { refs, constraints: undefined, enumValues };
}
for (const test of tests) {
if (typeof test === 'string') {
const parsed = this.parseTestNameString(test);
refs.push(parsed);
if (parsed.package === 'dbt' && parsed.name === 'not_null') {
dbt.not_null = true;
}
if (parsed.package === 'dbt' && parsed.name === 'unique') {
dbt.unique = true;
}
continue;
}
for (const [key, value] of Object.entries(test)) {
if (key === 'relationships') {
refs.push({
name: 'relationships',
package: 'dbt',
...(value && typeof value === 'object' && !Array.isArray(value)
? { kwargs: value as Record<string, unknown> }
: {}),
});
continue;
}
if (key === 'not_null') {
refs.push({ name: 'not_null', package: 'dbt' });
dbt.not_null = true;
continue;
}
if (key === 'unique') {
refs.push({ name: 'unique', package: 'dbt' });
dbt.unique = true;
continue;
}
if (key === 'accepted_values') {
if (Array.isArray(value)) {
enumValues.push(...value.map((item) => String(item)));
refs.push({ name: 'accepted_values', package: 'dbt', kwargs: { values: value } });
continue;
}
if (value && typeof value === 'object' && !Array.isArray(value)) {
const values = (value as { values?: unknown }).values;
if (Array.isArray(values)) {
enumValues.push(...values.map((item) => String(item)));
}
refs.push({ name: 'accepted_values', package: 'dbt', kwargs: value as Record<string, unknown> });
continue;
}
}
refs.push({
...this.parseTestNameString(key),
...(value && typeof value === 'object' && !Array.isArray(value)
? { kwargs: value as Record<string, unknown> }
: {}),
});
}
}
const constraints = dbt.not_null || dbt.unique ? { dbt } : undefined;
return { refs, constraints, enumValues };
}
private parseTestNameString(value: string): { name: string; package: string } {
const parts = value.split('.');
if (parts.length >= 2) {
return { package: parts[0]!, name: parts.slice(1).join('.') };
}
return { package: 'dbt', name: value };
}
private parseSourceRelationships(sources: DbtSchemaSource[] | undefined): DbtParsedRelationship[] {
if (!sources || !Array.isArray(sources)) {
return [];
}
const relationships: DbtParsedRelationship[] = [];
for (const source of sources) {
const sourceSchema = source.schema ?? source.name;
if (!source.tables || !Array.isArray(source.tables)) {
continue;
}
for (const table of source.tables) {
const tableName = table.identifier ?? table.name;
if (!table.columns || !Array.isArray(table.columns)) {
continue;
}
for (const column of table.columns) {
const tests = column.data_tests ?? column.tests ?? [];
for (const test of tests) {
const relationship = this.parseRelationshipTest(test, tableName, column.name, sourceSchema);
if (relationship) {
relationships.push(relationship);
}
}
}
}
}
return relationships;
}
private parseModelRelationships(models: DbtSchemaModel[] | undefined): DbtParsedRelationship[] {
if (!models || !Array.isArray(models)) {
return [];
}
const relationships: DbtParsedRelationship[] = [];
for (const model of models) {
if (!model.name || !model.columns || !Array.isArray(model.columns)) {
continue;
}
for (const column of model.columns) {
const tests = column.data_tests ?? column.tests ?? [];
for (const test of tests) {
const relationship = this.parseRelationshipTest(test, model.name, column.name, model.schema ?? undefined);
if (relationship) {
relationships.push(relationship);
}
}
}
}
return relationships;
}
private parseRelationshipTest(
test: DbtSchemaDataTest,
fromTable: string,
fromColumn: string,
fromSchema?: string,
): DbtParsedRelationship | null {
if (typeof test === 'string' || !test.relationships) {
return null;
}
const relationship = test.relationships;
const toRef = relationship.to ?? relationship.arguments?.to;
const toColumn = relationship.field ?? relationship.arguments?.field;
if (!toRef || !toColumn) {
this.logger.debug(`Skipping incomplete relationship test for ${fromTable}.${fromColumn}`);
return null;
}
const toTable = this.parseRef(toRef);
if (!toTable) {
this.logger.debug(`Could not parse ref: ${toRef}`);
return null;
}
return {
fromTable,
fromColumn,
toTable,
toColumn,
...(fromSchema ? { fromSchema } : {}),
};
}
private parseRef(refString: string): string | null {
const refMatch = refString.match(/ref\s*\(\s*['"]([^'"]+)['"]\s*\)/);
if (refMatch) {
return refMatch[1];
}
const sourceMatch = refString.match(/source\s*\(\s*['"][^'"]+['"]\s*,\s*['"]([^'"]+)['"]\s*\)/);
if (sourceMatch) {
return sourceMatch[1];
}
return null;
}
private normalizeDescription(description: string | undefined): string | null {
if (!description) {
return null;
}
const trimmed = description.trim();
return trimmed.length > 0 ? trimmed : null;
}
private normalizeTagList(tags: string[] | undefined): string[] {
if (!tags || !Array.isArray(tags)) {
return [];
}
return tags.map((tag) => String(tag));
}
private mergeTagsDbt(...lists: Array<string[] | undefined>): string[] | undefined {
const merged: string[] = [];
const seen = new Set<string>();
for (const list of lists) {
for (const item of list ?? []) {
if (!seen.has(item)) {
seen.add(item);
merged.push(item);
}
}
}
return merged.length > 0 ? merged : undefined;
}
private buildFreshnessDbt(freshness: unknown, loadedAtField: string | undefined): DbtParsedTable['freshnessDbt'] {
const loadedTrim = loadedAtField?.trim();
const hasFreshness = freshness !== undefined && freshness !== null;
if (!hasFreshness && !loadedTrim) {
return undefined;
}
return {
...(hasFreshness ? { raw: freshness } : {}),
...(hasFreshness ? { loadedAtField: loadedTrim ?? null } : loadedTrim ? { loadedAtField: loadedTrim } : {}),
};
}
private deduplicateTables(tables: DbtParsedTable[]): DbtParsedTable[] {
const seen = new Map<string, DbtParsedTable>();
for (const table of tables) {
const key = `${table.database ?? ''}.${table.schema ?? ''}.${table.name}`.toLowerCase();
const existing = seen.get(key);
if (!existing) {
seen.set(key, table);
continue;
}
seen.set(key, {
...existing,
description: existing.description ?? table.description,
columns: this.mergeColumns(existing.columns, table.columns),
tagsDbt: this.mergeTagsDbt(existing.tagsDbt, table.tagsDbt),
freshnessDbt: this.mergeFreshnessDbt(existing.freshnessDbt, table.freshnessDbt),
});
}
return Array.from(seen.values());
}
private mergeColumns(existing: DbtParsedColumn[], incoming: DbtParsedColumn[]): DbtParsedColumn[] {
const seen = new Map<string, DbtParsedColumn>();
for (const column of existing) {
seen.set(column.name.toLowerCase(), column);
}
for (const column of incoming) {
const key = column.name.toLowerCase();
const existingColumn = seen.get(key);
if (!existingColumn) {
seen.set(key, column);
continue;
}
seen.set(key, {
...existingColumn,
description: existingColumn.description ?? column.description,
dataType: existingColumn.dataType ?? column.dataType,
dataTests: this.mergeDbtDataTests(existingColumn.dataTests, column.dataTests),
constraints: this.mergeDbtConstraints(existingColumn.constraints, column.constraints),
enumValuesDbt: this.mergeStringList(existingColumn.enumValuesDbt, column.enumValuesDbt),
});
}
return Array.from(seen.values());
}
private deduplicateRelationships(relationships: DbtParsedRelationship[]): DbtParsedRelationship[] {
const seen = new Set<string>();
const result: DbtParsedRelationship[] = [];
for (const relationship of relationships) {
const key =
`${relationship.fromTable}.${relationship.fromColumn}->${relationship.toTable}.${relationship.toColumn}`.toLowerCase();
if (!seen.has(key)) {
seen.add(key);
result.push(relationship);
}
}
return result;
}
private mergeFreshnessDbt(
existing?: DbtParsedTable['freshnessDbt'],
incoming?: DbtParsedTable['freshnessDbt'],
): DbtParsedTable['freshnessDbt'] {
if (!existing && !incoming) {
return undefined;
}
const raw = existing?.raw !== undefined ? existing.raw : incoming?.raw;
const loadedAtField = existing?.loadedAtField ?? incoming?.loadedAtField;
return {
...(raw !== undefined ? { raw } : {}),
...(loadedAtField !== undefined ? { loadedAtField } : {}),
};
}
private mergeDbtConstraints(
existing?: DbtColumnConstraints,
incoming?: DbtColumnConstraints,
): DbtColumnConstraints | undefined {
const notNull = !!(existing?.dbt.not_null || incoming?.dbt.not_null);
const unique = !!(existing?.dbt.unique || incoming?.dbt.unique);
if (!notNull && !unique) {
return undefined;
}
return { dbt: { ...(notNull ? { not_null: true } : {}), ...(unique ? { unique: true } : {}) } };
}
private mergeStringList(existing?: string[], incoming?: string[]): string[] | undefined {
return this.mergeTagsDbt(existing, incoming);
}
private mergeDbtDataTests(existing?: DbtDataTestRef[], incoming?: DbtDataTestRef[]): DbtDataTestRef[] | undefined {
if (!existing?.length) {
return incoming?.length ? [...incoming] : undefined;
}
if (!incoming?.length) {
return [...existing];
}
const tests = new Map<string, DbtDataTestRef>();
for (const test of [...existing, ...incoming]) {
const kwargsKey =
test.kwargs && Object.keys(test.kwargs).length > 0
? `:${createHash('sha256').update(JSON.stringify(test.kwargs)).digest('hex').slice(0, 16)}`
: '';
tests.set(`${test.package}:${test.name}${kwargsKey}`, test);
}
return [...tests.values()];
}
private emptyResult(projectName: string | null): DbtSchemaParseResult {
return {
projectName,
dbtVersion: null,
tables: [],
relationships: [],
};
}
}

@@ -0,0 +1,36 @@
import { describe, expect, it } from 'vitest';

import { chunkDbtProject } from './chunk.js';

describe('chunkDbtProject', () => {
  const diffSet = (modified: string[]) => ({ added: [], modified, deleted: [], unchanged: [] });

  it('caps peerFileIndex when the project has very many yaml files', () => {
    const modelPaths = Array.from({ length: 201 }, (_, i) => `models/m${i}.yml`);
    const allPaths = ['dbt_project.yml', ...modelPaths].sort();
    const { workUnits } = chunkDbtProject({ allPaths });
    const [first] = workUnits;
    expect(first).toBeDefined();
    expect(first?.peerFileIndex).toHaveLength(200);
    expect(first?.notes).toMatch(/capped at 200/);
  });

  it('keeps large-project model work units when dbt_project.yml changes', () => {
    const modelPaths = Array.from({ length: 30 }, (_, i) => `models/m${i}.yml`);
    const allPaths = ['dbt_project.yml', ...modelPaths].sort();
    const { workUnits } = chunkDbtProject({ allPaths }, { diffSet: diffSet(['dbt_project.yml']) });
    expect(workUnits).toHaveLength(30);
    expect(workUnits[0]?.rawFiles).toEqual(['models/m0.yml']);
    expect(workUnits[0]?.dependencyPaths).toContain('dbt_project.yml');
  });

  it('keeps large-project model work units when non-model yaml peers change', () => {
    const modelPaths = Array.from({ length: 30 }, (_, i) => `models/m${i}.yml`);
    const allPaths = ['dbt_project.yml', 'seeds/seed_properties.yml', ...modelPaths].sort();
    const { workUnits } = chunkDbtProject({ allPaths }, { diffSet: diffSet(['seeds/seed_properties.yml']) });
    expect(workUnits).toHaveLength(30);
    expect(workUnits[0]?.rawFiles).toEqual(['models/m0.yml']);
    expect(workUnits[0]?.dependencyPaths).toContain('seeds/seed_properties.yml');
  });
});

@@ -0,0 +1,130 @@
import type { ChunkResult, DiffSet, WorkUnit } from '../../types.js';
import type { ParsedDbtProject } from './parse.js';
interface ChunkOptions {
diffSet?: DiffSet;
}
/**
* Per-model work units (emitted when the project has more than 25 YAML files) only name `rawFiles`
* under `models/**`. Other `.yml` files (e.g. some `seeds/` or custom layouts) still appear in
* `peerFileIndex` or in the small-project / no-models fallbacks; v1 does not emit one WU per
* non-models file.
*/
const MODELS_PREFIX = 'models/';
/** `peerFileIndex` is a hint only (agents may not read those paths). Cap to limit prompt size. */
const MAX_PEER_FILE_INDEX = 200;
function projectYamlPath(allPaths: string[]): string | undefined {
if (allPaths.includes('dbt_project.yml')) {
return 'dbt_project.yml';
}
if (allPaths.includes('dbt_project.yaml')) {
return 'dbt_project.yaml';
}
return undefined;
}
function modelRelativePaths(allPaths: string[]): string[] {
return allPaths.filter((p) => p.replace(/\\/g, '/').startsWith(MODELS_PREFIX)).sort();
}
function unitKeyForModelFile(mf: string): string {
const base = mf
.replace(/\.(ya?ml)$/i, '')
.replace(/\\/g, '/')
.replace(/[^a-zA-Z0-9]+/g, '-')
.replace(/^-+|-+$/g, '');
return `dbt-${base.toLowerCase()}`;
}
function emitFirstRunWorkUnits(allPaths: string[], dbtDep: string | undefined): WorkUnit[] {
if (allPaths.length === 0) {
return [];
}
if (allPaths.length <= 25) {
return [
{
unitKey: 'dbt-all',
displayLabel: 'dbt project (all yaml)',
rawFiles: [...allPaths],
peerFileIndex: [],
dependencyPaths: [],
notes: 'dbt project — all YAML in one WorkUnit (≤25 files)',
},
];
}
const modelFiles = modelRelativePaths(allPaths);
if (modelFiles.length === 0) {
return [
{
unitKey: 'dbt-all',
displayLabel: 'dbt project (all yaml, no models/**)',
rawFiles: [...allPaths],
peerFileIndex: [],
dependencyPaths: dbtDep ? [dbtDep] : [],
notes: 'dbt: no models/**/*.yml — single slice with dbt_project as dependency if present',
},
];
}
return modelFiles.map((mf) => {
const allPeers = allPaths.filter((p) => p !== mf).sort();
const truncated = allPeers.length > MAX_PEER_FILE_INDEX;
const peerFileIndex = truncated ? allPeers.slice(0, MAX_PEER_FILE_INDEX) : allPeers;
const dependencyPaths = dbtDep && allPaths.includes(dbtDep) && mf !== dbtDep ? [dbtDep].sort() : [];
const notes = truncated
? `dbt model schema slice (peer index capped at ${MAX_PEER_FILE_INDEX} of ${allPeers.length} paths)`
: 'dbt model schema slice';
return {
unitKey: unitKeyForModelFile(mf),
displayLabel: `dbt ${mf}`,
rawFiles: [mf],
peerFileIndex,
dependencyPaths,
notes,
};
});
}
function applyDiffSet(firstRunUnits: WorkUnit[], diffSet: DiffSet): ChunkResult {
const touched = new Set([...diffSet.added, ...diffSet.modified]);
const kept: WorkUnit[] = [];
for (const wu of firstRunUnits) {
const touchedRawFiles = wu.rawFiles.filter((p) => touched.has(p));
const touchedDependencies = wu.dependencyPaths.filter((p) => touched.has(p));
const touchedPeerFiles = wu.peerFileIndex.filter((p) => touched.has(p));
if (touchedRawFiles.length === 0 && touchedDependencies.length === 0 && touchedPeerFiles.length === 0) {
continue;
}
const rawFiles = touchedRawFiles.length > 0 ? touchedRawFiles : wu.rawFiles;
const unchangedRaw = touchedRawFiles.length > 0 ? wu.rawFiles.filter((p) => !touched.has(p)) : [];
for (const p of wu.rawFiles) {
if (!rawFiles.includes(p) && !unchangedRaw.includes(p)) {
unchangedRaw.push(p);
}
}
const combinedDeps = new Set<string>([...wu.dependencyPaths, ...unchangedRaw, ...touchedPeerFiles]);
kept.push({
...wu,
// Copy before sorting: `rawFiles` can alias `wu.rawFiles`, and `sort()` mutates in place.
rawFiles: [...rawFiles].sort(),
dependencyPaths: [...combinedDeps].sort(),
});
}
const eviction = diffSet.deleted.length > 0 ? { deletedRawPaths: [...diffSet.deleted].sort() } : undefined;
return { workUnits: kept, eviction };
}
export function chunkDbtProject(project: ParsedDbtProject, opts: ChunkOptions = {}): ChunkResult {
const dbtDep = projectYamlPath(project.allPaths);
const firstRun = emitFirstRunWorkUnits(project.allPaths, dbtDep);
if (!opts.diffSet) {
return { workUnits: firstRun };
}
return applyDiffSet(firstRun, opts.diffSet);
}
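
For reviewers, a minimal sketch of how the per-model `unitKey` is derived. The function body is a standalone copy of `unitKeyForModelFile` from the file above; the sample paths and the `exampleKeys` name are illustrative assumptions, not fixtures from this change.

```typescript
// Illustrative sketch only: a standalone copy of the unit-key normalization.
function unitKeyForModelFile(modelFile: string): string {
  const base = modelFile
    .replace(/\.(ya?ml)$/i, '') // drop the .yml / .yaml extension (case-insensitive)
    .replace(/\\/g, '/') // normalize Windows separators
    .replace(/[^a-zA-Z0-9]+/g, '-') // collapse each non-alphanumeric run into one dash
    .replace(/^-+|-+$/g, ''); // trim leading and trailing dashes
  return `dbt-${base.toLowerCase()}`;
}

// Hypothetical sample paths, one POSIX-style and one Windows-style.
const exampleKeys = ['models/marts/orders.yml', 'models\\staging\\Stg_Orders.YAML'].map(unitKeyForModelFile);
console.log(exampleKeys);
// -> ['dbt-models-marts-orders', 'dbt-models-staging-stg-orders']
```

Because every non-alphanumeric run collapses to a single dash, two distinct paths such as `models/a_b.yml` and `models/a-b.yml` would map to the same key under this normalization.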

@ -0,0 +1,57 @@
import { mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import type { SourceAdapter } from '../../types.js';
import { DbtSourceAdapter } from './dbt.adapter.js';
describe('DbtSourceAdapter', () => {
let stagedDir: string;
let adapter: SourceAdapter;
beforeEach(async () => {
stagedDir = await mkdtemp(join(tmpdir(), 'dbt-adapter-'));
adapter = new DbtSourceAdapter();
});
afterEach(async () => {
await rm(stagedDir, { recursive: true, force: true });
});
it('declares the expected source key and skill list', () => {
expect(adapter.source).toBe('dbt');
expect(adapter.skillNames).toEqual(['dbt_ingest']);
});
it('detects a staged dbt project root (dbt_project.yml)', async () => {
await writeFile(join(stagedDir, 'dbt_project.yml'), "name: 'jaffle'\nversion: '1.0.0'\n", 'utf-8');
expect(await adapter.detect(stagedDir)).toBe(true);
});
it('chunk: dbt_project.yml + models/a.yml yields one WU (≤25 files)', async () => {
await writeFile(join(stagedDir, 'dbt_project.yml'), "name: 'jaffle'\n", 'utf-8');
await mkdir(join(stagedDir, 'models'), { recursive: true });
await writeFile(
join(stagedDir, 'models/a.yml'),
'version: 2\nmodels:\n - name: orders\n description: Orders\n',
'utf-8',
);
const result = await adapter.chunk(stagedDir);
expect(result.workUnits).toHaveLength(1);
expect(result.workUnits[0].unitKey).toBe('dbt-all');
expect(result.parseArtifacts).toMatchObject({
projectName: 'jaffle',
tables: [{ name: 'orders', description: 'Orders' }],
});
});
it('implements fetch() for git-backed dbt source setup', () => {
expect(adapter.fetch).toBeTypeOf('function');
});
it('reports mapped warehouse targets for bundle SL discovery', async () => {
adapter = new DbtSourceAdapter({ targetConnectionIds: ['postgres-warehouse', 'postgres-warehouse'] });
await expect(adapter.listTargetConnectionIds?.(stagedDir)).resolves.toEqual(['postgres-warehouse']);
});
});

@ -0,0 +1,53 @@
import { join } from 'node:path';
import type { ChunkResult, DiffSet, FetchContext, SourceAdapter } from '../../types.js';
import { loadProjectInfo } from '../../dbt-shared/project-vars.js';
import { loadDbtSchemaFiles } from '../../dbt-shared/schema-files.js';
import { parseDbtSchemaFiles } from '../dbt-descriptions/parse-schema.js';
import { chunkDbtProject } from './chunk.js';
import { detectDbtStagedDir } from './detect.js';
import { fetchDbtRepo, type DbtPullConfig } from './fetch.js';
import { parseDbtStagedDir } from './parse.js';
interface DbtSourceAdapterOptions {
homeDir?: string;
targetConnectionIds?: string[];
}
export class DbtSourceAdapter implements SourceAdapter {
readonly source = 'dbt' as const;
/** Runner merges: ingest_triage, sl_capture, wiki_capture (see ingest-bundle.runner.ts) */
readonly skillNames: string[] = ['dbt_ingest'];
constructor(private readonly options: DbtSourceAdapterOptions = {}) {}
detect(stagedDir: string): Promise<boolean> {
return detectDbtStagedDir(stagedDir);
}
async listTargetConnectionIds(_stagedDir: string): Promise<string[]> {
return [...new Set(this.options.targetConnectionIds ?? [])].sort((left, right) => left.localeCompare(right));
}
async fetch(pullConfig: unknown, stagedDir: string, ctx: FetchContext): Promise<void> {
const config = pullConfig as DbtPullConfig | undefined;
if (!config?.repoUrl) {
throw new Error('dbt fetch requires repoUrl');
}
await fetchDbtRepo({
config,
cacheDir: join(this.options.homeDir ?? '.ktx/cache', 'dbt', ctx.connectionId),
stagedDir,
});
}
async chunk(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
const project = await parseDbtStagedDir(stagedDir);
const projectInfo = await loadProjectInfo(stagedDir);
const schemaFiles = await loadDbtSchemaFiles(stagedDir);
const parseArtifacts = parseDbtSchemaFiles(schemaFiles, projectInfo.variables, {
projectName: projectInfo.projectName,
});
return { ...chunkDbtProject(project, { diffSet }), parseArtifacts };
}
}

@ -0,0 +1,12 @@
import { access } from 'node:fs/promises';
import { join } from 'node:path';
export async function detectDbtStagedDir(stagedDir: string): Promise<boolean> {
for (const name of ['dbt_project.yml', 'dbt_project.yaml'] as const) {
try {
await access(join(stagedDir, name));
return true;
} catch {
// Not present under this name; try the next candidate.
}
}
return false;
}

@ -0,0 +1,38 @@
import { mkdir, mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import { fetchDbtRepo } from './fetch.js';
describe('fetchDbtRepo', () => {
let tempDir: string;
beforeEach(async () => {
tempDir = await mkdtemp(join(tmpdir(), 'ktx-dbt-fetch-'));
});
afterEach(async () => {
await rm(tempDir, { recursive: true, force: true });
});
it('copies dbt yaml files from a fetched repo subpath into staged dir', async () => {
const cacheDir = join(tempDir, 'cache');
const stagedDir = join(tempDir, 'staged');
await mkdir(join(cacheDir, 'analytics', 'models'), { recursive: true });
await writeFile(join(cacheDir, 'analytics', 'dbt_project.yml'), 'name: analytics\n', 'utf-8');
await writeFile(join(cacheDir, 'analytics', 'models', 'orders.yml'), 'models: []\n', 'utf-8');
const cloneOrPull = vi.fn(async () => ({ commitHash: 'abc123' }));
await expect(
fetchDbtRepo({
config: { repoUrl: 'https://github.com/acme/dbt.git', path: 'analytics' },
cacheDir,
stagedDir,
deps: { cloneOrPull },
}),
).resolves.toEqual({ commitHash: 'abc123', filesCopied: 2 });
await expect(readFile(join(stagedDir, 'dbt_project.yml'), 'utf-8')).resolves.toContain('analytics');
await expect(readFile(join(stagedDir, 'models', 'orders.yml'), 'utf-8')).resolves.toContain('models');
});
});

@ -0,0 +1,60 @@
import { access, copyFile, mkdir, readdir } from 'node:fs/promises';
import { dirname, join, relative } from 'node:path';
import { cloneOrPull, sanitizeRepoError } from '../../repo-fetch.js';
export interface DbtPullConfig {
repoUrl: string;
branch?: string;
path?: string;
authToken?: string | null;
}
export interface FetchDbtRepoParams {
config: DbtPullConfig;
cacheDir: string;
stagedDir: string;
deps?: {
cloneOrPull?: typeof cloneOrPull;
};
}
export async function fetchDbtRepo(params: FetchDbtRepoParams): Promise<{ commitHash: string; filesCopied: number }> {
try {
const runCloneOrPull = params.deps?.cloneOrPull ?? cloneOrPull;
const { commitHash } = await runCloneOrPull({
repoUrl: params.config.repoUrl,
authToken: params.config.authToken,
cacheDir: params.cacheDir,
branch: params.config.branch ?? 'main',
});
const sourceRoot = params.config.path ? join(params.cacheDir, params.config.path) : params.cacheDir;
const filesCopied = await copyYamlFilesRecursive(sourceRoot, params.stagedDir);
return { commitHash, filesCopied };
} catch (error) {
throw new Error(sanitizeRepoError(error, params.config.authToken));
}
}
async function copyYamlFilesRecursive(sourceRoot: string, destRoot: string): Promise<number> {
try {
await access(sourceRoot);
} catch {
return 0;
}
await mkdir(destRoot, { recursive: true });
const entries = await readdir(sourceRoot, { withFileTypes: true, recursive: true });
let copied = 0;
for (const entry of entries) {
if (!entry.isFile() || !/\.ya?ml$/i.test(entry.name)) {
continue;
}
const absSrc = join(entry.parentPath, entry.name);
const rel = relative(sourceRoot, absSrc);
const dest = join(destRoot, rel);
await mkdir(dirname(dest), { recursive: true });
await copyFile(absSrc, dest);
copied += 1;
}
return copied;
}

@ -0,0 +1,8 @@
import { describe, expect, it } from 'vitest';
import { normalizeDbtPath } from './parse.js';
describe('normalizeDbtPath', () => {
it('normalizes Windows separators to POSIX separators', () => {
expect(normalizeDbtPath('models\\marts\\orders.yml')).toBe('models/marts/orders.yml');
});
});

@ -0,0 +1,33 @@
import { readdir } from 'node:fs/promises';
import { join, relative } from 'node:path';
const YAML_EXT_RE = /\.(ya?ml)$/i;
/** @internal */
export function normalizeDbtPath(path: string): string {
return path.replaceAll('\\', '/');
}
async function collectYamlFiles(stagedDir: string): Promise<string[]> {
const entries = await readdir(stagedDir, { withFileTypes: true, recursive: true });
const paths: string[] = [];
for (const entry of entries) {
if (!entry.isFile() || !YAML_EXT_RE.test(entry.name)) {
continue;
}
const abs = join(entry.parentPath, entry.name);
paths.push(normalizeDbtPath(relative(stagedDir, abs)));
}
paths.sort();
return paths;
}
export interface ParsedDbtProject {
/** All `.yml` / `.yaml` paths under stagedDir, relative + sorted. */
allPaths: string[];
}
export async function parseDbtStagedDir(stagedDir: string): Promise<ParsedDbtProject> {
const allPaths = await collectYamlFiles(stagedDir);
return { allPaths };
}

@ -0,0 +1,48 @@
import { readdir } from 'node:fs/promises';
import { join, relative } from 'node:path';
import type { ChunkResult, DiffSet, SourceAdapter, WorkUnit } from '../../types.js';
export class FakeSourceAdapter implements SourceAdapter {
readonly source = 'fake';
readonly skillNames: string[] = [];
detect(): Promise<boolean> {
return Promise.resolve(true);
}
async chunk(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
const subDirs = (await readdir(stagedDir, { withFileTypes: true }))
.filter((e) => e.isDirectory())
.map((e) => e.name)
.sort();
const workUnits: WorkUnit[] = [];
for (const subDir of subDirs) {
const entries = await readdir(join(stagedDir, subDir), { withFileTypes: true, recursive: true });
const rawFiles = entries
.filter((e) => e.isFile())
.map((e) => relative(stagedDir, join(e.parentPath, e.name)))
.sort();
if (rawFiles.length === 0) {
continue;
}
if (diffSet) {
const touched = new Set([...diffSet.added, ...diffSet.modified]);
const anyTouched = rawFiles.some((p) => touched.has(p));
if (!anyTouched) {
continue;
}
}
workUnits.push({
unitKey: `fake-${subDir}`,
displayLabel: subDir,
rawFiles,
peerFileIndex: [],
dependencyPaths: [],
});
}
const eviction = diffSet && diffSet.deleted.length > 0 ? { deletedRawPaths: [...diffSet.deleted] } : undefined;
return { workUnits, eviction };
}
}

@ -0,0 +1,158 @@
import { describe, expect, it, vi } from 'vitest';
import { BigQueryHistoricSqlQueryHistoryReader } from './bigquery-query-history-reader.js';
import { HistoricSqlGrantsMissingError } from './errors.js';
interface FakeQueryResult {
headers: string[];
rows: unknown[][];
totalRows: number;
error?: string;
}
function queryClient(results: FakeQueryResult[]) {
const executeQuery = vi.fn(async (_query: string) => {
const next = results.shift();
if (!next) {
throw new Error('unexpected query');
}
return next;
});
return { executeQuery };
}
function firstQuery(client: ReturnType<typeof queryClient>): string {
const call = client.executeQuery.mock.calls[0];
if (!call) {
throw new Error('expected query client to be called');
}
return call[0];
}
describe('BigQueryHistoricSqlQueryHistoryReader', () => {
it('probes region-qualified INFORMATION_SCHEMA.JOBS_BY_PROJECT', async () => {
const client = queryClient([{ headers: ['1'], rows: [[1]], totalRows: 1 }]);
const reader = new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'project-1', region: 'US' });
await expect(reader.probe(client)).resolves.toEqual({ warnings: [], info: [] });
expect(client.executeQuery).toHaveBeenCalledWith(
'SELECT 1 FROM `project-1.region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` LIMIT 1',
);
});
it('turns probe result errors into HistoricSqlGrantsMissingError', async () => {
const client = queryClient([{ headers: [], rows: [], totalRows: 0, error: 'Access Denied: jobs.listAll' }]);
const reader = new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'project-1', region: 'us-central1' });
await expect(reader.probe(client)).rejects.toMatchObject({
name: 'HistoricSqlGrantsMissingError',
dialect: 'bigquery',
remediation:
'Grant roles/bigquery.resourceViewer on the BigQuery project, or grant a custom role containing bigquery.jobs.listAll.',
});
});
it('turns thrown probe failures into HistoricSqlGrantsMissingError', async () => {
const client = {
executeQuery: vi.fn(async () => {
throw new Error('permission denied');
}),
};
const reader = new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'project-1', region: 'US' });
await expect(reader.probe(client)).rejects.toBeInstanceOf(HistoricSqlGrantsMissingError);
});
it('fetches aggregated BigQuery query templates', async () => {
const client = queryClient([
{
headers: [
'template_id',
'canonical_sql',
'executions',
'distinct_users',
'first_seen',
'last_seen',
'p50_ms',
'p95_ms',
'error_rate',
'rows_produced',
'top_users',
],
rows: [
[
'hash-1',
'select status from orders',
42,
3,
'2026-05-01T00:00:00.000Z',
'2026-05-11T00:00:00.000Z',
12,
40,
0.05,
null,
JSON.stringify([{ user: 'analyst@example.test', executions: 1 }]),
],
],
totalRows: 1,
},
]);
const reader = new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'demo', region: 'us' });
const rows = [];
for await (const row of reader.fetchAggregated(
client,
{ start: new Date('2026-02-10T00:00:00.000Z'), end: new Date('2026-05-11T00:00:00.000Z') },
{ dialect: 'bigquery', minExecutions: 5, windowDays: 90, enabledTables: [], filters: { dropTrivialProbes: true }, redactionPatterns: [], staleArchiveAfterDays: 90 },
)) {
rows.push(row);
}
const sql = firstQuery(client);
expect(sql).toContain('COUNT(*) AS executions');
expect(sql).toContain('COUNT(DISTINCT user_email) AS distinct_users');
expect(sql).toContain('GROUP BY query_hash');
expect(sql).toContain('HAVING COUNT(*) >= 5');
expect(rows).toMatchObject([
{
templateId: 'hash-1',
stats: {
executions: 42,
errorRate: 0.05,
},
topUsers: [{ user: 'analyst@example.test', executions: 1 }],
},
]);
});
it('throws a clear error when the query client cannot execute SQL', async () => {
const reader = new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'project-1', region: 'US' });
await expect(async () => {
for await (const _row of reader.fetchAggregated(
{},
{ start: new Date(), end: new Date() },
{
dialect: 'bigquery',
minExecutions: 5,
windowDays: 90,
enabledTables: [],
filters: { dropTrivialProbes: true },
redactionPatterns: [],
staleArchiveAfterDays: 90,
},
)) {
throw new Error('unreachable');
}
}).rejects.toThrow('Historic SQL BigQuery reader requires a query client with executeQuery(query)');
});
it('rejects unsafe project and region identifiers before building SQL', () => {
expect(() => new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'project`1', region: 'US' })).toThrow(
'Invalid BigQuery project id for historic-SQL ingest: project`1',
);
expect(() => new BigQueryHistoricSqlQueryHistoryReader({ projectId: 'project-1', region: 'US;DROP' })).toThrow(
'Invalid BigQuery region for historic-SQL ingest: US;DROP',
);
});
});

@ -0,0 +1,247 @@
import { HistoricSqlGrantsMissingError } from './errors.js';
import {
aggregatedTemplateSchema,
type AggregatedTemplate,
type HistoricSqlTimeWindow,
type HistoricSqlUnifiedPullConfig,
} from './types.js';
interface QueryResultLike {
headers: string[];
rows: unknown[][];
totalRows: number;
error?: string;
}
interface QueryClientLike {
executeQuery(query: string): Promise<QueryResultLike>;
}
export interface BigQueryHistoricSqlQueryHistoryReaderOptions {
projectId: string;
region: string;
}
const BIGQUERY_GRANTS_REMEDIATION =
'Grant roles/bigquery.resourceViewer on the BigQuery project, or grant a custom role containing bigquery.jobs.listAll.';
function queryClient(client: unknown): QueryClientLike {
if (
client &&
typeof client === 'object' &&
'executeQuery' in client &&
typeof (client as { executeQuery?: unknown }).executeQuery === 'function'
) {
return client as QueryClientLike;
}
throw new Error('Historic SQL BigQuery reader requires a query client with executeQuery(query)');
}
function grantsError(cause: unknown): HistoricSqlGrantsMissingError {
const message =
cause instanceof Error
? cause.message
: typeof cause === 'string'
? cause
: 'BigQuery principal cannot query INFORMATION_SCHEMA.JOBS_BY_PROJECT.';
return new HistoricSqlGrantsMissingError({
dialect: 'bigquery',
message: `Missing BigQuery audit grants for historic-SQL ingest: ${message}`,
remediation: BIGQUERY_GRANTS_REMEDIATION,
cause,
});
}
function normalizeProjectId(value: string): string {
if (!/^[A-Za-z0-9_-]+$/.test(value)) {
throw new Error(`Invalid BigQuery project id for historic-SQL ingest: ${value}`);
}
return value;
}
function normalizeRegion(value: string): string {
const region = value.trim().toLowerCase().replace(/^region-/, '');
if (!/^[a-z0-9-]+$/.test(region)) {
throw new Error(`Invalid BigQuery region for historic-SQL ingest: ${value}`);
}
return region;
}
function timestampExpression(value: Date | string): string {
const date = value instanceof Date ? value : new Date(value);
if (Number.isNaN(date.getTime())) {
throw new Error(`Invalid BigQuery query-history timestamp: ${String(value)}`);
}
return `TIMESTAMP('${date.toISOString().replace(/'/g, "\\'")}')`;
}
function indexByHeader(headers: string[]): Map<string, number> {
const out = new Map<string, number>();
headers.forEach((header, index) => {
out.set(header.toUpperCase(), index);
});
return out;
}
function value(row: unknown[], indexes: Map<string, number>, name: string): unknown {
const index = indexes.get(name.toUpperCase());
return index === undefined ? null : row[index];
}
function nullableString(raw: unknown): string | null {
if (raw === null || raw === undefined) {
return null;
}
const text = String(raw);
return text.length > 0 ? text : null;
}
function requiredString(raw: unknown, field: string): string {
const text = nullableString(raw);
if (!text) {
throw new Error(`BigQuery JOBS_BY_PROJECT row is missing ${field}`);
}
return text;
}
function nullableNumber(raw: unknown): number | null {
if (raw === null || raw === undefined || raw === '') {
return null;
}
const number = typeof raw === 'number' ? raw : Number(raw);
if (!Number.isFinite(number)) {
return null;
}
return Math.max(0, number);
}
function requiredNumber(raw: unknown, field: string): number {
const number = nullableNumber(raw);
if (number === null) {
throw new Error(`BigQuery JOBS_BY_PROJECT row has invalid ${field}: ${String(raw)}`);
}
return number;
}
function requiredInteger(raw: unknown, field: string): number {
return Math.trunc(requiredNumber(raw, field));
}
function nullableInteger(raw: unknown): number | null {
const number = nullableNumber(raw);
return number === null ? null : Math.trunc(number);
}
function isoTimestamp(raw: unknown, field: string): string {
if (raw instanceof Date) {
return raw.toISOString();
}
const text = requiredString(raw, field);
const date = new Date(text);
if (Number.isNaN(date.getTime())) {
throw new Error(`BigQuery JOBS_BY_PROJECT row has invalid ${field}: ${text}`);
}
return date.toISOString();
}
function parseTopUsers(raw: unknown): Array<{ user: string | null; executions: number }> {
const text = nullableString(raw);
if (!text) {
return [];
}
try {
const parsed = JSON.parse(text) as unknown;
if (!Array.isArray(parsed)) {
return [];
}
return parsed.flatMap((entry) => {
if (!entry || typeof entry !== 'object') {
return [];
}
const user = nullableString((entry as { user?: unknown }).user);
const executions = nullableInteger((entry as { executions?: unknown }).executions);
return executions === null ? [] : [{ user, executions }];
});
} catch {
return [];
}
}
function mapAggregatedRow(row: unknown[], indexes: Map<string, number>): AggregatedTemplate {
return aggregatedTemplateSchema.parse({
templateId: requiredString(value(row, indexes, 'template_id'), 'template_id'),
canonicalSql: requiredString(value(row, indexes, 'canonical_sql'), 'canonical_sql'),
dialect: 'bigquery',
stats: {
executions: requiredInteger(value(row, indexes, 'executions'), 'executions'),
distinctUsers: requiredInteger(value(row, indexes, 'distinct_users'), 'distinct_users'),
firstSeen: isoTimestamp(value(row, indexes, 'first_seen'), 'first_seen'),
lastSeen: isoTimestamp(value(row, indexes, 'last_seen'), 'last_seen'),
p50RuntimeMs: nullableNumber(value(row, indexes, 'p50_ms')),
p95RuntimeMs: nullableNumber(value(row, indexes, 'p95_ms')),
errorRate: requiredNumber(value(row, indexes, 'error_rate'), 'error_rate'),
rowsProduced: nullableInteger(value(row, indexes, 'rows_produced')),
},
topUsers: parseTopUsers(value(row, indexes, 'top_users')),
});
}
export class BigQueryHistoricSqlQueryHistoryReader {
private readonly viewPath: string;
constructor(options: BigQueryHistoricSqlQueryHistoryReaderOptions) {
const projectId = normalizeProjectId(options.projectId);
const region = normalizeRegion(options.region);
this.viewPath = `\`${projectId}.region-${region}.INFORMATION_SCHEMA.JOBS_BY_PROJECT\``;
}
async probe(client: unknown): Promise<{ warnings: string[]; info: string[] }> {
let result: QueryResultLike;
try {
result = await queryClient(client).executeQuery(`SELECT 1 FROM ${this.viewPath} LIMIT 1`);
} catch (error) {
throw grantsError(error);
}
if (result.error) {
throw grantsError(result.error);
}
return { warnings: [], info: [] };
}
async *fetchAggregated(
client: unknown,
window: HistoricSqlTimeWindow,
config: HistoricSqlUnifiedPullConfig,
): AsyncIterable<AggregatedTemplate> {
const sql = `
SELECT
query_hash AS template_id,
MIN(query) AS canonical_sql,
COUNT(*) AS executions,
COUNT(DISTINCT user_email) AS distinct_users,
MIN(creation_time) AS first_seen,
MAX(creation_time) AS last_seen,
APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(50)] AS p50_ms,
APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(95)] AS p95_ms,
SAFE_DIVIDE(COUNTIF(error_result IS NOT NULL), COUNT(*)) AS error_rate,
CAST(NULL AS INT64) AS rows_produced,
TO_JSON_STRING(ARRAY_AGG(STRUCT(user_email AS user, 1 AS executions) ORDER BY creation_time DESC LIMIT 5)) AS top_users
FROM ${this.viewPath}
WHERE job_type = 'QUERY'
AND statement_type IN ('SELECT', 'MERGE')
AND creation_time >= ${timestampExpression(window.start)}
AND creation_time < ${timestampExpression(window.end)}
AND query IS NOT NULL
GROUP BY query_hash
HAVING COUNT(*) >= ${config.minExecutions}
ORDER BY executions DESC`.trim();
const result = await queryClient(client).executeQuery(sql);
if (result.error) {
throw grantsError(result.error);
}
const indexes = indexByHeader(result.headers);
for (const row of result.rows) {
yield mapAggregatedRow(row, indexes);
}
}
}
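
A minimal sketch of the identifier handling the reader does before any SQL is built. The two `normalize*` functions are standalone copies of the ones above; `buildJobsViewPath` is a hypothetical helper that only mirrors what the constructor assembles into `viewPath`, and the sample inputs are assumptions.

```typescript
// Illustrative sketch: standalone copies of the identifier checks from the reader.
function normalizeProjectId(value: string): string {
  if (!/^[A-Za-z0-9_-]+$/.test(value)) {
    throw new Error(`Invalid BigQuery project id for historic-SQL ingest: ${value}`);
  }
  return value;
}

function normalizeRegion(value: string): string {
  // Accept 'US', 'us-central1', or an already-prefixed 'region-us'.
  const region = value.trim().toLowerCase().replace(/^region-/, '');
  if (!/^[a-z0-9-]+$/.test(region)) {
    throw new Error(`Invalid BigQuery region for historic-SQL ingest: ${value}`);
  }
  return region;
}

// Hypothetical helper (not in this change): builds the backtick-quoted view path
// only after both identifiers have passed validation.
function buildJobsViewPath(projectId: string, region: string): string {
  return `\`${normalizeProjectId(projectId)}.region-${normalizeRegion(region)}.INFORMATION_SCHEMA.JOBS_BY_PROJECT\``;
}

const viewPath = buildJobsViewPath('project-1', ' region-US ');
console.log(viewPath);
// -> `project-1.region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` (including the backticks)
```

Validating against an allowlist pattern, rather than escaping, means a backtick or semicolon in either identifier is rejected outright and never reaches the interpolated query text.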

@ -0,0 +1,59 @@
import { describe, expect, it } from 'vitest';
import {
bucketDistinctUsers,
bucketErrorRate,
bucketExecutions,
bucketFrequency,
bucketP95Runtime,
bucketRecency,
} from './buckets.js';
describe('historic-sql bucket helpers', () => {
it('uses stable execution buckets', () => {
expect([0, 9, 10, 99, 100, 999, 1000, 4999, 5000, 49999, 50000].map(bucketExecutions)).toEqual([
'<10',
'<10',
'10-100',
'10-100',
'100-1k',
'100-1k',
'1k-5k',
'1k-5k',
'5k-50k',
'5k-50k',
'>50k',
]);
});
it('uses stable distinct-user, error-rate, runtime, and recency buckets', () => {
expect([0, 1, 2, 5, 6, 10, 11].map(bucketDistinctUsers)).toEqual([
'0',
'1',
'2-5',
'2-5',
'5-10',
'5-10',
'>10',
]);
expect([0, 0.01, 0.05, 0.2].map(bucketErrorRate)).toEqual(['none', 'low', 'low', 'high']);
expect([null, 99, 100, 999, 1000, 9999, 10000].map(bucketP95Runtime)).toEqual([
'unknown',
'<100ms',
'100ms-1s',
'100ms-1s',
'1s-10s',
'1s-10s',
'>10s',
]);
expect(bucketRecency('2026-05-11T00:00:00.000Z', new Date('2026-05-11T12:00:00.000Z'))).toBe('current');
expect(bucketRecency('2026-04-20T00:00:00.000Z', new Date('2026-05-11T12:00:00.000Z'))).toBe('recent');
expect(bucketRecency('2026-01-01T00:00:00.000Z', new Date('2026-05-11T12:00:00.000Z'))).toBe('stale');
});
it('maps frequency counts to high, mid, and low labels', () => {
expect(bucketFrequency(80, 100)).toBe('high');
expect(bucketFrequency(20, 100)).toBe('mid');
expect(bucketFrequency(1, 100)).toBe('low');
expect(bucketFrequency(0, 0)).toBe('low');
});
});

@ -0,0 +1,49 @@
export function bucketExecutions(value: number): string {
if (value < 10) return '<10';
if (value < 100) return '10-100';
if (value < 1000) return '100-1k';
if (value < 5000) return '1k-5k';
if (value < 50000) return '5k-50k';
return '>50k';
}
export function bucketDistinctUsers(value: number): string {
if (value <= 0) return '0';
if (value === 1) return '1';
if (value <= 5) return '2-5';
if (value <= 10) return '5-10';
return '>10';
}
export function bucketErrorRate(value: number): string {
if (value <= 0) return 'none';
if (value < 0.1) return 'low';
return 'high';
}
export function bucketP95Runtime(value: number | null): string {
if (value === null) return 'unknown';
if (value < 100) return '<100ms';
if (value < 1000) return '100ms-1s';
if (value < 10000) return '1s-10s';
return '>10s';
}
export function bucketRecency(lastSeen: string, now: Date): string {
const parsed = new Date(lastSeen);
if (Number.isNaN(parsed.getTime())) {
return 'unknown';
}
const ageDays = (now.getTime() - parsed.getTime()) / (24 * 60 * 60 * 1000);
if (ageDays <= 7) return 'current';
if (ageDays <= 45) return 'recent';
return 'stale';
}
export function bucketFrequency(count: number, total: number): 'high' | 'mid' | 'low' {
if (total <= 0 || count <= 0) return 'low';
const ratio = count / total;
if (ratio >= 0.5) return 'high';
if (ratio >= 0.1) return 'mid';
return 'low';
}
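
A minimal sketch of how these helpers could turn raw template stats into the bucketed `stats` shape that appears in the staged `tables/*.json` fixtures of this change. The bucket functions are compact standalone copies of the ones above; `RawTemplateStats` and `toBucketedStats` are hypothetical names, and the actual call site is not shown in this part of the diff.

```typescript
// Illustrative sketch. `RawTemplateStats` and `toBucketedStats` are hypothetical.
interface RawTemplateStats {
  executions: number;
  distinctUsers: number;
  errorRate: number;
  p95RuntimeMs: number | null;
  lastSeen: string;
}

// Compact copies of the bucket helpers, with the same thresholds and labels.
const bucketExecutions = (v: number): string =>
  v < 10 ? '<10' : v < 100 ? '10-100' : v < 1000 ? '100-1k' : v < 5000 ? '1k-5k' : v < 50000 ? '5k-50k' : '>50k';
const bucketDistinctUsers = (v: number): string =>
  v <= 0 ? '0' : v === 1 ? '1' : v <= 5 ? '2-5' : v <= 10 ? '5-10' : '>10';
const bucketErrorRate = (v: number): string => (v <= 0 ? 'none' : v < 0.1 ? 'low' : 'high');
const bucketP95Runtime = (v: number | null): string =>
  v === null ? 'unknown' : v < 100 ? '<100ms' : v < 1000 ? '100ms-1s' : v < 10000 ? '1s-10s' : '>10s';

function bucketRecency(lastSeen: string, now: Date): string {
  const parsed = new Date(lastSeen);
  if (Number.isNaN(parsed.getTime())) return 'unknown';
  const ageDays = (now.getTime() - parsed.getTime()) / (24 * 60 * 60 * 1000);
  return ageDays <= 7 ? 'current' : ageDays <= 45 ? 'recent' : 'stale';
}

// Replace exact numbers with coarse labels so staged files carry no raw counts.
function toBucketedStats(stats: RawTemplateStats, now: Date) {
  return {
    executionsBucket: bucketExecutions(stats.executions),
    distinctUsersBucket: bucketDistinctUsers(stats.distinctUsers),
    errorRateBucket: bucketErrorRate(stats.errorRate),
    p95RuntimeBucket: bucketP95Runtime(stats.p95RuntimeMs),
    recencyBucket: bucketRecency(stats.lastSeen, now),
  };
}

const bucketed = toBucketedStats(
  { executions: 42, distinctUsers: 3, errorRate: 0.05, p95RuntimeMs: 40, lastSeen: '2026-05-11T00:00:00.000Z' },
  new Date('2026-05-11T12:00:00.000Z'),
);
console.log(bucketed);
// -> executionsBucket '10-100', distinctUsersBucket '2-5', errorRateBucket 'low',
//    p95RuntimeBucket '<100ms', recencyBucket 'current'
```

Passing `now` explicitly, as `bucketRecency` already does, keeps the mapping deterministic under test.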

@ -0,0 +1,182 @@
import { mkdir, mkdtemp, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it } from 'vitest';
import { chunkHistoricSqlUnifiedStagedDir, describeHistoricSqlUnifiedScope } from './chunk-unified.js';
async function tempDir(): Promise<string> {
return mkdtemp(join(tmpdir(), 'historic-sql-unified-chunk-'));
}
async function writeJson(root: string, relPath: string, value: unknown): Promise<void> {
const target = join(root, relPath);
await mkdir(join(target, '..'), { recursive: true });
await writeFile(target, `${JSON.stringify(value, null, 2)}\n`, 'utf-8');
}
async function writeUnifiedStagedDir(root: string): Promise<void> {
await writeJson(root, 'manifest.json', {
source: 'historic-sql',
connectionId: 'warehouse',
dialect: 'postgres',
fetchedAt: '2026-05-11T00:00:00.000Z',
windowStart: '2026-02-10T00:00:00.000Z',
windowEnd: '2026-05-11T00:00:00.000Z',
snapshotRowCount: 1,
touchedTableCount: 1,
parseFailures: 0,
warnings: [],
probeWarnings: [],
});
await writeJson(root, 'tables/public.orders.json', {
table: 'public.orders',
stats: {
executionsBucket: '10-100',
distinctUsersBucket: '2-5',
errorRateBucket: 'none',
p95RuntimeBucket: '<100ms',
recencyBucket: 'current',
},
columnsByClause: { select: [['status', 'high']] },
observedJoins: [],
topTemplates: [{ id: 'orders', canonicalSql: 'select * from public.orders', topUsers: [{ user: 'analyst' }] }],
});
await writeJson(root, 'patterns-input.json', {
templates: [
{
id: 'orders',
canonicalSql: 'select * from public.orders join public.customers on true',
tablesTouched: ['public.orders', 'public.customers'],
executionsBucket: '10-100',
distinctUsersBucket: '2-5',
dialect: 'postgres',
},
],
});
await writeJson(root, 'patterns-input/part-0001.json', {
templates: [
{
id: 'orders',
canonicalSql: 'select * from public.orders join public.customers on true',
tablesTouched: ['public.orders', 'public.customers'],
executionsBucket: '10-100',
distinctUsersBucket: '2-5',
dialect: 'postgres',
},
],
});
}
describe('chunkHistoricSqlUnifiedStagedDir', () => {
it('emits one table WorkUnit plus one patterns WorkUnit', async () => {
const stagedDir = await tempDir();
await writeUnifiedStagedDir(stagedDir);
const result = await chunkHistoricSqlUnifiedStagedDir(stagedDir);
expect(result.workUnits).toEqual([
expect.objectContaining({
unitKey: 'historic-sql-table-public-orders',
displayLabel: 'Historic SQL usage: public.orders',
rawFiles: ['tables/public.orders.json'],
dependencyPaths: ['manifest.json'],
notes: expect.stringContaining('historic_sql_table_digest'),
}),
expect.objectContaining({
unitKey: 'historic-sql-patterns-part-0001',
displayLabel: 'Historic SQL cross-table patterns: part-0001',
rawFiles: ['patterns-input/part-0001.json'],
dependencyPaths: ['manifest.json'],
notes: expect.stringContaining('patterns-input/part-0001.json'),
}),
]);
expect(result.workUnits[0]?.notes).toContain('emit_historic_sql_evidence');
expect(result.workUnits[1]?.notes).toContain('emit_historic_sql_evidence');
expect(result.reconcileNotes).toEqual(['Historic-SQL touched tables=1 parseFailures=0']);
});
it('respects diff sets for unchanged table and patterns files', async () => {
const stagedDir = await tempDir();
await writeUnifiedStagedDir(stagedDir);
await expect(
chunkHistoricSqlUnifiedStagedDir(stagedDir, {
added: [],
modified: ['tables/public.orders.json'],
deleted: [],
unchanged: ['manifest.json', 'patterns-input.json', 'patterns-input/part-0001.json'],
}),
).resolves.toMatchObject({
workUnits: [expect.objectContaining({ unitKey: 'historic-sql-table-public-orders' })],
});
await expect(
chunkHistoricSqlUnifiedStagedDir(stagedDir, {
added: [],
modified: ['patterns-input/part-0001.json'],
deleted: [],
unchanged: ['manifest.json', 'patterns-input.json', 'tables/public.orders.json'],
}),
).resolves.toMatchObject({
workUnits: [expect.objectContaining({ unitKey: 'historic-sql-patterns-part-0001' })],
});
await expect(
chunkHistoricSqlUnifiedStagedDir(stagedDir, {
added: [],
modified: ['patterns-input.json'],
deleted: [],
unchanged: ['manifest.json', 'patterns-input/part-0001.json', 'tables/public.orders.json'],
}),
).resolves.toMatchObject({
workUnits: [],
});
});
it('describes unified staged scope', async () => {
const stagedDir = await tempDir();
await writeUnifiedStagedDir(stagedDir);
const scope = await describeHistoricSqlUnifiedScope(stagedDir);
expect(scope.isPathInScope('manifest.json')).toBe(true);
expect(scope.isPathInScope('patterns-input.json')).toBe(true);
expect(scope.isPathInScope('patterns-input/part-0001.json')).toBe(true);
expect(scope.isPathInScope('patterns-input/part-1.json')).toBe(false);
expect(scope.isPathInScope('tables/public.orders.json')).toBe(true);
expect(scope.isPathInScope('templates/old/page.md')).toBe(false);
});
it('emits one patterns WorkUnit per changed shard', async () => {
const stagedDir = await tempDir();
await writeUnifiedStagedDir(stagedDir);
await writeJson(stagedDir, 'patterns-input/part-0002.json', {
templates: [
{
id: 'line-items',
canonicalSql: 'select * from public.orders join public.line_items on true',
tablesTouched: ['public.orders', 'public.line_items'],
executionsBucket: '10-100',
distinctUsersBucket: '2-5',
dialect: 'postgres',
},
],
});
const result = await chunkHistoricSqlUnifiedStagedDir(stagedDir, {
added: ['patterns-input/part-0002.json'],
modified: ['patterns-input/part-0001.json'],
deleted: [],
unchanged: ['manifest.json', 'patterns-input.json', 'tables/public.orders.json'],
});
expect(result.workUnits.map((unit) => unit.unitKey)).toEqual([
'historic-sql-patterns-part-0001',
'historic-sql-patterns-part-0002',
]);
expect(result.workUnits.map((unit) => unit.rawFiles)).toEqual([
['patterns-input/part-0001.json'],
['patterns-input/part-0002.json'],
]);
});
});


@ -0,0 +1,99 @@
import { createHash } from 'node:crypto';
import { readFile, readdir } from 'node:fs/promises';
import { join, relative } from 'node:path';
import type { ChunkResult, DiffSet, ScopeDescriptor, WorkUnit } from '../../types.js';
import { isHistoricSqlPatternInputShardPath } from './pattern-inputs.js';
import { stagedManifestSchema, stagedPatternsInputSchema, stagedTableInputSchema } from './types.js';
async function walk(root: string): Promise<string[]> {
const entries = await readdir(root, { withFileTypes: true, recursive: true });
return entries
.filter((entry) => entry.isFile())
.map((entry) => relative(root, join(entry.parentPath, entry.name)).replace(/\\/g, '/'))
.sort();
}
async function readJson<T>(stagedDir: string, relPath: string): Promise<T> {
return JSON.parse(await readFile(join(stagedDir, relPath), 'utf-8')) as T;
}
function safeUnitKey(value: string): string {
return value.replace(/[^a-zA-Z0-9]+/g, '-').replace(/^-+|-+$/g, '');
}
function touchedPath(path: string, touched: Set<string> | null): boolean {
return !touched || touched.has(path);
}
export async function chunkHistoricSqlUnifiedStagedDir(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
const files = await walk(stagedDir);
const manifest = stagedManifestSchema.parse(await readJson(stagedDir, 'manifest.json'));
const touched = diffSet ? new Set([...diffSet.added, ...diffSet.modified]) : null;
const workUnits: WorkUnit[] = [];
for (const path of files.filter((file) => /^tables\/.+\.json$/.test(file))) {
if (!touchedPath(path, touched)) {
continue;
}
const table = stagedTableInputSchema.parse(await readJson(stagedDir, path));
workUnits.push({
unitKey: `historic-sql-table-${safeUnitKey(table.table)}`,
displayLabel: `Historic SQL usage: ${table.table}`,
rawFiles: [path],
dependencyPaths: ['manifest.json'],
peerFileIndex: files.filter((file) => file !== path && file !== 'manifest.json').sort(),
notes:
'Use historic_sql_table_digest. Read this table usage JSON and emit exactly one table_usage object with emit_historic_sql_evidence. Do not call wiki_write or sl_write_source.',
});
}
for (const path of files.filter(isHistoricSqlPatternInputShardPath)) {
if (!touchedPath(path, touched)) {
continue;
}
stagedPatternsInputSchema.parse(await readJson(stagedDir, path));
const shardLabel = path.replace(/^patterns-input\//, '').replace(/\.json$/, '');
workUnits.push({
unitKey: `historic-sql-patterns-${safeUnitKey(shardLabel)}`,
displayLabel: `Historic SQL cross-table patterns: ${shardLabel}`,
rawFiles: [path],
dependencyPaths: ['manifest.json'],
peerFileIndex: files.filter((file) => file !== path && file !== 'manifest.json').sort(),
notes:
`Use historic_sql_patterns. Read ${path} and emit pattern objects with emit_historic_sql_evidence using rawPath "${path}". Do not call wiki_write or sl_write_source.`,
});
}
const deleted = diffSet?.deleted
.filter((path) => isHistoricSqlPatternInputShardPath(path) || /^tables\/.+\.json$/.test(path))
.sort();
return {
workUnits,
eviction: deleted && deleted.length > 0 ? { deletedRawPaths: deleted } : undefined,
reconcileNotes: [`Historic-SQL touched tables=${manifest.touchedTableCount} parseFailures=${manifest.parseFailures}`],
contextReport: {
capped: false,
warnings: [...manifest.probeWarnings, ...manifest.warnings],
},
};
}
export async function describeHistoricSqlUnifiedScope(stagedDir: string): Promise<ScopeDescriptor> {
const manifest = stagedManifestSchema.parse(await readJson(stagedDir, 'manifest.json'));
const fingerprint = createHash('sha256')
.update(JSON.stringify({
connectionId: manifest.connectionId,
dialect: manifest.dialect,
windowStart: manifest.windowStart,
windowEnd: manifest.windowEnd,
}))
.digest('hex');
return {
fingerprint,
isPathInScope: (rawPath) =>
rawPath === 'manifest.json' ||
rawPath === 'patterns-input.json' ||
isHistoricSqlPatternInputShardPath(rawPath) ||
/^tables\/.+\.json$/.test(rawPath),
};
}
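The unit-key derivation and diff filtering used by `chunkHistoricSqlUnifiedStagedDir` can be sketched in isolation. This is a minimal sketch under stated assumptions, not the module itself: `DiffLike` and `selectTouched` are hypothetical names introduced for illustration, and the staged file list is hard-coded instead of being read from disk.

```typescript
// Hypothetical stand-in for the DiffSet fields the chunker reads (assumed shape).
interface DiffLike {
  added: string[];
  modified: string[];
  deleted: string[];
  unchanged: string[];
}

// Same normalisation as safeUnitKey in the diff: collapse runs of
// non-alphanumeric characters to '-' and trim leading/trailing dashes.
function safeUnitKey(value: string): string {
  return value.replace(/[^a-zA-Z0-9]+/g, '-').replace(/^-+|-+$/g, '');
}

// Hypothetical helper: keep only paths that were added or modified.
// Without a diff set, every staged path counts as touched.
function selectTouched(files: string[], diff?: DiffLike): string[] {
  if (!diff) {
    return files;
  }
  const touched = new Set([...diff.added, ...diff.modified]);
  return files.filter((file) => touched.has(file));
}

const stagedFiles = ['patterns-input/part-0001.json', 'tables/public.orders.json'];

// Only the table file is marked as modified, so only it survives the filter.
const touchedFiles = selectTouched(stagedFiles, {
  added: [],
  modified: ['tables/public.orders.json'],
  deleted: [],
  unchanged: ['manifest.json', 'patterns-input/part-0001.json'],
});

// 'public.orders' becomes 'public-orders' after normalisation.
const tableKey = `historic-sql-table-${safeUnitKey('public.orders')}`;
```

Treating a missing diff set as "everything touched" mirrors the `touchedPath` helper in the diff, which lets a first run and an incremental run share one code path.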


@ -0,0 +1,57 @@
import { mkdir, mkdtemp, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it } from 'vitest';
import { detectHistoricSqlStagedDir } from './detect.js';
import { HISTORIC_SQL_SOURCE_KEY, stagedManifestSchema } from './types.js';
async function tempDir(): Promise<string> {
return mkdtemp(join(tmpdir(), 'historic-sql-detect-'));
}
async function writeJson(root: string, relPath: string, value: unknown): Promise<void> {
const target = join(root, relPath);
await mkdir(join(target, '..'), { recursive: true });
await writeFile(target, `${JSON.stringify(value, null, 2)}\n`, 'utf-8');
}
function manifest() {
return stagedManifestSchema.parse({
source: HISTORIC_SQL_SOURCE_KEY,
connectionId: 'conn_1',
dialect: 'postgres',
fetchedAt: '2026-05-04T12:00:00.000Z',
windowStart: '2026-02-03T12:00:00.000Z',
windowEnd: '2026-05-04T12:00:00.000Z',
snapshotRowCount: 0,
touchedTableCount: 0,
parseFailures: 0,
warnings: [],
probeWarnings: [],
});
}
describe('historic-sql staged dir detection', () => {
it('detects manifest source', async () => {
const stagedDir = await tempDir();
await writeJson(stagedDir, 'manifest.json', manifest());
await expect(detectHistoricSqlStagedDir(stagedDir)).resolves.toBe(true);
});
it('detects unified table and patterns structure without manifest', async () => {
const stagedDir = await tempDir();
await writeFile(join(stagedDir, 'not-a-match.txt'), 'x', 'utf-8');
await writeJson(stagedDir, 'patterns-input.json', { templates: [] });
await writeJson(stagedDir, 'tables/public.orders.json', { table: 'public.orders' });
await expect(detectHistoricSqlStagedDir(stagedDir)).resolves.toBe(true);
});
it('does not detect unrelated directories', async () => {
const stagedDir = await tempDir();
await writeJson(stagedDir, 'manifest.json', { source: 'notion' });
await expect(detectHistoricSqlStagedDir(stagedDir)).resolves.toBe(false);
});
});


@ -0,0 +1,25 @@
import { readFile, readdir } from 'node:fs/promises';
import { join } from 'node:path';
import { HISTORIC_SQL_SOURCE_KEY } from './types.js';
export async function detectHistoricSqlStagedDir(stagedDir: string): Promise<boolean> {
try {
const manifest = JSON.parse(await readFile(join(stagedDir, 'manifest.json'), 'utf-8')) as { source?: unknown };
if (manifest.source === HISTORIC_SQL_SOURCE_KEY) {
return true;
}
if (manifest.source !== undefined) {
return false;
}
} catch {
// Fall through to structural detection for stage-only fixtures.
}
try {
await readFile(join(stagedDir, 'patterns-input.json'), 'utf-8');
const entries = await readdir(join(stagedDir, 'tables'), { withFileTypes: true });
return entries.some((entry) => entry.isFile() && entry.name.endsWith('.json'));
} catch {
return false;
}
}
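The decision order in `detectHistoricSqlStagedDir` can be restated as a pure function over facts that were already probed. This is a rough sketch with assumptions: `ManifestProbe` and `decideHistoricSqlStagedDir` are hypothetical names, and the literal `'historic-sql'` stands in for `HISTORIC_SQL_SOURCE_KEY`, whose value is not shown in this diff.

```typescript
// Hypothetical probe result: null means manifest.json was missing or unreadable.
type ManifestProbe = { source?: unknown } | null;

// Assumed value of HISTORIC_SQL_SOURCE_KEY (not visible in this diff).
const ASSUMED_SOURCE_KEY = 'historic-sql';

function decideHistoricSqlStagedDir(
  manifest: ManifestProbe,
  hasPatternsInput: boolean,
  tableJsonCount: number,
): boolean {
  if (manifest) {
    // A manifest that names this source is decisive.
    if (manifest.source === ASSUMED_SOURCE_KEY) {
      return true;
    }
    // A manifest that names any other source is decisive the other way.
    if (manifest.source !== undefined) {
      return false;
    }
  }
  // Otherwise fall back to structure: patterns-input.json plus at least one tables/*.json file.
  return hasPatternsInput && tableJsonCount > 0;
}

const detectedByManifest = decideHistoricSqlStagedDir({ source: 'historic-sql' }, false, 0);
const detectedByStructure = decideHistoricSqlStagedDir(null, true, 1);
const rejectedOtherSource = decideHistoricSqlStagedDir({ source: 'notion' }, true, 1);
```

The three constants correspond to the three cases in the detection test file: a matching manifest, a stage-only fixture with no manifest, and a manifest that belongs to another source.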


@ -0,0 +1,61 @@
import type { HistoricSqlDialect } from './types.js';
interface HistoricSqlGrantsMissingErrorOptions {
dialect: HistoricSqlDialect;
message: string;
remediation: string;
cause?: unknown;
}
export class HistoricSqlGrantsMissingError extends Error {
readonly dialect: HistoricSqlDialect;
readonly remediation: string;
constructor(options: HistoricSqlGrantsMissingErrorOptions) {
super(options.message, options.cause === undefined ? undefined : { cause: options.cause });
this.name = 'HistoricSqlGrantsMissingError';
this.dialect = options.dialect;
this.remediation = options.remediation;
}
}
interface HistoricSqlExtensionMissingErrorOptions {
dialect: HistoricSqlDialect;
message: string;
remediation: string;
cause?: unknown;
}
export class HistoricSqlExtensionMissingError extends Error {
readonly dialect: HistoricSqlDialect;
readonly remediation: string;
constructor(options: HistoricSqlExtensionMissingErrorOptions) {
super(options.message, options.cause === undefined ? undefined : { cause: options.cause });
this.name = 'HistoricSqlExtensionMissingError';
this.dialect = options.dialect;
this.remediation = options.remediation;
}
}
interface HistoricSqlVersionUnsupportedErrorOptions {
dialect: HistoricSqlDialect;
detectedVersion: string;
minimumVersion: string;
}
export class HistoricSqlVersionUnsupportedError extends Error {
readonly dialect: HistoricSqlDialect;
readonly detectedVersion: string;
readonly minimumVersion: string;
constructor(options: HistoricSqlVersionUnsupportedErrorOptions) {
super(
`Unsupported ${options.dialect} version for historic-SQL ingest: detected ${options.detectedVersion}; requires ${options.minimumVersion} or newer.`,
);
this.name = 'HistoricSqlVersionUnsupportedError';
this.dialect = options.dialect;
this.detectedVersion = options.detectedVersion;
this.minimumVersion = options.minimumVersion;
}
}


@ -0,0 +1,89 @@
import { describe, expect, it, vi } from 'vitest';
import { asSchema } from 'ai';
import { createEmitHistoricSqlEvidenceTool } from './evidence-tool.js';
describe('emit_historic_sql_evidence tool', () => {
it('exposes an AI SDK v6 tool input schema with top-level object type', async () => {
const tool = createEmitHistoricSqlEvidenceTool();
expect(await asSchema(tool.inputSchema).jsonSchema).toMatchObject({
type: 'object',
});
});
it('writes table usage evidence to the ignored run evidence directory', async () => {
const writeFile = vi.fn(async () => ({ success: true, commitHash: null }));
const tool = createEmitHistoricSqlEvidenceTool();
const result = await tool.execute!(
{
kind: 'table_usage',
table: 'public.orders',
rawPath: 'tables/public.orders.json',
usage: {
narrative: 'Orders are repeatedly queried by paid status.',
frequencyTier: 'high',
commonFilters: ['status'],
commonJoins: [],
staleSince: null,
},
},
{
toolCallId: 'call-1',
messages: [],
abortSignal: new AbortController().signal,
experimental_context: {
connectionId: 'warehouse',
session: {
ingest: { runId: 'run-1', jobId: 'job-1', syncId: 'sync-1', sourceKey: 'historic-sql' },
configService: { writeFile },
},
},
} as never,
);
expect(result).toBe('Recorded historic-SQL table_usage evidence for public.orders.');
expect(writeFile).toHaveBeenCalledWith(
'.ktx/ingest-evidence/historic-sql/run-1/historic-sql-table-public-orders.json',
expect.stringContaining('"kind": "table_usage"'),
'System User',
'system@example.com',
'Record historic-SQL evidence: historic-sql-table-public-orders',
{ skipLock: true },
);
});
it('rejects non-historic ingest sessions', async () => {
const tool = createEmitHistoricSqlEvidenceTool();
await expect(
tool.execute!(
{
kind: 'pattern',
rawPath: 'patterns-input.json',
pattern: {
slug: 'orders',
title: 'Orders',
narrative: 'Orders pattern.',
definitionSql: 'select * from public.orders',
tablesInvolved: ['public.orders'],
slRefs: ['orders'],
constituentTemplateIds: ['pg:1'],
},
},
{
toolCallId: 'call-1',
messages: [],
abortSignal: new AbortController().signal,
experimental_context: {
connectionId: 'warehouse',
session: {
ingest: { runId: 'run-1', jobId: 'job-1', syncId: 'sync-1', sourceKey: 'notion' },
configService: { writeFile: vi.fn() },
},
},
} as never,
),
).resolves.toContain('Error: emit_historic_sql_evidence is only available during historic-sql ingest');
});
});


@ -0,0 +1,121 @@
import { tool } from 'ai';
import { z } from 'zod';
import { historicSqlEvidencePath, serializeHistoricSqlEvidence } from './evidence.js';
import { patternOutputSchema, tableUsageOutputSchema } from './skill-schemas.js';
const SYSTEM_AUTHOR = 'System User';
const SYSTEM_EMAIL = 'system@example.com';
const emitHistoricSqlEvidenceInputSchema = z
.object({
kind: z.enum(['table_usage', 'pattern']),
table: z.string().min(1).optional(),
rawPath: z.string().min(1),
usage: tableUsageOutputSchema.optional(),
pattern: patternOutputSchema.optional(),
})
.superRefine((input, ctx) => {
if (input.kind === 'table_usage') {
if (!input.table) {
ctx.addIssue({
code: 'custom',
path: ['table'],
message: 'table is required when kind is table_usage',
});
}
if (!input.usage) {
ctx.addIssue({
code: 'custom',
path: ['usage'],
message: 'usage is required when kind is table_usage',
});
}
}
if (input.kind === 'pattern' && !input.pattern) {
ctx.addIssue({
code: 'custom',
path: ['pattern'],
message: 'pattern is required when kind is pattern',
});
}
});
type EmitHistoricSqlEvidenceInput = z.infer<typeof emitHistoricSqlEvidenceInputSchema>;
interface EmitHistoricSqlEvidenceToolContext {
connectionId?: string | null;
session?: {
ingest?: { runId: string; sourceKey: string };
configService?: {
writeFile(
path: string,
content: string,
author: string,
authorEmail: string,
commitMessage: string,
options?: { skipLock?: boolean },
): Promise<unknown>;
};
};
}
function unitKeyForEvidence(input: EmitHistoricSqlEvidenceInput): string {
if (input.kind === 'table_usage') {
return `historic-sql-table-${String(input.table).replace(/[^a-zA-Z0-9]+/g, '-').replace(/^-+|-+$/g, '')}`;
}
return `historic-sql-pattern-${String(input.pattern?.slug).replace(/[^a-zA-Z0-9]+/g, '-').replace(/^-+|-+$/g, '')}`;
}
function evidenceEnvelope(input: EmitHistoricSqlEvidenceInput, connectionId: string) {
if (input.kind === 'table_usage') {
if (!input.table || !input.usage) {
throw new Error('Invalid historic-SQL table usage evidence input.');
}
return {
kind: 'table_usage' as const,
connectionId,
table: input.table,
rawPath: input.rawPath,
usage: input.usage,
};
}
if (!input.pattern) {
throw new Error('Invalid historic-SQL pattern evidence input.');
}
return {
kind: 'pattern' as const,
connectionId,
rawPath: input.rawPath,
pattern: input.pattern,
};
}
export function createEmitHistoricSqlEvidenceTool(defaultContext?: EmitHistoricSqlEvidenceToolContext) {
return tool({
description:
'Record typed historic-SQL evidence for deterministic projection. Use this instead of wiki_write, sl_write_source, sl_edit_source, or context_candidate_write during historic-SQL WorkUnits.',
inputSchema: emitHistoricSqlEvidenceInputSchema,
execute: async (input, options): Promise<string> => {
const context = (options.experimental_context as EmitHistoricSqlEvidenceToolContext | undefined) ?? defaultContext;
const ingest = context?.session?.ingest;
const configService = context?.session?.configService;
if (!ingest || ingest.sourceKey !== 'historic-sql' || !configService || !context?.connectionId) {
return 'Error: emit_historic_sql_evidence is only available during historic-sql ingest.';
}
const unitKey = unitKeyForEvidence(input);
const evidence = evidenceEnvelope(input, context.connectionId);
const content = serializeHistoricSqlEvidence(evidence);
await configService.writeFile(
historicSqlEvidencePath(ingest.runId, unitKey),
content,
SYSTEM_AUTHOR,
SYSTEM_EMAIL,
`Record historic-SQL evidence: ${unitKey}`,
{ skipLock: true },
);
const label = evidence.kind === 'table_usage' ? evidence.table : evidence.pattern.slug;
return `Recorded historic-SQL ${input.kind} evidence for ${label}.`;
},
});
}


@ -0,0 +1,57 @@
import { describe, expect, it } from 'vitest';
import {
historicSqlEvidenceEnvelopeSchema,
historicSqlEvidencePath,
historicSqlPatternEvidenceSchema,
historicSqlTableUsageEvidenceSchema,
} from './evidence.js';
describe('historic-sql evidence contracts', () => {
it('validates table usage evidence emitted by table digest WorkUnits', () => {
const parsed = historicSqlTableUsageEvidenceSchema.parse({
kind: 'table_usage',
connectionId: 'warehouse',
table: 'public.orders',
rawPath: 'tables/public.orders.json',
usage: {
narrative: 'Orders are repeatedly queried for paid/refunded lifecycle analysis.',
frequencyTier: 'high',
commonFilters: ['status', 'created_at'],
commonGroupBys: ['status'],
commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
staleSince: null,
},
});
expect(parsed.table).toBe('public.orders');
expect(parsed.usage.frequencyTier).toBe('high');
});
it('validates pattern evidence emitted by the patterns WorkUnit', () => {
const parsed = historicSqlPatternEvidenceSchema.parse(
historicSqlEvidenceEnvelopeSchema.parse({
kind: 'pattern',
connectionId: 'warehouse',
rawPath: 'patterns-input.json',
pattern: {
slug: 'order-lifecycle-analysis',
title: 'Order Lifecycle Analysis',
narrative: 'Analysts compare order status changes by customer segment.',
definitionSql: 'select status, count(*) from public.orders group by status',
tablesInvolved: ['public.orders', 'public.customers'],
slRefs: ['orders', 'customers'],
constituentTemplateIds: ['pg:1', 'pg:2'],
},
}),
);
expect(parsed.kind).toBe('pattern');
expect(parsed.pattern.slug).toBe('order-lifecycle-analysis');
});
it('builds a stable ignored evidence path from run and WorkUnit identity', () => {
expect(historicSqlEvidencePath('run-1', 'historic-sql-table-public-orders')).toBe(
'.ktx/ingest-evidence/historic-sql/run-1/historic-sql-table-public-orders.json',
);
});
});


@ -0,0 +1,41 @@
import { z } from 'zod';
import { patternOutputSchema, tableUsageOutputSchema } from './skill-schemas.js';
function safeEvidenceSegment(value: string): string {
const segment = value.replace(/[^a-zA-Z0-9._-]+/g, '-').replace(/^-+|-+$/g, '');
if (!segment) {
throw new Error(`Invalid historic-SQL evidence path segment: ${value}`);
}
return segment;
}
/** @internal */
export const historicSqlTableUsageEvidenceSchema = z.object({
kind: z.literal('table_usage'),
connectionId: z.string().min(1),
table: z.string().min(1),
rawPath: z.string().min(1),
usage: tableUsageOutputSchema,
});
/** @internal */
export const historicSqlPatternEvidenceSchema = z.object({
kind: z.literal('pattern'),
connectionId: z.string().min(1),
rawPath: z.string().min(1),
pattern: patternOutputSchema,
});
export const historicSqlEvidenceEnvelopeSchema = z.discriminatedUnion('kind', [
historicSqlTableUsageEvidenceSchema,
historicSqlPatternEvidenceSchema,
]);
export type HistoricSqlEvidenceEnvelope = z.infer<typeof historicSqlEvidenceEnvelopeSchema>;
export function historicSqlEvidencePath(runId: string, unitKey: string): string {
return `.ktx/ingest-evidence/historic-sql/${safeEvidenceSegment(runId)}/${safeEvidenceSegment(unitKey)}.json`;
}
export function serializeHistoricSqlEvidence(evidence: HistoricSqlEvidenceEnvelope): string {
return `${JSON.stringify(historicSqlEvidenceEnvelopeSchema.parse(evidence), null, 2)}\n`;
}
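The evidence path construction can be exercised without Zod. This is a minimal sketch: `safeEvidenceSegment` copies the sanitiser from the diff, while `evidencePath` is a hypothetical stand-in for `historicSqlEvidencePath` so the example stays self-contained.

```typescript
// Copy of the sanitiser in the diff: anything outside [a-zA-Z0-9._-] collapses
// to '-', leading/trailing dashes are trimmed, and an empty result is rejected.
function safeEvidenceSegment(value: string): string {
  const segment = value.replace(/[^a-zA-Z0-9._-]+/g, '-').replace(/^-+|-+$/g, '');
  if (!segment) {
    throw new Error(`Invalid historic-SQL evidence path segment: ${value}`);
  }
  return segment;
}

// Hypothetical stand-in for historicSqlEvidencePath with the same template.
function evidencePath(runId: string, unitKey: string): string {
  return `.ktx/ingest-evidence/historic-sql/${safeEvidenceSegment(runId)}/${safeEvidenceSegment(unitKey)}.json`;
}

// Matches the path asserted in the evidence contract test.
const examplePath = evidencePath('run-1', 'historic-sql-table-public-orders');

// Slashes and spaces are collapsed, so a segment cannot add directory levels.
const sanitizedRun = safeEvidenceSegment('run/2026 05');

// A value made only of disallowed characters sanitises to '' and is rejected.
let rejectedEmpty = false;
try {
  safeEvidenceSegment('///');
} catch {
  rejectedEmpty = true;
}
```

Unlike `safeUnitKey`, this sanitiser keeps `.`, `_`, and `-`, so a table name such as `public.orders` would pass through unchanged if used directly as a segment.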


@ -0,0 +1,110 @@
import { mkdtemp } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it } from 'vitest';
import type { SqlAnalysisPort } from '../../../../context/sql-analysis/ports.js';
import type { SourceAdapter } from '../../types.js';
import { HistoricSqlSourceAdapter } from './historic-sql.adapter.js';
import type { HistoricSqlReader } from './types.js';
async function tempDir(): Promise<string> {
return mkdtemp(join(tmpdir(), 'historic-sql-adapter-'));
}
const sqlAnalysis: SqlAnalysisPort = {
async analyzeForFingerprint() {
throw new Error('analyzeForFingerprint must not be used');
},
async analyzeBatch() {
return new Map();
},
async validateReadOnly() {
return { ok: true };
},
};
const reader: HistoricSqlReader = {
async probe() {
return { warnings: [], info: [] };
},
async *fetchAggregated() {},
};
describe('HistoricSqlSourceAdapter', () => {
it('declares canonical adapter metadata', () => {
const adapter = new HistoricSqlSourceAdapter({ sqlAnalysis, reader, queryClient: {} });
expect(adapter.source).toBe('historic-sql');
expect(adapter.skillNames).toEqual(['historic_sql_table_digest', 'historic_sql_patterns']);
expect(adapter.reconcileSkillNames).toEqual([]);
expect((adapter as SourceAdapter).evidenceIndexing).toBeUndefined();
expect(adapter.triageSupported).toBe(false);
});
it('fetches a unified aggregate snapshot and emits unified WorkUnits', async () => {
const stagedDir = await tempDir();
const aggregateReader: HistoricSqlReader = {
async probe() {
return { warnings: [], info: [] };
},
async *fetchAggregated() {
yield {
templateId: 'pg:1',
canonicalSql:
'select o.status, count(*) from public.orders o join public.customers c on c.id = o.customer_id group by o.status',
dialect: 'postgres',
stats: {
executions: 25,
distinctUsers: 3,
firstSeen: '2026-05-01T00:00:00.000Z',
lastSeen: '2026-05-11T00:00:00.000Z',
p50RuntimeMs: 10,
p95RuntimeMs: 20,
errorRate: 0,
rowsProduced: 10,
},
topUsers: [{ user: 'analyst', executions: 25 }],
};
},
};
const batchSqlAnalysis: SqlAnalysisPort = {
async analyzeForFingerprint() {
throw new Error('analyzeForFingerprint must not be used');
},
async analyzeBatch() {
return new Map([
[
'pg:1',
{
tablesTouched: ['public.orders', 'public.customers'],
columnsByClause: { select: ['status'], join: ['customer_id', 'id'], groupBy: ['status'] },
},
],
]);
},
async validateReadOnly() {
return { ok: true };
},
};
const adapter = new HistoricSqlSourceAdapter({
sqlAnalysis: batchSqlAnalysis,
reader: aggregateReader,
queryClient: {},
now: () => new Date('2026-05-11T00:00:00.000Z'),
});
await adapter.fetch({ dialect: 'postgres', minExecutions: 5 }, stagedDir, {
connectionId: 'warehouse',
sourceKey: 'historic-sql',
});
await expect(adapter.detect(stagedDir)).resolves.toBe(true);
await expect(adapter.chunk(stagedDir)).resolves.toMatchObject({
workUnits: [
{ unitKey: 'historic-sql-table-public-customers' },
{ unitKey: 'historic-sql-table-public-orders' },
{ unitKey: 'historic-sql-patterns-part-0001' },
],
});
});
});


@ -0,0 +1,65 @@
import type {
ChunkResult,
DeterministicFinalizationContext,
DiffSet,
FetchContext,
FinalizationResult,
ScopeDescriptor,
SourceAdapter,
} from '../../types.js';
import { chunkHistoricSqlUnifiedStagedDir, describeHistoricSqlUnifiedScope } from './chunk-unified.js';
import { detectHistoricSqlStagedDir } from './detect.js';
import { projectHistoricSqlEvidence } from './projection.js';
import { stageHistoricSqlAggregatedSnapshot } from './stage-unified.js';
import { type HistoricSqlSourceAdapterDeps } from './types.js';
export class HistoricSqlSourceAdapter implements SourceAdapter {
readonly source = 'historic-sql';
readonly skillNames = ['historic_sql_table_digest', 'historic_sql_patterns'];
readonly reconcileSkillNames: string[] = [];
readonly triageSupported = false;
constructor(private readonly deps: HistoricSqlSourceAdapterDeps) {}
detect(stagedDir: string): Promise<boolean> {
return detectHistoricSqlStagedDir(stagedDir);
}
async fetch(pullConfig: unknown, stagedDir: string, ctx: FetchContext): Promise<void> {
await stageHistoricSqlAggregatedSnapshot({
stagedDir,
connectionId: ctx.connectionId,
queryClient: this.deps.queryClient,
reader: this.deps.reader,
sqlAnalysis: this.deps.sqlAnalysis,
pullConfig,
now: this.deps.now?.(),
});
}
chunk(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
return chunkHistoricSqlUnifiedStagedDir(stagedDir, diffSet);
}
describeScope(stagedDir: string): Promise<ScopeDescriptor> {
return describeHistoricSqlUnifiedScope(stagedDir);
}
async finalize(ctx: DeterministicFinalizationContext): Promise<FinalizationResult> {
const projection = await projectHistoricSqlEvidence({
workdir: ctx.workdir,
connectionId: ctx.connectionId,
syncId: ctx.syncId,
runId: ctx.runId,
overrideReplay: ctx.overrideReplay,
});
return {
result: projection,
warnings: projection.warnings,
errors: [],
touchedSources: projection.touchedSources,
changedWikiPageKeys: projection.changedWikiPageKeys,
actions: projection.actions,
};
}
}


@ -0,0 +1,286 @@
import { mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import YAML from 'yaml';
import type { AgentRunnerPort, RunLoopParams } from '../../../../context/llm/runtime-port.js';
import { initKtxProject, loadKtxProject, type KtxLocalProject } from '../../../../context/project/project.js';
import type { SqlAnalysisBatchItem, SqlAnalysisBatchResult, SqlAnalysisDialect, SqlAnalysisPort } from '../../../../context/sql-analysis/ports.js';
import { searchLocalSlSources } from '../../../sl/local-sl.js';
import { searchLocalKnowledgePages } from '../../../wiki/local-knowledge.js';
import { runLocalIngest } from '../../local-ingest.js';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import { HistoricSqlSourceAdapter } from './historic-sql.adapter.js';
import type { AggregatedTemplate, HistoricSqlReader, HistoricSqlUnifiedPullConfig } from './types.js';
class AcceptanceHistoricSqlReader implements HistoricSqlReader {
async probe() {
return { warnings: [], info: [] };
}
async *fetchAggregated(
_client: unknown,
_window: { start: Date; end: Date },
_config: HistoricSqlUnifiedPullConfig,
): AsyncIterable<AggregatedTemplate> {
yield {
templateId: 'pg:orders-lifecycle',
canonicalSql:
'select o.status, c.segment, count(*) from public.orders o join public.customers c on c.id = o.customer_id where o.status = $1 group by o.status, c.segment',
dialect: 'postgres',
stats: {
executions: 42,
distinctUsers: 4,
firstSeen: '2026-05-01T00:00:00.000Z',
lastSeen: '2026-05-11T00:00:00.000Z',
p50RuntimeMs: 18,
p95RuntimeMs: 84,
errorRate: 0,
rowsProduced: 420,
},
topUsers: [{ user: 'analyst@example.test', executions: 42 }],
};
}
}
class HistoricSqlAcceptanceAgentRunner implements AgentRunnerPort {
runLoop = vi.fn(async (params: RunLoopParams) => {
if (params.telemetryTags?.operationName !== 'ingest-bundle-wu') {
return { stopReason: 'natural' as const };
}
const emitEvidence = params.toolSet.emit_historic_sql_evidence;
if (!emitEvidence?.execute) {
throw new Error('emit_historic_sql_evidence tool was not available to the historic-SQL WorkUnit');
}
if (params.telemetryTags.unitKey === 'historic-sql-table-public-orders') {
const result = await emitEvidence.execute({
kind: 'table_usage',
table: 'public.orders',
rawPath: 'tables/public.orders.json',
usage: {
narrative: 'Analysts repeatedly inspect paid order lifecycle by customer segment.',
frequencyTier: 'high',
commonFilters: ['status'],
commonGroupBys: ['status', 'segment'],
commonJoins: [{ table: 'public.customers', on: ['customer_id', 'id'] }],
staleSince: null,
},
});
if (!result.markdown.includes('Recorded historic-SQL table_usage evidence')) {
throw new Error(`Unexpected orders evidence result: ${result.markdown}`);
}
}
if (params.telemetryTags.unitKey === 'historic-sql-table-public-customers') {
const result = await emitEvidence.execute({
kind: 'table_usage',
table: 'public.customers',
rawPath: 'tables/public.customers.json',
usage: {
narrative: 'Customers provide segment context for paid order lifecycle analysis.',
frequencyTier: 'mid',
commonFilters: [],
commonGroupBys: ['segment'],
commonJoins: [{ table: 'public.orders', on: ['id', 'customer_id'] }],
staleSince: null,
},
});
if (!result.markdown.includes('Recorded historic-SQL table_usage evidence')) {
throw new Error(`Unexpected customers evidence result: ${result.markdown}`);
}
}
if (params.telemetryTags.unitKey === 'historic-sql-patterns-part-0001') {
const result = await emitEvidence.execute({
kind: 'pattern',
rawPath: 'patterns-input/part-0001.json',
pattern: {
slug: 'paid-order-lifecycle',
title: 'Paid Order Lifecycle',
narrative: 'Analysts join orders and customers to compare paid order lifecycle by segment.',
definitionSql:
'select o.status, c.segment, count(*) from public.orders o join public.customers c on c.id = o.customer_id group by o.status, c.segment',
tablesInvolved: ['public.orders', 'public.customers'],
slRefs: ['orders', 'customers'],
constituentTemplateIds: ['pg:orders-lifecycle'],
},
});
if (!result.markdown.includes('Recorded historic-SQL pattern evidence')) {
throw new Error(`Unexpected pattern evidence result: ${result.markdown}`);
}
}
return { stopReason: 'natural' as const };
});
}
function acceptanceSqlAnalysis(): SqlAnalysisPort {
return {
analyzeForFingerprint: async () => {
throw new Error('analyzeForFingerprint should not be used by unified historic-SQL ingest');
},
analyzeBatch: vi.fn(
async (
items: SqlAnalysisBatchItem[],
_dialect: SqlAnalysisDialect,
): Promise<Map<string, SqlAnalysisBatchResult>> => {
return new Map(
items.map((item) => [
item.id,
{
tablesTouched: ['public.orders', 'public.customers'],
columnsByClause: {
select: ['status', 'segment'],
where: ['status'],
join: ['customer_id', 'id'],
groupBy: ['status', 'segment'],
},
},
]),
);
},
),
validateReadOnly: vi.fn(async () => ({ ok: true })),
};
}
async function writeHistoricSqlProject(project: KtxLocalProject): Promise<KtxLocalProject> {
await writeFile(
join(project.projectDir, 'ktx.yaml'),
[
'connections:',
' warehouse:',
' driver: postgres',
' historicSql:',
' enabled: true',
' dialect: postgres',
' minExecutions: 2',
'ingest:',
' adapters:',
' - historic-sql',
' embeddings:',
' backend: none',
'storage:',
' state: sqlite',
' search: sqlite-fts5',
' git:',
' auto_commit: false',
' author: KTX Test <system@ktx.local>',
'',
].join('\n'),
'utf-8',
);
const loaded = await loadKtxProject({ projectDir: project.projectDir });
await loaded.fileStore.writeFile(
'semantic-layer/warehouse/_schema/public.yaml',
YAML.stringify({
tables: {
orders: {
table: 'public.orders',
columns: [
{ name: 'id', type: 'string' },
{ name: 'status', type: 'string' },
{ name: 'customer_id', type: 'string' },
],
},
customers: {
table: 'public.customers',
columns: [
{ name: 'id', type: 'string' },
{ name: 'segment', type: 'string' },
],
},
},
}),
'KTX Test',
'system@ktx.local',
'Seed schema shard',
);
return loaded;
}
describe('historic-SQL local ingest retrieval acceptance', () => {
let tempDir: string;
beforeEach(async () => {
tempDir = await mkdtemp(join(tmpdir(), 'ktx-historic-sql-acceptance-'));
});
afterEach(async () => {
await rm(tempDir, { recursive: true, force: true });
});
it('projects table and pattern evidence into semantic-layer and wiki retrieval surfaces', async () => {
const initialized = await initKtxProject({ projectDir: join(tempDir, 'project') });
const project = await writeHistoricSqlProject(initialized);
const sqlAnalysis = acceptanceSqlAnalysis();
const agentRunner = new HistoricSqlAcceptanceAgentRunner();
const adapter = new HistoricSqlSourceAdapter({
reader: new AcceptanceHistoricSqlReader(),
queryClient: {},
sqlAnalysis,
now: () => new Date('2026-05-11T00:00:00.000Z'),
});
const result = await runLocalIngest({
project,
adapters: [adapter],
adapter: 'historic-sql',
connectionId: 'warehouse',
jobId: 'historic-sql-retrieval-acceptance',
agentRunner,
});
expect(sqlAnalysis.analyzeBatch).toHaveBeenCalledTimes(1);
expect(result.result.failedWorkUnits).toEqual([]);
expect(result.result.workUnitCount).toBe(3);
expect(agentRunner.runLoop).toHaveBeenCalledTimes(3);
const finalization = result.report.body.finalization;
expect(finalization).toBeDefined();
if (!finalization) {
throw new Error('Expected historic-SQL finalization result');
}
expect(finalization).toMatchObject({
sourceKey: 'historic-sql',
status: 'success',
result: {
tableUsageMerged: 2,
patternPagesWritten: 1,
},
});
expect(finalization.declaredTouchedSources).toEqual(
expect.arrayContaining([
{ connectionId: 'warehouse', sourceName: 'customers' },
{ connectionId: 'warehouse', sourceName: 'orders' },
]),
);
await expect(readFile(join(project.projectDir, 'semantic-layer/warehouse/_schema/public.yaml'), 'utf-8')).resolves
.toContain('Analysts repeatedly inspect paid order lifecycle by customer segment.');
await expect(readFile(join(project.projectDir, 'wiki/global/historic-sql-paid-order-lifecycle.md'), 'utf-8'))
.resolves.toContain('Paid Order Lifecycle');
const reloaded = await loadKtxProject({ projectDir: project.projectDir });
await expect(
searchLocalSlSources(reloaded, { connectionId: 'warehouse', query: 'paid order lifecycle', limit: 5 }),
).resolves.toEqual(expect.arrayContaining([
expect.objectContaining({
name: 'orders',
frequencyTier: 'high',
snippet: expect.stringContaining('<mark>'),
matchReasons: expect.arrayContaining(['lexical']),
}),
]));
await expect(
searchLocalKnowledgePages(reloaded, { query: 'paid order lifecycle', userId: 'local', limit: 5 }),
).resolves.toEqual([
expect.objectContaining({
key: 'historic-sql-paid-order-lifecycle',
summary: 'Paid Order Lifecycle',
matchReasons: expect.arrayContaining(['lexical']),
}),
]);
});
});

@ -0,0 +1,89 @@
import { describe, expect, it } from 'vitest';
import {
HISTORIC_SQL_PATTERN_WORKUNIT_MAX_BYTES,
isHistoricSqlPatternInputShardPath,
serializedStagedPatternsInputByteLength,
splitHistoricSqlPatternInputs,
} from './pattern-inputs.js';
import type { StagedPatternsInput } from './types.js';
type PatternTemplate = StagedPatternsInput['templates'][number];
function template(id: string, tablesTouched: string[], canonicalSql = 'select 1'): PatternTemplate {
return {
id,
canonicalSql,
tablesTouched,
executionsBucket: '10-100',
distinctUsersBucket: '2-5',
dialect: 'postgres',
};
}
describe('historic-SQL pattern input sharding', () => {
it('keeps the audit input complete while sharding only cross-table pattern candidates', () => {
const largeSql = `select * from public.orders join public.customers on true where marker = '${'x'.repeat(260)}'`;
const input: StagedPatternsInput = {
templates: [
template('single-table-orders', ['public.orders']),
template('orders-customers-2', ['public.orders', 'public.customers'], largeSql),
template('orders-customers-1', ['public.customers', 'public.orders'], largeSql),
template('orders-customers-payments', ['public.orders', 'public.customers', 'public.payments'], largeSql),
],
};
const result = splitHistoricSqlPatternInputs(input, { maxBytes: 760 });
expect(result.auditInput.templates.map((entry) => entry.id)).toEqual([
'orders-customers-1',
'orders-customers-2',
'orders-customers-payments',
'single-table-orders',
]);
expect(result.shards.length).toBeGreaterThan(1);
expect(result.shards.map((shard) => shard.path)).toEqual([
'patterns-input/part-0001.json',
'patterns-input/part-0002.json',
'patterns-input/part-0003.json',
]);
expect(result.shards.flatMap((shard) => shard.input.templates.map((entry) => entry.id))).toEqual([
'orders-customers-payments',
'orders-customers-1',
'orders-customers-2',
]);
expect(result.shards.every((shard) => shard.byteLength <= 760)).toBe(true);
expect(result.shards.flatMap((shard) => shard.input.templates).some((entry) => entry.id === 'single-table-orders')).toBe(false);
expect(result.warnings).toEqual([]);
});
it('omits a single oversized template from shards and reports a manifest warning', () => {
const input: StagedPatternsInput = {
templates: [
template(
'oversized-cross-table',
['public.orders', 'public.customers'],
`select * from public.orders join public.customers on true where payload = '${'x'.repeat(500)}'`,
),
],
};
const result = splitHistoricSqlPatternInputs(input, { maxBytes: 240 });
expect(result.auditInput.templates.map((entry) => entry.id)).toEqual(['oversized-cross-table']);
expect(result.shards).toEqual([]);
expect(result.warnings).toEqual(['patterns_input_template_too_large:oversized-cross-table']);
});
it('recognizes only generated pattern shard paths', () => {
expect(isHistoricSqlPatternInputShardPath('patterns-input/part-0001.json')).toBe(true);
expect(isHistoricSqlPatternInputShardPath('patterns-input/part-0012.json')).toBe(true);
expect(isHistoricSqlPatternInputShardPath('patterns-input.json')).toBe(false);
expect(isHistoricSqlPatternInputShardPath('patterns-input/part-1.json')).toBe(false);
expect(isHistoricSqlPatternInputShardPath('patterns-input/readme.md')).toBe(false);
});
it('uses a production byte budget below read_raw_file maximum size', () => {
expect(HISTORIC_SQL_PATTERN_WORKUNIT_MAX_BYTES).toBeLessThan(120_000);
expect(serializedStagedPatternsInputByteLength({ templates: [] })).toBeGreaterThan(0);
});
});

@ -0,0 +1,101 @@
import { Buffer } from 'node:buffer';
import type { StagedPatternsInput } from './types.js';
const HISTORIC_SQL_PATTERN_WORKUNIT_DIR = 'patterns-input';
/** @internal */
export const HISTORIC_SQL_PATTERN_WORKUNIT_MAX_BYTES = 110_000;
const HISTORIC_SQL_PATTERN_WORKUNIT_PATH_RE = /^patterns-input\/part-\d{4}\.json$/;
type PatternTemplate = StagedPatternsInput['templates'][number];
interface HistoricSqlPatternInputShard {
path: string;
input: StagedPatternsInput;
byteLength: number;
}
export interface HistoricSqlPatternInputSplitResult {
auditInput: StagedPatternsInput;
shards: HistoricSqlPatternInputShard[];
warnings: string[];
}
export interface HistoricSqlPatternInputSplitOptions {
maxBytes?: number;
}
export function isHistoricSqlPatternInputShardPath(path: string): boolean {
return HISTORIC_SQL_PATTERN_WORKUNIT_PATH_RE.test(path);
}
function serializeStagedPatternsInput(input: StagedPatternsInput): string {
return `${JSON.stringify(input, null, 2)}\n`;
}
/** @internal */
export function serializedStagedPatternsInputByteLength(input: StagedPatternsInput): number {
return Buffer.byteLength(serializeStagedPatternsInput(input), 'utf-8');
}
function sortedAuditTemplates(templates: readonly PatternTemplate[]): PatternTemplate[] {
return [...templates].sort((left, right) => left.id.localeCompare(right.id));
}
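// Pattern candidates are templates touching at least two tables. Table names are
// sorted within each template, then candidates are ordered by table count
// (largest first), table signature, and id so shard contents are deterministic.
function sortedPatternCandidates(templates: readonly PatternTemplate[]): PatternTemplate[] {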
return [...templates]
.filter((template) => template.tablesTouched.length >= 2)
.map((template) => ({ ...template, tablesTouched: [...template.tablesTouched].sort() }))
.sort((left, right) => {
const cardinality = right.tablesTouched.length - left.tablesTouched.length;
if (cardinality !== 0) return cardinality;
const tableSignature = left.tablesTouched.join('\0').localeCompare(right.tablesTouched.join('\0'));
if (tableSignature !== 0) return tableSignature;
return left.id.localeCompare(right.id);
});
}
function shardPath(index: number): string {
return `${HISTORIC_SQL_PATTERN_WORKUNIT_DIR}/part-${String(index).padStart(4, '0')}.json`;
}
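/**
 * Splits staged pattern input into byte-bounded shards.
 * - `auditInput` keeps every template, sorted by id.
 * - `shards` hold only templates that touch two or more tables, packed greedily
 *   in candidate order so each serialized shard stays within `maxBytes`.
 * - A template that exceeds `maxBytes` on its own is left out of the shards and
 *   reported as a `patterns_input_template_too_large:<id>` warning.
 */
export function splitHistoricSqlPatternInputs(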
input: StagedPatternsInput,
options: HistoricSqlPatternInputSplitOptions = {},
): HistoricSqlPatternInputSplitResult {
const maxBytes = options.maxBytes ?? HISTORIC_SQL_PATTERN_WORKUNIT_MAX_BYTES;
const auditInput: StagedPatternsInput = { templates: sortedAuditTemplates(input.templates) };
const warnings: string[] = [];
const shards: HistoricSqlPatternInputShard[] = [];
let current: PatternTemplate[] = [];
const flush = () => {
if (current.length === 0) {
return;
}
const shardInput: StagedPatternsInput = { templates: current };
shards.push({
path: shardPath(shards.length + 1),
input: shardInput,
byteLength: serializedStagedPatternsInputByteLength(shardInput),
});
current = [];
};
for (const template of sortedPatternCandidates(input.templates)) {
const singleInput: StagedPatternsInput = { templates: [template] };
if (serializedStagedPatternsInputByteLength(singleInput) > maxBytes) {
warnings.push(`patterns_input_template_too_large:${template.id}`);
continue;
}
const nextInput: StagedPatternsInput = { templates: [...current, template] };
if (current.length > 0 && serializedStagedPatternsInputByteLength(nextInput) > maxBytes) {
flush();
}
current.push(template);
}
flush();
return { auditInput, shards, warnings };
}

@ -0,0 +1,242 @@
import { describe, expect, it, vi } from 'vitest';
import {
HistoricSqlExtensionMissingError,
HistoricSqlGrantsMissingError,
HistoricSqlVersionUnsupportedError,
} from './errors.js';
import { PostgresPgssReader } from './postgres-pgss-reader.js';
interface FakeQueryResult {
headers: string[];
rows: unknown[][];
totalRows?: number;
error?: string;
}
function queryClient(results: Array<FakeQueryResult | Error>) {
const executeQuery = vi.fn(async (_query: string, _params?: unknown[]) => {
const next = results.shift();
if (!next) {
throw new Error('unexpected query');
}
if (next instanceof Error) {
throw next;
}
return next;
});
return { executeQuery };
}
function executedSql(client: ReturnType<typeof queryClient>, index: number): string {
const call = client.executeQuery.mock.calls[index];
if (!call) {
throw new Error(`expected query client call ${index}`);
}
return call[0];
}
describe('PostgresPgssReader aggregate path', () => {
it('probes version, extension presence, grants, and tracking state', async () => {
const client = queryClient([
{
headers: ['server_version_num', 'server_version'],
rows: [[160004, 'PostgreSQL 16.4 on x86_64-apple-darwin']],
},
{ headers: ['?column?'], rows: [[1]] },
{ headers: ['has_role'], rows: [[true]] },
{ headers: ['track'], rows: [['top']] },
{ headers: ['max'], rows: [['5000']] },
]);
const reader = new PostgresPgssReader();
await expect(reader.probe(client)).resolves.toEqual({
pgServerVersion: 'PostgreSQL 16.4 on x86_64-apple-darwin',
warnings: [],
info: [],
});
expect(executedSql(client, 0)).toContain("current_setting('server_version_num')::int");
expect(executedSql(client, 1)).toBe('SELECT 1 FROM pg_stat_statements LIMIT 1');
expect(executedSql(client, 2)).toBe(
"SELECT pg_has_role(current_user, 'pg_read_all_stats', 'USAGE') AS has_role",
);
expect(executedSql(client, 3)).toBe("SELECT current_setting('pg_stat_statements.track') AS track");
expect(executedSql(client, 4)).toBe("SELECT current_setting('pg_stat_statements.max') AS max");
});
it('rejects PostgreSQL versions older than 14 without probing the extension', async () => {
const client = queryClient([
{
headers: ['server_version_num', 'server_version'],
rows: [[130012, 'PostgreSQL 13.12']],
},
]);
const reader = new PostgresPgssReader();
const promise = reader.probe(client);
await expect(promise).rejects.toMatchObject({
name: 'HistoricSqlVersionUnsupportedError',
dialect: 'postgres',
detectedVersion: 'PostgreSQL 13.12',
minimumVersion: 'PostgreSQL 14',
});
await expect(promise).rejects.toBeInstanceOf(HistoricSqlVersionUnsupportedError);
expect(client.executeQuery).toHaveBeenCalledTimes(1);
});
it('maps a missing pg_stat_statements relation to HistoricSqlExtensionMissingError', async () => {
const client = queryClient([
{
headers: ['server_version_num', 'server_version'],
rows: [[160004, 'PostgreSQL 16.4']],
},
new Error('relation "pg_stat_statements" does not exist'),
]);
const reader = new PostgresPgssReader();
const promise = reader.probe(client);
await expect(promise).rejects.toMatchObject({
name: 'HistoricSqlExtensionMissingError',
dialect: 'postgres',
});
await expect(promise).rejects.toBeInstanceOf(HistoricSqlExtensionMissingError);
});
it('maps pg_stat_statements preload failures to HistoricSqlExtensionMissingError with preload remediation', async () => {
const client = queryClient([
{
headers: ['server_version_num', 'server_version'],
rows: [[160004, 'PostgreSQL 16.4']],
},
new Error('pg_stat_statements must be loaded via shared_preload_libraries'),
]);
const reader = new PostgresPgssReader();
const promise = reader.probe(client);
await expect(promise).rejects.toMatchObject({
name: 'HistoricSqlExtensionMissingError',
dialect: 'postgres',
message: 'pg_stat_statements is installed but not loaded via shared_preload_libraries.',
remediation: expect.stringContaining("shared_preload_libraries includes 'pg_stat_statements'"),
});
await expect(promise).rejects.toBeInstanceOf(HistoricSqlExtensionMissingError);
});
it('maps missing pg_read_all_stats membership to HistoricSqlGrantsMissingError', async () => {
const client = queryClient([
{
headers: ['server_version_num', 'server_version'],
rows: [[160004, 'PostgreSQL 16.4']],
},
{ headers: ['?column?'], rows: [[1]] },
{ headers: ['has_role'], rows: [[false]] },
]);
const reader = new PostgresPgssReader();
const promise = reader.probe(client);
await expect(promise).rejects.toMatchObject({
name: 'HistoricSqlGrantsMissingError',
dialect: 'postgres',
remediation: 'GRANT pg_read_all_stats TO <connection role>;',
});
await expect(promise).rejects.toBeInstanceOf(HistoricSqlGrantsMissingError);
});
it('returns a warning instead of failing when pg_stat_statements.track is none', async () => {
const client = queryClient([
{
headers: ['server_version_num', 'server_version'],
rows: [[160004, 'PostgreSQL 16.4']],
},
{ headers: ['?column?'], rows: [[1]] },
{ headers: ['has_role'], rows: [[true]] },
{ headers: ['track'], rows: [['none']] },
{ headers: ['max'], rows: [['5000']] },
]);
const reader = new PostgresPgssReader();
await expect(reader.probe(client)).resolves.toEqual({
pgServerVersion: 'PostgreSQL 16.4',
warnings: [
"pg_stat_statements.track is none; set it to top or all in the Postgres parameter group or config",
],
info: [],
});
});
it('returns an info note when pg_stat_statements.max is below the recommended floor', async () => {
const client = queryClient([
{
headers: ['server_version_num', 'server_version'],
rows: [[160004, 'PostgreSQL 16.4']],
},
{ headers: ['?column?'], rows: [[1]] },
{ headers: ['has_role'], rows: [[true]] },
{ headers: ['track'], rows: [['top']] },
{ headers: ['max'], rows: [['1000']] },
]);
const reader = new PostgresPgssReader();
await expect(reader.probe(client)).resolves.toEqual({
pgServerVersion: 'PostgreSQL 16.4',
warnings: [],
info: [
'pg_stat_statements.max is 1000; set it to at least 5000 to reduce query-template eviction churn',
],
});
});
it('aggregates pg_stat_statements rows by queryid and query', async () => {
const executeQuery = vi.fn(async (sql: string, params?: unknown[]) => {
if (sql.includes('pg_stat_statements_info')) {
return { headers: ['stats_reset', 'dealloc'], rows: [['2026-05-01T00:00:00.000Z', 1]] };
}
expect(sql).toContain('GROUP BY queryid, query');
expect(sql).toContain('HAVING SUM(calls) >= $1');
expect(params).toEqual([5]);
return {
headers: ['template_id', 'canonical_sql', 'executions', 'distinct_users', 'mean_ms', 'rows_produced', 'top_users'],
rows: [
[
'123',
'select status from public.orders',
'42',
'3',
'11.5',
'100',
JSON.stringify([{ user: 'analyst', executions: 40 }]),
],
],
};
});
const reader = new PostgresPgssReader();
const rows = [];
for await (const row of reader.fetchAggregated(
{ executeQuery },
{ start: new Date('2026-02-10T00:00:00.000Z'), end: new Date('2026-05-11T00:00:00.000Z') },
{ dialect: 'postgres', minExecutions: 5, enabledTables: [], filters: { dropTrivialProbes: true }, redactionPatterns: [], staleArchiveAfterDays: 90 },
)) {
rows.push(row);
}
expect(rows).toEqual([
{
templateId: '123',
canonicalSql: 'select status from public.orders',
dialect: 'postgres',
stats: {
executions: 42,
distinctUsers: 3,
firstSeen: '2026-05-01T00:00:00.000Z',
lastSeen: '2026-05-11T00:00:00.000Z',
p50RuntimeMs: 11.5,
p95RuntimeMs: 11.5,
errorRate: 0,
rowsProduced: 100,
},
topUsers: [{ user: 'analyst', executions: 40 }],
},
]);
});
});

@ -0,0 +1,293 @@
import {
HistoricSqlExtensionMissingError,
HistoricSqlGrantsMissingError,
HistoricSqlVersionUnsupportedError,
} from './errors.js';
import {
aggregatedTemplateSchema,
type AggregatedTemplate,
type HistoricSqlTimeWindow,
type HistoricSqlUnifiedPullConfig,
type KtxPostgresQueryClient,
type PostgresPgssProbeResult,
} from './types.js';
interface QueryResultLike {
headers: string[];
rows: unknown[][];
totalRows?: number;
error?: string;
}
const STATS_INFO_SQL = 'SELECT stats_reset, dealloc FROM pg_stat_statements_info';
const VERSION_SQL = `
SELECT current_setting('server_version_num')::int AS server_version_num,
version() AS server_version
`.trim();
const EXTENSION_PROBE_SQL = 'SELECT 1 FROM pg_stat_statements LIMIT 1';
const GRANTS_PROBE_SQL = "SELECT pg_has_role(current_user, 'pg_read_all_stats', 'USAGE') AS has_role";
const TRACKING_PROBE_SQL = "SELECT current_setting('pg_stat_statements.track') AS track";
const MAX_SETTING_PROBE_SQL = "SELECT current_setting('pg_stat_statements.max') AS max";
const RECOMMENDED_PGSS_MAX = 5000;
const AGGREGATE_SQL = `
SELECT queryid::text AS template_id,
query AS canonical_sql,
SUM(calls)::bigint AS executions,
COUNT(DISTINCT userid) AS distinct_users,
SUM(total_exec_time) / NULLIF(SUM(calls), 0) AS mean_ms,
SUM(rows)::bigint AS rows_produced,
COALESCE(
json_agg(json_build_object('user', rolname, 'executions', calls) ORDER BY calls DESC)
FILTER (WHERE userid IS NOT NULL),
'[]'::json
)::text AS top_users
FROM pg_stat_statements
LEFT JOIN pg_roles ON pg_roles.oid = pg_stat_statements.userid
WHERE toplevel = true
GROUP BY queryid, query
HAVING SUM(calls) >= $1
ORDER BY SUM(total_exec_time) DESC
`.trim();
const POSTGRES_EXTENSION_REMEDIATION = [
'Run CREATE EXTENSION pg_stat_statements; against the connection database.',
"Ensure shared_preload_libraries includes 'pg_stat_statements' in the Postgres parameter group or config.",
].join(' ');
const POSTGRES_GRANTS_REMEDIATION = 'GRANT pg_read_all_stats TO <connection role>;';
function queryClient(client: unknown): KtxPostgresQueryClient {
if (
client &&
typeof client === 'object' &&
'executeQuery' in client &&
typeof (client as { executeQuery?: unknown }).executeQuery === 'function'
) {
return client as KtxPostgresQueryClient;
}
throw new Error('Historic SQL Postgres PGSS reader requires a query client with executeQuery(sql, params?)');
}
async function execute(client: KtxPostgresQueryClient, sql: string, params?: unknown[]): Promise<QueryResultLike> {
const result = await client.executeQuery(sql, params);
// Treat a non-empty in-band `error` string on the result as a failure.
if ('error' in result && typeof result.error === 'string' && result.error.length > 0) {
throw new Error(result.error);
}
return result;
}
function indexByHeader(headers: string[]): Map<string, number> {
const out = new Map<string, number>();
headers.forEach((header, index) => out.set(header.toLowerCase(), index));
return out;
}
function value(row: unknown[], headerIndexes: Map<string, number>, header: string): unknown {
const index = headerIndexes.get(header.toLowerCase());
return index === undefined ? null : row[index];
}
function nullableString(raw: unknown): string | null {
if (raw === null || raw === undefined) {
return null;
}
const text = String(raw);
return text.length > 0 ? text : null;
}
function requiredString(raw: unknown, field: string): string {
const text = nullableString(raw);
if (!text) {
throw new Error(`Postgres pg_stat_statements row is missing ${field}`);
}
return text;
}
function requiredFiniteNumber(raw: unknown, field: string): number {
const number = typeof raw === 'number' ? raw : Number(raw);
if (!Number.isFinite(number)) {
throw new Error(`Postgres pg_stat_statements row has invalid ${field}: ${String(raw)}`);
}
return number;
}
function requiredInteger(raw: unknown, field: string): number {
return Math.trunc(requiredFiniteNumber(raw, field));
}
function nullableNumber(raw: unknown): number | null {
if (raw === null || raw === undefined || raw === '') {
return null;
}
const number = typeof raw === 'number' ? raw : Number(raw);
return Number.isFinite(number) ? number : null;
}
function nullableInteger(raw: unknown): number | null {
const number = nullableNumber(raw);
return number === null ? null : Math.trunc(number);
}
function nullableIsoTimestamp(raw: unknown): string | null {
if (raw === null || raw === undefined || raw === '') {
return null;
}
if (raw instanceof Date) {
return raw.toISOString();
}
const date = new Date(String(raw));
return Number.isNaN(date.getTime()) ? null : date.toISOString();
}
function firstRow(result: QueryResultLike, context: string): { row: unknown[]; headers: Map<string, number> } {
const row = result.rows[0];
if (!row) {
throw new Error(`Postgres historic-SQL ${context} query returned no rows`);
}
return { row, headers: indexByHeader(result.headers) };
}
function isMissingPgssRelation(error: unknown): boolean {
const message = error instanceof Error ? error.message : String(error);
return /relation ["']?pg_stat_statements["']? does not exist/i.test(message);
}
function isPgssPreloadRequired(error: unknown): boolean {
const message = error instanceof Error ? error.message : String(error);
return /pg_stat_statements.*shared_preload_libraries/i.test(message);
}
function extensionMissingError(cause: unknown, message?: string): HistoricSqlExtensionMissingError {
return new HistoricSqlExtensionMissingError({
dialect: 'postgres',
message: message ?? 'pg_stat_statements extension is not installed in the connection database.',
remediation: POSTGRES_EXTENSION_REMEDIATION,
cause,
});
}
function grantsMissingError(): HistoricSqlGrantsMissingError {
return new HistoricSqlGrantsMissingError({
dialect: 'postgres',
message: 'Postgres connection role lacks pg_read_all_stats for historic-SQL ingest.',
remediation: POSTGRES_GRANTS_REMEDIATION,
});
}
function parseTopUsers(raw: unknown): Array<{ user: string | null; executions: number }> {
const text = nullableString(raw);
if (!text) {
return [];
}
try {
const parsed = JSON.parse(text) as unknown;
if (!Array.isArray(parsed)) {
return [];
}
return parsed.flatMap((entry) => {
if (!entry || typeof entry !== 'object') {
return [];
}
const user = nullableString((entry as { user?: unknown }).user);
const executions = nullableInteger((entry as { executions?: unknown }).executions);
return executions === null ? [] : [{ user, executions }];
});
} catch {
return [];
}
}
export class PostgresPgssReader {
async probe(client: unknown): Promise<PostgresPgssProbeResult> {
const pgClient = queryClient(client);
const versionResult = await execute(pgClient, VERSION_SQL);
const { row: versionRow, headers: versionHeaders } = firstRow(versionResult, 'version probe');
const serverVersionNum = requiredFiniteNumber(
value(versionRow, versionHeaders, 'server_version_num'),
'server_version_num',
);
const pgServerVersion = requiredString(value(versionRow, versionHeaders, 'server_version'), 'server_version');
// server_version_num is major * 10000 + minor on PostgreSQL 10 and later, so 140000 means 14.0.
if (serverVersionNum < 140000) {
throw new HistoricSqlVersionUnsupportedError({
dialect: 'postgres',
detectedVersion: pgServerVersion,
minimumVersion: 'PostgreSQL 14',
});
}
try {
await execute(pgClient, EXTENSION_PROBE_SQL);
} catch (error) {
if (isMissingPgssRelation(error)) {
throw extensionMissingError(error);
}
if (isPgssPreloadRequired(error)) {
throw extensionMissingError(
error,
'pg_stat_statements is installed but not loaded via shared_preload_libraries.',
);
}
throw error;
}
const grantsResult = await execute(pgClient, GRANTS_PROBE_SQL);
const { row: grantsRow, headers: grantsHeaders } = firstRow(grantsResult, 'grant probe');
if (value(grantsRow, grantsHeaders, 'has_role') !== true) {
throw grantsMissingError();
}
const trackingResult = await execute(pgClient, TRACKING_PROBE_SQL);
const { row: trackingRow, headers: trackingHeaders } = firstRow(trackingResult, 'tracking probe');
const track = nullableString(value(trackingRow, trackingHeaders, 'track'));
const maxResult = await execute(pgClient, MAX_SETTING_PROBE_SQL);
const { row: maxRow, headers: maxHeaders } = firstRow(maxResult, 'max-setting probe');
const pgssMax = nullableInteger(value(maxRow, maxHeaders, 'max'));
const warnings: string[] = [];
const info: string[] = [];
if (track === 'none') {
warnings.push('pg_stat_statements.track is none; set it to top or all in the Postgres parameter group or config');
}
if (pgssMax !== null && pgssMax < RECOMMENDED_PGSS_MAX) {
info.push(
`pg_stat_statements.max is ${pgssMax}; set it to at least ${RECOMMENDED_PGSS_MAX} to reduce query-template eviction churn`,
);
}
return { pgServerVersion, warnings, info };
}
async *fetchAggregated(
client: unknown,
window: HistoricSqlTimeWindow,
config: HistoricSqlUnifiedPullConfig,
): AsyncIterable<AggregatedTemplate> {
const pgClient = queryClient(client);
const statsResult = await execute(pgClient, STATS_INFO_SQL);
const { row: statsRow, headers: statsHeaders } = firstRow(statsResult, 'stats-info');
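// Fall back to the window start when pg_stat_statements_info reports no usable stats_reset timestamp.
const firstSeen = nullableIsoTimestamp(value(statsRow, statsHeaders, 'stats_reset')) ?? window.start.toISOString();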
const result = await execute(pgClient, AGGREGATE_SQL, [config.minExecutions]);
const indexes = indexByHeader(result.headers);
for (const row of result.rows) {
yield aggregatedTemplateSchema.parse({
templateId: requiredString(value(row, indexes, 'template_id'), 'template_id'),
canonicalSql: requiredString(value(row, indexes, 'canonical_sql'), 'canonical_sql'),
dialect: 'postgres',
stats: {
executions: requiredInteger(value(row, indexes, 'executions'), 'executions'),
distinctUsers: requiredInteger(value(row, indexes, 'distinct_users'), 'distinct_users'),
firstSeen,
lastSeen: window.end.toISOString(),
// The aggregate query yields only a mean runtime, so the mean stands in for both percentiles.
p50RuntimeMs: nullableNumber(value(row, indexes, 'mean_ms')),
p95RuntimeMs: nullableNumber(value(row, indexes, 'mean_ms')),
errorRate: 0,
rowsProduced: nullableInteger(value(row, indexes, 'rows_produced')),
},
topUsers: parseTopUsers(value(row, indexes, 'top_users')),
});
}
}
}

@ -0,0 +1,457 @@
import { mkdir, mkdtemp, readFile, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import YAML from 'yaml';
import { describe, expect, it } from 'vitest';
import { projectHistoricSqlEvidence } from './projection.js';
async function tempWorkdir(): Promise<string> {
return mkdtemp(join(tmpdir(), 'historic-sql-projection-'));
}
async function writeText(root: string, relPath: string, content: string): Promise<void> {
const target = join(root, relPath);
await mkdir(join(target, '..'), { recursive: true });
await writeFile(target, content, 'utf-8');
}
async function writeJson(root: string, relPath: string, value: unknown): Promise<void> {
await writeText(root, relPath, `${JSON.stringify(value, null, 2)}\n`);
}
describe('projectHistoricSqlEvidence', () => {
it('merges table usage into matching _schema shards and preserves external usage keys', async () => {
const workdir = await tempWorkdir();
await writeText(
workdir,
'semantic-layer/warehouse/_schema/public.yaml',
YAML.stringify({
tables: {
orders: {
table: 'public.orders',
usage: {
narrative: 'Old generated usage.',
frequencyTier: 'low',
commonFilters: ['old_status'],
commonJoins: [],
ownerNote: 'keep me',
},
columns: [{ name: 'id', type: 'string' }],
},
},
}),
);
await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/manifest.json', {
source: 'historic-sql',
connectionId: 'warehouse',
dialect: 'postgres',
fetchedAt: '2026-05-11T00:00:00.000Z',
windowStart: '2026-02-10T00:00:00.000Z',
windowEnd: '2026-05-11T00:00:00.000Z',
snapshotRowCount: 1,
touchedTableCount: 1,
parseFailures: 0,
warnings: [],
probeWarnings: [],
staleArchiveAfterDays: 90,
});
await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/tables/public.orders.json', { table: 'public.orders' });
await writeJson(workdir, '.ktx/ingest-evidence/historic-sql/run-1/orders.json', {
kind: 'table_usage',
connectionId: 'warehouse',
table: 'public.orders',
rawPath: 'tables/public.orders.json',
usage: {
narrative: 'Orders are repeatedly queried for lifecycle analysis.',
frequencyTier: 'high',
commonFilters: ['status', 'created_at'],
commonGroupBys: ['status'],
commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
staleSince: null,
},
});
const result = await projectHistoricSqlEvidence({ workdir, connectionId: 'warehouse', syncId: 'sync-1', runId: 'run-1' });
expect(result.touchedSources).toEqual([{ connectionId: 'warehouse', sourceName: 'orders' }]);
expect(result.actions).toEqual(
expect.arrayContaining([
expect.objectContaining({
target: 'sl',
key: 'orders',
rawPaths: ['tables/public.orders.json'],
}),
]),
);
const shard = YAML.parse(await readFile(join(workdir, 'semantic-layer/warehouse/_schema/public.yaml'), 'utf-8'));
expect(shard.tables.orders.usage).toEqual({
ownerNote: 'keep me',
narrative: 'Orders are repeatedly queried for lifecycle analysis.',
frequencyTier: 'high',
commonFilters: ['status', 'created_at'],
commonGroupBys: ['status'],
commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
staleSince: null,
});
});
it('writes pattern pages, reuses similar slugs, and marks missing old pattern pages stale', async () => {
const workdir = await tempWorkdir();
await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/manifest.json', {
source: 'historic-sql',
connectionId: 'warehouse',
dialect: 'postgres',
fetchedAt: '2026-05-11T00:00:00.000Z',
windowStart: '2026-02-10T00:00:00.000Z',
windowEnd: '2026-05-11T00:00:00.000Z',
snapshotRowCount: 2,
touchedTableCount: 2,
parseFailures: 0,
warnings: [],
probeWarnings: [],
staleArchiveAfterDays: 90,
});
await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/tables/public.orders.json', { table: 'public.orders' });
await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/tables/public.customers.json', { table: 'public.customers' });
await writeText(
workdir,
'wiki/global/historic-sql-old-order-lifecycle.md',
[
'---',
YAML.stringify({
summary: 'Old order lifecycle page',
tags: ['historic-sql', 'pattern'],
refs: [],
sl_refs: ['orders'],
usage_mode: 'auto',
source: 'historic-sql',
tables: ['public.orders', 'public.customers'],
fingerprints: ['pg:1'],
}).trimEnd(),
'---',
'',
'Old body',
'',
].join('\n'),
);
await writeText(
workdir,
'wiki/global/historic-sql-retired-pattern.md',
[
'---',
YAML.stringify({
summary: 'Retired pattern',
tags: ['historic-sql', 'pattern'],
refs: [],
sl_refs: [],
usage_mode: 'auto',
source: 'historic-sql',
tables: ['public.tickets'],
fingerprints: ['pg:9'],
}).trimEnd(),
'---',
'',
'Retired body',
'',
].join('\n'),
);
await writeJson(workdir, '.ktx/ingest-evidence/historic-sql/run-1/pattern.json', {
kind: 'pattern',
connectionId: 'warehouse',
rawPath: 'patterns-input.json',
pattern: {
slug: 'order-lifecycle-analysis',
title: 'Order Lifecycle Analysis',
narrative: 'Analysts compare order status with customer segment.',
definitionSql: 'select * from public.orders join public.customers on customers.id = orders.customer_id',
tablesInvolved: ['public.orders', 'public.customers'],
slRefs: ['orders', 'customers'],
constituentTemplateIds: ['pg:1', 'pg:2'],
},
});
const result = await projectHistoricSqlEvidence({ workdir, connectionId: 'warehouse', syncId: 'sync-1', runId: 'run-1' });
expect(result.patternPagesWritten).toBe(1);
expect(result.changedWikiPageKeys).toContain('historic-sql-old-order-lifecycle');
expect(result.actions).toEqual(
expect.arrayContaining([
expect.objectContaining({
target: 'wiki',
key: 'historic-sql-old-order-lifecycle',
rawPaths: ['patterns-input.json'],
}),
]),
);
await expect(readFile(join(workdir, 'wiki/global/historic-sql-old-order-lifecycle.md'), 'utf-8')).resolves.toContain(
'Order Lifecycle Analysis',
);
await expect(readFile(join(workdir, 'wiki/global/historic-sql-retired-pattern.md'), 'utf-8')).resolves.toContain(
'stale_since: "2026-05-11T00:00:00.000Z"',
);
});
it('rewrites a reappearing archived pattern at the flat slug', async () => {
const workdir = await tempWorkdir();
await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/manifest.json', {
source: 'historic-sql',
connectionId: 'warehouse',
dialect: 'postgres',
fetchedAt: '2026-05-11T00:00:00.000Z',
windowStart: '2026-02-10T00:00:00.000Z',
windowEnd: '2026-05-11T00:00:00.000Z',
snapshotRowCount: 2,
touchedTableCount: 2,
parseFailures: 0,
warnings: [],
probeWarnings: [],
staleArchiveAfterDays: 30,
});
await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/tables/public.orders.json', { table: 'public.orders' });
await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/tables/public.customers.json', { table: 'public.customers' });
await writeText(
workdir,
'wiki/global/historic-sql-order-lifecycle-analysis.md',
[
'---',
YAML.stringify({
summary: 'Archived order lifecycle page',
tags: ['historic-sql', 'pattern', 'archived'],
refs: [],
sl_refs: ['orders'],
usage_mode: 'auto',
source: 'historic-sql',
tables: ['public.orders', 'public.customers'],
fingerprints: ['pg:1'],
stale_since: '2026-01-01T00:00:00.000Z',
}).trimEnd(),
'---',
'',
'Archived body',
'',
].join('\n'),
);
await writeJson(workdir, '.ktx/ingest-evidence/historic-sql/run-1/pattern.json', {
kind: 'pattern',
connectionId: 'warehouse',
rawPath: 'patterns-input.json',
pattern: {
slug: 'order-lifecycle-analysis',
title: 'Order Lifecycle Analysis',
narrative: 'Analysts compare order status with customer segment again.',
definitionSql: 'select * from public.orders join public.customers on customers.id = orders.customer_id',
tablesInvolved: ['public.orders', 'public.customers'],
slRefs: ['orders', 'customers'],
constituentTemplateIds: ['pg:1', 'pg:2'],
},
});
const result = await projectHistoricSqlEvidence({ workdir, connectionId: 'warehouse', syncId: 'sync-1', runId: 'run-1' });
expect(result.patternPagesWritten).toBe(1);
const page = await readFile(join(workdir, 'wiki/global/historic-sql-order-lifecycle-analysis.md'), 'utf-8');
expect(page).toContain('Analysts compare order status with customer segment again.');
expect(page).not.toContain('Archived body');
expect(page).not.toContain('archived');
});
it('leaves already archived pattern pages stable when they are still absent', async () => {
const workdir = await tempWorkdir();
await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/manifest.json', {
source: 'historic-sql',
connectionId: 'warehouse',
dialect: 'postgres',
fetchedAt: '2026-05-11T00:00:00.000Z',
windowStart: '2026-02-10T00:00:00.000Z',
windowEnd: '2026-05-11T00:00:00.000Z',
snapshotRowCount: 0,
touchedTableCount: 0,
parseFailures: 0,
warnings: [],
probeWarnings: [],
staleArchiveAfterDays: 30,
});
await writeText(
workdir,
'wiki/global/historic-sql-retired-pattern.md',
[
'---',
YAML.stringify({
summary: 'Retired pattern',
tags: ['historic-sql', 'pattern', 'archived'],
refs: [],
sl_refs: [],
usage_mode: 'auto',
source: 'historic-sql',
tables: ['public.tickets'],
fingerprints: ['pg:9'],
stale_since: '2026-01-01T00:00:00.000Z',
}).trimEnd(),
'---',
'',
'Archived retired body',
'',
].join('\n'),
);
const result = await projectHistoricSqlEvidence({ workdir, connectionId: 'warehouse', syncId: 'sync-1', runId: 'run-1' });
expect(result.archivedPatternPages).toBe(0);
expect(result.stalePatternPagesMarked).toBe(0);
await expect(readFile(join(workdir, 'wiki/global/historic-sql-retired-pattern.md'), 'utf-8')).resolves.toContain(
'Archived retired body',
);
});
it('marks missing table usage stale without deleting old query pages', async () => {
const workdir = await tempWorkdir();
await writeText(
workdir,
'semantic-layer/warehouse/_schema/public.yaml',
YAML.stringify({
tables: {
orders: {
table: 'public.orders',
usage: {
narrative: 'Orders were active before.',
frequencyTier: 'high',
commonFilters: ['status'],
commonGroupBys: ['status'],
commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
ownerNote: 'keep analyst annotation',
},
columns: [{ name: 'id', type: 'string' }],
},
},
}),
);
await writeJson(workdir, 'raw-sources/warehouse/historic-sql/sync-1/manifest.json', {
source: 'historic-sql',
connectionId: 'warehouse',
dialect: 'postgres',
fetchedAt: '2026-05-11T00:00:00.000Z',
windowStart: '2026-02-10T00:00:00.000Z',
windowEnd: '2026-05-11T00:00:00.000Z',
snapshotRowCount: 0,
touchedTableCount: 0,
parseFailures: 0,
warnings: [],
probeWarnings: [],
staleArchiveAfterDays: 90,
});
await writeJson(workdir, '.ktx/ingest-evidence/historic-sql/run-1/customers.json', {
kind: 'table_usage',
connectionId: 'warehouse',
table: 'public.customers',
rawPath: 'tables/public.customers.json',
usage: {
narrative: 'Customers were queried.',
frequencyTier: 'low',
commonFilters: [],
commonJoins: [],
staleSince: null,
},
});
await writeText(
workdir,
'wiki/global/historic-sql-old-template.md',
[
'---',
YAML.stringify({
summary: 'Old template page',
tags: ['historic-sql', 'query-pattern'],
refs: [],
sl_refs: ['orders'],
usage_mode: 'auto',
source: 'historic-sql',
tables: ['public.orders'],
fingerprints: ['old:1'],
}).trimEnd(),
'---',
'',
'Old body',
'',
].join('\n'),
);
const result = await projectHistoricSqlEvidence({ workdir, connectionId: 'warehouse', syncId: 'sync-1', runId: 'run-1' });
expect(result.staleTablesMarked).toBe(1);
expect(result.touchedSources).toEqual([{ connectionId: 'warehouse', sourceName: 'orders' }]);
const staleAction = result.actions.find((action) => action.target === 'sl' && action.key === 'orders');
expect(staleAction).toEqual(expect.objectContaining({ target: 'sl', key: 'orders' }));
expect(staleAction?.rawPaths).toBeUndefined();
const shard = YAML.parse(await readFile(join(workdir, 'semantic-layer/warehouse/_schema/public.yaml'), 'utf-8'));
expect(shard.tables.orders.usage).toEqual({
ownerNote: 'keep analyst annotation',
narrative: 'No recent historic SQL usage was observed in the latest snapshot.',
frequencyTier: 'unused',
commonFilters: [],
commonGroupBys: [],
commonJoins: [],
staleSince: '2026-05-11T00:00:00.000Z',
});
await expect(readFile(join(workdir, 'wiki/global/historic-sql-old-template.md'), 'utf-8')).resolves.toContain(
'Old body',
);
});
it('does not mark stale or archive pages when override replay has no current-run evidence', async () => {
const workdir = await tempWorkdir();
await writeText(
workdir,
'semantic-layer/warehouse/_schema/public.yaml',
YAML.stringify({
tables: {
orders: {
table: 'public.orders',
usage: {
narrative: 'Orders were active before.',
frequencyTier: 'high',
commonFilters: ['status'],
commonGroupBys: ['status'],
commonJoins: [],
},
columns: [{ name: 'id', type: 'string' }],
},
},
}),
);
await writeJson(workdir, 'raw-sources/warehouse/historic-sql/override-sync/manifest.json', {
source: 'historic-sql',
connectionId: 'warehouse',
dialect: 'postgres',
fetchedAt: '2026-05-11T00:00:00.000Z',
windowStart: '2026-02-10T00:00:00.000Z',
windowEnd: '2026-05-11T00:00:00.000Z',
snapshotRowCount: 0,
touchedTableCount: 0,
parseFailures: 0,
warnings: [],
probeWarnings: [],
staleArchiveAfterDays: 90,
});
const result = await projectHistoricSqlEvidence({
workdir,
connectionId: 'warehouse',
syncId: 'override-sync',
runId: 'override-run',
overrideReplay: {
priorJobId: 'prior-job',
priorRunId: 'prior-run',
priorSyncId: 'prior-sync',
evictionRawPaths: ['tables/public/orders.json'],
},
});
expect(result.tableUsageMerged).toBe(0);
expect(result.staleTablesMarked).toBe(0);
expect(result.patternPagesWritten).toBe(0);
expect(result.stalePatternPagesMarked).toBe(0);
expect(result.archivedPatternPages).toBe(0);
expect(result.touchedSources).toEqual([]);
expect(result.changedWikiPageKeys).toEqual([]);
expect(result.actions).toEqual([]);
});
});
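
The stale and archive cases above hinge on a single date comparison. The following is a minimal, dependency-free sketch of that rule, mirroring `shouldArchive` from the projection module in this change. The name `shouldArchiveSketch` and the sample dates are illustrative assumptions, not part of the commit.

```typescript
// Illustrative copy of the archive rule: a page is archived once it has been
// stale for strictly more than `days` days at the time of the latest fetch.
const DAY_MS = 24 * 60 * 60 * 1000;

function shouldArchiveSketch(staleSince: unknown, fetchedAt: string, days: number): boolean {
  // Pages that were never marked stale cannot be archived.
  if (typeof staleSince !== 'string') return false;
  const staleTime = Date.parse(staleSince);
  const fetchedTime = Date.parse(fetchedAt);
  // Unparseable timestamps are treated as "do not archive".
  if (!Number.isFinite(staleTime) || !Number.isFinite(fetchedTime)) return false;
  return fetchedTime - staleTime > days * DAY_MS;
}

const sketchFetchedAt = '2026-05-11T00:00:00.000Z';
// 2026-01-01 to 2026-05-11 is 130 days, well past a 30 day limit.
const archiveLongStale = shouldArchiveSketch('2026-01-01T00:00:00.000Z', sketchFetchedAt, 30);
// Exactly 30 days is not "more than" 30 days.
const archiveAtBoundary = shouldArchiveSketch('2026-04-11T00:00:00.000Z', sketchFetchedAt, 30);
// A page without stale_since is only a candidate for being marked stale.
const archiveNeverStale = shouldArchiveSketch(undefined, sketchFetchedAt, 30);
```

Because the comparison is strict, a page whose `stale_since` is exactly `staleArchiveAfterDays` old stays in the stale state for one more run.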

@ -0,0 +1,385 @@
import { access, mkdir, readdir, readFile, rename, writeFile } from 'node:fs/promises';
import { dirname, join, relative } from 'node:path';
import YAML from 'yaml';
import type { MemoryAction } from '../../../../context/memory/types.js';
import { rawSourcesDirForSync } from '../../raw-sources-paths.js';
import type { FinalizationOverrideReplay } from '../../types.js';
import { mergeUsagePreservingExternal } from '../live-database/manifest.js';
import { historicSqlEvidenceEnvelopeSchema, type HistoricSqlEvidenceEnvelope } from './evidence.js';
import type { TableUsageOutput } from './skill-schemas.js';
import { stagedManifestSchema } from './types.js';
export interface HistoricSqlProjectionInput {
workdir: string;
connectionId: string;
syncId: string;
runId: string;
overrideReplay?: FinalizationOverrideReplay;
}
export interface HistoricSqlProjectionResult {
tableUsageMerged: number;
staleTablesMarked: number;
patternPagesWritten: number;
stalePatternPagesMarked: number;
archivedPatternPages: number;
touchedSources: Array<{ connectionId: string; sourceName: string }>;
changedWikiPageKeys: string[];
actions: MemoryAction[];
warnings: string[];
}
interface ManifestShard {
tables?: Record<string, { table?: string; usage?: Record<string, unknown>; columns?: unknown[]; [key: string]: unknown }>;
}
interface HistoricSqlPatternPage {
key: string;
path: string;
frontmatter: Record<string, unknown>;
content: string;
}
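/** Lowercases a slug, collapses each run of characters outside [a-z0-9_-] into one hyphen, and trims hyphens at both ends. */
function safeKnowledgeSlug(value: string): string {
  return value.toLowerCase().replace(/[^a-z0-9_-]+/g, '-').replace(/^-+|-+$/g, '');
}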
async function pathExists(path: string): Promise<boolean> {
try {
await access(path);
return true;
} catch {
return false;
}
}
async function walkFiles(root: string): Promise<string[]> {
if (!(await pathExists(root))) return [];
const result: string[] = [];
async function visit(dir: string): Promise<void> {
const entries = await readdir(dir, { withFileTypes: true });
for (const entry of entries) {
const absolute = join(dir, entry.name);
if (entry.isDirectory()) {
await visit(absolute);
} else if (entry.isFile()) {
result.push(relative(root, absolute).replace(/\\/g, '/'));
}
}
}
await visit(root);
return result.sort();
}
async function readJson(path: string): Promise<unknown> {
return JSON.parse(await readFile(path, 'utf-8')) as unknown;
}
/** Writes YAML to a sibling `.tmp` file and renames it into place, so readers do not observe a partially written shard. */
async function writeYamlAtomic(path: string, value: unknown): Promise<void> {
  await mkdir(dirname(path), { recursive: true });
  const tmp = `${path}.tmp`;
  await writeFile(tmp, YAML.stringify(value, { indent: 2, lineWidth: 0, version: '1.1' }), 'utf-8');
  await rename(tmp, path);
}
function tableSourceName(tableRef: string): string {
return tableRef.split('.').filter(Boolean).at(-1) ?? tableRef;
}
function staleUsage(fetchedAt: string) {
return {
narrative: 'No recent historic SQL usage was observed in the latest snapshot.',
frequencyTier: 'unused' as const,
commonFilters: [],
commonGroupBys: [],
commonJoins: [],
staleSince: fetchedAt,
};
}
async function loadEvidence(workdir: string, runId: string): Promise<HistoricSqlEvidenceEnvelope[]> {
const root = join(workdir, '.ktx/ingest-evidence/historic-sql', runId);
const files = await walkFiles(root);
const evidence: HistoricSqlEvidenceEnvelope[] = [];
for (const file of files.filter((candidate) => candidate.endsWith('.json'))) {
evidence.push(historicSqlEvidenceEnvelopeSchema.parse(await readJson(join(root, file))));
}
return evidence;
}
function renderPatternMarkdown(pattern: HistoricSqlEvidenceEnvelope & { kind: 'pattern' }): string {
return [
`# ${pattern.pattern.title}`,
'',
pattern.pattern.narrative,
'',
'## Representative SQL',
'',
'```sql',
pattern.pattern.definitionSql,
'```',
'',
'## Tables',
'',
...pattern.pattern.tablesInvolved.map((table) => `- ${table}`),
'',
'## Constituent Templates',
'',
...pattern.pattern.constituentTemplateIds.map((id) => `- ${id}`),
'',
].join('\n');
}
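/**
 * Fraction of `left` entries that also appear in `right`.
 * The ratio is relative to `left` only, so it is not symmetric; an empty `left` yields 0.
 */
function overlapRatio(left: string[], right: string[]): number {
  const rightSet = new Set(right);
  const intersection = left.filter((value) => rightSet.has(value)).length;
  return left.length === 0 ? 0 : intersection / left.length;
}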
function parseMarkdownPage(key: string, path: string, raw: string): HistoricSqlPatternPage | null {
const match = raw.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
if (!match) return null;
return {
key,
path,
frontmatter: (YAML.parse(match[1] ?? '') ?? {}) as Record<string, unknown>,
content: match[2] ?? '',
};
}
function isHistoricPatternPage(page: HistoricSqlPatternPage): boolean {
const tags = Array.isArray(page.frontmatter.tags) ? page.frontmatter.tags : [];
return (
page.frontmatter.source === 'historic-sql' &&
tags.includes('historic-sql') &&
tags.includes('pattern')
);
}
function isArchivedPatternPage(page: HistoricSqlPatternPage): boolean {
const tags = Array.isArray(page.frontmatter.tags) ? page.frontmatter.tags : [];
return tags.includes('archived');
}
function stringArray(value: unknown): string[] {
return Array.isArray(value) ? value.filter((entry): entry is string => typeof entry === 'string') : [];
}
/** Renders YAML frontmatter plus a trimmed body as one Markdown page. */
function renderMarkdownPage(frontmatter: Record<string, unknown>, content: string): string {
  let yaml = YAML.stringify(frontmatter, { indent: 2, lineWidth: 0 }).trimEnd();
  const staleSince = frontmatter.stale_since;
  if (typeof staleSince === 'string') {
    // Rewrite the stale_since line so the ISO timestamp is emitted as a double-quoted scalar.
    yaml = yaml.replace(`stale_since: ${staleSince}`, `stale_since: "${staleSince}"`);
  }
  return `---\n${yaml}\n---\n\n${content.trim()}\n`;
}
function existingPageSignals(page: HistoricSqlPatternPage): string[] {
return [...stringArray(page.frontmatter.tables), ...stringArray(page.frontmatter.fingerprints)];
}
/**
 * True when a page has been stale for strictly more than `days` days as of `fetchedAt`.
 * A missing or unparseable timestamp never triggers archiving.
 */
function shouldArchive(staleSince: unknown, fetchedAt: string, days: number): boolean {
  if (typeof staleSince !== 'string') return false;
  const staleTime = Date.parse(staleSince);
  const fetchedTime = Date.parse(fetchedAt);
  if (!Number.isFinite(staleTime) || !Number.isFinite(fetchedTime)) return false;
  return fetchedTime - staleTime > days * 24 * 60 * 60 * 1000;
}
async function loadPatternPages(root: string): Promise<HistoricSqlPatternPage[]> {
const files = await walkFiles(root);
const pages: HistoricSqlPatternPage[] = [];
for (const file of files.filter((candidate) => candidate.endsWith('.md'))) {
if (file.includes('/')) {
continue;
}
const key = file.replace(/\.md$/, '');
const path = join(root, file);
const page = parseMarkdownPage(key, path, await readFile(path, 'utf-8'));
if (page) {
pages.push(page);
}
}
return pages;
}
function historicSqlFlatKey(slug: string): string {
return `historic-sql-${safeKnowledgeSlug(slug)}`;
}
async function currentStagedTables(rawDir: string): Promise<Set<string>> {
const tablesRoot = join(rawDir, 'tables');
const files = await walkFiles(tablesRoot);
const tables = new Set<string>();
for (const file of files.filter((candidate) => candidate.endsWith('.json'))) {
const value = await readJson(join(tablesRoot, file));
if (typeof value === 'object' && value !== null && 'table' in value && typeof value.table === 'string') {
tables.add(value.table);
}
}
return tables;
}
export async function projectHistoricSqlEvidence(input: HistoricSqlProjectionInput): Promise<HistoricSqlProjectionResult> {
const result: HistoricSqlProjectionResult = {
tableUsageMerged: 0,
staleTablesMarked: 0,
patternPagesWritten: 0,
stalePatternPagesMarked: 0,
archivedPatternPages: 0,
touchedSources: [],
changedWikiPageKeys: [],
actions: [],
warnings: [],
};
const touchedKeys = new Set<string>();
const rawDir = join(input.workdir, rawSourcesDirForSync(input.connectionId, 'historic-sql', input.syncId));
const manifest = stagedManifestSchema.parse(await readJson(join(rawDir, 'manifest.json')));
const currentTables = await currentStagedTables(rawDir);
const evidence = await loadEvidence(input.workdir, input.runId);
if (input.overrideReplay && evidence.length === 0) {
result.warnings.push(
'historic-sql finalization skipped stale/archive cleanup during override replay without current-run evidence',
);
return result;
}
if (evidence.length === 0) {
result.warnings.push('historic-sql finalization skipped because no current-run evidence was emitted');
return result;
}
const tableEvidence = evidence.filter((entry): entry is HistoricSqlEvidenceEnvelope & { kind: 'table_usage' } => entry.kind === 'table_usage');
const patternEvidence = evidence.filter((entry): entry is HistoricSqlEvidenceEnvelope & { kind: 'pattern' } => entry.kind === 'pattern');
const schemaRoot = join(input.workdir, 'semantic-layer', input.connectionId, '_schema');
for (const file of (await walkFiles(schemaRoot)).filter((candidate) => candidate.endsWith('.yaml') || candidate.endsWith('.yml'))) {
const path = join(schemaRoot, file);
const before = await readFile(path, 'utf-8');
const shard = (YAML.parse(before) ?? {}) as ManifestShard;
if (!shard.tables) continue;
for (const [tableName, entry] of Object.entries(shard.tables)) {
const tableRef = entry.table ?? tableName;
const matchingEvidence = tableEvidence.find(
(candidate) => candidate.table === tableRef || tableSourceName(candidate.table) === tableName,
);
if (matchingEvidence) {
const merged = mergeUsagePreservingExternal(entry.usage as TableUsageOutput | undefined, matchingEvidence.usage);
if (JSON.stringify(entry.usage ?? null) !== JSON.stringify(merged ?? null)) {
entry.usage = merged as Record<string, unknown>;
result.tableUsageMerged += 1;
const sourceName = tableSourceName(matchingEvidence.table);
const key = `${input.connectionId}:${sourceName}`;
if (!touchedKeys.has(key)) {
touchedKeys.add(key);
result.touchedSources.push({ connectionId: input.connectionId, sourceName });
}
result.actions.push({
target: 'sl',
type: 'updated',
key: sourceName,
targetConnectionId: input.connectionId,
detail: `Merged historic-SQL usage for ${matchingEvidence.table}`,
rawPaths: [matchingEvidence.rawPath],
});
}
} else if (entry.usage && !currentTables.has(tableRef)) {
const merged = mergeUsagePreservingExternal(entry.usage as TableUsageOutput | undefined, staleUsage(manifest.fetchedAt));
if (JSON.stringify(entry.usage ?? null) !== JSON.stringify(merged ?? null)) {
entry.usage = merged as Record<string, unknown>;
result.staleTablesMarked += 1;
const sourceName = tableSourceName(tableRef);
const key = `${input.connectionId}:${sourceName}`;
if (!touchedKeys.has(key)) {
touchedKeys.add(key);
result.touchedSources.push({ connectionId: input.connectionId, sourceName });
}
result.actions.push({
target: 'sl',
type: 'updated',
key: sourceName,
targetConnectionId: input.connectionId,
detail: `Marked historic-SQL usage stale for ${tableRef}`,
});
}
}
}
const after = YAML.stringify(shard, { indent: 2, lineWidth: 0, version: '1.1' });
if (after !== before) {
await writeYamlAtomic(path, shard);
}
}
const wikiRoot = join(input.workdir, 'wiki/global');
await mkdir(wikiRoot, { recursive: true });
const allPages = await loadPatternPages(wikiRoot);
const activePages = allPages.filter((page) => !isArchivedPatternPage(page));
const patternPages = activePages.filter(isHistoricPatternPage);
const writtenKeys = new Set<string>();
for (const pattern of patternEvidence) {
const incomingSignals = [...pattern.pattern.tablesInvolved, ...pattern.pattern.constituentTemplateIds];
const reusable = patternPages.find((page) => overlapRatio(incomingSignals, existingPageSignals(page)) >= 0.6);
const key = reusable?.key ?? historicSqlFlatKey(pattern.pattern.slug);
const pagePath = join(wikiRoot, `${key}.md`);
const frontmatter = {
summary: pattern.pattern.title,
tags: ['historic-sql', 'pattern'],
refs: [],
sl_refs: pattern.pattern.slRefs,
usage_mode: 'auto',
source: 'historic-sql',
tables: pattern.pattern.tablesInvolved,
representative_sql: pattern.pattern.definitionSql,
fingerprints: pattern.pattern.constituentTemplateIds,
};
await mkdir(dirname(pagePath), { recursive: true });
await writeFile(pagePath, renderMarkdownPage(frontmatter, renderPatternMarkdown(pattern)), 'utf-8');
writtenKeys.add(key);
result.patternPagesWritten += 1;
result.changedWikiPageKeys.push(key);
result.actions.push({
target: 'wiki',
type: reusable ? 'updated' : 'created',
key,
detail: `Projected historic-SQL pattern ${pattern.pattern.title}`,
rawPaths: [pattern.rawPath],
});
}
for (const page of patternPages) {
if (writtenKeys.has(page.key)) continue;
if (shouldArchive(page.frontmatter.stale_since, manifest.fetchedAt, manifest.staleArchiveAfterDays)) {
const tags = [...new Set([...stringArray(page.frontmatter.tags), 'archived'])];
await writeFile(
page.path,
renderMarkdownPage({ ...page.frontmatter, tags, archived_since: manifest.fetchedAt }, page.content),
'utf-8',
);
result.archivedPatternPages += 1;
result.changedWikiPageKeys.push(page.key);
result.actions.push({
target: 'wiki',
type: 'updated',
key: page.key,
detail: `Archived stale historic-SQL pattern page ${page.key}`,
});
continue;
}
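const tags = [...new Set([...stringArray(page.frontmatter.tags), 'stale'])];
// Keep an existing stale_since so the archive clock is not restarted on every run in which the page stays absent.
const staleSince = typeof page.frontmatter.stale_since === 'string' ? page.frontmatter.stale_since : manifest.fetchedAt;
await writeFile(
page.path,
renderMarkdownPage({ ...page.frontmatter, tags, stale_since: staleSince }, page.content),
'utf-8',
);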
result.stalePatternPagesMarked += 1;
result.changedWikiPageKeys.push(page.key);
result.actions.push({
target: 'wiki',
type: 'updated',
key: page.key,
detail: `Marked historic-SQL pattern page ${page.key} stale`,
});
}
result.changedWikiPageKeys = [...new Set(result.changedWikiPageKeys)].sort();
return result;
}
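
To make the slug-reuse rule easier to follow, here is a small sketch of the overlap check that decides whether an incoming pattern overwrites an existing page or gets a new flat key. It copies the arithmetic of `overlapRatio` and the `0.6` threshold from the code above; the names `overlapRatioSketch` and `REUSE_THRESHOLD` are illustrative, and the sample signals are taken from the projection tests in this change.

```typescript
// Share of incoming signals (tables plus template ids) already present on a page.
function overlapRatioSketch(left: string[], right: string[]): number {
  const rightSet = new Set(right);
  const shared = left.filter((value) => rightSet.has(value)).length;
  return left.length === 0 ? 0 : shared / left.length;
}

// Threshold used by the projection when looking for a reusable page.
const REUSE_THRESHOLD = 0.6;

// Signals of the incoming "order-lifecycle-analysis" pattern.
const incomingSignals = ['public.orders', 'public.customers', 'pg:1', 'pg:2'];
// Signals stored in the frontmatter of two existing pages.
const oldLifecycleSignals = ['public.orders', 'public.customers', 'pg:1'];
const retiredSignals = ['public.tickets', 'pg:9'];

// Three of the four incoming signals match, so the old page is reused.
const lifecycleRatio = overlapRatioSketch(incomingSignals, oldLifecycleSignals);
// Nothing matches, so the retired page is left to the stale/archive pass.
const retiredRatio = overlapRatioSketch(incomingSignals, retiredSignals);

const reusesOldLifecyclePage = lifecycleRatio >= REUSE_THRESHOLD;
const reusesRetiredPage = retiredRatio >= REUSE_THRESHOLD;
```

The ratio is divided by the number of incoming signals only, so a large existing page does not dilute the match; swapping the arguments can give a different result.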

@ -0,0 +1,36 @@
import { describe, expect, it } from 'vitest';
import { compileHistoricSqlRedactionPatterns, redactHistoricSqlText } from './redaction.js';
describe('historic-SQL redaction', () => {
it('redacts regex matches and supports the (?i) case-insensitive prefix', () => {
const redactors = compileHistoricSqlRedactionPatterns([
'sk_live_[A-Za-z0-9]+',
'(?i)secret_token_[a-z0-9]+',
]);
const sql =
"select * from public.api_events where api_key = 'sk_live_abc123' and note = 'Secret_Token_9f'"; // pragma: allowlist secret
expect(redactHistoricSqlText(sql, redactors)).toBe(
"select * from public.api_events where api_key = '[REDACTED]' and note = '[REDACTED]'",
);
});
it('returns the original SQL text when no redaction patterns are configured', () => {
const sql = "select * from public.orders where status = 'paid'";
expect(redactHistoricSqlText(sql, compileHistoricSqlRedactionPatterns([]))).toBe(sql);
});
it('throws a config-focused error for invalid redaction regex patterns', () => {
expect(() => compileHistoricSqlRedactionPatterns(['[broken'])).toThrow(
'Invalid historicSql.redactionPatterns entry "[broken"',
);
});
it('throws a config-focused error for empty redaction regex patterns', () => {
expect(() => compileHistoricSqlRedactionPatterns([' '])).toThrow(
'Invalid historicSql.redactionPatterns entry " "',
);
});
});
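
The first test above relies on the `(?i)` prefix convention. As a rough, self-contained sketch of how such a prefix can be mapped onto JavaScript regex flags (the helper name `compileRedactionSketch` is hypothetical and only approximates `compileHistoricSqlRedactionPatterns`):

```typescript
const INSENSITIVE_PREFIX = '(?i)';

// Strips an optional "(?i)" marker and translates it into the "i" flag.
// Every pattern is compiled with "g" so that all occurrences are replaced.
function compileRedactionSketch(pattern: string): RegExp {
  const trimmed = pattern.trim();
  const caseInsensitive = trimmed.startsWith(INSENSITIVE_PREFIX);
  const source = caseInsensitive ? trimmed.slice(INSENSITIVE_PREFIX.length) : trimmed;
  if (source.length === 0) {
    throw new Error(`Invalid redaction pattern "${pattern}": pattern must not be empty`);
  }
  return new RegExp(source, caseInsensitive ? 'gi' : 'g');
}

const sampleSql = "api_key = 'sk_live_abc123' and note = 'Secret_Token_9f'";

// Apply each compiled pattern in order, replacing matches with a fixed token.
const redactedSql = ['sk_live_[A-Za-z0-9]+', '(?i)secret_token_[a-z0-9]+']
  .map(compileRedactionSketch)
  .reduce((text, expression) => text.replace(expression, '[REDACTED]'), sampleSql);
```

The second pattern only matches `Secret_Token_9f` because the prefix switches on case-insensitive matching; without it the mixed-case value would pass through unredacted.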

@ -0,0 +1,37 @@
export interface HistoricSqlRedactionPattern {
pattern: string;
expression: RegExp;
}
const CASE_INSENSITIVE_PREFIX = '(?i)';
const REDACTION_TOKEN = '[REDACTED]';
/**
 * Compiles configured redaction patterns into global regular expressions.
 * A leading `(?i)` is stripped and translated into the `i` flag.
 * Empty or invalid patterns throw an error that names the offending config entry.
 */
export function compileHistoricSqlRedactionPatterns(patterns: readonly string[]): HistoricSqlRedactionPattern[] {
  return patterns.map((pattern) => {
    const trimmed = pattern.trim();
    const caseInsensitive = trimmed.startsWith(CASE_INSENSITIVE_PREFIX);
    const source = caseInsensitive ? trimmed.slice(CASE_INSENSITIVE_PREFIX.length) : trimmed;
    if (source.length === 0) {
      throw new Error(`Invalid historicSql.redactionPatterns entry "${pattern}": pattern must not be empty`);
    }
    try {
      return {
        pattern,
        expression: new RegExp(source, caseInsensitive ? 'gi' : 'g'),
      };
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      throw new Error(`Invalid historicSql.redactionPatterns entry "${pattern}": ${reason}`);
    }
  });
}
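/** Applies every compiled pattern in order, replacing each match with the redaction token. */
export function redactHistoricSqlText(text: string, redactors: readonly HistoricSqlRedactionPattern[]): string {
  let next = text;
  for (const redactor of redactors) {
    // Global regexes carry state in lastIndex; reset it defensively before reuse.
    redactor.expression.lastIndex = 0;
    next = next.replace(redactor.expression, REDACTION_TOKEN);
  }
  return next;
}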
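
The `lastIndex = 0` reset in `redactHistoricSqlText` guards against the statefulness of global regular expressions. The sketch below shows that state with plain `RegExp` calls; the variable names and the sample input are illustrative only.

```typescript
// A global regex remembers where the previous test()/exec() call stopped.
const tokenPattern = /sk_live_[A-Za-z0-9]+/g;
const tokenInput = "key = 'sk_live_abc123'";

// First call finds the token and advances lastIndex past the match.
const firstTest = tokenPattern.test(tokenInput);
const indexAfterFirstTest = tokenPattern.lastIndex;

// Second call resumes after the match, finds nothing, and resets lastIndex to 0.
const secondTest = tokenPattern.test(tokenInput);
const indexAfterSecondTest = tokenPattern.lastIndex;

// String.prototype.replace with a global regex starts from index 0
// regardless of any leftover lastIndex value.
tokenPattern.lastIndex = 5;
const replacedInput = tokenInput.replace(tokenPattern, '[REDACTED]');
```

Since `replace` with a global regex already scans from the start, the explicit reset mainly protects against other callers that share the compiled expression and use `test` or `exec` on it.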

@ -0,0 +1,74 @@
import { describe, expect, it } from 'vitest';
import { z } from 'zod';
import {
patternOutputSchema,
patternsArraySchema,
tableUsageOutputSchema,
} from './skill-schemas.js';
describe('historic-sql skill schemas', () => {
it('accepts table usage output and preserves future keys', () => {
const parsed = tableUsageOutputSchema.parse({
narrative: 'Orders are queried for paid/refunded lifecycle analysis.',
frequencyTier: 'high',
commonFilters: ['status', 'created_at'],
commonGroupBys: ['status'],
commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
staleSince: null,
analystNote: 'preserve me',
});
expect(parsed).toMatchObject({
narrative: 'Orders are queried for paid/refunded lifecycle analysis.',
frequencyTier: 'high',
commonFilters: ['status', 'created_at'],
commonGroupBys: ['status'],
commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
staleSince: null,
analystNote: 'preserve me',
});
});
it('rejects invalid frequency tiers', () => {
const result = tableUsageOutputSchema.safeParse({
narrative: 'Orders are queried often.',
frequencyTier: 'sometimes',
commonFilters: [],
commonJoins: [],
});
expect(result.success).toBe(false);
});
it('accepts pattern outputs used for wiki projection', () => {
const parsed = patternsArraySchema.parse([
{
slug: 'order-lifecycle-analysis',
title: 'Order Lifecycle Analysis',
narrative: 'Teams inspect order status by customer and month.',
definitionSql: 'select status, count(*) from public.orders group by status',
tablesInvolved: ['public.orders', 'public.customers'],
slRefs: ['orders', 'customers'],
constituentTemplateIds: ['template_1', 'template_2'],
},
]);
expect(parsed[0]).toEqual({
slug: 'order-lifecycle-analysis',
title: 'Order Lifecycle Analysis',
narrative: 'Teams inspect order status by customer and month.',
definitionSql: 'select status, count(*) from public.orders group by status',
tablesInvolved: ['public.orders', 'public.customers'],
slRefs: ['orders', 'customers'],
constituentTemplateIds: ['template_1', 'template_2'],
});
});
it('exports zod schemas that can produce JSON schema for prompt prefixes', () => {
const tableUsageJsonSchema = z.toJSONSchema(tableUsageOutputSchema);
const patternJsonSchema = z.toJSONSchema(patternOutputSchema);
expect(tableUsageJsonSchema).toMatchObject({ type: 'object' });
expect(patternJsonSchema).toMatchObject({ type: 'object' });
});
});

@ -0,0 +1,31 @@
import { z } from 'zod';
// Shape of the per-table usage summary produced by the historic-SQL skill.
export const tableUsageOutputSchema = z
  .object({
    narrative: z.string(),
    frequencyTier: z.enum(['high', 'mid', 'low', 'unused']),
    commonFilters: z.array(z.string()),
    commonGroupBys: z.array(z.string()).optional(),
    commonJoins: z.array(
      z.object({
        table: z.string(),
        on: z.array(z.string()),
      }),
    ),
    staleSince: z.iso.datetime().nullable().optional(),
  })
  // Keep unknown keys (for example analyst annotations) instead of stripping them.
  .passthrough();
export type TableUsageOutput = z.infer<typeof tableUsageOutputSchema>;
export const patternOutputSchema = z.object({
slug: z.string(),
title: z.string(),
narrative: z.string(),
definitionSql: z.string(),
tablesInvolved: z.array(z.string()),
slRefs: z.array(z.string()),
constituentTemplateIds: z.array(z.string()),
});
/** @internal */
export const patternsArraySchema = z.array(patternOutputSchema);
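
Because the usage schema passes unknown keys through, a merge step can keep analyst-authored fields while replacing the ingest-owned ones. The real merge lives in `mergeUsagePreservingExternal`, which is not shown in this diff, so the following is only a hypothetical sketch of one way such a merge could behave; `INGEST_OWNED_KEYS` and `mergePreservingExternalSketch` are assumed names.

```typescript
type UsageRecord = Record<string, unknown>;

// Assumption: these are the keys the historic-SQL ingest is allowed to overwrite.
const INGEST_OWNED_KEYS = [
  'narrative',
  'frequencyTier',
  'commonFilters',
  'commonGroupBys',
  'commonJoins',
  'staleSince',
];

// Keeps every key the ingest does not own, then overlays the incoming usage.
function mergePreservingExternalSketch(existing: UsageRecord | undefined, incoming: UsageRecord): UsageRecord {
  const external: UsageRecord = {};
  for (const [key, value] of Object.entries(existing ?? {})) {
    if (!INGEST_OWNED_KEYS.includes(key)) external[key] = value;
  }
  return { ...external, ...incoming };
}

const existingUsage: UsageRecord = {
  narrative: 'Orders were active before.',
  frequencyTier: 'high',
  commonFilters: ['status'],
  ownerNote: 'keep analyst annotation',
};

const incomingStaleUsage: UsageRecord = {
  narrative: 'No recent historic SQL usage was observed in the latest snapshot.',
  frequencyTier: 'unused',
  commonFilters: [],
  commonGroupBys: [],
  commonJoins: [],
  staleSince: '2026-05-11T00:00:00.000Z',
};

const mergedUsage = mergePreservingExternalSketch(existingUsage, incomingStaleUsage);
```

In this sketch `ownerNote` survives while every ingest-owned field is replaced, which matches the shape the projection tests expect after a table is marked stale.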

@ -0,0 +1,148 @@
import { describe, expect, it, vi } from 'vitest';
import { HistoricSqlGrantsMissingError } from './errors.js';
import { SnowflakeHistoricSqlQueryHistoryReader } from './snowflake-query-history-reader.js';
interface FakeQueryResult {
headers: string[];
rows: unknown[][];
totalRows: number;
error?: string;
}
function queryClient(results: FakeQueryResult[]) {
const executeQuery = vi.fn(async (_query: string) => {
const next = results.shift();
if (!next) {
throw new Error('unexpected query');
}
return next;
});
return { executeQuery };
}
function firstQuery(client: ReturnType<typeof queryClient>): string {
const call = client.executeQuery.mock.calls[0];
if (!call) {
throw new Error('expected query client to be called');
}
return call[0];
}
describe('SnowflakeHistoricSqlQueryHistoryReader', () => {
it('probes SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY', async () => {
const client = queryClient([{ headers: ['1'], rows: [[1]], totalRows: 1 }]);
const reader = new SnowflakeHistoricSqlQueryHistoryReader();
await expect(reader.probe(client)).resolves.toEqual({ warnings: [], info: [] });
expect(client.executeQuery).toHaveBeenCalledWith(
'SELECT 1 FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY LIMIT 1',
);
});
it('turns probe result errors into HistoricSqlGrantsMissingError', async () => {
const client = queryClient([{ headers: [], rows: [], totalRows: 0, error: 'Object does not exist or not authorized' }]);
const reader = new SnowflakeHistoricSqlQueryHistoryReader();
await expect(reader.probe(client)).rejects.toMatchObject({
name: 'HistoricSqlGrantsMissingError',
dialect: 'snowflake',
remediation: 'GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE <connection role>;',
});
});
it('turns thrown probe failures into HistoricSqlGrantsMissingError', async () => {
const client = {
executeQuery: vi.fn(async () => {
throw new Error('permission denied');
}),
};
const reader = new SnowflakeHistoricSqlQueryHistoryReader();
await expect(reader.probe(client)).rejects.toBeInstanceOf(HistoricSqlGrantsMissingError);
});
it('fetches aggregated Snowflake query templates', async () => {
const client = queryClient([
{
headers: [
'template_id',
'canonical_sql',
'executions',
'distinct_users',
'first_seen',
'last_seen',
'p50_ms',
'p95_ms',
'error_rate',
'rows_produced',
'top_users',
],
rows: [
[
'hash-1',
'select status from orders',
42,
3,
'2026-05-01T00:00:00.000Z',
'2026-05-11T00:00:00.000Z',
12,
40,
0.05,
100,
JSON.stringify([{ user: 'ANALYST', executions: 1 }]),
],
],
totalRows: 1,
},
]);
const reader = new SnowflakeHistoricSqlQueryHistoryReader();
const rows = [];
for await (const row of reader.fetchAggregated(
client,
{ start: new Date('2026-02-10T00:00:00.000Z'), end: new Date('2026-05-11T00:00:00.000Z') },
{ dialect: 'snowflake', minExecutions: 5, windowDays: 90, enabledTables: [], filters: { dropTrivialProbes: true }, redactionPatterns: [], staleArchiveAfterDays: 90 },
)) {
rows.push(row);
}
const sql = firstQuery(client);
expect(sql).toContain('SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY');
expect(sql).toContain('COUNT(*) AS executions');
expect(sql).toContain('GROUP BY query_hash');
expect(sql).toContain('HAVING COUNT(*) >= 5');
expect(rows).toMatchObject([
{
templateId: 'hash-1',
stats: {
executions: 42,
errorRate: 0.05,
},
topUsers: [{ user: 'ANALYST', executions: 1 }],
},
]);
});
it('throws a clear error when the query client cannot execute SQL', async () => {
const reader = new SnowflakeHistoricSqlQueryHistoryReader();
await expect(async () => {
for await (const _row of reader.fetchAggregated(
{},
{ start: new Date(), end: new Date() },
{
dialect: 'snowflake',
minExecutions: 5,
windowDays: 90,
enabledTables: [],
filters: { dropTrivialProbes: true },
redactionPatterns: [],
staleArchiveAfterDays: 90,
},
)) {
throw new Error('unreachable');
}
}).rejects.toThrow('Historic SQL Snowflake reader requires a query client with executeQuery(query)');
});
});

@@ -0,0 +1,220 @@
import { HistoricSqlGrantsMissingError } from './errors.js';
import {
aggregatedTemplateSchema,
type AggregatedTemplate,
type HistoricSqlTimeWindow,
type HistoricSqlUnifiedPullConfig,
} from './types.js';
interface QueryResultLike {
headers: string[];
rows: unknown[][];
totalRows: number;
error?: string;
}
interface QueryClientLike {
executeQuery(query: string): Promise<QueryResultLike>;
}
const PROBE_SQL = 'SELECT 1 FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY LIMIT 1';
const SNOWFLAKE_GRANTS_REMEDIATION =
'GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE <connection role>;';
function queryClient(client: unknown): QueryClientLike {
if (
client &&
typeof client === 'object' &&
'executeQuery' in client &&
typeof (client as { executeQuery?: unknown }).executeQuery === 'function'
) {
return client as QueryClientLike;
}
throw new Error('Historic SQL Snowflake reader requires a query client with executeQuery(query)');
}
function grantsError(cause: unknown): HistoricSqlGrantsMissingError {
const message =
cause instanceof Error
? cause.message
: typeof cause === 'string'
? cause
: 'Snowflake role cannot query SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY.';
return new HistoricSqlGrantsMissingError({
dialect: 'snowflake',
message: `Missing Snowflake audit grants for historic-SQL ingest: ${message}`,
remediation: SNOWFLAKE_GRANTS_REMEDIATION,
cause,
});
}
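// Renders a validated Date (or date string) as a Snowflake TIMESTAMP_TZ literal.
// The value is always re-serialized through toISOString(), so caller input is never inlined verbatim.
function timestampLiteral(value: Date | string): string {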
const date = value instanceof Date ? value : new Date(value);
if (Number.isNaN(date.getTime())) {
throw new Error(`Invalid Snowflake query-history timestamp: ${String(value)}`);
}
return `'${date.toISOString().replace(/'/g, "''")}'::TIMESTAMP_TZ`;
}
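// Maps each result header to its column position; names are upper-cased so lookups are case-insensitive.
function indexByHeader(headers: string[]): Map<string, number> {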
const out = new Map<string, number>();
headers.forEach((header, index) => {
out.set(header.toUpperCase(), index);
});
return out;
}
function value(row: unknown[], indexes: Map<string, number>, name: string): unknown {
const index = indexes.get(name.toUpperCase());
return index === undefined ? null : row[index];
}
function nullableString(raw: unknown): string | null {
if (raw === null || raw === undefined) {
return null;
}
const text = String(raw);
return text.length > 0 ? text : null;
}
function requiredString(raw: unknown, field: string): string {
const text = nullableString(raw);
if (!text) {
throw new Error(`Snowflake QUERY_HISTORY row is missing ${field}`);
}
return text;
}
function nullableNumber(raw: unknown): number | null {
if (raw === null || raw === undefined || raw === '') {
return null;
}
const number = typeof raw === 'number' ? raw : Number(raw);
if (!Number.isFinite(number)) {
return null;
}
return number;
}
function requiredNumber(raw: unknown, field: string): number {
const number = nullableNumber(raw);
if (number === null) {
throw new Error(`Snowflake QUERY_HISTORY row has invalid ${field}: ${String(raw)}`);
}
return number;
}
function requiredInteger(raw: unknown, field: string): number {
return Math.trunc(requiredNumber(raw, field));
}
function nullableInteger(raw: unknown): number | null {
const number = nullableNumber(raw);
return number === null ? null : Math.trunc(number);
}
function isoTimestamp(raw: unknown, field: string): string {
if (raw instanceof Date) {
return raw.toISOString();
}
const text = requiredString(raw, field);
const date = new Date(text);
if (Number.isNaN(date.getTime())) {
throw new Error(`Snowflake QUERY_HISTORY row has invalid ${field}: ${text}`);
}
return date.toISOString();
}
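// Parses the JSON-encoded top_users column. Malformed JSON, non-array payloads, and entries
// without a numeric execution count are skipped instead of failing the whole row.
function parseTopUsers(raw: unknown): Array<{ user: string | null; executions: number }> {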
const text = nullableString(raw);
if (!text) {
return [];
}
try {
const parsed = JSON.parse(text) as unknown;
if (!Array.isArray(parsed)) {
return [];
}
return parsed.flatMap((entry) => {
if (!entry || typeof entry !== 'object') {
return [];
}
const user = nullableString((entry as { user?: unknown }).user);
const executions = nullableInteger((entry as { executions?: unknown }).executions);
return executions === null ? [] : [{ user, executions }];
});
} catch {
return [];
}
}
function mapAggregatedRow(row: unknown[], indexes: Map<string, number>): AggregatedTemplate {
return aggregatedTemplateSchema.parse({
templateId: requiredString(value(row, indexes, 'template_id'), 'template_id'),
canonicalSql: requiredString(value(row, indexes, 'canonical_sql'), 'canonical_sql'),
dialect: 'snowflake',
stats: {
executions: requiredInteger(value(row, indexes, 'executions'), 'executions'),
distinctUsers: requiredInteger(value(row, indexes, 'distinct_users'), 'distinct_users'),
firstSeen: isoTimestamp(value(row, indexes, 'first_seen'), 'first_seen'),
lastSeen: isoTimestamp(value(row, indexes, 'last_seen'), 'last_seen'),
p50RuntimeMs: nullableNumber(value(row, indexes, 'p50_ms')),
p95RuntimeMs: nullableNumber(value(row, indexes, 'p95_ms')),
errorRate: requiredNumber(value(row, indexes, 'error_rate'), 'error_rate'),
rowsProduced: nullableInteger(value(row, indexes, 'rows_produced')),
},
topUsers: parseTopUsers(value(row, indexes, 'top_users')),
});
}
export class SnowflakeHistoricSqlQueryHistoryReader {
async probe(client: unknown): Promise<{ warnings: string[]; info: string[] }> {
let result: QueryResultLike;
try {
result = await queryClient(client).executeQuery(PROBE_SQL);
} catch (error) {
throw grantsError(error);
}
if (result.error) {
throw grantsError(result.error);
}
return { warnings: [], info: [] };
}
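// Aggregates QUERY_HISTORY by query_hash within [window.start, window.end) and yields one
// schema-validated template per group that meets config.minExecutions.
async *fetchAggregated(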
client: unknown,
window: HistoricSqlTimeWindow,
config: HistoricSqlUnifiedPullConfig,
): AsyncIterable<AggregatedTemplate> {
const sql = `
SELECT
query_hash AS template_id,
MIN(query_text) AS canonical_sql,
COUNT(*) AS executions,
COUNT(DISTINCT user_name) AS distinct_users,
MIN(start_time) AS first_seen,
MAX(start_time) AS last_seen,
APPROX_PERCENTILE(total_elapsed_time, 0.50) AS p50_ms,
APPROX_PERCENTILE(total_elapsed_time, 0.95) AS p95_ms,
DIV0(COUNT_IF(execution_status != 'SUCCESS'), COUNT(*)) AS error_rate,
SUM(rows_produced) AS rows_produced,
ARRAY_AGG(OBJECT_CONSTRUCT('user', user_name, 'executions', 1)) WITHIN GROUP (ORDER BY start_time DESC)::string AS top_users
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE query_text IS NOT NULL
AND query_type IN ('SELECT', 'MERGE')
AND start_time >= ${timestampLiteral(window.start)}
AND start_time < ${timestampLiteral(window.end)}
GROUP BY query_hash
HAVING COUNT(*) >= ${config.minExecutions}
ORDER BY executions DESC`.trim();
const result = await queryClient(client).executeQuery(sql);
if (result.error) {
throw grantsError(result.error);
}
const indexes = indexByHeader(result.headers);
for (const row of result.rows) {
yield mapAggregatedRow(row, indexes);
}
}
}
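
The reader above resolves result columns by header name rather than by position, and upper-cases both sides of the lookup. That matters because the test feeds lower-case headers while Snowflake normally reports unquoted aliases in upper case. The following is a minimal sketch of that lookup, not the file's actual code: `cell` is a hypothetical stand-in for the private `value` helper, and the sample headers and row are invented for illustration.

```typescript
// Sketch only: mirrors the header-index lookup used by the Snowflake reader.
// `cell` and the sample data below are illustrative assumptions.
type Row = unknown[];

// Build a case-insensitive header -> column position map.
function indexByHeader(headers: string[]): Map<string, number> {
  const out = new Map<string, number>();
  headers.forEach((header, index) => {
    out.set(header.toUpperCase(), index);
  });
  return out;
}

// Look a column up by name; an absent column yields null instead of throwing.
function cell(row: Row, indexes: Map<string, number>, name: string): unknown {
  const index = indexes.get(name.toUpperCase());
  return index === undefined ? null : row[index];
}

// Upper-case headers, as an unquoted alias would typically come back.
const headers = ['TEMPLATE_ID', 'EXECUTIONS'];
const row: Row = ['hash-1', '42'];
const indexes = indexByHeader(headers);

const templateId = cell(row, indexes, 'template_id');
const executions = Number(cell(row, indexes, 'executions'));
const missing = cell(row, indexes, 'p95_ms');
```

Because a missing column resolves to `null`, the required-field helpers (`requiredString`, `requiredNumber`) are what turn an absent header into an error, while the nullable ones let optional statistics pass through.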

@@ -0,0 +1,436 @@
import { mkdtemp, readFile, readdir } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it, vi } from 'vitest';
import type { SqlAnalysisPort } from '../../../../context/sql-analysis/ports.js';
import { stageHistoricSqlAggregatedSnapshot } from './stage-unified.js';
import type { AggregatedTemplate, HistoricSqlReader } from './types.js';
async function tempDir(): Promise<string> {
return mkdtemp(join(tmpdir(), 'historic-sql-unified-stage-'));
}
async function readJson<T>(root: string, relPath: string): Promise<T> {
return JSON.parse(await readFile(join(root, relPath), 'utf-8')) as T;
}
function aggregate(overrides: Partial<AggregatedTemplate> & { templateId: string; canonicalSql: string }): AggregatedTemplate {
return {
templateId: overrides.templateId,
canonicalSql: overrides.canonicalSql,
dialect: overrides.dialect ?? 'postgres',
stats: overrides.stats ?? {
executions: 42,
distinctUsers: 3,
firstSeen: '2026-05-01T00:00:00.000Z',
lastSeen: '2026-05-11T00:00:00.000Z',
p50RuntimeMs: 20,
p95RuntimeMs: 80,
errorRate: 0,
rowsProduced: 100,
},
topUsers: overrides.topUsers ?? [{ user: 'analyst', executions: 40 }],
};
}
describe('stageHistoricSqlAggregatedSnapshot', () => {
it('batch parses templates and writes stable table and patterns artifacts', async () => {
const stagedDir = await tempDir();
const reader: HistoricSqlReader = {
async probe() {
return { warnings: ['pg_stat_statements.track is none; aggregation still proceeds'], info: [] };
},
async *fetchAggregated() {
yield aggregate({
templateId: 'orders-by-status',
canonicalSql: 'select o.status, count(*) from public.orders o join public.customers c on c.id = o.customer_id where o.created_at >= $1 group by o.status',
});
yield aggregate({
templateId: 'service-account-only',
canonicalSql: 'select * from public.orders where id = $1',
stats: {
executions: 20,
distinctUsers: 1,
firstSeen: '2026-05-01T00:00:00.000Z',
lastSeen: '2026-05-11T00:00:00.000Z',
p50RuntimeMs: 5,
p95RuntimeMs: 10,
errorRate: 0,
rowsProduced: 1,
},
topUsers: [{ user: 'svc_loader', executions: 20 }],
});
yield aggregate({
templateId: 'bad-parse',
canonicalSql: 'select broken from',
});
},
};
const sqlAnalysis: SqlAnalysisPort = {
analyzeForFingerprint: vi.fn(),
analyzeBatch: vi.fn(async () => new Map([
[
'orders-by-status',
{
tablesTouched: ['public.orders', 'public.customers'],
columnsByClause: {
select: ['status'],
where: ['created_at'],
join: ['customer_id'],
groupBy: ['status'],
},
},
],
['bad-parse', { tablesTouched: [], columnsByClause: {}, error: 'parse failed' }],
])),
validateReadOnly: vi.fn(async () => ({ ok: true })),
};
await stageHistoricSqlAggregatedSnapshot({
stagedDir,
connectionId: 'warehouse',
queryClient: {},
reader,
sqlAnalysis,
pullConfig: {
dialect: 'postgres',
filters: {
serviceAccounts: { patterns: ['^svc_'], mode: 'exclude' },
},
},
now: new Date('2026-05-11T12:00:00.000Z'),
});
expect(sqlAnalysis.analyzeBatch).toHaveBeenCalledTimes(1);
expect(sqlAnalysis.analyzeBatch).toHaveBeenCalledWith(
[
{
id: 'orders-by-status',
sql: 'select o.status, count(*) from public.orders o join public.customers c on c.id = o.customer_id where o.created_at >= $1 group by o.status',
},
{ id: 'bad-parse', sql: 'select broken from' },
],
'postgres',
);
expect(await readdir(join(stagedDir, 'tables'))).toEqual(['public.customers.json', 'public.orders.json']);
const manifest = await readJson<Record<string, unknown>>(stagedDir, 'manifest.json');
expect(manifest).toMatchObject({
source: 'historic-sql',
connectionId: 'warehouse',
dialect: 'postgres',
snapshotRowCount: 3,
touchedTableCount: 2,
parseFailures: 1,
warnings: ['parse_failed:bad-parse'],
probeWarnings: ['pg_stat_statements.track is none; aggregation still proceeds'],
staleArchiveAfterDays: 90,
});
const orders = await readJson<Record<string, any>>(stagedDir, 'tables/public.orders.json');
expect(orders).toMatchObject({
table: 'public.orders',
stats: {
executionsBucket: '10-100',
distinctUsersBucket: '2-5',
errorRateBucket: 'none',
p95RuntimeBucket: '<100ms',
recencyBucket: 'current',
},
columnsByClause: {
select: [['status', 'high']],
where: [['created_at', 'high']],
join: [['customer_id', 'high']],
groupBy: [['status', 'high']],
},
observedJoins: [{ withTable: 'public.customers', on: ['customer_id'], freq: 'high' }],
topTemplates: [
{
id: 'orders-by-status',
topUsers: [{ user: 'analyst' }],
},
],
});
expect(orders.topTemplates[0].canonicalSql).toContain('group by o.status');
const patterns = await readJson<Record<string, any>>(stagedDir, 'patterns-input.json');
expect(patterns.templates).toEqual([
{
id: 'orders-by-status',
canonicalSql: expect.stringContaining('public.orders'),
tablesTouched: ['public.customers', 'public.orders'],
executionsBucket: '10-100',
distinctUsersBucket: '2-5',
dialect: 'postgres',
},
]);
});
it('redacts configured SQL substrings in staged artifacts while analyzing original SQL', async () => {
const stagedDir = await tempDir();
const originalSql =
"select * from public.api_events where api_key = 'sk_live_abc123' and note = 'Secret_Token_9f'"; // pragma: allowlist secret
const reader: HistoricSqlReader = {
async probe() {
return { warnings: [], info: [] };
},
async *fetchAggregated() {
yield aggregate({
templateId: 'api-events-with-secret',
canonicalSql: originalSql,
stats: {
executions: 15,
distinctUsers: 2,
firstSeen: '2026-05-01T00:00:00.000Z',
lastSeen: '2026-05-11T00:00:00.000Z',
p50RuntimeMs: 12,
p95RuntimeMs: 25,
errorRate: 0,
rowsProduced: 15,
},
});
},
};
const sqlAnalysis: SqlAnalysisPort = {
analyzeForFingerprint: vi.fn(),
analyzeBatch: vi.fn(async () => new Map([
[
'api-events-with-secret',
{
tablesTouched: ['public.api_events'],
columnsByClause: {
select: [],
where: ['api_key', 'note'],
join: [],
groupBy: [],
},
},
],
])),
validateReadOnly: vi.fn(async () => ({ ok: true })),
};
await stageHistoricSqlAggregatedSnapshot({
stagedDir,
connectionId: 'warehouse',
queryClient: {},
reader,
sqlAnalysis,
pullConfig: {
dialect: 'postgres',
redactionPatterns: ['sk_live_[A-Za-z0-9]+', '(?i)secret_token_[a-z0-9]+'],
},
now: new Date('2026-05-11T12:00:00.000Z'),
});
expect(sqlAnalysis.analyzeBatch).toHaveBeenCalledWith(
[{ id: 'api-events-with-secret', sql: originalSql }],
'postgres',
);
const tableJson = await readFile(join(stagedDir, 'tables/public.api_events.json'), 'utf-8');
const patternsJson = await readFile(join(stagedDir, 'patterns-input.json'), 'utf-8');
expect(tableJson).not.toContain('sk_live_abc123');
expect(tableJson).not.toContain('Secret_Token_9f');
expect(patternsJson).not.toContain('sk_live_abc123');
expect(patternsJson).not.toContain('Secret_Token_9f');
expect(tableJson).toContain('[REDACTED]');
expect(patternsJson).toContain('[REDACTED]');
});
it('limits staged table artifacts to configured enabled tables', async () => {
const stagedDir = await tempDir();
const reader: HistoricSqlReader = {
async probe() {
return { warnings: [], info: [] };
},
async *fetchAggregated() {
yield aggregate({
templateId: 'selected-qualified',
canonicalSql: 'select count(*) from orbit_analytics.int_active_contract_arr',
});
yield aggregate({
templateId: 'selected-unqualified',
canonicalSql: 'select count(*) from int_customer_health_signals',
});
yield aggregate({
templateId: 'unselected',
canonicalSql: 'select count(*) from orbit_raw.accounts',
});
},
};
const sqlAnalysis: SqlAnalysisPort = {
analyzeForFingerprint: vi.fn(),
analyzeBatch: vi.fn(async () => new Map([
[
'selected-qualified',
{
tablesTouched: ['orbit_analytics.int_active_contract_arr'],
columnsByClause: { select: [], where: [], join: [], groupBy: [] },
},
],
[
'selected-unqualified',
{
tablesTouched: ['int_customer_health_signals'],
columnsByClause: { select: [], where: [], join: [], groupBy: [] },
},
],
[
'unselected',
{
tablesTouched: ['orbit_raw.accounts'],
columnsByClause: { select: [], where: [], join: [], groupBy: [] },
},
],
])),
validateReadOnly: vi.fn(async () => ({ ok: true })),
};
await stageHistoricSqlAggregatedSnapshot({
stagedDir,
connectionId: 'warehouse',
queryClient: {},
reader,
sqlAnalysis,
pullConfig: {
dialect: 'postgres',
enabledTables: [
'orbit_analytics.int_active_contract_arr',
'orbit_analytics.int_customer_health_signals',
],
},
now: new Date('2026-05-11T12:00:00.000Z'),
});
expect(await readdir(join(stagedDir, 'tables'))).toEqual([
'int_customer_health_signals.json',
'orbit_analytics.int_active_contract_arr.json',
]);
const manifest = await readJson<Record<string, any>>(stagedDir, 'manifest.json');
expect(manifest.touchedTableCount).toBe(2);
const patterns = await readJson<Record<string, any>>(stagedDir, 'patterns-input.json');
expect(patterns.templates.map((entry: any) => entry.id)).toEqual(['selected-qualified', 'selected-unqualified']);
});
it('preserves full patterns audit input and writes bounded cross-table pattern shards', async () => {
const stagedDir = await tempDir();
const largeSql = `select * from public.orders o join public.customers c on c.id = o.customer_id where payload = '${'x'.repeat(8000)}'`;
const reader: HistoricSqlReader = {
async probe() {
return { warnings: [], info: [] };
},
async *fetchAggregated() {
yield aggregate({
templateId: 'orders-customers-a',
canonicalSql: largeSql,
stats: {
executions: 25,
distinctUsers: 4,
firstSeen: '2026-05-01T00:00:00.000Z',
lastSeen: '2026-05-11T00:00:00.000Z',
p50RuntimeMs: 15,
p95RuntimeMs: 90,
errorRate: 0,
rowsProduced: 250,
},
});
yield aggregate({
templateId: 'orders-customers-b',
canonicalSql: largeSql.replace('payload', 'payload_b'),
stats: {
executions: 22,
distinctUsers: 3,
firstSeen: '2026-05-01T00:00:00.000Z',
lastSeen: '2026-05-11T00:00:00.000Z',
p50RuntimeMs: 20,
p95RuntimeMs: 95,
errorRate: 0,
rowsProduced: 220,
},
});
yield aggregate({
templateId: 'orders-single-table',
canonicalSql: 'select count(*) from public.orders',
stats: {
executions: 30,
distinctUsers: 2,
firstSeen: '2026-05-01T00:00:00.000Z',
lastSeen: '2026-05-11T00:00:00.000Z',
p50RuntimeMs: 10,
p95RuntimeMs: 20,
errorRate: 0,
rowsProduced: 30,
},
});
},
};
const sqlAnalysis: SqlAnalysisPort = {
analyzeForFingerprint: vi.fn(),
analyzeBatch: vi.fn(async () => new Map([
[
'orders-customers-a',
{
tablesTouched: ['public.orders', 'public.customers'],
columnsByClause: {
select: [],
where: ['payload'],
join: ['customer_id', 'id'],
groupBy: [],
},
},
],
[
'orders-customers-b',
{
tablesTouched: ['public.orders', 'public.customers'],
columnsByClause: {
select: [],
where: ['payload_b'],
join: ['customer_id', 'id'],
groupBy: [],
},
},
],
[
'orders-single-table',
{
tablesTouched: ['public.orders'],
columnsByClause: {
select: [],
where: [],
join: [],
groupBy: [],
},
},
],
])),
validateReadOnly: vi.fn(async () => ({ ok: true })),
};
await stageHistoricSqlAggregatedSnapshot({
stagedDir,
connectionId: 'warehouse',
queryClient: {},
reader,
sqlAnalysis,
pullConfig: { dialect: 'postgres' },
now: new Date('2026-05-11T12:00:00.000Z'),
});
const audit = await readJson<Record<string, any>>(stagedDir, 'patterns-input.json');
expect(audit.templates.map((entry: any) => entry.id)).toEqual([
'orders-customers-a',
'orders-customers-b',
'orders-single-table',
]);
const firstShard = await readJson<Record<string, any>>(stagedDir, 'patterns-input/part-0001.json');
expect(firstShard.templates.map((entry: any) => entry.id)).toEqual(['orders-customers-a', 'orders-customers-b']);
expect(firstShard.templates.some((entry: any) => entry.id === 'orders-single-table')).toBe(false);
const manifest = await readJson<Record<string, any>>(stagedDir, 'manifest.json');
expect(manifest.warnings).toEqual([]);
});
});

@@ -0,0 +1,360 @@
import { mkdir, writeFile } from 'node:fs/promises';
import { dirname, join } from 'node:path';
import type { SqlAnalysisPort } from '../../../../context/sql-analysis/ports.js';
import {
bucketDistinctUsers,
bucketErrorRate,
bucketExecutions,
bucketFrequency,
bucketP95Runtime,
bucketRecency,
} from './buckets.js';
import { splitHistoricSqlPatternInputs } from './pattern-inputs.js';
import {
compileHistoricSqlRedactionPatterns,
redactHistoricSqlText,
type HistoricSqlRedactionPattern,
} from './redaction.js';
import {
HISTORIC_SQL_SOURCE_KEY,
aggregatedTemplateSchema,
historicSqlUnifiedPullConfigSchema,
type AggregatedTemplate,
type HistoricSqlReader,
type HistoricSqlUnifiedPullConfig,
type StagedPatternsInput,
type StagedTableInput,
} from './types.js';
interface StageHistoricSqlAggregatedSnapshotInput {
stagedDir: string;
connectionId: string;
queryClient: unknown;
reader: HistoricSqlReader;
sqlAnalysis: SqlAnalysisPort;
pullConfig: unknown;
now?: Date;
}
interface ParsedTemplate {
template: AggregatedTemplate;
tablesTouched: string[];
includedTables: string[];
columnsByClause: Record<string, string[]>;
}
interface EnabledTableFilter {
exact: Set<string>;
uniqueUnqualified: Set<string>;
}
interface TableAccumulator {
table: string;
executions: number;
distinctUsers: number;
errorRateNumerator: number;
p95RuntimeMs: number | null;
lastSeen: string;
columnsByClause: Map<string, Map<string, number>>;
observedJoins: Map<string, Map<string, number>>;
topTemplates: AggregatedTemplate[];
}
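// Connectivity probes such as `SELECT 1` or `SELECT NOW()`; dropped unless dropTrivialProbes is false.
const TRIVIAL_SQL_RE = /^\s*SELECT\s+(1|NOW\(\)|CURRENT_TIMESTAMP|VERSION\(\))\s*;?\s*$/i;
// Session and metadata statements that never describe table usage.
const NOISE_PREFIX_RE = /^\s*(SHOW|DESCRIBE|DESC|EXPLAIN|USE|SET)\b/i;
// Queries that reference catalog or system objects rather than user tables.
const SYSTEM_TABLE_RE = /\b(INFORMATION_SCHEMA|SNOWFLAKE\.ACCOUNT_USAGE|pg_|system\.)/i;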
function writeJson(root: string, relPath: string, value: unknown): Promise<void> {
const target = join(root, relPath);
return mkdir(dirname(target), { recursive: true }).then(() =>
writeFile(target, `${JSON.stringify(value, null, 2)}\n`, 'utf-8'),
);
}
function compilePatterns(patterns: string[]): RegExp[] {
return patterns.map((pattern) => new RegExp(pattern));
}
function matchesAny(value: string | null, patterns: RegExp[]): boolean {
return !!value && patterns.some((pattern) => pattern.test(value));
}
function shouldDropBySql(sql: string, config: HistoricSqlUnifiedPullConfig): boolean {
if (NOISE_PREFIX_RE.test(sql) || SYSTEM_TABLE_RE.test(sql)) return true;
if (config.filters.dropTrivialProbes !== false && TRIVIAL_SQL_RE.test(sql)) return true;
return false;
}
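// Filters by service-account usage. In 'exclude' mode a template is dropped when every execution
// listed in topUsers belongs to a matching account; any other mode except 'mark-only' keeps only such templates.
function shouldDropByUsers(template: AggregatedTemplate, config: HistoricSqlUnifiedPullConfig): boolean {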
const service = config.filters.serviceAccounts;
if (!service || service.mode === 'mark-only' || service.patterns.length === 0) return false;
const patterns = compilePatterns(service.patterns);
const matchingExecutions = template.topUsers
.filter((entry) => matchesAny(entry.user, patterns))
.reduce((sum, entry) => sum + entry.executions, 0);
const allExecutions = template.topUsers.reduce((sum, entry) => sum + entry.executions, 0);
const serviceOnly = allExecutions > 0 && matchingExecutions >= allExecutions;
return service.mode === 'exclude' ? serviceOnly : !serviceOnly;
}
function shouldDropByFailure(template: AggregatedTemplate, config: HistoricSqlUnifiedPullConfig): boolean {
const failed = config.filters.dropFailedBelow;
return !!failed && template.stats.errorRate > failed.errorRate && template.stats.executions < failed.executions;
}
function shouldDropTemplate(template: AggregatedTemplate, config: HistoricSqlUnifiedPullConfig): boolean {
if (shouldDropBySql(template.canonicalSql, config)) return true;
if (shouldDropByUsers(template, config)) return true;
if (shouldDropByFailure(template, config)) return true;
return false;
}
function normalizeTableIdentifier(value: string): string {
return value.trim().toLowerCase();
}
function unqualifiedTableIdentifier(value: string): string {
const parts = normalizeTableIdentifier(value).split('.').filter(Boolean);
return parts.at(-1) ?? '';
}
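// Builds the enabled-table lookup: exact normalized names, plus bare table names that occur
// exactly once in the list. An empty list returns null, which disables filtering.
function buildEnabledTableFilter(enabledTables: string[]): EnabledTableFilter | null {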
if (enabledTables.length === 0) {
return null;
}
const exact = new Set(enabledTables.map(normalizeTableIdentifier).filter((value) => value.length > 0));
const unqualifiedCounts = new Map<string, number>();
for (const table of exact) {
const unqualified = unqualifiedTableIdentifier(table);
if (unqualified.length > 0) {
unqualifiedCounts.set(unqualified, (unqualifiedCounts.get(unqualified) ?? 0) + 1);
}
}
return {
exact,
uniqueUnqualified: new Set(
[...unqualifiedCounts.entries()]
.filter(([, count]) => count === 1)
.map(([table]) => table),
),
};
}
function isEnabledTable(table: string, filter: EnabledTableFilter | null): boolean {
if (!filter) {
return true;
}
const normalized = normalizeTableIdentifier(table);
return filter.exact.has(normalized) || filter.uniqueUnqualified.has(unqualifiedTableIdentifier(normalized));
}
function historicSqlWindowDays(config: HistoricSqlUnifiedPullConfig): number {
return 'windowDays' in config ? config.windowDays : 90;
}
function redactTemplateSql(
template: AggregatedTemplate,
redactors: readonly HistoricSqlRedactionPattern[],
): AggregatedTemplate {
if (redactors.length === 0) {
return template;
}
return {
...template,
canonicalSql: redactHistoricSqlText(template.canonicalSql, redactors),
};
}
function recordColumn(acc: TableAccumulator, clause: string, column: string, executions: number): void {
const byColumn = acc.columnsByClause.get(clause) ?? new Map<string, number>();
byColumn.set(column, (byColumn.get(column) ?? 0) + executions);
acc.columnsByClause.set(clause, byColumn);
}
function recordJoin(acc: TableAccumulator, otherTable: string, columns: string[], executions: number): void {
const byColumns = acc.observedJoins.get(otherTable) ?? new Map<string, number>();
const key = [...new Set(columns)].sort().join(',');
if (key.length > 0) {
byColumns.set(key, (byColumns.get(key) ?? 0) + executions);
acc.observedJoins.set(otherTable, byColumns);
}
}
function accumulatorFor(table: string): TableAccumulator {
return {
table,
executions: 0,
distinctUsers: 0,
errorRateNumerator: 0,
p95RuntimeMs: null,
lastSeen: '1970-01-01T00:00:00.000Z',
columnsByClause: new Map(),
observedJoins: new Map(),
topTemplates: [],
};
}
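// Folds one template into its table's running totals. distinctUsers keeps the per-template maximum,
// which is only a lower bound on the true count, because user sets cannot be unioned from aggregated rows.
function addTemplate(acc: TableAccumulator, parsed: ParsedTemplate): void {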
const executions = parsed.template.stats.executions;
acc.executions += executions;
acc.distinctUsers = Math.max(acc.distinctUsers, parsed.template.stats.distinctUsers);
acc.errorRateNumerator += parsed.template.stats.errorRate * executions;
acc.p95RuntimeMs =
acc.p95RuntimeMs === null
? parsed.template.stats.p95RuntimeMs
: parsed.template.stats.p95RuntimeMs === null
? acc.p95RuntimeMs
: Math.max(acc.p95RuntimeMs, parsed.template.stats.p95RuntimeMs);
acc.lastSeen = parsed.template.stats.lastSeen > acc.lastSeen ? parsed.template.stats.lastSeen : acc.lastSeen;
for (const [clause, columns] of Object.entries(parsed.columnsByClause)) {
for (const column of columns) {
recordColumn(acc, clause, column, executions);
}
}
const joinColumns = parsed.columnsByClause.join ?? [];
for (const otherTable of parsed.tablesTouched.filter((table) => table !== acc.table)) {
recordJoin(acc, otherTable, joinColumns, executions);
}
acc.topTemplates.push(parsed.template);
}
function toStagedTable(acc: TableAccumulator, now: Date): StagedTableInput {
const errorRate = acc.executions > 0 ? acc.errorRateNumerator / acc.executions : 0;
const columnsByClause: Record<string, Array<[string, string]>> = Object.fromEntries(
[...acc.columnsByClause.entries()]
.sort(([left], [right]) => left.localeCompare(right))
.map(([clause, counts]) => [
clause,
[...counts.entries()]
.sort((left, right) => right[1] - left[1] || left[0].localeCompare(right[0]))
.map(([column, count]) => [column, bucketFrequency(count, acc.executions)] as [string, string]),
]),
);
const observedJoins = [...acc.observedJoins.entries()]
.flatMap(([withTable, byColumns]) =>
[...byColumns.entries()].map(([columns, count]) => ({
withTable,
on: columns.split(',').filter(Boolean),
freq: bucketFrequency(count, acc.executions),
})),
)
.sort((left, right) => left.withTable.localeCompare(right.withTable) || left.on.join(',').localeCompare(right.on.join(',')));
const topTemplates = [...acc.topTemplates]
.sort((left, right) => right.stats.executions - left.stats.executions || left.templateId.localeCompare(right.templateId))
.slice(0, 5)
.map((template) => ({
id: template.templateId,
canonicalSql: template.canonicalSql,
topUsers: template.topUsers.slice(0, 5).map((entry) => ({ user: entry.user })),
}));
return {
table: acc.table,
stats: {
executionsBucket: bucketExecutions(acc.executions),
distinctUsersBucket: bucketDistinctUsers(acc.distinctUsers),
errorRateBucket: bucketErrorRate(errorRate),
p95RuntimeBucket: bucketP95Runtime(acc.p95RuntimeMs),
recencyBucket: bucketRecency(acc.lastSeen, now),
},
columnsByClause,
observedJoins,
topTemplates,
};
}
function toPatternsInput(parsedTemplates: ParsedTemplate[]): StagedPatternsInput {
return {
templates: parsedTemplates
.map(({ template, tablesTouched }) => ({
id: template.templateId,
canonicalSql: template.canonicalSql,
tablesTouched: [...tablesTouched].sort(),
executionsBucket: bucketExecutions(template.stats.executions),
distinctUsersBucket: bucketDistinctUsers(template.stats.distinctUsers),
dialect: template.dialect,
}))
.sort((left, right) => left.id.localeCompare(right.id)),
};
}
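// Stages a historic-SQL snapshot under stagedDir: probes the reader, filters the aggregated templates,
// analyzes the survivors in a single batch (on the original, unredacted SQL), then writes per-table files,
// the patterns input with its shards, and manifest.json.
export async function stageHistoricSqlAggregatedSnapshot(input: StageHistoricSqlAggregatedSnapshotInput): Promise<void> {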
const config = historicSqlUnifiedPullConfigSchema.parse(input.pullConfig);
const enabledTableFilter = buildEnabledTableFilter(config.enabledTables);
const redactors = compileHistoricSqlRedactionPatterns(config.redactionPatterns);
const now = input.now ?? new Date();
const windowStart = new Date(now.getTime() - historicSqlWindowDays(config) * 24 * 60 * 60 * 1000);
const probe = await input.reader.probe(input.queryClient);
const snapshot: AggregatedTemplate[] = [];
let snapshotRowCount = 0;
for await (const row of input.reader.fetchAggregated(input.queryClient, { start: windowStart, end: now }, config)) {
snapshotRowCount += 1;
const parsed = aggregatedTemplateSchema.parse(row);
if (!shouldDropTemplate(parsed, config)) {
snapshot.push(parsed);
}
}
const analysis = await input.sqlAnalysis.analyzeBatch(
snapshot.map((template) => ({ id: template.templateId, sql: template.canonicalSql })),
config.dialect,
);
const warnings: string[] = [];
const parsedTemplates: ParsedTemplate[] = [];
for (const template of snapshot) {
const parsed = analysis.get(template.templateId);
if (!parsed || parsed.error) {
warnings.push(`parse_failed:${template.templateId}`);
continue;
}
const tablesTouched = [...new Set(parsed.tablesTouched)].filter((table) => table.length > 0).sort();
const includedTables = tablesTouched.filter((table) => isEnabledTable(table, enabledTableFilter));
if (includedTables.length === 0) {
continue;
}
parsedTemplates.push({
template: redactTemplateSql(template, redactors),
tablesTouched,
includedTables,
columnsByClause: Object.fromEntries(
Object.entries(parsed.columnsByClause).map(([clause, columns]) => [clause, [...new Set(columns)].sort()]),
),
});
}
const byTable = new Map<string, TableAccumulator>();
for (const parsed of parsedTemplates) {
for (const table of parsed.includedTables) {
const acc = byTable.get(table) ?? accumulatorFor(table);
addTemplate(acc, parsed);
byTable.set(table, acc);
}
}
await mkdir(input.stagedDir, { recursive: true });
for (const [table, acc] of [...byTable.entries()].sort(([left], [right]) => left.localeCompare(right))) {
await writeJson(input.stagedDir, `tables/${table}.json`, toStagedTable(acc, now));
}
const patternsInput = toPatternsInput(parsedTemplates);
const patternInputSplit = splitHistoricSqlPatternInputs(patternsInput);
const allWarnings = [...warnings, ...patternInputSplit.warnings];
await writeJson(input.stagedDir, 'patterns-input.json', patternInputSplit.auditInput);
for (const shard of patternInputSplit.shards) {
await writeJson(input.stagedDir, shard.path, shard.input);
}
await writeJson(input.stagedDir, 'manifest.json', {
source: HISTORIC_SQL_SOURCE_KEY,
connectionId: input.connectionId,
dialect: config.dialect,
fetchedAt: now.toISOString(),
windowStart: windowStart.toISOString(),
windowEnd: now.toISOString(),
snapshotRowCount,
touchedTableCount: byTable.size,
parseFailures: allWarnings.filter((warning) => warning.startsWith('parse_failed:')).length,
warnings: allWarnings,
probeWarnings: probe.warnings,
staleArchiveAfterDays: config.staleArchiveAfterDays,
});
}
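
The enabled-table filter in this file accepts a table either by its exact normalized name or by a bare table name that appears only once in the configured list. The sketch below re-implements that rule in isolation so the matching behavior is easier to see. It is an approximation under stated assumptions: the helper names `normalize`, `bareName`, `buildFilter`, and `isEnabled` are hypothetical, the `a.orders` / `b.orders` names are invented, and the null "no filter" case from the source is omitted.

```typescript
// Sketch only: a simplified re-implementation of the enabled-table matching rule.
// Helper names and the ambiguous-name example are illustrative assumptions.
interface EnabledTableFilter {
  exact: Set<string>;
  uniqueUnqualified: Set<string>;
}

function normalize(value: string): string {
  return value.trim().toLowerCase();
}

// Last dot-separated segment, e.g. "schema.table" -> "table".
function bareName(value: string): string {
  const parts = normalize(value).split('.').filter((part) => part.length > 0);
  return parts[parts.length - 1] ?? '';
}

function buildFilter(enabledTables: string[]): EnabledTableFilter {
  const exact = new Set<string>(enabledTables.map(normalize).filter((v) => v.length > 0));
  // Count how many enabled entries share each bare name.
  const counts = new Map<string, number>();
  exact.forEach((table) => {
    const bare = bareName(table);
    if (bare.length > 0) counts.set(bare, (counts.get(bare) ?? 0) + 1);
  });
  // Only bare names that are unambiguous within the list may match on their own.
  const uniqueUnqualified = new Set<string>();
  counts.forEach((count, bare) => {
    if (count === 1) uniqueUnqualified.add(bare);
  });
  return { exact, uniqueUnqualified };
}

function isEnabled(table: string, filter: EnabledTableFilter): boolean {
  const normalized = normalize(table);
  return filter.exact.has(normalized) || filter.uniqueUnqualified.has(bareName(normalized));
}

const filter = buildFilter([
  'orbit_analytics.int_active_contract_arr',
  'orbit_analytics.int_customer_health_signals',
]);
const results = {
  qualified: isEnabled('ORBIT_ANALYTICS.INT_ACTIVE_CONTRACT_ARR', filter),
  bare: isEnabled('int_customer_health_signals', filter),
  other: isEnabled('orbit_raw.accounts', filter),
  otherSchemaSameName: isEnabled('orbit_raw.int_customer_health_signals', filter),
};

// Two enabled tables share the bare name "orders", so the bare name alone no longer matches.
const ambiguous = buildFilter(['a.orders', 'b.orders']);
const ambiguousBare = isEnabled('orders', ambiguous);
const ambiguousExact = isEnabled('a.orders', ambiguous);
```

One consequence worth noting: because the fallback compares only the last segment, a table qualified with a different schema but the same bare name (`otherSchemaSameName` above) also passes the filter, as long as that bare name is unique in the enabled list. Whether that is intended is not stated in the source.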

@@ -0,0 +1,110 @@
import { describe, expect, it } from 'vitest';
import {
aggregatedTemplateSchema,
historicSqlUnifiedPullConfigSchema,
stagedManifestSchema,
stagedPatternsInputSchema,
stagedTableInputSchema,
} from './types.js';
describe('historic-sql unified contracts', () => {
it('parses minExecutions and service-account filters', () => {
expect(historicSqlUnifiedPullConfigSchema.parse({ dialect: 'postgres', minExecutions: 9 })).toMatchObject({
dialect: 'postgres',
minExecutions: 9,
redactionPatterns: [],
staleArchiveAfterDays: 90,
});
expect(historicSqlUnifiedPullConfigSchema.parse({ dialect: 'postgres', minExecutions: 9 })).not.toHaveProperty(
'windowDays',
);
expect(historicSqlUnifiedPullConfigSchema.parse({ dialect: 'postgres', minExecutions: 9 })).not.toHaveProperty(
'concurrency',
);
const parsed = historicSqlUnifiedPullConfigSchema.parse({
dialect: 'postgres',
minExecutions: 7,
filters: {
serviceAccounts: { patterns: ['^svc_'], mode: 'exclude' },
},
});
expect(parsed.minExecutions).toBe(7);
expect(parsed.filters.serviceAccounts).toEqual({ patterns: ['^svc_'], mode: 'exclude' });
});
it('validates aggregate templates from warehouse readers', () => {
const parsed = aggregatedTemplateSchema.parse({
templateId: 'pg:123',
canonicalSql: 'select status, count(*) from public.orders group by status',
dialect: 'postgres',
stats: {
executions: 42,
distinctUsers: 3,
firstSeen: '2026-05-01T00:00:00.000Z',
lastSeen: '2026-05-11T00:00:00.000Z',
p50RuntimeMs: 12.5,
p95RuntimeMs: 40,
errorRate: 0,
rowsProduced: 100,
},
topUsers: [{ user: 'analyst', executions: 40 }],
});
expect(parsed.templateId).toBe('pg:123');
expect(parsed.topUsers).toEqual([{ user: 'analyst', executions: 40 }]);
});
it('validates staged table, patterns, and manifest artifacts', () => {
expect(
stagedTableInputSchema.parse({
table: 'public.orders',
stats: {
executionsBucket: '10-100',
distinctUsersBucket: '2-5',
errorRateBucket: 'none',
p95RuntimeBucket: '<100ms',
recencyBucket: 'current',
},
columnsByClause: {
select: [['status', 'high']],
where: [['created_at', 'mid']],
},
observedJoins: [{ withTable: 'public.customers', on: ['customer_id'], freq: 'high' }],
topTemplates: [{ id: 'pg:123', canonicalSql: 'select * from public.orders', topUsers: [{ user: 'analyst' }] }],
}).table,
).toBe('public.orders');
expect(
stagedPatternsInputSchema.parse({
templates: [
{
id: 'pg:123',
canonicalSql: 'select * from public.orders',
tablesTouched: ['public.orders'],
executionsBucket: '10-100',
distinctUsersBucket: '2-5',
dialect: 'postgres',
},
],
}).templates,
).toHaveLength(1);
expect(
stagedManifestSchema.parse({
source: 'historic-sql',
connectionId: 'warehouse',
dialect: 'postgres',
fetchedAt: '2026-05-11T00:00:00.000Z',
windowStart: '2026-02-10T00:00:00.000Z',
windowEnd: '2026-05-11T00:00:00.000Z',
snapshotRowCount: 2,
touchedTableCount: 1,
parseFailures: 1,
warnings: ['parse_failed:bad'],
probeWarnings: [],
staleArchiveAfterDays: 90,
}).staleArchiveAfterDays,
).toBe(90);
});
});

@ -0,0 +1,153 @@
import { z } from 'zod';
import type { SqlAnalysisPort } from '../../../../context/sql-analysis/ports.js';
export const HISTORIC_SQL_SOURCE_KEY = 'historic-sql' as const;
const historicSqlDialectSchema = z.enum(['snowflake', 'bigquery', 'postgres']);
export type HistoricSqlDialect = z.infer<typeof historicSqlDialectSchema>;
const filterModeSchema = z.enum(['exclude', 'include', 'mark-only']);
const historicSqlCommonPullConfigSchema = z.object({
minExecutions: z.number().int().nonnegative().default(5),
enabledTables: z.array(z.string().min(1)).default([]),
filters: z.object({
serviceAccounts: z.object({
patterns: z.array(z.string()).default([]),
mode: filterModeSchema.default('exclude'),
}).optional(),
orchestrators: z.object({
mode: filterModeSchema.default('mark-only'),
}).optional(),
dropTrivialProbes: z.boolean().default(true),
dropFailedBelow: z.object({
errorRate: z.number().min(0).max(1),
executions: z.number().int().nonnegative(),
}).optional(),
}).default({ dropTrivialProbes: true }),
redactionPatterns: z.array(z.string()).default([]),
staleArchiveAfterDays: z.number().int().positive().default(90),
});
const historicSqlWindowedPullConfigSchema = historicSqlCommonPullConfigSchema.extend({
dialect: z.enum(['snowflake', 'bigquery']),
windowDays: z.number().int().positive().default(90),
});
const historicSqlPostgresPullConfigSchema = historicSqlCommonPullConfigSchema.extend({
dialect: z.literal('postgres'),
});
export const historicSqlUnifiedPullConfigSchema = z.discriminatedUnion('dialect', [
historicSqlWindowedPullConfigSchema,
historicSqlPostgresPullConfigSchema,
]);
export type HistoricSqlUnifiedPullConfig = z.infer<typeof historicSqlUnifiedPullConfigSchema>;
export const aggregatedTemplateSchema = z.object({
templateId: z.string().min(1),
canonicalSql: z.string().min(1),
dialect: historicSqlDialectSchema,
stats: z.object({
executions: z.number().int().nonnegative(),
distinctUsers: z.number().int().nonnegative(),
firstSeen: z.iso.datetime(),
lastSeen: z.iso.datetime(),
p50RuntimeMs: z.number().nonnegative().nullable(),
p95RuntimeMs: z.number().nonnegative().nullable(),
errorRate: z.number().min(0).max(1),
rowsProduced: z.number().int().nonnegative().nullable(),
}),
topUsers: z.array(z.object({
user: z.string().nullable(),
executions: z.number().int().nonnegative(),
})).default([]),
});
export type AggregatedTemplate = z.infer<typeof aggregatedTemplateSchema>;
export const stagedTableInputSchema = z.object({
table: z.string().min(1),
stats: z.object({
executionsBucket: z.string(),
distinctUsersBucket: z.string(),
errorRateBucket: z.string(),
p95RuntimeBucket: z.string(),
recencyBucket: z.string(),
}),
columnsByClause: z.record(z.string(), z.array(z.tuple([z.string(), z.string()]))),
observedJoins: z.array(z.object({
withTable: z.string(),
on: z.array(z.string()),
freq: z.string(),
})),
topTemplates: z.array(z.object({
id: z.string(),
canonicalSql: z.string(),
topUsers: z.array(z.object({ user: z.string().nullable() })),
})),
});
export type StagedTableInput = z.infer<typeof stagedTableInputSchema>;
export const stagedPatternsInputSchema = z.object({
templates: z.array(z.object({
id: z.string(),
canonicalSql: z.string(),
tablesTouched: z.array(z.string()),
executionsBucket: z.string(),
distinctUsersBucket: z.string(),
dialect: historicSqlDialectSchema,
})),
});
export type StagedPatternsInput = z.infer<typeof stagedPatternsInputSchema>;
export const stagedManifestSchema = z.object({
source: z.literal(HISTORIC_SQL_SOURCE_KEY),
connectionId: z.string().min(1),
dialect: historicSqlDialectSchema,
fetchedAt: z.iso.datetime(),
windowStart: z.iso.datetime(),
windowEnd: z.iso.datetime(),
snapshotRowCount: z.number().int().nonnegative(),
touchedTableCount: z.number().int().nonnegative(),
parseFailures: z.number().int().nonnegative(),
warnings: z.array(z.string()),
probeWarnings: z.array(z.string()),
staleArchiveAfterDays: z.number().int().positive().default(90),
});
interface HistoricSqlProbeResult {
warnings: string[];
info?: string[];
}
export interface HistoricSqlReader {
probe(client: unknown): Promise<HistoricSqlProbeResult>;
fetchAggregated(
client: unknown,
window: HistoricSqlTimeWindow,
config: HistoricSqlUnifiedPullConfig,
): AsyncIterable<AggregatedTemplate>;
}
export interface HistoricSqlTimeWindow {
start: Date;
end: Date;
}
export interface KtxPostgresQueryClient {
executeQuery(sql: string, params?: unknown[]): Promise<{ headers: string[]; rows: unknown[][]; totalRows?: number }>;
}
export interface PostgresPgssProbeResult extends HistoricSqlProbeResult {
pgServerVersion: string;
warnings: string[];
info: string[];
}
export interface HistoricSqlSourceAdapterDeps {
sqlAnalysis: SqlAnalysisPort;
reader: HistoricSqlReader;
queryClient: unknown;
now?: () => Date;
}

@ -0,0 +1,107 @@
import { mkdtemp } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it } from 'vitest';
import type { KtxSchemaSnapshot } from '../../../scan/types.js';
import { chunkLiveDatabaseStagedDir } from './chunk.js';
import { liveDatabaseTablePath, writeLiveDatabaseSnapshot } from './stage.js';
function snapshot(): KtxSchemaSnapshot {
return {
connectionId: 'conn-1',
driver: 'postgres',
extractedAt: '2026-04-27T00:00:00.000Z',
scope: { schemas: ['public'] },
metadata: {},
tables: [
{
name: 'orders',
catalog: null,
db: 'public',
kind: 'table',
comment: null,
estimatedRows: null,
columns: [
{
name: 'id',
nativeType: 'integer',
normalizedType: 'integer',
dimensionType: 'number',
nullable: false,
primaryKey: true,
comment: null,
},
],
foreignKeys: [],
},
{
name: 'customers',
catalog: null,
db: 'public',
kind: 'table',
comment: null,
estimatedRows: null,
columns: [
{
name: 'id',
nativeType: 'integer',
normalizedType: 'integer',
dimensionType: 'number',
nullable: false,
primaryKey: true,
comment: null,
},
],
foreignKeys: [],
},
],
};
}
describe('chunkLiveDatabaseStagedDir', () => {
it('emits one work unit per table on the first run', async () => {
const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-chunk-'));
await writeLiveDatabaseSnapshot(dir, snapshot());
const result = await chunkLiveDatabaseStagedDir(dir);
expect(result.workUnits.map((wu) => wu.unitKey)).toEqual([
'live-database-public-customers',
'live-database-public-orders',
]);
expect(result.workUnits[0]?.dependencyPaths).toEqual(['connection.json', 'foreign-keys.json']);
expect(result.workUnits[0]?.peerFileIndex).toContain(
liveDatabaseTablePath({ catalog: null, db: 'public', name: 'orders' }),
);
});
it('keeps only changed tables during incremental syncs and records table evictions', async () => {
const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-diff-'));
await writeLiveDatabaseSnapshot(dir, snapshot());
const ordersPath = liveDatabaseTablePath({ catalog: null, db: 'public', name: 'orders' });
const customersPath = liveDatabaseTablePath({ catalog: null, db: 'public', name: 'customers' });
const result = await chunkLiveDatabaseStagedDir(dir, {
added: [],
modified: [ordersPath],
deleted: [customersPath],
unchanged: ['connection.json', 'foreign-keys.json'],
});
expect(result.workUnits.map((wu) => wu.unitKey)).toEqual(['live-database-public-orders']);
expect(result.eviction?.deletedRawPaths).toEqual([customersPath]);
});
it('fans out all table work units when the foreign-key index changes', async () => {
const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-fk-'));
await writeLiveDatabaseSnapshot(dir, snapshot());
const result = await chunkLiveDatabaseStagedDir(dir, {
added: [],
modified: ['foreign-keys.json'],
deleted: [],
unchanged: [],
});
expect(result.workUnits).toHaveLength(2);
});
});

@ -0,0 +1,58 @@
import type { ChunkResult, DiffSet, WorkUnit } from '../../types.js';
import type { KtxSchemaTable } from '../../../scan/types.js';
import { LIVE_DATABASE_FOREIGN_KEYS_FILE, LIVE_DATABASE_META_FILE, readLiveDatabaseTableFiles } from './stage.js';
function unitKey(table: KtxSchemaTable): string {
const parts = [table.catalog, table.db, table.name]
.filter((part): part is string => typeof part === 'string' && part.length > 0)
.map((part) =>
part
.toLowerCase()
.replace(/[^a-z0-9]+/g, '-')
.replace(/^-+|-+$/g, ''),
)
.filter(Boolean);
return `live-database-${parts.join('-') || 'table'}`;
}
function displayName(table: KtxSchemaTable): string {
return [table.catalog, table.db, table.name].filter(Boolean).join('.');
}
function isTablePath(path: string): boolean {
return path.startsWith('tables/') && path.endsWith('.json');
}
export async function chunkLiveDatabaseStagedDir(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
const tableFiles = await readLiveDatabaseTableFiles(stagedDir);
const allTablePaths = tableFiles.map((file) => file.path);
const globalDeps = [LIVE_DATABASE_META_FILE, LIVE_DATABASE_FOREIGN_KEYS_FILE];
const touched = diffSet ? new Set([...diffSet.added, ...diffSet.modified]) : null;
const globalTouched = Boolean(
touched && (touched.has(LIVE_DATABASE_META_FILE) || touched.has(LIVE_DATABASE_FOREIGN_KEYS_FILE)),
);
const workUnits: WorkUnit[] = [];
for (const file of tableFiles) {
if (touched && !globalTouched && !touched.has(file.path)) {
continue;
}
const peers = allTablePaths.filter((path) => path !== file.path).sort();
workUnits.push({
unitKey: unitKey(file.table),
displayLabel: `Live database table ${displayName(file.table)}`,
rawFiles: [file.path],
peerFileIndex: peers,
dependencyPaths: globalDeps,
notes: `Database catalog snapshot for ${displayName(file.table)} with ${file.table.columns.length} column${
file.table.columns.length === 1 ? '' : 's'
}.`,
});
}
const deletedRawPaths = diffSet ? diffSet.deleted.filter(isTablePath).sort() : [];
return {
workUnits,
...(deletedRawPaths.length > 0 ? { eviction: { deletedRawPaths } } : {}),
};
}
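
The selection rule in `chunkLiveDatabaseStagedDir` can be illustrated in isolation. This is a minimal sketch, not the module itself: `selectTablePaths`, `DiffSetSketch`, and `GLOBAL_FILES` are hypothetical names, the two global file names are assumed from the test expectations rather than read from `stage.js`, and the table paths are made up for illustration.

```typescript
// Hypothetical, self-contained sketch of the incremental selection rule.
interface DiffSetSketch {
  added: string[];
  modified: string[];
  deleted: string[];
  unchanged: string[];
}

// Assumed values for LIVE_DATABASE_META_FILE and LIVE_DATABASE_FOREIGN_KEYS_FILE.
const GLOBAL_FILES = ['connection.json', 'foreign-keys.json'];

function selectTablePaths(allTablePaths: string[], diffSet?: DiffSetSketch): string[] {
  // No diff set means a first run: every table becomes a work unit.
  if (!diffSet) return [...allTablePaths];
  const touched = new Set([...diffSet.added, ...diffSet.modified]);
  // A change to a shared dependency fans out to every table.
  if (GLOBAL_FILES.some((file) => touched.has(file))) return [...allTablePaths];
  // Otherwise keep only tables whose own file was added or modified.
  return allTablePaths.filter((path) => touched.has(path));
}

// Illustrative paths only; the real layout comes from liveDatabaseTablePath.
const tablePaths = ['tables/public.customers.json', 'tables/public.orders.json'];

const firstRun = selectTablePaths(tablePaths);
const incremental = selectTablePaths(tablePaths, {
  added: [],
  modified: ['tables/public.orders.json'],
  deleted: ['tables/public.customers.json'],
  unchanged: GLOBAL_FILES,
});
const fanOut = selectTablePaths(tablePaths, {
  added: [],
  modified: ['foreign-keys.json'],
  deleted: [],
  unchanged: [],
});
```

The three calls mirror the three test cases for this file: a first run, an incremental sync that touches one table, and a foreign-key index change that reprocesses every table. Deleted paths never produce work units here; in the diff they are reported separately through `eviction.deletedRawPaths`.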

@ -0,0 +1,255 @@
import { once } from 'node:events';
import { createServer } from 'node:http';
import { describe, expect, it, vi } from 'vitest';
import { createDaemonLiveDatabaseIntrospection } from './daemon-introspection.js';
const daemonResponse = {
connection_id: 'warehouse',
extracted_at: '2026-04-28T10:00:00+00:00',
metadata: { driver: 'postgres', schemas: ['public'] },
tables: [
{
catalog: 'warehouse',
db: 'public',
name: 'customers',
comment: null,
columns: [{ name: 'id', type: 'integer', nullable: false, primary_key: true, comment: null }],
foreign_keys: [],
},
{
catalog: 'warehouse',
db: 'public',
name: 'orders',
comment: 'Order facts',
columns: [
{ name: 'id', type: 'integer', nullable: false, primary_key: true, comment: 'Order id' },
{ name: 'customer_id', type: 'integer', nullable: false, primary_key: false, comment: null },
],
foreign_keys: [
{
from_column: 'customer_id',
to_table: 'customers',
to_column: 'id',
constraint_name: 'orders_customer_id_fkey',
},
],
},
],
};
describe('createDaemonLiveDatabaseIntrospection', () => {
it('calls the database-introspect daemon command and maps the snapshot response', async () => {
const runJson = vi.fn(async () => daemonResponse);
const introspection = createDaemonLiveDatabaseIntrospection({
connections: {
warehouse: {
driver: 'postgres',
url: 'postgres://localhost:5432/warehouse',
},
},
schemas: ['public'],
runJson,
});
await expect(introspection.extractSchema('warehouse')).resolves.toEqual({
connectionId: 'warehouse',
driver: 'postgres',
extractedAt: '2026-04-28T10:00:00+00:00',
scope: { schemas: ['public'] },
metadata: { driver: 'postgres', schemas: ['public'] },
tables: [
{
catalog: 'warehouse',
db: 'public',
name: 'customers',
kind: 'table',
comment: null,
estimatedRows: null,
columns: [
{
name: 'id',
nativeType: 'integer',
normalizedType: 'integer',
dimensionType: 'number',
nullable: false,
primaryKey: true,
comment: null,
},
],
foreignKeys: [],
},
{
catalog: 'warehouse',
db: 'public',
name: 'orders',
kind: 'table',
comment: 'Order facts',
estimatedRows: null,
columns: [
{
name: 'id',
nativeType: 'integer',
normalizedType: 'integer',
dimensionType: 'number',
nullable: false,
primaryKey: true,
comment: 'Order id',
},
{
name: 'customer_id',
nativeType: 'integer',
normalizedType: 'integer',
dimensionType: 'number',
nullable: false,
primaryKey: false,
comment: null,
},
],
foreignKeys: [
{
fromColumn: 'customer_id',
toCatalog: null,
toDb: null,
toTable: 'customers',
toColumn: 'id',
constraintName: 'orders_customer_id_fkey',
},
],
},
],
});
expect(runJson).toHaveBeenCalledWith('database-introspect', {
connection_id: 'warehouse',
driver: 'postgres',
url: 'postgres://localhost:5432/warehouse',
schemas: ['public'],
statement_timeout_ms: 30_000,
connection_timeout_seconds: 5,
});
});
it('calls a running daemon HTTP endpoint when baseUrl is configured', async () => {
const requests: Array<{ url: string | undefined; body: unknown }> = [];
const server = createServer((request, response) => {
const chunks: Buffer[] = [];
request.on('data', (chunk: Buffer) => chunks.push(chunk));
request.on('end', () => {
requests.push({
url: request.url,
body: JSON.parse(Buffer.concat(chunks).toString('utf8')),
});
response.writeHead(200, { 'content-type': 'application/json' });
response.end(JSON.stringify(daemonResponse));
});
});
server.listen(0, '127.0.0.1');
await once(server, 'listening');
try {
const address = server.address();
if (!address || typeof address === 'string') {
throw new Error('expected TCP server address');
}
const introspection = createDaemonLiveDatabaseIntrospection({
connections: {
warehouse: {
driver: 'postgresql',
url: 'postgres://localhost:5432/warehouse',
},
},
baseUrl: `http://127.0.0.1:${address.port}`,
});
await expect(introspection.extractSchema('warehouse')).resolves.toMatchObject({
connectionId: 'warehouse',
tables: [{ name: 'customers' }, { name: 'orders' }],
});
expect(requests).toEqual([
{
url: '/database/introspect',
body: {
connection_id: 'warehouse',
driver: 'postgres',
url: 'postgres://localhost:5432/warehouse',
schemas: ['public'],
statement_timeout_ms: 30_000,
connection_timeout_seconds: 5,
},
},
]);
} finally {
server.close();
}
});
it('requires a configured postgres connection with a url', async () => {
const introspection = createDaemonLiveDatabaseIntrospection({
connections: {
warehouse: {
driver: 'postgres',
},
},
runJson: vi.fn(async () => daemonResponse),
});
await expect(introspection.extractSchema('warehouse')).rejects.toThrow(
'Local live-database ingest requires connections.warehouse.url.',
);
});
it('rejects unsupported local connection drivers before calling the daemon', async () => {
const runJson = vi.fn(async () => daemonResponse);
const introspection = createDaemonLiveDatabaseIntrospection({
connections: {
warehouse: {
driver: 'snowflake',
url: 'snowflake://example',
},
},
runJson,
});
await expect(introspection.extractSchema('warehouse')).rejects.toThrow(
'Local live-database ingest cannot run driver "snowflake".',
);
expect(runJson).not.toHaveBeenCalled();
});
it('filters out tables not on the enabled_tables allowlist', async () => {
const runJson = vi.fn(async () => daemonResponse);
const introspection = createDaemonLiveDatabaseIntrospection({
connections: {
warehouse: {
driver: 'postgres',
url: 'postgres://localhost:5432/warehouse',
enabled_tables: ['public.orders'],
},
},
schemas: ['public'],
runJson,
});
const snapshot = await introspection.extractSchema('warehouse');
expect(snapshot.tables.map((table) => `${table.db}.${table.name}`)).toEqual(['public.orders']);
});
it('passes through every table when enabled_tables is omitted or empty', async () => {
const runJson = vi.fn(async () => daemonResponse);
const introspection = createDaemonLiveDatabaseIntrospection({
connections: {
warehouse: {
driver: 'postgres',
url: 'postgres://localhost:5432/warehouse',
enabled_tables: [],
},
},
schemas: ['public'],
runJson,
});
const snapshot = await introspection.extractSchema('warehouse');
expect(snapshot.tables.map((table) => table.name)).toEqual(['customers', 'orders']);
});
});

@ -0,0 +1,256 @@
import { spawn } from 'node:child_process';
import { request as httpRequest } from 'node:http';
import { request as httpsRequest } from 'node:https';
import { URL } from 'node:url';
import type { KtxProjectConnectionConfig } from '../../../project/config.js';
import { filterSnapshotTables, resolveEnabledTables } from '../../../scan/enabled-tables.js';
import type { KtxSchemaColumn, KtxSchemaForeignKey, KtxSchemaSnapshot, KtxSchemaTable } from '../../../scan/types.js';
import { inferKtxDimensionType, normalizeKtxNativeType } from '../../../scan/type-normalization.js';
import type { LiveDatabaseIntrospectionPort } from './types.js';
type KtxDaemonDatabaseIntrospectionCommand = 'database-introspect';
type KtxDaemonDatabaseJsonRunner = (
subcommand: KtxDaemonDatabaseIntrospectionCommand,
payload: Record<string, unknown>,
) => Promise<Record<string, unknown>>;
export type KtxDaemonDatabaseHttpJsonRunner = (
path: string,
payload: Record<string, unknown>,
) => Promise<Record<string, unknown>>;
export interface DaemonLiveDatabaseIntrospectionOptions {
connections: Record<string, KtxProjectConnectionConfig>;
schemas?: string[];
statementTimeoutMs?: number;
connectionTimeoutSeconds?: number;
command?: string;
args?: string[];
cwd?: string;
env?: NodeJS.ProcessEnv;
baseUrl?: string;
runJson?: KtxDaemonDatabaseJsonRunner;
requestJson?: KtxDaemonDatabaseHttpJsonRunner;
now?: () => Date;
}
const DEFAULT_SCHEMAS = ['public'];
function parseJsonObject(raw: string, subcommand: string): Record<string, unknown> {
const parsed = JSON.parse(raw) as unknown;
if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) {
throw new Error(`ktx-daemon ${subcommand} returned non-object JSON`);
}
return parsed as Record<string, unknown>;
}
function runProcessJson(
options: Required<Pick<DaemonLiveDatabaseIntrospectionOptions, 'command' | 'args'>> &
Pick<DaemonLiveDatabaseIntrospectionOptions, 'cwd' | 'env'>,
): KtxDaemonDatabaseJsonRunner {
return async (subcommand, payload) =>
new Promise((resolve, reject) => {
const child = spawn(options.command, [...options.args, subcommand], {
cwd: options.cwd,
env: { ...process.env, ...options.env },
stdio: ['pipe', 'pipe', 'pipe'],
});
const stdout: Buffer[] = [];
const stderr: Buffer[] = [];
child.stdout.on('data', (chunk: Buffer) => stdout.push(chunk));
child.stderr.on('data', (chunk: Buffer) => stderr.push(chunk));
child.on('error', reject);
child.on('close', (code) => {
const stdoutText = Buffer.concat(stdout).toString('utf8').trim();
const stderrText = Buffer.concat(stderr).toString('utf8').trim();
if (code !== 0) {
reject(new Error(`ktx-daemon ${subcommand} failed: ${stderrText || `exit code ${code}`}`));
return;
}
try {
resolve(parseJsonObject(stdoutText, subcommand));
} catch (error) {
reject(error);
}
});
child.stdin.end(`${JSON.stringify(payload)}\n`);
});
}
function normalizedBaseUrl(baseUrl: string): string {
return baseUrl.endsWith('/') ? baseUrl : `${baseUrl}/`;
}
function postJson(baseUrl: string): KtxDaemonDatabaseHttpJsonRunner {
return async (path, payload) =>
new Promise((resolve, reject) => {
const target = new URL(path.replace(/^\//, ''), normalizedBaseUrl(baseUrl));
const body = JSON.stringify(payload);
const client = target.protocol === 'https:' ? httpsRequest : httpRequest;
const request = client(
target,
{
method: 'POST',
headers: {
accept: 'application/json',
'content-type': 'application/json',
'content-length': Buffer.byteLength(body),
},
},
(response) => {
const chunks: Buffer[] = [];
response.on('data', (chunk: Buffer) => chunks.push(chunk));
response.on('end', () => {
const text = Buffer.concat(chunks).toString('utf8');
const statusCode = response.statusCode ?? 0;
if (statusCode < 200 || statusCode >= 300) {
reject(new Error(`ktx-daemon HTTP ${path} failed with ${statusCode}: ${text}`));
return;
}
try {
resolve(parseJsonObject(text, path));
} catch (error) {
reject(error);
}
});
},
);
request.on('error', reject);
request.end(body);
});
}
function recordValue(value: unknown): Record<string, unknown> {
return value && typeof value === 'object' && !Array.isArray(value) ? (value as Record<string, unknown>) : {};
}
function recordArray(value: unknown): Array<Record<string, unknown>> {
return Array.isArray(value)
? value.filter(
(item): item is Record<string, unknown> => item !== null && typeof item === 'object' && !Array.isArray(item),
)
: [];
}
function requiredString(value: unknown, field: string): string {
if (typeof value !== 'string' || value.length === 0) {
throw new Error(`ktx-daemon database introspection response is missing string field ${field}`);
}
return value;
}
function nullableString(value: unknown): string | null {
return typeof value === 'string' ? value : null;
}
function optionalString(value: unknown): string | undefined {
return typeof value === 'string' ? value : undefined;
}
function normalizeDriver(driver: unknown): string {
const normalized = String(driver ?? '').trim().toLowerCase();
return normalized === 'postgresql' ? 'postgres' : normalized;
}
function requirePostgresConnection(
connections: Record<string, KtxProjectConnectionConfig>,
connectionId: string,
): KtxProjectConnectionConfig & { url: string } {
const connection = connections[connectionId];
const driver = normalizeDriver(connection?.driver);
if (driver !== 'postgres') {
throw new Error(`Local live-database ingest cannot run driver "${connection?.driver ?? 'unknown'}".`);
}
if (typeof connection.url !== 'string' || connection.url.trim().length === 0) {
throw new Error(`Local live-database ingest requires connections.${connectionId}.url.`);
}
return connection as KtxProjectConnectionConfig & { url: string };
}
function mapColumn(raw: Record<string, unknown>): KtxSchemaColumn {
const nativeType = requiredString(raw.type, 'tables[].columns[].type');
return {
name: requiredString(raw.name, 'tables[].columns[].name'),
nativeType,
normalizedType: normalizeKtxNativeType(nativeType),
dimensionType: inferKtxDimensionType(nativeType),
nullable: raw.nullable !== false,
primaryKey: raw.primary_key === true,
comment: nullableString(raw.comment),
};
}
function mapForeignKey(raw: Record<string, unknown>): KtxSchemaForeignKey {
return {
fromColumn: requiredString(raw.from_column, 'tables[].foreign_keys[].from_column'),
toCatalog: null,
toDb: null,
toTable: requiredString(raw.to_table, 'tables[].foreign_keys[].to_table'),
toColumn: requiredString(raw.to_column, 'tables[].foreign_keys[].to_column'),
constraintName: nullableString(raw.constraint_name),
};
}
function mapTable(raw: Record<string, unknown>): KtxSchemaTable {
return {
catalog: nullableString(raw.catalog),
db: nullableString(raw.db),
name: requiredString(raw.name, 'tables[].name'),
kind: 'table',
comment: nullableString(raw.comment),
estimatedRows: null,
columns: recordArray(raw.columns).map(mapColumn),
foreignKeys: recordArray(raw.foreign_keys).map(mapForeignKey),
};
}
function mapDaemonSnapshot(
raw: Record<string, unknown>,
input: { connectionId: string; extractedAt: string; schemas: string[] },
): KtxSchemaSnapshot {
return {
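// requiredString throws on a missing or empty value, so the fallback to input.connectionId is never reached.
connectionId: requiredString(raw.connection_id, 'connection_id') || input.connectionId,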
driver: 'postgres',
extractedAt: optionalString(raw.extracted_at) ?? input.extractedAt,
scope: { schemas: input.schemas },
metadata: recordValue(raw.metadata),
tables: recordArray(raw.tables).map(mapTable),
};
}
export function createDaemonLiveDatabaseIntrospection(
options: DaemonLiveDatabaseIntrospectionOptions,
): LiveDatabaseIntrospectionPort {
const schemas = options.schemas ?? DEFAULT_SCHEMAS;
const command = options.command ?? 'python';
const args = options.args ?? ['-m', 'ktx_daemon'];
const runJson = options.runJson ?? runProcessJson({ command, args, cwd: options.cwd, env: options.env });
const requestJson = options.requestJson ?? (options.baseUrl ? postJson(options.baseUrl) : undefined);
const now = options.now ?? (() => new Date());
return {
async extractSchema(connectionId: string): Promise<KtxSchemaSnapshot> {
const connection = requirePostgresConnection(options.connections, connectionId);
const payload = {
connection_id: connectionId,
driver: normalizeDriver(connection.driver),
url: connection.url,
schemas,
statement_timeout_ms: options.statementTimeoutMs ?? 30_000,
connection_timeout_seconds: options.connectionTimeoutSeconds ?? 5,
};
const raw = requestJson
? await requestJson('/database/introspect', payload)
: await runJson('database-introspect', payload);
const snapshot = mapDaemonSnapshot(raw, {
connectionId,
extractedAt: now().toISOString(),
schemas,
});
const enabledTables = resolveEnabledTables(connection);
return enabledTables ? filterSnapshotTables(snapshot, enabledTables) : snapshot;
},
};
}
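
The URL handling in `postJson` strips the leading slash from the path and forces a trailing slash on the base, which keeps any path prefix on `baseUrl`. The sketch below shows that effect with the standard `URL` constructor. `withTrailingSlash` and `resolveDaemonUrl` are hypothetical names, and the hosts and port are placeholders.

```typescript
// Same logic as normalizedBaseUrl in the diff: guarantee a trailing slash.
function withTrailingSlash(baseUrl: string): string {
  return baseUrl.endsWith('/') ? baseUrl : `${baseUrl}/`;
}

// Strip the leading slash so the path resolves relative to the base path.
function resolveDaemonUrl(baseUrl: string, path: string): string {
  return new URL(path.replace(/^\//, ''), withTrailingSlash(baseUrl)).href;
}

// A base without a path prefix resolves as expected.
const rootBase = resolveDaemonUrl('http://127.0.0.1:8080', '/database/introspect');

// A base with a path prefix keeps that prefix.
const prefixed = resolveDaemonUrl('http://host.example/ktx', '/database/introspect');

// For comparison: resolving the absolute path directly discards the prefix.
const naive = new URL('/database/introspect', 'http://host.example/ktx').href;
```

With these inputs, `prefixed` keeps the `/ktx` segment while `naive` drops it, which matters if the daemon is served behind a reverse proxy under a sub-path.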

@ -0,0 +1,59 @@
import { mkdtemp } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it, vi } from 'vitest';
import { LiveDatabaseSourceAdapter } from './live-database.adapter.js';
describe('LiveDatabaseSourceAdapter', () => {
it('fetches a schema snapshot through the introspection port', async () => {
const extractSchema = vi.fn().mockResolvedValue({
connectionId: 'conn-1',
driver: 'postgres',
extractedAt: '2026-04-27T00:00:00.000Z',
scope: { schemas: ['public'] },
metadata: {},
tables: [
{
name: 'orders',
catalog: null,
db: 'public',
kind: 'table',
comment: null,
estimatedRows: null,
columns: [
{
name: 'id',
nativeType: 'integer',
normalizedType: 'integer',
dimensionType: 'number',
nullable: false,
primaryKey: true,
comment: null,
},
],
foreignKeys: [],
},
],
});
const adapter = new LiveDatabaseSourceAdapter({
introspection: { extractSchema },
now: () => new Date('2026-04-27T00:00:00.000Z'),
});
const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-adapter-'));
await adapter.fetch(undefined, dir, { connectionId: 'conn-1', sourceKey: 'live-database' });
expect(extractSchema).toHaveBeenCalledWith('conn-1');
await expect(adapter.detect(dir)).resolves.toBe(true);
const chunked = await adapter.chunk(dir);
expect(chunked.workUnits.map((wu) => wu.unitKey)).toEqual(['live-database-public-orders']);
});
it('declares the live database source and skill', () => {
const adapter = new LiveDatabaseSourceAdapter({
introspection: { extractSchema: vi.fn() },
});
expect(adapter.source).toBe('live-database');
expect(adapter.skillNames).toEqual(['live_database_ingest']);
});
});

@ -0,0 +1,28 @@
import type { ChunkResult, DiffSet, FetchContext, SourceAdapter } from '../../types.js';
import { chunkLiveDatabaseStagedDir } from './chunk.js';
import { detectLiveDatabaseStagedDir, writeLiveDatabaseSnapshot } from './stage.js';
import type { LiveDatabaseSourceAdapterDeps } from './types.js';
export class LiveDatabaseSourceAdapter implements SourceAdapter {
readonly source = 'live-database';
readonly skillNames = ['live_database_ingest'];
constructor(private readonly deps: LiveDatabaseSourceAdapterDeps) {}
detect(stagedDir: string): Promise<boolean> {
return detectLiveDatabaseStagedDir(stagedDir);
}
async fetch(_pullConfig: unknown, stagedDir: string, ctx: FetchContext): Promise<void> {
const snapshot = await this.deps.introspection.extractSchema(ctx.connectionId);
await writeLiveDatabaseSnapshot(stagedDir, {
...snapshot,
connectionId: ctx.connectionId,
extractedAt: snapshot.extractedAt ?? (this.deps.now ?? (() => new Date()))().toISOString(),
});
}
chunk(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
return chunkLiveDatabaseStagedDir(stagedDir, diffSet);
}
}

@ -0,0 +1,308 @@
import { describe, expect, it } from 'vitest';
import {
buildLiveDatabaseManifestShards,
type LiveDatabaseManifestExistingDescriptions,
type LiveDatabaseManifestJoinEntry,
type LiveDatabaseManifestShard,
} from './manifest.js';
function shardObject(shards: Map<string, LiveDatabaseManifestShard>): Record<string, LiveDatabaseManifestShard> {
return Object.fromEntries([...shards.entries()].sort(([a], [b]) => a.localeCompare(b)));
}
describe('buildLiveDatabaseManifestShards', () => {
it('builds shard objects with generated joins and preserved external descriptions', () => {
const existingDescriptions = new Map<string, LiveDatabaseManifestExistingDescriptions>([
[
'orders',
{
table: { user: 'Pinned analyst description', db: 'Old db description' },
columns: new Map([['id', { user: 'Pinned id description', db: 'Old id description' }]]),
},
],
]);
const preservedJoins = new Map<string, LiveDatabaseManifestJoinEntry[]>([
[
'orders',
[
{
to: 'customers',
on: 'orders.account_id = customers.id',
relationship: 'many_to_one',
source: 'manual',
},
{
to: 'missing_accounts',
on: 'orders.account_id = missing_accounts.id',
relationship: 'many_to_one',
source: 'manual',
},
],
],
]);
const result = buildLiveDatabaseManifestShards({
connectionType: 'POSTGRESQL',
mapColumnType: (nativeType) => nativeType.toLowerCase(),
existingDescriptions,
existingPreservedJoins: preservedJoins,
tables: [
{
name: 'orders',
catalog: null,
db: 'public',
descriptions: { db: 'Fresh db description', ai: 'Generated AI description' },
columns: [
{
name: 'id',
type: 'INTEGER',
pk: true,
nullable: false,
descriptions: { db: 'Fresh id description' },
},
{
name: 'customer_id',
type: 'INTEGER',
},
],
},
{
name: 'customers',
catalog: null,
db: 'public',
columns: [
{
name: 'id',
type: 'INTEGER',
pk: true,
nullable: false,
},
],
},
],
joins: [
{
fromTable: 'orders',
fromColumns: ['customer_id'],
toTable: 'customers',
toColumns: ['id'],
relationship: 'MANY_TO_ONE',
source: 'formal',
},
],
});
expect(result.tablesProcessed).toBe(2);
expect(shardObject(result.shards)).toEqual({
public: {
tables: {
orders: {
table: 'public.orders',
descriptions: {
user: 'Pinned analyst description',
db: 'Fresh db description',
ai: 'Generated AI description',
},
columns: [
{
name: 'id',
type: 'integer',
pk: true,
nullable: false,
descriptions: {
user: 'Pinned id description',
db: 'Fresh id description',
},
},
{
name: 'customer_id',
type: 'integer',
},
],
joins: [
{
to: 'customers',
on: 'orders.customer_id = customers.id',
relationship: 'many_to_one',
source: 'formal',
},
{
to: 'customers',
on: 'orders.account_id = customers.id',
relationship: 'many_to_one',
source: 'manual',
},
],
},
customers: {
table: 'public.customers',
columns: [
{
name: 'id',
type: 'integer',
pk: true,
nullable: false,
},
],
joins: [
{
to: 'orders',
on: 'customers.id = orders.customer_id',
relationship: 'one_to_many',
source: 'formal',
},
],
},
},
},
});
});
it('uses catalog and schema shard keys for snowflake-style connections', () => {
const result = buildLiveDatabaseManifestShards({
connectionType: 'SNOWFLAKE',
mapColumnType: (nativeType) => nativeType.toLowerCase(),
tables: [
{
name: 'accounts',
catalog: 'ANALYTICS',
db: 'CORE',
columns: [{ name: 'id', type: 'NUMBER' }],
},
],
joins: [],
});
expect(shardObject(result.shards)).toEqual({
'ANALYTICS.CORE': {
tables: {
accounts: {
table: 'ANALYTICS.CORE.accounts',
columns: [{ name: 'id', type: 'number' }],
},
},
},
});
});
it('preserves external usage keys while replacing historic SQL managed keys', () => {
const existingUsage = new Map([
[
'orders',
{
narrative: 'Old generated usage narrative.',
frequencyTier: 'low' as const,
commonFilters: ['old_status'],
commonJoins: [],
ownerNote: 'Pinned analyst note',
},
],
]);
const result = buildLiveDatabaseManifestShards({
connectionType: 'POSTGRESQL',
mapColumnType: (nativeType) => nativeType.toLowerCase(),
existingUsage,
tables: [
{
name: 'orders',
catalog: null,
db: 'public',
usage: {
narrative: 'Fresh generated usage narrative.',
frequencyTier: 'high',
commonFilters: ['status'],
commonGroupBys: ['created_at'],
commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
},
columns: [{ name: 'id', type: 'INTEGER' }],
},
],
joins: [],
});
expect(shardObject(result.shards)).toEqual({
public: {
tables: {
orders: {
table: 'public.orders',
usage: {
ownerNote: 'Pinned analyst note',
narrative: 'Fresh generated usage narrative.',
frequencyTier: 'high',
commonFilters: ['status'],
commonGroupBys: ['created_at'],
commonJoins: [{ table: 'public.customers', on: ['customer_id'] }],
},
columns: [{ name: 'id', type: 'integer' }],
},
},
},
});
});
it('renders ordered multi-column joins in both directions', () => {
const result = buildLiveDatabaseManifestShards({
connectionType: 'POSTGRESQL',
mapColumnType: (nativeType) => nativeType,
tables: [
{
name: 'order_lines',
catalog: null,
db: 'public',
columns: [
{ name: 'order_id', type: 'integer' },
{ name: 'line_number', type: 'integer' },
],
},
{
name: 'order_line_allocations',
catalog: null,
db: 'public',
columns: [
{ name: 'order_id', type: 'integer' },
{ name: 'line_number', type: 'integer' },
],
},
],
joins: [
{
fromTable: 'order_line_allocations',
fromColumns: ['order_id', 'line_number'],
toTable: 'order_lines',
toColumns: ['order_id', 'line_number'],
relationship: 'many_to_one',
source: 'inferred',
},
],
});
expect(shardObject(result.shards)).toMatchObject({
public: {
tables: {
order_line_allocations: {
joins: [
{
to: 'order_lines',
on: 'order_line_allocations.order_id = order_lines.order_id AND order_line_allocations.line_number = order_lines.line_number',
relationship: 'many_to_one',
source: 'inferred',
},
],
},
order_lines: {
joins: [
{
to: 'order_line_allocations',
on: 'order_lines.order_id = order_line_allocations.order_id AND order_lines.line_number = order_line_allocations.line_number',
relationship: 'one_to_many',
source: 'inferred',
},
],
},
},
},
});
});
});
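
The multi-column join expectations in the last test can be reproduced with a small standalone sketch. It assumes the same rule the test exercises: column pairs are matched by position and joined with `AND`, and the reverse direction swaps the two sides. The name `renderJoinCondition` is hypothetical and used only for illustration; it is not the module's export.

```typescript
// Hypothetical helper: renders "left.col = right.col" pairs in column order.
function renderJoinCondition(
  leftTable: string,
  leftColumns: readonly string[],
  rightTable: string,
  rightColumns: readonly string[],
): string {
  // Both sides must list the same, non-zero number of columns.
  if (leftColumns.length === 0 || leftColumns.length !== rightColumns.length) {
    throw new Error(`Cannot join ${leftTable} to ${rightTable}: column lists must match`);
  }
  return leftColumns
    .map((leftColumn, index) => `${leftTable}.${leftColumn} = ${rightTable}.${rightColumns[index]}`)
    .join(' AND ');
}

const columns = ['order_id', 'line_number'];
const forward = renderJoinCondition('order_line_allocations', columns, 'order_lines', columns);
const reverse = renderJoinCondition('order_lines', columns, 'order_line_allocations', columns);
console.log(forward);
console.log(reverse);
```

Keeping the pairing positional means the caller controls column order, which is why the test lists `order_id` before `line_number` on both sides.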


@ -0,0 +1,310 @@
import type { TableUsageOutput } from '../historic-sql/skill-schemas.js';
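// Maps upper-case relationship names to the manifest's snake_case form; other values pass through unchanged.
const RELATIONSHIP_MAP: Record<string, string> = {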
MANY_TO_ONE: 'many_to_one',
ONE_TO_MANY: 'one_to_many',
ONE_TO_ONE: 'one_to_one',
};
// The same relationship as seen from the other side of the join.
const RELATIONSHIP_INVERSE: Record<string, string> = {
many_to_one: 'one_to_many',
one_to_many: 'many_to_one',
one_to_one: 'one_to_one',
};
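// Description keys owned by the scan: existing values under these keys are dropped before merging.
const SCAN_MANAGED_DESCRIPTION_KEYS = new Set(['db', 'ai']);
// Usage keys owned by historic-SQL analysis: existing values are dropped whenever fresh usage arrives.
const HISTORIC_SQL_MANAGED_USAGE_KEYS = new Set([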
'narrative',
'frequencyTier',
'commonFilters',
'commonGroupBys',
'commonJoins',
'staleSince',
]);
interface LiveDatabaseManifestColumn {
name: string;
type: string;
pk?: boolean;
nullable?: boolean;
descriptions?: Record<string, string>;
}
export interface LiveDatabaseManifestJoinEntry {
to: string;
on: string;
relationship: string;
source: string;
}
interface LiveDatabaseManifestTableEntry {
table: string;
descriptions?: Record<string, string>;
usage?: TableUsageOutput;
columns: LiveDatabaseManifestColumn[];
joins?: LiveDatabaseManifestJoinEntry[];
}
export interface LiveDatabaseManifestShard {
tables: Record<string, LiveDatabaseManifestTableEntry>;
}
export interface LiveDatabaseManifestTableData {
name: string;
catalog: string | null;
db: string | null;
descriptions?: Record<string, string>;
usage?: TableUsageOutput;
columns: Array<{
name: string;
type: string;
pk?: boolean;
nullable?: boolean;
descriptions?: Record<string, string>;
}>;
}
export interface LiveDatabaseManifestJoinData {
fromTable: string;
fromColumns: string[];
toTable: string;
toColumns: string[];
relationship: string;
source: 'formal' | 'inferred' | 'manual';
}
export interface LiveDatabaseManifestExistingDescriptions {
table?: Record<string, string>;
columns: Map<string, Record<string, string>>;
}
export interface BuildLiveDatabaseManifestShardsInput {
connectionType: string;
tables: LiveDatabaseManifestTableData[];
joins: LiveDatabaseManifestJoinData[];
mapColumnType: (nativeType: string) => string;
existingPreservedJoins?: Map<string, LiveDatabaseManifestJoinEntry[]>;
existingDescriptions?: Map<string, LiveDatabaseManifestExistingDescriptions>;
existingUsage?: Map<string, TableUsageOutput>;
}
export interface BuildLiveDatabaseManifestShardsResult {
shards: Map<string, LiveDatabaseManifestShard>;
tablesProcessed: number;
}
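/**
 * Merges description maps. Existing keys outside SCAN_MANAGED_DESCRIPTION_KEYS are kept,
 * existing scan-managed keys are discarded, and incoming values are applied last.
 * Returns undefined when the merged map is empty.
 */
function mergeDescriptionsPreservingExternal(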
existing: Record<string, string> | undefined,
incoming: Record<string, string> | undefined,
): Record<string, string> | undefined {
if (!existing && !incoming) {
return undefined;
}
const result: Record<string, string> = {};
if (existing) {
for (const [key, value] of Object.entries(existing)) {
if (!SCAN_MANAGED_DESCRIPTION_KEYS.has(key)) {
result[key] = value;
}
}
}
if (incoming) {
Object.assign(result, incoming);
}
return Object.keys(result).length > 0 ? result : undefined;
}
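/**
 * Merges usage objects. Without incoming usage, a shallow copy of the existing usage is
 * returned as-is. With incoming usage, existing keys in HISTORIC_SQL_MANAGED_USAGE_KEYS
 * are discarded, other existing keys are kept, and incoming values are applied last.
 */
export function mergeUsagePreservingExternal(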
existing: TableUsageOutput | undefined,
incoming: TableUsageOutput | undefined,
): TableUsageOutput | undefined {
if (!existing && !incoming) {
return undefined;
}
if (!incoming) {
return existing ? { ...existing } : undefined;
}
const result: Record<string, unknown> = {};
if (existing) {
for (const [key, value] of Object.entries(existing)) {
if (!HISTORIC_SQL_MANAGED_USAGE_KEYS.has(key)) {
result[key] = value;
}
}
}
Object.assign(result, incoming);
return Object.keys(result).length > 0 ? (result as TableUsageOutput) : undefined;
}
function getShardKey(connectionType: string, catalog: string | null, db: string | null): string {
const normalized = connectionType.toUpperCase();
switch (normalized) {
case 'SNOWFLAKE':
case 'DATABRICKS': {
const catalogPart = catalog ?? 'default';
const schemaPart = db ?? 'public';
return `${catalogPart}.${schemaPart}`;
}
case 'BIGQUERY':
case 'MYSQL':
case 'CLICKHOUSE': {
return db ?? catalog ?? 'default';
}
default: {
return db ?? 'public';
}
}
}
function buildTableRef(name: string, catalog: string | null, db: string | null): string {
const parts: string[] = [];
if (catalog) {
parts.push(catalog);
}
if (db) {
parts.push(db);
}
parts.push(name);
return parts.join('.');
}
/** Appends a join to a table's list unless one with the same `to` and `on` is already present. */
function addJoinOnce(
joinsByTable: Map<string, LiveDatabaseManifestJoinEntry[]>,
tableName: string,
join: LiveDatabaseManifestJoinEntry,
): void {
const joins = joinsByTable.get(tableName) ?? [];
const exists = joins.some((candidate) => candidate.to === join.to && candidate.on === join.on);
if (!exists) {
joins.push(join);
}
joinsByTable.set(tableName, joins);
}
function joinCondition(
leftTable: string,
leftColumns: readonly string[],
rightTable: string,
rightColumns: readonly string[],
): string {
if (leftColumns.length === 0 || leftColumns.length !== rightColumns.length) {
throw new Error(
`Invalid relationship join from ${leftTable} to ${rightTable}: column lists must be non-empty and of equal length`,
);
}
return leftColumns
.map((leftColumn, index) => {
const rightColumn = rightColumns[index];
if (!rightColumn) {
throw new Error(`Invalid relationship join from ${leftTable} to ${rightTable}: missing target column`);
}
return `${leftTable}.${leftColumn} = ${rightTable}.${rightColumn}`;
})
.join(' AND ');
}
function buildJoinsByTable(
tableNames: Set<string>,
joins: LiveDatabaseManifestJoinData[],
preservedJoins: Map<string, LiveDatabaseManifestJoinEntry[]>,
): Map<string, LiveDatabaseManifestJoinEntry[]> {
const joinsByTable = new Map<string, LiveDatabaseManifestJoinEntry[]>();
for (const join of joins) {
if (!tableNames.has(join.fromTable) || !tableNames.has(join.toTable)) {
continue;
}
const relationship = RELATIONSHIP_MAP[join.relationship] ?? join.relationship;
addJoinOnce(joinsByTable, join.fromTable, {
to: join.toTable,
on: joinCondition(join.fromTable, join.fromColumns, join.toTable, join.toColumns),
relationship,
source: join.source,
});
const reverseRelationship = RELATIONSHIP_INVERSE[relationship] ?? 'one_to_many';
addJoinOnce(joinsByTable, join.toTable, {
to: join.fromTable,
on: joinCondition(join.toTable, join.toColumns, join.fromTable, join.fromColumns),
relationship: reverseRelationship,
source: join.source,
});
}
for (const [tableName, tableJoins] of preservedJoins) {
if (!tableNames.has(tableName)) {
continue;
}
for (const join of tableJoins) {
if (tableNames.has(join.to)) {
addJoinOnce(joinsByTable, tableName, join);
}
}
}
return joinsByTable;
}
export function buildLiveDatabaseManifestShards(
input: BuildLiveDatabaseManifestShardsInput,
): BuildLiveDatabaseManifestShardsResult {
const tableNames = new Set(input.tables.map((table) => table.name));
const joinsByTable = buildJoinsByTable(tableNames, input.joins, input.existingPreservedJoins ?? new Map());
const shards = new Map<string, LiveDatabaseManifestShard>();
for (const table of input.tables) {
const shardKey = getShardKey(input.connectionType, table.catalog, table.db);
const shard = shards.get(shardKey) ?? { tables: {} };
const existingDescriptions = input.existingDescriptions?.get(table.name);
const columns: LiveDatabaseManifestColumn[] = table.columns.map((column) => {
const manifestColumn: LiveDatabaseManifestColumn = {
name: column.name,
type: input.mapColumnType(column.type),
};
if (column.pk) {
manifestColumn.pk = true;
}
if (column.nullable === false) {
manifestColumn.nullable = false;
}
const descriptions = mergeDescriptionsPreservingExternal(
existingDescriptions?.columns.get(column.name),
column.descriptions,
);
if (descriptions) {
manifestColumn.descriptions = descriptions;
}
return manifestColumn;
});
const entry: LiveDatabaseManifestTableEntry = {
table: buildTableRef(table.name, table.catalog, table.db),
columns,
};
const tableDescriptions = mergeDescriptionsPreservingExternal(existingDescriptions?.table, table.descriptions);
if (tableDescriptions) {
entry.descriptions = tableDescriptions;
}
const usage = mergeUsagePreservingExternal(input.existingUsage?.get(table.name), table.usage);
if (usage) {
entry.usage = usage;
}
const tableJoins = joinsByTable.get(table.name);
if (tableJoins && tableJoins.length > 0) {
entry.joins = tableJoins;
}
shard.tables[table.name] = entry;
shards.set(shardKey, shard);
}
return {
shards,
tablesProcessed: input.tables.length,
};
}
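
The merge rule used for descriptions above can be illustrated in isolation. This is a minimal sketch that restates the rule with a hypothetical function name (`mergePreservingExternal`) and a local copy of the managed-key set; it assumes the same semantics as the module (keep external keys, drop stale managed keys, let incoming values win) and is not a replacement for it.

```typescript
// Local copy of the scan-managed keys, for illustration only.
const MANAGED_KEYS = new Set(['db', 'ai']);

// Hypothetical restatement of the merge rule.
function mergePreservingExternal(
  existing: Record<string, string> | undefined,
  incoming: Record<string, string> | undefined,
): Record<string, string> | undefined {
  const result: Record<string, string> = {};
  // Keep only keys that the scan does not manage (for example a pinned `user` note).
  for (const [key, value] of Object.entries(existing ?? {})) {
    if (!MANAGED_KEYS.has(key)) {
      result[key] = value;
    }
  }
  // Fresh scan output is applied last, so it overrides anything with the same key.
  Object.assign(result, incoming ?? {});
  return Object.keys(result).length > 0 ? result : undefined;
}

const merged = mergePreservingExternal(
  { user: 'Pinned analyst description', db: 'Old db description' },
  { db: 'Fresh db description', ai: 'Generated AI description' },
);
console.log(merged);

// A stale managed key with no fresh replacement disappears entirely.
const staleOnly = mergePreservingExternal({ db: 'Old db description' }, undefined);
console.log(staleOnly);
```

One consequence of this design is that a table whose scan no longer produces a `db` or `ai` description loses the old value instead of carrying it forward.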


@ -0,0 +1,152 @@
import { mkdtemp, readFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it } from 'vitest';
import {
detectLiveDatabaseStagedDir,
LIVE_DATABASE_FOREIGN_KEYS_FILE,
LIVE_DATABASE_META_FILE,
liveDatabaseTablePath,
readLiveDatabaseTableFiles,
writeLiveDatabaseSnapshot,
} from './stage.js';
import type { KtxSchemaSnapshot } from '../../../scan/types.js';
function snapshot(): KtxSchemaSnapshot {
return {
connectionId: 'conn-1',
driver: 'postgres',
extractedAt: '2026-04-27T00:00:00.000Z',
scope: { schemas: ['public'] },
metadata: { dialect: 'postgres' },
tables: [
{
name: 'orders',
catalog: null,
db: 'public',
kind: 'table',
comment: 'Orders placed by customers',
estimatedRows: 200,
columns: [
{
name: 'id',
nativeType: 'integer',
normalizedType: 'integer',
dimensionType: 'number',
nullable: false,
primaryKey: true,
comment: null,
},
{
name: 'customer_id',
nativeType: 'integer',
normalizedType: 'integer',
dimensionType: 'number',
nullable: false,
primaryKey: false,
comment: null,
},
{
name: 'total',
nativeType: 'numeric',
normalizedType: 'numeric',
dimensionType: 'number',
nullable: false,
primaryKey: false,
comment: null,
},
],
foreignKeys: [
{
fromColumn: 'customer_id',
toCatalog: null,
toDb: 'public',
toTable: 'customers',
toColumn: 'id',
constraintName: null,
},
],
},
{
name: 'customers',
catalog: null,
db: 'public',
kind: 'table',
comment: null,
estimatedRows: 50,
columns: [
{
name: 'id',
nativeType: 'integer',
normalizedType: 'integer',
dimensionType: 'number',
nullable: false,
primaryKey: true,
comment: null,
},
],
foreignKeys: [],
},
],
};
}
describe('live-database staged snapshot files', () => {
it('writes deterministic metadata, table, and foreign-key files', async () => {
const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-stage-'));
await writeLiveDatabaseSnapshot(dir, snapshot());
await expect(readFile(join(dir, LIVE_DATABASE_META_FILE), 'utf8')).resolves.toContain('"connectionId": "conn-1"');
await expect(readFile(join(dir, LIVE_DATABASE_FOREIGN_KEYS_FILE), 'utf8')).resolves.toContain(
'"fromTable": "orders"',
);
const connectionJson = await readFile(join(dir, LIVE_DATABASE_META_FILE), 'utf8');
expect(connectionJson).toContain('"driver": "postgres"');
expect(connectionJson).toContain('"schemas"');
const ordersPath = liveDatabaseTablePath({ catalog: null, db: 'public', name: 'orders' });
const customersPath = liveDatabaseTablePath({ catalog: null, db: 'public', name: 'customers' });
expect(ordersPath).toMatch(/^tables\/[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.json$/);
await expect(readFile(join(dir, ordersPath), 'utf8')).resolves.toContain('"name": "orders"');
await expect(readFile(join(dir, customersPath), 'utf8')).resolves.toContain('"name": "customers"');
const ordersJson = await readFile(join(dir, ordersPath), 'utf8');
expect(ordersJson).toContain('"kind": "table"');
expect(ordersJson).toContain('"estimatedRows": 200');
expect(ordersJson).toContain('"nativeType": "integer"');
expect(ordersJson).toContain('"normalizedType": "integer"');
expect(ordersJson).not.toContain('"type": "integer"');
const tableFiles = await readLiveDatabaseTableFiles(dir);
expect(tableFiles.map((file) => file.table.name)).toEqual(['customers', 'orders']);
expect(await detectLiveDatabaseStagedDir(dir)).toBe(true);
});
it('redacts sensitive snapshot metadata before writing connection metadata', async () => {
const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-redacted-stage-'));
await writeLiveDatabaseSnapshot(dir, {
...snapshot(),
metadata: {
dialect: 'postgres',
url: 'postgres://reader:secret@example.test/db', // pragma: allowlist secret
serviceAccountJson: {
client_email: 'reader@example.test',
private_key: 'pem-value', // pragma: allowlist secret
},
},
});
const connectionJson = await readFile(join(dir, LIVE_DATABASE_META_FILE), 'utf8');
expect(connectionJson).toContain('"dialect": "postgres"');
expect(connectionJson).toContain('"client_email": "reader@example.test"');
expect(connectionJson).toContain('"url": "<redacted>"');
expect(connectionJson).toContain('"private_key": "<redacted>"');
expect(connectionJson).not.toContain('postgres://reader:secret@example.test/db'); // pragma: allowlist secret
expect(connectionJson).not.toContain('pem-value');
});
it('returns false for a directory that is missing live database metadata', async () => {
const dir = await mkdtemp(join(tmpdir(), 'ktx-live-db-empty-'));
expect(await detectLiveDatabaseStagedDir(dir)).toBe(false);
});
});


@ -0,0 +1,139 @@
import { Buffer } from 'node:buffer';
import type { Dirent } from 'node:fs';
import { mkdir, readdir, readFile, writeFile } from 'node:fs/promises';
import { join, relative } from 'node:path';
import { redactKtxSensitiveMetadata } from '../../../core/redaction.js';
import type { KtxSchemaSnapshot, KtxSchemaTable, KtxTableRef } from '../../../scan/types.js';
export const LIVE_DATABASE_META_FILE = 'connection.json';
export const LIVE_DATABASE_FOREIGN_KEYS_FILE = 'foreign-keys.json';
const LIVE_DATABASE_TABLES_DIR = 'tables';
interface LiveDatabaseTableFile {
path: string;
table: KtxSchemaTable;
}
interface ForeignKeyIndexEntry {
fromTable: string;
fromTablePath: string;
fromColumn: string;
toCatalog: string | null;
toDb: string | null;
toTable: string;
toColumn: string;
constraintName: string | null;
}
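// base64url output contains no '.' or '/', so the three-part file name stays unambiguous.
// A null or undefined part is encoded as the placeholder '_'.
function encodePathPart(value: string | null | undefined): string {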
return Buffer.from(value ?? '_', 'utf8').toString('base64url');
}
function tableSortKey(table: KtxTableRef): string {
return `${table.catalog ?? ''}\u0000${table.db ?? ''}\u0000${table.name}`;
}
/** @internal */
export function liveDatabaseTablePath(table: KtxTableRef): string {
return `${LIVE_DATABASE_TABLES_DIR}/${encodePathPart(table.catalog)}.${encodePathPart(table.db)}.${encodePathPart(
table.name,
)}.json`;
}
async function walkFiles(root: string, dir = root): Promise<string[]> {
let entries: Dirent[];
try {
entries = await readdir(dir, { withFileTypes: true });
} catch {
return [];
}
const files: string[] = [];
for (const entry of entries) {
const absolute = join(dir, entry.name);
if (entry.isDirectory()) {
files.push(...(await walkFiles(root, absolute)));
} else if (entry.isFile()) {
files.push(relative(root, absolute).replace(/\\/g, '/'));
}
}
return files.sort();
}
// Two-space indentation plus a trailing newline; key order follows the input object.
function stableJson(value: unknown): string {
return `${JSON.stringify(value, null, 2)}\n`;
}
function foreignKeyIndex(snapshot: KtxSchemaSnapshot): ForeignKeyIndexEntry[] {
const entries: ForeignKeyIndexEntry[] = [];
for (const table of snapshot.tables) {
for (const fk of table.foreignKeys) {
entries.push({
fromTable: table.name,
fromTablePath: liveDatabaseTablePath(table),
fromColumn: fk.fromColumn,
toCatalog: fk.toCatalog,
toDb: fk.toDb,
toTable: fk.toTable,
toColumn: fk.toColumn,
constraintName: fk.constraintName,
});
}
}
entries.sort(
(a, b) =>
a.fromTable.localeCompare(b.fromTable) ||
a.fromColumn.localeCompare(b.fromColumn) ||
a.toTable.localeCompare(b.toTable) ||
a.toColumn.localeCompare(b.toColumn),
);
return entries;
}
export async function writeLiveDatabaseSnapshot(stagedDir: string, snapshot: KtxSchemaSnapshot): Promise<void> {
await mkdir(join(stagedDir, LIVE_DATABASE_TABLES_DIR), { recursive: true });
const sortedTables = [...snapshot.tables].sort((a, b) => tableSortKey(a).localeCompare(tableSortKey(b)));
const metadata = {
connectionId: snapshot.connectionId,
driver: snapshot.driver,
extractedAt: snapshot.extractedAt,
scope: snapshot.scope,
metadata: redactKtxSensitiveMetadata(snapshot.metadata),
tableCount: sortedTables.length,
};
await writeFile(join(stagedDir, LIVE_DATABASE_META_FILE), stableJson(metadata));
await writeFile(
join(stagedDir, LIVE_DATABASE_FOREIGN_KEYS_FILE),
stableJson({ foreignKeys: foreignKeyIndex(snapshot) }),
);
for (const table of sortedTables) {
await writeFile(join(stagedDir, liveDatabaseTablePath(table)), stableJson(table));
}
}
export async function readLiveDatabaseTableFiles(stagedDir: string): Promise<LiveDatabaseTableFile[]> {
const files = await walkFiles(join(stagedDir, LIVE_DATABASE_TABLES_DIR));
const out: LiveDatabaseTableFile[] = [];
for (const file of files.filter((path) => path.endsWith('.json'))) {
const path = `${LIVE_DATABASE_TABLES_DIR}/${file}`;
const raw = await readFile(join(stagedDir, path), 'utf8');
const parsed = JSON.parse(raw) as KtxSchemaTable;
if (parsed && typeof parsed.name === 'string' && Array.isArray(parsed.columns)) {
out.push({ path, table: parsed });
}
}
out.sort((a, b) => tableSortKey(a.table).localeCompare(tableSortKey(b.table)));
return out;
}
export async function detectLiveDatabaseStagedDir(stagedDir: string): Promise<boolean> {
try {
const meta = JSON.parse(await readFile(join(stagedDir, LIVE_DATABASE_META_FILE), 'utf8')) as unknown;
if (!meta || typeof meta !== 'object' || Array.isArray(meta)) {
return false;
}
const files = await readLiveDatabaseTableFiles(stagedDir);
return files.length > 0;
} catch {
return false;
}
}
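
The staged table path scheme above can be tried on its own. The sketch below assumes the same encoding (each of catalog, schema, and name passed through base64url, with `_` standing in for a missing part); `encodePart` and `tablePath` are hypothetical names chosen for this example.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical stand-in for the module's path-part encoder.
function encodePart(value: string | null): string {
  return Buffer.from(value ?? '_', 'utf8').toString('base64url');
}

// Builds "tables/<catalog>.<schema>.<name>.json" from encoded parts.
function tablePath(table: { catalog: string | null; db: string | null; name: string }): string {
  return `tables/${encodePart(table.catalog)}.${encodePart(table.db)}.${encodePart(table.name)}.json`;
}

const ordersPath = tablePath({ catalog: null, db: 'public', name: 'orders' });
console.log(ordersPath);

// Names containing '.' or '/' still produce exactly three dot-separated parts.
const awkwardPath = tablePath({ catalog: 'a.b', db: 'x/y', name: 'orders.v2' });
console.log(awkwardPath);
```

Encoding every part trades readable file names for paths that are safe regardless of what characters a warehouse allows in identifiers.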


@ -0,0 +1,10 @@
import type { KtxSchemaSnapshot } from '../../../scan/types.js';
export interface LiveDatabaseIntrospectionPort {
extractSchema(connectionId: string): Promise<KtxSchemaSnapshot>;
}
export interface LiveDatabaseSourceAdapterDeps {
introspection: LiveDatabaseIntrospectionPort;
now?: () => Date;
}


@ -0,0 +1,154 @@
import { mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { dirname, join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { chunkLookerStagedDir } from './chunk.js';
import { writeLookerEvidenceDocuments } from './evidence-documents.js';
async function writeJson(stagedDir: string, relPath: string, value: unknown): Promise<void> {
const abs = join(stagedDir, relPath);
await mkdir(dirname(abs), { recursive: true });
await writeFile(abs, `${JSON.stringify(value, null, 2)}\n`, 'utf-8');
}
async function writeSmallFixture(stagedDir: string): Promise<void> {
await writeJson(stagedDir, 'sync-config.json', {
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
fetchedAt: '2026-04-30T12:30:00.000Z',
});
await writeJson(stagedDir, 'lookml_models.json', {
models: [{ name: 'b2b', label: 'B2B', explores: [{ name: 'sales_pipeline', label: 'Sales Pipeline' }] }],
});
await writeJson(stagedDir, 'explores/b2b/sales_pipeline.json', {
modelName: 'b2b',
exploreName: 'sales_pipeline',
label: 'Sales Pipeline',
description: null,
fields: { dimensions: [{ name: 'opportunities.id' }], measures: [{ name: 'opportunities.arr' }] },
joins: [],
});
await writeJson(stagedDir, 'dashboards/10.json', {
lookerId: '10',
title: 'Sales Pipeline',
description: null,
folderId: '7',
ownerId: '3',
updatedAt: '2026-04-30T12:00:00.000Z',
tiles: [{ id: '100', title: 'ARR', lookId: null, query: { model: 'b2b', view: 'sales_pipeline' } }],
});
await writeJson(stagedDir, 'looks/20.json', {
lookerId: '20',
title: 'Open Pipeline',
description: null,
folderId: '7',
ownerId: '3',
updatedAt: '2026-04-30T12:00:00.000Z',
query: { model: 'b2b', view: 'sales_pipeline', fields: ['opportunities.arr'] },
});
await writeJson(stagedDir, 'folders/tree.json', {
folders: [{ id: '7', name: 'Sandbox', parentId: null, path: ['Sandbox'] }],
});
await writeJson(stagedDir, 'users/3.json', { id: '3', displayName: 'Ada Lovelace', email: null });
await writeJson(stagedDir, 'signals/dashboard_usage.json', [
{ contentId: '10', queryCount30d: 50, uniqueUsers30d: 8 },
]);
await writeJson(stagedDir, 'signals/look_usage.json', [{ contentId: '20', queryCount30d: 20, uniqueUsers30d: 5 }]);
await writeJson(stagedDir, 'signals/scheduled_plans.json', [
{ contentId: '10', contentType: 'dashboard', isScheduled: true, scheduleCount: 1, recipientCount: 3 },
]);
await writeJson(stagedDir, 'signals/favorites.json', [
{ contentId: '10', contentType: 'dashboard', favoriteCount: 4 },
]);
await writeLookerEvidenceDocuments(stagedDir);
}
describe('chunkLookerStagedDir', () => {
let stagedDir: string;
beforeEach(async () => {
stagedDir = await mkdtemp(join(tmpdir(), 'looker-chunk-'));
await writeSmallFixture(stagedDir);
});
afterEach(async () => {
await rm(stagedDir, { recursive: true, force: true });
});
it('emits one WU per explore, dashboard, and Look with readable dependencies', async () => {
const result = await chunkLookerStagedDir(stagedDir);
expect(result.reconcileNotes).toEqual([
expect.stringContaining('emit_artifact_resolution with actionType="subsumed"'),
]);
expect(result.workUnits.map((wu) => wu.unitKey).sort()).toEqual([
'looker-dashboard-10',
'looker-explore-b2b-sales_pipeline',
'looker-look-20',
]);
const dashboard = result.workUnits.find((wu) => wu.unitKey === 'looker-dashboard-10');
expect(dashboard?.rawFiles).toEqual([
'dashboards/10.json',
'evidence/dashboards/10/metadata.json',
'evidence/dashboards/10/page.md',
]);
expect(dashboard?.notes).toContain('context_candidate_write');
expect(dashboard?.notes).not.toContain('wiki_write');
expect(dashboard?.dependencyPaths.sort()).toEqual([
'explores/b2b/sales_pipeline.json',
'folders/tree.json',
'signals/dashboard_usage.json',
'signals/favorites.json',
'signals/scheduled_plans.json',
'users/3.json',
]);
const explore = result.workUnits.find((wu) => wu.unitKey === 'looker-explore-b2b-sales_pipeline');
expect(explore?.rawFiles).toEqual([
'explores/b2b/sales_pipeline.json',
'evidence/explores/b2b/sales_pipeline/metadata.json',
'evidence/explores/b2b/sales_pipeline/page.md',
]);
expect(explore?.dependencyPaths).toEqual(['lookml_models.json']);
});
it('keeps downstream dashboard and Look WUs when an explore dependency changes', async () => {
const result = await chunkLookerStagedDir(stagedDir, {
added: [],
modified: ['explores/b2b/sales_pipeline.json'],
deleted: [],
unchanged: [
'dashboards/10.json',
'looks/20.json',
'lookml_models.json',
'folders/tree.json',
'users/3.json',
'signals/dashboard_usage.json',
'signals/look_usage.json',
'signals/scheduled_plans.json',
'signals/favorites.json',
],
});
expect(result.workUnits.map((wu) => wu.unitKey).sort()).toEqual([
'looker-dashboard-10',
'looker-explore-b2b-sales_pipeline',
'looker-look-20',
]);
expect(result.workUnits.find((wu) => wu.unitKey === 'looker-dashboard-10')?.rawFiles).toEqual([
'dashboards/10.json',
'evidence/dashboards/10/metadata.json',
'evidence/dashboards/10/page.md',
]);
});
it('returns an EvictionUnit for deleted runtime entity raw paths', async () => {
const result = await chunkLookerStagedDir(stagedDir, {
added: [],
modified: [],
deleted: ['looks/20.json'],
unchanged: ['dashboards/10.json', 'explores/b2b/sales_pipeline.json'],
});
expect(result.eviction).toEqual({ deletedRawPaths: ['looks/20.json'] });
});
});


@ -0,0 +1,198 @@
import { readdir, readFile } from 'node:fs/promises';
import { join, relative } from 'node:path';
import type { ChunkResult, DiffSet, WorkUnit } from '../../types.js';
import { buildLookerReconcileNotes } from './reconcile.js';
import {
STAGED_FILES,
type StagedDashboardFile,
type StagedLookerQuery,
type StagedLookFile,
stagedDashboardFileSchema,
stagedExploreFileSchema,
stagedLookFileSchema,
} from './types.js';
interface LoadedLookerProject {
allPaths: string[];
dashboardsByPath: Map<string, StagedDashboardFile>;
looksByPath: Map<string, StagedLookFile>;
explorePaths: string[];
}
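// Lists every file under root as a sorted, '/'-separated relative path.
// Relies on recursive readdir and Dirent.parentPath, which need a sufficiently recent Node.js release.
async function walk(root: string): Promise<string[]> {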
const entries = await readdir(root, { withFileTypes: true, recursive: true });
return entries
.filter((entry) => entry.isFile())
.map((entry) => relative(root, join(entry.parentPath, entry.name)).replace(/\\/g, '/'))
.sort();
}
async function loadProject(stagedDir: string): Promise<LoadedLookerProject> {
const allPaths = await walk(stagedDir);
const dashboardsByPath = new Map<string, StagedDashboardFile>();
const looksByPath = new Map<string, StagedLookFile>();
const explorePaths: string[] = [];
for (const path of allPaths) {
if (/^dashboards\/[^/]+\.json$/.test(path)) {
dashboardsByPath.set(
path,
stagedDashboardFileSchema.parse(JSON.parse(await readFile(join(stagedDir, path), 'utf-8'))),
);
continue;
}
if (/^looks\/[^/]+\.json$/.test(path)) {
looksByPath.set(path, stagedLookFileSchema.parse(JSON.parse(await readFile(join(stagedDir, path), 'utf-8'))));
continue;
}
if (/^explores\/[^/]+\/[^/]+\.json$/.test(path)) {
const explore = stagedExploreFileSchema.parse(JSON.parse(await readFile(join(stagedDir, path), 'utf-8')));
explorePaths.push(explorePath(explore.modelName, explore.exploreName));
}
}
return { allPaths, dashboardsByPath, looksByPath, explorePaths: [...new Set(explorePaths)].sort() };
}
export async function chunkLookerStagedDir(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
const project = await loadProject(stagedDir);
const firstRunUnits = emitFirstRunWorkUnits(project);
const result = diffSet ? applyDiffSet(firstRunUnits, diffSet) : { workUnits: firstRunUnits };
const eviction =
diffSet && diffSet.deleted.length > 0 ? { deletedRawPaths: [...diffSet.deleted].sort() } : result.eviction;
return {
...result,
eviction,
reconcileNotes: result.workUnits.length > 0 || eviction ? buildLookerReconcileNotes() : [],
};
}
function emitFirstRunWorkUnits(project: LoadedLookerProject): WorkUnit[] {
const units: WorkUnit[] = [];
for (const path of project.explorePaths) {
const parts = /^explores\/([^/]+)\/([^/]+)\.json$/.exec(path);
if (!parts) {
continue;
}
const deps = project.allPaths.includes(STAGED_FILES.lookmlModels) ? [STAGED_FILES.lookmlModels] : [];
units.push(
buildUnit(project, {
unitKey: `looker-explore-${parts[1]}-${parts[2]}`,
displayLabel: `Looker explore ${parts[1]}.${parts[2]}`,
rawFiles: [path, ...evidencePathsForExplore(project, parts[1], parts[2])],
dependencyPaths: deps,
notes: `Write API-derived SL source looker__${parts[1]}__${parts[2]} and durable domain knowledge for this Looker explore.`,
}),
);
}
for (const [path, dashboard] of [...project.dashboardsByPath.entries()].sort(([a], [b]) => a.localeCompare(b))) {
const deps = new Set<string>();
addIfPresent(project, deps, STAGED_FILES.foldersTree);
addIfPresent(project, deps, STAGED_FILES.signals.dashboardUsage);
addIfPresent(project, deps, STAGED_FILES.signals.scheduledPlans);
addIfPresent(project, deps, STAGED_FILES.signals.favorites);
if (dashboard.ownerId) {
addIfPresent(project, deps, `users/${dashboard.ownerId}.json`);
}
for (const tile of dashboard.tiles) {
addExploreDependency(project, deps, tile.query);
}
units.push(
buildUnit(project, {
unitKey: `looker-dashboard-${dashboard.lookerId}`,
displayLabel: `Looker dashboard "${dashboard.title}"`,
rawFiles: [path, ...evidencePathsForDashboard(project, dashboard.lookerId)],
dependencyPaths: [...deps].sort(),
notes:
'Extract generalizable metric, segment, and domain knowledge from this dashboard. Treat usage, owner, and folder data as prioritization/provenance context only. Use context_evidence_search/context_evidence_read and context_candidate_write for wiki-bound knowledge; do not write wiki pages directly from this WorkUnit.',
}),
);
}
for (const [path, look] of [...project.looksByPath.entries()].sort(([a], [b]) => a.localeCompare(b))) {
const deps = new Set<string>();
addIfPresent(project, deps, STAGED_FILES.foldersTree);
addIfPresent(project, deps, STAGED_FILES.signals.lookUsage);
addIfPresent(project, deps, STAGED_FILES.signals.scheduledPlans);
addIfPresent(project, deps, STAGED_FILES.signals.favorites);
if (look.ownerId) {
addIfPresent(project, deps, `users/${look.ownerId}.json`);
}
addExploreDependency(project, deps, look.query);
units.push(
buildUnit(project, {
unitKey: `looker-look-${look.lookerId}`,
displayLabel: `Looker Look "${look.title}"`,
rawFiles: [path, ...evidencePathsForLook(project, look.lookerId)],
dependencyPaths: [...deps].sort(),
notes:
'Extract generalizable metric, segment, and domain knowledge from this Look. Treat usage, owner, and folder data as prioritization/provenance context only. Use context_evidence_search/context_evidence_read and context_candidate_write for wiki-bound knowledge; do not write wiki pages directly from this WorkUnit.',
}),
);
}
return units.sort((a, b) => a.unitKey.localeCompare(b.unitKey));
}
function buildUnit(
project: LoadedLookerProject,
input: Pick<WorkUnit, 'unitKey' | 'displayLabel' | 'rawFiles' | 'dependencyPaths' | 'notes'>,
): WorkUnit {
const excluded = new Set([...input.rawFiles, ...input.dependencyPaths]);
return {
...input,
peerFileIndex: project.allPaths.filter((path) => !excluded.has(path)).sort(),
};
}
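/**
 * Keeps only the work units that read at least one added or modified path,
 * either as a raw file or as a dependency. Deleted and unchanged paths never select a unit.
 */
function applyDiffSet(firstRunUnits: WorkUnit[], diffSet: DiffSet): ChunkResult {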
const touched = new Set([...diffSet.added, ...diffSet.modified]);
const workUnits = firstRunUnits.filter((wu) => {
const readablePaths = [...wu.rawFiles, ...wu.dependencyPaths];
return readablePaths.some((path) => touched.has(path));
});
return { workUnits };
}
function addIfPresent(project: LoadedLookerProject, deps: Set<string>, path: string): void {
if (project.allPaths.includes(path)) {
deps.add(path);
}
}
function addExploreDependency(project: LoadedLookerProject, deps: Set<string>, query: StagedLookerQuery | null): void {
if (!query) {
return;
}
addIfPresent(project, deps, explorePath(query.model, query.view));
}
function evidencePathsForExplore(project: LoadedLookerProject, modelName: string, exploreName: string): string[] {
return existingPaths(project, [
`evidence/explores/${modelName}/${exploreName}/metadata.json`,
`evidence/explores/${modelName}/${exploreName}/page.md`,
]);
}
function evidencePathsForDashboard(project: LoadedLookerProject, dashboardId: string): string[] {
return existingPaths(project, [
`evidence/dashboards/${dashboardId}/metadata.json`,
`evidence/dashboards/${dashboardId}/page.md`,
]);
}
function evidencePathsForLook(project: LoadedLookerProject, lookId: string): string[] {
return existingPaths(project, [`evidence/looks/${lookId}/metadata.json`, `evidence/looks/${lookId}/page.md`]);
}
function existingPaths(project: LoadedLookerProject, paths: string[]): string[] {
return paths.filter((path) => project.allPaths.includes(path));
}
function explorePath(modelName: string, exploreName: string): string {
return `explores/${modelName}/${exploreName}.json`;
}

@ -0,0 +1,14 @@
import { readFile } from 'node:fs/promises';
import { describe, expect, it } from 'vitest';
describe('LookerClient boundary', () => {
it('does not import server or NestJS modules', async () => {
const source = await readFile(new URL('./client.ts', import.meta.url), 'utf-8');
expect(source).not.toMatch(/@nestjs\/common/);
expect(source).not.toMatch(/DataSourceClient/);
expect(source).not.toMatch(/\.\.\/interfaces/);
expect(source).not.toMatch(/\.\.\/types/);
expect(source).not.toMatch(/server\/src/);
});
});

@ -0,0 +1,473 @@
import { describe, expect, it, vi } from 'vitest';
import { LookerClient, type LookerSdkPort } from './client.js';
const clientSecretParam = 'client_secret'; // pragma: allowlist secret
function params(): Record<string, unknown> {
return {
base_url: 'https://example.looker.com',
client_id: 'id',
[clientSecretParam]: 'credential', // pragma: allowlist secret
};
}
function sdk(overrides: Partial<LookerSdkPort> = {}): LookerSdkPort {
const port: LookerSdkPort = {
me: vi.fn().mockResolvedValue({ id: '1', display_name: 'API User', email: 'api@example.com' }),
search_dashboards: vi.fn().mockResolvedValue([{ id: '10' }]),
dashboard: vi.fn().mockResolvedValue({
id: '10',
title: 'Revenue Dashboard',
description: 'Revenue concepts',
folder_id: '20',
user_id: '1',
updated_at: '2026-04-30T00:00:00.000Z',
dashboard_elements: [
{
id: '99',
title: 'ARR',
look_id: null,
query: {
id: 'q1',
model: 'b2b',
view: 'sales_pipeline',
fields: ['opportunities.arr', 'opportunities.stage'],
filters: { 'opportunities.stage': 'open' },
sorts: ['opportunities.arr desc'],
limit: '500',
},
},
],
}),
search_looks: vi.fn().mockResolvedValue([{ id: '30' }]),
search_scheduled_plans: vi.fn().mockResolvedValue([]),
look: vi.fn().mockResolvedValue({
id: '30',
title: 'Open Pipeline ARR',
description: 'ARR for open opportunities',
folder_id: '20',
user_id: '1',
updated_at: '2026-04-30T00:00:00.000Z',
query: {
id: 'q2',
model: 'b2b',
view: 'sales_pipeline',
fields: ['opportunities.arr'],
filters: { 'opportunities.stage': 'open' },
},
}),
all_folders: vi.fn().mockResolvedValue([{ id: '20', name: 'Executive', parent_id: null }]),
all_users: vi.fn().mockResolvedValue([{ id: '1', display_name: 'API User', email: 'api@example.com' }]),
all_groups: vi.fn().mockResolvedValue([{ id: '2', name: 'Finance' }]),
all_connections: vi.fn().mockResolvedValue([
{
name: 'b2b_sandbox_bq',
host: 'warehouse.example.com',
database: 'analytics',
schema: 'public',
dialect_name: 'bigquery_standard_sql',
},
]),
all_lookml_models: vi
.fn()
.mockResolvedValue([
{ name: 'b2b', label: 'B2B', explores: [{ name: 'sales_pipeline', label: 'Sales Pipeline' }] },
]),
lookml_model_explore: vi.fn().mockResolvedValue({
name: 'sales_pipeline',
label: 'Sales Pipeline',
description: 'Opportunity pipeline',
sql_table_name: 'proj.dataset.opportunities AS opportunities',
connection_name: 'b2b_sandbox_bq',
view_name: 'opportunities',
fields: {
dimensions: [{ name: 'opportunities.stage', label: 'Stage', type: 'string', sql: '$' + '{TABLE}.stage' }],
measures: [{ name: 'opportunities.arr', label: 'ARR', type: 'sum', sql: '$' + '{TABLE}.arr' }],
},
joins: [
{
name: 'accounts',
type: 'left_outer',
relationship: 'many_to_one',
sql_table_name: 'proj.dataset.accounts',
sql_on: '$' + '{opportunities.account_id} = $' + '{accounts.id}',
from: null,
},
],
}),
run_inline_query: vi.fn().mockResolvedValue('[]'),
logout: vi.fn().mockResolvedValue(undefined),
...overrides,
};
return port;
}
describe('LookerClient', () => {
it('validates credentials with me()', async () => {
const client = new LookerClient(params(), { sdkFactory: () => sdk() });
await expect(client.testConnection()).resolves.toEqual({
success: true,
metadata: { userId: '1', displayName: 'API User', email: 'api@example.com' },
});
});
it('does not warn to console when optional prioritization inputs fail by default', async () => {
const warn = vi.spyOn(console, 'warn').mockImplementation(() => undefined);
const fakeSdk = sdk({
search_dashboards: vi.fn().mockRejectedValue(new Error('dashboards unavailable')),
search_looks: vi.fn().mockRejectedValue(new Error('looks unavailable')),
});
const client = new LookerClient(params(), { sdkFactory: () => fakeSdk });
await expect(client.getSignals()).resolves.toMatchObject({
dashboardUsage: [],
lookUsage: [],
scheduledPlans: [],
favorites: [],
});
expect(warn).not.toHaveBeenCalled();
});
it('maps dashboards, looks, folders, models, explores, users, and groups to staged DTOs', async () => {
const fakeSdk = sdk();
const client = new LookerClient(params(), { sdkFactory: () => fakeSdk });
await expect(client.listDashboards()).resolves.toEqual([{ id: '10', updatedAt: null }]);
await expect(client.getDashboard('10')).resolves.toMatchObject({
lookerId: '10',
title: 'Revenue Dashboard',
tiles: [{ id: '99', query: { model: 'b2b', view: 'sales_pipeline' } }],
});
await expect(client.listLooks()).resolves.toEqual([{ id: '30', updatedAt: null }]);
await expect(client.getLook('30')).resolves.toMatchObject({
lookerId: '30',
title: 'Open Pipeline ARR',
query: { model: 'b2b', view: 'sales_pipeline' },
});
await expect(client.listFolders()).resolves.toEqual({
folders: [{ id: '20', name: 'Executive', parentId: null, path: ['Executive'] }],
});
await expect(client.listLookmlModels()).resolves.toEqual({
models: [{ name: 'b2b', label: 'B2B', explores: [{ name: 'sales_pipeline', label: 'Sales Pipeline' }] }],
});
await expect(client.listLookerConnections()).resolves.toEqual([
{
name: 'b2b_sandbox_bq',
host: 'warehouse.example.com',
database: 'analytics',
schema: 'public',
dialect: 'bigquery_standard_sql',
},
]);
await expect(client.getExplore('b2b', 'sales_pipeline')).resolves.toMatchObject({
modelName: 'b2b',
exploreName: 'sales_pipeline',
rawSqlTableName: 'proj.dataset.opportunities AS opportunities',
connectionName: 'b2b_sandbox_bq',
viewName: 'opportunities',
fields: { dimensions: [{ name: 'opportunities.stage' }], measures: [{ name: 'opportunities.arr' }] },
joins: [
{
name: 'accounts',
rawSqlTableName: 'proj.dataset.accounts',
sqlOn: '$' + '{opportunities.account_id} = $' + '{accounts.id}',
from: null,
targetTable: null,
},
],
targetWarehouseConnectionId: null,
targetTable: null,
});
expect(fakeSdk.dashboard).toHaveBeenCalledWith(
'10',
'id,title,description,folder_id,user_id,updated_at,dashboard_elements(id,title,look_id,query(id,model,view,fields,filters,sorts,limit,dynamic_fields))',
);
expect(fakeSdk.look).toHaveBeenCalledWith(
'30',
'id,title,description,folder_id,user_id,updated_at,query(id,model,view,fields,filters,sorts,limit,dynamic_fields)',
);
expect(fakeSdk.lookml_model_explore).toHaveBeenCalledWith(
'b2b',
'sales_pipeline',
'name,label,description,sql_table_name,connection_name,view_name,fields,joins(name,type,relationship,sql_table_name,sql_on,from)',
);
expect(fakeSdk.all_connections).toHaveBeenCalledWith('name,host,database,schema,dialect_name');
});
it('returns empty usage signals when system activity access fails', async () => {
const client = new LookerClient(params(), {
sdkFactory: () =>
sdk({
run_inline_query: vi.fn().mockRejectedValue(new Error('access denied')),
search_dashboards: vi.fn().mockResolvedValue([{ id: '10', favorite_count: 4 }]),
search_looks: vi.fn().mockResolvedValue([{ id: '30', favorite_count: 2 }]),
search_scheduled_plans: vi.fn().mockResolvedValue([]),
}),
});
await expect(client.getSignals()).resolves.toEqual({
dashboardUsage: [],
lookUsage: [],
scheduledPlans: [],
favorites: [
{ contentId: '10', contentType: 'dashboard', favoriteCount: 4 },
{ contentId: '30', contentType: 'look', favoriteCount: 2 },
],
});
});
it('paginates dashboard and Look searches', async () => {
const dashboardPageOne = Array.from({ length: 500 }, (_, index) => ({ id: String(index + 1) }));
const lookPageOne = Array.from({ length: 500 }, (_, index) => ({ id: String(index + 1001) }));
const fakeSdk = sdk({
search_dashboards: vi
.fn()
.mockResolvedValueOnce(dashboardPageOne)
.mockResolvedValueOnce([{ id: '501' }]),
search_looks: vi
.fn()
.mockResolvedValueOnce(lookPageOne)
.mockResolvedValueOnce([{ id: '1501' }]),
});
const client = new LookerClient(params(), { sdkFactory: () => fakeSdk });
await expect(client.listDashboards()).resolves.toHaveLength(501);
await expect(client.listLooks()).resolves.toHaveLength(501);
expect(fakeSdk.search_dashboards).toHaveBeenNthCalledWith(
1,
expect.objectContaining({
deleted: false,
fields: 'id,updated_at',
limit: 500,
offset: 0,
sorts: 'id',
}),
);
expect(fakeSdk.search_dashboards).toHaveBeenNthCalledWith(
2,
expect.objectContaining({
limit: 500,
offset: 500,
}),
);
expect(fakeSdk.search_looks).toHaveBeenNthCalledWith(
1,
expect.objectContaining({
deleted: false,
fields: 'id,updated_at',
limit: 500,
offset: 0,
sorts: 'id',
}),
);
expect(fakeSdk.search_looks).toHaveBeenNthCalledWith(
2,
expect.objectContaining({
limit: 500,
offset: 500,
}),
);
});
it('returns updatedAt cursors from dashboard and Look listing rows', async () => {
const fakeSdk = sdk({
search_dashboards: vi.fn().mockResolvedValue([{ id: '10', updated_at: '2026-04-30T12:00:00.000Z' }]),
search_looks: vi.fn().mockResolvedValue([{ id: '30', updated_at: '2026-04-30T11:00:00.000Z' }]),
});
const client = new LookerClient(params(), { sdkFactory: () => fakeSdk });
await expect(client.listDashboards()).resolves.toEqual([{ id: '10', updatedAt: '2026-04-30T12:00:00.000Z' }]);
await expect(client.listLooks()).resolves.toEqual([{ id: '30', updatedAt: '2026-04-30T11:00:00.000Z' }]);
});
it('logs out the SDK session during cleanup', async () => {
const fakeSdk = sdk();
const client = new LookerClient(params(), { sdkFactory: () => fakeSdk });
await client.testConnection();
await client.cleanup();
expect(fakeSdk.logout).toHaveBeenCalledTimes(1);
});
it('aggregates usage, scheduled-plan, and favorite signals', async () => {
const runInlineQuery = vi
.fn()
.mockResolvedValueOnce(
JSON.stringify([
{
'dashboard.id': '10',
'history.query_run_count': 3,
'history.created_date': '2026-04-30',
'user.id': 'user-1',
},
{
'dashboard.id': '10',
'history.query_run_count': '2',
'history.created_date': '2026-04-29',
'user.id': 'user-2',
},
]),
)
.mockResolvedValueOnce(
JSON.stringify([
{
'look.id': '30',
'history.query_run_count': 7,
'history.created_date': '2026-04-28',
'user.id': 'user-1',
},
]),
);
const fakeSdk = sdk({
run_inline_query: runInlineQuery,
search_dashboards: vi.fn().mockResolvedValueOnce([{ id: '10', favorite_count: 4 }]),
search_looks: vi.fn().mockResolvedValueOnce([{ id: '30', favorite_count: 2 }]),
search_scheduled_plans: vi.fn().mockResolvedValueOnce([
{
id: 'sp-dashboard',
dashboard_id: '10',
look_id: null,
enabled: true,
scheduled_plan_destination: [{ id: 'dest-1' }, { id: 'dest-2' }],
},
{
id: 'sp-look',
dashboard_id: null,
look_id: '30',
enabled: true,
scheduled_plan_destination: [{ id: 'dest-3' }],
},
]),
});
const client = new LookerClient(params(), { sdkFactory: () => fakeSdk });
await expect(client.getSignals()).resolves.toEqual({
dashboardUsage: [
{
contentId: '10',
queryCount30d: 5,
uniqueUsers30d: 2,
lastRunAt: '2026-04-30',
topUsers: ['user-1', 'user-2'],
},
],
lookUsage: [
{
contentId: '30',
queryCount30d: 7,
uniqueUsers30d: 1,
lastRunAt: '2026-04-28',
topUsers: ['user-1'],
},
],
scheduledPlans: [
{
contentId: '10',
contentType: 'dashboard',
isScheduled: true,
scheduleCount: 1,
recipientCount: 2,
},
{
contentId: '30',
contentType: 'look',
isScheduled: true,
scheduleCount: 1,
recipientCount: 1,
},
],
favorites: [
{ contentId: '10', contentType: 'dashboard', favoriteCount: 4 },
{ contentId: '30', contentType: 'look', favoriteCount: 2 },
],
});
expect(runInlineQuery).toHaveBeenNthCalledWith(
1,
expect.objectContaining({
result_format: 'json',
body: expect.objectContaining({
model: 'system__activity',
view: 'history',
fields: ['dashboard.id', 'history.query_run_count', 'history.created_date', 'user.id'],
}),
}),
);
expect(fakeSdk.search_scheduled_plans).toHaveBeenCalledWith(
expect.objectContaining({
all_users: true,
fields: 'id,dashboard_id,look_id,enabled,scheduled_plan_destination',
limit: 500,
offset: 0,
sorts: 'id',
}),
);
});
it('retries a 429 response once using Retry-After seconds', async () => {
const sleep = vi.fn().mockResolvedValue(undefined);
const rateLimitError = new Error('rate limited');
Object.assign(rateLimitError, { statusCode: 429, headers: { 'retry-after': '2' } });
const fakeSdk = sdk({
search_dashboards: vi
.fn()
.mockRejectedValueOnce(rateLimitError)
.mockResolvedValueOnce([{ id: '10' }]),
});
const client = new LookerClient(params(), { sdkFactory: () => fakeSdk, sleep });
await expect(client.listDashboards()).resolves.toEqual([{ id: '10', updatedAt: null }]);
expect(sleep).toHaveBeenCalledWith(2000);
expect(fakeSdk.search_dashboards).toHaveBeenCalledTimes(2);
});
it('does not retry non-429 errors', async () => {
const sleep = vi.fn().mockResolvedValue(undefined);
const error = new Error('broken dashboard');
Object.assign(error, { statusCode: 500 });
const fakeSdk = sdk({ dashboard: vi.fn().mockRejectedValue(error) });
const client = new LookerClient(params(), { sdkFactory: () => fakeSdk, sleep });
await expect(client.getDashboard('10')).rejects.toThrow('broken dashboard');
expect(sleep).not.toHaveBeenCalled();
expect(fakeSdk.dashboard).toHaveBeenCalledTimes(1);
});
it('initializes the real @looker/sdk-node SDK with inline credentials without throwing', async () => {
const client = new LookerClient(params());
const result = await client.testConnection();
// Without an injected sdkFactory, the real SDK is constructed via InlineLookerSettings.
// This used to throw "Missing required configuration values like base_url" because
// the parent NodeSettingsIniFile constructor validated config before the override
// could supply credentials. An auth or network failure against the bogus example URL
// is acceptable here; what must NOT happen is a synchronous SDK-init throw.
expect(result.success).toBe(false);
expect(result.error).toBeDefined();
expect(result.error).not.toMatch(/Missing required configuration values/i);
await client.cleanup();
});
it('strips trailing /api/4.0 from base_url so the SDK does not double-prefix it', async () => {
const clientWithSuffix = new LookerClient({
base_url: 'https://example.looker.com/api/4.0',
client_id: 'id',
[clientSecretParam]: 'credential', // pragma: allowlist secret
});
const result = await clientWithSuffix.testConnection();
expect(result.success).toBe(false);
// If base_url were double-prefixed, the SDK would hit /api/4.0/api/4.0/login. Either
// the URL is correctly normalized (transport-level network failure) or we'd see a
// 404/HTML response; in both cases the error must not be a config-validation throw.
expect(result.error).not.toMatch(/Missing required configuration values/i);
await clientWithSuffix.cleanup();
});
});

@ -0,0 +1,732 @@
import type {
IRequestRunInlineQuery,
IRequestSearchDashboards,
IRequestSearchLooks,
IRequestSearchScheduledPlans,
} from '@looker/sdk';
import type { IApiSection, IApiSettings } from '@looker/sdk-rtl';
import { LookerNodeSDK, NodeSettings } from '@looker/sdk-node';
import type { LookerRuntimeClient } from './fetch.js';
import type {
StagedDashboardFile,
StagedExploreFile,
StagedFoldersTreeFile,
StagedGroupFile,
StagedLookerQuery,
StagedLookerSignalsFile,
StagedLookFile,
StagedLookmlModelsFile,
StagedUserFile,
} from './types.js';
type LookerRecord = Record<string, unknown>;
export interface TestConnectionResult {
success: boolean;
error?: string;
metadata?: Record<string, unknown>;
}
export interface LookerConnectionParams extends Record<string, unknown> {
base_url: string;
client_id: string;
client_secret: string;
}
export interface LookerWarehouseConnectionInfo {
name: string;
host: string | null;
database: string | null;
schema: string | null;
dialect: string | null;
}
const LOOKER_PAGE_SIZE = 500;
const LOOKER_DASHBOARD_FIELDS =
'id,title,description,folder_id,user_id,updated_at,dashboard_elements(id,title,look_id,query(id,model,view,fields,filters,sorts,limit,dynamic_fields))';
const LOOKER_LOOK_FIELDS =
'id,title,description,folder_id,user_id,updated_at,query(id,model,view,fields,filters,sorts,limit,dynamic_fields)';
const LOOKER_EXPLORE_FIELDS =
'name,label,description,sql_table_name,connection_name,view_name,fields,joins(name,type,relationship,sql_table_name,sql_on,from)';
export interface LookerSdkPort {
me(fields?: string): Promise<LookerRecord>;
search_dashboards(request?: LookerRecord): Promise<LookerRecord[]>;
dashboard(id: string, fields?: string): Promise<LookerRecord>;
search_looks(request?: LookerRecord): Promise<LookerRecord[]>;
search_scheduled_plans(request?: LookerRecord): Promise<LookerRecord[]>;
look(id: string, fields?: string): Promise<LookerRecord>;
all_folders(fields?: string): Promise<LookerRecord[]>;
all_users(fields?: string): Promise<LookerRecord[]>;
all_groups(fields?: string): Promise<LookerRecord[]>;
all_connections(fields?: string): Promise<LookerRecord[]>;
all_lookml_models(fields?: string): Promise<LookerRecord[]>;
lookml_model_explore(modelName: string, exploreName: string, fields?: string): Promise<LookerRecord>;
run_inline_query(request: IRequestRunInlineQuery): Promise<string>;
logout(): Promise<void>;
}
export interface LookerClientLogger {
log(message: string): void;
warn(message: string): void;
error(message: string): void;
debug?(message: string): void;
}
export interface LookerClientDeps {
sdkFactory?: (params: LookerConnectionParams) => LookerSdkPort;
sleep?: (ms: number) => Promise<void>;
logger?: LookerClientLogger;
}
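// No-op logger used when none is injected, so failures of optional signal
// queries stay off the console by default.
const defaultLogger: LookerClientLogger = {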
log: () => undefined,
warn: () => undefined,
error: () => undefined,
debug: () => undefined,
};
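// Supplies Looker credentials inline rather than from an ini file. The same values
// go to the parent constructor and come back from readConfig, so base_url is
// already present when the settings are validated at construction time.
class InlineLookerSettings extends NodeSettings {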
constructor(private readonly params: LookerConnectionParams) {
super('', {
base_url: normalizeBaseUrl(params.base_url),
client_id: params.client_id,
client_secret: params.client_secret, // pragma: allowlist secret
verify_ssl: 'true',
timeout: '120',
} as unknown as IApiSettings);
}
override readConfig(_section?: string): IApiSection {
return {
base_url: normalizeBaseUrl(this.params.base_url),
client_id: this.params.client_id,
client_secret: this.params.client_secret, // pragma: allowlist secret
verify_ssl: 'true',
timeout: '120',
};
}
}
function createLookerSdkPort(params: LookerConnectionParams): LookerSdkPort {
const sdk = LookerNodeSDK.init40(new InlineLookerSettings(params));
return {
me: (fields) => sdk.ok(sdk.me(fields)).then(toRecord),
search_dashboards: (request) =>
sdk.ok(sdk.search_dashboards((request ?? {}) as IRequestSearchDashboards)).then(toRecordArray),
dashboard: (id, fields) => sdk.ok(sdk.dashboard(id, fields)).then(toRecord),
search_looks: (request) => sdk.ok(sdk.search_looks((request ?? {}) as IRequestSearchLooks)).then(toRecordArray),
search_scheduled_plans: (request) =>
sdk.ok(sdk.search_scheduled_plans((request ?? {}) as IRequestSearchScheduledPlans)).then(toRecordArray),
look: (id, fields) => sdk.ok(sdk.look(id, fields)).then(toRecord),
all_folders: (fields) => sdk.ok(sdk.all_folders(fields)).then(toRecordArray),
all_users: (fields) => sdk.ok(sdk.all_users({ fields })).then(toRecordArray),
all_groups: (fields) => sdk.ok(sdk.all_groups({ fields })).then(toRecordArray),
all_connections: (fields) => sdk.ok(sdk.all_connections(fields)).then(toRecordArray),
all_lookml_models: (fields) => sdk.ok(sdk.all_lookml_models({ fields })).then(toRecordArray),
lookml_model_explore: (modelName, exploreName, fields) =>
sdk
.ok(sdk.lookml_model_explore({ lookml_model_name: modelName, explore_name: exploreName, fields }))
.then(toRecord),
run_inline_query: (request) => sdk.ok(sdk.run_inline_query(request)),
logout: async () => {
await sdk.authSession.logout();
},
};
}
export class LookerClient implements LookerRuntimeClient {
private readonly logger: LookerClientLogger;
private readonly params: LookerConnectionParams;
private sdkInstance: LookerSdkPort | null = null;
constructor(
connectionParams: Record<string, unknown>,
private readonly deps: LookerClientDeps = {},
) {
this.logger = deps.logger ?? defaultLogger;
this.params = parseLookerConnectionParams(connectionParams);
}
get dataSourceType(): string {
return 'LOOKER';
}
async testConnection(): Promise<TestConnectionResult> {
try {
const me = await this.withRateLimitRetry(() => this.sdk().me('id,display_name,email'));
return {
success: true,
metadata: {
userId: stringValue(me.id),
displayName: nullableString(me.display_name),
email: nullableString(me.email),
},
};
} catch (error) {
return { success: false, error: error instanceof Error ? error.message : String(error) };
}
}
async listDashboards(): Promise<Array<{ id: string; updatedAt: string | null }>> {
const dashboards = await this.collectPaged((offset) =>
this.sdk().search_dashboards({
deleted: false,
fields: 'id,updated_at',
limit: LOOKER_PAGE_SIZE,
offset,
sorts: 'id',
}),
);
return dashboards.flatMap(entityRef);
}
async getDashboard(id: string): Promise<StagedDashboardFile> {
const dashboard = await this.withRateLimitRetry(() => this.sdk().dashboard(id, LOOKER_DASHBOARD_FIELDS));
const elements = arrayValue(dashboard.dashboard_elements);
return {
lookerId: stringValue(dashboard.id),
title: stringValue(dashboard.title),
description: nullableString(dashboard.description),
folderId: nullableString(dashboard.folder_id),
ownerId: nullableString(dashboard.user_id),
updatedAt: nullableString(dashboard.updated_at),
tiles: elements.map((tile) => ({
id: stringValue(tile.id),
title: nullableString(tile.title),
lookId: nullableString(tile.look_id),
query: queryValue(tile.query),
})),
};
}
async listLooks(): Promise<Array<{ id: string; updatedAt: string | null }>> {
const looks = await this.collectPaged((offset) =>
this.sdk().search_looks({
deleted: false,
fields: 'id,updated_at',
limit: LOOKER_PAGE_SIZE,
offset,
sorts: 'id',
}),
);
return looks.flatMap(entityRef);
}
async getLook(id: string): Promise<StagedLookFile> {
const look = await this.withRateLimitRetry(() => this.sdk().look(id, LOOKER_LOOK_FIELDS));
return {
lookerId: stringValue(look.id),
title: stringValue(look.title),
description: nullableString(look.description),
folderId: nullableString(look.folder_id),
ownerId: nullableString(look.user_id),
updatedAt: nullableString(look.updated_at),
query: queryValue(look.query),
};
}
async listFolders(): Promise<StagedFoldersTreeFile> {
const folders = await this.withRateLimitRetry(() => this.sdk().all_folders('id,name,parent_id'));
const byId = new Map<string, LookerRecord>();
for (const folder of folders) {
byId.set(stringValue(folder.id), folder);
}
return {
folders: folders.map((folder) => ({
id: stringValue(folder.id),
name: stringValue(folder.name),
parentId: nullableString(folder.parent_id),
path: folderPath(folder, byId),
})),
};
}
async listUsers(): Promise<StagedUserFile[]> {
const users = await this.withRateLimitRetry(() => this.sdk().all_users('id,display_name,email'));
return users.map((user) => ({
id: stringValue(user.id),
displayName: nullableString(user.display_name),
email: nullableString(user.email),
}));
}
async listGroups(): Promise<StagedGroupFile[]> {
const groups = await this.withRateLimitRetry(() => this.sdk().all_groups('id,name'));
return groups.map((group) => ({
id: stringValue(group.id),
name: stringValue(group.name),
}));
}
async listLookmlModels(): Promise<StagedLookmlModelsFile> {
const models = await this.withRateLimitRetry(() => this.sdk().all_lookml_models('name,label,explores'));
return {
models: models.map((model) => ({
name: stringValue(model.name),
label: nullableString(model.label),
explores: arrayValue(model.explores).map((explore) => ({
name: stringValue(explore.name),
label: nullableString(explore.label),
})),
})),
};
}
async listLookerConnections(): Promise<LookerWarehouseConnectionInfo[]> {
const connections = await this.withRateLimitRetry(() =>
this.sdk().all_connections('name,host,database,schema,dialect_name'),
);
return connections.map((connection) => ({
name: stringValue(connection.name),
host: nullableString(connection.host),
database: nullableString(connection.database),
schema: nullableString(connection.schema),
dialect: nullableString(connection.dialect_name ?? connection.dialect),
}));
}
async getExplore(modelName: string, exploreName: string): Promise<StagedExploreFile> {
const explore = await this.withRateLimitRetry(() =>
this.sdk().lookml_model_explore(modelName, exploreName, LOOKER_EXPLORE_FIELDS),
);
const fields = recordValue(explore.fields);
return {
modelName,
exploreName: stringValue(explore.name),
label: nullableString(explore.label),
description: nullableString(explore.description),
rawSqlTableName: nullableString(explore.sql_table_name ?? explore.sqlTableName),
connectionName: nullableString(explore.connection_name ?? explore.connectionName),
viewName: nullableString(explore.view_name ?? explore.viewName),
fields: {
dimensions: arrayValue(fields.dimensions).map(stagedField),
measures: arrayValue(fields.measures).map(stagedField),
},
joins: arrayValue(explore.joins).map((join) => ({
name: stringValue(join.name),
type: nullableString(join.type),
relationship: nullableString(join.relationship),
rawSqlTableName: nullableString(join.sql_table_name ?? join.sqlTableName),
sqlOn: nullableString(join.sql_on ?? join.sqlOn),
from: nullableString(join.from),
targetTable: null,
})),
targetWarehouseConnectionId: null,
targetTable: null,
};
}
async getSignals(): Promise<StagedLookerSignalsFile> {
const [dashboardUsage, lookUsage, scheduledPlans, favorites] = await Promise.all([
this.getUsageSignals('dashboard').catch((error) =>
this.warnAndReturnEmpty('Looker system__activity dashboard usage unavailable', error),
),
this.getUsageSignals('look').catch((error) =>
this.warnAndReturnEmpty('Looker system__activity Look usage unavailable', error),
),
this.getScheduledPlanSignals().catch((error) =>
this.warnAndReturnEmpty('Looker scheduled-plan signals unavailable', error),
),
this.getFavoriteSignals().catch((error) => this.warnAndReturnEmpty('Looker favorite signals unavailable', error)),
]);
return { dashboardUsage, lookUsage, scheduledPlans, favorites };
}
async cleanup(): Promise<void> {
const sdk = this.sdkInstance;
if (!sdk) {
return;
}
await sdk.logout();
this.sdkInstance = null;
}
private async getUsageSignals(contentType: 'dashboard' | 'look'): Promise<StagedLookerSignalsFile['dashboardUsage']> {
const idField = contentType === 'dashboard' ? 'dashboard.id' : 'look.id';
const raw = await this.withRateLimitRetry(() =>
this.sdk().run_inline_query({
result_format: 'json',
body: {
model: 'system__activity',
view: 'history',
fields: [idField, 'history.query_run_count', 'history.created_date', 'user.id'],
filters: {
'history.created_date': '30 days',
[idField]: '-NULL',
},
sorts: ['history.query_run_count desc'],
limit: '5000',
},
}),
);
return aggregateUsageRows(parseJsonRows(raw), idField);
}
private async getScheduledPlanSignals(): Promise<StagedLookerSignalsFile['scheduledPlans']> {
const plans = await this.collectPaged((offset) =>
this.sdk().search_scheduled_plans({
all_users: true,
fields: 'id,dashboard_id,look_id,enabled,scheduled_plan_destination',
limit: LOOKER_PAGE_SIZE,
offset,
sorts: 'id',
}),
);
const byContent = new Map<
string,
{
contentId: string;
contentType: 'dashboard' | 'look';
isScheduled: boolean;
scheduleCount: number;
recipientCount: number;
}
>();
for (const plan of plans) {
const dashboardId = nullableString(plan.dashboard_id);
const lookId = nullableString(plan.look_id);
const contentType = dashboardId ? 'dashboard' : lookId ? 'look' : null;
const contentId = dashboardId ?? lookId;
if (!contentType || !contentId) {
continue;
}
const key = `${contentType}:${contentId}`;
const current =
byContent.get(key) ??
({
contentId,
contentType,
isScheduled: false,
scheduleCount: 0,
recipientCount: 0,
} satisfies StagedLookerSignalsFile['scheduledPlans'][number]);
if (plan.enabled !== false) {
current.isScheduled = true;
current.scheduleCount += 1;
current.recipientCount += arrayValue(plan.scheduled_plan_destination).length;
}
byContent.set(key, current);
}
return [...byContent.values()].filter((signal) => signal.scheduleCount > 0).sort(compareContentSignals);
}
private async getFavoriteSignals(): Promise<StagedLookerSignalsFile['favorites']> {
const dashboards = await this.collectPaged((offset) =>
this.sdk().search_dashboards({
deleted: false,
fields: 'id,favorite_count',
limit: LOOKER_PAGE_SIZE,
offset,
sorts: 'id',
}),
);
const looks = await this.collectPaged((offset) =>
this.sdk().search_looks({
deleted: false,
fields: 'id,favorite_count',
limit: LOOKER_PAGE_SIZE,
offset,
sorts: 'id',
}),
);
return [
...dashboards.flatMap((dashboard) => favoriteSignal(dashboard, 'dashboard')),
...looks.flatMap((look) => favoriteSignal(look, 'look')),
].sort(compareContentSignals);
}
private warnAndReturnEmpty(message: string, error: unknown): never[] {
this.logger.warn(`${message}; continuing without that prioritization input: ${errorMessage(error)}`);
return [];
}
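// Loads pages of LOOKER_PAGE_SIZE rows until a short page marks the end.
// Every page load goes through the 429 retry wrapper.
private async collectPaged(loadPage: (offset: number) => Promise<LookerRecord[]>): Promise<LookerRecord[]> {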
const rows: LookerRecord[] = [];
for (let offset = 0; ; offset += LOOKER_PAGE_SIZE) {
const page = await this.withRateLimitRetry(() => loadPage(offset));
rows.push(...page);
if (page.length < LOOKER_PAGE_SIZE) {
return rows;
}
}
}
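// Retries exactly once on HTTP 429 after sleeping for retryAfterMs(error).
// Any other error, or a failure of the retry itself, propagates to the caller.
private async withRateLimitRetry<T>(load: () => Promise<T>): Promise<T> {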
try {
return await load();
} catch (error) {
if (lookerStatusCode(error) !== 429) {
throw error;
}
await (this.deps.sleep ?? sleep)(retryAfterMs(error));
return load();
}
}
private sdk(): LookerSdkPort {
if (!this.sdkInstance) {
this.sdkInstance = this.deps.sdkFactory?.(this.params) ?? createLookerSdkPort(this.params);
}
return this.sdkInstance;
}
}
function parseLookerConnectionParams(raw: Record<string, unknown>): LookerConnectionParams {
const baseUrl = raw.base_url;
const clientId = raw.client_id;
const apiCredential = raw.client_secret; // pragma: allowlist secret
if (typeof baseUrl !== 'string' || baseUrl.trim() === '') {
throw new Error('Looker base_url is required');
}
if (typeof clientId !== 'string' || clientId.trim() === '') {
throw new Error('Looker client_id is required');
}
if (typeof apiCredential !== 'string' || apiCredential.trim() === '') {
throw new Error('Looker client_secret is required'); // pragma: allowlist secret
}
return { base_url: baseUrl, client_id: clientId, client_secret: apiCredential }; // pragma: allowlist secret
}
function toRecord(value: object): LookerRecord {
return value as LookerRecord;
}
function toRecordArray(values: object[]): LookerRecord[] {
return values.map(toRecord);
}
function normalizeBaseUrl(baseUrl: string): string {
return baseUrl
.trim()
.replace(/\/+$/, '')
.replace(/\/api\/(4\.0|3\.1)$/, '');
}
function entityRef(row: LookerRecord): Array<{ id: string; updatedAt: string | null }> {
if (row.id === null || row.id === undefined) {
return [];
}
return [{ id: String(row.id), updatedAt: nullableString(row.updated_at) }];
}
function queryValue(value: unknown): StagedLookerQuery | null {
if (!value || typeof value !== 'object') {
return null;
}
const record = value as LookerRecord;
if (typeof record.model !== 'string' || typeof record.view !== 'string') {
return null;
}
return {
id: nullableString(record.id) ?? undefined,
model: record.model,
view: record.view,
fields: stringArray(record.fields),
filters: recordValue(record.filters),
sorts: stringArray(record.sorts),
limit: typeof record.limit === 'string' || typeof record.limit === 'number' ? record.limit : null,
dynamicFields: nullableString(record.dynamic_fields ?? record.dynamicFields),
targetWarehouseConnectionId: null,
targetTable: null,
};
}
function parseJsonRows(raw: string): LookerRecord[] {
const parsed = JSON.parse(raw) as unknown;
return Array.isArray(parsed) ? parsed.filter((row): row is LookerRecord => !!row && typeof row === 'object') : [];
}
function aggregateUsageRows(
rows: LookerRecord[],
idField: 'dashboard.id' | 'look.id',
): StagedLookerSignalsFile['dashboardUsage'] {
const byContent = new Map<
string,
{
contentId: string;
queryCount30d: number;
lastRunAt: string | null;
users: Set<string>;
}
>();
for (const row of rows) {
const contentId = nullableString(row[idField]);
if (!contentId) {
continue;
}
const current = byContent.get(contentId) ?? {
contentId,
queryCount30d: 0,
lastRunAt: null,
users: new Set<string>(),
};
current.queryCount30d += numberValue(row['history.query_run_count']);
const userId = nullableString(row['user.id']);
if (userId) {
current.users.add(userId);
}
const lastRunAt = nullableString(row['history.created_date']);
if (lastRunAt && (!current.lastRunAt || lastRunAt > current.lastRunAt)) {
current.lastRunAt = lastRunAt;
}
byContent.set(contentId, current);
}
return [...byContent.values()]
.map((signal) => ({
contentId: signal.contentId,
queryCount30d: signal.queryCount30d,
uniqueUsers30d: signal.users.size,
lastRunAt: signal.lastRunAt,
topUsers: [...signal.users].sort().slice(0, 5),
}))
.sort((a, b) => a.contentId.localeCompare(b.contentId));
}
function favoriteSignal(row: LookerRecord, contentType: 'dashboard' | 'look'): StagedLookerSignalsFile['favorites'] {
const contentId = nullableString(row.id);
if (!contentId) {
return [];
}
return [{ contentId, contentType, favoriteCount: numberValue(row.favorite_count) }];
}
function compareContentSignals(
a: { contentType?: string; contentId: string },
b: { contentType?: string; contentId: string },
): number {
return `${a.contentType ?? ''}:${a.contentId}`.localeCompare(`${b.contentType ?? ''}:${b.contentId}`);
}
function numberValue(value: unknown): number {
if (typeof value === 'number' && Number.isFinite(value)) {
return value;
}
if (typeof value === 'string' && value.trim() !== '') {
const parsed = Number(value);
return Number.isFinite(parsed) ? parsed : 0;
}
return 0;
}
function errorMessage(error: unknown): string {
return error instanceof Error ? error.message : String(error);
}
async function sleep(ms: number): Promise<void> {
await new Promise((resolve) => setTimeout(resolve, ms));
}
function lookerStatusCode(error: unknown): number | null {
if (!error || typeof error !== 'object') {
return null;
}
const record = error as Record<string, unknown>;
const direct = record.statusCode ?? record.status;
if (typeof direct === 'number') {
return direct;
}
if (typeof direct === 'string') {
const parsed = Number(direct);
return Number.isFinite(parsed) ? parsed : null;
}
const response = record.response;
if (response && typeof response === 'object') {
return lookerStatusCode(response);
}
return null;
}
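// Converts a Retry-After header (delta-seconds or HTTP-date) to milliseconds.
// Falls back to 1000 ms when the header is missing or cannot be parsed.
function retryAfterMs(error: unknown): number {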
const value = retryAfterHeader(error);
if (!value) {
return 1000;
}
const seconds = Number(value);
if (Number.isFinite(seconds)) {
return Math.max(0, seconds * 1000);
}
const dateMs = Date.parse(value);
return Number.isFinite(dateMs) ? Math.max(0, dateMs - Date.now()) : 1000;
}
function retryAfterHeader(error: unknown): string | null {
if (!error || typeof error !== 'object') {
return null;
}
const record = error as Record<string, unknown>;
const response = record.response;
const responseRecord = response && typeof response === 'object' ? (response as Record<string, unknown>) : null;
const headers = record.headers ?? responseRecord?.headers;
if (!headers || typeof headers !== 'object') {
return null;
}
const getter = (headers as { get?: unknown }).get;
if (typeof getter === 'function') {
const value = getter.call(headers, 'retry-after');
return typeof value === 'string' ? value : null;
}
const headerRecord = headers as Record<string, unknown>;
const direct = headerRecord['retry-after'] ?? headerRecord['Retry-After'];
return typeof direct === 'string' ? direct : null;
}
function stagedField(value: LookerRecord) {
return {
name: stringValue(value.name),
label: nullableString(value.label),
type: nullableString(value.type),
sql: nullableString(value.sql),
description: nullableString(value.description),
};
}
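// Builds the root-to-leaf list of folder names by following parent_id links.
// The `seen` set stops the walk if the folder graph contains a cycle.
function folderPath(folder: LookerRecord, byId: Map<string, LookerRecord>): string[] {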
const path: string[] = [];
let current: LookerRecord | undefined = folder;
const seen = new Set<string>();
while (current) {
const id = stringValue(current.id);
if (seen.has(id)) {
break;
}
seen.add(id);
path.unshift(stringValue(current.name));
const parentId = nullableString(current.parent_id);
current = parentId ? byId.get(parentId) : undefined;
}
return path;
}
function arrayValue(value: unknown): LookerRecord[] {
return Array.isArray(value) ? value.filter((item): item is LookerRecord => !!item && typeof item === 'object') : [];
}
function recordValue(value: unknown): Record<string, unknown> {
return value && typeof value === 'object' && !Array.isArray(value) ? { ...(value as Record<string, unknown>) } : {};
}
function stringArray(value: unknown): string[] {
return Array.isArray(value) ? value.filter((item): item is string => typeof item === 'string') : [];
}
function stringValue(value: unknown): string {
if (value === null || value === undefined) {
return '';
}
return String(value);
}
function nullableString(value: unknown): string | null {
if (value === null || value === undefined) {
return null;
}
return String(value);
}
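
The `retryAfterMs` helper above accepts both forms of the `Retry-After` header: a number of seconds or an HTTP-date. A minimal standalone sketch of that conversion follows. It assumes the current time is passed in so results are reproducible; `retryDelayMs`, `nowMs`, and `fallbackMs` are hypothetical names for illustration, not identifiers from this change.

```typescript
// Hypothetical standalone variant of the Retry-After conversion.
// `nowMs` is injected (an assumption of this sketch) so the result is deterministic.
function retryDelayMs(headerValue: string | null, nowMs: number, fallbackMs = 1000): number {
  if (!headerValue) {
    return fallbackMs; // header missing: use the default back-off
  }
  const seconds = Number(headerValue);
  if (Number.isFinite(seconds)) {
    return Math.max(0, seconds * 1000); // delta-seconds form, clamped at zero
  }
  const dateMs = Date.parse(headerValue);
  // HTTP-date form: wait until that instant, never a negative duration.
  return Number.isFinite(dateMs) ? Math.max(0, dateMs - nowMs) : fallbackMs;
}

const now = Date.parse('Wed, 21 Oct 2015 07:27:30 GMT');
const delays = [
  retryDelayMs('2', now),
  retryDelayMs('Wed, 21 Oct 2015 07:28:00 GMT', now),
  retryDelayMs('not-a-date', now),
  retryDelayMs(null, now),
];
console.log(delays);
```

Injecting the clock keeps the date branch testable without mocking `Date.now()`, which is the main difference from the helper in the diff.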

@ -0,0 +1,44 @@
import { describe, expect, it, vi } from 'vitest';
import { createDaemonLookerTableIdentifierParser } from './daemon-table-identifier-parser.js';
describe('createDaemonLookerTableIdentifierParser', () => {
it('posts parse items to the daemon endpoint', async () => {
const requestJson = vi.fn(async () => ({
results: {
orders: {
ok: true,
catalog: null,
schema: 'public',
name: 'orders',
canonical_table: 'public.orders',
},
},
}));
const parser = createDaemonLookerTableIdentifierParser({
baseUrl: 'http://127.0.0.1:8765',
requestJson,
});
await expect(parser.parse([{ key: 'orders', sql_table_name: 'public.orders', dialect: 'postgres' }])).resolves.toEqual({
orders: {
ok: true,
catalog: null,
schema: 'public',
name: 'orders',
canonical_table: 'public.orders',
},
});
expect(requestJson).toHaveBeenCalledWith('/sql/parse-table-identifier', {
items: [{ key: 'orders', sql_table_name: 'public.orders', dialect: 'postgres' }],
});
});
it('rejects non-object daemon responses', async () => {
const parser = createDaemonLookerTableIdentifierParser({
baseUrl: 'http://127.0.0.1:8765',
requestJson: async () => ({ results: null }),
});
await expect(parser.parse([])).rejects.toThrow('ktx-daemon table identifier parser returned invalid results');
});
});

@ -0,0 +1,81 @@
import { request as httpRequest } from 'node:http';
import { request as httpsRequest } from 'node:https';
import { URL } from 'node:url';
import type {
LookerParsedIdentifier,
LookerTableIdentifierParseItem,
LookerTableIdentifierParser,
} from './mapping.js';
export type KtxDaemonTableIdentifierHttpJsonRunner = (
path: string,
payload: Record<string, unknown>,
) => Promise<Record<string, unknown>>;
export interface DaemonLookerTableIdentifierParserOptions {
baseUrl: string;
requestJson?: KtxDaemonTableIdentifierHttpJsonRunner;
}
export function createDaemonLookerTableIdentifierParser(
options: DaemonLookerTableIdentifierParserOptions,
): LookerTableIdentifierParser {
const requestJson = options.requestJson ?? postJson(options.baseUrl);
return {
async parse(items: LookerTableIdentifierParseItem[]): Promise<Record<string, LookerParsedIdentifier>> {
const raw = await requestJson('/sql/parse-table-identifier', { items });
if (!raw.results || typeof raw.results !== 'object' || Array.isArray(raw.results)) {
throw new Error('ktx-daemon table identifier parser returned invalid results');
}
return raw.results as Record<string, LookerParsedIdentifier>;
},
};
}
function normalizedBaseUrl(baseUrl: string): string {
return baseUrl.endsWith('/') ? baseUrl : `${baseUrl}/`;
}
function postJson(baseUrl: string): KtxDaemonTableIdentifierHttpJsonRunner {
return async (path, payload) =>
new Promise((resolve, reject) => {
const target = new URL(path.replace(/^\//, ''), normalizedBaseUrl(baseUrl));
const body = JSON.stringify(payload);
const client = target.protocol === 'https:' ? httpsRequest : httpRequest;
const request = client(
target,
{
method: 'POST',
headers: {
accept: 'application/json',
'content-type': 'application/json',
'content-length': Buffer.byteLength(body),
},
},
(response) => {
const chunks: Buffer[] = [];
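response.on('data', (chunk: Buffer) => chunks.push(chunk));
// Reject on a mid-stream failure so the promise cannot hang if the body is cut off.
response.on('error', reject);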
response.on('end', () => {
const text = Buffer.concat(chunks).toString('utf8');
const statusCode = response.statusCode ?? 0;
if (statusCode < 200 || statusCode >= 300) {
reject(new Error(`ktx-daemon HTTP ${path} failed with ${statusCode}: ${text}`));
return;
}
try {
const parsed = JSON.parse(text) as unknown;
if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) {
reject(new Error(`ktx-daemon HTTP ${path} returned non-object JSON`));
return;
}
resolve(parsed as Record<string, unknown>);
} catch (error) {
reject(error);
}
});
},
);
request.on('error', reject);
request.end(body);
});
}
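
`postJson` strips the leading slash from `path` and appends a trailing slash to the base URL before calling `new URL(...)`. A small sketch of why that matters, assuming a daemon mounted under a path prefix; `resolveDaemonUrl` and the `example.test` host are hypothetical and only illustrate the WHATWG URL resolution rules:

```typescript
// Hypothetical helper reproducing the URL resolution used by postJson.
function resolveDaemonUrl(baseUrl: string, path: string): string {
  const base = baseUrl.endsWith('/') ? baseUrl : `${baseUrl}/`;
  // A relative path (no leading slash) is resolved beneath the base path.
  return new URL(path.replace(/^\//, ''), base).toString();
}

const plain = resolveDaemonUrl('http://127.0.0.1:8765', '/sql/parse-table-identifier');
const prefixed = resolveDaemonUrl('https://example.test/ktx', '/sql/parse-table-identifier');
// For contrast: an absolute path replaces the whole base path, dropping the prefix.
const naive = new URL('/sql/parse-table-identifier', 'https://example.test/ktx').toString();

console.log(plain);    // http://127.0.0.1:8765/sql/parse-table-identifier
console.log(prefixed); // https://example.test/ktx/sql/parse-table-identifier
console.log(naive);    // https://example.test/sql/parse-table-identifier
```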

@ -0,0 +1,47 @@
import { mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { detectLookerStagedDir } from './detect.js';
async function touch(stagedDir: string, relPath: string, body = '{}\n'): Promise<void> {
const abs = join(stagedDir, relPath);
await mkdir(join(abs, '..'), { recursive: true });
await writeFile(abs, body, 'utf-8');
}
describe('detectLookerStagedDir', () => {
let stagedDir: string;
beforeEach(async () => {
stagedDir = await mkdtemp(join(tmpdir(), 'looker-detect-'));
});
afterEach(async () => {
await rm(stagedDir, { recursive: true, force: true });
});
it('returns true when sync-config.json and at least one runtime entity are present', async () => {
await touch(stagedDir, 'sync-config.json');
await touch(stagedDir, 'explores/b2b/sales_pipeline.json');
expect(await detectLookerStagedDir(stagedDir)).toBe(true);
});
it('returns true for dashboard-only staged dirs', async () => {
await touch(stagedDir, 'sync-config.json');
await touch(stagedDir, 'dashboards/10.json');
expect(await detectLookerStagedDir(stagedDir)).toBe(true);
});
it('returns false without sync-config.json', async () => {
await touch(stagedDir, 'looks/20.json');
expect(await detectLookerStagedDir(stagedDir)).toBe(false);
});
it('returns false when only control files are present', async () => {
await touch(stagedDir, 'sync-config.json');
await touch(stagedDir, 'lookml_models.json');
await touch(stagedDir, 'signals/dashboard_usage.json', '[]\n');
expect(await detectLookerStagedDir(stagedDir)).toBe(false);
});
});

@ -0,0 +1,28 @@
import { readdir, stat } from 'node:fs/promises';
import { join, relative } from 'node:path';
import { STAGED_FILES } from './types.js';
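// Matches staged runtime entity files only: explores/<model>/<explore>.json,
// dashboards/<id>.json, and looks/<id>.json. Control and signal files do not match.
const LOOKER_ENTITY_FILE_RE = /^(explores\/[^/]+\/[^/]+|dashboards\/[^/]+|looks\/[^/]+)\.json$/;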
async function walk(root: string): Promise<string[]> {
const entries = await readdir(root, { withFileTypes: true, recursive: true });
return entries
.filter((entry) => entry.isFile())
.map((entry) => relative(root, join(entry.parentPath, entry.name)).replace(/\\/g, '/'))
.sort();
}
export async function detectLookerStagedDir(stagedDir: string): Promise<boolean> {
try {
await stat(join(stagedDir, STAGED_FILES.syncConfig));
} catch {
return false;
}
try {
const paths = await walk(stagedDir);
return paths.some((path) => LOOKER_ENTITY_FILE_RE.test(path));
} catch {
return false;
}
}
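
The detection rule above reduces to one anchored regular expression applied to POSIX-style relative paths. The sketch below exercises the same pattern against in-memory path lists instead of the filesystem; `hasRuntimeEntity` is a hypothetical name, and the sample paths are taken from the tests in this change.

```typescript
// Same pattern as LOOKER_ENTITY_FILE_RE in the diff; no `g` flag, so `.test` is stateless.
const ENTITY_FILE_RE = /^(explores\/[^/]+\/[^/]+|dashboards\/[^/]+|looks\/[^/]+)\.json$/;

// Hypothetical pure helper: true when at least one path is a runtime entity file.
function hasRuntimeEntity(relPaths: string[]): boolean {
  return relPaths.some((relPath) => ENTITY_FILE_RE.test(relPath));
}

const dashboardOnly = hasRuntimeEntity(['sync-config.json', 'dashboards/10.json']);
const controlOnly = hasRuntimeEntity([
  'sync-config.json',
  'lookml_models.json',
  'signals/dashboard_usage.json',
]);
// Generated evidence lives under evidence/..., so the ^ anchor keeps it from matching.
const evidenceOnly = hasRuntimeEntity(['evidence/dashboards/10/metadata.json']);

console.log({ dashboardOnly, controlOnly, evidenceOnly });
```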

@ -0,0 +1,188 @@
import { mkdir, mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { dirname, join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { getLookerTriageSignals, writeLookerEvidenceDocuments } from './evidence-documents.js';
async function writeJson(root: string, relPath: string, value: unknown): Promise<void> {
const target = join(root, relPath);
await mkdir(dirname(target), { recursive: true });
await writeFile(target, `${JSON.stringify(value, null, 2)}\n`, 'utf-8');
}
async function readJson<T>(root: string, relPath: string): Promise<T> {
return JSON.parse(await readFile(join(root, relPath), 'utf-8')) as T;
}
describe('Looker evidence documents', () => {
let stagedDir: string;
beforeEach(async () => {
stagedDir = await mkdtemp(join(tmpdir(), 'looker-evidence-docs-'));
await writeJson(stagedDir, 'explores/b2b/sales_pipeline.json', {
modelName: 'b2b',
exploreName: 'sales_pipeline',
label: 'Sales Pipeline',
description: 'Pipeline analysis explore.',
fields: {
dimensions: [
{ name: 'opportunities.stage', label: 'Stage', type: 'string', sql: '${TABLE}.stage', description: null },
],
measures: [
{
name: 'opportunities.arr',
label: 'ARR',
type: 'sum',
sql: '${TABLE}.arr',
description: 'Annual recurring revenue.',
},
],
},
joins: [{ name: 'accounts', type: 'left_outer', relationship: 'many_to_one' }],
});
await writeJson(stagedDir, 'dashboards/10.json', {
lookerId: '10',
title: 'Sales Pipeline Overview',
description: 'Executive dashboard for open pipeline ARR.',
folderId: '7',
ownerId: '3',
updatedAt: '2026-04-30T10:00:00.000Z',
tiles: [
{
id: '100',
title: 'Open Pipeline ARR',
lookId: null,
query: {
model: 'b2b',
view: 'sales_pipeline',
fields: ['opportunities.arr', 'opportunities.stage'],
filters: { 'opportunities.stage': 'open' },
sorts: ['opportunities.arr desc'],
limit: '500',
},
},
],
});
await writeJson(stagedDir, 'looks/20.json', {
lookerId: '20',
title: 'Active Opportunity Pipeline',
description: 'Saved Look for active opportunity pipeline review.',
folderId: '7',
ownerId: '3',
updatedAt: '2026-04-30T11:00:00.000Z',
query: {
model: 'b2b',
view: 'sales_pipeline',
fields: ['opportunities.arr'],
filters: { 'opportunities.stage': 'open' },
sorts: [],
limit: '500',
},
});
await writeJson(stagedDir, 'signals/dashboard_usage.json', [
{
contentId: '10',
queryCount30d: 80,
uniqueUsers30d: 12,
lastRunAt: '2026-04-30T09:00:00.000Z',
topUsers: ['3'],
},
]);
await writeJson(stagedDir, 'signals/look_usage.json', [
{
contentId: '20',
queryCount30d: 2,
uniqueUsers30d: 1,
lastRunAt: '2026-04-29T09:00:00.000Z',
topUsers: ['3'],
},
]);
await writeJson(stagedDir, 'signals/scheduled_plans.json', [
{ contentId: '10', contentType: 'dashboard', isScheduled: true, scheduleCount: 2, recipientCount: 5 },
]);
await writeJson(stagedDir, 'signals/favorites.json', [
{ contentId: '10', contentType: 'dashboard', favoriteCount: 4 },
]);
});
afterEach(async () => {
await rm(stagedDir, { recursive: true, force: true });
});
it('writes indexable metadata and markdown for explores, dashboards, and Looks', async () => {
await writeLookerEvidenceDocuments(stagedDir);
await expect(readJson(stagedDir, 'evidence/explores/b2b/sales_pipeline/metadata.json')).resolves.toMatchObject({
objectType: 'looker_explore',
id: 'looker:explore:b2b.sales_pipeline',
title: 'Sales Pipeline',
path: 'Looker / Explores / b2b.sales_pipeline',
properties: {
rawPath: 'explores/b2b/sales_pipeline.json',
modelName: 'b2b',
exploreName: 'sales_pipeline',
},
});
await expect(readJson(stagedDir, 'evidence/dashboards/10/metadata.json')).resolves.toMatchObject({
objectType: 'looker_dashboard',
id: 'looker:dashboard:10',
title: 'Sales Pipeline Overview',
path: 'Looker / Dashboards / Sales Pipeline Overview',
lastEditedAt: '2026-04-30T10:00:00.000Z',
properties: {
rawPath: 'dashboards/10.json',
lookerId: '10',
},
});
await expect(readJson(stagedDir, 'evidence/looks/20/metadata.json')).resolves.toMatchObject({
objectType: 'looker_look',
id: 'looker:look:20',
title: 'Active Opportunity Pipeline',
path: 'Looker / Looks / Active Opportunity Pipeline',
properties: {
rawPath: 'looks/20.json',
lookerId: '20',
},
});
const dashboardMarkdown = await readFile(join(stagedDir, 'evidence/dashboards/10/page.md'), 'utf-8');
expect(dashboardMarkdown).toContain('# Sales Pipeline Overview');
expect(dashboardMarkdown).toContain('Executive dashboard for open pipeline ARR.');
expect(dashboardMarkdown).toContain('## Tile: Open Pipeline ARR');
expect(dashboardMarkdown).toContain('- model: b2b');
expect(dashboardMarkdown).toContain('- explore: sales_pipeline');
expect(dashboardMarkdown).toContain('- opportunities.stage = open');
expect(dashboardMarkdown).not.toContain('80');
expect(dashboardMarkdown).not.toContain('queryCount30d');
expect(dashboardMarkdown).not.toContain('recipient');
expect(dashboardMarkdown).not.toContain('favorite');
expect(dashboardMarkdown).not.toContain('owner');
});
it('returns usage-aware triage signals without exposing usage as document prose', async () => {
await writeLookerEvidenceDocuments(stagedDir);
await expect(getLookerTriageSignals(stagedDir, 'looker:dashboard:10')).resolves.toEqual({
objectType: 'looker_dashboard',
propertyHints: {
contentType: 'dashboard',
queryCount30d: '80',
uniqueUsers30d: '12',
isScheduled: 'true',
favoriteCount: '4',
},
lastEditedAt: '2026-04-30T10:00:00.000Z',
});
await expect(getLookerTriageSignals(stagedDir, 'looker:look:20')).resolves.toEqual({
objectType: 'looker_look',
propertyHints: {
contentType: 'look',
queryCount30d: '2',
uniqueUsers30d: '1',
isScheduled: 'false',
favoriteCount: '0',
},
lastEditedAt: '2026-04-30T11:00:00.000Z',
});
});
});
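
The tests above address documents by external IDs such as `looker:dashboard:10`, `looker:look:20`, and `looker:explore:b2b.sales_pipeline`. A rough sketch of how such IDs could be split into their parts, using the same three patterns as `getLookerTriageSignals`; the `ParsedExternalId` type and `parseLookerExternalId` function are hypothetical and exist only for this illustration.

```typescript
// Hypothetical result type; the diff returns TriageSignals rather than a parsed ID.
type ParsedExternalId =
  | { kind: 'dashboard'; id: string }
  | { kind: 'look'; id: string }
  | { kind: 'explore'; modelName: string; exploreName: string }
  | { kind: 'unknown' };

function parseLookerExternalId(externalId: string): ParsedExternalId {
  const dashboard = /^looker:dashboard:(.+)$/.exec(externalId);
  if (dashboard) {
    return { kind: 'dashboard', id: dashboard[1] };
  }
  const look = /^looker:look:(.+)$/.exec(externalId);
  if (look) {
    return { kind: 'look', id: look[1] };
  }
  // The model name stops at the first dot; the explore name takes the remainder.
  const explore = /^looker:explore:([^.]+)\.(.+)$/.exec(externalId);
  if (explore) {
    return { kind: 'explore', modelName: explore[1], exploreName: explore[2] };
  }
  return { kind: 'unknown' };
}

console.log(parseLookerExternalId('looker:explore:b2b.sales_pipeline'));
```

Anything that matches none of the patterns falls through to `unknown`, which corresponds to the `looker_runtime` fallback in the implementation.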

@ -0,0 +1,378 @@
import { mkdir, readdir, readFile, writeFile } from 'node:fs/promises';
import { dirname, join, relative } from 'node:path';
import type { TriageSignals } from '../../types.js';
import {
STAGED_FILES,
type StagedDashboardFile,
type StagedExploreFile,
type StagedLookerSignalsFile,
type StagedLookFile,
stagedDashboardFileSchema,
stagedExploreFileSchema,
stagedLookerSignalsFileSchema,
stagedLookFileSchema,
} from './types.js';
type JsonObject = Record<string, unknown>;
interface EvidenceDocument {
relDir: string;
metadata: JsonObject;
markdown: string;
}
export async function writeLookerEvidenceDocuments(stagedDir: string): Promise<void> {
const paths = await walkJson(stagedDir);
const signals = await readSignals(stagedDir);
const documents: EvidenceDocument[] = [];
for (const relPath of paths) {
if (/^explores\/[^/]+\/[^/]+\.json$/.test(relPath)) {
const explore = await readJson(stagedDir, relPath, stagedExploreFileSchema);
documents.push(renderExploreEvidence(relPath, explore));
continue;
}
if (/^dashboards\/[^/]+\.json$/.test(relPath)) {
const dashboard = await readJson(stagedDir, relPath, stagedDashboardFileSchema);
documents.push(renderDashboardEvidence(relPath, dashboard));
continue;
}
if (/^looks\/[^/]+\.json$/.test(relPath)) {
const look = await readJson(stagedDir, relPath, stagedLookFileSchema);
documents.push(renderLookEvidence(relPath, look));
}
}
for (const document of documents) {
await writeJson(stagedDir, join(document.relDir, 'metadata.json'), document.metadata);
await writeText(stagedDir, join(document.relDir, 'page.md'), document.markdown);
}
await writeJson(stagedDir, join(STAGED_FILES.evidenceRoot, 'signals-summary.json'), {
dashboardUsageCount: signals.dashboardUsage.length,
lookUsageCount: signals.lookUsage.length,
scheduledPlanCount: signals.scheduledPlans.length,
favoriteCount: signals.favorites.length,
});
}
export async function getLookerTriageSignals(stagedDir: string, externalId: string): Promise<TriageSignals> {
const signals = await readSignals(stagedDir);
const dashboardId = /^looker:dashboard:(.+)$/.exec(externalId)?.[1];
if (dashboardId) {
const dashboard = await readOptionalJson(
stagedDir,
`dashboards/${safePathSegment(dashboardId)}.json`,
stagedDashboardFileSchema,
);
const usage = signals.dashboardUsage.find((item) => item.contentId === dashboardId);
const schedule = signals.scheduledPlans.find(
(item) => item.contentType === 'dashboard' && item.contentId === dashboardId,
);
const favorite = signals.favorites.find(
(item) => item.contentType === 'dashboard' && item.contentId === dashboardId,
);
return {
objectType: 'looker_dashboard',
lastEditedAt: dashboard?.updatedAt ?? usage?.lastRunAt ?? undefined,
propertyHints: {
contentType: 'dashboard',
queryCount30d: String(usage?.queryCount30d ?? 0),
uniqueUsers30d: String(usage?.uniqueUsers30d ?? 0),
isScheduled: String(schedule?.isScheduled ?? false),
favoriteCount: String(favorite?.favoriteCount ?? 0),
},
};
}
const lookId = /^looker:look:(.+)$/.exec(externalId)?.[1];
if (lookId) {
const look = await readOptionalJson(stagedDir, `looks/${safePathSegment(lookId)}.json`, stagedLookFileSchema);
const usage = signals.lookUsage.find((item) => item.contentId === lookId);
const schedule = signals.scheduledPlans.find((item) => item.contentType === 'look' && item.contentId === lookId);
const favorite = signals.favorites.find((item) => item.contentType === 'look' && item.contentId === lookId);
return {
objectType: 'looker_look',
lastEditedAt: look?.updatedAt ?? usage?.lastRunAt ?? undefined,
propertyHints: {
contentType: 'look',
queryCount30d: String(usage?.queryCount30d ?? 0),
uniqueUsers30d: String(usage?.uniqueUsers30d ?? 0),
isScheduled: String(schedule?.isScheduled ?? false),
favoriteCount: String(favorite?.favoriteCount ?? 0),
},
};
}
const explore = /^looker:explore:([^.]+)\.(.+)$/.exec(externalId);
if (explore) {
return {
objectType: 'looker_explore',
propertyHints: {
contentType: 'explore',
modelName: explore[1],
exploreName: explore[2],
},
};
}
return { objectType: 'looker_runtime' };
}
function renderExploreEvidence(rawPath: string, explore: StagedExploreFile): EvidenceDocument {
const title = explore.label ?? `${explore.modelName}.${explore.exploreName}`;
const relDir = join(
STAGED_FILES.evidenceRoot,
'explores',
safePathSegment(explore.modelName),
safePathSegment(explore.exploreName),
);
const lines = [
`# ${title}`,
'',
explore.description ? explore.description : '',
'',
'## Explore',
'',
`- model: ${explore.modelName}`,
`- explore: ${explore.exploreName}`,
'',
'## Dimensions',
'',
...fieldLines(explore.fields.dimensions),
'',
'## Measures',
'',
...fieldLines(explore.fields.measures),
'',
'## Joins',
'',
...(explore.joins.length === 0
? ['- none']
: explore.joins.map((item) => `- ${item.name}${item.relationship ? ` (${item.relationship})` : ''}`)),
];
return {
relDir,
metadata: {
objectType: 'looker_explore',
id: `looker:explore:${explore.modelName}.${explore.exploreName}`,
title,
path: `Looker / Explores / ${explore.modelName}.${explore.exploreName}`,
url: null,
parentId: null,
databaseId: null,
dataSourceId: null,
lastEditedAt: null,
lastEditedBy: null,
properties: {
rawPath,
modelName: explore.modelName,
exploreName: explore.exploreName,
},
},
markdown: normalizeMarkdown(lines),
};
}
function renderDashboardEvidence(rawPath: string, dashboard: StagedDashboardFile): EvidenceDocument {
const relDir = join(STAGED_FILES.evidenceRoot, 'dashboards', safePathSegment(dashboard.lookerId));
const lines = [
`# ${dashboard.title}`,
'',
dashboard.description ?? '',
'',
'## Dashboard Queries',
'',
...dashboard.tiles.flatMap((tile) => [
`## Tile: ${tile.title ?? tile.id}`,
'',
...(tile.query ? queryLines(tile.query) : ['- no inline query captured']),
'',
]),
];
return {
relDir,
metadata: {
objectType: 'looker_dashboard',
id: `looker:dashboard:${dashboard.lookerId}`,
title: dashboard.title,
path: `Looker / Dashboards / ${dashboard.title}`,
url: null,
parentId: dashboard.folderId,
databaseId: null,
dataSourceId: null,
lastEditedAt: dashboard.updatedAt,
lastEditedBy: null,
properties: {
rawPath,
lookerId: dashboard.lookerId,
},
},
markdown: normalizeMarkdown(lines),
};
}
function renderLookEvidence(rawPath: string, look: StagedLookFile): EvidenceDocument {
const relDir = join(STAGED_FILES.evidenceRoot, 'looks', safePathSegment(look.lookerId));
const lines = [
`# ${look.title}`,
'',
look.description ?? '',
'',
'## Look Query',
'',
...(look.query ? queryLines(look.query) : ['- no query captured']),
];
return {
relDir,
metadata: {
objectType: 'looker_look',
id: `looker:look:${look.lookerId}`,
title: look.title,
path: `Looker / Looks / ${look.title}`,
url: null,
parentId: look.folderId,
databaseId: null,
dataSourceId: null,
lastEditedAt: look.updatedAt,
lastEditedBy: null,
properties: {
rawPath,
lookerId: look.lookerId,
},
},
markdown: normalizeMarkdown(lines),
};
}
function fieldLines(
fields: Array<{
name: string;
label: string | null;
type: string | null;
sql: string | null;
description: string | null;
}>,
): string[] {
if (fields.length === 0) {
return ['- none'];
}
return fields.map((field) => {
const parts = [
field.name,
field.label ? `label: ${field.label}` : null,
field.type ? `type: ${field.type}` : null,
field.description ? `description: ${field.description}` : null,
].filter(Boolean);
return `- ${parts.join('; ')}`;
});
}
function queryLines(query: StagedDashboardFile['tiles'][number]['query']): string[] {
if (!query) {
return ['- no query captured'];
}
return [
`- model: ${query.model}`,
`- explore: ${query.view}`,
'',
'### Fields',
'',
...(query.fields.length === 0 ? ['- none'] : query.fields.map((field) => `- ${field}`)),
'',
'### Filters',
'',
...filterLines(query.filters),
];
}
function filterLines(filters: Record<string, unknown>): string[] {
const entries = Object.entries(filters).filter(
([, value]) => value !== null && value !== undefined && String(value).trim() !== '',
);
if (entries.length === 0) {
return ['- none'];
}
return entries.map(([field, value]) => `- ${field} = ${String(value)}`);
}
async function readSignals(stagedDir: string): Promise<StagedLookerSignalsFile> {
const [dashboardUsage, lookUsage, scheduledPlans, favorites] = await Promise.all([
readOptionalArray(stagedDir, STAGED_FILES.signals.dashboardUsage),
readOptionalArray(stagedDir, STAGED_FILES.signals.lookUsage),
readOptionalArray(stagedDir, STAGED_FILES.signals.scheduledPlans),
readOptionalArray(stagedDir, STAGED_FILES.signals.favorites),
]);
return stagedLookerSignalsFileSchema.parse({ dashboardUsage, lookUsage, scheduledPlans, favorites });
}
async function readOptionalArray(stagedDir: string, relPath: string): Promise<unknown[]> {
try {
const parsed = JSON.parse(await readFile(join(stagedDir, relPath), 'utf-8')) as unknown;
return Array.isArray(parsed) ? parsed : [];
} catch (error) {
if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
return [];
}
throw error;
}
}
async function readOptionalJson<T>(
stagedDir: string,
relPath: string,
schema: { parse(value: unknown): T },
): Promise<T | null> {
try {
return await readJson(stagedDir, relPath, schema);
} catch (error) {
if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
return null;
}
throw error;
}
}
async function readJson<T>(stagedDir: string, relPath: string, schema: { parse(value: unknown): T }): Promise<T> {
return schema.parse(JSON.parse(await readFile(join(stagedDir, relPath), 'utf-8')));
}
async function writeJson(stagedDir: string, relPath: string, value: unknown): Promise<void> {
await writeText(stagedDir, relPath, `${JSON.stringify(value, null, 2)}\n`);
}
async function writeText(stagedDir: string, relPath: string, body: string): Promise<void> {
const target = join(stagedDir, relPath);
await mkdir(dirname(target), { recursive: true });
await writeFile(target, body, 'utf-8');
}
async function walkJson(root: string, dir = root): Promise<string[]> {
const entries = await readdir(dir, { withFileTypes: true });
const paths: string[] = [];
for (const entry of entries) {
const absPath = join(dir, entry.name);
if (entry.isDirectory()) {
paths.push(...(await walkJson(root, absPath)));
continue;
}
if (entry.isFile() && entry.name.endsWith('.json')) {
paths.push(relative(root, absPath).replace(/\\/g, '/'));
}
}
return paths.sort();
}
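// Rejects any segment outside [a-zA-Z0-9_-] so Looker-supplied names cannot
// introduce separators or `..` and escape the staged directory.
function safePathSegment(value: string): string {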
if (!/^[a-zA-Z0-9_-]+$/.test(value)) {
throw new Error(`Unsafe Looker evidence path segment: ${value}`);
}
return value;
}
// Collapses runs of blank lines into one, trims both ends, and appends a single trailing newline.
function normalizeMarkdown(lines: string[]): string {
return `${lines
.filter((line, index, all) => line !== '' || all[index - 1] !== '')
.join('\n')
.trim()}\n`;
}
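
The renderers above pad their line arrays with empty strings and rely on `normalizeMarkdown` to tidy the result. A short sketch of that behaviour on a sample array; `collapseBlankLines` is a hypothetical stand-in with the same filter logic, and the sample lines are invented for the example.

```typescript
// Hypothetical copy of the normalizeMarkdown logic for experimentation.
function collapseBlankLines(lines: string[]): string {
  const kept = lines.filter(
    // Keep a blank line only when the previous original line was not blank.
    (line, index, all) => line !== '' || all[index - 1] !== '',
  );
  return `${kept.join('\n').trim()}\n`;
}

const markdown = collapseBlankLines([
  '# Sales Pipeline Overview',
  '',
  '', // e.g. a missing description rendered as an empty string
  '',
  '## Dashboard Queries',
  '',
  '- model: b2b',
  '',
]);

console.log(JSON.stringify(markdown));
```

Because the filter inspects the original array rather than the filtered one, three consecutive blanks still reduce to one.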

@ -0,0 +1,74 @@
import { describe, expect, it, vi } from 'vitest';
import type { FetchContext } from '../../types.js';
import type { LookerSdkPort } from './client.js';
import {
DefaultLookerClientFactory,
DefaultLookerConnectionClientFactory,
type LookerCredentialResolver,
} from './factory.js';
import type { LookerRuntimeClient } from './fetch.js';
import type { LookerPullConfig } from './types.js';
function sdk(): LookerSdkPort {
return {
me: vi.fn().mockResolvedValue({ id: '1', display_name: 'API User', email: 'api@example.com' }),
search_dashboards: vi.fn().mockResolvedValue([{ id: '10' }]),
dashboard: vi.fn(),
search_looks: vi.fn().mockResolvedValue([]),
search_scheduled_plans: vi.fn().mockResolvedValue([]),
look: vi.fn(),
all_folders: vi.fn().mockResolvedValue([]),
all_users: vi.fn().mockResolvedValue([]),
all_groups: vi.fn().mockResolvedValue([]),
all_connections: vi.fn().mockResolvedValue([]),
all_lookml_models: vi.fn().mockResolvedValue([]),
lookml_model_explore: vi.fn(),
run_inline_query: vi.fn().mockResolvedValue('[]'),
logout: vi.fn().mockResolvedValue(undefined),
};
}
describe('DefaultLookerConnectionClientFactory', () => {
it('resolves credentials by Looker connection id and creates a KTX Looker client', async () => {
const fakeSdk = sdk();
const resolver: LookerCredentialResolver = {
resolve: vi.fn().mockResolvedValue({
base_url: 'https://example.looker.com',
client_id: 'id',
client_secret: 'credential', // pragma: allowlist secret
}),
};
const factory = new DefaultLookerConnectionClientFactory(resolver, { sdkFactory: () => fakeSdk });
const client = await factory.createClient('prod-looker');
await expect(client.listDashboards()).resolves.toEqual([{ id: '10', updatedAt: null }]);
expect(resolver.resolve).toHaveBeenCalledWith('prod-looker');
});
});
describe('DefaultLookerClientFactory', () => {
const ctx: FetchContext = { connectionId: 'ctx-looker', sourceKey: 'looker' };
it('uses pullConfig.lookerConnectionId when present', async () => {
const runtimeClient = { listDashboards: vi.fn() } as unknown as LookerRuntimeClient;
const inner = { createClient: vi.fn().mockResolvedValue(runtimeClient) };
const factory = new DefaultLookerClientFactory(inner);
const config = { lookerConnectionId: 'prod-looker' } as LookerPullConfig;
await expect(factory.createClient(config, ctx)).resolves.toBe(runtimeClient);
expect(inner.createClient).toHaveBeenCalledWith('prod-looker');
});
it('falls back to ctx.connectionId when pullConfig.lookerConnectionId is absent', async () => {
const runtimeClient = { listDashboards: vi.fn() } as unknown as LookerRuntimeClient;
const inner = { createClient: vi.fn().mockResolvedValue(runtimeClient) };
const factory = new DefaultLookerClientFactory(inner);
const config = {} as LookerPullConfig;
await expect(factory.createClient(config, ctx)).resolves.toBe(runtimeClient);
expect(inner.createClient).toHaveBeenCalledWith('ctx-looker');
});
});

@@ -0,0 +1,34 @@
import type { FetchContext } from '../../types.js';
import { LookerClient, type LookerClientDeps, type LookerConnectionParams } from './client.js';
import type { LookerClientFactory, LookerRuntimeClient } from './fetch.js';
import type { LookerPullConfig } from './types.js';

export interface LookerCredentialResolver {
  resolve(lookerConnectionId: string): Promise<LookerConnectionParams>;
}

/** @internal */
export interface LookerConnectionClientFactory {
  createClient(lookerConnectionId: string): Promise<LookerRuntimeClient>;
}

export class DefaultLookerConnectionClientFactory implements LookerConnectionClientFactory {
  constructor(
    private readonly resolver: LookerCredentialResolver,
    private readonly deps: LookerClientDeps = {},
  ) {}

  async createClient(lookerConnectionId: string): Promise<LookerRuntimeClient> {
    const credentials = await this.resolver.resolve(lookerConnectionId);
    return new LookerClient(credentials, this.deps);
  }
}

/** @internal */
export class DefaultLookerClientFactory implements LookerClientFactory {
  constructor(private readonly inner: LookerConnectionClientFactory) {}

  async createClient(config: LookerPullConfig, ctx: FetchContext): Promise<LookerRuntimeClient> {
    return this.inner.createClient(config.lookerConnectionId ?? ctx.connectionId);
  }
}
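
The connection-id fallback used by `DefaultLookerClientFactory.createClient` can be looked at in isolation. The following is a minimal sketch, not code from this change: `PullConfigLike`, `ContextLike`, and `resolveLookerConnectionId` are hypothetical names, and the shapes are assumed from the test expectations. It illustrates that `??` falls back only when the configured id is `null` or `undefined`, so an empty string would be passed through unchanged.

```typescript
// Hypothetical stand-ins for LookerPullConfig and FetchContext (assumed shapes).
interface PullConfigLike {
  lookerConnectionId?: string | null;
}

interface ContextLike {
  connectionId: string;
}

// Prefer the explicit Looker connection id; otherwise reuse the context's connection id.
function resolveLookerConnectionId(config: PullConfigLike, ctx: ContextLike): string {
  return config.lookerConnectionId ?? ctx.connectionId;
}

const exampleCtx: ContextLike = { connectionId: 'ctx-looker' };

console.log(resolveLookerConnectionId({ lookerConnectionId: 'prod-looker' }, exampleCtx)); // prod-looker
console.log(resolveLookerConnectionId({}, exampleCtx)); // ctx-looker
// `??` does not treat '' as missing, unlike `||`.
console.log(resolveLookerConnectionId({ lookerConnectionId: '' }, exampleCtx) === ''); // true
```

If empty ids should also fall back, validating the config before it reaches the factory is probably a better place for that rule than switching to `||` here.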

@@ -0,0 +1,77 @@
import { mkdtemp, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { readLookerFetchReport, writeLookerFetchReport } from './fetch-report.js';

describe('Looker staged fetch report', () => {
  let stagedDir: string;

  beforeEach(async () => {
    stagedDir = await mkdtemp(join(tmpdir(), 'looker-fetch-report-'));
  });

  afterEach(async () => {
    await rm(stagedDir, { recursive: true, force: true });
  });

  it('returns null when a staged bundle has no fetch report', async () => {
    await expect(readLookerFetchReport(stagedDir)).resolves.toBeNull();
  });

  it('round-trips partial fetch issues', async () => {
    await writeLookerFetchReport(stagedDir, {
      status: 'partial',
      retryRecommended: true,
      skipped: [
        {
          rawPath: 'dashboards/10.json',
          entityType: 'dashboard',
          entityId: '10',
          severity: 'error',
          statusCode: 429,
          message: 'Looker API rate limit remained after retry',
          retryRecommended: true,
        },
      ],
      warnings: [
        {
          rawPath: 'signals/dashboard_usage.json',
          entityType: 'signals',
          entityId: null,
          severity: 'warning',
          statusCode: 403,
          message: 'system__activity unavailable',
          retryRecommended: false,
        },
      ],
    });

    await expect(readLookerFetchReport(stagedDir)).resolves.toEqual({
      status: 'partial',
      retryRecommended: true,
      skipped: [
        {
          rawPath: 'dashboards/10.json',
          entityType: 'dashboard',
          entityId: '10',
          severity: 'error',
          statusCode: 429,
          message: 'Looker API rate limit remained after retry',
          retryRecommended: true,
        },
      ],
      warnings: [
        {
          rawPath: 'signals/dashboard_usage.json',
          entityType: 'signals',
          entityId: null,
          severity: 'warning',
          statusCode: 403,
          message: 'system__activity unavailable',
          retryRecommended: false,
        },
      ],
    });
  });
});

@@ -0,0 +1,22 @@
import { mkdir, readFile, writeFile } from 'node:fs/promises';
import { dirname, join } from 'node:path';
import { STAGED_FILES, type StagedLookerFetchReport, stagedLookerFetchReportSchema } from './types.js';

export async function readLookerFetchReport(stagedDir: string): Promise<StagedLookerFetchReport | null> {
  try {
    const raw = await readFile(join(stagedDir, STAGED_FILES.fetchReport), 'utf-8');
    return stagedLookerFetchReportSchema.parse(JSON.parse(raw));
  } catch (error) {
    if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
      return null;
    }
    throw error;
  }
}

export async function writeLookerFetchReport(stagedDir: string, report: StagedLookerFetchReport): Promise<void> {
  const parsed = stagedLookerFetchReportSchema.parse(report);
  const target = join(stagedDir, STAGED_FILES.fetchReport);
  await mkdir(dirname(target), { recursive: true });
  await writeFile(target, `${JSON.stringify(parsed, null, 2)}\n`, 'utf-8');
}
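
`readLookerFetchReport` treats a missing file as "no report" while letting every other failure propagate. A self-contained sketch of that pattern follows, assuming nothing about the real schema: `readJsonOrNull`, `isMissingFileError`, and `demo` are hypothetical names introduced for illustration, and plain `JSON.parse` stands in for the schema validation.

```typescript
import { mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// True only for Node.js "no such file or directory" errors.
function isMissingFileError(error: unknown): boolean {
  return typeof error === 'object' && error !== null && (error as { code?: unknown }).code === 'ENOENT';
}

// Hypothetical helper: parsed JSON, or null when the file does not exist.
async function readJsonOrNull(path: string): Promise<unknown> {
  try {
    return JSON.parse(await readFile(path, 'utf-8'));
  } catch (error) {
    if (isMissingFileError(error)) {
      return null;
    }
    throw error; // Malformed JSON, permission errors, and similar still surface.
  }
}

// Reads before and after writing, inside a throwaway temp directory.
async function demo(): Promise<[unknown, unknown]> {
  const dir = await mkdtemp(join(tmpdir(), 'report-sketch-'));
  try {
    const target = join(dir, 'report.json');
    const before = await readJsonOrNull(target);
    await writeFile(target, `${JSON.stringify({ status: 'partial' }, null, 2)}\n`, 'utf-8');
    const after = await readJsonOrNull(target);
    return [before, after];
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}
```

Calling `demo()` should resolve to `null` for the first read and the written object for the second. Checking the error code rather than calling an existence check first avoids a race between the check and the read.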

@@ -0,0 +1,645 @@
import { mkdtemp, readdir, readFile, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import { chunkLookerStagedDir } from './chunk.js';
import { fetchLookerRuntimeBundle, type LookerRuntimeClient } from './fetch.js';
const connectionId = '11111111-1111-4111-8111-111111111111';
function makeClient(): LookerRuntimeClient {
return {
listDashboards: vi.fn().mockResolvedValue([{ id: '10' }]),
getDashboard: vi.fn().mockResolvedValue({
lookerId: '10',
title: 'Sales Pipeline',
description: 'Pipeline health',
folderId: '7',
ownerId: '3',
updatedAt: '2026-04-30T12:00:00.000Z',
tiles: [{ id: '100', title: 'ARR', lookId: null, query: { model: 'b2b', view: 'sales_pipeline' } }],
}),
listLooks: vi.fn().mockResolvedValue([{ id: '20' }]),
getLook: vi.fn().mockResolvedValue({
lookerId: '20',
title: 'Open Pipeline',
description: null,
folderId: '7',
ownerId: '3',
updatedAt: '2026-04-30T12:00:00.000Z',
query: { model: 'b2b', view: 'sales_pipeline', fields: ['opportunities.arr'] },
}),
listFolders: vi
.fn()
.mockResolvedValue({ folders: [{ id: '7', name: 'Sandbox', parentId: null, path: ['Sandbox'] }] }),
listUsers: vi.fn().mockResolvedValue([{ id: '3', displayName: 'Ada Lovelace', email: null }]),
listGroups: vi.fn().mockResolvedValue([{ id: '4', name: 'Sales' }]),
listLookmlModels: vi.fn().mockResolvedValue({
models: [{ name: 'b2b', label: 'B2B', explores: [{ name: 'sales_pipeline', label: 'Sales Pipeline' }] }],
}),
getExplore: vi.fn().mockResolvedValue({
modelName: 'b2b',
exploreName: 'sales_pipeline',
label: 'Sales Pipeline',
description: null,
fields: { dimensions: [{ name: 'opportunities.id' }], measures: [{ name: 'opportunities.arr' }] },
joins: [],
}),
getSignals: vi.fn().mockResolvedValue({
dashboardUsage: [{ contentId: '10', queryCount30d: 50, uniqueUsers30d: 8, lastRunAt: null, topUsers: ['3'] }],
lookUsage: [{ contentId: '20', queryCount30d: 20, uniqueUsers30d: 5, lastRunAt: null, topUsers: ['3'] }],
scheduledPlans: [
{ contentId: '10', contentType: 'dashboard', isScheduled: true, scheduleCount: 1, recipientCount: 3 },
],
favorites: [{ contentId: '10', contentType: 'dashboard', favoriteCount: 4 }],
}),
cleanup: vi.fn().mockResolvedValue(undefined),
};
}
describe('fetchLookerRuntimeBundle', () => {
let stagedDir: string;
beforeEach(async () => {
stagedDir = await mkdtemp(join(tmpdir(), 'looker-fetch-'));
});
afterEach(async () => {
await rm(stagedDir, { recursive: true, force: true });
});
it('writes dashboards, looks, folders, users, groups, models, explores, signals, and sync config', async () => {
const client = makeClient();
await fetchLookerRuntimeBundle({
pullConfig: { lookerConnectionId: connectionId, instanceBaseUrl: 'https://example.looker.com' },
stagedDir,
ctx: { connectionId, sourceKey: 'looker' },
clientFactory: { createClient: vi.fn().mockResolvedValue(client) },
now: () => new Date('2026-04-30T12:30:00.000Z'),
});
expect(await readdir(join(stagedDir, 'dashboards'))).toEqual(['10.json']);
expect(await readdir(join(stagedDir, 'looks'))).toEqual(['20.json']);
expect(await readdir(join(stagedDir, 'users'))).toEqual(['3.json']);
expect(await readdir(join(stagedDir, 'groups'))).toEqual(['4.json']);
expect(await readdir(join(stagedDir, 'explores/b2b'))).toEqual(['sales_pipeline.json']);
const syncConfig = JSON.parse(await readFile(join(stagedDir, 'sync-config.json'), 'utf-8'));
expect(syncConfig).toEqual({
lookerConnectionId: connectionId,
fetchedAt: '2026-04-30T12:30:00.000Z',
instanceBaseUrl: 'https://example.looker.com',
previousCursors: {
dashboardsLastSyncedAt: null,
looksLastSyncedAt: null,
},
nextCursors: {
dashboardsLastSyncedAt: null,
looksLastSyncedAt: null,
},
});
const scope = JSON.parse(await readFile(join(stagedDir, 'looker-scope.json'), 'utf-8'));
expect(scope).toEqual({
mode: 'full',
knownCurrentRawPaths: ['dashboards/10.json', 'looks/20.json'],
fetchedRawPaths: ['dashboards/10.json', 'looks/20.json'],
});
const dashboardUsage = JSON.parse(await readFile(join(stagedDir, 'signals/dashboard_usage.json'), 'utf-8'));
expect(dashboardUsage).toEqual([
{ contentId: '10', queryCount30d: 50, uniqueUsers30d: 8, lastRunAt: null, topUsers: ['3'] },
]);
const lookUsage = JSON.parse(await readFile(join(stagedDir, 'signals/look_usage.json'), 'utf-8'));
const scheduledPlans = JSON.parse(await readFile(join(stagedDir, 'signals/scheduled_plans.json'), 'utf-8'));
const favorites = JSON.parse(await readFile(join(stagedDir, 'signals/favorites.json'), 'utf-8'));
expect(lookUsage).toEqual([
{ contentId: '20', queryCount30d: 20, uniqueUsers30d: 5, lastRunAt: null, topUsers: ['3'] },
]);
expect(scheduledPlans).toEqual([
{ contentId: '10', contentType: 'dashboard', isScheduled: true, scheduleCount: 1, recipientCount: 3 },
]);
expect(favorites).toEqual([{ contentId: '10', contentType: 'dashboard', favoriteCount: 4 }]);
});
it('stages only changed Dashboard and Look entity bodies during incremental pulls', async () => {
const client = makeClient();
vi.mocked(client.listDashboards).mockResolvedValue([
{ id: '10', updatedAt: '2026-04-30T12:00:00.000Z' },
{ id: '11', updatedAt: '2026-04-30T12:10:00.000Z' },
]);
vi.mocked(client.getDashboard).mockImplementation(async (id: string) => ({
lookerId: id,
title: `Dashboard ${id}`,
description: null,
folderId: '7',
ownerId: '3',
updatedAt: id === '11' ? '2026-04-30T12:10:00.000Z' : '2026-04-30T12:00:00.000Z',
tiles: [],
}));
vi.mocked(client.listLooks).mockResolvedValue([
{ id: '20', updatedAt: '2026-04-30T11:00:00.000Z' },
{ id: '21', updatedAt: null },
]);
vi.mocked(client.getLook).mockImplementation(async (id: string) => ({
lookerId: id,
title: `Look ${id}`,
description: null,
folderId: '7',
ownerId: '3',
updatedAt: id === '21' ? null : '2026-04-30T11:00:00.000Z',
query: null,
}));
await fetchLookerRuntimeBundle({
pullConfig: {
lookerConnectionId: connectionId,
dashboardUpdatedSince: '2026-04-30T12:00:00.000Z',
lookUpdatedSince: '2026-04-30T11:00:00.000Z',
},
stagedDir,
ctx: { connectionId, sourceKey: 'looker' },
clientFactory: { createClient: vi.fn().mockResolvedValue(client) },
now: () => new Date('2026-04-30T12:30:00.000Z'),
});
expect(client.getDashboard).toHaveBeenCalledTimes(1);
expect(client.getDashboard).toHaveBeenCalledWith('11');
expect(client.getLook).toHaveBeenCalledTimes(1);
expect(client.getLook).toHaveBeenCalledWith('21');
await expect(readdir(join(stagedDir, 'dashboards'))).resolves.toEqual(['11.json']);
await expect(readdir(join(stagedDir, 'looks'))).resolves.toEqual(['21.json']);
const syncConfig = JSON.parse(await readFile(join(stagedDir, 'sync-config.json'), 'utf-8'));
expect(syncConfig.previousCursors).toEqual({
dashboardsLastSyncedAt: '2026-04-30T12:00:00.000Z',
looksLastSyncedAt: '2026-04-30T11:00:00.000Z',
});
expect(syncConfig.nextCursors).toEqual({
dashboardsLastSyncedAt: '2026-04-30T12:10:00.000Z',
looksLastSyncedAt: '2026-04-30T11:00:00.000Z',
});
const scope = JSON.parse(await readFile(join(stagedDir, 'looker-scope.json'), 'utf-8'));
expect(scope).toEqual({
mode: 'incremental',
knownCurrentRawPaths: ['dashboards/10.json', 'dashboards/11.json', 'looks/20.json', 'looks/21.json'],
fetchedRawPaths: ['dashboards/11.json', 'looks/21.json'],
});
});
it('falls back to empty signal files when the client has no signal support', async () => {
const client = makeClient();
delete client.getSignals;
await fetchLookerRuntimeBundle({
pullConfig: { lookerConnectionId: connectionId },
stagedDir,
ctx: { connectionId, sourceKey: 'looker' },
clientFactory: { createClient: vi.fn().mockResolvedValue(client) },
now: () => new Date('2026-04-30T12:30:00.000Z'),
});
expect(JSON.parse(await readFile(join(stagedDir, 'signals/look_usage.json'), 'utf-8'))).toEqual([]);
});
it('stamps explore warehouse targets from pull config and reports unmapped Looker connections', async () => {
const client = makeClient();
const warehouseConnectionId = '22222222-2222-4222-8222-222222222222';
vi.mocked(client.listLookmlModels).mockResolvedValue({
models: [
{
name: 'b2b',
label: 'B2B',
explores: [
{ name: 'sales_pipeline', label: 'Sales Pipeline' },
{ name: 'marketing', label: 'Marketing' },
],
},
],
});
vi.mocked(client.getExplore).mockImplementation(async (_modelName: string, exploreName: string) => {
if (exploreName === 'marketing') {
return {
modelName: 'b2b',
exploreName: 'marketing',
label: 'Marketing',
description: null,
rawSqlTableName: 'proj.dataset.marketing',
connectionName: 'missing_mapping',
viewName: 'marketing',
fields: {
dimensions: [{ name: 'marketing.id', label: null, type: null, sql: null, description: null }],
measures: [{ name: 'marketing.spend', label: null, type: null, sql: null, description: null }],
},
joins: [],
targetWarehouseConnectionId: null,
targetTable: null,
};
}
return {
modelName: 'b2b',
exploreName: 'sales_pipeline',
label: 'Sales Pipeline',
description: null,
rawSqlTableName: 'proj.dataset.opportunities AS opportunities',
connectionName: 'b2b_sandbox_bq',
viewName: 'opportunities',
fields: {
dimensions: [{ name: 'opportunities.id', label: null, type: null, sql: null, description: null }],
measures: [{ name: 'opportunities.arr', label: null, type: null, sql: null, description: null }],
},
joins: [
{
name: 'accounts',
type: 'left_outer',
relationship: 'many_to_one',
rawSqlTableName: 'proj.dataset.accounts',
sqlOn: '$' + '{opportunities.account_id} = $' + '{accounts.id}',
from: null,
targetTable: null,
},
],
targetWarehouseConnectionId: null,
targetTable: null,
};
});
await fetchLookerRuntimeBundle({
pullConfig: {
lookerConnectionId: connectionId,
connectionMappings: { b2b_sandbox_bq: warehouseConnectionId },
connectionTypes: { b2b_sandbox_bq: 'BIGQUERY' },
parsedTargetTables: {
'b2b.sales_pipeline': {
ok: true,
catalog: 'proj',
schema: 'dataset',
name: 'opportunities',
canonicalTable: 'proj.dataset.opportunities',
},
'b2b.sales_pipeline.accounts': {
ok: true,
catalog: 'proj',
schema: 'dataset',
name: 'accounts',
canonicalTable: 'proj.dataset.accounts',
},
},
},
stagedDir,
ctx: { connectionId, sourceKey: 'looker' },
clientFactory: { createClient: vi.fn().mockResolvedValue(client) },
now: () => new Date('2026-04-30T12:30:00.000Z'),
});
const salesPipeline = JSON.parse(await readFile(join(stagedDir, 'explores/b2b/sales_pipeline.json'), 'utf-8'));
expect(salesPipeline).toMatchObject({
connectionName: 'b2b_sandbox_bq',
targetWarehouseConnectionId: warehouseConnectionId,
targetTable: {
ok: true,
catalog: 'proj',
schema: 'dataset',
name: 'opportunities',
canonicalTable: 'proj.dataset.opportunities',
},
joins: [
{
name: 'accounts',
targetTable: {
ok: true,
catalog: 'proj',
schema: 'dataset',
name: 'accounts',
canonicalTable: 'proj.dataset.accounts',
},
},
],
});
const marketing = JSON.parse(await readFile(join(stagedDir, 'explores/b2b/marketing.json'), 'utf-8'));
expect(marketing).toMatchObject({
connectionName: 'missing_mapping',
targetWarehouseConnectionId: null,
targetTable: {
ok: false,
reason: 'no_connection_mapping',
},
});
const report = JSON.parse(await readFile(join(stagedDir, 'looker-fetch-report.json'), 'utf-8'));
expect(report.status).toBe('partial');
expect(report.skipped).toEqual([]);
expect(report.warnings).toEqual([
{
rawPath: 'looker_connection_mappings/missing_mapping',
entityType: 'looker_connection_mapping',
entityId: 'missing_mapping',
severity: 'warning',
statusCode: null,
message: 'Looker connection missing_mapping is not mapped to a warehouse connection; 1 explore will be wiki-only.',
retryRecommended: false,
kind: 'unmapped_looker_connection',
details: {
lookerConnectionName: 'missing_mapping',
affectedExplores: ['b2b.marketing'],
},
},
]);
});
it('reports parsed target table failures without retrying the Looker fetch', async () => {
const client = makeClient();
const warehouseConnectionId = '22222222-2222-4222-8222-222222222222';
vi.mocked(client.getExplore).mockResolvedValue({
modelName: 'b2b',
exploreName: 'sales_pipeline',
label: 'Sales Pipeline',
description: null,
rawSqlTableName: '$' + '{derived.SQL_TABLE_NAME}',
connectionName: 'b2b_sandbox_bq',
viewName: 'opportunities',
fields: {
dimensions: [{ name: 'opportunities.id', label: null, type: null, sql: null, description: null }],
measures: [{ name: 'opportunities.arr', label: null, type: null, sql: null, description: null }],
},
joins: [],
targetWarehouseConnectionId: null,
targetTable: null,
});
await fetchLookerRuntimeBundle({
pullConfig: {
lookerConnectionId: connectionId,
connectionMappings: { b2b_sandbox_bq: warehouseConnectionId },
connectionTypes: { b2b_sandbox_bq: 'BIGQUERY' },
parsedTargetTables: {
'b2b.sales_pipeline': {
ok: false,
reason: 'looker_template_unresolved',
detail: 'Looker template markers cannot be resolved before parsing.',
},
},
},
stagedDir,
ctx: { connectionId, sourceKey: 'looker' },
clientFactory: { createClient: vi.fn().mockResolvedValue(client) },
now: () => new Date('2026-04-30T12:30:00.000Z'),
});
const explore = JSON.parse(await readFile(join(stagedDir, 'explores/b2b/sales_pipeline.json'), 'utf-8'));
expect(explore).toMatchObject({
targetWarehouseConnectionId: warehouseConnectionId,
targetTable: {
ok: false,
reason: 'looker_template_unresolved',
},
});
const report = JSON.parse(await readFile(join(stagedDir, 'looker-fetch-report.json'), 'utf-8'));
expect(report).toMatchObject({
status: 'partial',
retryRecommended: false,
skipped: [],
warnings: [
{
rawPath: 'looker_connection_mappings/b2b_sandbox_bq',
entityType: 'looker_connection_mapping',
entityId: 'b2b_sandbox_bq',
severity: 'warning',
statusCode: null,
message:
'Looker explore b2b.sales_pipeline has sql_table_name that cannot be mapped to a physical warehouse table: looker_template_unresolved.',
retryRecommended: false,
kind: 'looker_template_unresolved',
details: {
lookerConnectionName: 'b2b_sandbox_bq',
rawSqlTableName: '$' + '{derived.SQL_TABLE_NAME}',
reason: 'looker_template_unresolved',
},
},
],
});
});
it('propagates parent explore warehouse targets onto Dashboard tile and Look queries', async () => {
const client = makeClient();
const warehouseConnectionId = '22222222-2222-4222-8222-222222222222';
vi.mocked(client.getExplore).mockResolvedValue({
modelName: 'b2b',
exploreName: 'sales_pipeline',
label: 'Sales Pipeline',
description: null,
rawSqlTableName: 'proj.dataset.opportunities AS opportunities',
connectionName: 'b2b_sandbox_bq',
viewName: 'opportunities',
fields: {
dimensions: [{ name: 'opportunities.id', label: null, type: null, sql: null, description: null }],
measures: [{ name: 'opportunities.arr', label: null, type: null, sql: null, description: null }],
},
joins: [],
targetWarehouseConnectionId: null,
targetTable: null,
});
await fetchLookerRuntimeBundle({
pullConfig: {
lookerConnectionId: connectionId,
connectionMappings: { b2b_sandbox_bq: warehouseConnectionId },
connectionTypes: { b2b_sandbox_bq: 'BIGQUERY' },
parsedTargetTables: {
'b2b.sales_pipeline': {
ok: true,
catalog: 'proj',
schema: 'dataset',
name: 'opportunities',
canonicalTable: 'proj.dataset.opportunities',
},
},
},
stagedDir,
ctx: { connectionId, sourceKey: 'looker' },
clientFactory: { createClient: vi.fn().mockResolvedValue(client) },
now: () => new Date('2026-04-30T12:30:00.000Z'),
});
const dashboard = JSON.parse(await readFile(join(stagedDir, 'dashboards/10.json'), 'utf-8'));
expect(dashboard.tiles[0].query).toMatchObject({
model: 'b2b',
view: 'sales_pipeline',
targetWarehouseConnectionId: warehouseConnectionId,
targetTable: {
ok: true,
catalog: 'proj',
schema: 'dataset',
name: 'opportunities',
canonicalTable: 'proj.dataset.opportunities',
},
});
const look = JSON.parse(await readFile(join(stagedDir, 'looks/20.json'), 'utf-8'));
expect(look.query).toMatchObject({
model: 'b2b',
view: 'sales_pipeline',
targetWarehouseConnectionId: warehouseConnectionId,
targetTable: {
ok: true,
catalog: 'proj',
schema: 'dataset',
name: 'opportunities',
canonicalTable: 'proj.dataset.opportunities',
},
});
});
it('records skipped detail entities and keeps cursors pinned for affected entity types', async () => {
const client = makeClient();
vi.mocked(client.listDashboards).mockResolvedValue([
{ id: '10', updatedAt: '2026-04-30T12:00:00.000Z' },
{ id: '11', updatedAt: '2026-04-30T12:10:00.000Z' },
]);
vi.mocked(client.getDashboard).mockImplementation(async (id: string) => {
if (id === '11') {
const error = new Error('Looker API rate limit remained after retry');
Object.assign(error, { statusCode: 429 });
throw error;
}
return {
lookerId: id,
title: `Dashboard ${id}`,
description: null,
folderId: '7',
ownerId: '3',
updatedAt: '2026-04-30T12:00:00.000Z',
tiles: [],
};
});
vi.mocked(client.listLooks).mockResolvedValue([{ id: '20', updatedAt: '2026-04-30T11:15:00.000Z' }]);
vi.mocked(client.getLook).mockResolvedValue({
lookerId: '20',
title: 'Look 20',
description: null,
folderId: '7',
ownerId: '3',
updatedAt: '2026-04-30T11:15:00.000Z',
query: null,
});
await fetchLookerRuntimeBundle({
pullConfig: {
lookerConnectionId: connectionId,
dashboardUpdatedSince: '2026-04-30T12:00:00.000Z',
lookUpdatedSince: '2026-04-30T11:00:00.000Z',
},
stagedDir,
ctx: { connectionId, sourceKey: 'looker' },
clientFactory: { createClient: vi.fn().mockResolvedValue(client) },
now: () => new Date('2026-04-30T12:30:00.000Z'),
});
await expect(readdir(join(stagedDir, 'dashboards'))).rejects.toMatchObject({ code: 'ENOENT' });
await expect(readdir(join(stagedDir, 'looks'))).resolves.toEqual(['20.json']);
const syncConfig = JSON.parse(await readFile(join(stagedDir, 'sync-config.json'), 'utf-8'));
expect(syncConfig.nextCursors).toEqual({
dashboardsLastSyncedAt: '2026-04-30T12:00:00.000Z',
looksLastSyncedAt: '2026-04-30T11:15:00.000Z',
});
const report = JSON.parse(await readFile(join(stagedDir, 'looker-fetch-report.json'), 'utf-8'));
expect(report).toEqual({
status: 'partial',
retryRecommended: true,
skipped: [
{
rawPath: 'dashboards/11.json',
entityType: 'dashboard',
entityId: '11',
severity: 'error',
statusCode: 429,
message: 'Looker API rate limit remained after retry',
retryRecommended: true,
},
],
warnings: [],
});
});
it('continues without explore bootstrap when LookML model listing is denied', async () => {
const client = makeClient();
const error = new Error('LookML model access denied');
Object.assign(error, { statusCode: 403 });
vi.mocked(client.listLookmlModels).mockRejectedValue(error);
await fetchLookerRuntimeBundle({
pullConfig: { lookerConnectionId: connectionId },
stagedDir,
ctx: { connectionId, sourceKey: 'looker' },
clientFactory: { createClient: vi.fn().mockResolvedValue(client) },
now: () => new Date('2026-04-30T12:30:00.000Z'),
});
await expect(readdir(join(stagedDir, 'dashboards'))).resolves.toEqual(['10.json']);
await expect(readdir(join(stagedDir, 'looks'))).resolves.toEqual(['20.json']);
await expect(readFile(join(stagedDir, 'lookml_models.json'), 'utf-8')).resolves.toBe('{\n  "models": []\n}\n');
await expect(readdir(join(stagedDir, 'explores'))).rejects.toMatchObject({ code: 'ENOENT' });
expect(client.getExplore).not.toHaveBeenCalled();
const report = JSON.parse(await readFile(join(stagedDir, 'looker-fetch-report.json'), 'utf-8'));
expect(report).toEqual({
status: 'success',
retryRecommended: false,
skipped: [],
warnings: [
{
rawPath: 'lookml_models.json',
entityType: 'lookml_models',
entityId: null,
severity: 'warning',
statusCode: 403,
message: 'LookML model access denied',
retryRecommended: false,
},
],
});
const chunked = await chunkLookerStagedDir(stagedDir);
expect(chunked.workUnits.map((wu) => wu.unitKey).sort()).toEqual(['looker-dashboard-10', 'looker-look-20']);
expect(chunked.workUnits.flatMap((wu) => wu.dependencyPaths)).not.toContain('explores/b2b/sales_pipeline.json');
});
it('cleans up the Looker client after a successful fetch', async () => {
const client = makeClient();
await fetchLookerRuntimeBundle({
pullConfig: { lookerConnectionId: connectionId },
stagedDir,
ctx: { connectionId, sourceKey: 'looker' },
clientFactory: { createClient: vi.fn().mockResolvedValue(client) },
now: () => new Date('2026-04-30T12:30:00.000Z'),
});
expect(client.cleanup).toHaveBeenCalledTimes(1);
});
it('cleans up the Looker client when fetch throws', async () => {
const client = makeClient();
vi.mocked(client.listDashboards).mockRejectedValue(new Error('Looker API unavailable'));
await expect(
fetchLookerRuntimeBundle({
pullConfig: { lookerConnectionId: connectionId },
stagedDir,
ctx: { connectionId, sourceKey: 'looker' },
clientFactory: { createClient: vi.fn().mockResolvedValue(client) },
now: () => new Date('2026-04-30T12:30:00.000Z'),
}),
).rejects.toThrow('Looker API unavailable');
expect(client.cleanup).toHaveBeenCalledTimes(1);
});
});
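
The incremental-pull tests above pin down the observable cursor behaviour: an entity is refetched when its `updatedAt` is missing or strictly newer than the previous cursor, and a cursor stays pinned when a detail fetch of that entity type was skipped. The helpers that implement this (`shouldFetchEntity`, `maxUpdatedAt`) are not shown in this part of the diff, so the following is only a sketch consistent with those expectations. `shouldRefetch` and `advanceCursor` are hypothetical names, and comparing timestamps with `Date.parse` is an assumption.

```typescript
interface EntityRef {
  id: string;
  updatedAt?: string | null;
}

// Refetch when there is no cursor, no timestamp, or the entity changed after the cursor.
function shouldRefetch(ref: EntityRef, cursor: string | null): boolean {
  if (!cursor || !ref.updatedAt) {
    return true;
  }
  return Date.parse(ref.updatedAt) > Date.parse(cursor);
}

// Advance to the newest updatedAt seen, unless a skip means the same window must be retried.
function advanceCursor(refs: EntityRef[], previous: string | null, hadSkips: boolean): string | null {
  if (hadSkips) {
    return previous;
  }
  let newest = previous;
  for (const ref of refs) {
    if (ref.updatedAt && (!newest || Date.parse(ref.updatedAt) > Date.parse(newest))) {
      newest = ref.updatedAt;
    }
  }
  return newest;
}

const sampleDashboards: EntityRef[] = [
  { id: '10', updatedAt: '2026-04-30T12:00:00.000Z' },
  { id: '11', updatedAt: '2026-04-30T12:10:00.000Z' },
];
const previousCursor = '2026-04-30T12:00:00.000Z';

console.log(sampleDashboards.filter((ref) => shouldRefetch(ref, previousCursor)).map((ref) => ref.id)); // [ '11' ]
console.log(advanceCursor(sampleDashboards, previousCursor, false)); // 2026-04-30T12:10:00.000Z
console.log(advanceCursor(sampleDashboards, previousCursor, true)); // 2026-04-30T12:00:00.000Z
```

Pinning the cursor after a skip trades some repeated work on the next pull for not losing the skipped entity, which matches the `retryRecommended: true` report in the rate-limit test.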

@@ -0,0 +1,555 @@
import { mkdir, writeFile } from 'node:fs/promises';
import { dirname, join } from 'node:path';
import type { ParsedTargetTable } from '../../parsed-target-table.js';
import type { FetchContext } from '../../types.js';
import { writeLookerEvidenceDocuments } from './evidence-documents.js';
import { writeLookerFetchReport } from './fetch-report.js';
import {
type LookerPullConfig,
parseLookerPullConfig,
STAGED_FILES,
type StagedDashboardFile,
type StagedExploreFile,
type StagedFoldersTreeFile,
type StagedGroupFile,
type StagedLookerFetchIssue,
type StagedLookerFetchReport,
type StagedLookerQuery,
type StagedLookerSignalsFile,
type StagedLookFile,
type StagedLookmlModelsFile,
type StagedUserFile,
stagedDashboardFileSchema,
stagedExploreFileSchema,
stagedFoldersTreeFileSchema,
stagedGroupFileSchema,
stagedLookerScopeFileSchema,
stagedLookerSignalsFileSchema,
stagedLookFileSchema,
stagedLookmlModelsFileSchema,
stagedSyncConfigSchema,
stagedUserFileSchema,
} from './types.js';
interface LookerEntityRef {
id: string;
updatedAt?: string | null;
}
export interface LookerRuntimeClient {
listDashboards(): Promise<LookerEntityRef[]>;
getDashboard(id: string): Promise<StagedDashboardFile>;
listLooks(): Promise<LookerEntityRef[]>;
getLook(id: string): Promise<StagedLookFile>;
listFolders(): Promise<StagedFoldersTreeFile>;
listUsers(): Promise<StagedUserFile[]>;
listGroups(): Promise<StagedGroupFile[]>;
listLookmlModels(): Promise<StagedLookmlModelsFile>;
getExplore(modelName: string, exploreName: string): Promise<StagedExploreFile>;
getSignals?(): Promise<StagedLookerSignalsFile>;
cleanup?(): Promise<void>;
}
export interface LookerClientFactory {
createClient(config: LookerPullConfig, ctx: FetchContext): Promise<LookerRuntimeClient> | LookerRuntimeClient;
}
interface ExploreTargetSummary {
targetWarehouseConnectionId: string | null;
targetTable: ParsedTargetTable | null;
}
interface StampedExploreResult {
explore: StagedExploreFile;
targetSummary: ExploreTargetSummary;
}
interface StagedJsonFile<T> {
rawPath: string;
value: T;
}
type ParsedTargetTableFailureReason = Extract<ParsedTargetTable, { ok: false }>['reason'];
interface FetchLookerRuntimeBundleParams {
pullConfig: unknown;
stagedDir: string;
ctx: FetchContext;
clientFactory: LookerClientFactory;
now?: () => Date;
}
export async function fetchLookerRuntimeBundle(params: FetchLookerRuntimeBundleParams): Promise<void> {
const config = parseLookerPullConfig(params.pullConfig);
const connectionId = config.lookerConnectionId ?? params.ctx.connectionId;
const client = await params.clientFactory.createClient(config, params.ctx);
try {
const now = params.now ?? (() => new Date());
const skipped: StagedLookerFetchIssue[] = [];
const warnings: StagedLookerFetchIssue[] = [];
let dashboardFetchHadSkips = false;
let lookFetchHadSkips = false;
const fetchedDashboards: Array<StagedJsonFile<StagedDashboardFile>> = [];
const fetchedLooks: Array<StagedJsonFile<StagedLookFile>> = [];
const previousCursors = {
dashboardsLastSyncedAt: config.dashboardUpdatedSince ?? null,
looksLastSyncedAt: config.lookUpdatedSince ?? null,
};
const dashboards = await client.listDashboards();
const dashboardRawPaths = dashboards.map((dashboardRef) => `dashboards/${safePathSegment(dashboardRef.id)}.json`);
const dashboardsToFetch = dashboards.filter((dashboardRef) =>
shouldFetchEntity(dashboardRef, previousCursors.dashboardsLastSyncedAt),
);
const fetchedRawPaths: string[] = [];
for (const dashboardRef of dashboardsToFetch) {
const rawPath = `dashboards/${safePathSegment(dashboardRef.id)}.json`;
try {
const dashboard = stagedDashboardFileSchema.parse(await client.getDashboard(dashboardRef.id));
const dashboardRawPath = `dashboards/${safePathSegment(dashboard.lookerId)}.json`;
fetchedRawPaths.push(dashboardRawPath);
fetchedDashboards.push({ rawPath: dashboardRawPath, value: dashboard });
} catch (error) {
dashboardFetchHadSkips = true;
skipped.push(issueForFetchError({ rawPath, entityType: 'dashboard', entityId: dashboardRef.id, error }));
}
}
const looks = await client.listLooks();
const lookRawPaths = looks.map((lookRef) => `looks/${safePathSegment(lookRef.id)}.json`);
const looksToFetch = looks.filter((lookRef) => shouldFetchEntity(lookRef, previousCursors.looksLastSyncedAt));
for (const lookRef of looksToFetch) {
const rawPath = `looks/${safePathSegment(lookRef.id)}.json`;
try {
const look = stagedLookFileSchema.parse(await client.getLook(lookRef.id));
const lookRawPath = `looks/${safePathSegment(look.lookerId)}.json`;
fetchedRawPaths.push(lookRawPath);
fetchedLooks.push({ rawPath: lookRawPath, value: look });
} catch (error) {
lookFetchHadSkips = true;
skipped.push(issueForFetchError({ rawPath, entityType: 'look', entityId: lookRef.id, error }));
}
}
const nextCursors = {
dashboardsLastSyncedAt: dashboardFetchHadSkips
? previousCursors.dashboardsLastSyncedAt
: maxUpdatedAt(dashboards, previousCursors.dashboardsLastSyncedAt),
looksLastSyncedAt: lookFetchHadSkips
? previousCursors.looksLastSyncedAt
: maxUpdatedAt(looks, previousCursors.looksLastSyncedAt),
};
const fetchMode =
previousCursors.dashboardsLastSyncedAt || previousCursors.looksLastSyncedAt ? 'incremental' : 'full';
await writeJson(
params.stagedDir,
STAGED_FILES.syncConfig,
stagedSyncConfigSchema.parse({
lookerConnectionId: connectionId,
fetchedAt: now().toISOString(),
...(config.instanceBaseUrl ? { instanceBaseUrl: config.instanceBaseUrl } : {}),
previousCursors,
nextCursors,
}),
);
await writeJson(
params.stagedDir,
STAGED_FILES.scope,
stagedLookerScopeFileSchema.parse({
mode: fetchMode,
knownCurrentRawPaths: [...dashboardRawPaths, ...lookRawPaths].sort(),
fetchedRawPaths: fetchedRawPaths.sort(),
}),
);
const folders = stagedFoldersTreeFileSchema.parse(await client.listFolders());
await writeJson(params.stagedDir, STAGED_FILES.foldersTree, folders);
const users = await client.listUsers();
for (const rawUser of users) {
const user = stagedUserFileSchema.parse(rawUser);
await writeJson(params.stagedDir, `users/${safePathSegment(user.id)}.json`, user);
}
const groups = await client.listGroups();
for (const rawGroup of groups) {
const group = stagedGroupFileSchema.parse(rawGroup);
await writeJson(params.stagedDir, `groups/${safePathSegment(group.id)}.json`, group);
}
let models: StagedLookmlModelsFile;
try {
models = stagedLookmlModelsFileSchema.parse(await client.listLookmlModels());
} catch (error) {
warnings.push(
issueForFetchError({
rawPath: STAGED_FILES.lookmlModels,
entityType: 'lookml_models',
entityId: null,
error,
severity: 'warning',
}),
);
models = stagedLookmlModelsFileSchema.parse({ models: [] });
}
await writeJson(params.stagedDir, STAGED_FILES.lookmlModels, models);
const exploreTargetsByKey = new Map<string, ExploreTargetSummary>();
const stagedExplores: StagedExploreFile[] = [];
for (const model of models.models) {
for (const exploreRef of model.explores) {
const rawPath = `explores/${safePathSegment(model.name)}/${safePathSegment(exploreRef.name)}.json`;
try {
const result = stampExploreWarehouseTarget(await client.getExplore(model.name, exploreRef.name), config);
stagedExplores.push(result.explore);
exploreTargetsByKey.set(exploreKey(result.explore.modelName, result.explore.exploreName), result.targetSummary);
await writeJson(
params.stagedDir,
`explores/${safePathSegment(result.explore.modelName)}/${safePathSegment(result.explore.exploreName)}.json`,
result.explore,
);
} catch (error) {
skipped.push(
issueForFetchError({
rawPath,
entityType: 'explore',
entityId: `${model.name}.${exploreRef.name}`,
error,
}),
);
}
}
}
warnings.push(...warehouseTargetWarnings(stagedExplores));
for (const dashboard of fetchedDashboards) {
await writeJson(params.stagedDir, dashboard.rawPath, stampDashboardQueries(dashboard.value, exploreTargetsByKey));
}
for (const look of fetchedLooks) {
await writeJson(params.stagedDir, look.rawPath, stampLookQuery(look.value, exploreTargetsByKey));
}
let signals: StagedLookerSignalsFile;
try {
signals = stagedLookerSignalsFileSchema.parse(client.getSignals ? await client.getSignals() : {});
} catch (error) {
warnings.push(
issueForFetchError({
rawPath: STAGED_FILES.signals.dashboardUsage,
entityType: 'signals',
entityId: null,
error,
}),
);
signals = stagedLookerSignalsFileSchema.parse({});
}
await writeJson(params.stagedDir, STAGED_FILES.signals.dashboardUsage, signals.dashboardUsage);
await writeJson(params.stagedDir, STAGED_FILES.signals.lookUsage, signals.lookUsage);
await writeJson(params.stagedDir, STAGED_FILES.signals.scheduledPlans, signals.scheduledPlans);
await writeJson(params.stagedDir, STAGED_FILES.signals.favorites, signals.favorites);
await writeLookerEvidenceDocuments(params.stagedDir);
await writeLookerFetchReport(params.stagedDir, buildFetchReport(skipped, warnings));
} finally {
await client.cleanup?.();
}
}
async function writeJson(stagedDir: string, relPath: string, value: unknown): Promise<void> {
const abs = join(stagedDir, relPath);
await mkdir(dirname(abs), { recursive: true });
await writeFile(abs, `${JSON.stringify(value, null, 2)}\n`, 'utf-8');
}
// Rejects any segment outside [a-zA-Z0-9_-] so staged paths cannot contain separators or "..".
function safePathSegment(value: string): string {
if (!/^[a-zA-Z0-9_-]+$/.test(value)) {
throw new Error(`Unsafe Looker staged path segment: ${value}`);
}
return value;
}
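// Fetch when there is no cursor or the entity has no updatedAt; otherwise only when it changed after the cursor.
function shouldFetchEntity(ref: LookerEntityRef, updatedSince: string | null): boolean {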
if (!updatedSince) {
return true;
}
if (!ref.updatedAt) {
return true;
}
return Date.parse(ref.updatedAt) > Date.parse(updatedSince);
}
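// Returns the latest updatedAt across refs (seeded with the fallback cursor) as a normalized ISO string,
// or null when no timestamp is available or the winner does not parse.
function maxUpdatedAt(refs: LookerEntityRef[], fallback: string | null): string | null {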
let max = fallback;
for (const ref of refs) {
if (!ref.updatedAt) {
continue;
}
if (!max || Date.parse(ref.updatedAt) > Date.parse(max)) {
max = ref.updatedAt;
}
}
if (!max) {
return null;
}
const ms = Date.parse(max);
return Number.isNaN(ms) ? null : new Date(ms).toISOString();
}
function stampExploreWarehouseTarget(rawExplore: unknown, config: LookerPullConfig): StampedExploreResult {
const parsed = stagedExploreFileSchema.parse(rawExplore);
const key = exploreKey(parsed.modelName, parsed.exploreName);
const targetWarehouseConnectionId = connectionMappingFor(parsed.connectionName, config);
const targetTable = targetTableFor({
key,
rawSqlTableName: parsed.rawSqlTableName,
targetWarehouseConnectionId,
config,
entityLabel: `Looker explore ${key}`,
});
const explore = stagedExploreFileSchema.parse({
...parsed,
targetWarehouseConnectionId,
targetTable,
joins: parsed.joins.map((join) => ({
...join,
targetTable: join.rawSqlTableName
? targetTableFor({
key: `${key}.${join.name}`,
rawSqlTableName: join.rawSqlTableName,
targetWarehouseConnectionId,
config,
entityLabel: `Looker join ${key}.${join.name}`,
})
: null,
})),
});
return {
explore,
targetSummary: {
targetWarehouseConnectionId: explore.targetWarehouseConnectionId,
targetTable: explore.targetTable,
},
};
}
function connectionMappingFor(connectionName: string | null, config: LookerPullConfig): string | null {
if (!connectionName) {
return null;
}
return config.connectionMappings[connectionName] ?? null;
}
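// Resolves the physical target table for an explore or join:
// - null when there is neither raw SQL nor a mapped connection,
// - a no_connection_mapping failure when the Looker connection is unmapped,
// - the pre-parsed entry from config.parsedTargetTables when one exists,
// - a parse_error failure when raw sql_table_name has no parsed entry.
function targetTableFor(input: {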
key: string;
rawSqlTableName: string | null;
targetWarehouseConnectionId: string | null;
config: LookerPullConfig;
entityLabel: string;
}): ParsedTargetTable | null {
if (!input.rawSqlTableName && !input.targetWarehouseConnectionId) {
return null;
}
if (!input.targetWarehouseConnectionId) {
return {
ok: false,
reason: 'no_connection_mapping',
detail: `${input.entityLabel} has no mapped warehouse connection.`,
};
}
const parsed = input.config.parsedTargetTables[input.key];
if (parsed) {
return parsed;
}
if (!input.rawSqlTableName) {
return null;
}
return {
ok: false,
reason: 'parse_error',
detail: `${input.entityLabel} has raw sql_table_name but no parsedTargetTables entry for key ${input.key}.`,
};
}
function exploreKey(modelName: string, exploreName: string): string {
return `${modelName}.${exploreName}`;
}
function stampQueryWarehouseTarget(
query: StagedLookerQuery | null,
exploreTargetsByKey: Map<string, ExploreTargetSummary>,
): StagedLookerQuery | null {
if (!query) {
return null;
}
const target = exploreTargetsByKey.get(exploreKey(query.model, query.view));
if (!target) {
return query;
}
return {
...query,
targetWarehouseConnectionId: target.targetWarehouseConnectionId,
targetTable: target.targetTable,
};
}
function stampDashboardQueries(
dashboard: StagedDashboardFile,
exploreTargetsByKey: Map<string, ExploreTargetSummary>,
): StagedDashboardFile {
return stagedDashboardFileSchema.parse({
...dashboard,
tiles: dashboard.tiles.map((tile) => ({
...tile,
query: stampQueryWarehouseTarget(tile.query, exploreTargetsByKey),
})),
});
}
function stampLookQuery(look: StagedLookFile, exploreTargetsByKey: Map<string, ExploreTargetSummary>): StagedLookFile {
return stagedLookFileSchema.parse({
...look,
query: stampQueryWarehouseTarget(look.query, exploreTargetsByKey),
});
}
function warehouseTargetWarnings(explores: StagedExploreFile[]): StagedLookerFetchIssue[] {
const unmapped = new Map<string, string[]>();
const warnings: StagedLookerFetchIssue[] = [];
for (const explore of explores) {
const targetTable = explore.targetTable;
if (!targetTable || targetTable.ok) {
continue;
}
const sourceKey = exploreKey(explore.modelName, explore.exploreName);
const lookerConnectionName = explore.connectionName ?? 'missing_connection_name';
if (targetTable.reason === 'no_connection_mapping') {
const existing = unmapped.get(lookerConnectionName) ?? [];
existing.push(sourceKey);
unmapped.set(lookerConnectionName, existing);
continue;
}
warnings.push({
rawPath: `looker_connection_mappings/${safeWarningPathSegment(lookerConnectionName)}`,
entityType: 'looker_connection_mapping',
entityId: explore.connectionName,
severity: 'warning',
statusCode: null,
message: `Looker explore ${sourceKey} has sql_table_name that cannot be mapped to a physical warehouse table: ${targetTable.reason}.`,
retryRecommended: false,
kind: warningKindForReason(targetTable.reason),
details: {
lookerConnectionName,
rawSqlTableName: explore.rawSqlTableName,
reason: targetTable.reason,
},
});
}
for (const [lookerConnectionName, affectedExplores] of [...unmapped.entries()].sort(([a], [b]) =>
a.localeCompare(b),
)) {
const sortedAffectedExplores = [...affectedExplores].sort();
warnings.push({
rawPath: `looker_connection_mappings/${safeWarningPathSegment(lookerConnectionName)}`,
entityType: 'looker_connection_mapping',
entityId: lookerConnectionName === 'missing_connection_name' ? null : lookerConnectionName,
severity: 'warning',
statusCode: null,
message: `Looker connection ${lookerConnectionName} is not mapped to a warehouse connection; ${sortedAffectedExplores.length} explore${sortedAffectedExplores.length === 1 ? '' : 's'} will be wiki-only.`,
retryRecommended: false,
kind: 'unmapped_looker_connection',
details: {
lookerConnectionName,
affectedExplores: sortedAffectedExplores,
},
});
}
return warnings;
}
function warningKindForReason(reason: ParsedTargetTableFailureReason): StagedLookerFetchIssue['kind'] {
if (reason === 'looker_template_unresolved') {
return 'looker_template_unresolved';
}
if (reason === 'derived_table_not_supported') {
return 'derived_table_not_supported';
}
return 'unparseable_sql_table_name';
}
function safeWarningPathSegment(value: string): string {
return value.replace(/[^a-zA-Z0-9_-]+/g, '_');
}
function issueForFetchError(input: {
rawPath: string;
entityType: StagedLookerFetchIssue['entityType'];
entityId: string | null;
error: unknown;
severity?: StagedLookerFetchIssue['severity'];
}): StagedLookerFetchIssue {
const statusCode = errorStatusCode(input.error);
return {
rawPath: input.rawPath,
entityType: input.entityType,
entityId: input.entityId,
severity: input.severity ?? (input.entityType === 'signals' ? 'warning' : 'error'),
statusCode,
message: errorMessage(input.error),
retryRecommended: statusCode === 429,
};
}
function errorMessage(error: unknown): string {
return error instanceof Error ? error.message : String(error);
}
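// Extracts an HTTP status from error.statusCode or error.status (number or numeric string),
// recursing into error.response when neither is set.
function errorStatusCode(error: unknown): number | null {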
if (!error || typeof error !== 'object') {
return null;
}
const record = error as Record<string, unknown>;
const direct = record.statusCode ?? record.status;
if (typeof direct === 'number') {
return direct;
}
if (typeof direct === 'string') {
const parsed = Number(direct);
return Number.isFinite(parsed) ? parsed : null;
}
const response = record.response;
if (response && typeof response === 'object') {
return errorStatusCode(response);
}
return null;
}
// A pull is 'partial' when anything was skipped or any warehouse-target warning was raised.
function buildFetchReport(
skipped: StagedLookerFetchIssue[],
warnings: StagedLookerFetchIssue[],
): StagedLookerFetchReport {
const retryRecommended = [...skipped, ...warnings].some((issue) => issue.retryRecommended);
const hasWarehouseTargetWarnings = warnings.some((issue) => issue.entityType === 'looker_connection_mapping');
return {
status: skipped.length > 0 || hasWarehouseTargetWarnings ? 'partial' : 'success',
retryRecommended,
skipped,
warnings,
};
}

@@ -0,0 +1,53 @@
import type { KtxLocalProject } from '../../../../context/project/project.js';
import type { KtxProjectConnectionConfig } from '../../../../context/project/config.js';
import { type LookerCredentialResolver } from './factory.js';
function stringField(value: unknown): string | null {
return typeof value === 'string' && value.trim().length > 0 ? value.trim() : null;
}
function resolveEnvReference(ref: string, env: NodeJS.ProcessEnv): string | null {
if (!ref.startsWith('env:')) {
return null;
}
return stringField(env[ref.slice('env:'.length)]);
}
export function lookerCredentialsFromLocalConnection(
connectionId: string,
connection: KtxProjectConnectionConfig | undefined,
env: NodeJS.ProcessEnv = process.env,
) {
if (!connection || String(connection.driver).toLowerCase() !== 'looker') {
throw new Error(`Connection "${connectionId}" is not a Looker connection`);
}
const baseUrl = stringField(connection.base_url);
const clientId = stringField(connection.client_id);
const clientSecret =
stringField(connection.client_secret) ??
(stringField(connection.client_secret_ref) ? resolveEnvReference(String(connection.client_secret_ref), env) : null);
if (!baseUrl) {
throw new Error(`Connection "${connectionId}" is missing Looker base_url`);
}
if (!clientId) {
throw new Error(`Connection "${connectionId}" is missing Looker client_id`);
}
if (!clientSecret) {
throw new Error(`Connection "${connectionId}" is missing Looker client_secret or client_secret_ref`);
}
return { base_url: baseUrl, client_id: clientId, client_secret: clientSecret };
}
export function createLocalLookerCredentialResolver(
project: KtxLocalProject,
env: NodeJS.ProcessEnv = process.env,
): LookerCredentialResolver {
return {
async resolve(lookerConnectionId) {
return lookerCredentialsFromLocalConnection(lookerConnectionId, project.config.connections[lookerConnectionId], env);
},
};
}

@@ -0,0 +1,116 @@
import { mkdtemp } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it } from 'vitest';
import { LocalLookerRuntimeStore } from './local-runtime-store.js';
describe('LocalLookerRuntimeStore', () => {
async function store() {
const dir = await mkdtemp(join(tmpdir(), 'ktx-looker-store-'));
return new LocalLookerRuntimeStore({
dbPath: join(dir, 'db.sqlite'),
now: () => new Date('2026-05-05T12:00:00.000Z'),
});
}
it('stores cursors and connection mappings', async () => {
const local = await store();
await local.setCursors('prod-looker', {
dashboardsLastSyncedAt: '2026-05-01T00:00:00.000Z',
looksLastSyncedAt: null,
});
await local.upsertConnectionMapping({
lookerConnectionId: 'prod-looker',
lookerConnectionName: 'bq_reporting',
ktxConnectionId: 'prod-warehouse',
source: 'cli',
});
await expect(local.readCursors('prod-looker')).resolves.toEqual({
dashboardsLastSyncedAt: '2026-05-01T00:00:00.000Z',
looksLastSyncedAt: null,
});
await expect(local.readMappings('prod-looker')).resolves.toEqual([
{
lookerConnectionName: 'bq_reporting',
ktxConnectionId: 'prod-warehouse',
lookerHost: null,
lookerDatabase: null,
lookerDialect: null,
},
]);
});
it('refreshes discovered metadata without dropping local targets', async () => {
const local = await store();
await local.upsertConnectionMapping({
lookerConnectionId: 'prod-looker',
lookerConnectionName: 'bq_reporting',
ktxConnectionId: 'prod-warehouse',
source: 'cli',
});
await local.refreshDiscoveredConnections({
lookerConnectionId: 'prod-looker',
discovered: [
{
name: 'bq_reporting',
host: 'bigquery.googleapis.com',
database: 'analytics',
schema: null,
dialect: 'bigquery_standard_sql',
},
],
});
await expect(local.listConnectionMappings('prod-looker')).resolves.toEqual([
{
lookerConnectionName: 'bq_reporting',
ktxConnectionId: 'prod-warehouse',
lookerHost: 'bigquery.googleapis.com',
lookerDatabase: 'analytics',
lookerDialect: 'bigquery_standard_sql',
source: 'refresh',
},
]);
});
it('applies yaml mapping intent while preserving refresh metadata and cli overrides', async () => {
const local = await store();
await local.refreshDiscoveredConnections({
lookerConnectionId: 'prod-looker',
discovered: [{ name: 'analytics', host: 'looker-db.test', database: 'warehouse', schema: null, dialect: 'postgres' }],
});
await local.upsertConnectionMapping({
lookerConnectionId: 'prod-looker',
lookerConnectionName: 'manual',
ktxConnectionId: 'cli-warehouse',
source: 'cli',
});
await local.applyYamlBootstrap({
lookerConnectionId: 'prod-looker',
mappings: [
{ lookerConnectionName: 'analytics', ktxConnectionId: 'yaml-warehouse' },
{ lookerConnectionName: 'manual', ktxConnectionId: 'yaml-warehouse' },
],
});
await expect(local.listConnectionMappings('prod-looker')).resolves.toMatchObject([
{
lookerConnectionName: 'analytics',
ktxConnectionId: 'yaml-warehouse',
lookerHost: 'looker-db.test',
lookerDatabase: 'warehouse',
lookerDialect: 'postgres',
source: 'ktx.yaml',
},
{
lookerConnectionName: 'manual',
ktxConnectionId: 'cli-warehouse',
source: 'cli',
},
]);
});
});

@@ -0,0 +1,280 @@
import { mkdirSync } from 'node:fs';
import { dirname } from 'node:path';
import Database from 'better-sqlite3';
import type { LookerWarehouseConnectionInfo } from './client.js';
import type { LookerConnectionMapping } from './mapping.js';
import type { LookerRuntimeCursors } from './types.js';
type LocalLookerMappingSource = 'ktx.yaml' | 'cli' | 'refresh';
interface LocalLookerRuntimeStoreOptions {
dbPath: string;
now?: () => Date;
}
export interface LocalLookerConnectionMappingListRow extends LookerConnectionMapping {
source: LocalLookerMappingSource;
}
export interface UpsertLocalLookerConnectionMappingInput {
lookerConnectionId: string;
lookerConnectionName: string;
ktxConnectionId: string | null;
source: LocalLookerMappingSource;
}
interface ApplyLocalLookerYamlBootstrapInput {
lookerConnectionId: string;
mappings: Array<{
lookerConnectionName: string;
ktxConnectionId: string | null;
}>;
}
export interface RefreshLocalLookerDiscoveredConnectionsInput {
lookerConnectionId: string;
discovered: LookerWarehouseConnectionInfo[];
}
export interface ClearLocalLookerMappingsInput {
lookerConnectionId: string;
lookerConnectionName?: string;
}
interface LookerSourceStateReader {
readMappings(lookerConnectionId: string): Promise<LookerConnectionMapping[]>;
readCursors(lookerConnectionId: string): Promise<LookerRuntimeCursors>;
}
export class LocalLookerRuntimeStore implements LookerSourceStateReader {
private readonly db: Database.Database;
private readonly now: () => Date;
constructor(options: LocalLookerRuntimeStoreOptions) {
mkdirSync(dirname(options.dbPath), { recursive: true });
this.db = new Database(options.dbPath);
this.db.pragma('journal_mode = WAL');
this.db.pragma('foreign_keys = ON');
this.now = options.now ?? (() => new Date());
this.db.exec(`
CREATE TABLE IF NOT EXISTS local_looker_runtime_config (
looker_connection_id TEXT PRIMARY KEY,
dashboards_last_synced_at TEXT,
looks_last_synced_at TEXT,
updated_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS local_looker_connection_mappings (
looker_connection_id TEXT NOT NULL,
looker_connection_name TEXT NOT NULL,
ktx_connection_id TEXT,
looker_host TEXT,
looker_database TEXT,
looker_dialect TEXT,
source TEXT NOT NULL,
updated_at TEXT NOT NULL,
PRIMARY KEY (looker_connection_id, looker_connection_name)
);
`);
}
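// Applies ktx.yaml mappings without clobbering local state: inserts missing rows and
// fills only refresh-discovered rows that have no target yet; cli rows and rows with a target are left alone.
async applyYamlBootstrap(input: ApplyLocalLookerYamlBootstrapInput): Promise<void> {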
const timestamp = this.now().toISOString();
const apply = this.db.transaction(() => {
const existing = this.db.prepare(`
SELECT ktx_connection_id, source
FROM local_looker_connection_mappings
WHERE looker_connection_id = ? AND looker_connection_name = ?
`);
const insert = this.db.prepare(`
INSERT INTO local_looker_connection_mappings (
looker_connection_id,
looker_connection_name,
ktx_connection_id,
looker_host,
looker_database,
looker_dialect,
source,
updated_at
)
VALUES (?, ?, ?, NULL, NULL, NULL, 'ktx.yaml', ?)
`);
const updateRefreshRow = this.db.prepare(`
UPDATE local_looker_connection_mappings
SET ktx_connection_id = ?,
source = 'ktx.yaml',
updated_at = ?
WHERE looker_connection_id = ?
AND looker_connection_name = ?
AND source = 'refresh'
AND ktx_connection_id IS NULL
`);
for (const mapping of input.mappings) {
const row = existing.get(input.lookerConnectionId, mapping.lookerConnectionName) as
| { ktx_connection_id: string | null; source: LocalLookerMappingSource }
| undefined;
if (!row) {
insert.run(input.lookerConnectionId, mapping.lookerConnectionName, mapping.ktxConnectionId, timestamp);
continue;
}
if (row.source === 'refresh' && row.ktx_connection_id === null) {
updateRefreshRow.run(mapping.ktxConnectionId, timestamp, input.lookerConnectionId, mapping.lookerConnectionName);
}
}
});
apply();
}
async readCursors(lookerConnectionId: string): Promise<LookerRuntimeCursors> {
const row = this.db
.prepare(
`
SELECT dashboards_last_synced_at, looks_last_synced_at
FROM local_looker_runtime_config
WHERE looker_connection_id = ?
`,
)
.get(lookerConnectionId) as { dashboards_last_synced_at: string | null; looks_last_synced_at: string | null } | undefined;
return {
dashboardsLastSyncedAt: row?.dashboards_last_synced_at ?? null,
looksLastSyncedAt: row?.looks_last_synced_at ?? null,
};
}
async setCursors(lookerConnectionId: string, cursors: LookerRuntimeCursors): Promise<void> {
this.db
.prepare(
`
INSERT INTO local_looker_runtime_config (
looker_connection_id,
dashboards_last_synced_at,
looks_last_synced_at,
updated_at
)
VALUES (?, ?, ?, ?)
ON CONFLICT(looker_connection_id) DO UPDATE SET
dashboards_last_synced_at = excluded.dashboards_last_synced_at,
looks_last_synced_at = excluded.looks_last_synced_at,
updated_at = excluded.updated_at
`,
)
.run(lookerConnectionId, cursors.dashboardsLastSyncedAt, cursors.looksLastSyncedAt, this.now().toISOString());
}
async readMappings(lookerConnectionId: string): Promise<LookerConnectionMapping[]> {
return (await this.listConnectionMappings(lookerConnectionId)).map(({ source: _source, ...mapping }) => mapping);
}
async listConnectionMappings(lookerConnectionId: string): Promise<LocalLookerConnectionMappingListRow[]> {
const rows = this.db
.prepare(
`
SELECT
looker_connection_name,
ktx_connection_id,
looker_host,
looker_database,
looker_dialect,
source
FROM local_looker_connection_mappings
WHERE looker_connection_id = ?
ORDER BY looker_connection_name
`,
)
.all(lookerConnectionId) as Array<{
looker_connection_name: string;
ktx_connection_id: string | null;
looker_host: string | null;
looker_database: string | null;
looker_dialect: string | null;
source: LocalLookerMappingSource;
}>;
return rows.map((row) => ({
lookerConnectionName: row.looker_connection_name,
ktxConnectionId: row.ktx_connection_id,
lookerHost: row.looker_host,
lookerDatabase: row.looker_database,
lookerDialect: row.looker_dialect,
source: row.source,
}));
}
async upsertConnectionMapping(input: UpsertLocalLookerConnectionMappingInput): Promise<void> {
this.db
.prepare(
`
INSERT INTO local_looker_connection_mappings (
looker_connection_id,
looker_connection_name,
ktx_connection_id,
looker_host,
looker_database,
looker_dialect,
source,
updated_at
)
VALUES (?, ?, ?, NULL, NULL, NULL, ?, ?)
ON CONFLICT(looker_connection_id, looker_connection_name) DO UPDATE SET
ktx_connection_id = excluded.ktx_connection_id,
source = excluded.source,
updated_at = excluded.updated_at
`,
)
.run(input.lookerConnectionId, input.lookerConnectionName, input.ktxConnectionId, input.source, this.now().toISOString());
}
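// Upserts discovered Looker connections: new rows get a null target; existing rows keep
// ktx_connection_id but have host, database, dialect, and source overwritten.
async refreshDiscoveredConnections(input: RefreshLocalLookerDiscoveredConnectionsInput): Promise<void> {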
const timestamp = this.now().toISOString();
const update = this.db.transaction(() => {
const upsert = this.db.prepare(`
INSERT INTO local_looker_connection_mappings (
looker_connection_id,
looker_connection_name,
ktx_connection_id,
looker_host,
looker_database,
looker_dialect,
source,
updated_at
)
VALUES (?, ?, NULL, ?, ?, ?, 'refresh', ?)
ON CONFLICT(looker_connection_id, looker_connection_name) DO UPDATE SET
looker_host = excluded.looker_host,
looker_database = excluded.looker_database,
looker_dialect = excluded.looker_dialect,
source = excluded.source,
updated_at = excluded.updated_at
`);
for (const connection of input.discovered) {
upsert.run(
input.lookerConnectionId,
connection.name,
connection.host,
connection.database,
connection.dialect,
timestamp,
);
}
});
update();
}
async clearConnectionMappings(input: ClearLocalLookerMappingsInput): Promise<void> {
if (input.lookerConnectionName) {
this.db
.prepare(
`
DELETE FROM local_looker_connection_mappings
WHERE looker_connection_id = ? AND looker_connection_name = ?
`,
)
.run(input.lookerConnectionId, input.lookerConnectionName);
return;
}
this.db.prepare('DELETE FROM local_looker_connection_mappings WHERE looker_connection_id = ?').run(input.lookerConnectionId);
}
}

@@ -0,0 +1,125 @@
import { mkdir, mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import type { LookerRuntimeClient } from './fetch.js';
import { LookerSourceAdapter } from './looker.adapter.js';
const connectionId = '11111111-1111-4111-8111-111111111111';
function makeClient(): LookerRuntimeClient {
return {
listDashboards: vi.fn().mockResolvedValue([]),
getDashboard: vi.fn(),
listLooks: vi.fn().mockResolvedValue([]),
getLook: vi.fn(),
listFolders: vi.fn().mockResolvedValue({ folders: [] }),
listUsers: vi.fn().mockResolvedValue([]),
listGroups: vi.fn().mockResolvedValue([]),
listLookmlModels: vi.fn().mockResolvedValue({
models: [{ name: 'b2b', label: 'B2B', explores: [{ name: 'sales_pipeline', label: 'Sales Pipeline' }] }],
}),
getExplore: vi.fn().mockResolvedValue({
modelName: 'b2b',
exploreName: 'sales_pipeline',
label: 'Sales Pipeline',
description: null,
fields: { dimensions: [], measures: [] },
joins: [],
}),
};
}
describe('LookerSourceAdapter', () => {
let stagedDir: string;
beforeEach(async () => {
stagedDir = await mkdtemp(join(tmpdir(), 'looker-adapter-'));
});
afterEach(async () => {
await rm(stagedDir, { recursive: true, force: true });
});
it('exposes source="looker" and skillNames=["looker_ingest"]', () => {
const adapter = new LookerSourceAdapter({ clientFactory: { createClient: () => makeClient() } });
expect(adapter.source).toBe('looker');
expect(adapter.skillNames).toEqual(['looker_ingest']);
});
it('enables context evidence indexing and delegates triage signals', async () => {
const adapter = new LookerSourceAdapter({ clientFactory: { createClient: () => makeClient() } });
expect(adapter.evidenceIndexing).toBe('documents');
expect(adapter.triageSupported).toBe(true);
await expect(adapter.getTriageSignals?.(stagedDir, 'looker:dashboard:10')).resolves.toMatchObject({
objectType: 'looker_dashboard',
});
});
it('fetches, detects, and chunks a runtime bundle through the composed adapter', async () => {
const adapter = new LookerSourceAdapter({
clientFactory: { createClient: vi.fn().mockResolvedValue(makeClient()) },
now: () => new Date('2026-04-30T12:30:00.000Z'),
});
await mkdir(stagedDir, { recursive: true });
await adapter.fetch({ lookerConnectionId: connectionId }, stagedDir, { connectionId, sourceKey: 'looker' });
expect(await adapter.detect(stagedDir)).toBe(true);
expect(await readFile(join(stagedDir, 'explores/b2b/sales_pipeline.json'), 'utf-8')).toContain('sales_pipeline');
const result = await adapter.chunk(stagedDir);
expect(result.workUnits.map((wu) => wu.unitKey)).toEqual(['looker-explore-b2b-sales_pipeline']);
});
it('passes pull success notifications to the server callback', async () => {
const onPullSucceeded = vi.fn().mockResolvedValue(undefined);
const adapter = new LookerSourceAdapter({
clientFactory: { createClient: () => makeClient() },
onPullSucceeded,
});
const completedAt = new Date('2026-04-30T12:00:00.000Z');
await adapter.onPullSucceeded({
connectionId,
sourceKey: 'looker',
syncId: 'sync-1',
trigger: 'scheduled_pull',
completedAt,
stagedDir: '/tmp/staged',
});
expect(onPullSucceeded).toHaveBeenCalledWith({
connectionId,
sourceKey: 'looker',
syncId: 'sync-1',
trigger: 'scheduled_pull',
completedAt,
stagedDir: '/tmp/staged',
});
});
it('describes incremental fetch scope from the staged scope file', async () => {
await mkdir(join(stagedDir, 'dashboards'), { recursive: true });
await writeFile(
join(stagedDir, 'looker-scope.json'),
JSON.stringify(
{
mode: 'incremental',
knownCurrentRawPaths: ['dashboards/10.json', 'dashboards/11.json'],
fetchedRawPaths: ['dashboards/11.json'],
},
null,
2,
),
);
const adapter = new LookerSourceAdapter({ clientFactory: { createClient: () => makeClient() } });
const scope = await adapter.describeScope(stagedDir);
expect(scope.isPathInScope('dashboards/10.json')).toBe(false);
expect(scope.isPathInScope('dashboards/11.json')).toBe(true);
expect(scope.isPathInScope('dashboards/12.json')).toBe(true);
});
});

@@ -0,0 +1,70 @@
import type { ChunkResult, DiffSet, FetchContext, IngestTrigger, ScopeDescriptor, SourceAdapter } from '../../types.js';
import { chunkLookerStagedDir } from './chunk.js';
import { detectLookerStagedDir } from './detect.js';
import { getLookerTriageSignals } from './evidence-documents.js';
import { fetchLookerRuntimeBundle, type LookerClientFactory } from './fetch.js';
import { readLookerFetchReport } from './fetch-report.js';
import { describeLookerScope } from './scope.js';
import { listLookerTargetConnectionIds } from './target-connections.js';
interface LookerPullSucceededContext {
connectionId: string;
sourceKey: string;
syncId: string;
trigger: IngestTrigger;
completedAt: Date;
stagedDir: string;
}
export interface LookerSourceAdapterDeps {
clientFactory: LookerClientFactory;
now?: () => Date;
onPullSucceeded?: (ctx: LookerPullSucceededContext) => Promise<void>;
}
export class LookerSourceAdapter implements SourceAdapter {
readonly source = 'looker';
readonly skillNames: string[] = ['looker_ingest'];
readonly evidenceIndexing = 'documents' as const;
readonly triageSupported = true;
constructor(private readonly deps: LookerSourceAdapterDeps) {}
detect(stagedDir: string): Promise<boolean> {
return detectLookerStagedDir(stagedDir);
}
fetch(pullConfig: unknown, stagedDir: string, ctx: FetchContext): Promise<void> {
return fetchLookerRuntimeBundle({
pullConfig,
stagedDir,
ctx,
clientFactory: this.deps.clientFactory,
now: this.deps.now,
});
}
chunk(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult> {
return chunkLookerStagedDir(stagedDir, diffSet);
}
readFetchReport(stagedDir: string) {
return readLookerFetchReport(stagedDir);
}
listTargetConnectionIds(stagedDir: string): Promise<string[]> {
return listLookerTargetConnectionIds(stagedDir);
}
getTriageSignals(stagedDir: string, externalId: string) {
return getLookerTriageSignals(stagedDir, externalId);
}
describeScope(stagedDir: string): Promise<ScopeDescriptor> {
return describeLookerScope(stagedDir);
}
async onPullSucceeded(ctx: LookerPullSucceededContext): Promise<void> {
await this.deps.onPullSucceeded?.(ctx);
}
}

@@ -0,0 +1,384 @@
import { describe, expect, it, vi } from 'vitest';
import type { StagedExploreFile, StagedLookmlModelsFile } from './types.js';
import {
buildLookerPullConfigFromInputs,
collectExploreParseItems,
computeLookerMappingDrift,
discoverLookerConnections,
lookerDialectToConnectionType,
projectParsedIdentifier,
refreshLookerMappingPlaceholders,
sqlglotDialectForConnectionType,
suggestKtxConnectionForLookerConnection,
validateLookerMappings,
validateLookerWarehouseTarget,
} from './mapping.js';
const liveConnections = [
{
name: 'b2b_sandbox_bq',
host: 'warehouse.example.com',
database: 'analytics',
schema: null,
dialect: 'bigquery_standard_sql',
},
{
name: 'pg_runtime',
host: 'pg.internal:5432',
database: 'app',
schema: 'public',
dialect: 'postgres',
},
];
const mappedExplore: StagedExploreFile = {
modelName: 'b2b',
exploreName: 'sales_pipeline',
label: 'Sales Pipeline',
description: null,
rawSqlTableName: 'proj.analytics.opportunities AS opportunities',
connectionName: 'b2b_sandbox_bq',
viewName: 'opportunities',
fields: { dimensions: [], measures: [] },
joins: [
{
name: 'accounts',
type: 'left_outer',
relationship: 'many_to_one',
rawSqlTableName: 'proj.analytics.accounts',
sqlOn: null,
from: null,
targetTable: null,
},
],
targetWarehouseConnectionId: null,
targetTable: null,
};
const models: StagedLookmlModelsFile = {
models: [{ name: 'b2b', label: 'B2B', explores: [{ name: 'sales_pipeline', label: 'Sales Pipeline' }] }],
};
describe('discoverLookerConnections', () => {
it('delegates to the runtime client connection discovery method', async () => {
const client = { listLookerConnections: vi.fn().mockResolvedValue(liveConnections) };
await expect(discoverLookerConnections(client)).resolves.toEqual(liveConnections);
expect(client.listLookerConnections).toHaveBeenCalledTimes(1);
});
});
describe('looker dialect and target validation helpers', () => {
it('maps Looker dialect names to KTX connection types', () => {
expect(lookerDialectToConnectionType('bigquery_standard_sql')).toBe('BIGQUERY');
expect(lookerDialectToConnectionType('postgres')).toBe('POSTGRESQL');
expect(lookerDialectToConnectionType('mssql')).toBe('SQLSERVER');
expect(lookerDialectToConnectionType('unknown')).toBeNull();
});
it('maps supported warehouse connection types to sqlglot dialects', () => {
expect(sqlglotDialectForConnectionType('BIGQUERY')).toBe('bigquery');
expect(sqlglotDialectForConnectionType('POSTGRESQL')).toBe('postgres');
expect(sqlglotDialectForConnectionType('LOOKER')).toBeNull();
});
it('returns a structured failure for unsupported Looker warehouse targets', () => {
expect(validateLookerWarehouseTarget('LOOKER')).toEqual({
ok: false,
reason: 'Connection type LOOKER cannot be used as a Looker warehouse mapping target',
});
});
});
describe('suggestKtxConnectionForLookerConnection', () => {
it('returns the single deterministic target with matching type, host, and database', () => {
expect(
suggestKtxConnectionForLookerConnection({
lookerConnection: liveConnections[1],
candidateConnections: [
{
id: 'wrong-type',
connection_type: 'MYSQL',
connection_params: { host: 'pg.internal', database: 'app' },
},
{
id: 'pg-target',
connection_type: 'POSTGRESQL',
connection_params: { host: 'PG.INTERNAL', database: 'APP' },
},
],
}),
).toBe('pg-target');
});
it('returns null when more than one target matches', () => {
expect(
suggestKtxConnectionForLookerConnection({
lookerConnection: liveConnections[1],
candidateConnections: [
{
id: 'first',
connection_type: 'POSTGRESQL',
connection_params: { host: 'pg.internal', database: 'app' },
},
{
id: 'second',
connection_type: 'POSTGRESQL',
connection_params: { host: 'pg.internal:5432', database: 'APP' },
},
],
}),
).toBeNull();
});
});
describe('refreshLookerMappingPlaceholders', () => {
it('adds newly discovered placeholders and refreshes live metadata without dropping saved targets', () => {
expect(
refreshLookerMappingPlaceholders({
stored: [
{
lookerConnectionName: 'b2b_sandbox_bq',
ktxConnectionId: 'warehouse',
lookerHost: null,
lookerDatabase: null,
lookerDialect: null,
},
],
live: liveConnections,
}),
).toEqual({
changed: true,
mappings: [
{
lookerConnectionName: 'b2b_sandbox_bq',
ktxConnectionId: 'warehouse',
lookerHost: 'warehouse.example.com',
lookerDatabase: 'analytics',
lookerDialect: 'bigquery_standard_sql',
},
{
lookerConnectionName: 'pg_runtime',
ktxConnectionId: null,
lookerHost: 'pg.internal:5432',
lookerDatabase: 'app',
lookerDialect: 'postgres',
},
],
});
});
});
describe('computeLookerMappingDrift and validateLookerMappings', () => {
it('reports unmapped live connections, stale stored mappings, and in-sync mappings', () => {
expect(
computeLookerMappingDrift({
storedMappings: [
{
lookerConnectionName: 'b2b_sandbox_bq',
ktxConnectionId: 'warehouse',
lookerHost: null,
lookerDatabase: null,
lookerDialect: null,
},
{
lookerConnectionName: 'stale_runtime',
ktxConnectionId: 'warehouse',
lookerHost: null,
lookerDatabase: null,
lookerDialect: null,
},
],
discovered: liveConnections,
}),
).toEqual({
unmappedDiscovered: [liveConnections[1]],
staleMappings: [{ lookerConnectionName: 'stale_runtime', reason: 'looker_connection_not_found' }],
inSync: [{ lookerConnectionName: 'b2b_sandbox_bq', ktxConnectionId: 'warehouse' }],
});
});
it('validates missing and unsupported target connection ids', () => {
expect(
validateLookerMappings({
mappings: [
{
lookerConnectionName: 'b2b_sandbox_bq',
ktxConnectionId: 'missing',
lookerHost: null,
lookerDatabase: null,
lookerDialect: null,
},
{
lookerConnectionName: 'pg_runtime',
ktxConnectionId: 'looker-target',
lookerHost: null,
lookerDatabase: null,
lookerDialect: null,
},
],
knownKtxConnectionIds: new Set(['looker-target']),
knownConnectionTypes: new Map([['looker-target', 'LOOKER']]),
}),
).toEqual({
ok: false,
errors: [
{ key: 'b2b_sandbox_bq', reason: 'KTX connection missing does not exist' },
{
key: 'pg_runtime',
reason: 'Connection type LOOKER cannot be used as a Looker warehouse mapping target',
},
],
});
});
});
describe('collectExploreParseItems and projectParsedIdentifier', () => {
it('collects base explore and join parser inputs for mapped explores', () => {
expect(
collectExploreParseItems({
explore: mappedExplore,
connectionMappings: { b2b_sandbox_bq: 'warehouse' },
targetConnections: new Map([['warehouse', { id: 'warehouse', connection_type: 'BIGQUERY' }]]),
}),
).toEqual({
parsedTargetTables: {},
parseItems: [
{
key: 'b2b.sales_pipeline',
sql_table_name: 'proj.analytics.opportunities AS opportunities',
dialect: 'bigquery',
},
{
key: 'b2b.sales_pipeline.accounts',
sql_table_name: 'proj.analytics.accounts',
dialect: 'bigquery',
},
],
});
});
it('projects successful and failed parser rows into KTX parsed target tables', () => {
expect(
projectParsedIdentifier({
ok: true,
catalog: 'proj',
schema: 'analytics',
name: 'accounts',
canonical_table: 'proj.analytics.accounts',
}),
).toEqual({
ok: true,
catalog: 'proj',
schema: 'analytics',
name: 'accounts',
canonicalTable: 'proj.analytics.accounts',
});
expect(projectParsedIdentifier({ ok: false, reason: 'derived_table_not_supported' })).toEqual({
ok: false,
reason: 'derived_table_not_supported',
});
});
});
describe('buildLookerPullConfigFromInputs', () => {
it('builds the hosted-equivalent Looker pull config from caller-loaded inputs', async () => {
const parser = {
parse: vi.fn().mockResolvedValue({
'b2b.sales_pipeline': {
ok: true,
catalog: 'proj',
schema: 'analytics',
name: 'opportunities',
canonical_table: 'proj.analytics.opportunities',
},
'b2b.sales_pipeline.accounts': {
ok: true,
catalog: 'proj',
schema: 'analytics',
name: 'accounts',
canonical_table: 'proj.analytics.accounts',
},
}),
};
const client = {
listLookmlModels: vi.fn().mockResolvedValue(models),
getExplore: vi.fn().mockResolvedValue(mappedExplore),
};
await expect(
buildLookerPullConfigFromInputs({
lookerConnectionId: 'prod-looker',
cursors: {
dashboardsLastSyncedAt: '2026-05-01T00:00:00.000Z',
looksLastSyncedAt: null,
},
refreshedMappings: [
{
lookerConnectionName: 'b2b_sandbox_bq',
ktxConnectionId: 'warehouse',
lookerHost: 'warehouse.example.com',
lookerDatabase: 'analytics',
lookerDialect: 'bigquery_standard_sql',
},
],
targetConnections: new Map([['warehouse', { id: 'warehouse', connection_type: 'BIGQUERY' }]]),
client,
parser,
}),
).resolves.toEqual({
lookerConnectionId: 'prod-looker',
dashboardUpdatedSince: '2026-05-01T00:00:00.000Z',
lookUpdatedSince: null,
connectionMappings: { b2b_sandbox_bq: 'warehouse' },
connectionTypes: { b2b_sandbox_bq: 'BIGQUERY' },
parsedTargetTables: {
'b2b.sales_pipeline': {
ok: true,
catalog: 'proj',
schema: 'analytics',
name: 'opportunities',
canonicalTable: 'proj.analytics.opportunities',
},
'b2b.sales_pipeline.accounts': {
ok: true,
catalog: 'proj',
schema: 'analytics',
name: 'accounts',
canonicalTable: 'proj.analytics.accounts',
},
},
});
});
it('marks parser failures as parse_error without blocking pull-config construction', async () => {
const parser = { parse: vi.fn().mockRejectedValue(new Error('python unavailable')) };
const client = {
listLookmlModels: vi.fn().mockResolvedValue(models),
getExplore: vi.fn().mockResolvedValue(mappedExplore),
};
const config = await buildLookerPullConfigFromInputs({
lookerConnectionId: 'prod-looker',
cursors: { dashboardsLastSyncedAt: null, looksLastSyncedAt: null },
refreshedMappings: [
{
lookerConnectionName: 'b2b_sandbox_bq',
ktxConnectionId: 'warehouse',
lookerHost: null,
lookerDatabase: null,
lookerDialect: null,
},
],
targetConnections: new Map([['warehouse', { id: 'warehouse', connection_type: 'BIGQUERY' }]]),
client,
parser,
});
expect(config.parsedTargetTables).toMatchObject({
'b2b.sales_pipeline': { ok: false, reason: 'parse_error' },
'b2b.sales_pipeline.accounts': { ok: false, reason: 'parse_error' },
});
});
});

@ -0,0 +1,446 @@
import type { ParsedTargetTable } from '../../parsed-target-table.js';
import type { LookerWarehouseConnectionInfo } from './client.js';
import type { LookerPullConfig, LookerRuntimeCursors, StagedExploreFile, StagedLookmlModelsFile } from './types.js';
const LOOKER_DIALECT_TO_CONNECTION_TYPE = {
bigquery: 'BIGQUERY',
bigquery_standard_sql: 'BIGQUERY',
snowflake: 'SNOWFLAKE',
postgres: 'POSTGRESQL',
postgresql: 'POSTGRESQL',
mysql: 'MYSQL',
sqlite: 'SQLITE',
sqlserver: 'SQLSERVER',
mssql: 'SQLSERVER',
tsql: 'SQLSERVER',
clickhouse: 'CLICKHOUSE',
} as const;
/** @internal */
export type LookerWarehouseTargetConnectionType =
(typeof LOOKER_DIALECT_TO_CONNECTION_TYPE)[keyof typeof LOOKER_DIALECT_TO_CONNECTION_TYPE];
export interface LookerConnectionMapping {
lookerConnectionName: string;
ktxConnectionId: string | null;
lookerHost: string | null;
lookerDatabase: string | null;
lookerDialect: string | null;
}
export interface LookerTargetConnection {
id: string;
connection_type: string;
connection_params?: Record<string, unknown> | null;
}
/** @internal */
export interface LookerMappingCandidateConnection extends LookerTargetConnection {}
export interface LookerMappingDrift {
unmappedDiscovered: LookerWarehouseConnectionInfo[];
staleMappings: Array<{ lookerConnectionName: string; reason: 'looker_connection_not_found' }>;
inSync: Array<{ lookerConnectionName: string; ktxConnectionId: string }>;
}
export type LookerMappingValidationResult =
| { ok: true }
| { ok: false; errors: Array<{ key: string; reason: string }> };
export interface LookerTableIdentifierParseItem {
key: string;
sql_table_name: string;
dialect: string;
}
type ParsedTargetTableFailureReason = Extract<ParsedTargetTable, { ok: false }>['reason'];
export interface LookerParsedIdentifier {
ok: boolean;
catalog?: string | null;
schema?: string | null;
name?: string | null;
canonical_table?: string | null;
reason?: ParsedTargetTableFailureReason | null;
detail?: string | null;
}
export interface LookerTableIdentifierParser {
parse(items: LookerTableIdentifierParseItem[]): Promise<Record<string, LookerParsedIdentifier>>;
}
export interface LookerMappingClient {
listLookerConnections(): Promise<LookerWarehouseConnectionInfo[]>;
listLookmlModels(): Promise<StagedLookmlModelsFile>;
getExplore(modelName: string, exploreName: string): Promise<StagedExploreFile>;
}
const SQLGLOT_DIALECT_BY_CONNECTION_TYPE: Partial<Record<LookerWarehouseTargetConnectionType, string>> = {
BIGQUERY: 'bigquery',
SNOWFLAKE: 'snowflake',
POSTGRESQL: 'postgres',
MYSQL: 'mysql',
SQLITE: 'sqlite',
SQLSERVER: 'tsql',
CLICKHOUSE: 'clickhouse',
};
export async function discoverLookerConnections(
client: Pick<LookerMappingClient, 'listLookerConnections'>,
): Promise<LookerWarehouseConnectionInfo[]> {
return client.listLookerConnections();
}
/** @internal */
export function lookerDialectToConnectionType(dialect: string | null): LookerWarehouseTargetConnectionType | null {
if (!dialect) {
return null;
}
return (
LOOKER_DIALECT_TO_CONNECTION_TYPE[dialect.toLowerCase() as keyof typeof LOOKER_DIALECT_TO_CONNECTION_TYPE] ?? null
);
}
/** @internal */
export function sqlglotDialectForConnectionType(connectionType: string): string | null {
return SQLGLOT_DIALECT_BY_CONNECTION_TYPE[connectionType as LookerWarehouseTargetConnectionType] ?? null;
}
/** @internal */
export function validateLookerWarehouseTarget(connectionType: string): { ok: true } | { ok: false; reason: string } {
return sqlglotDialectForConnectionType(connectionType)
? { ok: true }
: {
ok: false,
reason: `Connection type ${connectionType} cannot be used as a Looker warehouse mapping target`,
};
}
function extractWarehouseHost(params: unknown, connectionType: string): string | null {
const record = isRecord(params) ? params : {};
switch (connectionType) {
case 'POSTGRESQL':
case 'SQLSERVER':
case 'MYSQL':
case 'CLICKHOUSE':
return readString(record, 'host');
case 'SNOWFLAKE':
return readString(record, 'account');
default:
return null;
}
}
function extractWarehouseDatabase(params: unknown, connectionType: string): string | null {
const record = isRecord(params) ? params : {};
switch (connectionType) {
case 'POSTGRESQL':
case 'SQLSERVER':
case 'MYSQL':
case 'CLICKHOUSE':
case 'SNOWFLAKE':
return readString(record, 'database');
case 'BIGQUERY':
return readString(record, 'dataset_id');
default:
return null;
}
}
// Lowercase and strip a trailing ":port" so "PG.INTERNAL" and "pg.internal:5432" compare equal.
function normalizeHost(value: string | null): string | null {
return value ? value.toLowerCase().replace(/:\d+$/, '') : null;
}
function normalizeName(value: string | null): string | null {
return value ? value.toLowerCase() : null;
}
/** @internal */
export function suggestKtxConnectionForLookerConnection(args: {
lookerConnection: LookerWarehouseConnectionInfo;
candidateConnections: LookerMappingCandidateConnection[];
}): string | null {
const expectedType = lookerDialectToConnectionType(args.lookerConnection.dialect);
if (!expectedType || !args.lookerConnection.host || !args.lookerConnection.database || !args.lookerConnection.dialect) {
return null;
}
const matches = args.candidateConnections.filter((connection) => {
if (connection.connection_type !== expectedType) {
return false;
}
return (
normalizeHost(extractWarehouseHost(connection.connection_params, connection.connection_type)) ===
normalizeHost(args.lookerConnection.host) &&
normalizeName(extractWarehouseDatabase(connection.connection_params, connection.connection_type)) ===
normalizeName(args.lookerConnection.database)
);
});
// Suggest a target only when exactly one candidate matches; ambiguous matches return null.
return matches.length === 1 ? matches[0].id : null;
}
export function computeLookerMappingDrift(args: {
storedMappings: LookerConnectionMapping[];
discovered: LookerWarehouseConnectionInfo[];
}): LookerMappingDrift {
const discoveredByName = new Map(args.discovered.map((connection) => [connection.name, connection]));
const storedByName = new Map(args.storedMappings.map((mapping) => [mapping.lookerConnectionName, mapping]));
return {
unmappedDiscovered: args.discovered.filter((connection) => !storedByName.get(connection.name)?.ktxConnectionId),
staleMappings: args.storedMappings
.filter((mapping) => !discoveredByName.has(mapping.lookerConnectionName))
.map((mapping) => ({
lookerConnectionName: mapping.lookerConnectionName,
reason: 'looker_connection_not_found' as const,
})),
inSync: args.storedMappings
.filter((mapping) => discoveredByName.has(mapping.lookerConnectionName) && mapping.ktxConnectionId)
.map((mapping) => ({
lookerConnectionName: mapping.lookerConnectionName,
ktxConnectionId: mapping.ktxConnectionId as string,
})),
};
}
export function validateLookerMappings(args: {
mappings: LookerConnectionMapping[];
knownKtxConnectionIds: Set<string>;
knownConnectionTypes: ReadonlyMap<string, string>;
}): LookerMappingValidationResult {
const errors: Array<{ key: string; reason: string }> = [];
for (const mapping of args.mappings) {
if (!mapping.ktxConnectionId) {
continue;
}
if (!args.knownKtxConnectionIds.has(mapping.ktxConnectionId)) {
errors.push({
key: mapping.lookerConnectionName,
reason: `KTX connection ${mapping.ktxConnectionId} does not exist`,
});
continue;
}
const connectionType = args.knownConnectionTypes.get(mapping.ktxConnectionId);
const validation = validateLookerWarehouseTarget(connectionType ?? 'unknown');
if (!validation.ok) {
errors.push({ key: mapping.lookerConnectionName, reason: validation.reason });
}
}
return errors.length === 0 ? { ok: true } : { ok: false, errors };
}
/** @internal */
export function refreshLookerMappingPlaceholders(args: {
stored: LookerConnectionMapping[];
live: LookerWarehouseConnectionInfo[];
}): { mappings: LookerConnectionMapping[]; changed: boolean } {
const byName = new Map(args.stored.map((mapping) => [mapping.lookerConnectionName, mapping]));
let changed = false;
for (const live of args.live) {
const existing = byName.get(live.name);
if (!existing) {
byName.set(live.name, {
lookerConnectionName: live.name,
ktxConnectionId: null,
lookerHost: live.host,
lookerDatabase: live.database,
lookerDialect: live.dialect,
});
changed = true;
continue;
}
const refreshed: LookerConnectionMapping = {
...existing,
lookerHost: live.host,
lookerDatabase: live.database,
lookerDialect: live.dialect,
};
if (
refreshed.lookerHost !== existing.lookerHost ||
refreshed.lookerDatabase !== existing.lookerDatabase ||
refreshed.lookerDialect !== existing.lookerDialect
) {
byName.set(live.name, refreshed);
changed = true;
}
}
return { mappings: [...byName.values()], changed };
}
/** @internal */
export function collectExploreParseItems(args: {
explore: StagedExploreFile;
connectionMappings: Record<string, string>;
targetConnections: ReadonlyMap<string, Pick<LookerTargetConnection, 'id' | 'connection_type'>>;
}): { parsedTargetTables: Record<string, ParsedTargetTable>; parseItems: LookerTableIdentifierParseItem[] } {
const parsedTargetTables: Record<string, ParsedTargetTable> = {};
const parseItems: LookerTableIdentifierParseItem[] = [];
const lookerConnectionName = args.explore.connectionName;
const targetConnectionId = lookerConnectionName ? args.connectionMappings[lookerConnectionName] : undefined;
if (!lookerConnectionName || !targetConnectionId) {
return { parsedTargetTables, parseItems };
}
const targetConnection = args.targetConnections.get(targetConnectionId);
const dialect = targetConnection ? sqlglotDialectForConnectionType(targetConnection.connection_type) : null;
const key = `${args.explore.modelName}.${args.explore.exploreName}`;
if (!dialect) {
parsedTargetTables[key] = {
ok: false,
reason: 'unsupported_dialect',
detail: `Connection type ${targetConnection?.connection_type ?? 'unknown'} does not map to a supported sqlglot dialect.`,
};
return { parsedTargetTables, parseItems };
}
if (args.explore.rawSqlTableName) {
parseItems.push({ key, sql_table_name: args.explore.rawSqlTableName, dialect });
}
for (const join of args.explore.joins) {
if (!join.rawSqlTableName) {
continue;
}
parseItems.push({
key: `${key}.${join.name}`,
sql_table_name: join.rawSqlTableName,
dialect,
});
}
return { parsedTargetTables, parseItems };
}
/** @internal */
export function projectParsedIdentifier(row: LookerParsedIdentifier | undefined): ParsedTargetTable {
if (!row) {
return { ok: false, reason: 'parse_error', detail: 'Python parser response was missing this key.' };
}
if (row.ok && row.name && row.canonical_table) {
return {
ok: true,
catalog: row.catalog ?? null,
schema: row.schema ?? null,
name: row.name,
canonicalTable: row.canonical_table,
};
}
return {
ok: false,
reason: row.reason ?? 'parse_error',
detail: row.reason ? undefined : 'Python parser returned an invalid success row without name or canonical_table.',
};
}
export async function buildLookerPullConfigFromInputs(args: {
lookerConnectionId: string;
cursors: LookerRuntimeCursors;
refreshedMappings: LookerConnectionMapping[];
targetConnections: ReadonlyMap<string, Pick<LookerTargetConnection, 'id' | 'connection_type'>>;
client: Pick<LookerMappingClient, 'listLookmlModels' | 'getExplore'>;
parser: LookerTableIdentifierParser;
}): Promise<LookerPullConfig> {
const connectionMappings: Record<string, string> = {};
const connectionTypes: Record<string, LookerWarehouseTargetConnectionType> = {};
for (const mapping of args.refreshedMappings) {
if (!mapping.ktxConnectionId) {
continue;
}
const target = args.targetConnections.get(mapping.ktxConnectionId);
if (!target || !validateLookerWarehouseTarget(target.connection_type).ok) {
continue;
}
connectionMappings[mapping.lookerConnectionName] = mapping.ktxConnectionId;
connectionTypes[mapping.lookerConnectionName] = target.connection_type as LookerWarehouseTargetConnectionType;
}
const parsedTargetTables = await parseExploreTargets({
client: args.client,
connectionMappings,
targetConnections: args.targetConnections,
parser: args.parser,
});
return {
lookerConnectionId: args.lookerConnectionId,
dashboardUpdatedSince: args.cursors.dashboardsLastSyncedAt,
lookUpdatedSince: args.cursors.looksLastSyncedAt,
connectionMappings,
connectionTypes,
parsedTargetTables,
};
}
async function parseExploreTargets(args: {
client: Pick<LookerMappingClient, 'listLookmlModels' | 'getExplore'>;
connectionMappings: Record<string, string>;
targetConnections: ReadonlyMap<string, Pick<LookerTargetConnection, 'id' | 'connection_type'>>;
parser: LookerTableIdentifierParser;
}): Promise<Record<string, ParsedTargetTable>> {
const parsedTargetTables: Record<string, ParsedTargetTable> = {};
const parseItems: LookerTableIdentifierParseItem[] = [];
let models: StagedLookmlModelsFile;
try {
models = await args.client.listLookmlModels();
} catch {
return parsedTargetTables;
}
for (const model of models.models) {
for (const exploreRef of model.explores) {
let explore: StagedExploreFile;
try {
explore = await args.client.getExplore(model.name, exploreRef.name);
} catch {
continue;
}
const collected = collectExploreParseItems({
explore,
connectionMappings: args.connectionMappings,
targetConnections: args.targetConnections,
});
Object.assign(parsedTargetTables, collected.parsedTargetTables);
parseItems.push(...collected.parseItems);
}
}
if (parseItems.length === 0) {
return parsedTargetTables;
}
let results: Record<string, LookerParsedIdentifier>;
try {
results = await args.parser.parse(parseItems);
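} catch {
// A parser failure marks every collected item as parse_error instead of aborting pull-config construction.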
for (const item of parseItems) {
parsedTargetTables[item.key] = {
ok: false,
reason: 'parse_error',
detail: 'Python parse-table-identifier failed during Looker pull-config projection.',
};
}
return parsedTargetTables;
}
for (const item of parseItems) {
parsedTargetTables[item.key] = projectParsedIdentifier(results[item.key]);
}
return parsedTargetTables;
}
function isRecord(value: unknown): value is Record<string, unknown> {
return value !== null && typeof value === 'object' && !Array.isArray(value);
}
function readString(record: Record<string, unknown>, key: string): string | null {
const value = record[key];
return typeof value === 'string' ? value : null;
}
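
A minimal standalone sketch of the matching rule behind `suggestKtxConnectionForLookerConnection`, assuming simplified candidate records. The `Candidate` shape and the `pickUniqueTarget` helper are hypothetical and not part of this change; only the host/name normalization and the exactly-one-match rule are taken from the code above. The connection-type check is left out for brevity.

```typescript
// Hypothetical, simplified candidate shape (not the real LookerTargetConnection).
interface Candidate {
  id: string;
  host: string | null;
  database: string | null;
}

// Same normalization as the diff: lowercase, then drop a trailing ":port".
function normalizeHost(value: string | null): string | null {
  return value ? value.toLowerCase().replace(/:\d+$/, '') : null;
}

function normalizeName(value: string | null): string | null {
  return value ? value.toLowerCase() : null;
}

// Hypothetical helper: returns an id only when exactly one candidate matches.
function pickUniqueTarget(candidates: Candidate[], host: string, database: string): string | null {
  const matches = candidates.filter(
    (candidate) =>
      normalizeHost(candidate.host) === normalizeHost(host) &&
      normalizeName(candidate.database) === normalizeName(database),
  );
  const [only] = matches;
  return matches.length === 1 && only ? only.id : null;
}

// One candidate matches after normalization.
const single = pickUniqueTarget(
  [
    { id: 'pg-target', host: 'PG.INTERNAL', database: 'APP' },
    { id: 'other', host: 'db.example.com', database: 'app' },
  ],
  'pg.internal:5432',
  'app',
);

// Two candidates match, so no suggestion is made.
const ambiguous = pickUniqueTarget(
  [
    { id: 'first', host: 'pg.internal', database: 'app' },
    { id: 'second', host: 'pg.internal:5432', database: 'APP' },
  ],
  'pg.internal:5432',
  'app',
);

console.log(single, ambiguous); // prints: pg-target null
```

Returning `null` on ambiguity keeps the suggestion deterministic: a caller either gets a single safe default or has to choose explicitly.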

View file

@ -0,0 +1,13 @@
import { describe, expect, it } from 'vitest';
import { buildLookerReconcileNotes } from './reconcile.js';
describe('buildLookerReconcileNotes', () => {
it('instructs reconciliation to record subsumed provenance', () => {
expect(buildLookerReconcileNotes()).toEqual([
[
'Looker runtime API-derived SL sources use looker__<model>__<explore>.',
'If the unprefixed file-adapter source <model>__<explore> exists, prefer it in wiki sl_refs, delete or avoid the API-derived source, and call emit_artifact_resolution with actionType="subsumed" for the API raw explore path.',
].join(' '),
]);
});
});

@ -0,0 +1,8 @@
export function buildLookerReconcileNotes(): string[] {
return [
[
'Looker runtime API-derived SL sources use looker__<model>__<explore>.',
'If the unprefixed file-adapter source <model>__<explore> exists, prefer it in wiki sl_refs, delete or avoid the API-derived source, and call emit_artifact_resolution with actionType="subsumed" for the API raw explore path.',
].join(' '),
];
}

@ -0,0 +1,101 @@
import { mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { describeLookerScope, hashLookerScope, isPathInLookerScope } from './scope.js';
async function writeJson(stagedDir: string, relPath: string, value: unknown): Promise<void> {
const abs = join(stagedDir, relPath);
await mkdir(join(abs, '..'), { recursive: true });
await writeFile(abs, `${JSON.stringify(value, null, 2)}\n`, 'utf-8');
}
describe('Looker runtime fetch scope', () => {
let stagedDir: string;
beforeEach(async () => {
stagedDir = await mkdtemp(join(tmpdir(), 'looker-scope-'));
});
afterEach(async () => {
await rm(stagedDir, { recursive: true, force: true });
});
it('keeps omitted known-current entity files out of the deletion baseline', () => {
const scope = {
mode: 'incremental' as const,
knownCurrentRawPaths: ['dashboards/10.json', 'dashboards/11.json', 'looks/20.json'],
fetchedRawPaths: ['dashboards/11.json'],
};
expect(isPathInLookerScope('dashboards/10.json', scope)).toBe(false);
expect(isPathInLookerScope('looks/20.json', scope)).toBe(false);
expect(isPathInLookerScope('dashboards/11.json', scope)).toBe(true);
expect(isPathInLookerScope('looks/21.json', scope)).toBe(true);
expect(isPathInLookerScope('signals/dashboard_usage.json', scope)).toBe(true);
expect(isPathInLookerScope('explores/b2b/sales_pipeline.json', scope)).toBe(true);
});
it('keeps omitted unchanged evidence documents out of incremental delete scope', () => {
const scope = {
mode: 'incremental' as const,
knownCurrentRawPaths: ['dashboards/10.json', 'looks/20.json'],
fetchedRawPaths: ['dashboards/10.json'],
};
expect(isPathInLookerScope('evidence/dashboards/10/page.md', scope)).toBe(true);
expect(isPathInLookerScope('evidence/dashboards/10/metadata.json', scope)).toBe(true);
expect(isPathInLookerScope('evidence/looks/20/page.md', scope)).toBe(false);
expect(isPathInLookerScope('evidence/looks/20/metadata.json', scope)).toBe(false);
});
it('treats full scope as all raw paths in scope', () => {
const scope = {
mode: 'full' as const,
knownCurrentRawPaths: ['dashboards/10.json'],
fetchedRawPaths: ['dashboards/10.json'],
};
expect(isPathInLookerScope('dashboards/10.json', scope)).toBe(true);
expect(isPathInLookerScope('dashboards/99.json', scope)).toBe(true);
expect(isPathInLookerScope('looks/20.json', scope)).toBe(true);
});
it('hashes scope order-insensitively', () => {
const a = hashLookerScope({
mode: 'incremental',
knownCurrentRawPaths: ['looks/20.json', 'dashboards/10.json'],
fetchedRawPaths: ['dashboards/10.json'],
});
const b = hashLookerScope({
mode: 'incremental',
knownCurrentRawPaths: ['dashboards/10.json', 'looks/20.json'],
fetchedRawPaths: ['dashboards/10.json'],
});
expect(a).toBe(b);
expect(a).toMatch(/^[0-9a-f]{64}$/);
});
it('reads staged scope and returns a SourceAdapter ScopeDescriptor', async () => {
await writeJson(stagedDir, 'looker-scope.json', {
mode: 'incremental',
knownCurrentRawPaths: ['dashboards/10.json', 'looks/20.json'],
fetchedRawPaths: ['dashboards/10.json'],
});
const descriptor = await describeLookerScope(stagedDir);
expect(descriptor.fingerprint).toMatch(/^[0-9a-f]{64}$/);
expect(descriptor.isPathInScope('dashboards/10.json')).toBe(true);
expect(descriptor.isPathInScope('looks/20.json')).toBe(false);
expect(descriptor.isPathInScope('looks/99.json')).toBe(true);
});
it('falls back to full scope when old fixtures do not have a scope file', async () => {
const descriptor = await describeLookerScope(stagedDir);
expect(descriptor.isPathInScope('dashboards/10.json')).toBe(true);
expect(descriptor.isPathInScope('looks/20.json')).toBe(true);
});
});

@ -0,0 +1,65 @@
import { createHash } from 'node:crypto';
import { readFile } from 'node:fs/promises';
import { join } from 'node:path';
import type { ScopeDescriptor } from '../../types.js';
import { STAGED_FILES, type StagedLookerScopeFile, stagedLookerScopeFileSchema } from './types.js';
const LOOKER_ENTITY_PATH_RE = /^(dashboards|looks)\/[^/]+\.json$/;
const LOOKER_EVIDENCE_ENTITY_PATH_RE = /^evidence\/(dashboards|looks)\/([^/]+)\/(?:metadata\.json|page\.md)$/;
export async function describeLookerScope(stagedDir: string): Promise<ScopeDescriptor> {
const scope = await readLookerScope(stagedDir);
return {
fingerprint: hashLookerScope(scope),
isPathInScope: (rawPath) => isPathInLookerScope(rawPath, scope),
};
}
async function readLookerScope(stagedDir: string): Promise<StagedLookerScopeFile> {
try {
const body = await readFile(join(stagedDir, STAGED_FILES.scope), 'utf-8');
return stagedLookerScopeFileSchema.parse(JSON.parse(body));
} catch (error) {
// A missing scope file (for example, older staged fixtures) falls back to full scope.
if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
return { mode: 'full', knownCurrentRawPaths: [], fetchedRawPaths: [] };
}
throw error;
}
}
/** @internal */
export function hashLookerScope(scope: StagedLookerScopeFile): string {
const canonical = JSON.stringify({
mode: scope.mode,
knownCurrentRawPaths: [...scope.knownCurrentRawPaths].sort(),
fetchedRawPaths: [...scope.fetchedRawPaths].sort(),
});
return createHash('sha256').update(canonical).digest('hex');
}
/** @internal */
export function isPathInLookerScope(rawPath: string, scope: StagedLookerScopeFile): boolean {
if (scope.mode === 'full') {
return true;
}
const entityRawPath = scopedEntityRawPath(rawPath);
if (!entityRawPath) {
return true;
}
const knownCurrent = new Set(scope.knownCurrentRawPaths);
const fetched = new Set(scope.fetchedRawPaths);
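// Out of scope only when the entity is known to be current but was not re-fetched in this incremental run.
return fetched.has(entityRawPath) || !knownCurrent.has(entityRawPath);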
}
function scopedEntityRawPath(rawPath: string): string | null {
if (LOOKER_ENTITY_PATH_RE.test(rawPath)) {
return rawPath;
}
const evidence = LOOKER_EVIDENCE_ENTITY_PATH_RE.exec(rawPath);
if (evidence) {
return `${evidence[1]}/${evidence[2]}.json`;
}
return null;
}
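
A hedged sketch of how the incremental scope rule classifies paths, re-implemented standalone so it runs without the repository. The `Scope` interface and the `entityRawPath` and `inScope` names are hypothetical; the two regular expressions and the decision rule are copied from the code above, with arrays used in place of sets for brevity.

```typescript
// Hypothetical standalone copy of the staged scope shape.
interface Scope {
  mode: 'full' | 'incremental';
  knownCurrentRawPaths: string[];
  fetchedRawPaths: string[];
}

const ENTITY_RE = /^(dashboards|looks)\/[^/]+\.json$/;
const EVIDENCE_RE = /^evidence\/(dashboards|looks)\/([^/]+)\/(?:metadata\.json|page\.md)$/;

// Map an entity or evidence path to the entity's raw JSON path, or null when unrelated.
function entityRawPath(rawPath: string): string | null {
  if (ENTITY_RE.test(rawPath)) {
    return rawPath;
  }
  const evidence = EVIDENCE_RE.exec(rawPath);
  return evidence ? `${evidence[1]}/${evidence[2]}.json` : null;
}

function inScope(rawPath: string, scope: Scope): boolean {
  if (scope.mode === 'full') {
    return true;
  }
  const entity = entityRawPath(rawPath);
  if (!entity) {
    return true; // non-entity paths (signals, explores) are always in scope
  }
  return scope.fetchedRawPaths.includes(entity) || !scope.knownCurrentRawPaths.includes(entity);
}

const scope: Scope = {
  mode: 'incremental',
  knownCurrentRawPaths: ['dashboards/10.json', 'looks/20.json'],
  fetchedRawPaths: ['dashboards/10.json'],
};

const results = {
  fetchedDashboard: inScope('dashboards/10.json', scope), // true: re-fetched this run
  skippedLook: inScope('looks/20.json', scope), // false: known current, not re-fetched
  skippedLookEvidence: inScope('evidence/looks/20/page.md', scope), // false: follows its entity
  unknownLook: inScope('looks/99.json', scope), // true: not known current
  signals: inScope('signals/dashboard_usage.json', scope), // true: non-entity path
};

console.log(results);
```

Evidence documents inherit the decision of the entity they belong to, which is why `evidence/looks/20/page.md` is excluded together with `looks/20.json`.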

@ -0,0 +1,86 @@
import { mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { listLookerTargetConnectionIds } from './target-connections.js';
describe('listLookerTargetConnectionIds', () => {
let stagedDir: string;
beforeEach(async () => {
stagedDir = await mkdtemp(join(tmpdir(), 'looker-targets-'));
});
afterEach(async () => {
await rm(stagedDir, { recursive: true, force: true });
});
it('collects unique target warehouse IDs from explores, dashboard queries, and Look queries', async () => {
await mkdir(join(stagedDir, 'explores', 'b2b'), { recursive: true });
await mkdir(join(stagedDir, 'dashboards'), { recursive: true });
await mkdir(join(stagedDir, 'looks'), { recursive: true });
await writeFile(
join(stagedDir, 'explores', 'b2b', 'sales_pipeline.json'),
JSON.stringify({
modelName: 'b2b',
exploreName: 'sales_pipeline',
label: null,
description: null,
fields: { dimensions: [], measures: [] },
joins: [],
targetWarehouseConnectionId: '22222222-2222-4222-8222-222222222222',
}),
);
await writeFile(
join(stagedDir, 'dashboards', '1.json'),
JSON.stringify({
lookerId: '1',
title: 'Pipeline',
description: null,
folderId: null,
ownerId: null,
updatedAt: null,
tiles: [
{
id: '11',
title: 'ARR',
lookId: null,
query: {
model: 'b2b',
view: 'sales_pipeline',
fields: [],
filters: {},
sorts: [],
targetWarehouseConnectionId: '33333333-3333-4333-8333-333333333333',
},
},
],
}),
);
await writeFile(
join(stagedDir, 'looks', '2.json'),
JSON.stringify({
lookerId: '2',
title: 'Customers',
description: null,
folderId: null,
ownerId: null,
updatedAt: null,
query: {
model: 'b2b',
view: 'sales_pipeline',
fields: [],
filters: {},
sorts: [],
targetWarehouseConnectionId: '22222222-2222-4222-8222-222222222222',
},
}),
);
await expect(listLookerTargetConnectionIds(stagedDir)).resolves.toEqual([
'22222222-2222-4222-8222-222222222222',
'33333333-3333-4333-8333-333333333333',
]);
});
});

@ -0,0 +1,41 @@
import { readdir, readFile } from 'node:fs/promises';
import { join, relative } from 'node:path';
import { stagedDashboardFileSchema, stagedExploreFileSchema, stagedLookFileSchema } from './types.js';
async function walk(root: string): Promise<string[]> {
const entries = await readdir(root, { withFileTypes: true, recursive: true });
return entries
.filter((entry) => entry.isFile())
.map((entry) => relative(root, join(entry.parentPath, entry.name)).replace(/\\/g, '/'))
.sort();
}
function addTarget(targets: Set<string>, value: string | null | undefined): void {
if (value) {
targets.add(value);
}
}
export async function listLookerTargetConnectionIds(stagedDir: string): Promise<string[]> {
const targets = new Set<string>();
for (const path of await walk(stagedDir)) {
const fullPath = join(stagedDir, path);
if (/^explores\/[^/]+\/[^/]+\.json$/.test(path)) {
const explore = stagedExploreFileSchema.parse(JSON.parse(await readFile(fullPath, 'utf-8')));
addTarget(targets, explore.targetWarehouseConnectionId);
continue;
}
if (/^dashboards\/[^/]+\.json$/.test(path)) {
const dashboard = stagedDashboardFileSchema.parse(JSON.parse(await readFile(fullPath, 'utf-8')));
for (const tile of dashboard.tiles) {
addTarget(targets, tile.query?.targetWarehouseConnectionId);
}
continue;
}
if (/^looks\/[^/]+\.json$/.test(path)) {
const look = stagedLookFileSchema.parse(JSON.parse(await readFile(fullPath, 'utf-8')));
addTarget(targets, look.query?.targetWarehouseConnectionId);
}
}
return [...targets].sort();
}

@ -0,0 +1,243 @@
import { describe, expect, it } from 'vitest';
import type { ToolOutput } from '../../../../../context/tools/base-tool.js';
import { buildLookerSlProposal, createLookerQueryToSlTool, type LookerSlProposal } from './looker-query-to-sl.tool.js';
describe('buildLookerSlProposal', () => {
it('suggests a measure and segment for an aggregated filtered Looker query', () => {
const proposal = buildLookerSlProposal({
contentTitle: 'Open Pipeline ARR',
contentType: 'look',
usage: { queryCount30d: 42, uniqueUsers30d: 7 },
query: {
model: 'b2b',
view: 'sales_pipeline',
fields: ['opportunities.arr', 'opportunities.stage'],
filters: { 'opportunities.stage': 'open' },
sorts: ['opportunities.arr desc'],
limit: '500',
},
});
expect(proposal.sourceName).toBe('looker__b2b__sales_pipeline');
expect(proposal.triageLane).toBe('full');
expect(proposal.decision).toBe('measure_added');
expect(proposal.measures).toEqual([
{
name: 'arr',
lookerField: 'opportunities.arr',
expr: 'sum(opportunities.arr)',
description: 'Suggested from Looker look "Open Pipeline ARR"; verify against explore field SQL before writing.',
},
]);
expect(proposal.dimensions).toEqual([{ name: 'stage', lookerField: 'opportunities.stage' }]);
expect(proposal.segments).toEqual([
{
name: 'open_pipeline_arr',
filters: { 'opportunities.stage': 'open' },
suggestedPredicate: "opportunities.stage = 'open'",
description: 'Reusable filter candidate from Looker look "Open Pipeline ARR".',
},
]);
expect(proposal.notes).toContain(
'Usage signals can raise priority, but query counts, users, owners, and folders must not be written as wiki narrative.',
);
});
it('keeps simple saved views as wiki-only candidates', () => {
const proposal = buildLookerSlProposal({
contentTitle: 'Accounts By Region',
query: {
model: 'b2b',
view: 'accounts',
fields: ['accounts.region', 'accounts.segment'],
filters: {},
},
});
expect(proposal.sourceName).toBe('looker__b2b__accounts');
expect(proposal.triageLane).toBe('light');
expect(proposal.decision).toBe('wiki_only');
expect(proposal.measures).toEqual([]);
expect(proposal.dimensions).toEqual([
{ name: 'region', lookerField: 'accounts.region' },
{ name: 'segment', lookerField: 'accounts.segment' },
]);
expect(proposal.segments).toEqual([]);
});
it('promotes high-usage filter-only queries as derived-source candidates', () => {
const proposal = buildLookerSlProposal({
contentTitle: 'Active Customers',
usage: { queryCount30d: 15, uniqueUsers30d: 4 },
query: {
model: 'b2b',
view: 'customers',
fields: ['customers.id', 'customers.name'],
filters: { 'customers.status': 'active', 'customers.is_test': '-yes' },
},
});
expect(proposal.sourceName).toBe('looker__b2b__customers');
expect(proposal.decision).toBe('source_created');
expect(proposal.segments).toEqual([
{
name: 'active_customers',
filters: { 'customers.status': 'active', 'customers.is_test': '-yes' },
suggestedPredicate: "customers.status = 'active' AND customers.is_test != 'yes'",
description: 'Reusable filter candidate from Looker look "Active Customers".',
},
]);
});
it('surfaces mapped warehouse target metadata for direct SL writes', () => {
const proposal = buildLookerSlProposal({
contentTitle: 'Open Pipeline ARR',
contentType: 'dashboard_tile',
usage: { queryCount30d: 42, uniqueUsers30d: 7 },
query: {
model: 'b2b',
view: 'sales_pipeline',
fields: ['opportunities.arr', 'opportunities.stage'],
filters: { 'opportunities.stage': 'open' },
targetWarehouseConnectionId: '22222222-2222-4222-8222-222222222222',
targetTable: {
ok: true,
catalog: 'proj',
schema: 'dataset',
name: 'opportunities',
canonicalTable: 'proj.dataset.opportunities',
},
},
});
expect(proposal.sourceName).toBe('looker__b2b__sales_pipeline');
expect(proposal.targetStatus).toBe('mapped');
expect(proposal.targetWarehouseConnectionId).toBe('22222222-2222-4222-8222-222222222222');
expect(proposal.sourceTable).toBe('proj.dataset.opportunities');
expect(proposal.canWriteStandaloneSource).toBe(true);
expect(proposal.targetTable).toEqual({
ok: true,
catalog: 'proj',
schema: 'dataset',
name: 'opportunities',
canonicalTable: 'proj.dataset.opportunities',
});
expect(proposal.notes).toContain(
'targetTable.ok is true: write or edit SL on targetWarehouseConnectionId using targetTable.canonicalTable as source.table.',
);
});
it('surfaces unmapped and unparseable target reasons for wiki-only fallback', () => {
const unmapped = buildLookerSlProposal({
contentTitle: 'Revenue Trend',
query: {
model: 'b2b',
view: 'revenue',
fields: ['revenue.arr'],
filters: {},
targetWarehouseConnectionId: null,
targetTable: {
ok: false,
reason: 'no_connection_mapping',
},
},
});
expect(unmapped.targetStatus).toBe('unmapped');
expect(unmapped.targetWarehouseConnectionId).toBeNull();
expect(unmapped.sourceTable).toBeNull();
expect(unmapped.canWriteStandaloneSource).toBe(false);
expect(unmapped.notes).toContain(
'targetTable.ok is false (no_connection_mapping): keep this query wiki-only and pass the reason through emit_unmapped_fallback.',
);
const unparseable = buildLookerSlProposal({
contentTitle: 'Templated Source',
query: {
model: 'b2b',
view: 'templated',
fields: ['templated.count'],
filters: {},
targetWarehouseConnectionId: '22222222-2222-4222-8222-222222222222',
targetTable: {
ok: false,
reason: 'looker_template_unresolved',
detail: 'The sql_table_name contains ${derived.SQL_TABLE_NAME}.',
},
},
});
expect(unparseable.targetStatus).toBe('unparseable');
expect(unparseable.targetWarehouseConnectionId).toBe('22222222-2222-4222-8222-222222222222');
expect(unparseable.sourceTable).toBeNull();
expect(unparseable.canWriteStandaloneSource).toBe(false);
expect(unparseable.notes).toContain(
'targetTable.ok is false (looker_template_unresolved): keep this query wiki-only and pass the reason through emit_unmapped_fallback.',
);
});
});
describe('createLookerQueryToSlTool', () => {
it('returns markdown plus the structured proposal', async () => {
const lookerQueryToSl = createLookerQueryToSlTool();
if (!lookerQueryToSl.execute) {
throw new Error('looker_query_to_sl tool must be executable');
}
const output = (await lookerQueryToSl.execute(
{
contentTitle: 'Revenue Trend',
contentType: 'dashboard_tile',
query: {
model: 'finance',
view: 'orders',
fields: ['orders.total_revenue', 'orders.created_month'],
filters: { 'orders.status': 'paid' },
sorts: [],
targetWarehouseConnectionId: null,
targetTable: null,
},
},
{ toolCallId: 'call-1', messages: [] } as never,
)) as ToolOutput<LookerSlProposal>;
expect(output.markdown).toContain('Looker query SL proposal');
expect(output.markdown).toContain('looker__finance__orders');
expect(output.structured.sourceName).toBe('looker__finance__orders');
expect(output.structured.measures[0]?.name).toBe('total_revenue');
});
it('prints target connection and canonical table in markdown output', async () => {
const lookerQueryToSl = createLookerQueryToSlTool();
if (!lookerQueryToSl.execute) {
throw new Error('looker_query_to_sl tool must be executable');
}
const output = (await lookerQueryToSl.execute(
{
contentTitle: 'Revenue Trend',
contentType: 'dashboard_tile',
query: {
model: 'finance',
view: 'orders',
fields: ['orders.total_revenue', 'orders.created_month'],
filters: { 'orders.status': 'paid' },
sorts: [],
targetWarehouseConnectionId: '33333333-3333-4333-8333-333333333333',
targetTable: {
ok: true,
catalog: 'proj',
schema: 'finance',
name: 'orders',
canonicalTable: 'proj.finance.orders',
},
},
},
{ toolCallId: 'call-1', messages: [] } as never,
)) as ToolOutput<LookerSlProposal>;
expect(output.markdown).toContain('- targetStatus: mapped');
expect(output.markdown).toContain('- targetWarehouseConnectionId: 33333333-3333-4333-8333-333333333333');
expect(output.markdown).toContain('- sourceTable: proj.finance.orders');
expect(output.structured.canWriteStandaloneSource).toBe(true);
});
});
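
The `triageLane` values asserted in these tests ('full' for 42 queries and 7 users, 'light' when no usage is passed) follow from the usage thresholds in `looker-query-to-sl.tool.ts`. The sketch below restates that rule in isolation; `UsageSignals`, `TriageLane`, and the standalone `triageLane` function are illustrative names, since the source computes the lane inline inside `buildLookerSlProposal`.

```typescript
// Illustrative restatement of the triage rule; the thresholds (10 queries, 3 users)
// are taken from isHighUsage in the tool source.
interface UsageSignals {
  queryCount30d: number;
  uniqueUsers30d: number;
}

type TriageLane = 'skip' | 'light' | 'full';

function isHighUsage(usage: UsageSignals | undefined): boolean {
  return usage !== undefined && (usage.queryCount30d >= 10 || usage.uniqueUsers30d >= 3);
}

function triageLane(usage: UsageSignals | undefined): TriageLane {
  // Explicit zero usage is a skip signal; missing usage is not.
  if (usage && usage.queryCount30d === 0 && usage.uniqueUsers30d === 0) {
    return 'skip';
  }
  return isHighUsage(usage) ? 'full' : 'light';
}

console.log(triageLane({ queryCount30d: 42, uniqueUsers30d: 7 })); // full
console.log(triageLane({ queryCount30d: 5, uniqueUsers30d: 1 })); // light
console.log(triageLane({ queryCount30d: 0, uniqueUsers30d: 0 })); // skip
console.log(triageLane(undefined)); // light
```

Note the asymmetry: omitting `usage` yields 'light', while passing zeros yields 'skip', so callers that have no usage data should leave the field out rather than send zeros.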

@@ -0,0 +1,307 @@
import { tool } from 'ai';
import { z } from 'zod';
import type { ToolOutput } from '../../../../../context/tools/base-tool.js';
import type { ParsedTargetTable } from '../../../parsed-target-table.js';
import { stagedLookerQuerySchema } from '../types.js';
const lookerUsageInputSchema = z.object({
queryCount30d: z.number().int().nonnegative().default(0),
uniqueUsers30d: z.number().int().nonnegative().default(0),
});
const lookerQueryToSlInputSchema = z.object({
query: stagedLookerQuerySchema,
contentTitle: z.string().min(1).optional(),
contentType: z.enum(['look', 'dashboard_tile']).default('look'),
usage: lookerUsageInputSchema.optional(),
});
export type LookerQueryToSlInput = z.input<typeof lookerQueryToSlInputSchema>;
type LookerTargetStatus = 'mapped' | 'unmapped' | 'unparseable' | 'missing_target_table';
interface LookerSlFieldProposal {
name: string;
lookerField: string;
}
interface LookerSlMeasureProposal extends LookerSlFieldProposal {
expr: string;
description: string;
}
interface LookerSlSegmentProposal {
name: string;
filters: Record<string, unknown>;
suggestedPredicate: string;
description: string;
}
export interface LookerSlProposal {
sourceName: string;
targetWarehouseConnectionId: string | null;
targetTable: ParsedTargetTable | null;
targetStatus: LookerTargetStatus;
sourceTable: string | null;
canWriteStandaloneSource: boolean;
triageLane: 'skip' | 'light' | 'full';
decision: 'wiki_only' | 'measure_added' | 'source_created';
dimensions: LookerSlFieldProposal[];
measures: LookerSlMeasureProposal[];
segments: LookerSlSegmentProposal[];
notes: string[];
}
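// Name-based heuristic: a field whose leaf name contains one of these words is proposed as a measure.
const MEASURE_FIELD_RE =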
/\b(count|sum|total|revenue|arr|mrr|amount|avg|average|rate|ratio|percent|pct|margin|profit|value|score)\b/i;
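// 'mapped' needs both a parsed table and a connection id. A 'no_connection_mapping' failure is
// 'unmapped', any other parse failure is 'unparseable', and everything else (including a parsed
// table without a connection id) falls through to 'missing_target_table'.
function targetStatus(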
targetWarehouseConnectionId: string | null,
targetTable: ParsedTargetTable | null,
): LookerTargetStatus {
if (targetTable?.ok === true && targetWarehouseConnectionId) {
return 'mapped';
}
if (targetTable?.ok === false && targetTable.reason === 'no_connection_mapping') {
return 'unmapped';
}
if (targetTable?.ok === false) {
return 'unparseable';
}
return 'missing_target_table';
}
function targetNotes(status: LookerTargetStatus, targetTable: ParsedTargetTable | null): string[] {
if (status === 'mapped') {
return [
'targetTable.ok is true: write or edit SL on targetWarehouseConnectionId using targetTable.canonicalTable as source.table.',
'Use targetTable.catalog, targetTable.schema, and targetTable.name only for source_tables preflight matching.',
'Never use rawSqlTableName as source.table; it may contain aliases, templates, or derived-table SQL.',
];
}
if (targetTable?.ok === false) {
return [
`targetTable.ok is false (${targetTable.reason}): keep this query wiki-only and pass the reason through emit_unmapped_fallback.`,
];
}
return [
'No targetTable was staged for this query; read the parent explore dependency before attempting any SL write.',
];
}
/** @internal */
export function buildLookerSlProposal(raw: LookerQueryToSlInput): LookerSlProposal {
const input = lookerQueryToSlInputSchema.parse(raw);
const sourceName = `looker__${toSlName(input.query.model)}__${toSlName(input.query.view)}`;
const usage = input.usage;
const targetWarehouseConnectionId = input.query.targetWarehouseConnectionId ?? null;
const targetTable = input.query.targetTable ?? null;
const status = targetStatus(targetWarehouseConnectionId, targetTable);
const sourceTable = targetTable?.ok === true ? targetTable.canonicalTable : null;
const canWriteStandaloneSource = status === 'mapped';
const triageLane =
usage && usage.queryCount30d === 0 && usage.uniqueUsers30d === 0 ? 'skip' : isHighUsage(usage) ? 'full' : 'light';
const dimensions: LookerSlFieldProposal[] = [];
const measures: LookerSlMeasureProposal[] = [];
for (const field of input.query.fields) {
const proposal = { name: toSlName(fieldLeaf(field)), lookerField: field };
if (isMeasureLikeField(field)) {
measures.push({
...proposal,
expr: suggestedMeasureExpr(field),
description: `Suggested from Looker ${contentLabel(input)}; verify against explore field SQL before writing.`,
});
} else {
dimensions.push(proposal);
}
}
const filters = nonEmptyFilters(input.query.filters);
const segments =
Object.keys(filters).length === 0
? []
: [
{
name: toSlName(input.contentTitle ?? Object.keys(filters).map(fieldLeaf).join('_')),
filters,
suggestedPredicate: Object.entries(filters)
.map(([field, value]) => filterValueToPredicate(field, value))
.join(' AND '),
description: `Reusable filter candidate from Looker ${contentLabel(input)}.`,
},
];
const decision =
measures.length > 0 ? 'measure_added' : segments.length > 0 && isHighUsage(usage) ? 'source_created' : 'wiki_only';
const notes = [
...targetNotes(status, targetTable),
'Treat this as a proposal, not an instruction to write SL blindly.',
'Verify field SQL, source shape, and existing SL overlap with sl_discover/sl_read_source before sl_write_source or sl_edit_source.',
'Usage signals can raise priority, but query counts, users, owners, and folders must not be written as wiki narrative.',
];
if (triageLane === 'skip') {
notes.push('Zero recent usage is a skip signal unless the raw content clearly defines durable business semantics.');
}
return {
sourceName,
targetWarehouseConnectionId,
targetTable,
targetStatus: status,
sourceTable,
canWriteStandaloneSource,
triageLane,
decision,
dimensions,
measures,
segments,
notes,
};
}
export function createLookerQueryToSlTool() {
return tool({
description:
'Given one staged Looker query JSON, return a conservative proposal for SL measures, dimensions, reusable filters, and triage priority. The proposal is advisory; verify with SL tools before writing.',
inputSchema: lookerQueryToSlInputSchema,
execute: async (input): Promise<ToolOutput<LookerSlProposal>> => {
const structured = buildLookerSlProposal(input);
return {
markdown: formatLookerSlProposal(structured),
structured,
};
},
toModelOutput: ({ output }) => {
const markdown =
output && typeof output === 'object' && 'markdown' in output
? String((output as { markdown: unknown }).markdown)
: String(output);
return { type: 'content', value: [{ type: 'text', text: markdown }] };
},
});
}
function formatLookerSlProposal(proposal: LookerSlProposal): string {
const lines = [
'## Looker query SL proposal',
'',
`- sourceName: ${proposal.sourceName}`,
`- targetStatus: ${proposal.targetStatus}`,
`- targetWarehouseConnectionId: ${proposal.targetWarehouseConnectionId ?? '(none)'}`,
`- sourceTable: ${proposal.sourceTable ?? '(none)'}`,
`- canWriteStandaloneSource: ${proposal.canWriteStandaloneSource}`,
`- triageLane: ${proposal.triageLane}`,
`- decision: ${proposal.decision}`,
'',
'### Measures',
...(proposal.measures.length === 0
? ['- (none)']
: proposal.measures.map((measure) => `- ${measure.name}: ${measure.expr} (${measure.lookerField})`)),
'',
'### Dimensions',
...(proposal.dimensions.length === 0
? ['- (none)']
: proposal.dimensions.map((dimension) => `- ${dimension.name}: ${dimension.lookerField}`)),
'',
'### Segments',
...(proposal.segments.length === 0
? ['- (none)']
: proposal.segments.map((segment) => `- ${segment.name}: ${segment.suggestedPredicate}`)),
'',
'### Notes',
...proposal.notes.map((note) => `- ${note}`),
];
return lines.join('\n');
}
function isHighUsage(usage: z.infer<typeof lookerUsageInputSchema> | undefined): boolean {
return !!usage && (usage.queryCount30d >= 10 || usage.uniqueUsers30d >= 3);
}
function isMeasureLikeField(field: string): boolean {
return MEASURE_FIELD_RE.test(fieldLeaf(field).replace(/_/g, ' '));
}
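// Picks an aggregate from the field's leaf name: count-like -> count, ratio-like -> avg, otherwise sum.
function suggestedMeasureExpr(field: string): string {
const words = fieldLeaf(field).replace(/_/g, ' ');
// Underscores are already spaces here, so `count` also covers names such as count_distinct.
if (/\bcount\b/i.test(words)) {
return `count(${field})`;
}
if (/\b(avg|average|rate|ratio|percent|pct|margin|score)\b/i.test(words)) {
return `avg(${field})`;
}
return `sum(${field})`;
}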
function fieldLeaf(field: string): string {
const parts = field.split('.');
return parts[parts.length - 1] || field;
}
function nonEmptyFilters(filters: Record<string, unknown>): Record<string, unknown> {
return Object.fromEntries(
Object.entries(filters).filter(([, value]) => {
if (value === null || value === undefined) {
return false;
}
if (typeof value === 'string') {
return value.trim().length > 0;
}
if (Array.isArray(value)) {
return value.length > 0;
}
return true;
}),
);
}
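// Translates one Looker filter value into a suggested SQL predicate. Checks run in order:
// arrays become IN, numbers and booleans become an unquoted =, unquoted comma lists become IN,
// a leading '-' becomes !=, a '%' becomes LIKE, and anything else becomes a quoted =.
// The result is a hint for review, not a full parser for Looker filter expressions.
function filterValueToPredicate(field: string, value: unknown): string {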
if (Array.isArray(value)) {
return `${field} IN (${value.map(sqlLiteral).join(', ')})`;
}
if (typeof value === 'number' || typeof value === 'boolean') {
return `${field} = ${String(value)}`;
}
const raw = String(value).trim();
if (raw.includes(',') && !raw.includes('"') && !raw.includes("'")) {
return `${field} IN (${raw
.split(',')
.map((part) => sqlLiteral(part.trim()))
.join(', ')})`;
}
if (raw.startsWith('-') && raw.length > 1) {
return `${field} != ${sqlLiteral(raw.slice(1).trim())}`;
}
if (raw.includes('%')) {
return `${field} LIKE ${sqlLiteral(raw)}`;
}
return `${field} = ${sqlLiteral(raw)}`;
}
function sqlLiteral(value: unknown): string {
if (typeof value === 'number' || typeof value === 'boolean') {
return String(value);
}
return `'${String(value).replace(/'/g, "''")}'`;
}
function contentLabel(input: z.infer<typeof lookerQueryToSlInputSchema>): string {
const noun = input.contentType === 'dashboard_tile' ? 'dashboard tile' : 'look';
return input.contentTitle ? `${noun} "${input.contentTitle}"` : noun;
}
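// Normalizes a Looker name to a snake_case identifier: splits camelCase, lowercases, collapses
// non-alphanumeric runs to '_', trims edge underscores, and prefixes 'n_' if the result starts with a digit.
function toSlName(value: string): string {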
const normalized = value
.trim()
.replace(/([a-z0-9])([A-Z])/g, '$1_$2')
.toLowerCase()
.replace(/[^a-z0-9]+/g, '_')
.replace(/^_+|_+$/g, '')
.replace(/_+/g, '_');
if (!normalized) {
throw new Error('Cannot derive semantic-layer name from empty Looker value');
}
return /^[0-9]/.test(normalized) ? `n_${normalized}` : normalized;
}
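
The filter-to-predicate heuristic is easiest to follow with concrete inputs. Below is a minimal standalone copy of `sqlLiteral` and `filterValueToPredicate` from the file above, followed by sample calls; the sample field names and values are made up for illustration, and the copy exists only so the behavior can be tried outside the package.

```typescript
// Standalone copy for experimentation; the logic follows the source file above.
function sqlLiteral(value: unknown): string {
  if (typeof value === 'number' || typeof value === 'boolean') {
    return String(value);
  }
  // Double embedded single quotes so the literal stays well-formed.
  return `'${String(value).replace(/'/g, "''")}'`;
}

function filterValueToPredicate(field: string, value: unknown): string {
  if (Array.isArray(value)) {
    return `${field} IN (${value.map((item) => sqlLiteral(item)).join(', ')})`;
  }
  if (typeof value === 'number' || typeof value === 'boolean') {
    return `${field} = ${String(value)}`;
  }
  const raw = String(value).trim();
  if (raw.includes(',') && !raw.includes('"') && !raw.includes("'")) {
    const parts = raw.split(',').map((part) => sqlLiteral(part.trim()));
    return `${field} IN (${parts.join(', ')})`;
  }
  if (raw.startsWith('-') && raw.length > 1) {
    return `${field} != ${sqlLiteral(raw.slice(1).trim())}`;
  }
  if (raw.includes('%')) {
    return `${field} LIKE ${sqlLiteral(raw)}`;
  }
  return `${field} = ${sqlLiteral(raw)}`;
}

// Made-up sample inputs, one per branch.
console.log(filterValueToPredicate('opportunities.stage', 'open')); // opportunities.stage = 'open'
console.log(filterValueToPredicate('customers.is_test', '-yes')); // customers.is_test != 'yes'
console.log(filterValueToPredicate('orders.status', 'paid,refunded')); // orders.status IN ('paid', 'refunded')
console.log(filterValueToPredicate('accounts.name', 'Acme%')); // accounts.name LIKE 'Acme%'
console.log(filterValueToPredicate('orders.item_count', 3)); // orders.item_count = 3
console.log(filterValueToPredicate('accounts.name', "O'Brien")); // accounts.name = 'O''Brien'
```

One consequence of the branch order is that a negative number passed as a string, such as '-5', is read as "not 5" and yields `!= '5'`, which is one reason the proposal notes ask for verification before any SL write.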

@@ -0,0 +1,329 @@
import { describe, expect, it } from 'vitest';
import { parsedTargetTableSchema } from '../../parsed-target-table.js';
import {
lookerPullConfigSchema,
parseLookerPullConfig,
stagedDashboardFileSchema,
stagedExploreFileSchema,
stagedLookerFetchIssueSchema,
stagedLookerQuerySchema,
stagedLookerScopeFileSchema,
stagedLookerSignalsFileSchema,
stagedLookFileSchema,
stagedSyncConfigSchema,
} from './types.js';
describe('Looker staged runtime schemas', () => {
it('parses pull config and staged sync config', () => {
expect(
lookerPullConfigSchema.parse({
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
instanceBaseUrl: 'https://example.looker.com',
}),
).toEqual({
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
instanceBaseUrl: 'https://example.looker.com',
connectionMappings: {},
connectionTypes: {},
parsedTargetTables: {},
});
expect(
stagedSyncConfigSchema.parse({
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
fetchedAt: '2026-04-30T12:00:00.000Z',
instanceBaseUrl: 'https://example.looker.com',
}),
).toMatchObject({
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
instanceBaseUrl: 'https://example.looker.com',
});
});
it('parses incremental pull cursors and scope manifests', () => {
expect(
parseLookerPullConfig({
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
dashboardUpdatedSince: '2026-04-30T10:00:00.000Z',
lookUpdatedSince: '2026-04-30T11:00:00.000Z',
}),
).toEqual({
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
dashboardUpdatedSince: '2026-04-30T10:00:00.000Z',
lookUpdatedSince: '2026-04-30T11:00:00.000Z',
connectionMappings: {},
connectionTypes: {},
parsedTargetTables: {},
});
expect(
stagedLookerScopeFileSchema.parse({
mode: 'incremental',
knownCurrentRawPaths: ['dashboards/10.json', 'looks/20.json'],
fetchedRawPaths: ['dashboards/10.json'],
}),
).toEqual({
mode: 'incremental',
knownCurrentRawPaths: ['dashboards/10.json', 'looks/20.json'],
fetchedRawPaths: ['dashboards/10.json'],
});
expect(
stagedSyncConfigSchema.parse({
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
fetchedAt: '2026-04-30T12:30:00.000Z',
previousCursors: {
dashboardsLastSyncedAt: null,
looksLastSyncedAt: '2026-04-30T11:00:00.000Z',
},
nextCursors: {
dashboardsLastSyncedAt: '2026-04-30T12:00:00.000Z',
looksLastSyncedAt: '2026-04-30T11:00:00.000Z',
},
}).nextCursors,
).toEqual({
dashboardsLastSyncedAt: '2026-04-30T12:00:00.000Z',
looksLastSyncedAt: '2026-04-30T11:00:00.000Z',
});
});
it('normalizes numeric Looker ids to strings', () => {
const dashboard = stagedDashboardFileSchema.parse({
lookerId: 10,
title: 'Sales Pipeline',
description: null,
folderId: 7,
ownerId: 3,
updatedAt: '2026-04-30T12:00:00.000Z',
tiles: [{ id: 100, title: 'ARR', lookId: null, query: { model: 'b2b', view: 'sales_pipeline' } }],
});
expect(dashboard.lookerId).toBe('10');
expect(dashboard.folderId).toBe('7');
expect(dashboard.ownerId).toBe('3');
expect(dashboard.tiles[0].id).toBe('100');
});
it('parses explores, looks, and signal files with defaults', () => {
expect(
stagedExploreFileSchema.parse({
modelName: 'b2b',
exploreName: 'sales_pipeline',
label: 'Sales Pipeline',
description: null,
fields: {
dimensions: [{ name: 'opportunities.id', label: 'Opportunity ID', type: 'number', sql: '${TABLE}.id' }],
measures: [{ name: 'opportunities.arr', label: 'ARR', type: 'sum', sql: '${TABLE}.arr' }],
},
joins: [{ name: 'accounts', type: 'left_outer', relationship: 'many_to_one' }],
}),
).toMatchObject({
modelName: 'b2b',
exploreName: 'sales_pipeline',
fields: { dimensions: [{ name: 'opportunities.id' }], measures: [{ name: 'opportunities.arr' }] },
});
expect(
stagedLookFileSchema.parse({
lookerId: '20',
title: 'Open Pipeline',
description: null,
folderId: null,
ownerId: null,
updatedAt: null,
query: { model: 'b2b', view: 'sales_pipeline', fields: ['opportunities.arr'] },
}),
).toMatchObject({ lookerId: '20', query: { fields: ['opportunities.arr'] } });
expect(stagedLookerSignalsFileSchema.parse({}).dashboardUsage).toEqual([]);
});
it('parses warehouse SL mapping pull config and staged target table fields', () => {
const targetConnectionId = '22222222-2222-4222-8222-222222222222';
const parsedTargetTable = {
ok: true as const,
catalog: 'proj',
schema: 'dataset',
name: 'opportunities',
canonicalTable: 'proj.dataset.opportunities',
};
expect(parsedTargetTableSchema.parse(parsedTargetTable)).toEqual(parsedTargetTable);
expect(
parseLookerPullConfig({
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
connectionMappings: { b2b_sandbox_bq: targetConnectionId },
connectionTypes: { b2b_sandbox_bq: 'BIGQUERY' },
parsedTargetTables: { 'b2b.sales_pipeline': parsedTargetTable },
}),
).toEqual({
lookerConnectionId: '11111111-1111-4111-8111-111111111111',
connectionMappings: { b2b_sandbox_bq: targetConnectionId },
connectionTypes: { b2b_sandbox_bq: 'BIGQUERY' },
parsedTargetTables: { 'b2b.sales_pipeline': parsedTargetTable },
});
expect(
stagedExploreFileSchema.parse({
modelName: 'b2b',
exploreName: 'sales_pipeline',
label: 'Sales Pipeline',
description: null,
rawSqlTableName: 'proj.dataset.opportunities AS opportunities',
connectionName: 'b2b_sandbox_bq',
viewName: 'opportunities',
fields: {
dimensions: [{ name: 'opportunities.id', label: 'Opportunity ID', type: 'number', sql: '${TABLE}.id' }],
measures: [{ name: 'opportunities.arr', label: 'ARR', type: 'sum', sql: '${TABLE}.arr' }],
},
joins: [
{
name: 'accounts',
type: 'left_outer',
relationship: 'many_to_one',
rawSqlTableName: 'proj.dataset.accounts',
sqlOn: '${opportunities.account_id} = ${accounts.id}',
from: null,
targetTable: {
ok: true,
catalog: 'proj',
schema: 'dataset',
name: 'accounts',
canonicalTable: 'proj.dataset.accounts',
},
},
],
targetWarehouseConnectionId: targetConnectionId,
targetTable: parsedTargetTable,
}),
).toMatchObject({
modelName: 'b2b',
exploreName: 'sales_pipeline',
connectionName: 'b2b_sandbox_bq',
targetWarehouseConnectionId: targetConnectionId,
targetTable: parsedTargetTable,
joins: [{ name: 'accounts', targetTable: { ok: true, name: 'accounts' } }],
});
});
it('parses structured Looker mapping fetch warnings', () => {
expect(
stagedLookerFetchIssueSchema.parse({
rawPath: 'looker_connection_mappings/b2b_sandbox_bq',
entityType: 'looker_connection_mapping',
entityId: 'b2b_sandbox_bq',
severity: 'warning',
statusCode: null,
message: 'Looker connection b2b_sandbox_bq is not mapped to a warehouse connection.',
retryRecommended: false,
kind: 'unmapped_looker_connection',
details: {
lookerConnectionName: 'b2b_sandbox_bq',
affectedExplores: ['b2b.sales_pipeline'],
},
}),
).toMatchObject({
entityType: 'looker_connection_mapping',
kind: 'unmapped_looker_connection',
details: {
lookerConnectionName: 'b2b_sandbox_bq',
affectedExplores: ['b2b.sales_pipeline'],
},
});
});
it('parses LookML model listing warnings in fetch reports', () => {
expect(
stagedLookerFetchIssueSchema.parse({
rawPath: 'lookml_models.json',
entityType: 'lookml_models',
entityId: null,
severity: 'warning',
statusCode: 403,
message: 'LookML model access denied',
retryRecommended: false,
}),
).toEqual({
rawPath: 'lookml_models.json',
entityType: 'lookml_models',
entityId: null,
severity: 'warning',
statusCode: 403,
message: 'LookML model access denied',
retryRecommended: false,
});
});
it('accepts slug-shaped connection ids inside KTX Looker runtime schemas', () => {
const parsedTargetTable = {
ok: true as const,
catalog: 'proj',
schema: 'dataset',
name: 'opportunities',
canonicalTable: 'proj.dataset.opportunities',
};
expect(
parseLookerPullConfig({
lookerConnectionId: 'prod-looker',
connectionMappings: { b2b_sandbox_bq: 'prod-warehouse' },
connectionTypes: { b2b_sandbox_bq: 'BIGQUERY' },
parsedTargetTables: { 'b2b.sales_pipeline': parsedTargetTable },
}),
).toMatchObject({
lookerConnectionId: 'prod-looker',
connectionMappings: { b2b_sandbox_bq: 'prod-warehouse' },
});
expect(
stagedSyncConfigSchema.parse({
lookerConnectionId: 'prod-looker',
fetchedAt: '2026-04-30T12:00:00.000Z',
}),
).toMatchObject({
lookerConnectionId: 'prod-looker',
});
expect(
stagedLookerQuerySchema.parse({
model: 'b2b',
view: 'sales_pipeline',
targetWarehouseConnectionId: 'prod-warehouse',
targetTable: parsedTargetTable,
}),
).toMatchObject({
targetWarehouseConnectionId: 'prod-warehouse',
targetTable: parsedTargetTable,
});
expect(
stagedExploreFileSchema.parse({
modelName: 'b2b',
exploreName: 'sales_pipeline',
label: 'Sales Pipeline',
description: null,
fields: { dimensions: [], measures: [] },
targetWarehouseConnectionId: 'prod-warehouse',
targetTable: parsedTargetTable,
}),
).toMatchObject({
targetWarehouseConnectionId: 'prod-warehouse',
targetTable: parsedTargetTable,
});
});
it('rejects unsafe KTX Looker connection ids', () => {
expect(() =>
parseLookerPullConfig({
lookerConnectionId: '../prod-looker',
}),
).toThrow();
expect(() =>
parseLookerPullConfig({
connectionMappings: { b2b_sandbox_bq: 'prod/warehouse' },
}),
).toThrow();
});
});
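
The last two tests pin down which connection ids the runtime schemas accept. A dependency-free sketch of the same two rules (the slug pattern for connection ids and the string normalization of Looker ids) is shown below; `isSafeConnectionId` and `normalizeLookerId` are hypothetical helper names, and the real schemas in `types.ts` express these rules with zod rather than hand-written functions.

```typescript
// Hypothetical helpers that mirror the zod rules in types.ts without importing zod.
const SAFE_CONNECTION_ID = /^[A-Za-z0-9_-]+$/;

// True only for non-empty ids made of letters, digits, '_' and '-'.
function isSafeConnectionId(value: string): boolean {
  return SAFE_CONNECTION_ID.test(value);
}

// Looker ids may arrive as integers or strings; staged files store them as strings.
function normalizeLookerId(value: string | number): string {
  if (typeof value === 'number' && !Number.isInteger(value)) {
    throw new TypeError(`Looker id must be an integer, got ${value}`);
  }
  return String(value);
}

console.log(isSafeConnectionId('prod-looker')); // true
console.log(isSafeConnectionId('11111111-1111-4111-8111-111111111111')); // true
console.log(isSafeConnectionId('../prod-looker')); // false
console.log(isSafeConnectionId('prod/warehouse')); // false
console.log(normalizeLookerId(10)); // 10 (as a string)
```

The pattern accepts both UUIDs and slug-style ids, while '.' and '/' are outside the character class, so values that look like relative paths are rejected.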

@@ -0,0 +1,255 @@
import { z } from 'zod';
import { connectionTypeSchema } from '../../../connections/connection-type.js';
import { parsedTargetTableSchema } from '../../parsed-target-table.js';
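// Accepts string or integer Looker ids and normalizes them to strings.
const lookerIdSchema = z.union([z.string(), z.number().int()]).transform(String);
const nullableLookerIdSchema = z.union([lookerIdSchema, z.null()]).default(null);
// Connection ids are limited to slug characters, which rejects values such as '../prod-looker' or 'prod/warehouse'.
const lookerConnectionIdSchema = z.string().min(1).regex(/^[A-Za-z0-9_-]+$/);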
const lookerRuntimeCursorsSchema = z.object({
dashboardsLastSyncedAt: z.iso.datetime().nullable().default(null),
looksLastSyncedAt: z.iso.datetime().nullable().default(null),
});
export type LookerRuntimeCursors = z.infer<typeof lookerRuntimeCursorsSchema>;
/** @internal */
export const lookerPullConfigSchema = z.object({
lookerConnectionId: lookerConnectionIdSchema.optional(),
instanceBaseUrl: z.url().optional(),
dashboardUpdatedSince: z.iso.datetime().nullable().optional(),
lookUpdatedSince: z.iso.datetime().nullable().optional(),
connectionMappings: z.record(z.string(), lookerConnectionIdSchema).default({}),
connectionTypes: z.record(z.string(), connectionTypeSchema).default({}),
parsedTargetTables: z.record(z.string(), parsedTargetTableSchema).default({}),
});
export type LookerPullConfig = z.infer<typeof lookerPullConfigSchema>;
export function parseLookerPullConfig(raw: unknown): LookerPullConfig {
return lookerPullConfigSchema.parse(raw ?? {});
}
export const stagedSyncConfigSchema = z.object({
lookerConnectionId: lookerConnectionIdSchema,
fetchedAt: z.iso.datetime(),
instanceBaseUrl: z.url().optional(),
previousCursors: lookerRuntimeCursorsSchema.default({
dashboardsLastSyncedAt: null,
looksLastSyncedAt: null,
}),
nextCursors: lookerRuntimeCursorsSchema.default({
dashboardsLastSyncedAt: null,
looksLastSyncedAt: null,
}),
});
export const stagedLookerQuerySchema = z.object({
id: lookerIdSchema.optional(),
model: z.string(),
view: z.string(),
fields: z.array(z.string()).default([]),
filters: z.record(z.string(), z.unknown()).default({}),
sorts: z.array(z.string()).default([]),
limit: z.union([z.string(), z.number()]).optional().nullable(),
dynamicFields: z.string().optional().nullable(),
targetWarehouseConnectionId: lookerConnectionIdSchema.nullable().default(null),
targetTable: parsedTargetTableSchema.nullable().default(null),
});
export type StagedLookerQuery = z.infer<typeof stagedLookerQuerySchema>;
const stagedDashboardTileSchema = z.object({
id: lookerIdSchema,
title: z.string().nullable().default(null),
lookId: nullableLookerIdSchema,
query: stagedLookerQuerySchema.nullable().default(null),
});
export const stagedDashboardFileSchema = z.object({
lookerId: lookerIdSchema,
title: z.string(),
description: z.string().nullable(),
folderId: nullableLookerIdSchema,
ownerId: nullableLookerIdSchema,
updatedAt: z.string().nullable(),
tiles: z.array(stagedDashboardTileSchema).default([]),
});
export type StagedDashboardFile = z.infer<typeof stagedDashboardFileSchema>;
export const stagedLookFileSchema = z.object({
lookerId: lookerIdSchema,
title: z.string(),
description: z.string().nullable(),
folderId: nullableLookerIdSchema,
ownerId: nullableLookerIdSchema,
updatedAt: z.string().nullable(),
query: stagedLookerQuerySchema.nullable().default(null),
});
export type StagedLookFile = z.infer<typeof stagedLookFileSchema>;
const stagedFolderSchema = z.object({
id: lookerIdSchema,
name: z.string(),
parentId: nullableLookerIdSchema,
path: z.array(z.string()).default([]),
});
export const stagedFoldersTreeFileSchema = z.object({
folders: z.array(stagedFolderSchema),
});
export type StagedFoldersTreeFile = z.infer<typeof stagedFoldersTreeFileSchema>;
export const stagedUserFileSchema = z.object({
id: lookerIdSchema,
displayName: z.string().nullable(),
email: z.string().nullable().default(null),
});
export type StagedUserFile = z.infer<typeof stagedUserFileSchema>;
export const stagedGroupFileSchema = z.object({
id: lookerIdSchema,
name: z.string(),
});
export type StagedGroupFile = z.infer<typeof stagedGroupFileSchema>;
const stagedLookmlModelSchema = z.object({
name: z.string(),
label: z.string().nullable().default(null),
explores: z.array(z.object({ name: z.string(), label: z.string().nullable().default(null) })),
});
export const stagedLookmlModelsFileSchema = z.object({
models: z.array(stagedLookmlModelSchema),
});
export type StagedLookmlModelsFile = z.infer<typeof stagedLookmlModelsFileSchema>;
const stagedLookerFieldSchema = z.object({
name: z.string(),
label: z.string().nullable().default(null),
type: z.string().nullable().default(null),
sql: z.string().nullable().default(null),
description: z.string().nullable().default(null),
});
const stagedLookerJoinSchema = z.object({
name: z.string(),
type: z.string().nullable().default(null),
relationship: z.string().nullable().default(null),
rawSqlTableName: z.string().nullable().default(null),
sqlOn: z.string().nullable().default(null),
from: z.string().nullable().default(null),
targetTable: parsedTargetTableSchema.nullable().default(null),
});
export const stagedExploreFileSchema = z.object({
modelName: z.string(),
exploreName: z.string(),
label: z.string().nullable().default(null),
description: z.string().nullable().default(null),
rawSqlTableName: z.string().nullable().default(null),
connectionName: z.string().nullable().default(null),
viewName: z.string().nullable().default(null),
fields: z.object({
dimensions: z.array(stagedLookerFieldSchema).default([]),
measures: z.array(stagedLookerFieldSchema).default([]),
}),
joins: z.array(stagedLookerJoinSchema).default([]),
targetWarehouseConnectionId: lookerConnectionIdSchema.nullable().default(null),
targetTable: parsedTargetTableSchema.nullable().default(null),
});
export type StagedExploreFile = z.infer<typeof stagedExploreFileSchema>;
const stagedUsageSignalSchema = z.object({
contentId: lookerIdSchema,
queryCount30d: z.number().int().nonnegative().default(0),
uniqueUsers30d: z.number().int().nonnegative().default(0),
lastRunAt: z.string().nullable().default(null),
topUsers: z.array(lookerIdSchema).default([]),
});
const stagedScheduledPlanSignalSchema = z.object({
contentId: lookerIdSchema,
contentType: z.enum(['dashboard', 'look']),
isScheduled: z.boolean(),
scheduleCount: z.number().int().nonnegative().default(0),
recipientCount: z.number().int().nonnegative().default(0),
});
const stagedFavoriteSignalSchema = z.object({
contentId: lookerIdSchema,
contentType: z.enum(['dashboard', 'look']),
favoriteCount: z.number().int().nonnegative().default(0),
});
export const stagedLookerSignalsFileSchema = z.object({
dashboardUsage: z.array(stagedUsageSignalSchema).default([]),
lookUsage: z.array(stagedUsageSignalSchema).default([]),
scheduledPlans: z.array(stagedScheduledPlanSignalSchema).default([]),
favorites: z.array(stagedFavoriteSignalSchema).default([]),
});
export type StagedLookerSignalsFile = z.infer<typeof stagedLookerSignalsFileSchema>;
export const stagedLookerScopeFileSchema = z.object({
mode: z.enum(['full', 'incremental']),
knownCurrentRawPaths: z.array(z.string()).default([]),
fetchedRawPaths: z.array(z.string()).default([]),
});
export type StagedLookerScopeFile = z.infer<typeof stagedLookerScopeFileSchema>;
const stagedLookerFetchIssueKindSchema = z.enum([
'unmapped_looker_connection',
'unparseable_sql_table_name',
'looker_template_unresolved',
'derived_table_not_supported',
'lookml_connection_mismatch',
]);
/** @internal */
export const stagedLookerFetchIssueSchema = z.object({
rawPath: z.string().min(1),
entityType: z.enum(['dashboard', 'look', 'explore', 'signals', 'lookml_models', 'looker_connection_mapping']),
entityId: z.string().nullable().default(null),
severity: z.enum(['warning', 'error']),
statusCode: z.number().int().nullable().default(null),
message: z.string().min(1),
retryRecommended: z.boolean().default(false),
kind: stagedLookerFetchIssueKindSchema.optional(),
details: z.record(z.string(), z.unknown()).optional(),
});
export type StagedLookerFetchIssue = z.infer<typeof stagedLookerFetchIssueSchema>;
export const stagedLookerFetchReportSchema = z.object({
status: z.enum(['success', 'partial']),
retryRecommended: z.boolean().default(false),
skipped: z.array(stagedLookerFetchIssueSchema).default([]),
warnings: z.array(stagedLookerFetchIssueSchema).default([]),
});
export type StagedLookerFetchReport = z.infer<typeof stagedLookerFetchReportSchema>;
export const STAGED_FILES = {
syncConfig: 'sync-config.json',
scope: 'looker-scope.json',
fetchReport: 'looker-fetch-report.json',
evidenceRoot: 'evidence',
lookmlModels: 'lookml_models.json',
foldersTree: 'folders/tree.json',
signals: {
dashboardUsage: 'signals/dashboard_usage.json',
lookUsage: 'signals/look_usage.json',
scheduledPlans: 'signals/scheduled_plans.json',
favorites: 'signals/favorites.json',
},
} as const;


@ -0,0 +1,230 @@
import { join } from 'node:path';
import { describe, expect, it } from 'vitest';
import { chunkLookmlProject } from './chunk.js';
import { type ParsedLookmlProject, parseLookmlStagedDir } from './parse.js';
const FIXTURE_ROOT = join(__dirname, '../../../../test/fixtures/lookml');
describe('chunkLookmlProject — first run', () => {
it('single-model bundle → 1 WU with model + all views in rawFiles', async () => {
const stagedDir = join(FIXTURE_ROOT, 'single-model');
const project = await parseLookmlStagedDir(stagedDir);
const result = chunkLookmlProject(project);
expect(result.workUnits).toHaveLength(1);
const wu = result.workUnits[0];
expect(wu.unitKey).toBe('lookml-orders');
expect(wu.rawFiles.sort()).toEqual(['orders.model.lkml', 'views/customers.view.lkml', 'views/orders.view.lkml']);
expect(wu.peerFileIndex).toEqual([]);
expect(wu.dependencyPaths).toEqual([]);
expect(result.eviction).toBeUndefined();
});
it('multi-model bundle → 1 WU per model; shared view owned by lex-first model; others see it in dependencyPaths; peerFileIndex is a paths-only index', async () => {
const stagedDir = join(FIXTURE_ROOT, 'multi-model');
const project = await parseLookmlStagedDir(stagedDir);
const result = chunkLookmlProject(project);
expect(result.workUnits).toHaveLength(2);
const marketing = result.workUnits.find((wu) => wu.unitKey === 'lookml-marketing');
const orders = result.workUnits.find((wu) => wu.unitKey === 'lookml-orders');
expect(marketing).toBeDefined();
expect(orders).toBeDefined();
if (!marketing || !orders) {
throw new Error('expected marketing and orders work units');
}
// marketing sorts before orders → marketing owns shared_dims
expect(marketing.rawFiles).toContain('views/shared_dims.view.lkml');
expect(marketing.rawFiles).toContain('views/campaigns.view.lkml');
expect(marketing.rawFiles).toContain('marketing.model.lkml');
expect(marketing.rawFiles).not.toContain('views/orders.view.lkml');
expect(marketing.dependencyPaths).toEqual([]);
// orders does NOT own shared_dims — it's in dependencyPaths (read-only upstream).
expect(orders.rawFiles).not.toContain('views/shared_dims.view.lkml');
expect(orders.dependencyPaths).toEqual(['views/shared_dims.view.lkml']);
expect(orders.rawFiles).toContain('views/orders.view.lkml');
expect(orders.rawFiles).toContain('orders.model.lkml');
// Each WU's peerFileIndex lists the OTHER model's files (paths-only index).
expect(orders.peerFileIndex).toContain('marketing.model.lkml');
expect(orders.peerFileIndex).toContain('views/campaigns.view.lkml');
// Dependency paths should not be duplicated into peerFileIndex.
expect(orders.peerFileIndex).not.toContain('views/shared_dims.view.lkml');
});
it('extends-chain fixture: single WU contains base + orders + orders_ext; chain order visible via graph', async () => {
const stagedDir = join(FIXTURE_ROOT, 'extends-chain');
const project = await parseLookmlStagedDir(stagedDir);
const result = chunkLookmlProject(project);
// One model ("orders") includes views/*.view.lkml — so all three views land in its WU.
expect(result.workUnits).toHaveLength(1);
const wu = result.workUnits[0];
expect(wu.unitKey).toBe('lookml-orders');
expect(wu.rawFiles.sort()).toEqual([
'orders.model.lkml',
'views/base.view.lkml',
'views/orders.view.lkml',
'views/orders_ext.view.lkml',
]);
expect(wu.dependencyPaths).toEqual([]); // all ancestors already in rawFiles on first run
expect(wu.notes).toMatch(/orders/);
});
it('is deterministic: two calls on the same project return structurally identical WorkUnits', async () => {
const stagedDir = join(FIXTURE_ROOT, 'multi-model');
const project = await parseLookmlStagedDir(stagedDir);
const r1 = chunkLookmlProject(project);
const r2 = chunkLookmlProject(project);
expect(r1.workUnits).toEqual(r2.workUnits);
});
it('unitKey is model-name-derived (stable across parse+chunk cycles and across re-syncs)', async () => {
const project = await parseLookmlStagedDir(join(FIXTURE_ROOT, 'multi-model'));
const { workUnits } = chunkLookmlProject(project);
expect(workUnits.map((wu) => wu.unitKey).sort()).toEqual(['lookml-marketing', 'lookml-orders']);
});
it('marks mismatched model WorkUnits as SL-disallowed and keeps wiki ingest enabled', () => {
const project: ParsedLookmlProject = {
models: [
{
path: 'b2b.model.lkml',
name: 'b2b',
includes: ['views/orders.view.lkml'],
explores: ['orders'],
connectionName: 'wrong_connection',
},
],
views: [{ path: 'views/orders.view.lkml', name: 'orders', extendsFrom: [], rawSqlTableName: 'public.orders' }],
dashboards: [],
allPaths: ['b2b.model.lkml', 'views/orders.view.lkml'],
};
const result = chunkLookmlProject(project, { mismatchedModelNames: new Set(['b2b']) });
const wu = result.workUnits[0];
expect(wu.unitKey).toBe('lookml-b2b');
expect(wu.rawFiles).toEqual(['b2b.model.lkml', 'views/orders.view.lkml']);
expect(wu.slDisallowed).toBe(true);
expect(wu.slDisallowedReason).toBe('lookml_connection_mismatch');
expect(wu.notes).toContain('[LOOKML SL WRITES DISALLOWED]');
expect(wu.notes).toContain('reason: lookml_connection_mismatch');
expect(wu.notes).toContain('Do not call sl_write_source or sl_edit_source for this WorkUnit.');
});
});
describe('chunkLookmlProject — re-sync', () => {
it("modified file in one model only emits that model's WU", async () => {
const stagedDir = join(FIXTURE_ROOT, 'multi-model');
const project = await parseLookmlStagedDir(stagedDir);
const result = chunkLookmlProject(project, {
diffSet: {
added: [],
modified: ['views/campaigns.view.lkml'],
deleted: [],
unchanged: [
'marketing.model.lkml',
'orders.model.lkml',
'views/orders.view.lkml',
'views/shared_dims.view.lkml',
],
},
});
expect(result.workUnits).toHaveLength(1);
expect(result.workUnits[0].unitKey).toBe('lookml-marketing');
});
it("added file under a model emits that model's WU with the new path in rawFiles", async () => {
const stagedDir = join(FIXTURE_ROOT, 'single-model');
const project = await parseLookmlStagedDir(stagedDir);
const result = chunkLookmlProject(project, {
diffSet: {
added: ['views/customers.view.lkml'],
modified: [],
deleted: [],
unchanged: ['orders.model.lkml', 'views/orders.view.lkml'],
},
});
expect(result.workUnits).toHaveLength(1);
expect(result.workUnits[0].rawFiles).toContain('views/customers.view.lkml');
});
it('keeps dependencyPaths empty on re-sync when all extends ancestors are already in rawFiles', async () => {
const stagedDir = join(FIXTURE_ROOT, 'extends-chain');
const project = await parseLookmlStagedDir(stagedDir);
// Only orders_ext is touched; base and orders are upstream ancestors.
// Because the single-model WU's rawFiles ALREADY include all three on first run,
// they remain in rawFiles — dependencyPaths stays empty. Widening matters when
// re-sync drops some files from rawFiles, which doesn't apply for a monolithic
// single-model WU. Assert the baseline invariant.
const result = chunkLookmlProject(project, {
diffSet: {
added: [],
modified: ['views/orders_ext.view.lkml'],
deleted: [],
unchanged: ['orders.model.lkml', 'views/base.view.lkml', 'views/orders.view.lkml'],
},
});
expect(result.workUnits).toHaveLength(1);
const wu = result.workUnits[0];
expect(wu.rawFiles).toContain('views/orders_ext.view.lkml');
// Ancestors already in rawFiles → not duplicated into dependencyPaths.
expect(wu.dependencyPaths).toEqual([]);
});
it('widens dependencyPaths when an ancestor is OUTSIDE the WU (synthesized cross-model case)', () => {
// Synthesize a scenario in-memory: two models, "a" owns base.view.lkml,
// "b" owns derived.view.lkml which extends base. A diff that only touches
// derived.view.lkml should widen b's WU with base.view.lkml in dependencyPaths
// if base lives outside b's rawFiles. In practice with the current emit rules,
// base.view.lkml would already be in dependencyPaths because model b lists
// base.view.lkml under its `include:`. Here we confirm the widening is idempotent.
const project: ParsedLookmlProject = {
models: [
{ path: 'a.model.lkml', name: 'a', includes: ['views/base.view.lkml'], explores: [], connectionName: null },
{
path: 'b.model.lkml',
name: 'b',
includes: ['views/base.view.lkml', 'views/derived.view.lkml'],
explores: [],
connectionName: null,
},
],
views: [
{ path: 'views/base.view.lkml', name: 'base', extendsFrom: [], rawSqlTableName: null },
{ path: 'views/derived.view.lkml', name: 'derived', extendsFrom: ['base'], rawSqlTableName: null },
],
dashboards: [],
allPaths: ['a.model.lkml', 'b.model.lkml', 'views/base.view.lkml', 'views/derived.view.lkml'],
};
const result = chunkLookmlProject(project, {
diffSet: {
added: [],
modified: ['views/derived.view.lkml'],
deleted: [],
unchanged: ['a.model.lkml', 'b.model.lkml', 'views/base.view.lkml'],
},
});
const b = result.workUnits.find((wu) => wu.unitKey === 'lookml-b');
expect(b).toBeDefined();
if (!b) {
throw new Error('expected lookml-b work unit');
}
expect(b.dependencyPaths).toContain('views/base.view.lkml');
});
it('passes through diffSet.deleted as an EvictionUnit', async () => {
const project = await parseLookmlStagedDir(join(FIXTURE_ROOT, 'single-model'));
const result = chunkLookmlProject(project, {
diffSet: {
added: [],
modified: [],
deleted: ['views/zombie.view.lkml'],
unchanged: ['orders.model.lkml', 'views/customers.view.lkml', 'views/orders.view.lkml'],
},
});
expect(result.eviction).toEqual({ deletedRawPaths: ['views/zombie.view.lkml'] });
// No WU emitted because no current files are touched.
expect(result.workUnits).toEqual([]);
});
});
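The multi-model tests above rely on a shared view being owned by the lexicographically-first model that includes it. The actual ownership logic lives in `graph.ts`, which is not shown here, so the following is only a minimal sketch of that rule under assumptions: `ModelSketch` and `assignViewOwners` are hypothetical names, and `includes` is assumed to already hold resolved view paths rather than `include:` globs.

```typescript
// Hypothetical, simplified model shape; the real ParsedLookmlProject carries more fields.
interface ModelSketch {
  name: string;
  includes: string[]; // view paths, assumed already resolved from `include:` patterns
}

// Assign each view path to the lexicographically-first model that includes it.
function assignViewOwners(models: ModelSketch[]): Map<string, string> {
  const ownerByViewPath = new Map<string, string>();
  // Sort a copy by model name so the result does not depend on input order.
  const sorted = [...models].sort((a, b) => a.name.localeCompare(b.name));
  for (const model of sorted) {
    for (const viewPath of model.includes) {
      // The first includer in sorted order keeps ownership; later models only depend on the view.
      if (!ownerByViewPath.has(viewPath)) {
        ownerByViewPath.set(viewPath, model.name);
      }
    }
  }
  return ownerByViewPath;
}

const owners = assignViewOwners([
  { name: 'orders', includes: ['views/orders.view.lkml', 'views/shared_dims.view.lkml'] },
  { name: 'marketing', includes: ['views/campaigns.view.lkml', 'views/shared_dims.view.lkml'] },
]);
```

Sorting by name before assigning owners is what makes the result deterministic, which is the property the "is deterministic" test checks at the WorkUnit level.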


@ -0,0 +1,159 @@
import type { ChunkResult, DiffSet, WorkUnit } from '../../types.js';
import { buildLookmlGraph, type LookmlGraph } from './graph.js';
import type { ParsedLookmlProject } from './parse.js';
interface ChunkOptions {
diffSet?: DiffSet;
mismatchedModelNames?: Set<string>;
}
function lookmlSlDisallowedNotes(modelName: string, existingNotes: string): string {
return [
'[LOOKML SL WRITES DISALLOWED]',
'reason: lookml_connection_mismatch',
`model: ${modelName}`,
'Do not call sl_write_source or sl_edit_source for this WorkUnit.',
'Continue wiki extraction and context candidates from the raw LookML files.',
'[/LOOKML SL WRITES DISALLOWED]',
'',
existingNotes,
].join('\n');
}
/**
* Emit WorkUnits for a parsed LookML project.
*
* First run (no diffSet): one WU per model + `lookml-orphans` (if any non-owned views)
* + `lookml-dashboard-<name>` per dashboard file.
*
* Re-sync (diffSet provided): keep only WUs whose rawFiles intersect added ∪ modified;
* widen each kept WU's dependencyPaths with the paths of all transitive `extends`
* ancestors of the views in its rawFiles that are not already in rawFiles.
* Emit a single EvictionUnit for diffSet.deleted.
*/
export function chunkLookmlProject(project: ParsedLookmlProject, opts: ChunkOptions = {}): ChunkResult {
const graph = buildLookmlGraph(project);
const firstRunUnits = emitFirstRunWorkUnits(project, graph, opts);
if (!opts.diffSet) {
return { workUnits: firstRunUnits };
}
return applyDiffSet(firstRunUnits, project, graph, opts.diffSet);
}
function emitFirstRunWorkUnits(project: ParsedLookmlProject, graph: LookmlGraph, opts: ChunkOptions): WorkUnit[] {
const allModelPaths = [...new Set(project.models.map((m) => m.path))].sort();
const allDashboardPaths = [...new Set(project.dashboards.map((d) => d.path))].sort();
// Dedupe: a .view.lkml with multiple `view:` blocks produces multiple ParsedLookmlView
// entries sharing one path.
const allViewPaths = [...new Set(project.views.map((v) => v.path))].sort();
const workUnits: WorkUnit[] = [];
// Per-model WU, sorted by model name for determinism.
const sortedModels = [...project.models].sort((a, b) => a.name.localeCompare(b.name));
for (const model of sortedModels) {
const includedViewPaths = (graph.viewsIncludedByModel.get(model.name) ?? []).filter((p) =>
allViewPaths.includes(p),
);
// Views this model both includes and owns (the lexicographically-first includer wins).
const ownedViewPaths = includedViewPaths.filter((p) => graph.ownerByViewPath.get(p) === model.name);
// Views the model includes but that another lexicographically-earlier model owns.
// These land in dependencyPaths so this WU's agent can READ them, but the "canonical
// write" for those views happens in the owner's WU.
const nonOwnedDepViewPaths = includedViewPaths.filter((p) => graph.ownerByViewPath.get(p) !== model.name).sort();
const rawFiles = [model.path, ...ownedViewPaths].sort();
const peerFileIndex = [
...allModelPaths.filter((p) => p !== model.path),
...allViewPaths.filter((p) => !rawFiles.includes(p) && !nonOwnedDepViewPaths.includes(p)),
...allDashboardPaths,
].sort();
const isMismatched = opts.mismatchedModelNames?.has(model.name) ?? false;
const notes =
model.explores.length > 0
? `LookML model "${model.name}" (explores: ${model.explores.join(', ')})`
: `LookML model "${model.name}"`;
workUnits.push({
unitKey: `lookml-${model.name}`,
displayLabel: `LookML model "${model.name}"`,
rawFiles,
peerFileIndex,
dependencyPaths: nonOwnedDepViewPaths,
notes: isMismatched ? lookmlSlDisallowedNotes(model.name, notes) : notes,
slDisallowed: isMismatched ? true : undefined,
slDisallowedReason: isMismatched ? 'lookml_connection_mismatch' : undefined,
});
}
// Orphan view WU — views that no model includes. Skip entirely if none.
const orphanViewPaths = allViewPaths.filter((p) => !graph.ownerByViewPath.has(p)).sort();
if (orphanViewPaths.length > 0) {
workUnits.push({
unitKey: 'lookml-orphans',
displayLabel: 'LookML orphan views',
rawFiles: orphanViewPaths,
peerFileIndex: [...allModelPaths, ...allDashboardPaths].sort(),
dependencyPaths: [],
notes: 'Views not referenced by any .model.lkml (orphaned)',
});
}
// One WU per dashboard file.
for (const dashboard of [...project.dashboards].sort((a, b) => a.name.localeCompare(b.name))) {
workUnits.push({
unitKey: `lookml-dashboard-${dashboard.name}`,
displayLabel: `LookML dashboard "${dashboard.name}"`,
rawFiles: [dashboard.path],
peerFileIndex: [...allModelPaths, ...allViewPaths].sort(),
dependencyPaths: [],
notes: `LookML dashboard "${dashboard.name}"`,
});
}
return workUnits;
}
function applyDiffSet(
firstRunUnits: WorkUnit[],
_project: ParsedLookmlProject,
graph: LookmlGraph,
diffSet: DiffSet,
): ChunkResult {
const touched = new Set([...diffSet.added, ...diffSet.modified]);
const keptUnits: WorkUnit[] = [];
for (const wu of firstRunUnits) {
const anyTouched = wu.rawFiles.some((p) => touched.has(p));
if (!anyTouched) {
continue;
}
// Widen dependencyPaths: for every view in rawFiles, add paths of all transitive
// extends ancestors (if known in the graph) that aren't already in rawFiles.
const existingDeps = new Set(wu.dependencyPaths);
for (const rawPath of wu.rawFiles) {
const viewNames = graph.viewNamesByPath.get(rawPath) ?? [];
for (const viewName of viewNames) {
const ancestors = graph.extendsAncestorsByViewName.get(viewName) ?? [];
for (const ancestorName of ancestors) {
const ancestorPaths = graph.pathsByViewName.get(ancestorName) ?? [];
for (const ancestorPath of ancestorPaths) {
if (!wu.rawFiles.includes(ancestorPath)) {
existingDeps.add(ancestorPath);
}
}
}
}
}
keptUnits.push({
...wu,
dependencyPaths: [...existingDeps].sort(),
});
}
const eviction = diffSet.deleted.length > 0 ? { deletedRawPaths: [...diffSet.deleted].sort() } : undefined;
return { workUnits: keptUnits, eviction };
}
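The re-sync path in `applyDiffSet` can be illustrated in isolation. The sketch below covers only the filtering and eviction steps and leaves out the dependency widening. `DiffSetSketch`, `UnitSketch`, and `filterTouchedUnits` are hypothetical stand-ins for the real `DiffSet` and `WorkUnit` types, which are defined elsewhere and not shown here.

```typescript
// Hypothetical, reduced versions of DiffSet and WorkUnit for illustration only.
interface DiffSetSketch {
  added: string[];
  modified: string[];
  deleted: string[];
  unchanged: string[];
}
interface UnitSketch {
  unitKey: string;
  rawFiles: string[];
}

function filterTouchedUnits(units: UnitSketch[], diff: DiffSetSketch) {
  // A unit is re-emitted only if at least one of its own files was added or modified.
  const touched = new Set([...diff.added, ...diff.modified]);
  const workUnits = units.filter((u) => u.rawFiles.some((p) => touched.has(p)));
  // Deleted paths are passed through as one eviction record, or omitted when there are none.
  const eviction = diff.deleted.length > 0 ? { deletedRawPaths: [...diff.deleted].sort() } : undefined;
  return { workUnits, eviction };
}

const resync = filterTouchedUnits(
  [
    { unitKey: 'lookml-marketing', rawFiles: ['marketing.model.lkml', 'views/campaigns.view.lkml'] },
    { unitKey: 'lookml-orders', rawFiles: ['orders.model.lkml', 'views/orders.view.lkml'] },
  ],
  {
    added: [],
    modified: ['views/campaigns.view.lkml'],
    deleted: ['views/zombie.view.lkml'],
    unchanged: ['marketing.model.lkml', 'orders.model.lkml', 'views/orders.view.lkml'],
  },
);
```

Note that a change to a file that a unit only lists in `dependencyPaths` does not keep that unit in this sketch, since only `rawFiles` is checked against the touched set.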


@ -0,0 +1,46 @@
import { mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { detectLookmlStagedDir } from './detect.js';
describe('detectLookmlStagedDir', () => {
let stagedDir: string;
beforeEach(async () => {
stagedDir = await mkdtemp(join(tmpdir(), 'lkml-detect-'));
});
afterEach(async () => rm(stagedDir, { recursive: true, force: true }));
it('returns true when a .model.lkml is present at root', async () => {
await writeFile(join(stagedDir, 'orders.model.lkml'), 'include: "views/*"\n', 'utf-8');
expect(await detectLookmlStagedDir(stagedDir)).toBe(true);
});
it('returns true when only a .view.lkml is present (no model)', async () => {
await writeFile(join(stagedDir, 'x.view.lkml'), 'view: x {}\n', 'utf-8');
expect(await detectLookmlStagedDir(stagedDir)).toBe(true);
});
it('returns true when .lkml files are nested under any subdirectory', async () => {
await mkdir(join(stagedDir, 'nested', 'deeper'), { recursive: true });
await writeFile(join(stagedDir, 'nested', 'deeper', 'x.view.lkml'), 'view: x {}\n', 'utf-8');
expect(await detectLookmlStagedDir(stagedDir)).toBe(true);
});
it('accepts the .lookml extension as well as .lkml', async () => {
await writeFile(join(stagedDir, 'x.view.lookml'), 'view: x {}\n', 'utf-8');
expect(await detectLookmlStagedDir(stagedDir)).toBe(true);
});
it('returns false for a bundle with no .lkml files at all', async () => {
await writeFile(join(stagedDir, 'README.md'), '# hi\n', 'utf-8');
await writeFile(join(stagedDir, 'config.yaml'), 'a: 1\n', 'utf-8');
expect(await detectLookmlStagedDir(stagedDir)).toBe(false);
});
it('returns false for an empty directory', async () => {
expect(await detectLookmlStagedDir(stagedDir)).toBe(false);
});
});


@ -0,0 +1,13 @@
import { readdir } from 'node:fs/promises';
const LKML_EXT_RE = /\.(lkml|lookml)$/i;
export async function detectLookmlStagedDir(stagedDir: string): Promise<boolean> {
const entries = await readdir(stagedDir, { withFileTypes: true, recursive: true });
for (const entry of entries) {
if (entry.isFile() && LKML_EXT_RE.test(entry.name)) {
return true;
}
}
return false;
}
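To make the extension check concrete, here is a small sketch that applies the same regular expression to a few file names. The sample names are made up for illustration and do not come from the repository's fixtures.

```typescript
// Same pattern as in detect.ts: case-insensitive, anchored at the end of the name.
const LKML_EXT_RE = /\.(lkml|lookml)$/i;

// Illustrative file names only.
const sampleNames = ['orders.model.lkml', 'X.VIEW.LOOKML', 'README.md', 'notes.lkml.bak'];

// Keep the names that end in .lkml or .lookml.
const lookmlNames = sampleNames.filter((name) => LKML_EXT_RE.test(name));
```

Because the pattern has no `g` flag, repeated `test` calls carry no `lastIndex` state, so it is safe to reuse the one module-level constant across entries.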


@ -0,0 +1,113 @@
import { mkdtemp, readFile, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import type { ParsedLookmlProject } from './parse.js';
import {
LOOKML_FETCH_REPORT_FILE,
LOOKML_MISMATCHED_MODELS_FILE,
buildLookmlValidationArtifacts,
readLookmlFetchReport,
readLookmlMismatchedModelNames,
writeLookmlValidationArtifacts,
} from './fetch-report.js';
function project(models: ParsedLookmlProject['models']): ParsedLookmlProject {
return { models, views: [], dashboards: [], allPaths: models.map((m) => m.path) };
}
describe('LookML validation fetch report', () => {
let stagedDir: string;
beforeEach(async () => {
stagedDir = await mkdtemp(join(tmpdir(), 'lookml-report-'));
});
afterEach(async () => rm(stagedDir, { recursive: true, force: true }));
it('emits partial warning artifacts for mismatched model connection names', async () => {
const artifacts = buildLookmlValidationArtifacts(
project([
{
path: 'b2b.model.lkml',
name: 'b2b',
includes: [],
explores: ['orders'],
connectionName: 'staging_pg',
},
{
path: 'finance.model.lkml',
name: 'finance',
includes: [],
explores: ['revenue'],
connectionName: 'b2b_sandbox_bq',
},
]),
{ expectedLookerConnectionName: 'b2b_sandbox_bq' },
);
expect(artifacts.mismatchedModelNames).toEqual(['b2b']);
expect(artifacts.report.status).toBe('partial');
expect(artifacts.report.warnings).toEqual([
{
rawPath: 'b2b.model.lkml',
entityType: 'lookml_models',
entityId: 'b2b',
severity: 'warning',
statusCode: null,
message:
'LookML model b2b declares connection staging_pg but this warehouse expects b2b_sandbox_bq; SL writes are disabled for this model.',
retryRecommended: false,
kind: 'lookml_connection_mismatch',
details: { model: 'b2b', declared: 'staging_pg', expected: 'b2b_sandbox_bq' },
},
]);
});
it('emits success when no expected connection is configured', () => {
const artifacts = buildLookmlValidationArtifacts(
project([
{
path: 'b2b.model.lkml',
name: 'b2b',
includes: [],
explores: [],
connectionName: 'staging_pg',
},
]),
{ expectedLookerConnectionName: null },
);
expect(artifacts.mismatchedModelNames).toEqual([]);
expect(artifacts.report).toEqual({
status: 'success',
retryRecommended: false,
skipped: [],
warnings: [],
});
});
it('round-trips the fetch report and mismatched model sidecar', async () => {
const artifacts = buildLookmlValidationArtifacts(
project([
{
path: 'orders.model.lkml',
name: 'orders',
includes: [],
explores: [],
connectionName: 'wrong',
},
]),
{ expectedLookerConnectionName: 'expected' },
);
await writeLookmlValidationArtifacts(stagedDir, artifacts);
await expect(readFile(join(stagedDir, LOOKML_FETCH_REPORT_FILE), 'utf-8')).resolves.toContain(
'lookml_connection_mismatch',
);
await expect(readFile(join(stagedDir, LOOKML_MISMATCHED_MODELS_FILE), 'utf-8')).resolves.toContain('orders');
await expect(readLookmlFetchReport(stagedDir)).resolves.toEqual(artifacts.report);
await expect(readLookmlMismatchedModelNames(stagedDir)).resolves.toEqual(new Set(['orders']));
});
});


@ -0,0 +1,127 @@
import { mkdir, readFile, writeFile } from 'node:fs/promises';
import { dirname, join } from 'node:path';
import * as z from 'zod';
import type { SourceFetchReport } from '../../types.js';
import type { ParsedLookmlProject } from './parse.js';
/** @internal */
export const LOOKML_FETCH_REPORT_FILE = 'lookml-fetch-report.json';
/** @internal */
export const LOOKML_MISMATCHED_MODELS_FILE = 'lookml-mismatched-models.json';
const fetchIssueKindSchema = z.enum([
'unmapped_looker_connection',
'unparseable_sql_table_name',
'looker_template_unresolved',
'derived_table_not_supported',
'lookml_connection_mismatch',
]);
const fetchIssueSchema = z.object({
rawPath: z.string().min(1),
entityType: z.string().min(1),
entityId: z.string().nullable(),
severity: z.enum(['warning', 'error']),
statusCode: z.number().int().nullable(),
message: z.string().min(1),
retryRecommended: z.boolean(),
kind: fetchIssueKindSchema.optional(),
details: z.record(z.string(), z.unknown()).optional(),
});
const fetchReportSchema = z.object({
status: z.enum(['success', 'partial']),
retryRecommended: z.boolean(),
skipped: z.array(fetchIssueSchema),
warnings: z.array(fetchIssueSchema),
});
const mismatchedModelsSchema = z.object({
modelNames: z.array(z.string().min(1)).default([]),
});
interface LookmlValidationArtifacts {
report: SourceFetchReport;
mismatchedModelNames: string[];
}
export function buildLookmlValidationArtifacts(
project: ParsedLookmlProject,
config: { expectedLookerConnectionName: string | null },
): LookmlValidationArtifacts {
const expected = config.expectedLookerConnectionName;
if (!expected) {
return {
report: { status: 'success', retryRecommended: false, skipped: [], warnings: [] },
mismatchedModelNames: [],
};
}
const mismatched = project.models
.filter((model) => model.connectionName !== null && model.connectionName !== expected)
.sort((a, b) => a.name.localeCompare(b.name));
const warnings = mismatched.map((model) => {
const declared = model.connectionName ?? '(none)';
return {
rawPath: model.path,
entityType: 'lookml_models',
entityId: model.name,
severity: 'warning' as const,
statusCode: null,
message: `LookML model ${model.name} declares connection ${declared} but this warehouse expects ${expected}; SL writes are disabled for this model.`,
retryRecommended: false,
kind: 'lookml_connection_mismatch' as const,
details: { model: model.name, declared, expected },
};
});
return {
report: {
status: warnings.length > 0 ? 'partial' : 'success',
retryRecommended: false,
skipped: [],
warnings,
},
mismatchedModelNames: mismatched.map((model) => model.name),
};
}
export async function writeLookmlValidationArtifacts(
stagedDir: string,
artifacts: LookmlValidationArtifacts,
): Promise<void> {
const reportPath = join(stagedDir, LOOKML_FETCH_REPORT_FILE);
await mkdir(dirname(reportPath), { recursive: true });
await writeFile(reportPath, `${JSON.stringify(fetchReportSchema.parse(artifacts.report), null, 2)}\n`, 'utf-8');
await writeFile(
join(stagedDir, LOOKML_MISMATCHED_MODELS_FILE),
`${JSON.stringify({ modelNames: artifacts.mismatchedModelNames }, null, 2)}\n`,
'utf-8',
);
}
export async function readLookmlFetchReport(stagedDir: string): Promise<SourceFetchReport | null> {
try {
const raw = await readFile(join(stagedDir, LOOKML_FETCH_REPORT_FILE), 'utf-8');
return fetchReportSchema.parse(JSON.parse(raw));
} catch (error) {
if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
return null;
}
throw error;
}
}
export async function readLookmlMismatchedModelNames(stagedDir: string): Promise<Set<string>> {
try {
const raw = await readFile(join(stagedDir, LOOKML_MISMATCHED_MODELS_FILE), 'utf-8');
const parsed = mismatchedModelsSchema.parse(JSON.parse(raw));
return new Set(parsed.modelNames);
} catch (error) {
if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
return new Set();
}
throw error;
}
}
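The mismatch rule in `buildLookmlValidationArtifacts` can be read as a small pure function. The sketch below is a reduced illustration under assumptions, not the module's API: `ModelConn` and `findMismatchedModelNames` are hypothetical names, and the report-building part is left out.

```typescript
// Hypothetical, reduced model shape: only the fields the mismatch rule reads.
interface ModelConn {
  name: string;
  connectionName: string | null;
}

function findMismatchedModelNames(models: ModelConn[], expected: string | null): string[] {
  // With no expected connection configured, nothing is treated as mismatched.
  if (!expected) {
    return [];
  }
  return models
    // A model with no declared connection is not flagged; only an explicit, different name is.
    .filter((m) => m.connectionName !== null && m.connectionName !== expected)
    .map((m) => m.name)
    .sort((a, b) => a.localeCompare(b));
}

const sampleModels: ModelConn[] = [
  { name: 'finance', connectionName: 'b2b_sandbox_bq' },
  { name: 'b2b', connectionName: 'staging_pg' },
  { name: 'legacy', connectionName: null },
];
const mismatchedNames = findMismatchedModelNames(sampleModels, 'b2b_sandbox_bq');
const noExpectation = findMismatchedModelNames(sampleModels, null);
```

The names produced this way are the kind of input `chunkLookmlProject` accepts as `mismatchedModelNames`, once wrapped in a `Set`.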


@ -0,0 +1,146 @@
import { mkdir, mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { makeLocalGitRepo } from '../../../test/make-local-git-repo.js';
import { fetchLookmlRepo } from './fetch.js';
import type { LookmlPullConfig } from './pull-config.js';
const FIXTURE_ROOT = join(__dirname, '../../../../test/fixtures/lookml');
function pullConfig(overrides: Partial<LookmlPullConfig> & Pick<LookmlPullConfig, 'repoUrl'>): LookmlPullConfig {
return {
branch: 'main',
path: null,
authToken: null,
expectedLookerConnectionName: null,
parsedTargetTables: {},
...overrides,
};
}
describe('fetchLookmlRepo', () => {
let tmpRoot: string;
beforeEach(async () => {
tmpRoot = await mkdtemp(join(tmpdir(), 'fetch-lookml-'));
});
afterEach(async () => rm(tmpRoot, { recursive: true, force: true }));
it('clones a local file:// repo and materializes only .lkml/.lookml files into stagedDir', async () => {
const repo = await makeLocalGitRepo(join(FIXTURE_ROOT, 'single-model'), join(tmpRoot, 'origin'));
// Add a non-LookML file to prove we filter it out.
await repo.writeFile('README.md', '# readme\n');
await repo.commit('add readme');
const stagedDir = join(tmpRoot, 'staged');
const cacheDir = join(tmpRoot, 'cache', 'conn-1');
await mkdir(stagedDir, { recursive: true });
const result = await fetchLookmlRepo({
config: pullConfig({ repoUrl: repo.repoUrl }),
cacheDir,
stagedDir,
});
expect(result.filesCopied).toBe(3); // orders.model.lkml + 2 views
expect(result.commitHash).toMatch(/^[0-9a-f]{40}$/);
await expect(readFile(join(stagedDir, 'orders.model.lkml'), 'utf-8')).resolves.toMatch(/connection:/);
await expect(readFile(join(stagedDir, 'views', 'orders.view.lkml'), 'utf-8')).resolves.toMatch(/view: orders/);
// README.md is present in the cache but NOT in stagedDir.
await expect(readFile(join(stagedDir, 'README.md'), 'utf-8')).rejects.toThrow();
await expect(readFile(join(cacheDir, 'README.md'), 'utf-8')).resolves.toMatch(/readme/);
});
it('pulls an existing cache dir (second call) and surfaces the new commit', async () => {
const repo = await makeLocalGitRepo(join(FIXTURE_ROOT, 'single-model'), join(tmpRoot, 'origin'));
const stagedDir1 = join(tmpRoot, 'staged-1');
const stagedDir2 = join(tmpRoot, 'staged-2');
const cacheDir = join(tmpRoot, 'cache', 'conn-1');
await mkdir(stagedDir1, { recursive: true });
await mkdir(stagedDir2, { recursive: true });
const r1 = await fetchLookmlRepo({
config: pullConfig({ repoUrl: repo.repoUrl }),
cacheDir,
stagedDir: stagedDir1,
});
// Commit a new revision in the origin — a modified view.
await repo.writeFile('views/orders.view.lkml', 'view: orders { sql_table_name: public.orders_v2 ;; }\n');
    await repo.commit('bump');
    const r2 = await fetchLookmlRepo({
      config: pullConfig({ repoUrl: repo.repoUrl }),
      cacheDir,
      stagedDir: stagedDir2,
    });
    expect(r2.commitHash).not.toBe(r1.commitHash);
    await expect(readFile(join(stagedDir2, 'views', 'orders.view.lkml'), 'utf-8')).resolves.toMatch(/orders_v2/);
  });

  it('respects config.path — only files under that subtree land in stagedDir', async () => {
    // Build a multi-subdir repo: models/... + views/...
    const originRoot = join(tmpRoot, 'origin');
    await mkdir(originRoot, { recursive: true });
    await mkdir(join(originRoot, 'fixture-src', 'models'), { recursive: true });
    await mkdir(join(originRoot, 'fixture-src', 'views'), { recursive: true });
    await writeFile(join(originRoot, 'fixture-src', 'models', 'orders.model.lkml'), 'connection: "c"\n', 'utf-8');
    await writeFile(join(originRoot, 'fixture-src', 'views', 'orders.view.lkml'), 'view: orders {}\n', 'utf-8');
    const repo = await makeLocalGitRepo(join(originRoot, 'fixture-src'), join(originRoot, 'git'));
    const stagedDir = join(tmpRoot, 'staged');
    const cacheDir = join(tmpRoot, 'cache', 'conn-path');
    await mkdir(stagedDir, { recursive: true });
    const result = await fetchLookmlRepo({
      config: pullConfig({ repoUrl: repo.repoUrl, path: 'views' }),
      cacheDir,
      stagedDir,
    });
    expect(result.filesCopied).toBe(1);
    await expect(readFile(join(stagedDir, 'orders.view.lkml'), 'utf-8')).resolves.toMatch(/view: orders/);
    // The model under `models/` is NOT copied because we scoped to `views/`.
    await expect(readFile(join(stagedDir, 'orders.model.lkml'), 'utf-8')).rejects.toThrow();
  });

  it('falls back to fresh clone when the cache dir is corrupt', async () => {
    const repo = await makeLocalGitRepo(join(FIXTURE_ROOT, 'single-model'), join(tmpRoot, 'origin'));
    const stagedDir = join(tmpRoot, 'staged');
    const cacheDir = join(tmpRoot, 'cache', 'conn-bad');
    await mkdir(stagedDir, { recursive: true });
    // Pre-create a cacheDir that looks like a git repo but is corrupt.
    await mkdir(join(cacheDir, '.git'), { recursive: true });
    await writeFile(join(cacheDir, '.git', 'HEAD'), 'garbage\n', 'utf-8');
    const result = await fetchLookmlRepo({
      config: pullConfig({ repoUrl: repo.repoUrl }),
      cacheDir,
      stagedDir,
    });
    expect(result.filesCopied).toBeGreaterThan(0);
  });

  it('sanitizes auth tokens out of error messages when clone fails', async () => {
    const stagedDir = join(tmpRoot, 'staged');
    const cacheDir = join(tmpRoot, 'cache', 'conn-bad-url');
    await mkdir(stagedDir, { recursive: true });
    await expect(
      fetchLookmlRepo({
        config: pullConfig({
          repoUrl: 'http://definitely-not-a-real-host.test/r.git',
          authToken: 'supersecret-token',
        }),
        cacheDir,
        stagedDir,
      }),
    ).rejects.toThrow(
      // Error is thrown with sanitized message — the token is replaced by '***'.
      // The exact message depends on simple-git's failure mode; we assert the token does NOT appear.
      expect.objectContaining({ message: expect.not.stringContaining('supersecret-token') }),
    );
  });
});

@ -0,0 +1,75 @@
import { access, copyFile, mkdir, readdir } from 'node:fs/promises';
import { join, relative } from 'node:path';
import { cloneOrPull, sanitizeRepoError } from '../../repo-fetch.js';
import type { LookmlPullConfig } from './pull-config.js';

export interface FetchLookmlRepoParams {
  config: LookmlPullConfig;
  /** Persistent cache directory (typically per-connection). Cloned here once, pulled on subsequent calls. */
  cacheDir: string;
  /** Per-job staged directory that the adapter writes `.lkml`/`.lookml` files into. */
  stagedDir: string;
}

export interface FetchLookmlRepoResult {
  /** SHA of the repo HEAD after the pull. */
  commitHash: string;
  /** Number of LookML files copied into `stagedDir`. */
  filesCopied: number;
}

const LKML_EXT_RE = /\.(lkml|lookml)$/i;

export async function fetchLookmlRepo(params: FetchLookmlRepoParams): Promise<FetchLookmlRepoResult> {
  const { config, cacheDir, stagedDir } = params;
  const branch = config.branch || 'main';
  try {
    const { commitHash } = await cloneOrPull({
      repoUrl: config.repoUrl,
      authToken: config.authToken,
      cacheDir,
      branch,
    });
    const sourceRoot = config.path ? join(cacheDir, config.path) : cacheDir;
    const filesCopied = await copyLkmlFilesRecursive(sourceRoot, stagedDir);
    return { commitHash, filesCopied };
  } catch (err) {
    throw new Error(sanitizeRepoError(err, config.authToken));
  }
}

async function copyLkmlFilesRecursive(sourceRoot: string, destRoot: string): Promise<number> {
  if (!(await dirExists(sourceRoot))) {
    return 0;
  }
  await mkdir(destRoot, { recursive: true });
  const entries = await readdir(sourceRoot, { withFileTypes: true, recursive: true });
  let copied = 0;
  for (const entry of entries) {
    if (!entry.isFile()) {
      continue;
    }
    if (!LKML_EXT_RE.test(entry.name)) {
      continue;
    }
    const absSrc = join(entry.parentPath, entry.name);
    const rel = relative(sourceRoot, absSrc);
    const dest = join(destRoot, rel);
    await mkdir(join(dest, '..'), { recursive: true });
    await copyFile(absSrc, dest);
    copied++;
  }
  return copied;
}

async function dirExists(path: string): Promise<boolean> {
  try {
    await access(path);
    return true;
  } catch {
    return false;
  }
}

@ -0,0 +1,118 @@
import { describe, expect, it } from 'vitest';
import { buildLookmlGraph } from './graph.js';
import type { ParsedLookmlProject } from './parse.js';

type LooseParsedLookmlProject = Omit<Partial<ParsedLookmlProject>, 'models' | 'views'> & {
  models?: Array<Omit<ParsedLookmlProject['models'][number], 'connectionName'> & { connectionName?: string | null }>;
  views?: Array<Omit<ParsedLookmlProject['views'][number], 'rawSqlTableName'> & { rawSqlTableName?: string | null }>;
};

const mkProject = (overrides: LooseParsedLookmlProject): ParsedLookmlProject => ({
  dashboards: [],
  allPaths: [],
  ...overrides,
  models: (overrides.models ?? []).map((model) => ({ connectionName: null, ...model })),
  views: (overrides.views ?? []).map((view) => ({ rawSqlTableName: null, ...view })),
});

describe('buildLookmlGraph', () => {
  it('assigns a single model as owner of all its included views', () => {
    const project = mkProject({
      models: [{ path: 'orders.model.lkml', name: 'orders', includes: ['views/*.view.lkml'], explores: ['orders'] }],
      views: [
        { path: 'views/orders.view.lkml', name: 'orders', extendsFrom: [] },
        { path: 'views/customers.view.lkml', name: 'customers', extendsFrom: [] },
      ],
      allPaths: ['orders.model.lkml', 'views/customers.view.lkml', 'views/orders.view.lkml'],
    });
    const graph = buildLookmlGraph(project);
    expect(graph.ownerByViewPath.get('views/orders.view.lkml')).toBe('orders');
    expect(graph.ownerByViewPath.get('views/customers.view.lkml')).toBe('orders');
    expect(graph.viewsIncludedByModel.get('orders')?.sort()).toEqual([
      'views/customers.view.lkml',
      'views/orders.view.lkml',
    ]);
  });

  it('assigns shared views to the lexicographically-first model that includes them', () => {
    const project = mkProject({
      models: [
        { path: 'marketing.model.lkml', name: 'marketing', includes: ['views/shared.view.lkml'], explores: [] },
        {
          path: 'orders.model.lkml',
          name: 'orders',
          includes: ['views/shared.view.lkml', 'views/orders.view.lkml'],
          explores: [],
        },
      ],
      views: [
        { path: 'views/shared.view.lkml', name: 'shared', extendsFrom: [] },
        { path: 'views/orders.view.lkml', name: 'orders', extendsFrom: [] },
      ],
      allPaths: ['marketing.model.lkml', 'orders.model.lkml', 'views/orders.view.lkml', 'views/shared.view.lkml'],
    });
    const graph = buildLookmlGraph(project);
    // "marketing" sorts before "orders", so marketing owns the shared view.
    expect(graph.ownerByViewPath.get('views/shared.view.lkml')).toBe('marketing');
    expect(graph.ownerByViewPath.get('views/orders.view.lkml')).toBe('orders');
    // Both models list the shared view in their include set:
    expect(graph.includersByViewPath.get('views/shared.view.lkml')?.sort()).toEqual(['marketing', 'orders']);
  });

  it('resolves transitive extends chains into dependency paths', () => {
    const project = mkProject({
      models: [{ path: 'orders.model.lkml', name: 'orders', includes: ['views/*.view.lkml'], explores: [] }],
      views: [
        { path: 'views/base.view.lkml', name: 'base', extendsFrom: [] },
        { path: 'views/orders.view.lkml', name: 'orders', extendsFrom: ['base'] },
        { path: 'views/orders_ext.view.lkml', name: 'orders_ext', extendsFrom: ['orders'] },
      ],
      allPaths: ['orders.model.lkml', 'views/base.view.lkml', 'views/orders.view.lkml', 'views/orders_ext.view.lkml'],
    });
    const graph = buildLookmlGraph(project);
    expect(graph.extendsAncestorsByViewName.get('orders_ext')?.sort()).toEqual(['base', 'orders']);
    expect(graph.extendsAncestorsByViewName.get('orders')?.sort()).toEqual(['base']);
    expect(graph.extendsAncestorsByViewName.get('base')?.sort()).toEqual([]);
  });

  it('resolves glob-style include patterns (views/*.view.lkml) against allPaths', () => {
    const project = mkProject({
      models: [{ path: 'orders.model.lkml', name: 'orders', includes: ['views/*.view.lkml'], explores: [] }],
      views: [
        { path: 'views/a.view.lkml', name: 'a', extendsFrom: [] },
        { path: 'views/sub/b.view.lkml', name: 'b', extendsFrom: [] },
      ],
      allPaths: ['orders.model.lkml', 'views/a.view.lkml', 'views/sub/b.view.lkml'],
    });
    const graph = buildLookmlGraph(project);
    // Single-star glob matches one path segment — "views/sub/b.view.lkml" is NOT matched.
    expect(graph.viewsIncludedByModel.get('orders')?.sort()).toEqual(['views/a.view.lkml']);
  });

  it('resolves double-star include patterns (views/**/*.view.lkml) recursively', () => {
    const project = mkProject({
      models: [{ path: 'orders.model.lkml', name: 'orders', includes: ['views/**/*.view.lkml'], explores: [] }],
      views: [
        { path: 'views/a.view.lkml', name: 'a', extendsFrom: [] },
        { path: 'views/sub/b.view.lkml', name: 'b', extendsFrom: [] },
      ],
      allPaths: ['orders.model.lkml', 'views/a.view.lkml', 'views/sub/b.view.lkml'],
    });
    const graph = buildLookmlGraph(project);
    expect(graph.viewsIncludedByModel.get('orders')?.sort()).toEqual(['views/a.view.lkml', 'views/sub/b.view.lkml']);
  });

  it('leaves a view ownerless when no model includes it', () => {
    const project = mkProject({
      models: [{ path: 'other.model.lkml', name: 'other', includes: ['views/included.view.lkml'], explores: [] }],
      views: [
        { path: 'views/included.view.lkml', name: 'included', extendsFrom: [] },
        { path: 'views/orphan.view.lkml', name: 'orphan', extendsFrom: [] },
      ],
      allPaths: ['other.model.lkml', 'views/included.view.lkml', 'views/orphan.view.lkml'],
    });
    const graph = buildLookmlGraph(project);
    expect(graph.ownerByViewPath.has('views/orphan.view.lkml')).toBe(false);
    expect(graph.ownerByViewPath.get('views/included.view.lkml')).toBe('other');
  });
});

Some files were not shown because too many files have changed in this diff