fix(context): merge overlay columns onto manifest columns by name (#94)

* fix(context): merge overlay columns onto manifest columns by name

composeOverlay was appending overlay columns to the manifest column list,
producing duplicate entries when dbt/metabase overlays declared a column
just to attach descriptions. The duplicates carried no `type`, so the
pydantic SourceDefinition rejected them at semantic-query time and broke
`ktx sl query` for every overlay-backed measure. Now overlay columns
match base columns by name (case-insensitive): same-name entries merge
onto the manifest (overlay fields win, type/role fall back to the base,
descriptions merge per source key) and only new names append.
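A minimal sketch of the by-name merge, assuming a simplified, hypothetical `Column` shape and a hypothetical `mergeColumnsByName` helper. This is not the actual composeOverlay code, and keeping the manifest's name casing is an assumption of the sketch:

```typescript
// Hypothetical, simplified column shape; real semantic-layer columns carry more fields.
interface Column {
  name: string;
  type?: string;
  role?: string;
  descriptions?: Record<string, string>;
}

/**
 * Merge overlay columns onto base (manifest) columns by case-insensitive name.
 * Same-name entries merge in place: overlay fields win, type/role fall back to
 * the base, and descriptions merge per source key. Only new names are appended.
 */
function mergeColumnsByName(base: Column[], overlay: Column[]): Column[] {
  const merged = base.map((column) => ({ ...column }));
  const indexByName = new Map<string, number>();
  merged.forEach((column, i) => indexByName.set(column.name.toLowerCase(), i));

  for (const patch of overlay) {
    const key = patch.name.toLowerCase();
    const idx = indexByName.get(key);
    if (idx === undefined) {
      // New name: append it as a new column.
      indexByName.set(key, merged.length);
      merged.push({ ...patch });
      continue;
    }
    const current = merged[idx];
    merged[idx] = {
      ...current,
      ...patch,
      name: current.name, // sketch assumption: keep the manifest's casing
      type: patch.type ?? current.type,
      role: patch.role ?? current.role,
      descriptions: { ...current.descriptions, ...patch.descriptions },
    };
  }
  return merged;
}
```

Matching on a lowercased key means an overlay `STATUS` entry patches a manifest `status` column instead of appending a typeless duplicate.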

* refactor(sl): split overlay columns from column_overrides and enforce TS/Python wire contract

Overlay sources now have two distinct collections: `columns:` for computed
columns (requiring `expr` + `type`) and `column_overrides:` for metadata
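patches to inherited manifest columns. Composing or loading an overlay that
mixes the two (for example, a metadata-only patch left under `columns:`),
overrides an unknown column, redefines a manifest column as a computed column,
or both excludes and overrides the same column fails with a file-attributed error.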
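The split between the two collections can be sketched as follows. The error class names mirror those introduced in this commit. The `Column`/`OverlayColumns` shapes, the `applyOverlayColumns` helper, case-insensitive matching, and the rule that a computed column may not reuse even an excluded manifest name are all simplifying assumptions:

```typescript
class UnknownColumnOverrideError extends Error {}
class ColumnNameCollisionError extends Error {}
class ConflictingExcludeAndOverrideError extends Error {}

// Hypothetical, simplified shapes for illustration only.
interface Column {
  name: string;
  type: string;
  expr?: string;
  descriptions?: Record<string, string>;
}
interface ColumnOverride {
  name: string;
  descriptions?: Record<string, string>;
}
interface OverlayColumns {
  exclude_columns?: string[];
  column_overrides?: ColumnOverride[];
  columns?: Array<{ name: string; type: string; expr: string }>;
}

const key = (name: string) => name.toLowerCase();

function applyOverlayColumns(base: Column[], overlay: OverlayColumns): Column[] {
  const excluded = new Set((overlay.exclude_columns ?? []).map(key));
  const baseNames = new Set(base.map((c) => key(c.name)));
  const overrides = new Map<string, ColumnOverride>();

  // Validate overrides against the full manifest, before exclusions are applied.
  for (const override of overlay.column_overrides ?? []) {
    const k = key(override.name);
    if (excluded.has(k)) {
      throw new ConflictingExcludeAndOverrideError(`column '${override.name}' is both excluded and overridden`);
    }
    if (!baseNames.has(k)) {
      throw new UnknownColumnOverrideError(`column_overrides references unknown column '${override.name}'`);
    }
    overrides.set(k, override);
  }

  // Drop excluded columns, then patch metadata onto the surviving manifest columns.
  const result: Column[] = base
    .filter((c) => !excluded.has(key(c.name)))
    .map((c) => {
      const o = overrides.get(key(c.name));
      return o ? { ...c, descriptions: { ...c.descriptions, ...o.descriptions } } : { ...c };
    });

  // Computed columns must introduce names that are not in the manifest.
  for (const computed of overlay.columns ?? []) {
    if (baseNames.has(key(computed.name))) {
      throw new ColumnNameCollisionError(`computed column '${computed.name}' collides with a manifest column`);
    }
    result.push({ ...computed });
  }
  return result;
}
```

Validating overrides before applying exclusions means an override on an excluded column is reported as a conflict, not as an unknown column.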

Introduce `ResolvedSemanticLayerSource` / `resolvedSourceSchema` /
`toResolvedWire` as the strict shape sent to the Python engine, and add a
schema contract test that diffs Zod against the Pydantic JSON schema dumped
by `python -m semantic_layer dump-schema`. `SourceDefinition` is now
`extra="forbid"` on the Python side.

`loadAllSources` surfaces per-file load errors instead of swallowing them,
so validation/query paths can report manifest shard parse failures.
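The shape of that change can be sketched as a loader that records one file-attributed message per failure instead of only logging it. `loadAll`, `requireClean`, and `LoadResult` are hypothetical names, and JSON stands in for YAML to keep the sketch dependency-free:

```typescript
// Hypothetical result shape, analogous to LoadAllSourcesResult in this commit.
interface LoadResult<T> {
  items: T[];
  loadErrors: string[];
}

// Parse every file, keeping successes plus a file-attributed error for each failure.
function loadAll<T>(files: Record<string, string>, parse: (content: string) => T): LoadResult<T> {
  const items: T[] = [];
  const loadErrors: string[] = [];
  for (const [path, content] of Object.entries(files)) {
    try {
      items.push(parse(content));
    } catch (error) {
      loadErrors.push(`${path}: ${error instanceof Error ? error.message : String(error)}`);
    }
  }
  return { items, loadErrors };
}

// A query path can refuse to run on a partially loaded layer, while a
// validation path can instead append loadErrors to its own findings.
function requireClean<T>(result: LoadResult<T>): T[] {
  if (result.loadErrors.length > 0) {
    throw new Error(`Semantic layer source load failed: ${result.loadErrors.join('; ')}`);
  }
  return result.items;
}
```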

* fix(context): make scan description generation resilient and quiet

A transient sampleTable failure during ingest used to take out every
table in a connection: generateTableDescription returned a hardcoded
'Table not found' string into descriptions.ai, and KtxDescriptionGenerator
was constructed without a logger, so the failure left no trail anywhere.

- sampleTable / sampleColumn calls retry 3x with 200/400/800ms backoff,
  honouring KtxScanContext.signal via a new KtxAbortedError.
- On retry exhaustion or missing capability, table generation falls back
  to a metadata-only prompt built from column name / native type / comment
  / rawDescriptions. The column path follows the same rule: call the
  LLM when samples or rawDescriptions (or both) are available, and skip
  only when both are absent.
- Logger is now threaded from KtxScanContext into the generator. Failures
  emit structured KtxScanWarning entries (new description_fallback_used
  code, plus existing sampling_failed / enrichment_failed /
  connector_capability_missing). ktx scan groups warnings by code, so a
  batch of identical failures collapses into one summary line plus a sample.
- Description generation now returns null on failure instead of the
  'Table not found' sentinel. The manifest writer's existing guard already
  skips empty descriptions, so schema YAML no longer carries misleading text.
  Because SCAN_MANAGED_DESCRIPTION_KEYS already strips stale 'ai' entries on
  merge, existing YAML is cleaned up on the next run.
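The retry-then-fallback behaviour can be sketched like this. `AbortedError` is a hypothetical stand-in for KtxAbortedError, and `withRetry`/`buildTablePrompt` are illustrative names. Reading "retry 3x" as one initial attempt plus up to three retries after 200, 400 and 800 ms is an assumption:

```typescript
// Hypothetical stand-in for KtxAbortedError.
class AbortedError extends Error {}

// Assumed schedule: one initial attempt, then up to three retries after 200, 400 and 800 ms.
const BACKOFF_MS: readonly number[] = [200, 400, 800];

// Resolve after `ms`, or reject early with AbortedError if the signal fires.
function sleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise((resolve, reject) => {
    if (signal?.aborted) {
      reject(new AbortedError('scan aborted'));
      return;
    }
    let timer: ReturnType<typeof setTimeout> | undefined;
    const onAbort = () => {
      clearTimeout(timer);
      reject(new AbortedError('scan aborted'));
    };
    timer = setTimeout(() => {
      signal?.removeEventListener('abort', onAbort);
      resolve();
    }, ms);
    signal?.addEventListener('abort', onAbort, { once: true });
  });
}

async function withRetry<T>(
  operation: () => Promise<T>,
  signal?: AbortSignal,
  delaysMs: readonly number[] = BACKOFF_MS,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= delaysMs.length; attempt++) {
    if (signal?.aborted) throw new AbortedError('scan aborted');
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Back off before the next attempt; an abort during the wait propagates.
      if (attempt < delaysMs.length) await sleep(delaysMs[attempt], signal);
    }
  }
  throw lastError;
}

// Hypothetical caller: include samples when sampling succeeds, otherwise fall back to metadata only.
async function buildTablePrompt(
  sample: () => Promise<string>,
  metadataOnlyPrompt: string,
  signal?: AbortSignal,
): Promise<string> {
  try {
    return `${metadataOnlyPrompt}\nSample rows:\n${await withRetry(sample, signal)}`;
  } catch (error) {
    if (error instanceof AbortedError) throw error; // cancellation is not a sampling failure
    return metadataOnlyPrompt;
  }
}
```

Rethrowing the abort error keeps a cancelled scan from being logged as a fallback, while any other exhausted-retry failure degrades to the metadata-only prompt.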

Also suppress AI SDK v6 'system in messages' warning: pull system messages
out of KtxMessageBuilder.wrapSimple's output via a new splitKtxSystemMessages
helper and pass them top-level to generateText (preserves cacheControl
providerOptions on the SystemModelMessage). Agent-runner's local
splitSystemPromptMessages now reuses the shared helper instead of
duplicating its logic.
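The splitting step can be sketched as a plain partition over a simplified, hypothetical `Message` shape (the AI SDK's own message types are richer). `splitSystemMessages` is an illustrative name, not the actual splitKtxSystemMessages implementation:

```typescript
// Simplified, hypothetical message shape for illustration only.
interface Message {
  role: 'system' | 'user' | 'assistant';
  content: string;
  providerOptions?: Record<string, unknown>;
}

// Partition messages into system messages (to be passed top-level) and the remaining
// conversation. Message objects are moved, not copied, so per-message providerOptions
// (e.g. cache-control hints) survive the split unchanged.
function splitSystemMessages(messages: readonly Message[]): { system: Message[]; messages: Message[] } {
  const system: Message[] = [];
  const rest: Message[] = [];
  for (const message of messages) {
    if (message.role === 'system') system.push(message);
    else rest.push(message);
  }
  return { system, messages: rest };
}
```

The caller would then pass the two halves as separate options to the model call. The exact option types accepted for top-level system content depend on the AI SDK version, so check its documentation.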

* test(docs): align examples-docs assertions with revamped docs

PR #103 (setup/guide doc revamp) reworded several CLI examples and
connection labels; the assertions in scripts/examples-docs.test.mjs
still referenced the pre-revamp wording and were failing in CI on main.
Update the regexes to match the post-revamp content:

- drop the `--json` flag from the sl-query example expectation
- move the `Driver:` / `Status: ok` probe to the connection reference,
  which is where that output now lives (driver id is lowercase
  `postgres`, not the display name `PostgreSQL`)
- drop the obsolete `Install \`uv\`...` troubleshooting line
- accept `<connectionId>` everywhere; the docs no longer use the
  hyphenated `<connection-id>` form
- match the `warehouse` connection id used in the quickstart instead of
  the `postgres-warehouse` id only used in the README and setup ref

* fix(sl): skip TS/Python schema contract test when uv is unavailable

The TypeScript checks CI job does not install uv or Python, so the
module-level `execFileSync('uv', ...)` in schemas.contract.test.ts threw
ENOENT and failed the suite. Wrap the schema dump in a try/catch and
guard the describe block with `describe.skipIf` so the test skips in
environments without uv. Local dev and any CI job that has uv on PATH
still runs the cross-language contract assertion.
Andrey Avtomonov 2026-05-15 02:11:04 +02:00 committed by GitHub
parent 6bc8d200ea
commit cb8902f1e5
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
56 changed files with 1650 additions and 237 deletions

@@ -392,6 +392,26 @@ describe('local semantic-layer helpers', () => {
).rejects.toThrow('Invalid semantic-layer source');
});
it('reports legacy overlay column patches with a file-attributed migration hint', async () => {
const invalidYaml = [
'name: orders',
'columns:',
' - name: status',
' descriptions:',
' user: Order status.',
'',
].join('\n');
await expect(
validateLocalSlSource(invalidYaml, { project, connectionId: 'warehouse', sourceName: 'orders' }),
).resolves.toEqual({
valid: false,
errors: [
"semantic-layer/warehouse/orders.yaml: column 'status' patches a manifest column but is in 'columns:' — move it to 'column_overrides:'",
],
});
});
it('rejects unsafe source paths', async () => {
await expect(
readLocalSlSource(project, {


@@ -12,6 +12,7 @@ import {
type ManifestTableEntry,
projectManifestEntry,
SemanticLayerService,
toResolvedWire,
} from './semantic-layer.service.js';
import type { PgliteSlSearchPrototypeOwnerOptions } from './pglite-sl-search-prototype.js';
import { loadLatestSlDictionaryEntries } from './sl-dictionary-profile.js';
@@ -240,7 +241,12 @@ export async function loadLocalSlSourceRecords(
if (!base) {
continue;
}
-const source = composeOverlay(base.source, parsed);
let source: SemanticLayerSource;
try {
source = composeOverlay(base.source, parsed);
} catch (error) {
throw new Error(`${path}: ${error instanceof Error ? error.message : String(error)}`);
}
sources.set(name, {
...summarizeSemanticSource({ connectionId, path, source }),
yaml: sourceToYaml(source),
@ -253,11 +259,28 @@ export async function loadLocalSlSourceRecords(
export async function validateLocalSlSource(
rawYaml: string,
-options?: { project?: KtxLocalProject; connectionId?: string },
options?: { project?: KtxLocalProject; connectionId?: string; sourceName?: string },
): Promise<LocalSlValidationResult> {
try {
const parsed = parseYamlRecord(rawYaml);
const schema = parsed.table || parsed.sql ? sourceDefinitionSchema : sourceOverlaySchema;
if (schema === sourceOverlaySchema && Array.isArray(parsed.columns)) {
const sourceName = options?.sourceName ?? (typeof parsed.name === 'string' ? parsed.name : 'source');
const path =
options?.connectionId && isSafeConnectionId(options.connectionId)
? `semantic-layer/${options.connectionId}/${sourceName}.yaml`
: `${sourceName}.yaml`;
const legacyColumnPatchErrors = parsed.columns
.filter((column): column is Record<string, unknown> => isRecord(column))
.filter((column) => typeof column.name === 'string' && (!column.expr || !column.type))
.map(
(column) =>
`${path}: column '${column.name}' patches a manifest column but is in 'columns:' — move it to 'column_overrides:'`,
);
if (legacyColumnPatchErrors.length > 0) {
return { valid: false, errors: legacyColumnPatchErrors };
}
}
const result = schema.parse(parsed);
const errors: string[] = [];
@@ -268,6 +291,10 @@ export async function validateLocalSlSource(
);
}
if ('table' in result || 'sql' in result) {
toResolvedWire(result as SemanticLayerSource);
}
return { valid: errors.length === 0, errors };
} catch (error) {
return { valid: false, errors: validationErrors(error) };


@@ -1,4 +1,4 @@
-import type { SemanticLayerQueryInput, SemanticLayerSource } from './types.js';
import type { ResolvedSemanticLayerSource, SemanticLayerQueryInput } from './types.js';
export interface KtxConnectionInfo {
id: string;
@@ -20,7 +20,7 @@ export interface SlConnectionCatalogPort {
export interface SlPythonPort {
validateSources(input: {
-sources: SemanticLayerSource[];
sources: ResolvedSemanticLayerSource[];
dialect: string;
recently_touched?: string[];
}): Promise<{
@@ -28,7 +28,7 @@ export interface SlPythonPort {
error?: unknown;
}>;
query(input: {
-sources: SemanticLayerSource[];
sources: ResolvedSemanticLayerSource[];
query: SemanticLayerQueryInput;
dialect: string;
}): Promise<{ data?: { sql?: string; plan?: Record<string, unknown> } | null; error?: unknown }>;

@@ -0,0 +1,68 @@
import { execFileSync } from 'node:child_process';
import { Ajv2020 } from 'ajv/dist/2020.js';
import { describe, expect, it } from 'vitest';
import { resolvedSourceSchema } from './schemas.js';
import { toResolvedWire } from './semantic-layer.service.js';
import type { SemanticLayerSource } from './types.js';
function loadPythonSourceDefinitionSchema(): Record<string, unknown> | null {
try {
const stdout = execFileSync('uv', ['run', 'python', '-m', 'semantic_layer', 'dump-schema'], {
cwd: new URL('../../../../', import.meta.url),
encoding: 'utf8',
stdio: ['ignore', 'pipe', 'ignore'],
});
return JSON.parse(stdout) as Record<string, unknown>;
} catch {
return null;
}
}
const sourceDefinitionJsonSchema = loadPythonSourceDefinitionSchema();
const fixtures: SemanticLayerSource[] = [
{
name: 'orders',
table: 'public.orders',
grain: ['id'],
columns: [
{ name: 'id', type: 'number' },
{
name: 'status',
type: 'string',
descriptions: { dbt: 'Order lifecycle status.' },
constraints: { dbt: { not_null: true } },
enum_values: { dbt: ['placed', 'shipped'] },
tests: { dbt: [{ name: 'accepted_values', package: 'dbt' }] },
},
],
joins: [{ to: 'customers', on: 'orders.customer_id = customers.id', relationship: 'many_to_one' }],
measures: [{ name: 'order_count', expr: 'count(id)' }],
segments: [{ name: 'paid', expr: "status = 'paid'" }],
default_time_dimension: { dbt: 'created_at' },
tags: { dbt: ['mart'] },
freshness: { dbt: { loaded_at_field: 'updated_at' } },
},
{
name: 'aav_orders',
sql: 'select id, status from public.orders where status = paid',
grain: ['id'],
columns: [{ name: 'id', type: 'number' }],
joins: [],
measures: [],
},
];
describe.skipIf(sourceDefinitionJsonSchema === null)('resolved source JSON Schema contract', () => {
it('keeps TS resolved-source fixtures accepted by the Python SourceDefinition schema', () => {
const ajv = new Ajv2020({ allErrors: true, strict: false });
const validate = ajv.compile(sourceDefinitionJsonSchema as Record<string, unknown>);
for (const fixture of fixtures) {
const wire = toResolvedWire(fixture);
expect(resolvedSourceSchema.safeParse(wire).success).toBe(true);
expect(validate(wire), JSON.stringify(validate.errors, null, 2)).toBe(true);
}
});
});

@@ -78,6 +78,8 @@ const joinDeclarationSchema = z.object({
alias: z.string().optional(),
});
const resolvedJoinDeclarationSchema = joinDeclarationSchema.strict();
const sourceColumnSchema = z.object({
name: unqualifiedNameSchema,
// type/descriptions optional on standalone sources: compose-time enrichment fills them
@@ -89,24 +91,39 @@ const sourceColumnSchema = z.object({
visibility: z.enum(columnVisibilityValues).optional(),
descriptions: descriptionsSchema.optional(),
expr: z.string().optional(),
natural_granularity: z.string().optional(),
constraints: sourceKeyedColumnConstraintsSchema.optional(),
enum_values: sourceKeyedStringArraySchema.optional(),
tests: dbtColumnTestsSchema.optional(),
});
-/** Overlay column: type requires expr (structural types are inherited from manifest). */
const resolvedSourceColumnSchema = sourceColumnSchema.extend({
type: z.enum(columnTypeValues),
}).strict();
/** Overlay column: computed columns only. Structural columns live in the manifest. */
const overlayColumnSchema = z
.object({
name: unqualifiedNameSchema,
-type: z.enum(columnTypeValues).optional(),
type: z.enum(columnTypeValues),
role: z.enum(columnRoleValues).optional(),
visibility: z.enum(columnVisibilityValues).optional(),
descriptions: descriptionsSchema.optional(),
-expr: z.string().optional(),
expr: z.string().min(1),
})
-.refine((col) => !col.type || col.expr, {
-message: "Overlay column with 'type' must also have 'expr' (only computed columns may specify a type)",
-});
.strict();
const columnOverrideSchema = z
.object({
name: unqualifiedNameSchema,
role: z.enum(columnRoleValues).optional(),
visibility: z.enum(columnVisibilityValues).optional(),
descriptions: descriptionsSchema.optional(),
constraints: sourceKeyedColumnConstraintsSchema.optional(),
enum_values: sourceKeyedStringArraySchema.optional(),
tests: dbtColumnTestsSchema.optional(),
})
.strict();
/** Standalone source: has `table` or `sql`, requires grain + columns. */
export const sourceDefinitionSchema = z
@@ -143,6 +160,26 @@ export const sourceDefinitionSchema = z
message: "Standalone source must have exactly one of 'table' or 'sql' (not both)",
});
export const resolvedSourceSchema = z
.object({
name: z.string().min(1),
descriptions: descriptionsSchema.optional(),
table: z.string().optional(),
sql: z.string().optional(),
grain: z.array(unqualifiedNameSchema).min(1),
columns: z.array(resolvedSourceColumnSchema).min(1),
joins: z.array(resolvedJoinDeclarationSchema).default([]),
measures: z.array(slMeasureDefinitionSchema).default([]),
segments: z.array(segmentDefinitionSchema).optional(),
default_time_dimension: defaultTimeDimensionDbtSchema.optional(),
tags: sourceKeyedStringArraySchema.optional(),
freshness: sourceFreshnessSchema.optional(),
})
.strict()
.refine((s) => (s.table || s.sql) && !(s.table && s.sql), {
message: "Resolved source must have exactly one of 'table' or 'sql' (not both)",
});
/** Overlay source: no table/sql, all fields optional except name. */
export const sourceOverlaySchema = z
.object({
@@ -150,6 +187,7 @@ export const sourceOverlaySchema = z
descriptions: z.record(z.string(), z.string()).optional(),
grain: z.array(unqualifiedNameSchema).optional(),
columns: z.array(overlayColumnSchema).optional(),
column_overrides: z.array(columnOverrideSchema).optional(),
joins: z.array(joinDeclarationSchema).optional(),
measures: z.array(slMeasureDefinitionSchema).optional(),
segments: z.array(segmentDefinitionSchema).optional(),

@@ -2,13 +2,17 @@ import type { Mock } from 'vitest';
import { beforeEach, describe, expect, it, vi } from 'vitest';
import {
ColumnNameCollisionError,
composeOverlay,
ConflictingExcludeAndOverrideError,
enrichColumnsFromManifest,
findDanglingSegmentRefs,
projectManifestEntry,
SemanticLayerService,
toResolvedWire,
UnknownColumnOverrideError,
} from './semantic-layer.service.js';
-import { sourceDefinitionSchema } from './schemas.js';
import { resolvedSourceSchema, sourceDefinitionSchema, sourceOverlaySchema } from './schemas.js';
import type { SemanticLayerSource } from './types.js';
const pythonPort = {
@@ -139,6 +143,69 @@ describe('composeOverlay', () => {
expect(composed.measures).toHaveLength(1);
});
it('applies column_overrides to same-named manifest columns', () => {
const overlay = {
name: 'fct_labs',
column_overrides: [
{ name: 'lab_order_id', descriptions: { user: 'Primary key' } },
{ name: 'admin_user_id', descriptions: { user: 'FK to admin_users' } },
],
};
const composed = composeOverlay(baseTable, overlay);
// No duplicate columns appended — same-named overlay entries merged onto the base.
expect(composed.columns).toHaveLength(3);
const labOrder = composed.columns.find((c) => c.name === 'lab_order_id');
expect(labOrder?.type).toBe('string');
expect(labOrder?.descriptions).toEqual({ user: 'Primary key' });
const adminUser = composed.columns.find((c) => c.name === 'admin_user_id');
expect(adminUser?.type).toBe('string');
expect(adminUser?.descriptions).toEqual({ user: 'FK to admin_users' });
});
it('appends computed columns alongside column overrides', () => {
const overlay = {
name: 'fct_labs',
column_overrides: [
{ name: 'lab_order_id', descriptions: { user: 'PK doc' } },
],
columns: [
{ name: 'is_byol', type: 'boolean', expr: "lab_type = 'byol'" },
],
};
const composed = composeOverlay(baseTable, overlay);
expect(composed.columns).toHaveLength(4);
expect(composed.columns.find((c) => c.name === 'is_byol')?.expr).toBe("lab_type = 'byol'");
expect(composed.columns.find((c) => c.name === 'lab_order_id')?.type).toBe('string');
});
it('rejects column_overrides that target unknown manifest columns', () => {
expect(() =>
composeOverlay(baseTable, {
name: 'fct_labs',
column_overrides: [{ name: 'missing', descriptions: { user: 'Nope' } }],
}),
).toThrow(UnknownColumnOverrideError);
});
it('rejects computed columns whose names collide with manifest columns', () => {
expect(() =>
composeOverlay(baseTable, {
name: 'fct_labs',
columns: [{ name: 'lab_order_id', type: 'string', expr: 'lab_order_id' }],
}),
).toThrow(ColumnNameCollisionError);
});
it('rejects exclude/override conflicts before applying exclusions', () => {
expect(() =>
composeOverlay(baseTable, {
name: 'fct_labs',
exclude_columns: ['lab_order_id'],
column_overrides: [{ name: 'lab_order_id', descriptions: { user: 'Hidden PK' } }],
}),
).toThrow(ConflictingExcludeAndOverrideError);
});
it('merges overlay descriptions (plural) with base descriptions keyed by source', () => {
const baseWithDescriptions: SemanticLayerSource = {
...baseTable,
@@ -418,6 +485,62 @@ describe('sourceDefinitionSchema', () => {
});
});
describe('sourceOverlaySchema', () => {
it('accepts column_overrides and keeps columns computed-only', () => {
const result = sourceOverlaySchema.safeParse({
name: 'orders',
column_overrides: [{ name: 'status', descriptions: { user: 'Lifecycle status' } }],
columns: [{ name: 'is_paid', type: 'boolean', expr: "status = 'paid'" }],
});
expect(result.success).toBe(true);
});
it('rejects typeless overlay columns and singular description on overrides', () => {
const result = sourceOverlaySchema.safeParse({
name: 'orders',
column_overrides: [{ name: 'status', description: 'Lifecycle status' }],
columns: [{ name: 'status', descriptions: { user: 'Lifecycle status' } }],
});
expect(result.success).toBe(false);
if (!result.success) {
const paths = result.error.issues.map((issue) => issue.path.join('.'));
expect(paths).toContain('column_overrides.0');
expect(paths).toContain('columns.0.type');
expect(paths).toContain('columns.0.expr');
}
});
});
describe('toResolvedWire', () => {
it('strips TS-only authoring and provenance fields before the Python boundary', () => {
const wire = toResolvedWire({
name: 'orders',
table: 'public.orders',
inherits_columns_from: 'orders',
grain: ['id'],
columns: [{ name: 'id', type: 'string' }],
joins: [{ to: 'customers', on: 'orders.customer_id = customers.id', relationship: 'many_to_one', source: 'formal' }],
measures: [],
usage: {
narrative: 'Frequently queried orders.',
frequencyTier: 'high',
commonFilters: ['status'],
commonJoins: [],
},
});
expect(wire).toEqual({
name: 'orders',
table: 'public.orders',
grain: ['id'],
columns: [{ name: 'id', type: 'string' }],
joins: [{ to: 'customers', on: 'orders.customer_id = customers.id', relationship: 'many_to_one' }],
measures: [],
});
expect(resolvedSourceSchema.parse(wire)).toEqual(wire);
});
});
describe('projectManifestEntry', () => {
it('projects manifest usage onto the semantic-layer source', () => {
const source = projectManifestEntry('orders', {
@@ -537,7 +660,8 @@ describe('loadAllSources — standalone enrichment via inherits_columns_from', (
].join('\n'),
});
-const sources = await service.loadAllSources('conn-1');
const { sources, loadErrors } = await service.loadAllSources('conn-1');
expect(loadErrors).toEqual([]);
expect(sources[0]).toMatchObject({
name: 'orders',
@@ -601,7 +725,8 @@ describe('loadAllSources — standalone enrichment via inherits_columns_from', (
return Promise.reject(new Error(`Unexpected readFile: ${path}`));
});
-const sources = await service.loadAllSources('conn-1');
const { sources, loadErrors } = await service.loadAllSources('conn-1');
expect(loadErrors).toEqual([]);
const aav = sources.find((s) => s.name === 'aav_consignments');
expect(aav).toBeDefined();
expect(aav?.columns).toEqual([
@@ -646,7 +771,8 @@ describe('loadAllSources — standalone enrichment via inherits_columns_from', (
});
});
-const sources = await service.loadAllSources('conn-1');
const { sources, loadErrors } = await service.loadAllSources('conn-1');
expect(loadErrors).toEqual([]);
const aav = sources.find((s) => s.name === 'aav_consignments');
expect(aav?.columns[0].type).toBe('string');
});
@@ -670,7 +796,8 @@ describe('loadAllSources — standalone enrichment via inherits_columns_from', (
].join('\n'),
});
-const sources = await service.loadAllSources('conn-1');
const { sources, loadErrors } = await service.loadAllSources('conn-1');
expect(loadErrors).toEqual([]);
const aav = sources.find((s) => s.name === 'aav_consignments');
expect(aav?.columns).toEqual([{ name: 'FOO', type: 'string' }]);
});
@@ -693,7 +820,8 @@ describe('loadAllSources — standalone enrichment via inherits_columns_from', (
].join('\n'),
});
-const sources = await service.loadAllSources('conn-1');
const { sources, loadErrors } = await service.loadAllSources('conn-1');
expect(loadErrors).toEqual([]);
expect(sources[0]).toMatchObject({
name: 'orders',
@@ -701,6 +829,33 @@ describe('loadAllSources — standalone enrichment via inherits_columns_from', (
columns: [{ name: 'id', type: 'string', descriptions: { user: 'Stable order identifier.' } }],
});
});
it('reports file-attributed errors for legacy overlay column patches', async () => {
const schemaPath = 'semantic-layer/conn-1/_schema/marts.yaml';
const overlayPath = 'semantic-layer/conn-1/orders.yaml';
configService.listFiles.mockResolvedValue({ files: [schemaPath, overlayPath] });
configService.readFile.mockImplementation((path: string) => {
if (path === schemaPath) {
return Promise.resolve({
content: [
'tables:',
' orders:',
' table: public.orders',
' columns:',
' - { name: id, type: string, pk: true }',
].join('\n'),
});
}
return Promise.resolve({
content: ['name: orders', 'columns:', ' - name: id', ' descriptions: { user: "Stable id." }'].join('\n'),
});
});
const { loadErrors } = await service.loadAllSources('conn-1');
expect(loadErrors.join('\n')).toContain(overlayPath);
expect(loadErrors.join('\n')).toContain("move it to 'column_overrides:'");
});
});
describe('validateWithProposedSource', () => {

@@ -4,8 +4,14 @@ import { noopLogger } from '../core/index.js';
import type { TableUsageOutput } from '../ingest/adapters/historic-sql/skill-schemas.js';
import type { SlConnectionCatalogPort, SlPythonPort } from './ports.js';
import { normalizeSemanticLayerDescriptions } from './description-normalization.js';
-import { isOverlaySource, sourceDefinitionSchema, sourceOverlaySchema } from './schemas.js';
-import type { SemanticLayerQueryExecutionResult, SemanticLayerQueryInput, SemanticLayerSource } from './types.js';
import { isOverlaySource, resolvedSourceSchema, sourceDefinitionSchema, sourceOverlaySchema } from './schemas.js';
import type {
ResolvedSemanticLayerSource,
SemanticLayerColumnOverride,
SemanticLayerQueryExecutionResult,
SemanticLayerQueryInput,
SemanticLayerSource,
} from './types.js';
interface WriteSourceOptions {
skipValidation?: boolean;
@@ -14,6 +20,30 @@ interface WriteSourceOptions {
const SL_DIR_PREFIX = 'semantic-layer';
const CONNECTION_ID_PATTERN = /^[a-zA-Z0-9][a-zA-Z0-9_-]*$/;
export interface LoadAllSourcesResult {
sources: SemanticLayerSource[];
loadErrors: string[];
}
export class UnknownColumnOverrideError extends Error {}
export class ColumnNameCollisionError extends Error {}
export class ConflictingExcludeAndOverrideError extends Error {}
class ComposeContractError extends Error {}
function isComposeError(error: unknown): boolean {
return (
error instanceof UnknownColumnOverrideError ||
error instanceof ColumnNameCollisionError ||
error instanceof ConflictingExcludeAndOverrideError ||
error instanceof ComposeContractError
);
}
function formatComposeError(filePath: string, error: unknown): string {
const message = error instanceof Error ? error.message : String(error);
return `${filePath}: ${message}`;
}
function formatPortError(error: unknown, fallback: string): string {
if (typeof error === 'string') {
return error;
@@ -37,6 +67,24 @@ function formatPortError(error: unknown, fallback: string): string {
return fallback;
}
export function toResolvedWire(source: SemanticLayerSource): ResolvedSemanticLayerSource {
const stripped = {
...source,
columns: source.columns.map((column) => ({ ...column })),
joins: source.joins.map(({ source: _source, ...join }) => join),
} as Record<string, unknown>;
delete stripped.inherits_columns_from;
delete stripped.usage;
delete stripped.source_type;
const parsed = resolvedSourceSchema.safeParse(stripped);
if (!parsed.success) {
const issues = parsed.error.issues.map((issue) => `${issue.path.join('.')}: ${issue.message}`).join('; ');
throw new ComposeContractError(`resolved source '${source.name}' violates the TS/Python contract: ${issues}`);
}
return parsed.data as ResolvedSemanticLayerSource;
}
export class SemanticLayerService {
constructor(
private readonly configService: KtxFileStorePort,
@@ -158,16 +206,17 @@ export class SemanticLayerService {
}
}
-async loadAllSources(connectionId: string): Promise<SemanticLayerSource[]> {
async loadAllSources(connectionId: string): Promise<LoadAllSourcesResult> {
const dir = `${SL_DIR_PREFIX}/${connectionId}`;
const schemaDir = `${dir}/_schema`;
const loadErrors: string[] = [];
let allFiles: string[];
try {
const result = await this.configService.listFiles(dir);
allFiles = result.files.filter((f) => f.endsWith('.yaml'));
} catch {
-return [];
return { sources: [], loadErrors };
}
// 1. Load manifest shards from _schema/*.yaml → project to sources
@@ -184,7 +233,9 @@ export class SemanticLayerService {
}
}
} catch (e) {
-this.logger.warn(`Failed to parse manifest shard ${filePath}: ${e}`);
const message = `Failed to parse manifest shard ${filePath}: ${e instanceof Error ? e.message : String(e)}`;
loadErrors.push(message);
this.logger.warn(message);
}
}
@@ -227,6 +278,7 @@ export class SemanticLayerService {
);
}
}
toResolvedWire(standalone);
sources.set(name, standalone);
} else {
// Overlay — compose with manifest entry if present
@@ -238,11 +290,15 @@ export class SemanticLayerService {
}
}
} catch (e) {
-this.logger.warn(`Failed to parse YAML file ${filePath}: ${e}`);
const message = isComposeError(e)
? formatComposeError(filePath, e)
: `Failed to parse YAML file ${filePath}: ${e instanceof Error ? e.message : String(e)}`;
loadErrors.push(message);
this.logger.warn(message);
}
}
-return Array.from(sources.values());
return { sources: Array.from(sources.values()), loadErrors };
}
/**
@@ -622,8 +678,10 @@ export class SemanticLayerService {
connectionId: string,
proposedSource: SemanticLayerSource,
): Promise<{ errors: string[]; warnings: string[]; perSourceWarnings: Record<string, string[]> }> {
-const existing = await this.loadAllSources(connectionId);
const loaded = await this.loadAllSources(connectionId);
const existing = loaded.sources;
const merged = existing.filter((s) => s.name !== proposedSource.name);
const loadErrors = [...loaded.loadErrors];
// Overlays (no table/sql) must be composed with their manifest base before
// validation, otherwise the filter below drops them and the edited source
@@ -641,11 +699,27 @@ export class SemanticLayerService {
perSourceWarnings: {},
};
}
-toPush = composeOverlay(base, { ...proposedSource });
try {
toPush = composeOverlay(base, { ...proposedSource });
} catch (error) {
return {
errors: [...loadErrors, formatComposeError(`${proposedSource.name}.yaml`, error)],
warnings: [],
perSourceWarnings: {},
};
}
} else if (proposedSource.inherits_columns_from) {
const base = await this.findManifestEntryByTableRef(connectionId, proposedSource.inherits_columns_from);
if (base) {
-toPush = enrichColumnsFromManifest(proposedSource, base);
try {
toPush = enrichColumnsFromManifest(proposedSource, base);
} catch (error) {
return {
errors: [...loadErrors, formatComposeError(`${proposedSource.name}.yaml`, error)],
warnings: [],
perSourceWarnings: {},
};
}
}
// Miss is non-fatal — the source ships unenriched, validator will surface
// any column-without-type errors via the warehouse probe.
@@ -654,37 +728,37 @@ export class SemanticLayerService {
const validatable = merged.filter((s) => s.table != null || s.sql != null);
if (validatable.length === 0) {
-return { errors: [], warnings: [], perSourceWarnings: {} };
return { errors: loadErrors, warnings: [], perSourceWarnings: {} };
}
const dialect = await this.getDialectForConnection(connectionId);
try {
const { data, error } = await this.python.validateSources({
-sources: validatable,
sources: validatable.map(toResolvedWire),
dialect,
recently_touched: [proposedSource.name],
});
if (error) {
const errorMsg = formatPortError(error, 'Unknown validation error');
-return { errors: [errorMsg], warnings: [], perSourceWarnings: {} };
return { errors: [...loadErrors, errorMsg], warnings: [], perSourceWarnings: {} };
}
if (!data) {
return {
-errors: await this.validatePhysicalTableReferences(connectionId, validatable),
errors: [...loadErrors, ...(await this.validatePhysicalTableReferences(connectionId, validatable))],
warnings: [],
perSourceWarnings: {},
};
}
const physicalErrors = await this.validatePhysicalTableReferences(connectionId, validatable);
return {
-errors: [...(data.errors ?? []), ...physicalErrors],
errors: [...loadErrors, ...(data.errors ?? []), ...physicalErrors],
warnings: data.warnings ?? [],
perSourceWarnings: data.per_source_warnings ?? {},
};
} catch (e) {
return {
-errors: [`Validation call failed: ${e instanceof Error ? e.message : String(e)}`],
errors: [...loadErrors, `Validation call failed: ${e instanceof Error ? e.message : String(e)}`],
warnings: [],
perSourceWarnings: {},
};
@@ -692,23 +766,23 @@ export class SemanticLayerService {
}
async validateSourcesForConnection(connectionId: string): Promise<{ errors: string[]; warnings: string[] }> {
-const allSources = await this.loadAllSources(connectionId);
const { sources: allSources, loadErrors } = await this.loadAllSources(connectionId);
const sources = allSources.filter((source) => source.table != null || source.sql != null);
if (sources.length === 0) {
-return { errors: [], warnings: [] };
return { errors: loadErrors, warnings: [] };
}
const dialect = await this.getDialectForConnection(connectionId);
-const { data, error } = await this.python.validateSources({ sources, dialect });
const { data, error } = await this.python.validateSources({ sources: sources.map(toResolvedWire), dialect });
if (error) {
-return { errors: [formatPortError(error, 'Unknown validation error')], warnings: [] };
return { errors: [...loadErrors, formatPortError(error, 'Unknown validation error')], warnings: [] };
}
if (!data) {
-return { errors: await this.validatePhysicalTableReferences(connectionId, sources), warnings: [] };
return { errors: [...loadErrors, ...(await this.validatePhysicalTableReferences(connectionId, sources))], warnings: [] };
}
const physicalErrors = await this.validatePhysicalTableReferences(connectionId, sources);
return {
-errors: [...(data.errors ?? []), ...physicalErrors],
errors: [...loadErrors, ...(data.errors ?? []), ...physicalErrors],
warnings: data.warnings ?? [],
};
}
@@ -802,6 +876,7 @@ export class SemanticLayerService {
} else {
// Overlay — check references against manifest
const excludeColumns = (data.exclude_columns as string[]) ?? [];
const columnOverrides = (data.column_overrides as Array<{ name: string }> | undefined) ?? [];
const disableJoins = (data.disable_joins as string[]) ?? [];
const cols = manifestColumns.get(name);
const joins = manifestJoins.get(name);
@@ -817,6 +892,16 @@ export class SemanticLayerService {
}
}
const excluded = new Set(excludeColumns);
for (const override of columnOverrides) {
if (!cols.has(override.name)) {
warnings.push(`${name}: column_overrides references non-existent column '${override.name}'`);
}
if (excluded.has(override.name)) {
warnings.push(`${name}: column '${override.name}' appears in both exclude_columns and column_overrides`);
}
}
for (const joinOn of disableJoins) {
const normalized = joinOn.replace(/\s+/g, ' ').trim();
if (!joins?.has(normalized)) {
@@ -999,7 +1084,10 @@ export class SemanticLayerService {
*/
async executeQuery(connectionId: string, query: SemanticLayerQueryInput): Promise<SemanticLayerQueryExecutionResult> {
// 1. Load sources, filtering out sources with no table or sql
const allSources = await this.loadAllSources(connectionId);
const { sources: allSources, loadErrors } = await this.loadAllSources(connectionId);
if (loadErrors.length > 0) {
throw new Error(`Semantic layer source load failed: ${loadErrors.join('; ')}`);
}
const sources = allSources.filter((s) => {
if (!s.table && !s.sql) {
this.logger.warn(`Skipping source "${s.name}" with no table or sql defined`);
@@ -1021,7 +1109,7 @@ export class SemanticLayerService {
// 3. Generate SQL via python SL engine
const { data: slResult, error: slError } = await this.python.query({
sources,
sources: sources.map(toResolvedWire),
query,
dialect,
});
@@ -1092,18 +1180,20 @@ export function projectManifestEntry(name: string, entry: ManifestTableEntry): S
const grain = pkColumns.length > 0 ? pkColumns : entry.columns.map((c) => c.name);
// Table-level dbt config from manifest shards is surfaced on the source for search / tools.
return {
const source: SemanticLayerSource = {
name,
table: entry.table,
descriptions: entry.descriptions,
grain,
columns,
joins: (entry.joins ?? []).map((j) => ({ to: j.to, on: j.on, relationship: j.relationship, source: j.source })),
joins: (entry.joins ?? []).map((j) => ({ to: j.to, on: j.on, relationship: j.relationship })),
measures: [],
...(entry.tags?.dbt?.length ? { tags: entry.tags } : {}),
...(entry.freshness?.dbt ? { freshness: entry.freshness } : {}),
...(entry.usage ? { usage: entry.usage } : {}),
};
toResolvedWire(source);
return source;
}
function normalizeWs(s: string): string {
@@ -1331,6 +1421,7 @@ const COMPOSE_KNOWN_KEYS = new Set([
'descriptions',
'grain',
'columns',
'column_overrides',
'joins',
'measures',
'segments',
@@ -1365,14 +1456,48 @@ export function composeOverlay(base: SemanticLayerSource, overlay: Record<string
result.usage = normalizedOverlay.usage as SemanticLayerSource['usage'];
}
// Filter out excluded columns
const excluded = new Set((normalizedOverlay.exclude_columns as string[] | undefined) ?? []);
let columns = result.columns.filter((c) => !excluded.has(c.name));
const columnOverrides = (normalizedOverlay.column_overrides as SemanticLayerColumnOverride[] | undefined) ?? [];
const overrideNames = columnOverrides.map((column) => column.name);
const conflictingOverrides = overrideNames.filter((name) => excluded.has(name));
if (conflictingOverrides.length > 0) {
throw new ConflictingExcludeAndOverrideError(
`column_overrides conflict with exclude_columns for '${base.name}': ${conflictingOverrides.join(', ')}`,
);
}
// Append overlay computed columns
const overlayColumns = (normalizedOverlay.columns as SemanticLayerSource['columns'] | undefined) ?? [];
columns = [...columns, ...overlayColumns];
result.columns = columns;
const baseByLowerName = new Map(base.columns.map((column) => [column.name.toLowerCase(), column]));
const columnsByLowerName = new Map(
result.columns.filter((column) => !excluded.has(column.name)).map((column) => [column.name.toLowerCase(), column]),
);
for (const override of columnOverrides) {
const key = override.name.toLowerCase();
const baseColumn = baseByLowerName.get(key);
if (!baseColumn) {
throw new UnknownColumnOverrideError(
`column '${override.name}' in column_overrides does not exist on manifest source '${base.name}'`,
);
}
const baseDescriptions = baseColumn.descriptions ?? {};
const overrideDescriptions = override.descriptions ?? {};
const merged = { ...baseColumn, ...override };
if (Object.keys(baseDescriptions).length > 0 || Object.keys(overrideDescriptions).length > 0) {
merged.descriptions = { ...baseDescriptions, ...overrideDescriptions };
}
columnsByLowerName.set(key, merged);
}
const computedColumns = (normalizedOverlay.columns as SemanticLayerSource['columns'] | undefined) ?? [];
for (const column of computedColumns) {
if (baseByLowerName.has(column.name.toLowerCase())) {
throw new ColumnNameCollisionError(
`column '${column.name}' in columns patches a manifest column on '${base.name}' — move it to 'column_overrides:'`,
);
}
columnsByLowerName.set(column.name.toLowerCase(), column);
}
result.columns = [...columnsByLowerName.values()];
// Measures from overlay only
result.measures = (normalizedOverlay.measures as SemanticLayerSource['measures'] | undefined) ?? [];
@@ -1401,6 +1526,12 @@ export function composeOverlay(base: SemanticLayerSource, overlay: Record<string
const newJoins = overlayJoins.filter((j) => !existingKeys.has(`${j.to}::${normalizeWs(j.on)}`));
result.joins = [...manifestJoins, ...newJoins];
const overlayParse = sourceOverlaySchema.safeParse(normalizedOverlay);
if (!overlayParse.success) {
const issues = overlayParse.error.issues.map((issue) => `${issue.path.join('.')}: ${issue.message}`).join('; ');
throw new ComposeContractError(`overlay for '${base.name}' violates the authoring schema: ${issues}`);
}
toResolvedWire(result);
return result;
}
@@ -1464,5 +1595,7 @@ export function enrichColumnsFromManifest(
}
return merged;
});
return { ...source, columns: enrichedColumns };
const enriched = { ...source, columns: enrichedColumns };
toResolvedWire(enriched);
return enriched;
}
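The column merge inside `composeOverlay` can be sketched in isolation. This is a simplified, hypothetical model: the `Column`/`ColumnOverride` shapes and the `mergeColumns` name are assumptions, not the project's types. Manifest columns are keyed by lower-cased name, `column_overrides` patch existing entries (descriptions merged per source key), and computed `columns:` entries may only introduce new names.

```typescript
interface Column {
  name: string;
  type?: string;
  expr?: string;
  role?: string;
  descriptions?: Record<string, string>;
}

// Hypothetical override shape: metadata patches only, so no type or expr.
type ColumnOverride = Omit<Column, 'type' | 'expr'>;

class ConflictingExcludeAndOverrideError extends Error {
  name = 'ConflictingExcludeAndOverrideError';
}
class UnknownColumnOverrideError extends Error {
  name = 'UnknownColumnOverrideError';
}
class ColumnNameCollisionError extends Error {
  name = 'ColumnNameCollisionError';
}

function mergeColumns(
  base: Column[],
  excluded: Set<string>,
  overrides: ColumnOverride[],
  computed: Column[],
): Column[] {
  // A column cannot be both dropped and patched.
  const conflicts = overrides.filter((o) => excluded.has(o.name)).map((o) => o.name);
  if (conflicts.length > 0) {
    throw new ConflictingExcludeAndOverrideError(`overridden and excluded: ${conflicts.join(', ')}`);
  }
  const baseByLower = new Map(base.map((c) => [c.name.toLowerCase(), c] as const));
  // Surviving manifest columns, keyed case-insensitively; Map preserves manifest order.
  const byLower = new Map(
    base.filter((c) => !excluded.has(c.name)).map((c) => [c.name.toLowerCase(), c] as const),
  );
  for (const override of overrides) {
    const key = override.name.toLowerCase();
    const baseColumn = baseByLower.get(key);
    if (!baseColumn) {
      throw new UnknownColumnOverrideError(`'${override.name}' is not a manifest column`);
    }
    // Override fields win; type/expr are absent on overrides, so they come from the base.
    const merged: Column = { ...baseColumn, ...override };
    if (baseColumn.descriptions || override.descriptions) {
      // Merge per source key (dbt, metabase, ...) instead of replacing the whole map.
      merged.descriptions = { ...baseColumn.descriptions, ...override.descriptions };
    }
    byLower.set(key, merged);
  }
  for (const column of computed) {
    const key = column.name.toLowerCase();
    if (baseByLower.has(key)) {
      throw new ColumnNameCollisionError(`'${column.name}' patches a manifest column; move it to column_overrides`);
    }
    byLower.set(key, column);
  }
  return Array.from(byLower.values());
}
```

Keying on the lower-cased name is what lets an overlay entry land on the existing manifest column instead of being appended as a typeless duplicate; re-setting an existing Map key also keeps the column in its original position.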

@@ -7,7 +7,7 @@ import { SlDiscoverTool } from './sl-discover.tool.js';
function makeTool() {
const semanticLayerService = {
listConnectionIdsWithNames: vi.fn(async () => [] as Array<{ id: string; name: string; connectionType: string }>),
loadAllSources: vi.fn(async () => [] as SemanticLayerSource[]),
loadAllSources: vi.fn(async () => ({ sources: [] as SemanticLayerSource[], loadErrors: [] })),
};
const slSearchService = {
search: vi.fn(async () => []),
@@ -53,7 +53,8 @@ describe('SlDiscoverTool - session-scoped reads', () => {
listConnectionIdsWithNames: vi.fn().mockResolvedValue([
{ id: 'warehouse', name: 'warehouse', connectionType: 'postgres' },
]),
loadAllSources: vi.fn().mockResolvedValue([
loadAllSources: vi.fn().mockResolvedValue({
sources: [
{
name: 'orders',
table: 'public.orders',
@@ -62,7 +63,9 @@ describe('SlDiscoverTool - session-scoped reads', () => {
measures: [],
joins: [],
},
]),
],
loadErrors: [],
}),
};
const result = await tool.call({}, makeContext({ session: makeSession(sessionSemanticLayerService) }));

@@ -101,7 +101,7 @@ Use this to understand what data is available before querying through the semant
// If inspecting a specific source — show the SL interface (columns, measures, joins)
// without the raw SQL. Use `sl_read_source` to see the full YAML including SQL.
if (sourceName) {
const sources = await semanticLayerService.loadAllSources(connectionId);
const { sources } = await semanticLayerService.loadAllSources(connectionId);
const source = sources.find((s) => s.name === sourceName);
if (!source) {
return {
@@ -151,7 +151,7 @@ Use this to understand what data is available before querying through the semant
// Load sources from all connections in parallel
const results = await Promise.all(
connections.map(async (conn) => {
const sources = await semanticLayerService.loadAllSources(conn.id);
const { sources } = await semanticLayerService.loadAllSources(conn.id);
let filtered = sources;
if (query) {
filtered = await this.filterByQuery(conn.id, sources, query);
@@ -213,7 +213,7 @@ Use this to understand what data is available before querying through the semant
connectionName: string,
query?: string,
): Promise<ToolOutput<SlDiscoverStructured>> {
const sources = await semanticLayerService.loadAllSources(connectionId);
const { sources } = await semanticLayerService.loadAllSources(connectionId);
if (sources.length === 0) {
return {

@@ -11,7 +11,7 @@ function makeTool(overrides: any = {}) {
}),
validateWithProposedSource: vi.fn().mockResolvedValue({ errors: [], warnings: [] }),
writeSource: vi.fn().mockResolvedValue({ commitHash: 'c1' }),
loadAllSources: vi.fn().mockResolvedValue([]),
loadAllSources: vi.fn().mockResolvedValue({ sources: [], loadErrors: [] }),
deleteSource: vi.fn().mockResolvedValue(undefined),
isManifestBacked: vi.fn().mockResolvedValue(false),
...overrides.semanticLayerService,
@@ -44,7 +44,7 @@ function makeSession(overrides: Partial<ToolSession> = {}): ToolSession {
}),
validateWithProposedSource: vi.fn().mockResolvedValue({ errors: [], warnings: [] }),
writeSource: vi.fn().mockResolvedValue({ commitHash: 'c1' }),
loadAllSources: vi.fn().mockResolvedValue([]),
loadAllSources: vi.fn().mockResolvedValue({ sources: [], loadErrors: [] }),
} as any,
wikiService: {} as any,
configService: {} as any,
@@ -191,9 +191,10 @@ describe('SlEditSourceTool — manifest-backed source without overlay', () => {
expect(joinedErrors).toContain('manifest');
expect(joinedErrors).toContain('sl_write_source');
expect(joinedErrors).toContain('overlay');
// Overlay shape: only name + measures/segments/description
// Overlay shape: name plus overlay-only fields.
expect(joinedErrors).toContain('measures');
expect(joinedErrors).toContain('segments');
expect(joinedErrors).toContain('column_overrides');
});
it('still returns the plain "Source not found" error for truly-missing names', async () => {

@@ -127,7 +127,8 @@ If no source exists yet, use sl_write_source instead — this tool will reject t
` - name: <measure_name>`,
` expr: "<expression>"`,
` description: "<what it measures>"`,
`Overlay shape: "name:" plus any of "measures:", "segments:", "descriptions:". Do NOT include "sql:", "table:", "grain:", "columns:", or "joins:" — those are inherited from the manifest.`,
`Overlay shape: "name:" plus any of "measures:", "segments:", "descriptions:", "joins:", "disable_joins:", "exclude_columns:", "column_overrides:", or computed-only "columns:" entries with expr + type.`,
`Do NOT include "sql:", "table:", "grain:", or base-table "columns:" — those are inherited from the manifest.`,
].join('\n'),
],
sourceName,
@@ -181,7 +182,7 @@ If no source exists yet, use sl_write_source instead — this tool will reject t
const result = await semanticLayerService.writeSource(connectionId, source, author, authorEmail, commitMessage);
if (!skipIndex) {
const allSources = await semanticLayerService.loadAllSources(connectionId);
const { sources: allSources } = await semanticLayerService.loadAllSources(connectionId);
await this.slSearchService.indexSources(connectionId, allSources).catch(() => {});
}
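`loadAllSources` now returns `{ sources, loadErrors }` instead of a bare array, and each caller picks a policy: the query path refuses to run, while the validation path prepends the errors to its report. A minimal sketch of that contract, assuming a hypothetical in-memory map of file paths to text and a caller-supplied `parse` function in place of the real manifest shard loader (`loadAll`, `requireCleanLoad` and `withLoadErrors` are illustrative names):

```typescript
interface LoadResult<T> {
  sources: T[];
  loadErrors: string[];
}

// Hypothetical loader: `files` maps a path to raw text; `parse` stands in for the real shard parser.
function loadAll<T>(files: Record<string, string>, parse: (text: string) => T): LoadResult<T> {
  const sources: T[] = [];
  const loadErrors: string[] = [];
  for (const [path, text] of Object.entries(files)) {
    try {
      sources.push(parse(text));
    } catch (e) {
      // Record the failure with its path instead of silently dropping the file.
      loadErrors.push(`${path}: ${e instanceof Error ? e.message : String(e)}`);
    }
  }
  return { sources, loadErrors };
}

// Query-style policy: never generate SQL from a partial source set.
function requireCleanLoad<T>(result: LoadResult<T>): T[] {
  if (result.loadErrors.length > 0) {
    throw new Error(`Semantic layer source load failed: ${result.loadErrors.join('; ')}`);
  }
  return result.sources;
}

// Validation-style policy: report load errors ahead of engine errors.
function withLoadErrors(loadErrors: string[], engineErrors: string[]): string[] {
  return [...loadErrors, ...engineErrors];
}
```

Callers that only need the sources, such as the search re-indexing above, can destructure `{ sources }` and ignore the errors, which keeps their previous behavior.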

@@ -34,7 +34,7 @@ describe('SlValidateTool — session-aware touched-set filtering', () => {
{ name: 'customers', table: 'x.customers', grain: ['id'], columns: [], joins: [], measures: [] },
];
const serviceMock = {
loadAllSources: vi.fn().mockResolvedValue(sources),
loadAllSources: vi.fn().mockResolvedValue({ sources, loadErrors: [] }),
validateSourcesForConnection: vi.fn().mockResolvedValue({
errors: ['orders: missing join target', 'customers: invalid grain'],
warnings: ['orders: disconnected-components warning'],

@@ -62,7 +62,7 @@ Checks: all join targets exist, grain is valid, no missing references.
const semanticLayerService = context.session?.semanticLayerService ?? this.semanticLayerService;
const sources = await semanticLayerService.loadAllSources(connectionId);
const { sources } = await semanticLayerService.loadAllSources(connectionId);
if (sources.length === 0) {
return this.buildOutput(true, [], '(all)', {
validationErrors: ['No sources found for this connection.'],

@@ -8,7 +8,7 @@ function makeDeps(opts: { sourceYaml: string; executeQuery: ReturnType<typeof vi
isManifestBacked: vi.fn().mockResolvedValue(false),
listManifestSourceNames: vi.fn().mockResolvedValue([]),
loadSource: vi.fn().mockResolvedValue(null),
loadAllSources: vi.fn().mockResolvedValue([]),
loadAllSources: vi.fn().mockResolvedValue({ sources: [], loadErrors: [] }),
validatePhysicalTableReferences: vi.fn().mockResolvedValue([]),
} as never,
connections: {

@@ -88,8 +88,9 @@ export async function validateSingleSource(
errors.push(
`${sourceName}.yaml: standalone source shadows an existing manifest entry — ` +
`writing it as-is drops the manifest's columns and joins. ` +
`Remove "sql:", "table:", "grain:", "columns:", and "joins:" and keep only ` +
`"name:" plus "measures:"/"segments:"/"descriptions:" to write an overlay ` +
`Remove "sql:", "table:", "grain:", and base-table "columns:" and keep only ` +
`"name:" plus overlay fields such as "measures:", "segments:", "descriptions:", ` +
`"joins:", "column_overrides:", or computed-only "columns:" to write an overlay ` +
`that inherits the manifest schema. Call sl_read_source to inspect the existing source first.`,
);
return { errors, warnings };
@@ -108,7 +109,7 @@ export async function validateSingleSource(
}
if (errorPaths.has('columns')) {
warnings.push(
`${sourceName}.yaml: hint — overlay columns must be computed: {name, expr, type}. Do NOT include base table columns.`,
`${sourceName}.yaml: hint — overlay columns must be computed: {name, expr, type}. Use column_overrides for manifest column descriptions or metadata.`,
);
}
if (errorPaths.has('measures')) {
@@ -240,7 +241,8 @@ async function probeOverlayMeasures(
}
| undefined;
try {
const all = await deps.semanticLayerService.loadAllSources(connectionId);
const { sources: all, loadErrors } = await deps.semanticLayerService.loadAllSources(connectionId);
errors.push(...loadErrors);
composed = all.find((s) => s.name === sourceName);
} catch (e) {
errors.push(

@@ -8,7 +8,7 @@ function makeTool(overrides: Partial<Record<string, any>> = {}) {
listManifestSourceNames: vi.fn().mockResolvedValue(['ACCOUNTS', 'ORDERS']),
isManifestBacked: vi.fn().mockResolvedValue(false),
loadSource: vi.fn().mockResolvedValue(null),
loadAllSources: vi.fn().mockResolvedValue([]),
loadAllSources: vi.fn().mockResolvedValue({ sources: [], loadErrors: [] }),
validateWithProposedSource: vi.fn().mockResolvedValue({ errors: [], warnings: [] }),
writeSource: vi.fn().mockResolvedValue({ commitHash: 'c1' }),
deleteSource: vi.fn().mockResolvedValue(undefined),
@@ -59,7 +59,7 @@ describe('SlWriteSourceTool — session gating', () => {
actions: [],
semanticLayerService: {
loadSource: vi.fn().mockResolvedValue(null),
loadAllSources: vi.fn().mockResolvedValue([]),
loadAllSources: vi.fn().mockResolvedValue({ sources: [], loadErrors: [] }),
validateWithProposedSource: vi.fn().mockResolvedValue({ errors: [], warnings: [] }),
writeSource: vi.fn().mockResolvedValue({ commitHash: 'c1' }),
deleteSource: vi.fn().mockResolvedValue(undefined),
@@ -213,7 +213,7 @@ describe('SlWriteSourceTool — session gating', () => {
ingest: { runId: 'run-1', jobId: 'job-1', syncId: 'sync-1', sourceKey: 'metabase' },
semanticLayerService: {
loadSource: vi.fn().mockResolvedValue(null),
loadAllSources: vi.fn().mockResolvedValue([]),
loadAllSources: vi.fn().mockResolvedValue({ sources: [], loadErrors: [] }),
validateWithProposedSource: vi.fn().mockResolvedValue({ errors: [], warnings: [] }),
writeSource: vi.fn().mockResolvedValue({ commitHash: 'c1' }),
deleteSource: vi.fn().mockResolvedValue(undefined),

@@ -23,7 +23,9 @@ const slWriteSourceInputSchema = z.object({
.describe('Name of the source to create, edit, or delete'),
source: sourceInputSchema
.optional()
.describe('Source definition (standalone with table/sql) or overlay (measures, computed columns, etc.)'),
.describe(
'Source definition (standalone with table/sql) or overlay (measures, column_overrides, computed columns, etc.)',
),
delete: z.boolean().optional().describe('Set to true to delete this source entirely'),
rawPaths: z
.array(z.string().min(1))
@@ -73,7 +75,8 @@ If the source already exists, this tool will overwrite it with the new definitio
- table: For physical table/view sources (e.g., "public.orders"). Mutually exclusive with sql.
- sql: For SQL-based sources (the SQL query). Mutually exclusive with table.
- grain: What one row represents (e.g., ["id"], ["customer_id", "product_id"])
- columns: All columns with type (string/number/time/boolean) and optional descriptions
- columns: All columns with type (string/number/time/boolean) and optional descriptions. On overlays, columns are computed-only and require expr + type.
- column_overrides: Overlay-only metadata patches for existing manifest columns (descriptions, role, visibility, constraints, enum_values, tests). Do not include type or expr.
- joins: Relationships to other sources (to, on, relationship: many_to_one/one_to_many/one_to_one)
- measures: Pre-defined aggregations (name, expr like "sum(amount)", optional filter, optional segments bare names of segments defined on the same source, optional description)
- segments: Named, reusable boolean predicates scoped to this source (name, expr a SQL boolean over this source's columns, optional description). A measure references one with \`segments: [name]\`; a query references one with the dotted form \`source.segment_name\`. Use when the same predicate appears on 3+ measures — e.g. extract \`is_paid = true and is_refunded = '0'\` as \`segments: [{name: paid_non_refunded, expr: "..."}]\` and have each measure use \`segments: [paid_non_refunded]\` instead of re-typing the predicate inside \`sum(case when ... then x end)\`. Segments are predicates only — they cannot be selected as dimensions or grouped by; if you need to group by the predicate, add a \`columns[]\` entry instead.
@@ -113,7 +116,7 @@ Do NOT join back to a table that the SQL already aggregates from if the grain co
try {
await semanticLayerService.deleteSource(connectionId, sourceName, author, authorEmail);
if (!skipIndex) {
const allSources = await semanticLayerService.loadAllSources(connectionId);
const { sources: allSources } = await semanticLayerService.loadAllSources(connectionId);
await this.slSearchService.indexSources(connectionId, allSources).catch(() => {});
}
if (context.session) {
@@ -210,7 +213,7 @@ Do NOT join back to a table that the SQL already aggregates from if the grain co
);
if (!skipIndex) {
const allSources = await semanticLayerService.loadAllSources(connectionId);
const { sources: allSources } = await semanticLayerService.loadAllSources(connectionId);
await this.slSearchService.indexSources(connectionId, allSources).catch(() => {});
}
@@ -317,8 +320,9 @@ Do NOT join back to a table that the SQL already aggregates from if the grain co
`Error: cannot write "${sourceName}" as a standalone source — a manifest entry with that name already exists.`,
` Writing standalone would drop the manifest's columns and joins, leaving only what you list here.`,
`To add measures/segments on top of the manifest, rewrite this YAML as an overlay:`,
` - Remove "sql:", "table:", "grain:", "columns:", and "joins:".`,
` - Keep only "name:", plus "measures:", "segments:", and/or "descriptions:".`,
` - Remove "sql:", "table:", "grain:", and base-table "columns:".`,
` - Keep "name:" plus "measures:", "segments:", "descriptions:", "joins:", "disable_joins:",`,
` "exclude_columns:", "column_overrides:", and/or computed-only "columns:" entries with expr + type.`,
` - The manifest's schema is inherited automatically.`,
`If you really need a different base table, use a different source name.`,
].join('\n');
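The overlay guidance in these messages comes down to two rules: drop the keys the manifest already supplies (`sql:`, `table:`, `grain:`), and keep overlay `columns:` computed-only. A rough checker for just those two rules might look like this. `checkOverlayShape` is a hypothetical helper; the project's actual enforcement lives in `sourceOverlaySchema` and `composeOverlay`, whose full rules are not shown here.

```typescript
// Keys the messages above say are inherited from the manifest.
const INHERITED_KEYS = ['sql', 'table', 'grain'] as const;

interface OverlayColumn {
  name?: string;
  expr?: unknown;
  type?: unknown;
}

// Hypothetical helper: returns human-readable problems, empty when the overlay passes both rules.
function checkOverlayShape(overlay: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of INHERITED_KEYS) {
    if (key in overlay) problems.push(`remove "${key}:" (inherited from the manifest)`);
  }
  const raw = overlay['columns'];
  const columns: OverlayColumn[] = Array.isArray(raw) ? raw : [];
  for (const column of columns) {
    // Overlay columns must be computed; metadata for manifest columns belongs in column_overrides:.
    if (column.expr == null || column.type == null) {
      problems.push(`column "${column.name ?? '?'}" needs expr + type; use column_overrides: for manifest columns`);
    }
  }
  return problems;
}
```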

@@ -47,6 +47,32 @@ export interface SemanticLayerSource {
usage?: TableUsageOutput;
}
type SemanticLayerColumn = SemanticLayerSource['columns'][number];
type SemanticLayerJoin = SemanticLayerSource['joins'][number];
export interface SemanticLayerColumnOverride {
name: string;
role?: string;
visibility?: string;
descriptions?: Record<string, string>;
constraints?: { dbt?: { not_null?: boolean; unique?: boolean } };
enum_values?: { dbt?: string[] };
tests?: {
dbt?: Array<{ name: string; package: string; kwargs?: Record<string, unknown> }>;
dbt_by_package?: Record<string, string[]>;
};
}
export type ResolvedSemanticLayerSource = Omit<
SemanticLayerSource,
'inherits_columns_from' | 'usage' | 'joins'
> & {
table?: string;
sql?: string;
columns: Array<SemanticLayerColumn & { type: string }>;
joins: Array<Omit<SemanticLayerJoin, 'source'>>;
};
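`ResolvedSemanticLayerSource` narrows the authoring type to what is sent to the Python engine: no `usage` or `inherits_columns_from`, no `source` on joins, and a `type` on every column. A sketch of a converter in the spirit of `toResolvedWire` follows; the simplified `Source`/`Join`/`Column` shapes and the `toWire` name are assumptions, and the real function's checks may differ.

```typescript
interface Join {
  to: string;
  on: string;
  relationship: string;
  source?: string;
}
interface Column {
  name: string;
  type?: string;
  descriptions?: Record<string, string>;
}
interface Source {
  name: string;
  table?: string;
  sql?: string;
  columns: Column[];
  joins: Join[];
  usage?: unknown;
  inherits_columns_from?: string;
}
type ResolvedSource = Omit<Source, 'usage' | 'inherits_columns_from' | 'columns' | 'joins'> & {
  columns: Array<Column & { type: string }>;
  joins: Array<Omit<Join, 'source'>>;
};

// Hypothetical converter: strips authoring-only fields and fails fast on untyped columns.
function toWire(source: Source): ResolvedSource {
  // Pull out fields the wire shape does not allow; `rest` keeps name/table/sql.
  const { usage: _usage, inherits_columns_from: _inherits, columns, joins, ...rest } = source;
  const untyped = columns.filter((c) => typeof c.type !== 'string').map((c) => c.name);
  if (untyped.length > 0) {
    // The wire type requires a type on every column, so reject on the TS side first.
    throw new Error(`${source.name}: columns missing type: ${untyped.join(', ')}`);
  }
  return {
    ...rest,
    columns: columns as Array<Column & { type: string }>,
    // Rebuild each join from the allowed keys so `source` never reaches the wire.
    joins: joins.map(({ to, on, relationship }) => ({ to, on, relationship })),
  };
}
```

Calling such a converter purely for its throw, as `projectManifestEntry`, `composeOverlay` and `enrichColumnsFromManifest` do with `toResolvedWire`, surfaces a contract violation where the source is built rather than later inside the engine.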
export interface SemanticLayerQueryInput {
measures: Array<string | { expr: string; name: string }>;
dimensions: Array<string | { field: string; granularity?: string }>;