mirror of
https://github.com/Kaelio/ktx.git
synced 2026-06-25 08:48:08 +02:00
chore(workspace): gate dead-code with knip production mode (#196)
* refactor(workspace): relocate @ktx/llm source into packages/cli/src/llm

* refactor(workspace): rewrite @ktx/llm imports to relative paths

* refactor(workspace): fold internal packages into cli

* chore(workspace): gate dead-code with knip production mode

  Turn on production-mode knip plus an autofix run in pre-commit and the `pnpm dead-code` script, document the `/** @internal */` convention for test-only exports in AGENTS.md, annotate test-only exports across the CLI with that JSDoc, and drop dead exports/wrappers the new gate surfaced (e.g. `cli-project.ts`, `lookerRuntimeSourceToFileAdapterSource`, `createLocalScanEnrichmentProvidersFromConfig`, `PGLITE_OWNER_PROCESS_BACKEND_CAPABILITIES`, stale type re-exports). Replace the loose `ignoreIssues` allowlist in `knip.json` with explicit production entries so cross-package barrel leaks are caught.

* refactor(cli): delete internal barrel index.ts files

  The 34 `index.ts` re-export barrels inside `packages/cli/src/` were holdovers from the pre-fold multi-workspace structure. Post-fold-in they served no production purpose: external consumers go through the single package main entry, and in-repo callers mostly imported through them only because the path was short. Internally, knip flagged most barrel re-exports as production-dead (only reached via tests).

  This change:
  - Deletes every internal barrel except `packages/cli/src/index.ts` (the published package entry).
  - Rewrites ~270 source/test files to import each name directly from the file that defines it.
  - Moves `tools/warehouse-verification/index.ts` to `create-warehouse-verification-tools.ts` (the function it defined locally) and updates its single consumer.
  - Renames `search/backend-conformance.ts` → `.test-utils.ts` to match the existing test-helper file convention.
  - Deletes 13 dead test-only chains (dbt-descriptions/*, live-database/extracted-schema, live-database/structural-sync, relationship-* feedback/review chain) plus their tests and a cascading orphan integration test.
  - Updates test mocks that pointed at deleted barrel paths (notion-client, connector barrels in scan/local-scan-connectors tests) to mock the source files instead.
  - Points the maintainer benchmark script (`scripts/relationship-benchmark-report.mjs`) at source files instead of `dist/context/scan/index.js`.
  - Drops the barrel `!` entries from `knip.json`; adds explicit production entries only for the benchmark code reached via dist by the maintainer script.

  Net: 413 files changed, ~1.2k insertions, ~9.4k deletions. `pnpm run dead-code` (Biome + knip default + knip production) and `pnpm run type-check` are clean; 2277 tests pass.

* refactor(workspace): rename @ktx/cli to @kaelio/ktx and pack it directly

  Promote the CLI workspace package to the public name `@kaelio/ktx` and drop the separate `scripts/build-public-npm-package.mjs` wrapper. The CLI package is now publishable in place (`publishConfig.access: public`, `provenance: true`), so artifact packing uses `pnpm pack` against `packages/cli/` instead of assembling a parallel package tree. Updates all workspace filter invocations, docs, tests, and release readiness checks to reference the new package name, and folds the tarball-name helper into `scripts/public-npm-release-metadata.mjs`.

* docs: align "agent clients" and "data agents" terminology

  Replace "client agents" with "agent clients" and "database agents" with "data agents" across AGENTS.md, README.md, the docs-site copy, and the matching setup-agents test description, matching the canonical vocabulary in docs/terminology.md. Also moves packages/cli/tsconfig.json's tsBuildInfoFile from node_modules/.cache/ to dist/.tsbuildinfo so incremental builds survive node_modules reinstalls.

* refactor(release): single source of truth for package version

  Make packages/cli/package.json the single source of truth for the @kaelio/ktx version. publicNpmPackageVersion() now reads it directly, so artifact filenames, release-readiness checks, and the Python wheel version all derive from one field. The duplicate release-policy.json.publicNpmPackageVersion is removed. Previously the two fields could drift: tarballs were named kaelio-ktx-0.4.1.tgz while internally containing @kaelio/ktx@0.0.0-private.

  - update-public-release-version.mjs rewrites both Python pyproject.toml files (ktx-daemon, ktx-sl) alongside the npm package.jsons, normalizing the version for PEP 440 (e.g. 0.1.0-rc.2 -> 0.1.0rc2).
  - semantic-release-config.cjs adds the two pyproject.toml files to @semantic-release/git assets so the release commit back to main carries every version source in lockstep.
  - The six "?? '0.0.0-private'" fallback literals across the CLI are replaced with "?? getKtxCliPackageInfo().version", and createDefaultKtxMcpServer makes its version arg required.
  - docs/release.md describes the actual commit-back model: the dev tree always reflects the most recent release; no sentinel pin to maintain.

  Verified: pnpm run artifacts:build now produces kaelio-ktx-0.4.1.tgz and kaelio_ktx-0.4.1-py3-none-any.whl with @kaelio/ktx@0.4.1 inside. Full type-check, dead-code, and 2287 vitests + 173 script tests pass.

* refactor(cli): inject embedding provider resolution and detect sentence-transformers runtime

  Make resolveProjectEmbeddingProvider and runtimeIo injectable in ingest and scan command entrypoints so tests can stub them, and teach resolvePublicIngestRuntimeRequirements to flag the local-embeddings runtime feature when ktx.yaml selects sentence-transformers.

* chore(cli): mark buildLocalStatsStatus and LocalStatsStatus as @internal

  Both symbols are consumed only by status-project.test.ts. Annotating with /** @internal */ keeps knip's production-mode check clean without changing runtime behavior.

* fix(cli): use real package metadata in print-command-tree

  The stubbed package name embedded a forbidden product identifier that tripped the boundary check in CI. Read the metadata from package.json instead — keeps the rendered tree unchanged and removes a duplicate source of truth.

* feat(cli): show embedding coverage in `ktx status`, drop duplicate disk counts

  Inline `(N embedded)` next to the Wiki scope counts and Semantic-layer source counts, computed with `SUM(embedding_json IS NOT NULL)` over `knowledge_pages` and `local_sl_sources`. Rename the "Knowledge" label to "Wiki" (canonical per `docs/terminology.md`) and rename the matching `localStats.knowledgePages` field to `localStats.wikiPages`.

  Drop `wiki=N md` and `semantic-layer=N yaml` from the Disk row — those duplicated the per-surface rows above. Disk now reports only actual byte usage (db, cache, raw-sources). The unused `wikiGlobalMarkdownCount` / `semanticLayerYamlCount` fields, the `isMarkdownEntry` / `isYamlEntry` helpers, and the `filter` arg on `summarizeDir` are removed.
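The PEP 440 normalization mentioned in the release notes (0.1.0-rc.2 -> 0.1.0rc2) can be sketched as follows. This is a minimal illustration, not the contents of update-public-release-version.mjs: the `toPep440` name, the tag table, and the restriction to `alpha`/`beta`/`rc` pre-release tags are assumptions made for the example.

```typescript
// Hypothetical helper: maps a semver-style version to its PEP 440 spelling.
// Only plain releases and alpha/beta/rc pre-releases are handled in this sketch.
const PRERELEASE_TAGS: Record<string, string> = { alpha: 'a', beta: 'b', rc: 'rc' };

function toPep440(version: string): string {
  const match = /^(\d+\.\d+\.\d+)(?:-(alpha|beta|rc)\.(\d+))?$/.exec(version);
  if (match === null) {
    throw new Error(`unsupported version: ${version}`);
  }
  const release = match[1] ?? '';
  const tag = match[2];
  const serial = match[3];
  if (tag === undefined || serial === undefined) {
    // A plain release is already valid under PEP 440.
    return release;
  }
  // PEP 440 attaches the pre-release segment without separators.
  return `${release}${PRERELEASE_TAGS[tag] ?? tag}${serial}`;
}

const normalizedRc = toPep440('0.1.0-rc.2');
const normalizedRelease = toPep440('0.4.1');
```

Rejecting anything outside the supported pattern keeps an unexpected version string from silently reaching a pyproject.toml file.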
parent
a1cfb03d73
commit
2366b00301
1002 changed files with 2286 additions and 12051 deletions
@@ -0,0 +1,53 @@
import { tool } from 'ai';
import { z } from 'zod';
import type { ArtifactResolutionRecord, StageIndex } from '../stages/stage-index.types.js';

interface EmitArtifactResolutionDeps {
  stageIndex: StageIndex;
  allowedPaths: Set<string>;
}

function sameArtifactResolution(left: ArtifactResolutionRecord, right: ArtifactResolutionRecord): boolean {
  return (
    left.rawPath === right.rawPath &&
    left.artifactKind === right.artifactKind &&
    left.artifactKey === right.artifactKey &&
    left.actionType === right.actionType
  );
}

export function createEmitArtifactResolutionTool(deps: EmitArtifactResolutionDeps) {
  return tool({
    description:
      'Record one explicit artifact resolution for ingest provenance. Use when reconciliation merges or subsumes an artifact without creating a new wiki or SL write action.',
    inputSchema: z.object({
      rawPath: z.string().min(1),
      artifactKind: z.enum(['sl', 'wiki']),
      artifactKey: z.string().min(1),
      actionType: z.enum(['merged', 'subsumed']),
      reason: z.string().min(1),
    }),
    execute: async (input): Promise<string> => {
      if (!deps.allowedPaths.has(input.rawPath)) {
        return `Error: rawPath "${input.rawPath}" is not available to this ingest stage`;
      }

      const record: ArtifactResolutionRecord = {
        rawPath: input.rawPath,
        artifactKind: input.artifactKind,
        artifactKey: input.artifactKey,
        actionType: input.actionType,
        reason: input.reason,
      };
      const existingIndex = deps.stageIndex.artifactResolutions?.findIndex((candidate) =>
        sameArtifactResolution(candidate, record),
      );
      if (existingIndex !== undefined && existingIndex >= 0 && deps.stageIndex.artifactResolutions) {
        deps.stageIndex.artifactResolutions[existingIndex] = record;
      } else {
        deps.stageIndex.artifactResolutions = [...(deps.stageIndex.artifactResolutions ?? []), record];
      }
      return `recorded artifact resolution for ${record.artifactKind}:${record.artifactKey}`;
    },
  });
}
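The replace-or-append step in the artifact-resolution `execute` handler can be looked at in isolation. The following is a minimal sketch rather than the repository's code: `ResolutionRecord`, `sameResolution`, and `upsertResolution` are hypothetical names, the record shape is assumed from the fields the tool writes, and the sketch returns a new array instead of mutating the stage index.

```typescript
// Hypothetical stand-in for ArtifactResolutionRecord (shape assumed from the tool's input schema).
interface ResolutionRecord {
  rawPath: string;
  artifactKind: 'sl' | 'wiki';
  artifactKey: string;
  actionType: 'merged' | 'subsumed';
  reason: string;
}

// Two records describe the same resolution when every field except `reason` matches.
function sameResolution(left: ResolutionRecord, right: ResolutionRecord): boolean {
  return (
    left.rawPath === right.rawPath &&
    left.artifactKind === right.artifactKind &&
    left.artifactKey === right.artifactKey &&
    left.actionType === right.actionType
  );
}

// Replace a matching record, or append when none matches.
// `existing` may be undefined, mirroring an optional array that has not been created yet.
function upsertResolution(existing: ResolutionRecord[] | undefined, record: ResolutionRecord): ResolutionRecord[] {
  const list = existing ?? [];
  const index = list.findIndex((candidate) => sameResolution(candidate, record));
  if (index >= 0) {
    return list.map((candidate, i) => (i === index ? record : candidate));
  }
  return [...list, record];
}

const firstPass: ResolutionRecord = {
  rawPath: 'explores/b2b/sales_pipeline.json',
  artifactKind: 'sl',
  artifactKey: 'looker__b2b__sales_pipeline',
  actionType: 'subsumed',
  reason: 'first pass',
};
const afterFirst = upsertResolution(undefined, firstPass);
const afterSecond = upsertResolution(afterFirst, { ...firstPass, reason: 'second pass' });
const afterThird = upsertResolution(afterSecond, { ...firstPass, actionType: 'merged' });
```

Because `reason` is excluded from the identity check, a repeated call with a revised reason overwrites the earlier record, while a different `actionType` for the same artifact is kept as a separate entry.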
@@ -0,0 +1,38 @@
import { tool } from 'ai';
import { z } from 'zod';
import type { ConflictResolvedRecord, StageIndex } from '../stages/stage-index.types.js';

interface EmitConflictResolutionDeps {
  stageIndex: StageIndex;
}

export function createEmitConflictResolutionTool(deps: EmitConflictResolutionDeps) {
  return tool({
    description:
      'Record one conflict resolution decision for the final IngestReport. Call after resolving or flagging a cross-WorkUnit conflict.',
    inputSchema: z.object({
      unitKey: z.string().min(1).optional(),
      kind: z.enum(['structural_duplicate', 'near_duplicate', 'definitional_contradiction', 're_ingest_change']),
      contestedKey: z.string().min(1).optional(),
      artifactKey: z.string().min(1),
      detail: z.string().min(1),
      flaggedForHuman: z.boolean().default(false),
    }),
    execute: async (input): Promise<string> => {
      const record: ConflictResolvedRecord = {
        kind: input.kind,
        artifactKey: input.artifactKey,
        detail: input.detail,
        flaggedForHuman: input.flaggedForHuman,
      };
      if (input.unitKey) {
        record.unitKey = input.unitKey;
      }
      if (input.contestedKey) {
        record.contestedKey = input.contestedKey;
      }
      deps.stageIndex.conflictsResolved.push(record);
      return `recorded conflict resolution for ${record.artifactKey}`;
    },
  });
}
@@ -0,0 +1,51 @@
import { tool } from 'ai';
import { z } from 'zod';
import type { EvictionAppliedRecord, StageIndex } from '../stages/stage-index.types.js';

interface EmitEvictionDecisionDeps {
  stageIndex: StageIndex;
  deletedRawPaths: string[];
}

function sameEvictionArtifact(left: EvictionAppliedRecord, right: EvictionAppliedRecord): boolean {
  return (
    left.rawPath === right.rawPath && left.artifactKind === right.artifactKind && left.artifactKey === right.artifactKey
  );
}

export function createEmitEvictionDecisionTool(deps: EmitEvictionDecisionDeps) {
  const allowedPaths = new Set(deps.deletedRawPaths);
  return tool({
    description:
      'Record one eviction decision for the final IngestReport. The rawPath must come from the current Eviction Set.',
    inputSchema: z.object({
      rawPath: z.string().min(1),
      artifactKind: z.enum(['sl', 'wiki']),
      artifactKey: z.string().min(1),
      action: z.literal('removed'),
      reason: z.string().min(1),
    }),
    execute: async (input): Promise<string> => {
      if (!allowedPaths.has(input.rawPath)) {
        return `Error: rawPath "${input.rawPath}" is not in the current eviction set`;
      }

      const record: EvictionAppliedRecord = {
        rawPath: input.rawPath,
        artifactKind: input.artifactKind,
        artifactKey: input.artifactKey,
        action: input.action,
        reason: input.reason,
      };
      const existingIndex = deps.stageIndex.evictionsApplied.findIndex((candidate) =>
        sameEvictionArtifact(candidate, record),
      );
      if (existingIndex >= 0) {
        deps.stageIndex.evictionsApplied[existingIndex] = record;
      } else {
        deps.stageIndex.evictionsApplied.push(record);
      }
      return `recorded eviction decision for ${record.rawPath} -> ${record.artifactKind}:${record.artifactKey}`;
    },
  });
}
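The eviction tool reports an out-of-set rawPath by returning an `Error: ...` string rather than throwing, so the caller receives the rejection as ordinary tool output. A minimal sketch of that gate, assuming a simplified record shape: `EvictionRecord` and `recordEviction` are hypothetical names and do not appear in the repository.

```typescript
// Hypothetical, simplified stand-in for EvictionAppliedRecord.
interface EvictionRecord {
  rawPath: string;
  artifactKind: 'sl' | 'wiki';
  artifactKey: string;
  reason: string;
}

// Gate on the eviction set first; only an accepted input may touch `applied`.
function recordEviction(applied: EvictionRecord[], evictionSet: ReadonlySet<string>, input: EvictionRecord): string {
  if (!evictionSet.has(input.rawPath)) {
    // Returned, not thrown: the rejection becomes the tool's text result.
    return `Error: rawPath "${input.rawPath}" is not in the current eviction set`;
  }
  const index = applied.findIndex(
    (candidate) =>
      candidate.rawPath === input.rawPath &&
      candidate.artifactKind === input.artifactKind &&
      candidate.artifactKey === input.artifactKey,
  );
  if (index >= 0) {
    applied[index] = input;
  } else {
    applied.push(input);
  }
  return `recorded eviction decision for ${input.rawPath} -> ${input.artifactKind}:${input.artifactKey}`;
}

const applied: EvictionRecord[] = [];
const evictionSet = new Set(['views/old_orders.view.lkml']);

const rejected = recordEviction(applied, evictionSet, {
  rawPath: 'views/not_deleted.view.lkml',
  artifactKind: 'sl',
  artifactKey: 'not_deleted',
  reason: 'bad input',
});
const countAfterRejection = applied.length;

const accepted = recordEviction(applied, evictionSet, {
  rawPath: 'views/old_orders.view.lkml',
  artifactKind: 'sl',
  artifactKey: 'old_orders',
  reason: 'source raw file was deleted',
});
```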
@@ -0,0 +1,275 @@
import type { Tool } from 'ai';
import { describe, expect, it } from 'vitest';
import type { StageIndex } from '../stages/stage-index.types.js';
import { createEmitArtifactResolutionTool } from './emit-artifact-resolution.tool.js';
import { createEmitConflictResolutionTool } from './emit-conflict-resolution.tool.js';
import { createEmitEvictionDecisionTool } from './emit-eviction-decision.tool.js';
import { createEmitUnmappedFallbackTool } from './emit-unmapped-fallback.tool.js';

function makeStageIndex(): StageIndex {
  return {
    jobId: 'job-1',
    connectionId: 'c1',
    workUnits: [],
    conflictsResolved: [],
    evictionsApplied: [],
    unmappedFallbacks: [],
  };
}

async function executeTool<Input>(tool: Tool<Input, string>, input: NoInfer<Input>) {
  if (!tool.execute) {
    throw new Error('tool is not executable');
  }
  return (await tool.execute(input, { toolCallId: 'tool-call-1', messages: [] })) as string;
}

describe('reconciliation emit tools', () => {
  it('records conflict resolutions on the shared stage index', async () => {
    const stageIndex = makeStageIndex();
    const tool = createEmitConflictResolutionTool({ stageIndex });

    const output = await executeTool(tool, {
      unitKey: 'wu-orders',
      kind: 'near_duplicate',
      contestedKey: 'gross_revenue',
      artifactKey: 'sl:orders.gross_revenue',
      detail: 'orders and order_facts compute the same revenue metric; retained orders as canonical',
      flaggedForHuman: true,
    });

    expect(stageIndex.conflictsResolved).toEqual([
      {
        unitKey: 'wu-orders',
        kind: 'near_duplicate',
        contestedKey: 'gross_revenue',
        artifactKey: 'sl:orders.gross_revenue',
        detail: 'orders and order_facts compute the same revenue metric; retained orders as canonical',
        flaggedForHuman: true,
      },
    ]);
    expect(output).toBe('recorded conflict resolution for sl:orders.gross_revenue');
  });

  it('records eviction decisions only for deleted raw paths in the current eviction set', async () => {
    const stageIndex = makeStageIndex();
    const tool = createEmitEvictionDecisionTool({
      stageIndex,
      deletedRawPaths: ['views/old_orders.view.lkml'],
    });

    const output = await executeTool(tool, {
      rawPath: 'views/old_orders.view.lkml',
      artifactKind: 'sl',
      artifactKey: 'old_orders',
      action: 'removed',
      reason: 'source raw file was deleted and no retained artifacts are required',
    });

    expect(output).toContain('recorded eviction decision for views/old_orders.view.lkml');
    expect(stageIndex.evictionsApplied).toEqual([
      {
        rawPath: 'views/old_orders.view.lkml',
        artifactKind: 'sl',
        artifactKey: 'old_orders',
        action: 'removed',
        reason: 'source raw file was deleted and no retained artifacts are required',
      },
    ]);
  });

  it('updates an existing eviction decision for the same raw path and artifact', async () => {
    const stageIndex = makeStageIndex();
    const tool = createEmitEvictionDecisionTool({
      stageIndex,
      deletedRawPaths: ['views/old_orders.view.lkml'],
    });

    await executeTool(tool, {
      rawPath: 'views/old_orders.view.lkml',
      artifactKind: 'wiki',
      artifactKey: 'orders/old',
      action: 'removed',
      reason: 'first pass',
    });
    await executeTool(tool, {
      rawPath: 'views/old_orders.view.lkml',
      artifactKind: 'wiki',
      artifactKey: 'orders/old',
      action: 'removed',
      reason: 'second pass after checking references',
    });

    expect(stageIndex.evictionsApplied).toEqual([
      {
        rawPath: 'views/old_orders.view.lkml',
        artifactKind: 'wiki',
        artifactKey: 'orders/old',
        action: 'removed',
        reason: 'second pass after checking references',
      },
    ]);
  });

  it('rejects eviction decisions for raw paths outside the current eviction set', async () => {
    const stageIndex = makeStageIndex();
    const tool = createEmitEvictionDecisionTool({
      stageIndex,
      deletedRawPaths: ['views/old_orders.view.lkml'],
    });

    const output = await executeTool(tool, {
      rawPath: 'views/not_deleted.view.lkml',
      artifactKind: 'sl',
      artifactKey: 'not_deleted',
      action: 'removed',
      reason: 'bad input',
    });

    expect(output).toContain('Error: rawPath "views/not_deleted.view.lkml" is not in the current eviction set');
    expect(stageIndex.evictionsApplied).toEqual([]);
  });

  it('records unmapped fallback decisions for allowed raw paths', async () => {
    const stageIndex = makeStageIndex();
    const tool = createEmitUnmappedFallbackTool({
      stageIndex,
      allowedPaths: new Set(['metrics/conversion.yml']),
    });

    const output = await executeTool(tool, {
      rawPath: 'metrics/conversion.yml',
      reason: 'no_physical_table',
      fallback: 'flagged',
    });

    expect(output).toContain('recorded unmapped fallback for metrics/conversion.yml');
    expect(stageIndex.unmappedFallbacks).toEqual([
      {
        rawPath: 'metrics/conversion.yml',
        reason: 'no_physical_table',
        detail: expect.stringContaining('not present as a source'),
        fallback: 'flagged',
      },
    ]);
  });

  it('deduplicates identical unmapped fallback decisions', async () => {
    const stageIndex = makeStageIndex();
    const tool = createEmitUnmappedFallbackTool({
      stageIndex,
      allowedPaths: new Set(['metrics/conversion.yml']),
    });

    await executeTool(tool, {
      rawPath: 'metrics/conversion.yml',
      reason: 'no_physical_table',
      fallback: 'flagged',
    });
    await executeTool(tool, {
      rawPath: 'metrics/conversion.yml',
      reason: 'no_physical_table',
      fallback: 'flagged',
    });

    expect(stageIndex.unmappedFallbacks).toEqual([
      {
        rawPath: 'metrics/conversion.yml',
        reason: 'no_physical_table',
        detail: expect.stringContaining('not present as a source'),
        fallback: 'flagged',
      },
    ]);
  });

  it('records MetricFlow-specific unsupported fallback reasons', async () => {
    const stageIndex = makeStageIndex();
    const tool = createEmitUnmappedFallbackTool({
      stageIndex,
      allowedPaths: new Set(['metrics/conversion.yml']),
    });

    const output = await executeTool(tool, {
      rawPath: 'metrics/conversion.yml',
      reason: 'conversion_metric_unsupported',
      fallback: 'flagged',
    });

    expect(output).toContain('conversion metric');
    expect(stageIndex.unmappedFallbacks).toEqual([
      {
        rawPath: 'metrics/conversion.yml',
        reason: 'conversion_metric_unsupported',
        detail: expect.stringContaining('conversion metric'),
        fallback: 'flagged',
      },
    ]);
  });

  it('rejects unmapped fallback decisions for raw paths outside the allowed set', async () => {
    const stageIndex = makeStageIndex();
    const tool = createEmitUnmappedFallbackTool({
      stageIndex,
      allowedPaths: new Set(['metrics/conversion.yml']),
    });

    const output = await executeTool(tool, {
      rawPath: 'metrics/not-in-this-work-unit.yml',
      reason: 'no_physical_table',
      fallback: 'flagged',
    });

    expect(output).toContain(
      'Error: rawPath "metrics/not-in-this-work-unit.yml" is not available to this ingest stage',
    );
    expect(stageIndex.unmappedFallbacks).toEqual([]);
  });

  it('rejects missing-table fallback decisions when the table resolves to an existing semantic source', async () => {
    const stageIndex = makeStageIndex();
    const tool = createEmitUnmappedFallbackTool({
      stageIndex,
      allowedPaths: new Set(['cards/revenue.json']),
      tableRefExists: async (tableRef) => tableRef === 'orbit_analytics.mart_revenue_daily',
    });

    const output = await executeTool(tool, {
      rawPath: 'cards/revenue.json',
      reason: 'no_physical_table',
      tableRef: 'orbit_analytics.mart_revenue_daily',
      fallback: 'wiki_only',
    });

    expect(output).toContain(
      'Error: tableRef "orbit_analytics.mart_revenue_daily" already resolves to a semantic source',
    );
    expect(stageIndex.unmappedFallbacks).toEqual([]);
  });

  it('records explicit artifact resolutions for provenance rows', async () => {
    const stageIndex = makeStageIndex();
    const tool = createEmitArtifactResolutionTool({
      stageIndex,
      allowedPaths: new Set(['explores/b2b/sales_pipeline.json']),
    });

    const output = await executeTool(tool, {
      rawPath: 'explores/b2b/sales_pipeline.json',
      artifactKind: 'sl',
      artifactKey: 'looker__b2b__sales_pipeline',
      actionType: 'subsumed',
      reason: 'File-adapter source b2b__sales_pipeline is canonical for this explore.',
    });

    expect(output).toBe('recorded artifact resolution for sl:looker__b2b__sales_pipeline');
    expect(stageIndex.artifactResolutions).toEqual([
      {
        rawPath: 'explores/b2b/sales_pipeline.json',
        artifactKind: 'sl',
        artifactKey: 'looker__b2b__sales_pipeline',
        actionType: 'subsumed',
        reason: 'File-adapter source b2b__sales_pipeline is canonical for this explore.',
      },
    ]);
  });
});
@@ -0,0 +1,106 @@
import { tool } from 'ai';
import { z } from 'zod';
import type { StageIndex, UnmappedFallbackRecord, UnmappedFallbackReason } from '../stages/stage-index.types.js';

interface EmitUnmappedFallbackDeps {
  stageIndex: StageIndex;
  allowedPaths: ReadonlySet<string>;
  tableRefExists?: (tableRef: string) => Promise<boolean>;
}

const unmappedFallbackReasonSchema = z.enum([
  'no_connection_mapping',
  'looker_template_unresolved',
  'derived_table_not_supported',
  'no_physical_table',
  'multiple_table_references',
  'unsupported_dialect',
  'parse_error',
  'missing_target_table',
  'cumulative_metric_unsupported',
  'conversion_metric_unsupported',
]);

function sameUnmappedFallback(left: UnmappedFallbackRecord, right: UnmappedFallbackRecord): boolean {
  return left.rawPath === right.rawPath && left.reason === right.reason && left.fallback === right.fallback;
}

// Generates a canonical description for each reason so the recorded `detail`
// is always consistent with the reason code. Free-form text from the LLM
// previously caused contradictions like "no_physical_table" being explained
// as "no mapped connection exists" — the tool now owns the core sentence and
// the LLM may add optional clarification context.
function canonicalDetail(reason: UnmappedFallbackReason, tableRef: string | undefined): string {
  const tableClause = tableRef ? `'${tableRef}'` : 'the referenced object';
  switch (reason) {
    case 'no_physical_table':
      return `${tableClause} is described but is not present as a source in any mapped warehouse/dbt connection.`;
    case 'no_connection_mapping':
      return `${tableClause} has no non-Notion warehouse/dbt connection to map against.`;
    case 'missing_target_table':
      return `${tableClause} is referenced but the target table could not be located.`;
    case 'looker_template_unresolved':
      return `${tableClause} uses LookML templating that could not be resolved.`;
    case 'derived_table_not_supported':
      return `${tableClause} is a derived/inline definition that is not yet supported as a semantic-layer source.`;
    case 'multiple_table_references':
      return `${tableClause} references multiple tables; cannot map to a single source.`;
    case 'unsupported_dialect':
      return `${tableClause} uses a SQL dialect that is not yet supported.`;
    case 'parse_error':
      return `${tableClause} could not be parsed.`;
    case 'cumulative_metric_unsupported':
      return `${tableClause} is a cumulative metric, which is not yet supported as a first-class semantic-layer primitive.`;
    case 'conversion_metric_unsupported':
      return `${tableClause} is a conversion metric, which is not yet supported as a first-class semantic-layer primitive.`;
  }
}

function requiresMissingTableValidation(reason: UnmappedFallbackReason): boolean {
  return reason === 'no_physical_table' || reason === 'missing_target_table';
}

export function createEmitUnmappedFallbackTool(deps: EmitUnmappedFallbackDeps) {
  return tool({
    description:
      'Record one unmapped fallback decision for the final IngestReport. The rawPath must be available to the current ingest stage. The tool generates the canonical detail from the structured reason and optional tableRef; use clarification only to add context that does not contradict the reason code.',
    inputSchema: z.object({
      rawPath: z.string().min(1),
      reason: unmappedFallbackReasonSchema,
      tableRef: z
        .string()
        .optional()
        .describe('The fully-qualified table or source reference that triggered the fallback (e.g. "<schema>.<table>"). Used to generate canonical detail text.'),
      clarification: z
        .string()
        .optional()
        .describe('Optional extra context appended to the canonical detail. Must not contradict the reason code.'),
      fallback: z.enum(['sql_standalone', 'wiki_only', 'flagged']),
    }),
    execute: async (input): Promise<string> => {
      if (!deps.allowedPaths.has(input.rawPath)) {
        return `Error: rawPath "${input.rawPath}" is not available to this ingest stage`;
      }
      if (input.tableRef && requiresMissingTableValidation(input.reason) && deps.tableRefExists) {
        const exists = await deps.tableRefExists(input.tableRef);
        if (exists) {
          return `Error: tableRef "${input.tableRef}" already resolves to a semantic source; do not record ${input.reason} for an existing table.`;
        }
      }

      const base = canonicalDetail(input.reason, input.tableRef);
      const detail = input.clarification ? `${base} ${input.clarification.trim()}`.trim() : base;

      const record: UnmappedFallbackRecord = {
        rawPath: input.rawPath,
        reason: input.reason,
        detail,
        fallback: input.fallback,
      };
      if (!deps.stageIndex.unmappedFallbacks.some((candidate) => sameUnmappedFallback(candidate, record))) {
        deps.stageIndex.unmappedFallbacks.push(record);
      }
      return `recorded unmapped fallback for ${record.rawPath} (${record.fallback}): ${detail}`;
    },
  });
}
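The way the unmapped-fallback tool composes `detail` (a tool-owned canonical sentence, followed by optional caller-supplied clarification) can be shown with a reduced example. This is a sketch under stated assumptions: only two of the ten reason codes are included, and `Reason` and `buildDetail` are hypothetical names introduced for the illustration.

```typescript
// Reduced reason set for illustration; the real schema lists ten codes.
type Reason = 'no_physical_table' | 'parse_error';

// The tool owns the core sentence so it can never contradict the reason code.
function canonicalDetail(reason: Reason, tableRef: string | undefined): string {
  const subject = tableRef ? `'${tableRef}'` : 'the referenced object';
  switch (reason) {
    case 'no_physical_table':
      return `${subject} is described but is not present as a source in any mapped warehouse/dbt connection.`;
    case 'parse_error':
      return `${subject} could not be parsed.`;
  }
}

// Hypothetical wrapper: append trimmed clarification only when it carries text.
function buildDetail(reason: Reason, tableRef?: string, clarification?: string): string {
  const base = canonicalDetail(reason, tableRef);
  const extra = clarification?.trim();
  return extra ? `${base} ${extra}` : base;
}

const withTable = buildDetail('parse_error', 'analytics.orders');
const withClarification = buildDetail('parse_error', undefined, '  Unbalanced braces in the view file.  ');
const blankClarification = buildDetail('no_physical_table', undefined, '   ');
```

Keeping the clarification as a suffix means the recorded text always starts with the sentence that matches the structured reason, whatever the caller adds afterwards.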
@@ -0,0 +1,56 @@
import { describe, expect, it, vi } from 'vitest';
import { createEvictionListTool } from './eviction-list.tool.js';

describe('eviction_list tool', () => {
  it('returns artifacts produced for each deleted raw path', async () => {
    const provenance = {
      findLatestArtifactsForRawPaths: vi.fn().mockResolvedValue(
        new Map([
          [
            'views/old.lkml',
            [{ artifact_kind: 'sl', artifact_key: 'old_metric', action_type: 'source_created' } as any],
          ],
          ['views/gone.lkml', []],
        ]),
      ),
    };
    const tool = createEvictionListTool({
      provenance: provenance as any,
      connectionId: 'c1',
      sourceKey: 'lookml',
      deletedRawPaths: ['views/old.lkml', 'views/gone.lkml'],
    });
    const out = (await (tool.execute as (...args: unknown[]) => unknown)(
      {},
      { toolCallId: 't', messages: [] },
    )) as string;
    expect(out).toContain('views/old.lkml');
    expect(out).toContain('old_metric');
    expect(out).toContain('views/gone.lkml');
  });

  it('returns empty string when no deletions', async () => {
    const tool = createEvictionListTool({
      provenance: {} as any,
      connectionId: 'c1',
      sourceKey: 'lookml',
      deletedRawPaths: [],
    });
    const out = (await (tool.execute as (...args: unknown[]) => unknown)(
      {},
      { toolCallId: 't', messages: [] },
    )) as string;
    expect(out).toMatch(/empty/i);
  });

  it('tells curators to record decisions', () => {
    const tool = createEvictionListTool({
      provenance: {} as any,
      connectionId: 'c1',
      sourceKey: 'lookml',
      deletedRawPaths: [],
    });

    expect(tool.description).toContain('emit_eviction_decision');
  });
});
39
packages/cli/src/context/ingest/tools/eviction-list.tool.ts
Normal file
@@ -0,0 +1,39 @@
import { tool } from 'ai';
import { z } from 'zod';
import type { IngestProvenancePort } from '../ports.js';

export interface EvictionListDeps {
  provenance: IngestProvenancePort;
  connectionId: string;
  sourceKey: string;
  deletedRawPaths: string[];
}

export function createEvictionListTool(deps: EvictionListDeps) {
  return tool({
    description:
      'List every artifact that the most recent completed sync produced from a now-deleted raw file. Remove each listed artifact and record the decision with emit_eviction_decision so the ingest report lists every deleted-source decision.',
    inputSchema: z.object({}),
    execute: async () => {
      if (deps.deletedRawPaths.length === 0) {
        return '(empty) — no files were deleted since the last sync';
      }
      const map = await deps.provenance.findLatestArtifactsForRawPaths(
        deps.connectionId,
        deps.sourceKey,
        deps.deletedRawPaths,
      );
      return [...map.entries()]
        .map(([path, rows]) => {
          if (rows.length === 0) {
            return `- raw_path: ${path}\n artifacts: (none)`;
          }
          const artifactLines = rows
            .map((r) => ` - kind: ${r.artifact_kind} key: ${r.artifact_key} (last action: ${r.action_type})`)
            .join('\n');
          return `- raw_path: ${path}\n artifacts:\n${artifactLines}`;
        })
        .join('\n');
    },
  });
}
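The listing that `eviction_list` returns is a plain-text rendering of a map from raw path to provenance rows. The sketch below reproduces that rendering without the provenance port. `ArtifactRow` and `renderEvictionList` are hypothetical names, the row shape is assumed from the three fields the tool prints, and the indentation inside the strings is chosen for the example rather than taken from the repository.

```typescript
// Hypothetical row shape, limited to the fields the listing prints.
interface ArtifactRow {
  artifact_kind: string;
  artifact_key: string;
  action_type: string;
}

// Render one block per raw path; paths without artifacts are still listed.
function renderEvictionList(byPath: Map<string, ArtifactRow[]>): string {
  if (byPath.size === 0) {
    return '(empty)';
  }
  return Array.from(byPath.entries())
    .map(([path, rows]) => {
      if (rows.length === 0) {
        return `- raw_path: ${path}\n  artifacts: (none)`;
      }
      const artifactLines = rows
        .map((row) => `    - kind: ${row.artifact_kind} key: ${row.artifact_key} (last action: ${row.action_type})`)
        .join('\n');
      return `- raw_path: ${path}\n  artifacts:\n${artifactLines}`;
    })
    .join('\n');
}

const listing = renderEvictionList(
  new Map<string, ArtifactRow[]>([
    ['views/old.lkml', [{ artifact_kind: 'sl', artifact_key: 'old_metric', action_type: 'source_created' }]],
    ['views/gone.lkml', []],
  ]),
);
const emptyListing = renderEvictionList(new Map<string, ArtifactRow[]>());
```

Listing a path even when it has no artifacts lets the reader of the output confirm that every deleted file was considered.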
@ -0,0 +1,69 @@
import { mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { createReadRawFileTool } from './read-raw-file.tool.js';

describe('read_raw_file tool', () => {
  let stagedDir: string;

  beforeEach(async () => {
    stagedDir = await mkdtemp(join(tmpdir(), 'readraw-'));
    await mkdir(join(stagedDir, 'views'), { recursive: true });
    await writeFile(join(stagedDir, 'views', 'a.yml'), 'line1\nline2\nline3\n', 'utf-8');
    await writeFile(join(stagedDir, 'peer.yml'), 'secret', 'utf-8');
  });

  afterEach(async () => rm(stagedDir, { recursive: true, force: true }));

  it('returns content for an allowed path', async () => {
    const tool = createReadRawFileTool({ stagedDir, allowedPaths: new Set(['views/a.yml']) });
    const result = await (tool.execute as (...args: unknown[]) => unknown)(
      { path: 'views/a.yml' },
      { toolCallId: 't1', messages: [] },
    );
    expect(result).toContain('line1');
    expect(result).toContain('line2');
  });

  it('refuses to return oversized files and directs callers to read spans', async () => {
    await writeFile(join(stagedDir, 'views', 'huge.yml'), `${'x'.repeat(160_000)}\n`, 'utf-8');
    const tool = createReadRawFileTool({ stagedDir, allowedPaths: new Set(['views/huge.yml']) });
    const result = await (tool.execute as (...args: unknown[]) => unknown)(
      { path: 'views/huge.yml' },
      { toolCallId: 't1', messages: [] },
    );

    expect(result).toMatch(/too large/i);
    expect(result).toMatch(/read_raw_span/i);
    expect(String(result).length).toBeLessThan(1000);
  });

  it('rejects a path not in the allow-list', async () => {
    const tool = createReadRawFileTool({ stagedDir, allowedPaths: new Set(['views/a.yml']) });
    const result = await (tool.execute as (...args: unknown[]) => unknown)(
      { path: 'peer.yml' },
      { toolCallId: 't1', messages: [] },
    );
    expect(result).toMatch(/not accessible/i);
    expect(result).not.toContain('secret');
  });

  it('rejects directory traversal attempts', async () => {
    const tool = createReadRawFileTool({ stagedDir, allowedPaths: new Set(['views/a.yml']) });
    const result = await (tool.execute as (...args: unknown[]) => unknown)(
      { path: '../outside.yml' },
      { toolCallId: 't1', messages: [] },
    );
    expect(result).toMatch(/not accessible/i);
  });

  it('returns a clear error when the file is missing despite being allowed', async () => {
    const tool = createReadRawFileTool({ stagedDir, allowedPaths: new Set(['views/missing.yml']) });
    const result = await (tool.execute as (...args: unknown[]) => unknown)(
      { path: 'views/missing.yml' },
      { toolCallId: 't1', messages: [] },
    );
    expect(result).toMatch(/not found/i);
  });
});
41
packages/cli/src/context/ingest/tools/read-raw-file.tool.ts
Normal file
@ -0,0 +1,41 @@
import { readFile, stat } from 'node:fs/promises';
import { join, normalize, resolve } from 'node:path';
import { tool } from 'ai';
import { z } from 'zod';

interface ReadRawFileDeps {
  stagedDir: string;
  allowedPaths: Set<string>;
}

const MAX_READ_RAW_FILE_BYTES = 120_000;

export function createReadRawFileTool(deps: ReadRawFileDeps) {
  const stagedRoot = resolve(deps.stagedDir);
  return tool({
    description:
      "Read the full text content of a raw source file inside this WorkUnit. `path` must be relative to the staged bundle root (no leading slash, no `..`) and must appear in the WorkUnit's rawFiles or dependencyPaths list.",
    inputSchema: z.object({
      path: z.string().describe('Path relative to the staged bundle root. Example: "views/customers/customer.lkml".'),
    }),
    execute: async ({ path }) => {
      const normalized = normalize(path).replace(/^[/\\]+/, '');
      if (normalized.startsWith('..') || !deps.allowedPaths.has(normalized)) {
        return `Error: path "${path}" is not accessible from this WorkUnit. Allowed paths: ${[...deps.allowedPaths].sort().join(', ')}`;
      }
      const absolute = resolve(join(stagedRoot, normalized));
      if (!absolute.startsWith(`${stagedRoot}/`) && absolute !== stagedRoot) {
        return `Error: path "${path}" is not accessible from this WorkUnit.`;
      }
      try {
        const fileStat = await stat(absolute);
        if (fileStat.size > MAX_READ_RAW_FILE_BYTES) {
          return `Error: file "${path}" is too large to return in full (${fileStat.size} bytes). Use read_raw_span with targeted line ranges instead.`;
        }
        return await readFile(absolute, 'utf-8');
      } catch (err) {
        return `Error: file "${path}" not found. (${err instanceof Error ? err.message : String(err)})`;
      }
    },
  });
}
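The containment check in `createReadRawFileTool` compares the resolved path against `${stagedRoot}/`, which assumes POSIX separators. Below is a minimal sketch of a separator-agnostic variant built on `path.relative`. The helper name `resolveInsideRoot` is hypothetical and not part of this change, and unlike the check above it also rejects the root directory itself, on the assumption that a directory is never a readable file.

```typescript
import { isAbsolute, relative, resolve, sep } from 'node:path';

// Hypothetical helper (not part of this diff): resolve `candidate` under `root`
// and return the absolute path only when it stays strictly inside `root`.
function resolveInsideRoot(root: string, candidate: string): string | null {
  const absoluteRoot = resolve(root);
  const absolute = resolve(absoluteRoot, candidate);
  const rel = relative(absoluteRoot, absolute);
  // `rel` is '' for the root itself, is '..' or starts with '..' + separator
  // when the path escapes, and is absolute when the paths are on different
  // drives (Windows).
  if (rel === '' || rel === '..' || rel.startsWith(`..${sep}`) || isAbsolute(rel)) {
    return null;
  }
  return absolute;
}
```

The allow-list lookup would still run first; this guard only replaces the string-prefix comparison.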
@ -0,0 +1,53 @@
import { mkdir, mkdtemp, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { createReadRawSpanTool } from './read-raw-span.tool.js';

describe('read_raw_span tool', () => {
  let stagedDir: string;

  beforeEach(async () => {
    stagedDir = await mkdtemp(join(tmpdir(), 'readspan-'));
    await mkdir(join(stagedDir, 'v'), { recursive: true });
    await writeFile(join(stagedDir, 'v', 'a.yml'), 'line1\nline2\nline3\nline4\nline5\n', 'utf-8');
  });

  afterEach(async () => rm(stagedDir, { recursive: true, force: true }));

  it('returns the requested 1-based inclusive line range', async () => {
    const tool = createReadRawSpanTool({ stagedDir, allowedPaths: new Set(['v/a.yml']) });
    const result = await (tool.execute as (...args: unknown[]) => unknown)(
      { path: 'v/a.yml', startLine: 2, endLine: 4 },
      { toolCallId: 't1', messages: [] },
    );
    expect(result).toBe('line2\nline3\nline4');
  });

  it('clamps endLine to the end of the file', async () => {
    const tool = createReadRawSpanTool({ stagedDir, allowedPaths: new Set(['v/a.yml']) });
    const result = await (tool.execute as (...args: unknown[]) => unknown)(
      { path: 'v/a.yml', startLine: 4, endLine: 99 },
      { toolCallId: 't1', messages: [] },
    );
    expect(result).toBe('line4\nline5');
  });

  it('rejects start > end', async () => {
    const tool = createReadRawSpanTool({ stagedDir, allowedPaths: new Set(['v/a.yml']) });
    const result = await (tool.execute as (...args: unknown[]) => unknown)(
      { path: 'v/a.yml', startLine: 5, endLine: 2 },
      { toolCallId: 't1', messages: [] },
    );
    expect(result).toMatch(/startLine must be/i);
  });

  it('rejects paths not in the allow-list', async () => {
    const tool = createReadRawSpanTool({ stagedDir, allowedPaths: new Set([]) });
    const result = await (tool.execute as (...args: unknown[]) => unknown)(
      { path: 'v/a.yml', startLine: 1, endLine: 1 },
      { toolCallId: 't1', messages: [] },
    );
    expect(result).toMatch(/not accessible/i);
  });
});
46
packages/cli/src/context/ingest/tools/read-raw-span.tool.ts
Normal file
@ -0,0 +1,46 @@
import { readFile } from 'node:fs/promises';
import { join, normalize, resolve } from 'node:path';
import { tool } from 'ai';
import { z } from 'zod';

interface ReadRawSpanDeps {
  stagedDir: string;
  allowedPaths: Set<string>;
}

export function createReadRawSpanTool(deps: ReadRawSpanDeps) {
  const stagedRoot = resolve(deps.stagedDir);
  return tool({
    description:
      'Read a 1-based inclusive line range from a raw source file. Use this to resolve a provenance pointer like `file.lkml#L15-28` without loading the whole file into context.',
    inputSchema: z.object({
      path: z.string().describe('Path relative to the staged bundle root.'),
      startLine: z.number().int().min(1).describe('First line to return (1-based, inclusive).'),
      endLine: z.number().int().min(1).describe('Last line to return (1-based, inclusive). Clamped to file length.'),
    }),
    execute: async ({ path, startLine, endLine }) => {
      if (startLine > endLine) {
        return `Error: startLine must be <= endLine (got startLine=${startLine}, endLine=${endLine})`;
      }
      const normalized = normalize(path).replace(/^[/\\]+/, '');
      if (normalized.startsWith('..') || !deps.allowedPaths.has(normalized)) {
        return `Error: path "${path}" is not accessible from this context. Allowed paths: ${[...deps.allowedPaths].sort().join(', ')}`;
      }
      const absolute = resolve(join(stagedRoot, normalized));
      if (!absolute.startsWith(`${stagedRoot}/`) && absolute !== stagedRoot) {
        return `Error: path "${path}" is not accessible from this context.`;
      }
      try {
        const body = await readFile(absolute, 'utf-8');
        const rawLines = body.split('\n');
        // Treat a trailing empty element caused by a file-ending newline as NOT a line.
        const lines = rawLines.length > 0 && rawLines[rawLines.length - 1] === '' ? rawLines.slice(0, -1) : rawLines;
        const from = Math.max(1, startLine);
        const to = Math.min(lines.length, endLine);
        return lines.slice(from - 1, to).join('\n');
      } catch (err) {
        return `Error: file "${path}" not found. (${err instanceof Error ? err.message : String(err)})`;
      }
    },
  });
}
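The line-range arithmetic in `execute` can be read in isolation from the file access around it. Below is a minimal sketch, assuming a hypothetical pure helper `sliceLines` (not part of this change) that mirrors the trailing-newline handling and the clamping shown above. It throws on an inverted range where the tool returns an error string.

```typescript
// Hypothetical pure helper mirroring the span logic above; not part of this diff.
function sliceLines(body: string, startLine: number, endLine: number): string {
  if (startLine > endLine) {
    throw new RangeError(`startLine must be <= endLine (got ${startLine} > ${endLine})`);
  }
  const rawLines = body.split('\n');
  // A body that ends with '\n' yields a trailing '' element, which is not a line.
  const lines = rawLines[rawLines.length - 1] === '' ? rawLines.slice(0, -1) : rawLines;
  const from = Math.max(1, startLine);
  const to = Math.min(lines.length, endLine);
  // Convert the 1-based inclusive range to slice()'s 0-based half-open range.
  return lines.slice(from - 1, to).join('\n');
}
```

With this logic a `startLine` past the end of the file produces an empty string, not an error, because the clamped `to` falls below `from - 1` and `slice` returns an empty array.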
131
packages/cli/src/context/ingest/tools/stage-diff.tool.test.ts
Normal file
@ -0,0 +1,131 @@
import { describe, expect, it } from 'vitest';
import { createStageDiffTool } from './stage-diff.tool.js';

describe('stage_diff tool', () => {
  const stageIndex = {
    jobId: 'j',
    connectionId: 'c1',
    workUnits: [
      {
        unitKey: 'u1',
        rawFiles: [],
        status: 'success' as const,
        actions: [{ target: 'sl' as const, type: 'created' as const, key: 'churn_risk_score', detail: 'customers' }],
        touchedSlSources: [{ connectionId: 'c1', sourceName: 'customers' }],
      },
      {
        unitKey: 'u2',
        rawFiles: [],
        status: 'success' as const,
        actions: [{ target: 'sl' as const, type: 'created' as const, key: 'churn_risk_score', detail: 'billing' }],
        touchedSlSources: [{ connectionId: 'c1', sourceName: 'billing' }],
      },
    ],
    conflictsResolved: [],
    evictionsApplied: [],
    unmappedFallbacks: [],
  };

  it('finds overlapping artifact keys between two WUs', async () => {
    const tool = createStageDiffTool({ stageIndex });
    const out = (await (tool.execute as (...args: unknown[]) => unknown)(
      { unitKeyA: 'u1', unitKeyB: 'u2' },
      { toolCallId: 't', messages: [] },
    )) as string;
    expect(out).toContain('churn_risk_score');
    expect(out).toMatch(/overlap/i);
  });

  it('says no overlap when keys are disjoint', async () => {
    const tool = createStageDiffTool({
      stageIndex: {
        jobId: 'j',
        connectionId: 'c1',
        workUnits: [
          {
            unitKey: 'u1',
            rawFiles: [],
            status: 'success',
            actions: [{ target: 'sl', type: 'created', key: 'a', detail: '' }],
            touchedSlSources: [{ connectionId: 'c1', sourceName: 'a' }],
          },
          {
            unitKey: 'u2',
            rawFiles: [],
            status: 'success',
            actions: [{ target: 'sl', type: 'created', key: 'b', detail: '' }],
            touchedSlSources: [{ connectionId: 'c1', sourceName: 'b' }],
          },
        ],
        conflictsResolved: [],
        evictionsApplied: [],
        unmappedFallbacks: [],
      },
    });
    const out = (await (tool.execute as (...args: unknown[]) => unknown)(
      { unitKeyA: 'u1', unitKeyB: 'u2' },
      { toolCallId: 't', messages: [] },
    )) as string;
    expect(out).toMatch(/no overlap/i);
  });

  it('does not overlap same-named SL actions on different target connections', async () => {
    const tool = createStageDiffTool({
      stageIndex: {
        jobId: 'j',
        connectionId: 'looker-run',
        workUnits: [
          {
            unitKey: 'u1',
            rawFiles: [],
            status: 'success',
            actions: [
              {
                target: 'sl',
                type: 'created',
                key: 'looker__b2b__sales_pipeline',
                detail: 'W1',
                targetConnectionId: 'W1',
              },
            ],
            touchedSlSources: [{ connectionId: 'W1', sourceName: 'looker__b2b__sales_pipeline' }],
          },
          {
            unitKey: 'u2',
            rawFiles: [],
            status: 'success',
            actions: [
              {
                target: 'sl',
                type: 'created',
                key: 'looker__b2b__sales_pipeline',
                detail: 'W2',
                targetConnectionId: 'W2',
              },
            ],
            touchedSlSources: [{ connectionId: 'W2', sourceName: 'looker__b2b__sales_pipeline' }],
          },
        ],
        conflictsResolved: [],
        evictionsApplied: [],
        unmappedFallbacks: [],
      },
    });

    const out = (await (tool.execute as (...args: unknown[]) => unknown)(
      { unitKeyA: 'u1', unitKeyB: 'u2' },
      { toolCallId: 't', messages: [] },
    )) as string;

    expect(out).toMatch(/no overlap/i);
  });

  it('returns an error when a unitKey is unknown', async () => {
    const tool = createStageDiffTool({ stageIndex });
    const out = (await (tool.execute as (...args: unknown[]) => unknown)(
      { unitKeyA: 'u1', unitKeyB: 'nope' },
      { toolCallId: 't', messages: [] },
    )) as string;
    expect(out).toMatch(/unknown/i);
  });
});
44
packages/cli/src/context/ingest/tools/stage-diff.tool.ts
Normal file
@ -0,0 +1,44 @@
import { tool } from 'ai';
import { z } from 'zod';
import { memoryActionIdentity } from '../action-identity.js';
import type { StageIndex } from '../stages/stage-index.types.js';

export interface StageDiffDeps {
  stageIndex: StageIndex;
}

export function createStageDiffTool(deps: StageDiffDeps) {
  return tool({
    description:
      'Compare two WorkUnits by their writes. SL writes overlap only when target connection and artifact key both match; same-key SL actions on different target connections are non-overlapping.',
    inputSchema: z.object({
      unitKeyA: z.string(),
      unitKeyB: z.string(),
    }),
    execute: ({ unitKeyA, unitKeyB }) => {
      const a = deps.stageIndex.workUnits.find((wu) => wu.unitKey === unitKeyA);
      const b = deps.stageIndex.workUnits.find((wu) => wu.unitKey === unitKeyB);
      if (!a) {
        return Promise.resolve(`Error: unknown unitKey "${unitKeyA}"`);
      }
      if (!b) {
        return Promise.resolve(`Error: unknown unitKey "${unitKeyB}"`);
      }
      const runConnectionId = deps.stageIndex.connectionId;
      const keysA = new Set(a.actions.map((ac) => memoryActionIdentity(ac, runConnectionId)));
      const keysB = new Set(b.actions.map((ac) => memoryActionIdentity(ac, runConnectionId)));
      const overlap = [...keysA].filter((k) => keysB.has(k));
      if (overlap.length === 0) {
        return Promise.resolve(`No overlap between ${unitKeyA} and ${unitKeyB}.`);
      }
      const overlapDetail = overlap
        .map((k) => {
          const aDetail = a.actions.find((ac) => memoryActionIdentity(ac, runConnectionId) === k);
          const bDetail = b.actions.find((ac) => memoryActionIdentity(ac, runConnectionId) === k);
          return `- ${k}\n ${unitKeyA}: ${aDetail?.detail ?? ''}\n ${unitKeyB}: ${bDetail?.detail ?? ''}`;
        })
        .join('\n');
      return Promise.resolve(`Overlap between ${unitKeyA} and ${unitKeyB}:\n${overlapDetail}`);
    },
  });
}
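The tool description says SL writes overlap only when both the target connection and the artifact key match. `memoryActionIdentity` lives in `../action-identity.js` and is not shown in this diff, so the sketch below uses a hypothetical stand-in, `sketchActionIdentity`, with an assumed key format and an assumed fallback to the run's connection when `targetConnectionId` is absent. It only illustrates how a composite identity keeps same-key actions on different connections apart.

```typescript
// Assumed action shape, reduced to the fields used for identity.
interface SketchAction {
  target: 'sl' | 'wiki';
  key: string;
  targetConnectionId?: string;
}

// Hypothetical stand-in for memoryActionIdentity; the real key format is not
// shown in this diff and may differ.
function sketchActionIdentity(action: SketchAction, runConnectionId: string): string {
  if (action.target === 'sl') {
    // Assumption: SL actions fall back to the run connection when untargeted.
    return `sl:${action.targetConnectionId ?? runConnectionId}:${action.key}`;
  }
  return `${action.target}:${action.key}`;
}

// Intersect the two identity sets, as the execute body above does.
function overlappingIdentities(a: SketchAction[], b: SketchAction[], runConnectionId: string): string[] {
  const idsA = new Set(a.map((action) => sketchActionIdentity(action, runConnectionId)));
  const idsB = new Set(b.map((action) => sketchActionIdentity(action, runConnectionId)));
  return Array.from(idsA).filter((id) => idsB.has(id));
}
```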
@ -0,0 +1,66 @@
import { describe, expect, it } from 'vitest';
import { createStageListTool } from './stage-list.tool.js';

describe('stage_list tool', () => {
  it('returns a compact summary of the stage index', async () => {
    const tool = createStageListTool({
      stageIndex: {
        jobId: 'j1',
        connectionId: 'c1',
        workUnits: [
          {
            unitKey: 'u1',
            rawFiles: ['a.yml'],
            status: 'success',
            actions: [{ target: 'sl', type: 'created', key: 'src_a', detail: '' }],
            touchedSlSources: [{ connectionId: 'c1', sourceName: 'src_a' }],
          },
          {
            unitKey: 'u2',
            rawFiles: ['b.yml'],
            status: 'success',
            actions: [
              {
                target: 'wiki',
                type: 'created',
                key: 'page_b',
                detail: 'tables: orbit_analytics.customer',
              },
            ],
            touchedSlSources: [],
          },
        ],
        conflictsResolved: [],
        evictionsApplied: [],
        unmappedFallbacks: [],
      },
    });
    const out = (await (tool.execute as (...args: unknown[]) => unknown)(
      {},
      { toolCallId: 't', messages: [] },
    )) as string;
    expect(out).toContain('u1');
    expect(out).toContain('src_a');
    expect(out).toContain('u2');
    expect(out).toContain('page_b');
    expect(out).toContain('tables: orbit_analytics.customer');
  });

  it('says empty when no writes', async () => {
    const tool = createStageListTool({
      stageIndex: {
        jobId: 'j',
        connectionId: 'c1',
        workUnits: [],
        conflictsResolved: [],
        evictionsApplied: [],
        unmappedFallbacks: [],
      },
    });
    const out = (await (tool.execute as (...args: unknown[]) => unknown)(
      {},
      { toolCallId: 't', messages: [] },
    )) as string;
    expect(out).toMatch(/empty/i);
  });
});
41
packages/cli/src/context/ingest/tools/stage-list.tool.ts
Normal file
@ -0,0 +1,41 @@
import { tool } from 'ai';
import { z } from 'zod';
import type { StageIndex } from '../stages/stage-index.types.js';

export interface StageListDeps {
  stageIndex: StageIndex;
}

function formatActionDetail(detail: string): string {
  return detail.trim().replace(/\s+/g, ' ');
}

export function createStageListTool(deps: StageListDeps) {
  return tool({
    description:
      'List every write made by Stage 3 WorkUnits in this job. Each entry has the unitKey, raw files, and the action set (SL sources touched, wiki pages written).',
    inputSchema: z.object({}),
    execute: () => {
      if (deps.stageIndex.workUnits.length === 0) {
        return Promise.resolve('(empty) — no WorkUnits wrote anything in this job');
      }
      const out = deps.stageIndex.workUnits
        .map((wu) => {
          const actions =
            wu.actions.length === 0
              ? ' (no actions)'
              : wu.actions
                  .map((a) => {
                    const detail = formatActionDetail(a.detail);
                    return detail.length > 0
                      ? ` - ${a.target}:${a.type} ${a.key}; detail: ${detail}`
                      : ` - ${a.target}:${a.type} ${a.key}`;
                  })
                  .join('\n');
          return `- unitKey: ${wu.unitKey} (status=${wu.status})\n rawFiles: ${wu.rawFiles.join(', ') || '(none)'}\n actions:\n${actions}`;
        })
        .join('\n');
      return Promise.resolve(out);
    },
  });
}
101
packages/cli/src/context/ingest/tools/tool-call-logger.ts
Normal file
@ -0,0 +1,101 @@
import { appendFile, mkdir } from 'node:fs/promises';
import { dirname } from 'node:path';
import type { KtxRuntimeToolSet } from '../../../context/llm/runtime-port.js';

export interface ToolCallLogEntry {
  ts: string;
  wuKey: string;
  toolCallId?: string;
  toolName: string;
  durationMs: number;
  input: unknown;
  output?: unknown;
  error?: { message: string; name?: string };
}

interface ToolCallLoggerOptions {
  onEntry?(entry: ToolCallLogEntry): void;
}

/**
 * Wrap every tool in `tools` so each invocation appends a JSONL record with
 * `{toolName, input, output | error, durationMs}` to `logFilePath`. Used by
 * the ingest runner to produce per-WU transcripts so a completed sync can be
 * inspected the way `parse_chat.py` inspects a chat.
 *
 * Tool shape is preserved (description, inputSchema, ...). Tools without an
 * `execute` function (provider-defined) pass through untouched.
 *
 * Log writes are best-effort and fire-and-forget; a failing write will never
 * block or error the agent. Tool execution inside a single agent loop is
 * sequential (`generateText` awaits each tool result), so per-WU files are
 * effectively single-writer and lines land in call order.
 */
export function wrapToolsWithLogger<T extends KtxRuntimeToolSet>(
  tools: T,
  logFilePath: string,
  wuKey: string,
  options: ToolCallLoggerOptions = {},
): T {
  const wrapped: Record<string, unknown> = {};
  for (const [name, original] of Object.entries(tools) as Array<[string, T[string]]>) {
    const originalExecute = original.execute;
    if (typeof originalExecute !== 'function') {
      wrapped[name] = original;
      continue;
    }
    const wrappedExecute = async (input: unknown) => {
      const start = Date.now();
      try {
        const output = await originalExecute(input);
        const entry: ToolCallLogEntry = {
          ts: new Date().toISOString(),
          wuKey,
          toolName: name,
          durationMs: Date.now() - start,
          input,
          output,
        };
        options.onEntry?.(entry);
        appendEntry(logFilePath, entry);
        return output;
      } catch (err) {
        const entry: ToolCallLogEntry = {
          ts: new Date().toISOString(),
          wuKey,
          toolName: name,
          durationMs: Date.now() - start,
          input,
          error: {
            message: err instanceof Error ? err.message : String(err),
            name: err instanceof Error ? err.name : undefined,
          },
        };
        options.onEntry?.(entry);
        appendEntry(logFilePath, entry);
        throw err;
      }
    };
    wrapped[name] = { ...original, execute: wrappedExecute };
  }
  return wrapped as T;
}

function appendEntry(path: string, entry: ToolCallLogEntry): void {
  void (async () => {
    try {
      await mkdir(dirname(path), { recursive: true });
      await appendFile(path, `${safeStringify(entry)}\n`, 'utf-8');
    } catch {
      // best-effort
    }
  })();
}

function safeStringify(v: unknown): string {
  try {
    return JSON.stringify(v);
  } catch {
    return JSON.stringify({ error: 'serialize-failed' });
  }
}
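The wrapping pattern in `wrapToolsWithLogger` can be exercised without the file system or the AI SDK. Below is a minimal sketch under stated assumptions: `SketchTool`, `SketchLogEntry`, and `wrapWithSink` are hypothetical stand-ins (the real `KtxRuntimeToolSet` type is not shown in this diff), and an in-memory callback replaces the JSONL append. It shows the two properties the JSDoc describes: tools without `execute` pass through unchanged, and a thrown error is recorded and then rethrown.

```typescript
// Minimal stand-in types; not the real KtxRuntimeToolSet.
type SketchExecute = (input: unknown) => Promise<unknown>;

interface SketchTool {
  description: string;
  execute?: SketchExecute;
}

interface SketchLogEntry {
  toolName: string;
  durationMs: number;
  input: unknown;
  output?: unknown;
  error?: { message: string };
}

// Wrap each tool so every call reports one entry to `sink`, whether the call
// resolves or throws. Tools without `execute` pass through unchanged.
function wrapWithSink(
  tools: Record<string, SketchTool>,
  sink: (entry: SketchLogEntry) => void,
): Record<string, SketchTool> {
  const wrapped: Record<string, SketchTool> = {};
  for (const [toolName, original] of Object.entries(tools)) {
    const originalExecute = original.execute;
    if (typeof originalExecute !== 'function') {
      wrapped[toolName] = original;
      continue;
    }
    const execute: SketchExecute = async (input) => {
      const start = Date.now();
      try {
        const output = await originalExecute(input);
        sink({ toolName, durationMs: Date.now() - start, input, output });
        return output;
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        sink({ toolName, durationMs: Date.now() - start, input, error: { message } });
        throw err; // logging must not swallow the tool's failure
      }
    };
    wrapped[toolName] = { ...original, execute };
  }
  return wrapped;
}
```

A sink that pushes into an array is enough to assert on call order in a unit test. The file-backed version above adds only the fire-and-forget append.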
@ -0,0 +1,218 @@
import { describe, expect, it } from 'vitest';
import type { ToolCallLogEntry } from './tool-call-logger.js';
import { createMutableToolTranscriptSummary, recordToolTranscriptEntry } from './tool-transcript-summary.js';

function entry(overrides: Partial<ToolCallLogEntry>): ToolCallLogEntry {
  return {
    ts: '2026-05-11T00:00:00.000Z',
    wuKey: 'wu-1',
    toolName: 'wiki_write',
    durationMs: 1,
    input: {},
    ...overrides,
  };
}

describe('tool transcript summaries', () => {
  it('keeps recovered wiki_write structured failures out of fatal failures', () => {
    const summary = createMutableToolTranscriptSummary('wu-1', '/tmp/wu-1.jsonl');

    recordToolTranscriptEntry(
      summary,
      entry({
        input: { key: 'orbit-customers' },
        output: { structured: { success: false, key: 'orbit-customers' } },
      }),
    );
    recordToolTranscriptEntry(
      summary,
      entry({
        input: { key: 'orbit-customers' },
        output: { structured: { success: true, key: 'orbit-customers' } },
      }),
    );

    expect(summary.errorCount).toBe(1);
    expect(summary.fatalErrorCount).toBe(0);
  });

  it('treats a suggested flat wiki key retry as recovery for an invalid nested key', () => {
    const summary = createMutableToolTranscriptSummary('wu-1', '/tmp/wu-1.jsonl');

    recordToolTranscriptEntry(
      summary,
      entry({
        input: { key: 'historic-sql/top-accounts-by-contract-arr' },
        output: { structured: { success: false, key: 'historic-sql/top-accounts-by-contract-arr' } },
      }),
    );
    recordToolTranscriptEntry(
      summary,
      entry({
        input: { key: 'historic-sql-top-accounts-by-contract-arr' },
        output: { structured: { success: true, key: 'historic-sql-top-accounts-by-contract-arr' } },
      }),
    );

    expect(summary.errorCount).toBe(1);
    expect(summary.fatalErrorCount).toBe(0);
  });

  it('counts unrecovered wiki_remove structured failures as fatal transcript errors', () => {
    const summary = createMutableToolTranscriptSummary('reconcile', '/tmp/reconcile.jsonl');

    recordToolTranscriptEntry(summary, {
      ts: '2026-05-11T00:00:00.000Z',
      wuKey: 'reconcile',
      toolCallId: 'remove-1',
      toolName: 'wiki_remove',
      durationMs: 1,
      input: { key: 'duplicate-page' },
      output: { structured: { success: false, key: 'duplicate-page' } },
    });

    expect(summary.errorCount).toBe(1);
    expect(summary.fatalErrorCount).toBe(1);
  });

  it('keeps unrecovered structured write failures fatal', () => {
    const summary = createMutableToolTranscriptSummary('wu-1', '/tmp/wu-1.jsonl');

    recordToolTranscriptEntry(
      summary,
      entry({
        input: { key: 'orbit-customers' },
        output: { structured: { success: false, key: 'orbit-customers' } },
      }),
    );

    expect(summary.errorCount).toBe(1);
    expect(summary.fatalErrorCount).toBe(1);
  });

  it('treats a later sl_edit_source success as recovery for the same SL source', () => {
    const summary = createMutableToolTranscriptSummary('wu-1', '/tmp/wu-1.jsonl');

    recordToolTranscriptEntry(
      summary,
      entry({
        toolName: 'sl_write_source',
        input: { connectionId: 'warehouse', sourceName: 'orbit_customers' },
        output: { structured: { success: false, sourceName: 'orbit_customers' } },
      }),
    );
    recordToolTranscriptEntry(
      summary,
      entry({
        toolName: 'sl_edit_source',
        input: { connectionId: 'warehouse', sourceName: 'orbit_customers' },
        output: { structured: { success: true, sourceName: 'orbit_customers' } },
      }),
    );

    expect(summary.errorCount).toBe(1);
    expect(summary.fatalErrorCount).toBe(0);
  });

  it('treats explicit unmapped fallback as recovery for guarded SL write failures', () => {
    const summary = createMutableToolTranscriptSummary('wu-1', '/tmp/wu-1.jsonl');

    recordToolTranscriptEntry(
      summary,
      entry({
        toolName: 'sl_write_source',
        input: { connectionId: 'dbt-main', sourceName: 'stg_accounts' },
        output: { structured: { success: false, sourceName: 'stg_accounts' } },
      }),
    );
    recordToolTranscriptEntry(
      summary,
      entry({
        toolName: 'emit_unmapped_fallback',
        input: { rawPath: 'models/schema.yml', reason: 'no_physical_table', tableRef: 'stg_accounts', fallback: 'wiki_only' },
        output: 'recorded unmapped fallback for models/schema.yml (wiki_only)',
      }),
    );

    expect(summary.errorCount).toBe(1);
    expect(summary.fatalErrorCount).toBe(0);
  });

  it('treats an untargeted unmapped fallback as recovery when there is only one pending SL failure', () => {
    const summary = createMutableToolTranscriptSummary('wu-1', '/tmp/wu-1.jsonl');

    recordToolTranscriptEntry(
      summary,
      entry({
        toolName: 'sl_write_source',
        input: { connectionId: 'dbt-main', sourceName: 'stg_accounts' },
        output: { structured: { success: false, sourceName: 'stg_accounts' } },
      }),
    );
    recordToolTranscriptEntry(
      summary,
      entry({
        toolName: 'emit_unmapped_fallback',
        input: { rawPath: 'models/schema.yml', reason: 'no_physical_table', fallback: 'wiki_only' },
        output: 'recorded unmapped fallback for models/schema.yml (wiki_only)',
      }),
    );

    expect(summary.errorCount).toBe(1);
    expect(summary.fatalErrorCount).toBe(0);
  });

  it('keeps unrelated SL write failures fatal when one source gets an unmapped fallback', () => {
    const summary = createMutableToolTranscriptSummary('wu-1', '/tmp/wu-1.jsonl');

    recordToolTranscriptEntry(
      summary,
      entry({
        toolName: 'sl_write_source',
        input: { connectionId: 'dbt-main', sourceName: 'stg_accounts' },
        output: { structured: { success: false, sourceName: 'stg_accounts' } },
      }),
    );
    recordToolTranscriptEntry(
      summary,
      entry({
        toolName: 'sl_write_source',
        input: { connectionId: 'dbt-main', sourceName: 'stg_orders' },
        output: { structured: { success: false, sourceName: 'stg_orders' } },
      }),
    );
    recordToolTranscriptEntry(
      summary,
      entry({
        toolName: 'emit_unmapped_fallback',
        input: { rawPath: 'models/schema.yml', reason: 'no_physical_table', tableRef: 'stg_accounts', fallback: 'wiki_only' },
        output: 'recorded unmapped fallback for models/schema.yml (wiki_only)',
      }),
    );

    expect(summary.errorCount).toBe(2);
    expect(summary.fatalErrorCount).toBe(1);
  });

  it('keeps thrown tool errors fatal even after a successful write', () => {
    const summary = createMutableToolTranscriptSummary('wu-1', '/tmp/wu-1.jsonl');

    recordToolTranscriptEntry(
      summary,
      entry({
        input: { key: 'orbit-customers' },
        error: { message: 'tool crashed' },
      }),
    );
    recordToolTranscriptEntry(
      summary,
      entry({
        input: { key: 'orbit-customers' },
        output: { structured: { success: true, key: 'orbit-customers' } },
      }),
    );

    expect(summary.errorCount).toBe(1);
    expect(summary.fatalErrorCount).toBe(1);
  });
});
189
packages/cli/src/context/ingest/tools/tool-transcript-summary.ts
Normal file
@ -0,0 +1,189 @@
import type { ToolCallLogEntry } from './tool-call-logger.js';
import { isFlatWikiKey, suggestFlatWikiKey } from '../../wiki/keys.js';

export interface MutableToolTranscriptSummary {
  unitKey: string;
  path: string;
  toolCallCount: number;
  errorCount: number;
  fatalErrorCount: number;
  toolNames: Set<string>;
  hardErrorCount: number;
  recoverableFailureCounts: Map<string, number>;
}

export function createMutableToolTranscriptSummary(unitKey: string, path: string): MutableToolTranscriptSummary {
  return {
    unitKey,
    path,
    toolCallCount: 0,
    errorCount: 0,
    fatalErrorCount: 0,
    toolNames: new Set<string>(),
    hardErrorCount: 0,
    recoverableFailureCounts: new Map<string, number>(),
  };
}

export function recordToolTranscriptEntry(summary: MutableToolTranscriptSummary, entry: ToolCallLogEntry): void {
  summary.toolCallCount += 1;
  summary.toolNames.add(entry.toolName);

  if (entry.error) {
    summary.errorCount += 1;
    summary.hardErrorCount += 1;
    refreshFatalErrorCount(summary);
    return;
  }

  const recoverableFailureKey = recoverableStructuredFailureKey(entry);
  if (recoverableFailureKey) {
    summary.errorCount += 1;
    summary.recoverableFailureCounts.set(
      recoverableFailureKey,
      (summary.recoverableFailureCounts.get(recoverableFailureKey) ?? 0) + 1,
    );
    refreshFatalErrorCount(summary);
    return;
  }

  const recoveryKey = recoverableStructuredSuccessKey(entry);
  if (recoveryKey) {
    summary.recoverableFailureCounts.delete(recoveryKey);
  }
  if (entry.toolName === 'emit_unmapped_fallback') {
    const fallbackTarget = fallbackSlTargetKey(entry);
    const pendingSlKeys = [...summary.recoverableFailureCounts.keys()].filter((key) => key.startsWith('sl:'));
    for (const key of pendingSlKeys) {
      if (
        (fallbackTarget && slFailureKeyMatchesFallback(key, fallbackTarget)) ||
        (!fallbackTarget && pendingSlKeys.length === 1)
      ) {
        summary.recoverableFailureCounts.delete(key);
      }
    }
  }
  refreshFatalErrorCount(summary);
}

function refreshFatalErrorCount(summary: MutableToolTranscriptSummary): void {
  summary.fatalErrorCount =
    summary.hardErrorCount + [...summary.recoverableFailureCounts.values()].reduce((sum, count) => sum + count, 0);
}

function recoverableStructuredFailureKey(entry: ToolCallLogEntry): string | null {
  if (!isStructuredToolFailure(entry.output)) {
    return null;
  }
  if (entry.toolName === 'wiki_write' || entry.toolName === 'wiki_remove') {
    return wikiTargetKey(entry);
  }
  if (entry.toolName === 'sl_write_source') {
    return slTargetKey(entry);
  }
  return null;
}

function recoverableStructuredSuccessKey(entry: ToolCallLogEntry): string | null {
  if (!isStructuredToolSuccess(entry.output)) {
    return null;
  }
  if (entry.toolName === 'wiki_write' || entry.toolName === 'wiki_remove') {
    return wikiTargetKey(entry);
  }
  if (entry.toolName === 'sl_write_source' || entry.toolName === 'sl_edit_source') {
    return slTargetKey(entry);
  }
  return null;
}

function isStructuredToolFailure(output: unknown): boolean {
  return structuredSuccess(output) === false;
}

function isStructuredToolSuccess(output: unknown): boolean {
  return structuredSuccess(output) === true;
}

function structuredSuccess(output: unknown): boolean | null {
  const structured = recordField(output, 'structured');
  const success = structured?.success;
  return typeof success === 'boolean' ? success : null;
}

function wikiTargetKey(entry: ToolCallLogEntry): string | null {
  const key = stringField(recordField(entry.output, 'structured'), 'key') ?? stringField(entry.input, 'key');
  if (!key) {
    return null;
  }
  return `wiki:${isFlatWikiKey(key) ? key : suggestFlatWikiKey(key)}`;
}

function slTargetKey(entry: ToolCallLogEntry): string | null {
  const structured = recordField(entry.output, 'structured');
  const sourceName = stringField(structured, 'sourceName') ?? stringField(entry.input, 'sourceName');
  if (!sourceName) {
    return null;
  }
  const connectionId = stringField(entry.input, 'connectionId') ?? '';
  return `sl:${connectionId}:${sourceName}`;
}

function fallbackSlTargetKey(entry: ToolCallLogEntry): { connectionId?: string; sourceName: string } | null {
  const tableRef = stringField(entry.input, 'tableRef');
  if (!tableRef) {
    return null;
  }
  const sourceName = finalReferenceSegment(tableRef);
  if (!sourceName) {
    return null;
  }
  const connectionId = stringField(entry.input, 'connectionId');
  return {
    sourceName,
    ...(connectionId ? { connectionId } : {}),
  };
}

function slFailureKeyMatchesFallback(
  failureKey: string,
  fallback: { connectionId?: string; sourceName: string },
): boolean {
  const match = /^sl:([^:]*):(.*)$/.exec(failureKey);
  if (!match) {
    return false;
  }
  const [, connectionId, sourceName] = match;
  if (fallback.connectionId && connectionId !== fallback.connectionId) {
    return false;
  }
  return normalizeReferenceSegment(sourceName ?? '') === normalizeReferenceSegment(fallback.sourceName);
}

function finalReferenceSegment(value: string): string {
  const normalized = value
    .trim()
    .replace(/["`]/g, '')
    .replace(/[\[\]]/g, '');
  return normalized.split('.').filter(Boolean).at(-1) ?? '';
}

function normalizeReferenceSegment(value: string): string {
  return finalReferenceSegment(value).toLowerCase();
}

function recordField(value: unknown, field: string): Record<string, unknown> | null {
  if (!value || typeof value !== 'object' || Array.isArray(value)) {
    return null;
  }
  const nested = (value as Record<string, unknown>)[field];
  return nested && typeof nested === 'object' && !Array.isArray(nested) ? (nested as Record<string, unknown>) : null;
}

function stringField(value: unknown, field: string): string | null {
  if (!value || typeof value !== 'object' || Array.isArray(value)) {
    return null;
  }
  const raw = (value as Record<string, unknown>)[field];
  return typeof raw === 'string' && raw.length > 0 ? raw : null;
}
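A reduced model may help when reading the accounting in `tool-transcript-summary.ts`: a thrown error always counts as fatal, while a structured failure counts as fatal only until the same target later succeeds. The sketch below is illustrative only. `MiniEntry`, `MiniSummary`, `record`, and `fatalErrorCount` are hypothetical names, the target key is passed in directly rather than derived from tool input and output, and the `emit_unmapped_fallback` clearing path is omitted.

```typescript
// Hypothetical stand-ins; the real code works on ToolCallLogEntry values.
interface MiniEntry {
  target: string; // e.g. "wiki:orbit-customers" or "sl:dbt-main:stg_orders"
  outcome: 'threw' | 'failed' | 'succeeded';
}

interface MiniSummary {
  errorCount: number;
  hardErrorCount: number;
  pending: Map<string, number>; // structured failures not yet recovered, per target
}

function createMiniSummary(): MiniSummary {
  return { errorCount: 0, hardErrorCount: 0, pending: new Map<string, number>() };
}

function record(summary: MiniSummary, entry: MiniEntry): void {
  if (entry.outcome === 'threw') {
    // Thrown errors are never cleared by a later success.
    summary.errorCount += 1;
    summary.hardErrorCount += 1;
    return;
  }
  if (entry.outcome === 'failed') {
    // Structured failures are counted per target so a retry can clear them.
    summary.errorCount += 1;
    summary.pending.set(entry.target, (summary.pending.get(entry.target) ?? 0) + 1);
    return;
  }
  // A success on the same target clears its pending failures.
  summary.pending.delete(entry.target);
}

function fatalErrorCount(summary: MiniSummary): number {
  let pendingTotal = 0;
  summary.pending.forEach((count) => {
    pendingTotal += count;
  });
  return summary.hardErrorCount + pendingTotal;
}

const demoSummary = createMiniSummary();
record(demoSummary, { target: 'sl:dbt-main:stg_orders', outcome: 'failed' });
record(demoSummary, { target: 'sl:dbt-main:stg_orders', outcome: 'succeeded' });
record(demoSummary, { target: 'wiki:orbit-customers', outcome: 'threw' });
```

In this run `errorCount` ends at 2 (every failure is still reported), while the fatal count is 1, because only the thrown error remains unrecovered.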
@@ -0,0 +1,95 @@
import { z } from 'zod';
import type { KtxRuntimeToolDescriptor, KtxRuntimeToolSet } from '../../../context/llm/runtime-port.js';

const verificationLedgerInputSchema = z.object({
  summary: z.string().min(1).max(2000),
  verifiedIdentifiers: z.array(z.string().min(1)).max(100).default([]),
  unverifiedIdentifiers: z.array(z.string().min(1)).max(100).default([]),
  notes: z.string().max(2000).optional(),
});

interface VerificationLedgerEntry {
  summary: string;
  verifiedIdentifiers: string[];
  unverifiedIdentifiers: string[];
  notes?: string;
}

export interface VerificationLedgerState {
  entries: VerificationLedgerEntry[];
}

const WRITE_TOOL_NAMES = new Set([
  'wiki_write',
  'wiki_remove',
  'sl_write_source',
  'sl_edit_source',
  'emit_unmapped_fallback',
]);

export const VERIFICATION_LEDGER_PROMPT = `<pre_write_verification>
Before any durable wiki, semantic-layer, or unmapped-fallback write (wiki_write, wiki_remove, sl_write_source, sl_edit_source, emit_unmapped_fallback), call record_verification_ledger.
The ledger is a model-authored checkpoint, not a deterministic parser gate. Summarize the verification protocol from the loaded skill, list identifiers verified with discover_data/entity_details/sql_execution, and list anything intentionally left unverified. If the write contains no warehouse identifiers, say that explicitly.
If a write tool returns verification_ledger_required, complete the ledger and retry the write.
</pre_write_verification>`;

export function createVerificationLedgerState(): VerificationLedgerState {
  return { entries: [] };
}

export function withVerificationLedger(tools: KtxRuntimeToolSet, state: VerificationLedgerState): KtxRuntimeToolSet {
  const wrapped: KtxRuntimeToolSet = {};
  for (const [name, original] of Object.entries(tools)) {
    if (!WRITE_TOOL_NAMES.has(name) || typeof original.execute !== 'function') {
      wrapped[name] = original;
      continue;
    }
    const originalExecute = original.execute;
    const guardedExecute = async (input: unknown) => {
      if (state.entries.length === 0) {
        return verificationRequiredOutput(name);
      }
      return originalExecute(input);
    };
    wrapped[name] = { ...original, execute: guardedExecute };
  }
  wrapped.record_verification_ledger = createRecordVerificationLedgerTool(state);
  return wrapped;
}

function createRecordVerificationLedgerTool(state: VerificationLedgerState): KtxRuntimeToolDescriptor {
  return {
    name: 'record_verification_ledger',
    description:
      'Record the pre-write verification ledger required by loaded ingest skills. Call this before wiki/SL/fallback writes to state what was verified, which tool calls support it, and what remains intentionally unverified.',
    inputSchema: verificationLedgerInputSchema,
    execute: async (input) => {
      const entry = verificationLedgerInputSchema.parse(input);
      state.entries.push(entry);
      return {
        markdown:
          `Verification ledger recorded. Summary: ${entry.summary}\n` +
          `Verified identifiers: ${entry.verifiedIdentifiers.length ? entry.verifiedIdentifiers.join(', ') : '(none)'}\n` +
          `Unverified identifiers: ${
            entry.unverifiedIdentifiers.length ? entry.unverifiedIdentifiers.join(', ') : '(none)'
          }`,
        structured: { success: true, entry },
      };
    },
  };
}

function verificationRequiredOutput(toolName: string) {
  return {
    markdown:
      `Pre-write verification required before calling ${toolName}. ` +
      'Call record_verification_ledger first. In the ledger, summarize the loaded skill protocol you followed, ' +
      'list identifiers verified via discover_data/entity_details/sql_execution, and list any identifiers intentionally left unverified. ' +
      'If the write contains no warehouse identifiers, say that explicitly in the ledger summary.',
    structured: {
      success: false,
      reason: 'verification_ledger_required',
      toolName,
    },
  };
}
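The guard in `withVerificationLedger` follows a common wrapper pattern: write tools are replaced by a function that refuses until at least one ledger entry exists, and every other tool passes through unchanged. The following is a minimal sketch of that pattern under simplifying assumptions. `MiniTool`, `MiniResult`, `MiniLedgerState`, and `guardWrites` are hypothetical names, and `execute` is synchronous here for brevity, whereas the real tools are async.

```typescript
// Hypothetical, synchronous stand-ins for the runtime tool types.
interface MiniResult {
  success: boolean;
  reason?: string;
}

interface MiniTool {
  execute?: (input: unknown) => MiniResult;
}

type MiniToolSet = Record<string, MiniTool>;

interface MiniLedgerState {
  entries: string[];
}

function guardWrites(tools: MiniToolSet, writeNames: ReadonlySet<string>, state: MiniLedgerState): MiniToolSet {
  const wrapped: MiniToolSet = {};
  for (const name of Object.keys(tools)) {
    const original = tools[name];
    if (!original) continue;
    const originalExecute = original.execute;
    if (!writeNames.has(name) || !originalExecute) {
      // Read-only tools (and tools without an executor) pass through untouched.
      wrapped[name] = original;
      continue;
    }
    wrapped[name] = {
      ...original,
      // The ledger is checked at call time, so recording an entry later unblocks the tool.
      execute: (input) =>
        state.entries.length === 0
          ? { success: false, reason: 'verification_ledger_required' }
          : originalExecute(input),
    };
  }
  return wrapped;
}

const ledger: MiniLedgerState = { entries: [] };
const guarded = guardWrites(
  {
    wiki_write: { execute: () => ({ success: true }) },
    wiki_search: { execute: () => ({ success: true }) },
  },
  new Set(['wiki_write']),
  ledger,
);

const blocked = guarded['wiki_write']?.execute?.({ key: 'orders' });
const readOnly = guarded['wiki_search']?.execute?.({ query: 'orders' });
ledger.entries.push('verified public.orders via entity_details');
const allowed = guarded['wiki_write']?.execute?.({ key: 'orders' });
```

Because the refusal is returned as a structured result rather than thrown, the caller sees `success: false` with a machine-readable reason and can retry after recording the ledger.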
@@ -0,0 +1,28 @@
import type { KtxFileStorePort } from '../../../core/file-store.js';
import type { SlConnectionCatalogPort } from '../../../sl/ports.js';
import { WarehouseCatalogService } from '../../../scan/warehouse-catalog.js';
import type { BaseTool, ToolContext } from '../../../tools/base-tool.js';
import { DiscoverDataTool } from './discover-data.tool.js';
import { EntityDetailsTool } from './entity-details.tool.js';
import { SqlExecutionTool } from './sql-execution.tool.js';

export function createWarehouseVerificationTools(deps: {
  connections: SlConnectionCatalogPort;
  fallbackFileStore: KtxFileStorePort;
  wikiSearchTool: BaseTool;
  slDiscoverTool: BaseTool;
}): BaseTool[] {
  const catalogFactory = (context: ToolContext) =>
    new WarehouseCatalogService({
      fileStore: context.session?.configService ?? deps.fallbackFileStore,
    });
  return [
    new EntityDetailsTool(catalogFactory),
    new SqlExecutionTool(deps.connections),
    new DiscoverDataTool({
      wikiSearchTool: deps.wikiSearchTool,
      slDiscoverTool: deps.slDiscoverTool,
      catalogFactory,
    }),
  ];
}
@@ -0,0 +1,131 @@
import { beforeEach, describe, expect, it, vi } from 'vitest';
import type { WarehouseCatalogService } from '../../../scan/warehouse-catalog.js';
import type { BaseTool, ToolContext } from '../../../../context/tools/base-tool.js';
import { DiscoverDataTool } from './discover-data.tool.js';

describe('DiscoverDataTool', () => {
  const wikiSearchTool = { call: vi.fn() } as unknown as BaseTool & { call: ReturnType<typeof vi.fn> };
  const slDiscoverTool = { call: vi.fn() } as unknown as BaseTool & { call: ReturnType<typeof vi.fn> };
  const catalog = { searchByName: vi.fn() } as unknown as WarehouseCatalogService & {
    searchByName: ReturnType<typeof vi.fn>;
  };
  const context: ToolContext = {
    sourceId: 'ingest',
    messageId: 'm1',
    userId: 'system',
    session: { allowedConnectionNames: new Set(['warehouse']) } as any,
  };
  const tool = new DiscoverDataTool({
    wikiSearchTool,
    slDiscoverTool,
    catalogFactory: () => catalog,
  });

  beforeEach(() => {
    wikiSearchTool.call.mockReset();
    slDiscoverTool.call.mockReset();
    catalog.searchByName.mockReset();
    wikiSearchTool.call.mockResolvedValue({
      markdown: '- orders wiki',
      structured: { totalFound: 1, results: [{ key: 'orders' }] },
    });
    slDiscoverTool.call.mockResolvedValue({
      markdown: '- orders source',
      structured: { totalSources: 1, sources: [{ sourceName: 'orders' }] },
    });
    catalog.searchByName.mockResolvedValue([
      {
        kind: 'table',
        connectionId: 'warehouse',
        ref: { catalog: null, db: 'public', name: 'orders' },
        display: 'public.orders',
        matchedOn: 'name',
      },
    ]);
  });

  it('groups wiki, semantic layer, and raw schema hits with routing hints', async () => {
    const result = await tool.call({ query: 'orders', connectionId: 'warehouse', limit: 5 }, context);

    expect(result.markdown).toContain('## Wiki Pages');
    expect(result.markdown).toContain('use `wiki_read(blockKey)` for full content');
    expect(result.markdown).toContain('## Semantic Layer Sources');
    expect(result.markdown).toContain('use `sl_read_source(sourceName)` for the YAML');
    expect(result.markdown).toContain('## Raw Warehouse Schema');
    expect(result.markdown).toContain('use `entity_details({connectionId, targets: [{display}]})`');
    expect(result.structured.raw?.hits).toHaveLength(1);
  });

  it('includes connectionId on raw schema hits so entity_details can follow up', async () => {
    const multiConnectionContext: ToolContext = {
      ...context,
      session: { allowedConnectionNames: new Set(['warehouse', 'analytics']) } as any,
    };
    catalog.searchByName.mockImplementation(async (connectionId: string, query: string) => [
      {
        kind: 'table',
        connectionId,
        ref: { catalog: null, db: 'public', name: `${connectionId}_${query}` },
        display: `public.${connectionId}_${query}`,
        matchedOn: 'name',
      },
    ]);

    const result = await tool.call({ query: 'orders', limit: 10 }, multiConnectionContext);

    expect(catalog.searchByName).toHaveBeenCalledWith('analytics', 'orders', 10);
    expect(catalog.searchByName).toHaveBeenCalledWith('warehouse', 'orders', 10);
    expect(result.markdown).toContain('connectionId=analytics');
    expect(result.markdown).toContain('connectionId=warehouse');
    expect(result.markdown).toContain(
      'entity_details({connectionId: "analytics", targets: [{display: "public.analytics_orders"}]})',
    );
    expect(result.structured.raw?.hits.map((hit) => hit.connectionId)).toEqual(['analytics', 'warehouse']);
  });

  it('refuses explicit out-of-scope connection names', async () => {
    const result = await tool.call({ query: 'orders', connectionId: 'billing' }, context);

    expect(result.markdown).toContain('Connection "billing" is not available to this ingest stage.');
    expect(result.structured).toEqual({ wiki: null, sl: null, raw: null });
    expect(wikiSearchTool.call).not.toHaveBeenCalled();
    expect(slDiscoverTool.call).not.toHaveBeenCalled();
    expect(catalog.searchByName).not.toHaveBeenCalled();
  });

  it('delegates sourceName inspect mode to sl_discover only', async () => {
    slDiscoverTool.call.mockResolvedValueOnce({
      markdown: 'source detail',
      structured: { sourceName: 'orders' },
    });

    const result = await tool.call({ sourceName: 'orders', connectionId: 'warehouse' }, context);

    expect(slDiscoverTool.call).toHaveBeenCalledWith({ sourceName: 'orders', connectionId: 'warehouse' }, context);
    expect(wikiSearchTool.call).not.toHaveBeenCalled();
    expect(catalog.searchByName).not.toHaveBeenCalled();
    expect(result.markdown).toContain('source detail');
  });

  it('returns the empty-state message when all sections are empty', async () => {
    wikiSearchTool.call.mockResolvedValueOnce({ markdown: '', structured: { totalFound: 0, results: [] } });
    slDiscoverTool.call.mockResolvedValueOnce({ markdown: '', structured: { totalSources: 0, sources: [] } });
    catalog.searchByName.mockResolvedValueOnce([]);

    const result = await tool.call({ query: 'customer source', connectionId: 'warehouse' }, context);

    expect(result.markdown).toContain('No matches for "customer source" across wiki, semantic layer, or raw warehouse schema.');
  });

  it('uses connectionId as the optional connection filter', () => {
    const legacyConnectionField = ['connection', 'Name'].join('');

    expect(tool.parseInput({ query: 'orders', connectionId: 'warehouse', limit: 5 })).toEqual({
      query: 'orders',
      connectionId: 'warehouse',
      limit: 5,
    });

    expect(() => tool.parseInput({ query: 'orders', [legacyConnectionField]: 'warehouse', limit: 5 })).toThrow();
  });
});
@@ -0,0 +1,142 @@
import { z } from 'zod';
import { WarehouseCatalogService, type RawSchemaHit } from '../../../scan/warehouse-catalog.js';
import { BaseTool, type ToolContext, type ToolOutput } from '../../../../context/tools/base-tool.js';

const discoverDataInputSchema = z.object({
  query: z.string().optional(),
  connectionId: z.string().regex(/^[a-zA-Z0-9][a-zA-Z0-9_-]*$/).optional(),
  limit: z.number().int().positive().max(50).optional().default(10),
  sourceName: z.string().optional(),
}).strict();

type DiscoverDataInput = z.input<typeof discoverDataInputSchema>;

export interface DiscoverDataStructured {
  wiki: unknown | null;
  sl: unknown | null;
  raw: { hits: RawSchemaHit[] } | null;
}

interface DiscoverDataDeps {
  wikiSearchTool: BaseTool;
  slDiscoverTool: BaseTool;
  catalogFactory: (context: ToolContext) => WarehouseCatalogService;
}

function totalFound(structured: unknown): number {
  return typeof structured === 'object' &&
    structured !== null &&
    'totalFound' in structured &&
    typeof structured.totalFound === 'number'
    ? structured.totalFound
    : 0;
}

function totalSources(structured: unknown): number {
  return typeof structured === 'object' &&
    structured !== null &&
    'totalSources' in structured &&
    typeof structured.totalSources === 'number'
    ? structured.totalSources
    : 0;
}

function allowedConnectionNames(context: ToolContext): ReadonlySet<string> | null {
  return context.session?.allowedConnectionNames ?? null;
}

export class DiscoverDataTool extends BaseTool<typeof discoverDataInputSchema> {
  readonly name = 'discover_data';

  constructor(private readonly deps: DiscoverDataDeps) {
    super();
  }

  get description(): string {
    return 'Discover existing wiki pages, semantic layer sources, and raw warehouse schema hits before writing ingest output.';
  }

  get inputSchema() {
    return discoverDataInputSchema;
  }

  async call(input: DiscoverDataInput, context: ToolContext): Promise<ToolOutput<DiscoverDataStructured>> {
    const allowed = allowedConnectionNames(context);
    if (input.connectionId && allowed && !allowed.has(input.connectionId)) {
      return {
        markdown: `Connection "${input.connectionId}" is not available to this ingest stage.`,
        structured: { wiki: null, sl: null, raw: null },
      };
    }

    if (input.sourceName) {
      const sl = await this.deps.slDiscoverTool.call(
        { sourceName: input.sourceName, connectionId: input.connectionId },
        context,
      );
      return { markdown: sl.markdown, structured: { wiki: null, sl: sl.structured, raw: null } };
    }

    const query = input.query?.trim() || '';
    const limit = input.limit ?? 10;
    const parts: string[] = [];
    let wiki: unknown | null = null;
    let sl: unknown | null = null;
    let raw: DiscoverDataStructured['raw'] = null;

    if (query) {
      const wikiResult = await this.deps.wikiSearchTool.call({ query, limit }, context);
      if (totalFound(wikiResult.structured) > 0) {
        parts.push('## Wiki Pages', '> use `wiki_read(blockKey)` for full content', wikiResult.markdown, '');
        wiki = wikiResult.structured;
      }
    }

    const slResult = await this.deps.slDiscoverTool.call(
      { query: query || undefined, connectionId: input.connectionId },
      context,
    );
    if (totalSources(slResult.structured) > 0) {
      parts.push(
        '## Semantic Layer Sources',
        '> use `sl_read_source(sourceName)` for the YAML, or `entity_details` for warehouse-shape details',
        slResult.markdown,
        '',
      );
      sl = slResult.structured;
    }

    const catalog = this.deps.catalogFactory(context);
    const connections = input.connectionId ? [input.connectionId] : [...(allowed ?? [])].sort();
    const rawHits: RawSchemaHit[] = [];
    for (const connectionId of connections) {
      rawHits.push(...(await catalog.searchByName(connectionId, query, limit)));
    }
    if (rawHits.length > 0) {
      parts.push(
        '## Raw Warehouse Schema',
        '> use `entity_details({connectionId, targets: [{display}]})` for full DDL + sample values',
      );
      parts.push(
        rawHits
          .slice(0, limit)
          .map(
            (hit) =>
              `- ${hit.kind}: ${hit.display} [connectionId=${hit.connectionId}] (matched on ${hit.matchedOn}) - ` +
              `follow up with \`entity_details({connectionId: "${hit.connectionId}", targets: [{display: "${hit.display}"}]})\``,
          )
          .join('\n'),
      );
      raw = { hits: rawHits.slice(0, limit) };
    }

    if (parts.length === 0) {
      return {
        markdown: `No matches for "${query}" across wiki, semantic layer, or raw warehouse schema. Try broader terms; this concept may not exist yet.`,
        structured: { wiki, sl, raw },
      };
    }

    return { markdown: parts.join('\n'), structured: { wiki, sl, raw } };
  }
}
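The connection scoping in `DiscoverDataTool.call` can be read as a small pure function: an explicit `connectionId` outside the session allow-list is refused, an explicit in-scope id is searched alone, and with no explicit id every allowed connection is searched in sorted order. The sketch below isolates that rule. `ConnectionScope` and `resolveConnectionScope` are hypothetical names introduced for illustration and do not appear in the diff.

```typescript
// Hypothetical helper mirroring the scoping rule; not part of the commit.
interface ConnectionScope {
  refused: boolean;
  connections: string[];
}

function resolveConnectionScope(requested: string | undefined, allowed: ReadonlySet<string> | null): ConnectionScope {
  // An explicit request outside the allow-list is refused outright.
  if (requested && allowed && !allowed.has(requested)) {
    return { refused: true, connections: [] };
  }
  // Otherwise search the requested connection, or every allowed one in sorted order.
  const connections: string[] = requested ? [requested] : allowed ? Array.from(allowed).sort() : [];
  return { refused: false, connections };
}

const scoped = resolveConnectionScope(undefined, new Set(['warehouse', 'analytics']));
const refused = resolveConnectionScope('billing', new Set(['warehouse']));
const unrestricted = resolveConnectionScope('warehouse', null);
```

One consequence worth noting: when the session carries no allow-list and the caller passes no `connectionId`, the connection list is empty, so the raw-schema search is skipped entirely.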
@@ -0,0 +1,213 @@
import { mkdtemp, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { initKtxProject, type KtxLocalProject } from '../../../../context/project/project.js';
import { WarehouseCatalogService } from '../../../scan/warehouse-catalog.js';
import type { ToolContext } from '../../../../context/tools/base-tool.js';
import { EntityDetailsTool } from './entity-details.tool.js';

describe('EntityDetailsTool', () => {
  let tempDir: string;
  let project: KtxLocalProject;
  let tool: EntityDetailsTool;
  let context: ToolContext;

  beforeEach(async () => {
    tempDir = await mkdtemp(join(tmpdir(), 'ktx-entity-details-'));
    project = await initKtxProject({ projectDir: join(tempDir, 'project') });
    await seedLiveDatabaseScan();
    tool = new EntityDetailsTool(() => new WarehouseCatalogService({ fileStore: project.fileStore }));
    context = {
      sourceId: 'ingest',
      messageId: 'm1',
      userId: 'system',
      session: {
        allowedConnectionNames: new Set(['warehouse']),
      } as any,
    };
  });

  afterEach(async () => {
    await rm(tempDir, { recursive: true, force: true });
  });

  async function seedLiveDatabaseScan(connectionId = 'warehouse', syncId = 'sync-1') {
    const root = `raw-sources/${connectionId}/live-database/${syncId}`;
    await project.fileStore.writeFile(
      `${root}/connection.json`,
      JSON.stringify({ connectionId, driver: 'postgres', extractedAt: '2026-05-12T00:00:00.000Z' }, null, 2),
      'ktx',
      'ktx@example.com',
      'seed connection',
    );
    await project.fileStore.writeFile(
      `${root}/tables/orders.json`,
      JSON.stringify(
        {
          catalog: null,
          db: 'public',
          name: 'orders',
          kind: 'table',
          comment: 'Customer orders',
          estimatedRows: 12,
          columns: [
            {
              name: 'id',
              nativeType: 'integer',
              normalizedType: 'integer',
              dimensionType: 'number',
              nullable: false,
              primaryKey: true,
              comment: 'Order id',
            },
            {
              name: 'status',
              nativeType: 'text',
              normalizedType: 'text',
              dimensionType: 'string',
              nullable: false,
              primaryKey: false,
              comment: 'Order status',
            },
          ],
          foreignKeys: [],
        },
        null,
        2,
      ),
      'ktx',
      'ktx@example.com',
      'seed orders',
    );
    await project.fileStore.writeFile(
      `${root}/enrichment/relationship-profile.json`,
      JSON.stringify(
        {
          connectionId,
          driver: 'postgres',
          tables: [{ table: { catalog: null, db: 'public', name: 'orders' }, rowCount: 12 }],
          columns: {
            'orders.status': {
              table: { catalog: null, db: 'public', name: 'orders' },
              column: 'status',
              rowCount: 12,
              nullCount: 0,
              distinctCount: 2,
              nullRate: 0,
              sampleValues: ['paid', 'refunded'],
            },
          },
        },
        null,
        2,
      ),
      'ktx',
      'ktx@example.com',
      'seed profile',
    );
  }

  it('returns scoped table detail for a display target', async () => {
    const result = await tool.call({ connectionId: 'warehouse', targets: [{ display: 'public.orders' }] }, context);

    expect(result.markdown).toContain('### public.orders');
    expect(result.markdown).toContain('- status (text, nullable=false)');
    expect(result.markdown).toContain('sample: ["paid","refunded"]');
    expect(result.structured.scanAvailable).toBe(true);
    expect(result.structured.resolved).toHaveLength(1);
  });

  it('resolves display targets that include a column name', async () => {
    const result = await tool.call(
      { connectionId: 'warehouse', targets: [{ display: 'public.orders.status' }] },
      context,
    );

    expect(result.markdown).toContain('### public.orders');
    expect(result.markdown).toContain('- status (text, nullable=false)');
    expect(result.markdown).not.toContain('- id (integer');
    expect(result.structured.resolved).toHaveLength(1);
    expect(result.structured.resolved[0]?.columns.map((column) => column.name)).toEqual(['status']);
  });

  it('reports missing explicit columns instead of returning an empty column list', async () => {
    const result = await tool.call(
      { connectionId: 'warehouse', targets: [{ display: 'public.orders.plan_tier' }] },
      context,
    );

    expect(result.markdown).toContain('Column not found in scan: public.orders.plan_tier');
    expect(result.markdown).toContain('Available columns: id, status');
    expect(result.structured.resolved).toHaveLength(0);
    expect(result.structured.missing).toHaveLength(1);
  });

  it('reports missing structured table targets in model-visible markdown', async () => {
    const result = await tool.call(
      {
        connectionId: 'warehouse',
        targets: [{ catalog: null, db: 'public', name: 'orderz' }],
      },
      context,
    );

    expect(result.markdown).toContain('Not found in scan: public.orderz');
    expect(result.markdown).toContain('Closest matches: orders');
    expect(result.structured.resolved).toHaveLength(0);
    expect(result.structured.missing).toHaveLength(1);
  });

  it('reports missing structured column targets in model-visible markdown', async () => {
    const result = await tool.call(
      {
        connectionId: 'warehouse',
        targets: [{ catalog: null, db: 'public', name: 'orders', column: 'plan_tier' }],
      },
      context,
    );

    expect(result.markdown).toContain('Column not found in scan: public.orders.plan_tier');
    expect(result.markdown).toContain('Available columns: id, status');
    expect(result.structured.resolved).toHaveLength(0);
    expect(result.structured.missing).toHaveLength(1);
  });

  it('returns a no-scan state distinct from not found', async () => {
    const result = await tool.call(
      { connectionId: 'empty', targets: [{ display: 'public.orders' }] },
      { ...context, session: { ...context.session!, allowedConnectionNames: new Set(['empty']) } },
    );

    expect(result.markdown).toContain('No live-database scan available for connection "empty"; run `ktx scan` first.');
    expect(result.structured.scanAvailable).toBe(false);
  });

  it('refuses out-of-scope connections', async () => {
    const result = await tool.call({ connectionId: 'billing', targets: [{ display: 'public.orders' }] }, context);

    expect(result.markdown).toContain('Connection "billing" is not available to this ingest stage.');
    expect(result.structured.scanAvailable).toBe(false);
  });

  it('uses connectionId as the public input field', async () => {
    const legacyConnectionField = ['connection', 'Name'].join('');

    expect(
      tool.parseInput({
        connectionId: 'warehouse',
        targets: [{ display: 'public.orders' }],
      }),
    ).toEqual({
      connectionId: 'warehouse',
      targets: [{ display: 'public.orders' }],
    });

    expect(() =>
      tool.parseInput({
        [legacyConnectionField]: 'warehouse',
        targets: [{ display: 'public.orders' }],
      }),
    ).toThrow();
  });
});
@@ -0,0 +1,170 @@
import { z } from 'zod';
import type { KtxTableRef } from '../../../scan/types.js';
import { WarehouseCatalogService, type TableDetail } from '../../../scan/warehouse-catalog.js';
import { BaseTool, type ToolContext, type ToolOutput } from '../../../../context/tools/base-tool.js';

const targetSchema = z.union([
  z.object({ display: z.string().min(1) }),
  z.object({
    catalog: z.string().nullable(),
    db: z.string().nullable(),
    name: z.string().min(1),
    column: z.string().optional(),
  }),
]);

const entityDetailsInputSchema = z.object({
  connectionId: z.string().regex(/^[a-zA-Z0-9][a-zA-Z0-9_-]*$/),
  targets: z.array(targetSchema).min(1).max(50),
}).strict();

type EntityDetailsInput = z.infer<typeof entityDetailsInputSchema>;
type EntityDetailsTarget = EntityDetailsInput['targets'][number];

export interface EntityDetailsStructured {
  resolved: TableDetail[];
  missing: Array<{ target: unknown; candidates: KtxTableRef[] }>;
  scanAvailable: boolean;
}

function allowedConnectionNames(context: ToolContext): ReadonlySet<string> | null {
  return context.session?.allowedConnectionNames ?? null;
}

function targetLabel(target: EntityDetailsTarget): string {
  if ('display' in target) {
    return target.display;
  }
  return [target.catalog, target.db, target.name, target.column].filter((part): part is string => !!part).join('.');
}

function appendMissingTargetMarkdown(parts: string[], target: EntityDetailsTarget, candidates: KtxTableRef[]): void {
  parts.push(`Not found in scan: ${targetLabel(target)}`);
  if (candidates.length > 0) {
    parts.push(`Closest matches: ${candidates.map((candidate) => candidate.name).join(', ')}`);
  }
}

async function resolveTarget(
  catalog: WarehouseCatalogService,
  connectionId: string,
  target: EntityDetailsTarget,
): Promise<{ resolved: (KtxTableRef & { column?: string }) | null; candidates: KtxTableRef[] }> {
if ('display' in target) {
|
||||
return catalog.resolveDisplayTarget(connectionId, target.display);
|
||||
}
|
||||
|
||||
const candidateResolution = await catalog.resolveDisplayTarget(connectionId, targetLabel(target));
|
||||
return {
|
||||
resolved: {
|
||||
catalog: target.catalog,
|
||||
db: target.db,
|
||||
name: target.name,
|
||||
column: target.column,
|
||||
},
|
||||
candidates: candidateResolution.candidates,
|
||||
};
|
||||
}
|
||||
|
||||
function sampleText(values: string[]): string {
|
||||
return values.length > 0 ? ` - sample: ${JSON.stringify(values.slice(0, 10))}` : '';
|
||||
}
|
||||
|
||||
function appendTableMarkdown(parts: string[], detail: TableDetail, columnName?: string): void {
|
||||
const columns = columnName ? detail.columns.filter((column) => column.name === columnName) : detail.columns;
|
||||
parts.push(`### ${detail.display}`);
|
||||
parts.push(`Type: ${detail.kind} | Native columns: ${detail.columns.length}`);
|
||||
if (detail.description || detail.comment) {
|
||||
parts.push(`Description: ${detail.description ?? detail.comment}`);
|
||||
}
|
||||
parts.push('', 'Columns:');
|
||||
for (const column of columns) {
|
||||
const pk = column.primaryKey ? ', PK' : '';
|
||||
parts.push(`- ${column.name} (${column.nativeType}, nullable=${column.nullable}${pk})${sampleText(column.sampleValues)}`);
|
||||
}
|
||||
parts.push('');
|
||||
}
|
||||
|
||||
function findColumn(detail: TableDetail, columnName: string): TableDetail['columns'][number] | null {
|
||||
const normalized = columnName.toLowerCase();
|
||||
return detail.columns.find((column) => column.name.toLowerCase() === normalized) ?? null;
|
||||
}
|
||||
|
||||
export class EntityDetailsTool extends BaseTool<typeof entityDetailsInputSchema> {
|
||||
readonly name = 'entity_details';
|
||||
|
||||
constructor(private readonly catalogFactory: (context: ToolContext) => WarehouseCatalogService) {
|
||||
super();
|
||||
}
|
||||
|
||||
get description(): string {
|
||||
return 'Verify warehouse tables and columns from the latest live-database scan before writing them into wiki or semantic-layer output.';
|
||||
}
|
||||
|
||||
get inputSchema() {
|
||||
return entityDetailsInputSchema;
|
||||
}
|
||||
|
||||
async call(input: EntityDetailsInput, context: ToolContext): Promise<ToolOutput<EntityDetailsStructured>> {
|
||||
const allowed = allowedConnectionNames(context);
|
||||
if (allowed && !allowed.has(input.connectionId)) {
|
||||
return {
|
||||
markdown: `Connection "${input.connectionId}" is not available to this ingest stage.`,
|
||||
structured: { resolved: [], missing: [], scanAvailable: false },
|
||||
};
|
||||
}
|
||||
|
||||
const catalog = this.catalogFactory(context);
|
||||
const scanAvailable = await catalog.hasScan(input.connectionId);
|
||||
if (!scanAvailable) {
|
||||
return {
|
||||
markdown: `No live-database scan available for connection "${input.connectionId}"; run \`ktx scan\` first.`,
|
||||
structured: { resolved: [], missing: [], scanAvailable: false },
|
||||
};
|
||||
}
|
||||
|
||||
const parts: string[] = [];
|
||||
const resolved: TableDetail[] = [];
|
||||
const missing: EntityDetailsStructured['missing'] = [];
|
||||
|
||||
for (const target of input.targets) {
|
||||
const resolution = await resolveTarget(catalog, input.connectionId, target);
|
||||
if (!resolution.resolved) {
|
||||
missing.push({ target, candidates: resolution.candidates });
|
||||
appendMissingTargetMarkdown(parts, target, resolution.candidates);
|
||||
continue;
|
||||
}
|
||||
const detail = await catalog.getTable({ connectionId: input.connectionId, ...resolution.resolved });
|
||||
if (!detail) {
|
||||
missing.push({ target, candidates: resolution.candidates });
|
||||
appendMissingTargetMarkdown(parts, target, resolution.candidates);
|
||||
continue;
|
||||
}
|
||||
const requestedColumn = resolution.resolved.column;
|
||||
if (requestedColumn) {
|
||||
const column = findColumn(detail, requestedColumn);
|
||||
if (!column) {
|
||||
missing.push({
|
||||
target,
|
||||
candidates: [{ catalog: detail.catalog, db: detail.db, name: detail.name }],
|
||||
});
|
||||
parts.push(`Column not found in scan: ${detail.display}.${requestedColumn}`);
|
||||
parts.push(`Available columns: ${detail.columns.map((candidate) => candidate.name).join(', ')}`);
|
||||
continue;
|
||||
}
|
||||
const scopedDetail = { ...detail, columns: [column] };
|
||||
resolved.push(scopedDetail);
|
||||
appendTableMarkdown(parts, scopedDetail, column.name);
|
||||
continue;
|
||||
}
|
||||
|
||||
resolved.push(detail);
|
||||
appendTableMarkdown(parts, detail);
|
||||
}
|
||||
|
||||
return {
|
||||
markdown: parts.join('\n').trim(),
|
||||
structured: { resolved, missing, scanAvailable: true },
|
||||
};
|
||||
}
|
||||
}
|
||||
|
|
@@ -0,0 +1,78 @@
import { describe, expect, it, vi } from 'vitest';
import type { SlConnectionCatalogPort } from '../../../../context/sl/ports.js';
import type { ToolContext } from '../../../../context/tools/base-tool.js';
import { SqlExecutionTool } from './sql-execution.tool.js';

describe('SqlExecutionTool', () => {
  const connections = {
    executeQuery: vi.fn(),
  } as unknown as SlConnectionCatalogPort & { executeQuery: ReturnType<typeof vi.fn> };
  const tool = new SqlExecutionTool(connections);
  const context: ToolContext = {
    sourceId: 'ingest',
    messageId: 'm1',
    userId: 'system',
    session: { allowedConnectionNames: new Set(['warehouse']) } as any,
  };

  it('wraps read-only SQL with a capped row limit', async () => {
    connections.executeQuery.mockResolvedValue({ headers: ['status'], rows: [['paid']], totalRows: 1 });

    const result = await tool.call(
      { connectionId: 'warehouse', sql: 'select status from public.orders', rowLimit: 5 },
      context,
    );

    expect(connections.executeQuery).toHaveBeenCalledWith(
      'warehouse',
      'select * from (select status from public.orders) as ktx_query_result limit 5',
    );
    expect(result.markdown).toContain('| status |');
    expect(result.structured.wrappedSql).toContain('limit 5');
  });

  it.each(['insert into x values (1)', 'drop table x', 'vacuum'])('rejects mutating SQL: %s', async (sql) => {
    connections.executeQuery.mockClear();

    const result = await tool.call({ connectionId: 'warehouse', sql }, context);

    expect(result.markdown).toContain('Only read-only SELECT/WITH queries can be executed locally.');
    expect(connections.executeQuery).not.toHaveBeenCalled();
  });

  it('surfaces connector errors verbatim', async () => {
    connections.executeQuery.mockRejectedValue(new Error('relation "orbit_analytics.customer" does not exist'));

    const result = await tool.call(
      { connectionId: 'warehouse', sql: 'select 1 from orbit_analytics.customer', rowLimit: 1 },
      context,
    );

    expect(result.markdown).toContain('relation "orbit_analytics.customer" does not exist');
    expect(result.structured.error).toContain('relation "orbit_analytics.customer" does not exist');
  });

  it('uses connectionId as the public input field', () => {
    const legacyConnectionField = ['connection', 'Name'].join('');

    expect(
      tool.parseInput({
        connectionId: 'warehouse',
        sql: 'select 1',
        rowLimit: 5,
      }),
    ).toEqual({
      connectionId: 'warehouse',
      sql: 'select 1',
      rowLimit: 5,
    });

    expect(() =>
      tool.parseInput({
        [legacyConnectionField]: 'warehouse',
        sql: 'select 1',
        rowLimit: 5,
      }),
    ).toThrow();
  });
});

@@ -0,0 +1,102 @@
import { z } from 'zod';
import { assertReadOnlySql, limitSqlForExecution } from '../../../../context/connections/read-only-sql.js';
import type { SlConnectionCatalogPort } from '../../../../context/sl/ports.js';
import { BaseTool, type ToolContext, type ToolOutput } from '../../../../context/tools/base-tool.js';

const sqlExecutionInputSchema = z.object({
  connectionId: z.string().regex(/^[a-zA-Z0-9][a-zA-Z0-9_-]*$/),
  sql: z.string().min(1),
  rowLimit: z.number().int().positive().max(1000).optional().default(100),
}).strict();

type SqlExecutionInput = z.input<typeof sqlExecutionInputSchema>;

export interface SqlExecutionStructured {
  headers: string[];
  rows: unknown[][];
  rowCount: number;
  truncated: boolean;
  sql: string;
  wrappedSql: string;
  error?: string;
}

function markdownTable(headers: string[], rows: unknown[][], totalRows: number): string {
  if (headers.length === 0) {
    return rows.length === 0 ? 'Query returned no rows.' : JSON.stringify(rows.slice(0, 20));
  }
  const visible = rows.slice(0, 20);
  const lines = [
    `| ${headers.join(' | ')} |`,
    `| ${headers.map(() => '---').join(' | ')} |`,
    ...visible.map((row) => `| ${row.map((value) => String(value ?? '')).join(' | ')} |`),
  ];
  if (totalRows > visible.length) {
    lines.push(`... +${totalRows - visible.length} more rows`);
  }
  return lines.join('\n');
}

export class SqlExecutionTool extends BaseTool<typeof sqlExecutionInputSchema> {
  readonly name = 'sql_execution';

  constructor(private readonly connections: SlConnectionCatalogPort) {
    super();
  }

  get description(): string {
    return 'Run a single read-only SELECT or WITH probe against an allowed warehouse connection and return a capped markdown table or the warehouse error.';
  }

  get inputSchema() {
    return sqlExecutionInputSchema;
  }

  async call(input: SqlExecutionInput, context: ToolContext): Promise<ToolOutput<SqlExecutionStructured>> {
    const allowed = context.session?.allowedConnectionNames;
    if (allowed && !allowed.has(input.connectionId)) {
      return {
        markdown: `Connection "${input.connectionId}" is not available to this ingest stage.`,
        structured: {
          headers: [],
          rows: [],
          rowCount: 0,
          truncated: false,
          sql: input.sql,
          wrappedSql: '',
          error: 'connection_not_allowed',
        },
      };
    }

    let sql: string;
    let wrappedSql: string;
    try {
      sql = assertReadOnlySql(input.sql);
      wrappedSql = limitSqlForExecution(sql, input.rowLimit);
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      return {
        markdown: message,
        structured: { headers: [], rows: [], rowCount: 0, truncated: false, sql: input.sql, wrappedSql: '', error: message },
      };
    }

    try {
      const result = await this.connections.executeQuery(input.connectionId, wrappedSql);
      const headers = result.headers ?? [];
      const rows = result.rows ?? [];
      const rowCount = result.totalRows ?? rows.length;
      return {
        markdown: markdownTable(headers, rows, rowCount),
        structured: { headers, rows, rowCount, truncated: rowCount > rows.length, sql, wrappedSql },
      };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      return {
        markdown: `SQL execution failed: ${message}`,
        structured: { headers: [], rows: [], rowCount: 0, truncated: false, sql, wrappedSql, error: message },
      };
    }
  }
}
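
The tool above delegates validation and row capping to `assertReadOnlySql` and `limitSqlForExecution`, whose implementations in `read-only-sql.js` are not part of this diff. The following is a minimal sketch of what such helpers might look like, assuming only what the test file shows: the wrapped query form (`select * from (...) as ktx_query_result limit N`) and the rejection message. The `...Sketch` function names, the prefix-based check, and the clamping rule are hypothetical and are not the repository's code.

```typescript
// Hypothetical stand-ins for assertReadOnlySql / limitSqlForExecution.
// Only the error text and the wrapped-query shape are taken from the tests in this diff.
const READ_ONLY_ERROR = 'Only read-only SELECT/WITH queries can be executed locally.';

function assertReadOnlySqlSketch(sql: string): string {
  // Strip surrounding whitespace and trailing semicolons so the text can be embedded as a subquery.
  const normalized = sql.trim().replace(/[;\s]+$/, '');
  // Assumption: a statement is treated as read-only when it starts with SELECT or WITH.
  // This prefix check does not detect data-modifying CTEs; a real guard likely needs more.
  if (!/^(select|with)\b/i.test(normalized)) {
    throw new Error(READ_ONLY_ERROR);
  }
  // Conservatively reject any remaining semicolon, which would indicate a second statement
  // (this also rejects semicolons inside string literals).
  if (normalized.includes(';')) {
    throw new Error(READ_ONLY_ERROR);
  }
  return normalized;
}

function limitSqlForExecutionSketch(sql: string, rowLimit: number = 100): string {
  // Clamp to the 1..1000 range that the tool's input schema allows; fall back to 100 on bad input.
  const requested = Number.isFinite(rowLimit) ? Math.trunc(rowLimit) : 100;
  const limit = Math.min(Math.max(requested, 1), 1000);
  return `select * from (${sql}) as ktx_query_result limit ${limit}`;
}

const probe = limitSqlForExecutionSketch(assertReadOnlySqlSketch('select status from public.orders'), 5);
console.log(probe);
// prints "select * from (select status from public.orders) as ktx_query_result limit 5"
```

Wrapping the validated query as a subquery keeps the cap independent of any `limit` the caller wrote. The cost is that the inner statement must be valid as a derived table on the target warehouse.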