apunkt/ktx

mirror of https://github.com/Kaelio/ktx.git synced 2026-06-10 08:05:14 +02:00

Andrey Avtomonov c22248dabf

feat(context): add warehouse verification tools (#46 )

* feat(context): add warehouse dialect dispatch

* feat(context): read warehouse scan catalog

* feat(context): add entity details verification tool

* feat(context): add ingest SQL verification tool

* feat(context): add raw warehouse discovery tool

* feat(context): expose warehouse verification tools to ingest

* docs(context): add ingest identifier verification protocol

* test(context): guard ingest identifier verification prompts

* chore(context): verify warehouse verification tools

* docs: add warehouse verification tools plan and spec

* fix(context): expose target warehouses to Notion ingest

* fix(context): update ingest prompts for warehouse verification tools

* fix(context): scope raw schema discovery to allowed connections

* fix(context): verify warehouse column display targets

* docs: add notion warehouse verification gap closure plan

* fix(context): include raw discovery connection names

* fix(context): expose warehouse targets for LookML and MetricFlow

* fix(context): pass connection config to ingest query executors

* fix(cli): enable read-only SQL probes for local ingest

* docs: add warehouse verification final v1 closure plan

* fix(context): align warehouse sql probe prompt shape

* docs: add warehouse verification prompt shape closure plan

* test(context): catch connectionless sql execution prompt examples

* fix(context): include connection name in sl capture sql example

* docs: add warehouse verification sql example closure plan

* fix(context): report structured entity detail misses

* docs: add warehouse verification structured target miss closure plan

* fix: report untracked squash merge conflicts

* feat: require ingest verification ledger

* fix: stabilize ingest wiki references

2026-05-13 13:43:23 +02:00

7.5 KiB

Raw Permalink Blame History

Warehouse Verification SQL Example Closure Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Remove the last connectionless sql_execution prompt example so warehouse-verification writer guidance always matches KTX's shipped tool contract.

Architecture: Keep the warehouse verification tool code unchanged. Tighten the prompt asset guard so multiline sql_execution({ sql: ... }) examples fail tests, then update the stale sl_capture worked example to pass connectionName explicitly.

Tech Stack: Markdown skill prompts, TypeScript, Vitest, pnpm workspace commands.

Audit summary

The warehouse verification tools, runner wiring, source-adapter target fan-out, CLI query executor, and focused tests are present. Focused verification passed:

pnpm --filter @ktx/context exec vitest run src/connections/dialects.test.ts src/connections/read-only-sql.test.ts src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts src/ingest/tools/warehouse-verification/entity-details.tool.test.ts src/ingest/tools/warehouse-verification/sql-execution.tool.test.ts src/ingest/tools/warehouse-verification/discover-data.tool.test.ts src/ingest/ingest-prompts.test.ts src/ingest/ingest-runtime-assets.test.ts src/memory/memory-runtime-assets.test.ts src/ingest/local-adapters.test.ts src/ingest/adapters/notion/notion.adapter.test.ts src/ingest/adapters/lookml/lookml.adapter.test.ts src/ingest/adapters/metricflow/metricflow.adapter.test.ts
pnpm --filter @ktx/cli exec vitest run src/ingest-query-executor.test.ts src/ingest.test.ts -t "supplies a scan-connector query executor"

Remaining v1-blocking gap:

packages/context/skills/sl_capture/SKILL.md still contains a worked example with a multiline sql_execution({ sql: ... }) call. KTX's tool contract is sql_execution({connectionName, sql, rowLimit?}), so this example can teach agents to call the shipped tool with invalid input.

Non-blocking gaps remain out of scope for this v1 plan:

Full DDL-style entity_details formatting with FK profile summaries.
AST-backed SQL validation for data-modifying CTE bodies.
Search over generated enrichment/descriptions.json.
Per-WorkUnit reuse of a single WarehouseCatalogService instance for cache hits across separate tool calls.
A deterministic fake-LLM end-to-end Notion hallucination regression.
Tokenized or embedding-backed raw schema search ranking in discover_data.

File structure

Modify these files:

packages/context/src/memory/memory-runtime-assets.test.ts: add a prompt guard that catches multiline sql_execution calls without connectionName.
packages/context/skills/sl_capture/SKILL.md: update the stale worked example to include the target warehouse connectionName.

Task 1: Add a multiline SQL prompt guard

Files:

Modify: packages/context/src/memory/memory-runtime-assets.test.ts
Step 1: Add a helper that extracts sql_execution call examples

In packages/context/src/memory/memory-runtime-assets.test.ts, add this helper after forbiddenProductPattern():

function sqlExecutionCallBlocks(body: string): string[] {
  const blocks: string[] = [];
  const marker = 'sql_execution({';
  let offset = 0;

  while (offset < body.length) {
    const start = body.indexOf(marker, offset);
    if (start === -1) {
      break;
    }
    const end = body.indexOf('})', start + marker.length);
    blocks.push(body.slice(start, end === -1 ? start + marker.length : end + 2));
    offset = start + marker.length;
  }

  return blocks;
}

Step 2: Strengthen the existing SQL-shape test

Replace the body of ships only the KTX connectionName sql_execution call shape in writer guidance with:

    const shared = await readFile(join(skillsDir, '_shared', 'identifier-verification.md'), 'utf-8');
    const bodies = [{ name: '_shared/identifier-verification.md', body: shared }];

    expect(shared).toContain('sql_execution({connectionName, sql: "SELECT DISTINCT');
    expect(shared).toContain('sql_execution({connectionName, sql: "SELECT 1 FROM');

    for (const skillName of verificationWriterSkills) {
      const body = await readFile(join(skillsDir, skillName, 'SKILL.md'), 'utf-8');
      bodies.push({ name: `${skillName}/SKILL.md`, body });
      expect(body).toContain('sql_execution({connectionName');
      expect(body).not.toContain('sql_execution({ sql');
      expect(body).not.toContain('session shape');
      expect(body).not.toContain('connection is already pinned by the ingest session');
    }

    for (const { name, body } of bodies) {
      const calls = sqlExecutionCallBlocks(body);
      expect(calls.length, `${name} should contain sql_execution guidance`).toBeGreaterThan(0);
      expect(
        calls.filter((call) => !call.includes('connectionName')),
        `${name} has sql_execution calls without connectionName`,
      ).toEqual([]);
      expect(body, `${name} has a connectionless multiline sql_execution call`).not.toMatch(
        /sql_execution\(\{\s*sql\s*:/,
      );
    }

Step 3: Run the failing prompt guard

Run:

pnpm --filter @ktx/context exec vitest run src/memory/memory-runtime-assets.test.ts -t "connectionName sql_execution"

Expected: FAIL. The failure must identify sl_capture/SKILL.md as having a sql_execution call without connectionName or a connectionless multiline sql_execution call.

Step 4: Commit the failing guard

Run:

git add packages/context/src/memory/memory-runtime-assets.test.ts
git commit -m "test(context): catch connectionless sql execution prompt examples"

Task 2: Fix the stale `sl_capture` SQL example

Files:

Modify: packages/context/skills/sl_capture/SKILL.md
Test: packages/context/src/memory/memory-runtime-assets.test.ts
Test: packages/context/src/ingest/ingest-runtime-assets.test.ts
Step 1: Update the worked example

In packages/context/skills/sl_capture/SKILL.md, replace the sql_execution block in "Worked example - new join" with:

sql_execution({
  connectionName: "warehouse",
  sql: "SELECT COUNT(*), COUNT(DISTINCT a.admin_user_id) FROM public.fct_orders a JOIN public.fct_mau_multiprotocol b ON a.admin_user_id = b.admin_user_id LIMIT 1"
})

Step 2: Run the prompt guards

Run:

pnpm --filter @ktx/context exec vitest run src/memory/memory-runtime-assets.test.ts src/ingest/ingest-runtime-assets.test.ts

Expected: PASS.

Step 3: Run a direct stale-shape scan

Run:

rg -n -U "sql_execution\\(\\{\\s*\\n\\s*sql:" packages/context/skills packages/context/prompts

Expected: no matches and exit code 1.

Step 4: Run the context type-check

Run:

pnpm --filter @ktx/context run type-check

Expected: PASS.

Step 5: Commit the prompt fix

Run:

git add packages/context/skills/sl_capture/SKILL.md
git commit -m "fix(context): include connection name in sl capture sql example"

Self-review

Spec coverage:

The only remaining v1-blocking prompt-shape gap has a failing test and a direct prompt edit.
Tool implementation, runner wiring, adapter scoping, and CLI execution remain covered by the focused suites listed in the audit summary.

Placeholder scan:

This plan contains no deferred implementation placeholders.

Type consistency:

The plan uses the shipped KTX tool shape: sql_execution({connectionName, sql, rowLimit?}).

7.5 KiB Raw Permalink Blame History

Warehouse Verification SQL Example Closure Implementation Plan

Audit summary

File structure

Task 1: Add a multiline SQL prompt guard

Task 2: Fix the stale sl_capture SQL example

Self-review

7.5 KiB

Raw Permalink Blame History

Task 2: Fix the stale `sl_capture` SQL example