ktx/docs/superpowers/plans/2026-05-13-warehouse-verification-prompt-shape-closure.md
Andrey Avtomonov c22248dabf
feat(context): add warehouse verification tools (#46)
2026-05-13 13:43:23 +02:00


Warehouse Verification Prompt Shape Closure Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Make every warehouse-verification prompt use KTX's shipped sql_execution input shape so ingest agents include connectionName when they probe warehouse identifiers.

Architecture: Keep the warehouse verification tool code unchanged. Add prompt-asset tests that reject Kaelio's old session-only SQL examples, then update the shared identifier protocol and the three remaining per-skill SQL probe examples that still show the legacy shape.

Tech Stack: Markdown skill prompts, TypeScript, Vitest, pnpm workspace commands.


Audit Summary

The warehouse verification tools, runner wiring, adapter target fan-out, and focused tests are present. Focused verification passed:

pnpm --filter @ktx/context exec vitest run src/connections/dialects.test.ts src/connections/read-only-sql.test.ts src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts src/ingest/tools/warehouse-verification/entity-details.tool.test.ts src/ingest/tools/warehouse-verification/sql-execution.tool.test.ts src/ingest/tools/warehouse-verification/discover-data.tool.test.ts src/ingest/ingest-prompts.test.ts src/ingest/ingest-runtime-assets.test.ts src/memory/memory-runtime-assets.test.ts src/ingest/local-adapters.test.ts src/ingest/adapters/notion/notion.adapter.test.ts src/ingest/adapters/lookml/lookml.adapter.test.ts src/ingest/adapters/metricflow/metricflow.adapter.test.ts
pnpm --filter @ktx/cli exec vitest run src/ingest-query-executor.test.ts src/ingest.test.ts -t "supplies a scan-connector query executor"

Remaining v1-blocking gap:

  • packages/context/skills/lookml_ingest/SKILL.md, packages/context/skills/metricflow_ingest/SKILL.md, and packages/context/skills/sl_capture/SKILL.md still contain sql_execution({ sql ... }) / "session shape" guidance inherited from Kaelio. KTX's tool contract is sql_execution({connectionName, sql, rowLimit?}), so these examples can make agents call the shipped tool with invalid input.
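To make the mismatch concrete, the sketch below models the input shape quoted above and shows why a session-only call fails it. The names `SqlExecutionInput` and `checkSqlExecutionInput` are hypothetical and exist only for illustration; the shipped schema lives in sql-execution.tool.ts and may validate differently (for example, it may constrain rowLimit further).

```typescript
// Hypothetical model of the contract quoted above:
// sql_execution({connectionName, sql, rowLimit?}).
interface SqlExecutionInput {
  connectionName: string;
  sql: string;
  rowLimit?: number;
}

// Returns a list of problems; an empty list means the input matches the shape.
function checkSqlExecutionInput(input: unknown): string[] {
  if (typeof input !== "object" || input === null) {
    return ["input must be an object"];
  }
  const candidate = input as Record<string, unknown>;
  const problems: string[] = [];
  const connectionName = candidate["connectionName"];
  const sql = candidate["sql"];
  const rowLimit = candidate["rowLimit"];
  if (typeof connectionName !== "string" || connectionName.length === 0) {
    problems.push("connectionName is required");
  }
  if (typeof sql !== "string" || sql.length === 0) {
    problems.push("sql is required");
  }
  if (rowLimit !== undefined && typeof rowLimit !== "number") {
    problems.push("rowLimit must be a number when present");
  }
  return problems;
}

// Legacy session-only shape: no connectionName.
const legacyCall = { sql: "SELECT 1 FROM analytics.orders LIMIT 0" };

// KTX shape used throughout this plan.
const ktxCall: SqlExecutionInput = {
  connectionName: "warehouse",
  sql: "SELECT 1 FROM analytics.orders LIMIT 0",
};

const legacyProblems = checkSqlExecutionInput(legacyCall);
const ktxProblems = checkSqlExecutionInput(ktxCall);
```

Under these assumptions the legacy call is rejected only for the missing connection name, which is exactly the gap the prompt fixes below close.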

Non-blocking gaps remain out of scope for this v1 plan:

  • Full DDL-style entity_details formatting with FK profile summaries.
  • AST-backed SQL validation for data-modifying CTE bodies.
  • Search over generated enrichment/descriptions.json.
  • Per-WorkUnit reuse of a single WarehouseCatalogService instance for cache hits across separate tool calls.
  • A deterministic fake-LLM end-to-end Notion hallucination regression. Prompt guards and tool contract tests cover the v1 contract; a broader behavior regression can land as follow-up.

File Structure

Modify these files:

  • packages/context/src/memory/memory-runtime-assets.test.ts: add a prompt guard that rejects the legacy session-only sql_execution shape.
  • packages/context/src/ingest/ingest-runtime-assets.test.ts: strengthen the shared prompt asset assertion for the KTX connectionName SQL shape.
  • packages/context/skills/_shared/identifier-verification.md: make both SQL probe instructions show the KTX connectionName argument.
  • packages/context/skills/notion_synthesize/SKILL.md: inline the updated protocol block.
  • packages/context/skills/dbt_ingest/SKILL.md: inline the updated protocol block.
  • packages/context/skills/lookml_ingest/SKILL.md: inline the updated protocol block and fix the legacy SQL fallback example.
  • packages/context/skills/looker_ingest/SKILL.md: inline the updated protocol block.
  • packages/context/skills/metabase_ingest/SKILL.md: inline the updated protocol block.
  • packages/context/skills/metricflow_ingest/SKILL.md: inline the updated protocol block and fix the legacy SQL fallback example.
  • packages/context/skills/live_database_ingest/SKILL.md: inline the updated protocol block.
  • packages/context/skills/historic_sql_table_digest/SKILL.md: inline the updated protocol block.
  • packages/context/skills/historic_sql_patterns/SKILL.md: inline the updated protocol block.
  • packages/context/skills/knowledge_capture/SKILL.md: inline the updated protocol block.
  • packages/context/skills/sl_capture/SKILL.md: inline the updated protocol block and fix the join-discovery SQL example.

Task 1: Add Prompt Guards For The KTX SQL Tool Shape

Files:

  • Modify: packages/context/src/memory/memory-runtime-assets.test.ts

  • Modify: packages/context/src/ingest/ingest-runtime-assets.test.ts

  • Step 1: Add the failing memory asset guard

In packages/context/src/memory/memory-runtime-assets.test.ts, add this test after the existing test named "does not ship stale warehouse verification tool names or fictional identifiers":

  it('ships only the KTX connectionName sql_execution call shape in writer guidance', async () => {
    // The shared protocol must show both connectionName probes verbatim.
    const shared = await readFile(join(skillsDir, '_shared', 'identifier-verification.md'), 'utf-8');

    expect(shared).toContain('sql_execution({connectionName, sql: "SELECT DISTINCT');
    expect(shared).toContain('sql_execution({connectionName, sql: "SELECT 1 FROM');

    // Every writer skill must use the KTX shape and carry no legacy session-only wording.
    for (const skillName of verificationWriterSkills) {
      const body = await readFile(join(skillsDir, skillName, 'SKILL.md'), 'utf-8');
      expect(body).toContain('sql_execution({connectionName');
      expect(body).not.toContain('sql_execution({ sql');
      expect(body).not.toContain('session shape');
      expect(body).not.toContain('connection is already pinned by the ingest session');
    }
  });
  • Step 2: Strengthen the shared ingest asset guard

In packages/context/src/ingest/ingest-runtime-assets.test.ts, update the test named "packages identifier verification prompt assets" so that its final assertions are:

    expect(shared).toContain('discover_data');
    expect(shared).toContain('entity_details');
    expect(shared).toContain('sql_execution');
    expect(shared).toContain('sql_execution({connectionName, sql: "SELECT DISTINCT');
    expect(shared).toContain('sql_execution({connectionName, sql: "SELECT 1 FROM');
  • Step 3: Run the failing prompt guards

Run:

pnpm --filter @ktx/context exec vitest run src/memory/memory-runtime-assets.test.ts src/ingest/ingest-runtime-assets.test.ts

Expected: FAIL. The failure must cite at least one legacy string still present in the prompts ("sql_execution({ sql" or "session shape") or the missing "sql_execution({connectionName" string.
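The Step 1 guard reduces to a handful of substring checks per skill body. The standalone sketch below restates that logic outside Vitest so the expected failure is easier to reason about; `findLegacySqlShapes` and the two sample bodies are hypothetical and are not part of the test file.

```typescript
// Legacy strings that the Step 1 guard rejects.
const LEGACY_MARKERS: string[] = [
  "sql_execution({ sql",
  "session shape",
  "connection is already pinned by the ingest session",
];

// Hypothetical helper: lists every guard violation found in one skill body.
function findLegacySqlShapes(body: string): string[] {
  const issues: string[] = [];
  for (const marker of LEGACY_MARKERS) {
    if (body.indexOf(marker) !== -1) {
      issues.push("contains legacy string: " + marker);
    }
  }
  // The guard also requires at least one call in the KTX shape.
  if (body.indexOf("sql_execution({connectionName") === -1) {
    issues.push("missing sql_execution({connectionName");
  }
  return issues;
}

// Made-up skill bodies, before and after the prompt fix.
const legacyBody =
  'Probe with sql_execution({ sql: "SELECT 1" }) using the session shape.';
const fixedBody =
  'Probe with sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"}).';

const legacyIssues = findLegacySqlShapes(legacyBody);
const fixedIssues = findLegacySqlShapes(fixedBody);
```

Note that the check is purely textual: a variant such as a call with no space after the opening brace would pass the `not.toContain` assertions, so the positive `toContain` assertion is what keeps the KTX shape mandatory.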

Task 2: Update The Shared Identifier Verification Protocol

Files:

  • Modify: packages/context/skills/_shared/identifier-verification.md

  • Modify: packages/context/skills/notion_synthesize/SKILL.md

  • Modify: packages/context/skills/dbt_ingest/SKILL.md

  • Modify: packages/context/skills/lookml_ingest/SKILL.md

  • Modify: packages/context/skills/looker_ingest/SKILL.md

  • Modify: packages/context/skills/metabase_ingest/SKILL.md

  • Modify: packages/context/skills/metricflow_ingest/SKILL.md

  • Modify: packages/context/skills/live_database_ingest/SKILL.md

  • Modify: packages/context/skills/historic_sql_table_digest/SKILL.md

  • Modify: packages/context/skills/historic_sql_patterns/SKILL.md

  • Modify: packages/context/skills/knowledge_capture/SKILL.md

  • Modify: packages/context/skills/sl_capture/SKILL.md

  • Step 1: Replace the shared protocol text

Replace the full ## Identifier Verification Protocol block in packages/context/skills/_shared/identifier-verification.md with:

## Identifier Verification Protocol

Before writing a wiki page or SL source on any topic:

1. `discover_data({query: "<topic>"})` - see what wikis, SL sources, and raw
   tables already exist. Prefer updating existing pages over creating new ones.

Before emitting any `schema.table` or `schema.table.column` into a wiki body,
SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:

2. `entity_details({connectionName, targets: [{display: "<identifier>"}]})` -
   confirm the identifier resolves; inspect native types, FK/PK, and
   sampleValues.
3. For literal values from the source, such as status codes or plan tiers,
   check whether they appear in `entity_details` sampleValues for the relevant
   column. If sampleValues is short or the sample may have missed real values,
   run a `sql_execution` probe with the same warehouse connection name:
   `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
4. If the candidate identifier still does not resolve, do one of:
   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
     If it errors, the identifier is fictional.
   - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
     citing the exact raw path that mentioned it.
   - When recording `emit_unmapped_fallback` with `no_physical_table`, include
     the failing probe error in `clarification`.
5. Never copy `<schema>.<table>` placeholder strings from these instructions
   into output.
  • Step 2: Inline the same protocol in every writer skill

Replace the existing ## Identifier Verification Protocol block in each writer skill with the exact block from Step 1:

packages/context/skills/notion_synthesize/SKILL.md
packages/context/skills/dbt_ingest/SKILL.md
packages/context/skills/lookml_ingest/SKILL.md
packages/context/skills/looker_ingest/SKILL.md
packages/context/skills/metabase_ingest/SKILL.md
packages/context/skills/metricflow_ingest/SKILL.md
packages/context/skills/live_database_ingest/SKILL.md
packages/context/skills/historic_sql_table_digest/SKILL.md
packages/context/skills/historic_sql_patterns/SKILL.md
packages/context/skills/knowledge_capture/SKILL.md
packages/context/skills/sl_capture/SKILL.md
  • Step 3: Run the shared prompt asset tests

Run:

pnpm --filter @ktx/context exec vitest run src/memory/memory-runtime-assets.test.ts src/ingest/ingest-runtime-assets.test.ts

Expected: still FAIL because the per-skill legacy SQL examples in LookML, MetricFlow, and sl_capture have not been fixed yet.
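For reviewers, the decision flow in protocol items 2 and 4 can be sketched as ordinary control flow. This is an illustration under stated assumptions, not how the agent or the tools are implemented: the `VerificationTools` interface, the synchronous stubs, the `Verdict` type, and the `raw/page.md` path are all hypothetical, and the protocol itself lets the writer pick any one of the item 4 options rather than chaining them as done here.

```typescript
// Assumed, simplified tool results; the shipped tools return richer payloads.
type ProbeResult = { ok: true } | { ok: false; error: string };

interface VerificationTools {
  // True when entity_details resolves the display identifier.
  entityResolves(connectionName: string, display: string): boolean;
  // Stand-in for sql_execution({connectionName, sql}).
  sqlExecution(input: { connectionName: string; sql: string }): ProbeResult;
}

type Verdict =
  | { kind: "verified"; via: "entity_details" | "sql_execution" }
  | { kind: "unverified"; marker: string; clarification: string };

function verifyIdentifier(
  tools: VerificationTools,
  connectionName: string,
  identifier: string,
  rawPath: string,
): Verdict {
  // Item 2: try the catalog lookup first.
  if (tools.entityResolves(connectionName, identifier)) {
    return { kind: "verified", via: "entity_details" };
  }
  // Item 4, first option: cheap existence probe with the same connection name.
  const probe = tools.sqlExecution({
    connectionName,
    sql: `SELECT 1 FROM ${identifier} LIMIT 0`,
  });
  if (probe.ok) {
    return { kind: "verified", via: "sql_execution" };
  }
  // Item 4, remaining options: mark as unverified and keep the probe error.
  return {
    kind: "unverified",
    marker: `[unverified - from ${rawPath}]`,
    clarification: probe.error,
  };
}

// Stub warehouse that only knows analytics.orders.
const stubTools: VerificationTools = {
  entityResolves: (_conn, display) => display === "analytics.orders",
  sqlExecution: ({ sql }) =>
    sql.indexOf("analytics.orders") !== -1
      ? { ok: true }
      : { ok: false, error: "relation does not exist" },
};

const known = verifyIdentifier(stubTools, "warehouse", "analytics.orders", "raw/page.md");
const fictional = verifyIdentifier(stubTools, "warehouse", "analytics.refunds", "raw/page.md");
```

The point of the sketch is that `connectionName` is threaded through every call; nothing relies on a connection pinned by the session.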

Task 3: Fix Legacy Per-Skill SQL Examples

Files:

  • Modify: packages/context/skills/lookml_ingest/SKILL.md

  • Modify: packages/context/skills/metricflow_ingest/SKILL.md

  • Modify: packages/context/skills/sl_capture/SKILL.md

  • Step 1: Fix the LookML fallback probe example

In packages/context/skills/lookml_ingest/SKILL.md, replace the current Required flow item 2 with:

2. If the table isn't in the manifest, use the warehouse `connectionName`
   returned by `discover_data` or the target connection chosen from
   `sl_discover`, then call a dialect-appropriate SQL probe with that
   connection name, for example:
   `sql_execution({connectionName: "warehouse", sql: "SELECT 1 FROM analytics.orders LIMIT 0"})`.
   Replace `warehouse`, `analytics`, and `orders` with the verified connection,
   schema or dataset, and table from the WorkUnit evidence.
  • Step 2: Fix the MetricFlow fallback probe example

In packages/context/skills/metricflow_ingest/SKILL.md, replace the paragraph that begins "If `sl_discover` errors" with:

If `sl_discover` errors because no such table exists, use `discover_data` and
`entity_details` to find the warehouse target. If a SQL probe is still needed,
call `sql_execution` with the same warehouse connection name, for example:
`sql_execution({connectionName: "warehouse", sql: "SELECT 1 FROM analytics.orders LIMIT 0"})`.
**Never invent column names** - every column in `columns:`, `grain:`, and
`sql:` must be sourced from raw files, `entity_details`, or a successful SQL
probe.
  • Step 3: Fix the sl_capture join probe example

In packages/context/skills/sl_capture/SKILL.md, replace Tool sequence item 6 with:

6. For join discovery: use `sql_execution({connectionName: "warehouse", sql: "SELECT count(*) FROM public.orders o JOIN public.customers c ON c.id = o.customer_id LIMIT 20"})` with the target warehouse connection name and dialect-correct table names to verify the join key exists in both tables and assess cardinality before declaring the join.
  • Step 4: Run the prompt asset tests

Run:

pnpm --filter @ktx/context exec vitest run src/memory/memory-runtime-assets.test.ts src/ingest/ingest-runtime-assets.test.ts

Expected: PASS. The tests must report 2 files passed.
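The three fixed examples in this task share one shape: a connection name plus a cheap read-only statement. The sketch below builds those inputs from already-verified names. The helpers `existenceProbe` and `joinProbe` are hypothetical, and identifier quoting, which varies by dialect, is deliberately left out.

```typescript
// Input shape used by every example in this task.
interface SqlProbe {
  connectionName: string;
  sql: string;
}

// One side of a candidate join: a qualified table name and its key column.
interface JoinSide {
  table: string;
  key: string;
}

// Existence probe as used in the LookML and MetricFlow examples.
function existenceProbe(connectionName: string, schema: string, table: string): SqlProbe {
  return { connectionName, sql: `SELECT 1 FROM ${schema}.${table} LIMIT 0` };
}

// Join probe as used in the sl_capture example.
function joinProbe(connectionName: string, left: JoinSide, right: JoinSide): SqlProbe {
  return {
    connectionName,
    sql:
      `SELECT count(*) FROM ${left.table} o ` +
      `JOIN ${right.table} c ON c.${right.key} = o.${left.key} LIMIT 20`,
  };
}

const lookmlProbe = existenceProbe("warehouse", "analytics", "orders");
const captureProbe = joinProbe(
  "warehouse",
  { table: "public.orders", key: "customer_id" },
  { table: "public.customers", key: "id" },
);
```

As in the prompt text, `warehouse`, `analytics`, `orders`, and the `public` tables are example names only and stand in for values taken from WorkUnit evidence.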

Task 4: Final Verification

Files:

  • No new files.

  • Step 1: Run focused warehouse prompt and tool tests

Run:

pnpm --filter @ktx/context exec vitest run src/connections/dialects.test.ts src/connections/read-only-sql.test.ts src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts src/ingest/tools/warehouse-verification/entity-details.tool.test.ts src/ingest/tools/warehouse-verification/sql-execution.tool.test.ts src/ingest/tools/warehouse-verification/discover-data.tool.test.ts src/ingest/ingest-prompts.test.ts src/ingest/ingest-runtime-assets.test.ts src/memory/memory-runtime-assets.test.ts

Expected: PASS.

  • Step 2: Run package type-check

Run:

pnpm --filter @ktx/context run type-check

Expected: PASS.

  • Step 3: Inspect final diff

Run:

git diff -- packages/context/src/memory/memory-runtime-assets.test.ts packages/context/src/ingest/ingest-runtime-assets.test.ts packages/context/skills/_shared/identifier-verification.md packages/context/skills/notion_synthesize/SKILL.md packages/context/skills/dbt_ingest/SKILL.md packages/context/skills/lookml_ingest/SKILL.md packages/context/skills/looker_ingest/SKILL.md packages/context/skills/metabase_ingest/SKILL.md packages/context/skills/metricflow_ingest/SKILL.md packages/context/skills/live_database_ingest/SKILL.md packages/context/skills/historic_sql_table_digest/SKILL.md packages/context/skills/historic_sql_patterns/SKILL.md packages/context/skills/knowledge_capture/SKILL.md packages/context/skills/sl_capture/SKILL.md

Expected: only prompt wording and prompt-asset guards changed. No tool implementation files changed.
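The scope expectation above can also be checked mechanically against a list of changed paths, for example the lines printed by `git diff --name-only`. The sketch below is a hypothetical helper, not part of the repository, and its pattern is looser than the explicit file list in this plan: it accepts any SKILL.md or identifier-verification.md directly under a skills subdirectory.

```typescript
// The two prompt-asset guard files this plan modifies.
const PROMPT_GUARD_TESTS: string[] = [
  "packages/context/src/memory/memory-runtime-assets.test.ts",
  "packages/context/src/ingest/ingest-runtime-assets.test.ts",
];

// Skill prompt files: SKILL.md or the shared protocol, one directory below skills/.
const SKILL_PROMPT_PATTERN =
  /^packages\/context\/skills\/[^/]+\/(SKILL|identifier-verification)\.md$/;

// Hypothetical helper: returns changed paths that fall outside the plan's scope.
function unexpectedChanges(changedPaths: string[]): string[] {
  return changedPaths.filter((path) => {
    const isSkillPrompt = SKILL_PROMPT_PATTERN.test(path);
    const isPromptGuard = PROMPT_GUARD_TESTS.indexOf(path) !== -1;
    return !isSkillPrompt && !isPromptGuard;
  });
}

// Made-up change list with one out-of-scope tool implementation file.
const sampleChanges = [
  "packages/context/skills/lookml_ingest/SKILL.md",
  "packages/context/skills/_shared/identifier-verification.md",
  "packages/context/src/memory/memory-runtime-assets.test.ts",
  "packages/context/src/ingest/tools/warehouse-verification/sql-execution.tool.ts",
];

const outOfScope = unexpectedChanges(sampleChanges);
```

An empty result corresponds to the expected outcome of this step; any entry is a file to review before committing.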

  • Step 4: Commit

Run:

git add packages/context/src/memory/memory-runtime-assets.test.ts packages/context/src/ingest/ingest-runtime-assets.test.ts packages/context/skills/_shared/identifier-verification.md packages/context/skills/notion_synthesize/SKILL.md packages/context/skills/dbt_ingest/SKILL.md packages/context/skills/lookml_ingest/SKILL.md packages/context/skills/looker_ingest/SKILL.md packages/context/skills/metabase_ingest/SKILL.md packages/context/skills/metricflow_ingest/SKILL.md packages/context/skills/live_database_ingest/SKILL.md packages/context/skills/historic_sql_table_digest/SKILL.md packages/context/skills/historic_sql_patterns/SKILL.md packages/context/skills/knowledge_capture/SKILL.md packages/context/skills/sl_capture/SKILL.md
git commit -m "fix(context): align warehouse sql probe prompt shape"

Expected: one focused commit.

Self-Review

Spec coverage:

  • The original spec requires sql_execution inputs to include connectionName; this plan removes contradictory session-only examples from all active writer guidance.
  • The shared protocol remains in _shared and is inlined in every synthesis writer skill named by the original spec.
  • The tool implementation remains unchanged because the shipped schema already enforces the v1 contract.

Placeholder scan:

  • The plan has no deferred implementation markers.
  • Prompt examples use concrete warehouse, analytics, and orders example names only to demonstrate JSON shape, and each example tells the worker to replace them with discovered evidence.

Type consistency:

  • Tests assert the exact KTX tool call shape: sql_execution({connectionName, sql: ...}).
  • Prompt wording consistently uses connectionName, matching packages/context/src/ingest/tools/warehouse-verification/sql-execution.tool.ts.