fix(context): align warehouse sql probe prompt shape

2026-06-10 08:05:14 +02:00 · 2026-05-13 00:13:38 +02:00 · b80660e4d6
commit b80660e4d6
parent ffc9456e75
14 changed files with 101 additions and 52 deletions
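
The call shape this commit standardizes on, `sql_execution({connectionName, sql})`, can be sketched as a pair of small argument builders. This is only an illustration: the `SqlExecutionArgs` type and both helper names are hypothetical and do not appear in the commit, and the real `sql_execution` tool schema may carry more fields.

```typescript
// Hypothetical argument type: the commit only shows the two keys used in the
// skill guidance, so nothing beyond connectionName and sql is assumed here.
interface SqlExecutionArgs {
  connectionName: string;
  sql: string;
}

// Builds the "distinct literal values" probe from step 3 of the protocol.
// `ref` and `col` are interpolated as-is, so they should already be verified
// identifiers rather than untrusted input.
function distinctValuesProbe(connectionName: string, ref: string, col: string): SqlExecutionArgs {
  return { connectionName, sql: `SELECT DISTINCT ${col} FROM ${ref} LIMIT 50` };
}

// Builds the zero-row existence probe from step 4 of the protocol.
function existenceProbe(connectionName: string, ref: string): SqlExecutionArgs {
  return { connectionName, sql: `SELECT 1 FROM ${ref} LIMIT 0` };
}

// Example values taken from the lookml_ingest guidance in this diff.
console.log(existenceProbe('warehouse', 'analytics.orders'));
console.log(distinctValuesProbe('warehouse', 'analytics.orders', 'status'));
```

Keeping `connectionName` as a required field makes the target warehouse explicit in every probe instead of relying on a connection pinned by the session.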
--- a/packages/context/skills/_shared/identifier-verification.md
+++ b/packages/context/skills/_shared/identifier-verification.md
@@ -14,11 +14,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
   check whether they appear in `entity_details` sampleValues for the relevant
   column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+     If it errors, the identifier is fictional.
   - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
     citing the exact raw path that mentioned it.
   - When recording `emit_unmapped_fallback` with `no_physical_table`, include
--- a/packages/context/skills/dbt_ingest/SKILL.md
+++ b/packages/context/skills/dbt_ingest/SKILL.md
@@ -47,11 +47,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
   check whether they appear in `entity_details` sampleValues for the relevant
   column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+     If it errors, the identifier is fictional.
   - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
     citing the exact raw path that mentioned it.
   - When recording `emit_unmapped_fallback` with `no_physical_table`, include
--- a/packages/context/skills/historic_sql_patterns/SKILL.md
+++ b/packages/context/skills/historic_sql_patterns/SKILL.md
@@ -37,11 +37,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
   check whether they appear in `entity_details` sampleValues for the relevant
   column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+     If it errors, the identifier is fictional.
   - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
     citing the exact raw path that mentioned it.
   - When recording `emit_unmapped_fallback` with `no_physical_table`, include
--- a/packages/context/skills/historic_sql_table_digest/SKILL.md
+++ b/packages/context/skills/historic_sql_table_digest/SKILL.md
@@ -19,11 +19,31 @@ Use this skill when the WorkUnit raw file is one `tables/<schema>.<name>.json` f

 ## Identifier Verification Protocol

-Only mention columns visible in the table's scan record. Use
-`entity_details({connectionName, targets: [{display: "<identifier>"}]})` if
-the table or column attribution is uncertain. Do not infer join columns or
-filters from neighboring SQL unless the scan record confirms the column exists
-on the named table.
+Before writing a wiki page or SL source on any topic:
+
+1. `discover_data({query: "<topic>"})` - see what wikis, SL sources, and raw
+   tables already exist. Prefer updating existing pages over creating new ones.
+
+Before emitting any `schema.table` or `schema.table.column` into a wiki body,
+SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
+
+2. `entity_details({connectionName, targets: [{display: "<identifier>"}]})` -
+   confirm the identifier resolves; inspect native types, FK/PK, and
+   sampleValues.
+3. For literal values from the source, such as status codes or plan tiers,
+   check whether they appear in `entity_details` sampleValues for the relevant
+   column. If sampleValues is short or the sample may have missed real values,
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
+4. If the candidate identifier still does not resolve, do one of:
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+     If it errors, the identifier is fictional.
+   - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
+     citing the exact raw path that mentioned it.
+   - When recording `emit_unmapped_fallback` with `no_physical_table`, include
+     the failing probe error in `clarification`.
+5. Never copy `<schema>.<table>` placeholder strings from these instructions
+   into output.

 ## Evidence Shape

--- a/packages/context/skills/knowledge_capture/SKILL.md
+++ b/packages/context/skills/knowledge_capture/SKILL.md
@@ -66,11 +66,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
   check whether they appear in `entity_details` sampleValues for the relevant
   column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+     If it errors, the identifier is fictional.
   - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
     citing the exact raw path that mentioned it.
   - When recording `emit_unmapped_fallback` with `no_physical_table`, include
--- a/packages/context/skills/live_database_ingest/SKILL.md
+++ b/packages/context/skills/live_database_ingest/SKILL.md
@@ -43,11 +43,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
   check whether they appear in `entity_details` sampleValues for the relevant
   column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+     If it errors, the identifier is fictional.
   - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
     citing the exact raw path that mentioned it.
   - When recording `emit_unmapped_fallback` with `no_physical_table`, include
--- a/packages/context/skills/looker_ingest/SKILL.md
+++ b/packages/context/skills/looker_ingest/SKILL.md
@@ -40,11 +40,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
   check whether they appear in `entity_details` sampleValues for the relevant
   column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+     If it errors, the identifier is fictional.
   - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
     citing the exact raw path that mentioned it.
   - When recording `emit_unmapped_fallback` with `no_physical_table`, include
--- a/packages/context/skills/lookml_ingest/SKILL.md
+++ b/packages/context/skills/lookml_ingest/SKILL.md
@@ -70,11 +70,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
   check whether they appear in `entity_details` sampleValues for the relevant
   column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+     If it errors, the identifier is fictional.
   - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
     citing the exact raw path that mentioned it.
   - When recording `emit_unmapped_fallback` with `no_physical_table`, include
@@ -85,7 +85,13 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 **Required flow before writing any overlay or standalone**:

 1. Call `sl_discover(<tableName>)` for each base table you're about to touch. That returns the real columns.
-2. If the table isn't in the manifest, fall back to `sql_execution({ sql: "SELECT column_name FROM <dataset>.INFORMATION_SCHEMA.COLUMNS WHERE table_name = '<table>'" })` (session shape — a connection is already pinned by the ingest session).
+2. If the table isn't in the manifest, use the warehouse `connectionName`
+   returned by `discover_data` or the target connection chosen from
+   `sl_discover`, then call a dialect-appropriate SQL probe with that
+   connection name, for example:
+   `sql_execution({connectionName: "warehouse", sql: "SELECT 1 FROM analytics.orders LIMIT 0"})`.
+   Replace `warehouse`, `analytics`, and `orders` with the verified connection,
+   schema or dataset, and table from the WorkUnit evidence.
 3. Use only those names in `sql:`, `columns:`, and `grain:`. Map each `dimension_group` to ONE `{ name: <physical_col>, type: time, role: time }` entry — never one per timeframe.

 | LookML input | KTX `columns:` entry |
--- a/packages/context/skills/metabase_ingest/SKILL.md
+++ b/packages/context/skills/metabase_ingest/SKILL.md
@@ -63,11 +63,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
   check whether they appear in `entity_details` sampleValues for the relevant
   column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+     If it errors, the identifier is fictional.
   - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
     citing the exact raw path that mentioned it.
   - When recording `emit_unmapped_fallback` with `no_physical_table`, include
--- a/packages/context/skills/metricflow_ingest/SKILL.md
+++ b/packages/context/skills/metricflow_ingest/SKILL.md
@@ -48,11 +48,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
   check whether they appear in `entity_details` sampleValues for the relevant
   column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+     If it errors, the identifier is fictional.
   - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
     citing the exact raw path that mentioned it.
   - When recording `emit_unmapped_fallback` with `no_physical_table`, include
@@ -80,7 +80,13 @@ The `model:` field on a semantic_model is a string like `ref('table_name')`, `so
 - `source('s','t')` → table name `t`. Verify via `sl_discover(t)`.
 - Literal (no `ref(...)` / `source(...)`) → treat as the table name directly.

-If `sl_discover` errors (no such table), fall back to `sql_execution({ sql: "SELECT column_name FROM <dataset>.INFORMATION_SCHEMA.COLUMNS WHERE table_name = '<x>'" })` (session shape — a connection is already pinned by the ingest session). **Never invent column names** — every column in `columns:`, `grain:`, and `sql:` must be sourced from a real probe.
+If `sl_discover` errors because no such table exists, use `discover_data` and
+`entity_details` to find the warehouse target. If a SQL probe is still needed,
+call `sql_execution` with the same warehouse connection name, for example:
+`sql_execution({connectionName: "warehouse", sql: "SELECT 1 FROM analytics.orders LIMIT 0"})`.
+**Never invent column names** - every column in `columns:`, `grain:`, and
+`sql:` must be sourced from raw files, `entity_details`, or a successful SQL
+probe.

 After every `sl_write_source`, call `sl_validate`. The warehouse will reject invented columns with `Unrecognized name: <name>` — treat as a hard failure and re-read the schema.

--- a/packages/context/skills/notion_synthesize/SKILL.md
+++ b/packages/context/skills/notion_synthesize/SKILL.md
@@ -85,11 +85,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
   check whether they appear in `entity_details` sampleValues for the relevant
   column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+     If it errors, the identifier is fictional.
   - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
     citing the exact raw path that mentioned it.
   - When recording `emit_unmapped_fallback` with `no_physical_table`, include
--- a/packages/context/skills/sl_capture/SKILL.md
+++ b/packages/context/skills/sl_capture/SKILL.md
@@ -193,11 +193,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
   check whether they appear in `entity_details` sampleValues for the relevant
   column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+     If it errors, the identifier is fictional.
   - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
     citing the exact raw path that mentioned it.
   - When recording `emit_unmapped_fallback` with `no_physical_table`, include
@@ -212,7 +212,7 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. `sl_read_source({ sourceName })` — read the raw YAML before editing.
 4. For modifications: `sl_edit_source({ sourceName, old_string, new_string })` with exact-string replacements. `old_string` must match exactly and be unique in the file.
 5. For new sources or full rewrites: `sl_write_source({ sourceName, content })` with the full YAML content.
-6. For join discovery: `sql_execution({ sql })` to verify the join key exists in both tables and assess cardinality before declaring the join.
+6. For join discovery: use `sql_execution({connectionName: "warehouse", sql: "SELECT count(*) FROM public.orders o JOIN public.customers c ON c.id = o.customer_id LIMIT 20"})` with the target warehouse connection name and dialect-correct table names to verify the join key exists in both tables and assess cardinality before declaring the join.
 7. Cross-reference knowledge: author the edge once on the **wiki** side via `sl_refs: [source_name]` in the page's front-matter. The reverse edge (wiki pages that cite an SL source) is derived automatically by the reconciler — do not add a `knowledge_refs:` field to SL YAMLs.
 8. `sl_validate` — run after writing or editing to surface schema issues, duplicate measure names, and cross-source validation errors. Read-only; the writes are already committed (the squash-at-end flow will collapse them into one commit).

--- a/packages/context/src/ingest/ingest-runtime-assets.test.ts
+++ b/packages/context/src/ingest/ingest-runtime-assets.test.ts
@@ -98,5 +98,7 @@ describe('ingest runtime assets', () => {
    expect(shared).toContain('discover_data');
    expect(shared).toContain('entity_details');
    expect(shared).toContain('sql_execution');
+    expect(shared).toContain('sql_execution({connectionName, sql: "SELECT DISTINCT');
+    expect(shared).toContain('sql_execution({connectionName, sql: "SELECT 1 FROM');
  });
 });
--- a/packages/context/src/memory/memory-runtime-assets.test.ts
+++ b/packages/context/src/memory/memory-runtime-assets.test.ts
@@ -147,4 +147,19 @@ describe('memory runtime assets', () => {
      expect(body).not.toContain('sl_describe_table');
    }
  });
+
+  it('ships only the KTX connectionName sql_execution call shape in writer guidance', async () => {
+    const shared = await readFile(join(skillsDir, '_shared', 'identifier-verification.md'), 'utf-8');
+
+    expect(shared).toContain('sql_execution({connectionName, sql: "SELECT DISTINCT');
+    expect(shared).toContain('sql_execution({connectionName, sql: "SELECT 1 FROM');
+
+    for (const skillName of verificationWriterSkills) {
+      const body = await readFile(join(skillsDir, skillName, 'SKILL.md'), 'utf-8');
+      expect(body).toContain('sql_execution({connectionName');
+      expect(body).not.toContain('sql_execution({ sql');
+      expect(body).not.toContain('session shape');
+      expect(body).not.toContain('connection is already pinned by the ingest session');
+    }
+  });
 });
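
The new test in `memory-runtime-assets.test.ts` above asserts that each writer skill contains the `connectionName` call shape and none of the legacy phrasing. A minimal sketch of the same check as a standalone function over an in-memory string might look like the following. The function name, the `ShapeReport` type, and the sample strings are hypothetical; only the marker strings are taken from the test.

```typescript
// Marker strings copied from the assertions in the test above.
const LEGACY_MARKERS: string[] = [
  'sql_execution({ sql',
  'session shape',
  'connection is already pinned by the ingest session',
];

// Hypothetical report type, introduced only for this illustration.
interface ShapeReport {
  hasConnectionNameShape: boolean;
  legacyMarkersFound: string[];
}

// Reports whether a skill body uses the connectionName call shape and lists
// any legacy markers it still contains.
function checkSqlExecutionShape(body: string): ShapeReport {
  return {
    hasConnectionNameShape: body.includes('sql_execution({connectionName'),
    legacyMarkersFound: LEGACY_MARKERS.filter((marker) => body.includes(marker)),
  };
}

// Sample strings modeled on the removed and added guidance lines.
const oldGuidance = 'fall back to `sql_execution({ sql: "SELECT 1" })` (session shape)';
const newGuidance = 'Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.';

console.log(checkSqlExecutionShape(oldGuidance));
console.log(checkSqlExecutionShape(newGuidance));
```

Returning the list of matched markers, rather than a single boolean, makes it easier to see which legacy phrase survived in a given skill file.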