fix(context): align warehouse sql probe prompt shape

Andrey Avtomonov 2026-05-13 00:13:38 +02:00
parent ffc9456e75
commit b80660e4d6
14 changed files with 101 additions and 52 deletions
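The commit moves every writer skill to the explicit `sql_execution({connectionName, sql})` call shape. Below is a minimal TypeScript sketch of the two probes that shape is used for. It rests on several assumptions: the helper names (`distinctValuesProbe`, `existenceProbe`, `assertNoPlaceholder`) are hypothetical, and so are the connection `warehouse`, the column `status`, and the table `analytics.orders`. The SQL text follows the templates in the skill diffs, but nothing here is part of the KTX tool API.

```typescript
// Hypothetical argument shape mirroring `sql_execution({connectionName, sql})`.
interface SqlExecutionArgs {
  connectionName: string;
  sql: string;
}

// Refuse leftover `<schema>.<table>`-style placeholders (step 5 of the protocol).
function assertNoPlaceholder(identifier: string): void {
  if (/[<>]/.test(identifier)) {
    throw new Error(`placeholder identifier: ${identifier}`);
  }
}

// Step 3 probe: list observed literal values for one column.
function distinctValuesProbe(connectionName: string, ref: string, col: string): SqlExecutionArgs {
  assertNoPlaceholder(ref);
  assertNoPlaceholder(col);
  return { connectionName, sql: `SELECT DISTINCT ${col} FROM ${ref} LIMIT 50` };
}

// Step 4 probe: zero-row query that errors when `ref` does not exist.
function existenceProbe(connectionName: string, ref: string): SqlExecutionArgs {
  assertNoPlaceholder(ref);
  return { connectionName, sql: `SELECT 1 FROM ${ref} LIMIT 0` };
}

// Both probes name the warehouse connection explicitly instead of relying on a session-pinned one.
const probes: SqlExecutionArgs[] = [
  distinctValuesProbe("warehouse", "analytics.orders", "status"),
  existenceProbe("warehouse", "analytics.orders"),
];
console.log(JSON.stringify(probes));
```

Identifiers are interpolated as-is, so the sketch assumes `entity_details` has already confirmed them. The only check it performs is rejecting leftover `<...>` placeholders.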

@ -14,11 +14,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
3. For literal values from the source, such as status codes or plan tiers,
check whether they appear in `entity_details` sampleValues for the relevant
column. If sampleValues is short or the sample may have missed real values,
- run a `sql_execution` probe:
- `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+ run a `sql_execution` probe with the same warehouse connection name:
+ `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
4. If the candidate identifier still does not resolve, do one of:
- - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
- identifier is fictional.
+ - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+ If it errors, the identifier is fictional.
- Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
citing the exact raw path that mentioned it.
- When recording `emit_unmapped_fallback` with `no_physical_table`, include

@ -47,11 +47,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
3. For literal values from the source, such as status codes or plan tiers,
check whether they appear in `entity_details` sampleValues for the relevant
column. If sampleValues is short or the sample may have missed real values,
- run a `sql_execution` probe:
- `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+ run a `sql_execution` probe with the same warehouse connection name:
+ `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
4. If the candidate identifier still does not resolve, do one of:
- - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
- identifier is fictional.
+ - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+ If it errors, the identifier is fictional.
- Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
citing the exact raw path that mentioned it.
- When recording `emit_unmapped_fallback` with `no_physical_table`, include

@ -37,11 +37,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
3. For literal values from the source, such as status codes or plan tiers,
check whether they appear in `entity_details` sampleValues for the relevant
column. If sampleValues is short or the sample may have missed real values,
- run a `sql_execution` probe:
- `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+ run a `sql_execution` probe with the same warehouse connection name:
+ `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
4. If the candidate identifier still does not resolve, do one of:
- - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
- identifier is fictional.
+ - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+ If it errors, the identifier is fictional.
- Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
citing the exact raw path that mentioned it.
- When recording `emit_unmapped_fallback` with `no_physical_table`, include

@ -19,11 +19,31 @@ Use this skill when the WorkUnit raw file is one `tables/<schema>.<name>.json` f
## Identifier Verification Protocol
Only mention columns visible in the table's scan record. Use
`entity_details({connectionName, targets: [{display: "<identifier>"}]})` if
the table or column attribution is uncertain. Do not infer join columns or
filters from neighboring SQL unless the scan record confirms the column exists
on the named table.
Before writing a wiki page or SL source on any topic:
1. `discover_data({query: "<topic>"})` - see what wikis, SL sources, and raw
tables already exist. Prefer updating existing pages over creating new ones.
Before emitting any `schema.table` or `schema.table.column` into a wiki body,
SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
2. `entity_details({connectionName, targets: [{display: "<identifier>"}]})` -
confirm the identifier resolves; inspect native types, FK/PK, and
sampleValues.
3. For literal values from the source, such as status codes or plan tiers,
check whether they appear in `entity_details` sampleValues for the relevant
column. If sampleValues is short or the sample may have missed real values,
run a `sql_execution` probe with the same warehouse connection name:
`sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
4. If the candidate identifier still does not resolve, do one of:
- Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
If it errors, the identifier is fictional.
- Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
citing the exact raw path that mentioned it.
- When recording `emit_unmapped_fallback` with `no_physical_table`, include
the failing probe error in `clarification`.
5. Never copy `<schema>.<table>` placeholder strings from these instructions
into output.
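Steps 2 and 4 of the protocol can be combined roughly as follows. This is a sketch under assumptions, not the skill's implementation: `verifyRef`, `Resolution`, and `RunProbe` are hypothetical names. The probe runner stands in for a `sql_execution` call that rejects on a warehouse error. Placing the `[unverified - from <rawPath>]` marker in front of the identifier is one guess at what "wrap" means.

```typescript
// Hypothetical result of an `entity_details` lookup for one table reference.
interface Resolution {
  resolved: boolean;
}

// Hypothetical probe runner: resolves when the SQL runs, rejects with the warehouse error otherwise.
type RunProbe = (args: { connectionName: string; sql: string }) => Promise<void>;

type Verdict =
  | { kind: "verified"; text: string }
  | { kind: "unverified"; text: string; probeError: string };

async function verifyRef(
  ref: string,
  rawPath: string,
  connectionName: string,
  details: Resolution,
  runProbe: RunProbe,
): Promise<Verdict> {
  // Step 2: `entity_details` already resolved the reference, so it can be emitted as-is.
  if (details.resolved) {
    return { kind: "verified", text: ref };
  }
  try {
    // Step 4: zero-row existence probe on the same warehouse connection.
    await runProbe({ connectionName, sql: `SELECT 1 FROM ${ref} LIMIT 0` });
    return { kind: "verified", text: ref };
  } catch (err) {
    // The probe failed, so the reference is treated as fictional: mark it with its raw path
    // and keep the error text for an `emit_unmapped_fallback` clarification.
    const probeError = err instanceof Error ? err.message : String(err);
    return { kind: "unverified", text: `[unverified - from ${rawPath}] ${ref}`, probeError };
  }
}
```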
## Evidence Shape

@ -66,11 +66,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
3. For literal values from the source, such as status codes or plan tiers,
check whether they appear in `entity_details` sampleValues for the relevant
column. If sampleValues is short or the sample may have missed real values,
- run a `sql_execution` probe:
- `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+ run a `sql_execution` probe with the same warehouse connection name:
+ `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
4. If the candidate identifier still does not resolve, do one of:
- - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
- identifier is fictional.
+ - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+ If it errors, the identifier is fictional.
- Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
citing the exact raw path that mentioned it.
- When recording `emit_unmapped_fallback` with `no_physical_table`, include

@ -43,11 +43,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
3. For literal values from the source, such as status codes or plan tiers,
check whether they appear in `entity_details` sampleValues for the relevant
column. If sampleValues is short or the sample may have missed real values,
- run a `sql_execution` probe:
- `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+ run a `sql_execution` probe with the same warehouse connection name:
+ `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
4. If the candidate identifier still does not resolve, do one of:
- - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
- identifier is fictional.
+ - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+ If it errors, the identifier is fictional.
- Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
citing the exact raw path that mentioned it.
- When recording `emit_unmapped_fallback` with `no_physical_table`, include

@ -40,11 +40,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
3. For literal values from the source, such as status codes or plan tiers,
check whether they appear in `entity_details` sampleValues for the relevant
column. If sampleValues is short or the sample may have missed real values,
- run a `sql_execution` probe:
- `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+ run a `sql_execution` probe with the same warehouse connection name:
+ `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
4. If the candidate identifier still does not resolve, do one of:
- - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
- identifier is fictional.
+ - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+ If it errors, the identifier is fictional.
- Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
citing the exact raw path that mentioned it.
- When recording `emit_unmapped_fallback` with `no_physical_table`, include

@ -70,11 +70,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
3. For literal values from the source, such as status codes or plan tiers,
check whether they appear in `entity_details` sampleValues for the relevant
column. If sampleValues is short or the sample may have missed real values,
- run a `sql_execution` probe:
- `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+ run a `sql_execution` probe with the same warehouse connection name:
+ `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
4. If the candidate identifier still does not resolve, do one of:
- - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
- identifier is fictional.
+ - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+ If it errors, the identifier is fictional.
- Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
citing the exact raw path that mentioned it.
- When recording `emit_unmapped_fallback` with `no_physical_table`, include
@ -85,7 +85,13 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
**Required flow before writing any overlay or standalone**:
1. Call `sl_discover(<tableName>)` for each base table you're about to touch. That returns the real columns.
- 2. If the table isn't in the manifest, fall back to `sql_execution({ sql: "SELECT column_name FROM <dataset>.INFORMATION_SCHEMA.COLUMNS WHERE table_name = '<table>'" })` (session shape — a connection is already pinned by the ingest session).
+ 2. If the table isn't in the manifest, use the warehouse `connectionName`
+ returned by `discover_data` or the target connection chosen from
+ `sl_discover`, then call a dialect-appropriate SQL probe with that
+ connection name, for example:
+ `sql_execution({connectionName: "warehouse", sql: "SELECT 1 FROM analytics.orders LIMIT 0"})`.
+ Replace `warehouse`, `analytics`, and `orders` with the verified connection,
+ schema or dataset, and table from the WorkUnit evidence.
3. Use only those names in `sql:`, `columns:`, and `grain:`. Map each `dimension_group` to ONE `{ name: <physical_col>, type: time, role: time }` entry — never one per timeframe.
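The one-entry-per-`dimension_group` rule in step 3 can be sketched in TypeScript. The sketch makes several assumptions: the `DimensionGroup` shape and the `physicalColumn` / `toTimeColumn` helpers are hypothetical, and it expects the group's `sql:` to be a plain `${TABLE}.<column>` reference. Any more complex expression returns `null`, leaving the physical column to be checked by hand against `sl_discover` output.

```typescript
// Hypothetical, minimal view of a parsed LookML dimension_group.
interface DimensionGroup {
  name: string; // e.g. "created"
  sql: string; // e.g. "${TABLE}.created_at ;;"
  timeframes: string[]; // e.g. ["raw", "date", "week", "month"]
}

// Shape of the single `columns:` entry the skill asks for.
interface TimeColumn {
  name: string;
  type: "time";
  role: "time";
}

// Pull the physical column out of a plain `${TABLE}.<col>` expression; anything else returns null.
function physicalColumn(sql: string): string | null {
  const expr = sql.trim().replace(/\s*;;$/, "");
  const match = /^\$\{TABLE\}\.([A-Za-z_][A-Za-z0-9_]*)$/.exec(expr);
  return match ? match[1] : null;
}

// One entry per dimension_group; `timeframes` is deliberately ignored.
function toTimeColumn(group: DimensionGroup): TimeColumn | null {
  const col = physicalColumn(group.sql);
  return col === null ? null : { name: col, type: "time", role: "time" };
}
```

Any column this returns still has to appear in the `sl_discover` output from step 1 before it goes into `columns:`.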
| LookML input | KTX `columns:` entry |

@ -63,11 +63,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
3. For literal values from the source, such as status codes or plan tiers,
check whether they appear in `entity_details` sampleValues for the relevant
column. If sampleValues is short or the sample may have missed real values,
- run a `sql_execution` probe:
- `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+ run a `sql_execution` probe with the same warehouse connection name:
+ `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
4. If the candidate identifier still does not resolve, do one of:
- - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
- identifier is fictional.
+ - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+ If it errors, the identifier is fictional.
- Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
citing the exact raw path that mentioned it.
- When recording `emit_unmapped_fallback` with `no_physical_table`, include

@ -48,11 +48,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
3. For literal values from the source, such as status codes or plan tiers,
check whether they appear in `entity_details` sampleValues for the relevant
column. If sampleValues is short or the sample may have missed real values,
- run a `sql_execution` probe:
- `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+ run a `sql_execution` probe with the same warehouse connection name:
+ `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
4. If the candidate identifier still does not resolve, do one of:
- - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
- identifier is fictional.
+ - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+ If it errors, the identifier is fictional.
- Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
citing the exact raw path that mentioned it.
- When recording `emit_unmapped_fallback` with `no_physical_table`, include
@ -80,7 +80,13 @@ The `model:` field on a semantic_model is a string like `ref('table_name')`, `so
- `source('s','t')` → table name `t`. Verify via `sl_discover(t)`.
- Literal (no `ref(...)` / `source(...)`) → treat as the table name directly.
- If `sl_discover` errors (no such table), fall back to `sql_execution({ sql: "SELECT column_name FROM <dataset>.INFORMATION_SCHEMA.COLUMNS WHERE table_name = '<x>'" })` (session shape — a connection is already pinned by the ingest session). **Never invent column names** — every column in `columns:`, `grain:`, and `sql:` must be sourced from a real probe.
+ If `sl_discover` errors because no such table exists, use `discover_data` and
+ `entity_details` to find the warehouse target. If a SQL probe is still needed,
+ call `sql_execution` with the same warehouse connection name, for example:
+ `sql_execution({connectionName: "warehouse", sql: "SELECT 1 FROM analytics.orders LIMIT 0"})`.
+ **Never invent column names** - every column in `columns:`, `grain:`, and
+ `sql:` must be sourced from raw files, `entity_details`, or a successful SQL
+ probe.
After every `sl_write_source`, call `sl_validate`. The warehouse will reject invented columns with `Unrecognized name: <name>` — treat as a hard failure and re-read the schema.
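The "never invent column names" rule can also be checked before `sl_write_source`, rather than waiting for `sl_validate` to report `Unrecognized name`. The following is a minimal sketch with a few assumptions. `unverifiedColumns` and `DraftColumns` are hypothetical names, and the identifiers used in `sql:` are taken to be already extracted into a list. Column names are compared case-insensitively, which does not hold for quoted, case-sensitive identifiers in some warehouses.

```typescript
// Hypothetical: identifiers a draft SL source references, grouped by field.
interface DraftColumns {
  columns: string[];
  grain: string[];
  sqlIdentifiers: string[]; // assumed already extracted from `sql:` expressions
}

// Return each referenced column that no evidence source confirmed, sorted for stable output.
// `verified` is the union of names seen in raw files, `entity_details`, or a successful probe.
function unverifiedColumns(draft: DraftColumns, verified: Iterable<string>): string[] {
  // Assumption: case-insensitive comparison, which breaks for quoted case-sensitive identifiers.
  const known = new Set(Array.from(verified, (c) => c.toLowerCase()));
  const referenced = new Set([...draft.columns, ...draft.grain, ...draft.sqlIdentifiers]);
  return Array.from(referenced)
    .filter((c) => !known.has(c.toLowerCase()))
    .sort();
}
```

A non-empty result means the draft should go back to `entity_details` or a probe before it is written.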

@ -85,11 +85,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
3. For literal values from the source, such as status codes or plan tiers,
check whether they appear in `entity_details` sampleValues for the relevant
column. If sampleValues is short or the sample may have missed real values,
- run a `sql_execution` probe:
- `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+ run a `sql_execution` probe with the same warehouse connection name:
+ `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
4. If the candidate identifier still does not resolve, do one of:
- - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
- identifier is fictional.
+ - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+ If it errors, the identifier is fictional.
- Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
citing the exact raw path that mentioned it.
- When recording `emit_unmapped_fallback` with `no_physical_table`, include

@ -193,11 +193,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
3. For literal values from the source, such as status codes or plan tiers,
check whether they appear in `entity_details` sampleValues for the relevant
column. If sampleValues is short or the sample may have missed real values,
- run a `sql_execution` probe:
- `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+ run a `sql_execution` probe with the same warehouse connection name:
+ `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
4. If the candidate identifier still does not resolve, do one of:
- - Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
- identifier is fictional.
+ - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+ If it errors, the identifier is fictional.
- Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
citing the exact raw path that mentioned it.
- When recording `emit_unmapped_fallback` with `no_physical_table`, include
@ -212,7 +212,7 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
3. `sl_read_source({ sourceName })` — read the raw YAML before editing.
4. For modifications: `sl_edit_source({ sourceName, old_string, new_string })` with exact-string replacements. `old_string` must match exactly and be unique in the file.
5. For new sources or full rewrites: `sl_write_source({ sourceName, content })` with the full YAML content.
- 6. For join discovery: `sql_execution({ sql })` to verify the join key exists in both tables and assess cardinality before declaring the join.
+ 6. For join discovery: use `sql_execution({connectionName: "warehouse", sql: "SELECT count(*) FROM public.orders o JOIN public.customers c ON c.id = o.customer_id LIMIT 20"})` with the target warehouse connection name and dialect-correct table names to verify the join key exists in both tables and assess cardinality before declaring the join.
7. Cross-reference knowledge: author the edge once on the **wiki** side via `sl_refs: [source_name]` in the page's front-matter. The reverse edge (wiki pages that cite an SL source) is derived automatically by the reconciler — do not add a `knowledge_refs:` field to SL YAMLs.
8. `sl_validate` — run after writing or editing to surface schema issues, duplicate measure names, and cross-source validation errors. Read-only; the writes are already committed (the squash-at-end flow will collapse them into one commit).
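Step 6 asks for the join key and its cardinality to be checked before a join is declared. One hedged way to do that, not taken from the skill itself, is to measure the largest number of rows that share a key value on each side and then classify the relationship from those two numbers. `keyFanoutProbe`, `classifyJoin`, and the relationship labels are hypothetical. The derived-table SQL is generic and may need dialect adjustments. The example reuses the `public.orders` / `public.customers` names from step 6.

```typescript
// Hypothetical helper: largest number of rows sharing one non-null join-key value.
function keyFanoutProbe(connectionName: string, table: string, key: string) {
  return {
    connectionName,
    sql: `SELECT MAX(n) AS max_rows_per_key FROM (SELECT ${key}, COUNT(*) AS n FROM ${table} WHERE ${key} IS NOT NULL GROUP BY ${key}) t`,
  };
}

type Relationship = "one_to_one" | "many_to_one" | "one_to_many" | "many_to_many";

// Classify the join from the left table's side, given each side's max rows per key.
function classifyJoin(maxLeftPerKey: number, maxRightPerKey: number): Relationship {
  if (maxLeftPerKey <= 1 && maxRightPerKey <= 1) return "one_to_one";
  if (maxRightPerKey <= 1) return "many_to_one";
  if (maxLeftPerKey <= 1) return "one_to_many";
  return "many_to_many";
}

// Example for the orders -> customers join from step 6 (connection name hypothetical).
const leftProbe = keyFanoutProbe("warehouse", "public.orders", "customer_id");
const rightProbe = keyFanoutProbe("warehouse", "public.customers", "id");
console.log(leftProbe.sql);
console.log(rightProbe.sql);
```

This checks fan-out only. Finding orphan keys would need a separate probe, such as counting left rows with no match in a `LEFT JOIN`.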

@ -98,5 +98,7 @@ describe('ingest runtime assets', () => {
expect(shared).toContain('discover_data');
expect(shared).toContain('entity_details');
expect(shared).toContain('sql_execution');
+ expect(shared).toContain('sql_execution({connectionName, sql: "SELECT DISTINCT');
+ expect(shared).toContain('sql_execution({connectionName, sql: "SELECT 1 FROM');
});
});

@ -147,4 +147,19 @@ describe('memory runtime assets', () => {
expect(body).not.toContain('sl_describe_table');
}
});
+ it('ships only the KTX connectionName sql_execution call shape in writer guidance', async () => {
+ const shared = await readFile(join(skillsDir, '_shared', 'identifier-verification.md'), 'utf-8');
+ expect(shared).toContain('sql_execution({connectionName, sql: "SELECT DISTINCT');
+ expect(shared).toContain('sql_execution({connectionName, sql: "SELECT 1 FROM');
+ for (const skillName of verificationWriterSkills) {
+ const body = await readFile(join(skillsDir, skillName, 'SKILL.md'), 'utf-8');
+ expect(body).toContain('sql_execution({connectionName');
+ expect(body).not.toContain('sql_execution({ sql');
+ expect(body).not.toContain('session shape');
+ expect(body).not.toContain('connection is already pinned by the ingest session');
+ }
+ });
});