diff --git a/packages/context/skills/_shared/identifier-verification.md b/packages/context/skills/_shared/identifier-verification.md
index 21f1da68..775203bd 100644
--- a/packages/context/skills/_shared/identifier-verification.md
+++ b/packages/context/skills/_shared/identifier-verification.md
@@ -14,11 +14,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
    check whether they appear in `entity_details` sampleValues for the relevant
    column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT FROM LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT FROM LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM LIMIT 0"})`.
+     If it errors, the identifier is fictional.
    - Wrap the identifier in `[unverified - from ]` in the wiki body,
      citing the exact raw path that mentioned it.
    - When recording `emit_unmapped_fallback` with `no_physical_table`, include
diff --git a/packages/context/skills/dbt_ingest/SKILL.md b/packages/context/skills/dbt_ingest/SKILL.md
index 4d5b54c4..6b332d8e 100644
--- a/packages/context/skills/dbt_ingest/SKILL.md
+++ b/packages/context/skills/dbt_ingest/SKILL.md
@@ -47,11 +47,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
    check whether they appear in `entity_details` sampleValues for the relevant
    column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT FROM LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT FROM LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM LIMIT 0"})`.
+     If it errors, the identifier is fictional.
    - Wrap the identifier in `[unverified - from ]` in the wiki body,
      citing the exact raw path that mentioned it.
    - When recording `emit_unmapped_fallback` with `no_physical_table`, include
diff --git a/packages/context/skills/historic_sql_patterns/SKILL.md b/packages/context/skills/historic_sql_patterns/SKILL.md
index aaf7a26c..5e898c47 100644
--- a/packages/context/skills/historic_sql_patterns/SKILL.md
+++ b/packages/context/skills/historic_sql_patterns/SKILL.md
@@ -37,11 +37,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
    check whether they appear in `entity_details` sampleValues for the relevant
    column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT FROM LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT FROM LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM LIMIT 0"})`.
+     If it errors, the identifier is fictional.
    - Wrap the identifier in `[unverified - from ]` in the wiki body,
      citing the exact raw path that mentioned it.
    - When recording `emit_unmapped_fallback` with `no_physical_table`, include
diff --git a/packages/context/skills/historic_sql_table_digest/SKILL.md b/packages/context/skills/historic_sql_table_digest/SKILL.md
index 669b3eec..0815e3dc 100644
--- a/packages/context/skills/historic_sql_table_digest/SKILL.md
+++ b/packages/context/skills/historic_sql_table_digest/SKILL.md
@@ -19,11 +19,31 @@ Use this skill when the WorkUnit raw file is one `tables/..json` f
 
 ## Identifier Verification Protocol
 
-Only mention columns visible in the table's scan record. Use
-`entity_details({connectionName, targets: [{display: ""}]})` if
-the table or column attribution is uncertain. Do not infer join columns or
-filters from neighboring SQL unless the scan record confirms the column exists
-on the named table.
+Before writing a wiki page or SL source on any topic:
+
+1. `discover_data({query: ""})` - see what wikis, SL sources, and raw
+   tables already exist. Prefer updating existing pages over creating new ones.
+
+Before emitting any `schema.table` or `schema.table.column` into a wiki body,
+SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
+
+2. `entity_details({connectionName, targets: [{display: ""}]})` -
+   confirm the identifier resolves; inspect native types, FK/PK, and
+   sampleValues.
+3. For literal values from the source, such as status codes or plan tiers,
+   check whether they appear in `entity_details` sampleValues for the relevant
+   column. If sampleValues is short or the sample may have missed real values,
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT FROM LIMIT 50"})`.
+4. If the candidate identifier still does not resolve, do one of:
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM LIMIT 0"})`.
+     If it errors, the identifier is fictional.
+   - Wrap the identifier in `[unverified - from ]` in the wiki body,
+     citing the exact raw path that mentioned it.
+   - When recording `emit_unmapped_fallback` with `no_physical_table`, include
+     the failing probe error in `clarification`.
+5. Never copy `.` placeholder strings from these instructions
+   into output.
 
 ## Evidence Shape
 
diff --git a/packages/context/skills/knowledge_capture/SKILL.md b/packages/context/skills/knowledge_capture/SKILL.md
index e514e780..e2876ffe 100644
--- a/packages/context/skills/knowledge_capture/SKILL.md
+++ b/packages/context/skills/knowledge_capture/SKILL.md
@@ -66,11 +66,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
    check whether they appear in `entity_details` sampleValues for the relevant
    column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT FROM LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT FROM LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM LIMIT 0"})`.
+     If it errors, the identifier is fictional.
    - Wrap the identifier in `[unverified - from ]` in the wiki body,
      citing the exact raw path that mentioned it.
    - When recording `emit_unmapped_fallback` with `no_physical_table`, include
diff --git a/packages/context/skills/live_database_ingest/SKILL.md b/packages/context/skills/live_database_ingest/SKILL.md
index 0b9074e9..2b9cb6d8 100644
--- a/packages/context/skills/live_database_ingest/SKILL.md
+++ b/packages/context/skills/live_database_ingest/SKILL.md
@@ -43,11 +43,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
    check whether they appear in `entity_details` sampleValues for the relevant
    column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT FROM LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT FROM LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM LIMIT 0"})`.
+     If it errors, the identifier is fictional.
    - Wrap the identifier in `[unverified - from ]` in the wiki body,
      citing the exact raw path that mentioned it.
    - When recording `emit_unmapped_fallback` with `no_physical_table`, include
diff --git a/packages/context/skills/looker_ingest/SKILL.md b/packages/context/skills/looker_ingest/SKILL.md
index 87dfe1b7..7a41fa6e 100644
--- a/packages/context/skills/looker_ingest/SKILL.md
+++ b/packages/context/skills/looker_ingest/SKILL.md
@@ -40,11 +40,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
    check whether they appear in `entity_details` sampleValues for the relevant
    column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT FROM LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT FROM LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM LIMIT 0"})`.
+     If it errors, the identifier is fictional.
    - Wrap the identifier in `[unverified - from ]` in the wiki body,
      citing the exact raw path that mentioned it.
    - When recording `emit_unmapped_fallback` with `no_physical_table`, include
diff --git a/packages/context/skills/lookml_ingest/SKILL.md b/packages/context/skills/lookml_ingest/SKILL.md
index 44725699..5a9c79a3 100644
--- a/packages/context/skills/lookml_ingest/SKILL.md
+++ b/packages/context/skills/lookml_ingest/SKILL.md
@@ -70,11 +70,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
    check whether they appear in `entity_details` sampleValues for the relevant
    column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT FROM LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT FROM LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM LIMIT 0"})`.
+     If it errors, the identifier is fictional.
    - Wrap the identifier in `[unverified - from ]` in the wiki body,
      citing the exact raw path that mentioned it.
    - When recording `emit_unmapped_fallback` with `no_physical_table`, include
@@ -85,7 +85,13 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 **Required flow before writing any overlay or standalone**:
 
 1. Call `sl_discover()` for each base table you're about to touch. That returns the real columns.
-2. If the table isn't in the manifest, fall back to `sql_execution({ sql: "SELECT column_name FROM .INFORMATION_SCHEMA.COLUMNS WHERE table_name = ''" })` (session shape — a connection is already pinned by the ingest session).
+2. If the table isn't in the manifest, use the warehouse `connectionName`
+   returned by `discover_data` or the target connection chosen from
+   `sl_discover`, then call a dialect-appropriate SQL probe with that
+   connection name, for example:
+   `sql_execution({connectionName: "warehouse", sql: "SELECT 1 FROM analytics.orders LIMIT 0"})`.
+   Replace `warehouse`, `analytics`, and `orders` with the verified connection,
+   schema or dataset, and table from the WorkUnit evidence.
 3. Use only those names in `sql:`, `columns:`, and `grain:`. Map each `dimension_group` to ONE `{ name: , type: time, role: time }` entry — never one per timeframe.
 
 | LookML input | KTX `columns:` entry |
diff --git a/packages/context/skills/metabase_ingest/SKILL.md b/packages/context/skills/metabase_ingest/SKILL.md
index 3b2535e4..f5aa00e2 100644
--- a/packages/context/skills/metabase_ingest/SKILL.md
+++ b/packages/context/skills/metabase_ingest/SKILL.md
@@ -63,11 +63,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
    check whether they appear in `entity_details` sampleValues for the relevant
    column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT FROM LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT FROM LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM LIMIT 0"})`.
+     If it errors, the identifier is fictional.
    - Wrap the identifier in `[unverified - from ]` in the wiki body,
      citing the exact raw path that mentioned it.
    - When recording `emit_unmapped_fallback` with `no_physical_table`, include
diff --git a/packages/context/skills/metricflow_ingest/SKILL.md b/packages/context/skills/metricflow_ingest/SKILL.md
index a24bab06..47187ffb 100644
--- a/packages/context/skills/metricflow_ingest/SKILL.md
+++ b/packages/context/skills/metricflow_ingest/SKILL.md
@@ -48,11 +48,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
    check whether they appear in `entity_details` sampleValues for the relevant
    column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT FROM LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT FROM LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM LIMIT 0"})`.
+     If it errors, the identifier is fictional.
    - Wrap the identifier in `[unverified - from ]` in the wiki body,
      citing the exact raw path that mentioned it.
    - When recording `emit_unmapped_fallback` with `no_physical_table`, include
@@ -80,7 +80,13 @@ The `model:` field on a semantic_model is a string like `ref('table_name')`, `so
 - `source('s','t')` → table name `t`. Verify via `sl_discover(t)`.
 - Literal (no `ref(...)` / `source(...)`) → treat as the table name directly.
 
-If `sl_discover` errors (no such table), fall back to `sql_execution({ sql: "SELECT column_name FROM .INFORMATION_SCHEMA.COLUMNS WHERE table_name = ''" })` (session shape — a connection is already pinned by the ingest session). **Never invent column names** — every column in `columns:`, `grain:`, and `sql:` must be sourced from a real probe.
+If `sl_discover` errors because no such table exists, use `discover_data` and
+`entity_details` to find the warehouse target. If a SQL probe is still needed,
+call `sql_execution` with the same warehouse connection name, for example:
+`sql_execution({connectionName: "warehouse", sql: "SELECT 1 FROM analytics.orders LIMIT 0"})`.
+**Never invent column names** - every column in `columns:`, `grain:`, and
+`sql:` must be sourced from raw files, `entity_details`, or a successful SQL
+probe.
 
 After every `sl_write_source`, call `sl_validate`. The warehouse will reject invented columns with `Unrecognized name: ` — treat as a hard failure and re-read the schema.
 
diff --git a/packages/context/skills/notion_synthesize/SKILL.md b/packages/context/skills/notion_synthesize/SKILL.md
index f4bf7f83..524c6832 100644
--- a/packages/context/skills/notion_synthesize/SKILL.md
+++ b/packages/context/skills/notion_synthesize/SKILL.md
@@ -85,11 +85,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
    check whether they appear in `entity_details` sampleValues for the relevant
    column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT FROM LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT FROM LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM LIMIT 0"})`.
+     If it errors, the identifier is fictional.
    - Wrap the identifier in `[unverified - from ]` in the wiki body,
      citing the exact raw path that mentioned it.
    - When recording `emit_unmapped_fallback` with `no_physical_table`, include
diff --git a/packages/context/skills/sl_capture/SKILL.md b/packages/context/skills/sl_capture/SKILL.md
index abb84170..8ddaa672 100644
--- a/packages/context/skills/sl_capture/SKILL.md
+++ b/packages/context/skills/sl_capture/SKILL.md
@@ -193,11 +193,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
    check whether they appear in `entity_details` sampleValues for the relevant
    column. If sampleValues is short or the sample may have missed real values,
-   run a `sql_execution` probe:
-   `SELECT DISTINCT FROM LIMIT 50`.
+   run a `sql_execution` probe with the same warehouse connection name:
+   `sql_execution({connectionName, sql: "SELECT DISTINCT FROM LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-   - Use `sql_execution` with `SELECT 1 FROM LIMIT 0`. If it errors, the
-     identifier is fictional.
+   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM LIMIT 0"})`.
+     If it errors, the identifier is fictional.
    - Wrap the identifier in `[unverified - from ]` in the wiki body,
      citing the exact raw path that mentioned it.
    - When recording `emit_unmapped_fallback` with `no_physical_table`, include
@@ -212,7 +212,7 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. `sl_read_source({ sourceName })` — read the raw YAML before editing.
 4. For modifications: `sl_edit_source({ sourceName, old_string, new_string })` with exact-string replacements. `old_string` must match exactly and be unique in the file.
 5. For new sources or full rewrites: `sl_write_source({ sourceName, content })` with the full YAML content.
-6. For join discovery: `sql_execution({ sql })` to verify the join key exists in both tables and assess cardinality before declaring the join.
+6. For join discovery: use `sql_execution({connectionName: "warehouse", sql: "SELECT count(*) AS joined_rows, count(DISTINCT o.customer_id) AS matched_keys FROM public.orders o JOIN public.customers c ON c.id = o.customer_id"})` with the target warehouse connection name and dialect-correct table names to verify the join key exists in both tables and assess cardinality before declaring the join.
 7. Cross-reference knowledge: author the edge once on the **wiki** side via `sl_refs: [source_name]` in the page's front-matter. The reverse edge (wiki pages that cite an SL source) is derived automatically by the reconciler — do not add a `knowledge_refs:` field to SL YAMLs.
 8. `sl_validate` — run after writing or editing to surface schema issues, duplicate measure names, and cross-source validation errors. Read-only; the writes are already committed (the squash-at-end flow will collapse them into one commit).
 
diff --git a/packages/context/src/ingest/ingest-runtime-assets.test.ts b/packages/context/src/ingest/ingest-runtime-assets.test.ts
index fd1fd66e..4b75fcdf 100644
--- a/packages/context/src/ingest/ingest-runtime-assets.test.ts
+++ b/packages/context/src/ingest/ingest-runtime-assets.test.ts
@@ -98,5 +98,7 @@ describe('ingest runtime assets', () => {
     expect(shared).toContain('discover_data');
     expect(shared).toContain('entity_details');
     expect(shared).toContain('sql_execution');
+    expect(shared).toContain('sql_execution({connectionName, sql: "SELECT DISTINCT');
+    expect(shared).toContain('sql_execution({connectionName, sql: "SELECT 1 FROM');
   });
 });
diff --git a/packages/context/src/memory/memory-runtime-assets.test.ts b/packages/context/src/memory/memory-runtime-assets.test.ts
index 4c77de1a..ddcdefdf 100644
--- a/packages/context/src/memory/memory-runtime-assets.test.ts
+++ b/packages/context/src/memory/memory-runtime-assets.test.ts
@@ -147,4 +147,19 @@ describe('memory runtime assets', () => {
       expect(body).not.toContain('sl_describe_table');
     }
   });
+
+  it('ships only the KTX connectionName sql_execution call shape in writer guidance', async () => {
+    const shared = await readFile(join(skillsDir, '_shared', 'identifier-verification.md'), 'utf-8');
+
+    expect(shared).toContain('sql_execution({connectionName, sql: "SELECT DISTINCT');
+    expect(shared).toContain('sql_execution({connectionName, sql: "SELECT 1 FROM');
+
+    for (const skillName of verificationWriterSkills) {
+      const body = await readFile(join(skillsDir, skillName, 'SKILL.md'), 'utf-8');
+      expect(body).toContain('sql_execution({connectionName');
+      expect(body).not.toContain('sql_execution({ sql');
+      expect(body).not.toContain('session shape');
+      expect(body).not.toContain('connection is already pinned by the ingest session');
+    }
+  });
 });
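The call shape this change standardizes can be sketched in TypeScript, the language of the repository's tests. This is a hypothetical illustration, not the KTX tool implementation: `SqlExecutionArgs`, `distinctValuesProbe`, `existenceProbe`, and `checkWriterSkillBody` are invented names, and `warehouse`, `analytics.orders`, and `status` are placeholder identifiers in the spirit of the skill examples. The body check restates the string assertions from `memory-runtime-assets.test.ts`.

```typescript
// Hypothetical argument shape for the `sql_execution` tool: every probe
// names its warehouse connection instead of relying on a pinned session.
interface SqlExecutionArgs {
  connectionName: string;
  sql: string;
}

// Protocol step 3: list real literal values for one column.
// Identifiers are interpolated as-is, so pass only names that
// `entity_details` has already resolved.
function distinctValuesProbe(
  connectionName: string,
  table: string,
  column: string,
): SqlExecutionArgs {
  return { connectionName, sql: `SELECT DISTINCT ${column} FROM ${table} LIMIT 50` };
}

// Protocol step 4: a zero-row query that errors if the table does not exist.
function existenceProbe(connectionName: string, table: string): SqlExecutionArgs {
  return { connectionName, sql: `SELECT 1 FROM ${table} LIMIT 0` };
}

// Substrings that the memory runtime test forbids in writer skill bodies.
const LEGACY_MARKERS: string[] = [
  'sql_execution({ sql',
  'session shape',
  'connection is already pinned by the ingest session',
];

// Mirrors the test: a body passes when it uses the connectionName call shape
// and contains none of the legacy markers.
function checkWriterSkillBody(body: string): { ok: boolean; legacy: string[] } {
  const legacy = LEGACY_MARKERS.filter((marker) => body.includes(marker));
  const usesNewShape = body.includes('sql_execution({connectionName');
  return { ok: usesNewShape && legacy.length === 0, legacy };
}

const probe = existenceProbe('warehouse', 'analytics.orders');
console.log(JSON.stringify(probe));
console.log(checkWriterSkillBody('sql_execution({ sql: "SELECT 1" })'));
```

Building SQL by string interpolation is only tolerable here because the identifiers are expected to come from `entity_details` results rather than free text; the sketch does no quoting or escaping.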