mirror of https://github.com/Kaelio/ktx.git
synced 2026-06-10 08:05:14 +02:00

fix(context): align warehouse sql probe prompt shape

parent ffc9456e75
commit b80660e4d6
14 changed files with 101 additions and 52 deletions

@@ -14,11 +14,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
 check whether they appear in `entity_details` sampleValues for the relevant
 column. If sampleValues is short or the sample may have missed real values,
-run a `sql_execution` probe:
-`SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+run a `sql_execution` probe with the same warehouse connection name:
+`sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-- Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-identifier is fictional.
+- Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+If it errors, the identifier is fictional.
 - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
 citing the exact raw path that mentioned it.
 - When recording `emit_unmapped_fallback` with `no_physical_table`, include

@@ -47,11 +47,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
 check whether they appear in `entity_details` sampleValues for the relevant
 column. If sampleValues is short or the sample may have missed real values,
-run a `sql_execution` probe:
-`SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+run a `sql_execution` probe with the same warehouse connection name:
+`sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-- Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-identifier is fictional.
+- Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+If it errors, the identifier is fictional.
 - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
 citing the exact raw path that mentioned it.
 - When recording `emit_unmapped_fallback` with `no_physical_table`, include

@@ -37,11 +37,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
 check whether they appear in `entity_details` sampleValues for the relevant
 column. If sampleValues is short or the sample may have missed real values,
-run a `sql_execution` probe:
-`SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+run a `sql_execution` probe with the same warehouse connection name:
+`sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-- Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-identifier is fictional.
+- Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+If it errors, the identifier is fictional.
 - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
 citing the exact raw path that mentioned it.
 - When recording `emit_unmapped_fallback` with `no_physical_table`, include

@@ -19,11 +19,31 @@ Use this skill when the WorkUnit raw file is one `tables/<schema>.<name>.json` f
 
 ## Identifier Verification Protocol
 
-Only mention columns visible in the table's scan record. Use
-`entity_details({connectionName, targets: [{display: "<identifier>"}]})` if
-the table or column attribution is uncertain. Do not infer join columns or
-filters from neighboring SQL unless the scan record confirms the column exists
-on the named table.
+Before writing a wiki page or SL source on any topic:
+
+1. `discover_data({query: "<topic>"})` - see what wikis, SL sources, and raw
+tables already exist. Prefer updating existing pages over creating new ones.
+
+Before emitting any `schema.table` or `schema.table.column` into a wiki body,
+SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
+
+2. `entity_details({connectionName, targets: [{display: "<identifier>"}]})` -
+confirm the identifier resolves; inspect native types, FK/PK, and
+sampleValues.
+3. For literal values from the source, such as status codes or plan tiers,
+check whether they appear in `entity_details` sampleValues for the relevant
+column. If sampleValues is short or the sample may have missed real values,
+run a `sql_execution` probe with the same warehouse connection name:
+`sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
+4. If the candidate identifier still does not resolve, do one of:
+- Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+If it errors, the identifier is fictional.
+- Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
+citing the exact raw path that mentioned it.
+- When recording `emit_unmapped_fallback` with `no_physical_table`, include
+the failing probe error in `clarification`.
+5. Never copy `<schema>.<table>` placeholder strings from these instructions
+into output.
 
 ## Evidence Shape
 

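The call shape that the protocol in this hunk standardizes can be sketched as a few small helpers. This is only an illustration: the `SqlExecutionArgs` interface and the three function names are hypothetical and do not come from the repository; the diff confirms only the `{connectionName, sql}` argument shape, the SQL text of the two probes, and the `[unverified - from <rawPath>]` marker.

```typescript
// Hypothetical argument type mirroring the `sql_execution({connectionName, sql})` shape in the diff.
interface SqlExecutionArgs {
  connectionName: string;
  sql: string;
}

// Step 3: build the probe that samples distinct literal values for one column.
function distinctValuesProbe(connectionName: string, ref: string, col: string): SqlExecutionArgs {
  return { connectionName, sql: `SELECT DISTINCT ${col} FROM ${ref} LIMIT 50` };
}

// Step 4: build the zero-row existence probe. If the warehouse rejects it,
// the identifier does not resolve.
function existenceProbe(connectionName: string, ref: string): SqlExecutionArgs {
  return { connectionName, sql: `SELECT 1 FROM ${ref} LIMIT 0` };
}

// Step 4 fallback: marker for the wiki body, citing the raw path that mentioned the identifier.
function unverifiedMarker(rawPath: string): string {
  return `[unverified - from ${rawPath}]`;
}
```

Both builders take the connection name explicitly, so a probe cannot be constructed without one, which is the property the commit's tests check in the prompt text.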
@@ -66,11 +66,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
 check whether they appear in `entity_details` sampleValues for the relevant
 column. If sampleValues is short or the sample may have missed real values,
-run a `sql_execution` probe:
-`SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+run a `sql_execution` probe with the same warehouse connection name:
+`sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-- Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-identifier is fictional.
+- Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+If it errors, the identifier is fictional.
 - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
 citing the exact raw path that mentioned it.
 - When recording `emit_unmapped_fallback` with `no_physical_table`, include

@@ -43,11 +43,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
 check whether they appear in `entity_details` sampleValues for the relevant
 column. If sampleValues is short or the sample may have missed real values,
-run a `sql_execution` probe:
-`SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+run a `sql_execution` probe with the same warehouse connection name:
+`sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-- Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-identifier is fictional.
+- Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+If it errors, the identifier is fictional.
 - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
 citing the exact raw path that mentioned it.
 - When recording `emit_unmapped_fallback` with `no_physical_table`, include

@@ -40,11 +40,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
 check whether they appear in `entity_details` sampleValues for the relevant
 column. If sampleValues is short or the sample may have missed real values,
-run a `sql_execution` probe:
-`SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+run a `sql_execution` probe with the same warehouse connection name:
+`sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-- Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-identifier is fictional.
+- Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+If it errors, the identifier is fictional.
 - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
 citing the exact raw path that mentioned it.
 - When recording `emit_unmapped_fallback` with `no_physical_table`, include

@@ -70,11 +70,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
 check whether they appear in `entity_details` sampleValues for the relevant
 column. If sampleValues is short or the sample may have missed real values,
-run a `sql_execution` probe:
-`SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+run a `sql_execution` probe with the same warehouse connection name:
+`sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-- Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-identifier is fictional.
+- Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+If it errors, the identifier is fictional.
 - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
 citing the exact raw path that mentioned it.
 - When recording `emit_unmapped_fallback` with `no_physical_table`, include
@@ -85,7 +85,13 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 **Required flow before writing any overlay or standalone**:
 
 1. Call `sl_discover(<tableName>)` for each base table you're about to touch. That returns the real columns.
-2. If the table isn't in the manifest, fall back to `sql_execution({ sql: "SELECT column_name FROM <dataset>.INFORMATION_SCHEMA.COLUMNS WHERE table_name = '<table>'" })` (session shape — a connection is already pinned by the ingest session).
+2. If the table isn't in the manifest, use the warehouse `connectionName`
+returned by `discover_data` or the target connection chosen from
+`sl_discover`, then call a dialect-appropriate SQL probe with that
+connection name, for example:
+`sql_execution({connectionName: "warehouse", sql: "SELECT 1 FROM analytics.orders LIMIT 0"})`.
+Replace `warehouse`, `analytics`, and `orders` with the verified connection,
+schema or dataset, and table from the WorkUnit evidence.
 3. Use only those names in `sql:`, `columns:`, and `grain:`. Map each `dimension_group` to ONE `{ name: <physical_col>, type: time, role: time }` entry — never one per timeframe.
 
 | LookML input | KTX `columns:` entry |

@@ -63,11 +63,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
 check whether they appear in `entity_details` sampleValues for the relevant
 column. If sampleValues is short or the sample may have missed real values,
-run a `sql_execution` probe:
-`SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+run a `sql_execution` probe with the same warehouse connection name:
+`sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-- Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-identifier is fictional.
+- Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+If it errors, the identifier is fictional.
 - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
 citing the exact raw path that mentioned it.
 - When recording `emit_unmapped_fallback` with `no_physical_table`, include

@@ -48,11 +48,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
 check whether they appear in `entity_details` sampleValues for the relevant
 column. If sampleValues is short or the sample may have missed real values,
-run a `sql_execution` probe:
-`SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+run a `sql_execution` probe with the same warehouse connection name:
+`sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-- Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-identifier is fictional.
+- Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+If it errors, the identifier is fictional.
 - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
 citing the exact raw path that mentioned it.
 - When recording `emit_unmapped_fallback` with `no_physical_table`, include
@@ -80,7 +80,13 @@ The `model:` field on a semantic_model is a string like `ref('table_name')`, `so
 - `source('s','t')` → table name `t`. Verify via `sl_discover(t)`.
 - Literal (no `ref(...)` / `source(...)`) → treat as the table name directly.
 
-If `sl_discover` errors (no such table), fall back to `sql_execution({ sql: "SELECT column_name FROM <dataset>.INFORMATION_SCHEMA.COLUMNS WHERE table_name = '<x>'" })` (session shape — a connection is already pinned by the ingest session). **Never invent column names** — every column in `columns:`, `grain:`, and `sql:` must be sourced from a real probe.
+If `sl_discover` errors because no such table exists, use `discover_data` and
+`entity_details` to find the warehouse target. If a SQL probe is still needed,
+call `sql_execution` with the same warehouse connection name, for example:
+`sql_execution({connectionName: "warehouse", sql: "SELECT 1 FROM analytics.orders LIMIT 0"})`.
+**Never invent column names** - every column in `columns:`, `grain:`, and
+`sql:` must be sourced from raw files, `entity_details`, or a successful SQL
+probe.
 
 After every `sl_write_source`, call `sl_validate`. The warehouse will reject invented columns with `Unrecognized name: <name>` — treat as a hard failure and re-read the schema.
 

@@ -85,11 +85,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
 check whether they appear in `entity_details` sampleValues for the relevant
 column. If sampleValues is short or the sample may have missed real values,
-run a `sql_execution` probe:
-`SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+run a `sql_execution` probe with the same warehouse connection name:
+`sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-- Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-identifier is fictional.
+- Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+If it errors, the identifier is fictional.
 - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
 citing the exact raw path that mentioned it.
 - When recording `emit_unmapped_fallback` with `no_physical_table`, include

@@ -193,11 +193,11 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. For literal values from the source, such as status codes or plan tiers,
 check whether they appear in `entity_details` sampleValues for the relevant
 column. If sampleValues is short or the sample may have missed real values,
-run a `sql_execution` probe:
-`SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
+run a `sql_execution` probe with the same warehouse connection name:
+`sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
 4. If the candidate identifier still does not resolve, do one of:
-- Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors, the
-identifier is fictional.
+- Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
+If it errors, the identifier is fictional.
 - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
 citing the exact raw path that mentioned it.
 - When recording `emit_unmapped_fallback` with `no_physical_table`, include
@@ -212,7 +212,7 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 3. `sl_read_source({ sourceName })` — read the raw YAML before editing.
 4. For modifications: `sl_edit_source({ sourceName, old_string, new_string })` with exact-string replacements. `old_string` must match exactly and be unique in the file.
 5. For new sources or full rewrites: `sl_write_source({ sourceName, content })` with the full YAML content.
-6. For join discovery: `sql_execution({ sql })` to verify the join key exists in both tables and assess cardinality before declaring the join.
+6. For join discovery: use `sql_execution({connectionName: "warehouse", sql: "SELECT count(*) FROM public.orders o JOIN public.customers c ON c.id = o.customer_id LIMIT 20"})` with the target warehouse connection name and dialect-correct table names to verify the join key exists in both tables and assess cardinality before declaring the join.
 7. Cross-reference knowledge: author the edge once on the **wiki** side via `sl_refs: [source_name]` in the page's front-matter. The reverse edge (wiki pages that cite an SL source) is derived automatically by the reconciler — do not add a `knowledge_refs:` field to SL YAMLs.
 8. `sl_validate` — run after writing or editing to surface schema issues, duplicate measure names, and cross-source validation errors. Read-only; the writes are already committed (the squash-at-end flow will collapse them into one commit).
 

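Step 6 asks the writer to assess cardinality before declaring a join. One possible way to do that, sketched here under assumptions, is to compare a table's row count with the number of distinct values of the join key. The `SqlExecutionArgs` type, `keyUniquenessProbe`, and `looksUnique` are hypothetical names introduced for illustration, and the query is not taken from the repository.

```typescript
// Hypothetical argument type mirroring the `sql_execution({connectionName, sql})` shape in the diff.
interface SqlExecutionArgs {
  connectionName: string;
  sql: string;
}

// Build a probe that returns total rows and distinct key values for one side of a join.
function keyUniquenessProbe(connectionName: string, table: string, key: string): SqlExecutionArgs {
  return {
    connectionName,
    sql: `SELECT count(*) AS total_rows, count(DISTINCT ${key}) AS distinct_keys FROM ${table}`,
  };
}

// Equal counts suggest the key is unique on that side (the "one" side of the join).
// count(DISTINCT ...) skips NULLs, so a nullable key can make this return false
// even when every non-null value is unique.
function looksUnique(totalRows: number, distinctKeys: number): boolean {
  return totalRows === distinctKeys;
}
```

Running the probe once per side (for example `public.customers` on `id` and `public.orders` on `customer_id`) would indicate whether the relationship is one-to-many or many-to-many.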
@@ -98,5 +98,7 @@ describe('ingest runtime assets', () => {
     expect(shared).toContain('discover_data');
     expect(shared).toContain('entity_details');
     expect(shared).toContain('sql_execution');
+    expect(shared).toContain('sql_execution({connectionName, sql: "SELECT DISTINCT');
+    expect(shared).toContain('sql_execution({connectionName, sql: "SELECT 1 FROM');
   });
 });

@@ -147,4 +147,19 @@ describe('memory runtime assets', () => {
       expect(body).not.toContain('sl_describe_table');
     }
   });
+
+  it('ships only the KTX connectionName sql_execution call shape in writer guidance', async () => {
+    const shared = await readFile(join(skillsDir, '_shared', 'identifier-verification.md'), 'utf-8');
+
+    expect(shared).toContain('sql_execution({connectionName, sql: "SELECT DISTINCT');
+    expect(shared).toContain('sql_execution({connectionName, sql: "SELECT 1 FROM');
+
+    for (const skillName of verificationWriterSkills) {
+      const body = await readFile(join(skillsDir, skillName, 'SKILL.md'), 'utf-8');
+      expect(body).toContain('sql_execution({connectionName');
+      expect(body).not.toContain('sql_execution({ sql');
+      expect(body).not.toContain('session shape');
+      expect(body).not.toContain('connection is already pinned by the ingest session');
+    }
+  });
 });

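The per-skill assertions in the test above can also be expressed as a standalone check, which may be handy when linting a single skill body outside the test runner. This is a sketch only: `legacyShapes` and `checkSkillBody` are hypothetical names, and the strings are copied from the assertions in the diff.

```typescript
// Fragments the test above rejects in writer-skill bodies.
const legacyShapes: string[] = [
  "sql_execution({ sql",
  "session shape",
  "connection is already pinned by the ingest session",
];

// Report whether the required connectionName call shape is present and which
// legacy fragments, if any, still appear in the body.
function checkSkillBody(body: string): { hasConnectionShape: boolean; legacyHits: string[] } {
  return {
    hasConnectionShape: body.includes("sql_execution({connectionName"),
    legacyHits: legacyShapes.filter((fragment) => body.includes(fragment)),
  };
}
```

A body passes when `hasConnectionShape` is true and `legacyHits` is empty, which matches the four `expect` calls inside the loop.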