8.5 KiB
| name | description | callers | |
|---|---|---|---|
| historic_sql_ingest | Convert one full-tier historic-SQL template WorkUnit into a canonical query knowledge page, linked SL refs, and optional semantic-layer proposals. |
|
Historic SQL Ingest
Use this skill when the WorkUnit contains files under raw-sources/<connectionId>/historic-sql/<syncId>/templates/<templateId>/.
Read exactly one historic-SQL template WorkUnit. Each WorkUnit represents one staged template or categorical sub-cluster that already survived full-tier page triage. It is not an intent cluster.
Input Shape
The WorkUnit normally exposes:
metadata.jsoninrawFiles.page.mdinrawFiles.usage.jsonindependencyPaths.manifest.jsonindependencyPaths.peerFileIndexcontaining sibling templates that you cannot read.
metadata.json has the stable identity:
{
"id": "fp_1",
"title": "snowflake - analytics.orders [fp_1]",
"path": "templates/fp_1/page.md",
"objectType": "historic_sql_template",
"lastEditedAt": null,
"properties": {
"fingerprint": "fp_1",
"sub_cluster_id": null,
"dialect": "snowflake",
"tables_touched": ["analytics.orders"],
"literal_slots": [
{ "position": 1, "type": "string", "classification": "constant" },
{ "position": 2, "type": "date", "classification": "runtime" }
],
"triage_signals": {
"executions_bucket": "high",
"distinct_users_bucket": "team",
"error_rate_bucket": "ok",
"recency_bucket": "active",
"service_account_only": "false",
"slot_summary": "1 constant, 1 runtime"
}
}
}
page.md contains mechanically generated normalized SQL and touched tables:
# fp_1
## Normalized SQL
SELECT date_trunc(?, created_at), count(*) FROM analytics.orders WHERE status = ? AND created_at >= ? GROUP BY 1
## Tables touched
- analytics.orders
usage.json contains volatile stats, literal top values, and redacted samples. Use it for intent inference and usage summaries. Do not treat usage-only drift as a reason to group this template with siblings.
Required Workflow
- Read the WorkUnit section in the prompt first.
- Call
read_raw_fileformetadata.json,page.md,usage.json, andmanifest.json. - Confirm
metadata.objectType === "historic_sql_template". If it is not, callemit_unmapped_fallbackwithreason: "parse_error",fallback: "flagged", and themetadata.jsonraw path. - Extract
fingerprint,sub_cluster_id,dialect,tables_touched,literal_slots, normalized SQL, usage stats, top literal values, and sample timestamps. - Infer one canonical query intent from this template only. Use table names, selected expressions, aggregations, joins, grouping, constant literal slots, and repeated successful samples. Runtime literal slots are parameters, not fixed business rules.
- Build a short intent slug in kebab-case. Use
queries/<intent_slug>as the wiki key. - Search existing knowledge with
wiki_searchusing the intent phrase and the primary table. Prefer updating an existingqueries/...page when it is the same intent. - Discover touched tables with
sl_discover. Add cleanly matched source names tosl_refs. If a table does not map cleanly, keep it in the page body and do not include it insl_refs. - Write or update the query page with
wiki_write. - Apply the SL proposal threshold below. If it passes and a useful generic measure, segment, join, or overlay is clear, update the semantic layer and run
sl_validate. - Exit without reading peer files or grouping sibling templates.
Wiki Page Shape
Use wiki_write for pages. Emit the spec frontmatter fields directly on the query page.
Use this shape:
{
"key": "queries/<intent_slug>",
"summary": "<one sentence canonical intent>",
"tags": ["historic-sql", "query-pattern"],
"sl_refs": ["<clean_source_name>"],
"source": "historic-sql",
"intent": "<human-readable canonical intent>",
"tables": ["<tables_touched>"],
"representative_sql": "<parameterized representative SQL>",
"usage": {
"executions": 47812,
"distinct_users": 12,
"first_seen": "2026-02-01",
"last_seen": "2026-04-30",
"p50_runtime_ms": 320,
"p95_runtime_ms": 1180,
"error_rate": 0.0007
},
"fingerprints": ["<fingerprint or sub-cluster id>"],
"content": "## <Canonical Intent Title>\n\n### Parameters\n- <constant/runtime/categorical slot notes>\n\n### When To Use\n- <concise reusable guidance>\n\n### Caveats\n- <redaction, service-account, low-confidence, or mapping notes if present>"
}
For Snowflake templates include usage.rows_produced when present in usage.json; for BigQuery v1 omit usage.rows_produced.
The key: "queries/<intent_slug>" value writes to knowledge/global/queries/<intent_slug>.md during external ingest because bundle ingests write global wiki pages.
Representative SQL Rules
- Start from normalized SQL in
page.md. - For constant slots, use the dominant
usage.literal_slots[].top_values[0][0]when it has definitional meaning. Quote string and date values in the representative SQL. - For runtime slots, render named parameters such as
:start_date,:as_of,:status, or:threshold. - For categorical slots, document the known categories and write this WorkUnit's sub-cluster value when
sub_cluster_idis present. - Preserve the warehouse dialect named by
metadata.properties.dialect. - Do not copy sample bound_sql into the wiki unless it is visibly redacted and safer than the normalized SQL. Prefer normalized SQL plus parameter notes.
SL Proposal Threshold
Only propose semantic-layer changes when all are true:
- This WorkUnit reached Stage 3 full tier. The runner normally guarantees this, but treat
executions_bucket=lowplusdistinct_users_bucket=soloorservice_account_only=trueas a reason to write wiki only. - At least one
literal_slots[]entry hasclassification: "constant"and the value has durable business meaning, such as a status, plan tier, channel, threshold, or fixed category. - Every table in
tables_touchedmaps cleanly throughsl_discoverto an existing SL source.
When the threshold passes:
- Call
sl_read_sourcebefore editing an existing source. - Prefer adding a measure, segment, computed dimension, join, or manifest-backed overlay over creating a standalone SQL source.
- Use
sl_write_sourcefor a manifest-backed overlay only withname:plus additive fields such asmeasures:,segments:,description:, orjoins:. Do not includesql:,table:,grain:, orcolumns:on manifest-backed overlays. - Use
sl_edit_sourcefor targeted edits when the source file already exists. - Run
sl_validateafter every SL write or edit. - Keep runtime parameters as caller filters. Do not bake dates, user ids, ids, search strings, or other runtime slots into SL measures.
When the threshold does not pass, write the wiki page and set sl_refs for any cleanly discovered touched tables. A wiki-only result is valid.
Intent Inference Guidance
Prefer canonical intent names that describe the business question, not the SQL shape:
- Good:
queries/monthly-paid-order-count - Good:
queries/enterprise-contract-renewal-risk - Good:
queries/support-ticket-first-response-time - Weak:
queries/fp-1 - Weak:
queries/count-orders-group-by-date
Use the SQL shape to infer intent:
COUNT,SUM,AVG,MIN,MAX,GROUP BY, and date truncation usually indicate metrics or rollups.- Constant slots often name segments, statuses, tiers, regions, or thresholds.
- Runtime slots usually represent time windows, selected entities, or caller filters.
- Repeated successful samples from multiple human users make the page more durable.
- High error rates, service-account-only use, or old
last_seenvalues belong in caveats.
Boundaries
- Do not group sibling templates. Stage 4
historic_sql_curatorowns cross-template clustering and dedupe. - Do not read paths listed only in
peerFileIndex. - Do not create or update
historic_sql_curator. - Do not call
context_candidate_write; historic-SQL Stage 3 writes final wiki and optional SL artifacts directly. - Do not invent joins, measures, or definitions that are not supported by the normalized SQL, touched tables, literal slots, or existing SL sources.
- Do not copy unredacted sample
bound_sql, user emails, account ids, tokens, or free-text literal values into wiki or SL output. - Do not write SL changes when any touched table lacks a clean SL mapping.
- Do not finish after only an SL write. Always write or update the query knowledge page first so the canonical SQL pattern is searchable.