ktx/packages/context/skills/historic_sql_patterns/SKILL.md
Andrey Avtomonov c22248dabf
feat(context): add warehouse verification tools (#46)
* feat(context): add warehouse dialect dispatch

* feat(context): read warehouse scan catalog

* feat(context): add entity details verification tool

* feat(context): add ingest SQL verification tool

* feat(context): add raw warehouse discovery tool

* feat(context): expose warehouse verification tools to ingest

* docs(context): add ingest identifier verification protocol

* test(context): guard ingest identifier verification prompts

* chore(context): verify warehouse verification tools

* docs: add warehouse verification tools plan and spec

* fix(context): expose target warehouses to Notion ingest

* fix(context): update ingest prompts for warehouse verification tools

* fix(context): scope raw schema discovery to allowed connections

* fix(context): verify warehouse column display targets

* docs: add notion warehouse verification gap closure plan

* fix(context): include raw discovery connection names

* fix(context): expose warehouse targets for LookML and MetricFlow

* fix(context): pass connection config to ingest query executors

* fix(cli): enable read-only SQL probes for local ingest

* docs: add warehouse verification final v1 closure plan

* fix(context): align warehouse sql probe prompt shape

* docs: add warehouse verification prompt shape closure plan

* test(context): catch connectionless sql execution prompt examples

* fix(context): include connection name in sl capture sql example

* docs: add warehouse verification sql example closure plan

* fix(context): report structured entity detail misses

* docs: add warehouse verification structured target miss closure plan

* fix: report untracked squash merge conflicts

* feat: require ingest verification ledger

* fix: stabilize ingest wiki references
2026-05-13 13:43:23 +02:00

4.2 KiB

name description callers
historic_sql_patterns Identify recurring cross-table historic-SQL analytical intents from a bounded pattern shard and emit typed pattern evidence for deterministic wiki projection.
memory_agent

Historic SQL Patterns

Use this skill when the WorkUnit raw file is a patterns-input/part-0001.json style shard from the historic-sql adapter. Older staged bundles may still provide root patterns-input.json; when that is the WorkUnit raw file, read it the same way.

Required Workflow

  1. Read the WorkUnit notes first.
  2. Find the single pattern input file listed under the WorkUnit rawFiles section.
  3. Call read_raw_file for that exact raw file path.
  4. Identify recurring analytical intents that span at least two tables and have repeated usage signal.
  5. Emit one pattern evidence object per durable cross-table intent by calling emit_historic_sql_evidence.
  6. Set each evidence object's rawPath to the exact raw file path read in step 3.
  7. Stop after all pattern evidence has been emitted.

Every join column mentioned in pattern descriptions must be verified via entity_details for both sides of the join.

Identifier Verification Protocol

Before writing a wiki page or SL source on any topic:

  1. discover_data({query: "<topic>"}) - see what wikis, SL sources, and raw tables already exist. Prefer updating existing pages over creating new ones.

Before emitting any schema.table or schema.table.column into a wiki body, SL source, tables: frontmatter, sl_refs, or emit_unmapped_fallback:

  1. entity_details({connectionName, targets: [{display: "<identifier>"}]}) - confirm the identifier resolves; inspect native types, FK/PK, and sampleValues.
  2. For literal values from the source, such as status codes or plan tiers, check whether they appear in entity_details sampleValues for the relevant column. If sampleValues is short or the sample may have missed real values, run a sql_execution probe with the same warehouse connection name: sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"}).
  3. If the candidate identifier still does not resolve, do one of:
    • Use sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"}). If it errors, the identifier is fictional.
    • Wrap the identifier in [unverified - from <rawPath>] in the wiki body, citing the exact raw path that mentioned it.
    • When recording emit_unmapped_fallback with no_physical_table, include the failing probe error in clarification.
  4. Never copy <schema>.<table> placeholder strings from these instructions into output.

Evidence Shape

Each call to emit_historic_sql_evidence must use this shape:

{
  "kind": "pattern",
  "rawPath": "patterns-input/part-0001.json",
  "pattern": {
    "slug": "order-lifecycle-analysis",
    "title": "Order Lifecycle Analysis",
    "narrative": "Analysts compare order statuses with customer segments to understand lifecycle movement.",
    "definitionSql": "select o.status, count(*) from public.orders o join public.customers c on c.id = o.customer_id group by o.status",
    "tablesInvolved": ["public.orders", "public.customers"],
    "slRefs": ["orders", "customers"],
    "constituentTemplateIds": ["pg:1", "pg:2"]
  }
}

The pattern object must match patternOutputSchema; multiple calls together must form patternsArraySchema.

Pattern Selection Rules

  • Prefer patterns that involve two or more tables.
  • Prefer templates with executionsBucket at least 10-100 and distinctUsersBucket above solo usage.
  • Merge templates into one pattern only when the business intent is the same.
  • Use a stable kebab-case slug based on intent, not a template id.
  • Set definitionSql to the clearest representative SQL from a constituent template.
  • Set slRefs to source names when the source name is obvious from table names; omit uncertain refs rather than guessing.
  • Treat each pattern shard independently; do not read peer shard files from peerFileIndex.

Boundaries

  • Do not call wiki_write.
  • Do not call sl_write_source.
  • Do not call sl_edit_source.
  • Do not call context_candidate_write.
  • Do not create single-table pattern pages.
  • Do not copy credentials, tokens, user emails, or unredacted literals into evidence.