# Add universal SQL-authoring craft to the ktx-analytics skill

> Priority: HIGH. The `ktx-analytics` skill currently tells the agent *which
> ktx tools to call and in what order*, but gives almost no guidance on
> *writing correct SQL*. In benchmark runs the agent reliably produced
> runnable SQL (0 execution errors) yet failed on correctness — precision,
> determinism, type mismatches, and answer completeness. These are universal
> analytics-engineering truths that every ktx user benefits from, so they
> belong in the shipped skill, not in any caller's prompt.

## Scope guard (read first)

Only **universally-true** SQL/analytics craft goes here — guidance that helps a
real ktx user querying a **live** database. The test for inclusion: *"Would this
advice be correct and useful for an analyst on a current, production database?"*

**Dialect-specific syntax is out of scope here.** The v9 harnesses' only
per-dialect content (Snowflake: `DB.SCHEMA.TABLE` FQTNs, double-quoted
lowercase cols, VARIANT colon-paths; BigQuery: backtick FQTNs, `_TABLE_SUFFIX`
for sharded tables; sqlite: `strftime`/`julianday`) is genuinely useful but
belongs in a **dialect-aware** location (per-driver notes), not this flat
skill. Track separately as a follow-up; the rules below must stay
dialect-agnostic.

Explicitly **do NOT** add (these are application/consumer concerns, not skill
concerns, and some are actively wrong for live data):
- Output-format contracts ("return a bare result set with exactly these
  columns, no prose"). The skill is for interactive analysis and already
  favors readable tables + summaries; a caller that needs a strict result
  shape specifies that itself.
- Anchoring relative time ("recent", "past N months") to `MAX(date)` of the
  data. On a live database "recent" means relative to *now*; this is only true
  for static snapshots and must not be baked into the product.
- Anything justified by a grader/scoring comparator.

## File

`packages/cli/src/skills/analytics/SKILL.md` (the shipped skill;
`setup-agents.ts` installs it into agent environments — the copy under a
project's `.claude/skills/` is regenerated from this source). Extend the
existing `<rules>` block and step 5 ("Query") / step 6 ("Validate and
explain"); keep the existing interactive guidance intact.

## Requirements — add these as general rules (behavior only, no rationale that
references answers/graders)

**Schema discovery before writing SQL**
1. Inspect representative sample rows of each table before composing SQL —
   confirm date/time encoding (e.g. `YYYYMMDD` vs ISO vs epoch), null
   prevalence in join/filter keys, and the actual set of categorical/enum
   values. (`entity_details` + a small `sql_execution` sample.)
2. Cast a column to its real type before comparing it in `WHERE`/`JOIN`. A
   string column compared against a numeric literal (or vice versa) can
   silently match nothing.

**Composition discipline**
3. Build complex queries incrementally — one CTE at a time, verifying each
   layer's output on a small sample before stacking the next.
4. Avoid joins that fan out row counts. Add columns only from tables already
   required by the grain, or pre-aggregate to the target grain before joining.

**Window-function correctness**
5. Give every ranking/ordering window function a complete, deterministic
   tie-breaker (append unique key columns), so `RANK`/`ROW_NUMBER`/`LAG`
   results are stable rather than flickering across runs.
6. Apply row filters **after** window functions for sequence / "first" /
   "most recent" / "since" questions — compute over the full partition, then
   filter.

**Numeric precision**
7. Compute at full precision; round only in the final projection, never inside
   intermediate CTEs.
8. Be explicit about truncation (`CAST AS INT` truncates; use explicit
   rounding when rounding is intended).
9. Distinguish "average of per-group averages" (macro: `AVG(group_metric)`)
   from "overall/weighted average" (micro: `SUM(num)/SUM(den)`) based on the
   question's wording.

**Answer completeness / interpretation**
10. "top / highest / most / lowest" → return only the winning row(s) (e.g.
    `RANK() = 1` / `QUALIFY`), not the full ranked list, unless a list is asked
    for.
11. "for each X / per X / by X" → exactly one row per X; don't collapse to a
    single value unless the question says "overall" or "total across X".
12. When a question asks for inputs and a derived value ("X, Y, and their
    ratio"), include the inputs as columns alongside the derived value.
13. When grouping by a human-readable label (a name), also expose the entity's
    identifier — identity, not just the label, is part of the result.
14. When a result is unexpectedly empty, relax filters one at a time to find
    which predicate removed the rows.

## Acceptance criteria

- The shipped `analytics/SKILL.md` contains the rules above, phrased as general
  truths with **no reference to any benchmark, gold answer, or scoring
  comparator**.
- Existing interactive guidance (compact result tables, summaries,
  clarification prompts, the tool-order workflow) is preserved — the skill must
  still read well for an interactive human-facing analysis session.
- None of the excluded items (output-shape contract, `MAX(date)` anchoring,
  grader-driven advice) appear.
- Skill stays within a reasonable size; group the new rules under clear
  sub-headings so they're scannable.

## Benchmark context (motivation only)

On the Spider 2.0-Lite sqlite subset, the solver produced 0 execution errors
but ~50 result mismatches; a large share traced to exactly these gaps
(premature rounding, string-vs-number compares, non-deterministic window
ordering, returning full lists for "top" questions, dropping inputs to derived
values). These are generic SQL-authoring defects — fixing them in the skill
improves ktx for everyone and, as a side effect, the benchmark.