# Per-dialect SQL syntax notes, served on demand and scoped to the connection

> Refined spec. Intake draft: `todo/08-per-dialect-sql-syntax-notes.md`. Companion
> to `specs/07-analytics-skill-sql-craft.md`, which kept the analytics SQL craft
> dialect-agnostic and explicitly deferred per-dialect syntax to this spec.

## Problem

Spec 07 added universal, **dialect-agnostic** SQL-authoring craft to the `ktx-analytics` skill (`packages/cli/src/skills/analytics/SKILL.md`). That craft deliberately excludes anything that reads correctly on only one engine (no `QUALIFY`, no `strftime`/`julianday`, no backtick or `DB.SCHEMA.TABLE` FQTNs) because the flat skill is installed verbatim and an agent querying sqlite must never see Snowflake syntax.

But a large share of *real* correctness depends on exactly that excluded, engine-specific syntax:

- **Snowflake:** `DATABASE.SCHEMA.TABLE` FQTNs, double-quoted case-sensitive identifiers (unquoted folds to upper-case), VARIANT colon-paths (`col:field.sub::type`), `QUALIFY`.
- **BigQuery:** backtick FQTNs (`` `project.dataset.table` ``), `_TABLE_SUFFIX` for sharded/wildcard tables, `QUALIFY`, `JSON_VALUE`/`JSON_EXTRACT`.
- **sqlite:** `strftime`/`julianday`/`date()` for dates, no `QUALIFY`, `json_extract`.
- and the remaining supported engines (`postgres`, `mysql`, `clickhouse`, `sqlserver`/`tsql`), each with its own FQTN, quoting, date, top-N, and JSON conventions.

This guidance is genuinely useful to an agent writing SQL against a live database, but it must **not** pollute the flat dialect-agnostic skill. It belongs in a **dialect-aware** channel, surfaced only for the dialect the active connection actually uses, and selected from the project's own configured state: not guessed, not shown all at once.

## Generic use case

Any **ktx** project whose connections span more than one warehouse engine (say, a Snowflake warehouse plus a BigQuery export plus a local sqlite extract).
When the agent (or a human analyst the agent assists) writes SQL for a given connection, it should receive *that engine's* syntax conventions (FQTN form, identifier quoting, date functions, top-N idiom, semi-structured access) and nothing for the engines it is not querying. The need is independent of any benchmark: it is what "write correct SQL against this specific warehouse" requires on every multi-engine stack.

## Model

The change adds a **dialect-aware channel** alongside spec 07's flat skill. The following decisions are committed by this refinement; the implementer owns the exact prose and code.

### Delivery: a dynamic MCP tool (decision committed)

The draft posed two delivery mechanisms and asked the refinement to "weigh them before committing." This spec commits to **dynamic MCP delivery**: a new read-only MCP tool returns the syntax notes for a given `connectionId`, with the dialect resolved server-side from the connection's configured `driver`. The flat skill gains a one-line pointer to that tool. **No install-mechanism change is required.**

The alternative, **multi-file skill delivery** (bundle `reference/.md` files and point the skill at the matching one), is **rejected** for **ktx**, for reasons that hold regardless of how the skill is otherwise authored:

1. **It cannot scope on two of the six install targets.** Cursor (`.cursor/rules/ktx-analytics.mdc`) and OpenCode (`.opencode/commands/ktx-analytics.md`) are physically **single-file**; `setup-agents.ts` flattens the skill to one file there. A bundled `reference/` directory degenerates to "concatenate every dialect into one file," so a sqlite agent would see Snowflake VARIANT syntax, **failing this spec's core no-leak criterion on those targets** and defeating progressive disclosure (everything is in context at once). The MCP tool behaves **identically on all six targets** because it is a tool call, not an installed file.
2. **Selecting the dialect is a deterministic operation, so it belongs in code, not model judgment.** Anthropic's skill-authoring guidance explicitly says to *"prefer scripts [tools] for deterministic operations."* With bundled files the **model** must infer that connection X is Snowflake and open the right file, and on a multi-connection project it can open the wrong one. With the tool, the **server** resolves `driver → dialect` from `ktx.yaml` state and returns exactly the right notes.
3. **It needs a delivery subsystem that the tool does not.** Multi-file delivery requires reworking `readAnalyticsSkillContent`, `installTarget`, `plannedKtxAgentFiles`, the install manifest (a directory variant), `removeKtxAgentInstall`, and `writeClaudeDesktopSkillBundle`, plus a concatenation transform for the single-file targets. The MCP tool requires one read-only handler and one skill pointer.
4. **The dependency is free.** The `ktx-analytics` skill already hard-depends on the **ktx** MCP server: its entire workflow is calling `discover_data`, `entity_details`, `sql_execution`, and so on. Wherever the server is down, the skill is already non-functional; the tool adds **no new dependency**.
5. **Dropping Cursor/OpenCode does not change this.** Removing those targets would make multi-file delivery *possible*, but it would not make it better: reasons 2–4 stand, and the drop is a disproportionate cost (Cursor is a major target) to neutralize a constraint the tool handles for free. Whether **ktx** supports those targets is a separate product decision and is out of scope here.

This is consistent with Anthropic's progressive-disclosure goal (load the relevant material on demand, at zero context cost until needed), which the tool satisfies (its output costs context only when called) while resolving *which* dialect from state rather than from a model guess. Reference: [Skill authoring best practices](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices).
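
Reason 2 above argues that dialect selection is deterministic and belongs in server code. A minimal TypeScript sketch of that resolution follows, assuming a simple in-memory connection table. Every name in it (`SketchConnection`, `resolveDialectSketch`, `SKETCH_NOTES`, `dialectNotesForConnection`) is a hypothetical stand-in, not the package's actual code; the real resolver and driver registry live in the **ktx** package and may be shaped differently.

```typescript
// Hypothetical stand-ins throughout: the real registry, resolver, and note
// storage live in the ktx package and may differ from this sketch.
type SketchDialect =
  | "postgres" | "mysql" | "snowflake" | "bigquery"
  | "sqlite" | "clickhouse" | "tsql";

interface SketchConnection {
  id: string;
  driver: string;
}

// Assumed warehouse driver ids, copied from the list given in this spec.
const SKETCH_WAREHOUSE_DRIVERS: readonly string[] = [
  "postgres", "mysql", "snowflake", "bigquery", "sqlite", "clickhouse", "sqlserver",
];

// Stand-in for the single driver -> dialect map, with a postgres default.
const SKETCH_DRIVER_TO_DIALECT: Record<string, SketchDialect> = {
  postgres: "postgres", mysql: "mysql", snowflake: "snowflake", bigquery: "bigquery",
  sqlite: "sqlite", clickhouse: "clickhouse", sqlserver: "tsql",
};

function has(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function resolveDialectSketch(driver: string): SketchDialect {
  return has(SKETCH_DRIVER_TO_DIALECT, driver) ? SKETCH_DRIVER_TO_DIALECT[driver] : "postgres";
}

// Placeholder bodies only; real notes would carry each engine's conventions.
const SKETCH_NOTES: Record<SketchDialect, string> = {
  postgres: "(postgres notes)", mysql: "(mysql notes)", snowflake: "(snowflake notes)",
  bigquery: "(bigquery notes)", sqlite: "(sqlite notes)", clickhouse: "(clickhouse notes)",
  tsql: "(tsql notes)",
};

function dialectNotesForConnection(
  connections: Record<string, SketchConnection>,
  connectionId: string,
): { connectionId: string; dialect: SketchDialect; notes: string } {
  if (!has(connections, connectionId)) {
    throw new Error(`Unknown connection: ${connectionId}`);
  }
  const connection = connections[connectionId];
  // Guard first: a non-SQL source must not fall through to the postgres default.
  if (SKETCH_WAREHOUSE_DRIVERS.indexOf(connection.driver) === -1) {
    throw new Error(`Connection "${connectionId}" is not a SQL warehouse connection`);
  }
  const dialect = resolveDialectSketch(connection.driver);
  return { connectionId, dialect, notes: SKETCH_NOTES[dialect] };
}

const sketchConnections: Record<string, SketchConnection> = {
  warehouse: { id: "warehouse", driver: "sqlserver" },
  dashboards: { id: "dashboards", driver: "metabase" },
};

console.log(dialectNotesForConnection(sketchConnections, "warehouse").dialect); // prints "tsql"
```

The point of the sketch is ordering: the warehouse check runs before the resolver, so the resolver's `postgres` default can never mask a non-SQL source. The model supplies only a `connectionId` and never chooses a dialect.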
### Scope derived from state, through the one existing resolver

Which dialect's notes the agent sees is **derived** from the connection's configured `driver`, via the resolver the rest of the system already uses: `sqlAnalysisDialectForDriver(driver)` in `packages/cli/src/context/sql-analysis/dialect.ts`. The same function already selects the dialect for `sql_execution`, `sl_query`, and the Python SQL-analysis daemon. This spec **must not** introduce a second driver→dialect map. The notes are **keyed by the resolved `SqlAnalysisDialect`** (so the SQL Server entry is keyed `tsql`, not `sqlserver`), tying the note key-space to the resolver's codomain so the two cannot drift.

### Authored per-engine notes are sanctioned static content

Enumerating syntax notes per engine is **not** a rotting denylist of bad specifics; FQTN form and identifier quoting are genuine, stable invariants of each engine, the kind of universal fact **ktx**'s design rules explicitly permit as static content. What must stay derived-from-state is note *selection* (the active dialect) and note *coverage* (every configured driver must resolve to notes that exist), both of which this spec ties to the connector registry.

### The flat skill stays dialect-agnostic (spec 07 invariant preserved)

This work adds a *separate* channel. It does **not** amend spec 07's `` block or inline any dialect syntax into `SKILL.md`. Spec 07's acceptance criterion (no `QUALIFY`/`strftime`/`julianday`/backtick-FQTN/etc. in the flat skill) stays green. The only `SKILL.md` change is the pointer in requirement 3, which names the tool and contains no dialect syntax.

## Requirements

### 1. A read-only `sql_dialect_notes` MCP tool

Register a new tool beside the existing context tools (`packages/cli/src/context/mcp/context-tools.ts`). The tool name is the implementer's to finalize but should follow the existing snake_case convention (`entity_details`, `sql_execution`); `sql_dialect_notes` is the suggested name.
- **Input:** `{ connectionId }`, **required**, matching its siblings `entity_details`/`sql_execution`, which always take an explicit connection.
- **Output:** `{ connectionId, dialect, notes }` where `dialect` is the resolved `SqlAnalysisDialect` and `notes` is the markdown guidance for that dialect.
- **Resolution:** `connectionId → connection.driver → sqlAnalysisDialectForDriver(driver) → notes[dialect]`, reusing the existing resolver. Do not duplicate the driver→dialect map.
- **Guards:**
  - A **non-SQL context-source** connection (driver `metabase`, `looker`, `lookml`, `notion`, `dbt`, `metricflow`) returns a **clear "not a SQL warehouse connection" error**, not postgres notes. Gate on the existing `isDatabaseDriver()` (`packages/cli/src/connection-drivers.ts`).
  - For any **SQL warehouse** connection the resolver always yields a dialect with notes (all seven warehouse drivers are covered; see requirement 2); its built-in `postgres` default is a safety floor, so the tool never errors for a SQL connection and never emits a single-engine dialect (e.g. Snowflake) by accident.
- **Annotations:** read-only and idempotent, consistent with the other read tools.
- **Description (docs-grade, third person, states what and when):** e.g. *"Returns the SQL syntax conventions for a connection's dialect: FQTN form, identifier quoting and case-folding, date/time functions, top-N idiom, and semi-structured access. Use before authoring raw SQL against a connection so the SQL matches that engine."* The description drives the agent's decision to call the tool, so it must be specific.

### 2. Per-dialect note content

Author concise notes for each supported dialect against a **fixed rubric**, so every dialect answers the same questions. Each facet is a line or two of timeless, engine-true convention (no version-dated "as of vX" content), phrased as guidance with the engine reason where it helps, inheriting spec 07's heuristics-with-a-why tone. The rubric facets:

1. **FQTN form:** how to fully-qualify a table on this engine.
2. **Identifier quoting & case-folding:** quote character and how unquoted identifiers fold.
3. **Date/time:** the engine's date functions and common date-encoding idioms.
4. **Top-N / window-filtering idiom:** `QUALIFY` where supported; a CTE + outer-filter form where it is not; `TOP` for `tsql`.
5. **Semi-structured / JSON access:** VARIANT colon-paths, `JSON_VALUE`/`JSON_EXTRACT`, `->`/`->>`, `json_extract`, as applicable.
6. **Sharded / partition idiom** where the engine has one (e.g. BigQuery `_TABLE_SUFFIX`).

Constraints on the content:

- **Coverage = the reachable dialect set.** Every driver in the connector registry must resolve to a dialect that has non-empty notes. The reachable set is `postgres`, `mysql`, `snowflake`, `bigquery`, `sqlite`, `clickhouse`, and `tsql` (from `sqlserver`). Do **not** author notes for `duckdb`/`databricks`: they appear in the resolver map but no connector can produce them, so they are unreachable, matching the draft's "don't author for nonexistent drivers."
- **Keyed by `SqlAnalysisDialect`** (see Model).
- **Storage is the implementer's choice.** The notes MAY live as per-dialect markdown files inside the package (e.g. under the skill's directory) served by the tool, or as a typed map. If files are used they are **package-internal** (served by the tool, never installed onto an agent target) and already ship via the recursive `src/skills → dist/skills` copy (`packages/cli/scripts/copy-runtime-assets.mjs`); no `setup-agents.ts` change.
- **No benchmark, gold-answer, grader, or scoring references** anywhere in the notes.

The implementer must verify each engine's specifics against current official documentation (the well-known anchors above are starting points, not a substitute for checking the engine's docs).

### 3. The `SKILL.md` pointer (completes spec 07's deferral)

Add a **single one-line pointer** to the SQL-authoring step (step 4 "Plan" / step 5 "Query") of `packages/cli/src/skills/analytics/SKILL.md`, directing the agent to call the tool before writing raw SQL against a connection, e.g. *"Before writing raw `sql_execution` SQL, call `sql_dialect_notes` with the connection's id to get that engine's syntax conventions."* This is the pointer spec 07 deliberately did not add because the tool did not yet exist.

- The pointer **names the tool only**; it contains **no dialect syntax**, so the flat skill stays dialect-agnostic.
- Follow the skill's existing tool-reference convention. The skill currently names MCP tools by **bare** name (`discover_data`, `sql_execution`). Anthropic's guidance recommends **fully-qualified** `ServerName:tool` names to avoid "tool not found" when multiple MCP servers are present. Whether to fully-qualify the new pointer (and optionally retrofit the existing bare references) is a small, separable decision flagged for the maintainer, **not** a rename sweep this spec mandates.

### 4. Coverage is enforced from state, not by hand

A test must **derive** the required coverage from the connector registry rather than hardcoding a dialect list: enumerate the configured warehouse drivers (`warehouseDrivers` in `driver-schemas.ts` / `KTX_DATABASE_DRIVER_IDS` in `connection-drivers.ts`), resolve each through `sqlAnalysisDialectForDriver`, and assert each result has non-empty notes. Adding a connector later then **fails this test** until its dialect gets notes: the allowlist-from-state discipline rather than a hand-maintained list.

### 5. No dialect syntax leaks into the flat skill

Spec 07's content assertion over `analytics/SKILL.md` stays green: the flat skill (and its worked example) still contain no `QUALIFY`, `strftime`, `julianday`, backtick/`DB.SCHEMA.TABLE` FQTN, or other single-engine construct.
This spec adds a tool and a tool-pointer; it does not move dialect syntax into the skill.

### 6. Delivery is unchanged

`setup-agents.ts` (`readAnalyticsSkillContent`, `installTarget`, `writeClaudeDesktopSkillBundle`, `plannedKtxAgentFiles`) needs **no change**. The skill still installs as a single `SKILL.md` per target. Confirm the channel works on all six targets by virtue of being a tool call, including the single-file targets where multi-file delivery could not scope. The six targets are Claude Code, Claude Desktop (zip), Codex, universal `.agents`, Cursor (`.mdc`), and OpenCode (`.md`).

### 7. Coordination with specs 07 and 03

- **Spec 07** owns the dialect-agnostic `` block. This spec must not amend it; it adds the tool, the pointer, and the notes.
- **Spec 03** (`03-multi-connection-routing-in-analytics-skill`) threads `connectionId` through the skill's tool calls. The `sql_dialect_notes` pointer is `connectionId`-scoped and fits that routing; keep the pointer consistent with spec 03's `connectionId` rules and do not rewrite the routing it owns.

## Acceptance criteria

- An agent querying a **sqlite** connection gets sqlite date idioms and **never** sees Snowflake/BigQuery-only syntax; an agent querying **Snowflake** gets FQTN / identifier / VARIANT guidance.
- The dialect shown is **derived from the connection's configured `driver`** via the existing `sqlAnalysisDialectForDriver`, not hardcoded per project and not guessed. No second driver→dialect map is introduced.
- **Every configured warehouse driver** (`postgres`, `mysql`, `snowflake`, `bigquery`, `sqlite`, `clickhouse`, `sqlserver`) resolves to a dialect with non-empty notes, and the coverage test derives this from the registry.
- A **non-SQL context-source** connection (e.g. `metabase`, `notion`) yields a clear "not a SQL warehouse" response, **not** postgres notes.
- `analytics/SKILL.md` remains dialect-agnostic; spec 07's criteria are unaffected. The new pointer references the tool only and adds no dialect syntax.
- The channel installs/serves correctly across **all six** agent targets, including the single-file Cursor/OpenCode shape, with **no `setup-agents.ts` change**.
- The notes contain **no** benchmark/gold/grader/scoring references and **no** time-sensitive ("as of version X") content.

## Implementation orientation

Line numbers drift; treat these as anchors, not addresses. The implementer owns the design.

- **Dialect resolver (reuse, do not duplicate):** `packages/cli/src/context/sql-analysis/dialect.ts`: `sqlAnalysisDialectForDriver(driver)`, returning `SqlAnalysisDialect` (`./ports.ts`), default `postgres`.
- **Connector registry (drives coverage):** `packages/cli/src/connection-drivers.ts` (`KTX_DATABASE_DRIVER_IDS`, `isDatabaseDriver`) and `packages/cli/src/context/project/driver-schemas.ts` (`warehouseDrivers`, the per-driver `connectionConfigSchema`).
- **MCP tool registration:** `packages/cli/src/context/mcp/context-tools.ts` (register beside `connection_list`, `entity_details`, `sql_execution`); the `connectionId → driver → dialect` resolution already exists for `sql_execution` in `packages/cli/src/context/mcp/local-project-ports.ts`, so route the new tool through the same path.
- **The skill (one-line pointer only):** `packages/cli/src/skills/analytics/SKILL.md`: add the tool pointer in step 4/5; leave ``/``/``/`` otherwise intact.
- **Note storage (if files):** under the skill directory, shipped by `packages/cli/scripts/copy-runtime-assets.mjs`'s recursive copy; served by the tool, never installed.
- **Delivery (confirm unchanged):** `packages/cli/src/setup-agents.ts`.
- **Tests:** unit tests for resolution (including `sqlserver → tsql`, unknown → `postgres`, and non-warehouse rejection); a registry-derived coverage test (requirement 4); a content test that each dialect's notes cover the rubric facets and contain no banned tokens; and an extension of spec 07's `analytics/SKILL.md` content test asserting the new pointer is present and the flat skill is still dialect-clean.

Rebuild and re-link the dev binary so the playground picks up the change: `pnpm run build && pnpm run link:dev`.

## Benchmark context (motivation only)

The Spider 2.0-Lite v9 harnesses' only per-dialect content was Snowflake (`DB.SCHEMA.TABLE` FQTNs, double-quoted lower-case columns, VARIANT colon-paths), BigQuery (backtick FQTNs, `_TABLE_SUFFIX` for sharded tables), and sqlite (`strftime`/`julianday`). That content is real and useful but engine-specific; spec 07 kept it out of the flat skill and deferred it here so the dialect-agnostic rules stay clean. Delivering it through a dialect-scoped **ktx** tool generalizes the same correctness benefit to every multi-engine **ktx** project; improving the benchmark score is a side effect, not the goal, and the shipped skill contains no trace of the benchmark.

## Implementation notes

Implemented on branch `write-feature-spec-wiki`, alongside spec 07. The committed decision (dynamic MCP delivery, not multi-file skill bundling) was implemented as specified, with no `setup-agents.ts` change.

**What was built**

- Per-dialect notes are markdown files under `packages/cli/src/context/sql-analysis/dialects/.md` (one each for `postgres`, `mysql`, `snowflake`, `bigquery`, `sqlite`, `clickhouse`, `tsql`), served by `sqlDialectNotes(dialect)` in `sql-analysis/dialect-notes.ts` (lazy read + cache, `postgres` fallback floor; the authored set is the `DIALECTS_WITH_NOTES` const). `duckdb`/`databricks` are intentionally unauthored (unreachable from any connector).
  Each note answers the fixed rubric: FQTN, identifier quoting/case-folding, date/time, top-N/window idiom, JSON/semi-structured, plus a sharded-table line for BigQuery. Engine specifics were verified against current docs via Context7 (Snowflake VARIANT colon-paths and unquoted→UPPER case-folding; BigQuery `_TABLE_SUFFIX`, `QUALIFY`, `JSON_VALUE`; ClickHouse `LIMIT n BY` and `JSONExtract*`, with no `QUALIFY`). The files are package-internal: `copy-runtime-assets.mjs` ships them to `dist`; they are never installed onto an agent target.
- New read-only MCP tool `sql_dialect_notes` (`context-tools.ts`): input `{ connectionId }` (required), output `{ connectionId, dialect, notes }`, read-only + idempotent annotations. It resolves through the **existing** `connectionId → connection.driver → sqlAnalysisDialectForDriver` path (no second driver→dialect map), implemented as the unconditional `dialectNotes` port in `local-project-ports.ts` via an extracted `resolveDialectNotesForConnection`. A non-SQL context source (gated by `isDatabaseDriver`) throws `KtxExpectedError` ("not a SQL warehouse"), not postgres notes, so the expected agent mistake stays out of Error Tracking.
- `connection-drivers.ts`: `KTX_DATABASE_DRIVER_IDS` is now an exported (`@internal`) readonly tuple so the coverage test derives required coverage from the registry; `isDatabaseDriver` behavior is unchanged.
- `skills/analytics/SKILL.md`: a single dialect-agnostic pointer in step 5 ("call `sql_dialect_notes` … to get that engine's FQTN, identifier-quoting, date, top-N, and JSON conventions"). It names the tool only; spec 07's `` block and its dialect-clean content test are untouched.
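
The registry-derived coverage idea mentioned above (requirement 4) can be sketched as a small pure function. This is an illustration under stated assumptions, not the test that was written: `REGISTRY_DRIVER_IDS`, `dialectForDriver`, `AUTHORED_NOTES`, and `driversMissingNotes` are hypothetical names, and the driver list and resolver map are simplified stand-ins for the real registry and resolver.

```typescript
// Hypothetical stand-ins: the real driver tuple, resolver, and notes are
// provided by the ktx package and are only approximated here.
const REGISTRY_DRIVER_IDS: readonly string[] = [
  "postgres", "mysql", "snowflake", "bigquery", "sqlite", "clickhouse", "sqlserver",
];

// Assumed resolver shape: a map with a postgres default. duckdb is mapped but
// has no notes, mirroring the "in the resolver map, unreachable" case.
const RESOLVER_MAP: Record<string, string> = {
  postgres: "postgres", mysql: "mysql", snowflake: "snowflake", bigquery: "bigquery",
  sqlite: "sqlite", clickhouse: "clickhouse", sqlserver: "tsql", duckdb: "duckdb",
};

function dialectForDriver(driver: string): string {
  return Object.prototype.hasOwnProperty.call(RESOLVER_MAP, driver)
    ? RESOLVER_MAP[driver]
    : "postgres";
}

// Placeholder note bodies keyed by resolved dialect.
const AUTHORED_NOTES: Record<string, string> = {
  postgres: "(postgres notes)", mysql: "(mysql notes)", snowflake: "(snowflake notes)",
  bigquery: "(bigquery notes)", sqlite: "(sqlite notes)", clickhouse: "(clickhouse notes)",
  tsql: "(tsql notes)",
};

// Returns every driver whose resolved dialect has missing or blank notes.
// Coverage is derived from the driver list passed in, not from a dialect list.
function driversMissingNotes(
  driverIds: readonly string[],
  resolve: (driver: string) => string,
  notes: Record<string, string>,
): string[] {
  return driverIds.filter((driver) => {
    const body = notes[resolve(driver)];
    return body === undefined || body.trim().length === 0;
  });
}

// Today's registry is fully covered; a newly registered driver is flagged.
console.log(driversMissingNotes(REGISTRY_DRIVER_IDS, dialectForDriver, AUTHORED_NOTES));
console.log(driversMissingNotes(REGISTRY_DRIVER_IDS.concat(["duckdb"]), dialectForDriver, AUTHORED_NOTES));
```

In this sketch a driver that is absent from the resolver map falls back to `postgres` and therefore counts as covered. The check only flags a new connector once the resolver maps it to a dialect that has no notes.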

**Tests**

- `test/context/mcp/dialect-notes.test.ts`: registry-derived coverage (a future connector fails the test until its dialect has notes), the full rubric per dialect, leak isolation (sqlite shows `strftime` and never `VARIANT`/`_TABLE_SUFFIX`; `QUALIFY` only on snowflake/bigquery; engine-exclusive markers stay put), no benchmark/grader or version-dated content, the postgres fallback, and `resolveDialectNotesForConnection` resolving sqlite / snowflake / `sqlserver→tsql` and rejecting a non-SQL source / unknown connection with `KtxExpectedError`; plus a guard that the `DIALECTS_WITH_NOTES` const and the `dialects/*.md` files stay in sync.
- `test/context/mcp/server.test.ts`: `sql_dialect_notes` added to the retained tool set + annotations assertion + a handler-routing test, and the regenerated `__snapshots__/mcp-tools-list.json`.
- `test/skills/analytics-skill-content.test.ts`: asserts the new pointer is present and the flat skill stays dialect-clean.

**Verification:** `tsc -p tsconfig.json` (src) clean; full default suite 393 files / 3001 passing; slow suite green (incl. `local-project-ports.test.ts`); all three `dead-code` checks clean; the `dialects/*.md` files copy into `dist`. Rebuilt and re-linked `ktx-dev`.

**Deviations / notes**

- Notes are stored as per-dialect markdown files (not a typed map, and not bundled `reference/*.md` skill files), a storage choice the spec sanctions; plain markdown is the most maintainable to edit. They are served by the tool and ship via a `copy-runtime-assets.mjs` entry (`src/context/sql-analysis/dialects → dist/…`); no `setup-agents.ts` change.
- `pnpm run type-check` still reports one pre-existing, unrelated error in `test/mcp-server-factory.test.ts` (committed in-flight MCP work on this branch); this change adds zero new type errors and does not touch that file.