}` | **Standalone** with `sql: SELECT * FROM WHERE ` | Enforcement, not opt-in |
| `explore: { join: Y { sql_on: …; relationship: … } }` | `joins:` entry `{ to: Y, on: " = Y.", relationship: … }` | On the overlay or standalone |
| `conditionally_filter` / `always_filter` | `segments: [{ name, expr }]` | Callers reference by name |
-| Manifest entry | `_schema/*.yaml` | **Never edit** — auto-imported |
+| Manifest entry | `_schema/*.yaml` | **Never edit** - auto-imported |
Type map: `date`/`datetime`/`timestamp` → `time`; `yesno` → `boolean`; `number` → `number`; `string` → `string`. Ignore `drill_fields:` (UI only).
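The type map above can be sketched as a lookup table. This is a minimal illustration, not part of the skill's tooling: the names `LOOKML_TO_KTX` and `ktx_type` are hypothetical, and only the six pairs listed in the type map are covered.

```python
# Hypothetical helper: the pairs below are copied from the type map above.
LOOKML_TO_KTX = {
    "date": "time",
    "datetime": "time",
    "timestamp": "time",
    "yesno": "boolean",
    "number": "number",
    "string": "string",
}


def ktx_type(lookml_type: str) -> str:
    """Return the KTX column type for a LookML type, or fail loudly."""
    try:
        return LOOKML_TO_KTX[lookml_type]
    except KeyError:
        # Anything outside the documented map is surfaced, not guessed.
        raise ValueError(f"no KTX mapping for LookML type {lookml_type!r}") from None
```

Raising on unknown types, rather than defaulting to `string`, keeps an unmapped LookML type from silently producing a wrong column type.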
@@ -92,14 +92,14 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
`sql_execution({connectionName: "warehouse", sql: "SELECT 1 FROM analytics.orders LIMIT 0"})`.
Replace `warehouse`, `analytics`, and `orders` with the verified connection,
schema or dataset, and table from the WorkUnit evidence.
-3. Use only those names in `sql:`, `columns:`, and `grain:`. Map each `dimension_group` to ONE `{ name: , type: time, role: time }` entry — never one per timeframe.
+3. Use only those names in `sql:`, `columns:`, and `grain:`. Map each `dimension_group` to ONE `{ name: , type: time, role: time }` entry - never one per timeframe.
| LookML input | KTX `columns:` entry |
|---|---|
| `dimension_group: month { type: time; timeframes: [month]; sql: ${TABLE}.month_date ;; }` | `{ name: month_date, type: time, role: time }` |
-| `dimension_group: date { type: time; timeframes: [raw, date, week, month]; sql: ${TABLE}.date ;; }` | `{ name: date, type: time, role: time }` — single entry, NOT `date_raw`/`date_date`/`date_week` |
+| `dimension_group: date { type: time; timeframes: [raw, date, week, month]; sql: ${TABLE}.date ;; }` | `{ name: date, type: time, role: time }` - single entry, NOT `date_raw`/`date_date`/`date_week` |
-**After every `sl_write_source`**: call `sl_validate`. It runs `SELECT * FROM () LIMIT 0` against the connection. If a column name was invented, the warehouse's `Unrecognized name: …` error comes back verbatim. Treat that as a hard failure — re-read the real columns with `sl_discover` and rewrite.
+**After every `sl_write_source`**: call `sl_validate`. It runs `SELECT * FROM () LIMIT 0` against the connection. If a column name was invented, the warehouse's `Unrecognized name: …` error comes back verbatim. Treat that as a hard failure - re-read the real columns with `sl_discover` and rewrite.
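The one-entry-per-`dimension_group` rule from step 3 might look like this in code. It is a sketch under two assumptions: the group's `sql:` is a plain `${TABLE}.<column>` reference, as in both table rows above, and the hypothetical helper `time_column` is not an existing tool.

```python
import re


def time_column(dimension_group_sql: str) -> dict:
    """Collapse a LookML dimension_group into ONE KTX time column entry.

    The timeframes list is deliberately ignored: raw/date/week/month all
    map to the same physical column, so only one entry is produced.
    """
    match = re.fullmatch(r"\s*\$\{TABLE\}\.(\w+)\s*(;;)?\s*", dimension_group_sql)
    if match is None:
        # Assumption: anything more complex than ${TABLE}.col needs manual review.
        raise ValueError(f"unsupported dimension_group sql: {dimension_group_sql!r}")
    return {"name": match.group(1), "type": "time", "role": "time"}
```

The entry name comes from the physical column in `sql:`, not from the `dimension_group` name, which matches the `month` → `month_date` row in the table.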
## Provenance markers
@@ -110,13 +110,13 @@ When a wiki mixes LookML source prose with `sl_discover` output, tag sections:
Customers fan out many-to-one into `accounts` via `account_id`.
-`customers.admin_user_id` is nullable — orphan rows exist.
+`customers.admin_user_id` is nullable - orphan rows exist.
```
Invisible in most renderers; lets a future pass audit provenance.
-## Example 1 — overlay (thin wrapper)
+## Example 1 - overlay (thin wrapper)
LookML (excerpt):
@@ -154,7 +154,7 @@ joins:
relationship: many_to_one
```
-## Example 2 — standalone from `derived_table`
+## Example 2 - standalone from `derived_table`
```lookml
view: lab_results {
@@ -188,7 +188,7 @@ measures:
- { name: avg_delta, expr: "avg(delta)" }
```
-## Example 3 — standalone with `sql_always_where`
+## Example 3 - standalone with `sql_always_where`
```lookml
view: rpt_daily_braze_email {
diff --git a/packages/context/skills/metabase_ingest/SKILL.md b/packages/context/skills/metabase_ingest/SKILL.md
index d35166dc..af25288f 100644
--- a/packages/context/skills/metabase_ingest/SKILL.md
+++ b/packages/context/skills/metabase_ingest/SKILL.md
@@ -79,12 +79,12 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
For each card:
1. Analyze `resolvedSql` + `resultMetadata`: identify base tables, aggregations, joins, filters, column types.
-2. **REQUIRED before any write**: call `sl_discover` for every candidate target source name. The response tells you whether the name is manifest-backed (`Type: table` or `Type: sql`). For manifest-backed names you MUST use the overlay shape (`name:` + `measures:`/`segments:`/`description:` only — no `sql:`, `table:`, `grain:`, or `columns:`); the tool will reject a standalone write and you'll have wasted the call. If `sl_discover` returns nothing for the name, you can write a standalone source. Also call `sl_read_source` on existing sources you intend to extend so you don't duplicate measures.
+2. **REQUIRED before any write**: call `sl_discover` for every candidate target source name. The response tells you whether the name is manifest-backed (`Type: table` or `Type: sql`). For manifest-backed names you MUST use the overlay shape (`name:` + `measures:`/`segments:`/`description:` only - no `sql:`, `table:`, `grain:`, or `columns:`); the tool will reject a standalone write and you'll have wasted the call. If `sl_discover` returns nothing for the name, you can write a standalone source. Also call `sl_read_source` on existing sources you intend to extend so you don't duplicate measures.
3. Include `rawPaths: ["cards/.json"]` on every `sl_write_source`, `sl_edit_source`, and `wiki_write` call. If one artifact generalizes multiple near-duplicate cards, include each contributing card path and no unrelated cards.
4. Decide:
- Simple aggregation on a table that already has a source → `sl_edit_source` to add a measure.
- Join between tables that should be linked in the SL graph → `sl_edit_source` to add a join.
- - Complex derived SQL (CTEs, multi-layer aggregation, scoring models) → `sl_write_source` with `source_type: sql`. When the SQL projects/filters from a single manifest-backed base table, set `inherits_columns_from: ` so columns inherit type and description from the manifest — see `sl_capture` skill for the slim form. Use `sl_discover` to discover the manifest key from the table reference in the SQL (it accepts `MARTS.CONSIGNMENTS`, `ANALYTICS.MARTS.CONSIGNMENTS`, or `CONSIGNMENTS`).
+ - Complex derived SQL (CTEs, multi-layer aggregation, scoring models) → `sl_write_source` with `source_type: sql`. When the SQL projects/filters from a single manifest-backed base table, set `inherits_columns_from: ` so columns inherit type and description from the manifest - see `sl_capture` skill for the slim form. Use `sl_discover` to discover the manifest key from the table reference in the SQL (it accepts `MARTS.CONSIGNMENTS`, `ANALYTICS.MARTS.CONSIGNMENTS`, or `CONSIGNMENTS`).
- New base table not yet in the semantic layer → `sl_write_source` with `source_type: table`.
- Trivial query (`SELECT *`, simple `COUNT(*)` with no business logic) → do nothing; the runner will record this card as `action_type='skipped'`.
- Duplicate of an existing measure → same as trivial; do nothing for this card.
@@ -98,11 +98,11 @@ measures:
expr: ""
```
-Overlay shape: `name:` plus any of `measures:`, `segments:`, `descriptions:`, `joins:`, `disable_joins:`. Never include `sql:`, `table:`, `grain:`, or `columns:` on a manifest-backed name — those would shadow the manifest's schema and drop its joins. Overlay `joins:` are merged additively with the manifest's joins (deduped by `to` + `on`); use `disable_joins: [""]` to suppress a specific manifest join. After the overlay exists, use `sl_edit_source` for further tweaks. See `sl_capture` skill for the canonical overlay rule.
+Overlay shape: `name:` plus any of `measures:`, `segments:`, `descriptions:`, `joins:`, `disable_joins:`. Never include `sql:`, `table:`, `grain:`, or `columns:` on a manifest-backed name - those would shadow the manifest's schema and drop its joins. Overlay `joins:` are merged additively with the manifest's joins (deduped by `to` + `on`); use `disable_joins: [""]` to suppress a specific manifest join. After the overlay exists, use `sl_edit_source` for further tweaks. See `sl_capture` skill for the canonical overlay rule.
**Join discovery:** When your card's SQL references warehouse tables (e.g. in `FROM` or `JOIN` clauses), call `sl_discover({ query: '' })` before writing. The matching manifest entry's `name` is the value you use in `joins: [- to: ]` only when the card output exposes a local key that matches the target source grain (for example `account_id = mart_account_segments.account_id`). Do not declare a KTX join just because the card SQL joins that table internally. If the output only exposes display fields such as `account_name`, keep the SQL source self-contained or project the key before adding the join. Use `many_to_one` for FK-to-dimension joins, `one_to_many` for the reverse.
-**Hard rule on join columns (prevents broken joins):** For every join you declare, the local column on the left of `on:` MUST be (a) present in your source's projected output and (b) a key/ID column, never a display value. If the natural FK isn't in your SELECT, add it to SELECT before declaring the join. Joining `account_name = mart_account_segments.account_id` is always wrong — names are not identifiers and the equality produces zero matches. The validator rejects this with a "display value to identifier" error; the tool will refuse to save it. Add `account_id` to your SELECT and join on `account_id = mart_account_segments.account_id`, or omit the join entirely.
+**Hard rule on join columns (prevents broken joins):** For every join you declare, the local column on the left of `on:` MUST be (a) present in your source's projected output and (b) a key/ID column, never a display value. If the natural FK isn't in your SELECT, add it to SELECT before declaring the join. Joining `account_name = mart_account_segments.account_id` is always wrong - names are not identifiers and the equality produces zero matches. The validator rejects this with a "display value to identifier" error; the tool will refuse to save it. Add `account_id` to your SELECT and join on `account_id = mart_account_segments.account_id`, or omit the join entirely.
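As a rough pre-flight check for the hard rule above, something like the following could be run before declaring a join. This is only a sketch: `check_join_on` is a hypothetical helper, not the validator itself, and treating "ends with `_id`" as the test for a key column is an assumption made for illustration.

```python
def check_join_on(on: str, projected: set[str], target: str) -> list[str]:
    """Return a list of problems with a join `on:` string (empty if none)."""
    left, sep, right = (part.strip() for part in on.partition("="))
    if not sep:
        return [f"'{on}' is not an equality"]
    problems = []
    if left not in projected:
        problems.append(f"local column '{left}' is not in the projected output")
    # Assumption: key columns are named `id` or `<something>_id`.
    if not (left == "id" or left.endswith("_id")):
        problems.append(f"local column '{left}' looks like a display value, not a key")
    if not right.startswith(target + "."):
        problems.append(f"right side '{right}' must be qualified with '{target}.'")
    return problems
```

With the example from the rule, `account_name = mart_account_segments.account_id` yields one problem (display value on the left), while `account_id = mart_account_segments.account_id` yields none, provided `account_id` is in the SELECT.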
## priorProvenance
@@ -114,7 +114,7 @@ If the WU prompt includes a `priorProvenance` section for a card, it tells you w
## Deduplication
-Before writing, scan all cards in this WU for near-duplicate groups — cards whose `resolvedSql` shares the same CTEs, base tables, joins, and aggregation structure but differs only in:
+Before writing, scan all cards in this WU for near-duplicate groups - cards whose `resolvedSql` shares the same CTEs, base tables, joins, and aggregation structure but differs only in:
- Trailing filters (e.g. `date_trunc(week, date)` vs `date_trunc(month, date)`).
- Minor `WHERE` clause variations.
- Column aliases or output column subsets.
@@ -124,7 +124,7 @@ When you find a group of near-duplicates:
1. Create ONE generalized source from the most comprehensive card in the group.
2. Strip card-specific trailing filters from the SQL so the source covers all variants (e.g. keep daily grain instead of filtering to week/month).
3. If each card had a distinct measure or filter, add them as separate measures on the single source.
-4. For all cards except the canonical one, do nothing — they'll be recorded as `action_type='skipped'` automatically by the runner.
+4. For all cards except the canonical one, do nothing - they'll be recorded as `action_type='skipped'` automatically by the runner.
Do NOT merge cards with fundamentally different business logic, even if they share CTEs.
@@ -132,7 +132,7 @@ Do NOT merge cards with fundamentally different business logic, even if they sha
When a card's `resolvedSql` contains `GROUP BY` with aggregation functions (`SUM`, `COUNT`, `AVG`, …):
-1. **Detect**: simple aggregation on base tables/joins — `SELECT` with `GROUP BY`, no complex CTEs or window functions.
+1. **Detect**: simple aggregation on base tables/joins - `SELECT` with `GROUP BY`, no complex CTEs or window functions.
2. **Decompose**: strip the `GROUP BY` and aggregation functions. Keep `FROM`, `JOIN`, and `WHERE` intact.
3. **Expose row-level columns**: include the grouped-by columns AND the raw columns being aggregated (e.g. `money_out` instead of `SUM(money_out) AS total_money_out`).
4. **Define aggregations as measures**: convert each aggregation into a KSL measure (e.g. `sum(money_out)`).
@@ -144,17 +144,17 @@ Exception: keep the pre-aggregated SQL when the query involves multi-CTE pipelin
Every card carries a `resolvedSql` field. Check the staged card's `resolutionStatus` first:
-- `resolutionStatus: "resolved"` — `{{#N}}` references are inlined and `[[ ... ]]` optional clauses have been dropped locally. If the resolved SQL contains no other parameters the SQL is executable as-is. If the card had **required** (non-bracketed) `{{ var }}` placeholders, the SQL is prefixed with a placeholder-warning comment block listing every dummy substitution Metabase made — see "Step A" below.
-- `resolutionStatus: "fallback"` — Metabase failed to resolve. The SQL still contains `{{#N}}`, `{{#N-name}} alias`, `{{ var }}`, and `[[ ... ]]` syntax. Do the translation steps below before writing a source.
+- `resolutionStatus: "resolved"` - `{{#N}}` references are inlined and `[[ ... ]]` optional clauses have been dropped locally. If the resolved SQL contains no other parameters the SQL is executable as-is. If the card had **required** (non-bracketed) `{{ var }}` placeholders, the SQL is prefixed with a placeholder-warning comment block listing every dummy substitution Metabase made - see "Step A" below.
+- `resolutionStatus: "fallback"` - Metabase failed to resolve. The SQL still contains `{{#N}}`, `{{#N-name}} alias`, `{{ var }}`, and `[[ ... ]]` syntax. Do the translation steps below before writing a source.
-### Step A — Handle dummy-substituted placeholders (resolved cards only)
+### Step A - Handle dummy-substituted placeholders (resolved cards only)
When a card has a required `{{ var }}` outside any `[[ ]]` block, the resolver substitutes a **dummy value** purely so Metabase's parser will accept the query. The resulting SQL is prefixed with a comment like:
```sql
-- PLACEHOLDER_WARNING: this SQL was extracted from a Metabase card with
-- unbound template parameters. The placeholders below were substituted with DUMMY
--- values to satisfy Metabase's parser — they DO NOT represent intended filters.
+-- values to satisfy Metabase's parser - they DO NOT represent intended filters.
-- Drop the corresponding clauses (or expose them as runtime SL filters) before
-- persisting this SQL as a semantic-layer source.
-- {{ auction_end }} (type=dimension, widget=date/all-options) → '2020-01-01~2020-12-31'
@@ -165,7 +165,7 @@ WHERE start_date >= '2020-01-01' AND start_date < '2021-01-01' AND status = 'pla
For each listed placeholder: locate the WHERE clause(s) in the SQL that reference the dummy literal and **drop them**, then strip the warning comment. SL chat-time filters compose narrowing predicates dynamically, so the source should represent the unfiltered dataset.
-For `fallback` cards, dropping is simpler — the SQL still has the `[[ ... ]]` brackets and `{{ var }}` placeholders intact:
+For `fallback` cards, dropping is simpler - the SQL still has the `[[ ... ]]` brackets and `{{ var }}` placeholders intact:
```sql
-- before:
@@ -177,18 +177,18 @@ WHERE 1=1
WHERE 1=1
```
-### Step B — Inline `{{#N}}` references (fallback cards only)
+### Step B - Inline `{{#N}}` references (fallback cards only)
Resolved cards already have `{{#N}}` inlined for you. For `fallback` cards, each `{{#N}}` (or `{{#N-some-slug}}`) in the SQL refers to another card's `resolvedSql`. The referenced card is in the WU's `rawFiles` or `dependencyPaths`. Read it with `read_raw_file`, then inline its SQL.
If the reference has an alias (`from {{#5996-listing-interactions}} tb`), the **outer** SQL probably uses that alias (`select tb.* ...`, `tb.column_name`, etc.). When you inline, you must EITHER:
-1. **Pick a single base table inside the inlined SQL and rename its alias to the outer alias.** Useful when the inlined card is `SELECT * FROM listings JOIN ...` — set the LISTINGS alias to `tb` and `tb.*` keeps working in the outer query.
+1. **Pick a single base table inside the inlined SQL and rename its alias to the outer alias.** Useful when the inlined card is `SELECT * FROM listings JOIN ...` - set the LISTINGS alias to `tb` and `tb.*` keeps working in the outer query.
2. **Replace the outer alias references with explicit columns from the inlined SQL.** Useful when the inlined card has multiple JOINs and `tb.*` is ambiguous.
Never leave the outer alias dangling: after inlining, **grep your SQL for the outer alias name and rewrite or remove every reference**. A leftover `tb.*` with no `tb` table is the most common failure mode here.
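The "grep for the outer alias" step can be approximated with a small regular-expression scan. The function name `alias_references` is hypothetical, and the scan is purely textual: it lists every `alias.column` or `alias.*` reference so each one can be rewritten or removed by hand; it does not parse SQL.

```python
import re


def alias_references(sql: str, alias: str) -> list[str]:
    """List every `alias.*` / `alias.column` reference found in the SQL text."""
    pattern = rf"\b{re.escape(alias)}\.(?:\*|\w+)"
    return re.findall(pattern, sql, flags=re.IGNORECASE)
```

After inlining, an empty result for the outer alias (or a result whose every hit is backed by a table that actually carries that alias) is the signal that nothing was left dangling.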
-### Step C — Inlining cleanup checklist
+### Step C - Inlining cleanup checklist
After Steps A and B, your SQL must:
@@ -209,11 +209,11 @@ For `source_type: sql`:
- If `sl_discover` resolves the table, it is not outside the manifest. Do not write an `unmapped-table-*` fallback for resolved `orbit_raw`, `mart`, or other manifest-backed sources just because they appear inside card SQL.
- If `sl_discover` cannot resolve a referenced table at all, write a single-line `wiki_write` with key `unmapped-table-` and `rawPaths: ["cards/.json"]` so the gap is documented, then call `emit_unmapped_fallback` with the staged card path as `rawPath`, `reason: "missing_target_table"`, `tableRef: ""`, and `fallback: "wiki_only"`. Do not use this fallback if `sl_discover` resolved the table/source.
-Joins on manifest-backed names compose: the manifest's joins are inherited automatically, and any overlay `joins:` are merged on top (deduped by `to` + `on`). Use `disable_joins: [""]` in the overlay to suppress a specific manifest join. If `sl_discover` shows a manifest-backed source with `Joins: 0` and the warehouse FK metadata is genuinely absent, declaring application-level joins via the overlay is fair game — bootstrap with `sl_write_source` (overlay shape above), then refine via `sl_edit_source`.
+Joins on manifest-backed names compose: the manifest's joins are inherited automatically, and any overlay `joins:` are merged on top (deduped by `to` + `on`). Use `disable_joins: [""]` in the overlay to suppress a specific manifest join. If `sl_discover` shows a manifest-backed source with `Joins: 0` and the warehouse FK metadata is genuinely absent, declaring application-level joins via the overlay is fair game - bootstrap with `sl_write_source` (overlay shape above), then refine via `sl_edit_source`.
## Cross-card references (`{{#N}}`)
-Resolved cards (`resolutionStatus: "resolved"`) have these inlined for you. Unresolved cards (`resolutionStatus: "fallback"`) need manual handling — see "SQL translation from raw native to KSL" above.
+Resolved cards (`resolutionStatus: "resolved"`) have these inlined for you. Unresolved cards (`resolutionStatus: "fallback"`) need manual handling - see "SQL translation from raw native to KSL" above.
## Provenance markers
@@ -237,7 +237,7 @@ Source definitions must follow ktx-sl YAML conventions:
- `columns`: all columns with correct types (`string`, `number`, `time`, `boolean`).
- Time columns: mark with `role: time`.
- `joins`: use correct `relationship` types (`many_to_one` for FK→PK, `one_to_many` for reverse).
-- `joins.on`: `local_column = TARGET_SOURCE.target_column` — the right side MUST include the target source name.
+- `joins.on`: `local_column = TARGET_SOURCE.target_column` - the right side MUST include the target source name.
- `measures.expr`: aggregation expression (e.g. `"sum(amount)"`); optional `filter` for business rules; required `description`.
Measure naming: descriptive `snake_case` (e.g. `total_revenue`, `avg_order_value`).
@@ -250,4 +250,4 @@ Measure naming: descriptive `snake_case` (e.g. `total_revenue`, `avg_order_value
- If two measures differ only by a filter (e.g. `revenue` vs `paid_revenue`), they are distinct.
- Use the card's `name` + `description` to write meaningful measure descriptions.
- When multiple cards in a WU are near-duplicates, create ONE generalized source; the runner will skip the rest automatically.
-- Process every card in the WU — don't stop early.
+- Process every card in the WU - don't stop early.
diff --git a/packages/context/skills/metricflow_ingest/SKILL.md b/packages/context/skills/metricflow_ingest/SKILL.md
index 67743892..646dedb8 100644
--- a/packages/context/skills/metricflow_ingest/SKILL.md
+++ b/packages/context/skills/metricflow_ingest/SKILL.md
@@ -15,7 +15,7 @@ A MetricFlow `semantic_model` maps to an SL source; MetricFlow `measures` map to
| `semantic_model: X { model: ref('t') }` with measures + dimensions | **Overlay** at `/X.yaml` with `measures`, `columns` (computed), `joins` | The `model:` ref resolves to a manifest table. |
| `semantic_model: X { model: source('s','t') }` | **Overlay** at `/X.yaml` over table `t`. | Same shape; `source()` still resolves to a physical table. |
| `semantic_model: X { model: }` with no manifest entry | **Standalone** with explicit `sql:`, `grain:`, `columns:` | Happens when the dbt manifest isn't available. |
-| `semantic_model: Y { extends: X }` | **Merge** Y's measures/dimensions/entities into X's overlay, or write a single overlay named for the most-derived child (Y) containing both X's and Y's primitives | Do not emit a second overlay for X — flatten. |
+| `semantic_model: Y { extends: X }` | **Merge** Y's measures/dimensions/entities into X's overlay, or write a single overlay named for the most-derived child (Y) containing both X's and Y's primitives | Do not emit a second overlay for X - flatten. |
| `measures: [{ name, agg, expr }]` | `measures: [{ name, expr: "()" }]` | Aggregation inlined. `agg: count_distinct` → `count(distinct ...)`. |
| `entities: [{ name, type: primary }]` | `grain: []` on the overlay/standalone | Primary/unique entities drive grain. |
| `entities: [{ name, type: foreign }]` | `joins:` entry joining to the primary-entity's semantic_model | Only when a matching primary is discoverable. |
@@ -24,10 +24,10 @@ A MetricFlow `semantic_model` maps to an SL source; MetricFlow `measures` map to
| `metrics: [{ type: derived, type_params: { expr, metrics } }]` | **Derived measure** on whichever source owns the referenced measures, with `expr:` referencing measure names | If the metric spans models, still write it once on the source owning the "primary" measure (the one the agent judges most central). Mention the cross-model chain in the description. |
| `metrics: [{ type: ratio, type_params: { numerator, denominator } }]` | Same as derived; `expr: "numerator / NULLIF(denominator, 0)"` if no explicit expr | Safe-division by default. |
| `metrics: [{ type: cumulative, type_params: { window, grain_to_date } }]` | **Standalone** source with a window-function SQL; reference the resulting column as a normal measure | KTX SL has no first-class cumulative primitive (spec Non-goals). |
-| `metrics: [{ type: conversion }]` | **Flag for human** — do NOT write. Emit a wiki note describing the intended semantics. | No KTX equivalent in v1. |
+| `metrics: [{ type: conversion }]` | **Flag for human** - do NOT write. Emit a wiki note describing the intended semantics. | No KTX equivalent in v1. |
| Metric not mappable | Wiki page `-definition.md` with the full YAML body quoted | Capture the intent even if we can't emit SL. |
-Type map: MetricFlow `time` to KTX `time`; `categorical` to `string`; `number` to `number`; `boolean` to `boolean`. Follow `expr` over `name` when both differ — `expr` is the physical column.
+Type map: MetricFlow `time` to KTX `time`; `categorical` to `string`; `number` to `number`; `boolean` to `boolean`. Follow `expr` over `name` when both differ - `expr` is the physical column.
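A minimal sketch of the type map and the `expr`-over-`name` rule, assuming a dimension is available as a plain dict with `name`, `type`, and optional `expr` keys and that `expr` is a bare column name. The names `MF_TO_KTX` and `ktx_column` are hypothetical.

```python
# Pairs copied from the type map above.
MF_TO_KTX = {
    "time": "time",
    "categorical": "string",
    "number": "number",
    "boolean": "boolean",
}


def ktx_column(dimension: dict) -> dict:
    """Map one MetricFlow dimension to a KTX column entry."""
    # `expr` is the physical column, so it takes precedence over `name`.
    physical = dimension.get("expr") or dimension["name"]
    return {"name": physical, "type": MF_TO_KTX[dimension["type"]]}
```

A dimension whose `expr` is a SQL expression rather than a column name falls outside this sketch and would need to be handled as a computed column instead.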
Verify each MetricFlow model source table with `entity_details` before producing
the corresponding `sl_write_source`.
@@ -67,7 +67,7 @@ Within one WorkUnit, multiple semantic_models linked by `extends:` are guarantee
1. Start with the most-derived child (the one that no other semantic_model extends).
2. Walk the `extends:` chain upward, accumulating measures, dimensions, entities.
3. Write ONE overlay/standalone, named for the most-derived child's SL-appropriate name (not the base).
-4. Parents that lack their own distinctive content should NOT get a separate overlay. If a parent has unique measures a child doesn't inherit, consider whether the base is used elsewhere — if yes, write both; if no, still one overlay.
+4. Parents that lack their own distinctive content should NOT get a separate overlay. If a parent has unique measures a child doesn't inherit, consider whether the base is used elsewhere - if yes, write both; if no, still one overlay.
5. Measure/dimension name collisions: child wins, but note the overridden parent in the overlay's description or in a sibling wiki page.
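The flattening steps above can be sketched as follows. This assumes the semantic_models are already parsed into a dict keyed by name, each with an optional `extends` key and a `measures` list; `flatten_extends` is a hypothetical helper, and only measures are accumulated here to keep the example short.

```python
def flatten_extends(models: dict[str, dict], child: str) -> dict:
    """Walk the extends chain from the most-derived child and merge measures."""
    chain, seen, name = [], set(), child
    while name is not None:
        if name in seen:
            raise ValueError(f"cycle in extends chain at {name!r}")
        seen.add(name)
        chain.append(models[name])
        name = models[name].get("extends")

    measures: dict[str, dict] = {}
    # Apply the base first so that a child's measure overrides its parent's.
    for model in reversed(chain):
        for measure in model.get("measures", []):
            measures[measure["name"]] = measure
    return {"name": child, "measures": list(measures.values())}
```

The result is a single overlay-shaped dict named for the child. Recording which parent measures were overridden (step 5) is left out of this sketch.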
The spec's worked example has `orders`, `orders_ext` (extends orders), and `metrics/orders_final.yml` (defines `revenue` referencing both). The right output is ONE overlay named `orders_ext` (or `orders` if the team's naming favors the base) containing `order_count`, `gross_amount`, `refund_amount`, and a derived `revenue` measure. Provenance tags point to all three source files.
@@ -88,9 +88,9 @@ call `sql_execution` with the same warehouse connection name, for example:
`sql:` must be sourced from raw files, `entity_details`, or a successful SQL
probe.
-After every `sl_write_source`, call `sl_validate`. The warehouse will reject invented columns with `Unrecognized name: ` — treat as a hard failure and re-read the schema.
+After every `sl_write_source`, call `sl_validate`. The warehouse will reject invented columns with `Unrecognized name: ` - treat as a hard failure and re-read the schema.
-## Cumulative metrics — sql-standalone fallback
+## Cumulative metrics - sql-standalone fallback
KTX SL has no first-class `window:` or `grain_to_date:` primitive in v1 (spec Non-goals). Translate a MetricFlow cumulative metric to a standalone SL source with a window-function SQL:
@@ -125,7 +125,7 @@ measures:
Pick the time column based on the semantic_model's `defaults.agg_time_dimension` (e.g. `ordered_at`). If the MetricFlow config omits it, probe the base table for time-typed columns and choose the most obvious. After writing the standalone SQL source, call `emit_unmapped_fallback` with `rawPath` set to the MetricFlow file path, `reason: "cumulative_metric_unsupported"`, and `fallback: "sql_standalone"`.
-## Conversion metrics — flag for human
+## Conversion metrics - flag for human
```yaml
metrics:
@@ -159,7 +159,7 @@ name: orders_ext
Line ranges (`#L-`) point to the exact YAML span within the file (the `semantic_models:` entry for its own `name`). Use `read_raw_span` to identify those ranges before writing.
-## Example 1 — single semantic_model to overlay
+## Example 1 - single semantic_model to overlay
```yaml
# MetricFlow:
@@ -185,7 +185,7 @@ measures:
grain: [order_id]
```
-## Example 2 — extends chain → one flattened overlay
+## Example 2 - extends chain → one flattened overlay
```yaml
# MetricFlow:
@@ -232,7 +232,7 @@ measures:
grain: [order_id]
```
-## Example 3 — derived metric spanning two semantic_models
+## Example 3 - derived metric spanning two semantic_models
```yaml
# models/sales.yml
@@ -256,7 +256,7 @@ metrics:
metrics: [{name: revenue}, {name: cost}]
```
-Because the WorkUnit bundles all three files (cross-component union via the metric), write the derived measure on ONE of the two sources — pick the source whose domain "owns" the metric (here, `sales` — margin is inherently a sales metric). Cross-source references aren't native in KTX SL; treat the metric's operands as already-resolvable in the target source's query context OR emit a standalone SQL that joins the two tables:
+Because the WorkUnit bundles all three files (cross-component union via the metric), write the derived measure on ONE of the two sources - pick the source whose domain "owns" the metric (here, `sales` - margin is inherently a sales metric). Cross-source references aren't native in KTX SL; treat the metric's operands as already-resolvable in the target source's query context OR emit a standalone SQL that joins the two tables:
```yaml
# /sales.yaml
@@ -269,7 +269,7 @@ measures:
```
```yaml
-# /margin.yaml — standalone because it spans two tables
+# /margin.yaml - standalone because it spans two tables
#
#
#
@@ -292,7 +292,7 @@ measures:
Also write a wiki page at `wiki/global/margin-metric.md` explaining the cross-source origin.
-## Example 4 — filtered metric creates a new measure
+## Example 4 - filtered metric creates a new measure
```yaml
metrics:
diff --git a/packages/context/skills/notion_synthesize/SKILL.md b/packages/context/skills/notion_synthesize/SKILL.md
index 1b5417e3..e799ce7c 100644
--- a/packages/context/skills/notion_synthesize/SKILL.md
+++ b/packages/context/skills/notion_synthesize/SKILL.md
@@ -67,7 +67,7 @@ Search existing wiki pages for the same `tables:` or `sl_refs:` frontmatter and
- Do not create SL sources under the Notion connection just because a page mentions a warehouse, dbt, Looker, or Metabase object. Use the mapped warehouse/source connection after discovery, or emit an unmapped fallback and write wiki-only.
- Distinguish fallback reasons precisely: if a non-Notion warehouse/dbt connection exists but `sl_discover` cannot find the named table/source, use `no_physical_table`; reserve `no_connection_mapping` for cases where there is no plausible non-Notion target connection at all.
- If `sl_discover` resolves the table/source, do not call `emit_unmapped_fallback` for that table. Use the resolved source for `sl_refs`, overlay edits, or wiki-only documentation.
-- When calling `emit_unmapped_fallback`, pass the table or source identifier as `tableRef` (e.g. `tableRef: "."`) — the tool generates the canonical detail string from the reason code and `tableRef`. Use the optional `clarification` field only to add context that does not contradict the reason. Do not restate the reason in `clarification`.
+- When calling `emit_unmapped_fallback`, pass the table or source identifier as `tableRef` (e.g. `tableRef: "."`) - the tool generates the canonical detail string from the reason code and `tableRef`. Use the optional `clarification` field only to add context that does not contradict the reason. Do not restate the reason in `clarification`.
## Identifier Verification Protocol
diff --git a/packages/context/skills/sl/SKILL.md b/packages/context/skills/sl/SKILL.md
index 7103a276..2a1e4a09 100644
--- a/packages/context/skills/sl/SKILL.md
+++ b/packages/context/skills/sl/SKILL.md
@@ -1,6 +1,6 @@
---
name: sl
-description: KTX's semantic layer — a structured catalog of sources (tables/views), measures, joins, and segments expressed as YAML. Covers the schema and how to query it via `sl_query`. Use when the task involves querying pre-defined metrics (ARR, churn, retention, LTV, MAU) or reading SL source YAML to understand the catalog. Capture is handled by the `sl_capture` skill (memory-agent only).
+description: KTX's semantic layer - a structured catalog of sources (tables/views), measures, joins, and segments expressed as YAML. Covers the schema and how to query it via `sl_query`. Use when the task involves querying pre-defined metrics (ARR, churn, retention, LTV, MAU) or reading SL source YAML to understand the catalog. Capture is handled by the `sl_capture` skill (memory-agent only).
---
# Semantic Layer
@@ -8,10 +8,10 @@ description: KTX's semantic layer — a structured catalog of sources (tables/vi
KTX's semantic layer (SL) is a structured catalog. Each **source** represents a table, a SQL view, or an overlay that enriches a manifest-backed table with measures, computed columns, joins, and named segments. The catalog is the single source of truth for reusable business metrics.
This skill covers two parts:
-- **Part 1** — Schema reference (what an SL source looks like).
-- **Part 2** — Querying via `sl_query`.
+- **Part 1** - Schema reference (what an SL source looks like).
+- **Part 2** - Querying via `sl_query`.
-Capture (when and how to add new patterns to the SL) is a separate concern handled by the memory-agent — see the `sl_capture` skill if you are running in capture mode. The research agent **reads** and **queries** the SL via the tools described here; it does not write to it.
+Capture (when and how to add new patterns to the SL) is a separate concern handled by the memory-agent - see the `sl_capture` skill if you are running in capture mode. The research agent **reads** and **queries** the SL via the tools described here; it does not write to it.
For capture-time identifier verification, load `sl_capture`. Synthesis writer
skills must verify warehouse identifiers with `discover_data`,
@@ -19,7 +19,7 @@ skills must verify warehouse identifiers with `discover_data`,
---
-## Part 1 — Schema reference
+## Part 1 - Schema reference
An SL source is a YAML file at `semantic-layer//.yaml`. There are three flavors:
@@ -34,7 +34,7 @@ descriptions:
measures:
- name: total_revenue
expr: sum(amount)
- description: Total order revenue — filter by status or region at query time
+ description: Total order revenue - filter by status or region at query time
columns: # computed dimensions only
- name: is_large_order
type: boolean
@@ -49,7 +49,7 @@ joins:
```
Rules:
-- Do **not** repeat base-table columns, grain, `table`, or `source_type` in an overlay — those are inherited.
+- Do **not** repeat base-table columns, grain, `table`, or `source_type` in an overlay - those are inherited.
- Overlay columns MUST be computed (`expr` + `type`).
- `exclude_columns` hides specific manifest columns; `disable_joins` suppresses specific auto-detected joins.
@@ -106,7 +106,7 @@ measures:
expr: count(*)
```
-An SQL source is a one-shot answer: the aggregation is frozen, callers cannot re-group or re-filter by columns the SQL has collapsed, and the source is disconnected from the join graph. Prefer overlays + measures over SQL sources when possible — the `sl_capture` skill covers when SQL is justified.
+An SQL source is a one-shot answer: the aggregation is frozen, callers cannot re-group or re-filter by columns the SQL has collapsed, and the source is disconnected from the join graph. Prefer overlays + measures over SQL sources when possible - the `sl_capture` skill covers when SQL is justified.
### Columns
@@ -119,7 +119,7 @@ Every standalone column requires `name` and `type`. Overlays have computed colum
### Grain
-`grain: [col_a, col_b]` — the set of columns that uniquely identify one row. The query engine uses grain to prevent fan-out in joins. Overlays inherit grain from the manifest unless they override.
+`grain: [col_a, col_b]` - the set of columns that uniquely identify one row. The query engine uses grain to prevent fan-out in joins. Overlays inherit grain from the manifest unless they override.
### Joins
@@ -128,7 +128,7 @@ joins:
- to: customers # target source name
on: "customer_id = customers.id" # local_col = TARGET.target_col
relationship: many_to_one # or one_to_many, one_to_one
- alias: primary_customer # optional — lets you join the same target twice
+ alias: primary_customer # optional - lets you join the same target twice
```
- `on` format: `local_col = TARGET.target_col`. Always qualify the right side with the target source name.
@@ -140,13 +140,13 @@ joins:
measures:
- name: total_arr
expr: sum(arr_amount)
- description: Sum of ARR — filter by plan_name at query time
+ description: Sum of ARR - filter by plan_name at query time
filter: "is_active = true"
segments: [paid_non_refunded]
```
- `name` (required, snake_case).
-- `expr` (required): any valid SQL aggregate — `sum(x)`, `count(*)`, `count(distinct user_id)`, `avg(score)`.
+- `expr` (required): any valid SQL aggregate - `sum(x)`, `count(*)`, `count(distinct user_id)`, `avg(score)`.
- `description` (required on capture): what the measure computes and how to use it.
- `filter` (optional): SQL predicate applied as a WHERE clause specific to this measure.
- `segments` (optional): names of segments defined on the same source. The engine AND-composes each segment's `expr` into this measure's effective filter.
@@ -162,23 +162,23 @@ segments:
description: Orders that were paid and not refunded
```
-Named, reusable boolean predicates scoped to one source. Reference by bare name in a measure's `segments: []`, or by dotted form `source.segment_name` in an `sl_query`. Segments are predicates only — they are NOT selectable as dimensions. If you need to group by the predicate, add a `columns[]` entry instead.
+Named, reusable boolean predicates scoped to one source. Reference by bare name in a measure's `segments: []`, or by dotted form `source.segment_name` in an `sl_query`. Segments are predicates only - they are NOT selectable as dimensions. If you need to group by the predicate, add a `columns[]` entry instead.
### Cross-references with the wiki
-The reverse edge (wiki pages that cite this source) is derived automatically from each wiki's `sl_refs:` — you don't emit anything on the SL side. Author the edge once on the wiki via `sl_refs:`; the post-write reconciler populates the knowledge↔SL index.
+The reverse edge (wiki pages that cite this source) is derived automatically from each wiki's `sl_refs:` - you don't emit anything on the SL side. Author the edge once on the wiki via `sl_refs:`; the post-write reconciler populates the knowledge↔SL index.
---
-## Part 2 — Querying via `sl_query`
+## Part 2 - Querying via `sl_query`
The `sl_query` tool generates correct SQL from a structured query. It handles joins, fan-out prevention, aggregation correctness, and filter classification automatically. Prefer it over writing raw SQL whenever the SL has the relevant sources.
### When to prefer sl_query over raw SQL
- A pre-defined measure already exists (`source.measure_name` appears in the catalog).
-- The question combines fields from multiple sources — the engine resolves the join path automatically.
-- The question asks for a standard metric (revenue, ARR, churn, retention, LTV, conversion, MAU, etc.) — even if no pre-defined measure exists, a runtime aggregation over a catalog column is usually correct.
+- The question combines fields from multiple sources - the engine resolves the join path automatically.
+- The question asks for a standard metric (revenue, ARR, churn, retention, LTV, conversion, MAU, etc.) - even if no pre-defined measure exists, a runtime aggregation over a catalog column is usually correct.
Use raw SQL (`sql_execution`) only when:
- The computation requires multi-step CTEs whose intermediate grain is not a column in any source.
@@ -201,17 +201,17 @@ Use raw SQL (`sql_execution`) only when:
- **`measures`**: mix pre-defined refs (`source.measure`) and runtime aggregations (`sum(source.column)`).
- **`dimensions`**: column refs or `{ field, granularity }` objects for time grains (`day`, `week`, `month`, `quarter`, `year`).
- **`filters`**: free-form SQL predicates. The engine auto-classifies each as WHERE or HAVING based on whether it references an aggregated measure.
-- **`segments`**: dotted `source.segment_name`. Each segment is AND-ed into the effective filter of every measure whose base source matches. Segments never become a global WHERE — use `filters` for cross-source predicates.
+- **`segments`**: dotted `source.segment_name`. Each segment is AND-ed into the effective filter of every measure whose base source matches. Segments never become a global WHERE - use `filters` for cross-source predicates.
- **`order_by`**: string or `{ field, direction }`. Direction defaults to `asc`.
- **`limit`**: integer row cap.
### Join resolution
-You don't specify a base table. The engine infers the set of sources needed from the fields you reference and resolves the shortest join path through the catalog's declared joins. If no path exists between two sources, the query fails with a path-not-found error — check `discover_data` or `sl_discover` to see which sources are connected.
+You don't specify a base table. The engine infers the set of sources needed from the fields you reference and resolves the shortest join path through the catalog's declared joins. If no path exists between two sources, the query fails with a path-not-found error - check `discover_data` or `sl_discover` to see which sources are connected.
### Worked examples
-Cross-source query — engine resolves `account_health_scores → accounts ← opportunities` automatically:
+Cross-source query - engine resolves `account_health_scores → accounts ← opportunities` automatically:
```json
{
diff --git a/packages/context/skills/sl_capture/SKILL.md b/packages/context/skills/sl_capture/SKILL.md
index 4ec21545..3d19118f 100644
--- a/packages/context/skills/sl_capture/SKILL.md
+++ b/packages/context/skills/sl_capture/SKILL.md
@@ -1,10 +1,10 @@
---
name: sl_capture
-description: How to capture new reusable patterns into KTX's semantic layer — when a measure, segment, or join belongs in the catalog and how to write it generically so it stays small and useful over time. Loaded by the post-turn memory-agent only. The research agent does not write to the SL.
+description: How to capture new reusable patterns into KTX's semantic layer - when a measure, segment, or join belongs in the catalog and how to write it generically so it stays small and useful over time. Loaded by the post-turn memory-agent only. The research agent does not write to the SL.
callers: [memory_agent]
---
-# Semantic Layer — Capture
+# Semantic Layer - Capture
This skill covers **when** and **how** to capture new patterns into the semantic layer. For schema reference and query grammar, load the `sl` skill first.
@@ -13,8 +13,8 @@ When the current turn produces a reusable pattern (business metric, derived view
## SQL dialect
The user-facing prompt includes a `Warehouse:` line under the SL Sources index
-(e.g. `Warehouse: BIGQUERY`). All `expr` strings — measure expressions, segment
-predicates, computed-column SQL — execute on that warehouse and must use its
+(e.g. `Warehouse: BIGQUERY`). All `expr` strings - measure expressions, segment
+predicates, computed-column SQL - execute on that warehouse and must use its
syntax. Date arithmetic in particular varies by dialect:
- **BigQuery**: `transaction_date >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)` (when the column is `TIMESTAMP`); `event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)` (when `DATE`).
@@ -22,7 +22,7 @@ syntax. Date arithmetic in particular varies by dialect:
- **Snowflake**: `transaction_date >= dateadd(day, -90, current_timestamp())`.
Match the column's manifest type (`type: time` → TIMESTAMP/DATETIME on the
-warehouse) — comparing TIMESTAMP to a DATE-arithmetic result fails on
+warehouse) - comparing TIMESTAMP to a DATE-arithmetic result fails on
BigQuery. After every `sl_edit_source`/`sl_write_source`, the inline validator runs a
`LIMIT 1` warehouse probe per measure and surfaces dialect mismatches; if
you see an error trailer, fix the expression and retry rather than leaving
@@ -68,12 +68,12 @@ Callers filter `region = 'US'` at query time.
**Bake constants in only when the filter has named business meaning that won't change** (`enterprise_arr` for a contractually defined tier), cannot be expressed via the source's dimensions, or comes from a regulated/fixed list.
**Time anchors and value lists belong in callers' filters, not in measure expressions or source SQL.**
-- Anti-pattern (date anchor inlined): `expr: count(distinct case when transaction_date >= '2026-04-12' then customer_id end)` — the date will need editing every time the question shifts, and every reader has to discover it.
-- Anti-pattern (value list inlined in source SQL): `WHERE product_category_1 IN ('Testosterone', 'Weight Loss', …)` — locks the source to today's catalog and blocks callers from broadening or narrowing.
+- Anti-pattern (date anchor inlined): `expr: count(distinct case when transaction_date >= '2026-04-12' then customer_id end)` - the date will need editing every time the question shifts, and every reader has to discover it.
+- Anti-pattern (value list inlined in source SQL): `WHERE product_category_1 IN ('Testosterone', 'Weight Loss', …)` - locks the source to today's catalog and blocks callers from broadening or narrowing.
- Preferred: a generic measure (`count(distinct customer_id)`) plus either a named segment that captures the *meaning* of the anchor (`gh_new_products_since_launch`) or a query-time filter. Callers compose; the source stays small.
- A date is durable to bake in only when it represents a regulatory cutover, a contractually fixed boundary, or a one-time event that reshapes how the source itself is read.
-**If you create a segment whose expr matches a measure's filter, the measure MUST reference the segment via `segments: [segment_name]` rather than re-inlining the predicate.** This is the canonical pattern even with a single measure — duplicating the predicate inline defeats the purpose of naming it.
+**If you create a segment whose expr matches a measure's filter, the measure MUST reference the segment via `segments: [segment_name]` rather than re-inlining the predicate.** This is the canonical pattern even with a single measure - duplicating the predicate inline defeats the purpose of naming it.
Anti-pattern:
```yaml
@@ -100,24 +100,24 @@ measures:
**Extract repeated filter bundles into named segments.** If the same predicate appears on multiple measures of the same source, lift it to a `segments[]` entry and have each measure reference it. One edit updates every measure that depends on it.
-**Never write a standalone file on a manifest-backed name.** If `sl_discover({ query: "" })` finds an existing schema for that name, you MUST write an overlay (`name:` + `measures:`/`segments:`/`descriptions:` only — no `sql:`, `table:`, `grain:`, `columns:`, `joins:`). A standalone with `sql:` or `table:` on a manifest-backed name clobbers the inherited columns and joins; `sl_write_source` and `sl_validate` both reject this shape with a clear fix hint. Always run `sl_discover` before your first write on any existing name.
+**Never write a standalone file on a manifest-backed name.** If `sl_discover({ query: "" })` finds an existing schema for that name, you MUST write an overlay (`name:` + `measures:`/`segments:`/`descriptions:` only - no `sql:`, `table:`, `grain:`, `columns:`, `joins:`). A standalone with `sql:` or `table:` on a manifest-backed name clobbers the inherited columns and joins; `sl_write_source` and `sl_validate` both reject this shape with a clear fix hint. Always run `sl_discover` before your first write on any existing name.
**Prefer overlay decomposition over standalone SQL sources.** Before reaching for `source_type: sql`, check whether the metric decomposes into measures on existing overlays (including cross-source derived measures). Use `source_type: sql` only when:
- The metric requires per-user/per-entity derivation that cannot be expressed as a single `expr` (e.g., `EXISTS` over a time-windowed subset), OR
- The metric requires multi-step CTEs whose intermediate grain is not a column in any existing source.
-When an `sql` source is unavoidable, note in its `descriptions` map which SL gap forced the choice so it can be retired once the primitive ships. It must target a name NOT in the manifest — pick a distinct one (e.g. `mrr_waterfall_rollup`, not `fct_orders`).
+When an `sql` source is unavoidable, note in its `descriptions` map which SL gap forced the choice so it can be retired once the primitive ships. It must target a name NOT in the manifest - pick a distinct one (e.g. `mrr_waterfall_rollup`, not `fct_orders`).
## Slim standalone sources via `inherits_columns_from`
When a standalone SQL source filters or projects from a single manifest-backed base table (the common pattern for derived views like `aav_consignments` over `MARTS.CONSIGNMENTS`), set `inherits_columns_from:` to the base table's manifest key and list only column **names** in `columns:`. Compose-time enrichment fills `type`, `descriptions`, and `role` from the matching manifest column.
-Discover the manifest key with `sl_discover` — pass the bare name (`CONSIGNMENTS`), the fully-qualified path (`ANALYTICS.MARTS.CONSIGNMENTS`), or any suffix; the tool resolves all forms and prints the canonical key in its output.
+Discover the manifest key with `sl_discover` - pass the bare name (`CONSIGNMENTS`), the fully-qualified path (`ANALYTICS.MARTS.CONSIGNMENTS`), or any suffix; the tool resolves all forms and prints the canonical key in its output.
```yaml
name: aav_consignments
descriptions:
- user: AAV consignments — filtered view of MARTS.CONSIGNMENTS for the auto-auction-vaulting channel.
+ user: AAV consignments - filtered view of MARTS.CONSIGNMENTS for the auto-auction-vaulting channel.
source_type: sql
sql: |
SELECT CONSIGNED_ITEM_ID, CASH_ADV_AMOUNT, ALT_VALUE_COMBINED, my_derived_flag
@@ -131,7 +131,7 @@ columns:
- { name: CONSIGNED_ITEM_ID } # type/descriptions inherited from manifest
- { name: CASH_ADV_AMOUNT }
- { name: ALT_VALUE_COMBINED }
- - { name: my_derived_flag, type: boolean, expr: "CASH_ADV_AMOUNT > 0", descriptions: { user: "Computed locally — has any cash advance." } }
+ - { name: my_derived_flag, type: boolean, expr: "CASH_ADV_AMOUNT > 0", descriptions: { user: "Computed locally - has any cash advance." } }
measures:
- name: total_cash_advance
expr: sum(CASH_ADV_AMOUNT)
@@ -139,12 +139,12 @@ measures:
Rules:
-- Inheritance fills only **blank** fields. If you set a `description` locally, it wins — useful when the base description is misleading in the filtered view.
+- Inheritance fills only **blank** fields. If you set a `description` locally, it wins - useful when the base description is misleading in the filtered view.
- A column not in the manifest (a derived/aliased column, or one from a different table in a `JOIN`) needs its own `type` and `description` declared.
-- If `inherits_columns_from` doesn't resolve, the source still loads, but every column without a type triggers a validator error on the warehouse probe — `sl_discover` first to confirm the key.
-- Don't use `inherits_columns_from` for sources backed by `table:` (those should be overlays — see the rule against shadowing the manifest above).
+- If `inherits_columns_from` doesn't resolve, the source still loads, but every column without a type triggers a validator error on the warehouse probe - `sl_discover` first to confirm the key.
+- Don't use `inherits_columns_from` for sources backed by `table:` (those should be overlays - see the rule against shadowing the manifest above).
-## Refinement — replace, don't append
+## Refinement - replace, don't append
When the user corrects a prior answer, the existing measure is wrong by the user's own standard. Replace it, don't add a parallel measure.
@@ -208,14 +208,14 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
## Tool sequence
-1. `sl_discover` — see what source files exist.
-2. `sl_discover({ query: "" })` — **REQUIRED before the first write on any name**. Shows columns/joins/grain from the manifest. If the call returns a schema, you MUST write an overlay, not a standalone. Skipping this is the #1 cause of accidentally shadowing the manifest.
-3. `sl_read_source({ connectionId, sourceName })` — read the raw YAML before editing.
+1. `sl_discover` - see what source files exist.
+2. `sl_discover({ query: "" })` - **REQUIRED before the first write on any name**. Shows columns/joins/grain from the manifest. If the call returns a schema, you MUST write an overlay, not a standalone. Skipping this is the #1 cause of accidentally shadowing the manifest.
+3. `sl_read_source({ connectionId, sourceName })` - read the raw YAML before editing.
4. For modifications: `sl_edit_source({ connectionId, sourceName, yaml_edits: [{ oldText, newText, reason }] })` with exact-string replacements. `oldText` must match exactly and be unique in the file.
5. For new sources or full rewrites: `sl_write_source({ connectionId, sourceName, source })` with the full structured source definition.
6. For join discovery: use `sql_execution({connectionName: "warehouse", sql: "SELECT count(*) FROM public.orders o JOIN public.customers c ON c.id = o.customer_id LIMIT 20"})` with the target warehouse connection name and dialect-correct table names to verify the join key exists in both tables and assess cardinality before declaring the join.
-7. Cross-reference knowledge: author the edge once on the **wiki** side via `sl_refs: [source_name]` in the page's front-matter. The reverse edge (wiki pages that cite an SL source) is derived automatically by the reconciler — do not add a `knowledge_refs:` field to SL YAMLs.
-8. `sl_validate` — run after writing or editing to surface schema issues, duplicate measure names, and cross-source validation errors. Read-only; the writes are already committed (the squash-at-end flow will collapse them into one commit).
+7. Cross-reference knowledge: author the edge once on the **wiki** side via `sl_refs: [source_name]` in the page's front-matter. The reverse edge (wiki pages that cite an SL source) is derived automatically by the reconciler - do not add a `knowledge_refs:` field to SL YAMLs.
+8. `sl_validate` - run after writing or editing to surface schema issues, duplicate measure names, and cross-source validation errors. Read-only; the writes are already committed (the squash-at-end flow will collapse them into one commit).
## Editing patterns
@@ -224,13 +224,13 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
- Do NOT modify existing measures or their descriptions unless the current turn explicitly corrects them.
- During bundle/external ingest, include `rawPaths` on every `sl_write_source`/`sl_edit_source` call with only the raw files that directly support the SL action.
-## Worked example — additive overlay
+## Worked example - additive overlay
Conversation:
- User: "What was the average order value last quarter?"
- Assistant fell back to SQL: `SELECT AVG(amount) FROM orders WHERE order_date >= ...`
-Existing index: `orders [measures=0, joins=0] — candidate for enrichment`.
+Existing index: `orders [measures=0, joins=0] - candidate for enrichment`.
```
sl_discover()
@@ -253,9 +253,9 @@ sl_validate({ connectionId: "warehouse" })
→ clean
```
-The overlay only contains `name` and `measures` — no columns, grain, or table. Those are inherited from the manifest.
+The overlay only contains `name` and `measures` - no columns, grain, or table. Those are inherited from the manifest.
-## Worked example — refinement (replace)
+## Worked example - refinement (replace)
Prior turn:
- [user] "How many active users do we have per region?"
@@ -281,7 +281,7 @@ sl_validate({ connectionId: "warehouse" })
If you only added a new measure, the old incorrect `active_count` would stay and future queries would keep answering the wrong question.
-## Worked example — new join
+## Worked example - new join
Prior turn: user asked to correlate LTV with protocol count; assistant joined `fct_orders` with `fct_mau_multiprotocol` on `admin_user_id` in raw SQL.
@@ -315,6 +315,6 @@ Always verify joins with `sql_execution` before adding them.
- A measure whose filter matches a segment MUST reference the segment via `segments: [name]`.
- Extract repeated predicates into named segments.
- Use computed dimensions for derived categories.
-- When the user corrects a prior answer, replace — don't append.
+- When the user corrects a prior answer, replace - don't append.
- Always run `sl_validate` after writing to surface issues.
- If nothing is worth capturing, respond without calling any SL write tool.
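
Note on the segment rule in the `sl` and `sl_capture` text above: the statement that the engine AND-composes each referenced segment's `expr` into a measure's effective filter can be sketched in a few lines of Python. This is a hypothetical illustration only, not the engine's code. The helper name `effective_filter`, the dict shapes, and the `paid_non_refunded` predicate are assumptions made for the example.

```python
from typing import Optional


def effective_filter(measure: dict, segments: dict) -> Optional[str]:
    """AND-compose a measure's own filter with each referenced segment's expr.

    Hypothetical helper: `measure` mirrors one `measures:` entry and
    `segments` maps segment name -> predicate string.
    """
    predicates = []
    if measure.get("filter"):
        predicates.append(measure["filter"])
    for name in measure.get("segments", []):
        if name not in segments:
            # A measure may only reference segments defined on the same source.
            raise KeyError(f"unknown segment: {name}")
        predicates.append(segments[name])
    if not predicates:
        return None
    # Parenthesize each predicate so inner AND/OR operators keep their meaning.
    return " AND ".join(f"({p})" for p in predicates)


# Assumed predicate for the segment named in the docs; the real expr is not shown.
segments = {"paid_non_refunded": "status = 'paid' AND refunded = false"}
measure = {
    "name": "total_arr",
    "expr": "sum(arr_amount)",
    "filter": "is_active = true",
    "segments": ["paid_non_refunded"],
}
print(effective_filter(measure, segments))
```

Because the measure references the segment by name, a change to the predicate is made once on the segment and reaches every measure that lists it. That is the reason the capture rules forbid re-inlining it.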
diff --git a/packages/context/skills/wiki_capture/SKILL.md b/packages/context/skills/wiki_capture/SKILL.md
index d57a39ad..55601f99 100644
--- a/packages/context/skills/wiki_capture/SKILL.md
+++ b/packages/context/skills/wiki_capture/SKILL.md
@@ -1,6 +1,6 @@
---
name: wiki_capture
-description: KTX's knowledge base — wiki pages for durable, reusable business knowledge. Covers capture workflow for user preferences, metric definitions, organizational conventions, and cross-references between wiki pages and semantic-layer sources. Loaded by the post-turn memory-agent only. The research agent reads wiki via `wiki_read`/`wiki_search` but does not write it.
+description: KTX's knowledge base - wiki pages for durable, reusable business knowledge. Covers capture workflow for user preferences, metric definitions, organizational conventions, and cross-references between wiki pages and semantic-layer sources. Loaded by the post-turn memory-agent only. The research agent reads wiki via `wiki_read`/`wiki_search` but does not write it.
callers: [memory_agent]
---
@@ -8,14 +8,14 @@ callers: [memory_agent]
## Role
-The knowledge base stores durable, reusable business knowledge for an analytics assistant. Each page is a self-contained rule, definition, or convention that answers "how should this concept be handled in this organization?" — written once and reused across chats.
+The knowledge base stores durable, reusable business knowledge for an analytics assistant. Each page is a self-contained rule, definition, or convention that answers "how should this concept be handled in this organization?" - written once and reused across chats.
Scope selection is handled by the runtime:
- When user-scoped knowledge is enabled AND the caller is a chat turn, writes go to the user's **personal** scope.
- When the caller is an admin-driven ingest (`sourceType: 'external_ingest'`), writes go to the **global** scope.
- When user-scoped knowledge is disabled, all writes go to the global scope.
-The `wiki_write` tool picks the right scope based on the session. Capture logic does not need to choose — focus on whether the content is worth capturing at all.
+The `wiki_write` tool picks the right scope based on the session. Capture logic does not need to choose - focus on whether the content is worth capturing at all.
## What to capture
@@ -30,8 +30,8 @@ Do NOT capture:
- One-off requests ("answer under 100 words").
- Temporary instructions scoped to the current chat.
- Ad-hoc formatting preferences.
-- Information already present in the semantic layer (column names, join paths, measure formulas — those belong in SL).
-- **Query results, snapshots, or time-bounded benchmark tables.** Numbers go stale; pasting "Oct 2025: 25%, Nov 2025: 19.9%, …" creates misinformation as soon as new data lands. Reference the SL source by name (`sl_refs`) and let future query tools pull live data — the wiki captures the *rule* (definition, exclusion, segmentation), the SL source captures the *measure*, and query execution captures the *current values*.
+- Information already present in the semantic layer (column names, join paths, measure formulas - those belong in SL).
+- **Query results, snapshots, or time-bounded benchmark tables.** Numbers go stale; pasting "Oct 2025: 25%, Nov 2025: 19.9%, …" creates misinformation as soon as new data lands. Reference the SL source by name (`sl_refs`) and let future query tools pull live data - the wiki captures the *rule* (definition, exclusion, segmentation), the SL source captures the *measure*, and query execution captures the *current values*.
- **Interpretive narrative tied to a specific snapshot** ("M1 retention degraded sharply from Dec 2025"). The observation is anchored to data that will move; the actionable convention (e.g., "always exclude in-progress cohorts") may be worth capturing on its own, but the snapshot-specific commentary is not.
If nothing is worth capturing, respond without calling any tool.
@@ -40,13 +40,13 @@ If nothing is worth capturing, respond without calling any tool.
1. Read the wiki index (provided in the prompt) and decide whether the turn introduces durable knowledge.
2. **Before writing**, search for related content so cross-references are accurate:
- - `discover_data` first when a page relates to data or SL concepts — find
+ - `discover_data` first when a page relates to data or SL concepts - find
existing wiki pages, SL sources, and raw warehouse schema together.
- - `wiki_search` with the topic — find related wiki pages to populate `refs`.
- - `sl_discover` with the concept — if the page defines a metric (revenue, churn, retention, LTV, ARR, MRR, CAC, attribution, etc.), find matching SL sources or measures to populate `sl_refs`. If no matches, pass `sl_refs: []` so future readers know you checked.
+ - `wiki_search` with the topic - find related wiki pages to populate `refs`.
+ - `sl_discover` with the concept - if the page defines a metric (revenue, churn, retention, LTV, ARR, MRR, CAC, attribution, etc.), find matching SL sources or measures to populate `sl_refs`. If no matches, pass `sl_refs: []` so future readers know you checked.
3. If updating an existing page, `wiki_read` it first. Use the returned `structured.content` or markdown body as the exact stored text for targeted replacements; current tags, refs, and sl_refs are returned in structured metadata.
4. `wiki_write` to create or update. Prefer merging into an existing page over creating a new one.
-5. `wiki_remove` only when a page is truly obsolete — not to replace stale content (update it instead).
+5. `wiki_remove` only when a page is truly obsolete - not to replace stale content (update it instead).
For bundle/external ingest, include `rawPaths` on every `wiki_write`/`wiki_remove` call with only the raw files that directly support that wiki action. This keeps ingest provenance tied to the actual source file, not every file in the WorkUnit.
@@ -82,7 +82,7 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
- **Keys** are short kebab-case topic identifiers: `leads-source-filter`, `revenue-definition`, `churn-calculation`. No namespacing, no prefixes.
- **Summary** is a one-line hook (≤200 chars) shown in the index.
-- **Content** is concise markdown — actionable rules, not prose.
+- **Content** is concise markdown - actionable rules, not prose.
```
## [Topic Title]
@@ -116,8 +116,8 @@ All three fields use REPLACE semantics on update:
Two modes:
-- **Full content** — pass `content` to rewrite the whole page. Use when the page structure needs to change.
-- **Targeted edits** — pass `replacements: [{ oldText, newText }]` to apply exact-string replacements. Use for small updates; preserves the rest of the page.
+- **Full content** - pass `content` to rewrite the whole page. Use when the page structure needs to change.
+- **Targeted edits** - pass `replacements: [{ oldText, newText }]` to apply exact-string replacements. Use for small updates; preserves the rest of the page.
When editing, read the page first so the edit matches exact whitespace and indentation.
@@ -125,7 +125,7 @@ When editing, read the page first so the edit matches exact whitespace and inden
Organization (GLOBAL) pages are read-only from a user's personal-scope session. To override a global rule for a single user, write a personal page with the **same key**. At read time the USER page wins.
-## Worked example — capturing a metric with cross-references
+## Worked example - capturing a metric with cross-references
User says: "Going forward, the official refund rate is total refunded amount divided by total gross transaction amount."
@@ -133,7 +133,7 @@ User says: "Going forward, the official refund rate is total refunded amount div
wiki_list_tags()
→ existing tags include "finance"
wiki_search({ query: "refund revenue paid orders" })
- → returns `revenue-definition` (related — defines paid-orders filter)
+ → returns `revenue-definition` (related - defines paid-orders filter)
sl_discover({ query: "refund rate" })
→ returns fct_orders (score 0.08), fct_gaap_revenue (0.06)
sl_read_source({ connectionId: "warehouse", sourceName: "fct_orders" })
@@ -155,6 +155,6 @@ Search-then-write order matters. Cross-references are part of the page's identit
- Read existing pages before updating them.
- Prefer merging into an existing page over creating a new one.
- Prefer fewer, richer pages over many thin ones.
-- Write content as clear, actionable rules — not narrative prose.
+- Write content as clear, actionable rules - not narrative prose.
- Discover cross-references via search before writing, not after.
- If nothing is worth capturing, respond without calling any tool.
diff --git a/python/ktx-sl/AGENTS.md b/python/ktx-sl/AGENTS.md
index 591ed9da..b9b54f18 100644
--- a/python/ktx-sl/AGENTS.md
+++ b/python/ktx-sl/AGENTS.md
@@ -1,6 +1,6 @@
# Semantic Layer Engine
-Python semantic layer that generates SQL from structured JSON queries. No `from` clause — sources are inferred from fully-qualified field names (`source.column`).
+Python semantic layer that generates SQL from structured JSON queries. No `from` clause - sources are inferred from fully-qualified field names (`source.column`).
## Quick Start
@@ -16,7 +16,7 @@ Use `--model` to pass a self-contained YAML model (list of source definitions) i
### 1. Create an inline model file
```yaml
-# /tmp/model.yaml — a YAML list of source definitions
+# /tmp/model.yaml - a YAML list of source definitions
- name: orders
table: public.orders
grain: [id]
@@ -119,9 +119,9 @@ uv run python -m semantic_layer.cli --model /tmp/model.yaml \
## Coding Guidelines
-### Expression handling — always use sqlglot AST, never regex on SQL
+### Expression handling - always use sqlglot AST, never regex on SQL
-- **Parse expressions** with `sqlglot.parse_one(f"SELECT {expr}")` and walk/transform the AST. Never use `str.replace()`, `re.sub()`, or string splitting on SQL fragments — these corrupt string literals, aliases, and nested expressions.
+- **Parse expressions** with `sqlglot.parse_one(f"SELECT {expr}")` and walk/transform the AST. Never use `str.replace()`, `re.sub()`, or string splitting on SQL fragments - these corrupt string literals, aliases, and nested expressions.
- **Quote reserved words first**: always call `quote_reserved_identifiers(expr)` before passing to `sqlglot.parse_one()`. Column/source names like `group`, `key`, `order` will fail to parse otherwise.
- **Use the parse cache** in `parser.py` (`ExpressionParser._parse_as_select()`) for read-only AST walks. Direct `sqlglot.parse_one()` calls are fine when you need to `.transform()` the tree.
- **Regex is fine for non-SQL tasks**: sanitizing alias names, masking string literals before parse, etc. The rule is: don't use regex to interpret SQL structure.