description: Convert Metabase questions, models, and metrics into KTX Semantic Layer source definitions. Covers result-metadata to KSL column type mapping, FK/PK detection, near-duplicate deduplication, pre-aggregation decomposition, join-graph connectivity, and how to react to priorProvenance from earlier ingest syncs. Load when the WorkUnit contains `cards/<id>.json` files under a Metabase bundle.
Each WorkUnit represents one Metabase collection's cards for one Metabase database (mapped to exactly one KTX connection). Every `cards/<id>.json` file carries the resolved SQL, result_metadata, card type, collection path, and referenced-card ids. The WU's `sync-config.json` tells you which sync mode is active and which selections apply. `databases/<id>.json` tells you the target KTX connection.
- Identify grain candidates: columns with `semantic_type: type/PK`.
- Identify join candidates: columns with `semantic_type: type/FK` plus `fk_target_field_id`.
- Identify time columns: `semantic_type: type/CreationTimestamp` or `type/UpdatedTimestamp` → set `role: time`.
- Use `display_name` for measure descriptions when available.
### Additional card metadata
-`parameters`: list of card-level parameters with widget types and defaults. When SQL resolution fell back to unresolved SQL, use this to drive Step A of the SQL-translation workflow (drop optional clauses): knowing each `{{ var }}` is `type: "date/range"` vs `type: "category"` tells you what kind of clause it is.
-`resultMetadata[i].field_ref`: Metabase's canonical reference to the source warehouse field. Shape `["field", <field_id>, <options>]`. When this is set, the column maps directly to a warehouse field, which is useful for declaring joins from FK metadata without re-parsing SQL.
-`lastRunAt`: ISO timestamp of the card's last execution. If null or very old, the card may be dead; prefer skipping over creating a source.
-`dashboardCount`: number of dashboards referencing the card. Cards with `dashboardCount: 0` and a stale `lastRunAt` are strong skip signals.
2. Check `sl_discover` and `sl_read_source` for existing sources that overlap.
3. Decide:
- Simple aggregation on a table that already has a source → `sl_edit_source` to add a measure.
- Join between tables that should be linked in the SL graph → `sl_edit_source` to add a join.
- Complex derived SQL (CTEs, multi-layer aggregation, scoring models) → `sl_write_source` with `source_type: sql`. When the SQL projects/filters from a single manifest-backed base table, set `inherits_columns_from: <manifest_key>` so columns inherit type and description from the manifest — see `sl_capture` skill for the slim form. Use `sl_discover` to discover the manifest key from the table reference in the SQL (it accepts `MARTS.CONSIGNMENTS`, `ANALYTICS.MARTS.CONSIGNMENTS`, or `CONSIGNMENTS`).
- New base table not yet in the semantic layer → `sl_write_source` with `source_type: table`.
- Trivial query (`SELECT *`, simple `COUNT(*)` with no business logic) → do nothing; the runner will record this card as `action_type='skipped'`.
- Duplicate of an existing measure → same as trivial; do nothing for this card.
**Manifest-only names need an overlay first.** If `sl_discover` shows a source name with `Type: table` but `sl_read_source` returns "Source not found", the source lives only in the schema manifest (no standalone overlay yet). `sl_edit_source` cannot edit manifest-only names — you must bootstrap an overlay with `sl_write_source` using the overlay shape:
```yaml
name: <SOURCE_NAME>
measures:
- name: <measure_name>
expr: "<expression>"
```
Overlay shape: `name:` plus any of `measures:`, `segments:`, `description:`, `joins:`, `disable_joins:`. Never include `sql:`, `table:`, `grain:`, or `columns:` on a manifest-backed name — those would shadow the manifest's schema and drop its joins. Overlay `joins:` are merged additively with the manifest's joins (deduped by `to` + `on`); use `disable_joins: ["<on-clause>"]` to suppress a specific manifest join. After the overlay exists, use `sl_edit_source` for further tweaks. See `sl_capture` skill for the canonical overlay rule.
**Join discovery:** When your card's SQL references warehouse tables (e.g. in `FROM` or `JOIN` clauses), call `sl_discover({ query: '<table>' })` before writing. The matching manifest entry's `name` is the value you put in `joins: [- to: <name>]`. Use `many_to_one` for FK-to-dimension joins, `one_to_many` for the reverse.
## priorProvenance
If the WU prompt includes a `priorProvenance` section for a card, it tells you what happened on prior ingest syncs. Treat it as advisory:
-`action_type: source_created` on source X → prefer editing X with `sl_edit_source` rather than writing a new source.
-`action_type: measure_added` on source X → you already contributed to X; add only measures that aren't present.
-`action_type: subsumed` or `merged` → this card was folded into another source last time; unless its SQL has changed structurally, keep it subsumed (no new write).
-`action_type: skipped` → last time we decided not to ingest this card; re-read the SQL and confirm the decision still holds. If the card now has non-trivial business logic, ingest it.
## Deduplication
Before writing, scan all cards in this WU for near-duplicate groups — cards whose `resolvedSql` shares the same CTEs, base tables, joins, and aggregation structure but differs only in:
- Trailing filters (e.g. `date_trunc(week, date)` vs `date_trunc(month, date)`).
- Minor `WHERE` clause variations.
- Column aliases or output column subsets.
- Aggregation granularity (daily vs weekly vs monthly).
When you find a group of near-duplicates:
1. Create ONE generalized source from the most comprehensive card in the group.
2. Strip card-specific trailing filters from the SQL so the source covers all variants (e.g. keep daily grain instead of filtering to week/month).
3. If each card had a distinct measure or filter, add them as separate measures on the single source.
4. For all cards except the canonical one, do nothing — they'll be recorded as `action_type='skipped'` automatically by the runner.
Do NOT merge cards with fundamentally different business logic, even if they share CTEs.
## Pre-aggregation decomposition
When a card's `resolvedSql` contains `GROUP BY` with aggregation functions (`SUM`, `COUNT`, `AVG`, …):
1.**Detect**: simple aggregation on base tables/joins — `SELECT` with `GROUP BY`, no complex CTEs or window functions.
2.**Decompose**: strip the `GROUP BY` and aggregation functions. Keep `FROM`, `JOIN`, and `WHERE` intact.
3.**Expose row-level columns**: include the grouped-by columns AND the raw columns being aggregated (e.g. `money_out` instead of `SUM(money_out) AS total_money_out`).
4.**Define aggregations as measures**: convert each aggregation into a KSL measure (e.g. `sum(money_out)`).
5.**Add joins**: with FK columns now exposed, declare joins to dimension sources.
Exception: keep the pre-aggregated SQL when the query involves multi-CTE pipelines, window functions, or recursive logic where decomposition would lose business logic.
## SQL translation from raw native to KSL
Every card carries a `resolvedSql` field. Check the staged card's `resolutionStatus` first:
-`resolutionStatus: "resolved"` — `{{#N}}` references are inlined and `[[ ... ]]` optional clauses have been dropped locally. If the resolved SQL contains no other parameters the SQL is executable as-is. If the card had **required** (non-bracketed) `{{ var }}` placeholders, the SQL is prefixed with a placeholder-warning comment block listing every dummy substitution Metabase made — see "Step A" below.
-`resolutionStatus: "fallback"` — Metabase failed to resolve. The SQL still contains `{{#N}}`, `{{#N-name}} alias`, `{{ var }}`, and `[[ ... ]]` syntax. Do the translation steps below before writing a source.
### Step A — Handle dummy-substituted placeholders (resolved cards only)
When a card has a required `{{ var }}` outside any `[[ ]]` block, the resolver substitutes a **dummy value** purely so Metabase's parser will accept the query. The resulting SQL is prefixed with a comment like:
```sql
-- PLACEHOLDER_WARNING: this SQL was extracted from a Metabase card with
-- unbound template parameters. The placeholders below were substituted with DUMMY
-- values to satisfy Metabase's parser — they DO NOT represent intended filters.
-- Drop the corresponding clauses (or expose them as runtime SL filters) before
-- persisting this SQL as a semantic-layer source.
WHERE start_date >= '2020-01-01' AND start_date < '2021-01-01' AND status = 'placeholder'
```
For each listed placeholder: locate the WHERE clause(s) in the SQL that reference the dummy literal and **drop them**, then strip the warning comment. SL chat-time filters compose narrowing predicates dynamically, so the source should represent the unfiltered dataset.
For `fallback` cards, dropping is simpler — the SQL still has the `[[ ... ]]` brackets and `{{ var }}` placeholders intact:
```sql
-- before:
WHERE 1=1
[[AND {{ auction_end }} ]]
[[AND status = {{ status }} ]]
-- after:
WHERE 1=1
```
### Step B — Inline `{{#N}}` references (fallback cards only)
Resolved cards already have `{{#N}}` inlined for you. For `fallback` cards, each `{{#N}}` (or `{{#N-some-slug}}`) in the SQL refers to another card's `resolvedSql`. The referenced card is in the WU's `rawFiles` or `dependencyPaths`. Read it with `read_raw_file`, then inline its SQL.
If the reference has an alias (`from {{#5996-listing-interactions}} tb`), the **outer** SQL probably uses that alias (`select tb.* ...`, `tb.column_name`, etc.). When you inline, you must EITHER:
1.**Pick a single base table inside the inlined SQL and rename its alias to the outer alias.** Useful when the inlined card is `SELECT * FROM listings JOIN ...` — set the LISTINGS alias to `tb` and `tb.*` keeps working in the outer query.
2.**Replace the outer alias references with explicit columns from the inlined SQL.** Useful when the inlined card has multiple JOINs and `tb.*` is ambiguous.
Never leave the outer alias dangling: after inlining, **grep your SQL for the outer alias name and rewrite or remove every reference**. A leftover `tb.*` with no `tb` table is the most common failure mode here.
### Step C — Inlining cleanup checklist
After Steps A and B, your SQL must:
- Contain no placeholder-warning comment, no `{{`, `}}`, `[[`, or `]]` characters anywhere.
- Reference no aliases that aren't defined inside the SQL itself.
- Be valid as a standalone subquery (the validator runs `SELECT * FROM (your_sql) LIMIT 1`).
If `resolutionStatus: "fallback"` and the SQL is still complex enough that you can't confidently translate it, **skip the card** rather than writing broken SQL. Call `emit_unmapped_fallback` with the staged card path as `rawPath`, `reason: "metabase_sql_untranslated"`, and `fallback: "flagged"`.
## Join-graph connectivity
For `source_type: table`:
- Use FK columns (`semantic_type: type/FK`) to declare `many_to_one` joins to dimension sources.
- Match column names ending in `_id` against existing sources' grain columns.
For `source_type: sql`:
- The validator parses your SQL and **rejects the write** if any FROM/JOIN table has a manifest entry that you did not declare in `joins:`. The error names every missing join target — declare a `many_to_one` join for each and reissue.
- Tables outside the manifest (schemas not covered by this connection — e.g. `staging.*` referenced from a MARTS source) are not flagged. For those, write a single-line `wiki_write` with key `unmapped-table-<table_name>` so the gap is documented, then call `emit_unmapped_fallback` with the staged card path as `rawPath`, `reason: "table_outside_manifest"`, and `fallback: "wiki_only"`.
Joins on manifest-backed names compose: the manifest's joins are inherited automatically, and any overlay `joins:` are merged on top (deduped by `to` + `on`). Use `disable_joins: ["<on-clause>"]` in the overlay to suppress a specific manifest join. If `sl_discover` shows a manifest-backed source with `Joins: 0` and the warehouse FK metadata is genuinely absent, declaring application-level joins via the overlay is fair game — bootstrap with `sl_write_source` (overlay shape above), then refine via `sl_edit_source`.
## Cross-card references (`{{#N}}`)
Resolved cards (`resolutionStatus: "resolved"`) have these inlined for you. Unresolved cards (`resolutionStatus: "fallback"`) need manual handling — see "SQL translation from raw native to KSL" above.
## Provenance markers
Every SL source and wiki page you write carries HTML-comment provenance tags pointing to the `cards/<id>.json` files they derive from: