ktx/packages/context/skills/metricflow_ingest/SKILL.md

---
name: metricflow_ingest
description: Map a MetricFlow semantic_model or metric into KTX semantic layer sources. Covers the MetricFlow to KTX primitive table, `extends:` inheritance flattening, metric-type handling (simple / derived / ratio / cumulative / conversion), `model: ref('x')` resolution, and four worked examples. Load when the turn contains `.yml`/`.yaml` files with top-level `semantic_models:` or `metrics:`.
callers: [memory_agent]
---

# MetricFlow to KTX Semantic Layer

A MetricFlow `semantic_model` maps to an SL source; MetricFlow `measures` map to KTX measures; MetricFlow `entities` map to KTX `joins`; MetricFlow `metrics` (top-level) map to KTX measures OR to cross-model derived measures. Files in one WorkUnit are ALWAYS part of the same logical entity (a connected component, possibly spanning `extends:` + cross-model metric refs). Flatten inheritance and cross-file references at write time.

## Mapping table

| MetricFlow | KTX form | Notes |
|---|---|---|
| `semantic_model: X { model: ref('t') }` with measures + dimensions | **Overlay** at `<connId>/X.yaml` with `measures`, `columns` (computed), `joins` | The `model:` ref resolves to a manifest table. |
| `semantic_model: X { model: source('s','t') }` | **Overlay** at `<connId>/X.yaml` over table `t`. | Same shape; `source()` still resolves to a physical table. |
| `semantic_model: X { model: <literal> }` with no manifest entry | **Standalone** with explicit `sql:`, `grain:`, `columns:` | Happens when the dbt manifest isn't available. |
| `semantic_model: Y { extends: X }` | **Merge** Y's measures/dimensions/entities into X's overlay, or write a single overlay named for the most-derived child (Y) containing both X's and Y's primitives | Do not emit a second overlay for X — flatten. |
| `measures: [{ name, agg, expr }]` | `measures: [{ name, expr: "<agg>(<expr>)" }]` | Aggregation inlined. `agg: count_distinct` → `count(distinct ...)`. |
| `entities: [{ name, type: primary }]` | `grain: [<entity_name-or-expr>]` on the overlay/standalone | Primary/unique entities drive grain. |
| `entities: [{ name, type: foreign }]` | `joins:` entry joining to the primary-entity's semantic_model | Only when a matching primary is discoverable. |
| `metrics: [{ type: simple, type_params: { measure: X } }]` | If the base measure is labeled/described by the metric: in-place edit to the existing measure. Otherwise leave as-is. | Same-name metrics can absorb metadata. |
| `metrics: [{ type: simple, filter: <jinja> }]` | **New measure** on the same source, with the filter translated to SQL and attached via `filter:` | Translate Jinja `{{ Dimension('x__y') }}` to the column name `y`. |
| `metrics: [{ type: derived, type_params: { expr, metrics } }]` | **Derived measure** on whichever source owns the referenced measures, with `expr:` referencing measure names | If the metric spans models, still write it once on the source owning the "primary" measure (the one the agent judges most central). Mention the cross-model chain in the description. |
| `metrics: [{ type: ratio, type_params: { numerator, denominator } }]` | Same as derived; `expr: "numerator / NULLIF(denominator, 0)"` if no explicit expr | Safe-division by default. |
| `metrics: [{ type: cumulative, type_params: { window, grain_to_date } }]` | **Standalone** source with a window-function SQL; reference the resulting column as a normal measure | KTX SL has no first-class cumulative primitive (spec Non-goals). |
| `metrics: [{ type: conversion }]` | **Flag for human** — do NOT write. Emit a wiki note describing the intended semantics. | No KTX equivalent in v1. |
| Metric not mappable | Wiki page `<metric_name>-definition.md` with the full YAML body quoted | Capture the intent even if we can't emit SL. |

Type map: MetricFlow `time` to KTX `time`; `categorical` to `string`; `number` to `number`; `boolean` to `boolean`. Follow `expr` over `name` when both differ — `expr` is the physical column.

## Flattening `extends:`

Within one WorkUnit, multiple semantic_models linked by `extends:` are guaranteed to be present (the chunker groups them). Resolve inheritance **before** writing:

1. Start with the most-derived child (the one that no other semantic_model extends).
2. Walk the `extends:` chain upward, accumulating measures, dimensions, entities.
3. Write ONE overlay/standalone, named for the most-derived child's SL-appropriate name (not the base).
4. Parents that lack their own distinctive content should NOT get a separate overlay. If a parent has unique measures a child doesn't inherit, consider whether the base is used elsewhere — if yes, write both; if no, still one overlay.
5. Measure/dimension name collisions: child wins, but note the overridden parent in the overlay's description or in a sibling wiki page.

The spec's worked example has `orders`, `orders_ext` (extends orders), and `metrics/orders_final.yml` (defines `revenue` referencing both). The right output is ONE overlay named `orders_ext` (or `orders` if the team's naming favors the base) containing `order_count`, `gross_amount`, `refund_amount`, and a derived `revenue` measure. Provenance tags point to all three source files.

## `model:` ref resolution

The `model:` field on a semantic_model is a string like `ref('table_name')`, `source('src','table_name')`, or a literal. Resolve:

- `ref('x')` → table name `x`. Verify via `sl_discover(x)`.
- `source('s','t')` → table name `t`. Verify via `sl_discover(t)`.
- Literal (no `ref(...)` / `source(...)`) → treat as the table name directly.

If `sl_discover` errors (no such table), fall back to `sql_execution({ sql: "SELECT column_name FROM <dataset>.INFORMATION_SCHEMA.COLUMNS WHERE table_name = '<x>'" })` (session shape — a connection is already pinned by the ingest session). **Never invent column names** — every column in `columns:`, `grain:`, and `sql:` must be sourced from a real probe.

After every `sl_write_source`, call `sl_validate`. The warehouse will reject invented columns with `Unrecognized name: <name>` — treat as a hard failure and re-read the schema.

## Cumulative metrics — sql-standalone fallback

KTX SL has no first-class `window:` or `grain_to_date:` primitive in v1 (spec Non-goals). Translate a MetricFlow cumulative metric to a standalone SL source with a window-function SQL:

```yaml
# MetricFlow input:
metrics:
  - name: cum_revenue_7d
    type: cumulative
    type_params:
      measure: gross_amount
      window: 7 days
```

```yaml
# KTX standalone output:
name: cum_revenue_7d
source_type: sql
sql: |
  SELECT
    ordered_at,
    SUM(amount) OVER (ORDER BY ordered_at RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW) AS cum_revenue_7d,
    order_id
  FROM analytics.orders
grain: [order_id]
columns:
  - {name: ordered_at, type: time, role: time}
  - {name: cum_revenue_7d, type: number}
  - {name: order_id, type: string}
measures:
  - {name: cum_revenue_7d, expr: "max(cum_revenue_7d)"}
```

Pick the time column based on the semantic_model's `defaults.agg_time_dimension` (e.g. `ordered_at`). If the MetricFlow config omits it, probe the base table for time-typed columns and choose the most obvious. After writing the standalone SQL source, call `emit_unmapped_fallback` with `rawPath` set to the MetricFlow file path, `reason: "cumulative_metric_unsupported"`, and `fallback: "sql_standalone"`.

## Conversion metrics — flag for human

```yaml
metrics:
  - name: signup_to_first_order
    type: conversion
    type_params:
      conversion_type_params:
        entity: customer
        base_measure: signup_count
        conversion_measure: first_order_count
        window: 30 days
```

Do NOT emit SL for this. Instead:
- Write a wiki page at `knowledge/global/<metric_name>-intent.md` quoting the full YAML body and a one-line explanation of the intended semantics (base event → conversion event within window).
- Call `emit_unmapped_fallback` with `rawPath` set to the MetricFlow file path, `reason: "conversion_metric_unsupported"`, and `fallback: "flagged"`.

When KTX SL gains conversion primitives, re-ingesting will find the prior wiki note (via `priorProvenance`) and replace it with an SL source.

## Provenance markers

Every overlay/standalone/wiki page emitted from a MetricFlow source carries HTML-comment provenance tags. When one overlay derives from multiple files (e.g. an extends chain), emit one tag per contributing file:

```yaml
# <!-- from: raw-sources/conn-1/metricflow/<syncId>/models/orders.yml#L1-20 -->
# <!-- from: raw-sources/conn-1/metricflow/<syncId>/models/orders_ext.yml#L1-12 -->
# <!-- from: raw-sources/conn-1/metricflow/<syncId>/metrics/orders_final.yml#L1-10 -->
name: orders_ext
...
```

Line ranges (`#L<start>-<end>`) point to the exact YAML span within the file (the `semantic_models:` entry for its own `name`). Use `read_raw_span` to identify those ranges before writing.

## Example 1 — single semantic_model to overlay

```yaml
# MetricFlow:
semantic_models:
  - name: orders
    model: ref('orders')
    entities:
      - {name: order_id, type: primary}
    measures:
      - {name: order_count, agg: count, expr: order_id}
      - {name: gross_amount, agg: sum, expr: amount}
```

```yaml
# KTX overlay at <connId>/orders.yaml:
# <!-- from: raw-sources/.../models/orders.yml#L1-10 -->
name: orders
description: Order fact table.
measures:
  - {name: order_count, expr: "count(order_id)"}
  - {name: gross_amount, expr: "sum(amount)"}
grain: [order_id]
```

## Example 2 — extends chain → one flattened overlay

```yaml
# MetricFlow:
# models/orders.yml
semantic_models:
  - name: orders
    model: ref('orders')
    measures:
      - {name: order_count, agg: count, expr: order_id}
      - {name: gross_amount, agg: sum, expr: amount}

# models/orders_ext.yml
semantic_models:
  - name: orders_ext
    model: ref('orders_ext')
    extends: orders
    measures:
      - {name: refund_amount, agg: sum, expr: refund_amt}

# metrics/orders_final.yml
metrics:
  - name: revenue
    type: derived
    type_params:
      expr: gross_amount - refund_amount
      metrics:
        - {name: gross_amount}
        - {name: refund_amount}
```

```yaml
# KTX overlay at <connId>/orders_ext.yaml (one file; inheritance flattened):
# <!-- from: raw-sources/.../models/orders.yml#L1-10 -->
# <!-- from: raw-sources/.../models/orders_ext.yml#L1-8 -->
# <!-- from: raw-sources/.../metrics/orders_final.yml#L1-10 -->
name: orders_ext
description: Extended order fact including refund handling; `revenue` = gross - refund.
measures:
  - {name: order_count, expr: "count(order_id)"}
  - {name: gross_amount, expr: "sum(amount)"}
  - {name: refund_amount, expr: "sum(refund_amt)"}
  - {name: revenue, expr: "gross_amount - refund_amount"}
grain: [order_id]
```

## Example 3 — derived metric spanning two semantic_models

```yaml
# models/sales.yml
semantic_models:
  - name: sales
    model: ref('sales')
    measures:
      - {name: revenue, agg: sum, expr: revenue_cents}
# models/costs.yml
semantic_models:
  - name: costs
    model: ref('costs')
    measures:
      - {name: cost, agg: sum, expr: cost_cents}
# metrics/margin.yml
metrics:
  - name: margin
    type: derived
    type_params:
      expr: revenue - cost
      metrics: [{name: revenue}, {name: cost}]
```

Because the WorkUnit bundles all three files (cross-component union via the metric), write the derived measure on ONE of the two sources — pick the source whose domain "owns" the metric (here, `sales` — margin is inherently a sales metric). Cross-source references aren't native in KTX SL; treat the metric's operands as already-resolvable in the target source's query context OR emit a standalone SQL that joins the two tables:

```yaml
# <connId>/sales.yaml
# <!-- from: .../models/sales.yml#L1-8 -->
# <!-- from: .../models/costs.yml#L1-8 -->
# <!-- from: .../metrics/margin.yml#L1-8 -->
name: sales
measures:
  - {name: revenue, expr: "sum(revenue_cents)"}
```

```yaml
# <connId>/margin.yaml — standalone because it spans two tables
# <!-- from: .../models/sales.yml#L1-8 -->
# <!-- from: .../models/costs.yml#L1-8 -->
# <!-- from: .../metrics/margin.yml#L1-8 -->
name: margin
source_type: sql
sql: |
  SELECT s.period_id, s.revenue_cents, COALESCE(c.cost_cents, 0) AS cost_cents
  FROM analytics.sales s
  LEFT JOIN analytics.costs c ON c.period_id = s.period_id
grain: [period_id]
columns:
  - {name: period_id, type: string}
  - {name: revenue_cents, type: number}
  - {name: cost_cents, type: number}
measures:
  - {name: revenue, expr: "sum(revenue_cents)"}
  - {name: cost, expr: "sum(cost_cents)"}
  - {name: margin, expr: "sum(revenue_cents) - sum(cost_cents)"}
```

Also write a wiki page at `knowledge/global/margin-metric.md` explaining the cross-source origin.

## Example 4 — filtered metric creates a new measure

```yaml
metrics:
  - name: paid_order_count
    type: simple
    type_params:
      measure: order_count
    filter: "{{ Dimension('orders__status') }} = 'paid'"
```

```yaml
# <connId>/orders.yaml
measures:
  - {name: order_count, expr: "count(order_id)"}
  - {name: paid_order_count, expr: "count(order_id)", filter: "status = 'paid'"}
```

Translate `{{ Dimension('orders__status') }}` to the bare column name `status` (the table alias prefix is implicit within the SL source's scope).