13 KiB
name: metricflow_ingest
description: Map a MetricFlow semantic_model or metric into KTX semantic layer sources. Covers the MetricFlow to KTX primitive table, extends: inheritance flattening, metric-type handling (simple / derived / ratio / cumulative / conversion), model: ref('x') resolution, and four worked examples. Load when the turn contains .yml/.yaml files with top-level semantic_models: or metrics:.
callers: [memory_agent]
MetricFlow to KTX Semantic Layer
A MetricFlow semantic_model maps to an SL source; MetricFlow measures map to KTX measures; MetricFlow entities map to KTX joins; MetricFlow metrics (top-level) map to KTX measures OR to cross-model derived measures. Files in one WorkUnit are ALWAYS part of the same logical entity (a connected component, possibly spanning extends: + cross-model metric refs). Flatten inheritance and cross-file references at write time.
Mapping table
| MetricFlow | KTX form | Notes |
|---|---|---|
semantic_model: X { model: ref('t') } with measures + dimensions |
Overlay at <connId>/X.yaml with measures, columns (computed), joins |
The model: ref resolves to a manifest table. |
semantic_model: X { model: source('s','t') } |
Overlay at <connId>/X.yaml over table t. |
Same shape; source() still resolves to a physical table. |
semantic_model: X { model: <literal> } with no manifest entry |
Standalone with explicit sql:, grain:, columns: |
Happens when the dbt manifest isn't available. |
semantic_model: Y { extends: X } |
Merge Y's measures/dimensions/entities into X's overlay, or write a single overlay named for the most-derived child (Y) containing both X's and Y's primitives | Do not emit a second overlay for X — flatten. |
measures: [{ name, agg, expr }] |
measures: [{ name, expr: "<agg>(<expr>)" }] |
Aggregation inlined. agg: count_distinct → count(distinct ...). |
entities: [{ name, type: primary }] |
grain: [<entity_name-or-expr>] on the overlay/standalone |
Primary/unique entities drive grain. |
entities: [{ name, type: foreign }] |
joins: entry joining to the primary-entity's semantic_model |
Only when a matching primary is discoverable. |
metrics: [{ type: simple, type_params: { measure: X } }] |
If the base measure is labeled/described by the metric: in-place edit to the existing measure. Otherwise leave as-is. | Same-name metrics can absorb metadata. |
metrics: [{ type: simple, filter: <jinja> }] |
New measure on the same source, with the filter translated to SQL and attached via filter: |
Translate Jinja {{ Dimension('x__y') }} to the column name y. |
metrics: [{ type: derived, type_params: { expr, metrics } }] |
Derived measure on whichever source owns the referenced measures, with expr: referencing measure names |
If the metric spans models, still write it once on the source owning the "primary" measure (the one the agent judges most central). Mention the cross-model chain in the description. |
metrics: [{ type: ratio, type_params: { numerator, denominator } }] |
Same as derived; expr: "numerator / NULLIF(denominator, 0)" if no explicit expr |
Safe-division by default. |
metrics: [{ type: cumulative, type_params: { window, grain_to_date } }] |
Standalone source with a window-function SQL; reference the resulting column as a normal measure | KTX SL has no first-class cumulative primitive (spec Non-goals). |
metrics: [{ type: conversion }] |
Flag for human — do NOT write. Emit a wiki note describing the intended semantics. | No KTX equivalent in v1. |
| Metric not mappable | Wiki page <metric_name>-definition.md with the full YAML body quoted |
Capture the intent even if we can't emit SL. |
Type map: MetricFlow time to KTX time; categorical to string; number to number; boolean to boolean. Follow expr over name when both differ — expr is the physical column.
Flattening extends:
Within one WorkUnit, multiple semantic_models linked by extends: are guaranteed to be present (the chunker groups them). Resolve inheritance before writing:
- Start with the most-derived child (the one that no other semantic_model extends).
- Walk the
extends:chain upward, accumulating measures, dimensions, entities. - Write ONE overlay/standalone, named for the most-derived child's SL-appropriate name (not the base).
- Parents that lack their own distinctive content should NOT get a separate overlay. If a parent has unique measures a child doesn't inherit, consider whether the base is used elsewhere — if yes, write both; if no, still one overlay.
- Measure/dimension name collisions: child wins, but note the overridden parent in the overlay's description or in a sibling wiki page.
The spec's worked example has orders, orders_ext (extends orders), and metrics/orders_final.yml (defines revenue referencing both). The right output is ONE overlay named orders_ext (or orders if the team's naming favors the base) containing order_count, gross_amount, refund_amount, and a derived revenue measure. Provenance tags point to all three source files.
model: ref resolution
The model: field on a semantic_model is a string like ref('table_name'), source('src','table_name'), or a literal. Resolve:
ref('x')→ table namex. Verify viasl_discover(x).source('s','t')→ table namet. Verify viasl_discover(t).- Literal (no
ref(...)/source(...)) → treat as the table name directly.
If sl_discover errors (no such table), fall back to sql_execution({ sql: "SELECT column_name FROM <dataset>.INFORMATION_SCHEMA.COLUMNS WHERE table_name = '<x>'" }) (session shape — a connection is already pinned by the ingest session). Never invent column names — every column in columns:, grain:, and sql: must be sourced from a real probe.
After every sl_write_source, call sl_validate. The warehouse will reject invented columns with Unrecognized name: <name> — treat as a hard failure and re-read the schema.
Cumulative metrics — sql-standalone fallback
KTX SL has no first-class window: or grain_to_date: primitive in v1 (spec Non-goals). Translate a MetricFlow cumulative metric to a standalone SL source with a window-function SQL:
# MetricFlow input:
metrics:
- name: cum_revenue_7d
type: cumulative
type_params:
measure: gross_amount
window: 7 days
# KTX standalone output:
name: cum_revenue_7d
source_type: sql
sql: |
SELECT
ordered_at,
SUM(amount) OVER (ORDER BY ordered_at RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW) AS cum_revenue_7d,
order_id
FROM analytics.orders
grain: [order_id]
columns:
- {name: ordered_at, type: time, role: time}
- {name: cum_revenue_7d, type: number}
- {name: order_id, type: string}
measures:
- {name: cum_revenue_7d, expr: "max(cum_revenue_7d)"}
Pick the time column based on the semantic_model's defaults.agg_time_dimension (e.g. ordered_at). If the MetricFlow config omits it, probe the base table for time-typed columns and choose the most obvious. After writing the standalone SQL source, call emit_unmapped_fallback with rawPath set to the MetricFlow file path, reason: "cumulative_metric_unsupported", and fallback: "sql_standalone".
Conversion metrics — flag for human
metrics:
- name: signup_to_first_order
type: conversion
type_params:
conversion_type_params:
entity: customer
base_measure: signup_count
conversion_measure: first_order_count
window: 30 days
Do NOT emit SL for this. Instead:
- Write a wiki page at
knowledge/global/<metric_name>-intent.mdquoting the full YAML body and a one-line explanation of the intended semantics (base event → conversion event within window). - Call
emit_unmapped_fallbackwithrawPathset to the MetricFlow file path,reason: "conversion_metric_unsupported", andfallback: "flagged".
When KTX SL gains conversion primitives, re-ingesting will find the prior wiki note (via priorProvenance) and replace it with an SL source.
Provenance markers
Every overlay/standalone/wiki page emitted from a MetricFlow source carries HTML-comment provenance tags. When one overlay derives from multiple files (e.g. an extends chain), emit one tag per contributing file:
# <!-- from: raw-sources/conn-1/metricflow/<syncId>/models/orders.yml#L1-20 -->
# <!-- from: raw-sources/conn-1/metricflow/<syncId>/models/orders_ext.yml#L1-12 -->
# <!-- from: raw-sources/conn-1/metricflow/<syncId>/metrics/orders_final.yml#L1-10 -->
name: orders_ext
...
Line ranges (#L<start>-<end>) point to the exact YAML span within the file (the semantic_models: entry for its own name). Use read_raw_span to identify those ranges before writing.
Example 1 — single semantic_model to overlay
# MetricFlow:
semantic_models:
- name: orders
model: ref('orders')
entities:
- {name: order_id, type: primary}
measures:
- {name: order_count, agg: count, expr: order_id}
- {name: gross_amount, agg: sum, expr: amount}
# KTX overlay at <connId>/orders.yaml:
# <!-- from: raw-sources/.../models/orders.yml#L1-10 -->
name: orders
description: Order fact table.
measures:
- {name: order_count, expr: "count(order_id)"}
- {name: gross_amount, expr: "sum(amount)"}
grain: [order_id]
Example 2 — extends chain → one flattened overlay
# MetricFlow:
# models/orders.yml
semantic_models:
- name: orders
model: ref('orders')
measures:
- {name: order_count, agg: count, expr: order_id}
- {name: gross_amount, agg: sum, expr: amount}
# models/orders_ext.yml
semantic_models:
- name: orders_ext
model: ref('orders_ext')
extends: orders
measures:
- {name: refund_amount, agg: sum, expr: refund_amt}
# metrics/orders_final.yml
metrics:
- name: revenue
type: derived
type_params:
expr: gross_amount - refund_amount
metrics:
- {name: gross_amount}
- {name: refund_amount}
# KTX overlay at <connId>/orders_ext.yaml (one file; inheritance flattened):
# <!-- from: raw-sources/.../models/orders.yml#L1-10 -->
# <!-- from: raw-sources/.../models/orders_ext.yml#L1-8 -->
# <!-- from: raw-sources/.../metrics/orders_final.yml#L1-10 -->
name: orders_ext
description: Extended order fact including refund handling; `revenue` = gross - refund.
measures:
- {name: order_count, expr: "count(order_id)"}
- {name: gross_amount, expr: "sum(amount)"}
- {name: refund_amount, expr: "sum(refund_amt)"}
- {name: revenue, expr: "gross_amount - refund_amount"}
grain: [order_id]
Example 3 — derived metric spanning two semantic_models
# models/sales.yml
semantic_models:
- name: sales
model: ref('sales')
measures:
- {name: revenue, agg: sum, expr: revenue_cents}
# models/costs.yml
semantic_models:
- name: costs
model: ref('costs')
measures:
- {name: cost, agg: sum, expr: cost_cents}
# metrics/margin.yml
metrics:
- name: margin
type: derived
type_params:
expr: revenue - cost
metrics: [{name: revenue}, {name: cost}]
Because the WorkUnit bundles all three files (cross-component union via the metric), write the derived measure on ONE of the two sources — pick the source whose domain "owns" the metric (here, sales — margin is inherently a sales metric). Cross-source references aren't native in KTX SL; treat the metric's operands as already-resolvable in the target source's query context OR emit a standalone SQL that joins the two tables:
# <connId>/sales.yaml
# <!-- from: .../models/sales.yml#L1-8 -->
# <!-- from: .../models/costs.yml#L1-8 -->
# <!-- from: .../metrics/margin.yml#L1-8 -->
name: sales
measures:
- {name: revenue, expr: "sum(revenue_cents)"}
# <connId>/margin.yaml — standalone because it spans two tables
# <!-- from: .../models/sales.yml#L1-8 -->
# <!-- from: .../models/costs.yml#L1-8 -->
# <!-- from: .../metrics/margin.yml#L1-8 -->
name: margin
source_type: sql
sql: |
SELECT s.period_id, s.revenue_cents, COALESCE(c.cost_cents, 0) AS cost_cents
FROM analytics.sales s
LEFT JOIN analytics.costs c ON c.period_id = s.period_id
grain: [period_id]
columns:
- {name: period_id, type: string}
- {name: revenue_cents, type: number}
- {name: cost_cents, type: number}
measures:
- {name: revenue, expr: "sum(revenue_cents)"}
- {name: cost, expr: "sum(cost_cents)"}
- {name: margin, expr: "sum(revenue_cents) - sum(cost_cents)"}
Also write a wiki page at knowledge/global/margin-metric.md explaining the cross-source origin.
Example 4 — filtered metric creates a new measure
metrics:
- name: paid_order_count
type: simple
type_params:
measure: order_count
filter: "{{ Dimension('orders__status') }} = 'paid'"
# <connId>/orders.yaml
measures:
- {name: order_count, expr: "count(order_id)"}
- {name: paid_order_count, expr: "count(order_id)", filter: "status = 'paid'"}
Translate {{ Dimension('orders__status') }} to the bare column name status (the table alias prefix is implicit within the SL source's scope).