ktx/docs-site/content/docs/concepts/semantic-layer-internals.mdx
Luca Martial b318671d31
Shorten concept docs (#118)
* docs: shorten concept pages

* docs: shorten semantic internals page

* docs: restore semantic internals diagrams

* docs: align semantic internals intro

* docs: rename semantic internals page

* docs: polish safe sql comparison
2026-05-16 12:36:07 -04:00

399 lines
18 KiB
Text

---
title: Context-Aware SQL
description: How KTX turns reviewed context, grain, and relationship evidence into safe SQL for agents.
---
## Why query planning needs context
Agents can generate SQL from schema alone, but safe analytics SQL needs more
than table names. KTX uses reviewed context to understand grain, joins, measures,
filters, and where aggregation must happen.
Read this page as four mechanics:
- context files feed the semantic engine;
- evidence becomes a join graph with grain and relationship metadata;
- review keeps the graph current;
- query planning avoids fan-out and ambiguous joins.
## Where the semantic layer fits
This planner is one subsystem inside KTX's broader context layer. It uses source
YAML, wiki context, scan evidence, and provenance to make context actionable for
SQL generation.
<div
className="not-prose my-8 overflow-hidden rounded-lg border border-fd-border bg-fd-card shadow-sm"
aria-label="How context inputs flow through the semantic layer into agent workflows"
>
<div className="grid gap-0 lg:grid-cols-[1fr_2rem_1.12fr_2rem_1fr]">
<section className="bg-fd-background p-4">
<p className="mb-3 text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"Context inputs"}
</p>
<div className="grid gap-2 text-sm">
<div className="border-l-2 border-fd-primary bg-fd-card px-3 py-2">
<p className="font-mono text-xs text-fd-foreground">semantic-layer/</p>
<p className="mt-1 text-xs leading-5 text-fd-muted-foreground">
{"source YAML, measures, joins, grain"}
</p>
</div>
<div className="border-l-2 border-amber-500 bg-fd-card px-3 py-2">
<p className="font-mono text-xs text-fd-foreground">wiki/</p>
<p className="mt-1 text-xs leading-5 text-fd-muted-foreground">
{"business rules, definitions, caveats"}
</p>
</div>
<div className="border-l-2 border-orange-500 bg-fd-card px-3 py-2">
<p className="font-mono text-xs text-fd-foreground">raw-sources/</p>
<p className="mt-1 text-xs leading-5 text-fd-muted-foreground">
{"schema scans, keys, imported metadata"}
</p>
</div>
<div className="border-l-2 border-slate-500 bg-fd-card px-3 py-2 dark:border-cyan-200">
<p className="font-mono text-xs text-fd-foreground">provenance</p>
<p className="mt-1 text-xs leading-5 text-fd-muted-foreground">
{"ingest decisions and review history"}
</p>
</div>
</div>
</section>
<div className="hidden items-center justify-center bg-fd-background lg:flex" aria-hidden="true">
<span className="h-px w-full bg-fd-border" />
</div>
<section className="relative bg-[#102226] p-5 text-white dark:bg-[#0b181b]">
<div className="absolute inset-y-0 left-0 w-1 bg-fd-primary" />
<p className="mb-3 text-[11px] font-semibold uppercase tracking-wide text-cyan-200">
{"Semantic layer engine"}
</p>
<div className="grid gap-2 sm:grid-cols-2">
<div className="rounded-md border border-cyan-100/20 bg-white/8 px-3 py-2">
<p className="text-sm font-semibold">Join graph</p>
<p className="mt-1 text-xs leading-5 text-cyan-50/75">
{"sources as nodes, joins as typed edges"}
</p>
</div>
<div className="rounded-md border border-cyan-100/20 bg-white/8 px-3 py-2">
<p className="text-sm font-semibold">Grain</p>
<p className="mt-1 text-xs leading-5 text-cyan-50/75">
{"row identity before aggregation"}
</p>
</div>
<div className="rounded-md border border-cyan-100/20 bg-white/8 px-3 py-2">
<p className="text-sm font-semibold">Measures</p>
<p className="mt-1 text-xs leading-5 text-cyan-50/75">
{"verified formulas and filters"}
</p>
</div>
<div className="rounded-md border border-cyan-100/20 bg-white/8 px-3 py-2">
<p className="whitespace-nowrap break-normal text-sm font-semibold">Relationships</p>
<p className="mt-1 text-xs leading-5 text-cyan-50/75">
{"many_to_one, one_to_many, one_to_one"}
</p>
</div>
</div>
<div className="mt-3 rounded-md border border-cyan-100/20 bg-cyan-50/10 px-3 py-2 text-sm">
{"Safe query planning before SQL is generated."}
</div>
</section>
<div className="hidden items-center justify-center bg-fd-background lg:flex" aria-hidden="true">
<span className="h-px w-full bg-fd-border" />
</div>
<section className="bg-fd-muted/35 p-4">
<p className="mb-3 text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"Agent workflows"}
</p>
<div className="space-y-2 text-sm">
<div className="rounded-md border border-fd-border bg-fd-card px-3 py-2">
{"Search sources and wiki pages"}
</div>
<div className="rounded-md border border-fd-border bg-fd-card px-3 py-2">
{"Compile trusted SQL"}
</div>
<div className="rounded-md border border-fd-border bg-fd-card px-3 py-2">
{"Explain metrics and provenance"}
</div>
<div className="rounded-md border border-fd-border bg-fd-card px-3 py-2">
{"Patch files and validate review"}
</div>
</div>
</section>
</div>
</div>
## Join graph
A semantic source is a node. A join is a typed edge. KTX uses the graph to
choose valid paths and detect row-multiplying joins before SQL is generated.
| Relationship | Planning impact |
|--------------|-----------------|
| `many_to_one` | Usually safe for adding dimensions |
| `one_to_many` | Can multiply measures and trigger fan-out handling |
| `one_to_one` | Usually safe when keys are correct |
| Equal-cost paths | Ambiguous unless aliases or explicit joins disambiguate |
<figure
className="not-prose my-8 overflow-hidden rounded-lg border border-fd-border bg-fd-card p-4 shadow-sm"
aria-label="Example semantic join graph"
>
<div className="grid gap-3 md:grid-cols-[1fr_1fr_1fr]">
<div className="rounded-md border border-fd-border bg-fd-background px-4 py-3">
<p className="text-sm font-semibold text-fd-foreground">customers</p>
<p className="mt-1 text-xs text-fd-muted-foreground">grain: customer_id</p>
</div>
<div className="rounded-md border-2 border-fd-primary bg-fd-background px-4 py-3">
<p className="text-sm font-semibold text-fd-foreground">orders</p>
<p className="mt-1 text-xs text-fd-muted-foreground">grain: order_id</p>
</div>
<div className="rounded-md border border-fd-border bg-fd-background px-4 py-3">
<p className="text-sm font-semibold text-fd-foreground">order_items</p>
<p className="mt-1 text-xs text-fd-muted-foreground">grain: order_id, line_id</p>
</div>
</div>
<div className="my-3 grid gap-2 text-center text-xs font-medium text-fd-muted-foreground md:grid-cols-[1fr_1fr]">
<div>orders -> customers: many_to_one</div>
<div>orders -> order_items: one_to_many</div>
</div>
<figcaption className="mt-4 border-t border-fd-border pt-3 text-left text-xs leading-5 text-fd-muted-foreground">
<span className="font-medium text-fd-foreground">{"Example: "}</span>
{"refunds joins to orders. Used carefully, it explains net revenue. Joined naively, it can duplicate order-level measures."}
</figcaption>
</figure>
The graph is bidirectional for planning. If `orders -> customers` is
`many_to_one`, the reverse path is `one_to_many`.
## Building and maintaining the graph
KTX starts from evidence, writes reviewable source YAML, and treats the merged
diff as the accepted graph.
| Evidence | What it contributes |
|----------|---------------------|
| Declared primary keys | Initial row grain |
| Declared foreign keys | Formal join candidates |
| Inferred relationships | Edges when warehouses lack constraints |
| dbt, MetricFlow, and LookML imports | Existing metrics, dimensions, explores, and joins |
| Query history | Real join and filter patterns |
| Analyst review | Final authority before context is merged |
<div
className="not-prose my-8 overflow-hidden rounded-lg border border-fd-border bg-fd-card shadow-sm"
aria-label="Semantic layer maintenance loop"
>
<div className="border-b border-fd-border bg-fd-muted/35 px-4 py-3">
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"Semantic maintenance loop"}
</p>
<p className="mt-1 text-sm leading-6 text-fd-muted-foreground">
{"Every accepted correction becomes input to the next graph build."}
</p>
</div>
<div className="p-4">
<div className="-mx-4 overflow-x-auto px-4">
<div className="relative mx-auto h-[460px] w-[720px] max-w-none md:w-full md:max-w-[760px]">
<svg
aria-hidden="true"
className="absolute inset-0 h-full w-full text-fd-primary"
fill="none"
viewBox="0 0 760 460"
>
<g
stroke="currentColor"
strokeLinecap="round"
strokeLinejoin="round"
strokeOpacity="0.68"
strokeWidth="2.5"
>
<path d="M 352 80 H 384" />
<path d="M 600 80 H 668 V 150" />
<path d="M 632 284 V 378 H 626" />
<path d="M 408 378 H 376" />
<path d="M 160 378 H 96 V 308" />
<path d="M 128 172 V 80 H 140" />
</g>
<g fill="currentColor" fillOpacity="0.96" stroke="none">
<polygon points="0,0 -14,-7 -14,7" transform="translate(398 80)" />
<polygon points="0,0 -14,-7 -14,7" transform="translate(668 164) rotate(90)" />
<polygon points="0,0 -14,-7 -14,7" transform="translate(612 378) rotate(180)" />
<polygon points="0,0 -14,-7 -14,7" transform="translate(362 378) rotate(180)" />
<polygon points="0,0 -14,-7 -14,7" transform="translate(96 294) rotate(270)" />
<polygon points="0,0 -14,-7 -14,7" transform="translate(154 80)" />
</g>
</svg>
<div className="absolute left-1/2 top-1/2 flex h-32 w-56 -translate-x-1/2 -translate-y-1/2 flex-col items-center justify-center rounded-md border border-fd-primary/50 bg-fd-background px-4 py-4 text-center shadow-sm">
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-primary">
{"reviewed context"}
</p>
<p className="mt-2 text-sm font-semibold leading-6 text-fd-foreground">
{"The accepted graph becomes the starting point for the next build."}
</p>
</div>
<div className="absolute left-[160px] top-6 h-28 w-48 rounded-md border-2 border-fd-primary bg-fd-background px-4 py-3 text-sm shadow-sm">
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"Step 1"}
</p>
<p className="mt-1 font-semibold text-fd-foreground">{"ingest evidence"}</p>
<p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
{"scan schemas, imports, and accepted files"}
</p>
</div>
<div className="absolute left-[408px] top-6 h-28 w-48 rounded-md border border-fd-border bg-fd-background px-4 py-3 text-sm shadow-sm">
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"Step 2"}
</p>
<p className="mt-1 font-semibold text-fd-foreground">{"YAML diff"}</p>
<p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
{"draft source, join, grain, and measure changes"}
</p>
</div>
<div className="absolute left-[536px] top-[172px] h-28 w-48 rounded-md border border-fd-border bg-fd-background px-4 py-3 text-sm shadow-sm">
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"Step 3"}
</p>
<p className="mt-1 font-semibold text-fd-foreground">{"validation"}</p>
<p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
{"check relationships, syntax, and unsafe query shapes"}
</p>
</div>
<div className="absolute left-[408px] top-[322px] h-28 w-48 rounded-md border border-fd-border bg-fd-background px-4 py-3 text-sm shadow-sm">
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"Step 4"}
</p>
<p className="mt-1 font-semibold text-fd-foreground">{"analyst review"}</p>
<p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
{"accept, edit, or reject generated context"}
</p>
</div>
<div className="absolute left-[160px] top-[322px] h-28 w-48 rounded-md border border-fd-border bg-fd-background px-4 py-3 text-sm shadow-sm">
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"Step 5"}
</p>
<p className="mt-1 font-semibold text-fd-foreground">{"agent use"}</p>
<p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
{"serve context to search, explain, and query"}
</p>
</div>
<div className="absolute left-8 top-[172px] h-28 w-48 rounded-md border border-fd-primary/70 bg-fd-background px-4 py-3 text-sm shadow-sm">
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"Step 6"}
</p>
<p className="mt-1 font-semibold text-fd-foreground">{"corrections"}</p>
<p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
{"agent and analyst fixes become new evidence"}
</p>
</div>
</div>
</div>
</div>
</div>
## Modeling problems
Fan-out is the classic failure mode: an order-level measure joins to line-item
rows before aggregation, so one order becomes many rows.
| Problem | What happens | How KTX handles it |
|---------|--------------|--------------------|
| Order measure joins to `order_items` | `orders.revenue` repeats once per item | Detect `one_to_many` and pre-aggregate |
| Two fact sources share `customers` | Measures multiply across the shared dimension | Treat as a chasm trap and plan each fact locally |
| Filter crosses `one_to_many` | Filtering changes measure grain | Reject or localize the filter |
| Equal-cost paths connect sources | Join choice is ambiguous | Prefer safer paths or require aliases |
## Execution planning
The planner resolves sources, chooses a join tree, checks relationship paths,
and picks a simple or aggregate-locality SQL shape.
| Naive SQL shape | Semantic-layer SQL shape |
|-----------------|--------------------------|
| Join facts and dimensions first, then aggregate | Aggregate each fact source at its own grain, then join results |
| Put every filter in one outer `WHERE` clause | Keep measure filters with the measure source when locality is needed |
| Trust the shortest textual join path | Prefer safe relationship paths and reject disconnected sources |
| Let dimension grain differ across facts | Raise when asymmetric dimensions would fan out another measure |
<div
className="not-prose my-8 overflow-hidden rounded-lg border border-fd-border bg-fd-card shadow-sm"
aria-label="Fan-out safe execution shape"
>
<div className="border-b border-fd-border bg-fd-muted/35 px-4 py-3">
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"Fan-out handling"}
</p>
<p className="mt-1 text-sm leading-6 text-fd-muted-foreground">
{"The same question planned before and after KTX preserves the measure grain."}
</p>
</div>
<div className="grid gap-3 bg-fd-background p-4 md:grid-cols-[0.92fr_1.08fr]">
<section className="flex min-h-full flex-col rounded-md border border-fd-border bg-fd-card">
<div className="border-b border-fd-border px-4 py-3">
<p className="text-[11px] font-semibold uppercase tracking-wide text-red-600 dark:text-red-300">
{"Unsafe shape"}
</p>
<p className="mt-1 text-sm font-semibold text-fd-foreground">
{"Join first, aggregate later"}
</p>
</div>
<pre className="m-0 min-h-[13rem] flex-1 overflow-x-auto bg-transparent px-4 py-3 text-xs leading-5 text-fd-foreground">
{`orders
-> join order_items
-> join customers
group by
customer_segment
measure
sum(orders.amount)`}
</pre>
<div className="border-t border-fd-border bg-red-50/60 px-4 py-3 text-sm leading-6 text-red-950 dark:bg-red-950/20 dark:text-red-100">
{"Order-level revenue is exposed to line-item fan-out before aggregation."}
</div>
</section>
<section className="flex min-h-full flex-col rounded-md border border-fd-primary/40 bg-fd-card shadow-[inset_4px_0_0_var(--color-fd-primary)]">
<div className="border-b border-fd-border px-4 py-3">
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-primary">
{"KTX shape"}
</p>
<p className="mt-1 text-sm font-semibold text-fd-foreground">
{"Aggregate locally, then join"}
</p>
</div>
<pre className="m-0 min-h-[13rem] flex-1 overflow-x-auto bg-transparent px-4 py-3 text-xs leading-5 text-fd-foreground">
{`orders_agg as (
select customer_id, sum(amount) revenue
from orders
group by customer_id
)
select customers.segment, sum(revenue)
from orders_agg
join customers`}
</pre>
<div className="border-t border-fd-border bg-fd-primary/10 px-4 py-3 text-sm leading-6 text-fd-foreground">
{"The measure is pre-aggregated at order grain before dimensions are joined."}
</div>
</section>
</div>
</div>
The result is structured planning: validated sources, typed relationships,
graph search, fan-out detection, aggregate locality, and dialect transpilation.
## Agent usage notes
Use this page when an agent needs to explain how KTX turns reviewed semantic
context into SQL, why relationship metadata matters, or why a query was rejected
as unsafe.
| Agent task | Relevant section | Next page |
|------------|------------------|-----------|
| Explain why KTX asks for `grain` and relationship types | Join graph | [Writing Context](/docs/guides/writing-context) |
| Diagnose duplicated measures after a join | Modeling problems | [ktx sl](/docs/cli-reference/ktx-sl) |
| Explain safe SQL generation | Execution planning | [ktx sl](/docs/cli-reference/ktx-sl) |
| Describe how semantic context stays current | Building and maintaining the graph | [Context as Code](/docs/concepts/context-as-code) |