ktx/docs-site/content/docs/concepts/semantic-layer-internals.mdx
2026-05-16 09:12:15 -07:00

---
title: Semantic Layer Internals
description: How KTX uses join graphs, grain, and relationship metadata to turn context into safe SQL.
---
KTX is a context layer for agents. This page focuses on the semantic execution
subsystem: the part that turns reviewed YAML context into safe SQL.
Read it as a pipeline:
```text
context files + warehouse evidence
|
v
join graph with grain and relationship metadata
|
v
fan-out checks + aggregate-locality planning
|
v
canonical SQL -> dialect SQL
```
## Where it fits
The semantic layer is not the whole product. It is the engine that makes KTX
context actionable for SQL generation.
| Input | Used for |
|-------|----------|
| `semantic-layer/` | Sources, columns, joins, grain, measures, filters, and segments |
| `wiki/` | Business definitions, caveats, and metric explanations |
| `raw-sources/` | Schema scans, imported metadata, keys, and relationship evidence |
| Provenance | Ingest decisions, review history, and replay context |

Agents use the result to:
- search semantic sources and wiki pages;
- compile trusted SQL instead of guessing joins;
- explain metric meaning and provenance;
- patch YAML or Markdown and validate the diff.
## Join graph
A semantic source is a node. A join is a typed edge with a condition and a
relationship. The graph lets KTX choose valid paths and detect row-multiplying
paths before SQL is generated.
```text
customers <- many_to_one <- orders -> one_to_many -> order_items
grain: customer_id          grain: order_id          grain: order_id, line_id
```
| Relationship | What it means | Planning impact |
|--------------|---------------|-----------------|
| `many_to_one` | Many fact rows point to one dimension row | Usually safe for adding dimensions |
| `one_to_many` | One row expands into many child rows | Can multiply measures and trigger fan-out handling |
| `one_to_one` | Both sides preserve row identity | Usually safe when keys are correct |
| Ambiguous path | Multiple equal-cost paths connect sources | Requires aliases or a safer explicit path |

The graph is bidirectional for planning. If `orders -> customers` is
`many_to_one`, the reverse path is `one_to_many`; KTX keeps that distinction
instead of treating every join as neutral.
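
The idea of typed, direction-aware edges can be sketched in a few lines of Python. This is a minimal illustration, not KTX's internal model: `build_graph`, `find_path`, and `fans_out` are hypothetical names, and the source names come from the diagram above.

```python
from collections import deque

# Hypothetical in-memory model; KTX's real data structures may differ.
REVERSE = {
    "many_to_one": "one_to_many",
    "one_to_many": "many_to_one",
    "one_to_one": "one_to_one",
}

def build_graph(joins):
    """Store each declared join in both directions, flipping the relationship."""
    graph = {}
    for left, right, relationship in joins:
        graph.setdefault(left, []).append((right, relationship))
        graph.setdefault(right, []).append((left, REVERSE[relationship]))
    return graph

def find_path(graph, start, goal):
    """Breadth-first search; returns a list of (source, relationship) hops or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for neighbor, relationship in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + [(neighbor, relationship)]))
    return None

def fans_out(hops):
    """A path can multiply rows if any hop is one_to_many."""
    return any(relationship == "one_to_many" for _, relationship in hops)

joins = [
    ("orders", "customers", "many_to_one"),
    ("orders", "order_items", "one_to_many"),
]
graph = build_graph(joins)
to_customers = find_path(graph, "orders", "customers")
to_items = find_path(graph, "orders", "order_items")
customers_to_items = find_path(graph, "customers", "order_items")
```

Here `to_customers` contains only a `many_to_one` hop, while both `to_items` and `customers_to_items` contain `one_to_many` hops, which is the signal a planner would use to trigger fan-out handling.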
## How KTX builds the graph
KTX starts from evidence, then writes reviewable source YAML. The graph is
accepted only through the plain-file diff your team approves.
| Evidence | What it contributes |
|----------|---------------------|
| Declared primary keys | Initial row grain for each source |
| Declared foreign keys | Formal join candidates and relationship direction |
| Inferred relationships | Useful edges when warehouses lack constraints |
| dbt, MetricFlow, and LookML imports | Existing metrics, dimensions, entities, explores, and joins |
| Query history | Join and filter patterns agents should respect |
| Analyst review | Final authority before context is merged |
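
One way to picture how key evidence becomes a relationship type: if the join columns cover a source's full grain, that side contributes at most one row per key. The sketch below assumes that rule and uses a hypothetical `classify_relationship` helper with the grains from the join graph diagram; it is not taken from KTX's code, and any inferred result would still go through analyst review.

```python
def classify_relationship(left_join_cols, left_grain, right_join_cols, right_grain):
    """Classify a join from key coverage (an assumed rule, for illustration only).

    A side is unique when its join columns include every column of its grain.
    An empty grain is treated as unknown, so that side is never assumed unique.
    """
    left_unique = bool(left_grain) and set(left_grain) <= set(left_join_cols)
    right_unique = bool(right_grain) and set(right_grain) <= set(right_join_cols)
    if left_unique and right_unique:
        return "one_to_one"
    if right_unique:
        return "many_to_one"
    if left_unique:
        return "one_to_many"
    return "many_to_many"  # neither side is unique: needs review

# orders.customer_id -> customers.customer_id
orders_to_customers = classify_relationship(
    ["customer_id"], ["order_id"], ["customer_id"], ["customer_id"]
)
# orders.order_id -> order_items.order_id
orders_to_items = classify_relationship(
    ["order_id"], ["order_id"], ["order_id"], ["order_id", "line_id"]
)
```

With these inputs, `orders_to_customers` is `many_to_one` and `orders_to_items` is `one_to_many`, matching the diagram in the join graph section.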
## Maintenance loop
Semantic correctness changes when schemas, metrics, and business definitions
change. KTX keeps that loop explicit.
```text
ingest evidence
|
v
draft YAML diff
|
v
validate relationships and query shapes
|
v
analyst review
|
v
agent use
|
v
corrections become new evidence
```
This matters when a source gains a key, a metric changes definition, or an
analyst corrects a relationship. The next agent starts from the reviewed
context, not from hidden runtime state.
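
The "validate relationships" step can be illustrated with a simple uniqueness check: a declared `many_to_one` join only holds if the target side has no duplicate keys. The helper names and sample rows below are hypothetical, and a real check would run against the warehouse rather than in-memory rows.

```python
from collections import Counter

def duplicate_keys(rows, key_columns):
    """Return key tuples that appear more than once in rows."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)

def check_many_to_one(target_rows, target_key):
    """A many_to_one join needs the target key to be unique (illustrative check)."""
    dupes = duplicate_keys(target_rows, target_key)
    return {"ok": not dupes, "duplicate_keys": dupes}

# Sample rows for the "one" side of orders -> customers; customer 2 is duplicated.
customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 2}]
report = check_many_to_one(customers, ["customer_id"])
```

In this example `report["ok"]` is false, so the relationship would go back to review instead of reaching agents as trusted context.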
## Modeling problems
Fan-out is the classic failure mode: an order-level measure joins to line-item
rows before aggregation, so one order becomes many rows and revenue is counted
more than once.
| Problem | What happens | How KTX handles it |
|---------|--------------|--------------------|
| Order measure joins to `order_items` | `orders.revenue` repeats once per item | Detect the `one_to_many` path and pre-aggregate the order measure |
| Two fact sources share `customers` | Measures multiply across a shared dimension | Treat it as a chasm trap and plan each fact locally |
| Filter crosses a `one_to_many` path | Filtering after the join changes measure grain | Reject or localize the filter |
| Equal-cost paths connect the same sources | Join choice is ambiguous | Prefer safer paths or require aliases |

Many-to-many questions usually appear as multiple `one_to_many` paths or
independent fact sources. KTX treats those shapes as fan-out or chasm risks
unless the query can be planned at a safe grain.
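
The chasm trap is easy to reproduce with SQLite. The sketch below uses a hypothetical second fact source, `refunds`, and made-up rows; it compares joining both facts through `customers` against aggregating each fact at its own grain first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table customers (id integer primary key, segment text);
create table orders (id integer primary key, customer_id integer, amount integer);
-- refunds is a hypothetical second fact source used only for this example
create table refunds (id integer primary key, customer_id integer, amount integer);
insert into customers values (1, 'smb');
insert into orders values (10, 1, 100), (11, 1, 50);
insert into refunds values (20, 1, 30), (21, 1, 20), (22, 1, 10);
""")

# Chasm trap: 2 orders x 3 refunds = 6 joined rows for one customer.
naive = conn.execute("""
select c.segment, sum(o.amount), sum(r.amount)
from customers c
join orders o on o.customer_id = c.id
join refunds r on r.customer_id = c.id
group by c.segment
""").fetchone()

# Local planning: aggregate each fact at customer grain, then join the results.
local = conn.execute("""
with orders_agg as (
  select customer_id, sum(amount) as revenue from orders group by customer_id
),
refunds_agg as (
  select customer_id, sum(amount) as refunded from refunds group by customer_id
)
select c.segment, sum(o.revenue), sum(r.refunded)
from customers c
left join orders_agg o on o.customer_id = c.id
left join refunds_agg r on r.customer_id = c.id
group by c.segment
""").fetchone()
```

The naive query reports 450 in revenue and 120 in refunds because each order repeats once per refund and each refund once per order. The locally aggregated query reports the true totals, 150 and 60.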
## Execution planning
The planner resolves sources, chooses a join tree, checks relationship paths,
and decides whether the query can use a simple shape or needs aggregate
locality.
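
A rough sketch of that decision, under assumed rules: rank candidate paths by how many `one_to_many` hops they contain, then by length; reject disconnected sources and ties; and fall back to aggregate locality when the chosen path fans out. `PlanningError`, `choose_path`, and `query_shape` are hypothetical names, the `order_items -> customers` hop exists only to give the example a second candidate, and KTX's actual cost model may differ.

```python
class PlanningError(Exception):
    """Raised when no safe, unambiguous join path exists (hypothetical error type)."""

def path_cost(path):
    """Fewer one_to_many hops first, then fewer hops overall (assumed ranking)."""
    fan_out_hops = sum(1 for _, relationship in path if relationship == "one_to_many")
    return (fan_out_hops, len(path))

def choose_path(candidates):
    """Pick the cheapest path; refuse disconnected sources and equal-cost ties."""
    if not candidates:
        raise PlanningError("sources are not connected")
    ranked = sorted(candidates, key=path_cost)
    if len(ranked) > 1 and path_cost(ranked[0]) == path_cost(ranked[1]):
        raise PlanningError("equal-cost paths: add an alias or an explicit path")
    return ranked[0]

def query_shape(path):
    """Use a simple join-then-aggregate shape only when no hop can multiply rows."""
    fans_out = any(relationship == "one_to_many" for _, relationship in path)
    return "aggregate_locality" if fans_out else "simple"

# Two candidate paths from orders to customers, as (source, relationship) hops.
direct = [("customers", "many_to_one")]
via_items = [("order_items", "one_to_many"), ("customers", "many_to_one")]  # hypothetical
chosen = choose_path([via_items, direct])
shape = query_shape(chosen)
```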
| Naive SQL shape | Semantic-layer SQL shape |
|-----------------|--------------------------|
| Join facts and dimensions first, then aggregate | Aggregate each fact source at its own grain, then join results |
| Put every filter in one outer `WHERE` clause | Keep measure filters with the measure source when locality is needed |
| Trust the shortest textual join path | Prefer safe relationship paths and reject disconnected sources |
| Let dimension grain differ across facts | Raise an error when asymmetric dimensions would fan out another measure |

Unsafe shape:
```sql
select customers.segment, sum(orders.amount)
from orders
join order_items on order_items.order_id = orders.id
join customers on customers.id = orders.customer_id
group by customers.segment;
```
KTX shape:
```sql
with orders_agg as (
  select customer_id, sum(amount) as revenue
  from orders
  group by customer_id
)
select customers.segment, sum(orders_agg.revenue)
from orders_agg
join customers on customers.id = orders_agg.customer_id
group by customers.segment;
```
The result is structured planning: validated sources, typed relationships,
graph search, fan-out detection, aggregate locality, and final dialect
transpilation.
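
To see the two shapes above disagree on real numbers, they can be run against a small SQLite database. The table layouts and rows below are made up for illustration; the two queries are the unsafe shape and the KTX shape from this section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table customers (id integer primary key, segment text);
create table orders (id integer primary key, customer_id integer, amount integer);
create table order_items (order_id integer, line_id integer);
insert into customers values (1, 'smb'), (2, 'enterprise');
insert into orders values (10, 1, 100), (11, 1, 50), (12, 2, 200);
insert into order_items values (10, 1), (10, 2), (11, 1), (12, 1), (12, 2), (12, 3);
""")

unsafe_sql = """
select customers.segment, sum(orders.amount)
from orders
join order_items on order_items.order_id = orders.id
join customers on customers.id = orders.customer_id
group by customers.segment;
"""

ktx_sql = """
with orders_agg as (
  select customer_id, sum(amount) as revenue
  from orders
  group by customer_id
)
select customers.segment, sum(orders_agg.revenue)
from orders_agg
join customers on customers.id = orders_agg.customer_id
group by customers.segment;
"""

# Each query returns (segment, total) rows; collect them into dicts to compare.
unsafe_totals = dict(conn.execute(unsafe_sql).fetchall())
ktx_totals = dict(conn.execute(ktx_sql).fetchall())
```

With these rows the unsafe shape reports 250 for `smb` and 600 for `enterprise`, because each order amount repeats once per line item. The pre-aggregated shape reports 150 and 200, which match the order totals.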
## Agent usage notes
Use this page when an agent needs to explain how KTX turns reviewed semantic
context into SQL, why relationship metadata matters, or why a query was rejected
as unsafe.
| Agent task | Relevant section | Next page |
|------------|------------------|-----------|
| Explain why KTX asks for `grain` and relationship types | Join graph | [Writing Context](/docs/guides/writing-context) |
| Diagnose duplicated measures after a join | Modeling problems | [ktx sl](/docs/cli-reference/ktx-sl) |
| Explain safe SQL generation | Execution planning | [ktx sl](/docs/cli-reference/ktx-sl) |
| Describe how semantic context stays current | Maintenance loop | [Context as Code](/docs/concepts/context-as-code) |