---
title: Semantic Layer Internals
description: How KTX uses join graphs, grain, and relationship metadata to turn context into safe SQL.
---

KTX is a context layer for agents. This page focuses on the semantic execution
subsystem: the part that turns reviewed YAML context into safe SQL.

Read it as a pipeline:

```text
context files + warehouse evidence
      |
      v
join graph with grain and relationship metadata
      |
      v
fan-out checks + aggregate-locality planning
      |
      v
canonical SQL -> dialect SQL
```

## Where it fits

The semantic layer is not the whole product. It is the engine that makes KTX
context actionable for SQL generation.

| Input | Used for |
|-------|----------|
| `semantic-layer/` | Sources, columns, joins, grain, measures, filters, and segments |
| `wiki/` | Business definitions, caveats, and metric explanations |
| `raw-sources/` | Schema scans, imported metadata, keys, and relationship evidence |
| Provenance | Ingest decisions, review history, and replay context |

Agents use the result to:

- search semantic sources and wiki pages;
- compile trusted SQL instead of guessing joins;
- explain metric meaning and provenance;
- patch YAML or Markdown and validate the diff.

## Join graph

A semantic source is a node. A join is a typed edge with a condition and a
relationship. The graph lets KTX choose valid paths and detect row-multiplying
paths before SQL is generated.

```text
customers <- many_to_one <- orders -> one_to_many -> order_items
grain: customer_id          grain: order_id          grain: order_id, line_id
```

| Relationship | What it means | Planning impact |
|--------------|---------------|-----------------|
| `many_to_one` | Many fact rows point to one dimension row | Usually safe for adding dimensions |
| `one_to_many` | One row expands into many child rows | Can multiply measures and trigger fan-out handling |
| `one_to_one` | Both sides preserve row identity | Usually safe when keys are correct |
| Ambiguous path | Multiple equal-cost paths connect sources | Requires aliases or a safer explicit path |

The graph is bidirectional for planning. If `orders -> customers` is
`many_to_one`, the reverse path is `one_to_many`; KTX keeps that distinction
instead of treating every join as neutral.
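
The typed, bidirectional graph described above can be sketched in a few lines of Python. This is an illustration only: `Join`, `build_graph`, and `fans_out` are hypothetical names, not KTX's internal API, and the join conditions are assumed from the example sources on this page.

```python
from dataclasses import dataclass

# Hypothetical sketch; none of these names come from KTX itself.
INVERSE = {
    "many_to_one": "one_to_many",
    "one_to_many": "many_to_one",
    "one_to_one": "one_to_one",
}


@dataclass(frozen=True)
class Join:
    left: str
    right: str
    on: str
    relationship: str  # read in the left -> right direction

    def reversed(self) -> "Join":
        # Walking the edge backwards flips the relationship type.
        return Join(self.right, self.left, self.on, INVERSE[self.relationship])


def build_graph(joins):
    """Index each join in both directions so planning can walk either way."""
    graph = {}
    for join in joins:
        for edge in (join, join.reversed()):
            graph.setdefault(edge.left, {})[edge.right] = edge
    return graph


def fans_out(graph, path):
    """True if walking the path crosses any row-multiplying edge."""
    return any(
        graph[a][b].relationship == "one_to_many"
        for a, b in zip(path, path[1:])
    )


graph = build_graph([
    Join("orders", "customers", "orders.customer_id = customers.id", "many_to_one"),
    Join("orders", "order_items", "orders.id = order_items.order_id", "one_to_many"),
])

print(graph["customers"]["orders"].relationship)   # one_to_many
print(fans_out(graph, ["orders", "customers"]))    # False
print(fans_out(graph, ["orders", "order_items"]))  # True
```

Storing the reversed edge explicitly, rather than deriving it on demand, keeps the direction-sensitive check a plain dictionary lookup.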

## How KTX builds the graph

KTX starts from evidence, then writes reviewable source YAML. The accepted graph
is the plain-file diff your team approves.

| Evidence | What it contributes |
|----------|---------------------|
| Declared primary keys | Initial row grain for each source |
| Declared foreign keys | Formal join candidates and relationship direction |
| Inferred relationships | Useful edges when warehouses lack constraints |
| dbt, MetricFlow, and LookML imports | Existing metrics, dimensions, entities, explores, and joins |
| Query history | Join and filter patterns agents should respect |
| Analyst review | Final authority before context is merged |
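
As a rough illustration of the "inferred relationships" row, one common heuristic is to classify a candidate join by whether the join key is unique on each side. The sketch below assumes that heuristic; it is not a description of how KTX infers edges. The `infer_relationship` helper and the `many_to_many` label are hypothetical.

```python
def infer_relationship(left_keys, right_keys):
    """Classify a candidate join from sampled join-key values on each side.

    Hypothetical helper: a side counts as "one" when its key values are unique.
    """
    left_unique = len(left_keys) == len(set(left_keys))
    right_unique = len(right_keys) == len(set(right_keys))
    if left_unique and right_unique:
        return "one_to_one"
    if right_unique:
        return "many_to_one"
    if left_unique:
        return "one_to_many"
    # Illustrative label only; such shapes need review before use.
    return "many_to_many"


# orders.customer_id repeats, customers.id does not.
print(infer_relationship([1, 1, 2, 3, 3, 3], [1, 2, 3]))  # many_to_one

# orders.id is unique, order_items.order_id repeats.
print(infer_relationship([10, 11], [10, 10, 11]))         # one_to_many
```

A sample can look unique when the full table is not, which is one reason analyst review stays the final authority before an inferred edge is merged.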

## Maintenance loop

Semantic correctness changes when schemas, metrics, and business definitions
change. KTX keeps that loop explicit.

```text
ingest evidence
      |
      v
draft YAML diff
      |
      v
validate relationships and query shapes
      |
      v
analyst review
      |
      v
agent use
      |
      v
corrections become new evidence
```

This matters when a source gains a key, a metric changes definition, or an
analyst corrects a relationship. The next agent starts from the reviewed
context, not a hidden runtime state.

## Modeling problems

Fan-out is the classic failure mode: an order-level measure joins to line-item
rows before aggregation, so one order becomes many rows and revenue is counted
more than once.

| Problem | What happens | How KTX handles it |
|---------|--------------|--------------------|
| Order measure joins to `order_items` | `orders.revenue` repeats once per item | Detect the `one_to_many` path and pre-aggregate the order measure |
| Two fact sources share `customers` | Measures multiply across a shared dimension | Treat it as a chasm trap and plan each fact locally |
| Filter crosses a `one_to_many` path | Filtering after the join changes measure grain | Reject or localize the filter |
| Equal-cost paths connect the same sources | Join choice is ambiguous | Prefer safer paths or require aliases |

Many-to-many questions usually appear as multiple `one_to_many` paths or
independent fact sources. KTX treats those shapes as fan-out or chasm risks
unless the query can be planned at a safe grain.
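
The chasm-trap row is easy to reproduce with Python's built-in `sqlite3` module. The `tickets` table and all sample rows below are invented for illustration. The second query only approximates "plan each fact locally" and is not KTX's generated SQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table customers (id integer primary key, segment text);
    create table orders  (id integer primary key, customer_id integer, amount real);
    create table tickets (id integer primary key, customer_id integer);
    insert into customers values (1, 'smb');
    insert into orders  values (10, 1, 100.0), (11, 1, 50.0);
    insert into tickets values (20, 1), (21, 1), (22, 1);
""")

# Chasm trap: both facts join through customers, so rows pair up 2 x 3.
naive = con.execute("""
    select sum(o.amount), count(t.id)
    from customers c
    join orders o on o.customer_id = c.id
    join tickets t on t.customer_id = c.id
""").fetchone()

# Local planning: aggregate each fact at customer grain, then join the results.
local = con.execute("""
    with o as (
        select customer_id, sum(amount) as revenue
        from orders group by customer_id
    ),
    t as (
        select customer_id, count(*) as ticket_count
        from tickets group by customer_id
    )
    select sum(o.revenue), sum(t.ticket_count)
    from customers c
    join o on o.customer_id = c.id
    join t on t.customer_id = c.id
""").fetchone()

print(naive)  # (450.0, 6)
print(local)  # (150.0, 3)
```

In the naive query each order is repeated once per ticket and each ticket once per order, so both measures are inflated even though every individual join looks reasonable.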

## Execution planning

The planner resolves sources, chooses a join tree, checks relationship paths,
and decides whether the query can use a simple shape or needs aggregate
locality.

| Naive SQL shape | Semantic-layer SQL shape |
|-----------------|--------------------------|
| Join facts and dimensions first, then aggregate | Aggregate each fact source at its own grain, then join results |
| Put every filter in one outer `WHERE` clause | Keep measure filters with the measure source when locality is needed |
| Trust the shortest textual join path | Prefer safe relationship paths and reject disconnected sources |
| Let dimension grain differ across facts | Raise an error when asymmetric dimensions would fan out another measure |
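
The "reject disconnected sources" and "equal-cost paths" behaviors can be sketched as a breadth-first search that collects every shortest path. Using hop count as the cost is an assumption made for this example, and `shortest_paths`, `choose_path`, and the address and region sources are all hypothetical.

```python
from collections import deque


def shortest_paths(edges, start, goal):
    """Return every shortest path between two sources in an undirected join graph."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    found, best = [], None
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break  # BFS order: every remaining path is longer
        if path[-1] == goal:
            found.append(path)
            best = len(path)
            continue
        for nxt in sorted(neighbors.get(path[-1], ())):
            if nxt not in path:  # never revisit a source within one path
                queue.append(path + [nxt])
    return found


def choose_path(edges, start, goal):
    """Pick the single cheapest path, or refuse when there is none or a tie."""
    paths = shortest_paths(edges, start, goal)
    if not paths:
        raise ValueError(f"{start} and {goal} are disconnected")
    if len(paths) > 1:
        raise ValueError(f"ambiguous join path: {paths}")
    return paths[0]


edges = [
    ("orders", "billing_addresses"),
    ("orders", "shipping_addresses"),
    ("billing_addresses", "regions"),
    ("shipping_addresses", "regions"),
]

print(choose_path(edges, "orders", "billing_addresses"))
# ['orders', 'billing_addresses']

try:
    choose_path(edges, "orders", "regions")
except ValueError as err:
    print(err)  # reports both equal-cost paths
```

Raising on a tie, rather than picking the first path found, mirrors the table's stance that an ambiguous join should be resolved by an alias or an explicit path instead of by accident.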

Unsafe shape:

```sql
-- order_items is one_to_many from orders, so each order row repeats
-- once per line item before sum(orders.amount) runs.
select customers.segment, sum(orders.amount)
from orders
join order_items on order_items.order_id = orders.id
join customers on customers.id = orders.customer_id
group by customers.segment;
```

KTX shape:

```sql
-- Aggregate orders at customer grain first, so a later join cannot repeat amounts.
with orders_agg as (
  select customer_id, sum(amount) as revenue
  from orders
  group by customer_id
)
-- Each orders_agg row matches at most one customers row.
select customers.segment, sum(orders_agg.revenue)
from orders_agg
join customers on customers.id = orders_agg.customer_id
group by customers.segment;
```
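
To see the difference in numbers, both shapes can be run against a tiny in-memory database with Python's built-in `sqlite3` module. The table definitions and sample rows are invented for this example; the two queries are the ones shown above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table customers (id integer primary key, segment text);
    create table orders (id integer primary key, customer_id integer, amount real);
    create table order_items (order_id integer, line_id integer);
    insert into customers values (1, 'enterprise');
    insert into orders values (10, 1, 100.0);           -- one order worth 100
    insert into order_items values (10, 1), (10, 2), (10, 3);  -- three line items
""")

# Unsafe shape: the order row repeats once per line item before the sum.
unsafe = con.execute("""
    select customers.segment, sum(orders.amount)
    from orders
    join order_items on order_items.order_id = orders.id
    join customers on customers.id = orders.customer_id
    group by customers.segment
""").fetchall()

# Pre-aggregated shape: orders are summed at customer grain before joining.
safe = con.execute("""
    with orders_agg as (
      select customer_id, sum(amount) as revenue
      from orders
      group by customer_id
    )
    select customers.segment, sum(orders_agg.revenue)
    from orders_agg
    join customers on customers.id = orders_agg.customer_id
    group by customers.segment
""").fetchall()

print(unsafe)  # [('enterprise', 300.0)]
print(safe)    # [('enterprise', 100.0)]
```

A single order of 100 is reported as 300 by the unsafe shape because it has three line items; the pre-aggregated shape returns the true total.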

The result is structured planning: validated sources, typed relationships,
graph search, fan-out detection, aggregate locality, and final dialect
transpilation.

## Agent usage notes

Use this page when an agent needs to explain how KTX turns reviewed semantic
context into SQL, why relationship metadata matters, or why a query was rejected
as unsafe.

| Agent task | Relevant section | Next page |
|------------|------------------|-----------|
| Explain why KTX asks for `grain` and relationship types | Join graph | [Writing Context](/docs/guides/writing-context) |
| Diagnose duplicated measures after a join | Modeling problems | [ktx sl](/docs/cli-reference/ktx-sl) |
| Explain safe SQL generation | Execution planning | [ktx sl](/docs/cli-reference/ktx-sl) |
| Describe how semantic context stays current | Maintenance loop | [Context as Code](/docs/concepts/context-as-code) |