ktx/docs-site/content/docs/concepts/semantic-layer-internals.mdx

---
title: Semantic Querying
description: How KTX compiles a short Semantic Query into safe, dialect-correct SQL using a reviewed join graph.
---

import { SemanticLayerFlow } from "@/components/semantic-layer-flow";

KTX's semantic layer is a compiler that turns intent into SQL. The agent
declares _what_ it wants — measures, dimensions, filters — in a small
Semantic Query. KTX figures out the _how_: which tables to join, what
grain to aggregate at, how to keep fan-out from inflating measures, and
what dialect the warehouse speaks.

This page covers four mechanics:

- The Semantic Query contract agents send to the compiler.
- The planner steps that turn a Semantic Query into SQL.
- The join graph that backs those steps, and how it's built.
- The fan-out failure mode the compiler is designed to prevent.

## Imperative SQL vs declarative Semantic Querying

Writing analytics SQL is imperative work. Every question forces the
agent to hold two things in mind at once: _what_ it wants — a measure, a
slice, a filter — and _how_ to compute it: which tables to join, which
key links them, what grain to aggregate at, how to keep one fact from
inflating another, and what dialect the warehouse speaks. Plumbing on
top of intent, every query.

KTX's semantic layer separates those concerns:

- **You and KTX maintain the how.** Sources, joins, grain, measures, and
  segments live in reviewable YAML — the analytical contract the team
  agrees on, version-controlled.
- **The agent declares the what.** It sends a Semantic Query and trusts
  the compiler to produce safe SQL.

The agent stops reasoning about plumbing. It states intent. KTX turns
that into SQL the warehouse can run.

<SemanticLayerFlow />

## The Semantic Query contract

A Semantic Query is the JSON payload the agent sends. Every field is optional
except `measures`, and column references are fully qualified
(`source.column`) so the compiler never has to guess where a name came
from.

Notice what's _not_ in the payload: no `FROM`, no `JOIN`, no `GROUP BY`,
no `WITH`. The agent states what it wants. KTX picks the join path, the
grain, the SQL shape, and the dialect.

| Field | Purpose |
|-------|---------|
| `measures` | Names of pre-defined measures, or inline expressions like `sum(orders.amount)` |
| `dimensions` | Columns to group by, optionally with a `granularity` for time fields |
| `filters` | Row-level predicates, classified into `WHERE` or `HAVING` at planning time |
| `segments` | Named filter sets defined on a source, applied as additional predicates |
| `order_by` | Sort fields with optional direction |
| `limit` | Row cap on the result |

A typical agent call looks like this:

```json
{
  "measures": ["orders.revenue", "tickets.ticket_count"],
  "dimensions": ["customers.segment"],
  "filters": ["orders.created_at >= '2025-01-01'"],
  "limit": 1000
}
```

That payload is enough for KTX to plan and compile. The agent never
authors a join, a CTE, or a dialect-specific cast.

## What the planner does

The planner is a deterministic pipeline. Each Semantic Query runs through the
same ordered steps before any SQL is emitted.

1. **Resolve refs.** Qualify bare column names, look up pre-defined
   measure expressions, and classify each measure as raw or derived.
2. **Pick an anchor and build the join tree.** Choose the largest measure
   source as the root, then run a shortest-path search across the typed
   join graph to reach every required source.
3. **Detect fan-out.** Group measures by their owning source. If more
   than one group exists, the planner marks the query as a chasm trap
   and switches to aggregate-locality compilation.
4. **Classify filters.** Split predicates into row-level (`WHERE`) and
   aggregate-level (`HAVING`) based on whether they reference a measure.
5. **Generate SQL.** Emit Postgres-shaped SQL with the right shape:
   single-source aggregation when the query is safe, per-source CTEs
   when fan-out is present.
6. **Transpile to the target dialect.** Run the result through `sqlglot`
   so the warehouse receives syntax it understands.

The output is the SQL string, the resolved plan, and any warnings
surfaced during planning.

## The join graph

A semantic source is a node. A declared join is a typed edge. The graph
is bidirectional: every forward edge has a reverse with the relationship
inverted, so the planner can traverse from any anchor.

| Relationship | Planning impact |
|--------------|-----------------|
| `many_to_one` | Safe direction for adding dimensions |
| `one_to_many` | Multiplies measures and triggers fan-out handling |
| `one_to_one` | Safe in either direction when keys match |
| Equal-cost paths | Treated as ambiguous; aliases or explicit joins resolve them |

<figure
  className="not-prose my-8 overflow-hidden rounded-lg border border-fd-border bg-fd-card p-4 shadow-sm"
  aria-label="Example semantic join graph"
>
  <div className="grid gap-3 md:grid-cols-[1fr_1fr_1fr]">
    <div className="rounded-md border border-fd-border bg-fd-background px-4 py-3">
      <p className="text-sm font-semibold text-fd-foreground">{"customers"}</p>
      <p className="mt-1 text-xs text-fd-muted-foreground">{"grain: customer_id"}</p>
    </div>
    <div className="rounded-md border-2 border-fd-primary bg-fd-background px-4 py-3">
      <p className="text-sm font-semibold text-fd-foreground">{"orders"}</p>
      <p className="mt-1 text-xs text-fd-muted-foreground">{"grain: order_id"}</p>
    </div>
    <div className="rounded-md border border-fd-border bg-fd-background px-4 py-3">
      <p className="text-sm font-semibold text-fd-foreground">{"order_items"}</p>
      <p className="mt-1 text-xs text-fd-muted-foreground">{"grain: order_id, line_id"}</p>
    </div>
  </div>
  <div className="my-3 grid gap-2 text-center text-xs font-medium text-fd-muted-foreground md:grid-cols-[1fr_1fr]">
    <div>{"orders -> customers: many_to_one"}</div>
    <div>{"orders -> order_items: one_to_many"}</div>
  </div>
  <figcaption className="mt-4 border-t border-fd-border pt-3 text-left text-xs leading-5 text-fd-muted-foreground">
    <span className="font-medium text-fd-foreground">{"Example: "}</span>
    {"refunds joins to orders. Used carefully, it explains net revenue. Joined naively, it duplicates order-level measures."}
  </figcaption>
</figure>

Edges and grain come from your YAML. The compiler treats them as fact,
not a guess.

```yaml
# semantic-layer/warehouse/orders.yaml
name: orders
table: public.orders
grain: [order_id]
joins:
  - to: customers
    on: customer_id = customers.id
    relationship: many_to_one
  - to: order_items
    on: id = order_items.order_id
    relationship: one_to_many
measures:
  - name: revenue
    expr: sum(case when status != 'refunded' then amount end)
```

## Building and maintaining the graph

KTX builds the graph from evidence and accepted edits, not from runtime
inference. Each input contributes a different kind of authority.

| Evidence | What it contributes |
|----------|---------------------|
| Declared primary keys | Initial row grain |
| Declared foreign keys | Formal join candidates |
| Inferred relationships | Edges when the warehouse lacks constraints |
| dbt, MetricFlow, and LookML imports | Existing metrics, dimensions, explores, and joins |
| Query history | Real join and filter patterns from analyst SQL |
| Analyst review | Final authority before context is merged |

<div
  className="not-prose my-8 overflow-hidden rounded-lg border border-fd-border bg-fd-card shadow-sm"
  aria-label="Semantic layer maintenance loop"
>
  <div className="border-b border-fd-border bg-fd-muted/35 px-4 py-3">
    <p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
      {"Semantic maintenance loop"}
    </p>
    <p className="mt-1 text-sm leading-6 text-fd-muted-foreground">
      {"Every accepted correction becomes input to the next graph build."}
    </p>
  </div>
  <div className="p-4">
    <div className="-mx-4 overflow-x-auto px-4">
      <div className="relative mx-auto h-[460px] w-[720px] max-w-none md:w-full md:max-w-[760px]">
        <svg
          aria-hidden="true"
          className="absolute inset-0 h-full w-full text-fd-primary"
          fill="none"
          viewBox="0 0 760 460"
        >
          <g
            stroke="currentColor"
            strokeLinecap="round"
            strokeLinejoin="round"
            strokeOpacity="0.68"
            strokeWidth="2.5"
          >
            <path d="M 352 80 H 384" />
            <path d="M 600 80 H 668 V 150" />
            <path d="M 632 284 V 378 H 626" />
            <path d="M 408 378 H 376" />
            <path d="M 160 378 H 96 V 308" />
            <path d="M 128 172 V 80 H 140" />
          </g>
          <g fill="currentColor" fillOpacity="0.96" stroke="none">
            <polygon points="0,0 -14,-7 -14,7" transform="translate(398 80)" />
            <polygon points="0,0 -14,-7 -14,7" transform="translate(668 164) rotate(90)" />
            <polygon points="0,0 -14,-7 -14,7" transform="translate(612 378) rotate(180)" />
            <polygon points="0,0 -14,-7 -14,7" transform="translate(362 378) rotate(180)" />
            <polygon points="0,0 -14,-7 -14,7" transform="translate(96 294) rotate(270)" />
            <polygon points="0,0 -14,-7 -14,7" transform="translate(154 80)" />
          </g>
        </svg>

        <div className="absolute left-1/2 top-1/2 flex h-32 w-56 -translate-x-1/2 -translate-y-1/2 flex-col items-center justify-center rounded-md border border-fd-primary/50 bg-fd-background px-4 py-4 text-center shadow-sm">
          <p className="text-[11px] font-semibold uppercase tracking-wide text-fd-primary">
            {"reviewed context"}
          </p>
          <p className="mt-2 text-sm font-semibold leading-6 text-fd-foreground">
            {"The accepted graph becomes the starting point for the next build."}
          </p>
        </div>

        <div className="absolute left-[160px] top-6 h-28 w-48 rounded-md border-2 border-fd-primary bg-fd-background px-4 py-3 text-sm shadow-sm">
          <p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
            {"Step 1"}
          </p>
          <p className="mt-1 font-semibold text-fd-foreground">{"ingest evidence"}</p>
          <p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
            {"scan schemas, imports, and accepted files"}
          </p>
        </div>
        <div className="absolute left-[408px] top-6 h-28 w-48 rounded-md border border-fd-border bg-fd-background px-4 py-3 text-sm shadow-sm">
          <p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
            {"Step 2"}
          </p>
          <p className="mt-1 font-semibold text-fd-foreground">{"YAML diff"}</p>
          <p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
            {"draft source, join, grain, and measure changes"}
          </p>
        </div>
        <div className="absolute left-[536px] top-[172px] h-28 w-48 rounded-md border border-fd-border bg-fd-background px-4 py-3 text-sm shadow-sm">
          <p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
            {"Step 3"}
          </p>
          <p className="mt-1 font-semibold text-fd-foreground">{"validation"}</p>
          <p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
            {"check relationships, syntax, and unsafe query shapes"}
          </p>
        </div>
        <div className="absolute left-[408px] top-[322px] h-28 w-48 rounded-md border border-fd-border bg-fd-background px-4 py-3 text-sm shadow-sm">
          <p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
            {"Step 4"}
          </p>
          <p className="mt-1 font-semibold text-fd-foreground">{"analyst review"}</p>
          <p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
            {"accept, edit, or reject generated context"}
          </p>
        </div>
        <div className="absolute left-[160px] top-[322px] h-28 w-48 rounded-md border border-fd-border bg-fd-background px-4 py-3 text-sm shadow-sm">
          <p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
            {"Step 5"}
          </p>
          <p className="mt-1 font-semibold text-fd-foreground">{"agent use"}</p>
          <p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
            {"serve context to search, explain, and query"}
          </p>
        </div>
        <div className="absolute left-8 top-[172px] h-28 w-48 rounded-md border border-fd-primary/70 bg-fd-background px-4 py-3 text-sm shadow-sm">
          <p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
            {"Step 6"}
          </p>
          <p className="mt-1 font-semibold text-fd-foreground">{"corrections"}</p>
          <p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
            {"agent and analyst fixes become new evidence"}
          </p>
        </div>
      </div>
    </div>
  </div>
</div>

## Fan-out and aggregate locality

Fan-out is the classic analytics failure mode. Two fact tables join to a
shared dimension. A naive query joins them all together first, so each
row from one fact is multiplied by the matching rows from the other.
Measures duplicate, numbers go wrong, and the agent doesn't notice.

KTX's planner detects the shape by grouping measures by their owning
source. If more than one source contributes raw measures, the generator
switches to aggregate locality: each fact is pre-aggregated at its own
grain inside a CTE, and the CTEs are joined back to the dimension at the
end.

| Naive SQL shape | Semantic-layer SQL shape |
|-----------------|--------------------------|
| Join facts and dimensions first, then aggregate | Aggregate each fact at its own grain, then join |
| Put every filter in one outer `WHERE` clause | Keep measure filters with the measure source |
| Trust the shortest textual join path | Prefer typed safe paths, reject disconnected sources |
| Let dimension grain differ across facts | Raise when an asymmetric dimension would fan out another measure |

The result is the same analyst answer, computed with the join shape an
analyst would have written by hand.

## Where the context comes from

The planner is only as good as the YAML it reads. KTX builds and
maintains that YAML for you.

- `raw-sources/<connection>/` holds scan evidence from your warehouse:
  schemas, columns, keys, samples, and observed usage patterns.
- `wiki/` holds business language, definitions, and caveats. The
  planner doesn't read wiki at compile time, but the agent does, so
  measure names and dimensions stay anchored to terms the team uses.
- `semantic-layer/<connection>/` holds the structured sources, joins,
  grain, measures, and segments the planner actually compiles against.

Every accepted edit flows back into the next ingest, so the graph stays
current as the warehouse changes.

## Agent usage notes

Point an agent at this page when it needs to explain why KTX asks for
grain, why a query was rejected as unsafe, or why the compiled SQL looks
different from what the agent first proposed.

| Agent task | Relevant section | Next page |
|------------|------------------|-----------|
| Explain the Semantic Query shape | The Semantic Query contract | [ktx sl](/docs/cli-reference/ktx-sl) |
| Describe what the planner does between query and SQL | What the planner does | [ktx sl](/docs/cli-reference/ktx-sl) |
| Explain why KTX asks for grain and relationship types | The join graph | [Writing context](/docs/guides/writing-context) |
| Diagnose duplicated measures after a join | Fan-out and aggregate locality | [ktx sl](/docs/cli-reference/ktx-sl) |
| Describe how semantic context stays current | Building and maintaining the graph | [Context as code](/docs/concepts/context-as-code) |