diff --git a/docs-site/content/docs/concepts/semantic-layer-internals.mdx b/docs-site/content/docs/concepts/semantic-layer-internals.mdx
index 3e9d7cd7..a6d4b640 100644
--- a/docs-site/content/docs/concepts/semantic-layer-internals.mdx
+++ b/docs-site/content/docs/concepts/semantic-layer-internals.mdx
@@ -4,128 +4,309 @@ description: How KTX uses join graphs, grain, and relationship metadata to turn
 ---

 KTX is a context layer for agents. This page focuses on the semantic execution
-subsystem: the part that turns reviewed YAML context into safe SQL.
+layer: the subsystem that turns reviewed context into safe SQL.

-Read it as a pipeline:
+Read it as four mechanics:

-```text
-context files + warehouse evidence
-        |
-        v
-join graph with grain and relationship metadata
-        |
-        v
-fan-out checks + aggregate-locality planning
-        |
-        v
-canonical SQL -> dialect SQL
-```
+- context files feed the semantic engine;
+- evidence becomes a join graph with grain and relationship metadata;
+- review keeps the graph current;
+- query planning avoids fan-out and ambiguous joins.

-## Where it fits
+## Where the semantic layer fits

-The semantic layer is not the whole product. It is the engine that makes KTX
-context actionable for SQL generation.
+The semantic layer is the engine that makes KTX context actionable for SQL
+generation. It uses source YAML, wiki context, scan evidence, and provenance.

-| Input | Used for |
-|-------|----------|
-| `semantic-layer/` | Sources, columns, joins, grain, measures, filters, and segments |
-| `wiki/` | Business definitions, caveats, and metric explanations |
-| `raw-sources/` | Schema scans, imported metadata, keys, and relationship evidence |
-| Provenance | Ingest decisions, review history, and replay context |
+
+ {"Context inputs"} +
+semantic-layer/
++ {"source YAML, measures, joins, grain"} +
+wiki/
++ {"business rules, definitions, caveats"} +
+raw-sources/
++ {"schema scans, keys, imported metadata"} +
+provenance
++ {"ingest decisions and review history"} +
++ {"Semantic layer engine"} +
+Join graph
++ {"sources as nodes, joins as typed edges"} +
+Grain
++ {"row identity before aggregation"} +
+Measures
++ {"verified formulas and filters"} +
+Relationships
++ {"many_to_one, one_to_many, one_to_one"} +
++ {"Agent workflows"} +
+customers
+grain: customer_id
+orders
+grain: order_id
+order_items
+grain: order_id, line_id
++ {"Semantic maintenance loop"} +
++ {"Every accepted correction becomes input to the next graph build."} +
++ {"reviewed context"} +
++ {"The accepted graph becomes the starting point for the next build."} +
++ {"Step 1"} +
+{"ingest evidence"}
++ {"scan schemas, imports, and accepted files"} +
++ {"Step 2"} +
+{"YAML diff"}
++ {"draft source, join, grain, and measure changes"} +
++ {"Step 3"} +
+{"validation"}
++ {"check relationships, syntax, and unsafe query shapes"} +
++ {"Step 4"} +
+{"analyst review"}
++ {"accept, edit, or reject generated context"} +
++ {"Step 5"} +
+{"agent use"}
++ {"serve context to search, explain, and query"} +
++ {"Step 6"} +
+{"corrections"}
++ {"agent and analyst fixes become new evidence"} +
++ {"Unsafe shape"} +
+
+{`orders
+ join order_items
+ join customers
+group by customer_segment
+sum(orders.amount)`}
+
+ + {"The order measure is exposed to line-item fan-out before aggregation."} +
++ {"KTX shape"} +
+
+{`orders_agg as (
+ select customer_id, sum(amount) revenue
from orders
group by customer_id
)
-select customers.segment, sum(orders_agg.revenue)
+select customers.segment, sum(revenue)
from orders_agg
-join customers on customers.id = orders_agg.customer_id
-group by customers.segment;
-```
+join customers`}
+
+ + {"KTX pre-aggregates fact measures at their own grain before joining dimensions."} +
+
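
The difference between the "Unsafe shape" and the "KTX shape" above can be checked with a small in-memory experiment. The following is a minimal sketch, not KTX's planner: it uses Python's standard `sqlite3` module, the table and column names from the diff (`customers`, `orders`, `order_items`, `segment`, `amount`), and made-up sample rows. The explicit join conditions and the `with` keyword are assumptions added so the queries run; the diff only shows the shapes schematically.

```python
import sqlite3

# Hypothetical sample data: order 10 has two line items, order 11 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customers (id integer primary key, segment text);
    create table orders (order_id integer primary key, customer_id integer, amount real);
    create table order_items (order_id integer, line_id integer,
                              primary key (order_id, line_id));
    insert into customers values (1, 'smb'), (2, 'enterprise');
    insert into orders values (10, 1, 100.0), (11, 2, 250.0);
    insert into order_items values (10, 1), (10, 2), (11, 1);
""")

# Unsafe shape: the order-grain measure is summed after joining line items,
# so each order's amount is repeated once per line item.
UNSAFE_SQL = """
    select customers.segment, sum(orders.amount) as revenue
    from orders
    join order_items on order_items.order_id = orders.order_id
    join customers on customers.id = orders.customer_id
    group by customers.segment
"""

# Pre-aggregated shape: the measure is summed at the orders grain first,
# then joined to the customer dimension.
PRE_AGG_SQL = """
    with orders_agg as (
        select customer_id, sum(amount) as revenue
        from orders
        group by customer_id
    )
    select customers.segment, sum(orders_agg.revenue) as revenue
    from orders_agg
    join customers on customers.id = orders_agg.customer_id
    group by customers.segment
"""

def revenue_by_segment(sql: str) -> dict:
    """Run a two-column query and return {segment: revenue}."""
    return dict(conn.execute(sql).fetchall())

unsafe = revenue_by_segment(UNSAFE_SQL)
safe = revenue_by_segment(PRE_AGG_SQL)
print("unsafe:", unsafe)
print("pre-aggregated:", safe)
```

With this sample data the unsafe query double-counts the `smb` order because it has two line items, while the pre-aggregated query reports each order's amount once. The row counts are chosen only to make the fan-out visible.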