ktx/docs-site/content/docs/concepts/the-context-layer.mdx
Andrey Avtomonov 620d6adbe6
docs: rewrite Semantic Querying concept with imperative-vs-declarative diagram (#156)
* docs: rewrite Semantic Querying concept with imperative-vs-declarative diagram

Reframe semantic-layer-internals.mdx around the contract the semantic
layer offers an agent: declare what you want (a Semantic Query), KTX
figures out how to compute it. Replaces the old "Context-Aware SQL"
framing with a clear imperative-vs-declarative narrative.

Adds a React Flow component (semantic-layer-flow.tsx) that contrasts a
buggy 4-table agent-authored SQL (chasm trap, LEFT-JOIN-in-WHERE,
hardcoded DATE_TRUNC) against the chasm-safe per-fact CTE SQL the
planner actually emits, including the outer GROUP BY over the requested
dimensions. Both lanes converge into a shared warehouse node and each
SQL card now has parallel bullet notes (failures on the left, KTX
behavior on the right).

Side fixes bundled in:
- include the /ktx basePath in the favicon metadata so the icon resolves
  under the production prefix
- migrate docs-site/middleware.ts to docs-site/proxy.ts (Next 16 rename)
- redirect / to /ktx/docs/getting-started/introduction so the apex docs
  URL works
- add tests covering the apex redirect, the favicon basePath, and the
  middleware-to-proxy rename
- propagate the Semantic Query terminology across the ktx-sl CLI
  reference, the context-layer concept page, and the agent-clients /
  primary-sources integration pages

* Fix CI dead-code failures

* docs-site: polish semantic-layer-internals code blocks and flow diagram

- Make CodeBlock a server component so children traverse synchronously
  under React 19 RSC streaming; previously extractText returned "" in
  dev SSR, leaving code blocks empty.
- Add custom JSON/YAML/SQL/code-like tokenizers with theme-aware token
  classes; drop the colored file-glyph dot and gradient tab-head.
- Tighten tab-head: subtle grey background, smaller monospace filename
  in muted grey, smaller rectangular language pill placed to the left
  of the filename.
- Polish the React Flow semantic-layer diagram (controls, fit-view
  padding, edge types).

* docs-site: annotate imperative SQL, add section anchor, drop ClickHouse

- Wire numbered red badges to each problematic span in the "Without KTX"
  SQL with hover sync between SQL gutter, lines, and the notes list.
- Add #imperative-vs-declarative anchor on the flow section header so
  the eyebrow link is shareable; reveals a # glyph on hover/focus.
- Align the compiled-SQL note dots to the first-line midpoint
  (mt-[6px] instead of mt-1) so 4px dots sit at y=8 in a 16px line.
- Remove all ClickHouse references from docs-site (primary-sources,
  quickstart, ktx-setup, contributing, agents-setup, mechanics test,
  warehouse drivers in the flow diagram).

* test: drop ClickHouse contributing-docs assertion

Align the workspace-package mirror test with the ClickHouse removal
from docs-site (75907eb). The connector-clickhouse package still
exists in packages/, but contributing.mdx no longer lists it, so the
test that mirrored docs against the workspace was failing.
2026-05-19 23:41:29 +02:00


---
title: The Context Layer
description: What a context layer is, why agents need one, and how KTX compares to other semantic layers.
---
## Why agents need context
Database access lets an agent generate SQL. It does not tell the agent which
tables matter, which joins are safe, which metrics are canonical, or what your
team means by "enterprise", "net revenue", or "active customer".

That missing business context is where plausible SQL becomes wrong SQL:

- `orders.amount` may include refunds unless filtered.
- `customers.id` may not be the right join key for every source.
- `legacy_segments` may be stale even though it still exists.
- A metric may have a board-approved definition that is not obvious from
  column names.
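
The first pitfall can be reproduced in a few lines. The sketch below is not KTX code: it loads three made-up rows into an in-memory SQLite table and compares the plausible query with the one that applies the business rule. The `status` column and its `'refunded'` value are assumptions for illustration.

```python
import sqlite3

# Hypothetical rows: (id, amount, status). The values are invented for illustration.
ROWS = [
    (1, 100.0, "paid"),
    (2, 50.0, "refunded"),
    (3, 25.0, "paid"),
]


def revenue_totals(rows):
    """Return (naive_total, filtered_total) for an orders table built from rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, status TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    # Plausible SQL: sums every row, refunds included.
    naive = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
    # Same aggregate with the business rule applied.
    filtered = conn.execute(
        "SELECT SUM(amount) FROM orders WHERE status != 'refunded'"
    ).fetchone()[0]
    conn.close()
    return naive, filtered


if __name__ == "__main__":
    naive, filtered = revenue_totals(ROWS)
    print(f"naive={naive} filtered={filtered}")
```

Both queries run without error and both return a number, which is the point: nothing in the schema signals that the first total is wrong.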
## Three waves of AI analytics
| Wave | What it gives agents | Where it breaks |
|------|----------------------|-----------------|
| **Database access** | Tables, columns, and query execution | Agents guess joins, filters, and metric logic |
| **Semantic layers** | Modeled metrics, dimensions, joins, and SQL generation | They often miss operating context: anomalies, caveats, ownership, and review history |
| **Agentic context** | Semantic definitions plus wiki knowledge, scans, provenance, and edit workflows | Requires context to be kept current and reviewable |

KTX is built for the third wave: agents that generate SQL, maintain semantic
files, write docs, propose tests, and leave reviewable diffs.
## What KTX adds
A context layer is the trusted knowledge surface between analytics systems and
agents. The semantic layer is the core, but agents also need business rules,
schema evidence, provenance, and a safe way to update files.
```text
Warehouses + dbt + BI + docs
            |
            v
        ktx ingest
            |
            v
semantic-layer/ + wiki/ + raw-sources/ + provenance
            |
            v
Agents search, query, explain, validate, and patch context
```
```
| Pillar | Format | What it answers |
|--------|--------|-----------------|
| **Semantic sources** | `semantic-layer/**/*.yaml` | How do agents query a source safely? |
| **Wiki pages** | `wiki/**/*.md` | What does the business mean, and what caveats matter? |
| **Scan artifacts** | `raw-sources/**` | What did KTX observe in the warehouse or source tool? |
| **Provenance** | Ingest transcripts and run state | Why was this context created or changed? |
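
The path patterns in the table are enough to sort project files by pillar. The following is a minimal sketch under that assumption; `classify` is a hypothetical helper, not part of KTX, and because the table gives no path pattern for provenance, anything it cannot place is reported as `"other"`.

```python
from pathlib import PurePosixPath


def classify(path: str) -> str:
    """Map a project-relative path to the pillar suggested by the table above."""
    p = PurePosixPath(path)
    top = p.parts[0] if p.parts else ""
    # semantic-layer/**/*.yaml
    if top == "semantic-layer" and p.suffix == ".yaml":
        return "semantic source"
    # wiki/**/*.md
    if top == "wiki" and p.suffix == ".md":
        return "wiki page"
    # raw-sources/** (anything below the directory itself)
    if top == "raw-sources" and len(p.parts) > 1:
        return "scan artifact"
    return "other"


if __name__ == "__main__":
    for example in (
        "semantic-layer/warehouse/orders.yaml",
        "wiki/global/revenue.md",
        "ktx.yaml",
    ):
        print(example, "->", classify(example))
```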
## Semantic sources
Semantic sources describe data in terms agents can reason about: row grain,
typed columns, valid joins, named measures, filters, and segments.
```yaml
name: orders
table: public.orders
grain: [id]
joins:
  - to: customers
    "on": customer_id = customers.id
    relationship: many_to_one
measures:
  - name: revenue
    expr: sum(amount)
    filter: "status != 'refunded'"
```
For join graphs, fan-out handling, and execution mechanics, read
[Semantic Querying](/docs/concepts/semantic-layer-internals).
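
To make the role of a named measure concrete, here is a rough sketch of how a definition like `revenue` could be turned into SQL. It is not KTX's planner: `measure_sql` is a hypothetical helper, the dict mirrors the YAML above by hand, and applying the filter as a plain `WHERE` clause is only reasonable for a single-measure query with no joins.

```python
# Mirrors the orders source above as a plain dict, so no YAML parser is needed.
ORDERS = {
    "name": "orders",
    "table": "public.orders",
    "measures": [
        {"name": "revenue", "expr": "sum(amount)", "filter": "status != 'refunded'"},
    ],
}


def measure_sql(source: dict, measure_name: str) -> str:
    """Build a single-measure SELECT, applying the measure's filter if present."""
    for measure in source["measures"]:
        if measure["name"] == measure_name:
            break
    else:
        # Unknown measures fail loudly instead of producing guessed SQL.
        raise KeyError(measure_name)
    sql = f"SELECT {measure['expr']} AS {measure['name']} FROM {source['table']}"
    if measure.get("filter"):
        sql += f" WHERE {measure['filter']}"
    return sql


if __name__ == "__main__":
    print(measure_sql(ORDERS, "revenue"))
```

The agent asks for `revenue` by name and never has to remember the refund filter; the definition carries it.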
## Wiki pages
Wiki pages capture the context that does not belong in a measure formula:
business definitions, reporting policy, known data issues, metric caveats, and
links back to semantic sources.

| Put it in YAML | Put it in Markdown |
|----------------|--------------------|
| `sum(amount)` | "Net revenue excludes successful refunds." |
| `many_to_one` join metadata | "Use contract segment for board reporting." |
| Row grain and column types | "February had a one-time refund anomaly." |
| Default time dimension | "Finance owns ARR definitions." |
## How KTX compares
KTX overlaps with semantic layers, but the product boundary is broader: it gives
agents a reviewable context workspace, not only a metric runtime.

| Dimension | KTX | MetricFlow / Cube / Malloy |
|-----------|-----|-----------------------------|
| **Primary surface** | Plain YAML and Markdown files | Modeling language, project runtime, or API surface |
| **Models** | Sources, joins, grain, measures, filters, wiki refs, and provenance | Metrics, dimensions, joins, queries, and generated SQL |
| **Agent edit loop** | First-class: patch files, validate, inspect SQL, and review git diffs | Possible, but usually tied to the tool's modeling workflow |
| **Surrounding context** | Built in through wiki pages, scans, transcripts, and source evidence | Usually descriptions, annotations, metadata, or app-specific context |
| **Best fit** | Agents maintaining analytics context and SQL-facing definitions | Teams standardizing metrics, BI APIs, semantic runtimes, or exploratory modeling |

If you already use MetricFlow, LookML, dbt, or BI tools, KTX can ingest that
context and turn it into agent-readable files. You do not need to replace your
serving layer to give agents a better working surface.
## Plain files
A KTX project is a directory of readable files. Semantic sources and wiki pages
are committed to git; local indexes and caches stay under `.ktx/`.
```text
my-project/
├── ktx.yaml
├── semantic-layer/
│   └── warehouse/
│       ├── orders.yaml
│       └── customers.yaml
├── wiki/
│   └── global/
│       ├── revenue.md
│       └── segment-classification.md
├── raw-sources/
│   └── warehouse/
└── .ktx/                  # local state, git-ignored
```
This keeps analytics context close to the code review workflow:
- branch context changes;
- review YAML and Markdown diffs;
- merge accepted definitions;
- let agents read the updated source of truth.
## Agent usage notes
Use this page when an agent needs to explain why KTX exists, why schema-only
database access is not enough, or how KTX differs from traditional semantic
layers.

| Agent task | Relevant section | Next page |
|------------|------------------|-----------|
| Explain why a database agent wrote a plausible but wrong query | Why agents need context | [Writing Context](/docs/guides/writing-context) |
| Decide whether a fact belongs in YAML or Markdown | Semantic sources / Wiki pages | [Writing Context](/docs/guides/writing-context) |
| Compare KTX to another semantic layer | How KTX compares | [Primary Sources](/docs/integrations/primary-sources) |
| Explain reviewability and source of truth | Plain files | [Context as Code](/docs/concepts/context-as-code) |