---
title: The Context Layer
description: What a context layer is, why agents need one, and how KTX compares to other semantic layers.
---
## Why agents need context
Database access lets an agent generate SQL. It does not tell the agent which
tables matter, which joins are safe, which metrics are canonical, or what your
team means by "enterprise", "net revenue", or "active customer".

That missing business context is where plausible SQL becomes wrong SQL:
- `orders.amount` may include refunds unless filtered.
- `customers.id` may not be the right join key for every source.
- `legacy_segments` may be stale even though it still exists.
- A metric may have a board-approved definition that is not obvious from
column names.
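
The first bullet can be made concrete with a small, self-contained sketch. The table contents, the `status` values, and the amounts are illustrative assumptions, not data from a real KTX project. The example only shows how a query that looks correct from the schema alone can return the wrong total.

```python
import sqlite3

# Hypothetical data: one of the three orders was refunded.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (id, amount, status) VALUES (?, ?, ?)",
    [(1, 100.0, "paid"), (2, 50.0, "refunded"), (3, 25.0, "paid")],
)

# What an agent with schema-only access might plausibly write.
naive_revenue = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# What the business means by revenue, once the refund rule is known.
net_revenue = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE status != 'refunded'"
).fetchone()[0]

print(naive_revenue, net_revenue)
```

Both queries are valid SQL against the same schema. Only the second matches the business rule, and nothing in the column names says so.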
## Three waves of AI analytics
| Wave | What it gives agents | Where it breaks |
|------|----------------------|-----------------|
| **Database access** | Tables, columns, and query execution | Agents guess joins, filters, and metric logic |
| **Semantic layers** | Modeled metrics, dimensions, joins, and SQL generation | They often miss operating context: anomalies, caveats, ownership, and review history |
| **Agentic context** | Semantic definitions plus wiki knowledge, scans, provenance, and edit workflows | Requires context to be kept current and reviewable |
KTX is built for the third wave: agents that generate SQL, maintain semantic
files, write docs, propose tests, and leave reviewable diffs.
## What KTX adds
A context layer is the trusted knowledge surface between analytics systems and
agents. The semantic layer is the core, but agents also need business rules,
schema evidence, provenance, and a safe way to update files.
```text
          Warehouses + dbt + BI + docs
                      |
                      v
                  ktx ingest
                      |
                      v
semantic-layer/ + wiki/ + raw-sources/ + provenance
                      |
                      v
Agents search, query, explain, validate, and patch context
```
| Pillar | Format | What it answers |
|--------|--------|-----------------|
| **Semantic sources** | `semantic-layer/**/*.yaml` | How do agents query a source safely? |
| **Wiki pages** | `wiki/**/*.md` | What does the business mean, and what caveats matter? |
| **Scan artifacts** | `raw-sources/**` | What did KTX observe in the warehouse or source tool? |
| **Provenance** | Ingest transcripts and run state | Why was this context created or changed? |
## Semantic sources
Semantic sources describe data in terms agents can reason about: row grain,
typed columns, valid joins, named measures, filters, and segments.
```yaml
name: orders
table: public.orders
grain: [id]
joins:
  - to: customers
    "on": customer_id = customers.id
    relationship: many_to_one
measures:
  - name: revenue
    expr: sum(amount)
    filter: "status != 'refunded'"
```
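
To illustrate why a named measure is easier for an agent to use than a raw column, here is a minimal sketch that renders the `revenue` measure as SQL. It assumes the source definition has already been loaded into a plain dictionary. `ORDERS_SOURCE` and `measure_sql` are hypothetical names invented for this example, and this is not how KTX's planner works. Applying the filter as a `WHERE` clause is a simplification that only holds when a single measure is selected.

```python
# Hypothetical in-memory form of the `orders` source shown above.
ORDERS_SOURCE = {
    "name": "orders",
    "table": "public.orders",
    "grain": ["id"],
    "measures": [
        {
            "name": "revenue",
            "expr": "sum(amount)",
            "filter": "status != 'refunded'",
        },
    ],
}


def measure_sql(source: dict, measure_name: str) -> str:
    """Render one named measure as a SELECT statement (illustrative only)."""
    for measure in source["measures"]:
        if measure["name"] == measure_name:
            break
    else:
        raise KeyError(f"unknown measure: {measure_name}")

    sql = f"SELECT {measure['expr']} AS {measure['name']} FROM {source['table']}"
    # Simplification: a single-measure query can apply the filter as WHERE.
    if measure.get("filter"):
        sql += f" WHERE {measure['filter']}"
    return sql


print(measure_sql(ORDERS_SOURCE, "revenue"))
```

The agent asks for `revenue` by name, and the refund filter travels with the definition instead of relying on the agent to remember it.
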
For join graphs, fan-out handling, and execution mechanics, read
[Semantic Querying](/docs/concepts/semantic-layer-internals).
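
As a quick illustration of what fan-out means, the sketch below joins a table to the many side of a relationship before aggregating. The `order_items` table and all values are hypothetical and are not part of the example source above. The example shows the general problem, not KTX's handling of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
    INSERT INTO orders VALUES (1, 100.0), (2, 40.0);
    INSERT INTO order_items VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
    """
)

# Joining to the many side repeats each order row once per item,
# so order 1's amount is counted twice.
fanned_out = conn.execute(
    "SELECT SUM(o.amount) FROM orders o JOIN order_items i ON i.order_id = o.id"
).fetchone()[0]

# Aggregating at the order grain (one row per order id) avoids the duplication.
at_grain = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

print(fanned_out, at_grain)
```

Declaring `grain` and `relationship` in a semantic source gives a query planner the information it needs to detect this kind of duplication.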
## Wiki pages
Wiki pages capture the context that does not belong in a measure formula:
business definitions, reporting policy, known data issues, metric caveats, and
links back to semantic sources.
| Put it in YAML | Put it in Markdown |
|----------------|--------------------|
| `sum(amount)` | "Net revenue excludes successful refunds." |
| `many_to_one` join metadata | "Use contract segment for board reporting." |
| Row grain and column types | "February had a one-time refund anomaly." |
| Default time dimension | "Finance owns ARR definitions." |
## How KTX compares
KTX overlaps with semantic layers, but the product boundary is broader: it gives
agents a reviewable context workspace, not only a metric runtime.
| Dimension | KTX | MetricFlow / Cube / Malloy |
|-----------|-----|-----------------------------|
| **Primary surface** | Plain YAML and Markdown files | Modeling language, project runtime, or API surface |
| **Models** | Sources, joins, grain, measures, filters, wiki refs, and provenance | Metrics, dimensions, joins, queries, and generated SQL |
| **Agent edit loop** | First-class: patch files, validate, inspect SQL, and review git diffs | Possible, but usually tied to the tool's modeling workflow |
| **Surrounding context** | Built in through wiki pages, scans, transcripts, and source evidence | Usually descriptions, annotations, metadata, or app-specific context |
| **Best fit** | Agents maintaining analytics context and SQL-facing definitions | Teams standardizing metrics, BI APIs, semantic runtimes, or exploratory modeling |
If you already use MetricFlow, LookML, dbt, or BI tools, KTX can ingest that
context and turn it into agent-readable files. You do not need to replace your
serving layer to give agents a better working surface.
## Plain files
A KTX project is a directory of readable files. Semantic sources and wiki pages
are committed to git; local indexes and caches stay under `.ktx/`.
```text
my-project/
├── ktx.yaml
├── semantic-layer/
│   └── warehouse/
│       ├── orders.yaml
│       └── customers.yaml
├── wiki/
│   └── global/
│       ├── revenue.md
│       └── segment-classification.md
├── raw-sources/
│   └── warehouse/
└── .ktx/                 # local state, git-ignored
```
```
This keeps analytics context close to the code review workflow:
- branch context changes;
- review YAML and Markdown diffs;
- merge accepted definitions;
- let agents read the updated source of truth.
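
Because the layout is plain files, a reviewer or CI step could label each changed path by the kind of context it holds. The sketch below is a hypothetical helper, not part of KTX. The `classify` function and the `.ktx/index.db` file name are assumptions for illustration, and the rules simply mirror the directory layout shown above.

```python
from pathlib import PurePosixPath


def classify(path: str) -> str:
    """Map a project-relative path to the kind of context it holds."""
    p = PurePosixPath(path)
    top = p.parts[0] if p.parts else ""
    if top == "semantic-layer" and p.suffix == ".yaml":
        return "semantic source"
    if top == "wiki" and p.suffix == ".md":
        return "wiki page"
    if top == "raw-sources":
        return "scan artifact"
    if top == ".ktx":
        return "local state"
    return "other"


# Hypothetical list of paths touched by a branch.
changed = [
    "semantic-layer/warehouse/orders.yaml",
    "wiki/global/revenue.md",
    ".ktx/index.db",
    "ktx.yaml",
]
for path in changed:
    print(f"{path}: {classify(path)}")
```

A check like this could flag a diff that touches `.ktx/`, which is meant to stay local and git-ignored.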
## Agent usage notes
Use this page when an agent needs to explain why KTX exists, why schema-only
database access is not enough, or how KTX differs from traditional semantic
layers.
| Agent task | Relevant section | Next page |
|------------|------------------|-----------|
| Explain why a database agent wrote a plausible but wrong query | Why agents need context | [Writing Context](/docs/guides/writing-context) |
| Decide whether a fact belongs in YAML or Markdown | Semantic sources / Wiki pages | [Writing Context](/docs/guides/writing-context) |
| Compare KTX to another semantic layer | How KTX compares | [Primary Sources](/docs/integrations/primary-sources) |
| Explain reviewability and source of truth | Plain files | [Context as Code](/docs/concepts/context-as-code) |