---
title: The Context Layer
description: What a context layer is, why agents need one, and the YAML and Markdown surfaces ktx writes to disk.
---

import { GitIcon } from "@/components/git-icon";

A context layer is the trusted knowledge surface that sits between your data stack and the agents that query it. It holds the things a database connection can't tell an agent on its own: which metrics are canonical, which joins are safe, what your team means by "active customer", and where every definition came from.

**ktx** builds that layer as plain files - YAML, Markdown, and JSON - that agents can search and humans can review. This page covers what's in it, why agents need it, and how it compares to other semantic tooling.

## Database access isn't enough

Hand an agent a database connection and it can run SQL. It still has to guess the part that matters: which table is the source of truth, which join is the one analysts actually use, and what definition the business agreed on. Plausible SQL becomes wrong SQL fast.

| Schema-only access gives the agent | What it still doesn't know |
|------------------------------------|----------------------------|
| Tables, columns, and types | Which table is canonical for revenue |
| Primary and foreign keys | Which join is safe and which fans out measures |
| Sample rows | Which rows are test accounts the team excludes |
| `orders.amount` exists | That `amount` includes refunds unless filtered |
| A `customers.segment` column | That `legacy_segments` is stale even though it exists |
| Column comments, sometimes | The board-approved definition of ARR |

Schema is a starting point, not a contract. The context layer is the contract.

## The two pillars

A **ktx** project has two committed surfaces, each tuned for a different question. Structured data lives where it can be compiled. Prose lives where it can be searched. Wiki pages cross-reference semantic sources by name, so every metric caveat stays anchored to the definition it explains.

**Anatomy of a context layer: two files, two jobs**

YAML for what the warehouse can execute. Markdown for what the team needs to interpret it. Both are committed to git and reviewed like code.

| Surface | Path | Kind | Contents | Answers |
|---------|------|------|----------|---------|
| Semantic sources | `semantic-layer/**/*.yaml` (git) | Structured, executable | Tables, grain, joins, measures, dimensions, filters, and segments. The compiler turns these into dialect-correct SQL. | How do I query this safely? |
| Wiki pages | `wiki/**/*.md` (git) | Free-form, searchable | Definitions, caveats, policies, and decisions. Frontmatter links each page back to the semantic sources it explains. | What does this mean to the business? |

**Behind the scenes.** **ktx** also keeps scan snapshots and a per-run event log locally so every committed change is traceable to its evidence. You don't read or edit these files yourself - see [Reviewing Context](/docs/guides/reviewing-context) for how that audit trail flows into review.

## Semantic sources

Semantic sources describe a table the way an agent can reason about it: row grain, typed columns, named measures, valid joins, filters, and segments. The planner compiles these into SQL; nothing else.

```yaml
# semantic-layer/warehouse/orders.yaml
name: orders
table: public.orders
grain: [id]
columns:
  - name: id
    type: number
  - name: status
    type: string
  - name: amount
    type: number
measures:
  - name: total_revenue
    expr: sum(amount)
    filter: "status != 'refunded'"
joins:
  - to: customers
    "on": customer_id = customers.id
    relationship: many_to_one
```

For how the compiler walks the join graph, handles fanout, and transpiles dialects, read [Semantic querying](/docs/concepts/semantic-layer-internals).

## Wiki pages

Wiki pages hold the context that doesn't belong in a formula: business definitions, reporting policy, anomalies, and metric caveats. Each page links back to the semantic sources it explains through frontmatter.

```markdown
# wiki/global/revenue.md
---
summary: Paid order value after refunds
tags: [finance, orders]
sl_refs: [warehouse.orders]
refs: [segment-classification]
usage_mode: auto
---

Revenue is paid order amount after refund adjustments. Use `orders.total_revenue`
for recognized order value and `orders.order_count` for paid order volume.
```

### A navigable graph

Those two reference fields - `sl_refs` from a wiki page to a semantic source, and `refs` from a wiki page to other wiki pages - turn the context layer into a graph agents traverse. An agent that finds this page while searching for "revenue" follows `sl_refs` straight to `orders.total_revenue` for the executable definition, then walks `refs` to related policies without rerunning search.

The graph only helps if the edges stay live. **ktx** validates references when wiki pages are written and prunes `sl_refs` during ingest when their target sources are deleted or their measures are renamed - so a stale page can never quietly route an agent to a definition that no longer exists.
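
To make the idea of live edges concrete, here is a minimal Python sketch of pruning stale `sl_refs` and flagging dangling `refs`. It is an illustration under stated assumptions, not how **ktx** implements ingest: the `WikiPage` class, the `prune_sl_refs` and `dangling_refs` helpers, and the `warehouse.legacy_segments` source name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WikiPage:
    """Hypothetical in-memory view of one wiki page's frontmatter."""
    slug: str
    sl_refs: list[str] = field(default_factory=list)  # links to semantic sources
    refs: list[str] = field(default_factory=list)     # links to other wiki pages


def prune_sl_refs(pages: dict[str, WikiPage], sources: set[str]) -> dict[str, list[str]]:
    """Remove sl_refs whose target source no longer exists.

    Returns a map of page slug -> removed references so the change can be reviewed.
    """
    removed: dict[str, list[str]] = {}
    for page in pages.values():
        stale = [ref for ref in page.sl_refs if ref not in sources]
        if stale:
            removed[page.slug] = stale
            page.sl_refs = [ref for ref in page.sl_refs if ref in sources]
    return removed


def dangling_refs(pages: dict[str, WikiPage]) -> dict[str, list[str]]:
    """Report page-to-page refs that point at a slug that is not in the wiki."""
    report: dict[str, list[str]] = {}
    for page in pages.values():
        missing = [ref for ref in page.refs if ref not in pages]
        if missing:
            report[page.slug] = missing
    return report


# Assumed example data: one page still references a source that was deleted.
pages = {
    "revenue": WikiPage(
        "revenue",
        sl_refs=["warehouse.orders", "warehouse.legacy_segments"],
        refs=["segment-classification"],
    ),
    "segment-classification": WikiPage("segment-classification"),
}
sources = {"warehouse.orders", "warehouse.customers"}

removed = prune_sl_refs(pages, sources)
missing = dangling_refs(pages)
print(removed)  # {'revenue': ['warehouse.legacy_segments']}
print(missing)  # {}
```

Returning the removed references instead of dropping them silently keeps the pruning step reviewable, in the same spirit as the diff-based workflow this page describes.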

For how the hybrid search pipeline ranks pages, how `[[wikilinks]]` extend the graph, and how ingest authors pages from evidence, read [Wiki retrieval](/docs/concepts/wiki-retrieval).

The split between the two pillars is sharp:

| Put it in YAML | Put it in Markdown |
|----------------|--------------------|
| `sum(amount)` | "Net revenue excludes successful refunds." |
| `many_to_one` join metadata | "Use the contract segment for board reporting." |
| Row grain and column types | "February had a one-time refund anomaly." |
| Default time dimension | "Finance owns ARR definitions." |

If a fact changes how the SQL runs, it goes in YAML. If a human needs it to trust the answer, it goes in Markdown.

## How ktx compares

Two adjacent product categories cover parts of this problem - but each leaves a different gap.

**Company brains** (Glean, Notion AI, the search-over-everything tools) index your wikis, docs, and chats so an agent can find context fast. They aren't built for data stacks: there's no join graph, no canonical metrics, and no way to compile a question into safe SQL. An agent reading them still has to guess how to query the warehouse.

**Traditional semantic layers** (MetricFlow, Cube, Malloy) solve that side. They give agents reviewable metric definitions and a compiler that produces correct SQL. The cost is maintenance - models, joins, and dimensions are hand-written, and the layer doesn't learn from the warehouse, BI tools, or query history that surround it. The business context that explains *why* a definition exists usually lives somewhere else.

**ktx** bundles both surfaces - wiki for business context, semantic layer for queryable definitions - and keeps them current by reading the data stack and reconciling new evidence with the reviewed files. You get the breadth of a knowledge tool and the SQL safety of a semantic layer, without rewriting models every time the warehouse changes.

| Capability | Company brain | Semantic layer | **ktx** |
|------------|---------------|----------------|-----|
| **Surface** | Indexed docs and chats | Modeling language or runtime | YAML and Markdown files |
| **Data-stack awareness** | None - treats data tools as text | High for declared metrics, none for the surrounding warehouse | Built in: scans schemas, dbt, BI tools, and query history |
| **Maintenance** | Manual page authoring | Manual modeling, model-per-change | Auto-maintained: reconciles evidence with accepted files |
| **SQL safety** | None - generates plausible text | Compiled, dialect-correct | Compiled with join-graph and fanout handling |
| **Agent edit loop** | Text-only | Tied to the modeling workflow | First-class: patch files, validate, review diffs |

If you already use MetricFlow, LookML, dbt, or BI tools, **ktx** can ingest that context and turn it into agent-readable files. You don't need to replace your serving layer to give agents a better working surface.

## A ktx project on disk

A **ktx** project is a directory of readable files. Semantic sources and wiki pages are committed to git; everything else **ktx** needs at runtime stays local and out of the repo.

```text
my-project/
├── ktx.yaml                  # project config and connections
├── semantic-layer/
│   └── warehouse/
│       ├── orders.yaml
│       └── customers.yaml
├── wiki/
│   └── global/
│       ├── revenue.md
│       └── segment-classification.md
└── .ktx/                     # local runtime state, git-ignored
```

This keeps analytics context close to the code review workflow: branch context changes, review YAML and Markdown diffs, merge accepted definitions, and let agents read the updated source of truth.

## Agent usage notes

Use this page when an agent needs to explain why **ktx** exists, why schema-only database access isn't enough, or how **ktx** differs from traditional semantic layers.

| Agent task | Relevant section | Next page |
|------------|------------------|-----------|
| Explain why a data agent wrote a plausible but wrong query | Database access isn't enough | [Writing Context](/docs/guides/writing-context) |
| Decide whether a fact belongs in YAML or Markdown | Semantic sources / Wiki pages | [Writing Context](/docs/guides/writing-context) |
| Compare **ktx** to another semantic layer | How ktx compares | [Primary Sources](/docs/integrations/primary-sources) |
| Explain reviewability and source of truth | A ktx project on disk | [Reviewing Context](/docs/guides/reviewing-context) |