diff --git a/docs-site/content/docs/concepts/context-as-code.mdx b/docs-site/content/docs/concepts/context-as-code.mdx
index e6ebcd7c..6636b700 100644
--- a/docs-site/content/docs/concepts/context-as-code.mdx
+++ b/docs-site/content/docs/concepts/context-as-code.mdx
@@ -5,100 +5,110 @@ description: Treat analytics context like code - version it, review it, merge it
## The idea
-dbt proved that analytics transformations belong in version control. Before dbt, SQL lived in BI tools, scheduling systems, and spreadsheets - scattered, unreviewed, impossible to audit. "Analytics as code" changed that: put your models in git, review them in PRs, deploy them by merging.
+dbt moved analytics transformations into git. KTX applies the same pattern to
+analytics context: metric definitions, joins, business rules, wiki pages, and
+ingest decisions become files that can be reviewed, merged, and audited.
-KTX applies the same principle to analytics context. Metric definitions, business rules, join relationships, wiki pages - these are artifacts that determine whether an agent produces correct results. They change over time. They need review. They need history. They need to be treated like code.
-
-A KTX project is a git repository. Semantic sources are YAML files. Wiki pages are Markdown files. Changes are commits. Updates are pull requests. Deployment is a merge. The entire lifecycle of your analytics context follows the same workflow your team already uses for dbt models, application code, and infrastructure.
+| Before | With KTX |
+|--------|----------|
+| Context scattered across BI tools, chats, docs, and analyst memory | Context lives in YAML and Markdown |
+| Agent changes are hard to inspect | Agent changes are git diffs |
+| Imports overwrite local judgment | Ingest reconciles with existing files |
+| History depends on tool logs | History lives in commits and transcripts |
## Auto-ingestion
-Most analytics context already exists - it's in your dbt manifests, LookML models, Metabase questions, and team Notion pages. KTX pulls from these sources automatically through adapters.
+Most context already exists in dbt manifests, LookML, MetricFlow, Metabase,
+Notion, warehouse metadata, and analyst notes. KTX reads those inputs through
+adapters, then reconciles them into local files.
-An ingestion run works like this:
+```text
+source tools -> adapters -> reconciliation agent -> YAML + Markdown diffs
+```
-1. **Adapters extract metadata.** Each configured source - dbt, LookML, Metabase, MetricFlow, Notion, or your live database - provides structured metadata about models, metrics, dimensions, questions, and documentation.
+| Step | What happens | Output |
+|------|--------------|--------|
+| **Extract** | Adapters read models, metrics, questions, schemas, and docs | Structured metadata |
+| **Reconcile** | The agent compares incoming facts with existing context | Create, update, skip, or flag |
+| **Write** | KTX saves changed semantic sources and wiki pages | Reviewable project files |
-2. **The LLM agent reconciles.** KTX doesn't blindly overwrite existing context. An LLM agent compares incoming metadata against your current semantic sources and wiki pages. It decides what to create, what to update, and what to leave alone. If your dbt project added a new model, the agent writes a new semantic source. If a Metabase question references a metric you've already defined, the agent skips the duplicate.
-
-3. **Files are written.** New and updated YAML sources and Markdown wiki pages are written to the project directory. Every decision is recorded in the session transcript.
-
-This reconciliation step is what separates auto-ingestion from a simple sync. A naive import would overwrite your hand-tuned metric definitions every time dbt's manifest changes. KTX's agent-driven approach merges intelligently: it respects your edits, fills gaps, and flags conflicts for human review.
+Reconciliation is the key difference from a sync. KTX preserves accepted local
+edits, fills gaps, and surfaces conflicts instead of blindly overwriting files.
## The git workflow
-Auto-ingestion is designed to plug into a PR-based workflow. Run ingestion on a branch, review the changed YAML and Markdown files, and merge them the same way you merge dbt models or application code.
+Run ingestion on a branch, review the changed YAML and Markdown, then merge the
+accepted context the same way you merge dbt or application code.
```text
-dbt / Looker / Metabase / Notion
- |
- v
- metadata changes
- |
- v
- nightly cron or CI ingest
- |
- v
- branch: ingest/nightly
- |
- | + 3 new sources
- | ~ 2 updated joins
- | + 1 wiki page
- v
- open PR
- |
- v
- review semantic diff
- |
- v
- approve & merge
- |
- v
- agents see updated context
+dbt / BI / docs / warehouse
+ |
+ v
+ ktx ingest --all
+ |
+ v
+ branch: ingest/nightly
+ |
+ v
+ semantic diff in PR
+ |
+ v
+ approve and merge
+ |
+ v
+ agents read updated files
```
-A typical branch shows a semantic diff: "this ingest added 3 new sources from dbt, updated 2 join definitions based on schema changes, and created 1 wiki page from a Notion doc." Analytics engineers review the diff, verify that the new sources look correct, and merge.
+Typical review checklist:
-Teams usually run this on demand while setting up a source, then schedule it
-once the source is stable. A cron job or CI schedule can run `ktx ingest --all --no-input`
-overnight on an ingest branch so the latest schema context, dbt manifests, BI
-metadata, and documentation updates are ready for review each morning.
+- new sources match the warehouse and source-tool evidence;
+- joins have the right relationship direction;
+- generated measures match business definitions;
+- wiki pages capture caveats without duplicating YAML;
+- `.ktx/` runtime state stays out of git unless your team intentionally reviews
+ a report or transcript.
-Once merged, agents querying through the KTX CLI see the updated context immediately. No deployment step, no cache invalidation, no restart. The files are the source of truth, and agents read them on every request.
-
-This workflow gives you the same review guarantees you have for dbt models. No semantic source reaches production without a human approving it. But unlike maintaining context manually, the heavy lifting - discovering new tables, drafting source definitions, extracting business rules from documentation - is done by the ingestion agent. You review and approve. You don't write from scratch.
+Teams often run ingestion on demand during setup, then schedule
+`ktx ingest --all --no-input` on an ingest branch once the source is stable.
## Feedback loops
-Context improves over time through two feedback channels.
+Context improves when human corrections and agent signals flow back into the
+same reviewed files.
-**Analyst corrections.** When an analytics engineer spots something wrong - a measure formula that doesn't match the business definition, a join that should be `many_to_one` instead of `one_to_many`, a wiki page that's out of date - they edit the YAML or Markdown directly and commit. These corrections become part of the project's git history, and the next ingestion run respects them. If you manually fix a measure definition, KTX won't overwrite it on the next ingest.
+| Signal | Example | Where it lands |
+|--------|---------|----------------|
+| Analyst correction | A measure excludes test accounts | `semantic-layer/**/*.yaml` |
+| Business clarification | ARR changed definition this quarter | `wiki/**/*.md` |
+| Agent query issue | A filter returns no rows unexpectedly | Wiki caveat or tighter source filter |
+| Join problem | A path duplicates order-level measures | Relationship metadata or grain fix |
-**Agent feedback.** When an agent queries the semantic layer and gets unexpected results - a query that returns no rows because of a bad filter, a join path that produces duplicated results - it can flag the issue. These signals feed back into the context: wiki pages can note known data quality issues, and source definitions can be tightened with better filters, join paths, or grain declarations.
-
-Each of these channels makes the next ingestion cycle better. Analyst corrections teach the system what your team considers authoritative. Agent feedback surfaces gaps in coverage. Context is not a static artifact - it's a living system that converges toward accuracy with every iteration.
+Accepted corrections become input to the next ingest run. That makes the
+context layer converge toward the team's current source of truth.
## Deterministic replay
-Every ingestion session in KTX produces a full transcript: every tool call the LLM agent made, every response it received, every source it created or modified, and the reasoning behind each decision.
+Every ingestion session records the adapter inputs, tool calls, LLM responses,
+write decisions, and reasoning behind each change.
-This matters for three reasons.
+| Use case | What replay gives you |
+|----------|-----------------------|
+| **Debugging** | Trace a bad source, join, or measure back to the input that produced it |
+| **Trust** | Show where a definition came from and who reviewed the resulting diff |
+| **Reproducibility** | Compare old and new ingest behavior after config or LLM changes |
-**Debugging.** When a semantic source looks wrong - the grain is off, a join points to the wrong table, a measure formula doesn't match the business definition - you can trace it back to the ingestion session that created it. The transcript shows exactly which adapter provided the input, how the LLM interpreted it, and why it made the decision it did. You don't have to guess.
-
-**Trust.** Analytics teams need to trust the context that agents consume. Deterministic replay means you can verify any part of the context layer by re-examining the session that produced it. If a stakeholder asks "where did this revenue definition come from?", you have a complete audit trail - from the dbt manifest entry, through the LLM's reconciliation logic, to the YAML file that was written.
-
-**Reproducibility.** Because ingestion sessions are recorded as structured transcripts (tool calls and responses, not just logs), they can be replayed for testing and validation. If you change your ingestion configuration or upgrade the LLM, you can replay previous sessions to see how the output would differ. This gives you a safety net for changes that affect how context is generated.
-
-The transcript is stored with local ingest run state and can be reviewed or replayed when you need to audit a decision. Commit the resulting YAML and Markdown changes; commit reports or transcripts only when they are part of your team's review workflow.
+Commit the YAML and Markdown changes. Commit reports or transcripts only when
+they are part of your team's review workflow.
## Agent usage notes
-Use this page when an agent needs to explain review workflows, ingestion diffs, replayability, or why KTX writes YAML and Markdown instead of hiding context in a hosted service.
+Use this page when an agent needs to explain review workflows, ingestion diffs,
+replayability, or why KTX writes YAML and Markdown instead of hiding context in
+a hosted service.
| Agent task | Relevant section | Next page |
|------------|------------------|-----------|
| Explain how generated context should be reviewed | The git workflow | [Building Context](/docs/guides/building-context) |
-| Diagnose why ingestion changed a semantic source | Auto-ingestion and Deterministic replay | [ktx ingest](/docs/cli-reference/ktx-ingest) |
+| Diagnose why ingestion changed a semantic source | Auto-ingestion / Deterministic replay | [ktx ingest](/docs/cli-reference/ktx-ingest) |
| Explain how context improves over time | Feedback loops | [Building Context](/docs/guides/building-context) |
| Tell a user what to commit | The git workflow | [Writing Context](/docs/guides/writing-context) |
diff --git a/docs-site/content/docs/concepts/semantic-layer-internals.mdx b/docs-site/content/docs/concepts/semantic-layer-internals.mdx
index c48428e6..aec0ccfa 100644
--- a/docs-site/content/docs/concepts/semantic-layer-internals.mdx
+++ b/docs-site/content/docs/concepts/semantic-layer-internals.mdx
@@ -1,26 +1,26 @@
---
-title: Semantic Layer Internals
-description: How KTX uses join graphs, grain, and relationship metadata to turn context into safe SQL.
+title: Context-Aware SQL
+description: How KTX turns reviewed context, grain, and relationship evidence into safe SQL for agents.
---
-KTX is a context layer for agents. This page focuses on one internal subsystem:
-the semantic execution layer that turns reviewed context into safe SQL.
+## Why query planning needs context
-The semantic layer is important, but it is not the whole product. KTX also
-handles schema evidence, wiki context, provenance, validation, and agent
-workflows around those files.
+Agents can generate SQL from schema alone, but safe analytics SQL needs more
+than table names. KTX uses reviewed context to understand grain, joins, measures,
+filters, and where aggregation must happen.
-Read the page as a pipeline:
+Read this page as four mechanics:
-- context inputs feed the semantic engine;
+- context files feed the semantic engine;
- evidence becomes a join graph with grain and relationship metadata;
-- review and corrections keep that graph current;
-- the execution engine uses the graph to avoid fan-out and ambiguous joins.
+- review keeps the graph current;
+- query planning avoids fan-out and ambiguous joins.
## Where the semantic layer fits
-The semantic layer is not a separate product category inside KTX. It is the
-engine that makes the rest of the context actionable for SQL generation.
+The query planner is one subsystem inside KTX's broader context layer. It uses
+source YAML, wiki context, scan evidence, and provenance to make context
+actionable for SQL generation.
-## The join graph KTX builds
+## Join graph
-A semantic source is a node. A join is an edge with a join condition and a
-relationship type. The graph lets KTX choose valid paths, reject unsafe paths,
-and reason about whether a join preserves or multiplies rows before SQL is
-generated.
+A semantic source is a node. A join is a typed edge. KTX uses the graph to
+choose valid paths and detect row-multiplying joins before SQL is generated.
-- `many_to_one` paths are usually safe for adding dimensions.
-- `one_to_many` paths can multiply fact rows and trigger fan-out handling.
-- Equal-cost paths can be ambiguous, so aliases and explicit joins matter.
+| Relationship | Planning impact |
+|--------------|-----------------|
+| `many_to_one` | Usually safe for adding dimensions |
+| `one_to_many` | Can multiply measures and trigger fan-out handling |
+| `one_to_one` | Usually safe when keys are correct |
+| Equal-cost paths | Ambiguous unless aliases or explicit joins disambiguate |
The graph is bidirectional for planning. If `orders -> customers` is
-`many_to_one`, the reverse path is `one_to_many`; KTX keeps that distinction
-instead of treating every join as a neutral edge.
+`many_to_one`, the reverse path is `one_to_many`.
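
The typed-edge idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not KTX's implementation: the `Join` class, the `REVERSE` map, and the helper names are hypothetical, and only the relationship names come from this page.

```python
from dataclasses import dataclass

# Hypothetical lookup: the relationship seen when an edge is walked backwards.
REVERSE = {
    "many_to_one": "one_to_many",
    "one_to_many": "many_to_one",
    "one_to_one": "one_to_one",
}


@dataclass(frozen=True)
class Join:
    """A typed edge between two semantic sources (illustrative only)."""

    source: str
    target: str
    relationship: str

    def reversed(self) -> "Join":
        # Flip the direction and the relationship type together.
        return Join(self.target, self.source, REVERSE[self.relationship])


def build_graph(joins):
    """Index every declared join in both directions."""
    graph = {}
    for join in joins:
        for edge in (join, join.reversed()):
            graph.setdefault(edge.source, []).append(edge)
    return graph


def multiplies_rows(path):
    """A path can fan out if any edge on it is one_to_many."""
    return any(edge.relationship == "one_to_many" for edge in path)


graph = build_graph([
    Join("orders", "customers", "many_to_one"),
    Join("order_items", "orders", "many_to_one"),
])
orders_to_customers = [e for e in graph["orders"] if e.target == "customers"]
orders_to_items = [e for e in graph["orders"] if e.target == "order_items"]

print(multiplies_rows(orders_to_customers))  # adding a dimension
print(multiplies_rows(orders_to_items))      # walking down to line items
```

Storing both directions keeps the planning question local to the path being walked: the same declared join is safe from `orders` to `customers` and row-multiplying from `customers` to `orders`.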
-## How KTX builds the graph
+## Building and maintaining the graph
-KTX starts from evidence, not a blank modeling canvas. Database scans and
-analytics-tool imports create source definitions that an analyst can review.
+KTX starts from evidence, writes reviewable source YAML, and treats the merged
+diff as the accepted graph.
| Evidence | What it contributes |
-|---|---|
-| Declared primary keys | Initial row grain for each source |
-| Declared foreign keys | Formal join candidates and relationship direction |
-| Inferred relationships | Useful edges when warehouses lack constraints |
-| dbt, MetricFlow, and LookML imports | Existing metrics, dimensions, entities, explores, and joins |
-| Query history | Real join and filter patterns agents should respect |
-| Analyst review | The final authority before context is merged |
-
-Generated YAML is intentionally reviewable. KTX can draft joins and measures,
-but the accepted semantic layer is still the plain-file diff your team approves.
-
-## How KTX keeps the graph current
-
-The semantic layer changes as schemas, metrics, and business rules change. KTX
-keeps that loop explicit instead of hiding it behind a remote runtime.
+|----------|---------------------|
+| Declared primary keys | Initial row grain |
+| Declared foreign keys | Formal join candidates |
+| Inferred relationships | Edges when warehouses lack constraints |
+| dbt, MetricFlow, and LookML imports | Existing metrics, dimensions, explores, and joins |
+| Query history | Real join and filter patterns |
+| Analyst review | Final authority before context is merged |
-This matters because semantic correctness is not static. If a source gains a
-new key, a metric changes definition, or an analyst corrects a relationship,
-the next agent gets that reviewed context.
+## Modeling problems
-## The modeling problem the graph solves
+Fan-out is the classic failure mode: an order-level measure joins to line-item
+rows before aggregation, so each order's revenue is counted once per item.
-Fan-out is the classic failure mode. If an order-level measure is joined to
-line-item rows before aggregation, one order can become many rows and revenue
-can be counted more than once.
+| Problem | What happens | How KTX handles it |
+|---------|--------------|--------------------|
+| Order measure joins to `order_items` | `orders.revenue` repeats once per item | Detect `one_to_many` and pre-aggregate |
+| Two fact sources share `customers` | Measures multiply across the shared dimension | Treat as a chasm trap and plan each fact locally |
+| Filter crosses `one_to_many` | Filtering after the join changes measure grain | Reject or localize the filter |
+| Equal-cost paths connect sources | Join choice is ambiguous | Prefer safer paths or require aliases |
-| Problem | What happens | How KTX avoids it |
-|---|---|---|
-| Order measure joins to `order_items` | `orders.revenue` repeats once per item | Detect the `one_to_many` path and pre-aggregate the order measure |
-| Two independent fact sources share `customers` | Measures from each fact table multiply across the shared dimension | Treat it as a chasm trap and use aggregate-locality planning |
-| Filter lives only across a `one_to_many` path | Filtering after the join changes the measure grain | Reject or localize the filter instead of silently producing unsafe SQL |
-| Multiple equal-cost paths connect the same sources | The join path is ambiguous | Prefer safer paths and use aliases to disambiguate repeated joins |
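
The first row of the table can be reproduced with an in-memory SQLite database. This is a hedged sketch: the table layout and sample values are made up for illustration, and the SQL is hand-written to show the two query shapes rather than taken from KTX output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table orders (id integer primary key, customer_id integer, amount integer);
    create table order_items (id integer primary key, order_id integer, sku text);
    -- Order 1 has three line items, order 2 has one.
    insert into orders values (1, 10, 100), (2, 10, 50);
    insert into order_items values (1, 1, 'a'), (2, 1, 'b'), (3, 1, 'c'), (4, 2, 'a');
    """
)

# Naive shape: join first, then aggregate. Each order row repeats once per item,
# so the order-level amount is summed multiple times.
naive = conn.execute(
    """
    select sum(o.amount)
    from orders o
    join order_items i on i.order_id = o.id
    """
).fetchone()[0]

# Aggregate-locality shape: collapse line items to order grain first, then join.
safe = conn.execute(
    """
    with items_agg as (
        select order_id, count(*) as item_count
        from order_items
        group by order_id
    )
    select sum(o.amount), sum(a.item_count)
    from orders o
    join items_agg a on a.order_id = o.id
    """
).fetchone()

print(naive)  # inflated: 100 * 3 + 50 * 1
print(safe)   # (true revenue, true item count)
```

With this sample data the naive query reports 350 while the true revenue is 150, which is the kind of plausible but wrong number that fan-out detection is meant to prevent.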
+## Execution planning
-Many-to-many questions usually show up as multiple one-to-many paths or
-independent fact sources. KTX treats those shapes as fan-out or chasm risks
-unless the query can be planned at a safe grain.
-
-## How the execution engine uses the graph
-
-The planner resolves the sources in a semantic query, chooses a join tree, and
-checks whether any requested dimension or filter crosses a row-multiplying
-edge. The SQL generator then chooses the simple path or the aggregate-locality
-path.
+The planner resolves sources, chooses a join tree, checks relationship paths,
+and picks a simple or aggregate-locality SQL shape.
| Naive SQL shape | Semantic-layer SQL shape |
-|---|---|
-| Join facts and dimensions first, then aggregate | Aggregate each fact source at its own grain, then join the results |
+|-----------------|--------------------------|
+| Join facts and dimensions first, then aggregate | Aggregate each fact source at its own grain, then join results |
| Put every filter in one outer `WHERE` clause | Keep measure filters with the measure source when locality is needed |
| Trust the shortest textual join path | Prefer safe relationship paths and reject disconnected sources |
| Let dimension grain differ across facts | Raise when asymmetric dimensions would fan out another measure |
@@ -342,27 +323,49 @@ path.
className="not-prose my-8 overflow-hidden rounded-lg border border-fd-border bg-fd-card shadow-sm"
aria-label="Fan-out safe execution shape"
>
-
-
-
- {"Unsafe shape"}
-
-
+
+
+ {"Fan-out handling"}
+
+
+ {"The same question planned two ways: without and with KTX preserving the measure grain."}
+
- {"The order measure is exposed to line-item fan-out before aggregation."}
-
+
+ {"Order-level revenue is exposed to line-item fan-out before aggregation."}
+
-
-
- {"KTX shape"}
-
-
+
+
+
+ {"KTX shape"}
+
+
+ {"Aggregate locally, then join"}
+
+
+
{`orders_agg as (
select customer_id, sum(amount) revenue
from orders
@@ -372,27 +375,25 @@ select customers.segment, sum(revenue)
from orders_agg
join customers`}
-
- {"KTX pre-aggregates fact measures at their own grain before joining dimensions."}
-
+
+ {"The order measure is pre-aggregated within its own source before dimensions are joined."}
+
-The result is not magic. It is structured planning: validated sources, typed
-relationships, graph search, fan-out detection, aggregate locality, and final
-dialect transpilation.
+The result is structured planning: validated sources, typed relationships,
+graph search, fan-out detection, aggregate locality, and dialect transpilation.
-## What this means for agents
+## Agent usage notes
-KTX gives agents a semantic surface they can inspect and improve, not just a
-folder of notes.
+Use this page when an agent needs to explain how KTX turns reviewed semantic
+context into SQL, why relationship metadata matters, or why a query was rejected
+as unsafe.
-- Search semantic sources and related wiki pages before writing SQL.
-- Compile SQL through `ktx sl query` instead of guessing joins.
-- Validate semantic-layer changes before review.
-- Patch YAML and Markdown files in git.
-- Explain metric meaning and provenance from the same accepted context.
-
-Next, read [Writing Context](/docs/guides/writing-context) for the YAML editing
-workflow or [ktx sl](/docs/cli-reference/ktx-sl) for the command reference.
+| Agent task | Relevant section | Next page |
+|------------|------------------|-----------|
+| Explain why KTX asks for `grain` and relationship types | Join graph | [Writing Context](/docs/guides/writing-context) |
+| Diagnose duplicated measures after a join | Modeling problems | [ktx sl](/docs/cli-reference/ktx-sl) |
+| Explain safe SQL generation | Execution planning | [ktx sl](/docs/cli-reference/ktx-sl) |
+| Describe how semantic context stays current | Building and maintaining the graph | [Context as Code](/docs/concepts/context-as-code) |
diff --git a/docs-site/content/docs/concepts/the-context-layer.mdx b/docs-site/content/docs/concepts/the-context-layer.mdx
index ba7ee3f3..9a8130d0 100644
--- a/docs-site/content/docs/concepts/the-context-layer.mdx
+++ b/docs-site/content/docs/concepts/the-context-layer.mdx
@@ -3,223 +3,66 @@ title: The Context Layer
description: What a context layer is, why agents need one, and how KTX compares to other semantic layers.
---
-## The problem
+## Why agents need context
-Give an agent access to your database and it will generate SQL. It might even produce a decent chart. But ask it a real analytics question - "what's our net revenue trend by segment?" - and things fall apart.
+Database access lets an agent generate SQL. It does not tell the agent which
+tables matter, which joins are safe, which metrics are canonical, or what your
+team means by "enterprise", "net revenue", or "active customer".
-The agent doesn't know that `orders.amount` includes refunds and needs a status filter. It doesn't know that `customers` should join to `orders` on `customer_id`, not `id`. It doesn't know that your team stopped using `legacy_segments` six months ago, or that "enterprise" means contracts over $100k, not just big logos. It sees column names and types. It doesn't see your business.
+That missing business context is where plausible SQL becomes wrong SQL:
-This isn't a model capability problem. Claude Code, Codex, and your BI agents can write correct SQL when they know what correct means. The gap is context: which tables matter, which joins are valid, which metrics are canonical, what the business terms actually refer to. Without that, agents produce plausible-looking artifacts that are subtly, dangerously wrong. Wrong enough to pass a glance, wrong enough to drive a decision.
-
-Analytics engineers already know this pain. It's the same reason you write dbt tests, maintain a data dictionary, and spend half of standup explaining why someone's dashboard number doesn't match the board deck. The difference is that agents make decisions at machine speed, so the wrong context propagates faster than a human can catch it.
+- `orders.amount` may include refunds unless filtered.
+- `customers.id` may not be the right join key for every source.
+- `legacy_segments` may be stale even though it still exists.
+- A metric may have a board-approved definition that is not obvious from
+ column names.
## Three waves of AI analytics
-The industry has moved through three distinct approaches to getting AI and data to work together.
+| Wave | What it gives agents | Where it breaks |
+|------|----------------------|-----------------|
+| **Database access** | Tables, columns, and query execution | Agents guess joins, filters, and metric logic |
+| **Semantic layers** | Modeled metrics, dimensions, joins, and SQL generation | They often miss operating context: anomalies, caveats, ownership, and review history |
+| **Agentic context** | Semantic definitions plus wiki knowledge, scans, provenance, and edit workflows | Requires context to be kept current and reviewable |
-**Wave one: database access.** Connect an LLM to a database, let it generate SQL. This works for simple lookups - "how many orders last week?" - but breaks on anything that requires business knowledge. The agent guesses at joins, invents metrics, and hallucinates table relationships. Every query is a coin flip.
+KTX is built for the third wave: agents that generate SQL, maintain semantic
+files, write docs, propose tests, and leave reviewable diffs.
-**Wave two: semantic layers and text-to-SQL.** Add structure. Define metrics in MetricFlow or Cube, expose schemas, build text-to-SQL pipelines. This is better - the agent knows that `revenue` means `sum(amount) where status != 'refunded'` - but building and maintaining that structure by hand is manual, time-consuming, and still limited. Semantic layers define what to calculate, not why, when, or how to interpret the result. The agent can compute net revenue but doesn't know about the February refund anomaly, the segment reclassification, or the fact that `enterprise` changed definition last quarter.
+## What KTX adds
-**Wave three: agentic context.** AI is no longer just answering questions - it's generating dashboards, writing semantic definitions, proposing dbt models, creating tests and documentation. For that to work, agents need more than metric definitions. They need the full picture: business rules, known data quality issues, relationship maps, historical context, and the institutional knowledge that lives in your team's heads. They need a context layer.
+A context layer is the trusted knowledge surface between analytics systems and
+agents. The semantic layer is the core, but agents also need business rules,
+schema evidence, provenance, and a safe way to update files.
-## What a context layer is
+```text
+Warehouses + dbt + BI + docs
+ |
+ v
+ ktx ingest
+ |
+ v
+semantic-layer/ + wiki/ + raw-sources/ + provenance
+ |
+ v
+Agents search, query, explain, validate, and patch context
+```
-A context layer is the infrastructure that gives agents the business knowledge they need to produce correct analytics artifacts. It includes a semantic layer - that's a critical component - but it's not the whole thing.
+| Pillar | Format | What it answers |
+|--------|--------|-----------------|
+| **Semantic sources** | `semantic-layer/**/*.yaml` | How do agents query a source safely? |
+| **Wiki pages** | `wiki/**/*.md` | What does the business mean, and what caveats matter? |
+| **Scan artifacts** | `raw-sources/**` | What did KTX observe in the warehouse or source tool? |
+| **Provenance** | Ingest transcripts and run state | Why was this context created or changed? |
-
-
-
- {"How KTX works"}
-
-
- {"KTX pulls structured metadata and human knowledge from your analytics stack, reconciles it into reviewable files, then gives agents a trusted surface for search, SQL generation, validation, and edits."}
-
- {"Reviewed agent and analyst edits flow back into the same YAML and Markdown files, so the next ingest run starts from the team's accepted context."}
-
-
-
-KTX organizes context into four pillars:
-
-- Semantic sources
-- Wiki pages
-- Scan artifacts
-- Provenance
-
-Each pillar covers a different kind of context agents need before they can safely write SQL, update semantic definitions, or explain an analytics result.
-
-**Semantic sources** are YAML definitions that describe your data in terms
-agents can reason about:
-
-- source tables or SQL queries;
-- row grain;
-- typed columns;
-- valid joins;
-- named measures, filters, and segments.
-
-This is where "revenue means `sum(amount)` excluding refunds" lives. For the
-join graph, fan-out protections, and execution mechanics, read
-[Semantic Layer Internals](/docs/concepts/semantic-layer-internals).
+Semantic sources describe data in terms agents can reason about: row grain,
+typed columns, valid joins, named measures, filters, and segments.
```yaml
name: orders
table: public.orders
grain: [id]
-columns:
- - name: id
- type: number
- - name: customer_id
- type: number
- - name: amount
- type: number
- - name: status
- type: string
- - name: created_at
- type: time
- role: time
joins:
- to: customers
"on": customer_id = customers.id
@@ -228,95 +71,78 @@ measures:
- name: revenue
expr: sum(amount)
filter: "status != 'refunded'"
- description: Net revenue excluding refunds
- - name: order_count
- expr: count(id)
```
-**Wiki pages** are Markdown documents that capture business definitions, rules, and operating context - the kind of context that doesn't fit in a schema definition. Pages have structured frontmatter (summary, tags, semantic layer references) and free-form content. Agents search them when they need to understand why a metric works a certain way, not just how to compute it.
+For join graphs, fan-out handling, and execution mechanics, read
+[Context-Aware SQL](/docs/concepts/semantic-layer-internals).
-```markdown
----
-summary: Gross-to-net revenue reconciles paid invoices, credits, and refunds.
-tags:
- - finance
- - revenue
-refs:
- - arr-contract-first
-sl_refs:
- - warehouse.invoices
-usage_mode: auto
----
+## Wiki pages
-Gross revenue starts from paid invoice activity. Net revenue subtracts
-credits and successful refunds in the month they are recorded.
+Wiki pages capture the context that does not belong in a measure formula:
+business definitions, reporting policy, known data issues, metric caveats, and
+links back to semantic sources.
-Exclude unpaid, void, draft, and test-account invoice activity from
-canonical revenue reporting.
-```
-
-**Scan artifacts** are the raw output of KTX's database scanner: table and column metadata, inferred foreign key relationships (even without declared constraints), column statistics, and enrichment reports. They form the foundation that semantic sources are built on.
-
-**Provenance** is the record of how context was created and changed. Every ingestion session records a full transcript - which adapter ran, what the LLM decided, which sources were created or updated, and why. This is what makes the system auditable: you can trace any semantic source back to the ingestion decision that created it.
-
-Together, these four pillars give agents enough context to produce analytics artifacts that match what your team would produce - not just syntactically valid SQL, but the right query for the question.
+| Put it in YAML | Put it in Markdown |
+|----------------|--------------------|
+| `sum(amount)` | "Net revenue excludes successful refunds." |
+| `many_to_one` join metadata | "Use contract segment for board reporting." |
+| Row grain and column types | "February had a one-time refund anomaly." |
+| Default time dimension | "Finance owns ARR definitions." |
## How KTX compares
-KTX is a context layer with an agent-native semantic layer at its core. MetricFlow, Cube, and Malloy model metrics, dimensions, joins, and generated SQL. KTX covers that semantic-layer work, then adds the context agents need to use and maintain it: wiki pages, schema scans, provenance, ingestion, validation, and agent-facing CLI commands.
+KTX overlaps with semantic layers, but the product boundary is broader: it gives
+agents a reviewable context workspace, not only a metric runtime.
-The workflow is the difference. Traditional semantic layers are powerful, but they are usually built and maintained through manual modeling work, product-specific runtimes, or language-specific workflows. They are not agent-native by default, which makes them harder for agents to inspect, edit, validate, and review in a tight loop. KTX is designed for agents that need to read context, change semantic files, inspect generated SQL, and leave a reviewable git diff.
+| Dimension | KTX | MetricFlow / Cube / Malloy |
+|-----------|-----|-----------------------------|
+| **Primary surface** | Plain YAML and Markdown files | Modeling language, project runtime, or API surface |
+| **What it models** | Sources, joins, grain, measures, filters, wiki refs, and provenance | Metrics, dimensions, joins, queries, and generated SQL |
+| **Agent edit loop** | First-class: patch files, validate, inspect SQL, and review git diffs | Possible, but usually tied to the tool's modeling workflow |
+| **Surrounding context** | Built in through wiki pages, scans, transcripts, and source evidence | Usually descriptions, annotations, metadata, or app-specific context |
+| **Best fit** | Agents maintaining analytics context and SQL-facing definitions | Teams standardizing metrics, BI APIs, semantic runtimes, or exploratory modeling |
-| | KTX semantic layer | MetricFlow | Cube | Malloy |
-|---|---|---|---|---|
-| **Model surface** | Plain YAML sources plus Markdown wiki pages | YAML semantic models and metrics in a dbt project | YAML or JavaScript cubes, views, access policies, and pre-aggregations | `.malloy` models, query pipelines, notebooks, and annotations |
-| **What it models** | Sources, columns, measures, segments, joins, grain, filters, default time dimensions, and context references | Semantic models, entities, dimensions, measures, metrics, time grains, and metric types | Cubes, views, measures, dimensions, segments, joins, hierarchies, policies, and rollups | Sources, joins, dimensions, measures, calculations, nested results, and query pipelines |
-| **Agent edit loop** | First-class. Agents can patch small files, save imperfect drafts, run validation, query through the CLI, inspect SQL, and refine in the same workflow | Possible, but the interface is a dbt/metric workflow rather than an agent context workflow | Possible through code-first models and platform APIs, but changes are tied to runtime deployment and governance concerns | Possible, but agents must operate in Malloy's language and compiler model |
-| **Fan-out safety** | Explicit `grain` plus relationship metadata. KTX detects `one_to_many` fan-out, identifies chasm traps, pre-aggregates independent fact measures into CTEs, and rejects unsafe filters | Dataflow query planning for metric requests, multi-hop joins, metric time, and metric types | Runtime planner, modeled joins, primary keys, views, multi-fact views, and pre-aggregations | Symmetric aggregates and path-based aggregation in the language |
-| **SQL generation** | Structured semantic query to canonical SQL, then dialect transpilation with sqlglot | Metric request to optimized query plan, then engine-specific SQL | REST, GraphQL, Postgres-compatible SQL, Semantic SQL, and cached/pre-aggregated execution | Malloy source/query to dialect-specific SQL and result metadata |
-| **Context around semantics** | Built in: wiki pages, scan artifacts, relationship inference, ingest transcripts, replay, and agent-facing CLI commands | Primarily metric and dbt project context | Descriptions and `meta.ai_context` inside the semantic model, plus platform agent features | Annotations/tags can carry metadata; surrounding context depends on the application |
-| **Best fit** | Agents maintaining analytics code, metrics, joins, SQL, docs, and semantic definitions | Teams standardizing metrics inside dbt workflows | Production semantic APIs, BI integrations, access control, caching, and concurrency | Expressive modeling and exploratory analysis above SQL |
+If you already use MetricFlow, LookML, dbt, or BI tools, KTX can ingest that
+context and turn it into agent-readable files. You do not need to replace your
+serving layer to give agents a better working surface.
-If you do not have a semantic layer, KTX can build an agent-native one from your database schema and enrich it with generated descriptions and wiki pages. If you already use MetricFlow or LookML, KTX ingests from those tools and merges their context into KTX's files. You can keep your existing BI or metric-serving system while using KTX as the semantic and contextual surface agents work against.
+## Plain files
-## The plain-files philosophy
+A KTX project is a directory of readable files. Semantic sources and wiki pages
+are committed to git; local indexes and caches stay under the git-ignored `.ktx/` directory.
-A KTX project is a directory of plain files. No server to run, no database to manage, no proprietary store to back up. Everything is YAML, Markdown, and SQLite - formats you can read, diff, and version-control with tools you already use.
-
-```
+```text
my-project/
-├── ktx.yaml # Project configuration
+├── ktx.yaml
├── semantic-layer/
│ └── warehouse/
-│ ├── orders.yaml # Semantic source definitions
-│ ├── customers.yaml
-│ └── order_items.yaml
+│ ├── orders.yaml
+│ └── customers.yaml
├── wiki/
-│ ├── global/
-│ │ ├── revenue.md # Business definitions and rules
-│ │ └── segment-classification.md
-│ └── user/
-│ └── local/
-│ └── data-quality-notes.md
+│ └── global/
+│ ├── revenue.md
+│ └── segment-classification.md
├── raw-sources/
│ └── warehouse/
-│ └── live-database/ # Schema ingest artifacts and reports
-└── .ktx/
- ├── db.sqlite # Local state (git-ignored)
- └── cache/ # Runtime cache (git-ignored)
+└── .ktx/ # local state, git-ignored
```
-Semantic sources and wiki pages are committed to git. The SQLite database holds ephemeral state - schema ingest results, embedding indexes, session logs - and is git-ignored. If you delete it, KTX rebuilds it on the next run.
+This keeps analytics context close to the code review workflow:
-This means your analytics context travels with your code. You can fork it, branch it, review it in a PR, and merge it with the same tools you use for dbt models. There's no sync problem between a remote server and your local state. There's no migration to run. The files are the source of truth.
+- branch context changes;
+- review YAML and Markdown diffs;
+- merge accepted definitions;
+- let agents read the updated source of truth.
## Agent usage notes
-Use this page when an agent needs to explain why KTX exists, why schema-only database access is not enough, or how KTX differs from MetricFlow, Cube, Malloy, and traditional semantic layers.
+Use this page when an agent needs to explain why KTX exists, why schema-only
+database access is not enough, or how KTX differs from traditional semantic
+layers.
| Agent task | Relevant section | Next page |
|------------|------------------|-----------|
-| Explain why a database agent made a plausible but wrong query | The problem | [Writing Context](/docs/guides/writing-context) |
-| Decide whether a metric belongs in YAML or Markdown | What a context layer is | [Writing Context](/docs/guides/writing-context) |
+| Explain why a database agent wrote a plausible but wrong query | Why agents need context | [Writing Context](/docs/guides/writing-context) |
+| Decide whether a fact belongs in YAML or Markdown | Semantic sources / Wiki pages | [Writing Context](/docs/guides/writing-context) |
| Compare KTX to another semantic layer | How KTX compares | [Primary Sources](/docs/integrations/primary-sources) |
-| Explain reviewability and source of truth | The plain-files philosophy | [Context as Code](/docs/concepts/context-as-code) |
+| Explain reviewability and source of truth | Plain files | [Context as Code](/docs/concepts/context-as-code) |