Merge origin/main into add-ktx-mcp-claude-desktop

Andrey Avtomonov 2026-05-16 01:59:20 +02:00
commit af4f2c29df
25 changed files with 1657 additions and 234 deletions

@ -4,8 +4,8 @@ description: "Command map and shared options for the KTX CLI."
---
The `ktx` CLI sets up local projects, builds agent-ready context, checks
connections, queries semantic-layer sources, searches wiki pages, and manages
the bundled Python runtime.
connections, queries semantic-layer sources, searches wiki pages, runs the MCP
server, and manages the bundled Python runtime.
## Command Map
@ -26,6 +26,11 @@ ktx
validate <sourceName>
query
status
mcp
start
stop
status
logs
dev
init [directory]
schema
@ -73,4 +78,7 @@ ktx ingest --all
# Search semantic-layer sources and wiki pages
ktx sl search "revenue"
ktx wiki search "revenue recognition"
# Start the local MCP server for agent clients
ktx mcp start
```

@ -29,14 +29,16 @@ connections when you use `--all`.
| `--deep` | Use AI-enriched database ingest | Stored connection default, or `fast` |
| `--query-history` | Include database query-history usage patterns | Stored connection default |
| `--no-query-history` | Skip database query-history usage patterns for this run | Stored connection default |
| `--query-history-window-days <days>` | Query-history lookback window for this run | Stored connection default |
| `--query-history-window-days <days>` | BigQuery/Snowflake query-history lookback window for this run | Stored connection default |
| `--plain` | Print plain text output | `true` |
| `--json` | Print JSON output | `false` |
| `--no-input` | Disable interactive terminal input | — |
`--fast` and `--deep` are mutually exclusive. Depth flags apply only to
database connections. Query-history flags apply only to database connections
that support query history. Query-history ingest runs after schema ingest and
that support query history. The window flag applies to BigQuery and Snowflake;
Postgres reads the current `pg_stat_statements` aggregate data instead of a
time-windowed history table. Query-history ingest runs after schema ingest and
requires deep ingest readiness.
When `--all` selects both databases and context sources, database ingest runs
@ -70,6 +72,7 @@ ktx ingest warehouse --deep
# Include query-history usage patterns
ktx ingest warehouse --deep --query-history
# Set the lookback window for BigQuery or Snowflake query history
ktx ingest warehouse --query-history-window-days 30
# Build a source connection

@ -0,0 +1,70 @@
---
title: "ktx mcp"
description: "Run the KTX MCP HTTP server for agent clients."
---
`ktx mcp` starts, stops, inspects, and tails the local KTX MCP server for a KTX
project. Use it when an agent client connects over MCP instead of reading
generated CLI instructions.
## Command signature
```bash
ktx mcp <subcommand> [options]
```
## Subcommands
| Subcommand | Description |
|-----------|-------------|
| `start` | Start the KTX MCP HTTP server |
| `stop` | Stop the KTX MCP daemon |
| `status` | Show daemon status, URL, PID, token mode, and project path |
| `logs` | Print the daemon log |
## `mcp start` Options
| Flag | Description | Default |
|------|-------------|---------|
| `--host <host>` | Host to bind | `127.0.0.1` |
| `--port <n>` | Port to bind | `7878` |
| `--token <token>` | Bearer token; required for non-loopback binding | Value of `KTX_MCP_TOKEN` |
| `--foreground` | Run the server in the foreground | `false` |
| `--allowed-host <host>` | Additional allowed Host header; repeatable | - |
| `--allowed-origin <origin>` | Allowed browser Origin header; repeatable | - |
## `mcp logs` Options
| Flag | Description | Default |
|------|-------------|---------|
| `--follow` | Follow log output | `false` |
## Examples
```bash
# Start the daemon on localhost
ktx mcp start
# Check status
ktx mcp status
# Tail logs
ktx mcp logs --follow
# Run in the foreground on a custom port
ktx mcp start --port 8787 --foreground
```
## Security notes
The default host, `127.0.0.1`, is loopback-only. Before binding to a
non-loopback host, configure a bearer token with `--token <token>` or
`KTX_MCP_TOKEN`. For browser clients, also restrict accepted hosts and origins
with `--allowed-host` and `--allowed-origin`.
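The bind rule above can be sketched in Python as follows. This is a minimal illustration, not KTX's code: `requires_token` and `check_bind` are hypothetical helpers, and the sketch assumes "loopback" means `localhost` or a loopback IP address, treating every other hostname as exposed.

```python
import ipaddress
import os
from typing import Optional


def requires_token(host: str) -> bool:
    """True when binding to host could expose the server beyond loopback."""
    if host == "localhost":
        return False
    try:
        return not ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Other hostnames can resolve to any interface, so treat them as exposed.
        return True


def check_bind(host: str, token: Optional[str] = None) -> None:
    """Raise unless the bind is loopback-only or a bearer token is available."""
    # Hypothetical: fall back to the environment variable named in the docs.
    token = token or os.environ.get("KTX_MCP_TOKEN")
    if requires_token(host) and not token:
        raise ValueError(f"refusing to bind {host} without a bearer token")


check_bind("127.0.0.1")                       # loopback: no token needed
check_bind("0.0.0.0", token="example-token")  # non-loopback with a token
```

Note that `0.0.0.0` is the unspecified address rather than a loopback address, so binding to it listens on every interface and needs a token under this rule.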
## Common errors
| Error | Cause | Recovery |
|-------|-------|----------|
| No KTX project found | Current directory has no `ktx.yaml` and `KTX_PROJECT_DIR` is unset | Run from a KTX project, set `KTX_PROJECT_DIR`, or pass `--project-dir <path>` |
| Non-loopback host rejected | The server needs token auth before binding beyond localhost | Pass `--token <token>` or set `KTX_MCP_TOKEN` |
| Client cannot connect | Host, port, token, allowed host, or allowed origin does not match the client | Check `ktx mcp status`, then restart with explicit `--host`, `--port`, `--allowed-host`, and `--allowed-origin` values |

@ -96,13 +96,16 @@ incomplete.
|------|-------------|
| `--enable-query-history` | Enable query-history ingest when the selected database supports it |
| `--disable-query-history` | Disable query-history ingest for the selected database |
| `--query-history-window-days <number>` | Query-history lookback window |
| `--query-history-window-days <number>` | BigQuery/Snowflake query-history lookback window |
| `--query-history-min-executions <number>` | Minimum executions for a query-history template |
| `--query-history-service-account-pattern <pattern>` | Query-history service-account regex; repeatable |
| `--query-history-redaction-pattern <pattern>` | Query-history SQL-literal redaction regex; repeatable |
Query history setup is supported for Postgres, BigQuery, and Snowflake. Enabling
query history makes deep ingest readiness matter for later `ktx ingest` runs.
Query history setup is supported for Postgres, BigQuery, and Snowflake. The
window flag applies to BigQuery and Snowflake; Postgres reads the current
`pg_stat_statements` aggregate data instead of a time-windowed history table.
Enabling query history makes deep ingest readiness matter for later
`ktx ingest` runs.
### Context Sources

@ -9,6 +9,7 @@
"ktx-sl",
"ktx-wiki",
"ktx-status",
"ktx-mcp",
"ktx-dev"
]
}

@ -1,5 +1,5 @@
{
"title": "Concepts",
"defaultOpen": true,
"pages": ["the-context-layer", "context-as-code"]
"pages": ["the-context-layer", "semantic-layer-internals", "context-as-code"]
}

@ -0,0 +1,398 @@
---
title: Semantic Layer Internals
description: How KTX uses join graphs, grain, and relationship metadata to turn context into safe SQL.
---
KTX is a context layer for agents. This page focuses on one internal subsystem:
the semantic execution layer that turns reviewed context into safe SQL.
The semantic layer is important, but it is not the whole product. KTX also
handles schema evidence, wiki context, provenance, validation, and agent
workflows around those files.
Read the page as a pipeline:
- context inputs feed the semantic engine;
- evidence becomes a join graph with grain and relationship metadata;
- review and corrections keep that graph current;
- the execution engine uses the graph to avoid fan-out and ambiguous joins.
## Where the semantic layer fits
The semantic layer is not a separate product category inside KTX. It is the
engine that makes the rest of the context actionable for SQL generation.
<div
className="not-prose my-8 overflow-hidden rounded-lg border border-fd-border bg-fd-card shadow-sm"
aria-label="How context inputs flow through the semantic layer into agent workflows"
>
<div className="grid gap-0 lg:grid-cols-[1fr_2rem_1.12fr_2rem_1fr]">
<section className="bg-fd-background p-4">
<p className="mb-3 text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"Context inputs"}
</p>
<div className="grid gap-2 text-sm">
<div className="border-l-2 border-fd-primary bg-fd-card px-3 py-2">
<p className="font-mono text-xs text-fd-foreground">semantic-layer/</p>
<p className="mt-1 text-xs leading-5 text-fd-muted-foreground">
{"source YAML, measures, joins, grain"}
</p>
</div>
<div className="border-l-2 border-amber-500 bg-fd-card px-3 py-2">
<p className="font-mono text-xs text-fd-foreground">wiki/</p>
<p className="mt-1 text-xs leading-5 text-fd-muted-foreground">
{"business rules, definitions, caveats"}
</p>
</div>
<div className="border-l-2 border-orange-500 bg-fd-card px-3 py-2">
<p className="font-mono text-xs text-fd-foreground">raw-sources/</p>
<p className="mt-1 text-xs leading-5 text-fd-muted-foreground">
{"schema scans, keys, imported metadata"}
</p>
</div>
<div className="border-l-2 border-slate-500 bg-fd-card px-3 py-2 dark:border-cyan-200">
<p className="font-mono text-xs text-fd-foreground">provenance</p>
<p className="mt-1 text-xs leading-5 text-fd-muted-foreground">
{"ingest decisions and review history"}
</p>
</div>
</div>
</section>
<div className="hidden items-center justify-center bg-fd-background lg:flex" aria-hidden="true">
<span className="h-px w-full bg-fd-border" />
</div>
<section className="relative bg-[#102226] p-5 text-white dark:bg-[#0b181b]">
<div className="absolute inset-y-0 left-0 w-1 bg-fd-primary" />
<p className="mb-3 text-[11px] font-semibold uppercase tracking-wide text-cyan-200">
{"Semantic layer engine"}
</p>
<div className="grid gap-2 sm:grid-cols-2">
<div className="rounded-md border border-cyan-100/20 bg-white/8 px-3 py-2">
<p className="text-sm font-semibold">Join graph</p>
<p className="mt-1 text-xs leading-5 text-cyan-50/75">
{"sources as nodes, joins as typed edges"}
</p>
</div>
<div className="rounded-md border border-cyan-100/20 bg-white/8 px-3 py-2">
<p className="text-sm font-semibold">Grain</p>
<p className="mt-1 text-xs leading-5 text-cyan-50/75">
{"row identity before aggregation"}
</p>
</div>
<div className="rounded-md border border-cyan-100/20 bg-white/8 px-3 py-2">
<p className="text-sm font-semibold">Measures</p>
<p className="mt-1 text-xs leading-5 text-cyan-50/75">
{"verified formulas and filters"}
</p>
</div>
<div className="rounded-md border border-cyan-100/20 bg-white/8 px-3 py-2">
<p className="whitespace-nowrap break-normal text-sm font-semibold">Relationships</p>
<p className="mt-1 text-xs leading-5 text-cyan-50/75">
{"many_to_one, one_to_many, one_to_one"}
</p>
</div>
</div>
<div className="mt-3 rounded-md border border-cyan-100/20 bg-cyan-50/10 px-3 py-2 text-sm">
{"Safe query planning before SQL is generated."}
</div>
</section>
<div className="hidden items-center justify-center bg-fd-background lg:flex" aria-hidden="true">
<span className="h-px w-full bg-fd-border" />
</div>
<section className="bg-fd-muted/35 p-4">
<p className="mb-3 text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"Agent workflows"}
</p>
<div className="space-y-2 text-sm">
<div className="rounded-md border border-fd-border bg-fd-card px-3 py-2">
{"Search sources and wiki pages"}
</div>
<div className="rounded-md border border-fd-border bg-fd-card px-3 py-2">
{"Compile trusted SQL"}
</div>
<div className="rounded-md border border-fd-border bg-fd-card px-3 py-2">
{"Explain metrics and provenance"}
</div>
<div className="rounded-md border border-fd-border bg-fd-card px-3 py-2">
{"Patch files and validate review"}
</div>
</div>
</section>
</div>
</div>
## The join graph KTX builds
A semantic source is a node. A join is an edge with a join condition and a
relationship type. The graph lets KTX choose valid paths, reject unsafe paths,
and reason about whether a join preserves or multiplies rows before SQL is
generated.
- `many_to_one` paths are usually safe for adding dimensions.
- `one_to_many` paths can multiply fact rows and trigger fan-out handling.
- Equal-cost paths can be ambiguous, so aliases and explicit joins matter.
<figure
className="not-prose my-8 overflow-hidden rounded-lg border border-fd-border bg-fd-card p-4 shadow-sm"
aria-label="Example semantic join graph"
>
<div className="grid gap-3 md:grid-cols-[1fr_1fr_1fr]">
<div className="rounded-md border border-fd-border bg-fd-background px-4 py-3">
<p className="text-sm font-semibold text-fd-foreground">customers</p>
<p className="mt-1 text-xs text-fd-muted-foreground">grain: customer_id</p>
</div>
<div className="rounded-md border-2 border-fd-primary bg-fd-background px-4 py-3">
<p className="text-sm font-semibold text-fd-foreground">orders</p>
<p className="mt-1 text-xs text-fd-muted-foreground">grain: order_id</p>
</div>
<div className="rounded-md border border-fd-border bg-fd-background px-4 py-3">
<p className="text-sm font-semibold text-fd-foreground">order_items</p>
<p className="mt-1 text-xs text-fd-muted-foreground">grain: order_id, line_id</p>
</div>
</div>
<div className="my-3 grid gap-2 text-center text-xs font-medium text-fd-muted-foreground md:grid-cols-[1fr_1fr]">
<div>orders -> customers: many_to_one</div>
<div>orders -> order_items: one_to_many</div>
</div>
<figcaption className="mt-4 border-t border-fd-border pt-3 text-left text-xs leading-5 text-fd-muted-foreground">
<span className="font-medium text-fd-foreground">{"Example: "}</span>
{"A refunds source that joins to orders can explain net revenue when used carefully. Joined naively, it can duplicate order-level measures."}
</figcaption>
</figure>
The graph is bidirectional for planning. If `orders -> customers` is
`many_to_one`, the reverse path is `one_to_many`; KTX keeps that distinction
instead of treating every join as a neutral edge.
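As a rough illustration of that bidirectional bookkeeping, here is a minimal Python sketch. It assumes each declared join is a `(left, right, relationship)` tuple; `build_graph` and `INVERSE` are hypothetical names for this example, not KTX APIs.

```python
from collections import defaultdict

# Reversing a join swaps which side contributes many rows.
INVERSE = {
    "many_to_one": "one_to_many",
    "one_to_many": "many_to_one",
    "one_to_one": "one_to_one",
}


def build_graph(joins):
    """Map each source to {neighbor: relationship from source to neighbor}."""
    graph = defaultdict(dict)
    for left, right, relationship in joins:
        graph[left][right] = relationship
        graph[right][left] = INVERSE[relationship]  # keep the reverse edge typed
    return dict(graph)


graph = build_graph([
    ("orders", "customers", "many_to_one"),
    ("orders", "order_items", "one_to_many"),
])
print(graph["customers"]["orders"])  # one_to_many
```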
## How KTX builds the graph
KTX starts from evidence, not a blank modeling canvas. Database scans and
analytics-tool imports create source definitions that an analyst can review.
| Evidence | What it contributes |
|---|---|
| Declared primary keys | Initial row grain for each source |
| Declared foreign keys | Formal join candidates and relationship direction |
| Inferred relationships | Useful edges when warehouses lack constraints |
| dbt, MetricFlow, and LookML imports | Existing metrics, dimensions, entities, explores, and joins |
| Query history | Real join and filter patterns agents should respect |
| Analyst review | The final authority before context is merged |
Generated YAML is intentionally reviewable. KTX can draft joins and measures,
but the accepted semantic layer is still the plain-file diff your team approves.
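One way key evidence can suggest a relationship type is to check which side of a join is unique on its join keys. The sketch below is a hypothetical heuristic, not KTX's inference logic. It assumes the declared grain really is unique, and it returns `None` when neither side is, leaving that case to analyst review.

```python
def classify_join(left_grain, left_keys, right_grain, right_keys):
    """Relationship from the left source to the right source, or None."""
    # A side is unique on the join when its join keys cover its whole grain.
    left_unique = set(left_grain) <= set(left_keys)
    right_unique = set(right_grain) <= set(right_keys)
    if left_unique and right_unique:
        return "one_to_one"
    if right_unique:
        return "many_to_one"
    if left_unique:
        return "one_to_many"
    # Neither side is unique: not a single safe edge, so leave it for review.
    return None


# orders.customer_id -> customers.customer_id
print(classify_join(["order_id"], ["customer_id"], ["customer_id"], ["customer_id"]))
# orders.order_id -> order_items.order_id
print(classify_join(["order_id"], ["order_id"], ["order_id", "line_id"], ["order_id"]))
```

The first call returns `many_to_one` and the second `one_to_many`, matching the example graph above.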
## How KTX keeps the graph current
The semantic layer changes as schemas, metrics, and business rules change. KTX
keeps that loop explicit instead of hiding it behind a remote runtime.
<div
className="not-prose my-8 overflow-hidden rounded-lg border border-fd-border bg-fd-card shadow-sm"
aria-label="Semantic layer maintenance loop"
>
<div className="border-b border-fd-border bg-fd-muted/35 px-4 py-3">
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"Semantic maintenance loop"}
</p>
<p className="mt-1 text-sm leading-6 text-fd-muted-foreground">
{"Every accepted correction becomes input to the next graph build."}
</p>
</div>
<div className="p-4">
<div className="-mx-4 overflow-x-auto px-4">
<div className="relative mx-auto h-[460px] w-[720px] max-w-none md:w-full md:max-w-[760px]">
<svg
aria-hidden="true"
className="absolute inset-0 h-full w-full text-fd-primary"
fill="none"
viewBox="0 0 760 460"
>
<g
stroke="currentColor"
strokeLinecap="round"
strokeLinejoin="round"
strokeOpacity="0.68"
strokeWidth="2.5"
>
<path d="M 352 80 H 384" />
<path d="M 600 80 H 668 V 150" />
<path d="M 632 284 V 378 H 626" />
<path d="M 408 378 H 376" />
<path d="M 160 378 H 96 V 308" />
<path d="M 128 172 V 80 H 140" />
</g>
<g fill="currentColor" fillOpacity="0.96" stroke="none">
<polygon points="0,0 -14,-7 -14,7" transform="translate(398 80)" />
<polygon points="0,0 -14,-7 -14,7" transform="translate(668 164) rotate(90)" />
<polygon points="0,0 -14,-7 -14,7" transform="translate(612 378) rotate(180)" />
<polygon points="0,0 -14,-7 -14,7" transform="translate(362 378) rotate(180)" />
<polygon points="0,0 -14,-7 -14,7" transform="translate(96 294) rotate(270)" />
<polygon points="0,0 -14,-7 -14,7" transform="translate(154 80)" />
</g>
</svg>
<div className="absolute left-1/2 top-1/2 flex h-32 w-56 -translate-x-1/2 -translate-y-1/2 flex-col items-center justify-center rounded-md border border-fd-primary/50 bg-fd-background px-4 py-4 text-center shadow-sm">
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-primary">
{"reviewed context"}
</p>
<p className="mt-2 text-sm font-semibold leading-6 text-fd-foreground">
{"The accepted graph becomes the starting point for the next build."}
</p>
</div>
<div className="absolute left-[160px] top-6 h-28 w-48 rounded-md border-2 border-fd-primary bg-fd-background px-4 py-3 text-sm shadow-sm">
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"Step 1"}
</p>
<p className="mt-1 font-semibold text-fd-foreground">{"ingest evidence"}</p>
<p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
{"scan schemas, imports, and accepted files"}
</p>
</div>
<div className="absolute left-[408px] top-6 h-28 w-48 rounded-md border border-fd-border bg-fd-background px-4 py-3 text-sm shadow-sm">
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"Step 2"}
</p>
<p className="mt-1 font-semibold text-fd-foreground">{"YAML diff"}</p>
<p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
{"draft source, join, grain, and measure changes"}
</p>
</div>
<div className="absolute left-[536px] top-[172px] h-28 w-48 rounded-md border border-fd-border bg-fd-background px-4 py-3 text-sm shadow-sm">
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"Step 3"}
</p>
<p className="mt-1 font-semibold text-fd-foreground">{"validation"}</p>
<p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
{"check relationships, syntax, and unsafe query shapes"}
</p>
</div>
<div className="absolute left-[408px] top-[322px] h-28 w-48 rounded-md border border-fd-border bg-fd-background px-4 py-3 text-sm shadow-sm">
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"Step 4"}
</p>
<p className="mt-1 font-semibold text-fd-foreground">{"analyst review"}</p>
<p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
{"accept, edit, or reject generated context"}
</p>
</div>
<div className="absolute left-[160px] top-[322px] h-28 w-48 rounded-md border border-fd-border bg-fd-background px-4 py-3 text-sm shadow-sm">
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"Step 5"}
</p>
<p className="mt-1 font-semibold text-fd-foreground">{"agent use"}</p>
<p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
{"serve context to search, explain, and query"}
</p>
</div>
<div className="absolute left-8 top-[172px] h-28 w-48 rounded-md border border-fd-primary/70 bg-fd-background px-4 py-3 text-sm shadow-sm">
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"Step 6"}
</p>
<p className="mt-1 font-semibold text-fd-foreground">{"corrections"}</p>
<p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
{"agent and analyst fixes become new evidence"}
</p>
</div>
</div>
</div>
</div>
</div>
This matters because semantic correctness is not static. If a source gains a
new key, a metric changes definition, or an analyst corrects a relationship,
the next agent gets that reviewed context.
## The modeling problem the graph solves
Fan-out is the classic failure mode. If an order-level measure is joined to
line-item rows before aggregation, one order can become many rows and revenue
can be counted more than once.
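To make the double counting concrete, this sketch uses Python's built-in `sqlite3` module with toy `customers`, `orders`, and `order_items` tables. The data is hypothetical and the SQL is not KTX output; it only compares a naive join with a query that aggregates orders at their own grain first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, segment TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
CREATE TABLE order_items (order_id INTEGER, line_id INTEGER);
INSERT INTO customers VALUES (1, 'smb'), (2, 'enterprise');
INSERT INTO orders VALUES (1, 1, 100.0), (2, 2, 50.0);
-- Order 1 has two line items, so a naive join repeats its amount twice.
INSERT INTO order_items VALUES (1, 1), (1, 2), (2, 1);
""")

# Naive shape: join line items before aggregating the order-level amount.
naive = dict(conn.execute("""
    SELECT c.segment, SUM(o.amount)
    FROM orders o
    JOIN order_items i ON i.order_id = o.order_id
    JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.segment
""").fetchall())

# Safer shape: aggregate orders at their own grain, then join the dimension.
pre_aggregated = dict(conn.execute("""
    WITH orders_agg AS (
        SELECT customer_id, SUM(amount) AS revenue
        FROM orders
        GROUP BY customer_id
    )
    SELECT c.segment, SUM(a.revenue)
    FROM orders_agg a
    JOIN customers c ON c.customer_id = a.customer_id
    GROUP BY c.segment
""").fetchall())

print(naive["smb"], pre_aggregated["smb"])  # 200.0 100.0
```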
| Problem | What happens | How KTX avoids it |
|---|---|---|
| Order measure joins to `order_items` | `orders.revenue` repeats once per item | Detect the `one_to_many` path and pre-aggregate the order measure |
| Two independent fact sources share `customers` | Measures from each fact table multiply across the shared dimension | Treat it as a chasm trap and use aggregate-locality planning |
| Filter lives only across a `one_to_many` path | Filtering after the join changes the measure grain | Reject or localize the filter instead of silently producing unsafe SQL |
| Multiple equal-cost paths connect the same sources | The join path is ambiguous | Prefer safer paths and use aliases to disambiguate repeated joins |
Many-to-many questions usually show up as multiple one-to-many paths or
independent fact sources. KTX treats those shapes as fan-out or chasm risks
unless the query can be planned at a safe grain.
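A chasm trap can be reproduced with a small in-memory database. This sketch assumes a hypothetical `tickets` fact table that shares `customers` with `orders`; it illustrates the failure and one aggregate-locality shape using Python's `sqlite3` module, and is not SQL generated by KTX.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
CREATE TABLE tickets (ticket_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1);
INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 20.0);
INSERT INTO tickets VALUES (1, 1), (2, 1), (3, 1);
""")

# Both facts joined through the shared dimension: 2 orders x 3 tickets = 6 rows.
naive = conn.execute("""
    SELECT SUM(o.amount), COUNT(t.ticket_id)
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    JOIN tickets t ON t.customer_id = c.customer_id
""").fetchone()

# Aggregate each fact at customer grain first, then join the results.
local = conn.execute("""
    WITH o AS (
        SELECT customer_id, SUM(amount) AS revenue FROM orders GROUP BY customer_id
    ),
    t AS (
        SELECT customer_id, COUNT(*) AS tickets FROM tickets GROUP BY customer_id
    )
    SELECT SUM(o.revenue), SUM(t.tickets)
    FROM customers c
    LEFT JOIN o ON o.customer_id = c.customer_id
    LEFT JOIN t ON t.customer_id = c.customer_id
""").fetchone()

print(naive)  # (90.0, 6)
print(local)  # (30.0, 3)
```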
## How the execution engine uses the graph
The planner resolves the sources in a semantic query, chooses a join tree, and
checks whether any requested dimension or filter crosses a row-multiplying
edge. The SQL generator then chooses the simple path or the aggregate-locality
path.
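A simplified version of that check might look like the following sketch. It assumes a hand-written adjacency map with relationships stored in both directions, and a hypothetical `plan` function that returns a strategy name. The planner described here also weighs filters, aliases, and equal-cost paths, which this sketch ignores.

```python
from collections import deque

# Relationship from the outer source to each neighbor, stored in both directions.
GRAPH = {
    "orders": {"customers": "many_to_one", "order_items": "one_to_many"},
    "customers": {"orders": "one_to_many"},
    "order_items": {"orders": "many_to_one"},
    "web_events": {},  # hypothetical source with no joins
}


def find_path(graph, start, goal):
    """Breadth-first search for a shortest join path, or None if disconnected."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], {}):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


def plan(graph, measure_source, dimension_source):
    """Pick a SQL shape for a measure grouped by a dimension from another source."""
    path = find_path(graph, measure_source, dimension_source)
    if path is None:
        raise ValueError(f"{measure_source} and {dimension_source} are disconnected")
    # Any one_to_many step away from the measure source multiplies its rows.
    if any(graph[a][b] == "one_to_many" for a, b in zip(path, path[1:])):
        return "aggregate_locality"
    return "simple"


print(plan(GRAPH, "orders", "customers"))    # simple
print(plan(GRAPH, "orders", "order_items"))  # aggregate_locality
```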
| Naive SQL shape | Semantic-layer SQL shape |
|---|---|
| Join facts and dimensions first, then aggregate | Aggregate each fact source at its own grain, then join the results |
| Put every filter in one outer `WHERE` clause | Keep measure filters with the measure source when locality is needed |
| Trust the shortest textual join path | Prefer safe relationship paths and reject disconnected sources |
| Let dimension grain differ across facts | Raise an error when an asymmetric dimension would fan out another measure |
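The measure-filter row in this table can be seen with a toy table in Python's `sqlite3` module. The `gross` and `net` measures and the `status` column are hypothetical. The point is that an outer `WHERE` clause changes every measure, while a filter kept inside one aggregate changes only that measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT, amount REAL);
INSERT INTO orders VALUES
    (1, 'complete', 100.0),
    (2, 'refunded', 40.0),
    (3, 'complete', 60.0);
""")

# Filter in the outer WHERE clause: it silently applies to both measures.
outer = conn.execute("""
    SELECT SUM(amount) AS gross, SUM(amount) AS net
    FROM orders
    WHERE status <> 'refunded'
""").fetchone()

# Filter kept with its measure: only `net` excludes refunded orders.
local = conn.execute("""
    SELECT SUM(amount) AS gross,
           SUM(CASE WHEN status <> 'refunded' THEN amount END) AS net
    FROM orders
""").fetchone()

print(outer)  # (160.0, 160.0)
print(local)  # (200.0, 160.0)
```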
<div
className="not-prose my-8 overflow-hidden rounded-lg border border-fd-border bg-fd-card shadow-sm"
aria-label="Fan-out safe execution shape"
>
<div className="grid gap-0 md:grid-cols-2">
<section className="border-b border-fd-border bg-fd-background p-4 md:border-b-0 md:border-r">
<p className="mb-3 text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"Unsafe shape"}
</p>
<pre className="overflow-x-auto rounded-md bg-fd-muted p-3 text-xs leading-5 text-fd-foreground">
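{`select customers.segment, sum(orders.amount) as revenue
from orders
join order_items on order_items.order_id = orders.order_id
join customers on customers.customer_id = orders.customer_id
group by customers.segment`}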
</pre>
<p className="mt-3 text-sm text-fd-muted-foreground">
{"The order measure is exposed to line-item fan-out before aggregation."}
</p>
</section>
<section className="bg-fd-background p-4">
<p className="mb-3 text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
{"KTX shape"}
</p>
<pre className="overflow-x-auto rounded-md border border-fd-border bg-fd-muted p-3 text-xs leading-5 text-fd-foreground">
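{`with orders_agg as (
  select customer_id, sum(amount) as revenue
  from orders
  group by customer_id
)
select customers.segment, sum(orders_agg.revenue) as revenue
from orders_agg
join customers on customers.customer_id = orders_agg.customer_id
group by customers.segment`}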
</pre>
<p className="mt-3 text-sm text-fd-muted-foreground">
{"KTX pre-aggregates fact measures at their own grain before joining dimensions."}
</p>
</section>
</div>
</div>
The result is not magic. It is structured planning: validated sources, typed
relationships, graph search, fan-out detection, aggregate locality, and final
dialect transpilation.
## What this means for agents
KTX gives agents a semantic surface they can inspect and improve, not just a
folder of notes.
- Search semantic sources and related wiki pages before writing SQL.
- Compile SQL through `ktx sl query` instead of guessing joins.
- Validate semantic-layer changes before review.
- Patch YAML and Markdown files in git.
- Explain metric meaning and provenance from the same accepted context.
Next, read [Writing Context](/docs/guides/writing-context) for the YAML editing
workflow or [ktx sl](/docs/cli-reference/ktx-sl) for the command reference.

@ -191,7 +191,18 @@ KTX organizes context into four pillars:
Each pillar covers a different kind of context agents need before they can safely write SQL, update semantic definitions, or explain an analytics result.
**Semantic sources** are YAML definitions that describe your data in terms agents can reason about. Each source maps to a table or SQL query, declares its grain, defines typed columns, specifies valid joins, and exposes named measures with optional filters. This is where "revenue means `sum(amount)` excluding refunds" lives.
**Semantic sources** are YAML definitions that describe your data in terms
agents can reason about:
- source tables or SQL queries;
- row grain;
- typed columns;
- valid joins;
- named measures, filters, and segments.
This is where "revenue means `sum(amount)` excluding refunds" lives. For the
join graph, fan-out protections, and execution mechanics, read
[Semantic Layer Internals](/docs/concepts/semantic-layer-internals).
```yaml
name: orders
@ -289,7 +300,7 @@ my-project/
│ └── data-quality-notes.md
├── raw-sources/
│ └── warehouse/
│ └── database-ingest/ # Schema ingest artifacts and reports
│ └── live-database/ # Schema ingest artifacts and reports
└── .ktx/
├── db.sqlite # Local state (git-ignored)
└── cache/ # Runtime cache (git-ignored)

@ -3,10 +3,12 @@ title: Introduction
description: How KTX gives analytics agents trusted context for warehouse work.
---
<div className="not-prose mb-14">
<div className="mb-8">
import { ProductMechanics } from "@/components/product-mechanics";
<div className="not-prose mb-10">
<div>
<h1
className="text-4xl font-extrabold tracking-tight lg:text-5xl"
className="max-w-full text-3xl font-extrabold tracking-tight break-words sm:text-4xl lg:text-5xl"
style={{
fontFamily: 'var(--font-display)',
background: 'linear-gradient(180deg, var(--color-fd-foreground) 0%, color-mix(in oklch, var(--color-fd-foreground) 75%, var(--color-fd-primary)) 100%)',
@ -18,62 +20,43 @@ description: How KTX gives analytics agents trusted context for warehouse work.
letterSpacing: '0',
}}
>
Make analytics context{'\n'}usable by agents
Make analytics context usable by agents
</h1>
<p className="mt-4 text-lg text-fd-muted-foreground max-w-2xl" style={{ lineHeight: '1.7' }}>
KTX turns warehouse metadata, semantic definitions, and business knowledge
into reviewable project files that agents can use while planning, querying,
and updating analytics work.
<p className="mt-4 max-w-2xl text-lg text-fd-muted-foreground" style={{ lineHeight: '1.7' }}>
{'KTX turns warehouse metadata, semantic definitions, and business knowledge into reviewable project files that agents can use while planning, querying, and updating analytics work.'}
</p>
</div>
<div className="flex flex-wrap gap-3">
<a
href="/docs/getting-started/quickstart"
className="inline-flex h-10 items-center rounded-lg bg-fd-primary px-5 text-sm font-medium text-fd-primary-foreground transition-colors hover:opacity-90"
>
Get Started
</a>
<a
href="/docs/concepts/the-context-layer"
className="inline-flex h-10 items-center rounded-lg border border-fd-border bg-fd-background px-5 text-sm font-medium text-fd-foreground transition-colors hover:bg-fd-muted"
>
The Context Layer
</a>
<a
href="/docs/guides/building-context"
className="inline-flex h-10 items-center rounded-lg border border-fd-border bg-fd-background px-5 text-sm font-medium text-fd-foreground transition-colors hover:bg-fd-muted"
>
Building Context
</a>
</div>
</div>
## Who KTX is for
<ProductMechanics />
## What agents can do with KTX
KTX is built for analytics engineers and data teams who want data agents to
work on real analytics systems - not just generate one-off SQL.
work on real analytics systems, not just generate one-off SQL.
Use KTX when you want agents to:
Use it when agents need to:
- **Generate SQL** from approved measures and joins
- **Repair semantic definitions** through reviewable diffs
- **Explain metric provenance** with warehouse evidence
- **Work alongside** dbt, LookML, MetricFlow, Looker, Metabase, and modern BI platforms
- **Generate SQL** from approved measures, dimensions, joins, and filters
- **Explain provenance** with wiki context and warehouse evidence
- **Repair context** through reviewable YAML and Markdown diffs
- **Work alongside** dbt, LookML, MetricFlow, Looker, Metabase, and warehouses
Works with PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, and SQL Server.
KTX works with SQLite, PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, and
SQL Server.
## Explore the docs
## Read next
<Cards>
<Card title="Quickstart" href="/docs/getting-started/quickstart">
Set up KTX and build your first context in under 10 minutes.
</Card>
<Card title="Concepts" href="/docs/concepts/the-context-layer">
Understand what a context layer is and why agents need one.
</Card>
<Card title="Guides" href="/docs/guides/building-context">
Hands-on workflows for scanning, ingesting, writing, and serving.
</Card>
<Card title="Writing Context" href="/docs/guides/writing-context">
Edit semantic-layer YAML and wiki Markdown safely.
</Card>
<Card title="CLI Reference" href="/docs/cli-reference/ktx-setup">
Complete flag and subcommand reference for every KTX command.
</Card>

@ -51,8 +51,8 @@ For scripted setup, pass the project directory explicitly:
ktx setup --project-dir ./analytics
```
If setup exits early, rerun `ktx setup` in the same directory. KTX tracks
completed setup steps and resumes from the remaining work.
If setup exits early, rerun `ktx setup` in the same directory. KTX keeps local
setup progress under `.ktx/setup/` and resumes from the remaining work.
## Step 2: Configure the LLM
@ -122,7 +122,8 @@ Database ready
PostgreSQL, BigQuery, and Snowflake can also enable query-history ingest. Query
history helps KTX learn common query patterns, joins, service-account filters,
and warehouse-specific usage.
and warehouse-specific usage. BigQuery and Snowflake support a lookback window;
Postgres reads the current `pg_stat_statements` aggregate data instead.
## Step 5: Add context sources
@ -200,7 +201,7 @@ KTX writes plain files so people and agents can inspect changes in git.
| Path | Purpose |
|------|---------|
| `ktx.yaml` | Project configuration for LLMs, embeddings, connections, context sources, and setup state |
| `ktx.yaml` | Project configuration for LLMs, embeddings, connections, context sources, and query-history settings |
| `.ktx/secrets/*` | Local secret files referenced from `ktx.yaml`; do not commit these |
| `.ktx/setup/*` | Local setup and context-build state |
| `.ktx/agents/install-manifest.json` | Manifest used to manage installed agent files |

@ -62,13 +62,15 @@ configured, run `ktx setup` or use `--fast`.
PostgreSQL, BigQuery, and Snowflake can add query-history context. This helps
KTX learn common joins, filters, service-account patterns, redaction rules, and
usage-heavy query templates.
usage-heavy query templates. BigQuery and Snowflake support a lookback window;
Postgres reads the current `pg_stat_statements` aggregate data instead.
Enable it during setup, store it under `connections.<id>.context.queryHistory`,
or request it for one run:
```bash
ktx ingest warehouse --deep --query-history
# Set the lookback window for BigQuery or Snowflake query history
ktx ingest warehouse --query-history-window-days 30
```

@ -60,21 +60,25 @@ semantic-layer/<connection-id>/<source-name>.yaml
```yaml
name: orders
description: Customer orders with booked revenue.
descriptions:
  user: Customer orders with booked revenue.
table: public.orders
grain:
  - order_id
columns:
  - name: order_id
    type: string
    description: Unique order identifier.
    descriptions:
      user: Unique order identifier.
  - name: order_date
    type: time
    role: time
    description: Date the order was placed.
    descriptions:
      user: Date the order was placed.
  - name: total_amount
    type: number
    description: Booked order value in USD.
    descriptions:
      user: Booked order value in USD.
measures:
  - name: total_revenue
    expr: SUM(total_amount)
@ -85,7 +89,8 @@ measures:
```yaml
name: orders
description: Customer orders with line-item totals.
descriptions:
  user: Customer orders with line-item totals.
table: public.orders
grain:
  - order_id
@ -93,26 +98,31 @@ grain:
columns:
  - name: order_id
    type: string
    description: Unique order identifier.
    descriptions:
      user: Unique order identifier.
  - name: order_date
    type: time
    role: time
    description: Date the order was placed.
    descriptions:
      user: Date the order was placed.
  - name: status
    type: string
    visibility: public
    description: Current order status.
    descriptions:
      user: Current order status.
  - name: _etl_loaded_at
    type: time
    visibility: hidden
    description: Internal load timestamp.
    descriptions:
      user: Internal load timestamp.
  - name: total_amount
    type: number
    description: Order total in USD.
    descriptions:
      user: Order total in USD.
measures:
  - name: total_revenue
@ -149,9 +159,10 @@ joins:
| Field | Required | Description |
|-------|----------|-------------|
| `name` | Yes | Source identifier. Use lowercase words and underscores. |
| `descriptions` | No | Description map keyed by source, such as `user`, `dbt`, or `ai`. |
| `table` or `sql` | Yes | Database table or custom SQL expression. Use exactly one. |
| `grain` | Yes | Columns that uniquely identify a row at the source grain. |
| `columns` | No | Column definitions with type, role, visibility, and descriptions. |
| `columns` | Yes | Non-empty column definitions with type, role, visibility, and descriptions. |
| `measures` | No | Aggregation expressions such as `SUM`, `COUNT`, and `AVG`. |
| `segments` | No | Named predicates agents can reuse. |
| `joins` | No | Relationships to other semantic sources. |
@ -165,7 +176,7 @@ joins:
| Column | `type` | Yes | Agent-facing type: `string`, `number`, `time`, or `boolean`. |
| Column | `role` | No | Special role such as `time` for default time dimensions. |
| Column | `visibility` | No | `public`, `internal`, or `hidden`. |
| Column | `description` | Strongly recommended | Business meaning and usage notes. |
| Column | `descriptions` | Strongly recommended | Description map keyed by source, such as `user`, `dbt`, or `ai`. |
| Measure | `name` | Yes | Queryable metric name. |
| Measure | `expr` | Yes | SQL aggregation expression at the source grain. |
| Measure | `filter` | No | SQL predicate applied only to this measure. |

@ -122,7 +122,7 @@ Available commands:
- `ktx status --json --project-dir /path/to/project`
- `ktx sl list --json --project-dir /path/to/project`
- `ktx sl search '<text>' --json --project-dir /path/to/project --connection-id '<id>'`
- `ktx sl query --json --project-dir /path/to/project --connection-id '<id>'`
- `ktx sl query --project-dir /path/to/project --connection-id '<id>' --query-file '<path>' --format json --execute --max-rows 100`
- `ktx wiki search '<query>' --json --project-dir /path/to/project --limit 10`
```
@ -266,7 +266,7 @@ Admin CLI skills call the same KTX CLI commands:
| `ktx sl list --json` | List semantic-layer sources |
| `ktx sl search <query> --json` | Search semantic-layer sources |
| `ktx sl validate <source> --connection-id <id>` | Validate semantic source definitions |
| `ktx sl query --json` | Execute a semantic-layer query when semantic compute is configured |
| `ktx sl query --format json` | Execute a semantic-layer query when semantic compute is configured |
### Security constraints

@ -34,8 +34,9 @@ automation flags documented in [`ktx setup`](/docs/cli-reference/ktx-setup).
| Path | Purpose |
|------|---------|
| `ktx.yaml` | Main project configuration for providers, embeddings, connections, source mappings, query history, and setup state |
| `ktx.yaml` | Main project configuration for providers, embeddings, connections, source mappings, and query history |
| `.ktx/secrets/*` | Local file-backed secrets when you choose file references during setup |
| `.ktx/setup/*` | Local setup progress and context-build state |
| `semantic-layer/<connection-id>/` | YAML semantic sources generated by database and source ingestion |
| `wiki/` | Markdown business context, definitions, and ingested knowledge |
| `.ktx/agents/install-manifest.json` | Manifest of agent integration files installed by `ktx setup --agents` |

@ -228,7 +228,7 @@ mapping metadata. The BigQuery connector still authenticates with the
| Feature | Supported | Notes |
|---------|-----------|-------|
| Tables & views | Yes | Including materialized views and external tables |
| Primary keys | No | - |
| Primary keys | Yes | Via `INFORMATION_SCHEMA` table constraints when declared |
| Foreign keys | No | Not available in BigQuery |
| Row count estimates | Yes | From table metadata |
| Column statistics | No | - |
@ -500,7 +500,7 @@ No authentication required - SQLite is file-based. The file must be readable by
- Uses `LIMIT X OFFSET Y` for pagination
- SQLite type affinity system: `TEXT`, `NUMERIC`, `INTEGER`, `REAL`, `BLOB`
- Foreign key enforcement requires explicit `PRAGMA foreign_keys = ON`
- In-memory databases supported with `path: ":memory:"` (for testing)
- Database file must exist before `ktx connection test` or ingest runs
## Common errors