mirror of
https://github.com/Kaelio/ktx.git
synced 2026-06-13 08:15:14 +02:00
Merge origin/main into add-ktx-mcp-claude-desktop
commit
af4f2c29df
25 changed files with 1657 additions and 234 deletions
|
|
@ -4,8 +4,8 @@ description: "Command map and shared options for the KTX CLI."
|
|||
---
|
||||
|
||||
The `ktx` CLI sets up local projects, builds agent-ready context, checks
|
||||
connections, queries semantic-layer sources, searches wiki pages, and manages
|
||||
the bundled Python runtime.
|
||||
connections, queries semantic-layer sources, searches wiki pages, runs the MCP
|
||||
server, and manages the bundled Python runtime.
|
||||
|
||||
## Command Map
|
||||
|
||||
|
|
@ -26,6 +26,11 @@ ktx
|
|||
validate <sourceName>
|
||||
query
|
||||
status
|
||||
mcp
|
||||
start
|
||||
stop
|
||||
status
|
||||
logs
|
||||
dev
|
||||
init [directory]
|
||||
schema
|
||||
|
|
@ -73,4 +78,7 @@ ktx ingest --all
|
|||
# Search semantic-layer sources and wiki pages
|
||||
ktx sl search "revenue"
|
||||
ktx wiki search "revenue recognition"
|
||||
|
||||
# Start the local MCP server for agent clients
|
||||
ktx mcp start
|
||||
```
|
||||
|
|
|
|||
|
|
@ -29,14 +29,16 @@ connections when you use `--all`.
|
|||
| `--deep` | Use AI-enriched database ingest | Stored connection default, or `fast` |
|
||||
| `--query-history` | Include database query-history usage patterns | Stored connection default |
|
||||
| `--no-query-history` | Skip database query-history usage patterns for this run | Stored connection default |
|
||||
| `--query-history-window-days <days>` | Query-history lookback window for this run | Stored connection default |
|
||||
| `--query-history-window-days <days>` | BigQuery/Snowflake query-history lookback window for this run | Stored connection default |
|
||||
| `--plain` | Print plain text output | `true` |
|
||||
| `--json` | Print JSON output | `false` |
|
||||
| `--no-input` | Disable interactive terminal input | — |
|
||||
|
||||
`--fast` and `--deep` are mutually exclusive. Depth flags apply only to
|
||||
database connections. Query-history flags apply only to database connections
|
||||
that support query history. Query-history ingest runs after schema ingest and
|
||||
that support query history. The window flag applies to BigQuery and Snowflake;
|
||||
Postgres reads the current `pg_stat_statements` aggregate data instead of a
|
||||
time-windowed history table. Query-history ingest runs after schema ingest and
|
||||
requires deep ingest readiness.
|
||||
|
||||
When `--all` selects both databases and context sources, database ingest runs
|
||||
|
|
@ -70,6 +72,7 @@ ktx ingest warehouse --deep
|
|||
|
||||
# Include query-history usage patterns
|
||||
ktx ingest warehouse --deep --query-history
|
||||
# Set the lookback window for BigQuery or Snowflake query history
|
||||
ktx ingest warehouse --query-history-window-days 30
|
||||
|
||||
# Build a source connection
|
||||
|
|
|
|||
70
docs-site/content/docs/cli-reference/ktx-mcp.mdx
Normal file
|
|
@ -0,0 +1,70 @@
|
|||
---
|
||||
title: "ktx mcp"
|
||||
description: "Run the KTX MCP HTTP server for agent clients."
|
||||
---
|
||||
|
||||
`ktx mcp` starts, stops, inspects, and tails the local KTX MCP server for a KTX
|
||||
project. Use it when an agent client connects through MCP instead of generated
|
||||
CLI instructions.
|
||||
|
||||
## Command signature
|
||||
|
||||
```bash
ktx mcp <subcommand> [options]
```
|
||||
|
||||
## Subcommands
|
||||
|
||||
| Subcommand | Description |
|-----------|-------------|
| `start` | Start the KTX MCP HTTP server |
| `stop` | Stop the KTX MCP daemon |
| `status` | Show daemon status, URL, PID, token mode, and project path |
| `logs` | Print the daemon log |
|
||||
|
||||
## `mcp start` Options
|
||||
|
||||
| Flag | Description | Default |
|------|-------------|---------|
| `--host <host>` | Host to bind | `127.0.0.1` |
| `--port <n>` | Port to bind | `7878` |
| `--token <token>` | Bearer token for non-loopback binding | `KTX_MCP_TOKEN` |
| `--foreground` | Run the server in the foreground | `false` |
| `--allowed-host <host>` | Additional allowed Host header; repeatable | - |
| `--allowed-origin <origin>` | Allowed browser Origin header; repeatable | - |
|
||||
|
||||
## `mcp logs` Options
|
||||
|
||||
| Flag | Description | Default |
|------|-------------|---------|
| `--follow` | Follow log output | `false` |
|
||||
|
||||
## Examples
|
||||
|
||||
```bash
# Start the daemon on localhost
ktx mcp start

# Check status
ktx mcp status

# Tail logs
ktx mcp logs --follow

# Run in the foreground on a custom port
ktx mcp start --port 8787 --foreground
```
|
||||
|
||||
## Security notes
|
||||
|
||||
The default host is loopback-only. If you bind to a non-loopback host, configure
|
||||
a bearer token with `--token <token>` or `KTX_MCP_TOKEN` and restrict allowed
|
||||
hosts and origins for browser clients.
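
The loopback rule above can be illustrated with a minimal Python sketch. It is not the KTX implementation: `is_loopback_host` and `token_required` are hypothetical helper names, and treating unresolved hostnames as non-loopback is an assumption made here for safety.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True when the bind host only accepts local connections."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal: assume the name could resolve to a public address.
        return False


def token_required(host: str) -> bool:
    """Hypothetical policy: any non-loopback bind needs a bearer token."""
    return not is_loopback_host(host)


for candidate in ("127.0.0.1", "::1", "0.0.0.0", "example.internal"):
    print(candidate, "needs token:", token_required(candidate))
```

Under these assumptions, binding to `0.0.0.0` or a named host would need a token, while `127.0.0.1`, `::1`, and `localhost` would not.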
|
||||
|
||||
## Common errors
|
||||
|
||||
| Error | Cause | Recovery |
|-------|-------|----------|
| No KTX project found | Current directory has no `ktx.yaml` and `KTX_PROJECT_DIR` is unset | Run from a KTX project or pass `--project-dir <path>` |
| Non-loopback host rejected | The server needs token auth before binding beyond localhost | Pass `--token <token>` or set `KTX_MCP_TOKEN` |
| Client cannot connect | Host, port, token, allowed host, or allowed origin does not match the client | Check `ktx mcp status`, then restart with explicit `--host`, `--port`, `--allowed-host`, and `--allowed-origin` values |
|
||||
|
|
@ -96,13 +96,16 @@ incomplete.
|
|||
|------|-------------|
|
||||
| `--enable-query-history` | Enable query-history ingest when the selected database supports it |
|
||||
| `--disable-query-history` | Disable query-history ingest for the selected database |
|
||||
| `--query-history-window-days <number>` | Query-history lookback window |
|
||||
| `--query-history-window-days <number>` | BigQuery/Snowflake query-history lookback window |
|
||||
| `--query-history-min-executions <number>` | Minimum executions for a query-history template |
|
||||
| `--query-history-service-account-pattern <pattern>` | Query-history service-account regex; repeatable |
|
||||
| `--query-history-redaction-pattern <pattern>` | Query-history SQL-literal redaction regex; repeatable |
|
||||
|
||||
Query history setup is supported for Postgres, BigQuery, and Snowflake. Enabling
|
||||
query history makes deep ingest readiness matter for later `ktx ingest` runs.
|
||||
Query history setup is supported for Postgres, BigQuery, and Snowflake. The
|
||||
window flag applies to BigQuery and Snowflake; Postgres reads the current
|
||||
`pg_stat_statements` aggregate data instead of a time-windowed history table.
|
||||
Enabling query history makes deep ingest readiness matter for later
|
||||
`ktx ingest` runs.
|
||||
|
||||
### Context Sources
|
||||
|
||||
|
|
|
|||
|
|
@ -9,6 +9,7 @@
|
|||
"ktx-sl",
|
||||
"ktx-wiki",
|
||||
"ktx-status",
|
||||
"ktx-mcp",
|
||||
"ktx-dev"
|
||||
]
|
||||
}
|
||||
|
|
|
|||
|
|
@ -1,5 +1,5 @@
|
|||
{
|
||||
"title": "Concepts",
|
||||
"defaultOpen": true,
|
||||
"pages": ["the-context-layer", "context-as-code"]
|
||||
"pages": ["the-context-layer", "semantic-layer-internals", "context-as-code"]
|
||||
}
|
||||
|
|
|
|||
398
docs-site/content/docs/concepts/semantic-layer-internals.mdx
Normal file
|
|
@ -0,0 +1,398 @@
|
|||
---
|
||||
title: Semantic Layer Internals
|
||||
description: How KTX uses join graphs, grain, and relationship metadata to turn context into safe SQL.
|
||||
---
|
||||
|
||||
KTX is a context layer for agents. This page focuses on one internal subsystem:
|
||||
the semantic execution layer that turns reviewed context into safe SQL.
|
||||
|
||||
The semantic layer is important, but it is not the whole product. KTX also
|
||||
handles schema evidence, wiki context, provenance, validation, and agent
|
||||
workflows around those files.
|
||||
|
||||
Read the page as a pipeline:
|
||||
|
||||
- context inputs feed the semantic engine;
|
||||
- evidence becomes a join graph with grain and relationship metadata;
|
||||
- review and corrections keep that graph current;
|
||||
- the execution engine uses the graph to avoid fan-out and ambiguous joins.
|
||||
|
||||
## Where the semantic layer fits
|
||||
|
||||
The semantic layer is not a separate product category inside KTX. It is the
|
||||
engine that makes the rest of the context actionable for SQL generation.
|
||||
|
||||
<div
|
||||
className="not-prose my-8 overflow-hidden rounded-lg border border-fd-border bg-fd-card shadow-sm"
|
||||
aria-label="How context inputs flow through the semantic layer into agent workflows"
|
||||
>
|
||||
<div className="grid gap-0 lg:grid-cols-[1fr_2rem_1.12fr_2rem_1fr]">
|
||||
<section className="bg-fd-background p-4">
|
||||
<p className="mb-3 text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
|
||||
{"Context inputs"}
|
||||
</p>
|
||||
<div className="grid gap-2 text-sm">
|
||||
<div className="border-l-2 border-fd-primary bg-fd-card px-3 py-2">
|
||||
<p className="font-mono text-xs text-fd-foreground">semantic-layer/</p>
|
||||
<p className="mt-1 text-xs leading-5 text-fd-muted-foreground">
|
||||
{"source YAML, measures, joins, grain"}
|
||||
</p>
|
||||
</div>
|
||||
<div className="border-l-2 border-amber-500 bg-fd-card px-3 py-2">
|
||||
<p className="font-mono text-xs text-fd-foreground">wiki/</p>
|
||||
<p className="mt-1 text-xs leading-5 text-fd-muted-foreground">
|
||||
{"business rules, definitions, caveats"}
|
||||
</p>
|
||||
</div>
|
||||
<div className="border-l-2 border-orange-500 bg-fd-card px-3 py-2">
|
||||
<p className="font-mono text-xs text-fd-foreground">raw-sources/</p>
|
||||
<p className="mt-1 text-xs leading-5 text-fd-muted-foreground">
|
||||
{"schema scans, keys, imported metadata"}
|
||||
</p>
|
||||
</div>
|
||||
<div className="border-l-2 border-slate-500 bg-fd-card px-3 py-2 dark:border-cyan-200">
|
||||
<p className="font-mono text-xs text-fd-foreground">provenance</p>
|
||||
<p className="mt-1 text-xs leading-5 text-fd-muted-foreground">
|
||||
{"ingest decisions and review history"}
|
||||
</p>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
|
||||
<div className="hidden items-center justify-center bg-fd-background lg:flex" aria-hidden="true">
|
||||
<span className="h-px w-full bg-fd-border" />
|
||||
</div>
|
||||
|
||||
<section className="relative bg-[#102226] p-5 text-white dark:bg-[#0b181b]">
|
||||
<div className="absolute inset-y-0 left-0 w-1 bg-fd-primary" />
|
||||
<p className="mb-3 text-[11px] font-semibold uppercase tracking-wide text-cyan-200">
|
||||
{"Semantic layer engine"}
|
||||
</p>
|
||||
<div className="grid gap-2 sm:grid-cols-2">
|
||||
<div className="rounded-md border border-cyan-100/20 bg-white/8 px-3 py-2">
|
||||
<p className="text-sm font-semibold">Join graph</p>
|
||||
<p className="mt-1 text-xs leading-5 text-cyan-50/75">
|
||||
{"sources as nodes, joins as typed edges"}
|
||||
</p>
|
||||
</div>
|
||||
<div className="rounded-md border border-cyan-100/20 bg-white/8 px-3 py-2">
|
||||
<p className="text-sm font-semibold">Grain</p>
|
||||
<p className="mt-1 text-xs leading-5 text-cyan-50/75">
|
||||
{"row identity before aggregation"}
|
||||
</p>
|
||||
</div>
|
||||
<div className="rounded-md border border-cyan-100/20 bg-white/8 px-3 py-2">
|
||||
<p className="text-sm font-semibold">Measures</p>
|
||||
<p className="mt-1 text-xs leading-5 text-cyan-50/75">
|
||||
{"verified formulas and filters"}
|
||||
</p>
|
||||
</div>
|
||||
<div className="rounded-md border border-cyan-100/20 bg-white/8 px-3 py-2">
|
||||
<p className="whitespace-nowrap break-normal text-sm font-semibold">Relationships</p>
|
||||
<p className="mt-1 text-xs leading-5 text-cyan-50/75">
|
||||
{"many_to_one, one_to_many, one_to_one"}
|
||||
</p>
|
||||
</div>
|
||||
</div>
|
||||
<div className="mt-3 rounded-md border border-cyan-100/20 bg-cyan-50/10 px-3 py-2 text-sm">
|
||||
{"Safe query planning before SQL is generated."}
|
||||
</div>
|
||||
</section>
|
||||
|
||||
<div className="hidden items-center justify-center bg-fd-background lg:flex" aria-hidden="true">
|
||||
<span className="h-px w-full bg-fd-border" />
|
||||
</div>
|
||||
|
||||
<section className="bg-fd-muted/35 p-4">
|
||||
<p className="mb-3 text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
|
||||
{"Agent workflows"}
|
||||
</p>
|
||||
<div className="space-y-2 text-sm">
|
||||
<div className="rounded-md border border-fd-border bg-fd-card px-3 py-2">
|
||||
{"Search sources and wiki pages"}
|
||||
</div>
|
||||
<div className="rounded-md border border-fd-border bg-fd-card px-3 py-2">
|
||||
{"Compile trusted SQL"}
|
||||
</div>
|
||||
<div className="rounded-md border border-fd-border bg-fd-card px-3 py-2">
|
||||
{"Explain metrics and provenance"}
|
||||
</div>
|
||||
<div className="rounded-md border border-fd-border bg-fd-card px-3 py-2">
|
||||
{"Patch files and validate review"}
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
## The join graph KTX builds
|
||||
|
||||
A semantic source is a node. A join is an edge with a join condition and a
|
||||
relationship type. The graph lets KTX choose valid paths, reject unsafe paths,
|
||||
and reason about whether a join preserves or multiplies rows before SQL is
|
||||
generated.
|
||||
|
||||
- `many_to_one` paths are usually safe for adding dimensions.
|
||||
- `one_to_many` paths can multiply fact rows and trigger fan-out handling.
|
||||
- Equal-cost paths can be ambiguous, so aliases and explicit joins matter.
|
||||
|
||||
<figure
|
||||
className="not-prose my-8 overflow-hidden rounded-lg border border-fd-border bg-fd-card p-4 shadow-sm"
|
||||
aria-label="Example semantic join graph"
|
||||
>
|
||||
<div className="grid gap-3 md:grid-cols-[1fr_1fr_1fr]">
|
||||
<div className="rounded-md border border-fd-border bg-fd-background px-4 py-3">
|
||||
<p className="text-sm font-semibold text-fd-foreground">customers</p>
|
||||
<p className="mt-1 text-xs text-fd-muted-foreground">grain: customer_id</p>
|
||||
</div>
|
||||
<div className="rounded-md border-2 border-fd-primary bg-fd-background px-4 py-3">
|
||||
<p className="text-sm font-semibold text-fd-foreground">orders</p>
|
||||
<p className="mt-1 text-xs text-fd-muted-foreground">grain: order_id</p>
|
||||
</div>
|
||||
<div className="rounded-md border border-fd-border bg-fd-background px-4 py-3">
|
||||
<p className="text-sm font-semibold text-fd-foreground">order_items</p>
|
||||
<p className="mt-1 text-xs text-fd-muted-foreground">grain: order_id, line_id</p>
|
||||
</div>
|
||||
</div>
|
||||
<div className="my-3 grid gap-2 text-center text-xs font-medium text-fd-muted-foreground md:grid-cols-[1fr_1fr]">
|
||||
<div>orders -> customers: many_to_one</div>
|
||||
<div>orders -> order_items: one_to_many</div>
|
||||
</div>
|
||||
<figcaption className="mt-4 border-t border-fd-border pt-3 text-left text-xs leading-5 text-fd-muted-foreground">
|
||||
<span className="font-medium text-fd-foreground">{"Example: "}</span>
|
||||
{"refunds joins to orders. Used carefully, it explains net revenue. Joined naively, it can duplicate order-level measures."}
|
||||
</figcaption>
|
||||
</figure>
|
||||
|
||||
The graph is bidirectional for planning. If `orders -> customers` is
|
||||
`many_to_one`, the reverse path is `one_to_many`; KTX keeps that distinction
|
||||
instead of treating every join as a neutral edge.
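
As a rough illustration of a typed, direction-aware edge, the following Python sketch models one join and its reverse. The `JoinEdge` class and `REVERSE` table are hypothetical names chosen for this example; they are not taken from the KTX codebase.

```python
from dataclasses import dataclass

# Relationship type seen from the opposite direction of the same join.
REVERSE = {
    "many_to_one": "one_to_many",
    "one_to_many": "many_to_one",
    "one_to_one": "one_to_one",
}


@dataclass(frozen=True)
class JoinEdge:
    left: str
    right: str
    relationship: str

    def reversed(self) -> "JoinEdge":
        """Return the same join traversed from right to left."""
        return JoinEdge(self.right, self.left, REVERSE[self.relationship])


edge = JoinEdge("orders", "customers", "many_to_one")
print(edge)
print(edge.reversed())
```

Keeping the relationship on the edge, rather than only the join condition, is what lets a planner tell a row-preserving traversal from a row-multiplying one.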
|
||||
|
||||
## How KTX builds the graph
|
||||
|
||||
KTX starts from evidence, not a blank modeling canvas. Database scans and
|
||||
analytics-tool imports create source definitions that an analyst can review.
|
||||
|
||||
| Evidence | What it contributes |
|---|---|
| Declared primary keys | Initial row grain for each source |
| Declared foreign keys | Formal join candidates and relationship direction |
| Inferred relationships | Useful edges when warehouses lack constraints |
| dbt, MetricFlow, and LookML imports | Existing metrics, dimensions, entities, explores, and joins |
| Query history | Real join and filter patterns agents should respect |
| Analyst review | The final authority before context is merged |
|
||||
|
||||
Generated YAML is intentionally reviewable. KTX can draft joins and measures,
|
||||
but the accepted semantic layer is still the plain-file diff your team approves.
|
||||
|
||||
## How KTX keeps the graph current
|
||||
|
||||
The semantic layer changes as schemas, metrics, and business rules change. KTX
|
||||
keeps that loop explicit instead of hiding it behind a remote runtime.
|
||||
|
||||
<div
|
||||
className="not-prose my-8 overflow-hidden rounded-lg border border-fd-border bg-fd-card shadow-sm"
|
||||
aria-label="Semantic layer maintenance loop"
|
||||
>
|
||||
<div className="border-b border-fd-border bg-fd-muted/35 px-4 py-3">
|
||||
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
|
||||
{"Semantic maintenance loop"}
|
||||
</p>
|
||||
<p className="mt-1 text-sm leading-6 text-fd-muted-foreground">
|
||||
{"Every accepted correction becomes input to the next graph build."}
|
||||
</p>
|
||||
</div>
|
||||
<div className="p-4">
|
||||
<div className="-mx-4 overflow-x-auto px-4">
|
||||
<div className="relative mx-auto h-[460px] w-[720px] max-w-none md:w-full md:max-w-[760px]">
|
||||
<svg
|
||||
aria-hidden="true"
|
||||
className="absolute inset-0 h-full w-full text-fd-primary"
|
||||
fill="none"
|
||||
viewBox="0 0 760 460"
|
||||
>
|
||||
<g
|
||||
stroke="currentColor"
|
||||
strokeLinecap="round"
|
||||
strokeLinejoin="round"
|
||||
strokeOpacity="0.68"
|
||||
strokeWidth="2.5"
|
||||
>
|
||||
<path d="M 352 80 H 384" />
|
||||
<path d="M 600 80 H 668 V 150" />
|
||||
<path d="M 632 284 V 378 H 626" />
|
||||
<path d="M 408 378 H 376" />
|
||||
<path d="M 160 378 H 96 V 308" />
|
||||
<path d="M 128 172 V 80 H 140" />
|
||||
</g>
|
||||
<g fill="currentColor" fillOpacity="0.96" stroke="none">
|
||||
<polygon points="0,0 -14,-7 -14,7" transform="translate(398 80)" />
|
||||
<polygon points="0,0 -14,-7 -14,7" transform="translate(668 164) rotate(90)" />
|
||||
<polygon points="0,0 -14,-7 -14,7" transform="translate(612 378) rotate(180)" />
|
||||
<polygon points="0,0 -14,-7 -14,7" transform="translate(362 378) rotate(180)" />
|
||||
<polygon points="0,0 -14,-7 -14,7" transform="translate(96 294) rotate(270)" />
|
||||
<polygon points="0,0 -14,-7 -14,7" transform="translate(154 80)" />
|
||||
</g>
|
||||
</svg>
|
||||
|
||||
<div className="absolute left-1/2 top-1/2 flex h-32 w-56 -translate-x-1/2 -translate-y-1/2 flex-col items-center justify-center rounded-md border border-fd-primary/50 bg-fd-background px-4 py-4 text-center shadow-sm">
|
||||
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-primary">
|
||||
{"reviewed context"}
|
||||
</p>
|
||||
<p className="mt-2 text-sm font-semibold leading-6 text-fd-foreground">
|
||||
{"The accepted graph becomes the starting point for the next build."}
|
||||
</p>
|
||||
</div>
|
||||
|
||||
<div className="absolute left-[160px] top-6 h-28 w-48 rounded-md border-2 border-fd-primary bg-fd-background px-4 py-3 text-sm shadow-sm">
|
||||
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
|
||||
{"Step 1"}
|
||||
</p>
|
||||
<p className="mt-1 font-semibold text-fd-foreground">{"ingest evidence"}</p>
|
||||
<p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
|
||||
{"scan schemas, imports, and accepted files"}
|
||||
</p>
|
||||
</div>
|
||||
<div className="absolute left-[408px] top-6 h-28 w-48 rounded-md border border-fd-border bg-fd-background px-4 py-3 text-sm shadow-sm">
|
||||
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
|
||||
{"Step 2"}
|
||||
</p>
|
||||
<p className="mt-1 font-semibold text-fd-foreground">{"YAML diff"}</p>
|
||||
<p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
|
||||
{"draft source, join, grain, and measure changes"}
|
||||
</p>
|
||||
</div>
|
||||
<div className="absolute left-[536px] top-[172px] h-28 w-48 rounded-md border border-fd-border bg-fd-background px-4 py-3 text-sm shadow-sm">
|
||||
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
|
||||
{"Step 3"}
|
||||
</p>
|
||||
<p className="mt-1 font-semibold text-fd-foreground">{"validation"}</p>
|
||||
<p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
|
||||
{"check relationships, syntax, and unsafe query shapes"}
|
||||
</p>
|
||||
</div>
|
||||
<div className="absolute left-[408px] top-[322px] h-28 w-48 rounded-md border border-fd-border bg-fd-background px-4 py-3 text-sm shadow-sm">
|
||||
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
|
||||
{"Step 4"}
|
||||
</p>
|
||||
<p className="mt-1 font-semibold text-fd-foreground">{"analyst review"}</p>
|
||||
<p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
|
||||
{"accept, edit, or reject generated context"}
|
||||
</p>
|
||||
</div>
|
||||
<div className="absolute left-[160px] top-[322px] h-28 w-48 rounded-md border border-fd-border bg-fd-background px-4 py-3 text-sm shadow-sm">
|
||||
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
|
||||
{"Step 5"}
|
||||
</p>
|
||||
<p className="mt-1 font-semibold text-fd-foreground">{"agent use"}</p>
|
||||
<p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
|
||||
{"serve context to search, explain, and query"}
|
||||
</p>
|
||||
</div>
|
||||
<div className="absolute left-8 top-[172px] h-28 w-48 rounded-md border border-fd-primary/70 bg-fd-background px-4 py-3 text-sm shadow-sm">
|
||||
<p className="text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
|
||||
{"Step 6"}
|
||||
</p>
|
||||
<p className="mt-1 font-semibold text-fd-foreground">{"corrections"}</p>
|
||||
<p className="mt-2 text-xs leading-5 text-fd-muted-foreground">
|
||||
{"agent and analyst fixes become new evidence"}
|
||||
</p>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
This matters because semantic correctness is not static. If a source gains a
|
||||
new key, a metric changes definition, or an analyst corrects a relationship,
|
||||
the next agent gets that reviewed context.
|
||||
|
||||
## The modeling problem the graph solves
|
||||
|
||||
Fan-out is the classic failure mode. If an order-level measure is joined to
|
||||
line-item rows before aggregation, one order can become many rows and revenue
|
||||
can be counted more than once.
|
||||
|
||||
| Problem | What happens | How KTX avoids it |
|---|---|---|
| Order measure joins to `order_items` | `orders.revenue` repeats once per item | Detect the `one_to_many` path and pre-aggregate the order measure |
| Two independent fact sources share `customers` | Measures from each fact table multiply across the shared dimension | Treat it as a chasm trap and use aggregate-locality planning |
| Filter lives only across a `one_to_many` path | Filtering after the join changes the measure grain | Reject or localize the filter instead of silently producing unsafe SQL |
| Multiple equal-cost paths connect the same sources | The join path is ambiguous | Prefer safer paths and use aliases to disambiguate repeated joins |
|
||||
|
||||
Many-to-many questions usually show up as multiple one-to-many paths or
|
||||
independent fact sources. KTX treats those shapes as fan-out or chasm risks
|
||||
unless the query can be planned at a safe grain.
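
The fan-out failure described in this section can be reproduced with a small, self-contained Python sketch using the standard-library `sqlite3` module. The table contents and the `items_agg` CTE name are made up for the demonstration, and the SQL is hand-written to show the general "aggregate at grain, then join" idea rather than anything KTX actually emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    CREATE TABLE order_items (order_id INTEGER, line_id INTEGER, sku TEXT);
    INSERT INTO orders VALUES (1, 10, 100.0), (2, 10, 50.0);
    INSERT INTO order_items VALUES (1, 1, 'a'), (1, 2, 'b'), (1, 3, 'c'), (2, 1, 'a');
    """
)

# Naive shape: join first, then aggregate. Order 1 repeats once per line item.
naive = conn.execute(
    """
    SELECT SUM(o.amount)
    FROM orders o
    JOIN order_items i ON i.order_id = o.order_id
    """
).fetchone()[0]

# Safer shape: aggregate line items to order grain, then join one row per order.
safe = conn.execute(
    """
    WITH items_agg AS (
        SELECT order_id, COUNT(*) AS item_count
        FROM order_items
        GROUP BY order_id
    )
    SELECT SUM(o.amount), SUM(a.item_count)
    FROM orders o
    JOIN items_agg a ON a.order_id = o.order_id
    """
).fetchone()

print("naive revenue:", naive)            # 350.0: order 1 counted three times
print("safe revenue, item count:", safe)  # (150.0, 4)
```

True revenue in this toy data is 150; the naive join reports 350 because the 100-unit order is repeated for each of its three line items.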
|
||||
|
||||
## How the execution engine uses the graph
|
||||
|
||||
The planner resolves the sources in a semantic query, chooses a join tree, and
|
||||
checks whether any requested dimension or filter crosses a row-multiplying
|
||||
edge. The SQL generator then chooses the simple path or the aggregate-locality
|
||||
path.
|
||||
|
||||
| Naive SQL shape | Semantic-layer SQL shape |
|---|---|
| Join facts and dimensions first, then aggregate | Aggregate each fact source at its own grain, then join the results |
| Put every filter in one outer `WHERE` clause | Keep measure filters with the measure source when locality is needed |
| Trust the shortest textual join path | Prefer safe relationship paths and reject disconnected sources |
| Let dimension grain differ across facts | Raise when asymmetric dimensions would fan out another measure |
|
||||
|
||||
<div
|
||||
className="not-prose my-8 overflow-hidden rounded-lg border border-fd-border bg-fd-card shadow-sm"
|
||||
aria-label="Fan-out safe execution shape"
|
||||
>
|
||||
<div className="grid gap-0 md:grid-cols-2">
|
||||
<section className="border-b border-fd-border bg-fd-background p-4 md:border-b-0 md:border-r">
|
||||
<p className="mb-3 text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
|
||||
{"Unsafe shape"}
|
||||
</p>
|
||||
<pre className="overflow-x-auto rounded-md bg-fd-muted p-3 text-xs leading-5 text-fd-foreground">
|
||||
{`orders
|
||||
join order_items
|
||||
join customers
|
||||
group by customer_segment
|
||||
sum(orders.amount)`}
|
||||
</pre>
|
||||
<p className="mt-3 text-sm text-fd-muted-foreground">
|
||||
{"The order measure is exposed to line-item fan-out before aggregation."}
|
||||
</p>
|
||||
</section>
|
||||
<section className="bg-fd-background p-4">
|
||||
<p className="mb-3 text-[11px] font-semibold uppercase tracking-wide text-fd-muted-foreground">
|
||||
{"KTX shape"}
|
||||
</p>
|
||||
<pre className="overflow-x-auto rounded-md border border-fd-border bg-fd-muted p-3 text-xs leading-5 text-fd-foreground">
|
||||
{`orders_agg as (
|
||||
select customer_id, sum(amount) revenue
|
||||
from orders
|
||||
group by customer_id
|
||||
)
|
||||
select customers.segment, sum(revenue)
|
||||
from orders_agg
|
||||
join customers`}
|
||||
</pre>
|
||||
<p className="mt-3 text-sm text-fd-muted-foreground">
|
||||
{"KTX pre-aggregates fact measures at their own grain before joining dimensions."}
|
||||
</p>
|
||||
</section>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
The result is not magic. It is structured planning: validated sources, typed
|
||||
relationships, graph search, fan-out detection, aggregate locality, and final
|
||||
dialect transpilation.
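
The graph-search and fan-out-detection steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions: `EDGES`, `find_path`, and `fans_out` are hypothetical names, breadth-first search stands in for whatever path selection KTX really uses, and the sketch ignores path cost, aliases, and filters.

```python
from collections import deque

# Declared joins, keyed by (from_source, to_source).
EDGES = {
    ("orders", "customers"): "many_to_one",
    ("orders", "order_items"): "one_to_many",
}
REVERSE = {
    "many_to_one": "one_to_many",
    "one_to_many": "many_to_one",
    "one_to_one": "one_to_one",
}


def neighbors(source):
    """Yield (next_source, relationship) in the direction of travel."""
    for (left, right), rel in EDGES.items():
        if left == source:
            yield right, rel
        elif right == source:
            yield left, REVERSE[rel]


def find_path(start, goal):
    """Breadth-first search; returns a list of (from, to, relationship) or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, rel in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, nxt, rel)]))
    return None  # disconnected sources


def fans_out(path):
    """True when any hop on the path can multiply rows."""
    return any(rel == "one_to_many" for _, _, rel in path)


for start, goal in [("orders", "customers"), ("customers", "order_items")]:
    path = find_path(start, goal)
    print(start, "->", goal, "fans out:", fans_out(path))
```

A planner built along these lines would take the simple join path when `fans_out` is false and switch to pre-aggregation, or reject the query, when it is true or when no path exists.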
|
||||
|
||||
## What this means for agents
|
||||
|
||||
KTX gives agents a semantic surface they can inspect and improve, not just a
|
||||
folder of notes.
|
||||
|
||||
- Search semantic sources and related wiki pages before writing SQL.
|
||||
- Compile SQL through `ktx sl query` instead of guessing joins.
|
||||
- Validate semantic-layer changes before review.
|
||||
- Patch YAML and Markdown files in git.
|
||||
- Explain metric meaning and provenance from the same accepted context.
|
||||
|
||||
Next, read [Writing Context](/docs/guides/writing-context) for the YAML editing
|
||||
workflow or [ktx sl](/docs/cli-reference/ktx-sl) for the command reference.
|
||||
|
|
@ -191,7 +191,18 @@ KTX organizes context into four pillars:
|
|||
|
||||
Each pillar covers a different kind of context agents need before they can safely write SQL, update semantic definitions, or explain an analytics result.
|
||||
|
||||
**Semantic sources** are YAML definitions that describe your data in terms agents can reason about. Each source maps to a table or SQL query, declares its grain, defines typed columns, specifies valid joins, and exposes named measures with optional filters. This is where "revenue means `sum(amount)` excluding refunds" lives.
|
||||
**Semantic sources** are YAML definitions that describe your data in terms
|
||||
agents can reason about:
|
||||
|
||||
- source tables or SQL queries;
|
||||
- row grain;
|
||||
- typed columns;
|
||||
- valid joins;
|
||||
- named measures, filters, and segments.
|
||||
|
||||
This is where "revenue means `sum(amount)` excluding refunds" lives. For the
|
||||
join graph, fan-out protections, and execution mechanics, read
|
||||
[Semantic Layer Internals](/docs/concepts/semantic-layer-internals).
|
||||
|
||||
```yaml
|
||||
name: orders
|
||||
|
|
@ -289,7 +300,7 @@ my-project/
|
|||
│ └── data-quality-notes.md
|
||||
├── raw-sources/
|
||||
│ └── warehouse/
|
||||
│ └── database-ingest/ # Schema ingest artifacts and reports
|
||||
│ └── live-database/ # Schema ingest artifacts and reports
|
||||
└── .ktx/
|
||||
├── db.sqlite # Local state (git-ignored)
|
||||
└── cache/ # Runtime cache (git-ignored)
|
||||
|
|
|
|||
|
|
@ -3,10 +3,12 @@ title: Introduction
|
|||
description: How KTX gives analytics agents trusted context for warehouse work.
|
||||
---
|
||||
|
||||
<div className="not-prose mb-14">
|
||||
<div className="mb-8">
|
||||
import { ProductMechanics } from "@/components/product-mechanics";
|
||||
|
||||
<div className="not-prose mb-10">
|
||||
<div>
|
||||
<h1
|
||||
className="text-4xl font-extrabold tracking-tight lg:text-5xl"
|
||||
className="max-w-full text-3xl font-extrabold tracking-tight break-words sm:text-4xl lg:text-5xl"
|
||||
style={{
|
||||
fontFamily: 'var(--font-display)',
|
||||
background: 'linear-gradient(180deg, var(--color-fd-foreground) 0%, color-mix(in oklch, var(--color-fd-foreground) 75%, var(--color-fd-primary)) 100%)',
|
||||
|
|
@ -18,62 +20,43 @@ description: How KTX gives analytics agents trusted context for warehouse work.
|
|||
letterSpacing: '0',
|
||||
}}
|
||||
>
|
||||
Make analytics context{'\n'}usable by agents
|
||||
Make analytics context usable by agents
|
||||
</h1>
|
||||
<p className="mt-4 text-lg text-fd-muted-foreground max-w-2xl" style={{ lineHeight: '1.7' }}>
|
||||
KTX turns warehouse metadata, semantic definitions, and business knowledge
|
||||
into reviewable project files that agents can use while planning, querying,
|
||||
and updating analytics work.
|
||||
<p className="mt-4 max-w-2xl text-lg text-fd-muted-foreground" style={{ lineHeight: '1.7' }}>
|
||||
{'KTX turns warehouse metadata, semantic definitions, and business knowledge into reviewable project files that agents can use while planning, querying, and updating analytics work.'}
|
||||
</p>
|
||||
</div>
|
||||
<div className="flex flex-wrap gap-3">
|
||||
<a
|
||||
href="/docs/getting-started/quickstart"
|
||||
className="inline-flex h-10 items-center rounded-lg bg-fd-primary px-5 text-sm font-medium text-fd-primary-foreground transition-colors hover:opacity-90"
|
||||
>
|
||||
Get Started
|
||||
</a>
|
||||
<a
|
||||
href="/docs/concepts/the-context-layer"
|
||||
className="inline-flex h-10 items-center rounded-lg border border-fd-border bg-fd-background px-5 text-sm font-medium text-fd-foreground transition-colors hover:bg-fd-muted"
|
||||
>
|
||||
The Context Layer
|
||||
</a>
|
||||
<a
|
||||
href="/docs/guides/building-context"
|
||||
className="inline-flex h-10 items-center rounded-lg border border-fd-border bg-fd-background px-5 text-sm font-medium text-fd-foreground transition-colors hover:bg-fd-muted"
|
||||
>
|
||||
Building Context
|
||||
</a>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
## Who KTX is for
|
||||
<ProductMechanics />
|
||||
|
||||
## What agents can do with KTX
|
||||
|
||||
KTX is built for analytics engineers and data teams who want data agents to
|
||||
work on real analytics systems - not just generate one-off SQL.
|
||||
work on real analytics systems, not just generate one-off SQL.
|
||||
|
||||
Use KTX when you want agents to:
|
||||
Use it when agents need to:
|
||||
|
||||
- **Generate SQL** from approved measures and joins
|
||||
- **Repair semantic definitions** through reviewable diffs
|
||||
- **Explain metric provenance** with warehouse evidence
|
||||
- **Work alongside** dbt, LookML, MetricFlow, Looker, Metabase, and modern BI platforms
|
||||
- **Generate SQL** from approved measures, dimensions, joins, and filters
|
||||
- **Explain provenance** with wiki context and warehouse evidence
|
||||
- **Repair context** through reviewable YAML and Markdown diffs
|
||||
- **Work alongside** dbt, LookML, MetricFlow, Looker, Metabase, and warehouses
|
||||
|
||||
Works with PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, and SQL Server.
|
||||
KTX works with SQLite, PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, and
|
||||
SQL Server.
|
||||
|
||||
## Explore the docs
|
||||
## Read next
|
||||
|
||||
<Cards>
|
||||
<Card title="Quickstart" href="/docs/getting-started/quickstart">
|
||||
Set up KTX and build your first context in under 10 minutes.
|
||||
</Card>
|
||||
<Card title="Concepts" href="/docs/concepts/the-context-layer">
|
||||
Understand what a context layer is and why agents need one.
|
||||
</Card>
|
||||
<Card title="Guides" href="/docs/guides/building-context">
|
||||
Hands-on workflows for scanning, ingesting, writing, and serving.
|
||||
</Card>
|
||||
<Card title="Writing Context" href="/docs/guides/writing-context">
|
||||
Edit semantic-layer YAML and wiki Markdown safely.
|
||||
</Card>
|
||||
<Card title="CLI Reference" href="/docs/cli-reference/ktx-setup">
|
||||
Complete flag and subcommand reference for every KTX command.
|
||||
</Card>
|
||||
|
|
|
|||
|
|
@ -51,8 +51,8 @@ For scripted setup, pass the project directory explicitly:
|
|||
ktx setup --project-dir ./analytics
|
||||
```
|
||||
|
||||
If setup exits early, rerun `ktx setup` in the same directory. KTX tracks
|
||||
completed setup steps and resumes from the remaining work.
|
||||
If setup exits early, rerun `ktx setup` in the same directory. KTX keeps local
|
||||
setup progress under `.ktx/setup/` and resumes from the remaining work.
|
||||
|
||||
## Step 2: Configure the LLM
|
||||
|
||||
|
|
@ -122,7 +122,8 @@ Database ready
|
|||
|
||||
PostgreSQL, BigQuery, and Snowflake can also enable query-history ingest. Query
|
||||
history helps KTX learn common query patterns, joins, service-account filters,
|
||||
and warehouse-specific usage.
|
||||
and warehouse-specific usage. BigQuery and Snowflake support a lookback window;
|
||||
Postgres reads the current `pg_stat_statements` aggregate data instead.
|
||||
|
||||
## Step 5: Add context sources
|
||||
|
||||
|
|
@ -200,7 +201,7 @@ KTX writes plain files so people and agents can inspect changes in git.
|
|||
|
||||
| Path | Purpose |
|
||||
|------|---------|
|
||||
| `ktx.yaml` | Project configuration for LLMs, embeddings, connections, context sources, and setup state |
|
||||
| `ktx.yaml` | Project configuration for LLMs, embeddings, connections, context sources, and query-history settings |
|
||||
| `.ktx/secrets/*` | Local secret files referenced from `ktx.yaml`; do not commit these |
|
||||
| `.ktx/setup/*` | Local setup and context-build state |
|
||||
| `.ktx/agents/install-manifest.json` | Manifest used to manage installed agent files |
|
||||
|
|
|
|||
|
|
@ -62,13 +62,15 @@ configured, run `ktx setup` or use `--fast`.
|
|||
|
||||
PostgreSQL, BigQuery, and Snowflake can add query-history context. This helps
|
||||
KTX learn common joins, filters, service-account patterns, redaction rules, and
|
||||
usage-heavy query templates.
|
||||
usage-heavy query templates. BigQuery and Snowflake support a lookback window;
|
||||
Postgres reads the current `pg_stat_statements` aggregate data instead.
|
||||
|
||||
Enable it during setup, store it under `connections.<id>.context.queryHistory`,
|
||||
or request it for one run:
|
||||
|
||||
```bash
|
||||
ktx ingest warehouse --deep --query-history
|
||||
# Set the lookback window for BigQuery or Snowflake query history
|
||||
ktx ingest warehouse --query-history-window-days 30
|
||||
```
|
||||
|
||||
|
|
|
|||
|
|
@ -60,21 +60,25 @@ semantic-layer/<connection-id>/<source-name>.yaml
|
|||
|
||||
```yaml
|
||||
name: orders
|
||||
description: Customer orders with booked revenue.
|
||||
descriptions:
|
||||
user: Customer orders with booked revenue.
|
||||
table: public.orders
|
||||
grain:
|
||||
- order_id
|
||||
columns:
|
||||
- name: order_id
|
||||
type: string
|
||||
description: Unique order identifier.
|
||||
descriptions:
|
||||
user: Unique order identifier.
|
||||
- name: order_date
|
||||
type: time
|
||||
role: time
|
||||
description: Date the order was placed.
|
||||
descriptions:
|
||||
user: Date the order was placed.
|
||||
- name: total_amount
|
||||
type: number
|
||||
description: Booked order value in USD.
|
||||
descriptions:
|
||||
user: Booked order value in USD.
|
||||
measures:
|
||||
- name: total_revenue
|
||||
expr: SUM(total_amount)
|
||||
|
|
@ -85,7 +89,8 @@ measures:
|
|||
|
||||
```yaml
|
||||
name: orders
|
||||
description: Customer orders with line-item totals.
|
||||
descriptions:
|
||||
user: Customer orders with line-item totals.
|
||||
table: public.orders
|
||||
grain:
|
||||
- order_id
|
||||
|
|
@ -93,26 +98,31 @@ grain:
|
|||
columns:
|
||||
- name: order_id
|
||||
type: string
|
||||
description: Unique order identifier.
|
||||
descriptions:
|
||||
user: Unique order identifier.
|
||||
|
||||
- name: order_date
|
||||
type: time
|
||||
role: time
|
||||
description: Date the order was placed.
|
||||
descriptions:
|
||||
user: Date the order was placed.
|
||||
|
||||
- name: status
|
||||
type: string
|
||||
visibility: public
|
||||
description: Current order status.
|
||||
descriptions:
|
||||
user: Current order status.
|
||||
|
||||
- name: _etl_loaded_at
|
||||
type: time
|
||||
visibility: hidden
|
||||
description: Internal load timestamp.
|
||||
descriptions:
|
||||
user: Internal load timestamp.
|
||||
|
||||
- name: total_amount
|
||||
type: number
|
||||
description: Order total in USD.
|
||||
descriptions:
|
||||
user: Order total in USD.
|
||||
|
||||
measures:
|
||||
- name: total_revenue
|
||||
|
|
@ -149,9 +159,10 @@ joins:
|
|||
| Field | Required | Description |
|
||||
|-------|----------|-------------|
|
||||
| `name` | Yes | Source identifier. Use lowercase words and underscores. |
|
||||
| `descriptions` | No | Description map keyed by source, such as `user`, `dbt`, or `ai`. |
|
||||
| `table` or `sql` | Yes | Database table or custom SQL expression. Use exactly one. |
|
||||
| `grain` | Yes | Columns that uniquely identify a row at the source grain. |
|
||||
| `columns` | No | Column definitions with type, role, visibility, and descriptions. |
|
||||
| `columns` | Yes | Non-empty column definitions with type, role, visibility, and descriptions. |
|
||||
| `measures` | No | Aggregation expressions such as `SUM`, `COUNT`, and `AVG`. |
|
||||
| `segments` | No | Named predicates agents can reuse. |
|
||||
| `joins` | No | Relationships to other semantic sources. |
|
||||
|
|
@ -165,7 +176,7 @@ joins:
|
|||
| Column | `type` | Yes | Agent-facing type: `string`, `number`, `time`, or `boolean`. |
|
||||
| Column | `role` | No | Special role such as `time` for default time dimensions. |
|
||||
| Column | `visibility` | No | `public`, `internal`, or `hidden`. |
|
||||
| Column | `description` | Strongly recommended | Business meaning and usage notes. |
|
||||
| Column | `descriptions` | Strongly recommended | Description map keyed by source, such as `user`, `dbt`, or `ai`. |
|
||||
| Measure | `name` | Yes | Queryable metric name. |
|
||||
| Measure | `expr` | Yes | SQL aggregation expression at the source grain. |
|
||||
| Measure | `filter` | No | SQL predicate applied only to this measure. |
|
||||
|
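The YAML examples in this change move each single-line `description` value under a `descriptions` map keyed by source (`user` in the examples). A minimal sketch of that rewrite in Python follows, assuming descriptions are single-line scalars that sit on their own line. `migrate_descriptions` is a hypothetical helper, not part of KTX, and nothing in this change says KTX ships or requires a migration step.

```python
import re

# Hypothetical helper (not a KTX API): matches a line that consists only of
# `description: <text>` plus leading indentation. It does not match
# `descriptions:` (already migrated) or `- description: ...` list items.
_DESCRIPTION_LINE = re.compile(r"^(?P<indent>\s*)description:\s*(?P<text>.+?)\s*$")


def migrate_descriptions(yaml_text: str, source_key: str = "user") -> str:
    """Rewrite `description: text` lines as a `descriptions:` map.

    Works line by line on plain text, so indentation is preserved and
    multi-line or block-scalar descriptions are left untouched.
    """
    out = []
    for line in yaml_text.splitlines():
        match = _DESCRIPTION_LINE.match(line)
        if match:
            indent = match.group("indent")
            out.append(f"{indent}descriptions:")
            out.append(f"{indent}  {source_key}: {match.group('text')}")
        else:
            out.append(line)
    trailing_newline = "\n" if yaml_text.endswith("\n") else ""
    return "\n".join(out) + trailing_newline


if __name__ == "__main__":
    before = (
        "name: orders\n"
        "description: Customer orders with booked revenue.\n"
        "columns:\n"
        "  - name: order_id\n"
        "    type: string\n"
        "    description: Unique order identifier.\n"
    )
    print(migrate_descriptions(before))
```

A text-based rewrite keeps comments and key order intact, which keeps the resulting diff reviewable. A YAML parser would be the safer choice for files with block scalars or quoted multi-line values.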
|
|
|||
|
|
@ -122,7 +122,7 @@ Available commands:
|
|||
- `ktx status --json --project-dir /path/to/project`
|
||||
- `ktx sl list --json --project-dir /path/to/project`
|
||||
- `ktx sl search '<text>' --json --project-dir /path/to/project --connection-id '<id>'`
|
||||
- `ktx sl query --json --project-dir /path/to/project --connection-id '<id>'`
|
||||
- `ktx sl query --project-dir /path/to/project --connection-id '<id>' --query-file '<path>' --format json --execute --max-rows 100`
|
||||
- `ktx wiki search '<query>' --json --project-dir /path/to/project --limit 10`
|
||||
```
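The updated `ktx sl query` entry replaces `--json` with `--format json` and adds `--query-file`, `--execute`, and `--max-rows`. As a rough sketch, a wrapper script could assemble that argument list as shown here. `build_sl_query_command` and the example paths are hypothetical, the flags are only those listed in the command above, and the snippet builds the command without running it.

```python
import shlex
from typing import List


def build_sl_query_command(
    project_dir: str,
    connection_id: str,
    query_file: str,
    max_rows: int = 100,
) -> List[str]:
    """Return the argv list for the `ktx sl query` form shown above.

    Hypothetical helper: it only arranges flags that appear in the
    documented command and does not invoke the CLI.
    """
    return [
        "ktx", "sl", "query",
        "--project-dir", project_dir,
        "--connection-id", connection_id,
        "--query-file", query_file,
        "--format", "json",
        "--execute",
        "--max-rows", str(max_rows),
    ]


if __name__ == "__main__":
    # Placeholder values; substitute a real project, connection, and query file.
    command = build_sl_query_command("/path/to/project", "warehouse", "/path/to/query-file")
    print(shlex.join(command))
```

Passing the arguments as a list, for example to `subprocess.run`, avoids shell quoting problems with connection ids or paths that contain spaces.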
|
||||
|
||||
|
|
@ -266,7 +266,7 @@ Admin CLI skills call the same KTX CLI commands:
|
|||
| `ktx sl list --json` | List semantic-layer sources |
|
||||
| `ktx sl search <query> --json` | Search semantic-layer sources |
|
||||
| `ktx sl validate <source> --connection-id <id>` | Validate semantic source definitions |
|
||||
| `ktx sl query --json` | Execute a semantic-layer query when semantic compute is configured |
|
||||
| `ktx sl query --format json` | Execute a semantic-layer query when semantic compute is configured |
|
||||
|
||||
### Security constraints
|
||||
|
||||
|
|
|
|||
|
|
@ -34,8 +34,9 @@ automation flags documented in [`ktx setup`](/docs/cli-reference/ktx-setup).
|
|||
|
||||
| Path | Purpose |
|
||||
|------|---------|
|
||||
| `ktx.yaml` | Main project configuration for providers, embeddings, connections, source mappings, query history, and setup state |
|
||||
| `ktx.yaml` | Main project configuration for providers, embeddings, connections, source mappings, and query history |
|
||||
| `.ktx/secrets/*` | Local file-backed secrets when you choose file references during setup |
|
||||
| `.ktx/setup/*` | Local setup progress and context-build state |
|
||||
| `semantic-layer/<connection-id>/` | YAML semantic sources generated by database and source ingestion |
|
||||
| `wiki/` | Markdown business context, definitions, and ingested knowledge |
|
||||
| `.ktx/agents/install-manifest.json` | Manifest of agent integration files installed by `ktx setup --agents` |
|
||||
|
|
|
|||
|
|
@ -228,7 +228,7 @@ mapping metadata. The BigQuery connector still authenticates with the
|
|||
| Feature | Supported | Notes |
|
||||
|---------|-----------|-------|
|
||||
| Tables & views | Yes | Including materialized views and external tables |
|
||||
| Primary keys | No | - |
|
||||
| Primary keys | Yes | Via `INFORMATION_SCHEMA` table constraints when declared |
|
||||
| Foreign keys | No | Not available in BigQuery |
|
||||
| Row count estimates | Yes | From table metadata |
|
||||
| Column statistics | No | - |
|
||||
|
|
@ -500,7 +500,7 @@ No authentication required - SQLite is file-based. The file must be readable by
|
|||
- Uses `LIMIT X OFFSET Y` for pagination
|
||||
- SQLite type affinity system: `TEXT`, `NUMERIC`, `INTEGER`, `REAL`, `BLOB`
|
||||
- Foreign key enforcement requires explicit `PRAGMA foreign_keys = ON`
|
||||
- In-memory databases supported with `path: ":memory:"` (for testing)
|
||||
- Database file must exist before `ktx connection test` or ingest runs
|
||||
|
||||
## Common errors
|
||||
|
||||
|
|
|
|||