Polish documentation copy

Luca Martial 2026-05-14 09:38:48 -07:00
parent ce23aca4c4
commit 5568b3d37a
65 changed files with 478 additions and 478 deletions

@@ -19,14 +19,14 @@ ktx setup [options]
|------|-------------|---------|
| `--project-dir <path>` | KTX project directory | `KTX_PROJECT_DIR`, nearest `ktx.yaml`, or cwd |
| `--yes` | Accept safe defaults in non-interactive setup | `false` |
| `--no-input` | Disable interactive terminal input | |
| `--no-input` | Disable interactive terminal input | - |
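The `--project-dir` default in the table names three fallbacks. The following is a minimal sketch of how such a lookup could work, assuming the fallbacks are tried in the order listed; the actual precedence inside KTX is not confirmed here. `resolveDefaultProjectDir`, `parentOf`, and the injected `hasKtxYaml` predicate are hypothetical names, and paths are POSIX-style for brevity.

```typescript
// Hypothetical helper: parent of a POSIX-style absolute path ("/" is its own parent).
function parentOf(dir: string): string {
  const idx = dir.lastIndexOf("/");
  return idx <= 0 ? "/" : dir.slice(0, idx);
}

// Sketch of the documented default: KTX_PROJECT_DIR, nearest ktx.yaml, or cwd.
// Assumption: the three fallbacks are tried in exactly this order.
function resolveDefaultProjectDir(
  env: Record<string, string | undefined>,
  cwd: string,
  hasKtxYaml: (dir: string) => boolean, // injected so the sketch stays filesystem-free
): string {
  const fromEnv = env["KTX_PROJECT_DIR"];
  if (fromEnv) return fromEnv;

  // Walk upward from cwd until a directory containing ktx.yaml is found.
  let dir = cwd;
  while (true) {
    if (hasKtxYaml(dir)) return dir;
    const parent = parentOf(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return cwd;
}
```

Injecting `hasKtxYaml` keeps the lookup testable without touching the disk; a real implementation would more likely use `node:path` and `node:fs`.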
### Agent Integration
| Flag | Description | Default |
|------|-------------|---------|
| `--agents` | Install agent integration only | `false` |
| `--target <target>` | Agent target (`claude-code`, `codex`, `cursor`, `opencode`, `universal`) | |
| `--target <target>` | Agent target (`claude-code`, `codex`, `cursor`, `opencode`, `universal`) | - |
| `--global` | Install agent integration into the global target scope (Claude Code and Codex only) | `false` |
The setup wizard is the public configuration interface. It prompts for LLM

@@ -3,7 +3,7 @@ title: "ktx sl"
description: "List, search, validate, or query semantic-layer sources."
---
Interact with your project's semantic layer. Semantic sources are YAML definitions that describe your tables, columns, measures, joins, and grain the vocabulary agents use to generate correct SQL.
Interact with your project's semantic layer. Semantic sources are YAML definitions that describe your tables, columns, measures, joins, and grain - the vocabulary agents use to generate correct SQL.
## Command signature
@@ -26,7 +26,7 @@ ktx sl <subcommand> [options]
| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id <id>` | Filter by KTX connection id | |
| `--connection-id <id>` | Filter by KTX connection id | - |
| `--output <mode>` | Output mode: `pretty` (default in TTY), `plain` (TSV), or `json` | `pretty` |
| `--json` | Shortcut for `--output=json` (overrides `--output`) | `false` |
@@ -34,8 +34,8 @@ ktx sl <subcommand> [options]
| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id <id>` | Filter by KTX connection id | |
| `--limit <number>` | Maximum search results | |
| `--connection-id <id>` | Filter by KTX connection id | - |
| `--limit <number>` | Maximum search results | - |
| `--output <mode>` | Output mode: `pretty` (default in TTY), `plain` (TSV), or `json` | `pretty` |
| `--json` | Shortcut for `--output=json` (overrides `--output`) | `false` |
@@ -43,24 +43,24 @@ ktx sl <subcommand> [options]
| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id <id>` | KTX connection id (required) | |
| `--connection-id <id>` | KTX connection id (required) | - |
### `sl query`
| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id <id>` | KTX connection id | |
| `--query-file <path>` | JSON semantic-layer query file | |
| `--measure <measure>` | Measure to query; repeatable (at least one required) | |
| `--dimension <dimension>` | Dimension to include; repeatable | |
| `--filter <filter>` | Filter expression; repeatable | |
| `--segment <segment>` | Segment to include; repeatable | |
| `--order-by <field[:direction]>` | Order field, optionally suffixed with `:asc` or `:desc`; repeatable | |
| `--limit <n>` | Query limit | |
| `--connection-id <id>` | KTX connection id | - |
| `--query-file <path>` | JSON semantic-layer query file | - |
| `--measure <measure>` | Measure to query; repeatable (at least one required) | - |
| `--dimension <dimension>` | Dimension to include; repeatable | - |
| `--filter <filter>` | Filter expression; repeatable | - |
| `--segment <segment>` | Segment to include; repeatable | - |
| `--order-by <field[:direction]>` | Order field, optionally suffixed with `:asc` or `:desc`; repeatable | - |
| `--limit <n>` | Query limit | - |
| `--include-empty` | Include empty rows | `false` |
| `--format <format>` | Output format: `json` or `sql` | `json` |
| `--execute` | Execute the compiled query against the database | `false` |
| `--max-rows <n>` | Maximum rows to return when executing | |
| `--max-rows <n>` | Maximum rows to return when executing | - |
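The `--order-by <field[:direction]>` syntax in the table can be illustrated with a small parser. This is a sketch only, not the KTX implementation: `parseOrderBy` and the `OrderBy` shape are hypothetical. It assumes that field names contain no colon, and that an unrecognized suffix is an error.

```typescript
type OrderDirection = "asc" | "desc";

interface OrderBy {
  field: string;
  direction?: OrderDirection; // left undefined when no suffix is given
}

// Hypothetical parser for one `--order-by` value such as "revenue:desc".
// Assumption: field names contain no colon, so the last colon starts the suffix.
function parseOrderBy(value: string): OrderBy {
  const idx = value.lastIndexOf(":");
  if (idx === -1) {
    if (value === "") throw new Error("--order-by field must not be empty");
    return { field: value };
  }
  const field = value.slice(0, idx);
  const suffix = value.slice(idx + 1);
  if (suffix === "asc" || suffix === "desc") {
    if (field === "") throw new Error("--order-by field must not be empty");
    return { field, direction: suffix };
  }
  throw new Error(`invalid --order-by value: ${value}`);
}

// The flag is repeatable, so each occurrence is parsed independently.
const ordering = ["revenue:desc", "region"].map(parseOrderBy);
```

Leaving `direction` undefined when no suffix is given avoids guessing at a default sort order that the table does not state.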
## Examples

@@ -18,7 +18,7 @@ ktx status [options]
| Flag | Description | Default |
|------|-------------|---------|
| `--json` | Print JSON output | `false` |
| `--no-input` | Disable interactive terminal input | |
| `--no-input` | Disable interactive terminal input | - |
## Examples

@@ -33,7 +33,7 @@ ktx wiki <subcommand> [options]
|------|-------------|---------|
| `--json` | Print JSON output | `false` |
| `--user-id <id>` | Local user id | `local` |
| `--limit <number>` | Maximum search results | |
| `--limit <number>` | Maximum search results | - |
## Examples

@@ -3,7 +3,7 @@ title: Contributing
description: How to contribute to KTX.
---
KTX is an open-source project and welcomes contributions bug fixes, new connectors, documentation improvements, and feature proposals. This page covers how to set up a development environment, navigate the repository, run tests, and submit changes.
KTX is an open-source project and welcomes contributions - bug fixes, new connectors, documentation improvements, and feature proposals. This page covers how to set up a development environment, navigate the repository, run tests, and submit changes.
## Development setup
@@ -14,9 +14,9 @@ an analytics project, use the published
### Prerequisites
- **Node.js 22+** and **pnpm** for the TypeScript workspace
- **Python 3.11+** and **uv** for the Python semantic layer and daemon
- **Git** for version control
- **Node.js 22+** and **pnpm** - for the TypeScript workspace
- **Python 3.11+** and **uv** - for the Python semantic layer and daemon
- **Git** - for version control
### Clone and install
@@ -72,8 +72,8 @@ packages/
connector-posthog/ # PostHog connector
python/
ktx-sl/ # Semantic layer grain-aware query planning and SQL generation
ktx-daemon/ # Daemon portable API server around the semantic layer
ktx-sl/ # Semantic layer - grain-aware query planning and SQL generation
ktx-daemon/ # Daemon - portable API server around the semantic layer
examples/ # Example projects and fixtures
scripts/ # Workspace scripts (benchmarks, verification, release)
@@ -179,17 +179,17 @@ The `package.json` should follow the pattern of existing connectors:
Your connector class must implement `KtxScanConnector`, which requires:
- **`id`** a string identifier, typically `"<driver>:<connectionId>"`
- **`driver`** the `KtxConnectionDriver` value for your database
- **`capabilities`** a `KtxConnectorCapabilities` object declaring what your connector supports: `tableSampling`, `columnSampling`, `columnStats`, `readOnlySql`, `nestedAnalysis`, `eventStreamDiscovery`, `formalForeignKeys`, `estimatedRowCounts`
- **`introspect()`** discovers tables, columns, types, and constraints, returning a `KtxSchemaSnapshot`
- **`id`** - a string identifier, typically `"<driver>:<connectionId>"`
- **`driver`** - the `KtxConnectionDriver` value for your database
- **`capabilities`** - a `KtxConnectorCapabilities` object declaring what your connector supports: `tableSampling`, `columnSampling`, `columnStats`, `readOnlySql`, `nestedAnalysis`, `eventStreamDiscovery`, `formalForeignKeys`, `estimatedRowCounts`
- **`introspect()`** - discovers tables, columns, types, and constraints, returning a `KtxSchemaSnapshot`
Optional methods for richer scanning:
- **`sampleColumn()`** sample values from a specific column
- **`sampleTable()`** sample rows from a table
- **`columnStats()`** compute column statistics
- **`executeReadOnly()`** execute arbitrary read-only SQL
- **`sampleColumn()`** - sample values from a specific column
- **`sampleTable()`** - sample rows from a table
- **`columnStats()`** - compute column statistics
- **`executeReadOnly()`** - execute arbitrary read-only SQL
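The required members listed above can be sketched as a class. The type definitions below are local stand-ins written for illustration: the real `KtxScanConnector`, `KtxConnectorCapabilities`, `KtxSchemaSnapshot`, and `KtxConnectionDriver` exported by the KTX packages may have different shapes. The boolean capability flags, the snapshot fields, the async `introspect()` signature, and the `exampledb` driver are all assumptions.

```typescript
// Local stand-ins for the KTX types named above; the real definitions may differ.
type KtxConnectionDriver = string;

interface KtxConnectorCapabilities {
  tableSampling: boolean;
  columnSampling: boolean;
  columnStats: boolean;
  readOnlySql: boolean;
  nestedAnalysis: boolean;
  eventStreamDiscovery: boolean;
  formalForeignKeys: boolean;
  estimatedRowCounts: boolean;
}

// Assumed minimal snapshot shape, for illustration only.
interface KtxSchemaSnapshot {
  tables: { name: string; columns: { name: string; type: string }[] }[];
}

interface KtxScanConnector {
  id: string;
  driver: KtxConnectionDriver;
  capabilities: KtxConnectorCapabilities;
  introspect(): Promise<KtxSchemaSnapshot>;
}

// Hypothetical connector that implements only the required members.
class ExampleConnector implements KtxScanConnector {
  readonly id: string;
  readonly driver: KtxConnectionDriver = "exampledb"; // hypothetical driver name
  readonly capabilities: KtxConnectorCapabilities = {
    tableSampling: false,
    columnSampling: false,
    columnStats: false,
    readOnlySql: false,
    nestedAnalysis: false,
    eventStreamDiscovery: false,
    formalForeignKeys: false,
    estimatedRowCounts: false,
  };

  constructor(connectionId: string) {
    // Follows the "<driver>:<connectionId>" pattern described above.
    this.id = `${this.driver}:${connectionId}`;
  }

  async introspect(): Promise<KtxSchemaSnapshot> {
    // A real connector would query the database catalog here.
    return {
      tables: [{ name: "orders", columns: [{ name: "id", type: "integer" }] }],
    };
  }
}

const connector = new ExampleConnector("demo");
```

Declaring every capability as `false` keeps the sketch honest about what it supports. A connector that adds one of the optional methods would presumably set the matching flag to `true`.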
### Step 3: Add a dialect
@@ -212,7 +212,7 @@ Use `packages/connector-sqlite/` as a minimal reference and `packages/connector-
## Code conventions
- **TypeScript**: strict types, no `any`, no `as unknown as`. Use `zod` schemas for runtime validation at CLI and config boundaries. Follow the `camelCaseSchema` / `PascalCaseType` naming convention for Zod schemas and inferred types.
- **Python**: type hints on all new code, `pathlib` over `os.path`, explicit exception types over broad `except Exception`, `logger.exception()` for caught exceptions. Use `sqlglot` for SQL parsing never regex.
- **Python**: type hints on all new code, `pathlib` over `os.path`, explicit exception types over broad `except Exception`, `logger.exception()` for caught exceptions. Use `sqlglot` for SQL parsing - never regex.
- **Dependencies**: `pnpm` for Node packages (never `npm` or `bun`), `uv` for Python (never `pip`).
- **Dead code**: remove it. Don't leave commented-out code, unused wrappers, or empty directories.
@@ -220,11 +220,11 @@ Use `packages/connector-sqlite/` as a minimal reference and `packages/connector-
Before submitting a pull request:
1. **Run the relevant checks** at minimum, `pnpm run type-check` and `pnpm run test` for TypeScript changes, `uv run pytest -q` and `uv run pre-commit run --files [FILES]` for Python changes.
2. **Build if you changed exports** run `pnpm run build` to verify package exports and `dist/` expectations still align.
3. **Keep changes focused** one logical change per PR. Don't bundle unrelated refactors.
4. **Follow existing patterns** match the style and conventions of surrounding code. The codebase favors explicit over clever.
5. **Don't commit artifacts** `node_modules/`, `.venv/`, `dist/`, coverage output, and local databases should not be committed.
1. **Run the relevant checks** - at minimum, `pnpm run type-check` and `pnpm run test` for TypeScript changes, `uv run pytest -q` and `uv run pre-commit run --files [FILES]` for Python changes.
2. **Build if you changed exports** - run `pnpm run build` to verify package exports and `dist/` expectations still align.
3. **Keep changes focused** - one logical change per PR. Don't bundle unrelated refactors.
4. **Follow existing patterns** - match the style and conventions of surrounding code. The codebase favors explicit over clever.
5. **Don't commit artifacts** - `node_modules/`, `.venv/`, `dist/`, coverage output, and local databases should not be committed.
For larger features or architectural changes, open an issue first to discuss the approach.

@@ -1,23 +1,23 @@
---
title: Context as Code
description: Treat analytics context like code version it, review it, merge it.
description: Treat analytics context like code - version it, review it, merge it.
---
## The idea
dbt proved that analytics transformations belong in version control. Before dbt, SQL lived in BI tools, scheduling systems, and spreadsheets scattered, unreviewed, impossible to audit. "Analytics as code" changed that: put your models in git, review them in PRs, deploy them by merging.
dbt proved that analytics transformations belong in version control. Before dbt, SQL lived in BI tools, scheduling systems, and spreadsheets - scattered, unreviewed, impossible to audit. "Analytics as code" changed that: put your models in git, review them in PRs, deploy them by merging.
KTX applies the same principle to analytics context. Metric definitions, business rules, join relationships, wiki pages these are artifacts that determine whether an agent produces correct results. They change over time. They need review. They need history. They need to be treated like code.
KTX applies the same principle to analytics context. Metric definitions, business rules, join relationships, wiki pages - these are artifacts that determine whether an agent produces correct results. They change over time. They need review. They need history. They need to be treated like code.
A KTX project is a git repository. Semantic sources are YAML files. Wiki pages are Markdown files. Changes are commits. Updates are pull requests. Deployment is a merge. The entire lifecycle of your analytics context follows the same workflow your team already uses for dbt models, application code, and infrastructure.
## Auto-ingestion
Most analytics context already exists it's in your dbt manifests, LookML models, Metabase questions, and team Notion pages. KTX pulls from these sources automatically through adapters.
Most analytics context already exists - it's in your dbt manifests, LookML models, Metabase questions, and team Notion pages. KTX pulls from these sources automatically through adapters.
An ingestion run works like this:
1. **Adapters extract metadata.** Each configured source — dbt, LookML, Metabase, MetricFlow, Notion, or your live database — provides structured metadata about models, metrics, dimensions, questions, and documentation.
1. **Adapters extract metadata.** Each configured source - dbt, LookML, Metabase, MetricFlow, Notion, or your live database - provides structured metadata about models, metrics, dimensions, questions, and documentation.
2. **The LLM agent reconciles.** KTX doesn't blindly overwrite existing context. An LLM agent compares incoming metadata against your current semantic sources and wiki pages. It decides what to create, what to update, and what to leave alone. If your dbt project added a new model, the agent writes a new semantic source. If a Metabase question references a metric you've already defined, the agent skips the duplicate.
@@ -66,17 +66,17 @@ metadata, and documentation updates are ready for review each morning.
Once merged, agents querying through the KTX CLI see the updated context immediately. No deployment step, no cache invalidation, no restart. The files are the source of truth, and agents read them on every request.
This workflow gives you the same review guarantees you have for dbt models. No semantic source reaches production without a human approving it. But unlike maintaining context manually, the heavy lifting — discovering new tables, drafting source definitions, extracting business rules from documentation — is done by the ingestion agent. You review and approve. You don't write from scratch.
This workflow gives you the same review guarantees you have for dbt models. No semantic source reaches production without a human approving it. But unlike maintaining context manually, the heavy lifting - discovering new tables, drafting source definitions, extracting business rules from documentation - is done by the ingestion agent. You review and approve. You don't write from scratch.
## Feedback loops
Context improves over time through two feedback channels.
**Analyst corrections.** When an analytics engineer spots something wrong a measure formula that doesn't match the business definition, a join that should be `many_to_one` instead of `one_to_many`, a wiki page that's out of date they edit the YAML or Markdown directly and commit. These corrections become part of the project's git history, and the next ingestion run respects them. If you manually fix a measure definition, KTX won't overwrite it on the next ingest.
**Analyst corrections.** When an analytics engineer spots something wrong - a measure formula that doesn't match the business definition, a join that should be `many_to_one` instead of `one_to_many`, a wiki page that's out of date - they edit the YAML or Markdown directly and commit. These corrections become part of the project's git history, and the next ingestion run respects them. If you manually fix a measure definition, KTX won't overwrite it on the next ingest.
**Agent feedback.** When an agent queries the semantic layer and gets unexpected results a query that returns no rows because of a bad filter, a join path that produces duplicated results it can flag the issue. These signals feed back into the context: wiki pages can note known data quality issues, and source definitions can be tightened with better filters, join paths, or grain declarations.
**Agent feedback.** When an agent queries the semantic layer and gets unexpected results - a query that returns no rows because of a bad filter, a join path that produces duplicated results - it can flag the issue. These signals feed back into the context: wiki pages can note known data quality issues, and source definitions can be tightened with better filters, join paths, or grain declarations.
Each of these channels makes the next ingestion cycle better. Analyst corrections teach the system what your team considers authoritative. Agent feedback surfaces gaps in coverage. Context is not a static artifact it's a living system that converges toward accuracy with every iteration.
Each of these channels makes the next ingestion cycle better. Analyst corrections teach the system what your team considers authoritative. Agent feedback surfaces gaps in coverage. Context is not a static artifact - it's a living system that converges toward accuracy with every iteration.
## Deterministic replay
@@ -84,9 +84,9 @@ Every ingestion session in KTX produces a full transcript: every tool call the L
This matters for three reasons.
**Debugging.** When a semantic source looks wrong the grain is off, a join points to the wrong table, a measure formula doesn't match the business definition you can trace it back to the ingestion session that created it. The transcript shows exactly which adapter provided the input, how the LLM interpreted it, and why it made the decision it did. You don't have to guess.
**Debugging.** When a semantic source looks wrong - the grain is off, a join points to the wrong table, a measure formula doesn't match the business definition - you can trace it back to the ingestion session that created it. The transcript shows exactly which adapter provided the input, how the LLM interpreted it, and why it made the decision it did. You don't have to guess.
**Trust.** Analytics teams need to trust the context that agents consume. Deterministic replay means you can verify any part of the context layer by re-examining the session that produced it. If a stakeholder asks "where did this revenue definition come from?", you have a complete audit trail from the dbt manifest entry, through the LLM's reconciliation logic, to the YAML file that was written.
**Trust.** Analytics teams need to trust the context that agents consume. Deterministic replay means you can verify any part of the context layer by re-examining the session that produced it. If a stakeholder asks "where did this revenue definition come from?", you have a complete audit trail - from the dbt manifest entry, through the LLM's reconciliation logic, to the YAML file that was written.
**Reproducibility.** Because ingestion sessions are recorded as structured transcripts (tool calls and responses, not just logs), they can be replayed for testing and validation. If you change your ingestion configuration or upgrade the LLM, you can replay previous sessions to see how the output would differ. This gives you a safety net for changes that affect how context is generated.

@@ -5,7 +5,7 @@ description: What a context layer is, why agents need one, and how KTX compares
## The problem
Give an agent access to your database and it will generate SQL. It might even produce a decent chart. But ask it a real analytics question — "what's our net revenue trend by segment?" — and things fall apart.
Give an agent access to your database and it will generate SQL. It might even produce a decent chart. But ask it a real analytics question - "what's our net revenue trend by segment?" - and things fall apart.
The agent doesn't know that `orders.amount` includes refunds and needs a status filter. It doesn't know that `customers` should join to `orders` on `customer_id`, not `id`. It doesn't know that your team stopped using `legacy_segments` six months ago, or that "enterprise" means contracts over $100k, not just big logos. It sees column names and types. It doesn't see your business.
@@ -17,15 +17,15 @@ Analytics engineers already know this pain. It's the same reason you write dbt t
The industry has moved through three distinct approaches to getting AI and data to work together.
**Wave one: database access.** Connect an LLM to a database, let it generate SQL. This works for simple lookups — "how many orders last week?" — but breaks on anything that requires business knowledge. The agent guesses at joins, invents metrics, and hallucinates table relationships. Every query is a coin flip.
**Wave one: database access.** Connect an LLM to a database, let it generate SQL. This works for simple lookups - "how many orders last week?" - but breaks on anything that requires business knowledge. The agent guesses at joins, invents metrics, and hallucinates table relationships. Every query is a coin flip.
**Wave two: semantic layers and text-to-SQL.** Add structure. Define metrics in MetricFlow or Cube, expose schemas, build text-to-SQL pipelines. This is better — the agent knows that `revenue` means `sum(amount) where status != 'refunded'` — but building and maintaining that structure by hand is manual, time-consuming, and still limited. Semantic layers define what to calculate, not why, when, or how to interpret the result. The agent can compute net revenue but doesn't know about the February refund anomaly, the segment reclassification, or the fact that `enterprise` changed definition last quarter.
**Wave two: semantic layers and text-to-SQL.** Add structure. Define metrics in MetricFlow or Cube, expose schemas, build text-to-SQL pipelines. This is better - the agent knows that `revenue` means `sum(amount) where status != 'refunded'` - but building and maintaining that structure by hand is manual, time-consuming, and still limited. Semantic layers define what to calculate, not why, when, or how to interpret the result. The agent can compute net revenue but doesn't know about the February refund anomaly, the segment reclassification, or the fact that `enterprise` changed definition last quarter.
**Wave three: agentic context.** AI is no longer just answering questions it's generating dashboards, writing semantic definitions, proposing dbt models, creating tests and documentation. For that to work, agents need more than metric definitions. They need the full picture: business rules, known data quality issues, relationship maps, historical context, and the institutional knowledge that lives in your team's heads. They need a context layer.
**Wave three: agentic context.** AI is no longer just answering questions - it's generating dashboards, writing semantic definitions, proposing dbt models, creating tests and documentation. For that to work, agents need more than metric definitions. They need the full picture: business rules, known data quality issues, relationship maps, historical context, and the institutional knowledge that lives in your team's heads. They need a context layer.
## What a context layer is
A context layer is the infrastructure that gives agents the business knowledge they need to produce correct analytics artifacts. It includes a semantic layer — that's a critical component — but it's not the whole thing.
A context layer is the infrastructure that gives agents the business knowledge they need to produce correct analytics artifacts. It includes a semantic layer - that's a critical component - but it's not the whole thing.
KTX organizes context into four pillars:
@@ -67,7 +67,7 @@ measures:
expr: count(id)
```
**Wiki pages** are Markdown documents that capture business definitions, rules, and operating context the kind of context that doesn't fit in a schema definition. Pages have structured frontmatter (summary, tags, semantic layer references) and free-form content. Agents search them when they need to understand why a metric works a certain way, not just how to compute it.
**Wiki pages** are Markdown documents that capture business definitions, rules, and operating context - the kind of context that doesn't fit in a schema definition. Pages have structured frontmatter (summary, tags, semantic layer references) and free-form content. Agents search them when they need to understand why a metric works a certain way, not just how to compute it.
```markdown
---
@@ -91,9 +91,9 @@ canonical revenue reporting.
**Scan artifacts** are the raw output of KTX's database scanner: table and column metadata, inferred foreign key relationships (even without declared constraints), column statistics, and enrichment reports. They form the foundation that semantic sources are built on.
**Provenance** is the record of how context was created and changed. Every ingestion session records a full transcript which adapter ran, what the LLM decided, which sources were created or updated, and why. This is what makes the system auditable: you can trace any semantic source back to the ingestion decision that created it.
**Provenance** is the record of how context was created and changed. Every ingestion session records a full transcript - which adapter ran, what the LLM decided, which sources were created or updated, and why. This is what makes the system auditable: you can trace any semantic source back to the ingestion decision that created it.
Together, these four pillars give agents enough context to produce analytics artifacts that match what your team would produce not just syntactically valid SQL, but the right query for the question.
Together, these four pillars give agents enough context to produce analytics artifacts that match what your team would produce - not just syntactically valid SQL, but the right query for the question.
## How KTX compares
@@ -115,7 +115,7 @@ If you do not have a semantic layer, KTX can build an agent-native one from your
## The plain-files philosophy
A KTX project is a directory of plain files. No server to run, no database to manage, no proprietary store to back up. Everything is YAML, Markdown, and SQLite formats you can read, diff, and version-control with tools you already use.
A KTX project is a directory of plain files. No server to run, no database to manage, no proprietary store to back up. Everything is YAML, Markdown, and SQLite - formats you can read, diff, and version-control with tools you already use.
```
my-project/
@@ -140,7 +140,7 @@ my-project/
└── cache/ # Runtime cache (git-ignored)
```
Semantic sources and wiki pages are committed to git. The SQLite database holds ephemeral state — schema ingest results, embedding indexes, session logs — and is git-ignored. If you delete it, KTX rebuilds it on the next run.
Semantic sources and wiki pages are committed to git. The SQLite database holds ephemeral state - schema ingest results, embedding indexes, session logs - and is git-ignored. If you delete it, KTX rebuilds it on the next run.
This means your analytics context travels with your code. You can fork it, branch it, review it in a PR, and merge it with the same tools you use for dbt models. There's no sync problem between a remote server and your local state. There's no migration to run. The files are the source of truth.

@@ -51,7 +51,7 @@ description: How KTX gives analytics agents trusted context for warehouse work.
## Who KTX is for
KTX is built for analytics engineers and data teams who want data agents to
work on real analytics systems not just generate one-off SQL.
work on real analytics systems - not just generate one-off SQL.
Use KTX when you want agents to:

@@ -3,7 +3,7 @@ title: Quickstart
description: Set up KTX and build your first context in under 10 minutes.
---
This guide walks you through `ktx setup` an interactive wizard that configures your LLM provider, connects your database, optionally ingests from your existing tools, builds context, and installs agent integration.
This guide walks you through `ktx setup` - an interactive wizard that configures your LLM provider, connects your database, optionally ingests from your existing tools, builds context, and installs agent integration.
If you are a coding assistant trying to decide which KTX docs page to read, start with the [Agent Quickstart](/docs/ai-resources/agent-quickstart). This page is the human setup walkthrough.
@@ -11,8 +11,8 @@ If you are a coding assistant trying to decide which KTX docs page to read, star
Use this sequence when you are setting up KTX in an analytics project:
1. `npm install -g @kaelio/ktx` install the published KTX CLI from npm.
2. `ktx setup` create or resume a KTX project.
1. `npm install -g @kaelio/ktx` - install the published KTX CLI from npm.
2. `ktx setup` - create or resume a KTX project.
The setup wizard is stateful. If it exits before completion, rerun `ktx setup` in the same project directory to resume from the first incomplete step.
@@ -118,7 +118,7 @@ under `connections.<id>.context.queryHistory` in `ktx.yaml`.
## Step 4: Add context sources
Context sources let KTX ingest metadata from your existing analytics tools. This step is optional you can skip it and add sources later.
Context sources let KTX ingest metadata from your existing analytics tools. This step is optional - you can skip it and add sources later.
```
◆ Which context sources should KTX ingest?
@@ -248,7 +248,7 @@ Agent integration ready: yes (claude-code:project)
## Next steps
- **Build more context** learn about [database ingest](/docs/guides/building-context), relationship detection, and source ingestion workflows in the Building Context guide.
- **Refine your semantic layer** the [Writing Context](/docs/guides/writing-context) guide covers source YAML, measures, joins, and wiki pages.
- **Understand the architecture** read [The Context Layer](/docs/concepts/the-context-layer) to learn why a context layer is more than a semantic layer.
- **Connect more agents** see the [Agent Clients](/docs/integrations/agent-clients) integration page for per-tool setup details.
- **Build more context** - learn about [database ingest](/docs/guides/building-context), relationship detection, and source ingestion workflows in the Building Context guide.
- **Refine your semantic layer** - the [Writing Context](/docs/guides/writing-context) guide covers source YAML, measures, joins, and wiki pages.
- **Understand the architecture** - read [The Context Layer](/docs/concepts/the-context-layer) to learn why a context layer is more than a semantic layer.
- **Connect more agents** - see the [Agent Clients](/docs/integrations/agent-clients) integration page for per-tool setup details.

@@ -50,13 +50,13 @@ ktx status
### Relationship detection
Many databases lack declared foreign keys. KTX infers relationships by scoring column pairs across seven signals name similarity, type compatibility, value overlap, embedding similarity, profile uniqueness, null rate, and structural priors. The weighted score determines each candidate's status:
Many databases lack declared foreign keys. KTX infers relationships by scoring column pairs across seven signals - name similarity, type compatibility, value overlap, embedding similarity, profile uniqueness, null rate, and structural priors. The weighted score determines each candidate's status:
| Score range | Status | Meaning |
|-------------|--------|---------|
| &ge; 0.85 | `accepted` | High confidence applied automatically |
| 0.55 &ndash; 0.84 | `review` | Plausible needs human review |
| &lt; 0.55 | `rejected` | Low confidence not applied |
| &ge; 0.85 | `accepted` | High confidence - applied automatically |
| 0.55 &ndash; 0.84 | `review` | Plausible - needs human review |
| &lt; 0.55 | `rejected` | Low confidence - not applied |
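The score-to-status mapping in the table can be sketched as follows. This is an illustration under stated assumptions, not KTX's scoring code. `weightedScore` and `classifyRelationship` are hypothetical names, the signal weights are supplied by the caller because the real weights are not given here, and scores that fall between 0.84 and 0.85 are assumed to count as `review`.

```typescript
// Status labels taken from the table above.
type RelationshipStatus = "accepted" | "review" | "rejected";

// Hypothetical helper: weighted mean of per-signal scores in [0, 1].
// Signal names and weights are illustrative, not KTX's actual values.
function weightedScore(
  signals: Record<string, number>,
  weights: Record<string, number>,
): number {
  let total = 0;
  let weightSum = 0;
  for (const [name, value] of Object.entries(signals)) {
    const weight = weights[name] ?? 0; // signals without a weight are ignored
    total += weight * value;
    weightSum += weight;
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

// Maps a score to a status using the documented cut-offs.
// Assumption: anything from 0.55 up to, but not including, 0.85 is "review".
function classifyRelationship(score: number): RelationshipStatus {
  if (score >= 0.85) return "accepted";
  if (score >= 0.55) return "review";
  return "rejected";
}

// Example with two of the seven signals and equal, made-up weights.
const score = weightedScore(
  { nameSimilarity: 1, valueOverlap: 0 },
  { nameSimilarity: 1, valueOverlap: 1 },
);
const status = classifyRelationship(score);
```

Keeping the thresholds in one function makes the cut-offs easy to adjust if calibration changes them.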
Deep database ingest can include relationship evidence where the connector can
provide it. Relationship review and calibration subcommands are not part of the
@@ -64,14 +64,14 @@ current public CLI surface.
## Ingestion
Ingestion pulls semantic context from your existing analytics tools — dbt projects, Looker models, Metabase questions, and more — and writes it into your KTX project as semantic sources and wiki pages.
Ingestion pulls semantic context from your existing analytics tools - dbt projects, Looker models, Metabase questions, and more - and writes it into your KTX project as semantic sources and wiki pages.
### How it works
Each ingest run follows this flow:
1. An **adapter** extracts metadata from your tool (dbt manifest, LookML files, Metabase API, etc.)
2. An **LLM agent** reconciles the extracted metadata with your existing context it merges intelligently rather than overwriting
2. An **LLM agent** reconciles the extracted metadata with your existing context - it merges intelligently rather than overwriting
3. **Semantic sources** (YAML) and **wiki pages** (Markdown) are written to your project directory
### Running an ingest


@ -3,22 +3,22 @@ title: Writing Context
description: Write and refine semantic sources and wiki pages.
---
After building context through scanning and ingestion, you'll want to refine it edit semantic sources to match your business logic, add wiki pages that capture tribal knowledge, and query your data through the semantic layer to verify everything works.
After building context through scanning and ingestion, you'll want to refine it - edit semantic sources to match your business logic, add wiki pages that capture tribal knowledge, and query your data through the semantic layer to verify everything works.
## Agent workflow summary
Agents should refine context in this order:
1. `ktx sl list --json` discover available sources and connection ids.
2. `ktx sl search <query> --json` find source candidates for a concept.
1. `ktx sl list --json` - discover available sources and connection ids.
2. `ktx sl search <query> --json` - find source candidates for a concept.
3. Edit the source YAML directly in `semantic-layer/<connection-id>/`.
4. `ktx sl validate <source> --connection-id <id>` verify columns, joins, and table references.
5. `ktx sl query ... --format sql` compile a representative query without executing it.
6. `ktx wiki search ...` check business context captured by ingest or memory.
4. `ktx sl validate <source> --connection-id <id>` - verify columns, joins, and table references.
5. `ktx sl query ... --format sql` - compile a representative query without executing it.
6. `ktx wiki search ...` - check business context captured by ingest or memory.
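
For scripted use, the command sequence above might be assembled like this. It is a minimal sketch: `refinement_commands` and its parameters are hypothetical names, the `--measure` and `--dimension` flags are taken from the query example later in this guide, and step 3 (editing the YAML) stays manual. The script only prints the commands; it does not run `ktx`.

```python
import shlex


def refinement_commands(concept, source, connection_id, measure, dimension):
    """Build the CLI invocations for the refinement workflow as argument lists.

    Hypothetical helper: only the ktx subcommands and flags shown in this
    guide are used, and nothing is executed here.
    """
    return [
        ["ktx", "sl", "list", "--json"],
        ["ktx", "sl", "search", concept, "--json"],
        # Step 3 (edit the source YAML) happens by hand between these calls.
        ["ktx", "sl", "validate", source, "--connection-id", connection_id],
        ["ktx", "sl", "query", "--connection-id", connection_id,
         "--measure", measure, "--dimension", dimension, "--format", "sql"],
        ["ktx", "wiki", "search", concept],
    ]


for cmd in refinement_commands(
    "revenue recognition", "orders", "my-postgres", "total_revenue", "order_date"
):
    # shlex.join quotes arguments that contain spaces
    print(shlex.join(cmd))
```

Each argument list can be passed to `subprocess.run` once `ktx` is on the `PATH`; keeping arguments as lists avoids shell-quoting problems with multi-word search queries.
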
## Semantic Sources
Semantic sources are YAML files that describe your tables, columns, measures, and joins. They're the core of the context layer the structured definitions that agents use to generate correct SQL.
Semantic sources are YAML files that describe your tables, columns, measures, and joins. They're the core of the context layer - the structured definitions that agents use to generate correct SQL.
### Listing sources
@ -44,7 +44,7 @@ YAML file under `semantic-layer/<connection-id>/`.
### The source schema
A semantic source defines a single queryable entity usually a table or a SQL expression. Here's a fully annotated example:
A semantic source defines a single queryable entity - usually a table or a SQL expression. Here's a fully annotated example:
```yaml
name: orders
@ -146,7 +146,7 @@ Column visibility controls what agents see:
|------------|----------|
| `public` | Included in agent queries and listings (default) |
| `internal` | Available for joins and measures but not shown to agents |
| `hidden` | Excluded entirely useful for ETL columns |
| `hidden` | Excluded entirely - useful for ETL columns |
### Editing a source
@ -161,7 +161,7 @@ Validation checks a source definition against the actual database schema:
ktx sl validate orders --connection-id my-postgres
```
This catches mismatches — columns that don't exist in the table, type mismatches, invalid join targets — before an agent tries to use the source.
This catches mismatches - columns that don't exist in the table, type mismatches, invalid join targets - before an agent tries to use the source.
### Querying
@ -207,20 +207,20 @@ Query flags:
| `--max-rows <n>` | Maximum rows to return when executing |
| `--include-empty` | Include empty/null rows in results |
The query planner is grain-aware it understands the cardinality of joins and avoids chasm traps (double-counting caused by many-to-many fan-outs). When you query measures that span multiple sources, KTX generates sub-queries at the correct grain before joining.
The query planner is grain-aware - it understands the cardinality of joins and avoids chasm traps (double-counting caused by many-to-many fan-outs). When you query measures that span multiple sources, KTX generates sub-queries at the correct grain before joining.
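
To see why aggregating at the correct grain matters, here is a toy illustration in plain Python. It is not KTX's planner: the `orders` and `tickets` data and all variable names are made up for the example. Two fact sets share a customer key, and joining them before aggregating inflates the revenue total.

```python
from collections import defaultdict

# Hypothetical toy data: two fact sets that share a customer key.
orders = [("c1", 100), ("c1", 50)]                   # (customer_id, amount)
tickets = [("c1", "t1"), ("c1", "t2"), ("c1", "t3")]  # (customer_id, ticket_id)

# Naive approach: join orders to tickets on customer_id, then sum.
# Every order row is repeated once per ticket, so revenue is triple-counted.
naive_revenue = sum(
    amount
    for order_customer, amount in orders
    for ticket_customer, _ in tickets
    if order_customer == ticket_customer
)

# Grain-aware approach: aggregate each fact set to customer grain first,
# then join the two pre-aggregated results on the shared key.
revenue_by_customer = defaultdict(int)
for customer, amount in orders:
    revenue_by_customer[customer] += amount

tickets_by_customer = defaultdict(int)
for customer, _ in tickets:
    tickets_by_customer[customer] += 1

joined = {
    customer: (revenue_by_customer[customer], tickets_by_customer[customer])
    for customer in revenue_by_customer
}

print(naive_revenue)  # 450: inflated by the fan-out
print(joined)         # {'c1': (150, 3)}: correct revenue and ticket count
```
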
### Workflow: edit and validate a source
1. Open `semantic-layer/my-postgres/orders.yaml`.
2. Edit the file to add columns, measures, joins, or descriptions.
3. `ktx sl validate orders --connection-id my-postgres` check the definition against the live schema.
4. `ktx sl query --connection-id my-postgres --measure total_revenue --dimension order_date --format sql` compile a representative query.
3. `ktx sl validate orders --connection-id my-postgres` - check the definition against the live schema.
4. `ktx sl query --connection-id my-postgres --measure total_revenue --dimension order_date --format sql` - compile a representative query.
If validation fails, fix the YAML before asking an agent to use the source. Common validation failures are missing columns, invalid join targets, and measure expressions that reference fields outside the source.
## Wiki Pages
Wiki pages are Markdown files that capture business context definitions, rules, gotchas, and anything an agent needs to understand beyond what the schema tells it.
Wiki pages are Markdown files that capture business context - definitions, rules, gotchas, and anything an agent needs to understand beyond what the schema tells it.
### What they are
@ -242,8 +242,8 @@ wiki/
└── known-data-issues.md
```
- **Global pages** apply across all connections business definitions, metric standards, company terminology.
- **User-scoped pages** are private to a user ID personal notes, local gotchas, or context you do not want shared globally.
- **Global pages** apply across all connections - business definitions, metric standards, company terminology.
- **User-scoped pages** are private to a user ID - personal notes, local gotchas, or context you do not want shared globally.
### Editing pages
@ -274,7 +274,7 @@ ktx wiki list
ktx wiki search "revenue recognition"
```
Search uses both full-text matching and semantic similarity it finds relevant pages even when the exact terms don't match. Agents call this automatically when they need business context to answer a question.
Search uses both full-text matching and semantic similarity - it finds relevant pages even when the exact terms don't match. Agents call this automatically when they need business context to answer a question.
### Workflow: add searchable business context


@ -3,7 +3,7 @@ title: Context Sources
description: Ingest semantic context from dbt, MetricFlow, LookML, Metabase, Looker, and Notion.
---
Context sources feed your existing analytics tooling into KTX. During ingestion, KTX extracts metadata from each source and uses an LLM agent to reconcile it with your existing semantic layer and knowledge base merging intelligently rather than overwriting.
Context sources feed your existing analytics tooling into KTX. During ingestion, KTX extracts metadata from each source and uses an LLM agent to reconcile it with your existing semantic layer and knowledge base - merging intelligently rather than overwriting.
All context sources are configured in `ktx.yaml` under `connections` with their respective `driver` value.
@ -250,7 +250,7 @@ mappings:
syncMode: ONLY # ONLY = restrict to mapped DBs
```
Find Metabase database IDs in **Admin > Databases** the ID is in the URL when editing a database.
Find Metabase database IDs in **Admin > Databases** - the ID is in the URL when editing a database.
---
@ -353,7 +353,7 @@ Create an integration at [notion.so/my-integrations](https://www.notion.so/my-in
| Field | Description | Default |
|-------|-------------|---------|
| `crawl_mode` | `all_accessible` or `selected_roots` | |
| `crawl_mode` | `all_accessible` or `selected_roots` | - |
| `root_page_ids` | Page IDs to crawl from (for `selected_roots`) | `[]` |
| `root_database_ids` | Database IDs to include | `[]` |
| `max_pages_per_run` | Pages processed per sync | `1000` |
@ -369,7 +369,7 @@ Create an integration at [notion.so/my-integrations](https://www.notion.so/my-in
### Notes
- Notion is knowledge-only it does not produce semantic layer sources
- Notion is knowledge-only - it does not produce semantic layer sources
- Rate limits apply; large workspaces may require multiple ingestion runs
- Incremental sync cursors are stored in `.ktx/db.sqlite`; don't add
`last_successful_cursor` to `ktx.yaml`


@ -154,9 +154,9 @@ For multiple schemas:
| Primary keys | Yes | Via table constraints |
| Foreign keys | No | Not available in Snowflake |
| Row count estimates | Yes | From `INFORMATION_SCHEMA.TABLES.ROW_COUNT` |
| Column statistics | No | |
| Column statistics | No | - |
| Query history | Yes | Via `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY` when enabled |
| Table sampling | Yes | |
| Table sampling | Yes | - |
### Query history
@ -228,12 +228,12 @@ mapping metadata. The BigQuery connector still authenticates with the
| Feature | Supported | Notes |
|---------|-----------|-------|
| Tables & views | Yes | Including materialized views and external tables |
| Primary keys | No | |
| Primary keys | No | - |
| Foreign keys | No | Not available in BigQuery |
| Row count estimates | Yes | From table metadata |
| Column statistics | No | |
| Column statistics | No | - |
| Query history | Yes | Via region-scoped `INFORMATION_SCHEMA.JOBS_BY_PROJECT` when enabled |
| Table sampling | Yes | |
| Table sampling | Yes | - |
### Query history
@ -307,9 +307,9 @@ connections:
| Primary keys | Yes | Via `system.columns` |
| Foreign keys | No | Not a ClickHouse concept |
| Row count estimates | Yes | Via `system.parts` aggregation |
| Column statistics | No | |
| Query history | No | |
| Table sampling | Yes | |
| Column statistics | No | - |
| Query history | No | - |
| Table sampling | Yes | - |
### Dialect notes
@ -364,8 +364,8 @@ connections:
| Primary keys | Yes | Via `KEY_COLUMN_USAGE` |
| Foreign keys | Yes | Via `REFERENTIAL_CONSTRAINTS` |
| Row count estimates | Yes | From `TABLE_ROWS` (InnoDB estimate) |
| Column statistics | No | |
| Query history | No | |
| Column statistics | No | - |
| Query history | No | - |
| Table sampling | Yes | Uses `RAND()` filter |
### Dialect notes
@ -430,10 +430,10 @@ For multiple schemas:
| Primary keys | Yes | Via `TABLE_CONSTRAINTS` and `KEY_COLUMN_USAGE` |
| Foreign keys | Yes | Via `REFERENTIAL_CONSTRAINTS` |
| Row count estimates | Yes | Via `sys.dm_db_partition_stats` |
| Column statistics | No | |
| Query history | No | |
| Table sampling | Yes | |
| Nested analysis | No | |
| Column statistics | No | - |
| Query history | No | - |
| Table sampling | Yes | - |
| Nested analysis | No | - |
### Dialect notes
@ -478,7 +478,7 @@ url: sqlite:///path/to/db.sqlite
### Authentication
No authentication required SQLite is file-based. The file must be readable by the process running KTX.
No authentication required - SQLite is file-based. The file must be readable by the process running KTX.
### Features
@ -488,10 +488,10 @@ No authentication required — SQLite is file-based. The file must be readable b
| Primary keys | Yes | Via `PRAGMA table_info()` |
| Foreign keys | Yes | Via `PRAGMA foreign_key_list()` (requires `PRAGMA foreign_keys = ON`) |
| Row count estimates | Yes | Exact count via `SELECT COUNT(*)` |
| Column statistics | No | |
| Query history | No | |
| Table sampling | Yes | |
| Nested analysis | No | |
| Column statistics | No | - |
| Query history | No | - |
| Table sampling | Yes | - |
| Nested analysis | No | - |
### Dialect notes