ktx/docs-site/content/docs/guides/building-context.mdx
Andrey Avtomonov b9e0a746af
feat(cli): clean up dev command surface (#57)
* feat(cli): clean up dev command surface

* test: align CI expectations with CLI cleanup

* test(cli): update slow test command expectations
2026-05-13 12:00:08 +02:00

180 lines
6.2 KiB
Text

---
title: Building Context
description: Scan your database schema and ingest context from dbt, Looker, Metabase, and more.
---
Building context is a two-step process. First, you **scan** your database to discover its structure — tables, columns, types, constraints, and relationships. Then you **ingest** from your existing tools to enrich that structure with semantic meaning — metric definitions, business descriptions, join logic, and knowledge that agents need to generate correct analytics.
## Scanning
Scanning connects to your database and extracts structural metadata. KTX stores the results locally so agents can understand your schema without querying the database directly.
### Running a scan
```bash
ktx scan <connection-id>
```
This runs a structural scan by default. You can control what the scan does with the `--mode` flag:
| Mode | What it does |
|------|-------------|
| `structural` | Tables, columns, types, constraints, row counts (default) |
| `enriched` | Structural scan plus LLM-generated column descriptions |
| `relationships` | Structural scan plus foreign key relationship detection |
```bash
# Scan with relationship detection
ktx scan my-postgres --mode relationships
# Preview without writing results
ktx scan my-postgres --dry-run
```
### Checking scan results
Every scan prints a summary and writes local artifacts. Use `ktx status` after a scan to review project readiness and follow-up setup work:
```bash
ktx status
```
### Relationship detection
Many databases lack declared foreign keys. KTX infers relationships by scoring column pairs across seven signals — name similarity, type compatibility, value overlap, embedding similarity, profile uniqueness, null rate, and structural priors. The weighted score determines each candidate's status:
| Score range | Status | Meaning |
|-------------|--------|---------|
| &ge; 0.85 | `accepted` | High confidence — applied automatically |
| 0.55 &ndash; 0.84 | `review` | Plausible — needs human review |
| &lt; 0.55 | `rejected` | Low confidence — not applied |
Relationship scans run with `ktx scan <connection-id> --mode relationships`. This command only executes the scan; relationship review and calibration subcommands are not part of the current CLI surface.
## Ingestion
Ingestion pulls semantic context from your existing analytics tools — dbt projects, Looker models, Metabase questions, and more — and writes it into your KTX project as semantic sources and knowledge pages.
### How it works
Each ingest run follows this flow:
1. An **adapter** extracts metadata from your tool (dbt manifest, LookML files, Metabase API, etc.)
2. An **LLM agent** reconciles the extracted metadata with your existing context — it merges intelligently rather than overwriting
3. **Semantic sources** (YAML) and **knowledge pages** (Markdown) are written to your project directory
### Running an ingest
```bash
ktx ingest run --connection-id my-dbt-source --adapter dbt
```
Useful low-level flags:
| Flag | Description |
|------|-------------|
| `--source-dir <path>` | Directory containing source files (e.g., your dbt project) |
| `--viz` | Render the memory-flow TUI for real-time progress |
| `--json` | Output as JSON |
| `--plain` | Plain text output |
### Watching progress
```bash
# Check status of the latest ingest
ktx ingest status
# Check a specific run
ktx ingest status <run-id>
# Open the visual ingest report (TUI)
ktx ingest watch
# Replay a past ingest run
ktx ingest replay <run-id>
```
The `watch` command opens an interactive TUI that shows the memory-flow output — every tool call, LLM decision, and artifact written during the ingest.
### Available adapters
| Adapter | Source | What gets ingested |
|---------|--------|--------------------|
| `dbt` | dbt project | Model definitions, column descriptions, tests, tags |
| `metricflow` | MetricFlow semantic models | Metrics, dimensions, entities, semantic joins |
| `lookml` | LookML files | Views, explores, dimensions, measures, joins |
| `looker` | Looker API | Explores, looks, dashboard metadata |
| `metabase` | Metabase API | Questions, dashboards, table metadata |
| `notion` | Notion API | Database pages, knowledge articles |
| `historic-sql` | Query history | Frequent queries, usage patterns, runtime stats |
| `live-database` | Direct DB connection | Live schema introspection |
See [Context Sources](/docs/integrations/context-sources) for adapter-specific setup and auth configuration.
### What gets generated
A typical dbt ingest produces semantic sources and knowledge pages in your project:
**Semantic source** (`semantic-layer/my-postgres/orders.yaml`):
```yaml title="semantic-layer/my-postgres/orders.yaml"
name: orders
table: public.orders
grain:
- order_id
columns:
- name: order_id
type: string
description: Unique order identifier
- name: customer_id
type: string
description: Foreign key to customers table
- name: order_date
type: time
role: time
description: Date the order was placed
- name: total_amount
type: number
description: Total order value in USD
measures:
- name: total_revenue
expr: SUM(total_amount)
description: Sum of all order values
- name: order_count
expr: COUNT(DISTINCT order_id)
description: Number of distinct orders
joins:
- to: customers
on: orders.customer_id = customers.customer_id
relationship: many_to_one
```
**Knowledge page** (`knowledge/global/order-status-definitions.md`):
```markdown
---
summary: Business definitions for order status values
tags: [orders, definitions]
sl_refs: [orders]
---
## Order Statuses
- **pending**: Order placed but not yet processed
- **confirmed**: Payment received, awaiting fulfillment
- **shipped**: Order dispatched to carrier
- **delivered**: Order received by customer
- **cancelled**: Order cancelled before shipment
Orders in "pending" status for more than 48 hours are flagged for review.
```
### Deterministic replay
Every ingest session records a full transcript — tool calls, LLM responses, and write decisions. You can replay any session to debug why a source was written a certain way:
```bash
ktx ingest replay <run-id> --viz
```
This opens the same TUI view as the original run, letting you step through the agent's reasoning.