ktx/docs-site/content/docs/guides/building-context.mdx

---
title: Building Context
description: Scan your database schema and ingest context from dbt, Looker, Metabase, and more.
---

Building context is a two-step process. First, you **scan** your database to discover its structure — tables, columns, types, constraints, and relationships. Then you **ingest** from your existing tools to enrich that structure with semantic meaning — metric definitions, business descriptions, join logic, and knowledge that agents need to generate correct analytics.

## Scanning

Scanning connects to your database and extracts structural metadata. KTX stores the results locally so agents can understand your schema without querying the database directly.

### Running a scan

```bash
ktx scan <connection-id>
```

This runs a structural scan by default. You can control what the scan does with the `--mode` flag:

| Mode | What it does |
|------|-------------|
| `structural` | Tables, columns, types, constraints, row counts (default) |
| `enriched` | Structural scan plus LLM-generated column descriptions |
| `relationships` | Structural scan plus foreign key relationship detection |

```bash
# Scan with relationship detection
ktx scan my-postgres --mode relationships

# Preview without writing results
ktx scan my-postgres --dry-run
```

### Checking scan results

Every scan prints a summary and writes local artifacts. Use `ktx status` after a scan to review project readiness and follow-up setup work:

```bash
ktx status
```

### Relationship detection

Many databases lack declared foreign keys. KTX infers relationships by scoring column pairs across seven signals — name similarity, type compatibility, value overlap, embedding similarity, profile uniqueness, null rate, and structural priors. The weighted score determines each candidate's status:

| Score range | Status | Meaning |
|-------------|--------|---------|
| &ge; 0.85 | `accepted` | High confidence — applied automatically |
| 0.55 &ndash; 0.84 | `review` | Plausible — needs human review |
| &lt; 0.55 | `rejected` | Low confidence — not applied |

Relationship scans run with `ktx scan <connection-id> --mode relationships`. This command only executes the scan; relationship review and calibration subcommands are not part of the current CLI surface.

## Ingestion

Ingestion pulls semantic context from your existing analytics tools — dbt projects, Looker models, Metabase questions, and more — and writes it into your KTX project as semantic sources and knowledge pages.

### How it works

Each ingest run follows this flow:

1. An **adapter** extracts metadata from your tool (dbt manifest, LookML files, Metabase API, etc.)
2. An **LLM agent** reconciles the extracted metadata with your existing context — it merges intelligently rather than overwriting
3. **Semantic sources** (YAML) and **knowledge pages** (Markdown) are written to your project directory

### Running an ingest

```bash
ktx ingest run --connection-id my-dbt-source --adapter dbt
```

Useful low-level flags:

| Flag | Description |
|------|-------------|
| `--source-dir <path>` | Directory containing source files (e.g., your dbt project) |
| `--viz` | Render the memory-flow TUI for real-time progress |
| `--json` | Output as JSON |
| `--plain` | Plain text output |

### Watching progress

```bash
# Check status of the latest ingest
ktx ingest status

# Check a specific run
ktx ingest status <run-id>

# Open the visual ingest report (TUI)
ktx ingest watch

# Replay a past ingest run
ktx ingest replay <run-id>
```

The `watch` command opens an interactive TUI that shows the memory-flow output — every tool call, LLM decision, and artifact written during the ingest.

### Available adapters

| Adapter | Source | What gets ingested |
|---------|--------|--------------------|
| `dbt` | dbt project | Model definitions, column descriptions, tests, tags |
| `metricflow` | MetricFlow semantic models | Metrics, dimensions, entities, semantic joins |
| `lookml` | LookML files | Views, explores, dimensions, measures, joins |
| `looker` | Looker API | Explores, looks, dashboard metadata |
| `metabase` | Metabase API | Questions, dashboards, table metadata |
| `notion` | Notion API | Database pages, knowledge articles |
| `historic-sql` | Query history | Frequent queries, usage patterns, runtime stats |
| `live-database` | Direct DB connection | Live schema introspection |

See [Context Sources](/docs/integrations/context-sources) for adapter-specific setup and auth configuration.

### What gets generated

A typical dbt ingest produces semantic sources and knowledge pages in your project:

**Semantic source** (`semantic-layer/my-postgres/orders.yaml`):

```yaml title="semantic-layer/my-postgres/orders.yaml"
name: orders
table: public.orders
grain:
  - order_id
columns:
  - name: order_id
    type: string
    description: Unique order identifier
  - name: customer_id
    type: string
    description: Foreign key to customers table
  - name: order_date
    type: time
    role: time
    description: Date the order was placed
  - name: total_amount
    type: number
    description: Total order value in USD
measures:
  - name: total_revenue
    expr: SUM(total_amount)
    description: Sum of all order values
  - name: order_count
    expr: COUNT(DISTINCT order_id)
    description: Number of distinct orders
joins:
  - to: customers
    on: orders.customer_id = customers.customer_id
    relationship: many_to_one
```

**Knowledge page** (`knowledge/global/order-status-definitions.md`):

```markdown
---
summary: Business definitions for order status values
tags: [orders, definitions]
sl_refs: [orders]
---

## Order Statuses

- **pending**: Order placed but not yet processed
- **confirmed**: Payment received, awaiting fulfillment
- **shipped**: Order dispatched to carrier
- **delivered**: Order received by customer
- **cancelled**: Order cancelled before shipment

Orders in "pending" status for more than 48 hours are flagged for review.
```

### Deterministic replay

Every ingest session records a full transcript — tool calls, LLM responses, and write decisions. You can replay any session to debug why a source was written a certain way:

```bash
ktx ingest replay <run-id> --viz
```

This opens the same TUI view as the original run, letting you step through the agent's reasoning.
docs: add title attributes to config code blocks Triggers VS Code tab styling on the docs' YAML/config blocks where surrounding prose names the file. Mechanical, ~1 edit across the guides section (building-context.mdx); primary-sources.mdx and context-sources.mdx already had title= on their Connection config blocks, so only the prose-named semantic source output example needed updating. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> 2026-05-11 00:45:43 -07:00			`---`
			`title: Building Context`
			`description: Scan your database schema and ingest context from dbt, Looker, Metabase, and more.`
			`---`

			`Building context is a two-step process. First, you scan your database to discover its structure — tables, columns, types, constraints, and relationships. Then you ingest from your existing tools to enrich that structure with semantic meaning — metric definitions, business descriptions, join logic, and knowledge that agents need to generate correct analytics.`

			`## Scanning`

			`Scanning connects to your database and extracts structural metadata. KTX stores the results locally so agents can understand your schema without querying the database directly.`

			`### Running a scan`

			```bash
feat(cli): clean up dev command surface (#57) * feat(cli): clean up dev command surface * test: align CI expectations with CLI cleanup * test(cli): update slow test command expectations 2026-05-13 12:00:08 +02:00			`ktx scan <connection-id>`
docs: add title attributes to config code blocks Triggers VS Code tab styling on the docs' YAML/config blocks where surrounding prose names the file. Mechanical, ~1 edit across the guides section (building-context.mdx); primary-sources.mdx and context-sources.mdx already had title= on their Connection config blocks, so only the prose-named semantic source output example needed updating. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> 2026-05-11 00:45:43 -07:00			```

			This runs a structural scan by default. You can control what the scan does with the `--mode` flag:

			`\| Mode \| What it does \|`
			`\|------\|-------------\|`
			\| `structural` \| Tables, columns, types, constraints, row counts (default) \|
			\| `enriched` \| Structural scan plus LLM-generated column descriptions \|
			\| `relationships` \| Structural scan plus foreign key relationship detection \|

			```bash
			`# Scan with relationship detection`
feat(cli): clean up dev command surface (#57) * feat(cli): clean up dev command surface * test: align CI expectations with CLI cleanup * test(cli): update slow test command expectations 2026-05-13 12:00:08 +02:00			`ktx scan my-postgres --mode relationships`
docs: add title attributes to config code blocks Triggers VS Code tab styling on the docs' YAML/config blocks where surrounding prose names the file. Mechanical, ~1 edit across the guides section (building-context.mdx); primary-sources.mdx and context-sources.mdx already had title= on their Connection config blocks, so only the prose-named semantic source output example needed updating. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> 2026-05-11 00:45:43 -07:00
			`# Preview without writing results`
feat(cli): clean up dev command surface (#57) * feat(cli): clean up dev command surface * test: align CI expectations with CLI cleanup * test(cli): update slow test command expectations 2026-05-13 12:00:08 +02:00			`ktx scan my-postgres --dry-run`
docs: add title attributes to config code blocks Triggers VS Code tab styling on the docs' YAML/config blocks where surrounding prose names the file. Mechanical, ~1 edit across the guides section (building-context.mdx); primary-sources.mdx and context-sources.mdx already had title= on their Connection config blocks, so only the prose-named semantic source output example needed updating. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> 2026-05-11 00:45:43 -07:00			```

feat(cli): clean up dev command surface (#57) * feat(cli): clean up dev command surface * test: align CI expectations with CLI cleanup * test(cli): update slow test command expectations 2026-05-13 12:00:08 +02:00			`### Checking scan results`
docs: add title attributes to config code blocks Triggers VS Code tab styling on the docs' YAML/config blocks where surrounding prose names the file. Mechanical, ~1 edit across the guides section (building-context.mdx); primary-sources.mdx and context-sources.mdx already had title= on their Connection config blocks, so only the prose-named semantic source output example needed updating. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> 2026-05-11 00:45:43 -07:00
feat(cli): clean up dev command surface (#57) * feat(cli): clean up dev command surface * test: align CI expectations with CLI cleanup * test(cli): update slow test command expectations 2026-05-13 12:00:08 +02:00			Every scan prints a summary and writes local artifacts. Use `ktx status` after a scan to review project readiness and follow-up setup work:
docs: add title attributes to config code blocks Triggers VS Code tab styling on the docs' YAML/config blocks where surrounding prose names the file. Mechanical, ~1 edit across the guides section (building-context.mdx); primary-sources.mdx and context-sources.mdx already had title= on their Connection config blocks, so only the prose-named semantic source output example needed updating. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> 2026-05-11 00:45:43 -07:00
			```bash
feat(cli): clean up dev command surface (#57) * feat(cli): clean up dev command surface * test: align CI expectations with CLI cleanup * test(cli): update slow test command expectations 2026-05-13 12:00:08 +02:00			`ktx status`
docs: add title attributes to config code blocks Triggers VS Code tab styling on the docs' YAML/config blocks where surrounding prose names the file. Mechanical, ~1 edit across the guides section (building-context.mdx); primary-sources.mdx and context-sources.mdx already had title= on their Connection config blocks, so only the prose-named semantic source output example needed updating. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> 2026-05-11 00:45:43 -07:00			```

			`### Relationship detection`

			`Many databases lack declared foreign keys. KTX infers relationships by scoring column pairs across seven signals — name similarity, type compatibility, value overlap, embedding similarity, profile uniqueness, null rate, and structural priors. The weighted score determines each candidate's status:`

			`\| Score range \| Status \| Meaning \|`
			`\|-------------\|--------\|---------\|`
			\| ≥ 0.85 \| `accepted` \| High confidence — applied automatically \|
			\| 0.55 – 0.84 \| `review` \| Plausible — needs human review \|
			\| < 0.55 \| `rejected` \| Low confidence — not applied \|

feat(cli): clean up dev command surface (#57) * feat(cli): clean up dev command surface * test: align CI expectations with CLI cleanup * test(cli): update slow test command expectations 2026-05-13 12:00:08 +02:00			Relationship scans run with `ktx scan <connection-id> --mode relationships`. This command only executes the scan; relationship review and calibration subcommands are not part of the current CLI surface.
docs: add title attributes to config code blocks Triggers VS Code tab styling on the docs' YAML/config blocks where surrounding prose names the file. Mechanical, ~1 edit across the guides section (building-context.mdx); primary-sources.mdx and context-sources.mdx already had title= on their Connection config blocks, so only the prose-named semantic source output example needed updating. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> 2026-05-11 00:45:43 -07:00
			`## Ingestion`

			`Ingestion pulls semantic context from your existing analytics tools — dbt projects, Looker models, Metabase questions, and more — and writes it into your KTX project as semantic sources and knowledge pages.`

			`### How it works`

			`Each ingest run follows this flow:`

			`1. An adapter extracts metadata from your tool (dbt manifest, LookML files, Metabase API, etc.)`
			`2. An LLM agent reconciles the extracted metadata with your existing context — it merges intelligently rather than overwriting`
			`3. Semantic sources (YAML) and knowledge pages (Markdown) are written to your project directory`

			`### Running an ingest`

			```bash
feat(cli): clean up dev command surface (#57) * feat(cli): clean up dev command surface * test: align CI expectations with CLI cleanup * test(cli): update slow test command expectations 2026-05-13 12:00:08 +02:00			`ktx ingest run --connection-id my-dbt-source --adapter dbt`
docs: add title attributes to config code blocks Triggers VS Code tab styling on the docs' YAML/config blocks where surrounding prose names the file. Mechanical, ~1 edit across the guides section (building-context.mdx); primary-sources.mdx and context-sources.mdx already had title= on their Connection config blocks, so only the prose-named semantic source output example needed updating. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> 2026-05-11 00:45:43 -07:00			```

			`Useful low-level flags:`

			`\| Flag \| Description \|`
			`\|------\|-------------\|`
			\| `--source-dir <path>` \| Directory containing source files (e.g., your dbt project) \|
			\| `--viz` \| Render the memory-flow TUI for real-time progress \|
			\| `--json` \| Output as JSON \|
			\| `--plain` \| Plain text output \|

			`### Watching progress`

			```bash
			`# Check status of the latest ingest`
			`ktx ingest status`

			`# Check a specific run`
			`ktx ingest status <run-id>`

			`# Open the visual ingest report (TUI)`
			`ktx ingest watch`

			`# Replay a past ingest run`
feat(cli): clean up dev command surface (#57) * feat(cli): clean up dev command surface * test: align CI expectations with CLI cleanup * test(cli): update slow test command expectations 2026-05-13 12:00:08 +02:00			`ktx ingest replay <run-id>`
docs: add title attributes to config code blocks Triggers VS Code tab styling on the docs' YAML/config blocks where surrounding prose names the file. Mechanical, ~1 edit across the guides section (building-context.mdx); primary-sources.mdx and context-sources.mdx already had title= on their Connection config blocks, so only the prose-named semantic source output example needed updating. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> 2026-05-11 00:45:43 -07:00			```

			The `watch` command opens an interactive TUI that shows the memory-flow output — every tool call, LLM decision, and artifact written during the ingest.

			`### Available adapters`

			`\| Adapter \| Source \| What gets ingested \|`
			`\|---------\|--------\|--------------------\|`
			\| `dbt` \| dbt project \| Model definitions, column descriptions, tests, tags \|
			\| `metricflow` \| MetricFlow semantic models \| Metrics, dimensions, entities, semantic joins \|
			\| `lookml` \| LookML files \| Views, explores, dimensions, measures, joins \|
			\| `looker` \| Looker API \| Explores, looks, dashboard metadata \|
			\| `metabase` \| Metabase API \| Questions, dashboards, table metadata \|
			\| `notion` \| Notion API \| Database pages, knowledge articles \|
			\| `historic-sql` \| Query history \| Frequent queries, usage patterns, runtime stats \|
			\| `live-database` \| Direct DB connection \| Live schema introspection \|

			`See [Context Sources](/docs/integrations/context-sources) for adapter-specific setup and auth configuration.`

			`### What gets generated`

			`A typical dbt ingest produces semantic sources and knowledge pages in your project:`

			Semantic source (`semantic-layer/my-postgres/orders.yaml`):

			```yaml title="semantic-layer/my-postgres/orders.yaml"
			`name: orders`
			`table: public.orders`
			`grain:`
			`- order_id`
			`columns:`
			`- name: order_id`
			`type: string`
			`description: Unique order identifier`
			`- name: customer_id`
			`type: string`
			`description: Foreign key to customers table`
			`- name: order_date`
			`type: time`
			`role: time`
			`description: Date the order was placed`
			`- name: total_amount`
			`type: number`
			`description: Total order value in USD`
			`measures:`
			`- name: total_revenue`
			`expr: SUM(total_amount)`
			`description: Sum of all order values`
			`- name: order_count`
			`expr: COUNT(DISTINCT order_id)`
			`description: Number of distinct orders`
			`joins:`
			`- to: customers`
			`on: orders.customer_id = customers.customer_id`
			`relationship: many_to_one`
			```

			Knowledge page (`knowledge/global/order-status-definitions.md`):

			```markdown
			`---`
			`summary: Business definitions for order status values`
			`tags: [orders, definitions]`
			`sl_refs: [orders]`
			`---`

			`## Order Statuses`

			`- pending: Order placed but not yet processed`
			`- confirmed: Payment received, awaiting fulfillment`
			`- shipped: Order dispatched to carrier`
			`- delivered: Order received by customer`
			`- cancelled: Order cancelled before shipment`

			`Orders in "pending" status for more than 48 hours are flagged for review.`
			```

			`### Deterministic replay`

			`Every ingest session records a full transcript — tool calls, LLM responses, and write decisions. You can replay any session to debug why a source was written a certain way:`

			```bash
feat(cli): clean up dev command surface (#57) * feat(cli): clean up dev command surface * test: align CI expectations with CLI cleanup * test(cli): update slow test command expectations 2026-05-13 12:00:08 +02:00			`ktx ingest replay <run-id> --viz`
docs: add title attributes to config code blocks Triggers VS Code tab styling on the docs' YAML/config blocks where surrounding prose names the file. Mechanical, ~1 edit across the guides section (building-context.mdx); primary-sources.mdx and context-sources.mdx already had title= on their Connection config blocks, so only the prose-named semantic source output example needed updating. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> 2026-05-11 00:45:43 -07:00			```

			`This opens the same TUI view as the original run, letting you step through the agent's reasoning.`