ktx/docs-site/content/docs/guides/building-context.mdx
Andrey Avtomonov b00c1a11a9
feat: merge ingest and scan
* docs: add CLI component reuse guidance

* docs: add unified ingest ux design

* Refine unified ingest UX design after adversarial review iteration 1

* Refine unified ingest UX design after adversarial review iteration 2

* Refine unified ingest UX design after adversarial review iteration 3

* feat(cli): route public connection ingest command

* feat(cli): hide standalone scan from public help

* feat(cli): plan public ingest depth and query history

* feat(cli): execute public database ingest facets

* feat(ingest): read connection query history config

* fix(cli): use public ingest wording

* fix(config): stop generating ingest adapter allow lists

* docs: document public ingest command

* test: align ingest surface expectations

* docs: add unified ingest public CLI surface plan

* feat(cli): preflight deep public ingest readiness

* feat(setup): store query history in connection context

* feat(setup): store database context depth

* feat(setup): verify context readiness by database depth

* fix(setup): keep context build foreground only

* fix(config): reject reserved ingest connection ids

* test: close unified ingest v1 expectations

* docs: add unified ingest v1 closure plan

* fix(ingest): bypass adapter allow-list for public source ingest

* fix(ingest): honor query history window intent

* fix(ingest): hide scan internals from public database ingest

* feat(ingest): use foreground view for interactive public ingest

* fix(setup): use schema context and query history wording

* test(cli): verify unified ingest public output

* docs: add unified ingest v1 public output closure plan

* fix(setup): forward query history flags

* fix(setup): prompt for postgres query history

* fix(status): report query history readiness

* fix(ingest): remove legacy public guidance

* fix(ingest): polish foreground retry copy

* docs(examples): use unified query history wording

* chore(ingest): finish public query history cleanup

* docs: add unified ingest v1 query history status cleanup plan

* test(docs): cover unified ingest public docs

* docs: align ingest CLI reference with unified UX

* docs: update context build guides for unified ingest

* docs: update setup and primary source ingest wording

* docs: stop advertising adapter-backed example ingest

* docs: close unified ingest public docs gaps

* docs: add unified ingest v1 docs site closure plan

* fix: render unified ingest foreground warnings

* fix: explain query history schema order

* fix: add public ingest retry guidance

* fix: align setup next steps with unified ingest

* fix: remove scan wording from demo progress

* test: verify unified ingest ux closure

* docs: add unified ingest v1 foreground and retry closure plan

* fix(cli): preserve query-history pull config in public ingest

* fix(cli): omit hidden commands from docs command tree

* test(cli): close unified ingest final public surface checks

* docs: add unified ingest v1 final public surface closure plan

* fix(cli): use public source labels in ingest reports

* fix(cli): suppress low-level public ingest output

* test(cli): verify unified ingest public plain output

* docs: add unified ingest v1 public plain output closure plan

* fix(cli): add public ingest copy sanitizers

* fix(cli): sanitize public ingest progress copy

* fix(cli): rename setup schema scope prompt

* docs(plan): add progress copy closure; test: align setup back-nav fixture

Adds the iter9 plan and updates the setup back-navigation test fixture
to pass disableQueryHistory plus listSchemas/listTables stubs that the
unified ingest setup step now requires.

* docs(plan): add final ux labels plan with narrowed label scans

* fix(cli): aggregate unsupported query-history warnings

* fix(cli): align setup database labels

* test(cli): fix setup database test type-check

* fix(cli): remove primary-source wording from setup output

* test(cli): verify unified ingest setup closure

* docs(plan): add unified ingest v1 verification copy closure plan

* fix(cli): remove top-level scan command

* fix(cli): remove legacy ingest and wiki commands

* Merge scan into ingest flow

* feat(cli): split ingest progress into per-phase rows, rename work units to tasks

Each database target in the unified ingest dashboard now renders one row per
real subprocess (Schema, then Query history when enabled) instead of a single
combined bar. Each phase has its own monotonic 0-100% bar so the progress
never snaps back to zero when historic-sql starts after scan completes.
Completed phases keep their final bar, summary, and elapsed time visible as
an inline audit trail; queued and skipped phases are shown explicitly.

Also rename user-facing "work units" / "Failed work units" to "tasks" /
"Failed tasks" in ingest output and parseIngestSummary. The parser still
accepts the legacy "Work units:" wording in captured output for backward
compat. Internal memory-flow event names and type fields are left alone.

* Fix test harness failures

* Fix CI smoke checks

---------

Co-authored-by: Andrey Avtomonov <7889985+andreybavt@users.noreply.github.com>
2026-05-14 01:43:06 +02:00

171 lines
5.4 KiB
Text

---
title: Building Context
description: Build database and source context from configured KTX connections.
---
Building context reads your configured connections and writes local context that
agents can use. Database connections produce schema context, and source
connections such as dbt, Looker, Metabase, and Notion produce semantic sources
and wiki pages.
## Database ingest
Database ingest connects to your warehouse and extracts structural metadata.
KTX stores the results locally so agents can understand your schema without
querying the database directly.
### Running database ingest
```bash
ktx ingest <connection-id>
```
This runs a fast schema ingest by default. You can choose the depth with public
flags:
| Flag | What it does |
|------|-------------|
| `--fast` | Tables, columns, types, constraints, and row counts |
| `--deep` | Fast ingest plus AI-enriched database context |
```bash
# Build one connection quickly
ktx ingest my-postgres --fast
# Build AI-enriched database context
ktx ingest my-postgres --deep
# Build all configured connections
ktx ingest --all
```
### Checking results
Every ingest prints a summary and writes local artifacts. Use `ktx status`
after ingest to review project readiness and follow-up setup work:
```bash
ktx status
```
### Relationship detection
Many databases lack declared foreign keys. KTX infers relationships by scoring column pairs across seven signals — name similarity, type compatibility, value overlap, embedding similarity, profile uniqueness, null rate, and structural priors. The weighted score determines each candidate's status:
| Score range | Status | Meaning |
|-------------|--------|---------|
| &ge; 0.85 | `accepted` | High confidence — applied automatically |
| 0.55 &ndash; 0.84 | `review` | Plausible — needs human review |
| &lt; 0.55 | `rejected` | Low confidence — not applied |
Deep database ingest can include relationship evidence where the connector can
provide it. Relationship review and calibration subcommands are not part of the
current public CLI surface.
## Ingestion
Ingestion pulls semantic context from your existing analytics tools — dbt projects, Looker models, Metabase questions, and more — and writes it into your KTX project as semantic sources and wiki pages.
### How it works
Each ingest run follows this flow:
1. An **adapter** extracts metadata from your tool (dbt manifest, LookML files, Metabase API, etc.)
2. An **LLM agent** reconciles the extracted metadata with your existing context — it merges intelligently rather than overwriting
3. **Semantic sources** (YAML) and **wiki pages** (Markdown) are written to your project directory
### Running an ingest
```bash
ktx ingest my-dbt-source
```
Useful output flags:
| Flag | Description |
|------|-------------|
| `--json` | Output as JSON |
| `--plain` | Plain text output |
Foreground context builds do not detach into background control sessions. If a
run is interrupted, rerun `ktx ingest <connection-id>` or `ktx ingest --all`.
### Supported context sources
| Driver | Source | What gets ingested |
|--------|--------|--------------------|
| `dbt` | dbt project | Model definitions, column descriptions, tests, tags |
| `metricflow` | MetricFlow semantic models | Metrics, dimensions, entities, semantic joins |
| `lookml` | LookML files | Views, explores, dimensions, measures, joins |
| `looker` | Looker API | Explores, looks, dashboard metadata |
| `metabase` | Metabase API | Questions, dashboards, table metadata |
| `notion` | Notion API | Database pages, knowledge articles |
Query history is a database connection facet. Enable it with
`connections.<id>.context.queryHistory` or pass `--query-history` for a current
run. See [Context Sources](/docs/integrations/context-sources) for
driver-specific setup and auth configuration.
### What gets generated
A typical dbt ingest produces semantic sources and wiki pages in your project:
**Semantic source** (`semantic-layer/my-postgres/orders.yaml`):
```yaml title="semantic-layer/my-postgres/orders.yaml"
name: orders
table: public.orders
grain:
- order_id
columns:
- name: order_id
type: string
description: Unique order identifier
- name: customer_id
type: string
description: Foreign key to customers table
- name: order_date
type: time
role: time
description: Date the order was placed
- name: total_amount
type: number
description: Total order value in USD
measures:
- name: total_revenue
expr: SUM(total_amount)
description: Sum of all order values
- name: order_count
expr: COUNT(DISTINCT order_id)
description: Number of distinct orders
joins:
- to: customers
on: orders.customer_id = customers.customer_id
relationship: many_to_one
```
**Wiki page** (`wiki/global/order-status-definitions.md`):
```markdown
---
summary: Business definitions for order status values
tags: [orders, definitions]
sl_refs: [orders]
---
## Order Statuses
- **pending**: Order placed but not yet processed
- **confirmed**: Payment received, awaiting fulfillment
- **shipped**: Order dispatched to carrier
- **delivered**: Order received by customer
- **cancelled**: Order cancelled before shipment
Orders in "pending" status for more than 48 hours are flagged for review.
```
### Ingest transcripts
Every ingest session records a full transcript: tool calls, LLM responses, and
write decisions. Inspect the stored transcript files when you need to debug why
a source was written a certain way.