mirror of https://github.com/Kaelio/ktx.git (synced 2026-06-10 08:05:14 +02:00)
docs: refresh context build guide
parent 918744a127
commit 2b63cd7feb
1 changed file with 153 additions and 129 deletions
|
|
@@ -1,171 +1,195 @@
|
|||
---
|
||||
title: Building Context
|
||||
description: Build database and source context from configured KTX connections.
|
||||
description: Build and refresh KTX context from databases, source tools, query history, and text.
|
||||
---
|
||||
|
||||
Building context reads your configured connections and writes local context that
|
||||
agents can use. Database connections produce schema context, and source
|
||||
connections such as dbt, Looker, Metabase, and Notion produce semantic sources
|
||||
and wiki pages.
|
||||
Building context turns configured connections into local semantic-layer sources
|
||||
and wiki pages. Agents use those files to understand your schema, business
|
||||
definitions, metric logic, joins, and known caveats before they write SQL.
|
||||
|
||||
Use this guide after `ktx setup` has created `ktx.yaml` and at least one
|
||||
database or context-source connection.
|
||||
|
||||
## The build loop
|
||||
|
||||
Most projects use this loop:
|
||||
|
||||
1. Check readiness with `ktx status`.
|
||||
2. Build one connection with `ktx ingest <connectionId>`, or build everything
|
||||
with `ktx ingest --all`.
|
||||
3. Search or inspect the generated files under `semantic-layer/` and `wiki/`.
|
||||
4. Edit source YAML or Markdown when business logic needs refinement.
|
||||
5. Validate and query representative sources before handing the context to an
|
||||
agent.
|
||||
|
||||
`ktx ingest --all` runs database connections first, then context-source
|
||||
connections. That order lets dbt, BI, Notion, and text ingest attach context to
|
||||
known warehouse tables.
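
The database-first ordering described above can be pictured as a stable sort over the configured connections. The following is a minimal Python sketch, not KTX's actual implementation: the `Connection` record, its `kind` field, the `notion_wiki` id, and `order_for_ingest` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Hypothetical stand-in for one configured connection in ktx.yaml."""
    id: str
    kind: str  # assumed values: "database" or "source"


def order_for_ingest(connections: list[Connection]) -> list[Connection]:
    """Return database connections first, then context sources."""
    # sorted() is stable, so connections of the same kind keep their
    # original relative order from the configuration.
    return sorted(connections, key=lambda c: 0 if c.kind == "database" else 1)


if __name__ == "__main__":
    configured = [
        Connection("dbt_main", "source"),
        Connection("warehouse", "database"),
        Connection("notion_wiki", "source"),
    ]
    print([c.id for c in order_for_ingest(configured)])
```

Running databases first means that, by the time a source such as dbt is processed, the warehouse tables it refers to already exist in local context.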
|
||||
|
||||
## Database ingest
|
||||
|
||||
Database ingest connects to your warehouse and extracts structural metadata.
|
||||
KTX stores the results locally so agents can understand your schema without
|
||||
querying the database directly.
|
||||
|
||||
### Running database ingest
|
||||
Database ingest connects to a configured warehouse and records local schema
|
||||
context. It gives agents table, column, type, constraint, and row-count
|
||||
grounding without requiring them to inspect the database directly.
|
||||
|
||||
```bash
|
||||
ktx ingest <connection-id>
|
||||
```
|
||||
|
||||
This runs a fast schema ingest by default. You can choose the depth with public
|
||||
flags:
|
||||
|
||||
| Flag | What it does |
|
||||
|------|-------------|
|
||||
| `--fast` | Tables, columns, types, constraints, and row counts |
|
||||
| `--deep` | Fast ingest plus AI-enriched database context |
|
||||
|
||||
```bash
|
||||
# Build one connection quickly
|
||||
ktx ingest my-postgres --fast
|
||||
|
||||
# Build AI-enriched database context
|
||||
ktx ingest my-postgres --deep
|
||||
# Build one configured database connection
|
||||
ktx ingest warehouse
|
||||
|
||||
# Build all configured connections
|
||||
ktx ingest --all
|
||||
```
|
||||
|
||||
### Checking results
|
||||
Depth controls how much context KTX builds:
|
||||
|
||||
Every ingest prints a summary and writes local artifacts. Use `ktx status`
|
||||
after ingest to review project readiness and follow-up setup work:
|
||||
| Flag | Best for | What it does |
|
||||
|------|----------|--------------|
|
||||
| `--fast` | First setup, quick refreshes, CI smoke checks | Deterministic schema ingest with tables, columns, types, constraints, and row counts |
|
||||
| `--deep` | Agent-ready context for real analysis | Fast ingest plus AI-enriched descriptions, embeddings, relationship evidence, and optional query history |
|
||||
|
||||
Examples:
|
||||
|
||||
```bash
|
||||
ktx status
|
||||
ktx ingest warehouse --fast
|
||||
ktx ingest warehouse --deep
|
||||
ktx ingest --all --deep
|
||||
```
|
||||
|
||||
### Relationship detection
|
||||
Deep ingest needs LLM and embedding readiness. If those providers are not
|
||||
configured, run `ktx setup` or use `--fast`.
|
||||
|
||||
Many databases lack declared foreign keys. KTX infers relationships by scoring column pairs across seven signals - name similarity, type compatibility, value overlap, embedding similarity, profile uniqueness, null rate, and structural priors. The weighted score determines each candidate's status:
|
||||
## Query history
|
||||
|
||||
| Score range | Status | Meaning |
|
||||
|-------------|--------|---------|
|
||||
| ≥ 0.85 | `accepted` | High confidence - applied automatically |
|
||||
| 0.55 – 0.84 | `review` | Plausible - needs human review |
|
||||
| < 0.55 | `rejected` | Low confidence - not applied |
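
The score ranges in the table above can be read as a simple threshold function. This is an illustrative Python sketch only: `classify_candidate` and the constant names are hypothetical, and treating scores that fall between 0.84 and 0.85 as `review` is an assumption, since the table does not say how they are handled.

```python
ACCEPT_THRESHOLD = 0.85  # from the table: >= 0.85 is accepted
REVIEW_THRESHOLD = 0.55  # from the table: below 0.55 is rejected


def classify_candidate(score: float) -> str:
    """Map a weighted relationship score to the status named in the table."""
    if score >= ACCEPT_THRESHOLD:
        return "accepted"  # high confidence, applied automatically
    if score >= REVIEW_THRESHOLD:
        return "review"  # plausible, needs human review
    return "rejected"  # low confidence, not applied


if __name__ == "__main__":
    for score in (0.91, 0.70, 0.30):
        print(score, classify_candidate(score))
```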
|
||||
PostgreSQL, BigQuery, and Snowflake can add query-history context. This helps
|
||||
KTX learn common joins, filters, service-account patterns, redaction rules, and
|
||||
usage-heavy query templates.
|
||||
|
||||
Deep database ingest can include relationship evidence where the connector can
|
||||
provide it. Relationship review and calibration subcommands are not part of the
|
||||
current public CLI surface.
|
||||
|
||||
## Ingestion
|
||||
|
||||
Ingestion pulls semantic context from your existing analytics tools - dbt projects, Looker models, Metabase questions, and more - and writes it into your KTX project as semantic sources and wiki pages.
|
||||
|
||||
### How it works
|
||||
|
||||
Each ingest run follows this flow:
|
||||
|
||||
1. An **adapter** extracts metadata from your tool (dbt manifest, LookML files, Metabase API, etc.)
|
||||
2. An **LLM agent** reconciles the extracted metadata with your existing context - it merges intelligently rather than overwriting
|
||||
3. **Semantic sources** (YAML) and **wiki pages** (Markdown) are written to your project directory
|
||||
|
||||
### Running an ingest
|
||||
Enable it during setup, store it under `connections.<id>.context.queryHistory`,
|
||||
or request it for one run:
|
||||
|
||||
```bash
|
||||
ktx ingest my-dbt-source
|
||||
ktx ingest warehouse --deep --query-history
|
||||
ktx ingest warehouse --query-history-window-days 30
|
||||
```
|
||||
|
||||
Useful output flags:
|
||||
Use `--no-query-history` when you want to skip a stored query-history setting
|
||||
for one run.
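
One way to picture how the stored setting and the per-run flags interact is a small resolution function. This is a hedged Python sketch under the assumption that a per-run flag always wins over the stored value. `effective_query_history` is a hypothetical helper, and the stored setting is simplified to a boolean even though the real `queryHistory` entry may carry more options.

```python
from typing import Optional


def effective_query_history(stored: bool, cli_flag: Optional[bool] = None) -> bool:
    """Decide whether a single run should include query history.

    stored   -- simplified value of connections.<id>.context.queryHistory
    cli_flag -- True for --query-history, False for --no-query-history,
                None when neither flag is passed
    """
    if cli_flag is not None:
        # Assumption: an explicit per-run flag overrides the stored setting.
        return cli_flag
    return stored


if __name__ == "__main__":
    # Stored setting is on, but this run passes --no-query-history.
    print(effective_query_history(stored=True, cli_flag=False))
```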
|
||||
|
||||
## Relationship evidence
|
||||
|
||||
Many databases do not declare all foreign keys. KTX can score relationship
|
||||
candidates using signals such as name similarity, type compatibility, value
|
||||
overlap, embedding similarity, uniqueness, null rate, and structural priors.
|
||||
|
||||
The public CLI does not expose separate relationship review subcommands.
|
||||
Relationship evidence is built as part of deep database ingest when the
|
||||
connector and readiness checks support it.
|
||||
|
||||
## Context-source ingest
|
||||
|
||||
Context-source connections pull business metadata from tools your team already
|
||||
uses. The current public `ktx ingest` command is connection-centric: pass one
|
||||
configured connection id, or pass `--all`.
|
||||
|
||||
```bash
|
||||
# Build one source connection
|
||||
ktx ingest dbt_main
|
||||
|
||||
# Build every configured database and source connection
|
||||
ktx ingest --all
|
||||
```
|
||||
|
||||
Supported source types:
|
||||
|
||||
| Driver | Typical source | Output |
|
||||
|--------|----------------|--------|
|
||||
| `dbt` | dbt project or Git repo | Semantic sources with model, column, test, tag, and description metadata |
|
||||
| `metricflow` | MetricFlow project or Git repo | Metrics, dimensions, entities, and semantic joins |
|
||||
| `lookml` | LookML files or Git repo | Views, explores, dimensions, measures, and joins |
|
||||
| `looker` | Looker API | Explores, looks, dashboards, and model metadata |
|
||||
| `metabase` | Metabase API | Questions, dashboards, table metadata, and mappings |
|
||||
| `notion` | Notion API | Wiki pages and business knowledge |
|
||||
|
||||
Source ingest extracts metadata, reconciles it with existing local context, and
|
||||
writes semantic-layer YAML plus wiki Markdown. It merges rather than blindly
|
||||
overwriting local edits.
|
||||
|
||||
## Text ingest
|
||||
|
||||
Use `ktx ingest text` for notes, Markdown files, runbooks, Slack exports, or
|
||||
other free-form knowledge that should become searchable KTX memory.
|
||||
|
||||
```bash
|
||||
# Capture a Markdown file
|
||||
ktx ingest text docs/revenue-notes.md --connection-id warehouse
|
||||
|
||||
# Capture one stdin item
|
||||
printf "Refunds are excluded from net revenue." | ktx ingest text -
|
||||
|
||||
# Capture direct text
|
||||
ktx ingest text --text "ARR excludes one-time implementation fees."
|
||||
```
|
||||
|
||||
Useful flags:
|
||||
|
||||
| Flag | Description |
|
||||
|------|-------------|
|
||||
| `--json` | Output as JSON |
|
||||
| `--plain` | Plain text output |
|
||||
| `--connection-id <connectionId>` | Attach the captured memory to a KTX connection |
|
||||
| `--user-id <id>` | Attribute capture to a user scope, default `local-cli` |
|
||||
| `--json` | Print structured output |
|
||||
| `--fail-fast` | Stop after the first failed text item |
|
||||
|
||||
Foreground context builds do not detach into background control sessions. If a
|
||||
run is interrupted, rerun `ktx ingest <connection-id>` or `ktx ingest --all`.
|
||||
Text ingest is a good fit for small, high-signal documents. For system-specific
|
||||
connectors such as Notion, dbt, or Metabase, prefer configured source ingest so
|
||||
KTX can preserve source metadata.
|
||||
|
||||
### Supported context sources
|
||||
## Output and artifacts
|
||||
|
||||
| Driver | Source | What gets ingested |
|
||||
|--------|--------|--------------------|
|
||||
| `dbt` | dbt project | Model definitions, column descriptions, tests, tags |
|
||||
| `metricflow` | MetricFlow semantic models | Metrics, dimensions, entities, semantic joins |
|
||||
| `lookml` | LookML files | Views, explores, dimensions, measures, joins |
|
||||
| `looker` | Looker API | Explores, looks, dashboard metadata |
|
||||
| `metabase` | Metabase API | Questions, dashboards, table metadata |
|
||||
| `notion` | Notion API | Database pages, knowledge articles |
|
||||
Every ingest run prints a summary. Use `--json` when an agent or script needs a
|
||||
structured plan and per-target results.
|
||||
|
||||
Query history is a database connection facet. Enable it with
|
||||
`connections.<id>.context.queryHistory` or pass `--query-history` for a current
|
||||
run. See [Context Sources](/docs/integrations/context-sources) for
|
||||
driver-specific setup and auth configuration.
|
||||
|
||||
### What gets generated
|
||||
|
||||
A typical dbt ingest produces semantic sources and wiki pages in your project:
|
||||
|
||||
**Semantic source** (`semantic-layer/my-postgres/orders.yaml`):
|
||||
|
||||
```yaml title="semantic-layer/my-postgres/orders.yaml"
|
||||
name: orders
|
||||
table: public.orders
|
||||
grain:
|
||||
- order_id
|
||||
columns:
|
||||
- name: order_id
|
||||
type: string
|
||||
description: Unique order identifier
|
||||
- name: customer_id
|
||||
type: string
|
||||
description: Foreign key to customers table
|
||||
- name: order_date
|
||||
type: time
|
||||
role: time
|
||||
description: Date the order was placed
|
||||
- name: total_amount
|
||||
type: number
|
||||
description: Total order value in USD
|
||||
measures:
|
||||
- name: total_revenue
|
||||
expr: SUM(total_amount)
|
||||
description: Sum of all order values
|
||||
- name: order_count
|
||||
expr: COUNT(DISTINCT order_id)
|
||||
description: Number of distinct orders
|
||||
joins:
|
||||
- to: customers
|
||||
on: orders.customer_id = customers.customer_id
|
||||
relationship: many_to_one
|
||||
```bash
|
||||
ktx ingest --all --json
|
||||
```
|
||||
|
||||
**Wiki page** (`wiki/global/order-status-definitions.md`):
|
||||
Typical generated files:
|
||||
|
||||
```markdown
|
||||
---
|
||||
summary: Business definitions for order status values
|
||||
tags: [orders, definitions]
|
||||
sl_refs: [orders]
|
||||
---
|
||||
| Path | Created by | Purpose |
|
||||
|------|------------|---------|
|
||||
| `semantic-layer/<connection-id>/*.yaml` | Database and source ingest | Queryable semantic source definitions |
|
||||
| `wiki/global/*.md` | Source, text, and memory ingest | Shared business definitions and notes |
|
||||
| `wiki/user/<user-id>/*.md` | Text and memory ingest | User-scoped context |
|
||||
| `.ktx/setup/context-build.json` | Setup context build | Resume and readiness state for setup |
|
||||
|
||||
## Order Statuses
|
||||
Ingest sessions also record transcripts with tool calls, LLM responses, and
|
||||
write decisions. Inspect them when you need to debug why a source or wiki page
|
||||
was written a certain way.
|
||||
|
||||
- **pending**: Order placed but not yet processed
|
||||
- **confirmed**: Payment received, awaiting fulfillment
|
||||
- **shipped**: Order dispatched to carrier
|
||||
- **delivered**: Order received by customer
|
||||
- **cancelled**: Order cancelled before shipment
|
||||
## Example: first full refresh
|
||||
|
||||
Orders in "pending" status for more than 48 hours are flagged for review.
|
||||
After interactive setup:
|
||||
|
||||
```bash
|
||||
ktx status
|
||||
ktx ingest --all --deep
|
||||
ktx status
|
||||
```
|
||||
|
||||
### Ingest transcripts
|
||||
Then inspect what changed:
|
||||
|
||||
Every ingest session records a full transcript: tool calls, LLM responses, and
|
||||
write decisions. Inspect the stored transcript files when you need to debug why
|
||||
a source was written a certain way.
|
||||
```bash
|
||||
git status --short
|
||||
ktx sl list --json
|
||||
ktx wiki search "revenue" --json --limit 10
|
||||
```
|
||||
|
||||
## Common errors
|
||||
|
||||
| Symptom | Likely cause | Recovery |
|
||||
|---------|--------------|----------|
|
||||
| Connection not configured | The connection id is missing from `ktx.yaml` | Add it with `ktx setup` |
|
||||
| Deep readiness is missing | LLM or embeddings are not setup-ready | Run `ktx setup`, or rerun with `--fast` |
|
||||
| Query history is unsupported | The selected database driver does not expose query history | Run schema ingest without query-history flags |
|
||||
| No target selected | You omitted both a connection id and `--all` | Run `ktx ingest <connectionId>` or `ktx ingest --all` |
|
||||
| Source flags have no effect | Depth and query-history flags were supplied for a source connector | Use those flags only for database connections |
|
||||
| Text ingest stops early | `--fail-fast` stopped on the first failed item | Fix the item or rerun without `--fail-fast` |
|
||||
|
|
|
|||