Andrey Avtomonov 2c9a58bb56
feat(cli): smart defaults and flatter command surface for ktx (#177)
Bare invocations now do the obvious thing instead of erroring out, and mode-as-subcommand patterns collapse into flags on the parent. No new top-level commands.

- `ktx ingest` (bare) ingests every configured connection. The `text` subcommand is gone; capture inline notes with `ktx ingest --text "..."` and files with `ktx ingest --file path` (use `-` for stdin). `--text`/`--file` reject a positional connection id; pass `--connection-id` to tag captured notes.
- `ktx connection` (bare) lists; `ktx connection test` (bare) tests every configured connection.
- `ktx wiki` and `ktx sl` flatten `list`/`search`: bare lists, with a `[query...]` positional searches (multi-word joined with spaces). `sl validate` and `sl query` stay as distinct verbs and now read `--connection-id` from the parent.
- `ktx mcp` (bare) prints daemon status.

Adds a shared `resolveConnectionSelection` helper consumed by ingest and connection test. Updates README, docs-site cli-reference and guides, next-steps strings, agent SKILL templates, and all affected tests. Per-package type-check, unit tests (605), smoke tests, and dead-code checks all pass.
2026-05-20 01:52:37 +02:00

---
title: Building Context
description: Build and refresh KTX context from databases, source tools, query history, and text.
---
Build context after `ktx setup` has created `ktx.yaml` with at least one database or
context-source connection. KTX writes semantic-layer sources and wiki pages into the
local project so agents can consult them before writing SQL.
## The build loop
Most projects use this loop:
1. Check readiness with `ktx status`.
2. Build one connection with `ktx ingest <connectionId>`, or build everything
with `ktx ingest --all` (a bare `ktx ingest` also ingests every configured connection).
3. Search or inspect the generated files under `semantic-layer/` and `wiki/`.
4. Edit source YAML or Markdown when business logic needs refinement.
5. Validate and query representative sources before handing the context to an
agent.

`ktx ingest --all` runs databases first, then context-source connections, so
external metadata can attach to known warehouse tables.
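
For a single connection, steps 1 to 3 can be wrapped in a small shell function. This is a sketch, not part of KTX: `build_connection` is a hypothetical helper name, and it assumes `ktx` is on your `PATH` and exits non-zero when a command fails.

```bash
# Hypothetical helper: steps 1-3 of the build loop for one connection.
# Assumes ktx is on PATH and exits non-zero when a step fails.
build_connection() {
  local conn="$1"
  ktx status || return 1          # 1. check readiness
  ktx ingest "$conn" || return 1  # 2. build the connection
  # 3. show new or modified files in the two output directories
  git status --short -- semantic-layer/ wiki/
}
```

Source the function and call `build_connection warehouse`. The final `git status` call assumes the project is a Git repository.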
## Database ingest
Database ingest records table, column, type, constraint, and row-count context.
```bash
# Build one configured database connection
ktx ingest warehouse
# Build all configured connections
ktx ingest --all
```
Depth controls how much context KTX builds:
| Flag | Best for | What it does |
|------|----------|--------------|
| `--fast` | First setup, quick refreshes, CI smoke checks | Deterministic schema ingest with tables, columns, types, constraints, and row counts |
| `--deep` | Agent-ready context for real analysis | Fast ingest plus AI-enriched descriptions, embeddings, relationship evidence, and optional query history |

Examples:
```bash
ktx ingest warehouse --fast
ktx ingest warehouse --deep
ktx ingest --all --deep
```
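
The depth table suggests `--fast` for CI smoke checks and `--deep` for agent-ready context. One way to encode that split is sketched below. The function names are hypothetical, and the sketch assumes your CI system sets a non-empty `CI` environment variable (GitHub Actions and GitLab CI both set `CI=true`). Call `ingest_all` from a script to run `ktx ingest --all` at the chosen depth.

```bash
# Hypothetical helpers: --fast in CI, --deep everywhere else.
# Assumes the CI system sets a non-empty CI variable (for example CI=true).
ingest_depth_flag() {
  if [ -n "${CI:-}" ]; then
    echo "--fast"  # deterministic schema ingest only
  else
    echo "--deep"  # AI-enriched context; needs LLM and embedding readiness
  fi
}

ingest_all() {
  ktx ingest --all "$(ingest_depth_flag)"
}
```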
Deep ingest needs LLM and embedding readiness. If either is missing, run `ktx setup`
or use `--fast`.

With `claude-code`, KTX agent loops can invoke only the KTX MCP tools for the
current run.
## Query history
PostgreSQL, BigQuery, and Snowflake connections can add query-history context: common
joins, filters, service-account patterns, redaction rules, and high-usage query templates.
Enable it during `ktx setup`, set it under `connections.<id>.context.queryHistory` in
`ktx.yaml`, or request it for a single run:
```bash
ktx ingest warehouse --deep --query-history
# Set the lookback window for BigQuery or Snowflake query history for this run
ktx ingest warehouse --deep --query-history --query-history-window-days 30
```
Use `--no-query-history` to skip query history for one run when `ktx.yaml`
enables it.
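
To request query history for one run without editing `ktx.yaml`, a wrapper can add the per-run flags, with an optional lookback window. `ingest_with_query_history` is a hypothetical helper, and passing `--deep`, `--query-history`, and `--query-history-window-days` together in one call is an assumption based on the flags above.

```bash
# Hypothetical helper: deep ingest with query history for this run only.
# Usage: ingest_with_query_history <connectionId> [windowDays]
# The lookback window is documented for BigQuery and Snowflake.
ingest_with_query_history() {
  local conn="$1" window_days="${2:-}"
  if [ -n "$window_days" ]; then
    ktx ingest "$conn" --deep --query-history \
      --query-history-window-days "$window_days"
  else
    ktx ingest "$conn" --deep --query-history
  fi
}
```

For example, `ingest_with_query_history warehouse 30` runs `ktx ingest warehouse --deep --query-history --query-history-window-days 30`. This page documents the window for BigQuery and Snowflake, so omit the second argument for PostgreSQL.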
## Relationship evidence
KTX scores relationship candidates during deep ingest of supported databases. The
public CLI does not expose separate subcommands for reviewing relationships.
## Context-source ingest
Context-source connections pull metadata from dbt, BI tools, Notion, and other
configured systems. Pass one connection id or `--all`.
```bash
# Build one source connection
ktx ingest dbt_main
# Build every configured database and source connection
ktx ingest --all
```
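
When you refresh a database and a context source one at a time instead of with `--all`, running them in the same database-first order lets source metadata attach to warehouse tables that are already known. This is a minimal sketch: `refresh_pair` is a hypothetical helper, `warehouse` and `dbt_main` are the example ids from this page, and it assumes `ktx ingest` exits non-zero on failure so the source step is skipped when the database step fails. Run it as `refresh_pair warehouse dbt_main`.

```bash
# Hypothetical helper: refresh a database, then a context source,
# mirroring the database-first order that `ktx ingest --all` uses.
# Assumes ktx exits non-zero on failure.
refresh_pair() {
  local db_conn="$1" source_conn="$2"
  # Database first, so warehouse tables are known...
  ktx ingest "$db_conn" || return 1
  # ...then the source, whose metadata can attach to those tables.
  ktx ingest "$source_conn"
}
```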
Supported source types:
| Driver | Typical source | Output |
|--------|----------------|--------|
| `dbt` | dbt project or Git repo | Semantic sources with model, column, test, tag, and description metadata |
| `metricflow` | MetricFlow project or Git repo | Metrics, dimensions, entities, and semantic joins |
| `lookml` | LookML files or Git repo | Views, explores, dimensions, measures, and joins |
| `looker` | Looker API | Explores, looks, dashboards, and model metadata |
| `metabase` | Metabase API | Questions, dashboards, table metadata, and mappings |
| `notion` | Notion API | Wiki pages and business knowledge |

Source ingest writes semantic-layer YAML and wiki Markdown, merging with local
edits.
## Text ingest
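Use `ktx ingest --text` or `ktx ingest --file` to capture notes, Markdown, runbooks,
Slack exports, or other text as searchable memory. Both flags reject a positional
connection id; pass `--connection-id` to tag the captured notes.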
```bash
# Capture a Markdown file
ktx ingest --file docs/revenue-notes.md --connection-id warehouse
# Capture one stdin item
printf "Refunds are excluded from net revenue." | ktx ingest --file -
# Capture direct text
ktx ingest --text "ARR excludes one-time implementation fees."
```
Useful flags:
| Flag | Description |
|------|-------------|
| `--text <content>` | Capture inline text into memory; repeatable |
| `--file <path>` | Capture a text file (or `-` for stdin) into memory; repeatable |
| `--connection-id <connectionId>` | Attach the captured memory to a KTX connection |
| `--user-id <id>` | Attribute the capture to a user scope (default: `local-cli`) |
| `--json` | Print structured output |
| `--fail-fast` | Stop at the first failed text or file item |

Use text ingest for small, high-signal documents. Prefer configured source
ingest for Notion, dbt, Metabase, and similar systems.
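
Because `--file` is repeatable, several documents can be captured in one call. The sketch below builds that argument list from a list of paths. `ingest_files` is a hypothetical helper, and it refuses to run without at least one path, so it never issues a `ktx ingest` call that has no `--file` flags.

```bash
# Hypothetical helper: capture several files in one call via repeated --file.
# Usage: ingest_files <connectionId> <path>...
ingest_files() {
  if [ "$#" -lt 2 ]; then
    echo "usage: ingest_files <connectionId> <path>..." >&2
    return 1
  fi
  local conn="$1" f
  shift
  # Rotate the positional parameters: append "--file <path>" for each
  # original path, then drop that path from the front of the list.
  for f in "$@"; do
    set -- "$@" --file "$f"
    shift
  done
  ktx ingest "$@" --connection-id "$conn"
}
```

For example, `ingest_files warehouse docs/revenue-notes.md docs/runbook.md` (the second path is hypothetical) runs `ktx ingest --file docs/revenue-notes.md --file docs/runbook.md --connection-id warehouse`.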
## Output and artifacts
Every ingest run prints a summary. Use `--json` for machine-readable output in scripts and agents.
```bash
ktx ingest --all --json
```
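
In scripts, you can keep the `--json` summary on disk for later steps. This sketch makes no assumption about the JSON fields; it only assumes `ktx` exits non-zero when the run fails. `ingest_all_json` and the default file name `ingest-summary.json` are hypothetical. Call it as `ingest_all_json`, or pass another output path as the first argument.

```bash
# Hypothetical helper: save the --json summary of a full ingest to a file.
# Makes no assumption about the JSON fields; assumes ktx exits non-zero on failure.
ingest_all_json() {
  local out_file="${1:-ingest-summary.json}"
  if ! ktx ingest --all --json > "$out_file"; then
    echo "ktx ingest failed; see $out_file for any output" >&2
    return 1
  fi
  echo "summary written to $out_file"
}
```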
Typical generated files:
| Path | Created by | Purpose |
|------|------------|---------|
| `semantic-layer/<connection-id>/*.yaml` | Database and source ingest | Queryable semantic source definitions |
| `wiki/global/*.md` | Source, text, and memory ingest | Shared business definitions and notes |
| `wiki/user/<user-id>/*.md` | Text and memory ingest | User-scoped context |
| `.ktx/setup/context-build.json` | Setup context build | Resume and readiness state for setup |

Ingest transcripts include tool calls, LLM responses, and write decisions.
## Example: first full refresh
After interactive setup:
```bash
ktx status
ktx ingest --all --deep
ktx status
```
Then inspect what changed:
```bash
git status --short
ktx sl --json
ktx wiki "revenue" --json --limit 10
```
## Common errors
| Symptom | Likely cause | Recovery |
|---------|--------------|----------|
| Connection not configured | The connection id is missing from `ktx.yaml` | Add it with `ktx setup` |
| Deep readiness is missing | LLM or embeddings are not configured | Run `ktx setup`, or rerun with `--fast` |
| Query history is unsupported | The selected database driver does not expose query history | Run schema ingest without query-history flags |
| No connections configured | The project has no entries under `connections` | Run `ktx setup` and add a database or source connection |
| Depth or query-history flags have no effect | The flags were passed for a context-source connection | Use them only with database connections |
| Text ingest stops early | `--fail-fast` stopped on the first failed item | Fix the item or rerun without `--fail-fast` |