---
title: Building Context
description: Build and refresh ktx context from databases, context sources, query history, and text.
---

Build context after `ktx setup` creates `ktx.yaml` and at least one database or
context-source connection. **ktx** writes local semantic sources and wiki
pages for agents to use before writing SQL.

## The build loop

Most projects use this loop:

1. Check readiness with `ktx status`.
2. Build one connection with `ktx ingest <connectionId>`, or build everything
with `ktx ingest --all`.
3. Search or inspect the generated files under `semantic-layer/` and `wiki/`.
4. Edit source YAML or Markdown when business logic needs refinement.
5. Validate and query representative sources before handing the context to an
agent.
`ktx ingest --all` runs databases first, then context-source connections, so
external metadata can attach to known warehouse tables.
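
If you script the loop, for example in CI, you can ingest connections one at a time in the same order that `ktx ingest --all` uses. This is a hypothetical helper, not part of **ktx**. The names `refresh_context`, `DATABASE_IDS`, `CONTEXT_SOURCE_IDS`, and the `KTX` override are illustrative. Gating on `ktx status` assumes that it exits non-zero when the project is not ready, which this page does not state.

```bash
# Hypothetical helper: run the build loop one connection at a time, in the
# same order as `ktx ingest --all` (databases first, then context sources).
# DATABASE_IDS and CONTEXT_SOURCE_IDS are space-separated ids you maintain
# (intentionally unquoted below so they split into words).
# Set KTX=echo for a dry run that only prints the commands.
refresh_context() {
  local ktx_cmd="${KTX:-ktx}" id

  # Assumption: `ktx status` exits non-zero when the project is not ready.
  "$ktx_cmd" status || return

  for id in $DATABASE_IDS; do
    "$ktx_cmd" ingest "$id" || return
  done
  for id in $CONTEXT_SOURCE_IDS; do
    "$ktx_cmd" ingest "$id" || return
  done

  # Re-check readiness after the build.
  "$ktx_cmd" status
}
```

For example, `DATABASE_IDS=warehouse CONTEXT_SOURCE_IDS=dbt_main refresh_context`. Plain `ktx ingest --all` already applies this order, so a helper like this only matters when you need per-connection control.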

## Database ingest

Database ingest always builds enriched context: tables, columns, types,
constraints, and row counts, plus AI-generated descriptions, embeddings, and
relationship evidence.

```bash
# Build one configured database connection
ktx ingest warehouse

# Build all configured connections
ktx ingest --all
```
Enriched ingest needs a configured model and embeddings. Run `ktx setup` first;
without that configuration, ingest fails before any work starts.
Local-auth backends keep provider credentials out of `ktx.yaml`:
```bash
ktx setup --llm-backend claude-code --no-input
ktx setup --llm-backend codex --no-input
```

With `claude-code`, **ktx** agent loops can invoke only the **ktx** MCP tools
for the current run. With `codex`, **ktx** restricts the temporary runtime MCP
server to the current run's tool set, disables Codex web search, requests a
read-only sandbox, and sets `approval_policy=never`. The public Codex SDK and
CLI surface may still load the user's Codex config and expose built-in command
execution or read-only file capabilities, so use `claude-code` when you need
stricter runtime tool isolation.

## Query history

PostgreSQL, BigQuery, and Snowflake can add query-history context: common joins,
filters, redaction rules, high-usage templates, and service-account exclusions.
When query history is enabled during setup, **ktx** reviews observed in-scope
roles and can write exact `filters.serviceAccounts` patterns for operational
traffic such as loader or refresh roles.
Enable it during setup, store it under `connections.<id>.context.queryHistory`,
or request it for one run:

```bash
# Include query history for one run
ktx ingest warehouse --query-history

# Set the lookback window for BigQuery or Snowflake query history
ktx ingest warehouse --query-history-window-days 30
```

Use `--no-query-history` to skip a stored query-history setting for one run.
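
When a script refreshes several databases, a small wrapper can pick per-run query-history flags by driver, following the support notes above. PostgreSQL, BigQuery, and Snowflake support query history, and the lookback window applies to BigQuery and Snowflake. This is a sketch under stated assumptions, not confirmed **ktx** behavior. The function name and the driver ids (`postgres`, `bigquery`, `snowflake`) are illustrative. It also assumes that `--query-history` and `--query-history-window-days` can be combined in one run.

```bash
# Hypothetical wrapper: choose query-history flags from the database driver.
# Driver ids are illustrative; use the ones your ktx.yaml actually contains.
# Set KTX=echo for a dry run that only prints the command.
ingest_with_query_history() {
  local id="$1" driver="$2" days="${3:-30}"

  case "$driver" in
    bigquery|snowflake)
      # Assumption: both flags can be passed together.
      set -- --query-history --query-history-window-days "$days" ;;
    postgres)
      set -- --query-history ;;
    *)
      # Unsupported driver: pass no query-history flags at all.
      set -- ;;
  esac

  "${KTX:-ktx}" ingest "$id" "$@"
}
```

Passing no flags for unsupported drivers matches the recovery advice for the "Query history is unsupported" error.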

## Relationship evidence

**ktx** scores relationship candidates during database ingest. The public CLI
does not expose separate relationship review subcommands.

## Context-source ingest

Context-source connections pull metadata from dbt, BI tools, Notion, and other
configured systems. Pass one connection id or `--all`.

```bash
# Build one context-source connection
ktx ingest dbt_main

# Build every configured database and context-source connection
ktx ingest --all
```

Supported source types:

| Driver | Typical source | Output |
|--------|----------------|--------|
| `dbt` | dbt project or Git repo | Semantic sources with model, column, test, tag, and description metadata |
| `metricflow` | MetricFlow project or Git repo | Metrics, dimensions, entities, and semantic joins |
| `lookml` | LookML files or Git repo | Views, explores, dimensions, measures, and joins |
| `looker` | Looker API | Explores, looks, dashboards, and model metadata |
| `metabase` | Metabase API | Questions, dashboards, table metadata, and mappings |
| `notion` | Notion API | Wiki pages and business knowledge |

Context-source ingest writes semantic source YAML and wiki Markdown and reconciles
the results with local edits.

## Text ingest

Use `ktx ingest --text` or `ktx ingest --file` to capture notes, Markdown,
runbooks, Slack exports, or other text as searchable memory.

```bash
# Capture a Markdown file and attach it to the warehouse connection
ktx ingest --file docs/revenue-notes.md --connection-id warehouse

# Capture one item from stdin
printf "Refunds are excluded from net revenue." | ktx ingest --file -

# Capture direct text
ktx ingest --text "ARR excludes one-time implementation fees."
```

Useful flags:

| Flag | Description |
|------|-------------|
| `--text <content>` | Capture inline text into memory; repeatable |
| `--file <path>` | Capture a text file (or `-` for stdin) into memory; repeatable |
| `--connection-id <connectionId>` | Attach the captured memory to a **ktx** connection |
| `--user-id <id>` | Attribute the capture to a user scope (default: `local-cli`) |
| `--json` | Print structured output |
| `--fail-fast` | Stop after the first failed text or file item |

Use text ingest for small, high-signal documents. Prefer configured context-source
ingest for Notion, dbt, Metabase, and similar systems.
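
To capture a small folder of notes in one run, a script can pass one `--file` flag per file, because `--file` is repeatable. It can also add `--fail-fast` so that a bad item stops the run. `capture_notes` and the `KTX` override are hypothetical names for this sketch. Extra arguments such as `--connection-id warehouse` are passed through to `ktx ingest` unchanged.

```bash
# Hypothetical helper: capture every Markdown file in a directory with one
# `ktx ingest` call. Extra arguments are passed through to ktx unchanged.
# Set KTX=echo for a dry run that only prints the command.
capture_notes() {
  local dir="$1" found=0 f
  shift

  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue        # an unmatched glob stays literal; skip it
    set -- "$@" --file "$f"        # --file is repeatable
    found=$((found + 1))
  done

  if [ "$found" -eq 0 ]; then
    echo "no Markdown files in $dir" >&2
    return 1
  fi

  "${KTX:-ktx}" ingest --fail-fast "$@"
}
```

For example, `capture_notes docs/notes --connection-id warehouse` captures every `.md` file in `docs/notes` and attaches the results to the `warehouse` connection.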

## Output and artifacts

Every ingest run prints a summary. Use `--json` for scripts and agents.

```bash
ktx ingest --all --json
```
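
In CI, a script can keep the JSON summary as a build artifact and fail the step when the output does not parse. This is a hedged sketch with two assumptions. First, `save_ingest_summary` and `KTX` are hypothetical names. Second, it assumes the summary is the only thing `ktx ingest --json` writes to stdout. It checks only that the output is valid JSON, because this page does not document the summary's fields. Validation uses Python's standard `json.tool` module.

```bash
# Hypothetical CI step: save the JSON ingest summary and check that it parses.
# Assumes the summary is the only output ktx writes to stdout with --json.
save_ingest_summary() {
  local out="${1:-ingest-summary.json}"

  "${KTX:-ktx}" ingest --all --json > "$out" || return
  # json.tool exits non-zero (and prints the parse error) on invalid JSON.
  python3 -m json.tool "$out" > /dev/null
}
```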

Typical generated files:

| Path | Created by | Purpose |
|------|------------|---------|
| `semantic-layer/<connection-id>/*.yaml` | Database and context-source ingest | Queryable semantic source definitions |
| `wiki/global/*.md` | Context-source, text, and memory ingest | Shared business definitions and notes |
| `wiki/user/<user-id>/*.md` | Text and memory ingest | User-scoped context |
| `.ktx/setup/context-build.json` | Setup context build | Resume and readiness state for setup |
Ingest transcripts include tool calls, LLM responses, and write decisions.

## Example: first full refresh

After interactive setup:

```bash
# Check readiness before building
ktx status
# Build every configured connection
ktx ingest --all
# Confirm the project is still ready after the build
ktx status
```

Then inspect what changed:

```bash
# List files that ingest added or changed
git status --short
ktx sl --json
ktx wiki "revenue" --json --limit 10
```

## Common errors

| Symptom | Likely cause | Recovery |
|---------|--------------|----------|
| Connection not configured | The connection id is missing from `ktx.yaml` | Add it with `ktx setup` |
| Enrichment is not configured | LLM or embeddings are not setup-ready | Run `ktx setup` to configure a model and embeddings |
| Query history is unsupported | The selected database driver does not expose query history | Run ingest without query-history flags |
| No connections configured | The project has no entries under `connections` | Run `ktx setup` and add a database or context-source connection |
| Context-source flags have no effect | Query-history flags were supplied for a context-source connector | Use query-history flags only for database connections |
| Text ingest stops early | `--fail-fast` stopped on the first failed item | Fix the item or rerun without `--fail-fast` |