docs: tighten guide copy (#131)

Luca Martial 2026-05-18 09:57:27 -04:00 committed by GitHub
parent d60d83e595
commit c539433d66
GPG key ID: B5690EEEBB952194
4 changed files with 67 additions and 115 deletions

View file

@ -3,12 +3,9 @@ title: Building Context
description: Build and refresh KTX context from databases, source tools, query history, and text.
---
Building context turns configured connections into local semantic-layer sources
and wiki pages. Agents use those files to understand your schema, business
definitions, metric logic, joins, and known caveats before they write SQL.
Use this guide after `ktx setup` has created `ktx.yaml` and at least one
database or context-source connection.
Build context after `ktx setup` creates `ktx.yaml` and at least one database or
context-source connection. KTX writes local semantic-layer sources and wiki
pages for agents to use before writing SQL.
## The build loop
@ -22,15 +19,12 @@ Most projects use this loop:
5. Validate and query representative sources before handing the context to an
agent.
`ktx ingest --all` runs database connections first, then context-source
connections. That order lets dbt, BI, Notion, and text ingest attach context to
known warehouse tables.
`ktx ingest --all` runs databases first, then context-source connections, so
external metadata can attach to known warehouse tables.
## Database ingest
Database ingest connects to a configured warehouse and records local schema
context. It gives agents table, column, type, constraint, and row-count
grounding without requiring them to inspect the database directly.
Database ingest records table, column, type, constraint, and row-count context.
```bash
# Build one configured database connection
@ -55,20 +49,16 @@ ktx ingest warehouse --deep
ktx ingest --all --deep
```
Deep ingest needs LLM and embedding readiness. If those providers are not
configured, run `ktx setup` or use `--fast`.
Deep ingest needs LLM and embedding readiness. If either provider is not
configured, run `ktx setup` or use `--fast`.
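
A minimal sketch of that fallback, assuming the caller already knows whether LLM and embedding providers are configured. `choose_ingest_mode` is a hypothetical helper, not part of KTX, and combining `--fast` with `--all` is an assumption based on the `--all --deep` form shown in this guide. The script only prints the command it would run.

```bash
# Hypothetical helper (not part of KTX): map a readiness flag to an ingest mode.
# $1 is "ready" when LLM and embedding providers are configured; how that is
# determined is left to the caller.
choose_ingest_mode() {
  if [ "$1" = "ready" ]; then
    echo "--deep"
  else
    echo "--fast"
  fi
}

# Print, rather than run, the resulting command.
# Assumption: `--fast` can be combined with `--all` the same way `--deep` can.
mode=$(choose_ingest_mode "not-ready")
echo "ktx ingest --all $mode"
```

Printing the command keeps the sketch safe to run in a checkout where setup has not finished.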
When you use `claude-code`, KTX still controls the tool surface for ingest and
memory capture. Claude Code built-in tools, discovered MCP servers, plugins,
skills, agents, and slash commands are not invokable by KTX agent loops unless
they are exact KTX MCP tools for the current run.
With `claude-code`, KTX agent loops can invoke only the KTX MCP tools for the
current run.
## Query history
PostgreSQL, BigQuery, and Snowflake can add query-history context. This helps
KTX learn common joins, filters, service-account patterns, redaction rules, and
usage-heavy query templates. BigQuery and Snowflake support a lookback window;
Postgres reads the current `pg_stat_statements` aggregate data instead.
PostgreSQL, BigQuery, and Snowflake can add query-history context: common joins,
filters, service-account patterns, redaction rules, and high-usage templates.
Enable it during setup, store it under `connections.<id>.context.queryHistory`,
or request it for one run:
@ -84,19 +74,13 @@ for one run.
## Relationship evidence
Many databases do not declare all foreign keys. KTX can score relationship
candidates using signals such as name similarity, type compatibility, value
overlap, embedding similarity, uniqueness, null rate, and structural priors.
The public CLI does not expose separate relationship review subcommands.
Relationship evidence is built as part of deep database ingest when the
connector and readiness checks support it.
KTX scores relationship candidates during supported deep database ingest. The
public CLI does not expose separate relationship review subcommands.
## Context-source ingest
Context-source connections pull business metadata from tools your team already
uses. The current public `ktx ingest` command is connection-centric: pass one
configured connection id, or pass `--all`.
Context-source connections pull metadata from dbt, BI tools, Notion, and other
configured systems. Pass one connection id or `--all`.
```bash
# Build one source connection
@ -117,14 +101,13 @@ Supported source types:
| `metabase` | Metabase API | Questions, dashboards, table metadata, and mappings |
| `notion` | Notion API | Wiki pages and business knowledge |
Source ingest extracts metadata, reconciles it with existing local context, and
writes semantic-layer YAML plus wiki Markdown. It merges rather than blindly
overwriting local edits.
Source ingest writes semantic-layer YAML and wiki Markdown, merging with local
edits.
## Text ingest
Use `ktx ingest text` for notes, Markdown files, runbooks, Slack exports, or
other free-form knowledge that should become searchable KTX memory.
Use `ktx ingest text` for notes, Markdown, runbooks, Slack exports, or other
searchable memory.
```bash
# Capture a Markdown file
@ -146,14 +129,12 @@ Useful flags:
| `--json` | Print structured output |
| `--fail-fast` | Stop after the first failed text item |
Text ingest is a good fit for small, high-signal documents. For system-specific
connectors such as Notion, dbt, or Metabase, prefer configured source ingest so
KTX can preserve source metadata.
Use text ingest for small, high-signal documents. Prefer configured source
ingest for Notion, dbt, Metabase, and similar systems.
## Output and artifacts
Every ingest run prints a summary. Use `--json` when an agent or script needs a
structured plan and per-target results.
Every ingest run prints a summary. Use `--json` for scripts and agents.
```bash
ktx ingest --all --json
@ -168,9 +149,7 @@ Typical generated files:
| `wiki/user/<user-id>/*.md` | Text and memory ingest | User-scoped context |
| `.ktx/setup/context-build.json` | Setup context build | Resume and readiness state for setup |
Ingest sessions also record transcripts with tool calls, LLM responses, and
write decisions. Inspect them when you need to debug why a source or wiki page
was written a certain way.
Ingest transcripts include tool calls, LLM responses, and write decisions.
## Example: first full refresh

View file

@ -3,8 +3,8 @@ title: LLM configuration
description: Configure KTX LLM providers, model roles, and prompt caching.
---
KTX uses the top-level `llm` block in `ktx.yaml` for text generation,
structured extraction, and ingest or memory agent loops.
Configure text generation, structured extraction, and ingest or memory loops in
the top-level `llm` block.
## Backends
@ -15,9 +15,7 @@ Set `llm.provider.backend` to one of these values:
- `vertex`: Use Vertex AI Anthropic models through Google Cloud credentials.
- `gateway`: Use AI Gateway-compatible Anthropic model ids.
- `claude-code`: Use your local Claude Code session through the Claude Agent
SDK. KTX removes provider-routing environment variables from Claude Code
child processes, so this backend doesn't silently fall back to
`ANTHROPIC_API_KEY`, Vertex, Gateway, or Bedrock credentials.
SDK. KTX strips provider-routing environment variables from child processes.
## Claude Code
@ -36,26 +34,20 @@ llm:
repair: sonnet
```
During setup, choose the Claude Code backend interactively or pass the model in
automation:
During setup, choose the backend interactively or pass the model in automation:
```bash
ktx setup --llm-backend claude-code --llm-model opus --no-input
```
For Claude Code, `sonnet`, `opus`, and `haiku` map to the current KTX defaults.
You can also pass a full Claude model ID, such as `claude-opus-4-7`.
For Claude Code, `sonnet`, `opus`, and `haiku` map to KTX defaults. Full Claude
model IDs are also accepted.
`claude-code` keeps KTX tool boundaries intact. KTX exposes only the MCP tools
needed for the current KTX agent loop, disables Claude Code built-in tools,
keeps plugins empty, and denies every non-KTX tool request through
`canUseTool`. The Claude Agent SDK may still report host-discovered slash
commands, skills, and subagent names in init metadata; that metadata is not an
execution grant for KTX agent loops.
`claude-code` exposes only KTX MCP tools for the current agent loop. SDK init
metadata may still list host slash commands, skills, and subagents; KTX does not
grant execution access to them.
## Prompt caching
`llm.promptCaching` has partial parity on `claude-code`. KTX doesn't pass
Anthropic cache-control markers to the Claude Agent SDK. Status and doctor warn
when you configure prompt-cache TTL, tool, or history fields that the Claude
Agent SDK backend ignores.
`llm.promptCaching` has partial parity on `claude-code`. Status and doctor warn
when the Claude Agent SDK backend ignores configured cache fields.

View file

@ -3,9 +3,8 @@ title: Serving Agents
description: Expose KTX context to Claude Code, Codex, Cursor, OpenCode, and custom agents.
---
KTX serves agents through the public CLI and project-local instruction files.
Agents do not need a separate server. They read the generated rules, call KTX
commands, inspect local context files, and use JSON output when they need
KTX serves agents through the CLI and project-local instruction files. Agents
read generated rules, call KTX commands, inspect context files, and use JSON for
structured results.
## Recommended setup
@ -39,14 +38,13 @@ ktx setup --agents --target claude-code --global
ktx setup --agents --target codex --global
```
KTX records installed files in `.ktx/agents/install-manifest.json`. Rerun
`ktx setup --agents` after moving a checkout or reinstalling the CLI so the
generated instructions point at the current CLI path.
Installed files are recorded in `.ktx/agents/install-manifest.json`. Rerun
`ktx setup --agents` after moving a checkout or reinstalling the CLI.
## Agent command set
All supported agent clients use the same command surface. Use `--project-dir`
when the agent is running outside the KTX project directory.
All supported clients use the same command surface. Use `--project-dir` outside
the KTX project directory.
### Readiness
@ -54,9 +52,8 @@ when the agent is running outside the KTX project directory.
ktx status --json
```
Agents should run this before relying on context. It reports project, LLM,
embedding, database, context-source, context-build, and agent-integration
readiness.
Run this before relying on context. It reports project, provider, connection,
context-build, and agent-integration readiness.
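
One way to script this check is to save the report before doing anything else. The sketch below assumes that `ktx status --json` writes its report to standard output and exits non-zero on failure; neither detail is confirmed by this guide. `check_readiness` and the `KTX_BIN` override are hypothetical names introduced here for illustration.

```bash
# Hypothetical wrapper (not part of KTX): save the readiness report to a file.
# KTX_BIN is an illustrative override so a stub can stand in for the real CLI.
# Assumption: `ktx status --json` prints to stdout and exits non-zero on failure.
check_readiness() {
  report_file="$1"
  if ! "${KTX_BIN:-ktx}" status --json > "$report_file"; then
    echo "ktx status failed; see $report_file" >&2
    return 1
  fi
  echo "readiness report saved to $report_file"
}

# Example call (commented out so the sketch has no side effects):
# check_readiness /tmp/ktx-status.json
```

The report's field names are not documented here, so the sketch stores the raw JSON instead of parsing it.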
### Semantic layer discovery
@ -66,8 +63,8 @@ ktx sl list --connection-id warehouse --json
ktx sl search "revenue" --json --limit 10
```
Agents use these commands to discover source names, connection ids, measures,
dimensions, and likely files to inspect.
Use these commands to find source names, connection ids, measures, dimensions,
and files to inspect.
### Semantic-layer validation and queries
@ -106,9 +103,8 @@ ktx wiki list --json
ktx wiki search "revenue recognition" --json --limit 10
```
Agents should search wiki context when a question depends on business
definitions, metric caveats, process rules, or terms that are not obvious from
schema names.
Search wiki context for business definitions, metric caveats, process rules, and
non-obvious terms.
### Context refresh
@ -120,8 +116,7 @@ ktx ingest --all
ktx ingest text docs/revenue-notes.md --connection-id warehouse
```
Use `--deep` only when LLM and embedding setup is ready and the user expects an
AI-enriched refresh.
Use `--deep` only when LLM and embedding setup is ready.
## Good agent behavior
@ -135,14 +130,12 @@ Agents should:
- Validate edited semantic sources with `ktx sl validate`.
- Keep generated context changes reviewable in git.
Agents should not assume a background server, ORPC route, frontend app, or
external migration system exists. KTX is a local context layer with a CLI and
plain project files.
KTX is a local context layer with a CLI and plain project files. Do not assume a
background server, ORPC route, frontend app, or external migration system.
## Manual setup
Manual setup is useful for custom agents that can read project-local
instructions but are not yet a named target.
Use manual setup for custom agents that can read project-local instructions.
1. Install the universal target:

View file

@ -3,12 +3,8 @@ title: Writing Context
description: Edit semantic sources and wiki pages so agents use your business logic.
---
KTX context is meant to be edited. Ingest gives you a grounded first draft, then
you refine source YAML and wiki Markdown until agents can answer data questions
with the same definitions your team uses.
Use this guide when you are adding measures, fixing joins, documenting business
rules, or reviewing context changes made by an agent.
Ingest creates the first draft. Edit source YAML and wiki Markdown when you need
sharper metrics, joins, or business rules.
## Editing workflow
@ -45,10 +41,8 @@ Use this order for most context changes:
## Semantic sources
Semantic sources are YAML files that describe queryable entities. A source is
usually a table, but it can also point at a custom SQL expression. Sources
define the vocabulary agents use for measures, dimensions, segments, joins, and
grain-aware query planning.
Semantic sources are YAML files for queryable tables or custom SQL. They define
agent-facing measures, dimensions, segments, joins, and grain.
Source files live at:
@ -198,8 +192,8 @@ joins:
## Measures
Good measures have precise names, SQL expressions at the correct grain, and
descriptions that say what is included and excluded.
Good measures have precise names, correct-grain SQL, and descriptions that name
key inclusions and exclusions.
```yaml
measures:
@ -209,14 +203,13 @@ measures:
description: Completed order revenue after refunds, excluding cancelled orders.
```
Prefer one canonical measure plus wiki synonyms over several nearly identical
measures. If your team uses multiple definitions, document the distinction in a
wiki page and link it with `sl_refs`.
Prefer one canonical measure plus wiki synonyms. Put competing definitions in a
linked wiki page.
## Joins and grain
`grain` and `relationship` prevent agents from producing double-counted SQL.
State the row grain even when it seems obvious.
`grain` and `relationship` prevent double-counted SQL. State the row grain even
when it seems obvious.
```yaml
grain:
@ -228,8 +221,7 @@ joins:
```
Use `many_to_one` for dimensions such as customer, account, product, or plan.
Use `one_to_many` only when the target can fan out the source rows, such as
orders to order items.
Use `one_to_many` only when the target can fan out rows.
## Validate and query
@ -239,8 +231,7 @@ Validation checks source YAML against the live database schema:
ktx sl validate orders --connection-id warehouse
```
It catches missing columns, invalid join targets, and table-reference problems
before an agent relies on the source.
It catches missing columns, invalid joins, and table-reference problems.
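
When several sources change at once, the same check can run in a loop. This is a sketch, assuming `ktx sl validate` exits non-zero when validation fails, which this guide does not state. `validate_sources` and the `KTX_BIN` override are hypothetical names, not part of KTX.

```bash
# Hypothetical helper (not part of KTX): validate several sources on one
# connection and report every failure instead of stopping at the first.
# KTX_BIN is an illustrative override so a stub can stand in for the real CLI.
# Assumption: `ktx sl validate` exits non-zero when a source is invalid.
validate_sources() {
  connection_id="$1"
  shift
  failed=0
  for source in "$@"; do
    if ! "${KTX_BIN:-ktx}" sl validate "$source" --connection-id "$connection_id"; then
      echo "validation failed: $source" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Example call; `customers` is a made-up source name:
# validate_sources warehouse orders customers
```

Continuing past the first failure gives a reviewer the full list of broken sources in one pass.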
Compile a query to inspect generated SQL:
@ -268,9 +259,8 @@ ktx sl query \
## Wiki pages
Wiki pages capture business context that does not belong in a single source
file: metric policies, dashboard caveats, company vocabulary, data freshness,
known issues, and source-of-truth notes.
Wiki pages hold context that does not belong in one source file: policies,
caveats, vocabulary, freshness, known issues, and source-of-truth notes.
Wiki files live under:
@ -280,8 +270,7 @@ wiki/
user/<user-id>/
```
Use global pages for shared business rules. Use user-scoped pages for local
notes, personal conventions, or context that should not be shared broadly.
Use global pages for shared rules and user-scoped pages for local notes.
### Wiki page example
@ -338,8 +327,7 @@ ktx sl search "revenue" --json
ktx wiki search "revenue recognition" --json --limit 10
```
Check that definitions are specific, hidden columns stay hidden, joins have
explicit relationships, and measures compile into the expected SQL.
Check definitions, hidden columns, join relationships, and generated SQL.
## Common errors