docs(docs-site): make core guides agent-friendly

Luca Martial 2026-05-11 16:42:08 -07:00
parent dfa4651ebc
commit 9580bd243d
4 changed files with 191 additions and 1 deletions

@@ -5,6 +5,20 @@ description: Set up KTX and build your first context in under 10 minutes.
This guide walks you through `ktx setup` — an interactive wizard that configures your LLM provider, connects your database, optionally ingests from your existing tools, builds context, and installs agent integration.
## Workflow summary
Use this sequence when an agent needs to set up KTX from a fresh checkout:
1. `pnpm install` — install workspace dependencies.
2. `pnpm run setup:dev` — build local packages and prepare the development CLI.
3. `pnpm run link:dev` — link the `ktx` command for local use.
4. `ktx setup` — create or resume a KTX project.
5. `ktx status` — verify project readiness.
6. `ktx sl list` — confirm semantic-layer sources are available.
7. `ktx sl query ... --format sql` — compile a semantic query without executing it.
The setup wizard is stateful. If it exits before completion, rerun `ktx setup` in the same project directory to resume from the first incomplete step.
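The resume rule described above can be pictured as a scan for the first step that has no completion record. The sketch below is illustrative only: the step ids and the `completed` list are assumptions made for this example, not KTX's actual state format.

```typescript
// Hypothetical step ids, loosely following the wizard stages named in this guide.
// KTX's real state format is not shown here; this only illustrates the resume rule.
type SetupStep = "llm" | "database" | "ingest" | "context" | "agents";

const SETUP_ORDER: SetupStep[] = ["llm", "database", "ingest", "context", "agents"];

// Returns the first step with no completion record, or null when every step is done.
// An optional step that the user skipped would be recorded as completed too.
function firstIncompleteStep(completed: SetupStep[]): SetupStep | null {
  for (const step of SETUP_ORDER) {
    if (completed.indexOf(step) === -1) {
      return step;
    }
  }
  return null;
}

// A run that stopped after the database step resumes at ingestion.
const resumeAt = firstIncompleteStep(["llm", "database"]); // "ingest"
```

Because the scan always starts from the top of the ordered list, rerunning it after a partial run never repeats a step that already has a record.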
## Prerequisites
- **Node.js 22+** and **pnpm**
@@ -198,6 +212,20 @@ Then select which agents to install for:
**MCP mode** writes an MCP server configuration (e.g., `.mcp.json`) that lets the agent call KTX tools like `sl_query`, `knowledge_search`, and `sl_write_source` over the Model Context Protocol.
## Generated files
KTX writes project state as plain files so agents can inspect and edit changes in git.
| Path | Created by | Purpose |
|------|------------|---------|
| `ktx.yaml` | `ktx setup` | Main project configuration: connections, LLM settings, embeddings, and context sources |
| `.ktx/secrets/*` | `ktx setup` when file-backed secrets are selected | Local secret files referenced from `ktx.yaml`; do not commit these |
| `semantic-layer/<connection-id>/*.yaml` | context build, ingestion, or `ktx sl write` | Semantic source definitions agents use for SQL generation |
| `knowledge/global/*.md` | ingestion or `ktx wiki write --scope global` | Shared business context and metric definitions |
| `knowledge/user/<user-id>/*.md` | `ktx wiki write --scope user` | User-scoped notes for one agent/user context |
| `.mcp.json`, `.cursor/mcp.json`, `.agents/mcp/ktx.json`, `.opencode/mcp.json` | agent integration setup | MCP server configuration for supported agent clients |
| `.claude/skills/ktx/SKILL.md`, `.agents/skills/ktx/SKILL.md` | CLI-mode agent integration setup | Agent instructions for calling `ktx agent` commands |
## Verify it worked
Check your project status:
@@ -247,6 +275,18 @@ ktx sl query \
--execute --max-rows 10
```
## Common errors
| Error or symptom | Likely cause | Recovery |
|------------------|--------------|----------|
| `ktx: command not found` | The local CLI has not been linked | Run `pnpm run setup:dev` and `pnpm run link:dev` from the KTX checkout, then open a new shell |
| LLM health check fails | Missing, invalid, or unauthorized Anthropic API key | Export `ANTHROPIC_API_KEY` or rerun `ktx setup` and choose the file-backed secret option |
| OpenAI embedding check fails | `OPENAI_API_KEY` is missing when OpenAI embeddings are selected | Export `OPENAI_API_KEY`, or rerun setup and choose local sentence-transformers embeddings |
| Local embeddings hang or fail | The Python daemon cannot start or the local model runtime is unavailable | Run `uv sync --all-groups`, then start `ktx-daemon serve-http --host 127.0.0.1 --port 8765` and rerun setup |
| Database connection test fails | Credentials, network access, warehouse, database, or schema value is wrong | Test the same URL with the database's native client, then rerun `ktx connection add ... --force` or rerun setup |
| `KTX context built: no` in `ktx status` | Setup saved configuration but did not build context | Run `ktx setup context build` or rerun `ktx setup` and choose to build context now |
| Agent integration is incomplete | Setup skipped the agents step or the target was not installed | Run `ktx setup --agents --target codex --agent-install-mode both --project`, replacing `codex` with the target you need |
## Next steps
- **Build more context** — learn about [scanning](/docs/guides/building-context), relationship detection, and ingestion workflows in the Building Context guide.

@@ -5,6 +5,18 @@ description: Expose your context to Claude Code, Cursor, Codex, and other coding
Once you've built and refined your context, the final step is exposing it to coding agents. KTX provides two channels: an **MCP server** for persistent integration with tools like Claude Code and Cursor, and **CLI commands** for direct terminal access.
## Agent workflow summary
Agents should use KTX in this order:
1. Discover connections with `connection_list` or `ktx agent context --json`.
2. Discover semantic sources with `sl_list_sources` or `ktx agent sl list --json`.
3. Search knowledge with `knowledge_search` or `ktx agent wiki search`.
4. Query through the semantic layer with `sl_query` or `ktx agent sl query`.
5. Execute SQL only when execution is explicitly enabled and row limits are set.
Use the semantic layer first for analytics questions. Direct SQL is a fallback for read-only inspection, not the default path.
## MCP Server
The MCP (Model Context Protocol) server gives agents structured access to your entire context layer — semantic sources, knowledge pages, scans, and ingestion — through a standard tool-calling interface.
@@ -85,6 +97,27 @@ When an agent connects via MCP, it can call these tools:
| `memory_capture` | Capture knowledge and semantic updates from a conversation |
| `memory_capture_status` | Check the status of a memory capture run |
### Tool input reference
| Tool | Required inputs | Optional inputs | Output shape |
|------|-----------------|-----------------|--------------|
| `connection_list` | none | none | JSON list of configured connections |
| `connection_test` | `connectionId` | none | JSON test result with driver metadata or an error |
| `sl_list_sources` | none | `connectionId`, `query` | JSON list of semantic source summaries |
| `sl_read_source` | `sourceName`, `connectionId` | none | YAML source content and metadata |
| `sl_write_source` | `sourceName`, `connectionId`, source YAML or delete operation | none | Write result and validation details |
| `sl_validate` | `sourceName`, `connectionId` | none | Validation result with schema and join issues |
| `sl_query` | `connectionId`, measures or query payload | dimensions, filters, segments, order, limit, execute, maxRows | Compiled SQL, query plan, and rows when execution is enabled |
| `knowledge_search` | `query` | `limit`, `userId` | Ranked knowledge results with summaries |
| `knowledge_read` | `pageId` or key | `userId` | Full Markdown knowledge page |
| `knowledge_write` | key, summary, content | tags, refs, semantic-layer refs, scope, userId | Write result |
| `scan_trigger` | `connectionId`, mode | daemon URLs, dry-run options | Scan run id and status |
| `scan_status` | `runId` | none | Scan progress and current state |
| `scan_report` | `runId` | none | Completed scan report |
| `ingest_trigger` | connection/source adapter selection | limits and introspection URLs | Ingest run id and status |
| `ingest_status` | `runId` | none | Ingest progress, work units, and diff summary |
| `memory_capture` | conversation input | model and user options | Memory capture run id |
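An agent harness could check a call against the required inputs in the table before sending it. The sketch below is a minimal illustration, not part of KTX: the `REQUIRED_INPUTS` map and the `missingInputs` helper are hypothetical names, and only tools whose required fields are named explicitly in the table are included.

```typescript
// Required inputs copied from the tool input reference table.
// Tools whose required inputs are described loosely (for example "mode" or
// "conversation input") are left out on purpose rather than guessed.
const REQUIRED_INPUTS: Record<string, string[]> = {
  connection_list: [],
  connection_test: ["connectionId"],
  sl_list_sources: [],
  sl_read_source: ["sourceName", "connectionId"],
  sl_validate: ["sourceName", "connectionId"],
  knowledge_search: ["query"],
  scan_status: ["runId"],
  scan_report: ["runId"],
  ingest_status: ["runId"],
};

// Hypothetical helper: returns the required input names that are absent or empty.
function missingInputs(tool: string, args: Record<string, unknown>): string[] {
  const required = REQUIRED_INPUTS[tool];
  if (required === undefined) {
    throw new Error("No required-input list recorded for tool: " + tool);
  }
  return required.filter((field) => {
    const value = args[field];
    return value === undefined || value === null || value === "";
  });
}

// sl_read_source needs both a source name and a connection id.
const gaps = missingInputs("sl_read_source", { sourceName: "orders" }); // ["connectionId"]
```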
### How agents use these tools
A typical agent interaction flows like this:
@@ -97,6 +130,16 @@ A typical agent interaction flows like this:
Agents should use the semantic layer for analytics questions because it enforces correct joins, grain-aware aggregation, and consistent metric definitions. If SQL execution is enabled, KTX allows only read-only SQL with row limits.
### Workflow: answer an analytics question through MCP
1. `connection_list` — choose the relevant warehouse connection.
2. `sl_list_sources` with a search query — find candidate semantic sources.
3. `knowledge_search` with the user's business terms — find metric definitions and caveats.
4. `sl_read_source` for each candidate source — inspect measures, dimensions, joins, and grain.
5. `sl_query` with `execute: false` — compile SQL and inspect the generated query.
6. `sl_query` with `execute: true` and a bounded `maxRows` — execute only when the user asked for data and execution is enabled.
7. Cite the semantic source and knowledge pages used in the answer.
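Steps 5 and 6 above amount to a compile-first policy: always send a non-executing `sl_query`, and add an executing call only when the user asked for rows and execution is enabled. The sketch below shows one way to plan those payloads. The field names come from the tool input reference, but the exact payload shape (string arrays for `measures` and `dimensions`) and the `ExecutionPolicy` type are assumptions for illustration.

```typescript
// Field names (connectionId, measures, dimensions, execute, maxRows) follow the
// tool input reference; the array-of-strings shapes are assumed, not confirmed.
interface SlQueryInput {
  connectionId: string;
  measures: string[];
  dimensions?: string[];
  execute: boolean;
  maxRows?: number;
}

// Hypothetical policy object supplied by the agent harness.
interface ExecutionPolicy {
  userAskedForRows: boolean;
  executionEnabled: boolean;
  maxRows: number;
}

// Always plan a compile-only call; add an executing call only when permitted.
function planSlQueryCalls(
  connectionId: string,
  measures: string[],
  dimensions: string[],
  policy: ExecutionPolicy
): SlQueryInput[] {
  const planned: SlQueryInput[] = [
    { connectionId, measures, dimensions, execute: false },
  ];
  if (policy.userAskedForRows && policy.executionEnabled) {
    if (Math.floor(policy.maxRows) !== policy.maxRows || policy.maxRows < 1) {
      throw new Error("maxRows must be a positive integer before executing");
    }
    planned.push({ connectionId, measures, dimensions, execute: true, maxRows: policy.maxRows });
  }
  return planned;
}

const plannedCalls = planSlQueryCalls("my-postgres", ["total_revenue"], ["order_date"], {
  userAskedForRows: true,
  executionEnabled: true,
  maxRows: 100,
});
// plannedCalls has two entries: the compile-only call, then the bounded executing call.
```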
## CLI Commands
For agents that work through the terminal rather than MCP, KTX provides a set of machine-readable commands under `ktx agent`. These return JSON output designed for programmatic consumption.
@@ -149,6 +192,28 @@ ktx agent sql execute --json \
--max-rows 500
```
### CLI input reference
| Command | Required inputs | Optional inputs | Output |
|---------|-----------------|-----------------|--------|
| `ktx agent tools --json` | `--json` | none | JSON list of available agent commands |
| `ktx agent context --json` | `--json` | none | JSON project context and readiness state |
| `ktx agent sl list --json` | `--json` | `--connection-id`, `--query` | JSON semantic source list |
| `ktx agent sl read <sourceName> --json --connection-id <id>` | source name, `--json`, `--connection-id` | none | JSON payload containing source YAML |
| `ktx agent sl query --json --connection-id <id> --query-file <path>` | `--json`, `--connection-id`, `--query-file` | `--execute`, `--max-rows` | JSON compiled query, SQL, plan, and optional rows |
| `ktx agent wiki search <query> --json` | query, `--json` | `--limit` | JSON ranked knowledge results |
| `ktx agent wiki read <pageId> --json` | page id, `--json` | none | JSON full knowledge page |
| `ktx agent sql execute --json --connection-id <id> --sql-file <path> --max-rows <n>` | `--json`, `--connection-id`, `--sql-file`, `--max-rows` | none | JSON rows and execution metadata |
### Workflow: answer an analytics question through CLI
1. `ktx agent context --json` — verify the KTX project is ready for agents.
2. `ktx agent sl list --json --query "revenue"` — find semantic sources related to the question.
3. `ktx agent wiki search "revenue recognition" --json --limit 5` — retrieve business definitions.
4. Write a query JSON file with measures, dimensions, filters, and limits.
5. `ktx agent sl query --json --connection-id my-postgres --query-file query.json` — compile and inspect SQL.
6. Add `--execute --max-rows 100` only when the user needs rows and execution is allowed.
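Steps 5 and 6 can be wrapped in a small argument builder so that `--execute` is never passed without a bounded `--max-rows`. This is a sketch under assumptions: `buildSlQueryArgs` and its options type are hypothetical, only flags listed in the CLI input reference are used, and applying the 1 to 1000 range from the Common errors table to `sl query` is a guess rather than documented behavior.

```typescript
// Hypothetical options type; flag names come from the CLI input reference.
interface SlQueryCliOptions {
  connectionId: string;
  queryFile: string;
  execute?: boolean;
  maxRows?: number;
}

// Builds the argument list that follows the `ktx` binary name.
function buildSlQueryArgs(options: SlQueryCliOptions): string[] {
  const args = [
    "agent", "sl", "query", "--json",
    "--connection-id", options.connectionId,
    "--query-file", options.queryFile,
  ];
  if (!options.execute) {
    return args; // compile-only: inspect the SQL without running it
  }
  const maxRows = options.maxRows;
  // Assumed bound: reuses the 1 to 1000 range documented for SQL execution.
  if (maxRows === undefined || Math.floor(maxRows) !== maxRows || maxRows < 1 || maxRows > 1000) {
    throw new Error("--execute requires an integer --max-rows between 1 and 1000");
  }
  return args.concat(["--execute", "--max-rows", String(maxRows)]);
}

const compileArgs = buildSlQueryArgs({ connectionId: "my-postgres", queryFile: "query.json" });
const executeArgs = buildSlQueryArgs({
  connectionId: "my-postgres",
  queryFile: "query.json",
  execute: true,
  maxRows: 100,
});
```

Returning an argument array rather than a single string lets the caller hand it to a process-spawning API such as Node's `execFile` without shell quoting concerns.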
### When to use CLI vs MCP
| | MCP | CLI |
@@ -205,3 +270,13 @@ The agents step auto-detects installed tools and generates the right configurati
```
After configuration, the agent can immediately start calling KTX tools — listing sources, searching knowledge, and querying your semantic layer.
## Common errors
| Error or symptom | Likely cause | Recovery |
|------------------|--------------|----------|
| Agent cannot find the MCP server | Agent config points to a missing `ktx` binary or wrong project directory | Run `ktx setup --agents` again, then verify the generated MCP config contains the intended `KTX_PROJECT_DIR` |
| MCP tools list but semantic queries fail | `--semantic-compute` was not enabled or the daemon URL is wrong | Start `ktx serve --mcp stdio --semantic-compute` or set `--semantic-compute-url` to the running daemon |
| Query execution is rejected | The MCP server was started without `--execute-queries` or the SQL is not read-only | Restart with `--execute-queries` only when execution is intended, and keep `maxRows` bounded |
| `ktx agent` command exits without JSON | `--json` was omitted | Re-run the command with `--json`; all `ktx agent` subcommands require it |
| SQL execution exceeds limits | `--max-rows` is missing or too high | Re-run with an explicit value from 1 to 1000 |

@@ -5,6 +5,17 @@ description: Write and refine semantic sources and knowledge pages.
After building context through scanning and ingestion, you'll want to refine it — edit semantic sources to match your business logic, add knowledge pages that capture tribal knowledge, and query your data through the semantic layer to verify everything works.
## Agent workflow summary
Agents should refine context in this order:
1. `ktx sl list --json` — discover available sources and connection ids.
2. `ktx sl read <source> --connection-id <id>` — inspect the current YAML.
3. Edit the source YAML directly or use `ktx sl write`.
4. `ktx sl validate <source> --connection-id <id>` — verify columns, joins, and table references.
5. `ktx sl query ... --format sql` — compile a representative query without executing it.
6. `ktx wiki search ...` and `ktx wiki write ...` — add business context that does not belong in schema YAML.
## Semantic Sources
Semantic sources are YAML files that describe your tables, columns, measures, and joins. They're the core of the context layer — the structured definitions that agents use to generate correct SQL.
@@ -108,6 +119,26 @@ Key fields:
| `segments` | No | Named filter conditions |
| `inherits_columns_from` | No | Inherit column metadata from a manifest entry |
Source component fields:
| Component | Field | Required | Description |
|-----------|-------|----------|-------------|
| Column | `name` | Yes | Column identifier as used in SQL expressions |
| Column | `type` | Yes | Agent-facing type: `string`, `number`, `time`, or `boolean` |
| Column | `role` | No | Special role such as `time` for default time dimensions |
| Column | `visibility` | No | `public`, `internal`, or `hidden` |
| Column | `description` | Strongly recommended | Human-readable business meaning |
| Measure | `name` | Yes | Queryable metric name |
| Measure | `expr` | Yes | SQL aggregation expression at the source grain |
| Measure | `filter` | No | SQL predicate applied only to this measure |
| Measure | `description` | Strongly recommended | Definition agents can cite and compare |
| Segment | `name` | Yes | Reusable filter name |
| Segment | `expr` | Yes | SQL predicate for the segment |
| Join | `to` | Yes | Target semantic source name |
| Join | `on` | Yes | SQL join condition using source names or aliases |
| Join | `relationship` | Yes | `many_to_one`, `one_to_many`, or `one_to_one` |
| Join | `alias` | No | Query alias for repeated or clearer joins |
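For tooling that reads or generates source YAML, the component table maps naturally onto a set of types. The sketch below mirrors the fields and allowed values listed above. The sample values (`orders`, `customers`, the SQL expressions) and the `missingDescriptions` helper are hypothetical, and the types say nothing about how KTX itself parses sources.

```typescript
// Allowed values copied from the component table.
type ColumnType = "string" | "number" | "time" | "boolean";
type Visibility = "public" | "internal" | "hidden";
type Relationship = "many_to_one" | "one_to_many" | "one_to_one";

interface Column { name: string; type: ColumnType; role?: string; visibility?: Visibility; description?: string; }
interface Measure { name: string; expr: string; filter?: string; description?: string; }
interface Segment { name: string; expr: string; }
interface Join { to: string; on: string; relationship: Relationship; alias?: string; }

// Hypothetical sample components for an orders source.
const orderColumns: Column[] = [
  { name: "order_date", type: "time", role: "time", visibility: "public", description: "Date the order was placed" },
  { name: "amount", type: "number", description: "Order value in the account currency" },
  { name: "internal_flag", type: "boolean", visibility: "hidden" },
];

const totalRevenue: Measure = {
  name: "total_revenue",
  expr: "SUM(amount)",
  filter: "status = 'completed'",
  description: "Sum of completed order amounts",
};

const completedOrders: Segment = { name: "completed", expr: "status = 'completed'" };

const toCustomers: Join = {
  to: "customers",
  on: "orders.customer_id = customers.id",
  relationship: "many_to_one",
};

// Descriptions are "strongly recommended" rather than required, so the types
// keep them optional and this helper reports which components lack one.
function missingDescriptions(items: Array<{ name: string; description?: string }>): string[] {
  return items.filter((item) => !item.description).map((item) => item.name);
}

const undocumentedColumns = missingDescriptions(orderColumns); // ["internal_flag"]
```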
Column visibility controls what agents see:
| Visibility | Behavior |
@@ -192,6 +223,16 @@ Query flags:
The query planner is grain-aware — it understands the cardinality of joins and avoids chasm traps (double-counting caused by many-to-many fan-outs). When you query measures that span multiple sources, KTX generates sub-queries at the correct grain before joining.
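A small worked example shows why aggregating at the correct grain matters. Assume one order with two line items and three payments (all values hypothetical). Joining both child tables to the order before aggregating multiplies rows, while aggregating each child per order first and joining afterwards keeps the totals correct. This illustrates the general technique only and is not KTX's planner code.

```typescript
interface Item { orderId: number; quantity: number; }
interface Payment { orderId: number; amount: number; }
interface Totals { quantity: number; paid: number; }

// Hypothetical data: order 1 has two items and three payments.
const items: Item[] = [
  { orderId: 1, quantity: 1 },
  { orderId: 1, quantity: 2 },
];
const payments: Payment[] = [
  { orderId: 1, amount: 30 },
  { orderId: 1, amount: 30 },
  { orderId: 1, amount: 40 },
];

// Naive plan: join items and payments on orderId, then sum over the joined rows.
// Every item row repeats once per payment and vice versa.
function naiveTotals(itemRows: Item[], paymentRows: Payment[]): Totals {
  const totals: Totals = { quantity: 0, paid: 0 };
  for (const item of itemRows) {
    for (const payment of paymentRows) {
      if (item.orderId === payment.orderId) {
        totals.quantity += item.quantity;
        totals.paid += payment.amount;
      }
    }
  }
  return totals;
}

// Pre-aggregate one child table to the order grain.
function sumByOrder<T extends { orderId: number }>(rows: T[], value: (row: T) => number): Record<number, number> {
  const byOrder: Record<number, number> = {};
  for (const row of rows) {
    byOrder[row.orderId] = (byOrder[row.orderId] || 0) + value(row);
  }
  return byOrder;
}

// Grain-aware plan: aggregate each child per order, then join the two
// per-order results (driven here from the orders that have items).
function grainAwareTotals(itemRows: Item[], paymentRows: Payment[]): Totals {
  const quantityByOrder = sumByOrder(itemRows, (row) => row.quantity);
  const paidByOrder = sumByOrder(paymentRows, (row) => row.amount);
  const totals: Totals = { quantity: 0, paid: 0 };
  for (const key of Object.keys(quantityByOrder)) {
    const orderId = Number(key);
    totals.quantity += quantityByOrder[orderId] || 0;
    totals.paid += paidByOrder[orderId] || 0;
  }
  return totals;
}

const inflated = naiveTotals(items, payments);     // { quantity: 9, paid: 200 }
const correct = grainAwareTotals(items, payments); // { quantity: 3, paid: 100 }
```

The naive plan triples the quantity (three payments per item) and doubles the paid amount (two items per payment), which is exactly the double-counting the paragraph above describes.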
### Workflow: edit and validate a source
1. `ktx sl read orders --connection-id my-postgres > /tmp/orders.yaml` — capture the current definition.
2. Edit `/tmp/orders.yaml` to add columns, measures, joins, or descriptions.
3. `ktx sl write orders --connection-id my-postgres --yaml "$(cat /tmp/orders.yaml)"` — write the updated source.
4. `ktx sl validate orders --connection-id my-postgres` — check the definition against the live schema.
5. `ktx sl query --connection-id my-postgres --measure total_revenue --dimension order_date --format sql` — compile a representative query.
If validation fails, fix the YAML before asking an agent to use the source. Common validation failures are missing columns, invalid join targets, and measure expressions that reference fields outside the source.
## Knowledge Pages
Knowledge pages are Markdown files that capture business context — definitions, rules, gotchas, and anything an agent needs to understand beyond what the schema tells it.
@@ -250,6 +291,18 @@ Write flags:
| `--ref <ref>` | Reference to external resources (repeatable) |
| `--sl-ref <ref>` | Link to a semantic source (repeatable) |
Knowledge page fields:
| Field | Required | Description |
|-------|----------|-------------|
| Key | Yes | Stable page identifier passed to `ktx wiki read` |
| Summary | Yes | Short text shown in search results |
| Content | Yes | Full Markdown business context |
| Scope | No | `global` for shared context or `user` for user-scoped notes |
| Tags | No | Search and organization labels |
| External refs | No | Links or identifiers for source-of-truth systems |
| Semantic-layer refs | No | Source names the page explains or constrains |
You can also create and edit knowledge pages directly as Markdown files in the `knowledge/` directory.
### Listing pages
@@ -271,3 +324,21 @@ ktx wiki search "revenue recognition"
```
Search uses both full-text matching and semantic similarity — it finds relevant pages even when the exact terms don't match. Agents call this automatically when they need business context to answer a question.
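To see how a page can rank without sharing any words with the query, consider a toy hybrid score that blends term overlap with vector similarity. Everything in this sketch is an assumption made for illustration: the equal weighting, the two-dimensional vectors, and the function names are invented here, and KTX's actual ranking is not documented in this guide.

```typescript
// Fraction of query terms that literally appear in the page text.
function termOverlap(query: string, text: string): number {
  const queryTerms = query.toLowerCase().split(/\s+/).filter((term) => term.length > 0);
  if (queryTerms.length === 0) {
    return 0;
  }
  const textTerms = text.toLowerCase().split(/\W+/);
  let hits = 0;
  for (const term of queryTerms) {
    if (textTerms.indexOf(term) !== -1) {
      hits += 1;
    }
  }
  return hits / queryTerms.length;
}

// Cosine similarity between two embedding vectors of the same length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    dot += x * (b[i] || 0);
    normA += x * x;
  });
  b.forEach((y) => {
    normB += y * y;
  });
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface IndexedPage { text: string; vector: number[]; }

// Hypothetical blend: equal weight to literal matches and semantic closeness.
function hybridScore(query: string, queryVector: number[], page: IndexedPage, semanticWeight: number = 0.5): number {
  const lexical = termOverlap(query, page.text);
  const semantic = cosineSimilarity(queryVector, page.vector);
  return (1 - semanticWeight) * lexical + semanticWeight * semantic;
}

// Made-up vectors standing in for real embeddings.
const searchVector = [1, 0];
const exactPage: IndexedPage = { text: "Revenue recognition rules for subscriptions", vector: [0.9, 0.1] };
const paraphrasePage: IndexedPage = { text: "When sales income is booked", vector: [0.8, 0.6] };

const exactScore = hybridScore("revenue recognition", searchVector, exactPage);
const paraphraseScore = hybridScore("revenue recognition", searchVector, paraphrasePage);
// paraphrasePage shares no terms with the query but still scores above zero.
```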
### Workflow: add searchable business context
1. Search first: `ktx wiki search "order status definitions"`.
2. If no page already covers the rule, write a page with `ktx wiki write`.
3. Include a concise `--summary`; agents see this before loading full content.
4. Add `--tag` values for the business area and `--sl-ref` values for related semantic sources.
5. Search again with the user's likely wording to confirm the page is discoverable.
## Common errors
| Error or symptom | Likely cause | Recovery |
|------------------|--------------|----------|
| `ktx sl validate` reports a missing column | YAML references a column that is absent from the scanned table | Run a fresh scan or update the YAML to match the warehouse schema |
| Query compilation double-counts a measure | Join relationship or grain is missing or wrong | Add `grain` and explicit `relationship` values, then validate and recompile |
| Agent cannot find a metric | Measure name or description does not match business terminology | Add a measure description and a knowledge page with common synonyms |
| Knowledge search misses a page | Summary and tags do not include likely user wording | Rewrite the summary and add relevant tags, then search again |
| `ktx sl write` changes are hard to review | Large YAML was passed inline | Edit the source file directly or write from a temporary file, then review the git diff |

@@ -97,5 +97,9 @@ function toLlmDocsPage(page: ReturnType<typeof source.getPages>[number]) {
}
function normalizeMarkdown(markdown: string) {
-  return markdown.trim().replace(/\n{3,}/g, "\n\n");
+  return markdown
+    .trim()
+    .replace(/^---\n[\s\S]*?\n---\n?/, "")
+    .trim()
+    .replace(/\n{3,}/g, "\n\n");
}
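The effect of the new chain is easiest to see on a small input. The sketch below repeats the updated function body from this hunk with an explicit return type and runs it on two hypothetical samples: one with YAML frontmatter, and one that merely contains a `---` rule in the body.

```typescript
// Same chain as the updated normalizeMarkdown in this commit.
function normalizeMarkdown(markdown: string): string {
  return markdown
    .trim()
    // Strip a leading frontmatter block; `^` without the `m` flag anchors to the
    // start of the string, and the lazy `[\s\S]*?` stops at the first closing `---`.
    .replace(/^---\n[\s\S]*?\n---\n?/, "")
    .trim()
    // Collapse runs of three or more newlines into one blank line.
    .replace(/\n{3,}/g, "\n\n");
}

// Hypothetical page source with frontmatter and extra blank lines.
const withFrontmatter = "\n---\ntitle: Quickstart\ndescription: Set up KTX\n---\n\n# Quickstart\n\n\n\nBody text.\n";
const cleaned = normalizeMarkdown(withFrontmatter); // "# Quickstart\n\nBody text."

// A horizontal rule in the body is left alone because it is not at the start.
const withRule = "Intro\n\n---\n\nMore";
const untouched = normalizeMarkdown(withRule); // "Intro\n\n---\n\nMore"
```

Two limits follow from the pattern as written: it matches `\n` line endings only, so frontmatter with `\r\n` endings would pass through, and a document that opens with a `---` rule followed later by another `---` line would lose the text between them.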