docs: tighten guide copy (#131)

2026-07-22 11:51:01 +02:00 · 2026-05-18 09:57:27 -04:00 · 2026-05-18 09:57:27 -04:00 · c539433d66
commit c539433d66
parent d60d83e595
4 changed files with 67 additions and 115 deletions
--- a/docs-site/content/docs/guides/building-context.mdx
+++ b/docs-site/content/docs/guides/building-context.mdx
@@ -3,12 +3,9 @@ title: Building Context
 description: Build and refresh KTX context from databases, source tools, query history, and text.
 ---

-Building context turns configured connections into local semantic-layer sources
-and wiki pages. Agents use those files to understand your schema, business
-definitions, metric logic, joins, and known caveats before they write SQL.
-
-Use this guide after `ktx setup` has created `ktx.yaml` and at least one
-database or context-source connection.
+Build context after `ktx setup` creates `ktx.yaml` and at least one database or
+context-source connection. KTX writes local semantic-layer sources and wiki
+pages for agents to use before writing SQL.

 ## The build loop

@@ -22,15 +19,12 @@ Most projects use this loop:
 5. Validate and query representative sources before handing the context to an
   agent.

-`ktx ingest --all` runs database connections first, then context-source
-connections. That order lets dbt, BI, Notion, and text ingest attach context to
-known warehouse tables.
+`ktx ingest --all` runs databases first, then context-source connections, so
+external metadata can attach to known warehouse tables.

 ## Database ingest

-Database ingest connects to a configured warehouse and records local schema
-context. It gives agents table, column, type, constraint, and row-count
-grounding without requiring them to inspect the database directly.
+Database ingest records table, column, type, constraint, and row-count context.

 ```bash
 # Build one configured database connection
@@ -55,20 +49,16 @@ ktx ingest warehouse --deep
 ktx ingest --all --deep
 ```

-Deep ingest needs LLM and embedding readiness. If those providers are not
-configured, run `ktx setup` or use `--fast`.
+Deep ingest needs LLM and embedding readiness. If either is missing, run
+`ktx setup` or use `--fast`.

-When you use `claude-code`, KTX still controls the tool surface for ingest and
-memory capture. Claude Code built-in tools, discovered MCP servers, plugins,
-skills, agents, and slash commands are not invokable by KTX agent loops unless
-they are exact KTX MCP tools for the current run.
+With `claude-code`, KTX agent loops can invoke only the KTX MCP tools for the
+current run.

 ## Query history

-PostgreSQL, BigQuery, and Snowflake can add query-history context. This helps
-KTX learn common joins, filters, service-account patterns, redaction rules, and
-usage-heavy query templates. BigQuery and Snowflake support a lookback window;
-Postgres reads the current `pg_stat_statements` aggregate data instead.
+PostgreSQL, BigQuery, and Snowflake can add query-history context: common joins,
+filters, service-account patterns, redaction rules, and high-usage templates.

 Enable it during setup, store it under `connections.<id>.context.queryHistory`,
 or request it for one run:
@@ -84,19 +74,13 @@ for one run.

 ## Relationship evidence

-Many databases do not declare all foreign keys. KTX can score relationship
-candidates using signals such as name similarity, type compatibility, value
-overlap, embedding similarity, uniqueness, null rate, and structural priors.
-
-The public CLI does not expose separate relationship review subcommands.
-Relationship evidence is built as part of deep database ingest when the
-connector and readiness checks support it.
+KTX scores relationship candidates during supported deep database ingest. The
+public CLI does not expose separate relationship review subcommands.

 ## Context-source ingest

-Context-source connections pull business metadata from tools your team already
-uses. The current public `ktx ingest` command is connection-centric: pass one
-configured connection id, or pass `--all`.
+Context-source connections pull metadata from dbt, BI tools, Notion, and other
+configured systems. Pass one connection id or `--all`.

 ```bash
 # Build one source connection
@@ -117,14 +101,13 @@ Supported source types:
 | `metabase` | Metabase API | Questions, dashboards, table metadata, and mappings |
 | `notion` | Notion API | Wiki pages and business knowledge |

-Source ingest extracts metadata, reconciles it with existing local context, and
-writes semantic-layer YAML plus wiki Markdown. It merges rather than blindly
-overwriting local edits.
+Source ingest writes semantic-layer YAML and wiki Markdown, merging with local
+edits rather than overwriting them.

 ## Text ingest

-Use `ktx ingest text` for notes, Markdown files, runbooks, Slack exports, or
-other free-form knowledge that should become searchable KTX memory.
+Use `ktx ingest text` for notes, Markdown, runbooks, Slack exports, or other
+free-form knowledge that should become searchable memory.

 ```bash
 # Capture a Markdown file
@@ -146,14 +129,12 @@ Useful flags:
 | `--json` | Print structured output |
 | `--fail-fast` | Stop after the first failed text item |

-Text ingest is a good fit for small, high-signal documents. For system-specific
-connectors such as Notion, dbt, or Metabase, prefer configured source ingest so
-KTX can preserve source metadata.
+Use text ingest for small, high-signal documents. Prefer configured source
+ingest for Notion, dbt, Metabase, and similar systems.

 ## Output and artifacts

-Every ingest run prints a summary. Use `--json` when an agent or script needs a
-structured plan and per-target results.
+Every ingest run prints a summary. Use `--json` for scripts and agents.

 ```bash
 ktx ingest --all --json
@@ -168,9 +149,7 @@ Typical generated files:
 | `wiki/user/<user-id>/*.md` | Text and memory ingest | User-scoped context |
 | `.ktx/setup/context-build.json` | Setup context build | Resume and readiness state for setup |

-Ingest sessions also record transcripts with tool calls, LLM responses, and
-write decisions. Inspect them when you need to debug why a source or wiki page
-was written a certain way.
+Ingest transcripts include tool calls, LLM responses, and write decisions.

 ## Example: first full refresh

--- a/docs-site/content/docs/guides/llm-configuration.mdx
+++ b/docs-site/content/docs/guides/llm-configuration.mdx
@@ -3,8 +3,8 @@ title: LLM configuration
 description: Configure KTX LLM providers, model roles, and prompt caching.
 ---

-KTX uses the top-level `llm` block in `ktx.yaml` for text generation,
-structured extraction, and ingest or memory agent loops.
+Configure text generation, structured extraction, and ingest or memory loops in
+the top-level `llm` block.

 ## Backends

@@ -15,9 +15,7 @@ Set `llm.provider.backend` to one of these values:
 - `vertex`: Use Vertex AI Anthropic models through Google Cloud credentials.
 - `gateway`: Use AI Gateway-compatible Anthropic model ids.
 - `claude-code`: Use your local Claude Code session through the Claude Agent
-  SDK. KTX removes provider-routing environment variables from Claude Code
-  child processes, so this backend doesn't silently fall back to
-  `ANTHROPIC_API_KEY`, Vertex, Gateway, or Bedrock credentials.
+  SDK. KTX strips provider-routing environment variables from child processes.

 ## Claude Code

@@ -36,26 +34,20 @@ llm:
    repair: sonnet
 ```

-During setup, choose the Claude Code backend interactively or pass the model in
-automation:
+During setup, choose the backend interactively or pass the model in automation:

 ```bash
 ktx setup --llm-backend claude-code --llm-model opus --no-input
 ```

-For Claude Code, `sonnet`, `opus`, and `haiku` map to the current KTX defaults.
-You can also pass a full Claude model ID, such as `claude-opus-4-7`.
+For Claude Code, `sonnet`, `opus`, and `haiku` map to KTX defaults. Full Claude
+model IDs are also accepted.

-`claude-code` keeps KTX tool boundaries intact. KTX exposes only the MCP tools
-needed for the current KTX agent loop, disables Claude Code built-in tools,
-keeps plugins empty, and denies every non-KTX tool request through
-`canUseTool`. The Claude Agent SDK may still report host-discovered slash
-commands, skills, and subagent names in init metadata; that metadata is not an
-execution grant for KTX agent loops.
+`claude-code` exposes only KTX MCP tools for the current agent loop. SDK init
+metadata may still list host slash commands, skills, and subagents; KTX does not
+grant execution access to them.

 ## Prompt caching

-`llm.promptCaching` has partial parity on `claude-code`. KTX doesn't pass
-Anthropic cache-control markers to the Claude Agent SDK. Status and doctor warn
-when you configure prompt-cache TTL, tool, or history fields that the Claude
-Agent SDK backend ignores.
+`llm.promptCaching` has partial parity on `claude-code`. Status and doctor warn
+when the Claude Agent SDK backend ignores configured cache fields.
--- a/docs-site/content/docs/guides/serving-agents.mdx
+++ b/docs-site/content/docs/guides/serving-agents.mdx
@@ -3,9 +3,8 @@ title: Serving Agents
 description: Expose KTX context to Claude Code, Codex, Cursor, OpenCode, and custom agents.
 ---

-KTX serves agents through the public CLI and project-local instruction files.
-Agents do not need a separate server. They read the generated rules, call KTX
-commands, inspect local context files, and use JSON output when they need
+KTX serves agents through the CLI and project-local instruction files. Agents
+read generated rules, call KTX commands, inspect context files, and use JSON for
 structured results.

 ## Recommended setup
@@ -39,14 +38,13 @@ ktx setup --agents --target claude-code --global
 ktx setup --agents --target codex --global
 ```

-KTX records installed files in `.ktx/agents/install-manifest.json`. Rerun
-`ktx setup --agents` after moving a checkout or reinstalling the CLI so the
-generated instructions point at the current CLI path.
+Installed files are recorded in `.ktx/agents/install-manifest.json`. Rerun
+`ktx setup --agents` after moving a checkout or reinstalling the CLI.

 ## Agent command set

-All supported agent clients use the same command surface. Use `--project-dir`
-when the agent is running outside the KTX project directory.
+All supported clients use the same command surface. Use `--project-dir` when
+running outside the KTX project directory.

 ### Readiness

@@ -54,9 +52,8 @@ when the agent is running outside the KTX project directory.
 ktx status --json
 ```

-Agents should run this before relying on context. It reports project, LLM,
-embedding, database, context-source, context-build, and agent-integration
-readiness.
+Run this before relying on context. It reports project, provider, connection,
+context-build, and agent-integration readiness.

 ### Semantic layer discovery

@@ -66,8 +63,8 @@ ktx sl list --connection-id warehouse --json
 ktx sl search "revenue" --json --limit 10
 ```

-Agents use these commands to discover source names, connection ids, measures,
-dimensions, and likely files to inspect.
+Use these commands to find source names, connection ids, measures, dimensions,
+and files to inspect.

 ### Semantic-layer validation and queries

@@ -106,9 +103,8 @@ ktx wiki list --json
 ktx wiki search "revenue recognition" --json --limit 10
 ```

-Agents should search wiki context when a question depends on business
-definitions, metric caveats, process rules, or terms that are not obvious from
-schema names.
+Search wiki context for business definitions, metric caveats, process rules, and
+non-obvious terms.

 ### Context refresh

@@ -120,8 +116,7 @@ ktx ingest --all
 ktx ingest text docs/revenue-notes.md --connection-id warehouse
 ```

-Use `--deep` only when LLM and embedding setup is ready and the user expects an
-AI-enriched refresh.
+Use `--deep` only when LLM and embedding setup is ready.

 ## Good agent behavior

@@ -135,14 +130,12 @@ Agents should:
 - Validate edited semantic sources with `ktx sl validate`.
 - Keep generated context changes reviewable in git.

-Agents should not assume a background server, ORPC route, frontend app, or
-external migration system exists. KTX is a local context layer with a CLI and
-plain project files.
+KTX is a local context layer with a CLI and plain project files. Do not assume a
+background server, ORPC route, frontend app, or external migration system.

 ## Manual setup

-Manual setup is useful for custom agents that can read project-local
-instructions but are not yet a named target.
+Use manual setup for custom agents that can read project-local instructions.

 1. Install the universal target:

--- a/docs-site/content/docs/guides/writing-context.mdx
+++ b/docs-site/content/docs/guides/writing-context.mdx
@@ -3,12 +3,8 @@ title: Writing Context
 description: Edit semantic sources and wiki pages so agents use your business logic.
 ---

-KTX context is meant to be edited. Ingest gives you a grounded first draft, then
-you refine source YAML and wiki Markdown until agents can answer data questions
-with the same definitions your team uses.
-
-Use this guide when you are adding measures, fixing joins, documenting business
-rules, or reviewing context changes made by an agent.
+Ingest creates the first draft. Edit source YAML and wiki Markdown when you need
+sharper metrics, joins, or business rules.

 ## Editing workflow

@@ -45,10 +41,8 @@ Use this order for most context changes:

 ## Semantic sources

-Semantic sources are YAML files that describe queryable entities. A source is
-usually a table, but it can also point at a custom SQL expression. Sources
-define the vocabulary agents use for measures, dimensions, segments, joins, and
-grain-aware query planning.
+Semantic sources are YAML files for queryable tables or custom SQL. They define
+agent-facing measures, dimensions, segments, joins, and grain.

 Source files live at:

@@ -198,8 +192,8 @@ joins:

 ## Measures

-Good measures have precise names, SQL expressions at the correct grain, and
-descriptions that say what is included and excluded.
+Good measures have precise names, correct-grain SQL, and descriptions that name
+key inclusions and exclusions.

 ```yaml
 measures:
@@ -209,14 +203,13 @@ measures:
    description: Completed order revenue after refunds, excluding cancelled orders.
 ```

-Prefer one canonical measure plus wiki synonyms over several nearly identical
-measures. If your team uses multiple definitions, document the distinction in a
-wiki page and link it with `sl_refs`.
+Prefer one canonical measure plus wiki synonyms. Put competing definitions in a
+linked wiki page.

 ## Joins and grain

-`grain` and `relationship` prevent agents from producing double-counted SQL.
-State the row grain even when it seems obvious.
+`grain` and `relationship` prevent double-counted SQL. State the row grain even
+when it seems obvious.

 ```yaml
 grain:
@@ -228,8 +221,7 @@ joins:
 ```

 Use `many_to_one` for dimensions such as customer, account, product, or plan.
-Use `one_to_many` only when the target can fan out the source rows, such as
-orders to order items.
+Use `one_to_many` only when the target can fan out rows.

 ## Validate and query

@@ -239,8 +231,7 @@ Validation checks source YAML against the live database schema:
 ktx sl validate orders --connection-id warehouse
 ```

-It catches missing columns, invalid join targets, and table-reference problems
-before an agent relies on the source.
+It catches missing columns, invalid joins, and table-reference problems.

 Compile a query to inspect generated SQL:

@@ -268,9 +259,8 @@ ktx sl query \

 ## Wiki pages

-Wiki pages capture business context that does not belong in a single source
-file: metric policies, dashboard caveats, company vocabulary, data freshness,
-known issues, and source-of-truth notes.
+Wiki pages hold context that does not belong in one source file: policies,
+caveats, vocabulary, freshness, known issues, and source-of-truth notes.

 Wiki files live under:

@@ -280,8 +270,7 @@ wiki/
  user/<user-id>/
 ```

-Use global pages for shared business rules. Use user-scoped pages for local
-notes, personal conventions, or context that should not be shared broadly.
+Use global pages for shared rules and user-scoped pages for local notes.

 ### Wiki page example

@@ -338,8 +327,7 @@ ktx sl search "revenue" --json
 ktx wiki search "revenue recognition" --json --limit 10
 ```

-Check that definitions are specific, hidden columns stay hidden, joins have
-explicit relationships, and measures compile into the expected SQL.
+Check definitions, hidden columns, join relationships, and generated SQL.

 ## Common errors