From f3f8aa613b9aac35b87ef34b9f6784e493132363 Mon Sep 17 00:00:00 2001
From: Luca Martial
Date: Mon, 11 May 2026 16:44:31 -0700
Subject: [PATCH] docs(docs-site): add agent notes across docs

---
 .../docs/benchmarks/link-detection.mdx | 11 +++++++
 .../content/docs/community/contributing.mdx | 14 ++++++++
 .../content/docs/concepts/context-as-code.mdx | 11 +++++++
 .../docs/concepts/the-context-layer.mdx | 11 +++++++
 .../docs/getting-started/introduction.mdx | 13 ++++++++
 .../docs/integrations/agent-clients.mdx | 29 ++++++++++++++++
 .../docs/integrations/context-sources.mdx | 33 +++++++++++++++++++
 .../docs/integrations/primary-sources.mdx | 24 ++++++++++++++
 8 files changed, 146 insertions(+)

diff --git a/docs-site/content/docs/benchmarks/link-detection.mdx b/docs-site/content/docs/benchmarks/link-detection.mdx
index 142cc197..feb2f3f1 100644
--- a/docs-site/content/docs/benchmarks/link-detection.mdx
+++ b/docs-site/content/docs/benchmarks/link-detection.mdx
@@ -5,6 +5,17 @@ description: How KTX's relationship detection performs on real-world schemas.
 
 KTX infers foreign key relationships between tables even when the database declares no primary keys or foreign key constraints. This is critical for analytics warehouses, where constraints are rarely enforced. This page documents the methodology, scoring pipeline, and a reproducible benchmark you can run yourself.
 
+## Agent usage notes
+
+Use this page when an agent needs to explain, tune, or verify relationship detection.
+
+| Agent task | Relevant section | Command |
+|------------|------------------|---------|
+| Explain why KTX inferred a join | Detection pipeline | `ktx dev scan relationships --status all` |
+| Decide whether to accept or reject a candidate | Scoring and threshold configuration | `ktx dev scan relationships --accept ` |
+| Tune thresholds from reviewed decisions | Broader benchmark suite and calibration | `ktx dev scan relationship-thresholds --connection ` |
+| Reproduce the bundled benchmark | Reproducing the benchmark | `pnpm run relationships:verify-orbit` |
+
 ## What this measures
 
 Most analytics warehouses — Snowflake, BigQuery, Redshift — don't enforce referential integrity constraints. Tables like `fct_product_events` reference `dim_accounts` by convention (`account_id` → `id`), but nothing in the schema says so.
diff --git a/docs-site/content/docs/community/contributing.mdx b/docs-site/content/docs/community/contributing.mdx
index 598ac5a9..8feb86c9 100644
--- a/docs-site/content/docs/community/contributing.mdx
+++ b/docs-site/content/docs/community/contributing.mdx
@@ -220,3 +220,17 @@ Before submitting a pull request:
 5. **Don't commit artifacts** — `node_modules/`, `.venv/`, `dist/`, coverage output, and local databases should not be committed.
 
 For larger features or architectural changes, open an issue first to discuss the approach.
+
+## Agent usage notes
+
+Use this page when an agent is modifying the KTX repository itself rather than using KTX in an analytics project.
+
+| Agent task | Command or section |
+|------------|--------------------|
+| Prepare the workspace | `pnpm install`, `pnpm run setup:dev`, `uv sync --all-groups` |
+| Verify TypeScript changes | `pnpm run type-check`, `pnpm run test`, or package-filtered equivalents |
+| Verify Python changes | `uv run pytest -q` and `uv run pre-commit run --files ` |
+| Add a connector | Adding a connector |
+| Check style expectations | Code conventions |
+
+Common recovery path: if a check fails because generated files or local runtimes are missing, run the setup commands first. If a check fails because of a real type, lint, or test error, fix the source file and rerun the smallest failing check before broadening verification.
diff --git a/docs-site/content/docs/concepts/context-as-code.mdx b/docs-site/content/docs/concepts/context-as-code.mdx
index 152b54ce..4ee25d88 100644
--- a/docs-site/content/docs/concepts/context-as-code.mdx
+++ b/docs-site/content/docs/concepts/context-as-code.mdx
@@ -80,3 +80,14 @@ This matters for three reasons.
 **Reproducibility.** Because ingestion sessions are recorded as structured transcripts (tool calls and responses, not just logs), they can be replayed for testing and validation. If you change your ingestion configuration or upgrade the LLM, you can replay previous sessions to see how the output would differ. This gives you a safety net for changes that affect how context is generated.
 
 The transcript is stored with local ingest run state and can be reviewed or replayed when you need to audit a decision. Commit the resulting YAML and Markdown changes; commit reports or transcripts only when they are part of your team's review workflow.
+
+## Agent usage notes
+
+Use this page when an agent needs to explain review workflows, ingestion diffs, replayability, or why KTX writes YAML and Markdown instead of hiding context in a hosted service.
+
+| Agent task | Relevant section | Next page |
+|------------|------------------|-----------|
+| Explain how generated context should be reviewed | The git workflow | [Building Context](/docs/guides/building-context) |
+| Diagnose why ingestion changed a semantic source | Auto-ingestion and Deterministic replay | [ktx ingest](/docs/cli-reference/ktx-ingest) |
+| Explain how context improves over time | Feedback loops | [Link Detection](/docs/benchmarks/link-detection) |
+| Tell a user what to commit | The git workflow | [Writing Context](/docs/guides/writing-context) |
diff --git a/docs-site/content/docs/concepts/the-context-layer.mdx b/docs-site/content/docs/concepts/the-context-layer.mdx
index 3a3ab9b4..64a17730 100644
--- a/docs-site/content/docs/concepts/the-context-layer.mdx
+++ b/docs-site/content/docs/concepts/the-context-layer.mdx
@@ -145,3 +145,14 @@ my-project/
 Semantic sources and knowledge pages are committed to git. The SQLite database holds ephemeral state — scan results, embedding indexes, session logs — and is git-ignored. If you delete it, KTX rebuilds it on the next run.
 
 This means your analytics context travels with your code. You can fork it, branch it, review it in a PR, and merge it with the same tools you use for dbt models. There's no sync problem between a remote server and your local state. There's no migration to run. The files are the source of truth.
+
+## Agent usage notes
+
+Use this page when an agent needs to explain why KTX exists, why schema-only database access is not enough, or how KTX differs from MetricFlow, Cube, Malloy, and traditional semantic layers.
+
+| Agent task | Relevant section | Next page |
+|------------|------------------|-----------|
+| Explain why a database agent made a plausible but wrong query | The problem | [Writing Context](/docs/guides/writing-context) |
+| Decide whether a metric belongs in YAML or Markdown | What a context layer is | [Writing Context](/docs/guides/writing-context) |
+| Compare KTX to another semantic layer | How KTX compares | [Primary Sources](/docs/integrations/primary-sources) |
+| Explain reviewability and source of truth | The plain-files philosophy | [Context as Code](/docs/concepts/context-as-code) |
diff --git a/docs-site/content/docs/getting-started/introduction.mdx b/docs-site/content/docs/getting-started/introduction.mdx
index 827cdff5..e074f021 100644
--- a/docs-site/content/docs/getting-started/introduction.mdx
+++ b/docs-site/content/docs/getting-started/introduction.mdx
@@ -57,3 +57,16 @@ If you've ever watched an agent confidently generate a query that joins on the w
 
 - **Get hands-on** — follow the [Quickstart](/docs/getting-started/quickstart) to set up KTX with your own database in under 10 minutes.
 - **Understand the theory** — read [The Context Layer](/docs/concepts/the-context-layer) to learn why schema access alone breaks on real analytics and how KTX addresses it.
+
+## Agent usage notes
+
+Use this page as the high-level routing document for KTX docs.
+
+| Agent task | Read next |
+|------------|-----------|
+| Set up a new KTX project | [Quickstart](/docs/getting-started/quickstart) |
+| Explain what problem KTX solves | [The Context Layer](/docs/concepts/the-context-layer) |
+| Scan a database and ingest metadata | [Building Context](/docs/guides/building-context) |
+| Edit semantic sources or knowledge pages | [Writing Context](/docs/guides/writing-context) |
+| Connect KTX to an agent client | [Serving Agents](/docs/guides/serving-agents) |
+| Look up exact command flags | [CLI Reference](/docs/cli-reference/ktx-setup) |
diff --git a/docs-site/content/docs/integrations/agent-clients.mdx b/docs-site/content/docs/integrations/agent-clients.mdx
index 088dbba4..75a06cb1 100644
--- a/docs-site/content/docs/integrations/agent-clients.mdx
+++ b/docs-site/content/docs/integrations/agent-clients.mdx
@@ -10,6 +10,25 @@ KTX integrates with coding agents through two channels that can be used independ
 
 Run `ktx setup` and select your agent targets, or configure manually using the snippets below.
 
+## Agent setup matrix
+
+Use this table to choose the setup path before editing client-specific files.
+
+| Target | Project-scoped files | Global install | Recommended mode | Reload needed |
+|--------|----------------------|----------------|------------------|---------------|
+| Claude Code | `.claude/skills/ktx/SKILL.md`, `.mcp.json` | Yes | Both CLI skill and MCP | Skill reload is automatic; MCP starts on first use |
+| Cursor | `.cursor/rules/ktx.mdc`, `.cursor/mcp.json` | No | Both CLI rule and MCP | Reload the Cursor window after MCP config changes |
+| Codex | `.agents/skills/ktx/SKILL.md`, `.agents/mcp/ktx.json` | Yes | Both CLI skill and MCP | Start a new session after global skill changes |
+| OpenCode | `.opencode/commands/ktx.md`, `.opencode/mcp.json` | No | Both CLI command and MCP | Restart OpenCode after config changes |
+
+The safest setup command installs both the CLI integration and MCP into project-scoped files:
+
+```bash
+ktx setup --agents --target codex --agent-install-mode both --project
+```
+
+Replace `codex` with `claude-code`, `cursor`, `opencode`, or `universal`.
+
 ## Claude Code
 
 ### Install via `ktx setup`
@@ -277,3 +296,13 @@ All agent clients connect to the same KTX MCP server. The server exposes these t
 | Global install | Yes | No | Yes | No |
 | Config location | `.mcp.json` | `.cursor/mcp.json` | `.agents/mcp/ktx.json` | `.opencode/mcp.json` |
 | Skills location | `.claude/skills/` | `.cursor/rules/` | `.agents/skills/` | `.opencode/commands/` |
+
+## Common errors
+
+| Error or symptom | Likely cause | Recovery |
+|------------------|--------------|----------|
+| Agent cannot start MCP server | The config points to a missing `ktx` binary | Run `pnpm run link:dev`, rerun `ktx setup --agents`, or use an absolute command path |
+| Agent sees MCP tools but queries fail | Server args omit `--semantic-compute` | Add `--semantic-compute`; add `--execute-queries` only when read-only execution is intended |
+| Agent reads the wrong KTX project | `--project-dir` or `KTX_PROJECT_DIR` points at another directory | Regenerate project-scoped config from the intended project directory |
+| CLI skill commands return non-JSON output | The `--json` flag is missing, either from the generated command or because an agent edited it | Restore the generated skill/rule and ensure every `ktx agent` command includes `--json` |
+| Cursor or OpenCode does not show new tools | The client has not reloaded its MCP config | Reload the app window or restart the agent client |
diff --git a/docs-site/content/docs/integrations/context-sources.mdx b/docs-site/content/docs/integrations/context-sources.mdx
index 49f387f3..02554e08 100644
--- a/docs-site/content/docs/integrations/context-sources.mdx
+++ b/docs-site/content/docs/integrations/context-sources.mdx
@@ -7,6 +7,29 @@ Context sources feed your existing analytics tooling into KTX. During ingestion,
 
 All context sources are configured in `ktx.yaml` under `connections` with their respective `driver` value.
 
+## Ingestion workflow
+
+Agents should configure and ingest context sources in this order:
+
+1. Add the context source connection in `ktx.yaml` or with `ktx setup`.
+2. Store tokens as `env:NAME` or `file:/path/to/secret`.
+3. Run `ktx ingest ` for one source or `ktx ingest --all`.
+4. Check progress with `ktx ingest status --json`.
+5. Review generated `semantic-layer/` YAML and `knowledge/` Markdown files in git.
+6. Validate changed semantic sources with `ktx sl validate`.
+
+## Shared source fields
+
+| Field | Required | Description |
+|-------|----------|-------------|
+| `driver` | Yes | Source adapter: `dbt`, `metricflow`, `lookml`, `metabase`, `looker`, or `notion` |
+| `readonly` | Strongly recommended | Marks the source as read-only for KTX |
+| `source_dir` | For local file sources | Absolute or project-relative source directory |
+| `repo_url` | For Git-hosted sources | Git repository URL |
+| `branch` | No | Git branch to read |
+| `path` | No | Subdirectory inside a monorepo |
+| `auth_token_ref` | For private APIs/repos | `env:NAME` or `file:/path/to/secret` token reference |
+
 ## dbt
 
 Ingests schema definitions, model descriptions, column metadata, and test coverage from a dbt project.
@@ -351,3 +374,13 @@ Create an integration at [notion.so/my-integrations](https://www.notion.so/my-in
 - Notion is knowledge-only — it does not produce semantic layer sources
 - Rate limits apply; large workspaces may require multiple ingestion runs
 - `last_successful_cursor` is auto-managed for incremental sync
+
+## Common errors
+
+| Error or symptom | Likely cause | Recovery |
+|------------------|--------------|----------|
+| Adapter cannot read source files | `source_dir`, `repo_url`, `branch`, or `path` is wrong | Verify the path locally or clone the repo manually with the same credentials |
+| Private repo/API authentication fails | Token env var or secret file is missing | Export the env var or update `auth_token_ref` to a readable file |
+| Ingest creates duplicate context | Existing source names or knowledge pages do not match imported terminology | Review the diff, rename duplicates, and add knowledge pages with canonical names |
+| Notion ingest skips pages | Integration lacks access or root page IDs are missing | Share pages with the Notion integration and set `root_page_ids`, or use `all_accessible` carefully |
+| Generated semantic sources fail validation | Tool metadata does not match the live warehouse schema | Map BI/source databases to primary warehouse connections and rerun validation |
diff --git a/docs-site/content/docs/integrations/primary-sources.mdx b/docs-site/content/docs/integrations/primary-sources.mdx
index be71cba0..c36260d1 100644
--- a/docs-site/content/docs/integrations/primary-sources.mdx
+++ b/docs-site/content/docs/integrations/primary-sources.mdx
@@ -11,6 +11,20 @@ All connectors share these conventions:
 - Connections are read-only — KTX never writes to your database
 - Schema scanning discovers tables, columns, types, and constraints automatically
 
+## Connection field reference
+
+Agents should prefer environment or file references (`env:NAME`, `file:/path/to/secret`) over literal secrets.
+
+| Field | Required | Applies to | Description |
+|-------|----------|------------|-------------|
+| `driver` | Yes | all connections | Connector driver such as `postgres`, `snowflake`, `bigquery`, `clickhouse`, `mysql`, `sqlserver`, or `sqlite` |
+| `url` | Either `url` or the field-by-field values | URL-style connectors | Database URL, `env:NAME`, or `file:/path/to/secret` |
+| `host`, `port`, `database`, `username`, `password` | Either these fields or `url` | PostgreSQL, MySQL, ClickHouse, SQL Server | Field-by-field connection values |
+| `schema` or `schemas` | No | schema-aware warehouses | Single schema or list of schemas to scan |
+| `readonly` | Strongly recommended | all primary sources | Marks the connection as read-only in KTX config |
+| `historicSql` | No | supported warehouses | Enables query-history ingestion when the warehouse supports it |
+| `path` | Yes for path-style SQLite | SQLite | Local SQLite database path or `env:NAME` reference |
+
 ## PostgreSQL
 
 The most full-featured connector. Supports schema introspection, foreign key detection, column statistics, and historic SQL via `pg_stat_statements`.
@@ -488,3 +502,13 @@ No authentication required — SQLite is file-based. The file must be readable b
 - SQLite type affinity system: `TEXT`, `NUMERIC`, `INTEGER`, `REAL`, `BLOB`
 - Foreign key enforcement requires explicit `PRAGMA foreign_keys = ON`
 - In-memory databases supported with `path: ":memory:"` (for testing)
+
+## Common errors
+
+| Error or symptom | Likely cause | Recovery |
+|------------------|--------------|----------|
+| Connection URL appears in git diff | A literal credential URL was written to `ktx.yaml` | Replace it with `env:NAME` or `file:/path/to/secret` and rotate exposed credentials |
+| Scan returns no tables | Schema/database/project filter is wrong or the user lacks metadata permissions | Verify the schema list and grant metadata read permissions |
+| Historic SQL is empty | Query history extension or warehouse history view is unavailable | Enable the warehouse-specific history feature, then rerun scan or setup |
+| Column statistics are missing | Connector cannot access stats tables or the warehouse does not expose them | Grant stats permissions where supported; otherwise rely on structural scan output |
+| SQL execution fails through agents | Connection is missing, unreachable, or execution is disabled in the server | Run `ktx connection test ` and check `ktx serve` flags |
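Several of the pages in this patch tell agents to store credentials as `env:NAME` or `file:/path/to/secret` references. They also list a missing env var or an unreadable secret file as a common failure. The following is a hypothetical pre-flight check, not a KTX command. The name `resolve_secret_ref` is illustrative, and the check makes two assumptions: an `env:` reference names an exported environment variable, and a `file:` reference names a readable file whose contents are the secret. It is a sketch of the convention and is not meant to describe how KTX itself resolves references.

```bash
#!/bin/sh
# Hypothetical helper (not part of KTX): resolve an env:NAME or
# file:/path/to/secret reference and print the secret on stdout.
# Assumes env: names an exported variable and file: names a readable file.
resolve_secret_ref() {
  case "$1" in
    env:*)
      # printenv exits non-zero when the variable is not in the environment.
      if ! printenv "${1#env:}"; then
        echo "environment variable ${1#env:} is not set; export it first" >&2
        return 1
      fi
      ;;
    file:*)
      if [ ! -r "${1#file:}" ]; then
        echo "secret file ${1#file:} is missing or unreadable" >&2
        return 1
      fi
      cat -- "${1#file:}"
      ;;
    *)
      # Policy choice for this sketch: reject literal values so they stay out of git.
      echo "literal value found; use env:NAME or file:/path/to/secret instead" >&2
      return 2
      ;;
  esac
}
```

For example, `resolve_secret_ref env:WAREHOUSE_TOKEN >/dev/null` exits 0 only if that variable is exported. `WAREHOUSE_TOKEN` is an illustrative name. Rejecting literals is stricter than the field reference, which still permits a literal database URL. The strictness mirrors the advice to keep credentials out of `ktx.yaml`.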