From 1b552a38c27b14c2ecc2d33c487d079aa0b39e7c Mon Sep 17 00:00:00 2001
From: Luca Martial
Date: Mon, 11 May 2026 23:32:10 -0700
Subject: [PATCH] docs: refresh setup and install guidance

---
 README.md                                     | 362 +++++-------------
 .../content/docs/cli-reference/ktx-serve.mdx  |   2 +-
 .../content/docs/community/contributing.mdx   |   9 +-
 .../docs/getting-started/quickstart.mdx       |  74 +---
 4 files changed, 128 insertions(+), 319 deletions(-)

diff --git a/README.md b/README.md
index 5f152cca..84592226 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@

- Workspace-first context layer for database agents
+ The context layer for analytics agents

@@ -14,312 +14,154 @@
 
 ---
 
-KTX stores warehouse memory in a project directory, generates and validates
-semantic-layer YAML, indexes knowledge, scans database schemas, and exposes the
-result through a CLI and MCP server.
+KTX turns warehouse metadata, semantic definitions, and business knowledge into
+reviewable project files that agents can use while planning, querying, and
+updating analytics work.
 
-KTX projects are plain files: YAML, Markdown, SQLite state, and generated
-artifacts. You can inspect them, commit them, and serve them to any MCP client.
+A KTX project is a directory of plain files — YAML semantic sources, Markdown
+knowledge pages, and SQLite state — that you commit to git and review in PRs,
+just like dbt models.
 
-## What KTX provides
+## Who KTX is for
 
-- Durable warehouse memory with semantic-layer sources and knowledge pages.
-- Native scan connectors for SQLite, Postgres, MySQL, ClickHouse, SQL Server,
-  BigQuery, and Snowflake.
-- Agentic ingest with provenance links, tool transcripts, and replay metadata.
-- Local semantic-layer query planning and optional query execution.
-- A stdio MCP server with tools for connections, knowledge, semantic-layer
-  sources, ingest reports, and replay.
+KTX is built for analytics engineers and data teams who want data agents to
+work on real analytics systems — not just generate one-off SQL.
+
+Use KTX when you want agents to:
+
+- **Generate SQL** from approved measures and joins
+- **Repair semantic definitions** through reviewable diffs
+- **Explain metric provenance** with warehouse evidence
+- **Work alongside** dbt, LookML, MetricFlow, Looker, Metabase, and modern BI
+  platforms
+
+Works with PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, and
+SQLite.
 
 ## Quick start
 
-Run the pre-seeded demo through the public npm package:
+Install the CLI and run the setup wizard:
 
 ```bash
-npx @kaelio/ktx setup demo --no-input
-npx @kaelio/ktx setup demo inspect
-```
-
-The default demo uses packaged sample data and prebuilt context. It does not
-require API keys, network access, or an LLM provider.
-
-To replay the packaged ingest run, use:
-
-```bash
-npx @kaelio/ktx setup demo --mode replay --no-input
-```
-
-To run the full agentic demo with an LLM provider, set a provider key for the
-current process:
-
-```bash
-ANTHROPIC_API_KEY=$YOUR_ANTHROPIC_API_KEY \
-  npx @kaelio/ktx setup demo --mode full --no-input
-```
-
-Interactive full-demo setup can prompt for a provider key without writing the
-key to `ktx.yaml`.
-
-You can also install the CLI in a project or globally:
-
-```bash
-npm install @kaelio/ktx
-npx ktx --help
 npm install -g @kaelio/ktx
-ktx --help
+ktx setup
 ```
 
-## Build a local project
+The wizard walks through six steps: configuring your LLM provider, setting up
+embeddings, connecting your database, adding context sources (dbt, LookML,
+Metabase, Looker, Notion), building context, and installing agent integration.
 
-Create a project from a local workspace:
+If it exits before completion, rerun `ktx setup` to resume where you left off.
+
+Check your project status:
 
 ```bash
-npm install @kaelio/ktx
-PROJECT_DIR="$(mktemp -d)/ktx-demo"
-npx ktx init "$PROJECT_DIR" --name ktx-demo
+ktx status
 ```
 
-Create a SQLite warehouse:
+```
+KTX project: /home/user/analytics
+Project ready: yes
+LLM ready: yes (claude-sonnet-4-6)
+Embeddings ready: yes (text-embedding-3-small)
+Primary sources configured: yes (postgres-warehouse)
+Context sources configured: yes (dbt-main)
+KTX context built: yes
+Agent integration ready: yes (claude-code:project)
+```
+
+## What's in a project
+
+```
+my-project/
+├── ktx.yaml                    # Project configuration
+├── semantic-layer/
+│   └── warehouse/
+│       ├── orders.yaml         # Semantic source definitions
+│       ├── customers.yaml
+│       └── order_items.yaml
+├── knowledge/
+│   ├── global/
+│   │   ├── revenue.md          # Business definitions and rules
+│   │   └── segment-classification.md
+│   └── user/
+│       └── local/
+├── raw-sources/
+│   └── warehouse/
+│       └── live-database/      # Scan artifacts and reports
+└── .ktx/
+    └── db.sqlite               # Local state (git-ignored)
+```
+
+Semantic sources and knowledge pages are committed to git. The `.ktx/` directory
+holds ephemeral state and is git-ignored — delete it and KTX rebuilds on the
+next run.
+
+## Serve agents
+
+KTX integrates with coding agents through CLI skills, an MCP server, or both.
+The setup wizard configures this automatically — here's what each mode looks
+like.
+
+**CLI skills** — the agent calls `ktx` commands directly through a skill file
+installed in your agent's config (e.g., `.claude/skills/ktx/SKILL.md`):
 
 ```bash
-python - "$PROJECT_DIR/demo.db" <<'PY'
-import sqlite3
-import sys
-
-conn = sqlite3.connect(sys.argv[1])
-conn.executescript("""
-DROP TABLE IF EXISTS accounts;
-CREATE TABLE accounts (
-  account_id INTEGER PRIMARY KEY,
-  account_name TEXT NOT NULL,
-  segment TEXT NOT NULL,
-  region TEXT NOT NULL
-);
-INSERT INTO accounts VALUES
-  (1, 'Acme Analytics', 'Mid-Market', 'NA'),
-  (2, 'Beacon Bank', 'Enterprise', 'EMEA'),
-  (3, 'Cobalt Coffee', 'SMB', 'NA'),
-  (4, 'Delta Devices', 'Mid-Market', 'APAC'),
-  (5, 'Evergreen Energy', 'Enterprise', 'NA');
-""")
-conn.close()
-PY
+ktx sl query --measure orders.revenue --dimension orders.status --format sql
+ktx wiki search "revenue definition"
+ktx sl validate orders
 ```
 
-Replace the generated `ktx.yaml`:
+**MCP server** — the agent calls KTX tools over the Model Context Protocol:
 
 ```bash
-cat > "$PROJECT_DIR/ktx.yaml" <"
-memory:
-  auto_commit: true
-YAML
-```
-
-Write and validate a semantic-layer source:
-
-```bash
-npx ktx sl write accounts --project-dir "$PROJECT_DIR" \
-  --connection-id warehouse --yaml 'name: accounts
-table: accounts
-description: CRM accounts with segmentation attributes.
-grain:
-  - account_id
-columns:
-  - name: account_id
-    type: number
-  - name: account_name
-    type: string
-  - name: segment
-    type: string
-  - name: region
-    type: string
-measures:
-  - name: account_count
-    expr: count(account_id)
-joins: []
-'
-
-npx ktx sl validate accounts --project-dir "$PROJECT_DIR" \
-  --connection-id warehouse
-```
-
-Generate SQL and execute the query:
-
-```bash
-npx ktx sl query --project-dir "$PROJECT_DIR" \
-  --connection-id warehouse \
-  --measure accounts.account_count \
-  --dimension accounts.segment \
-  --order-by accounts.account_count:desc \
-  --limit 5 \
-  --format sql
-
-npx ktx sl query --project-dir "$PROJECT_DIR" \
-  --connection-id warehouse \
-  --measure accounts.account_count \
-  --dimension accounts.segment \
-  --order-by accounts.account_count:desc \
-  --limit 5 \
-  --execute \
-  --max-rows 5
-```
-
-List and test the warehouse connection:
-
-```bash
-npx ktx connection list --project-dir "$PROJECT_DIR"
-npx ktx connection test warehouse --project-dir "$PROJECT_DIR"
-```
-
-The connection test prints the configured driver and discovered table count:
-
-```text
-Driver: sqlite
-Tables: 1
-```
-
-### Scan the demo warehouse
-
-Scan artifacts are written under
-`raw-sources/warehouse/live-database//` in the project directory.
-
-```bash
-
-SCAN_OUTPUT="$(npx ktx scan warehouse --project-dir "$PROJECT_DIR")"
-printf '%s\n' "$SCAN_OUTPUT"
-SCAN_RUN_ID="$(printf '%s\n' "$SCAN_OUTPUT" | awk '/^Run: / { print $2 }')"
-npx ktx scan status --project-dir "$PROJECT_DIR" "$SCAN_RUN_ID"
-npx ktx scan report --project-dir "$PROJECT_DIR" "$SCAN_RUN_ID"
-```
-
-For non-SQLite drivers, prefer credential references such as `--url env:NAME`
-or `--url file:PATH` over literal credential URLs.
-
-## Managed Python runtime
-
-KTX installs its Python runtime only when a Python-backed command needs it.
-The runtime lives outside the npm cache, is versioned by the installed CLI
-version, and is managed by `ktx runtime` commands.
-
-KTX requires `uv` on `PATH` to create the managed runtime. Install `uv` with
-your system package manager or the official installer before running Python-
-backed KTX commands. KTX doesn't download `uv` automatically; run
-`ktx runtime doctor` if runtime installation fails:
-
-```bash
-npx ktx runtime install --yes
-npx ktx runtime status
-npx ktx runtime doctor
-npx ktx runtime start
-npx ktx runtime stop
-npx ktx runtime prune --dry-run
-npx ktx runtime prune --yes
-```
-
-Use `runtime prune --dry-run` to preview stale runtime directories from older
-CLI versions. Add `--yes` to remove those stale directories after daemon
-processes are stopped.
-
-Commands such as `npx @kaelio/ktx sl query ... --yes` can install the core
-runtime lazily from the bundled wheel. Local embeddings remain lazy; prepare
-them only when you select local `sentence-transformers` embeddings:
-
-```bash
-npx ktx runtime install --feature local-embeddings --yes
-npx ktx runtime start --feature local-embeddings
-```
-
-## Serve MCP
-
-Start the stdio MCP server from the project directory:
-
-```bash
-npx ktx serve --mcp stdio --project-dir "$PROJECT_DIR" \
+ktx serve --mcp stdio \
   --user-id local \
   --semantic-compute \
   --execute-queries \
   --yes
 ```
 
-The `--semantic-compute` flag uses the managed Python runtime when no explicit
-semantic compute URL is provided. KTX starts or reuses the managed runtime as
-needed.
+This exposes tools for connections, knowledge search, semantic-layer sources,
+validation, queries, ingestion, and replay. The `--semantic-compute` flag starts
+the managed Python runtime for query planning automatically.
 
-The MCP server exposes `connection_list`, `knowledge_search`,
-`knowledge_read`, `knowledge_write`, `sl_list_sources`, `sl_read_source`,
-`sl_write_source`, `sl_validate`, `sl_query`, `ingest_trigger`,
-`ingest_status`, `ingest_report`, and `ingest_replay`.
+Supported agents: Claude Code, Codex, Cursor, OpenCode, and any agent that
+reads `.agents/` skills or MCP configuration.
 
 ## Workspace packages
 
-- `packages/context`: core TypeScript context library.
-- `packages/cli`: CLI wrapper over the context package.
-- `packages/llm`: LLM and embedding provider helpers.
-- `packages/connector-bigquery`: BigQuery scan connector.
-- `packages/connector-clickhouse`: ClickHouse scan connector.
-- `packages/connector-mysql`: MySQL scan connector.
-- `packages/connector-postgres`: Postgres scan connector.
-- `packages/connector-snowflake`: Snowflake scan connector.
-- `packages/connector-sqlite`: SQLite scan connector.
-- `packages/connector-sqlserver`: SQL Server scan connector.
-- `python/ktx-sl`: semantic-layer engine.
-- `python/ktx-daemon`: portable compute service for semantic-layer operations.
+| Package | Purpose |
+|---------|---------|
+| `packages/cli` | CLI entry point |
+| `packages/context` | Core context engine |
+| `packages/llm` | LLM and embedding providers |
+| `packages/connector-*` | Database connectors (Postgres, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, SQLite) |
+| `python/ktx-sl` | Semantic-layer query planning |
+| `python/ktx-daemon` | Portable compute service |
 
 ## Development
 
-Install dependencies and run checks:
-
 ```bash
+git clone https://github.com/kaelio/ktx.git
+cd ktx
 pnpm install
+uv sync --all-groups
+pnpm run build
 pnpm run check
-uv sync --all-packages
-source .venv/bin/activate
-uv run pytest
 ```
 
-Use the optional development binary when you want a local `ktx-dev` command:
+Use the development CLI for local testing:
 
 ```bash
+pnpm run setup:dev
 pnpm run link:dev
 ktx-dev --help
 ```
 
 The repository uses `pnpm` for TypeScript packages and `uv` for Python
-packages.
-
-## Release status
-
-This repository builds one public npm artifact named `@kaelio/ktx`. The release
-artifact manifest contains the public npm tarball and the bundled `kaelio-ktx`
-runtime wheel. The first public npm handoff is policy-gated through
-`release-policy.json`, which keeps Python package publishing disabled because
-KTX-owned Python code ships inside the npm package as a bundled wheel. The
-`python/ktx-sl` and `python/ktx-daemon` directories remain source packages for
-development, not public release artifacts.
-
-Build local package artifacts and verify the guarded dry-run publish path with:
-
-```bash
-source .venv/bin/activate
-pnpm run artifacts:check
-pnpm run release:readiness
-pnpm run release:npm-publish
-```
-
-Run the live npm publish only from the manual `KTX Release` workflow with the
-`publish_live` input enabled after the `NPM_TOKEN` secret is configured.
+packages. See [Contributing](docs-site/content/docs/community/contributing.mdx)
+for full development setup, testing, and PR guidelines.
 
 ## License
 
diff --git a/docs-site/content/docs/cli-reference/ktx-serve.mdx b/docs-site/content/docs/cli-reference/ktx-serve.mdx
index ec0d2b28..3816b808 100644
--- a/docs-site/content/docs/cli-reference/ktx-serve.mdx
+++ b/docs-site/content/docs/cli-reference/ktx-serve.mdx
@@ -68,7 +68,7 @@ The MCP server is typically configured through `ktx setup --agents` rather than
 
 | Error | Cause | Recovery |
 |-------|-------|----------|
-| Agent cannot start server | The agent config cannot find the `ktx` binary | Run `pnpm run link:dev` or use an absolute command path in the agent config |
+| Agent cannot start server | The agent config cannot find the `ktx` binary | Install `@kaelio/ktx` globally with `npm install -g @kaelio/ktx` or use an absolute command path in the agent config |
 | Semantic tools are unavailable | Server was started without `--semantic-compute` | Add `--semantic-compute` or `--semantic-compute-url` to the server args |
 | Query execution is denied | Server was started without `--execute-queries` | Add `--execute-queries` only for trusted projects where read-only execution is intended |
 | Context resolves to wrong project | `KTX_PROJECT_DIR` is missing or points elsewhere | Set `KTX_PROJECT_DIR` to the project containing the intended `ktx.yaml` |
diff --git a/docs-site/content/docs/community/contributing.mdx b/docs-site/content/docs/community/contributing.mdx
index 8feb86c9..1b4e39ce 100644
--- a/docs-site/content/docs/community/contributing.mdx
+++ b/docs-site/content/docs/community/contributing.mdx
@@ -7,6 +7,11 @@ KTX is an open-source project and welcomes contributions — bug fixes, new conn
 
 ## Development setup
 
+This page is for contributors working on the KTX repository. To install KTX for
+an analytics project, use the published
+[`@kaelio/ktx`](https://www.npmjs.com/package/@kaelio/ktx) package in the
+[Quickstart](/docs/getting-started/quickstart).
+
 ### Prerequisites
 
 - **Node.js 22+** and **pnpm** — for the TypeScript workspace
@@ -44,7 +49,9 @@ pnpm run setup:dev
 pnpm run link:dev
 ```
 
-This makes the `ktx` command available globally, pointing at your local build.
+This makes the `ktx-dev` command available globally, pointing at your local
+build. Use this development binary when you need to test unpublished repository
+changes.
 
 ## Repository structure
 
diff --git a/docs-site/content/docs/getting-started/quickstart.mdx b/docs-site/content/docs/getting-started/quickstart.mdx
index 91a17d05..61abc301 100644
--- a/docs-site/content/docs/getting-started/quickstart.mdx
+++ b/docs-site/content/docs/getting-started/quickstart.mdx
@@ -9,44 +9,30 @@ If you are a coding assistant trying to decide which KTX docs page to read, star
 
 ## Workflow summary
 
-Use this sequence when an agent needs to set up KTX from a fresh checkout:
+Use this sequence when you are setting up KTX in an analytics project:
 
-1. `pnpm install` — install workspace dependencies.
-2. `pnpm run setup:dev` — build local packages and prepare the development CLI.
-3. `pnpm run link:dev` — link the `ktx` command for local use.
-4. `ktx setup` — create or resume a KTX project.
-5. `ktx status` — verify project readiness.
-6. `ktx sl list` — confirm semantic-layer sources are available.
-7. `ktx sl query ... --format sql` — compile a semantic query without executing it.
+1. `npm install -g @kaelio/ktx` — install the published KTX CLI from npm.
+2. `ktx setup` — create or resume a KTX project.
 
 The setup wizard is stateful. If it exits before completion, rerun `ktx setup` in the same project directory to resume from the first incomplete step.
 
-## Prerequisites
-
-- **Node.js 22+** and **pnpm**
-- An **Anthropic API key** for LLM-powered enrichment and ingestion
-- A **database connection** — PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, or SQLite
-- Optionally, a **dbt project**, **LookML repo**, **Metabase instance**, or other context source
-
 ## Install and run setup
 
-KTX is currently used from a local checkout or linked workspace CLI. Build and link the CLI first:
+Install the published [`@kaelio/ktx`](https://www.npmjs.com/package/@kaelio/ktx) CLI:
 
 ```bash
-git clone https://github.com/kaelio/ktx.git
-cd ktx
-pnpm install
-pnpm run setup:dev
-pnpm run link:dev
+npm install -g @kaelio/ktx
 ```
 
-Then run the setup wizard in the directory where you want your KTX project:
+Then run the setup wizard:
 
 ```bash
 ktx setup
 ```
 
-The wizard walks through six steps. You can go back at any point, and if you exit early, running `ktx setup` again resumes where you left off.
+The local checkout flow is only for contributors working on KTX itself. See [Contributing](/docs/community/contributing) for that setup.
+
+The wizard walks through six steps. You can go back at any point, and if you exit early, rerunning `ktx setup` resumes where you left off.
 
 ## Step 1: Configure LLM
 
@@ -86,10 +72,11 @@ KTX uses embeddings for semantic search over sources, wiki content, schema metad
 
 **OpenAI embeddings** use `text-embedding-3-small` (1536 dimensions) and require an `OPENAI_API_KEY`.
 
-**Local embeddings** use `all-MiniLM-L6-v2` (384 dimensions) via the KTX Python daemon. No API key is needed. If you run the daemon as a long-lived HTTP service, start it with:
+**Local embeddings** use `all-MiniLM-L6-v2` (384 dimensions) via the KTX managed Python runtime. No API key is needed. KTX can install and start the runtime during setup; to prepare it ahead of time, run:
 
 ```bash
-ktx-daemon serve-http --host 127.0.0.1 --port 8765
+ktx runtime install --feature local-embeddings --yes
+ktx runtime start --feature local-embeddings
 ```
 
 ## Step 3: Connect a database
@@ -208,12 +195,15 @@ Then select which agents to install for:
 │ ◻ Codex
 │ ◻ Cursor
 │ ◻ OpenCode
+│ ◻ Custom agent (.agents)
 ```
 
 **CLI mode** writes a skill file (e.g., `.claude/skills/ktx/SKILL.md`) that teaches the agent to call KTX commands directly.
 
 **MCP mode** writes an MCP server configuration (e.g., `.mcp.json`) that lets the agent call KTX tools like `sl_query`, `knowledge_search`, and `sl_write_source` over the Model Context Protocol.
 
+**Custom agent** uses the universal `.agents` target for agents that can read project-local skills or MCP configuration.
+
 ## Generated files
 
 KTX writes project state as plain files so agents can inspect and edit changes in git.
@@ -247,44 +237,14 @@ KTX context built: yes
 Agent integration ready: yes (claude-code:project)
 ```
 
-List your semantic sources:
-
-```bash
-ktx sl list
-```
-
-Query through the semantic layer:
-
-```bash
-ktx sl query \
-  --connection-id postgres-warehouse \
-  --measure orders.total_revenue \
-  --dimension orders.status \
-  --order-by orders.total_revenue:desc \
-  --limit 5 \
-  --format sql
-```
-
-This outputs the generated SQL. Add `--execute` to run it against your warehouse:
-
-```bash
-ktx sl query \
-  --connection-id postgres-warehouse \
-  --measure orders.total_revenue \
-  --dimension orders.status \
-  --order-by orders.total_revenue:desc \
-  --limit 5 \
-  --execute --max-rows 10
-```
-
 ## Common errors
 
 | Error or symptom | Likely cause | Recovery |
 |------------------|--------------|----------|
-| `ktx: command not found` | The local CLI has not been linked | Run `pnpm run setup:dev` and `pnpm run link:dev` from the KTX checkout, then open a new shell |
+| `ktx: command not found` | The KTX package is not installed globally, or the shell cannot find the global binary | Run `npm install -g @kaelio/ktx` and open a new shell |
 | LLM health check fails | Missing, invalid, or unauthorized Anthropic API key | Export `ANTHROPIC_API_KEY` or rerun `ktx setup` and choose the file-backed secret option |
 | OpenAI embedding check fails | `OPENAI_API_KEY` is missing when OpenAI embeddings are selected | Export `OPENAI_API_KEY`, or rerun setup and choose local sentence-transformers embeddings |
-| Local embeddings hang or fail | The Python daemon cannot start or the local model runtime is unavailable | Run `uv sync --all-groups`, then start `ktx-daemon serve-http --host 127.0.0.1 --port 8765` and rerun setup |
+| Local embeddings hang or fail | The managed Python runtime cannot start or the local model runtime is unavailable | Install `uv`, run `ktx runtime doctor`, then run `ktx runtime install --feature local-embeddings --yes` and rerun setup |
 | Database connection test fails | Credentials, network access, warehouse, database, or schema value is wrong | Test the same URL with the database's native client, then rerun `ktx connection add ... --force` or rerun setup |
 | `KTX context built: no` in `ktx status` | Setup saved configuration but did not build context | Run `ktx setup context build` or rerun `ktx setup` and choose to build context now |
 | Agent integration is incomplete | Setup skipped the agents step or the target was not installed | Run `ktx setup --agents --target codex --agent-install-mode both --project` using the target you need |