docs: refresh setup and install guidance

This commit is contained in:
Luca Martial 2026-05-11 23:32:10 -07:00
parent 94fc580a3d
commit 1b552a38c2
4 changed files with 128 additions and 319 deletions

362
README.md
View file

@ -3,7 +3,7 @@
</h1>
<p align="center">
<strong>Workspace-first context layer for database agents</strong>
<strong>The context layer for analytics agents</strong>
</p>
<p align="center">
@ -14,312 +14,154 @@
---
KTX stores warehouse memory in a project directory, generates and validates
semantic-layer YAML, indexes knowledge, scans database schemas, and exposes the
result through a CLI and MCP server.
KTX turns warehouse metadata, semantic definitions, and business knowledge into
reviewable project files that agents can use while planning, querying, and
updating analytics work.
KTX projects are plain files: YAML, Markdown, SQLite state, and generated
artifacts. You can inspect them, commit them, and serve them to any MCP client.
A KTX project is a directory of plain files — YAML semantic sources, Markdown
knowledge pages, and SQLite state — that you commit to git and review in PRs,
just like dbt models.
## What KTX provides
## Who KTX is for
- Durable warehouse memory with semantic-layer sources and knowledge pages.
- Native scan connectors for SQLite, Postgres, MySQL, ClickHouse, SQL Server,
BigQuery, and Snowflake.
- Agentic ingest with provenance links, tool transcripts, and replay metadata.
- Local semantic-layer query planning and optional query execution.
- A stdio MCP server with tools for connections, knowledge, semantic-layer
sources, ingest reports, and replay.
KTX is built for analytics engineers and data teams who want data agents to
work on real analytics systems — not just generate one-off SQL.
Use KTX when you want agents to:
- **Generate SQL** from approved measures and joins
- **Repair semantic definitions** through reviewable diffs
- **Explain metric provenance** with warehouse evidence
- **Work alongside** dbt, LookML, MetricFlow, Looker, Metabase, and modern BI
platforms
Works with PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, and
SQLite.
## Quick start
Run the pre-seeded demo through the public npm package:
Install the CLI and run the setup wizard:
```bash
npx @kaelio/ktx setup demo --no-input
npx @kaelio/ktx setup demo inspect
```
The default demo uses packaged sample data and prebuilt context. It does not
require API keys, network access, or an LLM provider.
To replay the packaged ingest run, use:
```bash
npx @kaelio/ktx setup demo --mode replay --no-input
```
To run the full agentic demo with an LLM provider, set a provider key for the
current process:
```bash
ANTHROPIC_API_KEY=$YOUR_ANTHROPIC_API_KEY \
npx @kaelio/ktx setup demo --mode full --no-input
```
Interactive full-demo setup can prompt for a provider key without writing the
key to `ktx.yaml`.
You can also install the CLI in a project or globally:
```bash
npm install @kaelio/ktx
npx ktx --help
npm install -g @kaelio/ktx
ktx --help
ktx setup
```
## Build a local project
The wizard walks through six steps: configuring your LLM provider, setting up
embeddings, connecting your database, adding context sources (dbt, LookML,
Metabase, Looker, Notion), building context, and installing agent integration.
Create a project from a local workspace:
If it exits before completion, rerun `ktx setup` to resume where you left off.
Check your project status:
```bash
npm install @kaelio/ktx
PROJECT_DIR="$(mktemp -d)/ktx-demo"
npx ktx init "$PROJECT_DIR" --name ktx-demo
ktx status
```
Create a SQLite warehouse:
```
KTX project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Primary sources configured: yes (postgres-warehouse)
Context sources configured: yes (dbt-main)
KTX context built: yes
Agent integration ready: yes (claude-code:project)
```
## What's in a project
```
my-project/
├── ktx.yaml # Project configuration
├── semantic-layer/
│ └── warehouse/
│ ├── orders.yaml # Semantic source definitions
│ ├── customers.yaml
│ └── order_items.yaml
├── knowledge/
│ ├── global/
│ │ ├── revenue.md # Business definitions and rules
│ │ └── segment-classification.md
│ └── user/
│ └── local/
├── raw-sources/
│ └── warehouse/
│ └── live-database/ # Scan artifacts and reports
└── .ktx/
└── db.sqlite # Local state (git-ignored)
```
Semantic sources and knowledge pages are committed to git. The `.ktx/` directory
holds ephemeral state and is git-ignored — delete it and KTX rebuilds on the
next run.
## Serve agents
KTX integrates with coding agents through CLI skills, an MCP server, or both.
The setup wizard configures this automatically — here's what each mode looks
like.
**CLI skills** — the agent calls `ktx` commands directly through a skill file
installed in your agent's config (e.g., `.claude/skills/ktx/SKILL.md`):
```bash
python - "$PROJECT_DIR/demo.db" <<'PY'
import sqlite3
import sys
conn = sqlite3.connect(sys.argv[1])
conn.executescript("""
DROP TABLE IF EXISTS accounts;
CREATE TABLE accounts (
account_id INTEGER PRIMARY KEY,
account_name TEXT NOT NULL,
segment TEXT NOT NULL,
region TEXT NOT NULL
);
INSERT INTO accounts VALUES
(1, 'Acme Analytics', 'Mid-Market', 'NA'),
(2, 'Beacon Bank', 'Enterprise', 'EMEA'),
(3, 'Cobalt Coffee', 'SMB', 'NA'),
(4, 'Delta Devices', 'Mid-Market', 'APAC'),
(5, 'Evergreen Energy', 'Enterprise', 'NA');
""")
conn.close()
PY
ktx sl query --measure orders.revenue --dimension orders.status --format sql
ktx wiki search "revenue definition"
ktx sl validate orders
```
Replace the generated `ktx.yaml`:
**MCP server** — the agent calls KTX tools over the Model Context Protocol:
```bash
cat > "$PROJECT_DIR/ktx.yaml" <<YAML
project: ktx-demo
connections:
warehouse:
driver: sqlite
path: $PROJECT_DIR/demo.db
readonly: true
storage:
state: sqlite
search: sqlite-fts5
git:
auto_commit: true
author: "ktx <ktx@example.com>"
memory:
auto_commit: true
YAML
```
Write and validate a semantic-layer source:
```bash
npx ktx sl write accounts --project-dir "$PROJECT_DIR" \
--connection-id warehouse --yaml 'name: accounts
table: accounts
description: CRM accounts with segmentation attributes.
grain:
- account_id
columns:
- name: account_id
type: number
- name: account_name
type: string
- name: segment
type: string
- name: region
type: string
measures:
- name: account_count
expr: count(account_id)
joins: []
'
npx ktx sl validate accounts --project-dir "$PROJECT_DIR" \
--connection-id warehouse
```
Generate SQL and execute the query:
```bash
npx ktx sl query --project-dir "$PROJECT_DIR" \
--connection-id warehouse \
--measure accounts.account_count \
--dimension accounts.segment \
--order-by accounts.account_count:desc \
--limit 5 \
--format sql
npx ktx sl query --project-dir "$PROJECT_DIR" \
--connection-id warehouse \
--measure accounts.account_count \
--dimension accounts.segment \
--order-by accounts.account_count:desc \
--limit 5 \
--execute \
--max-rows 5
```
List and test the warehouse connection:
```bash
npx ktx connection list --project-dir "$PROJECT_DIR"
npx ktx connection test warehouse --project-dir "$PROJECT_DIR"
```
The connection test prints the configured driver and discovered table count:
```text
Driver: sqlite
Tables: 1
```
### Scan the demo warehouse
Scan artifacts are written under
`raw-sources/warehouse/live-database/<syncId>/` in the project directory.
```bash
SCAN_OUTPUT="$(npx ktx scan warehouse --project-dir "$PROJECT_DIR")"
printf '%s\n' "$SCAN_OUTPUT"
SCAN_RUN_ID="$(printf '%s\n' "$SCAN_OUTPUT" | awk '/^Run: / { print $2 }')"
npx ktx scan status --project-dir "$PROJECT_DIR" "$SCAN_RUN_ID"
npx ktx scan report --project-dir "$PROJECT_DIR" "$SCAN_RUN_ID"
```
For non-SQLite drivers, prefer credential references such as `--url env:NAME`
or `--url file:PATH` over literal credential URLs.
## Managed Python runtime
KTX installs its Python runtime only when a Python-backed command needs it.
The runtime lives outside the npm cache, is versioned by the installed CLI
version, and is managed by `ktx runtime` commands.
KTX requires `uv` on `PATH` to create the managed runtime. Install `uv` with
your system package manager or the official installer before running Python-
backed KTX commands. KTX doesn't download `uv` automatically; run
`ktx runtime doctor` if runtime installation fails:
```bash
npx ktx runtime install --yes
npx ktx runtime status
npx ktx runtime doctor
npx ktx runtime start
npx ktx runtime stop
npx ktx runtime prune --dry-run
npx ktx runtime prune --yes
```
Use `runtime prune --dry-run` to preview stale runtime directories from older
CLI versions. Add `--yes` to remove those stale directories after daemon
processes are stopped.
Commands such as `npx @kaelio/ktx sl query ... --yes` can install the core
runtime lazily from the bundled wheel. Local embeddings remain lazy; prepare
them only when you select local `sentence-transformers` embeddings:
```bash
npx ktx runtime install --feature local-embeddings --yes
npx ktx runtime start --feature local-embeddings
```
## Serve MCP
Start the stdio MCP server from the project directory:
```bash
npx ktx serve --mcp stdio --project-dir "$PROJECT_DIR" \
ktx serve --mcp stdio \
--user-id local \
--semantic-compute \
--execute-queries \
--yes
```
The `--semantic-compute` flag uses the managed Python runtime when no explicit
semantic compute URL is provided. KTX starts or reuses the managed runtime as
needed.
This exposes tools for connections, knowledge search, semantic-layer sources,
validation, queries, ingestion, and replay. The `--semantic-compute` flag starts
the managed Python runtime for query planning automatically.
The MCP server exposes `connection_list`, `knowledge_search`,
`knowledge_read`, `knowledge_write`, `sl_list_sources`, `sl_read_source`,
`sl_write_source`, `sl_validate`, `sl_query`, `ingest_trigger`,
`ingest_status`, `ingest_report`, and `ingest_replay`.
Supported agents: Claude Code, Codex, Cursor, OpenCode, and any agent that
reads `.agents/` skills or MCP configuration.
## Workspace packages
- `packages/context`: core TypeScript context library.
- `packages/cli`: CLI wrapper over the context package.
- `packages/llm`: LLM and embedding provider helpers.
- `packages/connector-bigquery`: BigQuery scan connector.
- `packages/connector-clickhouse`: ClickHouse scan connector.
- `packages/connector-mysql`: MySQL scan connector.
- `packages/connector-postgres`: Postgres scan connector.
- `packages/connector-snowflake`: Snowflake scan connector.
- `packages/connector-sqlite`: SQLite scan connector.
- `packages/connector-sqlserver`: SQL Server scan connector.
- `python/ktx-sl`: semantic-layer engine.
- `python/ktx-daemon`: portable compute service for semantic-layer operations.
| Package | Purpose |
|---------|---------|
| `packages/cli` | CLI entry point |
| `packages/context` | Core context engine |
| `packages/llm` | LLM and embedding providers |
| `packages/connector-*` | Database connectors (Postgres, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, SQLite) |
| `python/ktx-sl` | Semantic-layer query planning |
| `python/ktx-daemon` | Portable compute service |
## Development
Install dependencies and run checks:
```bash
git clone https://github.com/kaelio/ktx.git
cd ktx
pnpm install
uv sync --all-groups
pnpm run build
pnpm run check
uv sync --all-packages
source .venv/bin/activate
uv run pytest
```
Use the optional development binary when you want a local `ktx-dev` command:
Use the development CLI for local testing:
```bash
pnpm run setup:dev
pnpm run link:dev
ktx-dev --help
```
The repository uses `pnpm` for TypeScript packages and `uv` for Python
packages.
## Release status
This repository builds one public npm artifact named `@kaelio/ktx`. The release
artifact manifest contains the public npm tarball and the bundled `kaelio-ktx`
runtime wheel. The first public npm handoff is policy-gated through
`release-policy.json`, which keeps Python package publishing disabled because
KTX-owned Python code ships inside the npm package as a bundled wheel. The
`python/ktx-sl` and `python/ktx-daemon` directories remain source packages for
development, not public release artifacts.
Build local package artifacts and verify the guarded dry-run publish path with:
```bash
source .venv/bin/activate
pnpm run artifacts:check
pnpm run release:readiness
pnpm run release:npm-publish
```
Run the live npm publish only from the manual `KTX Release` workflow with the
`publish_live` input enabled after the `NPM_TOKEN` secret is configured.
packages. See [Contributing](docs-site/content/docs/community/contributing.mdx)
for full development setup, testing, and PR guidelines.
## License

View file

@ -68,7 +68,7 @@ The MCP server is typically configured through `ktx setup --agents` rather than
| Error | Cause | Recovery |
|-------|-------|----------|
| Agent cannot start server | The agent config cannot find the `ktx` binary | Run `pnpm run link:dev` or use an absolute command path in the agent config |
| Agent cannot start server | The agent config cannot find the `ktx` binary | Install `@kaelio/ktx` globally with `npm install -g @kaelio/ktx` or use an absolute command path in the agent config |
| Semantic tools are unavailable | Server was started without `--semantic-compute` | Add `--semantic-compute` or `--semantic-compute-url` to the server args |
| Query execution is denied | Server was started without `--execute-queries` | Add `--execute-queries` only for trusted projects where read-only execution is intended |
| Context resolves to wrong project | `KTX_PROJECT_DIR` is missing or points elsewhere | Set `KTX_PROJECT_DIR` to the project containing the intended `ktx.yaml` |

View file

@ -7,6 +7,11 @@ KTX is an open-source project and welcomes contributions — bug fixes, new conn
## Development setup
This page is for contributors working on the KTX repository. To install KTX for
an analytics project, use the published
[`@kaelio/ktx`](https://www.npmjs.com/package/@kaelio/ktx) package in the
[Quickstart](/docs/getting-started/quickstart).
### Prerequisites
- **Node.js 22+** and **pnpm** — for the TypeScript workspace
@ -44,7 +49,9 @@ pnpm run setup:dev
pnpm run link:dev
```
This makes the `ktx` command available globally, pointing at your local build.
This makes the `ktx-dev` command available globally, pointing at your local
build. Use this development binary when you need to test unpublished repository
changes.
## Repository structure

View file

@ -9,44 +9,30 @@ If you are a coding assistant trying to decide which KTX docs page to read, star
## Workflow summary
Use this sequence when an agent needs to set up KTX from a fresh checkout:
Use this sequence when you are setting up KTX in an analytics project:
1. `pnpm install` — install workspace dependencies.
2. `pnpm run setup:dev` — build local packages and prepare the development CLI.
3. `pnpm run link:dev` — link the `ktx` command for local use.
4. `ktx setup` — create or resume a KTX project.
5. `ktx status` — verify project readiness.
6. `ktx sl list` — confirm semantic-layer sources are available.
7. `ktx sl query ... --format sql` — compile a semantic query without executing it.
1. `npm install -g @kaelio/ktx` — install the published KTX CLI from npm.
2. `ktx setup` — create or resume a KTX project.
The setup wizard is stateful. If it exits before completion, rerun `ktx setup` in the same project directory to resume from the first incomplete step.
## Prerequisites
- **Node.js 22+** and **pnpm**
- An **Anthropic API key** for LLM-powered enrichment and ingestion
- A **database connection** — PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, or SQLite
- Optionally, a **dbt project**, **LookML repo**, **Metabase instance**, or other context source
## Install and run setup
KTX is currently used from a local checkout or linked workspace CLI. Build and link the CLI first:
Install the published [`@kaelio/ktx`](https://www.npmjs.com/package/@kaelio/ktx) CLI:
```bash
git clone https://github.com/kaelio/ktx.git
cd ktx
pnpm install
pnpm run setup:dev
pnpm run link:dev
npm install -g @kaelio/ktx
```
Then run the setup wizard in the directory where you want your KTX project:
Then run the setup wizard:
```bash
ktx setup
```
The wizard walks through six steps. You can go back at any point, and if you exit early, running `ktx setup` again resumes where you left off.
The local checkout flow is only for contributors working on KTX itself. See [Contributing](/docs/community/contributing) for that setup.
The wizard walks through six steps. You can go back at any point, and if you exit early, rerunning `ktx setup` resumes where you left off.
## Step 1: Configure LLM
@ -86,10 +72,11 @@ KTX uses embeddings for semantic search over sources, wiki content, schema metad
**OpenAI embeddings** use `text-embedding-3-small` (1536 dimensions) and require an `OPENAI_API_KEY`.
**Local embeddings** use `all-MiniLM-L6-v2` (384 dimensions) via the KTX Python daemon. No API key is needed. If you run the daemon as a long-lived HTTP service, start it with:
**Local embeddings** use `all-MiniLM-L6-v2` (384 dimensions) via the KTX managed Python runtime. No API key is needed. KTX can install and start the runtime during setup; to prepare it ahead of time, run:
```bash
ktx-daemon serve-http --host 127.0.0.1 --port 8765
ktx runtime install --feature local-embeddings --yes
ktx runtime start --feature local-embeddings
```
## Step 3: Connect a database
@ -208,12 +195,15 @@ Then select which agents to install for:
│ ◻ Codex
│ ◻ Cursor
│ ◻ OpenCode
│ ◻ Custom agent (.agents)
```
**CLI mode** writes a skill file (e.g., `.claude/skills/ktx/SKILL.md`) that teaches the agent to call KTX commands directly.
**MCP mode** writes an MCP server configuration (e.g., `.mcp.json`) that lets the agent call KTX tools like `sl_query`, `knowledge_search`, and `sl_write_source` over the Model Context Protocol.
**Custom agent** uses the universal `.agents` target for agents that can read project-local skills or MCP configuration.
## Generated files
KTX writes project state as plain files so agents can inspect and edit changes in git.
@ -247,44 +237,14 @@ KTX context built: yes
Agent integration ready: yes (claude-code:project)
```
List your semantic sources:
```bash
ktx sl list
```
Query through the semantic layer:
```bash
ktx sl query \
--connection-id postgres-warehouse \
--measure orders.total_revenue \
--dimension orders.status \
--order-by orders.total_revenue:desc \
--limit 5 \
--format sql
```
This outputs the generated SQL. Add `--execute` to run it against your warehouse:
```bash
ktx sl query \
--connection-id postgres-warehouse \
--measure orders.total_revenue \
--dimension orders.status \
--order-by orders.total_revenue:desc \
--limit 5 \
--execute --max-rows 10
```
## Common errors
| Error or symptom | Likely cause | Recovery |
|------------------|--------------|----------|
| `ktx: command not found` | The local CLI has not been linked | Run `pnpm run setup:dev` and `pnpm run link:dev` from the KTX checkout, then open a new shell |
| `ktx: command not found` | The KTX package is not installed globally, or the shell cannot find the global binary | Run `npm install -g @kaelio/ktx` and open a new shell |
| LLM health check fails | Missing, invalid, or unauthorized Anthropic API key | Export `ANTHROPIC_API_KEY` or rerun `ktx setup` and choose the file-backed secret option |
| OpenAI embedding check fails | `OPENAI_API_KEY` is missing when OpenAI embeddings are selected | Export `OPENAI_API_KEY`, or rerun setup and choose local sentence-transformers embeddings |
| Local embeddings hang or fail | The Python daemon cannot start or the local model runtime is unavailable | Run `uv sync --all-groups`, then start `ktx-daemon serve-http --host 127.0.0.1 --port 8765` and rerun setup |
| Local embeddings hang or fail | The managed Python runtime cannot start or the local model runtime is unavailable | Install `uv`, run `ktx runtime doctor`, then run `ktx runtime install --feature local-embeddings --yes` and rerun setup |
| Database connection test fails | Credentials, network access, warehouse, database, or schema value is wrong | Test the same URL with the database's native client, then rerun `ktx connection add ... --force` or rerun setup |
| `KTX context built: no` in `ktx status` | Setup saved configuration but did not build context | Run `ktx setup context build` or rerun `ktx setup` and choose to build context now |
| Agent integration is incomplete | Setup skipped the agents step or the target was not installed | Run `ktx setup --agents --target codex --agent-install-mode both --project` using the target you need |