The context layer for analytics agents
KTX turns warehouse metadata, semantic definitions, and business knowledge into reviewable project files that agents can use while planning, querying, and updating analytics work.
A KTX project is a directory of plain files: YAML semantic sources and Markdown wiki pages that you commit to git and review in PRs, just like dbt models, plus local SQLite state that stays git-ignored.
Who KTX is for
KTX is built for analytics engineers and data teams who want data agents to work on real analytics systems - not just generate one-off SQL.
Use KTX when you want agents to:
- Generate SQL from approved measures and joins
- Repair semantic definitions through reviewable diffs
- Explain metric provenance with warehouse evidence
- Work alongside dbt, LookML, MetricFlow, Looker, Metabase, and modern BI platforms
Works with PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, and SQLite.
Quick start
Install the CLI and run the setup wizard:
npm install @kaelio/ktx
npm install -g @kaelio/ktx
ktx setup
The wizard walks through six steps: configuring your LLM provider, setting up embeddings, connecting your database, adding context sources (dbt, LookML, Metabase, Looker, Notion), building context, and installing agent integration.
If it exits before completion, rerun ktx setup to resume where you left off.
Check your project status:
ktx status
KTX project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (postgres-warehouse)
Context sources configured: yes (dbt-main)
KTX context built: yes
Agent integration ready: yes (claude-code:project)
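For scripting, the readiness lines above can be checked mechanically. The following is a minimal sketch, not part of KTX: it assumes every readiness line keeps the `label: yes (detail)` shape shown in the sample output and that a failing check prints `no` in the same position. The helper names `parse_ktx_status` and `not_ready` are hypothetical.

```python
import re

# Matches lines such as "LLM ready: yes (claude-sonnet-4-6)".
# Assumption: a failing check prints "no" where the sample shows "yes".
STATUS_LINE = re.compile(
    r"^(?P<label>[^:]+):\s+(?P<state>yes|no)\b(?:\s+\((?P<detail>.*)\))?\s*$"
)


def parse_ktx_status(text: str) -> dict:
    """Map each readiness label to True/False; other lines are ignored."""
    checks = {}
    for line in text.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            checks[match.group("label").strip()] = match.group("state") == "yes"
    return checks


def not_ready(text: str) -> list:
    """Return the labels of all checks that did not report "yes"."""
    return [label for label, ok in parse_ktx_status(text).items() if not ok]
```

A CI step could capture the output of ktx status, pass it to `not_ready`, and fail the job when the returned list is non-empty.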
Generate SQL from a semantic-layer source:
npx @kaelio/ktx sl query --project-dir "$PROJECT_DIR" \
  --connection-id warehouse \
  --measure accounts.account_count \
  --dimension accounts.segment \
  --format sql
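To call the same command from a script, the sketch below builds the argument list from the flags shown above and nothing else. It is an illustration under assumptions: the helper names are hypothetical, and it assumes the generated SQL is printed to standard output.

```python
import subprocess


def sl_query_argv(project_dir: str, connection_id: str,
                  measure: str, dimension: str) -> list:
    """Build the argv for the sl query example, using only the flags shown above."""
    return [
        "npx", "@kaelio/ktx", "sl", "query",
        "--project-dir", project_dir,
        "--connection-id", connection_id,
        "--measure", measure,
        "--dimension", dimension,
        "--format", "sql",
    ]


def run_sl_query(project_dir: str, connection_id: str,
                 measure: str, dimension: str) -> str:
    """Run the command and return stdout (assumed to carry the generated SQL)."""
    argv = sl_query_argv(project_dir, connection_id, measure, dimension)
    # check=True raises CalledProcessError on a non-zero exit code.
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting issues when the project directory contains spaces.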
List and test a configured warehouse connection:
ktx connection list --project-dir "$PROJECT_DIR"
ktx connection test warehouse --project-dir "$PROJECT_DIR"
The connection test prints the configured driver and connector-specific status:
Connection test passed: warehouse
Driver: sqlite
Status: ok
What's in a project
my-project/
├── ktx.yaml                          # Project configuration
├── semantic-layer/
│   └── warehouse/
│       ├── orders.yaml               # Semantic source definitions
│       ├── customers.yaml
│       └── order_items.yaml
├── wiki/
│   ├── global/
│   │   ├── revenue.md                # Business definitions and rules
│   │   └── segment-classification.md
│   └── user/
│       └── local/
├── raw-sources/
│   └── warehouse/
│       └── <syncId>/                 # Database ingest artifacts and reports
└── .ktx/
    └── db.sqlite                     # Local state (git-ignored)
Semantic sources and wiki pages are committed to git. The .ktx/ directory
holds ephemeral state and is git-ignored - delete it and KTX rebuilds on the
next run.
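As an illustration of the layout above, the sketch below lists the files a reviewer would expect to find in git: semantic sources and wiki pages. It is not a KTX command. The helper name `reviewable_files` is hypothetical, and the glob patterns are taken from the directory names in the tree.

```python
from pathlib import Path


def reviewable_files(project_dir: str) -> list:
    """List semantic sources and wiki pages, relative to the project root."""
    root = Path(project_dir)
    found = []
    # Semantic sources live at semantic-layer/<connection>/<source>.yaml.
    found += sorted(root.glob("semantic-layer/*/*.yaml"))
    # Wiki pages are Markdown files anywhere under wiki/.
    found += sorted(root.glob("wiki/**/*.md"))
    # .ktx/ is deliberately not scanned: it holds ephemeral local state.
    return [path.relative_to(root).as_posix() for path in found]
```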
Build demo warehouse context
Database ingest artifacts are written under raw-sources/warehouse/<syncId>/
in the project directory.
ktx ingest warehouse --project-dir "$PROJECT_DIR" --fast
ktx status --project-dir "$PROJECT_DIR"
For non-SQLite drivers, prefer credential references such as --url env:NAME
or --url file:PATH over literal credential URLs.
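The idea behind credential references is that the secret never appears in the command line or in project files. How KTX resolves them internally is not shown here; the following is a minimal sketch of the concept, with a hypothetical helper name and the assumption that a file reference holds the URL as its only content.

```python
import os
from pathlib import Path


def resolve_credential_ref(value: str) -> str:
    """Resolve env:NAME or file:PATH; return any other value unchanged."""
    if value.startswith("env:"):
        name = value[len("env:"):]
        if name not in os.environ:
            raise KeyError(f"environment variable {name!r} is not set")
        return os.environ[name]
    if value.startswith("file:"):
        # Secret files often end with a newline, so strip whitespace.
        return Path(value[len("file:"):]).read_text(encoding="utf-8").strip()
    # Anything else is treated as a literal URL.
    return value
```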
Managed Python runtime
KTX installs its Python runtime only when a Python-backed command needs it.
The runtime lives outside the npm cache, is versioned by the installed CLI
version, and is managed by ktx dev runtime commands.
KTX requires uv on PATH to create the managed runtime. Install uv with
your system package manager or the official installer before running
Python-backed KTX commands. KTX doesn't download uv automatically; run
ktx dev runtime status if runtime installation fails:
ktx dev runtime install --yes
ktx dev runtime status
ktx dev runtime start
ktx dev runtime stop
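Because the runtime depends on uv being on PATH, a setup script can check for it before invoking any Python-backed command. This is a minimal sketch with a hypothetical helper name; the `which` parameter exists only so the check can be exercised without touching the real PATH.

```python
import shutil
from typing import Callable, Optional


def uv_preflight(which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return None when uv is on PATH, otherwise a hint for the user."""
    if which("uv"):
        return None
    return (
        "uv not found on PATH; install it, "
        "then run: ktx dev runtime install --yes"
    )
```

Calling `uv_preflight()` with no arguments uses `shutil.which`, which searches the directories in the current PATH.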
The release artifact manifest contains the public npm tarball and the bundled kaelio-ktx
runtime wheel. The python/ktx-sl and python/ktx-daemon directories remain
source packages for development, not public release artifacts.
Use KTX with agents
KTX integrates with coding agents through CLI skills. The setup wizard configures this automatically.
CLI skills - the agent calls ktx commands directly through a skill file
installed in your agent's config (e.g., .claude/skills/ktx/SKILL.md):
ktx sl query --measure orders.revenue --dimension orders.status --format sql
ktx wiki search "revenue definition"
ktx sl validate orders
Supported agents: Claude Code, Codex, Cursor, OpenCode, and any agent that
reads .agents/ skills.
Workspace packages
| Package | Purpose |
|---|---|
| packages/cli | CLI entry point |
| packages/context | Core context engine |
| packages/llm | LLM and embedding providers |
| packages/connector-bigquery | BigQuery scan connector |
| packages/connector-clickhouse | ClickHouse scan connector |
| packages/connector-mysql | MySQL scan connector |
| packages/connector-postgres | Postgres scan connector |
| packages/connector-snowflake | Snowflake scan connector |
| packages/connector-sqlite | SQLite scan connector |
| packages/connector-sqlserver | SQL Server scan connector |
| python/ktx-sl | Semantic-layer query planning |
| python/ktx-daemon | Portable compute service |
Development
git clone https://github.com/kaelio/ktx.git
cd ktx
pnpm install
uv sync --all-groups
pnpm run build
pnpm run check
Use the development CLI for local testing:
pnpm run setup:dev
pnpm run link:dev
ktx-dev --help
Debug LLM traces
KTX can capture local AI SDK DevTools traces for LLM calls that run through the KTX provider. Enable it with an environment flag when running an LLM-backed command:
KTX_AI_DEVTOOLS_ENABLED=true ktx ingest warehouse --project-dir "$PROJECT_DIR" --deep
Traces are written to .devtools/generations.json under the current working
directory. To inspect them, run:
pnpm dlx @ai-sdk/devtools
Then open http://localhost:4983. These traces are local-development-only and
store prompts, model outputs, tool arguments/results, and raw provider payloads
in plain text. Do not enable this in production or for sensitive runs.
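For a quick look at a trace file without starting the viewer, the sketch below reports only the top-level shape of the JSON. The schema of generations.json is not described here, so the code makes no assumption about its fields; the helper name is hypothetical.

```python
import json
from pathlib import Path


def summarize_trace_file(path: str = ".devtools/generations.json") -> str:
    """Describe the top-level JSON value without assuming any schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, list):
        return f"list with {len(data)} entries"
    if isinstance(data, dict):
        return "object with keys: " + ", ".join(sorted(data))
    return f"scalar of type {type(data).__name__}"
```

The summary deliberately omits values, since the file can contain prompts and provider payloads in plain text.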
The repository uses pnpm for TypeScript packages and uv for Python
packages. See Contributing
for full development setup, testing, and PR guidelines.
License
KTX is licensed under the Apache License, Version 2.0. See LICENSE.