ktx is the context layer for analytics agents https://docs.kaelio.com/ktx
Kevin Messiaen 3c4fcc27c7
feat: Add duckdb connector (#308)
* refactor(duckdb): extract shared json-safe bigint helper

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(duckdb): add and register the duckdb primary connector

Add KtxDuckDbDialect, KtxDuckDbScanConnector (local file-backed, read-only,
never-create, main-schema introspection via information_schema and
duckdb_constraints() for foreign keys), and register the duckdb driver across
the dialect factory, driver registry, connection-type enum, warehouse descriptor,
config schema, scan normalization, connection test drivers, and status display.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(duckdb): route live-database ingest through the DuckDB connector

Add the DuckDB live-database introspection bridge and dispatch duckdb
connections to it in local-adapters, matching the SQLite path. Repoint the
config-rejection test off duckdb (now a valid driver) onto the no-driver case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(duckdb): add duckdb to the setup database flow

Offer DuckDB in the interactive checklist and via ktx setup --database duckdb,
with a file-path prompt and duckdb-local default connection id, parallel to SQLite.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(duckdb): attach native duckdb files in federation

Native .duckdb members ATTACH with (READ_ONLY) and no TYPE/INSTALL/LOAD, since
the duckdb format needs no extension. attachTypeForDriver returns null for the
native case; buildAttachStatements builds load statements from non-null types
only and emits a conditional ATTACH clause.
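
The ATTACH behavior described in this commit can be sketched roughly as follows. This is a Python illustration, not the TypeScript implementation from the PR: the function names mirror attachTypeForDriver and buildAttachStatements, but the signatures, the driver-to-type mapping, and the member tuple shape are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical mapping from driver name to DuckDB extension/attach type.
_ATTACH_TYPES = {"sqlite": "sqlite", "postgres": "postgres", "mysql": "mysql"}


def attach_type_for_driver(driver: str) -> Optional[str]:
    """Return the attach type, or None for native .duckdb files."""
    if driver == "duckdb":
        return None  # native format: no TYPE, INSTALL, or LOAD needed
    return _ATTACH_TYPES[driver]


def build_attach_statements(members: List[Tuple[str, str, str]]) -> List[str]:
    """Build SQL for members given as (alias, driver, target) tuples."""
    # Collect each non-null attach type once, preserving order.
    types: List[str] = []
    for _, driver, _ in members:
        attach_type = attach_type_for_driver(driver)
        if attach_type is not None and attach_type not in types:
            types.append(attach_type)

    statements: List[str] = []
    for attach_type in types:
        statements.append(f"INSTALL {attach_type};")
        statements.append(f"LOAD {attach_type};")

    for alias, driver, target in members:
        attach_type = attach_type_for_driver(driver)
        # Conditional clause: TYPE is emitted only for non-native members.
        options = "READ_ONLY" if attach_type is None else f"TYPE {attach_type}, READ_ONLY"
        escaped = target.replace("'", "''")
        statements.append(f"ATTACH '{escaped}' AS {alias} ({options});")
    return statements
```

With one SQLite member and one native DuckDB member, only the SQLite extension is installed and loaded, and the native member attaches with (READ_ONLY) alone.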

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(duckdb): document the duckdb primary-source connector

Add a DuckDB section to the primary-sources integration page (config, read-only
never-create behavior, main-schema scope, federation) and update the
supported-driver assertion in dialects.test.ts to include duckdb.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(duckdb): use single-namespace display shape for main-only refs

DuckDB v1 introspects the main schema and sets db=null on every table, so its
display refs are single-namespace like SQLite. The ansi shape emitted a 1-part
table display it then refused to parse, breaking column-level display resolution.
Switch the dialect to the sqlite display shape and add a round-trip test plus a
composite-foreign-key test that were missing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(duckdb): resolve connector dialect via getDialectForDriver

Route the connector's dialect through the shared factory like every other
connector, now that duckdb is registered. Single construction path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(duckdb): skip schema picker for single-file duckdb setup

DuckDB is a single-file, single-namespace ('main') database like SQLite,
but the setup scope step only skipped the schema picker for sqlite. DuckDB
fell into the multi-schema path with an empty schema list, rendering a
broken picker ("No matches found" for main). Extend the file-based-driver
early-return to cover duckdb so it ingests every table directly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(duckdb): reuse shared config helper and derive scope skip

Route duckdb path resolution through the shared resolveStringReference
helper instead of a local third copy of env:/file: handling. Derive the
setup scope-picker skip from SCOPE_DISCOVERY_SPECS membership rather than
a hardcoded sqlite/duckdb driver list.
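
A minimal sketch of deriving the skip from registry membership, written in Python for illustration. The contents of SCOPE_DISCOVERY_SPECS and the helper name below are assumptions; only the idea (drivers absent from the registry skip the picker) comes from the commit message.

```python
# Hypothetical registry of drivers that support scope (schema/dataset) discovery.
SCOPE_DISCOVERY_SPECS = {
    "postgres": {"scope_label": "schema"},
    "snowflake": {"scope_label": "schema"},
    "bigquery": {"scope_label": "dataset"},
}


def should_skip_scope_picker(driver: str) -> bool:
    """Skip the picker for any driver without a scope-discovery spec."""
    return driver not in SCOPE_DISCOVERY_SPECS
```

Under this shape, a new single-file driver skips the picker without anyone editing a driver list.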

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(duckdb): use a genuinely unknown driver in the rejection test

The merged "rejects unknown drivers" test used `driver: duckdb` as its
unknown-driver stand-in, which stopped being unknown once this branch
added the duckdb connector. Switch to `nonsense` so it again exercises
the unsupported-driver config error.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(duckdb): cover dialect, connector, and live-introspection branches

Codecov flagged uncovered branches as dead code; all are real connector,
dialect, and live-ingest behavior. Add unit tests instead of removing them.

- dialect: precedence ladder, sample/clause builders, profiling expressions
- connector: url/env config forms, error throws, never-create guard,
  cardinality cap branches, table-scope empty/non-empty paths
- live-introspection: full-schema and table-scope extraction

Functions 100%, lines ~99% across the duckdb connector dir.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs: add DuckDB to supported-driver references

The DuckDB connector PR documented the connector itself but left the
scattered supported-driver enumerations stale. Add duckdb to the
federation concept page (participation table, activation, table naming,
limitations), the ktx setup CLI reference, the ktx.yaml warehouse-driver
table, the primary-sources field reference, and the quickstart driver
list (which also restores the missing ClickHouse entry).

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Andrey Avtomonov <andreybavt@gmail.com>
2026-07-01 12:06:02 +00:00
.github chore: remove star history refresh workflow 2026-06-30 15:17:05 +02:00
assets chore: refresh star history chart [skip ci] 2026-06-29 18:46:41 +00:00
docs refactor: enforce ktx naming and AGENTS.md compliance sweep (#289) 2026-06-11 13:49:45 +02:00
docs-site feat: Add duckdb connector (#308) 2026-07-01 12:06:02 +00:00
examples feat(connectors): add MongoDB connector (#305) (#310) 2026-06-29 15:17:56 +02:00
packages/cli feat: Add duckdb connector (#308) 2026-07-01 12:06:02 +00:00
python chore(release): 0.15.0 [skip ci] 2026-06-30 23:16:54 +00:00
scripts feat(cli): self-provision pinned uv and defer MCP Python runtime install (#297) 2026-06-12 16:31:06 +00:00
skills/ktx refactor: enforce ktx naming and AGENTS.md compliance sweep (#289) 2026-06-11 13:49:45 +02:00
.gitignore chore: remove private planning docs (#140) 2026-05-19 14:58:55 +02:00
.pre-commit-config.yaml ci: stop tombi reformatting uv.lock and sync lock to 0.7.0 (#235) 2026-05-29 15:04:48 +02:00
.releaserc.cjs feat: add claude-code llm backend with runtime port (#115) 2026-05-16 12:06:34 +02:00
AGENTS.md fix(sl): parse user filter expressions as predicates, not projections (#307) 2026-06-19 08:47:44 +00:00
biome.json feat: merge ingest and scan 2026-05-14 01:43:06 +02:00
CLAUDE.md Initial open-source release 2026-05-10 23:12:26 +02:00
codecov.yml refactor(release): drop release-policy.json runtime dep and next branch (#180) 2026-05-20 13:53:14 +02:00
conductor.json [codex] Add Conductor workspace scripts (#2) 2026-05-11 09:55:42 +02:00
CONTRIBUTING.md refactor: enforce ktx naming and AGENTS.md compliance sweep (#289) 2026-06-11 13:49:45 +02:00
GEMINI.md Initial open-source release 2026-05-10 23:12:26 +02:00
knip.json feat: ktx batch — scan resilience, analytics SQL craft, connector hardening (#312) 2026-06-29 16:35:57 +00:00
LICENSE ci: run pre-commit checks in CI (#74) 2026-05-13 19:49:25 +02:00
package.json chore(release): 0.15.0 [skip ci] 2026-06-30 23:16:54 +00:00
pnpm-lock.yaml feat: ktx batch — scan resilience, analytics SQL craft, connector hardening (#312) 2026-06-29 16:35:57 +00:00
pnpm-workspace.yaml fix(deps): bump hono override to 4.12.21 to resolve dependabot alerts (#288) 2026-06-10 12:26:01 +00:00
pyproject.toml chore: upgrade dependencies and tooling (#232) 2026-05-29 11:56:55 +02:00
README.md docs: consolidate AI Resources into a single page (#274) 2026-06-09 00:28:56 -04:00
release-policy.json chore(release): 0.15.0 [skip ci] 2026-06-30 23:16:54 +00:00
SECURITY.md refactor: enforce ktx naming and AGENTS.md compliance sweep (#289) 2026-06-11 13:49:45 +02:00
skills.sh.json docs: add ktx skills.sh setup skill (#227) 2026-05-28 12:28:10 +02:00
tombi.toml chore: upgrade dependencies and tooling (#232) 2026-05-29 11:56:55 +02:00
tsconfig.base.json perf(setup): speed up conductor setup and make it rerun-safe (#107) 2026-05-15 12:06:37 +02:00
uv.lock fix(gdrive): validate folder access, run config test, harden Drive API (#321) 2026-06-28 01:02:37 +02:00

ktx

The context layer for data agents


Quickstart · CLI Reference · Agent Setup · Slack

Built and maintained by Kaelio


ktx is a self-improving context layer that teaches agents how to query your warehouse accurately - from approved metric definitions, joinable columns, and business knowledge it builds and maintains for you.

Note

Run ktx with your own LLM API keys or a local agent sign-in — a Claude Pro/Max subscription through Claude Code, or your local Codex authentication. No extra usage billing from ktx.

Watch the ktx launch video (1:56)

Ingestion: ktx ingests databases, BI tools, modeling code, and docs through its context engine (source connectors, context builder, reconciliation, validation) into wiki Markdown and semantic-layer YAML

Serving: an agent queries ktx through MCP, which searches the wiki and semantic layer, returns approved metrics, and compiles them into read-only SQL run against the warehouse

Why ktx

General-purpose agents struggle on data tasks. They re-explore your warehouse on every question, invent their own metric logic, and return numbers that don't match approved definitions.

Traditional semantic layers don't fix this. They demand constant manual upkeep and don't absorb the rest of your company's knowledge.

ktx does both, automatically:

  • Learns from company knowledge. Ingests wiki content, organizes it, removes duplicates, and flags contradictions for human review.
  • Maps the data stack. Samples tables, captures metadata and usage patterns, detects joinable columns, and annotates sources so agents write better queries.
  • Builds a semantic layer. Combines raw tables and high-level metrics through a join graph that automatically resolves chasm and fan traps, so agents fetch metrics declaratively instead of rewriting canonical SQL each time.
  • Serves agents at execution. Exposes CLI and MCP tools with combined full-text and semantic search across wiki and semantic-layer entities.
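
To make the fan and chasm trap problem concrete, here is a small self-contained sketch using Python's built-in sqlite3 module. The tables and values are invented for illustration, and the pre-aggregation rewrite shown is one common remedy; it is not a description of how ktx's join graph is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, shipping_fee REAL);
    CREATE TABLE items (order_id INTEGER, amount REAL);
    CREATE TABLE payments (order_id INTEGER, paid REAL);
    INSERT INTO orders VALUES (1, 5.0);
    INSERT INTO items VALUES (1, 10.0), (1, 20.0);
    INSERT INTO payments VALUES (1, 15.0), (1, 15.0);
    """
)

# Naive query: joining two one-to-many children of the same parent yields
# 2 x 2 = 4 rows for order 1, so every SUM is inflated.
naive = conn.execute(
    """
    SELECT SUM(o.shipping_fee), SUM(i.amount), SUM(p.paid)
    FROM orders o
    JOIN items i ON i.order_id = o.id
    JOIN payments p ON p.order_id = o.id
    """
).fetchone()

# Safer query: aggregate each child to the order grain before joining,
# so each order contributes exactly one row.
safe = conn.execute(
    """
    SELECT SUM(o.shipping_fee), SUM(i.amount), SUM(p.paid)
    FROM orders o
    LEFT JOIN (SELECT order_id, SUM(amount) AS amount
               FROM items GROUP BY order_id) i ON i.order_id = o.id
    LEFT JOIN (SELECT order_id, SUM(paid) AS paid
               FROM payments GROUP BY order_id) p ON p.order_id = o.id
    """
).fetchone()

print(naive)  # (20.0, 60.0, 60.0): inflated by row duplication
print(safe)   # (5.0, 30.0, 30.0): matches the underlying data
```

An agent writing the naive query would report revenue at twice its real value, which is the class of error a declarative metric layer is meant to prevent.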

How ktx compares

General-purpose agent Traditional semantic layer ktx
Builds warehouse context automatically
Detects joinable columns + resolves fan/chasm traps Manual
Approved, reusable metric definitions
Absorbs wiki / Notion / team knowledge
Flags contradictions across sources
Ships CLI + MCP for agent execution Partial
Read-only by design n/a n/a

Who is ktx for

Use ktx if you:

  • Want agents like Claude Code, Codex, Cursor, or OpenCode to query your warehouse with approved metric definitions
  • Have business knowledge scattered across dbt, Looker, Metabase, Notion, and team wikis
  • Need agents to reuse canonical SQL instead of inventing it on every prompt

Skip ktx if you:

  • Don't have a SQL warehouse - ktx sits on top of one
  • Only need one ad-hoc query - psql or a notebook will do

Works with PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, SQLite, and DuckDB. Integrates with dbt, MetricFlow, LookML, Looker, Metabase, and Notion.

Quick Start

npm install -g @kaelio/ktx
ktx setup
ktx status

ktx setup creates or resumes a local ktx project, configures providers and connections, builds context, and installs agent integration.

Example ktx status after setup:

ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main)
ktx context built: yes
Agent integration ready: yes (codex:project)
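
If you want to gate a script on readiness, the sample output above could be checked with a sketch like the following. It assumes the "label: yes|no (detail)" line shape shown in the example; that shape is inferred from this README rather than from a documented machine-readable format, and parse_status and all_ready are hypothetical helper names.

```python
import re
from typing import Dict, Optional, Tuple

# Matches lines such as "LLM ready: yes (claude-sonnet-4-6)".
_STATUS_LINE = re.compile(
    r"^(?P<key>[^:]+):\s*(?P<flag>yes|no)(?:\s*\((?P<detail>.*)\))?\s*$"
)


def parse_status(text: str) -> Dict[str, Tuple[bool, Optional[str]]]:
    """Map each yes/no status label to (ready, detail)."""
    checks: Dict[str, Tuple[bool, Optional[str]]] = {}
    for line in text.splitlines():
        match = _STATUS_LINE.match(line.strip())
        if match:  # lines without a yes/no flag (e.g. the project path) are skipped
            checks[match["key"]] = (match["flag"] == "yes", match["detail"])
    return checks


def all_ready(text: str) -> bool:
    """True when at least one check was parsed and every check is 'yes'."""
    checks = parse_status(text)
    return bool(checks) and all(ok for ok, _ in checks.values())
```

Feeding it the captured stdout of ktx status (for example via subprocess.run) would give a single boolean to branch on.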

Tip

Already using an agent? Ask Claude Code, Codex, Cursor, or OpenCode from your project directory:

Run npx skills add Kaelio/ktx --skill ktx and use the ktx skill to install
and configure ktx in this project.

Important

If ktx status prints ktx mcp start --project-dir ..., run it before opening your agent client.

Upgrading

Re-run the global install with the @latest tag:

npm install -g @kaelio/ktx@latest

First commands

Command                    Purpose
ktx setup                  Create, resume, or update a ktx project
ktx status                 Check project readiness
ktx ingest                 Build context for every configured connection
ktx sl "revenue"           Search semantic sources
ktx wiki "refund policy"   Search local wiki pages
ktx mcp start              Start the MCP server for agent clients

See the CLI Reference for every command, flag, and option.

Project Layout

my-project/
├── ktx.yaml                         # Project configuration
├── semantic-layer/<connection-id>/  # YAML semantic sources
├── wiki/global/                     # Shared business context
├── wiki/user/<user-id>/             # User-scoped notes
├── raw-sources/<connection-id>/     # Ingest artifacts and reports
└── .ktx/                            # Local state and secrets, git-ignored

Commit ktx.yaml, semantic-layer/, and wiki/. Keep .ktx/ local.

Project resolution defaults to KTX_PROJECT_DIR, then the nearest ktx.yaml, then the current directory. Pass --project-dir <path> when scripting.
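
The resolution order can be sketched as below. This is an illustrative Python reading of the sentence above, not ktx's own code: resolve_project_dir is a hypothetical name, and treating "nearest ktx.yaml" as an upward search from the current directory through its parents is an assumption.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_project_dir(
    cwd: Path,
    env: Mapping[str, str],
    explicit: Optional[Path] = None,
) -> Path:
    """Pick a project directory using the documented precedence."""
    # 0. An explicit --project-dir value wins when provided.
    if explicit is not None:
        return explicit
    # 1. KTX_PROJECT_DIR from the environment.
    override = env.get("KTX_PROJECT_DIR")
    if override:
        return Path(override)
    # 2. Nearest ktx.yaml, searching cwd and then each parent (assumed).
    for candidate in (cwd, *cwd.parents):
        if (candidate / "ktx.yaml").is_file():
            return candidate
    # 3. Fall back to the current directory.
    return cwd
```

Because the result depends on both the environment and the working directory, passing --project-dir in scripts removes that ambiguity.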

FAQ

  • Does ktx send my schema or query results to a hosted service? No. ktx runs locally. The only data leaving your machine is what you send to the LLM provider you configured.
  • Which LLM backends are supported? Anthropic API, Google Vertex AI, AI Gateway, the local Claude Code session through the Claude Agent SDK, and your local Codex authentication through the Codex SDK. See LLM configuration.
  • How is ktx different from a dbt or MetricFlow semantic layer? ktx ingests those layers and combines them with raw-table introspection and wiki content. Agents get one searchable surface instead of three disconnected ones - and ktx flags contradictions across sources.
  • Does ktx need a running server? There is no hosted service. The local MCP daemon runs on demand via ktx mcp start when an agent client needs it.
  • Is my warehouse safe? Yes. Connections are read-only - ktx never writes to your database.

Docs

Community

  • Slack — ask questions, share what you're building, and chat with maintainers.
  • GitHub Issues — report bugs and request features.
  • Contributing — set up the repo, run tests, and open a PR.

Development

git clone https://github.com/kaelio/ktx.git
cd ktx
pnpm install
uv sync --all-groups
pnpm run build
pnpm run check

ktx is a pnpm + uv workspace:

Path                          Purpose
packages/cli                  TypeScript CLI and published npm package source
packages/cli/src/context      Core context engine
packages/cli/src/llm          LLM and embedding providers
packages/cli/src/connectors   Database scan connectors
python/ktx-sl                 Semantic-layer query planning
python/ktx-daemon             Portable compute service

Local development CLI:

pnpm run setup:dev
pnpm run link:dev
ktx-dev --help

Useful checks:

pnpm run type-check
pnpm run test
pnpm run dead-code
uv run pytest -q

Telemetry

ktx collects privacy-conscious usage telemetry to understand installs and improve setup, command reliability, and data-agent workflows. Catalog telemetry events do not record file paths, hostnames, SQL, schema names, table names, column names, error messages, raw environment values, or argv. Error reports use PostHog Error Tracking and can include stack frames and raw error messages, which may contain local file paths or the local username in those paths. ktx redacts secrets, credentials, database URLs, auth headers, argv, raw environment values, SQL text, row data, and user-typed prompt or MCP argument text from the explicit $exception payload. See Telemetry for the event catalog and opt-out options.

License

ktx is licensed under the Apache License, Version 2.0. See LICENSE.
