ktx is the context layer for analytics agents https://docs.kaelio.com/ktx
Kevin Messiaen 6c815ef529
feat(duckdb): cross-database federation via derived DuckDB connection (#295)
* feat(duckdb): add @duckdb/node-api dependency for federation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(connectors): extract resolveStringReference to shared module

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(connectors): route all identical connectors through shared resolveStringReference

Collapse the 5 remaining private copies in bigquery, clickhouse, mysql,
snowflake, and sqlserver into the shared module. Fix a latent bug in the
shared module where `~/path` was incorrectly sliced (dropping only `~`,
leaving the leading `/` and making resolve() ignore homedir). Add a
tilde-expansion test that caught the bug and now covers that branch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sl): reserve _ktx_ connection-id prefix for virtual connections

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(connections): derive virtual federated connection from compatible members

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(duckdb): federated executor builds READ_ONLY attaches and runs SQL

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(duckdb): close federated DuckDB instance and escape quotes in attach url

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sl): union member source directories for _ktx_federated

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(query): route _ktx_federated through DuckDB executor

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(sl): use duckdb dialect for federated query compilation

Bypass assertSafeConnectionId for _ktx_federated in resolveLocalConnectionId
and loadComputableSources, and resolve the compute dialect to 'duckdb' when
connectionId is FEDERATED_CONNECTION_ID instead of falling through to the
default postgres lookup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(duckdb): end-to-end cross-catalog federated join

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(duckdb): harden federated join test with multi-book join-key coverage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(ingest): keep declared cross-DB joins to federated siblings

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(setup): surface federated connection availability after adding a member

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(setup): mark federationNoticeFor @internal for dead-code gate

Also marks attachTypeForDriver, buildAttachStatements, and
isReservedConnectionId @internal — all three are exported solely for
unit-test access with no production cross-file consumer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(concepts): document cross-database federation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(concepts): correct sqlite two-part naming in federation doc

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(duckdb): quote federated catalog alias so hyphenated connection ids attach

* refactor(duckdb): single-source federation driver list, dedup attach loads

Collapse the parallel ATTACH_COMPATIBLE_DRIVERS set and ATTACH_TYPE_BY_DRIVER
map into one map in federation.ts whose keys are the membership rule. Replace
FederatedMember.config (read only via a type-erasing cast) with a typed url
field extracted at derive time. Emit INSTALL/LOAD once per distinct driver
type instead of once per member.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(duckdb): close federated DuckDB instance on connect failure; dedup id validation

Wrap the federated DuckDB instance in its own try/finally so a failing
connect() or a throwing connection.closeSync() no longer leaks the native
instance. Route setup-sources connection-id validation through the canonical
assertSafeConnectionId so the reserved _ktx_ prefix guard applies there too.
Derive the federated dialect through sqlAnalysisDialectForDriver instead of a
hardcoded literal.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(federation): carry member connection config and projectDir on FederatedMember

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(federation): resolve per-member attach targets via canonical connector resolvers

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(federation): quote mysql attach-string values like postgres

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(federation): resolve member attach targets via canonical resolvers, supporting sqlite path:

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(federation): thread projectDir through deriveFederatedConnection callers

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(federation): add shared project read-only SQL executor that routes _ktx_federated

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(federation): exercise shared executor default federated path with real DuckDB

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(federation): route ingest query executor through shared executor

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(federation): route MCP sql_execution _ktx_federated through shared executor

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(federation): preserve cross-DB joins to federated siblings in manifest re-emit

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(federation): preserve declared cross-DB joins through scan re-ingest

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(federation): document sibling-ref invariant, drop unsafe casts in test

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(federation): namespace federated source names by member to avoid collisions

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(federation): document member-namespaced federated source names

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(federation): preserve member SSL/search_path in attach, classify federated MCP errors

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(federation): simplify federated dispatch and parallelize sibling reads

Dedup the federated driver ternary in local-query, derive the prefixed
source.name from the already-built name, drop the duplicated error in
federatedAttachTarget's exhaustive switch, inline the one-line
cleanupConnector wrapper, and parallelize federatedSiblingTargets' shard
reads (was sequential await-in-for on the scan hot path).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(federation): carry headerTypes through shared SQL executor

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(federation): add shared federated connection listing builder

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(federation): route ktx sql through shared executor for _ktx_federated parity

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(federation): show _ktx_federated in ktx connection list

Surfaces the virtual federated connection in the output of
`ktx connection list` so agents and users can discover cross-database
querying when 2+ attach-compatible connections are configured.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(federation): surface _ktx_federated in MCP connection_list

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(federation): ktx sql federated cross-file join end-to-end

Drive runKtxSql with the real federated DuckDB executor against two on-disk
sqlite files, stubbing only SQL validation. The test surfaced that the JSON
output path could not serialize bigint values DuckDB returns for integer
columns; printJson now coerces bigint to JSON numbers, matching the
plain/pretty paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(federation): document direct _ktx_federated query surface

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(federation): coerce DuckDB bigint to number in shared federated executor

DuckDB returns integer columns as JS bigint, which JSON.stringify cannot
serialize. The CLI --json path worked around this with a replacer, but the
MCP sql_execution tool serializes via plain JSON.stringify and crashed on
any federated query selecting an integer column. Coerce bigint to Number
once in executeFederatedQuery so every consumer (CLI, MCP, ingest, SL)
gets a JSON-safe result, and remove the now-redundant CLI replacer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(federation): simplify driver map and collapse forked MCP SQL path

- Replace the identity-valued ATTACH_TYPE_BY_DRIVER record with a
  ATTACH_COMPATIBLE_DRIVERS Set; the driver name doubles as the attach
  type, so the map encoded nothing beyond membership.
- Switch federatedAttachTarget directly on the driver with a default
  throw, dropping the unreachable post-switch throw and its comment.
- Route the MCP sql_execution standard-connection case through the
  shared executeProjectReadOnlySql instead of reimplementing the
  connector create/capability-check/execute/cleanup ceremony, so
  federated and standard connections share one execution path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(federation): allowlist placeholder credentials for detect-secrets

The federation doc example URL and the federated-attach test fixtures use
literal placeholder credentials that trip detect-secrets. Mark them with
line-scoped pragma allowlist comments so a real secret added later is still
caught.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(federation): correct SL addressing, join pruning, and id-quoting guidance

- Federated SL list/search records carry the virtual `_ktx_federated`
  connection id (member origin stays in the prefixed source name), so rows
  round-trip to `ktx sl -c _ktx_federated read` and the fts index no longer
  clobbers per-connection partitions.
- Prune semantic-layer joins by membership in the connection's own source set
  instead of matching the target's first dotted segment against other
  connection ids; a same-connection join whose target name collides with a
  sibling connection id is preserved, and orphan targets that would poison the
  planner are dropped.
- Document double-quoting for connection ids that are not bare SQL identifiers
  (e.g. "books-db".public.books) in the federated naming hint, the sl-query
  rejection error, and the federation docs.
- Preserve exact federated BIGINT values beyond 2^53 as strings instead of
  rounding, and steer the setup federation notice to raw SQL against
  `_ktx_federated`.

* fix(federation): carry ssl:true into postgres URL attach target

A postgres member configured with `url` plus `ssl: true` resolved to both a
connectionString and an ssl flag, but the federated attach builder early-returned
the bare URL and dropped the ssl intent. DuckDB then handed libpq a URL with no
sslmode, so the URL path silently diverged from the discrete-field path (which
emits sslmode=require) and from the direct scan path (which enforces TLS).

Append sslmode=require to the URL when the member sets ssl, unless the URL
already pins a stronger sslmode.
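
The sslmode rule in the last item can be sketched as below. This is a minimal illustration, not the code from the PR: `withRequiredSsl` is a hypothetical name, and treating `verify-ca` and `verify-full` as the only "stronger" modes (and replacing any weaker value) is an assumption based on libpq's documented sslmode ordering.

```typescript
// Hypothetical helper; not the actual ktx attach builder.
// Assumption: only verify-ca and verify-full count as stronger than "require".
const STRONGER_THAN_REQUIRE = new Set(["verify-ca", "verify-full"]);

function withRequiredSsl(url: string, ssl: boolean): string {
  // Without ssl intent, the URL is passed through untouched.
  if (!ssl) return url;

  const parsed = new URL(url);
  const current = parsed.searchParams.get("sslmode");

  // Keep a stronger mode the user already pinned in the URL.
  if (current !== null && STRONGER_THAN_REQUIRE.has(current)) return url;

  // Otherwise make the TLS requirement explicit for libpq.
  parsed.searchParams.set("sslmode", "require");
  return parsed.toString();
}

// Example with a placeholder host and no credentials.
console.log(withRequiredSsl("postgres://db.example.com:5432/books", true));
```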

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Andrey Avtomonov <andreybavt@gmail.com>
2026-06-15 15:01:39 +00:00
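
The bigint handling described in the commit message above (coerce DuckDB bigint values to Number so `JSON.stringify` works, but keep values beyond 2^53 as exact strings) might look roughly like the following. The function name `toJsonSafe` and the sample row are hypothetical; the real logic lives in `executeFederatedQuery` and may differ.

```typescript
// Hypothetical sketch; not the code from executeFederatedQuery.
function toJsonSafe(value: unknown): unknown {
  if (typeof value !== "bigint") return value;

  // Numbers can represent integers exactly only inside the safe range.
  const inSafeRange =
    value >= BigInt(Number.MIN_SAFE_INTEGER) &&
    value <= BigInt(Number.MAX_SAFE_INTEGER);

  // Safe values become plain numbers; larger ones stay exact as strings.
  return inSafeRange ? Number(value) : value.toString();
}

// A made-up result row: a small id, an id just past 2^53, and a text column.
const row: unknown[] = [BigInt(42), BigInt("9007199254740993"), "books"];
const json = JSON.stringify(row.map(toJsonSafe));
console.log(json); // [42,"9007199254740993","books"]
```

Without the coercion step, `JSON.stringify` throws a TypeError on the first bigint it meets, which is the crash the commit describes for the MCP path.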
.github chore: revert repo references to Kaelio/ktx and remove rename-resilience (#252) 2026-06-02 00:14:43 +02:00
assets chore: refresh star history chart [skip ci] 2026-06-15 08:10:51 +00:00
docs refactor: enforce ktx naming and AGENTS.md compliance sweep (#289) 2026-06-11 13:49:45 +02:00
docs-site feat(duckdb): cross-database federation via derived DuckDB connection (#295) 2026-06-15 15:01:39 +00:00
examples refactor: enforce ktx naming and AGENTS.md compliance sweep (#289) 2026-06-11 13:49:45 +02:00
packages/cli feat(duckdb): cross-database federation via derived DuckDB connection (#295) 2026-06-15 15:01:39 +00:00
python fix: classify mcp query failures (#302) 2026-06-15 12:48:24 +00:00
scripts feat(cli): self-provision pinned uv and defer MCP Python runtime install (#297) 2026-06-12 16:31:06 +00:00
skills/ktx refactor: enforce ktx naming and AGENTS.md compliance sweep (#289) 2026-06-11 13:49:45 +02:00
.gitignore chore: remove private planning docs (#140) 2026-05-19 14:58:55 +02:00
.pre-commit-config.yaml ci: stop tombi reformatting uv.lock and sync lock to 0.7.0 (#235) 2026-05-29 15:04:48 +02:00
.releaserc.cjs feat: add claude-code llm backend with runtime port (#115) 2026-05-16 12:06:34 +02:00
AGENTS.md fix(cli): survive ktx.yaml version skew and derive repo ownership from disk (#293) 2026-06-11 22:10:47 +02:00
biome.json feat: merge ingest and scan 2026-05-14 01:43:06 +02:00
CLAUDE.md Initial open-source release 2026-05-10 23:12:26 +02:00
codecov.yml refactor(release): drop release-policy.json runtime dep and next branch (#180) 2026-05-20 13:53:14 +02:00
conductor.json [codex] Add Conductor workspace scripts (#2) 2026-05-11 09:55:42 +02:00
CONTRIBUTING.md refactor: enforce ktx naming and AGENTS.md compliance sweep (#289) 2026-06-11 13:49:45 +02:00
GEMINI.md Initial open-source release 2026-05-10 23:12:26 +02:00
knip.json feat: add codex llm backend for ktx runtime work (#253) 2026-06-02 13:57:11 +02:00
LICENSE ci: run pre-commit checks in CI (#74) 2026-05-13 19:49:25 +02:00
package.json chore(release): 0.12.0 [skip ci] 2026-06-12 16:45:18 +00:00
pnpm-lock.yaml feat(duckdb): cross-database federation via derived DuckDB connection (#295) 2026-06-15 15:01:39 +00:00
pnpm-workspace.yaml fix(deps): bump hono override to 4.12.21 to resolve dependabot alerts (#288) 2026-06-10 12:26:01 +00:00
pyproject.toml chore: upgrade dependencies and tooling (#232) 2026-05-29 11:56:55 +02:00
README.md docs: consolidate AI Resources into a single page (#274) 2026-06-09 00:28:56 -04:00
release-policy.json chore(release): 0.12.0 [skip ci] 2026-06-12 16:45:18 +00:00
SECURITY.md refactor: enforce ktx naming and AGENTS.md compliance sweep (#289) 2026-06-11 13:49:45 +02:00
skills.sh.json docs: add ktx skills.sh setup skill (#227) 2026-05-28 12:28:10 +02:00
tombi.toml chore: upgrade dependencies and tooling (#232) 2026-05-29 11:56:55 +02:00
tsconfig.base.json perf(setup): speed up conductor setup and make it rerun-safe (#107) 2026-05-15 12:06:37 +02:00
uv.lock feat(cli): let ktx setup --agents choose an install directory (#298) 2026-06-13 00:46:56 +02:00

ktx

The context layer for data agents


Quickstart · CLI Reference · Agent Setup · Slack

Built and maintained by Kaelio


ktx is a self-improving context layer that teaches agents how to query your warehouse accurately - from approved metric definitions, joinable columns, and business knowledge it builds and maintains for you.

Note

Run ktx with your own LLM API keys or a local agent sign-in — a Claude Pro/Max subscription through Claude Code, or your local Codex authentication. No extra usage billing from ktx.

Watch the ktx launch video (1:56)

Ingestion: ktx ingests databases, BI tools, modeling code, and docs through its context engine (source connectors, context builder, reconciliation, validation) into wiki Markdown and semantic-layer YAML

Serving: an agent queries ktx through MCP, which searches the wiki and semantic layer, returns approved metrics, and compiles them into read-only SQL run against the warehouse

Why ktx

General-purpose agents struggle on data tasks. They re-explore your warehouse on every question, invent their own metric logic, and return numbers that don't match approved definitions.

Traditional semantic layers don't fix this. They demand constant manual upkeep and don't absorb the rest of your company's knowledge.

ktx does both, automatically:

  • Learns from company knowledge. Ingests wiki content, organizes it, removes duplicates, and flags contradictions for human review.
  • Maps the data stack. Samples tables, captures metadata and usage patterns, detects joinable columns, and annotates sources so agents write better queries.
  • Builds a semantic layer. Combines raw tables and high-level metrics through a join graph that automatically resolves chasm and fan traps, so agents fetch metrics declaratively instead of rewriting canonical SQL each time.
  • Serves agents at execution. Exposes CLI and MCP tools with combined full-text and semantic search across wiki and semantic-layer entities.
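
To make the fan-trap point above concrete, here is a small in-memory sketch of the problem and one standard remedy (aggregate each table at its own grain before combining). The tables and values are invented for illustration, and this does not show how ktx's join graph resolves such traps internally.

```typescript
// Invented sample data: one order can have many items (a one-to-many "fan").
type Order = { id: number; amount: number };
type Item = { orderId: number; sku: string };

const orders: Order[] = [
  { id: 1, amount: 100 },
  { id: 2, amount: 50 },
];
const items: Item[] = [
  { orderId: 1, sku: "a" },
  { orderId: 1, sku: "b" },
  { orderId: 2, sku: "c" },
];

// Naive approach: join orders to items, then sum the order amount.
// Order 1 appears once per item, so its amount is counted twice.
let naiveRevenue = 0;
for (const order of orders) {
  for (const item of items) {
    if (item.orderId === order.id) naiveRevenue += order.amount;
  }
}

// Remedy: aggregate each measure at its own grain, then combine the totals.
let revenue = 0;
for (const order of orders) revenue += order.amount;
const itemCount = items.length;

console.log({ naiveRevenue, revenue, itemCount }); // naiveRevenue: 250, revenue: 150, itemCount: 3
```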

How ktx compares

General-purpose agent Traditional semantic layer ktx
Builds warehouse context automatically
Detects joinable columns + resolves fan/chasm traps Manual
Approved, reusable metric definitions
Absorbs wiki / Notion / team knowledge
Flags contradictions across sources
Ships CLI + MCP for agent execution Partial
Read-only by design n/a n/a

Who is ktx for

Use ktx if you:

  • Want agents like Claude Code, Codex, Cursor, or OpenCode to query your warehouse with approved metric definitions
  • Have business knowledge scattered across dbt, Looker, Metabase, Notion, and team wikis
  • Need agents to reuse canonical SQL instead of inventing it on every prompt

Skip ktx if you:

  • Don't have a SQL warehouse - ktx sits on top of one
  • Only need one ad-hoc query - psql or a notebook will do

Works with PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, and SQLite. Integrates with dbt, MetricFlow, LookML, Looker, Metabase, and Notion.

Quick Start

npm install -g @kaelio/ktx
ktx setup
ktx status

ktx setup creates or resumes a local ktx project, configures providers and connections, builds context, and installs agent integration.

Example ktx status after setup:

ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main)
ktx context built: yes
Agent integration ready: yes (codex:project)

Tip

Already using an agent? Ask Claude Code, Codex, Cursor, or OpenCode from your project directory:

Run npx skills add Kaelio/ktx --skill ktx and use the ktx skill to install
and configure ktx in this project.

Important

If ktx status prints ktx mcp start --project-dir ..., run it before opening your agent client.

Upgrading

Re-run the global install with the @latest tag:

npm install -g @kaelio/ktx@latest

First commands

Command                    Purpose
ktx setup                  Create, resume, or update a ktx project
ktx status                 Check project readiness
ktx ingest                 Build context for every configured connection
ktx sl "revenue"           Search semantic sources
ktx wiki "refund policy"   Search local wiki pages
ktx mcp start              Start the MCP server for agent clients

See the CLI Reference for every command, flag, and option.

Project Layout

my-project/
├── ktx.yaml                         # Project configuration
├── semantic-layer/<connection-id>/  # YAML semantic sources
├── wiki/global/                     # Shared business context
├── wiki/user/<user-id>/             # User-scoped notes
├── raw-sources/<connection-id>/     # Ingest artifacts and reports
└── .ktx/                            # Local state and secrets, git-ignored

Commit ktx.yaml, semantic-layer/, and wiki/. Keep .ktx/ local.

Project resolution defaults to KTX_PROJECT_DIR, then the nearest ktx.yaml, then the current directory. Pass --project-dir <path> when scripting.
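
The lookup order above can be sketched as follows. This is an illustrative reading, not ktx's actual resolver: `resolveProjectDir` and its `fileExists` parameter are hypothetical, "nearest ktx.yaml" is assumed to mean walking up from the current directory, and an explicit `--project-dir` is assumed to take precedence over everything else.

```typescript
import { existsSync } from "node:fs";
import { dirname, join, resolve } from "node:path";

// Hypothetical helper; fileExists is injectable so the logic can be tried
// without touching the real filesystem.
function resolveProjectDir(
  cwd: string,
  env: Record<string, string | undefined>,
  explicit?: string,
  fileExists: (path: string) => boolean = existsSync,
): string {
  // Assumption: --project-dir wins when it is passed.
  if (explicit) return resolve(cwd, explicit);

  // 1. KTX_PROJECT_DIR from the environment.
  const fromEnv = env["KTX_PROJECT_DIR"];
  if (fromEnv) return resolve(cwd, fromEnv);

  // 2. Nearest ktx.yaml, assumed to mean searching cwd and then its parents.
  let dir = resolve(cwd);
  while (true) {
    if (fileExists(join(dir, "ktx.yaml"))) return dir;
    const parent = dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }

  // 3. Fall back to the current directory.
  return resolve(cwd);
}

console.log(resolveProjectDir(process.cwd(), process.env));
```

In scripts, passing the directory explicitly avoids depending on where the script happens to be launched from.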

FAQ

  • Does ktx send my schema or query results to a hosted service? No. ktx runs locally. Apart from the usage telemetry described under Telemetry, the only data leaving your machine is what you send to the LLM provider you configured.
  • Which LLM backends are supported? Anthropic API, Google Vertex AI, AI Gateway, the local Claude Code session through the Claude Agent SDK, and your local Codex authentication through the Codex SDK. See LLM configuration.
  • How is ktx different from a dbt or MetricFlow semantic layer? ktx ingests those layers and combines them with raw-table introspection and wiki content. Agents get one searchable surface instead of three disconnected ones - and ktx flags contradictions across sources.
  • Does ktx need a running server? There is no hosted service. The local MCP daemon runs on demand via ktx mcp start when an agent client needs it.
  • Is my warehouse safe? Yes. Connections are read-only - ktx never writes to your database.

Docs

Community

  • Slack — ask questions, share what you're building, and chat with maintainers.
  • GitHub Issues — report bugs and request features.
  • Contributing — set up the repo, run tests, and open a PR.

Development

git clone https://github.com/kaelio/ktx.git
cd ktx
pnpm install
uv sync --all-groups
pnpm run build
pnpm run check

ktx is a pnpm + uv workspace:

Path                          Purpose
packages/cli                  TypeScript CLI and published npm package source
packages/cli/src/context      Core context engine
packages/cli/src/llm          LLM and embedding providers
packages/cli/src/connectors   Database scan connectors
python/ktx-sl                 Semantic-layer query planning
python/ktx-daemon             Portable compute service

Local development CLI:

pnpm run setup:dev
pnpm run link:dev
ktx-dev --help

Useful checks:

pnpm run type-check
pnpm run test
pnpm run dead-code
uv run pytest -q

Telemetry

ktx collects privacy-conscious usage telemetry to understand installs and improve setup, command reliability, and data-agent workflows. Catalog telemetry events do not record file paths, hostnames, SQL, schema names, table names, column names, error messages, raw environment values, or argv. Error reports use PostHog Error Tracking and can include stack frames and raw error messages, which may contain local file paths or the local username in those paths. ktx redacts secrets, credentials, database URLs, auth headers, argv, raw environment values, SQL text, row data, and user-typed prompt or MCP argument text from the explicit $exception payload. See Telemetry for the event catalog and opt-out options.

License

ktx is licensed under the Apache License, Version 2.0. See LICENSE.
