ktx is the context layer for analytics agents https://docs.kaelio.com/ktx
Andrey Avtomonov 494618ab14
feat: add codex llm backend for ktx runtime work (#253)
* feat: add codex sdk runner foundation

* feat: parse codex runtime events

* feat: expose codex runtime mcp tools

* feat: add codex llm runtime

* feat: wire codex llm backend

* test: avoid Array.fromAsync in codex runner test

* docs: document codex llm backend

* fix: tighten codex runtime config ownership

* fix: use codex sdk env and thread options

* fix: parse codex sdk event shapes

* test: add codex backend live smoke

* docs: clarify codex backend isolation

* fix: drive codex loop metrics from mcp events

* fix: enforce codex local step budget

* docs: disclose codex isolation limits

* fix: count all codex agent steps and stream step callbacks live

The agent-loop step budget only counted completed mcp_tool_call items, so
built-in command_execution steps (which the public Codex SDK/CLI surface can
still expose) never decremented the budget, letting ingest/reconciliation run
past stepBudget until Codex stopped on its own. onStepFinish was also replayed
only after the whole stream drained, so live work_unit_step / reconciliation
progress appeared stuck until the Codex process exited.

collectEvents is now the single live step accumulator: it counts every
completed agent-action item via a shared isCompletedAgentStep predicate
(command_execution, mcp_tool_call, file_change, web_search), fires onStepFinish
as each step completes, and enforces the budget on that broader count. A
no-tool turn still counts as one step. toolFailures stays MCP-specific, since a
non-zero command exit is normal agent exploration, not a loop failure.
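
A minimal sketch of the counting rule described above, not the ktx source. The event shape, the collectSteps helper, and the synchronous array input are assumptions made for illustration (the real stream is asynchronous); only the four item types, the isCompletedAgentStep name, and the "no-tool turn counts as one step" rule come from this message.

```typescript
// Hypothetical, simplified event shape; real Codex SDK events may differ.
type StreamEvent =
  | { type: "item.completed"; item: { type: string } }
  | { type: "turn.completed" };

// Item types this commit message lists as agent actions.
const AGENT_STEP_TYPES = new Set([
  "command_execution",
  "mcp_tool_call",
  "file_change",
  "web_search",
]);

// Shared predicate: a completed item of any agent-action type is one step.
function isCompletedAgentStep(event: StreamEvent): boolean {
  return event.type === "item.completed" && AGENT_STEP_TYPES.has(event.item.type);
}

// Hypothetical accumulator: counts steps, reports each one as it completes,
// and stops as soon as the budget is reached.
function collectSteps(
  events: readonly StreamEvent[],
  stepBudget: number,
  onStepFinish: (stepCount: number) => void,
): { steps: number; budgetExhausted: boolean } {
  let steps = 0;
  for (const event of events) {
    if (!isCompletedAgentStep(event)) continue;
    steps += 1;
    onStepFinish(steps); // fired live, not replayed after the stream drains
    if (steps >= stepBudget) {
      return { steps, budgetExhausted: true };
    }
  }
  // A turn with no tool activity still counts as one step.
  if (steps === 0) {
    steps = 1;
    onStepFinish(steps);
  }
  return { steps, budgetExhausted: steps >= stepBudget };
}
```

With a budget of 2, a stream of command_execution, agent_message, mcp_tool_call, file_change stops after the mcp_tool_call: the agent_message item is ignored and the file_change item is never reached.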

* test: align ingest llm-guard assertions with codex backend

The skip-llm ingest guard message now lists codex as a valid backend and
mentions a Claude Code/Codex session plus a codex setup hint, but this slow
suite test still asserted the pre-codex wording. Update it to match the
production message (already covered by the local-bundle-runtime unit test) and
add the codex setup-line assertion.

* fix: treat codex error:null tool calls as success

The Codex SDK serializes error: null on successful mcp_tool_call items, so
the failure check (item.error !== undefined) flagged every successful tool
call as failed with the empty-payload default "Codex turn failed". This
killed every ingest work unit under the codex backend before it could
produce a patch.

Key on status === 'failed' (authoritative, always set) and only treat a
populated error object as a failure. Add a regression test built from a
verbatim real-SDK event capture.
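
To make the difference concrete, here is a small sketch of the two checks. The McpToolCallItem type and both function names are hypothetical simplifications, not the ktx or Codex SDK definitions.

```typescript
// Hypothetical, simplified shape of a finished mcp_tool_call item.
type McpToolCallItem = {
  type: "mcp_tool_call";
  status: "in_progress" | "completed" | "failed";
  error?: { message: string } | null;
};

// Old check: `null !== undefined` is true, so a successful call that is
// serialized with `error: null` gets reported as failed.
function isFailedOld(item: McpToolCallItem): boolean {
  return item.error !== undefined;
}

// New check: status is authoritative, and an error only counts when it is
// an actual populated object.
function isFailedNew(item: McpToolCallItem): boolean {
  if (item.status === "failed") return true;
  return item.error !== undefined && item.error !== null;
}

const success: McpToolCallItem = {
  type: "mcp_tool_call",
  status: "completed",
  error: null,
};

console.log(isFailedOld(success)); // true  (the bug)
console.log(isFailedNew(success)); // false (the fix)
```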

* fix: default codex backend to gpt-5.5 and report real probe errors

The previous default gpt-5.3-codex is an API-key-only model that the OpenAI
API rejects under ChatGPT-account (subscription) auth, so codex status/setup
failed with a misleading "authentication is not usable" message even though
auth was fine.

- Default codex model is now gpt-5.5 (works on both subscription and API-key
  auth); the curated setup picker offers gpt-5.5 / gpt-5.4 / gpt-5.4-mini and
  keeps free-form entry for account-specific ids (e.g. gpt-5.3-codex-spark).
- runCodexAuthProbe now distinguishes "model not available" from an auth
  failure and surfaces the real API error: collectEvents retains stream
  events when the SDK throws on a non-zero exit, and the API error JSON
  envelope is unwrapped to its human-readable message.
- The Codex isolation warning now renders inside the clack setup frame.
- Docs updated to gpt-5.5 with a note that *-codex ids require API-key auth.

* fix: require llm.models.default in status and match codex probe remediation

Status reported a project ready when a non-none LLM backend was configured
without llm.models.default, but the runtime (resolveModelSlots) hard-requires
it, so ingest/scan/memory threw after `ktx status` said the project was usable.
buildLlmStatus now fails for any non-none backend missing models.default and no
longer invents a fallback model for claude-code/codex.

Codex probe failures now carry a category-matched fix: a model-access failure
steers the user at llm.models.default instead of the auth/install remediation.
runCodexAuthProbe returns the fix and status consumes it; the message stays
self-sufficient so setup output is unchanged.

Docs: README now lists the codex backend and local Codex auth; ktx-setup.mdx
states --llm-model only accepts codex/default or gpt-*/codex-* ids.

Repaired four doctor fixtures that configured a backend without models.default
(the now-correctly-blocked config) and added coverage for the new behavior.
2026-06-02 13:57:11 +02:00
.github chore: revert repo references to Kaelio/ktx and remove rename-resilience (#252) 2026-06-02 00:14:43 +02:00
assets chore: refresh star history chart [skip ci] 2026-06-02 07:46:46 +00:00
docs chore: revert repo references to Kaelio/ktx and remove rename-resilience (#252) 2026-06-02 00:14:43 +02:00
docs-site feat: add codex llm backend for ktx runtime work (#253) 2026-06-02 13:57:11 +02:00
examples chore(workspace): gate dead-code with knip production mode (#196) 2026-05-21 15:28:58 +02:00
packages/cli feat: add codex llm backend for ktx runtime work (#253) 2026-06-02 13:57:11 +02:00
python chore(release): 0.8.0 [skip ci] 2026-06-01 18:09:14 +00:00
scripts feat: add codex llm backend for ktx runtime work (#253) 2026-06-02 13:57:11 +02:00
skills/ktx docs(ktx skill): harden setup guidance from agent-driven demo run (#247) 2026-06-01 12:08:58 +00:00
website feat(docs): add Fumadocs site workspace 2026-05-11 01:08:31 -07:00
.gitignore chore: remove private planning docs (#140) 2026-05-19 14:58:55 +02:00
.pre-commit-config.yaml ci: stop tombi reformatting uv.lock and sync lock to 0.7.0 (#235) 2026-05-29 15:04:48 +02:00
.releaserc.cjs feat: add claude-code llm backend with runtime port (#115) 2026-05-16 12:06:34 +02:00
AGENTS.md chore: revert repo references to Kaelio/ktx and remove rename-resilience (#252) 2026-06-02 00:14:43 +02:00
biome.json feat: merge ingest and scan 2026-05-14 01:43:06 +02:00
CLAUDE.md Initial open-source release 2026-05-10 23:12:26 +02:00
codecov.yml refactor(release): drop release-policy.json runtime dep and next branch (#180) 2026-05-20 13:53:14 +02:00
conductor.json [codex] Add Conductor workspace scripts (#2) 2026-05-11 09:55:42 +02:00
CONTRIBUTING.md chore: revert repo references to Kaelio/ktx and remove rename-resilience (#252) 2026-06-02 00:14:43 +02:00
GEMINI.md Initial open-source release 2026-05-10 23:12:26 +02:00
knip.json feat: add codex llm backend for ktx runtime work (#253) 2026-06-02 13:57:11 +02:00
LICENSE ci: run pre-commit checks in CI (#74) 2026-05-13 19:49:25 +02:00
package.json feat: add codex llm backend for ktx runtime work (#253) 2026-06-02 13:57:11 +02:00
pnpm-lock.yaml feat: add codex llm backend for ktx runtime work (#253) 2026-06-02 13:57:11 +02:00
pnpm-workspace.yaml chore: upgrade dependencies and tooling (#232) 2026-05-29 11:56:55 +02:00
pyproject.toml chore: upgrade dependencies and tooling (#232) 2026-05-29 11:56:55 +02:00
README.md feat: add codex llm backend for ktx runtime work (#253) 2026-06-02 13:57:11 +02:00
release-policy.json chore(release): 0.8.0 [skip ci] 2026-06-01 18:09:14 +00:00
SECURITY.md chore: revert repo references to Kaelio/ktx and remove rename-resilience (#252) 2026-06-02 00:14:43 +02:00
skills.sh.json docs: add ktx skills.sh setup skill (#227) 2026-05-28 12:28:10 +02:00
tombi.toml chore: upgrade dependencies and tooling (#232) 2026-05-29 11:56:55 +02:00
tsconfig.base.json perf(setup): speed up conductor setup and make it rerun-safe (#107) 2026-05-15 12:06:37 +02:00
uv.lock ci: stop tombi reformatting uv.lock and sync lock to 0.7.0 (#235) 2026-05-29 15:04:48 +02:00

ktx

The context layer for data agents


Quickstart · CLI Reference · Agent Setup · Slack


ktx is a self-improving context layer that teaches agents how to query your warehouse accurately - from approved metric definitions, joinable columns, and business knowledge it builds and maintains for you.

Note

Run ktx with your own LLM API keys or a local agent sign-in — a Claude Pro/Max subscription through Claude Code, or your local Codex authentication. No extra usage billing from ktx.

Watch the ktx launch video (1:56)

Ingestion: ktx ingests databases, BI tools, modeling code, and docs through its context engine (source connectors, context builder, reconciliation, validation) into wiki Markdown and semantic-layer YAML

Serving: an agent queries ktx through MCP, which searches the wiki and semantic layer, returns approved metrics, and compiles them into read-only SQL run against the warehouse

Why ktx

General-purpose agents struggle on data tasks. They re-explore your warehouse on every question, invent their own metric logic, and return numbers that don't match approved definitions.

Traditional semantic layers don't fix this. They demand constant manual upkeep and don't absorb the rest of your company's knowledge.

ktx does both, automatically:

  • Learns from company knowledge. Ingests wiki content, organizes it, removes duplicates, and flags contradictions for human review.
  • Maps the data stack. Samples tables, captures metadata and usage patterns, detects joinable columns, and annotates sources so agents write better queries.
  • Builds a semantic layer. Combines raw tables and high-level metrics through a join graph that automatically resolves chasm and fan traps, so agents fetch metrics declaratively instead of rewriting canonical SQL each time.
  • Serves agents at execution. Exposes CLI and MCP tools with combined full-text and semantic search across wiki and semantic-layer entities.
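
For readers unfamiliar with the fan trap mentioned in the semantic-layer bullet above, the following sketch shows the double counting it causes. It does not describe how the ktx join graph works; the tables, field names, and both functions are made up for illustration.

```typescript
type Order = { orderId: number; amount: number };
type OrderItem = { orderId: number; sku: string };

const orders: Order[] = [
  { orderId: 1, amount: 100 },
  { orderId: 2, amount: 50 },
];
const orderItems: OrderItem[] = [
  { orderId: 1, sku: "A" },
  { orderId: 1, sku: "B" },
  { orderId: 1, sku: "C" },
  { orderId: 2, sku: "A" },
];

// Fan trap: join orders to items first, then sum the order amount.
// Each order's amount is repeated once per matching item row.
function revenueAfterJoin(orderRows: Order[], itemRows: OrderItem[]): number {
  let total = 0;
  for (const order of orderRows) {
    for (const item of itemRows) {
      if (item.orderId === order.orderId) total += order.amount;
    }
  }
  return total;
}

// Trap-free: aggregate the measure at its own grain (one row per order).
function revenueAtOrderGrain(orderRows: Order[]): number {
  return orderRows.reduce((sum, order) => sum + order.amount, 0);
}

console.log(revenueAfterJoin(orders, orderItems)); // 350 (inflated)
console.log(revenueAtOrderGrain(orders)); // 150 (correct)
```

An agent that writes the first query by hand returns 350 instead of 150, which is the kind of error a declaratively fetched metric avoids.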

How ktx compares

General-purpose agent Traditional semantic layer ktx
Builds warehouse context automatically
Detects joinable columns + resolves fan/chasm traps Manual
Approved, reusable metric definitions
Absorbs wiki / Notion / team knowledge
Flags contradictions across sources
Ships CLI + MCP for agent execution Partial
Read-only by design n/a n/a

Who is ktx for

Use ktx if you:

  • Want agents like Claude Code, Codex, Cursor, or OpenCode to query your warehouse with approved metric definitions
  • Have business knowledge scattered across dbt, Looker, Metabase, Notion, and team wikis
  • Need agents to reuse canonical SQL instead of inventing it on every prompt

Skip ktx if you:

  • Don't have a SQL warehouse - ktx sits on top of one
  • Only need one ad-hoc query - psql or a notebook will do

Works with PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, and SQLite. Integrates with dbt, MetricFlow, LookML, Looker, Metabase, and Notion.

Quick Start

npm install -g @kaelio/ktx
ktx setup
ktx status

ktx setup creates or resumes a local ktx project, configures providers and connections, builds context, and installs agent integration.

Example ktx status after setup:

ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main)
ktx context built: yes
Agent integration ready: yes (codex:project)

Tip

Already using an agent? Ask Claude Code, Codex, Cursor, or OpenCode from your project directory:

Run npx skills add Kaelio/ktx --skill ktx and use the ktx skill to install
and configure ktx in this project.

Important

If ktx status prints ktx mcp start --project-dir ..., run it before opening your agent client.

First commands

Command                   Purpose
ktx setup                 Create, resume, or update a ktx project
ktx status                Check project readiness
ktx ingest                Build context for every configured connection
ktx sl "revenue"          Search semantic sources
ktx wiki "refund policy"  Search local wiki pages
ktx mcp start             Start the MCP server for agent clients

See the CLI Reference for every command, flag, and option.

Project Layout

my-project/
├── ktx.yaml                         # Project configuration
├── semantic-layer/<connection-id>/  # YAML semantic sources
├── wiki/global/                     # Shared business context
├── wiki/user/<user-id>/             # User-scoped notes
├── raw-sources/<connection-id>/     # Ingest artifacts and reports
└── .ktx/                            # Local state and secrets, git-ignored

Commit ktx.yaml, semantic-layer/, and wiki/. Keep .ktx/ local.

ktx resolves the project directory from KTX_PROJECT_DIR first, then from the nearest ktx.yaml, and finally falls back to the current directory. Pass --project-dir <path> when scripting.
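
The resolution order can be sketched as follows. This is an illustration under stated assumptions, not the ktx implementation: resolveProjectDir, parentDir, and the injected hasKtxYaml callback are hypothetical names, "nearest ktx.yaml" is assumed to mean a search upward through parent directories, --project-dir is assumed to take precedence over everything else, and paths are assumed to be absolute POSIX paths.

```typescript
// Returns the parent of an absolute POSIX path, or undefined at the root.
function parentDir(dir: string): string | undefined {
  if (dir === "/") return undefined;
  const idx = dir.lastIndexOf("/");
  return idx <= 0 ? "/" : dir.slice(0, idx);
}

// Hypothetical resolver. `hasKtxYaml` is injected so the sketch stays free
// of filesystem calls; a real version would check the disk instead.
function resolveProjectDir(opts: {
  projectDirFlag?: string; // value of --project-dir, if passed
  envProjectDir?: string; // value of KTX_PROJECT_DIR, if set
  cwd: string;
  hasKtxYaml: (dir: string) => boolean;
}): string {
  // Assumption: an explicit flag overrides the defaults below.
  if (opts.projectDirFlag) return opts.projectDirFlag;
  // 1. Environment variable.
  if (opts.envProjectDir) return opts.envProjectDir;
  // 2. Nearest ktx.yaml, assumed to be found by walking up from cwd.
  let dir: string | undefined = opts.cwd;
  while (dir !== undefined) {
    if (opts.hasKtxYaml(dir)) return dir;
    dir = parentDir(dir);
  }
  // 3. Fall back to the current directory.
  return opts.cwd;
}
```

Called from /home/user/analytics/models/staging with a ktx.yaml in /home/user/analytics and no flag or environment variable, this sketch returns /home/user/analytics.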

FAQ

  • Does ktx send my schema or query results to a hosted service? No. ktx runs locally. The only data leaving your machine is what you send to the LLM provider you configured.
  • Which LLM backends are supported? Anthropic API, Google Vertex AI, AI Gateway, the local Claude Code session through the Claude Agent SDK, and your local Codex authentication through the Codex SDK. See LLM configuration.
  • How is ktx different from a dbt or MetricFlow semantic layer? ktx ingests those layers and combines them with raw-table introspection and wiki content. Agents get one searchable surface instead of three disconnected ones - and ktx flags contradictions across sources.
  • Does ktx need a running server? There is no hosted service. The local MCP daemon runs on demand via ktx mcp start when an agent client needs it.
  • Is my warehouse safe? Yes. Connections are read-only - ktx never writes to your database.

Docs

Community

  • Slack — ask questions, share what you're building, and chat with maintainers.
  • GitHub Issues — report bugs and request features.
  • Contributing — set up the repo, run tests, and open a PR.

Development

git clone https://github.com/kaelio/ktx.git
cd ktx
pnpm install
uv sync --all-groups
pnpm run build
pnpm run check

ktx is a pnpm + uv workspace:

Path                         Purpose
packages/cli                 TypeScript CLI and published npm package source
packages/cli/src/context     Core context engine
packages/cli/src/llm         LLM and embedding providers
packages/cli/src/connectors  Database scan connectors
python/ktx-sl                Semantic-layer query planning
python/ktx-daemon            Portable compute service

Local development CLI:

pnpm run setup:dev
pnpm run link:dev
ktx-dev --help

Useful checks:

pnpm run type-check
pnpm run test
pnpm run dead-code
uv run pytest -q

Telemetry

ktx collects anonymous usage telemetry from interactive CLI runs to improve setup, command reliability, and data-agent workflows. No file paths, hostnames, SQL, schema names, error messages, or argv are recorded. See Telemetry for the event catalog and opt-out options.

License

ktx is licensed under the Apache License, Version 2.0. See LICENSE.
