apunkt/ktx

mirror of https://github.com/Kaelio/ktx.git synced 2026-07-22 11:51:01 +02:00

ktx is the context layer for analytics agents https://docs.kaelio.com/ktx


Andrey Avtomonov 21744fc520 feat(cli): profile ingest runs and split model vs tool time (#249)	2026-06-01 15:49:17 +02:00

* feat(cli): profile ingest runs to find where wall-clock time goes

Add opt-in profiling for `ktx ingest`. Each timed phase, work unit, and agent loop now records durationMs / step count / token usage in the trace, and a post-run aggregator rolls them up into a "where did the time go" report printed to stderr.

Enable per run with KTX_PROFILE_INGEST (1/true -> human table, json -> raw structured profile) or persistently via `ingest.profile` in ktx.yaml. The json form emits raw milliseconds, token counts, and a summary.headline one-line diagnosis so coding agents can parse it directly; json wins when both env and config request profiling.

- runtime-port: RunLoopMetrics (totalMs, usage, stepCount, stepBoundariesMs) plus onMetrics callbacks on text/object generation
- ai-sdk + claude-code runtimes: capture per-loop timing and token usage
- work-unit-executor and stages 3/4: thread metrics into trace events
- ingest-bundle.runner: time worktree / triage / clustering / index / reconcile / squash phases and emit the profile in a finally block (best-effort; never affects the run outcome)
- ingest-profile: new trace+transcript aggregator with table/json formatters
- config: ingest.profile flag; docs: profiling section in ktx-ingest.mdx

* fix(cli): flush tool-call logs before reading ingest profile

Tool transcripts are appended fire-and-forget so the agent hot path never blocks on logging. The ingest profiler read them before the writes settled, so per-work-unit toolMs (and the model-vs-tool split derived from it) could be incomplete. Track in-flight appends and expose flushToolCallLogs(), bounded by a timeout so it can never hang, and flush before the profiler reads the transcript.

.github	ci: normalize star-history.svg trailing newline (#241)	2026-05-30 17:44:27 +02:00
assets	docs(readme): add launch video to README hero (#248)	2026-06-01 13:42:42 +00:00
docs	feat(cli)!: remove fast mode; ktx ingest always builds enriched context (KLO-721) (#237)	2026-05-29 17:41:04 +02:00
docs-site	feat(cli): profile ingest runs and split model vs tool time (#249)	2026-06-01 15:49:17 +02:00
examples	chore(workspace): gate dead-code with knip production mode (#196)	2026-05-21 15:28:58 +02:00
packages/cli	feat(cli): profile ingest runs and split model vs tool time (#249)	2026-06-01 15:49:17 +02:00
python	feat: report MCP client telemetry (#242)	2026-05-30 18:00:25 +02:00
scripts	feat(cli)!: remove fast mode; ktx ingest always builds enriched context (KLO-721) (#237)	2026-05-29 17:41:04 +02:00
skills/ktx	docs(ktx skill): harden setup guidance from agent-driven demo run (#247)	2026-06-01 12:08:58 +00:00
website	feat(docs): add Fumadocs site workspace	2026-05-11 01:08:31 -07:00
.gitignore	chore: remove private planning docs (#140)	2026-05-19 14:58:55 +02:00
.pre-commit-config.yaml	ci: stop tombi reformatting uv.lock and sync lock to 0.7.0 (#235)	2026-05-29 15:04:48 +02:00
.releaserc.cjs	feat: add claude-code llm backend with runtime port (#115)	2026-05-16 12:06:34 +02:00
AGENTS.md	feat(telemetry): enable PostHog GeoIP enrichment (#243)	2026-05-30 18:33:14 +02:00
biome.json	feat: merge ingest and scan	2026-05-14 01:43:06 +02:00
CLAUDE.md	Initial open-source release	2026-05-10 23:12:26 +02:00
codecov.yml	refactor(release): drop release-policy.json runtime dep and next branch (#180)	2026-05-20 13:53:14 +02:00
conductor.json	[codex] Add Conductor workspace scripts (#2)	2026-05-11 09:55:42 +02:00
CONTRIBUTING.md	chore(community): rewards program, issue templates, and triage workflow (#176)	2026-05-19 19:42:06 -04:00
GEMINI.md	Initial open-source release	2026-05-10 23:12:26 +02:00
knip.json	test: split cli tests from source tree (#216)	2026-05-26 08:49:05 +02:00
LICENSE	ci: run pre-commit checks in CI (#74)	2026-05-13 19:49:25 +02:00
package.json	chore: upgrade dependencies and tooling (#232)	2026-05-29 11:56:55 +02:00
pnpm-lock.yaml	feat: README architecture diagrams + React Flow diagram studio (#245)	2026-06-01 12:06:27 +02:00
pnpm-workspace.yaml	chore: upgrade dependencies and tooling (#232)	2026-05-29 11:56:55 +02:00
pyproject.toml	chore: upgrade dependencies and tooling (#232)	2026-05-29 11:56:55 +02:00
README.md	docs(readme): add launch video to README hero (#248)	2026-06-01 13:42:42 +00:00
release-policy.json	chore(release): 0.7.0 [skip ci]	2026-05-28 15:21:40 +00:00
SECURITY.md	chore(community): rewards program, issue templates, and triage workflow (#176)	2026-05-19 19:42:06 -04:00
skills.sh.json	docs: add ktx skills.sh setup skill (#227)	2026-05-28 12:28:10 +02:00
tombi.toml	chore: upgrade dependencies and tooling (#232)	2026-05-29 11:56:55 +02:00
tsconfig.base.json	perf(setup): speed up conductor setup and make it rerun-safe (#107)	2026-05-15 12:06:37 +02:00
uv.lock	ci: stop tombi reformatting uv.lock and sync lock to 0.7.0 (#235)	2026-05-29 15:04:48 +02:00

README.md

The context layer for data agents

Quickstart · CLI Reference · Agent Setup · Slack

ktx is a self-improving context layer that teaches agents how to query your warehouse accurately, drawing on approved metric definitions, joinable columns, and business knowledge that it builds and maintains for you.

Note

Run ktx with your own LLM API keys or a Claude Pro/Max subscription. No extra usage billing from ktx.

Ingestion: ktx ingests databases, BI tools, modeling code, and docs through its context engine (source connectors, context builder, reconciliation, validation) into wiki Markdown and semantic-layer YAML.

Serving: an agent queries ktx through MCP, which searches the wiki and semantic layer, returns approved metrics, and compiles them into read-only SQL run against the warehouse.

Why ktx

General-purpose agents struggle on data tasks. They re-explore your warehouse on every question, invent their own metric logic, and return numbers that don't match approved definitions.

Traditional semantic layers don't fix this. They demand constant manual upkeep and don't absorb the rest of your company's knowledge.

ktx does both, automatically:

Learns from company knowledge. Ingests wiki content, organizes it, removes duplicates, and flags contradictions for human review.
Maps the data stack. Samples tables, captures metadata and usage patterns, detects joinable columns, and annotates sources so agents write better queries.
Builds a semantic layer. Combines raw tables and high-level metrics through a join graph that automatically resolves chasm and fan traps, so agents fetch metrics declaratively instead of rewriting canonical SQL each time.
Serves agents at execution. Exposes CLI and MCP tools with combined full-text and semantic search across wiki and semantic-layer entities.
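
For readers unfamiliar with the term, a fan trap is what happens when a one-to-many join repeats a parent row and inflates any measure summed from it. The sketch below reproduces the problem with Python's built-in `sqlite3` and shows one common fix, pre-aggregating the "many" side before joining. The `orders` and `order_items` tables are hypothetical, and this is a generic illustration of the trap, not a description of how ktx's join graph resolves it.

```python
import sqlite3

# Hypothetical toy schema: one order has many order items.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE order_items (
        item_id INTEGER PRIMARY KEY,
        order_id INTEGER,
        quantity INTEGER
    );
    INSERT INTO orders VALUES (1, 100.0), (2, 50.0);
    INSERT INTO order_items VALUES (1, 1, 2), (2, 1, 1), (3, 2, 4);
    """
)

# Fan trap: order 1 has two items, so its amount is counted twice.
naive = conn.execute(
    """
    SELECT SUM(o.amount), SUM(i.quantity)
    FROM orders AS o
    JOIN order_items AS i ON i.order_id = o.order_id
    """
).fetchone()

# One common fix: collapse the "many" side to one row per order first.
safe = conn.execute(
    """
    SELECT SUM(o.amount), SUM(i.total_quantity)
    FROM orders AS o
    JOIN (
        SELECT order_id, SUM(quantity) AS total_quantity
        FROM order_items
        GROUP BY order_id
    ) AS i ON i.order_id = o.order_id
    """
).fetchone()

print("naive:", naive)  # amount inflated to 250.0 (true total is 150.0)
print("safe: ", safe)   # amount is 150.0, quantity is 7 in both queries
```

An agent that writes the naive query by hand gets a plausible but wrong revenue figure, which is the failure mode a declarative metric request is meant to avoid.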

How ktx compares

	General-purpose agent	Traditional semantic layer	ktx
Builds warehouse context automatically	—	—	✓
Detects joinable columns + resolves fan/chasm traps	—	Manual	✓
Approved, reusable metric definitions	—	✓	✓
Absorbs wiki / Notion / team knowledge	—	—	✓
Flags contradictions across sources	—	—	✓
Ships CLI + MCP for agent execution	Partial	—	✓
Read-only by design	n/a	n/a	✓

Who is ktx for

Use ktx if you:

Want agents like Claude Code, Codex, Cursor, or OpenCode to query your warehouse with approved metric definitions
Have business knowledge scattered across dbt, Looker, Metabase, Notion, and team wikis
Need agents to reuse canonical SQL instead of inventing it on every prompt

Skip ktx if you:

Don't have a SQL warehouse; ktx sits on top of one
Only need one ad-hoc query; psql or a notebook will do

Works with PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, SQL Server, and SQLite. Integrates with dbt, MetricFlow, LookML, Looker, Metabase, and Notion.

Quick Start

npm install -g @kaelio/ktx
ktx setup
ktx status

ktx setup creates or resumes a local ktx project, configures providers and connections, builds context, and installs agent integration.

Example ktx status after setup:

ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main)
ktx context built: yes
Agent integration ready: yes (codex:project)
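
If you want to gate a script on readiness, the text above can be parsed line by line. The following is a minimal sketch that assumes the `label: yes (detail)` shape shown in the example; the `no` state and the `parse_status` helper are assumptions for illustration, and the real output format may differ or change between releases.

```python
import re
from typing import NamedTuple, Optional


class Check(NamedTuple):
    ready: bool
    detail: Optional[str]


# Matches lines such as "LLM ready: yes (claude-sonnet-4-6)".
# The "no" alternative is an assumption; the example only shows "yes".
_LINE = re.compile(
    r"^(?P<label>[^:]+):\s+(?P<state>yes|no)(?:\s+\((?P<detail>.*)\))?\s*$"
)


def parse_status(text: str) -> dict:
    """Return {label: Check} for every yes/no line; other lines are skipped."""
    checks = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            checks[match["label"]] = Check(match["state"] == "yes", match["detail"])
    return checks


SAMPLE = """\
ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
Databases configured: yes (warehouse)
Context sources configured: yes (dbt_main)
ktx context built: yes
Agent integration ready: yes (codex:project)
"""

checks = parse_status(SAMPLE)
not_ready = [label for label, check in checks.items() if not check.ready]
print("not ready:", not_ready or "nothing")
```

The `ktx project:` line carries a path rather than a yes/no state, so the parser skips it.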

Tip

Already using an agent? Ask Claude Code, Codex, Cursor, or OpenCode from your project directory:
Run npx skills add Kaelio/ktx --skill ktx and use the ktx skill to install
and configure ktx in this project.

Important

If ktx status prints ktx mcp start --project-dir ..., run it before opening your agent client.

First commands

Command	Purpose
`ktx setup`	Create, resume, or update a ktx project
`ktx status`	Check project readiness
`ktx ingest`	Build context for every configured connection
`ktx sl "revenue"`	Search semantic sources
`ktx wiki "refund policy"`	Search local wiki pages
`ktx mcp start`	Start the MCP server for agent clients

See the CLI Reference for every command, flag, and option.

Project Layout

my-project/
├── ktx.yaml                         # Project configuration
├── semantic-layer/<connection-id>/  # YAML semantic sources
├── wiki/global/                     # Shared business context
├── wiki/user/<user-id>/             # User-scoped notes
├── raw-sources/<connection-id>/     # Ingest artifacts and reports
└── .ktx/                            # Local state and secrets, git-ignored

Commit ktx.yaml, semantic-layer/, and wiki/. Keep .ktx/ local.

Project resolution defaults to KTX_PROJECT_DIR, then the nearest ktx.yaml, then the current directory. Pass --project-dir <path> when scripting.
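
The resolution order can be sketched as a small function. This is an illustrative reading of the sentence above, not ktx's actual implementation: `resolve_project_dir` is a hypothetical name, "nearest ktx.yaml" is assumed to mean walking up from the current directory through its parents, and an explicit `--project-dir` value is assumed to take precedence over everything else.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def resolve_project_dir(
    cwd: Path,
    env: Mapping[str, str] = os.environ,
    explicit: Optional[Path] = None,
) -> Path:
    """Hypothetical sketch of the documented lookup order."""
    # Assumption: an explicit --project-dir value wins outright.
    if explicit is not None:
        return explicit
    # 1. The KTX_PROJECT_DIR environment variable.
    from_env = env.get("KTX_PROJECT_DIR")
    if from_env:
        return Path(from_env)
    # 2. The nearest ktx.yaml, assumed to mean searching cwd and its parents.
    for candidate in (cwd, *cwd.parents):
        if (candidate / "ktx.yaml").is_file():
            return candidate
    # 3. Fall back to the current directory.
    return cwd


# Usage: a nested working directory inside a project that has ktx.yaml.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "analytics"
    nested = project / "models" / "staging"
    nested.mkdir(parents=True)
    (project / "ktx.yaml").write_text("# placeholder\n")

    found_by_walk = resolve_project_dir(nested, env={})
    found_by_env = resolve_project_dir(nested, env={"KTX_PROJECT_DIR": "/srv/ktx"})
    walk_matches_project = found_by_walk == project

print(walk_matches_project, found_by_env)
```

Because the result depends on both the environment and the working directory, passing `--project-dir` explicitly is the predictable choice in CI jobs and scripts.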

FAQ

Does ktx send my schema or query results to a hosted service? No. ktx runs locally. The only data leaving your machine is what you send to the LLM provider you configured.
Which LLM backends are supported? Anthropic API, Google Vertex AI, AI Gateway, and the local Claude Code session through the Claude Agent SDK. See LLM configuration.
How is ktx different from a dbt or MetricFlow semantic layer? ktx ingests those layers and combines them with raw-table introspection and wiki content. Agents get one searchable surface instead of three disconnected ones, and ktx flags contradictions across sources.
Does ktx need a running server? There is no hosted service. The local MCP daemon runs on demand via ktx mcp start when an agent client needs it.
Is my warehouse safe? Yes. Connections are read-only: ktx never writes to your database.

Docs

Community

Slack — ask questions, share what you're building, and chat with maintainers.
GitHub Issues — report bugs and request features.
Contributing — set up the repo, run tests, and open a PR.

Development

git clone https://github.com/kaelio/ktx.git
cd ktx
pnpm install
uv sync --all-groups
pnpm run build
pnpm run check

ktx is a pnpm + uv workspace:

Path	Purpose
`packages/cli`	TypeScript CLI and published npm package source
`packages/cli/src/context`	Core context engine
`packages/cli/src/llm`	LLM and embedding providers
`packages/cli/src/connectors`	Database scan connectors
`python/ktx-sl`	Semantic-layer query planning
`python/ktx-daemon`	Portable compute service

Local development CLI:

pnpm run setup:dev
pnpm run link:dev
ktx-dev --help

Useful checks:

pnpm run type-check
pnpm run test
pnpm run dead-code
uv run pytest -q

Telemetry

ktx collects anonymous usage telemetry from interactive CLI runs to improve setup, command reliability, and data-agent workflows. No file paths, hostnames, SQL, schema names, error messages, or argv are recorded. See Telemetry for the event catalog and opt-out options.

License

ktx is licensed under the Apache License, Version 2.0. See LICENSE.