2026-05-19 15:22:17 +02:00
# Goal
Set up KTX from scratch end-to-end as a fully autonomous, agent-driven replacement for the interactive `ktx setup` wizard. Detect the environment, install missing prerequisites, ask the user only for information you genuinely need (which connections to add, credentials), write a valid configuration, verify it works, and run a fast schema ingest. Keep the user updated throughout.
# Operating principles
- **Be autonomous.** Detect, decide, and act. Only ask the user when you need information that only they can provide: project location, which databases/sources to connect, credentials, and similar choices.
2026-05-20 00:43:07 +02:00
- **Stream short status updates.** Before each major phase ("Checking prerequisites…", "Installing uv…", "Configuring warehouse connection…", "Running fast ingest…") print a one-line update. Not chatty - just enough that the user can see what's happening.
2026-05-19 15:22:17 +02:00
- **Verify against docs, never guess.** CLI flags, config keys, and command names must come from the docs or from `ktx <command> --help` . If something looks wrong or missing, say so explicitly.
- **Print every command you run and its exit code.** Terse, not silent.
2026-05-20 00:43:07 +02:00
- **Fail loudly with cause + fix.** When a command fails: capture the exact error, identify the cause, change something, retry. Never retry an unchanged command. Exceptions for *known soft-failures* are listed in Phase 4 - handle those without retrying.
2026-05-19 15:22:17 +02:00
- **No LLM-based ingestion in this flow.** Only `--fast` ingest (schema-only). The user can run `--deep` later.
- **Platform-agnostic.** Detect the host OS first and pick the right install commands / path syntax. Anything path- or shell-specific must branch on OS.
# Authoritative docs
2026-05-20 00:43:07 +02:00
KTX docs are served at `https://docs.kaelio.com/ktx/` . **Start by fetching `https://docs.kaelio.com/ktx/llms.txt`** to discover the docs map. Scan it for a "troubleshooting" entry - if one exists, read it **before** running install/setup so you can apply known fixes preemptively rather than after failing. If no troubleshooting page is listed (current state of the docs), proceed. Then fetch any other `.md` pages you need (setup, ingest, status, connection types). **Never invent CLI flags or config keys** - verify against the docs or `ktx --help` / `ktx <subcommand> --help` .
2026-05-19 15:22:17 +02:00
2026-05-20 00:43:07 +02:00
> **Note on the `ktx status` JSON example in the docs.** The docs page for `ktx status` shows an example shaped like `{"title": "...", "checks": [...]}`. That example is outdated. The real CLI output uses a top-level `verdict` field plus a `connections[]` array - see Phase 5 for the canonical success criteria. Trust the shape in this prompt over the docs example.
2026-05-19 15:22:17 +02:00
# Workflow
2026-05-20 00:43:07 +02:00
## Phase 1 - Detect environment
2026-05-19 15:22:17 +02:00
Determine the host OS (e.g. via `uname -s` , `process.platform` , or `$env:OS` ). Use the right install commands per OS for the rest of this flow.
| Tool | macOS / Linux | Windows (PowerShell) |
|------|---------------|----------------------|
| `uv` | `curl -LsSf https://astral.sh/uv/install.sh \| sh` then re-source shell env | `irm https://astral.sh/uv/install.ps1 \| iex` |
2026-05-20 00:43:07 +02:00
| Node.js | use system / fnm / nvm - **do not** auto-install | use system / nvm-windows - **do not** auto-install |
2026-05-19 15:22:17 +02:00
| KTX CLI | `npm install -g …` (see Phase 2) | `npm install -g …` (see Phase 2) |
If Node.js is missing, **stop and ask the user** to install it (https://nodejs.org/). Do not attempt to auto-install Node.
2026-05-20 00:43:07 +02:00
## Phase 2 - Verify and install prerequisites
2026-05-19 15:22:17 +02:00
Check each tool in order; install only if missing.
2026-05-20 00:43:07 +02:00
1. **Node.js** - run `node --version` . Require >= 22. If missing or older, stop and instruct the user.
2. ** `uv` ** - run `uv --version` . If missing, run the OS-appropriate install command, then re-source the shell environment (`export PATH="$HOME/.local/bin:$PATH"` on Linux/macOS) so `uv` is on `PATH` .
3. **KTX CLI** -
2026-05-19 15:22:17 +02:00
- Install ktx with `npm install -g @kaelio/ktx`
- Verify with `ktx --version` .
Print one status line per tool ("✓ uv 0.11.15 found", "Installing uv…", "✓ ktx 0.x.y installed").
2026-05-20 00:43:07 +02:00
## Phase 3 - Gather user choices
2026-05-19 15:22:17 +02:00
Ask the user (grouped if your harness supports it; otherwise sequentially):
1. **Project directory.** Default: current working directory. Confirm before continuing.
2. **LLM provider.** Default: `claude-code` with model `sonnet` (the user is already inside Claude Code; no extra API key needed). Offer `anthropic` (paste API key, stored as `env:` or `file:` ref) and `vertex` (GCP project + location) as alternatives. Skip if defaults are accepted.
3. **Embeddings backend.** Default: `sentence-transformers` (local, no API key, managed Python runtime). Offer `openai` only if the user has a key.
4. **Database connections.** Ask how many to add, then loop. For each, collect:
- Connection name (e.g. `warehouse` , `analytics` ).
docs: rewrite Semantic Querying concept with imperative-vs-declarative diagram (#156)
* docs: rewrite Semantic Querying concept with imperative-vs-declarative diagram
Reframe semantic-layer-internals.mdx around the contract the semantic
layer offers an agent: declare what you want (a Semantic Query), KTX
figures out how to compute it. Replaces the old "Context-Aware SQL"
framing with a clear imperative-vs-declarative narrative.
Adds a React Flow component (semantic-layer-flow.tsx) that contrasts a
buggy 4-table agent-authored SQL (chasm trap, LEFT-JOIN-in-WHERE,
hardcoded DATE_TRUNC) against the chasm-safe per-fact CTE SQL the
planner actually emits, including the outer GROUP BY over the requested
dimensions. Both lanes converge into a shared warehouse node and each
SQL card now has parallel bullet notes (failures on the left, KTX
behavior on the right).
Side fixes bundled in:
- include the /ktx basePath in the favicon metadata so the icon resolves
under the production prefix
- migrate docs-site/middleware.ts to docs-site/proxy.ts (Next 16 rename)
- redirect / to /ktx/docs/getting-started/introduction so the apex docs
URL works
- add tests covering the apex redirect, the favicon basePath, and the
middleware-to-proxy rename
- propagate the Semantic Query terminology across the ktx-sl CLI
reference, the context-layer concept page, and the agent-clients /
primary-sources integration pages
* Fix CI dead-code failures
* docs-site: polish semantic-layer-internals code blocks and flow diagram
- Make CodeBlock a server component so children traverse synchronously
under React 19 RSC streaming; previously extractText returned "" in
dev SSR, leaving code blocks empty.
- Add custom JSON/YAML/SQL/code-like tokenizers with theme-aware token
classes; drop the colored file-glyph dot and gradient tab-head.
- Tighten tab-head: subtle grey background, smaller monospace filename
in muted grey, smaller rectangular language pill placed to the left
of the filename.
- Polish the React Flow semantic-layer diagram (controls, fit-view
padding, edge types).
* docs-site: annotate imperative SQL, add section anchor, drop ClickHouse
- Wire numbered red badges to each problematic span in the "Without KTX"
SQL with hover sync between SQL gutter, lines, and the notes list.
- Add #imperative-vs-declarative anchor on the flow section header so
the eyebrow link is shareable; reveals a # glyph on hover/focus.
- Align the compiled-SQL note dots to the first-line midpoint
(mt-[6px] instead of mt-1) so 4px dots sit at y=8 in a 16px line.
- Remove all ClickHouse references from docs-site (primary-sources,
quickstart, ktx-setup, contributing, agents-setup, mechanics test,
warehouse drivers in the flow diagram).
* test: drop ClickHouse contributing-docs assertion
Align the workspace-package mirror test with the ClickHouse removal
from docs-site (75907eb). The connector-clickhouse package still
exists in packages/, but contributing.mdx no longer lists it, so the
test that mirrored docs against the workspace was failing.
2026-05-19 23:41:29 +02:00
- Driver: one of `sqlite` , `postgres` , `mysql` , `sqlserver` , `bigquery` , `snowflake` .
2026-05-19 15:22:17 +02:00
- Connection URL/DSN (or service-account file for BigQuery). Accept `env:VAR_NAME` or `file:/abs/path` to avoid pasting raw secrets.
2026-05-20 00:43:07 +02:00
- **Heads-up for the user**: even if they paste a literal URL, KTX will silently relocate it into `<project>/.ktx/secrets/<connection>-url` and rewrite `ktx.yaml` to `url: file:…` - this is correct, secure behavior and not a bug.
2026-05-19 15:22:17 +02:00
- Schemas / datasets to include (postgres / sqlserver / snowflake / bigquery only).
- Optional `enabled_tables` allowlist if the user wants to scope ingest to specific tables.
5. **BI / metadata sources** (dbt, Metabase, Looker, LookML, MetricFlow, Notion). Default: none. Ask only if the user mentions them.
2026-05-20 00:43:07 +02:00
## Phase 4 - Configure the project
2026-05-19 15:22:17 +02:00
2026-05-20 00:43:07 +02:00
Drive the existing wizard non-interactively (verify exact flag names with `ktx setup --help` and the docs - the automation flags are hidden from help but accepted):
2026-05-19 15:22:17 +02:00
```
ktx setup \
--project-dir < path > \
--no-input --yes \
--llm-backend < claude-code | anthropic | vertex > --llm-model < model > \
[--anthropic-api-key-env ANTHROPIC_API_KEY | --anthropic-api-key-file < path > ] \
[--vertex-project < p > --vertex-location < loc > ] \
--embedding-backend < sentence-transformers | openai > \
[--embedding-api-key-env OPENAI_API_KEY] \
--skip-sources \
2026-05-19 19:23:35 +02:00
--database < driver > --database-connection-id < name > --database-url < url | env:VAR | file: / path > \
2026-05-19 15:22:17 +02:00
[--database-schema < schema > …]
```
Notes on the flags above:
2026-05-19 19:23:35 +02:00
- **Project creation is automatic with `--no-input --yes` .** When
`ktx.yaml` exists, setup resumes it. When it doesn't exist, setup creates it
at `--project-dir` .
- **`--database-connection-id` is dual-purpose.** With `--database` or
`--database-url` , it names the new connection. Without those flags, it
selects an existing connection id.
- **Configure one new database connection per setup command.** If the user
wants multiple new connections, run setup again for each connection.
- **You don't need `--skip-agents` in this flow.** The agent integration step
is opt-in: setup leaves it alone unless you pass `--agents --target
< target > `.
2026-05-19 15:22:17 +02:00
- **`--skip-sources` ** is correct and is the documented way to leave BI/metadata sources unconfigured.
### Known soft-failure: `ktx setup` exits 1 after a successful fast build
When you select a configuration that only does fast (schema-only) ingest, `ktx setup` 's final readiness verification fails with:
```
KTX context build did not pass agent-readiness verification.
< connection > : deep database context has not completed.
```
This is **expected** and **does not mean setup failed** . Treat the exit code as a soft-failure **only if all of the following hold** :
- The build log shows the fast ingest reached `[100%] Scan completed` for every configured connection.
- `ktx connection test <name>` (run next) exits 0 for every connection.
- `ktx status --json --no-input` reports `verdict: "ready"` .
2026-05-20 00:43:07 +02:00
If those three conditions hold, proceed to Phase 5 without retrying setup, and **do not** switch to `--deep` to "fix" the readiness gate - deep ingest is explicitly out of scope. Mention this in the final report under "Docs / CLI gaps" so the user is aware.
2026-05-19 15:22:17 +02:00
2026-05-20 00:43:07 +02:00
If any of those three conditions do not hold, this is a real failure - capture the error, fetch the relevant docs page, fix the cause, retry.
2026-05-19 15:22:17 +02:00
After `ktx setup` writes `ktx.yaml` , edit it directly for anything flags don't cover:
- Per-connection `enabled_tables` allowlist (snake_case, under `connections.<name>.enabled_tables` ).
- Any advanced settings the user requested.
2026-05-20 00:43:07 +02:00
Use a YAML-aware editor (e.g. `uv run python -c "import yaml; …"` ) - do not hand-edit blindly.
2026-05-19 15:22:17 +02:00
2026-05-20 00:43:07 +02:00
## Phase 5 - Verify
2026-05-19 15:22:17 +02:00
`ktx setup` already runs a fast schema ingest of every database connection it configures, so you do not need to re-ingest by default. For each configured connection:
```
ktx connection test < connection-name > # must exit 0
```
Only re-run ingest if setup's build log did **not** reach 100% for that connection:
```
ktx ingest < connection-name > --fast --no-input
```
**Mutex warning on `ktx ingest` **: passing both `--yes` and `--no-input` fails with `Choose only one runtime install mode: --yes or --no-input` . Setup already installed the managed Python runtime, so pass **only `--no-input`** to `ktx ingest` . (`--yes` is only needed when an ingest invocation has to install the runtime itself, which is not the case here.)
Then run the global health check:
```
ktx status --json --no-input
```
2026-05-20 00:43:07 +02:00
Success requires (canonical shape - supersedes the example in the docs):
2026-05-19 15:22:17 +02:00
- `verdict: "ready"` at the top of the JSON.
- Every `connections[].status === "ok"` .
- `ktx connection test <name>` exited 0 for every connection.
2026-05-20 00:43:07 +02:00
Do **not** run `--deep` ingest in this flow - that requires LLM time and is out of scope.
2026-05-19 15:22:17 +02:00
refactor(release): drop release-policy.json runtime dep and next branch (#180)
* chore: standardize daemon naming on "KTX daemon"
Replace inconsistent names ("KTX Python daemon", "KTX local embeddings
daemon", "KTX managed daemon", "Python daemon") with the single name
"KTX daemon" in CLI output, errors, command descriptions, test
assertions, smoke scripts, docs, AGENTS.md, issue templates, and
codecov flags. The daemon is a portable compute server with endpoints
for SQL analysis, semantic layer, LookML, database introspection, and
embeddings; the previous labels misrepresented it as embeddings-only or
exposed implementation details ("Python", "managed").
The "KTX Python runtime" concept (installed interpreter + packages) is
deliberately left as-is — it is a separate concept from the daemon
process.
* refactor(release): drop release-policy.json runtime dep and next branch
Strips the release-policy.json fallback from release-version.ts so the CLI
reads its version straight from packages/cli/package.json. dev → 0.0.0-private,
installed @kaelio/ktx → the real semver baked into the published package.json.
KtxCliPackageInfo collapses to { name, version, contextPackageName }; /health
no longer depends on version files surviving past a CI run.
Replaces the dual-branch (main + next) semantic-release model with a single-
branch model on main. rcs and stables interleave on the same branch via
{ name: 'main', prerelease: 'rc', channel: 'next' } / ['main']. Drops
@semantic-release/git and @semantic-release/changelog (nothing is committed
back to the repo on any channel) and the workflow's "Prepare next prerelease
branch" step plus the KTX_PRERELEASE_BRANCH plumbing. The git tag plus the
published npm artifact carry the version forward.
Updates docs/release.md, removes the two now-unused devDeps, regenerates
pnpm-lock.yaml. 611/611 @ktx/cli tests, 173/173 script tests, type-check,
biome, knip all clean.
* fix(release): don't throw on non-main branches at config-load time
knip loads .releaserc.cjs on every PR run, where GITHUB_REF_NAME is the
merge ref (e.g. 180/merge). The previous version of releaseBranches threw
immediately when the branch wasn't main, which made knip fail to evaluate
the config and then mis-flag @semantic-release/exec as an unused dep.
semantic-release already refuses to publish when the current branch doesn't
match a configured release branch, so the explicit throw was redundant.
Drop it (and the unused currentBranch helper) and replace the
"rejects releases from non-main" assertion with one that exercises a CI-
shaped GITHUB_REF_NAME and confirms the config loads.
2026-05-20 13:53:14 +02:00
### Optional: directly probe the KTX daemon
2026-05-19 15:22:17 +02:00
If the user asks for stronger verification that `sentence-transformers` is actually serving (not just that setup said "ok"), do all of:
2026-05-20 01:36:54 +02:00
1. `ktx admin runtime status --json` → expect `"kind": "ready"` and `"features": [..., "local-embeddings"]` .
2026-05-19 15:22:17 +02:00
2. `pgrep -fa ktx-daemon` → expect a process running `ktx-daemon serve-http` .
3. `curl -sS http://127.0.0.1:<port>/health` → expect HTTP 200 with `{"status":"healthy",…}` .
4. `curl -sS -X POST http://127.0.0.1:<port>/embeddings/compute -H 'content-type: application/json' -d '{"text":"hello"}'` → expect `{"embedding": [...384 floats...]}` .
refactor(release): drop release-policy.json runtime dep and next branch (#180)
* chore: standardize daemon naming on "KTX daemon"
Replace inconsistent names ("KTX Python daemon", "KTX local embeddings
daemon", "KTX managed daemon", "Python daemon") with the single name
"KTX daemon" in CLI output, errors, command descriptions, test
assertions, smoke scripts, docs, AGENTS.md, issue templates, and
codecov flags. The daemon is a portable compute server with endpoints
for SQL analysis, semantic layer, LookML, database introspection, and
embeddings; the previous labels misrepresented it as embeddings-only or
exposed implementation details ("Python", "managed").
The "KTX Python runtime" concept (installed interpreter + packages) is
deliberately left as-is — it is a separate concept from the daemon
process.
* refactor(release): drop release-policy.json runtime dep and next branch
Strips the release-policy.json fallback from release-version.ts so the CLI
reads its version straight from packages/cli/package.json. dev → 0.0.0-private,
installed @kaelio/ktx → the real semver baked into the published package.json.
KtxCliPackageInfo collapses to { name, version, contextPackageName }; /health
no longer depends on version files surviving past a CI run.
Replaces the dual-branch (main + next) semantic-release model with a single-
branch model on main. rcs and stables interleave on the same branch via
{ name: 'main', prerelease: 'rc', channel: 'next' } / ['main']. Drops
@semantic-release/git and @semantic-release/changelog (nothing is committed
back to the repo on any channel) and the workflow's "Prepare next prerelease
branch" step plus the KTX_PRERELEASE_BRANCH plumbing. The git tag plus the
published npm artifact carry the version forward.
Updates docs/release.md, removes the two now-unused devDeps, regenerates
pnpm-lock.yaml. 611/611 @ktx/cli tests, 173/173 script tests, type-check,
biome, knip all clean.
* fix(release): don't throw on non-main branches at config-load time
knip loads .releaserc.cjs on every PR run, where GITHUB_REF_NAME is the
merge ref (e.g. 180/merge). The previous version of releaseBranches threw
immediately when the branch wasn't main, which made knip fail to evaluate
the config and then mis-flag @semantic-release/exec as an unused dep.
semantic-release already refuses to publish when the current branch doesn't
match a configured release branch, so the explicit throw was redundant.
Drop it (and the unused currentBranch helper) and replace the
"rejects releases from non-main" assertion with one that exercises a CI-
shaped GITHUB_REF_NAME and confirms the config loads.
2026-05-20 13:53:14 +02:00
Discover the port from setup's log line `Started KTX daemon: http://127.0.0.1:<port>` or from the daemon's OpenAPI at `GET /openapi.json` . Note: the routes are `/health` and `/embeddings/compute` - not `/healthz` or `/embeddings` .
2026-05-19 15:22:17 +02:00
2026-05-20 00:43:07 +02:00
## Phase 6 - Final report
2026-05-19 15:22:17 +02:00
Print a structured report:
```
KTX SETUP COMPLETE
Project: < path >
LLM: < backend > / < model >
Embeddings: < backend > / < model >
refactor(release): drop release-policy.json runtime dep and next branch (#180)
* chore: standardize daemon naming on "KTX daemon"
Replace inconsistent names ("KTX Python daemon", "KTX local embeddings
daemon", "KTX managed daemon", "Python daemon") with the single name
"KTX daemon" in CLI output, errors, command descriptions, test
assertions, smoke scripts, docs, AGENTS.md, issue templates, and
codecov flags. The daemon is a portable compute server with endpoints
for SQL analysis, semantic layer, LookML, database introspection, and
embeddings; the previous labels misrepresented it as embeddings-only or
exposed implementation details ("Python", "managed").
The "KTX Python runtime" concept (installed interpreter + packages) is
deliberately left as-is — it is a separate concept from the daemon
process.
* refactor(release): drop release-policy.json runtime dep and next branch
Strips the release-policy.json fallback from release-version.ts so the CLI
reads its version straight from packages/cli/package.json. dev → 0.0.0-private,
installed @kaelio/ktx → the real semver baked into the published package.json.
KtxCliPackageInfo collapses to { name, version, contextPackageName }; /health
no longer depends on version files surviving past a CI run.
Replaces the dual-branch (main + next) semantic-release model with a single-
branch model on main. rcs and stables interleave on the same branch via
{ name: 'main', prerelease: 'rc', channel: 'next' } / ['main']. Drops
@semantic-release/git and @semantic-release/changelog (nothing is committed
back to the repo on any channel) and the workflow's "Prepare next prerelease
branch" step plus the KTX_PRERELEASE_BRANCH plumbing. The git tag plus the
published npm artifact carry the version forward.
Updates docs/release.md, removes the two now-unused devDeps, regenerates
pnpm-lock.yaml. 611/611 @ktx/cli tests, 173/173 script tests, type-check,
biome, knip all clean.
* fix(release): don't throw on non-main branches at config-load time
knip loads .releaserc.cjs on every PR run, where GITHUB_REF_NAME is the
merge ref (e.g. 180/merge). The previous version of releaseBranches threw
immediately when the branch wasn't main, which made knip fail to evaluate
the config and then mis-flag @semantic-release/exec as an unused dep.
semantic-release already refuses to publish when the current branch doesn't
match a configured release branch, so the explicit throw was redundant.
Drop it (and the unused currentBranch helper) and replace the
"rejects releases from non-main" assertion with one that exercises a CI-
shaped GITHUB_REF_NAME and confirms the config loads.
2026-05-20 13:53:14 +02:00
Runtime: managed Python ✓ (if the KTX daemon was started)
2026-05-19 15:22:17 +02:00
Connections:
- < name > (< driver > ) status=ok schemas=[…] tables=< N >
- …
Sources: < list or " none " >
Verdict: ready
```
Then **Next steps** (copy-pasteable):
1. Enrich with AI descriptions and embeddings: `ktx ingest <connection> --deep` (several minutes per connection).
2026-05-19 19:23:35 +02:00
2. Add more connections later by rerunning this setup or via `ktx setup --database … --database-connection-id …` .
2026-05-20 00:43:07 +02:00
3. Configure BI sources (dbt, Metabase, Looker, LookML, MetricFlow, Notion) - see `ktx setup --help` for `--source …` flags.
2026-05-19 15:22:17 +02:00
4. Install agent integration: `ktx setup --agents --target <claude-code|claude-desktop|codex|cursor|opencode|universal>` (with optional `--global` for `claude-code` /`codex` ).
5. Connect the agent / MCP: see docs at `https://docs.kaelio.com/ktx/` .
Under **Docs / CLI gaps to flag** include any of these that applied during your run:
- `ktx setup` exits non-zero after a successful fast build (deep-readiness gate); status reports ready.
- `ktx ingest` rejects `--yes` and `--no-input` together; docs don't note the conflict.
- `ktx status --json` real shape (`verdict` , `connections[]` ) doesn't match the example in the docs page.
- The pasted DB URL was moved to `.ktx/secrets/<name>-url` automatically.
2026-05-20 00:43:07 +02:00
End with a single line: `RESULT: PASS` or `RESULT: FAIL - <one-line reason>` .
2026-05-19 15:22:17 +02:00
# Operating rules (recap)
- Print every command you run and its exit code. Status updates may be terse, but never silent.
- On failure: capture the error, fetch the relevant docs page, fix the cause, retry. Never retry an unchanged command.
2026-05-20 00:43:07 +02:00
- Known soft-failures (listed in Phase 4 and Phase 5) are not real failures - handle them as documented; do not retry or escalate.
2026-05-19 15:22:17 +02:00
- If you find a docs/CLI gap ("docs say X but CLI does Y"), call it out in the final report.
2026-05-20 00:43:07 +02:00
- Never commit credentials - KTX accepts `env:` and `file:` references; prefer those. KTX will also auto-relocate literal URLs into `.ktx/secrets/` , but that does not protect anyone who pasted the URL into chat history.