From 6c6a3e7bafccdc5c66873be6790882ba46f45f81 Mon Sep 17 00:00:00 2001
From: Andrey Avtomonov <andreybavt@gmail.com>
Date: Thu, 28 May 2026 15:36:56 +0200
Subject: [PATCH] docs(skills): correct ktx setup skill against agent-trial
 findings (#230)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

An external agent ran the skill end-to-end against `ktx setup` and reported
seven concrete failures, all verified against the CLI source:

- All useful setup flags are marked `.hideHelp()`, so the skill's "verify
  with --help" rule led the agent to conclude its own examples were wrong
  (setup-commands.ts:208-332).
- The non-interactive LLM default is `anthropic` (and requires a key), not
  `claude-code` as the skill claimed (setup-models.ts:505-507).
- `ktx status` exits 1 whenever the LLM is `none`, even with healthy
  embeddings and connections (status-project.ts:204-211, doctor.ts:647).
- `ktx ingest` rejects `--yes`+`--no-input` while `ktx setup` accepts both
  (managed-python-command.ts:23-24).
- `--database-url <raw>` auto-externalizes to `.ktx/secrets/<id>-url` —
  worth telling the agent (setup-databases.ts:671-683).
- Resuming setup with only `--llm-backend` fails on missing DB flags even
  when `ktx.yaml` already has one (setup-databases.ts:1778-1782).
- The `--agents` step prints `Required before using agents: ktx mcp start`
  but the skill never told agents to run it (setup-agents.ts:989,1227).

Rewrite SKILL.md to: lead with the scripted (non-interactive) path; add a
single "gather inputs once" checklist; correct the LLM default; document
`--skip-*` flags and resumability; warn that `status` exit code ≠
readiness; fix the `ktx ingest` example to use `--no-input` only; require
`ktx mcp start` after `--agents`; add a ktx-monorepo branch that avoids
`npm install -g`.
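
A minimal sketch of the "exit code ≠ readiness" rule, assuming the
`ktx status --json` output carries the top-level `verdict` field that the
rewritten SKILL.md relies on. The helper name `ktx_status_ready` is
hypothetical, and it only inspects `verdict`; per-connection status would
still need its own check:

```bash
# Sketch: judge readiness from the JSON text on stdin, not the exit code.
# Assumption: the JSON contains a top-level "verdict" field.
ktx_status_ready() {
  # Succeeds only when the input contains "verdict": "ready".
  grep -Eq '"verdict"[[:space:]]*:[[:space:]]*"ready"'
}

# Hypothetical usage: the pipeline's result comes from the helper, so a
# non-zero exit from `ktx status` alone does not count as failure.
# ktx status --json --no-input | ktx_status_ready && echo "ready"
```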

Add skills/ktx/troubleshooting.md (one level deep, per Anthropic's
progressive-disclosure guidance) covering the five real failure signatures
the agent hit: invalid ELF header, missing native CLI binary, missing
Anthropic key, claude-code probe failure, and the resume-without-DB error.
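
As an illustration of matching on failure signatures, the sketch below
maps the four signatures with a fixed error string to a short fix label.
The function name and the labels are hypothetical; the claude-code probe
failure has no single fixed string in this patch, so it falls through to
`unknown`:

```bash
# Sketch: map a ktx error message to a fix label (labels are illustrative).
classify_ktx_error() {
  case "$1" in
    *"invalid ELF header"*)                 echo "rebuild-better-sqlite3" ;;
    *"Native CLI binary for"*"not found"*)  echo "reinstall-cli" ;;
    *"Missing Anthropic API key"*)          echo "set-llm-backend-or-key" ;;
    *"KTX cannot work without a database"*) echo "re-pass-database-flags" ;;
    *)                                      echo "unknown" ;;
  esac
}
```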

Description rewritten to combine what + when per the official skill
authoring guidelines.
---
 skills/ktx/SKILL.md           | 164 ++++++++++++++++++++--------------
 skills/ktx/troubleshooting.md |  79 ++++++++++++++++
 2 files changed, 174 insertions(+), 69 deletions(-)
 create mode 100644 skills/ktx/troubleshooting.md
diff --git a/skills/ktx/SKILL.md b/skills/ktx/SKILL.md
index 4a2b48a3..85028de7 100644
--- a/skills/ktx/SKILL.md
+++ b/skills/ktx/SKILL.md
@@ -1,100 +1,113 @@
 ---
 name: ktx
-description: Use when installing, configuring, verifying, or debugging ktx in a project, including ktx setup, ktx.yaml, database connectors, embeddings, agent integration, ingest, and ktx status checks.
+description: Installs and configures ktx, the open-source context layer for data agents — runs ktx setup non-interactively with hidden CLI flags, configures database connections and embeddings, installs agent integration, and verifies readiness. Use when the user asks an agent to add ktx to a project, connect data sources, install agent rules, ingest schema, or troubleshoot a local ktx install.
 ---
 
 # ktx
 
 Install and configure **ktx**, the open-source context layer for data agents.
 Use this skill when a user wants an agent to add **ktx** to a project, connect
-data sources, build initial context, install agent rules, or troubleshoot a
-local **ktx** setup.
+data sources, build initial context, install agent integration, or troubleshoot
+a local **ktx** setup.
 
 ## Operating rules
 
 - Act autonomously when the user asks you to install or configure **ktx**.
-- Ask only for choices or values you cannot infer: project directory,
-  connection targets, credentials, account identifiers, and source selections.
+  The non-interactive scripted flow below is the canonical path — bare
+  `ktx setup` is interactive (clack prompts) and an agent cannot drive it.
+- Setup's non-interactive flags are intentionally hidden from `--help`. Use the
+  flags listed below; verify uncommon flags against the docs at
+  `https://docs.kaelio.com/ktx/` or this skill — not against `--help` output.
+- Ask only for values you cannot infer: project directory, connection targets,
+  credentials, account identifiers, and source selections.
 - Never ask the user to paste secrets when an `env:VAR_NAME` or `file:/path`
-  reference would work.
-- Do not commit `.ktx/secrets/*` or pasted credentials.
-- Verify CLI flags and config keys with `ktx --help`, `ktx <command> --help`,
-  or the docs at `https://docs.kaelio.com/ktx/` before using unfamiliar
-  options.
-- Print or report each command you run and its result when doing setup work.
+  reference would work. Pasting a literal URL is also safe — `ktx setup`
+  auto-externalizes URLs into `.ktx/secrets/<id>-url` (see workflow step 2).
+- Do not commit `.ktx/secrets/*`.
+- Print each command you run and its result.
 - If a command fails, identify the cause and change something before retrying.
 
+## Gather inputs once
+
+Before invoking `ktx setup`, collect in one round:
+
+1. Project directory (default: current working directory).
+2. LLM backend and key strategy. In `--no-input` mode the CLI defaults to
+   `anthropic` and **requires an API key**. When the user is inside Claude
+   Code, pass `--llm-backend claude-code` explicitly; otherwise pass
+   `--llm-backend anthropic --anthropic-api-key-env ANTHROPIC_API_KEY`.
+3. Embedding backend (`sentence-transformers` is the local default and needs
+   no key; use `openai` only if the user already has a key, then pass
+   `--embedding-api-key-env OPENAI_API_KEY`).
+4. Database: driver, connection id, URL (or `env:` / `file:` ref), and one or
+   more schemas.
+5. Optional context sources (dbt, Metabase, Looker, LookML, MetricFlow,
+   Notion). Skip with `--skip-sources` if the user has none.
+
+Do not discover these inputs across multiple setup runs.
+
 ## Install workflow
 
-Use this workflow for a new or resumed project setup:
-
-1. Confirm the project directory. Default to the current working directory.
-2. Check prerequisites:
-   - Node.js with `node --version`; require Node 22 or newer.
-   - `uv` with `uv --version`; install it only if missing and local Python
-     runtime features are needed.
-   - **ktx** with `ktx --version`; install the published CLI if missing.
-3. Install the published CLI when needed:
+1. **Detect the install path.** If the working directory contains
+   `packages/cli/dist/bin.js` or a `pnpm-workspace.yaml` referencing
+   `@kaelio/ktx`, you are inside the **ktx** monorepo: build and link the
+   local CLI with `pnpm` and do **not** run `npm install -g`. Otherwise:
 
    ```bash
-   npm install -g @kaelio/ktx
+   node --version    # require >= 22; stop and ask the user if older
+   ktx --version || npm install -g @kaelio/ktx
    ```
 
-4. Run interactive setup when the user is present:
+2. **Run scripted setup** (canonical path):
 
    ```bash
-   ktx setup
+   ktx setup --no-input --yes \
+     --project-dir <path> \
+     --llm-backend claude-code \
+     --embedding-backend sentence-transformers \
+     --database <driver> --database-connection-id <id> \
+     --database-url '<raw-url | env:NAME | file:/abs/path>' \
+     --database-schema <schema> \
+     --skip-sources
    ```
 
-5. For scripted setup, prefer `ktx setup --no-input --yes` with explicit flags.
-   Verify exact flags with `ktx setup --help` and the docs first.
-6. Configure one new database connection per scripted setup command. For
-   multiple connections, rerun setup once per connection.
-7. Run fast ingest by default. Do not run deep ingest unless the user asks for
-   LLM-backed enrichment.
-8. Install or repair agent integration after project setup:
+   - Configure one new database connection per setup invocation. For multiple
+     connections, rerun setup once per connection.
+   - Pasting a literal `--database-url` is safe: the CLI relocates the URL
+     into `.ktx/secrets/<connection-id>-url` and rewrites `ktx.yaml` to a
+     `file:` ref automatically.
+
+3. **Resumability and `--skip-*`.** Re-running `ktx setup` against an existing
+   project resumes its config. Use `--skip-llm`, `--skip-databases`,
+   `--skip-sources`, or `--skip-embeddings` to leave a slice unconfigured but
+   let the rest complete instead of aborting on the first failure. **When
+   resuming an existing project to change one slice (e.g. only LLM), still
+   pass the database flags from the previous run** — setup validates current
+   flags, not persisted `ktx.yaml` state.
+
+4. **Run fast ingest** if setup did not already complete one:
 
    ```bash
-   ktx setup --agents
+   ktx ingest <connection-id> --fast --no-input
    ```
 
-9. Verify readiness:
+   Note: `ktx ingest` rejects `--yes` together with `--no-input`
+   (*Choose only one runtime install mode*); `ktx setup` accepts both. Use
+   `--no-input` only for ingest. Do not run `--deep` ingest unless the user
+   explicitly asks for LLM-backed enrichment.
+
+5. **Install agent integration:**
 
    ```bash
-   ktx status
+   ktx setup --agents --target <claude-code|claude-desktop|codex|cursor|opencode|universal>
+   ktx mcp start --project-dir <path>
    ```
 
-   Use `ktx status --json` when you need structured success criteria.
+   Agent integration is **not usable until `ktx mcp start` is running**. The
+   `--agents` step prints this requirement as `Required before using agents`.
 
-## Common setup choices
-
-Default choices are usually:
-
-- LLM: `claude-code` if the user is already running Claude Code, otherwise ask.
-- Embeddings: `sentence-transformers` for local embeddings with no API key, or
-  `openai` when the user wants hosted embeddings and has an API key.
-- Databases: SQLite, PostgreSQL, MySQL, SQL Server, BigQuery, Snowflake, or
-  ClickHouse.
-- Context sources: dbt, MetricFlow, LookML, Looker, Metabase, or Notion.
-
-Use `env:` or `file:` references for credentials:
-
-```bash
-ktx setup \
-  --project-dir ./analytics \
-  --no-input \
-  --yes \
-  --database postgres \
-  --database-connection-id warehouse \
-  --database-url env:DATABASE_URL \
-  --database-schema public
-```
-
-Then build or refresh fast context if setup did not already do it:
-
-```bash
-ktx ingest warehouse --fast --no-input
-```
+6. **Fall back to bare `ktx setup` only when a human is at the keyboard** —
+   it uses interactive prompts an agent cannot answer.
 
 ## Files to inspect
 
@@ -108,17 +121,30 @@ ktx ingest warehouse --fast --no-input
 
 ## Verification
 
-After setup, run the smallest checks that cover the configured surface:
+After setup, run:
 
 ```bash
 ktx connection test <connection-id>
-ktx status --json
+ktx status --json --no-input
 ```
 
-Success means the project is ready, configured connections report healthy, and
-the agent integration target requested by the user is installed. If fast setup
-completed but deep context readiness is still missing, report that as the next
-optional enrichment step rather than retrying setup unchanged.
+**Judge readiness from `ktx status --json` fields, not the exit code.**
+`ktx status` exits 1 whenever the LLM is `none`, even when embeddings and
+every database connection are healthy. Treat success as:
+
+- `verdict: "ready"` at the top of the JSON, and
+- every `connections[].status === "ok"`, and
+- every `ktx connection test <id>` exited 0.
+
+A non-zero exit caused only by an unconfigured LLM still leaves a usable
+context layer; report it as "ready, LLM optional" rather than retrying setup.
+
+## Troubleshooting
+
+For known failure signatures (`invalid ELF header`,
+`Native CLI binary for <plat> not found`, `Missing Anthropic API key`,
+`claude-code` probe failure, `KTX cannot work without a database` on resume),
+see [troubleshooting.md](troubleshooting.md).
 
 ## Final report
 
diff --git a/skills/ktx/troubleshooting.md b/skills/ktx/troubleshooting.md
new file mode 100644
index 00000000..812b45fc
--- /dev/null
+++ b/skills/ktx/troubleshooting.md
@@ -0,0 +1,79 @@
+# ktx setup troubleshooting
+
+Known failure signatures hit by agent-driven `ktx setup` runs. Match the
+error string against a heading below, then apply the fix listed under it.
+
+## `Error: invalid ELF header` from `better-sqlite3`
+
+Native module compiled for a different platform or architecture (e.g.
+installed under Rosetta then run under native arm64).
+
+Fix:
+
+```bash
+# Inside the ktx monorepo:
+pnpm rebuild better-sqlite3
+
+# Or for a global install:
+npm rebuild --global better-sqlite3
+```
+
+## `Native CLI binary for <plat> not found`
+
+The platform-specific optional dependency that ships the native CLI binary
+was skipped during install (npm/pnpm "optional dep not for this platform").
+
+Fix:
+
+```bash
+npm install -g @kaelio/ktx --force
+```
+
+## `Missing Anthropic API key: pass --anthropic-api-key-env or --anthropic-api-key-file`
+
+`--no-input` mode defaulted the LLM backend to `anthropic` because no
+`--llm-backend` flag was supplied. The CLI then required a key.
+
+Fix — pick one:
+
+```bash
+# Inside Claude Code, prefer the local backend:
+ktx setup --no-input --llm-backend claude-code ...other flags...
+
+# Otherwise point at an existing env var:
+ktx setup --no-input --llm-backend anthropic \
+  --anthropic-api-key-env ANTHROPIC_API_KEY ...other flags...
+```
+
+## `claude-code` LLM probe fails (auth or binary not found)
+
+The `claude` CLI is not on the agent's `PATH`, or the user has not run
+`claude` interactively at least once to log in.
+
+Fix:
+
+```bash
+which claude            # confirm the binary resolves
+claude --version        # confirm it runs
+# If auth probe still fails, the user must run `claude` once interactively
+# to complete login; agents cannot do this step.
+```
+
+If `claude-code` cannot be made to work, fall back to `--skip-llm` and let
+the rest of setup complete; the project is still a usable context layer
+without an LLM.
+
+## `KTX cannot work without a database` when resuming setup
+
+`ktx setup` validates the **current invocation's flags**, not the persisted
+`ktx.yaml`. Resuming setup with only `--llm-backend …` fails even when the
+project already has a healthy database connection.
+
+Fix — re-pass the database flags from the original setup run, even when
+only changing one slice:
+
+```bash
+ktx setup --no-input \
+  --database <driver> --database-connection-id <id> \
+  --llm-backend claude-code
+```