ktx/docs-site/content/docs/cli-reference/ktx-setup.mdx

275 lines
13 KiB
Text
Raw Permalink Normal View History

---
title: "ktx setup"
description: "Set up or resume a local ktx project."
---
`ktx setup` is the guided configuration flow for a local **ktx** project. It can
create or resume `ktx.yaml`, configure LLM and embedding providers, add
database and context-source connections, prepare required runtime features,
build initial context, and install agent integrations.
When you run bare `ktx` in an interactive terminal outside any **ktx** project, the
CLI starts this same setup flow. Inside an existing project, `ktx setup`
resumes from incomplete setup state or opens the setup menu.
## Command signature
```bash
ktx setup [options]
```
## Visible Options
The help output intentionally keeps setup focused on the common interactive
flags. Automation flags are accepted by the same command and are documented
below.
| Flag | Description | Default |
|------|-------------|---------|
| `--agents` | Install agent configuration and rules only | `false` |
| `--target <target>` | Agent target: `claude-code`, `claude-desktop`, `codex`, `cursor`, `opencode`, or `universal` | - |
| `--global` | Install agent integration into the global target scope for `claude-code` or `codex` | `false` |
| `--yes` | Accept project creation and runtime install defaults where setup asks for confirmation | `false` |
2026-05-14 12:43:14 -04:00
| `--no-input` | Disable interactive terminal input | - |
Use the global `--project-dir <path>` option when setup should target a
specific directory.
## Automation Options
These flags are useful for repeatable setup in examples, tests, CI fixtures, and
scripted project creation. They are not shown in `ktx setup --help`.
### Project Creation
Setup resumes an existing `ktx.yaml` when one is present. When no project
exists, interactive setup prompts for where to create it. In scripts, pass
`--project-dir <dir> --no-input --yes` to create the target directory without
prompts.
### LLM Provider
| Flag | Description |
|------|-------------|
2026-06-01 17:25:43 +02:00
| `--llm-backend <backend>` | LLM backend: `anthropic`, `vertex`, `claude-code`, or `codex` |
| `--llm-backend claude-code` | Use the local Claude Code session for **ktx** LLM calls |
2026-06-01 17:25:43 +02:00
| `--llm-backend codex` | Use local Codex authentication for **ktx** LLM calls |
feat: add claude-code llm backend with runtime port (#115) * docs: revise claude-code ingest backend spec * docs: keep claude-code spec focused on ingest * docs: expand claude-code spec to full llm parity * Refine claude-code backend spec after adversarial review iteration 1 * Refine claude-code backend spec after adversarial review iteration 2 * Refine claude-code backend spec after adversarial review iteration 3 * feat: recognize claude-code llm backend * feat: add ktx llm runtime port * feat: add claude-code llm runtime * feat: route non-agent llm calls through runtime * feat: run ingest agents through llm runtime * feat: support claude-code setup and status * test: verify claude-code backend runtime * docs: add claude-code backend v1 runtime plan * fix: close claude-code runtime isolation checks * fix: warn on claude-code prompt caching during setup * chore: verify claude-code v1 closure * docs: add claude-code backend v1 isolation closure plan * fix: update claude-code ingest setup guidance * docs: add claude-code backend v1 ingest guidance closure plan * docs: align claude-code isolation spec with sdk metadata * test: cover claude-code host discovery metadata * fix: tolerate claude-code host discovery metadata * docs: clarify claude-code host discovery metadata * docs: add claude-code auth-probe isolation fix plan * chore: prepare kaelio ktx rc1 release * chore: add semantic release workflow * fix: unblock ci checks * chore(release): 0.1.0-rc.1 * feat: add Claude Code model selection to setup * fix: keep git maintenance attached in local repos
2026-05-16 12:06:34 +02:00
| `--llm-model <model>` | LLM model ID or backend model alias to validate and save |
| `--anthropic-api-key-env <name>` | Environment variable containing the Anthropic API key |
| `--anthropic-api-key-file <path>` | File containing the Anthropic API key |
| `--vertex-project <project>` | Vertex AI project ID, `env:NAME`, or `file:/path` reference |
| `--vertex-location <location>` | Vertex AI location, `env:NAME`, or `file:/path` reference |
| `--skip-llm` | Leave LLM setup incomplete |
Choose only one Anthropic credential source. Anthropic credential flags are only
valid with the Anthropic backend; Vertex flags are only valid with the Vertex
feat: add claude-code llm backend with runtime port (#115) * docs: revise claude-code ingest backend spec * docs: keep claude-code spec focused on ingest * docs: expand claude-code spec to full llm parity * Refine claude-code backend spec after adversarial review iteration 1 * Refine claude-code backend spec after adversarial review iteration 2 * Refine claude-code backend spec after adversarial review iteration 3 * feat: recognize claude-code llm backend * feat: add ktx llm runtime port * feat: add claude-code llm runtime * feat: route non-agent llm calls through runtime * feat: run ingest agents through llm runtime * feat: support claude-code setup and status * test: verify claude-code backend runtime * docs: add claude-code backend v1 runtime plan * fix: close claude-code runtime isolation checks * fix: warn on claude-code prompt caching during setup * chore: verify claude-code v1 closure * docs: add claude-code backend v1 isolation closure plan * fix: update claude-code ingest setup guidance * docs: add claude-code backend v1 ingest guidance closure plan * docs: align claude-code isolation spec with sdk metadata * test: cover claude-code host discovery metadata * fix: tolerate claude-code host discovery metadata * docs: clarify claude-code host discovery metadata * docs: add claude-code auth-probe isolation fix plan * chore: prepare kaelio ktx rc1 release * chore: add semantic release workflow * fix: unblock ci checks * chore(release): 0.1.0-rc.1 * feat: add Claude Code model selection to setup * fix: keep git maintenance attached in local repos
2026-05-16 12:06:34 +02:00
backend. The `claude-code` backend uses local Claude Code authentication instead
of Anthropic API key or Vertex flags. For Claude Code, `--llm-model` accepts
`sonnet`, `opus`, `haiku`, or a full Claude model ID.
2026-06-01 17:25:43 +02:00
The `codex` backend uses local Codex authentication instead of Anthropic API key
or Vertex flags. For Codex, `--llm-model` accepts `gpt-5.3-codex`, `gpt-5.4`,
or another `gpt-*` or `codex-*` model ID.
### Embeddings
| Flag | Description |
|------|-------------|
| `--embedding-backend <backend>` | Embedding backend: `openai` or `sentence-transformers` |
| `--embedding-api-key-env <name>` | Environment variable containing the embedding provider API key |
| `--embedding-api-key-file <path>` | File containing the embedding provider API key |
| `--skip-embeddings` | Leave embedding setup incomplete |
`sentence-transformers` uses the **ktx**-managed Python runtime. Choose only one
embedding credential source.
### Runtime
Setup prepares the managed Python runtime when your selected configuration
needs it. In the full setup flow, the runtime step runs after database and
context-source setup and before the initial context build.
**ktx** prepares the `core` runtime feature when query-history ingest, Looker
context-source ingest, database introspection fallback, or daemon-backed
context build paths need it. **ktx** prepares the `local-embeddings` runtime feature when you
choose managed local `sentence-transformers` embeddings. Existing external
daemon URLs, such as `KTX_DAEMON_URL` or `KTX_SQL_ANALYSIS_URL`, satisfy the
matching dependency and skip managed runtime installation for that dependency.
`ktx setup --agents` doesn't prepare runtime features or build context. It only
installs agent configuration and rules. Start MCP with `ktx mcp start` before
using HTTP-based agents; MCP startup prepares the runtime it needs.
Interactive setup prompts before installing runtime features. Use `--yes` to
install them without prompting. Use `--no-input` to fail fast when required
runtime features are missing.
### Databases
| Flag | Description |
|------|-------------|
| `--database <driver>` | Database driver to configure; repeatable. Choices: `sqlite`, `postgres`, `mysql`, `clickhouse`, `sqlserver`, `bigquery`, `snowflake` |
| `--database-connection-id <id>` | Existing selected connection id; repeatable. With `--database` or `--database-url`, connection id for the new connection. |
| `--database-url <url>` | URL, `env:NAME`, or `file:/path` for one new URL-style database connection; also used as the SQLite path |
| `--database-schema <schema>` | Database schema or dataset to include; repeatable |
| `--skip-databases` | Leave database setup incomplete |
**ktx** needs at least one database connection before it can build database
context. Use `--skip-databases` only when intentionally leaving the project
incomplete.
`--database-schema` maps to the driver's scope field: `schemas` for PostgreSQL,
MySQL, and SQL Server; `schema_names` for Snowflake; `dataset_ids` for
BigQuery; and `databases` for ClickHouse.
### Query History
| Flag | Description |
|------|-------------|
| `--enable-query-history` | Enable query-history ingest when the selected database supports it |
| `--disable-query-history` | Disable query-history ingest for the selected database |
| `--query-history-window-days <number>` | BigQuery/Snowflake query-history lookback window |
| `--query-history-min-executions <number>` | Minimum executions for a query-history template |
| `--query-history-service-account-pattern <pattern>` | Query-history service-account regex; repeatable |
| `--query-history-redaction-pattern <pattern>` | Query-history SQL-literal redaction regex; repeatable |
Query history setup is supported for Postgres, BigQuery, and Snowflake. The
window flag applies to BigQuery and Snowflake; Postgres reads the current
`pg_stat_statements` aggregate data instead of a time-windowed history table.
Later `ktx ingest` runs build enriched context and need a configured model and
embeddings, including when query history is enabled.
When query history is enabled for PostgreSQL, Snowflake, or BigQuery,
`ktx setup` runs a non-blocking readiness probe after the connection test
passes. A failed probe still writes setup changes, prints the warehouse-specific
test: split cli tests from source tree (#216) * feat(cli): define full warehouse dialect contract * test(cli): keep dialect edge tests focused * fix(cli): stabilize dialect contract foundation * refactor(connectors): own read-only query preparation * refactor(connectors): resolve dialects through registry * refactor(connectors): keep concrete dialect classes internal * chore(workspace): enforce dialect import boundary * refactor(cli): resolve relationship dialect at scan boundary * refactor(cli): use dialect display parsing for entity details * refactor(cli): use dialect display parsing for warehouse catalog * refactor(cli): use dialect SQL in relationship workflows * test(cli): verify solid dialect scan workflow closure * test: split cli tests from source tree * refactor(cli): standardize BigQuery scope listing * feat(sqlite): implement connector scope listing * test(connectors): cover required table listing * feat(cli): add warehouse driver registry * refactor(setup): route scope discovery through driver registry * refactor(cli): route local query execution through driver registry * refactor(historic-sql): route dialect support through driver registry * refactor(cli): test warehouse connections through driver registry * fix(cli): close driver registry type export gaps * Improve setup daemon diagnostics * refactor(setup): centralize rail-prefixed diagnostics + query-history fallback Extract errorMessage, writePrefixedLines, and flushPrefixedBufferedCommandOutput into clack.ts so the setup wizard, managed daemons, and embedding/agent steps share one rail-formatted writer. setup-databases.ts also adds a "disable query history and retry" option when the schema-context build fails and query history is the likely culprit, surfaced via a new failed-query-history-unavailable status. * fix(cli): carry catalog through the picker so BigQuery/Snowflake/SQL Server scope filters match The setup picker's KtxTableListEntry was a 2-level { schema, name }, so qualifiedTableId always wrote db.name into enabled_tables. When BigQuery, Snowflake, or SQL Server later ran fast ingest, their introspect step filtered the scope set with scopedTableNames(scope, { catalog: projectId|database, db }) — catalog was non-null on the introspect side but null in the scope refs, so every entry was rejected, the live-database adapter staged zero table files, and detect() failed with 'Adapter "live-database" did not recognize fetched source output'. Align the picker boundary with the canonical 3-level KtxTableRef: - Add catalog: string | null to KtxTableListEntry. - BigQuery/Snowflake/SQL Server listTables populate catalog from the resolved projectId / database; Postgres/MySQL/ClickHouse/SQLite set null. - qualifiedTableId emits catalog.schema.name when catalog is non-null (resolveEnabledTables already accepts the 3-part shape) and schemasFromEnabledTables now goes through parseDottedTableEntry so it recovers the schema correctly from both 2-part and 3-part entries. - Export parseDottedTableEntry from enabled-tables.ts (@internal) for picker reuse. Update listTables expectations in all seven connector tests and the setup / picker test fixtures. Add a picker regression test that covers the catalog-bearing round-trip (save + refine). * fix(cli): allow debug telemetry under opt-out env
2026-05-26 08:49:05 +02:00
grant or extension remediation, and skips query-history processing until you
fix the prerequisite. If the later schema-context build also fails, interactive
setup offers **Disable query history and retry** so you can finish database
setup with `connections.<id>.context.queryHistory.enabled: false`.
For BigQuery, the remediation tells you to grant `roles/bigquery.resourceViewer`
on the BigQuery project, or grant a custom role that contains
`bigquery.jobs.listAll`.
### Context Sources
In interactive setup, after you configure a database, choose
**Skip context sources** to leave optional context-source setup complete with no
sources. This is equivalent to passing `--skip-sources` in scripted setup.
| Flag | Description |
|------|-------------|
| `--source <type>` | Context-source connector type: `dbt`, `metricflow`, `metabase`, `looker`, `lookml`, or `notion` |
| `--source-connection-id <id>` | Connection id for context-source setup |
| `--source-path <path>` | Local source path for dbt, MetricFlow, or LookML |
| `--source-git-url <url>` | Git URL for dbt, MetricFlow, or LookML |
| `--source-branch <branch>` | Git branch for context-source setup |
| `--source-subpath <path>` | Repo subpath for context-source setup |
| `--source-auth-token-ref <ref>` | `env:` or `file:` credential reference for source repo auth or Notion integration token |
| `--source-url <url>` | Source service URL for Metabase or Looker |
| `--source-api-key-ref <ref>` | `env:` or `file:` API key reference for Metabase |
| `--source-client-id <id>` | Looker client id |
| `--source-client-secret-ref <ref>` | `env:` or `file:` Looker client secret reference |
| `--source-warehouse-connection-id <id>` | Warehouse connection id used for context-source mapping |
| `--source-project-name <name>` | dbt project name override |
| `--source-profiles-path <path>` | dbt profiles path |
| `--source-target <target>` | dbt target or context-source-specific mapping target |
| `--metabase-database-id <id>` | Metabase database id to map |
| `--notion-crawl-mode <mode>` | Notion crawl mode: `all_accessible` or `selected_roots` |
| `--notion-root-page-id <id>` | Notion root page id; repeatable |
| `--skip-sources` | Mark optional context-source setup complete with no sources |
Choose only one source location: `--source-path` or `--source-git-url`.
## Examples
```bash
# Run the interactive setup wizard
ktx setup
# Run setup for a specific project directory
ktx setup --project-dir ./analytics
# Use Claude Code with Opus for ktx LLM calls
feat: add claude-code llm backend with runtime port (#115) * docs: revise claude-code ingest backend spec * docs: keep claude-code spec focused on ingest * docs: expand claude-code spec to full llm parity * Refine claude-code backend spec after adversarial review iteration 1 * Refine claude-code backend spec after adversarial review iteration 2 * Refine claude-code backend spec after adversarial review iteration 3 * feat: recognize claude-code llm backend * feat: add ktx llm runtime port * feat: add claude-code llm runtime * feat: route non-agent llm calls through runtime * feat: run ingest agents through llm runtime * feat: support claude-code setup and status * test: verify claude-code backend runtime * docs: add claude-code backend v1 runtime plan * fix: close claude-code runtime isolation checks * fix: warn on claude-code prompt caching during setup * chore: verify claude-code v1 closure * docs: add claude-code backend v1 isolation closure plan * fix: update claude-code ingest setup guidance * docs: add claude-code backend v1 ingest guidance closure plan * docs: align claude-code isolation spec with sdk metadata * test: cover claude-code host discovery metadata * fix: tolerate claude-code host discovery metadata * docs: clarify claude-code host discovery metadata * docs: add claude-code auth-probe isolation fix plan * chore: prepare kaelio ktx rc1 release * chore: add semantic release workflow * fix: unblock ci checks * chore(release): 0.1.0-rc.1 * feat: add Claude Code model selection to setup * fix: keep git maintenance attached in local repos
2026-05-16 12:06:34 +02:00
ktx setup \
--project-dir ./analytics \
--llm-backend claude-code \
--llm-model opus
2026-06-01 17:25:43 +02:00
# Configure ktx to use local Codex authentication for LLM work
ktx setup --llm-backend codex --llm-model gpt-5.3-codex --no-input
# Script a Postgres connection that reads its URL from the environment
ktx setup \
--project-dir ./analytics \
--no-input \
--yes \
--skip-llm \
--skip-embeddings \
--database postgres \
--database-connection-id warehouse \
--database-url env:DATABASE_URL \
--database-schema public
# Enable Postgres query history while setting up a database
ktx setup \
--project-dir ./analytics \
--database postgres \
--database-connection-id warehouse \
--database-url env:DATABASE_URL \
--enable-query-history \
--query-history-min-executions 5
# Add a Metabase source mapped to an existing warehouse connection
ktx setup \
--source metabase \
--source-connection-id prod_metabase \
--source-url https://metabase.example.com \
--source-api-key-ref env:METABASE_API_KEY \
--source-warehouse-connection-id warehouse \
--metabase-database-id 1
# Add a Notion source that crawls selected root pages
ktx setup \
--source notion \
--source-connection-id notion-main \
--source-auth-token-ref env:NOTION_TOKEN \
--notion-crawl-mode selected_roots \
--notion-root-page-id abc123def456
# Install project-scoped agent integration for Codex
ktx setup --agents --target codex
```
## Output
Interactive setup renders prompts and progress messages. Use `ktx status` to
check setup and context readiness after setup exits.
```text
ktx project: /home/user/analytics
Project ready: yes
LLM ready: yes (claude-sonnet-4-6)
Embeddings ready: yes (text-embedding-3-small)
feat: merge ingest and scan * docs: add CLI component reuse guidance * docs: add unified ingest ux design * Refine unified ingest UX design after adversarial review iteration 1 * Refine unified ingest UX design after adversarial review iteration 2 * Refine unified ingest UX design after adversarial review iteration 3 * feat(cli): route public connection ingest command * feat(cli): hide standalone scan from public help * feat(cli): plan public ingest depth and query history * feat(cli): execute public database ingest facets * feat(ingest): read connection query history config * fix(cli): use public ingest wording * fix(config): stop generating ingest adapter allow lists * docs: document public ingest command * test: align ingest surface expectations * docs: add unified ingest public CLI surface plan * feat(cli): preflight deep public ingest readiness * feat(setup): store query history in connection context * feat(setup): store database context depth * feat(setup): verify context readiness by database depth * fix(setup): keep context build foreground only * fix(config): reject reserved ingest connection ids * test: close unified ingest v1 expectations * docs: add unified ingest v1 closure plan * fix(ingest): bypass adapter allow-list for public source ingest * fix(ingest): honor query history window intent * fix(ingest): hide scan internals from public database ingest * feat(ingest): use foreground view for interactive public ingest * fix(setup): use schema context and query history wording * test(cli): verify unified ingest public output * docs: add unified ingest v1 public output closure plan * fix(setup): forward query history flags * fix(setup): prompt for postgres query history * fix(status): report query history readiness * fix(ingest): remove legacy public guidance * fix(ingest): polish foreground retry copy * docs(examples): use unified query history wording * chore(ingest): finish public query history cleanup * docs: add unified ingest v1 query history status cleanup plan * test(docs): cover unified ingest public docs * docs: align ingest CLI reference with unified UX * docs: update context build guides for unified ingest * docs: update setup and primary source ingest wording * docs: stop advertising adapter-backed example ingest * docs: close unified ingest public docs gaps * docs: add unified ingest v1 docs site closure plan * fix: render unified ingest foreground warnings * fix: explain query history schema order * fix: add public ingest retry guidance * fix: align setup next steps with unified ingest * fix: remove scan wording from demo progress * test: verify unified ingest ux closure * docs: add unified ingest v1 foreground and retry closure plan * fix(cli): preserve query-history pull config in public ingest * fix(cli): omit hidden commands from docs command tree * test(cli): close unified ingest final public surface checks * docs: add unified ingest v1 final public surface closure plan * fix(cli): use public source labels in ingest reports * fix(cli): suppress low-level public ingest output * test(cli): verify unified ingest public plain output * docs: add unified ingest v1 public plain output closure plan * fix(cli): add public ingest copy sanitizers * fix(cli): sanitize public ingest progress copy * fix(cli): rename setup schema scope prompt * docs(plan): add progress copy closure; test: align setup back-nav fixture Adds the iter9 plan and updates the setup back-navigation test fixture to pass disableQueryHistory plus listSchemas/listTables stubs that the unified ingest setup step now requires. * docs(plan): add final ux labels plan with narrowed label scans * fix(cli): aggregate unsupported query-history warnings * fix(cli): align setup database labels * test(cli): fix setup database test type-check * fix(cli): remove primary-source wording from setup output * test(cli): verify unified ingest setup closure * docs(plan): add unified ingest v1 verification copy closure plan * fix(cli): remove top-level scan command * fix(cli): remove legacy ingest and wiki commands * Merge scan into ingest flow * feat(cli): split ingest progress into per-phase rows, rename work units to tasks Each database target in the unified ingest dashboard now renders one row per real subprocess (Schema, then Query history when enabled) instead of a single combined bar. Each phase has its own monotonic 0-100% bar so the progress never snaps back to zero when historic-sql starts after scan completes. Completed phases keep their final bar, summary, and elapsed time visible as an inline audit trail; queued and skipped phases are shown explicitly. Also rename user-facing "work units" / "Failed work units" to "tasks" / "Failed tasks" in ingest output and parseIngestSummary. The parser still accepts the legacy "Work units:" wording in captured output for backward compat. Internal memory-flow event names and type fields are left alone. * Fix test harness failures * Fix CI smoke checks --------- Co-authored-by: Andrey Avtomonov <7889985+andreybavt@users.noreply.github.com>
2026-05-14 01:43:06 +02:00
Databases configured: yes (postgres-warehouse)
Context sources configured: yes (dbt-main)
Runtime ready: yes (core)
ktx context built: yes
Agent integration ready: yes (codex:project)
```
Use `ktx status` for repeatable readiness checks after setup exits.
## Common errors
| Error | Cause | Recovery |
|-------|-------|----------|
| Setup resumes an unexpected project | `KTX_PROJECT_DIR` or nearest `ktx.yaml` points to another directory | Pass `--project-dir <path>` explicitly |
| Setup cannot run in CI | Required values are missing and `--no-input` disables prompts | Provide the relevant automation flags or create a fixture `ktx.yaml` |
| Provider health check fails | Provider key, model id, Vertex project, or Vertex location is invalid | Fix the `env:` or `file:` reference and rerun setup |
| Python runtime is missing | The selected setup needs runtime-backed agent, query-history, Looker, or local embedding features | Accept the interactive prompt, rerun with `--yes`, or run the suggested `ktx admin runtime install` command |
| `--enable-query-history` is rejected | The selected database driver does not support query history | Use Postgres, BigQuery, or Snowflake, or rerun without query-history flags |
| Source setup rejects location flags | Both `--source-path` and `--source-git-url` were supplied | Choose the local path or the Git URL, not both |
2026-05-12 23:51:46 +02:00
| Agent integration missing | Setup skipped the agents step | Run `ktx setup --agents --target <target>` |
| Agent setup cannot prompt for a target | Non-TTY `ktx setup --agents` needs a target | Run `ktx setup --agents --target <target>` or rerun in a TTY |
| Global agent install is rejected | `--global` was used with a target other than `claude-code` or `codex` | Omit `--global`, or choose `--target claude-code` or `--target codex` |