Merge remote-tracking branch 'origin/main' into scan-during-setup

# Conflicts:
#	packages/cli/src/setup-context.test.ts
#	packages/cli/src/setup-context.ts
#	packages/cli/src/setup.test.ts
#	packages/cli/src/setup.ts
Luca Martial 2026-05-13 09:25:25 -07:00
commit fe0a59f55e
357 changed files with 14537 additions and 14297 deletions

@ -37,6 +37,9 @@ jobs:
- name: Install TypeScript dependencies
run: pnpm install --frozen-lockfile
- name: Run TypeScript dead-code checks
run: pnpm run dead-code
- name: Run TypeScript checks
run: pnpm run check

@ -33,6 +33,19 @@ repos:
name: ruff format (python)
files: ^python/
- repo: local
hooks:
- id: biome-dead-code
name: biome dead-code check
entry: pnpm exec biome ci . --formatter-enabled=false --assist-enabled=false
language: system
pass_filenames: false
- id: knip-dead-code
name: knip dead-code check
entry: pnpm exec knip --reporter compact
language: system
pass_filenames: false
- repo: https://github.com/Yelp/detect-secrets
rev: v1.5.0
hooks:

@ -24,6 +24,9 @@ database migrations, ORPC contracts, or `python-service/` layout exist here.
- **MUST**: Keep package/public API changes intentional. Do not add compatibility
wrappers for old KTX names unless the user explicitly asks for a migration
bridge.
- **MUST**: Treat KTX as having no public users unless the user says otherwise.
Legacy support is not necessary by default; prefer clean breaking changes over
compatibility shims, migration bridges, or preserved stale behavior.
### Absolute Prohibitions
@ -86,6 +89,7 @@ pnpm run build
pnpm run type-check
pnpm run test
pnpm run check
pnpm run dead-code
pnpm --filter @ktx/cli run smoke
pnpm --filter './packages/*' run build
pnpm --filter './packages/*' run test
@ -127,6 +131,7 @@ shared contracts or package exports are affected.
- Build/export changes: `pnpm run build`
- Workspace scripts: `node --test scripts/*.test.mjs` or the specific script
test file
- TypeScript dead-code tooling/config changes: `pnpm run dead-code`
- Python semantic layer: `uv run pytest python/ktx-sl/tests -q`
- Python daemon: `uv run pytest python/ktx-daemon/tests -q`
- Python files: also run `uv run pre-commit run --files [FILES]` when
@ -156,6 +161,23 @@ pnpm run test 2>&1 | tee /tmp/ktx-test-output.log
- Do not manually edit generated or built output under `dist/`; edit source and
rebuild.
### Dead TypeScript Code Checks
KTX uses Biome for local unused-code linting and Knip for workspace graph
analysis. These checks are intentionally part of CI and pre-commit because the
normal development workflow is agent-based.
- Run `pnpm run dead-code` after TypeScript changes.
- Treat Knip findings as investigation prompts, not automatic deletion orders.
- Remove private dead code when you confirm there are no imports, dynamic
references, generated references, or tests that still need it.
- Preserve public package exports unless the task explicitly includes API
pruning.
- Add narrow `knip.json` ignores only for intentional dynamic or public cases.
Do not add broad package-level ignores to silence unrelated findings.
- Update `knip.json` when adding dynamic entrypoints, generated files, package
exports, CLI bins, or framework files that Knip cannot infer.
### CLI Standards
- Use Commander for CLI command trees, arguments, options, help text, custom

@ -19,7 +19,7 @@ reviewable project files that agents can use while planning, querying, and
updating analytics work.
A KTX project is a directory of plain files — YAML semantic sources, Markdown
knowledge pages, and SQLite state — that you commit to git and review in PRs,
wiki pages, and SQLite state — that you commit to git and review in PRs,
just like dbt models.
## Who KTX is for
@ -105,7 +105,7 @@ my-project/
│ ├── orders.yaml # Semantic source definitions
│ ├── customers.yaml
│ └── order_items.yaml
├── knowledge/
├── wiki/
│ ├── global/
│ │ ├── revenue.md # Business definitions and rules
│ │ └── segment-classification.md
@ -118,7 +118,7 @@ my-project/
└── db.sqlite # Local state (git-ignored)
```
Semantic sources and knowledge pages are committed to git. The `.ktx/` directory
Semantic sources and wiki pages are committed to git. The `.ktx/` directory
holds ephemeral state and is git-ignored — delete it and KTX rebuilds on the
next run.
@ -130,9 +130,7 @@ Scan artifacts are written under
```bash
SCAN_OUTPUT="$(ktx scan warehouse --project-dir "$PROJECT_DIR")"
printf '%s\n' "$SCAN_OUTPUT"
SCAN_RUN_ID="$(printf '%s\n' "$SCAN_OUTPUT" | awk '/^Run: / { print $2 }')"
ktx scan status --project-dir "$PROJECT_DIR" "$SCAN_RUN_ID"
ktx scan report --project-dir "$PROJECT_DIR" "$SCAN_RUN_ID"
ktx status --project-dir "$PROJECT_DIR"
```
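The `awk` filter above keeps the second whitespace-separated field of each line that begins with `Run: `. For a caller that captures scan output in a Node process rather than a shell, a hypothetical TypeScript equivalent might look like this. It assumes the `Run: <id>` line format implied by that filter and, unlike `awk`, returns only the first match. The run id in the sample is made up.

```typescript
// Hypothetical helper mirroring `awk '/^Run: / { print $2 }'`:
// find the first line starting with "Run: " and return its second field.
function extractScanRunId(scanOutput: string): string | undefined {
  for (const line of scanOutput.split(/\r?\n/)) {
    if (line.startsWith("Run: ")) {
      // awk splits on runs of whitespace; fields[0] is "Run:", fields[1] is the id.
      const fields = line.trim().split(/\s+/);
      return fields[1];
    }
  }
  return undefined; // no "Run: " line in the output
}

const sample = ["Scanning warehouse...", "Run: scan-abc123", "Done."].join("\n");
console.log(extractScanRunId(sample)); // prints "scan-abc123"
```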
For non-SQLite drivers, prefer credential references such as `--url env:NAME`
@ -147,16 +145,13 @@ version, and is managed by `ktx dev runtime` commands.
KTX requires `uv` on `PATH` to create the managed runtime. Install `uv` with
your system package manager or the official installer before running Python-
backed KTX commands. KTX doesn't download `uv` automatically; run
`ktx dev runtime doctor` if runtime installation fails:
`ktx dev runtime status` if runtime installation fails:
```bash
ktx dev runtime install --yes
ktx dev runtime status
ktx dev runtime doctor
ktx dev runtime start
ktx dev runtime stop
ktx dev runtime prune --dry-run
ktx dev runtime prune --yes
```
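Because KTX needs `uv` on `PATH` but won't download it, a script wrapping these commands might check for it before calling Python-backed commands. This hypothetical TypeScript sketch does not reproduce KTX's own check. It is POSIX-oriented: it skips empty `PATH` segments and does not try Windows extensions such as `.exe`.

```typescript
import { accessSync, constants, statSync } from "node:fs";
import { delimiter, join } from "node:path";

// Hypothetical preflight check: return the first PATH entry that contains
// an executable regular file named `command`, or undefined if none does.
function findOnPath(command: string, pathEnv: string = process.env.PATH ?? ""): string | undefined {
  for (const dir of pathEnv.split(delimiter)) {
    if (dir === "") continue; // ignore empty segments such as the one in "a::b"
    const candidate = join(dir, command);
    try {
      // statSync throws if the path is missing; accessSync throws if it is not executable.
      if (statSync(candidate).isFile()) {
        accessSync(candidate, constants.X_OK);
        return candidate;
      }
    } catch {
      // Not usable in this directory; keep searching.
    }
  }
  return undefined;
}

const uvPath = findOnPath("uv");
console.log(uvPath ? `uv found at ${uvPath}` : "uv not found on PATH");
```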
The release artifact manifest contains the public npm tarball and the bundled `kaelio-ktx`
@ -223,7 +218,7 @@ KTX provider. Enable it with an environment flag when running an LLM-backed
command:
```bash
KTX_AI_DEVTOOLS_ENABLED=true ktx dev ingest run \
KTX_AI_DEVTOOLS_ENABLED=true ktx ingest run \
--connection-id warehouse \
--adapter metabase
```

biome.json
@ -0,0 +1,36 @@
{
"$schema": "https://biomejs.dev/schemas/2.4.15/schema.json",
"assist": {
"enabled": false
},
"formatter": {
"enabled": false
},
"files": {
"includes": [
"scripts/**/*.mjs",
"packages/**/*.ts",
"packages/**/*.tsx",
"docs-site/**/*.ts",
"docs-site/**/*.tsx",
"docs-site/**/*.mjs",
"!**/dist/**",
"!**/coverage/**",
"!**/.next/**",
"!**/node_modules/**",
"!**/*.gen.ts",
"!**/*.generated.ts"
]
},
"linter": {
"enabled": true,
"rules": {
"recommended": false,
"correctness": {
"noUnusedImports": "error",
"noUnusedVariables": "error",
"noUnusedPrivateClassMembers": "error"
}
}
}
}

@ -47,7 +47,7 @@ export function TerminalPreview() {
<div className="h-2" />
<div>
<span className="term-prompt">$</span>{" "}
<span className="term-cmd">ktx agent context --json</span>
<span className="term-cmd">ktx status --json</span>
<span className="term-cursor ml-1" />
</div>
</div>

@ -22,7 +22,7 @@ Agents should start with the smallest source that answers the task:
| How to check project readiness | [ktx status](/docs/cli-reference/ktx-status) | [Quickstart](/docs/getting-started/quickstart) |
| How context gets built | [Building Context](/docs/guides/building-context) | [ktx ingest](/docs/cli-reference/ktx-ingest) |
| How semantic YAML works | [Writing Context](/docs/guides/writing-context) | [ktx sl](/docs/cli-reference/ktx-sl) |
| How machine-readable CLI output is shaped | [ktx agent](/docs/cli-reference/ktx-agent) | [Markdown Access](/docs/ai-resources/markdown-access) |
| How machine-readable CLI output is shaped | [ktx sl](/docs/cli-reference/ktx-sl) | [ktx wiki](/docs/cli-reference/ktx-wiki) |
## Operating workflow

@ -31,7 +31,8 @@ Every docs page has a Markdown route:
```text
https://docs.kaelio.com/ktx/docs/getting-started/quickstart.md
https://docs.kaelio.com/ktx/docs/cli-reference/ktx-agent.md
https://docs.kaelio.com/ktx/docs/cli-reference/ktx-sl.md
https://docs.kaelio.com/ktx/docs/cli-reference/ktx-wiki.md
https://docs.kaelio.com/ktx/docs/guides/building-context.md
```
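The Markdown routes above follow one pattern: take the page's `/docs/...` path, prefix it with `https://docs.kaelio.com/ktx`, and append `.md`. The TypeScript sketch below derives that mapping only from these examples. Stripping `#fragment`s and trailing slashes is an assumption.

```typescript
// Base URL taken from the example Markdown routes.
const DOCS_BASE = "https://docs.kaelio.com/ktx";

// Map a docs link such as "/docs/cli-reference/ktx-status" to its Markdown route.
function markdownRoute(docsPath: string): string {
  if (!docsPath.startsWith("/docs/")) {
    throw new Error(`expected a /docs/ path, got ${docsPath}`);
  }
  // Assumption: drop any "#fragment" and trailing slashes before adding ".md".
  const clean = docsPath.replace(/#.*$/, "").replace(/\/+$/, "");
  return `${DOCS_BASE}${clean}.md`;
}

console.log(markdownRoute("/docs/cli-reference/ktx-status"));
// prints "https://docs.kaelio.com/ktx/docs/cli-reference/ktx-status.md"
```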

@ -1,148 +0,0 @@
---
title: "ktx agent"
description: "Machine-readable commands for coding agents."
---
Hidden commands that provide machine-readable JSON output for coding agents. These are the commands that agent integrations (Claude Code, Cursor, Codex, OpenCode) call under the hood — you typically won't use them directly.
All `ktx agent` subcommands require `--json` and produce structured JSON output on stdout.
## Command signature
```bash
ktx agent <subcommand> --json [options]
```
## Subcommands
| Subcommand | Description |
|-----------|-------------|
| `tools` | Print available agent-facing KTX tools |
| `context` | Print project context for agent planning |
| `sl list` | List semantic-layer sources |
| `sl read <sourceName>` | Read one semantic-layer source |
| `sl query` | Run a semantic-layer query from a JSON file |
| `wiki search <query>` | Search KTX wiki pages |
| `wiki read <pageId>` | Read one KTX wiki page |
| `sql execute` | Execute read-only SQL with a row limit |
## Options
### `agent tools`
| Flag | Description | Default |
|------|-------------|---------|
| `--json` | Print JSON output (required) | — |
### `agent context`
| Flag | Description | Default |
|------|-------------|---------|
| `--json` | Print JSON output (required) | — |
### `agent sl list`
| Flag | Description | Default |
|------|-------------|---------|
| `--json` | Print JSON output (required) | — |
| `--connection-id <id>` | Filter by connection id | — |
| `--query <text>` | Search source names and descriptions | — |
### `agent sl read`
| Flag | Description | Default |
|------|-------------|---------|
| `--json` | Print JSON output (required) | — |
| `--connection-id <id>` | Connection id containing the source | — |
### `agent sl query`
| Flag | Description | Default |
|------|-------------|---------|
| `--json` | Print JSON output (required) | — |
| `--connection-id <id>` | Connection id for execution (required) | — |
| `--query-file <path>` | JSON semantic-layer query file (required) | — |
| `--execute` | Execute the compiled query against the connection | `false` |
| `--max-rows <number>` | Maximum rows to return when executing (1-1000) | — |
### `agent wiki search`
| Flag | Description | Default |
|------|-------------|---------|
| `--json` | Print JSON output (required) | — |
| `--limit <number>` | Maximum search results | `10` |
### `agent wiki read`
| Flag | Description | Default |
|------|-------------|---------|
| `--json` | Print JSON output (required) | — |
### `agent sql execute`
| Flag | Description | Default |
|------|-------------|---------|
| `--json` | Print JSON output (required) | — |
| `--connection-id <id>` | Connection id for execution (required) | — |
| `--sql-file <path>` | SQL file to execute (required) | — |
| `--max-rows <number>` | Maximum rows to return, 1-1000 (required) | — |
## Examples
```bash
# List available tools
ktx agent tools --json
# Get project context for planning
ktx agent context --json
# List semantic sources
ktx agent sl list --json
# Search semantic sources by name
ktx agent sl list --json --query "revenue"
# Read a semantic source
ktx agent sl read orders --json --connection-id my-warehouse
# Run a semantic-layer query from a file
ktx agent sl query --json \
--connection-id my-warehouse \
--query-file /tmp/query.json \
--execute \
--max-rows 100
# Search wiki pages
ktx agent wiki search "churn definition" --json
# Read a specific wiki page
ktx agent wiki read page-abc123 --json
# Execute read-only SQL
ktx agent sql execute --json \
--connection-id my-warehouse \
--sql-file /tmp/query.sql \
--max-rows 500
```
## Output
Every `ktx agent` command writes JSON to stdout and diagnostic text to stderr. Agents should parse stdout as JSON and treat a non-zero exit code as a failed tool call.
```json
{
"ok": true,
"data": {
"type": "agent-response"
}
}
```
## Common errors
| Error | Cause | Recovery |
|-------|-------|----------|
| Missing JSON output | `--json` was omitted | Re-run the same subcommand with `--json` |
| Unknown connection id | The requested connection is not configured in `ktx.yaml` | Call `ktx agent context --json` or `ktx connection list` to discover valid ids |
| Query file cannot be read | `--query-file` points to a missing or invalid JSON file | Write the query payload to a real file and pass its absolute path |
| SQL execution rejected | SQL is not read-only or `--max-rows` is missing | Use semantic-layer queries first; for direct SQL, pass read-only SQL and an explicit row limit |

@ -1,9 +1,11 @@
---
title: "ktx connection"
description: "Add, list, test, and map data sources."
description: "List and test configured data sources."
---
Manage database and source connections in your KTX project. Connections define how KTX reaches your data warehouse, BI tools, and context sources.
Inspect configured connections in your KTX project. Connections define how KTX
reaches your data warehouse, BI tools, and context sources. Use `ktx setup` to
add, remove, or reconfigure connections.
## Command signature
@ -17,96 +19,23 @@ ktx connection <subcommand> [options]
|-----------|-------------|
| `list` | List configured connections |
| `test <connectionId>` | Test a configured connection |
| `add <driver> <connectionId>` | Add or replace a configured connection |
| `remove <connectionId>` | Remove a configured connection from `ktx.yaml` |
| `map <sourceConnectionId>` | Refresh and validate BI-to-warehouse mappings |
| `mapping list <connectionId>` | List Metabase database mappings |
| `mapping set <connectionId> <field> <assignment>` | Set a Metabase or Looker warehouse mapping |
| `mapping apply-bulk <connectionId>` | Apply mappings from JSON |
| `mapping set-sync-enabled <connectionId> <dbId>` | Enable or disable sync for one Metabase database |
| `mapping sync-state get <connectionId>` | Read sync-state selection |
| `mapping sync-state set <connectionId>` | Write sync-state selection |
| `mapping refresh <connectionId>` | Refresh Metabase database mappings |
| `mapping validate <connectionId>` | Validate Metabase database mappings |
| `mapping clear <connectionId> [dbId]` | Clear Metabase database mappings |
| `metabase setup` | Guided setup for a Metabase connection |
| `notion pick <connectionId>` | Pick Notion root pages for a configured Notion connection |
## Options
### `connection add`
The `connection` command has command-level options for listing and testing
existing connections.
| Flag | Description | Default |
|------|-------------|---------|
| `--url <url>` | Connection URL, `env:NAME`, or `file:/path` reference | — |
| `--schema <schema>` | Schema to include; repeatable | — |
| `--readonly` | Mark the connection as read-only | `false` |
| `--force` | Replace an existing connection | `false` |
| `--allow-literal-credentials` | Allow writing a literal credential URL to `ktx.yaml` | `false` |
#### Notion-specific options for `connection add`
| Flag | Description | Default |
|------|-------------|---------|
| `--token-env <name>` | Environment variable containing Notion auth token | — |
| `--token-file <path>` | File containing Notion auth token | — |
| `--crawl-mode <mode>` | Notion crawl mode (`all_accessible` or `selected_roots`) | `selected_roots` |
| `--root-page-id <id>` | Root page to crawl; repeatable | — |
| `--root-database-id <id>` | Root database to crawl; repeatable | — |
| `--root-data-source-id <id>` | Root data source to crawl; repeatable | — |
| `--max-pages <n>` | Maximum pages per run | — |
| `--max-knowledge-creates <n>` | Maximum knowledge creates per run | — |
| `--max-knowledge-updates <n>` | Maximum knowledge updates per run | — |
### `connection remove`
| Flag | Description | Default |
|------|-------------|---------|
| `--force` | Remove without prompting | `false` |
| `--no-input` | Disable interactive terminal input | — |
### `connection map`
### `connection list`
| Flag | Description | Default |
|------|-------------|---------|
| `--json` | Print JSON output | `false` |
### `connection mapping` subcommands
| Flag | Subcommand | Description | Default |
|------|-----------|-------------|---------|
| `--json` | `list`, `sync-state get` | Print JSON output | `false` |
| `--file <path>` | `apply-bulk` | JSON mapping file (required) | — |
| `--enabled <value>` | `set-sync-enabled` | `true` or `false` (required) | — |
| `--mode <mode>` | `sync-state set` | `ALL`, `ONLY`, or `EXCEPT` (required) | — |
| `--collections <ids>` | `sync-state set` | Comma-separated collection ids | — |
| `--items <ids>` | `sync-state set` | Comma-separated item ids | — |
| `--tag-names <names>` | `sync-state set` | Comma-separated tag names | — |
| `--auto-accept` | `refresh` | Accept refresh changes without prompting | `false` |
### `connection metabase setup`
### `connection test`
| Flag | Description | Default |
|------|-------------|---------|
| `--id <connectionId>` | KTX connection id to write | — |
| `--url <url>` | Metabase API URL | — |
| `--api-key <key>` | Metabase API key | — |
| `--mint-api-key` | Mint a Metabase API key with credentials | `false` |
| `--username <email>` | Metabase admin username for API-key minting | — |
| `--password <password>` | Metabase admin password for API-key minting | — |
| `--map <id=target>` | Assign a Metabase database id to a warehouse connection; repeatable | — |
| `--sync <metabaseDatabaseId>` | Enable sync for a discovered database; repeatable | — |
| `--sync-mode <mode>` | Metabase sync selection mode (`ALL`, `ONLY`, or `EXCEPT`) | `ALL` |
| `--run-ingest` | Run ingest after setup | `false` |
| `--yes` | Confirm and apply setup changes without prompting | `false` |
| `--no-input` | Disable interactive terminal input | — |
### `connection notion pick`
| Flag | Description | Default |
|------|-------------|---------|
| `--no-input` | Disable interactive terminal input | — |
| `--root-page-id <id>` | Root page UUID to crawl; repeatable (required with `--no-input`) | — |
| `--json` | Print JSON output | `false` |
## Examples
@ -114,43 +43,20 @@ ktx connection <subcommand> [options]
# List all configured connections
ktx connection list
# Add a Postgres connection using an environment variable
ktx connection add postgres my-warehouse --url "env:DATABASE_URL"
# Add a Postgres connection with specific schemas
ktx connection add postgres analytics --url "env:PG_URL" --schema public --schema analytics
# Add a read-only Snowflake connection
ktx connection add snowflake sf-prod --url "env:SNOWFLAKE_URL" --readonly
# Test a connection
ktx connection test my-warehouse
# Remove a connection
ktx connection remove old-warehouse
# Add a Notion source connection
ktx connection add notion my-notion \
--token-env NOTION_TOKEN \
--crawl-mode selected_roots \
--root-page-id abc123def456...
# Run guided Metabase setup
ktx connection metabase setup --url https://metabase.example.com
# Map a BI database to a warehouse connection
ktx connection mapping set metabase-prod databaseMappings 1=my-warehouse
# Refresh Metabase mappings
ktx connection mapping refresh metabase-prod --auto-accept
# Pick Notion root pages interactively
ktx connection notion pick my-notion
```
## Setup-managed connections
Run `ktx setup` when you need to add or reconfigure a connection. Interactive
setup includes the rich Notion page picker for selected root pages and the
Metabase mapping prompts for BI-to-warehouse mappings.
## Output
Interactive commands render prompts and status text. Commands with `--json` return machine-readable JSON suitable for scripts and agents.
Commands with `--json` return machine-readable JSON suitable for scripts and
agents.
```json
{
@ -168,7 +74,6 @@ Interactive commands render prompts and status text. Commands with `--json` retu
| Error | Cause | Recovery |
|-------|-------|----------|
| Connection test fails | Credentials, network access, database, warehouse, or schema is invalid | Verify the same URL with the database's native client, then rerun `ktx connection add ... --force` |
| Literal credentials rejected | KTX avoids writing raw secrets to `ktx.yaml` by default | Use `env:NAME` or `file:/path/to/secret`; use `--allow-literal-credentials` only for local throwaway projects |
| Mapping validation fails | BI database mappings do not point at valid warehouse connections | Run `ktx connection mapping refresh <connectionId> --auto-accept`, then set invalid mappings explicitly |
| Notion pick cannot run non-interactively | `--no-input` was used without root page or database ids | Pass `--root-page-id`, `--root-database-id`, or `--root-data-source-id` with `--no-input` |
| Connection test fails | Credentials, network access, database, warehouse, or schema is invalid | Verify the same URL with the database's native client, then rerun `ktx setup` and reconfigure the connection |
| Mapping validation fails during setup | BI database mappings do not point at valid warehouse connections | Rerun `ktx setup` and update the source mapping selections |
| Notion page picker cannot run | The terminal is non-interactive or Notion discovery failed | Rerun interactive `ktx setup`, or use non-interactive setup flags with explicit root page ids |
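The recovery advice above uses credential references (`env:NAME`, `file:/path/to/secret`) instead of literal URLs. The TypeScript sketch below is hypothetical and shows one way such references could be resolved. KTX's real rules, such as whether file contents are trimmed or how unset variables are reported, are assumptions here.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical parsed form of a credential reference.
type CredentialRef =
  | { kind: "env"; name: string }
  | { kind: "file"; path: string }
  | { kind: "literal"; value: string };

function parseCredentialRef(raw: string): CredentialRef {
  if (raw.startsWith("env:")) return { kind: "env", name: raw.slice("env:".length) };
  if (raw.startsWith("file:")) return { kind: "file", path: raw.slice("file:".length) };
  // Anything else is a literal value, which should not be committed to ktx.yaml.
  return { kind: "literal", value: raw };
}

function resolveCredential(
  raw: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const ref = parseCredentialRef(raw);
  switch (ref.kind) {
    case "env": {
      const value = env[ref.name];
      if (value === undefined || value === "") {
        throw new Error(`environment variable ${ref.name} is not set`);
      }
      return value;
    }
    case "file":
      // Assumption: surrounding whitespace (e.g. a trailing newline) is not part of the secret.
      return readFileSync(ref.path, "utf8").trim();
    case "literal":
      return ref.value;
  }
}
```

For example, `resolveCredential("env:DATABASE_URL")` reads the same variable that `ktx connection add postgres my-warehouse --url "env:DATABASE_URL"` refers to. This sketch throws if that variable is unset, rather than falling back to a literal value.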

@ -1,9 +1,9 @@
---
title: "ktx dev"
description: "Low-level diagnostics, scans, adapter commands, and mapping tools."
description: "Low-level project initialization and runtime management."
---
Hidden commands for low-level project management, diagnostics, direct adapter control, and shell completion. Most users interact with these through higher-level commands like [`ktx ingest`](/docs/cli-reference/ktx-ingest) and [`ktx setup`](/docs/cli-reference/ktx-setup), but `ktx dev` provides direct access when you need fine-grained control.
`ktx dev` contains development-only project initialization and managed runtime commands. Scan and ingest commands live at the root as [`ktx scan`](/docs/cli-reference/ktx-scan) and [`ktx ingest`](/docs/cli-reference/ktx-ingest).
## Command signature
@ -16,145 +16,42 @@ ktx dev <subcommand> [options]
| Subcommand | Description |
|-----------|-------------|
| `init [directory]` | Initialize a Git-backed KTX project directory |
| `runtime` | Install, inspect, and prune the KTX-managed Python runtime |
| `scan` | Run or inspect standalone connection scans |
| `ingest run` | Run local ingest for one configured connection and source adapter |
| `ingest status [runId]` | Print status for a stored local ingest run |
| `ingest watch [runId]` | Open a stored ingest visual report |
| `ingest replay <runId>` | Replay a stored ingest run through memory-flow output |
| `mapping` | Manage Metabase warehouse mappings (same as `ktx connection mapping`) |
| `completion zsh` | Generate zsh completion script |
| `runtime` | Install, start, stop, and inspect the KTX-managed Python runtime |
## Options
### `dev init`
## `dev init`
| Flag | Description | Default |
|------|-------------|---------|
| `--name <name>` | Project name written to `ktx.yaml` | — |
| `--force` | Rewrite `ktx.yaml` and scaffold files in an existing project | `false` |
### `dev runtime`
## `dev runtime`
`ktx dev runtime` supports `install`, `start`, `stop`, and `status`.
| Flag | Description | Default |
|------|-------------|---------|
| `--feature <feature>` | Runtime feature level for `install` and `start` (`core` or `local-embeddings`) | `core` |
| `--json` | Print JSON output | `false` |
| `--yes` | Confirm runtime install or prune actions where supported | `false` |
| `--json` | Print JSON output for `status` | `false` |
| `--yes` | Confirm runtime install actions where supported | `false` |
| `--force` | Reinstall or restart where supported | `false` |
### `dev scan`
See [`ktx scan`](/docs/cli-reference/ktx-scan) for the full scan command reference.
### `dev ingest run`
| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id <connectionId>` | KTX connection id (required) | — |
| `--adapter <adapter>` | Ingest source adapter name (required) | — |
| `--source-dir <path>` | Directory containing source files | — |
| `--database-introspection-url <url>` | Daemon URL for live-database introspection | — |
| `--debug-llm-request-file <path>` | Write sanitized LLM request structure to a JSONL file | — |
| `--plain` | Print plain text output | `false` |
| `--json` | Print JSON output | `false` |
| `--viz` | Render memory-flow TUI output | `false` |
| `--no-input` | Disable interactive terminal input for visualization | — |
### `dev ingest status`
| Flag | Description | Default |
|------|-------------|---------|
| `--report-file <path>` | Bundle ingest report JSON file to render | — |
| `--plain` | Print plain text output | `false` |
| `--json` | Print JSON output | `false` |
| `--viz` | Render memory-flow TUI output | `false` |
| `--no-input` | Disable interactive terminal input for visualization | — |
### `dev ingest watch`
| Flag | Description | Default |
|------|-------------|---------|
| `--report-file <path>` | Bundle ingest report JSON file to render | — |
| `--plain` | Print plain text output | `false` |
| `--json` | Print JSON output | `false` |
| `--viz` | Render memory-flow TUI output (the default unless `--plain` or `--json` is set) | `true` |
| `--no-input` | Disable interactive terminal input for visualization | — |
### `dev ingest replay`
| Flag | Description | Default |
|------|-------------|---------|
| `--report-file <path>` | Bundle ingest report JSON file to render | — |
| `--plain` | Print plain text output | `false` |
| `--json` | Print JSON output | `false` |
| `--viz` | Render memory-flow TUI output | `false` |
| `--no-input` | Disable interactive terminal input for visualization | — |
### `dev completion zsh`
| Flag | Description | Default |
|------|-------------|---------|
| `--install` | Install zsh completion into `~/.zfunc` and update `~/.zshrc` | `false` |
## Examples
```bash
# Initialize a new KTX project
ktx dev init
# Initialize in a specific directory with a project name
ktx dev init ./my-project --name "Analytics Context"
# Re-initialize an existing project
ktx dev init --force
# Check managed Python runtime readiness
ktx dev runtime doctor
# Start the managed Python daemon
ktx dev runtime install --yes
ktx dev runtime status
ktx dev runtime start
# Run a low-level ingest with a specific adapter
ktx dev ingest run --connection-id my-dbt --adapter dbt
# Run ingest from a specific source directory
ktx dev ingest run \
--connection-id my-dbt \
--adapter dbt \
--source-dir ./dbt-project
# View ingest status with the visual TUI
ktx dev ingest watch run-abc123
# Replay a stored ingest session
ktx dev ingest replay run-abc123
# View ingest status from a report file
ktx dev ingest status --report-file /tmp/ingest-report.json
# Generate zsh completions
ktx dev completion zsh
# Install zsh completions
ktx dev completion zsh --install
ktx dev runtime stop
```
## Output
`ktx dev` commands are diagnostic and may print plain text, JSON, or visual reports depending on the selected flags.
| Mode | How to request it | Use case |
|------|-------------------|----------|
| Plain text | `--plain` or default diagnostic output | Human-readable terminal inspection |
| JSON | `--json` | Agent parsing and automation |
| Visual report | `--viz` | Interactive memory-flow and ingest debugging |
## Common errors
| Error | Cause | Recovery |
|-------|-------|----------|
| Doctor reports missing runtime pieces | Packages, Python environment, or linked CLI are not ready | Run `pnpm install`, `pnpm run setup:dev`, and `uv sync --all-groups` |
| Ingest run cannot find adapter | `--adapter` does not match a supported source adapter | Use configured source names from `ktx.yaml` or run higher-level `ktx ingest` |
| Replay/report file cannot be read | The report path is wrong or the run id is not stored locally | Run `ktx dev ingest status --json` to discover stored run ids and report files |
| Visual output fails in CI | TUI rendering requires an interactive terminal | Use `--plain --no-input` or `--json --no-input` |
| Runtime status reports missing pieces | Packages, Python environment, or linked CLI are not ready | Run `pnpm install`, `pnpm run setup:dev`, `uv sync --all-groups`, then `ktx dev runtime status` |
| Runtime daemon does not start | The managed Python runtime is missing or stale | Run `ktx dev runtime install --yes`, then `ktx dev runtime start` |

@ -1,14 +1,13 @@
---
title: "ktx ingest"
description: "Build and refresh context from configured sources."
description: "Run and inspect local ingest memory-flow output."
---
Ingest context from your configured sources — dbt, Looker, Metabase, MetricFlow, LookML, or Notion. The ingest process extracts metadata from your tools, then uses an LLM agent to reconcile it with existing context, writing semantic sources and knowledge pages to your project.
`ktx ingest` runs adapter-level local ingest and renders stored ingest reports.
## Command signature
```bash
ktx ingest [connectionId] [options]
ktx ingest <subcommand> [options]
```
@ -16,80 +15,59 @@ ktx ingest <subcommand> [options]
| Subcommand | Description |
|-----------|-------------|
| `status [runId]` | Print status for the latest or selected public ingest run |
| `watch [runId]` | Open the latest or selected public ingest visual report |
| `run` | Run local ingest for one configured connection and source adapter |
| `status [runId]` | Print status for the latest or selected stored local ingest run or report file |
| `watch [runId]` | Open the latest or selected stored ingest visual report |
| `replay <runId>` | Replay a stored ingest run or bundle report through memory-flow output |
## Options
### `ingest` (run)
## `ingest run`
| Flag | Description | Default |
|------|-------------|---------|
| `--all` | Ingest every eligible configured source | `false` |
| `--connection-id <connectionId>` | KTX connection id | Required |
| `--adapter <adapter>` | Ingest source adapter name | Required |
| `--source-dir <path>` | Directory containing source files | — |
| `--database-introspection-url <url>` | Daemon URL for live-database introspection | — |
| `--debug-llm-request-file <path>` | Write sanitized LLM request structure to a JSONL file | — |
| `--plain` | Print plain text output | `true` |
| `--json` | Print JSON output | `false` |
| `--no-input` | Disable interactive terminal input | — |
| `--viz` | Render memory-flow TUI output | `false` |
| `--yes` | Install the managed Python runtime without prompting when required | `false` |
| `--no-input` | Disable interactive terminal input for visualization and runtime installation | — |
### `ingest status`
## `ingest status`, `watch`, and `replay`
| Flag | Description | Default |
|------|-------------|---------|
| `--report-file <path>` | Bundle ingest report JSON file to render | — |
| `--plain` | Print plain text output | `true` for `status` and `replay` |
| `--json` | Print JSON output | `false` |
| `--no-input` | Disable interactive terminal input | — |
### `ingest watch`
| Flag | Description | Default |
|------|-------------|---------|
| `--json` | Print JSON output instead of the visual report | `false` |
| `--no-input` | Disable interactive terminal input | — |
| `--viz` | Render memory-flow TUI output | `true` for `watch` |
| `--no-input` | Disable interactive terminal input for visualization | — |
## Examples
```bash
# Ingest from a specific connection
ktx ingest my-dbt-source
ktx ingest run --connection-id my-dbt-source --adapter dbt
ktx ingest run --connection-id prod-metabase --adapter metabase --yes
# Ingest from all eligible sources
ktx ingest --all
# Check the status of the latest ingest
ktx ingest status
# Check the status of a specific ingest run
ktx ingest status run-abc123
# Watch the latest ingest report
ktx ingest watch
# Get ingest status as JSON
ktx ingest status --json
```
## Low-level ingest commands
ktx ingest watch
ktx ingest watch run-abc123
For adapter-level control, use `ktx dev ingest`. See [`ktx dev`](/docs/cli-reference/ktx-dev) for the full low-level ingest surface including `run`, `status`, `watch`, and `replay` with output mode options (`--plain`, `--json`, `--viz`).
## Output
Ingest run commands print progress and create a stored ingest report. `ktx ingest status --json` returns the run state, adapter, connection, and summary information.
```json
{
"runId": "ingest-local-abc123",
"status": "completed",
"connectionId": "dbt-main",
"summary": {
"semanticSourcesChanged": 4,
"knowledgePagesChanged": 2
}
}
ktx ingest replay run-abc123
ktx ingest replay run-abc123 --viz
ktx ingest replay run-abc123 --report-file /tmp/ingest-report.json
```
## Common errors
| Error | Cause | Recovery |
|-------|-------|----------|
| No eligible sources | `ktx.yaml` has no configured context source for ingest | Add a source with `ktx setup` or `ktx connection add`, then rerun ingest |
| Ingest needs credentials | The source adapter requires API or git access | Configure the referenced environment variable or secret file |
| Latest run not found | No ingest run has been started in this project | Run `ktx ingest <connectionId>` or `ktx ingest --all` first |
| Ingest run cannot find adapter | `--adapter` does not match a supported source adapter | Use a configured adapter such as `dbt`, `metabase`, `looker`, `lookml`, `notion`, or `live-database` |
| Latest run not found | No ingest run has been started in this project | Run `ktx ingest run --connection-id <id> --adapter <adapter>` first |
| Report watch fails in a non-interactive shell | Visual report needs a terminal | Use `ktx ingest status --json` for agent and CI workflows |

@ -1,163 +1,39 @@
---
title: "ktx scan"
description: "Run or inspect database scans."
description: "Run standalone database scans."
---
Discover your database schema — tables, columns, types, constraints, and relationships. Scanning is the first step in building context: KTX needs to understand your warehouse structure before it can build semantic sources.
Scan commands live under `ktx dev scan`. See also the [Building Context](/docs/guides/building-context) guide for a walkthrough.
Discover a configured database connection's schema, including tables, columns, types, constraints, and optional relationship signals.
## Command signature
```bash
ktx dev scan <connectionId> [options]
ktx dev scan <subcommand> [options]
ktx scan <connectionId> [options]
```
## Subcommands
| Subcommand | Description |
|-----------|-------------|
| `status <runId>` | Print status for a local scan run |
| `report <runId>` | Print a local scan report |
| `relationships <runId>` | Print relationship artifacts for a local scan run |
| `relationship-apply <runId>` | Apply accepted relationship review decisions as manual manifest joins |
| `relationship-feedback` | Export persisted relationship review decisions as calibration labels |
| `relationship-calibration` | Summarize relationship feedback labels against current score thresholds |
| `relationship-thresholds` | Evaluate relationship feedback labels for offline threshold advice |
## Options
### `scan` (run)
| Flag | Description | Default |
|------|-------------|---------|
| `--mode <mode>` | Scan mode: `structural`, `enriched`, or `relationships` | `structural` |
| `--dry-run` | Run without writing scan results | `false` |
| `--database-introspection-url <url>` | Daemon URL for live-database introspection | — |
### `scan report`
| Flag | Description | Default |
|------|-------------|---------|
| `--json` | Print the raw scan report JSON | `false` |
### `scan relationships`
| Flag | Description | Default |
|------|-------------|---------|
| `--status <status>` | Filter by status: `accepted`, `review`, `rejected`, `skipped`, or `all` | `review` |
| `--limit <count>` | Maximum relationships to print per status | `25` |
| `--accept <candidateId>` | Record an accepted decision for a relationship candidate | — |
| `--reject <candidateId>` | Record a rejected decision for a relationship candidate | — |
| `--note <text>` | Attach a note when recording a relationship review decision | — |
| `--reviewer <name>` | Reviewer name for a relationship review decision | — |
| `--json` | Print relationship artifacts as JSON | `false` |
### `scan relationship-apply`
| Flag | Description | Default |
|------|-------------|---------|
| `--all-accepted` | Apply all accepted relationship review decisions for the scan run | `false` |
| `--candidate <candidateId>` | Apply one accepted relationship review decision; repeatable | — |
| `--dry-run` | Preview relationships that would be written without rewriting manifest shards | `false` |
| `--json` | Print the apply result as JSON | `false` |
### `scan relationship-feedback`
| Flag | Description | Default |
|------|-------------|---------|
| `--connection <connectionId>` | Only export labels for one KTX connection | — |
| `--decision <decision>` | Filter: `accepted`, `rejected`, or `all` | `all` |
| `--json` | Print the export as JSON | `false` |
| `--jsonl` | Print labels as newline-delimited JSON | `false` |
### `scan relationship-calibration`
| Flag | Description | Default |
|------|-------------|---------|
| `--connection <connectionId>` | Only calibrate labels for one KTX connection | — |
| `--decision <decision>` | Filter: `accepted`, `rejected`, or `all` | `all` |
| `--accept-threshold <value>` | Score threshold treated as predicted accepted (0–1) | `0.85` |
| `--review-threshold <value>` | Score threshold treated as predicted review (0–1) | `0.55` |
| `--json` | Print the calibration report as JSON | `false` |
### `scan relationship-thresholds`
| Flag | Description | Default |
|------|-------------|---------|
| `--connection <connectionId>` | Only evaluate labels for one KTX connection | — |
| `--min-total-labels <count>` | Minimum scored labels before advice can be ready | `20` |
| `--min-accepted-labels <count>` | Minimum accepted labels before advice can be ready | `5` |
| `--min-rejected-labels <count>` | Minimum rejected labels before advice can be ready | `5` |
| `--json` | Print the threshold advice report as JSON | `false` |
| `--yes` | Install the managed Python runtime without prompting when required | `false` |
| `--no-input` | Disable interactive managed runtime installation | — |
## Examples
```bash
# Run a structural scan of a connection
ktx dev scan my-warehouse
# Run a scan with LLM enrichment
ktx dev scan my-warehouse --mode enriched
# Run a scan with relationship detection
ktx dev scan my-warehouse --mode relationships
# Dry-run a scan (don't write results)
ktx dev scan my-warehouse --dry-run
# Check the status of a scan run
ktx dev scan status run-abc123
# View the scan report
ktx dev scan report run-abc123
# View scan report as JSON
ktx dev scan report run-abc123 --json
# List relationship candidates pending review
ktx dev scan relationships run-abc123
# List all relationships regardless of status
ktx dev scan relationships run-abc123 --status all
# Accept a relationship candidate
ktx dev scan relationships run-abc123 --accept candidate-xyz
# Reject a relationship candidate with a note
ktx dev scan relationships run-abc123 --reject candidate-xyz --note "false positive"
# Apply all accepted relationships to the manifest
ktx dev scan relationship-apply run-abc123 --all-accepted
# Preview what would be applied
ktx dev scan relationship-apply run-abc123 --all-accepted --dry-run
# Export relationship feedback as calibration labels
ktx dev scan relationship-feedback --json
# Calibrate relationship detection thresholds
ktx dev scan relationship-calibration --accept-threshold 0.9 --review-threshold 0.6
# Get threshold advice based on review decisions
ktx dev scan relationship-thresholds
ktx scan my-warehouse
ktx scan my-warehouse --mode enriched
ktx scan my-warehouse --mode relationships
ktx scan my-warehouse --dry-run
ktx scan my-warehouse --database-introspection-url http://127.0.0.1:8765
```
## Output
Scan commands write scan artifacts under the KTX project directory and print status or report summaries. Use `--json` on report and relationship commands when an agent needs structured output.
```json
{
"runId": "scan-local-abc123",
"status": "completed",
"mode": "structural",
"changes": {
"tablesAdded": 42
}
}
```
`ktx scan` prints a human-readable summary and writes scan artifacts under the KTX project directory unless `--dry-run` is set. After a scan, run `ktx status` to check project readiness and any remaining setup steps.
## Common errors
@ -165,5 +41,4 @@ Scan commands write scan artifacts under the KTX project directory and print sta
|-------|-------|----------|
| Scan cannot connect | Connection credentials or network access are invalid | Run `ktx connection test <connectionId>` and update the connection before scanning |
| Enriched scan cannot describe columns | LLM credentials are missing or invalid | Complete LLM setup with `ktx setup` before enriched scans |
| Relationship apply writes nothing | No accepted candidates match the provided run id or candidate ids | Inspect `ktx dev scan relationships <runId> --status accepted` first |
| Calibration is not ready | Too few reviewed relationship labels exist | Review and accept/reject more candidates, then rerun calibration |
| Relationship scan has limited evidence | The connector cannot provide optional validation or statistics | Re-run with a connector that supports the missing capability, or treat relationship output as lower-confidence context |

@ -1,6 +1,6 @@
---
title: "ktx sl"
description: "List, read, validate, query, or write semantic-layer sources."
description: "List, search, validate, or query semantic-layer sources."
---
Interact with your project's semantic layer. Semantic sources are YAML definitions that describe your tables, columns, measures, joins, and grain — the vocabulary agents use to generate correct SQL.
@ -16,9 +16,8 @@ ktx sl <subcommand> [options]
| Subcommand | Description |
|-----------|-------------|
| `list` | List semantic-layer sources |
| `read <sourceName>` | Read a semantic-layer source |
| `search <query>` | Search semantic-layer sources |
| `validate <sourceName>` | Validate a semantic-layer source against the database schema |
| `write <sourceName>` | Write a semantic-layer source |
| `query` | Compile or execute a semantic-layer query |
## Options
@ -31,11 +30,14 @@ ktx sl <subcommand> [options]
| `--output <mode>` | Output mode: `pretty` (default in TTY), `plain` (TSV), or `json` | `pretty` |
| `--json` | Shortcut for `--output=json` (overrides `--output`) | `false` |
### `sl read`
### `sl search`
| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id <id>` | KTX connection id (required) | — |
| `--connection-id <id>` | Filter by KTX connection id | — |
| `--limit <number>` | Maximum search results | — |
| `--output <mode>` | Output mode: `pretty` (default in TTY), `plain` (TSV), or `json` | `pretty` |
| `--json` | Shortcut for `--output=json` (overrides `--output`) | `false` |
### `sl validate`
@ -43,18 +45,12 @@ ktx sl <subcommand> [options]
|------|-------------|---------|
| `--connection-id <id>` | KTX connection id (required) | — |
### `sl write`
| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id <id>` | KTX connection id (required) | — |
| `--yaml <yaml>` | Semantic-layer source YAML content (required) | — |
### `sl query`
| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id <id>` | KTX connection id | — |
| `--query-file <path>` | JSON semantic-layer query file | — |
| `--measure <measure>` | Measure to query; repeatable (at least one required) | — |
| `--dimension <dimension>` | Dimension to include; repeatable | — |
| `--filter <filter>` | Filter expression; repeatable | — |
@ -78,15 +74,12 @@ ktx sl list --connection-id my-warehouse
# List sources as JSON
ktx sl list --json
# Read a source definition
ktx sl read orders --connection-id my-warehouse
# Search sources as JSON
ktx sl search "revenue" --json
# Validate a source against the live schema
ktx sl validate orders --connection-id my-warehouse
# Write a new source from YAML
ktx sl write customers --connection-id my-warehouse --yaml "$(cat sources/customers.yaml)"
# Compile a query and view the generated SQL
ktx sl query \
--connection-id my-warehouse \
@ -119,6 +112,13 @@ ktx sl query \
--dimension orders.created_date \
--execute \
--max-rows 1000
# Execute a query from a JSON file
ktx sl query \
--connection-id my-warehouse \
--query-file query.json \
--execute \
--max-rows 100
```
## Output
@ -143,5 +143,5 @@ Semantic-layer commands return human-readable output by default. Use `--json` or
|-------|-------|----------|
| Source not found | Source name or connection id is wrong | Run `ktx sl list --json` and retry with an exact source name and connection id |
| Validation fails | YAML references missing columns, invalid joins, or invalid SQL expressions | Fix the source YAML and rerun `ktx sl validate` |
| Query compile fails | Measure, dimension, filter, or segment name is invalid | Read the source with `ktx sl read`, then retry using declared fields |
| Query compile fails | Measure, dimension, filter, or segment name is invalid | Search sources with `ktx sl search`, inspect the source YAML in your project files, then retry using declared fields |
| Execution returns too many rows | `--max-rows` is missing or too high | Add `--max-rows` with a bounded value before executing |

@ -1,9 +1,9 @@
---
title: "ktx wiki"
description: "List, read, search, or write knowledge pages."
description: "List, read, search, or write wiki pages."
---
Manage knowledge pages in your KTX project. Knowledge pages are Markdown documents that capture business definitions, rules, and gotchas. Agents search them for context when answering questions about your data.
Manage wiki pages in your KTX project. Wiki pages are Markdown documents that capture business definitions, rules, and gotchas. Agents search them for context when answering questions about your data.
## Command signature
@ -26,19 +26,23 @@ ktx wiki <subcommand> [options]
| Flag | Description | Default |
|------|-------------|---------|
| `--json` | Print JSON output | `false` |
| `--user-id <id>` | Local user id | `local` |
### `wiki read`
| Flag | Description | Default |
|------|-------------|---------|
| `--json` | Print JSON output | `false` |
| `--user-id <id>` | Local user id | `local` |
### `wiki search`
| Flag | Description | Default |
|------|-------------|---------|
| `--json` | Print JSON output | `false` |
| `--user-id <id>` | Local user id | `local` |
| `--limit <number>` | Maximum search results | — |
### `wiki write`
@ -58,18 +62,27 @@ ktx wiki <subcommand> [options]
# List all wiki pages
ktx wiki list
# List all wiki pages as JSON
ktx wiki list --json
# Read a specific wiki page
ktx wiki read revenue-definitions
# Read a specific wiki page as JSON
ktx wiki read revenue-definitions --json
# Search wiki pages
ktx wiki search "monthly recurring revenue"
# Write a global knowledge page
# Search wiki pages as JSON
ktx wiki search "monthly recurring revenue" --json --limit 10
# Write a global wiki page
ktx wiki write revenue-definitions \
--summary "Canonical revenue metric definitions" \
--content "## MRR\nMonthly Recurring Revenue is calculated as..."
# Write a user-scoped knowledge page
# Write a user-scoped wiki page
ktx wiki write my-notes \
--scope user \
--summary "Personal analysis notes" \
@ -93,17 +106,20 @@ ktx wiki write data-freshness \
## Output
Wiki commands print local knowledge pages and search results. Agents should search first, then read the most relevant page by key.
Wiki commands print local wiki pages and search results. Agents should search first, then read the most relevant page by key.
```json
{
"results": [
{
"key": "revenue-definitions",
"summary": "Canonical revenue metric definitions",
"score": 0.92
}
]
"kind": "list",
"data": {
"items": [
{
"key": "revenue-definitions",
"summary": "Canonical revenue metric definitions",
"score": 0.92
}
]
}
}
```
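The search-then-read step described above can be sketched in TypeScript. This is a minimal sketch that assumes `ktx wiki search --json` returns the `kind`/`data.items` envelope shown in the example and that a higher `score` means a better match. `WikiSearchItem`, `WikiListEnvelope`, and `bestWikiKey` are hypothetical names for illustration, not part of KTX's API.

```typescript
// Hypothetical types mirroring the JSON envelope shown above.
interface WikiSearchItem {
  key: string;
  summary: string;
  score?: number;
}

interface WikiListEnvelope {
  kind: "list";
  data: { items: WikiSearchItem[] };
}

// Returns the key of the highest-scoring page, or undefined when nothing matched.
// Assumption: items without a score rank below any scored item.
export function bestWikiKey(raw: string): string | undefined {
  const envelope = JSON.parse(raw) as WikiListEnvelope;
  if (envelope.kind !== "list") {
    throw new Error(`unexpected output kind: ${String(envelope.kind)}`);
  }
  let best: WikiSearchItem | undefined;
  for (const item of envelope.data.items) {
    const itemScore = item.score ?? -Infinity;
    const bestScore = best?.score ?? -Infinity;
    if (best === undefined || itemScore > bestScore) {
      best = item;
    }
  }
  return best?.key;
}
```

An agent would pass the stdout of `ktx wiki search "monthly recurring revenue" --json` to `bestWikiKey`, then call `ktx wiki read <key>` with the returned key.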

@ -9,7 +9,6 @@
"ktx-sl",
"ktx-wiki",
"ktx-status",
"ktx-agent",
"ktx-dev"
]
}

@ -7,9 +7,9 @@ description: Treat analytics context like code — version it, review it, merge
dbt proved that analytics transformations belong in version control. Before dbt, SQL lived in BI tools, scheduling systems, and spreadsheets — scattered, unreviewed, impossible to audit. "Analytics as code" changed that: put your models in git, review them in PRs, deploy them by merging.
KTX applies the same principle to analytics context. Metric definitions, business rules, join relationships, knowledge pages — these are artifacts that determine whether an agent produces correct results. They change over time. They need review. They need history. They need to be treated like code.
KTX applies the same principle to analytics context. Metric definitions, business rules, join relationships, wiki pages — these are artifacts that determine whether an agent produces correct results. They change over time. They need review. They need history. They need to be treated like code.
A KTX project is a git repository. Semantic sources are YAML files. Knowledge pages are Markdown files. Changes are commits. Updates are pull requests. Deployment is a merge. The entire lifecycle of your analytics context follows the same workflow your team already uses for dbt models, application code, and infrastructure.
A KTX project is a git repository. Semantic sources are YAML files. Wiki pages are Markdown files. Changes are commits. Updates are pull requests. Deployment is a merge. The entire lifecycle of your analytics context follows the same workflow your team already uses for dbt models, application code, and infrastructure.
## Auto-ingestion
@ -19,9 +19,9 @@ An ingestion run works like this:
1. **Adapters extract metadata.** Each configured source — dbt, LookML, Metabase, MetricFlow, Notion, or your live database — provides structured metadata about models, metrics, dimensions, questions, and documentation.
2. **The LLM agent reconciles.** KTX doesn't blindly overwrite existing context. An LLM agent compares incoming metadata against your current semantic sources and knowledge pages. It decides what to create, what to update, and what to leave alone. If your dbt project added a new model, the agent writes a new semantic source. If a Metabase question references a metric you've already defined, the agent skips the duplicate.
2. **The LLM agent reconciles.** KTX doesn't blindly overwrite existing context. An LLM agent compares incoming metadata against your current semantic sources and wiki pages. It decides what to create, what to update, and what to leave alone. If your dbt project added a new model, the agent writes a new semantic source. If a Metabase question references a metric you've already defined, the agent skips the duplicate.
3. **Files are written.** New and updated YAML sources and Markdown knowledge pages are written to the project directory. Every decision is recorded in the session transcript.
3. **Files are written.** New and updated YAML sources and Markdown wiki pages are written to the project directory. Every decision is recorded in the session transcript.
This reconciliation step is what separates auto-ingestion from a simple sync. A naive import would overwrite your hand-tuned metric definitions every time dbt's manifest changes. KTX's agent-driven approach merges instead of overwriting: it respects your edits, fills gaps, and flags conflicts for human review.
@ -43,7 +43,7 @@ dbt / Looker / Metabase / Notion
|
| + 3 new sources
| ~ 2 updated joins
| + 1 knowledge page
| + 1 wiki page
v
open PR
|
@ -57,9 +57,9 @@ dbt / Looker / Metabase / Notion
agents see updated context
```
A typical branch shows a semantic diff: "this ingest added 3 new sources from dbt, updated 2 join definitions based on schema changes, and created 1 knowledge page from a Notion doc." Analytics engineers review the diff, verify that the new sources look correct, and merge.
A typical branch shows a semantic diff: "this ingest added 3 new sources from dbt, updated 2 join definitions based on schema changes, and created 1 wiki page from a Notion doc." Analytics engineers review the diff, verify that the new sources look correct, and merge.
Teams usually run this on demand while setting up a source, then schedule it once the source is stable. A cron job or CI schedule can run `ktx ingest --all --no-input` overnight on an ingest branch so the latest dbt manifests, BI metadata, and documentation updates are ready for review each morning.
Teams usually run this on demand while setting up a source, then schedule it once the source is stable. A cron job or CI schedule can run `ktx ingest run --connection-id <id> --adapter <adapter> --no-input` overnight on an ingest branch so the latest dbt manifests, BI metadata, and documentation updates are ready for review each morning.
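A scheduled job can wrap the non-interactive command above in a small Node script. This is a sketch under stated assumptions: the `sources` list is hypothetical configuration you would keep in sync with `ktx.yaml`, `ktx` is on `PATH`, and a non-zero (or missing) exit status means the run failed. `scheduledIngestArgs` and `runScheduledIngests` are illustrative names, not KTX exports.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical description of one source to refresh; keep it in sync with ktx.yaml.
interface ScheduledSource {
  connectionId: string;
  adapter: string;
}

// Builds the non-interactive argument list used for scheduled runs.
export function scheduledIngestArgs(source: ScheduledSource): string[] {
  return [
    "ingest",
    "run",
    "--connection-id",
    source.connectionId,
    "--adapter",
    source.adapter,
    "--no-input",
  ];
}

// Runs each ingest in turn and returns the connection ids whose runs did not exit cleanly.
// A missing `ktx` binary yields a null exit status, which is also reported as a failure.
export function runScheduledIngests(sources: ScheduledSource[]): string[] {
  const failed: string[] = [];
  for (const source of sources) {
    const result = spawnSync("ktx", scheduledIngestArgs(source), { stdio: "inherit" });
    if (result.status !== 0) {
      failed.push(source.connectionId);
    }
  }
  return failed;
}
```

A CI job could call `runScheduledIngests([{ connectionId: "dbt-main", adapter: "dbt" }])` on the ingest branch and fail the job when the returned list is non-empty, leaving the resulting diff for morning review.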
Once merged, agents querying through the KTX CLI see the updated context immediately. No deployment step, no cache invalidation, no restart. The files are the source of truth, and agents read them on every request.
@ -69,9 +69,9 @@ This workflow gives you the same review guarantees you have for dbt models. No s
Context improves over time through two feedback channels.
**Analyst corrections.** When an analytics engineer spots something wrong — a measure formula that doesn't match the business definition, a join that should be `many_to_one` instead of `one_to_many`, a knowledge page that's out of date — they edit the YAML or Markdown directly and commit. These corrections become part of the project's git history, and the next ingestion run respects them. If you manually fix a measure definition, KTX won't overwrite it on the next ingest.
**Analyst corrections.** When an analytics engineer spots something wrong — a measure formula that doesn't match the business definition, a join that should be `many_to_one` instead of `one_to_many`, a wiki page that's out of date — they edit the YAML or Markdown directly and commit. These corrections become part of the project's git history, and the next ingestion run respects them. If you manually fix a measure definition, KTX won't overwrite it on the next ingest.
**Agent feedback.** When an agent queries the semantic layer and gets unexpected results — a query that returns no rows because of a bad filter, a join path that produces duplicated results — it can flag the issue. These signals feed back into the context: knowledge pages can note known data quality issues, and source definitions can be tightened with better filters, join paths, or grain declarations.
**Agent feedback.** When an agent queries the semantic layer and gets unexpected results — a query that returns no rows because of a bad filter, a join path that produces duplicated results — it can flag the issue. These signals feed back into the context: wiki pages can note known data quality issues, and source definitions can be tightened with better filters, join paths, or grain declarations.
Each of these channels makes the next ingestion cycle better. Analyst corrections teach the system what your team considers authoritative. Agent feedback surfaces gaps in coverage. Context is not a static artifact — it's a living system that converges toward accuracy with every iteration.

@ -30,7 +30,7 @@ A context layer is the infrastructure that gives agents the business knowledge t
KTX organizes context into four pillars:
- Semantic sources
- Knowledge pages
- Wiki pages
- Scan artifacts
- Provenance
@ -67,7 +67,7 @@ measures:
expr: count(id)
```
**Knowledge pages** are Markdown documents that capture business definitions, rules, and operating context — the kind of context that doesn't fit in a schema definition. Pages have structured frontmatter (summary, tags, semantic layer references) and free-form content. Agents search them when they need to understand why a metric works a certain way, not just how to compute it.
**Wiki pages** are Markdown documents that capture business definitions, rules, and operating context — the kind of context that doesn't fit in a schema definition. Pages have structured frontmatter (summary, tags, semantic layer references) and free-form content. Agents search them when they need to understand why a metric works a certain way, not just how to compute it.
```markdown
---
@ -97,13 +97,13 @@ Together, these four pillars give agents enough context to produce analytics art
## How KTX compares
KTX is a context layer with an agent-native semantic layer at its core. MetricFlow, Cube, and Malloy model metrics, dimensions, joins, and generated SQL. KTX covers that semantic-layer work, then adds the context agents need to use and maintain it: knowledge pages, schema scans, provenance, ingestion, validation, and agent-facing CLI commands.
KTX is a context layer with an agent-native semantic layer at its core. MetricFlow, Cube, and Malloy model metrics, dimensions, joins, and generated SQL. KTX covers that semantic-layer work, then adds the context agents need to use and maintain it: wiki pages, schema scans, provenance, ingestion, validation, and agent-facing CLI commands.
The workflow is the difference. Traditional semantic layers are powerful, but they are usually built and maintained through manual modeling work, product-specific runtimes, or language-specific workflows. They are not agent-native by default, which makes them harder for agents to inspect, edit, validate, and review in a tight loop. KTX is designed for agents that need to read context, change semantic files, inspect generated SQL, and leave a reviewable git diff.
| | KTX semantic layer | MetricFlow | Cube | Malloy |
|---|---|---|---|---|
| **Model surface** | Plain YAML sources plus Markdown knowledge pages | YAML semantic models and metrics in a dbt project | YAML or JavaScript cubes, views, access policies, and pre-aggregations | `.malloy` models, query pipelines, notebooks, and annotations |
| **Model surface** | Plain YAML sources plus Markdown wiki pages | YAML semantic models and metrics in a dbt project | YAML or JavaScript cubes, views, access policies, and pre-aggregations | `.malloy` models, query pipelines, notebooks, and annotations |
| **What it models** | Sources, columns, measures, segments, joins, grain, filters, default time dimensions, and context references | Semantic models, entities, dimensions, measures, metrics, time grains, and metric types | Cubes, views, measures, dimensions, segments, joins, hierarchies, policies, and rollups | Sources, joins, dimensions, measures, calculations, nested results, and query pipelines |
| **Agent edit loop** | First-class. Agents can patch small files, save imperfect drafts, run validation, query through the CLI, inspect SQL, and refine in the same workflow | Possible, but the interface is a dbt/metric workflow rather than an agent context workflow | Possible through code-first models and platform APIs, but changes are tied to runtime deployment and governance concerns | Possible, but agents must operate in Malloy's language and compiler model |
| **Fan-out safety** | Explicit `grain` plus relationship metadata. KTX detects `one_to_many` fan-out, identifies chasm traps, pre-aggregates independent fact measures into CTEs, and rejects unsafe filters | Dataflow query planning for metric requests, multi-hop joins, metric time, and metric types | Runtime planner, modeled joins, primary keys, views, multi-fact views, and pre-aggregations | Symmetric aggregates and path-based aggregation in the language |
@ -111,7 +111,7 @@ The workflow is the difference. Traditional semantic layers are powerful, but th
| **Context around semantics** | Built in: wiki pages, scan artifacts, relationship inference, ingest transcripts, replay, and agent-facing CLI commands | Primarily metric and dbt project context | Descriptions and `meta.ai_context` inside the semantic model, plus platform agent features | Annotations/tags can carry metadata; surrounding context depends on the application |
| **Best fit** | Agents maintaining analytics code, metrics, joins, SQL, docs, and semantic definitions | Teams standardizing metrics inside dbt workflows | Production semantic APIs, BI integrations, access control, caching, and concurrency | Expressive modeling and exploratory analysis above SQL |
If you do not have a semantic layer, KTX can build an agent-native one from your database schema and enrich it with generated descriptions and knowledge pages. If you already use MetricFlow or LookML, KTX ingests from those tools and merges their context into KTX's files. You can keep your existing BI or metric-serving system while using KTX as the semantic and contextual surface agents work against.
If you do not have a semantic layer, KTX can build an agent-native one from your database schema and enrich it with generated descriptions and wiki pages. If you already use MetricFlow or LookML, KTX ingests from those tools and merges their context into KTX's files. You can keep your existing BI or metric-serving system while using KTX as the semantic and contextual surface agents work against.
## The plain-files philosophy
@ -125,7 +125,7 @@ my-project/
│ ├── orders.yaml # Semantic source definitions
│ ├── customers.yaml
│ └── order_items.yaml
├── knowledge/
├── wiki/
│ ├── global/
│ │ ├── revenue.md # Business definitions and rules
│ │ └── segment-classification.md
@ -140,7 +140,7 @@ my-project/
└── cache/ # Runtime cache (git-ignored)
```
Semantic sources and knowledge pages are committed to git. The SQLite database holds ephemeral state — scan results, embedding indexes, session logs — and is git-ignored. If you delete it, KTX rebuilds it on the next run.
Semantic sources and wiki pages are committed to git. The SQLite database holds ephemeral state — scan results, embedding indexes, session logs — and is git-ignored. If you delete it, KTX rebuilds it on the next run.
This means your analytics context travels with your code. You can fork it, branch it, review it in a PR, and merge it with the same tools you use for dbt models. There's no sync problem between a remote server and your local state. There's no migration to run. The files are the source of truth.

@ -88,5 +88,5 @@ Works with PostgreSQL, Snowflake, BigQuery, ClickHouse, MySQL, and SQL Server.
| Set up a new KTX project | [Quickstart](/docs/getting-started/quickstart) |
| Explain what problem KTX solves | [The Context Layer](/docs/concepts/the-context-layer) |
| Scan a database and ingest metadata | [Building Context](/docs/guides/building-context) |
| Edit semantic sources or knowledge pages | [Writing Context](/docs/guides/writing-context) |
| Edit semantic sources or wiki pages | [Writing Context](/docs/guides/writing-context) |
| Look up exact command flags | [CLI Reference](/docs/cli-reference/ktx-setup) |

@ -146,7 +146,7 @@ This is where KTX does the heavy lifting. It runs an enriched scan of your datab
│ ○ Leave context unbuilt and exit setup
```
The build scans each primary source with LLM enrichment, detects table relationships, and runs ingestion agents that reconcile metadata from your context sources into semantic-layer YAML files and knowledge pages.
The build scans each primary source with LLM enrichment, detects table relationships, and runs ingestion agents that reconcile metadata from your context sources into semantic-layer YAML files and wiki pages.
For a small database (under 50 tables), this takes a few minutes. Larger warehouses can take longer. You can press <kbd>d</kbd> to detach and let it run in the background:
@ -208,10 +208,10 @@ KTX writes project state as plain files so agents can inspect and edit changes i
|------|------------|---------|
| `ktx.yaml` | `ktx setup` | Main project configuration: connections, LLM settings, embeddings, and context sources |
| `.ktx/secrets/*` | `ktx setup` when file-backed secrets are selected | Local secret files referenced from `ktx.yaml`; do not commit these |
| `semantic-layer/<connection-id>/*.yaml` | context build, ingestion, or `ktx sl write` | Semantic source definitions agents use for SQL generation |
| `knowledge/global/*.md` | ingestion or `ktx wiki write --scope global` | Shared business context and metric definitions |
| `knowledge/user/<user-id>/*.md` | `ktx wiki write --scope user` | User-scoped notes for one agent/user context |
| `.claude/skills/ktx/SKILL.md`, `.agents/skills/ktx/SKILL.md` | CLI-mode agent integration setup | Agent instructions for calling `ktx agent` commands |
| `semantic-layer/<connection-id>/*.yaml` | context build, ingestion, or direct file edits | Semantic source definitions agents use for SQL generation |
| `wiki/global/*.md` | ingestion, memory capture, `ktx wiki write --scope global`, or direct file edits | Shared business context and metric definitions |
| `wiki/user/<user-id>/*.md` | memory capture, `ktx wiki write --scope user`, or direct file edits | User-scoped notes for one agent/user context |
| `.claude/skills/ktx/SKILL.md`, `.agents/skills/ktx/SKILL.md` | CLI-mode agent integration setup | Agent instructions for calling public `ktx` commands |
## Verify it worked
@ -239,14 +239,14 @@ Agent integration ready: yes (claude-code:project)
| `ktx: command not found` | The KTX package is not installed globally, or the shell cannot find the global binary | Run `npm install -g @kaelio/ktx` and open a new shell |
| LLM health check fails | Missing, invalid, or unauthorized Anthropic API key | Export `ANTHROPIC_API_KEY` or rerun `ktx setup` and choose the file-backed secret option |
| OpenAI embedding check fails | `OPENAI_API_KEY` is missing when OpenAI embeddings are selected | Export `OPENAI_API_KEY`, or rerun setup and choose local sentence-transformers embeddings |
| Local embeddings hang or fail | The managed Python runtime cannot start or the local model runtime is unavailable | Install `uv`, run `ktx dev runtime doctor`, then run `ktx dev runtime install --feature local-embeddings --yes` and rerun setup |
| Database connection test fails | Credentials, network access, warehouse, database, or schema value is wrong | Test the same URL with the database's native client, then rerun `ktx connection add ... --force` or rerun setup |
| Local embeddings hang or fail | The managed Python runtime cannot start or the local model runtime is unavailable | Install `uv`, run `ktx dev runtime status`, then run `ktx dev runtime install --feature local-embeddings --yes` and rerun setup |
| Database connection test fails | Credentials, network access, warehouse, database, or schema value is wrong | Test the same URL with the database's native client, then rerun `ktx setup` and reconfigure the connection |
| `KTX context built: no` in `ktx status` | Setup saved configuration but did not build context | Run `ktx setup` and choose to build context now |
| Agent integration is incomplete | Setup skipped the agents step or the target was not installed | Run `ktx setup --agents --target codex --project` using the target you need |
## Next steps
- **Build more context** — learn about [scanning](/docs/guides/building-context), relationship detection, and ingestion workflows in the Building Context guide.
- **Refine your semantic layer** — the [Writing Context](/docs/guides/writing-context) guide covers source YAML, measures, joins, and knowledge pages.
- **Refine your semantic layer** — the [Writing Context](/docs/guides/writing-context) guide covers source YAML, measures, joins, and wiki pages.
- **Understand the architecture** — read [The Context Layer](/docs/concepts/the-context-layer) to learn why a context layer is more than a semantic layer.
- **Connect more agents** — see the [Agent Clients](/docs/integrations/agent-clients) integration page for per-tool setup details.

@ -12,7 +12,7 @@ Scanning connects to your database and extracts structural metadata. KTX stores
### Running a scan
```bash
ktx dev scan <connection-id>
ktx scan <connection-id>
```
This runs a structural scan by default. You can control what the scan does with the `--mode` flag:
@ -25,25 +25,18 @@ This runs a structural scan by default. You can control what the scan does with
```bash
# Scan with relationship detection
ktx dev scan my-postgres --mode relationships
ktx scan my-postgres --mode relationships
# Preview without writing results
ktx dev scan my-postgres --dry-run
ktx scan my-postgres --dry-run
```
### Checking scan status
### Checking scan results
Every scan produces a run ID. Use it to check progress or review results:
Every scan prints a summary and writes local artifacts. Use `ktx status` after a scan to review project readiness and follow-up setup work:
```bash
# Check status of a scan run
ktx dev scan status <run-id>
# Print the full scan report
ktx dev scan report <run-id>
# Get the report as JSON for scripting
ktx dev scan report <run-id> --json
ktx status
```
### Relationship detection
@ -56,53 +49,11 @@ Many databases lack declared foreign keys. KTX infers relationships by scoring c
| 0.55 &ndash; 0.84 | `review` | Plausible — needs human review |
| &lt; 0.55 | `rejected` | Low confidence — not applied |
After a relationship scan, review the candidates:
```bash
# Show candidates pending review (default)
ktx dev scan relationships <run-id>
# Show all candidates regardless of status
ktx dev scan relationships <run-id> --status all
# Accept a specific candidate
ktx dev scan relationships <run-id> --accept <candidate-id>
# Reject a candidate with a note
ktx dev scan relationships <run-id> --reject <candidate-id> --note "These columns share a name but are unrelated"
```
Once you've reviewed candidates, apply the accepted ones as joins in your semantic layer:
```bash
# Apply all accepted relationships
ktx dev scan relationship-apply <run-id> --all-accepted
# Preview what would be applied
ktx dev scan relationship-apply <run-id> --all-accepted --dry-run
# Apply a specific candidate
ktx dev scan relationship-apply <run-id> --candidate <candidate-id>
```
### Calibrating thresholds
As you review more relationships, KTX can evaluate whether the default thresholds (0.85 accept, 0.55 review) are optimal for your schema:
```bash
# See how your feedback aligns with current thresholds
ktx dev scan relationship-calibration --connection my-postgres
# Get threshold recommendations (needs 20+ labels, 5+ accepted, 5+ rejected)
ktx dev scan relationship-thresholds --connection my-postgres
# Export your review decisions as calibration labels
ktx dev scan relationship-feedback --connection my-postgres
```
Relationship scans run with `ktx scan <connection-id> --mode relationships`. This command only executes the scan; relationship review and calibration subcommands are not part of the current CLI surface.
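The confidence bands in the table above can be read as a simple score-to-status mapping. The sketch below is only an illustration: `classifyCandidate` is a hypothetical helper, not part of the KTX CLI or API. It assumes the top band (scores of 0.85 and above) is labeled `accepted`, because that table row is cut off in this excerpt.

```typescript
// Hypothetical sketch: map a relationship confidence score to a review status.
// Bands follow the table: >= 0.85 accepted (label assumed), 0.55-0.84 review, < 0.55 rejected.
type CandidateStatus = "accepted" | "review" | "rejected";

interface Thresholds {
  accept: number; // minimum score for automatic acceptance
  review: number; // minimum score that still warrants human review
}

const DEFAULT_THRESHOLDS: Thresholds = { accept: 0.85, review: 0.55 };

function classifyCandidate(
  score: number,
  thresholds: Thresholds = DEFAULT_THRESHOLDS,
): CandidateStatus {
  if (Number.isNaN(score) || score < 0 || score > 1) {
    throw new RangeError(`score must be in [0, 1], got ${score}`);
  }
  if (score >= thresholds.accept) return "accepted";
  if (score >= thresholds.review) return "review";
  return "rejected";
}

console.log([0.92, 0.7, 0.4].map((s) => classifyCandidate(s)));
```

The thresholds are a parameter so that the boundary values stay explicit rather than being buried in comparisons.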
## Ingestion
Ingestion pulls semantic context from your existing analytics tools — dbt projects, Looker models, Metabase questions, and more — and writes it into your KTX project as semantic sources and knowledge pages.
Ingestion pulls semantic context from your existing analytics tools — dbt projects, Looker models, Metabase questions, and more — and writes it into your KTX project as semantic sources and wiki pages.
### How it works
@ -110,24 +61,12 @@ Each ingest run follows this flow:
1. An **adapter** extracts metadata from your tool (dbt manifest, LookML files, Metabase API, etc.)
2. An **LLM agent** reconciles the extracted metadata with your existing context — it merges intelligently rather than overwriting
3. **Semantic sources** (YAML) and **knowledge pages** (Markdown) are written to your project directory
3. **Semantic sources** (YAML) and **wiki pages** (Markdown) are written to your project directory
### Running an ingest
```bash
# Ingest one configured context source
ktx ingest my-dbt-source
# Ingest every configured context source
ktx ingest --all
```
The public `ktx ingest` command uses the source configuration in `ktx.yaml`, including the source `driver` and any adapter-specific paths or credentials.
For adapter-level debugging, use the low-level `ktx dev ingest run` command:
```bash
ktx dev ingest run --connection-id my-dbt-source --adapter dbt
ktx ingest run --connection-id my-dbt-source --adapter dbt
```
Useful low-level flags:
@ -152,7 +91,7 @@ ktx ingest status <run-id>
ktx ingest watch
# Replay a past ingest run
ktx dev ingest replay <run-id>
ktx ingest replay <run-id>
```
The `watch` command opens an interactive TUI that shows the memory-flow output — every tool call, LLM decision, and artifact written during the ingest.
@ -174,7 +113,7 @@ See [Context Sources](/docs/integrations/context-sources) for adapter-specific s
### What gets generated
A typical dbt ingest produces semantic sources and knowledge pages in your project:
A typical dbt ingest produces semantic sources and wiki pages in your project:
**Semantic source** (`semantic-layer/my-postgres/orders.yaml`):
@ -210,7 +149,7 @@ joins:
relationship: many_to_one
```
**Knowledge page** (`knowledge/global/order-status-definitions.md`):
**Wiki page** (`wiki/global/order-status-definitions.md`):
```markdown
---
@ -235,7 +174,7 @@ Orders in "pending" status for more than 48 hours are flagged for review.
Every ingest session records a full transcript — tool calls, LLM responses, and write decisions. You can replay any session to debug why a source was written a certain way:
```bash
ktx dev ingest replay <run-id> --viz
ktx ingest replay <run-id> --viz
```
This opens the same TUI view as the original run, letting you step through the agent's reasoning.

@ -3,61 +3,44 @@ title: Serving Agents
description: Expose your context to Claude Code, Cursor, Codex, and other coding agents.
---
Once you've built and refined your context, the final step is exposing it to
coding agents. KTX provides machine-readable CLI commands for direct terminal
access from Claude Code, Cursor, Codex, OpenCode, and custom agent workflows.
Once you've built and refined your context, expose it to coding agents through
the public KTX CLI. Claude Code, Cursor, Codex, OpenCode, and custom agent
workflows can call the same commands you use at a terminal.
## CLI Commands
KTX provides a set of machine-readable commands under `ktx agent`. These return
JSON output designed for programmatic consumption.
KTX public commands support JSON output for the context reads that agents use
most often. Use `--project-dir` when the agent is not already running inside the
KTX project directory.
### Available commands
```bash
# List available tools and their descriptions
ktx agent tools --json
# Get project context for planning
ktx agent context --json
# Check setup and context readiness
ktx status --json
```
**Semantic layer:**
```bash
# List sources
ktx agent sl list --json
ktx agent sl list --json --connection-id my-postgres
# Read a source
ktx agent sl read orders --json --connection-id my-postgres
ktx sl list --json
ktx sl list --json --connection-id my-postgres
ktx sl search "revenue" --json
# Run a query from a JSON file
ktx agent sl query --json \
ktx sl query --json \
--connection-id my-postgres \
--query-file query.json \
--execute \
--max-rows 100
```
**Knowledge:**
**Wiki:**
```bash
# Search knowledge pages
ktx agent wiki search "revenue recognition" --json --limit 10
# Read a specific page
ktx agent wiki read order-status-definitions --json
```
**SQL execution:**
```bash
# Execute read-only SQL with a row limit
ktx agent sql execute --json \
--connection-id my-postgres \
--sql-file query.sql \
--max-rows 500
# Search wiki pages
ktx wiki search "revenue recognition" --json --limit 10
```
## Setting Up Your Agent
@ -73,4 +56,4 @@ configuration. For manual setup or per-tool details, see the
[Agent Clients](/docs/integrations/agent-clients) integration page.
After configuration, the agent can immediately call KTX commands to list
sources, search knowledge, and query your semantic layer.
sources, search wiki pages, and query your semantic layer.
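A custom agent workflow can shell out to the same commands. The sketch below is an illustration, not a KTX client library. `ktxArgs` and `runKtxJson` are hypothetical helpers. The sketch assumes the `ktx` binary is on `PATH`, and it parses stdout as JSON without assuming any particular output shape.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical helper: build argv for a public ktx command that should return JSON.
// Appends --json and, when given, --project-dir so the agent can run outside the project.
function ktxArgs(command: string[], projectDir?: string): string[] {
  const args = [...command, "--json"];
  if (projectDir !== undefined) {
    args.push("--project-dir", projectDir);
  }
  return args;
}

// Hypothetical helper: run ktx and parse stdout as JSON.
// The structure of the parsed value is not assumed; callers must inspect it.
function runKtxJson(command: string[], projectDir?: string): unknown {
  const stdout = execFileSync("ktx", ktxArgs(command, projectDir), {
    encoding: "utf8",
  });
  return JSON.parse(stdout);
}

// Example: the argv for listing semantic-layer sources on one connection.
const args = ktxArgs(["sl", "list", "--connection-id", "my-postgres"], "/path/to/project");
console.log(["ktx", ...args].join(" "));
```

`execFileSync` passes arguments directly to the binary without going through a shell. As a result, quoting problems in queries or paths cannot turn into shell injection.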

@ -1,20 +1,20 @@
---
title: Writing Context
description: Write and refine semantic sources and knowledge pages.
description: Write and refine semantic sources and wiki pages.
---
After building context through scanning and ingestion, you'll want to refine it — edit semantic sources to match your business logic, add knowledge pages that capture tribal knowledge, and query your data through the semantic layer to verify everything works.
After building context through scanning and ingestion, you'll want to refine it — edit semantic sources to match your business logic, add wiki pages that capture tribal knowledge, and query your data through the semantic layer to verify everything works.
## Agent workflow summary
Agents should refine context in this order:
1. `ktx sl list --json` — discover available sources and connection ids.
2. `ktx sl read <source> --connection-id <id>` — inspect the current YAML.
3. Edit the source YAML directly or use `ktx sl write`.
2. `ktx sl search <query> --json` — find source candidates for a concept.
3. Edit the source YAML directly in `semantic-layer/<connection-id>/`.
4. `ktx sl validate <source> --connection-id <id>` — verify columns, joins, and table references.
5. `ktx sl query ... --format sql` — compile a representative query without executing it.
6. `ktx wiki search ...` and `ktx wiki write ...` — add business context that does not belong in schema YAML.
6. `ktx wiki search ...` — check business context captured by ingest or memory.
## Semantic Sources
@ -33,13 +33,14 @@ ktx sl list --connection-id my-postgres
ktx sl list --json
```
### Reading a source
### Searching sources
```bash
ktx sl read orders --connection-id my-postgres
ktx sl search "revenue" --connection-id my-postgres --json
```
This prints the full YAML definition for the source.
Search returns ranked source summaries. To inspect or edit a source, open the
YAML file under `semantic-layer/<connection-id>/`.
### The source schema
@ -147,25 +148,10 @@ Column visibility controls what agents see:
| `internal` | Available for joins and measures but not shown to agents |
| `hidden` | Excluded entirely — useful for ETL columns |
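The two restricted visibility levels can be sketched as two filters: one for what agents see and one for what joins and measures may reference. This is an illustration, not KTX's implementation. The `Column` shape and the helper names are hypothetical. Any visibility other than `internal` or `hidden`, including an omitted one, is assumed to mean fully visible, because the default value is not shown in this excerpt.

```typescript
// Hypothetical column shape; only `visibility` matters for this sketch.
interface Column {
  name: string;
  visibility?: string; // "internal", "hidden", or an assumed fully visible default
}

// Columns agents see: everything except internal and hidden columns.
function agentVisibleColumns(columns: Column[]): Column[] {
  return columns.filter(
    (c) => c.visibility !== "internal" && c.visibility !== "hidden",
  );
}

// Columns that joins and measures may reference: only hidden columns are excluded.
function referenceableColumns(columns: Column[]): Column[] {
  return columns.filter((c) => c.visibility !== "hidden");
}

const orderColumns: Column[] = [
  { name: "order_id" },
  { name: "customer_key", visibility: "internal" },
  { name: "_etl_loaded_at", visibility: "hidden" },
];

console.log(agentVisibleColumns(orderColumns).map((c) => c.name));
console.log(referenceableColumns(orderColumns).map((c) => c.name));
```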
### Writing a source
### Editing a source
```bash
ktx sl write orders --connection-id my-postgres --yaml '
name: orders
table: public.orders
grain: [order_id]
columns:
- name: order_id
type: string
- name: total_amount
type: number
measures:
- name: total_revenue
expr: SUM(total_amount)
'
```
You can also edit source files directly — they live at `semantic-layer/<connection-id>/<source-name>.yaml` in your project directory.
Edit source files directly. They live at
`semantic-layer/<connection-id>/<source-name>.yaml` in your project directory.
### Validating sources
@ -225,28 +211,27 @@ The query planner is grain-aware — it understands the cardinality of joins and
### Workflow: edit and validate a source
1. `ktx sl read orders --connection-id my-postgres > /tmp/orders.yaml` — capture the current definition.
2. Edit `/tmp/orders.yaml` to add columns, measures, joins, or descriptions.
3. `ktx sl write orders --connection-id my-postgres --yaml "$(cat /tmp/orders.yaml)"` — write the updated source.
4. `ktx sl validate orders --connection-id my-postgres` — check the definition against the live schema.
5. `ktx sl query --connection-id my-postgres --measure total_revenue --dimension order_date --format sql` — compile a representative query.
1. Open `semantic-layer/my-postgres/orders.yaml`.
2. Edit the file to add columns, measures, joins, or descriptions.
3. `ktx sl validate orders --connection-id my-postgres` — check the definition against the live schema.
4. `ktx sl query --connection-id my-postgres --measure total_revenue --dimension order_date --format sql` — compile a representative query.
If validation fails, fix the YAML before asking an agent to use the source. Common validation failures are missing columns, invalid join targets, and measure expressions that reference fields outside the source.
## Knowledge Pages
## Wiki Pages
Knowledge pages are Markdown files that capture business context — definitions, rules, gotchas, and anything an agent needs to understand beyond what the schema tells it.
Wiki pages are Markdown files that capture business context — definitions, rules, gotchas, and anything an agent needs to understand beyond what the schema tells it.
### What they are
When an agent asks "what counts as an active user?" or "why do revenue numbers differ between the dashboard and the SQL query?", the answer isn't in the schema. It's tribal knowledge that lives in Slack threads, Notion pages, or someone's head. Knowledge pages make that context searchable and available to agents.
When an agent asks "what counts as an active user?" or "why do revenue numbers differ between the dashboard and the SQL query?", the answer isn't in the schema. It's tribal knowledge that lives in Slack threads, Notion pages, or someone's head. Wiki pages make that context searchable and available to agents.
### Organization
Knowledge pages are organized by scope:
Wiki pages are organized by scope:
```
knowledge/
wiki/
├── global/ # Cross-cutting definitions
│ ├── order-status-definitions.md
│ ├── revenue-recognition-rules.md
@ -260,42 +245,17 @@ knowledge/
- **Global pages** apply across all connections — business definitions, metric standards, company terminology.
- **User-scoped pages** are private to a user ID — personal notes, local gotchas, or context you do not want shared globally.
### Writing pages
### Editing pages
```bash
ktx wiki write order-status-definitions \
--scope global \
--summary "Business definitions for order status values" \
--content "## Order Statuses
Create and edit wiki pages directly as Markdown files in the `wiki/`
directory, or with `ktx wiki write`. Ingest and memory capture also create
these pages automatically.
- **pending**: Order placed but not yet processed
- **confirmed**: Payment received, awaiting fulfillment
- **shipped**: Order dispatched to carrier
- **delivered**: Order received by customer
- **cancelled**: Order cancelled before shipment
Orders in pending status for more than 48 hours are flagged for review." \
--tag orders \
--tag definitions \
--sl-ref orders
```
Write flags:
| Flag | Description |
|------|-------------|
| `--scope <scope>` | `global` (default) or `user` |
| `--summary <text>` | Short description for search results (required) |
| `--content <text>` | Full Markdown content (required) |
| `--tag <tag>` | Categorization tag (repeatable) |
| `--ref <ref>` | Reference to external resources (repeatable) |
| `--sl-ref <ref>` | Link to a semantic source (repeatable) |
Knowledge page fields:
Wiki page fields:
| Field | Required | Description |
|-------|----------|-------------|
| Key | Yes | Stable page identifier passed to `ktx wiki read` |
| Key | Yes | Stable page identifier used as the Markdown filename |
| Summary | Yes | Short text shown in search results |
| Content | Yes | Full Markdown business context |
| Scope | No | `global` for shared context or `user` for user-scoped notes |
@ -303,20 +263,12 @@ Knowledge page fields:
| External refs | No | Links or identifiers for source-of-truth systems |
| Semantic-layer refs | No | Source names the page explains or constrains |
You can also create and edit knowledge pages directly as Markdown files in the `knowledge/` directory.
### Listing pages
```bash
ktx wiki list
```
### Reading a page
```bash
ktx wiki read order-status-definitions
```
### Searching
```bash
@ -328,9 +280,9 @@ Search uses both full-text matching and semantic similarity — it finds relevan
### Workflow: add searchable business context
1. Search first: `ktx wiki search "order status definitions"`.
2. If no page already covers the rule, write a page with `ktx wiki write`.
3. Include a concise `--summary`; agents see this before loading full content.
4. Add `--tag` values for the business area and `--sl-ref` values for related semantic sources.
2. If no page already covers the rule, create or edit a Markdown file under `wiki/global/`.
3. Include a concise summary in the frontmatter; agents see it before loading the full content.
4. Add `tags` for the business area and `sl_refs` for related semantic sources.
5. Search again with the user's likely wording to confirm the page is discoverable.
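The workflow above amounts to writing a Markdown file with frontmatter. The sketch below renders such a file. `WikiPage`, `wikiPagePath`, and `renderWikiPage` are hypothetical. Apart from `tags` and `sl_refs`, which the workflow names, the `summary` frontmatter key is an assumption, so compare against an existing page in `wiki/global/` before relying on this layout.

```typescript
// Hypothetical input shape for a wiki page.
interface WikiPage {
  key: string; // stable identifier, used as the Markdown filename
  scope: "global" | "user";
  userId?: string; // required for user-scoped pages
  summary: string;
  content: string;
  tags?: string[];
  slRefs?: string[]; // semantic source names the page explains
}

// Paths follow the layout wiki/global/<key>.md and wiki/user/<user-id>/<key>.md.
function wikiPagePath(page: WikiPage): string {
  if (page.scope === "user") {
    if (!page.userId) throw new Error("user-scoped pages need a userId");
    return `wiki/user/${page.userId}/${page.key}.md`;
  }
  return `wiki/global/${page.key}.md`;
}

// Render frontmatter plus body. JSON-encoded strings and string arrays are also
// valid YAML 1.2 flow scalars and sequences, which keeps quoting safe.
function renderWikiPage(page: WikiPage): string {
  const lines = ["---", `summary: ${JSON.stringify(page.summary)}`];
  if (page.tags?.length) lines.push(`tags: ${JSON.stringify(page.tags)}`);
  if (page.slRefs?.length) lines.push(`sl_refs: ${JSON.stringify(page.slRefs)}`);
  lines.push("---", "", page.content.trim(), "");
  return lines.join("\n");
}

const page: WikiPage = {
  key: "order-status-definitions",
  scope: "global",
  summary: "Business definitions for order status values",
  content: "## Order Statuses\n\n- **pending**: Order placed but not yet processed\n- **shipped**: Order dispatched to carrier",
  tags: ["orders", "definitions"],
  slRefs: ["orders"],
};

console.log(wikiPagePath(page));
console.log(renderWikiPage(page));
```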
## Common errors
@ -339,6 +291,6 @@ Search uses both full-text matching and semantic similarity — it finds relevan
|------------------|--------------|----------|
| `ktx sl validate` reports a missing column | YAML references a column that is absent from the scanned table | Run a fresh scan or update the YAML to match the warehouse schema |
| Query compilation double-counts a measure | Join relationship or grain is missing or wrong | Add `grain` and explicit `relationship` values, then validate and recompile |
| Agent cannot find a metric | Measure name or description does not match business terminology | Add a measure description and a knowledge page with common synonyms |
| Knowledge search misses a page | Summary and tags do not include likely user wording | Rewrite the summary and add relevant tags, then search again |
| `ktx sl write` changes are hard to review | Large YAML was passed inline | Edit the source file directly or write from a temporary file, then review the git diff |
| Agent cannot find a metric | Measure name or description does not match business terminology | Add a measure description and a wiki page with common synonyms |
| Wiki search misses a page | Summary and tags do not include likely user wording | Rewrite the summary and add relevant tags, then search again |
| Semantic-layer changes are hard to review | The YAML edit is too large or unfocused | Split the change into smaller source-file edits, then review the git diff |
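The double-counting row is easiest to see with numbers. The sketch below is a generic illustration of join fan-out, not KTX's query planner. The `orders` and `order_items` rows are made-up sample data. The sketch only shows why a one-to-many join inflates a measure defined at the order grain, and how deduplicating on the grain key fixes it.

```typescript
interface Order {
  order_id: string;
  total_amount: number;
}
interface OrderItem {
  order_id: string;
  sku: string;
}

// Made-up sample data: order o1 has two line items, o2 has one.
const orders: Order[] = [
  { order_id: "o1", total_amount: 100 },
  { order_id: "o2", total_amount: 50 },
];
const items: OrderItem[] = [
  { order_id: "o1", sku: "a" },
  { order_id: "o1", sku: "b" },
  { order_id: "o2", sku: "c" },
];

// One-to-many join: each order row repeats once per matching line item.
function joinItems(os: Order[], is: OrderItem[]): Array<Order & { sku: string }> {
  return is.flatMap((item) => {
    const order = os.find((o) => o.order_id === item.order_id);
    return order ? [{ ...order, sku: item.sku }] : [];
  });
}

// Naive SUM(total_amount) over the joined rows counts o1 twice.
function naiveRevenue(rows: Order[]): number {
  return rows.reduce((sum, row) => sum + row.total_amount, 0);
}

// Grain-aware: keep one value per order_id (the orders grain) before summing.
function grainAwareRevenue(rows: Order[]): number {
  const perOrder = new Map<string, number>();
  for (const row of rows) perOrder.set(row.order_id, row.total_amount);
  return Array.from(perOrder.values()).reduce((sum, amount) => sum + amount, 0);
}

const joined = joinItems(orders, items);
console.log(naiveRevenue(joined), grainAwareRevenue(joined));
```

Declaring `grain` and an explicit `relationship` in the source YAML gives the planner the information it needs to do this deduplication.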

@ -3,7 +3,9 @@ title: Agent Clients
description: Set up KTX with Claude Code, Cursor, Codex, and OpenCode.
---
KTX integrates with coding agents through CLI skills and command files. These files teach agents to call `ktx agent ...` commands directly from the terminal for semantic-layer context, wiki knowledge, and safe SQL execution.
KTX integrates with coding agents through CLI skills and command files. These
files teach agents to call public `ktx` commands directly from the terminal for
semantic-layer context and wiki knowledge.
Run `ktx setup` and select your agent targets, or configure manually using the snippets below.
@ -26,17 +28,15 @@ Create `.claude/skills/ktx/SKILL.md`:
```markdown title=".claude/skills/ktx/SKILL.md"
---
name: ktx
description: Use local KTX semantic context, wiki knowledge, and safe SQL execution for this project.
description: Use local KTX semantic context and wiki knowledge for this project.
---
Available commands:
- `ktx agent context --json --project-dir /path/to/project`
- `ktx agent sl list --json --project-dir /path/to/project`
- `ktx agent sl read '<sourceName>' --json --project-dir /path/to/project`
- `ktx agent sl query --json --project-dir /path/to/project --connection-id '<id>' --query-file '<path>' --execute --max-rows 100`
- `ktx agent wiki search '<query>' --json --project-dir /path/to/project`
- `ktx agent wiki read '<pageId>' --json --project-dir /path/to/project`
- `ktx agent sql execute --json --project-dir /path/to/project --connection-id '<id>' --sql-file '<path>' --max-rows 100`
- `ktx status --json --project-dir /path/to/project`
- `ktx sl list --json --project-dir /path/to/project`
- `ktx sl search '<text>' --json --project-dir /path/to/project --connection-id '<id>'`
- `ktx sl query --json --project-dir /path/to/project --connection-id '<id>' --query-file '<path>' --execute --max-rows 100`
- `ktx wiki search '<query>' --json --project-dir /path/to/project --limit 10`
```
### Workflow tips
@ -123,22 +123,17 @@ All supported agent clients call the same KTX CLI commands:
| Command | Description |
|---------|-------------|
| `ktx agent context --json` | Return a compact project context summary |
| `ktx agent tools --json` | List available agent-facing commands |
| `ktx agent wiki search <query> --json` | Search knowledge pages |
| `ktx agent wiki read <key> --json` | Read a knowledge page |
| `ktx agent wiki write --json` | Write or update a knowledge page |
| `ktx agent sl list --json` | List semantic layer sources |
| `ktx agent sl read <source> --json` | Read a semantic source definition |
| `ktx agent sl write --json` | Write or update a semantic source |
| `ktx agent sl validate --json` | Validate semantic source definitions |
| `ktx agent sl query --json` | Execute a semantic layer query when semantic compute is configured |
| `ktx agent sql execute --json` | Execute read-only SQL with an explicit row limit |
| `ktx status --json` | Return project setup and context readiness |
| `ktx wiki search <query> --json` | Search wiki pages |
| `ktx wiki read <key> --json` | Read a wiki page |
| `ktx wiki write <key>` | Write or update a wiki page |
| `ktx sl list --json` | List semantic-layer sources |
| `ktx sl search <query> --json` | Search semantic-layer sources |
| `ktx sl validate <source> --connection-id <id>` | Validate semantic source definitions |
| `ktx sl query --json` | Execute a semantic-layer query when semantic compute is configured |
### Security constraints
- SQL execution is always read-only.
- Agent SQL execution requires an explicit `--max-rows` limit from 1 to 1000.
- Secrets and credentials are never exposed in command output.
- Commands resolve the project from `--project-dir`, `KTX_PROJECT_DIR`, or the nearest `ktx.yaml`.
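The resolution order in the last bullet can be sketched as follows. This is a hypothetical reimplementation for illustration only: the function name and the injectable `exists` parameter are not KTX APIs. The sketch assumes precedence follows the order the bullet lists: `--project-dir`, then `KTX_PROJECT_DIR`, then the nearest ancestor directory that contains `ktx.yaml`.

```typescript
import { existsSync } from "node:fs";
import * as path from "node:path";

// Hypothetical sketch of project resolution:
// 1. explicit --project-dir, 2. KTX_PROJECT_DIR, 3. nearest ancestor containing ktx.yaml.
function resolveProjectDir(
  flagValue: string | undefined,
  env: Record<string, string | undefined> = process.env,
  cwd: string = process.cwd(),
  exists: (p: string) => boolean = existsSync,
): string | undefined {
  if (flagValue) return path.resolve(cwd, flagValue);
  const fromEnv = env.KTX_PROJECT_DIR;
  if (fromEnv) return path.resolve(cwd, fromEnv);

  // Walk up from cwd until a directory contains ktx.yaml or the filesystem root is reached.
  let dir = path.resolve(cwd);
  while (true) {
    if (exists(path.join(dir, "ktx.yaml"))) return dir;
    const parent = path.dirname(dir);
    if (parent === dir) return undefined;
    dir = parent;
  }
}
```

The `exists` parameter makes the lookup testable without touching the real filesystem.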

@ -13,9 +13,9 @@ Agents should configure and ingest context sources in this order:
1. Add the context source connection in `ktx.yaml` or with `ktx setup`.
2. Store tokens as `env:NAME` or `file:/path/to/secret`.
3. Run `ktx ingest <connectionId>` for one source or `ktx ingest --all`.
3. Run `ktx ingest run --connection-id <connectionId> --adapter <adapter>` once for each source you want to ingest.
4. Check progress with `ktx ingest status --json`.
5. Review generated `semantic-layer/` YAML and `knowledge/` Markdown files in git.
5. Review generated `semantic-layer/` YAML and `wiki/` Markdown files in git.
6. Validate changed semantic sources with `ktx sl validate`.
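The `env:NAME` and `file:/path/to/secret` reference formats in step 2 can be resolved in a few lines. This sketch is illustrative only: `resolveSecretRef` is a hypothetical helper, not KTX's resolver. It assumes that a value with neither prefix is an error rather than a literal secret, and that trailing newlines in secret files should be trimmed.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical resolver for `env:NAME` and `file:/path/to/secret` references.
function resolveSecretRef(
  ref: string,
  env: Record<string, string | undefined> = process.env,
): string {
  if (ref.startsWith("env:")) {
    const name = ref.slice("env:".length);
    const value = env[name];
    if (value === undefined || value === "") {
      throw new Error(`environment variable ${name} is not set`);
    }
    return value;
  }
  if (ref.startsWith("file:")) {
    const filePath = ref.slice("file:".length);
    // Trim the trailing newline that editors and `echo` usually add to secret files.
    return readFileSync(filePath, "utf8").trimEnd();
  }
  // Do not echo the value, in case a raw secret was pasted in by mistake.
  throw new Error("secret reference must start with env: or file:");
}
```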
## Shared source fields
@ -233,7 +233,7 @@ Generate an API key in Metabase: **Admin > Settings > Authentication > API Keys*
### What gets ingested
- Semantic sources generated from SQL queries in questions
- Knowledge pages for dashboards (purpose, key metrics, relationships)
- Wiki pages for dashboards (purpose, key metrics, relationships)
- Work units per dashboard and per question
### Warehouse mapping
@ -290,7 +290,7 @@ Generate API credentials in Looker: **Admin > Users > Edit > API Keys**.
### What gets ingested
- Semantic sources from explore field definitions
- Knowledge pages for dashboards (purpose, audience, key metrics)
- Wiki pages for dashboards (purpose, audience, key metrics)
- Triage signals for automated content classification
- Work units per explore and per dashboard
@ -310,11 +310,11 @@ Find Looker connection names in **Admin > Database > Connections**.
## Notion
Ingests pages and databases from a Notion workspace as knowledge pages. Useful for capturing business definitions, data dictionaries, and team documentation that agents need for context.
Ingests pages and databases from a Notion workspace as wiki pages. Useful for capturing business definitions, data dictionaries, and team documentation that agents need for context.
### What it provides
- Knowledge pages synthesized from Notion content
- Wiki pages synthesized from Notion content
- Page hierarchy and relationships
- Database schemas (when Notion databases describe data sources)
- Semantic clustering for organized ingestion
@ -364,7 +364,7 @@ Create an integration at [notion.so/my-integrations](https://www.notion.so/my-in
### What gets ingested
- Knowledge pages synthesized from Notion content (not raw copies)
- Wiki pages synthesized from Notion content (not raw copies)
- Domain context extracted and organized by topic
- Triage signals for classifying page relevance
- Work units clustered by semantic similarity for efficient processing
@ -381,6 +381,6 @@ Create an integration at [notion.so/my-integrations](https://www.notion.so/my-in
|------------------|--------------|----------|
| Adapter cannot read source files | `source_dir`, `repo_url`, `branch`, or `path` is wrong | Verify the path locally or clone the repo manually with the same credentials |
| Private repo/API authentication fails | Token env var or secret file is missing | Export the env var or update `auth_token_ref` to a readable file |
| Ingest creates duplicate context | Existing source names or knowledge pages do not match imported terminology | Review the diff, rename duplicates, and add knowledge pages with canonical names |
| Ingest creates duplicate context | Existing source names or wiki pages do not match imported terminology | Review the diff, rename duplicates, and add wiki pages with canonical names |
| Notion ingest skips pages | Integration lacks access or root ids are missing | Share pages with the Notion integration and set `root_page_ids` or use `all_accessible` carefully |
| Generated semantic sources fail validation | Tool metadata does not match the live warehouse schema | Map BI/source databases to primary warehouse connections and rerun validation |

@ -511,4 +511,4 @@ No authentication required — SQLite is file-based. The file must be readable b
| Scan returns no tables | Schema/database/project filter is wrong or the user lacks metadata permissions | Verify the schema list and grant metadata read permissions |
| Historic SQL is empty | Query history extension or warehouse history view is unavailable | Enable the warehouse-specific history feature, then rerun scan or setup |
| Column statistics are missing | Connector cannot access stats tables or the warehouse does not expose them | Grant stats permissions where supported; otherwise rely on structural scan output |
| SQL execution fails through agents | Connection is missing, unreachable, or query execution is disabled | Run `ktx connection test <id>` and check the agent command flags |
| Semantic query execution fails | Connection is missing, unreachable, or query execution is disabled | Run `ktx connection test <id>` and check the `ktx sl query` flags |

@ -47,7 +47,7 @@ export function buildLlmsTxt() {
> Agent-native context layer for analytics engineering and database agents.
KTX provides semantic-layer files, warehouse scans, knowledge pages, provenance, and agent-facing tools that help coding agents answer analytics questions without inventing metrics or joins.
KTX provides semantic-layer files, warehouse scans, wiki pages, provenance, and agent-facing tools that help coding agents answer analytics questions without inventing metrics or joins.
## Agent Entry Points
@ -60,21 +60,21 @@ ${link("/docs/ai-resources/agent-instructions", "Agent Instructions", "Suggested
${link("/docs/getting-started/introduction", "Introduction", "What KTX is and who it is for")}
${link("/docs/getting-started/quickstart", "Quickstart", "Set up KTX and build your first context")}
${link("/docs/guides/writing-context", "Writing Context", "Write semantic sources and knowledge pages")}
${link("/docs/guides/writing-context", "Writing Context", "Write semantic sources and wiki pages")}
## Machine-Readable Documentation
- [Full documentation](${absoluteUrl("/llms-full.txt")}): All docs pages in one plain-text markdown response
- [Markdown access guide](${absoluteUrl("/docs/ai-resources/markdown-access.md")}): How to fetch llms.txt, llms-full.txt, and per-page Markdown
- [Quickstart markdown](${absoluteUrl("/docs/getting-started/quickstart.md")}): Human setup walkthrough
- [Agent CLI markdown](${absoluteUrl("/docs/cli-reference/ktx-agent.md")}): Machine-readable agent commands
- [Semantic-layer CLI markdown](${absoluteUrl("/docs/cli-reference/ktx-sl.md")}): Semantic-layer commands and JSON output
- [Wiki CLI markdown](${absoluteUrl("/docs/cli-reference/ktx-wiki.md")}): Wiki page commands and JSON output
## CLI Reference
${link("/docs/cli-reference/ktx-setup", "ktx setup", "Interactive project setup")}
${link("/docs/cli-reference/ktx-agent", "ktx agent", "Machine-readable commands for coding agents")}
${link("/docs/cli-reference/ktx-sl", "ktx sl", "Semantic-layer commands")}
${link("/docs/cli-reference/ktx-wiki", "ktx wiki", "Knowledge page commands")}
${link("/docs/cli-reference/ktx-wiki", "ktx wiki", "Wiki page commands")}
${link("/docs/cli-reference/ktx-connection", "ktx connection", "Connection management commands")}
## Integrations


@ -0,0 +1,785 @@
# Notion Warehouse Verification Gap Closure Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Close the remaining v1 gaps that prevent ingest agents, especially
Notion WorkUnits, from reliably verifying warehouse table and column
identifiers before writing wiki or semantic-layer output.
**Architecture:** Keep the existing warehouse verification tool module and
runner wiring. Add Notion target-warehouse scoping through the local adapter
factory, make the active WorkUnit prompt name the shipped tools, enforce
`allowedConnectionNames` in `discover_data`, and teach `entity_details` to
resolve and reject column-level display targets.
**Tech Stack:** TypeScript, Node 22, Vitest, AI SDK v6 tools, Zod, KTX local
ingest adapters, KTX file store.
---
## Audit summary
The previous implementation plan landed the main tool module and prompt
protocol, but four v1-blocking gaps remain:
- Notion ingest sessions still allow only the Notion connection unless a
specific adapter supplies target IDs. `NotionSourceAdapter` does not supply
target warehouse IDs, so the original Notion hallucination case cannot use
`entity_details` or raw-schema `discover_data` for the warehouse connection.
- The active WorkUnit framing prompt still tells agents to call
`wiki_sl_search` and `sl_describe_table`, which are not shipped KTX tools.
- `discover_data` accepts an explicit out-of-scope `connectionName` and still
searches raw schema for that connection.
- `entity_details({ targets: [{ display: "schema.table.column" }] })` does not
resolve column display strings and does not fail explicit missing-column
targets.
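The first gap comes down to how the session scope is assembled. Below is a minimal sketch, assuming the runner builds `allowedConnectionNames` from the WorkUnit's source connection plus whatever the adapter's optional `listTargetConnectionIds()` hook returns. `buildAllowedConnectionNames` and `TargetAwareAdapter` are hypothetical names for illustration, not the actual runner code.

```typescript
// Hypothetical adapter shape: only the optional target-ID hook matters here.
interface TargetAwareAdapter {
  listTargetConnectionIds?: (stagedDir: string) => Promise<string[]>;
}

// Hypothetical helper: a session may touch its source connection plus any
// warehouse targets the adapter reports. An adapter without the hook (today's
// NotionSourceAdapter) leaves the session scoped to its own connection only.
async function buildAllowedConnectionNames(
  sourceConnectionId: string,
  adapter: TargetAwareAdapter,
  stagedDir: string,
): Promise<ReadonlySet<string>> {
  const targets = (await adapter.listTargetConnectionIds?.(stagedDir)) ?? [];
  return new Set([sourceConnectionId, ...targets]);
}
```

Under this model, a Notion session resolves to `{notion}` until the adapter reports `['warehouse']`, at which point the warehouse becomes reachable by `entity_details` and raw-schema `discover_data`.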
Non-blocking gaps remain out of scope for this plan:
- Full DDL-style `entity_details` formatting with FK and profile summaries.
- AST-backed SQL read-only validation for data-modifying CTEs.
- Search over `enrichment/descriptions.json` for generated descriptions.
- Lexicographic latest-sync edge cases for non-timestamp sync IDs.
- Hard write-time validation in `wiki_write` and `emit_unmapped_fallback`.
## File structure
Modify these files:
- `packages/context/src/ingest/adapters/notion/notion.adapter.ts`: add
configured target warehouse IDs and implement `listTargetConnectionIds()`.
- `packages/context/src/ingest/adapters/notion/notion.adapter.test.ts`: cover
Notion target connection ID fan-out.
- `packages/context/src/ingest/local-adapters.ts`: pass primary warehouse IDs
into `NotionSourceAdapter`.
- `packages/context/src/ingest/local-adapters.test.ts`: cover local Notion
adapter target IDs.
- `packages/context/src/ingest/adapters/notion/chunk.ts`: update Notion
WorkUnit notes to prefer the warehouse verification tools.
- `packages/context/src/ingest/adapters/notion/notion.adapter.test.ts`: update
Notion note expectations.
- `packages/context/prompts/memory_agent_bundle_ingest_work_unit.md`: replace
stale tool names in the active WorkUnit prompt.
- `packages/context/src/ingest/ingest-prompts.test.ts`: guard the WorkUnit
prompt against stale tool names.
- `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts`:
refuse explicit out-of-scope connection names.
- `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts`:
cover `discover_data` scoping.
- `packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts`:
add column-aware display-target resolution.
- `packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts`:
cover column display resolution.
- `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts`:
use column-aware resolution and report missing columns.
- `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts`:
cover column display and missing-column behavior.
### Task 1: Give Notion ingest access to target warehouses
**Files:**
- Modify: `packages/context/src/ingest/adapters/notion/notion.adapter.ts`
- Modify: `packages/context/src/ingest/adapters/notion/notion.adapter.test.ts`
- Modify: `packages/context/src/ingest/local-adapters.ts`
- Modify: `packages/context/src/ingest/local-adapters.test.ts`
- [ ] **Step 1: Write the failing Notion adapter test**
Add this test inside `describe('NotionSourceAdapter', ...)` in
`packages/context/src/ingest/adapters/notion/notion.adapter.test.ts`:
```ts
it('returns configured target warehouse connection ids', async () => {
  const adapter = new NotionSourceAdapter({
    targetConnectionIds: ['warehouse', 'warehouse', 'analytics'],
  });
  await expect(adapter.listTargetConnectionIds?.(stagedDir)).resolves.toEqual([
    'analytics',
    'warehouse',
  ]);
});
```
- [ ] **Step 2: Run the failing Notion adapter test**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/adapters/notion/notion.adapter.test.ts -t "target warehouse connection ids"
```
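Expected: FAIL because `NotionSourceAdapter` does not implement
`listTargetConnectionIds()`, so the optional call yields `undefined` instead of
a promise. (`NotionSourceAdapterDeps` also lacks a `targetConnectionIds`
option, but Vitest does not type-check, so that alone would not fail the run.)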
- [ ] **Step 3: Implement Notion target connection IDs**
Modify `packages/context/src/ingest/adapters/notion/notion.adapter.ts`:
```ts
export interface NotionSourceAdapterDeps {
  onPullSucceeded?: (ctx: NotionPullSucceededContext) => Promise<void>;
  logger?: NotionFetchLogger;
  targetConnectionIds?: string[];
}

function uniqueSorted(values: readonly string[] | undefined): string[] {
  return [...new Set(values ?? [])].sort((left, right) =>
    left.localeCompare(right),
  );
}
```
Add this method to `NotionSourceAdapter`:
```ts
async listTargetConnectionIds(_stagedDir: string): Promise<string[]> {
  return uniqueSorted(this.deps.targetConnectionIds);
}
```
- [ ] **Step 4: Pass primary warehouses into the local Notion adapter**
Modify the Notion adapter construction in
`packages/context/src/ingest/local-adapters.ts`:
```ts
new NotionSourceAdapter({
  targetConnectionIds: primaryWarehouseConnectionIds(project),
  ...(options.logger ? { logger: options.logger } : {}),
}),
```
- [ ] **Step 5: Write the local adapter fan-out test**
Add this test to `packages/context/src/ingest/local-adapters.test.ts`:
```ts
it('passes primary warehouse connection ids to the local Notion adapter', async () => {
  const adapters = createDefaultLocalIngestAdapters(
    projectWithConnections({
      notion: {
        driver: 'notion',
        auth_token: 'secret',
        crawl_mode: 'selected_roots',
        root_page_ids: ['page-1'],
      },
      warehouse: {
        driver: 'postgres',
        url: 'postgresql://readonly@db.example.test/analytics',
      },
      docs: {
        driver: 'dbt',
        source_dir: './dbt',
      },
    } as never),
  );
  const notion = adapters.find((adapter) => adapter.source === 'notion');
  await expect(notion?.listTargetConnectionIds?.('/tmp/staged-notion')).resolves.toEqual([
    'warehouse',
  ]);
});
```
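The plan relies on the existing `primaryWarehouseConnectionIds(project)` helper, whose exact rules are not shown here. The sketch below is a hypothetical stand-in that reproduces what the test expects: keep connections whose driver is a SQL warehouse and drop document or modeling sources such as `notion` and `dbt`. The `WAREHOUSE_DRIVERS` list and the `ProjectConnections` shape are assumptions.

```typescript
// Hypothetical project shape: connection id -> config carrying a driver name.
type ProjectConnections = Record<string, { driver: string }>;

// Assumed warehouse driver list; the real helper may use different rules.
const WAREHOUSE_DRIVERS = new Set(['postgres', 'bigquery', 'snowflake', 'sqlserver', 'sqlite']);

function primaryWarehouseConnectionIdsSketch(connections: ProjectConnections): string[] {
  return Object.entries(connections)
    .filter(([, config]) => WAREHOUSE_DRIVERS.has(config.driver))
    .map(([id]) => id)
    .sort((left, right) => left.localeCompare(right));
}

console.log(
  primaryWarehouseConnectionIdsSketch({
    notion: { driver: 'notion' },
    warehouse: { driver: 'postgres' },
    docs: { driver: 'dbt' },
  }),
);
// → [ 'warehouse' ]
```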
- [ ] **Step 6: Run the Notion target tests**
Run:
```bash
pnpm --filter @ktx/context exec vitest run \
  src/ingest/adapters/notion/notion.adapter.test.ts \
  src/ingest/local-adapters.test.ts \
  -t "target warehouse connection ids|local Notion adapter"
```
Expected: PASS.
- [ ] **Step 7: Commit**
Run:
```bash
git add \
packages/context/src/ingest/adapters/notion/notion.adapter.ts \
packages/context/src/ingest/adapters/notion/notion.adapter.test.ts \
packages/context/src/ingest/local-adapters.ts \
packages/context/src/ingest/local-adapters.test.ts
git commit -m "fix(context): expose target warehouses to Notion ingest"
```
### Task 2: Remove stale tool names from active ingest prompts
**Files:**
- Modify: `packages/context/prompts/memory_agent_bundle_ingest_work_unit.md`
- Modify: `packages/context/src/ingest/ingest-prompts.test.ts`
- Modify: `packages/context/src/ingest/adapters/notion/chunk.ts`
- Modify: `packages/context/src/ingest/adapters/notion/notion.adapter.test.ts`
- [ ] **Step 1: Add failing prompt guards**
Add this test to `packages/context/src/ingest/ingest-prompts.test.ts`:
```ts
// Requires `import { readFile } from 'node:fs/promises';` if the file lacks it.
it('uses shipped warehouse verification tools in the WorkUnit prompt', async () => {
  const prompt = await readFile(
    new URL('../../prompts/memory_agent_bundle_ingest_work_unit.md', import.meta.url),
    'utf-8',
  );
  expect(prompt).toContain('discover_data');
  expect(prompt).toContain('entity_details');
  expect(prompt).not.toContain('wiki_sl_search');
  expect(prompt).not.toContain('sl_describe_table');
});
```
- [ ] **Step 2: Run the failing prompt guard**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/ingest-prompts.test.ts -t "warehouse verification tools"
```
Expected: FAIL because the WorkUnit prompt still contains `wiki_sl_search` and
`sl_describe_table`.
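The prompt guard boils down to two substring checks. A hypothetical helper that reports every missing or stale tool name at once, rather than stopping at the first failed assertion, might look like this; `checkToolNames` and its return shape are illustrative only.

```typescript
interface ToolNameReport {
  missing: string[]; // required tools the prompt never mentions
  stale: string[]; // retired tool names that still appear
}

// Plain substring matching, like the test's toContain/not.toContain checks.
// Note: a retired name embedded in a longer identifier would also be flagged.
function checkToolNames(
  prompt: string,
  required: readonly string[],
  retired: readonly string[],
): ToolNameReport {
  return {
    missing: required.filter((name) => !prompt.includes(name)),
    stale: retired.filter((name) => prompt.includes(name)),
  };
}

const report = checkToolNames(
  'Call `wiki_sl_search`, then `entity_details`.',
  ['discover_data', 'entity_details'],
  ['wiki_sl_search', 'sl_describe_table'],
);
console.log(report);
```

For the sample prompt, the report lists `discover_data` as missing and `wiki_sl_search` as stale.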
- [ ] **Step 3: Update the WorkUnit framing prompt**
In `packages/context/prompts/memory_agent_bundle_ingest_work_unit.md`, replace
the first `<role>` paragraph with:
```md
You are processing ONE WorkUnit of a multi-file ingest bundle. The WorkUnit gives you a slice of raw source files (LookML views, dbt/MetricFlow YAMLs, Metabase card JSONs, Notion pages, or similar) and you must translate that slice into KTX semantic-layer sources and/or knowledge wiki pages, in one pass. Prior WorkUnits in this same job may have already written SL sources and wiki pages; their writes are visible on the working branch and discoverable with `discover_data`.
```
In workflow step 2, replace the final sentence with:
```md
The triage skill tells you how to react when `discover_data` reveals that a prior WU already wrote something overlapping.
```
In workflow step 4, replace the whole list item, which starts
`For each raw file:`, with:
```md
4. For each raw file: call `read_raw_file` (or `read_raw_span` for slicing large files) to load content. Before writing a new SL source or wiki page, call `discover_data` for each candidate source, table, metric, or topic name to find prior-WU writes, existing wiki pages, SL sources, and raw warehouse matches; apply `ingest_triage` when you hit one, and apply any matching canonical pin before deciding whether to edit, rename, or skip.
```
In the `<do_not>` block, replace the physical-column rule with:
```md
- Do not invent physical column names or grain keys. For table-backed SL sources, every `columns:`, `grain:`, `joins:`, `segments:`, and `measures[].expr` column must come from raw-file column declarations or warehouse-backed discovery (`discover_data`, `sl_discover`, `entity_details`). If column names are not confirmed, capture the business context in wiki instead of writing a full SL source.
```
- [ ] **Step 4: Update Notion WorkUnit notes**
In `packages/context/src/ingest/adapters/notion/chunk.ts`, replace
`NOTION_SL_WRITE_GUIDANCE` with:
```ts
const NOTION_SL_WRITE_GUIDANCE =
'Write wiki entries with wiki_write. Wiki keys must be flat slugs like orbit-company-overview, not orbit/company-overview. Search existing wiki pages, SL sources, and raw warehouse schema for the same tables or sl_refs with discover_data before creating a new page. Only write or edit SL sources after discover_data plus sl_discover/sl_read_source or entity_details confirms a mapped non-Notion target source; if no mapped target exists, emit_unmapped_fallback and keep the fact wiki-only. Notion dataSourceCount counts Notion databases/data sources only, not warehouse/dbt mappings. If a warehouse/dbt connection exists but the named table or source is absent, use reason no_physical_table rather than no_connection_mapping. Do not create SL sources under the Notion connection just because a page mentions a warehouse table.';
```
In the `reconcileNotes` array in the same file, replace:
```ts
'Notion dataSourceCount is Notion-only; use sl_discover for warehouse/dbt mapping decisions.',
```
with:
```ts
'Notion dataSourceCount is Notion-only; use discover_data/entity_details for warehouse/dbt mapping decisions.',
```
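`NOTION_SL_WRITE_GUIDANCE` encodes a small decision procedure for Notion facts that mention warehouse tables. The sketch below restates that procedure as a function so the branches are easy to check. `decideNotionSlAction`, its input fields, and the `write_sl` action name are hypothetical; `no_physical_table` and `no_connection_mapping` are the fallback reasons named in the guidance.

```typescript
type NotionSlDecision =
  | { action: 'write_sl'; connectionName: string }
  | { action: 'emit_unmapped_fallback'; reason: 'no_physical_table' | 'no_connection_mapping' };

interface NotionTableMention {
  // Non-Notion warehouse/dbt connections visible to the WorkUnit.
  warehouseConnections: string[];
  // Connection where discover_data/entity_details confirmed the table, if any.
  confirmedIn: string | null;
}

function decideNotionSlAction(mention: NotionTableMention): NotionSlDecision {
  if (mention.confirmedIn) {
    // Confirmed mapped target: SL writes go to that connection, never to Notion.
    return { action: 'write_sl', connectionName: mention.confirmedIn };
  }
  if (mention.warehouseConnections.length > 0) {
    // A warehouse/dbt connection exists, but the named table is absent.
    return { action: 'emit_unmapped_fallback', reason: 'no_physical_table' };
  }
  // No mapped target connection at all; the fact stays wiki-only.
  return { action: 'emit_unmapped_fallback', reason: 'no_connection_mapping' };
}
```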
- [ ] **Step 5: Update Notion note expectations**
In `packages/context/src/ingest/adapters/notion/notion.adapter.test.ts`,
update the note expectations in `it('chunks changed Notion pages...')`:
```ts
expect(result.workUnits[0].notes).toContain('discover_data');
expect(result.workUnits[0].notes).toContain('entity_details');
```
Update the exact `reconcileNotes` expectation to:
```ts
expect(result.reconcileNotes).toEqual([
  'Notion maxKnowledgeCreatesPerRun=25',
  'Notion maxKnowledgeUpdatesPerRun=20',
  'Notion dataSourceCount is Notion-only; use discover_data/entity_details for warehouse/dbt mapping decisions.',
  'Reconcile Notion wiki pages sharing tables/sl_refs before creating distinct artifacts.',
]);
```
- [ ] **Step 6: Run prompt and Notion note tests**
Run:
```bash
pnpm --filter @ktx/context exec vitest run \
src/ingest/ingest-prompts.test.ts \
src/ingest/adapters/notion/notion.adapter.test.ts
```
Expected: PASS.
- [ ] **Step 7: Commit**
Run:
```bash
git add \
packages/context/prompts/memory_agent_bundle_ingest_work_unit.md \
packages/context/src/ingest/ingest-prompts.test.ts \
packages/context/src/ingest/adapters/notion/chunk.ts \
packages/context/src/ingest/adapters/notion/notion.adapter.test.ts
git commit -m "fix(context): update ingest prompts for warehouse verification tools"
```
### Task 3: Enforce allowed connection scope in discover_data
**Files:**
- Modify: `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts`
- Modify: `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts`
- [ ] **Step 1: Write the failing scoping test**
Add this test to
`packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts`:
```ts
it('refuses explicit out-of-scope connection names', async () => {
  const result = await tool.call({ query: 'orders', connectionName: 'billing' }, context);
  expect(result.markdown).toContain('Connection "billing" is not available to this ingest stage.');
  expect(result.structured).toEqual({ wiki: null, sl: null, raw: null });
  expect(wikiSearchTool.call).not.toHaveBeenCalled();
  expect(slDiscoverTool.call).not.toHaveBeenCalled();
  expect(catalog.searchByName).not.toHaveBeenCalled();
});
```
- [ ] **Step 2: Run the failing scoping test**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/discover-data.tool.test.ts -t "out-of-scope"
```
Expected: FAIL because `discover_data` currently searches raw schema for an
explicit `connectionName` even when it is not in `allowedConnectionNames`.
- [ ] **Step 3: Add the scope guard**
In
`packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts`,
add this helper near `totalSources()`:
```ts
function allowedConnectionNames(context: ToolContext): ReadonlySet<string> | null {
  return context.session?.allowedConnectionNames ?? null;
}
```
At the top of `DiscoverDataTool.call()`, before the `sourceName` branch and
before calling any child tool, add:
```ts
const allowed = allowedConnectionNames(context);
if (input.connectionName && allowed && !allowed.has(input.connectionName)) {
  return {
    markdown: `Connection "${input.connectionName}" is not available to this ingest stage.`,
    structured: { wiki: null, sl: null, raw: null },
  };
}
```
Then replace the raw connection-list construction with:
```ts
const connections = input.connectionName ? [input.connectionName] : [...(allowed ?? [])].sort();
```
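Taken together, the guard and the new connection list leave `discover_data` with three cases: refuse, search one named connection, or search every allowed connection in a stable order. This standalone sketch models those cases with a plain `Set`. `planRawSearch` and `RawSearchPlan` are hypothetical names, and `null` for the allowed set stands in for a session with no scoping.

```typescript
type RawSearchPlan =
  | { kind: 'refuse'; message: string }
  | { kind: 'search'; connections: string[] };

function planRawSearch(
  requested: string | undefined,
  allowed: ReadonlySet<string> | null,
): RawSearchPlan {
  // Refuse before any child tool runs, matching the early return in call().
  if (requested && allowed && !allowed.has(requested)) {
    return { kind: 'refuse', message: `Connection "${requested}" is not available to this ingest stage.` };
  }
  // An explicit in-scope name narrows the search; otherwise use every allowed connection, sorted.
  const connections = requested ? [requested] : [...(allowed ?? [])].sort();
  return { kind: 'search', connections };
}
```

Sorting the allowed set keeps raw-hit order deterministic across runs, which the follow-up tests rely on when they compare hit lists.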
- [ ] **Step 4: Run discover_data tests**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/discover-data.tool.test.ts
```
Expected: PASS.
- [ ] **Step 5: Commit**
Run:
```bash
git add \
packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts \
packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts
git commit -m "fix(context): scope raw schema discovery to allowed connections"
```
### Task 4: Fix column-level entity_details verification
**Files:**
- Modify: `packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts`
- Modify: `packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts`
- Modify: `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts`
- Modify: `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts`
- [ ] **Step 1: Write failing catalog column-target tests**
First update `seedLiveDatabaseScan()` in `warehouse-catalog.service.test.ts` so BigQuery tables have
a project/catalog. Replace the repeated inline table refs with:
```ts
const tableRef = {
  catalog: driver === 'bigquery' ? 'analytics' : null,
  db: driver === 'sqlite' ? null : 'public',
  name: 'orders',
};
```
Use `tableRef.catalog`, `tableRef.db`, and `tableRef.name` for the seeded
table and profile table references.
Then add these tests to
`packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts`:
```ts
it('resolves postgres column display strings without treating the column as a table', async () => {
  await seedLiveDatabaseScan();
  const catalog = new WarehouseCatalogService({ fileStore: project.fileStore });
  await expect(catalog.resolveDisplayTarget('warehouse', 'public.orders.status')).resolves.toMatchObject({
    resolved: { catalog: null, db: 'public', name: 'orders', column: 'status' },
    candidates: [],
    dialect: 'postgres',
  });
});

it('resolves BigQuery column display strings with four parts', async () => {
  await seedLiveDatabaseScan('warehouse', 'sync-bigquery', 'bigquery');
  const catalog = new WarehouseCatalogService({ fileStore: project.fileStore });
  await expect(catalog.resolveDisplayTarget('warehouse', 'analytics.public.orders.status')).resolves.toMatchObject({
    resolved: { catalog: 'analytics', db: 'public', name: 'orders', column: 'status' },
    candidates: [],
    dialect: 'bigquery',
  });
});
```
- [ ] **Step 2: Run the failing catalog tests**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts -t "column display"
```
Expected: FAIL because `resolveDisplayTarget()` does not exist.
- [ ] **Step 3: Implement column-aware display resolution**
In
`packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts`,
add this exported interface near `RawSchemaHit`:
```ts
export interface DisplayTargetResolution {
  resolved: (KtxTableRef & { column?: string }) | null;
  candidates: KtxTableRef[];
  dialect: string;
}
```
Add these helpers near `parseDisplay()`:
```ts
function expectedDisplayPartCount(driver: CatalogDriver): number {
  if (driver === 'sqlite' || driver === 'sqlite3') {
    return 1;
  }
  if (driver === 'bigquery' || driver === 'snowflake' || driver === 'sqlserver') {
    return 3;
  }
  return 2;
}

function parseColumnDisplay(driver: CatalogDriver, display: string): (KtxTableRef & { column: string }) | null {
  const parts = splitDisplay(display);
  const tablePartCount = expectedDisplayPartCount(driver);
  if (parts.length !== tablePartCount + 1) {
    return null;
  }
  const column = parts.at(-1);
  if (!column) {
    return null;
  }
  const table = parseDisplay(driver, parts.slice(0, -1).join('.'));
  return table ? { ...table, column } : null;
}
```
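`splitDisplay()` and `parseDisplay()` already exist in the service and are not shown in this plan. To make the part-count arithmetic concrete, here is a self-contained sketch with naive stand-ins for both: splitting on `.` with no quoting support, and mapping table parts right to left onto `name`, `db`, and `catalog`. The part counts mirror `expectedDisplayPartCount()`; the stand-in helpers and the `Driver` type are assumptions, and the real helpers may handle quoted identifiers and casing differently.

```typescript
// Hypothetical stand-ins for the service's driver type and table ref.
type Driver = 'postgres' | 'sqlite' | 'bigquery' | 'snowflake' | 'sqlserver';

interface TableRef {
  catalog: string | null;
  db: string | null;
  name: string;
}

// Same counts as expectedDisplayPartCount(): parts needed to name a table.
function tablePartCount(driver: Driver): number {
  if (driver === 'sqlite') return 1;
  if (driver === 'bigquery' || driver === 'snowflake' || driver === 'sqlserver') return 3;
  return 2;
}

// Naive splitDisplay() stand-in: identifiers containing dots are not supported.
function splitDisplaySketch(display: string): string[] {
  return display.split('.');
}

function parseColumnDisplaySketch(driver: Driver, display: string): (TableRef & { column: string }) | null {
  const parts = splitDisplaySketch(display);
  const count = tablePartCount(driver);
  // A column display has exactly one more part than a table display; reject empty segments.
  if (parts.length !== count + 1 || parts.some((part) => part === '')) {
    return null;
  }
  const column = parts[count];
  // Map table parts from the right: name, then db, then catalog.
  const name = parts[count - 1];
  const db = count >= 2 ? parts[count - 2] : null;
  const catalog = count >= 3 ? parts[count - 3] : null;
  return { catalog, db, name, column };
}
```

With these counts, `public.orders.status` parses as a Postgres column and `analytics.public.orders.status` as a BigQuery column, while a plain two-part Postgres display returns `null` and is left to table resolution.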
Add this method to `WarehouseCatalogService` after `resolveDisplay()`:
```ts
async resolveDisplayTarget(connectionName: string, display: string): Promise<DisplayTargetResolution> {
  const catalog = await this.loadCatalog(connectionName);
  if (!catalog) {
    return { resolved: null, candidates: [], dialect: 'unknown' };
  }
  const dialect = getDialectForDriver(catalog.driver).type;
  const tableResolution = await this.resolveDisplay(connectionName, display);
  if (tableResolution.resolved) {
    return tableResolution;
  }
  const parsedColumn = parseColumnDisplay(catalog.driver, display);
  if (!parsedColumn) {
    return { resolved: null, candidates: bestCandidates(catalog.tables, display), dialect };
  }
  const table = catalog.tables.find((candidate) => refsEqual(candidate, parsedColumn));
  if (!table) {
    return { resolved: null, candidates: bestCandidates(catalog.tables, display), dialect };
  }
  return {
    resolved: {
      catalog: table.catalog,
      db: table.db,
      name: table.name,
      column: parsedColumn.column,
    },
    candidates: [],
    dialect,
  };
}
```
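`resolveDisplayTarget()` has three possible outcomes, tried in a fixed order: exact table match, then "known table plus one trailing column", then candidates. The compressed in-memory sketch below shows that order for Postgres-style displays only. `resolveAgainst`, `Ref`, and `Resolution` are hypothetical; case-insensitive matching is an assumption standing in for `refsEqual()`, and matching any display part against table names stands in for `bestCandidates()`.

```typescript
// Hypothetical Postgres-only model: tables display as "db.name", columns as "db.name.column".
interface Ref {
  db: string;
  name: string;
}

type Resolution =
  | { kind: 'table'; ref: Ref }
  | { kind: 'column'; ref: Ref; column: string }
  | { kind: 'unresolved'; candidates: Ref[] };

// Assumed case-insensitive comparison standing in for refsEqual().
function sameRef(a: Ref, b: Ref): boolean {
  return a.db.toLowerCase() === b.db.toLowerCase() && a.name.toLowerCase() === b.name.toLowerCase();
}

function resolveAgainst(tables: readonly Ref[], display: string): Resolution {
  const parts = display.split('.');
  const table =
    parts.length >= 2 ? tables.find((t) => sameRef(t, { db: parts[0], name: parts[1] })) : undefined;
  // 1. Exact table match first, so a real table is never reinterpreted as a column.
  if (table && parts.length === 2) {
    return { kind: 'table', ref: table };
  }
  // 2. Then a known table plus one trailing column. Column existence is not checked here.
  if (table && parts.length === 3) {
    return { kind: 'column', ref: table, column: parts[2] };
  }
  // 3. Otherwise return candidates whose table name matches any display part.
  const lowered = parts.map((part) => part.toLowerCase());
  return { kind: 'unresolved', candidates: tables.filter((t) => lowered.includes(t.name.toLowerCase())) };
}
```

As in the plan, the column branch only proves the table exists; whether the column exists is decided later, when `entity_details` loads the table detail.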
- [ ] **Step 4: Write failing entity_details column tests**
Add these tests to
`packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts`:
```ts
it('resolves display targets that include a column name', async () => {
  const result = await tool.call(
    { connectionName: 'warehouse', targets: [{ display: 'public.orders.status' }] },
    context,
  );
  expect(result.markdown).toContain('### public.orders');
  expect(result.markdown).toContain('- status (text, nullable=false)');
  expect(result.markdown).not.toContain('- id (integer');
  expect(result.structured.resolved).toHaveLength(1);
  expect(result.structured.resolved[0]?.columns.map((column) => column.name)).toEqual(['status']);
});

it('reports missing explicit columns instead of returning an empty column list', async () => {
  const result = await tool.call(
    { connectionName: 'warehouse', targets: [{ display: 'public.orders.plan_tier' }] },
    context,
  );
  expect(result.markdown).toContain('Column not found in scan: public.orders.plan_tier');
  expect(result.markdown).toContain('Available columns: id, status');
  expect(result.structured.resolved).toHaveLength(0);
  expect(result.structured.missing).toHaveLength(1);
});
```
- [ ] **Step 5: Run the failing entity_details tests**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/entity-details.tool.test.ts -t "column"
```
Expected: FAIL because display column targets are treated as table names and
missing columns are not reported.
- [ ] **Step 6: Use column-aware resolution in entity_details**
In
`packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts`,
add this helper near `appendTableMarkdown()`:
```ts
function findColumn(detail: TableDetail, columnName: string): TableDetail['columns'][number] | null {
  const normalized = columnName.toLowerCase();
  return detail.columns.find((column) => column.name.toLowerCase() === normalized) ?? null;
}
```
Replace the display resolution block inside the `for (const target of
input.targets)` loop with:
```ts
const resolution =
  'display' in target
    ? await catalog.resolveDisplayTarget(input.connectionName, target.display)
    : {
        resolved: { catalog: target.catalog, db: target.db, name: target.name, column: target.column },
        candidates: [],
        dialect: '',
      };
```
After `const detail = await catalog.getTable(...)`, replace the existing
`resolved.push(detail); appendTableMarkdown(...)` lines with:
```ts
const requestedColumn = resolution.resolved.column;
if (requestedColumn) {
  const column = findColumn(detail, requestedColumn);
  if (!column) {
    missing.push({
      target,
      candidates: [{ catalog: detail.catalog, db: detail.db, name: detail.name }],
    });
    parts.push(`Column not found in scan: ${detail.display}.${requestedColumn}`);
    parts.push(`Available columns: ${detail.columns.map((candidate) => candidate.name).join(', ')}`);
    continue;
  }
  const scopedDetail = { ...detail, columns: [column] };
  resolved.push(scopedDetail);
  appendTableMarkdown(parts, scopedDetail, column.name);
  continue;
}
resolved.push(detail);
appendTableMarkdown(parts, detail);
```
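The column branch either narrows the table detail to the one requested column or records a miss that lists the available columns. A standalone sketch of that branch follows, with hypothetical `ColumnDetail` and `TableDetailSketch` types and an `inspectColumn` helper that produces the same two lines the missing-column test checks for.

```typescript
interface ColumnDetail {
  name: string;
  type: string;
  nullable: boolean;
}

interface TableDetailSketch {
  display: string;
  columns: ColumnDetail[];
}

type ColumnCheck =
  | { ok: true; detail: TableDetailSketch }
  | { ok: false; lines: string[] };

function inspectColumn(detail: TableDetailSketch, requested: string): ColumnCheck {
  // Case-insensitive lookup, mirroring findColumn().
  const wanted = requested.toLowerCase();
  const column = detail.columns.find((candidate) => candidate.name.toLowerCase() === wanted);
  if (!column) {
    return {
      ok: false,
      lines: [
        `Column not found in scan: ${detail.display}.${requested}`,
        `Available columns: ${detail.columns.map((candidate) => candidate.name).join(', ')}`,
      ],
    };
  }
  // Scope the detail to the requested column so sibling columns are never rendered.
  return { ok: true, detail: { ...detail, columns: [column] } };
}
```

Returning the available column names on a miss gives the agent a concrete correction path instead of an empty column list.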
- [ ] **Step 7: Run warehouse verification tests**
Run:
```bash
pnpm --filter @ktx/context exec vitest run \
src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts \
src/ingest/tools/warehouse-verification/entity-details.tool.test.ts
```
Expected: PASS.
- [ ] **Step 8: Commit**
Run:
```bash
git add \
packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts \
packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts \
packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts \
packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts
git commit -m "fix(context): verify warehouse column display targets"
```
### Task 5: Verify the v1 gap closure
**Files:**
- Verify all files changed by Tasks 1-4.
- [ ] **Step 1: Run focused tests**
Run:
```bash
pnpm --filter @ktx/context exec vitest run \
src/ingest/adapters/notion/notion.adapter.test.ts \
src/ingest/local-adapters.test.ts \
src/ingest/ingest-prompts.test.ts \
src/ingest/tools/warehouse-verification/discover-data.tool.test.ts \
src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts \
src/ingest/tools/warehouse-verification/entity-details.tool.test.ts
```
Expected: PASS.
- [ ] **Step 2: Run package type-check**
Run:
```bash
pnpm --filter @ktx/context run type-check
```
Expected: PASS.
- [ ] **Step 3: Run package tests**
Run:
```bash
pnpm --filter @ktx/context run test
```
Expected: PASS.
- [ ] **Step 4: Run pre-commit on changed files when configured**
Run:
```bash
uv run pre-commit run --files \
packages/context/src/ingest/adapters/notion/notion.adapter.ts \
packages/context/src/ingest/adapters/notion/notion.adapter.test.ts \
packages/context/src/ingest/local-adapters.ts \
packages/context/src/ingest/local-adapters.test.ts \
packages/context/src/ingest/adapters/notion/chunk.ts \
packages/context/prompts/memory_agent_bundle_ingest_work_unit.md \
packages/context/src/ingest/ingest-prompts.test.ts \
packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts \
packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts \
packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts \
packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts \
packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts \
packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts
```
Expected: PASS. If the repo has no pre-commit config or the local `uv` version
cannot satisfy the project pin, record the exact error and rely on focused
tests plus type-check.
- [ ] **Step 5: Inspect final git status**
Run:
```bash
git status --short
```
Expected: only intentional files are modified. Commit any formatter-driven
changes with:
```bash
git add packages/context
git commit -m "chore(context): verify warehouse verification v1 gaps"
```
## Self-review checklist
- Spec coverage: this plan closes the remaining v1 paths for Notion warehouse
verification, active WorkUnit prompt correctness, raw discovery scoping, and
column-level identifier verification.
- Placeholder scan: no task relies on future-work markers, unnamed edge-case
handling, or cross-task shorthand.
- Type consistency: `discover_data` continues to use `connectionName`,
`sl_discover` still receives `connectionId` internally, and
`resolveDisplayTarget()` returns the same table identity plus optional
`column`.


@ -0,0 +1,957 @@
# Warehouse Verification Final V1 Closure Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Close the remaining v1 gaps that still prevent ingest agents from
reliably following warehouse verification results through to `entity_details`
and `sql_execution`.
**Architecture:** Keep the existing warehouse verification module and runner
session scoping. Add connection names to raw discovery hits, expose primary
warehouse targets from the remaining source adapters, and make local ingest
SQL probes use the same scan connector read-only execution path as schema scan.
**Tech Stack:** TypeScript, Node 22, Vitest, AI SDK v6 tools, Zod, KTX local
ingest runtime, KTX scan connectors.
---
## Audit summary
The first two implementation plans landed the warehouse verification tools,
prompt protocol, Notion warehouse scoping, and stale prompt-name cleanup. The
focused audit on May 12, 2026, found three remaining v1-blocking gaps:
- `discover_data` searches multiple allowed raw warehouse scans, but raw hits do
not carry or render `connectionName`. The tool tells the agent to call
`entity_details({connectionName, targets: [...]})`, then omits the required
`connectionName` from the follow-up evidence.
- Local LookML and MetricFlow adapters do not expose primary warehouse target
IDs. The runner only adds adapter-provided targets to `allowedConnectionNames`,
so those WorkUnits cannot use raw warehouse verification unless their source
connection is itself the warehouse.
- `sql_execution` calls the local ingest connection catalog, but the catalog
either has no query executor in normal CLI ingest or calls an injected
executor without `projectDir` and connection config. The default local query
executor cannot dispatch without that config.
Non-blocking gaps remain out of scope for this v1 plan:
- Full DDL-style `entity_details` formatting with FK and profile summaries.
- AST-backed SQL read-only validation for data-modifying CTE bodies.
- Search over generated `enrichment/descriptions.json`.
- Lexicographic latest-sync edge cases for non-timestamp sync IDs.
- Hard write-time validation in `wiki_write` and `emit_unmapped_fallback`.
## File structure
Modify these files:
- `packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts`:
add `connectionName` to raw schema hit records.
- `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts`:
render raw hit connection names and preserve them in structured output.
- `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts`:
cover multi-connection raw discovery follow-up data.
- `packages/context/src/ingest/adapters/lookml/lookml.adapter.ts`:
accept and return configured target warehouse connection IDs.
- `packages/context/src/ingest/adapters/lookml/lookml.adapter.test.ts`:
cover LookML target warehouse IDs.
- `packages/context/src/ingest/adapters/metricflow/metricflow.adapter.ts`:
accept and return configured target warehouse connection IDs.
- `packages/context/src/ingest/adapters/metricflow/metricflow.adapter.test.ts`:
cover MetricFlow target warehouse IDs.
- `packages/context/src/ingest/local-adapters.ts`:
pass primary warehouse IDs into LookML and MetricFlow adapters.
- `packages/context/src/ingest/local-adapters.test.ts`:
cover local adapter warehouse target fan-out.
- `packages/context/src/ingest/local-bundle-runtime.ts`:
pass full project connection config to local ingest query executors.
- `packages/context/src/ingest/local-bundle-runtime.test.ts`:
cover the local ingest query executor call shape.
- `packages/context/src/ingest/local-ingest.ts`:
use the shared query executor port type.
- `packages/context/src/mcp/local-project-ports.ts`:
  no behavior change expected; it must still type-check against the updated
  local ingest query executor type.
- `packages/cli/src/ingest.ts`:
provide a read-only scan-connector-backed query executor for normal local
ingest runs.
Create these files:
- `packages/cli/src/ingest-query-executor.ts`: CLI query executor that adapts
scan connectors' `executeReadOnly()` method to `KtxSqlQueryExecutorPort`.
- `packages/cli/src/ingest-query-executor.test.ts`: unit coverage for the CLI
ingest query executor.
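The plan describes `ingest-query-executor.ts` only in outline. A minimal sketch of the adapter shape follows, assuming a port that receives `projectDir`, the connection name, and its config alongside the SQL, and a connector that is opened per query and closed afterwards. Every name below (`QueryRequest`, `ReadOnlyConnector`, `ConnectorFactory`, `createIngestQueryExecutor`) is hypothetical and stands in for `KtxSqlQueryExecutorPort` and the scan connectors' `executeReadOnly()`; the real signatures may differ.

```typescript
// Hypothetical request shape: what the local ingest runtime would need to pass.
interface QueryRequest {
  projectDir: string;
  connectionName: string;
  connectionConfig: { driver: string } & Record<string, unknown>;
  sql: string;
}

interface QueryResult {
  columns: string[];
  rows: unknown[][];
}

// Hypothetical stand-in for a scan connector that exposes a read-only path.
interface ReadOnlyConnector {
  executeReadOnly(sql: string): Promise<QueryResult>;
  close(): Promise<void>;
}

type ConnectorFactory = (
  projectDir: string,
  config: QueryRequest['connectionConfig'],
) => Promise<ReadOnlyConnector>;

function createIngestQueryExecutor(createConnector: ConnectorFactory) {
  return async (request: QueryRequest): Promise<QueryResult> => {
    // Without projectDir and config there is nothing to dispatch on; this is the gap described above.
    const connector = await createConnector(request.projectDir, request.connectionConfig);
    try {
      // Reuse the same read-only path that schema scan uses.
      return await connector.executeReadOnly(request.sql);
    } finally {
      await connector.close();
    }
  };
}
```

Opening a connector per query keeps the sketch simple and leak-free; a real implementation might pool connectors per connection name if ingest probes are frequent.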
### Task 1: Preserve raw discovery connection names
**Files:**
- Modify: `packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts`
- Modify: `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts`
- Modify: `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts`
- [ ] **Step 1: Write the failing multi-connection discovery test**
Add this test to
`packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts`:
```ts
it('includes connectionName on raw schema hits so entity_details can follow up', async () => {
const multiConnectionContext: ToolContext = {
...context,
session: { allowedConnectionNames: new Set(['warehouse', 'analytics']) } as any,
};
catalog.searchByName.mockImplementation(async (connectionName: string, query: string) => [
{
kind: 'table',
connectionName,
ref: { catalog: null, db: 'public', name: `${connectionName}_${query}` },
display: `public.${connectionName}_${query}`,
matchedOn: 'name',
},
]);
const result = await tool.call({ query: 'orders', limit: 10 }, multiConnectionContext);
expect(catalog.searchByName).toHaveBeenCalledWith('analytics', 'orders', 10);
expect(catalog.searchByName).toHaveBeenCalledWith('warehouse', 'orders', 10);
expect(result.markdown).toContain('connectionName=analytics');
expect(result.markdown).toContain('connectionName=warehouse');
expect(result.markdown).toContain(
'entity_details({connectionName: "analytics", targets: [{display: "public.analytics_orders"}]})',
);
expect(result.structured.raw?.hits.map((hit) => hit.connectionName)).toEqual([
'analytics',
'warehouse',
]);
});
```
- [ ] **Step 2: Run the failing discovery test**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/discover-data.tool.test.ts -t "connectionName on raw schema hits"
```
Expected: FAIL because `RawSchemaHit` has no `connectionName` property and the
markdown only renders the display string.
- [ ] **Step 3: Add `connectionName` to raw schema hits**
Modify the raw hit type and hit construction in
`packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts`:
```ts
export type RawSchemaHit =
| {
kind: 'table';
connectionName: string;
ref: KtxTableRef;
display: string;
matchedOn: 'name' | 'db' | 'comment' | 'description';
}
| {
kind: 'column';
connectionName: string;
ref: KtxTableRef & { column: string };
display: string;
matchedOn: 'name' | 'comment' | 'description';
};
```
In the table hit block, add `connectionName`:
```ts
hits.push({
kind: 'table',
connectionName,
ref: { catalog: table.catalog, db: table.db, name: table.name },
display: formatDisplay(catalog.driver, table),
matchedOn: tableMatch,
});
```
In the column hit block, add `connectionName`:
```ts
hits.push({
kind: 'column',
connectionName,
ref: { catalog: table.catalog, db: table.db, name: table.name, column: column.name },
display: `${formatDisplay(catalog.driver, table)}.${column.name}`,
matchedOn: columnMatch,
});
```
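Step 3 stamps `connectionName` inside the catalog service, so hits keep their origin after `discover_data` merges results from several connections. The sketch below models that merge with hypothetical local types and a fake search in place of the `catalog.searchByName` mock from Step 1. Sorting connection names first is an assumption inferred from the test's expected `['analytics', 'warehouse']` order; the plan does not say where that ordering happens.

```ts
// Hypothetical local shapes; the real RawSchemaHit is defined in warehouse-catalog.service.ts.
type TableRef = { catalog: string | null; db: string; name: string };

type TableHit = {
  kind: 'table';
  connectionName: string;
  ref: TableRef;
  display: string;
};

// Stand-in for catalog.searchByName(connectionName, query, limit).
type SearchByName = (connectionName: string, query: string, limit: number) => TableHit[];

// Searches each allowed connection and concatenates the hits. Sorting names first is an
// assumption that yields a deterministic order (analytics before warehouse).
function searchAllowedConnections(
  allowed: ReadonlySet<string>,
  query: string,
  limit: number,
  searchByName: SearchByName,
): TableHit[] {
  const names = Array.from(allowed).sort((left, right) => left.localeCompare(right));
  const hits: TableHit[] = [];
  for (const connectionName of names) {
    hits.push(...searchByName(connectionName, query, limit));
  }
  return hits;
}

// Fake search that, like the service after Step 3, stamps each hit with its connection.
const fakeSearch: SearchByName = (connectionName, query) => [
  {
    kind: 'table',
    connectionName,
    ref: { catalog: null, db: 'public', name: `${connectionName}_${query}` },
    display: `public.${connectionName}_${query}`,
  },
];

const merged = searchAllowedConnections(new Set(['warehouse', 'analytics']), 'orders', 10, fakeSearch);
console.log(merged.map((hit) => `${hit.connectionName}: ${hit.display}`));
```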
- [ ] **Step 4: Render follow-up-ready raw hits**
Modify the raw schema markdown in
`packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts`:
```ts
parts.push('## Raw Warehouse Schema', '> use `entity_details({connectionName, targets: [{display}]})` for full DDL + sample values');
parts.push(
rawHits
.slice(0, limit)
.map(
(hit) =>
`- ${hit.kind}: ${hit.display} [connectionName=${hit.connectionName}] (matched on ${hit.matchedOn}) — ` +
`follow up with \`entity_details({connectionName: "${hit.connectionName}", targets: [{display: "${hit.display}"}]})\``,
)
.join('\n'),
);
```
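The follow-up call is the part an agent copies verbatim, so it can help to build it in one place. A minimal sketch with hypothetical helper names (`entityDetailsCall`, `renderRawHit`); it swaps the plan's hand-written quotes for `JSON.stringify`, which produces identical text for plain identifiers but also escapes embedded quotes, and uses a plain ` - ` separator.

```ts
// Hypothetical minimal hit shape with only the fields the renderer reads.
type RenderableHit = {
  kind: 'table' | 'column';
  connectionName: string;
  display: string;
  matchedOn: string;
};

// JSON.stringify wraps each value in double quotes and escapes any embedded quotes.
function entityDetailsCall(hit: Pick<RenderableHit, 'connectionName' | 'display'>): string {
  return `entity_details({connectionName: ${JSON.stringify(hit.connectionName)}, targets: [{display: ${JSON.stringify(hit.display)}}]})`;
}

// One markdown bullet per raw hit, tagged with the connection it came from.
function renderRawHit(hit: RenderableHit): string {
  return (
    `- ${hit.kind}: ${hit.display} [connectionName=${hit.connectionName}] (matched on ${hit.matchedOn}) - ` +
    `follow up with \`${entityDetailsCall(hit)}\``
  );
}

const line = renderRawHit({
  kind: 'table',
  connectionName: 'analytics',
  display: 'public.analytics_orders',
  matchedOn: 'name',
});
console.log(line);
```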
- [ ] **Step 5: Run the discovery test**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/discover-data.tool.test.ts
```
Expected: PASS.
- [ ] **Step 6: Commit**
Run:
```bash
git add \
packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts \
packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts \
packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts
git commit -m "fix(context): include raw discovery connection names"
```
### Task 2: Expose LookML and MetricFlow warehouse targets
**Files:**
- Modify: `packages/context/src/ingest/adapters/lookml/lookml.adapter.ts`
- Modify: `packages/context/src/ingest/adapters/lookml/lookml.adapter.test.ts`
- Modify: `packages/context/src/ingest/adapters/metricflow/metricflow.adapter.ts`
- Modify: `packages/context/src/ingest/adapters/metricflow/metricflow.adapter.test.ts`
- Modify: `packages/context/src/ingest/local-adapters.ts`
- Modify: `packages/context/src/ingest/local-adapters.test.ts`
- [ ] **Step 1: Write failing adapter target tests**
Add this test to
`packages/context/src/ingest/adapters/lookml/lookml.adapter.test.ts`:
```ts
it('returns configured target warehouse connection ids', async () => {
const adapter = new LookmlSourceAdapter({
homeDir: join(tmpRoot, 'home'),
targetConnectionIds: ['warehouse', 'analytics', 'warehouse'],
});
await expect(adapter.listTargetConnectionIds?.(join(tmpRoot, 'staged'))).resolves.toEqual([
'analytics',
'warehouse',
]);
});
```
Add this test to
`packages/context/src/ingest/adapters/metricflow/metricflow.adapter.test.ts`:
```ts
it('returns configured target warehouse connection ids', async () => {
const metricflow = new MetricflowSourceAdapter({
homeDir: join(tmpRoot, 'cache-home'),
targetConnectionIds: ['warehouse', 'analytics', 'warehouse'],
});
await expect(metricflow.listTargetConnectionIds?.(stagedDir)).resolves.toEqual([
'analytics',
'warehouse',
]);
});
```
- [ ] **Step 2: Run the failing adapter tests**
Run:
```bash
pnpm --filter @ktx/context exec vitest run \
src/ingest/adapters/lookml/lookml.adapter.test.ts \
src/ingest/adapters/metricflow/metricflow.adapter.test.ts \
-t "target warehouse connection ids"
```
Expected: FAIL because neither adapter accepts `targetConnectionIds` or
implements `listTargetConnectionIds()`.
- [ ] **Step 3: Implement target ID support in LookML**
Modify `packages/context/src/ingest/adapters/lookml/lookml.adapter.ts`:
```ts
export interface LookmlSourceAdapterDeps {
homeDir: string;
targetConnectionIds?: string[];
}
function uniqueSorted(values: readonly string[] | undefined): string[] {
return [...new Set(values ?? [])].sort((left, right) => left.localeCompare(right));
}
```
Add this method to `LookmlSourceAdapter`:
```ts
async listTargetConnectionIds(_stagedDir: string): Promise<string[]> {
return uniqueSorted(this.deps.targetConnectionIds);
}
```
- [ ] **Step 4: Implement target ID support in MetricFlow**
Modify `packages/context/src/ingest/adapters/metricflow/metricflow.adapter.ts`:
```ts
export interface MetricflowSourceAdapterDeps {
homeDir: string;
targetConnectionIds?: string[];
}
function uniqueSorted(values: readonly string[] | undefined): string[] {
return [...new Set(values ?? [])].sort((left, right) => left.localeCompare(right));
}
```
Add this method to `MetricflowSourceAdapter`:
```ts
async listTargetConnectionIds(_stagedDir: string): Promise<string[]> {
return uniqueSorted(this.deps.targetConnectionIds);
}
```
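Both adapters follow the same pattern: an optional method on the adapter contract that returns deduplicated, sorted IDs from configuration. A self-contained sketch of that pattern under hypothetical names (`TargetAwareAdapter`, `ConfiguredTargetsAdapter`, `targetsFor`); the real adapter interface in `@ktx/context` may declare more members.

```ts
// Hypothetical minimal contract: the method is optional, matching the `?.` calls in Step 1.
interface TargetAwareAdapter {
  readonly source: string;
  listTargetConnectionIds?(stagedDir: string): Promise<string[]>;
}

// Deduplicates and sorts so the result does not depend on configuration order.
function uniqueSorted(values: readonly string[] | undefined): string[] {
  return Array.from(new Set(values ?? [])).sort((left, right) => left.localeCompare(right));
}

class ConfiguredTargetsAdapter implements TargetAwareAdapter {
  readonly source = 'example';
  private readonly targetConnectionIds?: string[];

  constructor(deps: { targetConnectionIds?: string[] }) {
    this.targetConnectionIds = deps.targetConnectionIds;
  }

  // Targets come from configuration, so the staged directory is not read.
  async listTargetConnectionIds(_stagedDir: string): Promise<string[]> {
    return uniqueSorted(this.targetConnectionIds);
  }
}

// Adapters that do not implement the method contribute no targets.
async function targetsFor(adapter: TargetAwareAdapter, stagedDir: string): Promise<string[]> {
  return (await adapter.listTargetConnectionIds?.(stagedDir)) ?? [];
}

const adapter = new ConfiguredTargetsAdapter({ targetConnectionIds: ['warehouse', 'analytics', 'warehouse'] });
void targetsFor(adapter, '/tmp/staged').then((ids) => console.log(ids));
```

Because the plan copies `uniqueSorted` into both adapter files, a shared module is one possible alternative; the plan keeps them local.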
- [ ] **Step 5: Pass primary warehouses from the local adapter factory**
Modify the LookML and MetricFlow adapter construction in
`packages/context/src/ingest/local-adapters.ts`:
```ts
new LookmlSourceAdapter({
homeDir: join(project.projectDir, '.ktx/cache'),
targetConnectionIds: primaryWarehouseConnectionIds(project),
}),
```
```ts
new MetricflowSourceAdapter({
homeDir: join(project.projectDir, '.ktx/cache'),
targetConnectionIds: primaryWarehouseConnectionIds(project),
}),
```
- [ ] **Step 6: Write the local adapter fan-out test**
Add this test to `packages/context/src/ingest/local-adapters.test.ts`:
```ts
it('passes primary warehouse connection ids to local LookML and MetricFlow adapters', async () => {
const adapters = createDefaultLocalIngestAdapters(
projectWithConnections({
warehouse: {
driver: 'postgres',
url: 'postgresql://readonly@db.example.test/analytics',
},
lookml_docs: {
driver: 'lookml',
lookml: {
repoUrl: 'https://github.com/acme/lookml.git',
},
},
metrics_repo: {
driver: 'metricflow',
metricflow: {
repoUrl: 'https://github.com/acme/metrics.git',
},
},
} as never),
);
const lookml = adapters.find((adapter) => adapter.source === 'lookml');
const metricflow = adapters.find((adapter) => adapter.source === 'metricflow');
await expect(lookml?.listTargetConnectionIds?.('/tmp/staged-lookml')).resolves.toEqual([
'warehouse',
]);
await expect(metricflow?.listTargetConnectionIds?.('/tmp/staged-metricflow')).resolves.toEqual([
'warehouse',
]);
});
```
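Step 5 relies on an existing `primaryWarehouseConnectionIds(project)` helper whose body this plan does not show. The sketch below is a hypothetical stand-in, `primaryWarehouseIdsSketch`, that reproduces the Step 6 expectation by excluding modeling-source drivers; the real helper may use a different rule.

```ts
// Hypothetical config shape; only `driver` is consulted.
type ConnectionConfig = { driver: string };

// Assumption: these drivers describe modeling sources, not queryable warehouses.
// Extend the set with any other source-only drivers the project supports.
const SOURCE_ONLY_DRIVERS = new Set(['lookml', 'metricflow']);

function primaryWarehouseIdsSketch(connections: Record<string, ConnectionConfig>): string[] {
  return Object.keys(connections)
    .filter((id) => !SOURCE_ONLY_DRIVERS.has(connections[id].driver))
    .sort((left, right) => left.localeCompare(right));
}

// Same connection layout as the Step 6 test.
const ids = primaryWarehouseIdsSketch({
  warehouse: { driver: 'postgres' },
  lookml_docs: { driver: 'lookml' },
  metrics_repo: { driver: 'metricflow' },
});
console.log(ids);
```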
- [ ] **Step 7: Run the target fan-out tests**
Run:
```bash
pnpm --filter @ktx/context exec vitest run \
src/ingest/adapters/lookml/lookml.adapter.test.ts \
src/ingest/adapters/metricflow/metricflow.adapter.test.ts \
src/ingest/local-adapters.test.ts
```
Expected: PASS.
- [ ] **Step 8: Commit**
Run:
```bash
git add \
packages/context/src/ingest/adapters/lookml/lookml.adapter.ts \
packages/context/src/ingest/adapters/lookml/lookml.adapter.test.ts \
packages/context/src/ingest/adapters/metricflow/metricflow.adapter.ts \
packages/context/src/ingest/adapters/metricflow/metricflow.adapter.test.ts \
packages/context/src/ingest/local-adapters.ts \
packages/context/src/ingest/local-adapters.test.ts
git commit -m "fix(context): expose warehouse targets for LookML and MetricFlow"
```
### Task 3: Pass full connection config to local ingest SQL execution
**Files:**
- Modify: `packages/context/src/ingest/local-bundle-runtime.ts`
- Modify: `packages/context/src/ingest/local-bundle-runtime.test.ts`
- Modify: `packages/context/src/ingest/local-ingest.ts`
- [ ] **Step 1: Write the failing local connection catalog test**
In `packages/context/src/ingest/local-bundle-runtime.test.ts`, change the
Vitest import to include `vi`:
```ts
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
```
Extend `RuntimeWithConnectionDeps`:
```ts
type RuntimeWithConnectionDeps = {
deps: {
connections: {
listEnabledConnections(ids: string[]): Promise<Array<{ id: string; name: string; connectionType: string }>>;
getConnectionById(connectionId: string): Promise<{ id: string; name: string; connectionType: string } | null>;
executeQuery(connectionId: string, sql: string): Promise<unknown>;
};
};
};
```
Add this test:
```ts
it('passes project connection config to local ingest query executors', async () => {
const agentRunner = new AgentRunnerService({ llmProvider: { getModel: () => ({}) as never } as any });
const queryExecutor = {
execute: vi.fn(async () => ({
headers: ['answer'],
rows: [[1]],
totalRows: 1,
command: 'SELECT',
rowCount: 1,
})),
};
const runtime = createLocalBundleIngestRuntime({
project,
adapters: [new FakeSourceAdapter()],
agentRunner,
queryExecutor,
});
const connections = (runtime.runner as unknown as RuntimeWithConnectionDeps).deps.connections;
await expect(connections.executeQuery('warehouse', 'select 1')).resolves.toMatchObject({
headers: ['answer'],
});
expect(queryExecutor.execute).toHaveBeenCalledWith({
connectionId: 'warehouse',
projectDir: project.projectDir,
connection: project.config.connections.warehouse,
sql: 'select 1',
});
});
```
- [ ] **Step 2: Run the failing local runtime test**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/local-bundle-runtime.test.ts -t "project connection config"
```
Expected: FAIL because `LocalConnectionCatalog.executeQuery()` only passes
`connectionId` and `sql`.
- [ ] **Step 3: Update local ingest query executor types**
In `packages/context/src/ingest/local-bundle-runtime.ts`, import the shared
query executor type:
```ts
import { localConnectionInfoFromConfig, type KtxSqlQueryExecutorPort } from '../connections/index.js';
```
Change `CreateLocalBundleIngestRuntimeOptions.queryExecutor` to:
```ts
queryExecutor?: KtxSqlQueryExecutorPort;
```
Change `LocalConnectionCatalog` to store that type:
```ts
class LocalConnectionCatalog implements SlConnectionCatalogPort {
constructor(
private readonly project: KtxLocalProject,
private readonly queryExecutor?: KtxSqlQueryExecutorPort,
) {}
```
Change `executeQuery()`:
```ts
async executeQuery(connectionId: string, sql: string): Promise<KtxQueryResult> {
if (!this.queryExecutor) {
throw new Error('Local ingest has no query executor configured');
}
return this.queryExecutor.execute({
connectionId,
projectDir: this.project.projectDir,
connection: this.project.config.connections[connectionId],
sql,
});
}
```
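The change in Step 3 is purely about what crosses the port: the catalog now forwards `projectDir` and the full connection config instead of only the ID. A self-contained sketch with hypothetical local types standing in for `KtxSqlQueryExecutorPort` and `KtxLocalProject`, plus a recording fake executor that captures each input.

```ts
// Hypothetical minimal versions of the shared port and project types.
type ConnectionConfig = { driver: string; url?: string };

interface SqlQueryExecutionInput {
  connectionId: string;
  projectDir: string;
  connection: ConnectionConfig | undefined;
  sql: string;
}

interface SqlQueryExecutorPort {
  execute(input: SqlQueryExecutionInput): Promise<unknown>;
}

interface LocalProject {
  projectDir: string;
  config: { connections: Record<string, ConnectionConfig> };
}

class ConnectionCatalogSketch {
  private readonly project: LocalProject;
  private readonly queryExecutor?: SqlQueryExecutorPort;

  constructor(project: LocalProject, queryExecutor?: SqlQueryExecutorPort) {
    this.project = project;
    this.queryExecutor = queryExecutor;
  }

  // Forwards the project directory and the full connection config, not just the ID.
  async executeQuery(connectionId: string, sql: string): Promise<unknown> {
    if (!this.queryExecutor) {
      throw new Error('Local ingest has no query executor configured');
    }
    return this.queryExecutor.execute({
      connectionId,
      projectDir: this.project.projectDir,
      connection: this.project.config.connections[connectionId],
      sql,
    });
  }
}

// Recording fake: the push runs synchronously when execute is called.
const calls: SqlQueryExecutionInput[] = [];
const recordingExecutor: SqlQueryExecutorPort = {
  async execute(input) {
    calls.push(input);
    return { rows: [] };
  },
};

const project: LocalProject = {
  projectDir: '/tmp/ktx-example',
  config: { connections: { warehouse: { driver: 'postgres', url: 'postgresql://readonly@db.example.test/analytics' } } },
};

void new ConnectionCatalogSketch(project, recordingExecutor).executeQuery('warehouse', 'select 1');
```

In this sketch an unknown `connectionId` forwards `connection: undefined`; the plan does not state whether the real port accepts that.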
In `packages/context/src/ingest/local-ingest.ts`, replace the local query
executor object type with the shared port:
```ts
import type { KtxSqlQueryExecutorPort } from '../connections/index.js';
```
```ts
queryExecutor?: KtxSqlQueryExecutorPort;
```
- [ ] **Step 4: Run the local runtime test**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/local-bundle-runtime.test.ts -t "project connection config"
```
Expected: PASS.
- [ ] **Step 5: Commit**
Run:
```bash
git add \
packages/context/src/ingest/local-bundle-runtime.ts \
packages/context/src/ingest/local-bundle-runtime.test.ts \
packages/context/src/ingest/local-ingest.ts
git commit -m "fix(context): pass connection config to ingest query executors"
```
### Task 4: Supply a scan-connector query executor to CLI ingest
**Files:**
- Create: `packages/cli/src/ingest-query-executor.ts`
- Create: `packages/cli/src/ingest-query-executor.test.ts`
- Modify: `packages/cli/src/ingest.ts`
- Modify: `packages/cli/src/ingest.test.ts`
- [ ] **Step 1: Write the CLI query executor tests**
Create `packages/cli/src/ingest-query-executor.test.ts`:
```ts
import type { KtxLocalProject } from '@ktx/context/project';
import { createKtxConnectorCapabilities, type KtxScanConnector } from '@ktx/context/scan';
import { describe, expect, it, vi } from 'vitest';
import { createKtxCliIngestQueryExecutor } from './ingest-query-executor.js';
function project(): KtxLocalProject {
return {
projectDir: '/tmp/ktx-query-project',
config: {
project: 'warehouse',
connections: {
warehouse: { driver: 'postgres', url: 'postgresql://readonly@example.test/db' },
},
},
} as unknown as KtxLocalProject;
}
function connector(overrides: Partial<KtxScanConnector> = {}): KtxScanConnector {
return {
id: 'warehouse',
driver: 'postgres',
capabilities: createKtxConnectorCapabilities({ readOnlySql: true }),
async introspect() {
throw new Error('introspect is not used by this test');
},
executeReadOnly: vi.fn(async () => ({
headers: ['answer'],
rows: [[1]],
totalRows: 1,
rowCount: 1,
})),
cleanup: vi.fn(async () => {}),
...overrides,
};
}
describe('createKtxCliIngestQueryExecutor', () => {
it('executes read-only SQL through the scan connector and cleans it up', async () => {
const scanConnector = connector();
const createConnector = vi.fn(async () => scanConnector);
const executor = createKtxCliIngestQueryExecutor(project(), { createConnector });
await expect(
executor.execute({
connectionId: 'warehouse',
connection: { driver: 'postgres', url: 'postgresql://readonly@example.test/db' },
projectDir: '/tmp/ktx-query-project',
sql: 'select 1',
maxRows: 5,
}),
).resolves.toMatchObject({
headers: ['answer'],
rows: [[1]],
totalRows: 1,
command: 'SELECT',
rowCount: 1,
});
expect(createConnector).toHaveBeenCalledWith(project(), 'warehouse');
expect(scanConnector.executeReadOnly).toHaveBeenCalledWith(
{ connectionId: 'warehouse', sql: 'select 1', maxRows: 5 },
{ runId: 'ingest-sql-execution' },
);
expect(scanConnector.cleanup).toHaveBeenCalledTimes(1);
});
it('rejects connectors without read-only SQL support', async () => {
const scanConnector = connector({
capabilities: createKtxConnectorCapabilities({ readOnlySql: false }),
executeReadOnly: undefined,
});
const executor = createKtxCliIngestQueryExecutor(project(), {
createConnector: vi.fn(async () => scanConnector),
});
await expect(
executor.execute({
connectionId: 'warehouse',
connection: { driver: 'postgres' },
projectDir: '/tmp/ktx-query-project',
sql: 'select 1',
}),
).rejects.toThrow('Connection "warehouse" driver "postgres" does not support read-only SQL execution.');
expect(scanConnector.cleanup).toHaveBeenCalledTimes(1);
});
});
```
- [ ] **Step 2: Run the failing CLI query executor test**
Run:
```bash
pnpm --filter @ktx/cli exec vitest run src/ingest-query-executor.test.ts
```
Expected: FAIL because `ingest-query-executor.ts` does not exist.
- [ ] **Step 3: Add the scan-connector-backed query executor**
Create `packages/cli/src/ingest-query-executor.ts`:
```ts
import type { KtxSqlQueryExecutionInput, KtxSqlQueryExecutorPort } from '@ktx/context/connections';
import type { KtxLocalProject } from '@ktx/context/project';
import type { KtxScanConnector, KtxScanContext } from '@ktx/context/scan';
import { createKtxCliScanConnector } from './local-scan-connectors.js';
type CreateConnector = typeof createKtxCliScanConnector;
export interface KtxCliIngestQueryExecutorDeps {
createConnector?: CreateConnector;
}
async function cleanupConnector(connector: KtxScanConnector | null): Promise<void> {
await connector?.cleanup?.();
}
export function createKtxCliIngestQueryExecutor(
project: KtxLocalProject,
deps: KtxCliIngestQueryExecutorDeps = {},
): KtxSqlQueryExecutorPort {
const createConnector = deps.createConnector ?? createKtxCliScanConnector;
return {
async execute(input: KtxSqlQueryExecutionInput) {
let connector: KtxScanConnector | null = null;
try {
connector = await createConnector(project, input.connectionId);
if (!connector.capabilities.readOnlySql || !connector.executeReadOnly) {
throw new Error(
`Connection "${input.connectionId}" driver "${connector.driver}" does not support read-only SQL execution.`,
);
}
const ctx: KtxScanContext = { runId: 'ingest-sql-execution' };
const result = await connector.executeReadOnly(
{ connectionId: input.connectionId, sql: input.sql, maxRows: input.maxRows },
ctx,
);
return {
headers: result.headers,
rows: result.rows,
totalRows: result.totalRows,
command: 'SELECT',
rowCount: result.rowCount,
};
} finally {
await cleanupConnector(connector);
}
},
};
}
```
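The capability check in `execute()` can also be read as a narrowing step: either the connector yields a callable read-only function or the call fails with the message the test expects. A hypothetical helper, `requireReadOnlyExecutor`, sketches that refactor against a local connector shape; it is not part of the plan's code.

```ts
// Hypothetical minimal connector shape with only the fields the CLI executor reads.
interface ReadOnlyRequest {
  connectionId: string;
  sql: string;
  maxRows?: number;
}

interface ScanConnectorLike {
  driver: string;
  capabilities: { readOnlySql: boolean };
  executeReadOnly?: (request: ReadOnlyRequest, ctx: { runId: string }) => Promise<unknown>;
}

// Returns the connector's read-only SQL function, or throws the same message as Step 3.
function requireReadOnlyExecutor(
  connectionId: string,
  connector: ScanConnectorLike,
): NonNullable<ScanConnectorLike['executeReadOnly']> {
  const run = connector.executeReadOnly;
  if (!connector.capabilities.readOnlySql || !run) {
    throw new Error(
      `Connection "${connectionId}" driver "${connector.driver}" does not support read-only SQL execution.`,
    );
  }
  // Bind so a class-based connector keeps its `this` when the function is called later.
  return run.bind(connector);
}

let rejection = '';
try {
  requireReadOnlyExecutor('warehouse', { driver: 'postgres', capabilities: { readOnlySql: false } });
} catch (error) {
  rejection = (error as Error).message;
}
const runReadOnly = requireReadOnlyExecutor('warehouse', {
  driver: 'postgres',
  capabilities: { readOnlySql: true },
  executeReadOnly: async () => ({ headers: [], rows: [] }),
});
```

Cleanup still belongs in the caller's `finally`, as in Step 3, so a rejected connector is released too.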
- [ ] **Step 4: Wire the CLI executor into local ingest runs**
In `packages/cli/src/ingest.ts`, import the executor and type:
```ts
import type { KtxSqlQueryExecutorPort } from '@ktx/context/connections';
import type { KtxLocalProject } from '@ktx/context/project';
import { createKtxCliIngestQueryExecutor } from './ingest-query-executor.js';
```
Extend `KtxIngestDeps`:
```ts
createQueryExecutor?: (project: KtxLocalProject) => KtxSqlQueryExecutorPort;
```
Inside the `args.command === 'run'` branch, after `localIngestOptions` is
defined, add:
```ts
const queryExecutor =
localIngestOptions.queryExecutor ??
(deps.createQueryExecutor ?? createKtxCliIngestQueryExecutor)(project);
```
Pass `queryExecutor` to both local ingest execution paths. In the Metabase
fan-out call:
```ts
...localIngestOptions,
queryExecutor,
trigger: 'manual_resync',
```
In the normal local ingest call:
```ts
...localIngestOptions,
queryExecutor,
pullConfigOptions: adapterOptions,
```
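The `??` chain in Step 4 sets a precedence order: an executor already in `localIngestOptions`, then an injected `deps.createQueryExecutor`, then the default CLI factory. A hypothetical `resolveQueryExecutor` helper makes that order testable in isolation, using local stand-in types.

```ts
// Hypothetical stand-ins for KtxSqlQueryExecutorPort and KtxLocalProject.
interface QueryExecutorPort {
  execute(input: { sql: string }): Promise<unknown>;
}
type Project = { projectDir: string };
type ExecutorFactory = (project: Project) => QueryExecutorPort;

// Precedence: explicit executor, then injected factory, then default factory.
// `??` short-circuits, so no factory runs when an explicit executor is supplied.
function resolveQueryExecutor(
  project: Project,
  explicit: QueryExecutorPort | undefined,
  injectedFactory: ExecutorFactory | undefined,
  defaultFactory: ExecutorFactory,
): QueryExecutorPort {
  return explicit ?? (injectedFactory ?? defaultFactory)(project);
}

let defaultFactoryCalls = 0;
const defaultFactory: ExecutorFactory = () => {
  defaultFactoryCalls += 1;
  return { execute: async () => 'default' };
};
const explicit: QueryExecutorPort = { execute: async () => 'explicit' };
const injected: QueryExecutorPort = { execute: async () => 'injected' };
const project: Project = { projectDir: '/tmp/ktx-example' };

const fromOptions = resolveQueryExecutor(project, explicit, () => injected, defaultFactory);
const fromDeps = resolveQueryExecutor(project, undefined, () => injected, defaultFactory);
const fromDefault = resolveQueryExecutor(project, undefined, undefined, defaultFactory);
```

The short-circuit matters in practice: the default factory is only constructed when neither the options nor the test deps supply an executor.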
- [ ] **Step 5: Add CLI wiring coverage**
Add this test to `packages/cli/src/ingest.test.ts`:
```ts
it('supplies a scan-connector query executor to local ingest runs', async () => {
const io = makeIo();
const projectDir = join(tempDir, 'query-executor-project');
await writeWarehouseConfig(projectDir);
const queryExecutor = {
execute: vi.fn(async () => ({
headers: [],
rows: [],
totalRows: 0,
command: 'SELECT',
rowCount: 0,
})),
};
const runLocalIngest = vi.fn(async (input: RunLocalIngestOptions): Promise<LocalIngestResult> =>
completedLocalBundleRun(input, 'query-executor-run'),
);
await expect(
runKtxIngest(
{
command: 'run',
projectDir,
connectionId: 'warehouse',
adapter: 'fake',
outputMode: 'json',
},
io.io,
{
runLocalIngest,
createAdapters: () => [],
createQueryExecutor: () => queryExecutor,
},
),
).resolves.toBe(0);
expect(runLocalIngest).toHaveBeenCalledWith(expect.objectContaining({ queryExecutor }));
});
```
- [ ] **Step 6: Run CLI query executor tests**
Run:
```bash
pnpm --filter @ktx/cli exec vitest run src/ingest-query-executor.test.ts
pnpm --filter @ktx/cli exec vitest run src/ingest.test.ts -t "query executor"
```
Expected: PASS.
- [ ] **Step 7: Commit**
Run:
```bash
git add \
packages/cli/src/ingest-query-executor.ts \
packages/cli/src/ingest-query-executor.test.ts \
packages/cli/src/ingest.ts \
packages/cli/src/ingest.test.ts
git commit -m "fix(cli): enable read-only SQL probes for local ingest"
```
### Task 5: Final verification
**Files:**
- Verify: all files changed by Tasks 1-4.
- [ ] **Step 1: Run focused context tests**
Run:
```bash
pnpm --filter @ktx/context exec vitest run \
src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts \
src/ingest/tools/warehouse-verification/entity-details.tool.test.ts \
src/ingest/tools/warehouse-verification/discover-data.tool.test.ts \
src/ingest/tools/warehouse-verification/sql-execution.tool.test.ts \
src/ingest/local-bundle-runtime.test.ts \
src/ingest/local-adapters.test.ts \
src/ingest/adapters/lookml/lookml.adapter.test.ts \
src/ingest/adapters/metricflow/metricflow.adapter.test.ts \
src/ingest/ingest-bundle.runner.test.ts
```
Expected: PASS.
- [ ] **Step 2: Run focused CLI tests**
Run:
```bash
pnpm --filter @ktx/cli exec vitest run src/ingest-query-executor.test.ts src/ingest.test.ts
```
Expected: PASS.
- [ ] **Step 3: Run type checks**
Run:
```bash
pnpm --filter @ktx/context run type-check
pnpm --filter @ktx/cli run type-check
```
Expected: both commands pass.
- [ ] **Step 4: Run pre-commit on changed files if configured**
Run:
```bash
uv run pre-commit run --files \
packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts \
packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts \
packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts \
packages/context/src/ingest/adapters/lookml/lookml.adapter.ts \
packages/context/src/ingest/adapters/lookml/lookml.adapter.test.ts \
packages/context/src/ingest/adapters/metricflow/metricflow.adapter.ts \
packages/context/src/ingest/adapters/metricflow/metricflow.adapter.test.ts \
packages/context/src/ingest/local-adapters.ts \
packages/context/src/ingest/local-adapters.test.ts \
packages/context/src/ingest/local-bundle-runtime.ts \
packages/context/src/ingest/local-bundle-runtime.test.ts \
packages/context/src/ingest/local-ingest.ts \
packages/cli/src/ingest-query-executor.ts \
packages/cli/src/ingest-query-executor.test.ts \
packages/cli/src/ingest.ts \
packages/cli/src/ingest.test.ts \
docs/superpowers/plans/2026-05-12-warehouse-verification-final-v1-closure.md
```
Expected: PASS. If the repository has no pre-commit config or the local `uv`
version cannot satisfy the configured toolchain, record the exact error and use
the focused test and type-check results as the closest verification.
- [ ] **Step 5: Commit final verification fixes if any were needed**
If verification required edits, run:
```bash
git add <changed-files>
git commit -m "test: cover warehouse verification v1 closure"
```
If verification required no edits, do not create an empty commit.
## Self-review
Spec coverage:
- Raw warehouse discovery still covers wiki, semantic-layer, and raw schema
results, and now raw hits include the connection name needed by the required
`entity_details` follow-up.
- Every local synthesis adapter with an external source connection now has a
path to target warehouse IDs: dbt and Notion already had it, Looker resolves
staged mappings, Metabase fan-out runs under target warehouse IDs, and this
plan adds LookML and MetricFlow.
- `sql_execution` remains scoped by `allowedConnectionNames`, retains the
read-only SQL wrapper, and gains a normal local ingest execution backend.
Placeholder scan:
- This plan contains no deferred implementation placeholders.
- Every code-changing step includes the exact test or implementation snippet to
add.
Type consistency:
- `connectionName` is added to `RawSchemaHit` and used by `DiscoverDataTool`.
- `targetConnectionIds` and `listTargetConnectionIds()` match the existing dbt
and Notion adapter pattern.
- Local ingest uses `KtxSqlQueryExecutorPort` consistently from CLI to context.

# Warehouse Verification Prompt Shape Closure Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Make every warehouse-verification prompt use KTX's shipped
`sql_execution` input shape so ingest agents include `connectionName` when they
probe warehouse identifiers.
**Architecture:** Keep the warehouse verification tool code unchanged. Add
prompt-asset tests that reject Kaelio's old session-only SQL examples, then
update the shared identifier protocol and the three remaining per-skill SQL
probe examples that still show the legacy shape.
**Tech Stack:** Markdown skill prompts, TypeScript, Vitest, pnpm workspace
commands.
---
## Audit Summary
The warehouse verification tools, runner wiring, adapter target fan-out, and
focused tests are present. Focused verification passed:
```bash
pnpm --filter @ktx/context exec vitest run \
src/connections/dialects.test.ts \
src/connections/read-only-sql.test.ts \
src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts \
src/ingest/tools/warehouse-verification/entity-details.tool.test.ts \
src/ingest/tools/warehouse-verification/sql-execution.tool.test.ts \
src/ingest/tools/warehouse-verification/discover-data.tool.test.ts \
src/ingest/ingest-prompts.test.ts \
src/ingest/ingest-runtime-assets.test.ts \
src/memory/memory-runtime-assets.test.ts \
src/ingest/local-adapters.test.ts \
src/ingest/adapters/notion/notion.adapter.test.ts \
src/ingest/adapters/lookml/lookml.adapter.test.ts \
src/ingest/adapters/metricflow/metricflow.adapter.test.ts
pnpm --filter @ktx/cli exec vitest run src/ingest-query-executor.test.ts src/ingest.test.ts -t "supplies a scan-connector query executor"
```
Remaining v1-blocking gap:
- `packages/context/skills/lookml_ingest/SKILL.md`,
`packages/context/skills/metricflow_ingest/SKILL.md`, and
`packages/context/skills/sl_capture/SKILL.md` still contain
`sql_execution({ sql ... })` / "session shape" guidance inherited from
Kaelio. KTX's tool contract is
`sql_execution({connectionName, sql, rowLimit?})`, so these examples can make
agents call the shipped tool with invalid input.
Non-blocking gaps remain out of scope for this v1 plan:
- Full DDL-style `entity_details` formatting with FK profile summaries.
- AST-backed SQL validation for data-modifying CTE bodies.
- Search over generated `enrichment/descriptions.json`.
- Per-WorkUnit reuse of a single `WarehouseCatalogService` instance for cache
hits across separate tool calls.
- A deterministic fake-LLM end-to-end Notion hallucination regression. Prompt
guards and tool contract tests cover the v1 contract; a broader behavior
regression can land as follow-up.
## File Structure
Modify these files:
- `packages/context/src/memory/memory-runtime-assets.test.ts`: add a prompt
guard that rejects the legacy session-only `sql_execution` shape.
- `packages/context/src/ingest/ingest-runtime-assets.test.ts`: strengthen the
shared prompt asset assertion for the KTX `connectionName` SQL shape.
- `packages/context/skills/_shared/identifier-verification.md`: make both SQL
probe instructions show the KTX `connectionName` argument.
- `packages/context/skills/notion_synthesize/SKILL.md`: inline the updated
protocol block.
- `packages/context/skills/dbt_ingest/SKILL.md`: inline the updated protocol
block.
- `packages/context/skills/lookml_ingest/SKILL.md`: inline the updated protocol
block and fix the legacy SQL fallback example.
- `packages/context/skills/looker_ingest/SKILL.md`: inline the updated
protocol block.
- `packages/context/skills/metabase_ingest/SKILL.md`: inline the updated
protocol block.
- `packages/context/skills/metricflow_ingest/SKILL.md`: inline the updated
protocol block and fix the legacy SQL fallback example.
- `packages/context/skills/live_database_ingest/SKILL.md`: inline the updated
protocol block.
- `packages/context/skills/historic_sql_table_digest/SKILL.md`: inline the
updated protocol block.
- `packages/context/skills/historic_sql_patterns/SKILL.md`: inline the updated
protocol block.
- `packages/context/skills/knowledge_capture/SKILL.md`: inline the updated
protocol block.
- `packages/context/skills/sl_capture/SKILL.md`: inline the updated protocol
block and fix the join-discovery SQL example.
### Task 1: Add Prompt Guards For The KTX SQL Tool Shape
**Files:**
- Modify: `packages/context/src/memory/memory-runtime-assets.test.ts`
- Modify: `packages/context/src/ingest/ingest-runtime-assets.test.ts`
- [ ] **Step 1: Add the failing memory asset guard**
In `packages/context/src/memory/memory-runtime-assets.test.ts`, add this test
after `does not ship stale warehouse verification tool names or fictional
identifiers`:
```ts
it('ships only the KTX connectionName sql_execution call shape in writer guidance', async () => {
const shared = await readFile(join(skillsDir, '_shared', 'identifier-verification.md'), 'utf-8');
expect(shared).toContain('sql_execution({connectionName, sql: "SELECT DISTINCT');
expect(shared).toContain('sql_execution({connectionName, sql: "SELECT 1 FROM');
for (const skillName of verificationWriterSkills) {
const body = await readFile(join(skillsDir, skillName, 'SKILL.md'), 'utf-8');
expect(body).toContain('sql_execution({connectionName');
expect(body).not.toContain('sql_execution({ sql');
expect(body).not.toContain('session shape');
expect(body).not.toContain('connection is already pinned by the ingest session');
}
});
```
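The guard above stops at the first failing `expect`. When fixing many skills at once it can help to collect every violation in one pass. A hypothetical pure helper, `findSqlShapeProblems`, sketches that, using the same marker strings as the test.

```ts
// Strings the Step 1 guard rejects, copied from its assertions.
const LEGACY_MARKERS = [
  'sql_execution({ sql',
  'session shape',
  'connection is already pinned by the ingest session',
] as const;

const REQUIRED_MARKER = 'sql_execution({connectionName';

// Returns readable problems for one skill body; an empty array means it passes.
function findSqlShapeProblems(skillName: string, body: string): string[] {
  const problems: string[] = [];
  if (!body.includes(REQUIRED_MARKER)) {
    problems.push(`${skillName}: missing ${REQUIRED_MARKER}`);
  }
  for (const marker of LEGACY_MARKERS) {
    if (body.includes(marker)) {
      problems.push(`${skillName}: contains legacy "${marker}"`);
    }
  }
  return problems;
}

const legacy = findSqlShapeProblems('lookml_ingest', 'Fallback: sql_execution({ sql: "SELECT 1" }) (session shape)');
const current = findSqlShapeProblems(
  'dbt_ingest',
  'Probe: sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})',
);
console.log(legacy, current);
```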
- [ ] **Step 2: Strengthen the shared ingest asset guard**
In `packages/context/src/ingest/ingest-runtime-assets.test.ts`, update
`packages identifier verification prompt assets` so the final assertions are:
```ts
expect(shared).toContain('discover_data');
expect(shared).toContain('entity_details');
expect(shared).toContain('sql_execution');
expect(shared).toContain('sql_execution({connectionName, sql: "SELECT DISTINCT');
expect(shared).toContain('sql_execution({connectionName, sql: "SELECT 1 FROM');
```
- [ ] **Step 3: Run the failing prompt guards**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/memory/memory-runtime-assets.test.ts src/ingest/ingest-runtime-assets.test.ts
```
Expected: FAIL. The failure output must mention at least one legacy string
(`sql_execution({ sql` or `session shape`) or the missing
`sql_execution({connectionName` shape.
### Task 2: Update The Shared Identifier Verification Protocol
**Files:**
- Modify: `packages/context/skills/_shared/identifier-verification.md`
- Modify: `packages/context/skills/notion_synthesize/SKILL.md`
- Modify: `packages/context/skills/dbt_ingest/SKILL.md`
- Modify: `packages/context/skills/lookml_ingest/SKILL.md`
- Modify: `packages/context/skills/looker_ingest/SKILL.md`
- Modify: `packages/context/skills/metabase_ingest/SKILL.md`
- Modify: `packages/context/skills/metricflow_ingest/SKILL.md`
- Modify: `packages/context/skills/live_database_ingest/SKILL.md`
- Modify: `packages/context/skills/historic_sql_table_digest/SKILL.md`
- Modify: `packages/context/skills/historic_sql_patterns/SKILL.md`
- Modify: `packages/context/skills/knowledge_capture/SKILL.md`
- Modify: `packages/context/skills/sl_capture/SKILL.md`
- [ ] **Step 1: Replace the shared protocol text**
Replace the full `## Identifier Verification Protocol` block in
`packages/context/skills/_shared/identifier-verification.md` with:
```md
## Identifier Verification Protocol
Before writing a wiki page or SL source on any topic:
1. `discover_data({query: "<topic>"})` - see what wikis, SL sources, and raw
   tables already exist. Prefer updating existing pages over creating new ones.

Before emitting any `schema.table` or `schema.table.column` into a wiki body,
SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:

2. `entity_details({connectionName, targets: [{display: "<identifier>"}]})` -
   confirm the identifier resolves; inspect native types, FK/PK, and
   sampleValues.
3. For literal values from the source, such as status codes or plan tiers,
   check whether they appear in `entity_details` sampleValues for the relevant
   column. If sampleValues is short or the sample may have missed real values,
   run a `sql_execution` probe with the same warehouse connection name:
   `sql_execution({connectionName, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"})`.
4. If the candidate identifier still does not resolve, do one of:
   - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <ref> LIMIT 0"})`.
     If it errors, the identifier is fictional.
   - Wrap the identifier in `[unverified - from <rawPath>]` in the wiki body,
     citing the exact raw path that mentioned it.
   - When recording `emit_unmapped_fallback` with `no_physical_table`, include
     the failing probe error in `clarification`.
5. Never copy `<schema>.<table>` placeholder strings from these instructions
   into output.
```
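Steps 3 and 4 differ only in the SQL text passed alongside `connectionName`. A minimal sketch of building both probe inputs, assuming the `sql_execution` input shape `{connectionName, sql, rowLimit?}`; `distinctValuesProbe` and `existenceProbe` are hypothetical helpers, not KTX functions, and they assume the caller already has a dialect-correct table reference. `LIMIT` is not valid on every dialect (SQL Server uses `TOP`), so a real probe may need per-driver rewriting.

```typescript
// Input shape of `sql_execution` as described in these plans.
interface SqlExecutionInput {
  connectionName: string;
  sql: string;
  rowLimit?: number;
}

// Hypothetical helper for step 3: list real literal values of one column.
function distinctValuesProbe(connectionName: string, tableRef: string, column: string): SqlExecutionInput {
  return { connectionName, sql: `SELECT DISTINCT ${column} FROM ${tableRef} LIMIT 50` };
}

// Hypothetical helper for step 4: LIMIT 0 returns no rows, so only an error
// (for example a "relation does not exist" failure) carries information.
function existenceProbe(connectionName: string, tableRef: string): SqlExecutionInput {
  return { connectionName, sql: `SELECT 1 FROM ${tableRef} LIMIT 0` };
}

// Both probes reuse the same warehouse connection name.
const probes = [
  distinctValuesProbe("warehouse", "analytics.orders", "status"),
  existenceProbe("warehouse", "analytics.orders"),
];
console.log(JSON.stringify(probes, null, 2));
```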
- [ ] **Step 2: Inline the same protocol in every writer skill**
Replace the existing `## Identifier Verification Protocol` block in each writer
skill with the exact block from Step 1:
```bash
packages/context/skills/notion_synthesize/SKILL.md
packages/context/skills/dbt_ingest/SKILL.md
packages/context/skills/lookml_ingest/SKILL.md
packages/context/skills/looker_ingest/SKILL.md
packages/context/skills/metabase_ingest/SKILL.md
packages/context/skills/metricflow_ingest/SKILL.md
packages/context/skills/live_database_ingest/SKILL.md
packages/context/skills/historic_sql_table_digest/SKILL.md
packages/context/skills/historic_sql_patterns/SKILL.md
packages/context/skills/knowledge_capture/SKILL.md
packages/context/skills/sl_capture/SKILL.md
```
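Doing this replacement by hand across eleven files is error-prone. A minimal sketch of the section swap, assuming each skill uses the exact heading line and that the section ends at the next `## ` heading or at end of file. `replaceSection` is a hypothetical helper; it does not understand fenced code blocks, so a `## ` line inside a fence would end the section early.

```typescript
// Hypothetical helper: replace the `## <heading>` section of a markdown document.
function replaceSection(markdown: string, heading: string, newBlock: string): string {
  const lines = markdown.split("\n");
  const start = lines.findIndex((line) => line.trim() === heading);
  if (start === -1) {
    throw new Error(`Heading not found: ${heading}`);
  }
  // The section ends at the next level-2 heading; `### ` lines do not match "## ".
  const next = lines.findIndex((line, i) => i > start && line.startsWith("## "));
  const end = next === -1 ? lines.length : next;
  const replacement = newBlock.replace(/\n+$/, "").split("\n");
  const tail = lines.slice(end);
  // Keep one blank line between the new block and the following heading.
  const separator = tail.length > 0 ? [""] : [];
  return [...lines.slice(0, start), ...replacement, ...separator, ...tail].join("\n");
}
```

A caller would read each `SKILL.md`, pass the shared block from Step 1 as `newBlock`, and write the result back.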
- [ ] **Step 3: Run the shared prompt asset tests**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/memory/memory-runtime-assets.test.ts src/ingest/ingest-runtime-assets.test.ts
```
Expected: still FAIL because the per-skill legacy SQL examples in LookML,
MetricFlow, and `sl_capture` have not been fixed yet.
### Task 3: Fix Legacy Per-Skill SQL Examples
**Files:**
- Modify: `packages/context/skills/lookml_ingest/SKILL.md`
- Modify: `packages/context/skills/metricflow_ingest/SKILL.md`
- Modify: `packages/context/skills/sl_capture/SKILL.md`
- [ ] **Step 1: Fix the LookML fallback probe example**
In `packages/context/skills/lookml_ingest/SKILL.md`, replace the current
Required flow item 2 with:
```md
2. If the table isn't in the manifest, use the warehouse `connectionName`
returned by `discover_data` or the target connection chosen from
`sl_discover`, then call a dialect-appropriate SQL probe with that
connection name, for example:
`sql_execution({connectionName: "warehouse", sql: "SELECT 1 FROM analytics.orders LIMIT 0"})`.
Replace `warehouse`, `analytics`, and `orders` with the verified connection,
schema or dataset, and table from the WorkUnit evidence.
```
- [ ] **Step 2: Fix the MetricFlow fallback probe example**
In `packages/context/skills/metricflow_ingest/SKILL.md`, replace the paragraph
that begins ``If `sl_discover` errors`` with:
```md
If `sl_discover` errors because no such table exists, use `discover_data` and
`entity_details` to find the warehouse target. If a SQL probe is still needed,
call `sql_execution` with the same warehouse connection name, for example:
`sql_execution({connectionName: "warehouse", sql: "SELECT 1 FROM analytics.orders LIMIT 0"})`.
**Never invent column names** - every column in `columns:`, `grain:`, and
`sql:` must be sourced from raw files, `entity_details`, or a successful SQL
probe.
```
- [ ] **Step 3: Fix the `sl_capture` join probe example**
In `packages/context/skills/sl_capture/SKILL.md`, replace Tool sequence item 6
with:
```md
6. For join discovery: use `sql_execution({connectionName: "warehouse", sql: "SELECT COUNT(*), COUNT(DISTINCT o.customer_id) FROM public.orders o JOIN public.customers c ON c.id = o.customer_id"})` with the target warehouse connection name and dialect-correct table names to verify the join key exists in both tables and assess cardinality before declaring the join.
```
- [ ] **Step 4: Run the prompt asset tests**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/memory/memory-runtime-assets.test.ts src/ingest/ingest-runtime-assets.test.ts
```
Expected: PASS. The tests must report 2 files passed.
### Task 4: Final Verification
**Files:**
- No new files.
- [ ] **Step 1: Run focused warehouse prompt and tool tests**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/connections/dialects.test.ts src/connections/read-only-sql.test.ts src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts src/ingest/tools/warehouse-verification/entity-details.tool.test.ts src/ingest/tools/warehouse-verification/sql-execution.tool.test.ts src/ingest/tools/warehouse-verification/discover-data.tool.test.ts src/ingest/ingest-prompts.test.ts src/ingest/ingest-runtime-assets.test.ts src/memory/memory-runtime-assets.test.ts
```
Expected: PASS.
- [ ] **Step 2: Run package type-check**
Run:
```bash
pnpm --filter @ktx/context run type-check
```
Expected: PASS.
- [ ] **Step 3: Inspect final diff**
Run:
```bash
git diff -- packages/context/src/memory/memory-runtime-assets.test.ts packages/context/src/ingest/ingest-runtime-assets.test.ts packages/context/skills/_shared/identifier-verification.md packages/context/skills/notion_synthesize/SKILL.md packages/context/skills/dbt_ingest/SKILL.md packages/context/skills/lookml_ingest/SKILL.md packages/context/skills/looker_ingest/SKILL.md packages/context/skills/metabase_ingest/SKILL.md packages/context/skills/metricflow_ingest/SKILL.md packages/context/skills/live_database_ingest/SKILL.md packages/context/skills/historic_sql_table_digest/SKILL.md packages/context/skills/historic_sql_patterns/SKILL.md packages/context/skills/knowledge_capture/SKILL.md packages/context/skills/sl_capture/SKILL.md
```
Expected: only prompt wording and prompt-asset guards changed. No tool
implementation files changed.
- [ ] **Step 4: Commit**
Run:
```bash
git add packages/context/src/memory/memory-runtime-assets.test.ts packages/context/src/ingest/ingest-runtime-assets.test.ts packages/context/skills/_shared/identifier-verification.md packages/context/skills/notion_synthesize/SKILL.md packages/context/skills/dbt_ingest/SKILL.md packages/context/skills/lookml_ingest/SKILL.md packages/context/skills/looker_ingest/SKILL.md packages/context/skills/metabase_ingest/SKILL.md packages/context/skills/metricflow_ingest/SKILL.md packages/context/skills/live_database_ingest/SKILL.md packages/context/skills/historic_sql_table_digest/SKILL.md packages/context/skills/historic_sql_patterns/SKILL.md packages/context/skills/knowledge_capture/SKILL.md packages/context/skills/sl_capture/SKILL.md
git commit -m "fix(context): align warehouse sql probe prompt shape"
```
Expected: one focused commit.
## Self-Review
Spec coverage:
- The original spec requires `sql_execution` inputs to include
`connectionName`; this plan removes contradictory session-only examples from
all active writer guidance.
- The shared protocol remains in `_shared` and inlined in every synthesis
writer skill named by the original spec.
- The tool implementation remains unchanged because the shipped schema already
enforces the v1 contract.
Placeholder scan:
- The plan has no deferred implementation markers.
- Prompt examples use concrete `warehouse`, `analytics`, and `orders` example
names only to demonstrate JSON shape, and each example tells the worker to
replace them with discovered evidence.
Type consistency:
- Tests assert the exact KTX tool call shape:
`sql_execution({connectionName, sql: ...})`.
- Prompt wording consistently uses `connectionName`, matching
`packages/context/src/ingest/tools/warehouse-verification/sql-execution.tool.ts`.

@ -0,0 +1,215 @@
# Warehouse Verification SQL Example Closure Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Remove the last connectionless `sql_execution` prompt example so
warehouse-verification writer guidance always matches KTX's shipped tool
contract.
**Architecture:** Keep the warehouse verification tool code unchanged. Tighten
the prompt asset guard so multiline `sql_execution({ sql: ... })` examples
fail tests, then update the stale `sl_capture` worked example to pass
`connectionName` explicitly.
**Tech Stack:** Markdown skill prompts, TypeScript, Vitest, pnpm workspace
commands.
---
## Audit summary
The warehouse verification tools, runner wiring, source-adapter target fan-out,
CLI query executor, and focused tests are present. Focused verification passed:
```bash
pnpm --filter @ktx/context exec vitest run src/connections/dialects.test.ts src/connections/read-only-sql.test.ts src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts src/ingest/tools/warehouse-verification/entity-details.tool.test.ts src/ingest/tools/warehouse-verification/sql-execution.tool.test.ts src/ingest/tools/warehouse-verification/discover-data.tool.test.ts src/ingest/ingest-prompts.test.ts src/ingest/ingest-runtime-assets.test.ts src/memory/memory-runtime-assets.test.ts src/ingest/local-adapters.test.ts src/ingest/adapters/notion/notion.adapter.test.ts src/ingest/adapters/lookml/lookml.adapter.test.ts src/ingest/adapters/metricflow/metricflow.adapter.test.ts
pnpm --filter @ktx/cli exec vitest run src/ingest-query-executor.test.ts src/ingest.test.ts -t "supplies a scan-connector query executor"
```
Remaining v1-blocking gap:
- `packages/context/skills/sl_capture/SKILL.md` still contains a worked example
with a multiline `sql_execution({ sql: ... })` call. KTX's tool contract is
`sql_execution({connectionName, sql, rowLimit?})`, so this example can teach
agents to call the shipped tool with invalid input.
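The contract above is why a `{ sql }`-only call fails. A minimal sketch of the check using plain runtime validation; the shipped tool defines its own schema, which this does not reproduce, and its error wording will differ. `validateSqlExecutionInput` and `isSqlExecutionInput` are hypothetical, and the positive-integer rule for `rowLimit` is an assumption.

```typescript
// Sketch of the `sql_execution({connectionName, sql, rowLimit?})` contract.
type SqlExecutionInput = { connectionName: string; sql: string; rowLimit?: number };

// Returns human-readable problems; an empty array means the input is usable.
function validateSqlExecutionInput(input: Record<string, unknown>): string[] {
  const errors: string[] = [];
  if (typeof input.connectionName !== "string" || input.connectionName.length === 0) {
    errors.push("connectionName is required");
  }
  if (typeof input.sql !== "string" || input.sql.trim().length === 0) {
    errors.push("sql is required");
  }
  // Assumption: rowLimit, when present, must be a positive integer.
  if (input.rowLimit !== undefined && !(Number.isInteger(input.rowLimit) && (input.rowLimit as number) > 0)) {
    errors.push("rowLimit must be a positive integer");
  }
  return errors;
}

function isSqlExecutionInput(input: Record<string, unknown>): input is SqlExecutionInput {
  return validateSqlExecutionInput(input).length === 0;
}

// The stale prompt shape omits connectionName and is rejected.
console.log(validateSqlExecutionInput({ sql: "SELECT 1" }));
```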
Non-blocking gaps remain out of scope for this v1 plan:
- Full DDL-style `entity_details` formatting with FK and profile summaries.
- AST-backed SQL validation for data-modifying CTE bodies.
- Search over generated `enrichment/descriptions.json`.
- Per-WorkUnit reuse of a single `WarehouseCatalogService` instance for cache
hits across separate tool calls.
- A deterministic fake-LLM end-to-end Notion hallucination regression.
- Tokenized or embedding-backed raw schema search ranking in `discover_data`.
## File structure
Modify these files:
- `packages/context/src/memory/memory-runtime-assets.test.ts`: add a prompt
guard that catches multiline `sql_execution` calls without `connectionName`.
- `packages/context/skills/sl_capture/SKILL.md`: update the stale worked
example to include the target warehouse `connectionName`.
### Task 1: Add a multiline SQL prompt guard
**Files:**
- Modify: `packages/context/src/memory/memory-runtime-assets.test.ts`
- [ ] **Step 1: Add a helper that extracts `sql_execution` call examples**
In `packages/context/src/memory/memory-runtime-assets.test.ts`, add this helper
after `forbiddenProductPattern()`:
```ts
function sqlExecutionCallBlocks(body: string): string[] {
  // Each block runs from `sql_execution({` through the first following `})`,
  // so multiline calls are captured whole.
  const blocks: string[] = [];
  const marker = 'sql_execution({';
  let offset = 0;
  while (offset < body.length) {
    const start = body.indexOf(marker, offset);
    if (start === -1) {
      break;
    }
    const end = body.indexOf('})', start + marker.length);
    blocks.push(body.slice(start, end === -1 ? start + marker.length : end + 2));
    offset = start + marker.length;
  }
  return blocks;
}
```
- [ ] **Step 2: Strengthen the existing SQL-shape test**
Replace the body of
`ships only the KTX connectionName sql_execution call shape in writer guidance`
with:
```ts
const shared = await readFile(join(skillsDir, '_shared', 'identifier-verification.md'), 'utf-8');
const bodies = [{ name: '_shared/identifier-verification.md', body: shared }];
expect(shared).toContain('sql_execution({connectionName, sql: "SELECT DISTINCT');
expect(shared).toContain('sql_execution({connectionName, sql: "SELECT 1 FROM');
for (const skillName of verificationWriterSkills) {
  const body = await readFile(join(skillsDir, skillName, 'SKILL.md'), 'utf-8');
  bodies.push({ name: `${skillName}/SKILL.md`, body });
  expect(body).toContain('sql_execution({connectionName');
  expect(body).not.toContain('sql_execution({ sql');
  expect(body).not.toContain('session shape');
  expect(body).not.toContain('connection is already pinned by the ingest session');
}
for (const { name, body } of bodies) {
  const calls = sqlExecutionCallBlocks(body);
  expect(calls.length, `${name} should contain sql_execution guidance`).toBeGreaterThan(0);
  expect(
    calls.filter((call) => !call.includes('connectionName')),
    `${name} has sql_execution calls without connectionName`,
  ).toEqual([]);
  expect(body, `${name} has a connectionless multiline sql_execution call`).not.toMatch(
    /sql_execution\(\{\s*sql\s*:/,
  );
}
```
- [ ] **Step 3: Run the failing prompt guard**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/memory/memory-runtime-assets.test.ts -t "connectionName sql_execution"
```
Expected: FAIL. The failure must identify
`sl_capture/SKILL.md` as having a `sql_execution` call without
`connectionName` or a connectionless multiline `sql_execution` call.
- [ ] **Step 4: Commit the failing guard**
Run:
```bash
git add packages/context/src/memory/memory-runtime-assets.test.ts
git commit -m "test(context): catch connectionless sql execution prompt examples"
```
### Task 2: Fix the stale `sl_capture` SQL example
**Files:**
- Modify: `packages/context/skills/sl_capture/SKILL.md`
- Test: `packages/context/src/memory/memory-runtime-assets.test.ts`
- Test: `packages/context/src/ingest/ingest-runtime-assets.test.ts`
- [ ] **Step 1: Update the worked example**
In `packages/context/skills/sl_capture/SKILL.md`, replace the `sql_execution`
block in "Worked example - new join" with:
```md
sql_execution({
  connectionName: "warehouse",
  sql: "SELECT COUNT(*), COUNT(DISTINCT a.admin_user_id) FROM public.fct_orders a JOIN public.fct_mau_multiprotocol b ON a.admin_user_id = b.admin_user_id LIMIT 1"
})
```
- [ ] **Step 2: Run the prompt guards**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/memory/memory-runtime-assets.test.ts src/ingest/ingest-runtime-assets.test.ts
```
Expected: PASS.
- [ ] **Step 3: Run a direct stale-shape scan**
Run:
```bash
rg -n -U "sql_execution\\(\\{\\s*\\n\\s*sql:" packages/context/skills packages/context/prompts
```
Expected: no matches and exit code 1.
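The same `rg -U` scan can be reproduced in TypeScript, for example to report line numbers from a script. A minimal sketch; `staleShapeLines` is a hypothetical helper. Like the `rg` pattern, it only catches the multiline form where `sql:` is the first key on a following line; the single-line `sql_execution({ sql` form is covered by the Vitest guard instead.

```typescript
// Same pattern as the `rg -U` command: `sql_execution({`, optional whitespace,
// a newline, optional indentation, then `sql:` as the first key.
const STALE_SHAPE = /sql_execution\(\{\s*\n\s*sql:/g;

// Hypothetical helper: 1-based line numbers where the stale multiline shape starts.
function staleShapeLines(text: string): number[] {
  const lines: number[] = [];
  for (const match of text.matchAll(STALE_SHAPE)) {
    const offset = match.index ?? 0;
    lines.push(text.slice(0, offset).split("\n").length);
  }
  return lines;
}

const sample = [
  "Worked example:",
  "sql_execution({",
  '  sql: "SELECT 1"',
  "})",
  'sql_execution({connectionName, sql: "SELECT 1"})',
].join("\n");
console.log(staleShapeLines(sample));
```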
- [ ] **Step 4: Run the context type-check**
Run:
```bash
pnpm --filter @ktx/context run type-check
```
Expected: PASS.
- [ ] **Step 5: Commit the prompt fix**
Run:
```bash
git add packages/context/skills/sl_capture/SKILL.md
git commit -m "fix(context): include connection name in sl capture sql example"
```
## Self-review
Spec coverage:
- The only remaining v1-blocking prompt-shape gap has a failing test and a
direct prompt edit.
- Tool implementation, runner wiring, adapter scoping, and CLI execution
remain covered by the focused suites listed in the audit summary.
Placeholder scan:
- This plan contains no deferred implementation placeholders.
Type consistency:
- The plan uses the shipped KTX tool shape:
`sql_execution({connectionName, sql, rowLimit?})`.

@ -0,0 +1,236 @@
# Warehouse Verification Structured Target Miss Closure Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Make `entity_details` return model-visible not-found evidence for every documented target shape, including structured `{catalog, db, name, column?}` targets.
**Architecture:** Keep the existing warehouse verification module. Add focused tests for missing structured table and column targets, then route structured target labels through the same candidate lookup used by display targets while preserving exact structured resolution.
**Tech Stack:** TypeScript, Node 22, Vitest, AI SDK v6 tools, Zod, KTX ingest tools.
---
## Audit Summary
The implemented plans have landed the warehouse verification tools, ingest
runner wiring, adapter warehouse target fan-out, CLI read-only query executor,
and prompt-shape closures. Focused verification passed on May 13, 2026:
```bash
pnpm --filter @ktx/context exec vitest run src/connections/dialects.test.ts src/connections/read-only-sql.test.ts src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts src/ingest/tools/warehouse-verification/entity-details.tool.test.ts src/ingest/tools/warehouse-verification/sql-execution.tool.test.ts src/ingest/tools/warehouse-verification/discover-data.tool.test.ts src/ingest/ingest-prompts.test.ts src/ingest/ingest-runtime-assets.test.ts src/memory/memory-runtime-assets.test.ts src/ingest/local-adapters.test.ts src/ingest/adapters/notion/notion.adapter.test.ts src/ingest/adapters/lookml/lookml.adapter.test.ts src/ingest/adapters/metricflow/metricflow.adapter.test.ts
pnpm --filter @ktx/cli exec vitest run src/ingest-query-executor.test.ts src/ingest.test.ts -t "supplies a scan-connector query executor"
rg -n -U "sql_execution\\(\\{\\s*\\n\\s*sql:" packages/context/skills packages/context/prompts
rg -n "wiki_sl_search|sl_describe_table|orbit_analytics\\.customer" packages/context/skills packages/context/prompts packages/context/src/ingest/tools/emit-unmapped-fallback.tool.ts packages/context/src/sl/tools/sl-warehouse-validation.ts
```
Remaining v1-blocking gap:
- `entity_details` accepts structured targets, but if a structured table target
does not exist, it records `structured.missing` and emits no markdown. Tool
outputs are sent to the model as markdown only, so the synthesis agent gets
an empty response instead of the required "Not found in scan" verification
signal.
Non-blocking gaps remain out of scope for this v1 plan:
- Full DDL-style `entity_details` formatting with FK and profile summaries.
- AST-backed SQL validation for data-modifying CTE bodies.
- Dialect-specific row-limit wrapping for SQL Server probes.
- Search over generated `enrichment/descriptions.json`.
- Per-WorkUnit reuse of a single `WarehouseCatalogService` instance for cache
hits across separate tool calls.
- A deterministic fake-LLM end-to-end Notion hallucination regression.
- Cleanup of legacy demo Orbit wiki fixtures that still mention
`orbit_analytics.customer`.
## File Structure
Modify these files:
- `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts`: add failing coverage for missing structured targets.
- `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts`: render missing structured targets into markdown and reuse candidate lookup.
### Task 1: Report Structured Target Misses In `entity_details`
**Files:**
- Modify: `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts`
- Modify: `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts`
- [ ] **Step 1: Add failing structured miss tests**
In `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts`, add these tests after `reports missing explicit columns instead of returning an empty column list`:
```ts
it('reports missing structured table targets in model-visible markdown', async () => {
  const result = await tool.call(
    {
      connectionName: 'warehouse',
      targets: [{ catalog: null, db: 'public', name: 'orderz' }],
    },
    context,
  );
  expect(result.markdown).toContain('Not found in scan: public.orderz');
  expect(result.markdown).toContain('Closest matches: orders');
  expect(result.structured.resolved).toHaveLength(0);
  expect(result.structured.missing).toHaveLength(1);
});

it('reports missing structured column targets in model-visible markdown', async () => {
  const result = await tool.call(
    {
      connectionName: 'warehouse',
      targets: [{ catalog: null, db: 'public', name: 'orders', column: 'plan_tier' }],
    },
    context,
  );
  expect(result.markdown).toContain('Column not found in scan: public.orders.plan_tier');
  expect(result.markdown).toContain('Available columns: id, status');
  expect(result.structured.resolved).toHaveLength(0);
  expect(result.structured.missing).toHaveLength(1);
});
```
- [ ] **Step 2: Run the failing focused test**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/entity-details.tool.test.ts -t "structured"
```
Expected: FAIL. The first new test must fail because `result.markdown` does not contain `Not found in scan: public.orderz`.
- [ ] **Step 3: Add structured target labels and candidate lookup**
In `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts`, add this type alias after `type EntityDetailsInput = z.infer<typeof entityDetailsInputSchema>;`:
```ts
type EntityDetailsTarget = EntityDetailsInput['targets'][number];
```
Add these helpers after `function allowedConnectionNames(context: ToolContext): ReadonlySet<string> | null { ... }`:
```ts
function targetLabel(target: EntityDetailsTarget): string {
  if ('display' in target) {
    return target.display;
  }
  return [target.catalog, target.db, target.name, target.column].filter((part): part is string => !!part).join('.');
}

function appendMissingTargetMarkdown(parts: string[], target: EntityDetailsTarget, candidates: KtxTableRef[]): void {
  parts.push(`Not found in scan: ${targetLabel(target)}`);
  if (candidates.length > 0) {
    parts.push(`Closest matches: ${candidates.map((candidate) => candidate.name).join(', ')}`);
  }
}

async function resolveTarget(
  catalog: WarehouseCatalogService,
  connectionName: string,
  target: EntityDetailsTarget,
): Promise<{ resolved: (KtxTableRef & { column?: string }) | null; candidates: KtxTableRef[] }> {
  if ('display' in target) {
    return catalog.resolveDisplayTarget(connectionName, target.display);
  }
  // Structured targets keep their exact identity; the display lookup only
  // supplies candidates for the not-found message if the table is missing.
  const candidateResolution = await catalog.resolveDisplayTarget(connectionName, targetLabel(target));
  return {
    resolved: {
      catalog: target.catalog,
      db: target.db,
      name: target.name,
      column: target.column,
    },
    candidates: candidateResolution.candidates,
  };
}
```
Then replace the `const resolution = ...` block inside the `for (const target of input.targets)` loop with:
```ts
const resolution = await resolveTarget(catalog, input.connectionName, target);
```
Replace the missing-resolution block with:
```ts
if (!resolution.resolved) {
  missing.push({ target, candidates: resolution.candidates });
  appendMissingTargetMarkdown(parts, target, resolution.candidates);
  continue;
}
```
Replace the missing-detail block with:
```ts
if (!detail) {
  missing.push({ target, candidates: resolution.candidates });
  appendMissingTargetMarkdown(parts, target, resolution.candidates);
  continue;
}
```
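To see the intended behavior end to end without the real `WarehouseCatalogService`, here is a self-contained sketch. An in-memory table list stands in for the scan, and candidates are ranked by Levenshtein edit distance, which the design spec names for unresolved displays but which may differ from the shipped ranking. `reportStructuredTarget` and the sample tables are hypothetical, and column targets are not handled.

```typescript
type KtxTableRef = { catalog: string | null; db: string | null; name: string };
type StructuredTarget = { catalog: string | null; db: string; name: string; column?: string };

// Same label rule as targetLabel in the plan: drop empty parts, join with dots.
function targetLabel(target: StructuredTarget): string {
  return [target.catalog, target.db, target.name, target.column]
    .filter((part): part is string => !!part)
    .join(".");
}

// Levenshtein distance using a single rolling row.
function editDistance(a: string, b: string): number {
  const row: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // distance(a[0..i-1], b[0..j-1]) for the current j
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      row[j] = Math.min(row[j] + 1, row[j - 1] + 1, diagonal + (a[i - 1] === b[j - 1] ? 0 : 1));
      diagonal = above;
    }
  }
  return row[b.length];
}

// Hypothetical scan contents.
const scannedTables: KtxTableRef[] = [
  { catalog: null, db: "public", name: "orders" },
  { catalog: null, db: "public", name: "customers" },
];

// Markdown lines the model would see for one structured table target.
function reportStructuredTarget(target: StructuredTarget): string[] {
  const exact = scannedTables.find(
    (t) => t.catalog === target.catalog && t.db === target.db && t.name === target.name,
  );
  if (exact) {
    return [`### ${targetLabel(target)}`];
  }
  const candidates = [...scannedTables]
    .sort((x, y) => editDistance(x.name, target.name) - editDistance(y.name, target.name))
    .slice(0, 5);
  const parts = [`Not found in scan: ${targetLabel(target)}`];
  if (candidates.length > 0) {
    parts.push(`Closest matches: ${candidates.map((c) => c.name).join(", ")}`);
  }
  return parts;
}

console.log(reportStructuredTarget({ catalog: null, db: "public", name: "orderz" }).join("\n"));
```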
- [ ] **Step 4: Run the focused entity-details tests**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/entity-details.tool.test.ts
```
Expected: PASS.
- [ ] **Step 5: Run warehouse verification regression tests**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts src/ingest/tools/warehouse-verification/entity-details.tool.test.ts src/ingest/tools/warehouse-verification/discover-data.tool.test.ts
```
Expected: PASS.
- [ ] **Step 6: Run context type-check**
Run:
```bash
pnpm --filter @ktx/context run type-check
```
Expected: PASS.
- [ ] **Step 7: Commit**
Run:
```bash
git add \
packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts \
packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts
git commit -m "fix(context): report structured entity detail misses"
```
## Self-review
Spec coverage:
- The original `entity_details` contract allows mixed structured and display
targets and requires unresolved targets to produce `Not found in scan` with
candidates. This plan adds that model-visible behavior for structured table
misses and preserves the existing column-miss behavior.
Placeholder scan:
- This plan contains no deferred implementation placeholders.
Type consistency:
- The plan uses the existing `WarehouseCatalogService`, `KtxTableRef`,
`EntityDetailsStructured`, and `ToolOutput` types without adding public API
compatibility wrappers.

@ -0,0 +1,331 @@
# Warehouse Verification Tools for Ingestion Synthesis
**Date:** 2026-05-12
**Author:** Andrey Avtomonov
**Status:** Design — pending implementation plan
## Background and motivation
KTX's ingest pipeline synthesises wiki pages and semantic-layer (SL) sources from third-party content (Notion, LookML, Looker, Metabase, dbt, MetricFlow, historic SQL, live-database scans, and chat). The synthesis stage is an LLM call that runs once per WorkUnit, governed by a skill prompt (e.g. `notion_synthesize`) and a set of allowed tools.
A real-world inspection (project `/tmp/ktx-proj-1`) surfaced two failure modes the synthesis stage produces:
1. **Fictional identifiers laundered into wiki output.** A Notion page mentioned `orbit_analytics.customer` as a legacy "customer source" table with a `plan_tier in {free, pro, enterprise}` column. Neither the table, the column, nor those values exist in the configured warehouse. The synthesis LLM faithfully copied them into `knowledge/global/orbit/customers-source.md` as a "Conflict Note", giving the fabricated names full wiki frontmatter, a `Source:` citation, and apparent authority.
2. **Column attribution drift.** The same wiki page documents columns under `orbit_raw.accounts` but states the `paying_account_count` measure filters on `normalized_plan_code` and `contract_status`. Those columns live on `orbit_analytics.mart_account_segments`, not on `accounts`. A reader (or a downstream agent) following the page will write `accounts.normalized_plan_code` and get a `column does not exist` error.
Root cause analysis (`packages/context/skills/notion_synthesize/SKILL.md`, `packages/context/src/ingest/tools/emit-unmapped-fallback.tool.ts`, `packages/context/src/wiki/tools/wiki-write.tool.ts`) showed three contributing factors:
- The synthesis LLM has no verification primitive that distinguishes a real warehouse identifier from a fabricated one. `sl_discover` only finds objects already promoted into the semantic layer; raw warehouse scans (which already exist on disk under `raw-sources/<conn>/live-database/<sync>/`) are not surfaced to the LLM at all.
- `wiki_write` performs no body-text validation — anything the LLM emits is written.
- The skill prompt itself uses `orbit_analytics.customer` as a canonical example string (`SKILL.md:70`), reinforcing the same fictional name the LLM ends up emitting.
Kaelio's server-side ingest WU agent (`/Users/andrey/conductor/workspaces/kaelio-main2/douala/server/src/tools/toolset-factory.service.ts`) had four verification tools that KTX dropped during the open-source extraction: `discover_data`, `entity_details`, `dictionary_search`, and `sql_execution`. The underlying connector infrastructure (`KtxScanConnector`, dialect classes, `assertReadOnlySql`, `SemanticLayerService.executeQuery`) is present in KTX, so the gap is at the tool layer, not the platform layer.
## Goal
Give every ingest adapter's synthesis-time LLM call the tools and skill-prompt instructions needed to verify warehouse identifiers (`schema.table`, `schema.table.column`) and sample values before emitting them into wiki pages, SL sources, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback` records.
## Non-goals
- Not changing `wiki_write` itself. A complementary spec covers hard write-time validation; this spec focuses on giving the LLM the tools to self-validate.
- Not modifying any Notion fetch/chunk/cluster behaviour.
- Not changing the `_schema/*.yaml` format.
- Not introducing a UUID layer for tables or columns; KTX keeps `(connection, catalog, db, name)` as the canonical table identity.
- Not adding `semantic_query` to the synthesis toolset. `semantic_query` is a future tool for the research/chat-time agent; synthesis creates SL sources rather than querying them, so it is the wrong shape for this stage.
- Not adding `dictionary_search`. `entity_details` already returns per-column `sampleValues` from the relationship-profile, and `sql_execution` covers the rarer "where does this literal live?" case more accurately than a sampled-JSON full-text scan.
## What already exists in KTX
The dialect/driver/connection architecture is fully ported from Kaelio. The new tools sit on top of these already-shipping primitives:
| Primitive | Location |
|---|---|
| `KtxTableRef = { catalog: string\|null, db: string\|null, name: string }` | `packages/context/src/scan/types.ts:168` |
| `SemanticLayerService.executeQuery(connectionId, sql)` | `packages/context/src/sl/semantic-layer.service.ts:1004`, used today by `sl_validate` |
| `assertReadOnlySql` / `limitSqlForExecution` | `packages/context/src/connections/read-only-sql.ts` |
| 7 connectors with parallel layout (postgres, mysql, sqlserver, snowflake, bigquery, clickhouse, sqlite), each exporting a dialect class | `packages/connector-*` |
| Raw scan artefacts: `tables/<base64(catalog??'_')>.<base64(db)>.<base64(name)>.json` and `enrichment/relationship-profile.json` (with `nativeType`, `nullable`, `primaryKey`, `foreignKeys`, `rowCount`, `nullCount`, `distinctCount`, `sampleValues`, descriptions) | `raw-sources/<connectionId>/live-database/<latest-sync>/` |
| `wiki_search`, `sl_discover`, `sl_read_source`, `sl_validate`, `emit_unmapped_fallback` | already wired into synthesis stages |
The only meaningfully new code is `WarehouseCatalogService`, a small `getDialectForDriver` dispatch, the three tool files, and the wiring in `ingest-bundle.runner.ts`.
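The `tables/` file naming in the raw scan artefacts row can be illustrated with a short sketch. It assumes standard base64 (as produced by Node's `Buffer`) applied to UTF-8 names, and it assumes a null `db` falls back to `'_'` the same way a null catalog does; the table only states the fallback for `catalog`. Standard base64 can also emit `/`, so the real implementation may use a URL-safe variant. `scanTableFileName` is hypothetical.

```typescript
type KtxTableRef = { catalog: string | null; db: string | null; name: string };

function b64(value: string): string {
  return Buffer.from(value, "utf8").toString("base64");
}

// Hypothetical: tables/<base64(catalog ?? '_')>.<base64(db ?? '_')>.<base64(name)>.json
function scanTableFileName(ref: KtxTableRef): string {
  return `${b64(ref.catalog ?? "_")}.${b64(ref.db ?? "_")}.${b64(ref.name)}.json`;
}

console.log(scanTableFileName({ catalog: null, db: "public", name: "orders" }));
```

Because the base64 alphabet never contains `.`, the encoded segments split back apart unambiguously even when a schema or table name itself contains dots.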
## Architecture
### Module layout
```
packages/context/src/ingest/tools/warehouse-verification/
  discover-data.tool.ts
  entity-details.tool.ts
  sql-execution.tool.ts
  warehouse-catalog.service.ts
  index.ts                     # exports createWarehouseVerificationTools()
packages/context/src/connections/
  dialects.ts                  # adds getDialectForDriver()
packages/context/skills/_shared/
  identifier-verification.md   # the protocol snippet referenced from every synthesis skill
### Canonical table identity
Every tool that names a warehouse object uses the tuple `(connectionName, catalog, db, name[, column])`. `connectionName` is the slug from `ktx.yaml` (e.g., `"warehouse"`), validated against `^[a-zA-Z0-9][a-zA-Z0-9_-]*$`. There is no UUID layer.
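A minimal sketch of validating the slug and turning the identity tuple into a single string, for example as a per-WorkUnit cache key. The regex is the one stated above; `identityKey` and the choice of `JSON.stringify` are assumptions. Serializing the tuple as a JSON array keeps `null` distinct from the string `"null"` and is unaffected by dots inside names, which a naive `parts.join('.')` would not be.

```typescript
type KtxTableRef = { catalog: string | null; db: string | null; name: string };

// Slug rule for connection names from ktx.yaml, as stated in the spec.
const CONNECTION_NAME_PATTERN = /^[a-zA-Z0-9][a-zA-Z0-9_-]*$/;

function assertConnectionName(name: string): void {
  if (!CONNECTION_NAME_PATTERN.test(name)) {
    throw new Error(`Invalid connection name: ${JSON.stringify(name)}`);
  }
}

// Hypothetical helper: one stable string per (connectionName, catalog, db, name[, column]).
function identityKey(connectionName: string, ref: KtxTableRef, column?: string): string {
  assertConnectionName(connectionName);
  return JSON.stringify([connectionName, ref.catalog, ref.db, ref.name, column ?? null]);
}

console.log(identityKey("warehouse", { catalog: null, db: "orbit_raw", name: "accounts" }));
```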
`display` strings the LLM picks up from source pages (e.g., `"orbit_raw.accounts"` for Postgres or `"project.dataset.table"` for BigQuery) are parsed by `WarehouseCatalogService.resolveDisplay`, which uses `getDialectForDriver` to apply the parsing rules for the connection's driver. Ambiguous parses (e.g., a 2-part display on BigQuery) return a candidates list instead of guessing.
Dialect mapping:
| Driver | catalog | db | name | Display |
|---|---|---|---|---|
| postgres | `null` | schema | table | `schema.table` |
| mysql | `null` | schema | table | `schema.table` |
| sqlserver | catalog | schema | table | `catalog.schema.table` |
| snowflake | database | schema | table | `db.schema.table` |
| bigquery | project | dataset | table | `project.dataset.table` |
| clickhouse | `null` | database | table | `database.table` |
| sqlite | `null` | `null` | table | `table` |
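The mapping fixes how many dot-separated parts a table display should have for each driver. A minimal sketch of that parse, which returns `null` rather than guessing when the part count does not match, so the caller can fall back to a candidates list. `parseTableDisplay` is hypothetical, and it ignores quoted identifiers that contain dots as well as column-qualified displays such as `orbit_raw.accounts.account_id`.

```typescript
type Driver = "postgres" | "mysql" | "sqlserver" | "snowflake" | "bigquery" | "clickhouse" | "sqlite";
type KtxTableRef = { catalog: string | null; db: string | null; name: string };

// Returns null for any display whose shape does not match the driver's table form.
function parseTableDisplay(driver: Driver, display: string): KtxTableRef | null {
  const parts = display.split(".");
  if (parts.some((part) => part.length === 0)) {
    return null;
  }
  switch (driver) {
    case "postgres":
    case "mysql":
    case "clickhouse":
      return parts.length === 2 ? { catalog: null, db: parts[0], name: parts[1] } : null;
    case "sqlserver":
    case "snowflake":
    case "bigquery":
      return parts.length === 3 ? { catalog: parts[0], db: parts[1], name: parts[2] } : null;
    case "sqlite":
      return parts.length === 1 ? { catalog: null, db: null, name: parts[0] } : null;
    default:
      return null;
  }
}

console.log(parseTableDisplay("postgres", "orbit_raw.accounts"));
console.log(parseTableDisplay("bigquery", "dataset.table")); // ambiguous: null
```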
### `WarehouseCatalogService`
Stateless except for a per-WorkUnit cache. Reads raw scan files under `raw-sources/<connectionName>/live-database/<latest-sync>/`.
```ts
class WarehouseCatalogService {
  getTable(ref: { connectionName: string } & KtxTableRef): Promise<TableDetail | null>;
  listTables(connectionName: string): Promise<KtxTableRef[]>;
  resolveDisplay(connectionName: string, display: string): Promise<{
    resolved: KtxTableRef | null;
    candidates: KtxTableRef[]; // ranked by edit distance when resolved is null
    dialect: string;
  }>;
  searchByName(connectionName: string, query: string, limit: number): Promise<Array<
    | { kind: 'table'; ref: KtxTableRef; matchedOn: 'name' | 'db' | 'comment' | 'description' }
    | { kind: 'column'; ref: KtxTableRef & { column: string }; matchedOn: 'name' | 'comment' | 'description' }
  >>;
  getLatestSyncId(connectionName: string): Promise<string | null>;
}
```
`getTable` merges the raw schema file (native types, PK, FK, nullable) with the enrichment profile (row counts, null rates, distinct counts, sample values, AI-generated descriptions). When no scan exists for the connection, every read comes back empty (`null` for single-object lookups). Tools surface this as a distinct "no scan available" state rather than as "identifier not found", so the LLM doesn't conclude a real table is fictional just because a scan hasn't run yet.
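The sketch below shows one way a tool could merge the two files and keep "no scan" apart from "not found". It assumes the caller has already read `getLatestSyncId` and parsed the scan files. `RawTable`, `TableProfile`, `MergedTable`, and `lookupTable` are hypothetical stand-ins for the real `TableDetail` and scan-file shapes.

```typescript
// Hypothetical shapes; the real scan-file formats may differ.
interface RawColumn { name: string; nativeType: string; nullable: boolean }
interface RawTable { db: string | null; name: string; primaryKey: string[]; columns: RawColumn[] }
interface ColumnProfile { nullRate?: number; distinctCount?: number; sampleValues?: string[]; description?: string }
interface TableProfile { rowCount?: number; columns: Record<string, ColumnProfile> }

type MergedColumn = RawColumn & ColumnProfile;
interface MergedTable {
  db: string | null;
  name: string;
  primaryKey: string[];
  rowCount?: number;
  columns: MergedColumn[];
}

type Lookup =
  | { status: 'no_scan' }
  | { status: 'not_found' }
  | { status: 'found'; table: MergedTable };

function lookupTable(
  latestSyncId: string | null,
  raw: RawTable | null,
  profile: TableProfile | null,
): Lookup {
  // No sync directory at all: the connection has never been scanned.
  if (latestSyncId === null) return { status: 'no_scan' };
  // A scan exists but contains no file for this table.
  if (raw === null) return { status: 'not_found' };
  // Overlay per-column profile stats onto the raw schema columns.
  const columns: MergedColumn[] = raw.columns.map((c) => ({
    ...c,
    ...(profile?.columns[c.name] ?? {}),
  }));
  return {
    status: 'found',
    table: { db: raw.db, name: raw.name, primaryKey: raw.primaryKey, rowCount: profile?.rowCount, columns },
  };
}
```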
### `getDialectForDriver`
```ts
// packages/context/src/connections/dialects.ts
export type SupportedDriver = 'postgres'|'postgresql'|'mysql'|'sqlserver'|'snowflake'|'bigquery'|'clickhouse'|'sqlite'|'sqlite3';
export function getDialectForDriver(driver: SupportedDriver): KtxDialect;
```
`getDialectForDriver` is a synchronous dispatch. The connectors' existing dialect classes already expose the same shape: `formatTableName(KtxTableRef)`, `quoteIdentifier(string)`, and `mapToDimensionType(nativeType)`. The implementation plan introduces a minimal `KtxDialect` interface that these classes already satisfy structurally, so no connector-internal changes are required. Tools use it only for display-string parsing and error-message formatting; they never construct executable SQL.
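The following sketches the structural interface and the dispatch under several assumptions. `makeDialect` and the per-driver quote characters are hypothetical, and `mapToDimensionType` is omitted. The parameter is widened to `string` so that the unknown-driver error from the unit-test plan can be shown. Quote escaping doubles the closing character, which matches Postgres, MySQL, and SQL Server conventions but may not match every driver's rules.

```typescript
type KtxTableRef = { catalog: string | null; db: string | null; name: string };

// Minimal structural interface (sketch; mapToDimensionType omitted).
interface KtxDialect {
  quoteIdentifier(id: string): string;
  formatTableName(ref: KtxTableRef): string;
}

const SUPPORTED = [
  'postgres', 'postgresql', 'mysql', 'sqlserver', 'snowflake',
  'bigquery', 'clickhouse', 'sqlite', 'sqlite3',
] as const;
type SupportedDriver = (typeof SUPPORTED)[number];

// Hypothetical factory; real connector dialects carry more rules than a quote pair.
function makeDialect(open: string, close: string): KtxDialect {
  return {
    quoteIdentifier: (id) => open + id.split(close).join(close + close) + close,
    // Display form: non-null parts joined with dots, matching the mapping table.
    formatTableName: (ref) =>
      [ref.catalog, ref.db, ref.name].filter((p): p is string => p !== null).join('.'),
  };
}

const DIALECTS: Record<SupportedDriver, KtxDialect> = {
  postgres: makeDialect('"', '"'),
  postgresql: makeDialect('"', '"'),
  mysql: makeDialect('`', '`'),
  sqlserver: makeDialect('[', ']'),
  snowflake: makeDialect('"', '"'),
  bigquery: makeDialect('`', '`'),
  clickhouse: makeDialect('`', '`'),
  sqlite: makeDialect('"', '"'),
  sqlite3: makeDialect('"', '"'),
};

function getDialectForDriver(driver: string): KtxDialect {
  if (!(SUPPORTED as readonly string[]).includes(driver)) {
    throw new Error(`Unsupported driver "${driver}". Supported drivers: ${SUPPORTED.join(', ')}`);
  }
  return DIALECTS[driver as SupportedDriver];
}
```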
## Tool contracts
### `entity_details`
```ts
input = {
connectionName: string,
targets: Array< // 1..50, mixed shapes allowed
| { display: string } // "orbit_raw.accounts" or "orbit_raw.accounts.account_id"
| { catalog: string|null, db: string, name: string, column?: string }
>,
}
```
Output (markdown, per target):
```
### orbit_raw.accounts
Type: table | Native columns: 11 | PK: account_id | FKs: parent_account_id → orbit_raw.accounts.account_id
Description: One row per customer account…
Columns:
- account_id (text, nullable=false, PK) — sample: ["acct_001","acct_002",…]
- parent_account_id (text, nullable=true, FK → orbit_raw.accounts.account_id)
- account_name (text, nullable=false)
- …
Profile: rowCount=4321 distinctCount(account_id)=4321 nullRate(parent_account_id)=0.62
```
When `column` is provided in a target, output is scoped to that one column. When a target doesn't resolve, output is `Not found in scan. Closest matches: …` with up to 5 candidates from `searchByName`. When the connection has no `live-database` scan, output is ``No live-database scan available for connection "<name>"; run `ktx scan` first.``, which is distinct from the "not found" state.
Structured output: `{ resolved: TableDetail[], missing: Array<{target, candidates}>, scanAvailable: boolean }`.
Refuses `connectionName` values not in the WU-stage's `allowedConnectionNames` set.
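The three per-target messages and the connection scope check could be sketched as below. `TargetOutcome`, `renderOutcome`, and `assertAllowedConnection` are hypothetical. The sketch throws on an out-of-scope connection, although the real tool may return an error result instead.

```typescript
// Hypothetical per-target outcome; resolution happens elsewhere.
type TargetOutcome =
  | { kind: 'resolved'; markdown: string }
  | { kind: 'missing'; candidates: string[] }
  | { kind: 'no_scan' };

function assertAllowedConnection(name: string, allowed: ReadonlySet<string>): void {
  if (!allowed.has(name)) {
    throw new Error(`connectionName "${name}" is not in allowedConnectionNames for this stage`);
  }
}

// "No scan" and "not found" get different wording so the LLM can tell them apart.
function renderOutcome(connectionName: string, outcome: TargetOutcome): string {
  switch (outcome.kind) {
    case 'resolved':
      return outcome.markdown;
    case 'missing':
      return `Not found in scan. Closest matches: ${outcome.candidates.slice(0, 5).join(', ') || '(none)'}`;
    case 'no_scan':
      return `No live-database scan available for connection "${connectionName}"; run \`ktx scan\` first.`;
  }
}
```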
### `sql_execution`
```ts
input = {
connectionName: string,
sql: string, // single SELECT or WITH only
rowLimit?: number, // default 100, hard cap 1000
}
```
Pipeline:
1. `assertReadOnlySql(sql)` — regex rejects anything starting with `insert|update|delete|merge|alter|drop|create|truncate|grant|revoke|copy|call|do|vacuum|analyze|refresh`.
2. `limitSqlForExecution(sql, rowLimit)` — wraps as `select * from (<llm_sql>) as ktx_query_result limit N`.
3. `SemanticLayerService.executeQuery(connectionName, wrappedSql)`.
4. Format as markdown table; first ~20 rows inline; if truncated, append `… +N more rows`.
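Here is a hedged sketch of pipeline steps 1 and 2. Only the function names come from the spec; the bodies, the row-limit clamping, and the trailing-semicolon stripping are assumptions. A prefix regex is easy to sidestep (for example with a leading SQL comment), which is why the driver-level read-only enforcement described below still matters.

```typescript
// Step 1: reject queries whose first keyword is a mutating verb from the spec's list.
const MUTATING_PREFIX =
  /^\s*(insert|update|delete|merge|alter|drop|create|truncate|grant|revoke|copy|call|do|vacuum|analyze|refresh)\b/i;

function assertReadOnlySql(sql: string): void {
  if (MUTATING_PREFIX.test(sql)) {
    throw new Error('sql_execution accepts a single read-only SELECT or WITH query.');
  }
}

const DEFAULT_ROW_LIMIT = 100;
const MAX_ROW_LIMIT = 1000;

// Step 2: wrap the query so the connector never returns more than N rows.
function limitSqlForExecution(sql: string, rowLimit: number = DEFAULT_ROW_LIMIT): string {
  const requested = Number.isFinite(rowLimit) ? Math.floor(rowLimit) : DEFAULT_ROW_LIMIT;
  const n = Math.min(Math.max(requested, 1), MAX_ROW_LIMIT);
  // Assumption: strip trailing semicolons, which are invalid inside a subquery.
  const inner = sql.trim().replace(/;+\s*$/, '');
  // Newlines keep a trailing `-- comment` in the LLM SQL from swallowing the wrapper.
  return `select * from (\n${inner}\n) as ktx_query_result limit ${n}`;
}
```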
Structured output: `{ headers, rows, rowCount, truncated, sql, wrappedSql }`.
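Step 4's formatting is sketched here with a hypothetical `toMarkdownTable` helper. It assumes a fixed 20-row inline cut-off and renders SQL `NULL` as the literal text `NULL`. It also escapes pipe characters so that cell values cannot break the table.

```typescript
const INLINE_ROWS = 20;

function toMarkdownTable(headers: string[], rows: unknown[][], totalRowCount: number = rows.length): string {
  // Render one cell: NULL for missing values, pipes escaped, newlines flattened.
  const cell = (v: unknown): string =>
    (v === null || v === undefined ? 'NULL' : String(v)).replace(/\|/g, '\\|').replace(/\r?\n/g, ' ');
  const lines = [
    `| ${headers.map(cell).join(' | ')} |`,
    `| ${headers.map(() => '---').join(' | ')} |`,
    ...rows.slice(0, INLINE_ROWS).map((r) => `| ${r.map(cell).join(' | ')} |`),
  ];
  // Count rows that were fetched (or reported) but not shown inline.
  const hidden = totalRowCount - Math.min(rows.length, INLINE_ROWS);
  if (hidden > 0) lines.push(`… +${hidden} more rows`);
  return lines.join('\n');
}
```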
Connector errors surface verbatim (e.g., Postgres `relation "orbit_analytics.customer" does not exist`). That error message is the most valuable verification signal — it tells the LLM the identifier is fictional.
Refuses `connectionName` not in `allowedConnectionNames`. Each connector's driver-level read-only enforcement (Postgres read-only transaction, BigQuery query-only jobs) is a second defence under the regex gate.
### `discover_data`
```ts
input = {
query: string,
connectionName?: string, // omit to search all configured warehouse connections
limit?: number, // default 10 per section
sourceName?: string, // SL source detail mode (delegates to sl_discover)
}
```
Composes three searches and groups output into three sections, omitting empty sections:
1. **Wiki Pages**: `wiki_search({query, limit})`. Routing hint: *use `wiki_read(blockKey)` for full content*.
2. **Semantic Layer Sources**: `sl_discover({query, connectionName})`. Routing hint: *use `sl_read_source(sourceName)` for the YAML, or `entity_details` for warehouse-shape details*.
3. **Raw Warehouse Schema**: `WarehouseCatalogService.searchByName(connectionName, query, limit)`. Routing hint: *use `entity_details({connectionName, targets: [{display}]})` for full DDL + sample values*.
When `sourceName` is set, delegates entirely to `sl_discover` inspect mode and skips other sections. When all three sections are empty, output is `No matches for "<query>" across wiki, semantic layer, or raw warehouse schema. Try broader terms; this concept may not exist yet.`
Structured output: `{ wiki: WikiSearchStructured|null, sl: SlDiscoverStructured|null, raw: RawSchemaHits|null }`.
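The section composition and the all-empty message could look roughly like this. The search callbacks are injected, so `composeDiscover` and its section format are hypothetical. They make no claim about the real `wiki_search` or `sl_discover` signatures. Running the searches concurrently is also an assumption.

```typescript
interface SectionSpec {
  title: string;
  hint: string;
  run: (query: string) => Promise<string[]>;
}

async function composeDiscover(query: string, searches: SectionSpec[]): Promise<string> {
  // Run every search, then keep only sections that produced hits.
  const sections = await Promise.all(
    searches.map(async (s) => ({ title: s.title, hint: s.hint, hits: await s.run(query) })),
  );
  const nonEmpty = sections.filter((s) => s.hits.length > 0);
  if (nonEmpty.length === 0) {
    return `No matches for "${query}" across wiki, semantic layer, or raw warehouse schema. Try broader terms; this concept may not exist yet.`;
  }
  return nonEmpty
    .map((s) => [`## ${s.title}`, ...s.hits.map((h) => `- ${h}`), `_${s.hint}_`].join('\n'))
    .join('\n\n');
}
```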
## Wiring
`packages/context/src/ingest/ingest-bundle.runner.ts` already plumbs `emit_unmapped_fallback` into both the WorkUnit stage (`createEmitUnmappedFallbackTool` around line 726) and the reconcile stage (around line 962), with merging done via `packages/context/src/ingest/stages/build-wu-context.ts` and `build-reconcile-context.ts`.
Add a parallel factory next to those existing calls:
```ts
const warehouseTools = createWarehouseVerificationTools({
semanticLayerService: scopedSemanticLayerService,
warehouseCatalog: new WarehouseCatalogService({ fileStore, projectDir }),
dialects: getDialectForDriver,
allowedConnectionNames: slConnectionIds, // reuse existing scoping
sqlExecutionRowLimit: 100,
});
// Merge `entity_details`, `sql_execution`, `discover_data` into both stage tool maps
// alongside emit_unmapped_fallback.
```
`createWarehouseVerificationTools` returns `Record<string, Tool>` with three keys. The set is wired into every adapter's synthesis stage — no per-adapter opt-in.
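Here is a sketch of merging the returned record into an existing stage tool map. `Tool` here is a hypothetical minimal shape. Failing loudly on a name collision is an assumed safeguard, not something the spec requires.

```typescript
// Hypothetical minimal tool shape; the real type comes from the agent runtime.
type Tool = { description: string; execute: (input: unknown) => Promise<unknown> };

function mergeToolMaps(base: Record<string, Tool>, extra: Record<string, Tool>): Record<string, Tool> {
  for (const name of Object.keys(extra)) {
    // Refuse to silently shadow an existing tool such as emit_unmapped_fallback.
    if (Object.prototype.hasOwnProperty.call(base, name)) {
      throw new Error(`Tool name collision: ${name}`);
    }
  }
  return { ...base, ...extra };
}
```

The same call would run once for the WorkUnit-stage map and once for the reconcile-stage map.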
## Skill-prompt updates
### Shared protocol
`packages/context/skills/_shared/identifier-verification.md`:
```md
## Identifier Verification Protocol
Before writing a wiki page or SL source on any topic:
1. `discover_data({query: "<topic>"})` — see what wikis, SL sources, and raw tables
already exist. Prefer updating existing pages over creating new ones.
Before emitting any `schema.table` or `schema.table.column` into a wiki body,
SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
2. Call `entity_details({connectionName, targets: [{display: "<identifier>"}]})` to
confirm the identifier resolves; inspect native types, FK/PK, and sampleValues.
3. For literal values from the source (status codes, plan tiers): check whether
they appear in `entity_details`' `sampleValues` for the relevant column.
If `sampleValues` is short or you suspect the sample missed real values, run
a `sql_execution` probe: `SELECT DISTINCT <col> FROM <ref> LIMIT 50`.
4. If the candidate identifier still doesn't resolve, do one of:
(a) Use `sql_execution` with `SELECT 1 FROM <ref> LIMIT 0`. If it errors,
the identifier is fictional.
(b) Wrap the identifier in `[unverified — from <rawPath>]` in the wiki body,
citing the exact raw path that mentioned it.
(c) When recording `emit_unmapped_fallback` with `no_physical_table`,
include the failing probe error in `clarification`.
5. Never copy `<schema>.<table>` placeholder strings from these instructions
into output.
```
Each affected skill inlines this block verbatim (skill files are independent prompts; KTX has no cross-skill include mechanism today).
### Per-skill diffs
Two skills are deliberately excluded from updates: `ingest_triage` (read-only triage; produces no wiki or SL output) and `sl` (umbrella reference doc; cross-links to the protocol but doesn't need its own copy).
| Skill | Changes |
|---|---|
| `notion_synthesize` | Inline protocol; append `discover_data`, `entity_details`, `sql_execution` to `Allowed:` (line 74); replace `orbit_analytics.customer` example on line 70 with `<schema>.<table>` |
| `dbt_ingest` | Inline protocol; line 24: replace `wiki_sl_search` with `discover_data` and `sl_describe_table` with `entity_details`; strengthen the "not permission to invent physical columns" paragraph by naming `entity_details` as the verification call |
| `lookml_ingest` | Inline protocol; add: "Verify each `sql_table_name` from the LookML view with `entity_details` before mapping to an SL source" |
| `looker_ingest` | Inline protocol; add: "For every Looker field reference, call `entity_details` on the underlying `(schema, table, column)` before promoting to `sl_refs` or quoting in wiki body" |
| `metabase_ingest` | Inline protocol; add: "Before writing a wiki page derived from a Metabase question's SQL, verify each `schema.table.column` mentioned with `entity_details`" |
| `metricflow_ingest` | Inline protocol; add: "Verify each MetricFlow model's source table with `entity_details` before producing the corresponding `sl_write_source`" |
| `live_database_ingest` | Inline protocol; add: "Sample values come from the scan record; do not invent values not present in `relationship-profile.json`" |
| `historic_sql_table_digest` | Shortened protocol focused on column attribution: "Only mention columns visible in the table's scan record. Use `entity_details({display})` if uncertain" |
| `historic_sql_patterns` | Inline protocol; add: "Every join column mentioned in pattern descriptions must be verified via `entity_details` for both sides of the join" |
| `knowledge_capture` | Inline protocol; update line 44: "First call `discover_data` to find existing wiki pages, SL sources, and raw tables on the topic" |
| `sl_capture` | Inline protocol; add: "Before `sl_write_source`, call `entity_details` on the target table to confirm column names and types match the YAML being written" |
### Cleanups beyond the three-tool addition
- `notion_synthesize/SKILL.md:70`: replace `orbit_analytics.customer` with the `<schema>.<table>` placeholder.
- `packages/context/src/ingest/tools/emit-unmapped-fallback.tool.ts:67` — same example string in the Zod `.describe()` — replace with `<schema>.<table>`.
- `dbt_ingest/SKILL.md:24` — fix `wiki_sl_search` and `sl_describe_table` (neither tool exists in KTX).
- `packages/context/src/sl/tools/sl-warehouse-validation.ts:93` — inline error message references the non-existent `sl_describe_table`. Replace with `sl_read_source`.
## Testing strategy
### Unit tests
| Component | Tests |
|---|---|
| `getDialectForDriver` | Every supported driver returns a dialect; unknown driver throws with a clear list of supported drivers |
| `WarehouseCatalogService.getTable` | Reads and merges `tables/<b64>.json` and `relationship-profile.json`; returns `null` when no sync exists; returns `null` for unknown `(catalog, db, name)` |
| `WarehouseCatalogService.resolveDisplay` | Postgres 2-part display → `{catalog: null, db, name}`; BigQuery 3-part display → `{catalog, db, name}`; ambiguous 2-part on BigQuery returns candidates list; unknown displays produce closest-match candidates ordered by edit distance |
| `WarehouseCatalogService.searchByName` | Substring and token match; tiers (exact-name → token-match) ordered correctly; cache hit on second call within same instance |
| `entity_details` | Resolves `{display}` and structured inputs; reports "Not found" with candidates for unknown ref; reports "no scan available" distinctly when scan dir missing; truncates above 50 targets |
| `discover_data` | Three sections present when all three have hits; sections omitted when empty; `sourceName` inspect mode delegates to `sl_discover` and skips other sections; `allowedConnectionNames` scope honoured |
| `sql_execution` | `assertReadOnlySql` rejects each mutating verb; row-limit wrap visible in `wrappedSql`; connector errors surface verbatim with the failing SQL; rejects `connectionName` not in `allowedConnectionNames` |
### Integration tests
- Extend `packages/context/src/ingest/ingest-bundle.runner.test.ts` to verify the three new tools are present in both WU-stage and reconcile-stage tool maps and refuse out-of-scope `connectionName` values.
- New fixture-based test: stage a small `raw-sources/<conn>/live-database/<sync>/` directory with 2 tables + 1 enrichment profile, then call each tool through the runner's tool map and assert the markdown contains the expected fields. Uses the same fake-LLM harness as `notion.adapter.test.ts`.
- One end-to-end regression test reproducing the `orbit_analytics.customer` hallucination: a fake Notion page mentioning the fictional table is fed to the synthesis stage; the run produces a wiki page where the fictional name is wrapped in `[unverified — …]` or omitted, not promoted to `tables:` frontmatter.
### Prompt-bundling tests
Extend `packages/context/src/memory/memory-runtime-assets.test.ts`:
- Every skill in the synthesis-writers list embeds the verification-protocol block (assert by stable header text).
- Every such skill lists the three new tools when it has a `## Tools / Allowed` section, or mentions them inline in a workflow step otherwise.
- No skill file contains any of the banned strings: `orbit_analytics.customer`, `wiki_sl_search`, `sl_describe_table`.
### Performance guards
`WarehouseCatalogService` caches the per-connection table list per stage (one WorkUnit's lifetime). Tests assert that the second call is a cache hit. There is no DB index for `searchByName` in this iteration; a linear scan over scan artefacts is acceptable up to ~50K columns. If volume warrants it later, a follow-up PR can add a SQLite FTS index.
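Below is a minimal sketch of the per-WorkUnit cache. It assumes one instance is created per stage and discarded when the WorkUnit ends. `PerWorkUnitCache` and its `hits` counter are hypothetical. Caching the in-flight promise rather than the resolved value means two concurrent calls share one load. A rejected load is evicted so that a later call can retry.

```typescript
class PerWorkUnitCache<T> {
  private readonly entries = new Map<string, Promise<T>>();
  private readonly load: (connectionName: string) => Promise<T>;
  hits = 0; // exposed so tests can assert the second call is a cache hit

  constructor(load: (connectionName: string) => Promise<T>) {
    this.load = load;
  }

  get(connectionName: string): Promise<T> {
    const cached = this.entries.get(connectionName);
    if (cached) {
      this.hits++;
      return cached;
    }
    const pending = this.load(connectionName);
    this.entries.set(connectionName, pending);
    // Do not keep failures around for the rest of the WorkUnit.
    pending.catch(() => {
      this.entries.delete(connectionName);
    });
    return pending;
  }
}
```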
## Rollout
Four mergeable PRs:
| PR | Lands |
|---|---|
| 1 | `getDialectForDriver` + `WarehouseCatalogService` + `entity_details` tool + wiring in `ingest-bundle.runner.ts` + unit/integration tests |
| 2 | `sql_execution` tool + tests + the `orbit_analytics.customer` regression test (which exercises protocol steps 4a/4c) |
| 3 | `discover_data` tool + tests |
| 4 | All 11 skill prompts updated with the verification protocol + the cleanups listed above + extended `memory-runtime-assets.test.ts` |
Skill prompts land last so they can reference the tools that already exist.
## Out of scope
- **Hard write-time validation in `wiki_write` / `emit_unmapped_fallback`.** A complementary spec covers regex-based identifier validation at the write boundary. That is defence in depth and a separate concern.
- **SQLite FTS index for `searchByName`.** Deferred until the linear scan benchmark fails.
- **`raw_schema_search` as a standalone tool.** `discover_data`'s raw section covers the concept-search case.
- **`semantic_query` in the synthesis toolset.** `semantic_query` will exist in KTX for the research/chat-time agent; it is deliberately excluded from synthesis because synthesis creates SL sources rather than querying them.
- **`dictionary_search`.** `entity_details` already returns per-column `sampleValues`; for the rarer "where does this literal live?" case, `sql_execution` is more accurate than a sampled-JSON scan.
- **UUID layer for tables/columns.** KTX deliberately stays string-keyed on `(connection, catalog, db, name)`.


@ -19,7 +19,7 @@ agent:
max_iterations: 20
default_toolset:
- sl_query
- knowledge_search
- wiki_search
- sl_read_source
memory:
auto_commit: true


@ -1,6 +1,7 @@
name: orders
table: public.orders
description: Orders placed through the storefront.
descriptions:
user: Orders placed through the storefront.
grain:
- id
columns:


@ -13,10 +13,8 @@ generated local project.
The managed Python runtime smoke requires `uv` on `PATH`, isolates
`KTX_RUNTIME_ROOT`, verifies `ktx dev runtime status`, runs `ktx sl query --yes` to
install the core runtime from the bundled wheel, checks `ktx dev runtime doctor`,
starts and reuses the managed daemon, stops it, previews a stale runtime with
`ktx dev runtime prune --dry-run`, verifies confirmation is required, and removes
the stale runtime with `ktx dev runtime prune --yes`.
install the core runtime from the bundled wheel, checks `ktx dev runtime status`,
starts and reuses the managed daemon, and stops it.
The artifact manifest contains the public `@kaelio/ktx` npm tarball and the
bundled `kaelio-ktx` runtime wheel. The smoke does not install standalone


@ -95,7 +95,7 @@ note, not a warning.
Run local historic-SQL ingest:
```bash
pnpm run ktx -- dev ingest run --project-dir /tmp/ktx-postgres-historic \
pnpm run ktx -- ingest run --project-dir /tmp/ktx-postgres-historic \
--connection-id warehouse \
--adapter historic-sql \
--plain \
@ -103,7 +103,7 @@ pnpm run ktx -- dev ingest run --project-dir /tmp/ktx-postgres-historic \
--no-input
```
The full `dev ingest run` path also runs curation WorkUnits, so it requires a
The full `ingest run` path also runs curation WorkUnits, so it requires a
configured LLM provider.
Inspect the latest manifest:
@ -127,6 +127,6 @@ table.
- Missing grants: confirm `GRANT pg_read_all_stats TO ktx_reader;`.
- Empty snapshot: rerun `scripts/generate-workload.sh base` and keep
`--historic-sql-min-executions 2` for the smoke.
- SQL-analysis failures: run `pnpm run ktx -- dev runtime doctor` from the KTX
- SQL-analysis failures: run `pnpm run ktx -- dev runtime status` from the KTX
repository root and confirm `uv`, the bundled Python wheel, and the managed
runtime all pass.

knip.json

@ -0,0 +1,114 @@
{
"$schema": "https://unpkg.com/knip@6/schema.json",
"workspaces": {
".": {
"entry": ["scripts/**/*.mjs"],
"project": ["scripts/**/*.mjs"]
},
"packages/cli": {
"entry": [
"src/index.ts",
"src/bin.ts",
"src/**/*.test.ts",
"src/**/*.test.tsx",
"scripts/**/*.mjs"
],
"project": ["src/**/*.{ts,tsx}", "scripts/**/*.mjs", "vitest.config.ts"]
},
"packages/context": {
"entry": [
"src/index.ts",
"src/agent/index.ts",
"src/core/index.ts",
"src/connections/index.ts",
"src/daemon/index.ts",
"src/ingest/index.ts",
"src/ingest/memory-flow/index.ts",
"src/ingest/metabase-mapping.ts",
"src/scan/index.ts",
"src/search/index.ts",
"src/sql-analysis/index.ts",
"src/memory/index.ts",
"src/mcp/index.ts",
"src/project/index.ts",
"src/prompts/index.ts",
"src/skills/index.ts",
"src/sl/index.ts",
"src/sl/descriptions.ts",
"src/tools/index.ts",
"src/wiki/index.ts",
"src/**/*.test.ts",
"scripts/**/*.mjs"
],
"project": ["src/**/*.ts", "scripts/**/*.mjs", "vitest.config.ts"]
},
"packages/llm": {
"entry": ["src/index.ts", "src/**/*.test.ts"],
"project": ["src/**/*.ts", "vitest.config.ts"]
},
"packages/connector-*": {
"entry": ["src/index.ts", "src/**/*.test.ts"],
"project": ["src/**/*.ts"]
},
"docs-site": {
"entry": [
"app/**/*.{ts,tsx}",
"components/**/*.{ts,tsx}",
"lib/**/*.{ts,tsx}",
"middleware.ts",
"next.config.mjs",
"source.config.ts",
"tests/**/*.mjs"
],
"project": [
"app/**/*.{ts,tsx}",
"components/**/*.{ts,tsx}",
"lib/**/*.{ts,tsx}",
"*.ts",
"*.mjs",
"tests/**/*.mjs"
],
"ignoreDependencies": ["tailwindcss"]
}
},
"ignore": [
"**/dist/**",
"**/coverage/**",
"**/.next/**",
"**/node_modules/**",
"**/*.gen.ts",
"**/*.generated.ts"
],
"ignoreIssues": {
"packages/cli/src/clack.ts": ["exports"],
"packages/cli/src/commands/connection-metabase-setup.ts": ["exports", "types"],
"packages/cli/src/ingest.test-utils.ts": ["exports"],
"packages/cli/src/io/symbols.ts": ["exports"],
"packages/cli/src/managed-python-command.ts": ["types"],
"packages/cli/src/managed-python-daemon.ts": ["types"],
"packages/cli/src/managed-python-http.ts": ["exports", "types"],
"packages/cli/src/managed-python-runtime.ts": ["types"],
"packages/cli/src/memory-flow-tui.tsx": ["types"],
"packages/cli/src/next-steps.ts": ["exports"],
"packages/cli/src/print-command-tree.ts": ["exports"],
"packages/cli/src/setup-agents.ts": ["exports", "types"],
"packages/cli/src/setup-context.ts": ["types"],
"packages/cli/src/setup-demo-tour.ts": ["exports"],
"packages/cli/src/setup-models.ts": ["exports"],
"packages/cli/src/setup-project.ts": ["types"],
"packages/cli/src/setup-ready-menu.ts": ["types"],
"packages/cli/src/setup-sources.ts": ["types"],
"packages/context/src/ingest/adapters/historic-sql/pattern-inputs.ts": ["exports", "types"],
"packages/context/src/ingest/adapters/lookml/pull-config.ts": ["exports"],
"packages/context/src/ingest/adapters/metabase/serialize-card.ts": ["types"],
"packages/context/src/ingest/adapters/metabase/types.ts": ["exports"],
"packages/context/src/ingest/adapters/metricflow/parse.ts": ["types"],
"packages/context/src/ingest/ports.ts": ["types"],
"packages/context/src/ingest/stages/stage-3-work-units.ts": ["types"],
"packages/context/src/ingest/stages/stage-index.types.ts": ["types"],
"packages/context/src/project/config.ts": ["types"],
"packages/context/src/scan/relationship-candidates.ts": ["types"],
"packages/context/src/scan/relationship-diagnostics.ts": ["types"],
"packages/context/src/tools/context-evidence-tool-store.ts": ["types"]
}
}


@ -18,6 +18,10 @@
"artifacts:verify-manifest": "node scripts/package-artifacts.mjs verify-manifest",
"build": "pnpm --filter './packages/*' run build",
"check": "node scripts/check-boundaries.mjs && node --test scripts/*.test.mjs && pnpm --filter './packages/*' run build && pnpm --filter './packages/*' run test",
"dead-code": "pnpm run dead-code:biome && pnpm run dead-code:knip",
"dead-code:biome": "biome ci . --formatter-enabled=false --assist-enabled=false",
"dead-code:fix": "biome check . --formatter-enabled=false --assist-enabled=false --write && knip --fix --format",
"dead-code:knip": "knip --reporter compact",
"ktx": "node scripts/run-ktx.mjs",
"link:dev": "node scripts/link-dev-cli.mjs",
"native:rebuild": "pnpm -r rebuild better-sqlite3",
@ -36,9 +40,12 @@
"type-check": "pnpm --filter './packages/*' run type-check"
},
"devDependencies": {
"@biomejs/biome": "^2.4.15",
"@types/node": "^25.7.0",
"better-sqlite3": "^12.10.0",
"knip": "^6.12.2",
"typescript": "^6.0.3",
"vitest": "^4.1.6"
"yaml": "^2.9.0"
},
"pnpm": {
"onlyBuiltDependencies": [


@ -2,7 +2,7 @@
{
"id": "link-001",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/arr-contract-first.md",
"artifactKey": "wiki/global/arr-contract-first.md",
"sourceKind": "warehouse",
"sourcePath": "contracts",
"relationship": "describes",
@ -11,7 +11,7 @@
{
"id": "link-002",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/arr-contract-first.md",
"artifactKey": "wiki/global/arr-contract-first.md",
"sourceKind": "notion",
"sourcePath": "raw-sources/notion/arr-and-contract-reporting-notes.md",
"relationship": "derived_from",
@ -20,7 +20,7 @@
{
"id": "link-003",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/revenue-gross-to-net.md",
"artifactKey": "wiki/global/revenue-gross-to-net.md",
"sourceKind": "warehouse",
"sourcePath": "invoices",
"relationship": "describes",
@ -29,7 +29,7 @@
{
"id": "link-004",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/revenue-gross-to-net.md",
"artifactKey": "wiki/global/revenue-gross-to-net.md",
"sourceKind": "notion",
"sourcePath": "raw-sources/notion/revenue-reporting-policy.md",
"relationship": "derived_from",
@ -38,7 +38,7 @@
{
"id": "link-005",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/discount-expiration.md",
"artifactKey": "wiki/global/discount-expiration.md",
"sourceKind": "warehouse",
"sourcePath": "arr_movements",
"relationship": "describes",
@ -47,7 +47,7 @@
{
"id": "link-006",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/nrr-retention.md",
"artifactKey": "wiki/global/nrr-retention.md",
"sourceKind": "warehouse",
"sourcePath": "arr_movements",
"relationship": "describes",
@ -56,7 +56,7 @@
{
"id": "link-007",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/nrr-retention.md",
"artifactKey": "wiki/global/nrr-retention.md",
"sourceKind": "notion",
"sourcePath": "raw-sources/notion/retention-and-nrr-definition-notes.md",
"relationship": "derived_from",
@ -65,7 +65,7 @@
{
"id": "link-008",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/nrr-retention.md",
"artifactKey": "wiki/global/nrr-retention.md",
"sourceKind": "bi",
"sourcePath": "raw-sources/bi/account_retention.view.lkml",
"relationship": "derived_from",
@ -74,7 +74,7 @@
{
"id": "link-009",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/segment-classification.md",
"artifactKey": "wiki/global/segment-classification.md",
"sourceKind": "warehouse",
"sourcePath": "plans",
"relationship": "describes",
@ -83,7 +83,7 @@
{
"id": "link-010",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/segment-classification.md",
"artifactKey": "wiki/global/segment-classification.md",
"sourceKind": "notion",
"sourcePath": "raw-sources/notion/sales-ops-segmentation-guide.md",
"relationship": "derived_from",
@ -92,7 +92,7 @@
{
"id": "link-011",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/activation-policy.md",
"artifactKey": "wiki/global/activation-policy.md",
"sourceKind": "notion",
"sourcePath": "raw-sources/notion/activation-policy-decision-record.md",
"relationship": "derived_from",
@ -101,7 +101,7 @@
{
"id": "link-012",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/procurement-workflows.md",
"artifactKey": "wiki/global/procurement-workflows.md",
"sourceKind": "warehouse",
"sourcePath": "purchase_requests",
"relationship": "describes",
@ -110,7 +110,7 @@
{
"id": "link-013",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/customer-health-scoring.md",
"artifactKey": "wiki/global/customer-health-scoring.md",
"sourceKind": "notion",
"sourcePath": "raw-sources/notion/customer-health-playbook.md",
"relationship": "derived_from",
@ -119,7 +119,7 @@
{
"id": "link-014",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/customer-health-scoring.md",
"artifactKey": "wiki/global/customer-health-scoring.md",
"sourceKind": "warehouse",
"sourcePath": "support_tickets",
"relationship": "describes",
@ -128,7 +128,7 @@
{
"id": "link-015",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/support-escalation.md",
"artifactKey": "wiki/global/support-escalation.md",
"sourceKind": "notion",
"sourcePath": "raw-sources/notion/support-escalation-runbook.md",
"relationship": "derived_from",
@ -137,7 +137,7 @@
{
"id": "link-016",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/internal-test-exclusion.md",
"artifactKey": "wiki/global/internal-test-exclusion.md",
"sourceKind": "notion",
"sourcePath": "raw-sources/notion/analyst-onboarding.md",
"relationship": "derived_from",


@ -47,7 +47,7 @@
"sourceCount": 46
},
"knowledge": {
"path": "knowledge/global",
"path": "wiki/global",
"pageCount": 28
},
"links": {


@ -71,7 +71,7 @@
"type": "work_unit_started",
"unitKey": "revenue-and-contracts",
"skills": [
"knowledge_capture",
"wiki_capture",
"sl_capture"
],
"stepBudget": 40
@ -81,21 +81,21 @@
"unitKey": "revenue-and-contracts",
"target": "wiki",
"action": "created",
"key": "knowledge/global/arr-contract-first.md"
"key": "wiki/global/arr-contract-first.md"
},
{
"type": "candidate_action",
"unitKey": "revenue-and-contracts",
"target": "wiki",
"action": "created",
"key": "knowledge/global/revenue-gross-to-net.md"
"key": "wiki/global/revenue-gross-to-net.md"
},
{
"type": "candidate_action",
"unitKey": "revenue-and-contracts",
"target": "wiki",
"action": "created",
"key": "knowledge/global/discount-expiration.md"
"key": "wiki/global/discount-expiration.md"
},
{
"type": "candidate_action",
@ -127,7 +127,7 @@
"type": "work_unit_started",
"unitKey": "retention-and-segments",
"skills": [
"knowledge_capture",
"wiki_capture",
"sl_capture"
],
"stepBudget": 40
@ -137,14 +137,14 @@
"unitKey": "retention-and-segments",
"target": "wiki",
"action": "created",
"key": "knowledge/global/nrr-retention.md"
"key": "wiki/global/nrr-retention.md"
},
{
"type": "candidate_action",
"unitKey": "retention-and-segments",
"target": "wiki",
"action": "created",
"key": "knowledge/global/segment-classification.md"
"key": "wiki/global/segment-classification.md"
},
{
"type": "candidate_action",
@ -162,7 +162,7 @@
"type": "work_unit_started",
"unitKey": "procurement-and-activation",
"skills": [
"knowledge_capture",
"wiki_capture",
"sl_capture"
],
"stepBudget": 40
@ -172,14 +172,14 @@
"unitKey": "procurement-and-activation",
"target": "wiki",
"action": "created",
"key": "knowledge/global/activation-policy.md"
"key": "wiki/global/activation-policy.md"
},
{
"type": "candidate_action",
"unitKey": "procurement-and-activation",
"target": "wiki",
"action": "created",
"key": "knowledge/global/procurement-workflows.md"
"key": "wiki/global/procurement-workflows.md"
},
{
"type": "candidate_action",
@ -197,7 +197,7 @@
"type": "work_unit_started",
"unitKey": "support-and-health",
"skills": [
"knowledge_capture",
"wiki_capture",
"sl_capture"
],
"stepBudget": 40
@ -207,14 +207,14 @@
"unitKey": "support-and-health",
"target": "wiki",
"action": "created",
"key": "knowledge/global/customer-health-scoring.md"
"key": "wiki/global/customer-health-scoring.md"
},
{
"type": "candidate_action",
"unitKey": "support-and-health",
"target": "wiki",
"action": "created",
"key": "knowledge/global/support-escalation.md"
"key": "wiki/global/support-escalation.md"
},
{
"type": "candidate_action",
@ -232,7 +232,7 @@
"type": "work_unit_started",
"unitKey": "governance-and-exclusions",
"skills": [
"knowledge_capture"
"wiki_capture"
],
"stepBudget": 40
},
@ -241,7 +241,7 @@
"unitKey": "governance-and-exclusions",
"target": "wiki",
"action": "created",
"key": "knowledge/global/internal-test-exclusion.md"
"key": "wiki/global/internal-test-exclusion.md"
},
{
"type": "work_unit_finished",
@ -321,7 +321,7 @@
"unitKey": "revenue-and-contracts",
"target": "wiki",
"action": "created",
"key": "knowledge/global/arr-contract-first.md",
"key": "wiki/global/arr-contract-first.md",
"summary": "ARR follows contract precedence with cancellation and discount caveats.",
"rawFiles": [
"contracts",
@ -334,7 +334,7 @@
"unitKey": "revenue-and-contracts",
"target": "wiki",
"action": "created",
"key": "knowledge/global/revenue-gross-to-net.md",
"key": "wiki/global/revenue-gross-to-net.md",
"summary": "Invoice, refund, and revenue dashboard evidence reconcile gross to net revenue.",
"rawFiles": [
"invoices",
@ -346,7 +346,7 @@
"unitKey": "revenue-and-contracts",
"target": "wiki",
"action": "created",
"key": "knowledge/global/discount-expiration.md",
"key": "wiki/global/discount-expiration.md",
"summary": "Discount expiration is separated from organic contraction for retention reporting.",
"rawFiles": [
"contracts",
@ -394,7 +394,7 @@
"unitKey": "retention-and-segments",
"target": "wiki",
"action": "created",
"key": "knowledge/global/nrr-retention.md",
"key": "wiki/global/nrr-retention.md",
"summary": "NRR uses parent-account rollups and quarterly ARR movement windows.",
"rawFiles": [
"accounts",
@ -407,7 +407,7 @@
"unitKey": "retention-and-segments",
"target": "wiki",
"action": "created",
"key": "knowledge/global/segment-classification.md",
"key": "wiki/global/segment-classification.md",
"summary": "Segment labels come from plan mapping and sales-ops policy notes.",
"rawFiles": [
"accounts",
@ -432,7 +432,7 @@
"unitKey": "procurement-and-activation",
"target": "wiki",
"action": "created",
"key": "knowledge/global/activation-policy.md",
"key": "wiki/global/activation-policy.md",
"summary": "Activation policy changed on January 15, 2026 and is encoded for agents.",
"rawFiles": [
"purchase_requests",
@ -445,7 +445,7 @@
"unitKey": "procurement-and-activation",
"target": "wiki",
"action": "created",
"key": "knowledge/global/procurement-workflows.md",
"key": "wiki/global/procurement-workflows.md",
"summary": "Procurement requester activity and approval events explain product usage.",
"rawFiles": [
"purchase_requests",
@ -468,7 +468,7 @@
"unitKey": "support-and-health",
"target": "wiki",
"action": "created",
"key": "knowledge/global/customer-health-scoring.md",
"key": "wiki/global/customer-health-scoring.md",
"summary": "Customer health combines support severity, ARR exposure, and product usage.",
"rawFiles": [
"support_tickets",
@ -480,7 +480,7 @@
"unitKey": "support-and-health",
"target": "wiki",
"action": "created",
"key": "knowledge/global/support-escalation.md",
"key": "wiki/global/support-escalation.md",
"summary": "Escalation tiers map ticket severity to SLA expectations.",
"rawFiles": [
"support_tickets",
@ -503,7 +503,7 @@
"unitKey": "governance-and-exclusions",
"target": "wiki",
"action": "created",
"key": "knowledge/global/internal-test-exclusion.md",
"key": "wiki/global/internal-test-exclusion.md",
"summary": "Canonical metrics exclude internal and test accounts across source families.",
"rawFiles": [
"raw-sources/notion/analyst-onboarding.md"
@ -515,97 +515,97 @@
{
"rawPath": "contracts",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/arr-contract-first.md",
"artifactKey": "wiki/global/arr-contract-first.md",
"actionType": "wiki_written"
},
{
"rawPath": "raw-sources/notion/arr-and-contract-reporting-notes.md",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/arr-contract-first.md",
"artifactKey": "wiki/global/arr-contract-first.md",
"actionType": "wiki_written"
},
{
"rawPath": "invoices",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/revenue-gross-to-net.md",
"artifactKey": "wiki/global/revenue-gross-to-net.md",
"actionType": "wiki_written"
},
{
"rawPath": "raw-sources/notion/revenue-reporting-policy.md",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/revenue-gross-to-net.md",
"artifactKey": "wiki/global/revenue-gross-to-net.md",
"actionType": "wiki_written"
},
{
"rawPath": "arr_movements",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/discount-expiration.md",
"artifactKey": "wiki/global/discount-expiration.md",
"actionType": "wiki_written"
},
{
"rawPath": "arr_movements",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/nrr-retention.md",
"artifactKey": "wiki/global/nrr-retention.md",
"actionType": "wiki_written"
},
{
"rawPath": "raw-sources/notion/retention-and-nrr-definition-notes.md",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/nrr-retention.md",
"artifactKey": "wiki/global/nrr-retention.md",
"actionType": "wiki_written"
},
{
"rawPath": "raw-sources/bi/account_retention.view.lkml",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/nrr-retention.md",
"artifactKey": "wiki/global/nrr-retention.md",
"actionType": "wiki_written"
},
{
"rawPath": "plans",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/segment-classification.md",
"artifactKey": "wiki/global/segment-classification.md",
"actionType": "wiki_written"
},
{
"rawPath": "raw-sources/notion/sales-ops-segmentation-guide.md",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/segment-classification.md",
"artifactKey": "wiki/global/segment-classification.md",
"actionType": "wiki_written"
},
{
"rawPath": "raw-sources/notion/activation-policy-decision-record.md",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/activation-policy.md",
"artifactKey": "wiki/global/activation-policy.md",
"actionType": "wiki_written"
},
{
"rawPath": "purchase_requests",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/procurement-workflows.md",
"artifactKey": "wiki/global/procurement-workflows.md",
"actionType": "wiki_written"
},
{
"rawPath": "raw-sources/notion/customer-health-playbook.md",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/customer-health-scoring.md",
"artifactKey": "wiki/global/customer-health-scoring.md",
"actionType": "wiki_written"
},
{
"rawPath": "support_tickets",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/customer-health-scoring.md",
"artifactKey": "wiki/global/customer-health-scoring.md",
"actionType": "wiki_written"
},
{
"rawPath": "raw-sources/notion/support-escalation-runbook.md",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/support-escalation.md",
"artifactKey": "wiki/global/support-escalation.md",
"actionType": "wiki_written"
},
{
"rawPath": "raw-sources/notion/analyst-onboarding.md",
"artifactKind": "wiki",
"artifactKey": "knowledge/global/internal-test-exclusion.md",
"artifactKey": "wiki/global/internal-test-exclusion.md",
"actionType": "wiki_written"
},
{

@ -57,4 +57,4 @@ Always join through `customer.id`. Do not join on `email`.
- **Join key:** Always use `customer.id`, never `email`.
- **Timezone:** `created_at` and `last_seen_at` are UTC. Confirm whether a question expects UTC or a local business day before filtering.
- **Paying vs. all:** `free` customers must be excluded from paying-customer follow-ups. Use `paying_customer_count`, not `customer_count`.
- **plan_tier values:** `free`, `pro`, `enterprise`. Note: `pro_plus` is a legacy alias for `growth` in the account/contract layer (see `orbit-plan-segment-normalization`), but `plan_tier` on this table uses `pro` not `pro_plus`.
- **plan_tier values:** `free`, `pro`, `enterprise`. Note: use the canonical plan names from the account/contract layer (see `orbit-plan-segment-normalization`); `plan_tier` on this table uses `pro` rather than `growth`.

@ -27,7 +27,7 @@ Sales Ops must complete the handoff **before the first implementation call**. Cu
| Field | Notes |
|---|---|
| Current plan | Starter / Growth / Enterprise — use canonical plan name, not legacy aliases |
| Current plan | Starter / Growth / Enterprise — use canonical plan name |
| Account segment | self_serve / commercial / enterprise (see `orbit-plan-segment-normalization`) |
| Contract shape | Term, ARR, any discounts or custom terms |
| Renewal contact | Named person on the customer side responsible for renewal |

@ -229,39 +229,39 @@ const knowledgePages = [
];
const provenanceLinks = [
['wiki', 'knowledge/global/arr-contract-first.md', 'warehouse', 'contracts', 'describes', 1],
['wiki', 'wiki/global/arr-contract-first.md', 'warehouse', 'contracts', 'describes', 1],
[
'wiki',
'knowledge/global/arr-contract-first.md',
'wiki/global/arr-contract-first.md',
'notion',
'raw-sources/notion/arr-and-contract-reporting-notes.md',
'derived_from',
0.95,
],
['wiki', 'knowledge/global/revenue-gross-to-net.md', 'warehouse', 'invoices', 'describes', 1],
['wiki', 'wiki/global/revenue-gross-to-net.md', 'warehouse', 'invoices', 'describes', 1],
[
'wiki',
'knowledge/global/revenue-gross-to-net.md',
'wiki/global/revenue-gross-to-net.md',
'notion',
'raw-sources/notion/revenue-reporting-policy.md',
'derived_from',
0.95,
],
['wiki', 'knowledge/global/discount-expiration.md', 'warehouse', 'arr_movements', 'describes', 1],
['wiki', 'knowledge/global/nrr-retention.md', 'warehouse', 'arr_movements', 'describes', 1],
['wiki', 'wiki/global/discount-expiration.md', 'warehouse', 'arr_movements', 'describes', 1],
['wiki', 'wiki/global/nrr-retention.md', 'warehouse', 'arr_movements', 'describes', 1],
[
'wiki',
'knowledge/global/nrr-retention.md',
'wiki/global/nrr-retention.md',
'notion',
'raw-sources/notion/retention-and-nrr-definition-notes.md',
'derived_from',
0.95,
],
['wiki', 'knowledge/global/nrr-retention.md', 'bi', 'raw-sources/bi/account_retention.view.lkml', 'derived_from', 0.85],
['wiki', 'knowledge/global/segment-classification.md', 'warehouse', 'plans', 'describes', 1],
['wiki', 'wiki/global/nrr-retention.md', 'bi', 'raw-sources/bi/account_retention.view.lkml', 'derived_from', 0.85],
['wiki', 'wiki/global/segment-classification.md', 'warehouse', 'plans', 'describes', 1],
[
'wiki',
'knowledge/global/segment-classification.md',
'wiki/global/segment-classification.md',
'notion',
'raw-sources/notion/sales-ops-segmentation-guide.md',
'derived_from',
@ -269,25 +269,25 @@ const provenanceLinks = [
],
[
'wiki',
'knowledge/global/activation-policy.md',
'wiki/global/activation-policy.md',
'notion',
'raw-sources/notion/activation-policy-decision-record.md',
'derived_from',
0.95,
],
['wiki', 'knowledge/global/procurement-workflows.md', 'warehouse', 'purchase_requests', 'describes', 1],
['wiki', 'wiki/global/procurement-workflows.md', 'warehouse', 'purchase_requests', 'describes', 1],
[
'wiki',
'knowledge/global/customer-health-scoring.md',
'wiki/global/customer-health-scoring.md',
'notion',
'raw-sources/notion/customer-health-playbook.md',
'derived_from',
0.9,
],
['wiki', 'knowledge/global/customer-health-scoring.md', 'warehouse', 'support_tickets', 'describes', 1],
['wiki', 'wiki/global/customer-health-scoring.md', 'warehouse', 'support_tickets', 'describes', 1],
[
'wiki',
'knowledge/global/support-escalation.md',
'wiki/global/support-escalation.md',
'notion',
'raw-sources/notion/support-escalation-runbook.md',
'derived_from',
@ -295,7 +295,7 @@ const provenanceLinks = [
],
[
'wiki',
'knowledge/global/internal-test-exclusion.md',
'wiki/global/internal-test-exclusion.md',
'notion',
'raw-sources/notion/analyst-onboarding.md',
'derived_from',
@ -490,7 +490,7 @@ function buildActions() {
unitKey: 'revenue-and-contracts',
target: 'wiki',
action: 'created',
key: 'knowledge/global/arr-contract-first.md',
key: 'wiki/global/arr-contract-first.md',
summary: 'ARR follows contract precedence with cancellation and discount caveats.',
rawFiles: ['contracts', 'arr_movements', 'raw-sources/notion/arr-and-contract-reporting-notes.md'],
status: 'success',
@ -499,7 +499,7 @@ function buildActions() {
unitKey: 'revenue-and-contracts',
target: 'wiki',
action: 'created',
key: 'knowledge/global/revenue-gross-to-net.md',
key: 'wiki/global/revenue-gross-to-net.md',
summary: 'Invoice, refund, and revenue dashboard evidence reconcile gross to net revenue.',
rawFiles: ['invoices', 'raw-sources/bi/revenue_exec.dashboard.lookml'],
status: 'success',
@ -508,7 +508,7 @@ function buildActions() {
unitKey: 'revenue-and-contracts',
target: 'wiki',
action: 'created',
key: 'knowledge/global/discount-expiration.md',
key: 'wiki/global/discount-expiration.md',
summary: 'Discount expiration is separated from organic contraction for retention reporting.',
rawFiles: ['contracts', 'arr_movements'],
status: 'success',
@ -544,7 +544,7 @@ function buildActions() {
unitKey: 'retention-and-segments',
target: 'wiki',
action: 'created',
key: 'knowledge/global/nrr-retention.md',
key: 'wiki/global/nrr-retention.md',
summary: 'NRR uses parent-account rollups and quarterly ARR movement windows.',
rawFiles: ['accounts', 'arr_movements', 'raw-sources/notion/retention-and-nrr-definition-notes.md'],
status: 'success',
@ -553,7 +553,7 @@ function buildActions() {
unitKey: 'retention-and-segments',
target: 'wiki',
action: 'created',
key: 'knowledge/global/segment-classification.md',
key: 'wiki/global/segment-classification.md',
summary: 'Segment labels come from plan mapping and sales-ops policy notes.',
rawFiles: ['accounts', 'plans', 'raw-sources/notion/sales-ops-segmentation-guide.md'],
status: 'success',
@ -571,7 +571,7 @@ function buildActions() {
unitKey: 'procurement-and-activation',
target: 'wiki',
action: 'created',
key: 'knowledge/global/activation-policy.md',
key: 'wiki/global/activation-policy.md',
summary: 'Activation policy changed on January 15, 2026 and is encoded for agents.',
rawFiles: ['purchase_requests', 'users', 'raw-sources/notion/activation-policy-decision-record.md'],
status: 'success',
@ -580,7 +580,7 @@ function buildActions() {
unitKey: 'procurement-and-activation',
target: 'wiki',
action: 'created',
key: 'knowledge/global/procurement-workflows.md',
key: 'wiki/global/procurement-workflows.md',
summary: 'Procurement requester activity and approval events explain product usage.',
rawFiles: ['purchase_requests', 'raw-sources/bi/procurement_activity.view.lkml'],
status: 'success',
@ -598,7 +598,7 @@ function buildActions() {
unitKey: 'support-and-health',
target: 'wiki',
action: 'created',
key: 'knowledge/global/customer-health-scoring.md',
key: 'wiki/global/customer-health-scoring.md',
summary: 'Customer health combines support severity, ARR exposure, and product usage.',
rawFiles: ['support_tickets', 'raw-sources/notion/customer-health-playbook.md'],
status: 'success',
@ -607,7 +607,7 @@ function buildActions() {
unitKey: 'support-and-health',
target: 'wiki',
action: 'created',
key: 'knowledge/global/support-escalation.md',
key: 'wiki/global/support-escalation.md',
summary: 'Escalation tiers map ticket severity to SLA expectations.',
rawFiles: ['support_tickets', 'raw-sources/notion/support-escalation-runbook.md'],
status: 'success',
@ -625,7 +625,7 @@ function buildActions() {
unitKey: 'governance-and-exclusions',
target: 'wiki',
action: 'created',
key: 'knowledge/global/internal-test-exclusion.md',
key: 'wiki/global/internal-test-exclusion.md',
summary: 'Canonical metrics exclude internal and test accounts across source families.',
rawFiles: ['raw-sources/notion/analyst-onboarding.md'],
status: 'success',
@ -665,27 +665,27 @@ function buildReplay(provenance, transcripts) {
{ type: 'raw_snapshot_written', syncId: 'demo-seeded-sync', rawFileCount: 29 },
{ type: 'diff_computed', added: 29, modified: 0, deleted: 0, unchanged: 0 },
{ type: 'chunks_planned', chunkCount: 5, workUnitCount: 5, evictionCount: 0 },
{ type: 'work_unit_started', unitKey: 'revenue-and-contracts', skills: ['knowledge_capture', 'sl_capture'], stepBudget: 40 },
{ type: 'work_unit_started', unitKey: 'revenue-and-contracts', skills: ['wiki_capture', 'sl_capture'], stepBudget: 40 },
{
type: 'candidate_action',
unitKey: 'revenue-and-contracts',
target: 'wiki',
action: 'created',
key: 'knowledge/global/arr-contract-first.md',
key: 'wiki/global/arr-contract-first.md',
},
{
type: 'candidate_action',
unitKey: 'revenue-and-contracts',
target: 'wiki',
action: 'created',
key: 'knowledge/global/revenue-gross-to-net.md',
key: 'wiki/global/revenue-gross-to-net.md',
},
{
type: 'candidate_action',
unitKey: 'revenue-and-contracts',
target: 'wiki',
action: 'created',
key: 'knowledge/global/discount-expiration.md',
key: 'wiki/global/discount-expiration.md',
},
{
type: 'candidate_action',
@ -709,20 +709,20 @@ function buildReplay(provenance, transcripts) {
key: 'orbit_demo.arr_movements',
},
{ type: 'work_unit_finished', unitKey: 'revenue-and-contracts', status: 'success' },
{ type: 'work_unit_started', unitKey: 'retention-and-segments', skills: ['knowledge_capture', 'sl_capture'], stepBudget: 40 },
{ type: 'work_unit_started', unitKey: 'retention-and-segments', skills: ['wiki_capture', 'sl_capture'], stepBudget: 40 },
{
type: 'candidate_action',
unitKey: 'retention-and-segments',
target: 'wiki',
action: 'created',
key: 'knowledge/global/nrr-retention.md',
key: 'wiki/global/nrr-retention.md',
},
{
type: 'candidate_action',
unitKey: 'retention-and-segments',
target: 'wiki',
action: 'created',
key: 'knowledge/global/segment-classification.md',
key: 'wiki/global/segment-classification.md',
},
{
type: 'candidate_action',
@ -735,7 +735,7 @@ function buildReplay(provenance, transcripts) {
{
type: 'work_unit_started',
unitKey: 'procurement-and-activation',
skills: ['knowledge_capture', 'sl_capture'],
skills: ['wiki_capture', 'sl_capture'],
stepBudget: 40,
},
{
@ -743,14 +743,14 @@ function buildReplay(provenance, transcripts) {
unitKey: 'procurement-and-activation',
target: 'wiki',
action: 'created',
key: 'knowledge/global/activation-policy.md',
key: 'wiki/global/activation-policy.md',
},
{
type: 'candidate_action',
unitKey: 'procurement-and-activation',
target: 'wiki',
action: 'created',
key: 'knowledge/global/procurement-workflows.md',
key: 'wiki/global/procurement-workflows.md',
},
{
type: 'candidate_action',
@ -760,20 +760,20 @@ function buildReplay(provenance, transcripts) {
key: 'orbit_demo.purchase_requests',
},
{ type: 'work_unit_finished', unitKey: 'procurement-and-activation', status: 'success' },
{ type: 'work_unit_started', unitKey: 'support-and-health', skills: ['knowledge_capture', 'sl_capture'], stepBudget: 40 },
{ type: 'work_unit_started', unitKey: 'support-and-health', skills: ['wiki_capture', 'sl_capture'], stepBudget: 40 },
{
type: 'candidate_action',
unitKey: 'support-and-health',
target: 'wiki',
action: 'created',
key: 'knowledge/global/customer-health-scoring.md',
key: 'wiki/global/customer-health-scoring.md',
},
{
type: 'candidate_action',
unitKey: 'support-and-health',
target: 'wiki',
action: 'created',
key: 'knowledge/global/support-escalation.md',
key: 'wiki/global/support-escalation.md',
},
{
type: 'candidate_action',
@ -783,13 +783,13 @@ function buildReplay(provenance, transcripts) {
key: 'orbit_demo.support_tickets',
},
{ type: 'work_unit_finished', unitKey: 'support-and-health', status: 'success' },
{ type: 'work_unit_started', unitKey: 'governance-and-exclusions', skills: ['knowledge_capture'], stepBudget: 40 },
{ type: 'work_unit_started', unitKey: 'governance-and-exclusions', skills: ['wiki_capture'], stepBudget: 40 },
{
type: 'candidate_action',
unitKey: 'governance-and-exclusions',
target: 'wiki',
action: 'created',
key: 'knowledge/global/internal-test-exclusion.md',
key: 'wiki/global/internal-test-exclusion.md',
},
{ type: 'work_unit_finished', unitKey: 'governance-and-exclusions', status: 'success' },
{ type: 'reconciliation_finished', conflictCount: 0, fallbackCount: 0 },
@ -835,7 +835,7 @@ function buildReplay(provenance, transcripts) {
async function writeGeneratedContext(rowCounts) {
for (const page of knowledgePages) {
await writeText(join('knowledge/global', page.file), renderKnowledgePage(page));
await writeText(join('wiki/global', page.file), renderKnowledgePage(page));
}
for (const table of semanticLayerTables) {
@ -908,7 +908,7 @@ async function writeGeneratedContext(rowCounts) {
},
generated: {
semanticLayer: { path: 'semantic-layer/orbit_demo', sourceCount: 6 },
knowledge: { path: 'knowledge/global', pageCount: 10 },
knowledge: { path: 'wiki/global', pageCount: 10 },
links: { path: 'links', linkCount: provenanceLinks.length },
},
});
@ -930,7 +930,7 @@ for (const relativeDir of [
'raw-sources/bi',
'raw-sources/notion',
'semantic-layer/orbit_demo',
'knowledge/global',
'wiki/global',
'links',
'reports',
]) {

@ -1,152 +0,0 @@
import { mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import {
KTX_AGENT_MAX_ROWS_CAP,
createKtxAgentRuntime,
parseAgentMaxRows,
readAgentJsonFile,
writeAgentJson,
writeAgentJsonError,
} from './agent-runtime.js';
function makeIo() {
let stdout = '';
let stderr = '';
return {
io: {
stdout: { write: (chunk: string) => (stdout += chunk) },
stderr: { write: (chunk: string) => (stderr += chunk) },
},
stdout: () => stdout,
stderr: () => stderr,
};
}
describe('agent runtime helpers', () => {
let tempDir: string;
beforeEach(async () => {
tempDir = await mkdtemp(join(tmpdir(), 'ktx-agent-runtime-'));
});
afterEach(async () => {
await rm(tempDir, { recursive: true, force: true });
});
it('writes JSON success and error envelopes without color or spinners', () => {
const successIo = makeIo();
const errorIo = makeIo();
writeAgentJson(successIo.io, { ok: true });
writeAgentJsonError(errorIo.io, 'missing source', { code: 'NOT_FOUND' });
expect(JSON.parse(successIo.stdout())).toEqual({ ok: true });
expect(successIo.stderr()).toBe('');
expect(JSON.parse(errorIo.stderr())).toEqual({
ok: false,
error: { message: 'missing source', code: 'NOT_FOUND' },
});
expect(errorIo.stdout()).toBe('');
});
it('reads JSON query files as objects', async () => {
const path = join(tempDir, 'query.json');
await writeFile(path, '{"measures":["revenue"],"limit":50}', 'utf-8');
await expect(readAgentJsonFile(path)).resolves.toEqual({ measures: ['revenue'], limit: 50 });
});
it('rejects non-object JSON query files', async () => {
const path = join(tempDir, 'query.json');
await writeFile(path, '["revenue"]', 'utf-8');
await expect(readAgentJsonFile(path)).rejects.toThrow('must contain a JSON object');
});
it('requires positive row limits and enforces the agent cap', () => {
expect(parseAgentMaxRows(100)).toBe(100);
expect(() => parseAgentMaxRows(undefined)).toThrow('maxRows is required');
expect(() => parseAgentMaxRows(0)).toThrow('positive integer');
expect(() => parseAgentMaxRows(KTX_AGENT_MAX_ROWS_CAP + 1)).toThrow(String(KTX_AGENT_MAX_ROWS_CAP));
});
it('constructs local context ports with semantic compute and query executor', async () => {
const project = {
projectDir: tempDir,
configPath: join(tempDir, 'ktx.yaml'),
config: { project: 'revenue', connections: {} },
coreConfig: {},
git: {},
fileStore: {},
} as never;
const ports = { knowledge: {}, semanticLayer: {} } as never;
const semanticLayerCompute = { query: vi.fn(), validateSources: vi.fn(), generateSources: vi.fn() };
const queryExecutor = { execute: vi.fn() };
const loadProject = vi.fn(async () => project);
const createContextTools = vi.fn(() => ports);
await expect(
createKtxAgentRuntime(
{ projectDir: tempDir, enableSemanticCompute: true, enableQueryExecution: true },
{
loadProject,
createContextTools,
createSemanticLayerCompute: () => semanticLayerCompute,
createQueryExecutor: () => queryExecutor,
},
),
).resolves.toMatchObject({ project, ports, queryExecutor });
expect(loadProject).toHaveBeenCalledWith({ projectDir: tempDir });
expect(createContextTools).toHaveBeenCalledWith(project, {
semanticLayerCompute,
queryExecutor,
});
});
it('creates managed semantic compute when no test override is injected', async () => {
const project = {
projectDir: tempDir,
configPath: join(tempDir, 'ktx.yaml'),
config: { project: 'revenue', connections: {} },
coreConfig: {},
git: {},
fileStore: {},
} as never;
const ports = { semanticLayer: {} } as never;
const semanticLayerCompute = { query: vi.fn(), validateSources: vi.fn(), generateSources: vi.fn() };
const loadProject = vi.fn(async () => project);
const createContextTools = vi.fn(() => ports);
const createManagedSemanticLayerCompute = vi.fn(async () => semanticLayerCompute);
const { io } = makeIo();
await expect(
createKtxAgentRuntime(
{
projectDir: tempDir,
enableSemanticCompute: true,
enableQueryExecution: false,
cliVersion: '0.2.0',
runtimeInstallPolicy: 'auto',
io,
},
{
loadProject,
createContextTools,
createManagedSemanticLayerCompute,
},
),
).resolves.toMatchObject({ project, ports, semanticLayerCompute });
expect(createManagedSemanticLayerCompute).toHaveBeenCalledWith({
cliVersion: '0.2.0',
installPolicy: 'auto',
io,
});
expect(createContextTools).toHaveBeenCalledWith(project, {
semanticLayerCompute,
});
});
});

@ -1,109 +0,0 @@
import { readFile } from 'node:fs/promises';
import { createDefaultLocalQueryExecutor, type KtxSqlQueryExecutorPort } from '@ktx/context/connections';
import type { KtxSemanticLayerComputePort } from '@ktx/context/daemon';
import { createLocalProjectMcpContextPorts, type KtxMcpContextPorts } from '@ktx/context/mcp';
import { type KtxLocalProject, loadKtxProject } from '@ktx/context/project';
import type { KtxCliIo } from './cli-runtime.js';
import {
createManagedPythonSemanticLayerComputePort,
type KtxManagedPythonInstallPolicy,
} from './managed-python-command.js';
export const KTX_AGENT_MAX_ROWS_CAP = 1000;
export interface KtxAgentRuntimeOptions {
projectDir: string;
enableSemanticCompute: boolean;
enableQueryExecution: boolean;
cliVersion?: string;
runtimeInstallPolicy?: KtxManagedPythonInstallPolicy;
io?: KtxCliIo;
}
export interface KtxAgentRuntime {
project: KtxLocalProject;
ports: KtxMcpContextPorts;
semanticLayerCompute?: KtxSemanticLayerComputePort;
queryExecutor?: KtxSqlQueryExecutorPort;
}
export interface KtxAgentRuntimeDeps {
loadProject?: typeof loadKtxProject;
createContextTools?: typeof createLocalProjectMcpContextPorts;
createSemanticLayerCompute?: () => KtxSemanticLayerComputePort;
createManagedSemanticLayerCompute?: typeof createManagedPythonSemanticLayerComputePort;
createQueryExecutor?: () => KtxSqlQueryExecutorPort;
}
export function writeAgentJson(io: KtxCliIo, value: unknown): void {
io.stdout.write(`${JSON.stringify(value, null, 2)}\n`);
}
export function writeAgentJsonError(
io: KtxCliIo,
message: string,
detail: Record<string, unknown> = {},
): void {
io.stderr.write(`${JSON.stringify({ ok: false, error: { message, ...detail } }, null, 2)}\n`);
}
export async function readAgentJsonFile(path: string): Promise<Record<string, unknown>> {
const parsed = JSON.parse(await readFile(path, 'utf-8')) as unknown;
if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) {
throw new Error(`${path} must contain a JSON object.`);
}
return parsed as Record<string, unknown>;
}
export function parseAgentMaxRows(value: number | undefined): number {
if (!Number.isInteger(value) || value === undefined || value <= 0) {
throw new Error('maxRows is required and must be a positive integer.');
}
if (value > KTX_AGENT_MAX_ROWS_CAP) {
throw new Error(`maxRows must be less than or equal to ${KTX_AGENT_MAX_ROWS_CAP}.`);
}
return value;
}
async function createAgentSemanticLayerCompute(
options: KtxAgentRuntimeOptions,
deps: KtxAgentRuntimeDeps,
): Promise<KtxSemanticLayerComputePort | undefined> {
if (!options.enableSemanticCompute) {
return undefined;
}
if (deps.createSemanticLayerCompute) {
return deps.createSemanticLayerCompute();
}
if (!options.cliVersion || !options.runtimeInstallPolicy || !options.io) {
throw new Error('Managed Python semantic compute requires cliVersion, runtimeInstallPolicy, and io.');
}
const createManagedSemanticLayerCompute =
deps.createManagedSemanticLayerCompute ?? createManagedPythonSemanticLayerComputePort;
return createManagedSemanticLayerCompute({
cliVersion: options.cliVersion,
installPolicy: options.runtimeInstallPolicy,
io: options.io,
});
}
export async function createKtxAgentRuntime(
options: KtxAgentRuntimeOptions,
deps: KtxAgentRuntimeDeps = {},
): Promise<KtxAgentRuntime> {
const project = await (deps.loadProject ?? loadKtxProject)({ projectDir: options.projectDir });
const semanticLayerCompute = await createAgentSemanticLayerCompute(options, deps);
const queryExecutor = options.enableQueryExecution
? (deps.createQueryExecutor ?? createDefaultLocalQueryExecutor)()
: undefined;
const ports = (deps.createContextTools ?? createLocalProjectMcpContextPorts)(project, {
...(semanticLayerCompute ? { semanticLayerCompute } : {}),
...(queryExecutor ? { queryExecutor } : {}),
});
return {
project,
ports,
...(semanticLayerCompute ? { semanticLayerCompute } : {}),
...(queryExecutor ? { queryExecutor } : {}),
};
}

@ -1,51 +0,0 @@
import { describe, expect, it } from 'vitest';
import {
isMissingProjectConfigError,
missingConnectionSlSearchReadiness,
missingProjectSlSearchReadiness,
noConnectionsSlSearchReadiness,
noIndexedSourcesSlSearchReadiness,
} from './agent-search-readiness.js';
describe('agent semantic-layer search readiness guidance', () => {
it('formats missing project guidance with exact recovery commands', () => {
expect(missingProjectSlSearchReadiness('/tmp/ktx-search', 'gross revenue')).toEqual({
code: 'agent_sl_search_missing_project',
message: 'Semantic-layer search needs an initialized KTX project at /tmp/ktx-search.',
nextSteps: [
'ktx setup --project-dir /tmp/ktx-search',
'ktx status --project-dir /tmp/ktx-search',
'ktx ingest <connection>',
'ktx agent sl list --json --query "gross revenue" --project-dir /tmp/ktx-search',
],
});
});
it('formats no-connection and no-index guidance without hiding the project path', () => {
expect(noConnectionsSlSearchReadiness('/tmp/ktx-search', 'revenue')).toMatchObject({
code: 'agent_sl_search_no_connections',
message: 'Semantic-layer search found no configured connections in /tmp/ktx-search.',
});
expect(noIndexedSourcesSlSearchReadiness('/tmp/ktx-search', 'orders')).toMatchObject({
code: 'agent_sl_search_no_indexed_sources',
message: 'Semantic-layer search found no indexed semantic-layer sources in /tmp/ktx-search.',
});
});
it('formats unknown connection guidance', () => {
expect(missingConnectionSlSearchReadiness('/tmp/ktx-search', 'warehouse', 'revenue')).toMatchObject({
code: 'agent_sl_search_unknown_connection',
message: 'Semantic-layer search connection "warehouse" is not configured in /tmp/ktx-search.',
});
});
it('detects missing ktx.yaml read errors', () => {
const error = Object.assign(new Error('ENOENT: no such file or directory'), {
code: 'ENOENT',
path: '/tmp/ktx-search/ktx.yaml',
});
expect(isMissingProjectConfigError(error)).toBe(true);
expect(isMissingProjectConfigError(new Error('other'))).toBe(false);
});
});

@ -1,94 +0,0 @@
export type KtxAgentSlSearchReadinessCode =
| 'agent_sl_search_missing_project'
| 'agent_sl_search_no_connections'
| 'agent_sl_search_unknown_connection'
| 'agent_sl_search_no_indexed_sources';
export interface KtxAgentSlSearchReadinessDetail {
code: KtxAgentSlSearchReadinessCode;
message: string;
nextSteps: string[];
}
function queryForCommand(query: string | undefined): string {
const trimmed = query?.trim();
return trimmed && trimmed.length > 0 ? trimmed : 'revenue';
}
function projectSearchCommand(projectDir: string, query: string | undefined): string {
return `ktx agent sl list --json --query ${JSON.stringify(queryForCommand(query))} --project-dir ${projectDir}`;
}
function baseNextSteps(projectDir: string, query: string | undefined): string[] {
return [
`ktx setup --project-dir ${projectDir}`,
`ktx status --project-dir ${projectDir}`,
'ktx ingest <connection>',
projectSearchCommand(projectDir, query),
];
}
export function missingProjectSlSearchReadiness(
projectDir: string,
query: string | undefined,
): KtxAgentSlSearchReadinessDetail {
return {
code: 'agent_sl_search_missing_project',
message: `Semantic-layer search needs an initialized KTX project at ${projectDir}.`,
nextSteps: baseNextSteps(projectDir, query),
};
}
export function noConnectionsSlSearchReadiness(
projectDir: string,
query: string | undefined,
): KtxAgentSlSearchReadinessDetail {
return {
code: 'agent_sl_search_no_connections',
message: `Semantic-layer search found no configured connections in ${projectDir}.`,
nextSteps: baseNextSteps(projectDir, query),
};
}
export function missingConnectionSlSearchReadiness(
projectDir: string,
connectionId: string,
query: string | undefined,
): KtxAgentSlSearchReadinessDetail {
return {
code: 'agent_sl_search_unknown_connection',
message: `Semantic-layer search connection "${connectionId}" is not configured in ${projectDir}.`,
nextSteps: baseNextSteps(projectDir, query),
};
}
export function noIndexedSourcesSlSearchReadiness(
projectDir: string,
query: string | undefined,
): KtxAgentSlSearchReadinessDetail {
return {
code: 'agent_sl_search_no_indexed_sources',
message: `Semantic-layer search found no indexed semantic-layer sources in ${projectDir}.`,
nextSteps: baseNextSteps(projectDir, query),
};
}
function errorCode(error: unknown): string | undefined {
if (typeof error !== 'object' || error === null || !('code' in error)) {
return undefined;
}
const code = (error as { code?: unknown }).code;
return typeof code === 'string' ? code : undefined;
}
function errorPath(error: unknown): string | undefined {
if (typeof error !== 'object' || error === null || !('path' in error)) {
return undefined;
}
const path = (error as { path?: unknown }).path;
return typeof path === 'string' ? path : undefined;
}
export function isMissingProjectConfigError(error: unknown): boolean {
return errorCode(error) === 'ENOENT' && (errorPath(error)?.endsWith('ktx.yaml') ?? false);
}

@ -1,428 +0,0 @@
import { mkdtemp, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { buildDefaultKtxProjectConfig } from '@ktx/context/project';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import { runKtxAgent } from './agent.js';
import type { KtxAgentRuntime } from './agent-runtime.js';
function makeIo() {
let stdout = '';
let stderr = '';
return {
io: {
stdout: { write: (chunk: string) => (stdout += chunk) },
stderr: { write: (chunk: string) => (stderr += chunk) },
},
stdout: () => stdout,
stderr: () => stderr,
};
}
function runtime(overrides: Record<string, unknown> = {}): KtxAgentRuntime {
const config = buildDefaultKtxProjectConfig('revenue');
return {
project: {
projectDir: '/tmp/revenue',
configPath: '/tmp/revenue/ktx.yaml',
config: {
...config,
connections: {
warehouse: { driver: 'sqlite', path: 'warehouse.sqlite', readonly: true as const },
},
},
coreConfig: {} as KtxAgentRuntime['project']['coreConfig'],
git: {} as KtxAgentRuntime['project']['git'],
fileStore: {} as KtxAgentRuntime['project']['fileStore'],
},
ports: {
connections: { list: vi.fn(async () => [{ id: 'warehouse', name: 'warehouse', connectionType: 'sqlite' }]) },
semanticLayer: {
listSources: vi.fn(async () => ({
sources: [
{
connectionId: 'warehouse',
connectionName: 'warehouse',
name: 'orders',
columnCount: 2,
measureCount: 1,
joinCount: 0,
},
],
totalSources: 1,
})),
readSource: vi.fn(async () => ({ sourceName: 'orders', yaml: 'name: orders\n' })),
writeSource: vi.fn(async () => ({ success: true, sourceName: 'orders' })),
validate: vi.fn(async () => ({ success: true, errors: [], warnings: [] })),
query: vi.fn(async () => ({ sql: 'select 1', headers: ['x'], rows: [[1]], totalRows: 1, plan: {} })),
},
knowledge: {
search: vi.fn(async () => ({
results: [
{
key: 'page-1',
path: 'knowledge/global/page-1.md',
scope: 'GLOBAL' as const,
summary: 'Revenue logic',
score: 0.9,
matchReasons: ['lexical' as const],
},
],
totalFound: 1,
})),
read: vi.fn(async () => ({
key: 'page-1',
scope: 'GLOBAL' as const,
summary: 'Revenue logic',
content: 'Use net revenue.',
})),
write: vi.fn(async () => ({ success: true, key: 'page-1', action: 'created' as const })),
},
},
queryExecutor: {
execute: vi.fn(async () => ({ headers: ['x'], rows: [[1]], totalRows: 1, command: 'SELECT', rowCount: 1 })),
},
...overrides,
};
}
function runtimeWithoutConnections(): KtxAgentRuntime {
const base = runtime();
return {
...base,
project: {
...base.project,
config: {
...base.project.config,
connections: {},
},
},
ports: {
...base.ports,
semanticLayer: {
...base.ports.semanticLayer!,
listSources: vi.fn(async () => ({ sources: [], totalSources: 0 })),
},
},
};
}
describe('runKtxAgent', () => {
let tempDir: string;
beforeEach(async () => {
tempDir = await mkdtemp(join(tmpdir(), 'ktx-agent-'));
});
afterEach(async () => {
await rm(tempDir, { recursive: true, force: true });
});
it('prints tool discovery with every stable command', async () => {
const io = makeIo();
await expect(runKtxAgent({ command: 'tools', projectDir: tempDir, json: true }, io.io)).resolves.toBe(0);
const body = JSON.parse(io.stdout());
expect(body.projectDir).toBe(tempDir);
expect(body.tools.map((tool: { name: string }) => tool.name)).toEqual([
'context',
'sl.list',
'sl.read',
'sl.query',
'wiki.search',
'wiki.read',
'sql.execute',
]);
expect(io.stderr()).toBe('');
});
it('prints project context from setup status, connections, and SL summaries', async () => {
const io = makeIo();
const createRuntime = vi.fn(async () => runtime());
const readSetupStatus = vi.fn(async () => ({ project: { path: tempDir, ready: true }, agents: [] }));
await expect(
runKtxAgent({ command: 'context', projectDir: tempDir, json: true }, io.io, { createRuntime, readSetupStatus }),
).resolves.toBe(0);
expect(JSON.parse(io.stdout())).toMatchObject({
projectDir: tempDir,
status: { project: { ready: true } },
connections: [{ id: 'warehouse' }],
semanticLayer: { totalSources: 1 },
});
});
it('dispatches SL list, SL read, wiki search, and wiki read through local ports', async () => {
for (const args of [
{ command: 'sl-list' as const, projectDir: tempDir, json: true as const, connectionId: 'warehouse' },
{
command: 'sl-read' as const,
projectDir: tempDir,
json: true as const,
connectionId: 'warehouse',
sourceName: 'orders',
},
{ command: 'wiki-search' as const, projectDir: tempDir, json: true as const, query: 'revenue', limit: 10 },
{ command: 'wiki-read' as const, projectDir: tempDir, json: true as const, pageId: 'page-1' },
]) {
const io = makeIo();
await expect(runKtxAgent(args, io.io, { createRuntime: async () => runtime() })).resolves.toBe(0);
expect(JSON.parse(io.stdout())).toBeTruthy();
expect(io.stderr()).toBe('');
}
});
it('prints wiki hybrid search metadata from the hidden agent wiki search command', async () => {
const fakeRuntime = runtime();
const knowledge = fakeRuntime.ports.knowledge;
if (!knowledge) {
throw new Error('Expected runtime knowledge port');
}
fakeRuntime.ports.knowledge = {
...knowledge,
search: vi.fn(async () => ({
results: [
{
key: 'metrics-revenue',
path: 'knowledge/global/metrics-revenue.md',
scope: 'GLOBAL' as const,
summary: 'Revenue metric definition',
score: 0.02459016393442623,
matchReasons: ['lexical' as const, 'token' as const],
},
],
totalFound: 1,
})),
};
const io = makeIo();
await expect(
runKtxAgent({ command: 'wiki-search', projectDir: tempDir, json: true, query: 'paid order', limit: 5 }, io.io, {
createRuntime: async () => fakeRuntime,
}),
).resolves.toBe(0);
expect(JSON.parse(io.stdout())).toEqual({
results: [
expect.objectContaining({
key: 'metrics-revenue',
path: 'knowledge/global/metrics-revenue.md',
matchReasons: ['lexical', 'token'],
}),
],
totalFound: 1,
});
});
it('executes SL queries from a JSON query file', async () => {
const queryFile = join(tempDir, 'sl-query.json');
const io = makeIo();
await writeFile(queryFile, '{"measures":["total_revenue"],"dimensions":[]}', 'utf-8');
await expect(
runKtxAgent(
{
command: 'sl-query',
projectDir: tempDir,
json: true,
connectionId: 'warehouse',
queryFile,
execute: true,
maxRows: 100,
cliVersion: '0.2.0',
runtimeInstallPolicy: 'never',
},
io.io,
{ createRuntime: async () => runtime() },
),
).resolves.toBe(0);
expect(JSON.parse(io.stdout())).toMatchObject({ sql: 'select 1', rows: [[1]] });
});
it('passes managed runtime options into default SL query runtime creation', async () => {
const queryFile = join(tempDir, 'sl-query.json');
const io = makeIo();
const createRuntime = vi.fn(async () => runtime());
await writeFile(queryFile, '{"measures":["total_revenue"],"dimensions":[]}', 'utf-8');
await expect(
runKtxAgent(
{
command: 'sl-query',
projectDir: tempDir,
json: true,
connectionId: 'warehouse',
queryFile,
execute: false,
cliVersion: '0.2.0',
runtimeInstallPolicy: 'auto',
},
io.io,
{ createRuntime },
),
).resolves.toBe(0);
expect(createRuntime).toHaveBeenCalledWith({
projectDir: tempDir,
enableSemanticCompute: true,
enableQueryExecution: false,
cliVersion: '0.2.0',
runtimeInstallPolicy: 'auto',
io: io.io,
});
});
it('executes read-only SQL from a SQL file with an explicit row limit', async () => {
const sqlFile = join(tempDir, 'query.sql');
const fakeRuntime = runtime();
const io = makeIo();
await writeFile(sqlFile, 'select 1', 'utf-8');
await expect(
runKtxAgent(
{
command: 'sql-execute',
projectDir: tempDir,
json: true,
connectionId: 'warehouse',
sqlFile,
maxRows: 100,
},
io.io,
{ createRuntime: async () => fakeRuntime as never },
),
).resolves.toBe(0);
expect(fakeRuntime.queryExecutor?.execute).toHaveBeenCalledWith({
connectionId: 'warehouse',
projectDir: '/tmp/revenue',
connection: { driver: 'sqlite', path: 'warehouse.sqlite', readonly: true },
sql: 'select 1',
maxRows: 100,
});
});
it('prints guided JSON when semantic-layer search runs outside a project', async () => {
const io = makeIo();
const missingProjectError = Object.assign(new Error('ENOENT: no such file or directory'), {
code: 'ENOENT',
path: join(tempDir, 'ktx.yaml'),
});
await expect(
runKtxAgent(
{ command: 'sl-list', projectDir: tempDir, json: true, query: 'gross revenue' },
io.io,
{ createRuntime: vi.fn(async () => Promise.reject(missingProjectError)) },
),
).resolves.toBe(1);
expect(JSON.parse(io.stderr())).toEqual({
ok: false,
error: {
code: 'agent_sl_search_missing_project',
message: `Semantic-layer search needs an initialized KTX project at ${tempDir}.`,
nextSteps: [
`ktx setup --project-dir ${tempDir}`,
`ktx status --project-dir ${tempDir}`,
'ktx ingest <connection>',
`ktx agent sl list --json --query "gross revenue" --project-dir ${tempDir}`,
],
},
});
expect(io.stdout()).toBe('');
});
it('prints guided JSON when semantic-layer search has no configured connections', async () => {
const io = makeIo();
await expect(
runKtxAgent(
{ command: 'sl-list', projectDir: tempDir, json: true, query: 'revenue' },
io.io,
{ createRuntime: async () => runtimeWithoutConnections() },
),
).resolves.toBe(1);
expect(JSON.parse(io.stderr())).toMatchObject({
ok: false,
error: {
code: 'agent_sl_search_no_connections',
message: `Semantic-layer search found no configured connections in ${tempDir}.`,
nextSteps: [
`ktx setup --project-dir ${tempDir}`,
`ktx status --project-dir ${tempDir}`,
'ktx ingest <connection>',
`ktx agent sl list --json --query "revenue" --project-dir ${tempDir}`,
],
},
});
});
it('prints guided JSON when semantic-layer search asks for an unknown connection', async () => {
const io = makeIo();
await expect(
runKtxAgent(
{ command: 'sl-list', projectDir: tempDir, json: true, connectionId: 'missing', query: 'revenue' },
io.io,
{ createRuntime: async () => runtime() },
),
).resolves.toBe(1);
expect(JSON.parse(io.stderr())).toMatchObject({
ok: false,
error: {
code: 'agent_sl_search_unknown_connection',
message: `Semantic-layer search connection "missing" is not configured in ${tempDir}.`,
},
});
});
it('prints guided JSON when semantic-layer search has no indexed sources', async () => {
const fakeRuntime = runtime();
const semanticLayer = fakeRuntime.ports.semanticLayer!;
fakeRuntime.ports.semanticLayer = {
...semanticLayer,
listSources: vi.fn(async () => ({ sources: [], totalSources: 0 })),
};
const io = makeIo();
await expect(
runKtxAgent(
{ command: 'sl-list', projectDir: tempDir, json: true, connectionId: 'warehouse', query: 'revenue' },
io.io,
{ createRuntime: async () => fakeRuntime },
),
).resolves.toBe(1);
expect(JSON.parse(io.stderr())).toMatchObject({
ok: false,
error: {
code: 'agent_sl_search_no_indexed_sources',
message: `Semantic-layer search found no indexed semantic-layer sources in ${tempDir}.`,
},
});
});
it('returns JSON errors when required ports or records are missing', async () => {
const io = makeIo();
await expect(
runKtxAgent({ command: 'wiki-read', projectDir: tempDir, json: true, pageId: 'missing' }, io.io, {
createRuntime: async () =>
runtime({
ports: { knowledge: { read: vi.fn(async () => null) } },
}) as never,
}),
).resolves.toBe(1);
expect(JSON.parse(io.stderr())).toMatchObject({
ok: false,
error: { message: expect.stringContaining('missing') },
});
});
});
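The tests drive `runKtxAgent` through an in-memory IO object and parse the JSON error envelope (`{ ok: false, error: { code, message, nextSteps } }`) from stderr. Below is a minimal sketch of that capture pattern. `CliIo`, `captureIo`, and `writeJsonError` are hypothetical stand-ins; the real `writeAgentJsonError` body is not part of this diff:

```typescript
interface CliIo {
  stdout: { write(chunk: string): unknown };
  stderr: { write(chunk: string): unknown };
}

// Collect everything written to stdout/stderr, like makeIo() in the test file.
function captureIo() {
  const out = { stdout: '', stderr: '' };
  const io: CliIo = {
    stdout: { write: (chunk: string) => (out.stdout += chunk) },
    stderr: { write: (chunk: string) => (out.stderr += chunk) },
  };
  return { io, out };
}

// Hypothetical writer producing the same error envelope the tests parse from stderr.
function writeJsonError(io: CliIo, message: string, extra: { code?: string; nextSteps?: string[] } = {}): void {
  io.stderr.write(`${JSON.stringify({ ok: false, error: { message, ...extra } })}\n`);
}

const { io, out } = captureIo();
writeJsonError(io, 'Semantic-layer search found no configured connections in /tmp/p.', {
  code: 'agent_sl_search_no_connections',
  nextSteps: ['ktx setup --project-dir /tmp/p'],
});
const parsed = JSON.parse(out.stderr);
```

Keeping errors on stderr and results on stdout lets a test assert on both streams independently. The guided-search tests do this by expecting stdout to stay empty.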

@ -1,219 +0,0 @@
import { readFile } from 'node:fs/promises';
import type { KtxCliIo } from './cli-runtime.js';
import {
createKtxAgentRuntime,
parseAgentMaxRows,
readAgentJsonFile,
writeAgentJson,
writeAgentJsonError,
type KtxAgentRuntime,
type KtxAgentRuntimeDeps,
} from './agent-runtime.js';
import {
isMissingProjectConfigError,
missingConnectionSlSearchReadiness,
missingProjectSlSearchReadiness,
noConnectionsSlSearchReadiness,
noIndexedSourcesSlSearchReadiness,
type KtxAgentSlSearchReadinessDetail,
} from './agent-search-readiness.js';
import type { KtxManagedPythonInstallPolicy } from './managed-python-command.js';
import { readKtxSetupStatus, type KtxSetupStatus } from './setup.js';
export type KtxAgentArgs =
| { command: 'tools'; projectDir: string; json: true }
| { command: 'context'; projectDir: string; json: true }
| { command: 'sl-list'; projectDir: string; json: true; connectionId?: string; query?: string }
| { command: 'sl-read'; projectDir: string; json: true; connectionId?: string; sourceName: string }
| {
command: 'sl-query';
projectDir: string;
json: true;
connectionId: string;
queryFile: string;
execute: boolean;
maxRows?: number;
cliVersion: string;
runtimeInstallPolicy: KtxManagedPythonInstallPolicy;
}
| { command: 'wiki-search'; projectDir: string; json: true; query: string; limit: number }
| { command: 'wiki-read'; projectDir: string; json: true; pageId: string }
| { command: 'sql-execute'; projectDir: string; json: true; connectionId: string; sqlFile: string; maxRows?: number };
export interface KtxAgentDeps extends KtxAgentRuntimeDeps {
createRuntime?: (options: {
projectDir: string;
enableSemanticCompute: boolean;
enableQueryExecution: boolean;
cliVersion?: string;
runtimeInstallPolicy?: KtxManagedPythonInstallPolicy;
io?: KtxCliIo;
}) => Promise<KtxAgentRuntime>;
readSetupStatus?: (
projectDir: string,
) => Promise<KtxSetupStatus | { project: { path?: string; ready: boolean }; agents: unknown[] }>;
}
const AGENT_TOOLS = [
{ name: 'context', command: 'ktx agent context --json' },
{ name: 'sl.list', command: 'ktx agent sl list --json [--connection-id <id>] [--query <text>]' },
{ name: 'sl.read', command: 'ktx agent sl read <sourceName> --json [--connection-id <id>]' },
{
name: 'sl.query',
command: 'ktx agent sl query --json --connection-id <id> --query-file <path> --execute --max-rows 100',
},
{ name: 'wiki.search', command: 'ktx agent wiki search <query> --json [--limit 10]' },
{ name: 'wiki.read', command: 'ktx agent wiki read <pageId> --json' },
{
name: 'sql.execute',
command: 'ktx agent sql execute --json --connection-id <id> --sql-file <path> --max-rows 100',
},
] as const;
function writeAgentSlSearchReadinessError(io: KtxCliIo, detail: KtxAgentSlSearchReadinessDetail): void {
writeAgentJsonError(io, detail.message, { code: detail.code, nextSteps: detail.nextSteps });
}
async function runtimeFor(args: KtxAgentArgs, deps: KtxAgentDeps, io: KtxCliIo): Promise<KtxAgentRuntime> {
const needsSemanticCompute = args.command === 'sl-query';
const needsQueryExecution = args.command === 'sql-execute' || (args.command === 'sl-query' && args.execute);
const runtimeOptions = {
projectDir: args.projectDir,
enableSemanticCompute: needsSemanticCompute,
enableQueryExecution: needsQueryExecution,
...(args.command === 'sl-query'
? {
cliVersion: args.cliVersion,
runtimeInstallPolicy: args.runtimeInstallPolicy,
io,
}
: {}),
};
return deps.createRuntime ? deps.createRuntime(runtimeOptions) : createKtxAgentRuntime(runtimeOptions, deps);
}
function connectionIdForSource(runtime: KtxAgentRuntime, requested: string | undefined): string {
if (requested) return requested;
const ids = Object.keys(runtime.project.config.connections ?? {});
if (ids.length === 1) return ids[0] as string;
throw new Error('Use --connection-id when the project has zero or multiple connections.');
}
export async function runKtxAgent(args: KtxAgentArgs, io: KtxCliIo, deps: KtxAgentDeps = {}): Promise<number> {
try {
if (args.command === 'tools') {
writeAgentJson(io, { projectDir: args.projectDir, tools: AGENT_TOOLS });
return 0;
}
const runtime = await runtimeFor(args, deps, io);
if (args.command === 'context') {
const [status, connections, semanticLayer] = await Promise.all([
(deps.readSetupStatus ?? readKtxSetupStatus)(args.projectDir),
runtime.ports.connections?.list() ?? [],
runtime.ports.semanticLayer?.listSources({}) ?? { sources: [], totalSources: 0 },
]);
writeAgentJson(io, { projectDir: args.projectDir, status, connections, semanticLayer, tools: AGENT_TOOLS });
return 0;
}
if (args.command === 'sl-list') {
const semanticLayer = runtime.ports.semanticLayer;
if (!semanticLayer) throw new Error('Semantic-layer tools are not available for this project.');
if (args.query) {
const connectionIds = Object.keys(runtime.project.config.connections ?? {});
if (args.connectionId && !runtime.project.config.connections[args.connectionId]) {
writeAgentSlSearchReadinessError(
io,
missingConnectionSlSearchReadiness(args.projectDir, args.connectionId, args.query),
);
return 1;
}
if (connectionIds.length === 0) {
writeAgentSlSearchReadinessError(io, noConnectionsSlSearchReadiness(args.projectDir, args.query));
return 1;
}
}
const listed = await semanticLayer.listSources({ connectionId: args.connectionId, query: args.query });
if (args.query && listed.sources.length === 0) {
const allSources = await semanticLayer.listSources({ connectionId: args.connectionId });
if (allSources.totalSources === 0) {
writeAgentSlSearchReadinessError(io, noIndexedSourcesSlSearchReadiness(args.projectDir, args.query));
return 1;
}
}
writeAgentJson(io, listed);
return 0;
}
if (args.command === 'sl-read') {
const semanticLayer = runtime.ports.semanticLayer;
if (!semanticLayer) throw new Error('Semantic-layer tools are not available for this project.');
const source = await semanticLayer.readSource({
connectionId: connectionIdForSource(runtime, args.connectionId),
sourceName: args.sourceName,
});
if (!source) throw new Error(`Semantic-layer source "${args.sourceName}" was not found.`);
writeAgentJson(io, source);
return 0;
}
if (args.command === 'sl-query') {
const semanticLayer = runtime.ports.semanticLayer;
if (!semanticLayer) throw new Error('Semantic-layer tools are not available for this project.');
const query = await readAgentJsonFile(args.queryFile);
const maxRows = args.execute ? parseAgentMaxRows(args.maxRows) : args.maxRows;
writeAgentJson(
io,
await semanticLayer.query({
connectionId: args.connectionId,
query: { ...query, ...(maxRows !== undefined ? { limit: maxRows } : {}) } as never,
}),
);
return 0;
}
if (args.command === 'wiki-search') {
const knowledge = runtime.ports.knowledge;
if (!knowledge) throw new Error('Wiki tools are not available for this project.');
writeAgentJson(io, await knowledge.search({ userId: 'agent', query: args.query, limit: args.limit }));
return 0;
}
if (args.command === 'wiki-read') {
const knowledge = runtime.ports.knowledge;
if (!knowledge) throw new Error('Wiki tools are not available for this project.');
const page = await knowledge.read({ userId: 'agent', key: args.pageId });
if (!page) throw new Error(`Wiki page "${args.pageId}" was not found.`);
writeAgentJson(io, page);
return 0;
}
const queryExecutor = runtime.queryExecutor;
if (!queryExecutor) throw new Error('SQL execution is not available for this project.');
const connection = runtime.project.config.connections[args.connectionId];
if (!connection) throw new Error(`Connection "${args.connectionId}" was not found.`);
const maxRows = parseAgentMaxRows(args.maxRows);
writeAgentJson(
io,
await queryExecutor.execute({
connectionId: args.connectionId,
projectDir: runtime.project.projectDir,
connection,
sql: await readFile(args.sqlFile, 'utf-8'),
maxRows,
}),
);
return 0;
} catch (error) {
if (args.command === 'sl-list' && args.query && isMissingProjectConfigError(error)) {
writeAgentSlSearchReadinessError(io, missingProjectSlSearchReadiness(args.projectDir, args.query));
return 1;
}
writeAgentJsonError(io, error instanceof Error ? error.message : String(error));
return 1;
}
}
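With `--query`, the `sl-list` branch checks readiness in a fixed order. An unknown `--connection-id` is reported before an empty connection map. An empty match becomes an error only when a second `listSources` call without the query also finds zero sources. A missing `ktx.yaml` is handled separately in the `catch` block, because it surfaces when runtime creation throws. The sketch below models that order as a pure function. `SlListState` and `slSearchReadiness` are hypothetical names; the returned codes are the ones the handler emits:

```typescript
type ReadinessCode =
  | 'agent_sl_search_unknown_connection'
  | 'agent_sl_search_no_connections'
  | 'agent_sl_search_no_indexed_sources';

// Hypothetical snapshot of what the handler knows for `sl-list --query`.
interface SlListState {
  configuredConnectionIds: string[];
  requestedConnectionId?: string;
  matchedSources: number; // sources returned for the query
  totalSources: number; // sources for the same connection filter, without the query
}

// Returns the first readiness failure, or undefined when the listing can be printed.
function slSearchReadiness(state: SlListState): ReadinessCode | undefined {
  const { configuredConnectionIds, requestedConnectionId } = state;
  // 1. An explicitly requested connection must be configured.
  if (requestedConnectionId && !configuredConnectionIds.includes(requestedConnectionId)) {
    return 'agent_sl_search_unknown_connection';
  }
  // 2. The project needs at least one connection.
  if (configuredConnectionIds.length === 0) {
    return 'agent_sl_search_no_connections';
  }
  // 3. An empty match is only an error when nothing is indexed at all.
  if (state.matchedSources === 0 && state.totalSources === 0) {
    return 'agent_sl_search_no_indexed_sources';
  }
  return undefined;
}
```

A query that matches nothing in a project that does have indexed sources is not treated as an error. It prints the empty listing, so an agent can try a different query.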

@ -2,6 +2,7 @@ import { cancel, confirm, isCancel, log, spinner } from '@clack/prompts';
export interface KtxCliSpinner {
start(message: string): void;
message(message: string): void;
stop(message: string): void;
error(message: string): void;
}
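`KtxCliSpinner` is a plain interface, so code that reports progress can be exercised with a recording double instead of a live `@clack/prompts` spinner. Below is a minimal sketch; `RecordingSpinner` is a hypothetical test helper, not part of the package:

```typescript
interface KtxCliSpinner {
  start(message: string): void;
  message(message: string): void;
  stop(message: string): void;
  error(message: string): void;
}

type SpinnerEvent = [kind: 'start' | 'message' | 'stop' | 'error', text: string];

// Hypothetical test double: records each call instead of drawing to the terminal.
class RecordingSpinner implements KtxCliSpinner {
  readonly events: SpinnerEvent[] = [];
  start(message: string): void {
    this.events.push(['start', message]);
  }
  message(message: string): void {
    this.events.push(['message', message]);
  }
  stop(message: string): void {
    this.events.push(['stop', message]);
  }
  error(message: string): void {
    this.events.push(['error', message]);
  }
}

const spinner = new RecordingSpinner();
spinner.start('Scanning warehouse');
spinner.error('Connection refused');
```

A test can then assert that a failed operation ends with an `error` event rather than a `stop` event.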

@ -1,9 +1,9 @@
import { Command, InvalidArgumentError } from '@commander-js/extra-typings';
import type { KtxCliDeps, KtxCliIo, KtxCliPackageInfo } from './cli-runtime.js';
import { registerAgentCommands } from './commands/agent-commands.js';
import { registerConnectionCommands } from './commands/connection-commands.js';
import { registerIngestCommands } from './commands/ingest-commands.js';
import { registerWikiCommands } from './commands/knowledge-commands.js';
import { registerPublicIngestCommands } from './commands/public-ingest-commands.js';
import { registerScanCommands } from './commands/scan-commands.js';
import { registerSetupCommands } from './commands/setup-commands.js';
import { registerSlCommands } from './commands/sl-commands.js';
import { registerStatusCommands } from './commands/status-commands.js';
@ -53,7 +53,7 @@ type CommandPathNode = CommandWithGlobalOptions & {
parent?: CommandPathNode | null;
};
const PROJECT_AWARE_ROOT_COMMANDS = new Set(['setup', 'connection', 'ingest', 'wiki', 'sl', 'status']);
const PROJECT_AWARE_ROOT_COMMANDS = new Set(['setup', 'connection', 'ingest', 'wiki', 'sl', 'status', 'scan']);
export interface CommandWithGlobalOptions {
opts: () => object;
@ -151,7 +151,7 @@ function isProjectAwareCommand(path: string[]): boolean {
const rootCommand = path[1];
if (rootCommand === 'dev') {
return path[2] !== undefined && path[2] !== 'completion' && path[2] !== 'runtime';
return path[2] !== undefined && path[2] !== 'runtime';
}
return rootCommand !== undefined && PROJECT_AWARE_ROOT_COMMANDS.has(rootCommand);
}
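`isProjectAwareCommand` receives the full command path, with the program name first, so `path[1]` is the root command. The hunk shows both the old and new lines without +/- markers. The self-contained sketch below assumes the second line of each pair is the new one, per unified-diff ordering. It therefore uses the set that includes `scan`, and a `dev` branch that exempts only `runtime`:

```typescript
const PROJECT_AWARE_ROOT_COMMANDS = new Set(['setup', 'connection', 'ingest', 'wiki', 'sl', 'status', 'scan']);

// path looks like ['ktx', 'sl', 'query']; path[0] is the program name.
function isProjectAwareCommand(path: string[]): boolean {
  const rootCommand = path[1];
  if (rootCommand === 'dev') {
    // Every `ktx dev` subcommand except `runtime` resolves a project directory.
    return path[2] !== undefined && path[2] !== 'runtime';
  }
  return rootCommand !== undefined && PROJECT_AWARE_ROOT_COMMANDS.has(rootCommand);
}
```

Under these assumptions, `['ktx', 'scan']` is project-aware, and so is any `ktx dev` subcommand other than `runtime`. Bare `['ktx', 'dev']` is not, and neither is a root command missing from the set, such as `agent`.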
@ -162,6 +162,10 @@ function shouldSuppressProjectDirLine(path: string[], options: Record<string, un
return true;
}
if (commandPathKey === 'ktx setup') {
return true;
}
if (
commandPathKey === 'ktx status' &&
typeof options.projectDir !== 'string' &&
@ -176,14 +180,8 @@ function shouldSuppressProjectDirLine(path: string[], options: Record<string, un
}
if (commandPathKey === 'ktx ingest watch') {
return options.json !== true;
}
if (commandPathKey === 'ktx dev ingest watch') {
return options.json !== true && options.plain !== true;
}
if (commandPathKey === 'ktx connection notion pick') {
return options.input !== false;
}
const demoIndex = path.indexOf('demo');
if (demoIndex >= 0) {
const demoCommand = path[demoIndex + 1];
@ -222,7 +220,7 @@ export function resolveCommandProjectDirOverride(command: CommandWithGlobalOptio
function createBaseProgram(info: KtxCliPackageInfo, io: KtxCliIo): Command {
return new Command()
.name('ktx')
.description('Standalone KTX developer CLI')
.description('KTX data agent context layer CLI')
.option('--project-dir <path>', 'KTX project directory (default: KTX_PROJECT_DIR, nearest ktx.yaml, or cwd)')
.option('--debug', 'Enable diagnostic logging to stderr')
.version(`${info.name} ${info.version}`, '-v, --version', 'Show CLI version')
@ -230,7 +228,7 @@ function createBaseProgram(info: KtxCliPackageInfo, io: KtxCliIo): Command {
.configureHelp({ showGlobalOptions: true })
.addHelpText(
'after',
'\nAdvanced:\n ktx dev Low-level diagnostics, scans, adapter commands, and mapping tools.\n',
'\nAdvanced:\n ktx dev Low-level project initialization and runtime management.\n',
)
.showHelpAfterError()
.exitOverride()
@ -315,11 +313,14 @@ export function buildKtxProgram(options: BuildKtxProgramOptions): Command {
registerSetupCommands(program, context);
registerConnectionCommands(program, context);
registerPublicIngestCommands(program, context);
registerIngestCommands(program, context, {
runIngestWithProgress: async (ingestArgs, ingestIo, ingestDeps, defaultRunIngest) =>
await (ingestDeps.ingest ?? defaultRunIngest)(ingestArgs, ingestIo),
});
registerScanCommands(program, context);
registerWikiCommands(program, context);
registerSlCommands(program, context);
registerStatusCommands(program, context);
registerAgentCommands(program, context);
registerDevCommands(program, context);
return program;

@ -1,13 +1,9 @@
import { createRequire } from 'node:module';
import type { KtxConnectionMetabaseSetupArgs } from './commands/connection-metabase-setup.js';
import type { KtxConnectionNotionArgs } from './commands/connection-notion.js';
import type { KtxAgentArgs } from './agent.js';
import type { KtxConnectionArgs } from './connection.js';
import type { KtxDoctorArgs } from './doctor.js';
import type { KtxIngestArgs } from './ingest.js';
import type { KtxKnowledgeArgs } from './knowledge.js';
import type { KtxPublicIngestArgs } from './public-ingest.js';
import type { KtxRuntimeArgs } from './runtime.js';
import type { KtxScanArgs } from './scan.js';
import type { KtxSetupArgs } from './setup.js';
@ -31,13 +27,9 @@ export interface KtxCliIo {
export interface KtxCliDeps {
setup?: (args: KtxSetupArgs, io: KtxCliIo) => Promise<number>;
agent?: (args: KtxAgentArgs, io: KtxCliIo) => Promise<number>;
connection?: (args: KtxConnectionArgs, io: KtxCliIo) => Promise<number>;
connectionNotion?: (args: KtxConnectionNotionArgs, io: KtxCliIo) => Promise<number>;
connectionMetabaseSetup?: (args: KtxConnectionMetabaseSetupArgs, io: KtxCliIo) => Promise<number>;
doctor?: (args: KtxDoctorArgs, io: KtxCliIo) => Promise<number>;
ingest?: (args: KtxIngestArgs, io: KtxCliIo) => Promise<number>;
publicIngest?: (args: KtxPublicIngestArgs, io: KtxCliIo) => Promise<number>;
runtime?: (args: KtxRuntimeArgs, io: KtxCliIo) => Promise<number>;
scan?: (args: KtxScanArgs, io: KtxCliIo) => Promise<number>;
knowledge?: (args: KtxKnowledgeArgs, io: KtxCliIo) => Promise<number>;

@ -1,33 +1,8 @@
import { z } from 'zod';
const projectDirSchema = z.string().min(1);
const safeConnectionIdSchema = z.string().regex(/^[a-zA-Z0-9][a-zA-Z0-9_-]*$/, 'Unsafe connection id');
const stringArraySchema = z.array(z.string());
export const connectionAddCommandSchema = z.object({
command: z.literal('add'),
projectDir: projectDirSchema,
driver: z.string().min(1),
connectionId: safeConnectionIdSchema,
url: z.string().optional(),
schemas: stringArraySchema,
readonly: z.boolean(),
force: z.boolean(),
allowLiteralCredentials: z.boolean(),
notion: z
.object({
authTokenRef: z.string().min(1),
crawlMode: z.enum(['all_accessible', 'selected_roots']),
rootPageIds: stringArraySchema,
rootDatabaseIds: stringArraySchema,
rootDataSourceIds: stringArraySchema,
maxPagesPerRun: z.number().int().positive().optional(),
maxKnowledgeCreatesPerRun: z.number().int().nonnegative().optional(),
maxKnowledgeUpdatesPerRun: z.number().int().nonnegative().optional(),
})
.optional(),
});
export const wikiWriteCommandSchema = z.object({
command: z.literal('write'),
projectDir: projectDirSchema,
@ -53,35 +28,21 @@ export const slQueryCommandSchema = z.object({
command: z.literal('query'),
projectDir: projectDirSchema,
connectionId: z.string().min(1).optional(),
query: z.object({
measures: z.array(z.string().min(1)).min(1),
dimensions: stringArraySchema,
filters: stringArraySchema.optional(),
segments: stringArraySchema.optional(),
order_by: z.array(orderBySchema).optional(),
limit: z.number().int().positive().optional(),
include_empty: z.literal(true).optional(),
}),
query: z
.object({
measures: z.array(z.string().min(1)).min(1),
dimensions: stringArraySchema,
filters: stringArraySchema.optional(),
segments: stringArraySchema.optional(),
order_by: z.array(orderBySchema).optional(),
limit: z.number().int().positive().optional(),
include_empty: z.literal(true).optional(),
})
.optional(),
queryFile: z.string().min(1).optional(),
format: z.enum(['json', 'sql']),
execute: z.boolean(),
cliVersion: z.string().min(1),
runtimeInstallPolicy: z.enum(['prompt', 'auto', 'never']),
maxRows: z.number().int().positive().optional(),
});
export const publicIngestRunCommandSchema = z.object({
command: z.literal('run'),
projectDir: projectDirSchema,
targetConnectionId: safeConnectionIdSchema.optional(),
all: z.boolean(),
json: z.boolean(),
inputMode: z.enum(['auto', 'disabled']),
});
export const publicIngestReadCommandSchema = z.object({
command: z.enum(['status', 'watch']),
projectDir: projectDirSchema,
runId: z.string().min(1).optional(),
json: z.boolean(),
inputMode: z.enum(['auto', 'disabled']),
});
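`safeConnectionIdSchema` accepts ids that start with an ASCII letter or digit and continue with letters, digits, `_`, or `-`. Among other things, this keeps ids such as `-flag`, `../etc`, or names containing spaces out of config keys and file paths. The same regular expression can be exercised without zod; `isSafeConnectionId` is a hypothetical helper:

```typescript
// Same pattern as safeConnectionIdSchema above.
const SAFE_CONNECTION_ID = /^[a-zA-Z0-9][a-zA-Z0-9_-]*$/;

// Hypothetical helper mirroring the zod regex check without the zod dependency.
function isSafeConnectionId(id: string): boolean {
  return SAFE_CONNECTION_ID.test(id);
}

const candidates = ['warehouse', 'prod_db-2', '_hidden', '-flag', '../etc', 'my db', ''];
const accepted = candidates.filter(isSafeConnectionId);
// accepted: ['warehouse', 'prod_db-2']
```

The pattern has no `g` flag, so `test` keeps no `lastIndex` state between calls. That makes it safe to reuse across `filter`.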

@ -1,149 +0,0 @@
import { Option, type Command } from '@commander-js/extra-typings';
import type { KtxAgentArgs } from '../agent.js';
import type { KtxCliCommandContext } from '../cli-program.js';
import { parsePositiveIntegerOption, resolveCommandProjectDir } from '../cli-program.js';
import { runtimeInstallPolicyFromFlags } from '../managed-python-command.js';
async function runAgent(context: KtxCliCommandContext, args: KtxAgentArgs): Promise<void> {
const runner = context.deps.agent ?? (await import('../agent.js')).runKtxAgent;
context.setExitCode(await runner(args, context.io));
}
function jsonOption(): Option {
return new Option('--json', 'Print JSON output').makeOptionMandatory();
}
export function registerAgentCommands(program: Command, context: KtxCliCommandContext): void {
const agent = program
.command('agent', { hidden: true })
.description('Machine-readable KTX commands for coding agents')
.showHelpAfterError();
agent.hook('preAction', (_thisCommand, actionCommand) => {
context.writeDebug?.('agent', actionCommand);
});
agent
.command('tools')
.description('Print available agent-facing KTX tools')
.addOption(jsonOption())
.action(async (_options, command) => {
await runAgent(context, { command: 'tools', projectDir: resolveCommandProjectDir(command), json: true });
});
agent
.command('context')
.description('Print project context for agent planning')
.addOption(jsonOption())
.action(async (_options, command) => {
await runAgent(context, { command: 'context', projectDir: resolveCommandProjectDir(command), json: true });
});
const sl = agent.command('sl').description('Semantic-layer agent commands');
sl.command('list')
.description('List semantic-layer sources')
.addOption(jsonOption())
.option('--connection-id <id>', 'Filter by connection id')
.option('--query <text>', 'Search source names and descriptions')
.action(async (options: { connectionId?: string; query?: string }, command) => {
await runAgent(context, {
command: 'sl-list',
projectDir: resolveCommandProjectDir(command),
json: true,
...(options.connectionId ? { connectionId: options.connectionId } : {}),
...(options.query ? { query: options.query } : {}),
});
});
sl.command('read')
.description('Read one semantic-layer source')
.argument('<sourceName>')
.addOption(jsonOption())
.option('--connection-id <id>', 'Connection id containing the source')
.action(async (sourceName: string, options: { connectionId?: string }, command) => {
await runAgent(context, {
command: 'sl-read',
projectDir: resolveCommandProjectDir(command),
json: true,
sourceName,
...(options.connectionId ? { connectionId: options.connectionId } : {}),
});
});
sl.command('query')
.description('Run a semantic-layer query JSON file')
.addOption(jsonOption())
.requiredOption('--connection-id <id>', 'Connection id for execution')
.requiredOption('--query-file <path>', 'JSON semantic-layer query file')
.option('--execute', 'Execute the compiled query against the connection', false)
.option('--yes', 'Install the managed Python runtime without prompting when required', false)
.option('--no-input', 'Disable interactive managed runtime installation')
.option('--max-rows <number>', 'Maximum rows to return when executing', parsePositiveIntegerOption)
.action(
async (
options: {
connectionId: string;
queryFile: string;
execute: boolean;
maxRows?: number;
yes?: boolean;
input?: boolean;
},
command,
) => {
await runAgent(context, {
command: 'sl-query',
projectDir: resolveCommandProjectDir(command),
json: true,
connectionId: options.connectionId,
queryFile: options.queryFile,
execute: options.execute,
cliVersion: context.packageInfo.version,
runtimeInstallPolicy: runtimeInstallPolicyFromFlags(options),
...(options.maxRows !== undefined ? { maxRows: options.maxRows } : {}),
});
},
);
const wiki = agent.command('wiki').description('KTX wiki agent commands');
wiki
.command('search')
.description('Search KTX wiki pages')
.argument('<query>')
.addOption(jsonOption())
.option('--limit <number>', 'Maximum search results', parsePositiveIntegerOption, 10)
.action(async (query: string, options: { limit: number }, command) => {
await runAgent(context, {
command: 'wiki-search',
projectDir: resolveCommandProjectDir(command),
json: true,
query,
limit: options.limit,
});
});
wiki
.command('read')
.description('Read one KTX wiki page')
.argument('<pageId>')
.addOption(jsonOption())
.action(async (pageId: string, _options, command) => {
await runAgent(context, { command: 'wiki-read', projectDir: resolveCommandProjectDir(command), json: true, pageId });
});
const sql = agent.command('sql').description('Safe SQL execution commands');
sql
.command('execute')
.description('Execute read-only SQL with a row limit')
.addOption(jsonOption())
.requiredOption('--connection-id <id>', 'Connection id for execution')
.requiredOption('--sql-file <path>', 'SQL file to execute')
.requiredOption('--max-rows <number>', 'Maximum rows to return', parsePositiveIntegerOption)
.action(async (options: { connectionId: string; sqlFile: string; maxRows: number }, command) => {
await runAgent(context, {
command: 'sql-execute',
projectDir: resolveCommandProjectDir(command),
json: true,
connectionId: options.connectionId,
sqlFile: options.sqlFile,
maxRows: options.maxRows,
});
});
}
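`--max-rows` and `--limit` pass `parsePositiveIntegerOption` as commander's custom option parser. Commander calls it with the raw option string and stores the returned value. The helper's body is not part of this diff, so the sketch below is a hypothetical stand-in, and deliberately strict. In a commander program it would normally throw `InvalidArgumentError` (which `cli-program.ts` already imports) so the message is reported as an option error. A plain `Error` keeps the sketch dependency-free:

```typescript
// Hypothetical stand-in for parsePositiveIntegerOption; not the package's implementation.
function parsePositiveInteger(value: string): number {
  // Digits only, no leading zero: rejects '', '0', '-1', '1.5', '1e2', ' 5', and '007'.
  if (!/^[1-9]\d*$/.test(value)) {
    throw new Error(`Expected a positive integer, got "${value}".`);
  }
  const parsed = Number(value);
  // Guard against values beyond Number.MAX_SAFE_INTEGER losing precision.
  if (!Number.isSafeInteger(parsed)) {
    throw new Error(`"${value}" is too large.`);
  }
  return parsed;
}
```

It would be wired up the same way as the real helper, e.g. `.option('--max-rows <number>', 'Maximum rows to return', parsePositiveInteger)`.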

@ -1,47 +0,0 @@
import type { CommandUnknownOpts } from '@commander-js/extra-typings';
import type { KtxCliCommandContext } from '../cli-program.js';
import { completeCommanderInput, installZshCompletion, zshCompletionScript } from '../completion.js';
export function registerCompletionCommands(
program: CommandUnknownOpts,
context: KtxCliCommandContext,
completionRoot: CommandUnknownOpts = program,
): void {
program
.command('completion')
.description('Generate shell completion scripts')
.command('zsh')
.description('Generate zsh completion script')
.option('--install', 'Install zsh completion into ~/.zfunc and update ~/.zshrc', false)
.action(async (options: { install?: boolean }) => {
if (options.install === true) {
const result = await installZshCompletion();
context.io.stdout.write(`Installed zsh completion: ${result.completionPath}\n`);
context.io.stdout.write(`Updated zsh config: ${result.zshrcPath}\n`);
context.io.stdout.write('Restart your shell or run: source ~/.zshrc\n');
context.setExitCode(0);
return;
}
context.io.stdout.write(zshCompletionScript());
context.setExitCode(0);
});
program
.command('__complete', { hidden: true })
.description('Internal shell completion endpoint')
.requiredOption('--shell <shell>', 'Shell requesting completions')
.requiredOption('--position <position>', 'Current shell word position', (value) => Number(value))
.argument('[words...]', 'Current shell words')
.allowUnknownOption()
.allowExcessArguments()
.action((words: string[], options: { shell: string; position: number }) => {
if (options.shell !== 'zsh') {
context.setExitCode(1);
return;
}
for (const completion of completeCommanderInput(completionRoot, { position: options.position, words })) {
context.io.stdout.write(`${completion}\n`);
}
context.setExitCode(0);
});
}
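The hidden `__complete` endpoint above delegates to `completeCommanderInput`, which this diff does not show. The sketch below illustrates one way such an endpoint could resolve subcommand candidates from a command tree. It assumes zsh-style input: `words[0]` is the program name, and `position` is the 1-based index of the word being completed, like zsh's `$CURRENT`. `CommandNode` and `completeSubcommands` are hypothetical names. Option handling is deliberately omitted.

```typescript
// Hypothetical, simplified command tree; not Commander's own types.
interface CommandNode {
  name: string;
  hidden?: boolean;
  subcommands: CommandNode[];
}

// Returns visible subcommand names that match the word at `position`.
function completeSubcommands(root: CommandNode, position: number, words: string[]): string[] {
  const currentIndex = position - 1;
  // Guard against NaN or out-of-range positions from the shell.
  if (!Number.isInteger(currentIndex) || currentIndex < 1 || currentIndex > words.length) {
    return [];
  }
  let node = root;
  // Walk the already-typed words (after the program name) down the tree.
  for (const word of words.slice(1, currentIndex)) {
    if (word.startsWith('-')) continue; // flags are skipped in this sketch
    const next = node.subcommands.find((child) => child.name === word);
    if (!next) return [];
    node = next;
  }
  // An empty prefix means the cursor sits after a trailing space.
  const prefix = words[currentIndex] ?? '';
  return node.subcommands
    .filter((child) => !child.hidden && child.name.startsWith(prefix))
    .map((child) => child.name)
    .sort();
}
```

The `Number.isInteger` guard matters because the `--position` parser above is `(value) => Number(value)`, which yields `NaN` for non-numeric input. Returning no candidates is a safer fallback than indexing with `NaN`.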


@ -1,61 +1,19 @@
import { type Command, InvalidArgumentError, Option } from '@commander-js/extra-typings';
import {
collectOption,
type KtxCliCommandContext,
parseBooleanStringOption,
parseNonEmptyAssignmentOption,
parseNonNegativeIntegerOption,
parsePositiveIntegerOption,
parseSafeConnectionIdOption,
resolveCommandProjectDir,
} from '../cli-program.js';
import { connectionAddCommandSchema } from '../command-schemas.js';
import { type Command } from '@commander-js/extra-typings';
import { type KtxCliCommandContext, resolveCommandProjectDir } from '../cli-program.js';
import type { KtxConnectionArgs } from '../connection.js';
import { profileMark } from '../startup-profile.js';
import type { KtxConnectionMappingArgs } from './connection-mapping.js';
import { registerConnectionMetabaseCommands } from './connection-metabase-commands.js';
import { registerConnectionNotionCommands } from './connection-notion-commands.js';
profileMark('module:commands/connection-commands');
const CRAWL_MODE_CHOICES = ['all_accessible', 'selected_roots'] as const;
const SYNC_MODE_CHOICES = ['ALL', 'ONLY', 'EXCEPT'] as const;
function parseCsvIds(value: string): number[] {
return value
.split(',')
.filter(Boolean)
.map((item) => parsePositiveIntegerOption(item));
}
function parseCsvStrings(value: string): string[] {
return value
.split(',')
.map((item) => item.trim())
.filter(Boolean);
}
function parseMappingFieldOption(value: string): 'databaseMappings' | 'connectionMappings' {
if (value === 'databaseMappings' || value === 'connectionMappings') {
return value;
}
throw new InvalidArgumentError('must be databaseMappings or connectionMappings');
}
async function runConnectionArgs(context: KtxCliCommandContext, args: KtxConnectionArgs): Promise<void> {
const runner = context.deps.connection ?? (await import('../connection.js')).runKtxConnection;
context.setExitCode(await runner(args, context.io));
}
async function runMappingArgs(context: KtxCliCommandContext, args: KtxConnectionMappingArgs): Promise<void> {
const { runKtxConnectionMapping } = await import('./connection-mapping.js');
context.setExitCode(await runKtxConnectionMapping(args, context.io));
}
export function registerConnectionCommands(program: Command, context: KtxCliCommandContext, commandName = 'connection'): void {
const connection = program
.command(commandName)
.description('Add, list, test, and map data sources')
.description('List and test configured connections')
.showHelpAfterError()
.addHelpText(
'after',
@ -83,264 +41,4 @@ export function registerConnectionCommands(program: Command, context: KtxCliComm
connectionId,
});
});
connection
.command('add')
.description('Add or replace a configured connection')
.argument('<driver>', 'Connection driver')
.argument('<connectionId>', 'KTX connection id')
.option('--url <url>', 'Connection URL, env:NAME, or file:/path reference')
.option('--schema <schema>', 'Schema to include; repeatable', collectOption, [])
.option('--readonly', 'Mark the connection as read-only', false)
.option('--force', 'Replace an existing connection', false)
.option('--allow-literal-credentials', 'Allow writing a literal credential URL to ktx.yaml', false)
.addOption(new Option('--token-env <name>', 'Environment variable containing Notion auth token').conflicts('tokenFile'))
.addOption(new Option('--token-file <path>', 'File containing Notion auth token').conflicts('tokenEnv'))
.addOption(
new Option('--crawl-mode <mode>', 'Notion crawl mode: all_accessible or selected_roots')
.choices(CRAWL_MODE_CHOICES)
.default('selected_roots'),
)
.option('--root-page-id <id>', 'Root page to crawl; repeatable', collectOption, [])
.option('--root-database-id <id>', 'Root database to crawl; repeatable', collectOption, [])
.option('--root-data-source-id <id>', 'Root data source to crawl; repeatable', collectOption, [])
.option('--max-pages <n>', 'Maximum pages per run', parsePositiveIntegerOption)
.option('--max-knowledge-creates <n>', 'Maximum knowledge creates per run', parseNonNegativeIntegerOption)
.option('--max-knowledge-updates <n>', 'Maximum knowledge updates per run', parseNonNegativeIntegerOption)
.action(async (driver: string, connectionId: string, options, command) => {
const notion =
driver === 'notion'
? {
authTokenRef: options.tokenEnv
? `env:${options.tokenEnv}`
: options.tokenFile
? `file:${options.tokenFile}`
: '',
crawlMode: options.crawlMode,
rootPageIds: options.rootPageId,
rootDatabaseIds: options.rootDatabaseId,
rootDataSourceIds: options.rootDataSourceId,
maxPagesPerRun: options.maxPages,
maxKnowledgeCreatesPerRun: options.maxKnowledgeCreates,
maxKnowledgeUpdatesPerRun: options.maxKnowledgeUpdates,
}
: undefined;
if (driver === 'notion' && !notion?.authTokenRef) {
throw new Error('connection add notion requires --token-env NAME or --token-file PATH');
}
if (
driver === 'notion' &&
notion?.crawlMode === 'selected_roots' &&
notion.rootPageIds.length + notion.rootDatabaseIds.length + notion.rootDataSourceIds.length === 0
) {
throw new Error('connection add notion selected_roots requires at least one root id');
}
const args = connectionAddCommandSchema.parse({
command: 'add',
projectDir: resolveCommandProjectDir(command),
driver,
connectionId,
url: options.url,
schemas: options.schema.filter(Boolean),
readonly: options.readonly === true,
force: options.force === true,
allowLiteralCredentials: options.allowLiteralCredentials === true,
notion,
});
await runConnectionArgs(context, args);
});
connection
.command('remove')
.description('Remove a configured connection from ktx.yaml')
.argument('<connectionId>', 'KTX connection id')
.option('--force', 'Remove without prompting', false)
.option('--no-input', 'Disable interactive terminal input')
.action(async (connectionId: string, options: { force?: boolean; input?: boolean }, command) => {
await runConnectionArgs(context, {
command: 'remove',
projectDir: resolveCommandProjectDir(command),
connectionId,
force: options.force === true,
...(options.input === false ? { inputMode: 'disabled' } : {}),
});
});
connection
.command('map')
.description('Refresh and validate BI-to-warehouse mappings')
.argument('<sourceConnectionId>', 'Source BI connection id')
.option('--json', 'Print JSON output', false)
.action(async (sourceConnectionId: string, options: { json?: boolean }, command) => {
await runConnectionArgs(context, {
command: 'map',
projectDir: resolveCommandProjectDir(command),
sourceConnectionId,
json: options.json === true,
});
});
registerConnectionMappingCommands(connection, context);
registerConnectionMetabaseCommands(connection, context);
registerConnectionNotionCommands(connection, context);
}
export function registerConnectionMappingCommands(connection: Command, context: KtxCliCommandContext): void {
const mapping = connection
.command('mapping')
.description('Manage Metabase warehouse mappings')
.showHelpAfterError()
.addHelpText(
'after',
'\nProject directory defaults to KTX_PROJECT_DIR when set, otherwise the current working directory.\n',
);
mapping
.command('list')
.description('List Metabase database mappings')
.argument('<connectionId>', 'Metabase connection id')
.option('--json', 'Print JSON output where supported', false)
.action(async (connectionId: string, options: { json?: boolean }, command) => {
await runMappingArgs(context, {
command: 'list',
projectDir: resolveCommandProjectDir(command),
connectionId,
json: options.json === true,
});
});
mapping
.command('set')
.description('Set a Metabase or Looker warehouse mapping')
.argument('<connectionId>', 'Source connection id', parseSafeConnectionIdOption)
.argument('<field>', 'Mapping field', parseMappingFieldOption)
.argument('<assignment>', 'Mapping assignment such as 1=prod-warehouse', parseNonEmptyAssignmentOption)
.action(
async (
connectionId: string,
field: 'databaseMappings' | 'connectionMappings',
assignment: { key: string; value: string },
_options: unknown,
command,
) => {
await runMappingArgs(context, {
command: 'set',
projectDir: resolveCommandProjectDir(command),
connectionId,
field,
key: assignment.key,
value: assignment.value,
});
},
);
mapping
.command('apply-bulk')
.description('Apply mappings from JSON')
.argument('<connectionId>', 'Metabase connection id')
.requiredOption('--file <path>', 'JSON mapping file')
.action(async (connectionId: string, options: { file: string }, command) => {
await runMappingArgs(context, {
command: 'apply-bulk',
projectDir: resolveCommandProjectDir(command),
connectionId,
filePath: options.file,
});
});
mapping
.command('set-sync-enabled')
.description('Enable or disable sync for one Metabase database')
.argument('<connectionId>', 'Metabase connection id')
.argument('<metabaseDatabaseId>', 'Metabase database id', parsePositiveIntegerOption)
.requiredOption('--enabled <value>', 'true or false', parseBooleanStringOption)
.action(
async (connectionId: string, metabaseDatabaseId: number, options: { enabled: boolean }, command) => {
await runMappingArgs(context, {
command: 'set-sync-enabled',
projectDir: resolveCommandProjectDir(command),
connectionId,
metabaseDatabaseId,
enabled: options.enabled,
});
},
);
const syncState = mapping.command('sync-state').description('Manage Metabase sync-state selection');
syncState
.command('get')
.description('Read sync-state selection')
.argument('<connectionId>', 'Metabase connection id')
.option('--json', 'Print JSON output where supported', false)
.action(async (connectionId: string, options: { json?: boolean }, command) => {
await runMappingArgs(context, {
command: 'sync-state-get',
projectDir: resolveCommandProjectDir(command),
connectionId,
json: options.json === true,
});
});
syncState
.command('set')
.description('Write sync-state selection')
.argument('<connectionId>', 'Metabase connection id')
.addOption(new Option('--mode <mode>', 'ALL, ONLY, or EXCEPT').choices(SYNC_MODE_CHOICES).makeOptionMandatory())
.option('--collections <ids>', 'Comma-separated collection ids', parseCsvIds, [])
.option('--items <ids>', 'Comma-separated item ids', parseCsvIds, [])
.option('--tag-names <names>', 'Comma-separated tag names', parseCsvStrings, [])
.action(async (connectionId: string, options, command) => {
await runMappingArgs(context, {
command: 'sync-state-set',
projectDir: resolveCommandProjectDir(command),
connectionId,
syncMode: options.mode,
collectionIds: options.collections,
itemIds: options.items,
tagNames: options.tagNames,
});
});
mapping
.command('refresh')
.description('Refresh Metabase database mappings')
.argument('<connectionId>', 'Metabase connection id')
.option('--auto-accept', 'Accept refresh changes without prompting', false)
.action(async (connectionId: string, options: { autoAccept?: boolean }, command) => {
await runMappingArgs(context, {
command: 'refresh',
projectDir: resolveCommandProjectDir(command),
connectionId,
autoAccept: options.autoAccept === true,
});
});
mapping
.command('validate')
.description('Validate Metabase database mappings')
.argument('<connectionId>', 'Metabase connection id')
.action(async (connectionId: string, _options: unknown, command) => {
await runMappingArgs(context, {
command: 'validate',
projectDir: resolveCommandProjectDir(command),
connectionId,
});
});
mapping
.command('clear')
.description('Clear Metabase database mappings')
.argument('<connectionId>', 'Metabase connection id')
.argument('[metabaseDatabaseId]', 'Metabase database id', parsePositiveIntegerOption)
.action(async (connectionId: string, metabaseDatabaseId: number | undefined, _options: unknown, command) => {
await runMappingArgs(context, {
command: 'clear',
projectDir: resolveCommandProjectDir(command),
connectionId,
...(metabaseDatabaseId ? { metabaseDatabaseId } : {}),
});
});
}
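`parseCsvIds` above splits on commas and drops empty items but does not trim them. Whether `--collections "1, 2"` is accepted therefore depends on how `parsePositiveIntegerOption` treats the leading space. Below is a minimal sketch of a stricter pair of helpers, assuming whitespace around items should be ignored. `splitCsv` and `parseCsvPositiveIds` are hypothetical names.

```typescript
// Split a comma-separated value, trimming items and dropping empty ones.
function splitCsv(value: string): string[] {
  return value
    .split(',')
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}

// Parse a comma-separated list of positive integer ids, rejecting bad items.
function parseCsvPositiveIds(value: string): number[] {
  return splitCsv(value).map((item) => {
    const id = Number(item);
    if (!/^\d+$/.test(item) || id < 1 || !Number.isSafeInteger(id)) {
      throw new Error(`Invalid id "${item}": expected a positive integer`);
    }
    return id;
  });
}
```

Commander passes the previous option value, starting with the default, as a second argument to custom parsers. With a one-argument parser like `parseCsvIds`, repeating `--collections` therefore replaces earlier values instead of accumulating them. A parser written as `(value, previous) => [...previous, ...parseCsvPositiveIds(value)]` would accumulate them.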


@ -1,329 +0,0 @@
import { mkdtemp, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { LocalMetabaseSourceStateReader } from '@ktx/context/ingest';
import { initKtxProject, loadKtxProject, serializeKtxProjectConfig } from '@ktx/context/project';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import { runKtxConnectionMapping } from './connection-mapping.js';
function makeIo() {
let stdout = '';
let stderr = '';
return {
io: {
stdout: {
write: (chunk: string) => {
stdout += chunk;
},
},
stderr: {
write: (chunk: string) => {
stderr += chunk;
},
},
},
stdout: () => stdout,
stderr: () => stderr,
};
}
describe('runKtxConnectionMapping', () => {
let tempDir: string;
let projectDir: string;
beforeEach(async () => {
tempDir = await mkdtemp(join(tmpdir(), 'ktx-cli-metabase-mapping-'));
projectDir = join(tempDir, 'project');
await initKtxProject({ projectDir, projectName: 'mapping' });
const project = await loadKtxProject({ projectDir });
await project.fileStore.writeFile(
'ktx.yaml',
serializeKtxProjectConfig({
...project.config,
connections: {
'prod-metabase': {
driver: 'metabase',
api_url: 'https://metabase.example.com',
api_key_ref: 'env:METABASE_API_KEY', // pragma: allowlist secret
},
'prod-warehouse': {
driver: 'postgres',
url: 'env:WAREHOUSE_URL',
readonly: true,
},
},
}),
'ktx',
'ktx@example.com',
'Seed Metabase mapping test connections',
);
});
async function replaceConnections(connections: Record<string, { driver: string; [key: string]: unknown }>) {
const project = await loadKtxProject({ projectDir });
await project.fileStore.writeFile(
'ktx.yaml',
serializeKtxProjectConfig({
...project.config,
connections,
}),
'ktx',
'ktx@example.com',
'Replace mapping test connections',
);
}
afterEach(async () => {
await rm(tempDir, { recursive: true, force: true });
});
it('sets, lists, disables, and clears local Metabase mappings', async () => {
const io = makeIo();
await expect(
runKtxConnectionMapping(
{
command: 'set',
projectDir,
connectionId: 'prod-metabase',
field: 'databaseMappings',
key: '1',
value: 'prod-warehouse',
},
io.io,
),
).resolves.toBe(0);
const listIo = makeIo();
await expect(
runKtxConnectionMapping({ command: 'list', projectDir, connectionId: 'prod-metabase', json: false }, listIo.io),
).resolves.toBe(0);
expect(listIo.stdout()).toContain('1 -> prod-warehouse');
expect(listIo.stdout()).toContain('unhydrated');
await expect(
runKtxConnectionMapping(
{
command: 'set-sync-enabled',
projectDir,
connectionId: 'prod-metabase',
metabaseDatabaseId: 1,
enabled: false,
},
makeIo().io,
),
).resolves.toBe(0);
await expect(
runKtxConnectionMapping(
{
command: 'clear',
projectDir,
connectionId: 'prod-metabase',
metabaseDatabaseId: 1,
},
makeIo().io,
),
).resolves.toBe(0);
});
it('lists Metabase yaml mapping bootstrap rows before any SQLite command writes', async () => {
const projectDir = await mkdtemp(join(tmpdir(), 'ktx-cli-yaml-mapping-'));
await initKtxProject({ projectDir, projectName: 'yaml-mapping' });
const project = await loadKtxProject({ projectDir });
await project.fileStore.writeFile(
'ktx.yaml',
serializeKtxProjectConfig({
...project.config,
connections: {
'prod-metabase': {
driver: 'metabase',
mappings: {
databaseMappings: { '1': 'prod-warehouse' },
syncEnabled: { '1': true },
},
},
'prod-warehouse': { driver: 'postgres', url: 'postgresql://readonly@db.test/analytics' },
},
}),
'ktx',
'ktx@example.com',
'Seed yaml mappings',
);
const io = makeIo();
await expect(
runKtxConnectionMapping(
{ command: 'list', projectDir, connectionId: 'prod-metabase', json: false },
io.io,
),
).resolves.toBe(0);
expect(io.stdout()).toContain('1 -> prod-warehouse');
expect(io.stdout()).toContain('source: ktx.yaml');
});
it('refreshes Metabase discovery metadata through the injected runtime client', async () => {
const client = {
getDatabases: vi.fn().mockResolvedValue([
{
id: 1,
name: 'Analytics',
engine: 'postgres',
details: { host: 'pg.internal', dbname: 'analytics' },
is_sample: false,
},
]),
cleanup: vi.fn(),
};
const io = makeIo();
await expect(
runKtxConnectionMapping(
{
command: 'refresh',
projectDir,
connectionId: 'prod-metabase',
autoAccept: true,
},
io.io,
{
createMetabaseClient: async () => client as never,
},
),
).resolves.toBe(0);
expect(io.stdout()).toContain('Discovery: 1 database');
expect(client.cleanup).toHaveBeenCalledTimes(1);
const store = new LocalMetabaseSourceStateReader({ dbPath: join(projectDir, '.ktx', 'db.sqlite') });
await expect(store.listDatabaseMappings('prod-metabase')).resolves.toMatchObject([
{ metabaseDatabaseId: 1, metabaseDatabaseName: 'Analytics', source: 'refresh' },
]);
});
it('sets and lists Looker connection mappings', async () => {
await replaceConnections({
'prod-looker': {
driver: 'looker',
base_url: 'https://looker.example.test',
client_id: 'id',
},
'prod-warehouse': {
driver: 'postgres',
url: 'postgresql://readonly@db.example.test/analytics',
},
});
const io = makeIo();
await expect(
runKtxConnectionMapping(
{
command: 'set',
projectDir,
connectionId: 'prod-looker',
field: 'connectionMappings',
key: 'analytics',
value: 'prod-warehouse',
},
io.io,
),
).resolves.toBe(0);
await expect(
runKtxConnectionMapping({ command: 'list', projectDir, connectionId: 'prod-looker', json: false }, io.io),
).resolves.toBe(0);
expect(io.stdout()).toContain('analytics -> prod-warehouse');
});
it('keeps driver-specific mapping field validation in the runner', async () => {
await replaceConnections({
'prod-looker': { driver: 'looker', base_url: 'https://looker.example.com' },
warehouse: { driver: 'postgres', url: 'env:WAREHOUSE_URL' },
});
const io = makeIo();
await expect(
runKtxConnectionMapping(
{
command: 'set',
projectDir,
connectionId: 'prod-looker',
field: 'databaseMappings',
key: '1',
value: 'warehouse',
},
io.io,
),
).resolves.toBe(1);
expect(io.stderr()).toContain('Looker mapping set requires connectionMappings');
});
it('refreshes Looker mapping metadata and reports drift', async () => {
await replaceConnections({
'prod-looker': {
driver: 'looker',
base_url: 'https://looker.example.test',
client_id: 'id',
},
'prod-warehouse': {
driver: 'postgres',
url: 'postgresql://readonly@db.example.test/analytics',
},
});
const io = makeIo();
await expect(
runKtxConnectionMapping(
{ command: 'refresh', projectDir, connectionId: 'prod-looker', autoAccept: true },
io.io,
{
createLookerClient: async () => ({
listLookerConnections: async () => [
{
name: 'analytics',
host: 'db.example.test',
database: 'analytics',
schema: null,
dialect: 'postgres',
},
],
cleanup: async () => {},
}),
},
),
).resolves.toBe(0);
expect(io.stdout()).toContain('Discovery: 1 connection');
expect(io.stdout()).toContain('Unmapped discovered: 1');
});
it('validates Looker mappings through the canonical local warehouse descriptor', async () => {
const projectDir = await mkdtemp(join(tmpdir(), 'ktx-cli-descriptor-validation-'));
await initKtxProject({ projectDir, projectName: 'descriptor-validation' });
const project = await loadKtxProject({ projectDir });
await project.fileStore.writeFile(
'ktx.yaml',
serializeKtxProjectConfig({
...project.config,
connections: {
'prod-looker': {
driver: 'looker',
mappings: { connectionMappings: { analytics: 'prod-warehouse' } },
},
'prod-warehouse': { driver: 'postgresql', url: 'postgresql://readonly@db.test/analytics' },
},
}),
'ktx',
'ktx@example.com',
'Seed descriptor validation',
);
const io = makeIo();
await expect(
runKtxConnectionMapping({ command: 'validate', projectDir, connectionId: 'prod-looker' }, io.io),
).resolves.toBe(0);
expect(io.stdout()).toContain('Mapping validation passed: prod-looker');
expect(io.stderr()).toBe('');
});
});
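In the `apply-bulk` handler of `connection-mapping.ts` (removed below), the per-database target is computed as `databaseMappings[id] ?? existing?.targetConnectionId ?? null`. Because `??` treats `null` like a missing key, a payload entry such as `"2": null` appears to keep the existing target rather than clear it. This happens even though `MetabaseBulkMappingPayload` types the value as `string | null`. If clearing was the intended meaning (an assumption), an own-property check can distinguish the two cases. Below is a minimal pure-function sketch of that merge. The names are hypothetical, and only the fields needed to show the merge are included.

```typescript
interface ExistingMapping {
  metabaseDatabaseId: number;
  targetConnectionId: string | null;
  syncEnabled: boolean;
}

interface BulkMappingPayload {
  databaseMappings?: Record<string, string | null>;
  syncEnabled?: Record<string, boolean>;
}

function toDatabaseId(key: string): number {
  const id = Number(key);
  if (!Number.isInteger(id) || id < 1) {
    throw new Error(`metabaseDatabaseId must be a positive integer, received "${key}"`);
  }
  return id;
}

function hasOwn(record: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(record, key);
}

// Merge existing rows with a bulk payload. Payload values win; an explicit
// null target clears the mapping instead of falling back to the existing one.
function mergeBulkMappings(existing: ExistingMapping[], payload: BulkMappingPayload): ExistingMapping[] {
  const byId = new Map(existing.map((row): [number, ExistingMapping] => [row.metabaseDatabaseId, row]));
  const targets = payload.databaseMappings ?? {};
  const sync = payload.syncEnabled ?? {};
  const ids = new Set<number>([
    ...existing.map((row) => row.metabaseDatabaseId),
    ...Object.keys(targets).map(toDatabaseId),
    ...Object.keys(sync).map(toDatabaseId),
  ]);
  return Array.from(ids)
    .sort((a, b) => a - b)
    .map((id) => {
      const key = String(id);
      const prior = byId.get(id);
      return {
        metabaseDatabaseId: id,
        targetConnectionId: hasOwn(targets, key) ? (targets[key] ?? null) : (prior?.targetConnectionId ?? null),
        // Sync flags are never null, so `??` fallback is safe here.
        syncEnabled: sync[key] ?? prior?.syncEnabled ?? false,
      };
    });
}
```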


@ -1,426 +0,0 @@
import { readFile } from 'node:fs/promises';
import { localConnectionToWarehouseDescriptor } from '@ktx/context/connections';
import {
DEFAULT_METABASE_CLIENT_CONFIG,
DefaultLookerConnectionClientFactory,
DefaultMetabaseConnectionClientFactory,
LocalLookerRuntimeStore,
LocalMetabaseSourceStateReader,
computeLookerMappingDrift,
computeMetabaseMappingDrift,
discoverLookerConnections,
discoverMetabaseDatabases,
lookerCredentialsFromLocalConnection,
metabaseRuntimeConfigFromLocalConnection,
seedLocalMappingStateFromKtxYaml,
validateLookerMappings,
validateMappingPhysicalMatch,
type LookerMappingClient,
type MetabaseRuntimeClient,
type MetabaseSyncMode,
} from '@ktx/context/ingest';
import { type KtxLocalProject, ktxLocalStateDbPath, loadKtxProject } from '@ktx/context/project';
import type { KtxCliIo } from '../index.js';
import { profileMark } from '../startup-profile.js';
profileMark('module:commands/connection-mapping');
export type KtxConnectionMappingArgs =
| { command: 'list'; projectDir: string; connectionId: string; json: boolean }
| {
command: 'set';
projectDir: string;
connectionId: string;
field: 'databaseMappings' | 'connectionMappings';
key: string;
value: string;
}
| { command: 'apply-bulk'; projectDir: string; connectionId: string; filePath: string }
| {
command: 'set-sync-enabled';
projectDir: string;
connectionId: string;
metabaseDatabaseId: number;
enabled: boolean;
}
| { command: 'sync-state-get'; projectDir: string; connectionId: string; json: boolean }
| {
command: 'sync-state-set';
projectDir: string;
connectionId: string;
syncMode: MetabaseSyncMode;
collectionIds: number[];
itemIds: number[];
tagNames: string[];
}
| { command: 'refresh'; projectDir: string; connectionId: string; autoAccept: boolean }
| { command: 'validate'; projectDir: string; connectionId: string }
| { command: 'clear'; projectDir: string; connectionId: string; metabaseDatabaseId?: number; mappingKey?: string };
interface KtxConnectionMappingDeps {
createMetabaseClient?: (
project: KtxLocalProject,
connectionId: string,
) => Promise<Pick<MetabaseRuntimeClient, 'getDatabases' | 'cleanup'>>;
createLookerClient?: (
project: KtxLocalProject,
connectionId: string,
) => Promise<Pick<LookerMappingClient, 'listLookerConnections'> & { cleanup?(): Promise<void> }>;
}
interface MetabaseBulkMappingPayload {
databaseMappings?: Record<string, string | null>;
syncEnabled?: Record<string, boolean>;
syncMode?: MetabaseSyncMode;
selections?: { collections?: number[]; items?: number[] };
defaultTagNames?: string[];
}
function parseId(value: string, label: string): number {
const parsed = Number(value);
if (!Number.isInteger(parsed) || parsed < 1) {
throw new Error(`${label} must be a positive integer`);
}
return parsed;
}
async function createDefaultMetabaseClient(
project: KtxLocalProject,
connectionId: string,
): Promise<Pick<MetabaseRuntimeClient, 'getDatabases' | 'cleanup'>> {
const factory = new DefaultMetabaseConnectionClientFactory(
(metabaseConnectionId) =>
metabaseRuntimeConfigFromLocalConnection(metabaseConnectionId, project.config.connections[metabaseConnectionId]),
DEFAULT_METABASE_CLIENT_CONFIG,
);
return factory.createClient(connectionId);
}
async function createDefaultLookerClient(
project: KtxLocalProject,
connectionId: string,
): Promise<Pick<LookerMappingClient, 'listLookerConnections'> & { cleanup?(): Promise<void> }> {
const factory = new DefaultLookerConnectionClientFactory({
async resolve(lookerConnectionId) {
return lookerCredentialsFromLocalConnection(lookerConnectionId, project.config.connections[lookerConnectionId]);
},
});
return factory.createClient(connectionId) as unknown as Pick<LookerMappingClient, 'listLookerConnections'> & {
cleanup?(): Promise<void>;
};
}
function isLookerConnection(project: KtxLocalProject, connectionId: string): boolean {
return String(project.config.connections[connectionId]?.driver ?? '').toLowerCase() === 'looker';
}
function assertLookerConnection(project: KtxLocalProject, connectionId: string): void {
if (!isLookerConnection(project, connectionId)) {
throw new Error(`Connection "${connectionId}" is not a Looker connection`);
}
}
function assertMetabaseConnection(project: KtxLocalProject, connectionId: string): void {
const connection = project.config.connections[connectionId];
if (!connection || String(connection.driver).toLowerCase() !== 'metabase') {
throw new Error(`Connection "${connectionId}" is not a Metabase connection`);
}
}
function assertTargetConnection(project: KtxLocalProject, connectionId: string): void {
if (!project.config.connections[connectionId]) {
throw new Error(`Target connection "${connectionId}" does not exist`);
}
}
function targetPhysicalInfo(project: KtxLocalProject, connectionId: string) {
const descriptor = localConnectionToWarehouseDescriptor(connectionId, project.config.connections[connectionId]);
if (!descriptor) {
return { connection_type: 'UNKNOWN' };
}
return {
connection_type: descriptor.connection_type,
host: descriptor.host ?? null,
database: descriptor.database ?? null,
account: descriptor.account ?? null,
project_id: descriptor.project_id ?? null,
dataset_id: descriptor.dataset_id ?? null,
...descriptor.connection_params,
};
}
function renderMapping(
row: Awaited<ReturnType<LocalMetabaseSourceStateReader['listDatabaseMappings']>>[number],
): string {
const name = row.metabaseDatabaseName ?? 'unhydrated';
const target = row.targetConnectionId ?? '[unmapped]';
return `${row.metabaseDatabaseId} -> ${target} (${name}, sync: ${row.syncEnabled ? 'on' : 'off'}, source: ${
row.source
})`;
}
function renderLookerMapping(row: Awaited<ReturnType<LocalLookerRuntimeStore['listConnectionMappings']>>[number]): string {
const target = row.ktxConnectionId ?? '[unmapped]';
const metadata = [row.lookerDialect, row.lookerHost, row.lookerDatabase].filter(Boolean).join(', ');
return `${row.lookerConnectionName} -> ${target}${metadata ? ` (${metadata}, source: ${row.source})` : ` (source: ${row.source})`}`;
}
export async function runKtxConnectionMapping(
args: KtxConnectionMappingArgs,
io: KtxCliIo = process,
deps: KtxConnectionMappingDeps = {},
): Promise<number> {
try {
const project = await loadKtxProject({ projectDir: args.projectDir });
await seedLocalMappingStateFromKtxYaml(project, args.connectionId);
if (isLookerConnection(project, args.connectionId)) {
assertLookerConnection(project, args.connectionId);
const store = new LocalLookerRuntimeStore({ dbPath: ktxLocalStateDbPath(project) });
if (args.command === 'list') {
const rows = await store.listConnectionMappings(args.connectionId);
io.stdout.write(args.json ? `${JSON.stringify(rows, null, 2)}\n` : `${rows.map(renderLookerMapping).join('\n')}\n`);
return 0;
}
if (args.command === 'set') {
if (args.field !== 'connectionMappings') {
throw new Error('Looker mapping set requires connectionMappings <lookerConnectionName>=<targetConnectionId>');
}
assertTargetConnection(project, args.value);
await store.upsertConnectionMapping({
lookerConnectionId: args.connectionId,
lookerConnectionName: args.key,
ktxConnectionId: args.value,
source: 'cli',
});
io.stdout.write(`Set connectionMappings.${args.key} = ${args.value}\n`);
return 0;
}
if (args.command === 'refresh') {
const client = await (deps.createLookerClient ?? createDefaultLookerClient)(project, args.connectionId);
try {
const discovered = await discoverLookerConnections(client);
const drift = computeLookerMappingDrift({
storedMappings: await store.readMappings(args.connectionId),
discovered,
});
if (args.autoAccept) {
await store.refreshDiscoveredConnections({ lookerConnectionId: args.connectionId, discovered });
}
io.stdout.write(`Discovery: ${discovered.length} ${discovered.length === 1 ? 'connection' : 'connections'}\n`);
io.stdout.write(`Unmapped discovered: ${drift.unmappedDiscovered.length}\n`);
io.stdout.write(`Stale mappings: ${drift.staleMappings.length}\n`);
return 0;
} finally {
await client.cleanup?.();
}
}
if (args.command === 'validate') {
const knownKtxConnectionIds = new Set(Object.keys(project.config.connections));
const knownConnectionTypes = new Map(
Object.entries(project.config.connections).map(([id, _config]) => [id, targetPhysicalInfo(project, id).connection_type]),
);
const validation = validateLookerMappings({
mappings: await store.readMappings(args.connectionId),
knownKtxConnectionIds,
knownConnectionTypes,
});
if (!validation.ok) {
for (const error of validation.errors) {
io.stderr.write(`${error.key}: ${error.reason}\n`);
}
return 1;
}
io.stdout.write(`Mapping validation passed: ${args.connectionId}\n`);
return 0;
}
if (args.command === 'clear') {
await store.clearConnectionMappings({
lookerConnectionId: args.connectionId,
lookerConnectionName: args.mappingKey ?? (args.metabaseDatabaseId ? String(args.metabaseDatabaseId) : undefined),
});
io.stdout.write(
args.mappingKey
? `Cleared connectionMappings.${args.mappingKey}\n`
: `Cleared mappings for ${args.connectionId}\n`,
);
return 0;
}
throw new Error(`Looker connection mapping does not support ${args.command}`);
}
assertMetabaseConnection(project, args.connectionId);
const store = new LocalMetabaseSourceStateReader({ dbPath: ktxLocalStateDbPath(project) });
if (args.command === 'list') {
const rows = await store.listDatabaseMappings(args.connectionId);
io.stdout.write(args.json ? `${JSON.stringify(rows, null, 2)}\n` : `${rows.map(renderMapping).join('\n')}\n`);
return 0;
}
if (args.command === 'set') {
assertTargetConnection(project, args.value);
await store.upsertDatabaseMapping({
connectionId: args.connectionId,
metabaseDatabaseId: parseId(args.key, 'metabaseDatabaseId'),
targetConnectionId: args.value,
syncEnabled: true,
source: 'cli',
});
io.stdout.write(`Set databaseMappings.${args.key} = ${args.value}\n`);
return 0;
}
if (args.command === 'apply-bulk') {
const payload = JSON.parse(await readFile(args.filePath, 'utf8')) as MetabaseBulkMappingPayload;
const existingState = await store.getSourceState(args.connectionId);
const existingRows = await store.listDatabaseMappings(args.connectionId);
const existingById = new Map(existingRows.map((row) => [row.metabaseDatabaseId, row]));
const databaseMappings = payload.databaseMappings ?? {};
for (const targetConnectionId of Object.values(databaseMappings)) {
if (targetConnectionId) {
assertTargetConnection(project, targetConnectionId);
}
}
const mappingIds = new Set([
...existingRows.map((row) => row.metabaseDatabaseId),
...Object.keys(databaseMappings).map((id) => parseId(id, 'metabaseDatabaseId')),
...Object.keys(payload.syncEnabled ?? {}).map((id) => parseId(id, 'metabaseDatabaseId')),
]);
await store.replaceSourceState({
connectionId: args.connectionId,
syncMode: payload.syncMode ?? existingState.syncMode,
defaultTagNames: payload.defaultTagNames ?? existingState.defaultTagNames,
selections:
payload.selections === undefined
? existingState.selections
: [
...(payload.selections.collections ?? []).map((id) => ({
selectionType: 'collection' as const,
metabaseObjectId: id,
})),
...(payload.selections.items ?? []).map((id) => ({
selectionType: 'item' as const,
metabaseObjectId: id,
})),
],
mappings: [...mappingIds]
.sort((a, b) => a - b)
.map((id) => {
const existing = existingById.get(id);
return {
metabaseDatabaseId: id,
metabaseDatabaseName: existing?.metabaseDatabaseName ?? null,
metabaseEngine: existing?.metabaseEngine ?? null,
metabaseHost: existing?.metabaseHost ?? null,
metabaseDbName: existing?.metabaseDbName ?? null,
targetConnectionId: databaseMappings[String(id)] ?? existing?.targetConnectionId ?? null,
syncEnabled: payload.syncEnabled?.[String(id)] ?? existing?.syncEnabled ?? false,
source: 'cli',
};
}),
});
io.stdout.write(`Applied bulk mappings for ${args.connectionId}\n`);
return 0;
}
if (args.command === 'set-sync-enabled') {
await store.setMappingSyncEnabled({
connectionId: args.connectionId,
metabaseDatabaseId: args.metabaseDatabaseId,
syncEnabled: args.enabled,
});
io.stdout.write(`Set syncEnabled.${args.metabaseDatabaseId} = ${args.enabled}\n`);
return 0;
}
if (args.command === 'sync-state-get') {
const state = await store.getSourceState(args.connectionId);
const payload = {
syncMode: state.syncMode,
selections: state.selections,
defaultTagNames: state.defaultTagNames,
};
io.stdout.write(args.json ? `${JSON.stringify(payload, null, 2)}\n` : `${payload.syncMode}\n`);
return 0;
}
if (args.command === 'sync-state-set') {
await store.setSyncState({
connectionId: args.connectionId,
syncMode: args.syncMode,
defaultTagNames: args.tagNames,
selections: [
...args.collectionIds.map((id) => ({ selectionType: 'collection' as const, metabaseObjectId: id })),
...args.itemIds.map((id) => ({ selectionType: 'item' as const, metabaseObjectId: id })),
],
});
io.stdout.write(`Set sync state for ${args.connectionId}\n`);
return 0;
}
if (args.command === 'refresh') {
const client = await (deps.createMetabaseClient ?? createDefaultMetabaseClient)(project, args.connectionId);
try {
const discovered = await discoverMetabaseDatabases(client);
const existing = Object.fromEntries(
(await store.listDatabaseMappings(args.connectionId)).map((row) => [
String(row.metabaseDatabaseId),
row.targetConnectionId,
]),
);
const drift = computeMetabaseMappingDrift({ currentMappings: existing, discovered });
if (args.autoAccept) {
await store.refreshDiscoveredDatabases({ connectionId: args.connectionId, discovered });
}
io.stdout.write(`Discovery: ${discovered.length} ${discovered.length === 1 ? 'database' : 'databases'}\n`);
io.stdout.write(`Unmapped discovered: ${drift.unmappedDiscovered.length}\n`);
io.stdout.write(`Stale mappings: ${drift.staleMappings.length}\n`);
return 0;
} finally {
await client.cleanup();
}
}
if (args.command === 'validate') {
const rows = await store.listDatabaseMappings(args.connectionId);
const failures = rows.flatMap((row) => {
if (!row.targetConnectionId) {
return [];
}
const reason = validateMappingPhysicalMatch(
{ metabaseEngine: row.metabaseEngine, metabaseDbName: row.metabaseDbName, metabaseHost: row.metabaseHost },
project.config.connections[row.targetConnectionId]
? targetPhysicalInfo(project, row.targetConnectionId)
: { connection_type: 'UNKNOWN' },
);
return reason ? [`${row.metabaseDatabaseId}: ${reason}`] : [];
});
if (failures.length > 0) {
for (const failure of failures) {
io.stderr.write(`${failure}\n`);
}
return 1;
}
io.stdout.write(`Mapping validation passed: ${args.connectionId}\n`);
return 0;
}
const metabaseDatabaseId = args.metabaseDatabaseId ?? (args.mappingKey ? parseId(args.mappingKey, 'metabaseDatabaseId') : undefined);
await store.clearDatabaseMappings({ connectionId: args.connectionId, metabaseDatabaseId });
io.stdout.write(
metabaseDatabaseId
? `Cleared databaseMappings.${metabaseDatabaseId}\n`
: `Cleared mappings for ${args.connectionId}\n`,
);
return 0;
} catch (error) {
io.stderr.write(`${error instanceof Error ? error.message : String(error)}\n`);
return 1;
}
}
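The `apply-bulk` branch above builds the full mapping list by taking the union of existing rows and payload keys, then resolving each field with `??` fallbacks. The sketch below isolates that merge as a pure function so it can be tested without a state store. It is a minimal sketch: `mergeBulkMappings`, `parsePositiveId` and the three interfaces are hypothetical names that only mirror the fields the handler reads.

```typescript
// Hypothetical shapes mirroring only the fields used by the `apply-bulk` merge.
interface ExistingMappingRow {
  metabaseDatabaseId: number;
  targetConnectionId: string | null;
  syncEnabled: boolean;
}

interface BulkMappingPayload {
  databaseMappings?: Record<string, string | null>;
  syncEnabled?: Record<string, boolean>;
}

interface MergedMapping {
  metabaseDatabaseId: number;
  targetConnectionId: string | null;
  syncEnabled: boolean;
}

// Hypothetical stand-in for parseId(..., 'metabaseDatabaseId').
function parsePositiveId(raw: string): number {
  const id = Number(raw);
  if (!Number.isInteger(id) || id <= 0) {
    throw new Error(`Invalid metabaseDatabaseId: ${raw}`);
  }
  return id;
}

function mergeBulkMappings(existingRows: ExistingMappingRow[], payload: BulkMappingPayload): MergedMapping[] {
  const existingById = new Map(
    existingRows.map((row): [number, ExistingMappingRow] => [row.metabaseDatabaseId, row]),
  );
  const databaseMappings = payload.databaseMappings ?? {};
  // Union of every id that is already stored or mentioned anywhere in the payload.
  const ids = new Set<number>([
    ...existingRows.map((row) => row.metabaseDatabaseId),
    ...Object.keys(databaseMappings).map((key) => parsePositiveId(key)),
    ...Object.keys(payload.syncEnabled ?? {}).map((key) => parsePositiveId(key)),
  ]);
  return [...ids]
    .sort((a, b) => a - b)
    .map((id) => {
      const existing = existingById.get(id);
      return {
        metabaseDatabaseId: id,
        // `??` only falls through on null/undefined, so a null payload entry keeps the stored target.
        targetConnectionId: databaseMappings[String(id)] ?? existing?.targetConnectionId ?? null,
        // Ids that appear only in the payload default to sync disabled.
        syncEnabled: payload.syncEnabled?.[String(id)] ?? existing?.syncEnabled ?? false,
      };
    });
}
```

A consequence of the `??` chain is that a `null` entry in `databaseMappings` does not clear a stored mapping. Removing a mapping goes through `store.clearDatabaseMappings` at the end of the handler instead.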

@@ -1,132 +0,0 @@
import { type Command, Option } from '@commander-js/extra-typings';
import {
type KtxCliCommandContext,
parseNonEmptyAssignmentOption,
parsePositiveIntegerOption,
parseSafeConnectionIdOption,
resolveCommandProjectDir,
} from '../cli-program.js';
import {
type KtxConnectionMetabaseSetupArgs,
type MetabaseSetupMappingAssignment,
type MetabaseSetupSyncMode,
runKtxConnectionMetabaseSetup,
} from './connection-metabase-setup.js';
const SYNC_MODE_CHOICES = ['ALL', 'ONLY', 'EXCEPT'] as const satisfies readonly MetabaseSetupSyncMode[];
interface ConnectionMetabaseSetupOptions {
id?: string;
url?: string;
apiKey?: string;
mintApiKey?: boolean;
username?: string;
password?: string;
map: MetabaseSetupMappingAssignment[];
sync: number[];
syncMode: MetabaseSetupSyncMode;
runIngest?: boolean;
yes?: boolean;
input?: boolean;
}
function collectPositiveIntegerOption(value: string, previous: number[] = []): number[] {
return [...previous, parsePositiveIntegerOption(value)];
}
function parseMappingAssignment(value: string): MetabaseSetupMappingAssignment {
const assignment = parseNonEmptyAssignmentOption(value);
return {
metabaseDatabaseId: parsePositiveIntegerOption(assignment.key),
targetConnectionId: parseSafeConnectionIdOption(assignment.value),
};
}
function collectMappingOption(
value: string,
previous: MetabaseSetupMappingAssignment[] = [],
): MetabaseSetupMappingAssignment[] {
return [...previous, parseMappingAssignment(value)];
}
async function runMetabaseSetupArgs(
context: KtxCliCommandContext,
args: KtxConnectionMetabaseSetupArgs,
): Promise<void> {
const runner = context.deps.connectionMetabaseSetup ?? runKtxConnectionMetabaseSetup;
context.setExitCode(await runner(args, context.io));
}
export function registerConnectionMetabaseCommands(connection: Command, context: KtxCliCommandContext): void {
const metabase = connection
.command('metabase')
.description('Configure Metabase connections')
.showHelpAfterError()
.addHelpText(
'after',
'\nProject directory defaults to KTX_PROJECT_DIR when set, otherwise the current working directory.\n',
);
metabase.action(() => {
metabase.outputHelp();
context.setExitCode(0);
});
metabase
.command('setup')
.description('Guided setup for a Metabase connection')
.option('--id <connectionId>', 'KTX connection id to write', parseSafeConnectionIdOption)
.option('--url <url>', 'Metabase API URL')
.addOption(new Option('--api-key <key>', 'Metabase API key').conflicts('mintApiKey'))
.option('--mint-api-key', 'Mint a Metabase API key with credentials', false)
.option('--username <email>', 'Metabase admin username for API-key minting')
.option('--password <password>', 'Metabase admin password for API-key minting')
.addHelpText(
'after',
'\nGuided equivalent of:\n' +
' ktx connection mapping refresh <connectionId> --auto-accept\n' +
' ktx connection mapping set <connectionId> databaseMappings <id>=<target>\n' +
' ktx connection mapping set-sync-enabled <connectionId> <id> --enabled true\n' +
' ktx ingest <connectionId>\n',
)
.option(
'--map <metabaseDatabaseId=targetConnectionId>',
'Assign a Metabase database id to a warehouse connection; repeatable',
collectMappingOption,
[],
)
.option(
'--sync <metabaseDatabaseId>',
'Enable Metabase sync for a discovered database; repeatable',
collectPositiveIntegerOption,
[],
)
.addOption(
new Option('--sync-mode <mode>', 'Metabase sync selection mode')
.choices(SYNC_MODE_CHOICES)
.default('ALL' satisfies MetabaseSetupSyncMode),
)
.option('--run-ingest', 'Run ingest after setup', false)
.option('--yes', 'Confirm and apply setup changes without prompting', false)
.option('--no-input', 'Disable interactive terminal input')
.showHelpAfterError()
.action(async (options: ConnectionMetabaseSetupOptions, command) => {
await runMetabaseSetupArgs(context, {
command: 'setup',
projectDir: resolveCommandProjectDir(command),
connectionId: options.id,
url: options.url,
apiKey: options.apiKey,
mintApiKey: options.mintApiKey === true,
metabaseUsername: options.username,
metabasePassword: options.password,
mappings: options.map,
syncEnabledDatabaseIds: options.sync,
syncMode: options.syncMode ?? 'ALL',
runIngest: options.runIngest === true,
yes: options.yes === true,
inputMode: options.input === false ? 'disabled' : 'auto',
});
});
}
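The repeatable `--map` and `--sync` flags rely on Commander's custom option processing. Commander calls the parser with each raw option-argument and the value accumulated so far, and `collectMappingOption` appends one parsed assignment per occurrence. The sketch below reproduces that reducer without a Commander dependency.

It rests on a few assumptions:

- `parseAssignment` is a hypothetical stand-in for `parseNonEmptyAssignmentOption` plus `parsePositiveIntegerOption`, whose exact behavior is not shown here.
- The connection-id pattern is borrowed from `parseSafeConnectionId` in the Notion command file.
- `collectRepeatedMaps` is a hypothetical helper that simulates Commander invoking the collector once per flag.

```typescript
interface MappingAssignment {
  metabaseDatabaseId: number;
  targetConnectionId: string;
}

// Same pattern as parseSafeConnectionId in the Notion command file.
const SAFE_CONNECTION_ID = /^[a-zA-Z0-9][a-zA-Z0-9_-]*$/;

// Hypothetical parser: split on the first '=', then validate both sides.
function parseAssignment(value: string): MappingAssignment {
  const separator = value.indexOf('=');
  const key = separator === -1 ? '' : value.slice(0, separator).trim();
  const target = separator === -1 ? '' : value.slice(separator + 1).trim();
  if (!key || !target) {
    throw new Error(`Expected <metabaseDatabaseId=targetConnectionId>, got: ${value}`);
  }
  const id = Number(key);
  if (!Number.isInteger(id) || id <= 0) {
    throw new Error(`Expected a positive integer database id, got: ${key}`);
  }
  if (!SAFE_CONNECTION_ID.test(target)) {
    throw new Error(`Unsafe connection id: ${target}`);
  }
  return { metabaseDatabaseId: id, targetConnectionId: target };
}

// Same (value, previous) => next signature as collectMappingOption.
function collectMapping(value: string, previous: MappingAssignment[] = []): MappingAssignment[] {
  return [...previous, parseAssignment(value)];
}

// Simulates Commander calling the collector once per `--map` occurrence, starting from [].
function collectRepeatedMaps(values: string[]): MappingAssignment[] {
  return values.reduce<MappingAssignment[]>((acc, value) => collectMapping(value, acc), []);
}
```

With these semantics, `ktx connection metabase setup --map 1=warehouse --map 2=pg_main --sync 1` yields two mapping assignments and one sync-enabled id. Returning a new array from each call, instead of mutating `previous`, keeps the `[]` default from being shared between parses.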

File diff suppressed because it is too large

@@ -1,782 +0,0 @@
import type { Option as ClackOption } from '@clack/prompts';
import {
cancel,
confirm,
intro,
isCancel,
log,
multiselect,
note,
outro,
password,
select,
text,
} from '@clack/prompts';
import { localConnectionToWarehouseDescriptor } from '@ktx/context/connections';
import {
DEFAULT_METABASE_CLIENT_CONFIG,
DefaultMetabaseConnectionClientFactory,
LocalMetabaseSourceStateReader,
MetabaseClient,
type MetabaseDatabase,
type MetabaseRuntimeClient,
type MetabaseSyncMode,
metabaseRuntimeConfigFromLocalConnection,
validateMappingPhysicalMatch,
} from '@ktx/context/ingest';
import {
type KtxLocalProject,
type KtxProjectConnectionConfig,
ktxLocalStateDbPath,
loadKtxProject,
serializeKtxProjectConfig,
} from '@ktx/context/project';
import { createClackSpinner, type KtxCliSpinner } from '../clack.js';
import type { KtxCliIo } from '../cli-runtime.js';
import { withMenuOptionsSpacing, withMultiselectNavigation } from '../prompt-navigation.js';
import { type KtxPublicIngestArgs, runKtxPublicIngest } from '../public-ingest.js';
export type KtxMetabaseSetupInputMode = 'auto' | 'disabled';
export type MetabaseSetupSyncMode = MetabaseSyncMode;
type MetabaseSetupPromptOption<Value> = ClackOption<Value>;
export interface MetabaseSetupLogger {
info(message: string): void;
step(message: string): void;
success(message: string): void;
warn(message: string): void;
error(message: string): void;
}
export interface MetabaseSetupPromptAdapter {
intro(title?: string): void;
outro(message?: string): void;
note(message: string, title: string): void;
log: MetabaseSetupLogger;
spinner(): KtxCliSpinner;
select<T extends string>(options: { message: string; options: Array<MetabaseSetupPromptOption<T>> }): Promise<T>;
multiselect<Value extends number | string>(options: {
message: string;
options: Array<MetabaseSetupPromptOption<Value>>;
initialValues?: Value[];
required?: boolean;
maxItems?: number;
}): Promise<Value[]>;
text(options: { message: string; placeholder?: string }): Promise<string>;
password(options: { message: string }): Promise<string>;
confirm(options: { message: string; initialValue?: boolean }): Promise<boolean>;
cancel(message: string): void;
}
type KtxMetabaseSetupInteractiveIo = KtxCliIo & {
stdin?: { isTTY?: boolean };
};
export interface MetabaseSetupMappingAssignment {
metabaseDatabaseId: number;
targetConnectionId: string;
}
export interface MintMetabaseApiKeyArgs {
url: string;
username: string;
password: string;
}
export type MintMetabaseApiKey = (args: MintMetabaseApiKeyArgs, io: KtxCliIo) => Promise<string>;
export interface KtxConnectionMetabaseSetupArgs {
command: 'setup';
projectDir: string;
connectionId?: string;
url?: string;
apiKey?: string;
mintApiKey: boolean;
metabaseUsername?: string;
metabasePassword?: string;
mappings: MetabaseSetupMappingAssignment[];
syncEnabledDatabaseIds: number[];
syncMode: MetabaseSetupSyncMode;
runIngest: boolean;
yes: boolean;
inputMode: KtxMetabaseSetupInputMode;
}
export interface KtxConnectionMetabaseSetupDeps {
createMetabaseClient?: (
project: KtxLocalProject,
connectionId: string,
) => Promise<Pick<MetabaseRuntimeClient, 'testConnection' | 'getDatabases' | 'cleanup'>>;
mintMetabaseApiKey?: MintMetabaseApiKey;
prompts?: MetabaseSetupPromptAdapter;
runPublicIngest?: (args: Extract<KtxPublicIngestArgs, { command: 'run' }>, io: KtxCliIo) => Promise<number>;
}
function isMetabaseConnection(connection: KtxProjectConnectionConfig | undefined): boolean {
return (
String(connection?.driver ?? '')
.trim()
.toLowerCase() === 'metabase'
);
}
function stringField(value: unknown): string | undefined {
return typeof value === 'string' && value.trim().length > 0 ? value.trim() : undefined;
}
function uniqueSorted(values: number[]): number[] {
return [...new Set(values)].sort((a, b) => a - b);
}
function resolveMetabaseUrl(connection: KtxProjectConnectionConfig | undefined): string | undefined {
return stringField(connection?.api_url) ?? stringField(connection?.apiUrl) ?? stringField(connection?.url);
}
function resolveLiteralMetabaseApiKey(connection: KtxProjectConnectionConfig | undefined): string | undefined {
return stringField(connection?.api_key) ?? stringField(connection?.apiKey);
}
function listMetabaseConnectionIds(project: KtxLocalProject): string[] {
return Object.entries(project.config.connections)
.filter(([_connectionId, connection]) => isMetabaseConnection(connection))
.map(([connectionId]) => connectionId)
.sort();
}
function listWarehouseConnectionIds(project: KtxLocalProject): string[] {
return Object.entries(project.config.connections)
.filter(([connectionId, connection]) => localConnectionToWarehouseDescriptor(connectionId, connection) != null)
.map(([connectionId]) => connectionId)
.sort();
}
function redactSecrets(message: string, secrets: string[]): string {
let result = message;
for (const secret of secrets) {
if (!secret) {
continue;
}
result = result.split(secret).join('[redacted]');
}
return result;
}
async function createDefaultMetabaseClient(
project: KtxLocalProject,
connectionId: string,
): Promise<Pick<MetabaseRuntimeClient, 'testConnection' | 'getDatabases' | 'cleanup'>> {
const factory = new DefaultMetabaseConnectionClientFactory(
(metabaseConnectionId) =>
metabaseRuntimeConfigFromLocalConnection(metabaseConnectionId, project.config.connections[metabaseConnectionId]),
DEFAULT_METABASE_CLIENT_CONFIG,
);
return factory.createClient(connectionId);
}
async function defaultMintMetabaseApiKey(args: MintMetabaseApiKeyArgs): Promise<string> {
const loginClient = new MetabaseClient({ apiUrl: args.url, apiKey: '' }, DEFAULT_METABASE_CLIENT_CONFIG);
const sessionId = await loginClient.createSession(args.username, args.password);
const sessionClient = new MetabaseClient(
{ apiUrl: args.url, apiKey: sessionId, authHeaderName: 'X-Metabase-Session' },
DEFAULT_METABASE_CLIENT_CONFIG,
);
const groups = await sessionClient.getPermissionGroups();
const adminGroup = groups.find((group) => group.name === 'Administrators');
if (!adminGroup) {
throw new Error('Metabase Administrators group was not found; create an API key manually and pass --api-key');
}
const mintedKey = await sessionClient.createApiKey({
groupId: adminGroup.id,
name: `KTX CLI ${new Date().toISOString()}`,
});
const trimmedKey = stringField(mintedKey);
if (!trimmedKey) {
throw new Error('Metabase API key minting returned an empty key');
}
return trimmedKey;
}
function ensureNotCancelled<T>(value: T | symbol, prompts: Pick<MetabaseSetupPromptAdapter, 'cancel'>): T {
if (isCancel(value)) {
prompts.cancel('Setup cancelled.');
throw new Error('Setup cancelled.');
}
return value as T;
}
export function createClackMetabaseSetupPromptAdapter(): MetabaseSetupPromptAdapter {
return {
intro(title?: string): void {
intro(title);
},
outro(message?: string): void {
outro(message);
},
note(message: string, title: string): void {
note(message, title);
},
log: {
info(message: string): void {
log.info(message);
},
step(message: string): void {
log.step(message);
},
success(message: string): void {
log.success(message);
},
warn(message: string): void {
log.warn(message);
},
error(message: string): void {
log.error(message);
},
},
spinner(): KtxCliSpinner {
return createClackSpinner();
},
async select<T extends string>(options: {
message: string;
options: Array<MetabaseSetupPromptOption<T>>;
}): Promise<T> {
return ensureNotCancelled(await select(withMenuOptionsSpacing(options)), this);
},
async multiselect<Value extends number | string>(options: {
message: string;
options: Array<MetabaseSetupPromptOption<Value>>;
initialValues?: Value[];
required?: boolean;
maxItems?: number;
}): Promise<Value[]> {
return ensureNotCancelled(await multiselect(withMenuOptionsSpacing(options)), this);
},
async text(options: { message: string; placeholder?: string }): Promise<string> {
return ensureNotCancelled(await text(options), this);
},
async password(options: { message: string }): Promise<string> {
return ensureNotCancelled(await password(options), this);
},
async confirm(options: { message: string; initialValue?: boolean }): Promise<boolean> {
return ensureNotCancelled(await confirm(options), this);
},
cancel(message: string): void {
cancel(message);
},
};
}
function isInteractiveMetabaseSetupIo(
args: Pick<KtxConnectionMetabaseSetupArgs, 'inputMode'>,
io: KtxMetabaseSetupInteractiveIo,
): boolean {
return args.inputMode !== 'disabled' && io.stdin?.isTTY === true && io.stdout.isTTY === true;
}
function normalizeDiscoveredDatabases(databases: MetabaseDatabase[]): Array<{
id: number;
name: string;
engine: string;
host: string | null;
dbName: string | null;
}> {
return databases
.filter((database) => database.is_sample !== true)
.map((database) => ({
id: database.id,
name: database.name,
engine: stringField(database.engine) ?? 'unknown',
host: stringField(database.details?.host) ?? null,
dbName: stringField(database.details?.dbname) ?? null,
}));
}
function targetPhysicalInfo(project: KtxLocalProject, connectionId: string) {
const descriptor = localConnectionToWarehouseDescriptor(connectionId, project.config.connections[connectionId]);
if (!descriptor) {
return { connection_type: 'UNKNOWN' };
}
return {
connection_type: descriptor.connection_type,
host: descriptor.host ?? null,
database: descriptor.database ?? null,
account: descriptor.account ?? null,
project_id: descriptor.project_id ?? null,
dataset_id: descriptor.dataset_id ?? null,
...descriptor.connection_params,
};
}
function noteMetabaseSetupSummary(options: {
prompts: MetabaseSetupPromptAdapter;
connectionId: string;
url: string;
mappings: MetabaseSetupMappingAssignment[];
syncEnabledDatabaseIds: number[];
}): void {
const mappingLines = options.mappings
.map((mapping) => ` ${mapping.metabaseDatabaseId} -> ${mapping.targetConnectionId}`)
.join('\n');
const syncLines = options.syncEnabledDatabaseIds.map((id) => ` ${id}`).join('\n');
options.prompts.note(
[
`Connection: ${options.connectionId}`,
`URL: ${options.url}`,
'',
'Mappings:',
mappingLines || ' (none)',
'',
'Sync enabled:',
syncLines || ' (none)',
].join('\n'),
'Summary',
);
}
export async function runKtxConnectionMetabaseSetup(
args: KtxConnectionMetabaseSetupArgs,
io: KtxCliIo,
deps: KtxConnectionMetabaseSetupDeps = {},
): Promise<number> {
let apiKeyForRedaction = args.apiKey;
let passwordForRedaction = args.metabasePassword;
const interactiveIo = io as KtxMetabaseSetupInteractiveIo;
const isInteractive = isInteractiveMetabaseSetupIo(args, interactiveIo);
const prompts = deps.prompts ?? (isInteractive ? createClackMetabaseSetupPromptAdapter() : undefined);
try {
if (isInteractive && prompts) {
prompts.intro('KTX Metabase setup');
}
const project = await loadKtxProject({ projectDir: args.projectDir });
const existingMetabaseConnectionIds = listMetabaseConnectionIds(project);
let connectionId: string;
if (args.connectionId) {
connectionId = args.connectionId;
} else if (existingMetabaseConnectionIds.length === 1) {
const onlyMetabaseConnectionId = existingMetabaseConnectionIds[0];
if (!onlyMetabaseConnectionId) {
throw new Error('No Metabase connection id was resolved');
}
connectionId = onlyMetabaseConnectionId;
} else if (existingMetabaseConnectionIds.length > 1) {
if (!isInteractive || !prompts) {
throw new Error(
`Multiple Metabase connections found (${existingMetabaseConnectionIds.join(', ')}); select one with --id`,
);
}
connectionId = await prompts.select({
message: 'Select the Metabase connection to configure',
options: existingMetabaseConnectionIds.map((id) => ({ value: id, label: id })),
});
} else {
connectionId = 'metabase';
}
const existingConnection = project.config.connections[connectionId];
const warehouseConnectionIds = listWarehouseConnectionIds(project);
if (warehouseConnectionIds.length === 0) {
throw new Error('Add a warehouse connection first');
}
let url = args.url ?? resolveMetabaseUrl(existingConnection);
let apiKey = args.apiKey ?? resolveLiteralMetabaseApiKey(existingConnection);
apiKeyForRedaction = apiKey;
if (!url && isInteractive && prompts) {
url = stringField(
await prompts.text({
message: 'Metabase API URL',
placeholder: 'http://localhost:3000',
}),
);
}
if (args.inputMode === 'disabled' && !url) {
throw new Error('missing Metabase URL');
}
if (!args.apiKey && !args.mintApiKey && apiKey && isInteractive && prompts && !args.yes) {
const reuse = await prompts.confirm({
message: `Reuse the existing Metabase API key from connections.${connectionId}?`,
initialValue: true,
});
if (!reuse) {
apiKey = undefined;
apiKeyForRedaction = undefined;
}
}
if (args.mintApiKey) {
let username = stringField(args.metabaseUsername);
let metabasePassword = stringField(args.metabasePassword);
if (isInteractive && prompts) {
if (!username) {
username = stringField(await prompts.text({ message: 'Metabase admin username' }));
}
if (!metabasePassword) {
metabasePassword = stringField(await prompts.password({ message: 'Metabase admin password' }));
}
}
if (!username) {
throw new Error('--mint-api-key requires --username');
}
if (!metabasePassword) {
throw new Error('--mint-api-key requires --password');
}
if (!url) {
throw new Error('Metabase URL is required (use --url)');
}
passwordForRedaction = metabasePassword;
apiKey = await (deps.mintMetabaseApiKey ?? defaultMintMetabaseApiKey)(
{ url, username, password: metabasePassword },
io,
);
apiKeyForRedaction = apiKey;
}
if (!apiKey && isInteractive && prompts) {
const credentialMode = await prompts.select({
message: 'Metabase credentials',
options: [
{ value: 'paste', label: 'Paste API key' },
{ value: 'mint', label: 'Mint API key' },
],
});
if (credentialMode === 'paste') {
apiKey = stringField(await prompts.password({ message: 'Metabase API key' }));
apiKeyForRedaction = apiKey;
} else {
const username = stringField(await prompts.text({ message: 'Metabase admin username' }));
const metabasePassword = stringField(await prompts.password({ message: 'Metabase admin password' }));
if (!username) {
throw new Error('Metabase username is required');
}
if (!metabasePassword) {
throw new Error('Metabase password is required');
}
if (!url) {
throw new Error('Metabase URL is required (use --url)');
}
passwordForRedaction = metabasePassword;
apiKey = await (deps.mintMetabaseApiKey ?? defaultMintMetabaseApiKey)(
{ url, username, password: metabasePassword },
io,
);
apiKeyForRedaction = apiKey;
}
}
if (args.inputMode === 'disabled' && !apiKey) {
throw new Error('missing Metabase API key');
}
if (!url) {
throw new Error('Metabase URL is required (use --url)');
}
if (!apiKey) {
throw new Error('Metabase API key is required (use --api-key)');
}
const transientConnectionConfig: KtxProjectConnectionConfig = {
...(existingConnection ?? {}),
driver: 'metabase',
api_url: url,
api_key: apiKey,
};
const configWithTransient = {
...project.config,
connections: {
...project.config.connections,
[connectionId]: transientConnectionConfig,
},
};
const discoveryProject: KtxLocalProject = { ...project, config: configWithTransient };
for (const mapping of args.mappings) {
if (!configWithTransient.connections[mapping.targetConnectionId]) {
throw new Error(`Target connection "${mapping.targetConnectionId}" does not exist`);
}
}
const client = await (deps.createMetabaseClient ?? createDefaultMetabaseClient)(discoveryProject, connectionId);
try {
const authSpinner = isInteractive && prompts ? prompts.spinner() : undefined;
authSpinner?.start('Testing Metabase connection');
const testResult = await client.testConnection();
if (!testResult.success) {
authSpinner?.error('Metabase authentication failed');
throw new Error(
`Metabase authentication failed. Replace connections.${connectionId}.api_key or use --mint-api-key.`,
);
}
authSpinner?.stop('Metabase reachable');
const discoverySpinner = isInteractive && prompts ? prompts.spinner() : undefined;
discoverySpinner?.start('Discovering Metabase databases');
const discovered = normalizeDiscoveredDatabases(await client.getDatabases());
discoverySpinner?.stop(`Discovered ${discovered.length} ${discovered.length === 1 ? 'database' : 'databases'}`);
if (isInteractive && prompts) {
prompts.log.success(
`Discovered ${discovered.length} ${discovered.length === 1 ? 'database' : 'databases'}`,
);
}
if (discovered.length === 0) {
throw new Error('Metabase auth worked but no usable databases were returned');
}
let resolvedMappings = args.mappings;
let resolvedSyncEnabledDatabaseIds = args.syncEnabledDatabaseIds;
if (resolvedSyncEnabledDatabaseIds.length === 0 && args.yes && resolvedMappings.length > 0) {
resolvedSyncEnabledDatabaseIds = uniqueSorted(resolvedMappings.map((mapping) => mapping.metabaseDatabaseId));
}
if (resolvedMappings.length === 0 && resolvedSyncEnabledDatabaseIds.length === 0) {
const onlyDiscoveredDatabase = discovered.length === 1 ? discovered[0] : undefined;
const compatibleWarehouses = onlyDiscoveredDatabase
? warehouseConnectionIds.filter((warehouseConnectionId) => {
const mismatchReason = validateMappingPhysicalMatch(
{
metabaseEngine: onlyDiscoveredDatabase.engine,
metabaseDbName: onlyDiscoveredDatabase.dbName,
metabaseHost: onlyDiscoveredDatabase.host,
},
targetPhysicalInfo(project, warehouseConnectionId),
);
return !mismatchReason;
})
: [];
const onlyWarehouseConnectionId = compatibleWarehouses[0];
if (onlyDiscoveredDatabase && compatibleWarehouses.length === 1 && onlyWarehouseConnectionId) {
if (args.yes) {
resolvedMappings = [
{ metabaseDatabaseId: onlyDiscoveredDatabase.id, targetConnectionId: onlyWarehouseConnectionId },
];
resolvedSyncEnabledDatabaseIds = [onlyDiscoveredDatabase.id];
} else if (isInteractive && prompts) {
const proposedMappings = [
{ metabaseDatabaseId: onlyDiscoveredDatabase.id, targetConnectionId: onlyWarehouseConnectionId },
];
const proposedSyncEnabledDatabaseIds = [onlyDiscoveredDatabase.id];
noteMetabaseSetupSummary({
prompts,
connectionId,
url,
mappings: proposedMappings,
syncEnabledDatabaseIds: proposedSyncEnabledDatabaseIds,
});
const confirmed = await prompts.confirm({
message: `Map Metabase database "${onlyDiscoveredDatabase.name}" (${onlyDiscoveredDatabase.id}) to "${onlyWarehouseConnectionId}" and enable sync?`,
initialValue: true,
});
if (!confirmed) {
prompts.cancel('Setup cancelled.');
throw new Error('Setup cancelled.');
}
resolvedMappings = proposedMappings;
resolvedSyncEnabledDatabaseIds = proposedSyncEnabledDatabaseIds;
} else {
throw new Error('Metabase mapping/sync is required in --no-input mode; pass --map and --sync');
}
} else if (isInteractive && prompts) {
const selectedDatabaseIds = await prompts.multiselect<number>({
message: withMultiselectNavigation('Select Metabase databases to configure'),
options: discovered.map((database) => ({
value: database.id,
label: `${database.id}: ${database.name}`,
hint: [database.engine, database.host, database.dbName].filter(Boolean).join(' • '),
})),
required: true,
});
resolvedMappings = [];
for (const databaseId of selectedDatabaseIds) {
const database = discovered.find((candidate) => candidate.id === databaseId);
if (!database) {
throw new Error(`Selected database id ${databaseId} was not discovered`);
}
const existingMapping = args.mappings.find((mapping) => mapping.metabaseDatabaseId === databaseId);
if (existingMapping) {
resolvedMappings.push(existingMapping);
continue;
}
const targetConnectionId = await prompts.select({
message: `Map Metabase database ${database.id} ("${database.name}") to which KTX connection?`,
options: warehouseConnectionIds.map((warehouseId) => ({ value: warehouseId, label: warehouseId })),
});
resolvedMappings.push({ metabaseDatabaseId: databaseId, targetConnectionId });
}
const syncIds = await prompts.multiselect<number>({
message: withMultiselectNavigation('Enable sync for which databases?'),
options: selectedDatabaseIds.map((id) => ({ value: id, label: String(id) })),
initialValues: selectedDatabaseIds,
required: true,
});
resolvedSyncEnabledDatabaseIds = uniqueSorted(syncIds);
if (!args.yes) {
noteMetabaseSetupSummary({
prompts,
connectionId,
url,
mappings: resolvedMappings,
syncEnabledDatabaseIds: resolvedSyncEnabledDatabaseIds,
});
const confirmed = await prompts.confirm({
message: 'Write changes to ktx.yaml and enable sync?',
initialValue: true,
});
if (!confirmed) {
prompts.cancel('Setup cancelled.');
throw new Error('Setup cancelled.');
}
}
} else if (args.inputMode === 'disabled') {
throw new Error('Metabase mapping/sync is required in --no-input mode; pass --map and --sync');
}
}
if (
args.inputMode === 'disabled' &&
resolvedMappings.length > 0 &&
resolvedSyncEnabledDatabaseIds.length === 0
) {
throw new Error('Metabase sync selection is required in --no-input mode; pass --sync <metabaseDatabaseId>');
}
const discoveredIds = new Set(discovered.map((database) => database.id));
for (const mapping of resolvedMappings) {
if (!discoveredIds.has(mapping.metabaseDatabaseId)) {
throw new Error(`Mapped database id ${mapping.metabaseDatabaseId} was not discovered`);
}
}
for (const syncId of resolvedSyncEnabledDatabaseIds) {
if (!discoveredIds.has(syncId)) {
throw new Error(`Sync database id ${syncId} was not discovered`);
}
}
await project.fileStore.writeFile(
'ktx.yaml',
serializeKtxProjectConfig(configWithTransient),
'ktx',
'ktx@example.com',
`Setup Metabase connection ${connectionId}`,
);
const updatedProject = await loadKtxProject({ projectDir: args.projectDir });
const store = new LocalMetabaseSourceStateReader({ dbPath: ktxLocalStateDbPath(updatedProject) });
await store.refreshDiscoveredDatabases({ connectionId, discovered });
for (const mapping of resolvedMappings) {
await store.upsertDatabaseMapping({
connectionId,
metabaseDatabaseId: mapping.metabaseDatabaseId,
targetConnectionId: mapping.targetConnectionId,
syncEnabled: false,
source: 'cli',
});
}
for (const metabaseDatabaseId of resolvedSyncEnabledDatabaseIds) {
await store.setMappingSyncEnabled({
connectionId,
metabaseDatabaseId,
syncEnabled: true,
});
}
const existingSyncState = await store.getSourceState(connectionId);
await store.setSyncState({
connectionId,
syncMode: args.syncMode,
defaultTagNames: existingSyncState.defaultTagNames,
selections: existingSyncState.selections,
});
const unhydrated = await store.getUnhydratedSyncEnabledMappingIds(connectionId);
if (unhydrated.length > 0) {
io.stderr.write(
`Sync-enabled mappings are missing discovery metadata; run ktx connection mapping refresh ${connectionId} --auto-accept\n`,
);
return 1;
}
const rows = await store.listDatabaseMappings(connectionId);
const physicalFailures = rows.flatMap((row) => {
if (!row.targetConnectionId) {
return [];
}
const reason = validateMappingPhysicalMatch(
{ metabaseEngine: row.metabaseEngine, metabaseDbName: row.metabaseDbName, metabaseHost: row.metabaseHost },
updatedProject.config.connections[row.targetConnectionId]
? targetPhysicalInfo(updatedProject, row.targetConnectionId)
: { connection_type: 'UNKNOWN' },
);
return reason ? [`${row.metabaseDatabaseId}: ${reason}`] : [];
});
if (physicalFailures.length > 0) {
for (const failure of physicalFailures) {
io.stderr.write(`${failure}\n`);
}
return 1;
}
io.stdout.write(`Connection: ${connectionId}\n`);
io.stdout.write(`Discovered ${discovered.length} ${discovered.length === 1 ? 'database' : 'databases'}\n`);
io.stdout.write(`Next: ktx ingest ${connectionId} --project-dir ${args.projectDir}\n`);
if (args.runIngest) {
const ingestRunner = deps.runPublicIngest ?? runKtxPublicIngest;
const exitCode = await ingestRunner(
{
command: 'run',
projectDir: args.projectDir,
targetConnectionId: connectionId,
all: false,
json: false,
inputMode: 'disabled',
},
io,
);
if (exitCode !== 0) {
io.stderr.write(`Ingest failed; re-run: ktx ingest ${connectionId} --project-dir ${args.projectDir}\n`);
return 1;
}
}
if (isInteractive && prompts) {
prompts.outro('Metabase setup complete');
}
return 0;
} finally {
await client.cleanup();
}
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
io.stderr.write(
`${redactSecrets(message, [apiKeyForRedaction ?? '', passwordForRedaction ?? '', args.apiKey ?? ''])}\n`,
);
return 1;
}
}
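The error path shown above passes `?? ''` fallbacks into `redactSecrets`, so a secret that is missing arrives as an empty string. The body of `redactSecrets` is not part of this diff. The sketch below is a hypothetical stand-in; `redactSecretsSketch` and the `[REDACTED]` marker are assumptions. It shows why such a helper should skip empty entries: `'abc'.split('')` splits between every character, so masking `''` would insert the marker between every character of the message.

```typescript
// Hypothetical stand-in for redactSecrets; the real helper is not shown in this diff.
function redactSecretsSketch(message: string, secrets: string[], marker = '[REDACTED]'): string {
  // Drop empty fallbacks ('' from `?? ''`) and duplicates, then mask longer secrets
  // first so that a secret containing another secret is replaced as a whole.
  const candidates = [...new Set(secrets.filter((secret) => secret.length > 0))].sort(
    (a, b) => b.length - a.length,
  );
  let redacted = message;
  for (const secret of candidates) {
    // split/join replaces every occurrence without escaping regex metacharacters.
    redacted = redacted.split(secret).join(marker);
  }
  return redacted;
}

console.log(redactSecretsSketch('401 for key mb_key_123', ['', 'mb_key_123', ''])); // 401 for key [REDACTED]
```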

@@ -1,92 +0,0 @@
import { type Command, InvalidArgumentError } from '@commander-js/extra-typings';
import { collectOption, type KtxCliCommandContext, resolveCommandProjectDir } from '../cli-program.js';
import type { KtxConnectionNotionArgs } from './connection-notion.js';
interface NotionPickOptions {
input?: boolean;
rootPageId: string[];
}
function parseSafeConnectionId(value: string): string {
if (!/^[a-zA-Z0-9][a-zA-Z0-9_-]*$/.test(value)) {
throw new InvalidArgumentError(`Unsafe connection id: ${value}`);
}
return value;
}
function uniqueInOrder(values: string[]): string[] {
const seen = new Set<string>();
const result: string[] = [];
for (const value of values) {
if (!seen.has(value)) {
seen.add(value);
result.push(value);
}
}
return result;
}
function normalizeNotionPageId(value: string): string {
const trimmed = value.trim();
const compact = trimmed.includes('-') ? trimmed.replace(/-/g, '') : trimmed;
if (!/^[0-9a-fA-F]{32}$/.test(compact)) {
throw new Error(`Invalid Notion page UUID: ${value}`);
}
const lower = compact.toLowerCase();
return `${lower.slice(0, 8)}-${lower.slice(8, 12)}-${lower.slice(12, 16)}-${lower.slice(16, 20)}-${lower.slice(20)}`;
}
function buildPickArgs(connectionId: string, projectDir: string, options: NotionPickOptions): KtxConnectionNotionArgs {
if (options.input !== false) {
return {
command: 'pick',
projectDir,
connectionId,
mode: 'interactive',
};
}
const rootPageIds = uniqueInOrder(options.rootPageId.map(normalizeNotionPageId));
if (rootPageIds.length === 0) {
throw new Error('connection notion pick --no-input requires at least one --root-page-id');
}
return {
command: 'pick',
projectDir,
connectionId,
mode: 'non-interactive',
rootPageIds,
};
}
async function runConnectionNotionArgs(context: KtxCliCommandContext, args: KtxConnectionNotionArgs): Promise<void> {
const runner = context.deps.connectionNotion ?? (await import('./connection-notion.js')).runKtxConnectionNotion;
context.setExitCode(await runner(args, context.io));
}
export function registerConnectionNotionCommands(connect: Command, context: KtxCliCommandContext): void {
const notion = connect
.command('notion')
.description('Configure Notion source selection')
.showHelpAfterError()
.addHelpText(
'after',
'\nProject directory defaults to KTX_PROJECT_DIR when set, otherwise the current working directory.\n',
);
notion.action(() => {
notion.outputHelp();
context.setExitCode(0);
});
notion
.command('pick')
.description('Pick Notion root pages for a configured Notion connection')
.argument('<connectionId>', 'Notion connection id', parseSafeConnectionId)
.option('--no-input', 'Disable interactive terminal input')
.option('--root-page-id <id>', 'Root page UUID to crawl; repeatable with --no-input', collectOption, [])
.showHelpAfterError()
.action(async (connectionId: string, options: NotionPickOptions, command) => {
await runConnectionNotionArgs(context, buildPickArgs(connectionId, resolveCommandProjectDir(command), options));
});
}
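`buildPickArgs` (shown above) canonicalizes every `--root-page-id` value before it de-duplicates them, so dashed and compact spellings of the same UUID collapse into one entry. The minimal restatement below makes the same point. The `collect` accumulator is hypothetical and follows Commander's documented repeatable-option pattern, because `collectOption` from `cli-program.js` is not shown. The `Set`-based `uniqueInOrder` is equivalent to the loop above, since a `Set` iterates in insertion order.

```typescript
function normalizeNotionPageId(value: string): string {
  // Accept dashed or compact UUIDs in any case; emit the lowercase 8-4-4-4-12 form.
  const compact = value.trim().replace(/-/g, '');
  if (!/^[0-9a-fA-F]{32}$/.test(compact)) {
    throw new Error(`Invalid Notion page UUID: ${value}`);
  }
  const lower = compact.toLowerCase();
  return `${lower.slice(0, 8)}-${lower.slice(8, 12)}-${lower.slice(12, 16)}-${lower.slice(16, 20)}-${lower.slice(20)}`;
}

function uniqueInOrder(values: string[]): string[] {
  // A Set keeps the first occurrence of each value and iterates in insertion order.
  return [...new Set(values)];
}

// Hypothetical accumulator for a repeatable option (one call per --root-page-id flag).
function collect(value: string, previous: string[]): string[] {
  return [...previous, value];
}

function rootPageIdsFromFlags(flagValues: string[]): string[] {
  const raw = flagValues.reduce<string[]>((acc, value) => collect(value, acc), []);
  const ids = uniqueInOrder(raw.map(normalizeNotionPageId));
  if (ids.length === 0) {
    throw new Error('connection notion pick --no-input requires at least one --root-page-id');
  }
  return ids;
}

// The first two flags name the same page, so they collapse into a single id.
console.log(
  rootPageIdsFromFlags([
    '11111111222233334444555555555555',
    '11111111-2222-3333-4444-555555555555',
    'AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE',
  ]),
);
```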

@@ -1,513 +0,0 @@
import { mkdtemp, readFile, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import {
initKtxProject,
loadKtxProject,
serializeKtxProjectConfig,
type KtxProjectConfig,
} from '@ktx/context/project';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import {
applyNotionPickerWriteback,
discoverNotionPickerPages,
notionPickerPageFromSearchResult,
normalizeNotionPageId,
resolveNotionWorkspaceLabel,
runKtxConnectionNotion,
type NotionPickerApi,
type PickerRenderInput,
type PickerRenderResult,
} from './connection-notion.js';
function makeIo() {
let stdout = '';
let stderr = '';
return {
io: {
stdout: {
write: (chunk: string) => {
stdout += chunk;
},
},
stderr: {
write: (chunk: string) => {
stderr += chunk;
},
},
},
stdout: () => stdout,
stderr: () => stderr,
};
}
type FakeNotionSearchPage = Record<string, unknown> & { id: string; object: 'page' };
const PAGE_IDS = {
engineering: '11111111-1111-1111-1111-111111111111',
architecture: '22222222-2222-2222-2222-222222222222',
stale: '99999999-9999-9999-9999-999999999999',
};
function notionPage(id: string, title: string, parentId: string | null = null): FakeNotionSearchPage {
return {
object: 'page',
id,
archived: false,
parent: parentId ? { type: 'page_id', page_id: parentId } : { type: 'workspace', workspace: true },
properties: {
title: {
type: 'title',
title: [{ plain_text: title }],
},
},
};
}
function fakeNotionApi(pages: FakeNotionSearchPage[]): NotionPickerApi {
return {
search: vi.fn(async (_filterValue, startCursor) => {
if (startCursor === 'page-2') {
return { results: pages.slice(2), hasMore: false, nextCursor: null };
}
return {
results: pages.slice(0, 2),
hasMore: pages.length > 2,
nextCursor: pages.length > 2 ? 'page-2' : null,
};
}),
retrieveBotUser: vi.fn(async () => ({ name: 'Notion bot', bot: { workspace_name: 'Design Workspace' } })),
};
}
describe('normalizeNotionPageId', () => {
it('accepts dashed and compact UUIDs', () => {
expect(normalizeNotionPageId('11111111222233334444555555555555')).toBe(
'11111111-2222-3333-4444-555555555555',
);
expect(normalizeNotionPageId('AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE')).toBe(
'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee',
);
});
});
describe('runKtxConnectionNotion', () => {
let tempDir: string;
beforeEach(async () => {
tempDir = await mkdtemp(join(tmpdir(), 'ktx-cli-notion-pick-'));
});
afterEach(async () => {
await rm(tempDir, { recursive: true, force: true });
});
async function writeProjectConfig(projectDir: string, config: KtxProjectConfig): Promise<void> {
const project = await loadKtxProject({ projectDir });
await project.fileStore.writeFile(
'ktx.yaml',
serializeKtxProjectConfig(config),
'ktx',
'ktx@example.com',
'seed test config',
);
}
it('rejects unsafe connection ids before loading a project', async () => {
const io = makeIo();
const loadProject = vi.fn(async () => {
throw new Error('loadProject should not be called');
});
await expect(
runKtxConnectionNotion(
{
command: 'pick',
projectDir: '/tmp/project',
connectionId: '../evil',
mode: 'interactive',
},
io.io,
{ loadProject },
),
).resolves.toBe(1);
expect(loadProject).not.toHaveBeenCalled();
expect(io.stderr()).toContain('Unsafe connection id: ../evil');
});
it('writes selected root_page_ids while preserving every other Notion connection field', async () => {
const projectDir = join(tempDir, 'project');
const initialized = await initKtxProject({ projectDir, projectName: 'warehouse' });
await writeProjectConfig(projectDir, {
...initialized.config,
connections: {
'notion-main': {
driver: 'notion',
auth_token_ref: 'env:NOTION_TOKEN',
crawl_mode: 'all_accessible',
root_page_ids: ['99999999-9999-9999-9999-999999999999'],
root_database_ids: ['database-1'],
root_data_source_ids: ['data-source-1'],
max_pages_per_run: 12,
max_knowledge_creates_per_run: 2,
max_knowledge_updates_per_run: 7,
last_successful_cursor: '{"phase":"all_accessible_pages","cursor":"cursor-1"}',
unknown_future_field: 'keep-me',
},
},
});
const io = makeIo();
await expect(
runKtxConnectionNotion(
{
command: 'pick',
projectDir,
connectionId: 'notion-main',
mode: 'non-interactive',
rootPageIds: [
'11111111-2222-3333-4444-555555555555',
'66666666-7777-8888-9999-aaaaaaaaaaaa',
],
},
io.io,
),
).resolves.toBe(0);
const yaml = await readFile(join(projectDir, 'ktx.yaml'), 'utf-8');
expect(yaml).toContain('crawl_mode: selected_roots');
expect(yaml).toContain('root_page_ids:');
expect(yaml).toContain('11111111-2222-3333-4444-555555555555');
expect(yaml).toContain('66666666-7777-8888-9999-aaaaaaaaaaaa');
expect(yaml).toContain('root_database_ids:');
expect(yaml).toContain('database-1');
expect(yaml).toContain('root_data_source_ids:');
expect(yaml).toContain('data-source-1');
expect(yaml).toContain('last_successful_cursor: \'{"phase":"all_accessible_pages","cursor":"cursor-1"}\'');
expect(yaml).toContain('unknown_future_field: keep-me');
expect(io.stdout()).toContain('Connection: notion-main');
expect(io.stdout()).toContain('rootPageIds: 2');
expect(io.stdout()).toContain('crawlMode: selected_roots');
});
it('rejects empty writeback, missing connections, and non-Notion connections', async () => {
const projectDir = join(tempDir, 'project');
const initialized = await initKtxProject({ projectDir, projectName: 'warehouse' });
await writeProjectConfig(projectDir, {
...initialized.config,
connections: {
warehouse: {
driver: 'postgres',
url: 'env:DATABASE_URL',
readonly: true,
},
},
});
const project = await loadKtxProject({ projectDir });
await expect(applyNotionPickerWriteback(project, 'warehouse', [])).rejects.toThrow(
'connection notion pick requires at least one root page id',
);
await expect(
applyNotionPickerWriteback(project, 'missing', ['11111111-2222-3333-4444-555555555555']),
).rejects.toThrow('Connection "missing" not found');
await expect(
applyNotionPickerWriteback(project, 'warehouse', ['11111111-2222-3333-4444-555555555555']),
).rejects.toThrow('Connection "warehouse" is not a Notion connection');
});
it('extracts picker page inputs from Notion search results', () => {
expect(notionPickerPageFromSearchResult(notionPage(PAGE_IDS.architecture, 'Architecture', PAGE_IDS.engineering)))
.toEqual({
id: PAGE_IDS.architecture,
title: 'Architecture',
archived: false,
parentId: PAGE_IDS.engineering,
});
expect(
notionPickerPageFromSearchResult({
object: 'page',
id: PAGE_IDS.engineering.replaceAll('-', ''),
archived: true,
parent: { type: 'workspace', workspace: true },
properties: {},
}),
).toEqual({
id: PAGE_IDS.engineering,
title: 'Untitled',
archived: true,
parentId: null,
});
});
it('discovers visible pages up to the cap and reports cap state', async () => {
const api = fakeNotionApi([
notionPage(PAGE_IDS.engineering, 'Engineering'),
notionPage(PAGE_IDS.architecture, 'Architecture', PAGE_IDS.engineering),
notionPage('33333333-3333-3333-3333-333333333333', 'Onboarding', PAGE_IDS.engineering),
]);
await expect(discoverNotionPickerPages(api, { cap: 2 })).resolves.toEqual({
pages: [
{ id: PAGE_IDS.engineering, title: 'Engineering', archived: false, parentId: null },
{ id: PAGE_IDS.architecture, title: 'Architecture', archived: false, parentId: PAGE_IDS.engineering },
],
cappedAtCount: 2,
warnings: [],
});
expect(api.search).toHaveBeenCalledTimes(1);
});
it('keeps partial discovery results when Notion search fails after at least one page', async () => {
const api: NotionPickerApi = {
search: vi
.fn()
.mockResolvedValueOnce({
results: [notionPage(PAGE_IDS.engineering, 'Engineering')],
hasMore: true,
nextCursor: 'cursor-2',
})
.mockRejectedValueOnce(new Error('rate limit after first page')),
retrieveBotUser: vi.fn(async () => ({ name: 'Notion bot' })),
};
await expect(discoverNotionPickerPages(api)).resolves.toEqual({
pages: [{ id: PAGE_IDS.engineering, title: 'Engineering', archived: false, parentId: null }],
cappedAtCount: null,
warnings: ['Notion search stopped early: rate limit after first page'],
});
});
it('uses the Notion workspace name when available and falls back to the connection id', async () => {
await expect(resolveNotionWorkspaceLabel(fakeNotionApi([]), 'notion-main')).resolves.toBe('Design Workspace');
await expect(
resolveNotionWorkspaceLabel(
{
search: vi.fn(),
retrieveBotUser: vi.fn(async () => {
throw new Error('users.me unavailable');
}),
},
'notion-main',
),
).resolves.toBe('notion-main');
});
it('runs interactive discovery, warns about stale roots, renders the TUI, and saves selected roots', async () => {
const projectDir = join(tempDir, 'project');
const initialized = await initKtxProject({ projectDir, projectName: 'warehouse' });
await writeProjectConfig(projectDir, {
...initialized.config,
connections: {
'notion-main': {
driver: 'notion',
auth_token_ref: 'env:NOTION_TOKEN',
crawl_mode: 'all_accessible',
root_page_ids: [PAGE_IDS.stale],
root_database_ids: ['database-1'],
root_data_source_ids: ['data-source-1'],
max_pages_per_run: 12,
max_knowledge_creates_per_run: 2,
max_knowledge_updates_per_run: 7,
last_successful_cursor: null,
},
},
});
const api = fakeNotionApi([
notionPage(PAGE_IDS.engineering, 'Engineering'),
notionPage(PAGE_IDS.architecture, 'Architecture', PAGE_IDS.engineering),
]);
const renderPicker = vi.fn(async (input): Promise<PickerRenderResult> => {
expect(input.connectionId).toBe('notion-main');
expect(input.workspaceLabel).toBe('Design Workspace');
expect(input.currentCrawlMode).toBe('all_accessible');
expect(input.cappedAtCount).toBeNull();
expect(input.initialState.preLoadWarnings).toEqual(['1 stored root_page_ids no longer visible']);
return { kind: 'save', rootPageIds: [PAGE_IDS.engineering] };
});
const io = makeIo();
await expect(
runKtxConnectionNotion(
{
command: 'pick',
projectDir,
connectionId: 'notion-main',
mode: 'interactive',
},
io.io,
{
env: { NOTION_TOKEN: 'ntn_test_token' },
createNotionApi: vi.fn(() => api),
renderPicker,
},
),
).resolves.toBe(0);
const yaml = await readFile(join(projectDir, 'ktx.yaml'), 'utf-8');
expect(yaml).toContain('crawl_mode: selected_roots');
expect(yaml).toContain(PAGE_IDS.engineering);
expect(yaml).not.toContain(PAGE_IDS.stale);
expect(io.stderr()).toContain('1 stored root_page_ids no longer visible');
expect(io.stdout()).toContain('Connection: notion-main');
expect(io.stdout()).toContain('rootPageIds: 1');
});
it('uses inline Notion auth_token for interactive discovery', async () => {
const projectDir = join(tempDir, 'project');
const initialized = await initKtxProject({ projectDir, projectName: 'warehouse' });
await writeProjectConfig(projectDir, {
...initialized.config,
connections: {
'notion-main': {
driver: 'notion',
auth_token: 'ntn_inline_token',
crawl_mode: 'selected_roots',
root_page_ids: [PAGE_IDS.engineering],
root_database_ids: [],
root_data_source_ids: [],
max_pages_per_run: 12,
max_knowledge_creates_per_run: 2,
max_knowledge_updates_per_run: 7,
last_successful_cursor: null,
},
},
});
const api = fakeNotionApi([notionPage(PAGE_IDS.engineering, 'Engineering')]);
const createNotionApi = vi.fn((authToken: string) => {
expect(authToken).toBe('ntn_inline_token');
return api;
});
const io = makeIo();
await expect(
runKtxConnectionNotion(
{
command: 'pick',
projectDir,
connectionId: 'notion-main',
mode: 'interactive',
},
io.io,
{
createNotionApi,
renderPicker: vi.fn(async (): Promise<PickerRenderResult> => ({ kind: 'quit' })),
},
),
).resolves.toBe(0);
expect(createNotionApi).toHaveBeenCalledOnce();
expect(io.stdout()).toContain('No changes saved.');
});
it('passes partial-discovery warnings into the TUI banner state', async () => {
const projectDir = join(tempDir, 'project');
const initialized = await initKtxProject({ projectDir, projectName: 'warehouse' });
await writeProjectConfig(projectDir, {
...initialized.config,
connections: {
'notion-main': {
driver: 'notion',
auth_token_ref: 'env:NOTION_TOKEN',
crawl_mode: 'selected_roots',
root_page_ids: [PAGE_IDS.engineering],
root_database_ids: [],
root_data_source_ids: [],
max_pages_per_run: 12,
max_knowledge_creates_per_run: 2,
max_knowledge_updates_per_run: 7,
last_successful_cursor: null,
},
},
});
const api: NotionPickerApi = {
search: vi
.fn()
.mockResolvedValueOnce({
results: [notionPage(PAGE_IDS.engineering, 'Engineering')],
hasMore: true,
nextCursor: 'cursor-2',
})
.mockRejectedValueOnce(new Error('rate limit after first page')),
retrieveBotUser: vi.fn(async () => ({ name: 'Notion bot', bot: { workspace_name: 'Design Workspace' } })),
};
let renderInput: PickerRenderInput | undefined;
const renderPicker = vi.fn(async (input: PickerRenderInput): Promise<PickerRenderResult> => {
renderInput = input;
return { kind: 'quit' };
});
const io = makeIo();
await expect(
runKtxConnectionNotion(
{
command: 'pick',
projectDir,
connectionId: 'notion-main',
mode: 'interactive',
},
io.io,
{
env: { NOTION_TOKEN: 'ntn_test_token' },
createNotionApi: vi.fn(() => api),
renderPicker,
},
),
).resolves.toBe(0);
expect(renderPicker).toHaveBeenCalledOnce();
if (!renderInput) {
throw new Error('renderPicker was not called');
}
expect(renderInput.initialState.preLoadWarnings).toEqual(['Notion search stopped early: rate limit after first page']);
expect(renderInput.initialState.tree.map((node) => node.title)).toEqual(['Engineering']);
expect(io.stderr()).toContain('Notion search stopped early: rate limit after first page');
expect(io.stdout()).toContain('No changes saved.');
});
it('quits interactive mode without writing when the TUI returns quit', async () => {
const projectDir = join(tempDir, 'project');
const initialized = await initKtxProject({ projectDir, projectName: 'warehouse' });
await writeProjectConfig(projectDir, {
...initialized.config,
connections: {
'notion-main': {
driver: 'notion',
auth_token_ref: 'env:NOTION_TOKEN',
crawl_mode: 'selected_roots',
root_page_ids: [PAGE_IDS.engineering],
root_database_ids: [],
root_data_source_ids: [],
max_pages_per_run: 12,
max_knowledge_creates_per_run: 2,
max_knowledge_updates_per_run: 7,
last_successful_cursor: null,
},
},
});
const before = await readFile(join(projectDir, 'ktx.yaml'), 'utf-8');
const io = makeIo();
await expect(
runKtxConnectionNotion(
{
command: 'pick',
projectDir,
connectionId: 'notion-main',
mode: 'interactive',
},
io.io,
{
env: { NOTION_TOKEN: 'ntn_test_token' },
createNotionApi: vi.fn(() => fakeNotionApi([notionPage(PAGE_IDS.engineering, 'Engineering')])),
renderPicker: vi.fn(async (): Promise<PickerRenderResult> => ({ kind: 'quit' })),
},
),
).resolves.toBe(0);
await expect(readFile(join(projectDir, 'ktx.yaml'), 'utf-8')).resolves.toBe(before);
expect(io.stdout()).toContain('No changes saved.');
});
});
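The tests above pin down how `discoverNotionPickerPages` and `notionPickerPageFromSearchResult` behave without showing how they are implemented. Search results are paged through with a cursor. Discovery stops once `cap` pages have been collected and reports `cappedAtCount`. A search failure after at least one page becomes a warning instead of an error. The sketch below is one way to get that behavior under stated assumptions: `PickerApi`, `discoverPages`, the `'page'` filter argument, the default cap of 500, and rethrowing when the very first call fails are all illustrative choices, not the actual implementation.

```typescript
interface PickerPage { id: string; title: string; archived: boolean; parentId: string | null }
interface SearchResponse { results: Array<Record<string, unknown>>; hasMore: boolean; nextCursor: string | null }
// Assumed API shape, modeled on the fakeNotionApi helper in the tests.
interface PickerApi { search(filterValue: 'page', startCursor?: string): Promise<SearchResponse> }
interface DiscoveryResult { pages: PickerPage[]; cappedAtCount: number | null; warnings: string[] }

function dashedId(id: string): string {
  const c = id.replace(/-/g, '').toLowerCase();
  return `${c.slice(0, 8)}-${c.slice(8, 12)}-${c.slice(12, 16)}-${c.slice(16, 20)}-${c.slice(20)}`;
}

function pageFromSearchResult(result: Record<string, unknown>): PickerPage {
  const parent = result.parent as { type?: string; page_id?: string } | undefined;
  const properties = (result.properties ?? {}) as Record<
    string,
    { type?: string; title?: Array<{ plain_text: string }> }
  >;
  // Notion stores the page title in whichever property has type "title".
  const titleProperty = Object.values(properties).find((property) => property.type === 'title');
  const title = (titleProperty?.title ?? []).map((part) => part.plain_text).join('');
  return {
    id: dashedId(String(result.id)),
    title: title.length > 0 ? title : 'Untitled',
    archived: result.archived === true,
    parentId: parent?.type === 'page_id' && parent.page_id ? dashedId(parent.page_id) : null,
  };
}

async function discoverPages(api: PickerApi, options: { cap?: number } = {}): Promise<DiscoveryResult> {
  const cap = options.cap ?? 500;
  const pages: PickerPage[] = [];
  let cursor: string | undefined;
  for (;;) {
    let response: SearchResponse;
    try {
      response = await api.search('page', cursor);
    } catch (error) {
      // Nothing discovered yet: there is no partial result worth returning.
      if (pages.length === 0) {
        throw error;
      }
      const message = error instanceof Error ? error.message : String(error);
      return { pages, cappedAtCount: null, warnings: [`Notion search stopped early: ${message}`] };
    }
    for (const result of response.results) {
      if (pages.length >= cap) {
        return { pages, cappedAtCount: cap, warnings: [] };
      }
      pages.push(pageFromSearchResult(result));
    }
    if (!response.hasMore || response.nextCursor === null) {
      return { pages, cappedAtCount: null, warnings: [] };
    }
    if (pages.length >= cap) {
      // More results exist, but fetching them could only exceed the cap.
      return { pages, cappedAtCount: cap, warnings: [] };
    }
    cursor = response.nextCursor;
  }
}
```

Checking the cap before the next `search` call is what makes the capped test expect a single call.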

@@ -1,53 +0,0 @@
import type { Command } from '@commander-js/extra-typings';
import { type CommandWithGlobalOptions, type KtxCliCommandContext, resolveCommandProjectDir } from '../cli-program.js';
import type { KtxDoctorArgs } from '../doctor.js';
import { profileMark } from '../startup-profile.js';
profileMark('module:commands/doctor-commands');
function outputMode(options: { json?: boolean }): 'plain' | 'json' {
return options.json === true ? 'json' : 'plain';
}
function inputMode(options: { input?: boolean }): { inputMode?: 'disabled' } {
return options.input === false ? { inputMode: 'disabled' } : {};
}
async function runDoctorArgs(context: KtxCliCommandContext, args: KtxDoctorArgs): Promise<void> {
const runner = context.deps.doctor ?? (await import('../doctor.js')).runKtxDoctor;
context.setExitCode(await runner(args, context.io));
}
export function registerDoctorCommands(program: Command, context: KtxCliCommandContext): void {
const doctor = program
.command('doctor')
.description('Check KTX setup and project readiness')
.option('--json', 'Print JSON output', false)
.option('--no-input', 'Disable interactive terminal input')
.action(async (options: { json?: boolean; input?: boolean }, command) => {
await runDoctorArgs(context, {
command: 'project',
projectDir: resolveCommandProjectDir(command),
outputMode: outputMode(options),
...inputMode(options),
});
});
doctor
.command('setup')
.description('Check KTX install, build, and local runtime readiness')
.option('--json', 'Print JSON output', false)
.option('--no-input', 'Disable interactive terminal input')
.action(
async (
_options: { json?: boolean; input?: boolean },
command: CommandWithGlobalOptions,
) => {
const options = (command.optsWithGlobals ? command.optsWithGlobals() : command.opts()) as {
json?: boolean;
input?: boolean;
};
await runDoctorArgs(context, { command: 'setup', outputMode: outputMode(options), ...inputMode(options) });
},
);
}
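Both `doctor` actions shown above use two small mappers to turn Commander option values into runner args. `--no-input` is a negated boolean flag, so Commander treats `input` as `true` by default and sets it to `false` only when the flag is passed. As a result, `inputMode` adds an `inputMode: 'disabled'` key in that case only and otherwise leaves the key out. The sketch restates the mappers. `doctorProjectArgs` is a hypothetical helper modeled on the `project` branch.

```typescript
type OutputMode = 'plain' | 'json';

function outputMode(options: { json?: boolean }): OutputMode {
  return options.json === true ? 'json' : 'plain';
}

// Only `--no-input` (input === false) produces a key; otherwise the runner's default applies.
function inputMode(options: { input?: boolean }): { inputMode?: 'disabled' } {
  return options.input === false ? { inputMode: 'disabled' } : {};
}

// Hypothetical helper mirroring the args built by the top-level `doctor` action.
function doctorProjectArgs(projectDir: string, options: { json?: boolean; input?: boolean }) {
  return {
    command: 'project' as const,
    projectDir,
    outputMode: outputMode(options),
    ...inputMode(options),
  };
}

console.log(doctorProjectArgs('/repo', { json: true, input: false }));
```

The `setup` subcommand reads `command.optsWithGlobals()` instead of its own options argument. This is presumably because the parent `doctor` command declares the same `--json` and `--no-input` flags, and Commander can assign them to the parent. `optsWithGlobals()` merges option values from a command and its ancestors.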

@@ -1,5 +1,10 @@
import { type Command, Option } from '@commander-js/extra-typings';
import { collectOption, type KtxCliCommandContext, resolveCommandProjectDir } from '../cli-program.js';
import {
collectOption,
type KtxCliCommandContext,
parsePositiveIntegerOption,
resolveCommandProjectDir,
} from '../cli-program.js';
import { wikiWriteCommandSchema } from '../command-schemas.js';
import type { KtxKnowledgeArgs } from '../knowledge.js';
import { profileMark } from '../startup-profile.js';
@@ -24,12 +29,14 @@ export function registerWikiCommands(program: Command, context: KtxCliCommandCon
wiki
.command('list')
.description('List local wiki pages')
.option('--json', 'Print JSON output', false)
.option('--user-id <id>', 'Local user id', 'local')
.action(async (options: { userId: string }, command) => {
.action(async (options: { userId: string; json?: boolean }, command) => {
await runKnowledgeArgs(context, {
command: 'list',
projectDir: resolveCommandProjectDir(command),
userId: options.userId,
json: options.json,
});
});
@@ -37,13 +44,15 @@ export function registerWikiCommands(program: Command, context: KtxCliCommandCon
.command('read')
.description('Read one local wiki page')
.argument('<key>', 'Wiki page key')
.option('--json', 'Print JSON output', false)
.option('--user-id <id>', 'Local user id', 'local')
.action(async (key: string, options: { userId: string }, command) => {
.action(async (key: string, options: { userId: string; json?: boolean }, command) => {
await runKnowledgeArgs(context, {
command: 'read',
projectDir: resolveCommandProjectDir(command),
key,
userId: options.userId,
json: options.json,
});
});
@@ -51,13 +60,17 @@ export function registerWikiCommands(program: Command, context: KtxCliCommandCon
.command('search')
.description('Search local wiki pages')
.argument('<query>', 'Search query')
.option('--json', 'Print JSON output', false)
.option('--user-id <id>', 'Local user id', 'local')
.action(async (query: string, options: { userId: string }, command) => {
.option('--limit <number>', 'Maximum search results', parsePositiveIntegerOption)
.action(async (query: string, options: { userId: string; json?: boolean; limit?: number }, command) => {
await runKnowledgeArgs(context, {
command: 'search',
projectDir: resolveCommandProjectDir(command),
query,
userId: options.userId,
json: options.json,
...(options.limit !== undefined ? { limit: options.limit } : {}),
});
});
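The new `--limit` option on `wiki search` is parsed by `parsePositiveIntegerOption` from `cli-program.js`, whose body is not part of this diff. The value is forwarded only when present, via a conditional spread. The sketch below shows one plausible parser together with that spread. `parsePositiveInteger` is hypothetical and throws a plain `Error` where a Commander parser would normally throw `InvalidArgumentError`. `WikiSearchArgs` is an assumed, trimmed-down args type.

```typescript
// Hypothetical parser: accepts only base-10 digit strings that denote an integer >= 1.
function parsePositiveInteger(value: string): number {
  const trimmed = value.trim();
  const parsed = Number(trimmed);
  if (!/^\d+$/.test(trimmed) || !Number.isSafeInteger(parsed) || parsed < 1) {
    throw new Error(`Expected a positive integer, got "${value}"`);
  }
  return parsed;
}

// Assumed args shape for illustration only.
interface WikiSearchArgs {
  command: 'search';
  projectDir: string;
  query: string;
  userId: string;
  limit?: number;
}

function wikiSearchArgs(
  projectDir: string,
  query: string,
  options: { userId: string; limit?: number },
): WikiSearchArgs {
  return {
    command: 'search',
    projectDir,
    query,
    userId: options.userId,
    // Omit `limit` entirely when the flag is absent instead of sending `limit: undefined`.
    ...(options.limit !== undefined ? { limit: options.limit } : {}),
  };
}

const args = wikiSearchArgs('/repo', 'churn', { userId: 'local', limit: parsePositiveInteger('10') });
console.log(args.limit); // 10
```

Omitting the key keeps `'limit' in args` false. It also stays compatible with `exactOptionalPropertyTypes`, under which an explicit `undefined` is not assignable to an optional property.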

@@ -1,109 +0,0 @@
import { InvalidArgumentError, type Command } from '@commander-js/extra-typings';
import { type KtxCliCommandContext, resolveCommandProjectDir } from '../cli-program.js';
import { publicIngestReadCommandSchema, publicIngestRunCommandSchema } from '../command-schemas.js';
import type { KtxPublicIngestArgs, KtxPublicIngestInputMode } from '../public-ingest.js';
import { profileMark } from '../startup-profile.js';
profileMark('module:commands/public-ingest-commands');
interface PublicIngestOptions {
all?: boolean;
json?: boolean;
input?: boolean;
}
function inputMode(options: { input?: boolean }): KtxPublicIngestInputMode {
return options.input === false ? 'disabled' : 'auto';
}
async function runPublicIngestArgs(context: KtxCliCommandContext, args: KtxPublicIngestArgs): Promise<void> {
const runner = context.deps.publicIngest ?? (await import('../public-ingest.js')).runKtxPublicIngest;
context.setExitCode(await runner(args, context.io));
}
function parsePublicIngestConnectionId(value: string): string {
if (value === 'run') {
throw new InvalidArgumentError('run is reserved; use ktx dev ingest run for low-level adapter syntax');
}
return value;
}
export function registerPublicIngestCommands(program: Command, context: KtxCliCommandContext): void {
const ingest = program
.command('ingest')
.description('Build and refresh KTX context from configured sources')
.usage('[options] [connectionId]')
.argument('[connectionId]', 'Connection id to ingest', parsePublicIngestConnectionId)
.option('--all', 'Ingest every eligible configured source', false)
.option('--json', 'Print JSON output', false)
.option('--no-input', 'Disable interactive terminal input')
.addHelpText(
'after',
[
'',
'Examples:',
' ktx ingest <connectionId> [options]',
' ktx ingest --all [options]',
' ktx ingest status [runId] [options]',
' ktx ingest watch [runId] [options]',
'',
'Project directory defaults to KTX_PROJECT_DIR when set, otherwise the current working directory.',
'',
].join('\n'),
)
.showHelpAfterError()
.hook('preAction', (_thisCommand, actionCommand) => {
context.writeDebug?.('ingest', actionCommand);
})
.action(async (connectionId: string | undefined, _options: PublicIngestOptions, command) => {
const options = command.opts();
if (options.all === true && connectionId) {
throw new Error('ktx ingest accepts either --all or <connectionId>, not both');
}
const args = publicIngestRunCommandSchema.parse({
command: 'run',
projectDir: resolveCommandProjectDir(command),
...(connectionId ? { targetConnectionId: connectionId } : {}),
all: options.all === true,
json: options.json === true,
inputMode: inputMode(options),
});
await runPublicIngestArgs(context, args);
});
ingest
.command('status')
.description('Print status for the latest or selected public ingest run')
.argument('[runId]', 'Public ingest run id')
.option('--json', 'Print JSON output', false)
.option('--no-input', 'Disable interactive terminal input')
.action(async (runId: string | undefined, _options: PublicIngestOptions, command) => {
const options = (command.optsWithGlobals ? command.optsWithGlobals() : command.opts()) as PublicIngestOptions;
const args = publicIngestReadCommandSchema.parse({
command: 'status',
projectDir: resolveCommandProjectDir(command),
...(runId ? { runId } : {}),
json: options.json === true,
inputMode: inputMode(options),
});
await runPublicIngestArgs(context, args);
});
ingest
.command('watch')
.description('Open the latest or selected public ingest visual report')
.argument('[runId]', 'Public ingest run id')
.option('--json', 'Print JSON output instead of the visual report', false)
.option('--no-input', 'Disable interactive terminal input')
.action(async (runId: string | undefined, _options: PublicIngestOptions, command) => {
const options = (command.optsWithGlobals ? command.optsWithGlobals() : command.opts()) as PublicIngestOptions;
const args = publicIngestReadCommandSchema.parse({
command: 'watch',
projectDir: resolveCommandProjectDir(command),
...(runId ? { runId } : {}),
json: options.json === true,
inputMode: inputMode(options),
});
await runPublicIngestArgs(context, args);
});
}
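The `ingest` registration shown above checks two argument rules before it calls the runner. First, `run` is reserved as a connection id, because low-level adapter syntax lives under `ktx dev ingest run`. Second, `--all` cannot be combined with an explicit `<connectionId>`. A dependency-free restatement follows. `ingestRunArgs` and `IngestRunArgs` are hypothetical names, the argument parse step is folded into the builder for brevity, and schema validation via `publicIngestRunCommandSchema` is left out.

```typescript
type IngestInputMode = 'disabled' | 'auto';

// Hypothetical args type modeled on the object passed to publicIngestRunCommandSchema.parse.
interface IngestRunArgs {
  command: 'run';
  projectDir: string;
  targetConnectionId?: string;
  all: boolean;
  json: boolean;
  inputMode: IngestInputMode;
}

function parseIngestConnectionId(value: string): string {
  if (value === 'run') {
    throw new Error('run is reserved; use ktx dev ingest run for low-level adapter syntax');
  }
  return value;
}

function ingestRunArgs(
  projectDir: string,
  connectionId: string | undefined,
  options: { all?: boolean; json?: boolean; input?: boolean },
): IngestRunArgs {
  if (options.all === true && connectionId) {
    throw new Error('ktx ingest accepts either --all or <connectionId>, not both');
  }
  return {
    command: 'run',
    projectDir,
    ...(connectionId ? { targetConnectionId: parseIngestConnectionId(connectionId) } : {}),
    all: options.all === true,
    json: options.json === true,
    // `--no-input` disables prompts; otherwise the runner decides ('auto').
    inputMode: options.input === false ? 'disabled' : 'auto',
  };
}
```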

@@ -18,7 +18,7 @@ async function runRuntimeArgs(context: KtxCliCommandContext, args: KtxRuntimeArg
export function registerRuntimeCommands(program: Command, context: KtxCliCommandContext): void {
const runtime = program
.command('runtime')
.description('Install, inspect, and prune the KTX-managed Python runtime')
.description('Install, start, stop, and inspect the KTX-managed Python runtime')
.showHelpAfterError();
runtime
@@ -64,7 +64,7 @@ export function registerRuntimeCommands(program: Command, context: KtxCliCommand
runtime
.command('status')
.description('Show managed Python runtime status')
.description('Show managed Python runtime status and readiness checks')
.option('--json', 'Print JSON output', false)
.action(async (options: { json?: boolean }) => {
await runRuntimeArgs(context, {
@@ -73,30 +73,4 @@ export function registerRuntimeCommands(program: Command, context: KtxCliCommand
json: options.json === true,
});
});
runtime
.command('doctor')
.description('Check managed Python runtime prerequisites and installation')
.option('--json', 'Print JSON output', false)
.action(async (options: { json?: boolean }) => {
await runRuntimeArgs(context, {
command: 'doctor',
cliVersion: context.packageInfo.version,
json: options.json === true,
});
});
runtime
.command('prune')
.description('Remove stale managed Python runtimes for older CLI versions')
.option('--dry-run', 'List stale runtimes without deleting them', false)
.option('--yes', 'Confirm deletion of stale runtime directories', false)
.action(async (options: { dryRun?: boolean; yes?: boolean }) => {
await runRuntimeArgs(context, {
command: 'prune',
cliVersion: context.packageInfo.version,
dryRun: options.dryRun === true,
yes: options.yes === true,
});
});
}
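In the scan command diff that follows, `<connectionId>` becomes required and is validated by `parseConnectionId`. That parser rejects the names of the removed `status`, `report`, and `relationship*` subcommands, so such a word fails at argument parsing with a targeted message instead of being treated as a connection id. The sketch restates that guard and the `--mode` parser on their own. A plain `Error` stands in for Commander's `InvalidArgumentError`, and `parseScanConnectionId` is a hypothetical name.

```typescript
const REMOVED_SCAN_SUBCOMMAND_NAMES = new Set([
  'status',
  'report',
  'relationships',
  'relationship-apply',
  'relationship-feedback',
  'relationship-calibration',
  'relationship-thresholds',
]);

type ScanMode = 'structural' | 'enriched' | 'relationships';

function parseScanConnectionId(value: string): string {
  // Old subcommand words would otherwise be accepted as connection ids.
  if (REMOVED_SCAN_SUBCOMMAND_NAMES.has(value)) {
    throw new Error(`"${value}" is not a scan connection id`);
  }
  return value;
}

function parseScanMode(value: string): ScanMode {
  if (value === 'structural' || value === 'enriched' || value === 'relationships') {
    return value;
  }
  throw new Error('Allowed choices are structural, enriched, relationships');
}

console.log(parseScanConnectionId('warehouse'), parseScanMode('enriched')); // warehouse enriched
```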

@@ -1,5 +1,5 @@
import { type Command, InvalidArgumentError, Option } from '@commander-js/extra-typings';
import { type KtxCliCommandContext, parsePositiveIntegerOption, resolveCommandProjectDir } from '../cli-program.js';
import { type Command, InvalidArgumentError } from '@commander-js/extra-typings';
import { type KtxCliCommandContext, resolveCommandProjectDir } from '../cli-program.js';
import { runtimeInstallPolicyFromFlags } from '../managed-python-command.js';
import type { KtxScanArgs } from '../scan.js';
import { profileMark } from '../startup-profile.js';
@@ -13,6 +13,16 @@ async function runScanArgs(context: KtxCliCommandContext, args: KtxScanArgs): Pr
type KtxScanModeOption = Extract<KtxScanArgs, { command: 'run' }>['mode'];
const REMOVED_SCAN_SUBCOMMAND_NAMES = new Set([
'status',
'report',
'relationships',
'relationship-apply',
'relationship-feedback',
'relationship-calibration',
'relationship-thresholds',
]);
function parseScanModeOption(value: string): KtxScanModeOption {
if (value === 'structural' || value === 'enriched' || value === 'relationships') {
return value;
@@ -20,82 +30,18 @@ function parseScanModeOption(value: string): KtxScanModeOption {
throw new InvalidArgumentError('Allowed choices are structural, enriched, relationships');
}
type KtxRelationshipStatusOption = Extract<KtxScanArgs, { command: 'relationships' }>['status'];
type KtxRelationshipFeedbackDecisionOption = Extract<KtxScanArgs, { command: 'relationshipFeedback' }>['decision'];
function parseRelationshipStatusOption(value: string): KtxRelationshipStatusOption {
if (value === 'accepted' || value === 'review' || value === 'rejected' || value === 'skipped' || value === 'all') {
return value;
}
throw new InvalidArgumentError('Allowed choices are accepted, review, rejected, skipped, all');
}
function parseRelationshipFeedbackDecisionOption(value: string): KtxRelationshipFeedbackDecisionOption {
if (value === 'accepted' || value === 'rejected' || value === 'all') {
return value;
}
throw new InvalidArgumentError('Allowed choices are accepted, rejected, all');
}
function parseNonEmptyOption(value: string): string {
if (value.trim().length === 0) {
throw new InvalidArgumentError('must not be empty');
function parseConnectionId(value: string): string {
if (REMOVED_SCAN_SUBCOMMAND_NAMES.has(value)) {
throw new InvalidArgumentError(`"${value}" is not a scan connection id`);
}
return value;
}
function parseRelationshipCalibrationThreshold(value: string): number {
const parsed = Number(value);
if (Number.isFinite(parsed) && parsed >= 0 && parsed <= 1) {
return parsed;
}
throw new InvalidArgumentError('Allowed range is 0 through 1');
}
function relationshipDecisionArgs(options: {
accept?: string;
reject?: string;
reviewer?: string;
note?: string;
json?: boolean;
}): Pick<
Extract<KtxScanArgs, { command: 'relationshipDecision' }>,
'candidateId' | 'decision' | 'reviewer' | 'note' | 'json'
> | null {
const decisionCount = [options.accept !== undefined, options.reject !== undefined].filter(Boolean).length;
if (decisionCount > 1) {
throw new Error('Only one relationship review decision option can be used: --accept and --reject conflict');
}
if (options.accept !== undefined) {
return {
candidateId: options.accept,
decision: 'accepted',
reviewer: options.reviewer ?? 'ktx',
note: options.note ?? null,
json: options.json === true,
};
}
if (options.reject !== undefined) {
return {
candidateId: options.reject,
decision: 'rejected',
reviewer: options.reviewer ?? 'ktx',
note: options.note ?? null,
json: options.json === true,
};
}
return null;
}
function collectRelationshipCandidateOption(value: string, previous: string[]): string[] {
return [...previous, parseNonEmptyOption(value)];
}
export function registerScanCommands(program: Command, context: KtxCliCommandContext): void {
const scan = program
program
.command('scan')
.description('Run or inspect standalone connection scans')
.argument('[connectionId]', 'KTX connection id to scan')
.description('Run a standalone connection scan')
.argument('<connectionId>', 'KTX connection id to scan', parseConnectionId)
.option(
'--mode <mode>',
'Scan mode: structural, enriched, relationships (default: structural)',
@ -113,13 +59,7 @@ export function registerScanCommands(program: Command, context: KtxCliCommandCon
.hook('preAction', (_thisCommand, actionCommand) => {
context.writeDebug?.('scan', actionCommand);
})
.action(async (connectionId: string | undefined, options, command) => {
if (!connectionId) {
scan.outputHelp();
context.io.stderr.write('ktx dev scan requires <connectionId> or a subcommand\n');
context.setExitCode(1);
return;
}
.action(async (connectionId: string, options, command) => {
const mode = options.mode ?? 'structural';
await runScanArgs(context, {
command: 'run',
@ -133,226 +73,4 @@ export function registerScanCommands(program: Command, context: KtxCliCommandCon
runtimeInstallPolicy: runtimeInstallPolicyFromFlags(options),
});
});
scan
.command('status')
.description('Print status for a local scan run')
.argument('<runId>', 'Local scan run id')
.addHelpText(
'after',
'\n--project-dir is inherited from `ktx dev scan` (default: KTX_PROJECT_DIR or current working directory).\n',
)
.action(async (runId: string, _options: unknown, command) => {
await runScanArgs(context, {
command: 'status',
projectDir: resolveCommandProjectDir(command),
runId,
});
});
scan
.command('report')
.description('Print a local scan report')
.argument('<runId>', 'Local scan run id')
.option('--json', 'Print the raw scan report JSON', false)
.addHelpText(
'after',
'\n--project-dir is inherited from `ktx dev scan` (default: KTX_PROJECT_DIR or current working directory).\n',
)
.action(async (runId: string, options, command) => {
await runScanArgs(context, {
command: 'report',
projectDir: resolveCommandProjectDir(command),
runId,
json: options.json === true,
});
});
scan
.command('relationships')
.description('Print relationship artifacts for a local scan run')
.argument('<runId>', 'Local scan run id')
.option(
'--status <status>',
'Relationship status: accepted, review, rejected, skipped, all',
parseRelationshipStatusOption,
'review',
)
.option('--limit <count>', 'Maximum relationships to print per status', parsePositiveIntegerOption, 25)
.addOption(
new Option('--accept <candidateId>', 'Record a reviewer accepted decision for a relationship candidate')
.argParser(parseNonEmptyOption)
.conflicts('reject'),
)
.addOption(
new Option('--reject <candidateId>', 'Record a reviewer rejected decision for a relationship candidate')
.argParser(parseNonEmptyOption)
.conflicts('accept'),
)
.option('--note <text>', 'Attach a note when recording a relationship review decision')
.option('--reviewer <name>', 'Reviewer name for a relationship review decision')
.option('--json', 'Print relationship artifacts as JSON', false)
.addHelpText(
'after',
'\n--project-dir is inherited from `ktx dev scan` (default: KTX_PROJECT_DIR or current working directory).\n',
)
.action(async (runId: string, options, command) => {
const decision = relationshipDecisionArgs(options);
if (decision) {
await runScanArgs(context, {
command: 'relationshipDecision',
projectDir: resolveCommandProjectDir(command),
runId,
candidateId: decision.candidateId,
decision: decision.decision,
reviewer: decision.reviewer,
note: decision.note,
json: decision.json,
});
return;
}
await runScanArgs(context, {
command: 'relationships',
projectDir: resolveCommandProjectDir(command),
runId,
status: options.status,
json: options.json === true,
limit: options.limit,
});
});
scan
.command('relationship-apply')
.description('Apply accepted relationship review decisions as manual manifest joins')
.argument('<runId>', 'Local scan run id')
.option('--all-accepted', 'Apply all accepted relationship review decisions for the scan run', false)
.option(
'--candidate <candidateId>',
'Apply one accepted relationship review decision',
collectRelationshipCandidateOption,
[],
)
.option('--dry-run', 'Preview relationships that would be written without rewriting manifest shards', false)
.option('--json', 'Print the apply result as JSON', false)
.addHelpText(
'after',
'\n--project-dir is inherited from `ktx dev scan` (default: KTX_PROJECT_DIR or current working directory).\n',
)
.action(async (runId: string, options, command) => {
const parentOptions = command.parent?.opts() as { dryRun?: boolean } | undefined;
await runScanArgs(context, {
command: 'relationshipApply',
projectDir: resolveCommandProjectDir(command),
runId,
applyAllAccepted: options.allAccepted === true,
candidateIds: options.candidate,
dryRun: options.dryRun === true || parentOptions?.dryRun === true,
json: options.json === true,
});
});
scan
.command('relationship-feedback')
.description('Export persisted relationship review decisions as calibration labels')
.option('--connection <connectionId>', 'Only export labels for one KTX connection')
.option(
'--decision <decision>',
'Relationship feedback decision: accepted, rejected, all',
parseRelationshipFeedbackDecisionOption,
'all',
)
.addOption(new Option('--json', 'Print the export as JSON').default(false).conflicts('jsonl'))
.addOption(new Option('--jsonl', 'Print labels as newline-delimited JSON').default(false).conflicts('json'))
.addHelpText(
'after',
'\n--project-dir is inherited from `ktx dev scan` (default: KTX_PROJECT_DIR or current working directory).\n',
)
.action(async (options, command) => {
await runScanArgs(context, {
command: 'relationshipFeedback',
projectDir: resolveCommandProjectDir(command),
connectionId: options.connection ?? null,
decision: options.decision,
json: options.json === true,
jsonl: options.jsonl === true,
});
});
scan
.command('relationship-calibration')
.description('Summarize relationship feedback labels against current score thresholds')
.option('--connection <connectionId>', 'Only calibrate labels for one KTX connection')
.option(
'--decision <decision>',
'Relationship feedback decision: accepted, rejected, all',
parseRelationshipFeedbackDecisionOption,
'all',
)
.option(
'--accept-threshold <value>',
'Score threshold treated as predicted accepted',
parseRelationshipCalibrationThreshold,
0.85,
)
.option(
'--review-threshold <value>',
'Score threshold treated as predicted review',
parseRelationshipCalibrationThreshold,
0.55,
)
.option('--json', 'Print the calibration report as JSON', false)
.addHelpText(
'after',
'\n--project-dir is inherited from `ktx dev scan` (default: KTX_PROJECT_DIR or current working directory).\n',
)
.action(async (options, command) => {
await runScanArgs(context, {
command: 'relationshipCalibration',
projectDir: resolveCommandProjectDir(command),
connectionId: options.connection ?? null,
decision: options.decision,
acceptThreshold: options.acceptThreshold,
reviewThreshold: options.reviewThreshold,
json: options.json === true,
});
});
scan
.command('relationship-thresholds')
.description('Evaluate relationship feedback labels for offline threshold advice')
.option('--connection <connectionId>', 'Only evaluate labels for one KTX connection')
.option(
'--min-total-labels <count>',
'Minimum scored labels before advice can be ready',
parsePositiveIntegerOption,
20,
)
.option(
'--min-accepted-labels <count>',
'Minimum accepted labels before advice can be ready',
parsePositiveIntegerOption,
5,
)
.option(
'--min-rejected-labels <count>',
'Minimum rejected labels before advice can be ready',
parsePositiveIntegerOption,
5,
)
.option('--json', 'Print the threshold advice report as JSON', false)
.addHelpText(
'after',
'\n--project-dir is inherited from `ktx dev scan` (default: KTX_PROJECT_DIR or current working directory).\n',
)
.action(async (options, command) => {
await runScanArgs(context, {
command: 'relationshipThresholds',
projectDir: resolveCommandProjectDir(command),
connectionId: options.connection ?? null,
minTotalLabels: options.minTotalLabels,
minAcceptedLabels: options.minAcceptedLabels,
minRejectedLabels: options.minRejectedLabels,
json: options.json === true,
});
});
}
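
With the `status`, `report`, and `relationship-*` subcommands removed, `<connectionId>` becomes a required positional argument. The new `parseConnectionId` guard appears to exist so that an old invocation such as `ktx dev scan status` fails with a clear message instead of quietly scanning a connection named `status`. The sketch below is a standalone illustration of that guard and makes two assumptions. First, it declares a local `InvalidArgumentError` class in place of the one commander exports. Second, it guesses the contents of `REMOVED_SCAN_SUBCOMMAND_NAMES`, because the diff does not show that set.

```typescript
// Hypothetical stand-in for commander's InvalidArgumentError, so the sketch runs without dependencies.
class InvalidArgumentError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'InvalidArgumentError';
  }
}

// Assumed contents: the subcommand names this diff removes from `ktx dev scan`.
const REMOVED_SCAN_SUBCOMMAND_NAMES = new Set([
  'status',
  'report',
  'relationships',
  'relationship-apply',
  'relationship-feedback',
  'relationship-calibration',
  'relationship-thresholds',
]);

// Same shape as the argument parser in the diff: reject stale subcommand names, pass everything else through.
function parseConnectionId(value: string): string {
  if (REMOVED_SCAN_SUBCOMMAND_NAMES.has(value)) {
    throw new InvalidArgumentError(`"${value}" is not a scan connection id`);
  }
  return value;
}

// Usage: a real connection id is accepted, while a removed subcommand name is rejected.
for (const candidate of ['warehouse', 'status']) {
  try {
    console.log(`scanning ${parseConnectionId(candidate)}`);
  } catch (error) {
    if (!(error instanceof InvalidArgumentError)) throw error;
    console.log(`error: ${error.message}`);
  }
}
```

Commander reports an `InvalidArgumentError` thrown from an argument parser as a usage error. The guard therefore takes the place of the old "requires <connectionId> or a subcommand" branch in the action, and no compatibility shim is needed.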

@ -2,6 +2,7 @@ import { type Command, InvalidArgumentError, Option } from '@commander-js/extra-
import type { KtxCliCommandContext } from '../cli-program.js';
import { resolveCommandProjectDir } from '../cli-program.js';
import type { KtxSetupDatabaseDriver } from '../setup-databases.js';
import type { KtxSetupLlmBackend } from '../setup-models.js';
import type { KtxSetupSourceType } from '../setup-sources.js';
async function runSetupArgs(
@ -27,6 +28,13 @@ function embeddingBackend(value: string): 'openai' | 'sentence-transformers' {
throw new InvalidArgumentError(`invalid choice '${value}'`);
}
function llmBackend(value: string): KtxSetupLlmBackend {
if (value === 'anthropic' || value === 'vertex') {
return value;
}
throw new InvalidArgumentError(`invalid choice '${value}'`);
}
function databaseDriver(value: string): KtxSetupDatabaseDriver {
if (
value === 'sqlite' ||
@ -93,9 +101,12 @@ function shouldShowSetupEntryMenu(
skipAgents?: boolean;
yes?: boolean;
input?: boolean;
llmBackend?: KtxSetupLlmBackend;
anthropicApiKeyEnv?: string;
anthropicApiKeyFile?: string;
anthropicModel?: string;
vertexProject?: string;
vertexLocation?: string;
skipLlm?: boolean;
embeddingBackend?: string;
embeddingApiKeyEnv?: string;
@ -110,7 +121,6 @@ function shouldShowSetupEntryMenu(
disableHistoricSql?: boolean;
historicSqlWindowDays?: number;
historicSqlMinExecutions?: number;
historicSqlMinCalls?: number;
historicSqlServiceAccountPattern?: string[];
historicSqlRedactionPattern?: string[];
skipDatabases?: boolean;
@ -166,9 +176,12 @@ function shouldShowSetupEntryMenu(
'skipAgents',
'yes',
'input',
'llmBackend',
'anthropicApiKeyEnv',
'anthropicApiKeyFile',
'anthropicModel',
'vertexProject',
'vertexLocation',
'skipLlm',
'embeddingBackend',
'embeddingApiKeyEnv',
@ -180,7 +193,6 @@ function shouldShowSetupEntryMenu(
'disableHistoricSql',
'historicSqlWindowDays',
'historicSqlMinExecutions',
'historicSqlMinCalls',
'skipDatabases',
'source',
'sourceConnectionId',
@ -227,9 +239,12 @@ export function registerSetupCommands(program: Command, context: KtxCliCommandCo
.option('--skip-agents', 'Leave agent integration incomplete for now', false)
.option('--yes', 'Accept safe defaults in non-interactive setup', false)
.option('--no-input', 'Disable interactive terminal input')
.addOption(new Option('--llm-backend <backend>', 'LLM backend').argParser(llmBackend))
.option('--anthropic-api-key-env <name>', 'Environment variable containing the Anthropic API key')
.option('--anthropic-api-key-file <path>', 'File containing the Anthropic API key')
.option('--anthropic-model <model>', 'Anthropic model ID to validate and save')
.option('--vertex-project <project>', 'Google Vertex AI project ID, env:NAME, or file:/path')
.option('--vertex-location <location>', 'Google Vertex AI location, env:NAME, or file:/path')
.addOption(new Option('--skip-llm', 'Leave LLM setup incomplete for now').hideHelp().default(false))
.addOption(new Option('--embedding-backend <backend>', 'Embedding backend').argParser(embeddingBackend))
.option('--embedding-api-key-env <name>', 'Environment variable containing the embedding provider API key')
@ -266,11 +281,6 @@ export function registerSetupCommands(program: Command, context: KtxCliCommandCo
.option('--disable-historic-sql', 'Disable Historic SQL for the selected database', false)
.option('--historic-sql-window-days <number>', 'Historic SQL query-history window', positiveInteger)
.option('--historic-sql-min-executions <number>', 'Minimum Historic SQL executions for a template', positiveInteger)
.option(
'--historic-sql-min-calls <number>',
'Alias for --historic-sql-min-executions',
positiveInteger,
)
.option(
'--historic-sql-service-account-pattern <pattern>',
'Historic SQL service-account regex; repeatable',
@ -344,6 +354,16 @@ export function registerSetupCommands(program: Command, context: KtxCliCommandCo
context.setExitCode(1);
return;
}
if (options.llmBackend === 'vertex' && (options.anthropicApiKeyEnv || options.anthropicApiKeyFile)) {
context.io.stderr.write('Anthropic API key flags are only valid with --llm-backend anthropic.\n');
context.setExitCode(1);
return;
}
if (options.llmBackend === 'anthropic' && (options.vertexProject || options.vertexLocation)) {
context.io.stderr.write('Vertex AI flags are only valid with --llm-backend vertex.\n');
context.setExitCode(1);
return;
}
if (options.embeddingApiKeyEnv && options.embeddingApiKeyFile) {
context.io.stderr.write(
'Choose only one embedding credential source: --embedding-api-key-env or --embedding-api-key-file.\n',
@ -371,7 +391,6 @@ export function registerSetupCommands(program: Command, context: KtxCliCommandCo
const mode = options.new ? 'new' : options.existing ? 'existing' : 'auto';
const resolvedAgentScope = options.global ? 'global' : options.agentScope;
const historicSqlMinExecutions = options.historicSqlMinExecutions ?? options.historicSqlMinCalls;
await runSetupArgs(context, {
command: 'run',
projectDir: resolveCommandProjectDir(command),
@ -383,9 +402,12 @@ export function registerSetupCommands(program: Command, context: KtxCliCommandCo
inputMode: options.input === false ? 'disabled' : 'auto',
yes: options.yes === true,
cliVersion: context.packageInfo.version,
...(options.llmBackend ? { llmBackend: options.llmBackend } : {}),
...(options.anthropicApiKeyEnv ? { anthropicApiKeyEnv: options.anthropicApiKeyEnv } : {}),
...(options.anthropicApiKeyFile ? { anthropicApiKeyFile: options.anthropicApiKeyFile } : {}),
...(options.anthropicModel ? { anthropicModel: options.anthropicModel } : {}),
...(options.vertexProject ? { vertexProject: options.vertexProject } : {}),
...(options.vertexLocation ? { vertexLocation: options.vertexLocation } : {}),
skipLlm: options.skipLlm === true,
...(options.embeddingBackend ? { embeddingBackend: options.embeddingBackend } : {}),
...(options.embeddingApiKeyEnv ? { embeddingApiKeyEnv: options.embeddingApiKeyEnv } : {}),
@ -399,7 +421,9 @@ export function registerSetupCommands(program: Command, context: KtxCliCommandCo
...(options.enableHistoricSql ? { enableHistoricSql: true } : {}),
...(options.disableHistoricSql ? { disableHistoricSql: true } : {}),
...(options.historicSqlWindowDays !== undefined ? { historicSqlWindowDays: options.historicSqlWindowDays } : {}),
...(historicSqlMinExecutions !== undefined ? { historicSqlMinExecutions } : {}),
...(options.historicSqlMinExecutions !== undefined
? { historicSqlMinExecutions: options.historicSqlMinExecutions }
: {}),
...(options.historicSqlServiceAccountPattern.length > 0
? { historicSqlServiceAccountPatterns: options.historicSqlServiceAccountPattern }
: {}),

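The setup changes add `--llm-backend <anthropic|vertex>` and reject credential flags that belong to the other backend. The sketch below pulls those two checks out of the action handler into a pure function so their behavior is easier to see. `LlmFlags` and `llmFlagConflict` are hypothetical names used only for illustration. The messages are copied from the diff. When `--llm-backend` is omitted, neither check fires.

```typescript
type LlmBackend = 'anthropic' | 'vertex';

// Hypothetical subset of the parsed setup options that the two checks read.
interface LlmFlags {
  llmBackend?: LlmBackend;
  anthropicApiKeyEnv?: string;
  anthropicApiKeyFile?: string;
  vertexProject?: string;
  vertexLocation?: string;
}

// Returns the stderr message the setup action would print, or null when the flags are compatible.
function llmFlagConflict(flags: LlmFlags): string | null {
  if (flags.llmBackend === 'vertex' && (flags.anthropicApiKeyEnv || flags.anthropicApiKeyFile)) {
    return 'Anthropic API key flags are only valid with --llm-backend anthropic.';
  }
  if (flags.llmBackend === 'anthropic' && (flags.vertexProject || flags.vertexLocation)) {
    return 'Vertex AI flags are only valid with --llm-backend vertex.';
  }
  // No explicit backend, or only flags matching the chosen backend.
  return null;
}

// Usage: an Anthropic key file combined with the Vertex backend is a conflict.
console.log(llmFlagConflict({ llmBackend: 'vertex', anthropicApiKeyFile: './anthropic.key' }));
```

In the real handler, a non-null result corresponds to writing the message to stderr, calling `context.setExitCode(1)`, and returning before `runSetupArgs` runs.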
Some files were not shown because too many files have changed in this diff.