diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 5d70d495..3da14c7b 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -37,6 +37,9 @@ jobs:
- name: Install TypeScript dependencies
run: pnpm install --frozen-lockfile
+ - name: Run TypeScript dead-code checks
+ run: pnpm run dead-code
+
- name: Run TypeScript checks
run: pnpm run check
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 8908b532..167681a6 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -33,6 +33,19 @@ repos:
name: ruff format (python)
files: ^python/
+ - repo: local
+ hooks:
+ - id: biome-dead-code
+ name: biome dead-code check
+ entry: pnpm exec biome ci . --formatter-enabled=false --assist-enabled=false
+ language: system
+ pass_filenames: false
+ - id: knip-dead-code
+ name: knip dead-code check
+ entry: pnpm exec knip --reporter compact
+ language: system
+ pass_filenames: false
+
- repo: https://github.com/Yelp/detect-secrets
rev: v1.5.0
hooks:
diff --git a/AGENTS.md b/AGENTS.md
index 2e5a684a..4a235864 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -24,6 +24,9 @@ database migrations, ORPC contracts, or `python-service/` layout exist here.
- **MUST**: Keep package/public API changes intentional. Do not add compatibility
wrappers for old KTX names unless the user explicitly asks for a migration
bridge.
+- **MUST**: Treat KTX as having no public users unless the user says otherwise.
+ Legacy support is not necessary by default; prefer clean breaking changes over
+ compatibility shims, migration bridges, or preserved stale behavior.
### Absolute Prohibitions
@@ -86,6 +89,7 @@ pnpm run build
pnpm run type-check
pnpm run test
pnpm run check
+pnpm run dead-code
pnpm --filter @ktx/cli run smoke
pnpm --filter './packages/*' run build
pnpm --filter './packages/*' run test
@@ -127,6 +131,7 @@ shared contracts or package exports are affected.
- Build/export changes: `pnpm run build`
- Workspace scripts: `node --test scripts/*.test.mjs` or the specific script
test file
+- TypeScript dead-code tooling/config changes: `pnpm run dead-code`
- Python semantic layer: `uv run pytest python/ktx-sl/tests -q`
- Python daemon: `uv run pytest python/ktx-daemon/tests -q`
- Python files: also run `uv run pre-commit run --files [FILES]` when
@@ -156,6 +161,23 @@ pnpm run test 2>&1 | tee /tmp/ktx-test-output.log
- Do not manually edit generated or built output under `dist/`; edit source and
rebuild.
+### Dead TypeScript Code Checks
+
+KTX uses Biome for local unused-code linting and Knip for workspace graph
+analysis. These checks are intentionally part of CI and pre-commit because the
+normal development workflow is agent-based.
+
+- Run `pnpm run dead-code` after TypeScript changes.
+- Treat Knip findings as investigation prompts, not automatic deletion orders.
+- Remove private dead code when you confirm there are no imports, dynamic
+ references, generated references, or tests that still need it.
+- Preserve public package exports unless the task explicitly includes API
+ pruning.
+- Add narrow `knip.json` ignores only for intentional dynamic or public cases.
+ Do not add broad package-level ignores to silence unrelated findings.
+- Update `knip.json` when adding dynamic entrypoints, generated files, package
+ exports, CLI bins, or framework files that Knip cannot infer.
+
### CLI Standards
- Use Commander for CLI command trees, arguments, options, help text, custom
diff --git a/README.md b/README.md
index cfabfbcc..b52a31f6 100644
--- a/README.md
+++ b/README.md
@@ -130,9 +130,7 @@ Scan artifacts are written under
```bash
SCAN_OUTPUT="$(ktx scan warehouse --project-dir "$PROJECT_DIR")"
printf '%s\n' "$SCAN_OUTPUT"
-SCAN_RUN_ID="$(printf '%s\n' "$SCAN_OUTPUT" | awk '/^Run: / { print $2 }')"
-ktx scan status --project-dir "$PROJECT_DIR" "$SCAN_RUN_ID"
-ktx scan report --project-dir "$PROJECT_DIR" "$SCAN_RUN_ID"
+ktx status --project-dir "$PROJECT_DIR"
```
For non-SQLite drivers, prefer credential references such as `--url env:NAME`
@@ -147,16 +145,13 @@ version, and is managed by `ktx dev runtime` commands.
KTX requires `uv` on `PATH` to create the managed runtime. Install `uv` with
your system package manager or the official installer before running Python-
backed KTX commands. KTX doesn't download `uv` automatically; run
-`ktx dev runtime doctor` if runtime installation fails:
+`ktx dev runtime status` if runtime installation fails:
```bash
ktx dev runtime install --yes
ktx dev runtime status
-ktx dev runtime doctor
ktx dev runtime start
ktx dev runtime stop
-ktx dev runtime prune --dry-run
-ktx dev runtime prune --yes
```
The release artifact manifest contains the public npm tarball and the bundled `kaelio-ktx`
@@ -223,7 +218,7 @@ KTX provider. Enable it with an environment flag when running an LLM-backed
command:
```bash
-KTX_AI_DEVTOOLS_ENABLED=true ktx dev ingest run \
+KTX_AI_DEVTOOLS_ENABLED=true ktx ingest run \
--connection-id warehouse \
--adapter metabase
```
diff --git a/biome.json b/biome.json
new file mode 100644
index 00000000..35c6d596
--- /dev/null
+++ b/biome.json
@@ -0,0 +1,36 @@
+{
+ "$schema": "https://biomejs.dev/schemas/2.4.15/schema.json",
+ "assist": {
+ "enabled": false
+ },
+ "formatter": {
+ "enabled": false
+ },
+ "files": {
+ "includes": [
+ "scripts/**/*.mjs",
+ "packages/**/*.ts",
+ "packages/**/*.tsx",
+ "docs-site/**/*.ts",
+ "docs-site/**/*.tsx",
+ "docs-site/**/*.mjs",
+ "!**/dist/**",
+ "!**/coverage/**",
+ "!**/.next/**",
+ "!**/node_modules/**",
+ "!**/*.gen.ts",
+ "!**/*.generated.ts"
+ ]
+ },
+ "linter": {
+ "enabled": true,
+ "rules": {
+ "recommended": false,
+ "correctness": {
+ "noUnusedImports": "error",
+ "noUnusedVariables": "error",
+ "noUnusedPrivateClassMembers": "error"
+ }
+ }
+ }
+}
diff --git a/docs-site/components/terminal-preview.tsx b/docs-site/components/terminal-preview.tsx
index a1f950c8..d430c4ac 100644
--- a/docs-site/components/terminal-preview.tsx
+++ b/docs-site/components/terminal-preview.tsx
@@ -47,7 +47,7 @@ export function TerminalPreview() {
diff --git a/docs-site/content/docs/ai-resources/agent-quickstart.mdx b/docs-site/content/docs/ai-resources/agent-quickstart.mdx
index 40983224..6fd6e5ac 100644
--- a/docs-site/content/docs/ai-resources/agent-quickstart.mdx
+++ b/docs-site/content/docs/ai-resources/agent-quickstart.mdx
@@ -22,7 +22,7 @@ Agents should start with the smallest source that answers the task:
| How to check project readiness | [ktx status](/docs/cli-reference/ktx-status) | [Quickstart](/docs/getting-started/quickstart) |
| How context gets built | [Building Context](/docs/guides/building-context) | [ktx ingest](/docs/cli-reference/ktx-ingest) |
| How semantic YAML works | [Writing Context](/docs/guides/writing-context) | [ktx sl](/docs/cli-reference/ktx-sl) |
-| How machine-readable CLI output is shaped | [ktx agent](/docs/cli-reference/ktx-agent) | [Markdown Access](/docs/ai-resources/markdown-access) |
+| How machine-readable CLI output is shaped | [ktx sl](/docs/cli-reference/ktx-sl) | [ktx wiki](/docs/cli-reference/ktx-wiki) |
## Operating workflow
diff --git a/docs-site/content/docs/ai-resources/markdown-access.mdx b/docs-site/content/docs/ai-resources/markdown-access.mdx
index c363a215..12bb7456 100644
--- a/docs-site/content/docs/ai-resources/markdown-access.mdx
+++ b/docs-site/content/docs/ai-resources/markdown-access.mdx
@@ -31,7 +31,8 @@ Every docs page has a Markdown route:
```text
https://docs.kaelio.com/ktx/docs/getting-started/quickstart.md
-https://docs.kaelio.com/ktx/docs/cli-reference/ktx-agent.md
+https://docs.kaelio.com/ktx/docs/cli-reference/ktx-sl.md
+https://docs.kaelio.com/ktx/docs/cli-reference/ktx-wiki.md
https://docs.kaelio.com/ktx/docs/guides/building-context.md
```
diff --git a/docs-site/content/docs/cli-reference/ktx-agent.mdx b/docs-site/content/docs/cli-reference/ktx-agent.mdx
deleted file mode 100644
index cdc4ceac..00000000
--- a/docs-site/content/docs/cli-reference/ktx-agent.mdx
+++ /dev/null
@@ -1,148 +0,0 @@
----
-title: "ktx agent"
-description: "Machine-readable commands for coding agents."
----
-
-Hidden commands that provide machine-readable JSON output for coding agents. These are the commands that agent integrations (Claude Code, Cursor, Codex, OpenCode) call under the hood — you typically won't use them directly.
-
-All `ktx agent` subcommands require `--json` and produce structured JSON output on stdout.
-
-## Command signature
-
-```bash
-ktx agent --json [options]
-```
-
-## Subcommands
-
-| Subcommand | Description |
-|-----------|-------------|
-| `tools` | Print available agent-facing KTX tools |
-| `context` | Print project context for agent planning |
-| `sl list` | List semantic-layer sources |
-| `sl read ` | Read one semantic-layer source |
-| `sl query` | Run a semantic-layer query from a JSON file |
-| `wiki search ` | Search KTX wiki pages |
-| `wiki read ` | Read one KTX wiki page |
-| `sql execute` | Execute read-only SQL with a row limit |
-
-## Options
-
-### `agent tools`
-
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--json` | Print JSON output (required) | — |
-
-### `agent context`
-
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--json` | Print JSON output (required) | — |
-
-### `agent sl list`
-
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--json` | Print JSON output (required) | — |
-| `--connection-id ` | Filter by connection id | — |
-| `--query ` | Search source names and descriptions | — |
-
-### `agent sl read`
-
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--json` | Print JSON output (required) | — |
-| `--connection-id ` | Connection id containing the source | — |
-
-### `agent sl query`
-
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--json` | Print JSON output (required) | — |
-| `--connection-id ` | Connection id for execution (required) | — |
-| `--query-file ` | JSON semantic-layer query file (required) | — |
-| `--execute` | Execute the compiled query against the connection | `false` |
-| `--max-rows ` | Maximum rows to return when executing (1-1000) | — |
-
-### `agent wiki search`
-
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--json` | Print JSON output (required) | — |
-| `--limit ` | Maximum search results | `10` |
-
-### `agent wiki read`
-
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--json` | Print JSON output (required) | — |
-
-### `agent sql execute`
-
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--json` | Print JSON output (required) | — |
-| `--connection-id ` | Connection id for execution (required) | — |
-| `--sql-file ` | SQL file to execute (required) | — |
-| `--max-rows ` | Maximum rows to return, 1-1000 (required) | — |
-
-## Examples
-
-```bash
-# List available tools
-ktx agent tools --json
-
-# Get project context for planning
-ktx agent context --json
-
-# List semantic sources
-ktx agent sl list --json
-
-# Search semantic sources by name
-ktx agent sl list --json --query "revenue"
-
-# Read a semantic source
-ktx agent sl read orders --json --connection-id my-warehouse
-
-# Run a semantic-layer query from a file
-ktx agent sl query --json \
- --connection-id my-warehouse \
- --query-file /tmp/query.json \
- --execute \
- --max-rows 100
-
-# Search wiki pages
-ktx agent wiki search "churn definition" --json
-
-# Read a specific wiki page
-ktx agent wiki read page-abc123 --json
-
-# Execute read-only SQL
-ktx agent sql execute --json \
- --connection-id my-warehouse \
- --sql-file /tmp/query.sql \
- --max-rows 500
-```
-
-## Output
-
-Every `ktx agent` command writes JSON to stdout and diagnostic text to stderr. Agents should parse stdout as JSON and treat a non-zero exit code as a failed tool call.
-
-```json
-{
- "ok": true,
- "data": {
- "type": "agent-response"
- }
-}
-```
-
-## Common errors
-
-| Error | Cause | Recovery |
-|-------|-------|----------|
-| Missing JSON output | `--json` was omitted | Re-run the same subcommand with `--json` |
-| Unknown connection id | The requested connection is not configured in `ktx.yaml` | Call `ktx agent context --json` or `ktx connection list` to discover valid ids |
-| Query file cannot be read | `--query-file` points to a missing or invalid JSON file | Write the query payload to a real file and pass its absolute path |
-| SQL execution rejected | SQL is not read-only or `--max-rows` is missing | Use semantic-layer queries first; for direct SQL, pass read-only SQL and an explicit row limit |
diff --git a/docs-site/content/docs/cli-reference/ktx-dev.mdx b/docs-site/content/docs/cli-reference/ktx-dev.mdx
index 82ba9acb..e00a4585 100644
--- a/docs-site/content/docs/cli-reference/ktx-dev.mdx
+++ b/docs-site/content/docs/cli-reference/ktx-dev.mdx
@@ -1,9 +1,9 @@
---
title: "ktx dev"
-description: "Low-level diagnostics, scans, adapter commands, and mapping tools."
+description: "Low-level project initialization and runtime management."
---
-Hidden commands for low-level project management, diagnostics, direct adapter control, and shell completion. Most users interact with these through higher-level commands like [`ktx ingest`](/docs/cli-reference/ktx-ingest) and [`ktx setup`](/docs/cli-reference/ktx-setup), but `ktx dev` provides direct access when you need fine-grained control.
+`ktx dev` contains development-only project initialization and managed runtime commands. Scan and ingest commands live at the root as [`ktx scan`](/docs/cli-reference/ktx-scan) and [`ktx ingest`](/docs/cli-reference/ktx-ingest).
## Command signature
@@ -16,145 +16,42 @@ ktx dev [options]
| Subcommand | Description |
|-----------|-------------|
| `init [directory]` | Initialize a Git-backed KTX project directory |
-| `runtime` | Install, inspect, and prune the KTX-managed Python runtime |
-| `scan` | Run or inspect standalone connection scans |
-| `ingest run` | Run local ingest for one configured connection and source adapter |
-| `ingest status [runId]` | Print status for a stored local ingest run |
-| `ingest watch [runId]` | Open a stored ingest visual report |
-| `ingest replay ` | Replay a stored ingest run through memory-flow output |
-| `mapping` | Manage Metabase warehouse mappings (same as `ktx connection mapping`) |
-| `completion zsh` | Generate zsh completion script |
+| `runtime` | Install, start, stop, and inspect the KTX-managed Python runtime |
-## Options
-
-### `dev init`
+## `dev init`
| Flag | Description | Default |
|------|-------------|---------|
| `--name ` | Project name written to `ktx.yaml` | — |
| `--force` | Rewrite `ktx.yaml` and scaffold files in an existing project | `false` |
-### `dev runtime`
+## `dev runtime`
+
+`ktx dev runtime` supports `install`, `start`, `stop`, and `status`.
| Flag | Description | Default |
|------|-------------|---------|
| `--feature ` | Runtime feature level for `install` and `start` (`core` or `local-embeddings`) | `core` |
-| `--json` | Print JSON output | `false` |
-| `--yes` | Confirm runtime install or prune actions where supported | `false` |
+| `--json` | Print JSON output for `status` | `false` |
+| `--yes` | Confirm runtime install actions where supported | `false` |
| `--force` | Reinstall or restart where supported | `false` |
-### `dev scan`
-
-See [`ktx scan`](/docs/cli-reference/ktx-scan) for the full scan command reference.
-
-### `dev ingest run`
-
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--connection-id ` | KTX connection id (required) | — |
-| `--adapter ` | Ingest source adapter name (required) | — |
-| `--source-dir ` | Directory containing source files | — |
-| `--database-introspection-url ` | Daemon URL for live-database introspection | — |
-| `--debug-llm-request-file ` | Write sanitized LLM request structure to a JSONL file | — |
-| `--plain` | Print plain text output | `false` |
-| `--json` | Print JSON output | `false` |
-| `--viz` | Render memory-flow TUI output | `false` |
-| `--no-input` | Disable interactive terminal input for visualization | — |
-
-### `dev ingest status`
-
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--report-file ` | Bundle ingest report JSON file to render | — |
-| `--plain` | Print plain text output | `false` |
-| `--json` | Print JSON output | `false` |
-| `--viz` | Render memory-flow TUI output | `false` |
-| `--no-input` | Disable interactive terminal input for visualization | — |
-
-### `dev ingest watch`
-
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--report-file ` | Bundle ingest report JSON file to render | — |
-| `--plain` | Print plain text output | `false` |
-| `--json` | Print JSON output | `false` |
-| `--viz` | Render memory-flow TUI output (the default unless `--plain` or `--json` is set) | `true` |
-| `--no-input` | Disable interactive terminal input for visualization | — |
-
-### `dev ingest replay`
-
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--report-file ` | Bundle ingest report JSON file to render | — |
-| `--plain` | Print plain text output | `false` |
-| `--json` | Print JSON output | `false` |
-| `--viz` | Render memory-flow TUI output | `false` |
-| `--no-input` | Disable interactive terminal input for visualization | — |
-
-### `dev completion zsh`
-
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--install` | Install zsh completion into `~/.zfunc` and update `~/.zshrc` | `false` |
-
## Examples
```bash
-# Initialize a new KTX project
ktx dev init
-
-# Initialize in a specific directory with a project name
ktx dev init ./my-project --name "Analytics Context"
-
-# Re-initialize an existing project
ktx dev init --force
-# Check managed Python runtime readiness
-ktx dev runtime doctor
-
-# Start the managed Python daemon
+ktx dev runtime install --yes
+ktx dev runtime status
ktx dev runtime start
-
-# Run a low-level ingest with a specific adapter
-ktx dev ingest run --connection-id my-dbt --adapter dbt
-
-# Run ingest from a specific source directory
-ktx dev ingest run \
- --connection-id my-dbt \
- --adapter dbt \
- --source-dir ./dbt-project
-
-# View ingest status with the visual TUI
-ktx dev ingest watch run-abc123
-
-# Replay a stored ingest session
-ktx dev ingest replay run-abc123
-
-# View ingest status from a report file
-ktx dev ingest status --report-file /tmp/ingest-report.json
-
-# Generate zsh completions
-ktx dev completion zsh
-
-# Install zsh completions
-ktx dev completion zsh --install
+ktx dev runtime stop
```
-## Output
-
-`ktx dev` commands are diagnostic and may print plain text, JSON, or visual reports depending on the selected flags.
-
-| Mode | How to request it | Use case |
-|------|-------------------|----------|
-| Plain text | `--plain` or default diagnostic output | Human-readable terminal inspection |
-| JSON | `--json` | Agent parsing and automation |
-| Visual report | `--viz` | Interactive memory-flow and ingest debugging |
-
## Common errors
| Error | Cause | Recovery |
|-------|-------|----------|
-| Doctor reports missing runtime pieces | Packages, Python environment, or linked CLI are not ready | Run `pnpm install`, `pnpm run setup:dev`, and `uv sync --all-groups` |
-| Ingest run cannot find adapter | `--adapter` does not match a supported source adapter | Use configured source names from `ktx.yaml` or run higher-level `ktx ingest` |
-| Replay/report file cannot be read | The report path is wrong or the run id is not stored locally | Run `ktx dev ingest status --json` to discover stored run ids and report files |
-| Visual output fails in CI | TUI rendering requires an interactive terminal | Use `--plain --no-input` or `--json --no-input` |
+| Runtime status reports missing pieces | Packages, Python environment, or linked CLI are not ready | Run `pnpm install`, `pnpm run setup:dev`, `uv sync --all-groups`, then `ktx dev runtime status` |
+| Runtime daemon does not start | The managed Python runtime is missing or stale | Run `ktx dev runtime install --yes`, then `ktx dev runtime start` |
diff --git a/docs-site/content/docs/cli-reference/ktx-ingest.mdx b/docs-site/content/docs/cli-reference/ktx-ingest.mdx
index 8ce9d9a5..e1c0e339 100644
--- a/docs-site/content/docs/cli-reference/ktx-ingest.mdx
+++ b/docs-site/content/docs/cli-reference/ktx-ingest.mdx
@@ -1,14 +1,13 @@
---
title: "ktx ingest"
-description: "Build and refresh context from configured sources."
+description: "Run and inspect local ingest memory-flow output."
---
-Ingest context from your configured sources — dbt, Looker, Metabase, MetricFlow, LookML, or Notion. The ingest process extracts metadata from your tools, then uses an LLM agent to reconcile it with existing context, writing semantic sources and knowledge pages to your project.
+`ktx ingest` runs adapter-level local ingest and renders stored ingest reports.
## Command signature
```bash
-ktx ingest [connectionId] [options]
ktx ingest [options]
```
@@ -16,80 +15,59 @@ ktx ingest [options]
| Subcommand | Description |
|-----------|-------------|
-| `status [runId]` | Print status for the latest or selected public ingest run |
-| `watch [runId]` | Open the latest or selected public ingest visual report |
+| `run` | Run local ingest for one configured connection and source adapter |
+| `status [runId]` | Print status for the latest or selected stored local ingest run or report file |
+| `watch [runId]` | Open the latest or selected stored ingest visual report |
+| `replay ` | Replay a stored ingest run or bundle report through memory-flow output |
-## Options
-
-### `ingest` (run)
+## `ingest run`
| Flag | Description | Default |
|------|-------------|---------|
-| `--all` | Ingest every eligible configured source | `false` |
+| `--connection-id ` | KTX connection id | Required |
+| `--adapter ` | Ingest source adapter name | Required |
+| `--source-dir ` | Directory containing source files | — |
+| `--database-introspection-url ` | Daemon URL for live-database introspection | — |
+| `--debug-llm-request-file ` | Write sanitized LLM request structure to a JSONL file | — |
+| `--plain` | Print plain text output | `true` |
| `--json` | Print JSON output | `false` |
-| `--no-input` | Disable interactive terminal input | — |
+| `--viz` | Render memory-flow TUI output | `false` |
+| `--yes` | Install the managed Python runtime without prompting when required | `false` |
+| `--no-input` | Disable interactive terminal input for visualization and runtime installation | — |
-### `ingest status`
+## `ingest status`, `watch`, and `replay`
| Flag | Description | Default |
|------|-------------|---------|
+| `--report-file ` | Bundle ingest report JSON file to render | — |
+| `--plain` | Print plain text output | `true` for `status` and `replay` |
| `--json` | Print JSON output | `false` |
-| `--no-input` | Disable interactive terminal input | — |
-
-### `ingest watch`
-
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--json` | Print JSON output instead of the visual report | `false` |
-| `--no-input` | Disable interactive terminal input | — |
+| `--viz` | Render memory-flow TUI output | `true` for `watch` |
+| `--no-input` | Disable interactive terminal input for visualization | — |
## Examples
```bash
-# Ingest from a specific connection
-ktx ingest my-dbt-source
+ktx ingest run --connection-id my-dbt-source --adapter dbt
+ktx ingest run --connection-id prod-metabase --adapter metabase --yes
-# Ingest from all eligible sources
-ktx ingest --all
-
-# Check the status of the latest ingest
ktx ingest status
-
-# Check the status of a specific ingest run
ktx ingest status run-abc123
-
-# Watch the latest ingest report
-ktx ingest watch
-
-# Get ingest status as JSON
ktx ingest status --json
-```
-## Low-level ingest commands
+ktx ingest watch
+ktx ingest watch run-abc123
-For adapter-level control, use `ktx dev ingest`. See [`ktx dev`](/docs/cli-reference/ktx-dev) for the full low-level ingest surface including `run`, `status`, `watch`, and `replay` with output mode options (`--plain`, `--json`, `--viz`).
-
-## Output
-
-Ingest run commands print progress and create a stored ingest report. `ktx ingest status --json` returns the run state, adapter, connection, and summary information.
-
-```json
-{
- "runId": "ingest-local-abc123",
- "status": "completed",
- "connectionId": "dbt-main",
- "summary": {
- "semanticSourcesChanged": 4,
- "knowledgePagesChanged": 2
- }
-}
+ktx ingest replay run-abc123
+ktx ingest replay run-abc123 --viz
+ktx ingest replay run-abc123 --report-file /tmp/ingest-report.json
```
## Common errors
| Error | Cause | Recovery |
|-------|-------|----------|
-| No eligible sources | `ktx.yaml` has no configured context source for ingest | Add a source with `ktx setup` or `ktx connection add`, then rerun ingest |
| Ingest needs credentials | The source adapter requires API or git access | Configure the referenced environment variable or secret file |
-| Latest run not found | No ingest run has been started in this project | Run `ktx ingest ` or `ktx ingest --all` first |
+| Ingest run cannot find adapter | `--adapter` does not match a supported source adapter | Use a configured adapter such as `dbt`, `metabase`, `looker`, `lookml`, `notion`, or `live-database` |
+| Latest run not found | No ingest run has been started in this project | Run `ktx ingest run --connection-id --adapter ` first |
| Report watch fails in a non-interactive shell | Visual report needs a terminal | Use `ktx ingest status --json` for agent and CI workflows |
diff --git a/docs-site/content/docs/cli-reference/ktx-scan.mdx b/docs-site/content/docs/cli-reference/ktx-scan.mdx
index 0c37eccb..2f73ed99 100644
--- a/docs-site/content/docs/cli-reference/ktx-scan.mdx
+++ b/docs-site/content/docs/cli-reference/ktx-scan.mdx
@@ -1,163 +1,39 @@
---
title: "ktx scan"
-description: "Run or inspect database scans."
+description: "Run standalone database scans."
---
-Discover your database schema — tables, columns, types, constraints, and relationships. Scanning is the first step in building context: KTX needs to understand your warehouse structure before it can build semantic sources.
-
-Scan commands live under `ktx dev scan`. See also the [Building Context](/docs/guides/building-context) guide for a walkthrough.
+Discover a configured database connection's schema, including tables, columns, types, constraints, and optional relationship signals.
## Command signature
```bash
-ktx dev scan [options]
-ktx dev scan [options]
+ktx scan [options]
```
-## Subcommands
-
-| Subcommand | Description |
-|-----------|-------------|
-| `status ` | Print status for a local scan run |
-| `report ` | Print a local scan report |
-| `relationships ` | Print relationship artifacts for a local scan run |
-| `relationship-apply ` | Apply accepted relationship review decisions as manual manifest joins |
-| `relationship-feedback` | Export persisted relationship review decisions as calibration labels |
-| `relationship-calibration` | Summarize relationship feedback labels against current score thresholds |
-| `relationship-thresholds` | Evaluate relationship feedback labels for offline threshold advice |
-
## Options
-### `scan` (run)
-
| Flag | Description | Default |
|------|-------------|---------|
| `--mode ` | Scan mode: `structural`, `enriched`, or `relationships` | `structural` |
| `--dry-run` | Run without writing scan results | `false` |
| `--database-introspection-url ` | Daemon URL for live-database introspection | — |
-
-### `scan report`
-
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--json` | Print the raw scan report JSON | `false` |
-
-### `scan relationships`
-
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--status ` | Filter by status: `accepted`, `review`, `rejected`, `skipped`, or `all` | `review` |
-| `--limit ` | Maximum relationships to print per status | `25` |
-| `--accept ` | Record an accepted decision for a relationship candidate | — |
-| `--reject ` | Record a rejected decision for a relationship candidate | — |
-| `--note ` | Attach a note when recording a relationship review decision | — |
-| `--reviewer ` | Reviewer name for a relationship review decision | — |
-| `--json` | Print relationship artifacts as JSON | `false` |
-
-### `scan relationship-apply`
-
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--all-accepted` | Apply all accepted relationship review decisions for the scan run | `false` |
-| `--candidate ` | Apply one accepted relationship review decision; repeatable | — |
-| `--dry-run` | Preview relationships that would be written without rewriting manifest shards | `false` |
-| `--json` | Print the apply result as JSON | `false` |
-
-### `scan relationship-feedback`
-
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--connection ` | Only export labels for one KTX connection | — |
-| `--decision ` | Filter: `accepted`, `rejected`, or `all` | `all` |
-| `--json` | Print the export as JSON | `false` |
-| `--jsonl` | Print labels as newline-delimited JSON | `false` |
-
-### `scan relationship-calibration`
-
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--connection ` | Only calibrate labels for one KTX connection | — |
-| `--decision ` | Filter: `accepted`, `rejected`, or `all` | `all` |
-| `--accept-threshold ` | Score threshold treated as predicted accepted (0–1) | `0.85` |
-| `--review-threshold ` | Score threshold treated as predicted review (0–1) | `0.55` |
-| `--json` | Print the calibration report as JSON | `false` |
-
-### `scan relationship-thresholds`
-
-| Flag | Description | Default |
-|------|-------------|---------|
-| `--connection ` | Only evaluate labels for one KTX connection | — |
-| `--min-total-labels ` | Minimum scored labels before advice can be ready | `20` |
-| `--min-accepted-labels ` | Minimum accepted labels before advice can be ready | `5` |
-| `--min-rejected-labels ` | Minimum rejected labels before advice can be ready | `5` |
-| `--json` | Print the threshold advice report as JSON | `false` |
+| `--yes` | Install the managed Python runtime without prompting when required | `false` |
+| `--no-input` | Disable interactive managed runtime installation | — |
## Examples
```bash
-# Run a structural scan of a connection
-ktx dev scan my-warehouse
-
-# Run a scan with LLM enrichment
-ktx dev scan my-warehouse --mode enriched
-
-# Run a scan with relationship detection
-ktx dev scan my-warehouse --mode relationships
-
-# Dry-run a scan (don't write results)
-ktx dev scan my-warehouse --dry-run
-
-# Check the status of a scan run
-ktx dev scan status run-abc123
-
-# View the scan report
-ktx dev scan report run-abc123
-
-# View scan report as JSON
-ktx dev scan report run-abc123 --json
-
-# List relationship candidates pending review
-ktx dev scan relationships run-abc123
-
-# List all relationships regardless of status
-ktx dev scan relationships run-abc123 --status all
-
-# Accept a relationship candidate
-ktx dev scan relationships run-abc123 --accept candidate-xyz
-
-# Reject a relationship candidate with a note
-ktx dev scan relationships run-abc123 --reject candidate-xyz --note "false positive"
-
-# Apply all accepted relationships to the manifest
-ktx dev scan relationship-apply run-abc123 --all-accepted
-
-# Preview what would be applied
-ktx dev scan relationship-apply run-abc123 --all-accepted --dry-run
-
-# Export relationship feedback as calibration labels
-ktx dev scan relationship-feedback --json
-
-# Calibrate relationship detection thresholds
-ktx dev scan relationship-calibration --accept-threshold 0.9 --review-threshold 0.6
-
-# Get threshold advice based on review decisions
-ktx dev scan relationship-thresholds
+ktx scan my-warehouse
+ktx scan my-warehouse --mode enriched
+ktx scan my-warehouse --mode relationships
+ktx scan my-warehouse --dry-run
+ktx scan my-warehouse --database-introspection-url http://127.0.0.1:8765
```
## Output
-Scan commands write scan artifacts under the KTX project directory and print status or report summaries. Use `--json` on report and relationship commands when an agent needs structured output.
-
-```json
-{
- "runId": "scan-local-abc123",
- "status": "completed",
- "mode": "structural",
- "changes": {
- "tablesAdded": 42
- }
-}
-```
+`ktx scan` prints a human summary and writes scan artifacts under the KTX project directory unless `--dry-run` is set. Use `ktx status` after a scan to inspect project readiness and next setup work.
## Common errors
@@ -165,5 +41,4 @@ Scan commands write scan artifacts under the KTX project directory and print sta
|-------|-------|----------|
| Scan cannot connect | Connection credentials or network access are invalid | Run `ktx connection test ` and update the connection before scanning |
| Enriched scan cannot describe columns | LLM credentials are missing or invalid | Complete LLM setup with `ktx setup` before enriched scans |
-| Relationship apply writes nothing | No accepted candidates match the provided run id or candidate ids | Inspect `ktx dev scan relationships --status accepted` first |
-| Calibration is not ready | Too few reviewed relationship labels exist | Review and accept/reject more candidates, then rerun calibration |
+| Relationship scan has limited evidence | The connector cannot provide optional validation or statistics | Re-run with a connector that supports the missing capability, or treat relationship output as lower-confidence context |
diff --git a/docs-site/content/docs/cli-reference/ktx-sl.mdx b/docs-site/content/docs/cli-reference/ktx-sl.mdx
index 4ec7bdd1..f5a31b27 100644
--- a/docs-site/content/docs/cli-reference/ktx-sl.mdx
+++ b/docs-site/content/docs/cli-reference/ktx-sl.mdx
@@ -28,6 +28,7 @@ ktx sl [options]
| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id ` | Filter by KTX connection id | — |
+| `--query ` | Search source names and descriptions | — |
| `--output ` | Output mode: `pretty` (default in TTY), `plain` (TSV), or `json` | `pretty` |
| `--json` | Shortcut for `--output=json` (overrides `--output`) | `false` |
@@ -36,6 +37,7 @@ ktx sl [options]
| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id ` | KTX connection id (required) | — |
+| `--json` | Print JSON output | `false` |
### `sl validate`
@@ -55,6 +57,7 @@ ktx sl [options]
| Flag | Description | Default |
|------|-------------|---------|
| `--connection-id ` | KTX connection id | — |
+| `--query-file ` | JSON semantic-layer query file | — |
| `--measure ` | Measure to query; repeatable (at least one required) | — |
| `--dimension ` | Dimension to include; repeatable | — |
| `--filter ` | Filter expression; repeatable | — |
@@ -78,9 +81,15 @@ ktx sl list --connection-id my-warehouse
# List sources as JSON
ktx sl list --json
+# Search sources as JSON
+ktx sl list --json --query "revenue"
+
# Read a source definition
ktx sl read orders --connection-id my-warehouse
+# Read a source definition as JSON
+ktx sl read orders --connection-id my-warehouse --json
+
# Validate a source against the live schema
ktx sl validate orders --connection-id my-warehouse
@@ -119,6 +128,13 @@ ktx sl query \
--dimension orders.created_date \
--execute \
--max-rows 1000
+
+# Execute a query from a JSON file
+ktx sl query \
+ --connection-id my-warehouse \
+ --query-file query.json \
+ --execute \
+ --max-rows 100
```
## Output
diff --git a/docs-site/content/docs/cli-reference/ktx-wiki.mdx b/docs-site/content/docs/cli-reference/ktx-wiki.mdx
index a709ac07..7e45420e 100644
--- a/docs-site/content/docs/cli-reference/ktx-wiki.mdx
+++ b/docs-site/content/docs/cli-reference/ktx-wiki.mdx
@@ -26,19 +26,23 @@ ktx wiki [options]
| Flag | Description | Default |
|------|-------------|---------|
+| `--json` | Print JSON output | `false` |
| `--user-id ` | Local user id | `local` |
### `wiki read`
| Flag | Description | Default |
|------|-------------|---------|
+| `--json` | Print JSON output | `false` |
| `--user-id ` | Local user id | `local` |
### `wiki search`
| Flag | Description | Default |
|------|-------------|---------|
+| `--json` | Print JSON output | `false` |
| `--user-id ` | Local user id | `local` |
+| `--limit ` | Maximum search results | — |
### `wiki write`
@@ -58,12 +62,21 @@ ktx wiki [options]
# List all wiki pages
ktx wiki list
+# List all wiki pages as JSON
+ktx wiki list --json
+
# Read a specific wiki page
ktx wiki read revenue-definitions
+# Read a specific wiki page as JSON
+ktx wiki read revenue-definitions --json
+
# Search wiki pages
ktx wiki search "monthly recurring revenue"
+# Search wiki pages as JSON
+ktx wiki search "monthly recurring revenue" --json --limit 10
+
# Write a global knowledge page
ktx wiki write revenue-definitions \
--summary "Canonical revenue metric definitions" \
@@ -97,13 +110,16 @@ Wiki commands print local knowledge pages and search results. Agents should sear
```json
{
- "results": [
- {
- "key": "revenue-definitions",
- "summary": "Canonical revenue metric definitions",
- "score": 0.92
- }
- ]
+ "kind": "list",
+ "data": {
+ "items": [
+ {
+ "key": "revenue-definitions",
+ "summary": "Canonical revenue metric definitions",
+ "score": 0.92
+ }
+ ]
+ }
}
```
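
A consumer of the JSON above might validate the envelope before using it. The following is a minimal sketch, assuming the `--json` output always has the `kind`/`data.items` shape shown in this example; `parseWikiList`, `WikiListItem`, and the decision to treat `score` as optional are illustrative assumptions, not part of the KTX API.

```typescript
// Hypothetical shape, inferred only from the example envelope above.
interface WikiListItem {
  key: string;
  summary: string;
  score?: number; // assumed optional; the example shows it on a result item
}

// Runtime check that an unknown value looks like a wiki list item.
function isWikiListItem(value: unknown): value is WikiListItem {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record.key === "string" && typeof record.summary === "string";
}

// Parse the CLI's JSON text and return the items, or throw when the
// envelope does not match the assumed { kind: "list", data: { items } } shape.
function parseWikiList(raw: string): WikiListItem[] {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("expected a JSON object");
  }
  const envelope = parsed as Record<string, unknown>;
  const data = envelope.data;
  const items =
    typeof data === "object" && data !== null
      ? (data as Record<string, unknown>).items
      : undefined;
  if (envelope.kind !== "list" || !Array.isArray(items)) {
    throw new Error("unexpected envelope shape");
  }
  return items.filter(isWikiListItem);
}
```

Checking the envelope up front lets an agent fail with a clear message instead of reading `undefined` fields if the output shape changes.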
diff --git a/docs-site/content/docs/cli-reference/meta.json b/docs-site/content/docs/cli-reference/meta.json
index a5d7a95f..bed3f98c 100644
--- a/docs-site/content/docs/cli-reference/meta.json
+++ b/docs-site/content/docs/cli-reference/meta.json
@@ -9,7 +9,6 @@
"ktx-sl",
"ktx-wiki",
"ktx-status",
- "ktx-agent",
"ktx-dev"
]
}
diff --git a/docs-site/content/docs/concepts/context-as-code.mdx b/docs-site/content/docs/concepts/context-as-code.mdx
index e40665ec..3c43082e 100644
--- a/docs-site/content/docs/concepts/context-as-code.mdx
+++ b/docs-site/content/docs/concepts/context-as-code.mdx
@@ -59,7 +59,7 @@ dbt / Looker / Metabase / Notion
A typical branch shows a semantic diff: "this ingest added 3 new sources from dbt, updated 2 join definitions based on schema changes, and created 1 knowledge page from a Notion doc." Analytics engineers review the diff, verify that the new sources look correct, and merge.
-Teams usually run this on demand while setting up a source, then schedule it once the source is stable. A cron job or CI schedule can run `ktx ingest --all --no-input` overnight on an ingest branch so the latest dbt manifests, BI metadata, and documentation updates are ready for review each morning.
+Teams usually run this on demand while setting up a source, then schedule it once the source is stable. A cron job or CI schedule can run `ktx ingest run --connection-id --adapter --no-input` overnight on an ingest branch so the latest dbt manifests, BI metadata, and documentation updates are ready for review each morning.
Once merged, agents querying through the KTX CLI see the updated context immediately. No deployment step, no cache invalidation, no restart. The files are the source of truth, and agents read them on every request.
diff --git a/docs-site/content/docs/getting-started/quickstart.mdx b/docs-site/content/docs/getting-started/quickstart.mdx
index ece3ceac..6aef2b14 100644
--- a/docs-site/content/docs/getting-started/quickstart.mdx
+++ b/docs-site/content/docs/getting-started/quickstart.mdx
@@ -211,7 +211,7 @@ KTX writes project state as plain files so agents can inspect and edit changes i
| `semantic-layer//*.yaml` | context build, ingestion, or `ktx sl write` | Semantic source definitions agents use for SQL generation |
| `knowledge/global/*.md` | ingestion or `ktx wiki write --scope global` | Shared business context and metric definitions |
| `knowledge/user//*.md` | `ktx wiki write --scope user` | User-scoped notes for one agent/user context |
-| `.claude/skills/ktx/SKILL.md`, `.agents/skills/ktx/SKILL.md` | CLI-mode agent integration setup | Agent instructions for calling `ktx agent` commands |
+| `.claude/skills/ktx/SKILL.md`, `.agents/skills/ktx/SKILL.md` | CLI-mode agent integration setup | Agent instructions for calling public `ktx` commands |
## Verify it worked
@@ -239,7 +239,7 @@ Agent integration ready: yes (claude-code:project)
| `ktx: command not found` | The KTX package is not installed globally, or the shell cannot find the global binary | Run `npm install -g @kaelio/ktx` and open a new shell |
| LLM health check fails | Missing, invalid, or unauthorized Anthropic API key | Export `ANTHROPIC_API_KEY` or rerun `ktx setup` and choose the file-backed secret option |
| OpenAI embedding check fails | `OPENAI_API_KEY` is missing when OpenAI embeddings are selected | Export `OPENAI_API_KEY`, or rerun setup and choose local sentence-transformers embeddings |
-| Local embeddings hang or fail | The managed Python runtime cannot start or the local model runtime is unavailable | Install `uv`, run `ktx dev runtime doctor`, then run `ktx dev runtime install --feature local-embeddings --yes` and rerun setup |
+| Local embeddings hang or fail | The managed Python runtime cannot start or the local model runtime is unavailable | Install `uv`, run `ktx dev runtime status`, then run `ktx dev runtime install --feature local-embeddings --yes` and rerun setup |
| Database connection test fails | Credentials, network access, warehouse, database, or schema value is wrong | Test the same URL with the database's native client, then rerun `ktx connection add ... --force` or rerun setup |
| `KTX context built: no` in `ktx status` | Setup saved configuration but did not build context | Run `ktx setup` and choose to build context now |
| Agent integration is incomplete | Setup skipped the agents step or the target was not installed | Run `ktx setup --agents --target codex --project` using the target you need |
diff --git a/docs-site/content/docs/guides/building-context.mdx b/docs-site/content/docs/guides/building-context.mdx
index 31d55bac..25d873d9 100644
--- a/docs-site/content/docs/guides/building-context.mdx
+++ b/docs-site/content/docs/guides/building-context.mdx
@@ -12,7 +12,7 @@ Scanning connects to your database and extracts structural metadata. KTX stores
### Running a scan
```bash
-ktx dev scan
+ktx scan
```
This runs a structural scan by default. You can control what the scan does with the `--mode` flag:
@@ -25,25 +25,18 @@ This runs a structural scan by default. You can control what the scan does with
```bash
# Scan with relationship detection
-ktx dev scan my-postgres --mode relationships
+ktx scan my-postgres --mode relationships
# Preview without writing results
-ktx dev scan my-postgres --dry-run
+ktx scan my-postgres --dry-run
```
-### Checking scan status
+### Checking scan results
-Every scan produces a run ID. Use it to check progress or review results:
+Every scan prints a summary and writes local artifacts. Use `ktx status` after a scan to review project readiness and follow-up setup work:
```bash
-# Check status of a scan run
-ktx dev scan status
-
-# Print the full scan report
-ktx dev scan report
-
-# Get the report as JSON for scripting
-ktx dev scan report --json
+ktx status
```
### Relationship detection
@@ -56,49 +49,7 @@ Many databases lack declared foreign keys. KTX infers relationships by scoring c
| 0.55 – 0.84 | `review` | Plausible — needs human review |
| < 0.55 | `rejected` | Low confidence — not applied |
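
The score bands in the table can be read as a simple two-threshold classification. This is a minimal sketch, not KTX's implementation: the function name, the `accepted` status label for the top band, and the default cut-offs of 0.85 and 0.55 are assumptions taken from the thresholds this guide mentions.

```typescript
// Status labels; "accepted" for the top band is assumed from the guide's wording.
type RelationshipStatus = "accepted" | "review" | "rejected";

// Map a confidence score to a status using two cut-offs.
// Defaults (0.85 and 0.55) are assumptions based on the table above.
function classifyRelationship(
  score: number,
  acceptThreshold = 0.85,
  reviewThreshold = 0.55,
): RelationshipStatus {
  if (Number.isNaN(score)) {
    throw new RangeError("score must be a number");
  }
  if (score >= acceptThreshold) return "accepted";
  if (score >= reviewThreshold) return "review";
  return "rejected";
}
```

Under these assumptions a score of exactly 0.85 lands in the top band and a score of exactly 0.55 lands in `review`.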
-After a relationship scan, review the candidates:
-
-```bash
-# Show candidates pending review (default)
-ktx dev scan relationships
-
-# Show all candidates regardless of status
-ktx dev scan relationships --status all
-
-# Accept a specific candidate
-ktx dev scan relationships --accept
-
-# Reject a candidate with a note
-ktx dev scan relationships --reject --note "These columns share a name but are unrelated"
-```
-
-Once you've reviewed candidates, apply the accepted ones as joins in your semantic layer:
-
-```bash
-# Apply all accepted relationships
-ktx dev scan relationship-apply --all-accepted
-
-# Preview what would be applied
-ktx dev scan relationship-apply --all-accepted --dry-run
-
-# Apply a specific candidate
-ktx dev scan relationship-apply --candidate
-```
-
-### Calibrating thresholds
-
-As you review more relationships, KTX can evaluate whether the default thresholds (0.85 accept, 0.55 review) are optimal for your schema:
-
-```bash
-# See how your feedback aligns with current thresholds
-ktx dev scan relationship-calibration --connection my-postgres
-
-# Get threshold recommendations (needs 20+ labels, 5+ accepted, 5+ rejected)
-ktx dev scan relationship-thresholds --connection my-postgres
-
-# Export your review decisions as calibration labels
-ktx dev scan relationship-feedback --connection my-postgres
-```
+Relationship scans run with `ktx scan --mode relationships`. This command only executes the scan; relationship review and calibration subcommands are not part of the current CLI surface.
## Ingestion
@@ -115,19 +66,7 @@ Each ingest run follows this flow:
### Running an ingest
```bash
-# Ingest one configured context source
-ktx ingest my-dbt-source
-
-# Ingest every configured context source
-ktx ingest --all
-```
-
-The public `ktx ingest` command uses the source configuration in `ktx.yaml`, including the source `driver` and any adapter-specific paths or credentials.
-
-For adapter-level debugging, use the low-level `ktx dev ingest run` command:
-
-```bash
-ktx dev ingest run --connection-id my-dbt-source --adapter dbt
+ktx ingest run --connection-id my-dbt-source --adapter dbt
```
Useful low-level flags:
@@ -152,7 +91,7 @@ ktx ingest status
ktx ingest watch
# Replay a past ingest run
-ktx dev ingest replay
+ktx ingest replay
```
The `watch` command opens an interactive TUI that shows the memory-flow output — every tool call, LLM decision, and artifact written during the ingest.
@@ -235,7 +174,7 @@ Orders in "pending" status for more than 48 hours are flagged for review.
Every ingest session records a full transcript — tool calls, LLM responses, and write decisions. You can replay any session to debug why a source was written a certain way:
```bash
-ktx dev ingest replay --viz
+ktx ingest replay --viz
```
This opens the same TUI view as the original run, letting you step through the agent's reasoning.
diff --git a/docs-site/content/docs/guides/serving-agents.mdx b/docs-site/content/docs/guides/serving-agents.mdx
index 4285611b..b6f073b8 100644
--- a/docs-site/content/docs/guides/serving-agents.mdx
+++ b/docs-site/content/docs/guides/serving-agents.mdx
@@ -3,37 +3,36 @@ title: Serving Agents
description: Expose your context to Claude Code, Cursor, Codex, and other coding agents.
---
-Once you've built and refined your context, the final step is exposing it to
-coding agents. KTX provides machine-readable CLI commands for direct terminal
-access from Claude Code, Cursor, Codex, OpenCode, and custom agent workflows.
+Once you've built and refined your context, expose it to coding agents through
+the public KTX CLI. Claude Code, Cursor, Codex, OpenCode, and custom agent
+workflows can call the same commands you use at a terminal.
## CLI Commands
-KTX provides a set of machine-readable commands under `ktx agent`. These return
-JSON output designed for programmatic consumption.
+KTX public commands support JSON output for the context reads that agents use
+most often. Use `--project-dir` when the agent is not already running inside the
+KTX project directory.
### Available commands
```bash
-# List available tools and their descriptions
-ktx agent tools --json
-
-# Get project context for planning
-ktx agent context --json
+# Check setup and context readiness
+ktx status --json
```
**Semantic layer:**
```bash
# List sources
-ktx agent sl list --json
-ktx agent sl list --json --connection-id my-postgres
+ktx sl list --json
+ktx sl list --json --connection-id my-postgres
+ktx sl list --json --query "revenue"
# Read a source
-ktx agent sl read orders --json --connection-id my-postgres
+ktx sl read orders --json --connection-id my-postgres
# Run a query from a JSON file
-ktx agent sl query --json \
+ktx sl query --json \
--connection-id my-postgres \
--query-file query.json \
--execute \
@@ -44,20 +43,10 @@ ktx agent sl query --json \
```bash
# Search knowledge pages
-ktx agent wiki search "revenue recognition" --json --limit 10
+ktx wiki search "revenue recognition" --json --limit 10
# Read a specific page
-ktx agent wiki read order-status-definitions --json
-```
-
-**SQL execution:**
-
-```bash
-# Execute read-only SQL with a row limit
-ktx agent sql execute --json \
- --connection-id my-postgres \
- --sql-file query.sql \
- --max-rows 500
+ktx wiki read order-status-definitions --json
```
## Setting Up Your Agent
diff --git a/docs-site/content/docs/integrations/agent-clients.mdx b/docs-site/content/docs/integrations/agent-clients.mdx
index 1c105e1f..8a055fda 100644
--- a/docs-site/content/docs/integrations/agent-clients.mdx
+++ b/docs-site/content/docs/integrations/agent-clients.mdx
@@ -3,7 +3,9 @@ title: Agent Clients
description: Set up KTX with Claude Code, Cursor, Codex, and OpenCode.
---
-KTX integrates with coding agents through CLI skills and command files. These files teach agents to call `ktx agent ...` commands directly from the terminal for semantic-layer context, wiki knowledge, and safe SQL execution.
+KTX integrates with coding agents through CLI skills and command files. These
+files teach agents to call public `ktx` commands directly from the terminal for
+semantic-layer context and wiki knowledge.
Run `ktx setup` and select your agent targets, or configure manually using the snippets below.
@@ -26,17 +28,17 @@ Create `.claude/skills/ktx/SKILL.md`:
```markdown title=".claude/skills/ktx/SKILL.md"
---
name: ktx
-description: Use local KTX semantic context, wiki knowledge, and safe SQL execution for this project.
+description: Use local KTX semantic context and wiki knowledge for this project.
---
Available commands:
-- `ktx agent context --json --project-dir /path/to/project`
-- `ktx agent sl list --json --project-dir /path/to/project`
-- `ktx agent sl read '' --json --project-dir /path/to/project`
-- `ktx agent sl query --json --project-dir /path/to/project --connection-id '' --query-file '' --execute --max-rows 100`
-- `ktx agent wiki search '' --json --project-dir /path/to/project`
-- `ktx agent wiki read '' --json --project-dir /path/to/project`
-- `ktx agent sql execute --json --project-dir /path/to/project --connection-id '' --sql-file '' --max-rows 100`
+- `ktx status --json --project-dir /path/to/project`
+- `ktx sl list --json --project-dir /path/to/project`
+- `ktx sl list --json --project-dir /path/to/project --query ''`
+- `ktx sl read '' --json --project-dir /path/to/project --connection-id ''`
+- `ktx sl query --json --project-dir /path/to/project --connection-id '' --query-file '' --execute --max-rows 100`
+- `ktx wiki search '' --json --project-dir /path/to/project --limit 10`
+- `ktx wiki read '' --json --project-dir /path/to/project`
```
### Workflow tips
@@ -123,22 +125,19 @@ All supported agent clients call the same KTX CLI commands:
| Command | Description |
|---------|-------------|
-| `ktx agent context --json` | Return a compact project context summary |
-| `ktx agent tools --json` | List available agent-facing commands |
-| `ktx agent wiki search --json` | Search knowledge pages |
-| `ktx agent wiki read --json` | Read a knowledge page |
-| `ktx agent wiki write --json` | Write or update a knowledge page |
-| `ktx agent sl list --json` | List semantic layer sources |
-| `ktx agent sl read --json` | Read a semantic source definition |
-| `ktx agent sl write --json` | Write or update a semantic source |
-| `ktx agent sl validate --json` | Validate semantic source definitions |
-| `ktx agent sl query --json` | Execute a semantic layer query when semantic compute is configured |
-| `ktx agent sql execute --json` | Execute read-only SQL with an explicit row limit |
+| `ktx status --json` | Return project setup and context readiness |
+| `ktx wiki search --json` | Search knowledge pages |
+| `ktx wiki read --json` | Read a knowledge page |
+| `ktx wiki write ` | Write or update a knowledge page |
+| `ktx sl list --json` | List semantic-layer sources |
+| `ktx sl list --query --json` | Search semantic-layer sources |
+| `ktx sl read --json --connection-id ` | Read a semantic source definition |
+| `ktx sl write --connection-id ` | Write or update a semantic source |
+| `ktx sl validate --connection-id ` | Validate semantic source definitions |
+| `ktx sl query --json` | Execute a semantic-layer query when semantic compute is configured |
### Security constraints
-- SQL execution is always read-only.
-- Agent SQL execution requires an explicit `--max-rows` limit from 1 to 1000.
- Secrets and credentials are never exposed in command output.
- Commands resolve the project from `--project-dir`, `KTX_PROJECT_DIR`, or the nearest `ktx.yaml`.
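
The project lookup described in the last bullet can be pictured as a three-step fallback. The sketch below is illustrative only: `resolveProjectDir`, its option names, and the precedence order (flag, then environment variable, then the nearest `ktx.yaml` found by walking up from the working directory) are assumptions based on the order in which the bullet lists them, and the file-existence check is injected so the example stays self-contained.

```typescript
// Predicate that reports whether a file exists; injected so the sketch
// does not depend on the real filesystem.
type Exists = (filePath: string) => boolean;

// Return the parent of a POSIX-style absolute directory ("/" is its own parent).
function parentDir(dir: string): string {
  const trimmed = dir.replace(/\/+$/, "");
  const idx = trimmed.lastIndexOf("/");
  return idx <= 0 ? "/" : trimmed.slice(0, idx);
}

// Hypothetical resolution order: explicit flag, then env var, then the
// nearest ancestor directory that contains ktx.yaml.
function resolveProjectDir(opts: {
  flag?: string; // value of --project-dir, if given
  env?: string; // value of KTX_PROJECT_DIR, if set
  cwd: string; // absolute POSIX working directory
  exists: Exists;
}): string | undefined {
  if (opts.flag) return opts.flag;
  if (opts.env) return opts.env;
  let dir = opts.cwd;
  while (true) {
    const candidate = dir === "/" ? "/ktx.yaml" : `${dir}/ktx.yaml`;
    if (opts.exists(candidate)) return dir;
    if (dir === "/") return undefined; // reached the root without a match
    dir = parentDir(dir);
  }
}
```

For example, with `ktx.yaml` present only at `/repo`, a call with `cwd: "/repo/packages/cli"` and no flag or environment value resolves to `/repo`.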
diff --git a/docs-site/content/docs/integrations/context-sources.mdx b/docs-site/content/docs/integrations/context-sources.mdx
index 02554e08..904e3f95 100644
--- a/docs-site/content/docs/integrations/context-sources.mdx
+++ b/docs-site/content/docs/integrations/context-sources.mdx
@@ -13,7 +13,7 @@ Agents should configure and ingest context sources in this order:
1. Add the context source connection in `ktx.yaml` or with `ktx setup`.
2. Store tokens as `env:NAME` or `file:/path/to/secret`.
-3. Run `ktx ingest ` for one source or `ktx ingest --all`.
+3. Run `ktx ingest run --connection-id --adapter ` once for each source you want to ingest.
4. Check progress with `ktx ingest status --json`.
5. Review generated `semantic-layer/` YAML and `knowledge/` Markdown files in git.
6. Validate changed semantic sources with `ktx sl validate`.
diff --git a/docs-site/content/docs/integrations/primary-sources.mdx b/docs-site/content/docs/integrations/primary-sources.mdx
index 49200d47..94dc4e44 100644
--- a/docs-site/content/docs/integrations/primary-sources.mdx
+++ b/docs-site/content/docs/integrations/primary-sources.mdx
@@ -511,4 +511,4 @@ No authentication required — SQLite is file-based. The file must be readable b
| Scan returns no tables | Schema/database/project filter is wrong or the user lacks metadata permissions | Verify the schema list and grant metadata read permissions |
| Historic SQL is empty | Query history extension or warehouse history view is unavailable | Enable the warehouse-specific history feature, then rerun scan or setup |
| Column statistics are missing | Connector cannot access stats tables or the warehouse does not expose them | Grant stats permissions where supported; otherwise rely on structural scan output |
-| SQL execution fails through agents | Connection is missing, unreachable, or query execution is disabled | Run `ktx connection test ` and check the agent command flags |
+| Semantic query execution fails | Connection is missing, unreachable, or query execution is disabled | Run `ktx connection test ` and check the `ktx sl query` flags |
diff --git a/docs-site/lib/llm-docs.ts b/docs-site/lib/llm-docs.ts
index 9d9b5c74..69aac698 100644
--- a/docs-site/lib/llm-docs.ts
+++ b/docs-site/lib/llm-docs.ts
@@ -67,12 +67,12 @@ ${link("/docs/guides/writing-context", "Writing Context", "Write semantic source
- [Full documentation](${absoluteUrl("/llms-full.txt")}): All docs pages in one plain-text markdown response
- [Markdown access guide](${absoluteUrl("/docs/ai-resources/markdown-access.md")}): How to fetch llms.txt, llms-full.txt, and per-page Markdown
- [Quickstart markdown](${absoluteUrl("/docs/getting-started/quickstart.md")}): Human setup walkthrough
-- [Agent CLI markdown](${absoluteUrl("/docs/cli-reference/ktx-agent.md")}): Machine-readable agent commands
+- [Semantic-layer CLI markdown](${absoluteUrl("/docs/cli-reference/ktx-sl.md")}): Semantic-layer commands and JSON output
+- [Wiki CLI markdown](${absoluteUrl("/docs/cli-reference/ktx-wiki.md")}): Knowledge page commands and JSON output
## CLI Reference
${link("/docs/cli-reference/ktx-setup", "ktx setup", "Interactive project setup")}
-${link("/docs/cli-reference/ktx-agent", "ktx agent", "Machine-readable commands for coding agents")}
${link("/docs/cli-reference/ktx-sl", "ktx sl", "Semantic-layer commands")}
${link("/docs/cli-reference/ktx-wiki", "ktx wiki", "Knowledge page commands")}
${link("/docs/cli-reference/ktx-connection", "ktx connection", "Connection management commands")}
diff --git a/docs/superpowers/plans/2026-05-12-notion-warehouse-verification-gap-closure.md b/docs/superpowers/plans/2026-05-12-notion-warehouse-verification-gap-closure.md
new file mode 100644
index 00000000..3cfdc843
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-12-notion-warehouse-verification-gap-closure.md
@@ -0,0 +1,785 @@
+# Notion Warehouse Verification Gap Closure Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Close the remaining v1 gaps that prevent ingest agents, especially
+Notion WorkUnits, from reliably verifying warehouse table and column
+identifiers before writing wiki or semantic-layer output.
+
+**Architecture:** Keep the existing warehouse verification tool module and
+runner wiring. Add Notion target-warehouse scoping through the local adapter
+factory, make the active WorkUnit prompt name the shipped tools, enforce
+`allowedConnectionNames` in `discover_data`, and teach `entity_details` to
+resolve and reject column-level display targets.
+
+**Tech Stack:** TypeScript, Node 22, Vitest, AI SDK v6 tools, Zod, KTX local
+ingest adapters, KTX file store.
+
+---
+
+## Audit summary
+
+The previous implementation plan landed the main tool module and prompt
+protocol, but four v1-blocking gaps remain:
+
+- Notion ingest sessions still allow only the Notion connection unless a
+ specific adapter supplies target IDs. `NotionSourceAdapter` does not supply
+ target warehouse IDs, so the original Notion hallucination case cannot use
+ `entity_details` or raw-schema `discover_data` for the warehouse connection.
+- The active WorkUnit framing prompt still tells agents to call
+ `wiki_sl_search` and `sl_describe_table`, which are not shipped KTX tools.
+- `discover_data` accepts an explicit out-of-scope `connectionName` and still
+ searches raw schema for that connection.
+- `entity_details({ targets: [{ display: "schema.table.column" }] })` does not
+ resolve column display strings and does not fail explicit missing-column
+ targets.
+
+Non-blocking gaps remain out of scope for this plan:
+
+- Full DDL-style `entity_details` formatting with FK and profile summaries.
+- AST-backed SQL read-only validation for data-modifying CTEs.
+- Search over `enrichment/descriptions.json` for generated descriptions.
+- Lexicographic latest-sync edge cases for non-timestamp sync IDs.
+- Hard write-time validation in `wiki_write` and `emit_unmapped_fallback`.
+
+## File structure
+
+Modify these files:
+
+- `packages/context/src/ingest/adapters/notion/notion.adapter.ts`: add
+ configured target warehouse IDs and implement `listTargetConnectionIds()`.
+- `packages/context/src/ingest/adapters/notion/notion.adapter.test.ts`: cover
+ Notion target connection ID fan-out.
+- `packages/context/src/ingest/local-adapters.ts`: pass primary warehouse IDs
+ into `NotionSourceAdapter`.
+- `packages/context/src/ingest/local-adapters.test.ts`: cover local Notion
+ adapter target IDs.
+- `packages/context/src/ingest/adapters/notion/chunk.ts`: update Notion
+ WorkUnit notes to prefer the warehouse verification tools.
+- `packages/context/src/ingest/adapters/notion/notion.adapter.test.ts`: update
+ Notion note expectations.
+- `packages/context/prompts/memory_agent_bundle_ingest_work_unit.md`: replace
+ stale tool names in the active WorkUnit prompt.
+- `packages/context/src/ingest/ingest-prompts.test.ts`: guard the WorkUnit
+ prompt against stale tool names.
+- `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts`:
+ refuse explicit out-of-scope connection names.
+- `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts`:
+ cover `discover_data` scoping.
+- `packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts`:
+ add column-aware display-target resolution.
+- `packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts`:
+ cover column display resolution.
+- `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts`:
+ use column-aware resolution and report missing columns.
+- `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts`:
+ cover column display and missing-column behavior.
+
+### Task 1: Give Notion ingest access to target warehouses
+
+**Files:**
+- Modify: `packages/context/src/ingest/adapters/notion/notion.adapter.ts`
+- Modify: `packages/context/src/ingest/adapters/notion/notion.adapter.test.ts`
+- Modify: `packages/context/src/ingest/local-adapters.ts`
+- Modify: `packages/context/src/ingest/local-adapters.test.ts`
+
+- [ ] **Step 1: Write the failing Notion adapter test**
+
+Add this test inside `describe('NotionSourceAdapter', ...)` in
+`packages/context/src/ingest/adapters/notion/notion.adapter.test.ts`:
+
+```ts
+it('returns configured target warehouse connection ids', async () => {
+ const adapter = new NotionSourceAdapter({
+ targetConnectionIds: ['warehouse', 'warehouse', 'analytics'],
+ });
+
+ await expect(adapter.listTargetConnectionIds?.(stagedDir)).resolves.toEqual([
+ 'analytics',
+ 'warehouse',
+ ]);
+});
+```
+
+- [ ] **Step 2: Run the failing Notion adapter test**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/adapters/notion/notion.adapter.test.ts -t "target warehouse connection ids"
+```
+
+Expected: FAIL because `NotionSourceAdapterDeps` has no
+`targetConnectionIds` option and `NotionSourceAdapter` does not implement
+`listTargetConnectionIds()`.
+
+- [ ] **Step 3: Implement Notion target connection IDs**
+
+Modify `packages/context/src/ingest/adapters/notion/notion.adapter.ts`:
+
+```ts
+export interface NotionSourceAdapterDeps {
+ onPullSucceeded?: (ctx: NotionPullSucceededContext) => Promise<void>;
+ logger?: NotionFetchLogger;
+ targetConnectionIds?: string[];
+}
+
+function uniqueSorted(values: readonly string[] | undefined): string[] {
+ return [...new Set(values ?? [])].sort((left, right) =>
+ left.localeCompare(right),
+ );
+}
+```
+
+Add this method to `NotionSourceAdapter`:
+
+```ts
+ async listTargetConnectionIds(_stagedDir: string): Promise<string[]> {
+ return uniqueSorted(this.deps.targetConnectionIds);
+ }
+```
+
+- [ ] **Step 4: Pass primary warehouses into the local Notion adapter**
+
+Modify the Notion adapter construction in
+`packages/context/src/ingest/local-adapters.ts`:
+
+```ts
+ new NotionSourceAdapter({
+ targetConnectionIds: primaryWarehouseConnectionIds(project),
+ ...(options.logger ? { logger: options.logger } : {}),
+ }),
+```
+
+- [ ] **Step 5: Write the local adapter fan-out test**
+
+Add this test to `packages/context/src/ingest/local-adapters.test.ts`:
+
+```ts
+it('passes primary warehouse connection ids to the local Notion adapter', async () => {
+ const adapters = createDefaultLocalIngestAdapters(
+ projectWithConnections({
+ notion: {
+ driver: 'notion',
+ auth_token: 'secret',
+ crawl_mode: 'selected_roots',
+ root_page_ids: ['page-1'],
+ },
+ warehouse: {
+ driver: 'postgres',
+ url: 'postgresql://readonly@db.example.test/analytics',
+ },
+ docs: {
+ driver: 'dbt',
+ source_dir: './dbt',
+ },
+ } as never),
+ );
+
+ const notion = adapters.find((adapter) => adapter.source === 'notion');
+
+ await expect(notion?.listTargetConnectionIds?.('/tmp/staged-notion')).resolves.toEqual([
+ 'warehouse',
+ ]);
+});
+```
+
+- [ ] **Step 6: Run the Notion target tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run \
+ src/ingest/adapters/notion/notion.adapter.test.ts src/ingest/local-adapters.test.ts \
+ -t "target warehouse connection ids|local Notion adapter"
+```
+
+Expected: PASS.
+
+- [ ] **Step 7: Commit**
+
+Run:
+
+```bash
+git add \
+ packages/context/src/ingest/adapters/notion/notion.adapter.ts \
+ packages/context/src/ingest/adapters/notion/notion.adapter.test.ts \
+ packages/context/src/ingest/local-adapters.ts \
+ packages/context/src/ingest/local-adapters.test.ts
+git commit -m "fix(context): expose target warehouses to Notion ingest"
+```
+
+### Task 2: Remove stale tool names from active ingest prompts
+
+**Files:**
+- Modify: `packages/context/prompts/memory_agent_bundle_ingest_work_unit.md`
+- Modify: `packages/context/src/ingest/ingest-prompts.test.ts`
+- Modify: `packages/context/src/ingest/adapters/notion/chunk.ts`
+- Modify: `packages/context/src/ingest/adapters/notion/notion.adapter.test.ts`
+
+- [ ] **Step 1: Add failing prompt guards**
+
+Add this test to `packages/context/src/ingest/ingest-prompts.test.ts`:
+
+```ts
+it('uses shipped warehouse verification tools in the WorkUnit prompt', async () => {
+ const prompt = await readFile(
+ new URL('../../prompts/memory_agent_bundle_ingest_work_unit.md', import.meta.url),
+ 'utf-8',
+ );
+
+ expect(prompt).toContain('discover_data');
+ expect(prompt).toContain('entity_details');
+ expect(prompt).not.toContain('wiki_sl_search');
+ expect(prompt).not.toContain('sl_describe_table');
+});
+```
+
+- [ ] **Step 2: Run the failing prompt guard**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/ingest-prompts.test.ts -t "warehouse verification tools"
+```
+
+Expected: FAIL because the WorkUnit prompt still contains `wiki_sl_search` and
+`sl_describe_table`.
+
+- [ ] **Step 3: Update the WorkUnit framing prompt**
+
+In `packages/context/prompts/memory_agent_bundle_ingest_work_unit.md`, replace
+the first `` paragraph with:
+
+```md
+You are processing ONE WorkUnit of a multi-file ingest bundle. The WorkUnit gives you a slice of raw source files (LookML views, dbt/MetricFlow YAMLs, Metabase card JSONs, Notion pages, or similar) and you must translate that slice into KTX semantic-layer sources and/or knowledge wiki pages, in one pass. Prior WorkUnits in this same job may have already written SL sources and wiki pages; their writes are visible on the working branch and discoverable with `discover_data`.
+```
+
+In workflow step 2, replace the final sentence with:
+
+```md
+The triage skill tells you how to react when `discover_data` reveals that a prior WU already wrote something overlapping.
+```
+
+In workflow step 4, replace the sentence that starts
+`For each raw file:` with:
+
+```md
+4. For each raw file: call `read_raw_file` (or `read_raw_span` for slicing large files) to load content. Before writing a new SL source or wiki page, call `discover_data` for each candidate source, table, metric, or topic name to find prior-WU writes, existing wiki pages, SL sources, and raw warehouse matches; apply `ingest_triage` when you hit one, and apply any matching canonical pin before deciding whether to edit, rename, or skip.
+```
+
+In the `` block, replace the physical-column rule with:
+
+```md
+- Do not invent physical column names or grain keys. For table-backed SL sources, every `columns:`, `grain:`, `joins:`, `segments:`, and `measures[].expr` column must come from raw-file column declarations or warehouse-backed discovery (`discover_data`, `sl_discover`, `entity_details`). If column names are not confirmed, capture the business context in wiki instead of writing a full SL source.
+```
+
+- [ ] **Step 4: Update Notion WorkUnit notes**
+
+In `packages/context/src/ingest/adapters/notion/chunk.ts`, replace
+`NOTION_SL_WRITE_GUIDANCE` with:
+
+```ts
+const NOTION_SL_WRITE_GUIDANCE =
+ 'Write wiki entries with wiki_write. Wiki keys must be flat slugs like orbit-company-overview, not orbit/company-overview. Search existing wiki pages, SL sources, and raw warehouse schema for the same tables or sl_refs with discover_data before creating a new page. Only write or edit SL sources after discover_data plus sl_discover/sl_read_source or entity_details confirms a mapped non-Notion target source; if no mapped target exists, emit_unmapped_fallback and keep the fact wiki-only. Notion dataSourceCount counts Notion databases/data sources only, not warehouse/dbt mappings. If a warehouse/dbt connection exists but the named table or source is absent, use reason no_physical_table rather than no_connection_mapping. Do not create SL sources under the Notion connection just because a page mentions a warehouse table.';
+```
+
+In the `reconcileNotes` array in the same file, replace:
+
+```ts
+ 'Notion dataSourceCount is Notion-only; use sl_discover for warehouse/dbt mapping decisions.',
+```
+
+with:
+
+```ts
+ 'Notion dataSourceCount is Notion-only; use discover_data/entity_details for warehouse/dbt mapping decisions.',
+```
+
+- [ ] **Step 5: Update Notion note expectations**
+
+In `packages/context/src/ingest/adapters/notion/notion.adapter.test.ts`,
+update the note expectations in `it('chunks changed Notion pages...')`:
+
+```ts
+expect(result.workUnits[0].notes).toContain('discover_data');
+expect(result.workUnits[0].notes).toContain('entity_details');
+```
+
+Update the exact `reconcileNotes` expectation to:
+
+```ts
+expect(result.reconcileNotes).toEqual([
+ 'Notion maxKnowledgeCreatesPerRun=25',
+ 'Notion maxKnowledgeUpdatesPerRun=20',
+ 'Notion dataSourceCount is Notion-only; use discover_data/entity_details for warehouse/dbt mapping decisions.',
+ 'Reconcile Notion wiki pages sharing tables/sl_refs before creating distinct artifacts.',
+]);
+```
+
+- [ ] **Step 6: Run prompt and Notion note tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run \
+ src/ingest/ingest-prompts.test.ts \
+ src/ingest/adapters/notion/notion.adapter.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 7: Commit**
+
+Run:
+
+```bash
+git add \
+ packages/context/prompts/memory_agent_bundle_ingest_work_unit.md \
+ packages/context/src/ingest/ingest-prompts.test.ts \
+ packages/context/src/ingest/adapters/notion/chunk.ts \
+ packages/context/src/ingest/adapters/notion/notion.adapter.test.ts
+git commit -m "fix(context): update ingest prompts for warehouse verification tools"
+```
+
+### Task 3: Enforce allowed connection scope in discover_data
+
+**Files:**
+- Modify: `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts`
+- Modify: `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts`
+
+- [ ] **Step 1: Write the failing scoping test**
+
+Add this test to
+`packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts`:
+
+```ts
+it('refuses explicit out-of-scope connection names', async () => {
+ const result = await tool.call({ query: 'orders', connectionName: 'billing' }, context);
+
+ expect(result.markdown).toContain('Connection "billing" is not available to this ingest stage.');
+ expect(result.structured).toEqual({ wiki: null, sl: null, raw: null });
+ expect(wikiSearchTool.call).not.toHaveBeenCalled();
+ expect(slDiscoverTool.call).not.toHaveBeenCalled();
+ expect(catalog.searchByName).not.toHaveBeenCalled();
+});
+```
+
+- [ ] **Step 2: Run the failing scoping test**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/discover-data.tool.test.ts -t "out-of-scope"
+```
+
+Expected: FAIL because `discover_data` currently searches raw schema for an
+explicit `connectionName` even when it is not in `allowedConnectionNames`.
+
+- [ ] **Step 3: Add the scope guard**
+
+In
+`packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts`,
+add this helper near `totalSources()`:
+
+```ts
+function allowedConnectionNames(context: ToolContext): ReadonlySet<string> | null {
+ return context.session?.allowedConnectionNames ?? null;
+}
+```
+
+At the top of `DiscoverDataTool.call()`, before the `sourceName` branch and
+before calling any child tool, add:
+
+```ts
+ const allowed = allowedConnectionNames(context);
+ if (input.connectionName && allowed && !allowed.has(input.connectionName)) {
+ return {
+ markdown: `Connection "${input.connectionName}" is not available to this ingest stage.`,
+ structured: { wiki: null, sl: null, raw: null },
+ };
+ }
+```
+
+Then replace the raw connection-list construction with:
+
+```ts
+ const connections = input.connectionName ? [input.connectionName] : [...(allowed ?? [])].sort();
+```
+
+- [ ] **Step 4: Run discover_data tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/discover-data.tool.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 5: Commit**
+
+Run:
+
+```bash
+git add \
+ packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts \
+ packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts
+git commit -m "fix(context): scope raw schema discovery to allowed connections"
+```
+
+### Task 4: Fix column-level entity_details verification
+
+**Files:**
+- Modify: `packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts`
+- Modify: `packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts`
+- Modify: `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts`
+- Modify: `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts`
+
+- [ ] **Step 1: Write failing catalog column-target tests**
+
+First update `seedLiveDatabaseScan()` in `warehouse-catalog.service.test.ts` so
+BigQuery tables have a project/catalog. Replace the repeated inline table refs with:
+
+```ts
+const tableRef = {
+ catalog: driver === 'bigquery' ? 'analytics' : null,
+ db: driver === 'sqlite' ? null : 'public',
+ name: 'orders',
+};
+```
+
+Use `tableRef.catalog`, `tableRef.db`, and `tableRef.name` for the seeded
+table and profile table references.
+
+Then add these tests to
+`packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts`:
+
+```ts
+it('resolves postgres column display strings without treating the column as a table', async () => {
+ await seedLiveDatabaseScan();
+ const catalog = new WarehouseCatalogService({ fileStore: project.fileStore });
+
+ await expect(catalog.resolveDisplayTarget('warehouse', 'public.orders.status')).resolves.toMatchObject({
+ resolved: { catalog: null, db: 'public', name: 'orders', column: 'status' },
+ candidates: [],
+ dialect: 'postgres',
+ });
+});
+
+it('resolves BigQuery column display strings with four parts', async () => {
+ await seedLiveDatabaseScan('warehouse', 'sync-bigquery', 'bigquery');
+ const catalog = new WarehouseCatalogService({ fileStore: project.fileStore });
+
+ await expect(catalog.resolveDisplayTarget('warehouse', 'analytics.public.orders.status')).resolves.toMatchObject({
+ resolved: { catalog: 'analytics', db: 'public', name: 'orders', column: 'status' },
+ candidates: [],
+ dialect: 'bigquery',
+ });
+});
+```
+
+- [ ] **Step 2: Run the failing catalog tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts -t "column display"
+```
+
+Expected: FAIL because `resolveDisplayTarget()` does not exist.
+
+- [ ] **Step 3: Implement column-aware display resolution**
+
+In
+`packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts`,
+add this exported interface near `RawSchemaHit`:
+
+```ts
+export interface DisplayTargetResolution {
+ resolved: (KtxTableRef & { column?: string }) | null;
+ candidates: KtxTableRef[];
+ dialect: string;
+}
+```
+
+Add these helpers near `parseDisplay()`:
+
+```ts
+function expectedDisplayPartCount(driver: CatalogDriver): number {
+ if (driver === 'sqlite' || driver === 'sqlite3') {
+ return 1;
+ }
+ if (driver === 'bigquery' || driver === 'snowflake' || driver === 'sqlserver') {
+ return 3;
+ }
+ return 2;
+}
+
+function parseColumnDisplay(driver: CatalogDriver, display: string): (KtxTableRef & { column: string }) | null {
+ const parts = splitDisplay(display);
+ const tablePartCount = expectedDisplayPartCount(driver);
+ if (parts.length !== tablePartCount + 1) {
+ return null;
+ }
+ const column = parts.at(-1);
+ if (!column) {
+ return null;
+ }
+ const table = parseDisplay(driver, parts.slice(0, -1).join('.'));
+ return table ? { ...table, column } : null;
+}
+```
+
+Add this method to `WarehouseCatalogService` after `resolveDisplay()`:
+
+```ts
+ async resolveDisplayTarget(connectionName: string, display: string): Promise<DisplayTargetResolution> {
+ const catalog = await this.loadCatalog(connectionName);
+ if (!catalog) {
+ return { resolved: null, candidates: [], dialect: 'unknown' };
+ }
+
+ const dialect = getDialectForDriver(catalog.driver).type;
+ const tableResolution = await this.resolveDisplay(connectionName, display);
+ if (tableResolution.resolved) {
+ return tableResolution;
+ }
+
+ const parsedColumn = parseColumnDisplay(catalog.driver, display);
+ if (!parsedColumn) {
+ return { resolved: null, candidates: bestCandidates(catalog.tables, display), dialect };
+ }
+
+ const table = catalog.tables.find((candidate) => refsEqual(candidate, parsedColumn));
+ if (!table) {
+ return { resolved: null, candidates: bestCandidates(catalog.tables, display), dialect };
+ }
+
+ return {
+ resolved: {
+ catalog: table.catalog,
+ db: table.db,
+ name: table.name,
+ column: parsedColumn.column,
+ },
+ candidates: [],
+ dialect,
+ };
+ }
+```
+
+- [ ] **Step 4: Write failing entity_details column tests**
+
+Add these tests to
+`packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts`:
+
+```ts
+it('resolves display targets that include a column name', async () => {
+ const result = await tool.call(
+ { connectionName: 'warehouse', targets: [{ display: 'public.orders.status' }] },
+ context,
+ );
+
+ expect(result.markdown).toContain('### public.orders');
+ expect(result.markdown).toContain('- status (text, nullable=false)');
+ expect(result.markdown).not.toContain('- id (integer');
+ expect(result.structured.resolved).toHaveLength(1);
+ expect(result.structured.resolved[0]?.columns.map((column) => column.name)).toEqual(['status']);
+});
+
+it('reports missing explicit columns instead of returning an empty column list', async () => {
+ const result = await tool.call(
+ { connectionName: 'warehouse', targets: [{ display: 'public.orders.plan_tier' }] },
+ context,
+ );
+
+ expect(result.markdown).toContain('Column not found in scan: public.orders.plan_tier');
+ expect(result.markdown).toContain('Available columns: id, status');
+ expect(result.structured.resolved).toHaveLength(0);
+ expect(result.structured.missing).toHaveLength(1);
+});
+```
+
+- [ ] **Step 5: Run the failing entity_details tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/entity-details.tool.test.ts -t "column"
+```
+
+Expected: FAIL because display column targets are treated as table names and
+missing columns are not reported.
+
+- [ ] **Step 6: Use column-aware resolution in entity_details**
+
+In
+`packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts`,
+add this helper near `appendTableMarkdown()`:
+
+```ts
+function findColumn(detail: TableDetail, columnName: string): TableDetail['columns'][number] | null {
+ const normalized = columnName.toLowerCase();
+ return detail.columns.find((column) => column.name.toLowerCase() === normalized) ?? null;
+}
+```
+
+Replace the display resolution block inside the `for (const target of
+input.targets)` loop with:
+
+```ts
+ const resolution =
+ 'display' in target
+ ? await catalog.resolveDisplayTarget(input.connectionName, target.display)
+ : {
+ resolved: { catalog: target.catalog, db: target.db, name: target.name, column: target.column },
+ candidates: [],
+ dialect: '',
+ };
+```
+
+After `const detail = await catalog.getTable(...)`, replace the existing
+`resolved.push(detail); appendTableMarkdown(...)` lines with:
+
+```ts
+ const requestedColumn = resolution.resolved.column;
+ if (requestedColumn) {
+ const column = findColumn(detail, requestedColumn);
+ if (!column) {
+ missing.push({
+ target,
+ candidates: [{ catalog: detail.catalog, db: detail.db, name: detail.name }],
+ });
+ parts.push(`Column not found in scan: ${detail.display}.${requestedColumn}`);
+ parts.push(`Available columns: ${detail.columns.map((candidate) => candidate.name).join(', ')}`);
+ continue;
+ }
+ const scopedDetail = { ...detail, columns: [column] };
+ resolved.push(scopedDetail);
+ appendTableMarkdown(parts, scopedDetail, column.name);
+ continue;
+ }
+
+ resolved.push(detail);
+ appendTableMarkdown(parts, detail);
+```
+
+- [ ] **Step 7: Run warehouse verification tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run \
+ src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts \
+ src/ingest/tools/warehouse-verification/entity-details.tool.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 8: Commit**
+
+Run:
+
+```bash
+git add \
+ packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts \
+ packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts \
+ packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts \
+ packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts
+git commit -m "fix(context): verify warehouse column display targets"
+```
+
+### Task 5: Verify the v1 gap closure
+
+**Files:**
+- Verify all files changed by Tasks 1-4.
+
+- [ ] **Step 1: Run focused tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run \
+ src/ingest/adapters/notion/notion.adapter.test.ts \
+ src/ingest/local-adapters.test.ts \
+ src/ingest/ingest-prompts.test.ts \
+ src/ingest/tools/warehouse-verification/discover-data.tool.test.ts \
+ src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts \
+ src/ingest/tools/warehouse-verification/entity-details.tool.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 2: Run package type-check**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context run type-check
+```
+
+Expected: PASS.
+
+- [ ] **Step 3: Run package tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context run test
+```
+
+Expected: PASS.
+
+- [ ] **Step 4: Run pre-commit on changed files when configured**
+
+Run:
+
+```bash
+uv run pre-commit run --files \
+ packages/context/src/ingest/adapters/notion/notion.adapter.ts \
+ packages/context/src/ingest/adapters/notion/notion.adapter.test.ts \
+ packages/context/src/ingest/local-adapters.ts \
+ packages/context/src/ingest/local-adapters.test.ts \
+ packages/context/src/ingest/adapters/notion/chunk.ts \
+ packages/context/prompts/memory_agent_bundle_ingest_work_unit.md \
+ packages/context/src/ingest/ingest-prompts.test.ts \
+ packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts \
+ packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts \
+ packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts \
+ packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts \
+ packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts \
+ packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts
+```
+
+Expected: PASS. If the repo has no pre-commit config or the local `uv` version
+cannot satisfy the project pin, record the exact error and rely on focused
+tests plus type-check.
+
+- [ ] **Step 5: Inspect final git status**
+
+Run:
+
+```bash
+git status --short
+```
+
+Expected: only intentional files are modified. Commit any formatter-driven
+changes with:
+
+```bash
+git add packages/context
+git commit -m "chore(context): verify warehouse verification v1 gaps"
+```
+
+## Self-review checklist
+
+- Spec coverage: this plan closes the remaining v1 paths for Notion warehouse
+ verification, active WorkUnit prompt correctness, raw discovery scoping, and
+ column-level identifier verification.
+- Placeholder scan: no task relies on future-work markers, unnamed edge-case
+ handling, or cross-task shorthand.
+- Type consistency: `discover_data` continues to use `connectionName`,
+ `sl_discover` still receives `connectionId` internally, and
+ `resolveDisplayTarget()` returns the same table identity plus optional
+ `column`.
diff --git a/docs/superpowers/plans/2026-05-12-warehouse-verification-final-v1-closure.md b/docs/superpowers/plans/2026-05-12-warehouse-verification-final-v1-closure.md
new file mode 100644
index 00000000..f48fea36
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-12-warehouse-verification-final-v1-closure.md
@@ -0,0 +1,957 @@
+# Warehouse Verification Final V1 Closure Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Close the remaining v1 gaps that still prevent ingest agents from
+reliably following warehouse verification results through to `entity_details`
+and `sql_execution`.
+
+**Architecture:** Keep the existing warehouse verification module and runner
+session scoping. Add connection names to raw discovery hits, expose primary
+warehouse targets from the remaining source adapters, and make local ingest
+SQL probes use the same scan connector read-only execution path as schema scan.
+
+**Tech Stack:** TypeScript, Node 22, Vitest, AI SDK v6 tools, Zod, KTX local
+ingest runtime, KTX scan connectors.
+
+---
+
+## Audit summary
+
+The first two implementation plans landed the warehouse verification tools,
+prompt protocol, Notion warehouse scoping, and stale prompt-name cleanup. The
+focused audit on May 12, 2026, found three remaining v1-blocking gaps:
+
+- `discover_data` searches multiple allowed raw warehouse scans, but raw hits do
+ not carry or render `connectionName`. The tool tells the agent to call
+ `entity_details({connectionName, targets: [...]})`, then omits the required
+ `connectionName` from the follow-up evidence.
+- Local LookML and MetricFlow adapters do not expose primary warehouse target
+ IDs. The runner only adds adapter-provided targets to `allowedConnectionNames`,
+ so those WorkUnits cannot use raw warehouse verification unless their source
+ connection is itself the warehouse.
+- `sql_execution` calls the local ingest connection catalog, but the catalog
+ either has no query executor in normal CLI ingest or calls an injected
+ executor without `projectDir` and connection config. The default local query
+ executor cannot dispatch without that config.
+
+Non-blocking gaps remain out of scope for this v1 plan:
+
+- Full DDL-style `entity_details` formatting with FK profile summaries.
+- AST-backed SQL read-only validation for data-modifying CTE bodies.
+- Search over generated `enrichment/descriptions.json`.
+- Lexicographic latest-sync edge cases for non-timestamp sync IDs.
+- Hard write-time validation in `wiki_write` and `emit_unmapped_fallback`.
+
+## File structure
+
+Modify these files:
+
+- `packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts`:
+ add `connectionName` to raw schema hit records.
+- `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts`:
+ render raw hit connection names and preserve them in structured output.
+- `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts`:
+ cover multi-connection raw discovery follow-up data.
+- `packages/context/src/ingest/adapters/lookml/lookml.adapter.ts`:
+ accept and return configured target warehouse connection IDs.
+- `packages/context/src/ingest/adapters/lookml/lookml.adapter.test.ts`:
+ cover LookML target warehouse IDs.
+- `packages/context/src/ingest/adapters/metricflow/metricflow.adapter.ts`:
+ accept and return configured target warehouse connection IDs.
+- `packages/context/src/ingest/adapters/metricflow/metricflow.adapter.test.ts`:
+ cover MetricFlow target warehouse IDs.
+- `packages/context/src/ingest/local-adapters.ts`:
+ pass primary warehouse IDs into LookML and MetricFlow adapters.
+- `packages/context/src/ingest/local-adapters.test.ts`:
+ cover local adapter warehouse target fan-out.
+- `packages/context/src/ingest/local-bundle-runtime.ts`:
+ pass full project connection config to local ingest query executors.
+- `packages/context/src/ingest/local-bundle-runtime.test.ts`:
+ cover the local ingest query executor call shape.
+- `packages/context/src/ingest/local-ingest.ts`:
+ use the shared query executor port type.
+- `packages/context/src/mcp/local-project-ports.ts`:
+ no behavior change expected, but this file must still type-check against the
+ updated local ingest query executor type.
+- `packages/cli/src/ingest.ts`:
+ provide a read-only scan-connector-backed query executor for normal local
+ ingest runs.
+
+Create these files:
+
+- `packages/cli/src/ingest-query-executor.ts`: CLI query executor that adapts
+ scan connectors' `executeReadOnly()` method to `KtxSqlQueryExecutorPort`.
+- `packages/cli/src/ingest-query-executor.test.ts`: unit coverage for the CLI
+ ingest query executor.
+
+### Task 1: Preserve raw discovery connection names
+
+**Files:**
+- Modify: `packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts`
+- Modify: `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts`
+- Modify: `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts`
+
+- [ ] **Step 1: Write the failing multi-connection discovery test**
+
+Add this test to
+`packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts`:
+
+```ts
+ it('includes connectionName on raw schema hits so entity_details can follow up', async () => {
+ const multiConnectionContext: ToolContext = {
+ ...context,
+ session: { allowedConnectionNames: new Set(['warehouse', 'analytics']) } as any,
+ };
+ catalog.searchByName.mockImplementation(async (connectionName: string, query: string) => [
+ {
+ kind: 'table',
+ connectionName,
+ ref: { catalog: null, db: 'public', name: `${connectionName}_${query}` },
+ display: `public.${connectionName}_${query}`,
+ matchedOn: 'name',
+ },
+ ]);
+
+ const result = await tool.call({ query: 'orders', limit: 10 }, multiConnectionContext);
+
+ expect(catalog.searchByName).toHaveBeenCalledWith('analytics', 'orders', 10);
+ expect(catalog.searchByName).toHaveBeenCalledWith('warehouse', 'orders', 10);
+ expect(result.markdown).toContain('connectionName=analytics');
+ expect(result.markdown).toContain('connectionName=warehouse');
+ expect(result.markdown).toContain(
+ 'entity_details({connectionName: "analytics", targets: [{display: "public.analytics_orders"}]})',
+ );
+ expect(result.structured.raw?.hits.map((hit) => hit.connectionName)).toEqual([
+ 'analytics',
+ 'warehouse',
+ ]);
+ });
+```
+
+- [ ] **Step 2: Run the failing discovery test**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/discover-data.tool.test.ts -t "connectionName on raw schema hits"
+```
+
+Expected: FAIL because `RawSchemaHit` has no `connectionName` property and the
+markdown only renders the display string.
+
+- [ ] **Step 3: Add `connectionName` to raw schema hits**
+
+Modify the raw hit type and hit construction in
+`packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts`:
+
+```ts
+export type RawSchemaHit =
+ | {
+ kind: 'table';
+ connectionName: string;
+ ref: KtxTableRef;
+ display: string;
+ matchedOn: 'name' | 'db' | 'comment' | 'description';
+ }
+ | {
+ kind: 'column';
+ connectionName: string;
+ ref: KtxTableRef & { column: string };
+ display: string;
+ matchedOn: 'name' | 'comment' | 'description';
+ };
+```
+
+In the table hit block, add `connectionName`:
+
+```ts
+ hits.push({
+ kind: 'table',
+ connectionName,
+ ref: { catalog: table.catalog, db: table.db, name: table.name },
+ display: formatDisplay(catalog.driver, table),
+ matchedOn: tableMatch,
+ });
+```
+
+In the column hit block, add `connectionName`:
+
+```ts
+ hits.push({
+ kind: 'column',
+ connectionName,
+ ref: { catalog: table.catalog, db: table.db, name: table.name, column: column.name },
+ display: `${formatDisplay(catalog.driver, table)}.${column.name}`,
+ matchedOn: columnMatch,
+ });
+```
+
+- [ ] **Step 4: Render follow-up-ready raw hits**
+
+Modify the raw schema markdown in
+`packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts`:
+
+```ts
+ parts.push('## Raw Warehouse Schema', '> use `entity_details({connectionName, targets: [{display}]})` for full DDL + sample values');
+ parts.push(
+ rawHits
+ .slice(0, limit)
+ .map(
+ (hit) =>
+ `- ${hit.kind}: ${hit.display} [connectionName=${hit.connectionName}] (matched on ${hit.matchedOn}) — ` +
+ `follow up with \`entity_details({connectionName: "${hit.connectionName}", targets: [{display: "${hit.display}"}]})\``,
+ )
+ .join('\n'),
+ );
+```
+
+- [ ] **Step 5: Run the discovery test**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/discover-data.tool.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 6: Commit**
+
+Run:
+
+```bash
+git add \
+ packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts \
+ packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts \
+ packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts
+git commit -m "fix(context): include raw discovery connection names"
+```
+
+### Task 2: Expose LookML and MetricFlow warehouse targets
+
+**Files:**
+- Modify: `packages/context/src/ingest/adapters/lookml/lookml.adapter.ts`
+- Modify: `packages/context/src/ingest/adapters/lookml/lookml.adapter.test.ts`
+- Modify: `packages/context/src/ingest/adapters/metricflow/metricflow.adapter.ts`
+- Modify: `packages/context/src/ingest/adapters/metricflow/metricflow.adapter.test.ts`
+- Modify: `packages/context/src/ingest/local-adapters.ts`
+- Modify: `packages/context/src/ingest/local-adapters.test.ts`
+
+- [ ] **Step 1: Write failing adapter target tests**
+
+Add this test to
+`packages/context/src/ingest/adapters/lookml/lookml.adapter.test.ts`:
+
+```ts
+ it('returns configured target warehouse connection ids', async () => {
+ const adapter = new LookmlSourceAdapter({
+ homeDir: join(tmpRoot, 'home'),
+ targetConnectionIds: ['warehouse', 'analytics', 'warehouse'],
+ });
+
+ await expect(adapter.listTargetConnectionIds?.(join(tmpRoot, 'staged'))).resolves.toEqual([
+ 'analytics',
+ 'warehouse',
+ ]);
+ });
+```
+
+Add this test to
+`packages/context/src/ingest/adapters/metricflow/metricflow.adapter.test.ts`:
+
+```ts
+ it('returns configured target warehouse connection ids', async () => {
+ const metricflow = new MetricflowSourceAdapter({
+ homeDir: join(tmpRoot, 'cache-home'),
+ targetConnectionIds: ['warehouse', 'analytics', 'warehouse'],
+ });
+
+ await expect(metricflow.listTargetConnectionIds?.(stagedDir)).resolves.toEqual([
+ 'analytics',
+ 'warehouse',
+ ]);
+ });
+```
+
+- [ ] **Step 2: Run the failing adapter tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run \
+ src/ingest/adapters/lookml/lookml.adapter.test.ts \
+ src/ingest/adapters/metricflow/metricflow.adapter.test.ts -t "target warehouse connection ids"
+```
+
+Expected: FAIL because neither adapter accepts `targetConnectionIds` or
+implements `listTargetConnectionIds()`.
+
+- [ ] **Step 3: Implement target ID support in LookML**
+
+Modify `packages/context/src/ingest/adapters/lookml/lookml.adapter.ts`:
+
+```ts
+export interface LookmlSourceAdapterDeps {
+ homeDir: string;
+ targetConnectionIds?: string[];
+}
+
+function uniqueSorted(values: readonly string[] | undefined): string[] {
+ return [...new Set(values ?? [])].sort((left, right) => left.localeCompare(right));
+}
+```
+
+Add this method to `LookmlSourceAdapter`:
+
+```ts
+ async listTargetConnectionIds(_stagedDir: string): Promise<string[]> {
+ return uniqueSorted(this.deps.targetConnectionIds);
+ }
+```
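+
+As a rough illustration of the behavior the adapter tests above rely on, the sketch below exercises a standalone copy of `uniqueSorted`. The `configured` array is an illustrative sample that mirrors the test input; it is not read from any real project config.
+
+```ts
+// Standalone copy of the helper from this step, for illustration only.
+function uniqueSorted(values: readonly string[] | undefined): string[] {
+  // Set removes duplicates; localeCompare gives a stable alphabetical order.
+  return [...new Set(values ?? [])].sort((left, right) => left.localeCompare(right));
+}
+
+// Illustrative input: one duplicate, not in alphabetical order.
+const configured = ['warehouse', 'analytics', 'warehouse'];
+const targets = uniqueSorted(configured);
+
+// An adapter constructed without targetConnectionIds yields an empty list.
+const none = uniqueSorted(undefined);
+
+console.log(targets, none);
+```
+
+Deduplicating and sorting inside the adapter keeps the returned IDs deterministic regardless of the order in which the factory collects them.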
+
+- [ ] **Step 4: Implement target ID support in MetricFlow**
+
+Modify `packages/context/src/ingest/adapters/metricflow/metricflow.adapter.ts`:
+
+```ts
+export interface MetricflowSourceAdapterDeps {
+ homeDir: string;
+ targetConnectionIds?: string[];
+}
+
+function uniqueSorted(values: readonly string[] | undefined): string[] {
+ return [...new Set(values ?? [])].sort((left, right) => left.localeCompare(right));
+}
+```
+
+Add this method to `MetricflowSourceAdapter`:
+
+```ts
+ async listTargetConnectionIds(_stagedDir: string): Promise<string[]> {
+ return uniqueSorted(this.deps.targetConnectionIds);
+ }
+```
+
+- [ ] **Step 5: Pass primary warehouses from the local adapter factory**
+
+Modify the LookML and MetricFlow adapter construction in
+`packages/context/src/ingest/local-adapters.ts`:
+
+```ts
+ new LookmlSourceAdapter({
+ homeDir: join(project.projectDir, '.ktx/cache'),
+ targetConnectionIds: primaryWarehouseConnectionIds(project),
+ }),
+```
+
+```ts
+ new MetricflowSourceAdapter({
+ homeDir: join(project.projectDir, '.ktx/cache'),
+ targetConnectionIds: primaryWarehouseConnectionIds(project),
+ }),
+```
+
+- [ ] **Step 6: Write the local adapter fan-out test**
+
+Add this test to `packages/context/src/ingest/local-adapters.test.ts`:
+
+```ts
+ it('passes primary warehouse connection ids to local LookML and MetricFlow adapters', async () => {
+ const adapters = createDefaultLocalIngestAdapters(
+ projectWithConnections({
+ warehouse: {
+ driver: 'postgres',
+ url: 'postgresql://readonly@db.example.test/analytics',
+ },
+ lookml_docs: {
+ driver: 'lookml',
+ lookml: {
+ repoUrl: 'https://github.com/acme/lookml.git',
+ },
+ },
+ metrics_repo: {
+ driver: 'metricflow',
+ metricflow: {
+ repoUrl: 'https://github.com/acme/metrics.git',
+ },
+ },
+ } as never),
+ );
+
+ const lookml = adapters.find((adapter) => adapter.source === 'lookml');
+ const metricflow = adapters.find((adapter) => adapter.source === 'metricflow');
+
+ await expect(lookml?.listTargetConnectionIds?.('/tmp/staged-lookml')).resolves.toEqual([
+ 'warehouse',
+ ]);
+ await expect(metricflow?.listTargetConnectionIds?.('/tmp/staged-metricflow')).resolves.toEqual([
+ 'warehouse',
+ ]);
+ });
+```
+
+- [ ] **Step 7: Run the target fan-out tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run \
+ src/ingest/adapters/lookml/lookml.adapter.test.ts \
+ src/ingest/adapters/metricflow/metricflow.adapter.test.ts \
+ src/ingest/local-adapters.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 8: Commit**
+
+Run:
+
+```bash
+git add \
+ packages/context/src/ingest/adapters/lookml/lookml.adapter.ts \
+ packages/context/src/ingest/adapters/lookml/lookml.adapter.test.ts \
+ packages/context/src/ingest/adapters/metricflow/metricflow.adapter.ts \
+ packages/context/src/ingest/adapters/metricflow/metricflow.adapter.test.ts \
+ packages/context/src/ingest/local-adapters.ts \
+ packages/context/src/ingest/local-adapters.test.ts
+git commit -m "fix(context): expose warehouse targets for LookML and MetricFlow"
+```
+
+### Task 3: Pass full connection config to local ingest SQL execution
+
+**Files:**
+- Modify: `packages/context/src/ingest/local-bundle-runtime.ts`
+- Modify: `packages/context/src/ingest/local-bundle-runtime.test.ts`
+- Modify: `packages/context/src/ingest/local-ingest.ts`
+
+- [ ] **Step 1: Write the failing local connection catalog test**
+
+In `packages/context/src/ingest/local-bundle-runtime.test.ts`, change the
+Vitest import to include `vi`:
+
+```ts
+import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
+```
+
+Extend `RuntimeWithConnectionDeps`:
+
+```ts
+type RuntimeWithConnectionDeps = {
+ deps: {
+ connections: {
+ listEnabledConnections(ids: string[]): Promise<Array<{ id: string; name: string; connectionType: string }>>;
+ getConnectionById(connectionId: string): Promise<{ id: string; name: string; connectionType: string } | null>;
+ executeQuery(connectionId: string, sql: string): Promise<unknown>;
+ };
+ };
+};
+```
+
+Add this test:
+
+```ts
+ it('passes project connection config to local ingest query executors', async () => {
+ const agentRunner = new AgentRunnerService({ llmProvider: { getModel: () => ({}) as never } as any });
+ const queryExecutor = {
+ execute: vi.fn(async () => ({
+ headers: ['answer'],
+ rows: [[1]],
+ totalRows: 1,
+ command: 'SELECT',
+ rowCount: 1,
+ })),
+ };
+
+ const runtime = createLocalBundleIngestRuntime({
+ project,
+ adapters: [new FakeSourceAdapter()],
+ agentRunner,
+ queryExecutor,
+ });
+ const connections = (runtime.runner as unknown as RuntimeWithConnectionDeps).deps.connections;
+
+ await expect(connections.executeQuery('warehouse', 'select 1')).resolves.toMatchObject({
+ headers: ['answer'],
+ });
+ expect(queryExecutor.execute).toHaveBeenCalledWith({
+ connectionId: 'warehouse',
+ projectDir: project.projectDir,
+ connection: project.config.connections.warehouse,
+ sql: 'select 1',
+ });
+ });
+```
+
+- [ ] **Step 2: Run the failing local runtime test**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/local-bundle-runtime.test.ts -t "project connection config"
+```
+
+Expected: FAIL because `LocalConnectionCatalog.executeQuery()` only passes
+`connectionId` and `sql`.
+
+- [ ] **Step 3: Update local ingest query executor types**
+
+In `packages/context/src/ingest/local-bundle-runtime.ts`, import the shared
+query executor type:
+
+```ts
+import { localConnectionInfoFromConfig, type KtxSqlQueryExecutorPort } from '../connections/index.js';
+```
+
+Change `CreateLocalBundleIngestRuntimeOptions.queryExecutor` to:
+
+```ts
+ queryExecutor?: KtxSqlQueryExecutorPort;
+```
+
+Change `LocalConnectionCatalog` to store that type:
+
+```ts
+class LocalConnectionCatalog implements SlConnectionCatalogPort {
+ constructor(
+ private readonly project: KtxLocalProject,
+ private readonly queryExecutor?: KtxSqlQueryExecutorPort,
+ ) {}
+```
+
+Change `executeQuery()`:
+
+```ts
+ async executeQuery(connectionId: string, sql: string) {
+ if (!this.queryExecutor) {
+ throw new Error('Local ingest has no query executor configured');
+ }
+ return this.queryExecutor.execute({
+ connectionId,
+ projectDir: this.project.projectDir,
+ connection: this.project.config.connections[connectionId],
+ sql,
+ });
+ }
+```
+
+In `packages/context/src/ingest/local-ingest.ts`, replace the local query
+executor object type with the shared port:
+
+```ts
+import type { KtxSqlQueryExecutorPort } from '../connections/index.js';
+```
+
+```ts
+ queryExecutor?: KtxSqlQueryExecutorPort;
+```
+
+- [ ] **Step 4: Run the local runtime test**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/local-bundle-runtime.test.ts -t "project connection config"
+```
+
+Expected: PASS.
+
+- [ ] **Step 5: Commit**
+
+Run:
+
+```bash
+git add \
+ packages/context/src/ingest/local-bundle-runtime.ts \
+ packages/context/src/ingest/local-bundle-runtime.test.ts \
+ packages/context/src/ingest/local-ingest.ts
+git commit -m "fix(context): pass connection config to ingest query executors"
+```
+
+### Task 4: Supply a scan-connector query executor to CLI ingest
+
+**Files:**
+- Create: `packages/cli/src/ingest-query-executor.ts`
+- Create: `packages/cli/src/ingest-query-executor.test.ts`
+- Modify: `packages/cli/src/ingest.ts`
+- Modify: `packages/cli/src/ingest.test.ts`
+
+- [ ] **Step 1: Write the CLI query executor tests**
+
+Create `packages/cli/src/ingest-query-executor.test.ts`:
+
+```ts
+import type { KtxLocalProject } from '@ktx/context/project';
+import { createKtxConnectorCapabilities, type KtxScanConnector } from '@ktx/context/scan';
+import { describe, expect, it, vi } from 'vitest';
+import { createKtxCliIngestQueryExecutor } from './ingest-query-executor.js';
+
+function project(): KtxLocalProject {
+ return {
+ projectDir: '/tmp/ktx-query-project',
+ config: {
+ project: 'warehouse',
+ connections: {
+ warehouse: { driver: 'postgres', url: 'postgresql://readonly@example.test/db' },
+ },
+ },
+ } as unknown as KtxLocalProject;
+}
+
+function connector(overrides: Partial<KtxScanConnector> = {}): KtxScanConnector {
+ return {
+ id: 'warehouse',
+ driver: 'postgres',
+ capabilities: createKtxConnectorCapabilities({ readOnlySql: true }),
+ async introspect() {
+ throw new Error('introspect is not used by this test');
+ },
+ executeReadOnly: vi.fn(async () => ({
+ headers: ['answer'],
+ rows: [[1]],
+ totalRows: 1,
+ rowCount: 1,
+ })),
+ cleanup: vi.fn(async () => {}),
+ ...overrides,
+ };
+}
+
+describe('createKtxCliIngestQueryExecutor', () => {
+ it('executes read-only SQL through the scan connector and cleans it up', async () => {
+ const scanConnector = connector();
+ const createConnector = vi.fn(async () => scanConnector);
+ const executor = createKtxCliIngestQueryExecutor(project(), { createConnector });
+
+ await expect(
+ executor.execute({
+ connectionId: 'warehouse',
+ connection: { driver: 'postgres', url: 'postgresql://readonly@example.test/db' },
+ projectDir: '/tmp/ktx-query-project',
+ sql: 'select 1',
+ maxRows: 5,
+ }),
+ ).resolves.toMatchObject({
+ headers: ['answer'],
+ rows: [[1]],
+ totalRows: 1,
+ command: 'SELECT',
+ rowCount: 1,
+ });
+
+ expect(createConnector).toHaveBeenCalledWith(project(), 'warehouse');
+ expect(scanConnector.executeReadOnly).toHaveBeenCalledWith(
+ { connectionId: 'warehouse', sql: 'select 1', maxRows: 5 },
+ { runId: 'ingest-sql-execution' },
+ );
+ expect(scanConnector.cleanup).toHaveBeenCalledTimes(1);
+ });
+
+ it('rejects connectors without read-only SQL support', async () => {
+ const scanConnector = connector({
+ capabilities: createKtxConnectorCapabilities({ readOnlySql: false }),
+ executeReadOnly: undefined,
+ });
+ const executor = createKtxCliIngestQueryExecutor(project(), {
+ createConnector: vi.fn(async () => scanConnector),
+ });
+
+ await expect(
+ executor.execute({
+ connectionId: 'warehouse',
+ connection: { driver: 'postgres' },
+ projectDir: '/tmp/ktx-query-project',
+ sql: 'select 1',
+ }),
+ ).rejects.toThrow('Connection "warehouse" driver "postgres" does not support read-only SQL execution.');
+ expect(scanConnector.cleanup).toHaveBeenCalledTimes(1);
+ });
+});
+```
+
+- [ ] **Step 2: Run the failing CLI query executor test**
+
+Run:
+
+```bash
+pnpm --filter @ktx/cli exec vitest run src/ingest-query-executor.test.ts
+```
+
+Expected: FAIL because `ingest-query-executor.ts` does not exist.
+
+- [ ] **Step 3: Add the scan-connector-backed query executor**
+
+Create `packages/cli/src/ingest-query-executor.ts`:
+
+```ts
+import type { KtxSqlQueryExecutionInput, KtxSqlQueryExecutorPort } from '@ktx/context/connections';
+import type { KtxLocalProject } from '@ktx/context/project';
+import type { KtxScanConnector, KtxScanContext } from '@ktx/context/scan';
+import { createKtxCliScanConnector } from './local-scan-connectors.js';
+
+type CreateConnector = typeof createKtxCliScanConnector;
+
+export interface KtxCliIngestQueryExecutorDeps {
+ createConnector?: CreateConnector;
+}
+
+async function cleanupConnector(connector: KtxScanConnector | null): Promise<void> {
+ await connector?.cleanup?.();
+}
+
+export function createKtxCliIngestQueryExecutor(
+ project: KtxLocalProject,
+ deps: KtxCliIngestQueryExecutorDeps = {},
+): KtxSqlQueryExecutorPort {
+ const createConnector = deps.createConnector ?? createKtxCliScanConnector;
+ return {
+ async execute(input: KtxSqlQueryExecutionInput) {
+ let connector: KtxScanConnector | null = null;
+ try {
+ connector = await createConnector(project, input.connectionId);
+ if (!connector.capabilities.readOnlySql || !connector.executeReadOnly) {
+ throw new Error(
+ `Connection "${input.connectionId}" driver "${connector.driver}" does not support read-only SQL execution.`,
+ );
+ }
+
+ const ctx: KtxScanContext = { runId: 'ingest-sql-execution' };
+ const result = await connector.executeReadOnly(
+ { connectionId: input.connectionId, sql: input.sql, maxRows: input.maxRows },
+ ctx,
+ );
+ return {
+ headers: result.headers,
+ rows: result.rows,
+ totalRows: result.totalRows,
+ command: 'SELECT',
+ rowCount: result.rowCount,
+ };
+ } finally {
+ await cleanupConnector(connector);
+ }
+ },
+ };
+}
+```
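+
+The `try`/`finally` shape above is what guarantees cleanup on both the success and failure paths. A minimal sketch of that pattern follows; `SketchConnector`, `withConnector`, and `demo` are hypothetical stand-ins for illustration and are not part of the KTX API.
+
+```ts
+// Hypothetical stand-in for a scan connector; not the real KtxScanConnector type.
+interface SketchConnector {
+  run(sql: string): Promise<number>;
+  cleanup(): Promise<void>;
+}
+
+// Records the order in which things happen during the demo.
+const events: string[] = [];
+
+async function withConnector(create: () => Promise<SketchConnector>, sql: string): Promise<number> {
+  let connector: SketchConnector | null = null;
+  try {
+    connector = await create();
+    // `return await` keeps the query inside the try block so finally runs after it settles.
+    return await connector.run(sql);
+  } finally {
+    // Runs on success and on failure; a no-op if creation itself threw.
+    await connector?.cleanup();
+  }
+}
+
+async function demo(): Promise<void> {
+  const failing: SketchConnector = {
+    run: async () => {
+      throw new Error('query failed');
+    },
+    cleanup: async () => {
+      events.push('cleanup');
+    },
+  };
+  try {
+    await withConnector(async () => failing, 'select 1');
+  } catch (error) {
+    events.push((error as Error).message);
+  }
+}
+```
+
+In this sketch the cleanup event is recorded before the caller observes the error, which is the same ordering the "rejects connectors without read-only SQL support" test depends on.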
+
+- [ ] **Step 4: Wire the CLI executor into local ingest runs**
+
+In `packages/cli/src/ingest.ts`, import the executor and type:
+
+```ts
+import type { KtxSqlQueryExecutorPort } from '@ktx/context/connections';
+import type { KtxLocalProject } from '@ktx/context/project';
+import { createKtxCliIngestQueryExecutor } from './ingest-query-executor.js';
+```
+
+Extend `KtxIngestDeps`:
+
+```ts
+ createQueryExecutor?: (project: KtxLocalProject) => KtxSqlQueryExecutorPort;
+```
+
+Inside the `args.command === 'run'` branch, after `localIngestOptions` is
+defined, add:
+
+```ts
+ const queryExecutor =
+ localIngestOptions.queryExecutor ??
+ (deps.createQueryExecutor ?? createKtxCliIngestQueryExecutor)(project);
+```
+
+Pass `queryExecutor` to both local ingest execution paths. In the Metabase
+fan-out call:
+
+```ts
+ ...localIngestOptions,
+ queryExecutor,
+ trigger: 'manual_resync',
+```
+
+In the normal local ingest call:
+
+```ts
+ ...localIngestOptions,
+ queryExecutor,
+ pullConfigOptions: adapterOptions,
+```
+
+- [ ] **Step 5: Add CLI wiring coverage**
+
+Add this test to `packages/cli/src/ingest.test.ts`:
+
+```ts
+ it('supplies a scan-connector query executor to local ingest runs', async () => {
+ const io = makeIo();
+ const projectDir = join(tempDir, 'query-executor-project');
+ await writeWarehouseConfig(projectDir);
+ const queryExecutor = {
+ execute: vi.fn(async () => ({
+ headers: [],
+ rows: [],
+ totalRows: 0,
+ command: 'SELECT',
+ rowCount: 0,
+ })),
+ };
+ const runLocalIngest = vi.fn(async (input: RunLocalIngestOptions) =>
+ completedLocalBundleRun(input, 'query-executor-run'),
+ );
+
+ await expect(
+ runKtxIngest(
+ {
+ command: 'run',
+ projectDir,
+ connectionId: 'warehouse',
+ adapter: 'fake',
+ outputMode: 'json',
+ },
+ io.io,
+ {
+ runLocalIngest,
+ createAdapters: () => [],
+ createQueryExecutor: () => queryExecutor,
+ },
+ ),
+ ).resolves.toBe(0);
+
+ expect(runLocalIngest).toHaveBeenCalledWith(expect.objectContaining({ queryExecutor }));
+ });
+```
+
+- [ ] **Step 6: Run CLI query executor tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/cli exec vitest run src/ingest-query-executor.test.ts
+pnpm --filter @ktx/cli exec vitest run src/ingest.test.ts -t "query executor"
+```
+
+Expected: PASS.
+
+- [ ] **Step 7: Commit**
+
+Run:
+
+```bash
+git add \
+ packages/cli/src/ingest-query-executor.ts \
+ packages/cli/src/ingest-query-executor.test.ts \
+ packages/cli/src/ingest.ts \
+ packages/cli/src/ingest.test.ts
+git commit -m "fix(cli): enable read-only SQL probes for local ingest"
+```
+
+### Task 5: Final verification
+
+**Files:**
+- Verify: all files changed by Tasks 1-4.
+
+- [ ] **Step 1: Run focused context tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run \
+ src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts \
+ src/ingest/tools/warehouse-verification/entity-details.tool.test.ts \
+ src/ingest/tools/warehouse-verification/discover-data.tool.test.ts \
+ src/ingest/tools/warehouse-verification/sql-execution.tool.test.ts \
+ src/ingest/local-bundle-runtime.test.ts \
+ src/ingest/local-adapters.test.ts \
+ src/ingest/adapters/lookml/lookml.adapter.test.ts \
+ src/ingest/adapters/metricflow/metricflow.adapter.test.ts \
+ src/ingest/ingest-bundle.runner.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 2: Run focused CLI tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/cli exec vitest run src/ingest-query-executor.test.ts src/ingest.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 3: Run type checks**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context run type-check
+pnpm --filter @ktx/cli run type-check
+```
+
+Expected: both commands pass.
+
+- [ ] **Step 4: Run pre-commit on changed files if configured**
+
+Run:
+
+```bash
+uv run pre-commit run --files \
+ packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts \
+ packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts \
+ packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts \
+ packages/context/src/ingest/adapters/lookml/lookml.adapter.ts \
+ packages/context/src/ingest/adapters/lookml/lookml.adapter.test.ts \
+ packages/context/src/ingest/adapters/metricflow/metricflow.adapter.ts \
+ packages/context/src/ingest/adapters/metricflow/metricflow.adapter.test.ts \
+ packages/context/src/ingest/local-adapters.ts \
+ packages/context/src/ingest/local-adapters.test.ts \
+ packages/context/src/ingest/local-bundle-runtime.ts \
+ packages/context/src/ingest/local-bundle-runtime.test.ts \
+ packages/context/src/ingest/local-ingest.ts \
+ packages/cli/src/ingest-query-executor.ts \
+ packages/cli/src/ingest-query-executor.test.ts \
+ packages/cli/src/ingest.ts \
+ packages/cli/src/ingest.test.ts \
+ docs/superpowers/plans/2026-05-12-warehouse-verification-final-v1-closure.md
+```
+
+Expected: PASS. If the repository has no pre-commit config or the local `uv`
+version cannot satisfy the configured toolchain, record the exact error and use
+the focused test and type-check results as the closest verification.
+
+- [ ] **Step 5: Commit final verification fixes if any were needed**
+
+If verification required edits, run:
+
+```bash
+git add <files-changed-during-verification>
+git commit -m "test: cover warehouse verification v1 closure"
+```
+
+If verification required no edits, do not create an empty commit.
+
+## Self-review
+
+Spec coverage:
+
+- Raw warehouse discovery still covers wiki, semantic-layer, and raw schema
+ results, and now raw hits include the connection name needed by the required
+ `entity_details` follow-up.
+- Every local synthesis adapter with an external source connection now has a
+ path to target warehouse IDs: dbt and Notion already had it, Looker resolves
+ staged mappings, Metabase fan-out runs under target warehouse IDs, and this
+ plan adds LookML and MetricFlow.
+- `sql_execution` remains scoped by `allowedConnectionNames`, retains the
+ read-only SQL wrapper, and gains a normal local ingest execution backend.
+
+Placeholder scan:
+
+- This plan contains no deferred implementation placeholders.
+- Every code-changing step includes the exact test or implementation snippet to
+ add.
+
+Type consistency:
+
+- `connectionName` is added to `RawSchemaHit` and used by `DiscoverDataTool`.
+- `targetConnectionIds` and `listTargetConnectionIds()` match the existing dbt
+ and Notion adapter pattern.
+- Local ingest uses `KtxSqlQueryExecutorPort` consistently from CLI to context.
diff --git a/docs/superpowers/plans/2026-05-12-warehouse-verification-tools.md b/docs/superpowers/plans/2026-05-12-warehouse-verification-tools.md
new file mode 100644
index 00000000..42bb7f44
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-12-warehouse-verification-tools.md
@@ -0,0 +1,1617 @@
+# Warehouse Verification Tools Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Add synthesis-time warehouse verification tools so ingest agents can verify raw warehouse tables, columns, and sample values before writing wiki pages, SL sources, `tables:` frontmatter, `sl_refs`, or unmapped fallback records.
+
+**Architecture:** Add a raw scan catalog service over `raw-sources/<connectionName>/live-database/<syncId>/`, three BaseTool-backed ingest tools, and runner/tool-session scoping for allowed warehouse connections. Register the tools in the local ingest toolset so both WorkUnit and reconcile stages receive them through the existing `toAiSdkTools()` path.
+
+**Tech Stack:** TypeScript, Node 22, Vitest, AI SDK v6 tools, Zod, KTX file store, KTX semantic layer and wiki tools.
+
+---
+
+## Audit summary
+
+The current repo has the original spec file only; no matching plan or implementation exists under `docs/superpowers/plans`. The following v1-blocking gaps remain:
+
+- `packages/context/src/connections/dialects.ts` does not exist.
+- `packages/context/src/ingest/tools/warehouse-verification/` does not exist.
+- `entity_details`, `sql_execution`, and `discover_data` are not available to ingest WU or reconcile toolsets.
+- `ToolSession` does not carry the ingest stage's allowed warehouse connection IDs.
+- Prompt updates are absent from the 11 writer skills named in the spec.
+- Cleanup strings remain: `orbit_analytics.customer`, `wiki_sl_search`, and `sl_describe_table`.
+- Prompt-bundling and warehouse-tool tests are absent.
+
+Non-blocking gaps remain out of scope for this plan:
+
+- Hard write-time validation in `wiki_write` and `emit_unmapped_fallback`.
+- `dictionary_search`.
+- `semantic_query` in synthesis toolsets.
+- A raw-schema FTS index.
+- A UUID identity layer for tables and columns.
+
+One repo-specific adjustment is required: do not import `@ktx/connector-*`
+dialect classes into `@ktx/context`, because every connector package already
+depends on `@ktx/context`. Add a minimal context-local dialect dispatch instead.
+
+## File structure
+
+Create these files:
+
+- `packages/context/src/connections/dialects.ts`: Context-local driver dispatch for identifier quoting and display formatting.
+- `packages/context/src/connections/dialects.test.ts`: Driver dispatch and display-format tests.
+- `packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts`: Reads the latest live-database scan, resolves display identifiers, and searches table and column metadata.
+- `packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts`: Fixture-backed catalog tests.
+- `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts`: `entity_details` ingest tool.
+- `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts`: Tool contract tests.
+- `packages/context/src/ingest/tools/warehouse-verification/sql-execution.tool.ts`: `sql_execution` ingest tool.
+- `packages/context/src/ingest/tools/warehouse-verification/sql-execution.tool.test.ts`: Read-only SQL and output tests.
+- `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts`: `discover_data` ingest tool composing wiki, SL, and raw-schema search.
+- `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts`: Discovery composition tests.
+- `packages/context/src/ingest/tools/warehouse-verification/index.ts`: Exports tool classes and `createWarehouseVerificationTools()`.
+- `packages/context/skills/_shared/identifier-verification.md`: Shared protocol text kept in the tree for review even though writer skills inline it.
+
+Modify these files:
+
+- `packages/context/src/connections/index.ts`: Export the dialect helper.
+- `packages/context/src/tools/tool-session.ts`: Add `allowedConnectionNames`.
+- `packages/context/src/ingest/ingest-bundle.runner.ts`: Populate `allowedConnectionNames` for WU and reconcile sessions.
+- `packages/context/src/ingest/local-bundle-runtime.ts`: Register the warehouse verification tools in `LocalIngestToolsetFactory`.
+- `packages/context/src/ingest/ingest-bundle.runner.test.ts`: Assert the runner scopes allowed warehouse connections.
+- `packages/context/src/memory/memory-runtime-assets.test.ts`: Assert writer skills contain the protocol and banned strings are gone.
+- `packages/context/src/ingest/ingest-runtime-assets.test.ts`: Assert ingest skill packaging includes the protocol.
+- `packages/context/src/ingest/tools/emit-unmapped-fallback.tool.ts`: Replace the fictional table example.
+- `packages/context/src/sl/tools/sl-warehouse-validation.ts`: Replace the stale `sl_describe_table` hint.
+- `packages/context/skills/*/SKILL.md`: Inline protocol updates for the writer skills listed in the spec.
+
+### Task 1: Add context-local dialect dispatch
+
+**Files:**
+- Create: `packages/context/src/connections/dialects.ts`
+- Create: `packages/context/src/connections/dialects.test.ts`
+- Modify: `packages/context/src/connections/index.ts`
+
+- [ ] **Step 1: Write the failing dialect tests**
+
+Create `packages/context/src/connections/dialects.test.ts`:
+
+```ts
+import { describe, expect, it } from 'vitest';
+import { getDialectForDriver } from './dialects.js';
+
+describe('getDialectForDriver', () => {
+ it.each([
+ ['postgres', '"public"."orders"'],
+ ['postgresql', '"public"."orders"'],
+ ['mysql', '`public`.`orders`'],
+ ['clickhouse', '`public`.`orders`'],
+ ['sqlite', '"orders"'],
+ ['snowflake', '"analytics"."public"."orders"'],
+ ['bigquery', '`analytics`.`public`.`orders`'],
+ ['sqlserver', '[analytics].[public].[orders]'],
+ ] as const)('formats table names for %s', (driver, expected) => {
+ const dialect = getDialectForDriver(driver);
+ expect(
+ dialect.formatTableName({
+ catalog: driver === 'snowflake' || driver === 'bigquery' || driver === 'sqlserver' ? 'analytics' : null,
+ db: driver === 'sqlite' ? null : 'public',
+ name: 'orders',
+ }),
+ ).toBe(expected);
+ });
+
+ it('throws with a supported-driver list for unknown drivers', () => {
+ expect(() => getDialectForDriver('oracle')).toThrow(
+ 'Unsupported warehouse driver "oracle". Supported drivers: bigquery, clickhouse, mysql, postgres, postgresql, sqlite, sqlite3, snowflake, sqlserver',
+ );
+ });
+});
+```
+
+- [ ] **Step 2: Run the failing test**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/connections/dialects.test.ts
+```
+
+Expected: FAIL because `./dialects.js` does not exist.
+
+- [ ] **Step 3: Add the minimal dialect implementation**
+
+Create `packages/context/src/connections/dialects.ts`:
+
+```ts
+import type { KtxSchemaDimensionType, KtxTableRef } from '../scan/types.js';
+
+export type SupportedDriver =
+ | 'postgres'
+ | 'postgresql'
+ | 'mysql'
+ | 'sqlserver'
+ | 'snowflake'
+ | 'bigquery'
+ | 'clickhouse'
+ | 'sqlite'
+ | 'sqlite3';
+
+export interface KtxDialect {
+ readonly type: SupportedDriver;
+ quoteIdentifier(identifier: string): string;
+ formatTableName(table: KtxTableRef): string;
+ mapToDimensionType(nativeType: string): KtxSchemaDimensionType;
+}
+
+const supportedDrivers: SupportedDriver[] = [
+ 'bigquery',
+ 'clickhouse',
+ 'mysql',
+ 'postgres',
+ 'postgresql',
+ 'sqlite',
+ 'sqlite3',
+ 'snowflake',
+ 'sqlserver',
+];
+
+function doubleQuoted(identifier: string): string {
+ return `"${identifier.replace(/"/g, '""')}"`;
+}
+
+function backtickQuoted(identifier: string): string {
+ return `\`${identifier.replace(/`/g, '``')}\``;
+}
+
+function bigQueryQuoted(identifier: string): string {
+ return `\`${identifier.replace(/`/g, '\\`')}\``;
+}
+
+function bracketQuoted(identifier: string): string {
+ return `[${identifier.replace(/\]/g, ']]')}]`;
+}
+
+function inferDimensionType(nativeType: string): KtxSchemaDimensionType {
+ const normalized = nativeType.toLowerCase().trim();
+ if (normalized.includes('date') || normalized.includes('time')) {
+ return 'time';
+ }
+ if (
+ normalized.includes('int') ||
+ normalized.includes('num') ||
+ normalized.includes('dec') ||
+ normalized.includes('float') ||
+ normalized.includes('double') ||
+ normalized.includes('real')
+ ) {
+ return 'number';
+ }
+ if (normalized.includes('bool') || normalized === 'bit') {
+ return 'boolean';
+ }
+ return 'string';
+}
+
+function formatWithParts(table: KtxTableRef, quote: (identifier: string) => string, sqlite = false): string {
+ const parts = sqlite ? [table.name] : [table.catalog, table.db, table.name].filter((part): part is string => !!part);
+ return parts.map(quote).join('.');
+}
+
+function createDialect(type: SupportedDriver, quote: (identifier: string) => string, sqlite = false): KtxDialect {
+ return {
+ type,
+ quoteIdentifier: quote,
+ formatTableName: (table) => formatWithParts(table, quote, sqlite),
+ mapToDimensionType: inferDimensionType,
+ };
+}
+
+const dialects: Record<SupportedDriver, KtxDialect> = {
+ postgres: createDialect('postgres', doubleQuoted),
+ postgresql: createDialect('postgresql', doubleQuoted),
+ mysql: createDialect('mysql', backtickQuoted),
+ clickhouse: createDialect('clickhouse', backtickQuoted),
+ sqlite: createDialect('sqlite', doubleQuoted, true),
+ sqlite3: createDialect('sqlite3', doubleQuoted, true),
+ snowflake: createDialect('snowflake', doubleQuoted),
+ bigquery: createDialect('bigquery', bigQueryQuoted),
+ sqlserver: createDialect('sqlserver', bracketQuoted),
+};
+
+export function getDialectForDriver(driver: string): KtxDialect {
+ const normalized = driver.toLowerCase().trim();
+ if (normalized in dialects) {
+ return dialects[normalized as SupportedDriver];
+ }
+ throw new Error(`Unsupported warehouse driver "${driver}". Supported drivers: ${supportedDrivers.join(', ')}`);
+}
+```
+
+Modify `packages/context/src/connections/index.ts`:
+
+```ts
+export type { KtxDialect, SupportedDriver } from './dialects.js';
+export { getDialectForDriver } from './dialects.js';
+```
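+
+Each quoting helper escapes its own closing delimiter so that an identifier containing that character cannot end the quoted name early. A small sketch using standalone copies of two of the helpers; the sample identifiers (`odd"name`, `a]b`, `dbo`) are made up for illustration.
+
+```ts
+// Standalone copies of two helpers from dialects.ts, for illustration only.
+function doubleQuoted(identifier: string): string {
+  // Postgres/Snowflake/SQLite style: a literal " is written as "".
+  return `"${identifier.replace(/"/g, '""')}"`;
+}
+
+function bracketQuoted(identifier: string): string {
+  // SQL Server style: a literal ] is written as ]].
+  return `[${identifier.replace(/\]/g, ']]')}]`;
+}
+
+// Made-up identifiers that contain the delimiter character.
+const pg = ['public', 'odd"name'].map(doubleQuoted).join('.');
+const mssql = ['analytics', 'dbo', 'a]b'].map(bracketQuoted).join('.');
+
+console.log(pg);
+console.log(mssql);
+```
+
+Quoting each part separately and joining with `.` afterwards, as `formatWithParts` does, means a dot inside an identifier stays part of that identifier.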
+
+- [ ] **Step 4: Run the dialect tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/connections/dialects.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 5: Commit**
+
+Run:
+
+```bash
+git add packages/context/src/connections/dialects.ts packages/context/src/connections/dialects.test.ts packages/context/src/connections/index.ts
+git commit -m "feat(context): add warehouse dialect dispatch"
+```
+
+### Task 2: Add the raw scan warehouse catalog service
+
+**Files:**
+- Create: `packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts`
+- Create: `packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts`
+
+- [ ] **Step 1: Write failing catalog tests**
+
+Create `packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts`:
+
+```ts
+import { mkdtemp, rm } from 'node:fs/promises';
+import { tmpdir } from 'node:os';
+import { join } from 'node:path';
+import { afterEach, beforeEach, describe, expect, it } from 'vitest';
+import { initKtxProject, type KtxLocalProject } from '../../../project/index.js';
+import { WarehouseCatalogService } from './warehouse-catalog.service.js';
+
+describe('WarehouseCatalogService', () => {
+ let tempDir: string;
+ let project: KtxLocalProject;
+
+ beforeEach(async () => {
+ tempDir = await mkdtemp(join(tmpdir(), 'ktx-warehouse-catalog-'));
+ project = await initKtxProject({ projectDir: join(tempDir, 'project'), projectName: 'warehouse' });
+ });
+
+ afterEach(async () => {
+ await rm(tempDir, { recursive: true, force: true });
+ });
+
+ async function seedLiveDatabaseScan(connectionName = 'warehouse', syncId = 'sync-2', driver = 'postgres') {
+ const root = `raw-sources/${connectionName}/live-database/${syncId}`;
+ await project.fileStore.writeFile(
+ `${root}/connection.json`,
+ JSON.stringify({ connectionId: connectionName, driver, extractedAt: '2026-05-12T00:00:00.000Z' }, null, 2),
+ 'ktx',
+ 'ktx@example.com',
+ 'seed connection',
+ );
+ await project.fileStore.writeFile(
+ `${root}/tables/orders.json`,
+ JSON.stringify(
+ {
+ catalog: null,
+ db: driver === 'sqlite' ? null : 'public',
+ name: 'orders',
+ kind: 'table',
+ comment: 'Customer orders',
+ estimatedRows: 12,
+ columns: [
+ {
+ name: 'id',
+ nativeType: 'integer',
+ normalizedType: 'integer',
+ dimensionType: 'number',
+ nullable: false,
+ primaryKey: true,
+ comment: 'Order id',
+ },
+ {
+ name: 'status',
+ nativeType: 'text',
+ normalizedType: 'text',
+ dimensionType: 'string',
+ nullable: false,
+ primaryKey: false,
+ comment: 'Order status',
+ },
+ ],
+ foreignKeys: [],
+ },
+ null,
+ 2,
+ ),
+ 'ktx',
+ 'ktx@example.com',
+ 'seed orders',
+ );
+ await project.fileStore.writeFile(
+ `${root}/enrichment/relationship-profile.json`,
+ JSON.stringify(
+ {
+ connectionId: connectionName,
+ driver,
+ sqlAvailable: true,
+ queryCount: 3,
+ tables: [{ table: { catalog: null, db: driver === 'sqlite' ? null : 'public', name: 'orders' }, rowCount: 12 }],
+ columns: {
+ 'orders.status': {
+ table: { catalog: null, db: driver === 'sqlite' ? null : 'public', name: 'orders' },
+ column: 'status',
+ nativeType: 'text',
+ normalizedType: 'text',
+ rowCount: 12,
+ nullCount: 0,
+ distinctCount: 2,
+ uniquenessRatio: 0.1667,
+ nullRate: 0,
+ sampleValues: ['paid', 'refunded'],
+ minTextLength: 4,
+ maxTextLength: 8,
+ },
+ },
+ warnings: [],
+ },
+ null,
+ 2,
+ ),
+ 'ktx',
+ 'ktx@example.com',
+ 'seed profile',
+ );
+ }
+
+ it('finds the latest sync and merges table schema with relationship profile values', async () => {
+ await seedLiveDatabaseScan('warehouse', 'sync-1');
+ await seedLiveDatabaseScan('warehouse', 'sync-2');
+ const catalog = new WarehouseCatalogService({ fileStore: project.fileStore });
+
+ await expect(catalog.getLatestSyncId('warehouse')).resolves.toBe('sync-2');
+ const detail = await catalog.getTable({ connectionName: 'warehouse', catalog: null, db: 'public', name: 'orders' });
+
+ expect(detail).toMatchObject({
+ connectionName: 'warehouse',
+ display: 'public.orders',
+ rowCount: 12,
+ columns: [
+ { name: 'id', nativeType: 'integer', primaryKey: true },
+ { name: 'status', nativeType: 'text', sampleValues: ['paid', 'refunded'], distinctCount: 2 },
+ ],
+ });
+ });
+
+ it('returns null and reports no scan when no live-database scan exists', async () => {
+ const catalog = new WarehouseCatalogService({ fileStore: project.fileStore });
+ await expect(catalog.getTable({ connectionName: 'missing', catalog: null, db: 'public', name: 'orders' })).resolves.toBeNull();
+ await expect(catalog.hasScan('missing')).resolves.toBe(false);
+ });
+
+ it('resolves postgres display strings and returns closest candidates for missing tables', async () => {
+ await seedLiveDatabaseScan();
+ const catalog = new WarehouseCatalogService({ fileStore: project.fileStore });
+
+ await expect(catalog.resolveDisplay('warehouse', 'public.orders')).resolves.toMatchObject({
+ resolved: { catalog: null, db: 'public', name: 'orders' },
+ candidates: [],
+ dialect: 'postgres',
+ });
+ await expect(catalog.resolveDisplay('warehouse', 'public.orderz')).resolves.toMatchObject({
+ resolved: null,
+ candidates: [{ name: 'orders' }],
+ });
+ });
+
+ it('treats two-part BigQuery identifiers as ambiguous instead of guessing', async () => {
+ await seedLiveDatabaseScan('warehouse', 'sync-bigquery', 'bigquery');
+ const catalog = new WarehouseCatalogService({ fileStore: project.fileStore });
+
+ await expect(catalog.resolveDisplay('warehouse', 'public.orders')).resolves.toMatchObject({
+ resolved: null,
+ dialect: 'bigquery',
+ });
+ });
+
+ it('searches table names, column names, comments, and descriptions', async () => {
+ await seedLiveDatabaseScan();
+ const catalog = new WarehouseCatalogService({ fileStore: project.fileStore });
+
+ await expect(catalog.searchByName('warehouse', 'status', 10)).resolves.toEqual(
+ expect.arrayContaining([
+ expect.objectContaining({
+ kind: 'column',
+ ref: expect.objectContaining({ db: 'public', name: 'orders', column: 'status' }),
+ matchedOn: 'name',
+ }),
+ ]),
+ );
+ });
+});
+```
+
+- [ ] **Step 2: Run the failing catalog tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts
+```
+
+Expected: FAIL because the service file does not exist.
+
+- [ ] **Step 3: Add the catalog service**
+
+Create `packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts` with these exported shapes and behavior:
+
+```ts
+import type { KtxFileStorePort } from '../../../core/index.js';
+import { getDialectForDriver } from '../../../connections/index.js';
+import type { KtxConnectionDriver, KtxSchemaColumn, KtxSchemaForeignKey, KtxSchemaTable, KtxTableRef } from '../../../scan/types.js';
+
+export interface WarehouseCatalogServiceDeps {
+ fileStore: KtxFileStorePort;
+}
+
+export interface WarehouseColumnDetail extends KtxSchemaColumn {
+ descriptions: Record<string, string>;
+ rowCount: number | null;
+ nullCount: number | null;
+ distinctCount: number | null;
+ nullRate: number | null;
+ sampleValues: string[];
+}
+
+export interface TableDetail {
+ connectionName: string;
+ catalog: string | null;
+ db: string | null;
+ name: string;
+ display: string;
+ kind: string;
+ comment: string | null;
+ description: string | null;
+ rowCount: number | null;
+ columns: WarehouseColumnDetail[];
+ foreignKeys: KtxSchemaForeignKey[];
+}
+
+export type RawSchemaHit =
+ | { kind: 'table'; ref: KtxTableRef; display: string; matchedOn: 'name' | 'db' | 'comment' | 'description' }
+ | { kind: 'column'; ref: KtxTableRef & { column: string }; display: string; matchedOn: 'name' | 'comment' | 'description' };
+
+interface ConnectionArtifact {
+ driver?: KtxConnectionDriver;
+}
+
+interface RelationshipProfileColumn {
+ table?: KtxTableRef;
+ column?: string;
+ rowCount?: number;
+ nullCount?: number;
+ distinctCount?: number;
+ nullRate?: number;
+ sampleValues?: unknown[];
+}
+
+interface RelationshipProfileArtifact {
+ driver?: KtxConnectionDriver;
+ tables?: Array<{ table?: KtxTableRef; rowCount?: number }>;
+ columns?: Record<string, RelationshipProfileColumn>;
+}
+
+interface ConnectionCatalog {
+ connectionName: string;
+ syncId: string;
+ driver: KtxConnectionDriver;
+ tables: KtxSchemaTable[];
+ profile: RelationshipProfileArtifact | null;
+}
+```
+
+The implementation must:
+
+- Use `fileStore.listFiles("raw-sources/<connectionName>/live-database")` and choose the lexicographically latest path ending in `/connection.json`.
+- Read every JSON file under the latest sync's `tables/` directory rather than reconstructing a path from the table ref. This supports both the encoded and the simple table filenames already present in tests.
+- Parse display strings by driver:
+ - Postgres, MySQL, and ClickHouse: `schema.table`.
+ - SQL Server, Snowflake, and BigQuery: `catalog.schema.table`.
+ - SQLite: `table`.
+ - For BigQuery, a two-part display must return `resolved: null` and candidate matches.
+- Match table refs case-insensitively, while preserving stored casing in outputs.
+- Merge relationship-profile fields by `(catalog, db, name, column)`, with fallback matching on `table.name + "." + column`.
+- Cache a loaded connection catalog per `connectionName` within the service instance.
+- Return `null` from `getTable()` when the scan is absent or the table ref is not found.
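
The per-driver display rules above can be sketched as a small pure function. This is a minimal illustration and not the service's actual code: `parseDisplay`, `ParsedDisplay`, and `EXPECTED_PARTS` are hypothetical names, and the driver ids other than `postgres`, `sqlite`, and `bigquery` are assumptions about how `KtxConnectionDriver` spells them. It also ignores quoted identifiers that contain dots.

```ts
// Hypothetical sketch: names and driver ids are assumptions, not the real service API.
type TableRef = { catalog: string | null; db: string | null; name: string };

type ParsedDisplay =
  | { kind: 'resolved'; ref: TableRef }
  | { kind: 'ambiguous'; parts: string[] };

// Number of dot-separated parts each driver expects in a full display string.
const EXPECTED_PARTS: Record<string, number> = {
  postgres: 2,
  mysql: 2,
  clickhouse: 2,
  sqlserver: 3,
  snowflake: 3,
  bigquery: 3,
  sqlite: 1,
};

function parseDisplay(driver: string, display: string): ParsedDisplay {
  const parts = display
    .split('.')
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  const expected = EXPECTED_PARTS[driver];
  // Unknown drivers and wrong part counts are reported as ambiguous instead of guessed.
  if (expected === undefined || parts.length !== expected) {
    return { kind: 'ambiguous', parts };
  }
  if (expected === 1) {
    return { kind: 'resolved', ref: { catalog: null, db: null, name: parts[0] } };
  }
  if (expected === 2) {
    return { kind: 'resolved', ref: { catalog: null, db: parts[0], name: parts[1] } };
  }
  return { kind: 'resolved', ref: { catalog: parts[0], db: parts[1], name: parts[2] } };
}
```

In this sketch a two-part BigQuery display comes back as `ambiguous`, which is the case where the service would return `resolved: null` together with candidate matches.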
+
+Use these method signatures:
+
+```ts
+export class WarehouseCatalogService {
+ constructor(private readonly deps: WarehouseCatalogServiceDeps) {}
+
+ async hasScan(connectionName: string): Promise<boolean>;
+ async getLatestSyncId(connectionName: string): Promise<string | null>;
+ async listTables(connectionName: string): Promise<KtxSchemaTable[]>;
+ async getTable(ref: { connectionName: string } & KtxTableRef): Promise<TableDetail | null>;
+ async resolveDisplay(connectionName: string, display: string): Promise<{
+ resolved: KtxTableRef | null;
+ candidates: KtxTableRef[];
+ dialect: string;
+ }>;
+ async searchByName(connectionName: string, query: string, limit: number): Promise<RawSchemaHit[]>;
+}
+```
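
As a rough illustration of the "lexicographically latest `connection.json`" rule, the selection behind `getLatestSyncId` might look like the sketch below. `latestSyncId` is a hypothetical helper, and it assumes the file store returns slash-separated paths that include the sync directory.

```ts
// Hypothetical helper: derives the latest sync id from a flat list of file paths.
function latestSyncId(paths: string[]): string | null {
  const suffix = '/connection.json';
  // Default sort() compares strings by code unit, which gives lexicographic order.
  const candidates = paths.filter((path) => path.endsWith(suffix)).sort();
  const latest = candidates[candidates.length - 1];
  if (latest === undefined) {
    return null;
  }
  // Drop the file name, then keep the last directory segment as the sync id.
  const syncRoot = latest.slice(0, -suffix.length);
  return syncRoot.slice(syncRoot.lastIndexOf('/') + 1);
}
```

Lexicographic order only matches chronological order when sync ids sort that way as strings, for example ISO timestamps or zero-padded counters. With plain counters, `sync-10` sorts before `sync-2`.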
+
+- [ ] **Step 4: Run the catalog tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 5: Commit**
+
+Run:
+
+```bash
+git add packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts
+git commit -m "feat(context): read warehouse scan catalog"
+```
+
+### Task 3: Add `entity_details`
+
+**Files:**
+- Create: `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts`
+- Create: `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts`
+
+- [ ] **Step 1: Write failing `entity_details` tests**
+
+Create tests that instantiate the tool with a seeded `WarehouseCatalogService` and a `ToolContext` whose session has `allowedConnectionNames: new Set(['warehouse'])`. Test these cases:
+
+```ts
+it('returns scoped table detail for a display target', async () => {
+ const result = await tool.call(
+ { connectionName: 'warehouse', targets: [{ display: 'public.orders' }] },
+ context,
+ );
+ expect(result.markdown).toContain('### public.orders');
+ expect(result.markdown).toContain('- status (text, nullable=false)');
+ expect(result.markdown).toContain('sample: ["paid","refunded"]');
+ expect(result.structured.scanAvailable).toBe(true);
+ expect(result.structured.resolved).toHaveLength(1);
+});
+
+it('returns a no-scan state distinct from not found', async () => {
+ const result = await tool.call(
+ { connectionName: 'empty', targets: [{ display: 'public.orders' }] },
+ { ...context, session: { ...context.session!, allowedConnectionNames: new Set(['empty']) } },
+ );
+ expect(result.markdown).toContain('No live-database scan available for connection "empty"; run `ktx scan` first.');
+ expect(result.structured.scanAvailable).toBe(false);
+});
+
+it('refuses out-of-scope connections', async () => {
+ const result = await tool.call(
+ { connectionName: 'billing', targets: [{ display: 'public.orders' }] },
+ context,
+ );
+ expect(result.markdown).toContain('Connection "billing" is not available to this ingest stage.');
+ expect(result.structured.scanAvailable).toBe(false);
+});
+```
+
+- [ ] **Step 2: Run the failing tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/entity-details.tool.test.ts
+```
+
+Expected: FAIL because the tool file does not exist.
+
+- [ ] **Step 3: Implement the tool**
+
+Create `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts`:
+
+```ts
+import { z } from 'zod';
+import { BaseTool, type ToolContext, type ToolOutput } from '../../../tools/index.js';
+import type { KtxTableRef } from '../../../scan/types.js';
+import { WarehouseCatalogService, type TableDetail } from './warehouse-catalog.service.js';
+
+const targetSchema = z.union([
+ z.object({ display: z.string().min(1) }),
+ z.object({
+ catalog: z.string().nullable(),
+ db: z.string().nullable(),
+ name: z.string().min(1),
+ column: z.string().optional(),
+ }),
+]);
+
+const entityDetailsInputSchema = z.object({
+ connectionName: z.string().regex(/^[a-zA-Z0-9][a-zA-Z0-9_-]*$/),
+ targets: z.array(targetSchema).min(1).max(50),
+});
+
+type EntityDetailsInput = z.infer<typeof entityDetailsInputSchema>;
+
+export interface EntityDetailsStructured {
+ resolved: TableDetail[];
+ missing: Array<{ target: unknown; candidates: KtxTableRef[] }>;
+ scanAvailable: boolean;
+}
+
+function allowedConnectionNames(context: ToolContext): ReadonlySet<string> | null {
+ return context.session?.allowedConnectionNames ?? null;
+}
+
+function sampleText(values: string[]): string {
+ return values.length > 0 ? ` - sample: ${JSON.stringify(values.slice(0, 10))}` : '';
+}
+
+function appendTableMarkdown(parts: string[], detail: TableDetail, columnName?: string): void {
+ const columns = columnName ? detail.columns.filter((column) => column.name === columnName) : detail.columns;
+ parts.push(`### ${detail.display}`);
+ parts.push(`Type: ${detail.kind} | Native columns: ${detail.columns.length}`);
+ if (detail.description || detail.comment) {
+ parts.push(`Description: ${detail.description ?? detail.comment}`);
+ }
+ parts.push('', 'Columns:');
+ for (const column of columns) {
+ const pk = column.primaryKey ? ', PK' : '';
+ parts.push(`- ${column.name} (${column.nativeType}, nullable=${column.nullable}${pk})${sampleText(column.sampleValues)}`);
+ }
+ parts.push('');
+}
+
+export class EntityDetailsTool extends BaseTool {
+ readonly name = 'entity_details';
+
+ constructor(private readonly catalogFactory: (context: ToolContext) => WarehouseCatalogService) {
+ super();
+ }
+
+ get description(): string {
+ return 'Verify warehouse tables and columns from the latest live-database scan before writing them into wiki or semantic-layer output.';
+ }
+
+ get inputSchema() {
+ return entityDetailsInputSchema;
+ }
+
+ async call(input: EntityDetailsInput, context: ToolContext): Promise<ToolOutput<EntityDetailsStructured>> {
+ const allowed = allowedConnectionNames(context);
+ if (allowed && !allowed.has(input.connectionName)) {
+ return {
+ markdown: `Connection "${input.connectionName}" is not available to this ingest stage.`,
+ structured: { resolved: [], missing: [], scanAvailable: false },
+ };
+ }
+
+ const catalog = this.catalogFactory(context);
+ const scanAvailable = await catalog.hasScan(input.connectionName);
+ if (!scanAvailable) {
+ return {
+ markdown: `No live-database scan available for connection "${input.connectionName}"; run \`ktx scan\` first.`,
+ structured: { resolved: [], missing: [], scanAvailable: false },
+ };
+ }
+
+ const parts: string[] = [];
+ const resolved: TableDetail[] = [];
+ const missing: EntityDetailsStructured['missing'] = [];
+
+ for (const target of input.targets) {
+ const resolution =
+ 'display' in target
+ ? await catalog.resolveDisplay(input.connectionName, target.display)
+ : { resolved: { catalog: target.catalog, db: target.db, name: target.name }, candidates: [], dialect: '' };
+ if (!resolution.resolved) {
+ missing.push({ target, candidates: resolution.candidates });
+ parts.push(`Not found in scan: ${'display' in target ? target.display : target.name}`);
+ if (resolution.candidates.length > 0) {
+ parts.push(`Closest matches: ${resolution.candidates.map((candidate) => candidate.name).join(', ')}`);
+ }
+ continue;
+ }
+ const detail = await catalog.getTable({ connectionName: input.connectionName, ...resolution.resolved });
+ if (!detail) {
+ missing.push({ target, candidates: resolution.candidates });
+ parts.push(`Not found in scan: ${'display' in target ? target.display : target.name}`);
+ continue;
+ }
+ resolved.push(detail);
+ appendTableMarkdown(parts, detail, 'column' in target ? target.column : undefined);
+ }
+
+ return {
+ markdown: parts.join('\n').trim(),
+ structured: { resolved, missing, scanAvailable: true },
+ };
+ }
+}
+```
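
The order of the two early returns in `call` matters: the scope check runs before the scan check, so an out-of-scope connection never reveals whether a scan exists. A minimal sketch of that decision, using the hypothetical names `Availability` and `classifyConnection` and a plain set in place of `catalog.hasScan`:

```ts
// Hypothetical sketch of the guard ordering; not part of the tool's public surface.
type Availability = 'not_allowed' | 'no_scan' | 'available';

function classifyConnection(
  connectionName: string,
  allowed: ReadonlySet<string> | null,
  scannedConnections: ReadonlySet<string>,
): Availability {
  // A null allowlist means the session is unscoped, so every connection passes this check.
  if (allowed !== null && !allowed.has(connectionName)) {
    return 'not_allowed';
  }
  // Only in-scope connections reach the scan lookup.
  return scannedConnections.has(connectionName) ? 'available' : 'no_scan';
}
```

Both `not_allowed` and `no_scan` correspond to `scanAvailable: false` in the structured output. Only the markdown message distinguishes them.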
+
+- [ ] **Step 4: Run the `entity_details` tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/entity-details.tool.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 5: Commit**
+
+Run:
+
+```bash
+git add packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts
+git commit -m "feat(context): add entity details verification tool"
+```
+
+### Task 4: Add `sql_execution`
+
+**Files:**
+- Create: `packages/context/src/ingest/tools/warehouse-verification/sql-execution.tool.ts`
+- Create: `packages/context/src/ingest/tools/warehouse-verification/sql-execution.tool.test.ts`
+
+- [ ] **Step 1: Write failing `sql_execution` tests**
+
+Create tests for:
+
+```ts
+it('wraps read-only SQL with a capped row limit', async () => {
+ connections.executeQuery.mockResolvedValue({ headers: ['status'], rows: [['paid']], totalRows: 1 });
+ const result = await tool.call(
+ { connectionName: 'warehouse', sql: 'select status from public.orders', rowLimit: 5 },
+ context,
+ );
+ expect(connections.executeQuery).toHaveBeenCalledWith(
+ 'warehouse',
+ 'select * from (select status from public.orders) as ktx_query_result limit 5',
+ );
+ expect(result.markdown).toContain('| status |');
+ expect(result.structured.wrappedSql).toContain('limit 5');
+});
+
+it.each(['insert into x values (1)', 'drop table x', 'vacuum'])('rejects mutating SQL: %s', async (sql) => {
+ const result = await tool.call({ connectionName: 'warehouse', sql }, context);
+ expect(result.markdown).toContain('Only read-only SELECT/WITH queries can be executed locally.');
+ expect(connections.executeQuery).not.toHaveBeenCalled();
+});
+
+it('surfaces connector errors verbatim', async () => {
+ connections.executeQuery.mockRejectedValue(new Error('relation "orbit_analytics.customer" does not exist'));
+ const result = await tool.call(
+ { connectionName: 'warehouse', sql: 'select 1 from orbit_analytics.customer', rowLimit: 1 },
+ context,
+ );
+ expect(result.markdown).toContain('relation "orbit_analytics.customer" does not exist');
+ expect(result.structured.error).toContain('relation "orbit_analytics.customer" does not exist');
+});
+```
+
+- [ ] **Step 2: Run the failing tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/sql-execution.tool.test.ts
+```
+
+Expected: FAIL because the tool file does not exist.
+
+- [ ] **Step 3: Implement the tool**
+
+Create `packages/context/src/ingest/tools/warehouse-verification/sql-execution.tool.ts`:
+
+```ts
+import { z } from 'zod';
+import { assertReadOnlySql, limitSqlForExecution } from '../../../connections/index.js';
+import type { SlConnectionCatalogPort } from '../../../sl/index.js';
+import { BaseTool, type ToolContext, type ToolOutput } from '../../../tools/index.js';
+
+const sqlExecutionInputSchema = z.object({
+ connectionName: z.string().regex(/^[a-zA-Z0-9][a-zA-Z0-9_-]*$/),
+ sql: z.string().min(1),
+ rowLimit: z.number().int().positive().max(1000).optional().default(100),
+});
+
+type SqlExecutionInput = z.infer<typeof sqlExecutionInputSchema>;
+
+export interface SqlExecutionStructured {
+ headers: string[];
+ rows: unknown[][];
+ rowCount: number;
+ truncated: boolean;
+ sql: string;
+ wrappedSql: string;
+ error?: string;
+}
+
+function markdownTable(headers: string[], rows: unknown[][], totalRows: number): string {
+ if (headers.length === 0) {
+ return rows.length === 0 ? 'Query returned no rows.' : JSON.stringify(rows.slice(0, 20));
+ }
+ const visible = rows.slice(0, 20);
+ const lines = [
+ `| ${headers.join(' | ')} |`,
+ `| ${headers.map(() => '---').join(' | ')} |`,
+ ...visible.map((row) => `| ${row.map((value) => String(value ?? '')).join(' | ')} |`),
+ ];
+ if (totalRows > visible.length) {
+ lines.push(`... +${totalRows - visible.length} more rows`);
+ }
+ return lines.join('\n');
+}
+
+export class SqlExecutionTool extends BaseTool {
+ readonly name = 'sql_execution';
+
+ constructor(private readonly connections: SlConnectionCatalogPort) {
+ super();
+ }
+
+ get description(): string {
+ return 'Run a single read-only SELECT or WITH probe against an allowed warehouse connection and return a capped markdown table or the warehouse error.';
+ }
+
+ get inputSchema() {
+ return sqlExecutionInputSchema;
+ }
+
+ async call(input: SqlExecutionInput, context: ToolContext): Promise<ToolOutput<SqlExecutionStructured>> {
+ const allowed = context.session?.allowedConnectionNames;
+ if (allowed && !allowed.has(input.connectionName)) {
+ return {
+ markdown: `Connection "${input.connectionName}" is not available to this ingest stage.`,
+ structured: { headers: [], rows: [], rowCount: 0, truncated: false, sql: input.sql, wrappedSql: '', error: 'connection_not_allowed' },
+ };
+ }
+
+ let sql: string;
+ let wrappedSql: string;
+ try {
+ sql = assertReadOnlySql(input.sql);
+ // Fall back to the schema default when the caller bypasses schema parsing.
+ wrappedSql = limitSqlForExecution(sql, input.rowLimit ?? 100);
+ } catch (error) {
+ const message = error instanceof Error ? error.message : String(error);
+ return {
+ markdown: message,
+ structured: { headers: [], rows: [], rowCount: 0, truncated: false, sql: input.sql, wrappedSql: '', error: message },
+ };
+ }
+
+ try {
+ const result = await this.connections.executeQuery(input.connectionName, wrappedSql);
+ const headers = result.headers ?? [];
+ const rows = result.rows ?? [];
+ const rowCount = result.totalRows ?? rows.length;
+ return {
+ markdown: markdownTable(headers, rows, rowCount),
+ structured: { headers, rows, rowCount, truncated: rowCount > rows.length, sql, wrappedSql },
+ };
+ } catch (error) {
+ const message = error instanceof Error ? error.message : String(error);
+ return {
+ markdown: `SQL execution failed: ${message}`,
+ structured: { headers: [], rows: [], rowCount: 0, truncated: false, sql, wrappedSql, error: message },
+ };
+ }
+ }
+}
+```
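
The tool relies on `assertReadOnlySql` and `limitSqlForExecution` from the connections package, whose implementations are not shown in this plan. The sketch below uses hypothetical stand-ins (`assertReadOnlySketch`, `limitSqlSketch`) only to illustrate the behavior the tests expect: the rejection message and the wrapped query shape are taken from the tests, everything else is an assumption.

```ts
// Hypothetical stand-ins; the real helpers in connections/index.js may differ.
const READ_ONLY_MESSAGE = 'Only read-only SELECT/WITH queries can be executed locally.';

function assertReadOnlySketch(sql: string): string {
  // Tolerate one trailing semicolon, then require a single SELECT or WITH statement.
  const trimmed = sql.trim().replace(/;\s*$/, '');
  if (!/^(select|with)\b/i.test(trimmed) || trimmed.includes(';')) {
    throw new Error(READ_ONLY_MESSAGE);
  }
  return trimmed;
}

function limitSqlSketch(sql: string, rowLimit: number): string {
  // Wrap the probe as a subquery so the cap applies regardless of the inner query's own LIMIT.
  return `select * from (${sql}) as ktx_query_result limit ${rowLimit}`;
}
```

A prefix check like this is deliberately conservative (it also rejects semicolons inside string literals) and still incomplete: PostgreSQL allows data-modifying statements inside `WITH`, so a production guard would need a parser or a read-only database role. The `limit` wrapper is also dialect-specific, since SQL Server does not accept `LIMIT`.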
+
+- [ ] **Step 4: Run the `sql_execution` tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/sql-execution.tool.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 5: Commit**
+
+Run:
+
+```bash
+git add packages/context/src/ingest/tools/warehouse-verification/sql-execution.tool.ts packages/context/src/ingest/tools/warehouse-verification/sql-execution.tool.test.ts
+git commit -m "feat(context): add ingest SQL verification tool"
+```
+
+### Task 5: Add `discover_data`
+
+**Files:**
+- Create: `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts`
+- Create: `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts`
+- Create: `packages/context/src/ingest/tools/warehouse-verification/index.ts`
+
+- [ ] **Step 1: Write failing `discover_data` tests**
+
+Create tests with fake `wikiSearchTool.call`, `slDiscoverTool.call`, and `WarehouseCatalogService.searchByName`. Cover:
+
+```ts
+it('groups wiki, semantic layer, and raw schema hits with routing hints', async () => {
+ const result = await tool.call({ query: 'orders', connectionName: 'warehouse', limit: 5 }, context);
+ expect(result.markdown).toContain('## Wiki Pages');
+ expect(result.markdown).toContain('use `wiki_read(blockKey)` for full content');
+ expect(result.markdown).toContain('## Semantic Layer Sources');
+ expect(result.markdown).toContain('use `sl_read_source(sourceName)` for the YAML');
+ expect(result.markdown).toContain('## Raw Warehouse Schema');
+ expect(result.markdown).toContain('use `entity_details({connectionName, targets: [{display}]})`');
+ expect(result.structured.raw?.hits).toHaveLength(1);
+});
+
+it('delegates sourceName inspect mode to sl_discover only', async () => {
+ const result = await tool.call({ sourceName: 'orders', connectionName: 'warehouse' }, context);
+ expect(slDiscoverTool.call).toHaveBeenCalledWith({ sourceName: 'orders', connectionId: 'warehouse' }, context);
+ expect(wikiSearchTool.call).not.toHaveBeenCalled();
+ expect(catalog.searchByName).not.toHaveBeenCalled();
+ expect(result.markdown).toContain('source detail');
+});
+
+it('returns the empty-state message when all sections are empty', async () => {
+ const result = await tool.call({ query: 'customer source', connectionName: 'warehouse' }, emptyContext);
+ expect(result.markdown).toContain('No matches for "customer source" across wiki, semantic layer, or raw warehouse schema.');
+});
+```
+
+- [ ] **Step 2: Run the failing tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/discover-data.tool.test.ts
+```
+
+Expected: FAIL because the tool file does not exist.
+
+- [ ] **Step 3: Implement the tool and index export**
+
+Create `packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts`:
+
+```ts
+import { z } from 'zod';
+import type { BaseTool, ToolContext, ToolOutput } from '../../../tools/index.js';
+import { BaseTool as ToolBase } from '../../../tools/index.js';
+import { WarehouseCatalogService, type RawSchemaHit } from './warehouse-catalog.service.js';
+
+const discoverDataInputSchema = z.object({
+ query: z.string().optional(),
+ connectionName: z.string().regex(/^[a-zA-Z0-9][a-zA-Z0-9_-]*$/).optional(),
+ limit: z.number().int().positive().max(50).optional().default(10),
+ sourceName: z.string().optional(),
+});
+
+type DiscoverDataInput = z.infer<typeof discoverDataInputSchema>;
+
+export interface DiscoverDataStructured {
+ wiki: unknown | null;
+ sl: unknown | null;
+ raw: { hits: RawSchemaHit[] } | null;
+}
+
+interface DiscoverDataDeps {
+ wikiSearchTool: BaseTool;
+ slDiscoverTool: BaseTool;
+ catalogFactory: (context: ToolContext) => WarehouseCatalogService;
+}
+
+export class DiscoverDataTool extends ToolBase {
+ readonly name = 'discover_data';
+
+ constructor(private readonly deps: DiscoverDataDeps) {
+ super();
+ }
+
+ get description(): string {
+ return 'Discover existing wiki pages, semantic layer sources, and raw warehouse schema hits before writing ingest output.';
+ }
+
+ get inputSchema() {
+ return discoverDataInputSchema;
+ }
+
+ async call(input: DiscoverDataInput, context: ToolContext): Promise<ToolOutput<DiscoverDataStructured>> {
+ if (input.sourceName) {
+ const sl = await this.deps.slDiscoverTool.call(
+ { sourceName: input.sourceName, connectionId: input.connectionName },
+ context,
+ );
+ return { markdown: sl.markdown, structured: { wiki: null, sl: sl.structured, raw: null } };
+ }
+
+ const query = input.query?.trim() || '';
+ const limit = input.limit ?? 10;
+ const parts: string[] = [];
+ let wiki: unknown | null = null;
+ let sl: unknown | null = null;
+ let raw: DiscoverDataStructured['raw'] = null;
+
+ if (query) {
+ const wikiResult = await this.deps.wikiSearchTool.call({ query, limit }, context);
+ if (wikiResult.structured?.totalFound > 0) {
+ parts.push('## Wiki Pages', '> use `wiki_read(blockKey)` for full content', wikiResult.markdown, '');
+ wiki = wikiResult.structured;
+ }
+ }
+
+ const slResult = await this.deps.slDiscoverTool.call(
+ { query: query || undefined, connectionId: input.connectionName },
+ context,
+ );
+ if (slResult.structured?.totalSources > 0) {
+ parts.push('## Semantic Layer Sources', '> use `sl_read_source(sourceName)` for the YAML, or `entity_details` for warehouse-shape details', slResult.markdown, '');
+ sl = slResult.structured;
+ }
+
+ const catalog = this.deps.catalogFactory(context);
+ const connections = input.connectionName
+ ? [input.connectionName]
+ : [...(context.session?.allowedConnectionNames ?? [])].sort();
+ const rawHits: RawSchemaHit[] = [];
+ for (const connectionName of connections) {
+ rawHits.push(...(await catalog.searchByName(connectionName, query, limit)));
+ }
+ if (rawHits.length > 0) {
+ parts.push('## Raw Warehouse Schema', '> use `entity_details({connectionName, targets: [{display}]})` for full DDL + sample values');
+ parts.push(
+ rawHits
+ .slice(0, limit)
+ .map((hit) => `- ${hit.kind}: ${hit.display} (matched on ${hit.matchedOn})`)
+ .join('\n'),
+ );
+ raw = { hits: rawHits.slice(0, limit) };
+ }
+
+ if (parts.length === 0) {
+ return {
+ markdown: `No matches for "${query}" across wiki, semantic layer, or raw warehouse schema. Try broader terms; this concept may not exist yet.`,
+ structured: { wiki, sl, raw },
+ };
+ }
+
+ return { markdown: parts.join('\n'), structured: { wiki, sl, raw } };
+ }
+}
+```
+
+Create `packages/context/src/ingest/tools/warehouse-verification/index.ts`:
+
+```ts
+import type { BaseTool, ToolContext } from '../../../tools/index.js';
+import type { KtxFileStorePort } from '../../../core/index.js';
+import type { SlConnectionCatalogPort } from '../../../sl/index.js';
+import { DiscoverDataTool } from './discover-data.tool.js';
+import { EntityDetailsTool } from './entity-details.tool.js';
+import { SqlExecutionTool } from './sql-execution.tool.js';
+import { WarehouseCatalogService } from './warehouse-catalog.service.js';
+
+export { DiscoverDataTool } from './discover-data.tool.js';
+export { EntityDetailsTool } from './entity-details.tool.js';
+export { SqlExecutionTool } from './sql-execution.tool.js';
+export { WarehouseCatalogService } from './warehouse-catalog.service.js';
+export type { TableDetail, WarehouseColumnDetail, RawSchemaHit } from './warehouse-catalog.service.js';
+
+export function createWarehouseVerificationTools(deps: {
+ connections: SlConnectionCatalogPort;
+ fallbackFileStore: KtxFileStorePort;
+ wikiSearchTool: BaseTool;
+ slDiscoverTool: BaseTool;
+}): BaseTool[] {
+ const catalogFactory = (context: ToolContext) =>
+ new WarehouseCatalogService({
+ fileStore: context.session?.configService ?? deps.fallbackFileStore,
+ });
+ return [
+ new EntityDetailsTool(catalogFactory),
+ new SqlExecutionTool(deps.connections),
+ new DiscoverDataTool({
+ wikiSearchTool: deps.wikiSearchTool,
+ slDiscoverTool: deps.slDiscoverTool,
+ catalogFactory,
+ }),
+ ];
+}
+```
+
+- [ ] **Step 4: Run the `discover_data` tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/discover-data.tool.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 5: Commit**
+
+Run:
+
+```bash
+git add packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts packages/context/src/ingest/tools/warehouse-verification/index.ts
+git commit -m "feat(context): add raw warehouse discovery tool"
+```
+
+### Task 6: Wire tools into ingest sessions
+
+**Files:**
+- Modify: `packages/context/src/tools/tool-session.ts`
+- Modify: `packages/context/src/ingest/ingest-bundle.runner.ts`
+- Modify: `packages/context/src/ingest/local-bundle-runtime.ts`
+- Modify: `packages/context/src/ingest/ingest-bundle.runner.test.ts`
+
+- [ ] **Step 1: Write failing scoping test**
+
+Add to `packages/context/src/ingest/ingest-bundle.runner.test.ts`:
+
+```ts
+it('threads target warehouse connection names into WorkUnit and reconcile tool sessions', async () => {
+ const deps = makeDeps();
+ const sessions: any[] = [];
+ deps.adapter.listTargetConnectionIds = vi.fn().mockResolvedValue(['warehouse']);
+ deps.toolsetFactory.createIngestWuToolset.mockImplementation((toolSession: any) => {
+ sessions.push(toolSession);
+ return {
+ toAiSdkTools: vi.fn().mockReturnValue({}),
+ getAllTools: vi.fn().mockReturnValue([]),
+ getToolNames: vi.fn().mockReturnValue([]),
+ };
+ });
+ deps.agentRunner.runLoop.mockResolvedValue({ stopReason: 'natural' });
+
+ const runner = buildRunner(deps);
+ (runner as any).stageRawFilesStage1 = vi.fn().mockResolvedValue({
+ currentHashes: new Map([['a.yml', 'h1']]),
+ rawDirInWorktree: 'raw-sources/notion/fake/s',
+ });
+ (runner as any).resolveStagedDir = vi.fn().mockResolvedValue('/tmp/stage/upload-x');
+
+ await runner.run({
+ jobId: 'j1',
+ connectionId: 'notion',
+ sourceKey: 'fake',
+ trigger: 'upload',
+ bundleRef: { kind: 'upload', uploadId: 'upload-x' },
+ });
+
+ expect([...sessions[0].allowedConnectionNames].sort()).toEqual(['notion', 'warehouse']);
+});
+```
+
+- [ ] **Step 2: Run the failing runner test**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/ingest-bundle.runner.test.ts -t "threads target warehouse connection names"
+```
+
+Expected: FAIL because `allowedConnectionNames` is absent.
+
+- [ ] **Step 3: Thread allowed connection names**
+
+Modify `packages/context/src/tools/tool-session.ts`:
+
+```ts
+ allowedRawPaths?: ReadonlySet<string>;
+ allowedConnectionNames?: ReadonlySet<string>;
+ semanticLayerService: SemanticLayerService;
+```
+
+Modify WU session creation in `packages/context/src/ingest/ingest-bundle.runner.ts`:
+
+```ts
+ allowedRawPaths: new Set(wu.rawFiles),
+ allowedConnectionNames: new Set(slConnectionIds),
+ semanticLayerService: scopedSemanticLayerService,
+```
+
+Modify reconcile session creation in the same file:
+
+```ts
+ allowedRawPaths: reconciliationAllowedRawPaths,
+ allowedConnectionNames: new Set(slConnectionIds),
+ semanticLayerService: rcScopedSl,
+```
+
+- [ ] **Step 4: Register the tools in the local ingest toolset**
+
+Modify `packages/context/src/ingest/local-bundle-runtime.ts`:
+
+```ts
+import {
+ createWarehouseVerificationTools,
+} from './tools/warehouse-verification/index.js';
+```
+
+Refactor the existing inline wiki and SL tool instances in `LocalIngestToolsetFactory` so `wikiSearchTool` and `slDiscoverTool` are named constants, then add the warehouse tools:
+
+```ts
+ const wikiSearchTool = new WikiSearchTool({
+ search: async (input) => {
+ const results = await searchLocalKnowledgePages(deps.project, {
+ userId: input.userId,
+ query: input.query,
+ limit: input.limit,
+ embeddingService: deps.embedding,
+ });
+ return {
+ results: results.slice(0, input.limit).map((result) => ({
+ key: result.key,
+ path: result.path,
+ summary: result.summary,
+ score: result.score,
+ matchReasons: result.matchReasons,
+ lanes: result.lanes,
+ })),
+ totalFound: results.length,
+ };
+ },
+ });
+ const slDiscoverTool = new SlDiscoverTool(slDeps, { maxSources: 25, minRrfScore: 0, maxDetailedSources: 5 });
+ const warehouseVerificationTools = createWarehouseVerificationTools({
+ connections: deps.connections,
+ fallbackFileStore: deps.project.fileStore,
+ wikiSearchTool,
+ slDiscoverTool,
+ });
+
+ this.baseTools = [
+ new WikiReadTool(deps.wikiService, deps.knowledgeIndex),
+ wikiSearchTool,
+ new WikiListTagsTool(deps.wikiService, deps.knowledgeIndex),
+ new WikiWriteTool(deps.wikiService, deps.knowledgeIndex, deps.knowledgeEvents),
+ new WikiRemoveTool(deps.wikiService, deps.knowledgeIndex, deps.knowledgeEvents),
+ slDiscoverTool,
+ new SlEditSourceTool(slDeps),
+ new SlReadSourceTool(slDeps),
+ new SlWriteSourceTool(slDeps),
+ new SlValidateTool(slDeps),
+ new SlRollbackTool(deps.slSourcesRepository, deps.connections, 0),
+ ...warehouseVerificationTools,
+ ];
+```
+
+- [ ] **Step 5: Run integration and toolset tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/ingest-bundle.runner.test.ts -t "threads target warehouse connection names"
+pnpm --filter @ktx/context exec vitest run src/ingest/local-bundle-runtime.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 6: Commit**
+
+Run:
+
+```bash
+git add packages/context/src/tools/tool-session.ts packages/context/src/ingest/ingest-bundle.runner.ts packages/context/src/ingest/local-bundle-runtime.ts packages/context/src/ingest/ingest-bundle.runner.test.ts
+git commit -m "feat(context): expose warehouse verification tools to ingest"
+```
+
+### Task 7: Update writer prompts and clean up stale references
+
+**Files:**
+- Create: `packages/context/skills/_shared/identifier-verification.md`
+- Modify: `packages/context/skills/notion_synthesize/SKILL.md`
+- Modify: `packages/context/skills/dbt_ingest/SKILL.md`
+- Modify: `packages/context/skills/lookml_ingest/SKILL.md`
+- Modify: `packages/context/skills/looker_ingest/SKILL.md`
+- Modify: `packages/context/skills/metabase_ingest/SKILL.md`
+- Modify: `packages/context/skills/metricflow_ingest/SKILL.md`
+- Modify: `packages/context/skills/live_database_ingest/SKILL.md`
+- Modify: `packages/context/skills/historic_sql_table_digest/SKILL.md`
+- Modify: `packages/context/skills/historic_sql_patterns/SKILL.md`
+- Modify: `packages/context/skills/knowledge_capture/SKILL.md`
+- Modify: `packages/context/skills/sl_capture/SKILL.md`
+- Modify: `packages/context/skills/sl/SKILL.md`
+- Modify: `packages/context/src/ingest/tools/emit-unmapped-fallback.tool.ts`
+- Modify: `packages/context/src/sl/tools/sl-warehouse-validation.ts`
+
+- [ ] **Step 1: Add the shared protocol file**
+
+Create `packages/context/skills/_shared/identifier-verification.md`:
+
+```md
+## Identifier Verification Protocol
+
+Before writing a wiki page or SL source on any topic:
+
+1. `discover_data({query: ""})` - see what wikis, SL sources, and raw
+ tables already exist. Prefer updating existing pages over creating new ones.
+
+Before emitting any `schema.table` or `schema.table.column` into a wiki body,
+SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
+
+2. `entity_details({connectionName, targets: [{display: ""}]})` -
+ confirm the identifier resolves; inspect native types, FK/PK, and
+ sampleValues.
+3. For literal values from the source, such as status codes or plan tiers,
+ check whether they appear in `entity_details` sampleValues for the relevant
+ column. If sampleValues is short or the sample may have missed real values,
+ run a `sql_execution` probe:
+ `SELECT DISTINCT <column> FROM <schema>.<table> LIMIT 50`.
+4. If the candidate identifier still does not resolve, do one of:
+ - Use `sql_execution` with `SELECT 1 FROM <schema>.<table> LIMIT 0`. If it errors, the
+ identifier is fictional.
+ - Wrap the identifier in `[unverified - from ]` in the wiki body,
+ citing the exact raw path that mentioned it.
+ - When recording `emit_unmapped_fallback` with `no_physical_table`, include
+ the failing probe error in `clarification`.
+5. Never copy `<schema>.<table>` placeholder strings from these instructions
+ into output.
+```
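
The two SQL probes in the protocol can be sketched as small string builders. This is an illustration under stated assumptions, not KTX code: the helper names are hypothetical, identifiers are assumed to be plain dotted names, and `LIMIT` is assumed to be valid in the target dialect.

```typescript
// Hypothetical helpers that build the two probe statements from the protocol.
// Assumption: identifiers are plain dotted names. Anything else is rejected
// instead of quoted, because quoting rules differ per warehouse dialect.
const IDENTIFIER_PART = /^[A-Za-z_][A-Za-z0-9_]*$/;

function assertDottedIdentifier(ref: string, minParts: number, maxParts: number): string {
  const parts = ref.split('.');
  const wellFormed =
    parts.length >= minParts &&
    parts.length <= maxParts &&
    parts.every((part) => IDENTIFIER_PART.test(part));
  if (!wellFormed) {
    throw new Error(`Refusing to probe malformed identifier: ${ref}`);
  }
  return parts.join('.');
}

// Step 4: zero-row probe that only checks whether the table resolves.
function buildExistenceProbe(tableRef: string): string {
  return `SELECT 1 FROM ${assertDottedIdentifier(tableRef, 2, 3)} LIMIT 0`;
}

// Step 3: distinct-value probe for checking literal values from the source.
function buildDistinctProbe(tableRef: string, column: string): string {
  const table = assertDottedIdentifier(tableRef, 2, 3);
  const col = assertDottedIdentifier(column, 1, 1);
  return `SELECT DISTINCT ${col} FROM ${table} LIMIT 50`;
}
```

`LIMIT 0` makes the existence probe return no rows, so the only signal it gives is whether the statement errors.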
+
+- [ ] **Step 2: Inline the protocol into writer skills**
+
+Add the same protocol block to these skills:
+
+```text
+packages/context/skills/notion_synthesize/SKILL.md
+packages/context/skills/dbt_ingest/SKILL.md
+packages/context/skills/lookml_ingest/SKILL.md
+packages/context/skills/looker_ingest/SKILL.md
+packages/context/skills/metabase_ingest/SKILL.md
+packages/context/skills/metricflow_ingest/SKILL.md
+packages/context/skills/live_database_ingest/SKILL.md
+packages/context/skills/historic_sql_patterns/SKILL.md
+packages/context/skills/knowledge_capture/SKILL.md
+packages/context/skills/sl_capture/SKILL.md
+```
+
+For `packages/context/skills/historic_sql_table_digest/SKILL.md`, add this shorter block:
+
+```md
+## Identifier Verification Protocol
+
+Only mention columns visible in the table's scan record. Use
+`entity_details({connectionName, targets: [{display: ""}]})` if
+the table or column attribution is uncertain. Do not infer join columns or
+filters from neighboring SQL unless the scan record confirms the column exists
+on the named table.
+```
+
+For `packages/context/skills/sl/SKILL.md`, add this cross-reference:
+
+```md
+For capture-time identifier verification, load `sl_capture`. Synthesis writer
+skills must verify warehouse identifiers with `discover_data`,
+`entity_details`, and `sql_execution` before emitting table or column names.
+```
+
+- [ ] **Step 3: Apply per-skill edits**
+
+Make these exact content changes:
+
+- In `notion_synthesize`, add `discover_data`, `entity_details`, and `sql_execution` to the `Allowed:` line. Replace `tableRef: "orbit_analytics.customer"` with `tableRef: "<schema>.<table>"`.
+- In `dbt_ingest`, replace `wiki_sl_search` with `discover_data` and `sl_describe_table` with `entity_details`.
+- In `lookml_ingest`, add: `Verify each sql_table_name from the LookML view with entity_details before mapping to an SL source.`
+- In `looker_ingest`, add: `For every Looker field reference, call entity_details on the underlying schema.table.column before promoting it to sl_refs or quoting it in wiki body.`
+- In `metabase_ingest`, add: `Before writing a wiki page derived from a Metabase question SQL, verify each schema.table.column mentioned with entity_details.`
+- In `metricflow_ingest`, add: `Verify each MetricFlow model source table with entity_details before producing the corresponding sl_write_source.`
+- In `live_database_ingest`, add: `Sample values come from the scan record; do not invent values not present in relationship-profile.json.`
+- In `historic_sql_patterns`, add: `Every join column mentioned in pattern descriptions must be verified via entity_details for both sides of the join.`
+- In `knowledge_capture`, update the workflow to call `discover_data` first when a page relates to data or SL concepts.
+- In `sl_capture`, add: `Before sl_write_source, call entity_details on the target table to confirm column names and types match the YAML being written.`
+
+- [ ] **Step 4: Remove stale code and prompt strings**
+
+Modify `packages/context/src/ingest/tools/emit-unmapped-fallback.tool.ts`:
+
+```ts
+.describe('The fully-qualified table or source reference that triggered the fallback (e.g. "<schema>.<table>"). Used to generate canonical detail text.'),
+```
+
+Modify `packages/context/src/sl/tools/sl-warehouse-validation.ts`:
+
+```ts
+ `that inherits the manifest schema. Call sl_read_source to inspect the existing source first.`,
+```
+
+- [ ] **Step 5: Commit**
+
+Run:
+
+```bash
+git add packages/context/skills packages/context/src/ingest/tools/emit-unmapped-fallback.tool.ts packages/context/src/sl/tools/sl-warehouse-validation.ts
+git commit -m "docs(context): add ingest identifier verification protocol"
+```
+
+### Task 8: Add prompt-bundling and banned-string tests
+
+**Files:**
+- Modify: `packages/context/src/memory/memory-runtime-assets.test.ts`
+- Modify: `packages/context/src/ingest/ingest-runtime-assets.test.ts`
+
+- [ ] **Step 1: Add failing asset tests**
+
+Add to `packages/context/src/memory/memory-runtime-assets.test.ts`:
+
+```ts
+const verificationWriterSkills = [
+ 'notion_synthesize',
+ 'dbt_ingest',
+ 'lookml_ingest',
+ 'looker_ingest',
+ 'metabase_ingest',
+ 'metricflow_ingest',
+ 'live_database_ingest',
+ 'historic_sql_table_digest',
+ 'historic_sql_patterns',
+ 'knowledge_capture',
+ 'sl_capture',
+] as const;
+
+it('ships identifier verification protocol in every synthesis writer skill', async () => {
+ for (const skillName of verificationWriterSkills) {
+ const body = await readFile(join(skillsDir, skillName, 'SKILL.md'), 'utf-8');
+ expect(body).toContain('## Identifier Verification Protocol');
+ expect(body).toMatch(/discover_data|entity_details/);
+ }
+});
+
+it('does not ship stale warehouse verification tool names or fictional identifiers', async () => {
+ for (const skillName of verificationWriterSkills) {
+ const body = await readFile(join(skillsDir, skillName, 'SKILL.md'), 'utf-8');
+ expect(body).not.toContain('orbit_analytics.customer');
+ expect(body).not.toContain('wiki_sl_search');
+ expect(body).not.toContain('sl_describe_table');
+ }
+});
+```
+
+Add to `packages/context/src/ingest/ingest-runtime-assets.test.ts`:
+
+```ts
+it('packages identifier verification prompt assets', async () => {
+ const shared = await readFile(join(skillsDir, '_shared', 'identifier-verification.md'), 'utf-8');
+ expect(shared).toContain('## Identifier Verification Protocol');
+ expect(shared).toContain('discover_data');
+ expect(shared).toContain('entity_details');
+ expect(shared).toContain('sql_execution');
+});
+```
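
The banned-string assertions above can also be expressed as one pure helper that reports every offender at once. A minimal sketch, assuming the same three banned strings; `findBannedStrings` is a hypothetical name and not part of the test files listed in this task.

```typescript
// Hypothetical pure helper mirroring the banned-string assertions:
// returns every banned substring that still appears in a skill body.
const BANNED_STRINGS: readonly string[] = [
  'orbit_analytics.customer',
  'wiki_sl_search',
  'sl_describe_table',
];

function findBannedStrings(body: string, banned: readonly string[] = BANNED_STRINGS): string[] {
  return banned.filter((needle) => body.includes(needle));
}

// Example: a body that still references one stale tool name.
const staleBody = 'Call wiki_sl_search first, then entity_details.';
const offenders = findBannedStrings(staleBody);
```

Asserting that the returned array is empty would surface all stale strings in a single failure message instead of stopping at the first one.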
+
+- [ ] **Step 2: Run the asset tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/memory/memory-runtime-assets.test.ts src/ingest/ingest-runtime-assets.test.ts
+```
+
+Expected: PASS, provided the Task 7 prompt changes are already applied; without them these tests fail.
+
+- [ ] **Step 3: Commit**
+
+Run:
+
+```bash
+git add packages/context/src/memory/memory-runtime-assets.test.ts packages/context/src/ingest/ingest-runtime-assets.test.ts
+git commit -m "test(context): guard ingest identifier verification prompts"
+```
+
+### Task 9: Run the full v1 verification set
+
+**Files:**
+- Verify all files changed by Tasks 1-8.
+
+- [ ] **Step 1: Run focused tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run \
+ src/connections/dialects.test.ts \
+ src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts \
+ src/ingest/tools/warehouse-verification/entity-details.tool.test.ts \
+ src/ingest/tools/warehouse-verification/sql-execution.tool.test.ts \
+ src/ingest/tools/warehouse-verification/discover-data.tool.test.ts \
+ src/ingest/ingest-bundle.runner.test.ts \
+ src/memory/memory-runtime-assets.test.ts \
+ src/ingest/ingest-runtime-assets.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 2: Run package type-check**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context run type-check
+```
+
+Expected: PASS.
+
+- [ ] **Step 3: Run package tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context run test
+```
+
+Expected: PASS.
+
+- [ ] **Step 4: Run pre-commit on changed files when configured**
+
+Run:
+
+```bash
+uv run pre-commit run --files \
+ packages/context/src/connections/dialects.ts \
+ packages/context/src/connections/dialects.test.ts \
+ packages/context/src/connections/index.ts \
+ packages/context/src/tools/tool-session.ts \
+ packages/context/src/ingest/ingest-bundle.runner.ts \
+ packages/context/src/ingest/local-bundle-runtime.ts \
+ packages/context/src/ingest/ingest-bundle.runner.test.ts \
+ packages/context/src/ingest/tools/emit-unmapped-fallback.tool.ts \
+ packages/context/src/sl/tools/sl-warehouse-validation.ts \
+ packages/context/src/memory/memory-runtime-assets.test.ts \
+ packages/context/src/ingest/ingest-runtime-assets.test.ts \
+ packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.ts \
+ packages/context/src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts \
+ packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts \
+ packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts \
+ packages/context/src/ingest/tools/warehouse-verification/sql-execution.tool.ts \
+ packages/context/src/ingest/tools/warehouse-verification/sql-execution.tool.test.ts \
+ packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.ts \
+ packages/context/src/ingest/tools/warehouse-verification/discover-data.tool.test.ts \
+ packages/context/src/ingest/tools/warehouse-verification/index.ts \
+ packages/context/skills/_shared/identifier-verification.md \
+ packages/context/skills/notion_synthesize/SKILL.md \
+ packages/context/skills/dbt_ingest/SKILL.md \
+ packages/context/skills/lookml_ingest/SKILL.md \
+ packages/context/skills/looker_ingest/SKILL.md \
+ packages/context/skills/metabase_ingest/SKILL.md \
+ packages/context/skills/metricflow_ingest/SKILL.md \
+ packages/context/skills/live_database_ingest/SKILL.md \
+ packages/context/skills/historic_sql_table_digest/SKILL.md \
+ packages/context/skills/historic_sql_patterns/SKILL.md \
+ packages/context/skills/knowledge_capture/SKILL.md \
+ packages/context/skills/sl_capture/SKILL.md \
+ packages/context/skills/sl/SKILL.md
+```
+
+Expected: PASS. If the repo has no pre-commit config or the local `uv` version cannot satisfy the project pin, record the exact error and rely on the focused tests plus type-check.
+
+- [ ] **Step 5: Commit final verification notes if any files changed during checks**
+
+Run:
+
+```bash
+git status --short
+```
+
+Expected: only intentional files are modified. Commit any formatter-driven edits with:
+
+```bash
+git add packages/context
+git commit -m "chore(context): verify warehouse verification tools"
+```
+
+## Self-review checklist
+
+- Spec coverage: the plan covers dialect dispatch, raw scan catalog reads, `entity_details`, `sql_execution`, `discover_data`, WU and reconcile availability, prompt updates, cleanups, and tests.
+- Placeholder scan: no task relies on unnamed future work.
+- Type consistency: tool inputs use `connectionName`; existing `sl_discover` calls receive `connectionId` internally; raw SQL execution uses `SlConnectionCatalogPort.executeQuery()` because `SemanticLayerService.executeQuery()` currently accepts semantic-layer query input, not raw SQL.
diff --git a/docs/superpowers/plans/2026-05-13-warehouse-verification-prompt-shape-closure.md b/docs/superpowers/plans/2026-05-13-warehouse-verification-prompt-shape-closure.md
new file mode 100644
index 00000000..05223b93
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-13-warehouse-verification-prompt-shape-closure.md
@@ -0,0 +1,345 @@
+# Warehouse Verification Prompt Shape Closure Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Make every warehouse-verification prompt use KTX's shipped
+`sql_execution` input shape so ingest agents include `connectionName` when they
+probe warehouse identifiers.
+
+**Architecture:** Keep the warehouse verification tool code unchanged. Add
+prompt-asset tests that reject Kaelio's old session-only SQL examples, then
+update the shared identifier protocol and the three remaining per-skill SQL
+probe examples that still show the legacy shape.
+
+**Tech Stack:** Markdown skill prompts, TypeScript, Vitest, pnpm workspace
+commands.
+
+---
+
+## Audit Summary
+
+The warehouse verification tools, runner wiring, adapter target fan-out, and
+focused tests are present. Focused verification passed:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/connections/dialects.test.ts src/connections/read-only-sql.test.ts src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts src/ingest/tools/warehouse-verification/entity-details.tool.test.ts src/ingest/tools/warehouse-verification/sql-execution.tool.test.ts src/ingest/tools/warehouse-verification/discover-data.tool.test.ts src/ingest/ingest-prompts.test.ts src/ingest/ingest-runtime-assets.test.ts src/memory/memory-runtime-assets.test.ts src/ingest/local-adapters.test.ts src/ingest/adapters/notion/notion.adapter.test.ts src/ingest/adapters/lookml/lookml.adapter.test.ts src/ingest/adapters/metricflow/metricflow.adapter.test.ts
+pnpm --filter @ktx/cli exec vitest run src/ingest-query-executor.test.ts src/ingest.test.ts -t "supplies a scan-connector query executor"
+```
+
+Remaining v1-blocking gap:
+
+- `packages/context/skills/lookml_ingest/SKILL.md`,
+ `packages/context/skills/metricflow_ingest/SKILL.md`, and
+ `packages/context/skills/sl_capture/SKILL.md` still contain
+ `sql_execution({ sql ... })` / "session shape" guidance inherited from
+ Kaelio. KTX's tool contract is
+ `sql_execution({connectionName, sql, rowLimit?})`, so these examples can make
+ agents call the shipped tool with invalid input.
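
To show why the legacy examples produce invalid input, here is a hand-rolled check of the `sql_execution({connectionName, sql, rowLimit?})` shape named above. It is a sketch only: the shipped tool's own schema is authoritative, `parseSqlExecutionInput` is a hypothetical name, and the positive-integer rule for `rowLimit` is an assumption.

```typescript
// Input shape as stated in this plan: sql_execution({connectionName, sql, rowLimit?}).
interface SqlExecutionInput {
  connectionName: string;
  sql: string;
  rowLimit?: number;
}

// Illustration only; the real tool validates with its own schema.
function parseSqlExecutionInput(raw: unknown): SqlExecutionInput {
  if (typeof raw !== 'object' || raw === null) {
    throw new Error('input must be an object');
  }
  const candidate = raw as Record<string, unknown>;
  const connectionName = candidate.connectionName;
  const sql = candidate.sql;
  if (typeof connectionName !== 'string' || connectionName.length === 0) {
    throw new Error('connectionName is required');
  }
  if (typeof sql !== 'string' || sql.length === 0) {
    throw new Error('sql is required');
  }
  const result: SqlExecutionInput = { connectionName, sql };
  const rowLimit = candidate.rowLimit;
  if (rowLimit !== undefined) {
    // Assumption: rowLimit, when present, must be a positive integer.
    if (typeof rowLimit !== 'number' || !Number.isInteger(rowLimit) || rowLimit <= 0) {
      throw new Error('rowLimit must be a positive integer when present');
    }
    result.rowLimit = rowLimit;
  }
  return result;
}
```

A session-only call such as `{ sql: "SELECT 1" }` fails the `connectionName` check under this sketch, which is the failure mode the legacy prompt examples would trigger.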
+
+Non-blocking gaps remain out of scope for this v1 plan:
+
+- Full DDL-style `entity_details` formatting with FK profile summaries.
+- AST-backed SQL validation for data-modifying CTE bodies.
+- Search over generated `enrichment/descriptions.json`.
+- Per-WorkUnit reuse of a single `WarehouseCatalogService` instance for cache
+ hits across separate tool calls.
+- A deterministic fake-LLM end-to-end Notion hallucination regression. Prompt
+ guards and tool contract tests cover the v1 contract; a broader behavior
+ regression can land as follow-up.
+
+## File Structure
+
+Modify these files:
+
+- `packages/context/src/memory/memory-runtime-assets.test.ts`: add a prompt
+ guard that rejects the legacy session-only `sql_execution` shape.
+- `packages/context/src/ingest/ingest-runtime-assets.test.ts`: strengthen the
+ shared prompt asset assertion for the KTX `connectionName` SQL shape.
+- `packages/context/skills/_shared/identifier-verification.md`: make both SQL
+ probe instructions show the KTX `connectionName` argument.
+- `packages/context/skills/notion_synthesize/SKILL.md`: inline the updated
+ protocol block.
+- `packages/context/skills/dbt_ingest/SKILL.md`: inline the updated protocol
+ block.
+- `packages/context/skills/lookml_ingest/SKILL.md`: inline the updated protocol
+ block and fix the legacy SQL fallback example.
+- `packages/context/skills/looker_ingest/SKILL.md`: inline the updated
+ protocol block.
+- `packages/context/skills/metabase_ingest/SKILL.md`: inline the updated
+ protocol block.
+- `packages/context/skills/metricflow_ingest/SKILL.md`: inline the updated
+ protocol block and fix the legacy SQL fallback example.
+- `packages/context/skills/live_database_ingest/SKILL.md`: inline the updated
+ protocol block.
+- `packages/context/skills/historic_sql_table_digest/SKILL.md`: inline the
+ updated protocol block.
+- `packages/context/skills/historic_sql_patterns/SKILL.md`: inline the updated
+ protocol block.
+- `packages/context/skills/knowledge_capture/SKILL.md`: inline the updated
+ protocol block.
+- `packages/context/skills/sl_capture/SKILL.md`: inline the updated protocol
+ block and fix the join-discovery SQL example.
+
+### Task 1: Add Prompt Guards For The KTX SQL Tool Shape
+
+**Files:**
+- Modify: `packages/context/src/memory/memory-runtime-assets.test.ts`
+- Modify: `packages/context/src/ingest/ingest-runtime-assets.test.ts`
+
+- [ ] **Step 1: Add the failing memory asset guard**
+
+In `packages/context/src/memory/memory-runtime-assets.test.ts`, add this test
+after `does not ship stale warehouse verification tool names or fictional
+identifiers`:
+
+```ts
+ it('ships only the KTX connectionName sql_execution call shape in writer guidance', async () => {
+ const shared = await readFile(join(skillsDir, '_shared', 'identifier-verification.md'), 'utf-8');
+
+ expect(shared).toContain('sql_execution({connectionName, sql: "SELECT DISTINCT');
+ expect(shared).toContain('sql_execution({connectionName, sql: "SELECT 1 FROM');
+
+ for (const skillName of verificationWriterSkills) {
+ const body = await readFile(join(skillsDir, skillName, 'SKILL.md'), 'utf-8');
+ expect(body).toContain('sql_execution({connectionName');
+ expect(body).not.toContain('sql_execution({ sql');
+ expect(body).not.toContain('session shape');
+ expect(body).not.toContain('connection is already pinned by the ingest session');
+ }
+ });
+```
+
+- [ ] **Step 2: Strengthen the shared ingest asset guard**
+
+In `packages/context/src/ingest/ingest-runtime-assets.test.ts`, update
+`packages identifier verification prompt assets` so the final assertions are:
+
+```ts
+ expect(shared).toContain('discover_data');
+ expect(shared).toContain('entity_details');
+ expect(shared).toContain('sql_execution');
+ expect(shared).toContain('sql_execution({connectionName, sql: "SELECT DISTINCT');
+ expect(shared).toContain('sql_execution({connectionName, sql: "SELECT 1 FROM');
+```
+
+- [ ] **Step 3: Run the failing prompt guards**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/memory/memory-runtime-assets.test.ts src/ingest/ingest-runtime-assets.test.ts
+```
+
+Expected: FAIL. The failure output must name at least one legacy string that is
+still present (`sql_execution({ sql` or `session shape`) or report that
+`sql_execution({connectionName` is missing.
+
+### Task 2: Update The Shared Identifier Verification Protocol
+
+**Files:**
+- Modify: `packages/context/skills/_shared/identifier-verification.md`
+- Modify: `packages/context/skills/notion_synthesize/SKILL.md`
+- Modify: `packages/context/skills/dbt_ingest/SKILL.md`
+- Modify: `packages/context/skills/lookml_ingest/SKILL.md`
+- Modify: `packages/context/skills/looker_ingest/SKILL.md`
+- Modify: `packages/context/skills/metabase_ingest/SKILL.md`
+- Modify: `packages/context/skills/metricflow_ingest/SKILL.md`
+- Modify: `packages/context/skills/live_database_ingest/SKILL.md`
+- Modify: `packages/context/skills/historic_sql_table_digest/SKILL.md`
+- Modify: `packages/context/skills/historic_sql_patterns/SKILL.md`
+- Modify: `packages/context/skills/knowledge_capture/SKILL.md`
+- Modify: `packages/context/skills/sl_capture/SKILL.md`
+
+- [ ] **Step 1: Replace the shared protocol text**
+
+Replace the full `## Identifier Verification Protocol` block in
+`packages/context/skills/_shared/identifier-verification.md` with:
+
+```md
+## Identifier Verification Protocol
+
+Before writing a wiki page or SL source on any topic:
+
+1. `discover_data({query: ""})` - see what wikis, SL sources, and raw
+ tables already exist. Prefer updating existing pages over creating new ones.
+
+Before emitting any `schema.table` or `schema.table.column` into a wiki body,
+SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
+
+2. `entity_details({connectionName, targets: [{display: ""}]})` -
+ confirm the identifier resolves; inspect native types, FK/PK, and
+ sampleValues.
+3. For literal values from the source, such as status codes or plan tiers,
+ check whether they appear in `entity_details` sampleValues for the relevant
+ column. If sampleValues is short or the sample may have missed real values,
+ run a `sql_execution` probe with the same warehouse connection name:
+ `sql_execution({connectionName, sql: "SELECT DISTINCT <column> FROM <schema>.<table> LIMIT 50"})`.
+4. If the candidate identifier still does not resolve, do one of:
+ - Use `sql_execution({connectionName, sql: "SELECT 1 FROM <schema>.<table> LIMIT 0"})`.
+ If it errors, the identifier is fictional.
+ - Wrap the identifier in `[unverified - from ]` in the wiki body,
+ citing the exact raw path that mentioned it.
+ - When recording `emit_unmapped_fallback` with `no_physical_table`, include
+ the failing probe error in `clarification`.
+5. Never copy `<schema>.<table>` placeholder strings from these instructions
+ into output.
+```
+
+- [ ] **Step 2: Inline the same protocol in every writer skill**
+
+Replace the existing `## Identifier Verification Protocol` block in each writer
+skill with the exact block from Step 1:
+
+```text
+packages/context/skills/notion_synthesize/SKILL.md
+packages/context/skills/dbt_ingest/SKILL.md
+packages/context/skills/lookml_ingest/SKILL.md
+packages/context/skills/looker_ingest/SKILL.md
+packages/context/skills/metabase_ingest/SKILL.md
+packages/context/skills/metricflow_ingest/SKILL.md
+packages/context/skills/live_database_ingest/SKILL.md
+packages/context/skills/historic_sql_table_digest/SKILL.md
+packages/context/skills/historic_sql_patterns/SKILL.md
+packages/context/skills/knowledge_capture/SKILL.md
+packages/context/skills/sl_capture/SKILL.md
+```
+
+- [ ] **Step 3: Run the shared prompt asset tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/memory/memory-runtime-assets.test.ts src/ingest/ingest-runtime-assets.test.ts
+```
+
+Expected: still FAIL because the per-skill legacy SQL examples in LookML,
+MetricFlow, and `sl_capture` have not been fixed yet.
+
+### Task 3: Fix Legacy Per-Skill SQL Examples
+
+**Files:**
+- Modify: `packages/context/skills/lookml_ingest/SKILL.md`
+- Modify: `packages/context/skills/metricflow_ingest/SKILL.md`
+- Modify: `packages/context/skills/sl_capture/SKILL.md`
+
+- [ ] **Step 1: Fix the LookML fallback probe example**
+
+In `packages/context/skills/lookml_ingest/SKILL.md`, replace the current
+Required flow item 2 with:
+
+```md
+2. If the table isn't in the manifest, use the warehouse `connectionName`
+ returned by `discover_data` or the target connection chosen from
+ `sl_discover`, then call a dialect-appropriate SQL probe with that
+ connection name, for example:
+ `sql_execution({connectionName: "warehouse", sql: "SELECT 1 FROM analytics.orders LIMIT 0"})`.
+ Replace `warehouse`, `analytics`, and `orders` with the verified connection,
+ schema or dataset, and table from the WorkUnit evidence.
+```
+
+- [ ] **Step 2: Fix the MetricFlow fallback probe example**
+
+In `packages/context/skills/metricflow_ingest/SKILL.md`, replace the paragraph
+that begins `If \`sl_discover\` errors` with:
+
+```md
+If `sl_discover` errors because no such table exists, use `discover_data` and
+`entity_details` to find the warehouse target. If a SQL probe is still needed,
+call `sql_execution` with the same warehouse connection name, for example:
+`sql_execution({connectionName: "warehouse", sql: "SELECT 1 FROM analytics.orders LIMIT 0"})`.
+**Never invent column names** - every column in `columns:`, `grain:`, and
+`sql:` must be sourced from raw files, `entity_details`, or a successful SQL
+probe.
+```
+
+- [ ] **Step 3: Fix the `sl_capture` join probe example**
+
+In `packages/context/skills/sl_capture/SKILL.md`, replace Tool sequence item 6
+with:
+
+```md
+6. For join discovery: use `sql_execution({connectionName: "warehouse", sql: "SELECT count(*) FROM public.orders o JOIN public.customers c ON c.id = o.customer_id LIMIT 20"})` with the target warehouse connection name and dialect-correct table names to verify the join key exists in both tables and assess cardinality before declaring the join.
+```
+
+- [ ] **Step 4: Run the prompt asset tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/memory/memory-runtime-assets.test.ts src/ingest/ingest-runtime-assets.test.ts
+```
+
+Expected: PASS. The tests must report 2 files passed.
+
+### Task 4: Final Verification
+
+**Files:**
+- No new files.
+
+- [ ] **Step 1: Run focused warehouse prompt and tool tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/connections/dialects.test.ts src/connections/read-only-sql.test.ts src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts src/ingest/tools/warehouse-verification/entity-details.tool.test.ts src/ingest/tools/warehouse-verification/sql-execution.tool.test.ts src/ingest/tools/warehouse-verification/discover-data.tool.test.ts src/ingest/ingest-prompts.test.ts src/ingest/ingest-runtime-assets.test.ts src/memory/memory-runtime-assets.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 2: Run package type-check**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context run type-check
+```
+
+Expected: PASS.
+
+- [ ] **Step 3: Inspect final diff**
+
+Run:
+
+```bash
+git diff -- packages/context/src/memory/memory-runtime-assets.test.ts packages/context/src/ingest/ingest-runtime-assets.test.ts packages/context/skills/_shared/identifier-verification.md packages/context/skills/notion_synthesize/SKILL.md packages/context/skills/dbt_ingest/SKILL.md packages/context/skills/lookml_ingest/SKILL.md packages/context/skills/looker_ingest/SKILL.md packages/context/skills/metabase_ingest/SKILL.md packages/context/skills/metricflow_ingest/SKILL.md packages/context/skills/live_database_ingest/SKILL.md packages/context/skills/historic_sql_table_digest/SKILL.md packages/context/skills/historic_sql_patterns/SKILL.md packages/context/skills/knowledge_capture/SKILL.md packages/context/skills/sl_capture/SKILL.md
+```
+
+Expected: only prompt wording and prompt-asset guards changed. No tool
+implementation files changed.
+
+- [ ] **Step 4: Commit**
+
+Run:
+
+```bash
+git add packages/context/src/memory/memory-runtime-assets.test.ts packages/context/src/ingest/ingest-runtime-assets.test.ts packages/context/skills/_shared/identifier-verification.md packages/context/skills/notion_synthesize/SKILL.md packages/context/skills/dbt_ingest/SKILL.md packages/context/skills/lookml_ingest/SKILL.md packages/context/skills/looker_ingest/SKILL.md packages/context/skills/metabase_ingest/SKILL.md packages/context/skills/metricflow_ingest/SKILL.md packages/context/skills/live_database_ingest/SKILL.md packages/context/skills/historic_sql_table_digest/SKILL.md packages/context/skills/historic_sql_patterns/SKILL.md packages/context/skills/knowledge_capture/SKILL.md packages/context/skills/sl_capture/SKILL.md
+git commit -m "fix(context): align warehouse sql probe prompt shape"
+```
+
+Expected: one focused commit.
+
+## Self-Review
+
+Spec coverage:
+
+- The original spec requires `sql_execution` inputs to include
+ `connectionName`; this plan removes contradictory session-only examples from
+ all active writer guidance.
+- The shared protocol remains in `_shared` and inlined in every synthesis
+ writer skill named by the original spec.
+- The tool implementation remains unchanged because the shipped schema already
+ enforces the v1 contract.
+
+Placeholder scan:
+
+- The plan has no deferred implementation markers.
+- Prompt examples use concrete `warehouse`, `analytics`, and `orders` example
+ names only to demonstrate JSON shape, and each example tells the worker to
+ replace them with discovered evidence.
+
+Type consistency:
+
+- Tests assert the exact KTX tool call shape:
+ `sql_execution({connectionName, sql: ...})`.
+- Prompt wording consistently uses `connectionName`, matching
+ `packages/context/src/ingest/tools/warehouse-verification/sql-execution.tool.ts`.
diff --git a/docs/superpowers/plans/2026-05-13-warehouse-verification-sql-example-closure.md b/docs/superpowers/plans/2026-05-13-warehouse-verification-sql-example-closure.md
new file mode 100644
index 00000000..2d1b1779
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-13-warehouse-verification-sql-example-closure.md
@@ -0,0 +1,215 @@
+# Warehouse Verification SQL Example Closure Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Remove the last connectionless `sql_execution` prompt example so
+warehouse-verification writer guidance always matches KTX's shipped tool
+contract.
+
+**Architecture:** Keep the warehouse verification tool code unchanged. Tighten
+the prompt asset guard so multiline `sql_execution({ sql: ... })` examples
+fail tests, then update the stale `sl_capture` worked example to pass
+`connectionName` explicitly.
+
+**Tech Stack:** Markdown skill prompts, TypeScript, Vitest, pnpm workspace
+commands.
+
+---
+
+## Audit summary
+
+The warehouse verification tools, runner wiring, source-adapter target fan-out,
+CLI query executor, and focused tests are present. Focused verification passed:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/connections/dialects.test.ts src/connections/read-only-sql.test.ts src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts src/ingest/tools/warehouse-verification/entity-details.tool.test.ts src/ingest/tools/warehouse-verification/sql-execution.tool.test.ts src/ingest/tools/warehouse-verification/discover-data.tool.test.ts src/ingest/ingest-prompts.test.ts src/ingest/ingest-runtime-assets.test.ts src/memory/memory-runtime-assets.test.ts src/ingest/local-adapters.test.ts src/ingest/adapters/notion/notion.adapter.test.ts src/ingest/adapters/lookml/lookml.adapter.test.ts src/ingest/adapters/metricflow/metricflow.adapter.test.ts
+pnpm --filter @ktx/cli exec vitest run src/ingest-query-executor.test.ts src/ingest.test.ts -t "supplies a scan-connector query executor"
+```
+
+Remaining v1-blocking gap:
+
+- `packages/context/skills/sl_capture/SKILL.md` still contains a worked example
+ with a multiline `sql_execution({ sql: ... })` call. KTX's tool contract is
+ `sql_execution({connectionName, sql, rowLimit?})`, so this example can teach
+ agents to call the shipped tool with invalid input.
+
+Non-blocking gaps remain out of scope for this v1 plan:
+
+- Full DDL-style `entity_details` formatting with FK and profile summaries.
+- AST-backed SQL validation for data-modifying CTE bodies.
+- Search over generated `enrichment/descriptions.json`.
+- Per-WorkUnit reuse of a single `WarehouseCatalogService` instance for cache
+ hits across separate tool calls.
+- A deterministic fake-LLM end-to-end Notion hallucination regression.
+- Tokenized or embedding-backed raw schema search ranking in `discover_data`.
+
+## File structure
+
+Modify these files:
+
+- `packages/context/src/memory/memory-runtime-assets.test.ts`: add a prompt
+ guard that catches multiline `sql_execution` calls without `connectionName`.
+- `packages/context/skills/sl_capture/SKILL.md`: update the stale worked
+ example to include the target warehouse `connectionName`.
+
+### Task 1: Add a multiline SQL prompt guard
+
+**Files:**
+- Modify: `packages/context/src/memory/memory-runtime-assets.test.ts`
+
+- [ ] **Step 1: Add a helper that extracts `sql_execution` call examples**
+
+In `packages/context/src/memory/memory-runtime-assets.test.ts`, add this helper
+after `forbiddenProductPattern()`:
+
+```ts
+function sqlExecutionCallBlocks(body: string): string[] {
+  const blocks: string[] = [];
+  const marker = 'sql_execution({';
+  let offset = 0;
+
+  while (offset < body.length) {
+    const start = body.indexOf(marker, offset);
+    if (start === -1) {
+      break;
+    }
+    const end = body.indexOf('})', start + marker.length);
+    blocks.push(body.slice(start, end === -1 ? start + marker.length : end + 2));
+    offset = start + marker.length;
+  }
+
+  return blocks;
+}
+```
+
+- [ ] **Step 2: Strengthen the existing SQL-shape test**
+
+Replace the body of
+`ships only the KTX connectionName sql_execution call shape in writer guidance`
+with:
+
+```ts
+ const shared = await readFile(join(skillsDir, '_shared', 'identifier-verification.md'), 'utf-8');
+ const bodies = [{ name: '_shared/identifier-verification.md', body: shared }];
+
+ expect(shared).toContain('sql_execution({connectionName, sql: "SELECT DISTINCT');
+ expect(shared).toContain('sql_execution({connectionName, sql: "SELECT 1 FROM');
+
+ for (const skillName of verificationWriterSkills) {
+ const body = await readFile(join(skillsDir, skillName, 'SKILL.md'), 'utf-8');
+ bodies.push({ name: `${skillName}/SKILL.md`, body });
+ expect(body).toContain('sql_execution({connectionName');
+ expect(body).not.toContain('sql_execution({ sql');
+ expect(body).not.toContain('session shape');
+ expect(body).not.toContain('connection is already pinned by the ingest session');
+ }
+
+ for (const { name, body } of bodies) {
+ const calls = sqlExecutionCallBlocks(body);
+ expect(calls.length, `${name} should contain sql_execution guidance`).toBeGreaterThan(0);
+ expect(
+ calls.filter((call) => !call.includes('connectionName')),
+ `${name} has sql_execution calls without connectionName`,
+ ).toEqual([]);
+ expect(body, `${name} has a connectionless multiline sql_execution call`).not.toMatch(
+ /sql_execution\(\{\s*sql\s*:/,
+ );
+ }
+```
+
+- [ ] **Step 3: Run the failing prompt guard**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/memory/memory-runtime-assets.test.ts -t "connectionName sql_execution"
+```
+
+Expected: FAIL. The failure must identify
+`sl_capture/SKILL.md` as having a `sql_execution` call without
+`connectionName` or a connectionless multiline `sql_execution` call.
+
+- [ ] **Step 4: Commit the failing guard**
+
+Run:
+
+```bash
+git add packages/context/src/memory/memory-runtime-assets.test.ts
+git commit -m "test(context): catch connectionless sql execution prompt examples"
+```
+
+### Task 2: Fix the stale `sl_capture` SQL example
+
+**Files:**
+- Modify: `packages/context/skills/sl_capture/SKILL.md`
+- Test: `packages/context/src/memory/memory-runtime-assets.test.ts`
+- Test: `packages/context/src/ingest/ingest-runtime-assets.test.ts`
+
+- [ ] **Step 1: Update the worked example**
+
+In `packages/context/skills/sl_capture/SKILL.md`, replace the `sql_execution`
+block in "Worked example - new join" with:
+
+```md
+sql_execution({
+ connectionName: "warehouse",
+ sql: "SELECT COUNT(*), COUNT(DISTINCT a.admin_user_id) FROM public.fct_orders a JOIN public.fct_mau_multiprotocol b ON a.admin_user_id = b.admin_user_id LIMIT 1"
+})
+```
+
+- [ ] **Step 2: Run the prompt guards**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/memory/memory-runtime-assets.test.ts src/ingest/ingest-runtime-assets.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 3: Run a direct stale-shape scan**
+
+Run:
+
+```bash
+rg -n -U "sql_execution\\(\\{\\s*\\n\\s*sql:" packages/context/skills packages/context/prompts
+```
+
+Expected: no matches and exit code 1.
+
+- [ ] **Step 4: Run the context type-check**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context run type-check
+```
+
+Expected: PASS.
+
+- [ ] **Step 5: Commit the prompt fix**
+
+Run:
+
+```bash
+git add packages/context/skills/sl_capture/SKILL.md
+git commit -m "fix(context): include connection name in sl capture sql example"
+```
+
+## Self-review
+
+Spec coverage:
+
+- The only remaining v1-blocking prompt-shape gap has a failing test and a
+ direct prompt edit.
+- Tool implementation, runner wiring, adapter scoping, and CLI execution
+ remain covered by the focused suites listed in the audit summary.
+
+Placeholder scan:
+
+- This plan contains no deferred implementation placeholders.
+
+Type consistency:
+
+- The plan uses the shipped KTX tool shape:
+ `sql_execution({connectionName, sql, rowLimit?})`.
diff --git a/docs/superpowers/plans/2026-05-13-warehouse-verification-structured-target-miss-closure.md b/docs/superpowers/plans/2026-05-13-warehouse-verification-structured-target-miss-closure.md
new file mode 100644
index 00000000..48983c4a
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-13-warehouse-verification-structured-target-miss-closure.md
@@ -0,0 +1,236 @@
+# Warehouse Verification Structured Target Miss Closure Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Make `entity_details` return model-visible not-found evidence for every documented target shape, including structured `{catalog, db, name, column?}` targets.
+
+**Architecture:** Keep the existing warehouse verification module. Add focused tests for missing structured table and column targets, then route structured target labels through the same candidate lookup used by display targets while preserving exact structured resolution.
+
+**Tech Stack:** TypeScript, Node 22, Vitest, AI SDK v6 tools, Zod, KTX ingest tools.
+
+---
+
+## Audit Summary
+
+The implemented plans have landed the warehouse verification tools, ingest
+runner wiring, adapter warehouse target fan-out, CLI read-only query executor,
+and prompt-shape closures. Focused verification passed on May 13, 2026:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/connections/dialects.test.ts src/connections/read-only-sql.test.ts src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts src/ingest/tools/warehouse-verification/entity-details.tool.test.ts src/ingest/tools/warehouse-verification/sql-execution.tool.test.ts src/ingest/tools/warehouse-verification/discover-data.tool.test.ts src/ingest/ingest-prompts.test.ts src/ingest/ingest-runtime-assets.test.ts src/memory/memory-runtime-assets.test.ts src/ingest/local-adapters.test.ts src/ingest/adapters/notion/notion.adapter.test.ts src/ingest/adapters/lookml/lookml.adapter.test.ts src/ingest/adapters/metricflow/metricflow.adapter.test.ts
+pnpm --filter @ktx/cli exec vitest run src/ingest-query-executor.test.ts src/ingest.test.ts -t "supplies a scan-connector query executor"
+rg -n -U "sql_execution\\(\\{\\s*\\n\\s*sql:" packages/context/skills packages/context/prompts
+rg -n "wiki_sl_search|sl_describe_table|orbit_analytics\\.customer" packages/context/skills packages/context/prompts packages/context/src/ingest/tools/emit-unmapped-fallback.tool.ts packages/context/src/sl/tools/sl-warehouse-validation.ts
+```
+
+Remaining v1-blocking gap:
+
+- `entity_details` accepts structured targets, but if a structured table target
+ does not exist, it records `structured.missing` and emits no markdown. Tool
+ outputs are sent to the model as markdown only, so the synthesis agent gets
+ an empty response instead of the required "Not found in scan" verification
+ signal.
+
+Non-blocking gaps remain out of scope for this v1 plan:
+
+- Full DDL-style `entity_details` formatting with FK and profile summaries.
+- AST-backed SQL validation for data-modifying CTE bodies.
+- Dialect-specific row-limit wrapping for SQL Server probes.
+- Search over generated `enrichment/descriptions.json`.
+- Per-WorkUnit reuse of a single `WarehouseCatalogService` instance for cache
+ hits across separate tool calls.
+- A deterministic fake-LLM end-to-end Notion hallucination regression.
+- Cleanup of legacy demo Orbit wiki fixtures that still mention
+ `orbit_analytics.customer`.
+
+## File Structure
+
+Modify these files:
+
+- `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts`: add failing coverage for missing structured targets.
+- `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts`: render missing structured targets into markdown and reuse candidate lookup.
+
+### Task 1: Report Structured Target Misses In `entity_details`
+
+**Files:**
+- Modify: `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts`
+- Modify: `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts`
+
+- [ ] **Step 1: Add failing structured miss tests**
+
+In `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts`, add these tests after `reports missing explicit columns instead of returning an empty column list`:
+
+```ts
+ it('reports missing structured table targets in model-visible markdown', async () => {
+ const result = await tool.call(
+ {
+ connectionName: 'warehouse',
+ targets: [{ catalog: null, db: 'public', name: 'orderz' }],
+ },
+ context,
+ );
+
+ expect(result.markdown).toContain('Not found in scan: public.orderz');
+ expect(result.markdown).toContain('Closest matches: orders');
+ expect(result.structured.resolved).toHaveLength(0);
+ expect(result.structured.missing).toHaveLength(1);
+ });
+
+ it('reports missing structured column targets in model-visible markdown', async () => {
+ const result = await tool.call(
+ {
+ connectionName: 'warehouse',
+ targets: [{ catalog: null, db: 'public', name: 'orders', column: 'plan_tier' }],
+ },
+ context,
+ );
+
+ expect(result.markdown).toContain('Column not found in scan: public.orders.plan_tier');
+ expect(result.markdown).toContain('Available columns: id, status');
+ expect(result.structured.resolved).toHaveLength(0);
+ expect(result.structured.missing).toHaveLength(1);
+ });
+```
+
+- [ ] **Step 2: Run the failing focused test**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/entity-details.tool.test.ts -t "structured"
+```
+
+Expected: FAIL. The first new test must fail because `result.markdown` does not contain `Not found in scan: public.orderz`.
+
+- [ ] **Step 3: Add structured target labels and candidate lookup**
+
+In `packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts`, add this type alias after `type EntityDetailsInput = z.infer;`:
+
+```ts
+type EntityDetailsTarget = EntityDetailsInput['targets'][number];
+```
+
+Add these helpers after `function allowedConnectionNames(context: ToolContext): ReadonlySet | null { ... }`:
+
+```ts
+function targetLabel(target: EntityDetailsTarget): string {
+  if ('display' in target) {
+    return target.display;
+  }
+  return [target.catalog, target.db, target.name, target.column].filter((part): part is string => !!part).join('.');
+}
+
+function appendMissingTargetMarkdown(parts: string[], target: EntityDetailsTarget, candidates: KtxTableRef[]): void {
+  parts.push(`Not found in scan: ${targetLabel(target)}`);
+  if (candidates.length > 0) {
+    parts.push(`Closest matches: ${candidates.map((candidate) => candidate.name).join(', ')}`);
+  }
+}
+
+async function resolveTarget(
+  catalog: WarehouseCatalogService,
+  connectionName: string,
+  target: EntityDetailsTarget,
+): Promise<{ resolved: (KtxTableRef & { column?: string }) | null; candidates: KtxTableRef[] }> {
+  if ('display' in target) {
+    return catalog.resolveDisplayTarget(connectionName, target.display);
+  }
+
+  const candidateResolution = await catalog.resolveDisplayTarget(connectionName, targetLabel(target));
+  return {
+    resolved: {
+      catalog: target.catalog,
+      db: target.db,
+      name: target.name,
+      column: target.column,
+    },
+    candidates: candidateResolution.candidates,
+  };
+}
+```
+
+Then replace the `const resolution = ...` block inside the `for (const target of input.targets)` loop with:
+
+```ts
+ const resolution = await resolveTarget(catalog, input.connectionName, target);
+```
+
+Replace the missing-resolution block with:
+
+```ts
+ if (!resolution.resolved) {
+ missing.push({ target, candidates: resolution.candidates });
+ appendMissingTargetMarkdown(parts, target, resolution.candidates);
+ continue;
+ }
+```
+
+Replace the missing-detail block with:
+
+```ts
+ if (!detail) {
+ missing.push({ target, candidates: resolution.candidates });
+ appendMissingTargetMarkdown(parts, target, resolution.candidates);
+ continue;
+ }
+```
+
+- [ ] **Step 4: Run the focused entity-details tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/entity-details.tool.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 5: Run warehouse verification regression tests**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context exec vitest run src/ingest/tools/warehouse-verification/warehouse-catalog.service.test.ts src/ingest/tools/warehouse-verification/entity-details.tool.test.ts src/ingest/tools/warehouse-verification/discover-data.tool.test.ts
+```
+
+Expected: PASS.
+
+- [ ] **Step 6: Run context type-check**
+
+Run:
+
+```bash
+pnpm --filter @ktx/context run type-check
+```
+
+Expected: PASS.
+
+- [ ] **Step 7: Commit**
+
+Run:
+
+```bash
+git add \
+ packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.ts \
+ packages/context/src/ingest/tools/warehouse-verification/entity-details.tool.test.ts
+git commit -m "fix(context): report structured entity detail misses"
+```
+
+## Self-review
+
+Spec coverage:
+
+- The original `entity_details` contract says structured and display targets
+ may be mixed in one call and unresolved targets must produce `Not found in scan` with
+ candidates. This plan adds that model-visible behavior for structured table
+ misses and preserves the existing column-miss behavior.
+
+Placeholder scan:
+
+- This plan contains no deferred implementation placeholders.
+
+Type consistency:
+
+- The plan uses the existing `WarehouseCatalogService`, `KtxTableRef`,
+ `EntityDetailsStructured`, and `ToolOutput` types without adding public API
+ compatibility wrappers.
diff --git a/docs/superpowers/specs/2026-05-12-notion-ingestion-warehouse-verification-design.md b/docs/superpowers/specs/2026-05-12-notion-ingestion-warehouse-verification-design.md
new file mode 100644
index 00000000..074f00e5
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-12-notion-ingestion-warehouse-verification-design.md
@@ -0,0 +1,331 @@
+# Warehouse Verification Tools for Ingestion Synthesis
+
+**Date:** 2026-05-12
+**Author:** Andrey Avtomonov
+**Status:** Design — pending implementation plan
+
+## Background and motivation
+
+KTX's ingest pipeline synthesises wiki pages and semantic-layer (SL) sources from third-party content (Notion, LookML, Looker, Metabase, dbt, MetricFlow, historic SQL, live-database scans, and chat). The synthesis stage is an LLM call that runs once per WorkUnit, governed by a skill prompt (e.g. `notion_synthesize`) and a set of allowed tools.
+
+A real-world inspection (project `/tmp/ktx-proj-1`) surfaced two failure modes the synthesis stage produces:
+
+1. **Fictional identifiers laundered into wiki output.** A Notion page mentioned `orbit_analytics.customer` as a legacy "customer source" table with a `plan_tier in {free, pro, enterprise}` column. Neither the table, the column, nor those values exist in the configured warehouse. The synthesis LLM faithfully copied them into `knowledge/global/orbit/customers-source.md` as a "Conflict Note", giving the fabricated names full wiki frontmatter, a `Source:` citation, and apparent authority.
+2. **Column attribution drift.** The same wiki page documents columns under `orbit_raw.accounts` but states the `paying_account_count` measure filters on `normalized_plan_code` and `contract_status`. Those columns live on `orbit_analytics.mart_account_segments`, not on `accounts`. A reader (or a downstream agent) following the page will write `accounts.normalized_plan_code` and get a `column does not exist` error.
+
+Root cause analysis (`packages/context/skills/notion_synthesize/SKILL.md`, `packages/context/src/ingest/tools/emit-unmapped-fallback.tool.ts`, `packages/context/src/wiki/tools/wiki-write.tool.ts`) showed three contributing factors:
+
+- The synthesis LLM has no verification primitive that distinguishes a real warehouse identifier from a fabricated one. `sl_discover` only finds objects already promoted into the semantic layer; raw warehouse scans (which already exist on disk under `raw-sources//live-database//`) are not surfaced to the LLM at all.
+- `wiki_write` performs no body-text validation — anything the LLM emits is written.
+- The skill prompt itself uses `orbit_analytics.customer` as a canonical example string (`SKILL.md:70`), reinforcing the same fictional name the LLM ends up emitting.
+
+Kaelio's server-side ingest WU agent (`/Users/andrey/conductor/workspaces/kaelio-main2/douala/server/src/tools/toolset-factory.service.ts`) had four verification tools that KTX dropped during the open-source extraction: `discover_data`, `entity_details`, `dictionary_search`, and `sql_execution`. The underlying connector infrastructure (`KtxScanConnector`, dialect classes, `assertReadOnlySql`, `SemanticLayerService.executeQuery`) is present in KTX, so the gap is at the tool layer, not the platform layer.
+
+## Goal
+
+Give every ingest adapter's synthesis-time LLM call the tools and skill-prompt instructions needed to verify warehouse identifiers (`schema.table`, `schema.table.column`) and sample values before emitting them into wiki pages, SL sources, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback` records.
+
+## Non-goals
+
+- Not changing `wiki_write` itself. A complementary spec covers hard write-time validation; this spec focuses on giving the LLM the tools to self-validate.
+- Not modifying any Notion fetch/chunk/cluster behaviour.
+- Not changing the `_schema/*.yaml` format.
+- Not introducing a UUID layer for tables or columns; KTX keeps `(connection, catalog, db, name)` as the canonical table identity.
+- Not adding `semantic_query` to the synthesis toolset. `semantic_query` is a future tool for the research/chat-time agent; synthesis creates SL sources rather than querying them, so it is the wrong shape for this stage.
+- Not adding `dictionary_search`. `entity_details` already returns per-column `sampleValues` from the relationship-profile, and `sql_execution` covers the rarer "where does this literal live?" case more accurately than a sampled-JSON full-text scan.
+
+## What already exists in KTX
+
+The dialect/driver/connection architecture is fully ported from Kaelio. The new tools sit on top of three already-shipping primitives:
+
+| Primitive | Location |
+|---|---|
+| `KtxTableRef = { catalog: string\|null, db: string\|null, name: string }` | `packages/context/src/scan/types.ts:168` |
+| `SemanticLayerService.executeQuery(connectionId, sql)` | `packages/context/src/sl/semantic-layer.service.ts:1004`, used today by `sl_validate` |
+| `assertReadOnlySql` / `limitSqlForExecution` | `packages/context/src/connections/read-only-sql.ts` |
+| 7 connectors with parallel layout (postgres, mysql, sqlserver, snowflake, bigquery, clickhouse, sqlite), each exporting a dialect class | `packages/connector-*` |
+| Raw scan artefacts: `tables/...json` and `enrichment/relationship-profile.json` (with `nativeType`, `nullable`, `primaryKey`, `foreignKeys`, `rowCount`, `nullCount`, `distinctCount`, `sampleValues`, descriptions) | `raw-sources//live-database//` |
+| `wiki_search`, `sl_discover`, `sl_read_source`, `sl_validate`, `emit_unmapped_fallback` | already wired into synthesis stages |
+
+The only meaningfully new code is `WarehouseCatalogService`, a small `getDialectForDriver` dispatch, the three tool files, and the wiring in `ingest-bundle.runner.ts`.
+
+## Architecture
+
+### Module layout
+
+```
+packages/context/src/ingest/tools/warehouse-verification/
+ discover-data.tool.ts
+ entity-details.tool.ts
+ sql-execution.tool.ts
+ warehouse-catalog.service.ts
+ index.ts # exports createWarehouseVerificationTools()
+packages/context/src/connections/
+ dialects.ts # adds getDialectForDriver()
+packages/context/skills/_shared/
+ identifier-verification.md # the protocol snippet referenced from every synthesis skill
+```
+
+### Canonical table identity
+
+Every tool that names a warehouse object uses the tuple `(connectionName, catalog, db, name[, column])`. `connectionName` is the slug from `ktx.yaml` (e.g., `"warehouse"`), validated against `^[a-zA-Z0-9][a-zA-Z0-9_-]*$`. There is no UUID layer.
+
+`display` strings the LLM picks up from source pages (e.g., `"orbit_raw.accounts"` for Postgres or `"project.dataset.table"` for BigQuery) are parsed by `WarehouseCatalogService.resolveDisplay`, which knows the connection's driver via `getDialectForDriver`. Ambiguous parses (e.g., a 2-part display on BigQuery) return a candidates list instead of guessing.
+
+Dialect mapping:
+
+| Driver | catalog | db | name | Display |
+|---|---|---|---|---|
+| postgres | `null` | schema | table | `schema.table` |
+| mysql | `null` | schema | table | `schema.table` |
+| sqlserver | catalog | schema | table | `catalog.schema.table` |
+| snowflake | database | schema | table | `db.schema.table` |
+| bigquery | project | dataset | table | `project.dataset.table` |
+| clickhouse | `null` | database | table | `database.table` |
+| sqlite | `null` | `null` | table | `table` |
+
+### `WarehouseCatalogService`
+
+Stateless except for a per-WorkUnit cache. Reads raw scan files under `raw-sources//live-database//`.
+
+```ts
+class WarehouseCatalogService {
+ getTable(ref: { connectionName: string } & KtxTableRef): Promise;
+ listTables(connectionName: string): Promise;
+ resolveDisplay(connectionName: string, display: string): Promise<{
+ resolved: KtxTableRef | null;
+ candidates: KtxTableRef[]; // ranked by edit distance when resolved is null
+ dialect: string;
+ }>;
+ searchByName(connectionName: string, query: string, limit: number): Promise>;
+ getLatestSyncId(connectionName: string): Promise;
+}
+```
+
+`getTable` merges the raw schema file (native types, PK, FK, nullable) with the enrichment profile (row counts, null rates, distinct counts, sample values, AI-generated descriptions). When no scan exists for the connection, every read returns `null`; tools surface this as a distinct "no scan available" state rather than as "identifier not found", so the LLM doesn't conclude a real table is fictional just because a scan hasn't run yet.
+
+### `getDialectForDriver`
+
+```ts
+// packages/context/src/connections/dialects.ts
+export type SupportedDriver = 'postgres'|'postgresql'|'mysql'|'sqlserver'|'snowflake'|'bigquery'|'clickhouse'|'sqlite'|'sqlite3';
+export function getDialectForDriver(driver: SupportedDriver): KtxDialect;
+```
+
+Dispatch is synchronous. The connectors' existing dialect classes already expose the same shape — `formatTableName(KtxTableRef)`, `quoteIdentifier(string)`, `mapToDimensionType(nativeType)`. The implementation plan introduces a minimal `KtxDialect` interface that these classes already satisfy structurally; no connector-internal changes are required. Tools use it only for display-string parsing and error-message formatting; tools never construct executable SQL.
+
+## Tool contracts
+
+### `entity_details`
+
+```ts
+input = {
+ connectionName: string,
+ targets: Array< // 1..50, mixed shapes allowed
+ | { display: string } // "orbit_raw.accounts" or "orbit_raw.accounts.account_id"
+ | { catalog: string|null, db: string, name: string, column?: string }
+ >,
+}
+```
+
+Output (markdown, per target):
+
+```
+### orbit_raw.accounts
+Type: table | Native columns: 11 | PK: account_id | FKs: parent_account_id → orbit_raw.accounts.account_id
+Description: One row per customer account…
+
+Columns:
+- account_id (text, nullable=false, PK) — sample: ["acct_001","acct_002",…]
+- parent_account_id (text, nullable=true, FK → orbit_raw.accounts.account_id)
+- account_name (text, nullable=false)
+- …
+
+Profile: rowCount=4321 distinctCount(account_id)=4321 nullRate(parent_account_id)=0.62
+```
+
+When `column` is provided in a target, output is scoped to that one column. When a target doesn't resolve, output is `Not found in scan. Closest matches: …` with up to 5 candidates from `searchByName`. When the connection has no `live-database` scan, output is `No live-database scan available for connection ""; run \`ktx scan\` first.` — distinct from the "not found" state.
+
+Structured output: `{ resolved: TableDetail[], missing: Array<{target, candidates}>, scanAvailable: boolean }`.
+
+Refuses `connectionName` values not in the WU-stage's `allowedConnectionNames` set.
+
+### `sql_execution`
+
+```ts
+input = {
+ connectionName: string,
+ sql: string, // single SELECT or WITH only
+ rowLimit?: number, // default 100, hard cap 1000
+}
+```
+
+Pipeline:
+
+1. `assertReadOnlySql(sql)` — regex rejects anything starting with `insert|update|delete|merge|alter|drop|create|truncate|grant|revoke|copy|call|do|vacuum|analyze|refresh`.
+2. `limitSqlForExecution(sql, rowLimit)` — wraps as `select * from () as ktx_query_result limit N`.
+3. `SemanticLayerService.executeQuery(connectionName, wrappedSql)`.
+4. Format as markdown table; first ~20 rows inline; if truncated, append `… +N more rows`.
+
+Structured output: `{ headers, rows, rowCount, truncated, sql, wrappedSql }`.
+
+Connector errors surface verbatim (e.g., Postgres `relation "orbit_analytics.customer" does not exist`). That error message is the most valuable verification signal — it tells the LLM the identifier is fictional.
+
+Refuses `connectionName` not in `allowedConnectionNames`. Each connector's driver-level read-only enforcement (Postgres read-only transaction, BigQuery query-only jobs) is a second defence under the regex gate.
+
+### `discover_data`
+
+```ts
+input = {
+ query: string,
+ connectionName?: string, // omit to search all configured warehouse connections
+ limit?: number, // default 10 per section
+ sourceName?: string, // SL source detail mode (delegates to sl_discover)
+}
+```
+
+Composes three searches and groups output into three sections, omitting empty sections:
+
+1. **Wiki Pages** — `wiki_search({query, limit})`. Routing hint: *use `wiki_read(blockKey)` for full content*.
+2. **Semantic Layer Sources** — `sl_discover({query, connectionName})`. Routing hint: *use `sl_read_source(sourceName)` for the YAML, or `entity_details` for warehouse-shape details*.
+3. **Raw Warehouse Schema** — `WarehouseCatalogService.searchByName(connectionName, query, limit)`. Routing hint: *use `entity_details({connectionName, targets: [{display}]})` for full DDL + sample values*.
+
+When `sourceName` is set, delegates entirely to `sl_discover` inspect mode and skips other sections. When all three sections are empty, output is `No matches for "" across wiki, semantic layer, or raw warehouse schema. Try broader terms; this concept may not exist yet.`
+
+Structured output: `{ wiki: WikiSearchStructured|null, sl: SlDiscoverStructured|null, raw: RawSchemaHits|null }`.
+
+## Wiring
+
+`packages/context/src/ingest/ingest-bundle.runner.ts` already plumbs `emit_unmapped_fallback` into both the WorkUnit stage (`createEmitUnmappedFallbackTool` around line 726) and the reconcile stage (around line 962), with merging done via `packages/context/src/ingest/stages/build-wu-context.ts` and `build-reconcile-context.ts`.
+
+Add a parallel factory next to those existing calls:
+
+```ts
+const warehouseTools = createWarehouseVerificationTools({
+ semanticLayerService: scopedSemanticLayerService,
+ warehouseCatalog: new WarehouseCatalogService({ fileStore, projectDir }),
+ dialects: getDialectForDriver,
+ allowedConnectionNames: slConnectionIds, // reuse existing scoping
+ sqlExecutionRowLimit: 100,
+});
+// Merge `entity_details`, `sql_execution`, `discover_data` into both stage tool maps
+// alongside emit_unmapped_fallback.
+```
+
+`createWarehouseVerificationTools` returns `Record` with three keys. The set is wired into every adapter's synthesis stage — no per-adapter opt-in.
+
+## Skill-prompt updates
+
+### Shared protocol
+
+`packages/context/skills/_shared/identifier-verification.md`:
+
+```md
+## Identifier Verification Protocol
+
+Before writing a wiki page or SL source on any topic:
+1. `discover_data({query: ""})` — see what wikis, SL sources, and raw tables
+ already exist. Prefer updating existing pages over creating new ones.
+
+Before emitting any `schema.table` or `schema.table.column` into a wiki body,
+SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
+2. `entity_details({connectionName, targets: [{display: ""}]})` —
+ confirm the identifier resolves; inspect native types, FK/PK, and sampleValues.
+3. For literal values from the source (status codes, plan tiers): check whether
+ they appear in `entity_details`' `sampleValues` for the relevant column.
+ If `sampleValues` is short or you suspect the sample missed real values, run
+ a `sql_execution` probe: `SELECT DISTINCT FROM LIMIT 50`.
+4. If the candidate identifier still doesn't resolve, do one of:
+ (a) Use `sql_execution` with `SELECT 1 FROM LIMIT 0`. If it errors,
+ the identifier is fictional.
+ (b) Wrap the identifier in `[unverified — from ]` in the wiki body,
+ citing the exact raw path that mentioned it.
+ (c) When recording `emit_unmapped_fallback` with `no_physical_table`,
+ include the failing probe error in `clarification`.
+5. Never copy `<schema>.<table>` placeholder strings from these instructions
+ into output.
+```
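
The two probe queries named in steps 3 and 4(a) of the protocol can be sketched as small string builders. The function names and the identifier pattern below are assumptions for illustration only. The sketch rejects anything that is not a plain unquoted identifier rather than attempting dialect-specific quoting:

```typescript
// Assumption: only plain unquoted identifiers are accepted; quoting rules vary by dialect.
const PLAIN_IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

function checkIdentifier(part: string): string {
  if (!PLAIN_IDENTIFIER.test(part)) {
    throw new Error(`Not a plain identifier: ${part}`);
  }
  return part;
}

// Step 4(a): returns no rows, but errors if the table does not exist.
function existenceProbe(schema: string, table: string): string {
  return `SELECT 1 FROM ${checkIdentifier(schema)}.${checkIdentifier(table)} LIMIT 0`;
}

// Step 3: lists distinct literal values when sampleValues looks incomplete.
function distinctValuesProbe(
  schema: string,
  table: string,
  column: string,
  limit: number = 50,
): string {
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new Error(`Limit must be a positive integer: ${limit}`);
  }
  return (
    `SELECT DISTINCT ${checkIdentifier(column)} ` +
    `FROM ${checkIdentifier(schema)}.${checkIdentifier(table)} LIMIT ${limit}`
  );
}
```

For example, `existenceProbe("analytics", "orders")` yields `SELECT 1 FROM analytics.orders LIMIT 0`, where `analytics.orders` is an invented table name.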
+
+Each affected skill inlines this block verbatim (skill files are independent prompts; KTX has no cross-skill include mechanism today).
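
Because the block is copied rather than included, the copies can drift. One way to guard against that is a check along the following lines. This is a hedged sketch, not an existing KTX script: `findDriftedSkills` and its inputs are hypothetical, and file loading is left out so the logic stays self-contained. Skills that intentionally carry a shortened protocol would need to be left out of the input.

```typescript
// Normalize line endings and trailing whitespace so cosmetic differences do not count as drift.
function normalizeText(text: string): string {
  return text
    .replace(/\r\n/g, "\n")
    .split("\n")
    .map((line) => line.replace(/\s+$/, ""))
    .join("\n")
    .trim();
}

// True when the skill text contains the shared protocol block verbatim.
function inlinesProtocol(skillText: string, protocolText: string): boolean {
  return normalizeText(skillText).includes(normalizeText(protocolText));
}

// Returns the names of skills whose text no longer contains the protocol, sorted for stable output.
function findDriftedSkills(
  skills: Record<string, string>,
  protocolText: string,
): string[] {
  return Object.entries(skills)
    .filter(([, text]) => !inlinesProtocol(text, protocolText))
    .map(([name]) => name)
    .sort();
}
```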
+
+### Per-skill diffs
+
+Two skills are deliberately excluded from updates: `ingest_triage` (read-only triage; produces no wiki or SL output) and `sl` (umbrella reference doc; cross-links to the protocol but doesn't need its own copy).
+
+| Skill | Changes |
+|---|---|
+| `notion_synthesize` | Inline protocol; append `discover_data`, `entity_details`, `sql_execution` to `Allowed:` (line 74); replace `orbit_analytics.customer` example on line 70 with `<schema>.<table>` |
+| `dbt_ingest` | Inline protocol; line 24: replace `wiki_sl_search` → `discover_data` and `sl_describe_table` → `entity_details`; strengthen the "not permission to invent physical columns" paragraph by naming `entity_details` as the verification call |
+| `lookml_ingest` | Inline protocol; add: "Verify each `sql_table_name` from the LookML view with `entity_details` before mapping to an SL source" |
+| `looker_ingest` | Inline protocol; add: "For every Looker field reference, call `entity_details` on the underlying `(schema, table, column)` before promoting to `sl_refs` or quoting in wiki body" |
+| `metabase_ingest` | Inline protocol; add: "Before writing a wiki page derived from a Metabase question's SQL, verify each `schema.table.column` mentioned with `entity_details`" |
+| `metricflow_ingest` | Inline protocol; add: "Verify each MetricFlow model's source table with `entity_details` before producing the corresponding `sl_write_source`" |
+| `live_database_ingest` | Inline protocol; add: "Sample values come from the scan record; do not invent values not present in `relationship-profile.json`" |
+| `historic_sql_table_digest` | Shortened protocol focused on column attribution: "Only mention columns visible in the table's scan record. Use `entity_details({display})` if uncertain" |
+| `historic_sql_patterns` | Inline protocol; add: "Every join column mentioned in pattern descriptions must be verified via `entity_details` for both sides of the join" |
+| `knowledge_capture` | Inline protocol; update line 44: "First call `discover_data` to find existing wiki pages, SL sources, and raw tables on the topic" |
+| `sl_capture` | Inline protocol; add: "Before `sl_write_source`, call `entity_details` on the target table to confirm column names and types match the YAML being written" |
+
+### Cleanups beyond the four-tool addition
+
+- `notion_synthesize/SKILL.md:70` — remove `orbit_analytics.customer` (placeholder).
+- `packages/context/src/ingest/tools/emit-unmapped-fallback.tool.ts:67` — same example string in the Zod `.describe()` — replace with `.