` to see the error, fix the connection, then retry.
```
@@ -332,6 +338,16 @@ separate `ktx` binary on `PATH`. If the CLI path changes, rerun
## What setup writes
**ktx** writes plain files so people and agents can review changes in git.
+**ktx** initializes a git repository at the project directory and writes context
+changes there. If the project directory is nested inside another repository,
+**ktx** still keeps its own repo and does not commit to the parent repo.
+
+Because **ktx** owns that repository, it will not adopt one it did not create. If
+you point setup at a directory that is already a git repository's root - such as
+an existing application checkout - **ktx** stops and asks you to pick a dedicated
+directory instead. In the setup wizard choose the **New subfolder** option (for
+example `ktx-project`), or pass a fresh `--project-dir` when running setup
+non-interactively.
| Path | Purpose |
|------|---------|
diff --git a/docs-site/content/docs/guides/building-context.mdx b/docs-site/content/docs/guides/building-context.mdx
index 9bcf2659..24550c85 100644
--- a/docs-site/content/docs/guides/building-context.mdx
+++ b/docs-site/content/docs/guides/building-context.mdx
@@ -43,7 +43,7 @@ Local-auth backends keep provider credentials out of `ktx.yaml`:
```bash
ktx setup --llm-backend claude-code --no-input
-ktx setup --llm-backend codex --llm-model gpt-5.5 --no-input
+ktx setup --llm-backend codex --no-input
```
With `claude-code`, **ktx** agent loops can invoke only the **ktx** MCP tools
diff --git a/docs-site/content/docs/guides/llm-configuration.mdx b/docs-site/content/docs/guides/llm-configuration.mdx
index 71ab9d80..776cb275 100644
--- a/docs-site/content/docs/guides/llm-configuration.mdx
+++ b/docs-site/content/docs/guides/llm-configuration.mdx
@@ -30,19 +30,19 @@ llm:
default: sonnet
triage: haiku
candidateExtraction: sonnet
- curator: sonnet
- reconcile: sonnet
- repair: sonnet
+ curator: opus
+ reconcile: opus
+ repair: haiku
```
-During setup, choose the backend interactively or pass the model in automation:
+During setup, choose the backend interactively or pass it in automation:
```bash
-ktx setup --llm-backend claude-code --llm-model opus --no-input
+ktx setup --llm-backend claude-code --no-input
```
-For Claude Code, `sonnet`, `opus`, and `haiku` map to **ktx** defaults. Full Claude
-model IDs are also accepted.
+Setup writes `sonnet`, `haiku`, and `opus` aliases into `llm.models`. You can
+edit any role to another alias or a full Claude model ID after setup.
`claude-code` exposes only **ktx** MCP tools for the current agent loop. SDK init
metadata may still list host slash commands, skills, and subagents; **ktx** does not
@@ -59,12 +59,17 @@ llm:
backend: codex
models:
default: gpt-5.5
+ triage: gpt-5.5
+ candidateExtraction: gpt-5.5
+ curator: gpt-5.5
+ reconcile: gpt-5.5
+ repair: gpt-5.5
```
Configure it non-interactively:
```bash
-ktx setup --llm-backend codex --llm-model gpt-5.5 --no-input
+ktx setup --llm-backend codex --no-input
```
This is separate from Codex agent-client setup. `ktx setup --agents --target
diff --git a/docs-site/content/docs/guides/reviewing-context.mdx b/docs-site/content/docs/guides/reviewing-context.mdx
index 63d4fceb..26f191b6 100644
--- a/docs-site/content/docs/guides/reviewing-context.mdx
+++ b/docs-site/content/docs/guides/reviewing-context.mdx
@@ -61,11 +61,14 @@ committing the file.
## A typical review session
-The loop above describes the shape. In practice, one review session looks like
-this:
+The loop above describes the shape. Run these commands from the **ktx** project
+directory. **ktx** keeps that directory as its own git repository, even when the
+directory lives inside another repository, so reviewing context changes never
+requires committing to a parent application repo.
```bash
# 1. Run ingest on a branch
+cd /path/to/ktx-project
git checkout -b ingest/2026-05-21
ktx ingest --all
diff --git a/docs-site/content/docs/guides/writing-context.mdx b/docs-site/content/docs/guides/writing-context.mdx
index e4ac70c0..8703030e 100644
--- a/docs-site/content/docs/guides/writing-context.mdx
+++ b/docs-site/content/docs/guides/writing-context.mdx
@@ -44,12 +44,17 @@ Use this order for most context changes:
Semantic sources are YAML files for queryable tables or custom SQL. They define
agent-facing measures, dimensions, segments, joins, and grain.
-Semantic source files live at:
+Semantic source files live under:
```text
-semantic-layer/
/.yaml
+semantic-layer//
```
+The file's `name:` field is the source's identity — it carries the warehouse
+identifier verbatim, including case. The filename is a derived label: simple
+lowercase names get `.yaml`, anything else gets a slugged
+filename. Renaming a file does not rename the source.
+
### Minimal source
```yaml
@@ -152,7 +157,7 @@ joins:
| Field | Required | Description |
|-------|----------|-------------|
-| `name` | Yes | Source identifier. Use lowercase words and underscores. |
+| `name` | Yes | Source identity (not the filename). When overlaying an ingested table, match the manifest identifier verbatim, including case (e.g. `SIGNED_UP`); for a new standalone source, lowercase words and underscores are recommended. |
| `descriptions` | No | Description map keyed by source, such as `user`, `dbt`, or `ai`. |
| `table` or `sql` | Yes | Database table or custom SQL expression. Use exactly one. |
| `grain` | Yes | Columns that uniquely identify a row at the source grain. |
diff --git a/docs-site/content/docs/integrations/agent-clients.mdx b/docs-site/content/docs/integrations/agent-clients.mdx
index 46a1ec8b..1ef75d22 100644
--- a/docs-site/content/docs/integrations/agent-clients.mdx
+++ b/docs-site/content/docs/integrations/agent-clients.mdx
@@ -68,19 +68,30 @@ If you choose an install mode, it then asks which targets to install:
└
```
-When every selected target supports both project and global setup, the command
-also asks where to install supported agent config:
+When at least one selected target supports project-scoped setup, the command
+asks where to install agent config:
```txt
-◆ Where should ktx install supported agent config?
+◆ Where should ktx install agent config?
│
│ ktx project: /path/to/your/ktx-project
│
-│ ○ Project scope (ktx project directory)
+│ ○ ktx project directory /path/to/your/ktx-project
+│ ○ Current directory /path/to/where/you/ran/ktx
+│ ○ Custom directory… (enter a path)
│ ○ Global scope (user config)
└
```
+The first three choices write project-scoped files (`.claude/`, `.mcp.json`,
+`.cursor/`, skills, and rules) into the chosen directory while still pointing
+them at this **ktx** project. Use **Current directory** or **Custom directory…**
+when you open your coding agent from somewhere other than the **ktx** project
+directory. **Current directory** is hidden when it is already the **ktx** project
+directory, and **Global scope** appears only when every selected target
+supports global setup. Non-interactive runs pass `--install-dir ` (for
+example `--install-dir .`) for the same result.
+
## Generated files
**ktx** writes MCP client configuration and analytics guidance by default. It writes
diff --git a/docs-site/content/docs/integrations/context-sources.mdx b/docs-site/content/docs/integrations/context-sources.mdx
index 213f5a3d..f7a52685 100644
--- a/docs-site/content/docs/integrations/context-sources.mdx
+++ b/docs-site/content/docs/integrations/context-sources.mdx
@@ -38,15 +38,16 @@ LookML uses top-level `repoUrl`, and MetricFlow uses nested
## dbt
-Ingests schema definitions, model descriptions, column metadata, and test coverage from a dbt project.
+Ingests schema definitions, model descriptions, column metadata, and column test definitions from a dbt project.
### What it provides
- Model and source definitions from `schema.yml` files
-- Column descriptions and types
-- Test coverage signals
-- Semantic model references (if using dbt semantic layer)
-- Data lineage between models
+- Column names, descriptions, and data types
+- Column tests, mapped to semantic facts — `not_null` / `unique` become column constraints, `accepted_values` becomes enum value lists, and `relationships` becomes join / foreign-key edges
+- Model and source tags, and source freshness settings
+
+MetricFlow `semantic_models:` and `metrics:` are ingested through the separate [MetricFlow](#metricflow) source, not the dbt driver.
### Connection config
@@ -87,9 +88,9 @@ connections:
### What gets ingested
-- YAML semantic sources generated from dbt schema files
-- One work unit per semantic source (for projects with >25 YAML files) or all at once for smaller projects
-- Column descriptions, tests, and relationships are preserved
+- **Semantic-layer overlays** (`semantic-layer/*.yaml`): descriptions, constraints, enum values, and joins from the dbt YAML are written onto the semantic source for the matching warehouse table. Overlays land on the warehouse connection that owns the table, which is usually a different connection than the dbt source itself.
+- **Wiki pages** (`wiki/`): for definitions or relationships that don't map to a confirmed physical table.
+- **Work units** for parallel processing: one per schema file under `models/` when the project has more than 25 YAML files, otherwise a single combined unit.
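
Review note: the test-to-fact mapping and the 25-file work-unit rule described in the dbt bullets above can be sketched roughly as follows. This is an illustration only, not the ktx implementation: `mapColumnTest`, `planWorkUnits`, `WORK_UNIT_THRESHOLD`, and the shape of the returned objects are hypothetical names, and the sketch assumes the classic dbt YAML shape where a test is either a bare string or a single-key object.

```javascript
// Hypothetical sketch; none of these names come from the ktx codebase.
const WORK_UNIT_THRESHOLD = 25; // the docs say "more than 25 YAML files"

// Map one dbt column test to the semantic fact the docs say it becomes.
function mapColumnTest(column, test) {
  // A dbt test is either "not_null" or { accepted_values: { values: [...] } }.
  const name = typeof test === "string" ? test : Object.keys(test)[0];
  const args = typeof test === "string" ? {} : test[name] ?? {};
  switch (name) {
    case "not_null":
    case "unique":
      return { kind: "constraint", column, constraint: name };
    case "accepted_values":
      return { kind: "enum", column, values: args.values ?? [] };
    case "relationships":
      return { kind: "join", column, to: args.to, field: args.field };
    default:
      return null; // other tests carry no semantic fact in this sketch
  }
}

// Plan work units: one per schema file under models/ for large projects,
// otherwise a single combined unit.
function planWorkUnits(yamlFiles) {
  if (yamlFiles.length > WORK_UNIT_THRESHOLD) {
    return yamlFiles
      .filter((file) => file.startsWith("models/"))
      .map((file) => ({ files: [file] }));
  }
  return [{ files: [...yamlFiles] }];
}
```

Filtering on the `models/` prefix is one reading of "one per schema file under `models/`"; the real planner may treat files outside that directory differently.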
---
@@ -101,7 +102,7 @@ Ingests MetricFlow semantic models and metric definitions. Useful when your team
- Semantic model definitions (entities, dimensions, measures)
- Cross-model metric definitions
-- Dimension and entity relationships between models
+- Entity relationships between models, inferred from matching foreign and primary entities
### Connection config
@@ -133,7 +134,7 @@ For a local path:
### What gets ingested
-- Semantic models with their entities, dimensions, and measures
+- Semantic models with their entities, dimensions, measures, and the join edges inferred from entity relationships
- Metric definitions with their expressions and filters
- Work units organized by connected component (metrics + related semantic models grouped together)
@@ -178,10 +179,10 @@ For a local path:
### What gets ingested
-- View and model definitions organized by connected component
-- LookML field types mapped to semantic layer column types
-- Join definitions and relationship cardinalities
-- SQL table references for warehouse mapping validation
+- One work unit per model, plus a unit for orphan views and one per dashboard
+- Semantic-layer sources per view — overlays for thin `sql_table_name` wrappers, standalone sources for `derived_table` views
+- Measures, joins (with their Looker `relationship:`), and field types mapped to column types (`yesno` → boolean, date/timestamp → time)
+- Wiki pages for relationships and descriptions, with warehouse identifiers verified before writing
### Warehouse mapping
@@ -192,19 +193,19 @@ Optionally validate that LookML references match your expected Looker connection
expectedLookerConnectionName: postgres_connection
```
-This validates that LookML model `connection:` declarations match expectations, flagging mismatches during ingestion.
+This compares each model's `connection:` declaration against the expected name. Mismatched models are flagged, and semantic-layer writes are disabled for them during that ingest while wiki extraction still proceeds.
---
## Metabase
-Ingests dashboards, questions, and their underlying SQL queries from a Metabase instance. Maps Metabase databases to your **ktx** warehouse connections.
+Ingests collections, questions, models, and metrics — with their underlying SQL — from a Metabase instance. Maps Metabase databases to your **ktx** warehouse connections.
### What it provides
-- Dashboard metadata and organization
-- Question/query definitions (native SQL and structured queries)
-- Table and column usage patterns from queries
+- Collections and their hierarchy, used to organize ingested context
+- Questions, models, and metrics — resolved SQL for both native and structured (MBQL) queries
+- Each card's output schema: column types and primary/foreign-key hints
- Database-to-warehouse relationship mapping
### Connection config
@@ -233,9 +234,9 @@ Generate an API key in Metabase: **Admin > Settings > Authentication > API Keys*
### What gets ingested
-- Semantic sources generated from SQL queries in questions
-- Wiki pages for dashboards (purpose, key metrics, relationships)
-- Work units per dashboard and per question
+- Semantic-layer sources generated from each card's resolved SQL and column metadata, written to the mapped warehouse connection
+- Fallback wiki notes only when a referenced table can't be mapped or an identifier can't be verified
+- One work unit per Metabase collection; re-syncs reprocess only collections with changed cards
### Warehouse mapping
@@ -289,10 +290,10 @@ Generate API credentials in Looker: **Admin > Users > Edit > API Keys**.
### What gets ingested
-- Semantic sources from explore field definitions
-- Wiki pages for dashboards (purpose, audience, key metrics)
-- Triage signals for automated content classification
-- Work units per explore and per dashboard
+- Semantic-layer sources from explore fields, written to the mapped warehouse connection (mapped explores only)
+- Wiki pages capturing reusable metric, segment, and domain knowledge from dashboards and Looks
+- Usage and recency signals that drive a triage gate, focusing processing on high-value content
+- Work units per explore, per dashboard, and per Look
### Warehouse mapping
@@ -314,10 +315,10 @@ Ingests pages and databases from a Notion workspace as wiki pages. Useful for ca
### What it provides
-- Wiki pages synthesized from Notion content
-- Page hierarchy and relationships
-- Database schemas (when Notion databases describe primary sources)
-- Semantic clustering for organized ingestion
+- Notion pages crawled from selected roots or all accessible content
+- Page bodies and blocks normalized to Markdown
+- Page hierarchy and cross-page links (child pages, mentions, relations)
+- Notion databases and their data-source rows as individual pages
### Connection config
@@ -356,6 +357,7 @@ Create an integration at [notion.so/my-integrations](https://www.notion.so/my-in
| `crawl_mode` | `all_accessible` or `selected_roots` | - |
| `root_page_ids` | Page IDs to crawl from (for `selected_roots`) | `[]` |
| `root_database_ids` | Database IDs to include | `[]` |
+| `root_data_source_ids` | Data-source IDs to include (for `selected_roots`) | `[]` |
| `max_pages_per_run` | Pages processed per sync | `1000` |
| `max_knowledge_creates_per_run` | New pages created per sync | `25` |
| `max_knowledge_updates_per_run` | Pages updated per sync | `20` |
@@ -363,13 +365,13 @@ Create an integration at [notion.so/my-integrations](https://www.notion.so/my-in
### What gets ingested
- Wiki pages synthesized from Notion content (not raw copies)
-- Domain context extracted and organized by topic
-- Triage signals for classifying page relevance
-- Work units clustered by semantic similarity for efficient processing
+- Semantic-layer sources when a page defines a reusable dataset or metric mapped to a confirmed non-Notion target; otherwise the fact stays wiki-only
+- Page-relevance triage that skips transient content (task lists, status updates, date-titled snapshots)
+- Work units clustered by embedding similarity for efficient synthesis
### Notes
-- Notion is knowledge-only - it does not produce semantic layer sources
+- Notion is wiki-first: it writes durable wiki pages by default and only emits semantic-layer sources for content mapped to a confirmed non-Notion target; unmapped facts stay wiki-only
- Rate limits apply; large workspaces may require multiple ingestion runs
- Incremental sync cursors are stored in `.ktx/db.sqlite`; don't add
`last_successful_cursor` to `ktx.yaml`
diff --git a/docs-site/content/docs/meta.json b/docs-site/content/docs/meta.json
index 7be8bc90..c872cf4b 100644
--- a/docs-site/content/docs/meta.json
+++ b/docs-site/content/docs/meta.json
@@ -8,7 +8,6 @@
"integrations",
"configuration",
"cli-reference",
- "ai-resources",
"community"
]
}
diff --git a/docs-site/lib/llm-docs.ts b/docs-site/lib/llm-docs.ts
index fd6c8dd1..1f5766e1 100644
--- a/docs-site/lib/llm-docs.ts
+++ b/docs-site/lib/llm-docs.ts
@@ -54,9 +54,7 @@ ktx provides semantic-layer files, warehouse scans, wiki pages, provenance, and
- Installable setup skill: run \`npx skills add Kaelio/ktx --skill ktx\` from
the project you want to configure.
-${link("/docs/ai-resources/agent-quickstart", "Agent Quickstart", "Task-first route for coding assistants using ktx")}
-${link("/docs/ai-resources/markdown-access", "Markdown Access", "Fetch ktx docs as llms.txt, llms-full.txt, or per-page Markdown")}
-${link("/docs/ai-resources/agent-instructions", "Agent Instructions", "Suggested instructions for coding assistants that need to read and cite ktx docs")}
+${link("/docs/community/ai-resources", "AI Resources", "How coding agents read, cite, and act on the ktx docs")}
## Start Here
@@ -67,7 +65,7 @@ ${link("/docs/guides/writing-context", "Writing Context", "Write semantic source
## Machine-Readable Documentation
- [Full documentation](${absoluteUrl("/llms-full.txt")}): All docs pages in one plain-text markdown response
-- [Markdown access guide](${absoluteUrl("/docs/ai-resources/markdown-access.md")}): How to fetch llms.txt, llms-full.txt, and per-page Markdown
+- [AI Resources guide](${absoluteUrl("/docs/community/ai-resources.md")}): How agents fetch llms.txt, llms-full.txt, and per-page Markdown
- [Quickstart markdown](${absoluteUrl("/docs/getting-started/quickstart.md")}): Human setup walkthrough
- [Semantic-layer CLI markdown](${absoluteUrl("/docs/cli-reference/ktx-sl.md")}): Semantic-layer commands and JSON output
- [Wiki CLI markdown](${absoluteUrl("/docs/cli-reference/ktx-wiki.md")}): Wiki page commands and JSON output
@@ -147,8 +145,8 @@ function absoluteUrl(path: string) {
function formatCategoryName(category: string) {
const labels: Record = {
- "ai-resources": "AI Resources",
"cli-reference": "CLI Reference",
+ community: "Community & Resources",
};
if (labels[category]) {
diff --git a/docs-site/next.config.mjs b/docs-site/next.config.mjs
index 380dba85..f800bcf5 100644
--- a/docs-site/next.config.mjs
+++ b/docs-site/next.config.mjs
@@ -30,7 +30,36 @@ const config = {
};
},
async redirects() {
+ // Alias-host canonicalization MUST come before the generic root/docs
+ // redirects below. Those generic rules have no host guard, so if they ran
+ // first they would inject a "/ktx" basePath into the path on the alias
+ // hosts, which the alias catch-alls would then prepend a second time —
+ // producing https://docs.kaelio.com/ktx/ktx/docs/... Redirects also run
+ // before beforeFiles rewrites, so the ktx.sh catch-all must exclude
+ // /stars* to let the stars dashboard rewrite proxy through.
return [
+ {
+ source: "/slack",
+ has: [{ type: "host", value: "ktx.sh" }],
+ destination:
+ "https://join.slack.com/t/ktxcommunity/shared_invite/zt-3y9b44m1x-LVyNNJD5nwaZHq4XS29LMQ",
+ permanent: false,
+ basePath: false,
+ },
+ {
+ source: "/:path*",
+ has: [{ type: "host", value: "docs.ktx.sh" }],
+ destination: "https://docs.kaelio.com/ktx/:path*",
+ permanent: true,
+ basePath: false,
+ },
+ {
+ source: "/:path((?!stars(?:/|$)).*)",
+ has: [{ type: "host", value: "ktx.sh" }],
+ destination: "https://docs.kaelio.com/ktx/:path",
+ permanent: true,
+ basePath: false,
+ },
{
source: "/",
destination: "/ktx/docs/getting-started/introduction",
@@ -44,26 +73,30 @@ const config = {
basePath: false,
},
{
- source: "/:path*",
- has: [{ type: "host", value: "docs.ktx.sh" }],
- destination: "https://docs.kaelio.com/ktx/:path*",
+ // AI Resources collapsed from four pages to one and now lives under the
+ // Community & Resources section. Redirect the old top-level URL and the
+ // retired per-page slugs to the new home. Redirects run before the .md
+ // rewrite, so the Markdown variants must be matched first and keep their
+ // .md suffix; otherwise a cached Markdown URL would 308 to the HTML page
+ // and break the agent Markdown contract.
+ source: "/docs/ai-resources.md",
+ destination: "/docs/community/ai-resources.md",
permanent: true,
- basePath: false,
},
{
- source: "/slack",
- has: [{ type: "host", value: "ktx.sh" }],
- destination:
- "https://join.slack.com/t/ktxcommunity/shared_invite/zt-3y9b44m1x-LVyNNJD5nwaZHq4XS29LMQ",
- permanent: false,
- basePath: false,
+ source: "/docs/ai-resources/:slug([^/]+\\.md)",
+ destination: "/docs/community/ai-resources.md",
+ permanent: true,
},
{
- source: "/:path((?!stars(?:/|$)).*)",
- has: [{ type: "host", value: "ktx.sh" }],
- destination: "https://docs.kaelio.com/ktx/:path",
+ source: "/docs/ai-resources",
+ destination: "/docs/community/ai-resources",
+ permanent: true,
+ },
+ {
+ source: "/docs/ai-resources/:slug",
+ destination: "/docs/community/ai-resources",
permanent: true,
- basePath: false,
},
];
},
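
Review note: the ordering hazard described in the comment at the top of `redirects()` can be reproduced with a small first-match model. This is a simplified sketch, not Next.js itself: `firstRedirect`, `followRedirects`, and the two rule objects are hypothetical stand-ins, and path matching is reduced to plain functions instead of path-to-regexp patterns.

```javascript
// Simplified model of first-match redirect evaluation (not Next.js itself).
function firstRedirect(rules, host, path) {
  for (const rule of rules) {
    if (rule.host && rule.host !== host) continue; // host guard, like `has`
    const destination = rule.match(path);
    if (destination !== null) return destination;
  }
  return null;
}

// Follow relative redirects on the same host until an absolute URL appears.
function followRedirects(rules, host, path) {
  let current = path;
  for (let hop = 0; hop < 5; hop += 1) {
    const destination = firstRedirect(rules, host, current);
    if (destination === null) return `https://${host}${current}`;
    if (destination.startsWith("https://")) return destination;
    current = destination; // relative redirect: same host, new path
  }
  throw new Error("too many redirects");
}

// Stand-in for the docs.ktx.sh catch-all: prepend /ktx on the docs host.
const aliasCatchAll = {
  host: "docs.ktx.sh",
  match: (path) => `https://docs.kaelio.com/ktx${path === "/" ? "" : path}`,
};

// Stand-in for the generic root rule, which has no host guard.
const genericRoot = {
  match: (path) =>
    path === "/" ? "/ktx/docs/getting-started/introduction" : null,
};

const aliasFirst = followRedirects([aliasCatchAll, genericRoot], "docs.ktx.sh", "/");
const genericFirst = followRedirects([genericRoot, aliasCatchAll], "docs.ktx.sh", "/");
console.log(aliasFirst);   // single /ktx prefix
console.log(genericFirst); // /ktx/ktx/... because the prefix is applied twice
```

With the alias rule first, the request leaves the alias host in one hop. With the generic rule first, the relative redirect adds `/ktx` on the alias host and the catch-all then prepends it again.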
diff --git a/docs-site/tests/docs-index-route.test.mjs b/docs-site/tests/docs-index-route.test.mjs
index fdd8ec81..e2ab24f0 100644
--- a/docs-site/tests/docs-index-route.test.mjs
+++ b/docs-site/tests/docs-index-route.test.mjs
@@ -2,6 +2,8 @@ import assert from "node:assert/strict";
import { spawn } from "node:child_process";
import { once } from "node:events";
import { readFile, writeFile } from "node:fs/promises";
+import http from "node:http";
+import https from "node:https";
import { dirname, join } from "node:path";
import { createServer } from "node:net";
import { after, before, test } from "node:test";
@@ -100,6 +102,37 @@ after(async () => {
}
});
+// Node's fetch (undici) overwrites the Host header with the connection host,
+// so the alias-host redirect rules never match. The low-level http(s) client
+// sends Host verbatim, which is what the alias canonicalization keys off of.
+function requestWithHost(hostHeader, path) {
+ const target = new URL(docsSiteUrl);
+ const client = target.protocol === "https:" ? https : http;
+ const port =
+ target.port || (target.protocol === "https:" ? "443" : "80");
+
+ return new Promise((resolve, reject) => {
+ const request = client.request(
+ {
+ hostname: target.hostname,
+ port,
+ path,
+ method: "GET",
+ headers: { Host: hostHeader },
+ },
+ (response) => {
+ response.resume();
+ resolve({
+ status: response.statusCode,
+ location: response.headers.location,
+ });
+ },
+ );
+ request.on("error", reject);
+ request.end();
+ });
+}
+
test("/ktx/docs redirects to the docs introduction", async () => {
const response = await fetch(`${docsSiteUrl}${docsBasePath}/docs`, {
redirect: "manual",
@@ -112,6 +145,53 @@ test("/ktx/docs redirects to the docs introduction", async () => {
);
});
+test("retired AI Resources URLs redirect to the page under Community", async () => {
+ // The former top-level URL.
+ const bare = await fetch(
+ `${docsSiteUrl}${docsBasePath}/docs/ai-resources`,
+ { redirect: "manual" },
+ );
+
+ assert.equal(bare.status, 308);
+ assert.equal(
+ bare.headers.get("location"),
+ `${docsBasePath}/docs/community/ai-resources`,
+ );
+
+ // A retired per-page slug.
+ const slug = await fetch(
+ `${docsSiteUrl}${docsBasePath}/docs/ai-resources/agent-quickstart`,
+ { redirect: "manual" },
+ );
+
+ assert.equal(slug.status, 308);
+ assert.equal(
+ slug.headers.get("location"),
+ `${docsBasePath}/docs/community/ai-resources`,
+ );
+
+ // A retired per-page Markdown URL must stay Markdown: it has to redirect to
+ // the new .md route, not fall through to the HTML page.
+ const markdown = await fetch(
+ `${docsSiteUrl}${docsBasePath}/docs/ai-resources/agent-quickstart.md`,
+ { redirect: "manual" },
+ );
+
+ assert.equal(markdown.status, 308);
+ assert.equal(
+ markdown.headers.get("location"),
+ `${docsBasePath}/docs/community/ai-resources.md`,
+ );
+
+ // Following that redirect end to end must land on Markdown, not HTML.
+ const followed = await fetch(
+ `${docsSiteUrl}${docsBasePath}/docs/ai-resources/agent-quickstart.md`,
+ );
+
+ assert.equal(followed.status, 200);
+ assert.match(followed.headers.get("content-type") ?? "", /text\/markdown/);
+});
+
test("/ redirects into the /ktx docs site", async () => {
const response = await fetch(`${docsSiteUrl}/`, {
redirect: "manual",
@@ -141,3 +221,51 @@ test("/ktx/api/search returns docs search results", async () => {
"search should return at least one docs result",
);
});
+
+test("ktx.sh canonicalizes to a single /ktx basePath on the docs host", async () => {
+ const root = await requestWithHost("ktx.sh", "/");
+ assert.equal(root.status, 308);
+ assert.equal(root.location, "https://docs.kaelio.com/ktx/");
+ assert.ok(
+ !root.location.includes("/ktx/ktx"),
+ "the basePath must not be doubled",
+ );
+
+ const page = await requestWithHost(
+ "ktx.sh",
+ "/docs/getting-started/quickstart",
+ );
+ assert.equal(page.status, 308);
+ assert.equal(
+ page.location,
+ "https://docs.kaelio.com/ktx/docs/getting-started/quickstart",
+ );
+});
+
+test("docs.ktx.sh canonicalizes to a single /ktx basePath on the docs host", async () => {
+ const root = await requestWithHost("docs.ktx.sh", "/");
+ assert.equal(root.status, 308);
+ assert.equal(root.location, "https://docs.kaelio.com/ktx");
+ assert.ok(
+ !root.location.includes("/ktx/ktx"),
+ "the basePath must not be doubled",
+ );
+
+ const page = await requestWithHost("docs.ktx.sh", "/llms.txt");
+ assert.equal(page.status, 308);
+ assert.equal(page.location, "https://docs.kaelio.com/ktx/llms.txt");
+});
+
+test("ktx.sh keeps the /slack and /stars exceptions", async () => {
+ const slack = await requestWithHost("ktx.sh", "/slack");
+ assert.equal(slack.status, 307);
+ assert.match(slack.location, /^https:\/\/join\.slack\.com\//);
+
+ // /stars is proxied by a beforeFiles rewrite, so the apex catch-all must not
+ // canonicalize it to the docs host.
+ const stars = await requestWithHost("ktx.sh", "/stars");
+ assert.ok(
+ !(stars.location ?? "").startsWith("https://docs.kaelio.com"),
+ "the stars dashboard must not be redirected to the docs host",
+ );
+});
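
Review note: the `requestWithHost` helper relies on the low-level `node:http` client sending a caller-supplied `Host` header unchanged. A standalone way to check that assumption is sketched below. It is an illustration only: `observedHost` is a hypothetical name, and a throwaway local echo server stands in for the docs site.

```javascript
// Sketch only: a throwaway local server echoes back the Host header it saw.
async function observedHost(hostHeader) {
  const http = await import("node:http");
  const server = http.createServer((req, res) => {
    res.end(req.headers.host ?? "");
  });
  // Port 0 asks the OS for any free port.
  await new Promise((resolve) => server.listen(0, "127.0.0.1", resolve));
  const { port } = server.address();
  try {
    return await new Promise((resolve, reject) => {
      const request = http.request(
        {
          hostname: "127.0.0.1",
          port,
          path: "/",
          method: "GET",
          headers: { Host: hostHeader }, // sent as given by the http client
          agent: false, // one-off connection, no keep-alive pooling
        },
        (response) => {
          let body = "";
          response.setEncoding("utf8");
          response.on("data", (chunk) => {
            body += chunk;
          });
          response.on("end", () => resolve(body));
        },
      );
      request.on("error", reject);
      request.end();
    });
  } finally {
    server.close();
  }
}
```

Calling `observedHost("ktx.sh")` should resolve to the header value the server received, which is the property the alias-host tests depend on.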
diff --git a/docs-site/tests/product-mechanics-content.test.mjs b/docs-site/tests/product-mechanics-content.test.mjs
index 5cce9001..d0c9471c 100644
--- a/docs-site/tests/product-mechanics-content.test.mjs
+++ b/docs-site/tests/product-mechanics-content.test.mjs
@@ -85,7 +85,7 @@ test("product mechanics component explains ingestion outputs", async () => {
"compile into SQL",
'"use client"',
"@xyflow/react",
- " {
);
}
- assert.match(
- component,
+ // The ReactFlow canvas config lives in the shared FlowCanvas wrapper, which
+ // product-mechanics renders. Assert the static read-only behavior there.
+ const flowCanvas = await readDocsFile("components/flow-canvas.tsx");
+ for (const guard of [
/nodesDraggable=\{false\}/,
- "ReactFlow canvas should disable node dragging",
- );
- assert.match(
- component,
- /panOnDrag=\{false\}/,
- "ReactFlow canvas should disable panning",
- );
- assert.match(
- component,
+ /nodesConnectable=\{false\}/,
/zoomOnScroll=\{false\}/,
- "ReactFlow canvas should disable scroll zoom",
- );
+ /elementsSelectable=\{false\}/,
+ ]) {
+ assert.match(
+ flowCanvas,
+ guard,
+ `shared FlowCanvas should enforce static read-only behavior: ${guard}`,
+ );
+ }
assert.doesNotMatch(component, /raw-sources/);
assert.doesNotMatch(component, /\.ktx/);
diff --git a/docs-site/tests/product-runtime-content.test.mjs b/docs-site/tests/product-runtime-content.test.mjs
new file mode 100644
index 00000000..ac643faa
--- /dev/null
+++ b/docs-site/tests/product-runtime-content.test.mjs
@@ -0,0 +1,74 @@
+import assert from "node:assert/strict";
+import { readFile } from "node:fs/promises";
+import { dirname, join } from "node:path";
+import { test } from "node:test";
+import { fileURLToPath } from "node:url";
+
+const docsSiteDir = join(dirname(fileURLToPath(import.meta.url)), "..");
+
+async function readDocsFile(path) {
+ return readFile(join(docsSiteDir, path), "utf8");
+}
+
+test("docs introduction renders the serving phase after ingestion", async () => {
+ const introduction = await readDocsFile(
+ "content/docs/getting-started/introduction.mdx",
+ );
+
+ assert.match(
+ introduction,
+ /import\s+\{\s*ProductRuntime\s*\}\s+from\s+"@\/components\/product-runtime";/,
+ );
+ assert.match(introduction, //);
+
+ const mechanicsIndex = introduction.indexOf("");
+ const runtimeIndex = introduction.indexOf("");
+ const useCaseIndex = introduction.indexOf("## Use it for");
+
+ assert.ok(
+ runtimeIndex > mechanicsIndex,
+ "serving diagram should appear after the ingestion diagram",
+ );
+ assert.ok(
+ runtimeIndex < useCaseIndex,
+ "serving diagram should appear before use-case sections",
+ );
+});
+
+test("product runtime component explains the serving cycle", async () => {
+ const component = await readDocsFile("components/product-runtime.tsx");
+
+ for (const expectedText of [
+ "How serving works",
+ "Serving flow",
+ "From an agent request to a governed answer",
+ "Your agent",
+ "Claude Code",
+ "Cursor",
+ "Codex",
+ "Search wiki + semantic layer",
+ "Return approved metrics",
+ "Compile metrics → SQL",
+ "Context layer",
+ "Database",
+ "search + read",
+ "read-only",
+ "wiki/*.md",
+ "semantic-layer/*.yaml",
+ '"use client"',
+ "@xyflow/react",
+ "FlowCanvas",
+ "getSmoothStepPath",
+ "animateMotion",
+ "runtime-particle",
+ "buildCyclePath",
+ ]) {
+ assert.ok(
+ component.includes(expectedText),
+ `component should include: ${expectedText}`,
+ );
+ }
+
+ assert.doesNotMatch(component, /raw-sources/);
+ assert.doesNotMatch(component, /
"
ingest:
adapters:
@@ -18,5 +17,3 @@ agent:
- sl_query
- wiki_search
- sl_read_source
-memory:
- auto_commit: true
diff --git a/examples/orbit-relationship-verification/README.md b/examples/orbit-relationship-verification/README.md
index 126488a2..d99c8fea 100644
--- a/examples/orbit-relationship-verification/README.md
+++ b/examples/orbit-relationship-verification/README.md
@@ -1,11 +1,11 @@
# Orbit-style relationship discovery verification
-This KTX project backs the default `relationships:verify-orbit` command. It uses
+This **ktx** project backs the default `relationships:verify-orbit` command. It uses
the checked-in Orbit-style SQLite fixture from the relationship discovery
benchmark corpus, with no declared primary keys or foreign keys in the database
schema.
-Run from the KTX workspace root:
+Run from the **ktx** workspace root:
```bash
pnpm run relationships:verify-orbit
diff --git a/examples/orbit-relationship-verification/ktx.yaml b/examples/orbit-relationship-verification/ktx.yaml
index b1d30961..4bb605e2 100644
--- a/examples/orbit-relationship-verification/ktx.yaml
+++ b/examples/orbit-relationship-verification/ktx.yaml
@@ -6,7 +6,6 @@ storage:
state: sqlite
search: sqlite-fts5
git:
- auto_commit: true
author: "ktx "
ingest:
adapters: []
diff --git a/examples/package-artifacts/README.md b/examples/package-artifacts/README.md
index 7fa39fb3..b5813b0c 100644
--- a/examples/package-artifacts/README.md
+++ b/examples/package-artifacts/README.md
@@ -14,7 +14,7 @@ generated local project.
The managed Python runtime smoke requires `uv` on `PATH`, isolates
`KTX_RUNTIME_ROOT`, verifies `ktx admin runtime status`, runs `ktx sl query --yes` to
install the core runtime from the bundled wheel, checks `ktx admin runtime status`,
-starts and reuses the KTX daemon, and stops it.
+starts and reuses the **ktx** daemon, and stops it.
The artifact manifest contains the public `@kaelio/ktx` npm tarball and the
bundled `kaelio-ktx` runtime wheel. The smoke does not install standalone
diff --git a/examples/postgres-historic/README.md b/examples/postgres-historic/README.md
index 64fc2593..aadd3cd4 100644
--- a/examples/postgres-historic/README.md
+++ b/examples/postgres-historic/README.md
@@ -17,19 +17,19 @@ unchanged bounded pattern shards do not schedule LLM work.
## Prerequisites
- Docker with Compose v2
-- Node and pnpm matching the KTX workspace
-- `uv` on `PATH` so the KTX-managed Python runtime can install the bundled
+- Node and pnpm matching the **ktx** workspace
+- `uv` on `PATH` so the **ktx**-managed Python runtime can install the bundled
runtime wheel
## Run
-From the KTX repository root:
+From the **ktx** repository root:
```bash
examples/postgres-historic/scripts/smoke.sh
```
-The smoke creates a temporary KTX project, isolates the managed Python runtime
+The smoke creates a temporary **ktx** project, isolates the managed Python runtime
under the temporary project parent, starts Postgres on `127.0.0.1:55432`, and
uses this connection URL:
@@ -41,7 +41,7 @@ Set `KTX_POSTGRES_HISTORIC_KEEP_DOCKER=1` to leave the container running after
the script exits.
The smoke validates the query-history raw snapshot path without requiring LLM
-credentials. It uses KTX's local stage-only ingest API after `ktx setup`, so the
+credentials. It uses **ktx**'s local stage-only ingest API after `ktx setup`, so the
deterministic reader, batch SQL parser, stable artifact writer, and diff-based
WorkUnit planning are checked independently from curation.
@@ -124,6 +124,6 @@ table.
- Missing grants: confirm `GRANT pg_read_all_stats TO ktx_reader;`.
- Empty snapshot: rerun `scripts/generate-workload.sh base` and keep
`--query-history-min-executions 2` for the smoke.
-- SQL-analysis failures: run `pnpm run ktx -- dev runtime status` from the KTX
+- SQL-analysis failures: run `pnpm run ktx -- dev runtime status` from the **ktx**
repository root and confirm `uv`, the bundled Python wheel, and the managed
runtime all pass.
diff --git a/package.json b/package.json
index e7714634..ef428823 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
{
"name": "ktx-workspace",
- "version": "0.9.0",
+ "version": "0.12.0",
"description": "Workspace root for ktx packages",
"private": true,
"type": "module",
@@ -69,11 +69,6 @@
"typescript": "^6.0.3",
"yaml": "^2.9.0"
},
- "pnpm": {
- "onlyBuiltDependencies": [
- "better-sqlite3"
- ]
- },
"license": "Apache-2.0",
"repository": {
"type": "git",
diff --git a/packages/cli/package.json b/packages/cli/package.json
index 939a8b9c..ed1b366e 100644
--- a/packages/cli/package.json
+++ b/packages/cli/package.json
@@ -1,7 +1,11 @@
{
"name": "@kaelio/ktx",
- "version": "0.9.0",
+ "version": "0.12.0",
"description": "Standalone ktx context layer for data agents",
+ "author": {
+ "name": "Kaelio",
+ "url": "https://www.kaelio.com"
+ },
"type": "module",
"engines": {
"node": ">=22.0.0"
@@ -47,9 +51,11 @@
"@ai-sdk/devtools": "0.0.18",
"@ai-sdk/google-vertex": "^4.0.134",
"@anthropic-ai/claude-agent-sdk": "0.3.146",
+ "@clack/core": "1.3.1",
"@clack/prompts": "1.4.0",
"@clickhouse/client": "^1.18.5",
"@commander-js/extra-typings": "14.0.0",
+ "@duckdb/node-api": "1.5.3-r.3",
"@google-cloud/bigquery": "^8.3.1",
"@looker/sdk": "^26.8.0",
"@looker/sdk-node": "^26.8.0",
@@ -72,6 +78,7 @@
"pg": "^8.21.0",
"posthog-node": "^5.34.9",
"react": "^19.2.6",
+ "semver": "^7.8.1",
"simple-git": "3.36.0",
"snowflake-sdk": "^2.4.2",
"yaml": "^2.9.0",
@@ -85,6 +92,7 @@
"@types/node": "^25.9.1",
"@types/pg": "^8.20.0",
"@types/react": "^19.2.15",
+ "@types/semver": "^7.7.1",
"@vitest/coverage-v8": "^4.1.7",
"ajv": "8.20.0",
"ink-testing-library": "^4.0.0",
diff --git a/packages/cli/src/admin.ts b/packages/cli/src/admin.ts
index 6c04f82f..d4b63ac1 100644
--- a/packages/cli/src/admin.ts
+++ b/packages/cli/src/admin.ts
@@ -24,7 +24,7 @@ export function registerAdminCommands(program: Command, context: KtxCliCommandCo
admin
.command('init')
- .description('Initialize a Git-backed KTX project directory for maintenance scripts')
+ .description('Initialize a Git-backed ktx project directory for maintenance scripts')
.argument('[directory]', 'Project directory')
.option('--force', 'Rewrite ktx.yaml and scaffold files in an existing project', false)
.action(
diff --git a/packages/cli/src/clack.ts b/packages/cli/src/clack.ts
index 2ad51e6c..94999476 100644
--- a/packages/cli/src/clack.ts
+++ b/packages/cli/src/clack.ts
@@ -3,6 +3,30 @@ import type { KtxCliIo } from './cli-runtime.js';
const ESC = String.fromCharCode(0x1b);
+export interface CliStyleEnv {
+ NO_COLOR?: string;
+ TERM?: string;
+}
+
+function ansiEnabled(env: CliStyleEnv = process.env): boolean {
+ return !env.NO_COLOR && env.TERM !== 'dumb';
+}
+
+function ansiColor(text: string, open: number, close: number, env?: CliStyleEnv): string {
+ if (!ansiEnabled(env)) {
+ return text;
+ }
+ return `${ESC}[${open}m${text}${ESC}[${close}m`;
+}
+
+export function dim(text: string, env?: CliStyleEnv): string {
+ return ansiColor(text, 2, 22, env);
+}
+
+export function cyan(text: string, env?: CliStyleEnv): string {
+ return ansiColor(text, 36, 39, env);
+}
+
export interface RailBufferedSource {
stdoutText(): string;
stderrText(): string;
@@ -57,27 +81,39 @@ class KtxCliPromptCancelledError extends Error {
}
export function createClackSpinner(): KtxCliSpinner {
- return spinner();
+ // clack colors the animated spinner frame magenta by default; styleFrame
+ // (typed in SpinnerOptions, absent from the README) recolors it ktx orange.
+ return spinner({ styleFrame: orange });
}
-function magenta(text: string): string {
- return `${ESC}[35m${text}${ESC}[39m`;
+// ktx mascot orange (#FF8A4C) via 24-bit truecolor.
+function orange(text: string): string {
+ if (!ansiEnabled()) {
+ return text;
+ }
+ return `${ESC}[38;2;255;138;76m${text}${ESC}[39m`;
}
function red(text: string): string {
- return `${ESC}[31m${text}${ESC}[39m`;
+ return ansiColor(text, 31, 39);
}
+/**
+ * Stderr-only, non-animated spinner. Use this instead of {@link createCliSpinner}
+ * when the next step reads stdin in raw mode (an Ink TUI or a keypress wait):
+ * the animated clack spinner seizes stdin via `@clack/core`'s `block()` and
+ * leaves it dirty, which the following raw-mode reader misreads as a stray key.
+ */
export function createStaticCliSpinner(io: KtxCliSpinnerIo): KtxCliSpinner {
return {
start(message) {
- io.stderr.write(`${magenta('◐')} ${message}\n`);
+ io.stderr.write(`${orange('◐')} ${message}\n`);
},
message(message) {
- io.stderr.write(`${magenta('│')} ${message}\n`);
+ io.stderr.write(`${orange('│')} ${message}\n`);
},
stop(message) {
- io.stderr.write(`${magenta('◇')} ${message}\n`);
+ io.stderr.write(`${orange('◇')} ${message}\n`);
},
error(message) {
io.stderr.write(`${red('■')} ${message}\n`);
@@ -85,6 +121,30 @@ export function createStaticCliSpinner(io: KtxCliSpinnerIo): KtxCliSpinner {
};
}
+/**
+ * Animated spinner in an interactive terminal, static `◐/◇/■` lines otherwise
+ * (scripts, CI, piped output) so logs stay clean and uncluttered by frames.
+ */
+export function createCliSpinner(io: KtxCliIo): KtxCliSpinner {
+ return io.stdout.isTTY === true ? createClackSpinner() : createStaticCliSpinner(io);
+}
+
+export async function runWithCliSpinner<T>(
+ spinner: KtxCliSpinner,
+ text: { start: string; success: string; failure: string },
+ run: () => Promise<T>,
+): Promise<T> {
+ spinner.start(text.start);
+ try {
+ const value = await run();
+ spinner.stop(text.success);
+ return value;
+ } catch (error) {
+ spinner.error(text.failure);
+ throw error;
+ }
+}
+
export function createClackPromptAdapter(): KtxCliPromptAdapter {
return {
async confirm(options) {
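
Review note on the color gating added in `clack.ts`: the sketch below restates the `NO_COLOR` / `TERM=dumb` check as a standalone snippet so the behavior is easy to verify in isolation. It is a minimal illustration, not the ktx source. The `StyleEnv` name is an assumption, and the environment is passed explicitly here, whereas the diff defaults it to `process.env`.

```typescript
// Illustrative sketch only; names mirror the diff but this is not the ktx source.
const ESC = String.fromCharCode(0x1b);

// Hypothetical stand-in for the CliStyleEnv interface in the diff.
interface StyleEnv {
  NO_COLOR?: string;
  TERM?: string;
}

// Color stays on unless NO_COLOR is a non-empty string or TERM is "dumb".
function ansiEnabled(env: StyleEnv): boolean {
  return !env.NO_COLOR && env.TERM !== 'dumb';
}

// Wrap text in an SGR open/close pair, or return it untouched when color is off.
function ansiColor(text: string, open: number, close: number, env: StyleEnv): string {
  return ansiEnabled(env) ? `${ESC}[${open}m${text}${ESC}[${close}m` : text;
}

const cyan = (text: string, env: StyleEnv): string => ansiColor(text, 36, 39, env);

const colored = cyan('ktx', { TERM: 'xterm-256color' });
const plainNoColor = cyan('ktx', { NO_COLOR: '1', TERM: 'xterm-256color' });
const plainDumb = cyan('ktx', { TERM: 'dumb' });
```

An empty `NO_COLOR` value leaves color enabled in this sketch because the check is a plain truthiness test, the same as in the diff.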
diff --git a/packages/cli/src/claude-code-prompt-caching.ts b/packages/cli/src/claude-code-prompt-caching.ts
index a7c0fa54..0d696579 100644
--- a/packages/cli/src/claude-code-prompt-caching.ts
+++ b/packages/cli/src/claude-code-prompt-caching.ts
@@ -21,7 +21,7 @@ export function formatClaudeCodePromptCachingWarning(fields: string[]): string |
if (fields.length === 0) {
return null;
}
- return `claude-code ignores ${fields.join(', ')} because the Claude Agent SDK does not expose KTX prompt-cache TTL, tool, or history markers.`;
+ return `claude-code ignores ${fields.join(', ')} because the Claude Agent SDK does not expose ktx prompt-cache TTL, tool, or history markers.`;
}
export function formatClaudeCodePromptCachingFix(): string {
diff --git a/packages/cli/src/cli-program.ts b/packages/cli/src/cli-program.ts
index 31ab8a03..f9da6552 100644
--- a/packages/cli/src/cli-program.ts
+++ b/packages/cli/src/cli-program.ts
@@ -2,6 +2,7 @@ import { existsSync } from 'node:fs';
import { join } from 'node:path';
import { Command, type CommandUnknownOpts, InvalidArgumentError } from '@commander-js/extra-typings';
import type { KtxCliDeps, KtxCliIo, KtxCliPackageInfo } from './cli-runtime.js';
+import { SLACK_HELP_FOOTER, writeErrorCommunityHint } from './community-cta.js';
import { registerCompletionCommands } from './commands/completion-commands.js';
import { registerConnectionCommands } from './commands/connection-commands.js';
import { registerIngestCommands } from './commands/ingest-commands.js';
@@ -16,6 +17,7 @@ import { renderMissingProjectMessage } from './doctor.js';
import { findNearestKtxProjectDir, resolveKtxProjectDir } from './project-resolver.js';
import { profileMark, profileSpan } from './startup-profile.js';
import type { CommandOutcome } from './telemetry/index.js';
+import { prepareUpdateCheckNotice, type PrepareUpdateCheckNoticeOptions } from './update-check/update-check.js';
profileMark('module:cli-program');
@@ -39,6 +41,8 @@ interface KtxCommanderProgramOptions {
runInit: (args: { projectDir: string; force: boolean }, io: KtxCliIo) => Promise;
}
+type KtxCliUpdateCheckOptions = Pick<PrepareUpdateCheckNoticeOptions, 'env' | 'fetchDistTags' | 'homeDir' | 'now'>;
+
export interface BuildKtxProgramOptions {
io: KtxCliIo;
deps: KtxCliDeps;
@@ -47,6 +51,7 @@ export interface BuildKtxProgramOptions {
setExitCode?: (code: number) => void;
argv?: string[];
setTelemetryModule?: (telemetry: typeof import('./telemetry/index.js')) => void;
+ updateCheck?: KtxCliUpdateCheckOptions;
}
type CommanderExitLike = { exitCode: number; code: string; message: string };
@@ -247,13 +252,14 @@ export function resolveCommandProjectDirOverride(command: CommandWithGlobalOptio
function createBaseProgram(info: KtxCliPackageInfo, io: KtxCliIo): Command {
return new Command()
.name('ktx')
- .description('KTX data agent context layer CLI')
- .option('--project-dir ', 'KTX project directory (default: KTX_PROJECT_DIR, nearest ktx.yaml, or cwd)')
+ .description('ktx data agent context layer CLI')
+ .option('--project-dir ', 'ktx project directory (default: KTX_PROJECT_DIR, nearest ktx.yaml, or cwd)')
.option('--debug', 'Enable diagnostic logging to stderr')
.version(`${info.name} ${info.version}`, '-v, --version', 'Show CLI version')
.helpOption('-h, --help', 'Show this help text')
.configureHelp({ showGlobalOptions: true })
.showHelpAfterError()
+ .addHelpText('after', `\n${SLACK_HELP_FOOTER}`)
.exitOverride()
.configureOutput({
writeOut: (chunk) => io.stdout.write(chunk),
@@ -431,23 +437,36 @@ export function collectCommandFlagsPresent(command: CommandUnknownOpts): Record<
export function buildKtxProgram(options: BuildKtxProgramOptions): Command {
const program = createBaseProgram(options.packageInfo, options.io);
+ let pendingUpdateNotice: string | null = null;
+
program.hook('preAction', async (_thisCommand, actionCommand) => {
// The hidden completion command must stay silent and side-effect free: skip
- // the telemetry notice, command span, and project checks entirely.
+ // the telemetry notice, command span, project checks, and update checks entirely.
if (commandPath(actionCommand as CommandPathNode).includes('__complete')) {
return;
}
+ const commandNode = actionCommand as CommandPathNode;
+ const updateCheck = await prepareUpdateCheckNotice({
+ io: options.io,
+ env: options.updateCheck?.env,
+ fetchDistTags: options.updateCheck?.fetchDistTags,
+ homeDir: options.updateCheck?.homeDir,
+ installedVersion: options.packageInfo.version,
+ now: options.updateCheck?.now,
+ commandOptions: commandOptions(commandNode),
+ });
+ pendingUpdateNotice = updateCheck.notice;
+
const telemetry = await import('./telemetry/index.js');
options.setTelemetryModule?.(telemetry);
await telemetry.showTelemetryNoticeIfNeeded(options.io, options.packageInfo);
- const commandNode = actionCommand as CommandPathNode;
const path = commandPath(commandNode);
const projectDir = resolveCommandProjectDir(commandNode);
const hasProject = ktxYamlExists(projectDir);
const attachProjectGroup = shouldAttachCommandProjectGroup(path, hasProject);
telemetry.beginCommandSpan({
commandPath: path,
- flagsPresent: collectCommandFlagsPresent(commandNode as unknown as CommandUnknownOpts),
+ flagsPresent: collectCommandFlagsPresent(actionCommand),
projectDir: attachProjectGroup ? projectDir : undefined,
hasProject,
attachProjectGroup,
@@ -457,6 +476,13 @@ export function buildKtxProgram(options: BuildKtxProgramOptions): Command {
ensureProjectAvailable(options.io, commandNode);
});
+ program.hook('postAction', () => {
+ if (pendingUpdateNotice) {
+ options.io.stderr.write(pendingUpdateNotice);
+ pendingUpdateNotice = null;
+ }
+ });
+
const context: KtxCliCommandContext = {
io: options.io,
deps: options.deps,
@@ -529,7 +555,15 @@ export async function runCommanderKtxCli(
try {
return await runBareInteractiveCommand(program, io, context);
} catch (error) {
+ const telemetry = await import('./telemetry/index.js');
+ await telemetry.reportException({
+ error,
+ context: { source: 'bare-interactive', handled: true, fatal: false },
+ packageInfo: info,
+ io,
+ });
io.stderr.write(`${formatCliError(error)}\n`);
+ writeErrorCommunityHint(io, 'error');
return 1;
}
}
@@ -554,6 +588,7 @@ export async function runCommanderKtxCli(
exitCode = error.exitCode === 0 ? 0 : 1;
} else {
io.stderr.write(`${formatCliError(error)}\n`);
+ writeErrorCommunityHint(io, 'error');
exitCode = 1;
}
} finally {
@@ -563,6 +598,23 @@ export async function runCommanderKtxCli(
outcome: commandOutcomeForParseResult(parseError, exitCode),
error: parseError,
});
+ if (
+ parseError &&
+ !isCommanderExit(parseError) &&
+ !isKtxProjectMissingAbortError(parseError)
+ ) {
+ await telemetryModule.reportException({
+ error: parseError,
+ context: {
+ source: completed?.commandPath.join(' ') ?? 'commander parseAsync',
+ handled: true,
+ fatal: false,
+ },
+ projectDir: completed?.projectGroupAttached ? completed.projectDir : undefined,
+ packageInfo: info,
+ io,
+ });
+ }
await telemetryModule.emitCompletedCommand({ completed, packageInfo: info, io });
await telemetryModule.shutdownTelemetryEmitter();
}
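
Review note on the update notice flow in `cli-program.ts`: the `preAction` hook stores the notice and the `postAction` hook prints it, so the notice lands after the command's own output. The sketch below shows that store-then-flush pattern on its own, assuming a hypothetical `createDeferredNotice` helper and an in-memory writer. It does not use Commander, and the notice text is made up for the example.

```typescript
// Illustrative sketch only; createDeferredNotice is a hypothetical helper, not ktx code.
type Writer = { write(chunk: string): void };

function createDeferredNotice(stderr: Writer) {
  let pending: string | null = null;
  return {
    // Runs before the command: remember the notice, print nothing yet.
    prepare(notice: string | null): void {
      pending = notice;
    },
    // Runs after the command: print at most once, then clear.
    flush(): void {
      if (pending) {
        stderr.write(pending);
        pending = null;
      }
    },
  };
}

const lines: string[] = [];
const fakeStderr: Writer = { write: (chunk) => { lines.push(chunk); } };
const deferred = createDeferredNotice(fakeStderr);

deferred.prepare('example update notice\n'); // invented text for the demo
lines.push('command output\n');              // the command itself runs here
deferred.flush();
deferred.flush();                            // second flush is a no-op
```

Clearing the stored value after writing keeps a repeated hook invocation from printing the same notice twice.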
diff --git a/packages/cli/src/cli-runtime.ts b/packages/cli/src/cli-runtime.ts
index 7043143b..89c7c11d 100644
--- a/packages/cli/src/cli-runtime.ts
+++ b/packages/cli/src/cli-runtime.ts
@@ -12,6 +12,7 @@ import type { KtxSqlArgs } from './sql.js';
import { profileMark, profileSpan } from './startup-profile.js';
import type { KtxTextIngestArgs } from './text-ingest.js';
import { assertCliVersion } from './release-version.js';
+import { writeErrorCommunityHint } from './community-cta.js';
profileMark('module:cli-runtime');
@@ -60,7 +61,7 @@ export function packageInfoFromJson(packageJson: unknown): KtxCliPackageInfo {
typeof packageJson.name !== 'string' ||
typeof packageJson.version !== 'string'
) {
- throw new Error('Invalid KTX CLI package metadata');
+ throw new Error('Invalid ktx CLI package metadata');
}
return {
@@ -76,7 +77,7 @@ async function runInit(args: { projectDir: string; force: boolean }, io: KtxCliI
force: args.force,
});
- io.stdout.write(`Initialized KTX project at ${result.projectDir}\n`);
+ io.stdout.write(`Initialized ktx project at ${result.projectDir}\n`);
io.stdout.write(`Config: ${result.configPath}\n`);
io.stdout.write(`Commit: ${result.commitHash ?? 'none'}\n`);
return 0;
@@ -129,6 +130,54 @@ function installTelemetrySignalFlush(io: KtxCliIo, info: KtxCliPackageInfo): ()
};
}
+/** @internal */
+export function createGlobalExceptionReporter(io: KtxCliIo, info: KtxCliPackageInfo) {
+ return async (source: 'uncaughtException' | 'unhandledRejection', error: unknown): Promise<void> => {
+ const { reportException, shutdownTelemetryEmitter } = await import('./telemetry/index.js');
+ await reportException({
+ error,
+ context: { source, handled: false, fatal: true },
+ io,
+ packageInfo: info,
+ immediate: true,
+ });
+ await shutdownTelemetryEmitter();
+ };
+}
+
+/** @internal */
+export function writeGlobalExceptionToStderr(io: KtxCliIo, error: unknown): void {
+ if (error instanceof Error && error.stack) {
+ io.stderr.write(`${error.stack}\n`);
+ } else {
+ io.stderr.write(`${String(error)}\n`);
+ }
+ writeErrorCommunityHint(io, 'crash');
+}
+
+export function installGlobalExceptionHandlers(io: KtxCliIo, info: KtxCliPackageInfo): () => void {
+ const report = createGlobalExceptionReporter(io, info);
+ const handle = (source: 'uncaughtException' | 'unhandledRejection', error: unknown): void => {
+ void (async () => {
+ try {
+ await report(source, error);
+ } catch {
+ // Best-effort: preserve Node's process termination behavior.
+ }
+ writeGlobalExceptionToStderr(io, error);
+ process.exit(1);
+ })();
+ };
+ const onUncaught = (error: Error): void => handle('uncaughtException', error);
+ const onUnhandled = (reason: unknown): void => handle('unhandledRejection', reason);
+ process.on('uncaughtException', onUncaught);
+ process.on('unhandledRejection', onUnhandled);
+ return () => {
+ process.off('uncaughtException', onUncaught);
+ process.off('unhandledRejection', onUnhandled);
+ };
+}
+
export async function runKtxCli(
argv = process.argv.slice(2),
io: KtxCliIo = process,
@@ -141,11 +190,14 @@ export async function runKtxCli(
// Real-process entry only: flush telemetry if interrupted. Test/programmatic
// callers pass their own `io`, so they never install process-level handlers.
const removeSignalFlush = (io as unknown) === process ? installTelemetrySignalFlush(io, info) : undefined;
+ const removeGlobalExceptionHandlers =
+ (io as unknown) === process ? installGlobalExceptionHandlers(io, info) : undefined;
try {
return await runCommanderKtxCli(argv, io, deps, info, {
runInit: runInitForCommander,
});
} finally {
+ removeGlobalExceptionHandlers?.();
removeSignalFlush?.();
}
}
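
Review note on `installGlobalExceptionHandlers`: the function returns a disposer that removes the same listener references it registered, which is what lets `runKtxCli` clean up in its `finally` block. The sketch below isolates that install/remove pairing. `HandlerTarget`, `FakeTarget`, and `installHandlers` are hypothetical names, and the in-memory emitter stands in for `process`. The reporting and `process.exit(1)` steps from the diff are deliberately left out.

```typescript
// Illustrative sketch only; these names are hypothetical and not part of ktx.
type Listener = (value: unknown) => void;

// Minimal emitter shape used in place of the real `process` object.
interface HandlerTarget {
  on(event: string, listener: Listener): unknown;
  off(event: string, listener: Listener): unknown;
}

function installHandlers(
  target: HandlerTarget,
  onFatal: (source: string, error: unknown) => void,
): () => void {
  const onUncaught: Listener = (error) => onFatal('uncaughtException', error);
  const onUnhandled: Listener = (reason) => onFatal('unhandledRejection', reason);
  target.on('uncaughtException', onUncaught);
  target.on('unhandledRejection', onUnhandled);
  // Keep the same function references so `off` removes exactly what `on` added.
  return () => {
    target.off('uncaughtException', onUncaught);
    target.off('unhandledRejection', onUnhandled);
  };
}

// Tiny in-memory emitter used only to demonstrate the pairing.
class FakeTarget implements HandlerTarget {
  private readonly listeners: Record<string, Listener[]> = {};
  on(event: string, listener: Listener): this {
    const list = this.listeners[event] || [];
    list.push(listener);
    this.listeners[event] = list;
    return this;
  }
  off(event: string, listener: Listener): this {
    this.listeners[event] = (this.listeners[event] || []).filter((l) => l !== listener);
    return this;
  }
  emit(event: string, value: unknown): void {
    (this.listeners[event] || []).forEach((l) => l(value));
  }
}

const fakeTarget = new FakeTarget();
const seen: string[] = [];
const removeHandlers = installHandlers(fakeTarget, (source) => { seen.push(source); });
fakeTarget.emit('uncaughtException', new Error('boom'));
removeHandlers();
fakeTarget.emit('unhandledRejection', 'ignored after removal');
```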
diff --git a/packages/cli/src/commands/connection-commands.ts b/packages/cli/src/commands/connection-commands.ts
index 213bf608..579575d1 100644
--- a/packages/cli/src/commands/connection-commands.ts
+++ b/packages/cli/src/commands/connection-commands.ts
@@ -37,7 +37,7 @@ export function registerConnectionCommands(program: Command, context: KtxCliComm
connection
.command('test')
.description('Test one or all configured connections (default: all)')
- .argument('[connectionId]', 'KTX connection id to test (omit to test all)')
+ .argument('[connectionId]', 'ktx connection id to test (omit to test all)')
.option('--all', 'Test every configured connection and print a summary list')
.action(async (connectionId: string | undefined, options: { all?: boolean }, command) => {
if (options.all === true && connectionId !== undefined) {
diff --git a/packages/cli/src/commands/ingest-commands.ts b/packages/cli/src/commands/ingest-commands.ts
index b5efe443..d7e09596 100644
--- a/packages/cli/src/commands/ingest-commands.ts
+++ b/packages/cli/src/commands/ingest-commands.ts
@@ -25,16 +25,16 @@ export function registerIngestCommands(
): void {
const ingest = program
.command('ingest')
- .description('Build or inspect KTX context, or capture text into memory')
+ .description('Build or inspect ktx context, or capture text into memory')
.usage('[options] [connectionId]')
.argument('[connectionId]', 'Configured connection id to ingest (omit to ingest all)')
.option('--all', 'Ingest all configured connections', false)
.addOption(new Option('--query-history', 'Include database query-history usage patterns').conflicts('noQueryHistory'))
.addOption(new Option('--no-query-history', 'Skip database query-history usage patterns'))
.option('--query-history-window-days ', 'Query-history lookback window for this run', parsePositiveIntegerOption)
- .option('--text ', 'Capture inline text into KTX memory; repeatable', collectOption, [])
- .option('--file ', 'Capture a text file into KTX memory; use - for stdin; repeatable', collectOption, [])
- .option('--connection-id ', 'KTX connection id to tag captured text/file notes')
+ .option('--text ', 'Capture inline text into ktx memory; repeatable', collectOption, [])
+ .option('--file ', 'Capture a text file into ktx memory; use - for stdin; repeatable', collectOption, [])
+ .option('--connection-id ', 'ktx connection id to tag captured text/file notes')
.option('--user-id ', 'Memory user id for text/file capture attribution', 'local-cli')
.option('--fail-fast', 'Stop after the first failed text/file item', false)
.addOption(new Option('--plain', 'Print plain text output').conflicts(['json']))
diff --git a/packages/cli/src/commands/mcp-commands.ts b/packages/cli/src/commands/mcp-commands.ts
index 94b17498..6978d2b7 100644
--- a/packages/cli/src/commands/mcp-commands.ts
+++ b/packages/cli/src/commands/mcp-commands.ts
@@ -27,11 +27,11 @@ function binPath(): string {
function formatMcpStartResultMessage(input: { status: 'started' | 'already-running'; url: string }): string {
return [
- input.status === 'started' ? `KTX MCP daemon started: ${input.url}` : `KTX MCP daemon already running: ${input.url}`,
+ input.status === 'started' ? `ktx MCP daemon started: ${input.url}` : `ktx MCP daemon already running: ${input.url}`,
'',
- 'KTX is ready for configured agents.',
- 'Open your agent for this KTX project and ask a data question, for example:',
- ' "Use KTX to show me the available tables and metrics."',
+ 'ktx is ready for configured agents.',
+ 'Open your agent for this ktx project and ask a data question, for example:',
+ ' "Use ktx to show me the available tables and metrics."',
'',
].join('\n');
}
@@ -50,14 +50,14 @@ async function printMcpStatus(context: KtxCliCommandContext, projectDir: string)
export function registerMcpCommands(program: Command, context: KtxCliCommandContext): void {
const mcp = program
.command('mcp')
- .description('Manage the KTX MCP HTTP server (bare command: show status)')
+ .description('Manage the ktx MCP HTTP server (bare command: show status)')
.action(async (_options, command) => {
await printMcpStatus(context, resolveCommandProjectDir(command));
});
mcp
.command('stdio')
- .description('Run the KTX MCP server over stdio')
+ .description('Run the ktx MCP server over stdio')
.action(async (_options, command) => {
await (context.deps.mcp?.runStdioServer ?? runKtxMcpStdioServer)({
projectDir: resolveCommandProjectDir(command),
@@ -68,7 +68,7 @@ export function registerMcpCommands(program: Command, context: KtxCliCommandCont
mcp
.command('start')
- .description('Start the KTX MCP HTTP server')
+ .description('Start the ktx MCP HTTP server')
.option('--host ', 'Host to bind', '127.0.0.1')
.option('--port ', 'Port to bind', parsePositiveIntegerOption, 7878)
.option('--token ', 'Bearer token required for non-loopback binding')
@@ -96,7 +96,7 @@ export function registerMcpCommands(program: Command, context: KtxCliCommandCont
allowedOrigins: options.allowedOrigin,
io: context.io,
});
- context.io.stdout.write(`KTX MCP server listening at http://${options.host}:${options.port}/mcp\n`);
+ context.io.stdout.write(`ktx MCP server listening at http://${options.host}:${options.port}/mcp\n`);
return;
}
const result = await (context.deps.mcp?.startDaemon ?? startKtxMcpDaemon)({
@@ -114,24 +114,24 @@ export function registerMcpCommands(program: Command, context: KtxCliCommandCont
mcp
.command('stop')
- .description('Stop the KTX MCP daemon')
+ .description('Stop the ktx MCP daemon')
.action(async (_options, command) => {
const result = await (context.deps.mcp?.stopDaemon ?? stopKtxMcpDaemon)({
projectDir: resolveCommandProjectDir(command),
});
- context.io.stdout.write(result.status === 'stopped' ? 'KTX MCP daemon stopped.\n' : 'KTX MCP daemon is not running.\n');
+ context.io.stdout.write(result.status === 'stopped' ? 'ktx MCP daemon stopped.\n' : 'ktx MCP daemon is not running.\n');
});
mcp
.command('status')
- .description('Show KTX MCP daemon status')
+ .description('Show ktx MCP daemon status')
.action(async (_options, command) => {
await printMcpStatus(context, resolveCommandProjectDir(command));
});
mcp
.command('logs')
- .description('Print the KTX MCP daemon log')
+ .description('Print the ktx MCP daemon log')
.option('--follow', 'Follow log output', false)
.action(async (options, command) => {
const logPath = mcpDaemonLayout(resolveCommandProjectDir(command)).logPath;
diff --git a/packages/cli/src/commands/runtime-commands.ts b/packages/cli/src/commands/runtime-commands.ts
index ef0d06cc..441cceda 100644
--- a/packages/cli/src/commands/runtime-commands.ts
+++ b/packages/cli/src/commands/runtime-commands.ts
@@ -18,7 +18,7 @@ async function runRuntimeArgs(context: KtxCliCommandContext, args: KtxRuntimeArg
export function registerRuntimeCommands(program: Command, context: KtxCliCommandContext): void {
const runtime = program
.command('runtime')
- .description('Install, start, stop, and inspect the KTX-managed Python runtime')
+ .description('Install, start, stop, and inspect the ktx-managed Python runtime')
.showHelpAfterError();
runtime
@@ -38,7 +38,7 @@ export function registerRuntimeCommands(program: Command, context: KtxCliCommand
runtime
.command('start')
- .description('Start the KTX daemon')
+ .description('Start the ktx daemon')
.addOption(createRuntimeFeatureOption())
.option('--force', 'Restart even when a matching daemon is already running', false)
.action(async (options: { feature: RuntimeFeature; force?: boolean }, command: CommandWithGlobalOptions) => {
@@ -53,8 +53,8 @@ export function registerRuntimeCommands(program: Command, context: KtxCliCommand
runtime
.command('stop')
- .description('Stop the KTX daemon')
- .option('--all', 'Stop all KTX daemon processes recorded or discoverable on this machine', false)
+ .description('Stop the ktx daemon')
+ .option('--all', 'Stop all ktx daemon processes recorded or discoverable on this machine', false)
.action(async (options: { all?: boolean }, command: CommandWithGlobalOptions) => {
await runRuntimeArgs(context, {
command: 'stop',
diff --git a/packages/cli/src/commands/setup-commands.ts b/packages/cli/src/commands/setup-commands.ts
index 1619a80a..a37b7eb6 100644
--- a/packages/cli/src/commands/setup-commands.ts
+++ b/packages/cli/src/commands/setup-commands.ts
@@ -2,7 +2,7 @@ import { type Command, InvalidArgumentError, Option } from '@commander-js/extra-
import type { KtxCliCommandContext } from '../cli-program.js';
import { resolveCommandProjectDir } from '../cli-program.js';
import type { KtxSetupDatabaseDriver } from '../setup-databases.js';
-import type { KtxSetupLlmBackend } from '../setup-models.js';
+import { isKtxSetupLlmBackend, type KtxSetupLlmBackend } from '../setup-models.js';
import type { KtxSetupSourceType } from '../setup-sources.js';
async function runSetupArgs(
@@ -16,7 +16,7 @@ async function runSetupArgs(
function positiveInteger(value: string): number {
const parsed = Number.parseInt(value, 10);
if (!Number.isInteger(parsed) || parsed <= 0) {
- throw new Error(`Expected a positive integer, received ${value}`);
+ throw new InvalidArgumentError(`Expected a positive integer, received ${value}`);
}
return parsed;
}
@@ -29,7 +29,7 @@ function embeddingBackend(value: string): 'openai' | 'sentence-transformers' {
}
function llmBackend(value: string): KtxSetupLlmBackend {
- if (value === 'anthropic' || value === 'vertex' || value === 'claude-code' || value === 'codex') {
+ if (isKtxSetupLlmBackend(value)) {
return value;
}
throw new InvalidArgumentError(`invalid choice '${value}'`);
@@ -89,13 +89,13 @@ function shouldShowSetupEntryMenu(
target?: string;
global?: boolean;
local?: boolean;
+ installDir?: string;
skipAgents?: boolean;
yes?: boolean;
input?: boolean;
llmBackend?: KtxSetupLlmBackend;
anthropicApiKeyEnv?: string;
anthropicApiKeyFile?: string;
- llmModel?: string;
vertexProject?: string;
vertexLocation?: string;
skipLlm?: boolean;
@@ -160,13 +160,13 @@ function shouldShowSetupEntryMenu(
'target',
'global',
'local',
+ 'installDir',
'skipAgents',
'yes',
'input',
'llmBackend',
'anthropicApiKeyEnv',
'anthropicApiKeyFile',
- 'llmModel',
'vertexProject',
'vertexLocation',
'skipLlm',
@@ -204,8 +204,8 @@ function shouldShowSetupEntryMenu(
export function registerSetupCommands(program: Command, context: KtxCliCommandContext): void {
const setup = program
.command('setup')
- .description('Set up or resume a local KTX project')
- .addOption(new Option('--project-dir ', 'KTX project directory').hideHelp())
+ .description('Set up or resume a local ktx project')
+ .addOption(new Option('--project-dir ', 'ktx project directory').hideHelp())
.option('--agents', 'Install agent integration only', false)
.addOption(
new Option('--target ', 'Agent target').choices([
@@ -219,6 +219,10 @@ export function registerSetupCommands(program: Command, context: KtxCliCommandCo
)
.option('--global', 'Install agent integration into the global target scope', false)
.option('--local', 'Install Claude Code MCP config into the private per-project ~/.claude.json scope', false)
+ .option(
+ '--install-dir ',
+ 'Directory to install project-scoped agent config into (defaults to the ktx project directory)',
+ )
.addOption(new Option('--skip-agents', 'Leave agent integration incomplete for now').hideHelp().default(false))
.option('--yes', 'Accept project creation and runtime install defaults where setup confirms', false)
.option('--no-input', 'Disable interactive terminal input')
@@ -229,7 +233,6 @@ export function registerSetupCommands(program: Command, context: KtxCliCommandCo
.addOption(
new Option('--anthropic-api-key-file ', 'File containing the Anthropic API key').hideHelp(),
)
- .addOption(new Option('--llm-model ', 'LLM model ID or backend model alias').hideHelp())
.addOption(new Option('--vertex-project ', 'Google Vertex AI project ID, env:NAME, or file:/path').hideHelp())
.addOption(new Option('--vertex-location ', 'Google Vertex AI location, env:NAME, or file:/path').hideHelp())
.addOption(new Option('--skip-llm', 'Leave LLM setup incomplete for now').hideHelp().default(false))
@@ -298,7 +301,7 @@ export function registerSetupCommands(program: Command, context: KtxCliCommandCo
.hideHelp(),
)
.addOption(
- new Option('--skip-databases', 'Leave database setup incomplete; KTX cannot work until a database is added')
+ new Option('--skip-databases', 'Leave database setup incomplete; ktx cannot work until a database is added')
.hideHelp()
.default(false),
)
@@ -397,6 +400,16 @@ export function registerSetupCommands(program: Command, context: KtxCliCommandCo
context.setExitCode(1);
return;
}
+ if (options.installDir && (options.global || options.local)) {
+ context.io.stderr.write('Choose either --install-dir or a scope flag (--global / --local), not both.\n');
+ context.setExitCode(1);
+ return;
+ }
+ if (options.installDir && options.target === 'claude-desktop') {
+ context.io.stderr.write('--install-dir does not apply to --target claude-desktop, which is always global.\n');
+ context.setExitCode(1);
+ return;
+ }
const creatingDatabaseConnection = options.database.length > 0 || options.databaseUrl !== undefined;
if (creatingDatabaseConnection && options.databaseConnectionId.length > 1) {
@@ -406,6 +419,8 @@ export function registerSetupCommands(program: Command, context: KtxCliCommandCo
}
const resolvedAgentScope = options.local ? 'local' : options.global ? 'global' : 'project';
+ const debugEnabled =
+ ((command.optsWithGlobals ? command.optsWithGlobals() : command.opts()) as { debug?: unknown }).debug === true;
await runSetupArgs(context, {
command: 'run',
projectDir: resolveCommandProjectDir(command),
@@ -413,14 +428,15 @@ export function registerSetupCommands(program: Command, context: KtxCliCommandCo
agents: options.agents === true,
...(options.target ? { target: options.target } : {}),
agentScope: resolvedAgentScope,
+ ...(options.installDir ? { installRoot: options.installDir } : {}),
skipAgents: options.skipAgents === true,
inputMode: options.input === false ? 'disabled' : 'auto',
+ ...(debugEnabled ? { debug: true } : {}),
yes: options.yes === true,
cliVersion: context.packageInfo.version,
...(options.llmBackend ? { llmBackend: options.llmBackend } : {}),
...(options.anthropicApiKeyEnv ? { anthropicApiKeyEnv: options.anthropicApiKeyEnv } : {}),
...(options.anthropicApiKeyFile ? { anthropicApiKeyFile: options.anthropicApiKeyFile } : {}),
- ...(options.llmModel ? { llmModel: options.llmModel } : {}),
...(options.vertexProject ? { vertexProject: options.vertexProject } : {}),
...(options.vertexLocation ? { vertexLocation: options.vertexLocation } : {}),
skipLlm: options.skipLlm === true,
diff --git a/packages/cli/src/commands/sl-commands.ts b/packages/cli/src/commands/sl-commands.ts
index 8f2f05a3..7b99e593 100644
--- a/packages/cli/src/commands/sl-commands.ts
+++ b/packages/cli/src/commands/sl-commands.ts
@@ -44,7 +44,7 @@ export function registerSlCommands(program: Command, context: KtxCliCommandConte
.description('List, search, validate, or query local semantic-layer sources')
.usage('[options] [query...]')
.argument('[query...]', 'Search query; omit to list all sources')
- .option('--connection-id ', 'KTX connection id')
+ .option('--connection-id ', 'ktx connection id')
.option('--limit ', 'Maximum search results (search mode only)', parsePositiveIntegerOption)
.addOption(
new Option('--output ', 'Output mode: pretty (default in TTY), plain (TSV), or json').choices([
diff --git a/packages/cli/src/commands/sql-commands.ts b/packages/cli/src/commands/sql-commands.ts
index 0c73df6a..213ab059 100644
--- a/packages/cli/src/commands/sql-commands.ts
+++ b/packages/cli/src/commands/sql-commands.ts
@@ -26,7 +26,7 @@ export function registerSqlCommands(program: Command, context: KtxCliCommandCont
.command('sql')
.description('Execute parser-validated read-only SQL against a configured connection')
.argument('', 'SQL query to execute')
- .requiredOption('-c, --connection ', 'KTX connection id')
+ .requiredOption('-c, --connection ', 'ktx connection id')
.option('--max-rows ', 'Maximum rows to return', parseSqlMaxRowsOption, DEFAULT_MAX_ROWS)
.addOption(
new Option('--output ', 'Output mode: pretty (default), plain (TSV), or json').choices([
diff --git a/packages/cli/src/commands/status-commands.ts b/packages/cli/src/commands/status-commands.ts
index ec429576..1f72a385 100644
--- a/packages/cli/src/commands/status-commands.ts
+++ b/packages/cli/src/commands/status-commands.ts
@@ -15,7 +15,7 @@ function inputMode(options: { input?: boolean }): { inputMode?: 'disabled' } {
export function registerStatusCommands(program: Command, context: KtxCliCommandContext): void {
program
.command('status')
- .description('Check current KTX setup and project readiness')
+ .description('Check current ktx setup and project readiness')
.option('--json', 'Print JSON output', false)
.option('-v, --verbose', 'Show every check, including passing ones', false)
.option('--validate', 'Only validate the ktx.yaml schema; skip readiness checks', false)
diff --git a/packages/cli/src/community-cta.ts b/packages/cli/src/community-cta.ts
new file mode 100644
index 00000000..b4702542
--- /dev/null
+++ b/packages/cli/src/community-cta.ts
@@ -0,0 +1,28 @@
+import type { KtxCliIo } from './cli-runtime.js';
+import { isWritableTtyOutput } from './io/tty.js';
+import { dim } from './io/symbols.js';
+import { SLACK_URL } from './links.js';
+
+type ErrorCtaVariant = 'error' | 'crash';
+
+/** @internal */
+export const SLACK_HELP_FOOTER = `Community & support: ${SLACK_URL}`;
+
+/** @internal */
+export const SLACK_SETUP_NOTE = {
+ title: 'Community',
+ body: `Questions or feedback? Join the ktx Slack: ${SLACK_URL}`,
+} as const;
+
+export function writeErrorCommunityHint(io: KtxCliIo, variant: ErrorCtaVariant): void {
+ if (!isWritableTtyOutput(io.stderr)) {
+ return;
+ }
+
+ const line =
+ variant === 'crash'
+ ? `This may be a bug - report it or ask in the ktx community: ${SLACK_URL}`
+ : `Stuck? The ktx community can help: ${SLACK_URL}`;
+
+ io.stderr.write(`${dim(line)}\n`);
+}
diff --git a/packages/cli/src/connection.ts b/packages/cli/src/connection.ts
index 96281e82..d12dccb7 100644
--- a/packages/cli/src/connection.ts
+++ b/packages/cli/src/connection.ts
@@ -6,6 +6,7 @@ import { type NotionBotInfo, NotionClient } from './context/ingest/adapters/noti
import { createLocalLookerCredentialResolver } from './context/ingest/adapters/looker/local-looker.adapter.js';
import { metabaseRuntimeConfigFromLocalConnection } from './context/ingest/adapters/metabase/local-metabase.adapter.js';
import { testRepoConnection } from './context/ingest/repo-fetch.js';
+import { federatedConnectionListing } from './context/connections/federation.js';
import { getDriverRegistration } from './context/connections/drivers.js';
import { parseNotionConnectionConfig, resolveNotionConnectionAuthToken } from './context/connections/notion-config.js';
import { resolveKtxConfigReference } from './context/core/config-reference.js';
@@ -16,7 +17,8 @@ import { bold, dim, green, red, SYMBOLS } from './io/symbols.js';
import { createKtxCliScanConnector } from './local-scan-connectors.js';
import { profileMark } from './startup-profile.js';
import { isDemoConnection } from './telemetry/demo-detect.js';
-import { emitTelemetryEvent } from './telemetry/index.js';
+import { emitTelemetryEvent, reportException } from './telemetry/index.js';
+import { collectTelemetryRedactionSecrets } from './telemetry/redaction-secrets.js';
import { formatErrorDetail, scrubErrorClass } from './telemetry/scrubber.js';
profileMark('module:connection');
@@ -74,6 +76,12 @@ async function testNativeConnection(
}
const result = await connector.testConnection();
if (!result.success) {
+ // Re-throw the driver's original error so connection_test telemetry records
+ // its real class (e.g. ConnectionError) and code (e.g. ELOGIN) instead of
+ // collapsing every native failure to a generic Error with no code.
+ if (result.cause instanceof Error) {
+ throw result.cause;
+ }
throw new Error(result.error ?? 'connection test failed');
}
return { driver: connector.driver };
@@ -127,7 +135,7 @@ async function createDefaultLookerClient(
connectionId: string,
): Promise<LookerTestPort> {
const factory = new DefaultLookerConnectionClientFactory(createLocalLookerCredentialResolver(project));
- return (await factory.createClient(connectionId)) as unknown as LookerTestPort;
+ return factory.createLookerClient(connectionId);
}
async function testLookerConnection(
@@ -318,6 +326,21 @@ async function emitConnectionTest(input: {
...(errorDetail ? { errorDetail } : {}),
},
});
+ if (input.error) {
+ await reportException({
+ error: input.error,
+ context: { source: 'connection test', handled: true, fatal: false },
+ projectDir: input.project.projectDir,
+ io: input.io,
+ redactionSecrets: await collectTelemetryRedactionSecrets({
+ project: input.project,
+ connectionId: input.connectionId,
+ includeLlm: false,
+ includeEmbeddings: false,
+ env: process.env,
+ }),
+ });
+ }
}
function visualWidth(text: string): number {
@@ -425,15 +448,23 @@ export async function runKtxConnection(
io.stdout.write('No connections configured. Run `ktx setup` to add one.\n');
return 0;
}
- const idWidth = Math.max('ID'.length, ...entries.map(([id]) => id.length));
- const driverWidth = Math.max(
- 'DRIVER'.length,
+ const federated = federatedConnectionListing(project.config.connections, args.projectDir);
+ const idCandidates = [...entries.map(([id]) => id), ...(federated ? [federated.id] : [])];
+ const driverLengths = [
...entries.map(([, c]) => (c.driver ?? 'unknown').length),
- );
+ ...(federated ? [federated.driver.length] : []),
+ ];
+ const idWidth = Math.max('ID'.length, ...idCandidates.map((id) => id.length));
+ const driverWidth = Math.max('DRIVER'.length, ...driverLengths);
io.stdout.write(`${'ID'.padEnd(idWidth)} ${'DRIVER'.padEnd(driverWidth)}\n`);
for (const [id, connection] of entries) {
io.stdout.write(`${id.padEnd(idWidth)} ${(connection.driver ?? 'unknown').padEnd(driverWidth)}\n`);
}
+ if (federated) {
+ io.stdout.write(`${federated.id.padEnd(idWidth)} ${federated.driver.padEnd(driverWidth)}\n`);
+ io.stdout.write(` federates: ${federated.members.join(', ')}\n`);
+ io.stdout.write(` ${federated.hint}\n`);
+ }
return 0;
}
diff --git a/packages/cli/src/connectors/bigquery/connector.ts b/packages/cli/src/connectors/bigquery/connector.ts
index edebe284..0b30c025 100644
--- a/packages/cli/src/connectors/bigquery/connector.ts
+++ b/packages/cli/src/connectors/bigquery/connector.ts
@@ -5,7 +5,9 @@ import { assertReadOnlySql, limitSqlForExecution } from '../../context/connectio
import { tryConstraintQuery } from '../../context/scan/constraint-discovery.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import {
+ connectorTestFailure,
createKtxConnectorCapabilities,
+ type KtxConnectorTestResult,
type KtxColumnSampleInput,
type KtxColumnSampleResult,
type KtxColumnStatsInput,
@@ -24,9 +26,7 @@ import {
type KtxTableSampleInput,
type KtxTableSampleResult,
} from '../../context/scan/types.js';
-import { readFileSync } from 'node:fs';
-import { homedir } from 'node:os';
-import { resolve } from 'node:path';
+import { resolveStringReference } from '../shared/string-reference.js';
export interface KtxBigQueryConnectionConfig {
driver?: string;
@@ -136,18 +136,6 @@ class DefaultBigQueryClientFactory implements KtxBigQueryClientFactory {
}
}
-function resolveStringReference(value: string, env: NodeJS.ProcessEnv): string {
- if (value.startsWith('env:')) {
- return env[value.slice('env:'.length)] ?? '';
- }
- if (value.startsWith('file:')) {
- const rawPath = value.slice('file:'.length);
- const path = rawPath.startsWith('~') ? resolve(homedir(), rawPath.slice(1)) : rawPath;
- return readFileSync(path, 'utf-8').trim();
- }
- return value;
-}
-
function stringConfigValue(
connection: KtxBigQueryConnectionConfig | undefined,
key: keyof KtxBigQueryConnectionConfig,
@@ -320,7 +308,7 @@ export class KtxBigQueryScanConnector implements KtxScanConnector {
this.id = `bigquery:${options.connectionId}`;
}
- async testConnection(): Promise<{ success: boolean; error?: string }> {
+ async testConnection(): Promise<KtxConnectorTestResult> {
try {
const client = this.getClient();
await client.getDatasets({ maxResults: 1 });
@@ -329,7 +317,7 @@ export class KtxBigQueryScanConnector implements KtxScanConnector {
}
return { success: true };
} catch (error) {
- return { success: false, error: error instanceof Error ? error.message : String(error) };
+ return connectorTestFailure(error);
}
}
diff --git a/packages/cli/src/connectors/clickhouse/connector.ts b/packages/cli/src/connectors/clickhouse/connector.ts
index 74ef7a77..38a477e7 100644
--- a/packages/cli/src/connectors/clickhouse/connector.ts
+++ b/packages/cli/src/connectors/clickhouse/connector.ts
@@ -1,12 +1,10 @@
import { createClient } from '@clickhouse/client';
import { getDialectForDriver } from '../../context/connections/dialects.js';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
-import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableRef, type KtxTableSampleInput, type KtxTableListEntry, type KtxTableSampleResult } from '../../context/scan/types.js';
+import { connectorTestFailure, createKtxConnectorCapabilities, type KtxConnectorTestResult, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaColumn, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableRef, type KtxTableSampleInput, type KtxTableListEntry, type KtxTableSampleResult } from '../../context/scan/types.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
-import { readFileSync } from 'node:fs';
+import { resolveStringReference } from '../shared/string-reference.js';
import { Agent as HttpsAgent } from 'node:https';
-import { homedir } from 'node:os';
-import { resolve } from 'node:path';
export interface KtxClickHouseConnectionConfig {
driver?: string;
@@ -142,19 +140,6 @@ function stringConfigValue(
return typeof value === 'string' && value.trim().length > 0 ? resolveStringReference(value.trim(), env) : undefined;
}
-function resolveStringReference(value: string, env: NodeJS.ProcessEnv): string {
- if (value.startsWith('env:')) {
- const envName = value.slice('env:'.length);
- return env[envName] ?? '';
- }
- if (value.startsWith('file:')) {
- const rawPath = value.slice('file:'.length);
- const path = rawPath.startsWith('~') ? resolve(homedir(), rawPath.slice(1)) : rawPath;
- return readFileSync(path, 'utf-8').trim();
- }
- return value;
-}
-
function maybeNumber(value: unknown): number | undefined {
return typeof value === 'number' && Number.isFinite(value) ? value : undefined;
}
@@ -317,12 +302,12 @@ export class KtxClickHouseScanConnector implements KtxScanConnector {
this.id = `clickhouse:${options.connectionId}`;
}
- async testConnection(): Promise<{ success: boolean; error?: string }> {
+ async testConnection(): Promise<KtxConnectorTestResult> {
try {
await this.query('SELECT 1');
return { success: true };
} catch (error) {
- return { success: false, error: error instanceof Error ? error.message : String(error) };
+ return connectorTestFailure(error);
}
}
@@ -645,7 +630,7 @@ export class KtxClickHouseScanConnector implements KtxScanConnector {
private assertConnection(connectionId: string): void {
if (connectionId !== this.connectionId) {
- throw new Error(`KTX ClickHouse connector ${this.id} cannot serve connection ${connectionId}`);
+ throw new Error(`ktx ClickHouse connector ${this.id} cannot serve connection ${connectionId}`);
}
}
}
diff --git a/packages/cli/src/connectors/duckdb/federated-attach.ts b/packages/cli/src/connectors/duckdb/federated-attach.ts
new file mode 100644
index 00000000..edcb94eb
--- /dev/null
+++ b/packages/cli/src/connectors/duckdb/federated-attach.ts
@@ -0,0 +1,90 @@
+import { sqliteDatabasePathFromConfig, type KtxSqliteConnectionConfig } from '../sqlite/connector.js';
+import { postgresPoolConfigFromConfig, type KtxPostgresConnectionConfig } from '../postgres/connector.js';
+import {
+ mysqlConnectionPoolConfigFromConfig,
+ type KtxMysqlConnectionConfig,
+} from '../mysql/connector.js';
+import type { FederatedMember } from '../../context/connections/federation.js';
+
+function kvKeyword(value: string): string {
+ // libpq/DuckDB key-value values quote with single quotes and backslash-escape.
+ return /[\s'\\]/.test(value) ? `'${value.replaceAll('\\', '\\\\').replaceAll("'", "\\'")}'` : value;
+}
+
+function withRequiredSslMode(connectionString: string): string {
+ // DuckDB passes this libpq URL straight to the server, so an ssl:true member
+ // must carry sslmode in the URL itself; keep a stronger mode the URL already pins.
+ const url = new URL(connectionString);
+ if (url.searchParams.has('sslmode')) {
+ return connectionString;
+ }
+ url.searchParams.set('sslmode', 'require');
+ return url.toString();
+}
+
+function postgresAttachString(member: FederatedMember, env: NodeJS.ProcessEnv): string {
+ const cfg = postgresPoolConfigFromConfig({
+ connectionId: member.connectionId,
+ connection: member.connection as KtxPostgresConnectionConfig,
+ env,
+ });
+ if (cfg.connectionString) {
+ return cfg.ssl ? withRequiredSslMode(cfg.connectionString) : cfg.connectionString;
+ }
+ const parts: string[] = [];
+ if (cfg.host) parts.push(`host=${kvKeyword(cfg.host)}`);
+ if (cfg.port) parts.push(`port=${cfg.port}`);
+ if (cfg.database) parts.push(`dbname=${kvKeyword(cfg.database)}`);
+ if (cfg.user) parts.push(`user=${kvKeyword(cfg.user)}`);
+ if (cfg.password) parts.push(`password=${kvKeyword(cfg.password)}`);
+ if (cfg.ssl) {
+ parts.push('sslmode=require');
+ }
+ if (cfg.options) {
+ parts.push(`options=${kvKeyword(cfg.options)}`);
+ }
+ return parts.join(' ');
+}
+
+function mysqlAttachString(member: FederatedMember, env: NodeJS.ProcessEnv): string {
+ const cfg = mysqlConnectionPoolConfigFromConfig({
+ connectionId: member.connectionId,
+ connection: member.connection as KtxMysqlConnectionConfig,
+ env,
+ });
+ const parts: string[] = [
+ `host=${kvKeyword(cfg.host)}`,
+ `port=${cfg.port}`,
+ `database=${kvKeyword(cfg.database)}`,
+ `user=${kvKeyword(cfg.user)}`,
+ ];
+ if (cfg.password) {
+ parts.push(`password=${kvKeyword(cfg.password)}`);
+ }
+ if (cfg.ssl) {
+ parts.push('ssl_mode=REQUIRED');
+ }
+ return parts.join(' ');
+}
+
+/**
+ * Resolves a federated member's ktx.yaml config into the connection target
+ * DuckDB's ATTACH wants for that driver, reusing each connector's canonical
+ * resolver so federation and standalone scans agree on config interpretation.
+ */
+export function federatedAttachTarget(member: FederatedMember, env: NodeJS.ProcessEnv): string {
+ switch (member.driver.toLowerCase()) {
+ case 'sqlite':
+ return sqliteDatabasePathFromConfig({
+ connectionId: member.connectionId,
+ projectDir: member.projectDir,
+ connection: member.connection as KtxSqliteConnectionConfig,
+ });
+ case 'postgres':
+ return postgresAttachString(member, env);
+ case 'mysql':
+ return mysqlAttachString(member, env);
+ default:
+ throw new Error(`Driver "${member.driver}" cannot be attached by DuckDB federation.`);
+ }
+}
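Reviewer note: the key-value quoting used by `postgresAttachString` and `mysqlAttachString` can be checked in isolation. The sketch below re-implements `kvKeyword` outside the patch, with regex `replace` standing in for `replaceAll` so it runs on older targets. The host, database name, and password are made-up sample values, not taken from any real config.

```typescript
// Standalone sketch of the quoting rule from federated-attach.ts.
// Values containing whitespace, a single quote, or a backslash are wrapped in
// single quotes, with backslashes and quotes backslash-escaped.
function kvKeyword(value: string): string {
  if (!/[\s'\\]/.test(value)) {
    return value; // simple values pass through unquoted
  }
  const escaped = value.replace(/\\/g, '\\\\').replace(/'/g, "\\'");
  return `'${escaped}'`;
}

// Hypothetical sample values, for illustration only.
const dsn = [
  `host=${kvKeyword('db.internal')}`,
  `dbname=${kvKeyword('sales data')}`,
  `password=${kvKeyword("pa'ss")}`,
].join(' ');

console.log(dsn); // host=db.internal dbname='sales data' password='pa\'ss'
```

Backslashes are escaped before quotes so that the backslash added in front of a quote is not itself doubled.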
diff --git a/packages/cli/src/connectors/duckdb/federated-executor.ts b/packages/cli/src/connectors/duckdb/federated-executor.ts
new file mode 100644
index 00000000..7972166c
--- /dev/null
+++ b/packages/cli/src/connectors/duckdb/federated-executor.ts
@@ -0,0 +1,78 @@
+import { DuckDBInstance } from '@duckdb/node-api';
+import { federatedAttachTarget } from './federated-attach.js';
+import type {
+ KtxSqlQueryExecutionInput,
+ KtxSqlQueryExecutionResult,
+} from '../../context/connections/query-executor.js';
+import { normalizeQueryRows } from '../../context/connections/query-executor.js';
+import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
+import { attachTypeForDriver, type FederatedMember } from '../../context/connections/federation.js';
+
+function quoteDuckdbIdentifier(id: string): string {
+ return `"${id.replaceAll('"', '""')}"`;
+}
+
+const MIN_SAFE_BIGINT = BigInt(Number.MIN_SAFE_INTEGER);
+const MAX_SAFE_BIGINT = BigInt(Number.MAX_SAFE_INTEGER);
+
+// DuckDB returns integer columns as JS bigint (unserializable by JSON). Values
+// in Number's safe range become Number; larger magnitudes become strings so a
+// BIGINT beyond 2^53 keeps its exact value instead of silently rounding.
+function jsonSafeBigint(value: bigint): number | string {
+ return value >= MIN_SAFE_BIGINT && value <= MAX_SAFE_BIGINT ? Number(value) : value.toString();
+}
+
+function toJsonSafeRows(rows: unknown[][]): unknown[][] {
+ return rows.map((row) => row.map((cell) => (typeof cell === 'bigint' ? jsonSafeBigint(cell) : cell)));
+}
+
+/** @internal */
+export function buildAttachStatements(members: FederatedMember[], env: NodeJS.ProcessEnv): string[] {
+ const attachments = members.map((member) => ({
+ type: attachTypeForDriver(member.driver),
+ url: federatedAttachTarget(member, env),
+ alias: member.connectionId,
+ }));
+
+ const loadStatements = [...new Set(attachments.map((a) => a.type))].map(
+ (type) => `INSTALL ${type}; LOAD ${type};`,
+ );
+ const attachStatements = attachments.map(
+ ({ type, url, alias }) =>
+ `ATTACH '${url.replaceAll("'", "''")}' AS ${quoteDuckdbIdentifier(alias)} (TYPE ${type}, READ_ONLY);`,
+ );
+ return [...loadStatements, ...attachStatements];
+}
+
+export async function executeFederatedQuery(
+ members: FederatedMember[],
+ input: KtxSqlQueryExecutionInput,
+ env: NodeJS.ProcessEnv = process.env,
+): Promise<KtxSqlQueryExecutionResult> {
+ const sql = limitSqlForExecution(assertReadOnlySql(input.sql), input.maxRows);
+ const attachStatements = buildAttachStatements(members, env);
+
+ const instance = await DuckDBInstance.create(':memory:');
+ try {
+ const connection = await instance.connect();
+ try {
+ for (const statement of attachStatements) {
+ await connection.run(statement);
+ }
+ const reader = await connection.runAndReadAll(sql);
+ const rows = toJsonSafeRows(normalizeQueryRows(reader.getRows()));
+ const headers = reader.columnNames();
+ return {
+ headers,
+ rows,
+ totalRows: rows.length,
+ command: 'SELECT',
+ rowCount: rows.length,
+ };
+ } finally {
+ connection.closeSync();
+ }
+ } finally {
+ instance.closeSync();
+ }
+}
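Reviewer note: the bigint handling in `toJsonSafeRows` matters because `JSON.stringify` throws on `bigint` values. The sketch below copies the conversion rule into a standalone snippet; the sample row is invented for illustration, and `BigInt(...)` calls are used instead of `n` literals only so the snippet compiles on older targets.

```typescript
const MIN_SAFE_BIGINT = BigInt(Number.MIN_SAFE_INTEGER);
const MAX_SAFE_BIGINT = BigInt(Number.MAX_SAFE_INTEGER);

// Safe-range values become numbers; anything larger becomes a string so the
// exact value survives serialization instead of being rounded.
function jsonSafeBigint(value: bigint): number | string {
  return value >= MIN_SAFE_BIGINT && value <= MAX_SAFE_BIGINT ? Number(value) : value.toString();
}

function toJsonSafeRows(rows: unknown[][]): unknown[][] {
  return rows.map((row) => row.map((cell) => (typeof cell === 'bigint' ? jsonSafeBigint(cell) : cell)));
}

// Hypothetical row: one small integer, one value just past 2^53, plus non-bigint cells.
const sampleRows: unknown[][] = [[BigInt(42), BigInt('9007199254740993'), 'text', null]];
const serialized = JSON.stringify(toJsonSafeRows(sampleRows));

console.log(serialized); // [[42,"9007199254740993","text",null]]
```

Consumers of the JSON output should therefore expect an integer column to contain a mix of numbers and strings when some values exceed the safe range.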
diff --git a/packages/cli/src/connectors/mysql/connector.ts b/packages/cli/src/connectors/mysql/connector.ts
index 29dacc26..5bddec53 100644
--- a/packages/cli/src/connectors/mysql/connector.ts
+++ b/packages/cli/src/connectors/mysql/connector.ts
@@ -1,8 +1,6 @@
import mysql, { type FieldPacket, type Pool, type RowDataPacket } from 'mysql2/promise';
-import { readFileSync } from 'node:fs';
-import { homedir } from 'node:os';
-import { resolve } from 'node:path';
import { getDialectForDriver } from '../../context/connections/dialects.js';
+import { resolveStringReference } from '../shared/string-reference.js';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import {
constraintDiscoveryWarning,
@@ -11,7 +9,9 @@ import {
} from '../../context/scan/constraint-discovery.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import {
+ connectorTestFailure,
createKtxConnectorCapabilities,
+ type KtxConnectorTestResult,
type KtxColumnSampleInput,
type KtxColumnSampleResult,
type KtxColumnStatsInput,
@@ -157,6 +157,15 @@ interface MysqlDistinctValueRow extends RowDataPacket {
val: unknown;
}
+interface MysqlStatsRow extends RowDataPacket {
+ column_name: string;
+ estimated_cardinality: number | null;
+}
+
+export interface KtxMysqlColumnStatisticsResult {
+ cardinalityByColumn: Map<string, number>;
+}
+
class DefaultMysqlPoolFactory implements KtxMysqlPoolFactory {
createPool(config: KtxMysqlPoolConfig): KtxMysqlPool {
return mysql.createPool(config) as Pool;
@@ -172,19 +181,6 @@ function stringConfigValue(
return typeof value === 'string' && value.trim().length > 0 ? resolveStringReference(value.trim(), env) : undefined;
}
-function resolveStringReference(value: string, env: NodeJS.ProcessEnv): string {
- if (value.startsWith('env:')) {
- const envName = value.slice('env:'.length);
- return env[envName] ?? '';
- }
- if (value.startsWith('file:')) {
- const rawPath = value.slice('file:'.length);
- const path = rawPath.startsWith('~') ? resolve(homedir(), rawPath.slice(1)) : rawPath;
- return readFileSync(path, 'utf-8').trim();
- }
- return value;
-}
-
function maybeNumber(value: unknown): number | undefined {
return typeof value === 'number' && Number.isFinite(value) ? value : undefined;
}
@@ -382,7 +378,7 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
readonly capabilities = createKtxConnectorCapabilities({
tableSampling: true,
columnSampling: true,
- columnStats: false,
+ columnStats: true,
readOnlySql: true,
nestedAnalysis: true,
formalForeignKeys: true,
@@ -413,12 +409,12 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
this.id = `mysql:${options.connectionId}`;
}
- async testConnection(): Promise<{ success: boolean; error?: string }> {
+ async testConnection(): Promise<KtxConnectorTestResult> {
try {
await this.query('SELECT 1');
return { success: true };
} catch (error) {
- return { success: false, error: error instanceof Error ? error.message : String(error) };
+ return connectorTestFailure(error);
}
}
@@ -560,8 +556,29 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
return { values, nullCount: null, distinctCount: null };
}
- async columnStats(_input: KtxColumnStatsInput, _ctx: KtxScanContext): Promise<KtxColumnStatsResult | null> {
- return null;
+ async columnStats(input: KtxColumnStatsInput, _ctx: KtxScanContext): Promise<KtxColumnStatsResult | null> {
+ const stats = await this.getColumnStatistics(input.table);
+ const value = stats?.cardinalityByColumn.get(input.column);
+ return value === undefined
+ ? null
+ : { min: null, max: null, average: null, nullCount: null, distinctCount: value };
+ }
+
+ async getColumnStatistics(table: KtxTableRef): Promise<KtxMysqlColumnStatisticsResult | null> {
+ const schema = table.db ?? this.poolConfig.database;
+ const sql = this.dialect.generateColumnStatisticsQuery(schema, table.name);
+ if (!sql) {
+ return null;
+ }
+ const rows = await this.queryRaw(sql);
+ const cardinalityByColumn = new Map();
+ for (const row of rows) {
+ const cardinality = Number(row.estimated_cardinality);
+ if (Number.isFinite(cardinality) && cardinality >= 0) {
+ cardinalityByColumn.set(row.column_name, cardinality);
+ }
+ }
+ return cardinalityByColumn.size > 0 ? { cardinalityByColumn } : null;
}
async executeReadOnly(input: KtxMysqlReadOnlyQueryInput, _ctx: KtxScanContext): Promise {
@@ -762,7 +779,7 @@ export class KtxMysqlScanConnector implements KtxScanConnector {
private assertConnection(connectionId: string): void {
if (connectionId !== this.connectionId) {
- throw new Error(`KTX MySQL connector ${this.id} cannot serve connection ${connectionId}`);
+ throw new Error(`ktx MySQL connector ${this.id} cannot serve connection ${connectionId}`);
}
}
}
diff --git a/packages/cli/src/connectors/mysql/dialect.ts b/packages/cli/src/connectors/mysql/dialect.ts
index 7f9cc725..6b26c97a 100644
--- a/packages/cli/src/connectors/mysql/dialect.ts
+++ b/packages/cli/src/connectors/mysql/dialect.ts
@@ -171,8 +171,18 @@ export class KtxMysqlDialect implements KtxDialect {
`;
}
- generateColumnStatisticsQuery(_schemaName: string, _tableName: string): string | null {
- return null;
+ generateColumnStatisticsQuery(schemaName: string, tableName: string): string | null {
+ return `
+ SELECT
+ COLUMN_NAME AS column_name,
+ MAX(CARDINALITY) AS estimated_cardinality
+ FROM INFORMATION_SCHEMA.STATISTICS
+ WHERE TABLE_SCHEMA = '${schemaName.replace(/'/g, "''")}'
+ AND TABLE_NAME = '${tableName.replace(/'/g, "''")}'
+ AND CARDINALITY IS NOT NULL
+ AND SEQ_IN_INDEX = 1
+ GROUP BY COLUMN_NAME
+ `;
}
generateRandomizedCardinalitySampleQuery(tableName: string, columnName: string, sampleSize: number): string {
diff --git a/packages/cli/src/connectors/postgres/connector.ts b/packages/cli/src/connectors/postgres/connector.ts
index f206fa6a..1a2fcd40 100644
--- a/packages/cli/src/connectors/postgres/connector.ts
+++ b/packages/cli/src/connectors/postgres/connector.ts
@@ -1,12 +1,12 @@
-import { readFileSync } from 'node:fs';
-import { homedir } from 'node:os';
-import { resolve } from 'node:path';
+import { resolveStringReference } from '../shared/string-reference.js';
import { getDialectForDriver } from '../../context/connections/dialects.js';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { tryConstraintQuery } from '../../context/scan/constraint-discovery.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import {
+ connectorTestFailure,
createKtxConnectorCapabilities,
+ type KtxConnectorTestResult,
type KtxColumnSampleInput,
type KtxColumnSampleResult,
type KtxColumnStatsInput,
@@ -279,17 +279,6 @@ function stringConfigValue(
return typeof value === 'string' && value.trim().length > 0 ? resolveStringReference(value.trim(), env) : undefined;
}
-function resolveStringReference(value: string, env: NodeJS.ProcessEnv): string {
- if (value.startsWith('env:')) {
- return env[value.slice('env:'.length)] ?? '';
- }
- if (value.startsWith('file:')) {
- const rawPath = value.slice('file:'.length);
- const path = rawPath.startsWith('~') ? resolve(homedir(), rawPath.slice(1)) : rawPath;
- return readFileSync(path, 'utf-8').trim();
- }
- return value;
-}
function numberValue(value: unknown): number | undefined {
return typeof value === 'number' && Number.isFinite(value) ? value : undefined;
@@ -442,12 +431,12 @@ export class KtxPostgresScanConnector implements KtxScanConnector {
this.id = `postgres:${options.connectionId}`;
}
- async testConnection(): Promise<{ success: boolean; error?: string }> {
+ async testConnection(): Promise<KtxConnectorTestResult> {
try {
await this.query('SELECT 1');
return { success: true };
} catch (error) {
- return { success: false, error: error instanceof Error ? error.message : String(error) };
+ return connectorTestFailure(error);
}
}
diff --git a/packages/cli/src/connectors/shared/string-reference.ts b/packages/cli/src/connectors/shared/string-reference.ts
new file mode 100644
index 00000000..7f83736d
--- /dev/null
+++ b/packages/cli/src/connectors/shared/string-reference.ts
@@ -0,0 +1,20 @@
+import { readFileSync } from 'node:fs';
+import { homedir } from 'node:os';
+import { resolve } from 'node:path';
+
+/**
+ * Resolves a config string that may reference an environment variable
+ * (`env:NAME`) or a file (`file:/path`, `~` expands to the home dir).
+ * Plain values pass through unchanged.
+ */
+export function resolveStringReference(value: string, env: NodeJS.ProcessEnv): string {
+ if (value.startsWith('env:')) {
+ return env[value.slice('env:'.length)] ?? '';
+ }
+ if (value.startsWith('file:')) {
+ const rawPath = value.slice('file:'.length);
+ const path = rawPath.startsWith('~') ? resolve(homedir(), rawPath.slice(rawPath[1] === '/' ? 2 : 1)) : rawPath;
+ return readFileSync(path, 'utf-8').trim();
+ }
+ return value;
+}
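Reviewer note: the shared helper is not a verbatim move of the per-connector copies. It slices two characters when the path starts with `~/`, where the old copies always sliced one. That matters because `path.resolve` discards earlier segments once it meets an absolute segment, so the old expression resolved `~/x` against the filesystem root rather than the home directory. The sketch below isolates just that step; `expandTilde` and `expandTildeOld` are hypothetical names for illustration and do not exist in the patch.

```typescript
import { homedir } from 'node:os';
import { resolve } from 'node:path';

// New behavior: drop "~/" (two characters) when a slash follows, else just "~".
function expandTilde(rawPath: string): string {
  if (!rawPath.startsWith('~')) {
    return rawPath;
  }
  return resolve(homedir(), rawPath.slice(rawPath[1] === '/' ? 2 : 1));
}

// Old behavior: always drop one character, leaving a leading "/" that makes
// path.resolve ignore the home directory.
function expandTildeOld(rawPath: string): string {
  return rawPath.startsWith('~') ? resolve(homedir(), rawPath.slice(1)) : rawPath;
}

console.log(expandTilde('~/secrets/key.txt')); // path under the home directory
console.log(expandTildeOld('~/secrets/key.txt')); // path under the filesystem root
```

Note that a `~user/...` form is still treated as a path relative to the current user's home directory, not as another user's home.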
diff --git a/packages/cli/src/connectors/snowflake/connector.ts b/packages/cli/src/connectors/snowflake/connector.ts
index 86d7ebe7..5f016675 100644
--- a/packages/cli/src/connectors/snowflake/connector.ts
+++ b/packages/cli/src/connectors/snowflake/connector.ts
@@ -1,13 +1,13 @@
import { createPrivateKey } from 'node:crypto';
-import { readFileSync } from 'node:fs';
-import { homedir } from 'node:os';
-import { resolve } from 'node:path';
import { getDialectForDriver } from '../../context/connections/dialects.js';
+import { resolveStringReference } from '../shared/string-reference.js';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { tryConstraintQuery } from '../../context/scan/constraint-discovery.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import {
+ connectorTestFailure,
createKtxConnectorCapabilities,
+ type KtxConnectorTestResult,
type KtxColumnSampleInput,
type KtxColumnSampleResult,
type KtxColumnStatsInput,
@@ -105,7 +105,7 @@ export interface KtxSnowflakeScanConnectorOptions {
connectionId: string;
connection: KtxSnowflakeConnectionConfig | undefined;
/**
- * KTX project directory. When provided, snowflake-sdk's logger is redirected to
+ * ktx project directory. When provided, snowflake-sdk's logger is redirected to
* `/.ktx/logs/snowflake.log` so its JSON output does not bleed into
* the CLI's TTY. Tests that use a fake driverFactory can leave this undefined.
*/
@@ -133,18 +133,6 @@ export interface KtxSnowflakeColumnDistinctValuesResult {
const DATE_TYPES = ['DATE', 'TIMESTAMP', 'TIMESTAMP_LTZ', 'TIMESTAMP_NTZ', 'TIMESTAMP_TZ', 'TIME'];
-function resolveStringReference(value: string, env: NodeJS.ProcessEnv): string {
- if (value.startsWith('env:')) {
- return env[value.slice('env:'.length)] ?? '';
- }
- if (value.startsWith('file:')) {
- const rawPath = value.slice('file:'.length);
- const path = rawPath.startsWith('~') ? resolve(homedir(), rawPath.slice(1)) : rawPath;
- return readFileSync(path, 'utf-8').trim();
- }
- return value;
-}
-
function stringConfigValue(
connection: KtxSnowflakeConnectionConfig | undefined,
key: keyof KtxSnowflakeConnectionConfig,
@@ -464,7 +452,7 @@ class SnowflakeSdkDriver implements KtxSnowflakeDriver {
await this.query('SELECT 1');
return { success: true };
} catch (error) {
- return { success: false, error: error instanceof Error ? error.message : String(error) };
+ return connectorTestFailure(error);
}
}
@@ -573,7 +561,7 @@ export class KtxSnowflakeScanConnector implements KtxScanConnector {
}
}
- async testConnection(): Promise<{ success: boolean; error?: string }> {
+ async testConnection(): Promise<KtxConnectorTestResult> {
return this.getDriver().test();
}
diff --git a/packages/cli/src/connectors/sqlite/connector.ts b/packages/cli/src/connectors/sqlite/connector.ts
index e996bc25..4cae8f99 100644
--- a/packages/cli/src/connectors/sqlite/connector.ts
+++ b/packages/cli/src/connectors/sqlite/connector.ts
@@ -6,7 +6,7 @@ import { fileURLToPath } from 'node:url';
import { getDialectForDriver } from '../../context/connections/dialects.js';
import { assertReadOnlySql, limitSqlForExecution } from '../../context/connections/read-only-sql.js';
import { normalizeQueryRows } from '../../context/connections/query-executor.js';
-import { createKtxConnectorCapabilities, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaForeignKey, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableListEntry, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js';
+import { connectorTestFailure, createKtxConnectorCapabilities, type KtxConnectorTestResult, type KtxColumnSampleInput, type KtxColumnSampleResult, type KtxColumnStatsInput, type KtxColumnStatsResult, type KtxQueryResult, type KtxReadOnlyQueryInput, type KtxScanConnector, type KtxScanContext, type KtxScanInput, type KtxSchemaForeignKey, type KtxSchemaSnapshot, type KtxSchemaTable, type KtxTableListEntry, type KtxTableRef, type KtxTableSampleInput, type KtxTableSampleResult } from '../../context/scan/types.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
export interface KtxSqliteConnectionConfig {
@@ -97,30 +97,6 @@ function sqlitePathFromUrl(url: string): string {
return url;
}
-function stripLeadingSqlComments(sql: string): string {
- let index = 0;
- while (index < sql.length) {
- while (/\s/.test(sql[index] ?? '')) {
- index += 1;
- }
- if (sql.startsWith('--', index)) {
- const end = sql.indexOf('\n', index + 2);
- index = end === -1 ? sql.length : end + 1;
- continue;
- }
- if (sql.startsWith('/*', index)) {
- const end = sql.indexOf('*/', index + 2);
- if (end === -1) {
- return sql.slice(index);
- }
- index = end + 2;
- continue;
- }
- break;
- }
- return sql.slice(index);
-}
-
export function isKtxSqliteConnectionConfig(
connection: KtxSqliteConnectionConfig | undefined,
): connection is KtxSqliteConnectionConfig {
@@ -167,7 +143,7 @@ export class KtxSqliteScanConnector implements KtxScanConnector {
this.id = `sqlite:${options.connectionId}`;
}
- async testConnection(): Promise<{ success: boolean; error?: string }> {
+ async testConnection(): Promise<KtxConnectorTestResult> {
try {
if (!existsSync(this.dbPath) || !statSync(this.dbPath).isFile()) {
return { success: false, error: `File not found: ${this.dbPath}` };
@@ -175,7 +151,7 @@ export class KtxSqliteScanConnector implements KtxScanConnector {
this.database().prepare('SELECT 1').get();
return { success: true };
} catch (error) {
- return { success: false, error: error instanceof Error ? error.message : String(error) };
+ return connectorTestFailure(error);
}
}
@@ -255,7 +231,7 @@ export class KtxSqliteScanConnector implements KtxScanConnector {
async executeReadOnly(input: KtxSqliteReadOnlyQueryInput, _ctx: KtxScanContext): Promise {
this.assertConnection(input.connectionId);
- const result = this.query(limitSqlForExecution(stripLeadingSqlComments(input.sql), input.maxRows), input.params);
+ const result = this.query(limitSqlForExecution(input.sql, input.maxRows), input.params);
return { ...result, rowCount: result.rows.length };
}
@@ -379,7 +355,7 @@ export class KtxSqliteScanConnector implements KtxScanConnector {
private assertConnection(connectionId: string): void {
if (connectionId !== this.connectionId) {
- throw new Error(`KTX SQLite connector ${this.id} cannot serve connection ${connectionId}`);
+ throw new Error(`ktx SQLite connector ${this.id} cannot serve connection ${connectionId}`);
}
}
}
diff --git a/packages/cli/src/connectors/sqlserver/connector.ts b/packages/cli/src/connectors/sqlserver/connector.ts
index 0115781d..116fdea7 100644
--- a/packages/cli/src/connectors/sqlserver/connector.ts
+++ b/packages/cli/src/connectors/sqlserver/connector.ts
@@ -1,9 +1,11 @@
-import { assertReadOnlySql } from '../../context/connections/read-only-sql.js';
+import { assertReadOnlySql, stripTrailingSqlNoise } from '../../context/connections/read-only-sql.js';
import { getDialectForDriver } from '../../context/connections/dialects.js';
import { tryConstraintQuery } from '../../context/scan/constraint-discovery.js';
import { scopedTableNames } from '../../context/scan/table-ref.js';
import {
+ connectorTestFailure,
createKtxConnectorCapabilities,
+ type KtxConnectorTestResult,
type KtxColumnSampleInput,
type KtxColumnSampleResult,
type KtxColumnStatsInput,
@@ -23,10 +25,8 @@ import {
type KtxTableSampleInput,
type KtxTableSampleResult,
} from '../../context/scan/types.js';
-import { readFileSync } from 'node:fs';
-import { homedir } from 'node:os';
-import { resolve } from 'node:path';
import sql from 'mssql';
+import { resolveStringReference } from '../shared/string-reference.js';
export interface KtxSqlServerConnectionConfig {
driver?: string;
@@ -206,18 +206,6 @@ function stringConfigValue(
return typeof value === 'string' && value.trim().length > 0 ? resolveStringReference(value.trim(), env) : undefined;
}
-function resolveStringReference(value: string, env: NodeJS.ProcessEnv): string {
- if (value.startsWith('env:')) {
- return env[value.slice('env:'.length)] ?? '';
- }
- if (value.startsWith('file:')) {
- const rawPath = value.slice('file:'.length);
- const path = rawPath.startsWith('~') ? resolve(homedir(), rawPath.slice(1)) : rawPath;
- return readFileSync(path, 'utf-8').trim();
- }
- return value;
-}
-
function parseSqlServerUrl(url: string): Partial<KtxSqlServerConnectionConfig> {
const parsed = new URL(url);
return {
@@ -282,7 +270,7 @@ function isDeniedError(error: unknown): boolean {
}
function limitSqlForSqlServerExecution(sqlText: string, maxRows: number | undefined): string {
- const trimmed = assertReadOnlySql(sqlText).replace(/;+\s*$/, '');
+ const trimmed = stripTrailingSqlNoise(assertReadOnlySql(sqlText));
if (!maxRows) {
return trimmed;
}
@@ -384,12 +372,12 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
this.id = `sqlserver:${options.connectionId}`;
}
- async testConnection(): Promise<{ success: boolean; error?: string }> {
+ async testConnection(): Promise<KtxConnectorTestResult> {
try {
await this.query('SELECT 1');
return { success: true };
} catch (error) {
- return { success: false, error: error instanceof Error ? error.message : String(error) };
+ return connectorTestFailure(error);
}
}
@@ -831,7 +819,7 @@ export class KtxSqlServerScanConnector implements KtxScanConnector {
private assertConnection(connectionId: string): void {
if (connectionId !== this.connectionId) {
- throw new Error(`KTX SQL Server connector ${this.id} cannot serve connection ${connectionId}`);
+ throw new Error(`ktx SQL Server connector ${this.id} cannot serve connection ${connectionId}`);
}
}
}
diff --git a/packages/cli/src/context-build-view.ts b/packages/cli/src/context-build-view.ts
index 0ddd4922..042a517a 100644
--- a/packages/cli/src/context-build-view.ts
+++ b/packages/cli/src/context-build-view.ts
@@ -12,6 +12,13 @@ import { buildPublicIngestPlan, executePublicIngestTarget, publicProgressMessage
import { createAggregateProgressPort } from './progress-port-adapter.js';
import { formatDuration } from './demo-metrics.js';
import { profileMark } from './startup-profile.js';
+import {
+ isFreshStarCountCache,
+ readStarCountCache,
+ writeStarCountCache,
+} from './star-prompt/cache.js';
+import { fetchGitHubStarCount as defaultFetchGitHubStarCount } from './star-prompt/star-count.js';
+import { renderStarPromptLine } from './star-prompt/star-line.js';
profileMark('module:context-build-view');
@@ -79,6 +86,7 @@ export interface ContextBuildViewState {
frame: number;
startedAt: number | null;
totalElapsedMs: number;
+ starCount: number | null;
}
export interface ContextBuildArgs {
@@ -121,6 +129,8 @@ interface CompletedItemName {
interface ContextBuildRenderOptions {
styled?: boolean;
showHint?: boolean;
+ showStarPrompt?: boolean;
+ columns?: number;
hintText?: string;
projectDir?: string;
title?: string;
@@ -138,6 +148,15 @@ export interface ContextBuildDeps {
now?: () => number;
onSourceProgress?: (sources: ContextBuildSourceProgressUpdate[]) => void;
sourceProgressThrottleMs?: number;
+ fetchStarCount?: typeof defaultFetchGitHubStarCount;
+ starPromptEnv?: StarPromptEnv;
+ starPromptHomeDir?: string;
+}
+
+interface StarPromptEnv extends NodeJS.ProcessEnv {
+ CI?: string;
+ DO_NOT_TRACK?: string;
+ KTX_NO_STAR?: string;
}
// --- Rendering ---
@@ -396,7 +415,7 @@ export function renderContextBuildView(
const hasActive = allTargets.some((t) => t.status === 'running' || t.status === 'queued');
const allDone = totalCount > 0 && !hasActive;
- const headerParts = [options.title ?? 'Building KTX context'];
+ const headerParts = [options.title ?? 'Building ktx context'];
if (totalCount > 0) {
const progressParts: string[] = [`${doneCount}/${totalCount}`];
if (state.totalElapsedMs > 0) progressParts.push(formatDuration(state.totalElapsedMs));
@@ -427,6 +446,14 @@ export function renderContextBuildView(
lines.push('');
}
+ if (options.showStarPrompt && hasActive) {
+ const starPrompt = renderStarPromptLine({
+ count: state.starCount,
+ columns: options.columns ?? 80,
+ });
+ lines.push(styled ? dim(starPrompt) : starPrompt);
+ }
+
if (options.showHint && hasActive) {
const hintContent = options.hintText ?? 'Ctrl+C to stop';
const hint = ` ${hintContent}`;
@@ -584,6 +611,7 @@ export function viewStateFromSourceProgress(
frame: 0,
startedAt: startedAtMs ?? null,
totalElapsedMs: startedAtMs ? now - startedAtMs : 0,
+ starCount: null,
};
}
@@ -631,6 +659,9 @@ export function createRepainter(io: KtxCliIo) {
hasPainted = true;
lastCursorUpRows = cursorUpRowsAfterWrite(content);
},
+ columns() {
+ return terminalColumns();
+ },
};
}
@@ -707,7 +738,7 @@ function failedStepDetail(result: KtxPublicIngestTargetResult): string | null {
const INTERNAL_FAILURE_LINE_RE =
/^(Report|Run|Job|Status|Adapter|Connection|Sync|Mode|Dry run|Diff|Tasks|Work units|Failed tasks|Saved memory|Provenance rows):\s*/;
const ACTIONABLE_FAILURE_LINE_RE =
- /^(Missing bundled Python runtime manifest|KTX Python runtime is required|KTX daemon HTTP|Error:|Failed\b|Could not\b|Cannot\b)/;
+ /^(Missing bundled Python runtime manifest|ktx Python runtime is required|ktx daemon HTTP|Error:|Failed\b|Could not\b|Cannot\b)/;
function trimErrorPrefix(line: string): string {
return line.replace(/^Error:\s*/, '');
@@ -718,7 +749,7 @@ function firstCapturedFailureLine(output: string | undefined): string | null {
.split(/\r?\n/)
.map((candidate) => candidate.trim())
.filter((candidate) => candidate.length > 0)
- .filter((candidate) => !candidate.startsWith('KTX scan completed'))
+ .filter((candidate) => !candidate.startsWith('ktx scan completed'))
.filter((candidate) => !INTERNAL_FAILURE_LINE_RE.test(candidate));
const line = lines.find((candidate) => ACTIONABLE_FAILURE_LINE_RE.test(candidate)) ?? lines.at(-1) ?? null;
return line ? trimErrorPrefix(line) : null;
@@ -758,7 +789,7 @@ function failureTextForTarget(input: {
const code = networkErrorCode(input.error, input.capturedOutput);
if (code && isLocalSqlAnalysisConnectionRefused({ capturedOutput: input.capturedOutput, fallback: input.fallback })) {
return [
- `KTX could not reach the local SQL analysis runtime while processing query history for ${input.target.connectionId}.`,
+ `ktx could not reach the local SQL analysis runtime while processing query history for ${input.target.connectionId}.`,
`Reason: ${NETWORK_ERROR_REASONS[code]} (${code}).`,
`Retry: ${retryCommand({
projectDir: input.projectDir,
@@ -772,7 +803,7 @@ function failureTextForTarget(input: {
if (code) {
const operation = input.target.operation === 'database-ingest' ? 'reading schema for' : 'ingesting';
return [
- `KTX lost its connection to ${friendlyDriverName(input.target.driver)} while ${operation} ${input.target.connectionId}.`,
+ `ktx lost its connection to ${friendlyDriverName(input.target.driver)} while ${operation} ${input.target.connectionId}.`,
`Reason: ${NETWORK_ERROR_REASONS[code]} (${code}).`,
`Retry: ${retryCommand({
projectDir: input.projectDir,
@@ -806,6 +837,7 @@ export function initViewState(targets: KtxPublicIngestPlanTarget[]): ContextBuil
frame: 0,
startedAt: null,
totalElapsedMs: 0,
+ starCount: null,
};
}
@@ -817,6 +849,50 @@ function formatProgressDetail(
return `[${percent}%] ${publicProgressMessage(update.message, target)}`;
}
+const STAR_COUNT_CACHE_TTL_MS = 24 * 60 * 60 * 1000;
+
+function envFlag(value: string | undefined): boolean {
+ return value !== undefined && value !== '' && value !== '0' && value !== 'false';
+}
+
+function shouldSuppressStarPrompt(env: StarPromptEnv): boolean {
+ return envFlag(env.CI) || envFlag(env.DO_NOT_TRACK) || envFlag(env.KTX_NO_STAR);
+}
+
+function startStarPromptCountRefresh(input: {
+ fetchStarCount: typeof defaultFetchGitHubStarCount;
+ homeDir?: string;
+ now: () => number;
+ paint: () => void;
+ state: ContextBuildViewState;
+}): void {
+ const cached = readStarCountCache({ homeDir: input.homeDir });
+ if (cached) {
+ input.state.starCount = cached.count;
+ }
+
+ if (isFreshStarCountCache(cached, new Date(input.now()), STAR_COUNT_CACHE_TTL_MS)) {
+ return;
+ }
+
+ void input.fetchStarCount()
+ .then((count) => {
+ if (typeof count !== 'number' || !Number.isFinite(count)) {
+ return;
+ }
+ input.state.starCount = count;
+ input.paint();
+ void writeStarCountCache(
+ {
+ count,
+ fetchedAt: new Date(input.now()).toISOString(),
+ },
+ { homeDir: input.homeDir },
+ );
+ })
+ .catch(() => undefined);
+}
+
export async function runContextBuild(
project: KtxPublicIngestProject,
args: ContextBuildArgs,
@@ -838,13 +914,31 @@ export async function runContextBuild(
state.startedAt = nowFn();
const repainter = isTTY ? createRepainter(io) : null;
+ const starPromptEnabled = repainter !== null && !shouldSuppressStarPrompt(deps.starPromptEnv ?? process.env);
const viewOpts = {
styled: true,
projectDir: args.projectDir,
notices: plan.notices ?? [],
warnings: plan.warnings,
};
- const paint = (hint: boolean) => repainter?.paint(renderContextBuildView(state, { ...viewOpts, showHint: hint }));
+ const paint = (hint: boolean) =>
+ repainter?.paint(
+ renderContextBuildView(state, {
+ ...viewOpts,
+ showHint: hint,
+ showStarPrompt: starPromptEnabled && hint,
+ columns: repainter.columns(),
+ }),
+ );
+ if (starPromptEnabled) {
+ startStarPromptCountRefresh({
+ fetchStarCount: deps.fetchStarCount ?? defaultFetchGitHubStarCount,
+ homeDir: deps.starPromptHomeDir,
+ now: nowFn,
+ paint: () => paint(true),
+ state,
+ });
+ }
paint(true);
let spinnerInterval: ReturnType<typeof setInterval> | null = null;
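
Review note: the star prompt is gated on `shouldSuppressStarPrompt`, which treats `CI`, `DO_NOT_TRACK`, and `KTX_NO_STAR` as set unless the value is missing, empty, `0`, or `false`. The sketch below copies the two helpers from this diff to show which values suppress the prompt. The `StarPromptEnv` alias here is a simplified stand-in (an assumption) for the interface in the diff, which extends `NodeJS.ProcessEnv`.

```typescript
// Simplified stand-in for the diff's StarPromptEnv (assumption: a plain record).
type StarPromptEnv = Record<string, string | undefined>;

// A flag counts as set unless it is missing, empty, '0', or 'false'.
function envFlag(value: string | undefined): boolean {
  return value !== undefined && value !== '' && value !== '0' && value !== 'false';
}

function shouldSuppressStarPrompt(env: StarPromptEnv): boolean {
  return envFlag(env.CI) || envFlag(env.DO_NOT_TRACK) || envFlag(env.KTX_NO_STAR);
}

console.log(shouldSuppressStarPrompt({})); // false
console.log(shouldSuppressStarPrompt({ CI: 'true' })); // true
console.log(shouldSuppressStarPrompt({ KTX_NO_STAR: '0' })); // false
console.log(shouldSuppressStarPrompt({ DO_NOT_TRACK: '1' })); // true
// The comparison is case-sensitive, so 'FALSE' still counts as set.
console.log(shouldSuppressStarPrompt({ CI: 'FALSE' })); // true
```

If case-insensitive values such as `FALSE` should disable a flag, the helper would need to lowercase the value first; as written in the diff it does not.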
diff --git a/packages/cli/src/context/connections/drivers.ts b/packages/cli/src/context/connections/drivers.ts
index 1b87984b..3fbeb058 100644
--- a/packages/cli/src/context/connections/drivers.ts
+++ b/packages/cli/src/context/connections/drivers.ts
@@ -17,7 +17,6 @@ export interface KtxDriverRegistration {
readonly driver: KtxConnectionDriver;
readonly scopeConfigKey: KtxScopeConfigKey | null;
readonly hasHistoricSqlReader: boolean;
- readonly hasLocalQueryExecutor: boolean;
load(): Promise;
}
@@ -31,7 +30,6 @@ export const driverRegistrations: Record {
const m = await import('../../connectors/bigquery/connector.js');
return {
@@ -53,7 +51,6 @@ export const driverRegistrations: Record {
const m = await import('../../connectors/clickhouse/connector.js');
return {
@@ -75,7 +72,6 @@ export const driverRegistrations: Record {
const m = await import('../../connectors/mysql/connector.js');
return {
@@ -97,7 +93,6 @@ export const driverRegistrations: Record {
const m = await import('../../connectors/postgres/connector.js');
return {
@@ -119,7 +114,6 @@ export const driverRegistrations: Record {
const m = await import('../../connectors/sqlite/connector.js');
return {
@@ -141,7 +135,6 @@ export const driverRegistrations: Record {
const m = await import('../../connectors/snowflake/connector.js');
return {
@@ -163,7 +156,6 @@ export const driverRegistrations: Record {
const m = await import('../../connectors/sqlserver/connector.js');
return {
diff --git a/packages/cli/src/context/connections/federation.ts b/packages/cli/src/context/connections/federation.ts
new file mode 100644
index 00000000..74036e2f
--- /dev/null
+++ b/packages/cli/src/context/connections/federation.ts
@@ -0,0 +1,83 @@
+import type { KtxProjectConnectionConfig } from '../project/config.js';
+
+/** Stable id for the runtime-derived federated connection. Never written to ktx.yaml. */
+export const FEDERATED_CONNECTION_ID = '_ktx_federated';
+
+/**
+ * Drivers DuckDB can ATTACH for federation. The driver name doubles as the
+ * DuckDB extension/TYPE name, so this set is the single source of truth for
+ * both membership (a driver participates iff it appears here) and attach type.
+ */
+const ATTACH_COMPATIBLE_DRIVERS = new Set(['postgres', 'mysql', 'sqlite']);
+
+export function attachTypeForDriver(driver: string): string {
+ const normalized = driver.toLowerCase();
+ if (!ATTACH_COMPATIBLE_DRIVERS.has(normalized)) {
+ throw new Error(`Driver "${driver}" cannot be attached by DuckDB federation.`);
+ }
+ return normalized;
+}
+
+export interface FederatedMember {
+ connectionId: string;
+ driver: string;
+ projectDir: string;
+ connection: KtxProjectConnectionConfig;
+}
+
+export interface FederatedConnectionDescriptor {
+ id: typeof FEDERATED_CONNECTION_ID;
+ driver: 'duckdb';
+ members: FederatedMember[];
+}
+
+/**
+ * Derives a virtual federated connection when a project declares 2+
+ * attach-compatible databases. Returns null otherwise — single-DB and
+ * incompatible projects are unaffected.
+ */
+export function deriveFederatedConnection(
+ connections: Record<string, KtxProjectConnectionConfig>,
+ projectDir: string,
+): FederatedConnectionDescriptor | null {
+ const members: FederatedMember[] = Object.entries(connections)
+ .filter(([, config]) => ATTACH_COMPATIBLE_DRIVERS.has(config.driver.toLowerCase()))
+ .map(([connectionId, config]) => ({
+ connectionId,
+ driver: config.driver.toLowerCase(),
+ projectDir,
+ connection: config,
+ }));
+ if (members.length < 2) {
+ return null;
+ }
+ return { id: FEDERATED_CONNECTION_ID, driver: 'duckdb', members };
+}
+
+export interface FederatedConnectionListing {
+ id: typeof FEDERATED_CONNECTION_ID;
+ driver: 'duckdb';
+ members: string[];
+ hint: string;
+}
+
+/**
+ * Listing-facing view of the virtual federated connection for `ktx connection`
+ * and MCP `connection_list`. Derived from the same declared state as
+ * deriveFederatedConnection, so both surfaces describe one connection.
+ */
+export function federatedConnectionListing(
+ connections: Record<string, KtxProjectConnectionConfig>,
+ projectDir: string,
+): FederatedConnectionListing | null {
+ const descriptor = deriveFederatedConnection(connections, projectDir);
+ if (!descriptor) {
+ return null;
+ }
+ return {
+ id: FEDERATED_CONNECTION_ID,
+ driver: 'duckdb',
+ members: descriptor.members.map((member) => member.connectionId),
+ hint: 'Cross-database queries run here. Name tables connectionId.schema.table (or connectionId.table for sqlite); double-quote any id that is not a bare SQL identifier, e.g. "books-db".public.books.',
+ };
+}
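
Review note: the membership rule in `deriveFederatedConnection` is easy to misread, so here is a minimal sketch of it in isolation. `ConnectionConfig` and `federatedMemberIds` are hypothetical names used only for illustration; the real function takes `KtxProjectConnectionConfig` values and returns a full descriptor rather than a list of ids.

```typescript
// Hypothetical minimal config shape; the real KtxProjectConnectionConfig has more fields.
interface ConnectionConfig {
  driver: string;
}

const ATTACH_COMPATIBLE_DRIVERS = new Set(['postgres', 'mysql', 'sqlite']);

// Mirrors the membership rule: keep attach-compatible connections (driver
// names compared case-insensitively) and return null when fewer than two qualify.
function federatedMemberIds(connections: Record<string, ConnectionConfig>): string[] | null {
  const ids = Object.entries(connections)
    .filter(([, config]) => ATTACH_COMPATIBLE_DRIVERS.has(config.driver.toLowerCase()))
    .map(([connectionId]) => connectionId);
  return ids.length < 2 ? null : ids;
}

// Two compatible members plus one incompatible driver: federation is derived.
const mixed = federatedMemberIds({
  'books-db': { driver: 'postgres' },
  warehouse: { driver: 'snowflake' },
  local: { driver: 'SQLite' },
});
console.log(mixed); // ['books-db', 'local']

// Only one compatible member: no federated connection.
const single = federatedMemberIds({
  'books-db': { driver: 'postgres' },
  warehouse: { driver: 'snowflake' },
});
console.log(single); // null
```

Incompatible drivers are dropped silently rather than blocking federation, so a project with Postgres, SQLite, and Snowflake still gets a federated connection over the first two.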
diff --git a/packages/cli/src/context/connections/local-query-executor.ts b/packages/cli/src/context/connections/local-query-executor.ts
deleted file mode 100644
index 3a2e34c9..00000000
--- a/packages/cli/src/context/connections/local-query-executor.ts
+++ /dev/null
@@ -1,59 +0,0 @@
-import { driverRegistrations, getDriverRegistration } from './drivers.js';
-import { createPostgresQueryExecutor } from './postgres-query-executor.js';
-import type {
- KtxSqlQueryExecutionInput,
- KtxSqlQueryExecutionResult,
- KtxSqlQueryExecutorPort,
-} from './query-executor.js';
-import { createSqliteQueryExecutor } from './sqlite-query-executor.js';
-import type { KtxConnectionDriver } from '../scan/types.js';
-
-export interface DefaultLocalQueryExecutorOptions {
- postgres?: KtxSqlQueryExecutorPort;
- sqlite?: KtxSqlQueryExecutorPort;
-}
-
-function driverFor(input: KtxSqlQueryExecutionInput): string {
- return String(input.connection?.driver ?? '').toLowerCase();
-}
-
-function localExecutorMap(
- options: DefaultLocalQueryExecutorOptions,
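-): Partial<Record<KtxConnectionDriver, KtxSqlQueryExecutorPort>> {
- const wiredExecutors: Partial<Record<KtxConnectionDriver, KtxSqlQueryExecutorPort>> = {
- postgres: options.postgres ?? createPostgresQueryExecutor(),
- sqlite: options.sqlite ?? createSqliteQueryExecutor(),
- };
-
- const executors: Partial<Record<KtxConnectionDriver, KtxSqlQueryExecutorPort>> = {};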
- for (const registration of Object.values(driverRegistrations)) {
- if (!registration.hasLocalQueryExecutor) continue;
- const executor = wiredExecutors[registration.driver];
- if (executor) {
- executors[registration.driver] = executor;
- }
- }
- return executors;
-}
-
-export function createDefaultLocalQueryExecutor(options: DefaultLocalQueryExecutorOptions = {}): KtxSqlQueryExecutorPort {
- const executors = localExecutorMap(options);
-
- return {
- async execute(input: KtxSqlQueryExecutionInput): Promise<KtxSqlQueryExecutionResult> {
- const driver = driverFor(input);
- const registration = getDriverRegistration(driver);
- if (!registration?.hasLocalQueryExecutor) {
- throw new Error(`No local query executor is configured for driver "${input.connection?.driver ?? 'unknown'}".`);
- }
-
- const executor = executors[registration.driver];
- if (!executor) {
- throw new Error(
- `Local query executor flag is enabled for driver "${registration.driver}", but no executor factory is wired.`,
- );
- }
- return executor.execute(input);
- },
- };
-}
diff --git a/packages/cli/src/context/connections/local-warehouse-descriptor.ts b/packages/cli/src/context/connections/local-warehouse-descriptor.ts
index 4ad926df..0e5d0b9d 100644
--- a/packages/cli/src/context/connections/local-warehouse-descriptor.ts
+++ b/packages/cli/src/context/connections/local-warehouse-descriptor.ts
@@ -16,6 +16,8 @@ export interface LocalConnectionInfo {
id: string;
name: string;
connectionType: string;
+ members?: string[];
+ hint?: string;
}
const DRIVER_TO_CONNECTION_TYPE: Record = {
diff --git a/packages/cli/src/context/connections/postgres-query-executor.ts b/packages/cli/src/context/connections/postgres-query-executor.ts
deleted file mode 100644
index 842609f4..00000000
--- a/packages/cli/src/context/connections/postgres-query-executor.ts
+++ /dev/null
@@ -1,78 +0,0 @@
-import { Client, type ClientConfig } from 'pg';
-import type {
- KtxSqlQueryExecutionInput,
- KtxSqlQueryExecutionResult,
- KtxSqlQueryExecutorPort,
-} from './query-executor.js';
-import { limitSqlForExecution } from './read-only-sql.js';
-
-interface PgClientLike {
- connect(): Promise<void>;
- query(input: string | { text: string; rowMode: 'array' }): Promise<{
- fields: Array<{ name: string }>;
- rows: unknown[][];
- command: string;
- rowCount: number | null;
- }>;
- end(): Promise<void>;
-}
-
-interface PostgresQueryExecutorOptions {
- statementTimeoutMs?: number;
- queryTimeoutMs?: number;
- connectionTimeoutMs?: number;
- clientFactory?: (config: ClientConfig) => PgClientLike;
-}
-
-function connectionDriver(input: KtxSqlQueryExecutionInput): string {
- return String(input.connection?.driver ?? '').toLowerCase();
-}
-
-function createDefaultClient(config: ClientConfig): PgClientLike {
- return new Client(config);
-}
-
-export function createPostgresQueryExecutor(options: PostgresQueryExecutorOptions = {}): KtxSqlQueryExecutorPort {
- const clientFactory = options.clientFactory ?? createDefaultClient;
- return {
- async execute(input: KtxSqlQueryExecutionInput): Promise<KtxSqlQueryExecutionResult> {
- const driver = connectionDriver(input);
- const connection = input.connection;
- if (driver !== 'postgres') {
- throw new Error(`Local Postgres execution cannot run driver "${connection?.driver ?? 'unknown'}".`);
- }
- if (typeof connection?.url !== 'string' || connection.url.trim().length === 0) {
- throw new Error(`Local Postgres execution requires connections.${input.connectionId}.url.`);
- }
-
- const client = clientFactory({
- connectionString: connection.url,
- statement_timeout: options.statementTimeoutMs ?? 30_000,
- query_timeout: options.queryTimeoutMs ?? 35_000,
- connectionTimeoutMillis: options.connectionTimeoutMs ?? 5_000,
- application_name: 'ktx-local-query',
- });
- await client.connect();
- try {
- await client.query('BEGIN READ ONLY');
- const result = await client.query({
- text: limitSqlForExecution(input.sql, input.maxRows),
- rowMode: 'array',
- });
- await client.query('COMMIT');
- return {
- headers: result.fields.map((field) => field.name),
- rows: result.rows,
- totalRows: result.rows.length,
- command: result.command,
- rowCount: result.rowCount,
- };
- } catch (error) {
- await client.query('ROLLBACK').catch(() => undefined);
- throw error;
- } finally {
- await client.end();
- }
- },
- };
-}
diff --git a/packages/cli/src/context/connections/project-sql-executor.ts b/packages/cli/src/context/connections/project-sql-executor.ts
new file mode 100644
index 00000000..0c2da04e
--- /dev/null
+++ b/packages/cli/src/context/connections/project-sql-executor.ts
@@ -0,0 +1,58 @@
+import { executeFederatedQuery } from '../../connectors/duckdb/federated-executor.js';
+import type { KtxLocalProject } from '../project/project.js';
+import type { KtxScanConnector, KtxScanContext } from '../scan/types.js';
+import { deriveFederatedConnection, FEDERATED_CONNECTION_ID } from './federation.js';
+import type { KtxSqlQueryExecutionInput, KtxSqlQueryExecutionResult } from './query-executor.js';
+
+export interface ExecuteProjectReadOnlySqlDeps {
+ project: KtxLocalProject;
+ input: KtxSqlQueryExecutionInput;
+ createConnector: (connectionId: string) => Promise<KtxScanConnector> | KtxScanConnector;
+ executeFederated?: typeof executeFederatedQuery;
+ runId?: string;
+}
+
+/**
+ * Single resolve-and-execute path for project read-only SQL. The federated
+ * connection is derived from declared state here so every executor entry point
+ * routes `_ktx_federated` identically; standard connections go through the
+ * scan connector.
+ */
+export async function executeProjectReadOnlySql(
+ deps: ExecuteProjectReadOnlySqlDeps,
+): Promise<KtxSqlQueryExecutionResult> {
+ const { project, input } = deps;
+ if (input.connectionId === FEDERATED_CONNECTION_ID) {
+ const descriptor = deriveFederatedConnection(project.config.connections, project.projectDir);
+ if (!descriptor) {
+ throw new Error('Federated execution requested but fewer than 2 attach-compatible connections exist.');
+ }
+ const runFederated = deps.executeFederated ?? executeFederatedQuery;
+ return runFederated(descriptor.members, input);
+ }
+
+ let connector: KtxScanConnector | null = null;
+ try {
+ connector = await deps.createConnector(input.connectionId);
+ if (!connector.capabilities.readOnlySql || !connector.executeReadOnly) {
+ throw new Error(
+ `Connection "${input.connectionId}" driver "${connector.driver}" does not support read-only SQL execution.`,
+ );
+ }
+ const ctx: KtxScanContext = { runId: deps.runId ?? 'sql-execution' };
+ const result = await connector.executeReadOnly(
+ { connectionId: input.connectionId, sql: input.sql, maxRows: input.maxRows },
+ ctx,
+ );
+ return {
+ headers: result.headers,
+ ...(result.headerTypes ? { headerTypes: result.headerTypes } : {}),
+ rows: result.rows,
+ totalRows: result.totalRows,
+ command: 'SELECT',
+ rowCount: result.rowCount,
+ };
+ } finally {
+ await connector?.cleanup?.();
+ }
+}
diff --git a/packages/cli/src/context/connections/query-executor.ts b/packages/cli/src/context/connections/query-executor.ts
index e169d164..0f963c63 100644
--- a/packages/cli/src/context/connections/query-executor.ts
+++ b/packages/cli/src/context/connections/query-executor.ts
@@ -10,6 +10,7 @@ export interface KtxSqlQueryExecutionInput {
export interface KtxSqlQueryExecutionResult {
headers: string[];
+ headerTypes?: string[];
rows: unknown[][];
totalRows: number;
command: string;
diff --git a/packages/cli/src/context/connections/read-only-sql.ts b/packages/cli/src/context/connections/read-only-sql.ts
index fe71a0c3..1bde80b1 100644
--- a/packages/cli/src/context/connections/read-only-sql.ts
+++ b/packages/cli/src/context/connections/read-only-sql.ts
@@ -1,22 +1,141 @@
+import { KtxQueryError } from '../../errors.js';
+
const MUTATING_SQL =
/^\s*(insert|update|delete|merge|alter|drop|create|truncate|grant|revoke|copy|call|do|vacuum|analyze|refresh)\b/i;
const READ_SQL = /^\s*(select|with)\b/i;
-export function assertReadOnlySql(sql: string): string {
- const trimmed = sql.trim();
- if (!READ_SQL.test(trimmed) || MUTATING_SQL.test(trimmed)) {
- throw new Error('Only read-only SELECT/WITH queries can be executed locally.');
+// Agents (and the daemon's sqlglot validator, which ignores comments) routinely
+// emit read-only queries prefixed with `-- ...` or `/* ... */`. Strip leading
+// comments so the prefix check sees the real statement; otherwise valid SELECT/WITH
+// SQL is rejected here while the parser-backed validator accepts it.
+function stripLeadingSqlComments(sql: string): string {
+ let index = 0;
+ while (index < sql.length) {
+ while (/\s/.test(sql[index] ?? '')) {
+ index += 1;
+ }
+ if (sql.startsWith('--', index)) {
+ const end = sql.indexOf('\n', index + 2);
+ index = end === -1 ? sql.length : end + 1;
+ continue;
+ }
+ if (sql.startsWith('/*', index)) {
+ const end = sql.indexOf('*/', index + 2);
+ if (end === -1) {
+ return sql.slice(index);
+ }
+ index = end + 2;
+ continue;
+ }
+ break;
}
+ return sql.slice(index);
+}
+
+// Lexes past one string literal, quoted identifier, or comment starting at
+// `index`, using standard-SQL rules ('' and "" escapes; no dialect extensions
+// such as backslash escapes or dollar quoting). Returns the index after the
+// token, or `index` unchanged when no quoted/comment token starts there.
+function skipQuotedOrComment(sql: string, index: number): number {
+ const quote = sql[index];
+ if (quote === "'" || quote === '"') {
+ let i = index + 1;
+ while (i < sql.length) {
+ if (sql[i] === quote) {
+ if (sql[i + 1] === quote) {
+ i += 2;
+ continue;
+ }
+ return i + 1;
+ }
+ i += 1;
+ }
+ return sql.length;
+ }
+ if (sql.startsWith('--', index)) {
+ const end = sql.indexOf('\n', index + 2);
+ return end === -1 ? sql.length : end + 1;
+ }
+ if (sql.startsWith('/*', index)) {
+ const end = sql.indexOf('*/', index + 2);
+ return end === -1 ? sql.length : end + 2;
+ }
+ return index;
+}
+
+// Backstop against statement smuggling (`select 1; drop table x`): reject any
+// semicolon that is followed by real content. Semicolons inside string
+// literals, quoted identifiers, and comments are fine, as are trailing
+// semicolons (optionally followed by whitespace and comments). This deliberately
+// lexes standard SQL only, so dialect-specific escapes can cause a false
+// reject — never a false accept; the canonical gate is the daemon's
+// sqlglot-backed validateReadOnly.
+function assertSingleSqlStatement(sql: string): void {
+ let index = 0;
+ let sawSemicolon = false;
+ while (index < sql.length) {
+ const skipped = skipQuotedOrComment(sql, index);
+ if (skipped > index) {
+ index = skipped;
+ continue;
+ }
+ if (sql[index] === ';') {
+ sawSemicolon = true;
+ } else if (sawSemicolon && !/\s/.test(sql[index])) {
+ throw new KtxQueryError('Only one SQL statement can be executed.');
+ }
+ index += 1;
+ }
+}
+
+export function assertReadOnlySql(sql: string): string {
+ const trimmed = stripLeadingSqlComments(sql).trim();
+ if (!READ_SQL.test(trimmed) || MUTATING_SQL.test(trimmed)) {
+ throw new KtxQueryError('Only read-only SELECT/WITH queries can be executed locally.');
+ }
+ assertSingleSqlStatement(trimmed);
return trimmed;
}
+// `assertReadOnlySql` deliberately keeps trailing semicolons, comments, and
+// whitespace (e.g. `select 1; -- done`) — harmless for direct single-statement
+// execution. A row-limit subquery wrapper needs a bare expression instead: a
+// trailing `;` would sit illegally inside the subquery, and a trailing line
+// comment would comment out the closing paren and limit clause. Lex forward with
+// the same standard-SQL rules as the single-statement gate and truncate at the
+// end of the last meaningful token, dropping trailing semicolons, comments, and
+// whitespace. Characters inside string literals and quoted identifiers stay
+// meaningful, so a `;` or `--` within a literal is never mistaken for a
+// terminator (a plain regex cannot make that distinction).
+export function stripTrailingSqlNoise(sql: string): string {
+ let index = 0;
+ let meaningfulEnd = 0;
+ while (index < sql.length) {
+ if (sql.startsWith('--', index) || sql.startsWith('/*', index)) {
+ index = skipQuotedOrComment(sql, index);
+ continue;
+ }
+ const afterQuoted = skipQuotedOrComment(sql, index);
+ if (afterQuoted > index) {
+ meaningfulEnd = afterQuoted;
+ index = afterQuoted;
+ continue;
+ }
+ if (sql[index] !== ';' && !/\s/.test(sql[index] ?? '')) {
+ meaningfulEnd = index + 1;
+ }
+ index += 1;
+ }
+ return sql.slice(0, meaningfulEnd);
+}
+
export function limitSqlForExecution(sql: string, maxRows: number | undefined): string {
- const trimmed = assertReadOnlySql(sql).replace(/;+\s*$/, '');
+ const trimmed = stripTrailingSqlNoise(assertReadOnlySql(sql));
if (!maxRows) {
return trimmed;
}
if (!Number.isInteger(maxRows) || maxRows <= 0) {
- throw new Error('maxRows must be a positive integer.');
+ throw new KtxQueryError('maxRows must be a positive integer.');
}
return `select * from (${trimmed}) as ktx_query_result limit ${maxRows}`;
}
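
The switch from the `/;+\s*$/` regex to `stripTrailingSqlNoise` can be illustrated with a small standalone sketch. The two helpers below are copied from the hunk above so the block runs on its own; `oldStrip`, `wrapWithLimit`, and the sample inputs are hypothetical names introduced only for this illustration.

```typescript
// Copied from the diff: skip a quoted literal/identifier or a comment starting at `index`.
function skipQuotedOrComment(sql: string, index: number): number {
  const quote = sql[index];
  if (quote === "'" || quote === '"') {
    let i = index + 1;
    while (i < sql.length) {
      if (sql[i] === quote) {
        if (sql[i + 1] === quote) {
          i += 2; // doubled quote is an escaped quote, still inside the literal
          continue;
        }
        return i + 1;
      }
      i += 1;
    }
    return sql.length; // unterminated literal consumes the rest
  }
  if (sql.startsWith('--', index)) {
    const end = sql.indexOf('\n', index + 2);
    return end === -1 ? sql.length : end + 1;
  }
  if (sql.startsWith('/*', index)) {
    const end = sql.indexOf('*/', index + 2);
    return end === -1 ? sql.length : end + 2;
  }
  return index;
}

// Copied from the diff: truncate at the end of the last meaningful token.
function stripTrailingSqlNoise(sql: string): string {
  let index = 0;
  let meaningfulEnd = 0;
  while (index < sql.length) {
    if (sql.startsWith('--', index) || sql.startsWith('/*', index)) {
      index = skipQuotedOrComment(sql, index);
      continue;
    }
    const afterQuoted = skipQuotedOrComment(sql, index);
    if (afterQuoted > index) {
      meaningfulEnd = afterQuoted;
      index = afterQuoted;
      continue;
    }
    if (sql[index] !== ';' && !/\s/.test(sql[index] ?? '')) {
      meaningfulEnd = index + 1;
    }
    index += 1;
  }
  return sql.slice(0, meaningfulEnd);
}

// Hypothetical helpers: the previous regex-based strip and the row-limit wrapper.
const oldStrip = (sql: string): string => sql.replace(/;+\s*$/, '');
const wrapWithLimit = (inner: string, maxRows: number): string =>
  `select * from (${inner}) as ktx_query_result limit ${maxRows}`;

const sampleSql = 'select 1; -- done';
const wrappedOld = wrapWithLimit(oldStrip(sampleSql), 10);
const wrappedNew = wrapWithLimit(stripTrailingSqlNoise(sampleSql), 10);
const literalKept = stripTrailingSqlNoise("select ';--' as x;");
```

With the regex, the trailing line comment survives and would comment out the closing paren and `limit` clause; the lexer-based strip yields a bare `select 1` and leaves the `;--` inside the string literal untouched.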
diff --git a/packages/cli/src/context/connections/resolve-connection.ts b/packages/cli/src/context/connections/resolve-connection.ts
new file mode 100644
index 00000000..1dee09ca
--- /dev/null
+++ b/packages/cli/src/context/connections/resolve-connection.ts
@@ -0,0 +1,50 @@
+import { KtxExpectedError } from '../../errors.js';
+import type { KtxProjectConfig, KtxProjectConnectionConfig } from '../project/config.js';
+
+function configuredConnectionIds(config: KtxProjectConfig): string[] {
+ return Object.keys(config.connections).sort();
+}
+
+function availableConnectionsHint(config: KtxProjectConfig): string {
+ const ids = configuredConnectionIds(config);
+ return ids.length === 0
+ ? 'No connections are configured in ktx.yaml.'
+ : `Configured connections: ${ids.join(', ')}.`;
+}
+
+/**
+ * Look up a connection by id, throwing an expected (caller-driven) error that
+ * names the configured connections so an agent or CLI user can self-correct.
+ */
+export function resolveConfiguredConnection(
+ config: KtxProjectConfig,
+ connectionId: string,
+): KtxProjectConnectionConfig {
+ const connection = config.connections[connectionId];
+ if (!connection) {
+ throw new KtxExpectedError(
+ `Connection "${connectionId}" is not configured in ktx.yaml. ${availableConnectionsHint(config)}`,
+ );
+ }
+ return connection;
+}
+
+/**
+ * Resolve the connection id to run against: validate a requested id against the
+ * configured connections, or default to the sole connection when none is given.
+ * Throws an expected error that lists the configured connections otherwise.
+ */
+export function resolveRequiredConnectionId(
+ config: KtxProjectConfig,
+ requested: string | undefined,
+): string {
+ if (requested !== undefined) {
+ resolveConfiguredConnection(config, requested);
+ return requested;
+ }
+ const ids = configuredConnectionIds(config);
+ if (ids.length === 1) {
+ return ids[0];
+ }
+ throw new KtxExpectedError(`connectionId is required. ${availableConnectionsHint(config)}`);
+}
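
A rough sketch of how the resolution rules above behave for callers. The types here are stand-ins (`ProjectConfig`, `ConnectionConfig`, and `ExpectedError` are hypothetical simplifications of `KtxProjectConfig`, `KtxProjectConnectionConfig`, and `KtxExpectedError`), and the driver names are made up for the example.

```typescript
// Stand-in types: only the `connections` map matters for this sketch.
type ConnectionConfig = { driver: string };
type ProjectConfig = { connections: Record<string, ConnectionConfig> };

class ExpectedError extends Error {}

function availableConnectionsHint(config: ProjectConfig): string {
  const ids = Object.keys(config.connections).sort();
  return ids.length === 0
    ? 'No connections are configured in ktx.yaml.'
    : `Configured connections: ${ids.join(', ')}.`;
}

function resolveRequiredConnectionId(config: ProjectConfig, requested: string | undefined): string {
  if (requested !== undefined) {
    // An explicit id must exist; the error lists valid ids so the caller can self-correct.
    if (!config.connections[requested]) {
      throw new ExpectedError(
        `Connection "${requested}" is not configured in ktx.yaml. ${availableConnectionsHint(config)}`,
      );
    }
    return requested;
  }
  // No id given: default only when exactly one connection is configured.
  const ids = Object.keys(config.connections).sort();
  const [only] = ids;
  if (ids.length === 1 && only !== undefined) {
    return only;
  }
  throw new ExpectedError(`connectionId is required. ${availableConnectionsHint(config)}`);
}

function messageOf(run: () => string): string {
  try {
    return run();
  } catch (error) {
    return error instanceof Error ? error.message : String(error);
  }
}

const singleConfig: ProjectConfig = { connections: { warehouse: { driver: 'postgres' } } };
const multiConfig: ProjectConfig = {
  connections: { warehouse: { driver: 'postgres' }, analytics: { driver: 'sqlite' } },
};

const defaultedId = resolveRequiredConnectionId(singleConfig, undefined);
const ambiguousMessage = messageOf(() => resolveRequiredConnectionId(multiConfig, undefined));
const unknownMessage = messageOf(() => resolveRequiredConnectionId(multiConfig, 'wharehouse'));
```

The sole-connection default keeps single-database projects terse, while both failure paths carry the sorted list of configured ids.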
diff --git a/packages/cli/src/context/connections/sqlite-query-executor.ts b/packages/cli/src/context/connections/sqlite-query-executor.ts
deleted file mode 100644
index 40710c96..00000000
--- a/packages/cli/src/context/connections/sqlite-query-executor.ts
+++ /dev/null
@@ -1,92 +0,0 @@
-import { isAbsolute, resolve } from 'node:path';
-import { fileURLToPath } from 'node:url';
-import Database from 'better-sqlite3';
-import { readFileSync } from 'node:fs';
-import { homedir } from 'node:os';
-import type {
- KtxSqlQueryExecutionInput,
- KtxSqlQueryExecutionResult,
- KtxSqlQueryExecutorPort,
-} from './query-executor.js';
-import { normalizeQueryRows } from './query-executor.js';
-import { limitSqlForExecution } from './read-only-sql.js';
-
-type SqliteConnectionConfig = Record<string, unknown> | undefined;
-
-function connectionDriver(input: KtxSqlQueryExecutionInput): string {
- return String(input.connection?.driver ?? '').toLowerCase();
-}
-
-function stringConfigValue(connection: SqliteConnectionConfig, key: string): string | undefined {
- const value = connection?.[key];
- return typeof value === 'string' && value.trim().length > 0 ? resolveStringReference(key, value.trim()) : undefined;
-}
-
-function resolveStringReference(key: string, value: string): string {
- if (value.startsWith('env:')) {
- return process.env[value.slice('env:'.length)] ?? '';
- }
- if (key !== 'url' && value.startsWith('file:')) {
- const rawPath = value.slice('file:'.length);
- const path = rawPath.startsWith('~') ? resolve(homedir(), rawPath.slice(1)) : rawPath;
- return readFileSync(path, 'utf-8').trim();
- }
- return value;
-}
-
-function sqlitePathFromUrl(url: string): string {
- if (url.startsWith('file:')) {
- return fileURLToPath(url);
- }
-
- if (url.startsWith('sqlite:')) {
- const parsed = new URL(url);
- if (parsed.pathname.length > 0) {
- return decodeURIComponent(parsed.pathname);
- }
- }
-
- return url;
-}
-
-/** @internal */
-export function sqliteDatabasePathFromConnection(input: KtxSqlQueryExecutionInput): string {
- const driver = connectionDriver(input);
- if (driver !== 'sqlite') {
- throw new Error(`Local SQLite execution cannot run driver "${input.connection?.driver ?? 'unknown'}".`);
- }
-
- const pathValue = stringConfigValue(input.connection, 'path');
- const urlValue = stringConfigValue(input.connection, 'url');
- if (!pathValue && !urlValue) {
- throw new Error(
- `Local SQLite execution requires connections.${input.connectionId}.path or connections.${input.connectionId}.url.`,
- );
- }
-
- const candidate = pathValue ?? sqlitePathFromUrl(urlValue as string);
- return isAbsolute(candidate) ? candidate : resolve(input.projectDir ?? process.cwd(), candidate);
-}
-
-export function createSqliteQueryExecutor(): KtxSqlQueryExecutorPort {
- return {
- async execute(input: KtxSqlQueryExecutionInput): Promise<KtxSqlQueryExecutionResult> {
- const sql = limitSqlForExecution(input.sql, input.maxRows);
- const dbPath = sqliteDatabasePathFromConnection(input);
- const db = new Database(dbPath, { readonly: true, fileMustExist: true });
- try {
- const statement = db.prepare(sql);
- const rows = statement.all() as unknown[];
- return {
- headers: statement.columns().map((column) => column.name),
- rows: normalizeQueryRows(rows),
- totalRows: rows.length,
- command: 'SELECT',
- rowCount: rows.length,
- };
- } finally {
- db.close();
- }
- },
- };
-}
diff --git a/packages/cli/src/context/core/abort.ts b/packages/cli/src/context/core/abort.ts
new file mode 100644
index 00000000..95467c52
--- /dev/null
+++ b/packages/cli/src/context/core/abort.ts
@@ -0,0 +1,39 @@
+/** @internal */
+export function createAbortError(message = 'Aborted'): DOMException {
+ return new DOMException(message, 'AbortError');
+}
+
+export function isAbortError(error: unknown): boolean {
+ if (error instanceof DOMException && error.name === 'AbortError') {
+ return true;
+ }
+ if (!error || typeof error !== 'object') {
+ return false;
+ }
+ const record = error as { name?: unknown; code?: unknown };
+ return record.name === 'AbortError' || record.code === 'ABORT_ERR';
+}
+
+/** @internal */
+export function throwIfAborted(signal?: AbortSignal): void {
+ if (signal?.aborted) {
+ throw createAbortError();
+ }
+}
+
+export function linkAbortSignal(parent?: AbortSignal): { controller: AbortController; dispose: () => void } {
+ const controller = new AbortController();
+ if (!parent) {
+ return { controller, dispose: () => undefined };
+ }
+ if (parent.aborted) {
+ controller.abort(createAbortError());
+ return { controller, dispose: () => undefined };
+ }
+ const onAbort = () => controller.abort(createAbortError());
+ parent.addEventListener('abort', onAbort, { once: true });
+ return {
+ controller,
+ dispose: () => parent.removeEventListener('abort', onAbort),
+ };
+}
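
A minimal usage sketch for `linkAbortSignal`, assuming a runtime with global `AbortController` and `DOMException` (recent Node.js or a browser). The two functions are copied from the new file above; the variable names in the usage part are illustrative only.

```typescript
function createAbortError(message = 'Aborted'): DOMException {
  return new DOMException(message, 'AbortError');
}

function linkAbortSignal(parent?: AbortSignal): { controller: AbortController; dispose: () => void } {
  const controller = new AbortController();
  if (!parent) {
    return { controller, dispose: () => undefined };
  }
  if (parent.aborted) {
    controller.abort(createAbortError());
    return { controller, dispose: () => undefined };
  }
  const onAbort = () => controller.abort(createAbortError());
  parent.addEventListener('abort', onAbort, { once: true });
  return {
    controller,
    dispose: () => parent.removeEventListener('abort', onAbort),
  };
}

const parentController = new AbortController();

// A linked child follows the parent signal.
const linked = linkAbortSignal(parentController.signal);

// A disposed child is detached and no longer follows the parent.
const detached = linkAbortSignal(parentController.signal);
detached.dispose();

const abortedBefore = linked.controller.signal.aborted;
parentController.abort();
const abortedAfter = linked.controller.signal.aborted;
const detachedAborted = detached.controller.signal.aborted;
```

Calling `dispose()` once the child work finishes removes the listener from a long-lived parent signal, so finished children do not accumulate on it.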
diff --git a/packages/cli/src/context/core/git-env.ts b/packages/cli/src/context/core/git-env.ts
index 9ad3f121..0bb7bf74 100644
--- a/packages/cli/src/context/core/git-env.ts
+++ b/packages/cli/src/context/core/git-env.ts
@@ -24,6 +24,25 @@ function sanitizedGitEnv(env: NodeJS.ProcessEnv = process.env): NodeJS.ProcessEn
return sanitized;
}
-export function createSimpleGit(baseDir: string): SimpleGit {
- return simpleGit({ baseDir, unsafe: { allowUnsafeAskPass: true } }).env(sanitizedGitEnv());
+/**
+ * Create a simple-git client scoped to `baseDir`. When an identity is provided, ktx's own
+ * commits carry it through the GIT_AUTHOR and GIT_COMMITTER environment variables instead of
+ * relying on repo-local or global git config. This keeps commits working when the project
+ * directory is an existing repo ktx did not create and the machine has no configured git
+ * identity (e.g. a fresh Mac with no ~/.gitconfig), without mutating the user's repo config.
+ * Explicit `--author` flags on individual commits still take precedence over GIT_AUTHOR_NAME.
+ *
+ * `commit.gpgsign=false` is injected as a per-invocation `-c` override so ktx's commits never
+ * attempt GPG signing: ktx commits under a synthetic identity that can never own a secret key, so
+ * a user's `commit.gpgsign=true` would otherwise fail every commit with "No secret key".
+ */
+export function createSimpleGit(baseDir: string, identity?: { name: string; email: string }): SimpleGit {
+ const env = sanitizedGitEnv();
+ if (identity?.name && identity.email) {
+ env.GIT_AUTHOR_NAME = identity.name;
+ env.GIT_AUTHOR_EMAIL = identity.email;
+ env.GIT_COMMITTER_NAME = identity.name;
+ env.GIT_COMMITTER_EMAIL = identity.email;
+ }
+ return simpleGit({ baseDir, config: ['commit.gpgsign=false'], unsafe: { allowUnsafeAskPass: true } }).env(env);
}
diff --git a/packages/cli/src/context/core/git.service.ts b/packages/cli/src/context/core/git.service.ts
index 216183ff..febd9277 100644
--- a/packages/cli/src/context/core/git.service.ts
+++ b/packages/cli/src/context/core/git.service.ts
@@ -27,6 +27,58 @@ export interface WorktreeEntry {
head: string | null;
}
+export type KtxRepoOwnership = 'unowned' | 'ktx-managed' | 'foreign';
+
+export class KtxForeignGitRepositoryError extends Error {
+ constructor(configDir: string) {
+ super(
+ `${configDir} is already a git repository that ktx did not create. ` +
+ 'ktx maintains its context in a repository it owns; run ktx in a dedicated directory or move the existing repository aside.',
+ );
+ this.name = 'KtxForeignGitRepositoryError';
+ }
+}
+
+function isNodeErrnoException(error: unknown): error is NodeJS.ErrnoException {
+ return error instanceof Error && 'code' in error;
+}
+
+/**
+ * Classify whether ktx may own a git repository rooted exactly at `dir`. A root
+ * `ktx.yaml` is the ownership signal; the working tree decides, not git history,
+ * because older ktx versions left `ktx.yaml` uncommitted (it holds secret refs).
+ *
+ * - `unowned`: no repo here (including a missing or non-directory path) → ktx may `git init`.
+ * - `ktx-managed`: `<dir>/.git` is a directory and `ktx.yaml` sits at the root.
+ * - `foreign`: any other repo — no root `ktx.yaml`, or a `.git` *file* (a linked
+ * worktree). ktx must never adopt or mutate it.
+ *
+ * Reads only `<dir>` itself; never walks up, so a parent repo cannot change the answer.
+ */
+export async function classifyKtxRepoOwnership(dir: string): Promise<KtxRepoOwnership> {
+ let dotGitIsDirectory: boolean;
+ try {
+ dotGitIsDirectory = (await fs.lstat(join(dir, '.git'))).isDirectory();
+ } catch (error) {
+ // ENOENT: `<dir>/.git` is absent. ENOTDIR: `<dir>` itself is a file, so it
+ // can hold no repo. Either way there is nothing for ktx to avoid here.
+ if (isNodeErrnoException(error) && (error.code === 'ENOENT' || error.code === 'ENOTDIR')) {
+ return 'unowned';
+ }
+ throw error;
+ }
+ if (!dotGitIsDirectory) {
+ return 'foreign';
+ }
+ try {
+ // stat (not lstat): follow symlinks, matching what `loadKtxProject`'s
+ // readFile accepts — a dir that loads as a ktx project classifies as one.
+ return (await fs.stat(join(dir, 'ktx.yaml'))).isFile() ? 'ktx-managed' : 'foreign';
+ } catch {
+ return 'foreign';
+ }
+}
+
export type SquashMergeResult =
| { ok: true; squashSha: string; touchedPaths: string[] }
| { ok: false; conflict: true; conflictPaths: string[] };
@@ -85,8 +137,12 @@ export class GitService {
await fs.mkdir(this.configDir, { recursive: true });
this.logger.log(`Config directory ensured at: ${this.configDir}`);
- // Initialize simple-git
- this.git = createSimpleGit(this.configDir);
+ // Initialize simple-git. Carry ktx's identity in the environment so commits succeed even
+ // when this repo already exists and the machine has no configured git identity.
+ this.git = createSimpleGit(this.configDir, {
+ name: this.config.git.userName,
+ email: this.config.git.userEmail,
+ });
// Initialize git repository
await this.initialize();
@@ -94,16 +150,16 @@ export class GitService {
private async initialize(): Promise<void> {
try {
- // Check if already initialized
- const isRepo = await this.git.checkIsRepo();
+ const ownership = await classifyKtxRepoOwnership(this.configDir);
- if (!isRepo) {
- await this.git.init();
- const gitConfig = this.config.git;
- await this.git.addConfig('user.name', gitConfig.userName);
- await this.git.addConfig('user.email', gitConfig.userEmail);
- this.logger.log('Initialized git repository');
+ if (ownership === 'foreign') {
+ throw new KtxForeignGitRepositoryError(this.configDir);
}
+ if (ownership === 'unowned') {
+ await this.git.init();
+ this.logger.log('Initialized ktx-managed git repository');
+ }
+ // ownership === 'ktx-managed' → ktx's own repo; proceed with the normal re-run path.
// Keep any auto-maintenance triggered by writes in-process. Detached maintenance can
// keep object-pack directories alive briefly after awaited git commands complete,
@@ -124,8 +180,17 @@ export class GitService {
this.logger.log('Wrote bootstrap commit to config repo');
}
} catch (error) {
+ // The foreign-repo error is already typed and actionable; surface it verbatim so every
+ // command that loads the project shows the same clear guidance instead of a generic wrapper.
+ if (error instanceof KtxForeignGitRepositoryError) {
+ throw error;
+ }
this.logger.error('Failed to initialize git repository', error);
- throw new Error('Failed to initialize git repository');
+ // Preserve the underlying git error: the generic message alone is undiagnosable in
+ // telemetry and unactionable for the user. The exception reporter walks `cause` and
+ // redacts secrets before send.
+ const detail = error instanceof Error ? error.message : String(error);
+ throw new Error(`Failed to initialize git repository: ${detail}`, { cause: error });
}
}
@@ -547,12 +612,13 @@ export class GitService {
}
/**
- * List all paths under the working tree that match `pathSpec`, scoped to HEAD.
- * Used for the reconciler's first-ever run when there's no watermark to diff from.
+ * List all paths matching `pathSpec` as they exist at `commitHash`. Reads from
+ * git object storage, so it's safe against concurrent working-tree mutations
+ * and can recover paths (e.g. a human-renamed file) that no longer exist on disk.
*/
- async listFilesAtHead(pathSpec: string): Promise<string[]> {
+ async listFilesAtCommit(pathSpec: string, commitHash: string): Promise<string[]> {
try {
- const raw = await this.git.raw(['ls-tree', '-r', '-z', '--name-only', 'HEAD', '--', pathSpec]);
+ const raw = await this.git.raw(['ls-tree', '-r', '-z', '--name-only', commitHash, '--', pathSpec]);
if (!raw) {
return [];
}
@@ -562,6 +628,14 @@ export class GitService {
}
}
+ /**
+ * List all paths under the working tree that match `pathSpec`, scoped to HEAD.
+ * Used for the reconciler's first-ever run when there's no watermark to diff from.
+ */
+ async listFilesAtHead(pathSpec: string): Promise<string[]> {
+ return this.listFilesAtCommit(pathSpec, 'HEAD');
+ }
+
/**
* Collapse all commits between `preHead` and current HEAD into a single commit with the given
* message. Used by the memory agent to squash N per-tool-call commits into one ingest commit.
@@ -899,7 +973,10 @@ export class GitService {
*/
forWorktree(workdir: string): GitService {
const scoped = new GitService(this.config, this.logger);
- scoped.git = createSimpleGit(workdir);
+ scoped.git = createSimpleGit(workdir, {
+ name: this.config.git.userName,
+ email: this.config.git.userEmail,
+ });
scoped.configDir = workdir;
return scoped;
}
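
The ownership rules documented on `classifyKtxRepoOwnership` in this file's diff reduce to a small decision table. The sketch below restates it as a pure function for illustration; `DotGitState` and `classifyFromFacts` are hypothetical names, and the real function derives these facts from `lstat`/`stat` calls instead of taking them as arguments.

```typescript
type DotGitState = 'absent' | 'directory' | 'file';
type Ownership = 'unowned' | 'ktx-managed' | 'foreign';

// Pure restatement of the decision table; no filesystem access.
function classifyFromFacts(dotGit: DotGitState, hasRootKtxYaml: boolean): Ownership {
  if (dotGit === 'absent') {
    return 'unowned'; // no repo at this directory, so ktx may `git init`
  }
  if (dotGit === 'file') {
    return 'foreign'; // a `.git` file marks a linked worktree
  }
  // `.git` is a directory: the root `ktx.yaml` is the ownership signal.
  return hasRootKtxYaml ? 'ktx-managed' : 'foreign';
}

const freshDirectory = classifyFromFacts('absent', false);
const existingKtxProject = classifyFromFacts('directory', true);
const applicationCheckout = classifyFromFacts('directory', false);
const linkedWorktree = classifyFromFacts('file', true);
```

Only the `foreign` outcome makes `initialize()` throw `KtxForeignGitRepositoryError`; `unowned` leads to `git init`, and `ktx-managed` continues on the normal re-run path.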
diff --git a/packages/cli/src/context/ingest/adapters/historic-sql/projection.ts b/packages/cli/src/context/ingest/adapters/historic-sql/projection.ts
index 2c7830a2..b3470441 100644
--- a/packages/cli/src/context/ingest/adapters/historic-sql/projection.ts
+++ b/packages/cli/src/context/ingest/adapters/historic-sql/projection.ts
@@ -3,6 +3,7 @@ import { dirname, join, relative } from 'node:path';
import YAML from 'yaml';
import type { MemoryAction } from '../../../../context/memory/types.js';
import { rawSourcesDirForSync } from '../../raw-sources-paths.js';
+import { isSlYamlPath } from '../../../sl/source-files.js';
import type { FinalizationOverrideReplay } from '../../types.js';
import { mergeUsagePreservingExternal } from '../live-database/manifest.js';
import { historicSqlEvidenceEnvelopeSchema, type HistoricSqlEvidenceEnvelope } from './evidence.js';
@@ -251,7 +252,7 @@ export async function projectHistoricSqlEvidence(input: HistoricSqlProjectionInp
const patternEvidence = evidence.filter((entry): entry is HistoricSqlEvidenceEnvelope & { kind: 'pattern' } => entry.kind === 'pattern');
const schemaRoot = join(input.workdir, 'semantic-layer', input.connectionId, '_schema');
- for (const file of (await walkFiles(schemaRoot)).filter((candidate) => candidate.endsWith('.yaml') || candidate.endsWith('.yml'))) {
+ for (const file of (await walkFiles(schemaRoot)).filter(isSlYamlPath)) {
const path = join(schemaRoot, file);
const before = await readFile(path, 'utf-8');
const shard = (YAML.parse(before) ?? {}) as ManifestShard;
diff --git a/packages/cli/src/context/ingest/adapters/historic-sql/query-history-filter-picker.ts b/packages/cli/src/context/ingest/adapters/historic-sql/query-history-filter-picker.ts
index bb296513..3f77900d 100644
--- a/packages/cli/src/context/ingest/adapters/historic-sql/query-history-filter-picker.ts
+++ b/packages/cli/src/context/ingest/adapters/historic-sql/query-history-filter-picker.ts
@@ -23,6 +23,7 @@ export interface QueryHistoryFilterProposal {
consideredRoleCount: number;
skipped: { reason: 'no-llm' | 'no-daemon' | 'no-in-scope-history' | 'user-block-present' } | null;
warnings: string[];
+ parseFailedTemplateIds: string[];
}
export interface ProposeQueryHistoryServiceAccountFiltersInput {
@@ -74,7 +75,7 @@ const queryHistoryFilterAdjudicationSchema = z.object({
type QueryHistoryFilterAdjudication = z.infer<typeof queryHistoryFilterAdjudicationSchema>;
function emptyProposal(skipped: QueryHistoryFilterProposal['skipped'], warnings: string[] = []): QueryHistoryFilterProposal {
- return { excludedRoles: [], consideredRoleCount: 0, skipped, warnings };
+ return { excludedRoles: [], consideredRoleCount: 0, skipped, warnings, parseFailedTemplateIds: [] };
}
function displayTableRef(ref: KtxTableRef): string {
@@ -180,6 +181,7 @@ export async function proposeQueryHistoryServiceAccountFilters(
const windowDays = 'windowDays' in config ? config.windowDays : 90;
const windowStart = new Date(now.getTime() - windowDays * 24 * 60 * 60 * 1000);
const warnings: string[] = [];
+ const parseFailedTemplateIds: string[] = [];
const snapshot: AggregatedTemplate[] = [];
try {
@@ -212,7 +214,7 @@ export async function proposeQueryHistoryServiceAccountFilters(
for (const template of snapshot) {
const parsed = analysis.get(template.templateId);
if (!parsed || parsed.error) {
- warnings.push(`query_history_filter_picker_parse_failed:${template.templateId}`);
+ parseFailedTemplateIds.push(template.templateId);
continue;
}
const tablesTouched = [...new Map(parsed.tablesTouched.map((ref) => [tableRefKey(ref), ref])).values()]
@@ -236,6 +238,7 @@ export async function proposeQueryHistoryServiceAccountFilters(
consideredRoleCount: records.length,
skipped: { reason: 'no-in-scope-history' },
warnings,
+ parseFailedTemplateIds,
};
}
@@ -256,6 +259,7 @@ export async function proposeQueryHistoryServiceAccountFilters(
...warnings,
`query_history_filter_picker_llm_failed:${error instanceof Error ? error.message : String(error)}`,
],
+ parseFailedTemplateIds,
};
}
@@ -274,5 +278,6 @@ export async function proposeQueryHistoryServiceAccountFilters(
consideredRoleCount: records.length,
skipped: input.userServiceAccountsPresent ? { reason: 'user-block-present' } : null,
warnings,
+ parseFailedTemplateIds,
};
}
diff --git a/packages/cli/src/context/ingest/adapters/live-database/manifest.ts b/packages/cli/src/context/ingest/adapters/live-database/manifest.ts
index 3c35b463..2e864528 100644
--- a/packages/cli/src/context/ingest/adapters/live-database/manifest.ts
+++ b/packages/cli/src/context/ingest/adapters/live-database/manifest.ts
@@ -86,6 +86,9 @@ export interface BuildLiveDatabaseManifestShardsInput {
existingPreservedJoins?: Map;
existingDescriptions?: Map;
existingUsage?: Map;
+ // Table refs owned by other federated members; declared cross-DB joins to
+ // these survive even though the target has no shard in this snapshot.
+ federatedSiblingTargets?: Set<string>;
}
export interface BuildLiveDatabaseManifestShardsResult {
@@ -204,15 +207,20 @@ function joinCondition(
.join(' AND ');
}
-function buildJoinsByTable(
+/** @internal */
+export function buildJoinsByTable(
tableNames: Set<string>,
joins: LiveDatabaseManifestJoinData[],
preservedJoins: Map,
+ federatedSiblingTargets: Set<string> = new Set(),
): Map {
const joinsByTable = new Map();
for (const join of joins) {
- if (!tableNames.has(join.fromTable) || !tableNames.has(join.toTable)) {
+ const fromLocal = tableNames.has(join.fromTable);
+ const toLocal = tableNames.has(join.toTable);
+ const toSibling = federatedSiblingTargets.has(join.toTable);
+ if (!fromLocal || (!toLocal && !toSibling)) {
continue;
}
const relationship = RELATIONSHIP_MAP[join.relationship] ?? join.relationship;
@@ -223,13 +231,17 @@ function buildJoinsByTable(
source: join.source,
});
- const reverseRelationship = RELATIONSHIP_INVERSE[relationship] ?? 'one_to_many';
- addJoinOnce(joinsByTable, join.toTable, {
- to: join.fromTable,
- on: joinCondition(join.toTable, join.toColumns, join.fromTable, join.fromColumns),
- relationship: reverseRelationship,
- source: join.source,
- });
+ // Reverse direction only when the target is a local table in THIS snapshot;
+ // a federated sibling has no shard here, so it gets no reverse entry.
+ if (toLocal) {
+ const reverseRelationship = RELATIONSHIP_INVERSE[relationship] ?? 'one_to_many';
+ addJoinOnce(joinsByTable, join.toTable, {
+ to: join.fromTable,
+ on: joinCondition(join.toTable, join.toColumns, join.fromTable, join.fromColumns),
+ relationship: reverseRelationship,
+ source: join.source,
+ });
+ }
}
for (const [tableName, tableJoins] of preservedJoins) {
@@ -237,7 +249,7 @@ function buildJoinsByTable(
continue;
}
for (const join of tableJoins) {
- if (tableNames.has(join.to)) {
+ if (tableNames.has(join.to) || federatedSiblingTargets.has(join.to)) {
addJoinOnce(joinsByTable, tableName, join);
}
}
@@ -250,7 +262,12 @@ export function buildLiveDatabaseManifestShards(
input: BuildLiveDatabaseManifestShardsInput,
): BuildLiveDatabaseManifestShardsResult {
const tableNames = new Set(input.tables.map((table) => table.name));
- const joinsByTable = buildJoinsByTable(tableNames, input.joins, input.existingPreservedJoins ?? new Map());
+ const joinsByTable = buildJoinsByTable(
+ tableNames,
+ input.joins,
+ input.existingPreservedJoins ?? new Map(),
+ input.federatedSiblingTargets ?? new Set(),
+ );
const shards = new Map();
for (const table of input.tables) {
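
The federated-sibling filter in `buildJoinsByTable` can be sketched in isolation. This is a simplified model, not the real function: `SketchJoin`, `planJoinEntries`, and the table names are hypothetical, and join conditions, relationships, and preserved joins are left out.

```typescript
type SketchJoin = { fromTable: string; toTable: string };
type JoinEntry = { table: string; to: string };

// Decide which join entries would be emitted, mirroring the filter in the diff.
function planJoinEntries(
  tableNames: Set<string>,
  joins: SketchJoin[],
  federatedSiblingTargets: Set<string> = new Set(),
): JoinEntry[] {
  const entries: JoinEntry[] = [];
  for (const join of joins) {
    const fromLocal = tableNames.has(join.fromTable);
    const toLocal = tableNames.has(join.toTable);
    const toSibling = federatedSiblingTargets.has(join.toTable);
    if (!fromLocal || (!toLocal && !toSibling)) {
      continue; // source must be local; target must be local or a known sibling
    }
    entries.push({ table: join.fromTable, to: join.toTable });
    if (toLocal) {
      // Reverse entry only for local targets; a sibling has no shard in this snapshot.
      entries.push({ table: join.toTable, to: join.fromTable });
    }
  }
  return entries;
}

const localTables = new Set(['orders', 'customers']);
const siblingTables = new Set(['billing.invoices']);
const plannedEntries = planJoinEntries(
  localTables,
  [
    { fromTable: 'orders', toTable: 'customers' },
    { fromTable: 'orders', toTable: 'billing.invoices' },
    { fromTable: 'orders', toTable: 'unknown_table' },
  ],
  siblingTables,
);
```

In this model the local join yields entries in both directions, the cross-database join yields only the forward entry, and the join to an unknown table is dropped.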
diff --git a/packages/cli/src/context/ingest/adapters/looker/client.ts b/packages/cli/src/context/ingest/adapters/looker/client.ts
index 90f9f466..c31145d0 100644
--- a/packages/cli/src/context/ingest/adapters/looker/client.ts
+++ b/packages/cli/src/context/ingest/adapters/looker/client.ts
@@ -88,13 +88,18 @@ const defaultLogger: LookerClientLogger = {
class InlineLookerSettings extends NodeSettings {
constructor(private readonly params: LookerConnectionParams) {
- super('', {
+ // @looker/sdk-rtl boundary: NodeSettings consumes a string-valued config
+ // section (read back via the readConfig override below), but its constructor
+ // is typed to accept a fully-realized IApiSettings. The string record is the
+ // shape the library actually reads, so narrow to IApiSection first.
+ const settings: IApiSection = {
base_url: normalizeBaseUrl(params.base_url),
client_id: params.client_id,
client_secret: params.client_secret, // pragma: allowlist secret
verify_ssl: 'true',
timeout: '120',
- } as unknown as IApiSettings);
+ };
+ super('', settings as IApiSection & IApiSettings);
}
override readConfig(_section?: string): IApiSection {
diff --git a/packages/cli/src/context/ingest/adapters/looker/factory.ts b/packages/cli/src/context/ingest/adapters/looker/factory.ts
index 64fdfa42..edb1dcbe 100644
--- a/packages/cli/src/context/ingest/adapters/looker/factory.ts
+++ b/packages/cli/src/context/ingest/adapters/looker/factory.ts
@@ -19,6 +19,16 @@ export class DefaultLookerConnectionClientFactory implements LookerConnectionCli
) {}
async createClient(lookerConnectionId: string): Promise<LookerRuntimeClient> {
+ return this.createLookerClient(lookerConnectionId);
+ }
+
+ /**
+ * Like {@link createClient} but preserves the concrete {@link LookerClient}
+ * type, so callers that need methods outside the `LookerRuntimeClient`
+ * contract (e.g. `listLookerConnections`, `testConnection`) keep them without
+ * a cast.
+ */
+ async createLookerClient(lookerConnectionId: string): Promise<LookerClient> {
const credentials = await this.resolver.resolve(lookerConnectionId);
return new LookerClient(credentials, this.deps);
}
diff --git a/packages/cli/src/context/ingest/adapters/looker/mapping.ts b/packages/cli/src/context/ingest/adapters/looker/mapping.ts
index 4da6344d..3e451aca 100644
--- a/packages/cli/src/context/ingest/adapters/looker/mapping.ts
+++ b/packages/cli/src/context/ingest/adapters/looker/mapping.ts
@@ -214,7 +214,7 @@ export function validateLookerMappings(args: {
if (!args.knownKtxConnectionIds.has(mapping.ktxConnectionId)) {
errors.push({
key: mapping.lookerConnectionName,
- reason: `KTX connection ${mapping.ktxConnectionId} does not exist`,
+ reason: `ktx connection ${mapping.ktxConnectionId} does not exist`,
});
continue;
}
diff --git a/packages/cli/src/context/ingest/adapters/metabase/client.ts b/packages/cli/src/context/ingest/adapters/metabase/client.ts
index 7b075991..5a48d0c6 100644
--- a/packages/cli/src/context/ingest/adapters/metabase/client.ts
+++ b/packages/cli/src/context/ingest/adapters/metabase/client.ts
@@ -81,7 +81,7 @@ class MetabaseApiError extends Error {
* Strip Metabase `[[ ... {{ var }} ... ]]` optional-clause blocks from native SQL.
*
* The bracketed blocks are emitted only when the embedded `{{ var }}` is supplied at
- * Metabase query time. For KTX semantic-layer ingest there's no such runtime
+ * Metabase query time. For ktx semantic-layer ingest there's no such runtime
* parameter — chat-time filters are composed by the SL query planner — so the optional
* block must be removed before the SQL becomes a permanent SL source. Substituting a
* dummy value (the alternative) bakes a placeholder filter into the source and silently
diff --git a/packages/cli/src/context/ingest/adapters/metabase/local-metabase.adapter.ts b/packages/cli/src/context/ingest/adapters/metabase/local-metabase.adapter.ts
index 7cb3d843..f581c5e3 100644
--- a/packages/cli/src/context/ingest/adapters/metabase/local-metabase.adapter.ts
+++ b/packages/cli/src/context/ingest/adapters/metabase/local-metabase.adapter.ts
@@ -34,7 +34,7 @@ export function metabaseRuntimeConfigFromLocalConnection(
}
if (hasNetworkProxy(connection)) {
throw new Error(
- `Standalone KTX does not support proxy-bearing Metabase connections yet. Use hosted Metabase ingest for "${connectionId}" until the KTX Metabase proxy support spec lands.`,
+ `Standalone ktx does not support proxy-bearing Metabase connections yet. Use hosted Metabase ingest for "${connectionId}" until the ktx Metabase proxy support spec lands.`,
);
}
diff --git a/packages/cli/src/context/ingest/adapters/metabase/mapping.ts b/packages/cli/src/context/ingest/adapters/metabase/mapping.ts
index 788ae43b..b5ba67a7 100644
--- a/packages/cli/src/context/ingest/adapters/metabase/mapping.ts
+++ b/packages/cli/src/context/ingest/adapters/metabase/mapping.ts
@@ -186,7 +186,7 @@ export function validateMetabaseMappings(args: {
continue;
}
if (!args.knownKtxConnectionIds.has(connectionId)) {
- errors.push({ key, reason: `KTX connection ${connectionId} does not exist` });
+ errors.push({ key, reason: `ktx connection ${connectionId} does not exist` });
}
}
return errors.length === 0 ? { ok: true } : { ok: false, errors };
@@ -207,7 +207,7 @@ export function validateMappingPhysicalMatch(
}
if (target.connection_type !== expectedType) {
- return `Metabase database engine '${engine}' does not match KTX connection type '${target.connection_type}'`;
+ return `Metabase database engine '${engine}' does not match ktx connection type '${target.connection_type}'`;
}
const metabaseDb = normalizeName(mapping.metabaseDbName);
@@ -215,7 +215,7 @@ export function validateMappingPhysicalMatch(
if (engine === 'snowflake' || engine === 'bigquery' || engine === 'bigquery-cloud-sdk') {
if (metabaseDb && targetDb && metabaseDb !== targetDb) {
- return `Metabase database '${mapping.metabaseDbName}' does not match KTX connection database '${displayValue(
+ return `Metabase database '${mapping.metabaseDbName}' does not match ktx connection database '${displayValue(
getTargetDatabase(target),
)}'`;
}
@@ -227,12 +227,12 @@ export function validateMappingPhysicalMatch(
const targetHost = normalizeHost(target.host);
if (metabaseHost && targetHost && metabaseHost !== targetHost) {
- return `Metabase host '${mapping.metabaseHost}' does not match KTX connection host '${displayValue(
+ return `Metabase host '${mapping.metabaseHost}' does not match ktx connection host '${displayValue(
target.host,
)}'`;
}
if (metabaseDb && targetDb && metabaseDb !== targetDb) {
- return `Metabase database '${mapping.metabaseDbName}' does not match KTX connection database '${displayValue(
+ return `Metabase database '${mapping.metabaseDbName}' does not match ktx connection database '${displayValue(
getTargetDatabase(target),
)}'`;
}
@@ -274,7 +274,7 @@ export async function refreshMetabaseMapping(args: {
if (!target) {
physicalMismatches.push({
mappingId: String(mapping.id),
- reason: `KTX connection ${mapping.ktxConnectionId} does not exist`,
+ reason: `ktx connection ${mapping.ktxConnectionId} does not exist`,
});
continue;
}
diff --git a/packages/cli/src/context/ingest/artifact-gates.ts b/packages/cli/src/context/ingest/artifact-gates.ts
index 35735575..a67f8455 100644
--- a/packages/cli/src/context/ingest/artifact-gates.ts
+++ b/packages/cli/src/context/ingest/artifact-gates.ts
@@ -2,20 +2,16 @@ import type { SemanticLayerService } from '../../context/sl/semantic-layer.servi
import type { TouchedSlSource } from '../../context/tools/touched-sl-sources.js';
import type { KnowledgeWikiService } from '../../context/wiki/knowledge-wiki.service.js';
import { findMissingWikiRefs } from '../wiki/wiki-ref-validation.js';
+import type { WuValidationResult } from './stages/validate-wu-sources.js';
import { findInvalidWikiBodyRefs } from './wiki-body-refs.js';
-interface TouchedValidationResult {
- invalidSources: string[];
- validSources: string[];
-}
-
export interface FinalArtifactGateInput {
connectionIds: string[];
changedWikiPageKeys: string[];
touchedSlSources: TouchedSlSource[];
wikiService: KnowledgeWikiService;
semanticLayerService: SemanticLayerService;
- validateTouchedSources(touched: TouchedSlSource[]): Promise<TouchedValidationResult>;
+ validateTouchedSources(touched: TouchedSlSource[]): Promise<WuValidationResult>;
tableExists(connectionId: string, tableRef: string): Promise<boolean>;
}
@@ -40,54 +36,6 @@ function slEntityNames(source: Awaited();
- const unique: TouchedSlSource[] = [];
- for (const source of sources) {
- const key = `${source.connectionId}:${source.sourceName}`;
- if (seen.has(key)) {
- continue;
- }
- seen.add(key);
- unique.push(source);
- }
- return unique.sort((left, right) => {
- const byConnection = left.connectionId.localeCompare(right.connectionId);
- return byConnection === 0 ? left.sourceName.localeCompare(right.sourceName) : byConnection;
- });
-}
-
-async function expandTouchedSlSourcesWithDirectJoinNeighbors(input: FinalArtifactGateInput): Promise<TouchedSlSource[]> {
- const expanded = [...input.touchedSlSources];
- const touchedByConnection = new Map<string, Set<string>>();
- for (const source of input.touchedSlSources) {
- const bucket = touchedByConnection.get(source.connectionId) ?? new Set<string>();
- bucket.add(source.sourceName);
- touchedByConnection.set(source.connectionId, bucket);
- }
-
- for (const connectionId of input.connectionIds) {
- const touched = touchedByConnection.get(connectionId);
- if (!touched || touched.size === 0) {
- continue;
- }
- const { sources } = await input.semanticLayerService.loadAllSources(connectionId);
- for (const source of sources) {
- const sourceIsTouched = touched.has(source.name);
- if (sourceIsTouched) {
- for (const join of source.joins ?? []) {
- expanded.push({ connectionId, sourceName: join.to });
- }
- }
- if ((source.joins ?? []).some((join) => touched.has(join.to))) {
- expanded.push({ connectionId, sourceName: source.name });
- }
- }
- }
-
- return uniqueTouchedSources(expanded);
-}
-
async function validateWikiSlRefs(input: FinalArtifactGateInput): Promise<string[]> {
const errors: string[] = [];
const sourcesByConnection = new Map>['sources']>();
@@ -146,9 +94,13 @@ async function validateWikiRefs(input: FinalArtifactGateInput): Promise {
- const touchedWithDependencies = await expandTouchedSlSourcesWithDirectJoinNeighbors(input);
- const validation = await input.validateTouchedSources(touchedWithDependencies);
- const errors: string[] = validation.invalidSources.map((source) => `semantic-layer validation failed for ${source}`);
+ // Join-neighbor expansion happens inside validateTouchedSources so work-unit
+ // validation and this gate check the same set — a source that passes one
+ // passes the other.
+ const validation = await input.validateTouchedSources(input.touchedSlSources);
+ const errors: string[] = validation.invalidSources.map(
+ (invalid) => `semantic-layer validation failed for ${invalid.source}: ${invalid.errors.join('; ')}`,
+ );
errors.push(...(await validateWikiSlRefs(input)));
const danglingWikiRefs = await validateWikiRefs(input);
if (danglingWikiRefs.length > 0) {
diff --git a/packages/cli/src/context/ingest/constrained-repair.ts b/packages/cli/src/context/ingest/constrained-repair.ts
new file mode 100644
index 00000000..104eaefc
--- /dev/null
+++ b/packages/cli/src/context/ingest/constrained-repair.ts
@@ -0,0 +1,225 @@
+import { mkdir, readFile, rm, writeFile } from 'node:fs/promises';
+import { dirname, join } from 'node:path';
+import { z } from 'zod';
+import type { AgentRunnerPort, KtxRuntimeToolSet } from '../../context/llm/runtime-port.js';
+import type { IngestTraceWriter } from './ingest-trace.js';
+import { traceTimed } from './ingest-trace.js';
+
+/**
+ * Shared loop for the two integration-time repair agents (semantic gate
+ * repair, textual conflict resolution). Success is decided by re-running the
+ * failed check — `verify` — never by whether the agent edited files: an
+ * ineffective edit fails, and an explicit no-change declaration that verifies
+ * succeeds.
+ */
+
+export type RepairVerification = { ok: true } | { ok: false; reason: string };
+
+export type ConstrainedRepairResult =
+ | { status: 'repaired'; attempts: number; changedPaths: string[] }
+ | { status: 'failed'; attempts: number; reason: string };
+
+export interface ConstrainedRepairToolContext {
+ workdir: string;
+ allowedPaths: ReadonlySet<string>;
+ editedPaths: Set<string>;
+ declareNoChange(reason: string): void;
+}
+
+export interface ConstrainedRepairLoopInput {
+ agentRunner: AgentRunnerPort;
+ workdir: string;
+ allowedPaths: string[];
+ trace: IngestTraceWriter;
+ tracePhase: string;
+ traceEventName: string;
+ traceData: Record<string, unknown>;
+ systemPrompt: string;
+ buildUserPrompt(input: { attempt: number; maxAttempts: number; previousFailure: string | null }): string;
+ buildExtraTools?(context: ConstrainedRepairToolContext): KtxRuntimeToolSet;
+ verify(changedPaths: string[]): Promise<RepairVerification>;
+ /** Failure reason when an attempt neither edits nor declares no-change. */
+ noChangeFailureReason: string;
+ telemetryTags: Record<string, string>;
+ maxAttempts?: number;
+ stepBudget?: number;
+ abortSignal?: AbortSignal;
+}
+
+const readRepairFileSchema = z.object({
+ path: z.string().min(1),
+});
+
+const writeRepairFileSchema = z.object({
+ path: z.string().min(1),
+ content: z.string(),
+});
+
+function normalizeRepoPath(path: string): string {
+ const normalized = path.replace(/\\/g, '/').replace(/^\/+/, '');
+ const parts = normalized.split('/').filter((part) => part.length > 0);
+ if (parts.length === 0 || parts.some((part) => part === '.' || part === '..')) {
+ throw new Error(`repair path must be a repository-relative path: ${path}`);
+ }
+ return parts.join('/');
+}
+
+function assertAllowedPath(path: string, allowedPaths: ReadonlySet<string>): string {
+ const normalized = normalizeRepoPath(path);
+ if (!allowedPaths.has(normalized)) {
+ throw new Error(`repair path not allowed: ${normalized}`);
+ }
+ return normalized;
+}
+
+async function readOptionalFile(path: string): Promise<{ exists: boolean; content: string }> {
+ try {
+ return { exists: true, content: await readFile(path, 'utf-8') };
+ } catch (error) {
+ if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
+ return { exists: false, content: '' };
+ }
+ throw error;
+ }
+}
+
+function buildRepairFileTools(context: ConstrainedRepairToolContext): KtxRuntimeToolSet {
+ return {
+ read_repair_file: {
+ name: 'read_repair_file',
+ description: 'Read one allowed file from the integration worktree.',
+ inputSchema: readRepairFileSchema,
+ execute: async ({ path }: z.infer<typeof readRepairFileSchema>) => {
+ const normalized = assertAllowedPath(path, context.allowedPaths);
+ const file = await readOptionalFile(join(context.workdir, normalized));
+ return {
+ markdown: file.exists ? file.content : `(missing file: ${normalized})`,
+ structured: { path: normalized, exists: file.exists },
+ };
+ },
+ },
+ write_repair_file: {
+ name: 'write_repair_file',
+ description: 'Replace one allowed integration worktree file with repaired text content.',
+ inputSchema: writeRepairFileSchema,
+ execute: async ({ path, content }: z.infer<typeof writeRepairFileSchema>) => {
+ const normalized = assertAllowedPath(path, context.allowedPaths);
+ const fullPath = join(context.workdir, normalized);
+ await mkdir(dirname(fullPath), { recursive: true });
+ await writeFile(fullPath, content, 'utf-8');
+ context.editedPaths.add(normalized);
+ return {
+ markdown: `Wrote ${normalized}`,
+ structured: { path: normalized, bytes: Buffer.byteLength(content) },
+ };
+ },
+ },
+ };
+}
+
+export function buildDeleteRepairFileTool(context: ConstrainedRepairToolContext): KtxRuntimeToolSet {
+ const deleteRepairFileSchema = z.object({
+ path: z.string().min(1),
+ });
+ return {
+ delete_repair_file: {
+ name: 'delete_repair_file',
+ description: 'Delete one allowed integration worktree file when the failed patch proves the deletion is correct.',
+ inputSchema: deleteRepairFileSchema,
+ execute: async ({ path }: z.infer<typeof deleteRepairFileSchema>) => {
+ const normalized = assertAllowedPath(path, context.allowedPaths);
+ await rm(join(context.workdir, normalized), { force: true });
+ context.editedPaths.add(normalized);
+ return {
+ markdown: `Deleted ${normalized}`,
+ structured: { path: normalized },
+ };
+ },
+ },
+ };
+}
+
+export async function runConstrainedRepairLoop(input: ConstrainedRepairLoopInput): Promise<ConstrainedRepairResult> {
+ const allowedPaths = new Set(input.allowedPaths.map(normalizeRepoPath));
+ const sortedAllowedPaths = [...allowedPaths].sort();
+ const maxAttempts = input.maxAttempts ?? 2;
+ const stepBudget = input.stepBudget ?? 16;
+ // Edits persist in the worktree across attempts, so the verified set and the
+ // reported changedPaths accumulate over the whole loop.
+ const editedPaths = new Set<string>();
+ let lastFailure = 'repair did not run';
+ let previousFailure: string | null = null;
+
+ for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
+ let noChangeDeclaration: string | null = null;
+ const toolContext: ConstrainedRepairToolContext = {
+ workdir: input.workdir,
+ allowedPaths,
+ editedPaths,
+ declareNoChange: (reason: string) => {
+ noChangeDeclaration = reason;
+ },
+ };
+ const traceData = {
+ ...input.traceData,
+ attempt,
+ maxAttempts,
+ allowedPaths: sortedAllowedPaths,
+ };
+ const result = await traceTimed(input.trace, input.tracePhase, input.traceEventName, traceData, async () =>
+ input.agentRunner.runLoop({
+ modelRole: 'repair',
+ systemPrompt: input.systemPrompt,
+ userPrompt: input.buildUserPrompt({ attempt, maxAttempts, previousFailure }),
+ toolSet: {
+ ...buildRepairFileTools(toolContext),
+ ...(input.buildExtraTools?.(toolContext) ?? {}),
+ },
+ stepBudget,
+ telemetryTags: input.telemetryTags,
+ abortSignal: input.abortSignal,
+ }),
+ );
+
+ if (result.stopReason === 'error') {
+ lastFailure = result.error?.message ?? 'repair agent loop errored';
+ previousFailure = lastFailure;
+ await input.trace.event('error', input.tracePhase, `${input.traceEventName}_failed`, traceData, result.error);
+ continue;
+ }
+
+ const changedPaths = [...editedPaths].sort();
+ if (changedPaths.length === 0 && noChangeDeclaration === null) {
+ // Nothing changed and nothing was claimed: the failed check would fail
+ // identically, so skip verification and retry.
+ lastFailure = input.noChangeFailureReason;
+ previousFailure = lastFailure;
+ await input.trace.event('error', input.tracePhase, `${input.traceEventName}_failed`, {
+ ...traceData,
+ reason: lastFailure,
+ });
+ continue;
+ }
+
+ const verification = await input.verify(changedPaths);
+ if (!verification.ok) {
+ lastFailure = verification.reason;
+ previousFailure = lastFailure;
+ await input.trace.event('error', input.tracePhase, `${input.traceEventName}_failed`, {
+ ...traceData,
+ changedPaths,
+ reason: lastFailure,
+ });
+ continue;
+ }
+
+ await input.trace.event('debug', input.tracePhase, `${input.traceEventName}_repaired`, {
+ ...traceData,
+ changedPaths,
+ ...(noChangeDeclaration !== null ? { noChangeDeclaration } : {}),
+ });
+ return { status: 'repaired', attempts: attempt, changedPaths };
+ }
+
+ return { status: 'failed', attempts: maxAttempts, reason: lastFailure };
+}
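The path guard in `constrained-repair.ts` is easiest to judge by the inputs it rejects. The sketch below is a minimal, standalone illustration: `normalizeRepoPath` follows the function in the new file, while `isAllowed` and the sample allowlist are hypothetical helpers added only for this example.

```typescript
// Follows the normalizeRepoPath in constrained-repair.ts.
function normalizeRepoPath(path: string): string {
  // Accept Windows separators and drop leading slashes.
  const normalized = path.replace(/\\/g, '/').replace(/^\/+/, '');
  const parts = normalized.split('/').filter((part) => part.length > 0);
  // Any '.' or '..' segment is rejected outright rather than resolved.
  if (parts.length === 0 || parts.some((part) => part === '.' || part === '..')) {
    throw new Error(`repair path must be a repository-relative path: ${path}`);
  }
  return parts.join('/');
}

// Hypothetical helper: a boolean wrapper around the throwing check.
function isAllowed(path: string, allowedPaths: ReadonlySet<string>): boolean {
  try {
    return allowedPaths.has(normalizeRepoPath(path));
  } catch {
    return false;
  }
}

// Hypothetical allowlist with a single wiki page.
const allowed = new Set(['wiki/global/orders.md'].map(normalizeRepoPath));

console.log(isAllowed('/wiki/global/orders.md', allowed)); // leading slash is stripped
console.log(isAllowed('wiki/../secrets.md', allowed)); // traversal segment is rejected
console.log(isAllowed('./wiki/global/orders.md', allowed)); // a leading './' is also rejected
```

Because segments are rejected instead of resolved, an agent cannot reach outside the allowlist by spelling a path differently. A path either normalizes to an allowed entry or the tool call throws.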
diff --git a/packages/cli/src/context/ingest/context-candidates/curator-pagination.service.ts b/packages/cli/src/context/ingest/context-candidates/curator-pagination.service.ts
index 348544ca..fbeab08c 100644
--- a/packages/cli/src/context/ingest/context-candidates/curator-pagination.service.ts
+++ b/packages/cli/src/context/ingest/context-candidates/curator-pagination.service.ts
@@ -39,7 +39,7 @@ export interface CuratorPaginationInput {
buildUserPrompt: (input: CuratorPaginationPromptInput) => string;
buildToolSet: (passNumber: number) => KtxRuntimeToolSet;
getReconciliationActions: () => MemoryAction[];
- onStepFinish?: (info: { passNumber: number; stepIndex: number; stepBudget: number }) => void;
+ abortSignal?: AbortSignal;
}
interface CuratorPaginationResult extends ReconciliationOutcome {
@@ -243,10 +243,7 @@ export class CuratorPaginationService implements CuratorPaginationPort {
sourceKey: params.input.sourceKey,
jobId: params.input.jobId,
forceRun: params.forceRun,
- onStepFinish: params.input.onStepFinish
- ? ({ stepIndex, stepBudget }) =>
- params.input.onStepFinish?.({ passNumber: params.passNumber, stepIndex, stepBudget })
- : undefined,
+ abortSignal: params.input.abortSignal,
});
}
diff --git a/packages/cli/src/context/ingest/final-gate-repair.ts b/packages/cli/src/context/ingest/final-gate-repair.ts
index 1c373aa6..ff2d1a9a 100644
--- a/packages/cli/src/context/ingest/final-gate-repair.ts
+++ b/packages/cli/src/context/ingest/final-gate-repair.ts
@@ -1,16 +1,12 @@
-import { mkdir, readFile, writeFile } from 'node:fs/promises';
-import { dirname, join } from 'node:path';
import { z } from 'zod';
import type { AgentRunnerPort, KtxRuntimeToolSet } from '../../context/llm/runtime-port.js';
-import type { TouchedSlSource } from '../../context/tools/touched-sl-sources.js';
+import type { ConstrainedRepairResult, RepairVerification } from './constrained-repair.js';
+import { runConstrainedRepairLoop } from './constrained-repair.js';
import type { IngestTraceWriter } from './ingest-trace.js';
-import { traceTimed } from './ingest-trace.js';
type FinalGateRepairKind = 'patch_semantic_gate' | 'final_artifact_gate';
-export type FinalGateRepairResult =
- | { status: 'repaired'; attempts: number; changedPaths: string[] }
- | { status: 'failed'; attempts: number; reason: string };
+export type FinalGateRepairResult = ConstrainedRepairResult;
export interface RepairFinalGateFailureInput {
agentRunner: AgentRunnerPort;
@@ -19,50 +15,20 @@ export interface RepairFinalGateFailureInput {
allowedPaths: string[];
trace: IngestTraceWriter;
repairKind: FinalGateRepairKind;
+ /**
+ * Re-runs the failed gate against the current worktree. The repair counts
+ * as successful only when this passes — editing files is not the success
+ * signal.
+ */
+ verify(changedPaths: string[]): Promise<RepairVerification>;
maxAttempts?: number;
stepBudget?: number;
-}
-
-const readRepairFileSchema = z.object({
- path: z.string().min(1),
-});
-
-const writeRepairFileSchema = z.object({
- path: z.string().min(1),
- content: z.string(),
-});
-
-function normalizeRepoPath(path: string): string {
- const normalized = path.replace(/\\/g, '/').replace(/^\/+/, '');
- const parts = normalized.split('/').filter((part) => part.length > 0);
- if (parts.length === 0 || parts.some((part) => part === '.' || part === '..')) {
- throw new Error(`gate repair path must be a repository-relative path: ${path}`);
- }
- return parts.join('/');
-}
-
-function assertAllowedPath(path: string, allowedPaths: ReadonlySet<string>): string {
- const normalized = normalizeRepoPath(path);
- if (!allowedPaths.has(normalized)) {
- throw new Error(`gate repair path not allowed: ${normalized}`);
- }
- return normalized;
-}
-
-async function readOptionalFile(path: string): Promise<{ exists: boolean; content: string }> {
- try {
- return { exists: true, content: await readFile(path, 'utf-8') };
- } catch (error) {
- if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
- return { exists: false, content: '' };
- }
- throw error;
- }
+ abortSignal?: AbortSignal;
}
function buildGateRepairSystemPrompt(): string {
return `
-You repair one KTX isolated-diff artifact gate failure inside the integration worktree.
+You repair one ktx isolated-diff artifact gate failure inside the integration worktree.
@@ -82,7 +48,11 @@ function buildGateRepairUserPrompt(input: {
repairKind: FinalGateRepairKind;
attempt: number;
maxAttempts: number;
+ previousFailure: string | null;
}): string {
+ const previousFailureBlock = input.previousFailure
+ ? `\nPrevious attempt did not pass the gate:\n${input.previousFailure}\n`
+ : '';
return `Repair isolated-diff artifact gates.
Repair kind: ${input.repairKind}
@@ -93,66 +63,34 @@ ${input.allowedPaths.map((path) => `- ${path}`).join('\n')}
Gate error:
${input.gateError}
-
+${previousFailureBlock}
Use read_gate_error first. Then inspect only the allowed files, write the
minimal repaired content, and stop.`;
}
-function buildToolSet(input: {
- workdir: string;
- gateError: string;
- allowedPaths: ReadonlySet<string>;
- editedPaths: Set<string>;
-}): KtxRuntimeToolSet {
+function buildReadGateErrorTool(gateError: string): KtxRuntimeToolSet {
return {
read_gate_error: {
name: 'read_gate_error',
description: 'Read the artifact gate failure that must be repaired.',
inputSchema: z.object({}),
execute: async () => ({
- markdown: input.gateError,
- structured: { gateError: input.gateError },
+ markdown: gateError,
+ structured: { gateError },
}),
},
- read_repair_file: {
- name: 'read_repair_file',
- description: 'Read one allowed file from the integration worktree.',
- inputSchema: readRepairFileSchema,
- execute: async ({ path }: z.infer<typeof readRepairFileSchema>) => {
- const normalized = assertAllowedPath(path, input.allowedPaths);
- const file = await readOptionalFile(join(input.workdir, normalized));
- return {
- markdown: file.exists ? file.content : `(missing file: ${normalized})`,
- structured: { path: normalized, exists: file.exists },
- };
- },
- },
- write_repair_file: {
- name: 'write_repair_file',
- description: 'Replace one allowed integration worktree file with repaired text content.',
- inputSchema: writeRepairFileSchema,
- execute: async ({ path, content }: z.infer<typeof writeRepairFileSchema>) => {
- const normalized = assertAllowedPath(path, input.allowedPaths);
- const fullPath = join(input.workdir, normalized);
- await mkdir(dirname(fullPath), { recursive: true });
- await writeFile(fullPath, content, 'utf-8');
- input.editedPaths.add(normalized);
- return {
- markdown: `Wrote ${normalized}`,
- structured: { path: normalized, bytes: Buffer.byteLength(content) },
- };
- },
- },
};
}
export function finalGateRepairPaths(input: {
changedWikiPageKeys: string[];
- touchedSlSources: TouchedSlSource[];
+ // Resolved by the caller: SL filenames are derived labels, so the repair
+ // allowlist must carry the real on-disk paths, not name-interpolated ones.
+ touchedSlSourcePaths: string[];
}): string[] {
return [
...new Set([
- ...input.touchedSlSources.map((source) => `semantic-layer/${source.connectionId}/${source.sourceName}.yaml`),
+ ...input.touchedSlSourcePaths,
...input.changedWikiPageKeys.map((pageKey) => `wiki/global/${pageKey}.md`),
]),
].sort();
@@ -161,70 +99,38 @@ export function finalGateRepairPaths(input: {
export async function repairFinalGateFailure(
input: RepairFinalGateFailureInput,
): Promise<FinalGateRepairResult> {
- const allowedPaths = new Set(input.allowedPaths.map(normalizeRepoPath));
- const maxAttempts = input.maxAttempts ?? 1;
- const stepBudget = input.stepBudget ?? 16;
- let lastFailure = 'gate repair did not run';
-
- for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
- const editedPaths = new Set<string>();
- const sortedAllowedPaths = [...allowedPaths].sort();
- const traceData = {
+ return runConstrainedRepairLoop({
+ agentRunner: input.agentRunner,
+ workdir: input.workdir,
+ allowedPaths: input.allowedPaths,
+ trace: input.trace,
+ tracePhase: 'gate_repair',
+ traceEventName: 'gate_repair',
+ traceData: {
repairKind: input.repairKind,
- attempt,
- maxAttempts,
- allowedPaths: sortedAllowedPaths,
gateError: input.gateError,
- };
- const result = await traceTimed(input.trace, 'gate_repair', 'gate_repair', traceData, async () =>
- input.agentRunner.runLoop({
- modelRole: 'repair',
- systemPrompt: buildGateRepairSystemPrompt(),
- userPrompt: buildGateRepairUserPrompt({
- gateError: input.gateError,
- allowedPaths: sortedAllowedPaths,
- repairKind: input.repairKind,
- attempt,
- maxAttempts,
- }),
- toolSet: buildToolSet({
- workdir: input.workdir,
- gateError: input.gateError,
- allowedPaths,
- editedPaths,
- }),
- stepBudget,
- telemetryTags: {
- operationName: 'ingest-isolated-diff-gate-repair',
- source: input.trace.context.sourceKey,
- jobId: input.trace.context.jobId,
- repairKind: input.repairKind,
- },
+ },
+ systemPrompt: buildGateRepairSystemPrompt(),
+ buildUserPrompt: ({ attempt, maxAttempts, previousFailure }) =>
+ buildGateRepairUserPrompt({
+ gateError: input.gateError,
+ allowedPaths: [...input.allowedPaths].sort(),
+ repairKind: input.repairKind,
+ attempt,
+ maxAttempts,
+ previousFailure,
}),
- );
-
- if (result.stopReason === 'error') {
- lastFailure = result.error?.message ?? 'gate repair agent loop errored';
- await input.trace.event('error', 'gate_repair', 'gate_repair_failed', traceData, result.error);
- continue;
- }
-
- const changedPaths = [...editedPaths].sort();
- if (changedPaths.length === 0) {
- lastFailure = 'gate repair completed without editing an allowed path';
- await input.trace.event('error', 'gate_repair', 'gate_repair_failed', {
- ...traceData,
- reason: lastFailure,
- });
- continue;
- }
-
- await input.trace.event('debug', 'gate_repair', 'gate_repair_repaired', {
- ...traceData,
- changedPaths,
- });
- return { status: 'repaired', attempts: attempt, changedPaths };
- }
-
- return { status: 'failed', attempts: maxAttempts, reason: lastFailure };
+ buildExtraTools: () => buildReadGateErrorTool(input.gateError),
+ verify: input.verify,
+ noChangeFailureReason: 'gate repair completed without editing an allowed path',
+ telemetryTags: {
+ operationName: 'ingest-isolated-diff-gate-repair',
+ source: input.trace.context.sourceKey,
+ jobId: input.trace.context.jobId,
+ repairKind: input.repairKind,
+ },
+ maxAttempts: input.maxAttempts,
+ stepBudget: input.stepBudget ?? 16,
+ abortSignal: input.abortSignal,
+ });
}
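With this delegation, a gate repair succeeds only when `verify` passes. The sketch below is a simplified, synchronous model of that contract, not the real loop. `runRepairSketch`, `AttemptOutcome`, and the failure string are hypothetical names, and the agent run, tracing, and abort handling are omitted.

```typescript
type Verification = { ok: true } | { ok: false; reason: string };
type RepairResult =
  | { status: 'repaired'; attempts: number; changedPaths: string[] }
  | { status: 'failed'; attempts: number; reason: string };

// Hypothetical stand-in for one agent run: its edits plus an optional no-change claim.
interface AttemptOutcome {
  edited: string[];
  noChangeDeclaration: string | null;
}

function runRepairSketch(
  runAttempt: (attempt: number, previousFailure: string | null) => AttemptOutcome,
  verify: (changedPaths: string[]) => Verification,
  maxAttempts = 2,
): RepairResult {
  // Edits persist across attempts, so the changed set accumulates.
  const editedPaths = new Set<string>();
  let lastFailure = 'repair did not run';
  let previousFailure: string | null = null;

  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const outcome = runAttempt(attempt, previousFailure);
    outcome.edited.forEach((path) => editedPaths.add(path));
    const changedPaths = Array.from(editedPaths).sort();

    // Nothing edited and nothing claimed: skip verification and retry.
    if (changedPaths.length === 0 && outcome.noChangeDeclaration === null) {
      lastFailure = 'completed without editing an allowed path';
      previousFailure = lastFailure;
      continue;
    }

    // Success is decided by the re-run check, never by the presence of edits.
    const verification = verify(changedPaths);
    if (!verification.ok) {
      lastFailure = verification.reason;
      previousFailure = lastFailure;
      continue;
    }
    return { status: 'repaired', attempts: attempt, changedPaths };
  }
  return { status: 'failed', attempts: maxAttempts, reason: lastFailure };
}
```

In this model an attempt that edits a file but leaves the gate failing ends as `failed` with the gate's reason. An attempt that edits nothing but declares no change ends as `repaired` if the gate now passes.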
diff --git a/packages/cli/src/context/ingest/finalization-scope.ts b/packages/cli/src/context/ingest/finalization-scope.ts
index b5ace2b9..6d7e4a42 100644
--- a/packages/cli/src/context/ingest/finalization-scope.ts
+++ b/packages/cli/src/context/ingest/finalization-scope.ts
@@ -1,3 +1,4 @@
+import { isSlYamlPath } from '../../context/sl/source-files.js';
import type { SemanticLayerSource } from '../../context/sl/types.js';
import type { TouchedSlSource } from '../../context/tools/touched-sl-sources.js';
import type { IngestReportFinalizationMismatch } from './reports.js';
@@ -64,39 +65,36 @@ export function deriveFinalizationWikiPageKeys(paths: string[]): string[] {
);
}
-export async function deriveFinalizationTouchedSources(
- input: DeriveTouchedSourcesInput,
-): Promise<DeriveTouchedSourcesResult> {
+// Source identity is the in-file `name:`; filenames are derived labels (see
+// source-files.ts), so a changed path — manifest shard or standalone file —
+// cannot be mapped to a source by parsing its filename. Instead, every changed
+// semantic-layer file is attributed through the before/after diff of its
+// connection's composed sources. A changed file whose connection diff is empty
+// cannot be attributed to any source and is surfaced as unresolved.
+export function deriveFinalizationTouchedSources(input: DeriveTouchedSourcesInput): DeriveTouchedSourcesResult {
const touched = new Map<string, TouchedSlSource>();
const unresolvedPaths: string[] = [];
+ const pathsByConnection = new Map<string, string[]>();
for (const path of input.changedPaths) {
- if (!path.startsWith('semantic-layer/') || !(path.endsWith('.yaml') || path.endsWith('.yml'))) {
+ if (!path.startsWith('semantic-layer/') || !isSlYamlPath(path)) {
continue;
}
- const parts = path.split('/');
- const connectionId = parts[1] ?? '';
+ const connectionId = path.split('/')[1] ?? '';
if (!connectionId) {
unresolvedPaths.push(path);
continue;
}
- if (parts[2] !== '_schema') {
- const fileName = parts.at(-1) ?? '';
- const sourceName = fileName.replace(/\.ya?ml$/, '');
- if (!sourceName) {
- unresolvedPaths.push(path);
- continue;
- }
- touched.set(`${connectionId}:${sourceName}`, { connectionId, sourceName });
- continue;
- }
+ pathsByConnection.set(connectionId, [...(pathsByConnection.get(connectionId) ?? []), path]);
+ }
+ for (const [connectionId, paths] of pathsByConnection) {
const changedNames = changedSourceNames(
input.beforeSourcesByConnection.get(connectionId) ?? [],
input.afterSourcesByConnection.get(connectionId) ?? [],
);
if (changedNames.length === 0) {
- unresolvedPaths.push(path);
+ unresolvedPaths.push(...paths);
continue;
}
for (const sourceName of changedNames) {
diff --git a/packages/cli/src/context/ingest/ingest-bundle.runner.ts b/packages/cli/src/context/ingest/ingest-bundle.runner.ts
index 3f2b41d3..45953adf 100644
--- a/packages/cli/src/context/ingest/ingest-bundle.runner.ts
+++ b/packages/cli/src/context/ingest/ingest-bundle.runner.ts
@@ -3,10 +3,12 @@ import { dirname, join } from 'node:path';
import pLimit from 'p-limit';
import { z } from 'zod';
import { type KtxLogger, noopLogger } from '../../context/core/config.js';
+import type { RateLimitWaitState } from '../../context/llm/rate-limit-governor.js';
import { createRuntimeToolDescriptorFromAiTool } from '../../context/llm/runtime-tools.js';
import type { KtxRuntimeToolSet } from '../../context/llm/runtime-port.js';
import type { CaptureSession, MemoryAction } from '../../context/memory/types.js';
import type { SemanticLayerService } from '../../context/sl/semantic-layer.service.js';
+import { isSlYamlPath, slSourceFilePath, slSourceNameForFile, sourceNameFromPath } from '../../context/sl/source-files.js';
import type { SemanticLayerSource } from '../../context/sl/types.js';
import type { SlValidationDeps } from '../../context/sl/tools/sl-warehouse-validation.js';
import { createTouchedSlSources, type TouchedSlSource } from '../../context/tools/touched-sl-sources.js';
@@ -219,6 +221,10 @@ export class IngestBundleRunner {
}
async run(job: IngestBundleJob, ctx?: IngestJobContext): Promise {
+ const unsubscribeRateLimitGovernor = this.subscribeRateLimitGovernor({
+ trace: this.createTrace(job),
+ memoryFlow: ctx?.memoryFlow,
+ });
const key = job.connectionId;
const previous = this.chainByConnection.get(key);
if (previous) {
@@ -241,10 +247,72 @@ export class IngestBundleRunner {
ctx?.memoryFlow?.finish('error', [sanitizeMemoryFlowError(error)]);
throw error;
} finally {
+ unsubscribeRateLimitGovernor();
await this.maybeEmitIngestProfile(job.jobId);
}
}
+ private formatRateLimitWait(
+ state: Extract,
+ ): string {
+ const seconds = Math.ceil(state.remainingMs / 1_000);
+ const minutes = Math.floor(seconds / 60);
+ const remainder = seconds % 60;
+ const duration = minutes > 0 ? `${minutes}m${String(remainder).padStart(2, '0')}s` : `${seconds}s`;
+ const type = state.rateLimitType ? ` ${state.rateLimitType}` : '';
+ return `Rate-limited (${state.provider}${type}); resuming in ${duration}; Ctrl+C to stop`;
+ }
+
+ private subscribeRateLimitGovernor(input: {
+ trace: IngestTraceWriter;
+ memoryFlow?: MemoryFlowEventSink;
+ }): () => void {
+ const governor = this.deps.settings.rateLimitGovernor;
+ if (!governor) {
+ return () => undefined;
+ }
+ return governor.subscribe((state: RateLimitWaitState) => {
+ if (state.kind === 'rate_limit_observed') {
+ void input.trace.event('info', 'rate_limit', 'rate_limit_observed', { ...state });
+ return;
+ }
+ if (state.kind === 'concurrency_adjusted') {
+ void input.trace.event('info', 'rate_limit', 'concurrency_adjusted', { ...state });
+ return;
+ }
+ void input.trace.event('info', 'rate_limit', state.kind, { ...state });
+ if (state.kind === 'wait_tick' || state.kind === 'wait_started') {
+ input.memoryFlow?.emit({
+ type: 'rate_limit_wait',
+ provider: state.provider,
+ ...(state.rateLimitType ? { rateLimitType: state.rateLimitType } : {}),
+ resumeAtMs: state.resumeAtMs,
+ remainingMs: state.remainingMs,
+ });
+ input.memoryFlow?.emit({
+ type: 'stage_progress',
+ stage: 'integration',
+ percent: 50,
+ message: this.formatRateLimitWait(state),
+ transient: true,
+ });
+ }
+ });
+ }
+
+ private async withRateLimitWorkSlot<T>(abortSignal: AbortSignal | undefined, fn: () => Promise<T>): Promise<T> {
+ const governor = this.deps.settings.rateLimitGovernor;
+ if (!governor) {
+ return fn();
+ }
+ const release = await governor.acquireWorkSlot(abortSignal);
+ try {
+ return await fn();
+ } finally {
+ release();
+ }
+ }
+
/**
* When profiling is enabled — via the `KTX_PROFILE_INGEST` env var or the
* `ingest.profile` config setting — read the job's trace + tool transcripts
@@ -431,7 +499,7 @@ export class IngestBundleRunner {
const files = await this.deps.semanticLayerService.listFilesForConnection(connectionId);
const names = files
.filter((f) => !f.startsWith('_schema/'))
- .map((f) => f.replace(/\.yaml$/, ''))
+ .map((f) => sourceNameFromPath(f))
.sort((left, right) => left.localeCompare(right));
const body = names.length > 0 ? names.join('\n') : '(no sources yet)';
return `## ${connectionId}\n${body}`;
@@ -724,14 +792,52 @@ export class IngestBundleRunner {
].sort();
}
- private touchedSlSourcesFromPaths(paths: string[]): TouchedSlSource[] {
- return paths
- .filter((path) => path.startsWith('semantic-layer/') && path.endsWith('.yaml') && !path.includes('/_schema/'))
- .map((path) => {
- const [, connectionId, fileName] = path.split('/');
- return { connectionId: connectionId ?? '', sourceName: (fileName ?? '').replace(/\.yaml$/, '') };
- })
- .filter((source) => source.connectionId.length > 0 && source.sourceName.length > 0);
+ private async touchedSlSourcesFromPaths(
+ worktree: IngestSessionWorktree,
+ paths: string[],
+ deletedFileSha: string,
+ ): Promise<TouchedSlSource[]> {
+ const sources: TouchedSlSource[] = [];
+ for (const path of paths) {
+ if (!path.startsWith('semantic-layer/') || !isSlYamlPath(path) || path.includes('/_schema/')) {
+ continue;
+ }
+ const [, connectionId] = path.split('/');
+ if (!connectionId) {
+ continue;
+ }
+ // Source identity is the in-file `name:`, never the filename — an uppercase
+ // warehouse source like `WIDGET_SALES` lives in a hash-derived
+ // `widget_sales-<hash>.yaml`, so parsing the basename yields a phantom name.
+ // Read the live file; when it was deleted this run, recover its declared
+ // name from the pre-change commit the way `revertSourceToPreHead` resolves a
+ // gone file from history. The filename is a last resort only when the content
+ // is unrecoverable from both.
+ let content: string | null;
+ try {
+ content = await readFile(join(worktree.workdir, path), 'utf-8');
+ } catch {
+ content = await worktree.git.getFileAtCommit(path, deletedFileSha).catch(() => null);
+ }
+ const sourceName = content === null ? sourceNameFromPath(path) : slSourceNameForFile(path, content);
+ if (sourceName.length > 0) {
+ sources.push({ connectionId, sourceName });
+ }
+ }
+ return sources;
+ }
+
+ // Inverse direction for commits and repair allowlists: resolve each touched
+ // source to its real on-disk path, falling back to the writer's derived
+ // filename when the file was deleted in this run.
+ private async touchedSlSourcePaths(workdir: string, touched: TouchedSlSource[]): Promise<string[]> {
+ const service = this.deps.semanticLayerService.forWorktree(workdir);
+ const paths: string[] = [];
+ for (const source of touched) {
+ const file = await service.readSourceFile(source.connectionId, source.sourceName);
+ paths.push(file?.path ?? slSourceFilePath(source.connectionId, source.sourceName));
+ }
+ return paths;
}
private touchedSlSourcesFromActions(actions: MemoryAction[], fallbackConnectionId: string): TouchedSlSource[] {
@@ -872,13 +978,13 @@ export class IngestBundleRunner {
workUnitSettings: { maxConcurrency: number; stepBudget: number; failureMode: 'abort' | 'continue' };
transcriptDir: string;
transcriptSummaries: Map;
- recordTranscriptEntry(path: string): (entry: ToolCallLogEntry) => void;
+ recordTranscriptEntry(path: string): (entry: ToolCallLogEntry) => MutableToolTranscriptSummary;
stageIndex: StageIndex;
includeContextEvidenceTools: boolean;
currentTableExists(tableRef: string): Promise;
memoryFlow?: MemoryFlowEventSink;
+ abortSignal?: AbortSignal;
wuSkillNames: string[];
- onStepFinish?: (info: { stepIndex: number; stepBudget: number }) => void;
}): Promise {
const session: CaptureSession = {
userId: 'system',
@@ -982,7 +1088,6 @@ export class IngestBundleRunner {
type: 'work_unit_started',
unitKey: input.wu.unitKey,
skills: input.wuSkillNames,
- stepBudget: input.workUnitSettings.stepBudget,
});
return executeWorkUnit(
{
@@ -1006,8 +1111,10 @@ export class IngestBundleRunner {
slIndex: input.slIndex,
priorProvenance: input.priorProvenance,
}),
- buildToolSet: (wuInner) =>
- wrapToolsWithLogger(
+ buildToolSet: (wuInner) => {
+ const transcriptPath = join(input.transcriptDir, `${wuInner.unitKey}.jsonl`);
+ const record = input.recordTranscriptEntry(transcriptPath);
+ return wrapToolsWithLogger(
buildWuToolSet({
sourceKey: input.job.sourceKey,
stagedDir: input.stagedDir,
@@ -1016,10 +1123,23 @@ export class IngestBundleRunner {
emitUnmappedFallbackTool: wuEmitUnmappedFallbackTool,
toolsetTools: wuToolset.toRuntimeTools(wuToolContext),
}),
- join(input.transcriptDir, `${wuInner.unitKey}.jsonl`),
+ transcriptPath,
wuInner.unitKey,
- { onEntry: input.recordTranscriptEntry(join(input.transcriptDir, `${wuInner.unitKey}.jsonl`)) },
- ),
+ {
+ // Drive the live HUD heartbeat from real tool calls: each invocation
+ // ticks the running per-unit count. This is an observed signal, not a
+ // re-derived turn count, so it can never overshoot a budget.
+ onEntry: (entry) => {
+ const summary = record(entry);
+ input.memoryFlow?.emit({
+ type: 'work_unit_step',
+ unitKey: wuInner.unitKey,
+ toolCalls: summary.toolCallCount,
+ });
+ },
+ },
+ );
+ },
captureSession: session,
sessionActions,
modelRole: 'candidateExtraction',
@@ -1028,7 +1148,7 @@ export class IngestBundleRunner {
connectionId: input.job.connectionId,
jobId: input.job.jobId,
toolFailureCount: (unitKey) => input.transcriptSummaries.get(unitKey)?.fatalErrorCount ?? 0,
- onStepFinish: input.onStepFinish,
+ abortSignal: input.abortSignal,
},
input.wu,
);
@@ -1097,11 +1217,12 @@ export class IngestBundleRunner {
const transcriptDir = this.deps.storage.resolveTranscriptDir(job.jobId);
const recordTranscriptEntry =
(path: string) =>
- (entry: ToolCallLogEntry): void => {
+ (entry: ToolCallLogEntry): MutableToolTranscriptSummary => {
const current =
transcriptSummaries.get(entry.wuKey) ?? createMutableToolTranscriptSummary(entry.wuKey, path);
recordToolTranscriptEntry(current, entry);
transcriptSummaries.set(entry.wuKey, current);
+ return current;
};
const overrideReport = await this.loadOverrideReport(job);
@@ -1476,7 +1597,7 @@ export class IngestBundleRunner {
projectionTouchedSources = projection.touchedSources;
projectionChangedWikiPageKeys = projection.changedWikiPageKeys;
const projectionPaths = [
- ...projection.touchedSources.map((source) => `semantic-layer/${source.connectionId}/${source.sourceName}.yaml`),
+ ...(await this.touchedSlSourcePaths(sessionWorktree.workdir, projection.touchedSources)),
...projection.changedWikiPageKeys.map((pageKey) => `wiki/global/${pageKey}.md`),
];
projectionTouchedPaths = projectionPaths;
@@ -1524,7 +1645,8 @@ export class IngestBundleRunner {
try {
await Promise.all(
workUnits.map((wu, index) =>
- limitWorkUnit(async () => {
+ limitWorkUnit(() =>
+ this.withRateLimitWorkSlot(ctx?.abortSignal, async () => {
const outcome = await runIsolatedWorkUnit({
unitIndex: index,
ingestionBaseSha,
@@ -1532,6 +1654,7 @@ export class IngestBundleRunner {
patchDir,
trace: runTrace,
workUnit: wu,
+ abortSignal: ctx?.abortSignal,
afterSuccess: (child) => copyTransientIngestEvidence(child.workdir, sessionWorktree.workdir),
run: async (child) => {
const scopedWikiService = this.deps.wikiService.forWorktree(child.workdir);
@@ -1565,11 +1688,9 @@ export class IngestBundleRunner {
includeContextEvidenceTools: adapter.evidenceIndexing === 'documents' && !!contextReport,
currentTableExists: (tableRef) =>
this.tableRefExistsInSemanticLayer(scopedSemanticLayerService, slConnectionIds, tableRef),
+ abortSignal: ctx?.abortSignal,
memoryFlow,
wuSkillNames,
- onStepFinish: ({ stepIndex, stepBudget }) => {
- memoryFlow?.emit({ type: 'work_unit_step', unitKey: wu.unitKey, stepIndex, stepBudget });
- },
});
},
});
@@ -1594,7 +1715,8 @@ export class IngestBundleRunner {
completedWorkUnits / workUnits.length,
`${completedWorkUnits} of ${workUnits.length} work units complete`,
);
- }),
+ }),
+ ),
),
);
} catch (error) {
@@ -1657,7 +1779,11 @@ export class IngestBundleRunner {
await validateFinalIngestArtifacts({
connectionIds: slConnectionIds,
changedWikiPageKeys: this.wikiPageKeysFromPaths(touchedPaths),
- touchedSlSources: this.touchedSlSourcesFromPaths(touchedPaths),
+ touchedSlSources: await this.touchedSlSourcesFromPaths(
+ sessionWorktree,
+ touchedPaths,
+ await sessionWorktree.git.revParseHead(),
+ ),
wikiService: this.deps.wikiService.forWorktree(sessionWorktree.workdir),
semanticLayerService: this.deps.semanticLayerService.forWorktree(sessionWorktree.workdir),
validateTouchedSources: (touched) =>
@@ -1691,8 +1817,10 @@ export class IngestBundleRunner {
touchedPaths: context.touchedPaths,
trace: runTrace,
reason: context.reason,
- maxAttempts: 1,
+ verify: context.verify,
+ maxAttempts: 2,
stepBudget: 12,
+ abortSignal: ctx?.abortSignal,
});
emitStageProgress(
'integration',
@@ -1712,8 +1840,10 @@ export class IngestBundleRunner {
allowedPaths: context.touchedPaths,
trace: runTrace,
repairKind: 'patch_semantic_gate',
- maxAttempts: 1,
+ verify: context.verify,
+ maxAttempts: 2,
stepBudget: 16,
+ abortSignal: ctx?.abortSignal,
});
emitStageProgress(
'integration',
@@ -1938,6 +2068,45 @@ export class IngestBundleRunner {
let curatorWarnings: string[] = [];
let reconcileOutcome: Awaited>;
+ // Reconcile shares the work-unit liveness model: the HUD heartbeat is driven
+ // by real tool calls (a monotonic, observed count), not a re-derived turn
+ // counter. The soft cap only paces the phase progress bar; it is never shown
+ // to the user, so it cannot read as a misleading "X/Y" fraction.
+ const reconcileTranscriptPath = join(transcriptDir, 'reconcile.jsonl');
+ const reconcileProgressSoftCap = 40;
+ const buildReconcileToolSetWithHeartbeat = (): KtxRuntimeToolSet => {
+ const record = recordTranscriptEntry(reconcileTranscriptPath);
+ return wrapToolsWithLogger(
+ buildReconcileToolSet({
+ loadSkillTool: rcLoadSkill,
+ stageListTool: rcStageListTool,
+ stageDiffTool: rcStageDiffTool,
+ evictionListTool: rcEvictionListTool,
+ emitConflictResolutionTool: rcEmitConflictResolutionTool,
+ emitEvictionDecisionTool: rcEmitEvictionDecisionTool,
+ emitArtifactResolutionTool: rcEmitArtifactResolutionTool,
+ emitUnmappedFallbackTool: rcEmitUnmappedFallbackTool,
+ readRawSpanTool: rcRawSpanTool,
+ toolsetTools: rcToolset.toRuntimeTools(rcToolContext),
+ }),
+ reconcileTranscriptPath,
+ 'reconcile',
+ {
+ onEntry: (entry) => {
+ const summary = record(entry);
+ if (!stage4) {
+ return;
+ }
+ const label = `Reconciling results · ${summary.toolCallCount} action${
+ summary.toolCallCount === 1 ? '' : 's'
+ }`;
+ emitStageProgress('reconciliation', 85, label, { transient: true });
+ void stage4.updateProgress(Math.min(0.95, summary.toolCallCount / reconcileProgressSoftCap), label);
+ },
+ },
+ );
+ };
+
const reconcileStartedAt = Date.now();
const reconcileMode = contextReport && this.deps.curatorPagination ? 'curator' : 'single';
if (contextReport && this.deps.curatorPagination) {
@@ -1960,39 +2129,9 @@ export class IngestBundleRunner {
}),
buildUserPrompt: ({ summary, items, runState }) =>
buildReconcileUserPrompt(stageIndex, eviction, { summary, items }, reconcileNotes, runState),
- buildToolSet: (_passNumber) =>
- wrapToolsWithLogger(
- buildReconcileToolSet({
- loadSkillTool: rcLoadSkill,
- stageListTool: rcStageListTool,
- stageDiffTool: rcStageDiffTool,
- evictionListTool: rcEvictionListTool,
- emitConflictResolutionTool: rcEmitConflictResolutionTool,
- emitEvictionDecisionTool: rcEmitEvictionDecisionTool,
- emitArtifactResolutionTool: rcEmitArtifactResolutionTool,
- emitUnmappedFallbackTool: rcEmitUnmappedFallbackTool,
- readRawSpanTool: rcRawSpanTool,
- toolsetTools: rcToolset.toRuntimeTools(rcToolContext),
- }),
- join(transcriptDir, 'reconcile.jsonl'),
- 'reconcile',
- { onEntry: recordTranscriptEntry(join(transcriptDir, 'reconcile.jsonl')) },
- ),
+ buildToolSet: (_passNumber) => buildReconcileToolSetWithHeartbeat(),
getReconciliationActions: () => reconcileActions,
- onStepFinish: stage4
- ? ({ passNumber, stepIndex, stepBudget }) => {
- emitStageProgress(
- 'reconciliation',
- 85,
- `Reconciling results: pass ${passNumber} step ${stepIndex}/${stepBudget}`,
- { transient: true },
- );
- void stage4.updateProgress(
- stepIndex / stepBudget,
- `Reconciling results · pass ${passNumber} step ${stepIndex}`,
- );
- }
- : undefined,
+ abortSignal: ctx?.abortSignal,
});
curatorReport = curatorOutcome.report;
curatorWarnings = curatorOutcome.warnings;
@@ -2015,37 +2154,13 @@ export class IngestBundleRunner {
canonicalPins: relevantCanonicalPins,
}),
buildUserPrompt: (idx, ev) => buildReconcileUserPrompt(idx, ev, undefined, reconcileNotes),
- buildToolSet: () =>
- wrapToolsWithLogger(
- buildReconcileToolSet({
- loadSkillTool: rcLoadSkill,
- stageListTool: rcStageListTool,
- stageDiffTool: rcStageDiffTool,
- evictionListTool: rcEvictionListTool,
- emitConflictResolutionTool: rcEmitConflictResolutionTool,
- emitEvictionDecisionTool: rcEmitEvictionDecisionTool,
- emitArtifactResolutionTool: rcEmitArtifactResolutionTool,
- emitUnmappedFallbackTool: rcEmitUnmappedFallbackTool,
- readRawSpanTool: rcRawSpanTool,
- toolsetTools: rcToolset.toRuntimeTools(rcToolContext),
- }),
- join(transcriptDir, 'reconcile.jsonl'),
- 'reconcile',
- { onEntry: recordTranscriptEntry(join(transcriptDir, 'reconcile.jsonl')) },
- ),
+ buildToolSet: () => buildReconcileToolSetWithHeartbeat(),
modelRole: 'reconcile',
stepBudget: 60,
sourceKey: job.sourceKey,
jobId: job.jobId,
force: !!overrideReport,
- onStepFinish: stage4
- ? ({ stepIndex, stepBudget }) => {
- emitStageProgress('reconciliation', 85, `Reconciling results: step ${stepIndex}/${stepBudget}`, {
- transient: true,
- });
- void stage4.updateProgress(stepIndex / stepBudget, `Reconciling results · step ${stepIndex}`);
- }
- : undefined,
+ abortSignal: ctx?.abortSignal,
});
}
await runTrace.event(
@@ -2219,20 +2334,34 @@ export class IngestBundleRunner {
)
: [];
- const changedConnectionIds = [
- ...new Set([
- ...slConnectionIds,
- ...finalizationTouchedPaths
- .filter((path) => path.startsWith('semantic-layer/'))
- .map((path) => path.split('/')[1])
- .filter((connectionId): connectionId is string => Boolean(connectionId)),
- ]),
- ].sort();
+ // Validate the write scope before deriving touched sources: attribution
+ // by before/after diff is only defined for connections whose
+ // pre-finalization snapshot was loaded (slConnectionIds), and an
+ // out-of-scope write would otherwise surface downstream as a bogus
+ // unresolved-path or declaration-mismatch failure instead of the real
+ // policy violation.
+ await traceTimed(
+ runTrace,
+ 'finalization',
+ 'semantic_layer_target_policy',
+ {
+ sourceKey: job.sourceKey,
+ allowedTargetConnectionIds: slConnectionIds,
+ touchedPaths: [...new Set(finalizationTouchedPaths)].sort(),
+ },
+ async () => {
+ assertSemanticLayerTargetPathsAllowed({
+ paths: finalizationTouchedPaths,
+ allowedConnectionIds: new Set(slConnectionIds),
+ });
+ },
+ );
+
const postFinalizationSourcesByConnection = await this.loadSourcesByConnection(
sessionWorktree.workdir,
- changedConnectionIds,
+ slConnectionIds,
);
- const scope = await deriveFinalizationTouchedSources({
+ const scope = deriveFinalizationTouchedSources({
changedPaths: finalizationTouchedPaths,
beforeSourcesByConnection: preFinalizationSourcesByConnection,
afterSourcesByConnection: postFinalizationSourcesByConnection,
@@ -2367,7 +2496,7 @@ export class IngestBundleRunner {
...(isolatedDiffEnabled ? projectionTouchedSources : []),
...workUnitOutcomes.flatMap((outcome) => outcome.touchedSlSources),
...this.touchedSlSourcesFromActions(reconcileActions, job.connectionId),
- ...this.touchedSlSourcesFromPaths(postReconciliationPaths),
+ ...(await this.touchedSlSourcesFromPaths(sessionWorktree, postReconciliationPaths, preReconciliationSha)),
...finalizationTouchedSources,
]);
const finalWikiGateScope = await this.wikiPageKeysForFinalGates({
@@ -2419,46 +2548,47 @@ export class IngestBundleRunner {
activePhase = 'final_gates';
activeFailureDetails = finalArtifactGateTraceData;
emitStageProgress('final_gates', 89, 'Running final artifact gates');
+ const runFinalArtifactGates = async () => {
+ await validateFinalIngestArtifacts({
+ connectionIds: repairConnectionIds,
+ changedWikiPageKeys: finalChangedWikiPageKeys,
+ touchedSlSources: finalTouchedSlSources,
+ wikiService: this.deps.wikiService.forWorktree(sessionWorktree.workdir),
+ semanticLayerService: this.deps.semanticLayerService.forWorktree(sessionWorktree.workdir),
+ validateTouchedSources: (touched) =>
+ validateWuTouchedSources(
+ {
+ semanticLayerService: this.deps.semanticLayerService.forWorktree(sessionWorktree.workdir),
+ connections: this.deps.connections,
+ configService: sessionWorktree.config,
+ gitService: sessionWorktree.git,
+ slSourcesRepository: this.deps.slSourcesRepository,
+ probeRowCount: this.deps.settings.probeRowCount,
+ slValidator: this.deps.slValidator,
+ },
+ touched,
+ ),
+ tableExists: (connectionId, tableRef) =>
+ this.tableRefExistsInSemanticLayer(
+ this.deps.semanticLayerService.forWorktree(sessionWorktree.workdir),
+ [connectionId],
+ tableRef,
+ ),
+ });
+ };
try {
await traceTimed(
runTrace,
'final_gates',
'final_artifact_gates',
finalArtifactGateTraceData,
- async () => {
- await validateFinalIngestArtifacts({
- connectionIds: repairConnectionIds,
- changedWikiPageKeys: finalChangedWikiPageKeys,
- touchedSlSources: finalTouchedSlSources,
- wikiService: this.deps.wikiService.forWorktree(sessionWorktree.workdir),
- semanticLayerService: this.deps.semanticLayerService.forWorktree(sessionWorktree.workdir),
- validateTouchedSources: (touched) =>
- validateWuTouchedSources(
- {
- semanticLayerService: this.deps.semanticLayerService.forWorktree(sessionWorktree.workdir),
- connections: this.deps.connections,
- configService: sessionWorktree.config,
- gitService: sessionWorktree.git,
- slSourcesRepository: this.deps.slSourcesRepository,
- probeRowCount: this.deps.settings.probeRowCount,
- slValidator: this.deps.slValidator,
- },
- touched,
- ),
- tableExists: (connectionId, tableRef) =>
- this.tableRefExistsInSemanticLayer(
- this.deps.semanticLayerService.forWorktree(sessionWorktree.workdir),
- [connectionId],
- tableRef,
- ),
- });
- },
+ runFinalArtifactGates,
);
} catch (error) {
const gateError = this.errorMessage(error);
const repairPaths = finalGateRepairPaths({
changedWikiPageKeys: finalChangedWikiPageKeys,
- touchedSlSources: finalTouchedSlSources,
+ touchedSlSourcePaths: await this.touchedSlSourcePaths(sessionWorktree.workdir, finalTouchedSlSources),
});
emitStageProgress('final_gates', 89, 'Repairing final artifact gates');
const gateRepair = await repairFinalGateFailure({
@@ -2468,8 +2598,17 @@ export class IngestBundleRunner {
allowedPaths: repairPaths,
trace: runTrace,
repairKind: 'final_artifact_gate',
- maxAttempts: 1,
+ verify: async () => {
+ try {
+ await runFinalArtifactGates();
+ return { ok: true };
+ } catch (verifyError) {
+ return { ok: false, reason: this.errorMessage(verifyError) };
+ }
+ },
+ maxAttempts: 2,
stepBudget: 16,
+ abortSignal: ctx?.abortSignal,
});
isolatedDiffSummary.gateRepairAttempts += gateRepair.attempts;
@@ -2483,44 +2622,9 @@ export class IngestBundleRunner {
throw new Error(`${gateError}\ngate repair failed: ${gateRepair.reason}`);
}
+ // The repair loop re-ran the gates via `verify` before reporting
+ // success, so a repaired status here means the tree already passed.
isolatedDiffSummary.gateRepairs += 1;
- await traceTimed(
- runTrace,
- 'final_gates',
- 'final_artifact_gates_after_gate_repair',
- {
- ...finalArtifactGateTraceData,
- repairedPaths: gateRepair.changedPaths,
- },
- async () => {
- await validateFinalIngestArtifacts({
- connectionIds: repairConnectionIds,
- changedWikiPageKeys: finalChangedWikiPageKeys,
- touchedSlSources: finalTouchedSlSources,
- wikiService: this.deps.wikiService.forWorktree(sessionWorktree.workdir),
- semanticLayerService: this.deps.semanticLayerService.forWorktree(sessionWorktree.workdir),
- validateTouchedSources: (touched) =>
- validateWuTouchedSources(
- {
- semanticLayerService: this.deps.semanticLayerService.forWorktree(sessionWorktree.workdir),
- connections: this.deps.connections,
- configService: sessionWorktree.config,
- gitService: sessionWorktree.git,
- slSourcesRepository: this.deps.slSourcesRepository,
- probeRowCount: this.deps.settings.probeRowCount,
- slValidator: this.deps.slValidator,
- },
- touched,
- ),
- tableExists: (connectionId, tableRef) =>
- this.tableRefExistsInSemanticLayer(
- this.deps.semanticLayerService.forWorktree(sessionWorktree.workdir),
- [connectionId],
- tableRef,
- ),
- });
- },
- );
const repairCommit = await sessionWorktree.git.commitFiles(
gateRepair.changedPaths,
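
Reviewer note: the reconcile heartbeat added in `buildReconcileToolSetWithHeartbeat` above can be read as a pure function of the observed tool-call count. The sketch below is an illustration only, not code from this change. `reconcileHeartbeat` and `HeartbeatUpdate` are hypothetical names; the soft cap of 40 and the 0.95 ceiling are copied from the hunk.

```typescript
// Hypothetical helper for illustration: the change itself computes these
// values inline inside the onEntry callback.
interface HeartbeatUpdate {
  label: string;
  progress: number;
}

function reconcileHeartbeat(toolCallCount: number, softCap = 40): HeartbeatUpdate {
  // Singular only for exactly one call, matching the label in the hunk.
  const noun = toolCallCount === 1 ? 'action' : 'actions';
  return {
    label: `Reconciling results · ${toolCallCount} ${noun}`,
    // The soft cap only paces the bar; clamping keeps it below 100%.
    progress: Math.min(0.95, toolCallCount / softCap),
  };
}
```

Since the count only grows and the ratio is clamped, the bar in this sketch never moves backwards and never reaches completion before the phase ends, which appears to be the intent of replacing the `step X/Y` labels.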
diff --git a/packages/cli/src/context/ingest/isolated-diff/patch-integrator.ts b/packages/cli/src/context/ingest/isolated-diff/patch-integrator.ts
index 1e2f0cee..04cc099b 100644
--- a/packages/cli/src/context/ingest/isolated-diff/patch-integrator.ts
+++ b/packages/cli/src/context/ingest/isolated-diff/patch-integrator.ts
@@ -1,35 +1,32 @@
import { readFile } from 'node:fs/promises';
import type { GitService } from '../../../context/core/git.service.js';
+import type { RepairVerification } from '../constrained-repair.js';
import type { FinalGateRepairResult } from '../final-gate-repair.js';
import type { IngestTraceWriter } from '../ingest-trace.js';
import { traceTimed } from '../ingest-trace.js';
import { assertPatchAllowedForWorkUnit, parsePatchTouchedPaths } from './git-patch.js';
import type { TextualConflictResolutionResult } from './textual-conflict-resolver.js';
-type PatchIntegrationTextualResolution =
- | { status: 'repaired'; attempts: number; changedPaths: string[] }
- | { status: 'failed'; attempts: number; reason: string };
-
export type PatchIntegrationResult =
| {
status: 'accepted';
commitSha: string;
touchedPaths: string[];
- textualResolution?: PatchIntegrationTextualResolution;
+ textualResolution?: TextualConflictResolutionResult;
gateRepair?: FinalGateRepairResult;
}
| {
status: 'textual_conflict';
reason: string;
touchedPaths: string[];
- textualResolution?: PatchIntegrationTextualResolution;
+ textualResolution?: TextualConflictResolutionResult;
gateRepair?: FinalGateRepairResult;
}
| {
status: 'semantic_conflict';
reason: string;
touchedPaths: string[];
- textualResolution?: PatchIntegrationTextualResolution;
+ textualResolution?: TextualConflictResolutionResult;
gateRepair?: FinalGateRepairResult;
};
@@ -47,12 +44,14 @@ export interface IntegrateWorkUnitPatchInput {
patchPath: string;
touchedPaths: string[];
reason: string;
+ verify(changedPaths: string[]): Promise<RepairVerification>;
}): Promise<TextualConflictResolutionResult>;
repairGateFailure?(input: {
unitKey: string;
patchPath: string;
touchedPaths: string[];
reason: string;
+ verify(changedPaths: string[]): Promise<RepairVerification>;
}): Promise<FinalGateRepairResult>;
}
@@ -94,6 +93,19 @@ export async function integrateWorkUnitPatch(input: IntegrateWorkUnitPatchInput)
};
}
+ // Repair and resolution success is decided by this check, not by whether
+ // the repair agent edited files: the gates re-run over the union of the
+ // patch's paths and everything the agent changed.
+ const verifyAppliedTree = async (changedPaths: string[]): Promise<RepairVerification> => {
+ const paths = [...new Set([...touchedPaths, ...changedPaths])].sort();
+ try {
+ await input.validateAppliedTree(paths);
+ return { ok: true };
+ } catch (error) {
+ return { ok: false, reason: errorMessage(error) };
+ }
+ };
+
try {
await traceTimed(
input.trace,
@@ -130,6 +142,7 @@ export async function integrateWorkUnitPatch(input: IntegrateWorkUnitPatchInput)
patchPath: input.patchPath,
touchedPaths,
reason,
+ verify: verifyAppliedTree,
});
if (textualResolution.status === 'failed') {
@@ -144,115 +157,20 @@ export async function integrateWorkUnitPatch(input: IntegrateWorkUnitPatchInput)
};
}
- try {
- await traceTimed(
- input.trace,
- 'integration',
- 'semantic_gate_after_textual_resolution',
- { unitKey: input.unitKey, touchedPaths: textualResolution.changedPaths },
- async () => {
- await input.validateAppliedTree(textualResolution.changedPaths);
- },
- );
- } catch (semanticError) {
- const reason = errorMessage(semanticError);
- await input.trace.event('error', 'integration', 'patch_semantic_conflict_after_textual_resolution', {
+ if (textualResolution.changedPaths.length === 0) {
+ // The resolver declared the patch redundant and the gates verified the
+ // current tree: the integration worktree already represents this work
+ // unit's content (e.g. a duplicate page created by another work unit).
+ await input.trace.event('debug', 'integration', 'patch_subsumed_after_textual_resolution', {
unitKey: input.unitKey,
patchPath: input.patchPath,
- touchedPaths: textualResolution.changedPaths,
- reason,
+ touchedPaths,
+ attempts: textualResolution.attempts,
});
-
- // A textual conflict and a semantic-gate failure can co-occur: the resolver
- // reconciles the text but can leave wiki sl_refs pointing at measures the
- // merged source no longer defines. Recover via the same gate repair the
- // clean-apply branch uses, instead of hard-failing the whole job.
- if (input.repairGateFailure) {
- const gateRepair = await input.repairGateFailure({
- unitKey: input.unitKey,
- patchPath: input.patchPath,
- touchedPaths: textualResolution.changedPaths,
- reason,
- });
-
- if (gateRepair.status !== 'failed') {
- // The resolver wrote its merge to the worktree (unstaged); the repair
- // edited a subset on top. Commit the union so neither is dropped.
- const resolvedAndRepairedPaths = [
- ...new Set([...textualResolution.changedPaths, ...gateRepair.changedPaths]),
- ].sort();
- try {
- await traceTimed(
- input.trace,
- 'integration',
- 'semantic_gate_after_gate_repair',
- { unitKey: input.unitKey, touchedPaths: gateRepair.changedPaths },
- async () => {
- await input.validateAppliedTree(gateRepair.changedPaths);
- },
- );
-
- const commit = await input.integrationGit.commitFiles(
- resolvedAndRepairedPaths,
- `ingest: resolve WorkUnit ${input.unitKey} conflict`,
- input.author.name,
- input.author.email,
- );
- if (commit.created) {
- await input.trace.event('debug', 'integration', 'patch_accepted_after_textual_resolution', {
- unitKey: input.unitKey,
- commitSha: commit.commitHash,
- touchedPaths: resolvedAndRepairedPaths,
- attempts: textualResolution.attempts,
- gateRepairAttempts: gateRepair.attempts,
- });
- return {
- status: 'accepted',
- commitSha: commit.commitHash,
- touchedPaths: resolvedAndRepairedPaths,
- textualResolution,
- gateRepair,
- };
- }
- } catch (repairValidationError) {
- if (preApplyHead) {
- await input.integrationGit.resetHardTo(preApplyHead);
- }
- await input.trace.event('error', 'integration', 'patch_semantic_conflict_after_textual_resolution', {
- unitKey: input.unitKey,
- patchPath: input.patchPath,
- touchedPaths: gateRepair.changedPaths,
- reason: errorMessage(repairValidationError),
- });
- return {
- status: 'semantic_conflict',
- reason: errorMessage(repairValidationError),
- touchedPaths: gateRepair.changedPaths,
- textualResolution,
- gateRepair,
- };
- }
- }
-
- if (preApplyHead) {
- await input.integrationGit.resetHardTo(preApplyHead);
- }
- return {
- status: 'semantic_conflict',
- reason: gateRepair.status === 'failed' ? gateRepair.reason : reason,
- touchedPaths: textualResolution.changedPaths,
- textualResolution,
- gateRepair,
- };
- }
-
- if (preApplyHead) {
- await input.integrationGit.resetHardTo(preApplyHead);
- }
return {
- status: 'semantic_conflict',
- reason,
- touchedPaths: textualResolution.changedPaths,
+ status: 'accepted',
+ commitSha: preApplyHead ?? '',
+ touchedPaths: [],
textualResolution,
};
}
@@ -264,19 +182,18 @@ export async function integrateWorkUnitPatch(input: IntegrateWorkUnitPatchInput)
input.author.email,
);
if (!commit.created) {
- if (preApplyHead) {
- await input.integrationGit.resetHardTo(preApplyHead);
- }
- const noChangeReason = 'textual resolver produced no committable changes';
- await input.trace.event('error', 'integration', 'textual_conflict_resolver_noop', {
+ // The resolver's writes left the tree byte-identical to the accepted
+ // state, and the gates verified it — the patch is represented already.
+ await input.trace.event('debug', 'integration', 'patch_subsumed_after_textual_resolution', {
unitKey: input.unitKey,
patchPath: input.patchPath,
touchedPaths: textualResolution.changedPaths,
+ attempts: textualResolution.attempts,
});
return {
- status: 'textual_conflict',
- reason: noChangeReason,
- touchedPaths: textualResolution.changedPaths,
+ status: 'accepted',
+ commitSha: preApplyHead ?? '',
+ touchedPaths: [],
textualResolution,
};
}
@@ -314,6 +231,7 @@ export async function integrateWorkUnitPatch(input: IntegrateWorkUnitPatchInput)
patchPath: input.patchPath,
touchedPaths,
reason,
+ verify: verifyAppliedTree,
});
if (gateRepair.status === 'failed') {
@@ -328,28 +246,6 @@ export async function integrateWorkUnitPatch(input: IntegrateWorkUnitPatchInput)
};
}
- try {
- await traceTimed(
- input.trace,
- 'integration',
- 'semantic_gate_after_gate_repair',
- { unitKey: input.unitKey, touchedPaths: gateRepair.changedPaths },
- async () => {
- await input.validateAppliedTree(gateRepair.changedPaths);
- },
- );
- } catch (repairValidationError) {
- if (preApplyHead) {
- await input.integrationGit.resetHardTo(preApplyHead);
- }
- return {
- status: 'semantic_conflict',
- reason: errorMessage(repairValidationError),
- touchedPaths: gateRepair.changedPaths,
- gateRepair,
- };
- }
-
const commit = await input.integrationGit.commitFiles(
gateRepair.changedPaths,
`ingest: repair WorkUnit ${input.unitKey} gates`,
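
Reviewer note: the `verifyAppliedTree` callback introduced in this file combines two ideas: validate the union of the patch's paths and the agent's edits, and convert a thrown gate error into a result value. A minimal standalone sketch follows, assuming a validator that throws on failure. `Verification`, `unionPaths`, and `verifyTree` are hypothetical names for illustration; the real `RepairVerification` type lives in `constrained-repair.ts` and is not shown in this diff.

```typescript
// Illustrative stand-in for RepairVerification (assumed shape, based on the
// `{ ok: true }` / `{ ok: false, reason }` values returned in the hunk).
type Verification = { ok: true } | { ok: false; reason: string };

// Union of the patch's paths and the agent's edits, de-duplicated and sorted.
function unionPaths(touchedPaths: string[], changedPaths: string[]): string[] {
  return [...new Set([...touchedPaths, ...changedPaths])].sort();
}

// Wraps a throwing validator so the repair loop can inspect the outcome
// instead of catching exceptions itself.
async function verifyTree(
  validate: (paths: string[]) => Promise<void>,
  touchedPaths: string[],
  changedPaths: string[],
): Promise<Verification> {
  try {
    await validate(unionPaths(touchedPaths, changedPaths));
    return { ok: true };
  } catch (error) {
    return { ok: false, reason: error instanceof Error ? error.message : String(error) };
  }
}
```

Validating the union rather than only the agent's edits means a repair that touches a single file is still checked against every path the original patch changed.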
diff --git a/packages/cli/src/context/ingest/isolated-diff/textual-conflict-resolver.ts b/packages/cli/src/context/ingest/isolated-diff/textual-conflict-resolver.ts
index 5ae551d1..8633e7e1 100644
--- a/packages/cli/src/context/ingest/isolated-diff/textual-conflict-resolver.ts
+++ b/packages/cli/src/context/ingest/isolated-diff/textual-conflict-resolver.ts
@@ -1,13 +1,15 @@
-import { mkdir, readFile, rm, writeFile } from 'node:fs/promises';
-import { dirname, join } from 'node:path';
+import { readFile } from 'node:fs/promises';
import { z } from 'zod';
import type { AgentRunnerPort, KtxRuntimeToolSet } from '../../../context/llm/runtime-port.js';
+import type {
+ ConstrainedRepairResult,
+ ConstrainedRepairToolContext,
+ RepairVerification,
+} from '../constrained-repair.js';
+import { buildDeleteRepairFileTool, runConstrainedRepairLoop } from '../constrained-repair.js';
import type { IngestTraceWriter } from '../ingest-trace.js';
-import { traceTimed } from '../ingest-trace.js';
-export type TextualConflictResolutionResult =
- | { status: 'repaired'; attempts: number; changedPaths: string[] }
- | { status: 'failed'; attempts: number; reason: string };
+export type TextualConflictResolutionResult = ConstrainedRepairResult;
export interface ResolveTextualConflictInput {
agentRunner: AgentRunnerPort;
@@ -17,63 +19,31 @@ export interface ResolveTextualConflictInput {
touchedPaths: string[];
trace: IngestTraceWriter;
reason: string;
+ /**
+ * Re-runs the artifact gates against the current worktree. A resolution —
+ * including an explicit no-change declaration for a redundant patch —
+ * counts as successful only when this passes.
+ */
+ verify(changedPaths: string[]): Promise<RepairVerification>;
maxAttempts?: number;
stepBudget?: number;
-}
-
-const readIntegrationFileSchema = z.object({
- path: z.string().min(1),
-});
-
-const writeIntegrationFileSchema = z.object({
- path: z.string().min(1),
- content: z.string(),
-});
-
-const deleteIntegrationFileSchema = z.object({
- path: z.string().min(1),
-});
-
-function normalizeRepoPath(path: string): string {
- const normalized = path.replace(/\\/g, '/').replace(/^\/+/, '');
- const parts = normalized.split('/').filter((part) => part.length > 0);
- if (parts.length === 0 || parts.some((part) => part === '.' || part === '..')) {
- throw new Error(`resolver path must be a repository-relative path: ${path}`);
- }
- return parts.join('/');
-}
-
-function assertAllowedPath(path: string, allowedPaths: ReadonlySet<string>): string {
- const normalized = normalizeRepoPath(path);
- if (!allowedPaths.has(normalized)) {
- throw new Error(`resolver path not allowed: ${normalized}`);
- }
- return normalized;
-}
-
-async function readOptionalFile(path: string): Promise<{ exists: boolean; content: string }> {
- try {
- return { exists: true, content: await readFile(path, 'utf-8') };
- } catch (error) {
- if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
- return { exists: false, content: '' };
- }
- throw error;
- }
+ abortSignal?: AbortSignal;
}
function buildResolverSystemPrompt(): string {
return `
-You repair one failed KTX isolated-diff patch inside the integration worktree.
+You repair one failed ktx isolated-diff patch inside the integration worktree.
- Preserve accepted integration content that is unrelated to the failed patch.
- Incorporate the failed patch only when the patch evidence is compatible with the current file.
+- If the current file already represents everything the failed patch contributes (for example a
+ duplicate page created by another work unit), call declare_patch_redundant instead of editing.
- Edit only paths exposed by the resolver tools.
- Prefer the smallest text edit that makes the composed artifact coherent.
- Do not create new facts that are absent from the current file or failed patch.
-- Stop after writing the repaired file content.
+- Stop after writing the repaired file content or declaring the patch redundant.
`;
}
@@ -84,7 +54,11 @@ function buildResolverUserPrompt(input: {
reason: string;
attempt: number;
maxAttempts: number;
+ previousFailure: string | null;
}): string {
+ const previousFailureBlock = input.previousFailure
+ ? `\nPrevious attempt did not pass the artifact gates:\n${input.previousFailure}\n`
+ : '';
return `Repair isolated-diff textual conflict.
WorkUnit: ${input.unitKey}
@@ -95,17 +69,22 @@ ${input.touchedPaths.map((path) => `- ${path}`).join('\n')}
Git apply failure:
${input.reason}
-
-Use read_failed_patch first. Then read the touched integration files, write the
-repaired content, and stop.`;
+${previousFailureBlock}
+Use read_failed_patch first. Then read the touched integration files and either
+write the repaired content or, when the patch adds nothing the current files do
+not already cover, call declare_patch_redundant. Then stop.`;
}
-function buildToolSet(input: {
- workdir: string;
+function buildResolverExtraTools(input: {
patchPath: string;
- allowedPaths: ReadonlySet<string>;
- editedPaths: Set<string>;
+ context: ConstrainedRepairToolContext;
}): KtxRuntimeToolSet {
+ const declareSchema = z.object({
+ reason: z
+ .string()
+ .min(1)
+ .describe('Why the integration tree already represents everything this patch contributes.'),
+ });
return {
read_failed_patch: {
name: 'read_failed_patch',
@@ -119,46 +98,18 @@ function buildToolSet(input: {
};
},
},
- read_integration_file: {
- name: 'read_integration_file',
- description: 'Read one allowed file from the current integration worktree.',
- inputSchema: readIntegrationFileSchema,
- execute: async ({ path }: z.infer<typeof readIntegrationFileSchema>) => {
- const normalized = assertAllowedPath(path, input.allowedPaths);
- const file = await readOptionalFile(join(input.workdir, normalized));
+ ...buildDeleteRepairFileTool(input.context),
+ declare_patch_redundant: {
+ name: 'declare_patch_redundant',
+ description:
+ 'Declare that the failed patch needs no integration because the current worktree already ' +
+ 'represents its content (for example a duplicate page created by another work unit).',
+ inputSchema: declareSchema,
+ execute: async ({ reason }: z.infer<typeof declareSchema>) => {
+ input.context.declareNoChange(reason);
return {
- markdown: file.exists ? file.content : `(missing file: ${normalized})`,
- structured: { path: normalized, exists: file.exists },
- };
- },
- },
- write_integration_file: {
- name: 'write_integration_file',
- description: 'Replace one allowed integration worktree file with repaired text content.',
- inputSchema: writeIntegrationFileSchema,
- execute: async ({ path, content }: z.infer<typeof writeIntegrationFileSchema>) => {
- const normalized = assertAllowedPath(path, input.allowedPaths);
- const fullPath = join(input.workdir, normalized);
- await mkdir(dirname(fullPath), { recursive: true });
- await writeFile(fullPath, content, 'utf-8');
- input.editedPaths.add(normalized);
- return {
- markdown: `Wrote ${normalized}`,
- structured: { path: normalized, bytes: Buffer.byteLength(content) },
- };
- },
- },
- delete_integration_file: {
- name: 'delete_integration_file',
- description: 'Delete one allowed integration worktree file when the failed patch proves the deletion is correct.',
- inputSchema: deleteIntegrationFileSchema,
- execute: async ({ path }: z.infer<typeof deleteIntegrationFileSchema>) => {
- const normalized = assertAllowedPath(path, input.allowedPaths);
- await rm(join(input.workdir, normalized), { force: true });
- input.editedPaths.add(normalized);
- return {
- markdown: `Deleted ${normalized}`,
- structured: { path: normalized },
+ markdown: `Declared patch redundant: ${reason}`,
+ structured: { reason },
};
},
},
@@ -168,71 +119,42 @@ function buildToolSet(input: {
export async function resolveTextualConflict(
input: ResolveTextualConflictInput,
): Promise {
- const allowedPaths = new Set(input.touchedPaths.map(normalizeRepoPath));
- const maxAttempts = input.maxAttempts ?? 1;
- const stepBudget = input.stepBudget ?? 12;
- let lastFailure = 'resolver did not run';
-
- for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
- const editedPaths = new Set();
- const traceData = {
+ const sortedTouchedPaths = [...input.touchedPaths].sort();
+ return runConstrainedRepairLoop({
+ agentRunner: input.agentRunner,
+ workdir: input.workdir,
+ allowedPaths: input.touchedPaths,
+ trace: input.trace,
+ tracePhase: 'resolver',
+ traceEventName: 'textual_conflict_resolver',
+ traceData: {
unitKey: input.unitKey,
patchPath: input.patchPath,
- touchedPaths: [...allowedPaths].sort(),
- attempt,
- maxAttempts,
+ touchedPaths: sortedTouchedPaths,
reason: input.reason,
- };
- const result = await traceTimed(input.trace, 'resolver', 'textual_conflict_resolver', traceData, async () =>
- input.agentRunner.runLoop({
- modelRole: 'repair',
- systemPrompt: buildResolverSystemPrompt(),
- userPrompt: buildResolverUserPrompt({
- unitKey: input.unitKey,
- patchPath: input.patchPath,
- touchedPaths: [...allowedPaths].sort(),
- reason: input.reason,
- attempt,
- maxAttempts,
- }),
- toolSet: buildToolSet({
- workdir: input.workdir,
- patchPath: input.patchPath,
- allowedPaths,
- editedPaths,
- }),
- stepBudget,
- telemetryTags: {
- operationName: 'ingest-isolated-diff-textual-resolver',
- source: input.trace.context.sourceKey,
- jobId: input.trace.context.jobId,
- unitKey: input.unitKey,
- },
+ },
+ systemPrompt: buildResolverSystemPrompt(),
+ buildUserPrompt: ({ attempt, maxAttempts, previousFailure }) =>
+ buildResolverUserPrompt({
+ unitKey: input.unitKey,
+ patchPath: input.patchPath,
+ touchedPaths: sortedTouchedPaths,
+ reason: input.reason,
+ attempt,
+ maxAttempts,
+ previousFailure,
}),
- );
-
- if (result.stopReason === 'error') {
- lastFailure = result.error?.message ?? 'resolver agent loop errored';
- await input.trace.event('error', 'resolver', 'textual_conflict_resolver_failed', traceData, result.error);
- continue;
- }
-
- const changedPaths = [...editedPaths].sort();
- if (changedPaths.length === 0) {
- lastFailure = 'resolver completed without editing an allowed path';
- await input.trace.event('error', 'resolver', 'textual_conflict_resolver_failed', {
- ...traceData,
- reason: lastFailure,
- });
- continue;
- }
-
- await input.trace.event('debug', 'resolver', 'textual_conflict_resolver_repaired', {
- ...traceData,
- changedPaths,
- });
- return { status: 'repaired', attempts: attempt, changedPaths };
- }
-
- return { status: 'failed', attempts: maxAttempts, reason: lastFailure };
+ buildExtraTools: (context) => buildResolverExtraTools({ patchPath: input.patchPath, context }),
+ verify: input.verify,
+ noChangeFailureReason: 'resolver completed without editing an allowed path or declaring the patch redundant',
+ telemetryTags: {
+ operationName: 'ingest-isolated-diff-textual-resolver',
+ source: input.trace.context.sourceKey,
+ jobId: input.trace.context.jobId,
+ unitKey: input.unitKey,
+ },
+ maxAttempts: input.maxAttempts,
+ stepBudget: input.stepBudget ?? 12,
+ abortSignal: input.abortSignal,
+ });
}
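The body of `runConstrainedRepairLoop` is not part of this diff, so the following is only a minimal sketch of the contract the resolver now relies on: each attempt receives the previous gate failure through `buildUserPrompt`, an attempt that neither edits a path nor declares the patch redundant is recorded with `noChangeFailureReason`, and `verify` decides whether an edit counts as a repair. Every name prefixed with `sketch` or `Sketch`, the `runAttempt` callback, and the `'redundant'` status are assumptions made for illustration.

```typescript
// Hypothetical shapes: the real loop, its context object, and its result type are not shown in this diff.
type SketchAttemptResult = { edited: string[]; declaredNoChange: string | null };

interface SketchLoopInput {
  maxAttempts: number;
  noChangeFailureReason: string;
  buildUserPrompt(args: { attempt: number; maxAttempts: number; previousFailure: string | null }): string;
  // Stand-in for one agent run; returns what the tools recorded.
  runAttempt(prompt: string): Promise<SketchAttemptResult>;
  // Assumed gate: resolves to null when the artifact gates pass, else a failure message.
  verify?: (changedPaths: string[]) => Promise<string | null>;
}

type SketchLoopResult =
  | { status: 'repaired'; attempts: number; changedPaths: string[] }
  | { status: 'redundant'; attempts: number; reason: string }
  | { status: 'failed'; attempts: number; reason: string };

async function sketchRepairLoop(input: SketchLoopInput): Promise<SketchLoopResult> {
  let previousFailure: string | null = null;
  for (let attempt = 1; attempt <= input.maxAttempts; attempt += 1) {
    // The previous failure (if any) is threaded into the next prompt.
    const prompt = input.buildUserPrompt({ attempt, maxAttempts: input.maxAttempts, previousFailure });
    const result = await input.runAttempt(prompt);
    if (result.declaredNoChange !== null) {
      return { status: 'redundant', attempts: attempt, reason: result.declaredNoChange };
    }
    if (result.edited.length === 0) {
      previousFailure = input.noChangeFailureReason;
      continue;
    }
    const gateFailure = input.verify ? await input.verify(result.edited) : null;
    if (gateFailure === null) {
      return { status: 'repaired', attempts: attempt, changedPaths: [...result.edited].sort() };
    }
    previousFailure = gateFailure;
  }
  return { status: 'failed', attempts: input.maxAttempts, reason: previousFailure ?? 'resolver did not run' };
}
```

In this sketch a second attempt sees the first attempt's failure text in its prompt, which is the behaviour the new `previousFailureBlock` in `buildResolverUserPrompt` is written for.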
diff --git a/packages/cli/src/context/ingest/isolated-diff/work-unit-executor.ts b/packages/cli/src/context/ingest/isolated-diff/work-unit-executor.ts
index 7475612e..5ab52102 100644
--- a/packages/cli/src/context/ingest/isolated-diff/work-unit-executor.ts
+++ b/packages/cli/src/context/ingest/isolated-diff/work-unit-executor.ts
@@ -14,6 +14,7 @@ export interface RunIsolatedWorkUnitInput {
patchDir: string;
trace: IngestTraceWriter;
workUnit: WorkUnit;
+ abortSignal?: AbortSignal;
run(child: IngestSessionWorktree): Promise;
afterSuccess?(child: IngestSessionWorktree): Promise;
}
diff --git a/packages/cli/src/context/ingest/local-bundle-runtime.ts b/packages/cli/src/context/ingest/local-bundle-runtime.ts
index 9d6aba95..46847646 100644
--- a/packages/cli/src/context/ingest/local-bundle-runtime.ts
+++ b/packages/cli/src/context/ingest/local-bundle-runtime.ts
@@ -4,6 +4,7 @@ import { fileURLToPath } from 'node:url';
import YAML from 'yaml';
import { localConnectionInfoFromConfig } from '../../context/connections/local-warehouse-descriptor.js';
import type { KtxSqlQueryExecutorPort } from '../../context/connections/query-executor.js';
+import type { SqlAnalysisPort } from '../../context/sql-analysis/ports.js';
import type { KtxEmbeddingPort } from '../../context/core/embedding.js';
import type { KtxLogger } from '../../context/core/config.js';
import { noopLogger } from '../../context/core/config.js';
@@ -12,6 +13,7 @@ import type { KtxSemanticLayerComputePort } from '../../context/daemon/semantic-
import { createRuntimeToolDescriptorFromAiTool } from '../../context/llm/runtime-tools.js';
import { createLocalKtxLlmRuntimeFromConfig } from '../../context/llm/local-config.js';
import { KtxIngestEmbeddingPortAdapter } from '../../context/llm/embedding-port.js';
+import { createRateLimitGovernorConfig, RateLimitGovernor } from '../../context/llm/rate-limit-governor.js';
import { RuntimeAgentRunner, type AgentRunnerPort, type KtxLlmRuntimePort, type KtxRuntimeToolSet } from '../../context/llm/runtime-port.js';
import type { KtxEmbeddingProvider } from '../../llm/types.js';
import type { KtxLocalProject } from '../../context/project/project.js';
@@ -75,7 +77,7 @@ import type { SourceAdapter } from './types.js';
const promptsDir = fileURLToPath(new URL('../../prompts', import.meta.url));
const skillsDir = fileURLToPath(new URL('../../skills', import.meta.url));
-const LOCAL_AUTHOR = { name: 'KTX Local', email: 'local@ktx.local' };
+const LOCAL_AUTHOR = { name: 'ktx Local', email: 'local@ktx.local' };
const LOCAL_SHAPE_WARNING = 'Local ingest validates semantic-layer YAML shape only.';
const INGEST_TRACE_LEVELS = new Set(['error', 'info', 'debug', 'trace']);
@@ -94,6 +96,7 @@ export interface CreateLocalBundleIngestRuntimeOptions {
memoryModel?: string;
semanticLayerCompute?: KtxSemanticLayerComputePort;
queryExecutor?: KtxSqlQueryExecutorPort;
+ sqlAnalysis?: SqlAnalysisPort;
jobIdFactory?: () => string;
logger?: KtxLogger;
embeddingProvider?: KtxEmbeddingProvider | null;
@@ -229,8 +232,9 @@ class LocalSlPythonPort implements SlPythonPort {
}
class LocalShapeOnlySlValidator implements SlValidatorPort {
- private validateParsedSource(sourceName: string, parsed: Record<string, unknown>) {
- const isOverlay = parsed.table == null && parsed.sql == null;
+ private validateParsedSource(sourceName: string, parsed: unknown) {
+ const fields = (parsed ?? {}) as { table?: unknown; sql?: unknown };
+ const isOverlay = fields.table == null && fields.sql == null;
const result = (isOverlay ? sourceOverlaySchema : sourceDefinitionSchema).safeParse(parsed);
return result.success
? { errors: [], warnings: [LOCAL_SHAPE_WARNING] }
@@ -252,7 +256,7 @@ class LocalShapeOnlySlValidator implements SlValidatorPort {
const { sources, loadErrors } = await deps.semanticLayerService.loadAllSources(connectionId);
const source = sources.find((candidate) => candidate.name === sourceName);
if (source) {
- return this.validateParsedSource(sourceName, source as unknown as Record<string, unknown>);
+ return this.validateParsedSource(sourceName, source);
}
const detail =
loadErrors.length > 0
@@ -270,16 +274,13 @@ class LocalShapeOnlySlValidator implements SlValidatorPort {
}
async validateSingleSource(deps: SlValidationDeps, connectionId: string, sourceName: string) {
- let content: string;
- try {
- const file = await deps.semanticLayerService.readSourceFile(connectionId, sourceName);
- content = file.content;
- } catch (error) {
- return this.validateComposedSource(deps, connectionId, sourceName, error);
+ const file = await deps.semanticLayerService.readSourceFile(connectionId, sourceName);
+ if (!file) {
+ return this.validateComposedSource(deps, connectionId, sourceName, 'no standalone or overlay file found');
}
try {
- const parsed = YAML.parse(content) as unknown as Record<string, unknown>;
+ const parsed = YAML.parse(file.content) as Record<string, unknown>;
return this.validateParsedSource(sourceName, parsed);
} catch (error) {
return {
@@ -518,6 +519,7 @@ class LocalIngestToolsetFactory implements IngestToolsetFactoryPort {
authorResolver: GitAuthorResolverPort;
slSourcesRepository: SlSourcesIndexPort;
connections: SlConnectionCatalogPort;
+ sqlAnalysis?: SqlAnalysisPort;
contextStore: SqliteContextEvidenceStore;
embedding: KtxEmbeddingPort;
}) {
@@ -550,6 +552,7 @@ class LocalIngestToolsetFactory implements IngestToolsetFactoryPort {
const slDiscoverTool = new SlDiscoverTool(slDeps, { maxSources: 25, minRrfScore: 0, maxDetailedSources: 5 });
const warehouseVerificationTools = createWarehouseVerificationTools({
connections: deps.connections,
+ ...(deps.sqlAnalysis ? { sqlAnalysis: deps.sqlAnalysis } : {}),
fallbackFileStore: deps.project.fileStore,
wikiSearchTool,
slDiscoverTool,
@@ -614,12 +617,12 @@ function localIngestLlmProviderGuardMessage(projectDir: string): string {
'ktx ingest requires llm.provider.backend: anthropic, vertex, gateway, claude-code, or codex, or an injected agentRunner.',
'Configure a local Claude Code/Codex session or API-backed LLM, then rerun ingest:',
` ktx setup --project-dir ${projectDir} --llm-backend claude-code --no-input`,
- ` ktx setup --project-dir ${projectDir} --llm-backend codex --llm-model gpt-5.5 --no-input`,
- ` ktx setup --project-dir ${projectDir} --llm-backend anthropic --anthropic-api-key-env ANTHROPIC_API_KEY --llm-model claude-sonnet-4-6 --no-input`,
+ ` ktx setup --project-dir ${projectDir} --llm-backend codex --no-input`,
+ ` ktx setup --project-dir ${projectDir} --llm-backend anthropic --anthropic-api-key-env ANTHROPIC_API_KEY --no-input`,
].join('\n');
}
-function resolveAgentRunner(options: CreateLocalBundleIngestRuntimeOptions): {
+function resolveAgentRunner(options: CreateLocalBundleIngestRuntimeOptions, rateLimitGovernor: RateLimitGovernor): {
agentRunner: AgentRunnerPort;
llmRuntime?: KtxLlmRuntimePort;
} {
@@ -628,6 +631,7 @@ function resolveAgentRunner(options: CreateLocalBundleIngestRuntimeOptions): {
(options.createLlmRuntime ?? createLocalKtxLlmRuntimeFromConfig)(options.project.config.llm, {
projectDir: options.project.projectDir,
env: process.env,
+ rateLimitGovernor,
}) ??
undefined;
@@ -677,7 +681,13 @@ export function createLocalBundleIngestRuntime(
const knowledgeIndex = new LocalKnowledgeIndex(options.project, embedding);
const knowledgeEvents = new NoopKnowledgeEventPort();
const wikiService = new KnowledgeWikiService(rootFileStore, embedding, knowledgeIndex, options.project.git, logger);
- const { agentRunner, llmRuntime } = resolveAgentRunner(options);
+ const rateLimitGovernor = new RateLimitGovernor(
+ createRateLimitGovernorConfig({
+ ...options.project.config.ingest.rateLimit,
+ maxConcurrency: options.project.config.ingest.workUnits.maxConcurrency,
+ }),
+ );
+ const { agentRunner, llmRuntime } = resolveAgentRunner(options, rateLimitGovernor);
const promptService = new PromptService({ promptsDir, partials: [], logger });
const storage = new LocalIngestStorage(options.project);
const registry = registerAdapters(options.adapters);
@@ -691,6 +701,7 @@ export function createLocalBundleIngestRuntime(
authorResolver: new LocalAuthorResolver(),
slSourcesRepository,
connections,
+ ...(options.sqlAnalysis ? { sqlAnalysis: options.sqlAnalysis } : {}),
contextStore,
embedding,
});
@@ -717,6 +728,7 @@ export function createLocalBundleIngestRuntime(
workUnitMaxConcurrency: options.project.config.ingest.workUnits.maxConcurrency,
workUnitStepBudget: options.project.config.ingest.workUnits.stepBudget,
workUnitFailureMode: options.project.config.ingest.workUnits.failureMode,
+ rateLimitGovernor,
profileIngest: options.project.config.ingest.profile,
ingestTraceLevel: ingestTraceLevelFromEnv(),
},
diff --git a/packages/cli/src/context/ingest/local-ingest.ts b/packages/cli/src/context/ingest/local-ingest.ts
index ec8a72f4..5f7f8c5a 100644
--- a/packages/cli/src/context/ingest/local-ingest.ts
+++ b/packages/cli/src/context/ingest/local-ingest.ts
@@ -2,7 +2,9 @@ import { randomUUID } from 'node:crypto';
import { cp, mkdir, rm } from 'node:fs/promises';
import { isAbsolute, resolve } from 'node:path';
import type { KtxSqlQueryExecutorPort } from '../../context/connections/query-executor.js';
+import type { SqlAnalysisPort } from '../../context/sql-analysis/ports.js';
import type { KtxLogger } from '../../context/core/config.js';
+import { createAbortError, isAbortError } from '../../context/core/abort.js';
import type { KtxSemanticLayerComputePort } from '../../context/daemon/semantic-layer-compute.js';
import type { AgentRunnerPort, KtxLlmRuntimePort } from '../../context/llm/runtime-port.js';
import type { KtxLocalProject } from '../../context/project/project.js';
@@ -34,8 +36,10 @@ export interface RunLocalIngestOptions {
memoryModel?: string;
semanticLayerCompute?: KtxSemanticLayerComputePort;
queryExecutor?: KtxSqlQueryExecutorPort;
+ sqlAnalysis?: SqlAnalysisPort;
logger?: KtxLogger;
embeddingProvider?: import('../../llm/types.js').KtxEmbeddingProvider | null;
+ abortSignal?: AbortSignal;
}
export interface LocalIngestResult {
@@ -123,10 +127,11 @@ function findAdapter(adapters: SourceAdapter[], source: string): SourceAdapter {
return adapter;
}
-function localJobContext(jobId: string, memoryFlow?: MemoryFlowEventSink): IngestJobContext {
+function localJobContext(jobId: string, memoryFlow?: MemoryFlowEventSink, abortSignal?: AbortSignal): IngestJobContext {
return {
jobId,
...(memoryFlow ? { memoryFlow } : {}),
+ ...(abortSignal ? { abortSignal } : {}),
startPhase() {
return new LocalIngestPhase();
},
@@ -156,8 +161,10 @@ async function runScheduledPullJob(options: {
memoryModel?: string;
semanticLayerCompute?: KtxSemanticLayerComputePort;
queryExecutor?: KtxSqlQueryExecutorPort;
+ sqlAnalysis?: SqlAnalysisPort;
logger?: KtxLogger;
embeddingProvider?: import('../../llm/types.js').KtxEmbeddingProvider | null;
+ abortSignal?: AbortSignal;
}): Promise {
const runtime = createLocalBundleIngestRuntime(options);
const jobId = options.jobId ?? runtime.nextJobId();
@@ -169,7 +176,7 @@ async function runScheduledPullJob(options: {
trigger: options.trigger ?? 'manual_resync',
bundleRef: { kind: 'scheduled_pull', config: options.pullConfig },
},
- localJobContext(jobId, options.memoryFlow),
+ localJobContext(jobId, options.memoryFlow, options.abortSignal),
);
const report = await runtime.store.findByJobId(jobId);
if (!report) {
@@ -210,8 +217,10 @@ export async function runLocalIngest(options: RunLocalIngestOptions): Promise [
- { type: 'work_unit_started', unitKey: workUnit.unitKey, skills: [], stepBudget: 0 } satisfies MemoryFlowEvent,
+ { type: 'work_unit_started', unitKey: workUnit.unitKey, skills: [] } satisfies MemoryFlowEvent,
...workUnit.actions.map(
(action): MemoryFlowEvent => ({
type: 'candidate_action',
diff --git a/packages/cli/src/context/ingest/memory-flow/schema.ts b/packages/cli/src/context/ingest/memory-flow/schema.ts
index 0268a53f..939d5a18 100644
--- a/packages/cli/src/context/ingest/memory-flow/schema.ts
+++ b/packages/cli/src/context/ingest/memory-flow/schema.ts
@@ -70,17 +70,22 @@ const memoryFlowEventSchema = z.discriminatedUnion('type', [
message: z.string().min(1),
transient: z.boolean().optional(),
}),
+ eventSchema({
+ type: z.literal('rate_limit_wait'),
+ provider: z.string(),
+ rateLimitType: z.string().optional(),
+ resumeAtMs: z.number().int().nonnegative(),
+ remainingMs: z.number().int().nonnegative(),
+ }),
eventSchema({
type: z.literal('work_unit_started'),
unitKey: z.string().min(1),
skills: z.array(z.string().min(1)),
- stepBudget: z.number().int().min(0),
}),
eventSchema({
type: z.literal('work_unit_step'),
unitKey: z.string().min(1),
- stepIndex: z.number().int().min(0),
- stepBudget: z.number().int().min(0),
+ toolCalls: z.number().int().min(0),
}),
eventSchema({
type: z.literal('candidate_action'),
diff --git a/packages/cli/src/context/ingest/memory-flow/types.ts b/packages/cli/src/context/ingest/memory-flow/types.ts
index ab4619a6..e620189e 100644
--- a/packages/cli/src/context/ingest/memory-flow/types.ts
+++ b/packages/cli/src/context/ingest/memory-flow/types.ts
@@ -60,17 +60,22 @@ type MemoryFlowEventPayload =
message: string;
transient?: boolean;
}
+ | {
+ type: 'rate_limit_wait';
+ provider: string;
+ rateLimitType?: string;
+ resumeAtMs: number;
+ remainingMs: number;
+ }
| {
type: 'work_unit_started';
unitKey: string;
skills: string[];
- stepBudget: number;
}
| {
type: 'work_unit_step';
unitKey: string;
- stepIndex: number;
- stepBudget: number;
+ toolCalls: number;
}
| {
type: 'candidate_action';
diff --git a/packages/cli/src/context/ingest/memory-flow/view-model.ts b/packages/cli/src/context/ingest/memory-flow/view-model.ts
index c8f784b8..028cb53f 100644
--- a/packages/cli/src/context/ingest/memory-flow/view-model.ts
+++ b/packages/cli/src/context/ingest/memory-flow/view-model.ts
@@ -513,7 +513,7 @@ export function buildMemoryFlowViewModel(input: MemoryFlowReplayInput): MemoryFl
: `${input.connectionId}/${input.adapter}`;
return {
- title: `KTX memory flow ${titleSources} ${input.status}`,
+ title: `ktx memory flow ${titleSources} ${input.status}`,
subtitle: `Run ${input.runId} Sync ${input.syncId}`,
status: input.status,
activeLine: activeLine(input),
diff --git a/packages/cli/src/context/ingest/ports.ts b/packages/cli/src/context/ingest/ports.ts
index e37e7460..88294f59 100644
--- a/packages/cli/src/context/ingest/ports.ts
+++ b/packages/cli/src/context/ingest/ports.ts
@@ -5,6 +5,7 @@ import type { KtxFileStorePort } from '../../context/core/file-store.js';
import type { KtxLogger } from '../../context/core/config.js';
import type { SessionOutcome } from '../../context/core/session-worktree.service.js';
import type { AgentRunnerPort, KtxLlmRuntimePort, KtxRuntimeToolSet } from '../../context/llm/runtime-port.js';
+import type { RateLimitGovernor } from '../llm/rate-limit-governor.js';
import type { MemoryAction, MemoryKnowledgeSlRefsPort } from '../../context/memory/types.js';
import type { PromptService } from '../../context/prompts/prompt.service.js';
import type { SkillsRegistryService } from '../../context/skills/skills-registry.service.js';
@@ -144,6 +145,7 @@ interface IngestSettingsPort {
workUnitMaxConcurrency?: number;
workUnitStepBudget?: number;
workUnitFailureMode?: 'abort' | 'continue';
+ rateLimitGovernor?: RateLimitGovernor;
/** Print a timing breakdown to stderr at the end of each run (config-driven; see also KTX_PROFILE_INGEST). `'json'` emits the raw structured profile. */
profileIngest?: boolean | 'json';
ingestTraceLevel?: IngestTraceLevel;
@@ -322,7 +324,7 @@ export interface CuratorPaginationPort {
}) => string;
buildToolSet: (passNumber: number) => KtxRuntimeToolSet;
getReconciliationActions: () => MemoryAction[];
- onStepFinish?: (info: { passNumber: number; stepIndex: number; stepBudget: number }) => void;
+ abortSignal?: AbortSignal;
}): Promise;
}
diff --git a/packages/cli/src/context/ingest/stages/stage-3-work-units.ts b/packages/cli/src/context/ingest/stages/stage-3-work-units.ts
index ec514a02..91f8b24b 100644
--- a/packages/cli/src/context/ingest/stages/stage-3-work-units.ts
+++ b/packages/cli/src/context/ingest/stages/stage-3-work-units.ts
@@ -1,21 +1,18 @@
import type { KtxModelRole } from '../../../llm/types.js';
+import { isAbortError } from '../../core/abort.js';
import type { AgentRunnerPort, KtxRuntimeToolSet, RunLoopMetrics } from '../../../context/llm/runtime-port.js';
import type { CaptureSession, MemoryAction } from '../../../context/memory/types.js';
import { listTouchedSlSources, type TouchedSlSource } from '../../../context/tools/touched-sl-sources.js';
+import { formatInvalidWuSources, type WuValidationResult } from './validate-wu-sources.js';
import type { WorkUnit } from '../types.js';
const MAX_WORK_UNIT_PROMPT_CHARS = 240_000;
-interface TouchedValidationResult {
- invalidSources: string[];
- validSources: string[];
-}
-
export interface WorkUnitExecutionDeps {
sessionWorktreeGit: { revParseHead(): Promise<string> };
agentRunner: AgentRunnerPort;
validateWikiRefs?: (actions: MemoryAction[]) => Promise;
- validateTouchedSources: (touched: TouchedSlSource[]) => Promise<TouchedValidationResult>;
+ validateTouchedSources: (touched: TouchedSlSource[]) => Promise<WuValidationResult>;
resetHardTo: (targetSha: string) => Promise;
buildSystemPrompt: (wu: WorkUnit) => string;
buildUserPrompt: (wu: WorkUnit) => string;
@@ -27,7 +24,7 @@ export interface WorkUnitExecutionDeps {
sourceKey: string;
connectionId: string;
jobId: string;
- onStepFinish?: (info: { stepIndex: number; stepBudget: number }) => void;
+ abortSignal?: AbortSignal;
toolFailureCount?: (unitKey: string) => number;
}
@@ -105,9 +102,12 @@ export async function executeWorkUnit(deps: WorkUnitExecutionDeps, wu: WorkUnit)
unitKey: wu.unitKey,
jobId: deps.jobId,
},
- onStepFinish: deps.onStepFinish,
+ abortSignal: deps.abortSignal,
});
} catch (error) {
+ if (isAbortError(error)) {
+ throw error;
+ }
return failWithResetFromCurrentHead(error instanceof Error ? error.message : String(error));
}
@@ -152,7 +152,7 @@ export async function executeWorkUnit(deps: WorkUnitExecutionDeps, wu: WorkUnit)
// Spec: invalid SL writes reset the session worktree to the WU's pre-state, WU is marked failed,
// its files are absent from the Stage Index. Per-source surgical revert is the
// memory-agent pattern — NOT the bundle-ingest pattern.
- return failWithReset(`sl_validate failed for: ${validation.invalidSources.join(', ')}`);
+ return failWithReset(`sl_validate failed for: ${formatInvalidWuSources(validation.invalidSources)}`);
}
}
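The `isAbortError` guard added to `executeWorkUnit` separates cancellation from ordinary work-unit failure: an abort is rethrown so the caller can stop the run, while any other error is still converted into a failed result. The sketch below illustrates that pattern in isolation. `sketchIsAbortError` is a hypothetical stand-in, since the implementation in `core/abort.js` is not shown in this diff, and the name-based check is an assumption.

```typescript
// Hypothetical stand-in for isAbortError; the real check in core/abort.js may differ.
function sketchIsAbortError(error: unknown): boolean {
  return error instanceof Error && error.name === 'AbortError';
}

type SketchOutcome = { status: 'ok' } | { status: 'failed'; reason: string };

// Runs one unit of work. Cancellation propagates; every other error becomes a failed outcome.
async function sketchRunUnit(run: () => Promise<void>): Promise<SketchOutcome> {
  try {
    await run();
    return { status: 'ok' };
  } catch (error) {
    if (sketchIsAbortError(error)) {
      throw error; // do not record a cancelled run as a work-unit failure
    }
    return { status: 'failed', reason: error instanceof Error ? error.message : String(error) };
  }
}
```

Without the rethrow, a cancelled run would be reported as a failed work unit and, depending on the failure mode, the remaining units could keep running.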
diff --git a/packages/cli/src/context/ingest/stages/stage-4-reconciliation.ts b/packages/cli/src/context/ingest/stages/stage-4-reconciliation.ts
index 5abc9bfb..d87a8b80 100644
--- a/packages/cli/src/context/ingest/stages/stage-4-reconciliation.ts
+++ b/packages/cli/src/context/ingest/stages/stage-4-reconciliation.ts
@@ -15,7 +15,7 @@ export interface ReconciliationContext {
sourceKey: string;
jobId: string;
force?: boolean;
- onStepFinish?: (info: { stepIndex: number; stepBudget: number }) => void;
+ abortSignal?: AbortSignal;
forceRun?: boolean;
}
@@ -39,7 +39,7 @@ export async function runReconciliationStage4(ctx: ReconciliationContext): Promi
toolSet: ctx.buildToolSet(),
stepBudget: ctx.stepBudget,
telemetryTags: { operationName: 'ingest-bundle-reconcile', source: ctx.sourceKey, jobId: ctx.jobId },
- onStepFinish: ctx.onStepFinish,
+ abortSignal: ctx.abortSignal,
});
return { skipped: false, stopReason: run.stopReason, error: run.error, ...(run.metrics ? { metrics: run.metrics } : {}) };
}
diff --git a/packages/cli/src/context/ingest/stages/validate-wu-sources.ts b/packages/cli/src/context/ingest/stages/validate-wu-sources.ts
index 4bc3aaa0..f89e5730 100644
--- a/packages/cli/src/context/ingest/stages/validate-wu-sources.ts
+++ b/packages/cli/src/context/ingest/stages/validate-wu-sources.ts
@@ -1,24 +1,153 @@
+import { findMissingJoinTargets, formatMissingJoinTarget } from '../../../context/sl/semantic-layer.service.js';
import type { SlValidationDeps } from '../../../context/sl/tools/sl-warehouse-validation.js';
import type { SlValidatorPort } from '../../../context/sl/sl-validator.port.js';
import type { TouchedSlSource } from '../../../context/tools/touched-sl-sources.js';
+export interface InvalidWuSource {
+ /** `${connectionId}:${sourceName}` */
+ source: string;
+ errors: string[];
+}
+
export interface WuValidationResult {
validSources: string[];
- invalidSources: string[];
+ invalidSources: InvalidWuSource[];
+}
+
+export function formatInvalidWuSources(invalid: InvalidWuSource[]): string {
+ return invalid.map((entry) => `${entry.source} (${entry.errors.join('; ')})`).join(', ');
+}
+
+type LoadedSource = Awaited<ReturnType<SlValidationDeps['semanticLayerService']['loadAllSources']>>['sources'][number];
+
+function uniqueTouchedSources(sources: TouchedSlSource[]): TouchedSlSource[] {
+ const seen = new Set<string>();
+ const unique: TouchedSlSource[] = [];
+ for (const source of sources) {
+ const key = `${source.connectionId}:${source.sourceName}`;
+ if (seen.has(key)) {
+ continue;
+ }
+ seen.add(key);
+ unique.push(source);
+ }
+ return unique.sort((left, right) => {
+ const byConnection = left.connectionId.localeCompare(right.connectionId);
+ return byConnection === 0 ? left.sourceName.localeCompare(right.sourceName) : byConnection;
+ });
+}
+
+/**
+ * Expand the touched set with direct join neighbors that exist: targets the
+ * touched sources join to, and existing sources that join to a touched one.
+ * Missing targets are not added here — they are reported as join-target
+ * errors on the source that declares them, so the failure names the file
+ * that must change instead of the phantom neighbor.
+ */
+function expandWithExistingJoinNeighbors(
+ touched: TouchedSlSource[],
+ sourcesByConnection: Map<string, LoadedSource[]>,
+): TouchedSlSource[] {
+ const expanded = [...touched];
+ const touchedByConnection = new Map<string, Set<string>>();
+ for (const source of touched) {
+ const bucket = touchedByConnection.get(source.connectionId) ?? new Set<string>();
+ bucket.add(source.sourceName);
+ touchedByConnection.set(source.connectionId, bucket);
+ }
+
+ for (const [connectionId, sources] of sourcesByConnection) {
+ const touchedNames = touchedByConnection.get(connectionId);
+ if (!touchedNames || touchedNames.size === 0) {
+ continue;
+ }
+ const existingNames = new Set(sources.map((source) => source.name));
+ for (const source of sources) {
+ if (touchedNames.has(source.name)) {
+ for (const join of source.joins ?? []) {
+ if (existingNames.has(join.to)) {
+ expanded.push({ connectionId, sourceName: join.to });
+ }
+ }
+ }
+ if ((source.joins ?? []).some((join) => touchedNames.has(join.to))) {
+ expanded.push({ connectionId, sourceName: source.name });
+ }
+ }
+ }
+
+ return uniqueTouchedSources(expanded);
+}
+
+/**
+ * Join-target errors attributable to this change set: every join declared by
+ * a touched source must resolve, and no source may be left joining to a name
+ * this change set removed. Pre-existing dangling joins on untouched sources
+ * are out of scope — they must not block unrelated work. Resolution is the
+ * Python engine's: exact source-name match within the connection.
+ */
+function findJoinTargetErrors(
+ touched: TouchedSlSource[],
+ sourcesByConnection: Map<string, LoadedSource[]>,
+): Map<string, string[]> {
+ const errorsBySource = new Map<string, string[]>();
+ const touchedByConnection = new Map<string, Set<string>>();
+ for (const source of touched) {
+ const bucket = touchedByConnection.get(source.connectionId) ?? new Set<string>();
+ bucket.add(source.sourceName);
+ touchedByConnection.set(source.connectionId, bucket);
+ }
+
+ for (const [connectionId, sources] of sourcesByConnection) {
+ const touchedNames = touchedByConnection.get(connectionId);
+ if (!touchedNames || touchedNames.size === 0) {
+ continue;
+ }
+ const existingNames = sources.map((source) => source.name);
+ for (const source of sources) {
+ const sourceIsTouched = touchedNames.has(source.name);
+ const candidateJoins = sourceIsTouched
+ ? source.joins
+ : (source.joins ?? []).filter((join) => touchedNames.has(join.to));
+ const missing = findMissingJoinTargets(candidateJoins, existingNames);
+ if (missing.length === 0) {
+ continue;
+ }
+ const key = `${connectionId}:${source.name}`;
+ const messages = missing.map(formatMissingJoinTarget);
+ errorsBySource.set(key, [...(errorsBySource.get(key) ?? []), ...messages]);
+ }
+ }
+ return errorsBySource;
}
export async function validateWuTouchedSources(
deps: SlValidationDeps & { slValidator: SlValidatorPort },
touched: TouchedSlSource[],
): Promise<WuValidationResult> {
+ if (touched.length === 0) {
+ return { validSources: [], invalidSources: [] };
+ }
+
+ const sourcesByConnection = new Map<string, LoadedSource[]>();
+ for (const connectionId of new Set(touched.map((source) => source.connectionId))) {
+ const { sources } = await deps.semanticLayerService.loadAllSources(connectionId);
+ sourcesByConnection.set(connectionId, sources);
+ }
+
+ const expanded = expandWithExistingJoinNeighbors(touched, sourcesByConnection);
+ const joinTargetErrors = findJoinTargetErrors(touched, sourcesByConnection);
+
const valid: string[] = [];
- const invalid: string[] = [];
- for (const source of touched) {
+ const invalid: InvalidWuSource[] = [];
+ for (const source of expanded) {
+ const key = `${source.connectionId}:${source.sourceName}`;
const result = await deps.slValidator.validateSingleSource(deps, source.connectionId, source.sourceName);
- if (result.errors.length === 0) {
- valid.push(`${source.connectionId}:${source.sourceName}`);
+ const errors = [...result.errors, ...(joinTargetErrors.get(key) ?? [])];
+ if (errors.length === 0) {
+ valid.push(key);
} else {
- invalid.push(`${source.connectionId}:${source.sourceName}`);
+ invalid.push({ source: key, errors });
}
}
return { validSources: valid, invalidSources: invalid };
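To make the new validation scope concrete, here is a small self-contained sketch of the two rules described in the doc comments above, for a single connection: the touched set grows by direct join neighbors that exist, and a missing join target is reported against the source that declares it. The `Sketch*` types and both helper functions are illustrative stand-ins, not the real `LoadedSource`, `findMissingJoinTargets`, or `expandWithExistingJoinNeighbors`.

```typescript
// Illustrative shapes only; the real source type has more fields.
interface SketchJoin {
  to: string;
}
interface SketchSource {
  name: string;
  joins?: SketchJoin[];
}

// Hypothetical equivalent of findMissingJoinTargets: exact source-name matching.
function sketchMissingJoinTargets(joins: SketchJoin[] | undefined, existingNames: string[]): string[] {
  const existing = new Set(existingNames);
  return (joins ?? []).map((join) => join.to).filter((target) => !existing.has(target));
}

// Touched names plus direct join neighbors that exist, in both directions.
function sketchExpandTouched(touched: string[], sources: SketchSource[]): string[] {
  const touchedNames = new Set(touched);
  const existingNames = new Set(sources.map((source) => source.name));
  const expanded = new Set(touched);
  for (const source of sources) {
    if (touchedNames.has(source.name)) {
      for (const join of source.joins ?? []) {
        if (existingNames.has(join.to)) {
          expanded.add(join.to); // outgoing neighbor that exists
        }
      }
    }
    if ((source.joins ?? []).some((join) => touchedNames.has(join.to))) {
      expanded.add(source.name); // incoming neighbor
    }
  }
  return [...expanded].sort();
}

const sketchOrders: SketchSource = { name: 'orders', joins: [{ to: 'customers' }, { to: 'ghost_table' }] };
const sketchSources: SketchSource[] = [
  sketchOrders,
  { name: 'customers' },
  { name: 'refunds', joins: [{ to: 'orders' }] },
  { name: 'unrelated', joins: [{ to: 'also_missing' }] },
];

// Only 'orders' was touched by the work unit.
const sketchExpanded = sketchExpandTouched(['orders'], sketchSources);
const sketchOrdersMissing = sketchMissingJoinTargets(
  sketchOrders.joins,
  sketchSources.map((source) => source.name),
);
```

Here `sketchExpanded` contains `customers`, `orders`, and `refunds`, while `ghost_table` surfaces as an error on `orders` rather than as a phantom neighbor. The dangling join on `unrelated` is never examined, which matches the stated intent that pre-existing problems on untouched sources must not block unrelated work.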
diff --git a/packages/cli/src/context/ingest/tools/warehouse-verification/create-warehouse-verification-tools.ts b/packages/cli/src/context/ingest/tools/warehouse-verification/create-warehouse-verification-tools.ts
index 166713b9..95b5f9cb 100644
--- a/packages/cli/src/context/ingest/tools/warehouse-verification/create-warehouse-verification-tools.ts
+++ b/packages/cli/src/context/ingest/tools/warehouse-verification/create-warehouse-verification-tools.ts
@@ -1,5 +1,6 @@
import type { KtxFileStorePort } from '../../../core/file-store.js';
import type { SlConnectionCatalogPort } from '../../../sl/ports.js';
+import type { SqlAnalysisPort } from '../../../sql-analysis/ports.js';
import { WarehouseCatalogService } from '../../../scan/warehouse-catalog.js';
import type { BaseTool, ToolContext } from '../../../tools/base-tool.js';
import { DiscoverDataTool } from './discover-data.tool.js';
@@ -8,6 +9,7 @@ import { SqlExecutionTool } from './sql-execution.tool.js';
export function createWarehouseVerificationTools(deps: {
connections: SlConnectionCatalogPort;
+ sqlAnalysis?: SqlAnalysisPort;
fallbackFileStore: KtxFileStorePort;
wikiSearchTool: BaseTool;
slDiscoverTool: BaseTool;
@@ -18,7 +20,7 @@ export function createWarehouseVerificationTools(deps: {
});
return [
new EntityDetailsTool(catalogFactory),
- new SqlExecutionTool(deps.connections),
+ new SqlExecutionTool(deps.connections, deps.sqlAnalysis),
new DiscoverDataTool({
wikiSearchTool: deps.wikiSearchTool,
slDiscoverTool: deps.slDiscoverTool,
diff --git a/packages/cli/src/context/ingest/tools/warehouse-verification/discover-data.tool.ts b/packages/cli/src/context/ingest/tools/warehouse-verification/discover-data.tool.ts
index e358e970..6c3380bc 100644
--- a/packages/cli/src/context/ingest/tools/warehouse-verification/discover-data.tool.ts
+++ b/packages/cli/src/context/ingest/tools/warehouse-verification/discover-data.tool.ts
@@ -70,10 +70,10 @@ export class DiscoverDataTool extends BaseTool {
}
if (input.sourceName) {
- const sl = await this.deps.slDiscoverTool.call(
+ const sl = (await this.deps.slDiscoverTool.call(
{ sourceName: input.sourceName, connectionId: input.connectionId },
context,
- );
+ )) as ToolOutput;
return { markdown: sl.markdown, structured: { wiki: null, sl: sl.structured, raw: null } };
}
@@ -85,17 +85,17 @@ export class DiscoverDataTool extends BaseTool {
let raw: DiscoverDataStructured['raw'] = null;
if (query) {
- const wikiResult = await this.deps.wikiSearchTool.call({ query, limit }, context);
+ const wikiResult = (await this.deps.wikiSearchTool.call({ query, limit }, context)) as ToolOutput;
if (totalFound(wikiResult.structured) > 0) {
parts.push('## Wiki Pages', '> use `wiki_read(blockKey)` for full content', wikiResult.markdown, '');
wiki = wikiResult.structured;
}
}
- const slResult = await this.deps.slDiscoverTool.call(
+ const slResult = (await this.deps.slDiscoverTool.call(
{ query: query || undefined, connectionId: input.connectionId },
context,
- );
+ )) as ToolOutput;
if (totalSources(slResult.structured) > 0) {
parts.push(
'## Semantic Layer Sources',
diff --git a/packages/cli/src/context/ingest/tools/warehouse-verification/sql-execution.tool.ts b/packages/cli/src/context/ingest/tools/warehouse-verification/sql-execution.tool.ts
index 76911ce7..9122d1e6 100644
--- a/packages/cli/src/context/ingest/tools/warehouse-verification/sql-execution.tool.ts
+++ b/packages/cli/src/context/ingest/tools/warehouse-verification/sql-execution.tool.ts
@@ -1,6 +1,8 @@
import { z } from 'zod';
import { assertReadOnlySql, limitSqlForExecution } from '../../../../context/connections/read-only-sql.js';
import type { SlConnectionCatalogPort } from '../../../../context/sl/ports.js';
+import { sqlAnalysisDialectForDriver } from '../../../../context/sql-analysis/dialect.js';
+import type { SqlAnalysisPort } from '../../../../context/sql-analysis/ports.js';
import { BaseTool, type ToolContext, type ToolOutput } from '../../../../context/tools/base-tool.js';
const sqlExecutionInputSchema = z.object({
@@ -40,7 +42,10 @@ function markdownTable(headers: string[], rows: unknown[][], totalRows: number):
export class SqlExecutionTool extends BaseTool {
readonly name = 'sql_execution';
- constructor(private readonly connections: SlConnectionCatalogPort) {
+ constructor(
+ private readonly connections: SlConnectionCatalogPort,
+ private readonly sqlAnalysis?: SqlAnalysisPort,
+ ) {
super();
}
@@ -69,9 +74,24 @@ export class SqlExecutionTool extends BaseTool {
};
}
+ if (!this.sqlAnalysis) {
+ throw new Error('sql_execution requires parser-backed SQL validation.');
+ }
+
let sql: string;
let wrappedSql: string;
try {
+ const connection = await this.connections.getConnectionById(input.connectionId);
+ if (!connection) {
+ throw new Error(`Connection not found: ${input.connectionId}`);
+ }
+ const validation = await this.sqlAnalysis.validateReadOnly(
+ input.sql,
+ sqlAnalysisDialectForDriver(connection.connectionType),
+ );
+ if (!validation.ok) {
+ throw new Error(validation.error ?? 'SQL is not read-only.');
+ }
sql = assertReadOnlySql(input.sql);
wrappedSql = limitSqlForExecution(sql, input.rowLimit);
} catch (error) {
diff --git a/packages/cli/src/context/ingest/types.ts b/packages/cli/src/context/ingest/types.ts
index 337885af..925f3d82 100644
--- a/packages/cli/src/context/ingest/types.ts
+++ b/packages/cli/src/context/ingest/types.ts
@@ -220,5 +220,6 @@ export interface IngestJobPhase {
export interface IngestJobContext {
jobId: string;
memoryFlow?: MemoryFlowEventSink;
+ abortSignal?: AbortSignal;
startPhase(weight: number): IngestJobPhase;
}
diff --git a/packages/cli/src/context/llm/ai-sdk-runtime.ts b/packages/cli/src/context/llm/ai-sdk-runtime.ts
index f5752355..a6776f49 100644
--- a/packages/cli/src/context/llm/ai-sdk-runtime.ts
+++ b/packages/cli/src/context/llm/ai-sdk-runtime.ts
@@ -3,7 +3,9 @@ import type { KtxLlmProvider } from '../../llm/types.js';
import { generateText, Output, stepCountIs, type FlexibleSchema, type TelemetrySettings, type ToolSet } from 'ai';
import type { z } from 'zod';
import { noopLogger, type KtxLogger } from '../../context/core/config.js';
+import { isAbortError } from '../core/abort.js';
import { summarizeKtxLlmDebugRequest, type KtxLlmDebugRequestRecorder } from './debug-request-recorder.js';
+import type { RateLimitGovernor, RateLimitProvider, RateLimitSignal } from './rate-limit-governor.js';
import { createAiSdkToolSet } from './runtime-tools.js';
import type {
KtxGenerateObjectInput,
@@ -40,12 +42,129 @@ export interface AiSdkKtxLlmRuntimeDeps {
telemetry?: AgentTelemetryPort;
logger?: KtxLogger;
debugRequestRecorder?: KtxLlmDebugRequestRecorder;
+ rateLimitGovernor?: Pick<RateLimitGovernor, 'maxRetryAttempts' | 'waitForReady' | 'report'>;
}
function hasTools(tools: Record<string, unknown>): boolean {
return Object.keys(tools).length > 0;
}
+function modelProviderName(model: unknown): RateLimitProvider {
+ const provider = (model as { provider?: string }).provider ?? '';
+ return provider.includes('vertex') || provider.includes('google') ? 'vertex' : 'anthropic-api';
+}
+
+interface HeaderLimitPair {
+ limit: string;
+ remaining: string;
+ rateLimitType: string;
+}
+
+const RATE_LIMIT_HEADER_PAIRS: HeaderLimitPair[] = [
+ {
+ limit: 'anthropic-ratelimit-requests-limit',
+ remaining: 'anthropic-ratelimit-requests-remaining',
+ rateLimitType: 'rpm',
+ },
+ {
+ limit: 'anthropic-ratelimit-tokens-limit',
+ remaining: 'anthropic-ratelimit-tokens-remaining',
+ rateLimitType: 'tpm',
+ },
+ {
+ limit: 'anthropic-ratelimit-input-tokens-limit',
+ remaining: 'anthropic-ratelimit-input-tokens-remaining',
+ rateLimitType: 'itpm',
+ },
+ {
+ limit: 'anthropic-ratelimit-output-tokens-limit',
+ remaining: 'anthropic-ratelimit-output-tokens-remaining',
+ rateLimitType: 'otpm',
+ },
+ {
+ limit: 'x-ratelimit-limit-requests',
+ remaining: 'x-ratelimit-remaining-requests',
+ rateLimitType: 'rpm',
+ },
+ {
+ limit: 'x-ratelimit-limit-tokens',
+ remaining: 'x-ratelimit-remaining-tokens',
+ rateLimitType: 'tpm',
+ },
+];
+
+function normalizeHeaders(headers: unknown): Record<string, string> {
+ if (!headers || typeof headers !== 'object') {
+ return {};
+ }
+ const get = (headers as { get?: unknown }).get;
+ if (typeof get === 'function') {
+ const out: Record<string, string> = {};
+ for (const pair of RATE_LIMIT_HEADER_PAIRS) {
+ const limit = get.call(headers, pair.limit);
+ const remaining = get.call(headers, pair.remaining);
+ if (typeof limit === 'string') out[pair.limit] = limit;
+ if (typeof remaining === 'string') out[pair.remaining] = remaining;
+ }
+ return out;
+ }
+ return Object.fromEntries(
+ Object.entries(headers as Record<string, unknown>)
+ .filter((entry): entry is [string, string | number] => typeof entry[1] === 'string' || typeof entry[1] === 'number')
+ .map(([key, value]) => [key.toLowerCase(), String(value)]),
+ );
+}
+
+function numericHeader(headers: Record<string, string>, key: string): number | undefined {
+ const value = Number(headers[key]);
+ return Number.isFinite(value) && value >= 0 ? value : undefined;
+}
+
+function utilizationForPair(headers: Record<string, string>, pair: HeaderLimitPair): number | undefined {
+ const limit = numericHeader(headers, pair.limit);
+ const remaining = numericHeader(headers, pair.remaining);
+ if (limit === undefined || remaining === undefined || limit <= 0) {
+ return undefined;
+ }
+ return 1 - Math.min(limit, remaining) / limit;
+}
+
+function aiSdkHeaderRateLimitSignal(provider: RateLimitProvider, result: unknown): RateLimitSignal | undefined {
+ const headers = normalizeHeaders((result as { response?: { headers?: unknown } }).response?.headers);
+ let best: { utilization: number; rateLimitType: string } | undefined;
+ for (const pair of RATE_LIMIT_HEADER_PAIRS) {
+ const utilization = utilizationForPair(headers, pair);
+ if (utilization === undefined) {
+ continue;
+ }
+ if (!best || utilization > best.utilization) {
+ best = { utilization, rateLimitType: pair.rateLimitType };
+ }
+ }
+ if (!best) {
+ return undefined;
+ }
+ return {
+ provider,
+ status: 'allowed',
+ rateLimitType: best.rateLimitType,
+ utilization: Number(best.utilization.toFixed(4)),
+ };
+}
+
+function retryAfterMs(error: unknown): number | undefined {
+ const value = (error as { retryAfter?: unknown }).retryAfter;
+ if (typeof value === 'number' && Number.isFinite(value) && value > 0) {
+ return value < 1_000 ? value * 1_000 : value;
+ }
+ return undefined;
+}
+
+function isAiSdkRateLimitError(error: unknown): boolean {
+ const record = error as { name?: string; statusCode?: number; status?: number };
+ return record.name === 'TooManyRequestsError' || record.statusCode === 429 || record.status === 429;
+}
+
export class AiSdkKtxLlmRuntime implements KtxLlmRuntimePort {
private readonly logger: KtxLogger;
@@ -53,6 +172,41 @@ export class AiSdkKtxLlmRuntime implements KtxLlmRuntimePort {
this.logger = deps.logger ?? noopLogger;
}
+ private async generateTextWithRateLimitRetry<T>(
+ provider: RateLimitProvider,
+ abortSignal: AbortSignal | undefined,
+ run: () => Promise<T>,
+ ): Promise<T> {
+ // maxAttempts is 1 when no governor is present (the `?? 1` fallback) or when
+ // maxRetryAttempts() returns 1 because pacing is disabled, so a 429 is rethrown
+ // immediately instead of being retried with no backoff; the AI SDK's own maxRetries still handles transient 429s.
+ const maxAttempts = this.deps.rateLimitGovernor?.maxRetryAttempts() ?? 1;
+ let attempt = 0;
+ while (true) {
+ await this.deps.rateLimitGovernor?.waitForReady(abortSignal);
+ try {
+ const result = await run();
+ const signal = aiSdkHeaderRateLimitSignal(provider, result);
+ if (signal) {
+ this.deps.rateLimitGovernor?.report(signal);
+ }
+ return result;
+ } catch (error) {
+ if (isAbortError(error) || !isAiSdkRateLimitError(error) || attempt >= maxAttempts - 1) {
+ throw error;
+ }
+ attempt += 1;
+ const retryAfter = retryAfterMs(error);
+ this.deps.rateLimitGovernor?.report({
+ provider,
+ status: 'rejected',
+ rateLimitType: 'http_429',
+ ...(retryAfter !== undefined ? { retryAfterMs: retryAfter } : {}),
+ });
+ }
+ }
+ }
+
async generateText(input: KtxGenerateTextInput): Promise<string> {
const model = this.deps.llmProvider.getModel(input.role);
if ((model as { provider?: string }).provider === 'deterministic') {
@@ -67,12 +221,13 @@ export class AiSdkKtxLlmRuntime implements KtxLlmRuntimePort {
});
const split = splitKtxSystemMessages(built.messages);
const startedAt = Date.now();
- const result = await generateText({
+ const request = {
model,
temperature: input.temperature ?? 0,
...(split.system ? { system: split.system } : {}),
messages: split.messages,
tools: built.tools as ToolSet,
+ ...(input.abortSignal ? { abortSignal: input.abortSignal } : {}),
...(hasTools(tools)
? {
experimental_repairToolCall: this.deps.llmProvider.repairToolCallHandler({
@@ -80,10 +235,11 @@ export class AiSdkKtxLlmRuntime implements KtxLlmRuntimePort {
}),
}
: {}),
- });
+ };
+ const result = await this.generateTextWithRateLimitRetry(modelProviderName(model), input.abortSignal, () => generateText(request));
input.onMetrics?.({ totalMs: Date.now() - startedAt, usage: toLlmTokenUsage(result.totalUsage ?? result.usage) });
if (typeof result.text !== 'string') {
- throw new Error('KTX LLM text generation returned no text');
+ throw new Error('ktx LLM text generation returned no text');
}
return result.text;
}
@@ -101,12 +257,13 @@ export class AiSdkKtxLlmRuntime implements KtxLlmRuntimePort {
});
const split = splitKtxSystemMessages(built.messages);
const startedAt = Date.now();
- const result = await generateText({
+ const request = {
model,
temperature: input.temperature ?? 0,
...(split.system ? { system: split.system } : {}),
messages: split.messages,
tools: built.tools as ToolSet,
+ ...(input.abortSignal ? { abortSignal: input.abortSignal } : {}),
...(hasTools(tools)
? {
experimental_repairToolCall: this.deps.llmProvider.repairToolCallHandler({
@@ -114,11 +271,12 @@ export class AiSdkKtxLlmRuntime implements KtxLlmRuntimePort {
}),
}
: {}),
- output: Output.object({ schema: input.schema as unknown as FlexibleSchema