feat: ktx batch — scan resilience, analytics SQL craft, connector hardening (#312)

* docs: add spider2-specs handoff directory for benchmark-driven feature specs

* feat(cli): connection-scoped wiki pages

Add an optional `connections` frontmatter field so database-specific wiki knowledge can be scoped to a connection without polluting searches about other databases, while page keys stay a flat, globally-unique namespace.

- connections: single string or list; absent/empty ⇒ unscoped (applies to all)
- wiki_search (MCP) and `ktx wiki --connection` return unscoped ∪ matching pages, filtered at the disk-load seam so all three search lanes draw their candidate pool from the already-scoped set (not a post-filter)
- wiki_write accepts connections with REPLACE semantics and rejects a connection-scoped write whose key collides with a disjoint-connection page (data-loss guard; hard error, no silent clobber)
- explicit connection-id args (wiki_search, memory_ingest, ktx wiki) are validated against ktx.yaml via a shared assertConfiguredConnectionId, which also closes the prior gap where memory_ingest's connectionId was unvalidated; persisted ids absent from config warn (not fail) in `ktx status`
- prompt guidance in the wiki_capture skill and external-ingest prompt; the session connectionId is surfaced to the memory agent and ingest work units

Implements spider2-specs/specs/01-connection-scoped-wiki.md; intake draft moved to spider2-specs/done/.

* docs(spider2-specs): add specs/ refinement stage and composite-key join spec

Describe the todo/ → specs/ → done/ pipeline in the README (refined specs are the durable artifact; intake drafts move to done/ on ship) and add a MEDIUM-priority spec for multi-column composite-key join detection found during the first sqlite smoke test.

* feat(cli): add --verbatim ingest mode for authoritative documents

Store each --text/--file document body unchanged as a GLOBAL wiki page instead of routing it through the memory agent, which may rewrite, condense, or re-title it.
The LLM derives only metadata (summary, tags, sl_refs) and only for frontmatter fields the document does not already set; the stored body is written by code and never edited.

- Deterministic page key: files derive it from the filename, inline text from its leading Markdown heading (headless inline text is rejected; pass it as --file instead).
- Idempotent: re-running the same body is a no-op; a different body at the same key fails loudly rather than overwriting.
- Works with llm.provider.backend: none, deriving a degraded summary from the heading or first sentence.
- Existing frontmatter (including unmodeled fields like effective_date) passes through untouched; --connection-id scopes the page.

* feat(cli): SQL-authoring craft and per-dialect notes tool for the analytics skill

Spec 07: add a dialect-agnostic <sql_craft> block to the ktx-analytics skill (schema discovery, composition, window-function correctness, numeric precision, answer completeness) with one worked window-then-filter example. Workflow steps gain pointers into it; existing guidance is unchanged.

Spec 08: add a read-only sql_dialect_notes MCP tool returning a connection's engine SQL conventions (FQTN form, identifier quoting/case, date/time, top-N idiom, JSON access), resolved through the existing sqlAnalysisDialectForDriver path. Notes are per-dialect markdown files under context/sql-analysis/dialects, served by the tool and copied to dist (package-internal, never installed). Non-SQL connections return a clear KtxExpectedError. The flat skill gains a one-line pointer to the tool.

Both spider2-specs intake drafts move to done/ with implementation notes.
* feat(cli): tolerate objects that fail introspection during scan

Isolate per-object introspection failures so one broken or inaccessible object no longer zeroes out a connection's whole semantic layer: the sqlite and bigquery connectors introspect each object defensively (tryIntrospectObject), the live-database adapter records a scan outcome and fetch report, and enabled_tables accepts catalog.db.name, db.name, or bare names with a clear no-match error. Includes matching ktx-daemon introspection changes, docs, and tests.

* docs(spider2-specs): add 06-scan-tolerate-broken-objects spec

* feat(cli): generalize analytics fan-out rule to multi-hop join chains

The ktx-analytics skill's fan-out rule only reliably caught single-hop inflation; agents still silently fanned out on multi-hop chains where the offending one-to-many join sits several hops below the SUM/COUNT and is easy to miss. Rewrite the Composition rule so the danger reads as cumulative across the whole chain (pre-aggregate per measure-owning table), add an affirmative grain-verification habit (default: pre-aggregate to grain; escape hatch: COUNT(DISTINCT key) for pure counts only; SUM/AVG of a fanned-out measure must pre-aggregate), and add one generic wrong-vs-right worked example. Content-only and dialect-agnostic; no new tool, flag, or config.

Implements spider2-specs/specs/09 and annotates spec 07's one-example constraint as superseded.

* feat(cli): add panel-completeness, time-series window, and text-encoded numeric SQL craft

Extend the analytics skill's <sql_craft> with three correctness habits and route the dialect-specific halves through sql_dialect_notes:

- Panel completeness (spec 10): full-domain spine -> LEFT JOIN -> COALESCE for "each/every/all/per" questions, defaulted by measure additivity.
- Time-series windows (spec 11): explicit cumulative frames, calendar-range rolling windows with minimum-periods guards, and period-over-period via LAG.
- Text-encoded numerics (spec 12): sample distinct values, strip/scale/cast in one early CTE, and confirm coverage with a failure-detecting cast.

Add per-dialect Series, Rolling window, and Safe cast notes to all seven dialect files so the skill stays dialect-agnostic while the engine-specific syntax lives in sql_dialect_notes. Tests updated and passing (19).

* docs(spider2-specs): add specs 10-12 for analytics SQL-craft additions

Refined specs and completion records for the panel-completeness spine (10), time-series window recipes (11), and text-encoded numeric parsing (12) implemented in the preceding commit.

* docs(spider2-specs): add backlog intake drafts 13-14

- 13: canonical authoritative-source measures
- 14: output-completeness final check

* skill(analytics): spec 14 output-completeness + iter1 (active column planning)

Bundles two changes (entangled in SKILL.md; future spider2 iterations land as separate commits):

- spec 14 (output-completeness): multi-part "answer every requested output" rule + a "Final completeness check" in workflow Step 6 and <sql_craft>; analytics skill-content test updated; intake draft -> done/, refined spec added.
- iter1 experiment: spec 14's passive end-check did not change behavior on the benchmark's output-completeness failures, so (a) the Plan step now writes the exact output-column list UP FRONT as a contract the final SELECT must match, and (b) "expose identity" -> "project BOTH the entity id and its name" (covers both omission directions).

All generic craft. Driven by the Spider 2.0-Lite failure analysis (incomplete output was the largest failure bucket); benchmark only as motivation.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* skill(analytics): iter2 - deterministic order in string/array aggregation

GROUP_CONCAT/string_agg/array_agg element order is undefined without an explicit ORDER BY; also note SQLite's default text sort is binary/case-sensitive (uppercase before lowercase) vs case-insensitive (COLLATE NOCASE). Generic SQLite craft. Spider 2.0-Lite motivation: an ordered-ingredient-list question failed only on the within-string element order (right elements, wrong order); benchmark as motivation only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(mcp): structured, leveled logging for the MCP server

Add one synchronous pino logger per MCP server process, written through the io.stderr sink: plain JSON when stderr is not a TTY, colorized pino-pretty (sync, in-process) when it is. Every tool call logs tool.start with its raw params BEFORE the handler runs and tool.end after (info / warn past KTX_MCP_SLOW_TOOL_MS / error), correlated by callId plus sessionId, so a runaway sql_execution leaves a recoverable start line with its exact SQL and no matching end. HTTP logs session.open/close and wires the previously-dead transport.onerror to transport.error; stdio routes its transport error through the logger. Level via KTX_MCP_LOG_LEVEL (default info). Existing mcp_request_completed telemetry and registerParsedTool are unchanged; no worker/async transport and no redaction in v1 (logs are local-only).

Implements spider2-specs/specs/15-mcp-server-structured-logging.md and moves the intake draft to done/.

* feat(mcp): report uptimeMs in MCP server /health

The /health endpoint now includes uptimeMs (monotonic elapsed time since the server started), mirroring the Python daemon's uptime_ms telemetry field.
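As a rough illustration of the tool.start/tool.end correlation described in the logging entry above, the sketch below picks the tool.end level and finds start lines with no matching end. The record shape, the function names, and the slow-threshold parameter are hypothetical; only the event names, `callId`, and `sessionId` come from the commit text.

```typescript
// Hypothetical log-line shape: only event, callId, and sessionId are named in the commit text.
interface ToolLogLine {
  event: "tool.start" | "tool.end";
  callId: string;
  sessionId: string;
}

type ToolEndLevel = "info" | "warn" | "error";

// Pick the tool.end level: error on failure, warn past the slow threshold, else info.
// slowToolMs stands in for the value read from KTX_MCP_SLOW_TOOL_MS.
function toolEndLevel(durationMs: number, failed: boolean, slowToolMs: number): ToolEndLevel {
  if (failed) return "error";
  return durationMs > slowToolMs ? "warn" : "info";
}

// Return every tool.start line whose callId never produced a tool.end,
// i.e. the calls that were still running (or wedged) when the log stopped.
function findUnmatchedStarts(lines: ToolLogLine[]): ToolLogLine[] {
  const ended: { [callId: string]: boolean } = {};
  for (const line of lines) {
    if (line.event === "tool.end") ended[line.callId] = true;
  }
  return lines.filter((line) => line.event === "tool.start" && !ended[line.callId]);
}
```

Run over a parsed log, `findUnmatchedStarts` would surface the runaway call's start line, which carries the raw params.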
* feat(cli): bound read-query execution with a per-connection deadline

Enforce one shared query deadline (default 30s, overridable per connection via query_timeout_ms) on every executeReadOnly path, so an accidentally-expensive LLM-authored query returns a fast "query exceeded Ns" KtxQueryError instead of hanging the MCP server.

- New shared contract context/connections/query-deadline.ts (resolveQueryDeadlineMs, queryDeadlineExceededError); query_timeout_ms added to the shared warehouse schema; BigQuery's job_timeout_ms removed.
- SQLite runs the read query in a short-lived forked child process and enforces the deadline with SIGKILL. worker_threads + terminate() was tried first but cannot interrupt a synchronous better-sqlite3 scan (the native loop never yields); SIGKILL reclaims the process in ~2ms and keeps the event loop free.
- Remote connectors apply a real server-side statement timeout and re-wrap their own timeout signal as KtxQueryError: Postgres statement_timeout/57014, MySQL max_execution_time/3024, Snowflake STATEMENT_TIMEOUT_IN_SECONDS/604, ClickHouse max_execution_time + aligned request_timeout/159, SQL Server requestTimeout/ETIMEOUT, BigQuery jobTimeoutMs.
- Relationship validation skips a candidate to review on a deadline timeout instead of aborting the pass; the deadline surfaces through the existing MCP pino logger as a matched tool.start/tool.end(error) pair (no new logging code).

Also fixes a pre-existing, unrelated invalid cast in mcp-server-factory.test.ts that was breaking tsc -p tsconfig.test.json.

* docs(spider2-specs): mark spec 16 (bounded query execution) done

Append Implementation notes to the refined spec (what shipped, where, and the worker-thread -> child-process+SIGKILL deviation with its evidence) and move the intake draft from todo/ to done/.
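A minimal sketch of how the per-connection deadline might be resolved, assuming a config object that carries an optional `query_timeout_ms`. The function names and the fallback for non-positive or non-finite values are assumptions for illustration, not the signatures in query-deadline.ts; the 30s default and the "query exceeded Ns" wording come from the entry above.

```typescript
// Default taken from the commit text (30s).
const DEFAULT_QUERY_DEADLINE_MS = 30000;

// Hypothetical slice of a connection config; only query_timeout_ms is named in the commit text.
interface ConnectionConfigSketch {
  query_timeout_ms?: number;
}

// Use the per-connection override when it is a positive finite number, else the shared default.
// Treating 0, negative, or NaN as "unset" is an assumption of this sketch.
function resolveDeadlineMsSketch(conn: ConnectionConfigSketch): number {
  const override = conn.query_timeout_ms;
  if (typeof override === "number" && isFinite(override) && override > 0) {
    return override;
  }
  return DEFAULT_QUERY_DEADLINE_MS;
}

// Build the user-facing message in the "query exceeded Ns" shape.
function deadlineMessageSketch(deadlineMs: number): string {
  return "query exceeded " + deadlineMs / 1000 + "s";
}
```

Resolving the deadline in one place keeps each connector's own timeout setting (statement_timeout, max_execution_time, and so on) aligned with the same number.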
* skill(analytics): iter3 - measure-as-amount, inter-event gap, top-per-metric career

Three generic interpretation rules: a named business measure (sales/revenue/spend) means its amount not a row count; "inter-event duration/gap" is LAG/LEAD time-between events not a magnitude column; "highest across several achievements" aggregates per metric over the whole history. All three demonstrably FIRE (verified on local008/003/152 SQL). local008 flips to correct (mechanism-aligned). 003/152 still fail on a different axis (source-column / grouping). Generic craft; benchmark only as motivation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* skill(analytics): spine-for-extreme-selection + aggregate-over-selected-set

Two generic answer-completeness refinements:

- Selecting the extreme group (lowest/highest count over a period/category domain) must rank over the COMPLETE spine, not only groups with fact rows: an empty period is a genuine 0 and often the true minimum.
- An aggregate scoped to a per-entity selected set ('avg revenue per actor in those top-3 films') is computed ACROSS that set, distinct from the per-item value; project both.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* skill(analytics): iter2 - sharpen extreme-selection spine + top-N ranking-measure

- spine-for-extreme: concrete cue that a zero-row period never appears in a GROUP BY of the facts; generate the full calendar, LEFT JOIN, COALESCE, then rank.
- aggregate-over-selected-set: top-N selection ranks by the named ranking measure (the item's own revenue), independent of the per-item share that feeds the aggregate.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* skill(analytics): iter3 - comparison-between-two-extremes is one wide row

Distinguishes a cross-item comparison ('the difference between the highest and lowest month' -> single wide row, both extremes side by side + the comparison column) from 'report a metric for each group' (-> stays long). Generic, question-derived; targets the wide-vs-long shape gap without affecting per-group long output.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* skill(analytics): iter4 - anchor a period bucket to the named lifecycle event

When a record carries multiple lifecycle timestamps (created/placed, approved, shipped, delivered, completed, settled) and the question counts/measures records in a named *completed state* by period ("delivered orders by month", "shipped items per week"), bucket the period by that named event's own timestamp, not the record-creation timestamp; the state value is the qualifying filter, the matching timestamp is the time anchor. Wording priority is explicit: purchased/placed/created/submitted/ordered keep the start-event timestamp, and a non-temporal state filter (counts by customer/city/seller with no period) introduces no anchor.

Generic analytics craft: counting completed-state records by their creation date silently answers "records that later reached that state, grouped by when they started" instead of the question asked. Surfaced via the spider2-autofix loop; FAIR_PRODUCT (adversary-screened, restatable from question wording + schema/semantic-layer lifecycle descriptions, no gold dependency).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* skill(analytics): iter5 - canonicalize observed URL-path variants before page-level analysis

When a question groups/filters/sequences web pages by a path/url column, sample its distinct values; if the data itself shows /route and /route/ variants for the same page context, canonicalize in an early CTE (preserve / as root, strip trailing slashes from non-root paths, map an observed empty path to / only when the column is a URL path with blank root-page events) and use the canonical path everywhere above. Explicitly forbids inventing aliases the data doesn't show: no merging different route names, no stripping query/fragment/host/scheme, no lowercasing, and no canonicalization when the question asks for raw URL/path or slash-vs-no-slash diffs.

Generic web-analytics craft: raw request logs routinely store the same user-visible page with and without a trailing slash, so grouping raw labels silently splits one page into several. Surfaced via the spider2-autofix loop (Codex runner, round r2); FAIR_PRODUCT (adversary-screened, restatable from URL-path semantics + page-grain question wording + solver-observed distinct values, no gold dependency). The rule fired mechanism-aligned on both targets; flipped local330 (landing/exit page counts), local331 residual is a separate sequence-semantics axis beyond canonicalization.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* skill(analytics): iter6 - coverage over a selected group is a set-membership aggregate

When a question first selects a group of entities ("the top 5 actors", "these products") and then asks what count/share/percentage of a DIFFERENT subject domain relates to *these* selected entities ("what % of customers rented films featuring these actors"), the subject set is the UNION across the whole group: count DISTINCT subject ids once across the selected entities and return one collective value at the subject-domain grain, not one row per selected entity (which double-counts subjects related to more than one entity and answers a different question). Narrowly guarded: emit one row per entity only when the wording says "for each / per / by / list" or asks for each entity's own metric ("top 5 players and their batting averages"). The collective-coverage cousin of the existing per-entity selected-set rule.

Generic analytics craft (per-entity metric vs set-level coverage). Surfaced via the spider2-autofix loop (Codex runner, round r3); FAIR_PRODUCT (adversary-screened, restatable from wording alone, no gold dependency). Flipped local195 mechanism-aligned (union COUNT(DISTINCT customer)/total, one scalar); 0 regression across 5 passing per-entity top-N guards (local023/024/029/212/221 stayed long).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* skill(analytics): label-only joins must LEFT JOIN - incomplete dims silently drop fact rows

Mirror of the existing fan-out rule for the DROP direction: an inner JOIN to a dimension table used only to attach a display attribute silently discards every fact row whose key has no parent when the dimension is incomplete (trimmed catalogs, late-arriving / SCD-gap rows), shrinking counts/sums and the universe over which shares/averages/medians are computed.
Guidance: LEFT JOIN pure enrichment; inner-join a dimension only when intended as a filter; key the aggregate/GROUP BY on the fact column, not the dimension column.

Spider2 autofix round 'joindim': flips complex_oracle local050 (FAIL->PASS, official scorer); the solver dropped the gratuitous products inner-join and recovered the exact gold. local060/063 also adopt LEFT JOIN (rule fires) but remain gold-convention-blocked. Guards local061/067 held.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(spider2-specs): add todo/17 - lifecycle-event metrics (semantic-layer)

Draft intake spec surfaced by the spider2-autofix loop (round r1): the model-layer form of the shipped iter4 lifecycle-date-anchoring skill rule. Infer per-state lifecycle-event metrics (e.g. delivered_orders with defaultTimeDimension = the delivery timestamp) during enrichment so the correct time anchor is the default for any consumer, not only an agent that loaded the skill. Generic; FAIR_PRODUCT.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(connectors): accept leading underscore in connection/identifier ids

The safe-identifier validator regex /^[a-zA-Z0-9][a-zA-Z0-9_-]*$/ allowed an underscore everywhere except the first character, so a connection id / database name that legitimately starts with '_' (valid in Snowflake, e.g. _1000_GENOMES) could never be ingested or queried. Allow a leading underscore across all 16 duplicated validators (connection ids, source ids, page/wiki keys, warehouse-verification tool schemas). Path-safety is unaffected: '.' and '/' remain excluded, and assertSafePathToken still blocks traversal.
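For illustration, the old pattern quoted above can be compared against one plausible form of the fix. The commit text does not show the new regex, so `NEW_SAFE_ID_SKETCH` below is an assumption, and both constant names are hypothetical.

```typescript
// Old validator, copied from the commit text: underscore rejected in first position.
const OLD_SAFE_ID = /^[a-zA-Z0-9][a-zA-Z0-9_-]*$/;

// Assumed form of the fix: same character set, with '_' also allowed as the first character.
const NEW_SAFE_ID_SKETCH = /^[a-zA-Z0-9_][a-zA-Z0-9_-]*$/;

// Report whether an id passes each pattern, to make the behavior change visible.
function compareSafeId(id: string): { before: boolean; after: boolean } {
  return { before: OLD_SAFE_ID.test(id), after: NEW_SAFE_ID_SKETCH.test(id) };
}
```

Under this assumed pattern, `_1000_GENOMES` flips from rejected to accepted, while ids containing '.' or '/' and ids starting with '-' stay rejected.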
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(analytics): generic geospatial query guidance

Add a Snowflake ST_* dialect note (ST_MAKEPOINT lon-first, ST_DWITHIN/ST_CONTAINS/ST_WITHIN/ST_INTERSECTS, bbox->polygon via ST_MAKEPOLYGON/ST_MAKELINE) and a dialect-agnostic 'Spatial predicates' recipe in the analytics skill (resolve the entity geometry, build an area-of-interest polygon, test with the engine's containment/proximity/overlap predicate; mind lon/lat argument order). Steers the solver off hand-rolled lat/lon BETWEEN boxes toward correct, index-assisted geospatial predicates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(analytics): parse code/dependency text by language grammar

Add two generic <sql_craft> rules: (1) parse imported/required/loaded packages by the language or manifest format (Java import keep-package-path allowing underscores/mixed-case; Python import/from + alias stripping; R library/require; .ipynb parse JSON cell source before language rules; JSON manifests flatten the dependency object keys), stripping comments/prose and splitting multi-import lines; (2) on a de-duplicated table with a documented copy/occurrence count, choose COUNT(*) vs the weight column from the population the question names, not silently. Steers off one broad regex that drops valid identifiers and matches prose.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(analytics): source filters/dates/measures from the owning fact grain

Add a <sql_craft> rule for joined fact tables at different grains (parent order vs child line item): read each predicate, calendar bucket, and measure from the table whose grain the question names, not whichever is in scope post-join. An order-grain filter ("orders that are Complete", "the order's creation date") must come from the parent even though the child carries its own status/created_at; line price/cost come from the child.
Mirror at metric grain: don't combine a parent-grain count with child rows (num_of_item * SUM(line_price) per line); aggregate each measure at its own grain before combining.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(analytics): collapse multi-valued classes to one representative per entity before counting/concentration

When an entity carries a multi-valued classification array (IPC/CPC codes, tags) and the methodology counts entities-per-class or a concentration/diversity metric (HHI, originality, share), pick ONE representative per entity first (the array's main/primary/first flag, else a defined fallback like most-frequent), then aggregate; and use COUNT(DISTINCT entity) when the denominator is defined as a count of entities. Unnesting the array otherwise multiplies an entity's weight by its code count, inflating per-class frequencies and skewing the ranking/score.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(connectors): introspect BigQuery datasets hosted in foreign projects

A dataset_ids/dataset_id entry may now be written `project.dataset` to introspect a dataset hosted in another project while query jobs still bill to credentials.project_id. Entries are parsed once at the config boundary into canonical {project, dataset} pairs; introspection, primary-key discovery, testConnection, getTableRowCount, and listTables (grouped per project) all resolve in the dataset's own project, and scanned tables are labeled with that project so sampling, distinct-value, and read queries resolve. Bare entries are unchanged.

Implements spider2-specs/specs/18-bigquery-cross-project-datasets.md.

* feat(scan): durable, resumable, bounded relationship detection during enrichment

Move the enrichment persistence boundary to the cost boundary and bound the open-ended relationship stage (spec 19).
- Checkpoint descriptions + embeddings into the queryable `_schema` manifest (and the raw enrichment artifacts) before relationship detection runs, via a new `onCheckpoint` hook + `writeLocalScanEnrichmentCheckpoint`. An interrupted, budget-truncated, or failed relationship stage now degrades to "no joins", never "no descriptions".
- Resume the enrichment cache by content identity: re-key the SQLite stage store on `(connection_id, stage, input_hash)` so a re-run with a fresh runId resumes finished descriptions/embeddings instead of re-paying for LLM work. The disposable cache recreates its table if the on-disk key shape differs.
- Make the relationship stage observable and bounded: a sticky wall-clock budget (`scan.relationships.detectionBudgetMs`, default 600000 ms) + per-unit progress + honored `ctx.signal`, threaded through profiling, validation, and composite detection. On exhaustion/abort it stops scheduling, finalizes, and returns a partial result instead of throwing or hanging.
- Mark a budget/abort-truncated result partial (diagnostics `partial`/`partialReason` + recoverable `relationship_detection_partial` warning). A graceful partial saves as a completed stage and resumes cheaply; raising the budget changes inputHash and forces a fresh, fuller run. A process killed mid-stage saves nothing.

Document `detectionBudgetMs` in the ktx.yaml reference. Append implementation notes to specs/19 and move the intake draft to done/.

Also carries the in-tree per-table enrichment LLM timeout work it builds on (`description-generation.ts` + the `enrichment_timeout` warning code), which is intertwined in `local-enrichment.ts`/`types.ts` and cannot be split into a separately-building commit.

* feat(scan): bound + retry the per-table enrichment LLM call

The batched table-description call had no retry (sampleTable retried 3x, this did not), so a single transient backend error (e.g.
an overloaded/burst rejection when many tables enrich concurrently) silently nulled a whole table's descriptions; this was observed dropping ~70% of a db's tables during a bad window despite ample quota.

- Wrap generateObject in retryAsync (3 attempts + backoff; KTX_ENRICH_LLM_ATTEMPTS).
- Fresh per-attempt timeout (KTX_ENRICH_LLM_TIMEOUT_MS, default 120s) still bounds a wedged wide table; a timeout is surfaced as KtxAbortedError so it is NOT retried (one wedge stays one timeout, not 3x).
- Granular per-table progress + start/done/retry/timeout logging.

Composes with spec 19 (its non-goal #1): spec 19 makes completed descriptions durable; this makes more of them complete.

* feat(scan): survive a hung LLM enrichment backend and resume descriptions

Two compounding failure modes on the per-table description-enrichment path (spec 20):

Enforced per-table timeout for subprocess backends. The runtime declares whether it owns an SDK subprocess (subprocessForkSpec on KtxLlmRuntimePort); codex/claude-code calls run behind a ktx-owned detached child that is tree-killed (SIGKILL of the process group on POSIX, taskkill /T on Windows) on the deadline or ctx.signal, reaping the wedged model grandchild. HTTP backends keep native fetch abort. Default stays 120s, one-wedge-one-timeout.

Incremental, resumable descriptions persistence. generateDescriptions flushes enriched tables per batch to an inputHash-tagged durable record (at a stable, non-syncId path) plus only the changed manifest shards, skips already-enriched tables on resume, and never lets one table's failure discard the stage (a skipped table costs one missing description, not the whole stage's output).

Spec 20 refined + intake draft moved to done/.
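The "retry transient errors, never retry a timeout" rule from the retry entry above can be sketched as follows. This is a synchronous, dependency-free illustration with backoff omitted; the helper names and the name-based abort marker are hypothetical and do not reflect the real retryAsync or KtxAbortedError implementations.

```typescript
// Hypothetical marker for a timeout/abort error; the real code surfaces KtxAbortedError.
const ABORTED_NAME_SKETCH = "AbortedSketch";

function makeAbortedErrorSketch(message: string): Error {
  const err = new Error(message);
  err.name = ABORTED_NAME_SKETCH;
  return err;
}

function isAbortedSketch(err: unknown): boolean {
  return err instanceof Error && err.name === ABORTED_NAME_SKETCH;
}

// Call fn up to maxAttempts times. A transient error triggers another attempt;
// an aborted (timeout) error is rethrown immediately, so one wedge costs one timeout.
function retrySketch<T>(fn: (attempt: number) => T, maxAttempts: number): T {
  let lastError: unknown = new Error("no attempts were made");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return fn(attempt);
    } catch (err) {
      if (isAbortedSketch(err)) throw err; // timeout: do not retry
      lastError = err; // transient: fall through to the next attempt
    }
  }
  throw lastError;
}
```

With three attempts, a call that fails twice with a transient error and then succeeds returns its value, whereas a call that times out on the first attempt is invoked exactly once.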
* feat(scan): selective enrichment stages (--stages) + per-stage cache keys

Split the single coarse enrichment cache key into per-stage hashes (descriptions <- snapshot + LLM identity; embeddings <- snapshot + embedding identity + description digest; relationships <- snapshot + relationship settings + LLM identity), so changing one stage's inputs invalidates only that stage and never throws away the expensive per-table descriptions on an unrelated edit.

Add `ktx ingest --stages <list>` to force-re-run a chosen subset on an already-ingested connection: a named stage bypasses the completed-stage short-circuit while the per-table descriptions resume record still skips already-enriched tables, and unselected stages are left untouched on disk. Feed embeddings + relationships their description context from the on-disk _schema when descriptions do not run this invocation, and carry descriptions into the llmProposals evidence packet (closing a latent gap on the full-run path too). Surface an enrichment_stage_stale warning when an unselected stage's inputs have drifted, rather than silently cascading the work.

Implements spider2-specs/specs/21-selective-enrichment-stages.md.

* test(analytics): realign SKILL.md acceptance test with the evolved skill

Three assertions in analytics-skill-content.test.ts drifted from the analytics SKILL.md as later iterations edited the skill without updating the test:

- the sub-heading was renamed Window functions -> Ordering & aggregation determinism (iter2), so follow the source name;
- the rule "Expose identity, not just the label" was renamed to "Project BOTH identity and label" (spec 14), so match the new wording;
- the dialect-FQTN guard false-positived on the Java package example com.planet_ink.coffee_mud, whose backticks made a 3-segment package path read as a BigQuery/Snowflake `a.b.c` table reference. Drop the backticks so the guard stays at full strength without weakening it.
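The per-stage key split described in the `--stages` entry above can be sketched like this. The field names, the function names, and the use of a joined JSON string in place of a real hash are all assumptions made to keep the example dependency-free; only the three stage names and their input lists come from the commit text.

```typescript
type EnrichmentStage = "descriptions" | "embeddings" | "relationships";

// Hypothetical bundle of the inputs the commit text lists for each stage.
interface StageInputsSketch {
  snapshot: string;
  llmIdentity: string;
  embeddingIdentity: string;
  descriptionDigest: string;
  relationshipSettings: string;
}

// Build one key per stage from only the inputs that stage depends on.
// A real implementation would hash these; a JSON string is enough to show the dependency split.
function stageKeySketch(stage: EnrichmentStage, inputs: StageInputsSketch): string {
  if (stage === "descriptions") {
    return JSON.stringify([stage, inputs.snapshot, inputs.llmIdentity]);
  }
  if (stage === "embeddings") {
    return JSON.stringify([stage, inputs.snapshot, inputs.embeddingIdentity, inputs.descriptionDigest]);
  }
  return JSON.stringify([stage, inputs.snapshot, inputs.relationshipSettings, inputs.llmIdentity]);
}

const ALL_STAGES_SKETCH: EnrichmentStage[] = ["descriptions", "embeddings", "relationships"];

// List the stages whose key differs between two runs, i.e. the only stages that need to re-run.
function changedStagesSketch(prev: StageInputsSketch, next: StageInputsSketch): EnrichmentStage[] {
  return ALL_STAGES_SKETCH.filter((stage) => stageKeySketch(stage, prev) !== stageKeySketch(stage, next));
}
```

In this sketch, swapping only the embedding identity invalidates the embeddings key and leaves the descriptions and relationships keys intact, which is the property the split is meant to provide.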
* fix(scan): --stages subset must not delete unselected stages' on-disk artifacts

A --stages subset that omitted descriptions wiped all on-disk ai/db descriptions from the written _schema. runLocalScan writes the structural manifest shard from the bare snapshot BEFORE enrichment runs, and the shard merge treats ai/db as scan-managed and overwrites them with whatever the run emits (none, on a subset that skips descriptions). Enrichment then read the already-wiped shard via loadPriorDescriptions and had nothing to restore.

runLocalScanEnrichment now returns the best-available descriptions (fresh-this-run if descriptions ran, else loaded from the on-disk _schema) instead of [], and runLocalScan captures the prior descriptions before the structural write and feeds them to both the structural write and enrichment, so an unselected stage's artifacts survive. Joins were already preserved for --stages descriptions via the manual/inferred preservedJoins path.

Tests: a full runLocalScan --stages relationships path test (RED without the fix, GREEN with it; the earlier unit test missed the structural-pre-write ordering), plus enrichment-layer contract tests for both directions. Validated live on northwind: --stages relationships keeps all 110 descriptions + 22 joins (was wiping to 0); --stages descriptions restores descriptions from the spec-20 resume record (no LLM calls) while keeping joins.

* feat(dialects): bigquery nested-data (ARRAY/STRUCT/UNNEST), geospatial (GEOGRAPHY), SAFE_DIVIDE

bigquery.md lacked the two sections that define BigQuery analytics (present in snowflake.md):

- Nested & repeated data: UNNEST to flatten arrays of STRUCTs (GA360 hits, GA4 event_params), dot-notation field access, key-value param scalar-subquery extraction, fan-out/COUNT(DISTINCT) guard.
- Geospatial (GEOGRAPHY): ST_GEOGPOINT (lon-first), containment/proximity/distance/intersection predicates, areal allocation via ST_AREA(ST_INTERSECTION()).
- SAFE_DIVIDE for zero-denominator-safe rates; sharded-table shard-presence note.

Generic BigQuery craft surfaced by sql_dialect_notes; product-completeness (any BQ analyst benefits).

* feat(dialects): sqlite ROUND half-up FP-underflow note (+1e-9 before ROUND)

SQLite ROUND(x,n) rounds half-away-from-zero, but binary floating point can store a decimal half-way value slightly below the true midpoint, so ROUND(6.475,2) returns 6.47 not 6.48. Add a dialect note: nudge by a tiny epsilon (1e-9) below display precision before rounding for deterministic half-up, leaving non-boundary values unchanged. Generic SQLite craft surfaced by sql_dialect_notes (any analyst rounding a displayed average/rate/price benefits).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(analytics): list-as-delimited-string, answer-literally, drop free-text columns

Add SKILL.md guidance to emit list-valued answer cells as delimited STRING (not ARRAY/repeated column), answer the literal ask without unrequested transformations (HAVING for aggregate bounds), and avoid projecting unrequested free-text columns that corrupt row-delimited output.

* fix(scan,mcp): gitignore runtime logs, budget-guard LLM proposal, validate enrich timeout

- gitignore `.ktx/logs/` in both scaffold + setup-merge lists: the managed MCP daemon writes raw tool params (SQL, memory_ingest content) to mcp.log under a version-controlled `.ktx/`, and snowflake.log already sat there unprotected.
- gate the LLM relationship proposal on the detection budget/abort signal so an exhausted or aborted stage cannot start a fresh LLM call; document the boundary.
- validate KTX_ENRICH_LLM_TIMEOUT_MS (NaN/0 → 120s default) like enrichAttempts, so a bad value no longer times out every table immediately.
- daemon introspection now warns on malformed column/FK rows instead of dropping them silently, matching the table-row path and the "surface broken objects" goal.
- docs: document `ktx wiki -c/--connection`; fix the SQLite query-deadline schema doc (forked-subprocess SIGKILL, not worker-thread termination). * fix(scan,wiki,mcp): address PR #312 review findings - scan: key the description pipeline (resume map, enriched-schema and embedding-text lookups, manifest write/read) by full table identity via tableRefKey/buildTableRef, so two same-named tables in different schemas no longer cross-assign descriptions or skip a sibling on resume - scan: re-throw a genuine context cancel during the batched description LLM call so Ctrl-C resumes the stage instead of nulling tables and recording it completed; per-table timeouts still degrade (context.signal not aborted) - scan: report statisticalValidation 'skipped' (not 'completed') when a budget/abort stop leaves relationship profiling partial - wiki: sync the full page corpus into the sqlite index and filter only the candidate/result set, so a connection-scoped search no longer prunes other connections' pages and cached embeddings from the shared index - wiki: route verbatim ingest through the canonical writePageAndSync so contentHash is set and later syncs can short-circuit - mcp: drop the as-unknown-as cast in serializeMcpError - dialects/analytics: document the integer-division trap on postgres/sqlite/tsql Adds regression tests for each behavior change. * fix(wiki): scope connection filter before SQLite lane limit Connection-scoped wiki search applied the connectionId allowlist after the lexical/semantic lanes had already truncated to laneCandidatePoolLimit over the full (connection-agnostic) corpus. When the requested connection was a minority of a large corpus, its pages were crowded out of the candidate pool before filtering, so a semantic-only match could be missed outright and lexical hits under-ranked. 
Push the path allowlist into searchLexicalCandidates/searchSemanticCandidates so LIMIT applies to in-scope rows, matching what the token lane already did, and drop the now-redundant post-limit JS filters. --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
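
The ordering problem behind the last fix (filtering after the lane limit) can be illustrated with a small in-memory sketch. This is not the actual lane code, which runs as SQLite queries; the `Page` shape, `limitThenFilter`, `filterThenLimit`, and the sample corpus are all hypothetical names made up for the illustration.

```typescript
// Hypothetical in-memory model of one search lane; names are illustrative only.
interface Page {
  path: string;
  connections: string[]; // empty means unscoped (visible to every connection)
  score: number; // higher is a better match
}

function inScope(page: Page, connectionId: string): boolean {
  return page.connections.length === 0 || page.connections.includes(connectionId);
}

function byScoreDesc(a: Page, b: Page): number {
  return b.score - a.score;
}

// Old order: truncate over the whole corpus, then apply the connection filter.
function limitThenFilter(pages: Page[], connectionId: string, limit: number): Page[] {
  return [...pages].sort(byScoreDesc).slice(0, limit).filter((p) => inScope(p, connectionId));
}

// Fixed order: filter first, so the limit only counts in-scope rows.
function filterThenLimit(pages: Page[], connectionId: string, limit: number): Page[] {
  return pages.filter((p) => inScope(p, connectionId)).sort(byScoreDesc).slice(0, limit);
}

const corpus: Page[] = [
  { path: 'big/a', connections: ['big_db'], score: 0.9 },
  { path: 'big/b', connections: ['big_db'], score: 0.8 },
  { path: 'big/c', connections: ['big_db'], score: 0.7 },
  { path: 'small/x', connections: ['small_db'], score: 0.6 },
  { path: 'shared/y', connections: [], score: 0.5 },
];

// With a pool limit of 3, the minority connection is crowded out in the old order.
const before = limitThenFilter(corpus, 'small_db', 3).map((p) => p.path);
const after = filterThenLimit(corpus, 'small_db', 3).map((p) => p.path);
```

In the sketch, `before` is empty because the three best-scoring rows all belong to the other connection, while `after` still contains the scoped page and the unscoped page.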
2026-07-01 08:59:39 +02:00 · 2026-06-29 18:35:57 +02:00 · 2026-06-29 18:35:57 +02:00 · f65a5b0e2e
commit f65a5b0e2e
parent 2afab61417
200 changed files with 17780 additions and 672 deletions
--- a/packages/cli/test/context/connections/configured-connections.test.ts
+++ b/packages/cli/test/context/connections/configured-connections.test.ts
@@ -0,0 +1,26 @@
+import { describe, expect, it } from 'vitest';
+import type { KtxProjectConnectionConfig } from '../../../src/context/project/config.js';
+import { assertConfiguredConnectionId } from '../../../src/context/connections/configured-connections.js';
+
+const connections = {
+  sales_db: { driver: 'sqlite' } as unknown as KtxProjectConnectionConfig,
+  events_db: { driver: 'sqlite' } as unknown as KtxProjectConnectionConfig,
+};
+
+describe('assertConfiguredConnectionId', () => {
+  it('returns the id when configured', () => {
+    expect(assertConfiguredConnectionId(connections, 'sales_db')).toBe('sales_db');
+  });
+
+  it('throws listing the configured ids when unknown', () => {
+    expect(() => assertConfiguredConnectionId(connections, 'warehouse')).toThrow(
+      'Unknown connection "warehouse". Configured connections: events_db, sales_db.',
+    );
+  });
+
+  it('reports none configured for an empty connections map', () => {
+    expect(() => assertConfiguredConnectionId({}, 'warehouse')).toThrow(
+      'Unknown connection "warehouse". Configured connections: (none configured).',
+    );
+  });
+});
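
For readers without the source file at hand, the contract these tests pin down could be met by something like the sketch below. It is a guess reconstructed from the assertions only: the function name is suffixed with `Sketch` to mark it as hypothetical, the plain `Error` type is an assumption (the real helper may throw a project-specific error class), and sorting the ids is inferred from the expected message listing `events_db` before `sales_db`.

```typescript
// Hypothetical reconstruction from the test expectations; not the shipped helper.
function assertConfiguredConnectionIdSketch(
  connections: Record<string, unknown>,
  connectionId: string,
): string {
  if (Object.prototype.hasOwnProperty.call(connections, connectionId)) {
    return connectionId;
  }
  // Assumption: ids are listed in sorted order so the message is deterministic.
  const ids = Object.keys(connections).sort();
  const listed = ids.length > 0 ? ids.join(', ') : '(none configured)';
  throw new Error(`Unknown connection "${connectionId}". Configured connections: ${listed}.`);
}

// Capture the message for an unknown id without letting the error escape.
function unknownIdMessage(connections: Record<string, unknown>, connectionId: string): string {
  try {
    assertConfiguredConnectionIdSketch(connections, connectionId);
    return '';
  } catch (error) {
    return (error as Error).message;
  }
}

const sampleConnections = { sales_db: {}, events_db: {} };
const okId = assertConfiguredConnectionIdSketch(sampleConnections, 'sales_db');
const unknownMessage = unknownIdMessage(sampleConnections, 'warehouse');
const emptyMessage = unknownIdMessage({}, 'warehouse');
```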
--- a/packages/cli/test/context/connections/query-deadline.test.ts
+++ b/packages/cli/test/context/connections/query-deadline.test.ts
@@ -0,0 +1,36 @@
+import { describe, expect, it } from 'vitest';
+import { KtxQueryError } from '../../../src/errors.js';
+import {
+  DEFAULT_QUERY_TIMEOUT_MS,
+  queryDeadlineExceededError,
+  resolveQueryDeadlineMs,
+} from '../../../src/context/connections/query-deadline.js';
+
+describe('resolveQueryDeadlineMs', () => {
+  it('returns the 30s default when no override is set', () => {
+    expect(DEFAULT_QUERY_TIMEOUT_MS).toBe(30_000);
+    expect(resolveQueryDeadlineMs(undefined)).toBe(30_000);
+    expect(resolveQueryDeadlineMs({ driver: 'sqlite' })).toBe(30_000);
+  });
+
+  it('honors a positive-integer query_timeout_ms override', () => {
+    expect(resolveQueryDeadlineMs({ query_timeout_ms: 5_000 })).toBe(5_000);
+    expect(resolveQueryDeadlineMs({ query_timeout_ms: 1 })).toBe(1);
+  });
+
+  it('rejects a zero, negative, or non-integer override', () => {
+    expect(() => resolveQueryDeadlineMs({ query_timeout_ms: 0 })).toThrow(/positive integer/);
+    expect(() => resolveQueryDeadlineMs({ query_timeout_ms: -5 })).toThrow(/positive integer/);
+    expect(() => resolveQueryDeadlineMs({ query_timeout_ms: 1.5 })).toThrow(/positive integer/);
+    expect(() => resolveQueryDeadlineMs({ query_timeout_ms: '5000' as unknown as number })).toThrow(/positive integer/);
+  });
+});
+
+describe('queryDeadlineExceededError', () => {
+  it('is a KtxQueryError with the canonical seconds-rounded message', () => {
+    const error = queryDeadlineExceededError(30_000);
+    expect(error).toBeInstanceOf(KtxQueryError);
+    expect(error.message).toBe('query exceeded 30s');
+    expect(queryDeadlineExceededError(45_000).message).toBe('query exceeded 45s');
+  });
+});
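
The behavior asserted above (30s default, positive-integer override, seconds-based message) might be implemented roughly as follows. This is a sketch under assumptions, not the module under test: every identifier ending in `Sketch` is hypothetical, the exact validation message is invented (the tests only require it to mention "positive integer"), and `Math.round` is one plausible reading of "seconds-rounded".

```typescript
// Hypothetical stand-ins for the real module's exports.
const DEFAULT_QUERY_TIMEOUT_MS_SKETCH = 30_000;

class QueryErrorSketch extends Error {}

function resolveQueryDeadlineMsSketch(config?: Record<string, unknown>): number {
  const override = config?.['query_timeout_ms'];
  if (override === undefined) {
    return DEFAULT_QUERY_TIMEOUT_MS_SKETCH;
  }
  // Reject strings, fractions, zero, and negatives alike.
  if (typeof override !== 'number' || !Number.isInteger(override) || override <= 0) {
    throw new Error(`query_timeout_ms must be a positive integer (got ${JSON.stringify(override)})`);
  }
  return override;
}

function queryDeadlineExceededErrorSketch(deadlineMs: number): QueryErrorSketch {
  // Assumption: the deadline is rounded to whole seconds for display.
  return new QueryErrorSketch(`query exceeded ${Math.round(deadlineMs / 1000)}s`);
}

const defaultDeadline = resolveQueryDeadlineMsSketch({ driver: 'sqlite' });
const overriddenDeadline = resolveQueryDeadlineMsSketch({ query_timeout_ms: 5_000 });
const deadlineMessage = queryDeadlineExceededErrorSketch(45_000).message;
```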
--- a/packages/cli/test/context/ingest/adapters/historic-sql/query-history-filter-picker.test.ts
+++ b/packages/cli/test/context/ingest/adapters/historic-sql/query-history-filter-picker.test.ts
@@ -91,6 +91,7 @@ function llm(decisions: Array<{ role: string; exclude: boolean; reason: string }
    generateText: vi.fn(),
    generateObject,
    runAgentLoop: vi.fn(),
+    subprocessForkSpec: () => null,
  };
 }

--- a/packages/cli/test/context/ingest/adapters/live-database/daemon-introspection.test.ts
+++ b/packages/cli/test/context/ingest/adapters/live-database/daemon-introspection.test.ts
@@ -130,6 +130,39 @@ describe('createDaemonLiveDatabaseIntrospection', () => {
    });
  });

+  it('maps daemon warnings into the snapshot and drops codes Node cannot render', async () => {
+    const runJson = vi.fn(async () => ({
+      ...daemonResponse,
+      tables: [],
+      warnings: [
+        {
+          code: 'object_introspection_failed',
+          message: 'permission denied for relation locked',
+          table: 'locked',
+          recoverable: true,
+          metadata: { object: 'public.locked' },
+        },
+        { code: 'totally_unknown_code', message: 'ignored', recoverable: true },
+      ],
+    }));
+    const introspection = createDaemonLiveDatabaseIntrospection({
+      connections: { warehouse: { driver: 'postgres', url: 'postgres://localhost:5432/warehouse' } },
+      schemas: ['public'],
+      runJson,
+    });
+
+    const snapshot = await introspection.extractSchema('warehouse');
+    expect(snapshot.warnings).toEqual([
+      {
+        code: 'object_introspection_failed',
+        message: 'permission denied for relation locked',
+        table: 'locked',
+        recoverable: true,
+        metadata: { object: 'public.locked' },
+      },
+    ]);
+  });
+
  it('calls a running daemon HTTP endpoint when baseUrl is configured', async () => {
    const requests: Array<{ url: string | undefined; body: unknown }> = [];
    const server = createServer((request, response) => {
--- a/packages/cli/test/context/ingest/adapters/live-database/live-database.adapter.test.ts
+++ b/packages/cli/test/context/ingest/adapters/live-database/live-database.adapter.test.ts
@@ -1,9 +1,14 @@
-import { mkdtemp, readdir, rm } from 'node:fs/promises';
+import Database from 'better-sqlite3';
+import { mkdtemp, readdir, readFile, rm } from 'node:fs/promises';
 import { tmpdir } from 'node:os';
 import { join } from 'node:path';
-import { describe, expect, it, vi } from 'vitest';
+import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
 import { tableRefSet, type KtxTableRefKey } from '../../../../../src/context/scan/table-ref.js';
 import { LiveDatabaseSourceAdapter } from '../../../../../src/context/ingest/adapters/live-database/live-database.adapter.js';
+import { createSqliteLiveDatabaseIntrospection } from '../../../../../src/connectors/sqlite/live-database-introspection.js';
+import { resolveEnabledTables } from '../../../../../src/context/scan/enabled-tables.js';
+import { KtxExpectedError } from '../../../../../src/errors.js';
+import type { FetchContext } from '../../../../../src/context/ingest/types.js';

 describe('LiveDatabaseSourceAdapter', () => {
  it('fetches a schema snapshot through the introspection port', async () => {
@@ -109,3 +114,106 @@ describe('LiveDatabaseSourceAdapter', () => {
    }
  });
 });
+
+describe('LiveDatabaseSourceAdapter (sqlite) tolerant scan', () => {
+  const CONNECTION_ID = 'warehouse';
+  let tempDir: string;
+
+  beforeEach(async () => {
+    tempDir = await mkdtemp(join(tmpdir(), 'ktx-live-db-tolerant-'));
+  });
+
+  afterEach(async () => {
+    await rm(tempDir, { recursive: true, force: true });
+  });
+
+  function adapterFor(dbPath: string): LiveDatabaseSourceAdapter {
+    return new LiveDatabaseSourceAdapter({
+      introspection: createSqliteLiveDatabaseIntrospection({
+        projectDir: tempDir,
+        connections: { [CONNECTION_ID]: { driver: 'sqlite', path: dbPath } },
+      }),
+    });
+  }
+
+  function ctx(overrides: Partial<FetchContext> = {}): FetchContext {
+    return { connectionId: CONNECTION_ID, sourceKey: 'live-database', ...overrides };
+  }
+
+  it('ingests healthy objects and reports the broken view as a skip', async () => {
+    const dbPath = join(tempDir, 'partial.db');
+    const db = new Database(dbPath);
+    db.exec(`
+      CREATE TABLE base (id INTEGER PRIMARY KEY, start_date TEXT);
+      CREATE VIEW emp_hire_periods_with_name AS SELECT id, start_date FROM base;
+      CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
+      DROP TABLE base;
+    `);
+    db.close();
+
+    const adapter = adapterFor(dbPath);
+    const stagedDir = join(tempDir, 'staged-partial');
+    await adapter.fetch(undefined, stagedDir, ctx());
+
+    await expect(adapter.detect(stagedDir)).resolves.toBe(true);
+
+    const warnings = JSON.parse(await readFile(join(stagedDir, 'warnings.json'), 'utf8')) as {
+      warnings: Array<{ code: string; table?: string }>;
+    };
+    expect(warnings.warnings).toHaveLength(1);
+    expect(warnings.warnings[0]).toMatchObject({
+      code: 'object_introspection_failed',
+      table: 'emp_hire_periods_with_name',
+    });
+
+    const report = await adapter.readFetchReport(stagedDir);
+    expect(report?.skipped.map((issue) => issue.entityId)).toEqual(['emp_hire_periods_with_name']);
+  });
+
+  it('raises a clear connection error when every object fails introspection', async () => {
+    const dbPath = join(tempDir, 'all-broken.db');
+    const db = new Database(dbPath);
+    db.exec(`
+      CREATE TABLE base (id INTEGER PRIMARY KEY, value TEXT);
+      CREATE VIEW only_view AS SELECT id, value FROM base;
+      DROP TABLE base;
+    `);
+    db.close();
+
+    const adapter = adapterFor(dbPath);
+    await expect(adapter.fetch(undefined, join(tempDir, 'staged-all-broken'), ctx())).rejects.toThrow(KtxExpectedError);
+  });
+
+  it('treats a genuinely empty database as a recognized, empty success', async () => {
+    const dbPath = join(tempDir, 'empty.db');
+    new Database(dbPath).close();
+
+    const adapter = adapterFor(dbPath);
+    const stagedDir = join(tempDir, 'staged-empty');
+    await adapter.fetch(undefined, stagedDir, ctx());
+    await expect(adapter.detect(stagedDir)).resolves.toBe(true);
+    await expect(adapter.readFetchReport(stagedDir)).resolves.toBeNull();
+  });
+
+  it('ingests exactly the enabled_tables subset and fails clearly on a zero-match scope', async () => {
+    const dbPath = join(tempDir, 'scoped.db');
+    const db = new Database(dbPath);
+    db.exec(`
+      CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
+      CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
+    `);
+    db.close();
+    const adapter = adapterFor(dbPath);
+
+    const scope = resolveEnabledTables({ driver: 'sqlite', enabled_tables: ['main.customers'] }) ?? undefined;
+    const stagedDir = join(tempDir, 'staged-scoped');
+    await adapter.fetch(undefined, stagedDir, ctx({ tableScope: scope }));
+    const meta = JSON.parse(await readFile(join(stagedDir, 'connection.json'), 'utf8')) as { tableCount: number };
+    expect(meta.tableCount).toBe(1);
+
+    const typoScope = resolveEnabledTables({ driver: 'sqlite', enabled_tables: ['nope'] }) ?? undefined;
+    await expect(
+      adapter.fetch(undefined, join(tempDir, 'staged-zero'), ctx({ tableScope: typoScope })),
+    ).rejects.toThrow(/matched no objects.*Available objects: customers, orders/s);
+  });
+});
--- a/packages/cli/test/context/ingest/adapters/live-database/manifest.test.ts
+++ b/packages/cli/test/context/ingest/adapters/live-database/manifest.test.ts
@@ -14,7 +14,7 @@ describe('buildLiveDatabaseManifestShards', () => {
  it('builds shard objects with generated joins and preserved external descriptions', () => {
    const existingDescriptions = new Map<string, LiveDatabaseManifestExistingDescriptions>([
      [
-        'orders',
+        'public.orders',
        {
          table: { user: 'Pinned analyst description', db: 'Old db description' },
          columns: new Map([['id', { user: 'Pinned id description', db: 'Old id description' }]]),
@@ -189,7 +189,7 @@ describe('buildLiveDatabaseManifestShards', () => {
  it('preserves external usage keys while replacing historic SQL managed keys', () => {
    const existingUsage = new Map([
      [
-        'orders',
+        'public.orders',
        {
          narrative: 'Old generated usage narrative.',
          frequencyTier: 'low' as const,
--- a/packages/cli/test/context/ingest/adapters/live-database/scan-outcome.test.ts
+++ b/packages/cli/test/context/ingest/adapters/live-database/scan-outcome.test.ts
@@ -0,0 +1,65 @@
+import { describe, expect, it } from 'vitest';
+import { assertLiveDatabaseScanOutcome } from '../../../../../src/context/ingest/adapters/live-database/scan-outcome.js';
+import { tableRefSet } from '../../../../../src/context/scan/table-ref.js';
+import type { KtxSchemaSnapshot, KtxSchemaTable } from '../../../../../src/context/scan/types.js';
+
+function table(name: string): KtxSchemaTable {
+  return { catalog: null, db: null, name, kind: 'table', comment: null, estimatedRows: 0, columns: [], foreignKeys: [] };
+}
+
+function snapshot(overrides: Partial<KtxSchemaSnapshot>): KtxSchemaSnapshot {
+  return {
+    connectionId: 'warehouse',
+    driver: 'sqlite',
+    extractedAt: '2026-06-14T00:00:00.000Z',
+    scope: {},
+    metadata: {},
+    tables: [],
+    ...overrides,
+  };
+}
+
+describe('assertLiveDatabaseScanOutcome', () => {
+  it('passes when at least one object was ingested, even with skips', () => {
+    expect(() =>
+      assertLiveDatabaseScanOutcome({
+        connectionId: 'warehouse',
+        scope: undefined,
+        snapshot: snapshot({
+          tables: [table('customers')],
+          warnings: [{ code: 'object_introspection_failed', message: 'boom', table: 'broken', recoverable: true }],
+        }),
+      }),
+    ).not.toThrow();
+  });
+
+  it('passes for a legitimately empty database (no scope, no objects)', () => {
+    expect(() =>
+      assertLiveDatabaseScanOutcome({ connectionId: 'warehouse', scope: undefined, snapshot: snapshot({}) }),
+    ).not.toThrow();
+  });
+
+  it('fails clearly when every introspected object failed', () => {
+    expect(() =>
+      assertLiveDatabaseScanOutcome({
+        connectionId: 'warehouse',
+        scope: undefined,
+        snapshot: snapshot({
+          warnings: [
+            { code: 'object_introspection_failed', message: 'no such table: base', table: 'only_view', recoverable: true },
+          ],
+        }),
+      }),
+    ).toThrow(/all 1 introspected object failed.*only_view: no such table: base/s);
+  });
+
+  it('fails clearly when a non-empty enabled_tables scope matched nothing, naming available objects', () => {
+    expect(() =>
+      assertLiveDatabaseScanOutcome({
+        connectionId: 'warehouse',
+        scope: tableRefSet([{ catalog: null, db: null, name: 'typo_table' }]),
+        snapshot: snapshot({ metadata: { discovered_object_names: ['customers', 'orders'] } }),
+      }),
+    ).toThrow(/matched no objects.*typo_table.*Available objects: customers, orders/s);
+  });
+});
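
The four cases above amount to a small decision table: any ingested object passes, a non-empty scope that matched nothing fails, all-failed introspection fails, and a genuinely empty database passes. A simplified sketch of that logic follows. It is not the real `assertLiveDatabaseScanOutcome`: the input is flattened to plain string arrays instead of a snapshot and table-ref set, the order of the two failure checks is an assumption, and the message wording is invented beyond the fragments the tests match on.

```typescript
// Hypothetical, flattened input shape; the real function takes a schema snapshot.
interface ScanWarningSketch {
  code: string;
  message: string;
  table?: string;
}

interface ScanOutcomeInputSketch {
  connectionId: string;
  scopeNames?: string[]; // requested enabled_tables, if any
  tableNames: string[]; // objects that were ingested
  warnings: ScanWarningSketch[];
  discoveredObjectNames: string[];
}

function assertScanOutcomeSketch(input: ScanOutcomeInputSketch): void {
  // Partial success is still success: skips are reported elsewhere.
  if (input.tableNames.length > 0) return;

  const scope = input.scopeNames ?? [];
  if (scope.length > 0) {
    const available = input.discoveredObjectNames.join(', ') || '(none)';
    throw new Error(
      `Connection "${input.connectionId}": enabled_tables matched no objects ` +
        `(requested: ${scope.join(', ')}). Available objects: ${available}.`,
    );
  }

  const failures = input.warnings.filter((w) => w.code === 'object_introspection_failed');
  if (failures.length > 0) {
    const noun = failures.length === 1 ? 'object' : 'objects';
    const details = failures.map((w) => `${w.table ?? '(unknown)'}: ${w.message}`).join('; ');
    throw new Error(
      `Connection "${input.connectionId}": all ${failures.length} introspected ${noun} failed. ${details}`,
    );
  }
  // No scope, no objects, no failures: a genuinely empty database.
}

// Helper that returns the thrown message, or '' when the outcome is accepted.
function outcomeMessage(input: ScanOutcomeInputSketch): string {
  try {
    assertScanOutcomeSketch(input);
    return '';
  } catch (error) {
    return (error as Error).message;
  }
}

const emptyDb = outcomeMessage({
  connectionId: 'warehouse',
  tableNames: [],
  warnings: [],
  discoveredObjectNames: [],
});
const allBroken = outcomeMessage({
  connectionId: 'warehouse',
  tableNames: [],
  warnings: [{ code: 'object_introspection_failed', message: 'no such table: base', table: 'only_view' }],
  discoveredObjectNames: [],
});
const zeroMatch = outcomeMessage({
  connectionId: 'warehouse',
  scopeNames: ['typo_table'],
  tableNames: [],
  warnings: [],
  discoveredObjectNames: ['customers', 'orders'],
});
```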
--- a/packages/cli/test/context/ingest/local-bundle-runtime.test.ts
+++ b/packages/cli/test/context/ingest/local-bundle-runtime.test.ts
@@ -91,6 +91,7 @@ describe('createLocalBundleIngestRuntime', () => {
      generateText: vi.fn(),
      generateObject: vi.fn(),
      runAgentLoop: vi.fn(async () => ({ stopReason: 'natural' as const })),
+      subprocessForkSpec: vi.fn(() => null),
    };
    project.config.llm = {
      provider: { backend: 'claude-code' },
--- a/packages/cli/test/context/llm/local-config.test.ts
+++ b/packages/cli/test/context/llm/local-config.test.ts
@@ -137,16 +137,19 @@ describe('local ktx LLM config', () => {
      generateText: vi.fn(),
      generateObject: vi.fn(),
      runAgentLoop: vi.fn(),
+      subprocessForkSpec: vi.fn(() => null),
    }));
    const createCodexRuntime = vi.fn(() => ({
      generateText: vi.fn(),
      generateObject: vi.fn(),
      runAgentLoop: vi.fn(),
+      subprocessForkSpec: vi.fn(() => null),
    }));
    const createAiSdkRuntime = vi.fn(() => ({
      generateText: vi.fn(),
      generateObject: vi.fn(),
      runAgentLoop: vi.fn(),
+      subprocessForkSpec: vi.fn(() => null),
    }));
    const createKtxLlmProvider = vi.fn(() => ({
      getModel: vi.fn(),
--- a/packages/cli/test/context/llm/subprocess-generate-object.test.ts
+++ b/packages/cli/test/context/llm/subprocess-generate-object.test.ts
@ -0,0 +1,138 @@
+import { type ChildProcess } from 'node:child_process';
+import { mkdtempSync, readFileSync, rmSync } from 'node:fs';
+import { tmpdir } from 'node:os';
+import { join } from 'node:path';
+import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
+import { z } from 'zod';
+import { isAbortError } from '../../../src/context/core/abort.js';
+import {
+  KtxSubprocessDeadlineError,
+  runGenerateObjectInSubprocess,
+} from '../../../src/context/llm/subprocess-generate-object.js';
+import type { SubprocessRuntimeForkSpec } from '../../../src/context/llm/runtime-port.js';
+import { HANGING_CHILD, killTestChildren, RESPONDING_CHILD, spawnTestChild } from './subprocess-test-children.test-utils.js';
+
+const FORK_SPEC: SubprocessRuntimeForkSpec = { backend: 'codex', projectDir: '/tmp', modelSlots: { default: 'codex' } };
+
+function isAlive(pid: number): boolean {
+  try {
+    process.kill(pid, 0);
+    return true;
+  } catch {
+    return false;
+  }
+}
+
+describe('runGenerateObjectInSubprocess', () => {
+  let children: ChildProcess[];
+  let workDir: string;
+
+  function forkFake(code: string, env: Record<string, string> = {}): () => ChildProcess {
+    return () => spawnTestChild(children, code, env);
+  }
+
+  beforeEach(() => {
+    children = [];
+    workDir = mkdtempSync(join(tmpdir(), 'ktx-subproc-'));
+  });
+
+  afterEach(() => {
+    killTestChildren(children);
+    rmSync(workDir, { recursive: true, force: true });
+  });
+
+  it('tree-kills a wedged child at the deadline and reaps its grandchild', async () => {
+    const pidFile = join(workDir, 'gc.pid');
+    const start = Date.now();
+    const pending = runGenerateObjectInSubprocess({
+      forkSpec: FORK_SPEC,
+      role: 'candidateExtraction',
+      prompt: 'x',
+      schema: z.object({ answer: z.string() }),
+      jsonSchema: { type: 'object' },
+      deadlineMs: 300,
+      spawnChild: forkFake(HANGING_CHILD, { KTX_TEST_GC_PID_FILE: pidFile }),
+    });
+
+    await expect(pending).rejects.toBeInstanceOf(KtxSubprocessDeadlineError);
+    // Settled within the deadline plus a small grace, not left wedged.
+    expect(Date.now() - start).toBeLessThan(3000);
+
+    const child = children[0]!;
+    await vi.waitFor(() => expect(child.exitCode !== null || child.signalCode !== null).toBe(true), { timeout: 5000 });
+    expect(child.signalCode).toBe('SIGKILL');
+
+    const grandchildPid = Number(readFileSync(pidFile, 'utf8'));
+    expect(Number.isInteger(grandchildPid)).toBe(true);
+    await vi.waitFor(() => expect(isAlive(grandchildPid)).toBe(false), { timeout: 5000 });
+  });
+
+  it('tree-kills the same way on an external abort', async () => {
+    const pidFile = join(workDir, 'gc.pid');
+    const controller = new AbortController();
+    const pending = runGenerateObjectInSubprocess({
+      forkSpec: FORK_SPEC,
+      role: 'candidateExtraction',
+      prompt: 'x',
+      schema: z.object({ answer: z.string() }),
+      jsonSchema: { type: 'object' },
+      deadlineMs: 60_000,
+      signal: controller.signal,
+      spawnChild: forkFake(HANGING_CHILD, { KTX_TEST_GC_PID_FILE: pidFile }),
+    });
+    void pending.catch(() => undefined);
+
+    await vi.waitFor(() => expect(() => readFileSync(pidFile, 'utf8')).not.toThrow(), { timeout: 5000 });
+    controller.abort();
+
+    await expect(pending).rejects.toSatisfy(isAbortError);
+    const child = children[0]!;
+    await vi.waitFor(() => expect(child.exitCode !== null || child.signalCode !== null).toBe(true), { timeout: 5000 });
+    const grandchildPid = Number(readFileSync(pidFile, 'utf8'));
+    await vi.waitFor(() => expect(isAlive(grandchildPid)).toBe(false), { timeout: 5000 });
+  });
+
+  it('resolves with the schema-validated output on success', async () => {
+    await expect(
+      runGenerateObjectInSubprocess({
+        forkSpec: FORK_SPEC,
+        role: 'candidateExtraction',
+        prompt: 'x',
+        schema: z.object({ answer: z.string() }),
+        jsonSchema: { type: 'object' },
+        deadlineMs: 5_000,
+        spawnChild: forkFake(RESPONDING_CHILD),
+      }),
+    ).resolves.toEqual({ answer: 'yes' });
+  });
+
+  it('rejects when the child output fails schema validation', async () => {
+    await expect(
+      runGenerateObjectInSubprocess({
+        forkSpec: FORK_SPEC,
+        role: 'candidateExtraction',
+        prompt: 'x',
+        schema: z.object({ answer: z.string() }),
+        jsonSchema: { type: 'object' },
+        deadlineMs: 5_000,
+        spawnChild: forkFake(RESPONDING_CHILD, { KTX_TEST_RESPONSE: '{"ok":true,"output":{"wrong":1}}' }),
+      }),
+    ).rejects.toThrow();
+  });
+
+  it('rejects with the child error message when the child reports failure', async () => {
+    await expect(
+      runGenerateObjectInSubprocess({
+        forkSpec: FORK_SPEC,
+        role: 'candidateExtraction',
+        prompt: 'x',
+        schema: z.object({ answer: z.string() }),
+        jsonSchema: { type: 'object' },
+        deadlineMs: 5_000,
+        spawnChild: forkFake(RESPONDING_CHILD, {
+          KTX_TEST_RESPONSE: '{"ok":false,"message":"backend overloaded"}',
+        }),
+      }),
+    ).rejects.toThrow('backend overloaded');
+  });
+});
--- a/packages/cli/test/context/llm/subprocess-test-children.test-utils.ts
+++ b/packages/cli/test/context/llm/subprocess-test-children.test-utils.ts
@ -0,0 +1,45 @@
+import { spawn, type ChildProcess } from 'node:child_process';
+
+// A wedged subprocess-backed call: the child ignores SIGTERM (as a child hung on a
+// provider socket does), spawns a grandchild (the SDK's model binary stand-in) that
+// also ignores SIGTERM, and never replies. Only a SIGKILL of the whole process group
+// reaps it.
+export const HANGING_CHILD = `
+process.on('SIGTERM', () => {});
+const { spawn } = require('node:child_process');
+const { writeFileSync } = require('node:fs');
+process.on('message', () => {
+  const gc = spawn(process.execPath, ['-e', 'process.on("SIGTERM",()=>{});setInterval(()=>{},1000000)'], { stdio: 'ignore' });
+  gc.unref();
+  if (process.env.KTX_TEST_GC_PID_FILE) writeFileSync(process.env.KTX_TEST_GC_PID_FILE, String(gc.pid));
+});
+`;
+
+export const RESPONDING_CHILD = `
+process.on('message', () => {
+  const raw = process.env.KTX_TEST_RESPONSE || '{"ok":true,"output":{"answer":"yes"}}';
+  process.send(JSON.parse(raw), () => process.exit(0));
+});
+`;
+
+export function spawnTestChild(registry: ChildProcess[], code: string, env: Record<string, string> = {}): ChildProcess {
+  const child = spawn(process.execPath, ['-e', code], {
+    detached: true,
+    stdio: ['ignore', 'ignore', 'inherit', 'ipc'],
+    env: { ...process.env, ...env },
+  });
+  registry.push(child);
+  return child;
+}
+
+export function killTestChildren(registry: ChildProcess[]): void {
+  for (const child of registry) {
+    if (child.pid !== undefined && child.exitCode === null && child.signalCode === null) {
+      try {
+        process.kill(-child.pid, 'SIGKILL');
+      } catch {
+        // Already exited.
+      }
+    }
+  }
+}
--- a/packages/cli/test/context/mcp/snapshots/mcp-tools-list.json
+++ b/packages/cli/test/context/mcp/snapshots/mcp-tools-list.json
@@ -63,7 +63,7 @@
  {
    "name": "wiki_search",
    "title": "Wiki Search",
-    "description": "Search ktx wiki pages for reusable business context. Example: wiki_search({ query: \"revenue recognition\", limit: 5 }).",
+    "description": "Search ktx wiki pages for reusable business context. Pass connectionId to scope results to one warehouse (unscoped pages plus pages tagged with that connection) when a concept name collides across databases. Example: wiki_search({ query: \"revenue recognition\", connectionId: \"warehouse\", limit: 5 }).",
    "inputSchema": {
      "type": "object",
      "properties": {
@@ -78,6 +78,11 @@
          "type": "integer",
          "minimum": 1,
          "maximum": 50
+        },
+        "connectionId": {
+          "description": "Scope results to one connection: returns unscoped pages plus pages tagged with this connection. Omit to search all pages.",
+          "type": "string",
+          "minLength": 1
        }
      },
      "required": [
@@ -1478,6 +1483,55 @@
      "taskSupport": "forbidden"
    }
  },
+  {
+    "name": "sql_dialect_notes",
+    "title": "SQL Dialect Notes",
+    "description": "Return the SQL syntax conventions for the dialect of a ktx connection: fully-qualified table-name form, identifier quoting and case-folding, date/time functions, top-N / window-filtering idiom, and JSON access. Call this before writing raw sql_execution SQL against a connection so the SQL matches that engine. Example: sql_dialect_notes({ connectionId: \"warehouse\" }).",
+    "inputSchema": {
+      "type": "object",
+      "properties": {
+        "connectionId": {
+          "type": "string",
+          "minLength": 1,
+          "description": "Connection id whose engine dialect conventions to return."
+        }
+      },
+      "required": [
+        "connectionId"
+      ],
+      "$schema": "http://json-schema.org/draft-07/schema#"
+    },
+    "outputSchema": {
+      "type": "object",
+      "properties": {
+        "connectionId": {
+          "type": "string"
+        },
+        "dialect": {
+          "type": "string"
+        },
+        "notes": {
+          "type": "string"
+        }
+      },
+      "required": [
+        "connectionId",
+        "dialect",
+        "notes"
+      ],
+      "$schema": "http://json-schema.org/draft-07/schema#",
+      "additionalProperties": false
+    },
+    "annotations": {
+      "title": "SQL Dialect Notes",
+      "readOnlyHint": true,
+      "idempotentHint": true,
+      "openWorldHint": false
+    },
+    "execution": {
+      "taskSupport": "forbidden"
+    }
+  },
  {
    "name": "memory_ingest",
    "title": "Memory Ingest",
--- a/packages/cli/test/context/mcp/dialect-notes.test.ts
+++ b/packages/cli/test/context/mcp/dialect-notes.test.ts
@@ -0,0 +1,111 @@
+import { readdirSync } from 'node:fs';
+import { fileURLToPath } from 'node:url';
+import { describe, expect, it } from 'vitest';
+import { KtxExpectedError } from '../../../src/errors.js';
+import { KTX_DATABASE_DRIVER_IDS } from '../../../src/connection-drivers.js';
+import type { KtxProjectConnectionConfig } from '../../../src/context/project/config.js';
+import { sqlAnalysisDialectForDriver } from '../../../src/context/sql-analysis/dialect.js';
+import { DIALECTS_WITH_NOTES, sqlDialectNotes } from '../../../src/context/sql-analysis/dialect-notes.js';
+import { resolveDialectNotesForConnection } from '../../../src/context/mcp/local-project-ports.js';
+
+function conn(driver: string): KtxProjectConnectionConfig {
+  return { driver } as KtxProjectConnectionConfig;
+}
+
+describe('per-dialect SQL notes', () => {
+  it('covers every dialect reachable from a configured warehouse driver', () => {
+    // Derived from the connector registry, not a hand-maintained list: a new
+    // warehouse driver whose resolved dialect lacks authored notes fails here.
+    for (const driver of KTX_DATABASE_DRIVER_IDS) {
+      const dialect = sqlAnalysisDialectForDriver(driver);
+      expect(DIALECTS_WITH_NOTES, `driver "${driver}" resolves to dialect "${dialect}"`).toContain(dialect);
+      expect(sqlDialectNotes(dialect).length).toBeGreaterThan(0);
+    }
+  });
+
+  it('keeps the authored-dialect list and the ./dialects markdown files in sync', () => {
+    const dir = fileURLToPath(new URL('../../../src/context/sql-analysis/dialects/', import.meta.url));
+    const files = readdirSync(dir)
+      .filter((name) => name.endsWith('.md'))
+      .map((name) => name.replace(/\.md$/, ''))
+      .sort();
+    expect(files).toEqual([...DIALECTS_WITH_NOTES].sort());
+  });
+
+  it('does not author notes for unreachable dialects', () => {
+    // duckdb/databricks appear in the resolver map but no connector produces them.
+    expect(DIALECTS_WITH_NOTES).not.toContain('duckdb');
+    expect(DIALECTS_WITH_NOTES).not.toContain('databricks');
+  });
+
+  it('answers the full rubric for every dialect', () => {
+    for (const dialect of DIALECTS_WITH_NOTES) {
+      const notes = sqlDialectNotes(dialect);
+      expect(notes, `${dialect}: FQTN`).toContain('**FQTN:**');
+      expect(notes, `${dialect}: identifiers`).toContain('**Identifiers:**');
+      expect(notes, `${dialect}: date/time`).toContain('**Date/time:**');
+      expect(notes, `${dialect}: top-N`).toMatch(/\*\*Top-N/);
+      expect(notes, `${dialect}: series`).toMatch(/\*\*Series/);
+      expect(notes, `${dialect}: rolling window`).toMatch(/\*\*Rolling/);
+      expect(notes, `${dialect}: safe cast`).toMatch(/\*\*Safe cast/);
+      expect(notes, `${dialect}: semi-structured`).toMatch(/\*\*(JSON|Semi-structured)/);
+    }
+  });
+
+  it('gives each engine its own idioms and never leaks another engine-only construct', () => {
+    // A sqlite analyst gets sqlite date idioms and never Snowflake/BigQuery-only syntax.
+    expect(sqlDialectNotes('sqlite')).toMatch(/strftime|julianday/);
+    expect(sqlDialectNotes('sqlite')).not.toContain('VARIANT');
+    expect(sqlDialectNotes('sqlite')).not.toContain('_TABLE_SUFFIX');
+
+    // QUALIFY appears only for the engines that actually support it.
+    expect(sqlDialectNotes('snowflake')).toContain('QUALIFY');
+    expect(sqlDialectNotes('bigquery')).toContain('QUALIFY');
+    for (const dialect of ['postgres', 'mysql', 'sqlite', 'clickhouse', 'tsql'] as const) {
+      expect(sqlDialectNotes(dialect), `${dialect} must not mention QUALIFY`).not.toContain('QUALIFY');
+    }
+
+    // Engine-exclusive markers stay in their own dialect.
+    expect(sqlDialectNotes('snowflake')).toContain('VARIANT');
+    expect(sqlDialectNotes('snowflake')).toContain('DATABASE.SCHEMA.TABLE');
+    expect(sqlDialectNotes('bigquery')).toContain('_TABLE_SUFFIX');
+    expect(sqlDialectNotes('clickhouse')).toContain('LIMIT n BY');
+    expect(sqlDialectNotes('tsql')).toContain('TOP (n)');
+  });
+
+  it('contains no benchmark/grader or version-dated content', () => {
+    for (const dialect of DIALECTS_WITH_NOTES) {
+      const notes = sqlDialectNotes(dialect);
+      expect(notes).not.toMatch(/\bspider\b|\bbenchmark\b|\bgold\b|\bgrader\b/i);
+      expect(notes).not.toMatch(/\bas of v(ersion)?\b/i);
+    }
+  });
+
+  it('falls back to postgres notes for a dialect without its own file', () => {
+    expect(sqlAnalysisDialectForDriver('some-future-engine')).toBe('postgres');
+    // redshift is a valid SqlAnalysisDialect but intentionally unauthored.
+    expect(sqlDialectNotes('redshift')).toBe(sqlDialectNotes('postgres'));
+  });
+});
+
+describe('resolveDialectNotesForConnection', () => {
+  it('resolves a warehouse connection to its dialect notes', () => {
+    expect(resolveDialectNotesForConnection('wh', conn('sqlite'))).toMatchObject({
+      connectionId: 'wh',
+      dialect: 'sqlite',
+    });
+    expect(resolveDialectNotesForConnection('wh', conn('snowflake')).dialect).toBe('snowflake');
+    // The sqlserver driver resolves to the tsql dialect (resolver codomain).
+    expect(resolveDialectNotesForConnection('wh', conn('sqlserver')).dialect).toBe('tsql');
+  });
+
+  it('rejects a non-SQL context source with a clear expected error, not postgres notes', () => {
+    expect(() => resolveDialectNotesForConnection('mb', conn('metabase'))).toThrow(KtxExpectedError);
+    expect(() => resolveDialectNotesForConnection('mb', conn('metabase'))).toThrow(/not a SQL warehouse/);
+  });
+
+  it('rejects an unconfigured connection', () => {
+    expect(() => resolveDialectNotesForConnection('missing', undefined)).toThrow(KtxExpectedError);
+    expect(() => resolveDialectNotesForConnection('missing', undefined)).toThrow(/not configured/);
+  });
+});
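
Reviewer sketch: the tests above pin down three behaviours, namely a postgres fallback for unknown drivers and unauthored dialects, a hard error for non-SQL sources, and a hard error for unconfigured connections. A minimal illustration follows. Every name, the driver table, and the error wording are assumptions made for this sketch, not the actual module.

```typescript
// Hypothetical sketch: names, the driver table, and messages are illustrative assumptions.
type Dialect = 'postgres' | 'mysql' | 'sqlite' | 'snowflake' | 'bigquery' | 'clickhouse' | 'tsql' | 'redshift';

// Driver name -> dialect. Unlisted drivers fall back to postgres.
const DRIVER_DIALECTS = new Map<string, Dialect>([
  ['postgres', 'postgres'],
  ['mysql', 'mysql'],
  ['sqlite', 'sqlite'],
  ['snowflake', 'snowflake'],
  ['bigquery', 'bigquery'],
  ['clickhouse', 'clickhouse'],
  ['sqlserver', 'tsql'],
  ['redshift', 'redshift'],
]);

// Dialects assumed to ship their own notes; redshift is valid but unauthored.
const AUTHORED = new Set<Dialect>(['postgres', 'mysql', 'sqlite', 'snowflake', 'bigquery', 'clickhouse', 'tsql']);

// Drivers assumed to be context sources rather than SQL warehouses.
const NON_SQL_DRIVERS = new Set<string>(['metabase']);

function dialectForDriver(driver: string): Dialect {
  return DRIVER_DIALECTS.get(driver) ?? 'postgres';
}

// Which dialect's notes are served: unauthored dialects reuse the postgres notes.
function notesDialectFor(dialect: Dialect): Dialect {
  return AUTHORED.has(dialect) ? dialect : 'postgres';
}

function resolveDialect(
  connectionId: string,
  connection: { driver: string } | undefined,
): { connectionId: string; dialect: Dialect } {
  if (connection === undefined) {
    throw new Error(`Connection "${connectionId}" is not configured.`);
  }
  if (NON_SQL_DRIVERS.has(connection.driver)) {
    // Fail loudly instead of silently handing back postgres notes.
    throw new Error(`Connection "${connectionId}" is not a SQL warehouse.`);
  }
  return { connectionId, dialect: dialectForDriver(connection.driver) };
}
```

Rejecting non-SQL sources before the fallback runs is what keeps a Metabase connection from quietly receiving postgres guidance.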
--- a/packages/cli/test/context/mcp/local-project-ports.test.ts
+++ b/packages/cli/test/context/mcp/local-project-ports.test.ts
@ -178,6 +178,7 @@ describe('createLocalProjectMcpContextPorts', () => {

    expect(Object.keys(ports).sort()).toEqual([
      'connections',
+      'dialectNotes',
      'dictionarySearch',
      'discover',
      'entityDetails',
@ -187,6 +188,7 @@ describe('createLocalProjectMcpContextPorts', () => {
    expect(Object.keys(ports.connections ?? {}).sort()).toEqual(['list']);
    expect(Object.keys(ports.knowledge ?? {}).sort()).toEqual(['read', 'search']);
    expect(Object.keys(ports.semanticLayer ?? {}).sort()).toEqual(['query', 'readSource']);
+    expect(Object.keys(ports.dialectNotes ?? {}).sort()).toEqual(['read']);
    await expect(ports.connections?.list()).resolves.toEqual([
      { id: 'warehouse', name: 'warehouse', connectionType: 'POSTGRESQL' },
    ]);
@ -803,6 +805,47 @@ describe('createLocalProjectMcpContextPorts', () => {
    expect(search?.results[0]?.score).toBeGreaterThan(0);
  });

+  it('scopes wiki_search to a connection and validates the connection id', async () => {
+    const project = await initKtxProject({ projectDir: tempDir });
+    project.config.connections.sales_db = { driver: 'sqlite', url: 'file:sales.db' };
+    project.config.connections.events_db = { driver: 'sqlite', url: 'file:events.db' };
+    const seed = async (key: string, connections: string[]) => {
+      await project.fileStore.writeFile(
+        `wiki/global/${key}.md`,
+        [
+          '---',
+          `summary: Orders for ${key}`,
+          'usage_mode: auto',
+          ...(connections.length > 0 ? ['connections:', ...connections.map((id) => `  - ${id}`)] : []),
+          '---',
+          '',
+          'Orders are recognized when paid.',
+          '',
+        ].join('\n'),
+        'ktx',
+        'ktx@example.com',
+        `seed ${key}`,
+      );
+    };
+    await seed('orders-sales', ['sales_db']);
+    await seed('orders-events', ['events_db']);
+    await seed('orders-global', []);
+
+    const ports = createLocalProjectMcpContextPorts(project, { embeddingService: null });
+
+    const scoped = await ports.knowledge?.search({
+      userId: 'local-user',
+      query: 'orders paid',
+      limit: 10,
+      connectionId: 'sales_db',
+    });
+    expect(scoped?.results.map((result) => result.key).sort()).toEqual(['orders-global', 'orders-sales']);
+
+    await expect(
+      ports.knowledge?.search({ userId: 'local-user', query: 'orders', limit: 10, connectionId: 'warehouse' }),
+    ).rejects.toThrow('Unknown connection "warehouse". Configured connections: events_db, sales_db.');
+  });
+
  it('reads seeded semantic-layer sources', async () => {
    const project = await initKtxProject({ projectDir: tempDir });
    await seedSlSourceFile(project, {
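
Reviewer sketch: the scoped-search test above expects "unscoped pages plus pages that list the requested connection", with the connection id validated first. The filter below illustrates that rule under stated assumptions. `WikiPageSketch`, `scopePagesSketch`, and the choice to return every page when no connection is given are hypothetical, and only the error text is copied from the test.

```typescript
// Hypothetical sketch of connection scoping; not the actual loader.
interface WikiPageSketch {
  key: string;
  connections?: string[]; // absent or empty means the page is unscoped
}

function assertConfiguredSketch(connectionId: string, configured: readonly string[]): void {
  if (!configured.includes(connectionId)) {
    const known = [...configured].sort().join(', ');
    throw new Error(`Unknown connection "${connectionId}". Configured connections: ${known}.`);
  }
}

function scopePagesSketch(
  pages: readonly WikiPageSketch[],
  connectionId: string | undefined,
  configured: readonly string[],
): WikiPageSketch[] {
  // Assumption: with no connection requested, nothing is filtered out.
  if (connectionId === undefined) return [...pages];
  assertConfiguredSketch(connectionId, configured);
  return pages.filter((page) => {
    const scope = page.connections ?? [];
    return scope.length === 0 || scope.includes(connectionId);
  });
}
```

Filtering the candidate pool before any ranking lane runs, rather than post-filtering results, means a `limit` cannot be consumed by pages that would be discarded afterwards.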
--- a/packages/cli/test/context/mcp/logger.test.ts
+++ b/packages/cli/test/context/mcp/logger.test.ts
@ -0,0 +1,99 @@
+import { afterEach, describe, expect, it, vi } from 'vitest';
+import { createMcpLogger, mcpLogLevel, mcpSlowToolMs, serializeMcpError } from '../../../src/context/mcp/logger.js';
+
+function capturingIo() {
+  let buf = '';
+  return {
+    io: { stdout: { write() {} }, stderr: { write(chunk: string) { buf += chunk; } } },
+    text: () => buf,
+    json: () =>
+      buf
+        .split('\n')
+        .filter((line) => line.trim().startsWith('{'))
+        .map((line) => JSON.parse(line) as Record<string, unknown>),
+  };
+}
+
+describe('mcpLogLevel', () => {
+  it('defaults to info when unset', () => {
+    expect(mcpLogLevel({})).toBe('info');
+  });
+
+  it('accepts a recognized pino level', () => {
+    expect(mcpLogLevel({ KTX_MCP_LOG_LEVEL: 'debug' })).toBe('debug');
+    expect(mcpLogLevel({ KTX_MCP_LOG_LEVEL: 'WARN' })).toBe('warn');
+  });
+
+  it('falls back to info for an unrecognized value', () => {
+    expect(mcpLogLevel({ KTX_MCP_LOG_LEVEL: 'loud' })).toBe('info');
+  });
+});
+
+describe('mcpSlowToolMs', () => {
+  it('defaults to 10000ms', () => {
+    expect(mcpSlowToolMs({})).toBe(10_000);
+  });
+
+  it('parses a numeric override', () => {
+    expect(mcpSlowToolMs({ KTX_MCP_SLOW_TOOL_MS: '250' })).toBe(250);
+  });
+
+  it('ignores a non-numeric or negative value', () => {
+    expect(mcpSlowToolMs({ KTX_MCP_SLOW_TOOL_MS: 'soon' })).toBe(10_000);
+    expect(mcpSlowToolMs({ KTX_MCP_SLOW_TOOL_MS: '-5' })).toBe(10_000);
+  });
+});
+
+describe('serializeMcpError', () => {
+  it('serializes an Error with type, message, and stack', () => {
+    const out = serializeMcpError(new TypeError('boom'));
+    expect(out.type).toBe('TypeError');
+    expect(out.message).toBe('boom');
+    expect(typeof out.stack).toBe('string');
+  });
+
+  it('reduces a non-error to a message (no synthetic stack)', () => {
+    expect(serializeMcpError('plain text')).toEqual({ message: 'plain text' });
+  });
+});
+
+describe('createMcpLogger', () => {
+  afterEach(() => {
+    vi.unstubAllEnvs();
+  });
+
+  it('writes structured JSON lines through io.stderr when not a TTY', () => {
+    const cap = capturingIo();
+    const logger = createMcpLogger(cap.io, { isTTY: false });
+    logger.info({ tool: 'sql_execution', callId: 'abc' }, 'tool.start');
+
+    const [line] = cap.json();
+    expect(line.msg).toBe('tool.start');
+    expect(line.tool).toBe('sql_execution');
+    expect(line.callId).toBe('abc');
+    expect(typeof line.time).toBe('number');
+    expect(line.level).toBe(30);
+  });
+
+  it('writes human-readable (non-JSON) output for a TTY', () => {
+    const cap = capturingIo();
+    const logger = createMcpLogger(cap.io, { isTTY: true });
+    logger.info({ tool: 'sql_execution' }, 'tool.start');
+
+    expect(cap.text()).toContain('tool.start');
+    // pino-pretty output is not a JSON line.
+    expect(cap.text().trim().startsWith('{')).toBe(false);
+  });
+
+  it('honors KTX_MCP_LOG_LEVEL by suppressing below-threshold lines', () => {
+    vi.stubEnv('KTX_MCP_LOG_LEVEL', 'warn');
+    const cap = capturingIo();
+    const logger = createMcpLogger(cap.io, { isTTY: false });
+    logger.info({}, 'routine');
+    logger.warn({}, 'slow');
+
+    const messages = cap.json().map((line) => line.msg);
+    expect(messages).not.toContain('routine');
+    expect(messages).toContain('slow');
+  });
+});
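
Reviewer sketch: the `mcpLogLevel` and `mcpSlowToolMs` tests above describe tolerant environment parsing, where a bad value falls back to the default instead of failing. The functions below show one way to get that behaviour. The function names and the exact set of accepted level names are assumptions; the defaults (`info`, 10000 ms) and the env variable names come from the tests.

```typescript
// Hypothetical sketch of tolerant env parsing; not the actual logger module.
type EnvSketch = Record<string, string | undefined>;

// Assumption: the six standard pino level names are the accepted set.
const LEVEL_NAMES = ['trace', 'debug', 'info', 'warn', 'error', 'fatal'] as const;
type LevelName = (typeof LEVEL_NAMES)[number];

const DEFAULT_SLOW_TOOL_MS = 10_000;

function logLevelFromEnv(env: EnvSketch): LevelName {
  const raw = env['KTX_MCP_LOG_LEVEL']?.trim().toLowerCase();
  // Unknown or unset values fall back to info.
  return LEVEL_NAMES.find((level) => level === raw) ?? 'info';
}

function slowToolMsFromEnv(env: EnvSketch): number {
  const raw = env['KTX_MCP_SLOW_TOOL_MS'];
  if (raw === undefined || raw.trim() === '') return DEFAULT_SLOW_TOOL_MS;
  const parsed = Number(raw);
  // Zero is kept as a valid threshold; negatives and non-numbers are ignored.
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : DEFAULT_SLOW_TOOL_MS;
}
```

Accepting `0` matters for the slow-call test in `server.test.ts`, which stubs the threshold to `0` so that a 5 ms call is reported at warn.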
--- a/packages/cli/test/context/mcp/server.test.ts
+++ b/packages/cli/test/context/mcp/server.test.ts
@ -4,14 +4,17 @@ import { join } from 'node:path';
 import { Client } from '@modelcontextprotocol/sdk/client/index.js';
 import { InMemoryTransport } from '@modelcontextprotocol/sdk/inMemory.js';
 import { afterEach, describe, expect, it, vi } from 'vitest';
+import { KtxQueryError } from '../../../src/errors.js';
 import { createLocalProjectMemoryIngest } from '../../../src/context/memory/local-memory.js';
 import { detectCaptureSignals } from '../../../src/context/memory/capture-signals.js';
 import type { MemoryAgentInput } from '../../../src/context/memory/types.js';
 import { parseKtxProjectConfig, serializeKtxProjectConfig } from '../../../src/context/project/config.js';
 import { initKtxProject } from '../../../src/context/project/project.js';
 import { jsonToolResult } from '../../../src/context/mcp/context-tools.js';
+import { createMcpLogger } from '../../../src/context/mcp/logger.js';
 import { createDefaultKtxMcpServer, createKtxMcpServer } from '../../../src/context/mcp/server.js';
 import type {
+  KtxDialectNotesMcpPort,
  KtxDiscoverDataMcpPort,
  KtxDictionarySearchMcpPort,
  KtxEntityDetailsMcpPort,
@ -84,6 +87,7 @@ const retainedToolNames = [
  'memory_ingest_status',
  'sl_query',
  'sl_read_source',
+  'sql_dialect_notes',
  'sql_execution',
  'wiki_read',
  'wiki_search',
@ -136,6 +140,13 @@ function makeAllContextTools(): KtxMcpContextPorts {
        rowCount: 1,
      }),
    },
+    dialectNotes: {
+      read: vi.fn<KtxDialectNotesMcpPort['read']>().mockResolvedValue({
+        connectionId: 'warehouse',
+        dialect: 'postgres',
+        notes: '**postgres** SQL conventions',
+      }),
+    },
    memoryIngest: {
      ingest: vi.fn<MemoryIngestPort['ingest']>().mockResolvedValue({ runId: 'run-1' }),
      status: vi.fn<MemoryIngestPort['status']>().mockResolvedValue({
@ -203,6 +214,12 @@ describe('createKtxMcpServer', () => {
      },
      sl_query: { title: 'Semantic Layer Query', readOnlyHint: true, openWorldHint: false },
      sql_execution: { title: 'SQL Execution', readOnlyHint: true, openWorldHint: false },
+      sql_dialect_notes: {
+        title: 'SQL Dialect Notes',
+        readOnlyHint: true,
+        idempotentHint: true,
+        openWorldHint: false,
+      },
      memory_ingest: { title: 'Memory Ingest', destructiveHint: true, openWorldHint: false },
      memory_ingest_status: { title: 'Memory Ingest Status', readOnlyHint: true, openWorldHint: false },
    };
@ -219,6 +236,22 @@ describe('createKtxMcpServer', () => {
    }
  });

+  it('routes sql_dialect_notes through the dialect-notes port', async () => {
+    const fake = makeFakeServer();
+    const contextTools = makeAllContextTools();
+    createKtxMcpServer({
+      server: fake.server,
+      userContext: { userId: 'mcp-user' },
+      contextTools,
+    });
+
+    const result = await getTool(fake.tools, 'sql_dialect_notes').handler({ connectionId: 'warehouse' });
+    expect(contextTools.dialectNotes!.read).toHaveBeenCalledWith({ connectionId: 'warehouse' });
+    expect(result).toMatchObject({
+      structuredContent: { connectionId: 'warehouse', dialect: 'postgres' },
+    });
+  });
+
  it('exposes annotations and output schemas through the SDK tools/list response', async () => {
    const result = await listToolsThroughSdk(makeAllContextTools());
    const toolNames = result.tools.map((tool) => tool.name).sort();
@ -1332,3 +1365,179 @@ describe('createKtxMcpServer', () => {
    }
  });
 });
+
+describe('MCP tool-call logging', () => {
+  afterEach(() => {
+    vi.unstubAllEnvs();
+    vi.restoreAllMocks();
+  });
+
+  function loggerCapture() {
+    let buf = '';
+    const io = { stdout: { write() {} }, stderr: { write(chunk: string) { buf += chunk; } } };
+    return {
+      io,
+      logger: createMcpLogger(io, { isTTY: false }),
+      text: () => buf,
+      lines: () =>
+        buf
+          .split('\n')
+          .filter((line) => line.trim().startsWith('{'))
+          .map((line) => JSON.parse(line) as Record<string, unknown>),
+    };
+  }
+
+  it('logs tool.start before the handler runs and a matching tool.end on completion', async () => {
+    const cap = loggerCapture();
+    const fake = makeFakeServer();
+    createKtxMcpServer({
+      server: fake.server,
+      userContext: { userId: 'local' },
+      logger: cap.logger,
+      contextTools: {
+        sqlExecution: {
+          execute: vi
+            .fn<KtxSqlExecutionMcpPort['execute']>()
+            .mockResolvedValue({ headers: ['count'], rows: [[1]], rowCount: 1 }),
+        },
+      },
+    });
+
+    await getTool(fake.tools, 'sql_execution').handler({ connectionId: 'warehouse', sql: 'select 1' });
+
+    const lines = cap.lines();
+    const start = lines.find((line) => line.msg === 'tool.start');
+    const end = lines.find((line) => line.msg === 'tool.end');
+    expect(start).toMatchObject({
+      tool: 'sql_execution',
+      params: { connectionId: 'warehouse', sql: 'select 1' },
+      level: 30,
+    });
+    expect(typeof start?.callId).toBe('string');
+    expect(end).toMatchObject({ tool: 'sql_execution', callId: start?.callId, outcome: 'ok', level: 30 });
+    expect(typeof end?.durationMs).toBe('number');
+    expect(end?.resultSize as number).toBeGreaterThan(0);
+  });
+
+  it('leaves a tool.start carrying the SQL with no matching tool.end when a handler never returns', () => {
+    const cap = loggerCapture();
+    const fake = makeFakeServer();
+    createKtxMcpServer({
+      server: fake.server,
+      userContext: { userId: 'local' },
+      logger: cap.logger,
+      contextTools: {
+        sqlExecution: { execute: () => new Promise(() => {}) },
+      },
+    });
+
+    void getTool(fake.tools, 'sql_execution').handler({ connectionId: 'warehouse', sql: 'select pg_sleep(99999)' });
+
+    const lines = cap.lines();
+    const start = lines.find((line) => line.msg === 'tool.start');
+    expect(start).toMatchObject({ tool: 'sql_execution', params: { sql: 'select pg_sleep(99999)' } });
+    expect(lines.some((line) => line.msg === 'tool.end' && line.callId === start?.callId)).toBe(false);
+  });
+
+  it('emits tool.end at warn when a completed call exceeds the slow threshold', async () => {
+    vi.stubEnv('KTX_MCP_SLOW_TOOL_MS', '0');
+    const cap = loggerCapture();
+    const fake = makeFakeServer();
+    createKtxMcpServer({
+      server: fake.server,
+      userContext: { userId: 'local' },
+      logger: cap.logger,
+      contextTools: {
+        sqlExecution: {
+          execute: async () => {
+            await new Promise((resolve) => setTimeout(resolve, 5));
+            return { headers: ['count'], rows: [[1]], rowCount: 1 };
+          },
+        },
+      },
+    });
+
+    await getTool(fake.tools, 'sql_execution').handler({ connectionId: 'warehouse', sql: 'select 1' });
+
+    const end = cap.lines().find((line) => line.msg === 'tool.end');
+    expect(end).toMatchObject({ outcome: 'ok', level: 40 });
+  });
+
+  it('logs a matched tool.start/tool.end(error) pair carrying the deadline message when a query times out', async () => {
+    const cap = loggerCapture();
+    const fake = makeFakeServer();
+    createKtxMcpServer({
+      server: fake.server,
+      userContext: { userId: 'local' },
+      logger: cap.logger,
+      contextTools: {
+        sqlExecution: {
+          execute: vi.fn<KtxSqlExecutionMcpPort['execute']>().mockRejectedValue(new KtxQueryError('query exceeded 30s')),
+        },
+      },
+    });
+
+    await getTool(fake.tools, 'sql_execution').handler({
+      connectionId: 'warehouse',
+      sql: 'select min(time_id), max(time_id), count(*) from profits',
+    });
+
+    const lines = cap.lines();
+    const start = lines.find((line) => line.msg === 'tool.start');
+    const end = lines.find((line) => line.msg === 'tool.end');
+    expect(typeof start?.callId).toBe('string');
+    expect(end).toMatchObject({ tool: 'sql_execution', callId: start?.callId, outcome: 'error', level: 50 });
+    expect((end?.err as { message?: string }).message).toBe('query exceeded 30s');
+    // No unmatched tool.start remains — the matched pair closes spec 15's hang gap for this case.
+    expect(lines.filter((line) => line.msg === 'tool.start')).toHaveLength(1);
+    expect(lines.filter((line) => line.msg === 'tool.end' && line.callId === start?.callId)).toHaveLength(1);
+    expect(end?.durationMs as number).toBeGreaterThanOrEqual(0);
+  });
+
+  it('suppresses routine tool traffic at warn level but keeps errored calls', async () => {
+    vi.stubEnv('KTX_MCP_LOG_LEVEL', 'warn');
+    const cap = loggerCapture();
+    const fake = makeFakeServer();
+    createKtxMcpServer({
+      server: fake.server,
+      userContext: { userId: 'local' },
+      logger: cap.logger,
+      contextTools: {
+        knowledge: {
+          search: vi.fn<KtxKnowledgeMcpPort['search']>().mockRejectedValue(new Error('wiki index unavailable')),
+          read: vi.fn<KtxKnowledgeMcpPort['read']>().mockResolvedValue(null),
+        },
+      },
+    });
+
+    await getTool(fake.tools, 'wiki_search').handler({ query: 'revenue', limit: 5 });
+
+    const lines = cap.lines();
+    expect(lines.some((line) => line.msg === 'tool.start')).toBe(false);
+    const end = lines.find((line) => line.msg === 'tool.end');
+    expect(end).toMatchObject({ outcome: 'error', level: 50 });
+    expect((end?.err as { message?: string }).message).toContain('wiki index unavailable');
+  });
+
+  it('does not log tool calls when no logger is provided', async () => {
+    const fake = makeFakeServer();
+    const io = makeIo(false);
+    createKtxMcpServer({
+      server: fake.server,
+      userContext: { userId: 'local' },
+      io,
+      contextTools: {
+        sqlExecution: {
+          execute: vi
+            .fn<KtxSqlExecutionMcpPort['execute']>()
+            .mockResolvedValue({ headers: ['count'], rows: [[1]], rowCount: 1 }),
+        },
+      },
+    });
+
+    await getTool(fake.tools, 'sql_execution').handler({ connectionId: 'warehouse', sql: 'select 1' });
+
+    expect(io.stderrText()).not.toContain('tool.start');
+    expect(io.stderrText()).not.toContain('tool.end');
+  });
+});
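
Reviewer sketch: the logging tests above rely on one property in particular: `tool.start` is written before the handler is awaited, so a call that never returns still leaves a line carrying its parameters. The wrapper below illustrates that ordering together with the level rules the tests assert (30 for a normal end, 40 when the call exceeds the slow threshold, 50 on error). `withToolLoggingSketch`, the counter-based `callId`, and rethrowing the error are assumptions for illustration; the real server appears to convert errors into a tool result.

```typescript
// Hypothetical sketch of start/end tool-call logging; not the actual server wiring.
interface ToolLogLine {
  level: number;
  msg: 'tool.start' | 'tool.end';
  tool: string;
  callId: string;
  [extra: string]: unknown;
}

type ToolLogSink = (line: ToolLogLine) => void;

let callCounter = 0;

function withToolLoggingSketch<P, R>(
  tool: string,
  handler: (params: P) => Promise<R>,
  sink: ToolLogSink,
  slowMs = 10_000,
): (params: P) => Promise<R> {
  return async (params: P): Promise<R> => {
    callCounter += 1;
    const callId = `call-${callCounter}`;
    const startedAt = Date.now();
    // Runs synchronously, before the first await: a hung handler still leaves this line.
    sink({ level: 30, msg: 'tool.start', tool, callId, params });
    try {
      const result = await handler(params);
      const durationMs = Date.now() - startedAt;
      const level = durationMs > slowMs ? 40 : 30;
      sink({ level, msg: 'tool.end', tool, callId, outcome: 'ok', durationMs });
      return result;
    } catch (error) {
      const durationMs = Date.now() - startedAt;
      const message = error instanceof Error ? error.message : String(error);
      sink({ level: 50, msg: 'tool.end', tool, callId, outcome: 'error', durationMs, err: { message } });
      throw error; // assumption: the sketch rethrows instead of building a tool result
    }
  };
}
```

Because errored calls end at level 50, raising the log level to `warn` hides routine traffic while keeping the failing `tool.end` lines, which is what the suppression test checks.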
--- a/packages/cli/test/context/project/config.test.ts
+++ b/packages/cli/test/context/project/config.test.ts
@ -86,6 +86,7 @@ connections:
          profileSampleRows: 10000,
          profileConcurrency: 4,
          validationConcurrency: 4,
+          detectionBudgetMs: 600000,
        },
      },
    });
@ -427,6 +428,7 @@ scan:
    profileConcurrency: 3
    validationConcurrency: 2
    validationBudget: 0
+    detectionBudgetMs: 120000
 `);

    expect(config.scan.relationships).toEqual({
@ -441,6 +443,7 @@ scan:
      profileConcurrency: 3,
      validationConcurrency: 2,
      validationBudget: 0,
+      detectionBudgetMs: 120000,
    });
    expect(serializeKtxProjectConfig(config)).toContain('enabled: false');
    expect(serializeKtxProjectConfig(config)).toContain('llmProposals: false');
@ -453,6 +456,25 @@ scan:
    expect(serializeKtxProjectConfig(config)).toContain('profileConcurrency: 3');
    expect(serializeKtxProjectConfig(config)).toContain('validationConcurrency: 2');
    expect(serializeKtxProjectConfig(config)).toContain('validationBudget: 0');
+    expect(serializeKtxProjectConfig(config)).toContain('detectionBudgetMs: 120000');
+  });
+
+  it('defaults the relationship detection budget to ten minutes', () => {
+    expect(buildDefaultKtxProjectConfig().scan.relationships.detectionBudgetMs).toBe(600000);
+  });
+
+  it('rejects a non-positive or non-integer relationship detection budget', () => {
+    for (const value of ['0', '-1', '1.5']) {
+      const yaml = `
+scan:
+  relationships:
+    detectionBudgetMs: ${value}
+`;
+      expect(() => parseKtxProjectConfig(yaml)).toThrow(/scan\.relationships\.detectionBudgetMs/);
+      const validation = validateKtxProjectConfig(yaml);
+      expect(validation.ok).toBe(false);
+      expect(validation.issues.map((issue) => issue.path)).toContain('scan.relationships.detectionBudgetMs');
+    }
  });

  it('parses the scan relationship validation budget sentinel', () => {
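
Reviewer sketch: the `detectionBudgetMs` tests above imply a "positive integer, default ten minutes" rule that reports failures under the dotted path `scan.relationships.detectionBudgetMs`. A minimal validator with that shape is shown below. The function name, the issue shape, and the message are assumptions; the default and the path come from the tests.

```typescript
// Hypothetical sketch of the budget validation; not the actual config schema.
const DEFAULT_DETECTION_BUDGET_MS = 600_000; // ten minutes
const DETECTION_BUDGET_PATH = 'scan.relationships.detectionBudgetMs';

interface ConfigIssueSketch {
  path: string;
  message: string;
}

function parseDetectionBudgetMs(raw: unknown): { value: number; issues: ConfigIssueSketch[] } {
  // An absent key takes the default.
  if (raw === undefined) return { value: DEFAULT_DETECTION_BUDGET_MS, issues: [] };
  if (typeof raw === 'number' && Number.isInteger(raw) && raw > 0) {
    return { value: raw, issues: [] };
  }
  // 0, negatives, fractions, and non-numbers are all reported under the same path.
  return {
    value: DEFAULT_DETECTION_BUDGET_MS,
    issues: [{ path: DETECTION_BUDGET_PATH, message: 'must be a positive integer number of milliseconds' }],
  };
}
```

Unlike `validationBudget`, where `0` is a parsed sentinel, `0` here is invalid, so the two fields need separate rules.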
--- a/packages/cli/test/context/project/setup-config.test.ts
+++ b/packages/cli/test/context/project/setup-config.test.ts
@ -49,10 +49,10 @@ describe('ktx setup config helpers', () => {

  it('merges setup-local gitignore entries without removing existing lines', () => {
    expect(mergeKtxSetupGitignoreEntries('cache/\ndb.sqlite\n')).toBe(
-      ['cache/', 'db.sqlite', 'db.sqlite-*', 'ingest-transcripts/', 'secrets/', 'setup/', 'agents/', ''].join('\n'),
+      ['cache/', 'db.sqlite', 'db.sqlite-*', 'ingest-transcripts/', 'logs/', 'secrets/', 'setup/', 'agents/', ''].join('\n'),
    );
    expect(mergeKtxSetupGitignoreEntries('cache/\nsecrets/\n')).toBe(
-      ['cache/', 'secrets/', 'db.sqlite', 'db.sqlite-*', 'ingest-transcripts/', 'setup/', 'agents/', ''].join('\n'),
+      ['cache/', 'secrets/', 'db.sqlite', 'db.sqlite-*', 'ingest-transcripts/', 'logs/', 'setup/', 'agents/', ''].join('\n'),
    );
  });
 });
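
Reviewer sketch: both expectations above are consistent with an "append what is missing, keep what is there" merge. The function below reproduces the two expected outputs under the assumption that the required list is ordered as shown. The list contents and the function name are inferred from the test, not read from the implementation.

```typescript
// Hypothetical sketch of the gitignore merge; the required list is inferred from the test.
const REQUIRED_IGNORE_ENTRIES: readonly string[] = [
  'cache/',
  'db.sqlite',
  'db.sqlite-*',
  'ingest-transcripts/',
  'logs/',
  'secrets/',
  'setup/',
  'agents/',
];

function mergeGitignoreEntriesSketch(
  existing: string,
  required: readonly string[] = REQUIRED_IGNORE_ENTRIES,
): string {
  const lines = existing.split('\n');
  // Drop trailing blank lines so exactly one final newline is written back.
  while (lines.length > 0 && lines[lines.length - 1] === '') lines.pop();
  const present = new Set(lines.map((line) => line.trim()));
  for (const entry of required) {
    // Existing lines keep their position; missing entries are appended in list order.
    if (!present.has(entry)) lines.push(entry);
  }
  return [...lines, ''].join('\n');
}
```

Appending instead of re-sorting keeps user-authored ordering and comments stable, which is why `secrets/` stays second in the second expectation.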
--- a/packages/cli/test/context/scan/description-generation.test.ts
+++ b/packages/cli/test/context/scan/description-generation.test.ts
@ -1,4 +1,8 @@
-import { describe, expect, it, vi } from 'vitest';
+import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
+import type { ChildProcess } from 'node:child_process';
+import { mkdtempSync, rmSync } from 'node:fs';
+import { tmpdir } from 'node:os';
+import { join } from 'node:path';

 vi.mock('ai', async (importOriginal) => {
  const actual = await importOriginal<typeof import('ai')>();
@ -14,6 +18,7 @@ import {
  KtxDescriptionGenerator,
 } from '../../../src/context/scan/description-generation.js';
 import { createKtxConnectorCapabilities, type KtxScanConnector } from '../../../src/context/scan/types.js';
+import { HANGING_CHILD, killTestChildren, spawnTestChild } from '../llm/subprocess-test-children.test-utils.js';

 function createCache(initial: Record<string, string> = {}): KtxDescriptionCachePort {
  const data = new Map(Object.entries(initial));
@ -41,6 +46,7 @@ function createLlmProvider(text = 'generated description') {
    }),
    generateObject: vi.fn(),
    runAgentLoop: vi.fn(),
+    subprocessForkSpec: () => null,
  } as any;
 }

@ -57,6 +63,7 @@ function createFailingLlmProvider(message = 'timeout exceeded when trying to con
    }),
    generateObject: vi.fn(),
    runAgentLoop: vi.fn(),
+    subprocessForkSpec: () => null,
  } as any;
 }

@ -492,7 +499,8 @@ describe('KtxDescriptionGenerator', () => {
    expect(result.tableDescription).toBeNull();
    expect(Object.fromEntries(result.columnDescriptions)).toEqual({ status: null });
    expect(warnings).toContain('enrichment_failed');
-    expect(llmRuntime.generateObject).toHaveBeenCalledTimes(1);
+    // A transient (non-timeout) failure retries up to the attempt limit (default 3).
+    expect(llmRuntime.generateObject).toHaveBeenCalledTimes(3);
    expect(llmRuntime.generateText).not.toHaveBeenCalled();
  });
 });
@ -684,6 +692,41 @@ describe('KtxDescriptionGenerator resilience', () => {
    expect(warnings).toEqual([]);
  });

+  it('propagates a genuine context abort during the batched LLM call instead of degrading to null', async () => {
+    const controller = new AbortController();
+    const llmRuntime = createLlmProvider('unused');
+    llmRuntime.generateObject = vi.fn(async () => {
+      controller.abort();
+      throw new Error('The operation was aborted');
+    });
+    const warnings: string[] = [];
+    const generator = new KtxDescriptionGenerator({
+      llmRuntime,
+      onWarning: (warning) => warnings.push(warning.code),
+      settings: { columnMaxWords: 12, tableMaxWords: 18, dataSourceMaxWords: 24 },
+    });
+
+    await expect(
+      generator.generateBatchedTableDescriptions({
+        connectionId: 'conn-1',
+        connector: createConnector(),
+        context: { runId: 'run-1', signal: controller.signal },
+        dataSourceType: 'POSTGRESQL',
+        supportsNestedAnalysis: false,
+        table: {
+          catalog: null,
+          db: 'public',
+          name: 'orders',
+          rawDescriptions: {},
+          columns: [{ name: 'status', type: 'text' }],
+        },
+      }),
+    ).rejects.toThrow();
+
+    // A genuine cancellation must not be filed as a per-table failure/timeout.
+    expect(warnings).toEqual([]);
+  });
+
  it('generates column descriptions from rawDescriptions when sampleColumn is unavailable', async () => {
    const samplerWithoutColumn: KtxScanConnector = {
      ...createConnector(),
@ -782,3 +825,89 @@ describe('KtxDescriptionGenerator resilience', () => {
    expect(generateText).not.toHaveBeenCalled();
  });
 });
+
+describe('KtxDescriptionGenerator subprocess kill boundary', () => {
+  const children: ChildProcess[] = [];
+  let workDir: string;
+  let priorTimeout: string | undefined;
+
+  beforeEach(() => {
+    workDir = mkdtempSync(join(tmpdir(), 'ktx-enrich-'));
+    priorTimeout = process.env.KTX_ENRICH_LLM_TIMEOUT_MS;
+    process.env.KTX_ENRICH_LLM_TIMEOUT_MS = '300';
+  });
+
+  afterEach(() => {
+    killTestChildren(children);
+    children.length = 0;
+    if (priorTimeout === undefined) {
+      delete process.env.KTX_ENRICH_LLM_TIMEOUT_MS;
+    } else {
+      process.env.KTX_ENRICH_LLM_TIMEOUT_MS = priorTimeout;
+    }
+    rmSync(workDir, { recursive: true, force: true });
+  });
+
+  it('skips a wedged subprocess-backed table with enrichment_timeout and settles within deadline+grace', async () => {
+    const pidFile = join(workDir, 'gc.pid');
+    const llmRuntime = createLlmProvider('unused');
+    llmRuntime.subprocessForkSpec = () => ({ backend: 'codex', projectDir: '/tmp', modelSlots: { default: 'codex' } });
+    const warnings: string[] = [];
+    const generator = new KtxDescriptionGenerator({
+      llmRuntime,
+      onWarning: (warning) => warnings.push(warning.code),
+      settings: { columnMaxWords: 12, tableMaxWords: 18, dataSourceMaxWords: 24 },
+      spawnSubprocessGenerateChild: () => spawnTestChild(children, HANGING_CHILD, { KTX_TEST_GC_PID_FILE: pidFile }),
+    });
+
+    const start = Date.now();
+    const result = await generator.generateBatchedTableDescriptions({
+      connectionId: 'conn-1',
+      connector: createConnector(),
+      context: { runId: 'run-1' },
+      dataSourceType: 'POSTGRESQL',
+      supportsNestedAnalysis: false,
+      table: { catalog: null, db: 'public', name: 'orders', columns: [{ name: 'status', type: 'text' }] },
+    });
+
+    expect(Date.now() - start).toBeLessThan(5000);
+    expect(result.tableDescription).toBeNull();
+    expect(Object.fromEntries(result.columnDescriptions)).toEqual({ status: null });
+    expect(warnings).toContain('enrichment_timeout');
+    // One wedge = one timeout: the hung table is not retried.
+    expect(children).toHaveLength(1);
+    const child = children[0]!;
+    await vi.waitFor(() => expect(child.exitCode !== null || child.signalCode !== null).toBe(true), { timeout: 5000 });
+  });
+
+  it('runs HTTP-backed enrichment in-process without spawning a child', async () => {
+    const spawnSpy = vi.fn(() => {
+      throw new Error('HTTP backend must not spawn a kill-boundary child');
+    });
+    const llmRuntime = createLlmProvider('unused');
+    llmRuntime.subprocessForkSpec = () => null;
+    llmRuntime.generateObject = vi.fn(async () => ({
+      tableDescription: 'Orders fact table',
+      columns: [{ name: 'status', description: 'Order lifecycle status' }],
+    }));
+    const generator = new KtxDescriptionGenerator({
+      llmRuntime,
+      settings: { columnMaxWords: 12, tableMaxWords: 18, dataSourceMaxWords: 24 },
+      spawnSubprocessGenerateChild: spawnSpy,
+    });
+
+    const result = await generator.generateBatchedTableDescriptions({
+      connectionId: 'conn-1',
+      connector: createConnector(),
+      context: { runId: 'run-1' },
+      dataSourceType: 'POSTGRESQL',
+      supportsNestedAnalysis: false,
+      table: { catalog: null, db: 'public', name: 'orders', columns: [{ name: 'status', type: 'text' }] },
+    });
+
+    expect(spawnSpy).not.toHaveBeenCalled();
+    expect(llmRuntime.generateObject).toHaveBeenCalledTimes(1);
+    expect(result.tableDescription).toBe('Orders fact table');
+    expect(Object.fromEntries(result.columnDescriptions)).toEqual({ status: 'Order lifecycle status' });
+  });
+});
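
Reviewer sketch: taken together, the enrichment tests above describe three distinct failure paths. A transient failure is retried up to the attempt limit and then degrades to `enrichment_failed`, a timeout is reported once as `enrichment_timeout` with no retry, and a genuine cancellation propagates without any warning. The helper below illustrates that policy. The names, the `isTimeout` and `isAborted` callbacks, and the outcome shape are assumptions for this sketch; the real generator also handles the subprocess kill boundary, which is not modelled here.

```typescript
// Hypothetical sketch of the retry policy; not the actual KtxDescriptionGenerator.
type EnrichmentOutcome<T> =
  | { value: T; warning: null }
  | { value: null; warning: 'enrichment_failed' | 'enrichment_timeout' };

interface RetryOptionsSketch {
  maxAttempts?: number; // assumption: default 3, matching the test comment
  isTimeout?: (error: unknown) => boolean;
  isAborted?: () => boolean;
}

async function enrichWithRetrySketch<T>(
  call: () => Promise<T>,
  options: RetryOptionsSketch = {},
): Promise<EnrichmentOutcome<T>> {
  const maxAttempts = options.maxAttempts ?? 3;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      return { value: await call(), warning: null };
    } catch (error) {
      // A genuine cancellation is not a per-table failure: let it propagate.
      if (options.isAborted?.() === true) throw error;
      // One wedge, one timeout: a hung call is not retried.
      if (options.isTimeout?.(error) === true) return { value: null, warning: 'enrichment_timeout' };
      // Anything else is treated as transient and retried.
    }
  }
  return { value: null, warning: 'enrichment_failed' };
}
```

Skipping the retry on timeout bounds the worst case per table at one deadline instead of `maxAttempts` deadlines.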
--- a/packages/cli/test/context/scan/description-resume.test.ts
+++ b/packages/cli/test/context/scan/description-resume.test.ts
@ -0,0 +1,264 @@
+import { mkdtemp, rm } from 'node:fs/promises';
+import { tmpdir } from 'node:os';
+import { join } from 'node:path';
+import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
+import YAML from 'yaml';
+import type { KtxLlmRuntimePort } from '../../../src/context/llm/runtime-port.js';
+import { buildDefaultKtxProjectConfig, type KtxScanRelationshipConfig } from '../../../src/context/project/config.js';
+import { initKtxProject, type KtxLocalProject } from '../../../src/context/project/project.js';
+import {
+  createKtxScanDescriptionResumeStore,
+  writeLocalScanManifestShards,
+} from '../../../src/context/scan/local-enrichment-artifacts.js';
+import { runLocalScanEnrichment, type KtxLocalScanEnrichmentResult } from '../../../src/context/scan/local-enrichment.js';
+import { SqliteLocalScanEnrichmentStateStore } from '../../../src/context/scan/sqlite-local-enrichment-state-store.js';
+import { createKtxConnectorCapabilities, type KtxScanConnector, type KtxSchemaSnapshot } from '../../../src/context/scan/types.js';
+
+const PROGRESS_PATH = 'raw-sources/warehouse/live-database/enrichment-progress/descriptions.json';
+const SHARD_PATH = 'semantic-layer/warehouse/_schema/public.yaml';
+
+function column(name: string) {
+  return {
+    name,
+    nativeType: 'integer',
+    normalizedType: 'integer' as const,
+    dimensionType: 'number' as const,
+    nullable: false,
+    primaryKey: name === 'id',
+    comment: null,
+  };
+}
+
+function table(name: string) {
+  return {
+    catalog: null,
+    db: 'public',
+    name,
+    kind: 'table' as const,
+    comment: null,
+    estimatedRows: 1,
+    foreignKeys: [],
+    columns: [column('id'), column('value')],
+  };
+}
+
+const snapshot: KtxSchemaSnapshot = {
+  connectionId: 'warehouse',
+  driver: 'postgres',
+  extractedAt: '2026-04-29T12:00:00.000Z',
+  scope: { schemas: ['public'] },
+  metadata: {},
+  tables: [table('customers'), table('orders'), table('products')],
+};
+
+function connector(): KtxScanConnector {
+  return {
+    id: 'test:warehouse',
+    driver: 'postgres',
+    capabilities: createKtxConnectorCapabilities({ tableSampling: true, columnSampling: true }),
+    introspect: vi.fn(async () => snapshot),
+    listSchemas: vi.fn(async () => []),
+    listTables: vi.fn(async () => []),
+    sampleTable: vi.fn(async () => ({ headers: ['id', 'value'], rows: [[1, 2]], totalRows: 1 })),
+    sampleColumn: vi.fn(async () => ({ values: ['1', '2'], nullCount: 0, distinctCount: 2 })),
+  };
+}
+
+function countingRuntime() {
+  let calls = 0;
+  const runtime: KtxLlmRuntimePort = {
+    generateText: vi.fn(async () => 'AI column description'),
+    generateObject: vi.fn(async () => {
+      calls += 1;
+      return { tableDescription: 'AI table description', columns: [] };
+    }) as KtxLlmRuntimePort['generateObject'],
+    runAgentLoop: vi.fn(),
+    subprocessForkSpec: () => null,
+  };
+  return { runtime, calls: () => calls };
+}
+
+function relationshipsDisabled(): KtxScanRelationshipConfig {
+  return { ...buildDefaultKtxProjectConfig().scan.relationships, enabled: false };
+}
+
+describe('descriptions stage incremental persistence + resume', () => {
+  let tempDir: string;
+  let project: KtxLocalProject;
+
+  async function runEnrichment(runId: string): Promise<{ result: KtxLocalScanEnrichmentResult; calls: number }> {
+    const llm = countingRuntime();
+    const result = await runLocalScanEnrichment({
+      connectionId: 'warehouse',
+      mode: 'enriched',
+      connector: connector(),
+      snapshot,
+      context: { runId },
+      providers: { llmRuntime: llm.runtime, embedding: null },
+      descriptionResumeStore: createKtxScanDescriptionResumeStore({
+        project,
+        connectionId: 'warehouse',
+        syncId: 'sync-1',
+        driver: 'postgres',
+      }),
+      syncId: 'sync-1',
+      relationshipSettings: relationshipsDisabled(),
+    });
+    return { result, calls: llm.calls() };
+  }
+
+  async function readProgress(): Promise<{ inputHash: string; descriptions: Array<{ table: { name: string } }> }> {
+    return JSON.parse((await project.fileStore.readFile(PROGRESS_PATH)).content);
+  }
+
+  async function writeProgress(record: unknown): Promise<void> {
+    await project.fileStore.writeFile(PROGRESS_PATH, `${JSON.stringify(record, null, 2)}\n`, 'ktx', 'ktx@example.com', 'edit');
+  }
+
+  beforeEach(async () => {
+    tempDir = await mkdtemp(join(tmpdir(), 'ktx-desc-resume-'));
+    project = await initKtxProject({ projectDir: join(tempDir, 'project') });
+  });
+
+  afterEach(async () => {
+    await rm(tempDir, { recursive: true, force: true });
+  });
+
+  it('flushes durable descriptions + ai manifest descriptions on a fresh run', async () => {
+    const { calls } = await runEnrichment('run-1');
+    expect(calls).toBe(3);
+
+    const progress = await readProgress();
+    expect(progress.descriptions.map((entry) => entry.table.name).sort()).toEqual(['customers', 'orders', 'products']);
+
+    const shard = YAML.parse((await project.fileStore.readFile(SHARD_PATH)).content) as {
+      tables: Record<string, { descriptions?: { ai?: string } }>;
+    };
+    expect(shard.tables.customers?.descriptions?.ai).toBe('AI table description');
+    expect(shard.tables.products?.descriptions?.ai).toBe('AI table description');
+  });
+
+  it('re-issues no LLM calls when every table is already enriched (matching inputHash)', async () => {
+    await runEnrichment('run-1');
+    const { result, calls } = await runEnrichment('run-2');
+
+    expect(calls).toBe(0);
+    expect(result.descriptionUpdates).toHaveLength(3);
+    expect(result.descriptionUpdates.every((update) => update.tableDescription === 'AI table description')).toBe(true);
+  });
+
+  it('re-enriches only the tables missing from the durable record', async () => {
+    await runEnrichment('run-1');
+    const progress = await readProgress();
+    progress.descriptions = progress.descriptions.filter((entry) => entry.table.name !== 'orders');
+    await writeProgress(progress);
+
+    const { result, calls } = await runEnrichment('run-2');
+
+    expect(calls).toBe(1);
+    expect(result.descriptionUpdates.map((update) => update.table.name).sort()).toEqual([
+      'customers',
+      'orders',
+      'products',
+    ]);
+  });
+
+  it('recomputes the whole stage when the durable record inputHash differs', async () => {
+    await runEnrichment('run-1');
+    const progress = await readProgress();
+    await writeProgress({ ...progress, inputHash: 'stale-input-hash' });
+
+    const { calls } = await runEnrichment('run-2');
+    expect(calls).toBe(3);
+  });
+
+  it('persists the other tables and completes the stage when one table fails', async () => {
+    const stateStore = new SqliteLocalScanEnrichmentStateStore({ dbPath: join(tempDir, 'state.sqlite') });
+    let calls = 0;
+    const runtime: KtxLlmRuntimePort = {
+      generateText: vi.fn(async () => 'AI column description'),
+      generateObject: vi.fn(async (input: { prompt: string }) => {
+        calls += 1;
+        if (input.prompt.includes('orders')) {
+          throw new Error('backend overloaded');
+        }
+        return { tableDescription: 'AI table description', columns: [] };
+      }) as KtxLlmRuntimePort['generateObject'],
+      runAgentLoop: vi.fn(),
+      subprocessForkSpec: () => null,
+    };
+
+    const result = await runLocalScanEnrichment({
+      connectionId: 'warehouse',
+      mode: 'enriched',
+      connector: connector(),
+      snapshot,
+      context: { runId: 'run-skip' },
+      providers: { llmRuntime: runtime, embedding: null },
+      descriptionResumeStore: createKtxScanDescriptionResumeStore({
+        project,
+        connectionId: 'warehouse',
+        syncId: 'sync-1',
+        driver: 'postgres',
+      }),
+      stateStore,
+      syncId: 'sync-1',
+      relationshipSettings: relationshipsDisabled(),
+    });
+
+    // orders is retried up to the attempt limit (3) and then fails; customers and products succeed once each (3 + 1 + 1 = 5).
+    expect(calls).toBe(5);
+    // A failed table costs only its own description; the rest of the stage's output is kept.
+    const byName = new Map(result.descriptionUpdates.map((update) => [update.table.name, update]));
+    expect(byName.get('orders')?.tableDescription).toBeNull();
+    expect(byName.get('customers')?.tableDescription).toBe('AI table description');
+    expect(byName.get('products')?.tableDescription).toBe('AI table description');
+
+    // The stage is still recorded as completed: a 'completed' row exists for 'descriptions'.
+    const stages = await stateStore.listRunStages('run-skip');
+    expect(stages.some((stage) => stage.stage === 'descriptions' && stage.status === 'completed')).toBe(true);
+
+    // The successful tables are durable in both the progress record and the manifest's ai descriptions; the failed table is absent from both.
+    const progress = await readProgress();
+    expect(progress.descriptions.map((entry) => entry.table.name).sort()).toEqual(['customers', 'products']);
+    const shard = YAML.parse((await project.fileStore.readFile(SHARD_PATH)).content) as {
+      tables: Record<string, { descriptions?: { ai?: string } }>;
+    };
+    expect(shard.tables.customers?.descriptions?.ai).toBe('AI table description');
+    expect(shard.tables.orders?.descriptions?.ai).toBeUndefined();
+  });
+
+  it('rewrites only the manifest shards that gained a changed table', async () => {
+    const multiDb: KtxSchemaSnapshot = {
+      ...snapshot,
+      tables: [
+        { ...table('customers'), db: 'sales' },
+        { ...table('orders'), db: 'ops' },
+      ],
+    };
+    await writeLocalScanManifestShards({
+      project,
+      connectionId: 'warehouse',
+      syncId: 'sync-1',
+      driver: 'postgres',
+      snapshot: multiDb,
+      dryRun: false,
+    });
+
+    const flushed = await writeLocalScanManifestShards({
+      project,
+      connectionId: 'warehouse',
+      syncId: 'sync-1',
+      driver: 'postgres',
+      snapshot: multiDb,
+      dryRun: false,
+      descriptionUpdates: [
+        { table: { catalog: null, db: 'sales', name: 'customers' }, tableDescription: 'desc', columnDescriptions: {} },
+      ],
+      onlyChangedTableNames: new Set(['customers']),
+    });
+
+    expect(flushed.manifestShards).toHaveLength(1);
+    expect(flushed.manifestShards[0]).toContain('sales');
+  });
+});
--- a/packages/cli/test/context/scan/enabled-tables.test.ts
+++ b/packages/cli/test/context/scan/enabled-tables.test.ts
@@ -0,0 +1,24 @@
+import { describe, expect, it } from 'vitest';
+import { resolveEnabledTables } from '../../../src/context/scan/enabled-tables.js';
+import { tableRefKey } from '../../../src/context/scan/table-ref.js';
+
+describe('resolveEnabledTables', () => {
+  it('returns null when enabled_tables is absent or empty', () => {
+    expect(resolveEnabledTables(undefined)).toBeNull();
+    expect(resolveEnabledTables({ driver: 'sqlite' })).toBeNull();
+    expect(resolveEnabledTables({ driver: 'sqlite', enabled_tables: [] })).toBeNull();
+  });
+
+  it('treats sqlite "main.<name>" as equivalent to the bare "<name>"', () => {
+    const qualified = resolveEnabledTables({ driver: 'sqlite', enabled_tables: ['main.customers'] });
+    const bare = resolveEnabledTables({ driver: 'sqlite', enabled_tables: ['customers'] });
+    const expected = tableRefKey({ catalog: null, db: null, name: 'customers' });
+    expect([...(qualified ?? [])]).toEqual([expected]);
+    expect([...(bare ?? [])]).toEqual([expected]);
+  });
+
+  it('keeps the schema qualifier for non-sqlite drivers', () => {
+    const scope = resolveEnabledTables({ driver: 'postgres', enabled_tables: ['public.customers'] });
+    expect([...(scope ?? [])]).toEqual([tableRefKey({ catalog: null, db: 'public', name: 'customers' })]);
+  });
+});
--- a/packages/cli/test/context/scan/enrichment-state.test.ts
+++ b/packages/cli/test/context/scan/enrichment-state.test.ts
@@ -1,15 +1,26 @@
 import { mkdtemp, rm } from 'node:fs/promises';
 import { tmpdir } from 'node:os';
 import { join } from 'node:path';
+import Database from 'better-sqlite3';
 import { afterEach, beforeEach, describe, expect, it } from 'vitest';
 import {
  completedKtxScanEnrichmentStateSummary,
-  computeKtxScanEnrichmentInputHash,
+  computeKtxDescriptionsStageHash,
+  computeKtxEmbeddingsStageHash,
+  computeKtxRelationshipsStageHash,
+  computeKtxScanDescriptionDigest,
+  type KtxScanEmbeddingIdentity,
+  type KtxScanLlmIdentity,
  summarizeKtxScanEnrichmentState,
 } from '../../../src/context/scan/enrichment-state.js';
 import { SqliteLocalScanEnrichmentStateStore } from '../../../src/context/scan/sqlite-local-enrichment-state-store.js';
+import { buildDefaultKtxProjectConfig } from '../../../src/context/project/config.js';
 import type { KtxSchemaSnapshot } from '../../../src/context/scan/types.js';

+const llmIdentity: KtxScanLlmIdentity = { model: 'opus', baseUrlConfigured: false };
+const embeddingIdentity: KtxScanEmbeddingIdentity = { model: 'minilm', dimensions: 384, batchSize: 64 };
+const relationshipSettings = buildDefaultKtxProjectConfig().scan.relationships;
+
 const snapshot: KtxSchemaSnapshot = {
  connectionId: 'warehouse',
  driver: 'postgres',
@@ -53,28 +64,19 @@ describe('scan enrichment state', () => {
    await rm(tempDir, { recursive: true, force: true });
  });

-  it('computes stable input hashes without depending on object key order', () => {
-    const first = computeKtxScanEnrichmentInputHash({
-      snapshot,
-      mode: 'enriched',
-      detectRelationships: true,
-      providerIdentity: { provider: 'local-heuristic', llmModel: 'a' },
-    });
-    const second = computeKtxScanEnrichmentInputHash({
+  it('computes stable per-stage hashes without depending on object key order', () => {
+    const first = computeKtxDescriptionsStageHash({ snapshot, llmIdentity });
+    const second = computeKtxDescriptionsStageHash({
      snapshot: { ...snapshot, metadata: {} },
-      mode: 'enriched',
-      detectRelationships: true,
-      providerIdentity: { llmModel: 'a', provider: 'local-heuristic' },
+      llmIdentity: { baseUrlConfigured: false, model: 'opus' },
    });
    const firstTable = snapshot.tables[0];
    if (!firstTable) {
      throw new Error('Expected test snapshot table');
    }
-    const changed = computeKtxScanEnrichmentInputHash({
+    const changed = computeKtxDescriptionsStageHash({
      snapshot: { ...snapshot, tables: [{ ...firstTable, name: 'orders_v2' }] },
-      mode: 'enriched',
-      detectRelationships: true,
-      providerIdentity: { provider: 'local-heuristic', llmModel: 'a' },
+      llmIdentity,
    });

    expect(first).toMatch(/^[a-f0-9]{64}$/);
@@ -82,13 +84,48 @@ describe('scan enrichment state', () => {
    expect(changed).not.toBe(first);
  });

+  it('isolates per-stage invalidation: one input changes only its own stage', () => {
+    const descriptionDigest = computeKtxScanDescriptionDigest(['orders.id (integer)']);
+    const descriptions = computeKtxDescriptionsStageHash({ snapshot, llmIdentity });
+    const embeddings = computeKtxEmbeddingsStageHash({ snapshot, embeddingIdentity, descriptionDigest });
+    const relationships = computeKtxRelationshipsStageHash({ snapshot, relationshipSettings, llmIdentity });
+
+    // Switching the description LLM re-keys descriptions + relationships (both
+    // depend on llmIdentity) but NOT embeddings.
+    const otherLlm: KtxScanLlmIdentity = { model: 'sonnet', baseUrlConfigured: false };
+    expect(computeKtxDescriptionsStageHash({ snapshot, llmIdentity: otherLlm })).not.toBe(descriptions);
+    expect(computeKtxRelationshipsStageHash({ snapshot, relationshipSettings, llmIdentity: otherLlm })).not.toBe(
+      relationships,
+    );
+    expect(computeKtxEmbeddingsStageHash({ snapshot, embeddingIdentity, descriptionDigest })).toBe(embeddings);
+
+    // Swapping the embeddings model re-keys only embeddings.
+    const otherEmbedding: KtxScanEmbeddingIdentity = { model: 'mpnet', dimensions: 768, batchSize: 64 };
+    expect(computeKtxEmbeddingsStageHash({ snapshot, embeddingIdentity: otherEmbedding, descriptionDigest })).not.toBe(
+      embeddings,
+    );
+    expect(computeKtxDescriptionsStageHash({ snapshot, llmIdentity })).toBe(descriptions);
+    expect(computeKtxRelationshipsStageHash({ snapshot, relationshipSettings, llmIdentity })).toBe(relationships);
+
+    // A description-content change (new digest) re-keys only embeddings;
+    // relationships are deliberately decoupled from description content (D5).
+    const otherDigest = computeKtxScanDescriptionDigest(['orders.id (integer). A primary key.']);
+    expect(computeKtxEmbeddingsStageHash({ snapshot, embeddingIdentity, descriptionDigest: otherDigest })).not.toBe(
+      embeddings,
+    );
+    expect(computeKtxRelationshipsStageHash({ snapshot, relationshipSettings, llmIdentity })).toBe(relationships);
+
+    // Flipping llmProposals re-keys only relationships.
+    const otherRelationships = { ...relationshipSettings, llmProposals: !relationshipSettings.llmProposals };
+    expect(
+      computeKtxRelationshipsStageHash({ snapshot, relationshipSettings: otherRelationships, llmIdentity }),
+    ).not.toBe(relationships);
+    expect(computeKtxDescriptionsStageHash({ snapshot, llmIdentity })).toBe(descriptions);
+    expect(computeKtxEmbeddingsStageHash({ snapshot, embeddingIdentity, descriptionDigest })).toBe(embeddings);
+  });
+
  it('persists completed stages and ignores stale hashes', async () => {
-    const inputHash = computeKtxScanEnrichmentInputHash({
-      snapshot,
-      mode: 'enriched',
-      detectRelationships: true,
-      providerIdentity: { provider: 'local-heuristic' },
-    });
+    const inputHash = computeKtxDescriptionsStageHash({ snapshot, llmIdentity });

    await store.saveCompletedStage({
      runId: 'scan-run-1',
@@ -103,7 +140,7 @@ describe('scan enrichment state', () => {

    await expect(
      store.findCompletedStage({
-        runId: 'scan-run-1',
+        connectionId: 'warehouse',
        stage: 'descriptions',
        inputHash,
      }),
@@ -116,13 +153,51 @@ describe('scan enrichment state', () => {

    await expect(
      store.findCompletedStage({
-        runId: 'scan-run-1',
+        connectionId: 'warehouse',
        stage: 'descriptions',
        inputHash: 'different-hash',
      }),
    ).resolves.toBeNull();
  });

+  it('resolves a completed stage across a fresh run id by content identity', async () => {
+    const inputHash = computeKtxDescriptionsStageHash({ snapshot, llmIdentity });
+
+    await store.saveCompletedStage({
+      runId: 'scan-run-first',
+      connectionId: 'warehouse',
+      syncId: 'sync-first',
+      mode: 'enriched',
+      stage: 'descriptions',
+      inputHash,
+      output: [{ table: { catalog: null, db: 'public', name: 'orders' }, tableDescription: 'first' }],
+      updatedAt: '2026-04-29T12:00:00.000Z',
+    });
+    // A later run with the SAME content identity overwrites in place (the
+    // primary key no longer includes run_id), and the lookup resolves it
+    // without ever knowing the run id that produced it.
+    await store.saveCompletedStage({
+      runId: 'scan-run-second',
+      connectionId: 'warehouse',
+      syncId: 'sync-second',
+      mode: 'enriched',
+      stage: 'descriptions',
+      inputHash,
+      output: [{ table: { catalog: null, db: 'public', name: 'orders' }, tableDescription: 'second' }],
+      updatedAt: '2026-04-29T12:05:00.000Z',
+    });
+
+    const resolved = await store.findCompletedStage({
+      connectionId: 'warehouse',
+      stage: 'descriptions',
+      inputHash,
+    });
+    expect(resolved?.runId).toBe('scan-run-second');
+    expect(resolved?.output).toEqual([
+      { table: { catalog: null, db: 'public', name: 'orders' }, tableDescription: 'second' },
+    ]);
+  });
+
  it('records failed stages without making them reusable', async () => {
    await store.saveFailedStage({
      runId: 'scan-run-2',
@@ -137,7 +212,7 @@ describe('scan enrichment state', () => {

    await expect(
      store.findCompletedStage({
-        runId: 'scan-run-2',
+        connectionId: 'warehouse',
        stage: 'embeddings',
        inputHash: 'hash-2',
      }),
@@ -153,6 +228,47 @@ describe('scan enrichment state', () => {
    ]);
  });

+  it('recreates the resume cache when an older primary key shape is found', async () => {
+    const dbPath = join(tempDir, 'legacy.sqlite');
+    const legacy = new Database(dbPath);
+    legacy.exec(`
+      CREATE TABLE local_scan_enrichment_stages (
+        run_id TEXT NOT NULL,
+        stage TEXT NOT NULL,
+        input_hash TEXT NOT NULL,
+        connection_id TEXT NOT NULL,
+        sync_id TEXT NOT NULL,
+        mode TEXT NOT NULL,
+        status TEXT NOT NULL,
+        output_json TEXT,
+        error_message TEXT,
+        updated_at TEXT NOT NULL,
+        PRIMARY KEY (run_id, stage)
+      );
+      INSERT INTO local_scan_enrichment_stages
+        VALUES ('old-run', 'descriptions', 'hash', 'warehouse', 'sync', 'enriched', 'completed', 'null', NULL, '2026-01-01T00:00:00.000Z');
+    `);
+    legacy.close();
+
+    const recreated = new SqliteLocalScanEnrichmentStateStore({ dbPath });
+    // The legacy row is dropped with the old table; the new key shape is in
+    // force, so a fresh save + lookup round-trips cleanly.
+    await recreated.saveCompletedStage({
+      runId: 'new-run',
+      connectionId: 'warehouse',
+      syncId: 'sync',
+      mode: 'enriched',
+      stage: 'descriptions',
+      inputHash: 'hash',
+      output: ['fresh'],
+      updatedAt: '2026-02-01T00:00:00.000Z',
+    });
+    await expect(
+      recreated.findCompletedStage({ connectionId: 'warehouse', stage: 'descriptions', inputHash: 'hash' }),
+    ).resolves.toMatchObject({ runId: 'new-run', output: ['fresh'] });
+    await expect(recreated.listRunStages('old-run')).resolves.toEqual([]);
+  });
+
  it('summarizes resumed, completed, and failed stages for reports', () => {
    expect(
      summarizeKtxScanEnrichmentState({
--- a/packages/cli/test/context/scan/local-enrichment-artifacts.test.ts
+++ b/packages/cli/test/context/scan/local-enrichment-artifacts.test.ts
@@ -5,7 +5,11 @@ import { afterEach, beforeEach, describe, expect, it } from 'vitest';
 import YAML from 'yaml';
 import { initKtxProject, type KtxLocalProject } from '../../../src/context/project/project.js';
 import type { KtxLocalScanEnrichmentResult } from '../../../src/context/scan/local-enrichment.js';
-import { writeLocalScanEnrichmentArtifacts, writeLocalScanManifestShards } from '../../../src/context/scan/local-enrichment-artifacts.js';
+import {
+  loadOnDiskDescriptionUpdates,
+  writeLocalScanEnrichmentArtifacts,
+  writeLocalScanManifestShards,
+} from '../../../src/context/scan/local-enrichment-artifacts.js';
 import type { KtxSchemaSnapshot } from '../../../src/context/scan/types.js';

 const snapshot: KtxSchemaSnapshot = {
@@ -220,6 +224,7 @@ function enrichment(): KtxLocalScanEnrichmentResult {
      },
    ],
    compositeRelationships: null,
+    relationshipPartial: null,
  };
 }

@@ -238,6 +243,86 @@ describe('writeLocalScanEnrichmentArtifacts', () => {
    await rm(tempDir, { recursive: true, force: true });
  });

+  it('scopes manifest descriptions by full table identity across same-named tables in different schemas', async () => {
+    const multiSchemaSnapshot: KtxSchemaSnapshot = {
+      connectionId: 'warehouse',
+      driver: 'postgres',
+      extractedAt: '2026-04-29T12:00:00.000Z',
+      scope: { schemas: ['analytics', 'staging'] },
+      metadata: {},
+      tables: ['analytics', 'staging'].map((schema) => ({
+        catalog: null,
+        db: schema,
+        name: 'orders',
+        kind: 'table',
+        comment: null,
+        estimatedRows: 1,
+        foreignKeys: [],
+        columns: [
+          {
+            name: 'id',
+            nativeType: 'integer',
+            normalizedType: 'integer',
+            dimensionType: 'number',
+            nullable: false,
+            primaryKey: true,
+            comment: null,
+          },
+        ],
+      })),
+    };
+    const descriptionUpdates = [
+      {
+        table: { catalog: null, db: 'analytics', name: 'orders' },
+        tableDescription: 'Curated analytics orders',
+        columnDescriptions: { id: 'Analytics order id' },
+      },
+      {
+        table: { catalog: null, db: 'staging', name: 'orders' },
+        tableDescription: 'Raw staging orders',
+        columnDescriptions: { id: 'Staging order id' },
+      },
+    ];
+
+    await writeLocalScanManifestShards({
+      project,
+      connectionId: 'warehouse',
+      syncId: 'sync-multi',
+      driver: 'postgres',
+      snapshot: multiSchemaSnapshot,
+      descriptionUpdates,
+      dryRun: false,
+    });
+
+    type Shard = {
+      tables: Record<
+        string,
+        { descriptions?: Record<string, string>; columns: Array<{ name: string; descriptions?: Record<string, string> }> }
+      >;
+    };
+    const analyticsShard = YAML.parse(
+      await readFile(join(project.projectDir, 'semantic-layer/warehouse/_schema/analytics.yaml'), 'utf-8'),
+    ) as Shard;
+    const stagingShard = YAML.parse(
+      await readFile(join(project.projectDir, 'semantic-layer/warehouse/_schema/staging.yaml'), 'utf-8'),
+    ) as Shard;
+
+    expect(analyticsShard.tables.orders?.descriptions?.ai).toBe('Curated analytics orders');
+    expect(stagingShard.tables.orders?.descriptions?.ai).toBe('Raw staging orders');
+    expect(analyticsShard.tables.orders?.columns[0]?.descriptions?.ai).toBe('Analytics order id');
+    expect(stagingShard.tables.orders?.columns[0]?.descriptions?.ai).toBe('Staging order id');
+
+    // The on-disk reconstruction (used by selective `--stages` runs that skip the
+    // descriptions stage) must also resolve per identity, not collapse names.
+    const reconstructed = await loadOnDiskDescriptionUpdates(project, 'warehouse', multiSchemaSnapshot);
+    const analytics = reconstructed.find((update) => update.table.db === 'analytics');
+    const staging = reconstructed.find((update) => update.table.db === 'staging');
+    expect(analytics?.tableDescription).toBe('Curated analytics orders');
+    expect(staging?.tableDescription).toBe('Raw staging orders');
+    expect(analytics?.columnDescriptions.id).toBe('Analytics order id');
+    expect(staging?.columnDescriptions.id).toBe('Staging order id');
+  });
+
  it('writes enrichment artifacts and manifest shards while preserving external descriptions', async () => {
    await project.fileStore.writeFile(
      'semantic-layer/warehouse/_schema/public.yaml',
@@ -291,6 +376,7 @@ describe('writeLocalScanEnrichmentArtifacts', () => {
        profileSampleRows: 500,
        profileConcurrency: 3,
        validationConcurrency: 2,
+        detectionBudgetMs: 600000,
      },
    });

@@ -476,6 +562,7 @@ describe('writeLocalScanEnrichmentArtifacts', () => {
        profileSampleRows: 10000,
        profileConcurrency: 4,
        validationConcurrency: 4,
+        detectionBudgetMs: 600000,
      },
      dryRun: false,
    });
@@ -746,6 +833,7 @@ describe('writeLocalScanEnrichmentArtifacts', () => {
        profileSampleRows: 10000,
        profileConcurrency: 4,
        validationConcurrency: 4,
+        detectionBudgetMs: 600000,
      },
      dryRun: false,
    });
--- a/packages/cli/test/context/scan/local-enrichment.test.ts
+++ b/packages/cli/test/context/scan/local-enrichment.test.ts
@@ -1,6 +1,15 @@
+import { mkdtemp, readFile, rm } from 'node:fs/promises';
+import { tmpdir } from 'node:os';
+import { join } from 'node:path';
 import Database from 'better-sqlite3';
 import { describe, expect, it, vi } from 'vitest';
+import YAML from 'yaml';
 import { buildDefaultKtxProjectConfig } from '../../../src/context/project/config.js';
+import { initKtxProject } from '../../../src/context/project/project.js';
+import {
+  loadOnDiskDescriptionUpdates,
+  writeLocalScanEnrichmentArtifacts,
+} from '../../../src/context/scan/local-enrichment-artifacts.js';
 import type {
  KtxScanEnrichmentCompletedStage,
  KtxScanEnrichmentFailedStage,
@@ -201,15 +210,24 @@ function noDeclaredRelationshipSnapshot(): KtxSchemaSnapshot {

 function memoryEnrichmentStateStore(): KtxScanEnrichmentStateStore {
  const records = new Map<string, KtxScanEnrichmentCompletedStage | KtxScanEnrichmentFailedStage>();
-  const key = (input: Pick<KtxScanEnrichmentStageLookup, 'runId' | 'stage'>) => `${input.runId}:${input.stage}`;
+  const key = (input: Pick<KtxScanEnrichmentStageLookup, 'connectionId' | 'stage' | 'inputHash'>) =>
+    `${input.connectionId}:${input.stage}:${input.inputHash}`;
  return {
    async findCompletedStage<TOutput>(input: KtxScanEnrichmentStageLookup) {
      const record = records.get(key(input));
-      if (!record || record.status !== 'completed' || record.inputHash !== input.inputHash) {
+      if (!record || record.status !== 'completed') {
        return null;
      }
      return record as KtxScanEnrichmentCompletedStage<TOutput>;
    },
+    async findLatestCompletedStage(input) {
+      const matches = [...records.values()].filter(
+        (record): record is KtxScanEnrichmentCompletedStage =>
+          record.status === 'completed' && record.connectionId === input.connectionId && record.stage === input.stage,
+      );
+      matches.sort((left, right) => (left.updatedAt < right.updatedAt ? 1 : left.updatedAt > right.updatedAt ? -1 : 0));
+      return matches[0] ?? null;
+    },
    async saveCompletedStage(input) {
      records.set(key(input), {
        ...input,
@@ -246,6 +264,57 @@ describe('local scan enrichment', () => {
    });
  });

+  it('scopes descriptions by full table identity across same-named tables in different schemas', () => {
+    const multiSchemaSnapshot: KtxSchemaSnapshot = {
+      connectionId: 'warehouse',
+      driver: 'postgres',
+      extractedAt: '2026-04-29T12:00:00.000Z',
+      scope: { schemas: ['analytics', 'staging'] },
+      metadata: {},
+      tables: ['analytics', 'staging'].map((schema) => ({
+        catalog: null,
+        db: schema,
+        name: 'orders',
+        kind: 'table',
+        comment: null,
+        estimatedRows: 1,
+        foreignKeys: [],
+        columns: [
+          {
+            name: 'id',
+            nativeType: 'integer',
+            normalizedType: 'integer',
+            dimensionType: 'number',
+            nullable: false,
+            primaryKey: true,
+            comment: null,
+          },
+        ],
+      })),
+    };
+    const descriptions = [
+      {
+        table: { catalog: null, db: 'analytics', name: 'orders' },
+        tableDescription: 'Curated analytics orders',
+        columnDescriptions: { id: 'Analytics order id' },
+      },
+      {
+        table: { catalog: null, db: 'staging', name: 'orders' },
+        tableDescription: 'Raw staging orders',
+        columnDescriptions: { id: 'Staging order id' },
+      },
+    ];
+
+    const schema = snapshotToKtxEnrichedSchema(multiSchemaSnapshot, new Map(), descriptions);
+
+    const analytics = schema.tables.find((table) => table.id === 'analytics.orders');
+    const staging = schema.tables.find((table) => table.id === 'staging.orders');
+    expect(analytics?.descriptions.ai).toBe('Curated analytics orders');
+    expect(staging?.descriptions.ai).toBe('Raw staging orders');
+    expect(analytics?.columns[0]?.descriptions.ai).toBe('Analytics order id');
+    expect(staging?.columns[0]?.descriptions.ai).toBe('Staging order id');
+  });
+
  it('maps snapshot foreign keys into formal schema relationships', () => {
    const source = noDeclaredRelationshipSnapshot();
    const snapshotWithForeignKey = {
@@ -617,8 +686,8 @@ describe('local scan enrichment', () => {

    expect(events).toEqual(
      expect.arrayContaining([
-        expect.objectContaining({ message: 'Generating descriptions 1/2 tables', transient: true }),
-        expect.objectContaining({ message: 'Generating descriptions 2/2 tables', transient: true }),
+        expect.objectContaining({ message: 'Generating descriptions 1/2 (customers, 1 cols)', transient: true }),
+        expect.objectContaining({ message: 'Generating descriptions 2/2 (orders, 2 cols)', transient: true }),
        expect.objectContaining({ message: 'Building embeddings 1/1 batches', transient: true }),
        expect.objectContaining({ message: 'Detecting relationships' }),
      ]),
@@ -711,7 +780,7 @@ describe('local scan enrichment', () => {
    expect(embedBatch.mock.calls.map(([texts]) => texts).map((texts) => texts.length)).toEqual([2, 2, 1]);
  });

-  it('reuses completed description and embedding stages for the same run id and snapshot hash', async () => {
+  it('reuses completed description and embedding stages across a fresh run id by content identity', async () => {
    const stateStore = memoryEnrichmentStateStore();
    const scanConnector = connector();
    const providers = {
@@ -728,21 +797,25 @@ describe('local scan enrichment', () => {
      providers,
      stateStore,
      syncId: 'sync-resume-1',
-      providerIdentity: { provider: 'fake', embeddingDimensions: 6 },
+      llmIdentity: { model: 'fake', baseUrlConfigured: false },
+      embeddingIdentity: { model: 'fake-embed', dimensions: 6, batchSize: 64 },
    });

    const generateObject = vi.spyOn(providers.llmRuntime, 'generateObject');
    const embedBatch = vi.spyOn(providers.embedding, 'embedBatch');
+    // A re-run mints a brand-new runId/syncId (as a real interrupted ingest
+    // would); resume must still hit the cache via (connectionId, stage, inputHash).
    const second = await runLocalScanEnrichment({
      connectionId: 'warehouse',
      mode: 'enriched',
      detectRelationships: true,
      connector: scanConnector,
-      context: { runId: 'scan-run-resume-1' },
+      context: { runId: 'scan-run-resume-2' },
      providers,
      stateStore,
-      syncId: 'sync-resume-1',
-      providerIdentity: { provider: 'fake', embeddingDimensions: 6 },
+      syncId: 'sync-resume-2',
+      llmIdentity: { model: 'fake', baseUrlConfigured: false },
+      embeddingIdentity: { model: 'fake-embed', dimensions: 6, batchSize: 64 },
    });

    expect(first.state.completedStages).toEqual(['descriptions', 'embeddings', 'relationships']);
@@ -756,6 +829,159 @@ describe('local scan enrichment', () => {
    expect(second.relationships).toEqual(first.relationships);
  });

+  it('marks a budget-truncated relationship stage partial, persists it, and re-runs only when the budget is raised', async () => {
+    const executor = new InMemorySqliteExecutor();
+    try {
+      executor.db.exec(`
+        CREATE TABLE accounts (id INTEGER NOT NULL);
+        CREATE TABLE orders (id INTEGER NOT NULL, account_id INTEGER NOT NULL);
+        INSERT INTO accounts (id) VALUES (1), (2);
+        INSERT INTO orders (id, account_id) VALUES (10, 1), (11, 1), (12, 2);
+      `);
+      const scanConnector = {
+        ...connector(),
+        driver: 'sqlite' as const,
+        capabilities: createKtxConnectorCapabilities({ readOnlySql: true, columnStats: true }),
+        introspect: vi.fn(async () => noDeclaredRelationshipSnapshot()),
+        executeReadOnly: executor.executeReadOnly.bind(executor),
+      };
+      const stateStore = memoryEnrichmentStateStore();
+      const base = Date.parse('2026-06-01T00:00:00.000Z');
+      let calls = 0;
+      // Each read of this clock advances it by one second, so the 1ms budget is
+      // already exhausted at the first table-profile boundary.
+      const advancingNow = () => new Date(base + calls++ * 1000);
+      const tightSettings = {
+        ...buildDefaultKtxProjectConfig().scan.relationships,
+        detectionBudgetMs: 1,
+      };
+
+      const first = await runLocalScanEnrichment({
+        connectionId: 'warehouse',
+        mode: 'relationships',
+        detectRelationships: true,
+        connector: scanConnector,
+        context: { runId: 'budget-run-1' },
+        providers: null,
+        stateStore,
+        syncId: 'sync-budget-1',
+        relationshipSettings: tightSettings,
+        now: advancingNow,
+      });
+
+      expect(first.relationshipPartial).toEqual({ reason: 'budget' });
+      expect(first.warnings.map((warning) => warning.code)).toContain('relationship_detection_partial');
+      expect(first.state.completedStages).toContain('relationships');
+
+      // A re-run with a fresh runId resumes the saved partial from cache.
+      const second = await runLocalScanEnrichment({
+        connectionId: 'warehouse',
+        mode: 'relationships',
+        detectRelationships: true,
+        connector: scanConnector,
+        context: { runId: 'budget-run-2' },
+        providers: null,
+        stateStore,
+        syncId: 'sync-budget-2',
+        relationshipSettings: tightSettings,
+      });
+      expect(second.state.resumedStages).toContain('relationships');
+
+      // Raising the budget changes the content identity, forcing a fuller run.
+      const third = await runLocalScanEnrichment({
+        connectionId: 'warehouse',
+        mode: 'relationships',
+        detectRelationships: true,
+        connector: scanConnector,
+        context: { runId: 'budget-run-3' },
+        providers: null,
+        stateStore,
+        syncId: 'sync-budget-3',
+        relationshipSettings: { ...tightSettings, detectionBudgetMs: 600_000 },
+      });
+      expect(third.state.resumedStages).not.toContain('relationships');
+      expect(third.relationshipPartial).toBeNull();
+    } finally {
+      executor.close();
+    }
+  });
+
+  it('checkpoints descriptions and embeddings before the relationship stage queries the database', async () => {
+    const executor = new InMemorySqliteExecutor();
+    try {
+      executor.db.exec(`
+        CREATE TABLE accounts (id INTEGER NOT NULL);
+        CREATE TABLE orders (id INTEGER NOT NULL, account_id INTEGER NOT NULL);
+        INSERT INTO accounts (id) VALUES (1), (2);
+        INSERT INTO orders (id, account_id) VALUES (10, 1), (11, 1), (12, 2);
+      `);
+      const checkpoints: Array<Awaited<ReturnType<typeof runLocalScanEnrichment>>> = [];
+      let sawRelationshipQuery = false;
+      let relationshipQueryRanAfterCheckpoint = true;
+      const scanConnector = {
+        ...connector(),
+        driver: 'sqlite' as const,
+        capabilities: createKtxConnectorCapabilities({ readOnlySql: true, columnStats: true }),
+        introspect: vi.fn(async () => noDeclaredRelationshipSnapshot()),
+        executeReadOnly: (input: KtxReadOnlyQueryInput, ctx: KtxScanContext) => {
+          sawRelationshipQuery = true;
+          if (checkpoints.length === 0) {
+            relationshipQueryRanAfterCheckpoint = false;
+          }
+          return executor.executeReadOnly(input, ctx);
+        },
+      };
+
+      const result = await runLocalScanEnrichment({
+        connectionId: 'warehouse',
+        mode: 'enriched',
+        detectRelationships: true,
+        connector: scanConnector,
+        context: { runId: 'checkpoint-order' },
+        providers: {
+          ...createDeterministicLocalScanEnrichmentProviders(),
+          embedding: fakeScanEmbedding({ dimensions: 6 }),
+        },
+        onCheckpoint: async (checkpoint) => {
+          checkpoints.push(checkpoint);
+        },
+      });
+
+      expect(checkpoints).toHaveLength(1);
+      const checkpoint = checkpoints[0];
+      if (!checkpoint) {
+        throw new Error('Expected a checkpoint');
+      }
+      expect(checkpoint.summary.tableDescriptions).toBe('completed');
+      expect(checkpoint.summary.embeddings).toBe('completed');
+      expect(checkpoint.descriptionUpdates.length).toBeGreaterThan(0);
+      expect(checkpoint.embeddingUpdates.length).toBeGreaterThan(0);
+      // The relationship-specific outputs are deliberately absent at checkpoint time.
+      expect(checkpoint.relationshipUpdate).toBeNull();
+      expect(checkpoint.relationshipProfile).toBeNull();
+      expect(sawRelationshipQuery).toBe(true);
+      expect(relationshipQueryRanAfterCheckpoint).toBe(true);
+      // The final result still carries the relationship outputs.
+      expect(result.relationshipProfile).not.toBeNull();
+    } finally {
+      executor.close();
+    }
+  });
+
+  it('does not checkpoint when relationship detection is skipped', async () => {
+    const onCheckpoint = vi.fn(async () => {});
+    await runLocalScanEnrichment({
+      connectionId: 'warehouse',
+      mode: 'enriched',
+      connector: connector(),
+      context: { runId: 'no-checkpoint' },
+      providers: createDeterministicLocalScanEnrichmentProviders(),
+      relationshipSettings: { ...buildDefaultKtxProjectConfig().scan.relationships, enabled: false },
+      onCheckpoint,
+    });
+    expect(onCheckpoint).not.toHaveBeenCalled();
+  });
+
  it('does not reuse completed stages when the snapshot changes', async () => {
    const stateStore = memoryEnrichmentStateStore();
    const providers = {
@@ -773,7 +999,8 @@ describe('local scan enrichment', () => {
      providers,
      stateStore,
      syncId: 'sync-resume-hash',
-      providerIdentity: { provider: 'fake', embeddingDimensions: 6 },
+      llmIdentity: { model: 'fake', baseUrlConfigured: false },
+      embeddingIdentity: { model: 'fake-embed', dimensions: 6, batchSize: 64 },
    });

    const firstTable = snapshot.tables[0];
@@ -798,7 +1025,8 @@ describe('local scan enrichment', () => {
      providers,
      stateStore,
      syncId: 'sync-resume-hash',
-      providerIdentity: { provider: 'fake', embeddingDimensions: 6 },
+      llmIdentity: { model: 'fake', baseUrlConfigured: false },
+      embeddingIdentity: { model: 'fake-embed', dimensions: 6, batchSize: 64 },
    });

    expect(result.state.resumedStages).toEqual([]);
@@ -868,4 +1096,653 @@ describe('local scan enrichment', () => {
    }
  });

+  it('merges ai descriptions into the enriched relationship schema', () => {
+    const schema = snapshotToKtxEnrichedSchema(snapshot, new Map(), [
+      {
+        table: { catalog: null, db: 'public', name: 'orders' },
+        tableDescription: 'All customer orders',
+        columnDescriptions: { customer_id: 'FK to the owning customer' },
+      },
+    ]);
+    const orders = schema.tables.find((table) => table.ref.name === 'orders');
+    expect(orders?.descriptions).toMatchObject({ db: 'Customer orders', ai: 'All customer orders' });
+    expect(orders?.columns.find((column) => column.name === 'customer_id')?.descriptions).toMatchObject({
+      db: 'Customer id',
+      ai: 'FK to the owning customer',
+    });
+  });
+
+  it('force-reruns a named stage past the completed-row short-circuit and leaves unselected stages untouched', async () => {
+    const stateStore = memoryEnrichmentStateStore();
+    const scanConnector = connector();
+    const providers = {
+      ...createDeterministicLocalScanEnrichmentProviders(),
+      embedding: fakeScanEmbedding({ dimensions: 6 }),
+    };
+    const identity = {
+      llmIdentity: { model: 'fake', baseUrlConfigured: false },
+      embeddingIdentity: { model: 'fake-embed', dimensions: 6, batchSize: 64 },
+    };
+
+    await runLocalScanEnrichment({
+      connectionId: 'warehouse',
+      mode: 'enriched',
+      detectRelationships: true,
+      connector: scanConnector,
+      context: { runId: 'force-1' },
+      providers,
+      stateStore,
+      syncId: 'force-s1',
+      ...identity,
+    });
+
+    const generateObject = vi.spyOn(providers.llmRuntime, 'generateObject');
+    const embedBatch = vi.spyOn(providers.embedding, 'embedBatch');
+
+    const rerun = await runLocalScanEnrichment({
+      connectionId: 'warehouse',
+      mode: 'enriched',
+      detectRelationships: true,
+      connector: scanConnector,
+      context: { runId: 'force-2' },
+      providers,
+      stateStore,
+      syncId: 'force-s2',
+      stages: ['descriptions'],
+      ...identity,
+    });
+
+    // Only descriptions ran, and it recomputed (not resumed) despite a matching
+    // completed row; embeddings + relationships were left untouched.
+    expect(rerun.state.completedStages).toEqual(['descriptions']);
+    expect(rerun.state.resumedStages).toEqual([]);
+    expect(generateObject).toHaveBeenCalled();
+    expect(embedBatch).not.toHaveBeenCalled();
+  });
+
+  it('naming every stage forces a full recompute rather than a no-op resume', async () => {
+    const stateStore = memoryEnrichmentStateStore();
+    const scanConnector = connector();
+    const providers = {
+      ...createDeterministicLocalScanEnrichmentProviders(),
+      embedding: fakeScanEmbedding({ dimensions: 6 }),
+    };
+    const identity = {
+      llmIdentity: { model: 'fake', baseUrlConfigured: false },
+      embeddingIdentity: { model: 'fake-embed', dimensions: 6, batchSize: 64 },
+    };
+
+    await runLocalScanEnrichment({
+      connectionId: 'warehouse',
+      mode: 'enriched',
+      detectRelationships: true,
+      connector: scanConnector,
+      context: { runId: 'full-1' },
+      providers,
+      stateStore,
+      syncId: 'full-s1',
+      ...identity,
+    });
+
+    const generateObject = vi.spyOn(providers.llmRuntime, 'generateObject');
+    const embedBatch = vi.spyOn(providers.embedding, 'embedBatch');
+
+    const rerun = await runLocalScanEnrichment({
+      connectionId: 'warehouse',
+      mode: 'enriched',
+      detectRelationships: true,
+      connector: scanConnector,
+      context: { runId: 'full-2' },
+      providers,
+      stateStore,
+      syncId: 'full-s2',
+      stages: ['descriptions', 'embeddings', 'relationships'],
+      ...identity,
+    });
+
+    expect(rerun.state.resumedStages).toEqual([]);
+    expect(rerun.state.completedStages).toEqual(['descriptions', 'embeddings', 'relationships']);
+    expect(generateObject).toHaveBeenCalled();
+    expect(embedBatch).toHaveBeenCalled();
+  });
+
+  it('isolates per-stage invalidation: changing the embedding identity re-runs only embeddings', async () => {
+    const stateStore = memoryEnrichmentStateStore();
+    const scanConnector = connector();
+    const providers = {
+      ...createDeterministicLocalScanEnrichmentProviders(),
+      embedding: fakeScanEmbedding({ dimensions: 6 }),
+    };
+    const llmIdentity = { model: 'fake', baseUrlConfigured: false };
+
+    await runLocalScanEnrichment({
+      connectionId: 'warehouse',
+      mode: 'enriched',
+      detectRelationships: true,
+      connector: scanConnector,
+      context: { runId: 'iso-1' },
+      providers,
+      stateStore,
+      syncId: 'iso-s1',
+      llmIdentity,
+      embeddingIdentity: { model: 'embed-v1', dimensions: 6, batchSize: 64 },
+    });
+
+    const generateObject = vi.spyOn(providers.llmRuntime, 'generateObject');
+    const embedBatch = vi.spyOn(providers.embedding, 'embedBatch');
+
+    const rerun = await runLocalScanEnrichment({
+      connectionId: 'warehouse',
+      mode: 'enriched',
+      detectRelationships: true,
+      connector: scanConnector,
+      context: { runId: 'iso-2' },
+      providers,
+      stateStore,
+      syncId: 'iso-s2',
+      llmIdentity,
+      embeddingIdentity: { model: 'embed-v2', dimensions: 6, batchSize: 64 },
+    });
+
+    // Only the embeddings hash moved: descriptions + relationships resume from
+    // cache, embeddings recompute. No LLM description/proposal calls fire.
+    expect(rerun.state.resumedStages).toEqual(['descriptions', 'relationships']);
+    expect(rerun.state.completedStages).toEqual(['descriptions', 'embeddings', 'relationships']);
+    expect(generateObject).not.toHaveBeenCalled();
+    expect(embedBatch).toHaveBeenCalled();
+  });
+
+  it('warns when a selected stage cannot run because its prerequisite is missing', async () => {
+    const result = await runLocalScanEnrichment({
+      connectionId: 'warehouse',
+      mode: 'enriched',
+      detectRelationships: false,
+      connector: connector(),
+      context: { runId: 'prereq-1' },
+      // No embedding provider configured.
+      providers: createDeterministicLocalScanEnrichmentProviders(),
+      stages: ['embeddings'],
+      llmIdentity: { model: 'fake', baseUrlConfigured: false },
+    });
+
+    expect(result.summary.embeddings).toBe('skipped');
+    expect(result.warnings).toContainEqual(
+      expect.objectContaining({ code: 'enrichment_stage_skipped', metadata: { stage: 'embeddings' } }),
+    );
+  });
+
+  it('feeds on-disk descriptions into the llmProposals prompt on a relationships-only run', async () => {
+    const executor = new InMemorySqliteExecutor();
+    try {
+      executor.db.exec(`
+        CREATE TABLE accounts (id INTEGER NOT NULL);
+        CREATE TABLE orders (id INTEGER NOT NULL, account_id INTEGER NOT NULL);
+        INSERT INTO accounts (id) VALUES (1), (2);
+        INSERT INTO orders (id, account_id) VALUES (10, 1), (11, 1), (12, 2);
+      `);
+      const scanConnector = {
+        ...connector(),
+        driver: 'sqlite' as const,
+        capabilities: createKtxConnectorCapabilities({ readOnlySql: true, columnStats: true }),
+        introspect: vi.fn(async () => noDeclaredRelationshipSnapshot()),
+        executeReadOnly: executor.executeReadOnly.bind(executor),
+      };
+      const providers = createDeterministicLocalScanEnrichmentProviders();
+      const generateObject = vi.spyOn(providers.llmRuntime, 'generateObject');
+      const onDiskDescriptions: Array<{
+        table: { catalog: null; db: null; name: string };
+        tableDescription: string | null;
+        columnDescriptions: Record<string, string | null>;
+      }> = [
+        {
+          table: { catalog: null, db: null, name: 'orders' },
+          tableDescription: 'Customer purchase orders',
+          columnDescriptions: { id: 'Order identifier', account_id: 'The owning account reference' },
+        },
+        {
+          table: { catalog: null, db: null, name: 'accounts' },
+          tableDescription: 'Account records',
+          columnDescriptions: { id: 'Account identifier' },
+        },
+      ];
+
+      await runLocalScanEnrichment({
+        connectionId: 'warehouse',
+        mode: 'enriched',
+        detectRelationships: true,
+        connector: scanConnector,
+        context: { runId: 'rel-only-hydration' },
+        providers,
+        stages: ['relationships'],
+        llmIdentity: { model: 'fake', baseUrlConfigured: false },
+        loadPriorDescriptions: async () => onDiskDescriptions,
+      });
+
+      // The relationship-proposal prompt (the only generateObject calls on a
+      // relationships-only run) carries the on-disk descriptions, not just names.
+      const prompts = generateObject.mock.calls.map((call) => String((call[0] as { prompt: string }).prompt));
+      expect(prompts.length).toBeGreaterThan(0);
+      expect(prompts.some((prompt) => prompt.includes('The owning account reference'))).toBe(true);
+    } finally {
+      executor.close();
+    }
+  });
+
+  it('resume record still skips already-enriched tables when a forced descriptions rerun re-enters compute', async () => {
+    const stateStore = memoryEnrichmentStateStore();
+    const scanConnector = connector();
+    const providers = createDeterministicLocalScanEnrichmentProviders();
+    const identity = { llmIdentity: { model: 'fake', baseUrlConfigured: false } };
+    const resumeStore = {
+      load: vi.fn(async () => [
+        {
+          table: { catalog: null, db: 'public', name: 'customers' },
+          tableDescription: 'Recovered customers description',
+          columnDescriptions: { id: 'Recovered id' },
+        },
+      ]),
+      flush: vi.fn(async () => {}),
+    };
+
+    // Populate a completed descriptions row so a non-forced run would short-circuit.
+    await runLocalScanEnrichment({
+      connectionId: 'warehouse',
+      mode: 'enriched',
+      detectRelationships: false,
+      connector: scanConnector,
+      context: { runId: 'resume-force-1' },
+      providers,
+      stateStore,
+      syncId: 'resume-force-s1',
+      ...identity,
+    });
+
+    const generateObject = vi.spyOn(providers.llmRuntime, 'generateObject');
+    const rerun = await runLocalScanEnrichment({
+      connectionId: 'warehouse',
+      mode: 'enriched',
+      detectRelationships: false,
+      connector: scanConnector,
+      context: { runId: 'resume-force-2' },
+      providers,
+      stateStore,
+      syncId: 'resume-force-s2',
+      stages: ['descriptions'],
+      descriptionResumeStore: resumeStore,
+      ...identity,
+    });
+
+    // Forced compute re-entered, consulted the resume record, recovered
+    // 'customers', and only re-issued the LLM for the un-recovered 'orders'.
+    expect(resumeStore.load).toHaveBeenCalled();
+    expect(generateObject).toHaveBeenCalledTimes(1);
+    expect(rerun.descriptionUpdates.find((update) => update.table.name === 'customers')?.tableDescription).toBe(
+      'Recovered customers description',
+    );
+    expect(rerun.state.resumedStages).toEqual([]);
+  });
+
+  it('resumes per table identity, re-enriching a same-named table in another schema', async () => {
+    const multiSchemaSnapshot: KtxSchemaSnapshot = {
+      connectionId: 'warehouse',
+      driver: 'postgres',
+      extractedAt: '2026-04-29T12:00:00.000Z',
+      scope: { schemas: ['analytics', 'staging'] },
+      metadata: {},
+      tables: ['analytics', 'staging'].map((schema) => ({
+        catalog: null,
+        db: schema,
+        name: 'orders',
+        kind: 'table',
+        comment: null,
+        estimatedRows: 1,
+        foreignKeys: [],
+        columns: [
+          {
+            name: 'id',
+            nativeType: 'integer',
+            normalizedType: 'integer',
+            dimensionType: 'number',
+            nullable: false,
+            primaryKey: true,
+            comment: null,
+          },
+        ],
+      })),
+    };
+    const scanConnector = connector();
+    const providers = createDeterministicLocalScanEnrichmentProviders();
+    const generateObject = vi.spyOn(providers.llmRuntime, 'generateObject');
+    // Only the analytics.orders description was flushed before the interruption.
+    const resumeStore = {
+      load: vi.fn(async () => [
+        {
+          table: { catalog: null, db: 'analytics', name: 'orders' },
+          tableDescription: 'Recovered analytics orders',
+          columnDescriptions: { id: 'Recovered analytics id' },
+        },
+      ]),
+      flush: vi.fn(async () => {}),
+    };
+
+    const result = await runLocalScanEnrichment({
+      connectionId: 'warehouse',
+      mode: 'enriched',
+      detectRelationships: false,
+      connector: scanConnector,
+      snapshot: multiSchemaSnapshot,
+      context: { runId: 'resume-identity' },
+      providers,
+      descriptionResumeStore: resumeStore,
+      relationshipSettings: { ...buildDefaultKtxProjectConfig().scan.relationships, enabled: false },
+    });
+
+    // staging.orders is not recovered (different identity), so it is re-enriched
+    // exactly once; analytics.orders keeps its recovered description.
+    expect(generateObject).toHaveBeenCalledTimes(1);
+    const analytics = result.descriptionUpdates.find((update) => update.table.db === 'analytics');
+    const staging = result.descriptionUpdates.find((update) => update.table.db === 'staging');
+    expect(analytics?.tableDescription).toBe('Recovered analytics orders');
+    expect(staging?.tableDescription).not.toBe('Recovered analytics orders');
+    expect(staging?.tableDescription).toBeTruthy();
+  });
+
+  it('flags an unselected stage stale when its inputs changed, names the cascade, and clears after re-running it', async () => {
+    const stateStore = memoryEnrichmentStateStore();
+    const scanConnector = connector();
+    const providers = {
+      ...createDeterministicLocalScanEnrichmentProviders(),
+      embedding: fakeScanEmbedding({ dimensions: 6 }),
+    };
+    const llmIdentity = { model: 'fake', baseUrlConfigured: false };
+    const embeddingV1 = { model: 'embed-v1', dimensions: 6, batchSize: 64 };
+    const embeddingV2 = { model: 'embed-v2', dimensions: 6, batchSize: 64 };
+
+    // Full run captures embeddings + relationships keyed on the v1 embedding model.
+    const full = await runLocalScanEnrichment({
+      connectionId: 'warehouse',
+      mode: 'enriched',
+      detectRelationships: true,
+      connector: scanConnector,
+      context: { runId: 'stale-1' },
+      providers,
+      stateStore,
+      syncId: 'stale-s1',
+      llmIdentity,
+      embeddingIdentity: embeddingV1,
+    });
+    // Stand in for the persisted _schema so embeddings-only runs see the same
+    // descriptions the descriptions stage produces (deterministic content).
+    const loadPriorDescriptions = async () => full.descriptionUpdates;
+
+    // The embedding model changed in config, but the operator re-ran only descriptions.
+    const reDescribe = await runLocalScanEnrichment({
+      connectionId: 'warehouse',
+      mode: 'enriched',
+      detectRelationships: true,
+      connector: scanConnector,
+      context: { runId: 'stale-2' },
+      providers,
+      stateStore,
+      syncId: 'stale-s2',
+      stages: ['descriptions'],
+      loadPriorDescriptions,
+      llmIdentity,
+      embeddingIdentity: embeddingV2,
+    });
+    const stale = reDescribe.warnings.filter((warning) => warning.code === 'enrichment_stage_stale');
+    expect(stale.map((warning) => warning.metadata?.stage)).toEqual(['embeddings']);
+    expect(stale[0]?.message).toContain('--stages embeddings');
+
+    // Re-embedding on v2 stores the fresh embeddings hash, clearing the staleness.
+    await runLocalScanEnrichment({
+      connectionId: 'warehouse',
+      mode: 'enriched',
+      detectRelationships: true,
+      connector: scanConnector,
+      context: { runId: 'stale-3' },
+      providers,
+      stateStore,
+      syncId: 'stale-s3',
+      stages: ['embeddings'],
+      loadPriorDescriptions,
+      llmIdentity,
+      embeddingIdentity: embeddingV2,
+    });
+    const afterReembed = await runLocalScanEnrichment({
+      connectionId: 'warehouse',
+      mode: 'enriched',
+      detectRelationships: true,
+      connector: scanConnector,
+      context: { runId: 'stale-4' },
+      providers,
+      stateStore,
+      syncId: 'stale-s4',
+      stages: ['descriptions'],
+      loadPriorDescriptions,
+      llmIdentity,
+      embeddingIdentity: embeddingV2,
+    });
+    expect(afterReembed.warnings.filter((warning) => warning.code === 'enrichment_stage_stale')).toEqual([]);
+  });
+
+  const enrichedFixtureSnapshot = (): KtxSchemaSnapshot => ({
+    connectionId: 'warehouse',
+    driver: 'sqlite',
+    extractedAt: '2026-05-07T00:00:00.000Z',
+    scope: {},
+    metadata: {},
+    tables: [
+      {
+        catalog: null,
+        db: null,
+        name: 'accounts',
+        kind: 'table',
+        comment: 'DB accounts',
+        estimatedRows: 2,
+        foreignKeys: [],
+        columns: [
+          {
+            name: 'id',
+            nativeType: 'INTEGER',
+            normalizedType: 'integer',
+            dimensionType: 'number',
+            nullable: false,
+            primaryKey: false,
+            comment: 'DB accounts id',
+          },
+        ],
+      },
+      {
+        catalog: null,
+        db: null,
+        name: 'orders',
+        kind: 'table',
+        comment: 'DB orders',
+        estimatedRows: 3,
+        foreignKeys: [],
+        columns: [
+          {
+            name: 'id',
+            nativeType: 'INTEGER',
+            normalizedType: 'integer',
+            dimensionType: 'number',
+            nullable: false,
+            primaryKey: false,
+            comment: 'DB orders id',
+          },
+          {
+            name: 'account_id',
+            nativeType: 'INTEGER',
+            normalizedType: 'integer',
+            dimensionType: 'number',
+            nullable: false,
+            primaryKey: false,
+            comment: 'DB account ref',
+          },
+        ],
+      },
+    ],
+  });
+
+  const countKeyOccurrences = (text: string, key: string): number =>
+    (text.match(new RegExp(`\\b${key}:`, 'g')) ?? []).length;
+
+  // Regression (spec 21 defect, 2026-06-24): a --stages subset that omits a stage
+  // must not delete that stage's on-disk artifacts from the written _schema.
+  it('a --stages relationships run preserves on-disk descriptions while adding joins', async () => {
+    const tempDir = await mkdtemp(join(tmpdir(), 'ktx-stage-preserve-rel-'));
+    const executor = new InMemorySqliteExecutor();
+    try {
+      executor.db.exec(`
+        CREATE TABLE accounts (id INTEGER NOT NULL);
+        CREATE TABLE orders (id INTEGER NOT NULL, account_id INTEGER NOT NULL);
+        INSERT INTO accounts (id) VALUES (1), (2);
+        INSERT INTO orders (id, account_id) VALUES (10, 1), (11, 1), (12, 2);
+      `);
+      const project = await initKtxProject({ projectDir: join(tempDir, 'project') });
+      const shardPath = 'semantic-layer/warehouse/_schema/public.yaml';
+      // Enriched fixture: full ai + db descriptions, zero joins.
+      await project.fileStore.writeFile(
+        shardPath,
+        YAML.stringify(
+          {
+            tables: {
+              accounts: {
+                table: 'accounts',
+                descriptions: { ai: 'AI accounts table', db: 'DB accounts' },
+                columns: [{ name: 'id', type: 'number', descriptions: { ai: 'AI accounts id', db: 'DB accounts id' } }],
+              },
+              orders: {
+                table: 'orders',
+                descriptions: { ai: 'AI orders table', db: 'DB orders' },
+                columns: [
+                  { name: 'id', type: 'number', descriptions: { ai: 'AI orders id', db: 'DB orders id' } },
+                  { name: 'account_id', type: 'number', descriptions: { ai: 'AI account ref', db: 'DB account ref' } },
+                ],
+              },
+            },
+          },
+          { indent: 2, lineWidth: 0 },
+        ),
+        'ktx',
+        'ktx@example.com',
+        'Seed enriched fixture',
+      );
+      const before = await readFile(join(project.projectDir, shardPath), 'utf-8');
+      const aiBefore = countKeyOccurrences(before, 'ai');
+      const dbBefore = countKeyOccurrences(before, 'db');
+      expect(aiBefore).toBeGreaterThan(0);
+
+      const scanConnector = {
+        ...connector(),
+        driver: 'sqlite' as const,
+        capabilities: createKtxConnectorCapabilities({ readOnlySql: true, columnStats: true }),
+        introspect: vi.fn(async () => enrichedFixtureSnapshot()),
+        executeReadOnly: executor.executeReadOnly.bind(executor),
+      };
+      const result = await runLocalScanEnrichment({
+        connectionId: 'warehouse',
+        mode: 'enriched',
+        detectRelationships: true,
+        connector: scanConnector,
+        context: { runId: 'preserve-rel-1' },
+        providers: createDeterministicLocalScanEnrichmentProviders(),
+        stages: ['relationships'],
+        syncId: 'sync-preserve-rel',
+        loadPriorDescriptions: (snap) => loadOnDiskDescriptionUpdates(project, 'warehouse', snap),
+      });
+      await writeLocalScanEnrichmentArtifacts({
+        project,
+        connectionId: 'warehouse',
+        syncId: 'sync-preserve-rel',
+        driver: 'sqlite',
+        enrichment: result,
+        dryRun: false,
+      });
+
+      const after = await readFile(join(project.projectDir, shardPath), 'utf-8');
+      // Every prior ai:/db: description survived the relationships-only run...
+      expect(countKeyOccurrences(after, 'ai')).toBe(aiBefore);
+      expect(countKeyOccurrences(after, 'db')).toBe(dbBefore);
+      expect(after).toContain('AI orders table');
+      expect(after).toContain('AI account ref');
+      // ...and the relationships stage actually added joins (it was 0 before).
+      expect(result.relationships.accepted).toBeGreaterThan(0);
+      const shard = YAML.parse(after) as { tables: Record<string, { joins?: unknown[] }> };
+      expect(Object.values(shard.tables).some((table) => (table.joins ?? []).length > 0)).toBe(true);
+    } finally {
+      executor.close();
+      await rm(tempDir, { recursive: true, force: true });
+    }
+  });
+
+  it('a --stages descriptions run preserves on-disk joins while refreshing descriptions', async () => {
+    const tempDir = await mkdtemp(join(tmpdir(), 'ktx-stage-preserve-desc-'));
+    try {
+      const project = await initKtxProject({ projectDir: join(tempDir, 'project') });
+      const shardPath = 'semantic-layer/warehouse/_schema/public.yaml';
+      // Fixture: an inferred join present, descriptions absent.
+      await project.fileStore.writeFile(
+        shardPath,
+        YAML.stringify(
+          {
+            tables: {
+              accounts: { table: 'accounts', columns: [{ name: 'id', type: 'number' }] },
+              orders: {
+                table: 'orders',
+                columns: [
+                  { name: 'id', type: 'number' },
+                  { name: 'account_id', type: 'number' },
+                ],
+                joins: [
+                  { to: 'accounts', on: 'orders.account_id = accounts.id', relationship: 'many_to_one', source: 'inferred' },
+                ],
+              },
+            },
+          },
+          { indent: 2, lineWidth: 0 },
+        ),
+        'ktx',
+        'ktx@example.com',
+        'Seed joins fixture',
+      );
+
+      const scanConnector = {
+        ...connector(),
+        driver: 'sqlite' as const,
+        introspect: vi.fn(async () => enrichedFixtureSnapshot()),
+      };
+      const result = await runLocalScanEnrichment({
+        connectionId: 'warehouse',
+        mode: 'enriched',
+        detectRelationships: true,
+        connector: scanConnector,
+        context: { runId: 'preserve-desc-1' },
+        providers: createDeterministicLocalScanEnrichmentProviders(),
+        stages: ['descriptions'],
+        syncId: 'sync-preserve-desc',
+        loadPriorDescriptions: (snap) => loadOnDiskDescriptionUpdates(project, 'warehouse', snap),
+      });
+      await writeLocalScanEnrichmentArtifacts({
+        project,
+        connectionId: 'warehouse',
+        syncId: 'sync-preserve-desc',
+        driver: 'sqlite',
+        enrichment: result,
+        dryRun: false,
+      });
+
+      const after = await readFile(join(project.projectDir, shardPath), 'utf-8');
+      const shard = YAML.parse(after) as {
+        tables: Record<string, { joins?: Array<{ to: string; source: string }> }>;
+      };
+      // The inferred join survived the descriptions-only run...
+      expect(shard.tables.orders?.joins?.some((entry) => entry.to === 'accounts' && entry.source === 'inferred')).toBe(true);
+      // ...and the descriptions stage (re)wrote ai descriptions.
+      expect(countKeyOccurrences(after, 'ai')).toBeGreaterThan(0);
+    } finally {
+      await rm(tempDir, { recursive: true, force: true });
+    }
+  });
 });
--- a/packages/cli/test/context/scan/local-scan.test.ts
+++ b/packages/cli/test/context/scan/local-scan.test.ts
@@ -96,6 +96,7 @@ function deterministicLlmRuntime(): KtxLlmRuntimePort {
    generateText: vi.fn(async (input) => `Deterministic description for ${input.prompt.slice(0, 64).trim() || 'data source'}`),
    generateObject: vi.fn(async () => ({ pkCandidates: [], fkCandidates: [] }) as never),
    runAgentLoop: vi.fn(),
+    subprocessForkSpec: () => null,
  };
 }

@@ -1672,6 +1673,111 @@ describe('local scan', () => {
    expect(persistedReport).toContain('embedding service timed out');
  });

+  it('keeps AI descriptions in the queryable _schema when the relationship stage fails after enrichment', async () => {
+    // Durability: the paid descriptions are checkpointed into the queryable
+    // manifest before relationship detection runs, so a relationship-stage
+    // failure degrades to "no joins", never "no descriptions".
+    project.config.scan.enrichment = { mode: 'deterministic' };
+    const connector = {
+      id: 'test:warehouse',
+      driver: 'postgres' as const,
+      capabilities: {
+        structuralIntrospection: true as const,
+        tableSampling: true,
+        columnSampling: true,
+        columnStats: true,
+        readOnlySql: true,
+        nestedAnalysis: false,
+        eventStreamDiscovery: false,
+        formalForeignKeys: false,
+        estimatedRowCounts: true,
+      },
+      ...connectorScopeListing,
+      async introspect() {
+        return {
+          connectionId: 'warehouse',
+          driver: 'postgres' as const,
+          extractedAt: '2026-04-29T09:00:00.000Z',
+          scope: { schemas: ['public'] },
+          metadata: {},
+          tables: [
+            {
+              catalog: null,
+              db: 'public',
+              name: 'customers',
+              kind: 'table' as const,
+              comment: 'Customer accounts',
+              estimatedRows: 100,
+              foreignKeys: [],
+              columns: [
+                {
+                  name: 'id',
+                  nativeType: 'integer',
+                  normalizedType: 'integer',
+                  dimensionType: 'number' as const,
+                  nullable: false,
+                  primaryKey: true,
+                  comment: 'Customer id',
+                },
+              ],
+            },
+            {
+              catalog: null,
+              db: 'public',
+              name: 'orders',
+              kind: 'table' as const,
+              comment: 'Customer orders',
+              estimatedRows: 1000,
+              foreignKeys: [],
+              columns: [
+                {
+                  name: 'customer_id',
+                  nativeType: 'integer',
+                  normalizedType: 'integer',
+                  dimensionType: 'number' as const,
+                  nullable: false,
+                  primaryKey: false,
+                  comment: 'Owning customer',
+                },
+              ],
+            },
+          ],
+        };
+      },
+      async sampleTable() {
+        return { headers: ['id'], rows: [[1]], totalRows: 1 };
+      },
+      async sampleColumn() {
+        return { values: ['1'], nullCount: 0, distinctCount: 1 };
+      },
+      // Profiling succeeds; the coverage probe in the relationship stage throws,
+      // standing in for a relationship-stage interruption after enrichment.
+      async executeReadOnly(input: KtxReadOnlyQueryInput) {
+        return relationshipSqlResult(input, { throwOnCoverage: true });
+      },
+    };
+
+    const result = await runLocalScan({
+      project,
+      adapters: [fetchOnlyAdapter({ snapshot: await connector.introspect() })],
+      connectionId: 'warehouse',
+      mode: 'enriched',
+      detectRelationships: true,
+      connector,
+      jobId: 'scan-checkpoint-durability-1',
+      now: () => new Date('2026-04-29T09:20:00.000Z'),
+    });
+
+    expect(result.report.warnings.map((warning) => warning.code)).toContain('enrichment_failed');
+
+    const manifestRaw = await readFile(
+      join(project.projectDir, 'semantic-layer/warehouse/_schema/public.yaml'),
+      'utf-8',
+    );
+    expect(manifestRaw).toContain('ai: |-');
+    expect(manifestRaw).toContain('Deterministic description');
+  });
+
  it('resumes completed local enrichment stages when an enriched scan run is retried', async () => {
    let embeddingAttempts = 0;
    const connector = {
@@ -1928,6 +2034,147 @@ describe('local scan', () => {
      'raw-sources/warehouse/live-database/2026-04-29-160000-scan-run-sqlserver/scan-report.json',
    );
  });
+
+  // Regression (spec 21 defect, 2026-06-24): the structural manifest write that runs
+  // BEFORE enrichment must not let a `--stages` subset delete the prior on-disk
+  // descriptions. This goes through the full runLocalScan path (the unit-level
+  // enrichment test could not catch the structural-pre-write ordering).
+  it('a --stages relationships scan preserves on-disk descriptions while adding joins', async () => {
+    const snapshot: KtxSchemaSnapshot = {
+      connectionId: 'warehouse',
+      driver: 'postgres',
+      extractedAt: '2026-05-07T09:00:00.000Z',
+      scope: {},
+      metadata: {},
+      tables: [
+        {
+          catalog: null,
+          db: null,
+          name: 'accounts',
+          kind: 'table',
+          comment: null,
+          estimatedRows: 2,
+          foreignKeys: [],
+          columns: [
+            {
+              name: 'id',
+              nativeType: 'integer',
+              normalizedType: 'integer',
+              dimensionType: 'number',
+              nullable: false,
+              primaryKey: false,
+              comment: null,
+            },
+          ],
+        },
+        {
+          catalog: null,
+          db: null,
+          name: 'orders',
+          kind: 'table',
+          comment: null,
+          estimatedRows: 3,
+          foreignKeys: [],
+          columns: [
+            {
+              name: 'id',
+              nativeType: 'integer',
+              normalizedType: 'integer',
+              dimensionType: 'number',
+              nullable: false,
+              primaryKey: false,
+              comment: null,
+            },
+            {
+              name: 'account_id',
+              nativeType: 'integer',
+              normalizedType: 'integer',
+              dimensionType: 'number',
+              nullable: false,
+              primaryKey: false,
+              comment: null,
+            },
+          ],
+        },
+      ],
+    };
+    // Enriched fixture already on disk: ai descriptions, zero joins.
+    await project.fileStore.writeFile(
+      'semantic-layer/warehouse/_schema/public.yaml',
+      YAML.stringify(
+        {
+          tables: {
+            accounts: {
+              table: 'accounts',
+              descriptions: { ai: 'AI accounts table' },
+              columns: [{ name: 'id', type: 'number', descriptions: { ai: 'AI accounts id' } }],
+            },
+            orders: {
+              table: 'orders',
+              descriptions: { ai: 'AI orders table' },
+              columns: [
+                { name: 'id', type: 'number', descriptions: { ai: 'AI orders id' } },
+                { name: 'account_id', type: 'number', descriptions: { ai: 'AI account ref' } },
+              ],
+            },
+          },
+        },
+        { indent: 2, lineWidth: 0 },
+      ),
+      'ktx',
+      'ktx@example.com',
+      'Seed enriched fixture',
+    );
+    const shardPath = 'semantic-layer/warehouse/_schema/public.yaml';
+    const aiBefore = ((await project.fileStore.readFile(shardPath)).content.match(/\bai:/g) ?? []).length;
+    expect(aiBefore).toBe(5);
+
+    const connector: KtxScanConnector = {
+      id: 'test:warehouse',
+      driver: 'postgres',
+      capabilities: {
+        structuralIntrospection: true,
+        tableSampling: false,
+        columnSampling: false,
+        columnStats: true,
+        readOnlySql: true,
+        nestedAnalysis: false,
+        eventStreamDiscovery: false,
+        formalForeignKeys: false,
+        estimatedRowCounts: true,
+      },
+      ...connectorScopeListing,
+      introspect: vi.fn(async () => snapshot),
+      async executeReadOnly(input: KtxReadOnlyQueryInput) {
+        return relationshipSqlResult(input);
+      },
+    };
+
+    const result = await runLocalScan({
+      project,
+      adapters: [fetchOnlyAdapter({ snapshot })],
+      connectionId: 'warehouse',
+      mode: 'enriched',
+      detectRelationships: true,
+      stages: ['relationships'],
+      connector,
+      enrichmentProviders: { llmRuntime: deterministicLlmRuntime() },
+      jobId: 'scan-stages-relationships-preserve',
+      now: () => new Date('2026-05-07T09:30:00.000Z'),
+    });
+
+    // The relationships stage actually ran and accepted a join...
+    expect(result.report.relationships.accepted).toBe(1);
+    const after = (await project.fileStore.readFile(shardPath)).content;
+    // ...and every prior ai description survived the structural + enrichment writes.
+    expect((after.match(/\bai:/g) ?? []).length).toBe(aiBefore);
+    expect(after).toContain('AI orders table');
+    expect(after).toContain('AI account ref');
+    const manifest = YAML.parse(after) as {
+      tables: Record<string, { joins?: Array<{ to: string; source: string }> }>;
+    };
+    expect(manifest.tables.orders?.joins?.some((join) => join.to === 'accounts')).toBe(true);
+  });
 });

 describe('resolveEnabledTables', () => {
--- a/packages/cli/test/context/scan/object-introspection.test.ts
+++ b/packages/cli/test/context/scan/object-introspection.test.ts
@@ -0,0 +1,47 @@
+import { describe, expect, it } from 'vitest';
+import { tryIntrospectObject } from '../../../src/context/scan/object-introspection.js';
+
+describe('tryIntrospectObject', () => {
+  it('returns the read value when introspection succeeds', async () => {
+    await expect(tryIntrospectObject({ object: 'customers' }, () => ({ name: 'customers' }))).resolves.toEqual({
+      ok: true,
+      table: { name: 'customers' },
+    });
+  });
+
+  it('skips with a recoverable warning when the object read throws', async () => {
+    const outcome = await tryIntrospectObject({ object: 'broken_view', db: 'main' }, () => {
+      throw new Error('no such column: ehp.start_date');
+    });
+
+    expect(outcome).toEqual({
+      ok: false,
+      warning: {
+        code: 'object_introspection_failed',
+        message: 'no such column: ehp.start_date',
+        table: 'broken_view',
+        recoverable: true,
+        metadata: { object: 'main.broken_view', db: 'main' },
+      },
+    });
+  });
+
+  it('rethrows native programming faults instead of masking them as object skips', async () => {
+    await expect(
+      tryIntrospectObject({ object: 'customers' }, () => {
+        throw new TypeError('cannot read properties of undefined');
+      }),
+    ).rejects.toBeInstanceOf(TypeError);
+  });
+
+  it('builds a fully-qualified object label for warehouse objects', async () => {
+    const outcome = await tryIntrospectObject({ object: 'orders', db: 'sales', catalog: 'warehouse' }, () => {
+      throw new Error('permission denied');
+    });
+    expect(outcome.ok).toBe(false);
+    if (!outcome.ok) {
+      expect(outcome.warning.table).toBe('orders');
+      expect(outcome.warning.metadata).toEqual({ object: 'warehouse.sales.orders', db: 'sales', catalog: 'warehouse' });
+    }
+  });
+});
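
Reviewer note: the contract these tests pin down can be sketched as follows. This is a minimal illustration inferred from the assertions above, not the code in `object-introspection.ts`. The name `tryIntrospectObjectSketch`, the helper `isProgrammingFault`, and the exact set of error classes treated as programming faults are assumptions; the tests only confirm that `TypeError` is rethrown.

```typescript
type ObjectRef = { object: string; db?: string | null; catalog?: string | null };

type IntrospectionWarning = {
  code: 'object_introspection_failed';
  message: string;
  table: string;
  recoverable: true;
  metadata: { object: string; db?: string; catalog?: string };
};

type IntrospectionOutcome<T> =
  | { ok: true; table: T }
  | { ok: false; warning: IntrospectionWarning };

// Assumption: these built-in classes indicate a bug in our own code, not a
// broken database object, so they must not be downgraded to a skip.
function isProgrammingFault(error: unknown): boolean {
  return (
    error instanceof TypeError ||
    error instanceof ReferenceError ||
    error instanceof RangeError ||
    error instanceof SyntaxError
  );
}

async function tryIntrospectObjectSketch<T>(
  ref: ObjectRef,
  read: () => T | Promise<T>,
): Promise<IntrospectionOutcome<T>> {
  try {
    return { ok: true, table: await read() };
  } catch (error) {
    if (isProgrammingFault(error)) throw error;
    // Build "catalog.db.object", dropping any missing qualifier.
    const parts = [ref.catalog, ref.db, ref.object].filter((part): part is string => Boolean(part));
    return {
      ok: false,
      warning: {
        code: 'object_introspection_failed',
        message: error instanceof Error ? error.message : String(error),
        table: ref.object,
        recoverable: true,
        metadata: {
          object: parts.join('.'),
          ...(ref.db ? { db: ref.db } : {}),
          ...(ref.catalog ? { catalog: ref.catalog } : {}),
        },
      },
    };
  }
}
```

Under this sketch a broken view degrades to one recoverable warning while the remaining objects can still be scanned, and a genuine coding error still fails loudly.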
--- a/packages/cli/test/context/scan/relationship-detection-budget.test.ts
+++ b/packages/cli/test/context/scan/relationship-detection-budget.test.ts
@@ -0,0 +1,72 @@
+import { describe, expect, it } from 'vitest';
+import {
+  createKtxRelationshipDetectionBudget,
+  mapWithBudget,
+} from '../../../src/context/scan/relationship-detection-budget.js';
+
+describe('relationship detection budget', () => {
+  it('reports no stop while inside the wall-clock budget', () => {
+    let clock = 1000;
+    const budget = createKtxRelationshipDetectionBudget({ budgetMs: 500, now: () => clock });
+    expect(budget.check()).toBeNull();
+    clock = 1400;
+    expect(budget.check()).toBeNull();
+    expect(budget.stopReason()).toBeNull();
+  });
+
+  it('trips on budget exhaustion and records it stickily', () => {
+    let clock = 0;
+    const budget = createKtxRelationshipDetectionBudget({ budgetMs: 100, now: () => clock });
+    clock = 150;
+    expect(budget.check()).toBe('budget');
+    // Even after a notional clock rewind the recorded reason persists.
+    clock = 10;
+    expect(budget.stopReason()).toBe('budget');
+  });
+
+  it('prefers abort over budget when the signal fires', () => {
+    const controller = new AbortController();
+    let clock = 0;
+    const budget = createKtxRelationshipDetectionBudget({
+      budgetMs: 1_000,
+      signal: controller.signal,
+      now: () => clock,
+    });
+    expect(budget.check()).toBeNull();
+    controller.abort();
+    expect(budget.check()).toBe('aborted');
+    expect(budget.stopReason()).toBe('aborted');
+  });
+
+  it('maps every item and stays unmarked when the budget is never exhausted', async () => {
+    const budget = createKtxRelationshipDetectionBudget({ budgetMs: 1_000, now: () => 0 });
+    const { results, processedCount } = await mapWithBudget({
+      inputs: [1, 2, 3, 4],
+      concurrency: 2,
+      budget,
+      mapOne: async (value) => value * 10,
+    });
+    expect(processedCount).toBe(4);
+    expect(results).toEqual([10, 20, 30, 40]);
+    expect(budget.stopReason()).toBeNull();
+  });
+
+  it('stops claiming new items once the budget trips and leaves the rest undefined', async () => {
+    let clock = 0;
+    const budget = createKtxRelationshipDetectionBudget({ budgetMs: 25, now: () => clock });
+    const started: number[] = [];
+    const { results, processedCount } = await mapWithBudget({
+      inputs: [0, 1, 2, 3, 4],
+      concurrency: 1,
+      budget,
+      onStart: (index) => {
+        started.push(index);
+        clock += 10; // each unit advances the clock; the budget elapses partway through
+      },
+      mapOne: async (value) => value,
+    });
+    expect(processedCount).toBeLessThan(5);
+    expect(results.slice(processedCount).every((value) => value === undefined)).toBe(true);
+    expect(budget.stopReason()).toBe('budget');
+  });
+});
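
Reviewer note: the sticky budget and the budget-aware mapper exercised above could look roughly like this. It is a sketch under stated assumptions, not the contents of `relationship-detection-budget.ts`. The names `createBudgetSketch` and `mapWithBudgetSketch` are hypothetical, the `>=` comparison at the exact budget boundary is a guess the tests do not settle, and the sketch assumes in-flight items are allowed to finish once the budget trips.

```typescript
type StopReason = 'budget' | 'aborted';

interface DetectionBudget {
  check(): StopReason | null;
  stopReason(): StopReason | null;
}

function createBudgetSketch(opts: {
  budgetMs: number;
  signal?: { readonly aborted: boolean }; // structurally compatible with AbortSignal
  now?: () => number;
}): DetectionBudget {
  const now = opts.now ?? Date.now;
  const startedAt = now();
  let reason: StopReason | null = null;
  return {
    check() {
      // Sticky: once a reason is recorded it is never cleared or replaced.
      if (reason !== null) return reason;
      if (opts.signal?.aborted) reason = 'aborted';
      else if (now() - startedAt >= opts.budgetMs) reason = 'budget';
      return reason;
    },
    stopReason: () => reason,
  };
}

async function mapWithBudgetSketch<T, R>(opts: {
  inputs: readonly T[];
  concurrency: number;
  budget: DetectionBudget;
  mapOne: (value: T, index: number) => Promise<R>;
  onStart?: (index: number) => void;
}): Promise<{ results: Array<R | undefined>; processedCount: number }> {
  const results: Array<R | undefined> = new Array(opts.inputs.length).fill(undefined);
  let next = 0;
  let processedCount = 0;
  const worker = async (): Promise<void> => {
    // The budget is consulted before each claim; claimed items always finish.
    while (next < opts.inputs.length && opts.budget.check() === null) {
      const index = next++;
      opts.onStart?.(index);
      results[index] = await opts.mapOne(opts.inputs[index] as T, index);
      processedCount++;
    }
  };
  const workers = Array.from({ length: Math.max(1, opts.concurrency) }, () => worker());
  await Promise.all(workers);
  return { results, processedCount };
}
```

Because indices are claimed in order and claimed work is never cancelled, the processed items form a contiguous prefix, which is what lets the last test assert that everything from `processedCount` onward is `undefined`.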
--- a/packages/cli/test/context/scan/relationship-diagnostics.test.ts
+++ b/packages/cli/test/context/scan/relationship-diagnostics.test.ts
@@ -315,6 +315,26 @@ describe('relationship diagnostics artifacts', () => {
    expect(diagnostics.summary).toEqual({ accepted: 0, review: 0, rejected: 0, skipped: 0 });
    expect(diagnostics.noAcceptedReason).toBe('no candidate pairs passed type compatibility');
    expect(diagnostics.candidateCountsBySource).toEqual({});
+    expect(diagnostics.partial).toBe(false);
+    expect(diagnostics.partialReason).toBeNull();
+  });
+
+  it('marks the diagnostics partial with its stop reason when relationship detection was truncated', () => {
+    const artifacts = buildKtxRelationshipArtifacts({ connectionId: 'warehouse' });
+    const diagnostics = buildKtxRelationshipDiagnostics({
+      connectionId: 'warehouse',
+      generatedAt: '2026-05-07T12:00:00.000Z',
+      artifacts,
+      profile: emptyKtxRelationshipProfileArtifact({
+        connectionId: 'warehouse',
+        driver: 'sqlite',
+        reason: 'relationship_profiling_not_run',
+      }),
+      partial: { reason: 'budget' },
+    });
+
+    expect(diagnostics.partial).toBe(true);
+    expect(diagnostics.partialReason).toBe('budget');
  });

  it('records composite relationship endpoints in relationship artifacts', () => {
--- a/packages/cli/test/context/scan/relationship-discovery.test.ts
+++ b/packages/cli/test/context/scan/relationship-discovery.test.ts
@@ -224,6 +224,7 @@ function llmRuntime(output: unknown): KtxLlmRuntimePort {
    generateText: vi.fn(),
    generateObject: vi.fn(async () => output) as KtxLlmRuntimePort['generateObject'],
    runAgentLoop: vi.fn(),
+    subprocessForkSpec: () => null,
  };
 }

@@ -338,6 +339,126 @@ describe('production relationship discovery', () => {
    });
  });

+  it('emits per-table profiling and per-candidate validation progress', async () => {
+    executor = new InMemorySqliteExecutor();
+    executor.db.exec(`
+      CREATE TABLE accounts (id INTEGER NOT NULL, name TEXT NOT NULL);
+      CREATE TABLE orders (id INTEGER NOT NULL, account_id INTEGER NOT NULL);
+      INSERT INTO accounts (id, name) VALUES (1, 'Acme'), (2, 'Globex');
+      INSERT INTO orders (id, account_id) VALUES (10, 1), (11, 1), (12, 2);
+    `);
+    const messages: string[] = [];
+    const progress = {
+      async update(_progress: number, message?: string) {
+        if (message) {
+          messages.push(message);
+        }
+      },
+      startPhase() {
+        return progress;
+      },
+    };
+
+    const result = await discoverKtxRelationships({
+      connectionId: 'warehouse',
+      dialect: getSqlDialectForDriver('sqlite'),
+      connector: connector(executor),
+      schema: snapshotToKtxEnrichedSchema(snapshot()),
+      context: { runId: 'relationship-progress' },
+      settings: relationshipSettings(),
+      progress,
+    });
+
+    expect(result.partial).toBeNull();
+    expect(messages).toContain('Profiling table 1/2');
+    expect(messages).toContain('Profiling table 2/2');
+    expect(messages.some((message) => message.startsWith('Validating candidate '))).toBe(true);
+  });
+
+  it('returns a partial result when the wall-clock budget is exhausted, without throwing', async () => {
+    executor = new InMemorySqliteExecutor();
+    executor.db.exec(`
+      CREATE TABLE accounts (id INTEGER NOT NULL, name TEXT NOT NULL);
+      CREATE TABLE orders (id INTEGER NOT NULL, account_id INTEGER NOT NULL);
+      INSERT INTO accounts (id, name) VALUES (1, 'Acme'), (2, 'Globex');
+      INSERT INTO orders (id, account_id) VALUES (10, 1), (11, 1), (12, 2);
+    `);
+    // A clock that jumps a full second per read against a 1ms budget exhausts
+    // the budget at the very first unit boundary.
+    let calls = 0;
+    const now = () => calls++ * 1000;
+
+    const result = await discoverKtxRelationships({
+      connectionId: 'warehouse',
+      dialect: getSqlDialectForDriver('sqlite'),
+      connector: connector(executor),
+      schema: snapshotToKtxEnrichedSchema(snapshot()),
+      context: { runId: 'relationship-budget' },
+      settings: { ...relationshipSettings(), detectionBudgetMs: 1 },
+      now,
+    });
+
+    expect(result.partial).toEqual({ reason: 'budget' });
+    expect(result.relationships.accepted).toBe(0);
+  });
+
+  it('does not start the LLM relationship proposal once the budget is exhausted', async () => {
+    executor = new InMemorySqliteExecutor();
+    executor.db.exec(`
+      CREATE TABLE accounts (id INTEGER NOT NULL, name TEXT NOT NULL);
+      CREATE TABLE orders (id INTEGER NOT NULL, account_id INTEGER NOT NULL);
+      INSERT INTO accounts (id, name) VALUES (1, 'Acme'), (2, 'Globex');
+      INSERT INTO orders (id, account_id) VALUES (10, 1), (11, 1), (12, 2);
+    `);
+    let calls = 0;
+    const now = () => calls++ * 1000;
+    const generateObject = vi.fn(async () => ({ pkCandidates: [], fkCandidates: [] }));
+    const runtime: KtxLlmRuntimePort = {
+      generateText: vi.fn(),
+      generateObject: generateObject as KtxLlmRuntimePort['generateObject'],
+      runAgentLoop: vi.fn(),
+      subprocessForkSpec: () => null,
+    };
+
+    const result = await discoverKtxRelationships({
+      connectionId: 'warehouse',
+      dialect: getSqlDialectForDriver('sqlite'),
+      connector: connector(executor),
+      schema: snapshotToKtxEnrichedSchema(snapshot()),
+      context: { runId: 'relationship-budget-llm' },
+      settings: { ...relationshipSettings(), detectionBudgetMs: 1 },
+      llmRuntime: runtime,
+      now,
+    });
+
+    expect(result.partial).toEqual({ reason: 'budget' });
+    expect(result.llmRelationshipValidation).toBe('skipped');
+    expect(generateObject).not.toHaveBeenCalled();
+  });
+
+  it('returns a partial result when the scan signal is already aborted', async () => {
+    executor = new InMemorySqliteExecutor();
+    executor.db.exec(`
+      CREATE TABLE accounts (id INTEGER NOT NULL, name TEXT NOT NULL);
+      CREATE TABLE orders (id INTEGER NOT NULL, account_id INTEGER NOT NULL);
+      INSERT INTO accounts (id, name) VALUES (1, 'Acme'), (2, 'Globex');
+      INSERT INTO orders (id, account_id) VALUES (10, 1), (11, 1), (12, 2);
+    `);
+
+    const result = await discoverKtxRelationships({
+      connectionId: 'warehouse',
+      dialect: getSqlDialectForDriver('sqlite'),
+      connector: connector(executor),
+      schema: snapshotToKtxEnrichedSchema(snapshot()),
+      context: { runId: 'relationship-aborted', signal: AbortSignal.abort() },
+      settings: relationshipSettings(),
+    });
+
+    expect(result.partial).toEqual({ reason: 'aborted' });
+    // A stop-before-completion must not be reported as completed statistical validation.
+    expect(result.statisticalValidation).toBe('skipped');
+  });
+
  it('accepts a profile-driven natural-key relationship without declared metadata', async () => {
    executor = new InMemorySqliteExecutor();
    executor.db.exec(`
--- a/packages/cli/test/context/scan/relationship-llm-proposal.test.ts
+++ b/packages/cli/test/context/scan/relationship-llm-proposal.test.ts
@@ -9,6 +9,7 @@ function llmRuntime(output?: unknown): KtxLlmRuntimePort {
    generateText: vi.fn(),
    generateObject: vi.fn(async () => output) as KtxLlmRuntimePort['generateObject'],
    runAgentLoop: vi.fn(),
+    subprocessForkSpec: () => null,
  };
 }

@@ -202,6 +203,7 @@ describe('relationship LLM proposals', () => {
          throw new Error('model unavailable');
        }),
        runAgentLoop: vi.fn(),
+        subprocessForkSpec: () => null,
      },
    });
    expect(failed).toMatchObject({ candidates: [], llmCalls: 1, summary: 'failed' });
--- a/packages/cli/test/context/scan/relationship-validation.test.ts
+++ b/packages/cli/test/context/scan/relationship-validation.test.ts
@@ -1,5 +1,6 @@
 import Database from 'better-sqlite3';
 import { afterEach, describe, expect, it } from 'vitest';
+import { KtxQueryError } from '../../../src/errors.js';
 import { getSqlDialectForDriver } from '../../../src/context/connections/dialects.js';
 import type { KtxEnrichedColumn, KtxEnrichedSchema, KtxEnrichedTable } from '../../../src/context/scan/enrichment-types.js';
 import { generateKtxRelationshipDiscoveryCandidates } from '../../../src/context/scan/relationship-candidates.js';
@@ -139,6 +140,54 @@ describe('relationship validation', () => {
    expect(validated[0]?.score).toBeGreaterThanOrEqual(0.85);
  });

+  it('sends a candidate to review (not source-fatal) when its validation query times out', async () => {
+    executor = new InMemorySqliteExecutor();
+    executor.db.exec(`
+      CREATE TABLE accounts (id INTEGER, name TEXT);
+      CREATE TABLE users (id INTEGER, account_id INTEGER);
+      CREATE TABLE invoices (id INTEGER, account_id INTEGER);
+      INSERT INTO accounts (id, name) VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
+      INSERT INTO users (id, account_id) VALUES (10, 1), (11, 2), (12, 3);
+      INSERT INTO invoices (id, account_id) VALUES (20, 1), (21, 2), (22, 999);
+    `);
+    const testSchema = schema();
+    const profiles = await profileKtxRelationshipSchema({
+      connectionId: 'warehouse',
+      driver: 'sqlite',
+      dialect: getSqlDialectForDriver('sqlite'),
+      schema: testSchema,
+      executor,
+      ctx: { runId: 'validate-test' },
+    });
+    const candidates = generateKtxRelationshipDiscoveryCandidates(testSchema).filter(
+      (candidate) => candidate.from.table.name === 'users',
+    );
+
+    const warnings: string[] = [];
+    const timingOutExecutor = {
+      executeReadOnly: () => Promise.reject(new KtxQueryError('query exceeded 30s')),
+    };
+    const validated = await validateKtxRelationshipDiscoveryCandidates({
+      connectionId: 'warehouse',
+      dialect: getSqlDialectForDriver('sqlite'),
+      candidates,
+      profiles,
+      executor: timingOutExecutor,
+      ctx: {
+        runId: 'validate-test',
+        logger: { debug() {}, info() {}, warn: (message) => warnings.push(message), error() {} },
+      },
+      tableCount: testSchema.tables.length,
+    });
+
+    expect(validated).toHaveLength(1);
+    expect(validated[0]).toMatchObject({
+      status: 'review',
+      validation: { reasons: ['validation_query_failed'] },
+    });
+    expect(warnings.some((message) => message.includes('query exceeded 30s'))).toBe(true);
+  });
+
  it('rejects a candidate with missing parent values and records the deterministic reason', async () => {
    executor = new InMemorySqliteExecutor();
    executor.db.exec(`
--- a/packages/cli/test/context/wiki/local-knowledge.test.ts
+++ b/packages/cli/test/context/wiki/local-knowledge.test.ts
@@ -6,10 +6,12 @@ import { initKtxProject, type KtxLocalProject } from '../../../src/context/proje
 import {
  listLocalKnowledgePageKeys,
  listLocalKnowledgePages,
+  listReferencedConnectionIds,
  readLocalKnowledgePage,
  searchLocalKnowledgePages,
  writeLocalKnowledgePage,
 } from '../../../src/context/wiki/local-knowledge.js';
+import { SqliteKnowledgeIndex } from '../../../src/context/wiki/sqlite-knowledge-index.js';

 class FakeEmbeddingPort {
  readonly maxBatchSize = 16;
@@ -284,6 +286,203 @@ describe('local knowledge helpers', () => {
    expect(raw.content).toContain(['fingerprints:', '  - fp_paid_orders'].join('\n'));
  });

+  it('round-trips a connections list through write, read, and list', async () => {
+    await writeLocalKnowledgePage(project, {
+      key: 'orders-sales-db',
+      scope: 'GLOBAL',
+      summary: 'Orders concept for the sales database',
+      content: 'In sales_db, orders are recognized when paid.',
+      connections: ['sales_db'],
+    });
+
+    const raw = await project.fileStore.readFile('wiki/global/orders-sales-db.md');
+    expect(raw.content).toContain(['connections:', '  - sales_db'].join('\n'));
+
+    await expect(readLocalKnowledgePage(project, { key: 'orders-sales-db', userId: 'local' })).resolves.toMatchObject({
+      key: 'orders-sales-db',
+      connections: ['sales_db'],
+    });
+  });
+
+  it('normalizes a single connections string to a list at parse time', async () => {
+    await project.fileStore.writeFile(
+      'wiki/global/single-scoped.md',
+      '---\nsummary: Single connection as scalar\nusage_mode: auto\nconnections: events_db\n---\n\nBody\n',
+      'Test',
+      'test@example.com',
+      'Write scalar connections page',
+    );
+
+    await expect(readLocalKnowledgePage(project, { key: 'single-scoped', userId: 'local' })).resolves.toMatchObject({
+      key: 'single-scoped',
+      connections: ['events_db'],
+    });
+  });
+
+  it('treats an absent connections field as unscoped (empty list)', async () => {
+    await writeLocalKnowledgePage(project, {
+      key: 'fiscal-year',
+      scope: 'GLOBAL',
+      summary: 'Org-wide fiscal year',
+      content: 'Fiscal year starts in February.',
+    });
+
+    await expect(readLocalKnowledgePage(project, { key: 'fiscal-year', userId: 'local' })).resolves.toMatchObject({
+      key: 'fiscal-year',
+      connections: [],
+    });
+  });
+
+  it('scopes search to unscoped pages plus pages listing the requested connection', async () => {
+    await writeLocalKnowledgePage(project, {
+      key: 'orders-sales-db',
+      scope: 'GLOBAL',
+      summary: 'Sales DB orders',
+      content: 'Orders are paid in the sales database.',
+      connections: ['sales_db'],
+    });
+    await writeLocalKnowledgePage(project, {
+      key: 'orders-events-db',
+      scope: 'GLOBAL',
+      summary: 'Events DB orders',
+      content: 'Orders are paid in the events database.',
+      connections: ['events_db'],
+    });
+    await writeLocalKnowledgePage(project, {
+      key: 'orders-global',
+      scope: 'GLOBAL',
+      summary: 'Org-wide orders note',
+      content: 'Orders are paid everywhere in the org.',
+    });
+
+    const scoped = await searchLocalKnowledgePages(project, {
+      query: 'orders paid',
+      userId: 'local',
+      connectionId: 'sales_db',
+    });
+    const keys = scoped.map((result) => result.key).sort();
+    expect(keys).toEqual(['orders-global', 'orders-sales-db']);
+    expect(keys).not.toContain('orders-events-db');
+
+    const unfiltered = await searchLocalKnowledgePages(project, { query: 'orders paid', userId: 'local' });
+    expect(unfiltered.map((result) => result.key).sort()).toEqual([
+      'orders-events-db',
+      'orders-global',
+      'orders-sales-db',
+    ]);
+  });
+
+  it('keeps other-connection pages and embeddings in the sqlite index after a scoped search', async () => {
+    const embedding = new FakeEmbeddingPort();
+    await writeLocalKnowledgePage(project, {
+      key: 'orders-sales-db',
+      scope: 'GLOBAL',
+      summary: 'Sales DB orders',
+      content: 'Orders are paid in the sales database.',
+      connections: ['sales_db'],
+    });
+    await writeLocalKnowledgePage(project, {
+      key: 'orders-events-db',
+      scope: 'GLOBAL',
+      summary: 'Events DB orders',
+      content: 'Orders are paid in the events database.',
+      connections: ['events_db'],
+    });
+
+    const scoped = await searchLocalKnowledgePages(project, {
+      query: 'orders paid',
+      userId: 'local',
+      connectionId: 'sales_db',
+      embeddingService: embedding,
+    });
+    expect(scoped.map((result) => result.key)).toEqual(['orders-sales-db']);
+
+    // A connection-scoped search must not prune the other connection's page (or
+    // its cached embedding) from the shared persistent index.
+    const index = new SqliteKnowledgeIndex({ dbPath: join(project.projectDir, '.ktx', 'db.sqlite') });
+    const indexed = index.getExistingPages();
+    expect([...indexed.keys()].sort()).toEqual([
+      'wiki/global/orders-events-db.md',
+      'wiki/global/orders-sales-db.md',
+    ]);
+    expect(indexed.get('wiki/global/orders-events-db.md')?.embedding).not.toBeNull();
+  });
+
+  it('filters search per connection across lexical and token lanes when embeddings are disabled', async () => {
+    await writeLocalKnowledgePage(project, {
+      key: 'rfm-events-db',
+      scope: 'GLOBAL',
+      summary: 'RFM definition for events_db',
+      content: 'RFM segmentation rules for the events database.',
+      connections: ['events_db'],
+    });
+    await writeLocalKnowledgePage(project, {
+      key: 'rfm-sales-db',
+      scope: 'GLOBAL',
+      summary: 'RFM definition for sales_db',
+      content: 'RFM segmentation rules for the sales database.',
+      connections: ['sales_db'],
+    });
+
+    const lexical = await searchLocalKnowledgePages(project, {
+      query: 'rfm segmentation',
+      userId: 'local',
+      connectionId: 'events_db',
+    });
+    expect(lexical.map((result) => result.key)).toEqual(['rfm-events-db']);
+
+    const token = await searchLocalKnowledgePages(project, {
+      query: 'segmentation---',
+      userId: 'local',
+      connectionId: 'events_db',
+    });
+    expect(token.map((result) => result.key)).toEqual(['rfm-events-db']);
+  });
+
+  it('filters list output by connection while keeping unscoped pages', async () => {
+    await writeLocalKnowledgePage(project, {
+      key: 'orders-sales-db',
+      scope: 'GLOBAL',
+      summary: 'Sales DB orders',
+      content: 'Sales orders.',
+      connections: ['sales_db'],
+    });
+    await writeLocalKnowledgePage(project, {
+      key: 'orders-events-db',
+      scope: 'GLOBAL',
+      summary: 'Events DB orders',
+      content: 'Events orders.',
+      connections: ['events_db'],
+    });
+    await writeLocalKnowledgePage(project, {
+      key: 'orders-global',
+      scope: 'GLOBAL',
+      summary: 'Org-wide orders',
+      content: 'Global orders.',
+    });
+
+    const scoped = await listLocalKnowledgePages(project, { userId: 'local', connectionId: 'sales_db' });
+    expect(scoped.map((page) => page.key).sort()).toEqual(['orders-global', 'orders-sales-db']);
+  });
+
+  it('keeps a page referencing an unconfigured connection searchable and readable', async () => {
+    await writeLocalKnowledgePage(project, {
+      key: 'rfm-removed-db',
+      scope: 'GLOBAL',
+      summary: 'RFM for a since-removed database',
+      content: 'RFM rules.',
+      connections: ['removed_db'],
+    });
+
+    await expect(readLocalKnowledgePage(project, { key: 'rfm-removed-db', userId: 'local' })).resolves.toMatchObject({
+      key: 'rfm-removed-db',
+      connections: ['removed_db'],
+    });
+    const search = await searchLocalKnowledgePages(project, { query: 'rfm rules', userId: 'local' });
+    expect(search.map((result) => result.key)).toContain('rfm-removed-db');
+    await expect(listReferencedConnectionIds(project, { userId: 'local' })).resolves.toEqual(['removed_db']);
+  });
+
  it('falls back to Markdown scanning when the config does not select sqlite-fts5', async () => {
    project.config.storage.search = 'postgres-hybrid';
    await writeLocalKnowledgePage(project, {
--- a/packages/cli/test/context/wiki/sqlite-knowledge-index.test.ts
+++ b/packages/cli/test/context/wiki/sqlite-knowledge-index.test.ts
@@ -142,6 +142,49 @@ describe('SqliteKnowledgeIndex', () => {
    ]);
  });

+  it('restricts lexical candidates to the allowlist', () => {
+    const index = new SqliteKnowledgeIndex({ dbPath });
+    index.sync([
+      page({ path: 'wiki/global/revenue.md', key: 'revenue' }),
+      page({ path: 'wiki/global/support.md', key: 'support', content: 'Orders are paid by the support team.' }),
+    ]);
+
+    expect(
+      index
+        .searchLexicalCandidates({ queryText: 'paid', limit: 10, allowedPaths: ['wiki/global/support.md'] })
+        .map((row) => row.path),
+    ).toEqual(['wiki/global/support.md']);
+  });
+
+  it('applies the allowlist before the semantic limit so an in-scope match survives', () => {
+    const index = new SqliteKnowledgeIndex({ dbPath });
+    index.sync([
+      page({ path: 'wiki/global/noise-a.md', key: 'noise-a', embedding: [1, 0] }),
+      page({ path: 'wiki/global/noise-b.md', key: 'noise-b', embedding: [1, 0] }),
+      page({ path: 'wiki/global/target.md', key: 'target', embedding: [1, 0] }),
+    ]);
+
+    // All three tie on similarity; a limit of 1 over the full corpus drops the target.
+    expect(index.searchSemanticCandidates({ queryEmbedding: [1, 0], limit: 1 }).map((row) => row.path)).toEqual([
+      'wiki/global/noise-a.md',
+    ]);
+
+    // Scoped to the target, the limit applies after the allowlist, so it survives.
+    expect(
+      index
+        .searchSemanticCandidates({ queryEmbedding: [1, 0], limit: 1, allowedPaths: ['wiki/global/target.md'] })
+        .map((row) => row.path),
+    ).toEqual(['wiki/global/target.md']);
+  });
+
+  it('treats an empty allowlist as no page in scope', () => {
+    const index = new SqliteKnowledgeIndex({ dbPath });
+    index.sync([page({ embedding: [1, 0] })]);
+
+    expect(index.searchLexicalCandidates({ queryText: 'paid order', limit: 10, allowedPaths: [] })).toEqual([]);
+    expect(index.searchSemanticCandidates({ queryEmbedding: [1, 0], limit: 10, allowedPaths: [] })).toEqual([]);
+  });
+
  it('returns an empty result for blank or punctuation-only queries', () => {
    const index = new SqliteKnowledgeIndex({ dbPath });
    index.rebuild([page()]);
--- a/packages/cli/test/context/wiki/tools/wiki-write.tool.test.ts
+++ b/packages/cli/test/context/wiki/tools/wiki-write.tool.test.ts
@@ -263,6 +263,108 @@ describe('WikiWriteTool', () => {
    });
  });

+  it('sets connections on a new page and normalizes a single string to a list', async () => {
+    const { tool, wikiService } = makeTool();
+
+    await tool.call(
+      { key: 'orders-sales-db', summary: 'Sales orders', content: '# Orders', connections: 'sales_db' } as any,
+      baseContext,
+    );
+
+    expect(wikiService.writePage.mock.calls[0][3]).toMatchObject({ connections: ['sales_db'] });
+  });
+
+  it('applies REPLACE semantics for connections on update', async () => {
+    const existing = {
+      pageKey: 'orders',
+      frontmatter: { summary: 'Orders', usage_mode: 'auto' as const, sort_order: 0, connections: ['sales_db'] },
+      content: 'body',
+    };
+    // omit ⇒ keep existing connections
+    {
+      const { tool, wikiService } = makeTool({ wikiService: { readPage: vi.fn().mockResolvedValue(existing) } });
+      await tool.call({ key: 'orders', summary: 'Orders', content: 'new body' } as any, baseContext);
+      expect(wikiService.writePage.mock.calls[0][3]).toMatchObject({ connections: ['sales_db'] });
+    }
+    // [] ⇒ clear to unscoped
+    {
+      const { tool, wikiService } = makeTool({ wikiService: { readPage: vi.fn().mockResolvedValue(existing) } });
+      await tool.call({ key: 'orders', summary: 'Orders', content: 'new body', connections: [] } as any, baseContext);
+      expect(wikiService.writePage.mock.calls[0][3]).toMatchObject({ connections: [] });
+    }
+    // [ids] ⇒ set (broadening is allowed when the new list overlaps the existing one)
+    {
+      const { tool, wikiService } = makeTool({ wikiService: { readPage: vi.fn().mockResolvedValue(existing) } });
+      await tool.call(
+        { key: 'orders', summary: 'Orders', content: 'new body', connections: ['sales_db', 'events_db'] } as any,
+        baseContext,
+      );
+      expect(wikiService.writePage.mock.calls[0][3]).toMatchObject({ connections: ['sales_db', 'events_db'] });
+    }
+  });
+
+  it('blocks a connection-scoped write whose key collides with a disjoint-connection page', async () => {
+    const { tool, wikiService } = makeTool({
+      wikiService: {
+        readPage: vi.fn().mockResolvedValue({
+          pageKey: 'orders',
+          frontmatter: { summary: 'Events orders', usage_mode: 'auto', sort_order: 0, connections: ['events_db'] },
+          content: 'events body',
+        }),
+      },
+    });
+
+    const result = await tool.call(
+      { key: 'orders', summary: 'Sales orders', content: 'sales body', connections: ['sales_db'] } as any,
+      baseContext,
+    );
+
+    expect(result.structured).toEqual({ success: false, key: 'orders' });
+    expect(result.markdown).toContain('already exists scoped to a different connection');
+    expect(result.markdown).toContain('orders_sales_db');
+    expect(wikiService.writePage).not.toHaveBeenCalled();
+  });
+
+  it('allows narrowing a connection-scoped page within its own scope', async () => {
+    const { tool, wikiService } = makeTool({
+      wikiService: {
+        readPage: vi.fn().mockResolvedValue({
+          pageKey: 'orders',
+          frontmatter: { summary: 'Orders', usage_mode: 'auto', sort_order: 0, connections: ['sales_db', 'events_db'] },
+          content: 'body',
+        }),
+      },
+    });
+
+    const result = await tool.call(
+      { key: 'orders', summary: 'Orders', content: 'body', connections: ['sales_db'] } as any,
+      baseContext,
+    );
+
+    expect(result.structured).toMatchObject({ success: true, action: 'updated' });
+    expect(wikiService.writePage.mock.calls[0][3]).toMatchObject({ connections: ['sales_db'] });
+  });
+
+  it('allows scoping a previously unscoped page (existing connections empty)', async () => {
+    const { tool, wikiService } = makeTool({
+      wikiService: {
+        readPage: vi.fn().mockResolvedValue({
+          pageKey: 'orders',
+          frontmatter: { summary: 'Orders', usage_mode: 'auto', sort_order: 0 },
+          content: 'body',
+        }),
+      },
+    });
+
+    const result = await tool.call(
+      { key: 'orders', summary: 'Orders', content: 'body', connections: ['sales_db'] } as any,
+      baseContext,
+    );
+
+    expect(result.structured).toMatchObject({ success: true, action: 'updated' });
+    expect(wikiService.writePage.mock.calls[0][3]).toMatchObject({ connections: ['sales_db'] });
+  });
+
  it('rejects frontmatter refs that target missing wiki pages', async () => {
    const { tool, wikiService } = makeTool({
      wikiService: {