fix: align KTX agent tools and repair handling (#73)

2026-07-22 11:51:01 +02:00 · 2026-05-14 00:57:51 +02:00 · 2026-05-14 00:57:51 +02:00 · 28b5e2a83e
commit 28b5e2a83e
parent ed690ef60c
19 changed files with 113 additions and 45 deletions
--- a/packages/context/prompts/memory_agent_bundle_ingest_reconcile.md
+++ b/packages/context/prompts/memory_agent_bundle_ingest_reconcile.md
@@ -12,7 +12,7 @@ Parsimonious. Stage 3 WUs already loaded `ingest_triage` and handled conflicts t
 3. If the system prompt includes `<canonical_pins>`, apply those pins before flagging a same-name or near-duplicate conflict. A pinned `canonicalArtifactKey` keeps the contested name when it is present in the Stage Index; competing variants keep or receive disambiguated names.
 4. Sweep both exact-key conflicts and near-duplicate writes. Compare WUs that wrote overlapping SL source names, overlapping wiki keys, the same `tables:` or `sl_refs:` action details, or obviously equivalent topic titles under different wiki keys. Call `stage_diff` to see the actual difference, and use `wiki_read`/`sl_read_source` when two different keys appear to describe the same table, metric, or source-of-truth mapping. If they're the same content, leave one canonical artifact and record the duplicate as subsumed. If they differ per `ingest_triage` rules, apply the correct resolution (rename + capture; election of canonical; silent replace for expression-only re-ingest change; or pinned canonical), then call `emit_conflict_resolution` with the artifact key and decision.
 5. For any `wiki_write`, `wiki_remove`, `sl_write_source`, or `sl_edit_source` call you make during reconciliation, include `rawPaths` with only the raw paths that directly caused that reconciliation action.
-6. Call `eviction_list()` for deleted raw paths. For each listed artifact, remove it (`sl_delete`, `wiki_remove`) and include the evicted raw path in `rawPaths`. Then call `emit_eviction_decision` with `action: "removed"` for every removed artifact.
+6. Call `eviction_list()` for deleted raw paths. For each listed artifact, remove it (`sl_write_source`/`sl_edit_source` with `delete: true` for SL sources, `wiki_remove` for wiki pages) and include the evicted raw path in `rawPaths`. Then call `emit_eviction_decision` with `action: "removed"` for every removed artifact.
 7. If the Stage 4 sweep discovers a raw file whose only honest outcome is standalone SQL, wiki-only capture, or a human flag, call `emit_unmapped_fallback` with the raw path, reason, and fallback kind.
 8. Use `read_raw_span` to zoom into specific raw files when you need to resolve what two contested measures or wiki pages actually describe.
 9. Exit when you've processed every item.
--- a/packages/context/skills/ingest_triage/SKILL.md
+++ b/packages/context/skills/ingest_triage/SKILL.md
@@ -7,7 +7,7 @@ callers: [memory_agent]
 # Ingest Triage — conflict classification and resolution

 This skill is loaded in two contexts:
- By a Stage 3 WorkUnit agent when `sl_discover` or an `sl_discover` reveals that a prior WU (or a prior sync) already wrote something that overlaps with what the current WU is about to write.
+- By a Stage 3 WorkUnit agent when `sl_discover` reveals that a prior WU (or a prior sync) already wrote something that overlaps with what the current WU is about to write.
 - By the Stage 4 reconciliation agent for cross-WU sweeps and for eviction decisions.

 Apply the rules below before every write that could collide with an existing artifact.
@@ -32,7 +32,7 @@ Apply the rules below before every write that could collide with an existing art
   | Definitional contradiction | Same name, substantively different formulas (different aggregation, different filters, different columns) | **Rename + capture**: disambiguate ALL variants with suffix derived from the domain (`churn_risk_engagement_based`, `churn_risk_billing_based`) and write a unified wiki page listing every variant with provenance. The contested name does NOT land in the SL. **Always flag.** |

 5. **Eviction (Stage 4 only)**: for each entry in `eviction_list()`:
-   - Remove the artifact (`sl_delete` for SL sources, `wiki_remove` for wiki pages).
+   - Remove the artifact (`sl_write_source` or `sl_edit_source` with `delete: true` for SL sources, `wiki_remove` for wiki pages).
   - Record the removal with `emit_eviction_decision` and `action: "removed"`.

 ## Why same-ingest vs re-ingest differs
--- a/packages/context/skills/lookml_ingest/SKILL.md
+++ b/packages/context/skills/lookml_ingest/SKILL.md
@@ -84,7 +84,7 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:

 **Required flow before writing any overlay or standalone**:

-1. Call `sl_discover(<tableName>)` for each base table you're about to touch. That returns the real columns.
+1. Call `sl_discover({ query: "<tableName>" })` for each base table you're about to touch. That returns the real columns.
 2. If the table isn't in the manifest, use the warehouse `connectionName`
   returned by `discover_data` or the target connection chosen from
   `sl_discover`, then call a dialect-appropriate SQL probe with that
--- a/packages/context/skills/notion_synthesize/SKILL.md
+++ b/packages/context/skills/notion_synthesize/SKILL.md
@@ -20,7 +20,7 @@ Each WorkUnit is either a single Notion page/span or a topical cluster of relate
 4. Use `context_evidence_search`, `context_evidence_read`, and `context_evidence_neighbors` to pull supporting chunks when indexed evidence is relevant. Pass `chunkId` and `documentId` values verbatim as returned by the evidence tools.
 5. Write durable business knowledge with `wiki_write`. Aim for a small number of high-quality pages per WorkUnit or cluster. Include `rawPaths` with the exact Notion raw files that support each page.
 6. When the Notion content defines a reusable dataset, metric, segment, join rule, source-of-truth mapping, or table with explicit columns, load `sl_capture`, discover existing sources first with `sl_discover` or `sl_read_source`, then use `sl_write_source` or `sl_edit_source` only for a confirmed mapped non-Notion target source. Include `rawPaths` with the exact Notion raw files that support the SL action. If no mapped target exists, call `emit_unmapped_fallback` and keep the content wiki-only.
-7. For every deleted raw path in the Eviction Set, call `eviction_list`, decide retention, then `context_eviction_decision_write`. Do this even when no wiki write is needed.
+7. For every deleted raw path in the Eviction Set, call `eviction_list`, decide retention, then `emit_eviction_decision`. Do this even when no wiki write is needed.

 ## What To Capture

@@ -99,6 +99,6 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:

 ## Tools

-Allowed: `read_raw_file`, `read_raw_span`, `wiki_search`, `wiki_read`, `wiki_write`, `discover_data`, `entity_details`, `sql_execution`, `sl_discover`, `sl_read_source`, `sl_write_source`, `sl_edit_source`, `sl_validate`, `context_evidence_search`, `context_evidence_read`, `context_evidence_neighbors`, `emit_unmapped_fallback`, `eviction_list`, `context_eviction_decision_write`.
+Allowed: `read_raw_file`, `read_raw_span`, `wiki_search`, `wiki_read`, `wiki_write`, `discover_data`, `entity_details`, `sql_execution`, `sl_discover`, `sl_read_source`, `sl_write_source`, `sl_edit_source`, `sl_validate`, `context_evidence_search`, `context_evidence_read`, `context_evidence_neighbors`, `emit_unmapped_fallback`, `eviction_list`, `emit_eviction_decision`.

 Not allowed: `context_candidate_write`, `context_candidate_mark`.
--- a/packages/context/skills/sl/SKILL.md
+++ b/packages/context/skills/sl/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: sl
-description: KTX's semantic layer — a structured catalog of sources (tables/views), measures, joins, and segments expressed as YAML. Covers the schema and how to query it via `semantic_query`. Use when the task involves querying pre-defined metrics (ARR, churn, retention, LTV, MAU) or reading SL source YAML to understand the catalog. Capture is handled by the `sl_capture` skill (memory-agent only).
+description: KTX's semantic layer — a structured catalog of sources (tables/views), measures, joins, and segments expressed as YAML. Covers the schema and how to query it via `sl_query`. Use when the task involves querying pre-defined metrics (ARR, churn, retention, LTV, MAU) or reading SL source YAML to understand the catalog. Capture is handled by the `sl_capture` skill (memory-agent only).
 ---

 # Semantic Layer
@@ -9,7 +9,7 @@ KTX's semantic layer (SL) is a structured catalog. Each **source** represents a

 This skill covers two parts:
 - **Part 1** — Schema reference (what an SL source looks like).
- **Part 2** — Querying via `semantic_query`.
+- **Part 2** — Querying via `sl_query`.

 Capture (when and how to add new patterns to the SL) is a separate concern handled by the memory-agent — see the `sl_capture` skill if you are running in capture mode. The research agent **reads** and **queries** the SL via the tools described here; it does not write to it.

@@ -162,7 +162,7 @@ segments:
    description: Orders that were paid and not refunded
 ```

-Named, reusable boolean predicates scoped to one source. Reference by bare name in a measure's `segments: []`, or by dotted form `source.segment_name` in a `semantic_query`. Segments are predicates only — they are NOT selectable as dimensions. If you need to group by the predicate, add a `columns[]` entry instead.
+Named, reusable boolean predicates scoped to one source. Reference by bare name in a measure's `segments: []`, or by dotted form `source.segment_name` in an `sl_query`. Segments are predicates only — they are NOT selectable as dimensions. If you need to group by the predicate, add a `columns[]` entry instead.

 ### Cross-references with the wiki

@@ -170,11 +170,11 @@ The reverse edge (wiki pages that cite this source) is derived automatically fro

 ---

-## Part 2 — Querying via `semantic_query`
+## Part 2 — Querying via `sl_query`

-The `semantic_query` tool generates correct SQL from a structured query. It handles joins, fan-out prevention, aggregation correctness, and filter classification automatically. Prefer it over writing raw SQL whenever the SL has the relevant sources.
+The `sl_query` tool generates correct SQL from a structured query. It handles joins, fan-out prevention, aggregation correctness, and filter classification automatically. Prefer it over writing raw SQL whenever the SL has the relevant sources.

-### When to prefer semantic_query over raw SQL
+### When to prefer sl_query over raw SQL

 - A pre-defined measure already exists (`source.measure_name` appears in the catalog).
 - The question combines fields from multiple sources — the engine resolves the join path automatically.
@@ -189,15 +189,12 @@ Use raw SQL (`sql_execution`) only when:
 ```json
 {
  "connectionId": "uuid-of-the-connection",
-  "reasoning": "Brief note on what this query analyzes",
-  "query": {
-    "measures": ["orders.total_revenue", "sum(orders.amount)"],
-    "dimensions": ["customers.segment", { "field": "orders.created_at", "granularity": "month" }],
-    "filters": ["orders.status != 'cancelled'", "orders.total_revenue > 10000"],
-    "segments": ["orders.paid_non_refunded"],
-    "order_by": [{ "field": "orders.created_at", "direction": "desc" }],
-    "limit": 1000
-  }
+  "measures": ["orders.total_revenue", "sum(orders.amount)"],
+  "dimensions": ["customers.segment", { "field": "orders.created_at", "granularity": "month" }],
+  "filters": ["orders.status != 'cancelled'", "orders.total_revenue > 10000"],
+  "segments": ["orders.paid_non_refunded"],
+  "order_by": [{ "field": "orders.created_at", "direction": "desc" }],
+  "limit": 1000
 }
 ```
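
The hunk above flattens the request: the nested `query` object and the `reasoning` note are gone, and the query fields sit next to `connectionId`. A minimal sketch of the two shapes, assuming field types inferred from the JSON example only. The names `LegacySemanticQueryInput`, `SlQueryInput`, and `flattenLegacyQuery` are hypothetical and are not part of KTX, and the `"asc"` direction is an assumption (only `"desc"` appears in the example).

```typescript
// Hypothetical types inferred from the JSON example; not the real KTX schema.
type Dimension = string | { field: string; granularity: string };
type OrderBy = { field: string; direction: "asc" | "desc" }; // "asc" is assumed

interface QueryBody {
  measures: string[];
  dimensions?: Dimension[];
  filters?: string[];
  segments?: string[];
  order_by?: OrderBy[];
  limit?: number;
}

// Old shape: query fields nested under `query`, plus a `reasoning` note.
interface LegacySemanticQueryInput {
  connectionId: string;
  reasoning?: string;
  query: QueryBody;
}

// New shape: query fields sit at the top level next to `connectionId`.
type SlQueryInput = { connectionId: string } & QueryBody;

// Lift the nested fields to the top level and drop `reasoning`.
function flattenLegacyQuery(input: LegacySemanticQueryInput): SlQueryInput {
  return { connectionId: input.connectionId, ...input.query };
}
```

A helper like this would only matter for callers that still build the old payload; new callers can construct the flat object directly.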

--- a/packages/context/skills/sl_capture/SKILL.md
+++ b/packages/context/skills/sl_capture/SKILL.md
@@ -63,7 +63,7 @@ Preferred:
 - name: total_revenue
  expr: sum(amount)
 ```
-Callers filter `region = 'US'` at `semantic_query` time.
+Callers filter `region = 'US'` at query time.

 **Bake constants in only when the filter has named business meaning that won't change** (`enterprise_arr` for a contractually defined tier), cannot be expressed via the source's dimensions, or comes from a regulated/fixed list.

@@ -100,7 +100,7 @@ measures:

 **Extract repeated filter bundles into named segments.** If the same predicate appears on multiple measures of the same source, lift it to a `segments[]` entry and have each measure reference it. One edit updates every measure that depends on it.

-**Never write a standalone file on a manifest-backed name.** If `sl_discover({ tableName })` finds an existing schema for that name, you MUST write an overlay (`name:` + `measures:`/`segments:`/`descriptions:` only — no `sql:`, `table:`, `grain:`, `columns:`, `joins:`). A standalone with `sql:` or `table:` on a manifest-backed name clobbers the inherited columns and joins; `sl_write_source` and `sl_validate` both reject this shape with a clear fix hint. Always run `sl_discover` before your first write on any existing name.
+**Never write a standalone file on a manifest-backed name.** If `sl_discover({ query: "<table-or-source-name>" })` finds an existing schema for that name, you MUST write an overlay (`name:` + `measures:`/`segments:`/`descriptions:` only — no `sql:`, `table:`, `grain:`, `columns:`, `joins:`). A standalone with `sql:` or `table:` on a manifest-backed name clobbers the inherited columns and joins; `sl_write_source` and `sl_validate` both reject this shape with a clear fix hint. Always run `sl_discover` before your first write on any existing name.

 **Prefer overlay decomposition over standalone SQL sources.** Before reaching for `source_type: sql`, check whether the metric decomposes into measures on existing overlays (including cross-source derived measures). Use `source_type: sql` only when:
 - The metric requires per-user/per-entity derivation that cannot be expressed as a single `expr` (e.g., `EXISTS` over a time-windowed subset), OR
@@ -209,10 +209,10 @@ SL source, `tables:` frontmatter, `sl_refs`, or `emit_unmapped_fallback`:
 ## Tool sequence

 1. `sl_discover` — see what source files exist.
-2. `sl_discover({ tableName })` — **REQUIRED before the first write on any name**. Shows columns/joins/grain from the manifest. If the call returns a schema, you MUST write an overlay, not a standalone. Skipping this is the #1 cause of accidentally shadowing the manifest.
-3. `sl_read_source({ sourceName })` — read the raw YAML before editing.
-4. For modifications: `sl_edit_source({ sourceName, old_string, new_string })` with exact-string replacements. `old_string` must match exactly and be unique in the file.
-5. For new sources or full rewrites: `sl_write_source({ sourceName, content })` with the full YAML content.
+2. `sl_discover({ query: "<table-or-source-name>" })` — **REQUIRED before the first write on any name**. Shows columns/joins/grain from the manifest. If the call returns a schema, you MUST write an overlay, not a standalone. Skipping this is the #1 cause of accidentally shadowing the manifest.
+3. `sl_read_source({ connectionId, sourceName })` — read the raw YAML before editing.
+4. For modifications: `sl_edit_source({ connectionId, sourceName, yaml_edits: [{ oldText, newText, reason }] })` with exact-string replacements. `oldText` must match exactly and be unique in the file.
+5. For new sources or full rewrites: `sl_write_source({ connectionId, sourceName, source })` with the full structured source definition.
 6. For join discovery: use `sql_execution({connectionName: "warehouse", sql: "SELECT count(*) FROM public.orders o JOIN public.customers c ON c.id = o.customer_id LIMIT 20"})` with the target warehouse connection name and dialect-correct table names to verify the join key exists in both tables and assess cardinality before declaring the join.
 7. Cross-reference knowledge: author the edge once on the **wiki** side via `sl_refs: [source_name]` in the page's front-matter. The reverse edge (wiki pages that cite an SL source) is derived automatically by the reconciler — do not add a `knowledge_refs:` field to SL YAMLs.
 8. `sl_validate` — run after writing or editing to surface schema issues, duplicate measure names, and cross-source validation errors. Read-only; the writes are already committed (the squash-at-end flow will collapse them into one commit).
@@ -235,13 +235,21 @@ Existing index: `orders [measures=0, joins=0] — candidate for enrichment`.
 ```
 sl_discover()
  → orders.yaml does not exist yet
-sl_discover({ tableName: "orders" })
+sl_discover({ query: "orders" })
  → see grain, columns, no current overlay
 sl_write_source({
+  connectionId: "warehouse",
  sourceName: "orders",
-  content: "name: orders\nmeasures:\n  - name: avg_order_value\n    expr: avg(amount)\n    description: Mean order transaction amount — filter by product_category at query time\n"
+  source: {
+    name: "orders",
+    measures: [{
+      name: "avg_order_value",
+      expr: "avg(amount)",
+      description: "Mean order transaction amount - filter by product_category at query time"
+    }]
+  }
 })
-sl_validate()
+sl_validate({ connectionId: "warehouse" })
  → clean
 ```

@@ -258,16 +266,17 @@ Current user: "Wait, by 'active' I mean users who have placed an order in the la
 The existing `users.active_count` measure is wrong by the new definition.

 ```
-sl_read_source({ sourceName: "users" })
+sl_read_source({ connectionId: "warehouse", sourceName: "users" })
  → see the wrong measure
 sl_edit_source({
+  connectionId: "warehouse",
  sourceName: "users",
  yaml_edits: [{
    oldText: "  - name: active_count\n    expr: \"count(*)\"\n    filter: \"last_login_at > now() - interval '30 days'\"\n    description: Users who logged in within the last 30 days",
    newText: "  - name: active_count\n    expr: \"count(distinct case when last_order_at > now() - interval '30 days' then user_id end)\"\n    description: Users with at least one order in the last 30 days"
  }]
 })
-sl_validate()
+sl_validate({ connectionId: "warehouse" })
 ```

 If you only added a new measure, the old incorrect `active_count` would stay and future queries would keep answering the wrong question.
@@ -277,7 +286,7 @@ If you only added a new measure, the old incorrect `active_count` would stay and
 Prior turn: user asked to correlate LTV with protocol count; assistant joined `fct_orders` with `fct_mau_multiprotocol` on `admin_user_id` in raw SQL.

 ```
-sl_read_source({ sourceName: "fct_orders" })
+sl_read_source({ connectionId: "warehouse", sourceName: "fct_orders" })
  → no joins section yet
 sql_execution({
  connectionName: "warehouse",
@@ -285,13 +294,14 @@ sql_execution({
 })
  → confirms cardinality (many orders per MAU row = many_to_one)
 sl_edit_source({
+  connectionId: "warehouse",
  sourceName: "fct_orders",
  yaml_edits: [{
    oldText: "measures:",
    newText: "joins:\n  - to: fct_mau_multiprotocol\n    on: admin_user_id = fct_mau_multiprotocol.admin_user_id\n    relationship: many_to_one\nmeasures:"
  }]
 })
-sl_validate()
+sl_validate({ connectionId: "warehouse" })
 ```

 Always verify joins with `sql_execution` before adding them.
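
The tool sequence in this file requires that each `oldText` in `yaml_edits` match exactly and be unique in the file. A minimal sketch of that rule, assuming plain string matching. `applyYamlEdits` and `countOccurrences` are hypothetical helpers for illustration and do not show how `sl_edit_source` is actually implemented.

```typescript
// Hypothetical edit shape mirroring the `yaml_edits` entries in the examples.
interface YamlEdit {
  oldText: string;
  newText: string;
  reason?: string;
}

// Count non-overlapping occurrences of `needle` in `haystack`.
function countOccurrences(haystack: string, needle: string): number {
  if (needle.length === 0) return 0;
  let count = 0;
  let from = 0;
  while (true) {
    const at = haystack.indexOf(needle, from);
    if (at === -1) return count;
    count += 1;
    from = at + needle.length;
  }
}

// Apply each edit in order; refuse any edit whose oldText is missing or ambiguous.
function applyYamlEdits(content: string, edits: YamlEdit[]): string {
  let result = content;
  for (const edit of edits) {
    const hits = countOccurrences(result, edit.oldText);
    if (hits !== 1) {
      throw new Error(`oldText must match exactly once, found ${hits} matches`);
    }
    // A function replacer keeps `$` sequences in newText literal.
    result = result.replace(edit.oldText, () => edit.newText);
  }
  return result;
}
```

Under this rule, a short anchor such as `measures:` works only while the file contains it once; a longer `oldText` that includes neighbouring lines is the safer choice.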
--- a/packages/context/skills/wiki_capture/SKILL.md
+++ b/packages/context/skills/wiki_capture/SKILL.md
@@ -31,7 +31,7 @@ Do NOT capture:
 - Temporary instructions scoped to the current chat.
 - Ad-hoc formatting preferences.
 - Information already present in the semantic layer (column names, join paths, measure formulas — those belong in SL).
- **Query results, snapshots, or time-bounded benchmark tables.** Numbers go stale; pasting "Oct 2025: 25%, Nov 2025: 19.9%, …" creates misinformation as soon as new data lands. Reference the SL source by name (`sl_refs`) and let future queries pull live data — the wiki captures the *rule* (definition, exclusion, segmentation), the SL source captures the *measure*, and `semantic_query` captures the *current values*.
+- **Query results, snapshots, or time-bounded benchmark tables.** Numbers go stale; pasting "Oct 2025: 25%, Nov 2025: 19.9%, …" creates misinformation as soon as new data lands. Reference the SL source by name (`sl_refs`) and let future query tools pull live data — the wiki captures the *rule* (definition, exclusion, segmentation), the SL source captures the *measure*, and query execution captures the *current values*.
 - **Interpretive narrative tied to a specific snapshot** ("M1 retention degraded sharply from Dec 2025"). The observation is anchored to data that will move; the actionable convention (e.g., "always exclude in-progress cohorts") may be worth capturing on its own, but the snapshot-specific commentary is not.

 If nothing is worth capturing, respond without calling any tool.
@@ -136,7 +136,7 @@ wiki_search({ query: "refund revenue paid orders" })
  → returns `revenue-definition` (related — defines paid-orders filter)
 sl_discover({ query: "refund rate" })
  → returns fct_orders (score 0.08), fct_gaap_revenue (0.06)
-sl_read_source({ sourceName: "fct_orders" })
+sl_read_source({ connectionId: "warehouse", sourceName: "fct_orders" })
  → confirms amount_refunded_dollars and transaction_amount_dollars exist
 wiki_write({
  key: "refund-rate-definition",
--- a/packages/context/src/agent/agent-runner.service.test.ts
+++ b/packages/context/src/agent/agent-runner.service.test.ts
@@ -40,6 +40,8 @@ describe('AgentRunnerService.runLoop', () => {

  it('passes systemPrompt, userPrompt, tools, and step budget through to generateText', async () => {
    (generateText as any).mockResolvedValue({ text: 'ok', toolCalls: [], steps: [] });
+    const repairHandler = vi.fn();
+    llmProvider.repairToolCallHandler.mockReturnValueOnce(repairHandler);
    const tools = { noop: { description: 'noop', inputSchema: {}, execute: vi.fn() } };
    await runner.runLoop({
      modelRole: 'candidateExtraction',
@@ -59,7 +61,9 @@ describe('AgentRunnerService.runLoop', () => {
    expect(call.tools).toEqual(tools);
    expect(call.stopWhen).toBe(17);
    expect(call.temperature).toBe(0);
+    expect(call.experimental_repairToolCall).toBe(repairHandler);
    expect(llmProvider.getModel).toHaveBeenCalledWith('candidateExtraction');
+    expect(llmProvider.repairToolCallHandler).toHaveBeenCalledWith({ source: 'ktx-agent-runner' });
  });

  it('returns stopReason=natural when the loop completes without error', async () => {
--- a/packages/context/src/agent/agent-runner.service.ts
+++ b/packages/context/src/agent/agent-runner.service.ts
@@ -73,6 +73,9 @@ export class AgentRunnerService {
        temperature: 0,
        stopWhen: stepCountIs(params.stepBudget),
        experimental_telemetry: this.deps.telemetry?.createTelemetry(params.telemetryTags),
+        experimental_repairToolCall: this.deps.llmProvider.repairToolCallHandler({
+          source: params.telemetryTags.operationName ?? 'ktx-agent-runner',
+        }),
        messages: built.messages,
        tools: built.tools as Record<string, Tool>,
        onStepFinish: async () => {
--- a/packages/context/src/ingest/ingest-bundle.runner.test.ts
+++ b/packages/context/src/ingest/ingest-bundle.runner.test.ts
@@ -695,7 +695,8 @@ describe('IngestBundleRunner — Stages 1 → 7', () => {
        await params.toolSet.emit_unmapped_fallback.execute(
          {
            rawPath: 'a.yml',
-            reason: 'semantic_not_representable',
+            reason: 'parse_error',
+            clarification: 'semantic_not_representable',
            fallback: 'flagged',
          },
          { toolCallId: 'fallback-1', messages: [] },
@@ -954,6 +955,7 @@ describe('IngestBundleRunner — Stages 1 → 7', () => {
            {
              rawPath: 'a.yml',
              reason: 'conversion_metric_unsupported',
+              detail: expect.stringContaining('conversion metric'),
              fallback: 'flagged',
            },
          ],
@@ -1006,7 +1008,8 @@ describe('IngestBundleRunner — Stages 1 → 7', () => {
        await params.toolSet.emit_unmapped_fallback.execute(
          {
            rawPath: 'cards/untranslated.json',
-            reason: 'metabase_sql_untranslated',
+            reason: 'parse_error',
+            clarification: 'metabase_sql_untranslated',
            fallback: 'flagged',
          },
          { toolCallId: 'fallback-1', messages: [] },
@@ -1053,7 +1056,8 @@ describe('IngestBundleRunner — Stages 1 → 7', () => {
          unmappedFallbacks: [
            {
              rawPath: 'cards/untranslated.json',
-              reason: 'metabase_sql_untranslated',
+              reason: 'parse_error',
+              detail: expect.stringContaining('metabase_sql_untranslated'),
              fallback: 'flagged',
            },
          ],
--- a/packages/context/src/ingest/stages/stage-index.types.ts
+++ b/packages/context/src/ingest/stages/stage-index.types.ts
@@ -37,7 +37,9 @@ export type UnmappedFallbackReason =
  | 'multiple_table_references'
  | 'unsupported_dialect'
  | 'parse_error'
-  | 'missing_target_table';
+  | 'missing_target_table'
+  | 'cumulative_metric_unsupported'
+  | 'conversion_metric_unsupported';

 export interface UnmappedFallbackRecord {
  rawPath: string;
--- a/packages/context/src/ingest/tools/emit-reconciliation-records.tool.test.ts
+++ b/packages/context/src/ingest/tools/emit-reconciliation-records.tool.test.ts
@@ -182,6 +182,30 @@ describe('reconciliation emit tools', () => {
    ]);
  });

+  it('records MetricFlow-specific unsupported fallback reasons', async () => {
+    const stageIndex = makeStageIndex();
+    const tool = createEmitUnmappedFallbackTool({
+      stageIndex,
+      allowedPaths: new Set(['metrics/conversion.yml']),
+    });
+
+    const output = await executeTool(tool, {
+      rawPath: 'metrics/conversion.yml',
+      reason: 'conversion_metric_unsupported',
+      fallback: 'flagged',
+    });
+
+    expect(output).toContain('conversion metric');
+    expect(stageIndex.unmappedFallbacks).toEqual([
+      {
+        rawPath: 'metrics/conversion.yml',
+        reason: 'conversion_metric_unsupported',
+        detail: expect.stringContaining('conversion metric'),
+        fallback: 'flagged',
+      },
+    ]);
+  });
+
  it('rejects unmapped fallback decisions for raw paths outside the allowed set', async () => {
    const stageIndex = makeStageIndex();
    const tool = createEmitUnmappedFallbackTool({
--- a/packages/context/src/ingest/tools/emit-unmapped-fallback.tool.ts
+++ b/packages/context/src/ingest/tools/emit-unmapped-fallback.tool.ts
@@ -17,6 +17,8 @@ const unmappedFallbackReasonSchema = z.enum([
  'unsupported_dialect',
  'parse_error',
  'missing_target_table',
+  'cumulative_metric_unsupported',
+  'conversion_metric_unsupported',
 ]);

 function sameUnmappedFallback(left: UnmappedFallbackRecord, right: UnmappedFallbackRecord): boolean {
@@ -47,6 +49,10 @@ function canonicalDetail(reason: UnmappedFallbackReason, tableRef: string | unde
      return `${tableClause} uses a SQL dialect that is not yet supported.`;
    case 'parse_error':
      return `${tableClause} could not be parsed.`;
+    case 'cumulative_metric_unsupported':
+      return `${tableClause} is a cumulative metric, which is not yet supported as a first-class semantic-layer primitive.`;
+    case 'conversion_metric_unsupported':
+      return `${tableClause} is a conversion metric, which is not yet supported as a first-class semantic-layer primitive.`;
  }
 }
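
The two new reasons have to be added in three places that must stay in sync: the `UnmappedFallbackReason` union, the zod enum, and the `canonicalDetail` switch. A minimal sketch of how an exhaustive switch can make a missed case a compile error. It covers only four of the reasons visible in this diff. The `tableClause` wording and the `never` guard are assumptions, since the diff does not show how the real function builds the clause or handles unknown reasons.

```typescript
// Subset of the reasons visible in the diff; the real union has more members.
type ReasonSketch =
  | "unsupported_dialect"
  | "parse_error"
  | "cumulative_metric_unsupported"
  | "conversion_metric_unsupported";

function canonicalDetailSketch(reason: ReasonSketch, tableRef?: string): string {
  // Assumption: the real `tableClause` construction is not shown in the diff.
  const tableClause = tableRef ? `Table ${tableRef}` : "The source";
  switch (reason) {
    case "unsupported_dialect":
      return `${tableClause} uses a SQL dialect that is not yet supported.`;
    case "parse_error":
      return `${tableClause} could not be parsed.`;
    case "cumulative_metric_unsupported":
      return `${tableClause} is a cumulative metric, which is not yet supported as a first-class semantic-layer primitive.`;
    case "conversion_metric_unsupported":
      return `${tableClause} is a conversion metric, which is not yet supported as a first-class semantic-layer primitive.`;
    default: {
      // If a new member is added to ReasonSketch without a case above,
      // this assignment stops compiling.
      const unreachable: never = reason;
      throw new Error(`Unhandled reason: ${String(unreachable)}`);
    }
  }
}
```

The test added in this commit checks the same property from the outside: the recorded `detail` for `conversion_metric_unsupported` contains the phrase "conversion metric".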

--- a/packages/context/src/ingest/tools/eviction-list.tool.test.ts
+++ b/packages/context/src/ingest/tools/eviction-list.tool.test.ts
@@ -51,6 +51,6 @@ describe('eviction_list tool', () => {
      deletedRawPaths: [],
    });

-    expect(tool.description).toContain('context_eviction_decision_write');
+    expect(tool.description).toContain('emit_eviction_decision');
  });
 });
--- a/packages/context/src/ingest/tools/eviction-list.tool.ts
+++ b/packages/context/src/ingest/tools/eviction-list.tool.ts
@@ -12,7 +12,7 @@ export interface EvictionListDeps {
 export function createEvictionListTool(deps: EvictionListDeps) {
  return tool({
    description:
-      'List every artifact that the most recent completed sync produced from a now-deleted raw file. Remove each listed artifact and record the decision with context_eviction_decision_write so the ingest report lists every deleted-source decision.',
+      'List every artifact that the most recent completed sync produced from a now-deleted raw file. Remove each listed artifact and record the decision with emit_eviction_decision so the ingest report lists every deleted-source decision.',
    inputSchema: z.object({}),
    execute: async () => {
      if (deps.deletedRawPaths.length === 0) {
--- a/packages/context/src/ingest/tools/verification-ledger.tool.ts
+++ b/packages/context/src/ingest/tools/verification-ledger.tool.ts
@@ -28,7 +28,7 @@ const WRITE_TOOL_NAMES = new Set([
 ]);

 export const VERIFICATION_LEDGER_PROMPT = `<pre_write_verification>
-Before any write-capable tool call (wiki_write, wiki_remove, sl_write_source, sl_edit_source, emit_unmapped_fallback), call record_verification_ledger.
+Before any durable wiki, semantic-layer, or unmapped-fallback write (wiki_write, wiki_remove, sl_write_source, sl_edit_source, emit_unmapped_fallback), call record_verification_ledger.
 The ledger is a model-authored checkpoint, not a deterministic parser gate. Summarize the verification protocol from the loaded skill, list identifiers verified with discover_data/entity_details/sql_execution, and list anything intentionally left unverified. If the write contains no warehouse identifiers, say that explicitly.
 If a write tool returns verification_ledger_required, complete the ledger and retry the write.
 </pre_write_verification>`;
--- a/packages/context/src/llm/generation.ts
+++ b/packages/context/src/llm/generation.ts
@@ -4,6 +4,10 @@ import { generateText, Output, type FlexibleSchema, type ToolSet } from 'ai';
 type GenerateTextInput = Parameters<typeof generateText>[0];
 type GenerateTextFn = (input: GenerateTextInput) => Promise<{ text?: string; output?: unknown }>;

+function hasTools(tools: ToolSet): boolean {
+  return Object.keys(tools).length > 0;
+}
+
 interface GenerateKtxTextInput {
  llmProvider: KtxLlmProvider;
  role: KtxModelRole;
@@ -30,6 +34,13 @@ export async function generateKtxText(input: GenerateKtxTextInput): Promise<stri
    temperature: input.temperature ?? 0,
    messages: built.messages,
    tools: built.tools as ToolSet,
+    ...(hasTools(built.tools as ToolSet)
+      ? {
+          experimental_repairToolCall: input.llmProvider.repairToolCallHandler({
+            source: `ktx-${input.role}`,
+          }),
+        }
+      : {}),
  });
  if (typeof result.text !== 'string') {
    throw new Error('KTX LLM text generation returned no text');
@@ -52,6 +63,13 @@ export async function generateKtxObject<TOutput, TSchema>(
    temperature: input.temperature ?? 0,
    messages: built.messages,
    tools: built.tools as ToolSet,
+    ...(hasTools(built.tools as ToolSet)
+      ? {
+          experimental_repairToolCall: input.llmProvider.repairToolCallHandler({
+            source: `ktx-${input.role}`,
+          }),
+        }
+      : {}),
    output: Output.object({
      schema: input.schema as FlexibleSchema<TOutput>,
    }),
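
Both hunks in this file attach the repair handler only when the built tool set is non-empty, using a conditional object spread. A minimal, dependency-free sketch of that pattern follows. `ToolSetLike`, `RepairHandlerSketch`, `repairToolCallHandlerSketch`, and `buildCallOptions` are hypothetical stand-ins for illustration; the real handler comes from `llmProvider.repairToolCallHandler` and its behavior is not shown in the diff.

```typescript
type ToolSetLike = Record<string, unknown>;

// Stand-in handler type: a callable tagged with the source it was created for.
type RepairHandlerSketch = (() => Promise<null>) & { source: string };

interface CallOptionsSketch {
  temperature: number;
  tools: ToolSetLike;
  experimental_repairToolCall?: RepairHandlerSketch;
}

function hasTools(tools: ToolSetLike): boolean {
  return Object.keys(tools).length > 0;
}

// Hypothetical stand-in for llmProvider.repairToolCallHandler({ source }).
function repairToolCallHandlerSketch(opts: { source: string }): RepairHandlerSketch {
  const handler = async (): Promise<null> => null;
  return Object.assign(handler, { source: opts.source });
}

function buildCallOptions(role: string, tools: ToolSetLike): CallOptionsSketch {
  return {
    temperature: 0,
    tools,
    // Spread in the handler key only when there is at least one tool;
    // otherwise the key is absent rather than set to undefined.
    ...(hasTools(tools)
      ? { experimental_repairToolCall: repairToolCallHandlerSketch({ source: `ktx-${role}` }) }
      : {}),
  };
}
```

Spreading an empty object leaves the key off the options entirely, which differs from passing `experimental_repairToolCall: undefined` for any consumer that checks key presence.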
--- a/packages/context/src/sl/tools/sl-discover.tool.ts
+++ b/packages/context/src/sl/tools/sl-discover.tool.ts
@@ -53,7 +53,7 @@ export class SlDiscoverTool extends BaseSemanticLayerTool<typeof slDiscoverInput
    return `<purpose>
 Discover available semantic layer sources, columns, measures, and joins.
 When called without a connectionId, discovers sources across ALL data sources — grouped by data source name and ID.
-Use this to understand what data is available before writing a semantic_query.
+Use this to understand what data is available before querying through the semantic layer.
 </purpose>

 <when_to_use>
--- a/packages/context/src/sl/tools/sl-read-source.tool.ts
+++ b/packages/context/src/sl/tools/sl-read-source.tool.ts
@@ -36,7 +36,7 @@ Use this when you need to understand how a source is built — e.g., before edit

 <when_not_to_use>
 - To discover what sources/measures/dimensions are available for querying — use sl_discover instead
- To query data — use semantic_query or create_widget with slQuery
+- To query data — use the semantic-layer query surface (\`sl_query\` in MCP)
 </when_not_to_use>`;
  }