feat(ingest): default local ingest to isolated diffs (#128)

* docs: add isolated-diff ingestion design

* Refine isolated-diff ingestion design after adversarial review iteration 1

* Refine isolated-diff ingestion design after adversarial review iteration 2

* Refine isolated-diff ingestion design after adversarial review iteration 3

* feat: persist ingest trace events

* feat: add isolated ingest patch helpers

* feat: validate wiki body semantic references

* feat: add final ingest artifact gates

* feat: execute ingest work units in child worktrees

* feat: integrate isolated work unit patches

* feat: route selected ingest sources through isolated diffs

* test: cover isolated diff ingestion regressions

* feat: add isolated diff ingestion v1 core

* docs: document ingest trace inspection

* docs: add isolated diff ingestion v1 core plan

* fix(ingest): tighten final artifact gates

* fix(ingest): gate isolated final integration tree

* fix(ingest): persist postmortem failure traces

* fix(ingest): trace policy conflicts and cleanup child worktrees

* test(ingest): verify isolated diff postmortem coverage

* docs: add isolated diff ingestion gates and trace closure plan

* fix(ingest): gate provenance before isolated diff squash

* docs: add isolated diff ingestion provenance gate closure plan

* fix(ingest): gate final wiki references

* fix(ingest): enforce SL target connection scope

* fix(ingest): trace isolated SL target policy gates

* test(ingest): cover isolated diff reference and target gates

* chore(ingest): verify isolated diff gate closure

* docs: add isolated diff ingestion reference and target gate closure plan

* fix(ingest): gate global wiki references

* docs: add isolated diff ingestion global wiki reference gate closure plan

* fix(ingest): validate scan sources and wiki refs

* test(ingest): cover isolated diff textual conflict resolver

* test(ingest): cover isolated diff resolver integration

* feat(ingest): repair isolated diff textual conflicts

* feat(ingest): report isolated diff resolver outcomes

* test(ingest): verify isolated diff textual conflict repair

* test(ingest): align textual conflict failure coverage

* docs: add isolated diff textual conflict resolver plan

* test(ingest): cover isolated diff gate repair

* feat(ingest): add isolated diff gate repair agent

* feat(ingest): repair isolated diff semantic gate failures

* feat(ingest): wire isolated diff gate repair

* test(ingest): verify isolated diff final gate repair

* chore(ingest): verify isolated diff gate repair

* docs: add isolated diff gate repair plan

* Improve ingest progress updates

* feat(ingest): route direct-write connectors through isolated diffs

* test(ingest): cover non-metabase isolated diff routing

* feat(ingest): project metricflow semantic models before work units

* test(ingest): verify metricflow isolated projection path

* chore(ingest): verify isolated diff connector migration

* docs: add isolated diff connector migration plan

* feat(ingest): make isolated diff routing the private default

* feat(ingest): promote isolated diff to default runner path

* feat(ingest): default local ingest to isolated diffs

* chore(ingest): remove isolated diff allowlist references

* fix(ingest): preserve transient evidence for isolated work units

* docs: add isolated diff default promotion plan

* refactor(ingest): remove shared worktree WorkUnit path

* docs(ingest): align WorkUnit prompts with isolated diffs

* test(ingest): drop unused runner import

* docs: add isolated diff shared worktree removal plan

* docs: add isolated diff gate repair classification plan

* fix: restrict claude-code mcp servers

* docs: align ingest trace guidance with public CLI

---------

Co-authored-by: Andrey Avtomonov <7889985+andreybavt@users.noreply.github.com>
Andrey Avtomonov 2026-05-18 13:38:06 +02:00 committed by GitHub
parent d1c84e5564
commit e64da5a85d
GPG key ID: B5690EEEBB952194
66 changed files with 22346 additions and 514 deletions


@@ -111,6 +111,41 @@ notion skipped skipped done done
Use `--json` when a script or agent needs the selected plan and per-target
results.
## Inspect source ingest traces
Source ingest writes persistent JSONL traces for postmortem debugging. Plain
ingest output prints the trace path near the report, run, and job identifiers
when a trace is available:
```text
Report: report-abc123
Run: run-abc123
Job: job-abc123
Trace: .ktx/ingest-traces/job-abc123/trace.jsonl
```
The trace file lives under the project directory at
`.ktx/ingest-traces/<jobId>/trace.jsonl`. Each line is a JSON event with the
job id, run id, sync id, connection id, source key, phase, event name, timing,
state snapshot, decision context, and error details. Failed runs also write a
stored ingest report with `status: "failed"`, `failure.phase`,
`failure.message`, and the same trace path.
Use `jq` or line-oriented tools to inspect a trace:
```bash
jq -c '{at, level, phase, event, durationMs, data, error}' \
  .ktx/ingest-traces/<jobId>/trace.jsonl
```
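Scripted postmortems can read the same fields from TypeScript. The sketch below is not part of KTX. It assumes only the field names used in the `jq` filter above. `TraceEvent`, `parseTrace`, `firstError`, and `durationByPhase` are hypothetical helpers.

```typescript
// Minimal event shape, assuming the field names from the jq filter above;
// real KTX trace events may carry more fields.
interface TraceEvent {
  at?: string;
  level?: string;
  phase?: string;
  event?: string;
  durationMs?: number;
  data?: unknown;
  error?: { message?: string } | null;
}

// Parse JSONL text: one JSON object per non-empty line.
function parseTrace(jsonl: string): TraceEvent[] {
  return jsonl
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as TraceEvent);
}

// Return the first event that carries an error, if any.
function firstError(events: TraceEvent[]): TraceEvent | undefined {
  return events.find((e) => e.error != null);
}

// Sum durationMs per phase; events without a phase or duration are skipped.
function durationByPhase(events: TraceEvent[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of events) {
    if (e.phase === undefined || typeof e.durationMs !== 'number') continue;
    totals.set(e.phase, (totals.get(e.phase) ?? 0) + e.durationMs);
  }
  return totals;
}
```

In Node.js, you can pass the file contents directly, for example `parseTrace(await readFile(tracePath, 'utf-8'))`, where `readFile` comes from `node:fs/promises`.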
The default trace level is `debug`. Set `KTX_INGEST_TRACE_LEVEL` to `error`,
`info`, `debug`, or `trace` before running ingest to change the trace
verbosity:
```bash
KTX_INGEST_TRACE_LEVEL=trace ktx ingest metabase
```
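One plausible model of the level switch is sketched below. It assumes the four levels are ordered from least to most verbose and that an event is written when its level is no more verbose than the configured one. KTX's actual filtering rule, and how it treats unrecognized values, are not documented here. This sketch simply falls back to the `debug` default. All names are hypothetical.

```typescript
// Assumed ordering, least to most verbose.
const TRACE_LEVELS = ['error', 'info', 'debug', 'trace'] as const;
type TraceLevel = (typeof TRACE_LEVELS)[number];

function isTraceLevel(value: string): value is TraceLevel {
  return (TRACE_LEVELS as readonly string[]).includes(value);
}

// Read the configured level from an environment map. Falling back to 'debug'
// on unset or unknown values is an assumption of this sketch.
function configuredTraceLevel(env: Record<string, string | undefined>): TraceLevel {
  const raw = env['KTX_INGEST_TRACE_LEVEL']?.trim().toLowerCase();
  return raw !== undefined && isTraceLevel(raw) ? raw : 'debug';
}

// Write an event only if its level is no more verbose than the configured one.
function shouldWrite(eventLevel: TraceLevel, configured: TraceLevel): boolean {
  return TRACE_LEVELS.indexOf(eventLevel) <= TRACE_LEVELS.indexOf(configured);
}
```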
## Common errors
| Error | Cause | Recovery |

File diff suppressed because it is too large


@@ -0,0 +1,493 @@
# Isolated Diff Ingestion V1 Global Wiki Reference Gate Closure Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use
> superpowers:subagent-driven-development (recommended) or
> superpowers:executing-plans to implement this plan task-by-task. Steps use
> checkbox (`- [ ]`) syntax for tracking.

**Goal:** Reject final trees where an isolated-diff run changes semantic-layer
sources or deletes wiki pages and leaves pre-existing wiki pages with stale
body, `sl_refs`, frontmatter `refs`, or inline `[[page-key]]` references.

**Architecture:** Keep `artifact-gates.ts` validation-only. The runner expands
the final wiki gate scope before the existing final artifact gate: changed pages
are always validated, and all global wiki pages are validated when the run
changes any semantic-layer source or removes any wiki page. The final-gate trace
records the expanded scope and why it was expanded.

**Tech Stack:** TypeScript, Vitest, pnpm workspace commands, existing
`IngestBundleRunner`, `KnowledgeWikiService`, and isolated-diff test fixtures.

---
## Audit Summary
The implemented isolated-diff plans cover the core v1 flow: child worktrees,
binary no-rename patch proposals, `git apply --3way --index`, policy rejection,
final gates after reconciliation and repair, pre-squash provenance raw-path
validation, target-connection enforcement, failed reports, and persistent JSONL
traces.

One v1-blocking correctness gap remains. Final wiki gates currently validate
wiki pages changed by the run. They do not validate unchanged pages that become
invalid because the run changes a semantic-layer source or deletes a referenced
wiki page. Two concrete failures can therefore squash into main:
- A pre-existing wiki page body contains
`` `mart_account_segments.total_contract_arr_cents` `` while the run updates
`semantic-layer/warehouse/mart_account_segments.yaml` to define only
`total_contract_arr`.
- A pre-existing wiki page has `refs: [source-page]` or `[[source-page]]` while
  the run deletes `wiki/global/source-page.md`.

This plan does not expand connector rollout, promote isolated diffs to the
default, add interactive resolution, add semantic auto-merge, remove the old
path, expand transitive semantic-layer dependencies, or move provenance into
files.
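Both failure modes above reduce to reference checks against the final tree. The sketch below illustrates the idea with hypothetical helpers. It is not the validator KTX runs in its final artifact gates, and its regexes are assumptions about the reference syntax.

```typescript
// Backticked `source.member` references in a wiki page body (assumed syntax).
function bodySemanticRefs(body: string): string[] {
  const refs: string[] = [];
  for (const match of body.matchAll(/`([A-Za-z_]\w*)\.([A-Za-z_]\w*)`/g)) {
    refs.push(`${match[1]}.${match[2]}`);
  }
  return refs;
}

// Inline [[page-key]] links in a wiki page body.
function inlinePageLinks(body: string): string[] {
  const links: string[] = [];
  for (const match of body.matchAll(/\[\[([^\]]+)\]\]/g)) {
    const key = match[1];
    if (key) links.push(key);
  }
  return links;
}

// Body references that the final semantic layer no longer defines.
function staleBodyRefs(body: string, definedRefs: ReadonlySet<string>): string[] {
  return bodySemanticRefs(body).filter((ref) => !definedRefs.has(ref));
}

// "page -> target" pairs for referenced pages missing from the final tree.
function missingPageRefs(pageKey: string, refs: string[], existingPages: ReadonlySet<string>): string[] {
  return refs.filter((ref) => !existingPages.has(ref)).map((ref) => `${pageKey} -> ${ref}`);
}
```

A caller would pass the frontmatter `refs` list together with `inlinePageLinks(body)` to `missingPageRefs`. The `account-segments -> source-page` output has the same shape as the error text the second regression below expects.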
## File Structure
- Modify `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`.
Adds two failing end-to-end regressions for unchanged wiki pages made stale by
semantic-layer changes and wiki-page deletion.
- Modify `packages/context/src/ingest/ingest-bundle.runner.ts`.
Adds a final wiki gate scope helper, expands validation to all global wiki
pages when final state changes can invalidate unchanged references, and records
scope details in the final-gate trace and failed report.
---
### Task 1: Add failing unchanged wiki regressions
**Files:**
- Modify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`
- [ ] **Step 1: Add the stale existing wiki body regression**
Insert this test inside `describe('IngestBundleRunner isolated diff path', ...)`
after the existing Metabase stale-measure regression:
```ts
it('rejects unchanged wiki body refs made stale by isolated semantic-layer changes', async () => {
  const runtime = await makeRealGitRuntime();
  try {
    await mkdir(join(runtime.configDir, 'semantic-layer/warehouse'), { recursive: true });
    await mkdir(join(runtime.configDir, 'wiki/global'), { recursive: true });
    await writeFile(
      join(runtime.configDir, 'semantic-layer/warehouse/mart_account_segments.yaml'),
      'name: mart_account_segments\ngrain: [account_id]\ncolumns: [{name: account_id, type: string}]\njoins: []\nmeasures:\n  - name: total_contract_arr_cents\n    expr: sum(contract_arr)\n',
    );
    await writeFile(
      join(runtime.configDir, 'wiki/global/account-segments.md'),
      '---\nsummary: Account segments\nusage_mode: auto\n---\n\nExisting ARR uses `mart_account_segments.total_contract_arr_cents`.\n',
    );
    await runtime.git.commitFiles(
      ['semantic-layer/warehouse/mart_account_segments.yaml', 'wiki/global/account-segments.md'],
      'seed existing wiki body ref',
      'KTX Test',
      'system@ktx.local',
    );
    const preRunHead = await runtime.git.revParseHead();
    const { deps, adapter } = makeDeps(runtime);
    adapter.chunk.mockResolvedValue({
      workUnits: [{ unitKey: 'source-only', rawFiles: ['cards/source.json'], peerFileIndex: [], dependencyPaths: [] }],
    });
    let currentSession: any = null;
    deps.toolsetFactory.createIngestWuToolset = vi.fn((toolSession: any) => {
      currentSession = toolSession;
      return { toRuntimeTools: vi.fn(() => ({})) };
    });
    deps.agentRunner.runLoop = vi.fn(async () => {
      const root = rootOfConfig(currentSession.configService, runtime.configDir);
      await writeFile(
        join(root, 'semantic-layer/warehouse/mart_account_segments.yaml'),
        'name: mart_account_segments\ngrain: [account_id]\ncolumns: [{name: account_id, type: string}]\njoins: []\nmeasures:\n  - name: total_contract_arr\n    expr: sum(contract_arr)\n',
      );
      addTouchedSlSource(currentSession.touchedSlSources, 'warehouse', 'mart_account_segments');
      currentSession.actions.push({
        target: 'sl',
        type: 'updated',
        key: 'mart_account_segments',
        detail: 'Rename ARR measure',
        targetConnectionId: 'warehouse',
        rawPaths: ['cards/source.json'],
      });
      await currentSession.gitService.commitFiles(
        ['semantic-layer/warehouse/mart_account_segments.yaml'],
        'wu source rename',
        'KTX Test',
        'system@ktx.local',
      );
      return { stopReason: 'natural' };
    }) as never;
    const runner = new IngestBundleRunner(deps);
    await mockStageRawFiles(runner, runtime, [['cards/source.json', 'h1']]);
    await expect(
      runner.run({
        jobId: 'job-existing-body-stale',
        connectionId: 'warehouse',
        sourceKey: 'metabase',
        trigger: 'upload',
        bundleRef: { kind: 'upload', uploadId: 'upload' },
      }),
    ).rejects.toThrow(/total_contract_arr_cents/);
    expect(await runtime.git.revParseHead()).toBe(preRunHead);
    const trace = await readFile(join(runtime.configDir, '.ktx/ingest-traces/job-existing-body-stale/trace.jsonl'), 'utf-8');
    expect(trace).toContain('final_artifact_gates_failed');
    expect(trace).toContain('account-segments');
    expect(trace).toContain('semantic_layer_changed');
    expect(trace).toContain('ingest_failed');
    expect(trace).toContain('failure_report_created');
    expect(trace).not.toContain('squash_finished');
  } finally {
    await rm(runtime.homeDir, { recursive: true, force: true });
  }
});
```
- [ ] **Step 2: Add the stale existing wiki page-reference regression**
Insert this test near the existing final wiki reference regression:
```ts
it('rejects unchanged inbound wiki refs broken by an isolated wiki deletion', async () => {
  const runtime = await makeRealGitRuntime();
  try {
    await mkdir(join(runtime.configDir, 'wiki/global'), { recursive: true });
    await writeFile(
      join(runtime.configDir, 'wiki/global/source-page.md'),
      '---\nsummary: Source page\nusage_mode: auto\n---\n\nSource page\n',
    );
    await writeFile(
      join(runtime.configDir, 'wiki/global/account-segments.md'),
      '---\nsummary: Account segments\nusage_mode: auto\nrefs:\n - source-page\n---\n\nSee [[source-page]].\n',
    );
    await runtime.git.commitFiles(
      ['wiki/global/source-page.md', 'wiki/global/account-segments.md'],
      'seed inbound wiki refs',
      'KTX Test',
      'system@ktx.local',
    );
    const preRunHead = await runtime.git.revParseHead();
    const { deps, adapter } = makeDeps(runtime);
    adapter.chunk.mockResolvedValue({
      workUnits: [{ unitKey: 'delete-target-page', rawFiles: ['pages/delete.json'], peerFileIndex: [], dependencyPaths: [] }],
    });
    let currentSession: any = null;
    deps.toolsetFactory.createIngestWuToolset = vi.fn((toolSession: any) => {
      currentSession = toolSession;
      return { toRuntimeTools: vi.fn(() => ({})) };
    });
    deps.agentRunner.runLoop = vi.fn(async () => {
      const root = rootOfConfig(currentSession.configService, runtime.configDir);
      await rm(join(root, 'wiki/global/source-page.md'), { force: true });
      currentSession.actions.push({
        target: 'wiki',
        type: 'removed',
        key: 'source-page',
        detail: 'Delete referenced page',
        rawPaths: ['pages/delete.json'],
      });
      await currentSession.gitService.commitFiles(
        ['wiki/global/source-page.md'],
        'wu delete target page',
        'KTX Test',
        'system@ktx.local',
      );
      return { stopReason: 'natural' };
    }) as never;
    const runner = new IngestBundleRunner(deps);
    await mockStageRawFiles(runner, runtime, [['pages/delete.json', 'h1']]);
    await expect(
      runner.run({
        jobId: 'job-existing-wiki-ref-stale',
        connectionId: 'warehouse',
        sourceKey: 'metabase',
        trigger: 'upload',
        bundleRef: { kind: 'upload', uploadId: 'upload' },
      }),
    ).rejects.toThrow(/wiki references target missing page\(s\): account-segments -> source-page/);
    expect(await runtime.git.revParseHead()).toBe(preRunHead);
    const trace = await readFile(join(runtime.configDir, '.ktx/ingest-traces/job-existing-wiki-ref-stale/trace.jsonl'), 'utf-8');
    expect(trace).toContain('final_artifact_gates_failed');
    expect(trace).toContain('account-segments -> source-page');
    expect(trace).toContain('wiki_page_removed');
    expect(trace).toContain('ingest_failed');
    expect(trace).toContain('failure_report_created');
    expect(trace).not.toContain('squash_finished');
  } finally {
    await rm(runtime.homeDir, { recursive: true, force: true });
  }
});
```
- [ ] **Step 3: Run the focused regressions and verify they fail**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/ingest-bundle.runner.isolated-diff.test.ts -t "unchanged wiki body refs|unchanged inbound wiki refs"
```
Expected: FAIL. The stale body test currently squashes successfully because the
unchanged `account-segments` page is not in `finalChangedWikiPageKeys`. The
inbound wiki ref test currently squashes successfully because the deleted
`source-page` is validated as a missing changed page and skipped, while the
unchanged page that references it is never validated.

---
### Task 2: Expand the final wiki validation scope
**Files:**
- Modify: `packages/context/src/ingest/ingest-bundle.runner.ts`
- [ ] **Step 1: Add final wiki gate scope helpers**
Add these private methods after `uniqueTouchedSlSources()`:
```ts
private removedWikiPageKeysFromActions(actions: MemoryAction[]): string[] {
  return this.uniqueWikiPageKeys(
    actions.filter((action) => action.target === 'wiki' && action.type === 'removed').map((action) => action.key),
  );
}

private async wikiPageKeysForFinalGates(input: {
  wikiService: ReturnType<KnowledgeWikiService['forWorktree']>;
  changedWikiPageKeys: string[];
  touchedSlSources: TouchedSlSource[];
  actions: MemoryAction[];
}): Promise<{
  pageKeys: string[];
  trace: {
    global: boolean;
    reasons: string[];
    changedWikiPageKeys: string[];
    removedWikiPageKeys: string[];
    pageKeysValidated: string[];
  };
}> {
  const changedWikiPageKeys = this.uniqueWikiPageKeys(input.changedWikiPageKeys);
  const removedWikiPageKeys = this.removedWikiPageKeysFromActions(input.actions);
  const reasons: string[] = [];
  if (input.touchedSlSources.length > 0) {
    reasons.push('semantic_layer_changed');
  }
  if (removedWikiPageKeys.length > 0) {
    reasons.push('wiki_page_removed');
  }
  let pageKeys = changedWikiPageKeys;
  if (reasons.length > 0) {
    pageKeys = this.uniqueWikiPageKeys([
      ...changedWikiPageKeys,
      ...(await input.wikiService.listPageKeys('GLOBAL', null)),
    ]);
  }
  return {
    pageKeys,
    trace: {
      global: reasons.length > 0,
      reasons,
      changedWikiPageKeys,
      removedWikiPageKeys,
      pageKeysValidated: pageKeys,
    },
  };
}
```
- [ ] **Step 2: Use the expanded scope before final gates**
In `runInner()`, replace the current `finalChangedWikiPageKeys` and
`finalTouchedSlSources` block with this code:
```ts
const baseFinalChangedWikiPageKeys = this.uniqueWikiPageKeys([
  ...(isolatedDiffEnabled ? projectionChangedWikiPageKeys : []),
  ...workUnitOutcomes
    .flatMap((outcome) => outcome.patchTouchedPaths ?? [])
    .flatMap((path) => this.wikiPageKeysFromPaths([path])),
  ...this.wikiPageKeysFromActions(reconcileActions),
  ...postReconciliationPaths.flatMap((path) => this.wikiPageKeysFromPaths([path])),
  ...wikiSlRefRepairResult.repairs.filter((repair) => repair.scope === 'GLOBAL').map((repair) => repair.pageKey),
]);
const finalTouchedSlSources = this.uniqueTouchedSlSources([
  ...(isolatedDiffEnabled ? projectionTouchedSources : []),
  ...workUnitOutcomes.flatMap((outcome) => outcome.touchedSlSources),
  ...this.touchedSlSourcesFromActions(reconcileActions, job.connectionId),
  ...this.touchedSlSourcesFromPaths(postReconciliationPaths),
  ...(postProcessorOutcome?.touchedSources ?? []),
]);
const finalWikiGateScope = await this.wikiPageKeysForFinalGates({
  wikiService: this.deps.wikiService.forWorktree(sessionWorktree.workdir),
  changedWikiPageKeys: baseFinalChangedWikiPageKeys,
  touchedSlSources: finalTouchedSlSources,
  actions: [...stageIndex.workUnits.flatMap((wu) => wu.actions), ...reconcileActions],
});
const finalChangedWikiPageKeys = finalWikiGateScope.pageKeys;
```
This keeps the existing variable name used by `validateFinalIngestArtifacts()`,
but the value now means "wiki page keys to validate in final gates."
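
The scope rule these two steps implement reduces to a small pure function. The sketch below is illustrative only. `finalWikiGateScope` and its input shape are hypothetical. Unlike `wikiPageKeysForFinalGates()`, it takes the global page list as an argument instead of reading it through `KnowledgeWikiService`.

```typescript
// Hypothetical input: precomputed changed, removed, and global page keys.
interface ScopeInput {
  changedWikiPageKeys: string[];
  touchedSlSourceCount: number;
  removedWikiPageKeys: string[];
  allGlobalWikiPageKeys: string[];
}

interface ScopeResult {
  global: boolean;
  reasons: string[];
  pageKeys: string[];
}

function finalWikiGateScope(input: ScopeInput): ScopeResult {
  const reasons: string[] = [];
  if (input.touchedSlSourceCount > 0) reasons.push('semantic_layer_changed');
  if (input.removedWikiPageKeys.length > 0) reasons.push('wiki_page_removed');
  const global = reasons.length > 0;
  // Changed pages are always validated. Every global page joins the scope
  // only when the run could have invalidated references in unchanged pages.
  const keys = global
    ? [...input.changedWikiPageKeys, ...input.allGlobalWikiPageKeys]
    : input.changedWikiPageKeys;
  // Deduplicate while keeping first-seen order.
  return { global, reasons, pageKeys: Array.from(new Set(keys)) };
}
```

Validating every global page is the conservative option. The plan leaves transitive semantic-layer dependency expansion out of scope, so it does not try to compute which unchanged pages a change can affect.
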
- [ ] **Step 3: Add scope details to final-gate trace data**
In the `finalArtifactGateTraceData` object, add the
`wikiReferenceGateScope` field:
```ts
const finalArtifactGateTraceData = {
  changedWikiPageKeys: finalChangedWikiPageKeys,
  wikiReferenceGateScope: finalWikiGateScope.trace,
  touchedSlSources: finalTouchedSlSources,
  projectionTouchedPaths,
  workUnitPatchTouchedPaths: workUnitOutcomes.flatMap((outcome) => outcome.patchTouchedPaths ?? []),
  preReconciliationSha,
  postReconciliationSha,
  postReconciliationPaths,
  reconciliationActionCount: reconcileActions.length,
  wikiSlRefRepairCount: wikiSlRefRepairResult.repairs.length,
};
```
The failure report already stores `activeFailureDetails`, so this trace data
also becomes persistent failed-report context when final gates fail.
- [ ] **Step 4: Run the focused regressions and verify they pass**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/ingest-bundle.runner.isolated-diff.test.ts -t "unchanged wiki body refs|unchanged inbound wiki refs"
```
Expected: PASS. Both traces include `final_artifact_gates_failed`,
`failure_report_created`, no `squash_finished`, and
`wikiReferenceGateScope` with either `semantic_layer_changed` or
`wiki_page_removed`.

---
### Task 3: Verification and commit
**Files:**
- Verify: `packages/context/src/ingest/ingest-bundle.runner.ts`
- Verify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`
- [ ] **Step 1: Run the isolated-diff focused suite**
Run:
```bash
pnpm --filter @ktx/context exec vitest run \
  src/ingest/ingest-bundle.runner.isolated-diff.test.ts \
  src/ingest/artifact-gates.test.ts \
  src/ingest/wiki-body-refs.test.ts \
  src/ingest/semantic-layer-target-policy.test.ts \
  src/ingest/isolated-diff/git-patch.test.ts \
  src/ingest/isolated-diff/patch-integrator.test.ts \
  src/ingest/isolated-diff/work-unit-executor.test.ts \
  src/core/git.service.patch.test.ts
```
Expected: PASS.
- [ ] **Step 2: Type-check the context package**
Run:
```bash
pnpm --filter @ktx/context run type-check
```
Expected: PASS.
- [ ] **Step 3: Run dead-code analysis**
Run:
```bash
pnpm run dead-code
```
Expected: PASS, or only pre-existing findings unrelated to
`packages/context/src/ingest/ingest-bundle.runner.ts` and
`packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`.
Investigate any new finding before committing.
- [ ] **Step 4: Verify trace acceptance criteria**
Open the traces produced by the two new failing-run tests and confirm these
events and fields exist:
```text
job-existing-body-stale:
- final_artifact_gates_started
- final_artifact_gates_failed
- ingest_failed
- failure_report_created
- no squash_finished
- wikiReferenceGateScope.global is true
- wikiReferenceGateScope.reasons includes semantic_layer_changed
- wikiReferenceGateScope.pageKeysValidated includes account-segments
- error.message includes total_contract_arr_cents
job-existing-wiki-ref-stale:
- final_artifact_gates_started
- final_artifact_gates_failed
- ingest_failed
- failure_report_created
- no squash_finished
- wikiReferenceGateScope.global is true
- wikiReferenceGateScope.reasons includes wiki_page_removed
- wikiReferenceGateScope.removedWikiPageKeys includes source-page
- error.message includes account-segments -> source-page
```
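You can also automate these checks. The sketch below assumes each JSONL line carries an `event` name. It also assumes that the `final_artifact_gates_failed` event holds the scope under `data.wikiReferenceGateScope` and the failure under `error.message`. The plan does not pin down that nesting, so adjust the field paths to the real trace. `checkTraceAcceptance` is a hypothetical helper.

```typescript
// Assumed nesting of the scope and error on the final-gate failure event.
interface GateScope {
  global: boolean;
  reasons: string[];
  removedWikiPageKeys: string[];
  pageKeysValidated: string[];
}

interface AcceptanceEvent {
  event: string;
  data?: { wikiReferenceGateScope?: GateScope };
  error?: { message?: string };
}

interface Expected {
  reason: 'semantic_layer_changed' | 'wiki_page_removed';
  scopeField: 'pageKeysValidated' | 'removedWikiPageKeys';
  scopeKey: string;
  errorIncludes: string;
}

const REQUIRED_EVENTS = ['final_artifact_gates_started', 'final_artifact_gates_failed', 'ingest_failed', 'failure_report_created'];

// Returns unmet criteria; an empty list means the trace passes.
function checkTraceAcceptance(jsonl: string, expected: Expected): string[] {
  const events = jsonl
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line) as AcceptanceEvent);
  const names = new Set(events.map((e) => e.event));
  const problems = REQUIRED_EVENTS.filter((name) => !names.has(name)).map((name) => `missing ${name}`);
  if (names.has('squash_finished')) problems.push('unexpected squash_finished');
  const failed = events.find((e) => e.event === 'final_artifact_gates_failed');
  const scope = failed?.data?.wikiReferenceGateScope;
  if (scope?.global !== true) problems.push('scope.global is not true');
  if (!scope?.reasons.includes(expected.reason)) problems.push(`scope.reasons lacks ${expected.reason}`);
  if (!scope?.[expected.scopeField].includes(expected.scopeKey)) {
    problems.push(`scope.${expected.scopeField} lacks ${expected.scopeKey}`);
  }
  if (!(failed?.error?.message ?? '').includes(expected.errorIncludes)) {
    problems.push(`error.message lacks ${expected.errorIncludes}`);
  }
  return problems;
}
```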
- [ ] **Step 5: Commit**
Run:
```bash
git add packages/context/src/ingest/ingest-bundle.runner.ts \
  packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts
git commit -m "fix(ingest): gate global wiki references"
```
Expected: one commit containing only the runner and isolated-diff runner test
changes.

---
## Self-Review
Spec coverage:
- Final global wiki body reference validation now covers unchanged wiki pages
when a run changes semantic-layer sources.
- Final global wiki page reference validation now covers unchanged inbound
references when a run deletes wiki pages.
- The plan keeps resolver behavior fail-fast and stops before squash.
- Persistent trace and failed-report acceptance criteria are explicit and tied
  to the concrete failure modes.

Non-blocking gaps left unchanged by this plan:
- Broader connector rollout.
- Isolated-diff default promotion.
- Old shared-worktree path removal.
- Interactive conflict resolution.
- Semantic auto-merge.
- Transitive semantic-layer dependency expansion.
- Provenance-as-files.


@@ -0,0 +1,494 @@
# Isolated Diff Ingestion V1 Provenance Gate Closure Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Ensure invalid provenance raw paths are rejected before isolated-diff
ingestion squashes any integration worktree changes into the main project
worktree.

**Architecture:** Keep provenance insertion after squash, but derive and
validate the planned provenance rows immediately after final artifact gates and
before the squash stage. This makes provenance validation part of the final
pre-main safety boundary while preserving the existing report and database
write shape.

**Tech Stack:** TypeScript ESM/NodeNext, Vitest, existing
`IngestBundleRunner`, `validateProvenanceRawPaths`, ingest reports, and
persistent ingest traces.

---
## Audit Summary
The implemented isolated-diff path now covers the core v1 safety surface:
child worktrees, binary no-rename patches, `git apply --3way --index`, patch
policy rejection, final wiki and semantic-layer gates after reconciliation and
post-processing, failure reports, and persistent JSONL traces. The focused
isolated-diff test suite passes:
```bash
pnpm --filter @ktx/context exec vitest run \
  src/ingest/ingest-trace.test.ts \
  src/ingest/wiki-body-refs.test.ts \
  src/ingest/artifact-gates.test.ts \
  src/ingest/isolated-diff/git-patch.test.ts \
  src/ingest/isolated-diff/work-unit-executor.test.ts \
  src/ingest/isolated-diff/patch-integrator.test.ts \
  src/ingest/ingest-bundle.runner.isolated-diff.test.ts
```
Current result: 7 test files passed, 28 tests passed.

One v1-blocking gap remains. `validateProvenanceRawPaths()` is called in
`packages/context/src/ingest/ingest-bundle.runner.ts` after
`squashMergeIntoMain()`. A work unit or reconciliation action can emit an
otherwise valid wiki or semantic-layer artifact whose `rawPaths` contain a path
outside the current raw snapshot and eviction set. Today the run fails during
provenance recording, but only after the invalidly attributed artifacts have
already reached the main project worktree. That violates the spec requirement
that final global gates run before any changes reach main.
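
The rule that fails here is simple: every provenance row's raw path must belong to the current raw snapshot or the eviction set. The sketch below is a hypothetical stand-in. `assertRawPathsInScope` is not KTX's `validateProvenanceRawPaths()`, whose signature may differ, but it throws with the message shape the Task 1 regression expects.

```typescript
// Only the raw path matters for this check.
interface ProvenanceRowLike {
  rawPath: string;
}

// Throws on the first row whose raw path is neither in the current snapshot
// nor being evicted by this run.
function assertRawPathsInScope(
  rows: readonly ProvenanceRowLike[],
  snapshotPaths: ReadonlySet<string>,
  evictedPaths: ReadonlySet<string> = new Set<string>(),
): void {
  for (const row of rows) {
    if (!snapshotPaths.has(row.rawPath) && !evictedPaths.has(row.rawPath)) {
      throw new Error(`provenance row references raw path outside this snapshot: ${row.rawPath}`);
    }
  }
}
```

Running a check like this before squash, rather than during provenance insertion, is what keeps an invalid attribution from reaching main.
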
Observability for the already-implemented phases is sufficient for postmortem
reconstruction: traces include input snapshots, routing, child worktree
creation and cleanup, patch collection and application, conflict
classification, reconciliation, final gates, failure reports, and run outcome.
This plan adds only the missing provenance validation failure trace because it
corresponds to a concrete pre-main failure mode, not cosmetic trace expansion.

Non-blocking gaps that remain after this plan:
- Migrating Notion, LookML, Looker, dbt, MetricFlow, and historic-SQL direct
durable writes to the isolated path.
- Promoting isolated diffs as the default for all connectors.
- Removing the old shared-worktree WorkUnit execution path.
- Interactive, CLI, or agent-driven conflict resolution.
- Auto-merging semantic conflicts that cannot be proven correct.
- Transitive SQL-projection dependency expansion beyond direct declared joins.
- Moving provenance rows to worktree files.
- Adding failure reports for failures that happen before an ingest run row
exists. The trace file is still written at the deterministic job path.
## File Structure
- Modify `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`.
Add a regression proving invalid provenance raw paths fail before squash,
leave main unchanged, skip SQLite provenance insertion, and emit a
postmortem-grade trace event.
- Modify `packages/context/src/ingest/ingest-bundle.runner.ts`.
Extract provenance row construction into private helpers, run provenance
raw-path validation before squash, trace validation success and failure, and
reuse the prevalidated rows for insertion and reports after squash.
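
The ordering change amounts to a small pipeline in which every check that can reject the run executes before squash. This is a synchronous, hypothetical model, not the runner's code. `FinalizeSteps` and `finalizeRun` are illustrative names, and only the trace event names come from the Task 1 regression test.

```typescript
// Illustrative step interface; the real runner is async and does far more.
interface FinalizeSteps<Row> {
  runFinalArtifactGates(): void;
  buildProvenanceRows(): Row[];
  validateProvenanceRows(rows: Row[]): void; // throws on invalid raw paths
  squashIntoMain(): void;
  insertProvenance(rows: Row[]): void;
}

function finalizeRun<Row>(steps: FinalizeSteps<Row>, trace: (event: string) => void): void {
  steps.runFinalArtifactGates();
  trace('final_artifact_gates_finished');
  // Build and validate provenance rows while main is still untouched.
  const rows = steps.buildProvenanceRows();
  try {
    steps.validateProvenanceRows(rows);
  } catch (error) {
    trace('provenance_rows_validation_failed');
    throw error; // squash never runs, so main keeps its pre-run HEAD
  }
  steps.squashIntoMain();
  trace('squash_finished');
  // Reuse the prevalidated rows instead of rebuilding them after squash.
  steps.insertProvenance(rows);
}
```
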
---
### Task 1: Add the pre-squash provenance regression
**Files:**
- Modify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`
- [ ] **Step 1: Write the failing runner test**
Append this test inside the existing
`describe('IngestBundleRunner isolated diff path', ...)` block in
`packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`:
```ts
it('rejects invalid provenance raw paths before squash reaches main', async () => {
  const runtime = await makeRealGitRuntime();
  try {
    const { deps, adapter } = makeDeps(runtime);
    adapter.chunk.mockResolvedValue({
      workUnits: [{ unitKey: 'card-valid-artifacts', rawFiles: ['cards/source.json'], peerFileIndex: [], dependencyPaths: [] }],
    });
    let currentSession: any = null;
    deps.toolsetFactory.createIngestWuToolset = vi.fn((toolSession: any) => {
      currentSession = toolSession;
      return { toRuntimeTools: vi.fn(() => ({})) };
    });
    deps.agentRunner.runLoop = vi.fn(async () => {
      const root = rootOfConfig(currentSession.configService, runtime.configDir);
      await mkdir(join(root, 'semantic-layer/warehouse'), { recursive: true });
      await mkdir(join(root, 'wiki/global'), { recursive: true });
      await writeFile(
        join(root, 'semantic-layer/warehouse/mart_account_segments.yaml'),
        'name: mart_account_segments\ngrain: [account_id]\ncolumns: [{name: account_id, type: string}]\njoins: []\nmeasures:\n  - name: total_contract_arr\n    expr: sum(contract_arr)\n',
      );
      await writeFile(
        join(root, 'wiki/global/account-segments.md'),
        '---\nsummary: Account segments\nusage_mode: auto\nsl_refs:\n - mart_account_segments\n---\n\nARR is `mart_account_segments.total_contract_arr`.\n',
      );
      addTouchedSlSource(currentSession.touchedSlSources, 'warehouse', 'mart_account_segments');
      currentSession.actions.push({
        target: 'sl',
        type: 'created',
        key: 'mart_account_segments',
        detail: 'Valid source',
        targetConnectionId: 'warehouse',
        rawPaths: ['cards/source.json'],
      });
      currentSession.actions.push({
        target: 'wiki',
        type: 'created',
        key: 'account-segments',
        detail: 'Valid wiki with invalid provenance raw path',
        rawPaths: ['cards/missing.json'],
      });
      await currentSession.gitService.commitFiles(
        ['semantic-layer/warehouse/mart_account_segments.yaml', 'wiki/global/account-segments.md'],
        'valid artifacts with invalid provenance',
        'KTX Test',
        'system@ktx.local',
      );
      return { stopReason: 'natural' };
    }) as never;
    const runner = new IngestBundleRunner(deps);
    await mockStageRawFiles(runner, runtime, [['cards/source.json', 'h1']]);
    const preRunHead = await runtime.git.revParseHead();
    await expect(
      runner.run({
        jobId: 'job-invalid-provenance',
        connectionId: 'warehouse',
        sourceKey: 'metabase',
        trigger: 'upload',
        bundleRef: { kind: 'upload', uploadId: 'upload' },
      }),
    ).rejects.toThrow(/provenance row references raw path outside this snapshot: cards\/missing\.json/);
    expect(await runtime.git.revParseHead()).toBe(preRunHead);
    expect(deps.provenance.insertMany).not.toHaveBeenCalled();
    const trace = await readFile(join(runtime.configDir, '.ktx/ingest-traces/job-invalid-provenance/trace.jsonl'), 'utf-8');
    expect(trace).toContain('final_artifact_gates_finished');
    expect(trace).toContain('provenance_rows_validation_failed');
    expect(trace).toContain('cards/missing.json');
    expect(trace).toContain('ingest_failed');
    expect(trace).not.toContain('squash_finished');
  } finally {
    await rm(runtime.homeDir, { recursive: true, force: true });
  }
});
```
- [ ] **Step 2: Run the failing regression**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/ingest-bundle.runner.isolated-diff.test.ts -t "invalid provenance raw paths"
```
Expected: FAIL because the current runner validates provenance after
`squashMergeIntoMain()`, so `runtime.git.revParseHead()` changes and the trace
does not contain `provenance_rows_validation_failed`.
### Task 2: Move provenance validation into the pre-squash gate boundary
**Files:**
- Modify: `packages/context/src/ingest/ingest-bundle.runner.ts`
- [ ] **Step 1: Import the provenance report and insert types**
In `packages/context/src/ingest/ingest-bundle.runner.ts`, update the imports.
Replace this import block:
```ts
import type {
  ContextEvidenceIndexSummary,
  IngestBundleRunnerDeps,
  IngestProvenanceRow,
  IngestRunsPort,
  IngestSessionWorktree,
  PageTriageRunResult,
} from './ports.js';
```
With:
```ts
import type {
  ContextEvidenceIndexSummary,
  IngestBundleRunnerDeps,
  IngestProvenanceInsert,
  IngestProvenanceRow,
  IngestRunsPort,
  IngestSessionWorktree,
  PageTriageRunResult,
} from './ports.js';
```
Replace this import block:
```ts
import {
  buildStageIndexFromReportBody,
  postProcessorSavedMemoryCounts,
  type IngestReportPostProcessorOutcome,
  type IngestReportSnapshot,
} from './reports.js';
```
With:
```ts
import {
  buildStageIndexFromReportBody,
  postProcessorSavedMemoryCounts,
  type IngestReportPostProcessorOutcome,
  type IngestReportProvenanceDetail,
  type IngestReportSnapshot,
} from './reports.js';
```
- [ ] **Step 2: Add provenance row helpers**
Add these private methods after `private errorMessage(error: unknown): string`
in `packages/context/src/ingest/ingest-bundle.runner.ts`:
```ts
private buildProvenanceRows(input: {
  job: IngestBundleJob;
  syncId: string;
  currentHashes: Map<string, string>;
  stageIndex: StageIndex;
  reconcileActions: MemoryAction[];
  eviction?: EvictionUnit;
}): IngestProvenanceInsert[] {
  const provenanceRows: IngestProvenanceInsert[] = [];
  const actionToType = (action: MemoryAction): IngestProvenanceInsert['actionType'] => {
    if (action.target === 'wiki') {
      return 'wiki_written';
    }
    return action.type === 'created' ? 'source_created' : 'measure_added';
  };
  const producedPaths = new Set<string>();
  // One row per (raw path, artifact) pair produced by a WorkUnit or reconcile action.
  const pushActionProvenance = (rawPath: string, action: MemoryAction): void => {
    const hash = input.currentHashes.get(rawPath) ?? '';
    provenanceRows.push({
      connectionId: input.job.connectionId,
      sourceKey: input.job.sourceKey,
      syncId: input.syncId,
      rawPath,
      rawContentHash: hash,
      artifactKind: action.target,
      artifactKey: action.key,
      targetConnectionId: action.target === 'sl' ? actionTargetConnectionId(action, input.job.connectionId) : null,
      artifactContentHash: null,
      actionType: actionToType(action),
    });
    producedPaths.add(rawPath);
  };
  for (const wu of input.stageIndex.workUnits) {
    for (const action of wu.actions) {
      for (const rawPath of rawPathsForAction(action, wu.rawFiles)) {
        pushActionProvenance(rawPath, action);
      }
    }
  }
  for (const action of input.reconcileActions) {
    for (const rawPath of action.rawPaths ?? []) {
      pushActionProvenance(rawPath, action);
    }
  }
  // Artifact resolutions carry their own artifact kind, key, and action type.
  for (const resolution of input.stageIndex.artifactResolutions ?? []) {
    const hash = input.currentHashes.get(resolution.rawPath) ?? '';
    provenanceRows.push({
      connectionId: input.job.connectionId,
      sourceKey: input.job.sourceKey,
      syncId: input.syncId,
      rawPath: resolution.rawPath,
      rawContentHash: hash,
      artifactKind: resolution.artifactKind,
      artifactKey: resolution.artifactKey,
      targetConnectionId: null,
      artifactContentHash: null,
      actionType: resolution.actionType,
    });
    producedPaths.add(resolution.rawPath);
  }
  // Raw files that produced nothing get a `skipped` row so the next DiffSet still sees them.
  for (const [rawPath, hash] of input.currentHashes) {
    if (producedPaths.has(rawPath)) {
      continue;
    }
    provenanceRows.push({
      connectionId: input.job.connectionId,
      sourceKey: input.job.sourceKey,
      syncId: input.syncId,
      rawPath,
      rawContentHash: hash,
      artifactKind: null,
      artifactKey: null,
      targetConnectionId: null,
      artifactContentHash: null,
      actionType: 'skipped',
    });
  }
  return provenanceRows;
}

private toReportProvenanceRows(rows: IngestProvenanceInsert[]): IngestReportProvenanceDetail[] {
  return rows.map(({ rawPath, artifactKind, artifactKey, actionType, targetConnectionId }) => ({
    rawPath,
    artifactKind,
    artifactKey,
    targetConnectionId: targetConnectionId ?? null,
    actionType,
  }));
}
```
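The subtle rule in `buildProvenanceRows()` is the fallback: any current raw path that no action or artifact resolution claimed still gets a `skipped` row. A minimal standalone sketch of just that rule, assuming a much smaller row shape (`SketchRow`, `ProducedArtifact`, and `buildRowsSketch` are hypothetical names, not the runner's types):

```typescript
// Hypothetical, simplified row shape; the real IngestProvenanceInsert also carries
// connection, sync, artifact-kind, and target-connection fields omitted here.
type SketchRow = {
  rawPath: string;
  rawContentHash: string;
  artifactKey: string | null;
  actionType: string;
};

// Hypothetical input: one entry per (raw path, artifact) pair that some action produced.
type ProducedArtifact = { rawPath: string; artifactKey: string; actionType: string };

function buildRowsSketch(currentHashes: Map<string, string>, produced: ProducedArtifact[]): SketchRow[] {
  const rows: SketchRow[] = [];
  const producedPaths = new Set<string>();
  for (const item of produced) {
    rows.push({
      rawPath: item.rawPath,
      // As in the runner, a path with no current hash gets an empty string.
      rawContentHash: currentHashes.get(item.rawPath) ?? '',
      artifactKey: item.artifactKey,
      actionType: item.actionType,
    });
    producedPaths.add(item.rawPath);
  }
  // Every current raw path that produced nothing is still recorded, as `skipped`.
  for (const [rawPath, hash] of currentHashes) {
    if (!producedPaths.has(rawPath)) {
      rows.push({ rawPath, rawContentHash: hash, artifactKey: null, actionType: 'skipped' });
    }
  }
  return rows;
}

const sketchRows = buildRowsSketch(
  new Map([
    ['pages/a.json', 'h1'],
    ['pages/b.json', 'h2'],
  ]),
  [{ rawPath: 'pages/a.json', artifactKey: 'page-a', actionType: 'wiki_written' }],
);
console.log(sketchRows.map((row) => `${row.rawPath}:${row.actionType}`).join(', '));
// pages/a.json:wiki_written, pages/b.json:skipped
```

Note the empty-hash case: a produced path that is not in the current snapshot still yields a row, which is exactly what the pre-squash validation has to catch.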
- [ ] **Step 3: Validate planned provenance rows before squash**
In `packages/context/src/ingest/ingest-bundle.runner.ts`, find the code that
sets `activePhase = 'final_gates';` and runs `traceTimed(...,
'final_artifact_gates', ...)`. Immediately after that `await traceTimed(...)`
block and before the `// Stage 6 — squash commit` comment, insert:
```ts
activePhase = 'provenance_validation';
const provenanceRows = this.buildProvenanceRows({
job,
syncId,
currentHashes,
stageIndex,
reconcileActions,
eviction,
});
await traceTimed(
runTrace,
'provenance',
'provenance_rows_validation',
{
rowCount: provenanceRows.length,
currentRawPathCount: currentHashes.size,
deletedRawPathCount: eviction?.deletedRawPaths.length ?? 0,
},
async () => {
validateProvenanceRawPaths({
rows: provenanceRows,
currentRawPaths: new Set(currentHashes.keys()),
deletedRawPaths: new Set(eviction?.deletedRawPaths ?? []),
});
},
);
const reportProvenanceRows = this.toReportProvenanceRows(provenanceRows);
```
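This plan calls `validateProvenanceRawPaths()` but does not show its body. A minimal sketch of the invariant it is assumed to enforce: each row's `rawPath` must be either in the current snapshot or in the eviction unit's deleted paths. The `Sketch` suffix, the `RawPathRow` type, and the error text are hypothetical, and the comment about `traceTimed` assumes it records the failure and rethrows.

```typescript
// Hypothetical sketch; the real validateProvenanceRawPaths() may check more.
type RawPathRow = { rawPath: string };

function validateProvenanceRawPathsSketch(input: {
  rows: readonly RawPathRow[];
  currentRawPaths: ReadonlySet<string>;
  deletedRawPaths: ReadonlySet<string>;
}): void {
  const unknownPaths = new Set<string>();
  for (const row of input.rows) {
    // A row is valid only if its raw path is current or was evicted in this sync.
    if (!input.currentRawPaths.has(row.rawPath) && !input.deletedRawPaths.has(row.rawPath)) {
      unknownPaths.add(row.rawPath);
    }
  }
  if (unknownPaths.size > 0) {
    // Assumption: throwing inside the traceTimed(...) callback is what fails the run before squash.
    throw new Error(`provenance rows reference unknown raw paths: ${Array.from(unknownPaths).join(', ')}`);
  }
}

// Passes: one current path, one deleted path.
validateProvenanceRawPathsSketch({
  rows: [{ rawPath: 'pages/a.json' }, { rawPath: 'pages/old.json' }],
  currentRawPaths: new Set(['pages/a.json']),
  deletedRawPaths: new Set(['pages/old.json']),
});

try {
  validateProvenanceRawPathsSketch({
    rows: [{ rawPath: 'pages/ghost.json' }],
    currentRawPaths: new Set(['pages/a.json']),
    deletedRawPaths: new Set<string>(),
  });
} catch (error) {
  console.log((error as Error).message);
}
```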
- [ ] **Step 4: Replace the post-squash provenance construction block**
In `packages/context/src/ingest/ingest-bundle.runner.ts`, in the
`activePhase = 'provenance';` section after squash, delete the current block
that starts with:
```ts
// Provenance rows: per-artifact when the WU emitted actions, plus a `skipped`
// fallback for raw files that produced nothing so the next DiffSet still sees
// them.
const provenanceRows: Parameters<typeof this.deps.provenance.insertMany>[0] = [];
```
And ends with:
```ts
await runTrace.event('debug', 'provenance', 'provenance_rows_validated', {
  rowCount: provenanceRows.length,
});
```
Do not delete the existing call to `await this.deps.provenance.insertMany(provenanceRows);`.
Immediately after that insertion call, add:
```ts
await runTrace.event('debug', 'provenance', 'provenance_rows_inserted', {
  rowCount: provenanceRows.length,
});
```
Then delete the later `const reportProvenanceRows = provenanceRows.map(...)`
block because `reportProvenanceRows` is now created before squash from the
prevalidated rows.
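The point of the reordering is that a validation failure now happens while main is still at its old HEAD, and nothing reaches the provenance store. A toy model of that ordering, where `FakeMain`, `runTail`, and `rowsValid` are hypothetical stand-ins for `runtime.git`, the runner's Stage 5/6 tail, and the outcome of `validateProvenanceRawPaths()`:

```typescript
// Toy model only: FakeMain stands in for the main branch, `inserted` for provenance.insertMany.
class FakeMain {
  head = 'base-sha';
  squash(): void {
    this.head = 'squash-sha';
  }
}

function runTail(main: FakeMain, rowsValid: boolean, inserted: string[][]): 'completed' | 'failed' {
  const rows = ['row-1', 'row-2'];
  try {
    // Pre-squash gate: rows are built and validated while main is untouched.
    if (!rowsValid) {
      throw new Error('provenance rows failed validation');
    }
    main.squash();
    // Post-squash: only already-validated rows are inserted.
    inserted.push(rows);
    return 'completed';
  } catch {
    return 'failed';
  }
}

const main = new FakeMain();
const inserted: string[][] = [];
console.log(runTail(main, false, inserted), main.head, inserted.length); // failed base-sha 0
console.log(runTail(main, true, inserted), main.head, inserted.length); // completed squash-sha 1
```

These are the same two properties Step 5 expects from the real regression: HEAD unchanged and no insert call.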
- [ ] **Step 5: Run the provenance regression**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/ingest-bundle.runner.isolated-diff.test.ts -t "invalid provenance raw paths"
```
Expected: PASS. The trace contains `provenance_rows_validation_failed`, main
HEAD remains unchanged, and `provenance.insertMany` is not called.
- [ ] **Step 6: Run the focused isolated-diff suite**
Run:
```bash
pnpm --filter @ktx/context exec vitest run \
src/ingest/ingest-trace.test.ts \
src/ingest/wiki-body-refs.test.ts \
src/ingest/artifact-gates.test.ts \
src/ingest/isolated-diff/git-patch.test.ts \
src/ingest/isolated-diff/work-unit-executor.test.ts \
src/ingest/isolated-diff/patch-integrator.test.ts \
src/ingest/ingest-bundle.runner.isolated-diff.test.ts
```
Expected: PASS.
### Task 3: Type-check, dead-code check, and commit
**Files:**
- Verify: `packages/context/src/ingest/ingest-bundle.runner.ts`
- Verify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`
- [ ] **Step 1: Run the context package type-check**
Run:
```bash
pnpm --filter @ktx/context run type-check
```
Expected: PASS.
- [ ] **Step 2: Run the workspace dead-code check**
Run:
```bash
pnpm run dead-code
```
Expected: PASS, or only existing unrelated Knip/Biome findings. Investigate
any new findings in the two modified files before continuing.
- [ ] **Step 3: Commit the provenance gate closure**
Run:
```bash
git add packages/context/src/ingest/ingest-bundle.runner.ts \
packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts
git commit -m "fix(ingest): gate provenance before isolated diff squash"
```
Expected: one commit containing only the runner and isolated-diff runner test
changes.
## Self-Review
Spec coverage: this plan closes the remaining violation of the design's final
global gate invariant by proving that invalid provenance raw paths fail before
squash and by moving provenance validation into the pre-squash gate boundary.
Placeholder scan: no placeholder steps remain. Every implementation step names
the exact files, code, commands, and expected results.
Type consistency: the plan uses existing `IngestProvenanceInsert`,
`IngestReportProvenanceDetail`, `MemoryAction`, `EvictionUnit`, `StageIndex`,
`rawPathsForAction()`, and `validateProvenanceRawPaths()` names.


@ -0,0 +1,754 @@
# Isolated Diff Ingestion V1 Default Promotion Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use
> superpowers:subagent-driven-development (recommended) or
> superpowers:executing-plans to implement this plan task-by-task. Steps use
> checkbox (`- [ ]`) syntax for tracking.
**Goal:** Promote isolated-diff WorkUnit execution to the default ingest runner
path while keeping the old shared-worktree branch reachable by an explicit
private fallback setting for the final cleanup rollout.
**Architecture:** The runner stops asking whether a source is on an
isolated-diff allowlist. Instead, non-override bundle ingests use isolated
diffs unless the private settings object lists the source in
`sharedWorktreeSourceKeys`. Local runtime defaults that fallback list to empty,
and tests keep the old path covered with an explicit legacy source setting so
rollout step 11 can delete it safely.
**Tech Stack:** TypeScript ESM/NodeNext, Vitest, pnpm workspace commands,
existing `IngestBundleRunner`, `IngestSettingsPort`, local ingest runtime, and
isolated-diff runner tests.
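The routing rule in the Architecture paragraph reduces to one decision, the same boolean `!overrideReport && !isSharedWorktreeFallbackEnabled(...)` that Task 2 installs in the runner. A minimal sketch, assuming override runs keep whatever path they use today (`Route` and `chooseRoute` are hypothetical names, not runner APIs):

```typescript
// Hypothetical illustration of the routing decision; not the runner's actual API.
type Route = 'isolated_diff' | 'shared_worktree' | 'override';

function chooseRoute(input: {
  sourceKey: string;
  hasOverrideReport: boolean;
  sharedWorktreeSourceKeys?: readonly string[];
}): Route {
  // Override runs never enter isolated diffs.
  if (input.hasOverrideReport) {
    return 'override';
  }
  // Everything else is isolated unless the private fallback list names the source.
  return (input.sharedWorktreeSourceKeys ?? []).includes(input.sourceKey) ? 'shared_worktree' : 'isolated_diff';
}

console.log(chooseRoute({ sourceKey: 'notion', hasOverrideReport: false })); // isolated_diff
console.log(
  chooseRoute({ sourceKey: 'legacy-source', hasOverrideReport: false, sharedWorktreeSourceKeys: ['legacy-source'] }),
); // shared_worktree
console.log(chooseRoute({ sourceKey: 'notion', hasOverrideReport: true })); // override
```

With the local runtime default of `[]`, the middle case is unreachable, which is what lets rollout step 11 delete it.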
---
## Audit summary
This audit read the original spec at
`docs/superpowers/specs/2026-05-17-isolated-diff-ingestion-design.md`, all
plans matching
`docs/superpowers/plans/2026-05-17-isolated-diff-ingestion-*.md` and
`docs/superpowers/plans/2026-05-18-isolated-diff-ingestion-*.md`, and the
current ingest runner code under `packages/context/src/ingest/`.
Implemented v1 rollout coverage:
- Rollout steps 1 and 2 are implemented by the core plan: child worktrees,
binary no-rename patch proposals, and `git apply --3way --index`
integration exist.
- Rollout step 3 is implemented by the textual conflict resolver plan:
`textual-conflict-resolver.ts` is wired through `patch-integrator.ts`.
- Rollout steps 4, 5, and 6 are implemented by the gates, provenance,
reference, global wiki, and gate-repair plans: final gates, persistent traces,
failure reports, provenance validation, target policy, and repair counters
exist.
- Rollout step 7 is implemented by the core and follow-up plans: Metabase has
isolated-diff stale-reference regression coverage.
- Rollout step 8 is implemented by
`2026-05-18-isolated-diff-ingestion-v1-connector-migration.md` and the
follow-up commits: Notion, LookML, Looker, dbt, and MetricFlow route through
isolated child worktrees, and MetricFlow projection runs before WorkUnits.
Current v1-blocking gaps:
- Rollout step 10 is not complete. `IngestBundleRunner.isIsolatedDiffEnabled()`
still checks `settings.isolatedDiffSourceKeys`, and
`local-bundle-runtime.ts` still installs the internal allowlist returned by
`defaultIsolatedDiffSourceKeys()`.
- Rollout step 11 remains blocked until step 10 lands. The old
shared-worktree WorkUnit branch is still present and must stay reachable in
this plan for final cleanup validation.
Non-blocking gaps:
- Rollout step 9 deterministic semantic merge helpers remain intentionally
deferred until v1 resolver metrics show frequent mechanical repairs.
- Transitive SQL-projection dependency expansion remains outside v1; current
gates cover direct declared join neighbors.
- Moving provenance into worktree files remains outside v1; the implemented
source of truth is the ingest provenance store and report body.
- Public connector knobs such as `executionMode`, `planningStrategy`, and
`conflictPolicy` remain non-goals and must not be added.
- Richer resolver context, such as full transcript excerpts for every
overlapping patch, can be evaluated after the default path has production
traces.
## File structure
- Modify `packages/context/src/ingest/isolated-diff/source-routing.ts`.
Replace the isolated-diff direct-write allowlist with an empty default
shared-worktree fallback list.
- Modify `packages/context/src/ingest/isolated-diff/source-routing.test.ts`.
Lock the fallback list semantics and remove direct-write allowlist
assertions.
- Modify `packages/context/src/ingest/ports.ts`.
Replace `isolatedDiffSourceKeys?: string[]` with
`sharedWorktreeSourceKeys?: string[]` on the private runner settings port.
- Modify `packages/context/src/ingest/ingest-bundle.runner.ts`.
Make isolated diff the default for non-override runs and route to the old
shared branch only when `sharedWorktreeSourceKeys` contains the source.
- Modify `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`.
Prove an unlisted source uses isolated diffs by default and prove an
explicit fallback source can still reach the shared-worktree branch.
- Modify `packages/context/src/ingest/local-bundle-runtime.ts`.
Install the new empty fallback list instead of the old isolated-diff
allowlist.
- Modify `packages/context/src/ingest/local-bundle-runtime.test.ts`.
Assert local runtime settings do not expose `isolatedDiffSourceKeys` and
default `sharedWorktreeSourceKeys` to `[]`.
---
### Task 1: Replace source routing semantics
**Files:**
- Modify: `packages/context/src/ingest/isolated-diff/source-routing.test.ts`
- Modify: `packages/context/src/ingest/isolated-diff/source-routing.ts`
- Modify: `packages/context/src/ingest/ports.ts`
- [ ] **Step 1: Write the failing source-routing tests**
Replace `packages/context/src/ingest/isolated-diff/source-routing.test.ts` with:
```ts
import { describe, expect, it } from 'vitest';
import { defaultSharedWorktreeSourceKeys, isSharedWorktreeFallbackSourceKey } from './source-routing.js';
describe('isolated-diff source routing', () => {
  it('defaults every non-override source to isolated diffs', () => {
    expect(defaultSharedWorktreeSourceKeys()).toEqual([]);
  });

  it('returns a mutable copy for runtime settings', () => {
    const keys = defaultSharedWorktreeSourceKeys();
    keys.push('legacy-source');
    expect(defaultSharedWorktreeSourceKeys()).toEqual([]);
  });

  it('recognizes only explicitly configured shared-worktree fallback sources', () => {
    expect(isSharedWorktreeFallbackSourceKey('notion', [])).toBe(false);
    expect(isSharedWorktreeFallbackSourceKey('metricflow', [])).toBe(false);
    expect(isSharedWorktreeFallbackSourceKey('legacy-source', ['legacy-source'])).toBe(true);
    expect(isSharedWorktreeFallbackSourceKey('other-source', ['legacy-source'])).toBe(false);
  });
});
```
- [ ] **Step 2: Run the source-routing tests to verify they fail**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/isolated-diff/source-routing.test.ts
```
Expected: FAIL because `defaultSharedWorktreeSourceKeys()` and
`isSharedWorktreeFallbackSourceKey()` are not exported yet.
- [ ] **Step 3: Rewrite the routing helper**
Replace `packages/context/src/ingest/isolated-diff/source-routing.ts` with:
```ts
// Empty by default: every non-override source uses isolated diffs.
const DEFAULT_SHARED_WORKTREE_SOURCE_KEYS: readonly string[] = [];

// Return a fresh copy so runtime settings can be mutated without touching the default.
export function defaultSharedWorktreeSourceKeys(): string[] {
  return [...DEFAULT_SHARED_WORKTREE_SOURCE_KEYS];
}

export function isSharedWorktreeFallbackSourceKey(
  sourceKey: string,
  sharedWorktreeSourceKeys: readonly string[] = DEFAULT_SHARED_WORKTREE_SOURCE_KEYS,
): boolean {
  return sharedWorktreeSourceKeys.includes(sourceKey);
}
```
- [ ] **Step 4: Rename the private settings field**
In `packages/context/src/ingest/ports.ts`, replace the
`IngestSettingsPort` interface with:
```ts
export interface IngestSettingsPort {
  memoryIngestionModel: string;
  probeRowCount: number;
  workUnitMaxConcurrency?: number;
  workUnitStepBudget?: number;
  workUnitFailureMode?: 'abort' | 'continue';
  sharedWorktreeSourceKeys?: string[];
  ingestTraceLevel?: IngestTraceLevel;
}
```
- [ ] **Step 5: Run the source-routing tests again**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/isolated-diff/source-routing.test.ts
```
Expected: PASS.
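- [ ] **Step 6: Commit routing semantics**
This commit intentionally leaves `ingest-bundle.runner.ts`, its isolated-diff
test, and `local-bundle-runtime.ts` still referencing the old allowlist names
(`isolatedDiffSourceKeys`, `defaultIsolatedDiffSourceKeys()`) until Tasks 2
and 3 land. Do not run the package type-check or full test suite before Task 5.
Run: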
```bash
git add packages/context/src/ingest/isolated-diff/source-routing.ts \
packages/context/src/ingest/isolated-diff/source-routing.test.ts \
packages/context/src/ingest/ports.ts
git commit -m "feat(ingest): make isolated diff routing the private default"
```
### Task 2: Promote the runner default
**Files:**
- Modify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`
- Modify: `packages/context/src/ingest/ingest-bundle.runner.ts`
- [ ] **Step 1: Update the isolated runner test imports and harness**
In `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`,
replace the source-routing import with:
```ts
import { defaultSharedWorktreeSourceKeys } from './isolated-diff/source-routing.js';
```
Then change the `makeDeps()` signature and `settings` block to:
```ts
function makeDeps(
  runtime: Awaited<ReturnType<typeof makeRealGitRuntime>>,
  sourceKey = 'metabase',
  settings: Partial<IngestBundleRunnerDeps['settings']> = {},
) {
```
```ts
settings: {
  memoryIngestionModel: 'test',
  probeRowCount: 1,
  sharedWorktreeSourceKeys: defaultSharedWorktreeSourceKeys(),
  ingestTraceLevel: 'trace',
  ...settings,
},
```
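Because `...settings` is spread last, any field a test passes to `makeDeps()` overrides the harness default. That is how the fallback test later opts a single source back into the shared worktree. A standalone sketch of that precedence, where `SettingsSketch` and `makeSettingsSketch` are hypothetical stand-ins for the real `IngestBundleRunnerDeps['settings']` block:

```typescript
// Hypothetical stand-in for the runner settings shape.
type SettingsSketch = {
  memoryIngestionModel: string;
  probeRowCount: number;
  sharedWorktreeSourceKeys?: string[];
  ingestTraceLevel?: string;
};

function makeSettingsSketch(overrides: Partial<SettingsSketch> = {}): SettingsSketch {
  return {
    memoryIngestionModel: 'test',
    probeRowCount: 1,
    // Default: no source falls back to the shared worktree.
    sharedWorktreeSourceKeys: [],
    ingestTraceLevel: 'trace',
    // Spread last, so fields passed by a test win over the defaults above.
    ...overrides,
  };
}

console.log(makeSettingsSketch().sharedWorktreeSourceKeys); // []
console.log(makeSettingsSketch({ sharedWorktreeSourceKeys: ['legacy-source'] }).sharedWorktreeSourceKeys); // [ 'legacy-source' ]
```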
- [ ] **Step 2: Add the default-promotion regression tests**
Insert these tests inside
`describe('IngestBundleRunner isolated diff path', ...)`, before the existing
non-Metabase routing matrix:
```ts
it('routes an unlisted direct-writing source through isolated diffs by default', async () => {
  const runtime = await makeRealGitRuntime();
  try {
    const sourceKey = 'custom-direct-source';
    const { deps, adapter } = makeDeps(runtime, sourceKey);
    adapter.chunk.mockResolvedValue({
      workUnits: [
        {
          unitKey: 'custom-wiki',
          rawFiles: ['custom/page.json'],
          peerFileIndex: [],
          dependencyPaths: [],
        },
      ],
    });
    let currentSession: any = null;
    deps.toolsetFactory.createIngestWuToolset = vi.fn((toolSession: any) => {
      currentSession = toolSession;
      return { toRuntimeTools: vi.fn(() => ({})) };
    });
    deps.agentRunner.runLoop = vi.fn(async (params: any) => {
      if (params.telemetryTags.operationName !== 'ingest-bundle-wu') {
        return { stopReason: 'natural' };
      }
      const root = rootOfConfig(currentSession.configService, runtime.configDir);
      await mkdir(join(root, 'wiki/global'), { recursive: true });
      await writeFile(
        join(root, 'wiki/global/custom-isolated.md'),
        '---\nsummary: Custom isolated write\nusage_mode: auto\n---\n\nCustom isolated write.\n',
        'utf-8',
      );
      currentSession.actions.push({
        target: 'wiki',
        type: 'created',
        key: 'custom-isolated',
        detail: 'Custom isolated write',
        rawPaths: ['custom/page.json'],
      });
      await currentSession.gitService.commitFiles(
        ['wiki/global/custom-isolated.md'],
        'custom wiki',
        'KTX Test',
        'system@ktx.local',
      );
      return { stopReason: 'natural' };
    }) as never;
    const runner = new IngestBundleRunner(deps);
    await mockStageRawFiles(runner, runtime, [['custom/page.json', 'h1']], sourceKey);
    await expect(
      runner.run({
        jobId: 'job-custom-default',
        connectionId: 'warehouse',
        sourceKey,
        trigger: 'upload',
        bundleRef: { kind: 'upload', uploadId: 'upload' },
      }),
    ).resolves.toMatchObject({
      jobId: 'job-custom-default',
      failedWorkUnits: [],
      workUnitCount: 1,
    });
    const trace = await readFile(
      join(runtime.configDir, '.ktx/ingest-traces/job-custom-default/trace.jsonl'),
      'utf-8',
    );
    expect(trace).toContain('isolated_diff_enabled');
    expect(trace).toContain('work_unit_child_created');
    expect(trace).not.toContain('shared_worktree_path_enabled');
    const reportCreate = vi.mocked(deps.reports.create).mock.calls.at(-1)?.[0];
    const reportBody = reportCreate?.body as { isolatedDiff?: unknown } | undefined;
    expect(reportBody?.isolatedDiff).toMatchObject({
      enabled: true,
      acceptedPatches: 1,
    });
  } finally {
    await rm(runtime.homeDir, { recursive: true, force: true });
  }
});

it('keeps the shared-worktree path reachable through explicit private fallback settings', async () => {
  const runtime = await makeRealGitRuntime();
  try {
    const sourceKey = 'legacy-source';
    const { deps, adapter } = makeDeps(runtime, sourceKey, {
      sharedWorktreeSourceKeys: ['legacy-source'],
    });
    adapter.chunk.mockResolvedValue({
      workUnits: [
        {
          unitKey: 'legacy-wiki',
          rawFiles: ['legacy/page.json'],
          peerFileIndex: [],
          dependencyPaths: [],
        },
      ],
    });
    let currentSession: any = null;
    deps.toolsetFactory.createIngestWuToolset = vi.fn((toolSession: any) => {
      currentSession = toolSession;
      return { toRuntimeTools: vi.fn(() => ({})) };
    });
    deps.agentRunner.runLoop = vi.fn(async (params: any) => {
      if (params.telemetryTags.operationName !== 'ingest-bundle-wu') {
        return { stopReason: 'natural' };
      }
      const root = rootOfConfig(currentSession.configService, runtime.configDir);
      await mkdir(join(root, 'wiki/global'), { recursive: true });
      await writeFile(
        join(root, 'wiki/global/legacy-shared.md'),
        '---\nsummary: Legacy shared write\nusage_mode: auto\n---\n\nLegacy shared write.\n',
        'utf-8',
      );
      currentSession.actions.push({
        target: 'wiki',
        type: 'created',
        key: 'legacy-shared',
        detail: 'Legacy shared write',
        rawPaths: ['legacy/page.json'],
      });
      await currentSession.gitService.commitFiles(
        ['wiki/global/legacy-shared.md'],
        'legacy wiki',
        'KTX Test',
        'system@ktx.local',
      );
      return { stopReason: 'natural' };
    }) as never;
    const runner = new IngestBundleRunner(deps);
    await mockStageRawFiles(runner, runtime, [['legacy/page.json', 'h1']], sourceKey);
    await expect(
      runner.run({
        jobId: 'job-legacy-shared',
        connectionId: 'warehouse',
        sourceKey,
        trigger: 'upload',
        bundleRef: { kind: 'upload', uploadId: 'upload' },
      }),
    ).resolves.toMatchObject({
      jobId: 'job-legacy-shared',
      failedWorkUnits: [],
      workUnitCount: 1,
    });
    const trace = await readFile(
      join(runtime.configDir, '.ktx/ingest-traces/job-legacy-shared/trace.jsonl'),
      'utf-8',
    );
    expect(trace).toContain('shared_worktree_path_enabled');
    expect(trace).not.toContain('work_unit_child_created');
    const reportCreate = vi.mocked(deps.reports.create).mock.calls.at(-1)?.[0];
    const reportBody = reportCreate?.body as { isolatedDiff?: unknown } | undefined;
    expect(reportBody?.isolatedDiff).toMatchObject({
      enabled: false,
    });
  } finally {
    await rm(runtime.homeDir, { recursive: true, force: true });
  }
});
```
- [ ] **Step 3: Run the new runner tests to verify the default test fails**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/ingest-bundle.runner.isolated-diff.test.ts -t "unlisted direct-writing source|shared-worktree path reachable"
```
Expected: FAIL. The unlisted source still enters the old shared-worktree path
because the runner checks `isolatedDiffSourceKeys`.
- [ ] **Step 4: Change the runner routing decision**
In `packages/context/src/ingest/ingest-bundle.runner.ts`, replace
`isIsolatedDiffEnabled()` with:
```ts
private isSharedWorktreeFallbackEnabled(sourceKey: string): boolean {
  return (this.deps.settings.sharedWorktreeSourceKeys ?? []).includes(sourceKey);
}
```
Then replace the isolated-diff routing line with:
```ts
const isolatedDiffEnabled = !overrideReport && !this.isSharedWorktreeFallbackEnabled(job.sourceKey);
```
Finally, replace the shared-path trace event with:
```ts
await runTrace.event('info', 'routing', 'shared_worktree_path_enabled', {
  sourceKey: job.sourceKey,
  reason: 'explicit_private_fallback',
});
```
- [ ] **Step 5: Run the new runner tests again**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/ingest-bundle.runner.isolated-diff.test.ts -t "unlisted direct-writing source|shared-worktree path reachable"
```
Expected: PASS.
- [ ] **Step 6: Commit runner default promotion**
Run:
```bash
git add packages/context/src/ingest/ingest-bundle.runner.ts \
packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts
git commit -m "feat(ingest): promote isolated diff to default runner path"
```
### Task 3: Update local runtime defaults
**Files:**
- Modify: `packages/context/src/ingest/local-bundle-runtime.test.ts`
- Modify: `packages/context/src/ingest/local-bundle-runtime.ts`
- [ ] **Step 1: Update the local runtime settings test type**
In `packages/context/src/ingest/local-bundle-runtime.test.ts`, replace
`RuntimeWithSettingsDeps` with:
```ts
type RuntimeWithSettingsDeps = {
  deps: {
    settings: {
      sharedWorktreeSourceKeys?: string[];
      isolatedDiffSourceKeys?: string[];
    };
  };
};
```
- [ ] **Step 2: Replace the local runtime settings assertion**
Replace the test named
`enables isolated-diff routing for direct durable-write connectors` with:
```ts
it('defaults local bundle ingest to isolated diffs without an allowlist', () => {
  const runtime = createLocalBundleIngestRuntime({
    project,
    adapters: [new FakeSourceAdapter()],
    agentRunner: testAgentRunner(),
  });
  const settings = (runtime.runner as unknown as RuntimeWithSettingsDeps).deps.settings;
  expect(settings.sharedWorktreeSourceKeys).toEqual([]);
  expect('isolatedDiffSourceKeys' in settings).toBe(false);
});
```
- [ ] **Step 3: Run the local runtime settings test to verify it fails**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/local-bundle-runtime.test.ts -t "defaults local bundle ingest"
```
Expected: FAIL because `local-bundle-runtime.ts` still sets
`isolatedDiffSourceKeys`.
- [ ] **Step 4: Update local runtime imports and settings**
In `packages/context/src/ingest/local-bundle-runtime.ts`, replace the
source-routing import with:
```ts
import { defaultSharedWorktreeSourceKeys } from './isolated-diff/source-routing.js';
```
Then replace the settings field:
```ts
isolatedDiffSourceKeys: defaultIsolatedDiffSourceKeys(),
```
with:
```ts
sharedWorktreeSourceKeys: defaultSharedWorktreeSourceKeys(),
```
- [ ] **Step 5: Run the local runtime settings test again**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/local-bundle-runtime.test.ts -t "defaults local bundle ingest"
```
Expected: PASS.
- [ ] **Step 6: Commit local runtime defaults**
Run:
```bash
git add packages/context/src/ingest/local-bundle-runtime.ts \
packages/context/src/ingest/local-bundle-runtime.test.ts
git commit -m "feat(ingest): default local ingest to isolated diffs"
```
### Task 4: Remove stale allowlist references
**Files:**
- Verify: `packages/context/src/ingest/isolated-diff/source-routing.ts`
- Verify: `packages/context/src/ingest/local-bundle-runtime.ts`
- Verify: `packages/context/src/ingest/ingest-bundle.runner.ts`
- Verify: `packages/context/src/ingest/ports.ts`
- Verify: `packages/context/src/ingest/**/*.test.ts`
- [ ] **Step 1: Search for old allowlist names**
Run:
```bash
rg -n "isolatedDiffSourceKeys|defaultIsolatedDiffSourceKeys|ISOLATED_DIFF_DIRECT_WRITE_SOURCE_KEYS|isIsolatedDiffDirectWriteSourceKey" packages/context/src
```
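Expected: matches only in
`packages/context/src/ingest/local-bundle-runtime.test.ts`, where the
`RuntimeWithSettingsDeps` type and the `'isolatedDiffSourceKeys' in settings`
assertion from Task 3 name the removed field on purpose. Treat any other match
as a stale reference and remove it.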
- [ ] **Step 2: Search for the new fallback setting**
Run:
```bash
rg -n "sharedWorktreeSourceKeys|defaultSharedWorktreeSourceKeys|isSharedWorktreeFallbackSourceKey" packages/context/src
```
Expected: matches only in these files:
```text
packages/context/src/ingest/ports.ts
packages/context/src/ingest/ingest-bundle.runner.ts
packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts
packages/context/src/ingest/isolated-diff/source-routing.ts
packages/context/src/ingest/isolated-diff/source-routing.test.ts
packages/context/src/ingest/local-bundle-runtime.ts
packages/context/src/ingest/local-bundle-runtime.test.ts
```
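If you want the Step 1 and Step 2 expectations as a repeatable check instead of eyeballing `rg` output, a minimal Node sketch follows. `findFilesMentioning` is a hypothetical helper, it only scans `.ts` files (unlike `rg`), and the demo runs against a throwaway directory rather than `packages/context/src`.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, readFileSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join, relative } from 'node:path';

// Hypothetical helper: lists .ts files under `root` whose contents match `pattern`.
// Pass a regex without the `g` flag; `test()` on a global regex keeps lastIndex state.
function findFilesMentioning(root: string, pattern: RegExp): string[] {
  const hits: string[] = [];
  const walk = (dir: string): void => {
    for (const entry of readdirSync(dir, { withFileTypes: true })) {
      const fullPath = join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(fullPath);
      } else if (entry.isFile() && entry.name.endsWith('.ts') && pattern.test(readFileSync(fullPath, 'utf-8'))) {
        // Normalize separators so results compare the same on Windows.
        hits.push(relative(root, fullPath).split('\\').join('/'));
      }
    }
  };
  walk(root);
  return hits.sort();
}

const OLD_ALLOWLIST_NAMES =
  /isolatedDiffSourceKeys|defaultIsolatedDiffSourceKeys|ISOLATED_DIFF_DIRECT_WRITE_SOURCE_KEYS|isIsolatedDiffDirectWriteSourceKey/;
const NEW_FALLBACK_NAMES = /sharedWorktreeSourceKeys|defaultSharedWorktreeSourceKeys|isSharedWorktreeFallbackSourceKey/;

// Demo against a throwaway tree; point the helper at packages/context/src for the real check.
const scanRoot = mkdtempSync(join(tmpdir(), 'allowlist-scan-'));
mkdirSync(join(scanRoot, 'ingest'), { recursive: true });
writeFileSync(join(scanRoot, 'ingest/ports.ts'), 'export interface S { sharedWorktreeSourceKeys?: string[] }\n');
writeFileSync(join(scanRoot, 'ingest/stale.ts'), 'settings.isolatedDiffSourceKeys = defaultIsolatedDiffSourceKeys();\n');
const staleHits = findFilesMentioning(scanRoot, OLD_ALLOWLIST_NAMES);
const fallbackHits = findFilesMentioning(scanRoot, NEW_FALLBACK_NAMES);
rmSync(scanRoot, { recursive: true, force: true });
console.log(staleHits, fallbackHits); // [ 'ingest/stale.ts' ] [ 'ingest/ports.ts' ]
```

Comparing the real `fallbackHits` against the file list above turns Step 2 into an exact-set assertion.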
- [ ] **Step 3: Run a focused no-allowlist regression suite**
Run:
```bash
pnpm --filter @ktx/context exec vitest run \
src/ingest/isolated-diff/source-routing.test.ts \
src/ingest/local-bundle-runtime.test.ts \
src/ingest/ingest-bundle.runner.isolated-diff.test.ts \
-t "source routing|defaults local bundle ingest|unlisted direct-writing source|shared-worktree path reachable|routes notion|routes lookml|routes looker|routes dbt|routes metricflow"
```
Expected: PASS.
- [ ] **Step 4: Commit stale-reference cleanup if needed**
If Step 1 or Step 2 required any edits, run:
```bash
git add packages/context/src/ingest
git commit -m "chore(ingest): remove isolated diff allowlist references"
```
If no files changed, record that no cleanup commit was needed in the execution
notes for this task.
### Task 5: Final verification
**Files:**
- Verify: `packages/context/src/ingest/isolated-diff/source-routing.ts`
- Verify: `packages/context/src/ingest/isolated-diff/source-routing.test.ts`
- Verify: `packages/context/src/ingest/ingest-bundle.runner.ts`
- Verify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`
- Verify: `packages/context/src/ingest/local-bundle-runtime.ts`
- Verify: `packages/context/src/ingest/local-bundle-runtime.test.ts`
- Verify: `packages/context/src/ingest/ports.ts`
- Verify: `docs/superpowers/plans/2026-05-18-isolated-diff-ingestion-v1-default-promotion.md`
- [ ] **Step 1: Run the full isolated-diff focused suite**
Run:
```bash
pnpm --filter @ktx/context exec vitest run \
src/ingest/ingest-trace.test.ts \
src/ingest/wiki-body-refs.test.ts \
src/ingest/artifact-gates.test.ts \
src/ingest/semantic-layer-target-policy.test.ts \
src/ingest/isolated-diff/source-routing.test.ts \
src/ingest/isolated-diff/git-patch.test.ts \
src/ingest/isolated-diff/work-unit-executor.test.ts \
src/ingest/isolated-diff/patch-integrator.test.ts \
src/ingest/isolated-diff/textual-conflict-resolver.test.ts \
src/ingest/final-gate-repair.test.ts \
src/ingest/ingest-bundle.runner.isolated-diff.test.ts \
src/ingest/report-snapshot.test.ts \
src/ingest/local-bundle-runtime.test.ts
```
Expected: PASS.
- [ ] **Step 2: Run the MetricFlow local ingest regression**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/local-bundle-ingest.test.ts -t "runs full MetricFlow local ingest"
```
Expected: PASS. The report body includes `isolatedDiff.enabled: true`,
`acceptedPatches: 0`, and a string `projectionSha`.
- [ ] **Step 3: Run package type-check**
Run:
```bash
pnpm --filter @ktx/context run type-check
```
Expected: PASS.
- [ ] **Step 4: Run package tests**
Run:
```bash
pnpm --filter @ktx/context run test
```
Expected: PASS.
- [ ] **Step 5: Run TypeScript dead-code checks**
Run:
```bash
pnpm run dead-code
```
Expected: PASS, or only pre-existing findings unrelated to the files changed
by this plan. Investigate any finding that names `source-routing.ts`,
`ports.ts`, `local-bundle-runtime.ts`, or `ingest-bundle.runner.ts`.
- [ ] **Step 6: Decide whether docs-site needs an update**
No `docs-site/content/docs/` change is expected for this plan because the
change is an internal runner rollout switch and does not add or remove public
CLI commands, flags, config fields, connector setup steps, or user-facing
documentation concepts.
- [ ] **Step 7: Commit final verification notes**
Run:
```bash
git status --short
git add docs/superpowers/plans/2026-05-18-isolated-diff-ingestion-v1-default-promotion.md
git commit -m "docs: add isolated diff default promotion plan"
```
Only include the plan file in this commit if all implementation commits have
already captured their code changes.
## Completion criteria
This plan is complete when:
- `packages/context/src/ingest/ports.ts` has
`sharedWorktreeSourceKeys?: string[]` and no `isolatedDiffSourceKeys` field.
- `IngestBundleRunner` uses isolated diffs for every non-override source unless
`sharedWorktreeSourceKeys` explicitly contains that source.
- The trace for a default-routed source contains `isolated_diff_enabled` and
not `shared_worktree_path_enabled`.
- The trace for an explicitly fallback-routed source contains
`shared_worktree_path_enabled` and not `work_unit_child_created`.
- Local runtime settings default `sharedWorktreeSourceKeys` to `[]`.
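- No production code under `packages/context/src` references the old
  isolated-diff allowlist names; the only test reference is the negative
  `'isolatedDiffSourceKeys' in settings` check (and its settings type) in
  `local-bundle-runtime.test.ts`.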
- The focused isolated-diff suite, MetricFlow local ingest regression,
`@ktx/context` type-check, `@ktx/context` tests, and dead-code checks pass.
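The two trace criteria can be checked the same way the runner tests do it, with substring `toContain` and `not.toContain` checks on `trace.jsonl`. A minimal sketch, where `traceProblems` is a hypothetical helper and the sample lines are illustrative only (the real trace record shape is not specified here):

```typescript
// Hypothetical helper mirroring the tests' substring checks; it does not parse the JSONL schema.
function traceProblems(trace: string, spec: { required: readonly string[]; forbidden: readonly string[] }): string[] {
  const problems: string[] = [];
  for (const name of spec.required) {
    if (!trace.includes(name)) problems.push(`missing ${name}`);
  }
  for (const name of spec.forbidden) {
    if (trace.includes(name)) problems.push(`unexpected ${name}`);
  }
  return problems;
}

const DEFAULT_ROUTED = { required: ['isolated_diff_enabled'], forbidden: ['shared_worktree_path_enabled'] };
const FALLBACK_ROUTED = { required: ['shared_worktree_path_enabled'], forbidden: ['work_unit_child_created'] };

// Illustrative trace lines only.
const sampleTrace = ['{"event":"isolated_diff_enabled"}', '{"event":"work_unit_child_created"}'].join('\n');
console.log(traceProblems(sampleTrace, DEFAULT_ROUTED)); // []
console.log(traceProblems(sampleTrace, FALLBACK_ROUTED)); // one missing and one unexpected event
```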
## Next rollout step
After this plan is implemented and verified, the only remaining v1-blocking
rollout item from the spec is step 11: remove the old shared-worktree WorkUnit
execution path and delete the private `sharedWorktreeSourceKeys` fallback
setting.

File diff suppressed because it is too large.


@ -0,0 +1,980 @@
# Isolated Diff Ingestion V1 Shared Worktree Removal Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use
> superpowers:subagent-driven-development (recommended) or
> superpowers:executing-plans to implement this plan task-by-task. Steps use
> checkbox (`- [ ]`) syntax for tracking.
**Goal:** Remove the old shared-worktree WorkUnit execution path so every
non-override bundle ingest uses isolated WorkUnit diffs.
**Architecture:** Keep `IngestBundleRunner` with one non-override execution
path: raw snapshot, optional deterministic projection, child WorkUnit
worktrees, patch integration, reconciliation, final gates, provenance
validation, and squash. Delete the private fallback routing setting and all
legacy tests, traces, and agent instructions that existed only for shared
WorkUnit state.
**Tech Stack:** TypeScript, Vitest, pnpm, KTX ingest runner, Git worktrees.
---
## Audit summary
This audit read the original design in
`docs/superpowers/specs/2026-05-17-isolated-diff-ingestion-design.md`, every
implemented plan matching
`docs/superpowers/plans/2026-05-17-isolated-diff-ingestion-*.md` and
`docs/superpowers/plans/2026-05-18-isolated-diff-ingestion-*.md`, and the
current implementation under `packages/context/src/ingest/`,
`packages/context/prompts/`, and `packages/context/skills/`.
Implemented v1 rollout coverage:
- Rollout steps 1 and 2 exist in code: isolated child worktrees, binary
no-rename patch collection, and `git apply --3way --index` patch integration.
- Rollout step 3 exists in code:
`packages/context/src/ingest/isolated-diff/textual-conflict-resolver.ts` is
wired through the patch integrator and runner.
- Rollout steps 4, 5, and 6 exist in code: final wiki and semantic-layer gates,
provenance validation before squash, target policy checks, bounded gate
repair, failed reports, and trace counters.
- Rollout step 7 exists in code: the Metabase stale body-reference regression
is covered in `ingest-bundle.runner.isolated-diff.test.ts`.
- Rollout step 8 is committed: Notion, LookML, Looker, dbt, and MetricFlow
route through isolated child worktrees, and MetricFlow projection runs before
WorkUnits.
- Rollout step 10 is committed: non-override ingests default to isolated diffs,
and the old branch is reachable only through the private
`sharedWorktreeSourceKeys` fallback setting.
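The patch collection and integration from rollout steps 1 and 2 can be sketched as the Git arguments involved. The helper names are hypothetical, and the real code in `packages/context/src/ingest/isolated-diff/git-patch.ts` may build its commands differently. The flags themselves are standard `git diff` and `git apply` options:

```typescript
// Hypothetical helpers; the real git-patch module may differ.

// Arguments for `git diff` that capture a WorkUnit's changes as a patch.
// --binary emits full blob ids and binary hunks, so the patch can carry
// non-text files and supports a later 3-way apply. --no-renames records a
// rename as delete + add, so the apply step never has to match rename pairs.
function collectPatchArgs(baseSha: string, headSha: string): string[] {
  return ['diff', '--binary', '--no-renames', baseSha, headSha];
}

// Arguments for `git apply` that integrate one patch into the integration
// worktree. --3way attempts a three-way merge using the blob ids recorded in
// the patch and leaves conflict markers on overlap. --index updates the index
// as well as the working tree.
function integratePatchArgs(patchPath: string): string[] {
  return ['apply', '--3way', '--index', patchPath];
}
```

A textual conflict left by `--3way` is the input that the textual conflict resolver from rollout step 3 works on.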
## Remaining gaps
The remaining v1-blocking gaps are all part of rollout step 11:
- `packages/context/src/ingest/ports.ts` still exposes the private
`sharedWorktreeSourceKeys?: string[]` setting.
- `packages/context/src/ingest/isolated-diff/source-routing.ts` and its test
exist only to support the fallback setting.
- `packages/context/src/ingest/local-bundle-runtime.ts` still installs
`sharedWorktreeSourceKeys: []`.
- `packages/context/src/ingest/ingest-bundle.runner.ts` still checks
`isSharedWorktreeFallbackEnabled()` and contains the
`shared_worktree_path_enabled` branch that runs WorkUnits against the mutable
integration worktree.
- `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`
still has a regression proving the shared-worktree fallback is reachable.
- `packages/context/src/ingest/ingest-bundle.runner.test.ts` keeps broad runner
tests on the legacy path through `sharedWorktreeSourceKeys`; those tests must
either use the isolated mock harness or move coverage into the real-git
isolated suite.
- `packages/context/prompts/memory_agent_bundle_ingest_work_unit.md` and
`packages/context/skills/ingest_triage/SKILL.md` still tell WorkUnit agents
that prior WorkUnit writes in the same job are visible in the current working
branch. That instruction is false after isolated diffs and must be removed
with the shared path.
Non-blocking gaps after this plan:
- Rollout step 9 deterministic semantic merge helpers remain intentionally
deferred until resolver metrics show frequent mechanical repairs.
- Semantic-layer dependency expansion remains direct declared joins only; the
spec explicitly defers transitive SQL-projection closure.
- Provenance remains in the ingest provenance store and report body; moving it
to worktree files is a separate schema migration.
- Resolver context can later include richer transcript excerpts and explicit
overlap summaries for every previously applied patch.
- Failures before an ingest run row exists still have deterministic trace files
but no stored ingest report.
## File structure
- Modify `packages/context/src/ingest/ports.ts`. Remove the private fallback
setting from `IngestSettingsPort`.
- Modify `packages/context/src/ingest/local-bundle-runtime.ts`. Stop importing
and installing default shared-worktree fallback settings.
- Delete `packages/context/src/ingest/isolated-diff/source-routing.ts`. This
helper has no responsibility once fallback routing is removed.
- Delete `packages/context/src/ingest/isolated-diff/source-routing.test.ts`.
Its assertions exist only for the fallback helper.
- Modify `packages/context/src/ingest/ingest-bundle.runner.ts`. Delete
`isSharedWorktreeFallbackEnabled()`, the old shared-worktree WorkUnit branch,
and helper methods that only served that branch.
- Modify `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`.
Remove fallback reachability coverage and add a stale-setting regression that
proves a runtime object cannot opt out of isolated diffs.
- Modify `packages/context/src/ingest/ingest-bundle.runner.test.ts`. Remove
the fallback setting from the broad test harness and make its mocked Git
session support no-op isolated patch collection.
- Modify `packages/context/src/ingest/local-bundle-runtime.test.ts`. Assert
local runtime settings do not contain the fallback key.
- Modify `packages/context/prompts/memory_agent_bundle_ingest_work_unit.md`.
Replace shared-branch WorkUnit visibility instructions with isolated-diff
instructions.
- Modify `packages/context/skills/ingest_triage/SKILL.md`. Remove Stage 3
prior-WorkUnit visibility language and keep cross-WorkUnit sweep guidance in
Stage 4 reconciliation.
---
### Task 1: Add removal-contract regressions
**Files:**
- Modify: `packages/context/src/ingest/local-bundle-runtime.test.ts`
- Modify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`
- [ ] **Step 1: Update the local runtime settings type**
In `packages/context/src/ingest/local-bundle-runtime.test.ts`, replace
`RuntimeWithSettingsDeps` with:
```ts
type RuntimeWithSettingsDeps = {
deps: {
settings: Record<string, unknown>;
};
};
```
- [ ] **Step 2: Replace the local runtime fallback-setting assertion**
In `packages/context/src/ingest/local-bundle-runtime.test.ts`, replace the test
named `defaults local bundle ingest to isolated diffs without an allowlist` with:
```ts
it('defaults local bundle ingest to isolated diffs without a shared-worktree fallback setting', () => {
const runtime = createLocalBundleIngestRuntime({
project,
adapters: [new FakeSourceAdapter()],
agentRunner: testAgentRunner(),
});
const settings = (runtime.runner as unknown as RuntimeWithSettingsDeps).deps.settings;
expect(settings).not.toHaveProperty('sharedWorktreeSourceKeys');
expect(Object.keys(settings).sort()).toEqual([
'ingestTraceLevel',
'memoryIngestionModel',
'probeRowCount',
'workUnitFailureMode',
'workUnitMaxConcurrency',
'workUnitStepBudget',
]);
});
```
- [ ] **Step 3: Remove the source-routing import from the isolated runner test**
In `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`,
delete this import:
```ts
import { defaultSharedWorktreeSourceKeys } from './isolated-diff/source-routing.js';
```
Then remove the `sharedWorktreeSourceKeys` line from the `settings` object in
`makeDeps()`:
```ts
settings: {
memoryIngestionModel: 'test',
probeRowCount: 1,
ingestTraceLevel: 'trace',
...settings,
},
```
- [ ] **Step 4: Replace the shared fallback reachability test**
In `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`,
replace the test named
`keeps the shared-worktree path reachable through explicit private fallback settings`
with this stale-setting regression:
```ts
it('does not support shared-worktree fallback settings', async () => {
const runtime = await makeRealGitRuntime();
try {
const sourceKey = 'legacy-source';
const staleSettings = {
sharedWorktreeSourceKeys: ['legacy-source'],
} as Partial<IngestBundleRunnerDeps['settings']> & Record<string, unknown>;
const { deps, adapter } = makeDeps(runtime, sourceKey, staleSettings);
adapter.chunk.mockResolvedValue({
workUnits: [
{
unitKey: 'legacy-wiki',
rawFiles: ['legacy/page.json'],
peerFileIndex: [],
dependencyPaths: [],
},
],
});
let currentSession: any = null;
deps.toolsetFactory.createIngestWuToolset = vi.fn((toolSession: any) => {
currentSession = toolSession;
return { toRuntimeTools: vi.fn(() => ({})) };
});
deps.agentRunner.runLoop = vi.fn(async (params: any) => {
if (params.telemetryTags.operationName !== 'ingest-bundle-wu') {
return { stopReason: 'natural' };
}
const root = rootOfConfig(currentSession.configService, runtime.configDir);
await mkdir(join(root, 'wiki/global'), { recursive: true });
await writeFile(
join(root, 'wiki/global/legacy-isolated.md'),
'---\nsummary: Legacy isolated write\nusage_mode: auto\n---\n\nLegacy isolated write.\n',
'utf-8',
);
currentSession.actions.push({
target: 'wiki',
type: 'created',
key: 'legacy-isolated',
detail: 'Legacy isolated write',
rawPaths: ['legacy/page.json'],
});
await currentSession.gitService.commitFiles(
['wiki/global/legacy-isolated.md'],
'legacy isolated wiki',
'KTX Test',
'system@ktx.local',
);
return { stopReason: 'natural' };
}) as never;
const runner = new IngestBundleRunner(deps);
await mockStageRawFiles(runner, runtime, [['legacy/page.json', 'h1']], sourceKey);
await expect(
runner.run({
jobId: 'job-legacy-isolated',
connectionId: 'warehouse',
sourceKey,
trigger: 'upload',
bundleRef: { kind: 'upload', uploadId: 'upload' },
}),
).resolves.toMatchObject({
jobId: 'job-legacy-isolated',
failedWorkUnits: [],
workUnitCount: 1,
});
const trace = await readFile(
join(runtime.configDir, '.ktx/ingest-traces/job-legacy-isolated/trace.jsonl'),
'utf-8',
);
expect(trace).toContain('isolated_diff_enabled');
expect(trace).toContain('work_unit_child_created');
expect(trace).not.toContain('shared_worktree_path_enabled');
const reportCreate = vi.mocked(deps.reports.create).mock.calls.at(-1)?.[0];
const reportBody = reportCreate?.body as { isolatedDiff?: unknown } | undefined;
expect(reportBody?.isolatedDiff).toMatchObject({
enabled: true,
acceptedPatches: 1,
});
} finally {
await rm(runtime.homeDir, { recursive: true, force: true });
}
});
```
- [ ] **Step 5: Run the removal regressions and confirm they fail**
Run:
```bash
pnpm --filter @ktx/context exec vitest run \
src/ingest/local-bundle-runtime.test.ts \
src/ingest/ingest-bundle.runner.isolated-diff.test.ts \
-t "shared-worktree fallback|stale|defaults local bundle ingest|unlisted direct-writing source"
```
Expected: FAIL. The local runtime still exposes `sharedWorktreeSourceKeys`, and
the stale-setting runner test still reaches `shared_worktree_path_enabled`.
---
### Task 2: Remove the fallback setting and routing module
**Files:**
- Modify: `packages/context/src/ingest/ports.ts`
- Modify: `packages/context/src/ingest/local-bundle-runtime.ts`
- Delete: `packages/context/src/ingest/isolated-diff/source-routing.ts`
- Delete: `packages/context/src/ingest/isolated-diff/source-routing.test.ts`
- [ ] **Step 1: Remove the fallback setting from the runner settings port**
In `packages/context/src/ingest/ports.ts`, replace `IngestSettingsPort` with:
```ts
export interface IngestSettingsPort {
memoryIngestionModel: string;
probeRowCount: number;
workUnitMaxConcurrency?: number;
workUnitStepBudget?: number;
workUnitFailureMode?: 'abort' | 'continue';
ingestTraceLevel?: IngestTraceLevel;
}
```
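Removing the field from the interface does not stop a stale key from reaching the runner at runtime. TypeScript applies excess-property checks only to fresh object literals, so an object built elsewhere can still carry `sharedWorktreeSourceKeys`. That is why Task 1 keeps a runtime stale-setting regression. This is a minimal sketch: `IngestTraceLevel` is stubbed, and `isolatedDiffEnabled` is a hypothetical stand-in for the runner's check.

```typescript
// Stub: the real IngestTraceLevel union lives in the ingest module.
type IngestTraceLevel = string;

interface IngestSettingsPort {
  memoryIngestionModel: string;
  probeRowCount: number;
  workUnitMaxConcurrency?: number;
  workUnitStepBudget?: number;
  workUnitFailureMode?: 'abort' | 'continue';
  ingestTraceLevel?: IngestTraceLevel;
}

// A fresh literal with the removed key would be a compile error:
// const bad: IngestSettingsPort = { memoryIngestionModel: 'm', probeRowCount: 0, sharedWorktreeSourceKeys: [] };

// A pre-built object is checked only structurally, so the stale key
// survives into the settings value at runtime.
const staleDefaults = {
  memoryIngestionModel: 'local-ingest-model',
  probeRowCount: 0,
  sharedWorktreeSourceKeys: ['legacy-source'],
};
const settings: IngestSettingsPort = staleDefaults;

// Hypothetical post-removal routing: it never reads the stale key.
function isolatedDiffEnabled(_settings: IngestSettingsPort, overrideReport: boolean): boolean {
  return !overrideReport;
}

const staleKeyPresent = 'sharedWorktreeSourceKeys' in settings;
const isolated = isolatedDiffEnabled(settings, false);
```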
- [ ] **Step 2: Remove the local runtime source-routing import**
In `packages/context/src/ingest/local-bundle-runtime.ts`, delete this import:
```ts
import { defaultSharedWorktreeSourceKeys } from './isolated-diff/source-routing.js';
```
- [ ] **Step 3: Remove the local runtime fallback setting**
In `packages/context/src/ingest/local-bundle-runtime.ts`, replace the settings
object with:
```ts
settings: {
memoryIngestionModel: options.project.config.llm.models.default ?? 'local-ingest-model',
probeRowCount: 0,
workUnitMaxConcurrency: options.project.config.ingest.workUnits.maxConcurrency,
workUnitStepBudget: options.project.config.ingest.workUnits.stepBudget,
workUnitFailureMode: options.project.config.ingest.workUnits.failureMode,
ingestTraceLevel: ingestTraceLevelFromEnv(),
},
```
- [ ] **Step 4: Delete the fallback routing helper files**
Delete:
```bash
git rm packages/context/src/ingest/isolated-diff/source-routing.ts
git rm packages/context/src/ingest/isolated-diff/source-routing.test.ts
```
- [ ] **Step 5: Confirm no fallback helper imports remain**
Run:
```bash
rg -n "defaultSharedWorktreeSourceKeys|isSharedWorktreeFallbackSourceKey|source-routing" packages/context/src
```
Expected: no matches. `rg` prints nothing and exits with status 1, which here
means the cleanup is complete.
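Cleanup checks like this one invert the usual exit-code meaning: "no output" is the passing result. A small sketch of that interpretation follows. The wrapper name is hypothetical. The exit codes follow ripgrep's documented behavior: 0 means at least one match, 1 means no matches, and 2 means an error.

```typescript
import { spawnSync } from 'node:child_process';

type CleanupCheck = 'clean' | 'stale-references' | 'search-error';

// ripgrep exits 0 when it finds a match, 1 when it finds none, and 2 on
// errors. spawnSync reports status null if rg could not be started.
function interpretRgStatus(status: number | null): CleanupCheck {
  if (status === 1) return 'clean';
  if (status === 0) return 'stale-references';
  return 'search-error';
}

// Hypothetical wrapper: runs `rg -n <pattern> <dir>` and classifies the result.
function checkNoMatches(pattern: string, dir: string): CleanupCheck {
  const result = spawnSync('rg', ['-n', pattern, dir], { encoding: 'utf-8' });
  return interpretRgStatus(result.status);
}
```

For this step, `checkNoMatches('defaultSharedWorktreeSourceKeys|isSharedWorktreeFallbackSourceKey|source-routing', 'packages/context/src')` should return `'clean'`. A missing `rg` binary returns `'search-error'` rather than passing silently.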
---
### Task 3: Delete the shared-worktree runner branch
**Files:**
- Modify: `packages/context/src/ingest/ingest-bundle.runner.ts`
- [ ] **Step 1: Remove helper methods used only by the shared branch**
In `packages/context/src/ingest/ingest-bundle.runner.ts`, delete these private
methods:
```ts
private buildFailedWorkUnitOutcome(wu: WorkUnit, error: unknown): WorkUnitOutcome {
return {
unitKey: wu.unitKey,
status: 'failed',
reason: error instanceof Error ? error.message : String(error),
preSha: '',
postSha: '',
actions: [],
touchedSlSources: [],
slDisallowed: wu.slDisallowed,
slDisallowedReason: wu.slDisallowedReason,
};
}
private formatWorkUnitFailure(outcome: WorkUnitOutcome): string {
return `WorkUnit ${outcome.unitKey} failed: ${outcome.reason ?? 'unknown failure'}`;
}
private isSharedWorktreeFallbackEnabled(sourceKey: string): boolean {
return (this.deps.settings.sharedWorktreeSourceKeys ?? []).includes(sourceKey);
}
```
- [ ] **Step 2: Make non-override isolated routing unconditional**
In `packages/context/src/ingest/ingest-bundle.runner.ts`, replace:
```ts
const isolatedDiffEnabled = !overrideReport && !this.isSharedWorktreeFallbackEnabled(job.sourceKey);
```
with:
```ts
const isolatedDiffEnabled = !overrideReport;
```
Then replace:
```ts
if (!overrideReport && isolatedDiffEnabled) {
```
with:
```ts
if (!overrideReport) {
```
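The routing change in this step can be read as two pure functions. These are hypothetical stand-ins for the inline expressions in `ingest-bundle.runner.ts`, written so the before and after behavior can be compared directly:

```typescript
// Hypothetical stand-in for the removed private setting.
interface LegacySettings {
  sharedWorktreeSourceKeys?: string[];
}

// Before: a non-override ingest could opt out through the private allowlist.
function isolatedDiffEnabledBefore(
  overrideReport: boolean,
  sourceKey: string,
  settings: LegacySettings,
): boolean {
  const fallback = (settings.sharedWorktreeSourceKeys ?? []).includes(sourceKey);
  return !overrideReport && !fallback;
}

// After: only override ingestion skips the isolated WorkUnit path.
function isolatedDiffEnabledAfter(overrideReport: boolean): boolean {
  return !overrideReport;
}
```

Once `isolatedDiffEnabled` equals `!overrideReport`, the guard `!overrideReport && isolatedDiffEnabled` reduces to `!overrideReport`. That is why the second replacement is behavior-preserving.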
- [ ] **Step 3: Delete the old shared-worktree branch**
In `packages/context/src/ingest/ingest-bundle.runner.ts`, delete the whole
branch that starts with:
```ts
} else if (!overrideReport) {
await runTrace.event('info', 'routing', 'shared_worktree_path_enabled', {
sourceKey: job.sourceKey,
reason: 'explicit_private_fallback',
});
```
and ends with:
```ts
latestReportWorkUnits = this.toReportWorkUnits(stageIndex);
}
```
After the deletion, the surrounding code must read:
```ts
}
}
const carryForwardResult =
contextReport && this.deps.contextCandidateCarryforward
? await this.deps.contextCandidateCarryforward.carryForward({
runId: runRow.id,
connectionId: job.connectionId,
sourceKey: job.sourceKey,
})
: null;
```
- [ ] **Step 4: Confirm the branch trace event is gone**
Run:
```bash
rg -n "shared_worktree_path_enabled|explicit_private_fallback|isSharedWorktreeFallbackEnabled|sharedWorktreeSourceKeys" packages/context/src/ingest/ingest-bundle.runner.ts
```
Expected: no matches (`rg` exits with status 1).
---
### Task 4: Update runner tests for isolated-only execution
**Files:**
- Modify: `packages/context/src/ingest/ingest-bundle.runner.test.ts`
- Modify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`
- [ ] **Step 1: Remove the fallback setting from the broad runner test harness**
In `packages/context/src/ingest/ingest-bundle.runner.test.ts`, replace the
`settings` block in `buildRunner()` with:
```ts
settings: {
probeRowCount: 1,
memoryIngestionModel: 'test-model',
},
```
- [ ] **Step 2: Add no-op isolated patch support to the broad mock Git**
In `packages/context/src/ingest/ingest-bundle.runner.test.ts`, replace the
`scopedGit` object in `makeDeps()` with:
```ts
const scopedGit = {
revParseHead: vi.fn().mockResolvedValue('h'),
commitFiles: vi.fn().mockResolvedValue({ created: true, commitHash: 'h' }),
commitStaged: vi.fn().mockResolvedValue({ created: false, commitHash: 'h' }),
resetHardTo: vi.fn(),
assertWorktreeClean: vi.fn().mockResolvedValue(undefined),
writeBinaryNoRenamePatch: vi.fn(async (_base: string, _head: string, patchPath: string) => {
await writeFile(patchPath, '', 'utf-8');
}),
applyPatchFile3WayIndex: vi.fn(),
diffNameStatus: vi.fn().mockResolvedValue([]),
};
```
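The mocked `writeBinaryNoRenamePatch` writes an empty file, which models a WorkUnit whose child worktree made no commits. The sketch below shows how that keeps the broad suite on a no-op path. It assumes, without confirmation from this plan, that the integrator skips empty patches instead of passing them to `git apply`, which in stock Git typically rejects input with no patch hunks. The names are hypothetical.

```typescript
// Hypothetical shape of one collected WorkUnit patch.
interface CollectedPatch {
  unitKey: string;
  text: string;
}

// A WorkUnit that committed nothing yields an empty patch file, like the mock.
function isEmptyPatch(text: string): boolean {
  return text.trim().length === 0;
}

// Hypothetical integrator rule: only non-empty patches reach `git apply`.
// Empty ones are treated as "nothing to integrate" rather than as errors.
function unitsToApply(patches: CollectedPatch[]): string[] {
  return patches.filter((patch) => !isEmptyPatch(patch.text)).map((patch) => patch.unitKey);
}
```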
- [ ] **Step 3: Update the custom sequencer test Git mock**
In the test named
`refuses to squash-merge when the session worktree has an in-progress sequencer op`,
replace the `sessionGit` object with:
```ts
const sessionGit = {
revParseHead: vi.fn().mockResolvedValue('h'),
commitFiles: vi.fn().mockResolvedValue({ created: true, commitHash: 'h' }),
commitStaged: vi.fn().mockResolvedValue({ created: false, commitHash: 'h' }),
resetHardTo: vi.fn(),
assertWorktreeClean: vi.fn().mockRejectedValue(assertError),
writeBinaryNoRenamePatch: vi.fn(async (_base: string, _head: string, patchPath: string) => {
await writeFile(patchPath, '', 'utf-8');
}),
applyPatchFile3WayIndex: vi.fn(),
diffNameStatus: vi.fn().mockResolvedValue([]),
};
```
- [ ] **Step 4: Move the failed-WorkUnit integration regression to the isolated suite**
In `packages/context/src/ingest/ingest-bundle.runner.test.ts`, delete the test
named `squash-merges only successful WUs into main when one WU fails sl_validate`.
In `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`,
add this test near the other real-git isolated runner regressions:
```ts
it('does not integrate failed isolated WorkUnit patches', async () => {
const runtime = await makeRealGitRuntime();
try {
const { deps, adapter } = makeDeps(runtime, 'fake');
adapter.chunk.mockResolvedValue({
workUnits: [
{ unitKey: 'wu-good', rawFiles: ['good.raw'], peerFileIndex: [], dependencyPaths: [] },
{ unitKey: 'wu-bad', rawFiles: ['bad.raw'], peerFileIndex: [], dependencyPaths: [] },
],
});
deps.diffSetService.compute = vi.fn().mockResolvedValue({
added: ['good.raw', 'bad.raw'],
modified: [],
deleted: [],
unchanged: [],
});
deps.slValidator.validateSingleSource = vi.fn(
async (_validationDeps: unknown, _connectionId: string, sourceName: string) => ({
errors: sourceName === 'bad' ? [{ message: 'bad source rejected' }] : [],
warnings: [],
}),
) as never;
let currentSession: any = null;
deps.toolsetFactory.createIngestWuToolset = vi.fn((toolSession: any) => {
currentSession = toolSession;
return { toRuntimeTools: vi.fn(() => ({})) };
});
deps.agentRunner.runLoop = vi.fn(async (params: any) => {
if (params.telemetryTags.operationName !== 'ingest-bundle-wu') {
return { stopReason: 'natural' };
}
const unitKey = params.telemetryTags.unitKey;
const root = rootOfConfig(currentSession.configService, runtime.configDir);
await mkdir(join(root, 'semantic-layer/warehouse'), { recursive: true });
if (unitKey === 'wu-good') {
await writeFile(join(root, 'semantic-layer/warehouse/good.yaml'), 'name: good\n', 'utf-8');
addTouchedSlSource(currentSession.touchedSlSources, 'warehouse', 'good');
currentSession.actions.push({
target: 'sl',
type: 'created',
key: 'good',
detail: 'good source',
targetConnectionId: 'warehouse',
rawPaths: ['good.raw'],
});
await currentSession.gitService.commitFiles(
['semantic-layer/warehouse/good.yaml'],
'test: add good source',
'KTX Test',
'system@ktx.local',
);
}
if (unitKey === 'wu-bad') {
await writeFile(join(root, 'semantic-layer/warehouse/bad.yaml'), 'name: bad\n', 'utf-8');
addTouchedSlSource(currentSession.touchedSlSources, 'warehouse', 'bad');
currentSession.actions.push({
target: 'sl',
type: 'created',
key: 'bad',
detail: 'bad source',
targetConnectionId: 'warehouse',
rawPaths: ['bad.raw'],
});
await currentSession.gitService.commitFiles(
['semantic-layer/warehouse/bad.yaml'],
'test: add bad source',
'KTX Test',
'system@ktx.local',
);
}
return { stopReason: 'natural' };
}) as never;
const runner = new IngestBundleRunner(deps);
await mockStageRawFiles(
runner,
runtime,
[
['good.raw', 'good-hash'],
['bad.raw', 'bad-hash'],
],
'fake',
);
const result = await runner.run({
jobId: 'job-failed-wu-isolated',
connectionId: 'warehouse',
sourceKey: 'fake',
trigger: 'upload',
bundleRef: { kind: 'upload', uploadId: 'upload' },
});
expect(result.failedWorkUnits).toEqual(['wu-bad']);
await expect(readFile(join(runtime.configDir, 'semantic-layer/warehouse/good.yaml'), 'utf-8')).resolves.toContain(
'good',
);
await expect(readFile(join(runtime.configDir, 'semantic-layer/warehouse/bad.yaml'), 'utf-8')).rejects.toThrow();
const reportCreate = vi.mocked(deps.reports.create).mock.calls.at(-1)?.[0];
const reportBody = reportCreate?.body as { isolatedDiff?: { acceptedPatches?: number }; failedWorkUnits?: string[] };
expect(reportBody.failedWorkUnits).toEqual(['wu-bad']);
expect(reportBody.isolatedDiff).toMatchObject({ enabled: true, acceptedPatches: 1 });
const trace = await readFile(
join(runtime.configDir, '.ktx/ingest-traces/job-failed-wu-isolated/trace.jsonl'),
'utf-8',
);
expect(trace).toContain('work_unit_failed_before_patch');
expect(trace).toContain('patch_accepted');
expect(trace).not.toContain('shared_worktree_path_enabled');
} finally {
await rm(runtime.homeDir, { recursive: true, force: true });
}
});
```
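The regression above expects `failedWorkUnits` to be `['wu-bad']` and `acceptedPatches` to be 1. A minimal sketch of the selection rule it encodes follows: a WorkUnit that fails before patch collection contributes nothing to the integration worktree. The types and function are hypothetical and much narrower than the runner's `WorkUnitOutcome`.

```typescript
// Hypothetical result shape; the runner's WorkUnitOutcome has more fields.
interface WorkUnitResult {
  unitKey: string;
  status: 'succeeded' | 'failed';
  // Present only when the child worktree produced a patch.
  patchPath?: string;
}

interface IntegrationPlan {
  patchesToApply: string[];
  failedWorkUnits: string[];
}

// Failed WorkUnits are reported but never applied; only patches from
// successful WorkUnits are queued for integration.
function planIntegration(results: WorkUnitResult[]): IntegrationPlan {
  const patchesToApply: string[] = [];
  const failedWorkUnits: string[] = [];
  for (const result of results) {
    if (result.status === 'failed') {
      failedWorkUnits.push(result.unitKey);
    } else if (result.patchPath !== undefined) {
      patchesToApply.push(result.patchPath);
    }
  }
  return { patchesToApply, failedWorkUnits };
}

const plan = planIntegration([
  { unitKey: 'wu-good', status: 'succeeded', patchPath: 'patches/wu-good.patch' },
  { unitKey: 'wu-bad', status: 'failed' },
]);
```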
- [ ] **Step 5: Run the updated focused runner tests**
Run:
```bash
pnpm --filter @ktx/context exec vitest run \
src/ingest/ingest-bundle.runner.isolated-diff.test.ts \
src/ingest/local-bundle-runtime.test.ts \
-t "does not support shared-worktree|does not integrate failed isolated|defaults local bundle ingest|unlisted direct-writing source"
```
Expected: PASS. The traces contain `isolated_diff_enabled`, child worktree
events, and no `shared_worktree_path_enabled`.
- [ ] **Step 6: Run the broad runner suite**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/ingest-bundle.runner.test.ts
```
Expected: PASS. Broad runner coverage no longer depends on
`sharedWorktreeSourceKeys`.
- [ ] **Step 7: Commit the runner removal**
Run:
```bash
git add \
packages/context/src/ingest/ports.ts \
packages/context/src/ingest/local-bundle-runtime.ts \
packages/context/src/ingest/local-bundle-runtime.test.ts \
packages/context/src/ingest/ingest-bundle.runner.ts \
packages/context/src/ingest/ingest-bundle.runner.test.ts \
packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts
git commit -m "refactor(ingest): remove shared worktree WorkUnit path"
```
Expected: commit succeeds. The routing files removed with `git rm` in Task 2 are
already staged, so the commit includes them as deletions. Passing those paths to
`git add` again would fail because they no longer exist in the index or the
working tree.
---
### Task 5: Remove shared-branch agent instructions
**Files:**
- Modify: `packages/context/prompts/memory_agent_bundle_ingest_work_unit.md`
- Modify: `packages/context/skills/ingest_triage/SKILL.md`
- Test: `packages/context/src/ingest/ingest-prompts.test.ts`
- Test: `packages/context/src/ingest/ingest-runtime-assets.test.ts`
- [ ] **Step 1: Update the WorkUnit role text**
In `packages/context/prompts/memory_agent_bundle_ingest_work_unit.md`, replace
the `<role>` block with:
```md
<role>
You are processing ONE WorkUnit of a multi-file ingest bundle. The WorkUnit
gives you a slice of raw source files (LookML views, dbt/MetricFlow YAMLs,
Metabase card JSONs, Notion pages, or similar) and you must translate that
slice into KTX semantic-layer sources and/or knowledge wiki pages, in one pass.
You run in an isolated WorkUnit worktree. Deterministic projection output,
existing project memory, and listed dependency paths are visible; sibling
WorkUnit edits from this same job are not visible until the runner integrates
accepted patches.
</role>
```
- [ ] **Step 2: Update the WorkUnit workflow text**
In the same prompt, replace workflow steps 2 and 4 with:
```md
2. Load the per-source review skill first (for example `lookml_ingest`,
`metricflow_ingest`, or `dbt_ingest`), then `sl_capture` and
`wiki_capture`, and `ingest_triage` last. The triage skill tells you how to
react when existing project memory, deterministic projection output, or
prior provenance overlaps with what this WorkUnit is about to write.
4. For each raw file: call `read_raw_file` (or `read_raw_span` for slicing large
files) to load content. Before writing a new SL source or wiki page, call
`discover_data` for each candidate source, table, metric, or topic name to
find existing wiki pages, SL sources, deterministic projection output, prior
sync artifacts, and raw warehouse matches; apply `ingest_triage` when you hit
one, and apply any matching canonical pin before deciding whether to edit,
rename, or skip.
```
- [ ] **Step 3: Update the WorkUnit do-not rule**
In the same prompt, replace:
```md
- Do not silently accept a name collision with a prior WU's write when the formula differs. Trigger `ingest_triage`.
```
with:
```md
- Do not silently accept a name collision with visible existing memory,
deterministic projection output, or prior provenance when the formula differs.
Trigger `ingest_triage`.
```
- [ ] **Step 4: Update ingest triage caller guidance**
In `packages/context/skills/ingest_triage/SKILL.md`, replace:
```md
This skill is loaded in two contexts:
- By a Stage 3 WorkUnit agent when `sl_discover` reveals that a prior WU (or a prior sync) already wrote something that overlaps with what the current WU is about to write.
- By the Stage 4 reconciliation agent for cross-WU sweeps and for eviction decisions.
```
with:
```md
This skill is loaded in two contexts:
- By a Stage 3 WorkUnit agent when `sl_discover`, deterministic projection
output, existing project memory, or prior provenance overlaps with what the
current WorkUnit is about to write.
- By the Stage 4 reconciliation agent for cross-WorkUnit sweeps, accepted patch
overlap, and eviction decisions.
```
- [ ] **Step 5: Update same-ingest wording in ingest triage**
In `packages/context/skills/ingest_triage/SKILL.md`, replace:
```md
4. **If there's no prior-sync row (both are from THIS job), check for same-ingest contradictions:**
```
with:
```md
4. **If reconciliation sees accepted patches from this same job with no
prior-sync row, check for same-ingest contradictions:**
```
- [ ] **Step 6: Search for stale shared-state prompt language**
Run:
```bash
rg -n "prior WU|prior-WU|Prior WorkUnits|same job may have already written|visible on the working branch|shared_worktree_path_enabled|shared-worktree path reachable" packages/context/prompts packages/context/skills packages/context/src/ingest
```
Expected: no matches (`rg` exits with status 1).
- [ ] **Step 7: Run prompt asset tests**
Run:
```bash
pnpm --filter @ktx/context exec vitest run \
src/ingest/ingest-prompts.test.ts \
src/ingest/ingest-runtime-assets.test.ts
```
Expected: PASS. Prompt assets still load from packaged KTX assets.
- [ ] **Step 8: Commit the prompt cleanup**
Run:
```bash
git add \
packages/context/prompts/memory_agent_bundle_ingest_work_unit.md \
packages/context/skills/ingest_triage/SKILL.md
git commit -m "docs(ingest): align WorkUnit prompts with isolated diffs"
```
Expected: commit succeeds.
---
### Task 6: Final verification
**Files:**
- Verify: `packages/context/src/ingest/ingest-bundle.runner.ts`
- Verify: `packages/context/src/ingest/ports.ts`
- Verify: `packages/context/src/ingest/local-bundle-runtime.ts`
- Verify: `packages/context/src/ingest/ingest-bundle.runner.test.ts`
- Verify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`
- Verify: `packages/context/prompts/memory_agent_bundle_ingest_work_unit.md`
- Verify: `packages/context/skills/ingest_triage/SKILL.md`
- [ ] **Step 1: Run the isolated-diff focused suite**
Run:
```bash
pnpm --filter @ktx/context exec vitest run \
src/ingest/ingest-trace.test.ts \
src/ingest/wiki-body-refs.test.ts \
src/ingest/artifact-gates.test.ts \
src/ingest/semantic-layer-target-policy.test.ts \
src/ingest/isolated-diff/git-patch.test.ts \
src/ingest/isolated-diff/work-unit-executor.test.ts \
src/ingest/isolated-diff/patch-integrator.test.ts \
src/ingest/isolated-diff/textual-conflict-resolver.test.ts \
src/ingest/final-gate-repair.test.ts \
src/ingest/report-snapshot.test.ts \
src/ingest/ingest-bundle.runner.isolated-diff.test.ts
```
Expected: PASS. The output includes the isolated-diff runner tests and no
`source-routing.test.ts`.
- [ ] **Step 2: Run the full context test suite**
Run:
```bash
pnpm --filter @ktx/context run test
```
Expected: PASS.
- [ ] **Step 3: Run context type-check**
Run:
```bash
pnpm --filter @ktx/context run type-check
```
Expected: PASS. There are no `sharedWorktreeSourceKeys` type errors because the
setting no longer exists.
- [ ] **Step 4: Run dead-code checks**
Run:
```bash
pnpm run dead-code
```
Expected: PASS. Knip does not report deleted source-routing exports, and Biome
does not report stale imports.
- [ ] **Step 5: Search for removed legacy path names**
Run:
```bash
rg -n "sharedWorktreeSourceKeys|defaultSharedWorktreeSourceKeys|isSharedWorktreeFallbackSourceKey|shared_worktree_path_enabled|explicit_private_fallback|source-routing" packages docs/superpowers/plans/2026-05-18-isolated-diff-ingestion-v1-shared-worktree-removal.md
```
Expected: matches only in this plan file. There must be no matches under
`packages/`.
- [ ] **Step 6: Confirm docs-site does not need an update**
Run:
```bash
rg -n "sharedWorktree|isolatedDiffSourceKeys|sharedWorktreeSourceKeys|executionMode|planningStrategy|conflictPolicy" docs-site README.md packages/*/README.md
```
Expected: either no matches or matches unrelated to a public user-facing knob.
This change removes an internal runner fallback and does not add, remove, or
rename public CLI behavior, configuration, or docs-site content.
- [ ] **Step 7: Confirm the working tree is clean**
Run:
```bash
git status --short
```
Expected: clean after the two implementation commits. If this command reports
new changes, stop and inspect them before finishing; final verification should
not create extra source changes.
## Self-review
Spec coverage:
- Rollout step 11 is covered by Tasks 1 through 4: the private fallback setting,
helper module, old runner branch, trace event, and fallback tests are deleted.
- The isolated-diff WorkUnit flow remains covered by existing real-git tests and
the new failed-WorkUnit regression in Task 4.
- Agent-facing instructions are aligned with the spec's worktree invariant in
Task 5: sibling WorkUnit edits are not visible inside a child worktree.
- Override ingestion remains outside the WorkUnit execution branch and still
uses prior report materialization plus serial reconciliation.
Placeholder scan:
- This plan contains exact file paths, test names, replacement snippets,
commands, and expected results.
- There are no deferred implementation markers or unspecified edge-case
instructions.
Type consistency:
- `IngestSettingsPort` no longer includes `sharedWorktreeSourceKeys`.
- `isolatedDiffEnabled` remains the runner's internal summary flag and is
equivalent to `!overrideReport`.
- The removed trace event is `shared_worktree_path_enabled`; retained isolated
events include `isolated_diff_enabled`, `work_unit_child_created`, and
`work_unit_patch_collected`.
Execution handoff:
Plan complete and saved to
`docs/superpowers/plans/2026-05-18-isolated-diff-ingestion-v1-shared-worktree-removal.md`.
Two execution options:
1. **Subagent-Driven (recommended)** - Dispatch a fresh subagent per task,
review between tasks, and keep iteration fast.
2. **Inline Execution** - Execute tasks in this session using
`superpowers:executing-plans`, with batch execution and checkpoints.

# Isolated-diff ingestion design
**Date:** 2026-05-17
**Author:** Andrey Avtomonov
**Status:** Design - pending implementation plan
## Background
KTX ingests third-party context sources into durable project memory: raw source
snapshots, wiki pages, semantic-layer sources, evidence documents, candidates,
and fallback records. The current bundle runner stages raw source data in one
ingestion session worktree, then runs work units against that same mutable
worktree.
A Metabase ingestion run exposed the failure mode this design addresses. One
work unit inferred and wrote the semantic-layer measure
`mart_account_segments.total_contract_arr_cents`, a later work unit overwrote
the same source with `total_contract_arr`, and the generated wiki page kept
referencing the stale non-existent measure. The local per-work-unit checks did
not catch the final cross-artifact inconsistency because durable writes were
accepted into shared state before final integration.
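The missing check is a gate over the final state rather than over each work unit. The sketch below shows such a gate for wiki body references. It assumes a hypothetical inline-code grammar in which a backticked `` `source.member` `` token references a semantic-layer member. The real grammar this design adds is not specified in this section.

```typescript
// Hypothetical grammar: a backticked `source.member` token in a wiki body is
// an explicit reference to a semantic-layer measure or dimension.
const REF_PATTERN = /`([a-z_][a-z0-9_]*)\.([a-z_][a-z0-9_]*)`/g;

// Returns references that do not exist in the FINAL semantic layer. Running
// this after integration catches a later work unit renaming a measure that an
// earlier work unit's wiki page still cites.
function findStaleRefs(wikiBody: string, finalMembers: Set<string>): string[] {
  const stale: string[] = [];
  for (const match of wikiBody.matchAll(REF_PATTERN)) {
    const ref = `${match[1]}.${match[2]}`;
    if (!finalMembers.has(ref)) stale.push(ref);
  }
  return stale;
}

// The Metabase case: the final source defines total_contract_arr, but the
// wiki page still cites the overwritten total_contract_arr_cents.
const finalMembers = new Set(['mart_account_segments.total_contract_arr']);
const body = 'ARR is `mart_account_segments.total_contract_arr_cents` summed per segment.';
const staleRefs = findStaleRefs(body, finalMembers);
```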
The fix is not a Metabase-only validation patch. The same class of risk exists
any time LLM-authored work units mutate durable wiki or semantic-layer files:
Metabase cards, Notion pages and clusters, dbt YAML, MetricFlow YAML, Looker
dashboards and explores, and LookML models and views can all produce overlapping
or contested memory artifacts. KTX needs one ingestion execution model that
isolates agent-authored changes, integrates them deliberately, and validates
the final project state globally.
## Goals
This design creates one opinionated ingestion algorithm for all context sources.
Connector-specific code stays responsible for source-shaped work: fetching raw
data, normalizing raw files, planning work units, and optionally projecting
deterministic facts. The shared runner owns execution correctness.
The design has these goals:
- Run all agent-authored durable writes in isolated per-work-unit worktrees.
- Treat each work unit's git diff as its proposal artifact.
- Integrate accepted diffs through a shared artifact-aware merge path.
- Resolve expected cross-work-unit overlap with bounded agent repair before
failing the run.
- Run final global semantic gates before any changes reach the main project
worktree.
- Keep connector variance minimal and source-shaped, not pipeline-shaped.
- Avoid proposal manifests, typed candidates, and extra reporting entities for
the first implementation.
- Preserve deterministic projections for source systems with authoritative
structured metadata.
## Non-goals
This design does not change the wiki frontmatter schema, wiki page file layout,
the semantic-layer YAML format, or the raw source snapshot layouts. It does add
a narrow author-facing inline-code grammar for explicit wiki body references to
semantic-layer entities and raw tables, because body text is part of the
stale-reference failure class. It also does not rewrite source adapters'
existing fetch and chunk logic in one large pass.
This design does not introduce public connector knobs such as
`executionMode`, `planningStrategy`, or `conflictPolicy`. The core runner
becomes more opinionated instead.
This design does not require all connectors to stop using candidates. Candidate
storage remains valid for flows that intentionally defer wiki curation. The
isolation model applies when a work unit writes durable project files.
## Locked design direction
The ingestion runner uses one flow for every source that can produce durable
changes.
```text
fetch raw
-> optional deterministic project
-> adapter plans WorkUnit[]
-> isolated WU diffs
-> artifact-aware integration
-> global semantic gates
-> squash
```
The important invariant is that the core runner does not know why a work unit
exists. A dbt adapter may plan by model, Notion may plan by page or cluster,
MetricFlow may plan by graph component, and Looker may plan by dashboard or
explore. Those differences describe the source system. They are not ingestion
execution modes.
## Architecture
The design splits ingestion into two layers with explicit responsibility
boundaries.
### Source adapter layer
The adapter owns source semantics. It fetches raw evidence, normalizes that
evidence into staged files, and plans work units from the staged snapshot and
diff scope.
The adapter may also provide deterministic projectors. A projector is code that
converts authoritative source facts into KTX artifacts without an agent. Good
examples are live database schema introspection and straightforward MetricFlow
semantic-model import.
The isolation-relevant adapter surface remains small:
```ts
interface SourceAdapter {
source: string;
skillNames: string[];
fetch?(pullConfig: unknown, stagedDir: string, ctx: FetchContext): Promise<void>;
chunk(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult>;
project?(ctx: DeterministicProjectionContext): Promise<ProjectionResult>;
resolveSlTargets?(ctx: SlTargetResolutionContext): Promise<string[]>;
}
```
This is the subset the isolated-diff runner needs to understand source-shaped
planning and deterministic projection. It is not a proposal to delete existing
`SourceAdapter` fields. Existing lifecycle and source-support fields such as
`detect`, `readFetchReport`, `listTargetConnectionIds`, `clusterWorkUnits`,
`describeScope`, `onPullSucceeded`, `evidenceIndexing`, `triageSupported`,
`getTriageSignals`, and `reconcileSkillNames` stay part of the adapter contract
until a separate cleanup intentionally removes them with migration impact
called out.
`chunk()` returns ordinary `WorkUnit[]`. The runner does not need a
`planningStrategy` enum because the source adapter can plan by any domain shape
that makes sense.
### Ingestion execution layer
The runner owns correctness, isolation, and integration. After `WorkUnit[]`
exists, all connectors follow the same execution path.
The runner is responsible for:
- creating the ingestion integration worktree from the project base commit;
- committing deterministic projection in the integration worktree before child
worktree creation;
- creating one child worktree per work unit from the post-projection ingestion
base commit;
- scoping tools to the work unit's raw files and allowed target connections;
- running the agent loop inside the work unit worktree;
- validating touched artifacts before accepting the work unit diff;
- collecting the work unit git diff;
- applying accepted diffs into the integration worktree;
- resolving textual and artifact-level conflicts;
- running final global gates; and
- squashing the integration worktree back to the project main worktree.
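The runner-owned sequence can be sketched as one async function over injected ports. This is a minimal TypeScript sketch, not the KTX runner: `RunnerPorts`, `runIsolatedIngest`, and `RunOutcome` are hypothetical names, execution is serial (the stated default), and textual-conflict resolution and gate repair are assumed to live inside the `integrate` and `finalGatesPass` ports.

```typescript
// Hypothetical shapes; the real KTX runner types are not specified here.
interface WorkUnit {
  unitKey: string;
}

interface RunnerPorts {
  // Stages raw data, commits deterministic projection, returns the ingestion base commit.
  stageRawAndProject(): Promise<string>;
  // Runs one work unit in its own child worktree; returns its patch, or null if it failed.
  runWorkUnit(unit: WorkUnit, baseCommit: string): Promise<string | null>;
  // Applies accepted patches in order, including bounded textual-conflict resolution.
  integrate(patches: string[]): Promise<void>;
  // Serial reconciliation pass inside the integration worktree.
  reconcile(): Promise<void>;
  // Final global gates, including any bounded gate repair.
  finalGatesPass(): Promise<boolean>;
  // Squash-merges the integration worktree into the project main worktree.
  squash(): Promise<void>;
}

type RunOutcome = { status: "squashed" } | { status: "failed"; phase: "final_gates" };

async function runIsolatedIngest(units: WorkUnit[], ports: RunnerPorts): Promise<RunOutcome> {
  const baseCommit = await ports.stageRawAndProject();
  const patches: string[] = [];
  for (const unit of units) {
    // Every unit starts from the same base commit, never from a sibling's edits.
    const patch = await ports.runWorkUnit(unit, baseCommit);
    if (patch !== null) patches.push(patch);
  }
  await ports.integrate(patches);
  await ports.reconcile();
  if (!(await ports.finalGatesPass())) {
    // The integration worktree stays on disk for inspection; nothing is squashed.
    return { status: "failed", phase: "final_gates" };
  }
  await ports.squash();
  return { status: "squashed" };
}
```

Failed work units simply contribute no patch, so a final gate failure stops the run before `squash` is ever called.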
## Worktree model
The design uses three levels of git state.
```text
project main worktree
  ingest integration worktree
    per-work-unit worktree(s)
```
The project main worktree is the durable KTX project state. The ingestion
integration worktree stages raw snapshots, deterministic projections, accepted
work-unit diffs, reconciliation changes, and final gate repairs before one
squash merge back to main.
Deterministic projection runs first in the integration worktree, after the raw
snapshot is staged and before any per-work-unit worktree is created. The runner
commits those projector changes as a single projection commit. The integration
worktree's post-projection HEAD is the ingestion base commit referenced in this
design. If the adapter has no projector, the raw-snapshot commit is the
ingestion base commit.
Each per-work-unit worktree starts from the same ingestion base commit. A work
unit never observes another concurrent work unit's transient edits. This makes
the work unit diff a clean proposal against a stable base. Work units observe
deterministic projection outputs, including through `dependencyPaths` context,
and do not re-derive authoritative projected facts.
The integration worktree and each per-work-unit worktree must share one Git
object database, so every one of them must be created through
`git worktree add` from the same repository. This is required so
`git apply --3way` can resolve the base blobs recorded in each work-unit patch
during integration.
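A minimal sketch of how the runner might build the Git commands for child worktrees. `git worktree add` registers a linked worktree that shares the repository's object database, which is what lets `git apply --3way` find base blobs later. The helper names and the `work-units/NNNN` directory layout are hypothetical; the sketch only builds argv arrays and leaves process execution to the caller.

```typescript
// Hypothetical helpers: they build argv arrays for `git`, they do not run anything.
function childWorktreeAddArgs(worktreePath: string, baseCommit: string): string[] {
  // --detach: the child checks out the base commit on a detached HEAD, no branch needed.
  return ["worktree", "add", "--detach", worktreePath, baseCommit];
}

function childWorktreeRemoveArgs(worktreePath: string): string[] {
  // --force: remove the worktree even if a failed agent left uncommitted edits.
  return ["worktree", "remove", "--force", worktreePath];
}

function childWorktreePath(runDir: string, workUnitIndex: number): string {
  // Hypothetical layout outside the integration worktree; zero-padded to sort by index.
  return `${runDir}/work-units/${String(workUnitIndex).padStart(4, "0")}`;
}
```

Keeping child worktrees outside the integration worktree's directory avoids them appearing there as untracked files.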
The runner creates and runs child worktrees under the existing
`workUnitMaxConcurrency` setting. A run may have many planned work units, but no
more than that bound may be active or left on disk at once. The default remains
serial execution. Child worktrees must be cleaned up after the diff, transcript,
and outcome metadata are persisted, including failure paths. Adapters with
large fan-out, such as Notion, may use `clusterWorkUnits` before execution to
keep work-unit count tractable, but clustering remains source-shaped planning
rather than a separate execution mode.
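The concurrency bound and the cleanup-on-failure rule can be sketched as a small worker pool. This is an illustrative sketch assuming `task` runs one work unit in its child worktree and `cleanup` removes that worktree after the outcome is persisted; `runBounded` and `Settled` are hypothetical names. Because each worker cleans up before taking the next item, at most `limit` child worktrees exist at any moment.

```typescript
// Hypothetical result type: success with a value, or failure with the thrown error.
type Settled<R> = { ok: true; value: R } | { ok: false; error: unknown };

async function runBounded<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
  cleanup: (item: T, index: number) => Promise<void>,
): Promise<Settled<R>[]> {
  const results: Settled<R>[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++; // safe: JavaScript runs this synchronously
      const item = items[index];
      try {
        results[index] = { ok: true, value: await task(item, index) };
      } catch (error) {
        results[index] = { ok: false, error };
      } finally {
        // Cleanup runs on success and failure before this worker takes more work.
        await cleanup(item, index);
      }
    }
  }
  // With limit = 1 this degenerates to serial execution, the stated default.
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```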
## Work-unit lifecycle
Each work unit follows a fixed lifecycle.
1. Create a child worktree at the ingestion base commit.
2. Build a scoped tool session for the child worktree.
3. Run the source skill and agent loop.
4. Run work-unit-local gates against touched artifacts.
5. If gates pass, record the pinned patch (`git diff --binary --no-renames`)
from base to child HEAD.
6. If gates fail, mark the work unit failed and discard its changes.
7. Clean up the child worktree after the diff and transcript are persisted.
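The lifecycle steps map naturally onto a `try`/`finally` block, so persistence and cleanup run on success and failure alike. In this sketch `LifecycleOps`, `WorkUnitOutcome`, and `runWorkUnit` are hypothetical names, and the outcome is trimmed to a few illustrative fields.

```typescript
// Hypothetical per-step operations; comments map them to the numbered steps.
interface LifecycleOps {
  createWorktree(baseCommit: string): Promise<string>; // step 1, returns the worktree path
  runAgent(worktree: string): Promise<void>; // steps 2-3
  localGateErrors(worktree: string): Promise<string[]>; // step 4
  collectPatch(worktree: string, baseCommit: string): Promise<string>; // step 5
  persist(outcome: WorkUnitOutcome): Promise<void>;
  removeWorktree(worktree: string): Promise<void>; // step 7
}

interface WorkUnitOutcome {
  unitKey: string;
  status: "success" | "failed";
  patch: string | null;
  failureReason: string | null;
}

async function runWorkUnit(unitKey: string, baseCommit: string, ops: LifecycleOps): Promise<WorkUnitOutcome> {
  const worktree = await ops.createWorktree(baseCommit);
  // Default used if the agent or patch collection throws.
  let outcome: WorkUnitOutcome = { unitKey, status: "failed", patch: null, failureReason: "work unit did not complete" };
  try {
    await ops.runAgent(worktree);
    const errors = await ops.localGateErrors(worktree);
    outcome =
      errors.length === 0
        ? { unitKey, status: "success", patch: await ops.collectPatch(worktree, baseCommit), failureReason: null }
        : { unitKey, status: "failed", patch: null, failureReason: errors.join("; ") }; // step 6
    return outcome;
  } finally {
    // Persist first, then remove the worktree, on every path.
    await ops.persist(outcome);
    await ops.removeWorktree(worktree);
  }
}
```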
The work unit outcome stores the existing operational metadata KTX already
records: unit key, status, actions, touched semantic-layer sources, failure
reason, raw files, and transcript path. It does not add a proposal manifest.
The diff is the proposal.
For `slDisallowed` work units, isolation is defense in depth. The scoped
work-unit tools must withhold semantic-layer write and edit tools, and the
integration layer must reject any otherwise accepted diff from that work unit
that touches `semantic-layer/**`. This catches buggy or bypassed tool behavior
before an invalid LookML connection-mismatch write can reach the integration
worktree.
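The integration-side `slDisallowed` check only needs the paths a patch touches. A minimal sketch, assuming unquoted paths in the `diff --git a/<path> b/<path>` headers (Git quotes paths with unusual characters, which this sketch does not handle); `touchedPaths` and `slDisallowedViolations` are hypothetical names.

```typescript
// Collects both sides of every `diff --git a/<old> b/<new>` header line.
function touchedPaths(patch: string): string[] {
  const paths = new Set<string>();
  for (const line of patch.split("\n")) {
    const match = /^diff --git a\/(\S+) b\/(\S+)$/.exec(line);
    if (match) {
      paths.add(match[1]);
      paths.add(match[2]);
    }
  }
  return Array.from(paths);
}

// Defense in depth: any path under semantic-layer/ rejects the slDisallowed diff.
function slDisallowedViolations(patch: string): string[] {
  return touchedPaths(patch).filter((path) => path === "semantic-layer" || path.startsWith("semantic-layer/"));
}
```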
### Diff proposal contract
The proposal artifact is a Git patch with binary-safe content, not the existing
hash-based raw-source `DiffSet`.
The first implementation must use one pinned patch contract:
- collect `git diff --binary --no-renames <base>..HEAD`;
- disable rename and copy detection so renames are represented as delete plus
create in version one;
- preserve mode changes from the patch metadata, but reject unexpected
executable-mode or binary changes under known text artifact roots such as
`wiki/**` and `semantic-layer/**`;
- apply each accepted patch to the integration worktree with
`git apply --3way --index`;
- do not use `git apply --reject`, because partial hunk application is not an
accepted integration state; and
- if patch application fails, leaves conflicts, or touches a path disallowed for
that work unit, roll back the integration worktree to its pre-apply HEAD and
classify the outcome as a textual conflict.
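The pinned contract can be expressed as fixed argv arrays plus a header scan for disallowed changes. A sketch under the same unquoted-path assumption; the function names and the `TEXT_ROOTS` list are hypothetical, and `git reset --hard` is one possible way to return to the pre-apply HEAD, not a documented KTX choice.

```typescript
// Hypothetical argv builders; the caller runs them in the appropriate worktree.
function collectPatchArgs(baseCommit: string): string[] {
  // Binary-safe patch; --no-renames turns renames into delete plus create.
  return ["diff", "--binary", "--no-renames", `${baseCommit}..HEAD`];
}

function applyPatchArgs(patchFile: string): string[] {
  // Deliberately no --reject: partial hunk application is never an accepted state.
  return ["apply", "--3way", "--index", patchFile];
}

function rollbackArgs(preApplyHead: string): string[] {
  // Restores the index and tracked files to the pre-apply commit.
  return ["reset", "--hard", preApplyHead];
}

// Known text artifact roots where executable-mode or binary changes are rejected.
const TEXT_ROOTS = ["wiki/", "semantic-layer/"];

function textRootPolicyErrors(patch: string): string[] {
  const errors: string[] = [];
  let current: string | null = null;
  for (const line of patch.split("\n")) {
    const header = /^diff --git a\/\S+ b\/(\S+)$/.exec(line);
    if (header) {
      current = header[1];
      continue;
    }
    if (current === null) continue;
    const path = current;
    if (!TEXT_ROOTS.some((root) => path.startsWith(root))) continue;
    if (/^(new file mode|new mode) 100755$/.test(line)) errors.push(`${path}: executable mode`);
    if (line === "GIT binary patch" || line.startsWith("Binary files ")) errors.push(`${path}: binary change`);
  }
  return errors;
}
```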
Delete-versus-edit, recreate-versus-edit, and delete-versus-create races are
therefore textual conflicts when Git cannot apply the patch cleanly. If Git
applies the patch but known artifact validators reject the resulting tree, the
outcome is a semantic conflict.
## Integration lifecycle
The integration worktree applies accepted work-unit diffs after local gates
pass. The runner applies diffs in a deterministic order, using the original
work-unit index unless a future implementation introduces explicit dependency
ordering.
Integration has three conflict classes:
- Clean patch application: the diff applies without conflict.
- Textual conflict: git cannot apply the patch cleanly.
- Semantic conflict: the patch applies textually but creates an invalid or
inconsistent artifact.
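Deterministic ordering and the three-way classification are small enough to state directly. A sketch with hypothetical names (`AcceptedPatch`, `integrationOrder`, `classifyIntegration`), where `gitApplied` is false whenever `git apply --3way --index` failed, left conflicts, or was rolled back.

```typescript
type IntegrationClass = "clean" | "textual_conflict" | "semantic_conflict";

interface AcceptedPatch {
  workUnitIndex: number; // original planning index, not completion order
  unitKey: string;
}

// Apply order depends only on the plan, so reruns integrate in the same sequence.
function integrationOrder(patches: readonly AcceptedPatch[]): AcceptedPatch[] {
  return patches.slice().sort((a, b) => a.workUnitIndex - b.workUnitIndex);
}

// Textual: git could not apply cleanly. Semantic: git applied, validators rejected the tree.
function classifyIntegration(gitApplied: boolean, validatorErrors: readonly string[]): IntegrationClass {
  if (!gitApplied) return "textual_conflict";
  return validatorErrors.length > 0 ? "semantic_conflict" : "clean";
}
```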
Textual conflicts are resolved before semantic gates run when a bounded
resolver agent can produce a valid result. Overlapping work-unit writes are
normal, especially for Metabase cards that target the same semantic-layer marts
from different collections. The runner must treat overlap as an integration
case, not as a reason to fail immediately.
Version one is agent-first. If `git apply --3way --index` leaves conflicts,
the runner starts a resolver agent in the integration worktree. The resolver
receives only the failed patch, already-applied patches, conflicted files,
relevant work-unit transcripts, raw evidence paths, and the final-gate rules.
The resolver must preserve all non-conflicting accepted content, resolve
duplicate or competing artifact entries from evidence, and edit only files
touched by the failed patch or already-applied overlapping patches.
The runner then reruns artifact gates for the changed files and continues with
the remaining patches if validation passes. Resolver attempts are capped to
avoid an unbounded repair loop. A run fails only after the bounded resolver
attempts cannot produce a valid integration tree.
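The resolver cap can be sketched as a loop over a fixed attempt budget that feeds each attempt the previous gate errors. `resolveWithinBudget` and `ResolverAttemptResult` are hypothetical; the design does not fix the attempt count, so `maxAttempts` is left to the caller.

```typescript
interface ResolverAttemptResult {
  gateErrors: string[]; // empty when the repaired files pass artifact gates
}

async function resolveWithinBudget(
  maxAttempts: number,
  attempt: (attemptNumber: number, previousErrors: readonly string[]) => Promise<ResolverAttemptResult>,
): Promise<{ resolved: boolean; attempts: number; lastErrors: string[] }> {
  let lastErrors: string[] = [];
  for (let n = 1; n <= maxAttempts; n++) {
    // Each attempt sees the gate errors left by the previous one.
    const result = await attempt(n, lastErrors);
    lastErrors = result.gateErrors;
    if (lastErrors.length === 0) return { resolved: true, attempts: n, lastErrors };
  }
  // Budget exhausted: the run fails and the integration worktree is preserved.
  return { resolved: false, attempts: maxAttempts, lastErrors };
}
```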
Deterministic semantic merge is a later optimization, not a version-one
requirement. After measuring resolver latency, cost, and failure modes, KTX can
add merge helpers for common semantic-layer YAML cases, such as additive
`measures`, `segments`, `columns`, `joins`, and `descriptions` updates keyed by
their stable logical identifiers. Those helpers can replace agent calls for
mechanical merges once the measured v1 behavior justifies the added complexity.
The integration worktree is preserved on failure with conflict markers or
resolver edits, work-unit patches, transcripts, trace events, and the failure
report. The runner never squashes a failed or partially repaired integration
tree back to the project main worktree.
### Gate repair stage
The gate repair stage handles cases where patches apply cleanly but the
combined tree fails final semantic or wiki gates. This is distinct from textual
conflict resolution: the tree is textually valid, but the artifacts violate KTX
contracts.
After each patch integration and after reconciliation, the runner runs final
artifact gates for the affected scope. If gates fail, the runner classifies the
errors before deciding whether to repair or fail.
Repairable gate errors include:
- stale wiki body references to renamed semantic-layer entities;
- invalid `sl_refs` entries that point to entities instead of sources;
- inline prose that accidentally uses explicit SL reference syntax;
- duplicate measures, segments, or joins with equivalent definitions;
- missing or stale wiki references created by accepted patches; and
- join or source references that can be corrected from the composed manifest
and work-unit evidence.
High-risk gate errors fail without automatic repair unless a later
implementation adds a stronger evidence contract:
- two work units define the same measure with different business meaning;
- a required warehouse table or column does not exist;
- a SQL source fails execution and no obvious localized rewrite exists; or
- the repair would require choosing between conflicting facts without evidence.
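One way to encode the repairable versus high-risk split is a code-based decision function. The error codes are hypothetical labels for the bullets above, not KTX's actual gate taxonomy, and the rule that a single high-risk error fails a mixed set is an assumption of this sketch.

```typescript
// Hypothetical codes mirroring the two lists above.
type GateErrorCode =
  | "stale_wiki_body_ref"
  | "sl_ref_points_to_entity"
  | "accidental_sl_ref_syntax"
  | "equivalent_duplicate_definition"
  | "stale_wiki_ref"
  | "correctable_join_or_source_ref"
  | "conflicting_measure_meaning"
  | "missing_warehouse_object"
  | "sql_execution_failure"
  | "unevidenced_fact_choice";

const REPAIRABLE = new Set<GateErrorCode>([
  "stale_wiki_body_ref",
  "sl_ref_points_to_entity",
  "accidental_sl_ref_syntax",
  "equivalent_duplicate_definition",
  "stale_wiki_ref",
  "correctable_join_or_source_ref",
]);

// Repair only when every error is repairable; any high-risk error fails the run.
function gateDecision(codes: readonly GateErrorCode[]): "pass" | "repair" | "fail" {
  if (codes.length === 0) return "pass";
  return codes.every((code) => REPAIRABLE.has(code)) ? "repair" : "fail";
}
```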
For repairable errors, the runner starts a gate repair agent with the exact
gate errors, changed files, relevant work-unit transcripts, raw evidence paths,
and final-gate rules. The agent may edit only the files involved in the gate
failure. The runner reruns gates after each repair attempt and caps attempts to
one or two passes per integration stage. If the tree still fails, the run stops
with the final gate report and preserved integration worktree.
### Reconciliation in the new flow
Reconciliation remains a shared runner stage, but it runs as a serial
integration-stage pass instead of a parallel work unit.
The runner applies all accepted work-unit diffs to the integration worktree,
resolves textual conflicts that can be resolved, and then runs reconciliation in
that integration worktree before final global gates and before squash.
Reconciliation must see the integrated state because its job is to resolve
cross-work-unit duplicates, evictions, fallbacks, and source-specific
reconcile guidance.
Reconciliation runs exactly once per integration pass. It never runs inside a
child worktree and never overlaps with work-unit execution. This is the safety
carve-out from the isolation goal: concurrent agent writes are the failure mode
being avoided, and reconciliation is non-concurrent by construction.
Reconciliation is not allowed to mutate project main directly. Its changes are
captured as a reconciliation diff against the pre-reconciliation integration
HEAD and recorded in the existing stage/report metadata. Reconciliation gates
validate the artifacts touched by the reconciliation diff plus any wiki page or
semantic-layer source referenced by changed frontmatter or body references,
using the same artifact-class validators as work-unit gates. Reconciliation may
write only to target connections authorized by the adapter for the ingest run,
but it is not subject to any single work unit's `slDisallowed` scope. The final
global gates validate the combined tree after reconciliation. If reconciliation
introduces an invalid wiki or semantic-layer reference, touches an unauthorized
target, or records an unresolvable artifact conflict, the runner sends
repairable failures through the gate repair stage. It stops before squash when
a failure is not repairable or when bounded repair cannot produce a valid tree.
## Artifact-aware integration
KTX durable artifacts carry structure that a line-based git merge does not
understand, so git-only merge is not a strong correctness boundary.
Artifact-aware integration must parse and validate known file classes after
diffs are applied.
The first implementation must cover these worktree file classes:
- semantic-layer source YAML;
- wiki markdown frontmatter; and
- wiki body references to semantic-layer sources, measures, dimensions, and raw
warehouse tables.
Unmapped fallback records are not worktree files in version one. They remain
typed stage-index and report records emitted by `emit_unmapped_fallback`; the
integration layer validates their raw paths and structured reason codes as
report metadata, not as mergeable artifacts.
Provenance also stays out of the worktree in version one. The source of truth is
the ingest provenance store and report body. Before inserting provenance rows,
the global gate derives the planned rows from accepted work-unit actions,
reconciliation actions, artifact-resolution records, and skipped raw files, then
checks those rows against the integrated worktree and staged raw hashes. Moving
provenance to on-disk files would be a separate schema migration, not part of
this design.
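The pre-insert provenance check can be sketched as a pure function over planned rows, the integrated tree's paths, and the staged raw hashes. `PlannedProvenanceRow` and `provenanceGateErrors` are hypothetical, and rows derived from skipped raw files are modeled here with a `null` artifact path.

```typescript
// Hypothetical planned row shape, derived before any SQLite insert.
interface PlannedProvenanceRow {
  artifactPath: string | null; // null for skipped raw files
  rawPath: string; // staged raw file the row cites
  rawHash: string; // hash the row expects for that raw file
}

function provenanceGateErrors(
  rows: readonly PlannedProvenanceRow[],
  worktreePaths: ReadonlySet<string>,
  stagedRawHashes: ReadonlyMap<string, string>,
): string[] {
  const errors: string[] = [];
  for (const row of rows) {
    // The artifact must exist in the integrated tree, not just in some work unit's diff.
    if (row.artifactPath !== null && !worktreePaths.has(row.artifactPath)) {
      errors.push(`missing artifact ${row.artifactPath}`);
    }
    const staged = stagedRawHashes.get(row.rawPath);
    if (staged === undefined) errors.push(`unknown raw path ${row.rawPath}`);
    else if (staged !== row.rawHash) errors.push(`stale raw hash for ${row.rawPath}`);
  }
  return errors;
}
```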
Artifact-resolution records are the existing merged or subsumed reconciliation
outputs emitted through `emit_artifact_resolution` as
`ArtifactResolutionRecord` stage-index records. They are in-memory stage
records, not worktree files, and they feed the provenance gate.
Artifact-aware integration starts with validation plus bounded agent repair.
It does not need semantic-layer YAML merge helpers in version one. If two diffs
contest the same source YAML or wiki page and bounded agent repair cannot prove
correctness, the runner must stop rather than silently accepting stale
references. Deterministic semantic merge helpers can be added after v1 metrics
show which conflicts are frequent, mechanical, and worth optimizing.
## Global semantic gates
Final gates run after every accepted diff, deterministic projection, and
reconciliation change has landed in the integration worktree. These gates are
global because this failure class can emerge only after independently valid
diffs combine.
The final gates must include:
- semantic-layer validation for touched and dependency sources;
- wiki `wiki_refs` validation;
- wiki frontmatter `sl_refs` validation, including source-level and
measure-level references;
- wiki body validation for explicit semantic-layer source, measure, dimension,
and table references; and
- provenance validation for raw paths referenced by new or changed artifacts
before those rows are inserted into SQLite.
For semantic-layer validation, touched sources are sources changed by accepted
work-unit diffs, deterministic projection, or reconciliation. Dependency sources
are their direct declared-join neighbors in the composed semantic-layer graph,
including sources they join to and sources that join to them. Version one runs
the existing whole-connection structural checks and source-scoped checks with
the touched-and-dependency source set; it does not expand dependency scope to a
transitive SQL-projection closure.
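The one-hop dependency rule can be sketched as follows, assuming a hypothetical `joins` map from each source to the sources it declares joins to. `gateSourceSet` adds both outgoing and incoming neighbors of touched sources and deliberately stops there.

```typescript
// joins: source name -> sources it declares joins to (hypothetical graph shape).
function gateSourceSet(touched: Iterable<string>, joins: ReadonlyMap<string, readonly string[]>): Set<string> {
  const touchedSet = new Set(touched);
  const result = new Set(touchedSet);
  joins.forEach((targets, source) => {
    for (const target of targets) {
      // Outgoing neighbor: a touched source joins to `target`.
      if (touchedSet.has(source)) result.add(target);
      // Incoming neighbor: `source` joins to a touched source.
      if (touchedSet.has(target)) result.add(source);
    }
  });
  // One hop only: neighbors of neighbors are never added.
  return result;
}
```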
The wiki body gate needs a narrow grammar so ordinary prose does not become a
semantic-layer reference. In version one, an explicit body reference is one of
these Markdown forms outside fenced code blocks:
- an inline code token in the form `source.entity`, where both parts are plain
identifier tokens, `source` must match a visible semantic-layer source, and
`entity` must match one of that source's measures, dimensions, or segments;
- an inline code token in the form `connectionId/source.entity`, where
`source.entity` follows the same plain-identifier rule and validates against
that specific target connection;
- an inline code token in the form `source:source_name`, which validates a
source-level semantic-layer reference; or
- an inline code token in the form `table:qualified_table_name`, which validates
a raw warehouse table reference against the visible raw table/catalog sources.
The parser ignores unformatted prose, fenced SQL examples, wildcard patterns
such as `mart_nrr_quarterly.*_arr_cents`, inline SQL predicates such as
`users.is_internal = false`, and unprefixed single-token inline code. Two-part
inline code that does not name a visible semantic-layer source is not treated
as an SL entity reference; use the `table:` prefix for raw warehouse table
references.
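A minimal sketch of this grammar, assuming the caller supplies `isVisibleSource` from the composed semantic-layer manifest. `parseBodyRefs`, `BodyRef`, and the regular expressions are hypothetical; the connection-id character set is an assumption, fence detection is a simple toggle, and only single-backtick code spans are recognized.

```typescript
type BodyRef =
  | { kind: "entity"; connectionId: string | null; source: string; entity: string }
  | { kind: "source"; source: string }
  | { kind: "table"; table: string };

const IDENT = "[A-Za-z_][A-Za-z0-9_]*";
const CONNECTION = "[A-Za-z0-9_-]+"; // assumed connection-id charset
const ENTITY_RE = new RegExp(`^(?:(${CONNECTION})/)?(${IDENT})\\.(${IDENT})$`);
const SOURCE_RE = new RegExp(`^source:(${IDENT})$`);
const TABLE_RE = /^table:([A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*)$/;
const FENCE_RE = /^\s{0,3}(`{3,}|~{3,})/;

function parseBodyRefs(
  markdown: string,
  isVisibleSource: (source: string, connectionId: string | null) => boolean,
): BodyRef[] {
  const refs: BodyRef[] = [];
  let inFence = false;
  for (const line of markdown.split("\n")) {
    if (FENCE_RE.test(line)) {
      inFence = !inFence; // simplification: ignores fence length and character
      continue;
    }
    if (inFence) continue;
    const codeSpan = /`([^`]+)`/g;
    let match: RegExpExecArray | null;
    while ((match = codeSpan.exec(line)) !== null) {
      const token = match[1];
      const source = SOURCE_RE.exec(token);
      const table = TABLE_RE.exec(token);
      const entity = ENTITY_RE.exec(token);
      if (source) refs.push({ kind: "source", source: source[1] });
      else if (table) refs.push({ kind: "table", table: table[1] });
      else if (entity) {
        const connectionId = entity[1] ?? null;
        // Two-part code that does not name a visible SL source is not an SL reference.
        if (isVisibleSource(entity[2], connectionId)) {
          refs.push({ kind: "entity", connectionId, source: entity[2], entity: entity[3] });
        }
      }
      // Wildcards, SQL predicates, and single tokens match no pattern and are ignored.
    }
  }
  return refs;
}
```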
The `total_contract_arr_cents` incident is the regression case for this gate:
the integrated tree must fail if a wiki page references
`mart_account_segments.total_contract_arr_cents` as an inline-code body token
while the final semantic-layer source defines only `total_contract_arr`.
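Given parsed entity references and the final source definitions, the incident check reduces to set membership. A sketch with a hypothetical `SlCatalog` shape and `staleEntityRefs` helper; it reports the `total_contract_arr_cents` reference when the source defines only `total_contract_arr`.

```typescript
// Hypothetical catalog: source name -> entity names it defines in the final tree.
type SlCatalog = ReadonlyMap<string, { measures: string[]; dimensions: string[]; segments: string[] }>;

interface EntityRef {
  source: string;
  entity: string;
}

function staleEntityRefs(refs: readonly EntityRef[], catalog: SlCatalog): string[] {
  const errors: string[] = [];
  for (const ref of refs) {
    const def = catalog.get(ref.source);
    if (!def) continue; // not a visible source, so the grammar does not treat it as an SL reference
    const known = def.measures.concat(def.dimensions, def.segments);
    if (known.indexOf(ref.entity) < 0) {
      errors.push(`${ref.source}.${ref.entity} is not defined by ${ref.source}`);
    }
  }
  return errors;
}
```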
## Deterministic projection
Some connectors have authoritative structured inputs that do not need an LLM to
write KTX artifacts. Those connectors can provide deterministic projectors that
run in the integration worktree.
Projection is different from work-unit execution:
- projectors are code, not agents;
- projectors run against the integration worktree;
- projectors produce ordinary durable file changes; and
- projector outputs still pass final global gates.
The runner infers hybrid behavior from the adapter. If an adapter has both
projectors and work units, it is hybrid. If it has only projectors, it is
deterministic. If it has only work units, it uses isolated diffs. No public
`executionMode` knob is needed.
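The inference can be sketched as a pure function of what the adapter provides and what `chunk()` planned. `inferFlow` and `AdapterCapabilities` are hypothetical names, and the `nothing_to_do` case for an adapter with neither projector nor work units is an addition of this sketch.

```typescript
interface AdapterCapabilities {
  project?: unknown; // deterministic projector, if the adapter provides one
}

type InferredFlow = "hybrid" | "deterministic" | "isolated_diffs" | "nothing_to_do";

// Derived from the adapter's shape and the planned work units, not from a config knob.
function inferFlow(adapter: AdapterCapabilities, plannedWorkUnits: number): InferredFlow {
  const hasProjector = typeof adapter.project === "function";
  const hasWorkUnits = plannedWorkUnits > 0;
  if (hasProjector && hasWorkUnits) return "hybrid";
  if (hasProjector) return "deterministic";
  if (hasWorkUnits) return "isolated_diffs";
  return "nothing_to_do";
}
```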
## Connector migration notes
Each connector keeps its source-shaped planning logic. The migration changes
where durable writes happen and how they are integrated.
### Metabase
Metabase must move first because it produced the observed stale-measure wiki
reference. Collection and card chunking can remain adapter-specific, but direct
wiki and semantic-layer writes must happen in per-work-unit worktrees.
The regression test must reproduce two work units that touch
`mart_account_segments`: one writes a wiki reference to an inferred measure and
another leaves the final source with a different measure name. The final global
gate must reject the integrated tree.
### dbt
dbt uses source-shaped planning by model or schema file. Deterministic
projection is appropriate for straightforward model, source, column, and
description facts when dbt artifacts are authoritative. Agent work units remain
useful for business wiki synthesis, ambiguous relationship interpretation, and
enrichment that is not directly represented in dbt YAML.
### MetricFlow
MetricFlow uses source-shaped planning by graph component. Existing
deterministic semantic-model import code becomes a projector in the ingestion
flow. Agent work units handle unsupported constructs, cross-model explanations,
and wiki synthesis.
### Looker
Looker already defers some dashboard and look knowledge through candidates.
That can continue. Any direct semantic-layer writes from explores or query
translation must run through isolated work-unit diffs.
Looker-specific API and file-adapter collisions remain connector domain logic,
but final correctness still belongs to the shared integration gates.
### LookML
LookML already has useful source-shaped ownership rules: models, views, orphan
views, dashboards, and connection-mismatch guards. Those rules stay in the
adapter. Direct semantic-layer writes move into isolated work-unit diffs.
Connection-mismatch work units can keep their existing write restrictions. The
runner enforces those restrictions through scoped tools and target connection
resolution.
### Notion
Notion pages and clusters can create overlapping durable wiki knowledge and can
write semantic-layer overlays after warehouse verification. Notion therefore
uses the same isolated-diff execution model for direct durable writes.
Large Notion workspaces still need source-shaped clustering to control context
size and cost. Clustering remains adapter logic; correctness comes from isolated
diffs and final global gates.
## Minimal connector variance
New connectors must not choose from a menu of ingestion architectures. They
must provide the small amount of source-specific behavior the shared runner
needs.
Every connector answers these questions:
- How does KTX fetch or receive raw evidence?
- How does KTX normalize that evidence into staged files?
- How does KTX split the staged evidence into `WorkUnit[]`?
- Are any source facts authoritative enough for deterministic projection?
- Which target semantic-layer connections can the connector write to?
Everything else is shared runner behavior.
## Regression tests
The implementation plan must start with narrow tests that prove the new
execution model prevents the known failure class.
The first test creates a fake or Metabase-like adapter with two work units
starting from the same base:
1. Work unit A writes a wiki page that references
`mart_account_segments.total_contract_arr_cents` as an inline-code body
token.
2. Work unit B writes or overwrites the final semantic-layer source with only
`total_contract_arr`.
3. Both work units pass their local gates in isolation.
4. Integration applies both diffs.
5. The final global gate fails the run before squash.
Additional tests cover:
- two work units editing different wiki pages without conflict;
- two work units editing the same semantic-layer overlay with additive changes,
where the resolver agent preserves both changes and gates the repaired file;
- two work units editing the same semantic-layer overlay with incompatible
definitions, where the resolver agent receives the conflict context and the
run fails only after bounded repair attempts cannot prove a result;
- a textual conflict in a wiki page where the resolver agent preserves
non-conflicting accepted content and gates the repaired page before squash;
- a cleanly merged tree that fails final gates, where the gate repair agent
fixes a stale wiki reference and the run continues;
- an unrepairable final-gate failure, such as a missing warehouse column, where
the runner stops with a preserved integration worktree and report;
- a hybrid adapter case where deterministic projector outputs are visible in a
child worktree before work-unit wiki synthesis, and the final global gate
catches any stale reference to a non-existent projected semantic-layer entity;
- Notion-style direct wiki writes with invalid `sl_refs`; and
- LookML-style `slDisallowed` work units where write tools are unavailable and
integration rejects any diff that still touches `semantic-layer/**`.
## Rollout
The rollout must be incremental because the current runner is shared by all
adapters.
The rollout switch is runner-owned. During migration it may be a private
per-source allowlist, or an internal `IngestSettingsPort` map keyed by
`sourceKey`, but it must not become a `SourceAdapter` field or public connector
configuration knob.
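A runner-private switch of this kind might look like the following sketch. `IsolatedDiffRollout` and `usesIsolatedDiffs` are hypothetical names, not the `IngestSettingsPort` API; a per-`sourceKey` entry overrides the default, and promoting the new path to the default amounts to flipping `defaultEnabled`.

```typescript
// Hypothetical internal settings; deliberately not a SourceAdapter field or public config.
interface IsolatedDiffRollout {
  defaultEnabled: boolean;
  perSource: ReadonlyMap<string, boolean>; // keyed by sourceKey
}

function usesIsolatedDiffs(sourceKey: string, rollout: IsolatedDiffRollout): boolean {
  // An explicit per-source entry wins; otherwise fall back to the runner default.
  return rollout.perSource.get(sourceKey) ?? rollout.defaultEnabled;
}
```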
1. Add the per-work-unit worktree executor behind that internal runner setting.
2. Add diff collection and deterministic integration in the existing runner.
3. Add bounded resolver-agent handling for textual conflicts.
4. Add final global wiki and semantic-layer reference gates, including the wiki
body reference parser defined above.
5. Add bounded gate-repair-agent handling for repairable final-gate failures.
6. Instrument resolver latency, attempts, repaired files, and failure classes.
7. Migrate Metabase to the new execution path first.
8. Migrate Notion, LookML, Looker, dbt, and MetricFlow.
9. Add deterministic semantic merge helpers only after v1 metrics show which
agent repairs are frequent and mechanical enough to justify optimization.
10. Promote the new path to the default after the Metabase regression test
passes and at least one non-Metabase connector passes on the new path.
11. Remove the old shared-worktree work-unit execution path.
The rollout is complete when every connector that permits agent-authored durable
writes uses isolated diffs and all integrations pass the same final global
gates.

@ -635,6 +635,117 @@ describe('runKtxIngest', () => {
expect(io.stderr()).not.toContain('Metabase ingest: prod-metabase');
});
it('emits structured child ingest progress during Metabase fan-out', async () => {
const projectDir = join(tempDir, 'project');
await writeMetabaseConfig(projectDir);
const io = makeIo();
const progressEvents: Array<{ percent: number; message: string; transient?: boolean }> = [];
await expect(
runKtxIngest(
{
command: 'run',
projectDir,
connectionId: 'prod-metabase',
adapter: 'metabase',
outputMode: 'json',
},
io.io,
{
progress: (event) => progressEvents.push(event),
runLocalMetabaseIngest: async (input) => {
input.progress?.onMetabaseFanoutPlanned?.({
metabaseConnectionId: 'prod-metabase',
children: [{ metabaseDatabaseId: 1, targetConnectionId: 'warehouse_a' }],
});
input.progress?.onMetabaseChildStarted?.({
metabaseConnectionId: 'prod-metabase',
metabaseDatabaseId: 1,
targetConnectionId: 'warehouse_a',
jobId: 'metabase-child-1',
});
input.memoryFlow?.update({
plannedWorkUnits: [
{
unitKey: 'metabase-col-6',
rawFiles: ['cards/40.json'],
peerFileCount: 0,
dependencyCount: 0,
},
],
});
input.memoryFlow?.emit({ type: 'chunks_planned', chunkCount: 1, workUnitCount: 1, evictionCount: 0 });
input.memoryFlow?.emit({
type: 'work_unit_started',
unitKey: 'metabase-col-6',
skills: ['sl_capture'],
stepBudget: 40,
});
input.memoryFlow?.emit({
type: 'work_unit_step',
unitKey: 'metabase-col-6',
stepIndex: 7,
stepBudget: 40,
});
input.memoryFlow?.emit({
type: 'stage_progress',
stage: 'integration',
percent: 81,
message: 'Resolving text conflict for metabase-col-6',
});
input.memoryFlow?.emit({ type: 'work_unit_finished', unitKey: 'metabase-col-6', status: 'success' });
input.memoryFlow?.update({
plannedWorkUnits: [
{
unitKey: 'metabase-col-7',
rawFiles: ['cards/48.json'],
peerFileCount: 0,
dependencyCount: 0,
},
],
});
input.memoryFlow?.emit({ type: 'chunks_planned', chunkCount: 1, workUnitCount: 1, evictionCount: 0 });
input.memoryFlow?.emit({
type: 'work_unit_started',
unitKey: 'metabase-col-7',
skills: ['sl_capture'],
stepBudget: 40,
});
input.progress?.onMetabaseChildCompleted?.({
metabaseConnectionId: 'prod-metabase',
metabaseDatabaseId: 1,
targetConnectionId: 'warehouse_a',
jobId: 'metabase-child-1',
status: 'done',
});
return {
metabaseConnectionId: 'prod-metabase',
status: 'all_succeeded',
totals: { workUnits: 1, failedWorkUnits: 0 },
children: [],
};
},
},
),
).resolves.toBe(0);
expect(progressEvents).toEqual(
expect.arrayContaining([
{ percent: 45, message: 'Planned 1 task' },
{ percent: 55, message: 'Processing 1/1 tasks: metabase-col-6' },
{
percent: 60,
message: 'Processing tasks: 0/1 complete, 1 active; latest metabase-col-6 step 7/40',
transient: true,
},
{ percent: 81, message: 'Resolving text conflict for metabase-col-6' },
{ percent: 81, message: 'Processing 1/1 tasks: metabase-col-7' },
]),
);
expect(io.stdout()).toContain('"status": "all_succeeded"');
expect(io.stderr()).not.toContain('Metabase ingest: prod-metabase');
});
it('runs Metabase scheduled ingest through the public CLI command path with real fan-out', async () => {
const projectDir = join(tempDir, 'metabase-cli-project');
await writeWarehouseConfig(projectDir);
@ -985,6 +1096,59 @@ describe('runKtxIngest', () => {
expect(io.stdout()).toContain('Status: error\n');
});
it('prints trace path and error status for stored failed ingest reports', async () => {
const projectDir = join(tempDir, 'project');
await writeWarehouseConfig(projectDir);
const io = makeIo();
const report = {
id: 'report-failed',
runId: 'run-failed',
jobId: 'job-failed',
connectionId: 'warehouse',
sourceKey: 'metabase',
createdAt: '2026-05-17T12:00:00.000Z',
body: {
status: 'failed',
syncId: 'sync-failed',
diffSummary: { added: 1, modified: 0, deleted: 0, unchanged: 0 },
commitSha: null,
tracePath: '/project/.ktx/ingest-traces/job-failed/trace.jsonl',
failure: { phase: 'final_gates', message: 'final artifact gates failed' },
workUnits: [],
failedWorkUnits: [],
reconciliationSkipped: true,
conflictsResolved: [],
evictionsApplied: [],
unmappedFallbacks: [],
evictionInputs: [],
unresolvedCards: [],
supersededBy: null,
overrideOf: null,
provenanceRows: [],
toolTranscripts: [],
},
};
await runKtxIngest(
{
command: 'status',
projectDir,
reportFile: '/project/report-failed.json',
runId: 'run-failed',
outputMode: 'plain',
inputMode: 'disabled',
},
io.io,
{
readReportFile: vi.fn().mockResolvedValue(report),
},
);
expect(io.stdout()).toContain('Trace: /project/.ktx/ingest-traces/job-failed/trace.jsonl');
expect(io.stdout()).toContain('Status: error');
expect(io.stdout()).toContain('Error: final artifact gates failed');
});
it('prints a clear first failure reason when query-history work units fail', async () => {
const projectDir = join(tempDir, 'project');
await writeWarehouseConfig(projectDir);

@ -102,7 +102,7 @@ export interface KtxIngestDeps {
}
function reportStatus(report: IngestReportSnapshot): 'done' | 'error' {
-return report.body.failedWorkUnits.length > 0 ? 'error' : 'done';
return report.body.status === 'failed' || report.body.failedWorkUnits.length > 0 ? 'error' : 'done';
}
const REPORT_SOURCE_LABELS = new Map<string, string>([
@ -174,6 +174,9 @@ function formatFailureReason(sourceKey: string, reason: string): string {
}
function failedReportMessage(report: IngestReportSnapshot): string | null {
if (report.body.status === 'failed' && report.body.failure?.message) {
return sanitizeMemoryFlowError(report.body.failure.message);
}
const failedCount = report.body.failedWorkUnits.length;
if (failedCount === 0) {
return null;
@ -195,6 +198,9 @@ function writeReportStatus(report: IngestReportSnapshot, io: KtxIngestIo): void
io.stdout.write(`Report: ${report.id}\n`);
io.stdout.write(`Run: ${report.runId}\n`);
io.stdout.write(`Job: ${report.jobId}\n`);
if (report.body.tracePath) {
io.stdout.write(`Trace: ${report.body.tracePath}\n`);
}
io.stdout.write(`Status: ${reportStatus(report)}\n`);
io.stdout.write(`Source: ${reportSourceLabel(report.sourceKey)}\n`);
io.stdout.write(`Connection: ${report.connectionId}\n`);
@ -289,7 +295,11 @@ function formatDiffProgress(event: Extract<MemoryFlowEvent, { type: 'diff_comput
}
function workUnitEventsThrough(snapshot: MemoryFlowReplayInput, eventIndex: number): MemoryFlowEvent[] {
-return snapshot.events.slice(0, eventIndex + 1);
const latestPlanIndex = snapshot.events
.slice(0, eventIndex + 1)
.findLastIndex((event) => event.type === 'chunks_planned');
const startIndex = latestPlanIndex >= 0 ? latestPlanIndex + 1 : 0;
return snapshot.events.slice(startIndex, eventIndex + 1);
}
function completedWorkUnitCountThrough(snapshot: MemoryFlowReplayInput, eventIndex: number): number {
@ -313,7 +323,8 @@ function plannedWorkUnitCountThrough(snapshot: MemoryFlowReplayInput, eventIndex
if (snapshot.plannedWorkUnits.length > 0) {
return snapshot.plannedWorkUnits.length;
}
-const planEvent = workUnitEventsThrough(snapshot, eventIndex)
const planEvent = snapshot.events
.slice(0, eventIndex + 1)
.filter((event) => event.type === 'chunks_planned')
.at(-1);
return planEvent?.workUnitCount ?? completedWorkUnitCountThrough(snapshot, eventIndex);
@ -359,6 +370,12 @@ function plainIngestEventProgress(
};
case 'stage_skipped':
return { percent: 45, message: `Skipped ${event.stage}: ${event.reason}` };
case 'stage_progress':
return {
percent: event.percent,
message: event.message,
...(event.transient !== undefined ? { transient: event.transient } : {}),
};
case 'work_unit_started': {
const total = plannedWorkUnitCountThrough(snapshot, eventIndex);
const ordinal = workUnitOrdinalThrough(snapshot, eventIndex, event.unitKey);
@ -705,6 +722,25 @@ export async function runKtxIngest(
}
if (args.adapter === 'metabase') {
const executeMetabaseFanout = deps.runLocalMetabaseIngest ?? runLocalMetabaseIngest;
const runOutputMode = effectiveIngestOutputMode(args.outputMode, io, env, {
requireInput: (args.inputMode ?? 'auto') === 'auto',
});
const plainProgress = shouldWritePlainIngestProgress(runOutputMode, io, env)
? createPlainIngestProgressRenderer(args, io)
: null;
const structuredProgress = deps.progress
? createPlainIngestProgressObserver(args, deps.progress)
: null;
const initialMemoryFlow =
plainProgress || structuredProgress ? initialRunMemoryFlowInput(args, 'pending') : undefined;
const memoryFlow = initialMemoryFlow
? createMemoryFlowLiveBuffer(initialMemoryFlow, {
onChange: (snapshot) => {
plainProgress?.update(snapshot);
structuredProgress?.update(snapshot);
},
})
: undefined;
const progress =
args.outputMode === 'json' && !deps.progress
? undefined
@ -715,20 +751,29 @@ export async function runKtxIngest(
: io,
deps.progress,
);
-const result = await executeMetabaseFanout({
-project: ingestProject,
-adapters: createAdapters(ingestProject, adapterOptions),
-metabaseConnectionId: args.connectionId,
-...localIngestOptions,
-queryExecutor,
-trigger: 'manual_resync',
-jobIdFactory: deps.jobIdFactory,
-...(progress ? { progress } : {}),
-});
-if (args.outputMode === 'json') {
-io.stdout.write(`${JSON.stringify(result, null, 2)}\n`);
-} else {
-writeMetabaseFanoutStatus(result, io);
plainProgress?.start();
structuredProgress?.start();
let result: LocalMetabaseFanoutResult;
try {
result = await executeMetabaseFanout({
project: ingestProject,
adapters: createAdapters(ingestProject, adapterOptions),
metabaseConnectionId: args.connectionId,
...localIngestOptions,
queryExecutor,
trigger: 'manual_resync',
jobIdFactory: deps.jobIdFactory,
...(memoryFlow ? { memoryFlow } : {}),
...(progress ? { progress } : {}),
});
plainProgress?.flush();
if (args.outputMode === 'json') {
io.stdout.write(`${JSON.stringify(result, null, 2)}\n`);
} else {
writeMetabaseFanoutStatus(result, io);
}
} finally {
plainProgress?.flush();
}
return result.status === 'all_succeeded' ? 0 : 1;
}
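
Two of the helpers changed in this file can be sketched in isolation. `reportStatus` now treats a run-level failure (for example one raised at the final artifact gates) as an error even when no individual work unit failed, and `workUnitEventsThrough` only counts events after the most recent `chunks_planned` event. The latter appears intended to let a later planning pass, such as the text-conflict repair round visible in the test progress messages, restart its "N/M tasks" counters. The types below are simplified assumptions, and the `work_unit_completed` event name is hypothetical.

```typescript
// Simplified shapes (assumptions): the real IngestReportSnapshot and
// MemoryFlowEvent types carry many more fields than these sketches.
interface ReportBodySketch {
  status: string;
  failedWorkUnits: string[];
}

type FlowEventSketch =
  | { type: 'chunks_planned'; workUnitCount: number }
  | { type: 'work_unit_started'; unitKey: string }
  | { type: 'work_unit_completed'; unitKey: string }; // hypothetical event name

// A stored report counts as an error when the run as a whole failed
// or when any single work unit failed.
function reportStatusSketch(body: ReportBodySketch): 'done' | 'error' {
  return body.status === 'failed' || body.failedWorkUnits.length > 0 ? 'error' : 'done';
}

// Keep only events after the most recent `chunks_planned` event, up to and
// including `eventIndex`. With no plan event, the whole prefix is kept.
function workUnitEventsThroughSketch(events: FlowEventSketch[], eventIndex: number): FlowEventSketch[] {
  const visible = events.slice(0, eventIndex + 1);
  let latestPlanIndex = -1;
  for (let i = visible.length - 1; i >= 0; i--) {
    if (visible[i].type === 'chunks_planned') {
      latestPlanIndex = i;
      break;
    }
  }
  return visible.slice(latestPlanIndex + 1);
}
```

Scanning backwards with a plain loop gives the same result as `findLastIndex` without requiring an ES2023 `lib` setting.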

@ -1,5 +1,12 @@
<role>
-You are processing ONE WorkUnit of a multi-file ingest bundle. The WorkUnit gives you a slice of raw source files (LookML views, dbt/MetricFlow YAMLs, Metabase card JSONs, Notion pages, or similar) and you must translate that slice into KTX semantic-layer sources and/or knowledge wiki pages, in one pass. Prior WorkUnits in this same job may have already written SL sources and wiki pages; their writes are visible on the working branch and discoverable with `discover_data`.
You are processing ONE WorkUnit of a multi-file ingest bundle. The WorkUnit
gives you a slice of raw source files (LookML views, dbt/MetricFlow YAMLs,
Metabase card JSONs, Notion pages, or similar) and you must translate that
slice into KTX semantic-layer sources and/or knowledge wiki pages, in one pass.
You run in an isolated WorkUnit worktree. Deterministic projection output,
existing project memory, and listed dependency paths are visible; sibling
WorkUnit edits from this same job are not visible until the runner integrates
accepted patches.
</role>
<stance>
@ -8,9 +15,19 @@ Assertive. The bundle was explicitly submitted for ingest. Default to capturing
<workflow>
1. Read this WorkUnit's section at the end of the user prompt. It lists your `rawFiles`, any unchanged `dependencyPaths` you may need to resolve references, the `peerFileIndex` (paths only; you CANNOT read them), the source's `skillNames`, and any `priorProvenance` rows telling you what earlier syncs produced from these files.
-2. Load the per-source review skill first (e.g. `lookml_ingest`, `metricflow_ingest`, `dbt_ingest`), then `sl_capture` and `wiki_capture`, and `ingest_triage` last. The triage skill tells you how to react when `discover_data` reveals that a prior WU already wrote something overlapping.
2. Load the per-source review skill first (for example `lookml_ingest`,
`metricflow_ingest`, or `dbt_ingest`), then `sl_capture` and
`wiki_capture`, and `ingest_triage` last. The triage skill tells you how to
react when existing project memory, deterministic projection output, or
prior provenance overlaps with what this WorkUnit is about to write.
3. If the system prompt includes `<canonical_pins>`, read those pins before choosing artifact keys. A pin's `canonicalArtifactKey` is the preferred artifact for its `contestedKey`: prefer editing the pinned canonical artifact when it already exists or when this raw file clearly updates it. Do not create a duplicate contested artifact when a pin says another artifact is canonical; use a specific disambiguated key only when the raw file describes a genuinely different domain.
-4. For each raw file: call `read_raw_file` (or `read_raw_span` for slicing large files) to load content. Before writing a new SL source or wiki page, call `discover_data` for each candidate source, table, metric, or topic name to find prior-WU writes, existing wiki pages, SL sources, and raw warehouse matches; apply `ingest_triage` when you hit one, and apply any matching canonical pin before deciding whether to edit, rename, or skip.
4. For each raw file: call `read_raw_file` (or `read_raw_span` for slicing large
files) to load content. Before writing a new SL source or wiki page, call
`discover_data` for each candidate source, table, metric, or topic name to
find existing wiki pages, SL sources, deterministic projection output, prior
sync artifacts, and raw warehouse matches; apply `ingest_triage` when you hit
one, and apply any matching canonical pin before deciding whether to edit,
rename, or skip.
5. For every `wiki_write`, `wiki_remove`, `sl_write_source`, or `sl_edit_source` call, include `rawPaths` with only the raw file paths that directly support that action. If one artifact synthesizes several files, list each contributing raw file. Do not include unrelated files from the same WorkUnit.
6. When `priorProvenance` names an existing artifact for one of your raw files, prefer `sl_edit` over `sl_write` for that artifact: the re-ingest change rule says expression-only changes replace silently, grain/column/filter changes replace and flag.
7. When a raw file cannot map to normal SL and you use a fallback path, call `emit_unmapped_fallback` exactly once for that raw file and reason. Use `fallback: "sql_standalone"` for a standalone SQL source, `fallback: "wiki_only"` for documentation-only capture, and `fallback: "flagged"` when no reliable artifact can be written.
@ -28,5 +45,7 @@ Wiki keys must be flat slugs like `paid-order-lifecycle`, not directory paths li
- Do not invent physical column names or grain keys. For table-backed SL sources, every `columns:`, `grain:`, `joins:`, `segments:`, and `measures[].expr` column must come from raw-file column declarations or warehouse-backed discovery (`discover_data`, `sl_discover`, `entity_details`). If column names are not confirmed, capture the business context in wiki instead of writing a full SL source.
- Do not write context-source overlays into the context source connection just because that is the current WorkUnit connection. Use `sl_discover` across data sources and write the SL artifact to the warehouse/data-source connection that owns the matching manifest. If there is no confirmed target connection, use `emit_unmapped_fallback` and wiki capture.
- Do not duplicate an artifact that prior provenance says you already produced; update it.
-- Do not silently accept a name collision with a prior WU's write when the formula differs. Trigger `ingest_triage`.
- Do not silently accept a name collision with visible existing memory,
deterministic projection output, or prior provenance when the formula differs.
Trigger `ingest_triage`.
</do_not>

@ -7,8 +7,11 @@ callers: [memory_agent]
# Ingest Triage - conflict classification and resolution
This skill is loaded in two contexts:
-- By a Stage 3 WorkUnit agent when `sl_discover` reveals that a prior WU (or a prior sync) already wrote something that overlaps with what the current WU is about to write.
-- By the Stage 4 reconciliation agent for cross-WU sweeps and for eviction decisions.
- By a Stage 3 WorkUnit agent when `sl_discover`, deterministic projection
output, existing project memory, or prior provenance overlaps with what the
current WorkUnit is about to write.
- By the Stage 4 reconciliation agent for cross-WorkUnit sweeps, accepted patch
overlap, and eviction decisions.
Apply the rules below before every write that could collide with an existing artifact.
@ -23,7 +26,8 @@ Apply the rules below before every write that could collide with an existing art
3. **If the difference is structural - grain, columns, filter, join shape - is the current bundle the re-ingest of a previously-ingested bundle (i.e. `priorProvenance` has a row for this raw file and artifact)?**
Re-ingest change (semantic break): replace + flag. Record in the IngestReport's `conflicts_resolved` list with `flagged_for_human: true`.
-4. **If there's no prior-sync row (both are from THIS job), check for same-ingest contradictions:**
4. **If reconciliation sees accepted patches from this same job with no
prior-sync row, check for same-ingest contradictions:**
| Kind | Detection | Resolution |
|---|---|---|
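
The overlap rules visible in this hunk can be sketched as a small classifier. Rules 1 and 2 and the rows of the contradiction table fall outside this excerpt, so the sketch encodes only what is shown: on a re-ingest (a `priorProvenance` row exists), expression-only changes replace silently and structural changes replace and are flagged for a human; overlaps with accepted patches from the same job go to the same-ingest contradiction check. The input fields and decision names are hypothetical, not the skill's actual schema.

```typescript
// Hypothetical summary of how an incoming artifact differs from an existing one.
type ChangeKind = 'none' | 'expression_only' | 'structural';

interface TriageInput {
  change: ChangeKind;
  hasPriorProvenanceRow: boolean; // priorProvenance has a row for this raw file and artifact
  overlapsSameJobAcceptedPatch: boolean; // seen by the Stage 4 reconciliation agent
}

type TriageDecision =
  | { action: 'no_conflict' }
  | { action: 'replace'; flaggedForHuman: boolean }
  | { action: 'check_same_ingest_contradictions' }
  | { action: 'defer_to_other_rules' };

function triageOverlap(input: TriageInput): TriageDecision {
  if (input.change === 'none') {
    return { action: 'no_conflict' };
  }
  if (input.hasPriorProvenanceRow) {
    // Re-ingest change: grain/column/filter/join changes are semantic breaks
    // and get recorded in conflicts_resolved with flagged_for_human: true.
    return { action: 'replace', flaggedForHuman: input.change === 'structural' };
  }
  if (input.overlapsSameJobAcceptedPatch) {
    return { action: 'check_same_ingest_contradictions' };
  }
  // Cases handled by rules not shown in this excerpt.
  return { action: 'defer_to_other_rules' };
}
```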

@ -0,0 +1,45 @@
import { mkdir, mkdtemp, readFile, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it } from 'vitest';
import { GitService } from './git.service.js';
async function makeGit() {
const homeDir = await mkdtemp(join(tmpdir(), 'ktx-git-patch-'));
const configDir = join(homeDir, 'config');
const git = new GitService({
storage: { configDir, homeDir },
git: {
userName: 'System User',
userEmail: 'system@example.com',
bootstrapMessage: 'init',
bootstrapAuthor: 'system',
bootstrapAuthorEmail: 'system@example.com',
},
});
await git.onModuleInit();
return { homeDir, configDir, git };
}
describe('GitService patch helpers', () => {
it('collects binary-safe no-rename patches and applies them with --3way --index', async () => {
const { homeDir, configDir, git } = await makeGit();
await mkdir(join(configDir, 'wiki/global'), { recursive: true });
await writeFile(join(configDir, 'wiki/global/page.md'), 'old\n');
await git.commitFiles(['wiki/global/page.md'], 'add page', 'System User', 'system@example.com');
const base = await git.revParseHead();
await writeFile(join(configDir, 'wiki/global/page.md'), 'new\n');
await git.commitFiles(['wiki/global/page.md'], 'edit page', 'System User', 'system@example.com');
const patchPath = join(homeDir, 'proposal.patch');
await git.writeBinaryNoRenamePatch(base, 'HEAD', patchPath);
const targetDir = join(homeDir, 'target');
await git.addWorktree(targetDir, 'target', base);
const targetGit = git.forWorktree(targetDir);
await targetGit.applyPatchFile3WayIndex(patchPath);
await targetGit.commitStaged('apply proposal', 'System User', 'system@example.com');
await expect(readFile(join(targetDir, 'wiki/global/page.md'), 'utf-8')).resolves.toBe('new\n');
});
});
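
The test above round-trips a patch produced by `git diff --binary --no-renames` through `git apply --3way --index`. A minimal sketch of the same argument vectors, with a hypothetical `runGit` wrapper around Node's `execFile` (not part of GitService): `--binary` implies `--full-index`, so the patch records the full blob ids that `git apply --3way` needs for its three-way fallback; `--no-renames` writes a rename as a plain delete plus add; `--index` applies to both the index and the working tree so the result can be committed directly.

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Argument vector for producing a binary-safe patch without rename detection.
function diffPatchArgs(from: string, to: string): string[] {
  return ['diff', '--binary', '--no-renames', `${from}..${to}`];
}

// Argument vector for applying a patch to index and working tree,
// falling back to a three-way merge when the context does not match.
function applyPatchArgs(patchPath: string): string[] {
  return ['apply', '--3way', '--index', patchPath];
}

// Hypothetical runner: executes git in `cwd` and returns stdout.
async function runGit(cwd: string, args: string[]): Promise<string> {
  const { stdout } = await execFileAsync('git', args, { cwd, maxBuffer: 64 * 1024 * 1024 });
  return stdout;
}
```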

@ -1,5 +1,5 @@
import { promises as fs } from 'node:fs';
-import { join } from 'node:path';
import { dirname, join } from 'node:path';
import type { SimpleGit } from 'simple-git';
import { noopLogger, resolveConfigDir, type KtxCoreConfig, type KtxLogger } from './config.js';
import { createSimpleGit } from './git-env.js';
@ -747,6 +747,55 @@ export class GitService {
}
}
async writeBinaryNoRenamePatch(from: string, to: string, patchPath: string): Promise<void> {
await this.withMutationQueue(async () => {
const patch = await this.git.raw(['diff', '--binary', '--no-renames', `${from}..${to}`]);
await fs.mkdir(dirname(patchPath), { recursive: true });
await fs.writeFile(patchPath, patch, 'utf-8');
});
}
async applyPatchFile3WayIndex(patchPath: string): Promise<void> {
await this.withMutationQueue(async () => {
await this.git.raw(['apply', '--3way', '--index', patchPath]);
});
}
async commitStaged(commitMessage: string, author: string, authorEmail: string): Promise<GitCommitInfo> {
return this.withMutationQueue(async () => {
const stagedChanges = await this.git.diff(['--cached', '--name-only']);
if (!stagedChanges.trim()) {
const head = (await this.git.revparse(['HEAD'])).trim();
const log = await this.git.log({ maxCount: 1 });
const latest = log.latest;
return {
commitHash: head,
shortHash: head.substring(0, 8),
message: latest?.message ?? '',
author: latest?.author_name ?? '',
authorEmail: latest?.author_email ?? '',
timestamp: latest?.date ?? new Date(0).toISOString(),
committedDate: latest?.date ? new Date(latest.date).toISOString() : new Date(0).toISOString(),
created: false,
};
}
await this.git.commit(commitMessage, { '--author': `${author} <${authorEmail}>` });
const head = (await this.git.revparse(['HEAD'])).trim();
const log = await this.git.log({ maxCount: 1 });
const latest = log.latest;
return {
commitHash: head,
shortHash: head.substring(0, 8),
message: latest?.message ?? commitMessage,
author: latest?.author_name ?? author,
authorEmail: latest?.author_email ?? authorEmail,
timestamp: latest?.date ?? new Date().toISOString(),
committedDate: latest?.date ? new Date(latest.date).toISOString() : new Date().toISOString(),
created: true,
};
});
}
private async fileExists(path: string): Promise<boolean> {
try {
await fs.access(path);
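
`commitStaged` treats an empty index as a no-op: when `git diff --cached --name-only` prints nothing, it returns HEAD's metadata with `created: false` instead of creating an empty commit. A reduced sketch of that branch over a hypothetical `StagedGit` interface (the real method calls simple-git's `diff`, `revparse`, `log`, and `commit`, and returns more fields):

```typescript
// Hypothetical minimal interface over the git calls commitStaged makes.
interface StagedGit {
  stagedNames(): Promise<string>; // output of `git diff --cached --name-only`
  head(): Promise<string>; // output of `git rev-parse HEAD`
  commit(message: string, author: string): Promise<void>;
}

interface CommitResultSketch {
  commitHash: string;
  created: boolean;
}

async function commitStagedSketch(
  git: StagedGit,
  message: string,
  author: string,
  authorEmail: string,
): Promise<CommitResultSketch> {
  const staged = await git.stagedNames();
  if (!staged.trim()) {
    // Nothing staged (for example a patch that applied cleanly but changed
    // nothing): report the current HEAD without committing.
    return { commitHash: (await git.head()).trim(), created: false };
  }
  await git.commit(message, `${author} <${authorEmail}>`);
  return { commitHash: (await git.head()).trim(), created: true };
}
```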

@ -138,6 +138,52 @@ describe('fetchMetabaseBundle', () => {
expect(warn).not.toHaveBeenCalled();
});
it('emits memory-flow progress while fetching Metabase cards', async () => {
const events: unknown[] = [];
await fetchMetabaseBundle({
pullConfig: { metabaseConnectionId, metabaseDatabaseId: 42 },
stagedDir,
ctx: {
...makeFetchContext(),
memoryFlow: {
emit: (event) => events.push(event),
update: vi.fn(),
finish: vi.fn(),
snapshot: vi.fn(),
},
},
clientFactory,
sourceStateReader,
});
expect(events).toEqual(
expect.arrayContaining([
expect.objectContaining({
type: 'stage_progress',
stage: 'source',
message: 'Fetching Metabase database 42 metadata',
}),
expect.objectContaining({
type: 'stage_progress',
stage: 'source',
message: 'Fetching 1 Metabase card for database 42',
}),
expect.objectContaining({
type: 'stage_progress',
stage: 'source',
message: 'Checked 1/1 Metabase cards for database 42; wrote 1',
transient: true,
}),
expect.objectContaining({
type: 'stage_progress',
stage: 'source',
message: 'Fetched Metabase database 42: 1 cards, 0 unresolved',
}),
]),
);
});
it('routes Metabase fetch warnings through the injected logger', async () => {
const logger = {
log: vi.fn(),

@ -83,6 +83,15 @@ function resolvePath(index: Map<number | 'root', CollectionNode>, collectionId:
export async function fetchMetabaseBundle(params: FetchMetabaseBundleParams): Promise<void> {
const pullConfig: MetabasePullConfig = parseMetabasePullConfig(params.pullConfig);
const logger = params.logger ?? noopMetabaseFetchLogger;
const emitFetchProgress = (percent: number, message: string, transient = false): void => {
params.ctx.memoryFlow?.emit({
type: 'stage_progress',
stage: 'source',
percent,
message,
...(transient ? { transient } : {}),
});
};
const syncState = await params.sourceStateReader.getSourceState(pullConfig.metabaseConnectionId);
const mapping = syncState.mappings.find(
(m) => m.metabaseDatabaseId === pullConfig.metabaseDatabaseId && m.syncEnabled,
@ -100,6 +109,7 @@ export async function fetchMetabaseBundle(params: FetchMetabaseBundleParams): Pr
const client = await params.clientFactory.createClient(pullConfig, params.ctx);
try {
emitFetchProgress(26, `Fetching Metabase database ${pullConfig.metabaseDatabaseId} metadata`);
let mappingDatabaseName = mapping.metabaseDatabaseName;
let mappingEngine = mapping.metabaseEngine;
if (mappingDatabaseName === null) {
@ -133,6 +143,12 @@ export async function fetchMetabaseBundle(params: FetchMetabaseBundleParams): Pr
await mkdir(join(params.stagedDir, STAGED_FILES.databasesDir), { recursive: true });
const cardIdsToFetch = await resolveCardIdsToFetch(client, scope, pullConfig.metabaseDatabaseId, logger);
emitFetchProgress(
28,
`Fetching ${cardIdsToFetch.length} Metabase card${cardIdsToFetch.length === 1 ? '' : 's'} for database ${
pullConfig.metabaseDatabaseId
}`,
);
const referencedCollectionIds = new Set<number>();
let writtenCards = 0;
@ -212,7 +228,19 @@ export async function fetchMetabaseBundle(params: FetchMetabaseBundleParams): Pr
}
}
}
const knownTotal = Math.max(cardIdsToFetch.length, fetched.size + queue.length);
if (fetched.size === 1 || fetched.size % 10 === 0 || queue.length === 0) {
emitFetchProgress(
30,
`Checked ${fetched.size}/${knownTotal} Metabase cards for database ${pullConfig.metabaseDatabaseId}; wrote ${writtenCards}`,
true,
);
}
}
emitFetchProgress(
32,
`Fetched Metabase database ${pullConfig.metabaseDatabaseId}: ${writtenCards} cards, ${unresolvedCards.length} unresolved`,
);
for (const colId of referencedCollectionIds) {
const node = collectionIndex.get(colId);
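
The fetch loop throttles its per-card `stage_progress` events: it emits on the first card, every tenth card, and when the queue drains, and marks those events `transient` so a renderer can overwrite them. A sketch of the event builder and throttle condition with a simplified event type (an assumption; the real memory-flow event union is not shown here):

```typescript
// Simplified event shape mirroring the fields emitted in this diff.
interface StageProgressEvent {
  type: 'stage_progress';
  stage: 'source';
  percent: number;
  message: string;
  transient?: boolean;
}

// `transient` is only attached when true, so ordinary events keep their shape.
function makeProgressEvent(percent: number, message: string, transient = false): StageProgressEvent {
  return { type: 'stage_progress', stage: 'source', percent, message, ...(transient ? { transient } : {}) };
}

// Same condition as the fetch loop: first card, every tenth card, or queue drained.
function shouldEmitCheckedProgress(fetchedCount: number, queueLength: number): boolean {
  return fetchedCount === 1 || fetchedCount % 10 === 0 || queueLength === 0;
}
```

For 25 cards with no newly discovered dependencies (so the queue shrinks by one per card), progress is emitted after cards 1, 10, 20, and 25.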

@ -1,10 +1,12 @@
import { mkdir, mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
-import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import { makeLocalGitRepo } from '../../../test/make-local-git-repo.js';
import type { SourceAdapter } from '../../types.js';
import type { MetricFlowParseResult } from './deep-parse.js';
import { MetricflowSourceAdapter } from './metricflow.adapter.js';
import { readMetricflowProjectionConfig, writeMetricflowProjectionConfig } from './projection-config.js';
function compileOnlyRequiredDepsCheck(): void {
// @ts-expect-error MetricflowSourceAdapter requires an explicit cache home.
@ -22,6 +24,25 @@ async function makeRepo(tmpRoot: string, files: Record<string, string>) {
return makeLocalGitRepo(fixtureDir, join(tmpRoot, 'origin'));
}
function metricflowParseResult(): MetricFlowParseResult {
return {
semanticModels: [
{
name: 'orders',
description: 'Orders',
modelRef: 'orders',
dimensions: [{ name: 'status', column: 'status', type: 'string', label: 'Status' }],
measures: [{ type: 'simple', name: 'order_count', column: 'id', aggregation: 'count' }],
entities: [{ name: 'customer', type: 'foreign', expr: 'customer_id' }],
defaultTimeDimension: null,
},
],
crossModelMetrics: [],
relationships: [],
warnings: ['parser warning'],
};
}
describe('MetricflowSourceAdapter', () => {
let tmpRoot: string;
let stagedDir: string;
@ -127,4 +148,119 @@ describe('MetricflowSourceAdapter', () => {
await expect(readFile(join(stagedDir, 'models/orders.yml'), 'utf-8')).resolves.toContain('semantic_models');
expect(await adapter.detect(stagedDir)).toBe(true);
});
it('persists parsed target tables for deterministic projection during fetch', async () => {
const repo = await makeRepo(tmpRoot, {
'dbt_project.yml': 'name: analytics\n',
'models/orders.yml': 'semantic_models:\n - name: orders\n model: ref("orders")\n',
});
await adapter.fetch?.(
{
repoUrl: repo.repoUrl,
branch: 'main',
path: null,
authToken: null,
parsedTargetTables: {
orders: {
ok: true,
catalog: null,
schema: 'analytics',
name: 'orders',
canonicalTable: 'analytics.orders',
},
},
},
stagedDir,
{ connectionId: 'warehouse-1', sourceKey: 'metricflow' },
);
await expect(readMetricflowProjectionConfig(stagedDir)).resolves.toMatchObject({
parsedTargetTables: {
orders: {
ok: true,
schema: 'analytics',
name: 'orders',
},
},
});
});
it('projects parsed MetricFlow semantic models in the integration worktree', async () => {
await writeMetricflowProjectionConfig(stagedDir, {
parsedTargetTables: {
orders: {
ok: true,
catalog: null,
schema: 'analytics',
name: 'orders',
canonicalTable: 'analytics.orders',
},
},
});
const scoped = {
getManifestEntry: vi.fn().mockResolvedValue(null),
isManifestBacked: vi.fn().mockResolvedValue(false),
loadAllSources: vi.fn().mockResolvedValue({ sources: [], loadErrors: [] }),
loadSource: vi.fn().mockResolvedValue(null),
writeSource: vi.fn().mockResolvedValue({ warnings: [] }),
};
const semanticLayerService = {
forWorktree: vi.fn().mockReturnValue(scoped),
getManifestEntry: vi.fn(),
isManifestBacked: vi.fn(),
loadAllSources: vi.fn(),
loadSource: vi.fn(),
writeSource: vi.fn(),
};
const result = await adapter.project?.({
connectionId: 'warehouse-1',
sourceKey: 'metricflow',
syncId: 'sync-1',
jobId: 'job-1',
runId: 'run-1',
stagedDir,
workdir: '/tmp/metricflow-integration',
parseArtifacts: metricflowParseResult(),
semanticLayerService: semanticLayerService as never,
});
expect(semanticLayerService.forWorktree).toHaveBeenCalledWith('/tmp/metricflow-integration');
expect(scoped.writeSource).toHaveBeenCalledWith(
'warehouse-1',
expect.objectContaining({ name: 'orders' }),
'dbt MetricFlow',
expect.any(String),
'dbt MetricFlow sync: create source orders',
{ skipValidation: true },
);
expect(result).toMatchObject({
warnings: ['parser warning'],
errors: [],
touchedSources: [{ connectionId: 'warehouse-1', sourceName: 'orders' }],
changedWikiPageKeys: [],
});
});
it('returns a projection error when parse artifacts are missing', async () => {
const result = await adapter.project?.({
connectionId: 'warehouse-1',
sourceKey: 'metricflow',
syncId: 'sync-1',
jobId: 'job-1',
runId: 'run-1',
stagedDir,
workdir: '/tmp/metricflow-integration',
parseArtifacts: undefined,
semanticLayerService: {} as never,
});
expect(result).toMatchObject({
warnings: [],
errors: ['MetricFlow deterministic projection requires parseArtifacts from chunk()'],
touchedSources: [],
changedWikiPageKeys: [],
});
});
});

@ -1,10 +1,23 @@
import { join } from 'node:path';
-import type { ChunkResult, DiffSet, FetchContext, SourceAdapter } from '../../types.js';
import type {
ChunkResult,
DeterministicProjectionContext,
DiffSet,
FetchContext,
ProjectionResult,
SourceAdapter,
} from '../../types.js';
import { chunkMetricFlowProject } from './chunk.js';
import { detectMetricFlowStagedDir } from './detect.js';
import { parseMetricflowFiles, type MetricFlowParseResult } from './deep-parse.js';
import { fetchMetricflowRepo } from './fetch.js';
import { importMetricflowSemanticModels } from './import-semantic-models.js';
import { parseMetricFlowStagedDir, type ParsedMetricFlowProject } from './parse.js';
import {
metricflowHostTablesFromParsedTargets,
readMetricflowProjectionConfig,
writeMetricflowProjectionConfig,
} from './projection-config.js';
import { parseMetricflowPullConfig } from './pull-config.js';
export interface MetricflowSourceAdapterDeps {
@ -33,6 +46,9 @@ export class MetricflowSourceAdapter implements SourceAdapter {
cacheDir: this.resolveCacheDir(ctx.connectionId),
stagedDir,
});
await writeMetricflowProjectionConfig(stagedDir, {
parsedTargetTables: config.parsedTargetTables,
});
}
async listTargetConnectionIds(_stagedDir: string): Promise<string[]> {
@ -46,6 +62,37 @@ export class MetricflowSourceAdapter implements SourceAdapter {
return { ...chunk, parseArtifacts };
}
async project(ctx: DeterministicProjectionContext): Promise<ProjectionResult> {
if (!isMetricFlowParseResult(ctx.parseArtifacts)) {
return {
warnings: [],
errors: ['MetricFlow deterministic projection requires parseArtifacts from chunk()'],
touchedSources: [],
changedWikiPageKeys: [],
};
}
const projectionConfig = await readMetricflowProjectionConfig(ctx.stagedDir);
const result = await importMetricflowSemanticModels(
{ semanticLayerService: ctx.semanticLayerService },
{
connectionId: ctx.connectionId,
parseResult: ctx.parseArtifacts,
targetSchema: null,
hostTables: metricflowHostTablesFromParsedTargets(projectionConfig.parsedTargetTables),
workdir: ctx.workdir,
},
);
return {
result,
warnings: result.warnings,
errors: result.errors,
touchedSources: result.touchedSources,
changedWikiPageKeys: [],
};
}
private resolveCacheDir(connectionId: string): string {
return join(this.deps.homeDir, 'ingest-metricflow-repos', connectionId);
}
@ -54,3 +101,16 @@ export class MetricflowSourceAdapter implements SourceAdapter {
function parseMetricflowStagedDirForImport(project: ParsedMetricFlowProject): MetricFlowParseResult {
return parseMetricflowFiles(project.files);
}
function isMetricFlowParseResult(value: unknown): value is MetricFlowParseResult {
if (!value || typeof value !== 'object') {
return false;
}
const candidate = value as Partial<MetricFlowParseResult>;
return (
Array.isArray(candidate.semanticModels) &&
Array.isArray(candidate.crossModelMetrics) &&
Array.isArray(candidate.relationships) &&
Array.isArray(candidate.warnings)
);
}
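
Because `parseArtifacts` crosses the adapter boundary as `unknown`, `project()` narrows it with a structural type guard and returns an error result rather than throwing. A sketch of that pattern with simplified, assumed types; note that the guard is shallow: it checks that the four arrays exist, not what they contain.

```typescript
// Simplified stand-ins for MetricFlowParseResult and ProjectionResult.
interface MetricFlowParseResultSketch {
  semanticModels: unknown[];
  crossModelMetrics: unknown[];
  relationships: unknown[];
  warnings: string[];
}

interface ProjectionResultSketch {
  warnings: string[];
  errors: string[];
  touchedSources: { connectionId: string; sourceName: string }[];
  changedWikiPageKeys: string[];
}

const MISSING_ARTIFACTS_ERROR = 'MetricFlow deterministic projection requires parseArtifacts from chunk()';

function isParseResultSketch(value: unknown): value is MetricFlowParseResultSketch {
  if (!value || typeof value !== 'object') {
    return false;
  }
  const candidate = value as Partial<MetricFlowParseResultSketch>;
  return (
    Array.isArray(candidate.semanticModels) &&
    Array.isArray(candidate.crossModelMetrics) &&
    Array.isArray(candidate.relationships) &&
    Array.isArray(candidate.warnings)
  );
}

// Run `project` only when the artifacts look like a parse result;
// otherwise return an empty result that carries the error.
function projectOrReportMissing(
  parseArtifacts: unknown,
  project: (parsed: MetricFlowParseResultSketch) => ProjectionResultSketch,
): ProjectionResultSketch {
  if (!isParseResultSketch(parseArtifacts)) {
    return { warnings: [], errors: [MISSING_ARTIFACTS_ERROR], touchedSources: [], changedWikiPageKeys: [] };
  }
  return project(parseArtifacts);
}
```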

@ -0,0 +1,54 @@
import { mkdir, readFile, writeFile } from 'node:fs/promises';
import { join } from 'node:path';
import { z } from 'zod';
import { parsedTargetTableSchema, type ParsedTargetTable } from '../../parsed-target-table.js';
import type { MetricflowHostTable } from './semantic-models.js';
const METRICFLOW_PROJECTION_CONFIG_FILE = 'sync-config.json';
const metricflowProjectionConfigSchema = z.object({
parsedTargetTables: z.record(z.string(), parsedTargetTableSchema).default({}),
});
export type MetricflowProjectionConfig = z.infer<typeof metricflowProjectionConfigSchema>;
export async function writeMetricflowProjectionConfig(
stagedDir: string,
config: MetricflowProjectionConfig,
): Promise<void> {
const parsed = metricflowProjectionConfigSchema.parse(config);
await mkdir(stagedDir, { recursive: true });
await writeFile(join(stagedDir, METRICFLOW_PROJECTION_CONFIG_FILE), `${JSON.stringify(parsed, null, 2)}\n`, 'utf-8');
}
export async function readMetricflowProjectionConfig(stagedDir: string): Promise<MetricflowProjectionConfig> {
const path = join(stagedDir, METRICFLOW_PROJECTION_CONFIG_FILE);
try {
return metricflowProjectionConfigSchema.parse(JSON.parse(await readFile(path, 'utf-8')));
} catch (error) {
if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
return { parsedTargetTables: {} };
}
throw error;
}
}
export function metricflowHostTablesFromParsedTargets(
parsedTargetTables: Record<string, ParsedTargetTable>,
): MetricflowHostTable[] {
return Object.entries(parsedTargetTables)
.flatMap(([id, table]) =>
table.ok
? [
{
id,
name: table.name,
catalog: table.catalog,
db: table.schema,
columns: [],
},
]
: [],
)
.sort((left, right) => left.id.localeCompare(right.id));
}
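
`metricflowHostTablesFromParsedTargets` drops parsed targets whose resolution failed (`ok: false`), maps `schema` to `db`, leaves `columns` empty, and sorts by id, presumably so the projection input is deterministic across runs. A sketch with simplified types; the failure variant's `reason` field is a placeholder, since `parsedTargetTableSchema` is not shown in this diff.

```typescript
// Simplified stand-in for ParsedTargetTable; `reason` is hypothetical.
type ParsedTargetTableSketch =
  | { ok: true; catalog: string | null; schema: string; name: string; canonicalTable: string }
  | { ok: false; reason: string };

interface HostTableSketch {
  id: string;
  name: string;
  catalog: string | null;
  db: string;
  columns: string[];
}

function hostTablesFromParsedTargetsSketch(parsed: Record<string, ParsedTargetTableSketch>): HostTableSketch[] {
  return Object.entries(parsed)
    .flatMap(([id, table]): HostTableSketch[] =>
      // Failed targets contribute nothing; successful ones become host tables.
      table.ok ? [{ id, name: table.name, catalog: table.catalog, db: table.schema, columns: [] }] : [],
    )
    .sort((left, right) => left.id.localeCompare(right.id));
}
```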

@ -0,0 +1,190 @@
import { describe, expect, it, vi } from 'vitest';
import { validateFinalIngestArtifacts, validateProvenanceRawPaths } from './artifact-gates.js';
function wikiServiceWithPages(
pages: Record<string, { refs?: string[]; content?: string; slRefs?: string[] }>,
) {
return {
listPageKeys: vi.fn().mockResolvedValue(Object.keys(pages)),
readPage: vi.fn().mockImplementation((_scope: string, _scopeId: string | null, pageKey: string) => {
const page = pages[pageKey];
if (!page) {
return Promise.resolve(null);
}
return Promise.resolve({
pageKey,
frontmatter: {
summary: pageKey,
usage_mode: 'auto',
refs: page.refs,
sl_refs: page.slRefs,
},
content: page.content ?? '',
});
}),
};
}
describe('artifact gates', () => {
it('fails the final tree when wiki body references a stale semantic-layer measure', async () => {
const wikiService = wikiServiceWithPages({
'account-segments': {
slRefs: ['mart_account_segments'],
content: 'ARR is `mart_account_segments.total_contract_arr_cents`.',
},
});
const semanticLayerService = {
loadAllSources: vi.fn().mockResolvedValue({
sources: [
{
name: 'mart_account_segments',
grain: ['account_id'],
columns: [{ name: 'account_id', type: 'string' }],
joins: [],
measures: [{ name: 'total_contract_arr', expr: 'sum(contract_arr)' }],
table: 'analytics.mart_account_segments',
},
],
loadErrors: [],
}),
};
await expect(
validateFinalIngestArtifacts({
connectionIds: ['warehouse'],
changedWikiPageKeys: ['account-segments'],
touchedSlSources: [{ connectionId: 'warehouse', sourceName: 'mart_account_segments' }],
wikiService: wikiService as never,
semanticLayerService: semanticLayerService as never,
validateTouchedSources: async () => ({ invalidSources: [], validSources: ['mart_account_segments'] }),
tableExists: async () => true,
}),
).rejects.toThrow(/unknown semantic-layer entity mart_account_segments\.total_contract_arr_cents/);
});
it('fails before provenance insertion when a raw path cannot be tied to the current snapshot or eviction set', () => {
expect(() =>
validateProvenanceRawPaths({
rows: [{ rawPath: 'cards/missing.json' }],
currentRawPaths: new Set(['cards/present.json']),
deletedRawPaths: new Set(['cards/deleted.json']),
}),
).toThrow(/provenance row references raw path outside this snapshot: cards\/missing\.json/);
});
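
The provenance gate above rejects any row whose raw path is neither in the current snapshot nor in the eviction (deleted) set. A minimal sketch of that check, assuming it stops at the first offending row; the function name is hypothetical, and the error text matches the test's expected message.

```typescript
interface ProvenanceRowSketch {
  rawPath: string;
}

function validateProvenanceRawPathsSketch(params: {
  rows: ProvenanceRowSketch[];
  currentRawPaths: Set<string>;
  deletedRawPaths: Set<string>;
}): void {
  for (const row of params.rows) {
    // A row is valid if it points at a file in this snapshot or one being evicted.
    if (!params.currentRawPaths.has(row.rawPath) && !params.deletedRawPaths.has(row.rawPath)) {
      throw new Error(`provenance row references raw path outside this snapshot: ${row.rawPath}`);
    }
  }
}
```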
it('fails measure-level wiki frontmatter sl_refs that point at missing entities', async () => {
const wikiService = wikiServiceWithPages({
'account-segments': {
slRefs: ['mart_account_segments.total_contract_arr_cents'],
content: 'ARR uses a renamed measure.',
},
});
const semanticLayerService = {
loadAllSources: vi.fn().mockResolvedValue({
sources: [
{
name: 'mart_account_segments',
grain: ['account_id'],
columns: [{ name: 'account_id', type: 'string' }],
joins: [],
measures: [{ name: 'total_contract_arr', expr: 'sum(contract_arr)' }],
table: 'analytics.mart_account_segments',
},
],
loadErrors: [],
}),
};
await expect(
validateFinalIngestArtifacts({
connectionIds: ['warehouse'],
changedWikiPageKeys: ['account-segments'],
touchedSlSources: [{ connectionId: 'warehouse', sourceName: 'mart_account_segments' }],
wikiService: wikiService as never,
semanticLayerService: semanticLayerService as never,
validateTouchedSources: async () => ({ invalidSources: [], validSources: ['warehouse:mart_account_segments'] }),
tableExists: async () => true,
}),
).rejects.toThrow(/unknown sl_refs entity mart_account_segments\.total_contract_arr_cents/);
});
it('validates direct declared-join neighbors of touched semantic-layer sources', async () => {
const semanticLayerService = {
loadAllSources: vi.fn().mockResolvedValue({
sources: [
{
name: 'orders',
grain: ['order_id'],
columns: [
{ name: 'order_id', type: 'string' },
{ name: 'account_id', type: 'string' },
],
joins: [{ to: 'accounts', on: 'orders.account_id = accounts.account_id', relationship: 'many_to_one' }],
measures: [{ name: 'order_count', expr: 'count(*)' }],
},
{
name: 'accounts',
grain: ['account_id'],
columns: [{ name: 'account_id', type: 'string' }],
joins: [],
measures: [{ name: 'account_count', expr: 'count(*)' }],
},
{
name: 'segments',
grain: ['segment_id'],
columns: [
{ name: 'segment_id', type: 'string' },
{ name: 'account_id', type: 'string' },
],
joins: [{ to: 'accounts', on: 'segments.account_id = accounts.account_id', relationship: 'many_to_one' }],
measures: [],
},
],
loadErrors: [],
}),
};
const validateTouchedSources = vi.fn().mockResolvedValue({ invalidSources: [], validSources: [] });
await validateFinalIngestArtifacts({
connectionIds: ['warehouse'],
changedWikiPageKeys: [],
touchedSlSources: [{ connectionId: 'warehouse', sourceName: 'accounts' }],
wikiService: { readPage: vi.fn() } as never,
semanticLayerService: semanticLayerService as never,
validateTouchedSources,
tableExists: async () => true,
});
expect(validateTouchedSources).toHaveBeenCalledWith([
{ connectionId: 'warehouse', sourceName: 'accounts' },
{ connectionId: 'warehouse', sourceName: 'orders' },
{ connectionId: 'warehouse', sourceName: 'segments' },
]);
});
it('fails final gates when a changed wiki page references a missing wiki page', async () => {
const wikiService = wikiServiceWithPages({
'account-segments': {
refs: ['missing-frontmatter-page'],
content: 'See [[missing-inline-page]] for the related process.',
},
});
const semanticLayerService = {
loadAllSources: vi.fn().mockResolvedValue({ sources: [], loadErrors: [] }),
};
await expect(
validateFinalIngestArtifacts({
connectionIds: ['warehouse'],
changedWikiPageKeys: ['account-segments'],
touchedSlSources: [],
wikiService: wikiService as never,
semanticLayerService: semanticLayerService as never,
validateTouchedSources: async () => ({ invalidSources: [], validSources: [] }),
tableExists: async () => true,
}),
).rejects.toThrow(
/wiki references target missing page\(s\): account-segments -> missing-frontmatter-page, account-segments -> missing-inline-page/,
);
});
});

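The sl_refs tests above depend on how a reference is split into an optional `<connection>/` prefix, a source name, and an optional entity after the first `.`. Below is a minimal standalone sketch of that lookup. It mirrors `parseSlRef` and the entity check in the `artifact-gates` module that follows. `SlSourceLike` and `checkSlRef` are hypothetical names, and the error strings leave out the `<pageKey>:` prefix that the real gate adds.

```typescript
// Hypothetical, trimmed-down shape of a loaded semantic-layer source.
interface SlSourceLike {
  name: string;
  measures?: Array<{ name: string }>;
  columns?: Array<{ name: string }>;
  segments?: Array<{ name: string }>;
}

interface ParsedSlRef {
  connectionId: string | null;
  sourceName: string;
  entityName: string | null;
}

// "warehouse/orders.order_count" -> connection "warehouse", source "orders", entity "order_count".
function parseSlRef(ref: string): ParsedSlRef {
  const slash = ref.indexOf('/');
  const connectionId = slash >= 0 ? ref.slice(0, slash) : null;
  const body = slash >= 0 ? ref.slice(slash + 1) : ref;
  // split with a limit of 2 keeps only the first two segments, like the original helper.
  const [sourceName, entityName] = body.split('.', 2);
  return { connectionId, sourceName: sourceName ?? '', entityName: entityName ?? null };
}

// Returns an error message, or null when the reference resolves.
function checkSlRef(
  ref: string,
  sourcesByConnection: ReadonlyMap<string, SlSourceLike[]>,
  visibleConnectionIds: readonly string[],
): string | null {
  const parsed = parseSlRef(ref);
  // An explicit prefix pins the lookup to that connection; otherwise every visible connection is searched.
  const candidates = parsed.connectionId ? [parsed.connectionId] : visibleConnectionIds;
  let source: SlSourceLike | undefined = undefined;
  for (const connectionId of candidates) {
    source = sourcesByConnection.get(connectionId)?.find((candidate) => candidate.name === parsed.sourceName);
    if (source) break;
  }
  if (!source) return `unknown sl_refs entry ${ref}`;
  if (parsed.entityName === null) return null;
  // Measures, columns, and segments are all accepted as entity names.
  const entityNames = new Set(
    [...(source.measures ?? []), ...(source.columns ?? []), ...(source.segments ?? [])].map((entity) => entity.name),
  );
  return entityNames.has(parsed.entityName) ? null : `unknown sl_refs entity ${ref}`;
}
```

The lookup matches names exactly. A renamed measure such as `total_contract_arr_cents` therefore fails even though `total_contract_arr` exists, and that is the case the measure-level test pins down.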

@@ -0,0 +1,188 @@
import type { SemanticLayerService } from '../sl/index.js';
import type { TouchedSlSource } from '../tools/index.js';
import type { KnowledgeWikiService } from '../wiki/index.js';
import { findMissingWikiRefs } from '../wiki/wiki-ref-validation.js';
import { findInvalidWikiBodyRefs } from './wiki-body-refs.js';
export interface TouchedValidationResult {
invalidSources: string[];
validSources: string[];
}
export interface FinalArtifactGateInput {
connectionIds: string[];
changedWikiPageKeys: string[];
touchedSlSources: TouchedSlSource[];
wikiService: KnowledgeWikiService;
semanticLayerService: SemanticLayerService;
validateTouchedSources(touched: TouchedSlSource[]): Promise<TouchedValidationResult>;
tableExists(connectionId: string, tableRef: string): Promise<boolean>;
}
export interface ProvenanceRawPathValidationInput {
rows: Array<{ rawPath: string }>;
currentRawPaths: Set<string>;
deletedRawPaths: Set<string>;
}
function parseSlRef(ref: string): { connectionId: string | null; sourceName: string; entityName: string | null } {
const withoutConnection = ref.includes('/') ? ref.slice(ref.indexOf('/') + 1) : ref;
const connectionId = ref.includes('/') ? ref.slice(0, ref.indexOf('/')) : null;
const [sourceName = '', entityName = null] = withoutConnection.split('.', 2);
return { connectionId, sourceName, entityName };
}
function slEntityNames(source: Awaited<ReturnType<SemanticLayerService['loadAllSources']>>['sources'][number]): Set<string> {
return new Set([
...(source.measures ?? []).map((measure) => measure.name),
...(source.columns ?? []).map((column) => column.name),
...(source.segments ?? []).map((segment) => segment.name),
]);
}
function uniqueTouchedSources(sources: TouchedSlSource[]): TouchedSlSource[] {
const seen = new Set<string>();
const unique: TouchedSlSource[] = [];
for (const source of sources) {
const key = `${source.connectionId}:${source.sourceName}`;
if (seen.has(key)) {
continue;
}
seen.add(key);
unique.push(source);
}
return unique.sort((left, right) => {
const byConnection = left.connectionId.localeCompare(right.connectionId);
return byConnection === 0 ? left.sourceName.localeCompare(right.sourceName) : byConnection;
});
}
async function expandTouchedSlSourcesWithDirectJoinNeighbors(input: FinalArtifactGateInput): Promise<TouchedSlSource[]> {
const expanded = [...input.touchedSlSources];
const touchedByConnection = new Map<string, Set<string>>();
for (const source of input.touchedSlSources) {
const bucket = touchedByConnection.get(source.connectionId) ?? new Set<string>();
bucket.add(source.sourceName);
touchedByConnection.set(source.connectionId, bucket);
}
for (const connectionId of input.connectionIds) {
const touched = touchedByConnection.get(connectionId);
if (!touched || touched.size === 0) {
continue;
}
const { sources } = await input.semanticLayerService.loadAllSources(connectionId);
for (const source of sources) {
const sourceIsTouched = touched.has(source.name);
if (sourceIsTouched) {
for (const join of source.joins ?? []) {
expanded.push({ connectionId, sourceName: join.to });
}
}
if ((source.joins ?? []).some((join) => touched.has(join.to))) {
expanded.push({ connectionId, sourceName: source.name });
}
}
}
return uniqueTouchedSources(expanded);
}
async function validateWikiSlRefs(input: FinalArtifactGateInput): Promise<string[]> {
const errors: string[] = [];
const sourcesByConnection = new Map<string, Awaited<ReturnType<SemanticLayerService['loadAllSources']>>['sources']>();
for (const connectionId of input.connectionIds) {
const { sources } = await input.semanticLayerService.loadAllSources(connectionId);
sourcesByConnection.set(connectionId, sources);
}
for (const pageKey of input.changedWikiPageKeys) {
const page = await input.wikiService.readPage('GLOBAL', null, pageKey);
if (!page) {
continue;
}
for (const ref of page.frontmatter.sl_refs ?? []) {
const parsed = parseSlRef(ref);
const candidateConnections = parsed.connectionId ? [parsed.connectionId] : input.connectionIds;
let source: Awaited<ReturnType<SemanticLayerService['loadAllSources']>>['sources'][number] | undefined;
for (const connectionId of candidateConnections) {
source = sourcesByConnection.get(connectionId)?.find((candidate) => candidate.name === parsed.sourceName);
if (source) {
break;
}
}
if (!source) {
errors.push(`${pageKey}: unknown sl_refs entry ${ref}`);
continue;
}
if (parsed.entityName && !slEntityNames(source).has(parsed.entityName)) {
errors.push(`${pageKey}: unknown sl_refs entity ${ref}`);
}
}
}
return errors;
}
async function validateWikiRefs(input: FinalArtifactGateInput): Promise<string[]> {
const dangling: string[] = [];
for (const pageKey of input.changedWikiPageKeys) {
const page = await input.wikiService.readPage('GLOBAL', null, pageKey);
if (!page) {
continue;
}
const missingRefs = await findMissingWikiRefs({
wikiService: input.wikiService,
scope: 'GLOBAL',
scopeId: null,
pageKey,
refs: page.frontmatter.refs,
content: page.content,
});
for (const missingRef of missingRefs) {
dangling.push(`${pageKey} -> ${missingRef}`);
}
}
return dangling;
}
export async function validateFinalIngestArtifacts(input: FinalArtifactGateInput): Promise<void> {
const touchedWithDependencies = await expandTouchedSlSourcesWithDirectJoinNeighbors(input);
const validation = await input.validateTouchedSources(touchedWithDependencies);
const errors: string[] = validation.invalidSources.map((source) => `semantic-layer validation failed for ${source}`);
errors.push(...(await validateWikiSlRefs(input)));
const danglingWikiRefs = await validateWikiRefs(input);
if (danglingWikiRefs.length > 0) {
errors.push(`wiki references target missing page(s): ${danglingWikiRefs.join(', ')}`);
}
for (const pageKey of input.changedWikiPageKeys) {
const page = await input.wikiService.readPage('GLOBAL', null, pageKey);
if (!page) {
continue;
}
errors.push(
...(await findInvalidWikiBodyRefs({
pageKey,
body: page.content,
visibleConnectionIds: input.connectionIds,
loadSources: async (connectionId) => {
const { sources } = await input.semanticLayerService.loadAllSources(connectionId);
return sources;
},
tableExists: input.tableExists,
})),
);
}
if (errors.length > 0) {
throw new Error(`final artifact gates failed:\n${errors.join('\n')}`);
}
}
export function validateProvenanceRawPaths(input: ProvenanceRawPathValidationInput): void {
for (const row of input.rows) {
if (!input.currentRawPaths.has(row.rawPath) && !input.deletedRawPaths.has(row.rawPath)) {
throw new Error(`provenance row references raw path outside this snapshot: ${row.rawPath}`);
}
}
}

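`expandTouchedSlSourcesWithDirectJoinNeighbors` widens validation by one hop in both directions. It adds the sources that a touched source joins to, and the sources that join to a touched source. Below is a synchronous sketch of the same rule over an in-memory map. `SourceWithJoins`, `TouchedSource`, and both function names are hypothetical. Unlike the original, it walks only the connections present in the map and does not filter by `connectionIds`.

```typescript
interface SourceWithJoins {
  name: string;
  joins?: Array<{ to: string }>;
}

interface TouchedSource {
  connectionId: string;
  sourceName: string;
}

// Deduplicate on "<connection>:<source>", then sort by connection and source name.
function dedupeAndSort(sources: TouchedSource[]): TouchedSource[] {
  const byKey = new Map<string, TouchedSource>();
  for (const source of sources) {
    const key = `${source.connectionId}:${source.sourceName}`;
    if (!byKey.has(key)) byKey.set(key, source);
  }
  return [...byKey.values()].sort(
    (left, right) =>
      left.connectionId.localeCompare(right.connectionId) || left.sourceName.localeCompare(right.sourceName),
  );
}

function expandWithDirectJoinNeighbors(
  touched: TouchedSource[],
  sourcesByConnection: ReadonlyMap<string, SourceWithJoins[]>,
): TouchedSource[] {
  // Group touched source names per connection.
  const touchedNames = new Map<string, Set<string>>();
  for (const { connectionId, sourceName } of touched) {
    const bucket = touchedNames.get(connectionId) ?? new Set<string>();
    bucket.add(sourceName);
    touchedNames.set(connectionId, bucket);
  }
  const expanded = [...touched];
  for (const [connectionId, names] of touchedNames) {
    for (const source of sourcesByConnection.get(connectionId) ?? []) {
      const joins = source.joins ?? [];
      // Outgoing edge: a touched source pulls in every source it joins to.
      if (names.has(source.name)) {
        for (const join of joins) expanded.push({ connectionId, sourceName: join.to });
      }
      // Incoming edge: a source that joins to any touched source is pulled in as well.
      if (joins.some((join) => names.has(join.to))) {
        expanded.push({ connectionId, sourceName: source.name });
      }
    }
  }
  return dedupeAndSort(expanded);
}
```

With the fixture from the declared-join test earlier in this diff, touching `accounts` yields `accounts`, `orders`, and `segments`. Touching `orders` yields only `accounts` and `orders`, because `segments` neither joins to `orders` nor is joined from it.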

@@ -0,0 +1,136 @@
import { mkdir, mkdtemp, readFile, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it, vi } from 'vitest';
import { finalGateRepairPaths, repairFinalGateFailure } from './final-gate-repair.js';
import { FileIngestTraceWriter } from './ingest-trace.js';
async function makeHarness() {
const root = await mkdtemp(join(tmpdir(), 'ktx-final-gate-repair-'));
const workdir = join(root, 'workdir');
await mkdir(join(workdir, 'wiki/global'), { recursive: true });
await mkdir(join(workdir, 'semantic-layer/warehouse'), { recursive: true });
await writeFile(
join(workdir, 'wiki/global/account-segments.md'),
'---\nsummary: Account segments\nusage_mode: auto\n---\n\nARR uses `mart_account_segments.total_contract_arr_cents`.\n',
'utf-8',
);
await writeFile(
join(workdir, 'semantic-layer/warehouse/mart_account_segments.yaml'),
'name: mart_account_segments\ncolumns: [{name: account_id, type: string}]\njoins: []\nmeasures:\n  - name: total_contract_arr\n    expr: sum(contract_arr)\n',
'utf-8',
);
const trace = new FileIngestTraceWriter({
tracePath: join(root, 'trace.jsonl'),
jobId: 'job-1',
connectionId: 'warehouse',
sourceKey: 'metabase',
runId: 'run-1',
syncId: 'sync-1',
level: 'trace',
});
return { root, workdir, trace };
}
describe('finalGateRepairPaths', () => {
it('derives sorted wiki and semantic-layer file paths', () => {
expect(
finalGateRepairPaths({
changedWikiPageKeys: ['account-segments', 'overview', 'account-segments'],
touchedSlSources: [
{ connectionId: 'warehouse', sourceName: 'mart_account_segments' },
{ connectionId: 'warehouse', sourceName: 'orders' },
{ connectionId: 'warehouse', sourceName: 'orders' },
],
}),
).toEqual([
'semantic-layer/warehouse/mart_account_segments.yaml',
'semantic-layer/warehouse/orders.yaml',
'wiki/global/account-segments.md',
'wiki/global/overview.md',
]);
});
});
describe('repairFinalGateFailure', () => {
it('lets the repair agent read gate errors and edit only allowed files', async () => {
const { workdir, trace } = await makeHarness();
const agentRunner = {
runLoop: vi.fn(async (params: any) => {
const error = await params.toolSet.read_gate_error.execute({});
expect(error.markdown).toContain('total_contract_arr_cents');
const page = await params.toolSet.read_repair_file.execute({
path: 'wiki/global/account-segments.md',
});
expect(page.markdown).toContain('total_contract_arr_cents');
await expect(
params.toolSet.write_repair_file.execute({
path: 'wiki/global/other.md',
content: 'not allowed',
}),
).rejects.toThrow(/gate repair path not allowed/);
await params.toolSet.write_repair_file.execute({
path: 'wiki/global/account-segments.md',
content: page.markdown.replace('total_contract_arr_cents', 'total_contract_arr'),
});
return { stopReason: 'natural' as const };
}),
};
const result = await repairFinalGateFailure({
agentRunner,
workdir,
gateError:
'final artifact gates failed:\naccount-segments: unknown semantic-layer entity mart_account_segments.total_contract_arr_cents',
allowedPaths: ['wiki/global/account-segments.md'],
trace,
repairKind: 'final_artifact_gate',
maxAttempts: 1,
stepBudget: 8,
});
expect(result).toEqual({
status: 'repaired',
attempts: 1,
changedPaths: ['wiki/global/account-segments.md'],
});
const repaired = await readFile(join(workdir, 'wiki/global/account-segments.md'), 'utf-8');
expect(repaired).toContain('`mart_account_segments.total_contract_arr`');
expect(repaired).not.toContain('total_contract_arr_cents');
await expect(readFile(trace.tracePath, 'utf-8')).resolves.toContain('gate_repair_repaired');
expect(agentRunner.runLoop).toHaveBeenCalledWith(
expect.objectContaining({
modelRole: 'repair',
stepBudget: 8,
telemetryTags: expect.objectContaining({
operationName: 'ingest-isolated-diff-gate-repair',
repairKind: 'final_artifact_gate',
}),
}),
);
});
it('returns failed when the repair agent edits no allowed file', async () => {
const { workdir, trace } = await makeHarness();
const result = await repairFinalGateFailure({
agentRunner: { runLoop: vi.fn(async () => ({ stopReason: 'natural' as const })) },
workdir,
gateError: 'final artifact gates failed:\naccount-segments: unknown semantic-layer entity',
allowedPaths: ['wiki/global/account-segments.md'],
trace,
repairKind: 'final_artifact_gate',
maxAttempts: 1,
stepBudget: 8,
});
expect(result).toEqual({
status: 'failed',
attempts: 1,
reason: 'gate repair completed without editing an allowed path',
});
await expect(readFile(trace.tracePath, 'utf-8')).resolves.toContain('gate_repair_failed');
});
});

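The tests above show that `repairFinalGateFailure` returns `repaired` as soon as any allowed file has been written. It does not re-run the gate itself (see the `changedPaths.length === 0` check in the implementation that follows), so a caller presumably has to re-run the gate afterwards. Below is a minimal sketch of that loop. It assumes caller-supplied `runGate` and `repair` callbacks. `gateWithRepair` and `RepairOutcome` are hypothetical names, not KTX APIs; `RepairOutcome` only copies the result shapes asserted above.

```typescript
type RepairOutcome =
  | { status: 'repaired'; attempts: number; changedPaths: string[] }
  | { status: 'failed'; attempts: number; reason: string };

// Run the gate; on failure, try one repair pass and then run the gate again.
async function gateWithRepair(
  runGate: () => Promise<void>,
  repair: (gateError: string) => Promise<RepairOutcome>,
): Promise<{ repairedPaths: string[] }> {
  try {
    await runGate();
    return { repairedPaths: [] };
  } catch (error) {
    const gateError = error instanceof Error ? error.message : String(error);
    const outcome = await repair(gateError);
    if (outcome.status === 'failed') {
      // Surface both the original gate error and the reason the repair gave up.
      throw new Error(`${gateError}\nrepair failed after ${outcome.attempts} attempt(s): ${outcome.reason}`);
    }
    // "repaired" only means an allowed file was edited, so the gate must pass again.
    await runGate();
    return { repairedPaths: outcome.changedPaths };
  }
}
```

If the repaired tree still fails, the second `runGate` call throws and that error propagates unchanged. Whether to allow further repair rounds is left to the caller.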

@@ -0,0 +1,230 @@
import { mkdir, readFile, writeFile } from 'node:fs/promises';
import { dirname, join } from 'node:path';
import { z } from 'zod';
import type { AgentRunnerPort, KtxRuntimeToolSet } from '../llm/index.js';
import type { TouchedSlSource } from '../tools/index.js';
import type { IngestTraceWriter } from './ingest-trace.js';
import { traceTimed } from './ingest-trace.js';
type FinalGateRepairKind = 'patch_semantic_gate' | 'final_artifact_gate';
export type FinalGateRepairResult =
| { status: 'repaired'; attempts: number; changedPaths: string[] }
| { status: 'failed'; attempts: number; reason: string };
export interface RepairFinalGateFailureInput {
agentRunner: AgentRunnerPort;
workdir: string;
gateError: string;
allowedPaths: string[];
trace: IngestTraceWriter;
repairKind: FinalGateRepairKind;
maxAttempts?: number;
stepBudget?: number;
}
const readRepairFileSchema = z.object({
path: z.string().min(1),
});
const writeRepairFileSchema = z.object({
path: z.string().min(1),
content: z.string(),
});
function normalizeRepoPath(path: string): string {
const normalized = path.replace(/\\/g, '/').replace(/^\/+/, '');
const parts = normalized.split('/').filter((part) => part.length > 0);
if (parts.length === 0 || parts.some((part) => part === '.' || part === '..')) {
throw new Error(`gate repair path must be a repository-relative path: ${path}`);
}
return parts.join('/');
}
function assertAllowedPath(path: string, allowedPaths: ReadonlySet<string>): string {
const normalized = normalizeRepoPath(path);
if (!allowedPaths.has(normalized)) {
throw new Error(`gate repair path not allowed: ${normalized}`);
}
return normalized;
}
async function readOptionalFile(path: string): Promise<{ exists: boolean; content: string }> {
try {
return { exists: true, content: await readFile(path, 'utf-8') };
} catch (error) {
if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
return { exists: false, content: '' };
}
throw error;
}
}
function buildGateRepairSystemPrompt(): string {
return `<role>
You repair one KTX isolated-diff artifact gate failure inside the integration worktree.
</role>
<rules>
- Use read_gate_error first.
- Read only files exposed by read_repair_file.
- Edit only paths exposed by write_repair_file.
- Prefer the smallest text edit that makes the gate pass.
- Preserve accepted work-unit, reconciliation, and deterministic projection content.
- Do not invent warehouse facts, business definitions, or semantic-layer entities.
- If the gate error requires choosing between conflicting facts without evidence, stop without editing.
</rules>`;
}
function buildGateRepairUserPrompt(input: {
gateError: string;
allowedPaths: string[];
repairKind: FinalGateRepairKind;
attempt: number;
maxAttempts: number;
}): string {
return `Repair isolated-diff artifact gates.
Repair kind: ${input.repairKind}
Attempt: ${input.attempt} of ${input.maxAttempts}
Allowed files:
${input.allowedPaths.map((path) => `- ${path}`).join('\n')}
Gate error:
${input.gateError}
Use read_gate_error first. Then inspect only the allowed files, write the
minimal repaired content, and stop.`;
}
function buildToolSet(input: {
workdir: string;
gateError: string;
allowedPaths: ReadonlySet<string>;
editedPaths: Set<string>;
}): KtxRuntimeToolSet {
return {
read_gate_error: {
name: 'read_gate_error',
description: 'Read the artifact gate failure that must be repaired.',
inputSchema: z.object({}),
execute: async () => ({
markdown: input.gateError,
structured: { gateError: input.gateError },
}),
},
read_repair_file: {
name: 'read_repair_file',
description: 'Read one allowed file from the integration worktree.',
inputSchema: readRepairFileSchema,
execute: async ({ path }: z.infer<typeof readRepairFileSchema>) => {
const normalized = assertAllowedPath(path, input.allowedPaths);
const file = await readOptionalFile(join(input.workdir, normalized));
return {
markdown: file.exists ? file.content : `(missing file: ${normalized})`,
structured: { path: normalized, exists: file.exists },
};
},
},
write_repair_file: {
name: 'write_repair_file',
description: 'Replace one allowed integration worktree file with repaired text content.',
inputSchema: writeRepairFileSchema,
execute: async ({ path, content }: z.infer<typeof writeRepairFileSchema>) => {
const normalized = assertAllowedPath(path, input.allowedPaths);
const fullPath = join(input.workdir, normalized);
await mkdir(dirname(fullPath), { recursive: true });
await writeFile(fullPath, content, 'utf-8');
input.editedPaths.add(normalized);
return {
markdown: `Wrote ${normalized}`,
structured: { path: normalized, bytes: Buffer.byteLength(content) },
};
},
},
};
}
export function finalGateRepairPaths(input: {
changedWikiPageKeys: string[];
touchedSlSources: TouchedSlSource[];
}): string[] {
return [
...new Set([
...input.touchedSlSources.map((source) => `semantic-layer/${source.connectionId}/${source.sourceName}.yaml`),
...input.changedWikiPageKeys.map((pageKey) => `wiki/global/${pageKey}.md`),
]),
].sort();
}
export async function repairFinalGateFailure(
input: RepairFinalGateFailureInput,
): Promise<FinalGateRepairResult> {
const allowedPaths = new Set(input.allowedPaths.map(normalizeRepoPath));
const maxAttempts = input.maxAttempts ?? 1;
const stepBudget = input.stepBudget ?? 16;
let lastFailure = 'gate repair did not run';
for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
const editedPaths = new Set<string>();
const sortedAllowedPaths = [...allowedPaths].sort();
const traceData = {
repairKind: input.repairKind,
attempt,
maxAttempts,
allowedPaths: sortedAllowedPaths,
gateError: input.gateError,
};
const result = await traceTimed(input.trace, 'gate_repair', 'gate_repair', traceData, async () =>
input.agentRunner.runLoop({
modelRole: 'repair',
systemPrompt: buildGateRepairSystemPrompt(),
userPrompt: buildGateRepairUserPrompt({
gateError: input.gateError,
allowedPaths: sortedAllowedPaths,
repairKind: input.repairKind,
attempt,
maxAttempts,
}),
toolSet: buildToolSet({
workdir: input.workdir,
gateError: input.gateError,
allowedPaths,
editedPaths,
}),
stepBudget,
telemetryTags: {
operationName: 'ingest-isolated-diff-gate-repair',
source: input.trace.context.sourceKey,
jobId: input.trace.context.jobId,
repairKind: input.repairKind,
},
}),
);
if (result.stopReason === 'error') {
lastFailure = result.error?.message ?? 'gate repair agent loop errored';
await input.trace.event('error', 'gate_repair', 'gate_repair_failed', traceData, result.error);
continue;
}
const changedPaths = [...editedPaths].sort();
if (changedPaths.length === 0) {
lastFailure = 'gate repair completed without editing an allowed path';
await input.trace.event('error', 'gate_repair', 'gate_repair_failed', {
...traceData,
reason: lastFailure,
});
continue;
}
await input.trace.event('debug', 'gate_repair', 'gate_repair_repaired', {
...traceData,
changedPaths,
});
return { status: 'repaired', attempts: attempt, changedPaths };
}
return { status: 'failed', attempts: maxAttempts, reason: lastFailure };
}

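`normalizeRepoPath` and `assertAllowedPath` together decide which files the repair agent may touch. The sketch below reproduces them standalone so that the edge cases can be exercised. `makePathGuard` is a hypothetical wrapper. Leading slashes, backslashes, and empty segments are normalized away before the allowlist lookup, while any `.` or `..` segment is rejected outright instead of being resolved.

```typescript
// Same normalization rules as normalizeRepoPath above.
function normalizeRepoPath(path: string): string {
  const normalized = path.replace(/\\/g, '/').replace(/^\/+/, '');
  const parts = normalized.split('/').filter((part) => part.length > 0);
  if (parts.length === 0 || parts.some((part) => part === '.' || part === '..')) {
    throw new Error(`gate repair path must be a repository-relative path: ${path}`);
  }
  return parts.join('/');
}

// Hypothetical wrapper: normalize the allowlist once, then check each requested path against it.
function makePathGuard(allowed: readonly string[]): (path: string) => string {
  const allowedSet = new Set(allowed.map(normalizeRepoPath));
  return (path) => {
    const normalized = normalizeRepoPath(path);
    if (!allowedSet.has(normalized)) {
      throw new Error(`gate repair path not allowed: ${normalized}`);
    }
    return normalized;
  };
}
```

Because `..` is rejected rather than resolved, `wiki/global/../global/account-segments.md` fails even though it would resolve to an allowed file. That is stricter than necessary, but it keeps the check purely lexical and independent of the filesystem.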

@@ -17,6 +17,11 @@ export {
buildLiveDatabaseTableNaturalKey,
ktxSchemaSnapshotToExtractedSchema,
} from './adapters/live-database/extracted-schema.js';
export {
assertSemanticLayerTargetPathsAllowed,
findDisallowedSemanticLayerTargetPaths,
semanticLayerConnectionIdFromPath,
} from './semantic-layer-target-policy.js';
export { LiveDatabaseSourceAdapter } from './adapters/live-database/live-database.adapter.js';
export type {
BuildLiveDatabaseManifestShardsInput,
@@ -609,6 +614,11 @@ export {
} from './raw-sources-paths.js';
export { ingestReportSnapshotSchema, parseIngestReportSnapshot } from './report-snapshot.js';
export type { IngestReportBody, IngestReportSnapshot } from './reports.js';
export * from './artifact-gates.js';
export * from './ingest-trace.js';
export * from './isolated-diff/git-patch.js';
export * from './isolated-diff/patch-integrator.js';
export * from './isolated-diff/work-unit-executor.js';
export * from './reports.js';
export { SourceAdapterRegistry } from './source-adapter-registry.js';
export type { SqliteBundleIngestStoreOptions } from './sqlite-bundle-ingest-store.js';
@@ -652,4 +662,7 @@ export type {
TriageSignals,
UnresolvedCardInfo,
WorkUnit,
DeterministicProjectionContext,
ProjectionResult,
} from './types.js';
export * from './wiki-body-refs.js';

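The index now re-exports a semantic-layer target policy (`assertSemanticLayerTargetPathsAllowed`, `findDisallowedSemanticLayerTargetPaths`, `semanticLayerConnectionIdFromPath`), but its implementation is not shown here. Below is a hypothetical sketch of what such a policy could look like. It assumes the `semantic-layer/<connectionId>/<source>.yaml` layout that `finalGateRepairPaths` builds. The names `connectionIdFromSlPath`, `disallowedSlTargetPaths`, and `assertSlTargetPathsAllowed` are illustrative and make no claim about the real signatures or error messages.

```typescript
// Returns the connection segment of a semantic-layer path, or null for any other path.
function connectionIdFromSlPath(path: string): string | null {
  const parts = path.replace(/\\/g, '/').split('/').filter((part) => part.length > 0);
  // Expect at least semantic-layer/<connectionId>/<file>.
  if (parts.length < 3 || parts[0] !== 'semantic-layer') return null;
  return parts[1] ?? null;
}

// Sorted list of semantic-layer paths that fall outside the allowed connections.
function disallowedSlTargetPaths(paths: string[], allowedConnectionIds: string[]): string[] {
  const allowed = new Set(allowedConnectionIds);
  return paths
    .filter((path) => {
      const connectionId = connectionIdFromSlPath(path);
      // Paths outside semantic-layer/ are not governed by this policy.
      return connectionId !== null && !allowed.has(connectionId);
    })
    .sort();
}

function assertSlTargetPathsAllowed(paths: string[], allowedConnectionIds: string[]): void {
  const disallowed = disallowedSlTargetPaths(paths, allowedConnectionIds);
  if (disallowed.length > 0) {
    throw new Error(`semantic-layer writes outside allowed connections: ${disallowed.join(', ')}`);
  }
}
```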
File diff suppressed because it is too large


@@ -1,8 +1,7 @@
import { mkdir, mkdtemp, readFile, rm, stat, writeFile } from 'node:fs/promises';
import { mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { beforeEach, describe, expect, it, vi } from 'vitest';
import { GitService } from '../core/index.js';
import { addTouchedSlSource } from '../tools/index.js';
import { IngestBundleRunner } from './ingest-bundle.runner.js';
import { createMemoryFlowLiveBuffer } from './memory-flow/live-buffer.js';
@@ -123,9 +122,15 @@ const makeDeps = () => {
};
const scopedGit = {
revParseHead: vi.fn().mockResolvedValue('h'),
commitFiles: vi.fn(),
commitFiles: vi.fn().mockResolvedValue({ created: true, commitHash: 'h' }),
commitStaged: vi.fn().mockResolvedValue({ created: false, commitHash: 'h' }),
resetHardTo: vi.fn(),
assertWorktreeClean: vi.fn().mockResolvedValue(undefined),
writeBinaryNoRenamePatch: vi.fn(async (_base: string, _head: string, patchPath: string) => {
await writeFile(patchPath, '', 'utf-8');
}),
applyPatchFile3WayIndex: vi.fn(),
diffNameStatus: vi.fn().mockResolvedValue([]),
};
const sessionWorktreeService = {
create: vi.fn().mockResolvedValue({
@@ -167,10 +172,12 @@ const makeDeps = () => {
loadPrompt: vi.fn().mockResolvedValue('base-framing'),
};
const wikiService = {
forWorktree: vi.fn().mockReturnValue({}),
forWorktree: vi.fn(),
listPageKeys: vi.fn().mockResolvedValue([]),
readPage: vi.fn().mockResolvedValue(null),
syncFromCommit: vi.fn().mockResolvedValue(undefined),
};
wikiService.forWorktree.mockReturnValue(wikiService);
const knowledgeSlRefs = {
syncFromWiki: vi.fn().mockResolvedValue({ inserted: 1, deleted: 0 }),
};
@@ -178,7 +185,7 @@ const makeDeps = () => {
listPagesForUser: vi.fn().mockResolvedValue([]),
};
const semanticLayerService = {
forWorktree: vi.fn().mockReturnValue({}),
forWorktree: vi.fn(),
listFilesForConnection: vi
.fn()
.mockImplementation((connectionId: string) =>
@@ -193,6 +200,7 @@ const makeDeps = () => {
}),
),
};
semanticLayerService.forWorktree.mockReturnValue(semanticLayerService);
const slSearchService = {
indexSources: vi.fn().mockResolvedValue(undefined),
};
@@ -255,8 +263,12 @@ const buildRunner = (deps: ReturnType<typeof makeDeps> = makeDeps(), overrides:
resolveUploadDir: (uploadId) => `/tmp/ktx-test/ingest-uploads/${uploadId}`,
resolvePullDir: (jobId) => `/tmp/ktx-test/ingest-pulls/${jobId}`,
resolveTranscriptDir: (jobId) => `/tmp/ktx-test/run/wu-transcripts/${jobId}`,
resolveTracePath: (jobId) => `/tmp/ktx-test/ingest-traces/${jobId}/trace.jsonl`,
},
settings: {
probeRowCount: 1,
memoryIngestionModel: 'test-model',
},
settings: { probeRowCount: 1, memoryIngestionModel: 'test-model' },
skillsRegistry: deps.skillsRegistry as any,
promptService: deps.promptService as any,
wikiService: deps.wikiService as any,
@@ -1505,7 +1517,7 @@ describe('IngestBundleRunner — Stages 1 → 7', () => {
const runner = buildRunner(deps);
(runner as any).stageRawFilesStage1 = vi.fn().mockResolvedValue({
currentHashes: new Map([['explores/b2b/sales_pipeline.json', 'h1']]),
currentHashes: new Map([['a.yml', 'h1']]),
rawDirInWorktree: 'raw-sources/looker-run/fake/s',
});
(runner as any).resolveStagedDir = vi.fn().mockResolvedValue('/tmp/stage/upload-x');
@@ -1570,6 +1582,7 @@ describe('IngestBundleRunner — Stages 1 → 7', () => {
workUnits: [{ unitKey: 'u1', rawFiles: ['semantic_models.yml'], peerFileIndex: [], dependencyPaths: [] }],
parseArtifacts: { semanticModels: [{ name: 'orders' }] },
});
deps.adapter.listTargetConnectionIds = vi.fn().mockResolvedValue(['warehouse-2']);
deps.semanticLayerService.loadAllSources.mockImplementation((connectionId: string) =>
Promise.resolve({ sources: [{ name: `${connectionId}_source` }], loadErrors: [] }),
);
@@ -1972,9 +1985,15 @@ describe('IngestBundleRunner — Stages 1 → 7', () => {
const assertError = new Error('Worktree has in-progress git operation (sequencer ...); refusing to proceed');
const sessionGit = {
revParseHead: vi.fn().mockResolvedValue('h'),
commitFiles: vi.fn(),
commitFiles: vi.fn().mockResolvedValue({ created: true, commitHash: 'h' }),
commitStaged: vi.fn().mockResolvedValue({ created: false, commitHash: 'h' }),
resetHardTo: vi.fn(),
assertWorktreeClean: vi.fn().mockRejectedValue(assertError),
writeBinaryNoRenamePatch: vi.fn(async (_base: string, _head: string, patchPath: string) => {
await writeFile(patchPath, '', 'utf-8');
}),
applyPatchFile3WayIndex: vi.fn(),
diffNameStatus: vi.fn().mockResolvedValue([]),
};
deps.sessionWorktreeService.create.mockResolvedValue({
chatId: 'j1',
@@ -2005,135 +2024,6 @@ describe('IngestBundleRunner — Stages 1 → 7', () => {
expect(deps.gitService.squashMergeIntoMain).not.toHaveBeenCalled();
});
it('squash-merges only successful WUs into main when one WU fails sl_validate', async () => {
const homeDir = await mkdtemp(join(tmpdir(), 'ingest-rollback-'));
try {
const configDir = join(homeDir, 'config');
const mainGit = new GitService({
storage: { configDir, homeDir },
git: {
userName: 'System User',
userEmail: 'system@example.com',
bootstrapMessage: 'Initialize test config repo',
bootstrapAuthor: 'test-system',
bootstrapAuthorEmail: 'system@example.com',
},
});
await mainGit.onModuleInit();
const baseSha = await mainGit.revParseHead();
if (!baseSha) {
throw new Error('no base sha');
}
const deps = makeDeps();
const sessionDir = join(homeDir, '.worktrees', 'session-j1');
const sessionBranch = 'session/j1';
let currentToolSession: any = null;
deps.gitService = mainGit as any;
deps.sessionWorktreeService.create.mockImplementation(async (_jobId: string, startSha: string) => {
await mkdir(join(homeDir, '.worktrees'), { recursive: true });
await mainGit.addWorktree(sessionDir, sessionBranch, startSha);
return {
chatId: 'j1',
workdir: sessionDir,
branch: sessionBranch,
baseSha: startSha,
createdAt: new Date(),
git: mainGit.forWorktree(sessionDir),
config: {},
};
});
deps.sessionWorktreeService.cleanup.mockResolvedValue(undefined);
deps.adapter.chunk.mockResolvedValue({
workUnits: [
{ unitKey: 'wu-good', rawFiles: ['good.raw'], peerFileIndex: [], dependencyPaths: [] },
{ unitKey: 'wu-bad', rawFiles: ['bad.raw'], peerFileIndex: [], dependencyPaths: [] },
],
});
deps.toolsetFactory.createIngestWuToolset.mockImplementation((toolSession: any) => {
currentToolSession = toolSession;
return {
toRuntimeTools: vi.fn().mockReturnValue({}),
getAllTools: vi.fn().mockReturnValue([]),
getToolNames: vi.fn().mockReturnValue([]),
};
});
deps.slValidator.validateSingleSource.mockImplementation(
(_validationDeps: unknown, _connectionId: string, sourceName: string) => ({
errors: sourceName === 'bad' ? [{ message: 'bad source rejected' }] : [],
warnings: [],
}),
);
deps.agentRunner.runLoop.mockImplementation(async (params: any) => {
const unitKey = params.telemetryTags?.unitKey;
if (unitKey === 'wu-good') {
await mkdir(join(sessionDir, 'semantic-layer', 'c1'), { recursive: true });
await writeFile(join(sessionDir, 'semantic-layer', 'c1', 'good.yaml'), 'name: good\n');
addTouchedSlSource(currentToolSession.touchedSlSources, 'c1', 'good');
currentToolSession.actions.push({ target: 'sl', type: 'created', key: 'good', detail: '' });
await currentToolSession.gitService.commitFiles(
['semantic-layer/c1/good.yaml'],
'test: add good source',
'KTX Test',
'system@ktx.local',
);
}
if (unitKey === 'wu-bad') {
await mkdir(join(sessionDir, 'semantic-layer', 'c1'), { recursive: true });
await writeFile(join(sessionDir, 'semantic-layer', 'c1', 'bad.yaml'), 'name: bad\n');
addTouchedSlSource(currentToolSession.touchedSlSources, 'c1', 'bad');
currentToolSession.actions.push({ target: 'sl', type: 'created', key: 'bad', detail: '' });
await currentToolSession.gitService.commitFiles(
['semantic-layer/c1/bad.yaml'],
'test: add bad source',
'KTX Test',
'system@ktx.local',
);
}
return { stopReason: 'natural' };
});
const runner = buildRunner(deps);
(runner as any).stageRawFilesStage1 = vi.fn().mockImplementation(async ({ worktreeRoot }: any) => {
const rawDir = join(worktreeRoot, 'raw-sources', 'c1', 'fake', 's');
await mkdir(rawDir, { recursive: true });
await writeFile(join(rawDir, 'good.raw'), 'good raw');
await writeFile(join(rawDir, 'bad.raw'), 'bad raw');
return {
currentHashes: new Map([
['good.raw', 'good-hash'],
['bad.raw', 'bad-hash'],
]),
rawDirInWorktree: 'raw-sources/c1/fake/s',
};
});
(runner as any).resolveStagedDir = vi.fn().mockResolvedValue('/tmp/stage/upload-x');
const result = await runner.run({
jobId: 'j1',
connectionId: 'c1',
sourceKey: 'fake',
trigger: 'upload',
bundleRef: { kind: 'upload', uploadId: 'upload-x' },
});
expect(result.failedWorkUnits).toEqual(['wu-bad']);
expect(await readFile(join(configDir, 'semantic-layer', 'c1', 'good.yaml'), 'utf-8')).toContain('good');
expect(await readFile(join(configDir, 'semantic-layer', 'c1', 'bad.yaml'), 'utf-8').catch(() => null)).toBeNull();
expect(deps.reportsRepo.create).toHaveBeenCalledWith(
expect.objectContaining({
body: expect.objectContaining({
failedWorkUnits: ['wu-bad'],
}),
}),
);
await expect(stat(join(configDir, '.git', 'sequencer'))).rejects.toThrow();
} finally {
await rm(homeDir, { recursive: true, force: true });
}
});
it('fails the run and rethrows when the adapter cannot detect the bundle', async () => {
const deps = makeDeps();
deps.adapter.detect.mockResolvedValue(false);

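The updated git mocks add `writeBinaryNoRenamePatch(base, head, patchPath)` and `applyPatchFile3WayIndex`, whose implementations live in diffs not shown here. Going only by the method names, a plausible git CLI mapping is sketched below. The helper names are hypothetical, and this is not the KTX implementation. `--binary` implies `--full-index`, so the patch carries the full blob IDs that `git apply --3way` needs for its three-way fallback. `--no-renames` records a rename as a deletion plus an addition.

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Hypothetical: write the base..head change set to a patch file without rename detection.
function binaryNoRenamePatchArgs(baseSha: string, headSha: string, patchPath: string): string[] {
  return ['diff', '--binary', '--no-renames', `--output=${patchPath}`, baseSha, headSha];
}

// Hypothetical: apply a patch, falling back to a three-way merge and updating the index.
function applyPatch3WayIndexArgs(patchPath: string): string[] {
  return ['apply', '--3way', '--index', patchPath];
}

// Run git inside a worktree; execFile passes arguments without shell quoting.
async function runGit(cwd: string, args: string[]): Promise<string> {
  const { stdout } = await execFileAsync('git', args, { cwd });
  return stdout;
}
```

`git apply --3way` already implies `--index` unless `--cached` is given, so passing `--index` explicitly mainly documents intent.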
File diff suppressed because it is too large


@@ -0,0 +1,85 @@
import { mkdtemp, readFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it, vi } from 'vitest';
import { FileIngestTraceWriter, ingestTracePathForJob, traceTimed } from './ingest-trace.js';
describe('FileIngestTraceWriter', () => {
it('persists structured trace events as JSONL', async () => {
const root = await mkdtemp(join(tmpdir(), 'ktx-trace-'));
const tracePath = ingestTracePathForJob(root, 'job-1');
const trace = new FileIngestTraceWriter({
tracePath,
jobId: 'job-1',
connectionId: 'metabase-main',
sourceKey: 'metabase',
level: 'debug',
});
await trace.event('debug', 'snapshot', 'input_snapshot', {
baseSha: 'abc123',
rawFileCount: 2,
diffSummary: { added: 1, modified: 1, deleted: 0, unchanged: 3 },
});
const lines = (await readFile(tracePath, 'utf-8'))
.trim()
.split('\n')
.map((line) => JSON.parse(line));
expect(lines).toHaveLength(1);
expect(lines[0]).toMatchObject({
schemaVersion: 1,
jobId: 'job-1',
connectionId: 'metabase-main',
sourceKey: 'metabase',
level: 'debug',
phase: 'snapshot',
event: 'input_snapshot',
data: {
baseSha: 'abc123',
rawFileCount: 2,
diffSummary: { added: 1, modified: 1, deleted: 0, unchanged: 3 },
},
});
expect(typeof lines[0].at).toBe('string');
});
it('records timing and error context for postmortem inspection', async () => {
vi.useFakeTimers();
try {
vi.setSystemTime(new Date('2026-05-17T12:00:00.000Z'));
const root = await mkdtemp(join(tmpdir(), 'ktx-trace-'));
const tracePath = ingestTracePathForJob(root, 'job-2');
const trace = new FileIngestTraceWriter({
tracePath,
jobId: 'job-2',
connectionId: 'c1',
sourceKey: 'fake',
level: 'trace',
});
await expect(
traceTimed(trace, 'integration', 'apply_patch', { unitKey: 'wu-1' }, async () => {
vi.advanceTimersByTime(17);
throw new Error('patch conflict');
}),
).rejects.toThrow('patch conflict');
const lines = (await readFile(tracePath, 'utf-8'))
.trim()
.split('\n')
.map((line) => JSON.parse(line));
expect(lines.map((line) => line.event)).toEqual(['apply_patch_started', 'apply_patch_failed']);
expect(lines[1]).toMatchObject({
level: 'error',
phase: 'integration',
data: { unitKey: 'wu-1' },
error: { name: 'Error', message: 'patch conflict' },
});
expect(lines[1].durationMs).toBe(17);
} finally {
// Restore real timers even when an assertion above fails, so later tests do not run on fake time.
vi.useRealTimers();
}
});
it('uses the documented trace path layout', () => {
expect(ingestTracePathForJob('/project/.ktx', 'job-3')).toBe('/project/.ktx/ingest-traces/job-3/trace.jsonl');
});
});
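A trace written by `FileIngestTraceWriter` is plain JSONL, so postmortem inspection needs nothing beyond line splitting and `JSON.parse`. As a minimal sketch, assuming only the fields asserted in the tests above, the hypothetical helpers `parseTraceLines` and `failedEvents` below (not part of this PR) collect every `error`-level event together with its reason:

```typescript
// Subset of the IngestTraceEvent fields this sketch reads.
interface TraceLine {
  level: 'info' | 'debug' | 'trace' | 'error';
  phase: string;
  event: string;
  durationMs?: number;
  data?: Record<string, unknown>;
  error?: { name: string; message: string; stack?: string };
}

interface FailureSummary {
  phase: string;
  event: string;
  message: string;
  durationMs?: number;
}

// Hypothetical helper: parse JSONL text (one event per line), ignoring blank lines.
function parseTraceLines(text: string): TraceLine[] {
  return text
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as TraceLine);
}

// Hypothetical helper: summarize every error-level event.
function failedEvents(events: TraceLine[]): FailureSummary[] {
  return events
    .filter((e) => e.level === 'error')
    .map((e) => {
      const reason = e.data?.reason;
      return {
        phase: e.phase,
        event: e.event,
        // traceTimed failures carry `error`; policy and conflict events put the reason in `data.reason`.
        message: e.error?.message ?? (typeof reason === 'string' ? reason : ''),
        ...(e.durationMs !== undefined ? { durationMs: e.durationMs } : {}),
      };
    });
}

const sample =
  [
    JSON.stringify({ level: 'debug', phase: 'integration', event: 'apply_patch_started', data: { unitKey: 'wu-1' } }),
    JSON.stringify({
      level: 'error',
      phase: 'integration',
      event: 'apply_patch_failed',
      durationMs: 17,
      data: { unitKey: 'wu-1' },
      error: { name: 'Error', message: 'patch conflict' },
    }),
    JSON.stringify({
      level: 'error',
      phase: 'integration',
      event: 'patch_policy_rejected',
      data: {
        unitKey: 'lookml-mismatch',
        reason: 'slDisallowed WorkUnit lookml-mismatch touched semantic-layer/c1/orders.yaml',
      },
    }),
  ].join('\n') + '\n';

for (const failure of failedEvents(parseTraceLines(sample))) {
  console.log(`${failure.phase}/${failure.event}: ${failure.message}`);
}
```

Keying on `level === 'error'` rather than on the `error` payload matters here: events such as `patch_policy_rejected` are logged at `error` level with the reason in `data`, not as a serialized exception.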


@@ -0,0 +1,158 @@
import { appendFile, mkdir } from 'node:fs/promises';
import { dirname, join } from 'node:path';
export type IngestTraceLevel = 'info' | 'debug' | 'trace' | 'error';
const TRACE_LEVEL_RANK: Record<IngestTraceLevel, number> = {
error: 0,
info: 1,
debug: 2,
trace: 3,
};
export interface IngestTraceContext {
tracePath: string;
jobId: string;
connectionId: string;
sourceKey: string;
runId?: string;
syncId?: string;
level?: IngestTraceLevel;
}
export interface IngestTraceEvent {
schemaVersion: 1;
at: string;
level: IngestTraceLevel;
jobId: string;
connectionId: string;
sourceKey: string;
runId?: string;
syncId?: string;
phase: string;
event: string;
durationMs?: number;
data?: Record<string, unknown>;
error?: {
name: string;
message: string;
stack?: string;
};
}
export interface IngestTraceWriter {
readonly tracePath: string;
readonly context: IngestTraceContext;
withContext(context: Partial<Pick<IngestTraceContext, 'runId' | 'syncId'>>): IngestTraceWriter;
event(
level: IngestTraceLevel,
phase: string,
event: string,
data?: Record<string, unknown>,
error?: unknown,
durationMs?: number,
): Promise<void>;
}
export function ingestTracePathForJob(homeDir: string, jobId: string): string {
return join(homeDir, 'ingest-traces', jobId, 'trace.jsonl');
}
function serializeError(error: unknown): IngestTraceEvent['error'] | undefined {
if (error === undefined || error === null) {
return undefined;
}
if (error instanceof Error) {
return {
name: error.name,
message: error.message,
...(error.stack ? { stack: error.stack } : {}),
};
}
return { name: 'Error', message: String(error) };
}
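// Lower rank means more severe: an event is written when it is at least as severe as the configured level.
function shouldWrite(configured: IngestTraceLevel, incoming: IngestTraceLevel): boolean {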
return TRACE_LEVEL_RANK[incoming] <= TRACE_LEVEL_RANK[configured];
}
export class FileIngestTraceWriter implements IngestTraceWriter {
readonly tracePath: string;
readonly context: IngestTraceContext;
constructor(context: IngestTraceContext) {
this.context = { ...context, level: context.level ?? 'debug' };
this.tracePath = context.tracePath;
}
withContext(context: Partial<Pick<IngestTraceContext, 'runId' | 'syncId'>>): IngestTraceWriter {
return new FileIngestTraceWriter({ ...this.context, ...context, tracePath: this.tracePath });
}
async event(
level: IngestTraceLevel,
phase: string,
event: string,
data?: Record<string, unknown>,
error?: unknown,
durationMs?: number,
): Promise<void> {
if (!shouldWrite(this.context.level ?? 'debug', level)) {
return;
}
const serializedError = serializeError(error);
const payload: IngestTraceEvent = {
schemaVersion: 1,
at: new Date().toISOString(),
level,
jobId: this.context.jobId,
connectionId: this.context.connectionId,
sourceKey: this.context.sourceKey,
...(this.context.runId ? { runId: this.context.runId } : {}),
...(this.context.syncId ? { syncId: this.context.syncId } : {}),
phase,
event,
...(durationMs !== undefined ? { durationMs } : {}),
...(data ? { data } : {}),
...(serializedError ? { error: serializedError } : {}),
};
await mkdir(dirname(this.tracePath), { recursive: true });
await appendFile(this.tracePath, `${JSON.stringify(payload)}\n`, 'utf-8');
}
}
export class NoopIngestTraceWriter implements IngestTraceWriter {
readonly tracePath = '';
readonly context: IngestTraceContext = {
tracePath: '',
jobId: '',
connectionId: '',
sourceKey: '',
level: 'error',
};
withContext(): IngestTraceWriter {
return this;
}
async event(): Promise<void> {}
}
export async function traceTimed<T>(
trace: IngestTraceWriter,
phase: string,
event: string,
data: Record<string, unknown>,
fn: () => Promise<T>,
): Promise<T> {
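await trace.event('debug', phase, `${event}_started`, data);
const started = Date.now();
let result: T;
try {
result = await fn();
} catch (error) {
// Only failures of `fn` itself are reported as `<event>_failed`.
await trace.event('error', phase, `${event}_failed`, data, error, Date.now() - started);
throw error;
}
// Emitted outside the try so that a failure to write this event is not misreported as `<event>_failed`.
await trace.event('debug', phase, `${event}_finished`, data, undefined, Date.now() - started);
return result;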
}
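Because `shouldWrite` compares ranks, a writer configured at `info` drops `debug` and `trace` events, while `error` events always pass. Tests that do not need the filesystem could use an in-memory writer with the same filter. The sketch below is hypothetical (`MemoryTraceWriter` is not in this PR); it redeclares a trimmed event shape so it runs on its own, and its event names are illustrative:

```typescript
type Level = 'info' | 'debug' | 'trace' | 'error';

// Same ordering as TRACE_LEVEL_RANK: lower rank means more severe.
const RANK: Record<Level, number> = { error: 0, info: 1, debug: 2, trace: 3 };

interface MemoryEvent {
  level: Level;
  phase: string;
  event: string;
  durationMs?: number;
  data?: Record<string, unknown>;
  errorMessage?: string;
}

// Hypothetical in-memory stand-in for FileIngestTraceWriter (not part of this PR).
class MemoryTraceWriter {
  readonly events: MemoryEvent[] = [];
  private readonly configured: Level;

  constructor(configured: Level = 'debug') {
    this.configured = configured;
  }

  async event(
    level: Level,
    phase: string,
    event: string,
    data?: Record<string, unknown>,
    error?: unknown,
    durationMs?: number,
  ): Promise<void> {
    // Keep an event only when it is at least as severe as the configured level.
    if (RANK[level] > RANK[this.configured]) {
      return;
    }
    const errorMessage =
      error === undefined || error === null ? undefined : error instanceof Error ? error.message : String(error);
    this.events.push({
      level,
      phase,
      event,
      ...(durationMs !== undefined ? { durationMs } : {}),
      ...(data ? { data } : {}),
      ...(errorMessage !== undefined ? { errorMessage } : {}),
    });
  }
}

// Illustrative event names; only the levels matter here.
async function keptAtInfo(): Promise<string[]> {
  const trace = new MemoryTraceWriter('info');
  await trace.event('trace', 'snapshot', 'input_snapshot');
  await trace.event('debug', 'integration', 'patch_apply_started');
  await trace.event('info', 'integration', 'patch_noop_accepted');
  await trace.event('error', 'integration', 'patch_apply_failed', { unitKey: 'wu-1' }, new Error('patch conflict'), 17);
  return trace.events.map((e) => `${e.level}:${e.event}`);
}

keptAtInfo().then((kept) => console.log(kept));
```

Keeping the filter inside `event` mirrors `FileIngestTraceWriter`, so call sites can emit `trace`-level detail unconditionally and let the configured level decide what is persisted.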


@@ -0,0 +1,97 @@
import { describe, expect, it } from 'vitest';
import { assertPatchAllowedForWorkUnit, parsePatchTouchedPaths, textArtifactRoots } from './git-patch.js';
describe('isolated diff patch contract', () => {
it('parses touched paths from no-rename git patches', () => {
const patch = [
'diff --git a/wiki/global/a.md b/wiki/global/a.md',
'index 1111111..2222222 100644',
'--- a/wiki/global/a.md',
'+++ b/wiki/global/a.md',
'@@ -1 +1 @@',
'-old',
'+new',
'diff --git a/semantic-layer/c1/orders.yaml b/semantic-layer/c1/orders.yaml',
'new file mode 100644',
'--- /dev/null',
'+++ b/semantic-layer/c1/orders.yaml',
'@@ -0,0 +1 @@',
'+name: orders',
'',
].join('\n');
expect(parsePatchTouchedPaths(patch)).toEqual([
{
path: 'wiki/global/a.md',
oldPath: 'wiki/global/a.md',
newPath: 'wiki/global/a.md',
mode: '100644',
binary: false,
},
{
path: 'semantic-layer/c1/orders.yaml',
oldPath: 'semantic-layer/c1/orders.yaml',
newPath: 'semantic-layer/c1/orders.yaml',
mode: '100644',
binary: false,
},
]);
});
it('rejects semantic-layer paths for slDisallowed work units', () => {
const patch = 'diff --git a/semantic-layer/c1/orders.yaml b/semantic-layer/c1/orders.yaml\nindex 1..2 100644\n';
expect(() =>
assertPatchAllowedForWorkUnit({
unitKey: 'lookml-mismatch',
patch,
slDisallowed: true,
}),
).toThrow(/slDisallowed WorkUnit lookml-mismatch touched semantic-layer\/c1\/orders.yaml/);
});
it('rejects semantic-layer paths outside allowed target connections', () => {
const patch =
'diff --git a/semantic-layer/finance/orders.yaml b/semantic-layer/finance/orders.yaml\nindex 1..2 100644\n';
expect(() =>
assertPatchAllowedForWorkUnit({
unitKey: 'wu-finance',
patch,
slDisallowed: false,
allowedTargetConnectionIds: new Set(['warehouse']),
}),
).toThrow(
/semantic-layer target connection not allowed: semantic-layer\/finance\/orders.yaml \(finance\); allowed: warehouse/,
);
});
it('rejects executable and binary changes under known text artifact roots', () => {
expect(textArtifactRoots).toEqual(['wiki/', 'semantic-layer/']);
const executablePatch =
'diff --git a/wiki/global/a.md b/wiki/global/a.md\nold mode 100644\nnew mode 100755\nindex 1..2\n';
expect(() =>
assertPatchAllowedForWorkUnit({
unitKey: 'wu-1',
patch: executablePatch,
slDisallowed: false,
}),
).toThrow(/unexpected executable mode under wiki\/global\/a.md/);
const binaryPatch = [
'diff --git a/semantic-layer/c1/orders.yaml b/semantic-layer/c1/orders.yaml',
'index 1111111..2222222 100644',
'GIT binary patch',
'literal 0',
'',
].join('\n');
expect(() =>
assertPatchAllowedForWorkUnit({
unitKey: 'wu-2',
patch: binaryPatch,
slDisallowed: false,
}),
).toThrow(/unexpected binary patch under semantic-layer\/c1\/orders.yaml/);
});
});


@@ -0,0 +1,101 @@
import { assertSemanticLayerTargetPathsAllowed } from '../semantic-layer-target-policy.js';
export const textArtifactRoots = ['wiki/', 'semantic-layer/'] as const;
export interface PatchTouchedPath {
path: string;
oldPath: string;
newPath: string;
mode: string | null;
binary: boolean;
}
export interface PatchPolicyInput {
unitKey: string;
patch: string;
slDisallowed: boolean;
allowedTargetConnectionIds?: ReadonlySet<string>;
}
function stripPrefix(path: string): string {
return path.replace(/^[ab]\//, '');
}
function isTextArtifactPath(path: string): boolean {
return textArtifactRoots.some((root) => path.startsWith(root));
}
export function parsePatchTouchedPaths(patch: string): PatchTouchedPath[] {
const lines = patch.split('\n');
const entries: PatchTouchedPath[] = [];
let current: PatchTouchedPath | null = null;
const pushCurrent = () => {
if (current) {
entries.push(current);
}
};
for (const line of lines) {
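// Match the no-rename form `a/<path> b/<path>` first: a greedy split on the last space
// would misparse paths that contain spaces. Fall back to the plain split otherwise.
const samePathMatch = /^diff --git a\/(.+) b\/\1$/.exec(line);
const diffMatch = samePathMatch
? [line, `a/${samePathMatch[1]}`, `b/${samePathMatch[1]}`]
: /^diff --git (.+) (.+)$/.exec(line);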
if (diffMatch) {
pushCurrent();
const oldPath = stripPrefix(diffMatch[1] ?? '');
const newPath = stripPrefix(diffMatch[2] ?? '');
current = {
path: newPath === '/dev/null' ? oldPath : newPath,
oldPath,
newPath,
mode: null,
binary: false,
};
continue;
}
if (!current) {
continue;
}
const indexMode = /^index [0-9a-f]+\.\.[0-9a-f]+(?: ([0-7]{6}))?$/.exec(line);
if (indexMode?.[1]) {
current.mode = indexMode[1];
}
const newMode = /^new mode ([0-7]{6})$/.exec(line);
if (newMode) {
current.mode = newMode[1] ?? current.mode;
}
const newFileMode = /^new file mode ([0-7]{6})$/.exec(line);
if (newFileMode) {
current.mode = newFileMode[1] ?? current.mode;
}
if (line === 'GIT binary patch' || line.startsWith('Binary files ')) {
current.binary = true;
}
}
pushCurrent();
return entries;
}
export function assertPatchAllowedForWorkUnit(input: PatchPolicyInput): PatchTouchedPath[] {
const touched = parsePatchTouchedPaths(input.patch);
if (input.allowedTargetConnectionIds) {
assertSemanticLayerTargetPathsAllowed({
paths: touched.map((entry) => entry.path),
allowedConnectionIds: input.allowedTargetConnectionIds,
});
}
for (const entry of touched) {
if (input.slDisallowed && entry.path.startsWith('semantic-layer/')) {
throw new Error(`slDisallowed WorkUnit ${input.unitKey} touched ${entry.path}`);
}
if (!isTextArtifactPath(entry.path)) {
continue;
}
if (entry.binary) {
throw new Error(`unexpected binary patch under ${entry.path}`);
}
if (entry.mode && entry.mode !== '100644') {
throw new Error(`unexpected executable mode under ${entry.path}: ${entry.mode}`);
}
}
return touched;
}
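`assertPatchAllowedForWorkUnit` delegates target scoping to `assertSemanticLayerTargetPathsAllowed`, whose implementation is not part of this diff. Judging only from the error message asserted in the tests, a check of this kind reads the connection id from the second segment of `semantic-layer/<connectionId>/...` paths. The following is a hypothetical reconstruction under that assumption; the name `assertSlTargetsAllowedSketch` and the `, ` separator in the allowed list are illustrative:

```typescript
// Hypothetical reconstruction; the real helper lives in semantic-layer-target-policy.ts (not shown).
function assertSlTargetsAllowedSketch(paths: readonly string[], allowed: ReadonlySet<string>): void {
  // Sorted so the error message is deterministic regardless of Set insertion order.
  const allowedList = [...allowed].sort().join(', ');
  for (const path of paths) {
    // Only paths shaped like semantic-layer/<connectionId>/... name a target connection.
    const match = /^semantic-layer\/([^/]+)\//.exec(path);
    const connectionId = match?.[1];
    if (connectionId === undefined) {
      continue;
    }
    if (!allowed.has(connectionId)) {
      throw new Error(
        `semantic-layer target connection not allowed: ${path} (${connectionId}); allowed: ${allowedList}`,
      );
    }
  }
}

try {
  assertSlTargetsAllowedSketch(['wiki/global/a.md', 'semantic-layer/finance/orders.yaml'], new Set(['warehouse']));
} catch (error) {
  console.log(error instanceof Error ? error.message : String(error));
}
```

This sketch ignores a file placed directly under `semantic-layer/` with no connection segment; whether the real helper rejects such paths is not visible here.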


@@ -0,0 +1,404 @@
import { mkdir, mkdtemp, readFile, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it, vi } from 'vitest';
import { GitService } from '../../core/index.js';
import { FileIngestTraceWriter } from '../ingest-trace.js';
import { integrateWorkUnitPatch } from './patch-integrator.js';
async function makeRepo() {
const homeDir = await mkdtemp(join(tmpdir(), 'ktx-integrate-'));
const configDir = join(homeDir, 'config');
const git = new GitService({
storage: { configDir, homeDir },
git: {
userName: 'System User',
userEmail: 'system@example.com',
bootstrapMessage: 'init',
bootstrapAuthor: 'system',
bootstrapAuthorEmail: 'system@example.com',
},
});
await git.onModuleInit();
await mkdir(join(configDir, 'wiki/global'), { recursive: true });
await writeFile(join(configDir, 'wiki/global/a.md'), 'old\n');
await git.commitFiles(['wiki/global/a.md'], 'base', 'System User', 'system@example.com');
return { homeDir, configDir, git, baseSha: await git.revParseHead() };
}
describe('integrateWorkUnitPatch', () => {
it('applies a clean patch, runs semantic gates, and commits accepted changes', async () => {
const { homeDir, configDir, git, baseSha } = await makeRepo();
const childDir = join(homeDir, 'child');
await git.addWorktree(childDir, 'child', baseSha);
const childGit = git.forWorktree(childDir);
await writeFile(join(childDir, 'wiki/global/a.md'), 'new\n');
await childGit.commitFiles(['wiki/global/a.md'], 'edit', 'System User', 'system@example.com');
const patchPath = join(homeDir, 'patches/wu.patch');
await childGit.writeBinaryNoRenamePatch(baseSha, 'HEAD', patchPath);
const trace = new FileIngestTraceWriter({
tracePath: join(homeDir, '.ktx/ingest-traces/job-1/trace.jsonl'),
jobId: 'job-1',
connectionId: 'c1',
sourceKey: 'fake',
level: 'trace',
});
const result = await integrateWorkUnitPatch({
unitKey: 'wu-1',
patchPath,
integrationGit: git,
trace,
author: { name: 'KTX Test', email: 'system@ktx.local' },
validateAppliedTree: vi.fn().mockResolvedValue(undefined),
slDisallowed: false,
allowedTargetConnectionIds: new Set(['c1']),
});
expect(result.status).toBe('accepted');
await expect(readFile(join(configDir, 'wiki/global/a.md'), 'utf-8')).resolves.toBe('new\n');
await expect(readFile(trace.tracePath, 'utf-8')).resolves.toContain('patch_apply_finished');
});
it('rolls back and classifies semantic conflicts', async () => {
const { homeDir, configDir, git, baseSha } = await makeRepo();
const childDir = join(homeDir, 'child-semantic');
await git.addWorktree(childDir, 'child-semantic', baseSha);
const childGit = git.forWorktree(childDir);
await writeFile(join(childDir, 'wiki/global/a.md'), 'bad\n');
await childGit.commitFiles(['wiki/global/a.md'], 'bad edit', 'System User', 'system@example.com');
const patchPath = join(homeDir, 'patches/bad.patch');
await childGit.writeBinaryNoRenamePatch(baseSha, 'HEAD', patchPath);
const trace = new FileIngestTraceWriter({
tracePath: join(homeDir, '.ktx/ingest-traces/job-2/trace.jsonl'),
jobId: 'job-2',
connectionId: 'c1',
sourceKey: 'fake',
level: 'trace',
});
const result = await integrateWorkUnitPatch({
unitKey: 'wu-bad',
patchPath,
integrationGit: git,
trace,
author: { name: 'KTX Test', email: 'system@ktx.local' },
validateAppliedTree: vi.fn().mockRejectedValue(new Error('final artifact gates failed')),
slDisallowed: false,
allowedTargetConnectionIds: new Set(['c1']),
});
expect(result.status).toBe('semantic_conflict');
await expect(readFile(join(configDir, 'wiki/global/a.md'), 'utf-8')).resolves.toBe('old\n');
});
it('classifies slDisallowed patch policy failures as traced textual conflicts', async () => {
const { homeDir, configDir, git, baseSha } = await makeRepo();
await mkdir(join(configDir, 'semantic-layer/c1'), { recursive: true });
await git.commitFiles(['semantic-layer/c1'], 'empty sl dir', 'System User', 'system@example.com');
const childDir = join(homeDir, 'child-policy');
await git.addWorktree(childDir, 'child-policy', baseSha);
const childGit = git.forWorktree(childDir);
await mkdir(join(childDir, 'semantic-layer/c1'), { recursive: true });
await writeFile(join(childDir, 'semantic-layer/c1/orders.yaml'), 'name: orders\ncolumns: []\njoins: []\nmeasures: []\n');
await childGit.commitFiles(['semantic-layer/c1/orders.yaml'], 'forbidden sl', 'System User', 'system@example.com');
const patchPath = join(homeDir, 'patches/forbidden.patch');
await childGit.writeBinaryNoRenamePatch(baseSha, 'HEAD', patchPath);
const trace = new FileIngestTraceWriter({
tracePath: join(homeDir, '.ktx/ingest-traces/job-policy/trace.jsonl'),
jobId: 'job-policy',
connectionId: 'c1',
sourceKey: 'fake',
level: 'trace',
});
const result = await integrateWorkUnitPatch({
unitKey: 'lookml-mismatch',
patchPath,
integrationGit: git,
trace,
author: { name: 'KTX Test', email: 'system@ktx.local' },
validateAppliedTree: vi.fn().mockResolvedValue(undefined),
slDisallowed: true,
allowedTargetConnectionIds: new Set(['c1']),
});
expect(result).toMatchObject({
status: 'textual_conflict',
touchedPaths: ['semantic-layer/c1/orders.yaml'],
});
const rawTrace = await readFile(trace.tracePath, 'utf-8');
expect(rawTrace).toContain('patch_policy_rejected');
expect(rawTrace).toContain('slDisallowed WorkUnit lookml-mismatch touched semantic-layer/c1/orders.yaml');
});
it('classifies unauthorized semantic-layer targets as traced textual conflicts', async () => {
const { homeDir, git, baseSha } = await makeRepo();
const childDir = join(homeDir, 'child-target-policy');
await git.addWorktree(childDir, 'child-target-policy', baseSha);
const childGit = git.forWorktree(childDir);
await mkdir(join(childDir, 'semantic-layer/finance'), { recursive: true });
await writeFile(
join(childDir, 'semantic-layer/finance/orders.yaml'),
'name: orders\ncolumns: []\njoins: []\nmeasures: []\n',
);
await childGit.commitFiles(['semantic-layer/finance/orders.yaml'], 'unauthorized sl', 'System User', 'system@example.com');
const patchPath = join(homeDir, 'patches/unauthorized.patch');
await childGit.writeBinaryNoRenamePatch(baseSha, 'HEAD', patchPath);
const trace = new FileIngestTraceWriter({
tracePath: join(homeDir, '.ktx/ingest-traces/job-target-policy/trace.jsonl'),
jobId: 'job-target-policy',
connectionId: 'c1',
sourceKey: 'fake',
level: 'trace',
});
const result = await integrateWorkUnitPatch({
unitKey: 'wu-finance',
patchPath,
integrationGit: git,
trace,
author: { name: 'KTX Test', email: 'system@ktx.local' },
validateAppliedTree: vi.fn().mockResolvedValue(undefined),
slDisallowed: false,
allowedTargetConnectionIds: new Set(['warehouse']),
});
expect(result).toMatchObject({
status: 'textual_conflict',
touchedPaths: ['semantic-layer/finance/orders.yaml'],
});
const rawTrace = await readFile(trace.tracePath, 'utf-8');
expect(rawTrace).toContain('patch_policy_rejected');
expect(rawTrace).toContain('semantic-layer target connection not allowed');
expect(rawTrace).toContain('allowedTargetConnectionIds');
});
it('repairs a textual conflict through the bounded resolver and commits repaired files', async () => {
const { homeDir, configDir, git, baseSha } = await makeRepo();
await mkdir(join(configDir, 'wiki/global'), { recursive: true });
await writeFile(join(configDir, 'wiki/global/a.md'), 'base\n', 'utf-8');
await git.commitFiles(['wiki/global/a.md'], 'base page', 'System User', 'system@example.com');
const conflictBase = await git.revParseHead();
await writeFile(join(configDir, 'wiki/global/a.md'), 'accepted\n', 'utf-8');
await git.commitFiles(['wiki/global/a.md'], 'accepted edit', 'System User', 'system@example.com');
const childDir = join(homeDir, 'child-conflict');
await git.addWorktree(childDir, 'child-conflict', conflictBase);
const childGit = git.forWorktree(childDir);
await writeFile(join(childDir, 'wiki/global/a.md'), 'proposal\n', 'utf-8');
await childGit.commitFiles(['wiki/global/a.md'], 'proposal edit', 'System User', 'system@example.com');
const patchPath = join(homeDir, 'proposal.patch');
await childGit.writeBinaryNoRenamePatch(conflictBase, 'HEAD', patchPath);
const trace = new FileIngestTraceWriter({
tracePath: join(homeDir, '.ktx/ingest-traces/job-resolver/trace.jsonl'),
jobId: 'job-resolver',
connectionId: 'warehouse',
sourceKey: 'metabase',
level: 'trace',
});
const validateAppliedTree = vi.fn(async (paths: string[]) => {
expect(paths).toEqual(['wiki/global/a.md']);
await expect(readFile(join(configDir, 'wiki/global/a.md'), 'utf-8')).resolves.toBe('accepted\nproposal\n');
});
const result = await integrateWorkUnitPatch({
unitKey: 'wu-conflict',
patchPath,
integrationGit: git,
trace,
author: { name: 'System User', email: 'system@example.com' },
slDisallowed: false,
allowedTargetConnectionIds: new Set(['warehouse']),
validateAppliedTree,
resolveTextualConflict: vi.fn(async (context) => {
expect(context).toMatchObject({
unitKey: 'wu-conflict',
patchPath,
touchedPaths: ['wiki/global/a.md'],
});
await writeFile(join(configDir, 'wiki/global/a.md'), 'accepted\nproposal\n', 'utf-8');
return {
status: 'repaired' as const,
attempts: 1,
changedPaths: ['wiki/global/a.md'],
};
}),
});
expect(result).toMatchObject({
status: 'accepted',
touchedPaths: ['wiki/global/a.md'],
textualResolution: {
status: 'repaired',
attempts: 1,
changedPaths: ['wiki/global/a.md'],
},
});
expect(validateAppliedTree).toHaveBeenCalledOnce();
await expect(readFile(join(configDir, 'wiki/global/a.md'), 'utf-8')).resolves.toBe('accepted\nproposal\n');
await expect(readFile(trace.tracePath, 'utf-8')).resolves.toContain('patch_accepted_after_textual_resolution');
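const head = await git.revParseHead();
expect(head).not.toBe(baseSha);
// The resolution commit must be the new HEAD, not merely some commit after baseSha.
expect(result.status === 'accepted' ? result.commitSha : null).toBe(head);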
});
it('keeps the pre-apply integration tree when the resolver cannot repair a textual conflict', async () => {
const { homeDir, configDir, git } = await makeRepo();
await mkdir(join(configDir, 'wiki/global'), { recursive: true });
await writeFile(join(configDir, 'wiki/global/a.md'), 'base\n', 'utf-8');
await git.commitFiles(['wiki/global/a.md'], 'base page', 'System User', 'system@example.com');
const conflictBase = await git.revParseHead();
await writeFile(join(configDir, 'wiki/global/a.md'), 'accepted\n', 'utf-8');
await git.commitFiles(['wiki/global/a.md'], 'accepted edit', 'System User', 'system@example.com');
const acceptedHead = await git.revParseHead();
const childDir = join(homeDir, 'child-conflict-fails');
await git.addWorktree(childDir, 'child-conflict-fails', conflictBase);
const childGit = git.forWorktree(childDir);
await writeFile(join(childDir, 'wiki/global/a.md'), 'proposal\n', 'utf-8');
await childGit.commitFiles(['wiki/global/a.md'], 'proposal edit', 'System User', 'system@example.com');
const patchPath = join(homeDir, 'proposal-fails.patch');
await childGit.writeBinaryNoRenamePatch(conflictBase, 'HEAD', patchPath);
const trace = new FileIngestTraceWriter({
tracePath: join(homeDir, '.ktx/ingest-traces/job-resolver-fails/trace.jsonl'),
jobId: 'job-resolver-fails',
connectionId: 'warehouse',
sourceKey: 'metabase',
level: 'trace',
});
const result = await integrateWorkUnitPatch({
unitKey: 'wu-conflict',
patchPath,
integrationGit: git,
trace,
author: { name: 'System User', email: 'system@example.com' },
slDisallowed: false,
allowedTargetConnectionIds: new Set(['warehouse']),
validateAppliedTree: vi.fn(async () => {}),
resolveTextualConflict: vi.fn(async () => ({
status: 'failed' as const,
attempts: 1,
reason: 'resolver completed without editing an allowed path',
})),
});
expect(result).toMatchObject({
status: 'textual_conflict',
textualResolution: {
status: 'failed',
attempts: 1,
reason: 'resolver completed without editing an allowed path',
},
});
expect(await git.revParseHead()).toBe(acceptedHead);
await expect(readFile(join(configDir, 'wiki/global/a.md'), 'utf-8')).resolves.toBe('accepted\n');
});
it('repairs semantic gate failures after a patch applies cleanly', async () => {
const { homeDir, configDir, git, baseSha } = await makeRepo();
const childDir = join(homeDir, 'child-semantic-repair');
await git.addWorktree(childDir, 'child-semantic-repair', baseSha);
const childGit = git.forWorktree(childDir);
await writeFile(join(childDir, 'wiki/global/a.md'), 'bad semantic ref\n');
await childGit.commitFiles(['wiki/global/a.md'], 'bad semantic edit', 'System User', 'system@example.com');
const patchPath = join(homeDir, 'patches/semantic-repair.patch');
await childGit.writeBinaryNoRenamePatch(baseSha, 'HEAD', patchPath);
const trace = new FileIngestTraceWriter({
tracePath: join(homeDir, '.ktx/ingest-traces/job-semantic-repair/trace.jsonl'),
jobId: 'job-semantic-repair',
connectionId: 'c1',
sourceKey: 'fake',
level: 'trace',
});
const validateAppliedTree = vi
.fn()
.mockRejectedValueOnce(new Error('final artifact gates failed:\na: unknown semantic-layer entity'))
.mockResolvedValueOnce(undefined);
const result = await integrateWorkUnitPatch({
unitKey: 'wu-repairable',
patchPath,
integrationGit: git,
trace,
author: { name: 'KTX Test', email: 'system@ktx.local' },
validateAppliedTree,
slDisallowed: false,
allowedTargetConnectionIds: new Set(['c1']),
repairGateFailure: vi.fn(async (context) => {
expect(context).toMatchObject({
unitKey: 'wu-repairable',
patchPath,
touchedPaths: ['wiki/global/a.md'],
});
await writeFile(join(configDir, 'wiki/global/a.md'), 'repaired semantic ref\n', 'utf-8');
return {
status: 'repaired' as const,
attempts: 1,
changedPaths: ['wiki/global/a.md'],
};
}),
});
expect(result).toMatchObject({
status: 'accepted',
touchedPaths: ['wiki/global/a.md'],
gateRepair: {
status: 'repaired',
attempts: 1,
changedPaths: ['wiki/global/a.md'],
},
});
expect(validateAppliedTree).toHaveBeenCalledTimes(2);
await expect(readFile(join(configDir, 'wiki/global/a.md'), 'utf-8')).resolves.toBe('repaired semantic ref\n');
await expect(readFile(trace.tracePath, 'utf-8')).resolves.toContain('patch_accepted_after_gate_repair');
});
it('keeps the pre-apply tree when semantic gate repair fails', async () => {
const { homeDir, configDir, git, baseSha } = await makeRepo();
const childDir = join(homeDir, 'child-semantic-repair-fails');
await git.addWorktree(childDir, 'child-semantic-repair-fails', baseSha);
const childGit = git.forWorktree(childDir);
await writeFile(join(childDir, 'wiki/global/a.md'), 'bad semantic ref\n');
await childGit.commitFiles(['wiki/global/a.md'], 'bad semantic edit', 'System User', 'system@example.com');
const patchPath = join(homeDir, 'patches/semantic-repair-fails.patch');
await childGit.writeBinaryNoRenamePatch(baseSha, 'HEAD', patchPath);
const trace = new FileIngestTraceWriter({
tracePath: join(homeDir, '.ktx/ingest-traces/job-semantic-repair-fails/trace.jsonl'),
jobId: 'job-semantic-repair-fails',
connectionId: 'c1',
sourceKey: 'fake',
level: 'trace',
});
const result = await integrateWorkUnitPatch({
unitKey: 'wu-not-repaired',
patchPath,
integrationGit: git,
trace,
author: { name: 'KTX Test', email: 'system@ktx.local' },
validateAppliedTree: vi.fn().mockRejectedValue(new Error('final artifact gates failed')),
slDisallowed: false,
allowedTargetConnectionIds: new Set(['c1']),
repairGateFailure: vi.fn(async () => ({
status: 'failed' as const,
attempts: 1,
reason: 'gate repair completed without editing an allowed path',
})),
});
expect(result).toMatchObject({
status: 'semantic_conflict',
gateRepair: {
status: 'failed',
attempts: 1,
reason: 'gate repair completed without editing an allowed path',
},
});
await expect(readFile(join(configDir, 'wiki/global/a.md'), 'utf-8')).resolves.toBe('old\n');
});
});
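Taken together, these tests pin down how `integrateWorkUnitPatch` turns stage outcomes into a result status, resetting to the pre-apply HEAD on every non-accepted branch after the apply step. The pure function below is a hypothetical summary of that decision table (`classifyIntegration` and `StageResults` are illustrative names, not part of this PR). The branch for a failed gate after a successful gate repair is cut off in this diff and is assumed to return `semantic_conflict`:

```typescript
type IntegrationStatus = 'accepted' | 'textual_conflict' | 'semantic_conflict';

// Hypothetical summary of what each integration stage reported (illustrative, not part of this PR).
interface StageResults {
  touchedPathCount: number;
  policyAllowed: boolean;
  applyClean: boolean;
  // Set only when a textual resolver is configured; consulted only when the apply fails.
  resolver?: { status: 'repaired' | 'failed'; committedChanges: boolean };
  // Semantic gate on the tree being validated: the applied patch, or the resolver's output.
  gatePassed: boolean;
  // Set only when a gate-repair hook is configured; consulted only when a clean apply fails the gate.
  gateRepair?: { status: 'repaired' | 'failed'; gatePassedAfterRepair: boolean };
}

function classifyIntegration(s: StageResults): IntegrationStatus {
  if (s.touchedPathCount === 0) {
    return 'accepted'; // empty patch: accepted without a new commit
  }
  if (!s.policyAllowed) {
    return 'textual_conflict'; // policy rejections are reported as textual conflicts
  }
  if (!s.applyClean) {
    if (!s.resolver || s.resolver.status === 'failed') {
      return 'textual_conflict';
    }
    if (!s.gatePassed) {
      return 'semantic_conflict';
    }
    // A resolver that leaves nothing to commit is treated as a textual conflict.
    return s.resolver.committedChanges ? 'accepted' : 'textual_conflict';
  }
  if (s.gatePassed) {
    return 'accepted';
  }
  if (!s.gateRepair || s.gateRepair.status === 'failed') {
    return 'semantic_conflict';
  }
  // Assumed: the truncated post-repair failure branch returns a semantic conflict.
  return s.gateRepair.gatePassedAfterRepair ? 'accepted' : 'semantic_conflict';
}

const clean: StageResults = { touchedPathCount: 1, policyAllowed: true, applyClean: true, gatePassed: true };
console.log(classifyIntegration(clean));
console.log(classifyIntegration({ ...clean, applyClean: false }));
```

Separating the classification from the git side effects in this way makes each branch checkable without building worktrees, at the cost of duplicating the control flow; the PR keeps both in one function.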


@@ -0,0 +1,321 @@
import { readFile } from 'node:fs/promises';
import type { GitService } from '../../core/index.js';
import type { FinalGateRepairResult } from '../final-gate-repair.js';
import type { IngestTraceWriter } from '../ingest-trace.js';
import { traceTimed } from '../ingest-trace.js';
import { assertPatchAllowedForWorkUnit, parsePatchTouchedPaths } from './git-patch.js';
import type { TextualConflictResolutionResult } from './textual-conflict-resolver.js';
export type PatchIntegrationTextualResolution =
| { status: 'repaired'; attempts: number; changedPaths: string[] }
| { status: 'failed'; attempts: number; reason: string };
export type PatchIntegrationResult =
| {
status: 'accepted';
commitSha: string;
touchedPaths: string[];
textualResolution?: PatchIntegrationTextualResolution;
gateRepair?: FinalGateRepairResult;
}
| {
status: 'textual_conflict';
reason: string;
touchedPaths: string[];
textualResolution?: PatchIntegrationTextualResolution;
gateRepair?: FinalGateRepairResult;
}
| {
status: 'semantic_conflict';
reason: string;
touchedPaths: string[];
textualResolution?: PatchIntegrationTextualResolution;
gateRepair?: FinalGateRepairResult;
};
export interface IntegrateWorkUnitPatchInput {
unitKey: string;
patchPath: string;
integrationGit: GitService;
trace: IngestTraceWriter;
author: { name: string; email: string };
slDisallowed: boolean;
allowedTargetConnectionIds: ReadonlySet<string>;
validateAppliedTree(touchedPaths: string[]): Promise<void>;
resolveTextualConflict?(input: {
unitKey: string;
patchPath: string;
touchedPaths: string[];
reason: string;
}): Promise<TextualConflictResolutionResult>;
repairGateFailure?(input: {
unitKey: string;
patchPath: string;
touchedPaths: string[];
reason: string;
}): Promise<FinalGateRepairResult>;
}
function errorMessage(error: unknown): string {
return error instanceof Error ? error.message : String(error);
}
export async function integrateWorkUnitPatch(input: IntegrateWorkUnitPatchInput): Promise<PatchIntegrationResult> {
const preApplyHead = await input.integrationGit.revParseHead();
const patch = await readFile(input.patchPath, 'utf-8');
const touchedPaths = parsePatchTouchedPaths(patch).map((entry) => entry.path);
// A patch with no file headers changes nothing; accept it without creating a commit.
if (touchedPaths.length === 0) {
await input.trace.event('debug', 'integration', 'patch_noop_accepted', {
unitKey: input.unitKey,
patchPath: input.patchPath,
patchBytes: Buffer.byteLength(patch),
});
return { status: 'accepted', commitSha: preApplyHead ?? '', touchedPaths };
}
try {
assertPatchAllowedForWorkUnit({
unitKey: input.unitKey,
patch,
slDisallowed: input.slDisallowed,
allowedTargetConnectionIds: input.allowedTargetConnectionIds,
});
} catch (error) {
await input.trace.event('error', 'integration', 'patch_policy_rejected', {
unitKey: input.unitKey,
patchPath: input.patchPath,
touchedPaths,
allowedTargetConnectionIds: [...input.allowedTargetConnectionIds].sort(),
reason: errorMessage(error),
});
return {
status: 'textual_conflict',
reason: errorMessage(error),
touchedPaths,
};
}
try {
await traceTimed(
input.trace,
'integration',
'patch_apply',
{ unitKey: input.unitKey, patchPath: input.patchPath, touchedPaths },
async () => {
await input.integrationGit.applyPatchFile3WayIndex(input.patchPath);
await input.integrationGit.assertWorktreeClean();
},
);
} catch (error) {
// Roll back the failed apply so the resolver (if any) starts from the last accepted tree.
if (preApplyHead) {
await input.integrationGit.resetHardTo(preApplyHead);
}
const reason = errorMessage(error);
await input.trace.event('error', 'integration', 'patch_textual_conflict', {
unitKey: input.unitKey,
patchPath: input.patchPath,
touchedPaths,
reason,
});
if (!input.resolveTextualConflict) {
return {
status: 'textual_conflict',
reason,
touchedPaths,
};
}
const textualResolution = await input.resolveTextualConflict({
unitKey: input.unitKey,
patchPath: input.patchPath,
touchedPaths,
reason,
});
if (textualResolution.status === 'failed') {
if (preApplyHead) {
await input.integrationGit.resetHardTo(preApplyHead);
}
return {
status: 'textual_conflict',
reason: textualResolution.reason,
touchedPaths,
textualResolution,
};
}
try {
await traceTimed(
input.trace,
'integration',
'semantic_gate_after_textual_resolution',
{ unitKey: input.unitKey, touchedPaths: textualResolution.changedPaths },
async () => {
await input.validateAppliedTree(textualResolution.changedPaths);
},
);
} catch (semanticError) {
if (preApplyHead) {
await input.integrationGit.resetHardTo(preApplyHead);
}
await input.trace.event('error', 'integration', 'patch_semantic_conflict_after_textual_resolution', {
unitKey: input.unitKey,
patchPath: input.patchPath,
touchedPaths: textualResolution.changedPaths,
reason: errorMessage(semanticError),
});
return {
status: 'semantic_conflict',
reason: errorMessage(semanticError),
touchedPaths: textualResolution.changedPaths,
textualResolution,
};
}
const commit = await input.integrationGit.commitFiles(
textualResolution.changedPaths,
`ingest: resolve WorkUnit ${input.unitKey} conflict`,
input.author.name,
input.author.email,
);
if (!commit.created) {
if (preApplyHead) {
await input.integrationGit.resetHardTo(preApplyHead);
}
const noChangeReason = 'textual resolver produced no committable changes';
await input.trace.event('error', 'integration', 'textual_conflict_resolver_noop', {
unitKey: input.unitKey,
patchPath: input.patchPath,
touchedPaths: textualResolution.changedPaths,
});
return {
status: 'textual_conflict',
reason: noChangeReason,
touchedPaths: textualResolution.changedPaths,
textualResolution,
};
}
await input.trace.event('debug', 'integration', 'patch_accepted_after_textual_resolution', {
unitKey: input.unitKey,
commitSha: commit.commitHash,
touchedPaths: textualResolution.changedPaths,
attempts: textualResolution.attempts,
});
return {
status: 'accepted',
commitSha: commit.commitHash,
touchedPaths: textualResolution.changedPaths,
textualResolution,
};
}
try {
await traceTimed(input.trace, 'integration', 'semantic_gate', { unitKey: input.unitKey, touchedPaths }, async () => {
await input.validateAppliedTree(touchedPaths);
});
} catch (error) {
const reason = errorMessage(error);
await input.trace.event('error', 'integration', 'patch_semantic_conflict', {
unitKey: input.unitKey,
patchPath: input.patchPath,
touchedPaths,
reason,
});
if (input.repairGateFailure) {
const gateRepair = await input.repairGateFailure({
unitKey: input.unitKey,
patchPath: input.patchPath,
touchedPaths,
reason,
});
if (gateRepair.status === 'failed') {
if (preApplyHead) {
await input.integrationGit.resetHardTo(preApplyHead);
}
return {
status: 'semantic_conflict',
reason: gateRepair.reason,
touchedPaths,
gateRepair,
};
}
try {
await traceTimed(
input.trace,
'integration',
'semantic_gate_after_gate_repair',
{ unitKey: input.unitKey, touchedPaths: gateRepair.changedPaths },
async () => {
await input.validateAppliedTree(gateRepair.changedPaths);
},
);
} catch (repairValidationError) {
if (preApplyHead) {
await input.integrationGit.resetHardTo(preApplyHead);
}
return {
status: 'semantic_conflict',
reason: errorMessage(repairValidationError),
touchedPaths: gateRepair.changedPaths,
gateRepair,
};
}
const commit = await input.integrationGit.commitFiles(
gateRepair.changedPaths,
`ingest: repair WorkUnit ${input.unitKey} gates`,
input.author.name,
input.author.email,
);
if (!commit.created) {
if (preApplyHead) {
await input.integrationGit.resetHardTo(preApplyHead);
}
return {
status: 'semantic_conflict',
reason: 'gate repair produced no committable changes',
touchedPaths: gateRepair.changedPaths,
gateRepair,
};
}
await input.trace.event('debug', 'integration', 'patch_accepted_after_gate_repair', {
unitKey: input.unitKey,
commitSha: commit.commitHash,
touchedPaths: gateRepair.changedPaths,
attempts: gateRepair.attempts,
});
return {
status: 'accepted',
commitSha: commit.commitHash,
touchedPaths: gateRepair.changedPaths,
gateRepair,
};
}
if (preApplyHead) {
await input.integrationGit.resetHardTo(preApplyHead);
}
return {
status: 'semantic_conflict',
reason,
touchedPaths,
};
}
const commit = await input.integrationGit.commitStaged(
`ingest: accept WorkUnit ${input.unitKey}`,
input.author.name,
input.author.email,
);
await input.trace.event('debug', 'integration', 'patch_accepted', {
unitKey: input.unitKey,
commitSha: commit.commitHash,
touchedPaths,
});
return { status: 'accepted', commitSha: commit.commitHash, touchedPaths };
}
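The acceptance flow above has several exits, and every failure exit taken after the patch is applied restores `preApplyHead` before returning. The sketch below is a simplified, synchronous model of that control flow, not the actual implementation. Git, tracing and the two repair agents are replaced by hypothetical callbacks (`apply`, `validate`, `reset`, `resolveTextual`, `repairGate`). The commit steps are omitted, including the exits taken when a repair produces no committable changes.

```typescript
type IntegrationStatus = 'accepted' | 'textual_conflict' | 'semantic_conflict';

interface IntegrationOutcome {
  status: IntegrationStatus;
  paths: string[];
}

// Hypothetical stand-ins for git apply, the semantic gate, the repair agents and `git reset --hard`.
interface IntegrationSteps {
  apply(): void; // throws when the patch does not apply cleanly
  validate(paths: string[]): void; // throws when the semantic gate fails
  reset(): void; // restores the pre-apply HEAD
  resolveTextual?(): string[] | null; // changed paths, or null when the resolver fails
  repairGate?(): string[] | null; // changed paths, or null when the gate repair fails
}

function gatePasses(steps: IntegrationSteps, paths: string[]): boolean {
  try {
    steps.validate(paths);
    return true;
  } catch {
    return false;
  }
}

function integrateSketch(touched: string[], steps: IntegrationSteps): IntegrationOutcome {
  try {
    steps.apply();
  } catch {
    // Textual conflict: roll back the partial apply first.
    steps.reset();
    if (!steps.resolveTextual) return { status: 'textual_conflict', paths: touched };
    const changed = steps.resolveTextual();
    if (changed === null) {
      steps.reset();
      return { status: 'textual_conflict', paths: touched };
    }
    // A textually resolved patch must still pass the gate; gate repair is not attempted on this path.
    if (!gatePasses(steps, changed)) {
      steps.reset();
      return { status: 'semantic_conflict', paths: changed };
    }
    return { status: 'accepted', paths: changed };
  }
  if (gatePasses(steps, touched)) return { status: 'accepted', paths: touched };
  // Clean apply but failed gate: try one repair, then re-run the gate on what the repair changed.
  const repaired = steps.repairGate ? steps.repairGate() : null;
  if (repaired === null || !gatePasses(steps, repaired)) {
    steps.reset();
    return { status: 'semantic_conflict', paths: repaired ?? touched };
  }
  return { status: 'accepted', paths: repaired };
}
```

As in the code above, a patch that needed textual resolution goes straight to `semantic_conflict` when the gate fails. Gate repair only runs for patches that applied cleanly.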

@@ -0,0 +1,120 @@
import { mkdir, mkdtemp, readFile, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it, vi } from 'vitest';
import { FileIngestTraceWriter } from '../ingest-trace.js';
import { resolveTextualConflict } from './textual-conflict-resolver.js';
async function makeHarness() {
const root = await mkdtemp(join(tmpdir(), 'ktx-textual-resolver-'));
const workdir = join(root, 'workdir');
const patchPath = join(root, 'failed.patch');
const trace = new FileIngestTraceWriter({
tracePath: join(root, 'trace.jsonl'),
jobId: 'job-1',
connectionId: 'warehouse',
sourceKey: 'metabase',
runId: 'run-1',
syncId: 'sync-1',
level: 'trace',
});
await mkdir(join(workdir, 'wiki/global'), { recursive: true });
await writeFile(join(workdir, 'wiki/global/account.md'), 'accepted line\n', 'utf-8');
await writeFile(
patchPath,
[
'diff --git a/wiki/global/account.md b/wiki/global/account.md',
'index 8877391..6f63f4d 100644',
'--- a/wiki/global/account.md',
'+++ b/wiki/global/account.md',
'@@ -1 +1 @@',
'-base line',
'+proposal line',
'',
].join('\n'),
'utf-8',
);
return { root, workdir, patchPath, trace };
}
describe('resolveTextualConflict', () => {
it('lets the repair agent read the failed patch and write only touched paths', async () => {
const { workdir, patchPath, trace } = await makeHarness();
const agentRunner = {
runLoop: vi.fn(async (params: any) => {
const current = await params.toolSet.read_integration_file.execute({ path: 'wiki/global/account.md' });
expect(current.structured).toEqual({ path: 'wiki/global/account.md', exists: true });
expect(current.markdown).toContain('accepted line');
const patch = await params.toolSet.read_failed_patch.execute({});
expect(patch.markdown).toContain('proposal line');
await expect(
params.toolSet.write_integration_file.execute({
path: 'wiki/global/not-allowed.md',
content: 'bad\n',
}),
).rejects.toThrow(/resolver path not allowed/);
await params.toolSet.write_integration_file.execute({
path: 'wiki/global/account.md',
content: 'accepted line\nproposal line\n',
});
return { stopReason: 'natural' as const };
}),
};
const result = await resolveTextualConflict({
agentRunner,
workdir,
unitKey: 'wu-a',
patchPath,
touchedPaths: ['wiki/global/account.md'],
trace,
reason: 'patch failed: wiki/global/account.md',
maxAttempts: 1,
stepBudget: 8,
});
expect(result).toEqual({
status: 'repaired',
attempts: 1,
changedPaths: ['wiki/global/account.md'],
});
await expect(readFile(join(workdir, 'wiki/global/account.md'), 'utf-8')).resolves.toBe(
'accepted line\nproposal line\n',
);
expect(agentRunner.runLoop).toHaveBeenCalledWith(
expect.objectContaining({
modelRole: 'repair',
stepBudget: 8,
telemetryTags: expect.objectContaining({
operationName: 'ingest-isolated-diff-textual-resolver',
jobId: 'job-1',
unitKey: 'wu-a',
}),
}),
);
});
it('fails when the repair agent completes without editing any touched path', async () => {
const { workdir, patchPath, trace } = await makeHarness();
const result = await resolveTextualConflict({
agentRunner: { runLoop: vi.fn(async () => ({ stopReason: 'natural' as const })) },
workdir,
unitKey: 'wu-a',
patchPath,
touchedPaths: ['wiki/global/account.md'],
trace,
reason: 'patch failed: wiki/global/account.md',
maxAttempts: 1,
stepBudget: 8,
});
expect(result).toEqual({
status: 'failed',
attempts: 1,
reason: 'resolver completed without editing an allowed path',
});
});
});

@@ -0,0 +1,238 @@
import { mkdir, readFile, rm, writeFile } from 'node:fs/promises';
import { dirname, join } from 'node:path';
import { z } from 'zod';
import type { AgentRunnerPort, KtxRuntimeToolSet } from '../../llm/index.js';
import type { IngestTraceWriter } from '../ingest-trace.js';
import { traceTimed } from '../ingest-trace.js';
export type TextualConflictResolutionResult =
| { status: 'repaired'; attempts: number; changedPaths: string[] }
| { status: 'failed'; attempts: number; reason: string };
export interface ResolveTextualConflictInput {
agentRunner: AgentRunnerPort;
workdir: string;
unitKey: string;
patchPath: string;
touchedPaths: string[];
trace: IngestTraceWriter;
reason: string;
maxAttempts?: number;
stepBudget?: number;
}
const readIntegrationFileSchema = z.object({
path: z.string().min(1),
});
const writeIntegrationFileSchema = z.object({
path: z.string().min(1),
content: z.string(),
});
const deleteIntegrationFileSchema = z.object({
path: z.string().min(1),
});
function normalizeRepoPath(path: string): string {
const normalized = path.replace(/\\/g, '/').replace(/^\/+/, '');
const parts = normalized.split('/').filter((part) => part.length > 0);
if (parts.length === 0 || parts.some((part) => part === '.' || part === '..')) {
throw new Error(`resolver path must be a repository-relative path: ${path}`);
}
return parts.join('/');
}
function assertAllowedPath(path: string, allowedPaths: ReadonlySet<string>): string {
const normalized = normalizeRepoPath(path);
if (!allowedPaths.has(normalized)) {
throw new Error(`resolver path not allowed: ${normalized}`);
}
return normalized;
}
async function readOptionalFile(path: string): Promise<{ exists: boolean; content: string }> {
try {
return { exists: true, content: await readFile(path, 'utf-8') };
} catch (error) {
if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
return { exists: false, content: '' };
}
throw error;
}
}
function buildResolverSystemPrompt(): string {
return `<role>
You repair one failed KTX isolated-diff patch inside the integration worktree.
</role>
<rules>
- Preserve accepted integration content that is unrelated to the failed patch.
- Incorporate the failed patch only when the patch evidence is compatible with the current file.
- Edit only paths exposed by the resolver tools.
- Prefer the smallest text edit that makes the composed artifact coherent.
- Do not create new facts that are absent from the current file or failed patch.
- Stop after writing the repaired file content.
</rules>`;
}
function buildResolverUserPrompt(input: {
unitKey: string;
patchPath: string;
touchedPaths: string[];
reason: string;
attempt: number;
maxAttempts: number;
}): string {
return `Repair isolated-diff textual conflict.
WorkUnit: ${input.unitKey}
Attempt: ${input.attempt} of ${input.maxAttempts}
Patch path: ${input.patchPath}
Touched paths:
${input.touchedPaths.map((path) => `- ${path}`).join('\n')}
Git apply failure:
${input.reason}
Use read_failed_patch first. Then read the touched integration files, write the
repaired content, and stop.`;
}
function buildToolSet(input: {
workdir: string;
patchPath: string;
allowedPaths: ReadonlySet<string>;
editedPaths: Set<string>;
}): KtxRuntimeToolSet {
return {
read_failed_patch: {
name: 'read_failed_patch',
description: 'Read the failed Git patch that could not be applied to the integration worktree.',
inputSchema: z.object({}),
execute: async () => {
const patch = await readFile(input.patchPath, 'utf-8');
return {
markdown: patch,
structured: { patchPath: input.patchPath, bytes: Buffer.byteLength(patch) },
};
},
},
read_integration_file: {
name: 'read_integration_file',
description: 'Read one allowed file from the current integration worktree.',
inputSchema: readIntegrationFileSchema,
execute: async ({ path }: z.infer<typeof readIntegrationFileSchema>) => {
const normalized = assertAllowedPath(path, input.allowedPaths);
const file = await readOptionalFile(join(input.workdir, normalized));
return {
markdown: file.exists ? file.content : `(missing file: ${normalized})`,
structured: { path: normalized, exists: file.exists },
};
},
},
write_integration_file: {
name: 'write_integration_file',
description: 'Replace one allowed integration worktree file with repaired text content.',
inputSchema: writeIntegrationFileSchema,
execute: async ({ path, content }: z.infer<typeof writeIntegrationFileSchema>) => {
const normalized = assertAllowedPath(path, input.allowedPaths);
const fullPath = join(input.workdir, normalized);
await mkdir(dirname(fullPath), { recursive: true });
await writeFile(fullPath, content, 'utf-8');
input.editedPaths.add(normalized);
return {
markdown: `Wrote ${normalized}`,
structured: { path: normalized, bytes: Buffer.byteLength(content) },
};
},
},
delete_integration_file: {
name: 'delete_integration_file',
description: 'Delete one allowed integration worktree file when the failed patch proves the deletion is correct.',
inputSchema: deleteIntegrationFileSchema,
execute: async ({ path }: z.infer<typeof deleteIntegrationFileSchema>) => {
const normalized = assertAllowedPath(path, input.allowedPaths);
await rm(join(input.workdir, normalized), { force: true });
input.editedPaths.add(normalized);
return {
markdown: `Deleted ${normalized}`,
structured: { path: normalized },
};
},
},
};
}
export async function resolveTextualConflict(
input: ResolveTextualConflictInput,
): Promise<TextualConflictResolutionResult> {
const allowedPaths = new Set(input.touchedPaths.map(normalizeRepoPath));
const maxAttempts = input.maxAttempts ?? 1;
const stepBudget = input.stepBudget ?? 12;
let lastFailure = 'resolver did not run';
for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
const editedPaths = new Set<string>();
const traceData = {
unitKey: input.unitKey,
patchPath: input.patchPath,
touchedPaths: [...allowedPaths].sort(),
attempt,
maxAttempts,
reason: input.reason,
};
const result = await traceTimed(input.trace, 'resolver', 'textual_conflict_resolver', traceData, async () =>
input.agentRunner.runLoop({
modelRole: 'repair',
systemPrompt: buildResolverSystemPrompt(),
userPrompt: buildResolverUserPrompt({
unitKey: input.unitKey,
patchPath: input.patchPath,
touchedPaths: [...allowedPaths].sort(),
reason: input.reason,
attempt,
maxAttempts,
}),
toolSet: buildToolSet({
workdir: input.workdir,
patchPath: input.patchPath,
allowedPaths,
editedPaths,
}),
stepBudget,
telemetryTags: {
operationName: 'ingest-isolated-diff-textual-resolver',
source: input.trace.context.sourceKey,
jobId: input.trace.context.jobId,
unitKey: input.unitKey,
},
}),
);
if (result.stopReason === 'error') {
lastFailure = result.error?.message ?? 'resolver agent loop errored';
await input.trace.event('error', 'resolver', 'textual_conflict_resolver_failed', traceData, result.error);
continue;
}
const changedPaths = [...editedPaths].sort();
if (changedPaths.length === 0) {
lastFailure = 'resolver completed without editing an allowed path';
await input.trace.event('error', 'resolver', 'textual_conflict_resolver_failed', {
...traceData,
reason: lastFailure,
});
continue;
}
await input.trace.event('debug', 'resolver', 'textual_conflict_resolver_repaired', {
...traceData,
changedPaths,
});
return { status: 'repaired', attempts: attempt, changedPaths };
}
return { status: 'failed', attempts: maxAttempts, reason: lastFailure };
}
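The resolver's write boundary depends on `normalizeRepoPath` and `assertAllowedPath`. Both helpers are module-private, so the sketch below copies them unchanged and shows how a few inputs are handled. The `errorOf` helper and the sample paths are illustrative only.

```typescript
// Copied unchanged from the resolver module above; both helpers are module-private there.
function normalizeRepoPath(path: string): string {
  const normalized = path.replace(/\\/g, '/').replace(/^\/+/, '');
  const parts = normalized.split('/').filter((part) => part.length > 0);
  if (parts.length === 0 || parts.some((part) => part === '.' || part === '..')) {
    throw new Error(`resolver path must be a repository-relative path: ${path}`);
  }
  return parts.join('/');
}

function assertAllowedPath(path: string, allowedPaths: ReadonlySet<string>): string {
  const normalized = normalizeRepoPath(path);
  if (!allowedPaths.has(normalized)) {
    throw new Error(`resolver path not allowed: ${normalized}`);
  }
  return normalized;
}

// Illustrative helper: the thrown message, or null when `fn` returns normally.
function errorOf(fn: () => unknown): string | null {
  try {
    fn();
    return null;
  } catch (error) {
    return error instanceof Error ? error.message : String(error);
  }
}

const allowed = new Set(['wiki/global/account.md']);
// Backslashes and leading slashes are normalized before the allowlist lookup.
const windowsStyle = assertAllowedPath('\\wiki\\global\\account.md', allowed); // 'wiki/global/account.md'
// A well-formed path outside the touched set is refused.
const outside = errorOf(() => assertAllowedPath('wiki/global/not-allowed.md', allowed));
// Dot segments are rejected outright rather than collapsed, even when they would resolve to an allowed path.
const dotted = errorOf(() => assertAllowedPath('./wiki/global/account.md', allowed));
const escaping = errorOf(() => assertAllowedPath('wiki/../../etc/passwd', allowed));
```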

@@ -0,0 +1,144 @@
import { mkdir, mkdtemp, readFile, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it, vi } from 'vitest';
import { GitService } from '../../core/index.js';
import { FileIngestTraceWriter } from '../ingest-trace.js';
import { runIsolatedWorkUnit } from './work-unit-executor.js';
async function makeGit() {
const homeDir = await mkdtemp(join(tmpdir(), 'ktx-isolated-wu-'));
const configDir = join(homeDir, 'config');
const git = new GitService({
storage: { configDir, homeDir },
git: {
userName: 'System User',
userEmail: 'system@example.com',
bootstrapMessage: 'init',
bootstrapAuthor: 'system',
bootstrapAuthorEmail: 'system@example.com',
},
});
await git.onModuleInit();
await mkdir(join(configDir, 'raw-sources/c1/fake/s'), { recursive: true });
await writeFile(join(configDir, 'raw-sources/c1/fake/s/a.json'), '{}\n');
await git.commitFiles(['raw-sources/c1/fake/s/a.json'], 'raw snapshot', 'System User', 'system@example.com');
return { homeDir, configDir, git, baseSha: await git.revParseHead() };
}
describe('runIsolatedWorkUnit', () => {
it('creates a child worktree at the ingestion base and persists a patch proposal', async () => {
const { homeDir, git, baseSha } = await makeGit();
const childDir = join(homeDir, '.worktrees/session-job-1-wu-1');
const sessionWorktreeService = {
create: vi.fn(async (_key: string, startSha: string) => {
await mkdir(join(homeDir, '.worktrees'), { recursive: true });
await git.addWorktree(childDir, 'session/job-1-wu-1', startSha);
const childGit = git.forWorktree(childDir);
return {
chatId: 'job-1-wu-1',
workdir: childDir,
branch: 'session/job-1-wu-1',
baseSha: startSha,
createdAt: new Date(),
git: childGit,
config: {},
};
}),
cleanup: vi.fn(async () => undefined),
};
const tracePath = join(homeDir, '.ktx/ingest-traces/job-1/trace.jsonl');
const trace = new FileIngestTraceWriter({
tracePath,
jobId: 'job-1',
connectionId: 'c1',
sourceKey: 'fake',
level: 'trace',
});
const result = await runIsolatedWorkUnit({
unitIndex: 0,
ingestionBaseSha: baseSha,
sessionWorktreeService: sessionWorktreeService as never,
patchDir: join(homeDir, '.ktx/ingest-patches/job-1'),
trace,
run: async (child) => {
await mkdir(join(child.workdir, 'wiki/global'), { recursive: true });
await writeFile(join(child.workdir, 'wiki/global/a.md'), '---\nsummary: A\nusage_mode: auto\n---\n\nBody\n');
await child.git.commitFiles(['wiki/global/a.md'], 'test: write wiki', 'KTX Test', 'system@ktx.local');
return {
unitKey: 'wu-1',
status: 'success',
preSha: baseSha,
postSha: await child.git.revParseHead(),
actions: [{ target: 'wiki', type: 'created', key: 'a', detail: 'A' }],
touchedSlSources: [],
};
},
workUnit: { unitKey: 'wu-1', rawFiles: ['a.json'], peerFileIndex: [], dependencyPaths: [] },
});
expect(sessionWorktreeService.create).toHaveBeenCalledWith('job-1-wu-1', baseSha);
expect(sessionWorktreeService.cleanup).toHaveBeenCalledWith(expect.any(Object), 'success');
expect(result.status).toBe('success');
if (result.status !== 'success') {
throw new Error('expected successful work unit');
}
const patchPath = result.patchPath;
if (!patchPath) {
throw new Error('expected patch path');
}
expect(patchPath).toContain('0000-wu-1.patch');
await expect(readFile(patchPath, 'utf-8')).resolves.toContain('wiki/global/a.md');
await expect(readFile(tracePath, 'utf-8')).resolves.toContain('work_unit_child_created');
});
it('removes child worktrees after failed WorkUnit outcomes are traced', async () => {
const { homeDir, git, baseSha } = await makeGit();
const childDir = join(homeDir, '.worktrees/session-job-1-wu-fail');
const sessionWorktreeService = {
create: vi.fn(async (_key: string, startSha: string) => {
await mkdir(join(homeDir, '.worktrees'), { recursive: true });
await git.addWorktree(childDir, 'session/job-1-wu-fail', startSha);
return {
chatId: 'job-1-wu-fail',
workdir: childDir,
branch: 'session/job-1-wu-fail',
baseSha: startSha,
createdAt: new Date(),
git: git.forWorktree(childDir),
config: {},
};
}),
cleanup: vi.fn(async () => undefined),
};
const trace = new FileIngestTraceWriter({
tracePath: join(homeDir, '.ktx/ingest-traces/job-1/trace.jsonl'),
jobId: 'job-1',
connectionId: 'c1',
sourceKey: 'fake',
level: 'trace',
});
const result = await runIsolatedWorkUnit({
unitIndex: 0,
ingestionBaseSha: baseSha,
sessionWorktreeService: sessionWorktreeService as never,
patchDir: join(homeDir, '.ktx/ingest-patches/job-1'),
trace,
run: async () => ({
unitKey: 'wu-fail',
status: 'failed',
reason: 'agent loop errored',
preSha: baseSha,
postSha: baseSha,
actions: [],
touchedSlSources: [],
}),
workUnit: { unitKey: 'wu-fail', rawFiles: ['a.json'], peerFileIndex: [], dependencyPaths: [] },
});
expect(result.status).toBe('failed');
expect(sessionWorktreeService.cleanup).toHaveBeenCalledWith(expect.any(Object), 'success');
});
});

@@ -0,0 +1,85 @@
import { mkdir, readFile } from 'node:fs/promises';
import { join } from 'node:path';
import type { SessionOutcome } from '../../core/index.js';
import type { IngestSessionWorktree, IngestSessionWorktreePort } from '../ports.js';
import type { WorkUnit } from '../types.js';
import type { IngestTraceWriter } from '../ingest-trace.js';
import type { WorkUnitOutcome } from '../stages/stage-3-work-units.js';
import { parsePatchTouchedPaths } from './git-patch.js';
export interface RunIsolatedWorkUnitInput {
unitIndex: number;
ingestionBaseSha: string;
sessionWorktreeService: IngestSessionWorktreePort;
patchDir: string;
trace: IngestTraceWriter;
workUnit: WorkUnit;
run(child: IngestSessionWorktree): Promise<WorkUnitOutcome>;
afterSuccess?(child: IngestSessionWorktree): Promise<void>;
}
function patchFileName(unitIndex: number, unitKey: string): string {
const safeKey = unitKey.replace(/[^a-zA-Z0-9_.-]+/g, '-');
return `${String(unitIndex).padStart(4, '0')}-${safeKey}.patch`;
}
export async function runIsolatedWorkUnit(input: RunIsolatedWorkUnitInput): Promise<WorkUnitOutcome> {
const sessionKey = `${input.trace.context.jobId}-${input.workUnit.unitKey}`;
let cleanupOutcome: SessionOutcome = 'crash';
const child = await input.sessionWorktreeService.create(sessionKey, input.ingestionBaseSha);
await input.trace.event('debug', 'work_unit', 'work_unit_child_created', {
unitKey: input.workUnit.unitKey,
unitIndex: input.unitIndex,
worktreePath: child.workdir,
baseSha: input.ingestionBaseSha,
});
try {
const outcome = await input.run(child);
if (outcome.status !== 'success') {
cleanupOutcome = 'success';
await input.trace.event('error', 'work_unit', 'work_unit_failed_before_patch', {
unitKey: input.workUnit.unitKey,
reason: outcome.reason ?? 'unknown failure',
});
return { ...outcome, childWorktreePath: child.workdir };
}
await input.afterSuccess?.(child);
await mkdir(input.patchDir, { recursive: true });
const patchPath = join(input.patchDir, patchFileName(input.unitIndex, input.workUnit.unitKey));
await child.git.writeBinaryNoRenamePatch(input.ingestionBaseSha, 'HEAD', patchPath);
const patch = await readFile(patchPath, 'utf-8');
const touched = parsePatchTouchedPaths(patch);
cleanupOutcome = 'success';
await input.trace.event('debug', 'work_unit', 'work_unit_patch_collected', {
unitKey: input.workUnit.unitKey,
patchPath,
touchedPaths: touched.map((entry) => entry.path),
patchBytes: Buffer.byteLength(patch),
});
return {
...outcome,
patchPath,
patchTouchedPaths: touched.map((entry) => entry.path),
childWorktreePath: child.workdir,
};
} catch (error) {
await input.trace.event(
'error',
'work_unit',
'work_unit_child_failed',
{ unitKey: input.workUnit.unitKey, worktreePath: child.workdir },
error,
);
cleanupOutcome = 'success';
throw error;
} finally {
await input.sessionWorktreeService.cleanup(child, cleanupOutcome);
await input.trace.event('trace', 'work_unit', 'work_unit_child_cleanup', {
unitKey: input.workUnit.unitKey,
outcome: cleanupOutcome,
worktreePath: child.workdir,
});
}
}
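`patchFileName` prefixes each patch with a zero-padded `unitIndex` and replaces any run of characters outside `[a-zA-Z0-9_.-]` in the unit key with a single `-`. The copy below shows the resulting names. The reason given for the padding, keeping a plain lexical sort of the patch directory in `unitIndex` order, is an assumption; the executor does not state it.

```typescript
// Copied unchanged from the work-unit executor above (not exported there).
function patchFileName(unitIndex: number, unitKey: string): string {
  const safeKey = unitKey.replace(/[^a-zA-Z0-9_.-]+/g, '-');
  return `${String(unitIndex).padStart(4, '0')}-${safeKey}.patch`;
}

const simple = patchFileName(0, 'wu-1'); // '0000-wu-1.patch', as asserted in the executor test
const sanitized = patchFileName(12, 'orders/v2 draft'); // '0012-orders-v2-draft.patch'

// With padding, a lexical sort follows unitIndex order (for indices up to 9999).
const padded = [patchFileName(10, 'b'), patchFileName(2, 'a')].sort(); // ['0002-a.patch', '0010-b.patch']
// Without padding it would not: '10-b.patch' sorts before '2-a.patch'.
const unpadded = ['10-b.patch', '2-a.patch'].sort(); // ['10-b.patch', '2-a.patch']
```

Distinct unit keys can sanitize to the same string (`a/b` and `a b` both become `a-b`). The index prefix still keeps the file names apart, assuming `unitIndex` is unique within a job.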

@ -694,6 +694,14 @@ describe('canonical local ingest', () => {
],
},
});
expect(result.report.body.isolatedDiff).toMatchObject({
enabled: true,
acceptedPatches: 0,
projectionSha: expect.any(String),
});
const projectedSourcePath = join(metricflowProject.projectDir, 'semantic-layer/warehouse/orders.yaml');
await expect(readFile(projectedSourcePath, 'utf-8')).resolves.toContain('name: orders');
const stagedRawPath = join(
metricflowProject.projectDir,

@ -17,6 +17,24 @@ type RuntimeWithConnectionDeps = {
};
};
type RuntimeWithSlValidationDeps = {
deps: {
slValidator: {
validateSingleSource(
deps: unknown,
connectionId: string,
sourceName: string,
): Promise<{ errors: string[]; warnings: string[] }>;
};
};
};
type RuntimeWithSettingsDeps = {
deps: {
settings: Record<string, unknown>;
};
};
function testAgentRunner(): AgentRunnerPort {
return { runLoop: vi.fn().mockResolvedValue({ stopReason: 'natural' as const }) };
}
@ -144,6 +162,77 @@ describe('createLocalBundleIngestRuntime', () => {
]);
});
it('validates manifest-backed scan sources during local ingest gates', async () => {
await project.fileStore.writeFile(
'semantic-layer/warehouse/_schema/public.yaml',
[
'tables:',
' payments:',
' table: public.payments',
' columns:',
' - name: payment_id',
' type: string',
' - name: amount',
' type: number',
'',
].join('\n'),
'ktx',
'ktx@example.com',
'Add warehouse manifest',
);
const agentRunner = testAgentRunner();
const runtime = createLocalBundleIngestRuntime({
project,
adapters: [new FakeSourceAdapter()],
agentRunner,
});
const deps = (runtime.runner as unknown as RuntimeWithSlValidationDeps).deps;
await expect(deps.slValidator.validateSingleSource(deps, 'warehouse', 'payments')).resolves.toEqual({
errors: [],
warnings: expect.any(Array),
});
});
it('does not mask malformed direct overlays with manifest-backed fallback validation', async () => {
await project.fileStore.writeFile(
'semantic-layer/warehouse/_schema/public.yaml',
[
'tables:',
' payments:',
' table: public.payments',
' columns:',
' - name: payment_id',
' type: string',
'',
].join('\n'),
'ktx',
'ktx@example.com',
'Add warehouse manifest',
);
await project.fileStore.writeFile(
'semantic-layer/warehouse/payments.yaml',
['name: payments', 'columns:', ' - [', ''].join('\n'),
'ktx',
'ktx@example.com',
'Add malformed overlay',
);
const agentRunner = testAgentRunner();
const runtime = createLocalBundleIngestRuntime({
project,
adapters: [new FakeSourceAdapter()],
agentRunner,
});
const deps = (runtime.runner as unknown as RuntimeWithSlValidationDeps).deps;
await expect(deps.slValidator.validateSingleSource(deps, 'warehouse', 'payments')).resolves.toEqual({
errors: [expect.stringContaining('invalid YAML')],
warnings: [],
});
});
it('passes project connection config to local ingest query executors', async () => {
const agentRunner = testAgentRunner();
const queryExecutor = {
@ -175,6 +264,27 @@ describe('createLocalBundleIngestRuntime', () => {
});
});
it('defaults local bundle ingest to isolated diffs without a shared-worktree fallback setting', () => {
const runtime = createLocalBundleIngestRuntime({
project,
adapters: [new FakeSourceAdapter()],
agentRunner: testAgentRunner(),
});
const settings = (runtime.runner as unknown as RuntimeWithSettingsDeps).deps.settings;
const fallbackSettingKey = ['sharedWorktree', 'SourceKeys'].join('');
expect(settings).not.toHaveProperty(fallbackSettingKey);
expect(Object.keys(settings).sort()).toEqual([
'ingestTraceLevel',
'memoryIngestionModel',
'probeRowCount',
'workUnitFailureMode',
'workUnitMaxConcurrency',
'workUnitStepBudget',
]);
});
it('accepts a debug LLM request file when constructing the default agent runner', async () => {
await writeFile(
join(project.projectDir, 'ktx.yaml'),

@ -24,7 +24,6 @@ import {
 type KtxConnectionInfo,
 type KtxQueryResult,
 SemanticLayerService,
-type SemanticLayerSource,
 type SlConnectionCatalogPort,
 SlDiscoverTool,
 SlEditSourceTool,
@@ -76,6 +75,7 @@ import { createEmitHistoricSqlEvidenceTool } from './adapters/historic-sql/evide
 import { HistoricSqlProjectionPostProcessor } from './adapters/historic-sql/post-processor.js';
 import { ContextEvidenceIndexService, SqliteContextEvidenceStore } from './context-evidence/index.js';
 import { DiffSetService } from './diff-set.service.js';
+import { ingestTracePathForJob, type IngestTraceLevel } from './ingest-trace.js';
 import { IngestBundleRunner } from './ingest-bundle.runner.js';
 import { PageTriageService } from './page-triage/index.js';
 import { createWarehouseVerificationTools } from './tools/warehouse-verification/index.js';
@@ -96,6 +96,12 @@ const promptsDir = fileURLToPath(new URL('../../prompts', import.meta.url));
 const skillsDir = fileURLToPath(new URL('../../skills', import.meta.url));
 const LOCAL_AUTHOR = { name: 'KTX Local', email: 'local@ktx.local' };
 const LOCAL_SHAPE_WARNING = 'Local ingest validates semantic-layer YAML shape only.';
+const INGEST_TRACE_LEVELS = new Set<IngestTraceLevel>(['error', 'info', 'debug', 'trace']);
+function ingestTraceLevelFromEnv(env: NodeJS.ProcessEnv = process.env): IngestTraceLevel {
+const raw = env.KTX_INGEST_TRACE_LEVEL;
+return raw && INGEST_TRACE_LEVELS.has(raw as IngestTraceLevel) ? (raw as IngestTraceLevel) : 'debug';
+}
 export interface CreateLocalBundleIngestRuntimeOptions {
 project: KtxLocalProject;
@@ -151,6 +157,10 @@ class LocalIngestStorage implements IngestStoragePort {
 resolveTranscriptDir(jobId: string): string {
 return join(this.project.projectDir, '.ktx/ingest-transcripts', jobId);
 }
+resolveTracePath(jobId: string): string {
+return ingestTracePathForJob(this.homeDir, jobId);
+}
 }
 class LocalIngestLock implements IngestLockPort {
@@ -237,22 +247,63 @@ class LocalSlPythonPort implements SlPythonPort {
 }
 class LocalShapeOnlySlValidator implements SlValidatorPort<SlValidationDeps> {
+private validateParsedSource(sourceName: string, parsed: Record<string, unknown>) {
+const isOverlay = parsed.table == null && parsed.sql == null;
+const result = (isOverlay ? sourceOverlaySchema : sourceDefinitionSchema).safeParse(parsed);
+return result.success
+? { errors: [], warnings: [LOCAL_SHAPE_WARNING] }
+: {
+errors: result.error.issues.map(
+(issue) => `${sourceName}: ${issue.path.join('.') || 'source'} ${issue.message}`,
+),
+warnings: [],
+};
+}
+private async validateComposedSource(
+deps: SlValidationDeps,
+connectionId: string,
+sourceName: string,
+readError: unknown,
+) {
+try {
+const { sources, loadErrors } = await deps.semanticLayerService.loadAllSources(connectionId);
+const source = sources.find((candidate) => candidate.name === sourceName);
+if (source) {
+return this.validateParsedSource(sourceName, source as unknown as Record<string, unknown>);
+}
+const detail =
+loadErrors.length > 0
+? loadErrors.join('; ')
+: readError instanceof Error
+? readError.message
+: String(readError);
+return { errors: [`${sourceName}: ${detail}`], warnings: [] };
+} catch (fallbackError) {
+return {
+errors: [`${sourceName}: ${fallbackError instanceof Error ? fallbackError.message : String(fallbackError)}`],
+warnings: [],
+};
+}
+}
 async validateSingleSource(deps: SlValidationDeps, connectionId: string, sourceName: string) {
+let content: string;
 try {
 const file = await deps.semanticLayerService.readSourceFile(connectionId, sourceName);
-const parsed = YAML.parse(file.content) as SemanticLayerSource;
-const isOverlay = parsed.table == null && parsed.sql == null;
-const result = (isOverlay ? sourceOverlaySchema : sourceDefinitionSchema).safeParse(parsed);
-return result.success
-? { errors: [], warnings: [LOCAL_SHAPE_WARNING] }
-: {
-errors: result.error.issues.map(
-(issue) => `${sourceName}: ${issue.path.join('.') || 'source'} ${issue.message}`,
-),
-warnings: [],
-};
+content = file.content;
 } catch (error) {
-return { errors: [`${sourceName}: ${error instanceof Error ? error.message : String(error)}`], warnings: [] };
+return this.validateComposedSource(deps, connectionId, sourceName, error);
 }
+try {
+const parsed = YAML.parse(content) as unknown as Record<string, unknown>;
+return this.validateParsedSource(sourceName, parsed);
+} catch (error) {
+return {
+errors: [`${sourceName}: invalid YAML — ${error instanceof Error ? error.message : String(error)}`],
+warnings: [],
+};
+}
 }
 }
@@ -671,6 +722,7 @@ export function createLocalBundleIngestRuntime(
 workUnitMaxConcurrency: options.project.config.ingest.workUnits.maxConcurrency,
 workUnitStepBudget: options.project.config.ingest.workUnits.stepBudget,
 workUnitFailureMode: options.project.config.ingest.workUnits.failureMode,
+ingestTraceLevel: ingestTraceLevelFromEnv(),
 },
 skillsRegistry: new SkillsRegistryService({ skillsDir, logger }),
 promptService,

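`ingestTraceLevelFromEnv` accepts only the four exact level names and falls back to `debug` for anything else. The variant below replaces the `as IngestTraceLevel` casts with a type guard and should behave identically. Taking `env` as a plain record instead of `NodeJS.ProcessEnv` is a simplification for this sketch. The calls show that an unknown or differently cased value is silently ignored rather than rejected.

```typescript
// Same four levels as INGEST_TRACE_LEVELS above.
type IngestTraceLevel = 'error' | 'info' | 'debug' | 'trace';
const TRACE_LEVELS: readonly IngestTraceLevel[] = ['error', 'info', 'debug', 'trace'];

function isIngestTraceLevel(value: string): value is IngestTraceLevel {
  return (TRACE_LEVELS as readonly string[]).includes(value);
}

// `env` is a plain record here so the sketch has no @types/node dependency.
function ingestTraceLevelFromEnv(env: Record<string, string | undefined>): IngestTraceLevel {
  const raw = env['KTX_INGEST_TRACE_LEVEL'];
  return raw !== undefined && isIngestTraceLevel(raw) ? raw : 'debug';
}

const levels = [
  ingestTraceLevelFromEnv({}), // unset -> 'debug'
  ingestTraceLevelFromEnv({ KTX_INGEST_TRACE_LEVEL: 'trace' }), // 'trace'
  ingestTraceLevelFromEnv({ KTX_INGEST_TRACE_LEVEL: 'TRACE' }), // case-sensitive -> 'debug'
  ingestTraceLevelFromEnv({ KTX_INGEST_TRACE_LEVEL: 'verbose' }), // unknown -> 'debug'
];
```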
@ -21,6 +21,7 @@ function snapshot(overrides: Partial<MemoryFlowReplayInput> = {}): MemoryFlowRep
{ type: 'raw_snapshot_written', syncId: 'sync-1', rawFileCount: 2 },
{ type: 'diff_computed', added: 1, modified: 1, deleted: 0, unchanged: 0 },
{ type: 'chunks_planned', chunkCount: 1, workUnitCount: 1, evictionCount: 0 },
{ type: 'stage_progress', stage: 'integration', percent: 80, message: 'Integrating 1/1 patches: orders' },
{ type: 'work_unit_started', unitKey: 'orders', skills: ['wiki_capture'], stepBudget: 40 },
{ type: 'work_unit_step', unitKey: 'orders', stepIndex: 1, stepBudget: 40 },
{ type: 'candidate_action', unitKey: 'orders', target: 'wiki', action: 'created', key: 'wiki/orders.md' },

@ -53,6 +53,23 @@ export const memoryFlowEventSchema = z.discriminatedUnion('type', [
stage: z.enum(['source', 'chunks', 'workUnits', 'actions', 'gates', 'saved']),
reason: z.string().min(1),
}),
eventSchema({
type: z.literal('stage_progress'),
stage: z.enum([
'source',
'integration',
'reconciliation',
'post_processor',
'wiki_sl_ref_repair',
'final_gates',
'save',
'provenance',
'report',
]),
percent: z.number().min(0).max(100),
message: z.string().min(1),
transient: z.boolean().optional(),
}),
eventSchema({
type: z.literal('work_unit_started'),
unitKey: z.string().min(1),

@ -44,6 +44,22 @@ type MemoryFlowEventPayload =
stage: MemoryFlowColumnId;
reason: string;
}
| {
type: 'stage_progress';
stage:
| 'source'
| 'integration'
| 'reconciliation'
| 'post_processor'
| 'wiki_sl_ref_repair'
| 'final_gates'
| 'save'
| 'provenance'
| 'report';
percent: number;
message: string;
transient?: boolean;
}
| {
type: 'work_unit_started';
unitKey: string;
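A `stage_progress` event carries a stage name, a `percent` that the schema limits to 0 to 100, and a non-empty `message`. The sketch below shows one way a caller might build such events while staying inside those bounds. The `stageProgress` helper and the per-stage percent bands are assumptions for illustration. They are not code from this change.

```typescript
// Local copy of the stage names in the `stage_progress` union above.
type StageProgressStage =
  | 'source'
  | 'integration'
  | 'reconciliation'
  | 'post_processor'
  | 'wiki_sl_ref_repair'
  | 'final_gates'
  | 'save'
  | 'provenance'
  | 'report';

interface StageProgressEvent {
  type: 'stage_progress';
  stage: StageProgressStage;
  percent: number;
  message: string;
  transient?: boolean;
}

// Hypothetical helper: interpolate `done/total` inside a stage's percent band,
// then round and clamp to the 0..100 range the zod schema accepts.
function stageProgress(
  stage: StageProgressStage,
  band: { start: number; end: number },
  done: number,
  total: number,
  message: string,
): StageProgressEvent {
  if (message.length === 0) {
    // The schema requires message.min(1), so fail early instead of emitting an invalid event.
    throw new Error('stage_progress message must be non-empty');
  }
  const fraction = total > 0 ? done / total : 1;
  const raw = band.start + (band.end - band.start) * fraction;
  const percent = Math.min(100, Math.max(0, Math.round(raw)));
  return { type: 'stage_progress', stage, percent, message };
}

// Assumed band: integration spans 60..80 percent of the run.
const integrationEvent = stageProgress('integration', { start: 60, end: 80 }, 1, 1, 'Integrating 1/1 patches: orders');
console.log(integrationEvent.percent); // 80
```

Giving each stage a fixed band keeps the overall percentage monotonic as a run moves from integration toward final gates. This is a design assumption of the sketch.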

@ -16,6 +16,7 @@ import type {
import type { ToolContext, ToolSession, TouchedSlSource } from '../tools/index.js';
import type { KnowledgeIndexPort, KnowledgeWikiService } from '../wiki/index.js';
import type { CanonicalPin } from './canonical-pins.js';
import type { IngestTraceLevel } from './ingest-trace.js';
import type { IngestReportSnapshot } from './reports.js';
import type {
ReconcileCandidateForPrompt,
@ -142,6 +143,7 @@ export interface IngestSettingsPort {
workUnitMaxConcurrency?: number;
workUnitStepBudget?: number;
workUnitFailureMode?: 'abort' | 'continue';
ingestTraceLevel?: IngestTraceLevel;
}
export interface IngestGitAuthor {
@ -155,6 +157,7 @@ export interface IngestStoragePort {
resolveUploadDir(uploadId: string): string;
resolvePullDir(jobId: string): string;
resolveTranscriptDir(jobId: string): string;
resolveTracePath(jobId: string): string;
}
export interface IngestCommitMessagePort {
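`IngestStoragePort.resolveTracePath(jobId)` gives each job its own trace file. The failed-report test uses `/project/.ktx/ingest-traces/job-failed/trace.jsonl`. The sketch below assumes that layout and one JSON event per line. `tracePathFor`, its job-id check, and `parseTraceJsonl` are hypothetical names and do not appear in the change.

```typescript
// Hypothetical layout inferred from the test fixture path; not the port's real implementation.
function tracePathFor(projectDir: string, jobId: string): string {
  // Assumed safety check: reject ids that could escape the traces directory.
  if (!/^[A-Za-z0-9_-]+$/.test(jobId)) {
    throw new Error(`unsafe job id for trace path: ${jobId}`);
  }
  return `${projectDir.replace(/\/+$/, '')}/.ktx/ingest-traces/${jobId}/trace.jsonl`;
}

// Parse a JSONL trace, recording 1-based line numbers of malformed entries
// instead of failing the whole read (useful for postmortems of crashed runs).
function parseTraceJsonl(text: string): { events: unknown[]; badLines: number[] } {
  const events: unknown[] = [];
  const badLines: number[] = [];
  text.split('\n').forEach((line, index) => {
    if (line.trim() === '') {
      return;
    }
    try {
      events.push(JSON.parse(line));
    } catch {
      badLines.push(index + 1);
    }
  });
  return { events, badLines };
}

console.log(tracePathFor('/project', 'job-failed'));
```

Tolerating a malformed line is a deliberate choice in this sketch. A run that fails mid-write may leave a truncated final line, and the earlier events are still useful.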

@ -206,6 +206,47 @@ describe('parseIngestReportSnapshot', () => {
expect(snapshot.body.toolTranscripts).toEqual([]);
});
it('parses failed ingest reports with trace and failure details', () => {
const snapshot = parseIngestReportSnapshot({
id: 'report-failed',
runId: 'run-failed',
jobId: 'job-failed',
connectionId: 'warehouse',
sourceKey: 'metabase',
createdAt: '2026-05-17T12:00:00.000Z',
body: {
status: 'failed',
syncId: 'sync-failed',
diffSummary: { added: 1, modified: 0, deleted: 0, unchanged: 0 },
commitSha: null,
tracePath: '/project/.ktx/ingest-traces/job-failed/trace.jsonl',
failure: {
phase: 'final_gates',
message: 'final artifact gates failed',
},
workUnits: [],
failedWorkUnits: [],
reconciliationSkipped: true,
conflictsResolved: [],
evictionsApplied: [],
unmappedFallbacks: [],
evictionInputs: [],
unresolvedCards: [],
supersededBy: null,
overrideOf: null,
provenanceRows: [],
toolTranscripts: [],
},
});
expect(snapshot.body.status).toBe('failed');
expect(snapshot.body.failure).toEqual({
phase: 'final_gates',
message: 'final artifact gates failed',
});
expect(snapshot.body.tracePath).toContain('trace.jsonl');
});
it('rejects malformed report snapshots with a concise message', () => {
const report = validReportSnapshot();
report.body.workUnits[0] = {
@ -215,4 +256,93 @@ describe('parseIngestReportSnapshot', () => {
expect(() => parseIngestReportSnapshot(report)).toThrow('Invalid ingest report snapshot');
});
it('parses isolated-diff textual resolver counters', () => {
const snapshot = parseIngestReportSnapshot({
id: 'report-1',
runId: 'run-1',
jobId: 'job-1',
connectionId: 'warehouse',
sourceKey: 'metabase',
createdAt: '2026-05-18T00:00:00.000Z',
body: {
status: 'completed',
syncId: 'sync-1',
diffSummary: { added: 0, modified: 1, deleted: 0, unchanged: 0 },
commitSha: 'abc123',
isolatedDiff: {
enabled: true,
acceptedPatches: 2,
textualConflicts: 1,
semanticConflicts: 0,
resolverAttempts: 1,
resolverRepairs: 1,
resolverFailures: 0,
},
workUnits: [],
failedWorkUnits: [],
reconciliationSkipped: true,
conflictsResolved: [],
evictionsApplied: [],
unmappedFallbacks: [],
artifactResolutions: [],
evictionInputs: [],
unresolvedCards: [],
supersededBy: null,
overrideOf: null,
provenanceRows: [],
toolTranscripts: [],
},
});
expect(snapshot.body.isolatedDiff).toMatchObject({
resolverAttempts: 1,
resolverRepairs: 1,
resolverFailures: 0,
});
});
it('parses isolated-diff gate repair counters', () => {
const snapshot = parseIngestReportSnapshot({
id: 'report-1',
runId: 'run-1',
jobId: 'job-1',
connectionId: 'warehouse',
sourceKey: 'metabase',
createdAt: '2026-05-18T00:00:00.000Z',
body: {
status: 'completed',
syncId: 'sync-1',
diffSummary: { added: 1, modified: 0, deleted: 0, unchanged: 0 },
commitSha: 'abc123',
isolatedDiff: {
enabled: true,
acceptedPatches: 1,
textualConflicts: 0,
semanticConflicts: 1,
gateRepairAttempts: 1,
gateRepairs: 1,
gateRepairFailures: 0,
},
workUnits: [],
failedWorkUnits: [],
reconciliationSkipped: true,
conflictsResolved: [],
evictionsApplied: [],
unmappedFallbacks: [],
evictionInputs: [],
unresolvedCards: [],
supersededBy: null,
overrideOf: null,
provenanceRows: [],
toolTranscripts: [],
},
});
expect(snapshot.body.isolatedDiff).toMatchObject({
gateRepairAttempts: 1,
gateRepairs: 1,
gateRepairFailures: 0,
});
});
});

@ -123,6 +123,12 @@ const sourceFetchReportSchema = z.object({
warnings: z.array(sourceFetchIssueSchema).default([]),
});
const ingestReportFailureSchema = z.object({
phase: z.string().min(1),
message: z.string().min(1),
details: z.record(z.string(), z.unknown()).optional(),
});
export const ingestReportSnapshotSchema = z
.object({
id: z.string().min(1),
@ -133,10 +139,30 @@ export const ingestReportSnapshotSchema = z
createdAt: z.string().min(1),
body: z
.object({
status: z.enum(['completed', 'failed']).optional(),
syncId: z.string().min(1),
diffSummary: ingestDiffSummarySchema,
fetch: sourceFetchReportSchema.optional(),
commitSha: z.string().nullable(),
tracePath: z.string().optional(),
failure: ingestReportFailureSchema.optional(),
isolatedDiff: z
.object({
enabled: z.boolean(),
integrationWorktreePath: z.string().optional(),
ingestionBaseSha: z.string().optional(),
projectionSha: z.string().nullable().optional(),
acceptedPatches: z.number().int().min(0),
textualConflicts: z.number().int().min(0),
semanticConflicts: z.number().int().min(0),
resolverAttempts: z.number().int().min(0).default(0),
resolverRepairs: z.number().int().min(0).default(0),
resolverFailures: z.number().int().min(0).default(0),
gateRepairAttempts: z.number().int().min(0).default(0),
gateRepairs: z.number().int().min(0).default(0),
gateRepairFailures: z.number().int().min(0).default(0),
})
.optional(),
workUnits: z.array(
z.object({
unitKey: z.string().min(1),

@ -48,11 +48,35 @@ export interface IngestReportPostProcessorOutcome {
touchedSources: TouchedSlSource[];
}
export interface IngestReportFailure {
phase: string;
message: string;
details?: Record<string, unknown>;
}
export interface IngestReportBody {
status?: 'completed' | 'failed';
syncId: string;
diffSummary: IngestDiffSummary;
fetch?: SourceFetchReport;
commitSha: string | null;
tracePath?: string;
failure?: IngestReportFailure;
isolatedDiff?: {
enabled: boolean;
integrationWorktreePath?: string;
ingestionBaseSha?: string;
projectionSha?: string | null;
acceptedPatches: number;
textualConflicts: number;
semanticConflicts: number;
resolverAttempts?: number;
resolverRepairs?: number;
resolverFailures?: number;
gateRepairAttempts?: number;
gateRepairs?: number;
gateRepairFailures?: number;
};
workUnits: IngestReportWorkUnit[];
failedWorkUnits: string[];
reconciliationSkipped: boolean;
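In `IngestReportBody`, the resolver and gate-repair counters are optional. The zod schema fills them in with `.default(0)`, so reports written before these counters existed still parse. Code that reads the TypeScript shape directly should apply the same default. The following is a minimal sketch, assuming a hypothetical `describeIsolatedDiff` formatter for CLI output.

```typescript
// Local subset of IngestReportBody['isolatedDiff']; the resolver and gate-repair counters are optional there.
interface IsolatedDiffCounters {
  enabled: boolean;
  acceptedPatches: number;
  textualConflicts: number;
  semanticConflicts: number;
  resolverAttempts?: number;
  resolverRepairs?: number;
  resolverFailures?: number;
  gateRepairAttempts?: number;
  gateRepairs?: number;
  gateRepairFailures?: number;
}

// Hypothetical one-line summary; missing counters read as 0, like the schema's .default(0).
function describeIsolatedDiff(diff: IsolatedDiffCounters | undefined): string {
  if (!diff || !diff.enabled) {
    return 'isolated diff: disabled';
  }
  const count = (value: number | undefined): number => value ?? 0;
  return [
    `patches=${diff.acceptedPatches}`,
    `textual=${diff.textualConflicts} (repaired ${count(diff.resolverRepairs)}/${count(diff.resolverAttempts)})`,
    `semantic=${diff.semanticConflicts} (gate repairs ${count(diff.gateRepairs)}/${count(diff.gateRepairAttempts)})`,
    `failures=${count(diff.resolverFailures) + count(diff.gateRepairFailures)}`,
  ].join(', ');
}

// Counters from the textual-resolver test fixture; the gate-repair fields are absent.
console.log(
  describeIsolatedDiff({
    enabled: true,
    acceptedPatches: 2,
    textualConflicts: 1,
    semanticConflicts: 0,
    resolverAttempts: 1,
    resolverRepairs: 1,
    resolverFailures: 0,
  }),
);
// patches=2, textual=1 (repaired 1/1), semantic=0 (gate repairs 0/0), failures=0
```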

@ -0,0 +1,38 @@
import { describe, expect, it } from 'vitest';
import {
assertSemanticLayerTargetPathsAllowed,
findDisallowedSemanticLayerTargetPaths,
semanticLayerConnectionIdFromPath,
} from './semantic-layer-target-policy.js';
describe('semantic-layer target policy', () => {
it('extracts connection ids from semantic-layer paths', () => {
expect(semanticLayerConnectionIdFromPath('semantic-layer/warehouse/orders.yaml')).toBe('warehouse');
expect(semanticLayerConnectionIdFromPath('a/semantic-layer/finance/orders.yaml')).toBe('finance');
expect(semanticLayerConnectionIdFromPath('wiki/global/orders.md')).toBeNull();
});
it('finds semantic-layer paths outside the allowed target connections', () => {
expect(
findDisallowedSemanticLayerTargetPaths({
paths: [
'semantic-layer/warehouse/orders.yaml',
'semantic-layer/finance/orders.yaml',
'wiki/global/orders.md',
],
allowedConnectionIds: new Set(['warehouse']),
}),
).toEqual([{ path: 'semantic-layer/finance/orders.yaml', connectionId: 'finance' }]);
});
it('throws a deterministic error for unauthorized semantic-layer targets', () => {
expect(() =>
assertSemanticLayerTargetPathsAllowed({
paths: ['semantic-layer/finance/orders.yaml', 'semantic-layer/marketing/accounts.yaml'],
allowedConnectionIds: new Set(['warehouse']),
}),
).toThrow(
/semantic-layer target connection not allowed: semantic-layer\/finance\/orders\.yaml \(finance\), semantic-layer\/marketing\/accounts\.yaml \(marketing\); allowed: warehouse/,
);
});
});

@ -0,0 +1,42 @@
export interface SemanticLayerTargetPolicyInput {
paths: readonly string[];
allowedConnectionIds: ReadonlySet<string>;
}
export interface SemanticLayerTargetPolicyViolation {
path: string;
connectionId: string;
}
export function semanticLayerConnectionIdFromPath(path: string): string | null {
const normalized = path.replace(/^[ab]\//, '');
const match = /^semantic-layer\/([^/]+)\//.exec(normalized);
return match?.[1] ?? null;
}
export function findDisallowedSemanticLayerTargetPaths(
input: SemanticLayerTargetPolicyInput,
): SemanticLayerTargetPolicyViolation[] {
return input.paths
.map((path) => ({ path, connectionId: semanticLayerConnectionIdFromPath(path) }))
.filter((entry): entry is SemanticLayerTargetPolicyViolation => {
return entry.connectionId !== null && !input.allowedConnectionIds.has(entry.connectionId);
})
.sort((left, right) => {
const byConnection = left.connectionId.localeCompare(right.connectionId);
return byConnection === 0 ? left.path.localeCompare(right.path) : byConnection;
});
}
export function assertSemanticLayerTargetPathsAllowed(input: SemanticLayerTargetPolicyInput): void {
const violations = findDisallowedSemanticLayerTargetPaths(input);
if (violations.length === 0) {
return;
}
const allowed = [...input.allowedConnectionIds].sort();
throw new Error(
`semantic-layer target connection not allowed: ${violations
.map((violation) => `${violation.path} (${violation.connectionId})`)
.join(', ')}; allowed: ${allowed.length > 0 ? allowed.join(', ') : '(none)'}`,
);
}
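`semanticLayerConnectionIdFromPath` strips an `a/` or `b/` prefix, which suggests that it receives paths taken from git patch headers. The sketch below works under that assumption. Its hypothetical helpers collect paths from `diff --git` headers and report which semantic-layer connections fall outside the allowed set. They restate the connection-id rule locally rather than importing the module.

```typescript
// Hypothetical helper: collect old/new paths from `diff --git a/<old> b/<new>` headers.
// Simplification: quoted paths (names with spaces or special characters) are skipped.
function touchedPathsFromPatch(patch: string): string[] {
  const paths = new Set<string>();
  for (const line of patch.split('\n')) {
    const match = /^diff --git a\/(\S+) b\/(\S+)$/.exec(line);
    if (match) {
      paths.add(match[1]!);
      paths.add(match[2]!);
    }
  }
  return [...paths].sort();
}

// Same rule as semanticLayerConnectionIdFromPath: optional a/ or b/ prefix, then semantic-layer/<connection>/...
function connectionIdFromPath(path: string): string | null {
  const match = /^semantic-layer\/([^/]+)\//.exec(path.replace(/^[ab]\//, ''));
  return match?.[1] ?? null;
}

// Distinct, sorted connection ids that the patch touches but the session does not allow.
function disallowedConnections(paths: string[], allowed: ReadonlySet<string>): string[] {
  const ids = paths
    .map((path) => connectionIdFromPath(path))
    .filter((id): id is string => id !== null && !allowed.has(id));
  return [...new Set(ids)].sort();
}

const examplePatch = [
  'diff --git a/semantic-layer/warehouse/orders.yaml b/semantic-layer/warehouse/orders.yaml',
  'diff --git a/semantic-layer/finance/orders.yaml b/semantic-layer/finance/orders.yaml',
  'diff --git a/wiki/global/orders.md b/wiki/global/orders.md',
].join('\n');
// Yields ['finance']: warehouse is allowed and the wiki path has no connection.
console.log(disallowedConnections(touchedPathsFromPatch(examplePatch), new Set(['warehouse'])));
```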

@ -41,6 +41,9 @@ export interface WorkUnitOutcome {
touchedSlSources: TouchedSlSource[];
slDisallowed?: boolean;
slDisallowedReason?: 'lookml_connection_mismatch';
patchPath?: string;
patchTouchedPaths?: string[];
childWorktreePath?: string;
}
export async function executeWorkUnit(deps: WorkUnitExecutionDeps, wu: WorkUnit): Promise<WorkUnitOutcome> {
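`WorkUnitOutcome` now records `patchPath` and `patchTouchedPaths` for each child-worktree run. One plausible use of the touched paths is to find files that more than one work unit edited. Those files are where textual conflicts can arise when patches are applied in sequence. The sketch below is an assumption and is not the integration code in this change.

```typescript
// Local subset of WorkUnitOutcome.
interface PatchOutcome {
  unitKey: string;
  patchTouchedPaths?: string[];
}

// Hypothetical helper: map each path to the work units whose patches touch it and
// keep only paths shared by two or more units, sorted by path.
function sharedPatchPaths(outcomes: PatchOutcome[]): Map<string, string[]> {
  const byPath = new Map<string, string[]>();
  for (const outcome of outcomes) {
    // De-duplicate within one patch so a unit is listed once per path.
    for (const path of new Set(outcome.patchTouchedPaths ?? [])) {
      const units = byPath.get(path) ?? [];
      units.push(outcome.unitKey);
      byPath.set(path, units);
    }
  }
  const shared = [...byPath].filter(([, units]) => units.length > 1);
  shared.sort(([left], [right]) => left.localeCompare(right));
  return new Map(shared);
}

// One shared path: wiki/orders.md is touched by both orders and customers.
const overlaps = sharedPatchPaths([
  { unitKey: 'orders', patchTouchedPaths: ['wiki/orders.md', 'semantic-layer/warehouse/orders.yaml'] },
  { unitKey: 'customers', patchTouchedPaths: ['wiki/customers.md', 'wiki/orders.md'] },
]);
console.log([...overlaps.keys()]);
```

Outcomes without `patchTouchedPaths`, for example failed units, simply contribute nothing.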

@ -1,4 +1,5 @@
import type { KtxEmbeddingPort } from '../core/embedding.js';
import type { SemanticLayerService } from '../sl/index.js';
import type { MemoryFlowEventSink } from './memory-flow/types.js';
export type IngestTrigger = 'upload' | 'scheduled_pull' | 'manual_resync' | 'manual_override';
@ -47,6 +48,7 @@ export interface ChunkResult {
export interface FetchContext {
connectionId: string;
sourceKey: string;
memoryFlow?: MemoryFlowEventSink;
}
type SourceFetchIssueKind =
@ -96,6 +98,26 @@ export interface ClusterWorkUnitsContext {
embedding: KtxEmbeddingPort;
}
export interface DeterministicProjectionContext {
connectionId: string;
sourceKey: string;
syncId: string;
jobId: string;
runId: string;
stagedDir: string;
workdir: string;
parseArtifacts?: unknown;
semanticLayerService: SemanticLayerService;
}
export interface ProjectionResult {
warnings: string[];
errors: string[];
touchedSources: Array<{ connectionId: string; sourceName: string }>;
changedWikiPageKeys: string[];
result?: unknown;
}
export interface SourceAdapter {
readonly source: string;
readonly skillNames: string[];
@ -109,6 +131,7 @@ export interface SourceAdapter {
listTargetConnectionIds?(stagedDir: string): Promise<string[]>;
chunk(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult>;
clusterWorkUnits?(ctx: ClusterWorkUnitsContext): Promise<WorkUnit[]>;
project?(ctx: DeterministicProjectionContext): Promise<ProjectionResult>;
describeScope?(stagedDir: string): Promise<ScopeDescriptor>;
onPullSucceeded?(ctx: {
connectionId: string;
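`SourceAdapter.project` returns a `ProjectionResult` with warnings, errors, touched semantic-layer sources, and changed wiki page keys. An adapter that runs several deterministic passes has to combine their results. The sketch below assumes a hypothetical `mergeProjectionResults` helper that concatenates diagnostics and de-duplicates the touched sources and page keys. It leaves out the optional `result` payload because that payload is typed as `unknown`.

```typescript
// Local copy of ProjectionResult without the optional, untyped `result` payload.
interface ProjectionSummary {
  warnings: string[];
  errors: string[];
  touchedSources: Array<{ connectionId: string; sourceName: string }>;
  changedWikiPageKeys: string[];
}

// Hypothetical helper: concatenate diagnostics in order and de-duplicate touched
// sources (by connection and name) and wiki page keys.
function mergeProjectionResults(results: ProjectionSummary[]): ProjectionSummary {
  const sources = new Map<string, { connectionId: string; sourceName: string }>();
  const wikiKeys = new Set<string>();
  const warnings: string[] = [];
  const errors: string[] = [];
  for (const result of results) {
    warnings.push(...result.warnings);
    errors.push(...result.errors);
    for (const source of result.touchedSources) {
      // A JSON tuple key avoids collisions if an id ever contains a separator character.
      sources.set(JSON.stringify([source.connectionId, source.sourceName]), source);
    }
    for (const key of result.changedWikiPageKeys) {
      wikiKeys.add(key);
    }
  }
  return {
    warnings,
    errors,
    touchedSources: [...sources.values()],
    changedWikiPageKeys: [...wikiKeys].sort(),
  };
}

const combined = mergeProjectionResults([
  { warnings: ['w1'], errors: [], touchedSources: [{ connectionId: 'warehouse', sourceName: 'orders' }], changedWikiPageKeys: ['orders'] },
  { warnings: [], errors: [], touchedSources: [{ connectionId: 'warehouse', sourceName: 'orders' }], changedWikiPageKeys: ['orders'] },
]);
console.log(combined.touchedSources.length); // 1
```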

@ -0,0 +1,153 @@
import { describe, expect, it } from 'vitest';
import { findInvalidWikiBodyRefs, parseWikiBodyRefs } from './wiki-body-refs.js';
const sources = [
{
name: 'mart_account_segments',
grain: ['account_id'],
columns: [
{ name: 'account_id', type: 'string' },
{ name: 'segment', type: 'string' },
],
joins: [],
measures: [{ name: 'total_contract_arr', expr: 'sum(contract_arr)' }],
segments: [{ name: 'enterprise', expr: "segment = 'enterprise'" }],
table: 'analytics.mart_account_segments',
},
];
describe('wiki body refs', () => {
it('parses only explicit inline-code body references outside fenced blocks', () => {
const body = [
'Valid `mart_account_segments.total_contract_arr` and `source:mart_account_segments`.',
'Also `warehouse/mart_account_segments.segment` and `table:analytics.mart_account_segments`.',
'Ignore prose mart_account_segments.total_contract_arr_cents.',
'Ignore `single_token`.',
'Ignore wildcard pattern `mart_nrr_quarterly.*_arr_cents`.',
'Ignore condition `users.is_internal = false`.',
'```sql',
'select `mart_account_segments.total_contract_arr_cents`',
'```',
].join('\n');
expect(parseWikiBodyRefs(body)).toEqual([
{ kind: 'sl_entity', connectionId: null, sourceName: 'mart_account_segments', entityName: 'total_contract_arr' },
{ kind: 'sl_source', connectionId: null, sourceName: 'mart_account_segments' },
{ kind: 'sl_entity', connectionId: 'warehouse', sourceName: 'mart_account_segments', entityName: 'segment' },
{ kind: 'table', connectionId: null, tableRef: 'analytics.mart_account_segments' },
]);
});
it('rejects stale inline-code semantic-layer references', async () => {
const invalid = await findInvalidWikiBodyRefs({
pageKey: 'account-segments',
body: 'ARR is documented as `mart_account_segments.total_contract_arr_cents`.',
visibleConnectionIds: ['warehouse'],
loadSources: async () => sources,
tableExists: async () => true,
});
expect(invalid).toEqual([
'account-segments: unknown semantic-layer entity mart_account_segments.total_contract_arr_cents',
]);
});
it('does not treat wildcard inline-code patterns as exact semantic-layer entity references', async () => {
const invalid = await findInvalidWikiBodyRefs({
pageKey: 'revenue-metrics-encoding',
body: 'Cents columns include `mart_nrr_quarterly.*_arr_cents` and `mart_retention_movement_breakout.*_arr_cents`.',
visibleConnectionIds: ['warehouse'],
loadSources: async () => [
{ name: 'mart_nrr_quarterly', grain: [], columns: [], joins: [], measures: [], table: 'analytics.mart_nrr_quarterly' },
{
name: 'mart_retention_movement_breakout',
grain: [],
columns: [],
joins: [],
measures: [],
table: 'analytics.mart_retention_movement_breakout',
},
],
tableExists: async () => true,
});
expect(invalid).toEqual([]);
});
it('does not treat inline-code SQL predicates as exact semantic-layer entity references', async () => {
const invalid = await findInvalidWikiBodyRefs({
pageKey: 'account-reporting-exclusions',
body: 'Exclude internal users with `users.is_internal = false` and test users with `users.is_test = false`.',
visibleConnectionIds: ['warehouse'],
loadSources: async () => [
{
name: 'users',
grain: [],
columns: [
{ name: 'is_internal', type: 'boolean' },
{ name: 'is_test', type: 'boolean' },
],
joins: [],
measures: [],
table: 'analytics.users',
},
],
tableExists: async () => true,
});
expect(invalid).toEqual([]);
});
it('validates source, dimension, segment, measure, and table references', async () => {
const invalid = await findInvalidWikiBodyRefs({
pageKey: 'account-segments',
body: [
'`mart_account_segments.total_contract_arr`',
'`mart_account_segments.segment`',
'`mart_account_segments.enterprise`',
'`source:mart_account_segments`',
'`table:analytics.mart_account_segments`',
].join('\n'),
visibleConnectionIds: ['warehouse'],
loadSources: async () => sources,
tableExists: async (_connectionId, tableRef) => tableRef === 'analytics.mart_account_segments',
});
expect(invalid).toEqual([]);
});
it('ignores two-part inline code when the source is not visible', async () => {
const invalid = await findInvalidWikiBodyRefs({
pageKey: 'engineering-notes',
body: [
'A version token like `node.v22` is not a semantic-layer reference.',
'A raw table must use `table:analytics.mart_account_segments`.',
].join('\n'),
visibleConnectionIds: ['warehouse'],
loadSources: async () => sources,
tableExists: async (_connectionId, tableRef) => tableRef === 'analytics.mart_account_segments',
});
expect(invalid).toEqual([]);
});
it('still rejects explicit missing source and table references', async () => {
const invalid = await findInvalidWikiBodyRefs({
pageKey: 'account-segments',
body: [
'`source:missing_source`',
'`warehouse/source:missing_source`',
'`table:analytics.missing_table`',
].join('\n'),
visibleConnectionIds: ['warehouse'],
loadSources: async () => sources,
tableExists: async () => false,
});
expect(invalid).toEqual([
'account-segments: unknown semantic-layer source missing_source',
'account-segments: unknown semantic-layer source warehouse/missing_source',
'account-segments: unknown raw table analytics.missing_table',
]);
});
});

@ -0,0 +1,141 @@
import type { SemanticLayerSource } from '../sl/index.js';
export type WikiBodyRef =
| { kind: 'sl_entity'; connectionId: string | null; sourceName: string; entityName: string }
| { kind: 'sl_source'; connectionId: string | null; sourceName: string }
| { kind: 'table'; connectionId: string | null; tableRef: string };
export interface WikiBodyRefValidationInput {
pageKey: string;
body: string;
visibleConnectionIds: string[];
loadSources(connectionId: string): Promise<SemanticLayerSource[]>;
tableExists(connectionId: string, tableRef: string): Promise<boolean>;
}
const inlineCodePattern = /`([^`\n]+)`/g;
function visibleLinesOutsideFences(body: string): string[] {
const lines: string[] = [];
let fenced = false;
for (const line of body.split('\n')) {
if (/^\s*```/.test(line)) {
fenced = !fenced;
continue;
}
if (!fenced) {
lines.push(line);
}
}
return lines;
}
function parseConnectionScoped(value: string): { connectionId: string | null; body: string } {
const slash = value.indexOf('/');
if (slash <= 0) {
return { connectionId: null, body: value };
}
return { connectionId: value.slice(0, slash), body: value.slice(slash + 1) };
}
function isIdentifierToken(value: string): boolean {
return /^[A-Za-z_][A-Za-z0-9_]*$/.test(value);
}
export function parseWikiBodyRefs(body: string): WikiBodyRef[] {
const refs: WikiBodyRef[] = [];
for (const line of visibleLinesOutsideFences(body)) {
for (const match of line.matchAll(inlineCodePattern)) {
const token = (match[1] ?? '').trim();
if (!token) {
continue;
}
const scoped = parseConnectionScoped(token);
if (scoped.body.startsWith('source:')) {
const sourceName = scoped.body.slice('source:'.length).trim();
if (sourceName) {
refs.push({ kind: 'sl_source', connectionId: scoped.connectionId, sourceName });
}
continue;
}
if (scoped.body.startsWith('table:')) {
const tableRef = scoped.body.slice('table:'.length).trim();
if (tableRef) {
refs.push({ kind: 'table', connectionId: scoped.connectionId, tableRef });
}
continue;
}
const parts = scoped.body.split('.');
if (parts.length === 2 && isIdentifierToken(parts[0] ?? '') && isIdentifierToken(parts[1] ?? '')) {
refs.push({
kind: 'sl_entity',
connectionId: scoped.connectionId,
sourceName: parts[0],
entityName: parts[1],
});
}
}
}
return refs;
}
function entityNames(source: SemanticLayerSource): Set<string> {
return new Set([
...(source.measures ?? []).map((measure) => measure.name),
...(source.columns ?? []).map((column) => column.name),
...(source.segments ?? []).map((segment) => segment.name),
]);
}
export async function findInvalidWikiBodyRefs(input: WikiBodyRefValidationInput): Promise<string[]> {
const errors: string[] = [];
const sourceCache = new Map<string, SemanticLayerSource[]>();
const loadSources = async (connectionId: string): Promise<SemanticLayerSource[]> => {
const cached = sourceCache.get(connectionId);
if (cached) {
return cached;
}
const sources = await input.loadSources(connectionId);
sourceCache.set(connectionId, sources);
return sources;
};
const findSource = async (
connectionIds: string[],
sourceName: string,
): Promise<{ connectionId: string; source: SemanticLayerSource } | null> => {
for (const connectionId of connectionIds) {
const source = (await loadSources(connectionId)).find((candidate) => candidate.name === sourceName);
if (source) {
return { connectionId, source };
}
}
return null;
};
for (const ref of parseWikiBodyRefs(input.body)) {
const connectionIds = ref.connectionId ? [ref.connectionId] : input.visibleConnectionIds;
if (ref.kind === 'table') {
const found = await Promise.all(connectionIds.map((connectionId) => input.tableExists(connectionId, ref.tableRef)));
if (!found.some(Boolean)) {
errors.push(`${input.pageKey}: unknown raw table ${ref.connectionId ? `${ref.connectionId}/` : ''}${ref.tableRef}`);
}
continue;
}
const found = await findSource(connectionIds, ref.sourceName);
if (!found) {
if (ref.kind === 'sl_source') {
errors.push(
`${input.pageKey}: unknown semantic-layer source ${ref.connectionId ? `${ref.connectionId}/` : ''}${ref.sourceName}`,
);
}
continue;
}
if (ref.kind === 'sl_entity' && !entityNames(found.source).has(ref.entityName)) {
errors.push(`${input.pageKey}: unknown semantic-layer entity ${ref.sourceName}.${ref.entityName}`);
}
}
return errors;
}
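`parseWikiBodyRefs` considers only inline-code spans that sit outside fenced blocks, and it classifies each span with a small set of rules. The standalone restatement below shows how those rules apply to the test cases. It is an illustration and is not exported from `wiki-body-refs.ts`. A token like `node.v22` still classifies as a candidate `sl_entity`. `findInvalidWikiBodyRefs` drops it later because it reports an unresolved source only for `source:` refs and silently skips unresolved two-part entity refs.

```typescript
type InlineTokenKind = 'sl_source' | 'table' | 'sl_entity' | 'ignored';

const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Illustrative restatement of the classification rules in parseWikiBodyRefs.
function classifyInlineToken(token: string): InlineTokenKind {
  const trimmed = token.trim();
  // An optional `<connectionId>/` prefix is split off at the first slash.
  const slash = trimmed.indexOf('/');
  const body = slash > 0 ? trimmed.slice(slash + 1) : trimmed;
  if (body.startsWith('source:')) {
    return body.slice('source:'.length).trim() ? 'sl_source' : 'ignored';
  }
  if (body.startsWith('table:')) {
    return body.slice('table:'.length).trim() ? 'table' : 'ignored';
  }
  // Exactly two identifier parts: wildcards, predicates, and single tokens fall through.
  const parts = body.split('.');
  return parts.length === 2 && parts.every((part) => IDENTIFIER.test(part)) ? 'sl_entity' : 'ignored';
}

for (const token of [
  'mart_account_segments.total_contract_arr',
  'warehouse/source:mart_account_segments',
  'table:analytics.mart_account_segments',
  'mart_nrr_quarterly.*_arr_cents',
  'users.is_internal = false',
  'node.v22',
]) {
  console.log(token, '->', classifyInlineToken(token));
}
```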

@ -78,6 +78,11 @@ describe('ClaudeCodeKtxLlmRuntime', () => {
skills: [],
plugins: [],
tools: [],
managedSettings: {
allowManagedMcpServersOnly: true,
allowedMcpServers: [],
},
strictMcpConfig: true,
allowedTools: [],
permissionMode: 'dontAsk',
persistSession: false,
@ -144,6 +149,11 @@ describe('ClaudeCodeKtxLlmRuntime', () => {
const options = query.mock.calls[0][0].options;
expect(options.allowedTools).toEqual(['mcp__ktx__load_skill']);
expect(options.managedSettings).toEqual({
allowManagedMcpServersOnly: true,
allowedMcpServers: [{ serverName: 'ktx' }],
});
expect(options.strictMcpConfig).toBe(true);
expect(await options.canUseTool('mcp__ktx__load_skill', {}, { signal: new AbortController().signal, toolUseID: '1' })).toEqual({
behavior: 'allow',
toolUseID: '1',
@ -176,6 +186,11 @@ describe('ClaudeCodeKtxLlmRuntime', () => {
skills: [],
plugins: [],
tools: [],
managedSettings: {
allowManagedMcpServersOnly: true,
allowedMcpServers: [],
},
strictMcpConfig: true,
allowedTools: [],
permissionMode: 'dontAsk',
persistSession: false,
@ -268,6 +283,11 @@ describe('ClaudeCodeKtxLlmRuntime', () => {
const options = query.mock.calls[0][0].options;
expect(options.allowedTools).toEqual(['mcp__ktx__load_skill']);
expect(options.managedSettings).toEqual({
allowManagedMcpServersOnly: true,
allowedMcpServers: [{ serverName: 'ktx' }],
});
expect(options.strictMcpConfig).toBe(true);
expect(await options.canUseTool('mcp__ktx__load_skill', {}, { signal: new AbortController().signal, toolUseID: '1' })).toEqual({
behavior: 'allow',
toolUseID: '1',
@ -334,6 +354,10 @@ describe('ClaudeCodeKtxLlmRuntime', () => {
answer: 'yes',
});
expect(objectQuery.mock.calls[0][0].options.env).toEqual(expect.objectContaining({ PATH: '/usr/bin' }));
expect(objectQuery.mock.calls[0][0].options.managedSettings).toEqual({
allowManagedMcpServersOnly: true,
allowedMcpServers: [],
});
expect(objectQuery.mock.calls[0][0].options.env).not.toEqual(
expect.objectContaining({ ANTHROPIC_API_KEY: 'sk-ant-test', AWS_PROFILE: 'prod' }), // pragma: allowlist secret
);
@ -374,6 +398,10 @@ describe('ClaudeCodeKtxLlmRuntime', () => {
telemetryTags: { operationName: 'test' },
});
expect(agentQuery.mock.calls[0][0].options.env).toEqual(expect.objectContaining({ HOME: '/Users/test' }));
expect(agentQuery.mock.calls[0][0].options.managedSettings).toEqual({
allowManagedMcpServersOnly: true,
allowedMcpServers: [{ serverName: 'ktx' }],
});
expect(agentQuery.mock.calls[0][0].options.env).not.toEqual(
expect.objectContaining({ ANTHROPIC_AUTH_TOKEN: 'token', CLAUDE_CODE_USE_VERTEX: '1' }),
);
@ -442,6 +470,11 @@ describe('ClaudeCodeKtxLlmRuntime', () => {
skills: [],
plugins: [],
tools: [],
managedSettings: {
allowManagedMcpServersOnly: true,
allowedMcpServers: [],
},
strictMcpConfig: true,
allowedTools: [],
persistSession: false,
env: expect.not.objectContaining({ ANTHROPIC_API_KEY: 'sk-ant-test' }),

@ -45,6 +45,8 @@ const BUILTIN_TOOLS = [
'TodoWrite',
];
const KTX_MCP_SERVER_NAME = 'ktx';
function isResult(message: SDKMessage): message is SDKResultMessage {
return message.type === 'result';
}
@ -113,7 +115,14 @@ function assertInitIsolation(
}
function expectedMcpServerNames(tools: KtxRuntimeToolSet | undefined): Set<string> {
return tools && Object.keys(tools).length > 0 ? new Set([KTX_MCP_SERVER_NAME]) : new Set();
}
function managedMcpSettings(serverNames: string[]): NonNullable<Options['managedSettings']> {
return {
allowManagedMcpServersOnly: true,
allowedMcpServers: serverNames.map((serverName) => ({ serverName })),
};
}
function baseOptions(input: {
@ -125,6 +134,7 @@ function baseOptions(input: {
}): Options {
const toolIds = mcpToolIds(input.tools ?? {});
const allowedToolIds = new Set(toolIds);
const expectedServerNames = [...expectedMcpServerNames(input.tools)];
return {
cwd: input.projectDir,
model: input.model,
@ -133,6 +143,8 @@ function baseOptions(input: {
skills: [],
plugins: [],
tools: [],
managedSettings: managedMcpSettings(expectedServerNames),
strictMcpConfig: true,
allowedTools: toolIds,
disallowedTools: BUILTIN_TOOLS,
canUseTool: async (toolName, _toolInput, options) =>
@ -147,7 +159,14 @@ function baseOptions(input: {
persistSession: false,
env: createKtxClaudeCodeEnv(input.env),
...(input.tools && Object.keys(input.tools).length > 0
? {
mcpServers: {
[KTX_MCP_SERVER_NAME]: createSdkMcpServer({
name: KTX_MCP_SERVER_NAME,
tools: createClaudeSdkTools(input.tools),
}),
},
}
: {}),
};
}
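MCP isolation now rests on two values derived from the same tool set: `managedSettings.allowedMcpServers` and the `mcpServers` map, which is populated only when tools exist. A hypothetical self-check like the one below could assert that these two values stay in sync. It uses a local subset type rather than the Claude Agent SDK's `Options` type, and `mcpIsolationProblems` is not part of this change.

```typescript
// Local subset of the runtime options touched by this change (not the SDK's Options type).
interface ManagedMcpSettings {
  allowManagedMcpServersOnly: boolean;
  allowedMcpServers: Array<{ serverName: string }>;
}

interface McpOptionsSubset {
  managedSettings: ManagedMcpSettings;
  strictMcpConfig: boolean;
  mcpServers?: Record<string, unknown>;
}

// Hypothetical invariant check: both isolation flags are on, and the allow-list
// names exactly the configured MCP servers (both empty when no tools are passed).
function mcpIsolationProblems(options: McpOptionsSubset): string[] {
  const problems: string[] = [];
  if (!options.managedSettings.allowManagedMcpServersOnly) {
    problems.push('allowManagedMcpServersOnly must be true');
  }
  if (!options.strictMcpConfig) {
    problems.push('strictMcpConfig must be true');
  }
  const allowed = options.managedSettings.allowedMcpServers.map((server) => server.serverName).sort();
  const configured = Object.keys(options.mcpServers ?? {}).sort();
  if (allowed.join(',') !== configured.join(',')) {
    problems.push(`allow-list [${allowed.join(', ')}] does not match configured servers [${configured.join(', ')}]`);
  }
  return problems;
}

const withTools = mcpIsolationProblems({
  managedSettings: { allowManagedMcpServersOnly: true, allowedMcpServers: [{ serverName: 'ktx' }] },
  strictMcpConfig: true,
  mcpServers: { ktx: {} },
});
console.log(withTools.length); // 0
```

Deriving both values from `expectedMcpServerNames` and the shared `KTX_MCP_SERVER_NAME` constant, as the diff does, should make such drift unlikely. A check like this would only catch later regressions.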

@ -99,6 +99,27 @@ describe('SlEditSourceTool — session gating', () => {
);
});
it('rejects session-scoped edits outside allowed target connections', async () => {
const { tool } = makeTool();
const session = makeSession({
allowedConnectionNames: new Set(['warehouse']),
});
const context: ToolContext = { ...baseContext, session };
const result = await tool.call(
{
connectionId: 'finance',
sourceName: 'orders',
yaml_edits: [{ oldText: 'measures: []', newText: 'measures: []' }],
} as any,
context,
);
expect(result.structured.success).toBe(false);
expect(result.markdown).toContain('connectionId "finance" is outside this ingest session');
expect(session.actions).toEqual([]);
});
it('indexes normally when no session is present', async () => {
const { tool, slSearchService } = makeTool();
const result = await tool.call(

@ -1,6 +1,12 @@
import YAML from 'yaml';
import { z } from 'zod';
import {
addTouchedSlSource,
type ToolContext,
type ToolOutput,
validateActionRawPaths,
validateActionTargetConnection,
} from '../../tools/index.js';
import { applySqlEdits } from '../../tools/sql-edit-replacer.js';
import { normalizeSemanticLayerDescriptions } from '../description-normalization.js';
import type { SemanticLayerSource } from '../types.js';
@ -79,6 +85,10 @@ If no source exists yet, use sl_write_source instead — this tool will reject t
const semanticLayerService = context.session?.semanticLayerService ?? this.semanticLayerService;
const skipIndex = context.session?.isWorktreeScoped === true;
const targetConnectionValidation = validateActionTargetConnection(context.session, connectionId);
if (!targetConnectionValidation.ok) {
return this.buildOutput(false, [targetConnectionValidation.error], sourceName);
}
const rawPathValidation = validateActionRawPaths(context.session, input.rawPaths);
if (!rawPathValidation.ok) {
return this.buildOutput(false, [rawPathValidation.error], sourceName);

@ -133,6 +133,34 @@ describe('SlWriteSourceTool — session gating', () => {
);
});
it('rejects session-scoped writes outside allowed target connections', async () => {
const { tool } = makeTool();
const session = makeSession({
allowedConnectionNames: new Set(['warehouse']),
});
const context: ToolContext = { ...baseContext, session };
const result = await tool.call(
{
connectionId: 'finance',
sourceName: 'finance_orders',
source: {
name: 'finance_orders',
table: 'public.orders',
grain: ['id'],
columns: [{ name: 'id', type: 'string' }],
measures: [],
joins: [],
} as any,
} as any,
context,
);
expect(result.structured.success).toBe(false);
expect(result.markdown).toContain('connectionId "finance" is outside this ingest session');
expect(session.actions).toEqual([]);
});
it('indexes normally when no session is present', async () => {
const { tool, slSearchService } = makeTool();
const result = await tool.call(

@ -1,6 +1,12 @@
import YAML from 'yaml';
import { z } from 'zod';
import {
addTouchedSlSource,
type ToolContext,
type ToolOutput,
validateActionRawPaths,
validateActionTargetConnection,
} from '../../tools/index.js';
import { sourceOverlaySchema } from '../schemas.js';
import type { SemanticLayerService } from '../semantic-layer.service.js';
import type { SemanticLayerSource } from '../types.js';
@ -106,6 +112,10 @@ Do NOT join back to a table that the SQL already aggregates from if the grain co
const semanticLayerService = context.session?.semanticLayerService ?? this.semanticLayerService;
const skipIndex = context.session?.isWorktreeScoped === true;
const targetConnectionValidation = validateActionTargetConnection(context.session, connectionId);
if (!targetConnectionValidation.ok) {
return this.buildOutput(false, [targetConnectionValidation.error], sourceName);
}
const rawPathValidation = validateActionRawPaths(context.session, input.rawPaths);
if (!rawPathValidation.ok) {
return this.buildOutput(false, [rawPathValidation.error], sourceName);

@ -0,0 +1,23 @@
import type { ToolSession } from './tool-session.js';
type ActionTargetConnectionValidation = { ok: true } | { ok: false; error: string };
export function validateActionTargetConnection(
session: ToolSession | undefined,
connectionId: string,
): ActionTargetConnectionValidation {
const allowed = session?.allowedConnectionNames;
if (!allowed) {
return { ok: true };
}
if (allowed.has(connectionId)) {
return { ok: true };
}
const allowedList = [...allowed].sort();
return {
ok: false,
error: `connectionId "${connectionId}" is outside this ingest session's allowed target connections: ${
allowedList.length > 0 ? allowedList.join(', ') : '(none)'
}`,
};
}
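`SlEditSourceTool` and `SlWriteSourceTool` both run the target-connection check before raw-path validation and return on the first failure. A call that breaks both rules therefore always reports the connection error. The following is a minimal sketch of that ordering. It assumes a hypothetical `firstFailure` combinator and a local copy of the check, and this change exports neither of them. An empty allow-list is still a restriction, because only an absent `allowedConnectionNames` disables the check.

```typescript
type Validation = { ok: true } | { ok: false; error: string };

// Local copy of the rule in validateActionTargetConnection; no allow-list means no restriction.
function checkTargetConnection(allowed: ReadonlySet<string> | undefined, connectionId: string): Validation {
  if (!allowed || allowed.has(connectionId)) {
    return { ok: true };
  }
  const allowedList = [...allowed].sort();
  return {
    ok: false,
    error: `connectionId "${connectionId}" is outside this ingest session's allowed target connections: ${
      allowedList.length > 0 ? allowedList.join(', ') : '(none)'
    }`,
  };
}

// Hypothetical combinator: evaluate checks lazily, in order, and stop at the first failure.
function firstFailure(checks: Array<() => Validation>): Validation {
  for (const check of checks) {
    const result = check();
    if (!result.ok) {
      return result;
    }
  }
  return { ok: true };
}

const outcome = firstFailure([
  () => checkTargetConnection(new Set(['warehouse']), 'finance'),
  // Stand-in for validateActionRawPaths; this error text is invented for the example.
  () => ({ ok: false, error: 'rawPaths outside staged snapshot' }),
]);
console.log(outcome.ok ? 'ok' : outcome.error);
```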

@ -32,6 +32,7 @@ export type { SqlEdit } from './sql-edit-replacer.js';
export { applySqlEdits } from './sql-edit-replacer.js';
export type { IngestToolMetadata, MemoryAction, ToolSession } from './tool-session.js';
export { validateActionRawPaths } from './action-raw-paths.js';
export { validateActionTargetConnection } from './action-target-connection.js';
export type { TouchedSlSource, TouchedSlSourceSet } from './touched-sl-sources.js';
export {
addTouchedSlSource,