mirror of
https://github.com/Kaelio/ktx.git
synced 2026-06-07 07:55:13 +02:00
feat(ingest): default local ingest to isolated diffs (#128)
* docs: add isolated-diff ingestion design
* Refine isolated-diff ingestion design after adversarial review iteration 1
* Refine isolated-diff ingestion design after adversarial review iteration 2
* Refine isolated-diff ingestion design after adversarial review iteration 3
* feat: persist ingest trace events
* feat: add isolated ingest patch helpers
* feat: validate wiki body semantic references
* feat: add final ingest artifact gates
* feat: execute ingest work units in child worktrees
* feat: integrate isolated work unit patches
* feat: route selected ingest sources through isolated diffs
* test: cover isolated diff ingestion regressions
* feat: add isolated diff ingestion v1 core
* docs: document ingest trace inspection
* docs: add isolated diff ingestion v1 core plan
* fix(ingest): tighten final artifact gates
* fix(ingest): gate isolated final integration tree
* fix(ingest): persist postmortem failure traces
* fix(ingest): trace policy conflicts and cleanup child worktrees
* test(ingest): verify isolated diff postmortem coverage
* docs: add isolated diff ingestion gates and trace closure plan
* fix(ingest): gate provenance before isolated diff squash
* docs: add isolated diff ingestion provenance gate closure plan
* fix(ingest): gate final wiki references
* fix(ingest): enforce SL target connection scope
* fix(ingest): trace isolated SL target policy gates
* test(ingest): cover isolated diff reference and target gates
* chore(ingest): verify isolated diff gate closure
* docs: add isolated diff ingestion reference and target gate closure plan
* fix(ingest): gate global wiki references
* docs: add isolated diff ingestion global wiki reference gate closure plan
* fix(ingest): validate scan sources and wiki refs
* test(ingest): cover isolated diff textual conflict resolver
* test(ingest): cover isolated diff resolver integration
* feat(ingest): repair isolated diff textual conflicts
* feat(ingest): report isolated diff resolver outcomes
* test(ingest): verify isolated diff textual conflict repair
* test(ingest): align textual conflict failure coverage
* docs: add isolated diff textual conflict resolver plan
* test(ingest): cover isolated diff gate repair
* feat(ingest): add isolated diff gate repair agent
* feat(ingest): repair isolated diff semantic gate failures
* feat(ingest): wire isolated diff gate repair
* test(ingest): verify isolated diff final gate repair
* chore(ingest): verify isolated diff gate repair
* docs: add isolated diff gate repair plan
* Improve ingest progress updates
* feat(ingest): route direct-write connectors through isolated diffs
* test(ingest): cover non-metabase isolated diff routing
* feat(ingest): project metricflow semantic models before work units
* test(ingest): verify metricflow isolated projection path
* chore(ingest): verify isolated diff connector migration
* docs: add isolated diff connector migration plan
* feat(ingest): make isolated diff routing the private default
* feat(ingest): promote isolated diff to default runner path
* feat(ingest): default local ingest to isolated diffs
* chore(ingest): remove isolated diff allowlist references
* fix(ingest): preserve transient evidence for isolated work units
* docs: add isolated diff default promotion plan
* refactor(ingest): remove shared worktree WorkUnit path
* docs(ingest): align WorkUnit prompts with isolated diffs
* test(ingest): drop unused runner import
* docs: add isolated diff shared worktree removal plan
* docs: add isolated diff gate repair classification plan
* fix: restrict claude-code mcp servers
* docs: align ingest trace guidance with public CLI

---------

Co-authored-by: Andrey Avtomonov <7889985+andreybavt@users.noreply.github.com>
This commit is contained in:
parent
d1c84e5564
commit
e64da5a85d
66 changed files with 22346 additions and 514 deletions
@ -111,6 +111,41 @@ notion skipped skipped done done
Use `--json` when a script or agent needs the selected plan and per-target
results.

## Inspect source ingest traces

Source ingest writes persistent JSONL traces for postmortem debugging. Plain
ingest output prints the trace path near the report, run, and job identifiers
when a trace is available:

```text
Report: report-abc123
Run: run-abc123
Job: job-abc123
Trace: .ktx/ingest-traces/job-abc123/trace.jsonl
```

The trace file lives under the project directory at
`.ktx/ingest-traces/<jobId>/trace.jsonl`. Each line is a JSON event with the
job id, run id, sync id, connection id, source key, phase, event name, timing,
state snapshot, decision context, and error details. Failed runs also write a
stored ingest report with `status: "failed"`, `failure.phase`,
`failure.message`, and the same trace path.

Use `jq` or line-oriented tools to inspect a trace:

```bash
jq -c '. | {at, level, phase, event, durationMs, data, error}' \
  .ktx/ingest-traces/<jobId>/trace.jsonl
```
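
For scripted inspection, the same projection can be approximated in TypeScript. This is a minimal sketch, not KTX code: the `TraceEvent` shape is an assumption that borrows only the field names from the `jq` filter above, `parseTrace` and `failedEvents` are hypothetical helper names, and the sample values (including the `final_gates` phase name) are made up for illustration.

```typescript
// Assumed event shape: only the field names used in the jq filter are modeled.
// Real KTX trace events may carry additional fields.
interface TraceEvent {
  at?: string;
  level?: string;
  phase?: string;
  event?: string;
  durationMs?: number;
  data?: unknown;
  error?: unknown;
}

// Parse JSONL text: one JSON object per non-empty line.
function parseTrace(jsonl: string): TraceEvent[] {
  return jsonl
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as TraceEvent);
}

// Keep only events that carry error details.
function failedEvents(events: TraceEvent[]): TraceEvent[] {
  return events.filter((e) => e.error !== undefined && e.error !== null);
}

// Illustrative sample lines; the values are invented for this example.
const sampleTrace = [
  '{"at":"2026-01-01T00:00:00Z","level":"info","phase":"final_gates","event":"final_artifact_gates_started"}',
  '{"at":"2026-01-01T00:00:01Z","level":"error","phase":"final_gates","event":"final_artifact_gates_failed","error":{"message":"stale reference"}}',
].join('\n');

const failures = failedEvents(parseTrace(sampleTrace));
console.log(failures.map((e) => `${e.phase}: ${e.event}`));
```

In practice the string would come from reading `.ktx/ingest-traces/<jobId>/trace.jsonl`; the sketch keeps the input in memory so it stays self-contained.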

KTX writes `debug` trace events by default. Set `KTX_INGEST_TRACE_LEVEL` to
`error`, `info`, `debug`, or `trace` before running ingest to change the trace
verbosity:

```bash
KTX_INGEST_TRACE_LEVEL=trace ktx ingest metabase
```
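
One way to reason about the setting is as a verbosity threshold. The sketch below assumes the four levels are ordered `error`, `info`, `debug`, `trace` from least to most verbose, and that an unset or unrecognized value falls back to the documented `debug` default. The ordering, the fallback for unrecognized values, and the helper names `resolveTraceLevel` and `shouldWrite` are assumptions for illustration, not confirmed KTX behavior.

```typescript
type TraceLevel = 'error' | 'info' | 'debug' | 'trace';

// Assumed ordering, from least to most verbose.
const LEVEL_ORDER: TraceLevel[] = ['error', 'info', 'debug', 'trace'];

// Resolve a raw KTX_INGEST_TRACE_LEVEL value to a known level.
// Assumption: anything unset or unrecognized falls back to 'debug'.
function resolveTraceLevel(raw: string | undefined): TraceLevel {
  const candidate = (raw ?? '').trim().toLowerCase();
  const match = LEVEL_ORDER.find((level) => level === candidate);
  return match ?? 'debug';
}

// An event is written when it is no more verbose than the configured level.
function shouldWrite(eventLevel: TraceLevel, configured: TraceLevel): boolean {
  return LEVEL_ORDER.indexOf(eventLevel) <= LEVEL_ORDER.indexOf(configured);
}

const configured = resolveTraceLevel(undefined);
console.log(configured, shouldWrite('trace', configured), shouldWrite('info', configured));
```

Under these assumptions, the default level keeps `error`, `info`, and `debug` events and drops `trace` events until the variable is raised to `trace`.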

## Common errors

| Error | Cause | Recovery |
2938
docs/superpowers/plans/2026-05-17-isolated-diff-ingestion-v1-core.md
Normal file
File diff suppressed because it is too large
@ -0,0 +1,493 @@
# Isolated Diff Ingestion V1 Global Wiki Reference Gate Closure Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use
> superpowers:subagent-driven-development (recommended) or
> superpowers:executing-plans to implement this plan task-by-task. Steps use
> checkbox (`- [ ]`) syntax for tracking.

**Goal:** Reject final trees where an isolated-diff run changes semantic-layer
sources or deletes wiki pages and leaves pre-existing wiki pages with stale
body, `sl_refs`, frontmatter `refs`, or inline `[[page-key]]` references.

**Architecture:** Keep `artifact-gates.ts` validation-only. The runner expands
the final wiki gate scope before the existing final artifact gate: changed pages
are always validated, and all global wiki pages are validated when the run
changes any semantic-layer source or removes any wiki page. The final-gate trace
records the expanded scope and why it was expanded.

**Tech Stack:** TypeScript, Vitest, pnpm workspace commands, existing
`IngestBundleRunner`, `KnowledgeWikiService`, and isolated-diff test fixtures.

---

## Audit Summary

The implemented isolated-diff plans cover the core v1 flow: child worktrees,
binary no-rename patch proposals, `git apply --3way --index`, policy rejection,
final gates after reconciliation and repair, pre-squash provenance raw-path
validation, target-connection enforcement, failed reports, and persistent JSONL
traces.

One v1-blocking correctness gap remains. Final wiki gates currently validate
wiki pages changed by the run. They do not validate unchanged pages that become
invalid because the run changes a semantic-layer source or deletes a referenced
wiki page. Two concrete failures can therefore squash into main:

- A pre-existing wiki page body contains
  `` `mart_account_segments.total_contract_arr_cents` `` while the run updates
  `semantic-layer/warehouse/mart_account_segments.yaml` to define only
  `total_contract_arr`.
- A pre-existing wiki page has `refs: [source-page]` or `[[source-page]]` while
  the run deletes `wiki/global/source-page.md`.
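
The second failure mode can be illustrated with a small standalone check. This sketch is not the `artifact-gates.ts` implementation: the `WikiPage` shape and the `inlineRefs` and `findDanglingWikiRefs` helpers are hypothetical, and the check only models frontmatter `refs` and inline `[[page-key]]` links over an in-memory map of the final tree.

```typescript
// Hypothetical in-memory model of a wiki page in the final tree.
interface WikiPage {
  refs: string[]; // frontmatter `refs`
  body: string;   // markdown body, may contain [[page-key]] links
}

// Collect the page keys used in inline [[page-key]] links.
function inlineRefs(body: string): string[] {
  const pattern = /\[\[([^\]]+)\]\]/g;
  const keys: string[] = [];
  let match: RegExpExecArray | null = pattern.exec(body);
  while (match !== null) {
    keys.push((match[1] ?? '').trim());
    match = pattern.exec(body);
  }
  return keys;
}

// Return "source -> target" pairs for every reference to a page
// that does not exist in the final tree.
function findDanglingWikiRefs(pages: Map<string, WikiPage>): string[] {
  const dangling = new Set<string>();
  pages.forEach((page, pageKey) => {
    for (const target of [...page.refs, ...inlineRefs(page.body)]) {
      if (!pages.has(target)) {
        dangling.add(`${pageKey} -> ${target}`);
      }
    }
  });
  return Array.from(dangling).sort();
}

// Final tree after the run deleted `source-page`: the referencing page
// is unchanged, so a changed-pages-only gate would never look at it.
const finalTree = new Map<string, WikiPage>([
  ['account-segments', { refs: ['source-page'], body: 'See [[source-page]].' }],
]);

console.log(findDanglingWikiRefs(finalTree));
```

The point of the example is the scope, not the parsing: `account-segments` is only caught because the check walks every page in the final tree rather than the pages the run touched.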

This plan does not expand connector rollout, promote isolated diffs to the
default, add interactive resolution, add semantic auto-merge, remove the old
path, expand transitive semantic-layer dependencies, or move provenance into
files.

## File Structure

- Modify `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`.
  Adds two failing end-to-end regressions for unchanged wiki pages made stale by
  semantic-layer changes and wiki-page deletion.
- Modify `packages/context/src/ingest/ingest-bundle.runner.ts`.
  Adds a final wiki gate scope helper, expands validation to all global wiki
  pages when final state changes can invalidate unchanged references, and records
  scope details in the final-gate trace and failed report.

---

### Task 1: Add failing unchanged wiki regressions

**Files:**
- Modify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`

- [ ] **Step 1: Add the stale existing wiki body regression**

Insert this test inside `describe('IngestBundleRunner isolated diff path', ...)`
after the existing Metabase stale-measure regression:

```ts
  it('rejects unchanged wiki body refs made stale by isolated semantic-layer changes', async () => {
    const runtime = await makeRealGitRuntime();
    try {
      await mkdir(join(runtime.configDir, 'semantic-layer/warehouse'), { recursive: true });
      await mkdir(join(runtime.configDir, 'wiki/global'), { recursive: true });
      await writeFile(
        join(runtime.configDir, 'semantic-layer/warehouse/mart_account_segments.yaml'),
        'name: mart_account_segments\ngrain: [account_id]\ncolumns: [{name: account_id, type: string}]\njoins: []\nmeasures:\n  - name: total_contract_arr_cents\n    expr: sum(contract_arr)\n',
      );
      await writeFile(
        join(runtime.configDir, 'wiki/global/account-segments.md'),
        '---\nsummary: Account segments\nusage_mode: auto\n---\n\nExisting ARR uses `mart_account_segments.total_contract_arr_cents`.\n',
      );
      await runtime.git.commitFiles(
        ['semantic-layer/warehouse/mart_account_segments.yaml', 'wiki/global/account-segments.md'],
        'seed existing wiki body ref',
        'KTX Test',
        'system@ktx.local',
      );
      const preRunHead = await runtime.git.revParseHead();

      const { deps, adapter } = makeDeps(runtime);
      adapter.chunk.mockResolvedValue({
        workUnits: [{ unitKey: 'source-only', rawFiles: ['cards/source.json'], peerFileIndex: [], dependencyPaths: [] }],
      });

      let currentSession: any = null;
      deps.toolsetFactory.createIngestWuToolset = vi.fn((toolSession: any) => {
        currentSession = toolSession;
        return { toRuntimeTools: vi.fn(() => ({})) };
      });
      deps.agentRunner.runLoop = vi.fn(async () => {
        const root = rootOfConfig(currentSession.configService, runtime.configDir);
        await writeFile(
          join(root, 'semantic-layer/warehouse/mart_account_segments.yaml'),
          'name: mart_account_segments\ngrain: [account_id]\ncolumns: [{name: account_id, type: string}]\njoins: []\nmeasures:\n  - name: total_contract_arr\n    expr: sum(contract_arr)\n',
        );
        addTouchedSlSource(currentSession.touchedSlSources, 'warehouse', 'mart_account_segments');
        currentSession.actions.push({
          target: 'sl',
          type: 'updated',
          key: 'mart_account_segments',
          detail: 'Rename ARR measure',
          targetConnectionId: 'warehouse',
          rawPaths: ['cards/source.json'],
        });
        await currentSession.gitService.commitFiles(
          ['semantic-layer/warehouse/mart_account_segments.yaml'],
          'wu source rename',
          'KTX Test',
          'system@ktx.local',
        );
        return { stopReason: 'natural' };
      }) as never;

      const runner = new IngestBundleRunner(deps);
      await mockStageRawFiles(runner, runtime, [['cards/source.json', 'h1']]);

      await expect(
        runner.run({
          jobId: 'job-existing-body-stale',
          connectionId: 'warehouse',
          sourceKey: 'metabase',
          trigger: 'upload',
          bundleRef: { kind: 'upload', uploadId: 'upload' },
        }),
      ).rejects.toThrow(/total_contract_arr_cents/);

      expect(await runtime.git.revParseHead()).toBe(preRunHead);
      const trace = await readFile(join(runtime.configDir, '.ktx/ingest-traces/job-existing-body-stale/trace.jsonl'), 'utf-8');
      expect(trace).toContain('final_artifact_gates_failed');
      expect(trace).toContain('account-segments');
      expect(trace).toContain('semantic_layer_changed');
      expect(trace).toContain('ingest_failed');
      expect(trace).toContain('failure_report_created');
      expect(trace).not.toContain('squash_finished');
    } finally {
      await rm(runtime.homeDir, { recursive: true, force: true });
    }
  });
```

- [ ] **Step 2: Add the stale existing wiki page-reference regression**

Insert this test near the existing final wiki reference regression:

```ts
  it('rejects unchanged inbound wiki refs broken by an isolated wiki deletion', async () => {
    const runtime = await makeRealGitRuntime();
    try {
      await mkdir(join(runtime.configDir, 'wiki/global'), { recursive: true });
      await writeFile(
        join(runtime.configDir, 'wiki/global/source-page.md'),
        '---\nsummary: Source page\nusage_mode: auto\n---\n\nSource page\n',
      );
      await writeFile(
        join(runtime.configDir, 'wiki/global/account-segments.md'),
        '---\nsummary: Account segments\nusage_mode: auto\nrefs:\n  - source-page\n---\n\nSee [[source-page]].\n',
      );
      await runtime.git.commitFiles(
        ['wiki/global/source-page.md', 'wiki/global/account-segments.md'],
        'seed inbound wiki refs',
        'KTX Test',
        'system@ktx.local',
      );
      const preRunHead = await runtime.git.revParseHead();

      const { deps, adapter } = makeDeps(runtime);
      adapter.chunk.mockResolvedValue({
        workUnits: [{ unitKey: 'delete-target-page', rawFiles: ['pages/delete.json'], peerFileIndex: [], dependencyPaths: [] }],
      });

      let currentSession: any = null;
      deps.toolsetFactory.createIngestWuToolset = vi.fn((toolSession: any) => {
        currentSession = toolSession;
        return { toRuntimeTools: vi.fn(() => ({})) };
      });
      deps.agentRunner.runLoop = vi.fn(async () => {
        const root = rootOfConfig(currentSession.configService, runtime.configDir);
        await rm(join(root, 'wiki/global/source-page.md'), { force: true });
        currentSession.actions.push({
          target: 'wiki',
          type: 'removed',
          key: 'source-page',
          detail: 'Delete referenced page',
          rawPaths: ['pages/delete.json'],
        });
        await currentSession.gitService.commitFiles(
          ['wiki/global/source-page.md'],
          'wu delete target page',
          'KTX Test',
          'system@ktx.local',
        );
        return { stopReason: 'natural' };
      }) as never;

      const runner = new IngestBundleRunner(deps);
      await mockStageRawFiles(runner, runtime, [['pages/delete.json', 'h1']]);

      await expect(
        runner.run({
          jobId: 'job-existing-wiki-ref-stale',
          connectionId: 'warehouse',
          sourceKey: 'metabase',
          trigger: 'upload',
          bundleRef: { kind: 'upload', uploadId: 'upload' },
        }),
      ).rejects.toThrow(/wiki references target missing page\(s\): account-segments -> source-page/);

      expect(await runtime.git.revParseHead()).toBe(preRunHead);
      const trace = await readFile(join(runtime.configDir, '.ktx/ingest-traces/job-existing-wiki-ref-stale/trace.jsonl'), 'utf-8');
      expect(trace).toContain('final_artifact_gates_failed');
      expect(trace).toContain('account-segments -> source-page');
      expect(trace).toContain('wiki_page_removed');
      expect(trace).toContain('ingest_failed');
      expect(trace).toContain('failure_report_created');
      expect(trace).not.toContain('squash_finished');
    } finally {
      await rm(runtime.homeDir, { recursive: true, force: true });
    }
  });
```

- [ ] **Step 3: Run the focused regressions and verify they fail**

Run:

```bash
pnpm --filter @ktx/context exec vitest run src/ingest/ingest-bundle.runner.isolated-diff.test.ts -t "unchanged wiki body refs|unchanged inbound wiki refs"
```

Expected: FAIL. The stale body test currently squashes successfully because the
unchanged `account-segments` page is not in `finalChangedWikiPageKeys`. The
inbound wiki ref test currently squashes successfully because the deleted
`source-page` is validated as a missing changed page and skipped, while the
unchanged page that references it is never validated.

---

### Task 2: Expand the final wiki validation scope

**Files:**
- Modify: `packages/context/src/ingest/ingest-bundle.runner.ts`

- [ ] **Step 1: Add final wiki gate scope helpers**

Add these private methods after `uniqueTouchedSlSources()`:

```ts
  private removedWikiPageKeysFromActions(actions: MemoryAction[]): string[] {
    return this.uniqueWikiPageKeys(
      actions.filter((action) => action.target === 'wiki' && action.type === 'removed').map((action) => action.key),
    );
  }

  private async wikiPageKeysForFinalGates(input: {
    wikiService: ReturnType<KnowledgeWikiService['forWorktree']>;
    changedWikiPageKeys: string[];
    touchedSlSources: TouchedSlSource[];
    actions: MemoryAction[];
  }): Promise<{
    pageKeys: string[];
    trace: {
      global: boolean;
      reasons: string[];
      changedWikiPageKeys: string[];
      removedWikiPageKeys: string[];
      pageKeysValidated: string[];
    };
  }> {
    const changedWikiPageKeys = this.uniqueWikiPageKeys(input.changedWikiPageKeys);
    const removedWikiPageKeys = this.removedWikiPageKeysFromActions(input.actions);
    const reasons: string[] = [];
    if (input.touchedSlSources.length > 0) {
      reasons.push('semantic_layer_changed');
    }
    if (removedWikiPageKeys.length > 0) {
      reasons.push('wiki_page_removed');
    }

    let pageKeys = changedWikiPageKeys;
    if (reasons.length > 0) {
      pageKeys = this.uniqueWikiPageKeys([
        ...changedWikiPageKeys,
        ...(await input.wikiService.listPageKeys('GLOBAL', null)),
      ]);
    }

    return {
      pageKeys,
      trace: {
        global: reasons.length > 0,
        reasons,
        changedWikiPageKeys,
        removedWikiPageKeys,
        pageKeysValidated: pageKeys,
      },
    };
  }
```

- [ ] **Step 2: Use the expanded scope before final gates**

In `runInner()`, replace the current `finalChangedWikiPageKeys` and
`finalTouchedSlSources` block with this code:

```ts
    const baseFinalChangedWikiPageKeys = this.uniqueWikiPageKeys([
      ...(isolatedDiffEnabled ? projectionChangedWikiPageKeys : []),
      ...workUnitOutcomes
        .flatMap((outcome) => outcome.patchTouchedPaths ?? [])
        .flatMap((path) => this.wikiPageKeysFromPaths([path])),
      ...this.wikiPageKeysFromActions(reconcileActions),
      ...postReconciliationPaths.flatMap((path) => this.wikiPageKeysFromPaths([path])),
      ...wikiSlRefRepairResult.repairs.filter((repair) => repair.scope === 'GLOBAL').map((repair) => repair.pageKey),
    ]);
    const finalTouchedSlSources = this.uniqueTouchedSlSources([
      ...(isolatedDiffEnabled ? projectionTouchedSources : []),
      ...workUnitOutcomes.flatMap((outcome) => outcome.touchedSlSources),
      ...this.touchedSlSourcesFromActions(reconcileActions, job.connectionId),
      ...this.touchedSlSourcesFromPaths(postReconciliationPaths),
      ...(postProcessorOutcome?.touchedSources ?? []),
    ]);
    const finalWikiGateScope = await this.wikiPageKeysForFinalGates({
      wikiService: this.deps.wikiService.forWorktree(sessionWorktree.workdir),
      changedWikiPageKeys: baseFinalChangedWikiPageKeys,
      touchedSlSources: finalTouchedSlSources,
      actions: [...stageIndex.workUnits.flatMap((wu) => wu.actions), ...reconcileActions],
    });
    const finalChangedWikiPageKeys = finalWikiGateScope.pageKeys;
```

This keeps the existing variable name used by `validateFinalIngestArtifacts()`,
but the value now means "wiki page keys to validate in final gates."

- [ ] **Step 3: Add scope details to final-gate trace data**

In the `finalArtifactGateTraceData` object, add the
`wikiReferenceGateScope` field:

```ts
    const finalArtifactGateTraceData = {
      changedWikiPageKeys: finalChangedWikiPageKeys,
      wikiReferenceGateScope: finalWikiGateScope.trace,
      touchedSlSources: finalTouchedSlSources,
      projectionTouchedPaths,
      workUnitPatchTouchedPaths: workUnitOutcomes.flatMap((outcome) => outcome.patchTouchedPaths ?? []),
      preReconciliationSha,
      postReconciliationSha,
      postReconciliationPaths,
      reconciliationActionCount: reconcileActions.length,
      wikiSlRefRepairCount: wikiSlRefRepairResult.repairs.length,
    };
```

The failure report already stores `activeFailureDetails`, so this trace data
also becomes persistent failed-report context when final gates fail.

- [ ] **Step 4: Run the focused regressions and verify they pass**

Run:

```bash
pnpm --filter @ktx/context exec vitest run src/ingest/ingest-bundle.runner.isolated-diff.test.ts -t "unchanged wiki body refs|unchanged inbound wiki refs"
```

Expected: PASS. Both traces include `final_artifact_gates_failed`,
`failure_report_created`, no `squash_finished`, and
`wikiReferenceGateScope` with either `semantic_layer_changed` or
`wiki_page_removed`.

---

### Task 3: Verification and commit

**Files:**
- Verify: `packages/context/src/ingest/ingest-bundle.runner.ts`
- Verify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`

- [ ] **Step 1: Run the isolated-diff focused suite**

Run:

```bash
pnpm --filter @ktx/context exec vitest run \
  src/ingest/ingest-bundle.runner.isolated-diff.test.ts \
  src/ingest/artifact-gates.test.ts \
  src/ingest/wiki-body-refs.test.ts \
  src/ingest/semantic-layer-target-policy.test.ts \
  src/ingest/isolated-diff/git-patch.test.ts \
  src/ingest/isolated-diff/patch-integrator.test.ts \
  src/ingest/isolated-diff/work-unit-executor.test.ts \
  src/core/git.service.patch.test.ts
```

Expected: PASS.

- [ ] **Step 2: Type-check the context package**

Run:

```bash
pnpm --filter @ktx/context run type-check
```

Expected: PASS.

- [ ] **Step 3: Run dead-code analysis**

Run:

```bash
pnpm run dead-code
```

Expected: PASS, or only pre-existing findings unrelated to
`packages/context/src/ingest/ingest-bundle.runner.ts` and
`packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`.
Investigate any new finding before committing.

- [ ] **Step 4: Verify trace acceptance criteria**

Open the traces produced by the two new failing-run tests and confirm these
events and fields exist:

```text
job-existing-body-stale:
- final_artifact_gates_started
- final_artifact_gates_failed
- ingest_failed
- failure_report_created
- no squash_finished
- wikiReferenceGateScope.global is true
- wikiReferenceGateScope.reasons includes semantic_layer_changed
- wikiReferenceGateScope.pageKeysValidated includes account-segments
- error.message includes total_contract_arr_cents

job-existing-wiki-ref-stale:
- final_artifact_gates_started
- final_artifact_gates_failed
- ingest_failed
- failure_report_created
- no squash_finished
- wikiReferenceGateScope.global is true
- wikiReferenceGateScope.reasons includes wiki_page_removed
- wikiReferenceGateScope.removedWikiPageKeys includes source-page
- error.message includes account-segments -> source-page
```
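
The checklist above can also be checked mechanically. The following is a rough sketch under stated assumptions: each trace line is a JSON object with a top-level `event` name (as in the `jq` example in the ingest guide), and `wikiReferenceGateScope` appears somewhere inside the event payload. Because the exact nesting is not specified here, the hypothetical `findKey` helper searches the whole event; `Criteria` and `unmetCriteria` are likewise illustrative names, and the sample trace lines are invented.

```typescript
// Minimal JSON value type so the sketch stays free of `any`.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Depth-first search for the first value stored under `key`.
// Own keys are checked before descending, so top-level fields win.
function findKey(value: Json, key: string): Json | undefined {
  if (Array.isArray(value)) {
    for (const item of value) {
      const found = findKey(item, key);
      if (found !== undefined) return found;
    }
    return undefined;
  }
  if (value !== null && typeof value === 'object') {
    if (Object.prototype.hasOwnProperty.call(value, key)) return value[key];
    for (const childKey of Object.keys(value)) {
      const child = value[childKey];
      if (child === undefined) continue;
      const found = findKey(child, key);
      if (found !== undefined) return found;
    }
  }
  return undefined;
}

interface Criteria {
  required: string[];  // event names that must appear
  forbidden: string[]; // event names that must not appear
  scopeReason: string; // expected entry in wikiReferenceGateScope.reasons
}

// Return a human-readable list of unmet criteria (empty when all hold).
function unmetCriteria(jsonl: string, criteria: Criteria): string[] {
  const events: Json[] = jsonl
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as Json);
  const names = events.map((event) => {
    const name = findKey(event, 'event');
    return typeof name === 'string' ? name : '';
  });
  const problems: string[] = [];
  for (const name of criteria.required) {
    if (names.indexOf(name) === -1) problems.push(`missing event: ${name}`);
  }
  for (const name of criteria.forbidden) {
    if (names.indexOf(name) !== -1) problems.push(`unexpected event: ${name}`);
  }
  const hasReason = events.some((event) => {
    const scope = findKey(event, 'wikiReferenceGateScope');
    if (scope === undefined) return false;
    const reasons = findKey(scope, 'reasons');
    return Array.isArray(reasons) && reasons.indexOf(criteria.scopeReason) !== -1;
  });
  if (!hasReason) problems.push(`missing scope reason: ${criteria.scopeReason}`);
  return problems;
}

// Invented sample; the placement of wikiReferenceGateScope under `data` is an assumption.
const staleBodyTrace = [
  '{"event":"final_artifact_gates_started"}',
  '{"event":"final_artifact_gates_failed","data":{"wikiReferenceGateScope":{"global":true,"reasons":["semantic_layer_changed"]}}}',
  '{"event":"ingest_failed"}',
  '{"event":"failure_report_created"}',
].join('\n');

const staleBodyCriteria: Criteria = {
  required: ['final_artifact_gates_started', 'final_artifact_gates_failed', 'ingest_failed', 'failure_report_created'],
  forbidden: ['squash_finished'],
  scopeReason: 'semantic_layer_changed',
};

console.log(unmetCriteria(staleBodyTrace, staleBodyCriteria));
```

The same helper would cover `job-existing-wiki-ref-stale` by swapping `scopeReason` to `wiki_page_removed`; the message checks on `error.message` are left to a manual read of the trace.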

- [ ] **Step 5: Commit**

Run:

```bash
git add packages/context/src/ingest/ingest-bundle.runner.ts \
  packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts
git commit -m "fix(ingest): gate global wiki references"
```

Expected: one commit containing only the runner and isolated-diff runner test
changes.

---

## Self-Review

Spec coverage:
- Final global wiki body reference validation now covers unchanged wiki pages
  when a run changes semantic-layer sources.
- Final global wiki page reference validation now covers unchanged inbound
  references when a run deletes wiki pages.
- The plan keeps resolver behavior fail-fast and stops before squash.
- Persistent trace and failed-report acceptance criteria are explicit and tied
  to the concrete failure modes.

Non-blocking gaps unchanged:
- Broader connector rollout.
- Isolated-diff default promotion.
- Old shared-worktree path removal.
- Interactive conflict resolution.
- Semantic auto-merge.
- Transitive semantic-layer dependency expansion.
- Provenance-as-files.
@ -0,0 +1,494 @@
# Isolated Diff Ingestion V1 Provenance Gate Closure Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Ensure invalid provenance raw paths are rejected before isolated-diff
ingestion squashes any integration worktree changes into the main project
worktree.

**Architecture:** Keep provenance insertion after squash, but derive and
validate the planned provenance rows immediately after final artifact gates and
before the squash stage. This makes provenance validation part of the final
pre-main safety boundary while preserving the existing report and database
write shape.

**Tech Stack:** TypeScript ESM/NodeNext, Vitest, existing
`IngestBundleRunner`, `validateProvenanceRawPaths`, ingest reports, and
persistent ingest traces.

---

## Audit Summary

The implemented isolated-diff path now covers the core v1 safety surface:
child worktrees, binary no-rename patches, `git apply --3way --index`, patch
policy rejection, final wiki and semantic-layer gates after reconciliation and
post-processing, failure reports, and persistent JSONL traces. The focused
isolated-diff test suite passes:

```bash
pnpm --filter @ktx/context exec vitest run \
  src/ingest/ingest-trace.test.ts \
  src/ingest/wiki-body-refs.test.ts \
  src/ingest/artifact-gates.test.ts \
  src/ingest/isolated-diff/git-patch.test.ts \
  src/ingest/isolated-diff/work-unit-executor.test.ts \
  src/ingest/isolated-diff/patch-integrator.test.ts \
  src/ingest/ingest-bundle.runner.isolated-diff.test.ts
```

Current result: `7 passed` (test files), `28 passed` (tests).

One v1-blocking gap remains. `validateProvenanceRawPaths()` is called in
`packages/context/src/ingest/ingest-bundle.runner.ts` after
`squashMergeIntoMain()`. A work unit or reconciliation action can emit an
otherwise valid wiki or semantic-layer artifact whose `rawPaths` contain a path
outside the current raw snapshot and eviction set. Today the run fails during
provenance recording, but only after the invalidly attributed artifacts have
already reached the main project worktree. That violates the spec requirement
that final global gates run before any changes reach main.
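
The ordering problem can be shown with a small standalone model. This is a sketch only, not the runner's code: `PlannedAction`, `findInvalidRawPaths`, and `gateThenSquash` are hypothetical names, the real `validateProvenanceRawPaths()` signature is not shown in this plan, and the "allowed" set is assumed to be the union of the current raw snapshot and the eviction set as described above.

```typescript
// Hypothetical, simplified view of an action that will produce a provenance row.
interface PlannedAction {
  target: 'wiki' | 'sl';
  key: string;
  rawPaths: string[];
}

interface InvalidRawPath {
  target: string;
  key: string;
  rawPath: string;
}

// A raw path is valid only if it is in the raw snapshot or the eviction set.
function findInvalidRawPaths(
  actions: PlannedAction[],
  snapshotPaths: string[],
  evictedPaths: string[],
): InvalidRawPath[] {
  const allowed = new Set<string>([...snapshotPaths, ...evictedPaths]);
  const invalid: InvalidRawPath[] = [];
  for (const action of actions) {
    for (const rawPath of action.rawPaths) {
      if (!allowed.has(rawPath)) {
        invalid.push({ target: action.target, key: action.key, rawPath });
      }
    }
  }
  return invalid;
}

// Validate first, squash second: a failure leaves main untouched.
function gateThenSquash(
  actions: PlannedAction[],
  snapshotPaths: string[],
  evictedPaths: string[],
  squash: () => void,
): void {
  const invalid = findInvalidRawPaths(actions, snapshotPaths, evictedPaths);
  if (invalid.length > 0) {
    const details = invalid.map((item) => `${item.target}:${item.key} -> ${item.rawPath}`).join(', ');
    throw new Error(`invalid provenance raw paths: ${details}`);
  }
  squash();
}

// Mirrors the regression scenario: valid artifacts, one bad raw path.
const plannedActions: PlannedAction[] = [
  { target: 'sl', key: 'mart_account_segments', rawPaths: ['cards/source.json'] },
  { target: 'wiki', key: 'account-segments', rawPaths: ['cards/missing.json'] },
];

const outcome = { squashed: false, failure: '' };
try {
  gateThenSquash(plannedActions, ['cards/source.json'], [], () => {
    outcome.squashed = true;
  });
} catch (error) {
  outcome.failure = error instanceof Error ? error.message : String(error);
}
console.log(outcome);
```

The design choice the plan argues for is visible in `gateThenSquash`: the validation result decides whether `squash` is ever invoked, so a bad attribution cannot reach main as a side effect of a later failure.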

Observability for the already-implemented phases is sufficient for postmortem
reconstruction: traces include input snapshots, routing, child worktree
creation and cleanup, patch collection and application, conflict
classification, reconciliation, final gates, failure reports, and run outcome.
This plan adds only the missing provenance validation failure trace because it
corresponds to a concrete pre-main failure mode, not cosmetic trace expansion.

Non-blocking gaps that remain after this plan:

- Migrating Notion, LookML, Looker, dbt, MetricFlow, and historic-SQL direct
  durable writes to the isolated path.
- Promoting isolated diffs as the default for all connectors.
- Removing the old shared-worktree WorkUnit execution path.
- Interactive, CLI, or agent-driven conflict resolution.
- Auto-merging semantic conflicts that cannot be proven correct.
- Transitive SQL-projection dependency expansion beyond direct declared joins.
- Moving provenance rows to worktree files.
- Adding failure reports for failures that happen before an ingest run row
  exists. The trace file is still written at the deterministic job path.

## File Structure

- Modify `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`.
  Add a regression proving invalid provenance raw paths fail before squash,
  leave main unchanged, skip SQLite provenance insertion, and emit a
  postmortem-grade trace event.
- Modify `packages/context/src/ingest/ingest-bundle.runner.ts`.
  Extract provenance row construction into private helpers, run provenance
  raw-path validation before squash, trace validation success and failure, and
  reuse the prevalidated rows for insertion and reports after squash.

---

### Task 1: Add the pre-squash provenance regression

**Files:**
- Modify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`

- [ ] **Step 1: Write the failing runner test**

Append this test inside the existing
`describe('IngestBundleRunner isolated diff path', ...)` block in
`packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`:

```ts
  it('rejects invalid provenance raw paths before squash reaches main', async () => {
    const runtime = await makeRealGitRuntime();
    try {
      const { deps, adapter } = makeDeps(runtime);
      adapter.chunk.mockResolvedValue({
        workUnits: [{ unitKey: 'card-valid-artifacts', rawFiles: ['cards/source.json'], peerFileIndex: [], dependencyPaths: [] }],
      });

      let currentSession: any = null;
      deps.toolsetFactory.createIngestWuToolset = vi.fn((toolSession: any) => {
        currentSession = toolSession;
        return { toRuntimeTools: vi.fn(() => ({})) };
      });
      deps.agentRunner.runLoop = vi.fn(async () => {
        const root = rootOfConfig(currentSession.configService, runtime.configDir);
        await mkdir(join(root, 'semantic-layer/warehouse'), { recursive: true });
        await mkdir(join(root, 'wiki/global'), { recursive: true });
        await writeFile(
          join(root, 'semantic-layer/warehouse/mart_account_segments.yaml'),
          'name: mart_account_segments\ngrain: [account_id]\ncolumns: [{name: account_id, type: string}]\njoins: []\nmeasures:\n  - name: total_contract_arr\n    expr: sum(contract_arr)\n',
        );
        await writeFile(
          join(root, 'wiki/global/account-segments.md'),
          '---\nsummary: Account segments\nusage_mode: auto\nsl_refs:\n  - mart_account_segments\n---\n\nARR is `mart_account_segments.total_contract_arr`.\n',
        );
        addTouchedSlSource(currentSession.touchedSlSources, 'warehouse', 'mart_account_segments');
        currentSession.actions.push({
          target: 'sl',
          type: 'created',
          key: 'mart_account_segments',
          detail: 'Valid source',
          targetConnectionId: 'warehouse',
          rawPaths: ['cards/source.json'],
        });
        currentSession.actions.push({
          target: 'wiki',
          type: 'created',
          key: 'account-segments',
          detail: 'Valid wiki with invalid provenance raw path',
          rawPaths: ['cards/missing.json'],
        });
        await currentSession.gitService.commitFiles(
          ['semantic-layer/warehouse/mart_account_segments.yaml', 'wiki/global/account-segments.md'],
          'valid artifacts with invalid provenance',
          'KTX Test',
          'system@ktx.local',
        );
        return { stopReason: 'natural' };
      }) as never;

      const runner = new IngestBundleRunner(deps);
      await mockStageRawFiles(runner, runtime, [['cards/source.json', 'h1']]);
      const preRunHead = await runtime.git.revParseHead();
|
||||
|
||||
await expect(
|
||||
runner.run({
|
||||
jobId: 'job-invalid-provenance',
|
||||
connectionId: 'warehouse',
|
||||
sourceKey: 'metabase',
|
||||
trigger: 'upload',
|
||||
bundleRef: { kind: 'upload', uploadId: 'upload' },
|
||||
}),
|
||||
).rejects.toThrow(/provenance row references raw path outside this snapshot: cards\/missing\.json/);
|
||||
|
||||
expect(await runtime.git.revParseHead()).toBe(preRunHead);
|
||||
expect(deps.provenance.insertMany).not.toHaveBeenCalled();
|
||||
const trace = await readFile(join(runtime.configDir, '.ktx/ingest-traces/job-invalid-provenance/trace.jsonl'), 'utf-8');
|
||||
expect(trace).toContain('final_artifact_gates_finished');
|
||||
expect(trace).toContain('provenance_rows_validation_failed');
|
||||
expect(trace).toContain('cards/missing.json');
|
||||
expect(trace).toContain('ingest_failed');
|
||||
expect(trace).not.toContain('squash_finished');
|
||||
} finally {
|
||||
await rm(runtime.homeDir, { recursive: true, force: true });
|
||||
}
|
||||
});
|
||||
```
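
The regression above checks trace events by substring only. As an optional, hypothetical extension, the ordering of events could also be pinned. The sketch below assumes only that `trace.jsonl` holds one event per line; the helper name `traceEventsInOrder` is not part of the codebase.

```ts
// Hypothetical helper: true when each event name is found on a line strictly
// after the line that matched the previous name.
function traceEventsInOrder(traceJsonl: string, eventNames: readonly string[]): boolean {
  const lines = traceJsonl.split('\n').filter((line) => line.trim() !== '');
  let cursor = -1;
  for (const name of eventNames) {
    // Substring match only, so no assumption is made about the JSON schema.
    const index = lines.findIndex((line, i) => i > cursor && line.includes(name));
    if (index === -1) {
      return false;
    }
    cursor = index;
  }
  return true;
}
```

In this test, a call such as `traceEventsInOrder(trace, ['final_artifact_gates_finished', 'provenance_rows_validation_failed', 'ingest_failed'])` would presumably hold once Task 2 lands, since provenance validation runs directly after the final artifact gates.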
|
||||
|
||||
- [ ] **Step 2: Run the failing regression**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context exec vitest run src/ingest/ingest-bundle.runner.isolated-diff.test.ts -t "invalid provenance raw paths"
|
||||
```
|
||||
|
||||
Expected: FAIL because the current runner validates provenance after
`squashMergeIntoMain()`. By the time the validation error is thrown, main HEAD
has already moved, so the `runtime.git.revParseHead()` assertion fails, and the
trace does not contain `provenance_rows_validation_failed`.
|
||||
|
||||
### Task 2: Move provenance validation into the pre-squash gate boundary
|
||||
|
||||
**Files:**
|
||||
- Modify: `packages/context/src/ingest/ingest-bundle.runner.ts`
|
||||
|
||||
- [ ] **Step 1: Import the provenance report and insert types**
|
||||
|
||||
In `packages/context/src/ingest/ingest-bundle.runner.ts`, update the imports.
|
||||
|
||||
Replace this import block:
|
||||
|
||||
```ts
|
||||
import type {
|
||||
ContextEvidenceIndexSummary,
|
||||
IngestBundleRunnerDeps,
|
||||
IngestProvenanceRow,
|
||||
IngestRunsPort,
|
||||
IngestSessionWorktree,
|
||||
PageTriageRunResult,
|
||||
} from './ports.js';
|
||||
```
|
||||
|
||||
With:
|
||||
|
||||
```ts
|
||||
import type {
|
||||
ContextEvidenceIndexSummary,
|
||||
IngestBundleRunnerDeps,
|
||||
IngestProvenanceInsert,
|
||||
IngestProvenanceRow,
|
||||
IngestRunsPort,
|
||||
IngestSessionWorktree,
|
||||
PageTriageRunResult,
|
||||
} from './ports.js';
|
||||
```
|
||||
|
||||
Replace this import block:
|
||||
|
||||
```ts
|
||||
import {
|
||||
buildStageIndexFromReportBody,
|
||||
postProcessorSavedMemoryCounts,
|
||||
type IngestReportPostProcessorOutcome,
|
||||
type IngestReportSnapshot,
|
||||
} from './reports.js';
|
||||
```
|
||||
|
||||
With:
|
||||
|
||||
```ts
|
||||
import {
|
||||
buildStageIndexFromReportBody,
|
||||
postProcessorSavedMemoryCounts,
|
||||
type IngestReportPostProcessorOutcome,
|
||||
type IngestReportProvenanceDetail,
|
||||
type IngestReportSnapshot,
|
||||
} from './reports.js';
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Add provenance row helpers**
|
||||
|
||||
Add these private methods after `private errorMessage(error: unknown): string`
|
||||
in `packages/context/src/ingest/ingest-bundle.runner.ts`:
|
||||
|
||||
```ts
|
||||
private buildProvenanceRows(input: {
|
||||
job: IngestBundleJob;
|
||||
syncId: string;
|
||||
currentHashes: Map<string, string>;
|
||||
stageIndex: StageIndex;
|
||||
reconcileActions: MemoryAction[];
|
||||
eviction?: EvictionUnit;
|
||||
}): IngestProvenanceInsert[] {
|
||||
const provenanceRows: IngestProvenanceInsert[] = [];
|
||||
const actionToType = (action: MemoryAction): IngestProvenanceInsert['actionType'] => {
|
||||
if (action.target === 'wiki') {
|
||||
return 'wiki_written';
|
||||
}
|
||||
return action.type === 'created' ? 'source_created' : 'measure_added';
|
||||
};
|
||||
const producedPaths = new Set<string>();
|
||||
const pushActionProvenance = (rawPath: string, action: MemoryAction): void => {
|
||||
const hash = input.currentHashes.get(rawPath) ?? '';
|
||||
provenanceRows.push({
|
||||
connectionId: input.job.connectionId,
|
||||
sourceKey: input.job.sourceKey,
|
||||
syncId: input.syncId,
|
||||
rawPath,
|
||||
rawContentHash: hash,
|
||||
artifactKind: action.target,
|
||||
artifactKey: action.key,
|
||||
targetConnectionId: action.target === 'sl' ? actionTargetConnectionId(action, input.job.connectionId) : null,
|
||||
artifactContentHash: null,
|
||||
actionType: actionToType(action),
|
||||
});
|
||||
producedPaths.add(rawPath);
|
||||
};
|
||||
|
||||
for (const wu of input.stageIndex.workUnits) {
|
||||
for (const action of wu.actions) {
|
||||
for (const rawPath of rawPathsForAction(action, wu.rawFiles)) {
|
||||
pushActionProvenance(rawPath, action);
|
||||
}
|
||||
}
|
||||
}
|
||||
for (const action of input.reconcileActions) {
|
||||
for (const rawPath of action.rawPaths ?? []) {
|
||||
pushActionProvenance(rawPath, action);
|
||||
}
|
||||
}
|
||||
for (const resolution of input.stageIndex.artifactResolutions ?? []) {
|
||||
const hash = input.currentHashes.get(resolution.rawPath) ?? '';
|
||||
provenanceRows.push({
|
||||
connectionId: input.job.connectionId,
|
||||
sourceKey: input.job.sourceKey,
|
||||
syncId: input.syncId,
|
||||
rawPath: resolution.rawPath,
|
||||
rawContentHash: hash,
|
||||
artifactKind: resolution.artifactKind,
|
||||
artifactKey: resolution.artifactKey,
|
||||
targetConnectionId: null,
|
||||
artifactContentHash: null,
|
||||
actionType: resolution.actionType,
|
||||
});
|
||||
producedPaths.add(resolution.rawPath);
|
||||
}
|
||||
for (const [rawPath, hash] of input.currentHashes) {
|
||||
if (producedPaths.has(rawPath)) {
|
||||
continue;
|
||||
}
|
||||
provenanceRows.push({
|
||||
connectionId: input.job.connectionId,
|
||||
sourceKey: input.job.sourceKey,
|
||||
syncId: input.syncId,
|
||||
rawPath,
|
||||
rawContentHash: hash,
|
||||
artifactKind: null,
|
||||
artifactKey: null,
|
||||
targetConnectionId: null,
|
||||
artifactContentHash: null,
|
||||
actionType: 'skipped',
|
||||
});
|
||||
}
|
||||
|
||||
return provenanceRows;
|
||||
}
|
||||
|
||||
private toReportProvenanceRows(rows: IngestProvenanceInsert[]): IngestReportProvenanceDetail[] {
|
||||
return rows.map(({ rawPath, artifactKind, artifactKey, actionType, targetConnectionId }) => ({
|
||||
rawPath,
|
||||
artifactKind,
|
||||
artifactKey,
|
||||
targetConnectionId: targetConnectionId ?? null,
|
||||
actionType,
|
||||
}));
|
||||
}
|
||||
```
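
The last loop in `buildProvenanceRows()` is the `skipped` fallback: every raw file in the snapshot that produced no artifact row still gets one row. A reduced sketch of just that rule, using the hypothetical names `SketchRow` and `withSkippedFallback` and a trimmed-down row shape, might look like this:

```ts
// Hypothetical, reduced row shape; the real IngestProvenanceInsert has more fields.
type SketchRow = { rawPath: string; rawContentHash: string; actionType: string };

function withSkippedFallback(
  currentHashes: ReadonlyMap<string, string>,
  producedRows: readonly SketchRow[],
): SketchRow[] {
  // Raw paths that already have at least one artifact row.
  const produced = new Set(producedRows.map((row) => row.rawPath));
  const rows = [...producedRows];
  currentHashes.forEach((rawContentHash, rawPath) => {
    if (!produced.has(rawPath)) {
      // Nothing was written for this raw file, so record it as skipped.
      rows.push({ rawPath, rawContentHash, actionType: 'skipped' });
    }
  });
  return rows;
}
```

Because the fallback rows are derived only from `currentHashes`, they always point at raw paths inside the snapshot; only action-derived and resolution-derived rows can fail raw-path validation.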
|
||||
|
||||
- [ ] **Step 3: Validate planned provenance rows before squash**
|
||||
|
||||
In `packages/context/src/ingest/ingest-bundle.runner.ts`, find the code that
|
||||
sets `activePhase = 'final_gates';` and runs `traceTimed(...,
|
||||
'final_artifact_gates', ...)`. Immediately after that `await traceTimed(...)`
|
||||
block and before the `// Stage 6 — squash commit` comment, insert:
|
||||
|
||||
```ts
|
||||
activePhase = 'provenance_validation';
|
||||
const provenanceRows = this.buildProvenanceRows({
|
||||
job,
|
||||
syncId,
|
||||
currentHashes,
|
||||
stageIndex,
|
||||
reconcileActions,
|
||||
eviction,
|
||||
});
|
||||
await traceTimed(
|
||||
runTrace,
|
||||
'provenance',
|
||||
'provenance_rows_validation',
|
||||
{
|
||||
rowCount: provenanceRows.length,
|
||||
currentRawPathCount: currentHashes.size,
|
||||
deletedRawPathCount: eviction?.deletedRawPaths.length ?? 0,
|
||||
},
|
||||
async () => {
|
||||
validateProvenanceRawPaths({
|
||||
rows: provenanceRows,
|
||||
currentRawPaths: new Set(currentHashes.keys()),
|
||||
deletedRawPaths: new Set(eviction?.deletedRawPaths ?? []),
|
||||
});
|
||||
},
|
||||
);
|
||||
const reportProvenanceRows = this.toReportProvenanceRows(provenanceRows);
|
||||
```
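
This step calls `validateProvenanceRawPaths()`, whose body the plan does not show. As a rough sketch only, assuming a row is acceptable when its raw path is either in the current snapshot or in the eviction's deleted set, the check could look like the following. The name `validateProvenanceRawPathsSketch` and the `RowLike` type are hypothetical; the error text is the one the Task 1 regression expects.

```ts
// Hypothetical stand-in for the real helper; only the error text comes from the plan.
type RowLike = { rawPath: string };

function validateProvenanceRawPathsSketch(input: {
  rows: readonly RowLike[];
  currentRawPaths: ReadonlySet<string>;
  deletedRawPaths: ReadonlySet<string>;
}): void {
  for (const row of input.rows) {
    // Assumption: deleted raw paths are still valid provenance targets.
    const known = input.currentRawPaths.has(row.rawPath) || input.deletedRawPaths.has(row.rawPath);
    if (!known) {
      throw new Error(`provenance row references raw path outside this snapshot: ${row.rawPath}`);
    }
  }
}
```

The check is synchronous and throws on the first offending row, so wrapping it in `traceTimed(...)` before the squash means a failure is traced and main is never touched.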
|
||||
|
||||
- [ ] **Step 4: Replace the post-squash provenance construction block**
|
||||
|
||||
In `packages/context/src/ingest/ingest-bundle.runner.ts`, in the
|
||||
`activePhase = 'provenance';` section after squash, delete the current block
|
||||
that starts with:
|
||||
|
||||
```ts
|
||||
// Provenance rows: per-artifact when the WU emitted actions, plus a `skipped`
|
||||
// fallback for raw files that produced nothing so the next DiffSet still sees
|
||||
// them.
|
||||
const provenanceRows: Parameters<typeof this.deps.provenance.insertMany>[0] = [];
|
||||
```
|
||||
|
||||
And ends with:
|
||||
|
||||
```ts
|
||||
await runTrace.event('debug', 'provenance', 'provenance_rows_validated', {
|
||||
rowCount: provenanceRows.length,
|
||||
});
|
||||
```
|
||||
|
||||
Do not delete the existing call to `await this.deps.provenance.insertMany(provenanceRows);`.
|
||||
Immediately after that insertion call, add:
|
||||
|
||||
```ts
|
||||
await runTrace.event('debug', 'provenance', 'provenance_rows_inserted', {
|
||||
rowCount: provenanceRows.length,
|
||||
});
|
||||
```
|
||||
|
||||
Then delete the later `const reportProvenanceRows = provenanceRows.map(...)`
|
||||
block because `reportProvenanceRows` is now created before squash from the
|
||||
prevalidated rows.
|
||||
|
||||
- [ ] **Step 5: Run the provenance regression**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context exec vitest run src/ingest/ingest-bundle.runner.isolated-diff.test.ts -t "invalid provenance raw paths"
|
||||
```
|
||||
|
||||
Expected: PASS. The trace contains `provenance_rows_validation_failed`, main
|
||||
HEAD remains unchanged, and `provenance.insertMany` is not called.
|
||||
|
||||
- [ ] **Step 6: Run the focused isolated-diff suite**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context exec vitest run \
|
||||
src/ingest/ingest-trace.test.ts \
|
||||
src/ingest/wiki-body-refs.test.ts \
|
||||
src/ingest/artifact-gates.test.ts \
|
||||
src/ingest/isolated-diff/git-patch.test.ts \
|
||||
src/ingest/isolated-diff/work-unit-executor.test.ts \
|
||||
src/ingest/isolated-diff/patch-integrator.test.ts \
|
||||
src/ingest/ingest-bundle.runner.isolated-diff.test.ts
|
||||
```
|
||||
|
||||
Expected: PASS.
|
||||
|
||||
### Task 3: Type-check, dead-code check, and commit
|
||||
|
||||
**Files:**
|
||||
- Verify: `packages/context/src/ingest/ingest-bundle.runner.ts`
|
||||
- Verify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`
|
||||
|
||||
- [ ] **Step 1: Run the context package type-check**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context run type-check
|
||||
```
|
||||
|
||||
Expected: PASS.
|
||||
|
||||
- [ ] **Step 2: Run the workspace dead-code check**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm run dead-code
|
||||
```
|
||||
|
||||
Expected: PASS, or only existing unrelated Knip/Biome findings. Investigate
|
||||
any new findings in the two modified files before continuing.
|
||||
|
||||
- [ ] **Step 3: Commit the provenance gate closure**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
git add packages/context/src/ingest/ingest-bundle.runner.ts \
|
||||
packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts
|
||||
git commit -m "fix(ingest): gate provenance before isolated diff squash"
|
||||
```
|
||||
|
||||
Expected: one commit containing only the runner and isolated-diff runner test
|
||||
changes.
|
||||
|
||||
## Self-Review
|
||||
|
||||
Spec coverage: this plan closes the remaining violation of the design's final
|
||||
global gate invariant by proving invalid provenance raw paths fail before
|
||||
squash and by moving provenance validation into the pre-main gate boundary.
|
||||
|
||||
Placeholder scan: no placeholder steps remain. Every implementation step names
|
||||
the exact files, code, commands, and expected results.
|
||||
|
||||
Type consistency: the plan uses existing `IngestProvenanceInsert`,
|
||||
`IngestReportProvenanceDetail`, `MemoryAction`, `EvictionUnit`, `StageIndex`,
|
||||
`rawPathsForAction()`, and `validateProvenanceRawPaths()` names.
|
||||
@ -0,0 +1,754 @@
|
|||
# Isolated Diff Ingestion V1 Default Promotion Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use
|
||||
> superpowers:subagent-driven-development (recommended) or
|
||||
> superpowers:executing-plans to implement this plan task-by-task. Steps use
|
||||
> checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Promote isolated-diff WorkUnit execution to the default ingest runner
|
||||
path while keeping the old shared-worktree branch reachable by an explicit
|
||||
private fallback setting for the final cleanup rollout.
|
||||
|
||||
**Architecture:** The runner stops asking whether a source is on an
isolated-diff allowlist. Instead, non-override bundle ingests use isolated
diffs unless the private settings object lists the source in
`sharedWorktreeSourceKeys`. The local runtime defaults that fallback list to
empty, and tests keep the old path covered with an explicit legacy source
setting so that rollout step 11 can delete it safely.
|
||||
|
||||
**Tech Stack:** TypeScript ESM/NodeNext, Vitest, pnpm workspace commands,
|
||||
existing `IngestBundleRunner`, `IngestSettingsPort`, local ingest runtime, and
|
||||
isolated-diff runner tests.
|
||||
|
||||
---
|
||||
|
||||
## Audit summary
|
||||
|
||||
This audit read the original spec at
|
||||
`docs/superpowers/specs/2026-05-17-isolated-diff-ingestion-design.md`, all
|
||||
plans matching
|
||||
`docs/superpowers/plans/2026-05-17-isolated-diff-ingestion-*.md` and
|
||||
`docs/superpowers/plans/2026-05-18-isolated-diff-ingestion-*.md`, and the
|
||||
current ingest runner code under `packages/context/src/ingest/`.
|
||||
|
||||
Implemented v1 rollout coverage:
|
||||
|
||||
- Rollout steps 1 and 2 are implemented by the core plan: child worktrees,
|
||||
binary no-rename patch proposals, and `git apply --3way --index`
|
||||
integration exist.
|
||||
- Rollout step 3 is implemented by the textual conflict resolver plan:
|
||||
`textual-conflict-resolver.ts` is wired through `patch-integrator.ts`.
|
||||
- Rollout steps 4, 5, and 6 are implemented by the gates, provenance,
|
||||
reference, global wiki, and gate-repair plans: final gates, persistent traces,
|
||||
failure reports, provenance validation, target policy, and repair counters
|
||||
exist.
|
||||
- Rollout step 7 is implemented by the core and follow-up plans: Metabase has
|
||||
isolated-diff stale-reference regression coverage.
|
||||
- Rollout step 8 is implemented by
|
||||
`2026-05-18-isolated-diff-ingestion-v1-connector-migration.md` and the
|
||||
follow-up commits: Notion, LookML, Looker, dbt, and MetricFlow route through
|
||||
isolated child worktrees, and MetricFlow projection runs before WorkUnits.
|
||||
|
||||
Current v1-blocking gaps:
|
||||
|
||||
- Rollout step 10 is not complete. `IngestBundleRunner.isIsolatedDiffEnabled()`
|
||||
still checks `settings.isolatedDiffSourceKeys`, and
|
||||
`local-bundle-runtime.ts` still installs the internal allowlist returned by
|
||||
`defaultIsolatedDiffSourceKeys()`.
|
||||
- Rollout step 11 remains blocked until step 10 lands. The old
|
||||
shared-worktree WorkUnit branch is still present and must stay reachable in
|
||||
this plan for final cleanup validation.
|
||||
|
||||
Non-blocking gaps:
|
||||
|
||||
- Rollout step 9 (deterministic semantic merge helpers) remains intentionally
deferred until v1 resolver metrics show frequent mechanical repairs.
|
||||
- Transitive SQL-projection dependency expansion remains outside v1; current
|
||||
gates cover direct declared join neighbors.
|
||||
- Moving provenance into worktree files remains outside v1; the implemented
|
||||
source of truth is the ingest provenance store and report body.
|
||||
- Public connector knobs such as `executionMode`, `planningStrategy`, and
|
||||
`conflictPolicy` remain non-goals and must not be added.
|
||||
- Richer resolver context, such as full transcript excerpts for every
|
||||
overlapping patch, can be evaluated after the default path has production
|
||||
traces.
|
||||
|
||||
## File structure
|
||||
|
||||
- Modify `packages/context/src/ingest/isolated-diff/source-routing.ts`.
|
||||
Replace the isolated-diff direct-write allowlist with an empty default
|
||||
shared-worktree fallback list.
|
||||
- Modify `packages/context/src/ingest/isolated-diff/source-routing.test.ts`.
|
||||
Lock the fallback list semantics and remove direct-write allowlist
|
||||
assertions.
|
||||
- Modify `packages/context/src/ingest/ports.ts`.
|
||||
Replace `isolatedDiffSourceKeys?: string[]` with
|
||||
`sharedWorktreeSourceKeys?: string[]` on the private runner settings port.
|
||||
- Modify `packages/context/src/ingest/ingest-bundle.runner.ts`.
|
||||
Make isolated diff the default for non-override runs and route to the old
|
||||
shared branch only when `sharedWorktreeSourceKeys` contains the source.
|
||||
- Modify `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`.
|
||||
Prove an unlisted source uses isolated diffs by default and prove an
|
||||
explicit fallback source can still reach the shared-worktree branch.
|
||||
- Modify `packages/context/src/ingest/local-bundle-runtime.ts`.
|
||||
Install the new empty fallback list instead of the old isolated-diff
|
||||
allowlist.
|
||||
- Modify `packages/context/src/ingest/local-bundle-runtime.test.ts`.
|
||||
Assert that local runtime settings default `sharedWorktreeSourceKeys` to `[]`
and do not expose `isolatedDiffSourceKeys`.
|
||||
|
||||
---
|
||||
|
||||
### Task 1: Replace source routing semantics
|
||||
|
||||
**Files:**
|
||||
- Modify: `packages/context/src/ingest/isolated-diff/source-routing.test.ts`
|
||||
- Modify: `packages/context/src/ingest/isolated-diff/source-routing.ts`
|
||||
- Modify: `packages/context/src/ingest/ports.ts`
|
||||
|
||||
- [ ] **Step 1: Write the failing source-routing tests**
|
||||
|
||||
Replace `packages/context/src/ingest/isolated-diff/source-routing.test.ts` with:
|
||||
|
||||
```ts
|
||||
import { describe, expect, it } from 'vitest';
|
||||
import { defaultSharedWorktreeSourceKeys, isSharedWorktreeFallbackSourceKey } from './source-routing.js';
|
||||
|
||||
describe('isolated-diff source routing', () => {
|
||||
it('defaults every non-override source to isolated diffs', () => {
|
||||
expect(defaultSharedWorktreeSourceKeys()).toEqual([]);
|
||||
});
|
||||
|
||||
it('returns a mutable copy for runtime settings', () => {
|
||||
const keys = defaultSharedWorktreeSourceKeys();
|
||||
keys.push('legacy-source');
|
||||
|
||||
expect(defaultSharedWorktreeSourceKeys()).toEqual([]);
|
||||
});
|
||||
|
||||
it('recognizes only explicitly configured shared-worktree fallback sources', () => {
|
||||
expect(isSharedWorktreeFallbackSourceKey('notion', [])).toBe(false);
|
||||
expect(isSharedWorktreeFallbackSourceKey('metricflow', [])).toBe(false);
|
||||
expect(isSharedWorktreeFallbackSourceKey('legacy-source', ['legacy-source'])).toBe(true);
|
||||
expect(isSharedWorktreeFallbackSourceKey('other-source', ['legacy-source'])).toBe(false);
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Run the source-routing tests to verify they fail**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context exec vitest run src/ingest/isolated-diff/source-routing.test.ts
|
||||
```
|
||||
|
||||
Expected: FAIL because `defaultSharedWorktreeSourceKeys()` and
|
||||
`isSharedWorktreeFallbackSourceKey()` are not exported yet.
|
||||
|
||||
- [ ] **Step 3: Rewrite the routing helper**
|
||||
|
||||
Replace `packages/context/src/ingest/isolated-diff/source-routing.ts` with:
|
||||
|
||||
```ts
const DEFAULT_SHARED_WORKTREE_SOURCE_KEYS: readonly string[] = [];

export function defaultSharedWorktreeSourceKeys(): string[] {
  return [...DEFAULT_SHARED_WORKTREE_SOURCE_KEYS];
}

export function isSharedWorktreeFallbackSourceKey(
  sourceKey: string,
  sharedWorktreeSourceKeys: readonly string[] = DEFAULT_SHARED_WORKTREE_SOURCE_KEYS,
): boolean {
  return sharedWorktreeSourceKeys.includes(sourceKey);
}
```
|
||||
|
||||
- [ ] **Step 4: Rename the private settings field**
|
||||
|
||||
In `packages/context/src/ingest/ports.ts`, replace the
|
||||
`IngestSettingsPort` interface with:
|
||||
|
||||
```ts
export interface IngestSettingsPort {
  memoryIngestionModel: string;
  probeRowCount: number;
  workUnitMaxConcurrency?: number;
  workUnitStepBudget?: number;
  workUnitFailureMode?: 'abort' | 'continue';
  sharedWorktreeSourceKeys?: string[];
  ingestTraceLevel?: IngestTraceLevel;
}
```
|
||||
|
||||
- [ ] **Step 5: Run the source-routing tests again**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context exec vitest run src/ingest/isolated-diff/source-routing.test.ts
|
||||
```
|
||||
|
||||
Expected: PASS.
|
||||
|
||||
- [ ] **Step 6: Commit routing semantics**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
git add packages/context/src/ingest/isolated-diff/source-routing.ts \
|
||||
packages/context/src/ingest/isolated-diff/source-routing.test.ts \
|
||||
packages/context/src/ingest/ports.ts
|
||||
git commit -m "feat(ingest): make isolated diff routing the private default"
|
||||
```
|
||||
|
||||
### Task 2: Promote the runner default
|
||||
|
||||
**Files:**
|
||||
- Modify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`
|
||||
- Modify: `packages/context/src/ingest/ingest-bundle.runner.ts`
|
||||
|
||||
- [ ] **Step 1: Update the isolated runner test imports and harness**
|
||||
|
||||
In `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`,
|
||||
replace the source-routing import with:
|
||||
|
||||
```ts
|
||||
import { defaultSharedWorktreeSourceKeys } from './isolated-diff/source-routing.js';
|
||||
```
|
||||
|
||||
Then change the `makeDeps()` signature and `settings` block to:
|
||||
|
||||
```ts
|
||||
function makeDeps(
|
||||
runtime: Awaited<ReturnType<typeof makeRealGitRuntime>>,
|
||||
sourceKey = 'metabase',
|
||||
settings: Partial<IngestBundleRunnerDeps['settings']> = {},
|
||||
) {
|
||||
```
|
||||
|
||||
```ts
|
||||
settings: {
|
||||
memoryIngestionModel: 'test',
|
||||
probeRowCount: 1,
|
||||
sharedWorktreeSourceKeys: defaultSharedWorktreeSourceKeys(),
|
||||
ingestTraceLevel: 'trace',
|
||||
...settings,
|
||||
},
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Add the default-promotion regression tests**
|
||||
|
||||
Insert these tests inside
|
||||
`describe('IngestBundleRunner isolated diff path', ...)`, before the existing
|
||||
non-Metabase routing matrix:
|
||||
|
||||
```ts
|
||||
it('routes an unlisted direct-writing source through isolated diffs by default', async () => {
|
||||
const runtime = await makeRealGitRuntime();
|
||||
try {
|
||||
const sourceKey = 'custom-direct-source';
|
||||
const { deps, adapter } = makeDeps(runtime, sourceKey);
|
||||
adapter.chunk.mockResolvedValue({
|
||||
workUnits: [
|
||||
{
|
||||
unitKey: 'custom-wiki',
|
||||
rawFiles: ['custom/page.json'],
|
||||
peerFileIndex: [],
|
||||
dependencyPaths: [],
|
||||
},
|
||||
],
|
||||
});
|
||||
|
||||
let currentSession: any = null;
|
||||
deps.toolsetFactory.createIngestWuToolset = vi.fn((toolSession: any) => {
|
||||
currentSession = toolSession;
|
||||
return { toRuntimeTools: vi.fn(() => ({})) };
|
||||
});
|
||||
deps.agentRunner.runLoop = vi.fn(async (params: any) => {
|
||||
if (params.telemetryTags.operationName !== 'ingest-bundle-wu') {
|
||||
return { stopReason: 'natural' };
|
||||
}
|
||||
const root = rootOfConfig(currentSession.configService, runtime.configDir);
|
||||
await mkdir(join(root, 'wiki/global'), { recursive: true });
|
||||
await writeFile(
|
||||
join(root, 'wiki/global/custom-isolated.md'),
|
||||
'---\nsummary: Custom isolated write\nusage_mode: auto\n---\n\nCustom isolated write.\n',
|
||||
'utf-8',
|
||||
);
|
||||
currentSession.actions.push({
|
||||
target: 'wiki',
|
||||
type: 'created',
|
||||
key: 'custom-isolated',
|
||||
detail: 'Custom isolated write',
|
||||
rawPaths: ['custom/page.json'],
|
||||
});
|
||||
await currentSession.gitService.commitFiles(
|
||||
['wiki/global/custom-isolated.md'],
|
||||
'custom wiki',
|
||||
'KTX Test',
|
||||
'system@ktx.local',
|
||||
);
|
||||
return { stopReason: 'natural' };
|
||||
}) as never;
|
||||
|
||||
const runner = new IngestBundleRunner(deps);
|
||||
await mockStageRawFiles(runner, runtime, [['custom/page.json', 'h1']], sourceKey);
|
||||
|
||||
await expect(
|
||||
runner.run({
|
||||
jobId: 'job-custom-default',
|
||||
connectionId: 'warehouse',
|
||||
sourceKey,
|
||||
trigger: 'upload',
|
||||
bundleRef: { kind: 'upload', uploadId: 'upload' },
|
||||
}),
|
||||
).resolves.toMatchObject({
|
||||
jobId: 'job-custom-default',
|
||||
failedWorkUnits: [],
|
||||
workUnitCount: 1,
|
||||
});
|
||||
|
||||
const trace = await readFile(
|
||||
join(runtime.configDir, '.ktx/ingest-traces/job-custom-default/trace.jsonl'),
|
||||
'utf-8',
|
||||
);
|
||||
expect(trace).toContain('isolated_diff_enabled');
|
||||
expect(trace).toContain('work_unit_child_created');
|
||||
expect(trace).not.toContain('shared_worktree_path_enabled');
|
||||
|
||||
const reportCreate = vi.mocked(deps.reports.create).mock.calls.at(-1)?.[0];
|
||||
const reportBody = reportCreate?.body as { isolatedDiff?: unknown } | undefined;
|
||||
expect(reportBody?.isolatedDiff).toMatchObject({
|
||||
enabled: true,
|
||||
acceptedPatches: 1,
|
||||
});
|
||||
} finally {
|
||||
await rm(runtime.homeDir, { recursive: true, force: true });
|
||||
}
|
||||
});
|
||||
|
||||
it('keeps the shared-worktree path reachable through explicit private fallback settings', async () => {
|
||||
const runtime = await makeRealGitRuntime();
|
||||
try {
|
||||
const sourceKey = 'legacy-source';
|
||||
const { deps, adapter } = makeDeps(runtime, sourceKey, {
|
||||
sharedWorktreeSourceKeys: ['legacy-source'],
|
||||
});
|
||||
adapter.chunk.mockResolvedValue({
|
||||
workUnits: [
|
||||
{
|
||||
unitKey: 'legacy-wiki',
|
||||
rawFiles: ['legacy/page.json'],
|
||||
peerFileIndex: [],
|
||||
dependencyPaths: [],
|
||||
},
|
||||
],
|
||||
});
|
||||
|
||||
let currentSession: any = null;
|
||||
deps.toolsetFactory.createIngestWuToolset = vi.fn((toolSession: any) => {
|
||||
currentSession = toolSession;
|
||||
return { toRuntimeTools: vi.fn(() => ({})) };
|
||||
});
|
||||
deps.agentRunner.runLoop = vi.fn(async (params: any) => {
|
||||
if (params.telemetryTags.operationName !== 'ingest-bundle-wu') {
|
||||
return { stopReason: 'natural' };
|
||||
}
|
||||
const root = rootOfConfig(currentSession.configService, runtime.configDir);
|
||||
await mkdir(join(root, 'wiki/global'), { recursive: true });
|
||||
await writeFile(
|
||||
join(root, 'wiki/global/legacy-shared.md'),
|
||||
'---\nsummary: Legacy shared write\nusage_mode: auto\n---\n\nLegacy shared write.\n',
|
||||
'utf-8',
|
||||
);
|
||||
currentSession.actions.push({
|
||||
target: 'wiki',
|
||||
type: 'created',
|
||||
key: 'legacy-shared',
|
||||
detail: 'Legacy shared write',
|
||||
rawPaths: ['legacy/page.json'],
|
||||
});
|
||||
await currentSession.gitService.commitFiles(
|
||||
['wiki/global/legacy-shared.md'],
|
||||
'legacy wiki',
|
||||
'KTX Test',
|
||||
'system@ktx.local',
|
||||
);
|
||||
return { stopReason: 'natural' };
|
||||
}) as never;
|
||||
|
||||
const runner = new IngestBundleRunner(deps);
|
||||
await mockStageRawFiles(runner, runtime, [['legacy/page.json', 'h1']], sourceKey);
|
||||
|
||||
await expect(
|
||||
runner.run({
|
||||
jobId: 'job-legacy-shared',
|
||||
connectionId: 'warehouse',
|
||||
sourceKey,
|
||||
trigger: 'upload',
|
||||
bundleRef: { kind: 'upload', uploadId: 'upload' },
|
||||
}),
|
||||
).resolves.toMatchObject({
|
||||
jobId: 'job-legacy-shared',
|
||||
failedWorkUnits: [],
|
||||
workUnitCount: 1,
|
||||
});
|
||||
|
||||
const trace = await readFile(
|
||||
join(runtime.configDir, '.ktx/ingest-traces/job-legacy-shared/trace.jsonl'),
|
||||
'utf-8',
|
||||
);
|
||||
expect(trace).toContain('shared_worktree_path_enabled');
|
||||
expect(trace).not.toContain('work_unit_child_created');
|
||||
|
||||
const reportCreate = vi.mocked(deps.reports.create).mock.calls.at(-1)?.[0];
|
||||
const reportBody = reportCreate?.body as { isolatedDiff?: unknown } | undefined;
|
||||
expect(reportBody?.isolatedDiff).toMatchObject({
|
||||
enabled: false,
|
||||
});
|
||||
} finally {
|
||||
await rm(runtime.homeDir, { recursive: true, force: true });
|
||||
}
|
||||
});
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Run the new runner tests to verify the default test fails**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context exec vitest run src/ingest/ingest-bundle.runner.isolated-diff.test.ts -t "unlisted direct-writing source|shared-worktree path reachable"
|
||||
```
|
||||
|
||||
Expected: FAIL. The unlisted source still enters the old shared-worktree path
because the runner still checks `isolatedDiffSourceKeys`, which the updated
test harness no longer sets.
|
||||
|
||||
- [ ] **Step 4: Change the runner routing decision**
|
||||
|
||||
In `packages/context/src/ingest/ingest-bundle.runner.ts`, replace
|
||||
`isIsolatedDiffEnabled()` with:
|
||||
|
||||
```ts
private isSharedWorktreeFallbackEnabled(sourceKey: string): boolean {
  return (this.deps.settings.sharedWorktreeSourceKeys ?? []).includes(sourceKey);
}
```
|
||||
|
||||
Then replace the isolated-diff routing line with:
|
||||
|
||||
```ts
|
||||
const isolatedDiffEnabled = !overrideReport && !this.isSharedWorktreeFallbackEnabled(job.sourceKey);
|
||||
```
|
||||
|
||||
Finally, replace the shared-path trace event with:
|
||||
|
||||
```ts
|
||||
await runTrace.event('info', 'routing', 'shared_worktree_path_enabled', {
|
||||
sourceKey: job.sourceKey,
|
||||
reason: 'explicit_private_fallback',
|
||||
});
|
||||
```
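
The routing rule that Step 4 introduces can be sketched as a pure function, which makes the default and the fallback easy to see side by side. This is a minimal sketch, not the runner's code: `RoutingSettings`, `IngestRoute`, and `routeIngest` are hypothetical names, and only the `sharedWorktreeSourceKeys` field and the `overrideReport` condition come from the plan.

```typescript
// Hypothetical stand-in for the runner settings; only the field named in this plan is modeled.
interface RoutingSettings {
  sharedWorktreeSourceKeys?: string[];
}

// Hypothetical labels for the three outcomes implied by the Step 4 snippets.
type IngestRoute = 'override' | 'shared-worktree' | 'isolated-diff';

// Mirrors the Step 4 decision: override runs take neither WorkUnit path,
// explicitly listed sources take the private fallback, and every other
// source defaults to isolated diffs.
function routeIngest(
  settings: RoutingSettings,
  sourceKey: string,
  hasOverrideReport: boolean,
): IngestRoute {
  if (hasOverrideReport) return 'override';
  const fallbackKeys = settings.sharedWorktreeSourceKeys ?? [];
  return fallbackKeys.includes(sourceKey) ? 'shared-worktree' : 'isolated-diff';
}
```

With no fallback keys configured, every non-override source resolves to `'isolated-diff'`, which is the behavior the "unlisted direct-writing source" test is meant to pin down.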
|
||||
|
||||
- [ ] **Step 5: Run the new runner tests again**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context exec vitest run src/ingest/ingest-bundle.runner.isolated-diff.test.ts -t "unlisted direct-writing source|shared-worktree path reachable"
|
||||
```
|
||||
|
||||
Expected: PASS.
|
||||
|
||||
- [ ] **Step 6: Commit runner default promotion**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
git add packages/context/src/ingest/ingest-bundle.runner.ts \
|
||||
packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts
|
||||
git commit -m "feat(ingest): promote isolated diff to default runner path"
|
||||
```
|
||||
|
||||
### Task 3: Update local runtime defaults
|
||||
|
||||
**Files:**
|
||||
- Modify: `packages/context/src/ingest/local-bundle-runtime.test.ts`
|
||||
- Modify: `packages/context/src/ingest/local-bundle-runtime.ts`
|
||||
|
||||
- [ ] **Step 1: Update the local runtime settings test type**
|
||||
|
||||
In `packages/context/src/ingest/local-bundle-runtime.test.ts`, replace
|
||||
`RuntimeWithSettingsDeps` with:
|
||||
|
||||
```ts
|
||||
type RuntimeWithSettingsDeps = {
|
||||
deps: {
|
||||
settings: {
|
||||
sharedWorktreeSourceKeys?: string[];
|
||||
isolatedDiffSourceKeys?: string[];
|
||||
};
|
||||
};
|
||||
};
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Replace the local runtime settings assertion**
|
||||
|
||||
Replace the test named
|
||||
`enables isolated-diff routing for direct durable-write connectors` with:
|
||||
|
||||
```ts
|
||||
it('defaults local bundle ingest to isolated diffs without an allowlist', () => {
|
||||
const runtime = createLocalBundleIngestRuntime({
|
||||
project,
|
||||
adapters: [new FakeSourceAdapter()],
|
||||
agentRunner: testAgentRunner(),
|
||||
});
|
||||
|
||||
const settings = (runtime.runner as unknown as RuntimeWithSettingsDeps).deps.settings;
|
||||
|
||||
expect(settings.sharedWorktreeSourceKeys).toEqual([]);
|
||||
expect('isolatedDiffSourceKeys' in settings).toBe(false);
|
||||
});
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Run the local runtime settings test to verify it fails**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context exec vitest run src/ingest/local-bundle-runtime.test.ts -t "defaults local bundle ingest"
|
||||
```
|
||||
|
||||
Expected: FAIL because `local-bundle-runtime.ts` still sets
|
||||
`isolatedDiffSourceKeys`.
|
||||
|
||||
- [ ] **Step 4: Update local runtime imports and settings**
|
||||
|
||||
In `packages/context/src/ingest/local-bundle-runtime.ts`, replace the
|
||||
source-routing import with:
|
||||
|
||||
```ts
|
||||
import { defaultSharedWorktreeSourceKeys } from './isolated-diff/source-routing.js';
|
||||
```
|
||||
|
||||
Then replace the settings field:
|
||||
|
||||
```ts
|
||||
isolatedDiffSourceKeys: defaultIsolatedDiffSourceKeys(),
|
||||
```
|
||||
|
||||
with:
|
||||
|
||||
```ts
|
||||
sharedWorktreeSourceKeys: defaultSharedWorktreeSourceKeys(),
|
||||
```
|
||||
|
||||
- [ ] **Step 5: Run the local runtime settings test again**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context exec vitest run src/ingest/local-bundle-runtime.test.ts -t "defaults local bundle ingest"
|
||||
```
|
||||
|
||||
Expected: PASS.
|
||||
|
||||
- [ ] **Step 6: Commit local runtime defaults**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
git add packages/context/src/ingest/local-bundle-runtime.ts \
|
||||
packages/context/src/ingest/local-bundle-runtime.test.ts
|
||||
git commit -m "feat(ingest): default local ingest to isolated diffs"
|
||||
```
|
||||
|
||||
### Task 4: Remove stale allowlist references
|
||||
|
||||
**Files:**
|
||||
- Verify: `packages/context/src/ingest/isolated-diff/source-routing.ts`
|
||||
- Verify: `packages/context/src/ingest/local-bundle-runtime.ts`
|
||||
- Verify: `packages/context/src/ingest/ingest-bundle.runner.ts`
|
||||
- Verify: `packages/context/src/ingest/ports.ts`
|
||||
- Verify: `packages/context/src/ingest/**/*.test.ts`
|
||||
|
||||
- [ ] **Step 1: Search for old allowlist names**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
rg -n "isolatedDiffSourceKeys|defaultIsolatedDiffSourceKeys|ISOLATED_DIFF_DIRECT_WRITE_SOURCE_KEYS|isIsolatedDiffDirectWriteSourceKey" packages/context/src
|
||||
```
|
||||
|
||||
Expected: no matches.
|
||||
|
||||
- [ ] **Step 2: Search for the new fallback setting**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
rg -n "sharedWorktreeSourceKeys|defaultSharedWorktreeSourceKeys|isSharedWorktreeFallbackSourceKey" packages/context/src
|
||||
```
|
||||
|
||||
Expected: matches only in these files:
|
||||
|
||||
```text
|
||||
packages/context/src/ingest/ports.ts
|
||||
packages/context/src/ingest/ingest-bundle.runner.ts
|
||||
packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts
|
||||
packages/context/src/ingest/isolated-diff/source-routing.ts
|
||||
packages/context/src/ingest/isolated-diff/source-routing.test.ts
|
||||
packages/context/src/ingest/local-bundle-runtime.ts
|
||||
packages/context/src/ingest/local-bundle-runtime.test.ts
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Run a focused no-allowlist regression suite**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context exec vitest run \
|
||||
src/ingest/isolated-diff/source-routing.test.ts \
|
||||
src/ingest/local-bundle-runtime.test.ts \
|
||||
src/ingest/ingest-bundle.runner.isolated-diff.test.ts \
|
||||
-t "source routing|defaults local bundle ingest|unlisted direct-writing source|shared-worktree path reachable|routes notion|routes lookml|routes looker|routes dbt|routes metricflow"
|
||||
```
|
||||
|
||||
Expected: PASS.
|
||||
|
||||
- [ ] **Step 4: Commit stale-reference cleanup if needed**
|
||||
|
||||
If Step 1 or Step 2 required any edits, run:
|
||||
|
||||
```bash
|
||||
git add packages/context/src/ingest
|
||||
git commit -m "chore(ingest): remove isolated diff allowlist references"
|
||||
```
|
||||
|
||||
If no files changed, record that no cleanup commit was needed in the execution
|
||||
notes for this task.
|
||||
|
||||
### Task 5: Final verification
|
||||
|
||||
**Files:**
|
||||
- Verify: `packages/context/src/ingest/isolated-diff/source-routing.ts`
|
||||
- Verify: `packages/context/src/ingest/isolated-diff/source-routing.test.ts`
|
||||
- Verify: `packages/context/src/ingest/ingest-bundle.runner.ts`
|
||||
- Verify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`
|
||||
- Verify: `packages/context/src/ingest/local-bundle-runtime.ts`
|
||||
- Verify: `packages/context/src/ingest/local-bundle-runtime.test.ts`
|
||||
- Verify: `packages/context/src/ingest/ports.ts`
|
||||
- Verify: `docs/superpowers/plans/2026-05-18-isolated-diff-ingestion-v1-default-promotion.md`
|
||||
|
||||
- [ ] **Step 1: Run the full isolated-diff focused suite**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context exec vitest run \
|
||||
src/ingest/ingest-trace.test.ts \
|
||||
src/ingest/wiki-body-refs.test.ts \
|
||||
src/ingest/artifact-gates.test.ts \
|
||||
src/ingest/semantic-layer-target-policy.test.ts \
|
||||
src/ingest/isolated-diff/source-routing.test.ts \
|
||||
src/ingest/isolated-diff/git-patch.test.ts \
|
||||
src/ingest/isolated-diff/work-unit-executor.test.ts \
|
||||
src/ingest/isolated-diff/patch-integrator.test.ts \
|
||||
src/ingest/isolated-diff/textual-conflict-resolver.test.ts \
|
||||
src/ingest/final-gate-repair.test.ts \
|
||||
src/ingest/ingest-bundle.runner.isolated-diff.test.ts \
|
||||
src/ingest/report-snapshot.test.ts \
|
||||
src/ingest/local-bundle-runtime.test.ts
|
||||
```
|
||||
|
||||
Expected: PASS.
|
||||
|
||||
- [ ] **Step 2: Run the MetricFlow local ingest regression**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context exec vitest run src/ingest/local-bundle-ingest.test.ts -t "runs full MetricFlow local ingest"
|
||||
```
|
||||
|
||||
Expected: PASS. The report body includes `isolatedDiff.enabled: true`,
|
||||
`acceptedPatches: 0`, and a string `projectionSha`.
|
||||
|
||||
- [ ] **Step 3: Run package type-check**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context run type-check
|
||||
```
|
||||
|
||||
Expected: PASS.
|
||||
|
||||
- [ ] **Step 4: Run package tests**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context run test
|
||||
```
|
||||
|
||||
Expected: PASS.
|
||||
|
||||
- [ ] **Step 5: Run TypeScript dead-code checks**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm run dead-code
|
||||
```
|
||||
|
||||
Expected: PASS, or only pre-existing findings unrelated to the files changed
|
||||
by this plan. Investigate any finding that names `source-routing.ts`,
|
||||
`ports.ts`, `local-bundle-runtime.ts`, or `ingest-bundle.runner.ts`.
|
||||
|
||||
- [ ] **Step 6: Decide whether docs-site needs an update**
|
||||
|
||||
No `docs-site/content/docs/` change is expected for this plan because the
|
||||
change is an internal runner rollout switch and does not add or remove public
|
||||
CLI commands, flags, config fields, connector setup steps, or user-facing
|
||||
documentation concepts.
|
||||
|
||||
- [ ] **Step 7: Commit final verification notes**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
git status --short
|
||||
git add docs/superpowers/plans/2026-05-18-isolated-diff-ingestion-v1-default-promotion.md
|
||||
git commit -m "docs: add isolated diff default promotion plan"
|
||||
```
|
||||
|
||||
Only include the plan file in this commit if all implementation commits have
|
||||
already captured their code changes.
|
||||
|
||||
## Completion criteria

This plan is complete when:

- `packages/context/src/ingest/ports.ts` has
  `sharedWorktreeSourceKeys?: string[]` and no `isolatedDiffSourceKeys` field.
- `IngestBundleRunner` uses isolated diffs for every non-override source unless
  `sharedWorktreeSourceKeys` explicitly contains that source.
- The trace for a default-routed source contains `isolated_diff_enabled` and
  not `shared_worktree_path_enabled`.
- The trace for an explicitly fallback-routed source contains
  `shared_worktree_path_enabled` and not `work_unit_child_created`.
- Local runtime settings default `sharedWorktreeSourceKeys` to `[]`.
- No production or test code under `packages/context/src` references the old
  isolated-diff allowlist names.
- The focused isolated-diff suite, MetricFlow local ingest regression,
  `@ktx/context` type-check, `@ktx/context` tests, and dead-code checks pass.
|
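
The two trace criteria above can be checked mechanically. The sketch below uses the same substring checks that the runner tests apply to `trace.jsonl`, so it assumes nothing about the JSON shape of a trace line. `TraceRoute` and `classifyTrace` are hypothetical names introduced for illustration.

```typescript
// Hypothetical result labels; 'inconsistent' flags a trace that mixes both paths.
type TraceRoute = 'isolated-diff' | 'shared-worktree' | 'inconsistent' | 'unknown';

// Classifies a raw trace by the event names listed in the completion criteria.
// Substring matching mirrors the `expect(trace).toContain(...)` style of the tests.
function classifyTrace(traceText: string): TraceRoute {
  const isolated = traceText.includes('isolated_diff_enabled');
  const shared = traceText.includes('shared_worktree_path_enabled');
  const childCreated = traceText.includes('work_unit_child_created');

  if (isolated && shared) return 'inconsistent';
  if (shared) return childCreated ? 'inconsistent' : 'shared-worktree';
  if (isolated) return 'isolated-diff';
  return 'unknown';
}
```

A fallback-routed trace that also records `work_unit_child_created` is reported as `'inconsistent'`, matching the criterion that the shared path must not create child worktrees.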
||||
|
||||
## Next rollout step

After this plan is implemented and verified, the only remaining v1-blocking
rollout item from the spec is step 11: remove the old shared-worktree WorkUnit
execution path and delete the private `sharedWorktreeSourceKeys` fallback
setting.
|
||||
|
|
@ -0,0 +1,980 @@
|
|||
# Isolated Diff Ingestion V1 Shared Worktree Removal Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use
> superpowers:subagent-driven-development (recommended) or
> superpowers:executing-plans to implement this plan task-by-task. Steps use
> checkbox (`- [ ]`) syntax for tracking.

**Goal:** Remove the old shared-worktree WorkUnit execution path so every
non-override bundle ingest uses isolated WorkUnit diffs.

**Architecture:** Keep `IngestBundleRunner` with one non-override execution
path: raw snapshot, optional deterministic projection, child WorkUnit
worktrees, patch integration, reconciliation, final gates, provenance
validation, and squash. Delete the private fallback routing setting and all
legacy tests, traces, and agent instructions that existed only for shared
WorkUnit state.

**Tech Stack:** TypeScript, Vitest, pnpm, KTX ingest runner, Git worktrees.

---
|
||||
|
||||
## Audit summary

This audit read the original design in
`docs/superpowers/specs/2026-05-17-isolated-diff-ingestion-design.md`, every
implemented plan matching
`docs/superpowers/plans/2026-05-17-isolated-diff-ingestion-*.md` and
`docs/superpowers/plans/2026-05-18-isolated-diff-ingestion-*.md`, and the
current implementation under `packages/context/src/ingest/`,
`packages/context/prompts/`, and `packages/context/skills/`.

Implemented v1 rollout coverage:

- Rollout steps 1 and 2 exist in code: isolated child worktrees, binary
  no-rename patch collection, and `git apply --3way --index` patch integration.
- Rollout step 3 exists in code:
  `packages/context/src/ingest/isolated-diff/textual-conflict-resolver.ts` is
  wired through the patch integrator and runner.
- Rollout steps 4, 5, and 6 exist in code: final wiki and semantic-layer gates,
  provenance validation before squash, target policy checks, bounded gate
  repair, failed reports, and trace counters.
- Rollout step 7 exists in code: the Metabase stale body-reference regression
  is covered in `ingest-bundle.runner.isolated-diff.test.ts`.
- Rollout step 8 is committed: Notion, LookML, Looker, dbt, and MetricFlow
  route through isolated child worktrees, and MetricFlow projection runs before
  WorkUnits.
- Rollout step 10 is committed: non-override ingests default to isolated diffs,
  and the old branch is reachable only through the private
  `sharedWorktreeSourceKeys` fallback setting.
|
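
Rollout steps 1 and 2 above describe collecting a binary, no-rename patch from a child worktree and integrating it with `git apply --3way --index`. The sketch below only builds the argument lists for those two Git invocations. It is an assumption about how such a helper could look, not the project's `git-patch` implementation: `buildPatchCollectArgs` and `buildPatchApplyArgs` are hypothetical names, and the real helpers may pass additional flags.

```typescript
// Hypothetical helper: argv for `git diff` that emits binary-safe hunks and
// disables rename detection, so a moved file appears as a delete plus an add.
function buildPatchCollectArgs(baseSha: string, headSha: string): string[] {
  return ['diff', '--binary', '--no-renames', baseSha, headSha];
}

// Hypothetical helper: argv for applying a collected patch with a three-way
// merge fallback while also updating the index, as named in the audit summary.
function buildPatchApplyArgs(patchPath: string): string[] {
  return ['apply', '--3way', '--index', patchPath];
}
```

Keeping argument construction separate from process spawning makes the exact flags easy to assert in a unit test without a real repository.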
||||
|
||||
## Remaining gaps

The remaining v1-blocking gaps are all part of rollout step 11:

- `packages/context/src/ingest/ports.ts` still exposes the private
  `sharedWorktreeSourceKeys?: string[]` setting.
- `packages/context/src/ingest/isolated-diff/source-routing.ts` and its test
  exist only to support the fallback setting.
- `packages/context/src/ingest/local-bundle-runtime.ts` still installs
  `sharedWorktreeSourceKeys: []`.
- `packages/context/src/ingest/ingest-bundle.runner.ts` still checks
  `isSharedWorktreeFallbackEnabled()` and contains the
  `shared_worktree_path_enabled` branch that runs WorkUnits against the mutable
  integration worktree.
- `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`
  still has a regression proving the shared-worktree fallback is reachable.
- `packages/context/src/ingest/ingest-bundle.runner.test.ts` keeps broad runner
  tests on the legacy path through `sharedWorktreeSourceKeys`; those tests must
  either use the isolated mock harness or move coverage into the real-git
  isolated suite.
- `packages/context/prompts/memory_agent_bundle_ingest_work_unit.md` and
  `packages/context/skills/ingest_triage/SKILL.md` still tell WorkUnit agents
  that prior WorkUnit writes in the same job are visible in the current working
  branch. That instruction is false after isolated diffs and must be removed
  with the shared path.

Non-blocking gaps after this plan:

- Rollout step 9 deterministic semantic merge helpers remain intentionally
  deferred until resolver metrics show frequent mechanical repairs.
- Semantic-layer dependency expansion remains direct declared joins only; the
  spec explicitly defers transitive SQL-projection closure.
- Provenance remains in the ingest provenance store and report body; moving it
  to worktree files is a separate schema migration.
- Resolver context can later include richer transcript excerpts and explicit
  overlap summaries for every previously applied patch.
- Failures before an ingest run row exists still have deterministic trace files
  but no stored ingest report.
|
||||
|
||||
## File structure

- Modify `packages/context/src/ingest/ports.ts`. Remove the private fallback
  setting from `IngestSettingsPort`.
- Modify `packages/context/src/ingest/local-bundle-runtime.ts`. Stop importing
  and installing default shared-worktree fallback settings.
- Delete `packages/context/src/ingest/isolated-diff/source-routing.ts`. This
  helper has no responsibility once fallback routing is removed.
- Delete `packages/context/src/ingest/isolated-diff/source-routing.test.ts`.
  Its assertions exist only for the fallback helper.
- Modify `packages/context/src/ingest/ingest-bundle.runner.ts`. Delete
  `isSharedWorktreeFallbackEnabled()`, the old shared-worktree WorkUnit branch,
  and helper methods that only served that branch.
- Modify `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`.
  Remove fallback reachability coverage and add a stale-setting regression that
  proves a runtime object cannot opt out of isolated diffs.
- Modify `packages/context/src/ingest/ingest-bundle.runner.test.ts`. Remove
  the fallback setting from the broad test harness and make its mocked Git
  session support no-op isolated patch collection.
- Modify `packages/context/src/ingest/local-bundle-runtime.test.ts`. Assert
  local runtime settings do not contain the fallback key.
- Modify `packages/context/prompts/memory_agent_bundle_ingest_work_unit.md`.
  Replace shared-branch WorkUnit visibility instructions with isolated-diff
  instructions.
- Modify `packages/context/skills/ingest_triage/SKILL.md`. Remove Stage 3
  prior-WorkUnit visibility language and keep cross-WorkUnit sweep guidance in
  Stage 4 reconciliation.
|
||||
|
||||
---
|
||||
|
||||
### Task 1: Add removal-contract regressions
|
||||
|
||||
**Files:**
|
||||
- Modify: `packages/context/src/ingest/local-bundle-runtime.test.ts`
|
||||
- Modify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`
|
||||
|
||||
- [ ] **Step 1: Update the local runtime settings type**
|
||||
|
||||
In `packages/context/src/ingest/local-bundle-runtime.test.ts`, replace
|
||||
`RuntimeWithSettingsDeps` with:
|
||||
|
||||
```ts
|
||||
type RuntimeWithSettingsDeps = {
|
||||
deps: {
|
||||
settings: Record<string, unknown>;
|
||||
};
|
||||
};
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Replace the local runtime fallback-setting assertion**
|
||||
|
||||
In `packages/context/src/ingest/local-bundle-runtime.test.ts`, replace the test
|
||||
named `defaults local bundle ingest to isolated diffs without an allowlist` with:
|
||||
|
||||
```ts
|
||||
it('defaults local bundle ingest to isolated diffs without a shared-worktree fallback setting', () => {
|
||||
const runtime = createLocalBundleIngestRuntime({
|
||||
project,
|
||||
adapters: [new FakeSourceAdapter()],
|
||||
agentRunner: testAgentRunner(),
|
||||
});
|
||||
|
||||
const settings = (runtime.runner as unknown as RuntimeWithSettingsDeps).deps.settings;
|
||||
|
||||
expect(settings).not.toHaveProperty('sharedWorktreeSourceKeys');
|
||||
expect(Object.keys(settings).sort()).toEqual([
|
||||
'ingestTraceLevel',
|
||||
'memoryIngestionModel',
|
||||
'probeRowCount',
|
||||
'workUnitFailureMode',
|
||||
'workUnitMaxConcurrency',
|
||||
'workUnitStepBudget',
|
||||
]);
|
||||
});
|
||||
```
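
The exact-key assertion in Step 2 is stricter than checking a single property: any key outside the expected set fails the test, not only `sharedWorktreeSourceKeys`. A minimal sketch of that idea follows, with `unexpectedSettingKeys` as a hypothetical helper name; the key names in the usage note are taken from the test above.

```typescript
// Hypothetical helper: returns, in sorted order, every key on `settings`
// that is not in the allowed list. An empty result means nothing stale remains.
function unexpectedSettingKeys(
  settings: Record<string, unknown>,
  allowed: readonly string[],
): string[] {
  return Object.keys(settings)
    .filter((key) => !allowed.includes(key))
    .sort();
}
```

Called with the six keys the test expects as `allowed`, a settings object that still carries `sharedWorktreeSourceKeys` would yield `['sharedWorktreeSourceKeys']`, and a fully cleaned object would yield `[]`.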
|
||||
|
||||
- [ ] **Step 3: Remove the source-routing import from the isolated runner test**
|
||||
|
||||
In `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`,
|
||||
delete this import:
|
||||
|
||||
```ts
|
||||
import { defaultSharedWorktreeSourceKeys } from './isolated-diff/source-routing.js';
|
||||
```
|
||||
|
||||
Then remove the `sharedWorktreeSourceKeys` line from the `settings` object in
|
||||
`makeDeps()`:
|
||||
|
||||
```ts
|
||||
settings: {
|
||||
memoryIngestionModel: 'test',
|
||||
probeRowCount: 1,
|
||||
ingestTraceLevel: 'trace',
|
||||
...settings,
|
||||
},
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Replace the shared fallback reachability test**
|
||||
|
||||
In `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`,
|
||||
replace the test named
|
||||
`keeps the shared-worktree path reachable through explicit private fallback settings`
|
||||
with this stale-setting regression:
|
||||
|
||||
```ts
|
||||
it('does not support shared-worktree fallback settings', async () => {
|
||||
const runtime = await makeRealGitRuntime();
|
||||
try {
|
||||
const sourceKey = 'legacy-source';
|
||||
const staleSettings = {
|
||||
sharedWorktreeSourceKeys: ['legacy-source'],
|
||||
} as Partial<IngestBundleRunnerDeps['settings']> & Record<string, unknown>;
|
||||
const { deps, adapter } = makeDeps(runtime, sourceKey, staleSettings);
|
||||
adapter.chunk.mockResolvedValue({
|
||||
workUnits: [
|
||||
{
|
||||
unitKey: 'legacy-wiki',
|
||||
rawFiles: ['legacy/page.json'],
|
||||
peerFileIndex: [],
|
||||
dependencyPaths: [],
|
||||
},
|
||||
],
|
||||
});
|
||||
|
||||
let currentSession: any = null;
|
||||
deps.toolsetFactory.createIngestWuToolset = vi.fn((toolSession: any) => {
|
||||
currentSession = toolSession;
|
||||
return { toRuntimeTools: vi.fn(() => ({})) };
|
||||
});
|
||||
deps.agentRunner.runLoop = vi.fn(async (params: any) => {
|
||||
if (params.telemetryTags.operationName !== 'ingest-bundle-wu') {
|
||||
return { stopReason: 'natural' };
|
||||
}
|
||||
const root = rootOfConfig(currentSession.configService, runtime.configDir);
|
||||
await mkdir(join(root, 'wiki/global'), { recursive: true });
|
||||
await writeFile(
|
||||
join(root, 'wiki/global/legacy-isolated.md'),
|
||||
'---\nsummary: Legacy isolated write\nusage_mode: auto\n---\n\nLegacy isolated write.\n',
|
||||
'utf-8',
|
||||
);
|
||||
currentSession.actions.push({
|
||||
target: 'wiki',
|
||||
type: 'created',
|
||||
key: 'legacy-isolated',
|
||||
detail: 'Legacy isolated write',
|
||||
rawPaths: ['legacy/page.json'],
|
||||
});
|
||||
await currentSession.gitService.commitFiles(
|
||||
['wiki/global/legacy-isolated.md'],
|
||||
'legacy isolated wiki',
|
||||
'KTX Test',
|
||||
'system@ktx.local',
|
||||
);
|
||||
return { stopReason: 'natural' };
|
||||
}) as never;
|
||||
|
||||
const runner = new IngestBundleRunner(deps);
|
||||
await mockStageRawFiles(runner, runtime, [['legacy/page.json', 'h1']], sourceKey);
|
||||
|
||||
await expect(
|
||||
runner.run({
|
||||
jobId: 'job-legacy-isolated',
|
||||
connectionId: 'warehouse',
|
||||
sourceKey,
|
||||
trigger: 'upload',
|
||||
bundleRef: { kind: 'upload', uploadId: 'upload' },
|
||||
}),
|
||||
).resolves.toMatchObject({
|
||||
jobId: 'job-legacy-isolated',
|
||||
failedWorkUnits: [],
|
||||
workUnitCount: 1,
|
||||
});
|
||||
|
||||
const trace = await readFile(
|
||||
join(runtime.configDir, '.ktx/ingest-traces/job-legacy-isolated/trace.jsonl'),
|
||||
'utf-8',
|
||||
);
|
||||
expect(trace).toContain('isolated_diff_enabled');
|
||||
expect(trace).toContain('work_unit_child_created');
|
||||
expect(trace).not.toContain('shared_worktree_path_enabled');
|
||||
|
||||
const reportCreate = vi.mocked(deps.reports.create).mock.calls.at(-1)?.[0];
|
||||
const reportBody = reportCreate?.body as { isolatedDiff?: unknown } | undefined;
|
||||
expect(reportBody?.isolatedDiff).toMatchObject({
|
||||
enabled: true,
|
||||
acceptedPatches: 1,
|
||||
});
|
||||
} finally {
|
||||
await rm(runtime.homeDir, { recursive: true, force: true });
|
||||
}
|
||||
});
|
||||
```
|
||||
|
||||
- [ ] **Step 5: Run the removal regressions and confirm they fail**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context exec vitest run \
|
||||
src/ingest/local-bundle-runtime.test.ts \
|
||||
src/ingest/ingest-bundle.runner.isolated-diff.test.ts \
|
||||
-t "shared-worktree fallback|stale|defaults local bundle ingest|unlisted direct-writing source"
|
||||
```
|
||||
|
||||
Expected: FAIL. The local runtime still exposes `sharedWorktreeSourceKeys`, and
|
||||
the stale-setting runner test still reaches `shared_worktree_path_enabled`.
|
||||
|
||||
---
|
||||
|
||||
### Task 2: Remove the fallback setting and routing module
|
||||
|
||||
**Files:**
|
||||
- Modify: `packages/context/src/ingest/ports.ts`
|
||||
- Modify: `packages/context/src/ingest/local-bundle-runtime.ts`
|
||||
- Delete: `packages/context/src/ingest/isolated-diff/source-routing.ts`
|
||||
- Delete: `packages/context/src/ingest/isolated-diff/source-routing.test.ts`
|
||||
|
||||
- [ ] **Step 1: Remove the fallback setting from the runner settings port**
|
||||
|
||||
In `packages/context/src/ingest/ports.ts`, replace `IngestSettingsPort` with:
|
||||
|
||||
```ts
|
||||
export interface IngestSettingsPort {
|
||||
memoryIngestionModel: string;
|
||||
probeRowCount: number;
|
||||
workUnitMaxConcurrency?: number;
|
||||
workUnitStepBudget?: number;
|
||||
workUnitFailureMode?: 'abort' | 'continue';
|
||||
ingestTraceLevel?: IngestTraceLevel;
|
||||
}
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Remove the local runtime source-routing import**
|
||||
|
||||
In `packages/context/src/ingest/local-bundle-runtime.ts`, delete this import:
|
||||
|
||||
```ts
|
||||
import { defaultSharedWorktreeSourceKeys } from './isolated-diff/source-routing.js';
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Remove the local runtime fallback setting**
|
||||
|
||||
In `packages/context/src/ingest/local-bundle-runtime.ts`, replace the settings
|
||||
object with:
|
||||
|
||||
```ts
|
||||
settings: {
|
||||
memoryIngestionModel: options.project.config.llm.models.default ?? 'local-ingest-model',
|
||||
probeRowCount: 0,
|
||||
workUnitMaxConcurrency: options.project.config.ingest.workUnits.maxConcurrency,
|
||||
workUnitStepBudget: options.project.config.ingest.workUnits.stepBudget,
|
||||
workUnitFailureMode: options.project.config.ingest.workUnits.failureMode,
|
||||
ingestTraceLevel: ingestTraceLevelFromEnv(),
|
||||
},
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Delete the fallback routing helper files**
|
||||
|
||||
Delete:
|
||||
|
||||
```bash
|
||||
git rm packages/context/src/ingest/isolated-diff/source-routing.ts
|
||||
git rm packages/context/src/ingest/isolated-diff/source-routing.test.ts
|
||||
```
|
||||
|
||||
- [ ] **Step 5: Confirm no fallback helper imports remain**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
rg -n "defaultSharedWorktreeSourceKeys|isSharedWorktreeFallbackSourceKey|source-routing" packages/context/src
|
||||
```
|
||||
|
||||
Expected: no matches. `rg` exits with status 1 when it finds nothing, so a
status of 1 here means the cleanup is complete.
|
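
Because a "no matches" search reports a non-zero exit status, the check is easy to misread as a failure in scripts. The sketch below spells out how ripgrep's exit codes map to outcomes (0 for at least one match, 1 for no match, anything else for an error). `SearchOutcome`, `interpretRipgrepExit`, and `cleanupComplete` are hypothetical names for illustration.

```typescript
// Hypothetical labels for the three ripgrep exit-status classes.
type SearchOutcome = 'matches-found' | 'no-matches' | 'error';

// Maps a ripgrep exit status to an outcome: 0 means at least one match,
// 1 means the search ran and matched nothing, anything else is an error.
function interpretRipgrepExit(code: number): SearchOutcome {
  if (code === 0) return 'matches-found';
  if (code === 1) return 'no-matches';
  return 'error';
}

// For the stale-reference searches in this plan, success means "nothing found".
function cleanupComplete(code: number): boolean {
  return interpretRipgrepExit(code) === 'no-matches';
}
```

Treating status 2 as an error rather than as "no matches" avoids passing the check when the search itself failed, for example on a mistyped path.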
||||
|
||||
---
|
||||
|
||||
### Task 3: Delete the shared-worktree runner branch
|
||||
|
||||
**Files:**
|
||||
- Modify: `packages/context/src/ingest/ingest-bundle.runner.ts`
|
||||
|
||||
- [ ] **Step 1: Remove helper methods used only by the shared branch**
|
||||
|
||||
In `packages/context/src/ingest/ingest-bundle.runner.ts`, delete these private
|
||||
methods:
|
||||
|
||||
```ts
|
||||
private buildFailedWorkUnitOutcome(wu: WorkUnit, error: unknown): WorkUnitOutcome {
|
||||
return {
|
||||
unitKey: wu.unitKey,
|
||||
status: 'failed',
|
||||
reason: error instanceof Error ? error.message : String(error),
|
||||
preSha: '',
|
||||
postSha: '',
|
||||
actions: [],
|
||||
touchedSlSources: [],
|
||||
slDisallowed: wu.slDisallowed,
|
||||
slDisallowedReason: wu.slDisallowedReason,
|
||||
};
|
||||
}
|
||||
|
||||
private formatWorkUnitFailure(outcome: WorkUnitOutcome): string {
|
||||
return `WorkUnit ${outcome.unitKey} failed: ${outcome.reason ?? 'unknown failure'}`;
|
||||
}
|
||||
|
||||
private isSharedWorktreeFallbackEnabled(sourceKey: string): boolean {
|
||||
return (this.deps.settings.sharedWorktreeSourceKeys ?? []).includes(sourceKey);
|
||||
}
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Make non-override isolated routing unconditional**
|
||||
|
||||
In `packages/context/src/ingest/ingest-bundle.runner.ts`, replace:
|
||||
|
||||
```ts
|
||||
const isolatedDiffEnabled = !overrideReport && !this.isSharedWorktreeFallbackEnabled(job.sourceKey);
|
||||
```
|
||||
|
||||
with:
|
||||
|
||||
```ts
|
||||
const isolatedDiffEnabled = !overrideReport;
|
||||
```
|
||||
|
||||
Then replace:
|
||||
|
||||
```ts
|
||||
if (!overrideReport && isolatedDiffEnabled) {
|
||||
```
|
||||
|
||||
with:
|
||||
|
||||
```ts
|
||||
if (!overrideReport) {
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Delete the old shared-worktree branch**
|
||||
|
||||
In `packages/context/src/ingest/ingest-bundle.runner.ts`, delete the whole
|
||||
branch that starts with:
|
||||
|
||||
```ts
|
||||
} else if (!overrideReport) {
|
||||
await runTrace.event('info', 'routing', 'shared_worktree_path_enabled', {
|
||||
sourceKey: job.sourceKey,
|
||||
reason: 'explicit_private_fallback',
|
||||
});
|
||||
```
|
||||
|
||||
and ends with:
|
||||
|
||||
```ts
|
||||
latestReportWorkUnits = this.toReportWorkUnits(stageIndex);
|
||||
}
|
||||
```
|
||||
|
||||
After the deletion, the surrounding code must read:
|
||||
|
||||
```ts
|
||||
}
|
||||
|
||||
}
|
||||
const carryForwardResult =
|
||||
contextReport && this.deps.contextCandidateCarryforward
|
||||
? await this.deps.contextCandidateCarryforward.carryForward({
|
||||
runId: runRow.id,
|
||||
connectionId: job.connectionId,
|
||||
sourceKey: job.sourceKey,
|
||||
})
|
||||
: null;
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Confirm the branch trace event is gone**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
rg -n "shared_worktree_path_enabled|explicit_private_fallback|isSharedWorktreeFallbackEnabled|sharedWorktreeSourceKeys" packages/context/src/ingest/ingest-bundle.runner.ts
|
||||
```
|
||||
|
||||
Expected: no matches (`rg` exits with status 1).
|
||||
|
||||
---
|
||||
|
||||
### Task 4: Update runner tests for isolated-only execution
|
||||
|
||||
**Files:**
|
||||
- Modify: `packages/context/src/ingest/ingest-bundle.runner.test.ts`
|
||||
- Modify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`
|
||||
|
||||
- [ ] **Step 1: Remove the fallback setting from the broad runner test harness**
|
||||
|
||||
In `packages/context/src/ingest/ingest-bundle.runner.test.ts`, replace the
|
||||
`settings` block in `buildRunner()` with:
|
||||
|
||||
```ts
|
||||
settings: {
|
||||
probeRowCount: 1,
|
||||
memoryIngestionModel: 'test-model',
|
||||
},
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Add no-op isolated patch support to the broad mock Git**
|
||||
|
||||
In `packages/context/src/ingest/ingest-bundle.runner.test.ts`, replace the
|
||||
`scopedGit` object in `makeDeps()` with:
|
||||
|
||||
```ts
|
||||
const scopedGit = {
|
||||
revParseHead: vi.fn().mockResolvedValue('h'),
|
||||
commitFiles: vi.fn().mockResolvedValue({ created: true, commitHash: 'h' }),
|
||||
commitStaged: vi.fn().mockResolvedValue({ created: false, commitHash: 'h' }),
|
||||
resetHardTo: vi.fn(),
|
||||
assertWorktreeClean: vi.fn().mockResolvedValue(undefined),
|
||||
writeBinaryNoRenamePatch: vi.fn(async (_base: string, _head: string, patchPath: string) => {
|
||||
await writeFile(patchPath, '', 'utf-8');
|
||||
}),
|
||||
applyPatchFile3WayIndex: vi.fn(),
|
||||
diffNameStatus: vi.fn().mockResolvedValue([]),
|
||||
};
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Update the custom sequencer test Git mock**
|
||||
|
||||
In the test named
|
||||
`refuses to squash-merge when the session worktree has an in-progress sequencer op`,
|
||||
replace the `sessionGit` object with:
|
||||
|
||||
```ts
|
||||
const sessionGit = {
|
||||
revParseHead: vi.fn().mockResolvedValue('h'),
|
||||
commitFiles: vi.fn().mockResolvedValue({ created: true, commitHash: 'h' }),
|
||||
commitStaged: vi.fn().mockResolvedValue({ created: false, commitHash: 'h' }),
|
||||
resetHardTo: vi.fn(),
|
||||
assertWorktreeClean: vi.fn().mockRejectedValue(assertError),
|
||||
writeBinaryNoRenamePatch: vi.fn(async (_base: string, _head: string, patchPath: string) => {
|
||||
await writeFile(patchPath, '', 'utf-8');
|
||||
}),
|
||||
applyPatchFile3WayIndex: vi.fn(),
|
||||
diffNameStatus: vi.fn().mockResolvedValue([]),
|
||||
};
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Move the failed-WorkUnit integration regression to the isolated suite**
|
||||
|
||||
In `packages/context/src/ingest/ingest-bundle.runner.test.ts`, delete the test
|
||||
named `squash-merges only successful WUs into main when one WU fails sl_validate`.
|
||||
|
||||
In `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`,
|
||||
add this test near the other real-git isolated runner regressions:
|
||||
|
||||
```ts
|
||||
it('does not integrate failed isolated WorkUnit patches', async () => {
|
||||
const runtime = await makeRealGitRuntime();
|
||||
try {
|
||||
const { deps, adapter } = makeDeps(runtime, 'fake');
|
||||
adapter.chunk.mockResolvedValue({
|
||||
workUnits: [
|
||||
{ unitKey: 'wu-good', rawFiles: ['good.raw'], peerFileIndex: [], dependencyPaths: [] },
|
||||
{ unitKey: 'wu-bad', rawFiles: ['bad.raw'], peerFileIndex: [], dependencyPaths: [] },
|
||||
],
|
||||
});
|
||||
deps.diffSetService.compute = vi.fn().mockResolvedValue({
|
||||
added: ['good.raw', 'bad.raw'],
|
||||
modified: [],
|
||||
deleted: [],
|
||||
unchanged: [],
|
||||
});
|
||||
deps.slValidator.validateSingleSource = vi.fn(
|
||||
async (_validationDeps: unknown, _connectionId: string, sourceName: string) => ({
|
||||
errors: sourceName === 'bad' ? [{ message: 'bad source rejected' }] : [],
|
||||
warnings: [],
|
||||
}),
|
||||
) as never;
|
||||
|
||||
let currentSession: any = null;
|
||||
deps.toolsetFactory.createIngestWuToolset = vi.fn((toolSession: any) => {
|
||||
currentSession = toolSession;
|
||||
return { toRuntimeTools: vi.fn(() => ({})) };
|
||||
});
|
||||
deps.agentRunner.runLoop = vi.fn(async (params: any) => {
|
||||
if (params.telemetryTags.operationName !== 'ingest-bundle-wu') {
|
||||
return { stopReason: 'natural' };
|
||||
}
|
||||
const unitKey = params.telemetryTags.unitKey;
|
||||
const root = rootOfConfig(currentSession.configService, runtime.configDir);
|
||||
await mkdir(join(root, 'semantic-layer/warehouse'), { recursive: true });
|
||||
if (unitKey === 'wu-good') {
|
||||
await writeFile(join(root, 'semantic-layer/warehouse/good.yaml'), 'name: good\n', 'utf-8');
|
||||
addTouchedSlSource(currentSession.touchedSlSources, 'warehouse', 'good');
|
||||
currentSession.actions.push({
|
||||
target: 'sl',
|
||||
type: 'created',
|
||||
key: 'good',
|
||||
detail: 'good source',
|
||||
targetConnectionId: 'warehouse',
|
||||
rawPaths: ['good.raw'],
|
||||
});
|
||||
await currentSession.gitService.commitFiles(
|
||||
['semantic-layer/warehouse/good.yaml'],
|
||||
'test: add good source',
|
||||
'KTX Test',
|
||||
'system@ktx.local',
|
||||
);
|
||||
}
|
||||
if (unitKey === 'wu-bad') {
|
||||
await writeFile(join(root, 'semantic-layer/warehouse/bad.yaml'), 'name: bad\n', 'utf-8');
|
||||
addTouchedSlSource(currentSession.touchedSlSources, 'warehouse', 'bad');
|
||||
currentSession.actions.push({
|
||||
target: 'sl',
|
||||
type: 'created',
|
||||
key: 'bad',
|
||||
detail: 'bad source',
|
||||
targetConnectionId: 'warehouse',
|
||||
rawPaths: ['bad.raw'],
|
||||
});
|
||||
await currentSession.gitService.commitFiles(
|
||||
['semantic-layer/warehouse/bad.yaml'],
|
||||
'test: add bad source',
|
||||
'KTX Test',
|
||||
'system@ktx.local',
|
||||
);
|
||||
}
|
||||
return { stopReason: 'natural' };
|
||||
}) as never;
|
||||
|
||||
const runner = new IngestBundleRunner(deps);
|
||||
await mockStageRawFiles(
|
||||
runner,
|
||||
runtime,
|
||||
[
|
||||
['good.raw', 'good-hash'],
|
||||
['bad.raw', 'bad-hash'],
|
||||
],
|
||||
'fake',
|
||||
);
|
||||
|
||||
const result = await runner.run({
|
||||
jobId: 'job-failed-wu-isolated',
|
||||
connectionId: 'warehouse',
|
||||
sourceKey: 'fake',
|
||||
trigger: 'upload',
|
||||
bundleRef: { kind: 'upload', uploadId: 'upload' },
|
||||
});
|
||||
|
||||
expect(result.failedWorkUnits).toEqual(['wu-bad']);
|
||||
await expect(readFile(join(runtime.configDir, 'semantic-layer/warehouse/good.yaml'), 'utf-8')).resolves.toContain(
|
||||
'good',
|
||||
);
|
||||
await expect(readFile(join(runtime.configDir, 'semantic-layer/warehouse/bad.yaml'), 'utf-8')).rejects.toThrow();
|
||||
|
||||
const reportCreate = vi.mocked(deps.reports.create).mock.calls.at(-1)?.[0];
|
||||
const reportBody = reportCreate?.body as { isolatedDiff?: { acceptedPatches?: number }; failedWorkUnits?: string[] };
|
||||
expect(reportBody.failedWorkUnits).toEqual(['wu-bad']);
|
||||
expect(reportBody.isolatedDiff).toMatchObject({ enabled: true, acceptedPatches: 1 });
|
||||
|
||||
const trace = await readFile(
|
||||
join(runtime.configDir, '.ktx/ingest-traces/job-failed-wu-isolated/trace.jsonl'),
|
||||
'utf-8',
|
||||
);
|
||||
expect(trace).toContain('work_unit_failed_before_patch');
|
||||
expect(trace).toContain('patch_accepted');
|
||||
expect(trace).not.toContain('shared_worktree_path_enabled');
|
||||
} finally {
|
||||
await rm(runtime.homeDir, { recursive: true, force: true });
|
||||
}
|
||||
});
|
||||
```
|
||||
|
||||
- [ ] **Step 5: Run the updated focused runner tests**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context exec vitest run \
|
||||
src/ingest/ingest-bundle.runner.isolated-diff.test.ts \
|
||||
src/ingest/local-bundle-runtime.test.ts \
|
||||
-t "does not support shared-worktree|does not integrate failed isolated|defaults local bundle ingest|unlisted direct-writing source"
|
||||
```
|
||||
|
||||
Expected: PASS. The traces contain `isolated_diff_enabled`, child worktree
|
||||
events, and no `shared_worktree_path_enabled`.
|
||||
|
||||
- [ ] **Step 6: Run the broad runner suite**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context exec vitest run src/ingest/ingest-bundle.runner.test.ts
|
||||
```
|
||||
|
||||
Expected: PASS. Broad runner coverage no longer depends on
|
||||
`sharedWorktreeSourceKeys`.
|
||||
|
||||
- [ ] **Step 7: Commit the runner removal**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
git add \
|
||||
packages/context/src/ingest/ports.ts \
|
||||
packages/context/src/ingest/local-bundle-runtime.ts \
|
||||
packages/context/src/ingest/local-bundle-runtime.test.ts \
|
||||
packages/context/src/ingest/ingest-bundle.runner.ts \
|
||||
packages/context/src/ingest/ingest-bundle.runner.test.ts \
|
||||
packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts \
|
||||
packages/context/src/ingest/isolated-diff/source-routing.ts \
|
||||
packages/context/src/ingest/isolated-diff/source-routing.test.ts
|
||||
git commit -m "refactor(ingest): remove shared worktree WorkUnit path"
|
||||
```
|
||||
|
||||
Expected: commit succeeds. The deleted routing files are included as deletions.
|
||||
|
||||
---
|
||||
|
||||
### Task 5: Remove shared-branch agent instructions
|
||||
|
||||
**Files:**
|
||||
- Modify: `packages/context/prompts/memory_agent_bundle_ingest_work_unit.md`
|
||||
- Modify: `packages/context/skills/ingest_triage/SKILL.md`
|
||||
- Test: `packages/context/src/ingest/ingest-prompts.test.ts`
|
||||
- Test: `packages/context/src/ingest/ingest-runtime-assets.test.ts`
|
||||
|
||||
- [ ] **Step 1: Update the WorkUnit role text**
|
||||
|
||||
In `packages/context/prompts/memory_agent_bundle_ingest_work_unit.md`, replace
|
||||
the `<role>` block with:
|
||||
|
||||
```md
|
||||
<role>
|
||||
You are processing ONE WorkUnit of a multi-file ingest bundle. The WorkUnit
|
||||
gives you a slice of raw source files (LookML views, dbt/MetricFlow YAMLs,
|
||||
Metabase card JSONs, Notion pages, or similar) and you must translate that
|
||||
slice into KTX semantic-layer sources and/or knowledge wiki pages, in one pass.
|
||||
You run in an isolated WorkUnit worktree. Deterministic projection output,
|
||||
existing project memory, and listed dependency paths are visible; sibling
|
||||
WorkUnit edits from this same job are not visible until the runner integrates
|
||||
accepted patches.
|
||||
</role>
|
||||
```
|
||||
|
||||
- [ ] **Step 2: Update the WorkUnit workflow text**
|
||||
|
||||
In the same prompt, replace workflow steps 2 and 4 with:
|
||||
|
||||
```md
|
||||
2. Load the per-source review skill first (for example `lookml_ingest`,
|
||||
`metricflow_ingest`, or `dbt_ingest`), then `sl_capture` and
|
||||
`wiki_capture`, and `ingest_triage` last. The triage skill tells you how to
|
||||
react when existing project memory, deterministic projection output, or
|
||||
prior provenance overlaps with what this WorkUnit is about to write.
|
||||
4. For each raw file: call `read_raw_file` (or `read_raw_span` for slicing large
|
||||
files) to load content. Before writing a new SL source or wiki page, call
|
||||
`discover_data` for each candidate source, table, metric, or topic name to
|
||||
find existing wiki pages, SL sources, deterministic projection output, prior
|
||||
sync artifacts, and raw warehouse matches; apply `ingest_triage` when you hit
|
||||
one, and apply any matching canonical pin before deciding whether to edit,
|
||||
rename, or skip.
|
||||
```
|
||||
|
||||
- [ ] **Step 3: Update the WorkUnit do-not rule**
|
||||
|
||||
In the same prompt, replace:
|
||||
|
||||
```md
|
||||
- Do not silently accept a name collision with a prior WU's write when the formula differs. Trigger `ingest_triage`.
|
||||
```
|
||||
|
||||
with:
|
||||
|
||||
```md
|
||||
- Do not silently accept a name collision with visible existing memory,
|
||||
deterministic projection output, or prior provenance when the formula differs.
|
||||
Trigger `ingest_triage`.
|
||||
```
|
||||
|
||||
- [ ] **Step 4: Update ingest triage caller guidance**
|
||||
|
||||
In `packages/context/skills/ingest_triage/SKILL.md`, replace:
|
||||
|
||||
```md
|
||||
This skill is loaded in two contexts:
|
||||
- By a Stage 3 WorkUnit agent when `sl_discover` reveals that a prior WU (or a prior sync) already wrote something that overlaps with what the current WU is about to write.
|
||||
- By the Stage 4 reconciliation agent for cross-WU sweeps and for eviction decisions.
|
||||
```
|
||||
|
||||
with:
|
||||
|
||||
```md
|
||||
This skill is loaded in two contexts:
|
||||
- By a Stage 3 WorkUnit agent when `sl_discover`, deterministic projection
|
||||
output, existing project memory, or prior provenance overlaps with what the
|
||||
current WorkUnit is about to write.
|
||||
- By the Stage 4 reconciliation agent for cross-WorkUnit sweeps, accepted patch
|
||||
overlap, and eviction decisions.
|
||||
```
|
||||
|
||||
- [ ] **Step 5: Update same-ingest wording in ingest triage**
|
||||
|
||||
In `packages/context/skills/ingest_triage/SKILL.md`, replace:
|
||||
|
||||
```md
|
||||
4. **If there's no prior-sync row (both are from THIS job), check for same-ingest contradictions:**
|
||||
```
|
||||
|
||||
with:
|
||||
|
||||
```md
|
||||
4. **If reconciliation sees accepted patches from this same job with no
|
||||
prior-sync row, check for same-ingest contradictions:**
|
||||
```
|
||||
|
||||
- [ ] **Step 6: Search for stale shared-state prompt language**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
rg -n "prior WU|prior-WU|Prior WorkUnits|same job may have already written|visible on the working branch|shared_worktree_path_enabled|shared-worktree path reachable" packages/context/prompts packages/context/skills packages/context/src/ingest
|
||||
```
|
||||
|
||||
Expected: no matches. `rg` exits with status 1 when it finds nothing, so a non-zero exit is the passing result here.
|
||||
|
||||
- [ ] **Step 7: Run prompt asset tests**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context exec vitest run \
|
||||
src/ingest/ingest-prompts.test.ts \
|
||||
src/ingest/ingest-runtime-assets.test.ts
|
||||
```
|
||||
|
||||
Expected: PASS. Prompt assets still load from packaged KTX assets.
|
||||
|
||||
- [ ] **Step 8: Commit the prompt cleanup**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
git add \
|
||||
packages/context/prompts/memory_agent_bundle_ingest_work_unit.md \
|
||||
packages/context/skills/ingest_triage/SKILL.md
|
||||
git commit -m "docs(ingest): align WorkUnit prompts with isolated diffs"
|
||||
```
|
||||
|
||||
Expected: commit succeeds.
|
||||
|
||||
---
|
||||
|
||||
### Task 6: Final verification
|
||||
|
||||
**Files:**
|
||||
- Verify: `packages/context/src/ingest/ingest-bundle.runner.ts`
|
||||
- Verify: `packages/context/src/ingest/ports.ts`
|
||||
- Verify: `packages/context/src/ingest/local-bundle-runtime.ts`
|
||||
- Verify: `packages/context/src/ingest/ingest-bundle.runner.test.ts`
|
||||
- Verify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`
|
||||
- Verify: `packages/context/prompts/memory_agent_bundle_ingest_work_unit.md`
|
||||
- Verify: `packages/context/skills/ingest_triage/SKILL.md`
|
||||
|
||||
- [ ] **Step 1: Run the isolated-diff focused suite**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context exec vitest run \
|
||||
src/ingest/ingest-trace.test.ts \
|
||||
src/ingest/wiki-body-refs.test.ts \
|
||||
src/ingest/artifact-gates.test.ts \
|
||||
src/ingest/semantic-layer-target-policy.test.ts \
|
||||
src/ingest/isolated-diff/git-patch.test.ts \
|
||||
src/ingest/isolated-diff/work-unit-executor.test.ts \
|
||||
src/ingest/isolated-diff/patch-integrator.test.ts \
|
||||
src/ingest/isolated-diff/textual-conflict-resolver.test.ts \
|
||||
src/ingest/final-gate-repair.test.ts \
|
||||
src/ingest/report-snapshot.test.ts \
|
||||
src/ingest/ingest-bundle.runner.isolated-diff.test.ts
|
||||
```
|
||||
|
||||
Expected: PASS. The output includes the isolated-diff runner tests and no
|
||||
`source-routing.test.ts`.
|
||||
|
||||
- [ ] **Step 2: Run the full context test suite**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context run test
|
||||
```
|
||||
|
||||
Expected: PASS.
|
||||
|
||||
- [ ] **Step 3: Run context type-check**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm --filter @ktx/context run type-check
|
||||
```
|
||||
|
||||
Expected: PASS. There are no `sharedWorktreeSourceKeys` type errors because the
|
||||
setting no longer exists.
|
||||
|
||||
- [ ] **Step 4: Run dead-code checks**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
pnpm run dead-code
|
||||
```
|
||||
|
||||
Expected: PASS. Knip does not report deleted source-routing exports, and Biome
|
||||
does not report stale imports.
|
||||
|
||||
- [ ] **Step 5: Search for removed legacy path names**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
rg -n "sharedWorktreeSourceKeys|defaultSharedWorktreeSourceKeys|isSharedWorktreeFallbackSourceKey|shared_worktree_path_enabled|explicit_private_fallback|source-routing" packages docs/superpowers/plans/2026-05-18-isolated-diff-ingestion-v1-shared-worktree-removal.md
|
||||
```
|
||||
|
||||
Expected: matches only in this plan file. There must be no matches under
|
||||
`packages/`.
|
||||
|
||||
- [ ] **Step 6: Confirm docs-site does not need an update**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
rg -n "sharedWorktree|isolatedDiffSourceKeys|sharedWorktreeSourceKeys|executionMode|planningStrategy|conflictPolicy" docs-site README.md packages/*/README.md
|
||||
```
|
||||
|
||||
Expected: either no matches or matches unrelated to a public user-facing knob.
|
||||
This change removes an internal runner fallback and does not add, remove, or
|
||||
rename public CLI behavior, configuration, or docs-site content.
|
||||
|
||||
- [ ] **Step 7: Commit final verification notes if files changed**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
git status --short
|
||||
```
|
||||
|
||||
Expected: clean after the two implementation commits. If this command reports
|
||||
new changes, stop and inspect them before finishing; final verification should
|
||||
not create extra source changes.
|
||||
|
||||
## Self-review
|
||||
|
||||
Spec coverage:
|
||||
|
||||
- Rollout step 11 is covered by Tasks 1 through 4: the private fallback setting,
|
||||
helper module, old runner branch, trace event, and fallback tests are deleted.
|
||||
- The isolated-diff WorkUnit flow remains covered by existing real-git tests and
|
||||
the new failed-WorkUnit regression in Task 4.
|
||||
- Agent-facing instructions are aligned with the spec's worktree invariant in
|
||||
Task 5: sibling WorkUnit edits are not visible inside a child worktree.
|
||||
- Override ingestion remains outside the WorkUnit execution branch and still
|
||||
uses prior report materialization plus serial reconciliation.
|
||||
|
||||
Placeholder scan:
|
||||
|
||||
- This plan contains exact file paths, test names, replacement snippets,
|
||||
commands, and expected results.
|
||||
- There are no deferred implementation markers or unspecified edge-case
|
||||
instructions.
|
||||
|
||||
Type consistency:
|
||||
|
||||
- `IngestSettingsPort` no longer includes `sharedWorktreeSourceKeys`.
|
||||
- `isolatedDiffEnabled` remains the runner's internal summary flag and is
|
||||
equivalent to `!overrideReport`.
|
||||
- The removed trace event is `shared_worktree_path_enabled`; retained isolated
|
||||
events include `isolated_diff_enabled`, `work_unit_child_created`, and
|
||||
`work_unit_patch_collected`.
|
||||
|
||||
Execution handoff:
|
||||
|
||||
Plan complete and saved to
|
||||
`docs/superpowers/plans/2026-05-18-isolated-diff-ingestion-v1-shared-worktree-removal.md`.
|
||||
|
||||
Two execution options:
|
||||
|
||||
1. **Subagent-Driven (recommended)** - Dispatch a fresh subagent per task,
|
||||
review between tasks, and keep iteration fast.
|
||||
2. **Inline Execution** - Execute tasks in this session using
|
||||
`superpowers:executing-plans`, with batch execution and checkpoints.
|
||||
@ -0,0 +1,612 @@
|
|||
# Isolated-diff ingestion design
|
||||
|
||||
**Date:** 2026-05-17
|
||||
**Author:** Andrey Avtomonov
|
||||
**Status:** Design - pending implementation plan
|
||||
|
||||
## Background
|
||||
|
||||
KTX ingests third-party context sources into durable project memory: raw source
|
||||
snapshots, wiki pages, semantic-layer sources, evidence documents, candidates,
|
||||
and fallback records. The current bundle runner stages raw source data in one
|
||||
ingestion session worktree, then runs work units against that same mutable
|
||||
worktree.
|
||||
|
||||
A Metabase ingestion run exposed the failure mode this design addresses. One
|
||||
work unit inferred and wrote the semantic-layer measure
|
||||
`mart_account_segments.total_contract_arr_cents`, a later work unit overwrote
|
||||
the same source with `total_contract_arr`, and the generated wiki page kept
|
||||
referencing the stale non-existent measure. The local per-work-unit checks did
|
||||
not catch the final cross-artifact inconsistency because durable writes were
|
||||
accepted into shared state before final integration.
|
||||
|
||||
The fix is not a Metabase-only validation patch. The same class of risk exists
|
||||
any time LLM-authored work units mutate durable wiki or semantic-layer files:
|
||||
Metabase cards, Notion pages and clusters, dbt YAML, MetricFlow YAML, Looker
|
||||
dashboards and explores, and LookML models and views can all produce overlapping
|
||||
or contested memory artifacts. KTX needs one ingestion execution model that
|
||||
isolates agent-authored changes, integrates them deliberately, and validates
|
||||
the final project state globally.
|
||||
|
||||
## Goals
|
||||
|
||||
This design creates one opinionated ingestion algorithm for all context sources.
|
||||
Connector-specific code stays responsible for source-shaped work: fetching raw
|
||||
data, normalizing raw files, planning work units, and optionally projecting
|
||||
deterministic facts. The shared runner owns execution correctness.
|
||||
|
||||
The design has these goals:
|
||||
|
||||
- Run all agent-authored durable writes in isolated per-work-unit worktrees.
|
||||
- Treat each work unit's git diff as its proposal artifact.
|
||||
- Integrate accepted diffs through a shared artifact-aware merge path.
|
||||
- Resolve expected cross-work-unit overlap with bounded agent repair before
|
||||
failing the run.
|
||||
- Run final global semantic gates before any changes reach the main project
|
||||
worktree.
|
||||
- Keep connector variance minimal and source-shaped, not pipeline-shaped.
|
||||
- Avoid proposal manifests, typed candidates, and extra reporting entities for
|
||||
the first implementation.
|
||||
- Preserve deterministic projections for source systems with authoritative
|
||||
structured metadata.
|
||||
|
||||
## Non-goals
|
||||
|
||||
This design does not change the wiki frontmatter schema, wiki page file layout,
|
||||
the semantic-layer YAML format, or the raw source snapshot layouts. It does add
|
||||
a narrow author-facing inline-code grammar for explicit wiki body references to
|
||||
semantic-layer entities and raw tables, because body text is part of the
|
||||
stale-reference failure class. It also does not remove source adapters' current
|
||||
fetch and chunk logic in one large rewrite.
|
||||
|
||||
This design does not introduce public connector knobs such as
|
||||
`executionMode`, `planningStrategy`, or `conflictPolicy`. The core runner
|
||||
becomes more opinionated instead.
|
||||
|
||||
This design does not require all connectors to stop using candidates. Candidate
|
||||
storage remains valid for flows that intentionally defer wiki curation. The
|
||||
isolation model applies when a work unit writes durable project files.
|
||||
|
||||
## Locked design direction
|
||||
|
||||
The ingestion runner uses one flow for every source that can produce durable
|
||||
changes.
|
||||
|
||||
```text
fetch raw
  -> optional deterministic project
  -> adapter plans WorkUnit[]
  -> isolated WU diffs
  -> artifact-aware integration
  -> global semantic gates
  -> squash
```
|
||||
|
||||
The important invariant is that the core runner does not know why a work unit
|
||||
exists. A dbt adapter may plan by model, Notion may plan by page or cluster,
|
||||
MetricFlow may plan by graph component, and Looker may plan by dashboard or
|
||||
explore. Those differences describe the source system. They are not ingestion
|
||||
execution modes.
|
||||
|
||||
## Architecture
|
||||
|
||||
The design splits ingestion into two layers with explicit responsibility
|
||||
boundaries.
|
||||
|
||||
### Source adapter layer
|
||||
|
||||
The adapter owns source semantics. It fetches raw evidence, normalizes that
|
||||
evidence into staged files, and plans work units from the staged snapshot and
|
||||
diff scope.
|
||||
|
||||
The adapter may also provide deterministic projectors. A projector is code that
|
||||
converts authoritative source facts into KTX artifacts without an agent. Good
|
||||
examples are live database schema introspection and straightforward MetricFlow
|
||||
semantic-model import.
|
||||
|
||||
The isolation-relevant adapter surface remains small:
|
||||
|
||||
```ts
interface SourceAdapter {
  source: string;
  skillNames: string[];

  fetch?(pullConfig: unknown, stagedDir: string, ctx: FetchContext): Promise<void>;
  chunk(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult>;

  project?(ctx: DeterministicProjectionContext): Promise<ProjectionResult>;
  resolveSlTargets?(ctx: SlTargetResolutionContext): Promise<string[]>;
}
```
|
||||
|
||||
This is the subset the isolated-diff runner needs to understand source-shaped
|
||||
planning and deterministic projection. It is not a proposal to delete existing
|
||||
`SourceAdapter` fields. Existing lifecycle and source-support fields such as
|
||||
`detect`, `readFetchReport`, `listTargetConnectionIds`, `clusterWorkUnits`,
|
||||
`describeScope`, `onPullSucceeded`, `evidenceIndexing`, `triageSupported`,
|
||||
`getTriageSignals`, and `reconcileSkillNames` stay part of the adapter contract
|
||||
until a separate cleanup intentionally removes them with migration impact
|
||||
called out.
|
||||
|
||||
`chunk()` returns ordinary `WorkUnit[]`. The runner does not need a
|
||||
`planningStrategy` enum because the source adapter can plan by any domain shape
|
||||
that makes sense.
|
||||
|
||||
### Ingestion execution layer
|
||||
|
||||
The runner owns correctness, isolation, and integration. After `WorkUnit[]`
|
||||
exists, all connectors follow the same execution path.
|
||||
|
||||
The runner is responsible for:
|
||||
|
||||
- creating the ingestion integration worktree from the project base commit;
|
||||
- committing deterministic projection in the integration worktree before child
|
||||
worktree creation;
|
||||
- creating one child worktree per work unit from the post-projection ingestion
|
||||
base commit;
|
||||
- scoping tools to the work unit's raw files and allowed target connections;
|
||||
- running the agent loop inside the work unit worktree;
|
||||
- validating touched artifacts before accepting the work unit diff;
|
||||
- collecting the work unit git diff;
|
||||
- applying accepted diffs into the integration worktree;
|
||||
- resolving textual and artifact-level conflicts;
|
||||
- running final global gates; and
|
||||
- squashing the integration worktree back to the project main worktree.
|
||||
|
||||
## Worktree model
|
||||
|
||||
The design uses three levels of git state.
|
||||
|
||||
```text
project main worktree
  ingest integration worktree
    per-work-unit worktree(s)
```
|
||||
|
||||
The project main worktree is the durable KTX project state. The ingestion
|
||||
integration worktree stages raw snapshots, deterministic projections, accepted
|
||||
work-unit diffs, reconciliation changes, and final gate repairs before one
|
||||
squash merge back to main.
|
||||
|
||||
Deterministic projection runs first in the integration worktree, after the raw
|
||||
snapshot is staged and before any per-work-unit worktree is created. The runner
|
||||
commits those projector changes as a single projection commit. The integration
|
||||
worktree's post-projection HEAD is the ingestion base commit referenced in this
|
||||
design. If the adapter has no projector, the raw-snapshot commit is the
|
||||
ingestion base commit.
|
||||
|
||||
Each per-work-unit worktree starts from the same ingestion base commit. A work
|
||||
unit never observes another concurrent work unit's transient edits. This makes
|
||||
the work unit diff a clean proposal against a stable base. Work units observe
|
||||
deterministic projection outputs, including through `dependencyPaths` context,
|
||||
and do not re-derive authoritative projected facts.
|
||||
|
||||
The integration worktree and each per-work-unit worktree must share one Git
|
||||
object database, created through `git worktree add` from the same repository.
|
||||
This is required so `git apply --3way` can resolve the base blobs recorded in
|
||||
each work-unit patch during integration.
|
||||
|
||||
The runner creates and runs child worktrees under the existing
|
||||
`workUnitMaxConcurrency` setting. A run may have many planned work units, but no
|
||||
more than that bound may be active or left on disk at once. The default remains
|
||||
serial execution. Child worktrees must be cleaned up after the diff, transcript,
|
||||
and outcome metadata are persisted, including failure paths. Adapters with
|
||||
large fan-out, such as Notion, may use `clusterWorkUnits` before execution to
|
||||
keep work-unit count tractable, but clustering remains source-shaped planning
|
||||
rather than a separate execution mode.
|
||||
|
||||
## Work-unit lifecycle
|
||||
|
||||
Each work unit follows a fixed lifecycle.
|
||||
|
||||
1. Create a child worktree at the ingestion base commit.
|
||||
2. Build a scoped tool session for the child worktree.
|
||||
3. Run the source skill and agent loop.
|
||||
4. Run work-unit-local gates against touched artifacts.
|
||||
5. If gates pass, record `git diff --binary --no-renames <base>..HEAD` from the ingestion base commit to the child HEAD.
|
||||
6. If gates fail, mark the work unit failed and discard the child worktree.
|
||||
7. Clean up the child worktree after the diff and transcript are persisted.
|
||||
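The lifecycle above can be sketched as a single function that always cleans up the child worktree, including on failure paths. This is a minimal illustration, not the runner's implementation: the `LifecycleSteps` interface, its method names, and the synchronous signatures are all hypothetical stand-ins for the real asynchronous Git and agent calls, and building the scoped tool session is folded into `runAgent` for brevity.

```ts
// Hypothetical injected steps; the real runner would use async Git and agent services.
interface LifecycleSteps {
  createChildWorktree(): string; // returns the child worktree path
  runAgent(worktree: string): void; // scoped tool session plus agent loop
  runLocalGates(worktree: string): string[]; // returns gate errors, empty when clean
  collectPatch(worktree: string): string; // the work unit's diff against the base
  cleanup(worktree: string): void;
}

type WorkUnitOutcome =
  | { status: 'success'; patch: string }
  | { status: 'failed'; reason: string };

function runWorkUnit(steps: LifecycleSteps): WorkUnitOutcome {
  const worktree = steps.createChildWorktree();
  try {
    steps.runAgent(worktree);
    const gateErrors = steps.runLocalGates(worktree);
    if (gateErrors.length > 0) {
      // Failed gates: no patch is collected and the child worktree is discarded.
      return { status: 'failed', reason: gateErrors.join('; ') };
    }
    return { status: 'success', patch: steps.collectPatch(worktree) };
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    return { status: 'failed', reason };
  } finally {
    // Cleanup runs on success, gate failure, and thrown errors alike.
    steps.cleanup(worktree);
  }
}
```

Putting cleanup in `finally` is one way to keep the "no more than the concurrency bound left on disk" property from depending on every failure branch remembering to remove its worktree.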
|
||||
The work unit outcome stores the existing operational metadata KTX already
|
||||
records: unit key, status, actions, touched semantic-layer sources, failure
|
||||
reason, raw files, and transcript path. It does not add a proposal manifest.
|
||||
The diff is the proposal.
|
||||
|
||||
For `slDisallowed` work units, isolation is defense in depth. The scoped
|
||||
work-unit tools must withhold semantic-layer write and edit tools, and the
|
||||
integration layer must reject any otherwise accepted diff from that work unit
|
||||
that touches `semantic-layer/**`. This catches buggy or bypassed tool behavior
|
||||
before an invalid LookML connection-mismatch write can reach the integration
|
||||
worktree.
|
||||
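The integration-side half of that defense could look like the following sketch. The function name `findDisallowedSlPaths` and the `WorkUnitPolicy` shape are hypothetical and exist only for illustration; the design itself names only the `slDisallowed` flag and the `semantic-layer/**` root. The sketch also assumes touched paths are repository-relative with forward slashes, as Git reports them.

```ts
// Hypothetical policy shape; only `slDisallowed` is named in this design.
interface WorkUnitPolicy {
  unitKey: string;
  slDisallowed: boolean;
}

// Returns the touched paths that an slDisallowed work unit must not modify.
// An empty result means the diff passes this particular check.
function findDisallowedSlPaths(policy: WorkUnitPolicy, touchedPaths: string[]): string[] {
  if (!policy.slDisallowed) {
    return [];
  }
  return touchedPaths.filter((path) => path.startsWith('semantic-layer/'));
}
```

A non-empty result would lead the integration layer to reject the diff even when the work unit's own gates passed.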
|
||||
### Diff proposal contract
|
||||
|
||||
The proposal artifact is a Git patch with binary-safe content, not the existing
|
||||
hash-based raw-source `DiffSet`.
|
||||
|
||||
The first implementation must use one pinned patch contract:
|
||||
|
||||
- collect `git diff --binary --no-renames <base>..HEAD`;
|
||||
- disable rename and copy detection so renames are represented as delete plus
|
||||
create in version one;
|
||||
- preserve mode changes from the patch metadata, but reject unexpected
|
||||
executable-mode or binary changes under known text artifact roots such as
|
||||
`wiki/**` and `semantic-layer/**`;
|
||||
- apply each accepted patch to the integration worktree with
|
||||
`git apply --3way --index`;
|
||||
- do not use `git apply --reject`, because partial hunk application is not an
|
||||
accepted integration state; and
|
||||
- if patch application fails, leaves conflicts, or touches a path disallowed for
|
||||
that work unit, roll back the integration worktree to its pre-apply HEAD and
|
||||
classify the outcome as a textual conflict.
|
||||
|
||||
Delete-versus-edit, recreate-versus-edit, and delete-versus-create races are
|
||||
therefore textual conflicts when Git cannot apply the patch cleanly. If Git
|
||||
applies the patch but known artifact validators reject the resulting tree, the
|
||||
outcome is a semantic conflict.
|
||||
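As a rough sketch of the contract above, the Git argument lists and the outcome classification can be written as small pure functions. The function names and the `ApplyAttempt` shape are hypothetical; only the Git flags and the three outcome categories come from this design.

```ts
// Git arguments for collecting a work-unit proposal patch under the version-one contract.
function diffArgs(baseCommit: string): string[] {
  return ['diff', '--binary', '--no-renames', `${baseCommit}..HEAD`];
}

// Git arguments for applying an accepted patch to the integration worktree.
// `--reject` is deliberately absent: partial hunk application is not an accepted state.
function applyArgs(patchPath: string): string[] {
  return ['apply', '--3way', '--index', patchPath];
}

type ApplyOutcome = 'clean' | 'textual_conflict' | 'semantic_conflict';

// Hypothetical summary of one apply attempt.
interface ApplyAttempt {
  gitApplied: boolean; // git apply succeeded and left no conflicts
  touchedDisallowedPath: boolean; // the patch touched a path disallowed for the work unit
  validatorsPassed: boolean; // only meaningful when the patch applied
}

function classifyApply(attempt: ApplyAttempt): ApplyOutcome {
  if (!attempt.gitApplied || attempt.touchedDisallowedPath) {
    // The caller rolls the integration worktree back to its pre-apply HEAD.
    return 'textual_conflict';
  }
  return attempt.validatorsPassed ? 'clean' : 'semantic_conflict';
}
```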
|
||||
## Integration lifecycle
|
||||
|
||||
The integration worktree applies accepted work-unit diffs after local gates
|
||||
pass. The runner applies diffs in a deterministic order, using the original
|
||||
work-unit index unless a future implementation introduces explicit dependency
|
||||
ordering.
|
||||
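A minimal sketch of that ordering rule, assuming each accepted patch carries the index of its work unit in the adapter's original `WorkUnit[]`. The `AcceptedPatch` shape and its field names are hypothetical.

```ts
// Hypothetical record for a patch that passed work-unit-local gates.
interface AcceptedPatch {
  unitKey: string;
  workUnitIndex: number; // position in the adapter's original WorkUnit[]
  patchPath: string;
}

// Returns a new array sorted by original work-unit index; the input is not mutated.
function inIntegrationOrder(patches: AcceptedPatch[]): AcceptedPatch[] {
  return [...patches].sort((a, b) => a.workUnitIndex - b.workUnitIndex);
}
```

Sorting by planned index rather than completion time keeps integration results reproducible when work units finish in a different order under concurrency.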
|
||||
Each patch application has one of three outcomes:
|
||||
|
||||
- Clean patch application: the diff applies without conflict.
|
||||
- Textual conflict: git cannot apply the patch cleanly.
|
||||
- Semantic conflict: the patch applies textually but creates an invalid or
|
||||
inconsistent artifact.

Textual conflicts are resolved before semantic gates run when a bounded
resolver agent can produce a valid result. Overlapping work-unit writes are
normal, especially for Metabase cards that target the same semantic-layer marts
from different collections. The runner must treat overlap as an integration
case, not as a reason to fail immediately.

Version one is agent-first. If `git apply --3way --index` leaves conflicts,
the runner starts a resolver agent in the integration worktree. The resolver
receives only the failed patch, already-applied patches, conflicted files,
relevant work-unit transcripts, raw evidence paths, and the final-gate rules.
The resolver must preserve all non-conflicting accepted content, resolve
duplicate or competing artifact entries from evidence, and edit only files
touched by the failed patch or already-applied overlapping patches.

The runner then reruns artifact gates for the changed files and continues with
the remaining patches if validation passes. Resolver attempts are capped to
avoid an unbounded repair loop. A run fails only after the bounded resolver
attempts cannot produce a valid integration tree.
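
The capped repair loop described above might look like the sketch below. It is an assumption-laden illustration: `resolveWithCap` and the `ResolveAttempt` callback are hypothetical, and the callback is synchronous only to keep the example short (a real resolver agent call would presumably be asynchronous).

```typescript
// Hypothetical callback: runs one resolver attempt plus the artifact gates and
// reports whether the integration tree is valid afterwards.
type ResolveAttempt = (attempt: number) => boolean;

interface BoundedResult {
  resolved: boolean;
  attempts: number;
}

function resolveWithCap(tryResolve: ResolveAttempt, maxAttempts: number): BoundedResult {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    if (tryResolve(attempt)) {
      return { resolved: true, attempts: attempt };
    }
  }
  // Cap reached: the caller fails the run and preserves the worktree.
  return { resolved: false, attempts: Math.max(0, maxAttempts) };
}
```

The loop returns as soon as one attempt yields a valid tree, so the cap bounds only the failure path.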

Deterministic semantic merge is a later optimization, not a version-one
requirement. After measuring resolver latency, cost, and failure modes, KTX can
add merge helpers for common semantic-layer YAML cases, such as additive
`measures`, `segments`, `columns`, `joins`, and `descriptions` updates keyed by
their stable logical identifiers. Those helpers can replace agent calls for
mechanical merges once the measured v1 behavior justifies the added complexity.
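
As a rough picture of what such a later helper could do, the sketch below merges two `measures` lists keyed by name. It is not part of version one, and everything in it is assumed for illustration: the `Measure` shape, the `mergeMeasures` name, and the choice to treat differing `expr` values as a case that still needs the resolver agent.

```typescript
// Hypothetical measure shape; real semantic-layer YAML carries more fields.
interface Measure {
  name: string; // stable logical identifier
  expr: string;
}

type MergeResult =
  | { kind: 'merged'; measures: Measure[] }
  | { kind: 'needs_agent'; conflictingNames: string[] };

function mergeMeasures(ours: readonly Measure[], theirs: readonly Measure[]): MergeResult {
  const byName = new Map<string, Measure>();
  for (const measure of ours) byName.set(measure.name, measure);

  const conflictingNames: string[] = [];
  for (const measure of theirs) {
    const existing = byName.get(measure.name);
    if (!existing) {
      byName.set(measure.name, measure); // purely additive: safe to merge
    } else if (existing.expr !== measure.expr) {
      conflictingNames.push(measure.name); // same key, different definition
    }
  }

  if (conflictingNames.length > 0) return { kind: 'needs_agent', conflictingNames };
  return { kind: 'merged', measures: Array.from(byName.values()) };
}
```

Only additive and identical entries merge mechanically; anything that would require choosing between definitions is handed back rather than guessed.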

The integration worktree is preserved on failure with conflict markers or
resolver edits, work-unit patches, transcripts, trace events, and the failure
report. The runner never squashes a failed or partially repaired integration
tree back to the project main worktree.

### Gate repair stage

The gate repair stage handles cases where patches apply cleanly but the
combined tree fails final semantic or wiki gates. This is distinct from textual
conflict resolution: the tree is textually valid, but the artifacts violate KTX
contracts.

After each patch integration and after reconciliation, the runner runs final
artifact gates for the affected scope. If gates fail, the runner classifies the
errors before deciding whether to repair or fail.

Repairable gate errors include:

- stale wiki body references to renamed semantic-layer entities;
- invalid `sl_refs` entries that point to entities instead of sources;
- inline prose that accidentally uses explicit SL reference syntax;
- duplicate measures, segments, or joins with equivalent definitions;
- missing or stale wiki references created by accepted patches; and
- join or source references that can be corrected from the composed manifest
  and work-unit evidence.

High-risk gate errors fail without automatic repair unless a later
implementation adds a stronger evidence contract:

- two work units define the same measure with different business meaning;
- a required warehouse table or column does not exist;
- a SQL source fails execution and no obvious localized rewrite exists; or
- the repair would require choosing between conflicting facts without evidence.

For repairable errors, the runner starts a gate repair agent with the exact
gate errors, changed files, relevant work-unit transcripts, raw evidence paths,
and final-gate rules. The agent may edit only the files involved in the gate
failure. The runner reruns gates after each repair attempt and caps attempts to
one or two passes per integration stage. If the tree still fails, the run stops
with the final gate report and preserved integration worktree.
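
The repair-or-fail decision can be pictured as a small triage step. The error codes below are invented for this sketch (the design does not name them), and `triageGateErrors` is a hypothetical helper. The one rule taken from the text is that any high-risk error stops automatic repair.

```typescript
// Hypothetical error codes, loosely mirroring the two lists above.
type GateErrorCode =
  | 'stale_wiki_body_ref'
  | 'invalid_sl_ref'
  | 'accidental_sl_syntax'
  | 'duplicate_equivalent_definition'
  | 'stale_wiki_ref'
  | 'correctable_join_or_source_ref'
  | 'conflicting_measure_meaning'
  | 'missing_warehouse_object'
  | 'sql_execution_failed'
  | 'conflicting_facts_without_evidence';

const REPAIRABLE = new Set<GateErrorCode>([
  'stale_wiki_body_ref',
  'invalid_sl_ref',
  'accidental_sl_syntax',
  'duplicate_equivalent_definition',
  'stale_wiki_ref',
  'correctable_join_or_source_ref',
]);

type GateDecision = 'pass' | 'repair' | 'fail';

function triageGateErrors(errors: readonly GateErrorCode[]): GateDecision {
  if (errors.length === 0) return 'pass';
  // Any single high-risk error means the run fails without automatic repair.
  return errors.every((code) => REPAIRABLE.has(code)) ? 'repair' : 'fail';
}
```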

### Reconciliation in the new flow

Reconciliation remains a shared runner stage, but it runs as a serial
integration-stage pass instead of a parallel work unit.

The runner applies all accepted work-unit diffs to the integration worktree,
resolves textual conflicts that can be resolved, and then runs reconciliation in
that integration worktree before final global gates and before squash.
Reconciliation must see the integrated state because its job is to resolve
cross-work-unit duplicates, evictions, fallbacks, and source-specific
reconcile guidance.

Reconciliation runs exactly once per integration pass, serially against the
integration worktree, after all accepted work-unit diffs have been applied and
after textual conflicts are resolved. It never runs inside a child worktree and
never overlaps with work-unit execution. This is the safety carve-out from the
isolation goal: concurrent agent writes are the failure mode being avoided, and
reconciliation is non-concurrent by construction.

Reconciliation is not allowed to mutate project main directly. Its changes are
captured as a reconciliation diff against the pre-reconciliation integration
HEAD and recorded in the existing stage/report metadata. Reconciliation gates
validate the artifacts touched by the reconciliation diff plus any wiki page or
semantic-layer source referenced by changed frontmatter or body references,
using the same artifact-class validators as work-unit gates. Reconciliation may
write only to target connections authorized by the adapter for the ingest run,
but it is not subject to any single work unit's `slDisallowed` scope. The final
global gates validate the combined tree after reconciliation. If reconciliation
introduces an invalid wiki or semantic-layer reference, touches an unauthorized
target, or records an unresolvable artifact conflict, the runner sends
repairable failures through the gate repair stage and stops before squash only
when bounded repair cannot produce a valid tree.

## Artifact-aware integration

KTX durable artifacts are structured enough that git-only merge is not a strong
correctness boundary. Artifact-aware integration must parse and validate known
file classes after diffs are applied.

The first implementation must cover these worktree file classes:

- semantic-layer source YAML;
- wiki markdown frontmatter;
- wiki body references to semantic-layer sources, measures, dimensions, and raw
  warehouse tables.

Unmapped fallback records are not worktree files in version one. They remain
typed stage-index and report records emitted by `emit_unmapped_fallback`; the
integration layer validates their raw paths and structured reason codes as
report metadata, not as mergeable artifacts.

Provenance also stays out of the worktree in version one. The source of truth is
the ingest provenance store and report body. Before inserting provenance rows,
the global gate derives the planned rows from accepted work-unit actions,
reconciliation actions, artifact-resolution records, and skipped raw files, then
checks those rows against the integrated worktree and staged raw hashes. Moving
provenance to on-disk files would be a separate schema migration, not part of
this design.

Artifact-resolution records are the existing merged or subsumed reconciliation
outputs emitted through `emit_artifact_resolution` as
`ArtifactResolutionRecord` stage-index records. They are in-memory stage
records, not worktree files, and they feed the provenance gate.

Artifact-aware integration starts with validation plus bounded agent repair.
It does not need semantic-layer YAML merge helpers in version one. If two diffs
contest the same source YAML or wiki page and bounded agent repair cannot prove
correctness, the runner must stop rather than silently accepting stale
references. Deterministic semantic merge helpers can be added after v1 metrics
show which conflicts are frequent, mechanical, and worth optimizing.

## Global semantic gates

Final gates run after every accepted diff, deterministic projection, and
reconciliation change has landed in the integration worktree. These gates are
global because the final failure can emerge only after independent valid diffs
combine.

The final gates must include:

- semantic-layer validation for touched and dependency sources;
- wiki `wiki_refs` validation;
- wiki frontmatter `sl_refs` validation, including source-level and
  measure-level references;
- wiki body validation for explicit semantic-layer source, measure, dimension,
  and table references; and
- provenance validation for raw paths referenced by new or changed artifacts
  before those rows are inserted into SQLite.

For semantic-layer validation, touched sources are sources changed by accepted
work-unit diffs, deterministic projection, or reconciliation. Dependency sources
are their direct declared-join neighbors in the composed semantic-layer graph,
including sources they join to and sources that join to them. Version one runs
the existing whole-connection structural checks and source-scoped checks with
the touched-and-dependency source set; it does not expand dependency scope to a
transitive SQL-projection closure.
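
The touched-and-dependency set is a one-hop neighborhood in the join graph. A minimal sketch, assuming a hypothetical `JoinEdge` list extracted from the composed semantic-layer graph:

```typescript
// Hypothetical edge shape: `from` declares a join to `to`.
interface JoinEdge {
  from: string;
  to: string;
}

function validationScope(touched: Iterable<string>, joins: readonly JoinEdge[]): Set<string> {
  const touchedSet = new Set(touched);
  const scope = new Set(touchedSet);
  for (const { from, to } of joins) {
    // Direct neighbors only, in both directions; no transitive expansion.
    if (touchedSet.has(from)) scope.add(to);
    if (touchedSet.has(to)) scope.add(from);
  }
  return scope;
}
```

With `orders` touched and joins `orders -> customers`, `payments -> orders`, and `customers -> regions`, the scope is `orders`, `customers`, and `payments`. `regions` stays out because it is two hops away.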

The wiki body gate needs a narrow grammar so ordinary prose does not become a
semantic-layer reference. In version one, an explicit body reference is one of
these Markdown forms outside fenced code blocks:

- an inline code token in the form `source.entity`, where both parts are plain
  identifier tokens, `source` matches a visible semantic-layer source, and
  `entity` must match one of that source's measures, dimensions, or segments;
- an inline code token in the form `connectionId/source.entity`, where
  `source.entity` follows the same plain-identifier rule and validates against
  that specific target connection;
- an inline code token in the form `source:source_name`, which validates a
  source-level semantic-layer reference; or
- an inline code token in the form `table:qualified_table_name`, which validates
  a raw warehouse table reference against the visible raw table/catalog sources.

The parser ignores unformatted prose, fenced SQL examples, wildcard patterns
such as `mart_nrr_quarterly.*_arr_cents`, inline SQL predicates such as
`users.is_internal = false`, and unprefixed single-token inline code. Two-part
inline code that does not name a visible semantic-layer source is not treated
as an SL entity reference; use the `table:` prefix for raw warehouse table
references.
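
A rough sketch of this grammar follows. It is not the parser shipped in the PR: the names (`extractBodyRefs`, `findStaleEntityRefs`, `BodyRef`) are hypothetical, connection ids are assumed to allow letters, digits, `_`, and `-`, and references with a connection prefix are emitted without a visibility check, which is left to the caller.

```typescript
type BodyRef =
  | { kind: 'entity'; connectionId?: string; source: string; entity: string }
  | { kind: 'source'; source: string }
  | { kind: 'table'; table: string };

const IDENT = '[A-Za-z_][A-Za-z0-9_]*';
const ENTITY_RE = new RegExp(`^(?:([A-Za-z0-9_-]+)/)?(${IDENT})\\.(${IDENT})$`);
const SOURCE_RE = new RegExp(`^source:(${IDENT})$`);
const TABLE_RE = new RegExp(`^table:(${IDENT}(?:\\.${IDENT})*)$`);
const FENCE = '`'.repeat(3);

function extractBodyRefs(markdown: string, visibleSources: ReadonlySet<string>): BodyRef[] {
  const refs: BodyRef[] = [];
  let inFence = false;
  for (const line of markdown.split('\n')) {
    if (line.trim().startsWith(FENCE)) {
      inFence = !inFence; // toggle on opening and closing fences
      continue;
    }
    if (inFence) continue; // fenced examples never produce references
    const inlineCode = /`([^`]+)`/g;
    let match: RegExpExecArray | null;
    while ((match = inlineCode.exec(line)) !== null) {
      const token = match[1];
      const source = SOURCE_RE.exec(token);
      const table = TABLE_RE.exec(token);
      const entity = ENTITY_RE.exec(token);
      if (source) {
        refs.push({ kind: 'source', source: source[1] });
      } else if (table) {
        refs.push({ kind: 'table', table: table[1] });
      } else if (entity && (entity[1] !== undefined || visibleSources.has(entity[2]))) {
        // Unprefixed two-part tokens count only when they name a visible source.
        refs.push({
          kind: 'entity',
          ...(entity[1] !== undefined ? { connectionId: entity[1] } : {}),
          source: entity[2],
          entity: entity[3],
        });
      }
    }
  }
  return refs;
}

// Returns `source.entity` strings whose entity is missing from the final source.
function findStaleEntityRefs(
  refs: readonly BodyRef[],
  entitiesBySource: ReadonlyMap<string, ReadonlySet<string>>,
): string[] {
  const stale: string[] = [];
  for (const ref of refs) {
    if (ref.kind !== 'entity') continue;
    const known = entitiesBySource.get(ref.source);
    if (!known || !known.has(ref.entity)) stale.push(`${ref.source}.${ref.entity}`);
  }
  return stale;
}
```

Wildcards and SQL predicates fall out naturally because they do not match the plain-identifier patterns, so they need no special cases.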

The `total_contract_arr_cents` incident is the regression case for this gate:
the integrated tree must fail if a wiki page references
`mart_account_segments.total_contract_arr_cents` as an inline-code body token
while the final semantic-layer source defines only `total_contract_arr`.

## Deterministic projection

Some connectors have authoritative structured inputs that do not need an LLM to
write KTX artifacts. Those connectors can provide deterministic projectors that
run in the integration worktree.

Projection is different from work-unit execution:

- projectors are code, not agents;
- projectors run against the integration worktree;
- projectors produce ordinary durable file changes; and
- projector outputs still pass final global gates.

The runner infers hybrid behavior from the adapter. If an adapter has both
projectors and work units, it is hybrid. If it has only projectors, it is
deterministic. If it has only work units, it uses isolated diffs. No public
`executionMode` knob is needed.
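
The inference rule is small enough to state as code. This sketch uses a hypothetical `AdapterShape`, and the `'none'` result for an adapter with neither projectors nor work units is an assumption, since the text does not cover that case.

```typescript
// Hypothetical summary of what an adapter provides.
interface AdapterShape {
  hasProjectors: boolean;
  hasWorkUnits: boolean;
}

type InferredMode = 'hybrid' | 'deterministic' | 'isolated_diffs' | 'none';

function inferExecutionMode(adapter: AdapterShape): InferredMode {
  if (adapter.hasProjectors && adapter.hasWorkUnits) return 'hybrid';
  if (adapter.hasProjectors) return 'deterministic';
  if (adapter.hasWorkUnits) return 'isolated_diffs';
  return 'none'; // assumption: nothing to run
}
```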

## Connector migration notes

Each connector keeps its source-shaped planning logic. The migration changes
where durable writes happen and how they are integrated.

### Metabase

Metabase must move first because it produced the observed stale-measure wiki
reference. Collection and card chunking can remain adapter-specific, but direct
wiki and semantic-layer writes must happen in per-work-unit worktrees.

The regression test must reproduce two work units that touch
`mart_account_segments`: one writes a wiki reference to an inferred measure and
another leaves the final source with a different measure name. The final global
gate must reject the integrated tree.

### dbt

dbt uses source-shaped planning by model or schema file. Deterministic
projection is appropriate for straightforward model, source, column, and
description facts when dbt artifacts are authoritative. Agent work units remain
useful for business wiki synthesis, ambiguous relationship interpretation, and
enrichment that is not directly represented in dbt YAML.

### MetricFlow

MetricFlow uses source-shaped planning by graph component. Existing
deterministic semantic-model import code becomes a projector in the ingestion
flow. Agent work units handle unsupported constructs, cross-model explanations,
and wiki synthesis.

### Looker

Looker already defers some dashboard and look knowledge through candidates.
That can continue. Any direct semantic-layer writes from explores or query
translation must run through isolated work-unit diffs.

Looker-specific API and file-adapter collisions remain connector domain logic,
but final correctness still belongs to the shared integration gates.

### LookML

LookML already has useful source-shaped ownership rules: models, views, orphan
views, dashboards, and connection-mismatch guards. Those rules stay in the
adapter. Direct semantic-layer writes move into isolated work-unit diffs.

Connection-mismatch work units can keep their existing write restrictions. The
runner enforces those restrictions through scoped tools and target connection
resolution.

### Notion

Notion pages and clusters can create overlapping durable wiki knowledge and can
write semantic-layer overlays after warehouse verification. Notion therefore
uses the same isolated-diff execution model for direct durable writes.

Large Notion workspaces still need source-shaped clustering to control context
size and cost. Clustering remains adapter logic; correctness comes from isolated
diffs and final global gates.

## Minimal connector variance

New connectors must not choose from a menu of ingestion architectures. They
must provide the small amount of source-specific behavior the shared runner
needs.

Every connector answers these questions:

- How does KTX fetch or receive raw evidence?
- How does KTX normalize that evidence into staged files?
- How does KTX split the staged evidence into `WorkUnit[]`?
- Are any source facts authoritative enough for deterministic projection?
- Which target semantic-layer connections can the connector write to?

Everything else is shared runner behavior.

## Regression tests

The implementation plan must start with narrow tests that prove the new
execution model prevents the known failure class.

The first test creates a fake or Metabase-like adapter with two work units
starting from the same base:

1. Work unit A writes a wiki page that references
   `mart_account_segments.total_contract_arr_cents` as an inline-code body
   token.
2. Work unit B writes or overwrites the final semantic-layer source with only
   `total_contract_arr`.
3. Both work units pass their local gates in isolation.
4. Integration applies both diffs.
5. The final global gate fails the run before squash.

Additional tests cover:

- two work units editing different wiki pages without conflict;
- two work units editing the same semantic-layer overlay with additive changes,
  where the resolver agent preserves both changes and gates the repaired file;
- two work units editing the same semantic-layer overlay with incompatible
  definitions, where the resolver agent receives the conflict context and the
  run fails only after bounded repair attempts cannot prove a result;
- a textual conflict in a wiki page where the resolver agent preserves
  non-conflicting accepted content and gates the repaired page before squash;
- a cleanly merged tree that fails final gates, where the gate repair agent
  fixes a stale wiki reference and the run continues;
- an unrepairable final-gate failure, such as a missing warehouse column, where
  the runner stops with a preserved integration worktree and report;
- a hybrid adapter case where deterministic projector outputs are visible in a
  child worktree before work-unit wiki synthesis, and the final global gate
  catches any stale reference to a non-existent projected semantic-layer entity;
- Notion-style direct wiki writes with invalid `sl_refs`; and
- LookML-style `slDisallowed` work units where write tools are unavailable and
  integration rejects any diff that still touches `semantic-layer/**`.
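
For the last case in the list, the integration-side check could be as small as the sketch below. `disallowedSemanticLayerPaths` is a hypothetical helper, and it assumes the diff's changed paths are repository-relative with `/` separators.

```typescript
// Hypothetical helper: returns the changed paths that an `slDisallowed`
// work unit is not permitted to touch.
function disallowedSemanticLayerPaths(changedPaths: readonly string[]): string[] {
  return changedPaths.filter((path) => path.startsWith('semantic-layer/'));
}

// A diff from an `slDisallowed` work unit is acceptable only when this is empty.
function isAcceptableForSlDisallowed(changedPaths: readonly string[]): boolean {
  return disallowedSemanticLayerPaths(changedPaths).length === 0;
}
```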

## Rollout

The rollout must be incremental because the current runner is shared by all
adapters.

The rollout switch is runner-owned. During migration it may be a private
per-source allowlist, or an internal `IngestSettingsPort` map keyed by
`sourceKey`, but it must not become a `SourceAdapter` field or public connector
configuration knob.

1. Add the per-work-unit worktree executor behind that internal runner setting.
2. Add diff collection and deterministic integration in the existing runner.
3. Add bounded resolver-agent handling for textual conflicts.
4. Add final global wiki and semantic-layer reference gates, including the wiki
   body reference parser defined above.
5. Add bounded gate-repair-agent handling for repairable final-gate failures.
6. Instrument resolver latency, attempts, repaired files, and failure classes.
7. Migrate Metabase to the new execution path first.
8. Migrate Notion, LookML, Looker, dbt, and MetricFlow.
9. Add deterministic semantic merge helpers only after v1 metrics show which
   agent repairs are frequent and mechanical enough to justify optimization.
10. Promote the new path to the default after the Metabase regression test and
    at least one non-Metabase connector pass.
11. Remove the old shared-worktree work-unit execution path.

The rollout is complete when every connector that permits agent-authored durable
writes uses isolated diffs and all integrations pass the same final global
gates.

@@ -635,6 +635,117 @@ describe('runKtxIngest', () => {
     expect(io.stderr()).not.toContain('Metabase ingest: prod-metabase');
   });
 
+  it('emits structured child ingest progress during Metabase fan-out', async () => {
+    const projectDir = join(tempDir, 'project');
+    await writeMetabaseConfig(projectDir);
+    const io = makeIo();
+    const progressEvents: Array<{ percent: number; message: string; transient?: boolean }> = [];
+
+    await expect(
+      runKtxIngest(
+        {
+          command: 'run',
+          projectDir,
+          connectionId: 'prod-metabase',
+          adapter: 'metabase',
+          outputMode: 'json',
+        },
+        io.io,
+        {
+          progress: (event) => progressEvents.push(event),
+          runLocalMetabaseIngest: async (input) => {
+            input.progress?.onMetabaseFanoutPlanned?.({
+              metabaseConnectionId: 'prod-metabase',
+              children: [{ metabaseDatabaseId: 1, targetConnectionId: 'warehouse_a' }],
+            });
+            input.progress?.onMetabaseChildStarted?.({
+              metabaseConnectionId: 'prod-metabase',
+              metabaseDatabaseId: 1,
+              targetConnectionId: 'warehouse_a',
+              jobId: 'metabase-child-1',
+            });
+            input.memoryFlow?.update({
+              plannedWorkUnits: [
+                {
+                  unitKey: 'metabase-col-6',
+                  rawFiles: ['cards/40.json'],
+                  peerFileCount: 0,
+                  dependencyCount: 0,
+                },
+              ],
+            });
+            input.memoryFlow?.emit({ type: 'chunks_planned', chunkCount: 1, workUnitCount: 1, evictionCount: 0 });
+            input.memoryFlow?.emit({
+              type: 'work_unit_started',
+              unitKey: 'metabase-col-6',
+              skills: ['sl_capture'],
+              stepBudget: 40,
+            });
+            input.memoryFlow?.emit({
+              type: 'work_unit_step',
+              unitKey: 'metabase-col-6',
+              stepIndex: 7,
+              stepBudget: 40,
+            });
+            input.memoryFlow?.emit({
+              type: 'stage_progress',
+              stage: 'integration',
+              percent: 81,
+              message: 'Resolving text conflict for metabase-col-6',
+            });
+            input.memoryFlow?.emit({ type: 'work_unit_finished', unitKey: 'metabase-col-6', status: 'success' });
+            input.memoryFlow?.update({
+              plannedWorkUnits: [
+                {
+                  unitKey: 'metabase-col-7',
+                  rawFiles: ['cards/48.json'],
+                  peerFileCount: 0,
+                  dependencyCount: 0,
+                },
+              ],
+            });
+            input.memoryFlow?.emit({ type: 'chunks_planned', chunkCount: 1, workUnitCount: 1, evictionCount: 0 });
+            input.memoryFlow?.emit({
+              type: 'work_unit_started',
+              unitKey: 'metabase-col-7',
+              skills: ['sl_capture'],
+              stepBudget: 40,
+            });
+            input.progress?.onMetabaseChildCompleted?.({
+              metabaseConnectionId: 'prod-metabase',
+              metabaseDatabaseId: 1,
+              targetConnectionId: 'warehouse_a',
+              jobId: 'metabase-child-1',
+              status: 'done',
+            });
+            return {
+              metabaseConnectionId: 'prod-metabase',
+              status: 'all_succeeded',
+              totals: { workUnits: 1, failedWorkUnits: 0 },
+              children: [],
+            };
+          },
+        },
+      ),
+    ).resolves.toBe(0);
+
+    expect(progressEvents).toEqual(
+      expect.arrayContaining([
+        { percent: 45, message: 'Planned 1 task' },
+        { percent: 55, message: 'Processing 1/1 tasks: metabase-col-6' },
+        {
+          percent: 60,
+          message: 'Processing tasks: 0/1 complete, 1 active; latest metabase-col-6 step 7/40',
+          transient: true,
+        },
+        { percent: 81, message: 'Resolving text conflict for metabase-col-6' },
+        { percent: 81, message: 'Processing 1/1 tasks: metabase-col-7' },
+      ]),
+    );
+    expect(io.stdout()).toContain('"status": "all_succeeded"');
+    expect(io.stderr()).not.toContain('Metabase ingest: prod-metabase');
+  });
+
   it('runs Metabase scheduled ingest through the public CLI command path with real fan-out', async () => {
     const projectDir = join(tempDir, 'metabase-cli-project');
     await writeWarehouseConfig(projectDir);
@@ -985,6 +1096,59 @@ describe('runKtxIngest', () => {
     expect(io.stdout()).toContain('Status: error\n');
   });
 
+  it('prints trace path and error status for stored failed ingest reports', async () => {
+    const projectDir = join(tempDir, 'project');
+    await writeWarehouseConfig(projectDir);
+    const io = makeIo();
+    const report = {
+      id: 'report-failed',
+      runId: 'run-failed',
+      jobId: 'job-failed',
+      connectionId: 'warehouse',
+      sourceKey: 'metabase',
+      createdAt: '2026-05-17T12:00:00.000Z',
+      body: {
+        status: 'failed',
+        syncId: 'sync-failed',
+        diffSummary: { added: 1, modified: 0, deleted: 0, unchanged: 0 },
+        commitSha: null,
+        tracePath: '/project/.ktx/ingest-traces/job-failed/trace.jsonl',
+        failure: { phase: 'final_gates', message: 'final artifact gates failed' },
+        workUnits: [],
+        failedWorkUnits: [],
+        reconciliationSkipped: true,
+        conflictsResolved: [],
+        evictionsApplied: [],
+        unmappedFallbacks: [],
+        evictionInputs: [],
+        unresolvedCards: [],
+        supersededBy: null,
+        overrideOf: null,
+        provenanceRows: [],
+        toolTranscripts: [],
+      },
+    };
+
+    await runKtxIngest(
+      {
+        command: 'status',
+        projectDir,
+        reportFile: '/project/report-failed.json',
+        runId: 'run-failed',
+        outputMode: 'plain',
+        inputMode: 'disabled',
+      },
+      io.io,
+      {
+        readReportFile: vi.fn().mockResolvedValue(report),
+      },
+    );
+
+    expect(io.stdout()).toContain('Trace: /project/.ktx/ingest-traces/job-failed/trace.jsonl');
+    expect(io.stdout()).toContain('Status: error');
+    expect(io.stdout()).toContain('Error: final artifact gates failed');
+  });
+
   it('prints a clear first failure reason when query-history work units fail', async () => {
     const projectDir = join(tempDir, 'project');
     await writeWarehouseConfig(projectDir);

@@ -102,7 +102,7 @@ export interface KtxIngestDeps {
 }
 
 function reportStatus(report: IngestReportSnapshot): 'done' | 'error' {
-  return report.body.failedWorkUnits.length > 0 ? 'error' : 'done';
+  return report.body.status === 'failed' || report.body.failedWorkUnits.length > 0 ? 'error' : 'done';
 }
 
 const REPORT_SOURCE_LABELS = new Map<string, string>([
@@ -174,6 +174,9 @@ function formatFailureReason(sourceKey: string, reason: string): string {
 }
 
 function failedReportMessage(report: IngestReportSnapshot): string | null {
+  if (report.body.status === 'failed' && report.body.failure?.message) {
+    return sanitizeMemoryFlowError(report.body.failure.message);
+  }
   const failedCount = report.body.failedWorkUnits.length;
   if (failedCount === 0) {
     return null;
@@ -195,6 +198,9 @@ function writeReportStatus(report: IngestReportSnapshot, io: KtxIngestIo): void
   io.stdout.write(`Report: ${report.id}\n`);
   io.stdout.write(`Run: ${report.runId}\n`);
   io.stdout.write(`Job: ${report.jobId}\n`);
+  if (report.body.tracePath) {
+    io.stdout.write(`Trace: ${report.body.tracePath}\n`);
+  }
   io.stdout.write(`Status: ${reportStatus(report)}\n`);
   io.stdout.write(`Source: ${reportSourceLabel(report.sourceKey)}\n`);
   io.stdout.write(`Connection: ${report.connectionId}\n`);
@@ -289,7 +295,11 @@ function formatDiffProgress(event: Extract<MemoryFlowEvent, { type: 'diff_comput
 }
 
 function workUnitEventsThrough(snapshot: MemoryFlowReplayInput, eventIndex: number): MemoryFlowEvent[] {
-  return snapshot.events.slice(0, eventIndex + 1);
+  const latestPlanIndex = snapshot.events
+    .slice(0, eventIndex + 1)
+    .findLastIndex((event) => event.type === 'chunks_planned');
+  const startIndex = latestPlanIndex >= 0 ? latestPlanIndex + 1 : 0;
+  return snapshot.events.slice(startIndex, eventIndex + 1);
 }
 
 function completedWorkUnitCountThrough(snapshot: MemoryFlowReplayInput, eventIndex: number): number {
@@ -313,7 +323,8 @@ function plannedWorkUnitCountThrough(snapshot: MemoryFlowReplayInput, eventIndex
   if (snapshot.plannedWorkUnits.length > 0) {
     return snapshot.plannedWorkUnits.length;
   }
-  const planEvent = workUnitEventsThrough(snapshot, eventIndex)
+  const planEvent = snapshot.events
+    .slice(0, eventIndex + 1)
     .filter((event) => event.type === 'chunks_planned')
     .at(-1);
   return planEvent?.workUnitCount ?? completedWorkUnitCountThrough(snapshot, eventIndex);
@@ -359,6 +370,12 @@ function plainIngestEventProgress(
       };
     case 'stage_skipped':
       return { percent: 45, message: `Skipped ${event.stage}: ${event.reason}` };
+    case 'stage_progress':
+      return {
+        percent: event.percent,
+        message: event.message,
+        ...(event.transient !== undefined ? { transient: event.transient } : {}),
+      };
     case 'work_unit_started': {
       const total = plannedWorkUnitCountThrough(snapshot, eventIndex);
       const ordinal = workUnitOrdinalThrough(snapshot, eventIndex, event.unitKey);
@@ -705,6 +722,25 @@ export async function runKtxIngest(
   }
   if (args.adapter === 'metabase') {
     const executeMetabaseFanout = deps.runLocalMetabaseIngest ?? runLocalMetabaseIngest;
+    const runOutputMode = effectiveIngestOutputMode(args.outputMode, io, env, {
+      requireInput: (args.inputMode ?? 'auto') === 'auto',
+    });
+    const plainProgress = shouldWritePlainIngestProgress(runOutputMode, io, env)
+      ? createPlainIngestProgressRenderer(args, io)
+      : null;
+    const structuredProgress = deps.progress
+      ? createPlainIngestProgressObserver(args, deps.progress)
+      : null;
+    const initialMemoryFlow =
+      plainProgress || structuredProgress ? initialRunMemoryFlowInput(args, 'pending') : undefined;
+    const memoryFlow = initialMemoryFlow
+      ? createMemoryFlowLiveBuffer(initialMemoryFlow, {
+          onChange: (snapshot) => {
+            plainProgress?.update(snapshot);
+            structuredProgress?.update(snapshot);
+          },
+        })
+      : undefined;
     const progress =
       args.outputMode === 'json' && !deps.progress
         ? undefined
@@ -715,20 +751,29 @@ export async function runKtxIngest(
               : io,
             deps.progress,
           );
-    const result = await executeMetabaseFanout({
-      project: ingestProject,
-      adapters: createAdapters(ingestProject, adapterOptions),
-      metabaseConnectionId: args.connectionId,
-      ...localIngestOptions,
-      queryExecutor,
-      trigger: 'manual_resync',
-      jobIdFactory: deps.jobIdFactory,
-      ...(progress ? { progress } : {}),
-    });
-    if (args.outputMode === 'json') {
-      io.stdout.write(`${JSON.stringify(result, null, 2)}\n`);
-    } else {
-      writeMetabaseFanoutStatus(result, io);
+    plainProgress?.start();
+    structuredProgress?.start();
+    let result: LocalMetabaseFanoutResult;
+    try {
+      result = await executeMetabaseFanout({
+        project: ingestProject,
+        adapters: createAdapters(ingestProject, adapterOptions),
+        metabaseConnectionId: args.connectionId,
+        ...localIngestOptions,
+        queryExecutor,
+        trigger: 'manual_resync',
+        jobIdFactory: deps.jobIdFactory,
+        ...(memoryFlow ? { memoryFlow } : {}),
+        ...(progress ? { progress } : {}),
+      });
+      plainProgress?.flush();
+      if (args.outputMode === 'json') {
+        io.stdout.write(`${JSON.stringify(result, null, 2)}\n`);
+      } else {
+        writeMetabaseFanoutStatus(result, io);
+      }
+    } finally {
+      plainProgress?.flush();
     }
     return result.status === 'all_succeeded' ? 0 : 1;
   }

@@ -1,5 +1,12 @@
 <role>
-You are processing ONE WorkUnit of a multi-file ingest bundle. The WorkUnit gives you a slice of raw source files (LookML views, dbt/MetricFlow YAMLs, Metabase card JSONs, Notion pages, or similar) and you must translate that slice into KTX semantic-layer sources and/or knowledge wiki pages, in one pass. Prior WorkUnits in this same job may have already written SL sources and wiki pages; their writes are visible on the working branch and discoverable with `discover_data`.
+You are processing ONE WorkUnit of a multi-file ingest bundle. The WorkUnit
+gives you a slice of raw source files (LookML views, dbt/MetricFlow YAMLs,
+Metabase card JSONs, Notion pages, or similar) and you must translate that
+slice into KTX semantic-layer sources and/or knowledge wiki pages, in one pass.
+You run in an isolated WorkUnit worktree. Deterministic projection output,
+existing project memory, and listed dependency paths are visible; sibling
+WorkUnit edits from this same job are not visible until the runner integrates
+accepted patches.
 </role>
 
 <stance>
@@ -8,9 +15,19 @@ Assertive. The bundle was explicitly submitted for ingest. Default to capturing
 
 <workflow>
 1. Read this WorkUnit's section at the end of the user prompt. It lists your `rawFiles`, any unchanged `dependencyPaths` you may need to resolve references, the `peerFileIndex` (paths only; you CANNOT read them), the source's `skillNames`, and any `priorProvenance` rows telling you what earlier syncs produced from these files.
-2. Load the per-source review skill first (e.g. `lookml_ingest`, `metricflow_ingest`, `dbt_ingest`), then `sl_capture` and `wiki_capture`, and `ingest_triage` last. The triage skill tells you how to react when `discover_data` reveals that a prior WU already wrote something overlapping.
+2. Load the per-source review skill first (for example `lookml_ingest`,
+   `metricflow_ingest`, or `dbt_ingest`), then `sl_capture` and
+   `wiki_capture`, and `ingest_triage` last. The triage skill tells you how to
+   react when existing project memory, deterministic projection output, or
+   prior provenance overlaps with what this WorkUnit is about to write.
 3. If the system prompt includes `<canonical_pins>`, read those pins before choosing artifact keys. A pin's `canonicalArtifactKey` is the preferred artifact for its `contestedKey`: prefer editing the pinned canonical artifact when it already exists or when this raw file clearly updates it. Do not create a duplicate contested artifact when a pin says another artifact is canonical; use a specific disambiguated key only when the raw file describes a genuinely different domain.
 4. For each raw file: call `read_raw_file` (or `read_raw_span` for slicing large files) to load content. Before writing a new SL source or wiki page, call `discover_data` for each candidate source, table, metric, or topic name to find prior-WU writes, existing wiki pages, SL sources, and raw warehouse matches; apply `ingest_triage` when you hit one, and apply any matching canonical pin before deciding whether to edit, rename, or skip.
|
||||
4. For each raw file: call `read_raw_file` (or `read_raw_span` for slicing large
|
||||
files) to load content. Before writing a new SL source or wiki page, call
|
||||
`discover_data` for each candidate source, table, metric, or topic name to
|
||||
find existing wiki pages, SL sources, deterministic projection output, prior
|
||||
sync artifacts, and raw warehouse matches; apply `ingest_triage` when you hit
|
||||
one, and apply any matching canonical pin before deciding whether to edit,
|
||||
rename, or skip.
|
||||
5. For every `wiki_write`, `wiki_remove`, `sl_write_source`, or `sl_edit_source` call, include `rawPaths` with only the raw file paths that directly support that action. If one artifact synthesizes several files, list each contributing raw file. Do not include unrelated files from the same WorkUnit.
|
||||
6. When `priorProvenance` names an existing artifact for one of your raw files, prefer `sl_edit` over `sl_write` for that artifact: the re-ingest change rule says expression-only changes replace silently, grain/column/filter changes replace and flag.
|
||||
7. When a raw file cannot map to normal SL and you use a fallback path, call `emit_unmapped_fallback` exactly once for that raw file and reason. Use `fallback: "sql_standalone"` for a standalone SQL source, `fallback: "wiki_only"` for documentation-only capture, and `fallback: "flagged"` when no reliable artifact can be written.
|
||||
|
|
@ -28,5 +45,7 @@ Wiki keys must be flat slugs like `paid-order-lifecycle`, not directory paths li
|
|||
- Do not invent physical column names or grain keys. For table-backed SL sources, every `columns:`, `grain:`, `joins:`, `segments:`, and `measures[].expr` column must come from raw-file column declarations or warehouse-backed discovery (`discover_data`, `sl_discover`, `entity_details`). If column names are not confirmed, capture the business context in wiki instead of writing a full SL source.
|
||||
- Do not write context-source overlays into the context source connection just because that is the current WorkUnit connection. Use `sl_discover` across data sources and write the SL artifact to the warehouse/data-source connection that owns the matching manifest. If there is no confirmed target connection, use `emit_unmapped_fallback` and wiki capture.
|
||||
- Do not duplicate an artifact that prior provenance says you already produced; update it.
|
||||
- Do not silently accept a name collision with a prior WU's write when the formula differs. Trigger `ingest_triage`.
|
||||
- Do not silently accept a name collision with visible existing memory,
|
||||
deterministic projection output, or prior provenance when the formula differs.
|
||||
Trigger `ingest_triage`.
|
||||
</do_not>
|
||||
|
|
|
|||
|
|
@ -7,8 +7,11 @@ callers: [memory_agent]
|
|||
# Ingest Triage - conflict classification and resolution
|
||||
|
||||
This skill is loaded in two contexts:
|
||||
- By a Stage 3 WorkUnit agent when `sl_discover` reveals that a prior WU (or a prior sync) already wrote something that overlaps with what the current WU is about to write.
|
||||
- By the Stage 4 reconciliation agent for cross-WU sweeps and for eviction decisions.
|
||||
- By a Stage 3 WorkUnit agent when `sl_discover`, deterministic projection
|
||||
output, existing project memory, or prior provenance overlaps with what the
|
||||
current WorkUnit is about to write.
|
||||
- By the Stage 4 reconciliation agent for cross-WorkUnit sweeps, accepted patch
|
||||
overlap, and eviction decisions.
|
||||
|
||||
Apply the rules below before every write that could collide with an existing artifact.
|
||||
|
||||
|
|
@ -23,7 +26,8 @@ Apply the rules below before every write that could collide with an existing art
|
|||
3. **If the difference is structural - grain, columns, filter, join shape - is the current bundle the re-ingest of a previously-ingested bundle (i.e. `priorProvenance` has a row for this raw file and artifact)?**
|
||||
Re-ingest change (semantic break): replace + flag. Record in the IngestReport's `conflicts_resolved` list with `flagged_for_human: true`.
|
||||
|
||||
4. **If there's no prior-sync row (both are from THIS job), check for same-ingest contradictions:**
|
||||
4. **If reconciliation sees accepted patches from this same job with no
|
||||
prior-sync row, check for same-ingest contradictions:**
|
||||
|
||||
| Kind | Detection | Resolution |
|
||||
|---|---|---|
|
||||
|
|
|
|||
45
packages/context/src/core/git.service.patch.test.ts
Normal file
|
|
@ -0,0 +1,45 @@
|
|||
import { mkdir, mkdtemp, readFile, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it } from 'vitest';
import { GitService } from './git.service.js';

async function makeGit() {
  const homeDir = await mkdtemp(join(tmpdir(), 'ktx-git-patch-'));
  const configDir = join(homeDir, 'config');
  const git = new GitService({
    storage: { configDir, homeDir },
    git: {
      userName: 'System User',
      userEmail: 'system@example.com',
      bootstrapMessage: 'init',
      bootstrapAuthor: 'system',
      bootstrapAuthorEmail: 'system@example.com',
    },
  });
  await git.onModuleInit();
  return { homeDir, configDir, git };
}

describe('GitService patch helpers', () => {
  it('collects binary-safe no-rename patches and applies them with --3way --index', async () => {
    const { homeDir, configDir, git } = await makeGit();
    await mkdir(join(configDir, 'wiki/global'), { recursive: true });
    await writeFile(join(configDir, 'wiki/global/page.md'), 'old\n');
    await git.commitFiles(['wiki/global/page.md'], 'add page', 'System User', 'system@example.com');
    const base = await git.revParseHead();

    await writeFile(join(configDir, 'wiki/global/page.md'), 'new\n');
    await git.commitFiles(['wiki/global/page.md'], 'edit page', 'System User', 'system@example.com');
    const patchPath = join(homeDir, 'proposal.patch');
    await git.writeBinaryNoRenamePatch(base, 'HEAD', patchPath);

    const targetDir = join(homeDir, 'target');
    await git.addWorktree(targetDir, 'target', base);
    const targetGit = git.forWorktree(targetDir);
    await targetGit.applyPatchFile3WayIndex(patchPath);
    await targetGit.commitStaged('apply proposal', 'System User', 'system@example.com');

    await expect(readFile(join(targetDir, 'wiki/global/page.md'), 'utf-8')).resolves.toBe('new\n');
  });
});
|
||||
|
|
@ -1,5 +1,5 @@
|
|||
import { promises as fs } from 'node:fs';
|
||||
import { join } from 'node:path';
|
||||
import { dirname, join } from 'node:path';
|
||||
import type { SimpleGit } from 'simple-git';
|
||||
import { noopLogger, resolveConfigDir, type KtxCoreConfig, type KtxLogger } from './config.js';
|
||||
import { createSimpleGit } from './git-env.js';
|
||||
|
|
@ -747,6 +747,55 @@ export class GitService {
|
|||
}
|
||||
}
|
||||
|
||||
async writeBinaryNoRenamePatch(from: string, to: string, patchPath: string): Promise<void> {
|
||||
await this.withMutationQueue(async () => {
|
||||
const patch = await this.git.raw(['diff', '--binary', '--no-renames', `${from}..${to}`]);
|
||||
await fs.mkdir(dirname(patchPath), { recursive: true });
|
||||
await fs.writeFile(patchPath, patch, 'utf-8');
|
||||
});
|
||||
}
|
||||
|
||||
async applyPatchFile3WayIndex(patchPath: string): Promise<void> {
|
||||
await this.withMutationQueue(async () => {
|
||||
await this.git.raw(['apply', '--3way', '--index', patchPath]);
|
||||
});
|
||||
}
|
||||
|
||||
async commitStaged(commitMessage: string, author: string, authorEmail: string): Promise<GitCommitInfo> {
|
||||
return this.withMutationQueue(async () => {
|
||||
const stagedChanges = await this.git.diff(['--cached', '--name-only']);
|
||||
if (!stagedChanges.trim()) {
|
||||
const head = (await this.git.revparse(['HEAD'])).trim();
|
||||
const log = await this.git.log({ maxCount: 1 });
|
||||
const latest = log.latest;
|
||||
return {
|
||||
commitHash: head,
|
||||
shortHash: head.substring(0, 8),
|
||||
message: latest?.message ?? '',
|
||||
author: latest?.author_name ?? '',
|
||||
authorEmail: latest?.author_email ?? '',
|
||||
timestamp: latest?.date ?? new Date(0).toISOString(),
|
||||
committedDate: latest?.date ? new Date(latest.date).toISOString() : new Date(0).toISOString(),
|
||||
created: false,
|
||||
};
|
||||
}
|
||||
await this.git.commit(commitMessage, { '--author': `${author} <${authorEmail}>` });
|
||||
const head = (await this.git.revparse(['HEAD'])).trim();
|
||||
const log = await this.git.log({ maxCount: 1 });
|
||||
const latest = log.latest;
|
||||
return {
|
||||
commitHash: head,
|
||||
shortHash: head.substring(0, 8),
|
||||
message: latest?.message ?? commitMessage,
|
||||
author: latest?.author_name ?? author,
|
||||
authorEmail: latest?.author_email ?? authorEmail,
|
||||
timestamp: latest?.date ?? new Date().toISOString(),
|
||||
committedDate: latest?.date ? new Date(latest.date).toISOString() : new Date().toISOString(),
|
||||
created: true,
|
||||
};
|
||||
});
|
||||
}
|
||||
|
||||
private async fileExists(path: string): Promise<boolean> {
|
||||
try {
|
||||
await fs.access(path);
|
||||
|
|
|
|||
|
|
@ -138,6 +138,52 @@ describe('fetchMetabaseBundle', () => {
|
|||
expect(warn).not.toHaveBeenCalled();
|
||||
});
|
||||
|
||||
it('emits memory-flow progress while fetching Metabase cards', async () => {
|
||||
const events: unknown[] = [];
|
||||
|
||||
await fetchMetabaseBundle({
|
||||
pullConfig: { metabaseConnectionId, metabaseDatabaseId: 42 },
|
||||
stagedDir,
|
||||
ctx: {
|
||||
...makeFetchContext(),
|
||||
memoryFlow: {
|
||||
emit: (event) => events.push(event),
|
||||
update: vi.fn(),
|
||||
finish: vi.fn(),
|
||||
snapshot: vi.fn(),
|
||||
},
|
||||
},
|
||||
clientFactory,
|
||||
sourceStateReader,
|
||||
});
|
||||
|
||||
expect(events).toEqual(
|
||||
expect.arrayContaining([
|
||||
expect.objectContaining({
|
||||
type: 'stage_progress',
|
||||
stage: 'source',
|
||||
message: 'Fetching Metabase database 42 metadata',
|
||||
}),
|
||||
expect.objectContaining({
|
||||
type: 'stage_progress',
|
||||
stage: 'source',
|
||||
message: 'Fetching 1 Metabase card for database 42',
|
||||
}),
|
||||
expect.objectContaining({
|
||||
type: 'stage_progress',
|
||||
stage: 'source',
|
||||
message: 'Checked 1/1 Metabase cards for database 42; wrote 1',
|
||||
transient: true,
|
||||
}),
|
||||
expect.objectContaining({
|
||||
type: 'stage_progress',
|
||||
stage: 'source',
|
||||
message: 'Fetched Metabase database 42: 1 cards, 0 unresolved',
|
||||
}),
|
||||
]),
|
||||
);
|
||||
});
|
||||
|
||||
it('routes Metabase fetch warnings through the injected logger', async () => {
|
||||
const logger = {
|
||||
log: vi.fn(),
|
||||
|
|
|
|||
|
|
@ -83,6 +83,15 @@ function resolvePath(index: Map<number | 'root', CollectionNode>, collectionId:
|
|||
export async function fetchMetabaseBundle(params: FetchMetabaseBundleParams): Promise<void> {
|
||||
const pullConfig: MetabasePullConfig = parseMetabasePullConfig(params.pullConfig);
|
||||
const logger = params.logger ?? noopMetabaseFetchLogger;
|
||||
const emitFetchProgress = (percent: number, message: string, transient = false): void => {
|
||||
params.ctx.memoryFlow?.emit({
|
||||
type: 'stage_progress',
|
||||
stage: 'source',
|
||||
percent,
|
||||
message,
|
||||
...(transient ? { transient } : {}),
|
||||
});
|
||||
};
|
||||
const syncState = await params.sourceStateReader.getSourceState(pullConfig.metabaseConnectionId);
|
||||
const mapping = syncState.mappings.find(
|
||||
(m) => m.metabaseDatabaseId === pullConfig.metabaseDatabaseId && m.syncEnabled,
|
||||
|
|
@ -100,6 +109,7 @@ export async function fetchMetabaseBundle(params: FetchMetabaseBundleParams): Pr
|
|||
|
||||
const client = await params.clientFactory.createClient(pullConfig, params.ctx);
|
||||
try {
|
||||
emitFetchProgress(26, `Fetching Metabase database ${pullConfig.metabaseDatabaseId} metadata`);
|
||||
let mappingDatabaseName = mapping.metabaseDatabaseName;
|
||||
let mappingEngine = mapping.metabaseEngine;
|
||||
if (mappingDatabaseName === null) {
|
||||
|
|
@ -133,6 +143,12 @@ export async function fetchMetabaseBundle(params: FetchMetabaseBundleParams): Pr
|
|||
await mkdir(join(params.stagedDir, STAGED_FILES.databasesDir), { recursive: true });
|
||||
|
||||
const cardIdsToFetch = await resolveCardIdsToFetch(client, scope, pullConfig.metabaseDatabaseId, logger);
|
||||
emitFetchProgress(
|
||||
28,
|
||||
`Fetching ${cardIdsToFetch.length} Metabase card${cardIdsToFetch.length === 1 ? '' : 's'} for database ${
|
||||
pullConfig.metabaseDatabaseId
|
||||
}`,
|
||||
);
|
||||
|
||||
const referencedCollectionIds = new Set<number>();
|
||||
let writtenCards = 0;
|
||||
|
|
@ -212,7 +228,19 @@ export async function fetchMetabaseBundle(params: FetchMetabaseBundleParams): Pr
|
|||
}
|
||||
}
|
||||
}
|
||||
const knownTotal = Math.max(cardIdsToFetch.length, fetched.size + queue.length);
|
||||
if (fetched.size === 1 || fetched.size % 10 === 0 || queue.length === 0) {
|
||||
emitFetchProgress(
|
||||
30,
|
||||
`Checked ${fetched.size}/${knownTotal} Metabase cards for database ${pullConfig.metabaseDatabaseId}; wrote ${writtenCards}`,
|
||||
true,
|
||||
);
|
||||
}
|
||||
}
|
||||
emitFetchProgress(
|
||||
32,
|
||||
`Fetched Metabase database ${pullConfig.metabaseDatabaseId}: ${writtenCards} cards, ${unresolvedCards.length} unresolved`,
|
||||
);
|
||||
|
||||
for (const colId of referencedCollectionIds) {
|
||||
const node = collectionIndex.get(colId);
|
||||
|
|
|
|||
|
|
@ -1,10 +1,12 @@
|
|||
import { mkdir, mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
|
||||
import { tmpdir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
|
||||
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
|
||||
import { makeLocalGitRepo } from '../../../test/make-local-git-repo.js';
|
||||
import type { SourceAdapter } from '../../types.js';
|
||||
import type { MetricFlowParseResult } from './deep-parse.js';
|
||||
import { MetricflowSourceAdapter } from './metricflow.adapter.js';
|
||||
import { readMetricflowProjectionConfig, writeMetricflowProjectionConfig } from './projection-config.js';
|
||||
|
||||
function compileOnlyRequiredDepsCheck(): void {
|
||||
// @ts-expect-error MetricflowSourceAdapter requires an explicit cache home.
|
||||
|
|
@ -22,6 +24,25 @@ async function makeRepo(tmpRoot: string, files: Record<string, string>) {
|
|||
return makeLocalGitRepo(fixtureDir, join(tmpRoot, 'origin'));
|
||||
}
|
||||
|
||||
function metricflowParseResult(): MetricFlowParseResult {
|
||||
return {
|
||||
semanticModels: [
|
||||
{
|
||||
name: 'orders',
|
||||
description: 'Orders',
|
||||
modelRef: 'orders',
|
||||
dimensions: [{ name: 'status', column: 'status', type: 'string', label: 'Status' }],
|
||||
measures: [{ type: 'simple', name: 'order_count', column: 'id', aggregation: 'count' }],
|
||||
entities: [{ name: 'customer', type: 'foreign', expr: 'customer_id' }],
|
||||
defaultTimeDimension: null,
|
||||
},
|
||||
],
|
||||
crossModelMetrics: [],
|
||||
relationships: [],
|
||||
warnings: ['parser warning'],
|
||||
};
|
||||
}
|
||||
|
||||
describe('MetricflowSourceAdapter', () => {
|
||||
let tmpRoot: string;
|
||||
let stagedDir: string;
|
||||
|
|
@ -127,4 +148,119 @@ describe('MetricflowSourceAdapter', () => {
|
|||
await expect(readFile(join(stagedDir, 'models/orders.yml'), 'utf-8')).resolves.toContain('semantic_models');
|
||||
expect(await adapter.detect(stagedDir)).toBe(true);
|
||||
});
|
||||
|
||||
it('persists parsed target tables for deterministic projection during fetch', async () => {
|
||||
const repo = await makeRepo(tmpRoot, {
|
||||
'dbt_project.yml': 'name: analytics\n',
|
||||
'models/orders.yml': 'semantic_models:\n - name: orders\n model: ref("orders")\n',
|
||||
});
|
||||
|
||||
await adapter.fetch?.(
|
||||
{
|
||||
repoUrl: repo.repoUrl,
|
||||
branch: 'main',
|
||||
path: null,
|
||||
authToken: null,
|
||||
parsedTargetTables: {
|
||||
orders: {
|
||||
ok: true,
|
||||
catalog: null,
|
||||
schema: 'analytics',
|
||||
name: 'orders',
|
||||
canonicalTable: 'analytics.orders',
|
||||
},
|
||||
},
|
||||
},
|
||||
stagedDir,
|
||||
{ connectionId: 'warehouse-1', sourceKey: 'metricflow' },
|
||||
);
|
||||
|
||||
await expect(readMetricflowProjectionConfig(stagedDir)).resolves.toMatchObject({
|
||||
parsedTargetTables: {
|
||||
orders: {
|
||||
ok: true,
|
||||
schema: 'analytics',
|
||||
name: 'orders',
|
||||
},
|
||||
},
|
||||
});
|
||||
});
|
||||
|
||||
it('projects parsed MetricFlow semantic models in the integration worktree', async () => {
|
||||
await writeMetricflowProjectionConfig(stagedDir, {
|
||||
parsedTargetTables: {
|
||||
orders: {
|
||||
ok: true,
|
||||
catalog: null,
|
||||
schema: 'analytics',
|
||||
name: 'orders',
|
||||
canonicalTable: 'analytics.orders',
|
||||
},
|
||||
},
|
||||
});
|
||||
const scoped = {
|
||||
getManifestEntry: vi.fn().mockResolvedValue(null),
|
||||
isManifestBacked: vi.fn().mockResolvedValue(false),
|
||||
loadAllSources: vi.fn().mockResolvedValue({ sources: [], loadErrors: [] }),
|
||||
loadSource: vi.fn().mockResolvedValue(null),
|
||||
writeSource: vi.fn().mockResolvedValue({ warnings: [] }),
|
||||
};
|
||||
const semanticLayerService = {
|
||||
forWorktree: vi.fn().mockReturnValue(scoped),
|
||||
getManifestEntry: vi.fn(),
|
||||
isManifestBacked: vi.fn(),
|
||||
loadAllSources: vi.fn(),
|
||||
loadSource: vi.fn(),
|
||||
writeSource: vi.fn(),
|
||||
};
|
||||
|
||||
const result = await adapter.project?.({
|
||||
connectionId: 'warehouse-1',
|
||||
sourceKey: 'metricflow',
|
||||
syncId: 'sync-1',
|
||||
jobId: 'job-1',
|
||||
runId: 'run-1',
|
||||
stagedDir,
|
||||
workdir: '/tmp/metricflow-integration',
|
||||
parseArtifacts: metricflowParseResult(),
|
||||
semanticLayerService: semanticLayerService as never,
|
||||
});
|
||||
|
||||
expect(semanticLayerService.forWorktree).toHaveBeenCalledWith('/tmp/metricflow-integration');
|
||||
expect(scoped.writeSource).toHaveBeenCalledWith(
|
||||
'warehouse-1',
|
||||
expect.objectContaining({ name: 'orders' }),
|
||||
'dbt MetricFlow',
|
||||
expect.any(String),
|
||||
'dbt MetricFlow sync: create source orders',
|
||||
{ skipValidation: true },
|
||||
);
|
||||
expect(result).toMatchObject({
|
||||
warnings: ['parser warning'],
|
||||
errors: [],
|
||||
touchedSources: [{ connectionId: 'warehouse-1', sourceName: 'orders' }],
|
||||
changedWikiPageKeys: [],
|
||||
});
|
||||
});
|
||||
|
||||
it('returns a projection error when parse artifacts are missing', async () => {
|
||||
const result = await adapter.project?.({
|
||||
connectionId: 'warehouse-1',
|
||||
sourceKey: 'metricflow',
|
||||
syncId: 'sync-1',
|
||||
jobId: 'job-1',
|
||||
runId: 'run-1',
|
||||
stagedDir,
|
||||
workdir: '/tmp/metricflow-integration',
|
||||
parseArtifacts: undefined,
|
||||
semanticLayerService: {} as never,
|
||||
});
|
||||
|
||||
expect(result).toMatchObject({
|
||||
warnings: [],
|
||||
errors: ['MetricFlow deterministic projection requires parseArtifacts from chunk()'],
|
||||
touchedSources: [],
|
||||
changedWikiPageKeys: [],
|
||||
});
|
||||
});
|
||||
});
|
||||
|
|
|
|||
|
|
@ -1,10 +1,23 @@
|
|||
import { join } from 'node:path';
|
||||
import type { ChunkResult, DiffSet, FetchContext, SourceAdapter } from '../../types.js';
|
||||
import type {
|
||||
ChunkResult,
|
||||
DeterministicProjectionContext,
|
||||
DiffSet,
|
||||
FetchContext,
|
||||
ProjectionResult,
|
||||
SourceAdapter,
|
||||
} from '../../types.js';
|
||||
import { chunkMetricFlowProject } from './chunk.js';
|
||||
import { detectMetricFlowStagedDir } from './detect.js';
|
||||
import { parseMetricflowFiles, type MetricFlowParseResult } from './deep-parse.js';
|
||||
import { fetchMetricflowRepo } from './fetch.js';
|
||||
import { importMetricflowSemanticModels } from './import-semantic-models.js';
|
||||
import { parseMetricFlowStagedDir, type ParsedMetricFlowProject } from './parse.js';
|
||||
import {
|
||||
metricflowHostTablesFromParsedTargets,
|
||||
readMetricflowProjectionConfig,
|
||||
writeMetricflowProjectionConfig,
|
||||
} from './projection-config.js';
|
||||
import { parseMetricflowPullConfig } from './pull-config.js';
|
||||
|
||||
export interface MetricflowSourceAdapterDeps {
|
||||
|
|
@ -33,6 +46,9 @@ export class MetricflowSourceAdapter implements SourceAdapter {
|
|||
cacheDir: this.resolveCacheDir(ctx.connectionId),
|
||||
stagedDir,
|
||||
});
|
||||
await writeMetricflowProjectionConfig(stagedDir, {
|
||||
parsedTargetTables: config.parsedTargetTables,
|
||||
});
|
||||
}
|
||||
|
||||
async listTargetConnectionIds(_stagedDir: string): Promise<string[]> {
|
||||
|
|
@ -46,6 +62,37 @@ export class MetricflowSourceAdapter implements SourceAdapter {
|
|||
return { ...chunk, parseArtifacts };
|
||||
}
|
||||
|
||||
async project(ctx: DeterministicProjectionContext): Promise<ProjectionResult> {
|
||||
if (!isMetricFlowParseResult(ctx.parseArtifacts)) {
|
||||
return {
|
||||
warnings: [],
|
||||
errors: ['MetricFlow deterministic projection requires parseArtifacts from chunk()'],
|
||||
touchedSources: [],
|
||||
changedWikiPageKeys: [],
|
||||
};
|
||||
}
|
||||
|
||||
const projectionConfig = await readMetricflowProjectionConfig(ctx.stagedDir);
|
||||
const result = await importMetricflowSemanticModels(
|
||||
{ semanticLayerService: ctx.semanticLayerService },
|
||||
{
|
||||
connectionId: ctx.connectionId,
|
||||
parseResult: ctx.parseArtifacts,
|
||||
targetSchema: null,
|
||||
hostTables: metricflowHostTablesFromParsedTargets(projectionConfig.parsedTargetTables),
|
||||
workdir: ctx.workdir,
|
||||
},
|
||||
);
|
||||
|
||||
return {
|
||||
result,
|
||||
warnings: result.warnings,
|
||||
errors: result.errors,
|
||||
touchedSources: result.touchedSources,
|
||||
changedWikiPageKeys: [],
|
||||
};
|
||||
}
|
||||
|
||||
private resolveCacheDir(connectionId: string): string {
|
||||
return join(this.deps.homeDir, 'ingest-metricflow-repos', connectionId);
|
||||
}
|
||||
|
|
@ -54,3 +101,16 @@ export class MetricflowSourceAdapter implements SourceAdapter {
|
|||
function parseMetricflowStagedDirForImport(project: ParsedMetricFlowProject): MetricFlowParseResult {
|
||||
return parseMetricflowFiles(project.files);
|
||||
}
|
||||
|
||||
function isMetricFlowParseResult(value: unknown): value is MetricFlowParseResult {
|
||||
if (!value || typeof value !== 'object') {
|
||||
return false;
|
||||
}
|
||||
const candidate = value as Partial<MetricFlowParseResult>;
|
||||
return (
|
||||
Array.isArray(candidate.semanticModels) &&
|
||||
Array.isArray(candidate.crossModelMetrics) &&
|
||||
Array.isArray(candidate.relationships) &&
|
||||
Array.isArray(candidate.warnings)
|
||||
);
|
||||
}
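Reviewer note: the guard above only checks that the four top-level arrays exist, which is what lets `project()` return a projection error instead of throwing when `parseArtifacts` is missing. A minimal standalone sketch of that behaviour follows. `ParseResultSketch` and `isParseResultSketch` are hypothetical stand-ins for illustration, not the types exported from `deep-parse.ts`.

```typescript
// Hypothetical stand-in for MetricFlowParseResult (assumption: only the
// four top-level arrays matter to the guard, as in the diff above).
interface ParseResultSketch {
  semanticModels: unknown[];
  crossModelMetrics: unknown[];
  relationships: unknown[];
  warnings: unknown[];
}

// Structural type guard: rejects non-objects, then requires every
// top-level field to be an array. Element shapes are not inspected.
function isParseResultSketch(value: unknown): value is ParseResultSketch {
  if (!value || typeof value !== 'object') {
    return false;
  }
  const candidate = value as Partial<ParseResultSketch>;
  return (
    Array.isArray(candidate.semanticModels) &&
    Array.isArray(candidate.crossModelMetrics) &&
    Array.isArray(candidate.relationships) &&
    Array.isArray(candidate.warnings)
  );
}

// Three representative inputs: missing, partially shaped, fully shaped.
const guardOnUndefined = isParseResultSketch(undefined);
const guardOnPartial = isParseResultSketch({ semanticModels: [] });
const guardOnComplete = isParseResultSketch({
  semanticModels: [],
  crossModelMetrics: [],
  relationships: [],
  warnings: ['parser warning'],
});
```

Because the check is shallow, a malformed element inside `semanticModels` would still pass; deeper validation, if wanted, would have to happen downstream.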
|
||||
|
|
|
|||
|
|
@ -0,0 +1,54 @@
|
|||
import { mkdir, readFile, writeFile } from 'node:fs/promises';
import { join } from 'node:path';
import { z } from 'zod';
import { parsedTargetTableSchema, type ParsedTargetTable } from '../../parsed-target-table.js';
import type { MetricflowHostTable } from './semantic-models.js';

const METRICFLOW_PROJECTION_CONFIG_FILE = 'sync-config.json';

const metricflowProjectionConfigSchema = z.object({
  parsedTargetTables: z.record(z.string(), parsedTargetTableSchema).default({}),
});

export type MetricflowProjectionConfig = z.infer<typeof metricflowProjectionConfigSchema>;

export async function writeMetricflowProjectionConfig(
  stagedDir: string,
  config: MetricflowProjectionConfig,
): Promise<void> {
  const parsed = metricflowProjectionConfigSchema.parse(config);
  await mkdir(stagedDir, { recursive: true });
  await writeFile(join(stagedDir, METRICFLOW_PROJECTION_CONFIG_FILE), `${JSON.stringify(parsed, null, 2)}\n`, 'utf-8');
}

export async function readMetricflowProjectionConfig(stagedDir: string): Promise<MetricflowProjectionConfig> {
  const path = join(stagedDir, METRICFLOW_PROJECTION_CONFIG_FILE);
  try {
    return metricflowProjectionConfigSchema.parse(JSON.parse(await readFile(path, 'utf-8')));
  } catch (error) {
    if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
      return { parsedTargetTables: {} };
    }
    throw error;
  }
}

export function metricflowHostTablesFromParsedTargets(
  parsedTargetTables: Record<string, ParsedTargetTable>,
): MetricflowHostTable[] {
  return Object.entries(parsedTargetTables)
    .flatMap(([id, table]) =>
      table.ok
        ? [
            {
              id,
              name: table.name,
              catalog: table.catalog,
              db: table.schema,
              columns: [],
            },
          ]
        : [],
    )
    .sort((left, right) => left.id.localeCompare(right.id));
}
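Reviewer note: `metricflowHostTablesFromParsedTargets` drops entries whose parse failed and sorts the rest by id, which keeps the projection input deterministic regardless of object key order. The sketch below reproduces that filter-and-sort shape in isolation. `ParsedTargetSketch`, `HostTableSketch`, and the `reason` field on the failure branch are assumptions made for illustration; the real `ParsedTargetTable` and `MetricflowHostTable` types are not shown in this diff.

```typescript
// Hypothetical shapes. Only `ok`, `catalog`, `schema`, and `name` are taken
// from the diff; the failure branch is an assumption.
type ParsedTargetSketch =
  | { ok: true; catalog: string | null; schema: string | null; name: string }
  | { ok: false; reason: string };

interface HostTableSketch {
  id: string;
  name: string;
  catalog: string | null;
  db: string | null;
  columns: string[];
}

// Keep only successfully parsed targets, map them to host tables with an
// empty column list, and sort by id so the output order is stable.
function hostTablesFromTargets(targets: Record<string, ParsedTargetSketch>): HostTableSketch[] {
  return Object.entries(targets)
    .flatMap(([id, table]): HostTableSketch[] =>
      table.ok ? [{ id, name: table.name, catalog: table.catalog, db: table.schema, columns: [] }] : [],
    )
    .sort((left, right) => left.id.localeCompare(right.id));
}

// Keys are deliberately out of order, with one failed parse in the middle.
const hostTables = hostTablesFromTargets({
  orders: { ok: true, catalog: null, schema: 'analytics', name: 'orders' },
  broken: { ok: false, reason: 'unparseable ref' },
  accounts: { ok: true, catalog: null, schema: 'analytics', name: 'accounts' },
});
const hostTableIds = hostTables.map((table) => table.id);
```

Using `flatMap` with a one-element or empty array does the filter and the map in a single pass while letting TypeScript narrow on `table.ok`.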
|
||||
190
packages/context/src/ingest/artifact-gates.test.ts
Normal file
|
|
@ -0,0 +1,190 @@
|
|||
import { describe, expect, it, vi } from 'vitest';
|
||||
import { validateFinalIngestArtifacts, validateProvenanceRawPaths } from './artifact-gates.js';
|
||||
|
||||
function wikiServiceWithPages(
|
||||
pages: Record<string, { refs?: string[]; content?: string; slRefs?: string[] }>,
|
||||
) {
|
||||
return {
|
||||
listPageKeys: vi.fn().mockResolvedValue(Object.keys(pages)),
|
||||
readPage: vi.fn().mockImplementation((_scope: string, _scopeId: string | null, pageKey: string) => {
|
||||
const page = pages[pageKey];
|
||||
if (!page) {
|
||||
return Promise.resolve(null);
|
||||
}
|
||||
return Promise.resolve({
|
||||
pageKey,
|
||||
frontmatter: {
|
||||
summary: pageKey,
|
||||
usage_mode: 'auto',
|
||||
refs: page.refs,
|
||||
sl_refs: page.slRefs,
|
||||
},
|
||||
content: page.content ?? '',
|
||||
});
|
||||
}),
|
||||
};
|
||||
}
|
||||
|
||||
describe('artifact gates', () => {
|
||||
it('fails the final tree when wiki body references a stale semantic-layer measure', async () => {
|
||||
const wikiService = wikiServiceWithPages({
|
||||
'account-segments': {
|
||||
slRefs: ['mart_account_segments'],
|
||||
content: 'ARR is `mart_account_segments.total_contract_arr_cents`.',
|
||||
},
|
||||
});
|
||||
const semanticLayerService = {
|
||||
loadAllSources: vi.fn().mockResolvedValue({
|
||||
sources: [
|
||||
{
|
||||
name: 'mart_account_segments',
|
||||
grain: ['account_id'],
|
||||
columns: [{ name: 'account_id', type: 'string' }],
|
||||
joins: [],
|
||||
measures: [{ name: 'total_contract_arr', expr: 'sum(contract_arr)' }],
|
||||
table: 'analytics.mart_account_segments',
|
||||
},
|
||||
],
|
||||
loadErrors: [],
|
||||
}),
|
||||
};
|
||||
|
||||
await expect(
|
||||
validateFinalIngestArtifacts({
|
||||
connectionIds: ['warehouse'],
|
||||
changedWikiPageKeys: ['account-segments'],
|
||||
touchedSlSources: [{ connectionId: 'warehouse', sourceName: 'mart_account_segments' }],
|
||||
wikiService: wikiService as never,
|
||||
semanticLayerService: semanticLayerService as never,
|
||||
validateTouchedSources: async () => ({ invalidSources: [], validSources: ['mart_account_segments'] }),
|
||||
tableExists: async () => true,
|
||||
}),
|
||||
).rejects.toThrow(/unknown semantic-layer entity mart_account_segments\.total_contract_arr_cents/);
  });

  it('fails before provenance insertion when a raw path cannot be tied to the current snapshot or eviction set', () => {
    expect(() =>
      validateProvenanceRawPaths({
        rows: [{ rawPath: 'cards/missing.json' }],
        currentRawPaths: new Set(['cards/present.json']),
        deletedRawPaths: new Set(['cards/deleted.json']),
      }),
    ).toThrow(/provenance row references raw path outside this snapshot: cards\/missing\.json/);
  });

  it('fails measure-level wiki frontmatter sl_refs that point at missing entities', async () => {
    const wikiService = wikiServiceWithPages({
      'account-segments': {
        slRefs: ['mart_account_segments.total_contract_arr_cents'],
        content: 'ARR uses a renamed measure.',
      },
    });
    const semanticLayerService = {
      loadAllSources: vi.fn().mockResolvedValue({
        sources: [
          {
            name: 'mart_account_segments',
            grain: ['account_id'],
            columns: [{ name: 'account_id', type: 'string' }],
            joins: [],
            measures: [{ name: 'total_contract_arr', expr: 'sum(contract_arr)' }],
            table: 'analytics.mart_account_segments',
          },
        ],
        loadErrors: [],
      }),
    };

    await expect(
      validateFinalIngestArtifacts({
        connectionIds: ['warehouse'],
        changedWikiPageKeys: ['account-segments'],
        touchedSlSources: [{ connectionId: 'warehouse', sourceName: 'mart_account_segments' }],
        wikiService: wikiService as never,
        semanticLayerService: semanticLayerService as never,
        validateTouchedSources: async () => ({ invalidSources: [], validSources: ['warehouse:mart_account_segments'] }),
        tableExists: async () => true,
      }),
    ).rejects.toThrow(/unknown sl_refs entity mart_account_segments\.total_contract_arr_cents/);
  });

  it('validates direct declared-join neighbors of touched semantic-layer sources', async () => {
    const semanticLayerService = {
      loadAllSources: vi.fn().mockResolvedValue({
        sources: [
          {
            name: 'orders',
            grain: ['order_id'],
            columns: [
              { name: 'order_id', type: 'string' },
              { name: 'account_id', type: 'string' },
            ],
            joins: [{ to: 'accounts', on: 'orders.account_id = accounts.account_id', relationship: 'many_to_one' }],
            measures: [{ name: 'order_count', expr: 'count(*)' }],
          },
          {
            name: 'accounts',
            grain: ['account_id'],
            columns: [{ name: 'account_id', type: 'string' }],
            joins: [],
            measures: [{ name: 'account_count', expr: 'count(*)' }],
          },
          {
            name: 'segments',
            grain: ['segment_id'],
            columns: [
              { name: 'segment_id', type: 'string' },
              { name: 'account_id', type: 'string' },
            ],
            joins: [{ to: 'accounts', on: 'segments.account_id = accounts.account_id', relationship: 'many_to_one' }],
            measures: [],
          },
        ],
        loadErrors: [],
      }),
    };
    const validateTouchedSources = vi.fn().mockResolvedValue({ invalidSources: [], validSources: [] });

    await validateFinalIngestArtifacts({
      connectionIds: ['warehouse'],
      changedWikiPageKeys: [],
      touchedSlSources: [{ connectionId: 'warehouse', sourceName: 'accounts' }],
      wikiService: { readPage: vi.fn() } as never,
      semanticLayerService: semanticLayerService as never,
      validateTouchedSources,
      tableExists: async () => true,
    });

    expect(validateTouchedSources).toHaveBeenCalledWith([
      { connectionId: 'warehouse', sourceName: 'accounts' },
      { connectionId: 'warehouse', sourceName: 'orders' },
      { connectionId: 'warehouse', sourceName: 'segments' },
    ]);
  });

  it('fails final gates when a changed wiki page references a missing wiki page', async () => {
    const wikiService = wikiServiceWithPages({
      'account-segments': {
        refs: ['missing-frontmatter-page'],
        content: 'See [[missing-inline-page]] for the related process.',
      },
    });
    const semanticLayerService = {
      loadAllSources: vi.fn().mockResolvedValue({ sources: [], loadErrors: [] }),
    };

    await expect(
      validateFinalIngestArtifacts({
        connectionIds: ['warehouse'],
        changedWikiPageKeys: ['account-segments'],
        touchedSlSources: [],
        wikiService: wikiService as never,
        semanticLayerService: semanticLayerService as never,
        validateTouchedSources: async () => ({ invalidSources: [], validSources: [] }),
        tableExists: async () => true,
      }),
    ).rejects.toThrow(
      /wiki references target missing page\(s\): account-segments -> missing-frontmatter-page, account-segments -> missing-inline-page/,
    );
  });
});
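The direct-join-neighbor test above expects a touched source to pull in every source one declared join away, in either direction, and nothing further. A minimal standalone sketch of that expansion follows. The `SketchSource` shape, the function name, and the extra `invoices` source are hypothetical illustrations, not part of the KTX codebase, and the sketch ignores connection scoping.

```typescript
// Hypothetical, simplified source shape used only for this sketch.
interface SketchSource {
  name: string;
  joins: Array<{ to: string }>;
}

// Return the touched names plus every source exactly one declared join away
// (outgoing or incoming), deduplicated and sorted by name.
function expandWithDirectJoinNeighbors(touched: string[], sources: SketchSource[]): string[] {
  const touchedSet = new Set<string>(touched);
  const expanded = new Set<string>(touched);
  for (const source of sources) {
    if (touchedSet.has(source.name)) {
      // Outgoing joins: the touched source points at a neighbor.
      for (const join of source.joins) {
        expanded.add(join.to);
      }
    }
    if (source.joins.some((join) => touchedSet.has(join.to))) {
      // Incoming joins: another source points at a touched source.
      expanded.add(source.name);
    }
  }
  return Array.from(expanded).sort();
}

const neighbors = expandWithDirectJoinNeighbors(
  ['accounts'],
  [
    { name: 'orders', joins: [{ to: 'accounts' }] },
    { name: 'accounts', joins: [] },
    { name: 'segments', joins: [{ to: 'accounts' }] },
    // Two hops away from accounts, so it should not be included.
    { name: 'invoices', joins: [{ to: 'orders' }] },
  ],
);
console.log(neighbors);
```

Stopping at one hop keeps the validation set bounded: a change to `accounts` revalidates the sources that join to it, but not everything reachable through them.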
188
packages/context/src/ingest/artifact-gates.ts
Normal file
@@ -0,0 +1,188 @@
import type { SemanticLayerService } from '../sl/index.js';
import type { TouchedSlSource } from '../tools/index.js';
import type { KnowledgeWikiService } from '../wiki/index.js';
import { findMissingWikiRefs } from '../wiki/wiki-ref-validation.js';
import { findInvalidWikiBodyRefs } from './wiki-body-refs.js';

export interface TouchedValidationResult {
  invalidSources: string[];
  validSources: string[];
}

export interface FinalArtifactGateInput {
  connectionIds: string[];
  changedWikiPageKeys: string[];
  touchedSlSources: TouchedSlSource[];
  wikiService: KnowledgeWikiService;
  semanticLayerService: SemanticLayerService;
  validateTouchedSources(touched: TouchedSlSource[]): Promise<TouchedValidationResult>;
  tableExists(connectionId: string, tableRef: string): Promise<boolean>;
}

export interface ProvenanceRawPathValidationInput {
  rows: Array<{ rawPath: string }>;
  currentRawPaths: Set<string>;
  deletedRawPaths: Set<string>;
}

function parseSlRef(ref: string): { connectionId: string | null; sourceName: string; entityName: string | null } {
  const withoutConnection = ref.includes('/') ? ref.slice(ref.indexOf('/') + 1) : ref;
  const connectionId = ref.includes('/') ? ref.slice(0, ref.indexOf('/')) : null;
  const [sourceName = '', entityName = null] = withoutConnection.split('.', 2);
  return { connectionId, sourceName, entityName };
}

function slEntityNames(source: Awaited<ReturnType<SemanticLayerService['loadAllSources']>>['sources'][number]): Set<string> {
  return new Set([
    ...(source.measures ?? []).map((measure) => measure.name),
    ...(source.columns ?? []).map((column) => column.name),
    ...(source.segments ?? []).map((segment) => segment.name),
  ]);
}

function uniqueTouchedSources(sources: TouchedSlSource[]): TouchedSlSource[] {
  const seen = new Set<string>();
  const unique: TouchedSlSource[] = [];
  for (const source of sources) {
    const key = `${source.connectionId}:${source.sourceName}`;
    if (seen.has(key)) {
      continue;
    }
    seen.add(key);
    unique.push(source);
  }
  return unique.sort((left, right) => {
    const byConnection = left.connectionId.localeCompare(right.connectionId);
    return byConnection === 0 ? left.sourceName.localeCompare(right.sourceName) : byConnection;
  });
}

async function expandTouchedSlSourcesWithDirectJoinNeighbors(input: FinalArtifactGateInput): Promise<TouchedSlSource[]> {
  const expanded = [...input.touchedSlSources];
  const touchedByConnection = new Map<string, Set<string>>();
  for (const source of input.touchedSlSources) {
    const bucket = touchedByConnection.get(source.connectionId) ?? new Set<string>();
    bucket.add(source.sourceName);
    touchedByConnection.set(source.connectionId, bucket);
  }

  for (const connectionId of input.connectionIds) {
    const touched = touchedByConnection.get(connectionId);
    if (!touched || touched.size === 0) {
      continue;
    }
    const { sources } = await input.semanticLayerService.loadAllSources(connectionId);
    for (const source of sources) {
      const sourceIsTouched = touched.has(source.name);
      if (sourceIsTouched) {
        for (const join of source.joins ?? []) {
          expanded.push({ connectionId, sourceName: join.to });
        }
      }
      if ((source.joins ?? []).some((join) => touched.has(join.to))) {
        expanded.push({ connectionId, sourceName: source.name });
      }
    }
  }

  return uniqueTouchedSources(expanded);
}

async function validateWikiSlRefs(input: FinalArtifactGateInput): Promise<string[]> {
  const errors: string[] = [];
  const sourcesByConnection = new Map<string, Awaited<ReturnType<SemanticLayerService['loadAllSources']>>['sources']>();
  for (const connectionId of input.connectionIds) {
    const { sources } = await input.semanticLayerService.loadAllSources(connectionId);
    sourcesByConnection.set(connectionId, sources);
  }

  for (const pageKey of input.changedWikiPageKeys) {
    const page = await input.wikiService.readPage('GLOBAL', null, pageKey);
    if (!page) {
      continue;
    }
    for (const ref of page.frontmatter.sl_refs ?? []) {
      const parsed = parseSlRef(ref);
      const candidateConnections = parsed.connectionId ? [parsed.connectionId] : input.connectionIds;
      let source: Awaited<ReturnType<SemanticLayerService['loadAllSources']>>['sources'][number] | undefined;
      for (const connectionId of candidateConnections) {
        source = sourcesByConnection.get(connectionId)?.find((candidate) => candidate.name === parsed.sourceName);
        if (source) {
          break;
        }
      }
      if (!source) {
        errors.push(`${pageKey}: unknown sl_refs entry ${ref}`);
        continue;
      }
      if (parsed.entityName && !slEntityNames(source).has(parsed.entityName)) {
        errors.push(`${pageKey}: unknown sl_refs entity ${ref}`);
      }
    }
  }
  return errors;
}

async function validateWikiRefs(input: FinalArtifactGateInput): Promise<string[]> {
  const dangling: string[] = [];
  for (const pageKey of input.changedWikiPageKeys) {
    const page = await input.wikiService.readPage('GLOBAL', null, pageKey);
    if (!page) {
      continue;
    }
    const missingRefs = await findMissingWikiRefs({
      wikiService: input.wikiService,
      scope: 'GLOBAL',
      scopeId: null,
      pageKey,
      refs: page.frontmatter.refs,
      content: page.content,
    });
    for (const missingRef of missingRefs) {
      dangling.push(`${pageKey} -> ${missingRef}`);
    }
  }
  return dangling;
}

export async function validateFinalIngestArtifacts(input: FinalArtifactGateInput): Promise<void> {
  const touchedWithDependencies = await expandTouchedSlSourcesWithDirectJoinNeighbors(input);
  const validation = await input.validateTouchedSources(touchedWithDependencies);
  const errors: string[] = validation.invalidSources.map((source) => `semantic-layer validation failed for ${source}`);
  errors.push(...(await validateWikiSlRefs(input)));
  const danglingWikiRefs = await validateWikiRefs(input);
  if (danglingWikiRefs.length > 0) {
    errors.push(`wiki references target missing page(s): ${danglingWikiRefs.join(', ')}`);
  }

  for (const pageKey of input.changedWikiPageKeys) {
    const page = await input.wikiService.readPage('GLOBAL', null, pageKey);
    if (!page) {
      continue;
    }
    errors.push(
      ...(await findInvalidWikiBodyRefs({
        pageKey,
        body: page.content,
        visibleConnectionIds: input.connectionIds,
        loadSources: async (connectionId) => {
          const { sources } = await input.semanticLayerService.loadAllSources(connectionId);
          return sources;
        },
        tableExists: input.tableExists,
      })),
    );
  }

  if (errors.length > 0) {
    throw new Error(`final artifact gates failed:\n${errors.join('\n')}`);
  }
}

export function validateProvenanceRawPaths(input: ProvenanceRawPathValidationInput): void {
  for (const row of input.rows) {
    if (!input.currentRawPaths.has(row.rawPath) && !input.deletedRawPaths.has(row.rawPath)) {
      throw new Error(`provenance row references raw path outside this snapshot: ${row.rawPath}`);
    }
  }
}
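The `sl_refs` gate in this file depends on how a reference string is split into an optional connection, a source, and an optional entity. The sketch below reproduces that split in isolation so the accepted shapes are easy to see. `ParsedSlRefSketch` and `parseSlRefSketch` are hypothetical names for illustration, and the sample refs are made up.

```typescript
// Hypothetical result shape mirroring the three parts of an sl_refs string.
interface ParsedSlRefSketch {
  connectionId: string | null;
  sourceName: string;
  entityName: string | null;
}

// Split "connection/source.entity" into its parts. The connection prefix and
// the entity suffix are both optional.
function parseSlRefSketch(ref: string): ParsedSlRefSketch {
  const slash = ref.indexOf('/');
  const connectionId = slash >= 0 ? ref.slice(0, slash) : null;
  const rest = slash >= 0 ? ref.slice(slash + 1) : ref;
  // split with a limit of 2 keeps only the first two dot-separated parts.
  const parts = rest.split('.', 2);
  const sourceName = parts[0] ?? '';
  const entityName = parts.length > 1 ? parts[1] : null;
  return { connectionId, sourceName, entityName };
}

const qualified = parseSlRefSketch('warehouse/mart_account_segments.total_contract_arr');
const sourceOnly = parseSlRefSketch('orders');
const extraDots = parseSlRefSketch('orders.order_count.extra');
console.log(qualified, sourceOnly, extraDots);
```

One consequence worth noting: because `split('.', 2)` truncates, a ref with more than one dot after the source name is matched on its second part only, and anything after that is silently dropped.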
136
packages/context/src/ingest/final-gate-repair.test.ts
Normal file
@@ -0,0 +1,136 @@
import { mkdir, mkdtemp, readFile, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { describe, expect, it, vi } from 'vitest';
import { finalGateRepairPaths, repairFinalGateFailure } from './final-gate-repair.js';
import { FileIngestTraceWriter } from './ingest-trace.js';

async function makeHarness() {
  const root = await mkdtemp(join(tmpdir(), 'ktx-final-gate-repair-'));
  const workdir = join(root, 'workdir');
  await mkdir(join(workdir, 'wiki/global'), { recursive: true });
  await mkdir(join(workdir, 'semantic-layer/warehouse'), { recursive: true });
  await writeFile(
    join(workdir, 'wiki/global/account-segments.md'),
    '---\nsummary: Account segments\nusage_mode: auto\n---\n\nARR uses `mart_account_segments.total_contract_arr_cents`.\n',
    'utf-8',
  );
  await writeFile(
    join(workdir, 'semantic-layer/warehouse/mart_account_segments.yaml'),
    'name: mart_account_segments\ncolumns: [{name: account_id, type: string}]\njoins: []\nmeasures:\n - name: total_contract_arr\n expr: sum(contract_arr)\n',
    'utf-8',
  );
  const trace = new FileIngestTraceWriter({
    tracePath: join(root, 'trace.jsonl'),
    jobId: 'job-1',
    connectionId: 'warehouse',
    sourceKey: 'metabase',
    runId: 'run-1',
    syncId: 'sync-1',
    level: 'trace',
  });
  return { root, workdir, trace };
}

describe('finalGateRepairPaths', () => {
  it('derives sorted wiki and semantic-layer file paths', () => {
    expect(
      finalGateRepairPaths({
        changedWikiPageKeys: ['account-segments', 'overview', 'account-segments'],
        touchedSlSources: [
          { connectionId: 'warehouse', sourceName: 'mart_account_segments' },
          { connectionId: 'warehouse', sourceName: 'orders' },
          { connectionId: 'warehouse', sourceName: 'orders' },
        ],
      }),
    ).toEqual([
      'semantic-layer/warehouse/mart_account_segments.yaml',
      'semantic-layer/warehouse/orders.yaml',
      'wiki/global/account-segments.md',
      'wiki/global/overview.md',
    ]);
  });
});

describe('repairFinalGateFailure', () => {
  it('lets the repair agent read gate errors and edit only allowed files', async () => {
    const { workdir, trace } = await makeHarness();
    const agentRunner = {
      runLoop: vi.fn(async (params: any) => {
        const error = await params.toolSet.read_gate_error.execute({});
        expect(error.markdown).toContain('total_contract_arr_cents');

        const page = await params.toolSet.read_repair_file.execute({
          path: 'wiki/global/account-segments.md',
        });
        expect(page.markdown).toContain('total_contract_arr_cents');

        await expect(
          params.toolSet.write_repair_file.execute({
            path: 'wiki/global/other.md',
            content: 'not allowed',
          }),
        ).rejects.toThrow(/gate repair path not allowed/);

        await params.toolSet.write_repair_file.execute({
          path: 'wiki/global/account-segments.md',
          content: page.markdown.replace('total_contract_arr_cents', 'total_contract_arr'),
        });
        return { stopReason: 'natural' as const };
      }),
    };

    const result = await repairFinalGateFailure({
      agentRunner,
      workdir,
      gateError:
        'final artifact gates failed:\naccount-segments: unknown semantic-layer entity mart_account_segments.total_contract_arr_cents',
      allowedPaths: ['wiki/global/account-segments.md'],
      trace,
      repairKind: 'final_artifact_gate',
      maxAttempts: 1,
      stepBudget: 8,
    });

    expect(result).toEqual({
      status: 'repaired',
      attempts: 1,
      changedPaths: ['wiki/global/account-segments.md'],
    });
    await expect(readFile(join(workdir, 'wiki/global/account-segments.md'), 'utf-8')).resolves.toContain(
      'total_contract_arr',
    );
    await expect(readFile(trace.tracePath, 'utf-8')).resolves.toContain('gate_repair_repaired');
    expect(agentRunner.runLoop).toHaveBeenCalledWith(
      expect.objectContaining({
        modelRole: 'repair',
        stepBudget: 8,
        telemetryTags: expect.objectContaining({
          operationName: 'ingest-isolated-diff-gate-repair',
          repairKind: 'final_artifact_gate',
        }),
      }),
    );
  });

  it('returns failed when the repair agent edits no allowed file', async () => {
    const { workdir, trace } = await makeHarness();
    const result = await repairFinalGateFailure({
      agentRunner: { runLoop: vi.fn(async () => ({ stopReason: 'natural' as const })) },
      workdir,
      gateError: 'final artifact gates failed:\naccount-segments: unknown semantic-layer entity',
      allowedPaths: ['wiki/global/account-segments.md'],
      trace,
      repairKind: 'final_artifact_gate',
      maxAttempts: 1,
      stepBudget: 8,
    });

    expect(result).toEqual({
      status: 'failed',
      attempts: 1,
      reason: 'gate repair completed without editing an allowed path',
    });
    await expect(readFile(trace.tracePath, 'utf-8')).resolves.toContain('gate_repair_failed');
  });
});
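The `finalGateRepairPaths` test above pins down two properties of the allowlist: duplicates collapse, and semantic-layer paths sort ahead of wiki paths because the result is sorted as plain strings. A small standalone sketch of that derivation, using the same inputs as the test, is shown here. `SketchTouchedSource` and `repairPathsSketch` are hypothetical names. The path templates are copied from the test expectations.

```typescript
// Hypothetical stand-in for the touched-source records used by the ingest code.
interface SketchTouchedSource {
  connectionId: string;
  sourceName: string;
}

// Build the repository-relative file paths a repair pass may touch:
// one YAML file per touched source and one Markdown file per changed page.
function repairPathsSketch(pageKeys: string[], sources: SketchTouchedSource[]): string[] {
  const paths = new Set<string>();
  for (const source of sources) {
    paths.add(`semantic-layer/${source.connectionId}/${source.sourceName}.yaml`);
  }
  for (const pageKey of pageKeys) {
    paths.add(`wiki/global/${pageKey}.md`);
  }
  // A Set drops duplicates; the default sort orders by UTF-16 code units.
  return Array.from(paths).sort();
}

const repairPaths = repairPathsSketch(
  ['account-segments', 'overview', 'account-segments'],
  [
    { connectionId: 'warehouse', sourceName: 'mart_account_segments' },
    { connectionId: 'warehouse', sourceName: 'orders' },
    { connectionId: 'warehouse', sourceName: 'orders' },
  ],
);
console.log(repairPaths);
```

A deterministic order matters here mostly for traces and prompts: the same inputs always list the allowed files the same way.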
230
packages/context/src/ingest/final-gate-repair.ts
Normal file
@@ -0,0 +1,230 @@
import { mkdir, readFile, writeFile } from 'node:fs/promises';
import { dirname, join } from 'node:path';
import { z } from 'zod';
import type { AgentRunnerPort, KtxRuntimeToolSet } from '../llm/index.js';
import type { TouchedSlSource } from '../tools/index.js';
import type { IngestTraceWriter } from './ingest-trace.js';
import { traceTimed } from './ingest-trace.js';

type FinalGateRepairKind = 'patch_semantic_gate' | 'final_artifact_gate';

export type FinalGateRepairResult =
  | { status: 'repaired'; attempts: number; changedPaths: string[] }
  | { status: 'failed'; attempts: number; reason: string };

export interface RepairFinalGateFailureInput {
  agentRunner: AgentRunnerPort;
  workdir: string;
  gateError: string;
  allowedPaths: string[];
  trace: IngestTraceWriter;
  repairKind: FinalGateRepairKind;
  maxAttempts?: number;
  stepBudget?: number;
}

const readRepairFileSchema = z.object({
  path: z.string().min(1),
});

const writeRepairFileSchema = z.object({
  path: z.string().min(1),
  content: z.string(),
});

function normalizeRepoPath(path: string): string {
  const normalized = path.replace(/\\/g, '/').replace(/^\/+/, '');
  const parts = normalized.split('/').filter((part) => part.length > 0);
  if (parts.length === 0 || parts.some((part) => part === '.' || part === '..')) {
    throw new Error(`gate repair path must be a repository-relative path: ${path}`);
  }
  return parts.join('/');
}

function assertAllowedPath(path: string, allowedPaths: ReadonlySet<string>): string {
  const normalized = normalizeRepoPath(path);
  if (!allowedPaths.has(normalized)) {
    throw new Error(`gate repair path not allowed: ${normalized}`);
  }
  return normalized;
}

async function readOptionalFile(path: string): Promise<{ exists: boolean; content: string }> {
  try {
    return { exists: true, content: await readFile(path, 'utf-8') };
  } catch (error) {
    if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
      return { exists: false, content: '' };
    }
    throw error;
  }
}

function buildGateRepairSystemPrompt(): string {
  return `<role>
You repair one KTX isolated-diff artifact gate failure inside the integration worktree.
</role>

<rules>
- Use read_gate_error first.
- Read only files exposed by read_repair_file.
- Edit only paths exposed by write_repair_file.
- Prefer the smallest text edit that makes the gate pass.
- Preserve accepted work-unit, reconciliation, and deterministic projection content.
- Do not invent warehouse facts, business definitions, or semantic-layer entities.
- If the gate error requires choosing between conflicting facts without evidence, stop without editing.
</rules>`;
}

function buildGateRepairUserPrompt(input: {
  gateError: string;
  allowedPaths: string[];
  repairKind: FinalGateRepairKind;
  attempt: number;
  maxAttempts: number;
}): string {
  return `Repair isolated-diff artifact gates.

Repair kind: ${input.repairKind}
Attempt: ${input.attempt} of ${input.maxAttempts}

Allowed files:
${input.allowedPaths.map((path) => `- ${path}`).join('\n')}

Gate error:
${input.gateError}

Use read_gate_error first. Then inspect only the allowed files, write the
minimal repaired content, and stop.`;
}

function buildToolSet(input: {
  workdir: string;
  gateError: string;
  allowedPaths: ReadonlySet<string>;
  editedPaths: Set<string>;
}): KtxRuntimeToolSet {
  return {
    read_gate_error: {
      name: 'read_gate_error',
      description: 'Read the artifact gate failure that must be repaired.',
      inputSchema: z.object({}),
      execute: async () => ({
        markdown: input.gateError,
        structured: { gateError: input.gateError },
      }),
    },
    read_repair_file: {
      name: 'read_repair_file',
      description: 'Read one allowed file from the integration worktree.',
      inputSchema: readRepairFileSchema,
      execute: async ({ path }: z.infer<typeof readRepairFileSchema>) => {
        const normalized = assertAllowedPath(path, input.allowedPaths);
        const file = await readOptionalFile(join(input.workdir, normalized));
        return {
          markdown: file.exists ? file.content : `(missing file: ${normalized})`,
          structured: { path: normalized, exists: file.exists },
        };
      },
    },
    write_repair_file: {
      name: 'write_repair_file',
      description: 'Replace one allowed integration worktree file with repaired text content.',
      inputSchema: writeRepairFileSchema,
      execute: async ({ path, content }: z.infer<typeof writeRepairFileSchema>) => {
        const normalized = assertAllowedPath(path, input.allowedPaths);
        const fullPath = join(input.workdir, normalized);
        await mkdir(dirname(fullPath), { recursive: true });
        await writeFile(fullPath, content, 'utf-8');
        input.editedPaths.add(normalized);
        return {
          markdown: `Wrote ${normalized}`,
          structured: { path: normalized, bytes: Buffer.byteLength(content) },
        };
      },
    },
  };
}

export function finalGateRepairPaths(input: {
  changedWikiPageKeys: string[];
  touchedSlSources: TouchedSlSource[];
}): string[] {
  return [
    ...new Set([
      ...input.touchedSlSources.map((source) => `semantic-layer/${source.connectionId}/${source.sourceName}.yaml`),
      ...input.changedWikiPageKeys.map((pageKey) => `wiki/global/${pageKey}.md`),
    ]),
  ].sort();
}

export async function repairFinalGateFailure(
  input: RepairFinalGateFailureInput,
): Promise<FinalGateRepairResult> {
  const allowedPaths = new Set(input.allowedPaths.map(normalizeRepoPath));
  const maxAttempts = input.maxAttempts ?? 1;
  const stepBudget = input.stepBudget ?? 16;
  let lastFailure = 'gate repair did not run';

  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const editedPaths = new Set<string>();
    const sortedAllowedPaths = [...allowedPaths].sort();
    const traceData = {
      repairKind: input.repairKind,
      attempt,
      maxAttempts,
      allowedPaths: sortedAllowedPaths,
      gateError: input.gateError,
    };
    const result = await traceTimed(input.trace, 'gate_repair', 'gate_repair', traceData, async () =>
      input.agentRunner.runLoop({
        modelRole: 'repair',
        systemPrompt: buildGateRepairSystemPrompt(),
        userPrompt: buildGateRepairUserPrompt({
          gateError: input.gateError,
          allowedPaths: sortedAllowedPaths,
          repairKind: input.repairKind,
          attempt,
          maxAttempts,
        }),
        toolSet: buildToolSet({
          workdir: input.workdir,
          gateError: input.gateError,
          allowedPaths,
          editedPaths,
        }),
        stepBudget,
        telemetryTags: {
          operationName: 'ingest-isolated-diff-gate-repair',
          source: input.trace.context.sourceKey,
          jobId: input.trace.context.jobId,
          repairKind: input.repairKind,
        },
      }),
    );

    if (result.stopReason === 'error') {
      lastFailure = result.error?.message ?? 'gate repair agent loop errored';
      await input.trace.event('error', 'gate_repair', 'gate_repair_failed', traceData, result.error);
      continue;
    }

    const changedPaths = [...editedPaths].sort();
    if (changedPaths.length === 0) {
      lastFailure = 'gate repair completed without editing an allowed path';
      await input.trace.event('error', 'gate_repair', 'gate_repair_failed', {
        ...traceData,
        reason: lastFailure,
      });
      continue;
    }

    await input.trace.event('debug', 'gate_repair', 'gate_repair_repaired', {
      ...traceData,
      changedPaths,
    });
    return { status: 'repaired', attempts: attempt, changedPaths };
  }

  return { status: 'failed', attempts: maxAttempts, reason: lastFailure };
}
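The repair tools in this file only touch a path after it has been normalized and found in an allowlist. The sketch below isolates that guard so its edge cases are easy to try. `normalizeRepoPathSketch` and `isRepairPathAllowed` are hypothetical names, the sample paths are invented, and the boolean wrapper is an assumption made for the example: the diffed code throws instead of returning `false`.

```typescript
// Normalize a user-supplied path to a repository-relative form:
// backslashes become slashes, empty segments are dropped, and any
// "." or ".." segment is rejected outright rather than resolved.
function normalizeRepoPathSketch(path: string): string {
  const parts = path
    .replace(/\\/g, '/')
    .split('/')
    .filter((part) => part.length > 0);
  if (parts.length === 0 || parts.some((part) => part === '.' || part === '..')) {
    throw new Error(`not a repository-relative path: ${path}`);
  }
  return parts.join('/');
}

// Hypothetical boolean wrapper: true only when the normalized path is listed.
function isRepairPathAllowed(path: string, allowedPaths: string[]): boolean {
  try {
    return allowedPaths.indexOf(normalizeRepoPathSketch(path)) >= 0;
  } catch {
    return false;
  }
}

const allowed = ['wiki/global/account-segments.md'];
console.log(normalizeRepoPathSketch('/wiki\\global//account-segments.md'));
console.log(isRepairPathAllowed('wiki/global/account-segments.md', allowed));
console.log(isRepairPathAllowed('wiki/global/other.md', allowed));
console.log(isRepairPathAllowed('wiki/global/../../etc/passwd', allowed));
```

Rejecting `..` segments instead of resolving them is the simpler choice for a guard like this: the allowlist comparison never has to reason about where a traversal would end up.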
@@ -17,6 +17,11 @@ export {
  buildLiveDatabaseTableNaturalKey,
  ktxSchemaSnapshotToExtractedSchema,
} from './adapters/live-database/extracted-schema.js';
export {
  assertSemanticLayerTargetPathsAllowed,
  findDisallowedSemanticLayerTargetPaths,
  semanticLayerConnectionIdFromPath,
} from './semantic-layer-target-policy.js';
export { LiveDatabaseSourceAdapter } from './adapters/live-database/live-database.adapter.js';
export type {
  BuildLiveDatabaseManifestShardsInput,
@@ -609,6 +614,11 @@ export {
} from './raw-sources-paths.js';
export { ingestReportSnapshotSchema, parseIngestReportSnapshot } from './report-snapshot.js';
export type { IngestReportBody, IngestReportSnapshot } from './reports.js';
export * from './artifact-gates.js';
export * from './ingest-trace.js';
export * from './isolated-diff/git-patch.js';
export * from './isolated-diff/patch-integrator.js';
export * from './isolated-diff/work-unit-executor.js';
export * from './reports.js';
export { SourceAdapterRegistry } from './source-adapter-registry.js';
export type { SqliteBundleIngestStoreOptions } from './sqlite-bundle-ingest-store.js';
@@ -652,4 +662,7 @@ export type {
  TriageSignals,
  UnresolvedCardInfo,
  WorkUnit,
  DeterministicProjectionContext,
  ProjectionResult,
} from './types.js';
export * from './wiki-body-refs.js';
File diff suppressed because it is too large
@@ -1,8 +1,7 @@
import { mkdir, mkdtemp, readFile, rm, stat, writeFile } from 'node:fs/promises';
import { mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { beforeEach, describe, expect, it, vi } from 'vitest';
import { GitService } from '../core/index.js';
import { addTouchedSlSource } from '../tools/index.js';
import { IngestBundleRunner } from './ingest-bundle.runner.js';
import { createMemoryFlowLiveBuffer } from './memory-flow/live-buffer.js';
@@ -123,9 +122,15 @@ const makeDeps = () => {
  };
  const scopedGit = {
    revParseHead: vi.fn().mockResolvedValue('h'),
    commitFiles: vi.fn(),
    commitFiles: vi.fn().mockResolvedValue({ created: true, commitHash: 'h' }),
    commitStaged: vi.fn().mockResolvedValue({ created: false, commitHash: 'h' }),
    resetHardTo: vi.fn(),
    assertWorktreeClean: vi.fn().mockResolvedValue(undefined),
    writeBinaryNoRenamePatch: vi.fn(async (_base: string, _head: string, patchPath: string) => {
      await writeFile(patchPath, '', 'utf-8');
    }),
    applyPatchFile3WayIndex: vi.fn(),
    diffNameStatus: vi.fn().mockResolvedValue([]),
  };
  const sessionWorktreeService = {
    create: vi.fn().mockResolvedValue({
@@ -167,10 +172,12 @@ const makeDeps = () => {
    loadPrompt: vi.fn().mockResolvedValue('base-framing'),
  };
  const wikiService = {
    forWorktree: vi.fn().mockReturnValue({}),
    forWorktree: vi.fn(),
    listPageKeys: vi.fn().mockResolvedValue([]),
    readPage: vi.fn().mockResolvedValue(null),
    syncFromCommit: vi.fn().mockResolvedValue(undefined),
  };
  wikiService.forWorktree.mockReturnValue(wikiService);
  const knowledgeSlRefs = {
    syncFromWiki: vi.fn().mockResolvedValue({ inserted: 1, deleted: 0 }),
  };
@@ -178,7 +185,7 @@ const makeDeps = () => {
    listPagesForUser: vi.fn().mockResolvedValue([]),
  };
  const semanticLayerService = {
    forWorktree: vi.fn().mockReturnValue({}),
    forWorktree: vi.fn(),
    listFilesForConnection: vi
      .fn()
      .mockImplementation((connectionId: string) =>
@@ -193,6 +200,7 @@ const makeDeps = () => {
        }),
      ),
  };
  semanticLayerService.forWorktree.mockReturnValue(semanticLayerService);
  const slSearchService = {
    indexSources: vi.fn().mockResolvedValue(undefined),
  };
@@ -255,8 +263,12 @@ const buildRunner = (deps: ReturnType<typeof makeDeps> = makeDeps(), overrides:
      resolveUploadDir: (uploadId) => `/tmp/ktx-test/ingest-uploads/${uploadId}`,
      resolvePullDir: (jobId) => `/tmp/ktx-test/ingest-pulls/${jobId}`,
      resolveTranscriptDir: (jobId) => `/tmp/ktx-test/run/wu-transcripts/${jobId}`,
      resolveTracePath: (jobId) => `/tmp/ktx-test/ingest-traces/${jobId}/trace.jsonl`,
    },
    settings: {
      probeRowCount: 1,
      memoryIngestionModel: 'test-model',
    },
    settings: { probeRowCount: 1, memoryIngestionModel: 'test-model' },
    skillsRegistry: deps.skillsRegistry as any,
    promptService: deps.promptService as any,
    wikiService: deps.wikiService as any,
@@ -1505,7 +1517,7 @@ describe('IngestBundleRunner — Stages 1 → 7', () => {

    const runner = buildRunner(deps);
    (runner as any).stageRawFilesStage1 = vi.fn().mockResolvedValue({
      currentHashes: new Map([['explores/b2b/sales_pipeline.json', 'h1']]),
      currentHashes: new Map([['a.yml', 'h1']]),
      rawDirInWorktree: 'raw-sources/looker-run/fake/s',
    });
    (runner as any).resolveStagedDir = vi.fn().mockResolvedValue('/tmp/stage/upload-x');
@@ -1570,6 +1582,7 @@ describe('IngestBundleRunner — Stages 1 → 7', () => {
      workUnits: [{ unitKey: 'u1', rawFiles: ['semantic_models.yml'], peerFileIndex: [], dependencyPaths: [] }],
      parseArtifacts: { semanticModels: [{ name: 'orders' }] },
    });
    deps.adapter.listTargetConnectionIds = vi.fn().mockResolvedValue(['warehouse-2']);
    deps.semanticLayerService.loadAllSources.mockImplementation((connectionId: string) =>
      Promise.resolve({ sources: [{ name: `${connectionId}_source` }], loadErrors: [] }),
    );
@@ -1972,9 +1985,15 @@ describe('IngestBundleRunner — Stages 1 → 7', () => {
    const assertError = new Error('Worktree has in-progress git operation (sequencer ...); refusing to proceed');
    const sessionGit = {
      revParseHead: vi.fn().mockResolvedValue('h'),
      commitFiles: vi.fn(),
      commitFiles: vi.fn().mockResolvedValue({ created: true, commitHash: 'h' }),
      commitStaged: vi.fn().mockResolvedValue({ created: false, commitHash: 'h' }),
      resetHardTo: vi.fn(),
      assertWorktreeClean: vi.fn().mockRejectedValue(assertError),
      writeBinaryNoRenamePatch: vi.fn(async (_base: string, _head: string, patchPath: string) => {
        await writeFile(patchPath, '', 'utf-8');
      }),
      applyPatchFile3WayIndex: vi.fn(),
      diffNameStatus: vi.fn().mockResolvedValue([]),
    };
    deps.sessionWorktreeService.create.mockResolvedValue({
      chatId: 'j1',
@ -2005,135 +2024,6 @@ describe('IngestBundleRunner — Stages 1 → 7', () => {
|
|||
expect(deps.gitService.squashMergeIntoMain).not.toHaveBeenCalled();
|
||||
});
|
||||
|
||||
it('squash-merges only successful WUs into main when one WU fails sl_validate', async () => {
|
||||
const homeDir = await mkdtemp(join(tmpdir(), 'ingest-rollback-'));
|
||||
try {
|
||||
const configDir = join(homeDir, 'config');
|
||||
const mainGit = new GitService({
|
||||
storage: { configDir, homeDir },
|
||||
git: {
|
||||
userName: 'System User',
|
||||
userEmail: 'system@example.com',
|
||||
bootstrapMessage: 'Initialize test config repo',
|
||||
bootstrapAuthor: 'test-system',
|
||||
bootstrapAuthorEmail: 'system@example.com',
|
||||
},
|
||||
});
|
||||
await mainGit.onModuleInit();
|
||||
const baseSha = await mainGit.revParseHead();
|
||||
if (!baseSha) {
|
||||
throw new Error('no base sha');
|
||||
}
|
||||
|
||||
const deps = makeDeps();
|
||||
const sessionDir = join(homeDir, '.worktrees', 'session-j1');
|
||||
const sessionBranch = 'session/j1';
|
||||
let currentToolSession: any = null;
|
||||
|
||||
deps.gitService = mainGit as any;
|
||||
deps.sessionWorktreeService.create.mockImplementation(async (_jobId: string, startSha: string) => {
|
||||
await mkdir(join(homeDir, '.worktrees'), { recursive: true });
|
||||
await mainGit.addWorktree(sessionDir, sessionBranch, startSha);
|
||||
return {
|
||||
chatId: 'j1',
|
||||
workdir: sessionDir,
|
||||
branch: sessionBranch,
|
||||
baseSha: startSha,
|
||||
createdAt: new Date(),
|
||||
git: mainGit.forWorktree(sessionDir),
|
||||
config: {},
|
||||
};
|
||||
});
|
||||
deps.sessionWorktreeService.cleanup.mockResolvedValue(undefined);
|
||||
deps.adapter.chunk.mockResolvedValue({
|
||||
workUnits: [
|
||||
{ unitKey: 'wu-good', rawFiles: ['good.raw'], peerFileIndex: [], dependencyPaths: [] },
|
||||
{ unitKey: 'wu-bad', rawFiles: ['bad.raw'], peerFileIndex: [], dependencyPaths: [] },
|
||||
],
|
||||
});
|
||||
deps.toolsetFactory.createIngestWuToolset.mockImplementation((toolSession: any) => {
|
||||
currentToolSession = toolSession;
|
||||
return {
|
||||
toRuntimeTools: vi.fn().mockReturnValue({}),
|
||||
getAllTools: vi.fn().mockReturnValue([]),
|
||||
getToolNames: vi.fn().mockReturnValue([]),
|
||||
};
|
||||
});
|
||||
deps.slValidator.validateSingleSource.mockImplementation(
|
||||
(_validationDeps: unknown, _connectionId: string, sourceName: string) => ({
|
||||
errors: sourceName === 'bad' ? [{ message: 'bad source rejected' }] : [],
|
||||
warnings: [],
|
||||
}),
|
||||
);
|
||||
deps.agentRunner.runLoop.mockImplementation(async (params: any) => {
|
||||
const unitKey = params.telemetryTags?.unitKey;
|
||||
if (unitKey === 'wu-good') {
|
||||
await mkdir(join(sessionDir, 'semantic-layer', 'c1'), { recursive: true });
|
||||
await writeFile(join(sessionDir, 'semantic-layer', 'c1', 'good.yaml'), 'name: good\n');
|
||||
addTouchedSlSource(currentToolSession.touchedSlSources, 'c1', 'good');
|
||||
currentToolSession.actions.push({ target: 'sl', type: 'created', key: 'good', detail: '' });
|
||||
await currentToolSession.gitService.commitFiles(
|
||||
['semantic-layer/c1/good.yaml'],
|
||||
'test: add good source',
|
||||
'KTX Test',
|
||||
'system@ktx.local',
|
||||
);
|
||||
}
|
||||
if (unitKey === 'wu-bad') {
|
||||
await mkdir(join(sessionDir, 'semantic-layer', 'c1'), { recursive: true });
|
||||
await writeFile(join(sessionDir, 'semantic-layer', 'c1', 'bad.yaml'), 'name: bad\n');
|
||||
addTouchedSlSource(currentToolSession.touchedSlSources, 'c1', 'bad');
|
||||
currentToolSession.actions.push({ target: 'sl', type: 'created', key: 'bad', detail: '' });
|
||||
await currentToolSession.gitService.commitFiles(
|
||||
['semantic-layer/c1/bad.yaml'],
|
||||
'test: add bad source',
|
||||
'KTX Test',
|
||||
'system@ktx.local',
|
||||
);
|
||||
}
|
||||
return { stopReason: 'natural' };
|
||||
});
|
||||
|
||||
const runner = buildRunner(deps);
|
||||
(runner as any).stageRawFilesStage1 = vi.fn().mockImplementation(async ({ worktreeRoot }: any) => {
|
||||
const rawDir = join(worktreeRoot, 'raw-sources', 'c1', 'fake', 's');
|
||||
await mkdir(rawDir, { recursive: true });
|
||||
await writeFile(join(rawDir, 'good.raw'), 'good raw');
|
||||
await writeFile(join(rawDir, 'bad.raw'), 'bad raw');
|
||||
return {
|
||||
currentHashes: new Map([
|
||||
['good.raw', 'good-hash'],
|
||||
['bad.raw', 'bad-hash'],
|
||||
]),
|
||||
rawDirInWorktree: 'raw-sources/c1/fake/s',
|
||||
};
|
||||
});
|
||||
(runner as any).resolveStagedDir = vi.fn().mockResolvedValue('/tmp/stage/upload-x');
|
||||
|
||||
const result = await runner.run({
|
||||
jobId: 'j1',
|
||||
connectionId: 'c1',
|
||||
sourceKey: 'fake',
|
||||
trigger: 'upload',
|
||||
bundleRef: { kind: 'upload', uploadId: 'upload-x' },
|
||||
});
|
||||
|
||||
expect(result.failedWorkUnits).toEqual(['wu-bad']);
|
||||
expect(await readFile(join(configDir, 'semantic-layer', 'c1', 'good.yaml'), 'utf-8')).toContain('good');
|
||||
expect(await readFile(join(configDir, 'semantic-layer', 'c1', 'bad.yaml'), 'utf-8').catch(() => null)).toBeNull();
|
||||
expect(deps.reportsRepo.create).toHaveBeenCalledWith(
|
||||
expect.objectContaining({
|
||||
body: expect.objectContaining({
|
||||
failedWorkUnits: ['wu-bad'],
|
||||
}),
|
||||
}),
|
||||
);
|
||||
await expect(stat(join(configDir, '.git', 'sequencer'))).rejects.toThrow();
|
||||
} finally {
|
||||
await rm(homeDir, { recursive: true, force: true });
|
||||
}
|
||||
});
|
||||
|
||||
it('fails the run and rethrows when the adapter cannot detect the bundle', async () => {
|
||||
const deps = makeDeps();
|
||||
deps.adapter.detect.mockResolvedValue(false);
|
||||
|
|
|
|||
File diff suppressed because it is too large
85
packages/context/src/ingest/ingest-trace.test.ts
Normal file
|
|
@ -0,0 +1,85 @@
|
|||
import { mkdtemp, readFile } from 'node:fs/promises';
|
||||
import { tmpdir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
import { describe, expect, it, vi } from 'vitest';
|
||||
import { FileIngestTraceWriter, ingestTracePathForJob, traceTimed } from './ingest-trace.js';
|
||||
|
||||
describe('FileIngestTraceWriter', () => {
|
||||
it('persists structured trace events as JSONL', async () => {
|
||||
const root = await mkdtemp(join(tmpdir(), 'ktx-trace-'));
|
||||
const tracePath = ingestTracePathForJob(root, 'job-1');
|
||||
const trace = new FileIngestTraceWriter({
|
||||
tracePath,
|
||||
jobId: 'job-1',
|
||||
connectionId: 'metabase-main',
|
||||
sourceKey: 'metabase',
|
||||
level: 'debug',
|
||||
});
|
||||
|
||||
await trace.event('debug', 'snapshot', 'input_snapshot', {
|
||||
baseSha: 'abc123',
|
||||
rawFileCount: 2,
|
||||
diffSummary: { added: 1, modified: 1, deleted: 0, unchanged: 3 },
|
||||
});
|
||||
|
||||
const lines = (await readFile(tracePath, 'utf-8'))
|
||||
.trim()
|
||||
.split('\n')
|
||||
.map((line) => JSON.parse(line));
|
||||
expect(lines).toHaveLength(1);
|
||||
expect(lines[0]).toMatchObject({
|
||||
schemaVersion: 1,
|
||||
jobId: 'job-1',
|
||||
connectionId: 'metabase-main',
|
||||
sourceKey: 'metabase',
|
||||
level: 'debug',
|
||||
phase: 'snapshot',
|
||||
event: 'input_snapshot',
|
||||
data: {
|
||||
baseSha: 'abc123',
|
||||
rawFileCount: 2,
|
||||
diffSummary: { added: 1, modified: 1, deleted: 0, unchanged: 3 },
|
||||
},
|
||||
});
|
||||
expect(typeof lines[0].at).toBe('string');
|
||||
});
|
||||
|
||||
it('records timing and error context for postmortem inspection', async () => {
|
||||
vi.useFakeTimers();
|
||||
vi.setSystemTime(new Date('2026-05-17T12:00:00.000Z'));
|
||||
const root = await mkdtemp(join(tmpdir(), 'ktx-trace-'));
|
||||
const tracePath = ingestTracePathForJob(root, 'job-2');
|
||||
const trace = new FileIngestTraceWriter({
|
||||
tracePath,
|
||||
jobId: 'job-2',
|
||||
connectionId: 'c1',
|
||||
sourceKey: 'fake',
|
||||
level: 'trace',
|
||||
});
|
||||
|
||||
await expect(
|
||||
traceTimed(trace, 'integration', 'apply_patch', { unitKey: 'wu-1' }, async () => {
|
||||
vi.advanceTimersByTime(17);
|
||||
throw new Error('patch conflict');
|
||||
}),
|
||||
).rejects.toThrow('patch conflict');
|
||||
|
||||
const lines = (await readFile(tracePath, 'utf-8'))
|
||||
.trim()
|
||||
.split('\n')
|
||||
.map((line) => JSON.parse(line));
|
||||
expect(lines.map((line) => line.event)).toEqual(['apply_patch_started', 'apply_patch_failed']);
|
||||
expect(lines[1]).toMatchObject({
|
||||
level: 'error',
|
||||
phase: 'integration',
|
||||
data: { unitKey: 'wu-1' },
|
||||
error: { name: 'Error', message: 'patch conflict' },
|
||||
});
|
||||
expect(lines[1].durationMs).toBe(17);
|
||||
vi.useRealTimers();
|
||||
});
|
||||
|
||||
it('uses the documented trace path layout', () => {
|
||||
expect(ingestTracePathForJob('/project/.ktx', 'job-3')).toBe('/project/.ktx/ingest-traces/job-3/trace.jsonl');
|
||||
});
|
||||
});
|
||||
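Reviewer note: the tests above read the trace file back as JSONL, one JSON object per line. Below is a minimal sketch of the same postmortem inspection, assuming the event shape seen in the assertions above. `parseTrace` and `failedEvents` are hypothetical helper names that are not part of this change, and the sample lines are made up for illustration.

```typescript
// Sketch only: field names follow the assertions in ingest-trace.test.ts;
// parseTrace and failedEvents are hypothetical helpers, not part of the PR.
interface TraceLine {
  level: string;
  phase: string;
  event: string;
  durationMs?: number;
  error?: { name: string; message: string };
}

// Parse JSONL: one JSON document per non-empty line.
function parseTrace(raw: string): TraceLine[] {
  return raw
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as TraceLine);
}

// Keep only error-level events whose name ends in `_failed`.
function failedEvents(lines: TraceLine[]): TraceLine[] {
  return lines.filter((line) => line.level === 'error' && line.event.endsWith('_failed'));
}

// Illustrative sample shaped like the output of the timing test above.
const sample =
  [
    JSON.stringify({ level: 'debug', phase: 'integration', event: 'apply_patch_started' }),
    JSON.stringify({
      level: 'error',
      phase: 'integration',
      event: 'apply_patch_failed',
      durationMs: 17,
      error: { name: 'Error', message: 'patch conflict' },
    }),
  ].join('\n') + '\n';

const failures = failedEvents(parseTrace(sample));
console.log(failures.map((f) => `${f.phase}/${f.event}: ${f.error?.message}`));
```

Filtering on both the level and the `_failed` suffix is a choice made for this sketch; either condition alone may be enough, depending on which events a run emits.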
158
packages/context/src/ingest/ingest-trace.ts
Normal file
|
|
@ -0,0 +1,158 @@
|
|||
import { appendFile, mkdir } from 'node:fs/promises';
|
||||
import { dirname, join } from 'node:path';
|
||||
|
||||
export type IngestTraceLevel = 'info' | 'debug' | 'trace' | 'error';
|
||||
|
||||
const TRACE_LEVEL_RANK: Record<IngestTraceLevel, number> = {
|
||||
error: 0,
|
||||
info: 1,
|
||||
debug: 2,
|
||||
trace: 3,
|
||||
};
|
||||
|
||||
export interface IngestTraceContext {
|
||||
tracePath: string;
|
||||
jobId: string;
|
||||
connectionId: string;
|
||||
sourceKey: string;
|
||||
runId?: string;
|
||||
syncId?: string;
|
||||
level?: IngestTraceLevel;
|
||||
}
|
||||
|
||||
export interface IngestTraceEvent {
|
||||
schemaVersion: 1;
|
||||
at: string;
|
||||
level: IngestTraceLevel;
|
||||
jobId: string;
|
||||
connectionId: string;
|
||||
sourceKey: string;
|
||||
runId?: string;
|
||||
syncId?: string;
|
||||
phase: string;
|
||||
event: string;
|
||||
durationMs?: number;
|
||||
data?: Record<string, unknown>;
|
||||
error?: {
|
||||
name: string;
|
||||
message: string;
|
||||
stack?: string;
|
||||
};
|
||||
}
|
||||
|
||||
export interface IngestTraceWriter {
|
||||
readonly tracePath: string;
|
||||
readonly context: IngestTraceContext;
|
||||
withContext(context: Partial<Pick<IngestTraceContext, 'runId' | 'syncId'>>): IngestTraceWriter;
|
||||
event(
|
||||
level: IngestTraceLevel,
|
||||
phase: string,
|
||||
event: string,
|
||||
data?: Record<string, unknown>,
|
||||
error?: unknown,
|
||||
durationMs?: number,
|
||||
): Promise<void>;
|
||||
}
|
||||
|
||||
export function ingestTracePathForJob(homeDir: string, jobId: string): string {
|
||||
return join(homeDir, 'ingest-traces', jobId, 'trace.jsonl');
|
||||
}
|
||||
|
||||
function serializeError(error: unknown): IngestTraceEvent['error'] | undefined {
|
||||
if (error === undefined || error === null) {
|
||||
return undefined;
|
||||
}
|
||||
if (error instanceof Error) {
|
||||
return {
|
||||
name: error.name,
|
||||
message: error.message,
|
||||
...(error.stack ? { stack: error.stack } : {}),
|
||||
};
|
||||
}
|
||||
return { name: 'Error', message: String(error) };
|
||||
}
|
||||
|
||||
function shouldWrite(configured: IngestTraceLevel, incoming: IngestTraceLevel): boolean {
|
||||
return TRACE_LEVEL_RANK[incoming] <= TRACE_LEVEL_RANK[configured];
|
||||
}
|
||||
|
||||
export class FileIngestTraceWriter implements IngestTraceWriter {
|
||||
readonly tracePath: string;
|
||||
readonly context: IngestTraceContext;
|
||||
|
||||
constructor(context: IngestTraceContext) {
|
||||
this.context = { ...context, level: context.level ?? 'debug' };
|
||||
this.tracePath = context.tracePath;
|
||||
}
|
||||
|
||||
withContext(context: Partial<Pick<IngestTraceContext, 'runId' | 'syncId'>>): IngestTraceWriter {
|
||||
return new FileIngestTraceWriter({ ...this.context, ...context, tracePath: this.tracePath });
|
||||
}
|
||||
|
||||
async event(
|
||||
level: IngestTraceLevel,
|
||||
phase: string,
|
||||
event: string,
|
||||
data?: Record<string, unknown>,
|
||||
error?: unknown,
|
||||
durationMs?: number,
|
||||
): Promise<void> {
|
||||
if (!shouldWrite(this.context.level ?? 'debug', level)) {
|
||||
return;
|
||||
}
|
||||
const serializedError = serializeError(error);
|
||||
const payload: IngestTraceEvent = {
|
||||
schemaVersion: 1,
|
||||
at: new Date().toISOString(),
|
||||
level,
|
||||
jobId: this.context.jobId,
|
||||
connectionId: this.context.connectionId,
|
||||
sourceKey: this.context.sourceKey,
|
||||
...(this.context.runId ? { runId: this.context.runId } : {}),
|
||||
...(this.context.syncId ? { syncId: this.context.syncId } : {}),
|
||||
phase,
|
||||
event,
|
||||
...(durationMs !== undefined ? { durationMs } : {}),
|
||||
...(data ? { data } : {}),
|
||||
...(serializedError ? { error: serializedError } : {}),
|
||||
};
|
||||
await mkdir(dirname(this.tracePath), { recursive: true });
|
||||
await appendFile(this.tracePath, `${JSON.stringify(payload)}\n`, 'utf-8');
|
||||
}
|
||||
}
|
||||
|
||||
export class NoopIngestTraceWriter implements IngestTraceWriter {
|
||||
readonly tracePath = '';
|
||||
readonly context: IngestTraceContext = {
|
||||
tracePath: '',
|
||||
jobId: '',
|
||||
connectionId: '',
|
||||
sourceKey: '',
|
||||
level: 'error',
|
||||
};
|
||||
|
||||
withContext(): IngestTraceWriter {
|
||||
return this;
|
||||
}
|
||||
|
||||
async event(): Promise<void> {}
|
||||
}
|
||||
|
||||
export async function traceTimed<T>(
|
||||
trace: IngestTraceWriter,
|
||||
phase: string,
|
||||
event: string,
|
||||
data: Record<string, unknown>,
|
||||
fn: () => Promise<T>,
|
||||
): Promise<T> {
|
||||
await trace.event('debug', phase, `${event}_started`, data);
|
||||
const started = Date.now();
|
||||
try {
|
||||
const result = await fn();
|
||||
await trace.event('debug', phase, `${event}_finished`, data, undefined, Date.now() - started);
|
||||
return result;
|
||||
} catch (error) {
|
||||
await trace.event('error', phase, `${event}_failed`, data, error, Date.now() - started);
|
||||
throw error;
|
||||
}
|
||||
}
|
||||
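Reviewer note: the level filter in `ingest-trace.ts` writes an event only when its rank is at or below the configured rank. The sketch below copies the rank table and `shouldWrite` from the file above and adds `writtenLevels`, a hypothetical helper that is not part of this change, to list what each configured level would persist.

```typescript
// Sketch only: the rank table and shouldWrite mirror ingest-trace.ts above;
// writtenLevels is a hypothetical helper added for illustration.
type IngestTraceLevel = 'info' | 'debug' | 'trace' | 'error';

const TRACE_LEVEL_RANK: Record<IngestTraceLevel, number> = {
  error: 0,
  info: 1,
  debug: 2,
  trace: 3,
};

function shouldWrite(configured: IngestTraceLevel, incoming: IngestTraceLevel): boolean {
  return TRACE_LEVEL_RANK[incoming] <= TRACE_LEVEL_RANK[configured];
}

// List the levels a writer configured at `configured` would persist.
function writtenLevels(configured: IngestTraceLevel): IngestTraceLevel[] {
  const all: IngestTraceLevel[] = ['error', 'info', 'debug', 'trace'];
  return all.filter((level) => shouldWrite(configured, level));
}

console.log(writtenLevels('error')); // [ 'error' ]
console.log(writtenLevels('debug')); // [ 'error', 'info', 'debug' ]
console.log(writtenLevels('trace')); // [ 'error', 'info', 'debug', 'trace' ]
```

Two consequences follow from the table: `error` events are written at every configured level, and with the default of `debug` the `*_started` and `*_finished` events emitted by `traceTimed` are persisted while `trace`-level events are dropped.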
97
packages/context/src/ingest/isolated-diff/git-patch.test.ts
Normal file
|
|
@ -0,0 +1,97 @@
|
|||
import { describe, expect, it } from 'vitest';
|
||||
import { assertPatchAllowedForWorkUnit, parsePatchTouchedPaths, textArtifactRoots } from './git-patch.js';
|
||||
|
||||
describe('isolated diff patch contract', () => {
|
||||
it('parses touched paths from no-rename git patches', () => {
|
||||
const patch = [
|
||||
'diff --git a/wiki/global/a.md b/wiki/global/a.md',
|
||||
'index 1111111..2222222 100644',
|
||||
'--- a/wiki/global/a.md',
|
||||
'+++ b/wiki/global/a.md',
|
||||
'@@ -1 +1 @@',
|
||||
'-old',
|
||||
'+new',
|
||||
'diff --git a/semantic-layer/c1/orders.yaml b/semantic-layer/c1/orders.yaml',
|
||||
'new file mode 100644',
|
||||
'--- /dev/null',
|
||||
'+++ b/semantic-layer/c1/orders.yaml',
|
||||
'@@ -0,0 +1 @@',
|
||||
'+name: orders',
|
||||
'',
|
||||
].join('\n');
|
||||
|
||||
expect(parsePatchTouchedPaths(patch)).toEqual([
|
||||
{
|
||||
path: 'wiki/global/a.md',
|
||||
oldPath: 'wiki/global/a.md',
|
||||
newPath: 'wiki/global/a.md',
|
||||
mode: '100644',
|
||||
binary: false,
|
||||
},
|
||||
{
|
||||
path: 'semantic-layer/c1/orders.yaml',
|
||||
oldPath: 'semantic-layer/c1/orders.yaml',
|
||||
newPath: 'semantic-layer/c1/orders.yaml',
|
||||
mode: '100644',
|
||||
binary: false,
|
||||
},
|
||||
]);
|
||||
});
|
||||
|
||||
it('rejects semantic-layer paths for slDisallowed work units', () => {
|
||||
const patch = 'diff --git a/semantic-layer/c1/orders.yaml b/semantic-layer/c1/orders.yaml\nindex 1..2 100644\n';
|
||||
|
||||
expect(() =>
|
||||
assertPatchAllowedForWorkUnit({
|
||||
unitKey: 'lookml-mismatch',
|
||||
patch,
|
||||
slDisallowed: true,
|
||||
}),
|
||||
).toThrow(/slDisallowed WorkUnit lookml-mismatch touched semantic-layer\/c1\/orders.yaml/);
|
||||
});
|
||||
|
||||
it('rejects semantic-layer paths outside allowed target connections', () => {
|
||||
const patch =
|
||||
'diff --git a/semantic-layer/finance/orders.yaml b/semantic-layer/finance/orders.yaml\nindex 1..2 100644\n';
|
||||
|
||||
expect(() =>
|
||||
assertPatchAllowedForWorkUnit({
|
||||
unitKey: 'wu-finance',
|
||||
patch,
|
||||
slDisallowed: false,
|
||||
allowedTargetConnectionIds: new Set(['warehouse']),
|
||||
}),
|
||||
).toThrow(
|
||||
/semantic-layer target connection not allowed: semantic-layer\/finance\/orders.yaml \(finance\); allowed: warehouse/,
|
||||
);
|
||||
});
|
||||
|
||||
it('rejects executable and binary changes under known text artifact roots', () => {
|
||||
expect(textArtifactRoots).toEqual(['wiki/', 'semantic-layer/']);
|
||||
|
||||
const executablePatch =
|
||||
'diff --git a/wiki/global/a.md b/wiki/global/a.md\nold mode 100644\nnew mode 100755\nindex 1..2\n';
|
||||
expect(() =>
|
||||
assertPatchAllowedForWorkUnit({
|
||||
unitKey: 'wu-1',
|
||||
patch: executablePatch,
|
||||
slDisallowed: false,
|
||||
}),
|
||||
).toThrow(/unexpected executable mode under wiki\/global\/a.md/);
|
||||
|
||||
const binaryPatch = [
|
||||
'diff --git a/semantic-layer/c1/orders.yaml b/semantic-layer/c1/orders.yaml',
|
||||
'index 1111111..2222222 100644',
|
||||
'GIT binary patch',
|
||||
'literal 0',
|
||||
'',
|
||||
].join('\n');
|
||||
expect(() =>
|
||||
assertPatchAllowedForWorkUnit({
|
||||
unitKey: 'wu-2',
|
||||
patch: binaryPatch,
|
||||
slDisallowed: false,
|
||||
}),
|
||||
).toThrow(/unexpected binary patch under semantic-layer\/c1\/orders.yaml/);
|
||||
});
|
||||
});
|
||||
101
packages/context/src/ingest/isolated-diff/git-patch.ts
Normal file
|
|
@ -0,0 +1,101 @@
|
|||
import { assertSemanticLayerTargetPathsAllowed } from '../semantic-layer-target-policy.js';
|
||||
|
||||
export const textArtifactRoots = ['wiki/', 'semantic-layer/'] as const;
|
||||
|
||||
export interface PatchTouchedPath {
|
||||
path: string;
|
||||
oldPath: string;
|
||||
newPath: string;
|
||||
mode: string | null;
|
||||
binary: boolean;
|
||||
}
|
||||
|
||||
export interface PatchPolicyInput {
|
||||
unitKey: string;
|
||||
patch: string;
|
||||
slDisallowed: boolean;
|
||||
allowedTargetConnectionIds?: ReadonlySet<string>;
|
||||
}
|
||||
|
||||
function stripPrefix(path: string): string {
|
||||
return path.replace(/^[ab]\//, '');
|
||||
}
|
||||
|
||||
function isTextArtifactPath(path: string): boolean {
|
||||
return textArtifactRoots.some((root) => path.startsWith(root));
|
||||
}
|
||||
|
||||
export function parsePatchTouchedPaths(patch: string): PatchTouchedPath[] {
|
||||
const lines = patch.split('\n');
|
||||
const entries: PatchTouchedPath[] = [];
|
||||
let current: PatchTouchedPath | null = null;
|
||||
|
||||
const pushCurrent = () => {
|
||||
if (current) {
|
||||
entries.push(current);
|
||||
}
|
||||
};
|
||||
|
||||
for (const line of lines) {
|
||||
const diffMatch = /^diff --git (.+) (.+)$/.exec(line);
|
||||
if (diffMatch) {
|
||||
pushCurrent();
|
||||
const oldPath = stripPrefix(diffMatch[1] ?? '');
|
||||
const newPath = stripPrefix(diffMatch[2] ?? '');
|
||||
current = {
|
||||
path: newPath === '/dev/null' ? oldPath : newPath,
|
||||
oldPath,
|
||||
newPath,
|
||||
mode: null,
|
||||
binary: false,
|
||||
};
|
||||
continue;
|
||||
}
|
||||
if (!current) {
|
||||
continue;
|
||||
}
|
||||
const indexMode = /^index [0-9a-f]+\.\.[0-9a-f]+(?: ([0-7]{6}))?$/.exec(line);
|
||||
if (indexMode?.[1]) {
|
||||
current.mode = indexMode[1];
|
||||
}
|
||||
const newMode = /^new mode ([0-7]{6})$/.exec(line);
|
||||
if (newMode) {
|
||||
current.mode = newMode[1] ?? current.mode;
|
||||
}
|
||||
const newFileMode = /^new file mode ([0-7]{6})$/.exec(line);
|
||||
if (newFileMode) {
|
||||
current.mode = newFileMode[1] ?? current.mode;
|
||||
}
|
||||
if (line === 'GIT binary patch' || line.startsWith('Binary files ')) {
|
||||
current.binary = true;
|
||||
}
|
||||
}
|
||||
|
||||
pushCurrent();
|
||||
return entries;
|
||||
}
|
||||
|
||||
export function assertPatchAllowedForWorkUnit(input: PatchPolicyInput): PatchTouchedPath[] {
|
||||
const touched = parsePatchTouchedPaths(input.patch);
|
||||
if (input.allowedTargetConnectionIds) {
|
||||
assertSemanticLayerTargetPathsAllowed({
|
||||
paths: touched.map((entry) => entry.path),
|
||||
allowedConnectionIds: input.allowedTargetConnectionIds,
|
||||
});
|
||||
}
|
||||
for (const entry of touched) {
|
||||
if (input.slDisallowed && entry.path.startsWith('semantic-layer/')) {
|
||||
throw new Error(`slDisallowed WorkUnit ${input.unitKey} touched ${entry.path}`);
|
||||
}
|
||||
if (!isTextArtifactPath(entry.path)) {
|
||||
continue;
|
||||
}
|
||||
if (entry.binary) {
|
||||
throw new Error(`unexpected binary patch under ${entry.path}`);
|
||||
}
|
||||
if (entry.mode && entry.mode !== '100644') {
|
||||
throw new Error(`unexpected executable mode under ${entry.path}: ${entry.mode}`);
|
||||
}
|
||||
}
|
||||
return touched;
|
||||
}
|
||||
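Reviewer note: `parsePatchTouchedPaths` takes both paths from the `diff --git` header line, which is why the new-file case in the test reports `oldPath` as the file path rather than `/dev/null`. The sketch below copies the header regex and `stripPrefix` from the file above to show how a header is split. `headerPath` is a hypothetical helper that is not part of this change. The second input assumes a path containing a space, which this diff does not show occurring in practice.

```typescript
// Sketch only: the regex and stripPrefix are copied from git-patch.ts above;
// headerPath is a hypothetical helper added for illustration.
const DIFF_HEADER = /^diff --git (.+) (.+)$/;

function stripPrefix(path: string): string {
  return path.replace(/^[ab]\//, '');
}

// Return the path the parser would attribute to a `diff --git` header line.
function headerPath(line: string): string | null {
  const match = DIFF_HEADER.exec(line);
  if (!match) {
    return null;
  }
  const oldPath = stripPrefix(match[1] ?? '');
  const newPath = stripPrefix(match[2] ?? '');
  return newPath === '/dev/null' ? oldPath : newPath;
}

const plain = headerPath('diff --git a/wiki/global/a.md b/wiki/global/a.md');
// Assumed input: the greedy first group swallows everything up to the last space.
const spaced = headerPath('diff --git a/wiki/my page.md b/wiki/my page.md');

console.log(plain); // wiki/global/a.md
console.log(spaced); // page.md
```

If paths with spaces can appear under `wiki/` or `semantic-layer/`, the policy checks would see `page.md` and skip the text-artifact rules for that entry. Whether this matters depends on how artifact paths are generated, which is outside this diff.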
|
|
@ -0,0 +1,404 @@
|
|||
import { mkdir, mkdtemp, readFile, writeFile } from 'node:fs/promises';
|
||||
import { tmpdir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
import { describe, expect, it, vi } from 'vitest';
|
||||
import { GitService } from '../../core/index.js';
|
||||
import { FileIngestTraceWriter } from '../ingest-trace.js';
|
||||
import { integrateWorkUnitPatch } from './patch-integrator.js';
|
||||
|
||||
async function makeRepo() {
|
||||
const homeDir = await mkdtemp(join(tmpdir(), 'ktx-integrate-'));
|
||||
const configDir = join(homeDir, 'config');
|
||||
const git = new GitService({
|
||||
storage: { configDir, homeDir },
|
||||
git: {
|
||||
userName: 'System User',
|
||||
userEmail: 'system@example.com',
|
||||
bootstrapMessage: 'init',
|
||||
bootstrapAuthor: 'system',
|
||||
bootstrapAuthorEmail: 'system@example.com',
|
||||
},
|
||||
});
|
||||
await git.onModuleInit();
|
||||
await mkdir(join(configDir, 'wiki/global'), { recursive: true });
|
||||
await writeFile(join(configDir, 'wiki/global/a.md'), 'old\n');
|
||||
await git.commitFiles(['wiki/global/a.md'], 'base', 'System User', 'system@example.com');
|
||||
return { homeDir, configDir, git, baseSha: await git.revParseHead() };
|
||||
}
|
||||
|
||||
describe('integrateWorkUnitPatch', () => {
|
||||
it('applies a clean patch, runs semantic gates, and commits accepted changes', async () => {
|
||||
const { homeDir, configDir, git, baseSha } = await makeRepo();
|
||||
const childDir = join(homeDir, 'child');
|
||||
await git.addWorktree(childDir, 'child', baseSha);
|
||||
const childGit = git.forWorktree(childDir);
|
||||
await writeFile(join(childDir, 'wiki/global/a.md'), 'new\n');
|
||||
await childGit.commitFiles(['wiki/global/a.md'], 'edit', 'System User', 'system@example.com');
|
||||
const patchPath = join(homeDir, 'patches/wu.patch');
|
||||
await childGit.writeBinaryNoRenamePatch(baseSha, 'HEAD', patchPath);
|
||||
const trace = new FileIngestTraceWriter({
|
||||
tracePath: join(homeDir, '.ktx/ingest-traces/job-1/trace.jsonl'),
|
||||
jobId: 'job-1',
|
||||
connectionId: 'c1',
|
||||
sourceKey: 'fake',
|
||||
level: 'trace',
|
||||
});
|
||||
|
||||
const result = await integrateWorkUnitPatch({
|
||||
unitKey: 'wu-1',
|
||||
patchPath,
|
||||
integrationGit: git,
|
||||
trace,
|
||||
author: { name: 'KTX Test', email: 'system@ktx.local' },
|
||||
validateAppliedTree: vi.fn().mockResolvedValue(undefined),
|
||||
slDisallowed: false,
|
||||
allowedTargetConnectionIds: new Set(['c1']),
|
||||
});
|
||||
|
||||
expect(result.status).toBe('accepted');
|
||||
await expect(readFile(join(configDir, 'wiki/global/a.md'), 'utf-8')).resolves.toBe('new\n');
|
||||
await expect(readFile(trace.tracePath, 'utf-8')).resolves.toContain('patch_apply_finished');
|
||||
});
|
||||
|
||||
it('rolls back and classifies semantic conflicts', async () => {
|
||||
const { homeDir, configDir, git, baseSha } = await makeRepo();
|
||||
const childDir = join(homeDir, 'child-semantic');
|
||||
await git.addWorktree(childDir, 'child-semantic', baseSha);
|
||||
const childGit = git.forWorktree(childDir);
|
||||
await writeFile(join(childDir, 'wiki/global/a.md'), 'bad\n');
|
||||
await childGit.commitFiles(['wiki/global/a.md'], 'bad edit', 'System User', 'system@example.com');
|
||||
const patchPath = join(homeDir, 'patches/bad.patch');
|
||||
await childGit.writeBinaryNoRenamePatch(baseSha, 'HEAD', patchPath);
|
||||
const trace = new FileIngestTraceWriter({
|
||||
tracePath: join(homeDir, '.ktx/ingest-traces/job-2/trace.jsonl'),
|
||||
jobId: 'job-2',
|
||||
connectionId: 'c1',
|
||||
sourceKey: 'fake',
|
||||
level: 'trace',
|
||||
});
|
||||
|
||||
const result = await integrateWorkUnitPatch({
|
||||
unitKey: 'wu-bad',
|
||||
patchPath,
|
||||
integrationGit: git,
|
||||
trace,
|
||||
author: { name: 'KTX Test', email: 'system@ktx.local' },
|
||||
validateAppliedTree: vi.fn().mockRejectedValue(new Error('final artifact gates failed')),
|
||||
slDisallowed: false,
|
||||
allowedTargetConnectionIds: new Set(['c1']),
|
||||
});
|
||||
|
||||
expect(result.status).toBe('semantic_conflict');
|
||||
await expect(readFile(join(configDir, 'wiki/global/a.md'), 'utf-8')).resolves.toBe('old\n');
|
||||
});
|
||||
|
||||
it('classifies slDisallowed patch policy failures as traced textual conflicts', async () => {
|
||||
const { homeDir, configDir, git, baseSha } = await makeRepo();
|
||||
await mkdir(join(configDir, 'semantic-layer/c1'), { recursive: true });
|
||||
await git.commitFiles(['semantic-layer/c1'], 'empty sl dir', 'System User', 'system@example.com');
|
||||
const childDir = join(homeDir, 'child-policy');
|
||||
await git.addWorktree(childDir, 'child-policy', baseSha);
|
||||
const childGit = git.forWorktree(childDir);
|
||||
await mkdir(join(childDir, 'semantic-layer/c1'), { recursive: true });
|
||||
await writeFile(join(childDir, 'semantic-layer/c1/orders.yaml'), 'name: orders\ncolumns: []\njoins: []\nmeasures: []\n');
|
||||
await childGit.commitFiles(['semantic-layer/c1/orders.yaml'], 'forbidden sl', 'System User', 'system@example.com');
|
||||
const patchPath = join(homeDir, 'patches/forbidden.patch');
|
||||
await childGit.writeBinaryNoRenamePatch(baseSha, 'HEAD', patchPath);
|
||||
const trace = new FileIngestTraceWriter({
|
||||
tracePath: join(homeDir, '.ktx/ingest-traces/job-policy/trace.jsonl'),
|
||||
jobId: 'job-policy',
|
||||
connectionId: 'c1',
|
||||
sourceKey: 'fake',
|
||||
level: 'trace',
|
||||
});
|
||||
|
||||
const result = await integrateWorkUnitPatch({
|
||||
unitKey: 'lookml-mismatch',
|
||||
patchPath,
|
||||
integrationGit: git,
|
||||
trace,
|
||||
author: { name: 'KTX Test', email: 'system@ktx.local' },
|
||||
validateAppliedTree: vi.fn().mockResolvedValue(undefined),
|
||||
slDisallowed: true,
|
||||
allowedTargetConnectionIds: new Set(['c1']),
|
||||
});
|
||||
|
||||
expect(result).toMatchObject({
|
||||
status: 'textual_conflict',
|
||||
touchedPaths: ['semantic-layer/c1/orders.yaml'],
|
||||
});
|
||||
const rawTrace = await readFile(trace.tracePath, 'utf-8');
|
||||
expect(rawTrace).toContain('patch_policy_rejected');
|
||||
expect(rawTrace).toContain('slDisallowed WorkUnit lookml-mismatch touched semantic-layer/c1/orders.yaml');
|
||||
});
|
||||
|
||||
it('classifies unauthorized semantic-layer targets as traced textual conflicts', async () => {
|
||||
const { homeDir, git, baseSha } = await makeRepo();
|
||||
const childDir = join(homeDir, 'child-target-policy');
|
||||
await git.addWorktree(childDir, 'child-target-policy', baseSha);
|
||||
const childGit = git.forWorktree(childDir);
|
||||
await mkdir(join(childDir, 'semantic-layer/finance'), { recursive: true });
|
||||
await writeFile(
|
||||
join(childDir, 'semantic-layer/finance/orders.yaml'),
|
||||
'name: orders\ncolumns: []\njoins: []\nmeasures: []\n',
|
||||
);
|
||||
await childGit.commitFiles(['semantic-layer/finance/orders.yaml'], 'unauthorized sl', 'System User', 'system@example.com');
|
||||
const patchPath = join(homeDir, 'patches/unauthorized.patch');
|
||||
await childGit.writeBinaryNoRenamePatch(baseSha, 'HEAD', patchPath);
|
||||
const trace = new FileIngestTraceWriter({
|
||||
tracePath: join(homeDir, '.ktx/ingest-traces/job-target-policy/trace.jsonl'),
|
||||
jobId: 'job-target-policy',
|
||||
connectionId: 'c1',
|
||||
sourceKey: 'fake',
|
||||
level: 'trace',
|
||||
});
|
||||
|
||||
const result = await integrateWorkUnitPatch({
|
||||
unitKey: 'wu-finance',
|
||||
patchPath,
|
||||
integrationGit: git,
|
||||
trace,
|
||||
author: { name: 'KTX Test', email: 'system@ktx.local' },
|
||||
validateAppliedTree: vi.fn().mockResolvedValue(undefined),
|
||||
slDisallowed: false,
|
||||
allowedTargetConnectionIds: new Set(['warehouse']),
|
||||
});
|
||||
|
||||
expect(result).toMatchObject({
|
||||
status: 'textual_conflict',
|
||||
touchedPaths: ['semantic-layer/finance/orders.yaml'],
|
||||
});
|
||||
const rawTrace = await readFile(trace.tracePath, 'utf-8');
|
||||
expect(rawTrace).toContain('patch_policy_rejected');
|
||||
expect(rawTrace).toContain('semantic-layer target connection not allowed');
|
||||
expect(rawTrace).toContain('allowedTargetConnectionIds');
|
||||
});
|
||||
|
||||
it('repairs a textual conflict through the bounded resolver and commits repaired files', async () => {
|
||||
const { homeDir, configDir, git, baseSha } = await makeRepo();
|
||||
await mkdir(join(configDir, 'wiki/global'), { recursive: true });
|
||||
await writeFile(join(configDir, 'wiki/global/a.md'), 'base\n', 'utf-8');
|
||||
await git.commitFiles(['wiki/global/a.md'], 'base page', 'System User', 'system@example.com');
|
||||
const conflictBase = await git.revParseHead();
|
||||
|
||||
await writeFile(join(configDir, 'wiki/global/a.md'), 'accepted\n', 'utf-8');
|
||||
await git.commitFiles(['wiki/global/a.md'], 'accepted edit', 'System User', 'system@example.com');
|
||||
|
||||
const childDir = join(homeDir, 'child-conflict');
|
||||
await git.addWorktree(childDir, 'child-conflict', conflictBase);
|
||||
const childGit = git.forWorktree(childDir);
|
||||
await writeFile(join(childDir, 'wiki/global/a.md'), 'proposal\n', 'utf-8');
|
||||
await childGit.commitFiles(['wiki/global/a.md'], 'proposal edit', 'System User', 'system@example.com');
|
||||
const patchPath = join(homeDir, 'proposal.patch');
|
||||
await childGit.writeBinaryNoRenamePatch(conflictBase, 'HEAD', patchPath);
|
||||
|
||||
const trace = new FileIngestTraceWriter({
|
||||
tracePath: join(homeDir, '.ktx/ingest-traces/job-resolver/trace.jsonl'),
|
||||
jobId: 'job-resolver',
|
||||
connectionId: 'warehouse',
|
||||
sourceKey: 'metabase',
|
||||
level: 'trace',
|
||||
});
|
||||
|
||||
const validateAppliedTree = vi.fn(async (paths: string[]) => {
|
||||
expect(paths).toEqual(['wiki/global/a.md']);
|
||||
await expect(readFile(join(configDir, 'wiki/global/a.md'), 'utf-8')).resolves.toBe('accepted\nproposal\n');
|
||||
});
|
||||
|
||||
const result = await integrateWorkUnitPatch({
|
||||
unitKey: 'wu-conflict',
|
||||
patchPath,
|
||||
integrationGit: git,
|
||||
trace,
|
||||
author: { name: 'System User', email: 'system@example.com' },
|
||||
slDisallowed: false,
|
||||
allowedTargetConnectionIds: new Set(['warehouse']),
|
||||
validateAppliedTree,
|
||||
resolveTextualConflict: vi.fn(async (context) => {
|
||||
expect(context).toMatchObject({
|
||||
unitKey: 'wu-conflict',
|
||||
patchPath,
|
||||
touchedPaths: ['wiki/global/a.md'],
|
||||
});
|
||||
await writeFile(join(configDir, 'wiki/global/a.md'), 'accepted\nproposal\n', 'utf-8');
|
||||
return {
|
||||
status: 'repaired' as const,
|
||||
attempts: 1,
|
||||
changedPaths: ['wiki/global/a.md'],
|
||||
};
|
||||
}),
|
||||
});
|
||||
|
||||
expect(result).toMatchObject({
|
||||
status: 'accepted',
|
||||
touchedPaths: ['wiki/global/a.md'],
|
||||
textualResolution: {
|
||||
status: 'repaired',
|
||||
attempts: 1,
|
||||
changedPaths: ['wiki/global/a.md'],
|
||||
},
|
||||
});
|
||||
expect(validateAppliedTree).toHaveBeenCalledOnce();
|
||||
await expect(readFile(join(configDir, 'wiki/global/a.md'), 'utf-8')).resolves.toBe('accepted\nproposal\n');
|
||||
await expect(readFile(trace.tracePath, 'utf-8')).resolves.toContain('patch_accepted_after_textual_resolution');
|
||||
expect(await git.revParseHead()).not.toBe(baseSha);
|
||||
});
|
||||
|
||||
it('keeps the pre-apply integration tree when the resolver cannot repair a textual conflict', async () => {
|
||||
const { homeDir, configDir, git } = await makeRepo();
|
||||
await mkdir(join(configDir, 'wiki/global'), { recursive: true });
|
||||
await writeFile(join(configDir, 'wiki/global/a.md'), 'base\n', 'utf-8');
|
||||
await git.commitFiles(['wiki/global/a.md'], 'base page', 'System User', 'system@example.com');
|
||||
const conflictBase = await git.revParseHead();
|
||||
|
||||
await writeFile(join(configDir, 'wiki/global/a.md'), 'accepted\n', 'utf-8');
|
||||
await git.commitFiles(['wiki/global/a.md'], 'accepted edit', 'System User', 'system@example.com');
|
||||
const acceptedHead = await git.revParseHead();
|
||||
|
||||
const childDir = join(homeDir, 'child-conflict-fails');
|
||||
await git.addWorktree(childDir, 'child-conflict-fails', conflictBase);
|
||||
const childGit = git.forWorktree(childDir);
|
||||
await writeFile(join(childDir, 'wiki/global/a.md'), 'proposal\n', 'utf-8');
|
||||
await childGit.commitFiles(['wiki/global/a.md'], 'proposal edit', 'System User', 'system@example.com');
|
||||
const patchPath = join(homeDir, 'proposal-fails.patch');
|
||||
await childGit.writeBinaryNoRenamePatch(conflictBase, 'HEAD', patchPath);
|
||||
|
||||
const trace = new FileIngestTraceWriter({
|
||||
tracePath: join(homeDir, '.ktx/ingest-traces/job-resolver-fails/trace.jsonl'),
|
||||
jobId: 'job-resolver-fails',
|
||||
connectionId: 'warehouse',
|
||||
sourceKey: 'metabase',
|
||||
level: 'trace',
|
||||
});
|
||||
|
||||
const result = await integrateWorkUnitPatch({
|
||||
unitKey: 'wu-conflict',
|
||||
patchPath,
|
||||
integrationGit: git,
|
||||
trace,
|
||||
author: { name: 'System User', email: 'system@example.com' },
|
||||
slDisallowed: false,
|
||||
allowedTargetConnectionIds: new Set(['warehouse']),
|
||||
validateAppliedTree: vi.fn(async () => {}),
|
||||
resolveTextualConflict: vi.fn(async () => ({
|
||||
status: 'failed' as const,
|
||||
attempts: 1,
|
||||
reason: 'resolver completed without editing an allowed path',
|
||||
})),
|
||||
});
|
||||
|
||||
expect(result).toMatchObject({
|
||||
status: 'textual_conflict',
|
||||
textualResolution: {
|
||||
status: 'failed',
|
||||
attempts: 1,
|
||||
reason: 'resolver completed without editing an allowed path',
|
||||
},
|
||||
});
|
||||
expect(await git.revParseHead()).toBe(acceptedHead);
|
||||
await expect(readFile(join(configDir, 'wiki/global/a.md'), 'utf-8')).resolves.toBe('accepted\n');
|
||||
});
|
||||
|
||||
it('repairs semantic gate failures after a patch applies cleanly', async () => {
|
||||
const { homeDir, configDir, git, baseSha } = await makeRepo();
|
||||
const childDir = join(homeDir, 'child-semantic-repair');
|
||||
await git.addWorktree(childDir, 'child-semantic-repair', baseSha);
|
||||
const childGit = git.forWorktree(childDir);
|
||||
await writeFile(join(childDir, 'wiki/global/a.md'), 'bad semantic ref\n');
|
||||
await childGit.commitFiles(['wiki/global/a.md'], 'bad semantic edit', 'System User', 'system@example.com');
|
||||
const patchPath = join(homeDir, 'patches/semantic-repair.patch');
|
||||
await childGit.writeBinaryNoRenamePatch(baseSha, 'HEAD', patchPath);
|
||||
const trace = new FileIngestTraceWriter({
|
||||
tracePath: join(homeDir, '.ktx/ingest-traces/job-semantic-repair/trace.jsonl'),
|
||||
jobId: 'job-semantic-repair',
|
||||
connectionId: 'c1',
|
||||
sourceKey: 'fake',
|
||||
level: 'trace',
|
||||
});
|
||||
const validateAppliedTree = vi
|
||||
.fn()
|
||||
.mockRejectedValueOnce(new Error('final artifact gates failed:\na: unknown semantic-layer entity'))
|
||||
.mockResolvedValueOnce(undefined);
|
||||
|
||||
const result = await integrateWorkUnitPatch({
|
||||
unitKey: 'wu-repairable',
|
||||
patchPath,
|
||||
integrationGit: git,
|
||||
trace,
|
||||
author: { name: 'KTX Test', email: 'system@ktx.local' },
|
||||
validateAppliedTree,
|
||||
slDisallowed: false,
|
||||
allowedTargetConnectionIds: new Set(['c1']),
|
||||
repairGateFailure: vi.fn(async (context) => {
|
||||
expect(context).toMatchObject({
|
||||
unitKey: 'wu-repairable',
|
||||
patchPath,
|
||||
touchedPaths: ['wiki/global/a.md'],
|
||||
});
|
||||
await writeFile(join(configDir, 'wiki/global/a.md'), 'repaired semantic ref\n', 'utf-8');
|
||||
return {
|
||||
status: 'repaired' as const,
|
||||
attempts: 1,
|
||||
changedPaths: ['wiki/global/a.md'],
|
||||
};
|
||||
}),
|
||||
});
|
||||
|
||||
expect(result).toMatchObject({
|
||||
status: 'accepted',
|
||||
touchedPaths: ['wiki/global/a.md'],
|
||||
gateRepair: {
|
||||
status: 'repaired',
|
||||
attempts: 1,
|
||||
changedPaths: ['wiki/global/a.md'],
|
||||
},
|
||||
});
|
||||
expect(validateAppliedTree).toHaveBeenCalledTimes(2);
|
||||
await expect(readFile(join(configDir, 'wiki/global/a.md'), 'utf-8')).resolves.toBe('repaired semantic ref\n');
|
||||
await expect(readFile(trace.tracePath, 'utf-8')).resolves.toContain('patch_accepted_after_gate_repair');
|
||||
});
|
||||
|
||||
it('keeps the pre-apply tree when semantic gate repair fails', async () => {
|
||||
const { homeDir, configDir, git, baseSha } = await makeRepo();
|
||||
const childDir = join(homeDir, 'child-semantic-repair-fails');
|
||||
await git.addWorktree(childDir, 'child-semantic-repair-fails', baseSha);
|
||||
const childGit = git.forWorktree(childDir);
|
||||
await writeFile(join(childDir, 'wiki/global/a.md'), 'bad semantic ref\n');
|
||||
await childGit.commitFiles(['wiki/global/a.md'], 'bad semantic edit', 'System User', 'system@example.com');
|
||||
const patchPath = join(homeDir, 'patches/semantic-repair-fails.patch');
|
||||
await childGit.writeBinaryNoRenamePatch(baseSha, 'HEAD', patchPath);
|
||||
const trace = new FileIngestTraceWriter({
|
||||
tracePath: join(homeDir, '.ktx/ingest-traces/job-semantic-repair-fails/trace.jsonl'),
|
||||
jobId: 'job-semantic-repair-fails',
|
||||
connectionId: 'c1',
|
||||
sourceKey: 'fake',
|
||||
level: 'trace',
|
||||
});
|
||||
|
||||
const result = await integrateWorkUnitPatch({
|
||||
unitKey: 'wu-not-repaired',
|
||||
patchPath,
|
||||
integrationGit: git,
|
||||
trace,
|
||||
author: { name: 'KTX Test', email: 'system@ktx.local' },
|
||||
validateAppliedTree: vi.fn().mockRejectedValue(new Error('final artifact gates failed')),
|
||||
slDisallowed: false,
|
||||
allowedTargetConnectionIds: new Set(['c1']),
|
||||
repairGateFailure: vi.fn(async () => ({
|
||||
status: 'failed' as const,
|
||||
attempts: 1,
|
||||
reason: 'gate repair completed without editing an allowed path',
|
||||
})),
|
||||
});
|
||||
|
||||
expect(result).toMatchObject({
|
||||
status: 'semantic_conflict',
|
||||
gateRepair: {
|
||||
status: 'failed',
|
||||
attempts: 1,
|
||||
reason: 'gate repair completed without editing an allowed path',
|
||||
},
|
||||
});
|
||||
await expect(readFile(join(configDir, 'wiki/global/a.md'), 'utf-8')).resolves.toBe('old\n');
|
||||
});
|
||||
});
|
||||
321
packages/context/src/ingest/isolated-diff/patch-integrator.ts
Normal file
|
|
@@ -0,0 +1,321 @@
|
|||
import { readFile } from 'node:fs/promises';
|
||||
import type { GitService } from '../../core/index.js';
|
||||
import type { FinalGateRepairResult } from '../final-gate-repair.js';
|
||||
import type { IngestTraceWriter } from '../ingest-trace.js';
|
||||
import { traceTimed } from '../ingest-trace.js';
|
||||
import { assertPatchAllowedForWorkUnit, parsePatchTouchedPaths } from './git-patch.js';
|
||||
import type { TextualConflictResolutionResult } from './textual-conflict-resolver.js';
|
||||
|
||||
export type PatchIntegrationTextualResolution =
|
||||
| { status: 'repaired'; attempts: number; changedPaths: string[] }
|
||||
| { status: 'failed'; attempts: number; reason: string };
|
||||
|
||||
export type PatchIntegrationResult =
|
||||
| {
|
||||
status: 'accepted';
|
||||
commitSha: string;
|
||||
touchedPaths: string[];
|
||||
textualResolution?: PatchIntegrationTextualResolution;
|
||||
gateRepair?: FinalGateRepairResult;
|
||||
}
|
||||
| {
|
||||
status: 'textual_conflict';
|
||||
reason: string;
|
||||
touchedPaths: string[];
|
||||
textualResolution?: PatchIntegrationTextualResolution;
|
||||
gateRepair?: FinalGateRepairResult;
|
||||
}
|
||||
| {
|
||||
status: 'semantic_conflict';
|
||||
reason: string;
|
||||
touchedPaths: string[];
|
||||
textualResolution?: PatchIntegrationTextualResolution;
|
||||
gateRepair?: FinalGateRepairResult;
|
||||
};
|
||||
|
||||
export interface IntegrateWorkUnitPatchInput {
|
||||
unitKey: string;
|
||||
patchPath: string;
|
||||
integrationGit: GitService;
|
||||
trace: IngestTraceWriter;
|
||||
author: { name: string; email: string };
|
||||
slDisallowed: boolean;
|
||||
allowedTargetConnectionIds: ReadonlySet<string>;
|
||||
validateAppliedTree(touchedPaths: string[]): Promise<void>;
|
||||
resolveTextualConflict?(input: {
|
||||
unitKey: string;
|
||||
patchPath: string;
|
||||
touchedPaths: string[];
|
||||
reason: string;
|
||||
}): Promise<TextualConflictResolutionResult>;
|
||||
repairGateFailure?(input: {
|
||||
unitKey: string;
|
||||
patchPath: string;
|
||||
touchedPaths: string[];
|
||||
reason: string;
|
||||
}): Promise<FinalGateRepairResult>;
|
||||
}
|
||||
|
||||
function errorMessage(error: unknown): string {
|
||||
return error instanceof Error ? error.message : String(error);
|
||||
}
|
||||
|
||||
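/**
 * Applies one WorkUnit patch to the integration worktree and commits it once the semantic gate passes.
 * A failed apply may be handed to `resolveTextualConflict`, and a failed gate to `repairGateFailure`.
 * Any failure after the policy check resets the worktree to the pre-apply HEAD (when one exists).
 */
export async function integrateWorkUnitPatch(input: IntegrateWorkUnitPatchInput): Promise<PatchIntegrationResult> {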
|
||||
const preApplyHead = await input.integrationGit.revParseHead();
|
||||
const patch = await readFile(input.patchPath, 'utf-8');
|
||||
const touchedPaths = parsePatchTouchedPaths(patch).map((entry) => entry.path);
|
||||
if (touchedPaths.length === 0) {
|
||||
await input.trace.event('debug', 'integration', 'patch_noop_accepted', {
|
||||
unitKey: input.unitKey,
|
||||
patchPath: input.patchPath,
|
||||
patchBytes: Buffer.byteLength(patch),
|
||||
});
|
||||
return { status: 'accepted', commitSha: preApplyHead ?? '', touchedPaths };
|
||||
}
|
||||
try {
|
||||
assertPatchAllowedForWorkUnit({
|
||||
unitKey: input.unitKey,
|
||||
patch,
|
||||
slDisallowed: input.slDisallowed,
|
||||
allowedTargetConnectionIds: input.allowedTargetConnectionIds,
|
||||
});
|
||||
} catch (error) {
|
||||
await input.trace.event('error', 'integration', 'patch_policy_rejected', {
|
||||
unitKey: input.unitKey,
|
||||
patchPath: input.patchPath,
|
||||
touchedPaths,
|
||||
allowedTargetConnectionIds: [...input.allowedTargetConnectionIds].sort(),
|
||||
reason: errorMessage(error),
|
||||
});
|
||||
return {
|
||||
status: 'textual_conflict',
|
||||
reason: errorMessage(error),
|
||||
touchedPaths,
|
||||
};
|
||||
}
|
||||
|
||||
try {
|
||||
await traceTimed(
|
||||
input.trace,
|
||||
'integration',
|
||||
'patch_apply',
|
||||
{ unitKey: input.unitKey, patchPath: input.patchPath, touchedPaths },
|
||||
async () => {
|
||||
await input.integrationGit.applyPatchFile3WayIndex(input.patchPath);
|
||||
await input.integrationGit.assertWorktreeClean();
|
||||
},
|
||||
);
|
||||
} catch (error) {
|
||||
if (preApplyHead) {
|
||||
await input.integrationGit.resetHardTo(preApplyHead);
|
||||
}
|
||||
const reason = errorMessage(error);
|
||||
await input.trace.event('error', 'integration', 'patch_textual_conflict', {
|
||||
unitKey: input.unitKey,
|
||||
patchPath: input.patchPath,
|
||||
touchedPaths,
|
||||
reason,
|
||||
});
|
||||
|
||||
if (!input.resolveTextualConflict) {
|
||||
return {
|
||||
status: 'textual_conflict',
|
||||
reason,
|
||||
touchedPaths,
|
||||
};
|
||||
}
|
||||
|
||||
const textualResolution = await input.resolveTextualConflict({
|
||||
unitKey: input.unitKey,
|
||||
patchPath: input.patchPath,
|
||||
touchedPaths,
|
||||
reason,
|
||||
});
|
||||
|
||||
if (textualResolution.status === 'failed') {
|
||||
if (preApplyHead) {
|
||||
await input.integrationGit.resetHardTo(preApplyHead);
|
||||
}
|
||||
return {
|
||||
status: 'textual_conflict',
|
||||
reason: textualResolution.reason,
|
||||
touchedPaths,
|
||||
textualResolution,
|
||||
};
|
||||
}
|
||||
|
||||
try {
|
||||
await traceTimed(
|
||||
input.trace,
|
||||
'integration',
|
||||
'semantic_gate_after_textual_resolution',
|
||||
{ unitKey: input.unitKey, touchedPaths: textualResolution.changedPaths },
|
||||
async () => {
|
||||
await input.validateAppliedTree(textualResolution.changedPaths);
|
||||
},
|
||||
);
|
||||
} catch (semanticError) {
|
||||
if (preApplyHead) {
|
||||
await input.integrationGit.resetHardTo(preApplyHead);
|
||||
}
|
||||
await input.trace.event('error', 'integration', 'patch_semantic_conflict_after_textual_resolution', {
|
||||
unitKey: input.unitKey,
|
||||
patchPath: input.patchPath,
|
||||
touchedPaths: textualResolution.changedPaths,
|
||||
reason: errorMessage(semanticError),
|
||||
});
|
||||
return {
|
||||
status: 'semantic_conflict',
|
||||
reason: errorMessage(semanticError),
|
||||
touchedPaths: textualResolution.changedPaths,
|
||||
textualResolution,
|
||||
};
|
||||
}
|
||||
|
||||
const commit = await input.integrationGit.commitFiles(
|
||||
textualResolution.changedPaths,
|
||||
`ingest: resolve WorkUnit ${input.unitKey} conflict`,
|
||||
input.author.name,
|
||||
input.author.email,
|
||||
);
|
||||
if (!commit.created) {
|
||||
if (preApplyHead) {
|
||||
await input.integrationGit.resetHardTo(preApplyHead);
|
||||
}
|
||||
const noChangeReason = 'textual resolver produced no committable changes';
|
||||
await input.trace.event('error', 'integration', 'textual_conflict_resolver_noop', {
|
||||
unitKey: input.unitKey,
|
||||
patchPath: input.patchPath,
|
||||
touchedPaths: textualResolution.changedPaths,
|
||||
});
|
||||
return {
|
||||
status: 'textual_conflict',
|
||||
reason: noChangeReason,
|
||||
touchedPaths: textualResolution.changedPaths,
|
||||
textualResolution,
|
||||
};
|
||||
}
|
||||
|
||||
await input.trace.event('debug', 'integration', 'patch_accepted_after_textual_resolution', {
|
||||
unitKey: input.unitKey,
|
||||
commitSha: commit.commitHash,
|
||||
touchedPaths: textualResolution.changedPaths,
|
||||
attempts: textualResolution.attempts,
|
||||
});
|
||||
return {
|
||||
status: 'accepted',
|
||||
commitSha: commit.commitHash,
|
||||
touchedPaths: textualResolution.changedPaths,
|
||||
textualResolution,
|
||||
};
|
||||
}
|
||||
|
||||
try {
|
||||
await traceTimed(input.trace, 'integration', 'semantic_gate', { unitKey: input.unitKey, touchedPaths }, async () => {
|
||||
await input.validateAppliedTree(touchedPaths);
|
||||
});
|
||||
} catch (error) {
|
||||
const reason = errorMessage(error);
|
||||
await input.trace.event('error', 'integration', 'patch_semantic_conflict', {
|
||||
unitKey: input.unitKey,
|
||||
patchPath: input.patchPath,
|
||||
touchedPaths,
|
||||
reason,
|
||||
});
|
||||
|
||||
if (input.repairGateFailure) {
|
||||
const gateRepair = await input.repairGateFailure({
|
||||
unitKey: input.unitKey,
|
||||
patchPath: input.patchPath,
|
||||
touchedPaths,
|
||||
reason,
|
||||
});
|
||||
|
||||
if (gateRepair.status === 'failed') {
|
||||
if (preApplyHead) {
|
||||
await input.integrationGit.resetHardTo(preApplyHead);
|
||||
}
|
||||
return {
|
||||
status: 'semantic_conflict',
|
||||
reason: gateRepair.reason,
|
||||
touchedPaths,
|
||||
gateRepair,
|
||||
};
|
||||
}
|
||||
|
||||
try {
|
||||
await traceTimed(
|
||||
input.trace,
|
||||
'integration',
|
||||
'semantic_gate_after_gate_repair',
|
||||
{ unitKey: input.unitKey, touchedPaths: gateRepair.changedPaths },
|
||||
async () => {
|
||||
await input.validateAppliedTree(gateRepair.changedPaths);
|
||||
},
|
||||
);
|
||||
} catch (repairValidationError) {
|
||||
if (preApplyHead) {
|
||||
await input.integrationGit.resetHardTo(preApplyHead);
|
||||
}
|
||||
return {
|
||||
status: 'semantic_conflict',
|
||||
reason: errorMessage(repairValidationError),
|
||||
touchedPaths: gateRepair.changedPaths,
|
||||
gateRepair,
|
||||
};
|
||||
}
|
||||
|
||||
const commit = await input.integrationGit.commitFiles(
|
||||
gateRepair.changedPaths,
|
||||
`ingest: repair WorkUnit ${input.unitKey} gates`,
|
||||
input.author.name,
|
||||
input.author.email,
|
||||
);
|
||||
if (!commit.created) {
|
||||
if (preApplyHead) {
|
||||
await input.integrationGit.resetHardTo(preApplyHead);
|
||||
}
|
||||
return {
|
||||
status: 'semantic_conflict',
|
||||
reason: 'gate repair produced no committable changes',
|
||||
touchedPaths: gateRepair.changedPaths,
|
||||
gateRepair,
|
||||
};
|
||||
}
|
||||
|
||||
await input.trace.event('debug', 'integration', 'patch_accepted_after_gate_repair', {
|
||||
unitKey: input.unitKey,
|
||||
commitSha: commit.commitHash,
|
||||
touchedPaths: gateRepair.changedPaths,
|
||||
attempts: gateRepair.attempts,
|
||||
});
|
||||
return {
|
||||
status: 'accepted',
|
||||
commitSha: commit.commitHash,
|
||||
touchedPaths: gateRepair.changedPaths,
|
||||
gateRepair,
|
||||
};
|
||||
}
|
||||
|
||||
if (preApplyHead) {
|
||||
await input.integrationGit.resetHardTo(preApplyHead);
|
||||
}
|
||||
return {
|
||||
status: 'semantic_conflict',
|
||||
reason,
|
||||
touchedPaths,
|
||||
};
|
||||
}
|
||||
|
||||
const commit = await input.integrationGit.commitStaged(
|
||||
`ingest: accept WorkUnit ${input.unitKey}`,
|
||||
input.author.name,
|
||||
input.author.email,
|
||||
);
|
||||
await input.trace.event('debug', 'integration', 'patch_accepted', {
|
||||
unitKey: input.unitKey,
|
||||
commitSha: commit.commitHash,
|
||||
touchedPaths,
|
||||
});
|
||||
return { status: 'accepted', commitSha: commit.commitHash, touchedPaths };
|
||||
}
|
||||
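The branching in `integrateWorkUnitPatch` is easier to follow when restated as a pure decision function. The sketch below is only a model of the control flow shown above, not code from the repository: `Scenario`, `Outcome`, and `decideOutcome` are hypothetical names, the Git and trace side effects are omitted, and it assumes the repository already has a HEAD to reset to.

```typescript
// Illustrative model only. `Scenario`, `Outcome`, and `decideOutcome` are hypothetical
// names, not part of patch-integrator.ts; they restate its branches without side effects.
type IntegrationStatus = 'accepted' | 'textual_conflict' | 'semantic_conflict';
type RepairStatus = 'repaired' | 'failed';

interface Scenario {
  policyAllowed: boolean; // the policy check did not throw
  applyClean: boolean; // 3-way apply succeeded and left a clean worktree
  resolver?: RepairStatus; // textual resolver result; undefined = no resolver configured
  gateRepair?: RepairStatus; // gate repair result; undefined = no repair hook configured
  gateResults: boolean[]; // outcome of each successive semantic gate run
  commitCreated?: boolean; // whether a repair commit contained changes (default true)
}

interface Outcome {
  status: IntegrationStatus;
  rolledBack: boolean; // true when the call ends with the tree reset to the pre-apply HEAD
}

function decideOutcome(s: Scenario): Outcome {
  const gates = [...s.gateResults];
  const gatePasses = (): boolean => gates.shift() ?? false;
  const committed = s.commitCreated ?? true;
  const accepted: Outcome = { status: 'accepted', rolledBack: false };
  const rollback = (status: IntegrationStatus): Outcome => ({ status, rolledBack: true });

  // Policy rejection happens before anything is applied, so there is nothing to reset.
  if (!s.policyAllowed) return { status: 'textual_conflict', rolledBack: false };

  if (!s.applyClean) {
    // Textual branch: a missing or failed resolver leaves the pre-apply tree in place.
    if (s.resolver !== 'repaired') return rollback('textual_conflict');
    if (!gatePasses()) return rollback('semantic_conflict');
    return committed ? accepted : rollback('textual_conflict');
  }

  // Clean apply: the first gate run decides whether a repair is needed at all.
  if (gatePasses()) return accepted;
  if (s.gateRepair !== 'repaired') return rollback('semantic_conflict');
  if (!gatePasses()) return rollback('semantic_conflict');
  return committed ? accepted : rollback('semantic_conflict');
}

// Scenarios mirroring the four integration tests above.
const textualRepaired = decideOutcome({ policyAllowed: true, applyClean: false, resolver: 'repaired', gateResults: [true] });
const textualFailed = decideOutcome({ policyAllowed: true, applyClean: false, resolver: 'failed', gateResults: [] });
const gateRepaired = decideOutcome({ policyAllowed: true, applyClean: true, gateRepair: 'repaired', gateResults: [false, true] });
const gateFailed = decideOutcome({ policyAllowed: true, applyClean: true, gateRepair: 'failed', gateResults: [false] });
const policyRejected = decideOutcome({ policyAllowed: false, applyClean: true, gateResults: [] });
console.log(textualRepaired.status, textualFailed.status, gateRepaired.status, gateFailed.status, policyRejected.status);
```

One detail the model makes visible: a policy rejection is reported as `textual_conflict` even though no apply was attempted, so callers that need to tell the two apart would have to inspect `reason` or the `patch_policy_rejected` trace event.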
|
|
@@ -0,0 +1,120 @@
|
|||
import { mkdir, mkdtemp, readFile, writeFile } from 'node:fs/promises';
|
||||
import { tmpdir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
import { describe, expect, it, vi } from 'vitest';
|
||||
import { FileIngestTraceWriter } from '../ingest-trace.js';
|
||||
import { resolveTextualConflict } from './textual-conflict-resolver.js';
|
||||
|
||||
async function makeHarness() {
|
||||
const root = await mkdtemp(join(tmpdir(), 'ktx-textual-resolver-'));
|
||||
const workdir = join(root, 'workdir');
|
||||
const patchPath = join(root, 'failed.patch');
|
||||
const trace = new FileIngestTraceWriter({
|
||||
tracePath: join(root, 'trace.jsonl'),
|
||||
jobId: 'job-1',
|
||||
connectionId: 'warehouse',
|
||||
sourceKey: 'metabase',
|
||||
runId: 'run-1',
|
||||
syncId: 'sync-1',
|
||||
level: 'trace',
|
||||
});
|
||||
await mkdir(join(workdir, 'wiki/global'), { recursive: true });
|
||||
await writeFile(join(workdir, 'wiki/global/account.md'), 'accepted line\n', 'utf-8');
|
||||
await writeFile(
|
||||
patchPath,
|
||||
[
|
||||
'diff --git a/wiki/global/account.md b/wiki/global/account.md',
|
||||
'index 8877391..6f63f4d 100644',
|
||||
'--- a/wiki/global/account.md',
|
||||
'+++ b/wiki/global/account.md',
|
||||
'@@ -1 +1 @@',
|
||||
'-base line',
|
||||
'+proposal line',
|
||||
'',
|
||||
].join('\n'),
|
||||
'utf-8',
|
||||
);
|
||||
return { root, workdir, patchPath, trace };
|
||||
}
|
||||
|
||||
describe('resolveTextualConflict', () => {
|
||||
it('lets the repair agent read the failed patch and write only touched paths', async () => {
|
||||
const { workdir, patchPath, trace } = await makeHarness();
|
||||
const agentRunner = {
|
||||
runLoop: vi.fn(async (params: any) => {
|
||||
const current = await params.toolSet.read_integration_file.execute({ path: 'wiki/global/account.md' });
|
||||
expect(current.structured).toEqual({ path: 'wiki/global/account.md', exists: true });
|
||||
expect(current.markdown).toContain('accepted line');
|
||||
|
||||
const patch = await params.toolSet.read_failed_patch.execute({});
|
||||
expect(patch.markdown).toContain('proposal line');
|
||||
|
||||
await expect(
|
||||
params.toolSet.write_integration_file.execute({
|
||||
path: 'wiki/global/not-allowed.md',
|
||||
content: 'bad\n',
|
||||
}),
|
||||
).rejects.toThrow(/resolver path not allowed/);
|
||||
|
||||
await params.toolSet.write_integration_file.execute({
|
||||
path: 'wiki/global/account.md',
|
||||
content: 'accepted line\nproposal line\n',
|
||||
});
|
||||
return { stopReason: 'natural' as const };
|
||||
}),
|
||||
};
|
||||
|
||||
const result = await resolveTextualConflict({
|
||||
agentRunner,
|
||||
workdir,
|
||||
unitKey: 'wu-a',
|
||||
patchPath,
|
||||
touchedPaths: ['wiki/global/account.md'],
|
||||
trace,
|
||||
reason: 'patch failed: wiki/global/account.md',
|
||||
maxAttempts: 1,
|
||||
stepBudget: 8,
|
||||
});
|
||||
|
||||
expect(result).toEqual({
|
||||
status: 'repaired',
|
||||
attempts: 1,
|
||||
changedPaths: ['wiki/global/account.md'],
|
||||
});
|
||||
await expect(readFile(join(workdir, 'wiki/global/account.md'), 'utf-8')).resolves.toBe(
|
||||
'accepted line\nproposal line\n',
|
||||
);
|
||||
expect(agentRunner.runLoop).toHaveBeenCalledWith(
|
||||
expect.objectContaining({
|
||||
modelRole: 'repair',
|
||||
stepBudget: 8,
|
||||
telemetryTags: expect.objectContaining({
|
||||
operationName: 'ingest-isolated-diff-textual-resolver',
|
||||
jobId: 'job-1',
|
||||
unitKey: 'wu-a',
|
||||
}),
|
||||
}),
|
||||
);
|
||||
});
|
||||
|
||||
it('fails when the repair agent completes without editing any touched path', async () => {
|
||||
const { workdir, patchPath, trace } = await makeHarness();
|
||||
const result = await resolveTextualConflict({
|
||||
agentRunner: { runLoop: vi.fn(async () => ({ stopReason: 'natural' as const })) },
|
||||
workdir,
|
||||
unitKey: 'wu-a',
|
||||
patchPath,
|
||||
touchedPaths: ['wiki/global/account.md'],
|
||||
trace,
|
||||
reason: 'patch failed: wiki/global/account.md',
|
||||
maxAttempts: 1,
|
||||
stepBudget: 8,
|
||||
});
|
||||
|
||||
expect(result).toEqual({
|
||||
status: 'failed',
|
||||
attempts: 1,
|
||||
reason: 'resolver completed without editing an allowed path',
|
||||
});
|
||||
});
|
||||
});
|
||||
|
|
@@ -0,0 +1,238 @@
|
|||
import { mkdir, readFile, rm, writeFile } from 'node:fs/promises';
|
||||
import { dirname, join } from 'node:path';
|
||||
import { z } from 'zod';
|
||||
import type { AgentRunnerPort, KtxRuntimeToolSet } from '../../llm/index.js';
|
||||
import type { IngestTraceWriter } from '../ingest-trace.js';
|
||||
import { traceTimed } from '../ingest-trace.js';
|
||||
|
||||
export type TextualConflictResolutionResult =
|
||||
| { status: 'repaired'; attempts: number; changedPaths: string[] }
|
||||
| { status: 'failed'; attempts: number; reason: string };
|
||||
|
||||
export interface ResolveTextualConflictInput {
|
||||
agentRunner: AgentRunnerPort;
|
||||
workdir: string;
|
||||
unitKey: string;
|
||||
patchPath: string;
|
||||
touchedPaths: string[];
|
||||
trace: IngestTraceWriter;
|
||||
reason: string;
|
||||
maxAttempts?: number;
|
||||
stepBudget?: number;
|
||||
}
|
||||
|
||||
const readIntegrationFileSchema = z.object({
|
||||
path: z.string().min(1),
|
||||
});
|
||||
|
||||
const writeIntegrationFileSchema = z.object({
|
||||
path: z.string().min(1),
|
||||
content: z.string(),
|
||||
});
|
||||
|
||||
const deleteIntegrationFileSchema = z.object({
|
||||
path: z.string().min(1),
|
||||
});
|
||||
|
||||
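// Converts backslashes to forward slashes, drops leading slashes and empty segments,
// and rejects any `.` or `..` segment so the result is always repository-relative.
function normalizeRepoPath(path: string): string {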
|
||||
const normalized = path.replace(/\\/g, '/').replace(/^\/+/, '');
|
||||
const parts = normalized.split('/').filter((part) => part.length > 0);
|
||||
if (parts.length === 0 || parts.some((part) => part === '.' || part === '..')) {
|
||||
throw new Error(`resolver path must be a repository-relative path: ${path}`);
|
||||
}
|
||||
return parts.join('/');
|
||||
}
|
||||
|
||||
function assertAllowedPath(path: string, allowedPaths: ReadonlySet<string>): string {
|
||||
const normalized = normalizeRepoPath(path);
|
||||
if (!allowedPaths.has(normalized)) {
|
||||
throw new Error(`resolver path not allowed: ${normalized}`);
|
||||
}
|
||||
return normalized;
|
||||
}
|
||||
|
||||
async function readOptionalFile(path: string): Promise<{ exists: boolean; content: string }> {
|
||||
try {
|
||||
return { exists: true, content: await readFile(path, 'utf-8') };
|
||||
} catch (error) {
|
||||
if (error && typeof error === 'object' && 'code' in error && error.code === 'ENOENT') {
|
||||
return { exists: false, content: '' };
|
||||
}
|
||||
throw error;
|
||||
}
|
||||
}
|
||||
|
||||
function buildResolverSystemPrompt(): string {
|
||||
return `<role>
|
||||
You repair one failed KTX isolated-diff patch inside the integration worktree.
|
||||
</role>
|
||||
|
||||
<rules>
|
||||
- Preserve accepted integration content that is unrelated to the failed patch.
|
||||
- Incorporate the failed patch only when the patch evidence is compatible with the current file.
|
||||
- Edit only paths exposed by the resolver tools.
|
||||
- Prefer the smallest text edit that makes the composed artifact coherent.
|
||||
- Do not create new facts that are absent from the current file or failed patch.
|
||||
- Stop after writing the repaired file content.
|
||||
</rules>`;
|
||||
}
|
||||
|
||||
function buildResolverUserPrompt(input: {
|
||||
unitKey: string;
|
||||
patchPath: string;
|
||||
touchedPaths: string[];
|
||||
reason: string;
|
||||
attempt: number;
|
||||
maxAttempts: number;
|
||||
}): string {
|
||||
return `Repair isolated-diff textual conflict.
|
||||
|
||||
WorkUnit: ${input.unitKey}
|
||||
Attempt: ${input.attempt} of ${input.maxAttempts}
|
||||
Patch path: ${input.patchPath}
|
||||
Touched paths:
|
||||
${input.touchedPaths.map((path) => `- ${path}`).join('\n')}
|
||||
|
||||
Git apply failure:
|
||||
${input.reason}
|
||||
|
||||
Use read_failed_patch first. Then read the touched integration files, write the
|
||||
repaired content, and stop.`;
|
||||
}
|
||||
|
||||
function buildToolSet(input: {
|
||||
workdir: string;
|
||||
patchPath: string;
|
||||
allowedPaths: ReadonlySet<string>;
|
||||
editedPaths: Set<string>;
|
||||
}): KtxRuntimeToolSet {
|
||||
return {
|
||||
read_failed_patch: {
|
||||
name: 'read_failed_patch',
|
||||
description: 'Read the failed Git patch that could not be applied to the integration worktree.',
|
||||
inputSchema: z.object({}),
|
||||
execute: async () => {
|
||||
const patch = await readFile(input.patchPath, 'utf-8');
|
||||
return {
|
||||
markdown: patch,
|
||||
structured: { patchPath: input.patchPath, bytes: Buffer.byteLength(patch) },
|
||||
};
|
||||
},
|
||||
},
|
||||
read_integration_file: {
|
||||
name: 'read_integration_file',
|
||||
description: 'Read one allowed file from the current integration worktree.',
|
||||
inputSchema: readIntegrationFileSchema,
|
||||
execute: async ({ path }: z.infer<typeof readIntegrationFileSchema>) => {
|
||||
const normalized = assertAllowedPath(path, input.allowedPaths);
|
||||
const file = await readOptionalFile(join(input.workdir, normalized));
|
||||
return {
|
||||
markdown: file.exists ? file.content : `(missing file: ${normalized})`,
|
||||
structured: { path: normalized, exists: file.exists },
|
||||
};
|
||||
},
|
||||
},
|
||||
write_integration_file: {
|
||||
name: 'write_integration_file',
|
||||
description: 'Replace one allowed integration worktree file with repaired text content.',
|
||||
inputSchema: writeIntegrationFileSchema,
|
||||
execute: async ({ path, content }: z.infer<typeof writeIntegrationFileSchema>) => {
|
||||
const normalized = assertAllowedPath(path, input.allowedPaths);
|
||||
const fullPath = join(input.workdir, normalized);
|
||||
await mkdir(dirname(fullPath), { recursive: true });
|
||||
await writeFile(fullPath, content, 'utf-8');
|
||||
input.editedPaths.add(normalized);
|
||||
return {
|
||||
markdown: `Wrote ${normalized}`,
|
||||
structured: { path: normalized, bytes: Buffer.byteLength(content) },
|
||||
};
|
||||
},
|
||||
},
|
||||
delete_integration_file: {
|
||||
name: 'delete_integration_file',
|
||||
description: 'Delete one allowed integration worktree file when the failed patch proves the deletion is correct.',
|
||||
inputSchema: deleteIntegrationFileSchema,
|
||||
execute: async ({ path }: z.infer<typeof deleteIntegrationFileSchema>) => {
|
||||
const normalized = assertAllowedPath(path, input.allowedPaths);
|
||||
await rm(join(input.workdir, normalized), { force: true });
|
||||
input.editedPaths.add(normalized);
|
||||
return {
|
||||
markdown: `Deleted ${normalized}`,
|
||||
structured: { path: normalized },
|
||||
};
|
||||
},
|
||||
},
|
||||
};
|
||||
}
|
||||
|
||||
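/**
 * Runs the repair agent up to `maxAttempts` times (default 1) with tools restricted to `touchedPaths`.
 * An attempt counts as repaired only if the agent wrote or deleted at least one allowed path.
 */
export async function resolveTextualConflict(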
|
||||
input: ResolveTextualConflictInput,
|
||||
): Promise<TextualConflictResolutionResult> {
|
||||
const allowedPaths = new Set(input.touchedPaths.map(normalizeRepoPath));
|
||||
const maxAttempts = input.maxAttempts ?? 1;
|
||||
const stepBudget = input.stepBudget ?? 12;
|
||||
let lastFailure = 'resolver did not run';
|
||||
|
||||
for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
|
||||
const editedPaths = new Set<string>();
|
||||
const traceData = {
|
||||
unitKey: input.unitKey,
|
||||
patchPath: input.patchPath,
|
||||
touchedPaths: [...allowedPaths].sort(),
|
||||
attempt,
|
||||
maxAttempts,
|
||||
reason: input.reason,
|
||||
};
|
||||
const result = await traceTimed(input.trace, 'resolver', 'textual_conflict_resolver', traceData, async () =>
|
||||
input.agentRunner.runLoop({
|
||||
modelRole: 'repair',
|
||||
systemPrompt: buildResolverSystemPrompt(),
|
||||
userPrompt: buildResolverUserPrompt({
|
||||
unitKey: input.unitKey,
|
||||
patchPath: input.patchPath,
|
||||
touchedPaths: [...allowedPaths].sort(),
|
||||
reason: input.reason,
|
||||
attempt,
|
||||
maxAttempts,
|
||||
}),
|
||||
toolSet: buildToolSet({
|
||||
workdir: input.workdir,
|
||||
patchPath: input.patchPath,
|
||||
allowedPaths,
|
||||
editedPaths,
|
||||
}),
|
||||
stepBudget,
|
||||
telemetryTags: {
|
||||
operationName: 'ingest-isolated-diff-textual-resolver',
|
||||
source: input.trace.context.sourceKey,
|
||||
jobId: input.trace.context.jobId,
|
||||
unitKey: input.unitKey,
|
||||
},
|
||||
}),
|
||||
);
|
||||
|
||||
if (result.stopReason === 'error') {
|
||||
lastFailure = result.error?.message ?? 'resolver agent loop errored';
|
||||
await input.trace.event('error', 'resolver', 'textual_conflict_resolver_failed', traceData, result.error);
|
||||
continue;
|
||||
}
|
||||
|
||||
const changedPaths = [...editedPaths].sort();
|
||||
if (changedPaths.length === 0) {
|
||||
lastFailure = 'resolver completed without editing an allowed path';
|
||||
await input.trace.event('error', 'resolver', 'textual_conflict_resolver_failed', {
|
||||
...traceData,
|
||||
reason: lastFailure,
|
||||
});
|
||||
continue;
|
||||
}
|
||||
|
||||
await input.trace.event('debug', 'resolver', 'textual_conflict_resolver_repaired', {
|
||||
...traceData,
|
||||
changedPaths,
|
||||
});
|
||||
return { status: 'repaired', attempts: attempt, changedPaths };
|
||||
}
|
||||
|
||||
return { status: 'failed', attempts: maxAttempts, reason: lastFailure };
|
||||
}
|
||||
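The path guard formed by `normalizeRepoPath` and `assertAllowedPath` above can be exercised in isolation to see which inputs reach the allow-list check. The sketch below copies the normalization logic from the diff and wraps it in a hypothetical `isAllowed` helper (not part of the source) that returns a boolean instead of throwing; the sample paths are illustrative.

```typescript
// Standalone copy of the normalization logic shown above, for illustration only.
function normalizeRepoPath(path: string): string {
  const normalized = path.replace(/\\/g, '/').replace(/^\/+/, '');
  const parts = normalized.split('/').filter((part) => part.length > 0);
  if (parts.length === 0 || parts.some((part) => part === '.' || part === '..')) {
    throw new Error(`resolver path must be a repository-relative path: ${path}`);
  }
  return parts.join('/');
}

// Hypothetical helper: reports whether a raw tool argument would pass the allow-list.
function isAllowed(path: string, allowedPaths: ReadonlySet<string>): boolean {
  try {
    return allowedPaths.has(normalizeRepoPath(path));
  } catch {
    // Traversal segments and empty paths are rejected before the allow-list lookup.
    return false;
  }
}

const allowedPaths = new Set(['wiki/global/account.md']);
const pathChecks = {
  backslashes: isAllowed('wiki\\global\\account.md', allowedPaths),
  leadingSlash: isAllowed('/wiki/global/account.md', allowedPaths),
  doubledSlash: isAllowed('wiki//global/account.md', allowedPaths),
  traversal: isAllowed('wiki/global/../global/account.md', allowedPaths),
  otherFile: isAllowed('wiki/global/not-allowed.md', allowedPaths),
};
console.log(pathChecks);
```

Spelling variants of an allowed path (backslashes, a leading slash, doubled slashes) normalize to the same key and pass, while a path containing `..` is rejected outright even when it would resolve to an allowed file. Rejecting traversal segments rather than resolving them keeps the guard independent of the filesystem.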
|
|
@@ -0,0 +1,144 @@
|
|||
import { mkdir, mkdtemp, readFile, writeFile } from 'node:fs/promises';
|
||||
import { tmpdir } from 'node:os';
|
||||
import { join } from 'node:path';
|
||||
import { describe, expect, it, vi } from 'vitest';
|
||||
import { GitService } from '../../core/index.js';
|
||||
import { FileIngestTraceWriter } from '../ingest-trace.js';
|
||||
import { runIsolatedWorkUnit } from './work-unit-executor.js';
|
||||
|
||||
async function makeGit() {
|
||||
const homeDir = await mkdtemp(join(tmpdir(), 'ktx-isolated-wu-'));
|
||||
const configDir = join(homeDir, 'config');
|
||||
const git = new GitService({
|
||||
storage: { configDir, homeDir },
|
||||
git: {
|
||||
userName: 'System User',
|
||||
userEmail: 'system@example.com',
|
||||
bootstrapMessage: 'init',
|
||||
bootstrapAuthor: 'system',
|
||||
bootstrapAuthorEmail: 'system@example.com',
|
||||
},
|
||||
});
|
||||
await git.onModuleInit();
|
||||
await mkdir(join(configDir, 'raw-sources/c1/fake/s'), { recursive: true });
|
||||
await writeFile(join(configDir, 'raw-sources/c1/fake/s/a.json'), '{}\n');
|
||||
await git.commitFiles(['raw-sources/c1/fake/s/a.json'], 'raw snapshot', 'System User', 'system@example.com');
|
||||
return { homeDir, configDir, git, baseSha: await git.revParseHead() };
|
||||
}
|
||||
|
||||
describe('runIsolatedWorkUnit', () => {
|
||||
it('creates a child worktree at the ingestion base and persists a patch proposal', async () => {
|
||||
const { homeDir, git, baseSha } = await makeGit();
|
||||
const childDir = join(homeDir, '.worktrees/session-job-1-wu-1');
|
||||
const sessionWorktreeService = {
|
||||
create: vi.fn(async (_key: string, startSha: string) => {
|
||||
await mkdir(join(homeDir, '.worktrees'), { recursive: true });
|
||||
await git.addWorktree(childDir, 'session/job-1-wu-1', startSha);
|
||||
const childGit = git.forWorktree(childDir);
|
||||
return {
|
||||
chatId: 'job-1-wu-1',
|
||||
workdir: childDir,
|
||||
branch: 'session/job-1-wu-1',
|
||||
baseSha: startSha,
|
||||
createdAt: new Date(),
|
||||
git: childGit,
|
||||
config: {},
|
||||
};
|
||||
}),
|
||||
cleanup: vi.fn(async () => undefined),
|
||||
};
|
||||
const tracePath = join(homeDir, '.ktx/ingest-traces/job-1/trace.jsonl');
|
||||
const trace = new FileIngestTraceWriter({
|
||||
tracePath,
|
||||
jobId: 'job-1',
|
||||
connectionId: 'c1',
|
||||
sourceKey: 'fake',
|
||||
level: 'trace',
|
||||
});
|
||||
|
||||
const result = await runIsolatedWorkUnit({
|
||||
unitIndex: 0,
|
||||
ingestionBaseSha: baseSha,
|
||||
sessionWorktreeService: sessionWorktreeService as never,
|
||||
patchDir: join(homeDir, '.ktx/ingest-patches/job-1'),
|
||||
trace,
|
||||
run: async (child) => {
|
||||
await mkdir(join(child.workdir, 'wiki/global'), { recursive: true });
|
||||
await writeFile(join(child.workdir, 'wiki/global/a.md'), '---\nsummary: A\nusage_mode: auto\n---\n\nBody\n');
|
||||
await child.git.commitFiles(['wiki/global/a.md'], 'test: write wiki', 'KTX Test', 'system@ktx.local');
|
||||
return {
|
||||
unitKey: 'wu-1',
|
||||
status: 'success',
|
||||
preSha: baseSha,
|
||||
postSha: await child.git.revParseHead(),
|
||||
actions: [{ target: 'wiki', type: 'created', key: 'a', detail: 'A' }],
|
||||
touchedSlSources: [],
|
||||
};
|
||||
},
|
||||
workUnit: { unitKey: 'wu-1', rawFiles: ['a.json'], peerFileIndex: [], dependencyPaths: [] },
|
||||
});
|
||||
|
||||
expect(sessionWorktreeService.create).toHaveBeenCalledWith('job-1-wu-1', baseSha);
|
||||
expect(sessionWorktreeService.cleanup).toHaveBeenCalledWith(expect.any(Object), 'success');
|
||||
expect(result.status).toBe('success');
|
||||
if (result.status !== 'success') {
|
||||
throw new Error('expected successful work unit');
|
||||
}
|
||||
const patchPath = result.patchPath;
|
||||
if (!patchPath) {
|
||||
throw new Error('expected patch path');
|
||||
}
|
||||
expect(patchPath).toContain('0000-wu-1.patch');
|
||||
await expect(readFile(patchPath, 'utf-8')).resolves.toContain('wiki/global/a.md');
|
||||
await expect(readFile(tracePath, 'utf-8')).resolves.toContain('work_unit_child_created');
|
||||
});
|
||||
|
||||
it('removes child worktrees after failed WorkUnit outcomes are traced', async () => {
|
||||
const { homeDir, git, baseSha } = await makeGit();
|
||||
const childDir = join(homeDir, '.worktrees/session-job-1-wu-fail');
|
||||
const sessionWorktreeService = {
|
||||
create: vi.fn(async (_key: string, startSha: string) => {
|
||||
await mkdir(join(homeDir, '.worktrees'), { recursive: true });
|
||||
await git.addWorktree(childDir, 'session/job-1-wu-fail', startSha);
|
||||
return {
|
||||
chatId: 'job-1-wu-fail',
|
||||
workdir: childDir,
|
||||
branch: 'session/job-1-wu-fail',
|
||||
baseSha: startSha,
|
||||
createdAt: new Date(),
|
||||
git: git.forWorktree(childDir),
|
||||
config: {},
|
||||
};
|
||||
}),
|
||||
cleanup: vi.fn(async () => undefined),
|
||||
};
|
||||
const trace = new FileIngestTraceWriter({
|
||||
tracePath: join(homeDir, '.ktx/ingest-traces/job-1/trace.jsonl'),
|
||||
jobId: 'job-1',
|
||||
connectionId: 'c1',
|
||||
sourceKey: 'fake',
|
||||
level: 'trace',
|
||||
});
|
||||
|
||||
const result = await runIsolatedWorkUnit({
|
||||
unitIndex: 0,
|
||||
ingestionBaseSha: baseSha,
|
||||
sessionWorktreeService: sessionWorktreeService as never,
|
||||
patchDir: join(homeDir, '.ktx/ingest-patches/job-1'),
|
||||
trace,
|
||||
run: async () => ({
|
||||
unitKey: 'wu-fail',
|
||||
status: 'failed',
|
||||
reason: 'agent loop errored',
|
||||
preSha: baseSha,
|
||||
postSha: baseSha,
|
||||
actions: [],
|
||||
touchedSlSources: [],
|
||||
}),
|
||||
workUnit: { unitKey: 'wu-fail', rawFiles: ['a.json'], peerFileIndex: [], dependencyPaths: [] },
|
||||
});
|
||||
|
||||
expect(result.status).toBe('failed');
|
||||
expect(sessionWorktreeService.cleanup).toHaveBeenCalledWith(expect.any(Object), 'success');
|
||||
});
|
||||
});
|
||||
|
|
@ -0,0 +1,85 @@
|
|||
import { mkdir, readFile } from 'node:fs/promises';
|
||||
import { join } from 'node:path';
|
||||
import type { SessionOutcome } from '../../core/index.js';
|
||||
import type { IngestSessionWorktree, IngestSessionWorktreePort } from '../ports.js';
|
||||
import type { WorkUnit } from '../types.js';
|
||||
import type { IngestTraceWriter } from '../ingest-trace.js';
|
||||
import type { WorkUnitOutcome } from '../stages/stage-3-work-units.js';
|
||||
import { parsePatchTouchedPaths } from './git-patch.js';
|
||||
|
||||
export interface RunIsolatedWorkUnitInput {
|
||||
unitIndex: number;
|
||||
ingestionBaseSha: string;
|
||||
sessionWorktreeService: IngestSessionWorktreePort;
|
||||
patchDir: string;
|
||||
trace: IngestTraceWriter;
|
||||
workUnit: WorkUnit;
|
||||
run(child: IngestSessionWorktree): Promise<WorkUnitOutcome>;
|
||||
afterSuccess?(child: IngestSessionWorktree): Promise<void>;
|
||||
}
|
||||
|
||||
function patchFileName(unitIndex: number, unitKey: string): string {
|
||||
const safeKey = unitKey.replace(/[^a-zA-Z0-9_.-]+/g, '-');
|
||||
return `${String(unitIndex).padStart(4, '0')}-${safeKey}.patch`;
|
||||
}
|
||||
|
||||
export async function runIsolatedWorkUnit(input: RunIsolatedWorkUnitInput): Promise<WorkUnitOutcome> {
|
||||
const sessionKey = `${input.trace.context.jobId}-${input.workUnit.unitKey}`;
|
||||
let cleanupOutcome: SessionOutcome = 'crash';
|
||||
const child = await input.sessionWorktreeService.create(sessionKey, input.ingestionBaseSha);
|
||||
await input.trace.event('debug', 'work_unit', 'work_unit_child_created', {
|
||||
unitKey: input.workUnit.unitKey,
|
||||
unitIndex: input.unitIndex,
|
||||
worktreePath: child.workdir,
|
||||
baseSha: input.ingestionBaseSha,
|
||||
});
|
||||
|
||||
try {
|
||||
const outcome = await input.run(child);
|
||||
if (outcome.status !== 'success') {
|
||||
cleanupOutcome = 'success';
|
||||
await input.trace.event('error', 'work_unit', 'work_unit_failed_before_patch', {
|
||||
unitKey: input.workUnit.unitKey,
|
||||
reason: outcome.reason ?? 'unknown failure',
|
||||
});
|
||||
return { ...outcome, childWorktreePath: child.workdir };
|
||||
}
|
||||
|
||||
await input.afterSuccess?.(child);
|
||||
await mkdir(input.patchDir, { recursive: true });
|
||||
const patchPath = join(input.patchDir, patchFileName(input.unitIndex, input.workUnit.unitKey));
|
||||
await child.git.writeBinaryNoRenamePatch(input.ingestionBaseSha, 'HEAD', patchPath);
|
||||
const patch = await readFile(patchPath, 'utf-8');
|
||||
const touched = parsePatchTouchedPaths(patch);
|
||||
cleanupOutcome = 'success';
|
||||
await input.trace.event('debug', 'work_unit', 'work_unit_patch_collected', {
|
||||
unitKey: input.workUnit.unitKey,
|
||||
patchPath,
|
||||
touchedPaths: touched.map((entry) => entry.path),
|
||||
patchBytes: Buffer.byteLength(patch),
|
||||
});
|
||||
return {
|
||||
...outcome,
|
||||
patchPath,
|
||||
patchTouchedPaths: touched.map((entry) => entry.path),
|
||||
childWorktreePath: child.workdir,
|
||||
};
|
||||
} catch (error) {
|
||||
await input.trace.event(
|
||||
'error',
|
||||
'work_unit',
|
||||
'work_unit_child_failed',
|
||||
{ unitKey: input.workUnit.unitKey, worktreePath: child.workdir },
|
||||
error,
|
||||
);
|
||||
cleanupOutcome = 'success';
|
||||
throw error;
|
||||
} finally {
|
||||
await input.sessionWorktreeService.cleanup(child, cleanupOutcome);
|
||||
await input.trace.event('trace', 'work_unit', 'work_unit_child_cleanup', {
|
||||
unitKey: input.workUnit.unitKey,
|
||||
outcome: cleanupOutcome,
|
||||
worktreePath: child.workdir,
|
||||
});
|
||||
}
|
||||
}
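The patch naming used by `runIsolatedWorkUnit` above can be exercised on its own. The sketch below copies `patchFileName` from the diff and runs it on two keys. The first key matches the test fixture (`wu-1`); the second (`orders/2024 q1`) is a made-up value for illustration and does not come from the repository.

```typescript
// Copied from work-unit-executor.ts in the diff above.
function patchFileName(unitIndex: number, unitKey: string): string {
  // Any run of characters outside [a-zA-Z0-9_.-] collapses into a single '-'.
  const safeKey = unitKey.replace(/[^a-zA-Z0-9_.-]+/g, '-');
  // The index is zero-padded to four digits and prefixed to the sanitized key.
  return `${String(unitIndex).padStart(4, '0')}-${safeKey}.patch`;
}

const first = patchFileName(0, 'wu-1');
const second = patchFileName(12, 'orders/2024 q1'); // hypothetical key

console.log(first); // 0000-wu-1.patch
console.log(second); // 0012-orders-2024-q1.patch
```

Because the index is zero-padded, a plain lexicographic sort of the patch directory should list patches in unit order for up to 10,000 units; whether the integration step relies on that ordering is not shown in this part of the diff.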
|
||||
|
|
@ -694,6 +694,14 @@ describe('canonical local ingest', () => {
|
|||
],
|
||||
},
|
||||
});
|
||||
expect(result.report.body.isolatedDiff).toMatchObject({
|
||||
enabled: true,
|
||||
acceptedPatches: 0,
|
||||
projectionSha: expect.any(String),
|
||||
});
|
||||
|
||||
const projectedSourcePath = join(metricflowProject.projectDir, 'semantic-layer/warehouse/orders.yaml');
|
||||
await expect(readFile(projectedSourcePath, 'utf-8')).resolves.toContain('name: orders');
|
||||
|
||||
const stagedRawPath = join(
|
||||
metricflowProject.projectDir,
|
||||
|
|
|
|||
|
|
@ -17,6 +17,24 @@ type RuntimeWithConnectionDeps = {
|
|||
};
|
||||
};
|
||||
|
||||
type RuntimeWithSlValidationDeps = {
|
||||
deps: {
|
||||
slValidator: {
|
||||
validateSingleSource(
|
||||
deps: unknown,
|
||||
connectionId: string,
|
||||
sourceName: string,
|
||||
): Promise<{ errors: string[]; warnings: string[] }>;
|
||||
};
|
||||
};
|
||||
};
|
||||
|
||||
type RuntimeWithSettingsDeps = {
|
||||
deps: {
|
||||
settings: Record<string, unknown>;
|
||||
};
|
||||
};
|
||||
|
||||
function testAgentRunner(): AgentRunnerPort {
|
||||
return { runLoop: vi.fn().mockResolvedValue({ stopReason: 'natural' as const }) };
|
||||
}
|
||||
|
|
@ -144,6 +162,77 @@ describe('createLocalBundleIngestRuntime', () => {
|
|||
]);
|
||||
});
|
||||
|
||||
it('validates manifest-backed scan sources during local ingest gates', async () => {
|
||||
await project.fileStore.writeFile(
|
||||
'semantic-layer/warehouse/_schema/public.yaml',
|
||||
[
|
||||
'tables:',
|
||||
' payments:',
|
||||
' table: public.payments',
|
||||
' columns:',
|
||||
' - name: payment_id',
|
||||
' type: string',
|
||||
' - name: amount',
|
||||
' type: number',
|
||||
'',
|
||||
].join('\n'),
|
||||
'ktx',
|
||||
'ktx@example.com',
|
||||
'Add warehouse manifest',
|
||||
);
|
||||
const agentRunner = testAgentRunner();
|
||||
|
||||
const runtime = createLocalBundleIngestRuntime({
|
||||
project,
|
||||
adapters: [new FakeSourceAdapter()],
|
||||
agentRunner,
|
||||
});
|
||||
const deps = (runtime.runner as unknown as RuntimeWithSlValidationDeps).deps;
|
||||
|
||||
await expect(deps.slValidator.validateSingleSource(deps, 'warehouse', 'payments')).resolves.toEqual({
|
||||
errors: [],
|
||||
warnings: expect.any(Array),
|
||||
});
|
||||
});
|
||||
|
||||
it('does not mask malformed direct overlays with manifest-backed fallback validation', async () => {
|
||||
await project.fileStore.writeFile(
|
||||
'semantic-layer/warehouse/_schema/public.yaml',
|
||||
[
|
||||
'tables:',
|
||||
' payments:',
|
||||
' table: public.payments',
|
||||
' columns:',
|
||||
' - name: payment_id',
|
||||
' type: string',
|
||||
'',
|
||||
].join('\n'),
|
||||
'ktx',
|
||||
'ktx@example.com',
|
||||
'Add warehouse manifest',
|
||||
);
|
||||
await project.fileStore.writeFile(
|
||||
'semantic-layer/warehouse/payments.yaml',
|
||||
['name: payments', 'columns:', ' - [', ''].join('\n'),
|
||||
'ktx',
|
||||
'ktx@example.com',
|
||||
'Add malformed overlay',
|
||||
);
|
||||
const agentRunner = testAgentRunner();
|
||||
|
||||
const runtime = createLocalBundleIngestRuntime({
|
||||
project,
|
||||
adapters: [new FakeSourceAdapter()],
|
||||
agentRunner,
|
||||
});
|
||||
const deps = (runtime.runner as unknown as RuntimeWithSlValidationDeps).deps;
|
||||
|
||||
await expect(deps.slValidator.validateSingleSource(deps, 'warehouse', 'payments')).resolves.toEqual({
|
||||
errors: [expect.stringContaining('invalid YAML')],
|
||||
warnings: [],
|
||||
});
|
||||
});
|
||||
|
||||
it('passes project connection config to local ingest query executors', async () => {
|
||||
const agentRunner = testAgentRunner();
|
||||
const queryExecutor = {
|
||||
|
|
@ -175,6 +264,27 @@ describe('createLocalBundleIngestRuntime', () => {
|
|||
});
|
||||
});
|
||||
|
||||
it('defaults local bundle ingest to isolated diffs without a shared-worktree fallback setting', () => {
|
||||
const runtime = createLocalBundleIngestRuntime({
|
||||
project,
|
||||
adapters: [new FakeSourceAdapter()],
|
||||
agentRunner: testAgentRunner(),
|
||||
});
|
||||
|
||||
const settings = (runtime.runner as unknown as RuntimeWithSettingsDeps).deps.settings;
|
||||
const fallbackSettingKey = ['sharedWorktree', 'SourceKeys'].join('');
|
||||
|
||||
expect(settings).not.toHaveProperty(fallbackSettingKey);
|
||||
expect(Object.keys(settings).sort()).toEqual([
|
||||
'ingestTraceLevel',
|
||||
'memoryIngestionModel',
|
||||
'probeRowCount',
|
||||
'workUnitFailureMode',
|
||||
'workUnitMaxConcurrency',
|
||||
'workUnitStepBudget',
|
||||
]);
|
||||
});
|
||||
|
||||
it('accepts a debug LLM request file when constructing the default agent runner', async () => {
|
||||
await writeFile(
|
||||
join(project.projectDir, 'ktx.yaml'),
|
||||
|
|
|
|||
|
|
@ -24,7 +24,6 @@ import {
|
|||
type KtxConnectionInfo,
|
||||
type KtxQueryResult,
|
||||
SemanticLayerService,
|
||||
type SemanticLayerSource,
|
||||
type SlConnectionCatalogPort,
|
||||
SlDiscoverTool,
|
||||
SlEditSourceTool,
|
||||
|
|
@ -76,6 +75,7 @@ import { createEmitHistoricSqlEvidenceTool } from './adapters/historic-sql/evide
|
|||
import { HistoricSqlProjectionPostProcessor } from './adapters/historic-sql/post-processor.js';
|
||||
import { ContextEvidenceIndexService, SqliteContextEvidenceStore } from './context-evidence/index.js';
|
||||
import { DiffSetService } from './diff-set.service.js';
|
||||
import { ingestTracePathForJob, type IngestTraceLevel } from './ingest-trace.js';
|
||||
import { IngestBundleRunner } from './ingest-bundle.runner.js';
|
||||
import { PageTriageService } from './page-triage/index.js';
|
||||
import { createWarehouseVerificationTools } from './tools/warehouse-verification/index.js';
|
||||
|
|
@ -96,6 +96,12 @@ const promptsDir = fileURLToPath(new URL('../../prompts', import.meta.url));
|
|||
const skillsDir = fileURLToPath(new URL('../../skills', import.meta.url));
|
||||
const LOCAL_AUTHOR = { name: 'KTX Local', email: 'local@ktx.local' };
|
||||
const LOCAL_SHAPE_WARNING = 'Local ingest validates semantic-layer YAML shape only.';
|
||||
const INGEST_TRACE_LEVELS = new Set<IngestTraceLevel>(['error', 'info', 'debug', 'trace']);
|
||||
|
||||
function ingestTraceLevelFromEnv(env: NodeJS.ProcessEnv = process.env): IngestTraceLevel {
|
||||
const raw = env.KTX_INGEST_TRACE_LEVEL;
|
||||
return raw && INGEST_TRACE_LEVELS.has(raw as IngestTraceLevel) ? (raw as IngestTraceLevel) : 'debug';
|
||||
}
|
||||
|
||||
export interface CreateLocalBundleIngestRuntimeOptions {
|
||||
project: KtxLocalProject;
|
||||
|
|
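The fallback behaviour of `ingestTraceLevelFromEnv` in the hunk above can be checked in isolation. This is a minimal sketch under two assumptions: `IngestTraceLevel` is redeclared locally from the four values listed in `INGEST_TRACE_LEVELS` (the real type lives in `./ingest-trace.js` and is not shown here), and a plain record stands in for `NodeJS.ProcessEnv` so the snippet needs no Node typings.

```typescript
// Assumption: local stand-in for the IngestTraceLevel type imported in the diff.
type IngestTraceLevel = 'error' | 'info' | 'debug' | 'trace';

const INGEST_TRACE_LEVELS = new Set<IngestTraceLevel>(['error', 'info', 'debug', 'trace']);

// Assumption: a plain record replaces NodeJS.ProcessEnv for self-containment.
function ingestTraceLevelFromEnv(env: Record<string, string | undefined>): IngestTraceLevel {
  const raw = env['KTX_INGEST_TRACE_LEVEL'];
  // Unset, empty, or unrecognized values all fall back to 'debug'.
  return raw && INGEST_TRACE_LEVELS.has(raw as IngestTraceLevel) ? (raw as IngestTraceLevel) : 'debug';
}

const explicit = ingestTraceLevelFromEnv({ KTX_INGEST_TRACE_LEVEL: 'trace' });
const unknown = ingestTraceLevelFromEnv({ KTX_INGEST_TRACE_LEVEL: 'verbose' });
const unset = ingestTraceLevelFromEnv({});

console.log(explicit, unknown, unset); // trace debug debug
```

An unrecognized value is ignored silently rather than rejected, so a typo in `KTX_INGEST_TRACE_LEVEL` yields `debug` traces instead of an error.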
@ -151,6 +157,10 @@ class LocalIngestStorage implements IngestStoragePort {
|
|||
resolveTranscriptDir(jobId: string): string {
|
||||
return join(this.project.projectDir, '.ktx/ingest-transcripts', jobId);
|
||||
}
|
||||
|
||||
resolveTracePath(jobId: string): string {
|
||||
return ingestTracePathForJob(this.homeDir, jobId);
|
||||
}
|
||||
}
|
||||
|
||||
class LocalIngestLock implements IngestLockPort {
|
||||
|
|
@ -237,22 +247,63 @@ class LocalSlPythonPort implements SlPythonPort {
|
|||
}
|
||||
|
||||
class LocalShapeOnlySlValidator implements SlValidatorPort<SlValidationDeps> {
|
||||
private validateParsedSource(sourceName: string, parsed: Record<string, unknown>) {
|
||||
const isOverlay = parsed.table == null && parsed.sql == null;
|
||||
const result = (isOverlay ? sourceOverlaySchema : sourceDefinitionSchema).safeParse(parsed);
|
||||
return result.success
|
||||
? { errors: [], warnings: [LOCAL_SHAPE_WARNING] }
|
||||
: {
|
||||
errors: result.error.issues.map(
|
||||
(issue) => `${sourceName}: ${issue.path.join('.') || 'source'} ${issue.message}`,
|
||||
),
|
||||
warnings: [],
|
||||
};
|
||||
}
|
||||
|
||||
private async validateComposedSource(
|
||||
deps: SlValidationDeps,
|
||||
connectionId: string,
|
||||
sourceName: string,
|
||||
readError: unknown,
|
||||
) {
|
||||
try {
|
||||
const { sources, loadErrors } = await deps.semanticLayerService.loadAllSources(connectionId);
|
||||
const source = sources.find((candidate) => candidate.name === sourceName);
|
||||
if (source) {
|
||||
return this.validateParsedSource(sourceName, source as unknown as Record<string, unknown>);
|
||||
}
|
||||
const detail =
|
||||
loadErrors.length > 0
|
||||
? loadErrors.join('; ')
|
||||
: readError instanceof Error
|
||||
? readError.message
|
||||
: String(readError);
|
||||
return { errors: [`${sourceName}: ${detail}`], warnings: [] };
|
||||
} catch (fallbackError) {
|
||||
return {
|
||||
errors: [`${sourceName}: ${fallbackError instanceof Error ? fallbackError.message : String(fallbackError)}`],
|
||||
warnings: [],
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
async validateSingleSource(deps: SlValidationDeps, connectionId: string, sourceName: string) {
|
||||
let content: string;
|
||||
try {
|
||||
const file = await deps.semanticLayerService.readSourceFile(connectionId, sourceName);
|
||||
const parsed = YAML.parse(file.content) as SemanticLayerSource;
|
||||
const isOverlay = parsed.table == null && parsed.sql == null;
|
||||
const result = (isOverlay ? sourceOverlaySchema : sourceDefinitionSchema).safeParse(parsed);
|
||||
return result.success
|
||||
? { errors: [], warnings: [LOCAL_SHAPE_WARNING] }
|
||||
: {
|
||||
errors: result.error.issues.map(
|
||||
(issue) => `${sourceName}: ${issue.path.join('.') || 'source'} ${issue.message}`,
|
||||
),
|
||||
warnings: [],
|
||||
};
|
||||
content = file.content;
|
||||
} catch (error) {
|
||||
return { errors: [`${sourceName}: ${error instanceof Error ? error.message : String(error)}`], warnings: [] };
|
||||
return this.validateComposedSource(deps, connectionId, sourceName, error);
|
||||
}
|
||||
|
||||
try {
|
||||
const parsed = YAML.parse(content) as unknown as Record<string, unknown>;
|
||||
return this.validateParsedSource(sourceName, parsed);
|
||||
} catch (error) {
|
||||
return {
|
||||
errors: [`${sourceName}: invalid YAML — ${error instanceof Error ? error.message : String(error)}`],
|
||||
warnings: [],
|
||||
};
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -671,6 +722,7 @@ export function createLocalBundleIngestRuntime(
|
|||
workUnitMaxConcurrency: options.project.config.ingest.workUnits.maxConcurrency,
|
||||
workUnitStepBudget: options.project.config.ingest.workUnits.stepBudget,
|
||||
workUnitFailureMode: options.project.config.ingest.workUnits.failureMode,
|
||||
ingestTraceLevel: ingestTraceLevelFromEnv(),
|
||||
},
|
||||
skillsRegistry: new SkillsRegistryService({ skillsDir, logger }),
|
||||
promptService,
|
||||
|
|
|
|||
|
|
@ -21,6 +21,7 @@ function snapshot(overrides: Partial<MemoryFlowReplayInput> = {}): MemoryFlowRep
|
|||
{ type: 'raw_snapshot_written', syncId: 'sync-1', rawFileCount: 2 },
|
||||
{ type: 'diff_computed', added: 1, modified: 1, deleted: 0, unchanged: 0 },
|
||||
{ type: 'chunks_planned', chunkCount: 1, workUnitCount: 1, evictionCount: 0 },
|
||||
{ type: 'stage_progress', stage: 'integration', percent: 80, message: 'Integrating 1/1 patches: orders' },
|
||||
{ type: 'work_unit_started', unitKey: 'orders', skills: ['wiki_capture'], stepBudget: 40 },
|
||||
{ type: 'work_unit_step', unitKey: 'orders', stepIndex: 1, stepBudget: 40 },
|
||||
{ type: 'candidate_action', unitKey: 'orders', target: 'wiki', action: 'created', key: 'wiki/orders.md' },
|
||||
|
|
|
|||
|
|
@ -53,6 +53,23 @@ export const memoryFlowEventSchema = z.discriminatedUnion('type', [
|
|||
stage: z.enum(['source', 'chunks', 'workUnits', 'actions', 'gates', 'saved']),
|
||||
reason: z.string().min(1),
|
||||
}),
|
||||
eventSchema({
|
||||
type: z.literal('stage_progress'),
|
||||
stage: z.enum([
|
||||
'source',
|
||||
'integration',
|
||||
'reconciliation',
|
||||
'post_processor',
|
||||
'wiki_sl_ref_repair',
|
||||
'final_gates',
|
||||
'save',
|
||||
'provenance',
|
||||
'report',
|
||||
]),
|
||||
percent: z.number().min(0).max(100),
|
||||
message: z.string().min(1),
|
||||
transient: z.boolean().optional(),
|
||||
}),
|
||||
eventSchema({
|
||||
type: z.literal('work_unit_started'),
|
||||
unitKey: z.string().min(1),
|
||||
|
|
|
|||
|
|
@ -44,6 +44,22 @@ type MemoryFlowEventPayload =
|
|||
stage: MemoryFlowColumnId;
|
||||
reason: string;
|
||||
}
|
||||
| {
|
||||
type: 'stage_progress';
|
||||
stage:
|
||||
| 'source'
|
||||
| 'integration'
|
||||
| 'reconciliation'
|
||||
| 'post_processor'
|
||||
| 'wiki_sl_ref_repair'
|
||||
| 'final_gates'
|
||||
| 'save'
|
||||
| 'provenance'
|
||||
| 'report';
|
||||
percent: number;
|
||||
message: string;
|
||||
transient?: boolean;
|
||||
}
|
||||
| {
|
||||
type: 'work_unit_started';
|
||||
unitKey: string;
|
||||
|
|
|
|||
|
|
@ -16,6 +16,7 @@ import type {
|
|||
import type { ToolContext, ToolSession, TouchedSlSource } from '../tools/index.js';
|
||||
import type { KnowledgeIndexPort, KnowledgeWikiService } from '../wiki/index.js';
|
||||
import type { CanonicalPin } from './canonical-pins.js';
|
||||
import type { IngestTraceLevel } from './ingest-trace.js';
|
||||
import type { IngestReportSnapshot } from './reports.js';
|
||||
import type {
|
||||
ReconcileCandidateForPrompt,
|
||||
|
|
@ -142,6 +143,7 @@ export interface IngestSettingsPort {
|
|||
workUnitMaxConcurrency?: number;
|
||||
workUnitStepBudget?: number;
|
||||
workUnitFailureMode?: 'abort' | 'continue';
|
||||
ingestTraceLevel?: IngestTraceLevel;
|
||||
}
|
||||
|
||||
export interface IngestGitAuthor {
|
||||
|
|
@ -155,6 +157,7 @@ export interface IngestStoragePort {
|
|||
resolveUploadDir(uploadId: string): string;
|
||||
resolvePullDir(jobId: string): string;
|
||||
resolveTranscriptDir(jobId: string): string;
|
||||
resolveTracePath(jobId: string): string;
|
||||
}
|
||||
|
||||
export interface IngestCommitMessagePort {
|
||||
|
|
|
|||
|
|
@ -206,6 +206,47 @@ describe('parseIngestReportSnapshot', () => {
|
|||
expect(snapshot.body.toolTranscripts).toEqual([]);
|
||||
});
|
||||
|
||||
it('parses failed ingest reports with trace and failure details', () => {
|
||||
const snapshot = parseIngestReportSnapshot({
|
||||
id: 'report-failed',
|
||||
runId: 'run-failed',
|
||||
jobId: 'job-failed',
|
||||
connectionId: 'warehouse',
|
||||
sourceKey: 'metabase',
|
||||
createdAt: '2026-05-17T12:00:00.000Z',
|
||||
body: {
|
||||
status: 'failed',
|
||||
syncId: 'sync-failed',
|
||||
diffSummary: { added: 1, modified: 0, deleted: 0, unchanged: 0 },
|
||||
commitSha: null,
|
||||
tracePath: '/project/.ktx/ingest-traces/job-failed/trace.jsonl',
|
||||
failure: {
|
||||
phase: 'final_gates',
|
||||
message: 'final artifact gates failed',
|
||||
},
|
||||
workUnits: [],
|
||||
failedWorkUnits: [],
|
||||
reconciliationSkipped: true,
|
||||
conflictsResolved: [],
|
||||
evictionsApplied: [],
|
||||
unmappedFallbacks: [],
|
||||
evictionInputs: [],
|
||||
unresolvedCards: [],
|
||||
supersededBy: null,
|
||||
overrideOf: null,
|
||||
provenanceRows: [],
|
||||
toolTranscripts: [],
|
||||
},
|
||||
});
|
||||
|
||||
expect(snapshot.body.status).toBe('failed');
|
||||
expect(snapshot.body.failure).toEqual({
|
||||
phase: 'final_gates',
|
||||
message: 'final artifact gates failed',
|
||||
});
|
||||
expect(snapshot.body.tracePath).toContain('trace.jsonl');
|
||||
});
|
||||
|
||||
it('rejects malformed report snapshots with a concise message', () => {
|
||||
const report = validReportSnapshot();
|
||||
report.body.workUnits[0] = {
|
||||
|
|
@ -215,4 +256,93 @@ describe('parseIngestReportSnapshot', () => {
|
|||
|
||||
expect(() => parseIngestReportSnapshot(report)).toThrow('Invalid ingest report snapshot');
|
||||
});
|
||||
|
||||
it('parses isolated-diff textual resolver counters', () => {
|
||||
const snapshot = parseIngestReportSnapshot({
|
||||
id: 'report-1',
|
||||
runId: 'run-1',
|
||||
jobId: 'job-1',
|
||||
connectionId: 'warehouse',
|
||||
sourceKey: 'metabase',
|
||||
createdAt: '2026-05-18T00:00:00.000Z',
|
||||
body: {
|
||||
status: 'completed',
|
||||
syncId: 'sync-1',
|
||||
diffSummary: { added: 0, modified: 1, deleted: 0, unchanged: 0 },
|
||||
commitSha: 'abc123',
|
||||
isolatedDiff: {
|
||||
enabled: true,
|
||||
acceptedPatches: 2,
|
||||
textualConflicts: 1,
|
||||
semanticConflicts: 0,
|
||||
resolverAttempts: 1,
|
||||
resolverRepairs: 1,
|
||||
resolverFailures: 0,
|
||||
},
|
||||
workUnits: [],
|
||||
failedWorkUnits: [],
|
||||
reconciliationSkipped: true,
|
||||
conflictsResolved: [],
|
||||
evictionsApplied: [],
|
||||
unmappedFallbacks: [],
|
||||
artifactResolutions: [],
|
||||
evictionInputs: [],
|
||||
unresolvedCards: [],
|
||||
supersededBy: null,
|
||||
overrideOf: null,
|
||||
provenanceRows: [],
|
||||
toolTranscripts: [],
|
||||
},
|
||||
});
|
||||
|
||||
expect(snapshot.body.isolatedDiff).toMatchObject({
|
||||
resolverAttempts: 1,
|
||||
resolverRepairs: 1,
|
||||
resolverFailures: 0,
|
||||
});
|
||||
});
|
||||
|
||||
it('parses isolated-diff gate repair counters', () => {
|
||||
const snapshot = parseIngestReportSnapshot({
|
||||
id: 'report-1',
|
||||
runId: 'run-1',
|
||||
jobId: 'job-1',
|
||||
connectionId: 'warehouse',
|
||||
sourceKey: 'metabase',
|
||||
createdAt: '2026-05-18T00:00:00.000Z',
|
||||
body: {
|
||||
status: 'completed',
|
||||
syncId: 'sync-1',
|
||||
diffSummary: { added: 1, modified: 0, deleted: 0, unchanged: 0 },
|
||||
commitSha: 'abc123',
|
||||
isolatedDiff: {
|
||||
enabled: true,
|
||||
acceptedPatches: 1,
|
||||
textualConflicts: 0,
|
||||
semanticConflicts: 1,
|
||||
gateRepairAttempts: 1,
|
||||
gateRepairs: 1,
|
||||
gateRepairFailures: 0,
|
||||
},
|
||||
workUnits: [],
|
||||
failedWorkUnits: [],
|
||||
reconciliationSkipped: true,
|
||||
conflictsResolved: [],
|
||||
evictionsApplied: [],
|
||||
unmappedFallbacks: [],
|
||||
evictionInputs: [],
|
||||
unresolvedCards: [],
|
||||
supersededBy: null,
|
||||
overrideOf: null,
|
||||
provenanceRows: [],
|
||||
toolTranscripts: [],
|
||||
},
|
||||
});
|
||||
|
||||
expect(snapshot.body.isolatedDiff).toMatchObject({
|
||||
gateRepairAttempts: 1,
|
||||
gateRepairs: 1,
|
||||
gateRepairFailures: 0,
|
||||
});
|
||||
});
|
||||
});
|
||||
|
|
|
|||
|
|
@ -123,6 +123,12 @@ const sourceFetchReportSchema = z.object({
|
|||
warnings: z.array(sourceFetchIssueSchema).default([]),
|
||||
});
|
||||
|
||||
const ingestReportFailureSchema = z.object({
|
||||
phase: z.string().min(1),
|
||||
message: z.string().min(1),
|
||||
details: z.record(z.string(), z.unknown()).optional(),
|
||||
});
|
||||
|
||||
export const ingestReportSnapshotSchema = z
|
||||
.object({
|
||||
id: z.string().min(1),
|
||||
|
|
@ -133,10 +139,30 @@ export const ingestReportSnapshotSchema = z
|
|||
createdAt: z.string().min(1),
|
||||
body: z
|
||||
.object({
|
||||
status: z.enum(['completed', 'failed']).optional(),
|
||||
syncId: z.string().min(1),
|
||||
diffSummary: ingestDiffSummarySchema,
|
||||
fetch: sourceFetchReportSchema.optional(),
|
||||
commitSha: z.string().nullable(),
|
||||
tracePath: z.string().optional(),
|
||||
failure: ingestReportFailureSchema.optional(),
|
||||
isolatedDiff: z
|
||||
.object({
|
||||
enabled: z.boolean(),
|
||||
integrationWorktreePath: z.string().optional(),
|
||||
ingestionBaseSha: z.string().optional(),
|
||||
projectionSha: z.string().nullable().optional(),
|
||||
acceptedPatches: z.number().int().min(0),
|
||||
textualConflicts: z.number().int().min(0),
|
||||
semanticConflicts: z.number().int().min(0),
|
||||
resolverAttempts: z.number().int().min(0).default(0),
|
||||
resolverRepairs: z.number().int().min(0).default(0),
|
||||
resolverFailures: z.number().int().min(0).default(0),
|
||||
gateRepairAttempts: z.number().int().min(0).default(0),
|
||||
gateRepairs: z.number().int().min(0).default(0),
|
||||
gateRepairFailures: z.number().int().min(0).default(0),
|
||||
})
|
||||
.optional(),
|
||||
workUnits: z.array(
|
||||
z.object({
|
||||
unitKey: z.string().min(1),
|
||||
|
|
|
|||
|
|
@ -48,11 +48,35 @@ export interface IngestReportPostProcessorOutcome {
|
|||
touchedSources: TouchedSlSource[];
|
||||
}
|
||||
|
||||
export interface IngestReportFailure {
|
||||
phase: string;
|
||||
message: string;
|
||||
details?: Record<string, unknown>;
|
||||
}
|
||||
|
||||
export interface IngestReportBody {
|
||||
status?: 'completed' | 'failed';
|
||||
syncId: string;
|
||||
diffSummary: IngestDiffSummary;
|
||||
fetch?: SourceFetchReport;
|
||||
commitSha: string | null;
|
||||
tracePath?: string;
|
||||
failure?: IngestReportFailure;
|
||||
isolatedDiff?: {
|
||||
enabled: boolean;
|
||||
integrationWorktreePath?: string;
|
||||
ingestionBaseSha?: string;
|
||||
projectionSha?: string | null;
|
||||
acceptedPatches: number;
|
||||
textualConflicts: number;
|
||||
semanticConflicts: number;
|
||||
resolverAttempts?: number;
|
||||
resolverRepairs?: number;
|
||||
resolverFailures?: number;
|
||||
gateRepairAttempts?: number;
|
||||
gateRepairs?: number;
|
||||
gateRepairFailures?: number;
|
||||
};
|
||||
workUnits: IngestReportWorkUnit[];
|
||||
failedWorkUnits: string[];
|
||||
reconciliationSkipped: boolean;
|
||||
|
|
|
|||
|
|
@ -0,0 +1,38 @@
|
|||
import { describe, expect, it } from 'vitest';
|
||||
import {
|
||||
assertSemanticLayerTargetPathsAllowed,
|
||||
findDisallowedSemanticLayerTargetPaths,
|
||||
semanticLayerConnectionIdFromPath,
|
||||
} from './semantic-layer-target-policy.js';
|
||||
|
||||
describe('semantic-layer target policy', () => {
|
||||
it('extracts connection ids from semantic-layer paths', () => {
|
||||
expect(semanticLayerConnectionIdFromPath('semantic-layer/warehouse/orders.yaml')).toBe('warehouse');
|
||||
expect(semanticLayerConnectionIdFromPath('a/semantic-layer/finance/orders.yaml')).toBe('finance');
|
||||
expect(semanticLayerConnectionIdFromPath('wiki/global/orders.md')).toBeNull();
|
||||
});
|
||||
|
||||
it('finds semantic-layer paths outside the allowed target connections', () => {
|
||||
expect(
|
||||
findDisallowedSemanticLayerTargetPaths({
|
||||
paths: [
|
||||
'semantic-layer/warehouse/orders.yaml',
|
||||
'semantic-layer/finance/orders.yaml',
|
||||
'wiki/global/orders.md',
|
||||
],
|
||||
allowedConnectionIds: new Set(['warehouse']),
|
||||
}),
|
||||
).toEqual([{ path: 'semantic-layer/finance/orders.yaml', connectionId: 'finance' }]);
|
||||
});
|
||||
|
||||
it('throws a deterministic error for unauthorized semantic-layer targets', () => {
|
||||
expect(() =>
|
||||
assertSemanticLayerTargetPathsAllowed({
|
||||
paths: ['semantic-layer/finance/orders.yaml', 'semantic-layer/marketing/accounts.yaml'],
|
||||
allowedConnectionIds: new Set(['warehouse']),
|
||||
}),
|
||||
).toThrow(
|
||||
/semantic-layer target connection not allowed: semantic-layer\/finance\/orders\.yaml \(finance\), semantic-layer\/marketing\/accounts\.yaml \(marketing\); allowed: warehouse/,
|
||||
);
|
||||
});
|
||||
});
|
||||
42
packages/context/src/ingest/semantic-layer-target-policy.ts
Normal file
|
|
@ -0,0 +1,42 @@
|
|||
export interface SemanticLayerTargetPolicyInput {
  paths: readonly string[];
  allowedConnectionIds: ReadonlySet<string>;
}

export interface SemanticLayerTargetPolicyViolation {
  path: string;
  connectionId: string;
}

export function semanticLayerConnectionIdFromPath(path: string): string | null {
  const normalized = path.replace(/^[ab]\//, '');
  const match = /^semantic-layer\/([^/]+)\//.exec(normalized);
  return match?.[1] ?? null;
}

export function findDisallowedSemanticLayerTargetPaths(
  input: SemanticLayerTargetPolicyInput,
): SemanticLayerTargetPolicyViolation[] {
  return input.paths
    .map((path) => ({ path, connectionId: semanticLayerConnectionIdFromPath(path) }))
    .filter((entry): entry is SemanticLayerTargetPolicyViolation => {
      return entry.connectionId !== null && !input.allowedConnectionIds.has(entry.connectionId);
    })
    .sort((left, right) => {
      const byConnection = left.connectionId.localeCompare(right.connectionId);
      return byConnection === 0 ? left.path.localeCompare(right.path) : byConnection;
    });
}

export function assertSemanticLayerTargetPathsAllowed(input: SemanticLayerTargetPolicyInput): void {
  const violations = findDisallowedSemanticLayerTargetPaths(input);
  if (violations.length === 0) {
    return;
  }
  const allowed = [...input.allowedConnectionIds].sort();
  throw new Error(
    `semantic-layer target connection not allowed: ${violations
      .map((violation) => `${violation.path} (${violation.connectionId})`)
      .join(', ')}; allowed: ${allowed.length > 0 ? allowed.join(', ') : '(none)'}`,
  );
}
|
|
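The target policy above can be illustrated with a minimal, self-contained sketch. The path helper is copied from the diff so the block runs on its own; the `touchedPaths` list and the `allowed` set are hypothetical inputs chosen for illustration, not values taken from the ingest pipeline.

```typescript
// Copied from the diff above so this sketch is self-contained.
function semanticLayerConnectionIdFromPath(path: string): string | null {
  // Strip a leading git-style "a/" or "b/" prefix before matching.
  const normalized = path.replace(/^[ab]\//, '');
  const match = /^semantic-layer\/([^/]+)\//.exec(normalized);
  return match?.[1] ?? null;
}

// Hypothetical list of paths touched by a work-unit patch (assumption).
const touchedPaths = [
  'b/semantic-layer/finance/orders.yaml',
  'semantic-layer/warehouse/orders.yaml',
  'wiki/global/orders.md',
];

// Hypothetical set of connections this ingest run may write to (assumption).
const allowed: ReadonlySet<string> = new Set(['warehouse']);

// Keep only semantic-layer paths whose connection is not allowed;
// non-semantic-layer paths (connection id null) are never flagged.
const disallowed = touchedPaths.filter((path) => {
  const connectionId = semanticLayerConnectionIdFromPath(path);
  return connectionId !== null && !allowed.has(connectionId);
});

console.log(disallowed);
```

Note that wiki paths pass through untouched: the policy only constrains paths under `semantic-layer/<connectionId>/`.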
@ -41,6 +41,9 @@ export interface WorkUnitOutcome {
|
|||
touchedSlSources: TouchedSlSource[];
|
||||
slDisallowed?: boolean;
|
||||
slDisallowedReason?: 'lookml_connection_mismatch';
|
||||
patchPath?: string;
|
||||
patchTouchedPaths?: string[];
|
||||
childWorktreePath?: string;
|
||||
}
|
||||
|
||||
export async function executeWorkUnit(deps: WorkUnitExecutionDeps, wu: WorkUnit): Promise<WorkUnitOutcome> {
|
||||
|
|
|
|||
|
|
@ -1,4 +1,5 @@
|
|||
import type { KtxEmbeddingPort } from '../core/embedding.js';
|
||||
import type { SemanticLayerService } from '../sl/index.js';
|
||||
import type { MemoryFlowEventSink } from './memory-flow/types.js';
|
||||
|
||||
export type IngestTrigger = 'upload' | 'scheduled_pull' | 'manual_resync' | 'manual_override';
|
||||
|
|
@ -47,6 +48,7 @@ export interface ChunkResult {
|
|||
export interface FetchContext {
|
||||
connectionId: string;
|
||||
sourceKey: string;
|
||||
memoryFlow?: MemoryFlowEventSink;
|
||||
}
|
||||
|
||||
type SourceFetchIssueKind =
|
||||
|
|
@ -96,6 +98,26 @@ export interface ClusterWorkUnitsContext {
|
|||
embedding: KtxEmbeddingPort;
|
||||
}
|
||||
|
||||
export interface DeterministicProjectionContext {
|
||||
connectionId: string;
|
||||
sourceKey: string;
|
||||
syncId: string;
|
||||
jobId: string;
|
||||
runId: string;
|
||||
stagedDir: string;
|
||||
workdir: string;
|
||||
parseArtifacts?: unknown;
|
||||
semanticLayerService: SemanticLayerService;
|
||||
}
|
||||
|
||||
export interface ProjectionResult {
|
||||
warnings: string[];
|
||||
errors: string[];
|
||||
touchedSources: Array<{ connectionId: string; sourceName: string }>;
|
||||
changedWikiPageKeys: string[];
|
||||
result?: unknown;
|
||||
}
|
||||
|
||||
export interface SourceAdapter {
|
||||
readonly source: string;
|
||||
readonly skillNames: string[];
|
||||
|
|
@ -109,6 +131,7 @@ export interface SourceAdapter {
|
|||
listTargetConnectionIds?(stagedDir: string): Promise<string[]>;
|
||||
chunk(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult>;
|
||||
clusterWorkUnits?(ctx: ClusterWorkUnitsContext): Promise<WorkUnit[]>;
|
||||
project?(ctx: DeterministicProjectionContext): Promise<ProjectionResult>;
|
||||
describeScope?(stagedDir: string): Promise<ScopeDescriptor>;
|
||||
onPullSucceeded?(ctx: {
|
||||
connectionId: string;
|
||||
|
|
|
|||
153
packages/context/src/ingest/wiki-body-refs.test.ts
Normal file
|
|
@ -0,0 +1,153 @@
|
|||
import { describe, expect, it } from 'vitest';
import { findInvalidWikiBodyRefs, parseWikiBodyRefs } from './wiki-body-refs.js';

const sources = [
  {
    name: 'mart_account_segments',
    grain: ['account_id'],
    columns: [
      { name: 'account_id', type: 'string' },
      { name: 'segment', type: 'string' },
    ],
    joins: [],
    measures: [{ name: 'total_contract_arr', expr: 'sum(contract_arr)' }],
    segments: [{ name: 'enterprise', expr: "segment = 'enterprise'" }],
    table: 'analytics.mart_account_segments',
  },
];

describe('wiki body refs', () => {
  it('parses only explicit inline-code body references outside fenced blocks', () => {
    const body = [
      'Valid `mart_account_segments.total_contract_arr` and `source:mart_account_segments`.',
      'Also `warehouse/mart_account_segments.segment` and `table:analytics.mart_account_segments`.',
      'Ignore prose mart_account_segments.total_contract_arr_cents.',
      'Ignore `single_token`.',
      'Ignore wildcard pattern `mart_nrr_quarterly.*_arr_cents`.',
      'Ignore condition `users.is_internal = false`.',
      '```sql',
      'select `mart_account_segments.total_contract_arr_cents`',
      '```',
    ].join('\n');

    expect(parseWikiBodyRefs(body)).toEqual([
      { kind: 'sl_entity', connectionId: null, sourceName: 'mart_account_segments', entityName: 'total_contract_arr' },
      { kind: 'sl_source', connectionId: null, sourceName: 'mart_account_segments' },
      { kind: 'sl_entity', connectionId: 'warehouse', sourceName: 'mart_account_segments', entityName: 'segment' },
      { kind: 'table', connectionId: null, tableRef: 'analytics.mart_account_segments' },
    ]);
  });

  it('rejects stale inline-code semantic-layer references', async () => {
    const invalid = await findInvalidWikiBodyRefs({
      pageKey: 'account-segments',
      body: 'ARR is documented as `mart_account_segments.total_contract_arr_cents`.',
      visibleConnectionIds: ['warehouse'],
      loadSources: async () => sources,
      tableExists: async () => true,
    });

    expect(invalid).toEqual([
      'account-segments: unknown semantic-layer entity mart_account_segments.total_contract_arr_cents',
    ]);
  });

  it('does not treat wildcard inline-code patterns as exact semantic-layer entity references', async () => {
    const invalid = await findInvalidWikiBodyRefs({
      pageKey: 'revenue-metrics-encoding',
      body: 'Cents columns include `mart_nrr_quarterly.*_arr_cents` and `mart_retention_movement_breakout.*_arr_cents`.',
      visibleConnectionIds: ['warehouse'],
      loadSources: async () => [
        { name: 'mart_nrr_quarterly', grain: [], columns: [], joins: [], measures: [], table: 'analytics.mart_nrr_quarterly' },
        {
          name: 'mart_retention_movement_breakout',
          grain: [],
          columns: [],
          joins: [],
          measures: [],
          table: 'analytics.mart_retention_movement_breakout',
        },
      ],
      tableExists: async () => true,
    });

    expect(invalid).toEqual([]);
  });

  it('does not treat inline-code SQL predicates as exact semantic-layer entity references', async () => {
    const invalid = await findInvalidWikiBodyRefs({
      pageKey: 'account-reporting-exclusions',
      body: 'Exclude internal users with `users.is_internal = false` and test users with `users.is_test = false`.',
      visibleConnectionIds: ['warehouse'],
      loadSources: async () => [
        {
          name: 'users',
          grain: [],
          columns: [
            { name: 'is_internal', type: 'boolean' },
            { name: 'is_test', type: 'boolean' },
          ],
          joins: [],
          measures: [],
          table: 'analytics.users',
        },
      ],
      tableExists: async () => true,
    });

    expect(invalid).toEqual([]);
  });

  it('validates source, dimension, segment, measure, and table references', async () => {
    const invalid = await findInvalidWikiBodyRefs({
      pageKey: 'account-segments',
      body: [
        '`mart_account_segments.total_contract_arr`',
        '`mart_account_segments.segment`',
        '`mart_account_segments.enterprise`',
        '`source:mart_account_segments`',
        '`table:analytics.mart_account_segments`',
      ].join('\n'),
      visibleConnectionIds: ['warehouse'],
      loadSources: async () => sources,
      tableExists: async (_connectionId, tableRef) => tableRef === 'analytics.mart_account_segments',
    });

    expect(invalid).toEqual([]);
  });

  it('ignores two-part inline code when the source is not visible', async () => {
    const invalid = await findInvalidWikiBodyRefs({
      pageKey: 'engineering-notes',
      body: [
        'A version token like `node.v22` is not a semantic-layer reference.',
        'A raw table must use `table:analytics.mart_account_segments`.',
      ].join('\n'),
      visibleConnectionIds: ['warehouse'],
      loadSources: async () => sources,
      tableExists: async (_connectionId, tableRef) => tableRef === 'analytics.mart_account_segments',
    });

    expect(invalid).toEqual([]);
  });

  it('still rejects explicit missing source and table references', async () => {
    const invalid = await findInvalidWikiBodyRefs({
      pageKey: 'account-segments',
      body: [
        '`source:missing_source`',
        '`warehouse/source:missing_source`',
        '`table:analytics.missing_table`',
      ].join('\n'),
      visibleConnectionIds: ['warehouse'],
      loadSources: async () => sources,
      tableExists: async () => false,
    });

    expect(invalid).toEqual([
      'account-segments: unknown semantic-layer source missing_source',
      'account-segments: unknown semantic-layer source warehouse/missing_source',
      'account-segments: unknown raw table analytics.missing_table',
    ]);
  });
});
141
packages/context/src/ingest/wiki-body-refs.ts
Normal file
|
|
@ -0,0 +1,141 @@
|
|||
import type { SemanticLayerSource } from '../sl/index.js';

export type WikiBodyRef =
  | { kind: 'sl_entity'; connectionId: string | null; sourceName: string; entityName: string }
  | { kind: 'sl_source'; connectionId: string | null; sourceName: string }
  | { kind: 'table'; connectionId: string | null; tableRef: string };

export interface WikiBodyRefValidationInput {
  pageKey: string;
  body: string;
  visibleConnectionIds: string[];
  loadSources(connectionId: string): Promise<SemanticLayerSource[]>;
  tableExists(connectionId: string, tableRef: string): Promise<boolean>;
}

const inlineCodePattern = /`([^`\n]+)`/g;

function visibleLinesOutsideFences(body: string): string[] {
  const lines: string[] = [];
  let fenced = false;
  for (const line of body.split('\n')) {
    if (/^\s*```/.test(line)) {
      fenced = !fenced;
      continue;
    }
    if (!fenced) {
      lines.push(line);
    }
  }
  return lines;
}

function parseConnectionScoped(value: string): { connectionId: string | null; body: string } {
  const slash = value.indexOf('/');
  if (slash <= 0) {
    return { connectionId: null, body: value };
  }
  return { connectionId: value.slice(0, slash), body: value.slice(slash + 1) };
}

function isIdentifierToken(value: string): boolean {
  return /^[A-Za-z_][A-Za-z0-9_]*$/.test(value);
}

export function parseWikiBodyRefs(body: string): WikiBodyRef[] {
  const refs: WikiBodyRef[] = [];
  for (const line of visibleLinesOutsideFences(body)) {
    for (const match of line.matchAll(inlineCodePattern)) {
      const token = (match[1] ?? '').trim();
      if (!token) {
        continue;
      }
      const scoped = parseConnectionScoped(token);
      if (scoped.body.startsWith('source:')) {
        const sourceName = scoped.body.slice('source:'.length).trim();
        if (sourceName) {
          refs.push({ kind: 'sl_source', connectionId: scoped.connectionId, sourceName });
        }
        continue;
      }
      if (scoped.body.startsWith('table:')) {
        const tableRef = scoped.body.slice('table:'.length).trim();
        if (tableRef) {
          refs.push({ kind: 'table', connectionId: scoped.connectionId, tableRef });
        }
        continue;
      }
      const parts = scoped.body.split('.');
      if (parts.length === 2 && isIdentifierToken(parts[0] ?? '') && isIdentifierToken(parts[1] ?? '')) {
        refs.push({
          kind: 'sl_entity',
          connectionId: scoped.connectionId,
          sourceName: parts[0],
          entityName: parts[1],
        });
      }
    }
  }
  return refs;
}

function entityNames(source: SemanticLayerSource): Set<string> {
  return new Set([
    ...(source.measures ?? []).map((measure) => measure.name),
    ...(source.columns ?? []).map((column) => column.name),
    ...(source.segments ?? []).map((segment) => segment.name),
  ]);
}

export async function findInvalidWikiBodyRefs(input: WikiBodyRefValidationInput): Promise<string[]> {
  const errors: string[] = [];
  const sourceCache = new Map<string, SemanticLayerSource[]>();
  const loadSources = async (connectionId: string): Promise<SemanticLayerSource[]> => {
    const cached = sourceCache.get(connectionId);
    if (cached) {
      return cached;
    }
    const sources = await input.loadSources(connectionId);
    sourceCache.set(connectionId, sources);
    return sources;
  };

  const findSource = async (
    connectionIds: string[],
    sourceName: string,
  ): Promise<{ connectionId: string; source: SemanticLayerSource } | null> => {
    for (const connectionId of connectionIds) {
      const source = (await loadSources(connectionId)).find((candidate) => candidate.name === sourceName);
      if (source) {
        return { connectionId, source };
      }
    }
    return null;
  };

  for (const ref of parseWikiBodyRefs(input.body)) {
    const connectionIds = ref.connectionId ? [ref.connectionId] : input.visibleConnectionIds;
    if (ref.kind === 'table') {
      const found = await Promise.all(connectionIds.map((connectionId) => input.tableExists(connectionId, ref.tableRef)));
      if (!found.some(Boolean)) {
        errors.push(`${input.pageKey}: unknown raw table ${ref.connectionId ? `${ref.connectionId}/` : ''}${ref.tableRef}`);
      }
      continue;
    }

    const found = await findSource(connectionIds, ref.sourceName);
    if (!found) {
      if (ref.kind === 'sl_source') {
        errors.push(
          `${input.pageKey}: unknown semantic-layer source ${ref.connectionId ? `${ref.connectionId}/` : ''}${ref.sourceName}`,
        );
      }
      continue;
    }
    if (ref.kind === 'sl_entity' && !entityNames(found.source).has(ref.entityName)) {
      errors.push(`${input.pageKey}: unknown semantic-layer entity ${ref.sourceName}.${ref.entityName}`);
    }
  }

  return errors;
}
|
|
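The token rules in `parseWikiBodyRefs` can be summarized with a simplified sketch. `classifyToken` and `TokenKind` are hypothetical names introduced only for this illustration; the function restates the same prefix and identifier checks but returns just the kind, and it assumes the token has already been extracted from inline code outside a fenced block.

```typescript
// Hypothetical helper (not part of the diff): classifies one inline-code token
// using the same checks parseWikiBodyRefs applies.
type TokenKind = 'sl_source' | 'table' | 'sl_entity' | 'ignored';

function classifyToken(token: string): TokenKind {
  // An optional "<connectionId>/" prefix scopes the reference to one connection.
  const slash = token.indexOf('/');
  const body = slash > 0 ? token.slice(slash + 1) : token;

  if (body.startsWith('source:')) {
    return body.slice('source:'.length).trim() ? 'sl_source' : 'ignored';
  }
  if (body.startsWith('table:')) {
    return body.slice('table:'.length).trim() ? 'table' : 'ignored';
  }

  // Anything else must be exactly "<identifier>.<identifier>" to count as an entity.
  const isIdentifier = (value: string): boolean => /^[A-Za-z_][A-Za-z0-9_]*$/.test(value);
  const parts = body.split('.');
  return parts.length === 2 && isIdentifier(parts[0] ?? '') && isIdentifier(parts[1] ?? '')
    ? 'sl_entity'
    : 'ignored';
}

const kinds = [
  'source:mart_account_segments',
  'warehouse/mart_account_segments.segment',
  'table:analytics.mart_account_segments',
  'single_token',
  'mart_nrr_quarterly.*_arr_cents',
  'users.is_internal = false',
].map((token) => classifyToken(token));

console.log(kinds);
```

A two-part token such as `node.v22` is still classified as an entity candidate here; in the diff, `findInvalidWikiBodyRefs` reports an entity only when its source is found among the visible connections, which is why such tokens produce no error.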
@ -78,6 +78,11 @@ describe('ClaudeCodeKtxLlmRuntime', () => {
|
|||
skills: [],
|
||||
plugins: [],
|
||||
tools: [],
|
||||
managedSettings: {
|
||||
allowManagedMcpServersOnly: true,
|
||||
allowedMcpServers: [],
|
||||
},
|
||||
strictMcpConfig: true,
|
||||
allowedTools: [],
|
||||
permissionMode: 'dontAsk',
|
||||
persistSession: false,
|
||||
|
|
@ -144,6 +149,11 @@ describe('ClaudeCodeKtxLlmRuntime', () => {
|
|||
|
||||
const options = query.mock.calls[0][0].options;
|
||||
expect(options.allowedTools).toEqual(['mcp__ktx__load_skill']);
|
||||
expect(options.managedSettings).toEqual({
|
||||
allowManagedMcpServersOnly: true,
|
||||
allowedMcpServers: [{ serverName: 'ktx' }],
|
||||
});
|
||||
expect(options.strictMcpConfig).toBe(true);
|
||||
expect(await options.canUseTool('mcp__ktx__load_skill', {}, { signal: new AbortController().signal, toolUseID: '1' })).toEqual({
|
||||
behavior: 'allow',
|
||||
toolUseID: '1',
|
||||
|
|
@ -176,6 +186,11 @@ describe('ClaudeCodeKtxLlmRuntime', () => {
|
|||
skills: [],
|
||||
plugins: [],
|
||||
tools: [],
|
||||
managedSettings: {
|
||||
allowManagedMcpServersOnly: true,
|
||||
allowedMcpServers: [],
|
||||
},
|
||||
strictMcpConfig: true,
|
||||
allowedTools: [],
|
||||
permissionMode: 'dontAsk',
|
||||
persistSession: false,
|
||||
|
|
@ -268,6 +283,11 @@ describe('ClaudeCodeKtxLlmRuntime', () => {
|
|||
|
||||
const options = query.mock.calls[0][0].options;
|
||||
expect(options.allowedTools).toEqual(['mcp__ktx__load_skill']);
|
||||
expect(options.managedSettings).toEqual({
|
||||
allowManagedMcpServersOnly: true,
|
||||
allowedMcpServers: [{ serverName: 'ktx' }],
|
||||
});
|
||||
expect(options.strictMcpConfig).toBe(true);
|
||||
expect(await options.canUseTool('mcp__ktx__load_skill', {}, { signal: new AbortController().signal, toolUseID: '1' })).toEqual({
|
||||
behavior: 'allow',
|
||||
toolUseID: '1',
|
||||
|
|
@ -334,6 +354,10 @@ describe('ClaudeCodeKtxLlmRuntime', () => {
|
|||
answer: 'yes',
|
||||
});
|
||||
expect(objectQuery.mock.calls[0][0].options.env).toEqual(expect.objectContaining({ PATH: '/usr/bin' }));
|
||||
expect(objectQuery.mock.calls[0][0].options.managedSettings).toEqual({
|
||||
allowManagedMcpServersOnly: true,
|
||||
allowedMcpServers: [],
|
||||
});
|
||||
expect(objectQuery.mock.calls[0][0].options.env).not.toEqual(
|
||||
expect.objectContaining({ ANTHROPIC_API_KEY: 'sk-ant-test', AWS_PROFILE: 'prod' }), // pragma: allowlist secret
|
||||
);
|
||||
|
|
@ -374,6 +398,10 @@ describe('ClaudeCodeKtxLlmRuntime', () => {
|
|||
telemetryTags: { operationName: 'test' },
|
||||
});
|
||||
expect(agentQuery.mock.calls[0][0].options.env).toEqual(expect.objectContaining({ HOME: '/Users/test' }));
|
||||
expect(agentQuery.mock.calls[0][0].options.managedSettings).toEqual({
|
||||
allowManagedMcpServersOnly: true,
|
||||
allowedMcpServers: [{ serverName: 'ktx' }],
|
||||
});
|
||||
expect(agentQuery.mock.calls[0][0].options.env).not.toEqual(
|
||||
expect.objectContaining({ ANTHROPIC_AUTH_TOKEN: 'token', CLAUDE_CODE_USE_VERTEX: '1' }),
|
||||
);
|
||||
|
|
@ -442,6 +470,11 @@ describe('ClaudeCodeKtxLlmRuntime', () => {
|
|||
skills: [],
|
||||
plugins: [],
|
||||
tools: [],
|
||||
managedSettings: {
|
||||
allowManagedMcpServersOnly: true,
|
||||
allowedMcpServers: [],
|
||||
},
|
||||
strictMcpConfig: true,
|
||||
allowedTools: [],
|
||||
persistSession: false,
|
||||
env: expect.not.objectContaining({ ANTHROPIC_API_KEY: 'sk-ant-test' }),
|
||||
|
|
|
|||
|
|
@ -45,6 +45,8 @@ const BUILTIN_TOOLS = [
|
|||
'TodoWrite',
|
||||
];
|
||||
|
||||
const KTX_MCP_SERVER_NAME = 'ktx';
|
||||
|
||||
function isResult(message: SDKMessage): message is SDKResultMessage {
|
||||
return message.type === 'result';
|
||||
}
|
||||
|
|
@ -113,7 +115,14 @@ function assertInitIsolation(
|
|||
}
|
||||
|
||||
function expectedMcpServerNames(tools: KtxRuntimeToolSet | undefined): Set<string> {
|
||||
-  return tools && Object.keys(tools).length > 0 ? new Set(['ktx']) : new Set();
+  return tools && Object.keys(tools).length > 0 ? new Set([KTX_MCP_SERVER_NAME]) : new Set();
|
||||
}
|
||||
|
||||
function managedMcpSettings(serverNames: string[]): NonNullable<Options['managedSettings']> {
|
||||
return {
|
||||
allowManagedMcpServersOnly: true,
|
||||
allowedMcpServers: serverNames.map((serverName) => ({ serverName })),
|
||||
};
|
||||
}
|
||||
|
||||
function baseOptions(input: {
|
||||
|
|
@ -125,6 +134,7 @@ function baseOptions(input: {
|
|||
}): Options {
|
||||
const toolIds = mcpToolIds(input.tools ?? {});
|
||||
const allowedToolIds = new Set(toolIds);
|
||||
const expectedServerNames = [...expectedMcpServerNames(input.tools)];
|
||||
return {
|
||||
cwd: input.projectDir,
|
||||
model: input.model,
|
||||
|
|
@ -133,6 +143,8 @@ function baseOptions(input: {
|
|||
skills: [],
|
||||
plugins: [],
|
||||
tools: [],
|
||||
managedSettings: managedMcpSettings(expectedServerNames),
|
||||
strictMcpConfig: true,
|
||||
allowedTools: toolIds,
|
||||
disallowedTools: BUILTIN_TOOLS,
|
||||
canUseTool: async (toolName, _toolInput, options) =>
|
||||
|
|
@ -147,7 +159,14 @@ function baseOptions(input: {
|
|||
persistSession: false,
|
||||
env: createKtxClaudeCodeEnv(input.env),
|
||||
...(input.tools && Object.keys(input.tools).length > 0
|
||||
-      ? { mcpServers: { ktx: createSdkMcpServer({ name: 'ktx', tools: createClaudeSdkTools(input.tools) }) } }
+      ? {
+          mcpServers: {
+            [KTX_MCP_SERVER_NAME]: createSdkMcpServer({
+              name: KTX_MCP_SERVER_NAME,
+              tools: createClaudeSdkTools(input.tools),
+            }),
+          },
+        }
|
||||
: {}),
|
||||
};
|
||||
}
|
||||
|
|
|
|||
|
|
@ -99,6 +99,27 @@ describe('SlEditSourceTool — session gating', () => {
|
|||
);
|
||||
});
|
||||
|
||||
it('rejects session-scoped edits outside allowed target connections', async () => {
|
||||
const { tool } = makeTool();
|
||||
const session = makeSession({
|
||||
allowedConnectionNames: new Set(['warehouse']),
|
||||
});
|
||||
const context: ToolContext = { ...baseContext, session };
|
||||
|
||||
const result = await tool.call(
|
||||
{
|
||||
connectionId: 'finance',
|
||||
sourceName: 'orders',
|
||||
yaml_edits: [{ oldText: 'measures: []', newText: 'measures: []' }],
|
||||
} as any,
|
||||
context,
|
||||
);
|
||||
|
||||
expect(result.structured.success).toBe(false);
|
||||
expect(result.markdown).toContain('connectionId "finance" is outside this ingest session');
|
||||
expect(session.actions).toEqual([]);
|
||||
});
|
||||
|
||||
it('indexes normally when no session is present', async () => {
|
||||
const { tool, slSearchService } = makeTool();
|
||||
const result = await tool.call(
|
||||
|
|
|
|||
|
|
@ -1,6 +1,12 @@
|
|||
import YAML from 'yaml';
|
||||
import { z } from 'zod';
|
||||
-import { addTouchedSlSource, type ToolContext, type ToolOutput, validateActionRawPaths } from '../../tools/index.js';
+import {
+  addTouchedSlSource,
+  type ToolContext,
+  type ToolOutput,
+  validateActionRawPaths,
+  validateActionTargetConnection,
+} from '../../tools/index.js';
|
||||
import { applySqlEdits } from '../../tools/sql-edit-replacer.js';
|
||||
import { normalizeSemanticLayerDescriptions } from '../description-normalization.js';
|
||||
import type { SemanticLayerSource } from '../types.js';
|
||||
|
|
@ -79,6 +85,10 @@ If no source exists yet, use sl_write_source instead — this tool will reject t
|
|||
|
||||
const semanticLayerService = context.session?.semanticLayerService ?? this.semanticLayerService;
|
||||
const skipIndex = context.session?.isWorktreeScoped === true;
|
||||
const targetConnectionValidation = validateActionTargetConnection(context.session, connectionId);
|
||||
if (!targetConnectionValidation.ok) {
|
||||
return this.buildOutput(false, [targetConnectionValidation.error], sourceName);
|
||||
}
|
||||
const rawPathValidation = validateActionRawPaths(context.session, input.rawPaths);
|
||||
if (!rawPathValidation.ok) {
|
||||
return this.buildOutput(false, [rawPathValidation.error], sourceName);
|
||||
|
|
|
|||
|
|
@ -133,6 +133,34 @@ describe('SlWriteSourceTool — session gating', () => {
|
|||
);
|
||||
});
|
||||
|
||||
it('rejects session-scoped writes outside allowed target connections', async () => {
|
||||
const { tool } = makeTool();
|
||||
const session = makeSession({
|
||||
allowedConnectionNames: new Set(['warehouse']),
|
||||
});
|
||||
const context: ToolContext = { ...baseContext, session };
|
||||
|
||||
const result = await tool.call(
|
||||
{
|
||||
connectionId: 'finance',
|
||||
sourceName: 'finance_orders',
|
||||
source: {
|
||||
name: 'finance_orders',
|
||||
table: 'public.orders',
|
||||
grain: ['id'],
|
||||
columns: [{ name: 'id', type: 'string' }],
|
||||
measures: [],
|
||||
joins: [],
|
||||
} as any,
|
||||
} as any,
|
||||
context,
|
||||
);
|
||||
|
||||
expect(result.structured.success).toBe(false);
|
||||
expect(result.markdown).toContain('connectionId "finance" is outside this ingest session');
|
||||
expect(session.actions).toEqual([]);
|
||||
});
|
||||
|
||||
it('indexes normally when no session is present', async () => {
|
||||
const { tool, slSearchService } = makeTool();
|
||||
const result = await tool.call(
|
||||
|
|
|
|||
|
|
@ -1,6 +1,12 @@
|
|||
import YAML from 'yaml';
|
||||
import { z } from 'zod';
|
||||
-import { addTouchedSlSource, type ToolContext, type ToolOutput, validateActionRawPaths } from '../../tools/index.js';
+import {
+  addTouchedSlSource,
+  type ToolContext,
+  type ToolOutput,
+  validateActionRawPaths,
+  validateActionTargetConnection,
+} from '../../tools/index.js';
|
||||
import { sourceOverlaySchema } from '../schemas.js';
|
||||
import type { SemanticLayerService } from '../semantic-layer.service.js';
|
||||
import type { SemanticLayerSource } from '../types.js';
|
||||
|
|
@ -106,6 +112,10 @@ Do NOT join back to a table that the SQL already aggregates from if the grain co
|
|||
|
||||
const semanticLayerService = context.session?.semanticLayerService ?? this.semanticLayerService;
|
||||
const skipIndex = context.session?.isWorktreeScoped === true;
|
||||
const targetConnectionValidation = validateActionTargetConnection(context.session, connectionId);
|
||||
if (!targetConnectionValidation.ok) {
|
||||
return this.buildOutput(false, [targetConnectionValidation.error], sourceName);
|
||||
}
|
||||
const rawPathValidation = validateActionRawPaths(context.session, input.rawPaths);
|
||||
if (!rawPathValidation.ok) {
|
||||
return this.buildOutput(false, [rawPathValidation.error], sourceName);
|
||||
|
|
|
|||
23
packages/context/src/tools/action-target-connection.ts
Normal file
|
|
@ -0,0 +1,23 @@
|
|||
import type { ToolSession } from './tool-session.js';

type ActionTargetConnectionValidation = { ok: true } | { ok: false; error: string };

export function validateActionTargetConnection(
  session: ToolSession | undefined,
  connectionId: string,
): ActionTargetConnectionValidation {
  const allowed = session?.allowedConnectionNames;
  if (!allowed) {
    return { ok: true };
  }
  if (allowed.has(connectionId)) {
    return { ok: true };
  }
  const allowedList = [...allowed].sort();
  return {
    ok: false,
    error: `connectionId "${connectionId}" is outside this ingest session's allowed target connections: ${
      allowedList.length > 0 ? allowedList.join(', ') : '(none)'
    }`,
  };
}
|
|
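A caller-side sketch of the validation result may help show how the `{ ok }` union is meant to be consumed. `SessionLike` and `validateTarget` are hypothetical stand-ins for `ToolSession` and `validateActionTargetConnection`, reduced to the single field the validator reads; treating `allowedConnectionNames` as optional is an assumption inferred from the `!allowed` check, not something the diff states.

```typescript
// Hypothetical stand-in for ToolSession, reduced to the field used here (assumption).
interface SessionLike {
  allowedConnectionNames?: ReadonlySet<string>;
}

type Validation = { ok: true } | { ok: false; error: string };

// Mirrors the logic of validateActionTargetConnection in the diff above.
function validateTarget(session: SessionLike | undefined, connectionId: string): Validation {
  const allowed = session?.allowedConnectionNames;
  // No session, or a session without a restriction, means no gating.
  if (!allowed || allowed.has(connectionId)) {
    return { ok: true };
  }
  const allowedList = Array.from(allowed).sort();
  return {
    ok: false,
    error: `connectionId "${connectionId}" is outside this ingest session's allowed target connections: ${
      allowedList.length > 0 ? allowedList.join(', ') : '(none)'
    }`,
  };
}

const noSession = validateTarget(undefined, 'finance');
const scoped = validateTarget({ allowedConnectionNames: new Set(['warehouse']) }, 'finance');

// Narrowing on `ok` gives access to `error` only in the failure branch.
const message = scoped.ok ? '' : scoped.error;
console.log(noSession.ok, message);
```

Returning a result object rather than throwing lets the tool turn the failure into a normal tool output, as the `sl_edit_source` and `sl_write_source` hunks do with `buildOutput(false, [targetConnectionValidation.error], sourceName)`.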
@ -32,6 +32,7 @@ export type { SqlEdit } from './sql-edit-replacer.js';
|
|||
export { applySqlEdits } from './sql-edit-replacer.js';
|
||||
export type { IngestToolMetadata, MemoryAction, ToolSession } from './tool-session.js';
|
||||
export { validateActionRawPaths } from './action-raw-paths.js';
|
||||
export { validateActionTargetConnection } from './action-target-connection.js';
|
||||
export type { TouchedSlSource, TouchedSlSourceSet } from './touched-sl-sources.js';
|
||||
export {
|
||||
addTouchedSlSource,
|
||||
|
|
|
|||