diff --git a/docs-site/content/docs/getting-started/meta.json b/docs-site/content/docs/getting-started/meta.json index a025cb2f..c40e42c6 100644 --- a/docs-site/content/docs/getting-started/meta.json +++ b/docs-site/content/docs/getting-started/meta.json @@ -1,5 +1,5 @@ { "title": "Getting Started", "defaultOpen": true, - "pages": ["introduction", "quickstart", "troubleshooting-linux"] + "pages": ["introduction", "quickstart"] } diff --git a/docs-site/content/docs/getting-started/quickstart.mdx b/docs-site/content/docs/getting-started/quickstart.mdx index b3a2b03f..0118522c 100644 --- a/docs-site/content/docs/getting-started/quickstart.mdx +++ b/docs-site/content/docs/getting-started/quickstart.mdx @@ -296,7 +296,7 @@ surface. | Anthropic health check fails | API key, model id, or access is invalid | Fix `ANTHROPIC_API_KEY` or rerun setup with a different key or model | | Vertex AI health check fails | Vertex API, Claude access, project, location, or IAM permissions are missing | Check the project, location, Application Default Credentials, and Vertex AI permissions | | OpenAI embeddings fail | `OPENAI_API_KEY` is missing or invalid | Export the key or choose local sentence-transformers embeddings | -| Local embeddings fail | Managed Python runtime cannot install or start | See [Troubleshooting clean Linux install](/docs/getting-started/troubleshooting-linux) — usually missing Python 3.13 or an IPv6 proxy env var | +| Local embeddings fail | Managed Python runtime cannot install or start | Run `ktx dev runtime status`, then install the local embeddings runtime | | Database test fails | Credentials, network access, database, warehouse, or schema is wrong | Test the same values with the database's native client, then rerun setup | | Context is not built | Setup saved configuration but skipped or interrupted the build | Run `ktx setup` or `ktx ingest --all` | | Agent integration is incomplete | Setup skipped the agents step or installed a different target | Run `ktx 
setup --agents --target ` | diff --git a/docs-site/content/docs/getting-started/troubleshooting-linux.mdx b/docs-site/content/docs/getting-started/troubleshooting-linux.mdx deleted file mode 100644 index 55dd6048..00000000 --- a/docs-site/content/docs/getting-started/troubleshooting-linux.mdx +++ /dev/null @@ -1,163 +0,0 @@ ---- -title: Troubleshooting clean Linux install -description: Known gotchas when installing KTX from scratch on a clean Linux host (Ubuntu, Debian, container images). Read this before debugging managed-runtime or daemon failures. ---- - -This page documents the friction a coding agent (or human) will hit when running `npm install -g @kaelio/ktx@next` on a clean Linux host with no Python ≥ 3.13 installed, and during the first `ktx setup` on that host. Each item lists the symptom, the cause, and the exact recovery command. - -## Prerequisites that aren't always satisfied - -KTX needs: - -| Tool | Minimum version | Why | -|------|-----------------|-----| -| Node.js | 22 | Runs the CLI | -| `uv` | 0.5+ | Manages the local Python runtime (semantic-layer daemon, local embeddings) | -| Python | 3.13 | KTX's managed Python runtime targets `>=3.13`. The system Python on Ubuntu 24.04 is 3.12. | - -If `uv` is not on `PATH`, install it: - -```bash -curl -LsSf https://astral.sh/uv/install.sh | sh -source $HOME/.local/bin/env # or: export PATH="$HOME/.local/bin:$PATH" -``` - -Install Python 3.13 via `uv` so it sits alongside whatever the system ships: - -```bash -uv python install 3.13 -``` - -You do not need to make 3.13 the system default. KTX's runtime installer will pick it up when you set `UV_PYTHON=3.13` for the install command (see below). - -## Symptom: `ktx dev runtime install` fails on the venv step - -The install log (`~/.ktx/runtime//install.log`) shows something like: - -``` -$ uv venv /home/runner/.ktx/runtime//.venv -Using CPython 3.12.3 interpreter at: /usr/bin/python3 -... 
-Package requires Python >=3.13 but the running Python is 3.12.3 -``` - -**Cause:** `uv venv` picked the system Python (3.12) when it built the runtime virtualenv. KTX's wheels declare `requires-python = ">=3.13"`, so the subsequent install fails. - -**Fix:** install Python 3.13 (above), then force the runtime installer to use it: - -```bash -uv python install 3.13 -UV_PYTHON=3.13 ktx dev runtime install --feature local-embeddings --yes --force -``` - -The `--force` flag rebuilds the venv. Without it, the failed venv from the previous attempt is reused. - -## Symptom: managed Python daemon crashes immediately with `URL parse error` - -The daemon stderr (`/.ktx/runtime/daemon.stderr.log`) contains an httpx traceback ending in something like: - -``` -File ".../httpx/_client.py", line 698, in __init__ - URLPattern(key): None -File ".../httpx/_urls.py", line ..., in __init__ - raise InvalidURL(...) -``` - -**Cause:** an environment variable holds a value httpx cannot parse — typically `NO_PROXY` or `no_proxy` containing an **IPv6 CIDR** such as `fd07:b51a:cc66:f0::/64`. OrbStack and some Docker network setups inject this by default. httpx interprets every comma-separated entry as a URL pattern and rejects raw IPv6 CIDRs. - -**Fix:** scrub the bad entries before starting the daemon. The simplest workaround is to unset proxy vars entirely for daemon-related commands: - -```bash -unset HTTP_PROXY HTTPS_PROXY NO_PROXY http_proxy https_proxy no_proxy -ktx dev runtime start --feature local-embeddings -``` - -If you need proxy entries to remain set for outbound HTTP, keep only the IPv4 + hostname entries: - -```bash -export NO_PROXY="localhost,127.0.0.1,*.orb.internal,*.orb.local" -``` - -This issue is tracked for an upstream fix in the daemon: it should sanitize unparseable entries before constructing httpx clients. - -## Symptom: `ktx setup` keeps connecting to an old daemon port - -Running `ktx setup` more than once can leave orphan `ktx-daemon` processes. 
Each `setup` invocation may spawn a fresh daemon on a new port and write a new `daemon.json`, while the old one keeps running. Subsequent setup attempts may pick the stale port and fail with a connection-refused error or a `500` health check. - -**Fix:** stop all daemons and remove the state files before re-running setup: - -```bash -pkill -9 -f ktx-daemon || true -rm -f ~/.ktx/runtime/*/daemon.json -rm -f /path/to/project/.ktx/runtime/daemon.json -``` - -Then start the daemon explicitly **before** re-running setup so `setup` reuses it: - -```bash -unset HTTP_PROXY HTTPS_PROXY NO_PROXY http_proxy https_proxy no_proxy -ktx dev runtime start --feature local-embeddings -``` - -## Symptom: `ktx status --json` reports a connection as failed but `ktx connection test ` passes - -`ktx status` may cache a failure record from a prior bad run (for example, when the daemon was crashing). A successful `ktx connection test` does not always invalidate the cache. - -**Fix:** re-run a fast ingest, which writes a fresh status record: - -```bash -ktx ingest --fast -ktx status --json -``` - -## A minimal "clean Linux install" recipe - -If you only want one working sequence, this one works from a fresh Ubuntu 24.04 container with Node 22 and Claude Code installed: - -```bash -# 1. Prerequisites -curl -LsSf https://astral.sh/uv/install.sh | sh -source $HOME/.local/bin/env -uv python install 3.13 -npm install -g @kaelio/ktx@next - -# 2. Pre-warm the managed Python runtime with the right Python -UV_PYTHON=3.13 ktx dev runtime install --feature local-embeddings --yes --force - -# 3. Start the daemon with a clean proxy env -unset HTTP_PROXY HTTPS_PROXY NO_PROXY http_proxy https_proxy no_proxy -ktx dev runtime start --feature local-embeddings - -# 4. Scripted setup (replace DATABASE_URL with your warehouse) -mkdir -p /work/project -cd /work/project -export ANTHROPIC_API_KEY=... # already in env from your Claude Code session -export DATABASE_URL=postgresql://... 
- -ktx setup \ - --no-input \ - --yes \ - --project-dir /work/project \ - --llm-backend anthropic \ - --anthropic-api-key-env ANTHROPIC_API_KEY \ - --anthropic-model claude-sonnet-4-6 \ - --embedding-backend sentence-transformers \ - --database postgres \ - --new-database-connection-id warehouse \ - --database-url env:DATABASE_URL \ - --skip-sources \ - --skip-agents - -# 5. Build schema context -ktx ingest warehouse --fast - -# 6. Verify -ktx status --json -ktx connection test warehouse -``` - -Success looks like: - -- `ktx status --json` reports `"verdict": "ready"` -- `ktx connection test warehouse` exits 0 with `Status: ok` -- `semantic-layer/warehouse/_schema/` contains generated YAML files diff --git a/docs-site/lib/llm-docs.ts b/docs-site/lib/llm-docs.ts index 9eed23b0..561f73e0 100644 --- a/docs-site/lib/llm-docs.ts +++ b/docs-site/lib/llm-docs.ts @@ -1,7 +1,7 @@ import { source } from "@/lib/source"; import { readDocsPageMarkdown } from "@/lib/docs-markdown"; -const siteOrigin = process.env.KTX_DOCS_ORIGIN ?? 
"https://docs.kaelio.com/ktx"; +const siteOrigin = "https://docs.kaelio.com/ktx"; export type LlmDocsPage = { title: string; @@ -61,7 +61,6 @@ ${link("/docs/ai-resources/agent-instructions", "Agent Instructions", "Suggested ${link("/docs/getting-started/introduction", "Introduction", "What KTX is and who it is for")} ${link("/docs/getting-started/quickstart", "Quickstart", "Set up KTX and build your first context")} -${link("/docs/getting-started/troubleshooting-linux", "Troubleshooting clean Linux install", "READ FIRST if installing from scratch on Linux/container — covers Python 3.13 prerequisite, IPv6 proxy gotcha, and a minimal working recipe")} ${link("/docs/guides/writing-context", "Writing Context", "Write semantic sources and wiki pages")} ## Machine-Readable Documentation diff --git a/docs-site/next-env.d.ts b/docs-site/next-env.d.ts index c4b7818f..9edff1c7 100644 --- a/docs-site/next-env.d.ts +++ b/docs-site/next-env.d.ts @@ -1,6 +1,6 @@ /// /// -import "./.next/dev/types/routes.d.ts"; +import "./.next/types/routes.d.ts"; // NOTE: This file should not be edited // see https://nextjs.org/docs/app/api-reference/config/typescript for more information. diff --git a/docs/superpowers/plans/2026-05-18-adapter-owned-finalization-v1-closure.md b/docs/superpowers/plans/2026-05-18-adapter-owned-finalization-v1-closure.md deleted file mode 100644 index 318fd8a1..00000000 --- a/docs/superpowers/plans/2026-05-18-adapter-owned-finalization-v1-closure.md +++ /dev/null @@ -1,320 +0,0 @@ -# Adapter-Owned Finalization V1 Closure Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. 
- -**Goal:** Close the remaining adapter-owned finalization v1 verification gaps so the finalization contract is publicly typed and the historic-SQL local acceptance path passes through `SourceAdapter.finalize()`. - -**Architecture:** The production runner already owns finalization execution, commits, target policy, final gates, reports, traces, and provenance. This plan keeps production behavior intact, exports the finalization adapter types through the ingest barrel, and updates the local historic-SQL acceptance fixture to model the real adapter-owned finalization path instead of the removed post-processor path. - -**Tech Stack:** TypeScript ESM/NodeNext, Vitest, pnpm workspace commands, existing `SourceAdapter`, `projectHistoricSqlEvidence()`, and package export coverage. - ---- - -## Audit summary - -The audit compared -`docs/superpowers/specs/2026-05-18-adapter-owned-ingest-finalization-design.md` -against the implemented source, plan, and targeted tests. - -Implemented v1 coverage: - -- `SourceAdapter.finalize()` exists with typed context and result objects in - `packages/context/src/ingest/types.ts`. -- `IngestBundleRunnerDeps.postProcessors`, `IngestBundlePostProcessorPort`, - `HistoricSqlProjectionPostProcessor`, `post_processor` trace phases, and - `postProcessor` report fields are absent from production source. -- The runner invokes finalization after reconciliation and before - `wiki_sl_ref_repair`, target-policy checks, final artifact gates, - provenance validation, and squash. -- The runner derives finalization touched paths from the integration-worktree - diff, resolves semantic-layer scope including `_schema/*.yaml`, cross-checks - adapter declarations, commits finalization, records reports/traces, rejects - path overlap, and partitions finalization actions for provenance exclusions. -- Override replay passes explicit `overrideReplay` metadata, omits - `parseArtifacts`, and leaves current-run `workUnitOutcomes` empty. 
-- Historic SQL implements adapter-owned `finalize()` and uses - `projectHistoricSqlEvidence()` for aggregate projection maintenance. - -V1-blocking gaps: - -- `packages/context/src/ingest/index.ts` exports `SourceAdapter` and projection - types, but not `DeterministicFinalizationContext`, - `FinalizationOverrideReplay`, or `FinalizationResult`. The adapter contract is - less usable from the public ingest barrel than the spec requires. -- The targeted verification command currently fails because - `HistoricSqlEvidenceTestAdapter` in - `packages/context/src/ingest/local-bundle-ingest.test.ts` lacks - `finalize()`, so `result.report.body.finalization` is `undefined` in the - local historic-SQL projection acceptance test. - -Non-blocking gaps: - -- Older historical plan documents still mention post-processors. They are - archived implementation history and do not affect runtime behavior. -- The runner has helper-level declaration mismatch coverage, but no dedicated - local-bundle integration test for a finalization declaration mismatch. The - implementation path exists; adding a higher-level regression test can be a - later hardening pass. -- Finalization wiki page deletion could use a future global wiki-reference gate - regression. Historic-SQL v1 finalization updates or archives pages in place, - so this is not required for the current v1 acceptance path. - -## File structure - -- Modify `packages/context/src/ingest/index.ts`. - Re-export the typed finalization adapter contract next to the existing - projection contract. -- Modify `packages/context/src/package-exports.test.ts`. - Add compile-time coverage proving finalization adapter types are exported - from the ingest barrel. -- Modify `packages/context/src/ingest/local-bundle-ingest.test.ts`. - Make the historic-SQL local acceptance test adapter implement - `finalize()` by delegating to `projectHistoricSqlEvidence()`, and rename the - stale test label from post-processor to finalization. 
- ---- - -### Task 1: Export finalization adapter contract types - -**Files:** -- Modify: `packages/context/src/package-exports.test.ts` -- Modify: `packages/context/src/ingest/index.ts` - -- [ ] **Step 1: Write failing type export coverage** - -In `packages/context/src/package-exports.test.ts`, add this import after the -existing Vitest import: - -```ts -import type { - DeterministicFinalizationContext, - FinalizationOverrideReplay, - FinalizationResult, -} from './ingest/index.js'; -``` - -Then add this constant after `scanTypeExportCoverage`: - -```ts -const ingestFinalizationTypeExportCoverage: Partial<{ - context: DeterministicFinalizationContext; - overrideReplay: FinalizationOverrideReplay; - result: FinalizationResult; -}> = {}; -``` - -Inside the existing package export test, place this assertion immediately after -`expect(scanTypeExportCoverage).toEqual({});`: - -```ts -expect(ingestFinalizationTypeExportCoverage).toEqual({}); -``` - -- [ ] **Step 2: Run type-check to verify the coverage fails** - -Run: - -```bash -pnpm --filter @ktx/context run type-check -``` - -Expected: FAIL with TypeScript errors like: - -```text -Module '"./ingest/index.js"' has no exported member 'DeterministicFinalizationContext'. -Module '"./ingest/index.js"' has no exported member 'FinalizationOverrideReplay'. -Module '"./ingest/index.js"' has no exported member 'FinalizationResult'. -``` - -- [ ] **Step 3: Export the finalization types** - -In `packages/context/src/ingest/index.ts`, update the existing export block -from `./types.js` so the final lines read: - -```ts - WorkUnit, - DeterministicProjectionContext, - ProjectionResult, - DeterministicFinalizationContext, - FinalizationOverrideReplay, - FinalizationResult, -} from './types.js'; -``` - -- [ ] **Step 4: Run type-check and package export coverage** - -Run: - -```bash -pnpm --filter @ktx/context run type-check -pnpm --filter @ktx/context exec vitest run src/package-exports.test.ts -``` - -Expected: both commands PASS. 
- -- [ ] **Step 5: Commit the type export closure** - -Run: - -```bash -git add packages/context/src/ingest/index.ts packages/context/src/package-exports.test.ts -git commit -m "feat(ingest): export finalization adapter contract types" -``` - -### Task 2: Repair the local historic-SQL finalization acceptance fixture - -**Files:** -- Modify: `packages/context/src/ingest/local-bundle-ingest.test.ts` - -- [ ] **Step 1: Import the projection helper and finalization types** - -In `packages/context/src/ingest/local-bundle-ingest.test.ts`, add this import -after the fake adapter import: - -```ts -import { projectHistoricSqlEvidence } from './adapters/historic-sql/projection.js'; -``` - -Replace the existing type import from `./types.js` with: - -```ts -import type { - ChunkResult, - DeterministicFinalizationContext, - DiffSet, - FinalizationResult, - SourceAdapter, -} from './types.js'; -``` - -- [ ] **Step 2: Add adapter-owned finalization to the test adapter** - -In `HistoricSqlEvidenceTestAdapter`, add this method after `chunk()`: - -```ts - async finalize(ctx: DeterministicFinalizationContext): Promise<FinalizationResult> { - const projection = await projectHistoricSqlEvidence({ - workdir: ctx.workdir, - connectionId: ctx.connectionId, - syncId: ctx.syncId, - runId: ctx.runId, - overrideReplay: ctx.overrideReplay, - }); - - return { - result: projection, - warnings: projection.warnings, - errors: [], - touchedSources: projection.touchedSources, - changedWikiPageKeys: projection.changedWikiPageKeys, - actions: projection.actions, - }; - } -``` - -- [ ] **Step 3: Rename the stale test label** - -Change the test name: - -```ts -it('runs historic-SQL evidence projection through the local bundle post-processor', async () => { -``` - -to: - -```ts -it('runs historic-SQL evidence projection through local bundle finalization', async () => { -``` - -- [ ] **Step 4: Run the focused failing test** - -Run: - -```bash -pnpm --filter @ktx/context exec vitest run src/ingest/local-bundle-ingest.test.ts 
-t "historic-SQL evidence projection" -``` - -Expected: PASS, and the assertion at -`packages/context/src/ingest/local-bundle-ingest.test.ts:551` receives a -`result.report.body.finalization` object with `status: "success"`. - -- [ ] **Step 5: Commit the local acceptance fixture** - -Run: - -```bash -git add packages/context/src/ingest/local-bundle-ingest.test.ts -git commit -m "test(ingest): exercise historic sql finalization locally" -``` - -### Task 3: Run final verification - -**Files:** -- Verify: `packages/context/src/ingest/finalization-scope.test.ts` -- Verify: `packages/context/src/ingest/ingest-bundle.runner.test.ts` -- Verify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts` -- Verify: `packages/context/src/ingest/adapters/historic-sql/projection.test.ts` -- Verify: `packages/context/src/ingest/local-bundle-ingest.test.ts` -- Verify: `packages/context/src/ingest/adapters/historic-sql/local-ingest-acceptance.test.ts` -- Verify: workspace TypeScript and dead-code checks - -- [ ] **Step 1: Run the adapter-owned finalization targeted suite** - -Run: - -```bash -pnpm --filter @ktx/context exec vitest run src/ingest/finalization-scope.test.ts src/ingest/ingest-bundle.runner.test.ts src/ingest/ingest-bundle.runner.isolated-diff.test.ts src/ingest/adapters/historic-sql/projection.test.ts src/ingest/local-bundle-ingest.test.ts src/ingest/adapters/historic-sql/local-ingest-acceptance.test.ts -``` - -Expected: PASS with all six test files passing. - -- [ ] **Step 2: Run TypeScript validation** - -Run: - -```bash -pnpm --filter @ktx/context run type-check -``` - -Expected: PASS. - -- [ ] **Step 3: Run dead-code validation** - -Run: - -```bash -pnpm run dead-code -``` - -Expected: PASS. - -- [ ] **Step 4: Inspect final status** - -Run: - -```bash -git status --short -``` - -Expected: only the intended committed changes are present, or the worktree is -clean after the two commits. 
- -## Docs impact - -No `docs-site/content/docs/` update is required. The remaining v1 work is an -adapter contract type export and test acceptance closure; it does not change -CLI behavior, user configuration, setup flow, connector behavior, or public -documentation examples. - -## Self-review - -- Spec coverage: The plan covers the remaining adapter API usability gap and - the failing historic-SQL local finalization acceptance path. The main - runner, reports, traces, provenance, override replay, and historic-SQL - production finalization behavior already exist. -- Placeholder scan: The plan contains no placeholder tasks or unspecified - implementation steps. -- Type consistency: `DeterministicFinalizationContext`, - `FinalizationOverrideReplay`, and `FinalizationResult` match the existing - names in `packages/context/src/ingest/types.ts`; the test adapter delegates - to the existing `projectHistoricSqlEvidence()` result shape. diff --git a/docs/superpowers/plans/2026-05-18-adapter-owned-finalization-v1.md b/docs/superpowers/plans/2026-05-18-adapter-owned-finalization-v1.md deleted file mode 100644 index b38988e3..00000000 --- a/docs/superpowers/plans/2026-05-18-adapter-owned-finalization-v1.md +++ /dev/null @@ -1,1862 +0,0 @@ -# Adapter-Owned Finalization V1 Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use -> superpowers:subagent-driven-development (recommended) or -> superpowers:executing-plans to implement this plan task-by-task. Steps use -> checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Replace runner-level ingest post-processors with a typed -`SourceAdapter.finalize()` phase that runs after reconciliation and before all -final gates. - -**Architecture:** Keep the runner responsible for execution mechanics: -worktree scope, diff capture, commits, declaration checks, wiki semantic-layer -reference repair, target policy, final artifact gates, provenance planning, -reports, traces, and squash. 
Move source-specific deterministic maintenance -into adapter-owned `finalize()` implementations, starting with historic SQL. - -**Tech Stack:** TypeScript ESM/NodeNext, Vitest, simple-git through -`GitService`, existing `IngestBundleRunner`, `SourceAdapter`, -`SemanticLayerService`, `KnowledgeWikiService`, `StageIndex`, memory actions, -and isolated-diff trace/report infrastructure. - ---- - -## Audit summary - -This audit read -`docs/superpowers/specs/2026-05-18-adapter-owned-ingest-finalization-design.md`, -searched `docs/superpowers/plans/`, and inspected the current ingest runner, -adapter contract, reports, local runtime wiring, and historic-SQL projection -code. - -No existing implementation plan targets this spec directly. Existing -isolated-diff plans implemented prerequisites that this work can reuse: -projection before child worktrees, reconciliation before final gates, target -policy, final artifact gates, wiki semantic-layer reference repair, provenance -raw-path validation, persistent traces, failure reports, and gate repair. - -Current implementation evidence: - -- `SourceAdapter.project()` exists, but `SourceAdapter.finalize()` does not. -- `IngestBundleRunnerDeps.postProcessors` and - `IngestBundlePostProcessorPort` still exist in - `packages/context/src/ingest/ports.ts`. -- `IngestBundleRunner` still runs a `post_processor` phase after - reconciliation. -- `HistoricSqlProjectionPostProcessor` is still exported and wired in - `local-bundle-runtime.ts`. -- Reports still expose `postProcessor`, and saved-memory counts special-case - historic SQL post-processor result fields. -- Historic SQL projection still commits its own changes from the - post-processor, outside runner-owned finalization commit/report/provenance - mechanics. 
- -## V1-blocking gaps - -These gaps block the adapter-owned finalization spec: - -- Add typed `DeterministicFinalizationContext`, - `FinalizationOverrideReplay`, and `FinalizationResult` objects, and add - optional `SourceAdapter.finalize()`. -- Invoke `adapter.finalize()` after reconciliation and before - `wiki_sl_ref_repair`, target-policy checks, final artifact gates, provenance - validation, and squash. -- Make the runner derive finalization changed paths, wiki page keys, and - semantic-layer touched sources from the integration-worktree diff. -- Resolve aggregate `_schema/*.yaml` semantic-layer changes by comparing - pre-finalization and post-finalization loaded semantic-layer sources. -- Cross-check adapter-declared touched sources and wiki page keys against the - runner-derived diff and fail on under-reporting, over-reporting, or - unresolvable changed semantic-layer paths. -- Commit finalization changes in the integration worktree with a runner-owned - commit and include the commit SHA and touched paths in reports and traces. -- Fail if finalization effectively changes a path already changed by accepted - work-unit, projection, or reconciliation writes in the same run. -- Include finalization paths in target policy and final artifact gates. -- Include finalization actions in saved-memory counts and report details, but - do not re-apply them as writes. -- Add finalization provenance rows only for actions with defensible raw paths: - current raw snapshot paths, current-run `stageIndex.evictionsApplied` raw - paths, or `overrideReplay.evictionRawPaths`. -- Report finalization actions excluded from provenance, including the reason. -- Pass explicit override replay metadata to finalization and keep - `workUnitOutcomes` empty when override replay skips source work units. -- Migrate historic-SQL whole-run projection maintenance into - `HistoricSqlSourceAdapter.finalize()`. 
-- Remove `IngestBundlePostProcessorPort`, `deps.postProcessors`, - `HistoricSqlProjectionPostProcessor`, `post_processor` trace/report phases, - and `postProcessor` report fields. -- Cover successful finalization, finalization errors, unauthorized target - rejection, declaration mismatch rejection, override replay behavior, - wiki-SL-ref repair placement, finalization provenance exclusion, path - overlap failure, and historic-SQL projection without runner post-processors. - -## Non-blocking gaps - -These are not required for v1 of this spec: - -- Moving historic-SQL per-unit table usage or pattern writes out of typed - evidence into direct work-unit tools. Evidence can remain an internal - adapter input as long as it is not exposed as a runner post-processor - contract. -- Adding deterministic `finalize()` implementations for adapters other than - historic SQL. -- Re-parsing materialized override raw snapshots as a future override-safe - input. This plan treats override replay without current-run evidence as a - no-op for historic-SQL stale/archive cleanup. -- Designing public execution knobs such as `executionMode`, - `planningStrategy`, `conflictPolicy`, or source-key allowlists. -- Reworking wiki page frontmatter, semantic-layer YAML formats, or historic-SQL - chunking. - -## File structure - -- Modify `packages/context/src/ingest/types.ts`. - Owns the new adapter finalization context/result types and - `SourceAdapter.finalize()`. -- Modify `packages/context/src/ingest/reports.ts`. - Replaces post-processor report/count fields with finalization report/count - fields. -- Modify `packages/context/src/ingest/report-snapshot.ts`. - Parses stored finalization report metadata. -- Create `packages/context/src/ingest/finalization-scope.ts`. - Derives finalization wiki keys and semantic-layer source scope from changed - paths and pre/post semantic-layer snapshots, and validates adapter - declarations. -- Create `packages/context/src/ingest/finalization-scope.test.ts`. 
- Covers standalone SL files, aggregate `_schema` files, wiki pages, mismatch - detection, and unresolvable aggregate changes. -- Modify `packages/context/src/ingest/ingest-bundle.runner.ts`. - Calls `adapter.finalize()`, commits finalization changes, records trace and - report metadata, enforces path overlap and target policy, feeds finalization - into gates, memory counts, provenance, reindexing, and wiki-SL-ref repair. -- Modify `packages/context/src/ingest/ingest-bundle.runner.test.ts`. - Covers unit-level finalization context, reports, failures, and provenance - partitioning. -- Modify `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`. - Covers real-git finalization ordering, path overlap, target policy, and - wiki-SL-ref repair placement. -- Modify `packages/context/src/ingest/adapters/historic-sql/projection.ts`. - Makes projection callable from adapter finalization, returns changed wiki - page keys and descriptive actions, and no-ops stale/archive cleanup when no - current-run evidence exists. -- Modify `packages/context/src/ingest/adapters/historic-sql/historic-sql.adapter.ts`. - Implements `finalize()`. -- Modify `packages/context/src/ingest/adapters/historic-sql/projection.test.ts`. - Covers finalization projection result metadata and override-safe no-op - behavior. -- Delete `packages/context/src/ingest/adapters/historic-sql/post-processor.ts`. -- Delete `packages/context/src/ingest/adapters/historic-sql/post-processor.test.ts`. -- Modify `packages/context/src/ingest/local-bundle-runtime.ts`. - Removes post-processor import and dependency wiring. -- Modify `packages/context/src/ingest/ports.ts`. - Removes post-processor port types and dependency injection. -- Modify `packages/context/src/ingest/index.ts`. - Removes post-processor exports and exports finalization helper types only as - needed. -- Modify `packages/context/src/package-exports.test.ts`. - Removes the historic-SQL post-processor export assertion. 
-- Modify `packages/cli/src/ingest.test.ts` and - `packages/cli/src/setup.ts` only if saved-memory count assertions still - refer to post-processor report fields. - ---- - -### Task 1: Add adapter finalization and report contracts - -**Files:** -- Modify: `packages/context/src/ingest/types.ts` -- Modify: `packages/context/src/ingest/reports.ts` -- Modify: `packages/context/src/ingest/report-snapshot.ts` - -- [ ] **Step 1: Add finalization adapter types** - -In `packages/context/src/ingest/types.ts`, add these imports near the top: - -```ts -import type { MemoryAction } from '../memory/index.js'; -import type { TouchedSlSource } from '../tools/index.js'; -import type { StageIndex } from './stages/stage-index.types.js'; -import type { WorkUnitOutcome } from './stages/stage-3-work-units.js'; -``` - -In the same file, insert this block after `ProjectionResult`: - -```ts -export interface FinalizationOverrideReplay { - priorJobId: string; - priorRunId: string; - priorSyncId: string; - evictionRawPaths: string[]; -} - -export interface DeterministicFinalizationContext { - connectionId: string; - sourceKey: string; - syncId: string; - jobId: string; - runId: string; - stagedDir: string; - workdir: string; - parseArtifacts?: unknown; - stageIndex: StageIndex; - workUnitOutcomes: WorkUnitOutcome[]; - reconciliationActions: MemoryAction[]; - overrideReplay?: FinalizationOverrideReplay; -} - -export interface FinalizationResult { - warnings: string[]; - errors: string[]; - touchedSources: TouchedSlSource[]; - changedWikiPageKeys: string[]; - actions?: MemoryAction[]; - result?: unknown; -} -``` - -Then add the optional method to `SourceAdapter` immediately after `project?`: - -```ts - finalize?(ctx: DeterministicFinalizationContext): Promise<FinalizationResult>; -``` - -- [ ] **Step 2: Add finalization report types** - -In `packages/context/src/ingest/reports.ts`, replace -`IngestReportPostProcessorOutcome` with: - -```ts -export interface IngestReportFinalizationMismatch { - artifactKind: 'sl' 
| 'wiki'; - key: string; - direction: 'missing_from_adapter_declaration' | 'extra_in_adapter_declaration'; -} - -export interface IngestReportFinalizationProvenanceExclusion { - action: MemoryAction; - reason: 'missing_raw_paths' | 'raw_path_not_defensible'; - invalidRawPaths?: string[]; -} - -export interface IngestReportFinalizationOutcome { - sourceKey: string; - status: 'success' | 'failed' | 'skipped'; - commitSha: string | null; - touchedPaths: string[]; - declaredTouchedSources: TouchedSlSource[]; - derivedTouchedSources: TouchedSlSource[]; - declaredChangedWikiPageKeys: string[]; - derivedChangedWikiPageKeys: string[]; - mismatches: IngestReportFinalizationMismatch[]; - result?: unknown; - errors: string[]; - warnings: string[]; - actions: MemoryAction[]; - provenanceExclusions: IngestReportFinalizationProvenanceExclusion[]; -} -``` - -Replace the `postProcessor?: IngestReportPostProcessorOutcome;` field in -`IngestReportBody` with: - -```ts - finalization?: IngestReportFinalizationOutcome; -``` - -Replace `postProcessorSavedMemoryCounts()` with: - -```ts -export function finalizationSavedMemoryCounts( - finalization: IngestReportFinalizationOutcome | undefined, -): IngestSavedMemoryCounts { - const actions = finalization?.actions ?? []; - return { - wikiCount: actions.filter((action) => action.target === 'wiki').length, - slCount: actions.filter((action) => action.target === 'sl').length, - }; -} -``` - -Then update `savedMemoryCountsForReport()` so it includes finalization -actions: - -```ts -export function savedMemoryCountsForReport(report: IngestReportSnapshot): IngestSavedMemoryCounts { - const workUnitActions = report.body.workUnits.flatMap((workUnit) => workUnit.actions); - const reconciliationActions = report.body.reconciliationActions ?? []; - const finalizationActions = report.body.finalization?.actions ?? 
[]; - const actions = [...workUnitActions, ...reconciliationActions, ...finalizationActions]; - return { - wikiCount: actions.filter((action) => action.target === 'wiki').length, - slCount: actions.filter((action) => action.target === 'sl').length, - }; -} -``` - -- [ ] **Step 3: Parse stored finalization report snapshots** - -In `packages/context/src/ingest/report-snapshot.ts`, add this schema near the -other report schemas: - -```ts -const finalizationMismatchSchema = z.object({ - artifactKind: z.enum(['sl', 'wiki']), - key: z.string().min(1), - direction: z.enum(['missing_from_adapter_declaration', 'extra_in_adapter_declaration']), -}); - -const finalizationProvenanceExclusionSchema = z.object({ - action: ingestActionSchema, - reason: z.enum(['missing_raw_paths', 'raw_path_not_defensible']), - invalidRawPaths: z.array(z.string()).optional(), -}); - -const finalizationOutcomeSchema = z.object({ - sourceKey: z.string().min(1), - status: z.enum(['success', 'failed', 'skipped']), - commitSha: z.string().nullable(), - touchedPaths: z.array(z.string()), - declaredTouchedSources: z.array(touchedSlSourceSchema), - derivedTouchedSources: z.array(touchedSlSourceSchema), - declaredChangedWikiPageKeys: z.array(z.string()), - derivedChangedWikiPageKeys: z.array(z.string()), - mismatches: z.array(finalizationMismatchSchema).default([]), - result: z.unknown().optional(), - errors: z.array(z.string()), - warnings: z.array(z.string()), - actions: z.array(ingestActionSchema).default([]), - provenanceExclusions: z.array(finalizationProvenanceExclusionSchema).default([]), -}); -``` - -Then add this field inside the report body schema: - -```ts - finalization: finalizationOutcomeSchema.optional(), -``` - -- [ ] **Step 4: Run the contract checks** - -Run: - -```bash -pnpm --filter @ktx/context exec vitest run src/ingest/report-snapshot.test.ts src/package-exports.test.ts -``` - -Expected: PASS after the implementation compiles. 
Before downstream code is -updated, TypeScript references to removed post-processor names may still fail -in later tasks. - -### Task 2: Add finalization scope derivation helpers - -**Files:** -- Create: `packages/context/src/ingest/finalization-scope.ts` -- Create: `packages/context/src/ingest/finalization-scope.test.ts` - -- [ ] **Step 1: Write finalization scope tests** - -Create `packages/context/src/ingest/finalization-scope.test.ts`: - -```ts -import { describe, expect, it } from 'vitest'; -import { - deriveFinalizationWikiPageKeys, - compareFinalizationDeclarations, - deriveFinalizationTouchedSources, -} from './finalization-scope.js'; - -describe('deriveFinalizationWikiPageKeys', () => { - it('maps changed global wiki markdown paths to page keys', () => { - expect( - deriveFinalizationWikiPageKeys([ - 'wiki/global/historic-sql-orders.md', - 'wiki/global/nested/page.md', - 'README.md', - ]), - ).toEqual(['historic-sql-orders']); - }); -}); - -describe('deriveFinalizationTouchedSources', () => { - it('maps standalone semantic-layer files directly', async () => { - const result = await deriveFinalizationTouchedSources({ - changedPaths: ['semantic-layer/warehouse/orders.yaml'], - beforeSourcesByConnection: new Map(), - afterSourcesByConnection: new Map(), - }); - expect(result).toEqual({ - touchedSources: [{ connectionId: 'warehouse', sourceName: 'orders' }], - unresolvedPaths: [], - }); - }); - - it('resolves aggregate _schema changes by comparing loaded source snapshots', async () => { - const beforeSourcesByConnection = new Map([ - [ - 'warehouse', - [ - { - name: 'orders', - grain: ['order_id'], - columns: [{ name: 'order_id', type: 'string' }], - joins: [], - measures: [], - usage: { narrative: 'old' }, - }, - ], - ], - ]); - const afterSourcesByConnection = new Map([ - [ - 'warehouse', - [ - { - name: 'orders', - grain: ['order_id'], - columns: [{ name: 'order_id', type: 'string' }], - joins: [], - measures: [], - usage: { narrative: 'new' }, - }, - ], - 
], - ]); - - const result = await deriveFinalizationTouchedSources({ - changedPaths: ['semantic-layer/warehouse/_schema/public.yaml'], - beforeSourcesByConnection, - afterSourcesByConnection, - }); - - expect(result).toEqual({ - touchedSources: [{ connectionId: 'warehouse', sourceName: 'orders' }], - unresolvedPaths: [], - }); - }); - - it('flags aggregate _schema changes that cannot be resolved to logical sources', async () => { - const beforeSourcesByConnection = new Map([['warehouse', []]]); - const afterSourcesByConnection = new Map([['warehouse', []]]); - - const result = await deriveFinalizationTouchedSources({ - changedPaths: ['semantic-layer/warehouse/_schema/public.yaml'], - beforeSourcesByConnection, - afterSourcesByConnection, - }); - - expect(result).toEqual({ - touchedSources: [], - unresolvedPaths: ['semantic-layer/warehouse/_schema/public.yaml'], - }); - }); -}); - -describe('compareFinalizationDeclarations', () => { - it('reports missing and extra adapter declarations', () => { - expect( - compareFinalizationDeclarations({ - declaredTouchedSources: [{ connectionId: 'warehouse', sourceName: 'orders' }], - derivedTouchedSources: [{ connectionId: 'warehouse', sourceName: 'customers' }], - declaredChangedWikiPageKeys: ['orders'], - derivedChangedWikiPageKeys: ['orders', 'patterns'], - }), - ).toEqual([ - { - artifactKind: 'sl', - key: 'warehouse:customers', - direction: 'missing_from_adapter_declaration', - }, - { - artifactKind: 'sl', - key: 'warehouse:orders', - direction: 'extra_in_adapter_declaration', - }, - { - artifactKind: 'wiki', - key: 'patterns', - direction: 'missing_from_adapter_declaration', - }, - ]); - }); -}); -``` - -- [ ] **Step 2: Implement finalization scope helpers** - -Create `packages/context/src/ingest/finalization-scope.ts`: - -```ts -import type { SemanticLayerSource } from '../sl/index.js'; -import type { TouchedSlSource } from '../tools/index.js'; -import type { IngestReportFinalizationMismatch } from './reports.js'; - 
-interface DeriveTouchedSourcesInput { - changedPaths: string[]; - beforeSourcesByConnection: Map<string, SemanticLayerSource[]>; - afterSourcesByConnection: Map<string, SemanticLayerSource[]>; -} - -interface DeriveTouchedSourcesResult { - touchedSources: TouchedSlSource[]; - unresolvedPaths: string[]; -} - -interface CompareFinalizationDeclarationsInput { - declaredTouchedSources: TouchedSlSource[]; - derivedTouchedSources: TouchedSlSource[]; - declaredChangedWikiPageKeys: string[]; - derivedChangedWikiPageKeys: string[]; -} - -function uniqueSorted(values: string[]): string[] { - return [...new Set(values.filter((value) => value.length > 0))].sort(); -} - -function touchedKey(source: TouchedSlSource): string { - return `${source.connectionId}:${source.sourceName}`; -} - -function stableJson(value: unknown): string { - if (Array.isArray(value)) { - return `[${value.map((entry) => stableJson(entry)).join(',')}]`; - } - if (value && typeof value === 'object') { - const record = value as Record<string, unknown>; - return `{${Object.keys(record) - .sort() - .map((key) => `${JSON.stringify(key)}:${stableJson(record[key])}`) - .join(',')}}`; - } - return JSON.stringify(value); -} - -function changedSourceNames( - beforeSources: SemanticLayerSource[], - afterSources: SemanticLayerSource[], -): string[] { - const before = new Map(beforeSources.map((source) => [source.name, stableJson(source)])); - const after = new Map(afterSources.map((source) => [source.name, stableJson(source)])); - return uniqueSorted( - uniqueSorted([...before.keys(), ...after.keys()]).filter((sourceName) => before.get(sourceName) !== after.get(sourceName)), - ); -} - -export function deriveFinalizationWikiPageKeys(paths: string[]): string[] { - return uniqueSorted( - paths - .filter((path) => path.startsWith('wiki/global/') && path.endsWith('.md')) - .filter((path) => !path.slice('wiki/global/'.length, -'.md'.length).includes('/')) - .map((path) => path.slice('wiki/global/'.length, -'.md'.length)), - ); -} - -export async function deriveFinalizationTouchedSources( - input: 
DeriveTouchedSourcesInput, -): Promise<DeriveTouchedSourcesResult> { - const touched = new Map<string, TouchedSlSource>(); - const unresolvedPaths: string[] = []; - - for (const path of input.changedPaths) { - if (!path.startsWith('semantic-layer/') || !(path.endsWith('.yaml') || path.endsWith('.yml'))) { - continue; - } - const parts = path.split('/'); - const connectionId = parts[1] ?? ''; - if (!connectionId) { - unresolvedPaths.push(path); - continue; - } - if (parts[2] !== '_schema') { - const fileName = parts.at(-1) ?? ''; - const sourceName = fileName.replace(/\.ya?ml$/, ''); - if (!sourceName) { - unresolvedPaths.push(path); - continue; - } - touched.set(`${connectionId}:${sourceName}`, { connectionId, sourceName }); - continue; - } - - const changedNames = changedSourceNames( - input.beforeSourcesByConnection.get(connectionId) ?? [], - input.afterSourcesByConnection.get(connectionId) ?? [], - ); - if (changedNames.length === 0) { - unresolvedPaths.push(path); - continue; - } - for (const sourceName of changedNames) { - touched.set(`${connectionId}:${sourceName}`, { connectionId, sourceName }); - } - } - - return { - touchedSources: [...touched.values()].sort((left, right) => touchedKey(left).localeCompare(touchedKey(right))), - unresolvedPaths: uniqueSorted(unresolvedPaths), - }; -} - -export function compareFinalizationDeclarations( - input: CompareFinalizationDeclarationsInput, -): IngestReportFinalizationMismatch[] { - const mismatches: IngestReportFinalizationMismatch[] = []; - const declaredSl = new Set(input.declaredTouchedSources.map(touchedKey)); - const derivedSl = new Set(input.derivedTouchedSources.map(touchedKey)); - const declaredWiki = new Set(input.declaredChangedWikiPageKeys); - const derivedWiki = new Set(input.derivedChangedWikiPageKeys); - - for (const key of [...derivedSl].sort()) { - if (!declaredSl.has(key)) { - mismatches.push({ artifactKind: 'sl', key, direction: 'missing_from_adapter_declaration' }); - } - } - for (const key of [...declaredSl].sort()) { - if (!derivedSl.has(key)) { - 
mismatches.push({ artifactKind: 'sl', key, direction: 'extra_in_adapter_declaration' }); - } - } - for (const key of [...derivedWiki].sort()) { - if (!declaredWiki.has(key)) { - mismatches.push({ artifactKind: 'wiki', key, direction: 'missing_from_adapter_declaration' }); - } - } - for (const key of [...declaredWiki].sort()) { - if (!derivedWiki.has(key)) { - mismatches.push({ artifactKind: 'wiki', key, direction: 'extra_in_adapter_declaration' }); - } - } - return mismatches; -} -``` - -- [ ] **Step 3: Run the focused helper tests** - -Run: - -```bash -pnpm --filter @ktx/context exec vitest run src/ingest/finalization-scope.test.ts -``` - -Expected: PASS. - -### Task 3: Wire runner-owned finalization - -**Files:** -- Modify: `packages/context/src/ingest/ingest-bundle.runner.test.ts` -- Modify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts` -- Modify: `packages/context/src/ingest/ingest-bundle.runner.ts` - -- [ ] **Step 1: Add a unit test for successful finalization** - -In `packages/context/src/ingest/ingest-bundle.runner.test.ts`, replace the -post-processor success test with a finalization success test: - -```ts - it('runs adapter finalization before squash, records the outcome, and reindexes touched sources', async () => { - const deps = makeDeps(); - deps.adapter.source = 'metricflow'; - deps.registry.get.mockReturnValue(deps.adapter); - deps.adapter.chunk.mockResolvedValue({ - workUnits: [{ unitKey: 'u1', rawFiles: ['semantic_models.yml'], peerFileIndex: [], dependencyPaths: [] }], - parseArtifacts: { semanticModels: [{ name: 'orders' }] }, - }); - deps.adapter.listTargetConnectionIds = vi.fn().mockResolvedValue(['warehouse-2']); - deps.adapter.finalize = vi.fn().mockResolvedValue({ - result: { sourcesTouched: 1 }, - warnings: ['kept going'], - errors: [], - touchedSources: [{ connectionId: 'warehouse-2', sourceName: 'orders' }], - changedWikiPageKeys: [], - actions: [{ target: 'sl', type: 'updated', key: 'orders', 
targetConnectionId: 'warehouse-2', detail: 'Finalized orders usage', rawPaths: ['semantic_models.yml'] }], - }); - deps.semanticLayerService.loadAllSources.mockImplementation((connectionId: string) => - Promise.resolve({ sources: [{ name: `${connectionId}_source` }], loadErrors: [] }), - ); - deps.sessionWorktree.git.diffNameStatus.mockImplementation(async (from: string, to: string) => - from === 'pre-finalization' && to === 'post-finalization' - ? [{ status: 'M', path: 'semantic-layer/warehouse-2/orders.yaml' }] - : [], - ); - deps.sessionWorktree.git.revParseHead - .mockResolvedValueOnce('pre-finalization') - .mockResolvedValueOnce('post-finalization'); - deps.sessionWorktree.git.commitFiles.mockResolvedValue({ created: true, commitHash: 'finalization-sha' }); - - const runner = buildRunner(deps); - (runner as any).stageRawFilesStage1 = vi.fn().mockResolvedValue({ - currentHashes: new Map([['semantic_models.yml', 'h1']]), - rawDirInWorktree: 'raw-sources/c1/metricflow/s', - }); - (runner as any).resolveStagedDir = vi.fn().mockResolvedValue('/tmp/stage/upload-x'); - - await runner.run({ - jobId: 'j1', - connectionId: 'c1', - sourceKey: 'metricflow', - trigger: 'upload', - bundleRef: { kind: 'upload', uploadId: 'upload-x' }, - }); - - expect(deps.adapter.finalize).toHaveBeenCalledWith( - expect.objectContaining({ - connectionId: 'c1', - sourceKey: 'metricflow', - syncId: expect.any(String), - jobId: 'j1', - runId: 'run-1', - workdir: '/tmp/wt', - parseArtifacts: { semanticModels: [{ name: 'orders' }] }, - overrideReplay: undefined, - }), - ); - expect(deps.reportsRepo.create).toHaveBeenCalledWith( - expect.objectContaining({ - body: expect.objectContaining({ - finalization: expect.objectContaining({ - sourceKey: 'metricflow', - status: 'success', - commitSha: 'finalization-sha', - touchedPaths: ['semantic-layer/warehouse-2/orders.yaml'], - derivedTouchedSources: [{ connectionId: 'warehouse-2', sourceName: 'orders' }], - declaredTouchedSources: [{ connectionId: 
'warehouse-2', sourceName: 'orders' }], - actions: [expect.objectContaining({ key: 'orders' })], - }), - }), - }), - ); - expect(deps.semanticLayerService.loadAllSources).toHaveBeenCalledWith('warehouse-2'); - expect(deps.slSearchService.indexSources).toHaveBeenCalledWith('warehouse-2', [{ name: 'warehouse-2_source' }]); - }); -``` - -Adjust the mocked `revParseHead` and `diffNameStatus` values to match the -current helper names in `makeDeps()` if the test harness already sequences -those calls differently. - -- [ ] **Step 2: Add real-git ordering and overlap tests** - -In `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`, -add two tests inside the isolated-diff describe block: - -```ts - it('runs finalization before wiki sl-ref repair and final gates', async () => { - const runtime = await makeRealGitRuntime(); - try { - const { deps, adapter } = makeDeps(runtime); - adapter.chunk.mockResolvedValue({ - workUnits: [{ unitKey: 'wiki-page', rawFiles: ['cards/source.json'], peerFileIndex: [], dependencyPaths: [] }], - }); - adapter.finalize = vi.fn(async ({ workdir }) => { - await mkdir(join(workdir, 'semantic-layer/warehouse'), { recursive: true }); - await mkdir(join(workdir, 'wiki/global'), { recursive: true }); - await writeFile( - join(workdir, 'semantic-layer/warehouse/orders.yaml'), - 'name: orders\ngrain: [order_id]\ncolumns: [{name: order_id, type: string}]\njoins: []\nmeasures:\n - name: total_orders\n expr: count(*)\n', - ); - await writeFile( - join(workdir, 'wiki/global/finalized-orders.md'), - '---\nsummary: Finalized orders\nusage_mode: auto\nsl_refs: []\n---\n\nOrders use `orders.total_orders`.\n', - ); - return { - warnings: [], - errors: [], - touchedSources: [{ connectionId: 'warehouse', sourceName: 'orders' }], - changedWikiPageKeys: ['finalized-orders'], - actions: [ - { target: 'sl', type: 'created', key: 'orders', detail: 'Finalized orders', rawPaths: ['cards/source.json'] }, - { target: 'wiki', type: 'created', key: 
'finalized-orders', detail: 'Finalized wiki', rawPaths: ['cards/source.json'] }, - ], - }; - }); - deps.agentRunner.runLoop = vi.fn(async () => ({ stopReason: 'natural' as const })) as never; - const runner = new IngestBundleRunner(deps); - await mockStageRawFiles(runner, runtime, [['cards/source.json', 'h1']]); - - await runner.run({ jobId: 'job-finalization', connectionId: 'warehouse', sourceKey: 'metabase', trigger: 'upload', bundleRef: { kind: 'upload', uploadId: 'upload' } }); - - const trace = await readFile(join(runtime.configDir, '.ktx/ingest-traces/job-finalization/trace.jsonl'), 'utf-8'); - expect(trace.indexOf('finalization_committed')).toBeLessThan(trace.indexOf('wiki_sl_refs_repaired')); - expect(trace.indexOf('wiki_sl_refs_repaired')).toBeLessThan(trace.indexOf('final_artifact_gates')); - await expect(readFile(join(runtime.configDir, 'wiki/global/finalized-orders.md'), 'utf-8')).resolves.toContain( - 'sl_refs:\n - orders', - ); - } finally { - await rm(runtime.homeDir, { recursive: true, force: true }); - } - }); - - it('fails when finalization edits a path already changed earlier in the run', async () => { - const runtime = await makeRealGitRuntime(); - try { - const { deps, adapter } = makeDeps(runtime); - adapter.chunk.mockResolvedValue({ - workUnits: [{ unitKey: 'wiki-page', rawFiles: ['cards/source.json'], peerFileIndex: [], dependencyPaths: [] }], - }); - let currentSession: any = null; - deps.toolsetFactory.createIngestWuToolset = vi.fn((toolSession: any) => { - currentSession = toolSession; - return { toRuntimeTools: vi.fn(() => ({})) }; - }); - deps.agentRunner.runLoop = vi.fn(async () => { - const root = rootOfConfig(currentSession.configService, runtime.configDir); - await mkdir(join(root, 'wiki/global'), { recursive: true }); - await writeFile(join(root, 'wiki/global/orders.md'), '---\nsummary: Orders\nusage_mode: auto\n---\n\nWU body\n'); - currentSession.actions.push({ target: 'wiki', type: 'created', key: 'orders', detail: 'WU orders' 
}); - await currentSession.gitService.commitFiles(['wiki/global/orders.md'], 'wu orders', 'KTX Test', 'system@ktx.local'); - return { stopReason: 'natural' as const }; - }) as never; - adapter.finalize = vi.fn(async ({ workdir }) => { - await writeFile(join(workdir, 'wiki/global/orders.md'), '---\nsummary: Orders\nusage_mode: auto\n---\n\nFinalized body\n'); - return { - warnings: [], - errors: [], - touchedSources: [], - changedWikiPageKeys: ['orders'], - actions: [{ target: 'wiki', type: 'updated', key: 'orders', detail: 'Conflicting finalization' }], - }; - }); - const runner = new IngestBundleRunner(deps); - await mockStageRawFiles(runner, runtime, [['cards/source.json', 'h1']]); - - await expect( - runner.run({ jobId: 'job-finalization-overlap', connectionId: 'warehouse', sourceKey: 'metabase', trigger: 'upload', bundleRef: { kind: 'upload', uploadId: 'upload' } }), - ).rejects.toThrow(/finalization modified path\(s\) already changed earlier in this run: wiki\/global\/orders\.md/); - } finally { - await rm(runtime.homeDir, { recursive: true, force: true }); - } - }); -``` - -Add a target-policy regression in the same file: - -```ts - it('rejects finalization writes to unauthorized semantic-layer targets', async () => { - const runtime = await makeRealGitRuntime(); - try { - const { deps, adapter } = makeDeps(runtime); - adapter.chunk.mockResolvedValue({ workUnits: [] }); - adapter.finalize = vi.fn(async ({ workdir }) => { - await mkdir(join(workdir, 'semantic-layer/other-warehouse'), { recursive: true }); - await writeFile( - join(workdir, 'semantic-layer/other-warehouse/orders.yaml'), - 'name: orders\ngrain: [order_id]\ncolumns: [{name: order_id, type: string}]\njoins: []\nmeasures: []\n', - ); - return { - warnings: [], - errors: [], - touchedSources: [{ connectionId: 'other-warehouse', sourceName: 'orders' }], - changedWikiPageKeys: [], - actions: [{ target: 'sl', type: 'created', key: 'orders', targetConnectionId: 'other-warehouse', detail: 'Forbidden 
target', rawPaths: ['cards/source.json'] }], - }; - }); - const runner = new IngestBundleRunner(deps); - await mockStageRawFiles(runner, runtime, [['cards/source.json', 'h1']]); - - await expect( - runner.run({ jobId: 'job-finalization-target-policy', connectionId: 'warehouse', sourceKey: 'metabase', trigger: 'upload', bundleRef: { kind: 'upload', uploadId: 'upload' } }), - ).rejects.toThrow(/unauthorized semantic-layer target/); - const trace = await readFile(join(runtime.configDir, '.ktx/ingest-traces/job-finalization-target-policy/trace.jsonl'), 'utf-8'); - expect(trace).toContain('finalization_committed'); - expect(trace).toContain('semantic_layer_target_policy'); - expect(trace).toContain('ingest_failed'); - } finally { - await rm(runtime.homeDir, { recursive: true, force: true }); - } - }); -``` - -- [ ] **Step 3: Implement the runner finalization phase** - -In `packages/context/src/ingest/ingest-bundle.runner.ts`, import the helpers: - -```ts -import { - compareFinalizationDeclarations, - deriveFinalizationTouchedSources, - deriveFinalizationWikiPageKeys, -} from './finalization-scope.js'; -``` - -Near the existing `latestReportProvenanceRows` and `latestReconciliationActions` -variables at the top of `runInternal()`, add: - -```ts - let latestFinalizationOutcome: IngestReportFinalizationOutcome | undefined; -``` - -Replace the post-processor block after reconciliation with this shape: - -```ts - const preFinalizationSha = await sessionWorktree.git.revParseHead(); - const preFinalizationSourcesByConnection = await this.loadSourcesByConnection( - sessionWorktree.workdir, - slConnectionIds, - ); - let finalizationOutcome: IngestReportFinalizationOutcome | undefined; - let finalizationActions: MemoryAction[] = []; - let finalizationTouchedPaths: string[] = []; - let finalizationTouchedSources: TouchedSlSource[] = []; - let finalizationChangedWikiPageKeys: string[] = []; - let finalizationSha: string | null = null; - - activePhase = 'finalization'; - if 
(adapter.finalize) { - emitStageProgress('finalization', 87, 'Running deterministic finalization'); - await runTrace.event('debug', 'finalization', 'finalization_started', { sourceKey: job.sourceKey }); - const result = await adapter.finalize({ - connectionId: job.connectionId, - sourceKey: job.sourceKey, - syncId, - jobId: job.jobId, - runId: createdRunRow.id, - stagedDir, - workdir: sessionWorktree.workdir, - ...(overrideReport ? {} : { parseArtifacts }), - stageIndex, - workUnitOutcomes, - reconciliationActions: reconcileActions, - ...(overrideReport - ? { - overrideReplay: { - priorJobId: overrideReport.jobId, - priorRunId: overrideReport.runId, - priorSyncId: overrideReport.body.syncId, - evictionRawPaths: overrideReport.body.evictionInputs, - }, - } - : {}), - }); - if (result.errors.length > 0) { - finalizationOutcome = { - sourceKey: job.sourceKey, - status: 'failed', - commitSha: null, - touchedPaths: [], - declaredTouchedSources: result.touchedSources, - derivedTouchedSources: [], - declaredChangedWikiPageKeys: result.changedWikiPageKeys, - derivedChangedWikiPageKeys: [], - mismatches: [], - result: result.result, - errors: result.errors, - warnings: result.warnings, - actions: result.actions ?? [], - provenanceExclusions: [], - }; - latestFinalizationOutcome = finalizationOutcome; - await runTrace.event('error', 'finalization', 'finalization_failed', { - sourceKey: job.sourceKey, - errors: result.errors, - warnings: result.warnings, - }); - throw new Error(`deterministic finalization failed: ${result.errors.join('; ')}`); - } - - const changedBeforeFinalization = new Set([ - ...projectionTouchedPaths, - ...workUnitOutcomes.flatMap((outcome) => outcome.patchTouchedPaths ?? []), - ...(preReconciliationSha && preFinalizationSha !== preReconciliationSha - ? 
(await sessionWorktree.git.diffNameStatus(preReconciliationSha, preFinalizationSha)).map((entry) => entry.path) - : []), - ]); - const changedStatus = await sessionWorktree.git.changedPaths(); - finalizationTouchedPaths = changedStatus; - const overlapping = finalizationTouchedPaths.filter((path) => changedBeforeFinalization.has(path)); - if (overlapping.length > 0) { - await runTrace.event('error', 'finalization', 'finalization_failed', { - sourceKey: job.sourceKey, - reason: 'path_overlap', - overlappingPaths: overlapping.sort(), - }); - throw new Error(`finalization modified path(s) already changed earlier in this run: ${overlapping.sort().join(', ')}`); - } - - const finalizationCommit = - finalizationTouchedPaths.length > 0 - ? await sessionWorktree.git.commitFiles( - finalizationTouchedPaths, - `ingest(${job.sourceKey}): deterministic finalization syncId=${syncId}`, - this.deps.storage.systemGitAuthor.name, - this.deps.storage.systemGitAuthor.email, - ) - : await sessionWorktree.git.commitStaged( - `ingest(${job.sourceKey}): deterministic finalization syncId=${syncId}`, - this.deps.storage.systemGitAuthor.name, - this.deps.storage.systemGitAuthor.email, - ); - finalizationSha = finalizationCommit.created ? finalizationCommit.commitHash : null; - const postFinalizationSha = await sessionWorktree.git.revParseHead(); - finalizationTouchedPaths = - preFinalizationSha !== postFinalizationSha - ? 
(await sessionWorktree.git.diffNameStatus(preFinalizationSha, postFinalizationSha)).map((entry) => entry.path) - : []; - - const changedConnectionIds = [ - ...new Set([ - ...slConnectionIds, - ...finalizationTouchedPaths - .filter((path) => path.startsWith('semantic-layer/')) - .map((path) => path.split('/')[1]) - .filter((connectionId): connectionId is string => Boolean(connectionId)), - ]), - ].sort(); - const postFinalizationSourcesByConnection = await this.loadSourcesByConnection( - sessionWorktree.workdir, - changedConnectionIds, - ); - const scope = await deriveFinalizationTouchedSources({ - changedPaths: finalizationTouchedPaths, - beforeSourcesByConnection: preFinalizationSourcesByConnection, - afterSourcesByConnection: postFinalizationSourcesByConnection, - }); - if (scope.unresolvedPaths.length > 0) { - await runTrace.event('error', 'finalization', 'finalization_failed', { - sourceKey: job.sourceKey, - reason: 'unresolved_semantic_layer_paths', - unresolvedPaths: scope.unresolvedPaths, - }); - throw new Error(`could not resolve finalization semantic-layer path(s): ${scope.unresolvedPaths.join(', ')}`); - } - finalizationTouchedSources = scope.touchedSources; - finalizationChangedWikiPageKeys = deriveFinalizationWikiPageKeys(finalizationTouchedPaths); - const mismatches = compareFinalizationDeclarations({ - declaredTouchedSources: result.touchedSources, - derivedTouchedSources: finalizationTouchedSources, - declaredChangedWikiPageKeys: result.changedWikiPageKeys, - derivedChangedWikiPageKeys: finalizationChangedWikiPageKeys, - }); - if (mismatches.length > 0) { - finalizationOutcome = { - sourceKey: job.sourceKey, - status: 'failed', - commitSha: finalizationSha, - touchedPaths: finalizationTouchedPaths, - declaredTouchedSources: result.touchedSources, - derivedTouchedSources: finalizationTouchedSources, - declaredChangedWikiPageKeys: result.changedWikiPageKeys, - derivedChangedWikiPageKeys: finalizationChangedWikiPageKeys, - mismatches, - result: 
result.result, - errors: ['finalization touched artifact declaration mismatch'], - warnings: result.warnings, - actions: result.actions ?? [], - provenanceExclusions: [], - }; - latestFinalizationOutcome = finalizationOutcome; - await runTrace.event('error', 'finalization', 'finalization_failed', { - sourceKey: job.sourceKey, - reason: 'declaration_mismatch', - mismatches, - }); - throw new Error(`finalization touched artifact declaration mismatch: ${mismatches.map((m) => `${m.direction}:${m.artifactKind}:${m.key}`).join(', ')}`); - } - finalizationActions = result.actions ?? []; - finalizationOutcome = { - sourceKey: job.sourceKey, - status: 'success', - commitSha: finalizationSha, - touchedPaths: finalizationTouchedPaths, - declaredTouchedSources: result.touchedSources, - derivedTouchedSources: finalizationTouchedSources, - declaredChangedWikiPageKeys: result.changedWikiPageKeys, - derivedChangedWikiPageKeys: finalizationChangedWikiPageKeys, - mismatches, - result: result.result, - errors: [], - warnings: result.warnings, - actions: finalizationActions, - provenanceExclusions: [], - }; - latestFinalizationOutcome = finalizationOutcome; - await runTrace.event('debug', 'finalization', 'finalization_committed', { - sourceKey: job.sourceKey, - commitSha: finalizationSha, - touchedPaths: finalizationTouchedPaths, - touchedSources: finalizationTouchedSources, - changedWikiPageKeys: finalizationChangedWikiPageKeys, - warnings: result.warnings, - }); - } else { - await runTrace.event('debug', 'finalization', 'finalization_skipped', { sourceKey: job.sourceKey }); - } -``` - -In the runner `catch` block failure report body, include the latest -finalization outcome: - -```ts - finalization: latestFinalizationOutcome, -``` - -Add `GitService.changedPaths()` in `packages/context/src/core/git.service.ts` -or use an equivalent existing helper: - -```ts - async changedPaths(): Promise<string[]> { - const raw = await this.git.raw(['status', '--porcelain=v1', '-z']); - const fields = 
raw.split('\0').filter(Boolean); - const paths: string[] = []; - for (const field of fields) { - const path = field.slice(3); - if (path.length > 0) { - paths.push(path); - } - } - return [...new Set(paths)].sort(); - } -``` - -Add a private runner helper: - -```ts - private async loadSourcesByConnection( - workdir: string, - connectionIds: string[], - ): Promise<Map<string, SemanticLayerSource[]>> { - const service = this.deps.semanticLayerService.forWorktree(workdir); - const result = new Map<string, SemanticLayerSource[]>(); - for (const connectionId of connectionIds) { - const { sources } = await service.loadAllSources(connectionId); - result.set(connectionId, sources); - } - return result; - } -``` - -- [ ] **Step 4: Feed finalization into repair, target policy, and gates** - -In `packages/context/src/ingest/ingest-bundle.runner.ts`, replace uses of -post-processor scope: - -```ts - ...(postProcessorOutcome?.touchedSources ?? []) -``` - -with: - -```ts - ...finalizationTouchedSources -``` - -Replace final wiki page scope additions so finalization wiki keys are included: - -```ts - ...finalizationChangedWikiPageKeys, -``` - -Replace the final target policy post-processor path mapping with: - -```ts - ...finalizationTouchedPaths, -``` - -Replace report body field: - -```ts - finalization: finalizationOutcome, -``` - -Replace SL reindex touched connections so it includes finalization actions and -derived touched sources: - -```ts - .concat(finalizationActions) - .filter((action) => action.target === 'sl') - .map((action) => actionTargetConnectionId(action, job.connectionId)) - .concat(finalizationTouchedSources.map((source) => source.connectionId)), -``` - -- [ ] **Step 5: Run focused runner tests** - -Run: - -```bash -pnpm --filter @ktx/context exec vitest run src/ingest/ingest-bundle.runner.test.ts src/ingest/ingest-bundle.runner.isolated-diff.test.ts -``` - -Expected: PASS after the runner wiring is complete. 
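
A note on the `changedPaths()` helper from Step 3: with `-z`, `git status --porcelain=v1` emits rename and copy entries as two NUL-separated fields (the destination path with its status prefix, then the original path with no prefix), so slicing three characters off every field would truncate that original path. The sketch below is one way to parse the output as a pure function that is easy to unit test. `PorcelainEntry`, `parsePorcelainV1Z`, and `changedPathsFromStatus` are hypothetical names, not existing `GitService` members, and counting the original path as changed is an assumption about what the overlap check needs.

```ts
// Hypothetical helper types and names; not part of the existing GitService.
interface PorcelainEntry {
  status: string; // two-character XY status code, e.g. ' M', 'R ', '??'
  path: string; // current path of the entry
  origPath?: string; // original path, present only for renames and copies
}

// Parses the output of `git status --porcelain=v1 -z`.
function parsePorcelainV1Z(raw: string): PorcelainEntry[] {
  const fields = raw.split('\0').filter((field) => field.length > 0);
  const entries: PorcelainEntry[] = [];
  for (let i = 0; i < fields.length; i += 1) {
    const field = fields[i] ?? '';
    const status = field.slice(0, 2);
    const path = field.slice(3);
    if (path.length === 0) {
      continue;
    }
    const entry: PorcelainEntry = { status, path };
    // Renames and copies are followed by one extra field: the original path,
    // written without the three-character status prefix.
    if (/[RC]/.test(status)) {
      const next = fields[i + 1];
      if (next !== undefined) {
        entry.origPath = next;
        i += 1; // consume the original-path field
      }
    }
    entries.push(entry);
  }
  return entries;
}

// Assumption: both sides of a rename count as changed paths.
function changedPathsFromStatus(raw: string): string[] {
  const paths = parsePorcelainV1Z(raw).flatMap((entry) =>
    entry.origPath ? [entry.path, entry.origPath] : [entry.path],
  );
  return [...new Set(paths)].sort();
}
```

Because finalization writes files without staging them, renames are unlikely to show up in practice; the sketch mainly guards the parser against the case where an adapter or an earlier step stages a move.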
- -### Task 4: Add finalization provenance and override replay behavior - -**Files:** -- Modify: `packages/context/src/ingest/ingest-bundle.runner.test.ts` -- Modify: `packages/context/src/ingest/ingest-bundle.runner.ts` - -- [ ] **Step 1: Add unit tests for provenance partitioning and override context** - -Add these tests to `packages/context/src/ingest/ingest-bundle.runner.test.ts`: - -```ts - it('reports finalization actions excluded from provenance when raw paths are not defensible', async () => { - const deps = makeDeps(); - deps.adapter.finalize = vi.fn().mockResolvedValue({ - warnings: [], - errors: [], - touchedSources: [], - changedWikiPageKeys: [], - actions: [ - { target: 'wiki', type: 'updated', key: 'historic-sql-pattern', detail: 'No raw path' }, - { target: 'sl', type: 'updated', key: 'orders', detail: 'Invalid raw path', rawPaths: ['missing.json'] }, - ], - }); - const runner = buildRunner(deps); - (runner as any).stageRawFilesStage1 = vi.fn().mockResolvedValue({ - currentHashes: new Map([['current.json', 'h1']]), - rawDirInWorktree: 'raw-sources/c1/fake/s', - }); - (runner as any).resolveStagedDir = vi.fn().mockResolvedValue('/tmp/stage/upload-x'); - - await runner.run({ jobId: 'j1', connectionId: 'c1', sourceKey: 'fake', trigger: 'upload', bundleRef: { kind: 'upload', uploadId: 'upload-x' } }); - - expect(deps.reportsRepo.create).toHaveBeenCalledWith( - expect.objectContaining({ - body: expect.objectContaining({ - finalization: expect.objectContaining({ - provenanceExclusions: [ - expect.objectContaining({ reason: 'missing_raw_paths' }), - expect.objectContaining({ reason: 'raw_path_not_defensible', invalidRawPaths: ['missing.json'] }), - ], - }), - }), - }), - ); - expect(deps.provenanceRepo.insertMany).not.toHaveBeenCalledWith( - expect.arrayContaining([expect.objectContaining({ rawPath: 'missing.json' })]), - ); - }); - - it('passes explicit override replay metadata and no current work unit outcomes', async () => { - const deps = makeDeps(); - 
deps.reportsRepo.findByJobId.mockResolvedValue({ - id: 'prior-report', - runId: 'prior-run', - jobId: 'prior-job', - connectionId: 'c1', - sourceKey: 'fake', - createdAt: '2026-05-18T00:00:00.000Z', - body: { - status: 'completed', - syncId: 'prior-sync', - diffSummary: { added: 0, modified: 0, deleted: 0, unchanged: 0 }, - commitSha: 'prior-sha', - workUnits: [ - { - unitKey: 'prior-unit', - rawFiles: ['prior.json'], - status: 'success', - actions: [{ target: 'wiki', type: 'created', key: 'prior', detail: 'prior' }], - touchedSlSources: [], - }, - ], - failedWorkUnits: [], - reconciliationSkipped: false, - conflictsResolved: [], - evictionsApplied: [{ rawPath: 'do-not-replay.json', artifactKind: 'wiki', artifactKey: 'old', action: 'removed', reason: 'prior' }], - unmappedFallbacks: [], - artifactResolutions: [], - evictionInputs: ['evicted-from-prior-report.json'], - unresolvedCards: [], - supersededBy: null, - overrideOf: null, - provenanceRows: [], - toolTranscripts: [], - }, - }); - deps.adapter.finalize = vi.fn().mockResolvedValue({ - warnings: [], - errors: [], - touchedSources: [], - changedWikiPageKeys: [], - actions: [], - }); - const runner = buildRunner(deps); - (runner as any).stageRawFilesStage1 = vi.fn().mockResolvedValue({ - currentHashes: new Map([['prior.json', 'h1']]), - rawDirInWorktree: 'raw-sources/c1/fake/prior-sync', - }); - (runner as any).resolveStagedDir = vi.fn().mockResolvedValue('/tmp/stage/prior'); - - await runner.run({ jobId: 'override-job', connectionId: 'c1', sourceKey: 'fake', trigger: 'manual_override', bundleRef: { kind: 'override', priorJobId: 'prior-job' } }); - - expect(deps.adapter.finalize).toHaveBeenCalledWith( - expect.objectContaining({ - workUnitOutcomes: [], - parseArtifacts: undefined, - overrideReplay: { - priorJobId: 'prior-job', - priorRunId: 'prior-run', - priorSyncId: 'prior-sync', - evictionRawPaths: ['evicted-from-prior-report.json'], - }, - }), - ); - }); -``` - -- [ ] **Step 2: Partition finalization actions 
for provenance** - -In `packages/context/src/ingest/ingest-bundle.runner.ts`, extend -`ProvenanceRowOrigin` with: - -```ts - | { - source: 'finalization_action'; - actionIndex: number; - action: MemoryAction; - }; -``` - -Add this private helper: - -```ts - private partitionFinalizationActionsForProvenance(input: { - actions: MemoryAction[]; - currentRawPaths: Set; - currentEvictionRawPaths: Set; - overrideEvictionRawPaths: Set; - }): { actions: MemoryAction[]; exclusions: IngestReportFinalizationProvenanceExclusion[] } { - const defensible = new Set([ - ...input.currentRawPaths, - ...input.currentEvictionRawPaths, - ...input.overrideEvictionRawPaths, - ]); - const actions: MemoryAction[] = []; - const exclusions: IngestReportFinalizationProvenanceExclusion[] = []; - for (const action of input.actions) { - const rawPaths = action.rawPaths ?? []; - if (rawPaths.length === 0) { - exclusions.push({ action, reason: 'missing_raw_paths' }); - continue; - } - const invalidRawPaths = rawPaths.filter((rawPath) => !defensible.has(rawPath)).sort(); - if (invalidRawPaths.length > 0) { - exclusions.push({ action, reason: 'raw_path_not_defensible', invalidRawPaths }); - continue; - } - actions.push(action); - } - return { actions, exclusions }; - } -``` - -Update `buildProvenancePlan()` input to accept: - -```ts - finalizationActions: MemoryAction[]; -``` - -Then append finalization rows before artifact resolutions: - -```ts - input.finalizationActions.forEach((action, actionIndex) => { - for (const rawPath of action.rawPaths ?? 
[]) { - pushActionProvenance(rawPath, action, { - source: 'finalization_action', - actionIndex, - action, - }); - } - }); -``` - -Before calling `buildProvenancePlan()`, partition finalization actions: - -```ts - const finalizationProvenance = this.partitionFinalizationActionsForProvenance({ - actions: finalizationActions, - currentRawPaths: new Set(currentHashes.keys()), - currentEvictionRawPaths: new Set(stageIndex.evictionsApplied.map((entry) => entry.rawPath)), - overrideEvictionRawPaths: new Set(overrideReport?.body.evictionInputs ?? []), - }); - if (finalizationOutcome) { - finalizationOutcome.provenanceExclusions = finalizationProvenance.exclusions; - } - const provenancePlan = this.buildProvenancePlan({ - job, - syncId, - currentHashes, - stageIndex, - reconcileActions, - finalizationActions: finalizationProvenance.actions, - }); -``` - -- [ ] **Step 3: Include finalization actions in memory flow** - -In `packages/context/src/ingest/ingest-bundle.runner.ts`, replace: - -```ts - const memoryFlowSavedActions = stageIndex.workUnits.flatMap((wu) => wu.actions).concat(reconcileActions); -``` - -with: - -```ts - const memoryFlowSavedActions = stageIndex.workUnits - .flatMap((wu) => wu.actions) - .concat(reconcileActions) - .concat(finalizationActions); -``` - -Remove post-processor memory-count additions from the saved event. - -- [ ] **Step 4: Run focused provenance tests** - -Run: - -```bash -pnpm --filter @ktx/context exec vitest run src/ingest/ingest-bundle.runner.test.ts -t "finalization" -``` - -Expected: PASS. 
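To make the provenance partitioning rule in Task 4 concrete, here is a small worked example. It is a sketch under stated assumptions: `SketchAction`, `SketchExclusion`, and `partitionForProvenance` are simplified, hypothetical stand-ins for `MemoryAction`, `IngestReportFinalizationProvenanceExclusion`, and the private runner helper, and the three defensible raw-path sets are pre-merged into one set for brevity.

```typescript
// Simplified stand-ins for the real action and exclusion types (assumptions).
interface SketchAction {
  target: 'wiki' | 'sl';
  key: string;
  rawPaths?: string[];
}

type SketchExclusion =
  | { action: SketchAction; reason: 'missing_raw_paths' }
  | { action: SketchAction; reason: 'raw_path_not_defensible'; invalidRawPaths: string[] };

// Keeps an action for provenance only when every raw path it cites is defensible.
function partitionForProvenance(
  actions: SketchAction[],
  defensible: Set<string>,
): { kept: SketchAction[]; exclusions: SketchExclusion[] } {
  const kept: SketchAction[] = [];
  const exclusions: SketchExclusion[] = [];
  for (const action of actions) {
    const rawPaths = action.rawPaths ?? [];
    if (rawPaths.length === 0) {
      // Reported, but it cannot produce a provenance row.
      exclusions.push({ action, reason: 'missing_raw_paths' });
      continue;
    }
    const invalidRawPaths = rawPaths.filter((rawPath) => !defensible.has(rawPath)).sort();
    if (invalidRawPaths.length > 0) {
      exclusions.push({ action, reason: 'raw_path_not_defensible', invalidRawPaths });
      continue;
    }
    kept.push(action);
  }
  return { kept, exclusions };
}

// One action per outcome: no raw paths, an unknown raw path, and a current-snapshot raw path.
const partitioned = partitionForProvenance(
  [
    { target: 'wiki', key: 'historic-sql-pattern' },
    { target: 'sl', key: 'orders', rawPaths: ['missing.json'] },
    { target: 'sl', key: 'customers', rawPaths: ['current.json'] },
  ],
  new Set(['current.json']),
);
```

Only the `customers` action survives for provenance. The other two remain visible through the exclusions list, which matches the requirement that excluded actions are still reported.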
- -### Task 5: Move historic-SQL projection into adapter finalization - -**Files:** -- Modify: `packages/context/src/ingest/adapters/historic-sql/projection.test.ts` -- Modify: `packages/context/src/ingest/adapters/historic-sql/projection.ts` -- Modify: `packages/context/src/ingest/adapters/historic-sql/historic-sql.adapter.ts` - -- [ ] **Step 1: Rename post-processor tests to projection tests** - -Move durable behavior coverage from -`packages/context/src/ingest/adapters/historic-sql/post-processor.test.ts` to -`packages/context/src/ingest/adapters/historic-sql/projection.test.ts`. -The first test must call `projectHistoricSqlEvidence()` directly and assert: - -```ts -expect(result.touchedSources).toEqual([{ connectionId: 'warehouse', sourceName: 'orders' }]); -expect(result.changedWikiPageKeys).toContain('historic-sql-revenue-pattern'); -expect(result.actions).toEqual( - expect.arrayContaining([ - expect.objectContaining({ target: 'sl', key: 'orders', rawPaths: ['tables/public/orders.json'] }), - expect.objectContaining({ target: 'wiki', key: 'historic-sql-revenue-pattern', rawPaths: ['patterns/revenue.json'] }), - ]), -); -``` - -Add an override-safe no-op test: - -```ts - it('does not mark stale or archive pages when override replay has no current-run evidence', async () => { - const result = await projectHistoricSqlEvidence({ - workdir, - connectionId: 'warehouse', - syncId: 'override-sync', - runId: 'override-run', - overrideReplay: { - priorJobId: 'prior-job', - priorRunId: 'prior-run', - priorSyncId: 'prior-sync', - evictionRawPaths: ['tables/public/orders.json'], - }, - }); - - expect(result.tableUsageMerged).toBe(0); - expect(result.staleTablesMarked).toBe(0); - expect(result.patternPagesWritten).toBe(0); - expect(result.stalePatternPagesMarked).toBe(0); - expect(result.archivedPatternPages).toBe(0); - expect(result.touchedSources).toEqual([]); - expect(result.changedWikiPageKeys).toEqual([]); - expect(result.actions).toEqual([]); - }); -``` - -- [ ] 
**Step 2: Extend historic-SQL projection result metadata** - -In `packages/context/src/ingest/adapters/historic-sql/projection.ts`, add -`overrideReplay`, `changedWikiPageKeys`, and `actions`: - -```ts -import type { FinalizationOverrideReplay } from '../../types.js'; -import type { MemoryAction } from '../../../memory/index.js'; - -export interface HistoricSqlProjectionInput { - workdir: string; - connectionId: string; - syncId: string; - runId: string; - overrideReplay?: FinalizationOverrideReplay; -} - -export interface HistoricSqlProjectionResult { - tableUsageMerged: number; - staleTablesMarked: number; - patternPagesWritten: number; - stalePatternPagesMarked: number; - archivedPatternPages: number; - touchedSources: Array<{ connectionId: string; sourceName: string }>; - changedWikiPageKeys: string[]; - actions: MemoryAction[]; - warnings: string[]; -} -``` - -Initialize the new fields: - -```ts - changedWikiPageKeys: [], - actions: [], -``` - -After loading evidence, add the override-safe no-op guard: - -```ts - if (input.overrideReplay && evidence.length === 0) { - result.warnings.push('historic-sql finalization skipped stale/archive cleanup during override replay without current-run evidence'); - return result; - } - if (evidence.length === 0) { - result.warnings.push('historic-sql finalization skipped because no current-run evidence was emitted'); - return result; - } -``` - -When table usage is merged, push a descriptive action: - -```ts - result.actions.push({ - target: 'sl', - type: 'updated', - key: sourceName, - targetConnectionId: input.connectionId, - detail: `Merged historic-SQL usage for ${matchingEvidence.table}`, - rawPaths: [matchingEvidence.rawPath], - }); -``` - -When a table is marked stale without a defensible raw path, push an action -without `rawPaths`: - -```ts - result.actions.push({ - target: 'sl', - type: 'updated', - key: sourceName, - targetConnectionId: input.connectionId, - detail: `Marked historic-SQL usage stale for 
${tableRef}`, - }); -``` - -When a pattern page is written, record the key and action: - -```ts - result.changedWikiPageKeys.push(key); - result.actions.push({ - target: 'wiki', - type: reusable ? 'updated' : 'created', - key, - detail: `Projected historic-SQL pattern ${pattern.pattern.title}`, - rawPaths: [pattern.rawPath], - }); -``` - -When a pattern page is marked stale or archived, record the key and action -without raw paths: - -```ts - result.changedWikiPageKeys.push(page.key); - result.actions.push({ - target: 'wiki', - type: 'updated', - key: page.key, - detail: `Archived stale historic-SQL pattern page ${page.key}`, - }); -``` - -and: - -```ts - result.changedWikiPageKeys.push(page.key); - result.actions.push({ - target: 'wiki', - type: 'updated', - key: page.key, - detail: `Marked historic-SQL pattern page ${page.key} stale`, - }); -``` - -Deduplicate `changedWikiPageKeys` before returning: - -```ts - result.changedWikiPageKeys = [...new Set(result.changedWikiPageKeys)].sort(); - return result; -``` - -- [ ] **Step 3: Implement `HistoricSqlSourceAdapter.finalize()`** - -In `packages/context/src/ingest/adapters/historic-sql/historic-sql.adapter.ts`, -update the type import: - -```ts -import type { - ChunkResult, - DeterministicFinalizationContext, - DiffSet, - FetchContext, - FinalizationResult, - ScopeDescriptor, - SourceAdapter, -} from '../../types.js'; -``` - -Import the projector: - -```ts -import { projectHistoricSqlEvidence } from './projection.js'; -``` - -Add this method to the class: - -```ts - async finalize(ctx: DeterministicFinalizationContext): Promise { - const projection = await projectHistoricSqlEvidence({ - workdir: ctx.workdir, - connectionId: ctx.connectionId, - syncId: ctx.syncId, - runId: ctx.runId, - overrideReplay: ctx.overrideReplay, - }); - return { - result: projection, - warnings: projection.warnings, - errors: [], - touchedSources: projection.touchedSources, - changedWikiPageKeys: projection.changedWikiPageKeys, - actions: 
projection.actions, - }; - } -``` - -- [ ] **Step 4: Run historic-SQL projection tests** - -Run: - -```bash -pnpm --filter @ktx/context exec vitest run src/ingest/adapters/historic-sql/projection.test.ts src/ingest/adapters/historic-sql/historic-sql.adapter.test.ts -``` - -Expected: PASS. - -### Task 6: Remove post-processor infrastructure - -**Files:** -- Delete: `packages/context/src/ingest/adapters/historic-sql/post-processor.ts` -- Delete: `packages/context/src/ingest/adapters/historic-sql/post-processor.test.ts` -- Modify: `packages/context/src/ingest/ports.ts` -- Modify: `packages/context/src/ingest/local-bundle-runtime.ts` -- Modify: `packages/context/src/ingest/index.ts` -- Modify: `packages/context/src/package-exports.test.ts` -- Modify: `packages/context/src/ingest/ingest-bundle.runner.test.ts` -- Modify: `packages/cli/src/ingest.test.ts` -- Modify: `packages/cli/src/setup.ts` - -- [ ] **Step 1: Remove port and dependency types** - -Delete these interfaces from `packages/context/src/ingest/ports.ts`: - -```ts -export interface IngestBundlePostProcessorInput { - connectionId: string; - sourceKey: string; - syncId: string; - jobId: string; - runId: string; - workdir: string; - parseArtifacts: unknown; -} - -export interface IngestBundlePostProcessorResult { - result?: unknown; - warnings: string[]; - errors: string[]; - touchedSources: TouchedSlSource[]; -} - -export interface IngestBundlePostProcessorPort { - run(input: IngestBundlePostProcessorInput): Promise; -} -``` - -Delete this field from `IngestBundleRunnerDeps`: - -```ts - postProcessors?: Record; -``` - -- [ ] **Step 2: Remove local runtime wiring** - -In `packages/context/src/ingest/local-bundle-runtime.ts`, delete: - -```ts -import { HistoricSqlProjectionPostProcessor } from './adapters/historic-sql/post-processor.js'; -``` - -and delete the dependency object field: - -```ts - postProcessors: { - 'historic-sql': new HistoricSqlProjectionPostProcessor(), - }, -``` - -- [ ] **Step 3: Remove 
exports and package-export assertions** - -In `packages/context/src/ingest/index.ts`, delete: - -```ts -export { HistoricSqlProjectionPostProcessor } from './adapters/historic-sql/post-processor.js'; -``` - -In `packages/context/src/package-exports.test.ts`, delete: - -```ts - expect(ingest.HistoricSqlProjectionPostProcessor).toBeTypeOf('function'); -``` - -- [ ] **Step 4: Delete post-processor files** - -Delete: - -```bash -rm packages/context/src/ingest/adapters/historic-sql/post-processor.ts -rm packages/context/src/ingest/adapters/historic-sql/post-processor.test.ts -``` - -- [ ] **Step 5: Replace test assertions using `postProcessor`** - -Search: - -```bash -rg -n "postProcessor|post_processor|postProcessors|HistoricSqlProjectionPostProcessor|IngestBundlePostProcessor" packages/context/src packages/cli/src -``` - -Expected remaining matches: none in production code, exports, report schemas, -or tests. Historical matches in `docs/superpowers/plans/` do not need changes. - -For CLI tests that used a `postProcessor` report fixture, replace the fixture -with: - -```ts -finalization: { - sourceKey: 'historic-sql', - status: 'success', - commitSha: 'finalization-sha', - touchedPaths: ['semantic-layer/c1/_schema/public.yaml', 'wiki/global/historic-sql-orders.md'], - declaredTouchedSources: [{ connectionId: 'c1', sourceName: 'orders' }], - derivedTouchedSources: [{ connectionId: 'c1', sourceName: 'orders' }], - declaredChangedWikiPageKeys: ['historic-sql-orders'], - derivedChangedWikiPageKeys: ['historic-sql-orders'], - mismatches: [], - errors: [], - warnings: [], - actions: [ - { target: 'sl', type: 'updated', key: 'orders', detail: 'Merged usage', targetConnectionId: 'c1', rawPaths: ['tables/public/orders.json'] }, - { target: 'wiki', type: 'created', key: 'historic-sql-orders', detail: 'Projected pattern', rawPaths: ['patterns/orders.json'] }, - ], - provenanceExclusions: [], -} -``` - -- [ ] **Step 6: Run the removal checks** - -Run: - -```bash -rg -n 
"postProcessor|post_processor|postProcessors|HistoricSqlProjectionPostProcessor|IngestBundlePostProcessor" packages/context/src packages/cli/src -pnpm --filter @ktx/context exec vitest run src/package-exports.test.ts src/ingest/ingest-bundle.runner.test.ts src/ingest/report-snapshot.test.ts -``` - -Expected: `rg` returns no matches, and Vitest passes. - -### Task 7: Full verification - -**Files:** -- No source changes beyond prior tasks. - -- [ ] **Step 1: Run focused context tests** - -Run: - -```bash -pnpm --filter @ktx/context exec vitest run \ - src/ingest/finalization-scope.test.ts \ - src/ingest/report-snapshot.test.ts \ - src/ingest/ingest-bundle.runner.test.ts \ - src/ingest/ingest-bundle.runner.isolated-diff.test.ts \ - src/ingest/adapters/historic-sql/projection.test.ts \ - src/ingest/adapters/historic-sql/historic-sql.adapter.test.ts \ - src/package-exports.test.ts -``` - -Expected: PASS. - -- [ ] **Step 2: Run package type-check** - -Run: - -```bash -pnpm --filter @ktx/context run type-check -``` - -Expected: PASS. - -- [ ] **Step 3: Run package tests** - -Run: - -```bash -pnpm --filter @ktx/context run test -``` - -Expected: PASS. - -- [ ] **Step 4: Run dead-code check** - -Run: - -```bash -pnpm run dead-code -``` - -Expected: PASS. If Knip reports only historical Markdown references to removed -post-processor names, leave those Markdown references alone. 
- -- [ ] **Step 5: Run pre-commit on changed TypeScript and Markdown files** - -Run: - -```bash -uv run pre-commit run --files \ - packages/context/src/ingest/types.ts \ - packages/context/src/ingest/reports.ts \ - packages/context/src/ingest/report-snapshot.ts \ - packages/context/src/ingest/finalization-scope.ts \ - packages/context/src/ingest/finalization-scope.test.ts \ - packages/context/src/ingest/ingest-bundle.runner.ts \ - packages/context/src/ingest/ingest-bundle.runner.test.ts \ - packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts \ - packages/context/src/ingest/adapters/historic-sql/projection.ts \ - packages/context/src/ingest/adapters/historic-sql/projection.test.ts \ - packages/context/src/ingest/adapters/historic-sql/historic-sql.adapter.ts \ - packages/context/src/ingest/adapters/historic-sql/historic-sql.adapter.test.ts \ - packages/context/src/ingest/local-bundle-runtime.ts \ - packages/context/src/ingest/ports.ts \ - packages/context/src/ingest/index.ts \ - packages/context/src/package-exports.test.ts \ - docs/superpowers/plans/2026-05-18-adapter-owned-finalization-v1.md -``` - -Expected: PASS. If pre-commit is unavailable because the local `uv` version is -older than the project pin, report the mismatch and keep the focused pnpm -checks as verification evidence. - -## Documentation decision - -No `docs-site/content/docs/` update is required for this plan. The change -removes an internal runner extension point and changes ingest report internals, -but it does not add or rename a public CLI command, flag, configuration key, -connector setup flow, or user-facing workflow. 
diff --git a/docs/superpowers/specs/2026-05-18-adapter-owned-ingest-finalization-design.md b/docs/superpowers/specs/2026-05-18-adapter-owned-ingest-finalization-design.md deleted file mode 100644 index 4142bf31..00000000 --- a/docs/superpowers/specs/2026-05-18-adapter-owned-ingest-finalization-design.md +++ /dev/null @@ -1,443 +0,0 @@ -# Adapter-owned ingest finalization design - -**Date:** 2026-05-18 -**Author:** Andrey Avtomonov -**Status:** Design - pending implementation plan - -## Background - -The isolated-diff ingestion migration made KTX's shared bundle runner -responsible for one durable execution model: stage raw source data, run -source-planned work units in isolated child worktrees, integrate their diffs, -reconcile, run final gates, and squash the accepted integration tree back into -the project worktree. - -That direction is correct, but the current code still has a runner-level -post-processing extension point. `IngestBundleRunnerDeps.postProcessors` maps a -source key to an arbitrary `IngestBundlePostProcessorPort`, and local runtime -wires `historic-sql` to `HistoricSqlProjectionPostProcessor`. That path can -write durable semantic-layer and wiki artifacts after work-unit integration and -reconciliation, outside the source adapter contract. - -Historic SQL exposed why the extra path exists. Its table and pattern work units -emit typed evidence, then a deterministic projection step merges the evidence -into `_schema` usage and historic-SQL wiki pages. Some of that work is local to -one work unit, but other behavior is whole-run maintenance: marking stale table -usage, reusing existing pattern pages, and archiving old pattern pages. Those -aggregate decisions do not fit cleanly inside independent per-work-unit writes. - -The design goal is to preserve legitimate adapter-owned deterministic -maintenance without keeping a generic runner-level escape hatch. 
- -## Goals - -This design tightens the isolated-diff architecture around a stable boundary: -the generic runner owns execution mechanics, and adapters own source semantics. - -The design has these goals: - -- Remove runner-level `postProcessors` as an alternate durable-write pipeline. -- Add a first-class `SourceAdapter.finalize?()` hook for deterministic - post-work-unit source maintenance. -- Keep `finalize?()` constrained, observable, and subject to the same final - validation gates as work-unit and reconciliation changes. -- Preserve historic-SQL aggregate projection behavior without treating it as a - hidden fallback ingestion path. -- Keep public execution knobs out of the adapter API. - -## Non-goals - -This design does not rework source-specific chunking, fetch formats, wiki page -frontmatter, semantic-layer YAML, or raw source layouts. It does not replace -agent-authored work units with deterministic projectors. It also does not add a -public `executionMode`, `planningStrategy`, `conflictPolicy`, or source-key -allowlist. - -Override ingest remains a special correction operation that reuses a prior raw -snapshot and forces reconciliation. It should be documented and tested as -override replay, not as a fallback pipeline. This design does not require -override ingest to run source work units. - -## Locked design direction - -The shared ingestion runner keeps one ordered pipeline for sources that can -write durable project artifacts. - -```text -fetch raw - -> adapter plans WorkUnit[] - -> optional adapter project - -> isolated WU diffs - -> artifact-aware integration - -> reconciliation - -> optional adapter finalize - -> runner wiki-SL-ref repair - -> final target policy and artifact gates - -> squash -``` - -The exact implementation may continue to call `chunk()` before `project()` so a -projector can consume `parseArtifacts`. 
The architectural invariant is that -`project()` runs in the integration worktree before child worktrees start, while -`finalize()` runs in the integration worktree after accepted work-unit and -reconciliation changes are present. - -Adapters decide what source-specific work belongs in `project()`, work units, -or `finalize()`. The runner decides when those phases run, captures their git -effects, enforces target scope, runs gates, writes traces and reports, and -squashes the final tree. - -## Adapter API - -The source adapter contract should make deterministic source phases explicit. - -```ts -interface SourceAdapter { - readonly source: string; - readonly skillNames: string[]; - readonly reconcileSkillNames?: string[]; - readonly evidenceIndexing?: 'documents'; - readonly triageSupported?: boolean; - - getTriageSignals?(stagedDir: string, externalId: string): Promise; - detect(stagedDir: string): Promise; - fetch?(pullConfig: unknown, stagedDir: string, ctx: FetchContext): Promise; - readFetchReport?(stagedDir: string): Promise; - listTargetConnectionIds?(stagedDir: string): Promise; - chunk(stagedDir: string, diffSet?: DiffSet): Promise; - clusterWorkUnits?(ctx: ClusterWorkUnitsContext): Promise; - project?(ctx: DeterministicProjectionContext): Promise; - finalize?(ctx: DeterministicFinalizationContext): Promise; - describeScope?(stagedDir: string): Promise; - onPullSucceeded?(ctx: PullSucceededContext): Promise; -} -``` - -`finalize?()` is not a compatibility wrapper for old post-processors. It is a -source-adapter method with a fixed location in the runner lifecycle. 
- -```ts -interface DeterministicFinalizationContext { - connectionId: string; - sourceKey: string; - syncId: string; - jobId: string; - runId: string; - stagedDir: string; - workdir: string; - parseArtifacts?: unknown; - stageIndex: StageIndex; - workUnitOutcomes: WorkUnitOutcome[]; - reconciliationActions: MemoryAction[]; - overrideReplay?: FinalizationOverrideReplay; -} - -interface FinalizationResult { - warnings: string[]; - errors: string[]; - touchedSources: TouchedSlSource[]; - changedWikiPageKeys: string[]; - actions?: MemoryAction[]; - result?: unknown; -} - -interface FinalizationOverrideReplay { - priorJobId: string; - priorRunId: string; - priorSyncId: string; - evictionRawPaths: string[]; -} -``` - -The implementation plan can adjust exact type names to match the existing -module layout, but the contract must preserve these semantics: - -- `finalize?()` is deterministic TypeScript code, not an agent loop. -- It runs only in the ingestion integration worktree. -- It may write ordinary durable project files. -- It must report the semantic-layer sources and wiki page keys it believes it - touched so the runner can verify that declaration against the worktree diff. -- Outside override replay, `stageIndex` is the canonical runner index for - accepted work-unit actions, touched sources, evictions, reconciliation records, - and artifact resolutions visible to the current run. -- In override replay, `stageIndex` is a prior-run replay index for work-unit - facts. It may contain prior-run work-unit actions, touched sources, and - artifact records, and adapters must not treat those entries as current-run - evidence. The runner must not replay prior-report `evictionsApplied` as - current-run eviction evidence. If override reconciliation records eviction - decisions, those records are fresh current-run `stageIndex.evictionsApplied` - entries. -- `workUnitOutcomes` contains only work units executed in the current run. 
It - is empty when override replay skips source work units. -- `reconciliationActions` contains only accepted reconciliation writes emitted - through the reconciliation tool session in the current run. These actions have - already mutated the integration worktree. -- `overrideReplay` being present is the canonical signal that source work units - did not produce current-run evidence unless another context field explicitly - carries fresh current-run deterministic input. -- `overrideReplay.evictionRawPaths` contains the deleted raw paths loaded from - the prior report's `evictionInputs` for the reused raw snapshot. It is the - only override-replay raw-path allowlist for removed-from-snapshot provenance. - It is not, by itself, proof that a particular durable artifact is stale or was - observed by current-run work units. -- `actions` in `FinalizationResult` are descriptive records for finalization - writes that the adapter already performed. The runner must not re-apply them. - When finalization actions are intended to create provenance rows, they must - carry defensible `rawPaths`: current-snapshot paths from the current raw - snapshot, removed-from-snapshot paths from current-run - `stageIndex.evictionsApplied`, or removed-from-snapshot paths from - `overrideReplay.evictionRawPaths` when override replay is present. - Finalization actions without defensible raw-path attribution are still - reported, but the runner must exclude them from provenance and surface that - exclusion explicitly. -- It cannot mutate the main project worktree directly. -- The finalization context must not pass a root-scoped service that can bypass - the integration worktree. `workdir` is the durable write boundary. If a future - helper is added to the context, the contract must name it as worktree-scoped - and state whether it is read-only or allowed to write. - -The existing adapter API fields unrelated to deterministic projection and -finalization remain part of the contract. 
Adding `finalize?()` must not remove -triage or evidence-indexing support. - -## Override replay - -Override ingest remains a replay of a prior raw snapshot with forced -reconciliation. It does not execute source work units or call `adapter.chunk()` -in this design, so finalization must not silently assume fresh work-unit -evidence exists. - -The runner should still enter the finalization phase for adapters that -implement `finalize?()`, but it must pass explicit override metadata. In that -mode, `workUnitOutcomes` is empty, `parseArtifacts` is absent, -`overrideReplay.evictionRawPaths` is populated from the prior report's -`evictionInputs`, `stageIndex` comes from the prior report with prior -`evictionsApplied` excluded, and `reconciliationActions` contains only new -override reconciliation actions. - -If a future implementation intentionally re-parses the materialized override -raw snapshot, it must expose that fact through an explicit override-safe context -field instead of relying on `parseArtifacts` alone. `parseArtifacts` by itself -is never current work-unit evidence in override replay and never authorizes -historic-SQL whole-run cleanup. - -Adapters must treat missing current-run deterministic inputs as a no-op, not as -negative evidence. For historic SQL, override replay must not mark tables stale, -mark pattern pages stale, or archive pattern pages from an empty current-run -evidence directory. Whole-run cleanup can run only when `overrideReplay` is -absent and current-run work-unit evidence exists, or when a future explicit -override-safe context field names equivalent facts. Any override-safe -finalization must be derived from the materialized raw snapshot or explicit -prior-report data. In particular, prior-run -`stageIndex.workUnits[*].actions`, prior-run touched sources, and prior-run -artifact records are not proof that the current override run observed or failed -to observe those artifacts. 
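The override-replay rule above can be read as a small decision function. The following sketch is illustrative only: `SketchCleanupInput`, `CleanupDecision`, and `decideWholeRunCleanup` are hypothetical names, the evidence count stands in for however an adapter detects current-run evidence, and the warning strings are placeholders. It follows the stricter wording of this section, where the presence of `overrideReplay` alone blocks whole-run cleanup.

```typescript
// Hypothetical, simplified shapes; field names mirror FinalizationOverrideReplay.
interface SketchOverrideReplay {
  priorJobId: string;
  priorRunId: string;
  priorSyncId: string;
  evictionRawPaths: string[];
}

interface SketchCleanupInput {
  overrideReplay?: SketchOverrideReplay;
  // Assumption: the adapter can count evidence emitted by current-run work units.
  currentRunEvidenceCount: number;
}

type CleanupDecision = { run: true } | { run: false; warning: string };

// Missing current-run input is treated as a no-op, never as negative evidence.
function decideWholeRunCleanup(input: SketchCleanupInput): CleanupDecision {
  if (input.overrideReplay) {
    return { run: false, warning: 'cleanup skipped: override replay has no current-run evidence' };
  }
  if (input.currentRunEvidenceCount === 0) {
    return { run: false, warning: 'cleanup skipped: no current-run evidence was emitted' };
  }
  return { run: true };
}
```

Returning a warning instead of throwing keeps the skipped cleanup observable in the report without failing the override run.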
- -## Runner responsibilities - -The runner owns all reusable mechanics around `finalize?()`. - -After reconciliation completes, the runner calls `adapter.finalize?()` if it -exists. The runner captures the pre-finalization commit, derives the -finalization changed paths from the integration-worktree git diff, commits those -changes, records the commit SHA and touched paths in the run trace/report, -includes finalization actions in saved-memory counts, and runs wiki-SL-ref -repair before final target-policy and artifact gates. - -The integration-worktree diff is the source of truth for finalization touched -paths, changed wiki page keys, and semantic-layer paths. The adapter's -`touchedSources` and `changedWikiPageKeys` declaration is a verification input, -not the downstream authority. The runner must derive the final repair and gate -scope from the diff, cross-check the adapter declaration against that diff, and -fail the run on under-reporting or over-reporting that would make wiki-SL-ref -repair, target-policy checks, final gates, reports, traces, or provenance use a -different artifact set from the actual finalization commit. - -The runner-derived semantic-layer scope must include logical -`TouchedSlSource` tuples, not only file paths. Standalone semantic-layer files -under `semantic-layer//.yaml` can map structurally to -`{ connectionId, sourceName }`. Aggregate semantic-layer files, including -`semantic-layer//_schema/*.yaml`, must be resolved by comparing -the pre-finalization and post-finalization materialized semantic-layer sources -with the worktree-scoped semantic-layer parser/loader. Wiki page keys continue -to map structurally from `wiki/global/.md`. If the runner cannot -resolve a changed semantic-layer path to logical touched sources with its own -resolver, the run must fail; it must not fall back to the adapter declaration as -the downstream scope. - -`wiki_sl_ref_repair` remains a runner mechanic, not an adapter method. 
It runs -after finalization and before final gates, and it uses the normal target -connection set plus the runner-derived finalization touched sources to decide -which semantic-layer references are visible. Its writes are part of the same -integration worktree diff as finalization/reconciliation, so target-policy -checks, final artifact gates, reports, traces, and squash behavior cover those -writes before changes reach the main project worktree. - -The runner must treat finalization like deterministic projection and -reconciliation, not like a free-form source-key plug-in. It must enforce the -same target-connection policy used for work-unit and reconciliation changes. -If finalization writes an unauthorized semantic-layer target, modifies artifacts -outside the authorized target set, references a missing semantic-layer entity, or -returns errors, the run fails before changes reach the main project worktree. - -The runner should expose one trace phase named `finalization`. It should not -keep a `post_processor` stage, `IngestBundlePostProcessorPort`, -`deps.postProcessors`, or report fields that imply a parallel post-processor -pipeline. - -## Adapter application - -Each adapter continues to use the same generic runner mechanics, while keeping -source-specific choices inside the adapter. - -- `metabase` fetches cards and dashboards, computes scope, plans - card/dashboard work units, and usually does not need `project()` or - `finalize()`. -- `notion` fetches pages, extracts triage signals, clusters page work units, - and usually does not need deterministic finalization. -- `dbt` fetches the repository, parses dbt project metadata, plans model work - units, and may later add `project()` if dbt YAML import becomes deterministic. -- `lookml` fetches LookML, produces validation artifacts, plans model and - explore work units, and may later add `project()` for deterministic LookML to - semantic-layer import. 
-- `looker` fetches runtime bundles, fetch reports, target connections, and - triage signals. It continues to rely on work-unit diffs and shared gates. -- `metricflow` is the current strong `project()` example. It imports - authoritative semantic models before child worktrees start, then lets any - work units observe those projected files. -- `live-database` can remain work-unit based, but database schema introspection - is a good future `project()` candidate because the schema is authoritative - structured metadata. -- `historic-sql` should move current post-processor behavior into the adapter. - Local table-usage and pattern-page writes may move into work-unit tools where - they are genuinely per-unit. Whole-run maintenance such as stale table usage, - pattern-page reuse, and stale/archive page decisions belongs in - `HistoricSqlSourceAdapter.finalize()`. -- `fake` remains a test adapter and does not need deterministic phases. - -## Historic-SQL migration - -Historic SQL should stop using evidence-only tool output plus runner-level -post-processing as its durable projection path. - -The preferred migration is: - -1. Keep historic-SQL work units responsible for source-shaped analysis. -2. Use source-specific tools for per-unit durable writes when the output is - local to that unit, such as a table's usage metadata or one pattern page. -3. Move whole-run deterministic cleanup into - `HistoricSqlSourceAdapter.finalize()`. -4. Delete `HistoricSqlProjectionPostProcessor`, `IngestBundlePostProcessorPort`, - `deps.postProcessors`, and `post_processor` memory-flow/report stages. - -If the implementation keeps typed evidence as an internal handoff between -historic-SQL work units and `finalize()`, that evidence must be framed as -source-specific input to the adapter's deterministic finalization, not as a -generic runner post-processing mechanism. The evidence files must not become a -public compatibility surface. 
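As a rough sketch of step 3, the whole-run cleanup inside `HistoricSqlSourceAdapter.finalize()` could plan stale-table decisions from the internal evidence handoff while treating missing evidence as a no-op, as the override replay section requires. Everything here is hypothetical: the `EvidenceState` and `CleanupPlan` types, the `planStaleTableCleanup` helper, and the rule that a known table is stale when current evidence does not mention it are assumptions for illustration only.

```typescript
// Hypothetical internal evidence handoff between work units and finalize().
type EvidenceState =
  | { kind: "absent" }
  | { kind: "present"; observedTables: Set<string> };

interface CleanupPlan {
  staleTables: string[];
  skipped: boolean;
}

// Plans stale-table cleanup; never infers staleness from missing evidence.
function planStaleTableCleanup(
  knownTables: string[],
  evidence: EvidenceState,
  isOverrideReplay: boolean,
): CleanupPlan {
  if (isOverrideReplay || evidence.kind === "absent") {
    // No current-run evidence: no-op, not negative evidence.
    return { staleTables: [], skipped: true };
  }
  // Assumed staleness rule: known tables the current run did not observe.
  const staleTables = knownTables.filter((t) => !evidence.observedTables.has(t));
  return { staleTables, skipped: false };
}
```

Keeping the evidence type private to the adapter is one way to stop it from becoming a public compatibility surface.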
- -Historic-SQL finalization must distinguish "no current-run evidence exists" -from "the current snapshot proves this artifact is stale." Whole-run cleanup -such as stale table usage, pattern-page staleness, and archive decisions can -run only when finalization has current-run historic-SQL evidence or an explicit -override-safe source of equivalent facts. - -## Reports and observability - -Reports should describe first-class pipeline phases, not historical extension -points. The isolated-diff summary should include finalization metadata when the -adapter implements `finalize?()`: whether it ran, finalization commit SHA, -touched paths, touched semantic-layer sources, changed wiki page keys, -warnings, descriptive finalization actions, and source-specific result payload. - -Saved-memory counts should come from work-unit, reconciliation, and -finalization memory actions plus touched artifact reporting. Finalization -actions are reporting/provenance records for writes that already happened in -the integration worktree; they are not a second write channel. There should be -no special `postProcessorSavedMemoryCounts` or `postProcessor` report body. -Memory-flow phases should use `finalization` instead of `post_processor`. - -The runner owns provenance for finalization. Adapters return touched artifacts -and optional descriptive actions, but they do not call the provenance port. -When finalization actions include valid `rawPaths`, the runner folds them into -the normal provenance plan using the current `sourceKey`, `syncId`, raw content -hashes, artifact kind, artifact key, target connection, and action type. The -finalization phase and commit SHA belong in trace/report metadata; they should -not be fabricated inside adapter-written files. - -Finalization reports must show both the adapter-declared touched artifacts and -the runner-derived touched artifacts from the finalization git diff. 
When those -sets differ, the report and trace must include the mismatch and the run must -fail before wiki-SL-ref repair or final gates rely on the wrong scope. When a -finalization action is excluded from provenance because no defensible raw path -exists, the report must name the action and reason instead of silently dropping -it. - -Traces must make finalization useful for postmortems. At minimum, record -`finalization_started`, `finalization_committed`, `finalization_skipped`, and -`finalization_failed` events with source key, touched paths, warnings, and -error summaries. - -## Failure handling - -Finalization failures are ingestion failures. If `finalize?()` returns errors, -throws, writes unauthorized targets, or causes final gates to fail, the runner -marks the run failed and leaves the main project worktree unchanged. - -Finalization should run after reconciliation because it may need to inspect the -accepted work-unit and reconciliation result. Final gates should run after -finalization because finalization writes durable project artifacts. - -Finalization must not be used to repair arbitrary integration conflicts or -rerun agent work. Conflict repair remains part of artifact-aware integration and -reconciliation. - -Finalization must also preserve reconciliation and accepted work-unit writes -from the same run. The runner must remember the paths changed before -finalization and fail if `finalize?()` modifies the same path after -reconciliation. If a source needs deterministic maintenance for an artifact -created or edited by a work unit in the same run, that behavior belongs in the -source-specific work-unit tool or in a later run, not in post-reconciliation -finalization. - -## Acceptance criteria - -The implementation is complete when these conditions are true: - -- No production runtime wiring references `deps.postProcessors`. 
-- `IngestBundlePostProcessorPort` and `HistoricSqlProjectionPostProcessor` are - removed from source exports and package export tests. -- `SourceAdapter.finalize?()` exists with typed context and result objects. -- The runner invokes `finalize?()` after reconciliation and before final gates. -- Finalization changes are committed in the integration worktree and included - in target-policy checks, final gates, reports, traces, and provenance inputs. -- Override replay passes explicit override metadata to finalization, including - `overrideReplay.evictionRawPaths`; leaves `workUnitOutcomes` empty when work - units are skipped; omits `parseArtifacts` unless a future explicit - override-safe input is added; and proves historic-SQL finalization does not - use prior-run `stageIndex` records as current-run evidence or stale/archive - artifacts from missing current-run evidence. -- Finalization provenance uses current raw paths, current-run - `stageIndex.evictionsApplied`, or `overrideReplay.evictionRawPaths`, and - actions without defensible raw-path attribution are reported as excluded from - provenance. -- The runner derives finalization touched paths, wiki page keys, and - semantic-layer scope from the integration-worktree git diff, resolves - aggregate semantic-layer files such as `_schema/*.yaml` to logical touched - sources with the runner's own semantic-layer parser/loader, cross-checks the - adapter's touched-artifact declaration, and fails on mismatches or - unresolvable changed semantic-layer paths. -- The runner fails when finalization modifies a path already changed by accepted - work-unit or reconciliation writes in the same run. -- `wiki_sl_ref_repair` remains a runner-owned step after finalization and - before final gates, consumes runner-derived finalization touched sources, and - has its writes covered by target-policy checks and final gates. 
-- Finalization `actions` are not re-applied by the runner; they are included - only in reporting, saved-memory counts, and provenance planning when their - raw-path attribution is valid. -- Historic SQL uses adapter-owned finalization for whole-run projection - maintenance. -- Tests cover a successful finalization, a finalization failure, unauthorized - finalization target rejection, override replay finalization behavior, - wiki-SL-ref repair placement, and historic-SQL projection behavior without - runner-level post-processors.
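The cross-check and same-path criteria above can be sketched as a comparison over three path lists. This is a minimal sketch under stated assumptions: the `ScopeCheck` type and the `checkFinalizationScope` and `assertFinalizationScope` helpers are hypothetical names, the path lists are assumed to be already collected by the runner (for example from the integration-worktree git diff), and failing on any mismatch is a simplification of the rule in the text.

```typescript
// Hypothetical result of comparing runner-derived and adapter-declared scope.
interface ScopeCheck {
  underReported: string[]; // in the diff, missing from the declaration
  overReported: string[]; // declared, but not in the diff
  overlapsEarlierWrites: string[]; // changed before and during finalization
}

function checkFinalizationScope(
  diffPaths: string[],
  declaredPaths: string[],
  preFinalizationPaths: string[],
): ScopeCheck {
  const diff = new Set(diffPaths);
  const declared = new Set(declaredPaths);
  const earlier = new Set(preFinalizationPaths);
  return {
    underReported: diffPaths.filter((p) => !declared.has(p)),
    overReported: declaredPaths.filter((p) => !diff.has(p)),
    overlapsEarlierWrites: diffPaths.filter((p) => earlier.has(p)),
  };
}

// Throws when the declaration and the diff disagree or paths overlap.
function assertFinalizationScope(check: ScopeCheck): void {
  const problems: string[] = [];
  if (check.underReported.length > 0) {
    problems.push(`under-reported: ${check.underReported.join(", ")}`);
  }
  if (check.overReported.length > 0) {
    problems.push(`over-reported: ${check.overReported.join(", ")}`);
  }
  if (check.overlapsEarlierWrites.length > 0) {
    problems.push(`modified after reconciliation: ${check.overlapsEarlierWrites.join(", ")}`);
  }
  if (problems.length > 0) {
    throw new Error(`finalization scope check failed (${problems.join("; ")})`);
  }
}
```

In this sketch the diff-derived list stays the downstream authority; the declared list is only ever compared against it, never used as a fallback.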