Move docs changes to specs repo

Andrey Avtomonov 2026-05-19 15:11:09 +02:00
parent 2403f58eff
commit 573dfc20f0
8 changed files with 4 additions and 2793 deletions


@@ -1,5 +1,5 @@
{
"title": "Getting Started",
"defaultOpen": true,
-"pages": ["introduction", "quickstart", "troubleshooting-linux"]
+"pages": ["introduction", "quickstart"]
}


@@ -296,7 +296,7 @@ surface.
| Anthropic health check fails | API key, model id, or access is invalid | Fix `ANTHROPIC_API_KEY` or rerun setup with a different key or model |
| Vertex AI health check fails | Vertex API, Claude access, project, location, or IAM permissions are missing | Check the project, location, Application Default Credentials, and Vertex AI permissions |
| OpenAI embeddings fail | `OPENAI_API_KEY` is missing or invalid | Export the key or choose local sentence-transformers embeddings |
-| Local embeddings fail | Managed Python runtime cannot install or start | See [Troubleshooting clean Linux install](/docs/getting-started/troubleshooting-linux) — usually missing Python 3.13 or an IPv6 proxy env var |
+| Local embeddings fail | Managed Python runtime cannot install or start | Run `ktx dev runtime status`, then install the local embeddings runtime |
| Database test fails | Credentials, network access, database, warehouse, or schema is wrong | Test the same values with the database's native client, then rerun setup |
| Context is not built | Setup saved configuration but skipped or interrupted the build | Run `ktx setup` or `ktx ingest --all` |
| Agent integration is incomplete | Setup skipped the agents step or installed a different target | Run `ktx setup --agents --target <target>` |


@@ -1,163 +0,0 @@
---
title: Troubleshooting clean Linux install
description: Known gotchas when installing KTX from scratch on a clean Linux host (Ubuntu, Debian, container images). Read this before debugging managed-runtime or daemon failures.
---
This page documents the friction a coding agent (or human) will hit when running `npm install -g @kaelio/ktx@next` on a clean Linux host with no Python ≥ 3.13 installed, and during the first `ktx setup` on that host. Each item lists the symptom, the cause, and the exact recovery command.
## Prerequisites that aren't always satisfied
KTX needs:
| Tool | Minimum version | Why |
|------|-----------------|-----|
| Node.js | 22 | Runs the CLI |
| `uv` | 0.5+ | Manages the local Python runtime (semantic-layer daemon, local embeddings) |
| Python | 3.13 | KTX's managed Python runtime targets `>=3.13`. The system Python on Ubuntu 24.04 is 3.12. |
If `uv` is not on `PATH`, install it:
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env # or: export PATH="$HOME/.local/bin:$PATH"
```
Install Python 3.13 via `uv` so it sits alongside whatever the system ships:
```bash
uv python install 3.13
```
You do not need to make 3.13 the system default. KTX's runtime installer will pick it up when you set `UV_PYTHON=3.13` for the install command (see below).
## Symptom: `ktx dev runtime install` fails on the venv step
The install log (`~/.ktx/runtime/<version>/install.log`) shows something like:
```
$ uv venv /home/runner/.ktx/runtime/<version>/.venv
Using CPython 3.12.3 interpreter at: /usr/bin/python3
...
Package requires Python >=3.13 but the running Python is 3.12.3
```
**Cause:** `uv venv` picked the system Python (3.12) when it built the runtime virtualenv. KTX's wheels declare `requires-python = ">=3.13"`, so the subsequent install fails.
**Fix:** install Python 3.13 (above), then force the runtime installer to use it:
```bash
uv python install 3.13
UV_PYTHON=3.13 ktx dev runtime install --feature local-embeddings --yes --force
```
The `--force` flag rebuilds the venv. Without it, the failed venv from the previous attempt is reused.
## Symptom: managed Python daemon crashes immediately with `URL parse error`
The daemon stderr (`<project>/.ktx/runtime/daemon.stderr.log`) contains an httpx traceback ending in something like:
```
File ".../httpx/_client.py", line 698, in __init__
URLPattern(key): None
File ".../httpx/_urls.py", line ..., in __init__
raise InvalidURL(...)
```
**Cause:** an environment variable holds a value httpx cannot parse — typically `NO_PROXY` or `no_proxy` containing an **IPv6 CIDR** such as `fd07:b51a:cc66:f0::/64`. OrbStack and some Docker network setups inject this by default. httpx interprets every comma-separated entry as a URL pattern and rejects raw IPv6 CIDRs.
**Fix:** scrub the bad entries before starting the daemon. The simplest workaround is to unset proxy vars entirely for daemon-related commands:
```bash
unset HTTP_PROXY HTTPS_PROXY NO_PROXY http_proxy https_proxy no_proxy
ktx dev runtime start --feature local-embeddings
```
If you need proxy entries to remain set for outbound HTTP, keep only the IPv4 + hostname entries:
```bash
export NO_PROXY="localhost,127.0.0.1,*.orb.internal,*.orb.local"
```
This issue is tracked for an upstream fix in the daemon: it should sanitize unparseable entries before constructing httpx clients.
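
The sanitizing step described above could look roughly like the following. This is a minimal sketch, not KTX code: `sanitizeNoProxy` is a hypothetical helper, and it assumes that any entry with two or more colons is an IPv6 address or IPv6 CIDR range that should be dropped, while hostnames, wildcards, and IPv4 entries are kept.

```typescript
// Hypothetical helper, not part of KTX. Drops NO_PROXY entries that look like
// IPv6 addresses or IPv6 CIDR ranges and keeps hostnames, wildcards, and IPv4.
function sanitizeNoProxy(value: string): string {
  return value
    .split(',')
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    // Two or more colons cannot be a plain "host:port" pair, so treat it as IPv6.
    .filter((entry) => (entry.match(/:/g) ?? []).length < 2)
    .join(',');
}

const cleanedNoProxy = sanitizeNoProxy(
  'localhost,127.0.0.1,fd07:b51a:cc66:f0::/64,*.orb.local',
);
console.log(cleanedNoProxy); // localhost,127.0.0.1,*.orb.local
```

A wrapper script could export the cleaned value as `NO_PROXY` before starting the daemon, which avoids unsetting the proxy variables entirely.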
## Symptom: `ktx setup` keeps connecting to an old daemon port
Running `ktx setup` more than once can leave orphan `ktx-daemon` processes. Each `setup` invocation may spawn a fresh daemon on a new port and write a new `daemon.json`, while the old one keeps running. Subsequent setup attempts may pick the stale port and fail with a connection-refused error or a `500` health check.
**Fix:** stop all daemons and remove the state files before re-running setup:
```bash
pkill -9 -f ktx-daemon || true
rm -f ~/.ktx/runtime/*/daemon.json
rm -f /path/to/project/.ktx/runtime/daemon.json
```
Then start the daemon explicitly **before** re-running setup so `setup` reuses it:
```bash
unset HTTP_PROXY HTTPS_PROXY NO_PROXY http_proxy https_proxy no_proxy
ktx dev runtime start --feature local-embeddings
```
## Symptom: `ktx status --json` reports a connection as failed but `ktx connection test <id>` passes
`ktx status` may cache a failure record from a prior bad run (for example, when the daemon was crashing). A successful `ktx connection test` does not always invalidate the cache.
**Fix:** re-run a fast ingest, which writes a fresh status record:
```bash
ktx ingest <connection-id> --fast
ktx status --json
```
## A minimal "clean Linux install" recipe
If you only want one working sequence, this one works from a fresh Ubuntu 24.04 container with Node 22 and Claude Code installed:
```bash
# 1. Prerequisites
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
uv python install 3.13
npm install -g @kaelio/ktx@next
# 2. Pre-warm the managed Python runtime with the right Python
UV_PYTHON=3.13 ktx dev runtime install --feature local-embeddings --yes --force
# 3. Start the daemon with a clean proxy env
unset HTTP_PROXY HTTPS_PROXY NO_PROXY http_proxy https_proxy no_proxy
ktx dev runtime start --feature local-embeddings
# 4. Scripted setup (replace DATABASE_URL with your warehouse)
mkdir -p /work/project
cd /work/project
export ANTHROPIC_API_KEY=... # already in env from your Claude Code session
export DATABASE_URL=postgresql://...
ktx setup \
  --no-input \
  --yes \
  --project-dir /work/project \
  --llm-backend anthropic \
  --anthropic-api-key-env ANTHROPIC_API_KEY \
  --anthropic-model claude-sonnet-4-6 \
  --embedding-backend sentence-transformers \
  --database postgres \
  --new-database-connection-id warehouse \
  --database-url env:DATABASE_URL \
  --skip-sources \
  --skip-agents
# 5. Build schema context
ktx ingest warehouse --fast
# 6. Verify
ktx status --json
ktx connection test warehouse
```
Success looks like:
- `ktx status --json` reports `"verdict": "ready"`
- `ktx connection test warehouse` exits 0 with `Status: ok`
- `semantic-layer/warehouse/_schema/` contains generated YAML files


@@ -1,7 +1,7 @@
import { source } from "@/lib/source";
import { readDocsPageMarkdown } from "@/lib/docs-markdown";
-const siteOrigin = process.env.KTX_DOCS_ORIGIN ?? "https://docs.kaelio.com/ktx";
+const siteOrigin = "https://docs.kaelio.com/ktx";
export type LlmDocsPage = {
title: string;
@@ -61,7 +61,6 @@ ${link("/docs/ai-resources/agent-instructions", "Agent Instructions", "Suggested
${link("/docs/getting-started/introduction", "Introduction", "What KTX is and who it is for")}
${link("/docs/getting-started/quickstart", "Quickstart", "Set up KTX and build your first context")}
-${link("/docs/getting-started/troubleshooting-linux", "Troubleshooting clean Linux install", "READ FIRST if installing from scratch on Linux/container — covers Python 3.13 prerequisite, IPv6 proxy gotcha, and a minimal working recipe")}
${link("/docs/guides/writing-context", "Writing Context", "Write semantic sources and wiki pages")}
## Machine-Readable Documentation


@@ -1,6 +1,6 @@
/// <reference types="next" />
/// <reference types="next/image-types/global" />
-import "./.next/dev/types/routes.d.ts";
+import "./.next/types/routes.d.ts";
// NOTE: This file should not be edited
// see https://nextjs.org/docs/app/api-reference/config/typescript for more information.


@@ -1,320 +0,0 @@
# Adapter-Owned Finalization V1 Closure Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Close the remaining adapter-owned finalization v1 verification gaps so the finalization contract is publicly typed and the historic-SQL local acceptance path passes through `SourceAdapter.finalize()`.
**Architecture:** The production runner already owns finalization execution, commits, target policy, final gates, reports, traces, and provenance. This plan keeps production behavior intact, exports the finalization adapter types through the ingest barrel, and updates the local historic-SQL acceptance fixture to model the real adapter-owned finalization path instead of the removed post-processor path.
**Tech Stack:** TypeScript ESM/NodeNext, Vitest, pnpm workspace commands, existing `SourceAdapter`, `projectHistoricSqlEvidence()`, and package export coverage.
---
## Audit summary
The audit compared
`docs/superpowers/specs/2026-05-18-adapter-owned-ingest-finalization-design.md`
against the implemented source, plan, and targeted tests.
Implemented v1 coverage:
- `SourceAdapter.finalize()` exists with typed context and result objects in
`packages/context/src/ingest/types.ts`.
- `IngestBundleRunnerDeps.postProcessors`, `IngestBundlePostProcessorPort`,
`HistoricSqlProjectionPostProcessor`, `post_processor` trace phases, and
`postProcessor` report fields are absent from production source.
- The runner invokes finalization after reconciliation and before
`wiki_sl_ref_repair`, target-policy checks, final artifact gates,
provenance validation, and squash.
- The runner derives finalization touched paths from the integration-worktree
diff, resolves semantic-layer scope including `_schema/*.yaml`, cross-checks
adapter declarations, commits finalization, records reports/traces, rejects
path overlap, and partitions finalization actions for provenance exclusions.
- Override replay passes explicit `overrideReplay` metadata, omits
`parseArtifacts`, and leaves current-run `workUnitOutcomes` empty.
- Historic SQL implements adapter-owned `finalize()` and uses
`projectHistoricSqlEvidence()` for aggregate projection maintenance.
V1-blocking gaps:
- `packages/context/src/ingest/index.ts` exports `SourceAdapter` and projection
types, but not `DeterministicFinalizationContext`,
`FinalizationOverrideReplay`, or `FinalizationResult`. The adapter contract is
less usable from the public ingest barrel than the spec requires.
- The targeted verification command currently fails because
`HistoricSqlEvidenceTestAdapter` in
`packages/context/src/ingest/local-bundle-ingest.test.ts` lacks
`finalize()`, so `result.report.body.finalization` is `undefined` in the
local historic-SQL projection acceptance test.
Non-blocking gaps:
- Older historical plan documents still mention post-processors. They are
archived implementation history and do not affect runtime behavior.
- The runner has helper-level declaration mismatch coverage, but no dedicated
local-bundle integration test for a finalization declaration mismatch. The
implementation path exists; adding a higher-level regression test can be a
later hardening pass.
- Finalization wiki page deletion could use a future global wiki-reference gate
regression. Historic-SQL v1 finalization updates or archives pages in place,
so this is not required for the current v1 acceptance path.
## File structure
- Modify `packages/context/src/ingest/index.ts`.
Re-export the typed finalization adapter contract next to the existing
projection contract.
- Modify `packages/context/src/package-exports.test.ts`.
Add compile-time coverage proving finalization adapter types are exported
from the ingest barrel.
- Modify `packages/context/src/ingest/local-bundle-ingest.test.ts`.
Make the historic-SQL local acceptance test adapter implement
`finalize()` by delegating to `projectHistoricSqlEvidence()`, and rename the
stale test label from post-processor to finalization.
---
### Task 1: Export finalization adapter contract types
**Files:**
- Modify: `packages/context/src/package-exports.test.ts`
- Modify: `packages/context/src/ingest/index.ts`
- [ ] **Step 1: Write failing type export coverage**
In `packages/context/src/package-exports.test.ts`, add this import after the
existing Vitest import:
```ts
import type {
  DeterministicFinalizationContext,
  FinalizationOverrideReplay,
  FinalizationResult,
} from './ingest/index.js';
```
Then add this constant after `scanTypeExportCoverage`:
```ts
const ingestFinalizationTypeExportCoverage: Partial<{
  context: DeterministicFinalizationContext;
  overrideReplay: FinalizationOverrideReplay;
  result: FinalizationResult;
}> = {};
```
Inside the existing package export test, place this assertion immediately after
`expect(scanTypeExportCoverage).toEqual({});`:
```ts
expect(ingestFinalizationTypeExportCoverage).toEqual({});
```
- [ ] **Step 2: Run type-check to verify the coverage fails**
Run:
```bash
pnpm --filter @ktx/context run type-check
```
Expected: FAIL with TypeScript errors like:
```text
Module '"./ingest/index.js"' has no exported member 'DeterministicFinalizationContext'.
Module '"./ingest/index.js"' has no exported member 'FinalizationOverrideReplay'.
Module '"./ingest/index.js"' has no exported member 'FinalizationResult'.
```
- [ ] **Step 3: Export the finalization types**
In `packages/context/src/ingest/index.ts`, update the existing export block
from `./types.js` so the final lines read:
```ts
  WorkUnit,
  DeterministicProjectionContext,
  ProjectionResult,
  DeterministicFinalizationContext,
  FinalizationOverrideReplay,
  FinalizationResult,
} from './types.js';
```
- [ ] **Step 4: Run type-check and package export coverage**
Run:
```bash
pnpm --filter @ktx/context run type-check
pnpm --filter @ktx/context exec vitest run src/package-exports.test.ts
```
Expected: both commands PASS.
- [ ] **Step 5: Commit the type export closure**
Run:
```bash
git add packages/context/src/ingest/index.ts packages/context/src/package-exports.test.ts
git commit -m "feat(ingest): export finalization adapter contract types"
```
### Task 2: Repair the local historic-SQL finalization acceptance fixture
**Files:**
- Modify: `packages/context/src/ingest/local-bundle-ingest.test.ts`
- [ ] **Step 1: Import the projection helper and finalization types**
In `packages/context/src/ingest/local-bundle-ingest.test.ts`, add this import
after the fake adapter import:
```ts
import { projectHistoricSqlEvidence } from './adapters/historic-sql/projection.js';
```
Replace the existing type import from `./types.js` with:
```ts
import type {
  ChunkResult,
  DeterministicFinalizationContext,
  DiffSet,
  FinalizationResult,
  SourceAdapter,
} from './types.js';
```
- [ ] **Step 2: Add adapter-owned finalization to the test adapter**
In `HistoricSqlEvidenceTestAdapter`, add this method after `chunk()`:
```ts
async finalize(ctx: DeterministicFinalizationContext): Promise<FinalizationResult> {
  const projection = await projectHistoricSqlEvidence({
    workdir: ctx.workdir,
    connectionId: ctx.connectionId,
    syncId: ctx.syncId,
    runId: ctx.runId,
    overrideReplay: ctx.overrideReplay,
  });
  return {
    result: projection,
    warnings: projection.warnings,
    errors: [],
    touchedSources: projection.touchedSources,
    changedWikiPageKeys: projection.changedWikiPageKeys,
    actions: projection.actions,
  };
}
```
- [ ] **Step 3: Rename the stale test label**
Change the test name:
```ts
it('runs historic-SQL evidence projection through the local bundle post-processor', async () => {
```
to:
```ts
it('runs historic-SQL evidence projection through local bundle finalization', async () => {
```
- [ ] **Step 4: Run the focused test that failed before this task**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/local-bundle-ingest.test.ts -t "historic-SQL evidence projection"
```
Expected: PASS, and the assertion at
`packages/context/src/ingest/local-bundle-ingest.test.ts:551` receives a
`result.report.body.finalization` object with `status: "success"`.
- [ ] **Step 5: Commit the local acceptance fixture**
Run:
```bash
git add packages/context/src/ingest/local-bundle-ingest.test.ts
git commit -m "test(ingest): exercise historic sql finalization locally"
```
### Task 3: Run final verification
**Files:**
- Verify: `packages/context/src/ingest/finalization-scope.test.ts`
- Verify: `packages/context/src/ingest/ingest-bundle.runner.test.ts`
- Verify: `packages/context/src/ingest/ingest-bundle.runner.isolated-diff.test.ts`
- Verify: `packages/context/src/ingest/adapters/historic-sql/projection.test.ts`
- Verify: `packages/context/src/ingest/local-bundle-ingest.test.ts`
- Verify: `packages/context/src/ingest/adapters/historic-sql/local-ingest-acceptance.test.ts`
- Verify: workspace TypeScript and dead-code checks
- [ ] **Step 1: Run the adapter-owned finalization targeted suite**
Run:
```bash
pnpm --filter @ktx/context exec vitest run src/ingest/finalization-scope.test.ts src/ingest/ingest-bundle.runner.test.ts src/ingest/ingest-bundle.runner.isolated-diff.test.ts src/ingest/adapters/historic-sql/projection.test.ts src/ingest/local-bundle-ingest.test.ts src/ingest/adapters/historic-sql/local-ingest-acceptance.test.ts
```
Expected: PASS with all six test files passing.
- [ ] **Step 2: Run TypeScript validation**
Run:
```bash
pnpm --filter @ktx/context run type-check
```
Expected: PASS.
- [ ] **Step 3: Run dead-code validation**
Run:
```bash
pnpm run dead-code
```
Expected: PASS.
- [ ] **Step 4: Inspect final status**
Run:
```bash
git status --short
```
Expected: no uncommitted changes from this plan remain, so the worktree is
clean after the two commits.
## Docs impact
No `docs-site/content/docs/` update is required. The remaining v1 work is an
adapter contract type export and test acceptance closure; it does not change
CLI behavior, user configuration, setup flow, connector behavior, or public
documentation examples.
## Self-review
- Spec coverage: The plan covers the remaining adapter API usability gap and
the failing historic-SQL local finalization acceptance path. The main
runner, reports, traces, provenance, override replay, and historic-SQL
production finalization behavior already exist.
- Placeholder scan: The plan contains no placeholder tasks or unspecified
implementation steps.
- Type consistency: `DeterministicFinalizationContext`,
`FinalizationOverrideReplay`, and `FinalizationResult` match the existing
names in `packages/context/src/ingest/types.ts`; the test adapter delegates
to the existing `projectHistoricSqlEvidence()` result shape.


@@ -1,443 +0,0 @@
# Adapter-owned ingest finalization design
**Date:** 2026-05-18
**Author:** Andrey Avtomonov
**Status:** Design - pending implementation plan
## Background
The isolated-diff ingestion migration made KTX's shared bundle runner
responsible for one durable execution model: stage raw source data, run
source-planned work units in isolated child worktrees, integrate their diffs,
reconcile, run final gates, and squash the accepted integration tree back into
the project worktree.
That direction is correct, but the current code still has a runner-level
post-processing extension point. `IngestBundleRunnerDeps.postProcessors` maps a
source key to an arbitrary `IngestBundlePostProcessorPort`, and local runtime
wires `historic-sql` to `HistoricSqlProjectionPostProcessor`. That path can
write durable semantic-layer and wiki artifacts after work-unit integration and
reconciliation, outside the source adapter contract.
Historic SQL exposed why the extra path exists. Its table and pattern work units
emit typed evidence, then a deterministic projection step merges the evidence
into `_schema` usage and historic-SQL wiki pages. Some of that work is local to
one work unit, but other behavior is whole-run maintenance: marking stale table
usage, reusing existing pattern pages, and archiving old pattern pages. Those
aggregate decisions do not fit cleanly inside independent per-work-unit writes.
The design goal is to preserve legitimate adapter-owned deterministic
maintenance without keeping a generic runner-level escape hatch.
## Goals
This design tightens the isolated-diff architecture around a stable boundary:
the generic runner owns execution mechanics, and adapters own source semantics.
The design has these goals:
- Remove runner-level `postProcessors` as an alternate durable-write pipeline.
- Add a first-class `SourceAdapter.finalize?()` hook for deterministic
post-work-unit source maintenance.
- Keep `finalize?()` constrained, observable, and subject to the same final
validation gates as work-unit and reconciliation changes.
- Preserve historic-SQL aggregate projection behavior without treating it as a
hidden fallback ingestion path.
- Keep public execution knobs out of the adapter API.
## Non-goals
This design does not rework source-specific chunking, fetch formats, wiki page
frontmatter, semantic-layer YAML, or raw source layouts. It does not replace
agent-authored work units with deterministic projectors. It also does not add a
public `executionMode`, `planningStrategy`, `conflictPolicy`, or source-key
allowlist.
Override ingest remains a special correction operation that reuses a prior raw
snapshot and forces reconciliation. It should be documented and tested as
override replay, not as a fallback pipeline. This design does not require
override ingest to run source work units.
## Locked design direction
The shared ingestion runner keeps one ordered pipeline for sources that can
write durable project artifacts.
```text
fetch raw
-> adapter plans WorkUnit[]
-> optional adapter project
-> isolated WU diffs
-> artifact-aware integration
-> reconciliation
-> optional adapter finalize
-> runner wiki-SL-ref repair
-> final target policy and artifact gates
-> squash
```
The exact implementation may continue to call `chunk()` before `project()` so a
projector can consume `parseArtifacts`. The architectural invariant is that
`project()` runs in the integration worktree before child worktrees start, while
`finalize()` runs in the integration worktree after accepted work-unit and
reconciliation changes are present.
Adapters decide what source-specific work belongs in `project()`, work units,
or `finalize()`. The runner decides when those phases run, captures their git
effects, enforces target scope, runs gates, writes traces and reports, and
squashes the final tree.
## Adapter API
The source adapter contract should make deterministic source phases explicit.
```ts
interface SourceAdapter {
  readonly source: string;
  readonly skillNames: string[];
  readonly reconcileSkillNames?: string[];
  readonly evidenceIndexing?: 'documents';
  readonly triageSupported?: boolean;
  getTriageSignals?(stagedDir: string, externalId: string): Promise<TriageSignals>;
  detect(stagedDir: string): Promise<boolean>;
  fetch?(pullConfig: unknown, stagedDir: string, ctx: FetchContext): Promise<void>;
  readFetchReport?(stagedDir: string): Promise<SourceFetchReport | null>;
  listTargetConnectionIds?(stagedDir: string): Promise<string[]>;
  chunk(stagedDir: string, diffSet?: DiffSet): Promise<ChunkResult>;
  clusterWorkUnits?(ctx: ClusterWorkUnitsContext): Promise<WorkUnit[]>;
  project?(ctx: DeterministicProjectionContext): Promise<ProjectionResult>;
  finalize?(ctx: DeterministicFinalizationContext): Promise<FinalizationResult>;
  describeScope?(stagedDir: string): Promise<ScopeDescriptor>;
  onPullSucceeded?(ctx: PullSucceededContext): Promise<void>;
}
`finalize?()` is not a compatibility wrapper for old post-processors. It is a
source-adapter method with a fixed location in the runner lifecycle.
```ts
interface DeterministicFinalizationContext {
  connectionId: string;
  sourceKey: string;
  syncId: string;
  jobId: string;
  runId: string;
  stagedDir: string;
  workdir: string;
  parseArtifacts?: unknown;
  stageIndex: StageIndex;
  workUnitOutcomes: WorkUnitOutcome[];
  reconciliationActions: MemoryAction[];
  overrideReplay?: FinalizationOverrideReplay;
}

interface FinalizationResult {
  warnings: string[];
  errors: string[];
  touchedSources: TouchedSlSource[];
  changedWikiPageKeys: string[];
  actions?: MemoryAction[];
  result?: unknown;
}

interface FinalizationOverrideReplay {
  priorJobId: string;
  priorRunId: string;
  priorSyncId: string;
  evictionRawPaths: string[];
}
```
The implementation plan can adjust exact type names to match the existing
module layout, but the contract must preserve these semantics:
- `finalize?()` is deterministic TypeScript code, not an agent loop.
- It runs only in the ingestion integration worktree.
- It may write ordinary durable project files.
- It must report the semantic-layer sources and wiki page keys it believes it
touched so the runner can verify that declaration against the worktree diff.
- Outside override replay, `stageIndex` is the canonical runner index for
accepted work-unit actions, touched sources, evictions, reconciliation records,
and artifact resolutions visible to the current run.
- In override replay, `stageIndex` is a prior-run replay index for work-unit
facts. It may contain prior-run work-unit actions, touched sources, and
artifact records, and adapters must not treat those entries as current-run
evidence. The runner must not replay prior-report `evictionsApplied` as
current-run eviction evidence. If override reconciliation records eviction
decisions, those records are fresh current-run `stageIndex.evictionsApplied`
entries.
- `workUnitOutcomes` contains only work units executed in the current run. It
is empty when override replay skips source work units.
- `reconciliationActions` contains only accepted reconciliation writes emitted
through the reconciliation tool session in the current run. These actions have
already mutated the integration worktree.
- `overrideReplay` being present is the canonical signal that source work units
did not produce current-run evidence unless another context field explicitly
carries fresh current-run deterministic input.
- `overrideReplay.evictionRawPaths` contains the deleted raw paths loaded from
the prior report's `evictionInputs` for the reused raw snapshot. It is the
only override-replay raw-path allowlist for removed-from-snapshot provenance.
It is not, by itself, proof that a particular durable artifact is stale or was
observed by current-run work units.
- `actions` in `FinalizationResult` are descriptive records for finalization
writes that the adapter already performed. The runner must not re-apply them.
When finalization actions are intended to create provenance rows, they must
carry defensible `rawPaths`: current-snapshot paths from the current raw
snapshot, removed-from-snapshot paths from current-run
`stageIndex.evictionsApplied`, or removed-from-snapshot paths from
`overrideReplay.evictionRawPaths` when override replay is present.
Finalization actions without defensible raw-path attribution are still
reported, but the runner must exclude them from provenance and surface that
exclusion explicitly.
- It cannot mutate the main project worktree directly.
- The finalization context must not pass a root-scoped service that can bypass
the integration worktree. `workdir` is the durable write boundary. If a future
helper is added to the context, the contract must name it as worktree-scoped
and state whether it is read-only or allowed to write.
The existing adapter API fields unrelated to deterministic projection and
finalization remain part of the contract. Adding `finalize?()` must not remove
triage or evidence-indexing support.
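
The raw-path attribution rule for finalization actions can be sketched as a partition step. This is only an illustration under stated assumptions: `FinalizationAction` is a simplified stand-in for `MemoryAction`, `RawPathAllowlists` and `partitionForProvenance` are hypothetical names, and the sketch assumes an action is provenance-eligible only when it carries at least one raw path and every raw path appears in one of the three allowed sets.

```typescript
// Hypothetical, simplified stand-in for the runner's MemoryAction type.
interface FinalizationAction {
  target: string;
  rawPaths?: string[];
}

// Assumed shape; the design names the three sources, not this structure.
interface RawPathAllowlists {
  currentSnapshot: Set<string>;
  currentRunEvictions: Set<string>;
  overrideReplayEvictions?: Set<string>; // only present during override replay
}

function partitionForProvenance(
  actions: FinalizationAction[],
  allow: RawPathAllowlists,
): { eligible: FinalizationAction[]; excluded: FinalizationAction[] } {
  const isDefensible = (rawPath: string): boolean =>
    allow.currentSnapshot.has(rawPath) ||
    allow.currentRunEvictions.has(rawPath) ||
    (allow.overrideReplayEvictions?.has(rawPath) ?? false);

  const eligible: FinalizationAction[] = [];
  const excluded: FinalizationAction[] = [];
  for (const action of actions) {
    const rawPaths = action.rawPaths ?? [];
    // Assumption: eligible only with raw paths, all of them allowlisted.
    const ok = rawPaths.length > 0 && rawPaths.every(isDefensible);
    (ok ? eligible : excluded).push(action);
  }
  return { eligible, excluded };
}
```

Excluded actions stay in the returned result so a report can still list them and state why they produced no provenance rows.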
## Override replay
Override ingest remains a replay of a prior raw snapshot with forced
reconciliation. It does not execute source work units or call `adapter.chunk()`
in this design, so finalization must not silently assume fresh work-unit
evidence exists.
The runner should still enter the finalization phase for adapters that
implement `finalize?()`, but it must pass explicit override metadata. In that
mode, `workUnitOutcomes` is empty, `parseArtifacts` is absent,
`overrideReplay.evictionRawPaths` is populated from the prior report's
`evictionInputs`, `stageIndex` comes from the prior report with prior
`evictionsApplied` excluded, and `reconciliationActions` contains only new
override reconciliation actions.
If a future implementation intentionally re-parses the materialized override
raw snapshot, it must expose that fact through an explicit override-safe context
field instead of relying on `parseArtifacts` alone. `parseArtifacts` by itself
is never current work-unit evidence in override replay and never authorizes
historic-SQL whole-run cleanup.
Adapters must treat missing current-run deterministic inputs as a no-op, not as
negative evidence. For historic SQL, override replay must not mark tables stale,
mark pattern pages stale, or archive pattern pages from an empty current-run
evidence directory. Whole-run cleanup can run only when `overrideReplay` is
absent and current-run work-unit evidence exists, or when a future explicit
override-safe context field names equivalent facts. Any override-safe
finalization must be derived from the materialized raw snapshot or explicit
prior-report data. In particular, prior-run
`stageIndex.workUnits[*].actions`, prior-run touched sources, and prior-run
artifact records are not proof that the current override run observed or failed
to observe those artifacts.
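
The cleanup rule above can be expressed as a small guard that an adapter's `finalize()` might call before any stale-marking or archiving. This is a sketch under assumptions: `CleanupGateInput` and `canRunWholeRunCleanup` are hypothetical names, and "current-run work-unit evidence exists" is approximated here as a non-empty `workUnitOutcomes` array, which the design does not state explicitly.

```typescript
// Hypothetical, trimmed-down view of the finalization context.
interface CleanupGateInput {
  overrideReplay?: { priorJobId: string };
  workUnitOutcomes: unknown[];
}

// Returns true only when whole-run cleanup (stale marking, archiving) is safe.
function canRunWholeRunCleanup(ctx: CleanupGateInput): boolean {
  // Override replay never counts as negative evidence: treat it as a no-op.
  if (ctx.overrideReplay !== undefined) return false;
  // Assumption: non-empty outcomes stand in for "current-run evidence exists".
  return ctx.workUnitOutcomes.length > 0;
}

console.log(canRunWholeRunCleanup({ workUnitOutcomes: [{}] })); // true
console.log(
  canRunWholeRunCleanup({ overrideReplay: { priorJobId: 'job-1' }, workUnitOutcomes: [] }),
); // false
```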
## Runner responsibilities
The runner owns all reusable mechanics around `finalize?()`.
After reconciliation completes, the runner calls `adapter.finalize?()` if it
exists. The runner captures the pre-finalization commit, derives the
finalization changed paths from the integration-worktree git diff, commits those
changes, records the commit SHA and touched paths in the run trace/report,
includes finalization actions in saved-memory counts, and runs wiki-SL-ref
repair before final target-policy and artifact gates.
The integration-worktree diff is the source of truth for finalization touched
paths, changed wiki page keys, and semantic-layer paths. The adapter's
`touchedSources` and `changedWikiPageKeys` declaration is a verification input,
not the downstream authority. The runner must derive the final repair and gate
scope from the diff, cross-check the adapter declaration against that diff, and
fail the run on under-reporting or over-reporting that would make wiki-SL-ref
repair, target-policy checks, final gates, reports, traces, or provenance use a
different artifact set from the actual finalization commit.
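
The cross-check between the diff-derived scope and the adapter declaration might be sketched as a set comparison. The names `crossCheckTouchedSources` and `DeclarationMismatch` are hypothetical, and `TouchedSlSource` is redeclared here in the `{ connectionId, sourceName }` shape only to keep the example self-contained.

```typescript
// Redeclared for the example; the real type lives in the ingest types module.
type TouchedSlSource = { connectionId: string; sourceName: string };

interface DeclarationMismatch {
  underReported: string[]; // present in the diff, missing from the declaration
  overReported: string[]; // declared by the adapter, absent from the diff
}

const touchedSourceKey = (s: TouchedSlSource): string =>
  `${s.connectionId}/${s.sourceName}`;

function crossCheckTouchedSources(
  derivedFromDiff: TouchedSlSource[],
  declaredByAdapter: TouchedSlSource[],
): DeclarationMismatch {
  const derived = new Set(derivedFromDiff.map(touchedSourceKey));
  const declared = new Set(declaredByAdapter.map(touchedSourceKey));
  return {
    underReported: Array.from(derived).filter((key) => !declared.has(key)).sort(),
    overReported: Array.from(declared).filter((key) => !derived.has(key)).sort(),
  };
}
```

A runner following this design would fail the run when either list is non-empty and would keep using the diff-derived set for repair, gates, and provenance.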
The runner-derived semantic-layer scope must include logical
`TouchedSlSource` tuples, not only file paths. Standalone semantic-layer files
under `semantic-layer/<connectionId>/<sourceName>.yaml` can map structurally to
`{ connectionId, sourceName }`. Aggregate semantic-layer files, including
`semantic-layer/<connectionId>/_schema/*.yaml`, must be resolved by comparing
the pre-finalization and post-finalization materialized semantic-layer sources
with the worktree-scoped semantic-layer parser/loader. Wiki page keys continue
to map structurally from `wiki/global/<pageKey>.md`. If the runner cannot
resolve a changed semantic-layer path to logical touched sources with its own
resolver, the run must fail; it must not fall back to the adapter declaration as
the downstream scope.
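
The structural part of this mapping can be sketched as follows. The function and type names are hypothetical, and two assumptions are baked in: wiki page keys are single path segments, and `_schema/*.yaml` is the only aggregate layout recognized. Aggregate files are only flagged here; resolving them to logical sources still requires the pre/post comparison with the semantic-layer parser/loader described above.

```typescript
type ResolvedPath =
  | { kind: "wiki"; pageKey: string }
  | { kind: "standalone-sl"; connectionId: string; sourceName: string }
  | { kind: "aggregate-sl"; connectionId: string; path: string } // needs loader comparison
  | { kind: "other"; path: string };

function classifyChangedPath(path: string): ResolvedPath {
  // Assumption: page keys do not contain "/".
  const wiki = /^wiki\/global\/([^/]+)\.md$/.exec(path);
  if (wiki) return { kind: "wiki", pageKey: wiki[1] };

  // Aggregate files are checked before standalone files.
  const aggregate = /^semantic-layer\/([^/]+)\/_schema\/[^/]+\.yaml$/.exec(path);
  if (aggregate) return { kind: "aggregate-sl", connectionId: aggregate[1], path };

  const standalone = /^semantic-layer\/([^/]+)\/([^/]+)\.yaml$/.exec(path);
  if (standalone) {
    return { kind: "standalone-sl", connectionId: standalone[1], sourceName: standalone[2] };
  }

  // A changed semantic-layer path that cannot be resolved fails the run;
  // there is no fallback to the adapter declaration.
  if (/^semantic-layer\//.test(path)) {
    throw new Error(`unresolvable semantic-layer path: ${path}`);
  }
  return { kind: "other", path };
}
```
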
`wiki_sl_ref_repair` remains a runner mechanic, not an adapter method. It runs
after finalization and before final gates, and it uses the normal target
connection set plus the runner-derived finalization touched sources to decide
which semantic-layer references are visible. Its writes are part of the same
integration worktree diff as finalization/reconciliation, so target-policy
checks, final artifact gates, reports, traces, and squash behavior cover those
writes before changes reach the main project worktree.
The runner must treat finalization like deterministic projection and
reconciliation, not like a free-form source-key plug-in. It must enforce the
same target-connection policy used for work-unit and reconciliation changes.
If finalization writes an unauthorized semantic-layer target, modifies artifacts
outside the authorized target set, references a missing semantic-layer entity, or
returns errors, the run fails before changes reach the main project worktree.
The runner should expose one trace phase named `finalization`. It should not
keep a `post_processor` stage, `IngestBundlePostProcessorPort`,
`deps.postProcessors`, or report fields that imply a parallel post-processor
pipeline.
## Adapter application
Each adapter continues to use the same generic runner mechanics, while keeping
source-specific choices inside the adapter.
- `metabase` fetches cards and dashboards, computes scope, plans
card/dashboard work units, and usually does not need `project()` or
`finalize()`.
- `notion` fetches pages, extracts triage signals, clusters page work units,
and usually does not need deterministic finalization.
- `dbt` fetches the repository, parses dbt project metadata, plans model work
units, and may later add `project()` if dbt YAML import becomes deterministic.
- `lookml` fetches LookML, produces validation artifacts, plans model and
explore work units, and may later add `project()` for deterministic LookML to
semantic-layer import.
- `looker` fetches runtime bundles, fetch reports, target connections, and
triage signals. It continues to rely on work-unit diffs and shared gates.
- `metricflow` is the current strong `project()` example. It imports
authoritative semantic models before child worktrees start, then lets any
work units observe those projected files.
- `live-database` can remain work-unit based, but database schema introspection
is a good future `project()` candidate because the schema is authoritative
structured metadata.
- `historic-sql` should move current post-processor behavior into the adapter.
Local table-usage and pattern-page writes may move into work-unit tools where
they are genuinely per-unit. Whole-run maintenance such as stale table usage,
pattern-page reuse, and stale/archive page decisions belongs in
`HistoricSqlSourceAdapter.finalize()`.
- `fake` remains a test adapter and does not need deterministic phases.
## Historic-SQL migration
Historic SQL should stop using evidence-only tool output plus runner-level
post-processing as its durable projection path.
The preferred migration is:
1. Keep historic-SQL work units responsible for source-shaped analysis.
2. Use source-specific tools for per-unit durable writes when the output is
local to that unit, such as a table's usage metadata or one pattern page.
3. Move whole-run deterministic cleanup into
`HistoricSqlSourceAdapter.finalize()`.
4. Delete `HistoricSqlProjectionPostProcessor`, `IngestBundlePostProcessorPort`,
`deps.postProcessors`, and `post_processor` memory-flow/report stages.
If the implementation keeps typed evidence as an internal handoff between
historic-SQL work units and `finalize()`, that evidence must be framed as
source-specific input to the adapter's deterministic finalization, not as a
generic runner post-processing mechanism. The evidence files must not become a
public compatibility surface.
Historic-SQL finalization must distinguish "no current-run evidence exists"
from "the current snapshot proves this artifact is stale." Whole-run cleanup
such as stale table usage, pattern-page staleness, and archive decisions can
run only when finalization has current-run historic-SQL evidence or an explicit
override-safe source of equivalent facts.
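
One way to keep the two cases apart is to model evidence as a discriminated union, so that "nothing was observed" cannot be mistaken for "nothing exists". This is a sketch under assumed names (`CurrentRunEvidence`, `decideWholeRunCleanup`); the real context and evidence types may look different.

```typescript
// Hypothetical evidence model: "none" means the run produced no historic-SQL
// evidence at all, which is different from observing an empty set.
type CurrentRunEvidence =
  | { kind: "none" }
  | { kind: "observed"; observedArtifactKeys: string[] };

interface CleanupDecision {
  ran: boolean;
  staleArtifactKeys: string[];
  reason: string;
}

function decideWholeRunCleanup(
  existingArtifactKeys: string[],
  evidence: CurrentRunEvidence,
): CleanupDecision {
  if (evidence.kind === "none") {
    // Missing evidence proves nothing, so no artifact is marked stale.
    return { ran: false, staleArtifactKeys: [], reason: "no current-run evidence" };
  }
  const observed = new Set(evidence.observedArtifactKeys);
  return {
    ran: true,
    staleArtifactKeys: existingArtifactKeys.filter((key) => !observed.has(key)),
    reason: "compared against current-run evidence",
  };
}
```
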
## Reports and observability
Reports should describe first-class pipeline phases, not historical extension
points. The isolated-diff summary should include finalization metadata when the
adapter implements `finalize?()`: whether it ran, finalization commit SHA,
touched paths, touched semantic-layer sources, changed wiki page keys,
warnings, descriptive finalization actions, and source-specific result payload.
Saved-memory counts should come from work-unit, reconciliation, and
finalization memory actions plus touched artifact reporting. Finalization
actions are reporting/provenance records for writes that already happened in
the integration worktree; they are not a second write channel. There should be
no special `postProcessorSavedMemoryCounts` or `postProcessor` report body.
Memory-flow phases should use `finalization` instead of `post_processor`.
The runner owns provenance for finalization. Adapters return touched artifacts
and optional descriptive actions, but they do not call the provenance port.
When finalization actions include valid `rawPaths`, the runner folds them into
the normal provenance plan using the current `sourceKey`, `syncId`, raw content
hashes, artifact kind, artifact key, target connection, and action type. The
finalization phase and commit SHA belong in trace/report metadata; they should
not be fabricated inside adapter-written files.
Finalization reports must show both the adapter-declared touched artifacts and
the runner-derived touched artifacts from the finalization git diff. When those
sets differ, the report and trace must include the mismatch and the run must
fail before wiki-SL-ref repair or final gates rely on the wrong scope. When a
finalization action is excluded from provenance because no defensible raw path
exists, the report must name the action and reason instead of silently dropping
it.
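
The provenance exclusion rule could look roughly like this. `FinalizationAction`, `ProvenancePlan`, and `planFinalizationProvenance` are illustrative names, and the list of defensible raw paths is assumed to be computed elsewhere from the current run.

```typescript
// Hypothetical action shape; only `rawPaths` is named in the spec.
interface FinalizationAction {
  artifactKey: string;
  actionType: string;
  rawPaths?: string[];
}

interface ProvenancePlan {
  included: FinalizationAction[];
  excluded: { artifactKey: string; actionType: string; reason: string }[];
}

function planFinalizationProvenance(
  actions: FinalizationAction[],
  defensibleRawPaths: string[],
): ProvenancePlan {
  const allowed = new Set(defensibleRawPaths);
  const plan: ProvenancePlan = { included: [], excluded: [] };
  for (const action of actions) {
    const rawPaths = action.rawPaths ?? [];
    const unknown = rawPaths.filter((path) => !allowed.has(path));
    if (rawPaths.length === 0) {
      plan.excluded.push({ ...action, reason: "no raw paths declared" });
    } else if (unknown.length > 0) {
      plan.excluded.push({
        ...action,
        reason: `raw paths not attributable to the current run: ${unknown.join(", ")}`,
      });
    } else {
      plan.included.push(action);
    }
  }
  return plan;
}
```

Every excluded action keeps its key, type, and a reason, so the report can name it rather than drop it.
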
Traces must make finalization useful for postmortems. At minimum, record
`finalization_started`, `finalization_committed`, `finalization_skipped`, and
`finalization_failed` events with source key, touched paths, warnings, and
error summaries.
## Failure handling
Finalization failures are ingestion failures. If `finalize?()` returns errors,
throws, writes unauthorized targets, or causes final gates to fail, the runner
marks the run failed and leaves the main project worktree unchanged.
Finalization should run after reconciliation because it may need to inspect the
accepted work-unit and reconciliation result. Final gates should run after
finalization because finalization writes durable project artifacts.
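
The ordering and failure rule can be sketched with injected phases. Everything here is hypothetical scaffolding: `Phase`, `runIntegrationTail`, and `promoteToMain` are illustrative names, and the phase labels other than `finalization` and `wiki_sl_ref_repair` are placeholders rather than real stage names.

```typescript
type PhaseName =
  | "reconciliation"
  | "finalization"
  | "wiki_sl_ref_repair"
  | "target_policy"
  | "final_gates";

interface Phase {
  name: PhaseName;
  // Returns error messages; an empty array means the phase passed.
  run: () => string[];
}

interface TailResult {
  status: "succeeded" | "failed";
  completed: PhaseName[];
  failedPhase?: PhaseName;
  errors: string[];
}

function runIntegrationTail(phases: Phase[], promoteToMain: () => void): TailResult {
  const completed: PhaseName[] = [];
  for (const phase of phases) {
    let errors: string[];
    try {
      errors = phase.run();
    } catch (error) {
      errors = [error instanceof Error ? error.message : String(error)];
    }
    if (errors.length > 0) {
      // Returned errors and thrown exceptions are treated alike: the run fails
      // and promoteToMain is never called, so the main worktree is untouched.
      return { status: "failed", completed, failedPhase: phase.name, errors };
    }
    completed.push(phase.name);
  }
  promoteToMain();
  return { status: "succeeded", completed, errors: [] };
}
```
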
Finalization must not be used to repair arbitrary integration conflicts or
rerun agent work. Conflict repair remains part of artifact-aware integration and
reconciliation.
Finalization must also preserve reconciliation and accepted work-unit writes
from the same run. The runner must record the paths changed before
finalization and fail the run if `finalize?()` modifies any of them. If a
source needs deterministic maintenance for an artifact
created or edited by a work unit in the same run, that behavior belongs in the
source-specific work-unit tool or in a later run, not in post-reconciliation
finalization.
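
A minimal sketch of that overlap check, assuming both path lists have already been derived from git diffs in the integration worktree. The function names are hypothetical.

```typescript
// Paths touched by accepted work units or reconciliation are protected:
// finalization may add or change other paths, but not these.
function findFinalizationOverlaps(
  preFinalizationPaths: string[],
  finalizationPaths: string[],
): string[] {
  const protectedPaths = new Set(preFinalizationPaths);
  return finalizationPaths.filter((path) => protectedPaths.has(path)).sort();
}

function assertNoFinalizationOverlap(
  preFinalizationPaths: string[],
  finalizationPaths: string[],
): void {
  const overlaps = findFinalizationOverlaps(preFinalizationPaths, finalizationPaths);
  if (overlaps.length > 0) {
    throw new Error(
      `finalization modified paths already changed in this run: ${overlaps.join(", ")}`,
    );
  }
}
```
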
## Acceptance criteria
The implementation is complete when these conditions are true:
- No production runtime wiring references `deps.postProcessors`.
- `IngestBundlePostProcessorPort` and `HistoricSqlProjectionPostProcessor` are
removed from source exports and package export tests.
- `SourceAdapter.finalize?()` exists with typed context and result objects.
- The runner invokes `finalize?()` after reconciliation and before final gates.
- Finalization changes are committed in the integration worktree and included
in target-policy checks, final gates, reports, traces, and provenance inputs.
- Override replay passes explicit override metadata to finalization, including
`overrideReplay.evictionRawPaths`; leaves `workUnitOutcomes` empty when work
units are skipped; omits `parseArtifacts` unless a future explicit
override-safe input is added; and historic-SQL finalization is shown not to
use prior-run `stageIndex` records as current-run evidence, and not to mark
artifacts stale or archive them because current-run evidence is missing.
- Finalization provenance uses current raw paths, current-run
`stageIndex.evictionsApplied`, or `overrideReplay.evictionRawPaths`, and
actions without defensible raw-path attribution are reported as excluded from
provenance.
- The runner derives finalization touched paths, wiki page keys, and
semantic-layer scope from the integration-worktree git diff, resolves
aggregate semantic-layer files such as `_schema/*.yaml` to logical touched
sources with the runner's own semantic-layer parser/loader, cross-checks the
adapter's touched-artifact declaration, and fails on mismatches or
unresolvable changed semantic-layer paths.
- The runner fails when finalization modifies a path already changed by accepted
work-unit or reconciliation writes in the same run.
- `wiki_sl_ref_repair` remains a runner-owned step after finalization and
before final gates, consumes runner-derived finalization touched sources, and
has its writes covered by target-policy checks and final gates.
- Finalization `actions` are not re-applied by the runner; they are included
only in reporting, saved-memory counts, and provenance planning when their
raw-path attribution is valid.
- Historic SQL uses adapter-owned finalization for whole-run projection
maintenance.
- Tests cover a successful finalization, a finalization failure, unauthorized
finalization target rejection, override replay finalization behavior,
wiki-SL-ref repair placement, and historic-SQL projection behavior without
runner-level post-processors.