docs(plans): add historic-sql pattern shard smoke/docs plan

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Andrey Avtomonov 2026-05-11 20:29:08 +02:00
parent 2d1efda176
commit fed5e6f89f

View file

@ -0,0 +1,407 @@
# Historic SQL Pattern Shard Smoke Docs Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Align the Postgres historic-SQL smoke and example docs with sharded pattern WorkUnits.
**Architecture:** The runtime already writes the full `patterns-input.json` audit file and bounded `patterns-input/part-0001.json` style shards. This plan updates the example acceptance assets so they verify the sharded contract instead of the pre-sharding root `historic-sql-patterns` WorkUnit.
**Tech Stack:** Bash, Node.js built-in test runner, pnpm workspace scripts, KTX local stage-only ingest.
---
## Spec And Existing Plan Status
Spec: `docs/superpowers/specs/2026-05-11-historic-sql-redesign-design.md`
Plans derived from this spec and implemented in this worktree:
- `docs/superpowers/plans/2026-05-11-historic-sql-foundations.md` - implemented. Evidence: `packages/context/src/ingest/adapters/historic-sql/skill-schemas.ts`, `packages/context/src/sql-analysis/ports.ts`, daemon `/sql/analyze-batch`, `SemanticLayerSource.usage`, and `mergeUsagePreservingExternal()`.
- `docs/superpowers/plans/2026-05-11-historic-sql-search-enrichment.md` - implemented. Evidence: usage-aware SL search text, SQLite FTS snippets, and local/MCP result fields `frequencyTier` plus `snippet`.
- `docs/superpowers/plans/2026-05-11-historic-sql-unified-hot-path.md` - implemented. Evidence: `stageHistoricSqlAggregatedSnapshot()`, `chunkHistoricSqlUnifiedStagedDir()`, `PostgresPgssReader`, aggregate BigQuery/Snowflake readers, unified schemas, and package exports.
- `docs/superpowers/plans/2026-05-11-historic-sql-skills-projection-cutover.md` - implemented. Evidence: `HistoricSqlSourceAdapter`, `historic_sql_table_digest`, `historic_sql_patterns`, `emit_historic_sql_evidence`, `HistoricSqlProjectionPostProcessor`, and legacy skill removal from runtime code.
- `docs/superpowers/plans/2026-05-11-historic-sql-cross-dialect-readiness.md` - implemented. Evidence: local adapter registration tests for Postgres, BigQuery, and Snowflake plus PG doctor coverage for informational `pg_stat_statements.max`.
- `docs/superpowers/plans/2026-05-11-historic-sql-docs-smoke-and-config-cleanup.md` - implemented at the time it was written, but its smoke assertions predate pattern shard WorkUnits.
- `docs/superpowers/plans/2026-05-11-historic-sql-projection-archive-hardening.md` - implemented. Evidence: `isArchivedPatternPage()`, archive exclusion from slug matching, stale table tests, and legacy query-page cleanup coverage.
- `docs/superpowers/plans/2026-05-11-historic-sql-end-to-end-retrieval-acceptance.md` - implemented. Evidence: `local-ingest-acceptance.test.ts` proves production adapter output reaches SL search and wiki search.
- `docs/superpowers/plans/2026-05-11-historic-sql-redaction-hardening.md` - implemented. Evidence: `redaction.ts`, `redaction.test.ts`, and staged artifact redaction coverage in `stage-unified.test.ts`.
- `docs/superpowers/plans/2026-05-11-historic-sql-pattern-workunit-sharding.md` - implemented. Evidence: `pattern-inputs.ts`, `pattern-inputs.test.ts`, `stage-unified.ts` writes `patterns-input/part-*.json`, `chunk-unified.ts` emits `historic-sql-patterns-part-*`, `historic_sql_patterns` reads shards, and acceptance tests use `rawPath: 'patterns-input/part-0001.json'`.
No existing spec-derived implementation plan is currently unimplemented in this worktree.
Remaining gap this plan fixes:
- `examples/postgres-historic/scripts/smoke.sh` still asserts a WorkUnit with `unitKey === 'historic-sql-patterns'`.
- Current runtime emits pattern WorkUnits with keys like `historic-sql-patterns-part-0001` and raw files like `patterns-input/part-0001.json`.
- The same smoke only validates the audit file `patterns-input.json`; it does not assert that the bounded shard files exist or contain only cross-table candidates.
- `examples/postgres-historic/README.md` and `examples/README.md` describe unchanged "pattern inputs" but do not explain that `patterns-input.json` is now audit-only and `patterns-input/part-*.json` drives pattern WorkUnits.
- `scripts/examples-docs.test.mjs` does not pin the sharded smoke/doc contract, so the stale root WorkUnit assertion can regress silently.
## File Structure
- Modify `scripts/examples-docs.test.mjs`
Pins docs and smoke script to the sharded pattern WorkUnit contract.
- Modify `examples/postgres-historic/scripts/smoke.sh`
Validates `patterns-input/part-*.json` shard files and `historic-sql-patterns-part-*` stage-only WorkUnits.
- Modify `examples/postgres-historic/README.md`
Documents `patterns-input.json` as the full audit artifact and `patterns-input/part-*.json` as bounded pattern WorkUnit input.
- Modify `examples/README.md`
Updates the short example catalog entry with the same audit-vs-shard wording.
### Task 1: Pin Example Tests To Pattern Shards
**Files:**
- Modify: `scripts/examples-docs.test.mjs`
- [ ] **Step 1: Add failing assertions for sharded pattern smoke/docs**
In `scripts/examples-docs.test.mjs`, inside `it('documents the Postgres historic SQL smoke example', ...)`, add these assertions immediately after the existing `assert.match(readme, /patterns-input\.json/);` line:
```javascript
assert.match(readme, /patterns-input\/part-\*\.json/);
assert.match(readme, /full audit input/);
assert.match(readme, /bounded pattern WorkUnit shards/);
```
In the same test, add these assertions immediately after the existing `assert.match(smoke, /assert_stage_record "\$UNCHANGED_RECORD" unchanged zero/);` line:
```javascript
assert.match(smoke, /assertPatternShards/);
assert.match(smoke, /historic-sql-patterns-part-/);
assert.match(smoke, /patterns-input\/part-/);
assert.doesNotMatch(smoke, /unitKey === 'historic-sql-patterns'/);
```
- [ ] **Step 2: Run the example docs test to verify it fails**
Run:
```bash
node --test scripts/examples-docs.test.mjs
```
Expected: FAIL. The test should report missing `patterns-input/part-*.json`, `full audit input`, `bounded pattern WorkUnit shards`, `assertPatternShards`, or it should fail because `smoke.sh` still contains `unitKey === 'historic-sql-patterns'`.
- [ ] **Step 3: Commit the failing test**
Run:
```bash
git add scripts/examples-docs.test.mjs
git commit -m "test: expect historic sql pattern shard smoke docs"
```
### Task 2: Update The Postgres Historic Smoke
**Files:**
- Modify: `examples/postgres-historic/scripts/smoke.sh`
- Test: `scripts/examples-docs.test.mjs`
- [ ] **Step 1: Import `existsSync` in the embedded snapshot assertion**
In `examples/postgres-historic/scripts/smoke.sh`, inside `assert_unified_snapshot()`, replace this line:
```javascript
const { readFileSync, readdirSync } = require('node:fs');
```
with:
```javascript
const { existsSync, readFileSync, readdirSync } = require('node:fs');
```
- [ ] **Step 2: Add shard validation to `assert_unified_snapshot()`**
In `examples/postgres-historic/scripts/smoke.sh`, inside the embedded Node script in `assert_unified_snapshot()`, add this function after the `legacyKeys` loop:
```javascript
function assertPatternShards(root) {
const shardDir = join(root, 'patterns-input');
assert(existsSync(shardDir), 'Expected patterns-input shard directory');
const shardFiles = readdirSync(shardDir)
.filter((file) => /^part-\d{4}\.json$/.test(file))
.sort()
.map((file) => `patterns-input/${file}`);
assert(shardFiles.length > 0, 'Expected at least one pattern shard file');
for (const shardFile of shardFiles) {
const shard = JSON.parse(readFileSync(join(root, shardFile), 'utf8'));
assert(Array.isArray(shard.templates), `${shardFile}: expected templates array`);
assert(shard.templates.length > 0, `${shardFile}: expected at least one template`);
assert(
shard.templates.every((template) => Array.isArray(template.tablesTouched) && template.tablesTouched.length >= 2),
`${shardFile}: expected only cross-table pattern candidates`,
);
}
return shardFiles;
}
```
- [ ] **Step 3: Assert the full audit input and bounded shards**
In the same embedded Node script, replace the current `patterns` block:
```javascript
const patterns = JSON.parse(readFileSync(join(root, 'patterns-input.json'), 'utf8'));
assert(Array.isArray(patterns.templates) && patterns.templates.length > 0, 'Expected patterns-input templates');
assert(
patterns.templates.every((template) => Array.isArray(template.tablesTouched) && template.tablesTouched.length > 0),
'Expected every pattern template to have touched tables',
);
```
with:
```javascript
const patterns = JSON.parse(readFileSync(join(root, 'patterns-input.json'), 'utf8'));
assert(Array.isArray(patterns.templates) && patterns.templates.length > 0, 'Expected patterns-input audit templates');
assert(
patterns.templates.every((template) => Array.isArray(template.tablesTouched) && template.tablesTouched.length > 0),
'Expected every audit pattern template to have touched tables',
);
const shardFiles = assertPatternShards(root);
assert(
shardFiles.length <= patterns.templates.length,
`Expected shard count ${shardFiles.length} to be no greater than audit template count ${patterns.templates.length}`,
);
```
- [ ] **Step 4: Update the stage record WorkUnit assertions**
In `examples/postgres-historic/scripts/smoke.sh`, inside the embedded Node script in `assert_stage_record()`, replace:
```javascript
assert(record.rawFileCount >= 3, `${label}: expected manifest, patterns input, and at least one table file`);
```
with:
```javascript
assert(record.rawFileCount >= 4, `${label}: expected manifest, audit patterns input, pattern shard, and at least one table file`);
```
Then replace this nonzero WorkUnit block:
```javascript
} else if (expectedWorkUnits === 'nonzero') {
assert(record.workUnitCount > 0, `${label}: expected nonzero WorkUnits`);
assert(record.workUnits.some((unit) => unit.unitKey === 'historic-sql-patterns'), `${label}: expected patterns WorkUnit`);
assert(record.workUnits.some((unit) => unit.unitKey.startsWith('historic-sql-table-')), `${label}: expected table WorkUnit`);
} else {
```
with:
```javascript
} else if (expectedWorkUnits === 'nonzero') {
assert(record.workUnitCount > 0, `${label}: expected nonzero WorkUnits`);
const patternUnits = record.workUnits.filter((unit) => /^historic-sql-patterns-part-\d{4}$/.test(unit.unitKey));
assert(patternUnits.length > 0, `${label}: expected sharded patterns WorkUnit`);
for (const unit of patternUnits) {
assert(
unit.rawFiles.some((rawFile) => /^patterns-input\/part-\d{4}\.json$/.test(rawFile)),
`${label}: expected ${unit.unitKey} to read a pattern shard`,
);
assert(
!unit.rawFiles.includes('patterns-input.json'),
`${label}: expected ${unit.unitKey} not to schedule the full audit patterns input`,
);
}
assert(record.workUnits.some((unit) => unit.unitKey.startsWith('historic-sql-table-')), `${label}: expected table WorkUnit`);
} else {
```
- [ ] **Step 5: Run shell syntax and the docs test**
Run:
```bash
bash -n examples/postgres-historic/scripts/smoke.sh
node --test scripts/examples-docs.test.mjs
```
Expected: `bash -n` exits 0. The docs test still fails until the README files are updated in Task 3.
- [ ] **Step 6: Commit the smoke update**
Run:
```bash
git add examples/postgres-historic/scripts/smoke.sh
git commit -m "test: assert historic sql pattern shard smoke"
```
### Task 3: Update Example Documentation
**Files:**
- Modify: `examples/postgres-historic/README.md`
- Modify: `examples/README.md`
- Test: `scripts/examples-docs.test.mjs`
- [ ] **Step 1: Update the artifact list in the Postgres historic README**
In `examples/postgres-historic/README.md`, replace this list:
```markdown
- `manifest.json`
- `tables/*.json`
- `patterns-input.json`
```
with:
```markdown
- `manifest.json`
- `tables/*.json`
- `patterns-input.json` as the full audit input
- `patterns-input/part-*.json` as bounded pattern WorkUnit shards
```
- [ ] **Step 2: Update the idempotency wording**
In `examples/postgres-historic/README.md`, replace this paragraph:
```markdown
The smoke also runs the same workload twice and verifies the second stage-only
run has `workUnitCount: 0`, which proves unchanged bucketed table and pattern
inputs do not schedule LLM work.
```
with:
```markdown
The smoke also runs the same workload twice and verifies the second stage-only
run has `workUnitCount: 0`, which proves unchanged bucketed table inputs and
unchanged bounded pattern shards do not schedule LLM work.
```
- [ ] **Step 3: Update the manifest inspection wording**
In `examples/postgres-historic/README.md`, replace this paragraph:
```markdown
The manifest should have `source: "historic-sql"`, `dialect: "postgres"`,
positive `snapshotRowCount`, positive `touchedTableCount`, numeric
`parseFailures`, `warnings`, and `probeWarnings`. The same directory should
contain `patterns-input.json` and one `tables/*.json` file per touched table.
```
with:
```markdown
The manifest should have `source: "historic-sql"`, `dialect: "postgres"`,
positive `snapshotRowCount`, positive `touchedTableCount`, numeric
`parseFailures`, `warnings`, and `probeWarnings`. The same directory should
contain `patterns-input.json`, at least one `patterns-input/part-*.json` pattern
shard for cross-table candidates, and one `tables/*.json` file per touched
table.
```
- [ ] **Step 4: Update the examples catalog entry**
In `examples/README.md`, replace this paragraph:
```markdown
`postgres-historic/` is a manual Docker-backed smoke for Postgres historic-SQL
ingest via `pg_stat_statements`. It verifies setup, unified Historic SQL artifacts,
managed daemon batch SQL analysis, and no-WorkUnit idempotency for unchanged
bucketed table and pattern inputs.
```
with:
```markdown
`postgres-historic/` is a manual Docker-backed smoke for Postgres historic-SQL
ingest via `pg_stat_statements`. It verifies setup, unified Historic SQL artifacts,
managed daemon batch SQL analysis, bounded pattern WorkUnit shards, and
no-WorkUnit idempotency for unchanged bucketed table inputs and pattern shards.
```
- [ ] **Step 5: Run the example docs test**
Run:
```bash
node --test scripts/examples-docs.test.mjs
```
Expected: PASS.
- [ ] **Step 6: Commit the docs update**
Run:
```bash
git add examples/postgres-historic/README.md examples/README.md
git commit -m "docs: explain historic sql pattern shards"
```
### Task 4: Verify The Smoke Contract
**Files:**
- Verify: `scripts/examples-docs.test.mjs`
- Verify: `examples/postgres-historic/scripts/smoke.sh`
- Verify: `examples/postgres-historic/README.md`
- Verify: `examples/README.md`
- [ ] **Step 1: Run focused local checks**
Run:
```bash
bash -n examples/postgres-historic/scripts/smoke.sh
node --test scripts/examples-docs.test.mjs
```
Expected: both commands pass.
- [ ] **Step 2: Run the Docker-backed Postgres historic smoke**
Run:
```bash
examples/postgres-historic/scripts/smoke.sh
```
Expected: PASS with `Postgres historic SQL smoke passed`. The stage-only records should include pattern WorkUnits with keys like `historic-sql-patterns-part-0001`, each reading `patterns-input/part-0001.json`, and the unchanged run should report `workUnitCount: 0`.
- [ ] **Step 3: Run the drift grep**
Run:
```bash
rg -n "unitKey === 'historic-sql-patterns'|expected patterns WorkUnit|patterns-input\\.json\\` and one \\`tables|unchanged bucketed table and pattern inputs" examples scripts
```
Expected: no matches.
- [ ] **Step 4: Commit verification metadata if any test-only wording changed**
Run:
```bash
git status --short
```
Expected: no unstaged files. If a previous step required a wording fix, commit only the touched files:
```bash
git add scripts/examples-docs.test.mjs examples/postgres-historic/scripts/smoke.sh examples/postgres-historic/README.md examples/README.md
git commit -m "test: verify historic sql sharded smoke docs"
```
## Self-Review
**Spec coverage:** This plan follows spec section 5.2's deterministic pattern sharding and preserves section 4.6's full `patterns-input.json` audit artifact. It updates the smoke and docs around the already implemented sharded runtime contract.
**Placeholder scan:** The plan contains exact file paths, exact snippets, commands, expected outcomes, and commit commands.
**Type consistency:** The plan uses the implemented runtime names consistently: `patterns-input.json` for the audit file, `patterns-input/part-*.json` for bounded shards, and `historic-sql-patterns-part-0001` style WorkUnit keys for pattern curation.
Plan complete and saved to `docs/superpowers/plans/2026-05-11-historic-sql-pattern-shard-smoke-docs.md`. Two execution options:
**1. Subagent-Driven (recommended)** - dispatch a fresh subagent per task, review between tasks, fast iteration
**2. Inline Execution** - execute tasks in this session using executing-plans, batch execution with checkpoints