ktx/docs/superpowers/plans/2026-05-11-historic-sql-pattern-shard-smoke-docs.md
Andrey Avtomonov fed5e6f89f docs(plans): add historic-sql pattern shard smoke/docs plan
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 20:29:08 +02:00

17 KiB

Historic SQL Pattern Shard Smoke Docs Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Align the Postgres historic-SQL smoke and example docs with sharded pattern WorkUnits.

Architecture: The runtime already writes the full patterns-input.json audit file and bounded patterns-input/part-0001.json style shards. This plan updates the example acceptance assets so they verify the sharded contract instead of the pre-sharding root historic-sql-patterns WorkUnit.

Tech Stack: Bash, Node.js built-in test runner, pnpm workspace scripts, KTX local stage-only ingest.


Spec And Existing Plan Status

Spec: docs/superpowers/specs/2026-05-11-historic-sql-redesign-design.md

Plans derived from this spec and implemented in this worktree:

  • docs/superpowers/plans/2026-05-11-historic-sql-foundations.md - implemented. Evidence: packages/context/src/ingest/adapters/historic-sql/skill-schemas.ts, packages/context/src/sql-analysis/ports.ts, daemon /sql/analyze-batch, SemanticLayerSource.usage, and mergeUsagePreservingExternal().
  • docs/superpowers/plans/2026-05-11-historic-sql-search-enrichment.md - implemented. Evidence: usage-aware SL search text, SQLite FTS snippets, and local/MCP result fields frequencyTier plus snippet.
  • docs/superpowers/plans/2026-05-11-historic-sql-unified-hot-path.md - implemented. Evidence: stageHistoricSqlAggregatedSnapshot(), chunkHistoricSqlUnifiedStagedDir(), PostgresPgssReader, aggregate BigQuery/Snowflake readers, unified schemas, and package exports.
  • docs/superpowers/plans/2026-05-11-historic-sql-skills-projection-cutover.md - implemented. Evidence: HistoricSqlSourceAdapter, historic_sql_table_digest, historic_sql_patterns, emit_historic_sql_evidence, HistoricSqlProjectionPostProcessor, and legacy skill removal from runtime code.
  • docs/superpowers/plans/2026-05-11-historic-sql-cross-dialect-readiness.md - implemented. Evidence: local adapter registration tests for Postgres, BigQuery, and Snowflake plus PG doctor coverage for informational pg_stat_statements.max.
  • docs/superpowers/plans/2026-05-11-historic-sql-docs-smoke-and-config-cleanup.md - implemented at the time it was written, but its smoke assertions predate pattern shard WorkUnits.
  • docs/superpowers/plans/2026-05-11-historic-sql-projection-archive-hardening.md - implemented. Evidence: isArchivedPatternPage(), archive exclusion from slug matching, stale table tests, and legacy query-page cleanup coverage.
  • docs/superpowers/plans/2026-05-11-historic-sql-end-to-end-retrieval-acceptance.md - implemented. Evidence: local-ingest-acceptance.test.ts proves production adapter output reaches SL search and wiki search.
  • docs/superpowers/plans/2026-05-11-historic-sql-redaction-hardening.md - implemented. Evidence: redaction.ts, redaction.test.ts, and staged artifact redaction coverage in stage-unified.test.ts.
  • docs/superpowers/plans/2026-05-11-historic-sql-pattern-workunit-sharding.md - implemented. Evidence: pattern-inputs.ts, pattern-inputs.test.ts, stage-unified.ts writes patterns-input/part-*.json, chunk-unified.ts emits historic-sql-patterns-part-*, historic_sql_patterns reads shards, and acceptance tests use rawPath: 'patterns-input/part-0001.json'.

No existing spec-derived implementation plan is currently unimplemented in this worktree.

Remaining gap this plan fixes:

  • examples/postgres-historic/scripts/smoke.sh still asserts a WorkUnit with unitKey === 'historic-sql-patterns'.
  • Current runtime emits pattern WorkUnits with keys like historic-sql-patterns-part-0001 and raw files like patterns-input/part-0001.json.
  • The same smoke only validates the audit file patterns-input.json; it does not assert that the bounded shard files exist or contain only cross-table candidates.
  • examples/postgres-historic/README.md and examples/README.md describe unchanged "pattern inputs" but do not explain that patterns-input.json is now audit-only and patterns-input/part-*.json drives pattern WorkUnits.
  • scripts/examples-docs.test.mjs does not pin the sharded smoke/doc contract, so the stale root WorkUnit assertion can regress silently.

File Structure

  • Modify scripts/examples-docs.test.mjs
    Pins docs and smoke script to the sharded pattern WorkUnit contract.
  • Modify examples/postgres-historic/scripts/smoke.sh
    Validates patterns-input/part-*.json shard files and historic-sql-patterns-part-* stage-only WorkUnits.
  • Modify examples/postgres-historic/README.md
    Documents patterns-input.json as the full audit artifact and patterns-input/part-*.json as bounded pattern WorkUnit input.
  • Modify examples/README.md
    Updates the short example catalog entry with the same audit-vs-shard wording.

Task 1: Pin Example Tests To Pattern Shards

Files:

  • Modify: scripts/examples-docs.test.mjs

  • Step 1: Add failing assertions for sharded pattern smoke/docs

In scripts/examples-docs.test.mjs, inside it('documents the Postgres historic SQL smoke example', ...), add these assertions immediately after the existing assert.match(readme, /patterns-input\.json/); line:

    assert.match(readme, /patterns-input\/part-\*\.json/);
    assert.match(readme, /full audit input/);
    assert.match(readme, /bounded pattern WorkUnit shards/);

In the same test, add these assertions immediately after the existing assert.match(smoke, /assert_stage_record "\$UNCHANGED_RECORD" unchanged zero/); line:

    assert.match(smoke, /assertPatternShards/);
    assert.match(smoke, /historic-sql-patterns-part-/);
    assert.match(smoke, /patterns-input\/part-/);
    assert.doesNotMatch(smoke, /unitKey === 'historic-sql-patterns'/);
  • Step 2: Run the example docs test to verify it fails

Run:

node --test scripts/examples-docs.test.mjs

Expected: FAIL. The test should report missing patterns-input/part-*.json, full audit input, bounded pattern WorkUnit shards, assertPatternShards, or it should fail because smoke.sh still contains unitKey === 'historic-sql-patterns'.

  • Step 3: Commit the failing test

Run:

git add scripts/examples-docs.test.mjs
git commit -m "test: expect historic sql pattern shard smoke docs"

Task 2: Update The Postgres Historic Smoke

Files:

  • Modify: examples/postgres-historic/scripts/smoke.sh

  • Test: scripts/examples-docs.test.mjs

  • Step 1: Import existsSync in the embedded snapshot assertion

In examples/postgres-historic/scripts/smoke.sh, inside assert_unified_snapshot(), replace this line:

const { readFileSync, readdirSync } = require('node:fs');

with:

const { existsSync, readFileSync, readdirSync } = require('node:fs');
  • Step 2: Add shard validation to assert_unified_snapshot()

In examples/postgres-historic/scripts/smoke.sh, inside the embedded Node script in assert_unified_snapshot(), add this function after the legacyKeys loop:

function assertPatternShards(root) {
  const shardDir = join(root, 'patterns-input');
  assert(existsSync(shardDir), 'Expected patterns-input shard directory');
  const shardFiles = readdirSync(shardDir)
    .filter((file) => /^part-\d{4}\.json$/.test(file))
    .sort()
    .map((file) => `patterns-input/${file}`);
  assert(shardFiles.length > 0, 'Expected at least one pattern shard file');

  for (const shardFile of shardFiles) {
    const shard = JSON.parse(readFileSync(join(root, shardFile), 'utf8'));
    assert(Array.isArray(shard.templates), `${shardFile}: expected templates array`);
    assert(shard.templates.length > 0, `${shardFile}: expected at least one template`);
    assert(
      shard.templates.every((template) => Array.isArray(template.tablesTouched) && template.tablesTouched.length >= 2),
      `${shardFile}: expected only cross-table pattern candidates`,
    );
  }

  return shardFiles;
}
  • Step 3: Assert the full audit input and bounded shards

In the same embedded Node script, replace the current patterns block:

const patterns = JSON.parse(readFileSync(join(root, 'patterns-input.json'), 'utf8'));
assert(Array.isArray(patterns.templates) && patterns.templates.length > 0, 'Expected patterns-input templates');
assert(
  patterns.templates.every((template) => Array.isArray(template.tablesTouched) && template.tablesTouched.length > 0),
  'Expected every pattern template to have touched tables',
);

with:

const patterns = JSON.parse(readFileSync(join(root, 'patterns-input.json'), 'utf8'));
assert(Array.isArray(patterns.templates) && patterns.templates.length > 0, 'Expected patterns-input audit templates');
assert(
  patterns.templates.every((template) => Array.isArray(template.tablesTouched) && template.tablesTouched.length > 0),
  'Expected every audit pattern template to have touched tables',
);
const shardFiles = assertPatternShards(root);
assert(
  shardFiles.length <= patterns.templates.length,
  `Expected shard count ${shardFiles.length} to be no greater than audit template count ${patterns.templates.length}`,
);
  • Step 4: Update the stage record WorkUnit assertions

In examples/postgres-historic/scripts/smoke.sh, inside the embedded Node script in assert_stage_record(), replace:

assert(record.rawFileCount >= 3, `${label}: expected manifest, patterns input, and at least one table file`);

with:

assert(record.rawFileCount >= 4, `${label}: expected manifest, audit patterns input, pattern shard, and at least one table file`);

Then replace this nonzero WorkUnit block:

} else if (expectedWorkUnits === 'nonzero') {
  assert(record.workUnitCount > 0, `${label}: expected nonzero WorkUnits`);
  assert(record.workUnits.some((unit) => unit.unitKey === 'historic-sql-patterns'), `${label}: expected patterns WorkUnit`);
  assert(record.workUnits.some((unit) => unit.unitKey.startsWith('historic-sql-table-')), `${label}: expected table WorkUnit`);
} else {

with:

} else if (expectedWorkUnits === 'nonzero') {
  assert(record.workUnitCount > 0, `${label}: expected nonzero WorkUnits`);
  const patternUnits = record.workUnits.filter((unit) => /^historic-sql-patterns-part-\d{4}$/.test(unit.unitKey));
  assert(patternUnits.length > 0, `${label}: expected sharded patterns WorkUnit`);
  for (const unit of patternUnits) {
    assert(
      unit.rawFiles.some((rawFile) => /^patterns-input\/part-\d{4}\.json$/.test(rawFile)),
      `${label}: expected ${unit.unitKey} to read a pattern shard`,
    );
    assert(
      !unit.rawFiles.includes('patterns-input.json'),
      `${label}: expected ${unit.unitKey} not to schedule the full audit patterns input`,
    );
  }
  assert(record.workUnits.some((unit) => unit.unitKey.startsWith('historic-sql-table-')), `${label}: expected table WorkUnit`);
} else {
  • Step 5: Run shell syntax and the docs test

Run:

bash -n examples/postgres-historic/scripts/smoke.sh
node --test scripts/examples-docs.test.mjs

Expected: bash -n exits 0. The docs test still fails until the README files are updated in Task 3.

  • Step 6: Commit the smoke update

Run:

git add examples/postgres-historic/scripts/smoke.sh
git commit -m "test: assert historic sql pattern shard smoke"

Task 3: Update Example Documentation

Files:

  • Modify: examples/postgres-historic/README.md

  • Modify: examples/README.md

  • Test: scripts/examples-docs.test.mjs

  • Step 1: Update the artifact list in the Postgres historic README

In examples/postgres-historic/README.md, replace this list:

- `manifest.json`
- `tables/*.json`
- `patterns-input.json`

with:

- `manifest.json`
- `tables/*.json`
- `patterns-input.json` as the full audit input
- `patterns-input/part-*.json` as bounded pattern WorkUnit shards
  • Step 2: Update the idempotency wording

In examples/postgres-historic/README.md, replace this paragraph:

The smoke also runs the same workload twice and verifies the second stage-only
run has `workUnitCount: 0`, which proves unchanged bucketed table and pattern
inputs do not schedule LLM work.

with:

The smoke also runs the same workload twice and verifies the second stage-only
run has `workUnitCount: 0`, which proves unchanged bucketed table inputs and
unchanged bounded pattern shards do not schedule LLM work.
  • Step 3: Update the manifest inspection wording

In examples/postgres-historic/README.md, replace this paragraph:

The manifest should have `source: "historic-sql"`, `dialect: "postgres"`,
positive `snapshotRowCount`, positive `touchedTableCount`, numeric
`parseFailures`, `warnings`, and `probeWarnings`. The same directory should
contain `patterns-input.json` and one `tables/*.json` file per touched table.

with:

The manifest should have `source: "historic-sql"`, `dialect: "postgres"`,
positive `snapshotRowCount`, positive `touchedTableCount`, numeric
`parseFailures`, `warnings`, and `probeWarnings`. The same directory should
contain `patterns-input.json`, at least one `patterns-input/part-*.json` pattern
shard for cross-table candidates, and one `tables/*.json` file per touched
table.
  • Step 4: Update the examples catalog entry

In examples/README.md, replace this paragraph:

`postgres-historic/` is a manual Docker-backed smoke for Postgres historic-SQL
ingest via `pg_stat_statements`. It verifies setup, unified Historic SQL artifacts,
managed daemon batch SQL analysis, and no-WorkUnit idempotency for unchanged
bucketed table and pattern inputs.

with:

`postgres-historic/` is a manual Docker-backed smoke for Postgres historic-SQL
ingest via `pg_stat_statements`. It verifies setup, unified Historic SQL artifacts,
managed daemon batch SQL analysis, bounded pattern WorkUnit shards, and
no-WorkUnit idempotency for unchanged bucketed table inputs and pattern shards.
  • Step 5: Run the example docs test

Run:

node --test scripts/examples-docs.test.mjs

Expected: PASS.

  • Step 6: Commit the docs update

Run:

git add examples/postgres-historic/README.md examples/README.md
git commit -m "docs: explain historic sql pattern shards"

Task 4: Verify The Smoke Contract

Files:

  • Verify: scripts/examples-docs.test.mjs

  • Verify: examples/postgres-historic/scripts/smoke.sh

  • Verify: examples/postgres-historic/README.md

  • Verify: examples/README.md

  • Step 1: Run focused local checks

Run:

bash -n examples/postgres-historic/scripts/smoke.sh
node --test scripts/examples-docs.test.mjs

Expected: both commands pass.

  • Step 2: Run the Docker-backed Postgres historic smoke

Run:

examples/postgres-historic/scripts/smoke.sh

Expected: PASS with Postgres historic SQL smoke passed. The stage-only records should include pattern WorkUnits with keys like historic-sql-patterns-part-0001, each reading patterns-input/part-0001.json, and the unchanged run should report workUnitCount: 0.

  • Step 3: Run the drift grep

Run:

rg -n "unitKey === 'historic-sql-patterns'|expected patterns WorkUnit|patterns-input\\.json\\` and one \\`tables|unchanged bucketed table and pattern inputs" examples scripts

Expected: no matches.

  • Step 4: Commit verification metadata if any test-only wording changed

Run:

git status --short

Expected: no unstaged files. If a previous step required a wording fix, commit only the touched files:

git add scripts/examples-docs.test.mjs examples/postgres-historic/scripts/smoke.sh examples/postgres-historic/README.md examples/README.md
git commit -m "test: verify historic sql sharded smoke docs"

Self-Review

Spec coverage: This plan follows spec section 5.2's deterministic pattern sharding and preserves section 4.6's full patterns-input.json audit artifact. It updates the smoke and docs around the already implemented sharded runtime contract.

Placeholder scan: The plan contains exact file paths, exact snippets, commands, expected outcomes, and commit commands.

Type consistency: The plan uses the implemented runtime names consistently: patterns-input.json for the audit file, patterns-input/part-*.json for bounded shards, and historic-sql-patterns-part-0001 style WorkUnit keys for pattern curation.

Plan complete and saved to docs/superpowers/plans/2026-05-11-historic-sql-pattern-shard-smoke-docs.md. Two execution options:

1. Subagent-Driven (recommended) - dispatch a fresh subagent per task, review between tasks, fast iteration

2. Inline Execution - execute tasks in this session using executing-plans, batch execution with checkpoints