ktx/scripts/examples-docs.test.mjs
Andrey Avtomonov 2366b00301
chore(workspace): gate dead-code with knip production mode (#196)
* refactor(workspace): relocate @ktx/llm source into packages/cli/src/llm

* refactor(workspace): rewrite @ktx/llm imports to relative paths

* refactor(workspace): fold internal packages into cli

* chore(workspace): gate dead-code with knip production mode

Turn on production-mode knip plus an autofix run in pre-commit and the
`pnpm dead-code` script, document the `/** @internal */` convention for
test-only exports in AGENTS.md, annotate test-only exports across the
CLI with that JSDoc, and drop dead exports/wrappers the new gate
surfaced (e.g. `cli-project.ts`, `lookerRuntimeSourceToFileAdapterSource`,
`createLocalScanEnrichmentProvidersFromConfig`,
`PGLITE_OWNER_PROCESS_BACKEND_CAPABILITIES`, stale type re-exports).
Replace the loose `ignoreIssues` allowlist in `knip.json` with explicit
production entries so cross-package barrel leaks are caught.

* refactor(cli): delete internal barrel index.ts files

The 34 `index.ts` re-export barrels inside `packages/cli/src/` were
holdovers from the pre-fold multi-workspace structure. Post-fold-in they
served no production purpose: external consumers go through the single
package main entry, and in-repo callers mostly imported through them
only because the path was short. Internally, knip flagged most barrel
re-exports as production-dead (only reached via tests).

This change:
- Deletes every internal barrel except `packages/cli/src/index.ts`
  (the published package entry).
- Rewrites ~270 source/test files to import each name directly from
  the file that defines it.
- Moves `tools/warehouse-verification/index.ts` to
  `create-warehouse-verification-tools.ts` (the function it defined
  locally) and updates its single consumer.
- Renames `search/backend-conformance.ts` → `.test-utils.ts` to match
  the existing test-helper file convention.
- Deletes 13 dead test-only chains (dbt-descriptions/*,
  live-database/extracted-schema, live-database/structural-sync,
  relationship-* feedback/review chain) plus their tests and a
  cascading orphan integration test.
- Updates test mocks that pointed at deleted barrel paths
  (notion-client, connector barrels in scan/local-scan-connectors
  tests) to mock the source files instead.
- Points the maintainer benchmark script
  (`scripts/relationship-benchmark-report.mjs`) at source files
  instead of `dist/context/scan/index.js`.
- Drops the barrel `!` entries from `knip.json`; adds explicit
  production entries only for the benchmark code reached via dist by
  the maintainer script.

Net: 413 files changed, ~1.2k insertions, ~9.4k deletions.

`pnpm run dead-code` (Biome + knip default + knip production) and
`pnpm run type-check` are clean; 2277 tests pass.

* refactor(workspace): rename @ktx/cli to @kaelio/ktx and pack it directly

Promote the CLI workspace package to the public name `@kaelio/ktx` and
drop the separate `scripts/build-public-npm-package.mjs` wrapper. The
CLI package is now publishable in place (`publishConfig.access: public`,
`provenance: true`), so artifact packing uses `pnpm pack` against
`packages/cli/` instead of assembling a parallel package tree.

Updates all workspace filter invocations, docs, tests, and release
readiness checks to reference the new package name, and folds the
tarball-name helper into `scripts/public-npm-release-metadata.mjs`.

* docs: align "agent clients" and "data agents" terminology

Replace "client agents" with "agent clients" and "database agents" with
"data agents" across AGENTS.md, README.md, the docs-site copy, and the
matching setup-agents test description, matching the canonical
vocabulary in docs/terminology.md.

Also moves packages/cli/tsconfig.json's tsBuildInfoFile from
node_modules/.cache/ to dist/.tsbuildinfo so incremental builds survive
node_modules reinstalls.

* refactor(release): single source of truth for package version

Make packages/cli/package.json the single source of truth for the
@kaelio/ktx version. publicNpmPackageVersion() now reads it directly,
so artifact filenames, release-readiness checks, and the Python wheel
version all derive from one field. The duplicate
release-policy.json.publicNpmPackageVersion is removed.

Previously the two fields could drift: tarballs were named
kaelio-ktx-0.4.1.tgz while internally containing
@kaelio/ktx@0.0.0-private.

- update-public-release-version.mjs rewrites both Python pyproject.toml
  files (ktx-daemon, ktx-sl) alongside the npm package.jsons,
  normalizing the version for PEP 440 (e.g. 0.1.0-rc.2 -> 0.1.0rc2).
- semantic-release-config.cjs adds the two pyproject.toml files to
  @semantic-release/git assets so the release commit back to main
  carries every version source in lockstep.
- The six "?? '0.0.0-private'" fallback literals across the CLI are
  replaced with "?? getKtxCliPackageInfo().version", and
  createDefaultKtxMcpServer makes its version arg required.
- docs/release.md describes the actual commit-back model: the dev tree
  always reflects the most recent release; no sentinel pin to
  maintain.

Verified: pnpm run artifacts:build now produces
kaelio-ktx-0.4.1.tgz and kaelio_ktx-0.4.1-py3-none-any.whl with
@kaelio/ktx@0.4.1 inside. Full type-check, dead-code, and
2287 vitests + 173 script tests pass.

* refactor(cli): inject embedding provider resolution and detect sentence-transformers runtime

Make resolveProjectEmbeddingProvider and runtimeIo injectable in ingest and
scan command entrypoints so tests can stub them, and teach
resolvePublicIngestRuntimeRequirements to flag the local-embeddings runtime
feature when ktx.yaml selects sentence-transformers.

* chore(cli): mark buildLocalStatsStatus and LocalStatsStatus as @internal

Both symbols are consumed only by status-project.test.ts. Annotating with
/** @internal */ keeps knip's production-mode check clean without changing
runtime behavior.

* fix(cli): use real package metadata in print-command-tree

The stubbed package name embedded a forbidden product identifier that
tripped the boundary check in CI. Read the metadata from package.json
instead — keeps the rendered tree unchanged and removes a duplicate
source of truth.

* feat(cli): show embedding coverage in `ktx status`, drop duplicate disk counts

Inline `(N embedded)` next to the Wiki scope counts and Semantic-layer
source counts, computed with `SUM(embedding_json IS NOT NULL)` over
`knowledge_pages` and `local_sl_sources`. Rename the "Knowledge" label to
"Wiki" (canonical per `docs/terminology.md`) and rename the matching
`localStats.knowledgePages` field to `localStats.wikiPages`.

Drop `wiki=N md` and `semantic-layer=N yaml` from the Disk row — those
duplicated the per-surface rows above. Disk now reports only actual byte
usage (db, cache, raw-sources). The unused `wikiGlobalMarkdownCount` /
`semanticLayerYamlCount` fields, the `isMarkdownEntry` / `isYamlEntry`
helpers, and the `filter` arg on `summarizeDir` are removed.
2026-05-21 15:28:58 +02:00

346 lines
17 KiB
JavaScript

import assert from 'node:assert/strict';
import { readFile } from 'node:fs/promises';
import { describe, it } from 'node:test';
async function readText(relativePath) {
return readFile(new URL(`../${relativePath}`, import.meta.url), 'utf8');
}
function publicNpmPackageName() {
return `@${['kae', 'lio'].join('')}/ktx`;
}
function runtimeWheelPackageName() {
return `${['kae', 'lio'].join('')}-ktx`;
}
function escapeRegExp(value) {
return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
function publicPackagePattern(text) {
return new RegExp(text.replaceAll('{package}', escapeRegExp(publicNpmPackageName())));
}
describe('standalone example docs', () => {
it('documents the local warehouse example from the examples index', async () => {
const examples = await readText('examples/README.md');
assert.match(examples, /local-warehouse/);
assert.match(examples, /fake ingest adapter/);
assert.doesNotMatch(examples, /will contain standalone examples/);
});
it('documents the Orbit relationship verification example project', async () => {
const examples = await readText('examples/README.md');
const readme = await readText('examples/orbit-relationship-verification/README.md');
const config = await readText('examples/orbit-relationship-verification/ktx.yaml');
assert.match(examples, /orbit-relationship-verification/);
assert.match(examples, /relationships:verify-orbit/);
assert.match(readme, /Orbit-style relationship discovery verification/);
assert.match(readme, /pnpm run relationships:verify-orbit/);
assert.match(readme, /Accepted: 9/);
assert.match(readme, /Review: 0/);
assert.match(readme, /Rejected: 0/);
assert.doesNotMatch(config, /^project:/m);
assert.match(config, /orbit:/);
assert.match(config, /driver: sqlite/);
assert.match(
config,
/path: \.\.\/\.\.\/packages\/context\/test\/fixtures\/relationship-benchmarks\/orbit_style_product_no_declared_constraints\/data\.sqlite/,
);
assert.match(config, /llmProposals: false/);
assert.match(config, /validationRequiredForManifest: true/);
});
it('documents the Postgres historic SQL smoke example', async () => {
const examples = await readText('examples/README.md');
const readme = await readText('examples/postgres-historic/README.md');
const compose = await readText('examples/postgres-historic/docker-compose.yml');
const initSql = await readText('examples/postgres-historic/init/001-schema.sql');
const workload = await readText('examples/postgres-historic/scripts/generate-workload.sh');
const smoke = await readText('examples/postgres-historic/scripts/smoke.sh');
assert.match(examples, /postgres-historic/);
assert.doesNotMatch(examples, /Historic SQL/);
assert.doesNotMatch(examples, /historic-SQL/);
assert.match(examples, /query-history ingest via `pg_stat_statements`/);
assert.doesNotMatch(readme, new RegExp(['--enable-historic', 'sql'].join('-')));
assert.doesNotMatch(readme, new RegExp(['--historic', 'sql-min-executions'].join('-')));
assert.doesNotMatch(readme, /ktx ingest run --project-dir/);
assert.doesNotMatch(readme, /--adapter historic-sql/);
assert.match(readme, /--enable-query-history/);
assert.match(readme, /--query-history-min-executions 2/);
assert.match(readme, /ktx status --project-dir/);
assert.match(readme, /Postgres query history/);
assert.match(readme, /manifest\.json/);
assert.match(readme, /tables\/\*\.json/);
assert.match(readme, /patterns-input\.json/);
assert.match(readme, /patterns-input\/part-\*\.json/);
assert.match(readme, /full audit input/);
assert.match(readme, /bounded pattern WorkUnit shards/);
assert.match(readme, /workUnitCount: 0/);
assert.match(compose, /postgres:14/);
assert.match(compose, /shared_preload_libraries=pg_stat_statements/);
assert.match(compose, /pg_stat_statements.track=top/);
assert.match(initSql, /CREATE EXTENSION IF NOT EXISTS pg_stat_statements/);
assert.match(initSql, /GRANT pg_read_all_stats TO ktx_reader/);
assert.match(workload, /JOIN customers/);
assert.match(workload, /app_user/);
assert.match(workload, /etl_user/);
assert.match(smoke, /assert_unified_snapshot/);
assert.match(smoke, /assert_stage_record "\$UNCHANGED_RECORD" unchanged zero/);
assert.match(smoke, /assertPatternShards/);
assert.match(smoke, /historic-sql-patterns-part-/);
assert.match(smoke, /patterns-input\/part-/);
assert.doesNotMatch(smoke, new RegExp(["unitKey === 'historic", 'sql', "patterns'"].join('-')));
assert.match(smoke, /--query-history-min-executions 2/);
assert.match(smoke, /KTX_RUNTIME_ROOT/);
assert.match(smoke, /managedDaemon/);
assert.match(smoke, /installPolicy: 'auto'/);
assert.match(smoke, /getKtxCliPackageInfo/);
assert.doesNotMatch(smoke, /python-service/);
assert.doesNotMatch(smoke, /PYTHON_SERVICE/);
assert.doesNotMatch(smoke, /uvicorn app\.main:app/);
assert.doesNotMatch(smoke, /export KTX_SQL_ANALYSIS_URL/);
assert.doesNotMatch(
smoke,
new RegExp(
[
['baseline', 'FirstRun'],
['de', 'graded'],
['stats', 'ResetAt'],
['assert', '_manifest'],
]
.map((parts) => parts.join(''))
.join('|'),
),
);
assert.doesNotMatch(readme, /python-service/);
assert.doesNotMatch(readme, /KTX_SQL_ANALYSIS_URL/);
assert.doesNotMatch(
readme,
new RegExp(
[
['baseline', 'FirstRun'],
['de', 'graded: true'],
['stats', 'ResetAt'],
['fresh PGSS', ' baseline'],
['delta', '-only'],
]
.map((parts) => parts.join(''))
.join('|'),
),
);
});
it('checked-in example configs do not include public database adapters', async () => {
const localWarehouseConfig = await readFile('examples/local-warehouse/ktx.yaml', 'utf8');
const orbitConfig = await readFile('examples/orbit-relationship-verification/ktx.yaml', 'utf8');
const legacyPublicAdapter = new RegExp(['live', 'database'].join('-'));
assert.doesNotMatch(localWarehouseConfig, legacyPublicAdapter);
assert.doesNotMatch(orbitConfig, legacyPublicAdapter);
});
it('lists the consolidated workspace layout in the contributor docs', async () => {
const contributing = await readText('docs-site/content/docs/community/contributing.mdx');
assert.match(contributing, /cli\/\s+# CLI package and published npm package source/);
assert.match(contributing, /src\/context\/\s+# Core context engine/);
assert.match(contributing, /src\/llm\/\s+# LLM client abstraction/);
assert.match(contributing, /src\/connectors\/\s+# Database connectors/);
assert.match(contributing, /ktx-sl\/\s+# Semantic layer/);
assert.match(contributing, /ktx-daemon\/\s+# Daemon/);
});
it('documents agent-facing CLI commands', async () => {
const servingAgents = await readText('docs-site/content/docs/guides/serving-agents.mdx');
for (const command of [
'ktx status --json',
'ktx sl --json',
'ktx sl "revenue" --json',
'ktx sl query',
'ktx wiki "revenue recognition" --json',
]) {
assert.match(servingAgents, new RegExp(command.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')));
}
});
it('walks through connection testing in the quickstart and CLI reference', async () => {
const quickstart = await readText('docs-site/content/docs/getting-started/quickstart.mdx');
const connectionReference = await readText('docs-site/content/docs/cli-reference/ktx-connection.mdx');
assert.match(connectionReference, /ktx connection list/);
assert.match(connectionReference, /ktx connection test my-warehouse/);
assert.match(connectionReference, /ktx connection test --all/);
assert.match(quickstart, /Connection test passed/);
assert.match(connectionReference, /Driver: postgres/);
assert.match(connectionReference, /Status: ok/);
});
it('documents public npm and managed runtime usage', async () => {
const rootReadme = await readText('README.md');
const quickstart = await readText('docs-site/content/docs/getting-started/quickstart.mdx');
const packageArtifacts = await readText('examples/package-artifacts/README.md');
assert.match(rootReadme, publicPackagePattern('npm install -g {package}'));
assert.match(quickstart, publicPackagePattern('npm install -g {package}'));
assert.match(quickstart, /ktx admin runtime install --feature local-embeddings --yes/);
assert.match(quickstart, /ktx admin runtime start --feature local-embeddings/);
assert.match(packageArtifacts, /requires `uv` on `PATH`/);
assert.match(packageArtifacts, /ktx admin runtime status/);
assert.match(packageArtifacts, /ktx admin runtime status/);
assert.doesNotMatch(packageArtifacts, /ktx admin runtime prune/);
assert.match(
packageArtifacts,
new RegExp(
`artifact manifest contains the public \`${escapeRegExp(publicNpmPackageName())}\` npm tarball and the\\s+bundled \`${escapeRegExp(
runtimeWheelPackageName(),
)}\` runtime wheel`,
),
);
assert.doesNotMatch(rootReadme, /ktx serve --mcp stdio/);
assert.doesNotMatch(rootReadme, /uv run ktx-daemon serve-http/);
assert.doesNotMatch(rootReadme, /--semantic-compute-url http:\/\/127\.0\.0\.1:8765/);
});
it('documents the public package artifact smoke shape', async () => {
const readme = await readText('examples/package-artifacts/README.md');
assert.match(readme, publicPackagePattern('{package}'));
assert.match(readme, /managed Python runtime/);
assert.match(
readme,
new RegExp(
`public \`${escapeRegExp(publicNpmPackageName())}\` npm tarball and the\\s+bundled \`${escapeRegExp(
runtimeWheelPackageName(),
)}\`\\s+runtime wheel`,
),
);
assert.match(readme, /does not install standalone\s+Python packages directly/);
assert.doesNotMatch(readme, /standalone Python distributions/);
assert.doesNotMatch(readme, /installs the Python artifacts directly/);
assert.match(readme, /requires `uv` on `PATH`/);
assert.match(readme, /ktx admin runtime status/);
assert.match(readme, /ktx admin runtime status/);
assert.doesNotMatch(readme, /ktx admin runtime prune/);
assert.doesNotMatch(readme, /@ktx\/context/);
assert.doesNotMatch(readme, /@ktx\/cli/);
assert.doesNotMatch(readme, /python -m ktx_daemon semantic-validate/);
});
it('documents unified public ingest workflows in the docs site', async () => {
const rootReadme = await readText('README.md');
const cliMeta = await readText('docs-site/content/docs/cli-reference/meta.json');
const ingestReference = await readText('docs-site/content/docs/cli-reference/ktx-ingest.mdx');
const adminReference = await readText('docs-site/content/docs/cli-reference/ktx-admin.mdx');
const setupReference = await readText('docs-site/content/docs/cli-reference/ktx-setup.mdx');
const buildingContext = await readText('docs-site/content/docs/guides/building-context.mdx');
const contextSources = await readText('docs-site/content/docs/integrations/context-sources.mdx');
const contextAsCode = await readText('docs-site/content/docs/concepts/context-as-code.mdx');
const quickstart = await readText('docs-site/content/docs/getting-started/quickstart.mdx');
const primarySources = await readText('docs-site/content/docs/integrations/primary-sources.mdx');
const examplesIndex = await readText('examples/README.md');
const localWarehouseReadme = await readText('examples/local-warehouse/README.md');
assert.match(ingestReference, /ktx ingest <connectionId>/);
assert.match(ingestReference, /Build every configured connection/);
assert.match(ingestReference, /--query-history-window-days <days>/);
assert.match(buildingContext, /ktx ingest <connectionId>/);
assert.match(buildingContext, /ktx ingest --all/);
assert.match(contextSources, /ktx ingest <connectionId>/);
assert.match(contextAsCode, /ktx ingest --all --no-input/);
assert.match(quickstart, /schema context/);
assert.match(primarySources, /context:\n queryHistory:/);
assert.match(rootReadme, /`ktx ingest <id>` \| Build context for one connection/);
assert.match(quickstart, /Databases:\n warehouse: deep context complete/);
assert.match(quickstart, /Databases configured: yes \(warehouse\)/);
assert.match(setupReference, /Databases configured: yes \(postgres-warehouse\)/);
assert.doesNotMatch(rootReadme, new RegExp(['Primary sources', 'configured'].join(' ')));
assert.doesNotMatch(quickstart, new RegExp(['Primary', 'sources'].join(' ')));
assert.doesNotMatch(setupReference, new RegExp(['Primary sources', 'configured'].join(' ')));
assert.doesNotMatch(cliMeta, /ktx-scan/);
assert.doesNotMatch(ingestReference, /ktx ingest run/);
assert.doesNotMatch(ingestReference, /ktx ingest status/);
assert.doesNotMatch(ingestReference, /ktx ingest replay/);
assert.doesNotMatch(ingestReference, /--adapter/);
assert.doesNotMatch(ingestReference, /ktx ingest watch/);
assert.doesNotMatch(ingestReference, /live-database/);
assert.doesNotMatch(adminReference, /ktx scan/);
assert.doesNotMatch(buildingContext, /ktx ingest watch/);
assert.doesNotMatch(buildingContext, /ktx ingest status/);
assert.doesNotMatch(buildingContext, /ktx ingest replay/);
assert.doesNotMatch(buildingContext, /historic-sql/);
assert.doesNotMatch(buildingContext, /live-database/);
assert.doesNotMatch(contextSources, /ktx ingest run --connection-id/);
assert.doesNotMatch(contextSources, /--adapter <adapter>/);
assert.doesNotMatch(contextAsCode, /ktx ingest run --connection-id/);
assert.doesNotMatch(quickstart, /Historic SQL/);
assert.doesNotMatch(quickstart, /--enable-historic-sql/);
assert.doesNotMatch(quickstart, /press <kbd>d<\/kbd> to detach/);
assert.doesNotMatch(primarySources, /historicSql/);
assert.doesNotMatch(primarySources, /Historic SQL/);
assert.doesNotMatch(examplesIndex, /ktx ingest run --project-dir/);
assert.doesNotMatch(localWarehouseReadme, /ktx ingest run --project-dir/);
assert.match(rootReadme, /raw-sources\//);
assert.doesNotMatch(rootReadme, new RegExp(`${['live', 'database'].join('-')}/`));
assert.doesNotMatch(rootReadme, /ktx scan/);
assert.doesNotMatch(rootReadme, /Run a local ingest smoke test/);
assert.doesNotMatch(rootReadme, /ktx ingest run --project-dir/);
assert.doesNotMatch(rootReadme, /ktx ingest status --project-dir/);
});
it('documents pnpm setup as a prerequisite when optional dev linking fails', async () => {
const rootReadme = await readText('README.md');
assert.match(rootReadme, /pnpm run link:dev/);
assert.match(rootReadme, /ktx-dev --help/);
assert.doesNotMatch(
rootReadme,
/If the setup command reports that pnpm's global bin directory is not on your\n`PATH`, add the printed directory to your shell profile/,
);
});
it('runs the example smoke in the cli smoke script', async () => {
const packageJson = JSON.parse(await readText('packages/cli/package.json'));
assert.match(packageJson.scripts.smoke, /src\/standalone-smoke\.test\.ts/);
assert.match(packageJson.scripts.smoke, /src\/example-smoke\.test\.ts/);
assert.match(packageJson.scripts.test, /--exclude src\/standalone-smoke\.test\.ts/);
assert.match(packageJson.scripts.test, /--exclude src\/example-smoke\.test\.ts/);
});
it('documents daemon HTTP database, source generation, LookML, embedding, and code execution support', async () => {
const readme = await readText('python/ktx-daemon/README.md');
assert.match(readme, /semantic-generate-sources/);
assert.match(readme, /database-introspect/);
assert.match(readme, /POST \/database\/introspect/);
assert.match(readme, /Introspect a Postgres database schema/);
assert.match(readme, /lookml-parse/);
assert.match(readme, /embedding-compute/);
assert.match(readme, /embedding-compute-bulk/);
assert.match(readme, /code-execute/);
assert.match(readme, /--enable-code-execution/);
assert.match(readme, /POST \/semantic-layer\/generate-sources/);
assert.match(readme, /POST \/lookml\/parse/);
assert.match(readme, /POST \/embeddings\/compute/);
assert.match(readme, /POST \/embeddings\/compute-bulk/);
assert.match(readme, /POST \/code\/execute/);
assert.match(readme, /Generate semantic-layer sources from schema scan data/);
assert.match(readme, /Parse LookML projects into resolved, KSL-ready structures/);
assert.match(readme, /Compute text embeddings locally/);
assert.match(readme, /Execute Python code with the current in-process boundary/);
assert.match(readme, /Code execution is off by default/);
assert.match(readme, /does not provide OS-level sandboxing/);
assert.doesNotMatch(readme, /source generation are not exposed through this/);
assert.doesNotMatch(readme, /LookML parsing are not exposed through this/);
assert.doesNotMatch(readme, /embeddings are not exposed through this server mode/);
assert.doesNotMatch(readme, /Code execution is not exposed through this server mode/);
});
});