2026-05-10 23:12:26 +02:00
import { mkdtemp, rm, writeFile } from 'node:fs/promises';
import { createServer } from 'node:net';
import { tmpdir } from 'node:os';
import { dirname, join, resolve } from 'node:path';
import { performance } from 'node:perf_hooks';
import { fileURLToPath } from 'node:url';
import { PGlite } from '@electric-sql/pglite';
import { pg_trgm } from '@electric-sql/pglite/contrib/pg_trgm';
import { vector } from '@electric-sql/pglite/vector';
import { PGLiteSocketServer } from '@electric-sql/pglite-socket';
import { Client } from 'pg';
const scriptDir = dirname(fileURLToPath(import.meta.url));
const ktxRoot = resolve(scriptDir, '..');
const reportPath = join(ktxRoot, 'docs', 'hybrid-search-pglite-owner-process.md');
// Run `fn`, returning its result together with the elapsed wall-clock time
// in milliseconds (rounded to two decimals).
async function timed(label, fn) {
  const started = performance.now();
  const value = await fn();
  return {
    label,
    durationMs: Number((performance.now() - started).toFixed(2)),
    value,
  };
}
// Ask the OS for a free loopback TCP port by listening on port 0, then
// release it so the owner process can bind it afterwards.
async function allocatePort() {
  const server = createServer();
  await new Promise((resolve) => server.listen(0, '127.0.0.1', resolve));
  const address = server.address();
  if (typeof address !== 'object' || address === null) {
    throw new Error('Expected TCP server address while allocating a PGlite owner-process port.');
  }
  await new Promise((resolve, reject) => {
    server.close((error) => {
      if (error) {
        reject(error);
        return;
      }
      resolve();
    });
  });
  return address.port;
}
// Open (or reopen) the PGlite data directory in this process, make sure the
// prototype schema exists, and serve the database to PostgreSQL clients on
// 127.0.0.1 at the given port.
async function createOwner(dataDir, port) {
  const db = await PGlite.create({
    dataDir,
    extensions: {
      vector,
      pg_trgm,
    },
  });
  await db.exec(`
    CREATE EXTENSION IF NOT EXISTS vector;
    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    CREATE TABLE IF NOT EXISTS prototype_documents (
      id TEXT PRIMARY KEY,
      search_text TEXT NOT NULL,
      metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
      embedding vector(3) NOT NULL
    );
    CREATE INDEX IF NOT EXISTS prototype_documents_fts_idx
      ON prototype_documents
      USING GIN (to_tsvector('english', search_text));
    CREATE INDEX IF NOT EXISTS prototype_documents_vector_idx
      ON prototype_documents
      USING ivfflat (embedding vector_cosine_ops)
      WITH (lists = 1);
    CREATE TABLE IF NOT EXISTS prototype_dictionary_values (
      connection_id TEXT NOT NULL,
      source_name TEXT NOT NULL,
      column_name TEXT NOT NULL,
      value TEXT NOT NULL,
      PRIMARY KEY (connection_id, source_name, column_name, value)
    );
    CREATE INDEX IF NOT EXISTS prototype_dictionary_values_trgm_idx
      ON prototype_dictionary_values
      USING GIN (value gin_trgm_ops);
  `);
  const server = new PGLiteSocketServer({
    db,
    host: '127.0.0.1',
    port,
    maxConnections: 100,
  });
  await server.start();
  return {
    db,
    server,
    connectionConfig: {
      host: '127.0.0.1',
      port,
      user: 'postgres',
      database: 'postgres',
      application_name: 'ktx-pglite-owner-report',
      connectionTimeoutMillis: 5_000,
    },
  };
}
// Connect a `pg` client, run `fn` with it, and always close the connection.
async function withClient(connectionConfig, fn) {
  const client = new Client(connectionConfig);
  await client.connect();
  try {
    return await fn(client);
  } finally {
    await client.end();
  }
}
async function seed(connectionConfig) {
  await withClient(connectionConfig, async (client) => {
    await client.query(
      `
        INSERT INTO prototype_documents (id, search_text, metadata, embedding)
        VALUES
          ($1, $2, $3::jsonb, $4::vector),
          ($5, $6, $7::jsonb, $8::vector),
          ($9, $10, $11::jsonb, $12::vector)
        ON CONFLICT (id) DO UPDATE
          SET search_text = EXCLUDED.search_text,
              metadata = EXCLUDED.metadata,
              embedding = EXCLUDED.embedding
      `,
      [
        'warehouse/orders',
        'orders paid revenue refund status customer',
        JSON.stringify({ connectionId: 'warehouse', sourceName: 'orders' }),
        JSON.stringify([1, 0, 0]),
        'finance/orders',
        'orders finance bookings gross margin',
        JSON.stringify({ connectionId: 'finance', sourceName: 'orders' }),
        JSON.stringify([0.72, 0.28, 0]),
        'warehouse/customers',
        'customers accounts lifecycle region',
        JSON.stringify({ connectionId: 'warehouse', sourceName: 'customers' }),
        JSON.stringify([0, 1, 0]),
      ],
    );
    await client.query(`
      INSERT INTO prototype_dictionary_values (connection_id, source_name, column_name, value)
      VALUES
        ('warehouse', 'orders', 'status', 'refunded'),
        ('warehouse', 'orders', 'status', 'paid'),
        ('warehouse', 'customers', 'region', 'emea')
      ON CONFLICT DO NOTHING
    `);
  });
}
// Run one probe per search feature through the socket: Postgres full-text
// search, pgvector cosine distance, and pg_trgm similarity. Each probe
// returns its top-ranked id, or '<missing>' when no row matched.
async function queryTopResults(connectionConfig) {
  return await withClient(connectionConfig, async (client) => {
    const lexical = await client.query(
      `
        SELECT id
        FROM prototype_documents
        WHERE to_tsvector('english', search_text) @@ websearch_to_tsquery('english', $1)
        ORDER BY ts_rank_cd(to_tsvector('english', search_text), websearch_to_tsquery('english', $1)) DESC, id ASC
        LIMIT 1
      `,
      ['paid orders'],
    );
    const semantic = await client.query(
      `
        SELECT id
        FROM prototype_documents
        ORDER BY embedding <=> $1::vector, id ASC
        LIMIT 1
      `,
      [JSON.stringify([1, 0, 0])],
    );
    const dictionary = await client.query(
      `
        SELECT connection_id || '/' || source_name AS id
        FROM prototype_dictionary_values
        WHERE similarity(value, $1) > 0
        ORDER BY similarity(value, $1) DESC, id ASC, value ASC
        LIMIT 1
      `,
      ['refund'],
    );
    return {
      lexical: lexical.rows[0]?.id ?? '<missing>',
      semantic: semantic.rows[0]?.id ?? '<missing>',
      dictionary: dictionary.rows[0]?.id ?? '<missing>',
    };
  });
}
// Open four socket connections at once and issue the same COUNT(*) query on
// each, returning the count seen by every client.
async function concurrentReads(connectionConfig) {
  const clients = await Promise.all(
    Array.from({ length: 4 }, async () => {
      const client = new Client(connectionConfig);
      await client.connect();
      return client;
    }),
  );
  try {
    const results = await Promise.all(
      clients.map((client) => client.query('SELECT COUNT(*)::int AS count FROM prototype_documents')),
    );
    return results.map((result) => result.rows[0]?.count ?? null);
  } finally {
    await Promise.all(clients.map((client) => client.end().catch(() => undefined)));
  }
}
async function stopOwner(owner) {
  await owner.server.stop();
  await owner.db.close();
}
async function main() {
  const tempDir = await mkdtemp(join(tmpdir(), 'ktx-pglite-owner-report-'));
  const dataDir = join(tempDir, 'pgdata');
  const port = await allocatePort();
  let owner;
  try {
    const startTimer = await timed('startOwner', async () => await createOwner(dataDir, port));
    owner = startTimer.value;
    const seedTimer = await timed('seed', async () => await seed(owner.connectionConfig));
    const queryTimer = await timed('searchQueries', async () => await queryTopResults(owner.connectionConfig));
    const concurrentTimer = await timed('concurrentReads', async () => await concurrentReads(owner.connectionConfig));
    await stopOwner(owner);
    owner = undefined;
    const restartTimer = await timed('restartOwner', async () => await createOwner(dataDir, port));
    owner = restartTimer.value;
    const persisted = await withClient(owner.connectionConfig, async (client) => {
      const result = await client.query('SELECT COUNT(*)::int AS count FROM prototype_documents');
      return result.rows[0]?.count ?? null;
    });
    const markdown = `# Hybrid Search PGlite Owner Process Prototype

Generated: ${new Date().toISOString()}

## Summary

PGlite started behind one explicit owner process, enabled vector and pg_trgm extensions, served PostgreSQL clients through \`@electric-sql/pglite-socket\`, answered lexical, semantic, and dictionary probes, and preserved rows across owner restart.

Recommendation: Keep SQLite as the production default. The next PGlite implementation step should be a private adapter prototype behind an explicit configuration flag, still guarded by backend conformance tests, before any CLI or MCP default changes.

## Timings

| Probe | Duration ms |
| --- | ---: |
| startOwner | ${startTimer.durationMs} |
| seed | ${seedTimer.durationMs} |
| searchQueries | ${queryTimer.durationMs} |
| concurrentReads | ${concurrentTimer.durationMs} |
| restartOwner | ${restartTimer.durationMs} |

## Search Feature Results

| Probe | Top result |
| --- | --- |
| Postgres FTS through socket | \`${queryTimer.value.lexical}\` |
| pgvector cosine through socket | \`${queryTimer.value.semantic}\` |
| pg_trgm dictionary through socket | \`${queryTimer.value.dictionary}\` |
| Reopened persisted row count | \`${persisted}\` |

## Concurrency Observation

Concurrent socket read counts: \`${concurrentTimer.value.join(', ')}\`

## Decision

The owner-process shape is viable for a prototype because it gives CLI and MCP callers a PostgreSQL protocol boundary without opening the same PGlite data directory from independent runtimes. This report is not a production adapter acceptance record.
`;
    await writeFile(reportPath, markdown);
    console.log(`Wrote ${reportPath}`);
    console.log(
      JSON.stringify(
        {
          port,
          timings: {
            startOwner: startTimer.durationMs,
            seed: seedTimer.durationMs,
            searchQueries: queryTimer.durationMs,
            concurrentReads: concurrentTimer.durationMs,
            restartOwner: restartTimer.durationMs,
          },
          topResults: queryTimer.value,
          concurrentReads: concurrentTimer.value,
          persisted,
        },
        null,
        2,
      ),
    );
  } finally {
    if (owner) {
      await stopOwner(owner).catch(() => undefined);
    }
    await rm(tempDir, { recursive: true, force: true });
  }
}
await main();