mirror of https://github.com/Kaelio/ktx.git (synced 2026-06-10 08:05:14 +02:00)
* refactor(workspace): relocate @ktx/llm source into packages/cli/src/llm

* refactor(workspace): rewrite @ktx/llm imports to relative paths

* refactor(workspace): fold internal packages into cli

* chore(workspace): gate dead-code with knip production mode

  Turn on production-mode knip plus an autofix run in pre-commit and the `pnpm dead-code` script, document the `/** @internal */` convention for test-only exports in AGENTS.md, annotate test-only exports across the CLI with that JSDoc, and drop dead exports/wrappers the new gate surfaced (e.g. `cli-project.ts`, `lookerRuntimeSourceToFileAdapterSource`, `createLocalScanEnrichmentProvidersFromConfig`, `PGLITE_OWNER_PROCESS_BACKEND_CAPABILITIES`, stale type re-exports). Replace the loose `ignoreIssues` allowlist in `knip.json` with explicit production entries so cross-package barrel leaks are caught.

* refactor(cli): delete internal barrel index.ts files

  The 34 `index.ts` re-export barrels inside `packages/cli/src/` were holdovers from the pre-fold multi-workspace structure. Post-fold-in they served no production purpose: external consumers go through the single package main entry, and in-repo callers mostly imported through them only because the path was short. Internally, knip flagged most barrel re-exports as production-dead (only reached via tests).

  This change:
  - Deletes every internal barrel except `packages/cli/src/index.ts` (the published package entry).
  - Rewrites ~270 source/test files to import each name directly from the file that defines it.
  - Moves `tools/warehouse-verification/index.ts` to `create-warehouse-verification-tools.ts` (the function it defined locally) and updates its single consumer.
  - Renames `search/backend-conformance.ts` → `.test-utils.ts` to match the existing test-helper file convention.
  - Deletes 13 dead test-only chains (dbt-descriptions/*, live-database/extracted-schema, live-database/structural-sync, relationship-* feedback/review chain) plus their tests and a cascading orphan integration test.
  - Updates test mocks that pointed at deleted barrel paths (notion-client, connector barrels in scan/local-scan-connectors tests) to mock the source files instead.
  - Points the maintainer benchmark script (`scripts/relationship-benchmark-report.mjs`) at source files instead of `dist/context/scan/index.js`.
  - Drops the barrel `!` entries from `knip.json`; adds explicit production entries only for the benchmark code reached via dist by the maintainer script.

  Net: 413 files changed, ~1.2k insertions, ~9.4k deletions. `pnpm run dead-code` (Biome + knip default + knip production) and `pnpm run type-check` are clean; 2277 tests pass.

* refactor(workspace): rename @ktx/cli to @kaelio/ktx and pack it directly

  Promote the CLI workspace package to the public name `@kaelio/ktx` and drop the separate `scripts/build-public-npm-package.mjs` wrapper. The CLI package is now publishable in place (`publishConfig.access: public`, `provenance: true`), so artifact packing uses `pnpm pack` against `packages/cli/` instead of assembling a parallel package tree. Updates all workspace filter invocations, docs, tests, and release readiness checks to reference the new package name, and folds the tarball-name helper into `scripts/public-npm-release-metadata.mjs`.

* docs: align "agent clients" and "data agents" terminology

  Replace "client agents" with "agent clients" and "database agents" with "data agents" across AGENTS.md, README.md, the docs-site copy, and the matching setup-agents test description, matching the canonical vocabulary in docs/terminology.md. Also moves packages/cli/tsconfig.json's tsBuildInfoFile from node_modules/.cache/ to dist/.tsbuildinfo so incremental builds survive node_modules reinstalls.

* refactor(release): single source of truth for package version

  Make packages/cli/package.json the single source of truth for the @kaelio/ktx version. publicNpmPackageVersion() now reads it directly, so artifact filenames, release-readiness checks, and the Python wheel version all derive from one field. The duplicate release-policy.json.publicNpmPackageVersion is removed. Previously the two fields could drift: tarballs were named kaelio-ktx-0.4.1.tgz while internally containing @kaelio/ktx@0.0.0-private.

  - update-public-release-version.mjs rewrites both Python pyproject.toml files (ktx-daemon, ktx-sl) alongside the npm package.jsons, normalizing the version for PEP 440 (e.g. 0.1.0-rc.2 -> 0.1.0rc2).
  - semantic-release-config.cjs adds the two pyproject.toml files to @semantic-release/git assets so the release commit back to main carries every version source in lockstep.
  - The six "?? '0.0.0-private'" fallback literals across the CLI are replaced with "?? getKtxCliPackageInfo().version", and createDefaultKtxMcpServer makes its version arg required.
  - docs/release.md describes the actual commit-back model: the dev tree always reflects the most recent release; no sentinel pin to maintain.

  Verified: pnpm run artifacts:build now produces kaelio-ktx-0.4.1.tgz and kaelio_ktx-0.4.1-py3-none-any.whl with @kaelio/ktx@0.4.1 inside. Full type-check, dead-code, and 2287 vitests + 173 script tests pass.

* refactor(cli): inject embedding provider resolution and detect sentence-transformers runtime

  Make resolveProjectEmbeddingProvider and runtimeIo injectable in ingest and scan command entrypoints so tests can stub them, and teach resolvePublicIngestRuntimeRequirements to flag the local-embeddings runtime feature when ktx.yaml selects sentence-transformers.

* chore(cli): mark buildLocalStatsStatus and LocalStatsStatus as @internal

  Both symbols are consumed only by status-project.test.ts. Annotating with /** @internal */ keeps knip's production-mode check clean without changing runtime behavior.

* fix(cli): use real package metadata in print-command-tree

  The stubbed package name embedded a forbidden product identifier that tripped the boundary check in CI. Read the metadata from package.json instead; this keeps the rendered tree unchanged and removes a duplicate source of truth.

* feat(cli): show embedding coverage in `ktx status`, drop duplicate disk counts

  Inline `(N embedded)` next to the Wiki scope counts and Semantic-layer source counts, computed with `SUM(embedding_json IS NOT NULL)` over `knowledge_pages` and `local_sl_sources`. Rename the "Knowledge" label to "Wiki" (canonical per `docs/terminology.md`) and rename the matching `localStats.knowledgePages` field to `localStats.wikiPages`.

  Drop `wiki=N md` and `semantic-layer=N yaml` from the Disk row; those duplicated the per-surface rows above. Disk now reports only actual byte usage (db, cache, raw-sources). The unused `wikiGlobalMarkdownCount` / `semanticLayerYamlCount` fields, the `isMarkdownEntry` / `isYamlEntry` helpers, and the `filter` arg on `summarizeDir` are removed.
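The PEP 440 normalization mentioned in the release commit (e.g. 0.1.0-rc.2 -> 0.1.0rc2) could look roughly like the sketch below. The helper name `toPep440Version` and the set of supported pre-release tags are assumptions made for illustration; the actual logic in update-public-release-version.mjs is not shown on this page and may differ.

```typescript
// Hypothetical helper, not taken from the repository.
// Maps semver pre-release tags to their PEP 440 spellings.
const PRE_RELEASE_TAGS: Record<string, string> = { alpha: 'a', beta: 'b', rc: 'rc' };

function toPep440Version(semver: string): string {
  // Accepts "X.Y.Z" or "X.Y.Z-<tag>.<n>" where <tag> is alpha, beta, or rc.
  const match = /^(\d+\.\d+\.\d+)(?:-(alpha|beta|rc)\.(\d+))?$/.exec(semver);
  if (!match) {
    throw new Error(`Unsupported version format: ${semver}`);
  }
  const release = match[1]!;
  const tag = match[2];
  const serial = match[3];
  if (tag === undefined || serial === undefined) {
    // Plain releases are already valid PEP 440 versions.
    return release;
  }
  return `${release}${PRE_RELEASE_TAGS[tag]}${serial}`;
}
```

Rejecting unknown formats with an error, rather than passing them through, is a design choice of this sketch so that an unexpected version string fails loudly instead of producing an invalid wheel version.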
357 lines
11 KiB
TypeScript
import { readFile as fsReadFile } from 'node:fs/promises';
import { basename, resolve } from 'node:path';
import { createLocalProjectMemoryIngest } from './context/memory/local-memory.js';
import type { MemoryAgentInput } from './context/memory/types.js';
import type { MemoryIngestStatus } from './context/memory/memory-runs.js';
import { loadKtxProject, type KtxLocalProject } from './context/project/project.js';
import type { KtxCliIo } from './cli-runtime.js';
import { createRepainter, initViewState, renderContextBuildView, type ContextBuildTargetState } from './context-build-view.js';
import { formatDuration } from './demo-metrics.js';
import type { KtxPublicIngestPlanTarget } from './public-ingest.js';

export interface KtxTextIngestArgs {
  projectDir: string;
  texts: string[];
  files: string[];
  connectionId?: string;
  userId: string;
  json: boolean;
  failFast: boolean;
}

/** @internal */
export interface TextMemoryIngestPort {
  ingest(input: MemoryAgentInput): Promise<{ runId: string }>;
  waitForRun(runId: string): Promise<void>;
  status(runId: string): Promise<MemoryIngestStatus | null>;
}

interface TextIngestItem {
  label: string;
  content: string;
}

interface TextIngestResult {
  label: string;
  runId: string | null;
  status: 'done' | 'error';
  captured: MemoryIngestStatus['captured'];
  commitHash: string | null;
  error: string | null;
}

export interface KtxTextIngestDeps {
  loadProject?: (options: { projectDir: string }) => Promise<KtxLocalProject>;
  createMemoryIngest?: (project: KtxLocalProject) => TextMemoryIngestPort;
  readFile?: (path: string) => Promise<string>;
  readStdin?: () => Promise<string>;
  now?: () => number;
}

const INLINE_TEXT_LABEL_MAX_LENGTH = 50;
const ANSI_ESCAPE_PATTERN = /\x1B\[[0-?]*[ -/]*[@-~]/g;

function defaultCreateMemoryIngest(project: KtxLocalProject): TextMemoryIngestPort {
  return createLocalProjectMemoryIngest(project);
}

async function defaultReadStdin(): Promise<string> {
  const chunks: string[] = [];
  process.stdin.setEncoding('utf-8');
  for await (const chunk of process.stdin) {
    chunks.push(String(chunk));
  }
  return chunks.join('');
}

async function defaultReadFile(path: string): Promise<string> {
  return await fsReadFile(path, 'utf-8');
}

function emptyCaptured(): MemoryIngestStatus['captured'] {
  return { wiki: [], sl: [], xrefs: [] };
}

function normalizedTextPreview(content: string): string {
  return content
    .replace(ANSI_ESCAPE_PATTERN, '')
    .replace(/[\u0000-\u001f\u007f-\u009f]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

function truncateLabel(label: string, maxLength = INLINE_TEXT_LABEL_MAX_LENGTH): string {
  const chars = Array.from(label);
  if (chars.length <= maxLength) {
    return label;
  }
  return `${chars.slice(0, maxLength - 3).join('').trimEnd()}...`;
}

function quoteInlineTextLabel(label: string): string {
  return JSON.stringify(label);
}

function makeUniqueLabel(label: string, usedLabels: Set<string>): string {
  if (!usedLabels.has(label)) {
    return label;
  }

  for (let index = 2; ; index++) {
    const suffix = ` (${index})`;
    const candidate = `${truncateLabel(label, INLINE_TEXT_LABEL_MAX_LENGTH - suffix.length)}${suffix}`;
    if (!usedLabels.has(candidate)) {
      return candidate;
    }
  }
}

function textLabel(content: string, index: number, usedLabels: Set<string>): string {
  const preview = normalizedTextPreview(content);
  const baseLabel = preview.length > 0 ? quoteInlineTextLabel(truncateLabel(preview)) : `text-${index + 1}`;
  return makeUniqueLabel(baseLabel, usedLabels);
}

function artifactReference(label: string): string {
  return label.startsWith('"') ? label : `"${label}"`;
}

function stdinLabel(items: TextIngestItem[]): string {
  if (!items.some((item) => item.label === 'stdin')) {
    return 'stdin';
  }
  return `stdin-${items.filter((item) => item.label.startsWith('stdin')).length + 1}`;
}

async function loadItems(args: KtxTextIngestArgs, deps: KtxTextIngestDeps): Promise<TextIngestItem[]> {
  const items: TextIngestItem[] = [];
  const usedTextLabels = new Set<string>();
  args.texts.forEach((content, index) => {
    const label = textLabel(content, index, usedTextLabels);
    usedTextLabels.add(label);
    items.push({ label, content });
  });

  const readFile = deps.readFile ?? defaultReadFile;
  const readStdin = deps.readStdin ?? defaultReadStdin;
  for (const file of args.files) {
    if (file === '-') {
      items.push({ label: stdinLabel(items), content: await readStdin() });
    } else {
      const path = resolve(file);
      items.push({ label: basename(path), content: await readFile(path) });
    }
  }

  return items;
}

function validateItems(items: TextIngestItem[], io: KtxCliIo): boolean {
  if (items.length === 0) {
    io.stderr.write('Provide at least one text item with --text, a file path, or - for stdin.\n');
    return false;
  }

  for (const item of items) {
    if (item.content.trim().length === 0) {
      io.stderr.write(`Text item "${item.label}" is empty.\n`);
      return false;
    }
  }
  return true;
}

function makeTarget(label: string): KtxPublicIngestPlanTarget {
  return {
    connectionId: label,
    driver: 'text',
    operation: 'source-ingest',
    debugCommand: '',
    steps: ['memory-update'],
  };
}

function allTargets(state: ReturnType<typeof initViewState>): ContextBuildTargetState[] {
  return [...state.primarySources, ...state.contextSources];
}

function renderTextIngestView(state: ReturnType<typeof initViewState>, styled: boolean): string {
  return renderContextBuildView(state, {
    styled,
    title: 'Ingesting text memory',
    contextGroupLabel: 'Texts',
    sourceIngestRunningText: 'capturing...',
    completedItemName: { singular: 'text', plural: 'texts' },
  });
}

function summarizeCaptured(captured: MemoryIngestStatus['captured']): string {
  const parts = [
    `wiki=${captured.wiki.length}`,
    `sl=${captured.sl.length}`,
    `xrefs=${captured.xrefs.length}`,
  ];
  return parts.join(', ');
}

function resultFromStatus(label: string, status: MemoryIngestStatus): TextIngestResult {
  return {
    label,
    runId: status.runId,
    status: status.status === 'done' ? 'done' : 'error',
    captured: status.captured,
    commitHash: status.commitHash,
    error: status.error,
  };
}

function errorResult(label: string, runId: string | null, error: unknown): TextIngestResult {
  return {
    label,
    runId,
    status: 'error',
    captured: emptyCaptured(),
    commitHash: null,
    error: error instanceof Error ? error.message : String(error),
  };
}

function writeJsonResult(args: KtxTextIngestArgs, results: TextIngestResult[], io: KtxCliIo): void {
  io.stdout.write(
    `${JSON.stringify(
      {
        status: results.some((result) => result.status === 'error') ? 'failed' : 'done',
        projectDir: args.projectDir,
        connectionId: args.connectionId ?? null,
        results,
      },
      null,
      2,
    )}\n`,
  );
}

function writePlainFailures(results: TextIngestResult[], io: KtxCliIo): void {
  const failures = results.filter((result) => result.status === 'error');
  if (failures.length === 0) {
    return;
  }

  io.stdout.write('\nFailed text items:\n');
  for (const result of failures) {
    io.stdout.write(` ${result.label}: ${result.error ?? 'failed'}\n`);
  }
}

export async function runKtxTextIngest(
  args: KtxTextIngestArgs,
  io: KtxCliIo,
  deps: KtxTextIngestDeps = {},
): Promise<number> {
  const items = await loadItems(args, deps);
  if (!validateItems(items, io)) {
    return 1;
  }

  const project = await (deps.loadProject ?? loadKtxProject)({ projectDir: args.projectDir });
  const memoryIngest = (deps.createMemoryIngest ?? defaultCreateMemoryIngest)(project);
  const now = deps.now ?? (() => Date.now());
  const batchId = now();
  const state = initViewState(items.map((item) => makeTarget(item.label)));
  const targets = allTargets(state);
  const isTTY = io.stdout.isTTY === true && args.json !== true;
  const repainter = isTTY ? createRepainter(io) : null;
  const results: TextIngestResult[] = [];

  state.startedAt = now();
  const paint = () => repainter?.paint(renderTextIngestView(state, true));
  paint();

  let spinnerInterval: ReturnType<typeof setInterval> | null = null;
  if (repainter) {
    spinnerInterval = setInterval(() => {
      const current = now();
      state.frame++;
      state.totalElapsedMs = state.startedAt === null ? 0 : current - state.startedAt;
      for (const target of targets) {
        if (target.status === 'running' && target.startedAt !== null) {
          target.elapsedMs = current - target.startedAt;
        }
      }
      paint();
    }, 140);
  }

  try {
    for (let index = 0; index < items.length; index++) {
      const item = items[index]!;
      const target = targets[index]!;
      target.status = 'running';
      target.startedAt = now();
      target.detailLine = 'capturing...';
      target.progressUpdatedAtMs = target.startedAt;
      paint();

      let runId: string | null = null;
      let result: TextIngestResult;
      try {
        const ingestInput: MemoryAgentInput = {
          userId: args.userId,
          chatId: `cli-text-ingest-${batchId}-${index + 1}`,
          userMessage: `Ingest external text artifact ${artifactReference(item.label)} into KTX memory.`,
          assistantMessage: item.content.trim(),
          ...(args.connectionId ? { connectionId: args.connectionId } : {}),
          sourceType: 'external_ingest',
        };
        const ingest = await memoryIngest.ingest(ingestInput);
        runId = ingest.runId;
        await memoryIngest.waitForRun(runId);
        const status = await memoryIngest.status(runId);
        if (!status) {
          throw new Error(`Memory ingest run "${runId}" was not found.`);
        }
        result = resultFromStatus(item.label, status);
      } catch (error) {
        result = errorResult(item.label, runId, error);
      }

      results.push(result);
      target.elapsedMs = now() - (target.startedAt ?? now());
      target.detailLine = null;
      target.status = result.status === 'done' ? 'done' : 'failed';
      target.summaryText = result.status === 'done' ? summarizeCaptured(result.captured) : null;
      target.failureText = result.status === 'error' ? result.error : null;
      paint();

      if (result.status === 'error' && args.failFast) {
        break;
      }
    }
  } finally {
    if (spinnerInterval) {
      clearInterval(spinnerInterval);
    }
  }

  if (state.startedAt !== null) {
    state.totalElapsedMs = now() - state.startedAt;
  }

  if (args.json) {
    writeJsonResult(args, results, io);
  } else if (repainter) {
    repainter.paint(renderTextIngestView(state, true));
    writePlainFailures(results, io);
  } else {
    io.stdout.write(renderTextIngestView(state, false));
    writePlainFailures(results, io);
  }

  if (!args.json && results.length > 0) {
    const duration = state.totalElapsedMs > 0 ? ` in ${formatDuration(state.totalElapsedMs)}` : '';
    const outcome = results.some((result) => result.status === 'error') ? 'finished with failures' : 'finished';
    io.stdout.write(`Text memory ingest ${outcome}${duration}.\n`);
  }

  return results.some((result) => result.status === 'error') ? 1 : 0;
}