* feat(setup): add Claude Desktop target and MCP-first agent setup Adds `ktx mcp stdio` and a `claude-desktop` setup target that generates a local plugin ZIP wiring the analytics skill and a stdio MCP config. Replaces the CLI-only agent install mode with MCP+analytics (default) and an optional admin CLI skill, renames the research skill to analytics, and lets interactive setup pick project vs global scope when every target supports it. Extracts a shared MCP server factory used by both HTTP and stdio entrypoints. * Add MCP agent client setup support * Polish setup output formatting * Add MCP tool polish design spec Design for slimming the MCP-registered surface from 25 to 11 tools, introducing memory_ingest, applying the per-tool polish kit (annotations, outputSchema, .describe(), in-band error wrapping, union-drift fixes, type-narrowed jsonToolResult), emitting progress notifications on sql_execution + sl_query, and refining the ktx-analytics SKILL.md to match. * Refine MCP tool polish design spec after adversarial review iteration 1 * Refine MCP tool polish design spec after adversarial review iteration 2 * Refine MCP tool polish design spec after adversarial review iteration 3 * refactor(context): rename memory capture service to ingest * feat(mcp): slim research tool surface * refactor(mcp): remove admin ports from server factory * refactor(cli): rename text ingest memory port * docs: update analytics skill for memory ingest * chore: verify mcp surface rename * Add MCP tool polish v1 surface change plan * feat(context): polish mcp tool metadata * fix(context): enforce resolved semantic layer compute sources * feat(context): emit mcp query progress stages * fix(context): keep mcp progress event internal * Add MCP tool polish v1 metadata & progress plan * Fix CI snapshot and docs checks
45 KiB
Research Agent MCP Discover Data Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (
- [ ]) syntax for tracking.
Goal: Add the MCP-shaped discover_data tool so external research agents get one ranked discovery view across wiki pages, semantic-layer sources/measures/dimensions, and raw warehouse schema.
Architecture: Create a focused local discovery service in packages/context/src/search/discover.ts that builds deterministic per-kind refs from existing wiki, semantic-layer, and latest scan artifacts, fuses the wiki/SL/raw sub-searches with the existing RRF core, and re-reads local artifacts on every call for MCP daemon freshness. Register the service through the MCP context port and local project MCP ports without changing the existing ingest-only discover_data adapter yet.
Tech Stack: TypeScript, Vitest, Zod, KTX local file store, KTX wiki/SL/scan services, KTX MCP context ports, existing HybridSearchCore/RRF search utilities.
Audit Summary
Original spec: docs/superpowers/specs/2026-05-14-research-agent-mcp-tools-design.md
Implemented v1 slices confirmed in current source:
- Existing in-process MCP semantic runtime exists in
packages/context/src/mcp/server.ts,packages/context/src/mcp/context-tools.ts, andpackages/context/src/mcp/local-project-ports.ts. - Ingest-only warehouse verification tools exist under
packages/context/src/ingest/tools/warehouse-verification/. - MCP
sql_executionis implemented and parser-gated:python/ktx-daemon/src/ktx_daemon/sql_analysis.pyhasvalidate_read_only_sql_response,python/ktx-daemon/src/ktx_daemon/app.pyexposesPOST /sql/validate-read-only,packages/context/src/sql-analysis/ports.tshasvalidateReadOnly(), andpackages/context/src/mcp/context-tools.tsregisterssql_execution. - MCP
entity_detailsis implemented:packages/context/src/scan/entity-details.ts,KtxEntityDetailsMcpPort, context-tool registration, and local project wiring all exist. - MCP
dictionary_searchis implemented:packages/context/src/sl/dictionary-search.ts,KtxDictionarySearchMcpPort, context-tool registration, and local project wiring all exist.
V1-blocking gaps still open:
discover_datais not implemented on the MCP surface. There is nopackages/context/src/search/discover.ts, noKtxDiscoverDataMcpPort, noports.discover, no MCP registration, and no local project wiring.ktx mcp start|stop|status|logsand the HTTP Streamable MCP daemon do not exist. There is nopackages/cli/src/commands/mcp-commands.ts, nopackages/cli/src/managed-mcp-daemon.ts, andpackages/cli/src/cli-program.tsdoes not register anmcpcommand subtree.ktx setup-agentsdoes not installktx-research, write Claude Code/Cursor MCP JSON entries, or print Codex/opencode snippets.plannedKtxAgentFiles()still installs only the existingktxskill/rule files.- Ingest-side warehouse verification tools still use
connectionName,targets, androwLimitcontracts. The original spec says these should converge onconnectionIdnaming, but that cleanup can be planned after the MCP research surface is complete because this plan adds a separate MCP adapter with the required shape.
Non-blocking or explicitly out-of-scope gaps:
- Python code execution via MCP.
- Stdio MCP transport.
- OS-level auto-start.
- Native TLS, audit logging, rate limiting, per-tool authorization, and multi-project daemon routing.
- Streaming SQL results.
This plan covers only the next dependency-ordered v1 blocker: MCP discover_data. Later v1 plans still need to cover the HTTP daemon and setup-agent/research-skill installation.
File Structure
Create:
packages/context/src/search/discover.ts- Defines MCP-shaped
discover_datainput, ref, and response types. - Searches wiki pages through
searchLocalKnowledgePages()andreadLocalKnowledgePage(). - Searches semantic-layer records through
loadLocalSlSourceRecords(). - Searches raw schema by reading the latest
raw-sources/<connectionId>/live-database/<syncId>scan artifacts directly. - Fuses wiki, SL, and raw-schema candidates with
HybridSearchCoreusing equal lane weights and normalizes final scores to0..1. - Re-reads artifacts on every call; no long-lived cache.
- Defines MCP-shaped
packages/context/src/search/discover.test.ts- Covers unified result shape, kind filtering, connection scoping, score normalization, snippet cap, raw table refs, and freshness after a newer scan appears.
Modify:
packages/context/src/search/index.ts- Export
createKtxDiscoverDataServiceand discover types.
- Export
packages/context/src/mcp/types.ts- Add
KtxDiscoverDataMcpPortanddiscover?: KtxDiscoverDataMcpPorttoKtxMcpContextPorts.
- Add
packages/context/src/mcp/context-tools.ts- Add the
discover_dataZod schema and tool registration.
- Add the
packages/context/src/mcp/server.test.ts- Assert
discover_dataregistration and structured array output.
- Assert
packages/context/src/mcp/local-project-ports.ts- Wire local project
discover.search()tocreateKtxDiscoverDataService().
- Wire local project
packages/context/src/mcp/local-project-ports.test.ts- Cover local-port
discover_dataacross wiki, SL, and raw schema.
- Cover local-port
packages/context/src/mcp/index.ts- Export the new MCP port type if it is not already covered by existing barrel exports.
Task 1: Add The Local Discover Data Service
Files:
-
Create:
packages/context/src/search/discover.test.ts -
Create:
packages/context/src/search/discover.ts -
Modify:
packages/context/src/search/index.ts -
Step 1: Write failing service tests
Create packages/context/src/search/discover.test.ts:
import { mkdtemp, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { initKtxProject, type KtxLocalProject } from '../project/index.js';
import { writeLocalKnowledgePage } from '../wiki/local-knowledge.js';
import { createKtxDiscoverDataService } from './discover.js';
describe('createKtxDiscoverDataService', () => {
let tempDir: string;
let project: KtxLocalProject;
beforeEach(async () => {
tempDir = await mkdtemp(join(tmpdir(), 'ktx-discover-data-'));
project = await initKtxProject({ projectDir: join(tempDir, 'project'), projectName: 'warehouse' });
project.config.connections.warehouse = { driver: 'postgres', url: 'env:DATABASE_URL' };
project.config.connections.billing = { driver: 'postgres', url: 'env:BILLING_DATABASE_URL' };
});
afterEach(async () => {
await rm(tempDir, { recursive: true, force: true });
});
async function seedWiki(): Promise<void> {
await writeLocalKnowledgePage(project, {
key: 'orders-playbook',
scope: 'GLOBAL',
summary: 'Paid order operations',
content: 'Use paid orders and order_count to inspect monthly customer activity for Acme Corp.',
tags: ['orders'],
});
}
async function seedSl(): Promise<void> {
await project.fileStore.writeFile(
'semantic-layer/warehouse/orders.yaml',
[
'name: orders',
'descriptions:',
' user: Paid order facts',
'table: public.orders',
'grain: [id]',
'columns:',
' - name: status',
' type: string',
' descriptions:',
' user: Payment status for the order',
' - name: ordered_at',
' type: time',
'measures:',
' - name: order_count',
' expr: count(*)',
' description: Number of paid orders',
'',
].join('\n'),
'ktx',
'ktx@example.com',
'seed sl source',
);
}
async function seedScan(input: {
connectionId?: string;
syncId: string;
tableName?: string;
comment?: string;
sampleValues?: string[];
}): Promise<void> {
const connectionId = input.connectionId ?? 'warehouse';
const root = `raw-sources/${connectionId}/live-database/${input.syncId}`;
const tableName = input.tableName ?? 'orders';
await project.fileStore.writeFile(
`${root}/connection.json`,
JSON.stringify(
{
connectionId,
driver: 'postgres',
extractedAt: `2026-05-14T09:00:00.000Z`,
scope: { schemas: ['public'] },
},
null,
2,
),
'ktx',
'ktx@example.com',
'seed scan connection',
);
await project.fileStore.writeFile(
`${root}/tables/public-${tableName}.json`,
JSON.stringify(
{
catalog: null,
db: 'public',
name: tableName,
kind: 'table',
comment: input.comment ?? 'Orders table from warehouse',
estimatedRows: 123,
descriptions: { db: input.comment ?? 'Orders table from warehouse' },
columns: [
{
name: 'id',
nativeType: 'integer',
normalizedType: 'integer',
dimensionType: 'number',
nullable: false,
primaryKey: true,
comment: 'Order id',
},
{
name: 'status',
nativeType: 'text',
normalizedType: 'text',
dimensionType: 'string',
nullable: false,
primaryKey: false,
comment: 'Order status',
sampleValues: input.sampleValues ?? ['paid', 'pending'],
},
],
foreignKeys: [],
},
null,
2,
),
'ktx',
'ktx@example.com',
'seed table',
);
await project.fileStore.writeFile(
`${root}/scan-report.json`,
JSON.stringify(
{
connectionId,
driver: 'postgres',
syncId: input.syncId,
runId: `scan-${input.syncId}`,
trigger: 'mcp',
mode: 'enriched',
dryRun: false,
artifactPaths: {
rawSourcesDir: root,
reportPath: `${root}/scan-report.json`,
manifestShards: [],
enrichmentArtifacts: [],
},
diffSummary: {
tablesAdded: 1,
tablesModified: 0,
tablesDeleted: 0,
tablesUnchanged: 0,
columnsAdded: 0,
columnsModified: 0,
columnsDeleted: 0,
},
manifestShardsWritten: 0,
structuralSyncStats: {
tablesCreated: 0,
tablesUpdated: 0,
tablesDeleted: 0,
columnsCreated: 0,
columnsUpdated: 0,
columnsDeleted: 0,
},
enrichment: {
dataDictionary: 'completed',
tableDescriptions: 'completed',
columnDescriptions: 'completed',
embeddings: 'skipped',
deterministicRelationships: 'skipped',
llmRelationshipValidation: 'skipped',
statisticalValidation: 'skipped',
},
capabilityGaps: [],
warnings: [],
relationships: { accepted: 0, review: 0, rejected: 0, skipped: 0 },
enrichmentState: { resumedStages: [], completedStages: [], failedStages: [] },
createdAt: '2026-05-14T09:00:00.000Z',
},
null,
2,
),
'ktx',
'ktx@example.com',
'seed scan report',
);
}
it('returns unified ranked refs across wiki, semantic-layer, and raw schema', async () => {
await seedWiki();
await seedSl();
await seedScan({ syncId: 'sync-1', sampleValues: ['paid', 'refunded'] });
const service = createKtxDiscoverDataService(project, { userId: 'local-user' });
const results = await service.search({ query: 'paid orders', connectionId: 'warehouse', limit: 10 });
expect(results.map((result) => result.kind)).toEqual(
expect.arrayContaining(['wiki', 'sl_source', 'sl_measure', 'sl_dimension', 'table', 'column']),
);
expect(results.every((result) => result.score >= 0 && result.score <= 1)).toBe(true);
expect(results.every((result) => result.snippet === null || result.snippet.length <= 200)).toBe(true);
expect(results).toContainEqual(
expect.objectContaining({
kind: 'table',
id: 'public.orders',
connectionId: 'warehouse',
tableRef: { catalog: null, db: 'public', name: 'orders' },
matchedOn: expect.stringMatching(/name|description|comment|display/),
}),
);
expect(results).toContainEqual(
expect.objectContaining({
kind: 'column',
id: 'public.orders.status',
connectionId: 'warehouse',
columnName: 'status',
matchedOn: expect.stringMatching(/name|comment|description|sample_value/),
}),
);
expect(results).toContainEqual(
expect.objectContaining({
kind: 'sl_measure',
id: 'orders.order_count',
connectionId: 'warehouse',
summary: 'Number of paid orders',
snippet: 'count(*)',
matchedOn: expect.stringMatching(/name|description|expr/),
}),
);
});
it('honors kind filters and connection scope', async () => {
await seedWiki();
await seedSl();
await seedScan({ syncId: 'sync-1', connectionId: 'warehouse', tableName: 'orders' });
await seedScan({ syncId: 'sync-2', connectionId: 'billing', tableName: 'invoices', comment: 'Billing invoices' });
const service = createKtxDiscoverDataService(project);
const results = await service.search({
query: 'orders',
connectionId: 'warehouse',
kinds: ['table', 'column'],
limit: 10,
});
expect(results.every((result) => result.kind === 'table' || result.kind === 'column')).toBe(true);
expect(results.every((result) => result.connectionId === 'warehouse')).toBe(true);
expect(results.some((result) => result.id.includes('invoices'))).toBe(false);
expect(results.some((result) => result.kind === 'wiki')).toBe(false);
});
it('re-reads the latest scan artifacts on each call', async () => {
await seedScan({ syncId: 'sync-1', tableName: 'orders', comment: 'Old orders table' });
const service = createKtxDiscoverDataService(project);
await expect(service.search({ query: 'orders', connectionId: 'warehouse', kinds: ['table'], limit: 10 })).resolves.toEqual(
expect.arrayContaining([expect.objectContaining({ id: 'public.orders' })]),
);
await seedScan({ syncId: 'sync-2', tableName: 'invoices', comment: 'Invoice facts' });
const fresh = await service.search({ query: 'invoice', connectionId: 'warehouse', kinds: ['table'], limit: 10 });
expect(fresh).toEqual(expect.arrayContaining([expect.objectContaining({ id: 'public.invoices' })]));
expect(fresh.some((result) => result.id === 'public.orders')).toBe(false);
});
});
- Step 2: Run the failing service tests
Run:
pnpm --filter @ktx/context exec vitest run src/search/discover.test.ts
Expected: FAIL with Cannot find module './discover.js'.
- Step 3: Implement the discover service
Create packages/context/src/search/discover.ts:
import type { KtxEmbeddingPort } from '../core/index.js';
import type { KtxLocalProject } from '../project/index.js';
import type { KtxScanReport, KtxSchemaColumn, KtxSchemaTable, KtxTableRef } from '../scan/index.js';
import { DEFAULT_PRIORITY, loadLocalSlSourceRecords, resolveDescription } from '../sl/index.js';
import type { SemanticLayerSource } from '../sl/index.js';
import { readLocalKnowledgePage, searchLocalKnowledgePages } from '../wiki/local-knowledge.js';
import { HybridSearchCore, type FusedSearchCandidate, type SearchCandidateGenerator } from './index.js';
export type KtxDiscoverDataKind = 'wiki' | 'sl_source' | 'sl_measure' | 'sl_dimension' | 'table' | 'column';
export type KtxDiscoverDataMatchedOn =
| 'name'
| 'display'
| 'description'
| 'comment'
| 'expr'
| 'sample_value'
| 'body';
export interface KtxDiscoverDataInput {
query: string;
connectionId?: string;
kinds?: KtxDiscoverDataKind[];
limit?: number;
}
export interface KtxDiscoverDataRef {
kind: KtxDiscoverDataKind;
id: string;
score: number;
summary: string | null;
snippet: string | null;
matchedOn: KtxDiscoverDataMatchedOn;
connectionId?: string;
tableRef?: KtxTableRef;
columnName?: string;
}
export type KtxDiscoverDataResponse = KtxDiscoverDataRef[];
export interface KtxDiscoverDataServiceOptions {
userId?: string;
embeddingService?: KtxEmbeddingPort | null;
}
interface CandidateRecord {
ref: Omit<KtxDiscoverDataRef, 'score'>;
rankScore: number;
}
type RawTable = KtxSchemaTable & {
descriptions?: Record<string, string>;
columns: Array<KtxSchemaColumn & { descriptions?: Record<string, string>; sampleValues?: unknown[] }>;
};
interface LatestScan {
report: KtxScanReport;
rawSourcesDir: string;
tables: RawTable[];
}
const ALL_KINDS: KtxDiscoverDataKind[] = ['wiki', 'sl_source', 'sl_measure', 'sl_dimension', 'table', 'column'];
function normalize(value: string | null | undefined): string {
return (value ?? '').toLowerCase();
}
function queryTerms(query: string): string[] {
return query
.toLowerCase()
.split(/[^a-z0-9_]+/u)
.map((term) => term.trim())
.filter(Boolean);
}
function hasKind(kinds: ReadonlySet<KtxDiscoverDataKind>, kind: KtxDiscoverDataKind): boolean {
return kinds.has(kind);
}
function cap200(value: string | null | undefined): string | null {
if (!value) {
return null;
}
const compact = value.replace(/\s+/g, ' ').trim();
return compact.length > 200 ? compact.slice(0, 200) : compact;
}
function snippetAround(text: string | null | undefined, terms: readonly string[]): string | null {
if (!text) {
return null;
}
const lower = text.toLowerCase();
const index = terms.map((term) => lower.indexOf(term)).filter((position) => position >= 0).sort((a, b) => a - b)[0] ?? 0;
return cap200(text.slice(Math.max(0, index - 60), index + 140));
}
function textScore(value: string | null | undefined, terms: readonly string[]): number {
const haystack = normalize(value);
if (!haystack || terms.length === 0) {
return 0;
}
const matched = terms.filter((term) => haystack.includes(term)).length;
return matched / terms.length;
}
function bestField(
fields: Array<{ matchedOn: KtxDiscoverDataMatchedOn; text: string | null | undefined; weight: number }>,
terms: readonly string[],
): { matchedOn: KtxDiscoverDataMatchedOn; score: number; text: string | null } | null {
const scored = fields
.map((field) => ({
matchedOn: field.matchedOn,
score: textScore(field.text, terms) * field.weight,
text: field.text ?? null,
}))
.filter((field) => field.score > 0)
.sort((left, right) => right.score - left.score || left.matchedOn.localeCompare(right.matchedOn));
return scored[0] ?? null;
}
function displayForTable(table: KtxTableRef): string {
return [table.catalog, table.db, table.name].filter((part): part is string => Boolean(part)).join('.');
}
function tableRef(table: KtxSchemaTable): KtxTableRef {
return { catalog: table.catalog, db: table.db, name: table.name };
}
async function readJson<T>(project: KtxLocalProject, path: string): Promise<T> {
return JSON.parse((await project.fileStore.readFile(path)).content) as T;
}
async function latestScan(project: KtxLocalProject, connectionId: string): Promise<LatestScan | null> {
const root = `raw-sources/${connectionId}/live-database`;
let files: string[];
try {
files = (await project.fileStore.listFiles(root)).files;
} catch {
return null;
}
const reportPath = files.filter((path) => path.endsWith('/scan-report.json')).sort().at(-1);
if (!reportPath) {
return null;
}
const report = await readJson<KtxScanReport>(project, reportPath);
const rawSourcesDir = report.artifactPaths.rawSourcesDir ?? reportPath.slice(0, -'/scan-report.json'.length);
const listedTables = await project.fileStore.listFiles(`${rawSourcesDir}/tables`);
const tables: RawTable[] = [];
for (const path of listedTables.files.filter((file) => file.endsWith('.json')).sort()) {
tables.push(await readJson<RawTable>(project, path));
}
return { report, rawSourcesDir, tables };
}
function configuredConnectionIds(project: KtxLocalProject, connectionId?: string): string[] {
return connectionId ? [connectionId] : Object.keys(project.config.connections).sort();
}
async function wikiCandidates(
project: KtxLocalProject,
input: KtxDiscoverDataInput,
options: KtxDiscoverDataServiceOptions,
terms: readonly string[],
): Promise<CandidateRecord[]> {
const searchResults = await searchLocalKnowledgePages(project, {
query: input.query,
userId: options.userId,
embeddingService: options.embeddingService ?? null,
limit: Math.max(input.limit ?? 15, 25),
});
const records: CandidateRecord[] = [];
for (const result of searchResults) {
const page = await readLocalKnowledgePage(project, { key: result.key, userId: options.userId });
const content = page?.content ?? '';
const matched = bestField(
[
{ matchedOn: 'name', text: result.key, weight: 1.1 },
{ matchedOn: 'description', text: result.summary, weight: 1 },
{ matchedOn: 'body', text: content, weight: 0.8 },
],
terms,
);
records.push({
rankScore: result.score + (matched?.score ?? 0),
ref: {
kind: 'wiki',
id: result.key,
summary: result.summary || null,
snippet: snippetAround(content, terms),
matchedOn: matched?.matchedOn ?? 'body',
},
});
}
return records.sort((left, right) => right.rankScore - left.rankScore || left.ref.id.localeCompare(right.ref.id));
}
async function slCandidates(
project: KtxLocalProject,
input: KtxDiscoverDataInput,
kinds: ReadonlySet<KtxDiscoverDataKind>,
terms: readonly string[],
): Promise<CandidateRecord[]> {
const records: CandidateRecord[] = [];
for (const connectionId of configuredConnectionIds(project, input.connectionId)) {
const sources = await loadLocalSlSourceRecords(project, { connectionId }).catch(() => []);
for (const sourceRecord of sources) {
const source = sourceRecord.source;
if (hasKind(kinds, 'sl_source')) {
const description = resolveDescription(source.descriptions, { priority: DEFAULT_PRIORITY });
const matched = bestField(
[
{ matchedOn: 'name', text: source.name, weight: 1.2 },
{ matchedOn: 'description', text: description, weight: 1 },
{ matchedOn: 'display', text: source.table ?? source.sql ?? null, weight: 0.8 },
],
terms,
);
if (matched) {
records.push({
rankScore: matched.score,
ref: {
kind: 'sl_source',
id: source.name,
connectionId,
summary: description,
snippet:
matched.matchedOn === 'description'
? snippetAround(description, terms)
: cap200(`${source.name}: ${[...source.measures.map((measure) => measure.name), ...source.columns.map((column) => column.name)].slice(0, 3).join(', ')}`),
matchedOn: matched.matchedOn,
},
});
}
}
if (hasKind(kinds, 'sl_measure')) {
for (const measure of source.measures) {
const matched = bestField(
[
{ matchedOn: 'name', text: measure.name, weight: 1.2 },
{ matchedOn: 'description', text: measure.description, weight: 1 },
{ matchedOn: 'expr', text: measure.expr, weight: 0.9 },
],
terms,
);
if (matched) {
records.push({
rankScore: matched.score,
ref: {
kind: 'sl_measure',
id: `${source.name}.${measure.name}`,
connectionId,
summary: measure.description ?? null,
snippet: cap200(measure.expr),
matchedOn: matched.matchedOn,
},
});
}
}
}
if (hasKind(kinds, 'sl_dimension')) {
for (const column of source.columns) {
const description = resolveDescription(column.descriptions, { priority: DEFAULT_PRIORITY });
const matched = bestField(
[
{ matchedOn: 'name', text: column.name, weight: 1.2 },
{ matchedOn: 'description', text: description, weight: 1 },
{ matchedOn: 'expr', text: column.expr, weight: 0.9 },
],
terms,
);
if (matched) {
records.push({
rankScore: matched.score,
ref: {
kind: 'sl_dimension',
id: `${source.name}.${column.name}`,
connectionId,
summary: description,
snippet: cap200(`${column.name} (${column.type})`),
matchedOn: matched.matchedOn,
},
});
}
}
}
}
}
return records.sort((left, right) => right.rankScore - left.rankScore || left.ref.id.localeCompare(right.ref.id));
}
async function rawCandidates(
project: KtxLocalProject,
input: KtxDiscoverDataInput,
kinds: ReadonlySet<KtxDiscoverDataKind>,
terms: readonly string[],
): Promise<CandidateRecord[]> {
const records: CandidateRecord[] = [];
for (const connectionId of configuredConnectionIds(project, input.connectionId)) {
const scan = await latestScan(project, connectionId);
if (!scan) {
continue;
}
for (const table of scan.tables) {
const ref = tableRef(table);
const display = displayForTable(ref);
const tableDescription = resolveDescription(table.descriptions, { priority: DEFAULT_PRIORITY }) ?? table.comment;
if (hasKind(kinds, 'table')) {
const matched = bestField(
[
{ matchedOn: 'name', text: table.name, weight: 1.2 },
{ matchedOn: 'display', text: display, weight: 1.1 },
{ matchedOn: 'description', text: tableDescription, weight: 1 },
{ matchedOn: 'comment', text: table.comment, weight: 1 },
],
terms,
);
if (matched) {
records.push({
rankScore: matched.score,
ref: {
kind: 'table',
id: display,
connectionId,
tableRef: ref,
summary: tableDescription,
snippet:
matched.matchedOn === 'description' || matched.matchedOn === 'comment'
? snippetAround(matched.text, terms)
: cap200(table.columns.slice(0, 5).map((column) => column.name).join(', ')),
matchedOn: matched.matchedOn,
},
});
}
}
if (hasKind(kinds, 'column')) {
for (const column of table.columns) {
const columnDescription = resolveDescription(column.descriptions, { priority: DEFAULT_PRIORITY }) ?? column.comment;
const samples = (column.sampleValues ?? []).map((value) => String(value)).slice(0, 5);
const matched = bestField(
[
{ matchedOn: 'name', text: column.name, weight: 1.2 },
{ matchedOn: 'display', text: `${display}.${column.name}`, weight: 1.1 },
{ matchedOn: 'description', text: columnDescription, weight: 1 },
{ matchedOn: 'comment', text: column.comment, weight: 1 },
{ matchedOn: 'sample_value', text: samples.join(' '), weight: 0.9 },
],
terms,
);
if (matched) {
records.push({
rankScore: matched.score,
ref: {
kind: 'column',
id: `${display}.${column.name}`,
connectionId,
tableRef: ref,
columnName: column.name,
summary: columnDescription,
snippet:
matched.matchedOn === 'sample_value'
? cap200(`${column.nativeType} - samples: ${samples.join(', ')}`)
: matched.matchedOn === 'description' || matched.matchedOn === 'comment'
? snippetAround(matched.text, terms)
: cap200(column.nativeType),
matchedOn: matched.matchedOn,
},
});
}
}
}
}
}
return records.sort((left, right) => right.rankScore - left.rankScore || left.ref.id.localeCompare(right.ref.id));
}
function generator(name: string, candidates: CandidateRecord[], refsByKey: Map<string, Omit<KtxDiscoverDataRef, 'score'>>): SearchCandidateGenerator {
candidates.forEach((candidate) => refsByKey.set(`${candidate.ref.kind}:${candidate.ref.connectionId ?? ''}:${candidate.ref.id}`, candidate.ref));
return {
lane: name,
weight: 1,
async generate() {
return {
candidates: candidates.map((candidate, index) => ({
id: `${candidate.ref.kind}:${candidate.ref.connectionId ?? ''}:${candidate.ref.id}`,
rank: index + 1,
rawScore: candidate.rankScore,
})),
};
},
};
}
function hydrate(fused: FusedSearchCandidate[], refsByKey: Map<string, Omit<KtxDiscoverDataRef, 'score'>>): KtxDiscoverDataRef[] {
const maxScore = Math.max(...fused.map((candidate) => candidate.score), 0);
return fused
.map((candidate) => {
const ref = refsByKey.get(candidate.id);
if (!ref) {
return null;
}
return {
...ref,
score: maxScore > 0 ? Number((candidate.score / maxScore).toFixed(6)) : 0,
};
})
.filter((result): result is KtxDiscoverDataRef => result !== null);
}
export function createKtxDiscoverDataService(
project: KtxLocalProject,
options: KtxDiscoverDataServiceOptions = {},
): { search(input: KtxDiscoverDataInput): Promise<KtxDiscoverDataResponse> } {
return {
async search(input) {
const limit = Math.max(1, Math.min(input.limit ?? 15, 50));
const query = input.query.trim();
if (!query) {
return [];
}
const kinds = new Set(input.kinds ?? ALL_KINDS);
const terms = queryTerms(query);
const refsByKey = new Map<string, Omit<KtxDiscoverDataRef, 'score'>>();
const generators: SearchCandidateGenerator[] = [];
if (hasKind(kinds, 'wiki')) {
generators.push(generator('wiki', await wikiCandidates(project, { ...input, limit }, options, terms), refsByKey));
}
if (hasKind(kinds, 'sl_source') || hasKind(kinds, 'sl_measure') || hasKind(kinds, 'sl_dimension')) {
generators.push(generator('semantic_layer', await slCandidates(project, { ...input, limit }, kinds, terms), refsByKey));
}
if (hasKind(kinds, 'table') || hasKind(kinds, 'column')) {
generators.push(generator('raw_schema', await rawCandidates(project, { ...input, limit }, kinds, terms), refsByKey));
}
if (generators.length === 0) {
return [];
}
const result = await new HybridSearchCore().search({
queryText: query,
limit,
generators,
laneWeights: { wiki: 1, semantic_layer: 1, raw_schema: 1 },
});
return hydrate(result.results, refsByKey);
},
};
}
- Step 4: Export the service
In packages/context/src/search/index.ts, add:
export { createKtxDiscoverDataService } from './discover.js';
export type {
KtxDiscoverDataInput,
KtxDiscoverDataKind,
KtxDiscoverDataMatchedOn,
KtxDiscoverDataRef,
KtxDiscoverDataResponse,
KtxDiscoverDataServiceOptions,
} from './discover.js';
- Step 5: Run service tests
Run:
pnpm --filter @ktx/context exec vitest run src/search/discover.test.ts
Expected: PASS.
- Step 6: Commit the service
Run:
git add packages/context/src/search/discover.ts packages/context/src/search/discover.test.ts packages/context/src/search/index.ts
git commit -m "feat: add MCP discover data service"
Expected: commit succeeds.
Task 2: Register discover_data In The MCP Tool Surface
Files:
-
Modify:
packages/context/src/mcp/types.ts -
Modify:
packages/context/src/mcp/context-tools.ts -
Modify:
packages/context/src/mcp/server.test.ts -
Modify:
packages/context/src/mcp/index.ts -
Step 1: Write failing MCP registration test
In packages/context/src/mcp/server.test.ts, extend the import from ./types.js to include:
KtxDiscoverDataMcpPort,
Add this test after the dictionary_search registration test:
it('registers discover_data when the host provides a discover port', async () => {
const fake = makeFakeServer();
const discover: KtxDiscoverDataMcpPort = {
search: vi.fn<KtxDiscoverDataMcpPort['search']>().mockResolvedValue([
{
kind: 'table',
id: 'public.orders',
score: 1,
summary: 'Orders table',
snippet: 'id, status',
matchedOn: 'name',
connectionId: 'warehouse',
tableRef: { catalog: null, db: 'public', name: 'orders' },
},
]),
};
createKtxMcpServer({
server: fake.server,
userContext: { userId: 'local-user' },
contextTools: { discover },
});
expect(fake.tools.map((tool) => tool.name)).toEqual(['discover_data']);
await expect(
getTool(fake.tools, 'discover_data').handler({
query: 'orders',
connectionId: 'warehouse',
kinds: ['table'],
limit: 5,
}),
).resolves.toMatchObject({
structuredContent: {
refs: [
{
kind: 'table',
id: 'public.orders',
connectionId: 'warehouse',
tableRef: { catalog: null, db: 'public', name: 'orders' },
},
],
},
});
expect(discover.search).toHaveBeenCalledWith({
query: 'orders',
connectionId: 'warehouse',
kinds: ['table'],
limit: 5,
});
});
- Step 2: Run the failing MCP registration test
Run:
pnpm --filter @ktx/context exec vitest run src/mcp/server.test.ts -t "discover_data"
Expected: FAIL with an import or type error for KtxDiscoverDataMcpPort.
- Step 3: Add MCP discover port types
In packages/context/src/mcp/types.ts, add this import near the other search/scan imports:
import type { KtxDiscoverDataInput, KtxDiscoverDataResponse } from '../search/index.js';
Add this interface after KtxDictionarySearchMcpPort:
export interface KtxDiscoverDataMcpPort {
search(input: KtxDiscoverDataInput): Promise<KtxDiscoverDataResponse>;
}
Add this optional port to KtxMcpContextPorts:
discover?: KtxDiscoverDataMcpPort;
- Step 4: Add the Zod schema and registration
In packages/context/src/mcp/context-tools.ts, add this schema after dictionarySearchSchema:
const discoverDataKindSchema = z.enum(['wiki', 'sl_source', 'sl_measure', 'sl_dimension', 'table', 'column']);
const discoverDataSchema = z.object({
query: z.string().min(1),
connectionId: connectionIdSchema.optional(),
kinds: z.array(discoverDataKindSchema).optional(),
limit: z.number().int().min(1).max(50).default(15).optional(),
});
Add this registration block after the dictionary_search registration block and before sql_execution:
if (ports.discover) {
const discover = ports.discover;
registerParsedTool(
server,
'discover_data',
{
title: 'Discover Data',
description:
'Search across KTX wiki pages, semantic-layer sources/measures/dimensions, and raw warehouse schema refs.',
inputSchema: discoverDataSchema.shape,
},
discoverDataSchema,
async (input) => jsonToolResult(await discover.search(input)),
);
}
- Step 5: Export MCP port types
Check packages/context/src/mcp/index.ts. If it already exports all types from ./types.js, leave it unchanged. If it lists individual type exports, add:
export type { KtxDiscoverDataMcpPort } from './types.js';
- Step 6: Run MCP registration tests
Run:
pnpm --filter @ktx/context exec vitest run src/mcp/server.test.ts -t "discover_data"
Expected: PASS.
- Step 7: Commit MCP registration
Run:
git add packages/context/src/mcp/types.ts packages/context/src/mcp/context-tools.ts packages/context/src/mcp/server.test.ts packages/context/src/mcp/index.ts
git commit -m "feat: expose discover data MCP tool"
Expected: commit succeeds.
Task 3: Wire Local Project MCP Ports
Files:
-
Modify:
packages/context/src/mcp/local-project-ports.ts -
Modify:
packages/context/src/mcp/local-project-ports.test.ts -
Step 1: Write failing local-port test
In packages/context/src/mcp/local-project-ports.test.ts, add this test inside the existing describe('createLocalProjectMcpContextPorts', ...) block:
it('exposes local project discover_data across wiki, semantic-layer, and raw schema', async () => {
await project.fileStore.writeFile(
'wiki/global/orders-playbook.md',
[
'---',
'summary: Paid order operations',
'tags: [orders]',
'refs: []',
'sl_refs: []',
'usage_mode: auto',
'---',
'',
'Paid orders are used for customer activity analysis.',
'',
].join('\n'),
'ktx',
'ktx@example.com',
'seed wiki',
);
await project.fileStore.writeFile(
'semantic-layer/warehouse/orders.yaml',
[
'name: orders',
'descriptions:',
' user: Paid order facts',
'table: public.orders',
'grain: [id]',
'columns:',
' - name: status',
' type: string',
' descriptions:',
' user: Payment status',
'measures:',
' - name: order_count',
' expr: count(*)',
' description: Number of paid orders',
'',
].join('\n'),
'ktx',
'ktx@example.com',
'seed sl',
);
await project.fileStore.writeFile(
'raw-sources/warehouse/live-database/sync-1/connection.json',
JSON.stringify({ connectionId: 'warehouse', driver: 'postgres', extractedAt: '2026-05-14T09:00:00.000Z' }, null, 2),
'ktx',
'ktx@example.com',
'seed connection',
);
await project.fileStore.writeFile(
'raw-sources/warehouse/live-database/sync-1/tables/public-orders.json',
JSON.stringify(
{
catalog: null,
db: 'public',
name: 'orders',
kind: 'table',
comment: 'Orders table',
estimatedRows: 10,
columns: [
{
name: 'status',
nativeType: 'text',
normalizedType: 'text',
dimensionType: 'string',
nullable: false,
primaryKey: false,
comment: 'Order status',
sampleValues: ['paid'],
},
],
foreignKeys: [],
},
null,
2,
),
'ktx',
'ktx@example.com',
'seed table',
);
await project.fileStore.writeFile(
'raw-sources/warehouse/live-database/sync-1/scan-report.json',
JSON.stringify(
{
connectionId: 'warehouse',
driver: 'postgres',
syncId: 'sync-1',
runId: 'scan-1',
trigger: 'mcp',
mode: 'enriched',
dryRun: false,
artifactPaths: {
rawSourcesDir: 'raw-sources/warehouse/live-database/sync-1',
reportPath: 'raw-sources/warehouse/live-database/sync-1/scan-report.json',
manifestShards: [],
enrichmentArtifacts: [],
},
diffSummary: {
tablesAdded: 1,
tablesModified: 0,
tablesDeleted: 0,
tablesUnchanged: 0,
columnsAdded: 0,
columnsModified: 0,
columnsDeleted: 0,
},
manifestShardsWritten: 0,
structuralSyncStats: {
tablesCreated: 0,
tablesUpdated: 0,
tablesDeleted: 0,
columnsCreated: 0,
columnsUpdated: 0,
columnsDeleted: 0,
},
enrichment: {
dataDictionary: 'completed',
tableDescriptions: 'completed',
columnDescriptions: 'completed',
embeddings: 'skipped',
deterministicRelationships: 'skipped',
llmRelationshipValidation: 'skipped',
statisticalValidation: 'skipped',
},
capabilityGaps: [],
warnings: [],
relationships: { accepted: 0, review: 0, rejected: 0, skipped: 0 },
enrichmentState: { resumedStages: [], completedStages: [], failedStages: [] },
createdAt: '2026-05-14T09:00:00.000Z',
},
null,
2,
),
'ktx',
'ktx@example.com',
'seed scan report',
);
const ports = createLocalProjectMcpContextPorts(project);
const results = await ports.discover?.search({ query: 'paid orders', connectionId: 'warehouse', limit: 10 });
expect(results).toEqual(
expect.arrayContaining([
expect.objectContaining({ kind: 'wiki', id: 'orders-playbook' }),
expect.objectContaining({ kind: 'sl_source', id: 'orders', connectionId: 'warehouse' }),
expect.objectContaining({ kind: 'table', id: 'public.orders', connectionId: 'warehouse' }),
]),
);
});
- Step 2: Run the failing local-port test
Run:
pnpm --filter @ktx/context exec vitest run src/mcp/local-project-ports.test.ts -t "discover_data"
Expected: FAIL because ports.discover is undefined.
- Step 3: Wire the local port
In packages/context/src/mcp/local-project-ports.ts, add createKtxDiscoverDataService to the search import block:
import { createKtxDiscoverDataService } from '../search/index.js';
Add this port in the ports object after dictionarySearch:
discover: {
async search(input) {
return createKtxDiscoverDataService(project, { userId: 'local', embeddingService }).search(input);
},
},
- Step 4: Run local-port test
Run:
pnpm --filter @ktx/context exec vitest run src/mcp/local-project-ports.test.ts -t "discover_data"
Expected: PASS.
- Step 5: Commit local-port wiring
Run:
git add packages/context/src/mcp/local-project-ports.ts packages/context/src/mcp/local-project-ports.test.ts
git commit -m "feat: wire local discover data MCP port"
Expected: commit succeeds.
Task 4: Verify The Discover Slice
Files:
-
Verify:
packages/context/src/search/discover.ts -
Verify:
packages/context/src/mcp/context-tools.ts -
Verify:
packages/context/src/mcp/local-project-ports.ts -
Step 1: Run focused tests
Run:
pnpm --filter @ktx/context exec vitest run src/search/discover.test.ts src/mcp/server.test.ts src/mcp/local-project-ports.test.ts
Expected: PASS.
- Step 2: Run context type-check
Run:
pnpm --filter @ktx/context run type-check
Expected: PASS.
- Step 3: Run context test suite
Run:
pnpm --filter @ktx/context run test
Expected: PASS.
- Step 4: Check diff hygiene
Run:
git diff --check
Expected: no output and exit code 0.
- Step 5: Document remaining v1 blockers in handoff
Run:
test -e packages/context/src/search/discover.ts; printf 'discover:%s\n' "$?"
test -e packages/cli/src/commands/mcp-commands.ts; printf 'mcp-commands:%s\n' "$?"
test -e packages/cli/src/managed-mcp-daemon.ts; printf 'managed-mcp:%s\n' "$?"
test -e packages/cli/src/skills/research/SKILL.md; printf 'research-skill:%s\n' "$?"
Expected after this plan is implemented:
discover:0
mcp-commands:1
managed-mcp:1
research-skill:1
- Step 6: Commit verification notes if code changed during verification
If verification required code fixes, run:
git status --short
git add packages/context/src/search/discover.ts packages/context/src/search/discover.test.ts packages/context/src/search/index.ts packages/context/src/mcp/types.ts packages/context/src/mcp/context-tools.ts packages/context/src/mcp/server.test.ts packages/context/src/mcp/local-project-ports.ts packages/context/src/mcp/local-project-ports.test.ts packages/context/src/mcp/index.ts
git commit -m "test: verify MCP discover data"
Expected: commit succeeds only when there are verification fixes to commit. If git status --short is empty, skip this commit.
Self-Review
- Spec coverage: this plan covers the MCP-shaped
discover_datainput/output contract, kind filtering, optionalconnectionId, RRF fusion across wiki/SL/raw lanes, deterministic summary/snippet provenance, rawtableRefandcolumnName, score normalization, local project MCP registration, and freshness by re-reading artifacts on every call. - Remaining v1-blocking spec coverage after this slice: HTTP Streamable MCP daemon,
ktx mcpCLI lifecycle commands, setup-agent MCP config writers/snippet printers,ktx-researchskill installation, and ingest-sideconnectionNamecontract convergence. - Placeholder scan: no placeholder or deferred-work wording remains in this plan.
- Type consistency:
KtxDiscoverDataInput,KtxDiscoverDataRef,KtxDiscoverDataResponse, andKtxDiscoverDataMcpPortare defined before use and match the MCP/local-port registration snippets.