mirror of https://github.com/Kaelio/ktx.git
synced 2026-06-07 07:55:13 +02:00
Refine research-agent MCP tools spec after adversarial review iteration 3
parent 1a79c0efc3
commit bef9d14b90
1 changed file with 105 additions and 26 deletions
@@ -56,9 +56,12 @@ to host the MCP server.
 - Provide a `ktx mcp` CLI subtree that runs the MCP server over HTTP on
   localhost, with the same lifecycle pattern as the existing managed Python
   daemon (`packages/cli/src/managed-python-daemon.ts`).
-- Make `ktx setup-agents` write client-side MCP configuration entries for all
-  configured targets (claude-code, codex, cursor, opencode), pointing at the
-  local HTTP endpoint.
+- Make `ktx setup-agents` install MCP client configuration for the configured
+  targets pointing at the local HTTP endpoint. v1 splits this by client: for
+  claude-code and cursor (JSON config), `setup-agents` writes the entry
+  directly; for codex (TOML) and opencode (different JSON wrapper),
+  `setup-agents` prints a copy-pasteable snippet rather than writing the file.
+  See the client matrix below for full per-target behavior.
 - Reuse existing infrastructure (connector `executeReadOnly`, schema
   snapshots, dictionary profile, hybrid search + RRF) rather than building
   parallel implementations.
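The hunk above splits MCP client setup by target: claude-code and cursor entries are written directly, while codex and opencode get a printed snippet. The sketch below shows that split as a lookup table. It is a minimal illustration only. `McpTarget`, `INSTALL_MODE` and `planInstall` are hypothetical names and are not taken from the ktx codebase.

```typescript
// Hypothetical sketch: which MCP client targets v1 writes vs. prints.
type McpTarget = 'claude-code' | 'cursor' | 'codex' | 'opencode';
type InstallMode = 'write' | 'print';

const INSTALL_MODE: Record<McpTarget, InstallMode> = {
  'claude-code': 'write', // JSON config, entry written directly
  cursor: 'write', // JSON config, entry written directly
  codex: 'print', // TOML config: print a snippet instead
  opencode: 'print', // different JSON wrapper: print a snippet instead
};

// Partition the configured targets so the caller can write config files
// for some targets and print copy-pasteable snippets for the rest.
function planInstall(targets: McpTarget[]): Record<InstallMode, McpTarget[]> {
  const plan: Record<InstallMode, McpTarget[]> = { write: [], print: [] };
  for (const target of targets) {
    plan[INSTALL_MODE[target]].push(target);
  }
  return plan;
}

const plan = planInstall(['claude-code', 'codex', 'cursor', 'opencode']);
console.log(plan.write); // [ 'claude-code', 'cursor' ]
console.log(plan.print); // [ 'codex', 'opencode' ]
```

A `Record` keyed by the target union makes the compiler reject a target that has no install mode, so adding a fifth client forces a decision here.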
@@ -141,14 +144,26 @@ or `entity_details` tools.
 ```typescript
 {
   kind: 'wiki' | 'sl_source' | 'sl_measure' | 'sl_dimension' | 'table' | 'column',
-  id: string, // wiki key, source name, or qualified table/column id
+  id: string, // stable id: wiki key, source name, or driver-qualified table/column display string
   score: number, // RRF fused score, 0-1 range
   summary: string, // one-line description
   snippet: string, // short context snippet, ≤200 chars
   connectionId?: string, // present for non-wiki kinds
+  tableRef?: { // present for kind 'table' and 'column'
+    catalog: string | null,
+    db: string | null,
+    name: string,
+  },
+  columnName?: string, // present for kind 'column'
 }
 ```
 
+The structured `tableRef` mirrors the live `KtxSchemaTable` identity
+(`packages/context/src/scan/types.ts:74-83`) so callers can pass refs into
+`entity_details` without losing `catalog`/`db` qualification on drivers
+that need it (BigQuery `project.dataset.table`, Snowflake/SQL Server
+`database.schema.table`).
+
 **Implementation:** new module `packages/context/src/search/discover.ts`.
 Composes three sub-searches in parallel:
 
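The `score` field in this hunk is documented as an RRF fused score in the 0-1 range. The section does not say how the fusion or the normalization is done. The sketch below shows standard reciprocal rank fusion and rests on two assumptions. First, it uses the commonly cited constant k = 60. Second, it normalizes by the maximum attainable score, which is the score of an id ranked first in every list. Neither choice is taken from `discover.ts`, and `rrfFuse` is a hypothetical name.

```typescript
// Hypothetical sketch of reciprocal rank fusion (RRF) over ranked id lists.
// Assumptions: k = 60, and normalization by the best possible score so that
// every fused score falls in the 0-1 range.
function rrfFuse(rankedLists: string[][], k = 60): Map<string, number> {
  const raw = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      raw.set(id, (raw.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // An id ranked first in every list scores rankedLists.length / (k + 1).
  const maxScore = rankedLists.length / (k + 1);
  const fused = new Map<string, number>();
  for (const [id, score] of raw) {
    fused.set(id, maxScore > 0 ? score / maxScore : 0);
  }
  return fused;
}

// Three hypothetical ranked result lists, one per sub-search.
const fused = rrfFuse([
  ['orders', 'customers'],
  ['orders', 'revenue'],
  ['customers', 'orders'],
]);
const ranked = [...fused.entries()].sort((a, b) => b[1] - a[1]);
console.log(ranked.map(([id]) => id)); // [ 'orders', 'customers', 'revenue' ]
```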
@@ -174,7 +189,18 @@ columns) from the latest scan snapshot. The raw-data equivalent of
 {
   connectionId: z.string().min(1),
   entities: z.array(z.object({
-    table: z.string().min(1), // qualified or unqualified
+    // table accepts either a driver-display string ("project.dataset.table",
+    // "schema.name", "db.schema.name") or a structured ref. The resolver
+    // returns a structured error when the input is ambiguous across multiple
+    // schemas/catalogs.
+    table: z.union([
+      z.string().min(1),
+      z.object({
+        catalog: z.string().nullable(),
+        db: z.string().nullable(),
+        name: z.string().min(1),
+      }),
+    ]),
     columns: z.array(z.string()).optional(), // omit → all columns
   })).min(1).max(20),
 }
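The `table` union in this hunk implies a resolution step. A structured ref is matched exactly. A display string is split on dots and matched against snapshot tables, and a structured error is returned when more than one table fits. The sketch below rests on several assumptions. The snapshot is a plain in-memory array, dot splitting is naive (quoted identifiers that contain dots are not handled), and `resolveTable`, `matchesParts` and the result shape are hypothetical.

```typescript
// Hypothetical sketch: resolve an entity_details `table` input against
// snapshot tables. Names and the error shape are illustrative only.
interface TableRef {
  catalog: string | null;
  db: string | null;
  name: string;
}

type ResolveResult =
  | { ok: true; ref: TableRef }
  | { ok: false; error: 'not_found' | 'ambiguous'; candidates: TableRef[] };

// Match dot-separated parts right to left: name, then db, then catalog.
function matchesParts(ref: TableRef, parts: string[]): boolean {
  if (parts.length > 3) return false;
  const [name, db, catalog]: (string | undefined)[] = [...parts].reverse();
  if (ref.name !== name) return false;
  if (db !== undefined && ref.db !== db) return false;
  if (catalog !== undefined && ref.catalog !== catalog) return false;
  return true;
}

function resolveTable(input: string | TableRef, tables: TableRef[]): ResolveResult {
  let candidates: TableRef[];
  if (typeof input === 'string') {
    const parts = input.split('.'); // naive: assumes no dots inside identifiers
    candidates = tables.filter((t) => matchesParts(t, parts));
  } else {
    const ref = input; // structured refs must match exactly
    candidates = tables.filter(
      (t) => t.catalog === ref.catalog && t.db === ref.db && t.name === ref.name,
    );
  }
  if (candidates.length === 1) return { ok: true, ref: candidates[0] };
  return { ok: false, error: candidates.length === 0 ? 'not_found' : 'ambiguous', candidates };
}

const snapshot: TableRef[] = [
  { catalog: 'proj', db: 'sales', name: 'orders' },
  { catalog: 'proj', db: 'staging', name: 'orders' },
];
console.log(resolveTable('orders', snapshot).ok); // false: two schemas contain "orders"
console.log(resolveTable('sales.orders', snapshot).ok); // true
```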
@@ -185,20 +211,29 @@ columns) from the latest scan snapshot. The raw-data equivalent of
 ```typescript
 {
   connectionId: string,
-  table: string, // qualified (schema.name)
-  kind: 'table' | 'view',
+  tableRef: { // structured identity, lossless on every driver
+    catalog: string | null, // BigQuery project, Snowflake/SQL Server database
+    db: string | null, // schema/dataset
+    name: string,
+  },
+  display: string, // driver-formatted display string
+                   // (e.g. "project.dataset.table", "schema.name")
+  kind: 'table' | 'view' | 'external' | 'event_stream', // matches KtxSchemaTableKind
   comment: string | null,
   estimatedRows: number | null,
   columns: Array<{
     name: string,
     nativeType: string,
     normalizedType: string,
     dimensionType: 'time' | 'string' | 'number' | 'boolean',
     nullable: boolean,
     primaryKey: boolean,
     comment: string | null,
   }>,
   foreignKeys: Array<{
     fromColumn: string,
+    toCatalog: string | null, // qualified FK target, preserves cross-db FKs
+    toDb: string | null,
     toTable: string,
     toColumn: string,
     constraintName: string | null,
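The output in this hunk pairs a lossless `tableRef` with a driver-formatted `display` string, and it carries qualified FK targets through `toCatalog`/`toDb`. The sketch below derives the display form from the structured parts. It assumes the display string is simply the non-null parts joined with dots, which matches the examples "project.dataset.table" and "schema.name". Per-driver identifier quoting is deliberately left out. `formatDisplay` and `fkTargetDisplay` are hypothetical names.

```typescript
// Hypothetical sketch: derive display strings from structured identities.
// Assumption: non-null parts are joined with '.', with no identifier quoting.
interface TableRef {
  catalog: string | null; // e.g. BigQuery project, Snowflake/SQL Server database
  db: string | null; // schema or dataset
  name: string;
}

interface ForeignKey {
  fromColumn: string;
  toCatalog: string | null;
  toDb: string | null;
  toTable: string;
  toColumn: string;
}

function formatDisplay(ref: TableRef): string {
  return [ref.catalog, ref.db, ref.name]
    .filter((part): part is string => part !== null)
    .join('.');
}

// The qualified FK target lets an agent write cross-schema or cross-database
// joins without re-resolving the referenced table.
function fkTargetDisplay(fk: ForeignKey): string {
  const table = formatDisplay({ catalog: fk.toCatalog, db: fk.toDb, name: fk.toTable });
  return `${table}.${fk.toColumn}`;
}

console.log(formatDisplay({ catalog: 'project', db: 'dataset', name: 'table' })); // project.dataset.table
console.log(formatDisplay({ catalog: null, db: 'schema', name: 'name' })); // schema.name
```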
@@ -211,6 +246,14 @@ columns) from the latest scan snapshot. The raw-data equivalent of
 }
 ```
 
+Output fields mirror `KtxSchemaTable` / `KtxSchemaColumn` /
+`KtxSchemaForeignKey` from `packages/context/src/scan/types.ts:51-82`. The
+full `KtxSchemaTableKind` set is preserved so BigQuery `external` tables
+and warehouses with event-stream sources are not silently coerced. FK
+target qualification (`toCatalog`/`toDb`) carries through so agents can
+write valid SQL for cross-schema or cross-database references without
+re-resolving.
+
 If `columns` is provided, only the requested columns appear in the `columns`
 array (PKs and FKs still report on the full table).
 
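The `columns` rule in this hunk filters the `columns` array but leaves FK reporting table-wide. The sketch below shows that rule as a projection over an already-built result, using a trimmed-down result shape. It makes two assumptions: column names are matched exactly and case-sensitively, and requested names that do not exist are silently ignored. The section does not say how PKs on omitted columns are surfaced, so that part is not modeled. `projectColumns` is a hypothetical name.

```typescript
// Hypothetical sketch of the `columns` filter on an entity_details result.
// Assumptions: exact, case-sensitive matching; unknown names are ignored;
// foreignKeys are left untouched because FKs report on the full table.
interface ColumnInfo {
  name: string;
  primaryKey: boolean;
}

interface ForeignKeyInfo {
  fromColumn: string;
  toTable: string;
  toColumn: string;
}

interface EntityDetails {
  columns: ColumnInfo[];
  foreignKeys: ForeignKeyInfo[];
}

function projectColumns(details: EntityDetails, requested?: string[]): EntityDetails {
  if (requested === undefined) return details; // omit → all columns
  const wanted = new Set(requested);
  return {
    ...details,
    columns: details.columns.filter((c) => wanted.has(c.name)),
    // foreignKeys intentionally not filtered
  };
}

const full: EntityDetails = {
  columns: [
    { name: 'id', primaryKey: true },
    { name: 'customer_id', primaryKey: false },
    { name: 'status', primaryKey: false },
  ],
  foreignKeys: [{ fromColumn: 'customer_id', toTable: 'customers', toColumn: 'id' }],
};

const slim = projectColumns(full, ['status']);
console.log(slim.columns.map((c) => c.name)); // [ 'status' ]
console.log(slim.foreignKeys.length); // 1
```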
@@ -264,10 +307,34 @@ research skill must teach agents not to treat a miss as exhaustive.
 ```
 
 **Output:** for each input value, the list of matching entries plus
-provenance:
+per-connection provenance. Coverage and miss reasons are connection-scoped
+because `loadLatestSlDictionaryEntries` iterates each connection's profile
+artifact independently
+(`packages/context/src/sl/sl-dictionary-profile.ts:96-112`); a single
+all-connections call can mix `no_profile_artifact` (one connection never
+ran an enriched scan), `value_not_in_sample` (another connection ran but
+the literal was outside the sample), and matches in the same response.
 
 ```typescript
 {
+  // The set of connections actually searched on this call. When the input
+  // omits connectionId this is every configured connection; otherwise it
+  // contains the single requested connection.
+  searched: Array<{
+    connectionId: string,
+    coverage: {
+      sampledRows: number | null, // profileSampleRows used at profile time
+      valuesPerColumn: number | null, // sampleValuesPerColumn used at profile time
+      profiledColumns: number, // count of columns in the dictionary index for this connection
+      syncId: string | null, // identifier of the profile artifact (null when missing)
+      profiledAt: string | null, // ISO-8601 UTC of the profile artifact (null when missing)
+    },
+    // Per-connection status, independent of any specific input value:
+    // ready — profile present with profiled columns
+    // no_profile_artifact — enriched scan never ran for this connection
+    // no_candidate_columns — profile present but no columns profile-eligible
+    status: 'ready' | 'no_profile_artifact' | 'no_candidate_columns',
+  }>,
   results: Array<{
     value: string, // input value
     matches: Array<{
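In this hunk, each `searched` entry reports a `status` and a `coverage` block per connection. The sketch below derives both values from an optional profile artifact and makes two assumptions. First, `ProfileArtifact` is a hypothetical shape; the section names only the coverage fields. Second, `no_candidate_columns` is read as "an artifact exists but profiled zero columns". `connectionStatus` and `coverageFor` are hypothetical helpers.

```typescript
// Hypothetical sketch: derive a connection's status and coverage for the
// `searched` array. ProfileArtifact is an assumed shape, not the real type.
type ConnectionStatus = 'ready' | 'no_profile_artifact' | 'no_candidate_columns';

interface ProfileArtifact {
  syncId: string;
  profiledAt: string; // ISO-8601 UTC
  sampledRows: number | null; // profileSampleRows used at profile time
  valuesPerColumn: number | null; // sampleValuesPerColumn used at profile time
  profiledColumns: number;
}

function connectionStatus(artifact: ProfileArtifact | null): ConnectionStatus {
  if (artifact === null) return 'no_profile_artifact'; // enriched scan never ran
  if (artifact.profiledColumns === 0) return 'no_candidate_columns'; // nothing eligible
  return 'ready';
}

// Coverage copies the artifact's fields and reports null when it is missing.
function coverageFor(artifact: ProfileArtifact | null) {
  return {
    sampledRows: artifact?.sampledRows ?? null,
    valuesPerColumn: artifact?.valuesPerColumn ?? null,
    profiledColumns: artifact?.profiledColumns ?? 0,
    syncId: artifact?.syncId ?? null,
    profiledAt: artifact?.profiledAt ?? null,
  };
}

console.log(connectionStatus(null)); // no_profile_artifact
```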
@@ -277,25 +344,28 @@ provenance:
       matchedValue: string, // actual value found (may differ in case)
       cardinality: number | null, // column cardinality if known
     }>,
-    missReason: // present when matches is empty
-      | 'no_profile_artifact' // enriched scan never ran for this connection
-      | 'no_candidate_columns' // profile present, no columns profile-eligible
-      | 'value_not_in_sample', // profile present but value not in sample (non-authoritative miss)
-    coverage: {
-      sampledRows: number | null, // profileSampleRows used at profile time
-      valuesPerColumn: number | null, // sampleValuesPerColumn used at profile time
-      profiledColumns: number, // count of columns in the dictionary index
-      syncId: string | null, // identifier of the profile artifact
-      profiledAt: string | null, // ISO-8601 UTC of the profile artifact
-    },
+    // Per-connection miss reasons for this value, present when that
+    // connection produced no match. Connections that matched do not appear
+    // in `misses`. For ready connections with no match, the reason is
+    // 'value_not_in_sample' (non-authoritative miss). For unready
+    // connections, the reason mirrors their `status` above.
+    misses: Array<{
+      connectionId: string,
+      reason:
+        | 'no_profile_artifact'
+        | 'no_candidate_columns'
+        | 'value_not_in_sample',
+    }>,
   }>,
 }
 ```
 
 **Matching semantics:** case-insensitive substring match against the
 profile-sampled values. Misses are never authoritative — they only state
-that the value was not in the captured sample. `missReason` distinguishes
-"no enriched scan has run" (`no_profile_artifact`) from "scan ran but value
+that the value was not in the captured sample for the listed connection.
+`misses[].reason` distinguishes "no enriched scan has run on this
+connection" (`no_profile_artifact`), "enriched scan ran but no columns
+were profile-eligible" (`no_candidate_columns`), and "scan ran but value
 was not in the sample" (`value_not_in_sample`). The research skill must
 direct agents to follow up a `value_not_in_sample` miss with
 `sql_execution` against the most plausible columns, not to conclude the
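This hunk defines the matching rules: a case-insensitive substring match against sampled values, with one `misses` entry for each connection that produced no match. The sketch below applies those rules using a hypothetical per-connection input that holds a `status` and the sampled values for each column. It omits `cardinality` and any other per-match fields of the real output, and `lookupValue` is a hypothetical name.

```typescript
// Hypothetical sketch of dictionary value lookup with per-connection misses.
type Status = 'ready' | 'no_profile_artifact' | 'no_candidate_columns';
type MissReason = 'no_profile_artifact' | 'no_candidate_columns' | 'value_not_in_sample';

interface SearchedConnection {
  connectionId: string;
  status: Status;
  // Assumed shape: column name → values sampled at profile time.
  sampledValues: Map<string, string[]>;
}

interface ValueResult {
  value: string;
  matches: Array<{ connectionId: string; column: string; matchedValue: string }>;
  misses: Array<{ connectionId: string; reason: MissReason }>;
}

function lookupValue(value: string, connections: SearchedConnection[]): ValueResult {
  const needle = value.toLowerCase();
  const result: ValueResult = { value, matches: [], misses: [] };
  for (const conn of connections) {
    if (conn.status !== 'ready') {
      // Unready connections report their status as the miss reason.
      result.misses.push({ connectionId: conn.connectionId, reason: conn.status });
      continue;
    }
    const before = result.matches.length;
    for (const [column, values] of conn.sampledValues) {
      for (const v of values) {
        // Case-insensitive substring match against the sampled value.
        if (v.toLowerCase().includes(needle)) {
          result.matches.push({ connectionId: conn.connectionId, column, matchedValue: v });
        }
      }
    }
    if (result.matches.length === before) {
      // Non-authoritative: the value was only absent from the sample.
      result.misses.push({ connectionId: conn.connectionId, reason: 'value_not_in_sample' });
    }
  }
  return result;
}

const res = lookupValue('ACME', [
  { connectionId: 'warehouse', status: 'ready', sampledValues: new Map([['customer_name', ['Acme Corp', 'Globex']]]) },
  { connectionId: 'crm', status: 'ready', sampledValues: new Map([['account', ['Initech']]]) },
  { connectionId: 'legacy', status: 'no_profile_artifact', sampledValues: new Map() },
]);
console.log(res.matches.map((m) => m.matchedValue)); // [ 'Acme Corp' ]
```

A single call can return a match from one connection alongside two different miss reasons from the others, which is the mixed response the hunk describes.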
@@ -651,9 +721,15 @@ Rules:
 
 Port is read from `.ktx/mcp.json` if present, falling back to 7878. The
 install manifest (`agentInstallManifestPath`,
-`packages/cli/src/setup-agents.ts:60`) tracks each `json-key` (and
-`toml-table` for the codex snippet case) entry so `ktx setup-agents
---remove` can roll back cleanly.
+`packages/cli/src/setup-agents.ts:60`) tracks each **written** entry so
+`ktx setup-agents --remove` can roll back cleanly. The current manifest
+entry kinds are `file` and `json-key`
+(`packages/cli/src/setup-agents.ts:42-50`); the MCP client writers for
+claude-code and cursor add `json-key` entries for their respective config
+files. Printed-only snippets for codex and opencode are **not** tracked in
+the manifest, and `--remove` does not attempt to mutate user-written
+files for those targets; the printed instructions tell the user how to
+remove the entry by hand.
 
 If the daemon is not running when `setup-agents` runs, the command prints a
 follow-up hint: "Run `ktx mcp start` to enable the configured KTX MCP
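The port rule in this hunk reads the port from `.ktx/mcp.json` when that file exists and falls back to 7878 otherwise. The sketch below is pure: it takes the file contents, or `null` when the file is absent, so the filesystem read stays with the caller. It also rests on several assumptions: the port is stored under a `port` key, and a malformed file or an out-of-range value falls back to the default instead of raising an error. The real behavior on a malformed file is not specified. `resolvePort` is a hypothetical name.

```typescript
// Hypothetical sketch: resolve the MCP daemon port from `.ktx/mcp.json`
// contents. Assumptions: the key is `port`, and bad input falls back to
// the documented default rather than failing.
const DEFAULT_MCP_PORT = 7878;

function resolvePort(configText: string | null): number {
  if (configText === null) return DEFAULT_MCP_PORT; // file not present
  try {
    const parsed: unknown = JSON.parse(configText);
    if (typeof parsed === 'object' && parsed !== null && 'port' in parsed) {
      const port = (parsed as { port: unknown }).port;
      if (typeof port === 'number' && Number.isInteger(port) && port > 0 && port < 65536) {
        return port;
      }
    }
  } catch {
    // Malformed JSON: fall through to the default (assumed behavior).
  }
  return DEFAULT_MCP_PORT;
}

console.log(resolvePort(null)); // 7878
console.log(resolvePort('{"port": 9000}')); // 9000
```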
@@ -780,8 +856,11 @@ You have access to KTX MCP tools for investigating data. Follow this workflow.
   responses.
 - CLI integration test for `ktx mcp start|stop|status` lifecycle following
   the pattern in `managed-python-daemon.test.ts`.
-- Setup-agents test verifying the MCP client config entries are written for
-  each target.
+- Setup-agents tests verifying behavior per target: claude-code and cursor
+  writers add the correct JSON entry and a corresponding `json-key`
+  manifest entry that `--remove` cleans up; codex and opencode targets
+  produce printed snippet output and do not mutate any user config file
+  or add manifest entries in v1.
 - Verification commands per CLAUDE.md: `pnpm --filter @ktx/context run test`
   and `pnpm --filter @ktx/cli run test` for the affected packages, plus
   `pnpm run type-check`.