feat(mcp):added MCP server (#97)

* docs(specs): design research-agent MCP tools and ktx mcp daemon Adds the 2026-05-14 design spec for exposing four new MCP tools (discover_data, entity_details, dictionary_search, sql_execution), shipping a ktx-research skill, and introducing an HTTP-only ktx mcp daemon so external agents can use KTX as a research-capable context layer. * Refine research-agent MCP tools spec after adversarial review iteration 1 * Refine research-agent MCP tools spec after adversarial review iteration 2 * Refine research-agent MCP tools spec after adversarial review iteration 3 * Refine spec: drop connectionName compat carve-out and ground summary/snippet provenance per kind * feat(daemon): validate read-only SQL with sqlglot * feat(context): expose read-only SQL validation port * feat(context): register MCP sql execution tool * feat(context): execute MCP SQL through validated connector path * test(context): update SQL analysis port fixtures * docs: add research-agent MCP sql execution foundation plan * feat(context): add scan-backed entity details service * feat(context): register MCP entity details tool * feat(context): expose local MCP entity details * test(context): align entity details scan fixtures * docs: add research-agent MCP entity_details plan * feat(context): add dictionary search service * feat(context): register MCP dictionary search tool * feat(context): expose local MCP dictionary search * docs: add research-agent MCP dictionary_search plan * feat: add MCP discover data service * feat: expose discover data MCP tool * feat: wire local discover data MCP port * docs: add research-agent MCP discover_data plan * feat(cli): add mcp http security helpers * feat(cli): host mcp over streamable http * feat(cli): manage mcp daemon lifecycle * feat(cli): add ktx mcp commands * fix(cli): stabilize mcp daemon verification * docs: add research-agent MCP http daemon plan * feat(cli): install KTX research skill * feat(cli): configure MCP clients in setup agents * feat(cli): support Claude local MCP setup scope * docs: add research-agent MCP setup-agents plan * refactor(context): use connectionId in warehouse verification tools * docs(context): update ingest verification prompts for connectionId * docs: add research-agent MCP ingest contract convergence plan * chore: build runtime artifacts in conductor setup --------- Co-authored-by: Andrey Avtomonov <7889985+andreybavt@users.noreply.github.com>
2026-07-01 08:59:39 +02:00 · 2026-05-15 02:35:09 +02:00 · 2026-05-15 02:35:09 +02:00 · b759a4a286
commit b759a4a286
parent c7b64379bf
78 changed files with 13689 additions and 190 deletions
--- a/packages/cli/src/skills/research/SKILL.md
+++ b/packages/cli/src/skills/research/SKILL.md
@ -0,0 +1,49 @@
+---
+name: ktx-research
+description: Use when answering a question that needs data from a KTX-connected database - investigating, analyzing, "how many", "show me", "what's the breakdown of", finding records by value, exploring tables, comparing periods, or any data-investigation request. Triggers even when the user does not say "research"; if the answer requires querying a configured KTX connection, this skill applies.
+---
+
+# KTX Research Workflow
+
+You have access to KTX MCP tools for investigating data. Follow this workflow.
+
+<workflow>
+1. **Discover** - call `discover_data` first to see what exists across wiki, semantic-layer sources, and raw tables. Returns refs only.
+2. **Inspect top hits in parallel** - for each promising ref:
+   - `kind: 'wiki'` -> `wiki_read`
+   - `kind: 'sl_source'`, `kind: 'sl_measure'`, or `kind: 'sl_dimension'` -> `sl_read_source`
+   - `kind: 'table'` or `kind: 'column'` -> `entity_details`
+3. **Resolve literals** - if the user named a value such as "Acme Corp" or "status=shipped", call `dictionary_search` to find which column holds it.
+4. **Query** -
+   - Prefer `sl_query` when the semantic layer covers the question.
+   - Use `sql_execution` only for questions the semantic layer does not cover.
+5. **Capture learnings** - at the end of the turn, call `memory_capture` so future turns benefit. Skip when the answer carries no durable knowledge.
+</workflow>
+
+<rules>
+- Always run `discover_data` before writing SQL. Do not guess table names.
+- Prefer the semantic layer over raw SQL when both can answer the question; measures are the source of truth.
+- Read entity details before writing SQL against an unfamiliar table. Do not assume column names.
+- Treat `sql_execution` as read-only. Writes are rejected by the server.
+- Validate value mentions with `dictionary_search` instead of guessing case or spelling. Treat a `dictionary_search` miss as non-authoritative. The index is built from profile-sampled values, so a missing value may simply have been outside the sample. Follow up with `sql_execution` against the most plausible columns before concluding the value is absent.
+</rules>
+
+<examples>
+**Input:** "How many orders did Acme Corp place last month?"
+
+**Workflow:**
+1. `dictionary_search({ values: ["Acme Corp"] })` finds `customers.name`.
+2. `discover_data({ query: "orders customer monthly" })` finds an orders semantic-layer source.
+3. `sl_read_source({ connectionId: "warehouse", sourceName: "orders_facts" })` confirms the source grain, measures, and dimensions.
+4. `sl_query({ connectionId: "warehouse", measures: ["order_count"], filters: ["customer_name = 'Acme Corp'"] })` answers through the semantic layer.
+5. `memory_capture({ userMessage, assistantMessage })` captures the durable finding.
+
+---
+
+**Input:** "What columns does the events table have?"
+
+**Workflow:**
+1. `discover_data({ query: "events table" })` returns a `table` ref.
+2. `entity_details({ connectionId: "warehouse", entities: [{ table: "analytics.events" }] })` returns columns, types, and foreign keys.
+3. Answer directly. No query is needed.
+</examples>