mirror of
https://github.com/Kaelio/ktx.git
synced 2026-06-07 07:55:13 +02:00
* docs(specs): design research-agent MCP tools and ktx mcp daemon Adds the 2026-05-14 design spec for exposing four new MCP tools (discover_data, entity_details, dictionary_search, sql_execution), shipping a ktx-research skill, and introducing an HTTP-only ktx mcp daemon so external agents can use KTX as a research-capable context layer. * Refine research-agent MCP tools spec after adversarial review iteration 1 * Refine research-agent MCP tools spec after adversarial review iteration 2 * Refine research-agent MCP tools spec after adversarial review iteration 3 * Refine spec: drop connectionName compat carve-out and ground summary/snippet provenance per kind * feat(daemon): validate read-only SQL with sqlglot * feat(context): expose read-only SQL validation port * feat(context): register MCP sql execution tool * feat(context): execute MCP SQL through validated connector path * test(context): update SQL analysis port fixtures * docs: add research-agent MCP sql execution foundation plan * feat(context): add scan-backed entity details service * feat(context): register MCP entity details tool * feat(context): expose local MCP entity details * test(context): align entity details scan fixtures * docs: add research-agent MCP entity_details plan * feat(context): add dictionary search service * feat(context): register MCP dictionary search tool * feat(context): expose local MCP dictionary search * docs: add research-agent MCP dictionary_search plan * feat: add MCP discover data service * feat: expose discover data MCP tool * feat: wire local discover data MCP port * docs: add research-agent MCP discover_data plan * feat(cli): add mcp http security helpers * feat(cli): host mcp over streamable http * feat(cli): manage mcp daemon lifecycle * feat(cli): add ktx mcp commands * fix(cli): stabilize mcp daemon verification * docs: add research-agent MCP http daemon plan * feat(cli): install KTX research skill * feat(cli): configure MCP clients in setup agents * feat(cli): support Claude local MCP setup scope * docs: add research-agent MCP setup-agents plan * refactor(context): use connectionId in warehouse verification tools * docs(context): update ingest verification prompts for connectionId * docs: add research-agent MCP ingest contract convergence plan * chore: build runtime artifacts in conductor setup --------- Co-authored-by: Andrey Avtomonov <7889985+andreybavt@users.noreply.github.com>
4.2 KiB
4.2 KiB
| name | description | callers | |
|---|---|---|---|
| historic_sql_patterns | Identify recurring cross-table historic-SQL analytical intents from a bounded pattern shard and emit typed pattern evidence for deterministic wiki projection. |
|
Historic SQL Patterns
Use this skill when the WorkUnit raw file is a patterns-input/part-0001.json style shard from the historic-sql adapter. Older staged bundles may still provide root patterns-input.json; when that is the WorkUnit raw file, read it the same way.
Required Workflow
- Read the WorkUnit notes first.
- Find the single pattern input file listed under the WorkUnit
rawFilessection. - Call
read_raw_filefor that exact raw file path. - Identify recurring analytical intents that span at least two tables and have repeated usage signal.
- Emit one
patternevidence object per durable cross-table intent by callingemit_historic_sql_evidence. - Set each evidence object's
rawPathto the exact raw file path read in step 3. - Stop after all pattern evidence has been emitted.
Every join column mentioned in pattern descriptions must be verified via entity_details for both sides of the join.
Identifier Verification Protocol
Before writing a wiki page or SL source on any topic:
discover_data({query: "<topic>"})- see what wikis, SL sources, and raw tables already exist. Prefer updating existing pages over creating new ones.
Before emitting any schema.table or schema.table.column into a wiki body,
SL source, tables: frontmatter, sl_refs, or emit_unmapped_fallback:
entity_details({connectionId, targets: [{display: "<identifier>"}]})- confirm the identifier resolves; inspect native types, FK/PK, and sampleValues.- For literal values from the source, such as status codes or plan tiers,
check whether they appear in
entity_detailssampleValues for the relevant column. If sampleValues is short or the sample may have missed real values, run asql_executionprobe with the same warehouse connection id:sql_execution({connectionId, sql: "SELECT DISTINCT <col> FROM <ref> LIMIT 50"}). - If the candidate identifier still does not resolve, do one of:
- Use
sql_execution({connectionId, sql: "SELECT 1 FROM <ref> LIMIT 0"}). If it errors, the identifier is fictional. - Wrap the identifier in
[unverified - from <rawPath>]in the wiki body, citing the exact raw path that mentioned it. - When recording
emit_unmapped_fallbackwithno_physical_table, include the failing probe error inclarification.
- Use
- Never copy
<schema>.<table>placeholder strings from these instructions into output.
Evidence Shape
Each call to emit_historic_sql_evidence must use this shape:
{
"kind": "pattern",
"rawPath": "patterns-input/part-0001.json",
"pattern": {
"slug": "order-lifecycle-analysis",
"title": "Order Lifecycle Analysis",
"narrative": "Analysts compare order statuses with customer segments to understand lifecycle movement.",
"definitionSql": "select o.status, count(*) from public.orders o join public.customers c on c.id = o.customer_id group by o.status",
"tablesInvolved": ["public.orders", "public.customers"],
"slRefs": ["orders", "customers"],
"constituentTemplateIds": ["pg:1", "pg:2"]
}
}
The pattern object must match patternOutputSchema; multiple calls together must form patternsArraySchema.
Pattern Selection Rules
- Prefer patterns that involve two or more tables.
- Prefer templates with
executionsBucketat least10-100anddistinctUsersBucketabove solo usage. - Merge templates into one pattern only when the business intent is the same.
- Use a stable kebab-case slug based on intent, not a template id.
- Set
definitionSqlto the clearest representative SQL from a constituent template. - Set
slRefsto source names when the source name is obvious from table names; omit uncertain refs rather than guessing. - Treat each pattern shard independently; do not read peer shard files from
peerFileIndex.
Boundaries
- Do not call wiki_write.
- Do not call sl_write_source.
- Do not call sl_edit_source.
- Do not call context_candidate_write.
- Do not create single-table pattern pages.
- Do not copy credentials, tokens, user emails, or unredacted literals into evidence.