test(mcp): guard instructions.py against tool drift

The MCP `instructions` hint is static and baked into the client prompt, while tool names, signatures, and error codes are discovered dynamically via tools/list. The two had drifted: instructions restated stale signatures and an error-code enum that omitted schema_validation and trigger_path_conflict.

- Trim instructions.py to tool names + call order; stop restating signatures and error codes the dynamic surface already carries.
- Document each tool's full error_code contract in the save_workflow and create_workflow docstrings (the descriptions shipped via tools/list).
- Add test_mcp_instructions_drift.py: every tool named in the guide must be registered, and every error_code a tool returns must appear in its description.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
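The drift test runs two checks: every tool named in the guide is registered, and every error_code a tool returns appears in that tool's description. Both can be sketched as a self-contained, pytest-style module. This is a minimal illustration, not the committed `test_mcp_instructions_drift.py`. `GUIDE`, `REGISTERED`, and `RETURNED_CODES` are hypothetical fixtures. They stand in for the real guide text, the server's tool registry (tool name mapped to the description shipped via tools/list), and however the real test learns which codes a tool can return. Treating the backticked name(s) that open each numbered step as tool references is also an assumed heuristic. It keeps identifiers such as `workflow_id` or `startCall` elsewhere in the guide from being mistaken for tools.

```python
import re

# Hypothetical fixture: a trimmed excerpt of the guide (real test would
# import it from api/mcp_server/instructions.py).
GUIDE = """\
### Editing an existing workflow
1. `list_workflows` - locate the target workflow.
2. `get_workflow_code` - fetch the current source for that workflow.
3. (optional) `list_node_types` / `get_node_type` - consult before adding a node type.
4. Mutate the code in place.
5. `save_workflow` - persist as a new draft.
"""

# Hypothetical fixture: tool name -> description as shipped via tools/list.
REGISTERED = {
    "list_workflows": "List the caller's workflows.",
    "get_workflow_code": "Return a workflow's SDK TypeScript source.",
    "list_node_types": "List available node types.",
    "get_node_type": "Describe one node type's fields.",
    "save_workflow": (
        "Save code as a new draft. On failure error_code is one of: "
        "parse_error, validation_error, graph_validation, "
        "schema_validation, trigger_path_conflict, bridge_error."
    ),
}

# Hypothetical fixture: error codes each tool's implementation can return.
RETURNED_CODES = {
    "save_workflow": {
        "parse_error", "validation_error", "graph_validation",
        "schema_validation", "trigger_path_conflict", "bridge_error",
    },
}

# Assumed heuristic: a tool reference is a backticked name opening a numbered
# step, plus any "/"-joined alternatives, e.g. "3. (optional) `a` / `b` - ...".
STEP = re.compile(
    r"^[ \t]*\d+\.[ \t]+(?:\(optional\)[ \t]+)?((?:`\w+`[ \t]*/?[ \t]*)+)",
    re.MULTILINE,
)
NAME = re.compile(r"`(\w+)`")


def tools_named_in(guide: str) -> set[str]:
    """Collect the tool names referenced by the guide's numbered steps."""
    names: set[str] = set()
    for step in STEP.finditer(guide):
        names.update(NAME.findall(step.group(1)))
    return names


def undocumented_codes(descriptions: dict[str, str],
                       returned: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each tool to the returned error codes its description omits."""
    gaps: dict[str, set[str]] = {}
    for tool, codes in returned.items():
        text = descriptions.get(tool, "")
        # \b stops "validation" from matching inside "schema_validation".
        absent = {c for c in codes if not re.search(rf"\b{re.escape(c)}\b", text)}
        if absent:
            gaps[tool] = absent
    return gaps


def test_guide_names_only_registered_tools():
    assert tools_named_in(GUIDE) <= set(REGISTERED)


def test_error_codes_documented_on_tool():
    assert undocumented_codes(REGISTERED, RETURNED_CODES) == {}


if __name__ == "__main__":
    test_guide_names_only_registered_tools()
    test_error_codes_documented_on_tool()
    print("ok")
```

The word boundaries matter for the second check. A plain substring test would accept a code like `validation` as documented whenever `schema_validation` appears.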
2026-06-22 08:38:13 +02:00 · 2026-05-20 18:43:18 +05:30 · 2026-05-20 18:43:18 +05:30 · 8484e4bfaf
commit 8484e4bfaf
parent 5762095edf
4 changed files with 170 additions and 30 deletions
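The revised "Iterating on errors" guidance in this diff describes a submit, fix, resubmit loop. Here it is from the caller's side, as a sketch under stated assumptions. `SaveResult`, `save_with_retries`, `TRANSIENT_CODES`, and the `save`/`fix` callables are hypothetical names, not Dograh's client API. `line` and `column` are modeled as separate result fields. Treating `bridge_error` as the one transient code comes from the enum this commit removes from the guide. The `max_fixes` cap is an addition so the loop always terminates.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SaveResult:
    """Hypothetical shape of a save_workflow result."""
    saved: bool                       # create_workflow would report `created`
    error_code: Optional[str] = None  # machine-readable code
    error: Optional[str] = None       # human-readable message
    line: Optional[int] = None        # assumed set only when locatable
    column: Optional[int] = None


# Assumption: the removed enum called bridge_error internal, so it is the
# only code that earns one blind retry.
TRANSIENT_CODES = {"bridge_error"}


def save_with_retries(
    save: Callable[[str], SaveResult],
    fix: Callable[[str, SaveResult], str],
    code: str,
    max_fixes: int = 3,
) -> SaveResult:
    """Submit the complete source, fixing and resubmitting on code errors."""
    retried = False
    fixes = 0
    while True:
        result = save(code)  # always the full source; no patches
        if result.saved:
            return result
        if result.error_code in TRANSIENT_CODES:
            if retried:
                return result  # second transient failure: surface to the user
            retried = True
            continue  # retry once with the unchanged source
        if fixes >= max_fixes:
            return result  # give up and surface the last error
        code = fix(code, result)  # edit near result.line / result.column
        fixes += 1


if __name__ == "__main__":
    def fake_save(src: str) -> SaveResult:
        if "bad" in src:
            return SaveResult(False, "parse_error", "Unexpected token", 1, 1)
        return SaveResult(True)

    def fake_fix(src: str, result: SaveResult) -> str:
        return src.replace("bad", "good")

    print(save_with_retries(fake_save, fake_fix, "bad"))
```

Keeping the transient retry separate from the fix budget mirrors the guidance. A code error is fixed at the reported location, while an internal failure is retried once unchanged and then surfaced.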
--- a/api/mcp_server/instructions.py
+++ b/api/mcp_server/instructions.py
@@ -7,6 +7,14 @@ handling, hard constraints). Design-level per-field guidance belongs in
 each `PropertySpec.llm_hint`; it flows out through `get_node_type` and
 doesn't need to be repeated here.

+Tool names, parameters, and per-tool `error_code` values are NOT
+authoritative here — they reach the model dynamically via `tools/list`
+from each tool's own signature and docstring. Reference tools by bare
+name and describe orchestration; do not restate signatures (they drift)
+or re-enumerate error codes (document those on the tool itself).
+`test_mcp_instructions_drift.py` fails if this guide names a tool that
+is not registered, or if a tool's error codes aren't in its docstring.
+
 Extend based on real LLM failures — every bullet below ideally maps to a
 mistake the system has seen at least once.
 """
@@ -17,22 +25,22 @@ You build and edit Dograh voice-AI workflows by emitting TypeScript that uses th
 ## Call order

 ### Reading documentation
-1. `search_docs(query)` — use first for keyword or acronym lookup when the user is asking how Dograh works or how to configure something.
-2. `read_doc(path)` — fetch the full page once one result looks likely. Prefer this over reasoning from search summaries alone.
-3. `list_docs(path=None, depth=1)` — use when the user wants to browse a topic area or when search terms are too vague. Returned section paths feed back into `list_docs`; returned page paths feed into `read_doc`.
+1. `search_docs` — use first for keyword or acronym lookup when the user is asking how Dograh works or how to configure something.
+2. `read_doc` — fetch the full page once one result looks likely. Prefer this over reasoning from search summaries alone.
+3. `list_docs` — use when the user wants to browse a topic area or when search terms are too vague. Call it with no arguments for the top-level sections; returned section paths feed back into `list_docs`, returned page paths feed into `read_doc`.

 ### Editing an existing workflow
 1. `list_workflows` — locate the target workflow.
-2. `get_workflow_code(workflow_id)` — fetch the current source.
-3. (optional) `list_node_types` / `get_node_type(name)` — consult before adding or editing a node type whose fields aren't already visible in the current code.
+2. `get_workflow_code` — fetch the current source for that workflow.
+3. (optional) `list_node_types` / `get_node_type` — consult before adding or editing a node type whose fields aren't already visible in the current code.
 4. Mutate the code in place. Preserve existing nodes, edges, and variable names unless the task requires removing or renaming them.
-5. `save_workflow(workflow_id, code)` — persist as a new draft. The published version is untouched.
+5. `save_workflow` — persist as a new draft. The published version is untouched.

 ### Creating a new workflow
 1. Create a simple 1-node workflow with only `startCall`. The user can iteratively add complexity by editing it.
-2. `list_node_types` / `get_node_type(name)` — consult to learn the fields available on the node types you intend to use.
+2. `list_node_types` / `get_node_type` — consult to learn the fields available on the node types you intend to use.
 3. Author SDK TypeScript from scratch. The `new Workflow({ name: "..." })` call is required — `name` becomes the workflow's display name.
-4. `create_workflow(code)` — persists a new workflow as version 1 (published). Returns the new `workflow_id`. For subsequent edits use `save_workflow(workflow_id, code)` (which writes a draft).
+4. `create_workflow` — persists a new workflow as version 1 (published). Returns the new `workflow_id`. For subsequent edits use `save_workflow` (which writes a draft).

 ## Allowed source shape

@@ -73,14 +81,7 @@ Example:

 ## Iterating on errors

-`save_workflow` and `create_workflow` return one of:
- `parse_error` — Disallowed construct (see grammar above) or malformed TypeScript.
- `validation_error` — Node data failed spec validation (unknown field, missing required, wrong type, bad `options` value).
- `graph_validation` — Structural rule broken (missing startCall, unreachable node, edge to/from wrong node type).
- `missing_name` — (`create_workflow` only) `new Workflow({ name })` is absent or empty.
- `bridge_error` — Internal; retry once, then surface to the user.
-
-Every error carries `line` and `column`. Fix at that location and resubmit the **complete source** — this tool does not accept patches.
+A failed `save_workflow` / `create_workflow` returns a result with `saved`/`created` set to false, a machine-readable `error_code`, and a human-readable `error` message — carrying `line` and `column` when the problem is locatable in your source. The full set of `error_code` values and their meanings is documented on each tool (visible in its description). Read the `error` message, fix at the reported location, and resubmit the **complete source** — these tools do not accept patches. If a failure looks internal or transient rather than a problem with your code, retry once before surfacing it to the user.

 ## Field conventions