Make Sanhedrin optional in v2.1.0

2026-07-24 23:41:01 +02:00 · 2026-05-01 04:55:54 -05:00 · 2026-05-01 04:55:54 -05:00 · 4f457ec2db
commit 4f457ec2db
parent c9e96b06fd
8 changed files with 315 additions and 131 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,13 +7,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [2.1.0] - 2026-04-27 — "Cognitive Sandwich Goes Local"

-The Sanhedrin Executioner — Vestige's veto layer for Claude Code responses — now runs entirely on a local MLX model (`mlx-community/Qwen3.6-35B-A3B-4bit`). Zero API cost per Claude turn, fully offline, no Anthropic round-trip on the critical path. Combined with four pre-cognitive UserPromptSubmit hooks (synthesis-preflight, cwd-state-injector, vestige-pulse-daemon, preflight-swarm), Vestige now ships a complete "Cognitive Sandwich" — Vestige memories injected before the model thinks, local Sanhedrin veto after the model speaks — installable in one command on a MacBook.
+The Sanhedrin Executioner — Vestige's veto layer for Claude Code responses — can run against a local MLX model (`mlx-community/Qwen3.6-35B-A3B-4bit`) when explicitly enabled. Combined with four pre-cognitive UserPromptSubmit hooks (synthesis-preflight, cwd-state-injector, vestige-pulse-daemon, preflight-swarm), Vestige now ships a complete "Cognitive Sandwich" — Vestige memories injected before the model thinks, optional Sanhedrin veto after the model speaks.
+
+> 2026-05-01 hotfix: Sanhedrin is now opt-in and disabled by default. The default installer no longer wires the Sanhedrin Stop hook, no longer starts MLX, and removes the old v2.1.0 MLX launchd job on reinstall. Users who want Sanhedrin can opt in with `--enable-sanhedrin`. On Apple Silicon, adding `--with-launchd` also autostarts the local MLX server (and implies `--enable-sanhedrin`); x86 users can point `--sanhedrin-endpoint` at any OpenAI-compatible `/v1/chat/completions` endpoint.

 ### Added

 - **`hooks/`** — first-class harness-side companion to the Vestige MCP server. 9 production hooks designed for `~/.claude/hooks/`:
-  - `sanhedrin.sh` — Stop hook that invokes the local Qwen Executioner via the Python bridge.
-  - `sanhedrin-local.py` — local backend that POSTs to `mlx_lm.server` (`localhost:8080`) with Vestige evidence injected via the dashboard `/api/deep_reference` HTTP endpoint. TRUST_FLOOR=0.55 evidence filter + topical-relevance gate + inference-verb ban + 8 worked few-shots covering true positives AND false-positive guards.
+  - `sanhedrin.sh` — optional Stop hook that invokes the Sanhedrin Executioner via the Python bridge only when `VESTIGE_SANHEDRIN_ENABLED=1`.
+  - `sanhedrin-local.py` — OpenAI-compatible backend that POSTs to the configured Sanhedrin endpoint with Vestige evidence injected via the dashboard `/api/deep_reference` HTTP endpoint. TRUST_FLOOR=0.55 evidence filter + topical-relevance gate + inference-verb ban + 8 worked few-shots covering true positives AND false-positive guards.
  - `synthesis-preflight.sh` — UserPromptSubmit hook that POSTs the user prompt to `/api/deep_reference` and injects the trust-scored reasoning chain into context.
  - `cwd-state-injector.sh` — captures git status, branch, modified files, open PRs/issues.
  - `vestige-pulse-daemon.sh` — surfaces fresh Vestige dream insights from the past 20 min.
@@ -21,18 +23,20 @@ The Sanhedrin Executioner — Vestige's veto layer for Claude Code responses —
  - `synthesis-stop-validator.sh` — Stop hook regex against forbidden hedging patterns.
  - `veto-detector.sh` — fast 50ms regex pre-screen against `veto`-tagged Vestige memories.
  - `synthesis-gate.sh` — legacy v1 trigger (kept for backward compat).
-  - `settings.fragment.json` — JSON snippet merged into `~/.claude/settings.json` by the installer.
+  - `settings.fragment.json` — lightweight JSON snippet merged into `~/.claude/settings.json` by the default installer.
+  - `settings.sanhedrin.fragment.json` — opt-in JSON snippet used only with `--enable-sanhedrin`.
 - **Dashboard `/api/changelog` endpoint** — bounded REST event feed for recent `DreamCompleted` and `ConnectionDiscovered` events, used by the Pulse hook to inject fresh synthesis into Claude Code context.
 - **`agents/`** — `executioner.md` (legacy/fallback Haiku 4.5 path), `lateral-thinker.md`, `synthesis-composer.md`.
-- **`launchd/com.vestige.mlx-server.plist.template`** — auto-start `mlx_lm.server` with the Qwen3.6-35B-A3B-4bit model on login. Templated with `__HOME__` and `__MODEL__` placeholders.
-- **`scripts/install-sandwich.sh`** — one-command installer that stages hooks, agents, plist, jq-merges the settings fragment, and `launchctl load`s the plist. Backs up `settings.json` to `.bak.pre-sandwich`. Supports `--force`, `--no-launchd`, `--include-memory-loader`, `--src=PATH`.
-- **`scripts/check-sandwich-prereqs.sh`** — comprehensive prereq verifier (Apple Silicon, Python 3.10+, jq, uv, mlx-lm, hf, claude, vestige-mcp, model on disk, MCP HTTP up, server up, plist installed, settings wired).
+- **`launchd/com.vestige.mlx-server.plist.template`** — optional Apple Silicon helper that auto-starts `mlx_lm.server` with the Qwen3.6-35B-A3B-4bit model on login. Templated with `__HOME__` and `__MODEL__` placeholders.
+- **`scripts/install-sandwich.sh`** — one-command installer that stages hooks and agents, backs up `settings.json` to `.bak.pre-sandwich`, and jq-merges the settings fragment. Supports `--force`, `--enable-sanhedrin`, `--with-launchd`, `--sanhedrin-endpoint`, `--sanhedrin-model`, `--include-memory-loader`, `--src=PATH`, and still accepts `--no-launchd` for backward compatibility.
+- **`scripts/check-sandwich-prereqs.sh`** — prereq verifier that checks the lightweight hooks by default; pass `--sanhedrin` to also check the optional endpoint and MLX/launchd setup.
 - **`docs/COGNITIVE_SANDWICH.md`** — architecture diagram, install guide, performance notes (82 tok/s on M3 Max), uninstall, configuration env vars.
 - **PR #48** — `VESTIGE_DATA_DIR` env-var support + tilde expansion + secure unix perms (thanks @Jelloeater) — directly addresses the ghost env-vars exposed by v2.0.9 cleanup.

 ### Changed

-- **Sanhedrin Executioner default backend swapped from Anthropic Haiku 4.5 → local `mlx_lm.server` + Qwen3.6-35B-A3B-4bit.** Anthropic API key no longer required for the post-cognitive layer. The `executioner.md` agent definition is retained as manual/fallback only when invoked explicitly via `Task(subagent_type='executioner')`.
+- **Sanhedrin is now opt-in (disabled by default).** Default installs run on x86 and low-memory machines without downloading or starting the 19 GB MLX model. A default reinstall with the v2.1.0 hotfix removes the old mandatory `com.vestige.mlx-server` launchd job if it exists.
+- **Sanhedrin Executioner backend swapped from Anthropic Haiku 4.5 → OpenAI-compatible endpoint, with local `mlx_lm.server` + Qwen3.6-35B-A3B-4bit as the Apple Silicon opt-in path.** Anthropic API key no longer required for the post-cognitive layer. The `executioner.md` agent definition is retained as manual/fallback only when invoked explicitly via `Task(subagent_type='executioner')`.
 - **All hooks sanitized for public release** — replaced hardcoded personal absolute paths with `$HOME` / `$VESTIGE_*` env vars; removed personal regex tokens.
 - **NPM binary installer now follows package version** — `vestige-mcp-server@2.1.0` downloads release assets from `v2.1.0` instead of a stale hardcoded binary tag, while local workspace installs skip the release-asset download before the tag exists.

@@ -41,6 +45,7 @@ The Sanhedrin Executioner — Vestige's veto layer for Claude Code responses —
 - `cargo test --workspace --release --no-fail-fast`: **1,229 passing, 0 failed** (366 vestige-core + 358 vestige-mcp lib + 4 vestige-mcp bin + 497 e2e + 4 doctests).
 - Sanhedrin bridge smoke checks: Python bytecode compilation passes, fail-open bridge invocation returns `yes`, and public hook settings validate as JSON.
 - 8-day Sandwich dogfood: **84% pass rate, 16% legitimate vetoes** caught real hallucinations.
+- 2026-05-01 hotfix checks: `cargo test --workspace --no-fail-fast`, `cargo build --release --workspace`, shell/Python/JSON validation, and default/opt-in installer dry-runs all pass.

 ### Closes

@@ -48,8 +53,14 @@ The Sanhedrin Executioner — Vestige's veto layer for Claude Code responses —

 ### Prerequisites for the Cognitive Sandwich

-- macOS Apple Silicon (M1+) — required for MLX
 - Python 3.10+
+- `jq`
+- `vestige-mcp`
+- Claude Code
+
+Optional local MLX Sanhedrin backend:
+
+- macOS Apple Silicon (M1+) — required for the launchd MLX helper only
 - ~22 GB free RAM (Qwen3.6-35B-A3B-4bit at runtime)
 - First-run model download: ~19 GB from Hugging Face (cached locally thereafter)
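The installer flag rules described in this entry can be summarized in a short Python sketch. It mirrors the argument loop in `scripts/install-sandwich.sh` as shown in this commit: `--with-launchd` implies `--enable-sanhedrin`, `--no-launchd` is still accepted and only clears `--with-launchd`, and one trailing `/` is stripped from the endpoint before `/chat/completions` is swapped for `/models` to build the health-check URL. The names `InstallPlan`, `parse_flags`, `models_url`, and `will_autostart_mlx` are hypothetical, environment-variable defaults are ignored, and flags such as `--force` and `--src=...` are not modeled.

```python
from dataclasses import dataclass

DEFAULT_ENDPOINT = "http://127.0.0.1:8080/v1/chat/completions"
DEFAULT_MODEL = "mlx-community/Qwen3.6-35B-A3B-4bit"


def strip_one_slash(url: str) -> str:
    # Shell ${VAR%/} removes at most one trailing slash.
    return url[:-1] if url.endswith("/") else url


def models_url(endpoint: str) -> str:
    # ${VAR%/chat/completions}/models: drop the suffix only when it is present.
    endpoint = strip_one_slash(endpoint)
    suffix = "/chat/completions"
    if endpoint.endswith(suffix):
        endpoint = endpoint[: -len(suffix)]
    return endpoint + "/models"


@dataclass
class InstallPlan:
    enable_sanhedrin: bool = False
    with_launchd: bool = False
    endpoint: str = DEFAULT_ENDPOINT
    model: str = DEFAULT_MODEL


def parse_flags(args: list[str]) -> InstallPlan:
    plan = InstallPlan()
    for arg in args:
        if arg == "--enable-sanhedrin":
            plan.enable_sanhedrin = True
        elif arg == "--with-launchd":
            plan.with_launchd = True
        elif arg == "--no-launchd":
            plan.with_launchd = False  # legacy flag, kept for compatibility
        elif arg.startswith(("--sanhedrin-endpoint=", "--endpoint=")):
            plan.endpoint = strip_one_slash(arg.split("=", 1)[1])
        elif arg.startswith(("--sanhedrin-model=", "--model=")):
            plan.model = arg.split("=", 1)[1]
        # --force, --src=..., --include-memory-loader are not modeled here.
    # Applied after the loop, as in the script.
    if plan.with_launchd:
        plan.enable_sanhedrin = True
    return plan


def will_autostart_mlx(plan: InstallPlan, os_name: str, arch: str) -> bool:
    # The launchd MLX helper is skipped (with a warning) off Apple Silicon macOS.
    return plan.with_launchd and os_name == "Darwin" and arch == "arm64"


if __name__ == "__main__":
    plan = parse_flags([
        "--with-launchd",
        "--sanhedrin-endpoint=http://127.0.0.1:11434/v1/chat/completions/",
    ])
    print(plan.enable_sanhedrin, models_url(plan.endpoint))
    # -> True http://127.0.0.1:11434/v1/models
```

Because the implication is applied after the loop, `--with-launchd --no-launchd` leaves Sanhedrin disabled, matching the ordering in the shell script.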

--- a/docs/COGNITIVE_SANDWICH.md
+++ b/docs/COGNITIVE_SANDWICH.md
@@ -17,12 +17,12 @@ The Cognitive Sandwich wraps every Claude Code response in two layers of cogniti
 ├────────────────────────────────────────────────┤
 │  🥪 BOTTOM BREAD — Stop hooks                   │
 │   • Veto-detector (fast 50ms regex pre-screen)  │
-│   • Sanhedrin Executioner (LOCAL Qwen3.6-35B)   │
+│   • Sanhedrin Executioner (optional verifier)   │
 │   • Synthesis stop validator (hedge detector)   │
 └────────────────────────────────────────────────┘
 ```

-The Sanhedrin Executioner is the headline of v2.1.0. As of v2.1.0 it runs entirely on a local MLX model (`mlx-community/Qwen3.6-35B-A3B-4bit`), replacing the v2.0.x Haiku 4.5 subagent. **Zero API cost per Claude turn, fully offline, ~5–15s verdict latency on M-series Apple Silicon.**
+Sanhedrin is optional. The default installer wires the lightweight preflight and stop hooks only; it does not start MLX, require a 19 GB model download, or require 20+ GB of RAM. Users who want the post-response semantic verifier can opt in and point it at any OpenAI-compatible `/v1/chat/completions` endpoint. On Apple Silicon, an additional `--with-launchd` flag can auto-start the local MLX Qwen backend.

 ---

@@ -38,9 +38,9 @@ The Sanhedrin Executioner is the headline of v2.1.0. As of v2.1.0 it runs entire
 3. **Claude reads the assembled context and generates a draft.**
 4. **Stop hooks fire serially** (any can VETO with `exit 2`, forcing a rewrite):
   - `veto-detector.sh` — fast regex against `veto`-tagged Vestige memories (~50ms)
-   - `sanhedrin.sh` → `sanhedrin-local.py` — single-shot local Qwen3.6-35B-A3B verdict
+   - `sanhedrin.sh` → `sanhedrin-local.py` — optional single-shot semantic verdict
   - `synthesis-stop-validator.sh` — regex against forbidden patterns (hedging, summary-instead-of-composition)
-5. **If all 3 Stop hooks return `exit 0`, the response is delivered.**
+5. **If all enabled Stop hooks return `exit 0`, the response is delivered.**

 ---

@@ -85,27 +85,50 @@ cd vestige
 ./scripts/check-sandwich-prereqs.sh     # verify everything's wired
 ```

+### Optional Sanhedrin
+
+Sanhedrin is a separate opt-in layer.
+
+```bash
+# Wire the Sanhedrin Stop hook against the default endpoint (http://127.0.0.1:8080/v1/chat/completions).
+./scripts/install-sandwich.sh --enable-sanhedrin
+
+# Apple Silicon only, and only on a machine with 20+ GB of free RAM:
+./scripts/install-sandwich.sh --enable-sanhedrin --with-launchd
+
+# x86 / Linux / Intel Mac: use any OpenAI-compatible endpoint.
+./scripts/install-sandwich.sh \
+  --enable-sanhedrin \
+  --sanhedrin-endpoint=http://127.0.0.1:11434/v1/chat/completions \
+  --sanhedrin-model=qwen2.5:14b
+```
+
 ### Prerequisites

 | Tool | Install |
 |---|---|
-| macOS Apple Silicon (M1+) | required for MLX |
 | Python 3.10+ | typically preinstalled |
 | `jq` | `brew install jq` |
+| `vestige-mcp` | `cargo install vestige-mcp` |
+| Claude Code | https://claude.ai/code |
+
+Optional Apple Silicon local Sanhedrin backend:
+
+| Tool | Install |
+|---|---|
+| macOS Apple Silicon (M1+) | required for MLX launchd only |
 | `uv` | `brew install uv` |
 | `mlx-lm` | `uv tool install mlx-lm` |
 | `huggingface_hub[cli]` | `uv tool install 'huggingface_hub[cli]'` |
-| `vestige-mcp` | `cargo install vestige-mcp` |
-| Claude Code | https://claude.ai/code |
 | Qwen3.6-35B-A3B-4bit | `hf download mlx-community/Qwen3.6-35B-A3B-4bit` (~19 GB) |

 ### What the installer does

 1. Verifies prereqs (warnings for missing tools, fatal only on jq/python3).
 2. Copies hooks to `~/.claude/hooks/`, agents to `~/.claude/agents/`.
-3. Renders `launchd/com.vestige.mlx-server.plist.template` with your `$HOME` and chosen model, writes to `~/Library/LaunchAgents/`.
-4. `launchctl load` the plist (auto-start mlx_lm.server with the Qwen model on boot).
-5. Backs up existing `~/.claude/settings.json` to `.bak.pre-sandwich`, then `jq`-merges the hooks block.
+3. Backs up existing `~/.claude/settings.json` to `.bak.pre-sandwich`, then `jq`-merges the lightweight hooks block.
+4. With `--enable-sanhedrin`, writes `~/.claude/hooks/vestige-sanhedrin.env` and merges a Sanhedrin-enabled hooks block.
+5. With `--enable-sanhedrin --with-launchd` on Apple Silicon, renders and loads `launchd/com.vestige.mlx-server.plist.template`.

 ### Uninstall

@@ -120,7 +143,7 @@ cp ~/.claude/settings.json.bak.pre-sandwich ~/.claude/settings.json

 ## Performance notes

-On M3 Max 16-core (400 GB/s memory bandwidth):
+Optional local MLX backend on M3 Max 16-core (400 GB/s memory bandwidth):
 - Sanhedrin verdict: 5–15 seconds end-to-end (single deep_reference + single Qwen call)
 - mlx_lm.server token generation: ~82 tok/s
 - mlx_lm.server peak resident memory: ~19.7 GB
@@ -134,11 +157,12 @@ On M3 Max 14-core or M2/M1 Max: closer to 3–7s prompt processing, ~50–60 tok

 | Env var | Default | Effect |
 |---|---|---|
-| `VESTIGE_SANHEDRIN_ENABLED` | `1` | Set to `0` to disable Sanhedrin Stop hook entirely |
+| `VESTIGE_SANHEDRIN_ENABLED` | `0` | Set to `1` to enable the optional Sanhedrin Stop hook |
 | `VESTIGE_SWARM_ENABLED` | `1` | Set to `0` to disable preflight lateral-thinker swarm |
 | `VESTIGE_DASHBOARD_PORT` | `3927` | Vestige MCP HTTP API port used by hooks |
-| `MLX_ENDPOINT` | `http://127.0.0.1:8080/v1/chat/completions` | OpenAI-compatible chat completions endpoint for Sanhedrin |
-| `VESTIGE_SANDWICH_MODEL` | `mlx-community/Qwen3.6-35B-A3B-4bit` | Model launchd serves and Sanhedrin requests |
+| `VESTIGE_SANHEDRIN_ENDPOINT` | `http://127.0.0.1:8080/v1/chat/completions` | OpenAI-compatible chat completions endpoint for Sanhedrin |
+| `VESTIGE_SANHEDRIN_MODEL` | `mlx-community/Qwen3.6-35B-A3B-4bit` | Model name sent to the Sanhedrin endpoint |
+| `MLX_ENDPOINT` / `VESTIGE_SANDWICH_MODEL` / `MLX_TIMEOUT` | legacy aliases | Older names, still honored when the matching `VESTIGE_SANHEDRIN_*` variable is unset |
 | `VESTIGE_MEMORY_DIR` | (auto) | Override per-user Claude memory dir |

 ---
@@ -151,11 +175,12 @@ Full architecture memory: search Vestige for `god-tier-plan` or `cognitive-sandw

 ---

-## Linux / Intel Mac
+## Linux / Intel Mac / x86

-The launchd layer is macOS-arm64-only. On Linux or Intel Mac:
-- Hooks + agents install fine with `--no-launchd`
-- The Sanhedrin Stop hook will fail-open (mlx-server unreachable → exit 0)
-- Optional: run a remote mlx_lm.server / vLLM / Ollama OpenAI-compatible endpoint and set `MLX_ENDPOINT` to its `/v1/chat/completions` URL
+The base hook harness runs on x86. The launchd MLX helper is macOS-arm64-only.

-Future v2.2.0 will add Linux-native MLX equivalents.
+On Linux, Windows under WSL, or Intel Mac:
+- Run `scripts/install-sandwich.sh` normally for lightweight hooks.
+- If you want Sanhedrin, run an OpenAI-compatible endpoint such as vLLM, Ollama, llama.cpp server, or a remote MLX/vLLM box.
+- Install with `--enable-sanhedrin --sanhedrin-endpoint=<url> --sanhedrin-model=<model>`.
+- If the endpoint is unreachable, Sanhedrin fails open and does not block Claude Code.
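Each Sanhedrin setting in the configuration table is resolved new-name-first: the `VESTIGE_SANHEDRIN_*` variable, then its legacy alias, then the built-in default, with an empty value treated as unset (the `or` chains in `sanhedrin-local.py` and the nested `${A:-${B:-default}}` expansions in the scripts both behave this way). A minimal Python sketch of that order, assuming a plain dict in place of `os.environ`; `resolve` and `sanhedrin_config` are hypothetical helpers, not functions in the bridge.

```python
from typing import Mapping

DEFAULT_ENDPOINT = "http://127.0.0.1:8080/v1/chat/completions"
DEFAULT_MODEL = "mlx-community/Qwen3.6-35B-A3B-4bit"


def resolve(env: Mapping[str, str], *names: str, default: str) -> str:
    """Return the first non-empty value among `names`, else `default`.

    Mirrors `os.environ.get(A) or os.environ.get(B) or DEFAULT`:
    an empty string counts as unset.
    """
    for name in names:
        value = env.get(name)
        if value:
            return value
    return default


def sanhedrin_config(env: Mapping[str, str]) -> dict:
    # New names win over the legacy MLX_ENDPOINT / VESTIGE_SANDWICH_MODEL aliases.
    return {
        "endpoint": resolve(env, "VESTIGE_SANHEDRIN_ENDPOINT", "MLX_ENDPOINT",
                            default=DEFAULT_ENDPOINT),
        "model": resolve(env, "VESTIGE_SANHEDRIN_MODEL", "VESTIGE_SANDWICH_MODEL",
                         default=DEFAULT_MODEL),
    }


if __name__ == "__main__":
    # Hypothetical remote host, set only through the legacy alias.
    legacy_only = {"MLX_ENDPOINT": "http://gpu-box:8000/v1/chat/completions"}
    print(sanhedrin_config(legacy_only)["endpoint"])
    # -> http://gpu-box:8000/v1/chat/completions
```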
--- a/hooks/sanhedrin-local.py
+++ b/hooks/sanhedrin-local.py
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-# sanhedrin-local.py — Local Qwen3.6-35B-A3B Sanhedrin Executioner.
+# sanhedrin-local.py — OpenAI-compatible Sanhedrin Executioner bridge.
 # Drop-in replacement for the Haiku 4.5 subagent that sanhedrin.sh used to spawn.
 #
 # Reads draft from stdin, prints single-line verdict to stdout:
@@ -8,10 +8,10 @@
 #
 # Architecture:
 #   stdin (draft) -> Vestige /api/deep_reference (single semantic query)
-#                 -> mlx_lm.server localhost:8080 (one-shot judgment)
+#                 -> OpenAI-compatible chat endpoint (one-shot judgment)
 #                 -> stdout (single-line verdict)
 #
-# Fail-open: if mlx-server unreachable, print "yes" and exit 0 (don't break
+# Fail-open: if the endpoint is unreachable, print "yes" and exit 0 (don't break
 # the Cognitive Sandwich on infra errors). The wrapping sanhedrin.sh maps
 # "yes" to exit 0, so this preserves existing fail-open semantics.

@@ -35,7 +35,11 @@ VESTIGE_BASE_URL = (
    os.environ.get("VESTIGE_BASE_URL") or f"http://127.0.0.1:{DASHBOARD_PORT}"
 ).rstrip("/")

-MLX_ENDPOINT = os.environ.get("MLX_ENDPOINT") or "http://127.0.0.1:8080/v1/chat/completions"
+SANHEDRIN_ENDPOINT = (
+    os.environ.get("VESTIGE_SANHEDRIN_ENDPOINT")
+    or os.environ.get("MLX_ENDPOINT")
+    or "http://127.0.0.1:8080/v1/chat/completions"
+)
 VESTIGE_ENDPOINT = (
    os.environ.get("VESTIGE_DEEP_REFERENCE_ENDPOINT")
    or f"{VESTIGE_BASE_URL}/api/deep_reference"
@@ -43,8 +47,12 @@ VESTIGE_ENDPOINT = (
 VESTIGE_HEALTH = (
    os.environ.get("VESTIGE_HEALTH_ENDPOINT") or f"{VESTIGE_BASE_URL}/api/health"
 )
-MODEL = os.environ.get("VESTIGE_SANDWICH_MODEL") or "mlx-community/Qwen3.6-35B-A3B-4bit"
-MLX_TIMEOUT = env_int("MLX_TIMEOUT", 45)
+MODEL = (
+    os.environ.get("VESTIGE_SANHEDRIN_MODEL")
+    or os.environ.get("VESTIGE_SANDWICH_MODEL")
+    or "mlx-community/Qwen3.6-35B-A3B-4bit"
+)
+SANHEDRIN_TIMEOUT = env_int("VESTIGE_SANHEDRIN_TIMEOUT", env_int("MLX_TIMEOUT", 45))
 VESTIGE_TIMEOUT = env_int("VESTIGE_TIMEOUT", 5)
 THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)

@@ -289,7 +297,7 @@ def judge(draft: str, evidence: str) -> str:
            "\n\nOn second thought", "\n\nOh wait",
        ],
    }
-    resp = post_json(MLX_ENDPOINT, body, MLX_TIMEOUT)
+    resp = post_json(SANHEDRIN_ENDPOINT, body, SANHEDRIN_TIMEOUT)
    if not isinstance(resp, dict):
        return ""
    try:
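`judge()` posts an OpenAI-compatible chat-completions body and gets a JSON object back; the rest of the function is elided in this diff. The sketch below shows one way to reduce such a response to a single-line verdict, assuming the standard `choices[0].message.content` response shape and reusing the same `THINK_RE` pattern to drop Qwen-style `<think>` blocks. `extract_verdict` is a hypothetical name, and falling back to `"yes"` follows the documented fail-open rule rather than code visible here.

```python
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def extract_verdict(resp: object) -> str:
    """Return the first non-empty line of the model reply, or "yes" (fail-open)."""
    if not isinstance(resp, dict):
        return "yes"
    try:
        content = resp["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        return "yes"
    if not isinstance(content, str):
        return "yes"
    # Remove hidden reasoning before looking for the verdict line.
    visible = THINK_RE.sub("", content)
    for line in visible.splitlines():
        line = line.strip()
        if line:
            return line
    return "yes"


if __name__ == "__main__":
    sample = {"choices": [{"message": {"content": "<think>check memory</think>\nyes\n"}}]}
    print(extract_verdict(sample))  # -> yes
```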
--- a/hooks/sanhedrin.sh
+++ b/hooks/sanhedrin.sh
@@ -17,25 +17,37 @@
 #   sanhedrin.sh (2-8s Haiku subagent, may block) →
 #   synthesis-stop-validator.sh (existing regex hedge check, may block)
 #
-# Opt-in: set VESTIGE_SANHEDRIN_ENABLED=1 in parent shell.
+# Opt-in: set VESTIGE_SANHEDRIN_ENABLED=1 in parent shell, or install with
+# scripts/install-sandwich.sh --enable-sanhedrin.
 # Re-entrancy lock: VESTIGE_EXECUTIONER_ACTIVE=1 inside the subagent.
 #
 # Ship date 2026-04-20.

 set -u

-# === OPT-OUT GATE ===
-# Post-Cognitive Sanhedrin is ON by default as of 2026-04-21 (birthday
-# launch day). To disable, set VESTIGE_SANHEDRIN_ENABLED=0 in your
-# environment. Default-on guarantees the Cognitive Sandwich fires on
-# fresh machines, Docker containers, GUI-launched Claude Code, and
-# shells without .zshrc — any case where the Claude Code process lacks
-# a sourced profile. The re-entrancy guard (VESTIGE_EXECUTIONER_ACTIVE)
-# below still prevents fork-bombs from the subagent's own Stop hook.
-if [ "${VESTIGE_SANHEDRIN_ENABLED:-1}" = "0" ]; then
-  exit 0
+# === OPT-IN GATE ===
+# Sanhedrin is heavyweight: the default local backend is a ~19 GB model and
+# needs 20+ GB of free RAM. Keep it disabled unless the user explicitly
+# opts in. The installer writes this env file only for --enable-sanhedrin.
+SANHEDRIN_ENV="${VESTIGE_SANHEDRIN_ENV:-$HOME/.claude/hooks/vestige-sanhedrin.env}"
+if [ -f "$SANHEDRIN_ENV" ]; then
+  set +u
+  set -a
+  # shellcheck disable=SC1090
+  . "$SANHEDRIN_ENV" 2>/dev/null || {
+    set +a
+    set -u
+    exit 0
+  }
+  set +a
+  set -u
 fi

+case "${VESTIGE_SANHEDRIN_ENABLED:-0}" in
+  1|true|TRUE|yes|YES|on|ON) ;;
+  *) exit 0 ;;
+esac
+
 # === RE-ENTRANCY GUARD ===
 # The Executioner's own Stop hook will fire when it returns — prevent
 # recursive spawns that would fork-bomb the quota.
@@ -114,11 +126,11 @@ if [ -z "$DRAFT" ]; then
 fi

 # === VERIFY local executioner bridge available ===
-# 2026-04-25: switched from Haiku 4.5 subagent to local Qwen3.6-35B-A3B
-# via mlx_lm.server (launchd com.vestige.mlx-server). Bridge script
-# fetches Vestige evidence via HTTP API (VESTIGE_DASHBOARD_PORT, default 3927)
-# then judges via MLX_ENDPOINT (default port 8080). Zero per-token cost, fully offline,
-# sub-second-to-15s verdict latency. Fail-open if mlx-server unreachable.
+# 2026-04-25: switched from Haiku 4.5 subagent to an OpenAI-compatible
+# local/remote endpoint. On Apple Silicon the optional launchd path starts
+# mlx_lm.server; on x86 users can point VESTIGE_SANHEDRIN_ENDPOINT at vLLM,
+# Ollama, llama.cpp, or any compatible /v1/chat/completions endpoint.
+# Fail-open if the endpoint is unreachable.
 BRIDGE="$HOME/.claude/hooks/sanhedrin-local.py"
 if [ ! -x "$BRIDGE" ] && [ ! -f "$BRIDGE" ]; then
  exit 0
@@ -191,15 +203,15 @@ case "$TRIMMED" in

 $REASON

-The Executioner (local Qwen3.6-35B-A3B via mlx_lm.server, fresh context,
-fed Vestige deep_reference evidence over HTTP) judged your draft and
+The Executioner (Sanhedrin endpoint, fresh context, fed Vestige
+deep_reference evidence over HTTP) judged your draft and
 found a contradiction against a high-trust memory.

 You may NOT stop. Rewrite WITHOUT the contradicted claim. Use
 mcp__vestige__deep_reference to inspect the cited memory and cite the
 correct replacement pattern from its \`recommended\` field.

-Local-only, zero API cost, fully offline. Bridge script:
+Bridge script:
 ~/.claude/hooks/sanhedrin-local.py
 SANHEDRIN_MSG
    exit 2
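The opt-in gate in `sanhedrin.sh` accepts only the exact spellings listed in its `case` pattern, so values such as `True` or `enabled` leave Sanhedrin off. A Python mirror of that check, useful for reasoning about which settings activate the hook; `sanhedrin_enabled` is a hypothetical helper, not part of the hook.

```python
from typing import Mapping

# Exactly the alternatives in sanhedrin.sh: 1|true|TRUE|yes|YES|on|ON
TRUTHY = {"1", "true", "TRUE", "yes", "YES", "on", "ON"}


def sanhedrin_enabled(env: Mapping[str, str]) -> bool:
    # ${VESTIGE_SANHEDRIN_ENABLED:-0}: unset or empty falls back to "0" (off).
    value = env.get("VESTIGE_SANHEDRIN_ENABLED") or "0"
    return value in TRUTHY


if __name__ == "__main__":
    for v in ("1", "ON", "True", "0", ""):
        print(repr(v), sanhedrin_enabled({"VESTIGE_SANHEDRIN_ENABLED": v}))
```

Because the hook sources `vestige-sanhedrin.env` with `set -a` before evaluating the `case`, an assignment to `VESTIGE_SANHEDRIN_ENABLED` inside that file takes precedence over the value inherited from the parent process.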
--- a/hooks/settings.fragment.json
+++ b/hooks/settings.fragment.json
@@ -14,7 +14,6 @@
      {
        "hooks": [
          { "type": "command", "command": "$HOME/.claude/hooks/veto-detector.sh", "timeout": 6 },
-          { "type": "command", "command": "$HOME/.claude/hooks/sanhedrin.sh", "timeout": 70 },
          { "type": "command", "command": "$HOME/.claude/hooks/synthesis-stop-validator.sh", "timeout": 6 }
        ]
      }
--- a/hooks/settings.sanhedrin.fragment.json
+++ b/hooks/settings.sanhedrin.fragment.json
@@ -0,0 +1,23 @@
+{
+  "hooks": {
+    "UserPromptSubmit": [
+      {
+        "hooks": [
+          { "type": "command", "command": "$HOME/.claude/hooks/synthesis-preflight.sh", "timeout": 8 },
+          { "type": "command", "command": "$HOME/.claude/hooks/cwd-state-injector.sh", "timeout": 8 },
+          { "type": "command", "command": "$HOME/.claude/hooks/vestige-pulse-daemon.sh", "timeout": 6 },
+          { "type": "command", "command": "$HOME/.claude/hooks/preflight-swarm.sh", "timeout": 45 }
+        ]
+      }
+    ],
+    "Stop": [
+      {
+        "hooks": [
+          { "type": "command", "command": "$HOME/.claude/hooks/veto-detector.sh", "timeout": 6 },
+          { "type": "command", "command": "VESTIGE_SANHEDRIN_ENABLED=1 $HOME/.claude/hooks/sanhedrin.sh", "timeout": 70 },
+          { "type": "command", "command": "$HOME/.claude/hooks/synthesis-stop-validator.sh", "timeout": 6 }
+        ]
+      }
+    ]
+  }
+}
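Whether Sanhedrin is wired comes down to whether any `Stop` hook command mentions `sanhedrin.sh`, so the test must be an any-match across every command. Note that `jq -e` derives its exit status from the last value it emits; a filter that emits one boolean per command is therefore decided by `synthesis-stop-validator.sh`, the final `Stop` command in this fragment. A minimal Python equivalent of the any-match, assuming the `settings.json` layout used by these fragments; `sanhedrin_wired` is a hypothetical name.

```python
import json
from typing import Any


def sanhedrin_wired(settings: dict[str, Any]) -> bool:
    """True if any Stop hook command references sanhedrin.sh."""
    stop_groups = (settings.get("hooks") or {}).get("Stop") or []
    for group in stop_groups:
        for hook in group.get("hooks") or []:
            command = hook.get("command") or ""
            if "sanhedrin.sh" in command:
                return True
    return False


if __name__ == "__main__":
    fragment = json.loads("""
    {"hooks": {"Stop": [{"hooks": [
      {"type": "command", "command": "$HOME/.claude/hooks/veto-detector.sh", "timeout": 6},
      {"type": "command", "command": "VESTIGE_SANHEDRIN_ENABLED=1 $HOME/.claude/hooks/sanhedrin.sh", "timeout": 70},
      {"type": "command", "command": "$HOME/.claude/hooks/synthesis-stop-validator.sh", "timeout": 6}
    ]}]}}
    """)
    print(sanhedrin_wired(fragment))  # -> True
```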
--- a/scripts/check-sandwich-prereqs.sh
+++ b/scripts/check-sandwich-prereqs.sh
@@ -5,21 +5,50 @@ set -u
 ok()   { printf '  \033[1;32m[ OK ]\033[0m %s\n' "$*"; }
 warn() { printf '  \033[1;33m[WARN]\033[0m %s\n' "$*"; FAIL=1; }
 miss() { printf '  \033[1;31m[MISS]\033[0m %s\n' "$*"; FAIL=1; }
+info() { printf '  \033[1;36m[INFO]\033[0m %s\n' "$*"; }

 FAIL=0
+CHECK_SANHEDRIN=0
 DASHBOARD_PORT="${VESTIGE_DASHBOARD_PORT:-3927}"
-MLX_ENDPOINT="${MLX_ENDPOINT:-http://127.0.0.1:8080/v1/chat/completions}"
-MLX_ENDPOINT="${MLX_ENDPOINT%/}"
-MLX_MODELS_URL="${MLX_ENDPOINT%/chat/completions}/models"
+SANHEDRIN_ENV="${VESTIGE_SANHEDRIN_ENV:-$HOME/.claude/hooks/vestige-sanhedrin.env}"
+
+for arg in "$@"; do
+  case "$arg" in
+    --sanhedrin|--enable-sanhedrin) CHECK_SANHEDRIN=1 ;;
+    -h|--help)
+      cat <<'EOF'
+Usage: scripts/check-sandwich-prereqs.sh [--sanhedrin]
+
+Without flags, checks the lightweight Cognitive Sandwich hooks.
+With --sanhedrin, also checks the optional OpenAI-compatible verifier endpoint.
+EOF
+      exit 0
+      ;;
+  esac
+done
+
+if [ -f "$SANHEDRIN_ENV" ]; then
+  set +u
+  set -a
+  # shellcheck disable=SC1090
+  . "$SANHEDRIN_ENV" 2>/dev/null || true
+  set +a
+  set -u
+fi
+
+SANHEDRIN_ENDPOINT="${VESTIGE_SANHEDRIN_ENDPOINT:-${MLX_ENDPOINT:-http://127.0.0.1:8080/v1/chat/completions}}"
+SANHEDRIN_ENDPOINT="${SANHEDRIN_ENDPOINT%/}"
+SANHEDRIN_MODELS_URL="${SANHEDRIN_ENDPOINT%/chat/completions}/models"

 echo "Vestige Cognitive Sandwich — Prereq Check"
 echo

 # Platform
-if [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ]; then
-  ok "Apple Silicon macOS ($(sw_vers -productVersion 2>/dev/null || echo darwin))"
-else
-  miss "Apple Silicon Mac required (M1+). Detected $(uname -s) $(uname -m)."
+OS_NAME="$(uname -s)"
+ARCH_NAME="$(uname -m)"
+ok "Platform: $OS_NAME $ARCH_NAME"
+if [ "$OS_NAME" != "Darwin" ] || [ "$ARCH_NAME" != "arm64" ]; then
+  info "Local MLX launchd is Apple Silicon-only; base hooks and endpoint-backed Sanhedrin can run on x86."
 fi

 # Python
@@ -35,24 +64,9 @@ fi

 # CLI tools
 command -v jq            >/dev/null && ok "jq"            || miss "jq missing — brew install jq"
-command -v uv            >/dev/null && ok "uv"            || miss "uv missing — brew install uv"
-command -v mlx_lm.server >/dev/null && ok "mlx-lm"        || miss "mlx-lm — uv tool install mlx-lm"
-command -v hf            >/dev/null && ok "huggingface_hub CLI" || miss "hf — uv tool install 'huggingface_hub[cli]'"
 command -v claude        >/dev/null && ok "claude CLI"    || miss "claude CLI — install Claude Code"
 command -v vestige-mcp   >/dev/null && ok "vestige-mcp"   || miss "vestige-mcp — cargo install vestige-mcp"

-# Model on disk — HF cache uses `models--<org>--<name>` (double-dash separators).
-MODEL="${VESTIGE_SANDWICH_MODEL:-mlx-community/Qwen3.6-35B-A3B-4bit}"
-HF_HOME_DEFAULT="${HF_HOME:-$HOME/.cache/huggingface}"
-ENC_MODEL="models--$(printf '%s' "$MODEL" | sed 's|/|--|g')"
-if [ -d "$HF_HOME_DEFAULT/hub/$ENC_MODEL" ]; then
-  ok "Model cached: $MODEL"
-else
-  printf '  \033[1;33m[INFO]\033[0m Model not yet downloaded — first run will fetch ~19GB\n'
-  printf '         hf download %s\n' "$MODEL"
-  # NOT a failure — first-run download is expected.
-fi
-
 # Vestige MCP HTTP API
 if curl -fsS -m 2 "http://127.0.0.1:${DASHBOARD_PORT}/api/health" >/dev/null 2>&1; then
  ok "vestige-mcp dashboard responding on :$DASHBOARD_PORT"
@@ -60,20 +74,6 @@ else
  warn "vestige-mcp dashboard not responding on :$DASHBOARD_PORT"
 fi

-# OpenAI-compatible local/remote model endpoint
-if curl -fsS -m 2 "$MLX_MODELS_URL" >/dev/null 2>&1; then
-  ok "model endpoint responding at $MLX_MODELS_URL"
-else
-  warn "model endpoint not responding at $MLX_MODELS_URL — install + load launchd plist or set MLX_ENDPOINT"
-fi
-
-# launchd plist
-if [ -f "$HOME/Library/LaunchAgents/com.vestige.mlx-server.plist" ]; then
-  ok "launchd plist installed"
-else
-  warn "launchd plist missing — run: install-sandwich.sh"
-fi
-
 # Settings hook wiring
 if [ -f "$HOME/.claude/settings.json" ] && \
   jq -e '.hooks.UserPromptSubmit and .hooks.Stop' "$HOME/.claude/settings.json" >/dev/null 2>&1; then
@@ -82,9 +82,59 @@ else
  warn "settings.json missing hooks block — run: install-sandwich.sh"
 fi

+if [ "$CHECK_SANHEDRIN" -eq 1 ]; then
+  echo
+  echo "Optional Sanhedrin"
+
+  if [ -f "$SANHEDRIN_ENV" ]; then
+    ok "Sanhedrin env file present"
+  else
+    warn "Sanhedrin env file missing — run: install-sandwich.sh --enable-sanhedrin"
+  fi
+
+  if [ "$OS_NAME" = "Darwin" ] && [ "$ARCH_NAME" = "arm64" ]; then
+    command -v uv            >/dev/null && ok "uv"            || warn "uv missing — brew install uv"
+    command -v mlx_lm.server >/dev/null && ok "mlx-lm"        || warn "mlx-lm — uv tool install mlx-lm"
+    command -v hf            >/dev/null && ok "huggingface_hub CLI" || warn "hf — uv tool install 'huggingface_hub[cli]'"
+
+    MODEL="${VESTIGE_SANHEDRIN_MODEL:-${VESTIGE_SANDWICH_MODEL:-mlx-community/Qwen3.6-35B-A3B-4bit}}"
+    HF_HOME_DEFAULT="${HF_HOME:-$HOME/.cache/huggingface}"
+    ENC_MODEL="models--$(printf '%s' "$MODEL" | sed 's|/|--|g')"
+    if [ -d "$HF_HOME_DEFAULT/hub/$ENC_MODEL" ]; then
+      ok "Model cached: $MODEL"
+    else
+      info "Model not cached: $MODEL (local MLX path downloads ~19GB)"
+    fi
+
+    if [ -f "$HOME/Library/LaunchAgents/com.vestige.mlx-server.plist" ]; then
+      ok "launchd plist installed"
+    else
+      info "launchd plist not installed; endpoint-backed Sanhedrin can still run"
+    fi
+  else
+    info "Skipping MLX/launchd checks on $OS_NAME $ARCH_NAME"
+  fi
+
+  if curl -fsS -m 2 "$SANHEDRIN_MODELS_URL" >/dev/null 2>&1; then
+    ok "Sanhedrin model endpoint responding at $SANHEDRIN_MODELS_URL"
+  else
+    warn "Sanhedrin endpoint not responding at $SANHEDRIN_MODELS_URL"
+  fi
+
+  if [ -f "$HOME/.claude/settings.json" ] && \
+     jq -e 'any(.hooks.Stop[]?.hooks[]?; (.command // "") | contains("sanhedrin.sh"))' "$HOME/.claude/settings.json" >/dev/null 2>&1; then
+    ok "Sanhedrin Stop hook wired"
+  else
+    warn "Sanhedrin Stop hook not wired — run: install-sandwich.sh --enable-sanhedrin"
+  fi
+else
+  echo
+  info "Sanhedrin is optional and not checked. Use --sanhedrin to verify an enabled endpoint."
+fi
+
 echo
 if [ $FAIL -eq 0 ]; then
-  echo "  Ready. Cognitive Sandwich will fire on next Claude Code prompt."
+  echo "  Ready. Lightweight Cognitive Sandwich hooks will fire on next Claude Code prompt."
  exit 0
 else
  echo "  Fix the items above, then re-run."
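The optional MLX check looks for the model in the Hugging Face hub cache, whose directory names replace each `/` in the repo id with `--` and add a `models--` prefix. A Python sketch of the same path the script builds with `sed`, honoring `HF_HOME` the way the script does (it does not consult other cache variables); `cached_model_dir` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Mapping


def cached_model_dir(model_id: str, env: Mapping[str, str] = os.environ) -> Path:
    # ${HF_HOME:-$HOME/.cache/huggingface}: empty or unset HF_HOME uses the default.
    hf_home = env.get("HF_HOME") or str(Path.home() / ".cache" / "huggingface")
    # sed 's|/|--|g' replaces every slash, not just the first.
    encoded = "models--" + model_id.replace("/", "--")
    return Path(hf_home) / "hub" / encoded


if __name__ == "__main__":
    path = cached_model_dir("mlx-community/Qwen3.6-35B-A3B-4bit")
    print(path, "exists" if path.is_dir() else "missing (first run downloads ~19 GB)")
```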
--- a/scripts/install-sandwich.sh
+++ b/scripts/install-sandwich.sh
@@ -4,26 +4,26 @@
 # Usage:
 #   curl -fsSL https://raw.githubusercontent.com/samvallad33/vestige/v2.1.0/scripts/install-sandwich.sh | sh
 #   # or, from a checkout:
-#   ./scripts/install-sandwich.sh [--force] [--no-launchd] [--include-memory-loader]
+#   ./scripts/install-sandwich.sh [--force] [--enable-sanhedrin] [--with-launchd] [--include-memory-loader]
+#   ./scripts/install-sandwich.sh --enable-sanhedrin --sanhedrin-endpoint=http://127.0.0.1:11434/v1/chat/completions --sanhedrin-model=qwen2.5:14b
 #
 # What it does:
 #   1. Verifies required local tools
-#   2. Stages ~/.claude/hooks/, ~/.claude/agents/, ~/Library/LaunchAgents/
+#   2. Stages ~/.claude/hooks/ and ~/.claude/agents/
 #   3. Copies sanitized hooks + agents
-#   4. Renders launchd plist template with $HOME and chosen MODEL
-#   5. Merges hooks block into ~/.claude/settings.json (preserves existing keys)
-#   6. launchctl load com.vestige.mlx-server (auto-starts mlx_lm.server with Qwen3.6-35B-A3B)
-#   7. Prints next-steps for model download
+#   4. Merges the lightweight hooks block into ~/.claude/settings.json
+#   5. Optionally enables Sanhedrin and, only with --with-launchd on Apple Silicon,
+#      auto-starts mlx_lm.server with Qwen3.6-35B-A3B

 set -euo pipefail

 VERSION="${VESTIGE_SANDWICH_VERSION:-v2.1.0}"
 REPO="samvallad33/vestige"
-MODEL_ID="${VESTIGE_SANDWICH_MODEL:-mlx-community/Qwen3.6-35B-A3B-4bit}"
+MODEL_ID="${VESTIGE_SANHEDRIN_MODEL:-${VESTIGE_SANDWICH_MODEL:-mlx-community/Qwen3.6-35B-A3B-4bit}}"
 DASHBOARD_PORT="${VESTIGE_DASHBOARD_PORT:-3927}"
-MLX_ENDPOINT="${MLX_ENDPOINT:-http://127.0.0.1:8080/v1/chat/completions}"
-MLX_ENDPOINT="${MLX_ENDPOINT%/}"
-MLX_MODELS_URL="${MLX_ENDPOINT%/chat/completions}/models"
+SANHEDRIN_ENDPOINT="${VESTIGE_SANHEDRIN_ENDPOINT:-${MLX_ENDPOINT:-http://127.0.0.1:8080/v1/chat/completions}}"
+SANHEDRIN_ENDPOINT="${SANHEDRIN_ENDPOINT%/}"
+SANHEDRIN_MODELS_URL="${SANHEDRIN_ENDPOINT%/chat/completions}/models"

 HOOKS_DIR="$HOME/.claude/hooks"
 AGENTS_DIR="$HOME/.claude/agents"
@@ -31,35 +31,53 @@ LAUNCHD_DIR="$HOME/Library/LaunchAgents"
 SETTINGS="$HOME/.claude/settings.json"

 FORCE=0
-NO_LAUNCHD=0
+ENABLE_SANHEDRIN=0
+WITH_LAUNCHD=0
 INCLUDE_MEMORY_LOADER=0
 SRC=""

 for arg in "$@"; do
  case "$arg" in
    --force) FORCE=1 ;;
-    --no-launchd) NO_LAUNCHD=1 ;;
+    --enable-sanhedrin) ENABLE_SANHEDRIN=1 ;;
+    --with-launchd) WITH_LAUNCHD=1 ;;
+    --no-launchd) WITH_LAUNCHD=0 ;;
    --include-memory-loader) INCLUDE_MEMORY_LOADER=1 ;;
+    --sanhedrin-endpoint=*|--endpoint=*)
+      SANHEDRIN_ENDPOINT="${arg#*=}"
+      SANHEDRIN_ENDPOINT="${SANHEDRIN_ENDPOINT%/}"
+      SANHEDRIN_MODELS_URL="${SANHEDRIN_ENDPOINT%/chat/completions}/models"
+      ;;
+    --sanhedrin-model=*|--model=*)
+      MODEL_ID="${arg#*=}"
+      ;;
    --src=*) SRC="${arg#--src=}" ;;
    -h|--help)
-      sed -n '2,20p' "$0"
+      sed -n '2,24p' "$0"
      exit 0
      ;;
  esac
 done

+if [ "$WITH_LAUNCHD" -eq 1 ] && [ "$ENABLE_SANHEDRIN" -eq 0 ]; then
+  ENABLE_SANHEDRIN=1
+fi
+
 say()  { printf '\033[1;36m[sandwich]\033[0m %s\n' "$*"; }
 warn() { printf '\033[1;33m[sandwich]\033[0m %s\n' "$*" >&2; }
 die()  { printf '\033[1;31m[sandwich]\033[0m %s\n' "$*" >&2; exit 1; }

-# --- Platform check (honors --no-launchd for Linux/Intel users) ---
-if [ "$(uname -s)" != "Darwin" ]; then
-  if [ "$NO_LAUNCHD" -eq 0 ]; then
-    die "macOS required for the launchd auto-start of mlx_lm.server. Re-run with --no-launchd to install hooks only and run mlx_lm.server manually."
-  fi
-  warn "Non-Darwin platform — installing hooks/agents only (no launchd). Run an OpenAI-compatible model endpoint and set MLX_ENDPOINT if it is not $MLX_ENDPOINT."
-elif [ "$(uname -m)" != "arm64" ]; then
-  warn "Apple Silicon recommended (M1+). Detected $(uname -m). The local Qwen3.6 model requires arm64 + Metal."
+# --- Platform check ---
+OS_NAME="$(uname -s)"
+ARCH_NAME="$(uname -m)"
+say "platform: $OS_NAME $ARCH_NAME"
+if [ "$WITH_LAUNCHD" -eq 1 ] && { [ "$OS_NAME" != "Darwin" ] || [ "$ARCH_NAME" != "arm64" ]; }; then
+  warn "--with-launchd is Apple Silicon only; skipping local MLX autostart on $OS_NAME $ARCH_NAME"
+  warn "Sanhedrin can still use any OpenAI-compatible endpoint via --sanhedrin-endpoint or VESTIGE_SANHEDRIN_ENDPOINT."
+  WITH_LAUNCHD=0
+fi
+if [ "$ENABLE_SANHEDRIN" -eq 1 ] && [ "$WITH_LAUNCHD" -eq 0 ]; then
+  say "Sanhedrin enabled without launchd; using OpenAI-compatible endpoint: $SANHEDRIN_ENDPOINT"
 fi

 # --- Prereqs (warnings only, install proceeds) ---
@ -67,9 +85,11 @@ command -v jq      >/dev/null || die "jq required: brew install jq"
 command -v python3 >/dev/null || die "python3 required (3.10+)"
 command -v claude  >/dev/null || warn "'claude' CLI not found — install Claude Code first."
 command -v vestige-mcp >/dev/null || warn "'vestige-mcp' not found — install with: cargo install vestige-mcp"
-command -v uv      >/dev/null || warn "'uv' not found — install with: brew install uv"
-command -v mlx_lm.server >/dev/null || warn "mlx-lm not installed — run: uv tool install mlx-lm"
-command -v hf      >/dev/null || warn "'hf' not found — run: uv tool install 'huggingface_hub[cli]'"
+if [ "$WITH_LAUNCHD" -eq 1 ]; then
+  command -v uv      >/dev/null || warn "'uv' not found — install with: brew install uv"
+  command -v mlx_lm.server >/dev/null || warn "mlx-lm not installed — run: uv tool install mlx-lm"
+  command -v hf      >/dev/null || warn "'hf' not found — run: uv tool install 'huggingface_hub[cli]'"
+fi

 # --- Resolve source: local checkout or release tarball ---
 if [ -n "$SRC" ]; then
@ -89,7 +109,21 @@ fi
 [ -d "$SCRIPT_DIR/hooks" ] || die "hooks/ not found in $SCRIPT_DIR — wrong source?"

 # --- Stage directories ---
-mkdir -p "$HOOKS_DIR" "$AGENTS_DIR" "$LAUNCHD_DIR"
+mkdir -p "$HOOKS_DIR" "$AGENTS_DIR"
+if [ "$WITH_LAUNCHD" -eq 1 ]; then
+  mkdir -p "$LAUNCHD_DIR"
+fi
+
+# v2.1.0 originally installed the MLX server as part of the default path.
+# Default reinstalls now retire that job; users can restore it with --with-launchd.
+if [ "$WITH_LAUNCHD" -eq 0 ] && [ "$OS_NAME" = "Darwin" ]; then
+  LEGACY_PLIST="$LAUNCHD_DIR/com.vestige.mlx-server.plist"
+  if [ -f "$LEGACY_PLIST" ]; then
+    launchctl unload "$LEGACY_PLIST" 2>/dev/null || true
+    rm -f "$LEGACY_PLIST"
+    say "removed old Sanhedrin launchd job (use --with-launchd to opt back in)"
+  fi
+fi

 # --- Copy hooks ---
 copied=0; skipped=0
@ -121,8 +155,25 @@ for f in "$SCRIPT_DIR/agents"/*.md; do
 done
 say "agents installed to $AGENTS_DIR"

-# --- Render launchd plist (macOS only) ---
-if [ "$NO_LAUNCHD" -eq 0 ]; then
+# --- Persist optional Sanhedrin env ---
+quote_env() {
+  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
+}
+
+if [ "$ENABLE_SANHEDRIN" -eq 1 ]; then
+  SANHEDRIN_ENV="$HOOKS_DIR/vestige-sanhedrin.env"
+  {
+    printf 'VESTIGE_SANHEDRIN_ENABLED=1\n'
+    printf 'VESTIGE_SANHEDRIN_ENDPOINT=%s\n' "$(quote_env "$SANHEDRIN_ENDPOINT")"
+    printf 'VESTIGE_SANHEDRIN_MODEL=%s\n' "$(quote_env "$MODEL_ID")"
+    printf 'VESTIGE_DASHBOARD_PORT=%s\n' "$(quote_env "$DASHBOARD_PORT")"
+  } > "$SANHEDRIN_ENV"
+  chmod 0600 "$SANHEDRIN_ENV"
+  say "Sanhedrin opt-in config written to $SANHEDRIN_ENV"
+fi
+
+# --- Render launchd plist (Apple Silicon opt-in only) ---
+if [ "$WITH_LAUNCHD" -eq 1 ]; then
  PLIST="$LAUNCHD_DIR/com.vestige.mlx-server.plist"
  TEMPLATE="$SCRIPT_DIR/launchd/com.vestige.mlx-server.plist.template"
  [ -f "$TEMPLATE" ] || die "launchd template missing: $TEMPLATE"
@ -140,7 +191,11 @@ else
  cp "$SETTINGS" "$HOME/.claude/settings.json.bak.pre-sandwich"
 fi
 TMP_MERGE="$(mktemp)"
-jq -s '.[0] * .[1]' "$SETTINGS" "$SCRIPT_DIR/hooks/settings.fragment.json" > "$TMP_MERGE"
+SETTINGS_FRAGMENT="$SCRIPT_DIR/hooks/settings.fragment.json"
+if [ "$ENABLE_SANHEDRIN" -eq 1 ]; then
+  SETTINGS_FRAGMENT="$SCRIPT_DIR/hooks/settings.sanhedrin.fragment.json"
+fi
+jq -s '.[0] * .[1]' "$SETTINGS" "$SETTINGS_FRAGMENT" > "$TMP_MERGE"
 mv "$TMP_MERGE" "$SETTINGS"
 say "merged hooks block into $SETTINGS (backup at .bak.pre-sandwich)"

@ -148,23 +203,24 @@ say "merged hooks block into $SETTINGS (backup at .bak.pre-sandwich)"
 cat <<EOF

  ┌──────────────────────────────────────────────────────────────┐
-  │  Cognitive Sandwich installed.                                │
+  │  Cognitive Sandwich hooks installed.                          │
  └──────────────────────────────────────────────────────────────┘

  Next steps:
-    1. Download the local model (~19 GB, one-time):
-         hf download $MODEL_ID
-    2. Restart Claude Code so it picks up the new hooks.
-    3. Verify the install:
+    1. Restart Claude Code so it picks up the new hooks.
+    2. Verify the install:
         vestige health                 # if vestige CLI installed
         curl http://127.0.0.1:$DASHBOARD_PORT/api/health
-         curl $MLX_MODELS_URL
-    4. Try a prompt — the Sanhedrin Stop hook will fire and judge
-       Claude's draft against your Vestige memory before delivery.
+         scripts/check-sandwich-prereqs.sh   # from a checkout
+    3. Optional Sanhedrin verifier:
+         ./scripts/install-sandwich.sh --enable-sanhedrin --sanhedrin-endpoint=$SANHEDRIN_ENDPOINT --sanhedrin-model=$MODEL_ID
+       On Apple Silicon with >20 GB free RAM, add --with-launchd to auto-start
+       the local MLX Qwen server. On other platforms, point --sanhedrin-endpoint at vLLM,
+       Ollama, llama.cpp, or another OpenAI-compatible /v1/chat/completions URL.

  To uninstall:
-    launchctl unload $LAUNCHD_DIR/com.vestige.mlx-server.plist
-    rm $LAUNCHD_DIR/com.vestige.mlx-server.plist
+    launchctl unload $LAUNCHD_DIR/com.vestige.mlx-server.plist 2>/dev/null || true
+    rm -f $LAUNCHD_DIR/com.vestige.mlx-server.plist
    cp $HOME/.claude/settings.json.bak.pre-sandwich $HOME/.claude/settings.json

 EOF
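The `--enable-sanhedrin` path depends on two small pieces of shell logic:

- the `quote_env` escaping used when writing `vestige-sanhedrin.env`;
- the parameter expansion that turns `SANHEDRIN_ENDPOINT` into the `/models` probe URL.

What follows is a minimal sketch that exercises both outside the installer. It rests on these assumptions:

- `models_url_for` is a hypothetical helper that wraps the same expansion the installer uses.
- The endpoint and model strings are made-up test inputs.
- The Sanhedrin hook consumes the env file by sourcing it. This diff does not show that.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors two snippets from install-sandwich.sh so they can be
# checked in isolation. All values below are hypothetical test inputs.
set -eu

# Same escaping as the installer's quote_env(): wrap the value in '...'
# and rewrite each embedded ' as '\'' so the result is safe to source.
quote_env() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Hypothetical helper applying the installer's SANHEDRIN_ENDPOINT expansion:
# drop one trailing slash, then swap /chat/completions for /models.
models_url_for() {
  endpoint="${1%/}"
  printf '%s\n' "${endpoint%/chat/completions}/models"
}

ENDPOINT="http://127.0.0.1:11434/v1/chat/completions/"   # trailing slash on purpose
MODEL="qwen2.5:14b's-test"                               # embedded quote on purpose

MODELS_URL="$(models_url_for "$ENDPOINT")"

# Write a throwaway file in the same KEY='value' format as
# vestige-sanhedrin.env. ASSUMPTION: the Sanhedrin hook reads that file
# by sourcing it; the hook itself is not part of this diff.
ENV_FILE="$(mktemp)"
{
  printf 'VESTIGE_SANHEDRIN_ENDPOINT=%s\n' "$(quote_env "${ENDPOINT%/}")"
  printf 'VESTIGE_SANHEDRIN_MODEL=%s\n' "$(quote_env "$MODEL")"
} > "$ENV_FILE"

# Source the file back and clean up.
. "$ENV_FILE"
rm -f "$ENV_FILE"

printf 'models URL: %s\n' "$MODELS_URL"
printf 'endpoint:   %s\n' "$VESTIGE_SANHEDRIN_ENDPOINT"
printf 'model:      %s\n' "$VESTIGE_SANHEDRIN_MODEL"
```

Sourcing the file back is the simplest check that a value containing `'` survives the round trip. A model tag like `qwen2.5:14b` is the kind of string that is otherwise easy to mangle.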