Mirror of https://github.com/samvallad33/vestige.git (synced 2026-06-10 20:35:15 +02:00)

Make Sanhedrin optional in v2.1.0

Parent: c9e96b06fd
Commit: 4f457ec2db
8 changed files with 315 additions and 131 deletions

@@ -17,12 +17,12 @@ The Cognitive Sandwich wraps every Claude Code response in two layers of cogniti
 ├────────────────────────────────────────────────┤
 │ 🥪 BOTTOM BREAD — Stop hooks │
 │ • Veto-detector (fast 50ms regex pre-screen) │
-│ • Sanhedrin Executioner (LOCAL Qwen3.6-35B) │
+│ • Sanhedrin Executioner (optional verifier) │
 │ • Synthesis stop validator (hedge detector) │
 └────────────────────────────────────────────────┘
 ```

-The Sanhedrin Executioner is the headline of v2.1.0. As of v2.1.0 it runs entirely on a local MLX model (`mlx-community/Qwen3.6-35B-A3B-4bit`), replacing the v2.0.x Haiku 4.5 subagent. **Zero API cost per Claude turn, fully offline, ~5–15s verdict latency on M-series Apple Silicon.**
+Sanhedrin is optional. The default installer wires the lightweight preflight and stop hooks only; it does not start MLX, require a 19 GB model download, or require 20+ GB of RAM. Users who want the post-response semantic verifier can opt in and point it at any OpenAI-compatible `/v1/chat/completions` endpoint. On Apple Silicon, an additional `--with-launchd` flag can auto-start the local MLX Qwen backend.

 ---

@@ -38,9 +38,9 @@ The Sanhedrin Executioner is the headline of v2.1.0. As of v2.1.0 it runs entire
 3. **Claude reads the assembled context and generates a draft.**
 4. **Stop hooks fire serially** (any can VETO with `exit 2`, forcing a rewrite):
 - `veto-detector.sh` — fast regex against `veto`-tagged Vestige memories (~50ms)
-- `sanhedrin.sh` → `sanhedrin-local.py` — single-shot local Qwen3.6-35B-A3B verdict
+- `sanhedrin.sh` → `sanhedrin-local.py` — optional single-shot semantic verdict
 - `synthesis-stop-validator.sh` — regex against forbidden patterns (hedging, summary-instead-of-composition)
-5. **If all 3 Stop hooks return `exit 0`, the response is delivered.**
+5. **If all enabled Stop hooks return `exit 0`, the response is delivered.**

 ---

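The serial Stop-hook flow described in this hunk (each enabled hook runs in turn, and an `exit 2` vetoes the draft) can be illustrated with a minimal sketch. The function name `run_stop_hooks` is hypothetical and not part of Vestige; treating exit statuses other than 0 and 2 as non-blocking is an assumption, since the README only specifies the meaning of `exit 0` and `exit 2`.

```shell
#!/bin/sh
# Hypothetical sketch, not the Vestige implementation.
# Runs each hook command in order; the first one that exits 2 vetoes the draft.
# Assumption: any status other than 2 is treated as non-blocking.
run_stop_hooks() {
  for hook in "$@"; do
    # Each argument is a command string standing in for a hook script.
    if sh -c "$hook"; then
      status=0
    else
      status=$?
    fi
    if [ "$status" -eq 2 ]; then
      echo "VETO from: $hook" >&2
      return 2   # later hooks are not run
    fi
  done
  return 0       # no hook vetoed, so the response would be delivered
}
```

With only the lightweight hooks enabled, the list passed to such a runner would simply omit the Sanhedrin entry, which is what "all enabled Stop hooks" implies.
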
@@ -85,27 +85,50 @@ cd vestige
 ./scripts/check-sandwich-prereqs.sh # verify everything's wired
 ```

+### Optional Sanhedrin
+
+Sanhedrin is a separate opt-in layer.
+
+```bash
+# Wire the Sanhedrin Stop hook, using the default OpenAI-compatible endpoint.
+./scripts/install-sandwich.sh --enable-sanhedrin
+
+# Apple Silicon only, and only if the machine has enough memory:
+./scripts/install-sandwich.sh --enable-sanhedrin --with-launchd
+
+# x86 / Linux / Intel Mac: use any OpenAI-compatible endpoint.
+./scripts/install-sandwich.sh \
+  --enable-sanhedrin \
+  --sanhedrin-endpoint=http://127.0.0.1:11434/v1/chat/completions \
+  --sanhedrin-model=qwen2.5:14b
+```
+
 ### Prerequisites

 | Tool | Install |
 |---|---|
-| macOS Apple Silicon (M1+) | required for MLX |
 | Python 3.10+ | typically preinstalled |
 | `jq` | `brew install jq` |
+| `vestige-mcp` | `cargo install vestige-mcp` |
+| Claude Code | https://claude.ai/code |
+
+Optional Apple Silicon local Sanhedrin backend:
+
+| Tool | Install |
+|---|---|
+| macOS Apple Silicon (M1+) | required for MLX launchd only |
 | `uv` | `brew install uv` |
 | `mlx-lm` | `uv tool install mlx-lm` |
 | `huggingface_hub[cli]` | `uv tool install 'huggingface_hub[cli]'` |
-| `vestige-mcp` | `cargo install vestige-mcp` |
-| Claude Code | https://claude.ai/code |
 | Qwen3.6-35B-A3B-4bit | `hf download mlx-community/Qwen3.6-35B-A3B-4bit` (~19 GB) |

 ### What the installer does

 1. Verifies prereqs (warnings for missing tools, fatal only on jq/python3).
 2. Copies hooks to `~/.claude/hooks/`, agents to `~/.claude/agents/`.
-3. Renders `launchd/com.vestige.mlx-server.plist.template` with your `$HOME` and chosen model, writes to `~/Library/LaunchAgents/`.
-4. `launchctl load` the plist (auto-start mlx_lm.server with the Qwen model on boot).
-5. Backs up existing `~/.claude/settings.json` to `.bak.pre-sandwich`, then `jq`-merges the hooks block.
+3. Backs up existing `~/.claude/settings.json` to `.bak.pre-sandwich`, then `jq`-merges the lightweight hooks block.
+4. With `--enable-sanhedrin`, writes `~/.claude/hooks/vestige-sanhedrin.env` and merges a Sanhedrin-enabled hooks block.
+5. With `--enable-sanhedrin --with-launchd` on Apple Silicon, renders and loads `launchd/com.vestige.mlx-server.plist.template`.

 ### Uninstall

@@ -120,7 +143,7 @@ cp ~/.claude/settings.json.bak.pre-sandwich ~/.claude/settings.json

 ## Performance notes

-On M3 Max 16-core (400 GB/s memory bandwidth):
+Optional local MLX backend on M3 Max 16-core (400 GB/s memory bandwidth):
 - Sanhedrin verdict: 5–15 seconds end-to-end (single deep_reference + single Qwen call)
 - mlx_lm.server token generation: ~82 tok/s
 - mlx_lm.server peak resident memory: ~19.7 GB

@@ -134,11 +157,12 @@ On M3 Max 14-core or M2/M1 Max: closer to 3–7s prompt processing, ~50–60 tok

 | Env var | Default | Effect |
 |---|---|---|
-| `VESTIGE_SANHEDRIN_ENABLED` | `1` | Set to `0` to disable Sanhedrin Stop hook entirely |
+| `VESTIGE_SANHEDRIN_ENABLED` | `0` | Set to `1` to enable the optional Sanhedrin Stop hook |
 | `VESTIGE_SWARM_ENABLED` | `1` | Set to `0` to disable preflight lateral-thinker swarm |
 | `VESTIGE_DASHBOARD_PORT` | `3927` | Vestige MCP HTTP API port used by hooks |
-| `MLX_ENDPOINT` | `http://127.0.0.1:8080/v1/chat/completions` | OpenAI-compatible chat completions endpoint for Sanhedrin |
-| `VESTIGE_SANDWICH_MODEL` | `mlx-community/Qwen3.6-35B-A3B-4bit` | Model launchd serves and Sanhedrin requests |
+| `VESTIGE_SANHEDRIN_ENDPOINT` | `http://127.0.0.1:8080/v1/chat/completions` | OpenAI-compatible chat completions endpoint for Sanhedrin |
+| `VESTIGE_SANHEDRIN_MODEL` | `mlx-community/Qwen3.6-35B-A3B-4bit` | Model name sent to the Sanhedrin endpoint |
+| `MLX_ENDPOINT` / `VESTIGE_SANDWICH_MODEL` | legacy aliases | Backward-compatible names still read by the bridge |
 | `VESTIGE_MEMORY_DIR` | (auto) | Override per-user Claude memory dir |

 ---

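One way the new variables and their legacy aliases from the table in this hunk could be resolved is sketched below. The function name `resolve_sanhedrin_config` and the `SANHEDRIN_*` output variables are hypothetical, and the precedence (new name first, then legacy alias, then the documented default) is an assumption: the table only states that the legacy names are still read by the bridge.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual bridge code.
# Assumed precedence: new variable, then legacy alias, then documented default.
resolve_sanhedrin_config() {
  # Sanhedrin is off unless explicitly enabled (default `0` per the table).
  SANHEDRIN_ENABLED="${VESTIGE_SANHEDRIN_ENABLED:-0}"
  # Endpoint: VESTIGE_SANHEDRIN_ENDPOINT, else legacy MLX_ENDPOINT, else default.
  SANHEDRIN_ENDPOINT="${VESTIGE_SANHEDRIN_ENDPOINT:-${MLX_ENDPOINT:-http://127.0.0.1:8080/v1/chat/completions}}"
  # Model: VESTIGE_SANHEDRIN_MODEL, else legacy VESTIGE_SANDWICH_MODEL, else default.
  SANHEDRIN_MODEL="${VESTIGE_SANHEDRIN_MODEL:-${VESTIGE_SANDWICH_MODEL:-mlx-community/Qwen3.6-35B-A3B-4bit}}"
}
```

Nested `${NEW:-${OLD:-default}}` expansion keeps the fallback chain in one expression and works in any POSIX shell, so an existing setup that only exports `MLX_ENDPOINT` would keep working under this scheme.
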
@@ -151,11 +175,12 @@ Full architecture memory: search Vestige for `god-tier-plan` or `cognitive-sandw

 ---

-## Linux / Intel Mac
+## Linux / Intel Mac / x86

-The launchd layer is macOS-arm64-only. On Linux or Intel Mac:
-- Hooks + agents install fine with `--no-launchd`
-- The Sanhedrin Stop hook will fail-open (mlx-server unreachable → exit 0)
-- Optional: run a remote mlx_lm.server / vLLM / Ollama OpenAI-compatible endpoint and set `MLX_ENDPOINT` to its `/v1/chat/completions` URL
+The base hook harness runs on x86. The launchd MLX helper is macOS-arm64-only.

-Future v2.2.0 will add Linux-native MLX equivalents.
+On Linux, Windows under WSL, or Intel Mac:
+- Run `scripts/install-sandwich.sh` normally for lightweight hooks.
+- If you want Sanhedrin, run an OpenAI-compatible endpoint such as vLLM, Ollama, llama.cpp server, or a remote MLX/vLLM box.
+- Install with `--enable-sanhedrin --sanhedrin-endpoint=<url> --sanhedrin-model=<model>`.
+- If the endpoint is unreachable, Sanhedrin fails open and does not block Claude Code.
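
The fail-open behavior in the last bullet can be sketched as a small wrapper around whatever command produces the verdict. The name `sanhedrin_fail_open` is hypothetical, and the status mapping is an assumption: 0 is read as "pass", 2 as "veto", and every other status (for example a connection error from an unreachable endpoint) as "verifier unavailable".

```shell
#!/bin/sh
# Hypothetical sketch, not the actual sanhedrin.sh.
# Runs the given verifier command and maps its exit status:
#   0     -> pass (exit 0)
#   2     -> veto (exit 2)
#   other -> verifier unavailable, fail open (exit 0)
sanhedrin_fail_open() {
  if "$@"; then
    status=0
  else
    status=$?
  fi
  case "$status" in
    0) return 0 ;;
    2) return 2 ;;
    *)
      echo "sanhedrin: verifier unavailable (status $status), failing open" >&2
      return 0
      ;;
  esac
}
```

The point of the design is that only an explicit veto can block a response; an outage of the optional backend degrades to the lightweight hooks rather than stalling Claude Code.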