diff --git a/doc/usage.md b/doc/usage.md
index 4912ce8..9e980af 100644
--- a/doc/usage.md
+++ b/doc/usage.md
@@ -79,6 +79,8 @@ For OpenAI API compatibility:
 | `/api/config` | GET | Endpoint configuration |
 | `/api/usage-stream` | GET | Real-time usage updates (SSE) |
 | `/health` | GET | Health check |
+| `/api/cache/stats` | GET | Cache hit/miss counters and config |
+| `/api/cache/invalidate` | POST | Clear all cache entries and counters |
 
 ## Making Requests
 
@@ -147,6 +149,58 @@ The MOE system:
 3. Selects the best response
 4. Generates a final refined response
 
+### Semantic LLM Cache
+
+The router can cache LLM responses and serve them instantly, bypassing endpoint selection, model loading, and token generation entirely. Cached responses work for both streaming and non-streaming clients.
+
+Enable it in `config.yaml`:
+
+```yaml
+cache_enabled: true
+cache_backend: sqlite # persists across restarts
+cache_similarity: 0.9 # semantic matching (requires :semantic image)
+cache_ttl: 3600
+```
+
+For exact-match only (no extra dependencies):
+
+```yaml
+cache_enabled: true
+cache_backend: sqlite
+cache_similarity: 1.0
+```
+
+Check cache performance:
+
+```bash
+curl http://localhost:12434/api/cache/stats
+```
+
+```json
+{
+  "enabled": true,
+  "hits": 1547,
+  "misses": 892,
+  "hit_rate": 0.634,
+  "semantic": true,
+  "backend": "sqlite",
+  "similarity_threshold": 0.9,
+  "history_weight": 0.3
+}
+```
+
+Clear the cache:
+
+```bash
+curl -X POST http://localhost:12434/api/cache/invalidate
+```
+
+**Notes:**
+- MOE requests (`moe-*` model prefix) always bypass the cache
+- Cache is isolated per `model + system prompt` — different users with different system prompts cannot receive each other's cached responses
+- Semantic matching requires the `:semantic` Docker image tag (`ghcr.io/nomyo-ai/nomyo-router:latest-semantic`)
+- See [configuration.md](configuration.md#semantic-llm-cache) for all cache options
+
 ### Token Tracking
 
 The router automatically tracks token usage:
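
A small sanity-check sketch to accompany the cache-stats documentation added above. It assumes the `/api/cache/stats` response has the JSON shape shown in the patch, and that `hit_rate` equals `hits / (hits + misses)` — that relationship is inferred from the example values (1547 / 2439 ≈ 0.634), not stated explicitly in the docs.

```python
import json

# Illustrative payload mirroring the /api/cache/stats example in the docs.
# Field names come from the patch; the hit_rate formula is an assumption.
stats_json = """
{
  "enabled": true,
  "hits": 1547,
  "misses": 892,
  "hit_rate": 0.634,
  "semantic": true,
  "backend": "sqlite",
  "similarity_threshold": 0.9,
  "history_weight": 0.3
}
"""

stats = json.loads(stats_json)

# Recompute hit_rate locally as hits / (hits + misses) to check that the
# counters and the reported rate agree (guarding against division by zero).
total = stats["hits"] + stats["misses"]
computed = stats["hits"] / total if total else 0.0

print(f"hits={stats['hits']} misses={stats['misses']} "
      f"computed_hit_rate={computed:.3f} reported={stats['hit_rate']}")
```

In a live deployment the same check could be driven by `curl http://localhost:12434/api/cache/stats` piped into this script instead of the inline sample.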