diff --git a/doc/README.md b/doc/README.md
index 7978cec..3909089 100644
--- a/doc/README.md
+++ b/doc/README.md
@@ -24,7 +24,7 @@ doc/
 1. **Install the router**:
 
    ```bash
-   git clone https://github.com/nomyo-ai/nomyo-router.git
+   git clone https://bitfreedom.net/code/nomyo-ai/nomyo-router.git
    cd nomyo-router
    python3 -m venv .venv/router
    source .venv/router/bin/activate
@@ -36,14 +36,19 @@ doc/
    endpoints:
      - http://localhost:11434
    max_concurrent_connections: 2
-   # Optional router-level API key (leave blank to disable)
-   nomyo-router-api-key: ""
    ```
+
+# Optional router-level API key (leave blank to disable)
+
+nomyo-router-api-key: ""
+
+```
 
 3. **Run the router**:
 
-   ```bash
-   uvicorn router:app --host 0.0.0.0 --port 12434
-   ```
+```bash
+uvicorn router:app --host 0.0.0.0 --port 12434
+```
+
 4. **Use the router**: Point your frontend to `http://localhost:12434` instead of your Ollama instance.
 
@@ -138,14 +143,15 @@ For additional help:
 
 Happy routing! 🚀
-
 ## Router API key usage
 
 If the router API key is set (`NOMYO_ROUTER_API_KEY` env or `nomyo-router-api-key` in config), include it in every request:
+
 - Header (preferred): Authorization: Bearer <key>
 - Query param: ?api_key=<key>
 
 Example:
+
 ```bash
 curl -H "Authorization: Bearer $NOMYO_ROUTER_API_KEY" http://localhost:12434/api/tags
 ```
 
diff --git a/doc/deployment.md b/doc/deployment.md
index efff3ff..484e210 100644
--- a/doc/deployment.md
+++ b/doc/deployment.md
@@ -16,7 +16,7 @@ NOMYO Router can be deployed in various environments depending on your requireme
 
 ```bash
 # Clone the repository
-git clone https://github.com/nomyo-ai/nomyo-router.git
+git clone https://bitfreedom.net/code/nomyo-ai/nomyo-router.git
 cd nomyo-router
 
 # Create virtual environment
@@ -84,10 +84,11 @@ sudo systemctl status nomyo-router
 
 ### Image variants
 
-| Tag | Semantic cache | Image size |
-|---|---|---|
-| `latest` | ❌ exact match only | ~300 MB |
-| `latest-semantic` | ✅ sentence-transformers + `all-MiniLM-L6-v2` pre-baked | ~800 MB |
+
+| Tag               | Semantic cache                                           | Image size   |
+| ----------------- | -------------------------------------------------------- | ------------ |
+| `latest`          | ❌ exact match only                                      | ~300 MB      |
+| `latest-semantic` | ✅ sentence-transformers + `all-MiniLM-L6-v2` pre-baked  | ~800 MB      |
 
 The `:semantic` variant enables `cache_similarity < 1.0` in `config.yaml`. The lean image falls back to exact-match caching with a warning if semantic mode is configured.
 
diff --git a/doc/usage.md b/doc/usage.md
index 9e980af..54b2551 100644
--- a/doc/usage.md
+++ b/doc/usage.md
@@ -5,7 +5,7 @@
 ### 1. Install the Router
 
 ```bash
-git clone https://github.com/nomyo-ai/nomyo-router.git
+git clone https://bitfreedom.net/code/nomyo-ai/nomyo-router.git
 cd nomyo-router
 python3 -m venv .venv/router
 source .venv/router/bin/activate
@@ -42,8 +42,9 @@ Configure your frontend to point to `http://localhost:12434` instead of your Oll
 
 The router provides all standard Ollama API endpoints:
 
+
 | Endpoint        | Method | Description           |
-| --------------- | ------ | --------------------- |
+| ----------------- | -------- | ----------------------- |
 | `/api/generate` | POST   | Generate text         |
 | `/api/chat`     | POST   | Chat completions      |
 | `/api/embed`    | POST   | Embeddings            |
@@ -60,8 +61,9 @@ The router provides all standard Ollama API endpoints:
 
 For OpenAI API compatibility:
 
+
 | Endpoint               | Method | Description      |
-| ---------------------- | ------ | ---------------- |
+| ------------------------ | -------- | ------------------ |
 | `/v1/chat/completions` | POST   | Chat completions |
 | `/v1/completions`      | POST   | Text completions |
 | `/v1/embeddings`       | POST   | Embeddings       |
@@ -69,18 +71,19 @@ For OpenAI API compatibility:
 
 ### Monitoring Endpoints
 
-| Endpoint                           | Method | Description                              |
-| ---------------------------------- | ------ | ---------------------------------------- |
-| `/api/usage`                       | GET    | Current connection counts                |
-| `/api/token_counts`                | GET    | Token usage statistics                   |
-| `/api/stats`                       | POST   | Detailed model statistics                |
-| `/api/aggregate_time_series_days`  | POST   | Aggregate time series data into daily    |
-| `/api/version`                     | GET    | Ollama version info                      |
-| `/api/config`                      | GET    | Endpoint configuration                   |
-| `/api/usage-stream`                | GET    | Real-time usage updates (SSE)            |
-| `/health`                          | GET    | Health check                             |
-| `/api/cache/stats`                 | GET    | Cache hit/miss counters and config       |
-| `/api/cache/invalidate`            | POST   | Clear all cache entries and counters     |
+
+| Endpoint                            | Method   | Description                             |
+| ----------------------------------- | -------- | --------------------------------------- |
+| `/api/usage`                        | GET      | Current connection counts               |
+| `/api/token_counts`                 | GET      | Token usage statistics                  |
+| `/api/stats`                        | POST     | Detailed model statistics               |
+| `/api/aggregate_time_series_days`   | POST     | Aggregate time series data into daily   |
+| `/api/version`                      | GET      | Ollama version info                     |
+| `/api/config`                       | GET      | Endpoint configuration                  |
+| `/api/usage-stream`                 | GET      | Real-time usage updates (SSE)           |
+| `/health`                           | GET      | Health check                            |
+| `/api/cache/stats`                  | GET      | Cache hit/miss counters and config      |
+| `/api/cache/invalidate`             | POST     | Clear all cache entries and counters    |
 
 ## Making Requests
 
@@ -196,6 +199,7 @@ curl -X POST http://localhost:12434/api/cache/invalidate
 ```
 
 **Notes:**
+
 - MOE requests (`moe-*` model prefix) always bypass the cache
 - Cache is isolated per `model + system prompt` — different users with different system prompts cannot receive each other's cached responses
 - Semantic matching requires the `:semantic` Docker image tag (`ghcr.io/nomyo-ai/nomyo-router:latest-semantic`)
@@ -404,14 +408,15 @@ print(f"Available models: {[m.id for m in response.data]}")
 
 See the [examples](examples/) directory for complete integration examples.
-
 ### Authentication to NOMYO Router
 
 If a router API key is configured, include it with each request:
+
 - Header: Authorization: Bearer <key>
 - Query: ?api_key=<key>
 
 Example (tags):
+
 ```bash
 curl -H "Authorization: Bearer $NOMYO_ROUTER_API_KEY" http://localhost:12434/api/tags
 ```
 
diff --git a/requirements.txt b/requirements.txt
index da6fe43..a3befbe 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -42,4 +42,4 @@ yarl==1.20.1
 aiosqlite
 # Semantic LLM cache — base install (exact-match mode, no heavy ML deps)
 # For semantic mode use the :semantic Docker image tag (adds sentence-transformers + torch)
-semantic-llm-cache@git+https://github.com/nomyo-ai/async-semantic-llm-cache.git@v0.1
+semantic-llm-cache@git+https://bitfreedom.net/code/nomyo-ai/async-semantic-llm-cache.git@v0.1.1
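
The docs touched by this patch state the router's auth contract in two places: send the key as a Bearer header (preferred) on every request, or fall back to an `?api_key=` query parameter, with the key coming from the `NOMYO_ROUTER_API_KEY` env var or `nomyo-router-api-key` in `config.yaml`. A minimal client-side sketch of that header rule, assuming the default port from the docs; `auth_headers` is a hypothetical helper, not part of the router itself:

```python
import os

ROUTER_URL = "http://localhost:12434"  # NOMYO Router default port per the docs


def auth_headers(api_key=None):
    """Build request headers for the router.

    If a router-level API key is configured (NOMYO_ROUTER_API_KEY env or
    `nomyo-router-api-key` in config.yaml), the docs say to send it as a
    Bearer token with every request.
    """
    key = api_key or os.environ.get("NOMYO_ROUTER_API_KEY", "")
    headers = {"Content-Type": "application/json"}
    if key:  # omit the header entirely when no key is configured
        headers["Authorization"] = f"Bearer {key}"
    return headers


# Example: headers for GET /api/tags (mirrors the curl example in the docs)
print(auth_headers("secret123"))
# → {'Content-Type': 'application/json', 'Authorization': 'Bearer secret123'}
```

The `?api_key=` query-parameter form is documented as an alternative for clients that cannot set headers; the Bearer header is preferred since query strings are more likely to end up in logs.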