dev-v0.7.x to prod #1
5 changed files with 44 additions and 30 deletions
@@ -24,7 +24,7 @@ doc/
 1. **Install the router**:

    ```bash
-   git clone https://github.com/nomyo-ai/nomyo-router.git
+   git clone https://bitfreedom.net/code/nomyo-ai/nomyo-router.git
    cd nomyo-router
    python3 -m venv .venv/router
    source .venv/router/bin/activate
@@ -36,14 +36,19 @@ doc/
    endpoints:
      - http://localhost:11434
    max_concurrent_connections: 2
+   # Optional router-level API key (leave blank to disable)
+   nomyo-router-api-key: ""
    ```

 3. **Run the router**:

    ```bash
    uvicorn router:app --host 0.0.0.0 --port 12434
    ```

 4. **Use the router**: Point your frontend to `http://localhost:12434` instead of your Ollama instance.
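Step 4 above can be sanity-checked from any client language. As a minimal illustrative sketch (stdlib only, names are my own), this builds, without sending, the request a frontend would issue against the router instead of Ollama:

```python
import urllib.request

def tags_request(base="http://localhost:12434"):
    # Build (but do not send) the GET /api/tags request a client would
    # issue; the port matches the uvicorn command above.
    return urllib.request.Request(f"{base}/api/tags", method="GET")

req = tags_request()
print(req.full_url)
```

Swapping `base` between the router and a bare Ollama instance is the only change a frontend needs.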

 ### Key Features
@@ -138,14 +143,15 @@ For additional help:
 Happy routing! 🚀

+## Router API key usage
+
+If the router API key is set (`NOMYO_ROUTER_API_KEY` env or `nomyo-router-api-key` in config), include it in every request:
+
+- Header (preferred): `Authorization: Bearer <router_key>`
+- Query param: `?api_key=<router_key>`
+
+Example:
+
+```bash
+curl -H "Authorization: Bearer $NOMYO_ROUTER_API_KEY" http://localhost:12434/api/tags
+```
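The same call as the curl example can be made from Python. A minimal sketch (stdlib `urllib`; the helper name is illustrative, only the header and query shapes come from the docs above):

```python
import os
import urllib.request

def authed_request(path, base="http://localhost:12434", key=None):
    """Build (not send) a router request carrying the API key as a
    Bearer token. Illustrative helper, not part of the router."""
    if key is None:
        key = os.environ.get("NOMYO_ROUTER_API_KEY", "")
    req = urllib.request.Request(f"{base}{path}")
    if key:  # a blank key means router auth is disabled
        req.add_header("Authorization", f"Bearer {key}")
    return req

req = authed_request("/api/tags", key="router_key")
```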
@@ -16,7 +16,7 @@ NOMYO Router can be deployed in various environments depending on your requirements
 ```bash
 # Clone the repository
-git clone https://github.com/nomyo-ai/nomyo-router.git
+git clone https://bitfreedom.net/code/nomyo-ai/nomyo-router.git
 cd nomyo-router

 # Create virtual environment
@ -84,10 +84,11 @@ sudo systemctl status nomyo-router
|
|||
|
||||
### Image variants
|
||||
|
||||
| Tag | Semantic cache | Image size |
|
||||
|---|---|---|
|
||||
| `latest` | ❌ exact match only | ~300 MB |
|
||||
| `latest-semantic` | ✅ sentence-transformers + `all-MiniLM-L6-v2` pre-baked | ~800 MB |
|
||||
|
||||
| Tag | Semantic cache | Image size |
|
||||
| ------------------- | -------------------------------------------------------- | ------------ |
|
||||
| `latest` | ❌ exact match only | ~300 MB |
|
||||
| `latest-semantic` | ✅ sentence-transformers +`all-MiniLM-L6-v2` pre-baked | ~800 MB |
|
||||
|
||||
The `:semantic` variant enables `cache_similarity < 1.0` in `config.yaml`. The lean image falls back to exact-match caching with a warning if semantic mode is configured.
|
||||
|
||||
|
|
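The fallback behaviour described above can be sketched as a small decision function (function name and return shape are illustrative assumptions, not the router's internals):

```python
def cache_mode(configured_similarity, semantic_available):
    """Pick the effective cache mode: the lean image lacks the ML deps,
    so a semantic config (`cache_similarity < 1.0`) degrades to exact
    matching with a warning, as described above."""
    wants_semantic = configured_similarity < 1.0
    if wants_semantic and not semantic_available:
        return "exact", "warning: semantic cache configured but deps missing"
    return ("semantic" if wants_semantic else "exact"), None
```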
doc/usage.md
@@ -5,7 +5,7 @@
 ### 1. Install the Router

 ```bash
-git clone https://github.com/nomyo-ai/nomyo-router.git
+git clone https://bitfreedom.net/code/nomyo-ai/nomyo-router.git
 cd nomyo-router
 python3 -m venv .venv/router
 source .venv/router/bin/activate
@ -42,8 +42,9 @@ Configure your frontend to point to `http://localhost:12434` instead of your Oll
|
|||
|
||||
The router provides all standard Ollama API endpoints:
|
||||
|
||||
|
||||
| Endpoint | Method | Description |
|
||||
| --------------- | ------ | --------------------- |
|
||||
| ----------------- | -------- | ----------------------- |
|
||||
| `/api/generate` | POST | Generate text |
|
||||
| `/api/chat` | POST | Chat completions |
|
||||
| `/api/embed` | POST | Embeddings |
|
||||
|
|
@ -60,8 +61,9 @@ The router provides all standard Ollama API endpoints:
|
|||
|
||||
For OpenAI API compatibility:
|
||||
|
||||
|
||||
| Endpoint | Method | Description |
|
||||
| ---------------------- | ------ | ---------------- |
|
||||
| ------------------------ | -------- | ------------------ |
|
||||
| `/v1/chat/completions` | POST | Chat completions |
|
||||
| `/v1/completions` | POST | Text completions |
|
||||
| `/v1/embeddings` | POST | Embeddings |
|
||||
|
|
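A request body for the OpenAI-compatible route can be assembled as below; the helper is an illustrative sketch, only the endpoint path and the standard OpenAI JSON shape come from the table above:

```python
import json

def chat_completions_body(model, user_msg):
    # JSON body for POST /v1/chat/completions (OpenAI-compatible shape).
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
    })

body = chat_completions_body("llama3", "Hello!")
```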
@ -69,18 +71,19 @@ For OpenAI API compatibility:
|
|||
|
||||
### Monitoring Endpoints
|
||||
|
||||
| Endpoint | Method | Description |
|
||||
| ---------------------------------- | ------ | ---------------------------------------- |
|
||||
| `/api/usage` | GET | Current connection counts |
|
||||
| `/api/token_counts` | GET | Token usage statistics |
|
||||
| `/api/stats` | POST | Detailed model statistics |
|
||||
| `/api/aggregate_time_series_days` | POST | Aggregate time series data into daily |
|
||||
| `/api/version` | GET | Ollama version info |
|
||||
| `/api/config` | GET | Endpoint configuration |
|
||||
| `/api/usage-stream` | GET | Real-time usage updates (SSE) |
|
||||
| `/health` | GET | Health check |
|
||||
| `/api/cache/stats` | GET | Cache hit/miss counters and config |
|
||||
| `/api/cache/invalidate` | POST | Clear all cache entries and counters |
|
||||
|
||||
| Endpoint | Method | Description |
|
||||
| ----------------------------------- | -------- | --------------------------------------- |
|
||||
| `/api/usage` | GET | Current connection counts |
|
||||
| `/api/token_counts` | GET | Token usage statistics |
|
||||
| `/api/stats` | POST | Detailed model statistics |
|
||||
| `/api/aggregate_time_series_days` | POST | Aggregate time series data into daily |
|
||||
| `/api/version` | GET | Ollama version info |
|
||||
| `/api/config` | GET | Endpoint configuration |
|
||||
| `/api/usage-stream` | GET | Real-time usage updates (SSE) |
|
||||
| `/health` | GET | Health check |
|
||||
| `/api/cache/stats` | GET | Cache hit/miss counters and config |
|
||||
| `/api/cache/invalidate` | POST | Clear all cache entries and counters |
|
||||
|
||||
## Making Requests
|
||||
|
||||
|
|
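`/api/usage-stream` delivers updates as Server-Sent Events; a minimal sketch of pulling the `data:` payloads out of such a stream (the parser and sample frames are illustrative, not router code):

```python
def sse_data_frames(lines):
    """Collect the payload of each SSE event: `data:` lines accumulate,
    a blank line terminates the event."""
    events, buf = [], []
    for line in lines:
        if line.startswith("data:"):
            buf.append(line[len("data:"):].strip())
        elif line == "" and buf:
            events.append("\n".join(buf))
            buf = []
    return events

frames = sse_data_frames(['data: {"active": 2}', "", 'data: {"active": 1}', ""])
```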
@ -196,6 +199,7 @@ curl -X POST http://localhost:12434/api/cache/invalidate
|
|||
```
|
||||
|
||||
**Notes:**
|
||||
|
||||
- MOE requests (`moe-*` model prefix) always bypass the cache
|
||||
- Cache is isolated per `model + system prompt` — different users with different system prompts cannot receive each other's cached responses
|
||||
- Semantic matching requires the `:semantic` Docker image tag (`ghcr.io/nomyo-ai/nomyo-router:latest-semantic`)
|
||||
|
|
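The bypass and isolation rules in the notes above can be illustrated with a key-derivation sketch; the hashing scheme here is an assumption for illustration, not the router's actual key format:

```python
import hashlib

def is_cacheable(model):
    # MOE requests (`moe-*` prefix) always bypass the cache.
    return not model.startswith("moe-")

def cache_key(model, system_prompt, prompt):
    # Namespace entries by model + system prompt so two users with
    # different system prompts can never hit each other's entries.
    ns = hashlib.sha256(f"{model}\x00{system_prompt}".encode()).hexdigest()
    return f"{ns}:{hashlib.sha256(prompt.encode()).hexdigest()}"
```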
@ -404,14 +408,15 @@ print(f"Available models: {[m.id for m in response.data]}")
|
|||
|
||||
See the [examples](examples/) directory for complete integration examples.
|
||||
|
||||
|
||||
### Authentication to NOMYO Router
|
||||
|
||||
If a router API key is configured, include it with each request:
|
||||
|
||||
- Header: Authorization: Bearer <router_key>
|
||||
- Query: ?api_key=<router_key>
|
||||
|
||||
Example (tags):
|
||||
|
||||
```bash
|
||||
curl -H "Authorization: Bearer $NOMYO_ROUTER_API_KEY" http://localhost:12434/api/tags
|
||||
```
|
||||
|
|
@@ -42,4 +42,4 @@ yarl==1.20.1
 aiosqlite
 # Semantic LLM cache — base install (exact-match mode, no heavy ML deps)
 # For semantic mode use the :semantic Docker image tag (adds sentence-transformers + torch)
-semantic-llm-cache@git+https://github.com/nomyo-ai/async-semantic-llm-cache.git@v0.1
+semantic-llm-cache@git+https://bitfreedom.net/code/nomyo-ai/async-semantic-llm-cache.git@v0.1.1
@@ -1092,7 +1092,9 @@ function renderTimeSeriesChart(timeSeriesData, chart, minutes) {
 function updateTpsChart(payload) {
   const tokens = payload.token_usage_counts || {};
   const perModelTokens = {};
-  psRows.forEach((_, model) => {
+  const allModels = new Set();
+  for (const ep in tokens) for (const model in tokens[ep]) allModels.add(model);
+  allModels.forEach(model => {
     let total = 0;
     for (const ep in tokens) total += tokens[ep]?.[model] || 0;
     // Normalise against the first-seen cumulative total so history
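The aggregation the new JS performs (collect every model seen on any endpoint, then sum its counts across endpoints) can be sketched in Python for clarity; the function name and sample data are illustrative:

```python
def total_tokens_per_model(token_usage_counts):
    """Sum cumulative token counts per model across all endpoints,
    mirroring the allModels/total loop in the JS above."""
    totals = {}
    for per_model in token_usage_counts.values():
        for model, count in per_model.items():
            totals[model] = totals.get(model, 0) + (count or 0)
    return totals

totals = total_tokens_per_model({
    "http://localhost:11434": {"llama3": 120, "phi3": 40},
    "http://localhost:11435": {"llama3": 80},
})
```

Unlike the old `psRows`-driven loop, this counts a model even when it no longer appears in the running-models list.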