Merge pull request 'dev-v0.7.x to prod' (#1) from dev-v0.7.x into main
Some checks failed
Build and Publish Docker Image (Semantic Cache) / build-and-push-semantic (push) Has been cancelled
Build and Publish Docker Image / build-and-push (push) Has been cancelled

Reviewed-on: https://bitfreedom.net/code/nomyo-ai/nomyo-router/pulls/1
This commit is contained in:
Alpha Nerd 2026-04-02 09:17:59 +02:00
commit ba1b2fb651
5 changed files with 44 additions and 30 deletions


@@ -24,7 +24,7 @@ doc/
1. **Install the router**:
```bash
-git clone https://github.com/nomyo-ai/nomyo-router.git
+git clone https://bitfreedom.net/code/nomyo-ai/nomyo-router.git
cd nomyo-router
python3 -m venv .venv/router
source .venv/router/bin/activate
@@ -36,14 +36,19 @@ doc/
endpoints:
- http://localhost:11434
max_concurrent_connections: 2
-# Optional router-level API key (leave blank to disable)
-nomyo-router-api-key: ""
-```
+# Optional router-level API key (leave blank to disable)
+nomyo-router-api-key: ""
+```
3. **Run the router**:
-```bash
-uvicorn router:app --host 0.0.0.0 --port 12434
-```
+```bash
+uvicorn router:app --host 0.0.0.0 --port 12434
+```
4. **Use the router**: Point your frontend to `http://localhost:12434` instead of your Ollama instance.
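Step 4 boils down to swapping the Ollama base URL for the router's. A minimal illustrative helper (the `to_router` name is hypothetical, not part of the project):

```python
OLLAMA_URL = "http://localhost:11434"
ROUTER_URL = "http://localhost:12434"

def to_router(url: str) -> str:
    """Rewrite a direct-Ollama URL so the request goes through the router."""
    return url.replace(OLLAMA_URL, ROUTER_URL, 1)
```

Any client that lets you set a base URL needs only this one change; the router speaks the same API.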
### Key Features
@@ -138,14 +143,15 @@ For additional help:
Happy routing! 🚀
## Router API key usage
If the router API key is set (`NOMYO_ROUTER_API_KEY` env or `nomyo-router-api-key` in config), include it in every request:
- Header (preferred): `Authorization: Bearer <router_key>`
- Query param: `?api_key=<router_key>`
Example:
```bash
curl -H "Authorization: Bearer $NOMYO_ROUTER_API_KEY" http://localhost:12434/api/tags
```
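Both authentication forms can be built programmatically. A stdlib-only Python sketch (the fallback key value is illustrative):

```python
import os
from urllib.parse import urlencode

ROUTER_URL = "http://localhost:12434"
key = os.environ.get("NOMYO_ROUTER_API_KEY", "example-key")

# Preferred: Authorization header
headers = {"Authorization": f"Bearer {key}"}

# Fallback: api_key query parameter
tags_url = f"{ROUTER_URL}/api/tags?" + urlencode({"api_key": key})
```

Pass `headers` with any HTTP client, or use `tags_url` when headers are awkward to set.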


@@ -16,7 +16,7 @@ NOMYO Router can be deployed in various environments depending on your requireme
```bash
# Clone the repository
-git clone https://github.com/nomyo-ai/nomyo-router.git
+git clone https://bitfreedom.net/code/nomyo-ai/nomyo-router.git
cd nomyo-router
# Create virtual environment
@@ -84,10 +84,11 @@ sudo systemctl status nomyo-router
### Image variants
-| Tag | Semantic cache | Image size |
-|---|---|---|
-| `latest` | ❌ exact match only | ~300 MB |
-| `latest-semantic` | ✅ sentence-transformers + `all-MiniLM-L6-v2` pre-baked | ~800 MB |
+| Tag | Semantic cache | Image size |
+| ------------------- | -------------------------------------------------------- | ------------ |
+| `latest` | ❌ exact match only | ~300 MB |
+| `latest-semantic` | ✅ sentence-transformers + `all-MiniLM-L6-v2` pre-baked | ~800 MB |
The `:semantic` variant enables `cache_similarity < 1.0` in `config.yaml`. The lean image falls back to exact-match caching with a warning if semantic mode is configured.
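A `config.yaml` sketch for semantic mode, building on the quick-start config above (the `0.9` threshold and key placement are illustrative, not project defaults):

```yaml
endpoints:
  - http://localhost:11434
max_concurrent_connections: 2
# Values below 1.0 enable semantic matching; requires the latest-semantic image
cache_similarity: 0.9
```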


@@ -5,7 +5,7 @@
### 1. Install the Router
```bash
-git clone https://github.com/nomyo-ai/nomyo-router.git
+git clone https://bitfreedom.net/code/nomyo-ai/nomyo-router.git
cd nomyo-router
python3 -m venv .venv/router
source .venv/router/bin/activate
@@ -42,8 +42,9 @@ Configure your frontend to point to `http://localhost:12434` instead of your Oll
The router provides all standard Ollama API endpoints:
| Endpoint | Method | Description |
-| --------------- | ------ | --------------------- |
+| ----------------- | -------- | ----------------------- |
| `/api/generate` | POST | Generate text |
| `/api/chat` | POST | Chat completions |
| `/api/embed` | POST | Embeddings |
@@ -60,8 +61,9 @@ The router provides all standard Ollama API endpoints:
For OpenAI API compatibility:
| Endpoint | Method | Description |
-| ---------------------- | ------ | ---------------- |
+| ------------------------ | -------- | ------------------ |
| `/v1/chat/completions` | POST | Chat completions |
| `/v1/completions` | POST | Text completions |
| `/v1/embeddings` | POST | Embeddings |
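Requests to the OpenAI-compatible routes use the standard chat-completions payload shape. A stdlib-only sketch that only builds the request (the model name is illustrative):

```python
import json

ROUTER_URL = "http://localhost:12434"

payload = {
    "model": "llama3.2",  # illustrative model name
    "messages": [{"role": "user", "content": "Hello from the router"}],
}
url = f"{ROUTER_URL}/v1/chat/completions"
body = json.dumps(payload).encode("utf-8")
```

Send `body` as a POST with `Content-Type: application/json`, adding the Bearer header if a router key is configured.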
@@ -69,18 +71,19 @@ For OpenAI API compatibility:
### Monitoring Endpoints
-| Endpoint | Method | Description |
-| ---------------------------------- | ------ | ---------------------------------------- |
-| `/api/usage` | GET | Current connection counts |
-| `/api/token_counts` | GET | Token usage statistics |
-| `/api/stats` | POST | Detailed model statistics |
-| `/api/aggregate_time_series_days` | POST | Aggregate time series data into daily |
-| `/api/version` | GET | Ollama version info |
-| `/api/config` | GET | Endpoint configuration |
-| `/api/usage-stream` | GET | Real-time usage updates (SSE) |
-| `/health` | GET | Health check |
-| `/api/cache/stats` | GET | Cache hit/miss counters and config |
-| `/api/cache/invalidate` | POST | Clear all cache entries and counters |
+| Endpoint | Method | Description |
+| ----------------------------------- | -------- | --------------------------------------- |
+| `/api/usage` | GET | Current connection counts |
+| `/api/token_counts` | GET | Token usage statistics |
+| `/api/stats` | POST | Detailed model statistics |
+| `/api/aggregate_time_series_days` | POST | Aggregate time series data into daily |
+| `/api/version` | GET | Ollama version info |
+| `/api/config` | GET | Endpoint configuration |
+| `/api/usage-stream` | GET | Real-time usage updates (SSE) |
+| `/health` | GET | Health check |
+| `/api/cache/stats` | GET | Cache hit/miss counters and config |
+| `/api/cache/invalidate` | POST | Clear all cache entries and counters |
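`/api/usage-stream` pushes server-sent events, where each `data:` line carries a JSON payload. A minimal parser sketch; the sample payload shape is illustrative, keyed by endpoint then model as the dashboard's `token_usage_counts` structure suggests:

```python
import json

def parse_sse(stream_text: str):
    """Extract JSON payloads from the `data:` lines of an SSE stream."""
    events = []
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events

sample = 'data: {"token_usage_counts": {"http://localhost:11434": {"llama3.2": 128}}}\n\n'
events = parse_sse(sample)
```

In practice you would read the stream incrementally; this sketch assumes the full text is already buffered.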
## Making Requests
@@ -196,6 +199,7 @@ curl -X POST http://localhost:12434/api/cache/invalidate
```
**Notes:**
- MOE requests (`moe-*` model prefix) always bypass the cache
- Cache is isolated per `model + system prompt` — different users with different system prompts cannot receive each other's cached responses
- Semantic matching requires the `:semantic` Docker image tag (`ghcr.io/nomyo-ai/nomyo-router:latest-semantic`)
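The per-`model + system prompt` isolation can be pictured as scoping the cache key. An illustrative sketch, not the router's actual implementation:

```python
import hashlib

def cache_key(model: str, system_prompt: str, prompt: str) -> str:
    """Illustrative: entries are only shared within one (model, system prompt) scope."""
    raw = "\x00".join([model, system_prompt, prompt])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

a = cache_key("llama3.2", "You are a pirate.", "Hi")
b = cache_key("llama3.2", "You are a lawyer.", "Hi")
```

Because the system prompt is part of the scope, two users with different system prompts can never hit each other's cached responses, even for identical user prompts.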
@@ -404,14 +408,15 @@ print(f"Available models: {[m.id for m in response.data]}")
See the [examples](examples/) directory for complete integration examples.
### Authentication to NOMYO Router
If a router API key is configured, include it with each request:
- Header: `Authorization: Bearer <router_key>`
- Query: `?api_key=<router_key>`
Example (tags):
```bash
curl -H "Authorization: Bearer $NOMYO_ROUTER_API_KEY" http://localhost:12434/api/tags
```


@@ -42,4 +42,4 @@ yarl==1.20.1
aiosqlite
# Semantic LLM cache — base install (exact-match mode, no heavy ML deps)
# For semantic mode use the :semantic Docker image tag (adds sentence-transformers + torch)
-semantic-llm-cache@git+https://github.com/nomyo-ai/async-semantic-llm-cache.git@v0.1
+semantic-llm-cache@git+https://bitfreedom.net/code/nomyo-ai/async-semantic-llm-cache.git@v0.1.1


@@ -1092,7 +1092,9 @@ function renderTimeSeriesChart(timeSeriesData, chart, minutes) {
function updateTpsChart(payload) {
const tokens = payload.token_usage_counts || {};
const perModelTokens = {};
-psRows.forEach((_, model) => {
+const allModels = new Set();
+for (const ep in tokens) for (const model in tokens[ep]) allModels.add(model);
+allModels.forEach(model => {
let total = 0;
for (const ep in tokens) total += tokens[ep]?.[model] || 0;
// Normalise against the first-seen cumulative total so history