feat: visualization of conversation affinity in dashboard
parent 4acbaeb29c
commit aa7ec6354a
5 changed files with 306 additions and 19 deletions
@@ -166,6 +166,39 @@ curl -X POST http://localhost:12434/api/cache/invalidate
Clears all cached entries and resets hit/miss counters.

### Affinity Stats (Conversation Affinity)

```bash
curl http://localhost:12434/api/affinity_stats
```

Response when [`conversation_affinity`](configuration.md#conversation_affinity) is enabled:
```json
{
  "enabled": true,
  "ttl": 300,
  "entries": [
    { "endpoint": "http://gpu-primary:11434", "model": "llama3.2:latest", "remaining": 287.4 },
    { "endpoint": "http://gpu-primary:11434", "model": "llama3.2:latest", "remaining": 113.0 },
    { "endpoint": "http://gpu-secondary:11434", "model": "qwen2.5-coder:7b", "remaining": 44.8 }
  ]
}
```

Response when the feature is disabled:
```json
{ "enabled": false, "ttl": 300, "entries": [] }
```

- Each element of `entries` is one **live pinned conversation**. Entries carry no fingerprints or message content. They contain only the endpoint and model the pin points to, plus `remaining`, the number of seconds left before the pin expires.
- Aggregation by `(endpoint, model)` is left to the consumer. The dashboard does it client-side.
- The endpoint is protected by the same `nomyo-router-api-key` middleware as the rest of `/api/*`.
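
Grouping is left to the consumer, so any client other than the dashboard has to bucket the flat `entries` array itself. Below is a minimal sketch using only the Python standard library. It is not the dashboard's code, which runs in the browser. The names `fetch_affinity_stats` and `group_pins` are hypothetical. Authentication is left to the caller because this section does not document the header the `nomyo-router-api-key` middleware expects.

```python
import json
from collections import defaultdict
from typing import Dict, List, Optional, Tuple
from urllib.request import Request, urlopen

PinKey = Tuple[str, str]  # (endpoint, model)


def fetch_affinity_stats(base_url: str = "http://localhost:12434",
                         headers: Optional[Dict[str, str]] = None) -> dict:
    """GET /api/affinity_stats and decode the JSON body.

    Hypothetical helper. `headers` is passed through unchanged: if the router
    enforces an API key, supply the auth header your deployment expects.
    """
    req = Request(f"{base_url}/api/affinity_stats", headers=headers or {})
    with urlopen(req) as resp:
        return json.load(resp)


def group_pins(stats: dict) -> Dict[PinKey, List[float]]:
    """Bucket live pins by (endpoint, model).

    Hypothetical helper. Returns an empty dict when the feature is disabled.
    Otherwise each value lists the `remaining` seconds of that pair's pins,
    in the order the entries appear in the response.
    """
    if not stats.get("enabled"):
        return {}
    groups: Dict[PinKey, List[float]] = defaultdict(list)
    for entry in stats.get("entries", []):
        groups[(entry["endpoint"], entry["model"])].append(entry["remaining"])
    return dict(groups)


# Usage against a running router (performs a network request):
#   for (endpoint, model), pins in group_pins(fetch_affinity_stats()).items():
#       print(endpoint, model, len(pins))
```

Applied to the enabled sample response above, `group_pins` returns two keys. `("http://gpu-primary:11434", "llama3.2:latest")` maps to `[287.4, 113.0]`, and `("http://gpu-secondary:11434", "qwen2.5-coder:7b")` maps to `[44.8]`.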

The dashboard's **Running Models (PS) → Affinity** column is rendered from this data and hides itself when `enabled` is `false`. Each row shows one dot per live pin on that `(endpoint, model)` pair. A dot's opacity is `remaining / ttl`, clamped to a minimum of 0.15, so freshly routed pins are solid and pins close to expiry fade out. Once a single `(endpoint, model)` pair holds more than 12 live pins, a `+N` overflow badge appears. An em-dash (`—`) marks a pair with no live pins.
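
The per-cell rule above is small enough to state as a pure function. The Python sketch below encodes only what this section documents: opacity `remaining / ttl` with a 0.15 floor, a `+N` badge beyond 12 pins, and an em-dash for a pair with no pins. It is not the dashboard's implementation. The names `affinity_cell`, `dot_opacity`, and `AffinityCell` are hypothetical, and two details are assumptions. First, dots are ordered freshest first. Second, exactly 12 dots are drawn before the remainder collapses into `+N`.

```python
from dataclasses import dataclass, field
from typing import List

MAX_DOTS = 12          # overflow threshold documented above
MIN_OPACITY = 0.15     # opacity floor documented above
EMPTY_MARK = "\u2014"  # em-dash shown for a pair with no live pins


@dataclass
class AffinityCell:
    opacities: List[float] = field(default_factory=list)  # one per drawn dot
    overflow: int = 0    # N for the "+N" badge; 0 means no badge
    empty: bool = False  # True: render EMPTY_MARK instead of dots


def dot_opacity(remaining: float, ttl: float) -> float:
    """remaining / ttl, clamped to the range [MIN_OPACITY, 1.0]."""
    if ttl <= 0:
        return 1.0  # defensive guard for a bad ttl; not specified by the docs
    return max(MIN_OPACITY, min(1.0, remaining / ttl))


def affinity_cell(remaining: List[float], ttl: float) -> AffinityCell:
    """Build one Affinity cell from the remaining-seconds list of a pair.

    A running (endpoint, model) with no live pins passes an empty list.
    """
    if not remaining:
        return AffinityCell(empty=True)
    # Assumption: freshest pins (largest remaining) are drawn first.
    ordered = sorted(remaining, reverse=True)
    shown = ordered[:MAX_DOTS]
    return AffinityCell(
        opacities=[dot_opacity(r, ttl) for r in shown],
        overflow=len(ordered) - len(shown),
    )
```

With `ttl` set to 300, the `gpu-primary` pins from the sample (287.4 s and 113.0 s left) get opacities of about 0.96 and 0.38. A pin with 10 s left is held at the 0.15 floor.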

> Seeing several dots for what looks like "one chat window" is normal. Most chat UIs (Open WebUI, LibreChat, …) fire auxiliary requests such as title generation, follow-up suggestions, and tag extraction. Each of these has its own first-turn fingerprint and therefore its own pin. See [Conversation Affinity → Why the dashboard may show more than one dot per visible conversation](configuration.md#conversation_affinity) for details.

### Real-time Usage Stream

```bash