feat: add llama-swap as a backend

Commit aa8baebac5 (parent c8da58430a): 17 changed files with 544 additions and 52 deletions.

@@ -78,6 +78,37 @@ endpoints:
- OpenAI-compatible endpoints use `/v1` prefix
- The router automatically detects endpoint type based on URL pattern

### `llama_server_endpoints`

**Type**: `list[str]` (optional)

**Default**: `[]`

**Description**: List of [llama.cpp `llama-server`](https://github.com/ggml-org/llama.cpp) endpoints (OpenAI-compatible, configured with the `/v1` suffix). The router reads each backend's loaded models from `/v1/models` (entries with `status == "loaded"`) and unloads idle models via `POST /models/unload`.

```yaml
llama_server_endpoints:
  - http://192.168.0.50:8889/v1
```
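As a rough sketch of the two `llama-server` interactions described above, the Python below filters a `/v1/models` payload down to loaded entries and builds the unload request. It assumes each entry in `data` carries an `id` and a plain-string `status`, that `/models/unload` is served from the server root (the configured URL minus its `/v1` suffix), and that the request body is `{"model": <id>}`. The helper names are hypothetical and do not represent the router's actual code. No network call is made, so the functions can be checked against canned responses.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def server_root(endpoint: str) -> str:
    """Drop a trailing `/v1` so non-OpenAI routes can be addressed from the server root."""
    parts = urlsplit(endpoint.rstrip("/"))
    path = parts.path[: -len("/v1")] if parts.path.endswith("/v1") else parts.path
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def loaded_models(models_response: dict) -> list[str]:
    """Return the ids of `/v1/models` entries whose `status` is "loaded".

    Assumes the shape {"data": [{"id": ..., "status": ...}, ...]} (hypothetical).
    """
    return [
        entry["id"]
        for entry in models_response.get("data", [])
        if entry.get("status") == "loaded"
    ]


def unload_request(endpoint: str, model_id: str) -> tuple[str, str, bytes]:
    """Build (method, url, body) for `POST /models/unload`.

    The JSON body {"model": <id>} is an assumption about llama-server's payload.
    """
    body = json.dumps({"model": model_id}).encode("utf-8")
    return "POST", server_root(endpoint) + "/models/unload", body


# Example against a canned response (no server required).
sample = {"data": [{"id": "qwen", "status": "loaded"}, {"id": "gemma", "status": "unloaded"}]}
print(loaded_models(sample))  # ['qwen']
print(unload_request("http://192.168.0.50:8889/v1", "qwen")[:2])
# ('POST', 'http://192.168.0.50:8889/models/unload')
```

Keeping parsing and request construction separate from the HTTP transport makes the status filter easy to test without a live backend.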
### `llama_swap_endpoints`

**Type**: `list[str]` (optional)

**Default**: `[]`

**Description**: List of [llama-swap](https://github.com/mostlygeek/llama-swap) endpoints (OpenAI-compatible, configured with the `/v1` suffix). llama-swap fronts multiple `llama-server` workers behind one address. It is treated like `llama_server_endpoints` for routing, model discovery, and reranking, but differs in two ways that the router handles automatically:

- **Loaded-model detection**: llama-swap's `/v1/models` omits the per-model `status` field, so running workers are read from `GET /running` (entries with `state == "ready"`).
- **Model unload**: done via `POST /api/models/unload/:model_id` (model id as a path parameter) rather than `llama-server`'s `POST /models/unload`, which takes the model in the request body.

The router also exposes a passthrough route, `GET|POST /upstream/:model_id/<path>`, which forwards directly to a model's underlying `llama-server` worker (via llama-swap's `/upstream`), letting clients use `llama-server` features that llama-swap does not forward (e.g. token-array prompts).
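A minimal client-side sketch of the passthrough route, assuming the router listens at a hypothetical `http://localhost:8000` and that the upstream worker serves `llama-server`'s `/completion` endpoint, which accepts `prompt` as an array of token ids. `upstream_target` illustrates an assumed mapping from the router's route onto llama-swap's `/upstream/<model_id>/<path>`; `token_prompt_request` builds the client request; `send` posts it with the standard library. All three names are illustrative, and the token ids in the example are arbitrary placeholders.

```python
import json
import urllib.request
from urllib.parse import quote, urlsplit, urlunsplit


def server_root(endpoint: str) -> str:
    """Drop a trailing `/v1`; llama-swap's `/upstream` lives at its root."""
    parts = urlsplit(endpoint.rstrip("/"))
    path = parts.path[: -len("/v1")] if parts.path.endswith("/v1") else parts.path
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def upstream_target(llama_swap_endpoint: str, model_id: str, path: str) -> str:
    """Where `/upstream/:model_id/<path>` would land on llama-swap (assumed mapping)."""
    return f"{server_root(llama_swap_endpoint)}/upstream/{quote(model_id, safe='')}/{path.lstrip('/')}"


def token_prompt_request(router_base: str, model_id: str, tokens: list[int], n_predict: int = 16) -> tuple[str, bytes]:
    """Build a client request for llama-server's `/completion` with a token-array prompt."""
    url = f"{router_base.rstrip('/')}/upstream/{quote(model_id, safe='')}/completion"
    body = json.dumps({"prompt": tokens, "n_predict": n_predict}).encode("utf-8")
    return url, body


def send(url: str, body: bytes) -> dict:
    """POST the JSON body and decode the JSON reply (requires a running router)."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())


url, body = token_prompt_request("http://localhost:8000", "qwen", [1, 15043, 3186])
print(url)  # http://localhost:8000/upstream/qwen/completion
print(upstream_target("http://192.168.0.50:8890/v1", "qwen", "/completion"))
# http://192.168.0.50:8890/upstream/qwen/completion
# send(url, body)  # uncomment with a live router
```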
```yaml
llama_swap_endpoints:
  - http://192.168.0.50:8890/v1
```
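The two llama-swap differences listed under **Description** can be sketched as follows. This assumes `GET /running` returns a payload shaped like `{"running": [{"model": ..., "state": ...}, ...]}` and that both `/running` and `/api/models/unload/...` are served from the llama-swap root (the configured URL minus `/v1`). The helper names are hypothetical, and percent-encoding the model id is a defensive choice of this sketch, not documented llama-swap behavior.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def server_root(endpoint: str) -> str:
    """Drop a trailing `/v1`; llama-swap serves `/running` and `/api/...` from its root."""
    parts = urlsplit(endpoint.rstrip("/"))
    path = parts.path[: -len("/v1")] if parts.path.endswith("/v1") else parts.path
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def running_url(endpoint: str) -> str:
    """URL of llama-swap's `GET /running` listing."""
    return server_root(endpoint) + "/running"


def ready_models(running_response: dict) -> list[str]:
    """Return models whose worker `state` is "ready" (payload shape is assumed)."""
    return [
        worker["model"]
        for worker in running_response.get("running", [])
        if worker.get("state") == "ready"
    ]


def unload_url(endpoint: str, model_id: str) -> str:
    """Build the `POST /api/models/unload/:model_id` URL (model id as a path segment)."""
    # Percent-encode so spaces or other reserved characters stay inside one segment.
    return server_root(endpoint) + "/api/models/unload/" + quote(model_id, safe="")


sample = {"running": [{"model": "qwen", "state": "ready"}, {"model": "gemma", "state": "starting"}]}
print(ready_models(sample))  # ['qwen']
print(unload_url("http://192.168.0.50:8890/v1", "qwen"))
# http://192.168.0.50:8890/api/models/unload/qwen
```

Compared with the `llama-server` case, only the discovery source and the unload URL change, which is why the router can otherwise treat both endpoint lists the same way.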
### `max_concurrent_connections`

**Type**: `int`
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue