mirror of
https://github.com/katanemo/plano.git
synced 2026-04-30 03:16:28 +02:00
list all Plano-Orchestrator model variants in docs
This commit is contained in:
parent
747946fb39
commit
98038690b0
1 changed file with 4 additions and 2 deletions
@@ -343,10 +343,12 @@ By default, Plano uses a hosted Plano-Orchestrator endpoint. To self-host the or
 .. note::
    vLLM requires a Linux server with an NVIDIA GPU (CUDA). For local development on macOS, a GGUF version for Ollama is coming soon.
 
-Two model variants are available on HuggingFace:
+The following model variants are available on HuggingFace:
 
 * `Plano-Orchestrator-4B <https://huggingface.co/katanemo/Plano-Orchestrator-4B>`_ — lighter model, suitable for development and testing
-* `Plano-Orchestrator-30B-A3B <https://huggingface.co/katanemo/Plano-Orchestrator-30B-A3B>`_ — full-size model for production (FP8 quantized variant also available)
+* `Plano-Orchestrator-4B-FP8 <https://huggingface.co/katanemo/Plano-Orchestrator-4B-FP8>`_ — FP8 quantized 4B model, lower memory usage
+* `Plano-Orchestrator-30B-A3B <https://huggingface.co/katanemo/Plano-Orchestrator-30B-A3B>`_ — full-size model for production
+* `Plano-Orchestrator-30B-A3B-FP8 <https://huggingface.co/katanemo/Plano-Orchestrator-30B-A3B-FP8>`_ — FP8 quantized 30B model, recommended for production deployments
 
 Using vLLM
 ~~~~~~~~~~
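As context for the change above, here is a minimal sketch of calling a self-hosted orchestrator once one of the variants listed in this diff is running behind vLLM's OpenAI-compatible server. The endpoint address, API key placeholder, and prompt are illustrative assumptions, not values taken from Plano's docs.

# Sketch: query a self-hosted Plano-Orchestrator endpoint.
# Assumes the model was started with vLLM's OpenAI-compatible server,
# e.g.:  vllm serve katanemo/Plano-Orchestrator-4B
# The URL, key, and prompt below are assumptions for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's default listen address (assumed)
    api_key="EMPTY",  # vLLM does not require a real key by default
)

response = client.chat.completions.create(
    model="katanemo/Plano-Orchestrator-4B",
    messages=[{"role": "user", "content": "Route this request: check order status"}],
)
print(response.choices[0].message.content)

The same sketch applies to the other variants by swapping the model name; the FP8 builds trade a small amount of accuracy for lower GPU memory usage.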