diff --git a/docs/source/guides/orchestration.rst b/docs/source/guides/orchestration.rst
index 20b5455a..1d300f38 100644
--- a/docs/source/guides/orchestration.rst
+++ b/docs/source/guides/orchestration.rst
@@ -343,10 +343,12 @@ By default, Plano uses a hosted Plano-Orchestrator endpoint. To self-host the or
 .. note::
    vLLM requires a Linux server with an NVIDIA GPU (CUDA). For local development on macOS, a GGUF version for Ollama is coming soon.
 
-Two model variants are available on HuggingFace:
+The following model variants are available on HuggingFace:
 
 * `Plano-Orchestrator-4B `_ — lighter model, suitable for development and testing
-* `Plano-Orchestrator-30B-A3B `_ — full-size model for production (FP8 quantized variant also available)
+* `Plano-Orchestrator-4B-FP8 `_ — FP8 quantized 4B model, lower memory usage
+* `Plano-Orchestrator-30B-A3B `_ — full-size model for production
+* `Plano-Orchestrator-30B-A3B-FP8 `_ — FP8 quantized 30B model, recommended for production deployments
 
 Using vLLM
 ~~~~~~~~~~