mirror of
https://github.com/katanemo/plano.git
synced 2026-04-30 03:16:28 +02:00
list all Plano-Orchestrator model variants in docs
This commit is contained in:
parent
747946fb39
commit
98038690b0
1 changed file with 4 additions and 2 deletions
@@ -343,10 +343,12 @@ By default, Plano uses a hosted Plano-Orchestrator endpoint. To self-host the or
 .. note::
    vLLM requires a Linux server with an NVIDIA GPU (CUDA). For local development on macOS, a GGUF version for Ollama is coming soon.
 
-Two model variants are available on HuggingFace:
+The following model variants are available on HuggingFace:
 
 * `Plano-Orchestrator-4B <https://huggingface.co/katanemo/Plano-Orchestrator-4B>`_ — lighter model, suitable for development and testing
-* `Plano-Orchestrator-30B-A3B <https://huggingface.co/katanemo/Plano-Orchestrator-30B-A3B>`_ — full-size model for production (FP8 quantized variant also available)
+* `Plano-Orchestrator-4B-FP8 <https://huggingface.co/katanemo/Plano-Orchestrator-4B-FP8>`_ — FP8 quantized 4B model, lower memory usage
+* `Plano-Orchestrator-30B-A3B <https://huggingface.co/katanemo/Plano-Orchestrator-30B-A3B>`_ — full-size model for production
+* `Plano-Orchestrator-30B-A3B-FP8 <https://huggingface.co/katanemo/Plano-Orchestrator-30B-A3B-FP8>`_ — FP8 quantized 30B model, recommended for production deployments
 
 Using vLLM
 ~~~~~~~~~~
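As context for the change above, here is a minimal sketch of calling a self-hosted orchestrator once one of the variants listed in this diff is running behind vLLM's OpenAI-compatible server. The endpoint address, API key placeholder, and prompt are illustrative assumptions, not values taken from Plano's docs.

# Sketch: query a self-hosted Plano-Orchestrator endpoint.
# Assumes the model was started with vLLM's OpenAI-compatible server,
# e.g.:  vllm serve katanemo/Plano-Orchestrator-4B
# The URL, key, and prompt below are assumptions for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's default listen address (assumed)
    api_key="EMPTY",  # vLLM does not require a real key by default
)

response = client.chat.completions.create(
    model="katanemo/Plano-Orchestrator-4B",
    messages=[{"role": "user", "content": "Route this request: check order status"}],
)
print(response.choices[0].message.content)

The same sketch applies to the other variants by swapping the model name; the FP8 builds trade a small amount of accuracy for lower GPU memory usage.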