# NOMYO Router Documentation

Welcome to the NOMYO Router documentation! This folder contains comprehensive guides for using, configuring, and deploying the NOMYO Router.

## Documentation Structure

```
doc/
├── architecture.md     # Technical architecture overview
├── configuration.md    # Detailed configuration guide
├── usage.md            # API usage examples
├── deployment.md       # Deployment scenarios
├── monitoring.md       # Monitoring and troubleshooting
└── examples/           # Example configurations and scripts
    ├── docker-compose.yml
    ├── sample-config.yaml
    └── k8s-deployment.yaml
```

## Getting Started

### Quick Start Guide

1. **Install the router**:

   ```bash
   git clone https://bitfreedom.net/code/nomyo-ai/nomyo-router.git
   cd nomyo-router
   python3 -m venv .venv/router
   source .venv/router/bin/activate
   pip3 install -r requirements.txt
   ```
2. **Configure endpoints** in `config.yaml`:

   ```yaml
   endpoints:
     - http://localhost:11434
   max_concurrent_connections: 2

   # Optional router-level API key (leave blank to disable)
   nomyo-router-api-key: ""
   ```
3. **Run the router**:

   ```bash
   uvicorn router:app --host 0.0.0.0 --port 12434
   ```

4. **Use the router**: Point your frontend to `http://localhost:12434` instead of your Ollama instance.
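
As a minimal sketch of step 4 from a Python client, the only change is the base URL. This assumes the router forwards Ollama's `/api/tags` endpoint (the one used in the curl example in this document) and returns the usual Ollama-style `{"models": [...]}` payload. `ROUTER_URL`, `tags_url`, `model_names`, and `list_models` are illustrative names, not part of the router.

```python
import json
import urllib.request

ROUTER_URL = "http://localhost:12434"  # router address from step 3


def tags_url(base_url: str) -> str:
    """Build the model-listing URL for a given base URL."""
    return base_url.rstrip("/") + "/api/tags"


def model_names(payload: dict) -> list:
    """Extract model names from an Ollama-style /api/tags response (assumed shape)."""
    return [m["name"] for m in payload.get("models", [])]


def list_models(base_url: str = ROUTER_URL, timeout: float = 10.0) -> list:
    """Query the router exactly as a client would query Ollama directly."""
    with urllib.request.urlopen(tags_url(base_url), timeout=timeout) as resp:
        return model_names(json.load(resp))
```

With the router running, `list_models()` should return the names of the models visible through the router. If a router API key is configured, the request also needs the key described under "Router API key usage".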

### Key Features

- **Intelligent Routing**: Model deployment-aware routing with load balancing
- **Multi-Endpoint Support**: Combine Ollama and OpenAI-compatible endpoints
- **Token Tracking**: Comprehensive token usage monitoring
- **Real-time Monitoring**: Server-Sent Events for live usage updates
- **OpenAI Compatibility**: Full OpenAI API compatibility layer
- **MOE System**: Multiple Opinions Ensemble for improved responses with smaller models
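
To make the first feature concrete, here is a simplified sketch of deployment-aware, load-balanced selection. It is not the router's actual algorithm (the Architecture guide covers that); it only assumes that each endpoint knows which models it serves and how many requests are in flight, and that `max_concurrent_connections` caps the latter. The `Endpoint` class, `pick_endpoint`, the second URL, and the model names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Endpoint:
    url: str
    models: Set[str] = field(default_factory=set)  # models deployed on this endpoint
    active: int = 0  # requests currently in flight


def pick_endpoint(endpoints: List[Endpoint], model: str, max_concurrent: int) -> Optional[Endpoint]:
    """Return the least-busy endpoint that serves `model`, or None if none has capacity."""
    candidates = [
        ep for ep in endpoints
        if model in ep.models and ep.active < max_concurrent
    ]
    if not candidates:
        return None
    # Load balancing: prefer the endpoint with the fewest in-flight requests
    return min(candidates, key=lambda ep: ep.active)


endpoints = [
    Endpoint("http://localhost:11434", {"llama3"}, active=2),             # already at the limit
    Endpoint("http://localhost:11435", {"llama3", "mistral"}, active=1),  # hypothetical second host
]
chosen = pick_endpoint(endpoints, "llama3", max_concurrent=2)
print(chosen.url if chosen else "no capacity")
```

Returning `None` when every candidate is saturated leaves the choice between queueing and rejecting to the caller, which keeps the selection logic itself free of policy.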

## Documentation Guides

### [Architecture](architecture.md)

Learn about the router's internal architecture, routing algorithm, caching mechanisms, and advanced features like the MOE system.

### [Configuration](configuration.md)

Detailed guide on configuring the router with multiple endpoints, API keys, and environment variables.

### [Usage](usage.md)

Comprehensive API reference with examples for making requests, streaming responses, and using advanced features.

### [Deployment](deployment.md)

Step-by-step deployment guides for bare metal, Docker, Kubernetes, and production environments.

### [Monitoring](monitoring.md)

Monitoring endpoints, troubleshooting guides, performance tuning, and best practices for maintaining your router.

## Examples

The [examples](examples/) directory contains ready-to-use configuration files:

- **docker-compose.yml**: Complete Docker Compose setup with multiple Ollama instances
- **sample-config.yaml**: Example configuration with comments
- **k8s-deployment.yaml**: Kubernetes deployment manifests

## Need Help?

### Common Issues

Check the [Monitoring Guide](monitoring.md) for troubleshooting common problems:

- Endpoint unavailable
- Model not found
- High latency
- Connection limits reached
- Token tracking issues

### Support

For additional help:

1. Check the [GitHub Issues](https://github.com/nomyo-ai/nomyo-router/issues)
2. Review the [Monitoring Guide](monitoring.md) for diagnostics
3. Examine the router logs for detailed error messages

## Best Practices

### Configuration

- Use environment variables for API keys
- Set `max_concurrent_connections` to match your hardware
- Monitor endpoint health regularly
- Keep models loaded on multiple endpoints for redundancy

### Deployment

- Use Docker for containerized deployments
- Consider Kubernetes for production at scale
- Set up monitoring and alerting
- Back up the token-count database regularly

### Performance

- Balance load across multiple endpoints
- Keep frequently used models loaded
- Monitor connection counts and token usage
- Scale horizontally when needed

## Next Steps

1. **Read the [Architecture Guide](architecture.md)** to understand how the router works
2. **Configure your endpoints** in `config.yaml`
3. **Deploy the router** using your preferred method
4. **Monitor your setup** using the monitoring endpoints
5. **Scale as needed** by adding more endpoints

Happy routing! 🚀

## Router API Key Usage

If the router API key is set (via the `NOMYO_ROUTER_API_KEY` environment variable or `nomyo-router-api-key` in the config), include it in every request:

- Header (preferred): `Authorization: Bearer <router_key>`
- Query param: `?api_key=<router_key>`

Example:

```bash
curl -H "Authorization: Bearer $NOMYO_ROUTER_API_KEY" http://localhost:12434/api/tags
```
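
The same two options can be sketched in Python with the standard library only. `authorized_request` is a hypothetical helper, not part of the router; it simply builds a request carrying the key in one of the two places listed above.

```python
import os
import urllib.parse
import urllib.request


def authorized_request(url: str, router_key: str, use_query_param: bool = False) -> urllib.request.Request:
    """Attach the router key as a Bearer header (preferred) or as an `api_key` query parameter."""
    if use_query_param:
        # Append to an existing query string if there is one
        sep = "&" if urllib.parse.urlparse(url).query else "?"
        return urllib.request.Request(url + sep + urllib.parse.urlencode({"api_key": router_key}))
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {router_key}"})


# Read the key from the environment rather than hard-coding it;
# the empty-string fallback is only a placeholder for this sketch
key = os.environ.get("NOMYO_ROUTER_API_KEY", "")
req = authorized_request("http://localhost:12434/api/tags", key)
```

Passing `req` to `urllib.request.urlopen` sends the request. The header form keeps the key out of URLs, which tend to end up in access logs and browser history.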