feat: add buffer_lock to prevent a race condition in high-concurrency scenarios; add documentation
parent 434b6d4cca
commit 20a016269d
9 changed files with 2167 additions and 42 deletions
doc/README.md (Normal file, 137 lines)
@@ -0,0 +1,137 @@
# NOMYO Router Documentation

Welcome to the NOMYO Router documentation! This folder contains comprehensive guides for using, configuring, and deploying the NOMYO Router.

## Documentation Structure

```
doc/
├── architecture.md       # Technical architecture overview
├── configuration.md      # Detailed configuration guide
├── usage.md              # API usage examples
├── deployment.md         # Deployment scenarios
├── monitoring.md         # Monitoring and troubleshooting
└── examples/             # Example configurations and scripts
    ├── docker-compose.yml
    ├── sample-config.yaml
    └── k8s-deployment.yaml
```

## Getting Started

### Quick Start Guide

1. **Install the router**:

   ```bash
   git clone https://github.com/nomyo-ai/nomyo-router.git
   cd nomyo-router
   python3 -m venv .venv/router
   source .venv/router/bin/activate
   pip3 install -r requirements.txt
   ```

2. **Configure endpoints** in `config.yaml`:

   ```yaml
   endpoints:
     - http://localhost:11434
   max_concurrent_connections: 2
   ```

3. **Run the router**:

   ```bash
   uvicorn router:app --host 0.0.0.0 --port 12434
   ```

4. **Use the router**: Point your frontend to `http://localhost:12434` instead of your Ollama instance.
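
As a minimal sketch of step 4, the snippet below builds an Ollama-style request against the router address from step 3. It assumes the router forwards Ollama's `/api/generate` path unchanged; the model name `llama3` and both helper functions are hypothetical and only serve as illustration.

```python
import json
import urllib.request

ROUTER_URL = "http://localhost:12434"  # router address from step 3


def build_generate_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build (but do not send) an Ollama-style generate request aimed at the router.

    Assumption: the router accepts the same /api/generate payload as Ollama.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{ROUTER_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3") -> str:
    """Send the request; requires a running router with the model on some endpoint."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["response"]
```

With the router running, `generate("Why is the sky blue?")` would return the generated text; only the base URL differs from talking to Ollama directly.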

### Key Features

- **Intelligent Routing**: Model deployment-aware routing with load balancing
- **Multi-Endpoint Support**: Combine Ollama and OpenAI-compatible endpoints
- **Token Tracking**: Comprehensive token usage monitoring
- **Real-time Monitoring**: Server-Sent Events for live usage updates
- **OpenAI Compatibility**: Full OpenAI API compatibility layer
- **MOE System**: Multiple Opinions Ensemble for improved responses with smaller models
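
To make "model deployment-aware routing with load balancing" concrete, here is a rough sketch of one way such a choice could work. It is an illustration under stated assumptions, not the router's actual algorithm (see the Architecture guide for that): the function name, the `loaded` and `active` dictionaries, and the fallback rule are all hypothetical.

```python
from typing import Dict, Optional, Set


def choose_endpoint(
    model: str,
    loaded: Dict[str, Set[str]],  # hypothetical: endpoint URL -> models currently loaded
    active: Dict[str, int],       # hypothetical: endpoint URL -> in-flight requests
    max_concurrent: int,          # plays the role of max_concurrent_connections
) -> Optional[str]:
    """Pick the least busy endpoint, preferring ones that already have the model loaded."""

    def has_capacity(endpoint: str) -> bool:
        return active.get(endpoint, 0) < max_concurrent

    # First choice: endpoints with the model loaded and a free slot.
    with_model = [ep for ep, models in loaded.items() if model in models and has_capacity(ep)]
    # Fallback: any endpoint with a free slot (it would have to load the model first).
    candidates = with_model or [ep for ep in loaded if has_capacity(ep)]
    if not candidates:
        return None  # every endpoint is at its connection limit
    return min(candidates, key=lambda ep: active.get(ep, 0))
```

Preferring an endpoint that already holds the model avoids a cold load; the connection count then acts as the tie-breaker between equally suitable endpoints.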

## Documentation Guides

### [Architecture](architecture.md)

Learn about the router's internal architecture, routing algorithm, caching mechanisms, and advanced features like the MOE system.

### [Configuration](configuration.md)

Detailed guide on configuring the router with multiple endpoints, API keys, and environment variables.

### [Usage](usage.md)

Comprehensive API reference with examples for making requests, streaming responses, and using advanced features.

### [Deployment](deployment.md)

Step-by-step deployment guides for bare metal, Docker, Kubernetes, and production environments.

### [Monitoring](monitoring.md)

Monitoring endpoints, troubleshooting guides, performance tuning, and best practices for maintaining your router.

## Examples

The [examples](examples/) directory contains ready-to-use configuration files:

- **docker-compose.yml**: Complete Docker Compose setup with multiple Ollama instances
- **sample-config.yaml**: Example configuration with comments
- **k8s-deployment.yaml**: Kubernetes deployment manifests

## Need Help?

### Common Issues

Check the [Monitoring Guide](monitoring.md) to troubleshoot common problems:

- Endpoint unavailable
- Model not found
- High latency
- Connection limits reached
- Token tracking issues

### Support

For additional help:

1. Check the [GitHub Issues](https://github.com/nomyo-ai/nomyo-router/issues)
2. Review the [Monitoring Guide](monitoring.md) for diagnostics
3. Examine the router logs for detailed error messages

## Best Practices

### Configuration

- Use environment variables for API keys
- Set appropriate `max_concurrent_connections` based on your hardware
- Monitor endpoint health regularly
- Keep models loaded on multiple endpoints for redundancy
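
The first point above can be sketched as follows: keep a placeholder such as `${OPENAI_KEY}` in the configuration and resolve it from the environment at load time. The `${NAME}` syntax, the variable name, and the `expand_env` helper are assumptions for illustration; check the Configuration guide for the format the router actually accepts.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} placeholders (assumed syntax, for illustration only).
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each ${NAME} in value with the matching environment variable.

    Raises KeyError when a referenced variable is not set, so a missing key
    fails at startup instead of surfacing later as an authentication error.
    """
    source = os.environ if env is None else env

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in source:
            raise KeyError(f"environment variable {name} is not set")
        return source[name]

    return _PLACEHOLDER.sub(substitute, value)
```

For example, after `export OPENAI_KEY=...` in the shell, `expand_env("${OPENAI_KEY}")` yields the key without it ever being written to `config.yaml`.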

### Deployment

- Use Docker for containerized deployments
- Consider Kubernetes for production at scale
- Set up monitoring and alerting
- Back up the token counts database regularly
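
For the backup point above, a minimal sketch, assuming the token counts are stored in a SQLite file (this document does not state the storage engine). The function name and any paths you pass in are hypothetical. It uses the standard library's online backup API, which copies a consistent snapshot even while the database is in use.

```python
import sqlite3
from pathlib import Path


def backup_sqlite(source_path: Path, dest_path: Path) -> None:
    """Copy a SQLite database to dest_path using the online backup API."""
    source = sqlite3.connect(source_path)
    target = sqlite3.connect(dest_path)
    try:
        # Connection.backup copies every page of the source into the target.
        source.backup(target)
    finally:
        target.close()
        source.close()
```

Run on a schedule (cron, a systemd timer, or a Kubernetes CronJob), this gives point-in-time copies without stopping the router.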

### Performance

- Balance load across multiple endpoints
- Keep frequently used models loaded
- Monitor connection counts and token usage
- Scale horizontally when needed

## Next Steps

1. **Read the [Architecture Guide](architecture.md)** to understand how the router works
2. **Configure your endpoints** in `config.yaml`
3. **Deploy the router** using your preferred method
4. **Monitor your setup** using the monitoring endpoints
5. **Scale as needed** by adding more endpoints

Happy routing! 🚀