feat:

added buffer_lock to prevent race condition in high concurrency scenarios added documentation
2026-01-05 17:16:31 +01:00 · 2026-01-05 17:16:31 +01:00 · 20a016269d
commit 20a016269d
parent 434b6d4cca
9 changed files with 2167 additions and 42 deletions
--- a/doc/README.md
+++ b/doc/README.md
@ -0,0 +1,137 @@
+# NOMYO Router Documentation
+
+Welcome to the NOMYO Router documentation! This folder contains comprehensive guides for using, configuring, and deploying the NOMYO Router.
+
+## Documentation Structure
+
+```
+doc/
+├── architecture.md          # Technical architecture overview
+├── configuration.md         # Detailed configuration guide
+├── usage.md                 # API usage examples
+├── deployment.md            # Deployment scenarios
+├── monitoring.md            # Monitoring and troubleshooting
+└── examples/                # Example configurations and scripts
+    ├── docker-compose.yml
+    ├── sample-config.yaml
+    └── k8s-deployment.yaml
+```
+
+## Getting Started
+
+### Quick Start Guide
+
+1. **Install the router**:
+
+   ```bash
+   git clone https://github.com/nomyo-ai/nomyo-router.git
+   cd nomyo-router
+   python3 -m venv .venv/router
+   source .venv/router/bin/activate
+   pip3 install -r requirements.txt
+   ```
+2. **Configure endpoints** in `config.yaml`:
+
+   ```yaml
+   endpoints:
+     - http://localhost:11434
+   max_concurrent_connections: 2
+   ```
+3. **Run the router**:
+
+   ```bash
+   uvicorn router:app --host 0.0.0.0 --port 12434
+   ```
+4. **Use the router**: Point your frontend to `http://localhost:12434` instead of your Ollama instance.
+
+### Key Features
+
+- **Intelligent Routing**: Model deployment-aware routing with load balancing
+- **Multi-Endpoint Support**: Combine Ollama and OpenAI-compatible endpoints
+- **Token Tracking**: Comprehensive token usage monitoring
+- **Real-time Monitoring**: Server-Sent Events for live usage updates
+- **OpenAI Compatibility**: Full OpenAI API compatibility layer
+- **MOE System**: Multiple Opinions Ensemble for improved responses with smaller models
+
+## Documentation Guides
+
+### [Architecture](architecture.md)
+
+Learn about the router's internal architecture, routing algorithm, caching mechanisms, and advanced features like the MOE system.
+
+### [Configuration](configuration.md)
+
+Detailed guide on configuring the router with multiple endpoints, API keys, and environment variables.
+
+### [Usage](usage.md)
+
+Comprehensive API reference with examples for making requests, streaming responses, and using advanced features.
+
+### [Deployment](deployment.md)
+
+Step-by-step deployment guides for bare metal, Docker, Kubernetes, and production environments.
+
+### [Monitoring](monitoring.md)
+
+Monitoring endpoints, troubleshooting guides, performance tuning, and best practices for maintaining your router.
+
+## Examples
+
+The [examples](examples/) directory contains ready-to-use configuration files:
+
+- **docker-compose.yml**: Complete Docker Compose setup with multiple Ollama instances
+- **sample-config.yaml**: Example configuration with comments
+- **k8s-deployment.yaml**: Kubernetes deployment manifests
+
+## Need Help?
+
+### Common Issues
+
+Check the [Monitoring Guide](monitoring.md) for troubleshooting common problems:
+
+- Endpoint unavailable
+- Model not found
+- High latency
+- Connection limits reached
+- Token tracking issues
+
+### Support
+
+For additional help:
+
+1. Check the [GitHub Issues](https://github.com/nomyo-ai/nomyo-router/issues)
+2. Review the [Monitoring Guide](monitoring.md) for diagnostics
+3. Examine the router logs for detailed error messages
+
+## Best Practices
+
+### Configuration
+
+- Use environment variables for API keys
+- Set appropriate `max_concurrent_connections` based on your hardware
+- Monitor endpoint health regularly
+- Keep models loaded on multiple endpoints for redundancy
+
+### Deployment
+
+- Use Docker for containerized deployments
+- Consider Kubernetes for production at scale
+- Set up monitoring and alerting
+- Implement regular backups of token counts database
+
+### Performance
+
+- Balance load across multiple endpoints
+- Keep frequently used models loaded
+- Monitor connection counts and token usage
+- Scale horizontally when needed
+
+## Next Steps
+
+1. **Read the [Architecture Guide](architecture.md)** to understand how the router works
+2. **Configure your endpoints** in `config.yaml`
+3. **Deploy the router** using your preferred method
+4. **Monitor your setup** using the monitoring endpoints
+5. **Scale as needed** by adding more endpoints
+
+Happy routing! 🚀