feat:
added buffer_lock to prevent race condition in high concurrency scenarios added documentation
This commit is contained in:
parent
434b6d4cca
commit
20a016269d
9 changed files with 2167 additions and 42 deletions
515
doc/monitoring.md
Normal file
515
doc/monitoring.md
Normal file
|
|
@ -0,0 +1,515 @@
|
|||
# Monitoring and Troubleshooting Guide
|
||||
|
||||
## Monitoring Overview
|
||||
|
||||
NOMYO Router provides comprehensive monitoring capabilities to track performance, health, and usage patterns.
|
||||
|
||||
## Monitoring Endpoints
|
||||
|
||||
### Health Check
|
||||
|
||||
```bash
|
||||
curl http://localhost:12434/health
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"status": "ok" | "error",
|
||||
"endpoints": {
|
||||
"http://endpoint1:11434": {
|
||||
"status": "ok" | "error",
|
||||
"version": "string" | "detail": "error message"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**HTTP Status Codes**:
|
||||
- `200`: All endpoints healthy
|
||||
- `503`: One or more endpoints unhealthy
|
||||
|
||||
### Current Usage
|
||||
|
||||
```bash
|
||||
curl http://localhost:12434/api/usage
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"usage_counts": {
|
||||
"http://endpoint1:11434": {
|
||||
"llama3": 2,
|
||||
"mistral": 1
|
||||
},
|
||||
"http://endpoint2:11434": {
|
||||
"llama3": 0,
|
||||
"mistral": 3
|
||||
}
|
||||
},
|
||||
"token_usage_counts": {
|
||||
"http://endpoint1:11434": {
|
||||
"llama3": 1542,
|
||||
"mistral": 876
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Token Statistics
|
||||
|
||||
```bash
|
||||
curl http://localhost:12434/api/token_counts
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"total_tokens": 2418,
|
||||
"breakdown": [
|
||||
{
|
||||
"endpoint": "http://endpoint1:11434",
|
||||
"model": "llama3",
|
||||
"input_tokens": 120,
|
||||
"output_tokens": 1422,
|
||||
"total_tokens": 1542
|
||||
},
|
||||
{
|
||||
"endpoint": "http://endpoint1:11434",
|
||||
"model": "mistral",
|
||||
"input_tokens": 80,
|
||||
"output_tokens": 796,
|
||||
"total_tokens": 876
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Model Statistics
|
||||
|
||||
```bash
|
||||
curl http://localhost:12434/api/stats -X POST -d '{"model": "llama3"}'
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"model": "llama3",
|
||||
"input_tokens": 120,
|
||||
"output_tokens": 1422,
|
||||
"total_tokens": 1542,
|
||||
"time_series": [
|
||||
{
|
||||
"endpoint": "http://endpoint1:11434",
|
||||
"timestamp": 1712345600,
|
||||
"input_tokens": 20,
|
||||
"output_tokens": 150,
|
||||
"total_tokens": 170
|
||||
}
|
||||
],
|
||||
"endpoint_distribution": {
|
||||
"http://endpoint1:11434": 1542
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Configuration Status
|
||||
|
||||
```bash
|
||||
curl http://localhost:12434/api/config
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"endpoints": [
|
||||
{
|
||||
"url": "http://endpoint1:11434",
|
||||
"status": "ok" | "error",
|
||||
"version": "string" | "detail": "error message"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Real-time Usage Stream
|
||||
|
||||
```bash
|
||||
curl http://localhost:12434/api/usage-stream
|
||||
```
|
||||
|
||||
This provides Server-Sent Events (SSE) with real-time updates:
|
||||
```
|
||||
data: {"usage_counts": {...}, "token_usage_counts": {...}}
|
||||
|
||||
data: {"usage_counts": {...}, "token_usage_counts": {...}}
|
||||
```
|
||||
|
||||
## Monitoring Tools
|
||||
|
||||
### Prometheus Integration
|
||||
|
||||
Create a Prometheus scrape configuration:
|
||||
|
||||
```yaml
|
||||
scrape_configs:
|
||||
- job_name: 'nomyo-router'
|
||||
metrics_path: '/api/usage'
|
||||
params:
|
||||
format: ['prometheus']
|
||||
static_configs:
|
||||
- targets: ['nomyo-router:12434']
|
||||
```
|
||||
|
||||
### Grafana Dashboard
|
||||
|
||||
Create a dashboard with these panels:
|
||||
- Endpoint health status
|
||||
- Current connection counts
|
||||
- Token usage (input/output/total)
|
||||
- Request rates
|
||||
- Response times
|
||||
- Error rates
|
||||
|
||||
### Logging
|
||||
|
||||
The router logs important events to stdout:
|
||||
- Configuration loading
|
||||
- Endpoint connection issues
|
||||
- Token counting operations
|
||||
- Error conditions
|
||||
|
||||
## Troubleshooting Guide
|
||||
|
||||
### Common Issues and Solutions
|
||||
|
||||
#### 1. Endpoint Unavailable
|
||||
|
||||
**Symptoms**:
|
||||
- Health check shows endpoint as "error"
|
||||
- Requests fail with connection errors
|
||||
|
||||
**Diagnosis**:
|
||||
```bash
|
||||
curl http://localhost:12434/health
|
||||
curl http://localhost:12434/api/config
|
||||
```
|
||||
|
||||
**Solutions**:
|
||||
- Verify Ollama endpoint is running
|
||||
- Check network connectivity
|
||||
- Verify firewall rules
|
||||
- Check DNS resolution
|
||||
- Test direct connection: `curl http://endpoint:11434/api/version`
|
||||
|
||||
#### 2. Model Not Found
|
||||
|
||||
**Symptoms**:
|
||||
- Error: "None of the configured endpoints advertise the model"
|
||||
- Requests fail with model not found
|
||||
|
||||
**Diagnosis**:
|
||||
```bash
|
||||
curl http://localhost:12434/api/tags
|
||||
curl http://endpoint:11434/api/tags
|
||||
```
|
||||
|
||||
**Solutions**:
|
||||
- Pull the model on the endpoint: `curl http://endpoint:11434/api/pull -d '{"name": "llama3"}'`
|
||||
- Verify model name spelling
|
||||
- Check if model is available on any endpoint
|
||||
- For OpenAI endpoints, ensure model exists in their catalog
|
||||
|
||||
#### 3. High Latency
|
||||
|
||||
**Symptoms**:
|
||||
- Slow response times
|
||||
- Requests timing out
|
||||
|
||||
**Diagnosis**:
|
||||
```bash
|
||||
curl http://localhost:12434/api/usage
|
||||
curl http://localhost:12434/api/config
|
||||
```
|
||||
|
||||
**Solutions**:
|
||||
- Check if endpoints are overloaded (high connection counts)
|
||||
- Increase `max_concurrent_connections`
|
||||
- Add more endpoints to the cluster
|
||||
- Monitor Ollama endpoint performance
|
||||
- Check network latency between router and endpoints
|
||||
|
||||
#### 4. Connection Limits Reached
|
||||
|
||||
**Symptoms**:
|
||||
- Requests queuing
|
||||
- High connection counts
|
||||
- Slow response times
|
||||
|
||||
**Diagnosis**:
|
||||
```bash
|
||||
curl http://localhost:12434/api/usage
|
||||
```
|
||||
|
||||
**Solutions**:
|
||||
- Increase `max_concurrent_connections` in config.yaml
|
||||
- Add more Ollama endpoints
|
||||
- Scale your Ollama cluster
|
||||
- Use MOE system for critical queries
|
||||
|
||||
#### 5. Token Tracking Not Working
|
||||
|
||||
**Symptoms**:
|
||||
- Token counts not updating
|
||||
- Database errors
|
||||
|
||||
**Diagnosis**:
|
||||
```bash
|
||||
ls -la token_counts.db
|
||||
curl http://localhost:12434/api/token_counts
|
||||
```
|
||||
|
||||
**Solutions**:
|
||||
- Verify database file permissions
|
||||
- Check if database path is writable
|
||||
- Restart router to rebuild database
|
||||
- Check disk space
|
||||
- Verify environment variable `NOMYO_ROUTER_DB_PATH`
|
||||
|
||||
#### 6. Streaming Issues
|
||||
|
||||
**Symptoms**:
|
||||
- Incomplete responses
|
||||
- Connection resets during streaming
|
||||
- Timeout errors
|
||||
|
||||
**Diagnosis**:
|
||||
- Check router logs for errors
|
||||
- Test with non-streaming requests
|
||||
- Monitor connection counts
|
||||
|
||||
**Solutions**:
|
||||
- Increase timeout settings
|
||||
- Reduce `max_concurrent_connections`
|
||||
- Check network stability
|
||||
- Test with smaller payloads
|
||||
|
||||
### Error Messages
|
||||
|
||||
#### "Failed to connect to endpoint"
|
||||
|
||||
**Cause**: Network connectivity issue
|
||||
**Action**: Verify endpoint is reachable, check firewall, test DNS
|
||||
|
||||
#### "None of the configured endpoints advertise the model"
|
||||
|
||||
**Cause**: Model not pulled on any endpoint
|
||||
**Action**: Pull the model, verify model name
|
||||
|
||||
#### "Timed out waiting for endpoint"
|
||||
|
||||
**Cause**: Endpoint slow to respond
|
||||
**Action**: Check endpoint health, increase timeouts
|
||||
|
||||
#### "Invalid JSON format in request body"
|
||||
|
||||
**Cause**: Malformed request
|
||||
**Action**: Validate request payload, check API documentation
|
||||
|
||||
#### "Missing required field 'model'"
|
||||
|
||||
**Cause**: Request missing model parameter
|
||||
**Action**: Add model parameter to request
|
||||
|
||||
### Performance Tuning
|
||||
|
||||
#### Optimizing Connection Handling
|
||||
|
||||
1. **Adjust concurrency limits**:
|
||||
```yaml
|
||||
max_concurrent_connections: 4
|
||||
```
|
||||
|
||||
2. **Monitor connection usage**:
|
||||
```bash
|
||||
curl http://localhost:12434/api/usage
|
||||
```
|
||||
|
||||
3. **Scale horizontally**:
|
||||
- Add more Ollama endpoints
|
||||
- Deploy multiple router instances
|
||||
|
||||
#### Reducing Latency
|
||||
|
||||
1. **Keep models loaded**:
|
||||
- Use frequently accessed models
|
||||
- Monitor `/api/ps` for loaded models
|
||||
|
||||
2. **Optimize endpoint selection**:
|
||||
- Distribute models across endpoints
|
||||
- Balance load evenly
|
||||
|
||||
3. **Use caching**:
|
||||
- Models cache (300s TTL)
|
||||
- Loaded models cache (30s TTL)
|
||||
|
||||
#### Memory Management
|
||||
|
||||
1. **Monitor memory usage**:
|
||||
- Token buffer grows with usage
|
||||
- Time-series data accumulates
|
||||
|
||||
2. **Adjust flush interval**:
|
||||
- Default: 10 seconds
|
||||
- Can be increased for less frequent I/O
|
||||
|
||||
3. **Database maintenance**:
|
||||
- Regular backups
|
||||
- Archive old data periodically
|
||||
|
||||
### Database Management
|
||||
|
||||
#### Viewing Token Data
|
||||
|
||||
```bash
|
||||
sqlite3 token_counts.db "SELECT * FROM token_counts;"
|
||||
sqlite3 token_counts.db "SELECT * FROM time_series LIMIT 100;"
|
||||
```
|
||||
|
||||
#### Aggregating Old Data
|
||||
|
||||
```bash
|
||||
curl http://localhost:12434/api/aggregate_time_series_days \
|
||||
-X POST \
|
||||
-d '{"days": 30, "trim_old": true}'
|
||||
```
|
||||
|
||||
#### Backing Up Database
|
||||
|
||||
```bash
|
||||
cp token_counts.db token_counts.db.backup
|
||||
```
|
||||
|
||||
#### Restoring Database
|
||||
|
||||
```bash
|
||||
cp token_counts.db.backup token_counts.db
|
||||
```
|
||||
|
||||
### Advanced Troubleshooting
|
||||
|
||||
#### Debugging Endpoint Selection
|
||||
|
||||
Enable debug logging to see endpoint selection decisions:
|
||||
```bash
|
||||
uvicorn router:app --host 0.0.0.0 --port 12434 --log-level debug
|
||||
```
|
||||
|
||||
#### Testing Individual Endpoints
|
||||
|
||||
```bash
|
||||
# Test endpoint directly
|
||||
curl http://endpoint:11434/api/version
|
||||
|
||||
# Test model availability
|
||||
curl http://endpoint:11434/api/tags
|
||||
|
||||
# Test model loading
|
||||
curl http://endpoint:11434/api/ps
|
||||
```
|
||||
|
||||
#### Network Diagnostics
|
||||
|
||||
```bash
|
||||
# Test connectivity
|
||||
nc -zv endpoint 11434
|
||||
|
||||
# Test DNS resolution
|
||||
dig endpoint
|
||||
|
||||
# Test latency
|
||||
ping endpoint
|
||||
```
|
||||
|
||||
### Common Pitfalls
|
||||
|
||||
1. **Using localhost in Docker**:
|
||||
- Inside Docker, `localhost` refers to the container itself
|
||||
- Use `host.docker.internal` or Docker service names
|
||||
|
||||
2. **Incorrect model names**:
|
||||
- Ollama: `llama3:latest`
|
||||
- OpenAI: `gpt-4` (no version suffix)
|
||||
|
||||
3. **Missing API keys**:
|
||||
- Remote endpoints require authentication
|
||||
- Set keys in config.yaml or environment variables
|
||||
|
||||
4. **Firewall blocking**:
|
||||
- Ensure port 11434 is open for Ollama
|
||||
- Ensure port 12434 is open for router
|
||||
|
||||
5. **Insufficient resources**:
|
||||
- Monitor CPU/memory on Ollama endpoints
|
||||
- Adjust `max_concurrent_connections` accordingly
|
||||
|
||||
## Best Practices
|
||||
|
||||
### Monitoring Setup
|
||||
|
||||
1. **Set up health checks**:
|
||||
- Monitor `/health` endpoint
|
||||
- Alert on status "error"
|
||||
|
||||
2. **Track usage metrics**:
|
||||
- Monitor connection counts
|
||||
- Track token usage
|
||||
- Watch for connection limits
|
||||
|
||||
3. **Log important events**:
|
||||
- Configuration changes
|
||||
- Endpoint failures
|
||||
- Recovery events
|
||||
|
||||
4. **Regular backups**:
|
||||
- Backup token_counts.db
|
||||
- Schedule regular backups
|
||||
- Test restore procedure
|
||||
|
||||
### Performance Monitoring
|
||||
|
||||
1. **Baseline metrics**:
|
||||
- Establish normal usage patterns
|
||||
- Track trends over time
|
||||
|
||||
2. **Alert thresholds**:
|
||||
- Set alerts for high connection counts
|
||||
- Monitor error rates
|
||||
- Watch for latency spikes
|
||||
|
||||
3. **Capacity planning**:
|
||||
- Track growth in token usage
|
||||
- Plan for scaling needs
|
||||
- Monitor resource utilization
|
||||
|
||||
### Incident Response
|
||||
|
||||
1. **Quick diagnosis**:
|
||||
- Check health endpoint first
|
||||
- Review recent logs
|
||||
- Verify endpoint status
|
||||
|
||||
2. **Isolation**:
|
||||
- Identify affected endpoints
|
||||
- Isolate problematic components
|
||||
- Fallback to healthy endpoints
|
||||
|
||||
3. **Recovery**:
|
||||
- Restart router if needed
|
||||
- Rebalance load
|
||||
- Restore from backup if necessary
|
||||
|
||||
## Examples
|
||||
|
||||
See the [examples](examples/) directory for monitoring configuration examples.
|
||||
Loading…
Add table
Add a link
Reference in a new issue