mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-25 16:36:21 +02:00
* Consolidate logging into a single module * Added Loki logging * Update tech spec * Add processor label * Fix recursive log entries, logging Loki"s internals
350 lines
9.8 KiB
Markdown
350 lines
9.8 KiB
Markdown
# TrustGraph Logging Strategy
|
|
|
|
## Overview
|
|
|
|
TrustGraph uses Python's built-in `logging` module for all logging operations, with centralized configuration and optional Loki integration for log aggregation. This provides a standardized, flexible approach to logging across all components of the system.
|
|
|
|
## Default Configuration
|
|
|
|
### Logging Level
|
|
- **Default Level**: `INFO`
|
|
- **Configurable via**: `--log-level` command-line argument
|
|
- **Choices**: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`
|
|
|
|
### Output Destinations
|
|
1. **Console (stdout)**: Always enabled - ensures compatibility with containerized environments
|
|
2. **Loki**: Optional centralized log aggregation (enabled by default, can be disabled)
|
|
|
|
## Centralized Logging Module
|
|
|
|
All logging configuration is managed by `trustgraph.base.logging` module, which provides:
|
|
- `add_logging_args(parser)` - Adds standard logging CLI arguments
|
|
- `setup_logging(args)` - Configures logging from parsed arguments
|
|
|
|
This module is used by all server-side components:
|
|
- AsyncProcessor-based services
|
|
- API Gateway
|
|
- MCP Server
|
|
|
|
## Implementation Guidelines
|
|
|
|
### 1. Logger Initialization
|
|
|
|
Each module should create its own logger using the module's `__name__`:
|
|
|
|
```python
|
|
import logging
|
|
|
|
logger = logging.getLogger(__name__)
|
|
```
|
|
|
|
The logger name is automatically used as a label in Loki for filtering and searching.
|
|
|
|
### 2. Service Initialization
|
|
|
|
All server-side services automatically get logging configuration through the centralized module:
|
|
|
|
```python
|
|
from trustgraph.base import add_logging_args, setup_logging
|
|
import argparse
|
|
|
|
def main():
|
|
parser = argparse.ArgumentParser()
|
|
|
|
# Add standard logging arguments (includes Loki configuration)
|
|
add_logging_args(parser)
|
|
|
|
# Add your service-specific arguments
|
|
parser.add_argument('--port', type=int, default=8080)
|
|
|
|
args = parser.parse_args()
|
|
args = vars(args)
|
|
|
|
# Setup logging early in startup
|
|
setup_logging(args)
|
|
|
|
# Rest of your service initialization
|
|
logger = logging.getLogger(__name__)
|
|
logger.info("Service starting...")
|
|
```
|
|
|
|
### 3. Command-Line Arguments
|
|
|
|
All services support these logging arguments:
|
|
|
|
**Log Level:**
|
|
```bash
|
|
--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
|
|
```
|
|
|
|
**Loki Configuration:**
|
|
```bash
|
|
--loki-enabled # Enable Loki (default)
|
|
--no-loki-enabled # Disable Loki
|
|
--loki-url URL # Loki push URL (default: http://loki:3100/loki/api/v1/push)
|
|
--loki-username USERNAME # Optional authentication
|
|
--loki-password PASSWORD # Optional authentication
|
|
```
|
|
|
|
**Examples:**
|
|
```bash
|
|
# Default - INFO level, Loki enabled
|
|
./my-service
|
|
|
|
# Debug mode, console only
|
|
./my-service --log-level DEBUG --no-loki-enabled
|
|
|
|
# Custom Loki server with auth
|
|
./my-service --loki-url http://loki.prod:3100/loki/api/v1/push \
|
|
--loki-username admin --loki-password secret
|
|
```
|
|
|
|
### 4. Environment Variables
|
|
|
|
Loki configuration supports environment variable fallbacks:
|
|
|
|
```bash
|
|
export LOKI_URL=http://loki.prod:3100/loki/api/v1/push
|
|
export LOKI_USERNAME=admin
|
|
export LOKI_PASSWORD=secret
|
|
```
|
|
|
|
Command-line arguments take precedence over environment variables.
|
|
|
|
### 5. Logging Best Practices
|
|
|
|
#### Log Levels Usage
|
|
- **DEBUG**: Detailed information for diagnosing problems (variable values, function entry/exit)
|
|
- **INFO**: General informational messages (service started, configuration loaded, processing milestones)
|
|
- **WARNING**: Warning messages for potentially harmful situations (deprecated features, recoverable errors)
|
|
- **ERROR**: Error messages for serious problems (failed operations, exceptions)
|
|
- **CRITICAL**: Critical messages for system failures requiring immediate attention
|
|
|
|
#### Message Format
|
|
```python
|
|
# Good - includes context
|
|
logger.info(f"Processing document: {doc_id}, size: {doc_size} bytes")
|
|
logger.error(f"Failed to connect to database: {error}", exc_info=True)
|
|
|
|
# Avoid - lacks context
|
|
logger.info("Processing document")
|
|
logger.error("Connection failed")
|
|
```
|
|
|
|
#### Performance Considerations
|
|
```python
|
|
# Use lazy formatting for expensive operations
|
|
logger.debug("Expensive operation result: %s", expensive_function())
|
|
|
|
# Check log level for very expensive debug operations
|
|
if logger.isEnabledFor(logging.DEBUG):
|
|
debug_data = compute_expensive_debug_info()
|
|
logger.debug(f"Debug data: {debug_data}")
|
|
```
|
|
|
|
### 6. Structured Logging with Loki
|
|
|
|
For complex data, use structured logging with extra tags for Loki:
|
|
|
|
```python
|
|
logger.info("Request processed", extra={
|
|
'tags': {
|
|
'request_id': request_id,
|
|
'user_id': user_id,
|
|
'status': 'success'
|
|
}
|
|
})
|
|
```
|
|
|
|
These tags become searchable labels in Loki, in addition to automatic labels:
|
|
- `severity` - Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
|
|
- `logger` - Module name (from `__name__`)
|
|
|
|
### 7. Exception Logging
|
|
|
|
Always include stack traces for exceptions:
|
|
|
|
```python
|
|
try:
|
|
process_data()
|
|
except Exception as e:
|
|
logger.error(f"Failed to process data: {e}", exc_info=True)
|
|
raise
|
|
```
|
|
|
|
### 8. Async Logging Considerations
|
|
|
|
The logging system uses non-blocking queued handlers for Loki:
|
|
- Console output is synchronous (fast)
|
|
- Loki output is queued with 500-message buffer
|
|
- Background thread handles Loki transmission
|
|
- No blocking of main application code
|
|
|
|
```python
|
|
import asyncio
|
|
import logging
|
|
|
|
async def async_operation():
|
|
logger = logging.getLogger(__name__)
|
|
# Logging is thread-safe and won't block async operations
|
|
logger.info(f"Starting async operation in task: {asyncio.current_task().get_name()}")
|
|
```
|
|
|
|
## Loki Integration
|
|
|
|
### Architecture
|
|
|
|
The logging system uses Python's built-in `QueueHandler` and `QueueListener` for non-blocking Loki integration:
|
|
|
|
1. **QueueHandler**: Logs are placed in a 500-message queue (non-blocking)
|
|
2. **Background Thread**: QueueListener sends logs to Loki asynchronously
|
|
3. **Graceful Degradation**: If Loki is unavailable, console logging continues
|
|
|
|
### Automatic Labels
|
|
|
|
Every log sent to Loki includes:
|
|
- `processor`: Processor identity (e.g., `config-svc`, `text-completion`, `embeddings`)
|
|
- `severity`: Log level (DEBUG, INFO, etc.)
|
|
- `logger`: Module name (e.g., `trustgraph.gateway.service`, `trustgraph.agent.react.service`)
|
|
|
|
### Custom Labels
|
|
|
|
Add custom labels via the `extra` parameter:
|
|
|
|
```python
|
|
logger.info("User action", extra={
|
|
'tags': {
|
|
'user_id': user_id,
|
|
'action': 'document_upload',
|
|
'collection': collection_name
|
|
}
|
|
})
|
|
```
|
|
|
|
### Querying Logs in Loki
|
|
|
|
```logql
|
|
# All logs from a specific processor (recommended - matches Prometheus metrics)
|
|
{processor="config-svc"}
|
|
{processor="text-completion"}
|
|
{processor="embeddings"}
|
|
|
|
# Error logs from a specific processor
|
|
{processor="config-svc", severity="ERROR"}
|
|
|
|
# Error logs from all processors
|
|
{severity="ERROR"}
|
|
|
|
# Logs from a specific processor with text filter
|
|
{processor="text-completion"} |= "Processing"
|
|
|
|
# All logs from API gateway
|
|
{processor="api-gateway"}
|
|
|
|
# Logs from processors matching pattern
|
|
{processor=~".*-completion"}
|
|
|
|
# Logs with custom tags
|
|
{processor="api-gateway"} | json | user_id="12345"
|
|
```
|
|
|
|
### Graceful Degradation
|
|
|
|
If Loki is unavailable or `python-logging-loki` is not installed:
|
|
- Warning message printed to console
|
|
- Console logging continues normally
|
|
- Application continues running
|
|
- No retry logic for Loki connection (fail fast, degrade gracefully)
|
|
|
|
## Testing
|
|
|
|
During tests, consider using a different logging configuration:
|
|
|
|
```python
|
|
# In test setup
|
|
import logging
|
|
|
|
# Reduce noise during tests
|
|
logging.getLogger().setLevel(logging.WARNING)
|
|
|
|
# Or disable Loki for tests
|
|
setup_logging({'log_level': 'WARNING', 'loki_enabled': False})
|
|
```
|
|
|
|
## Monitoring Integration
|
|
|
|
### Standard Format
|
|
All logs use consistent format:
|
|
```
|
|
2025-01-09 10:30:45,123 - trustgraph.gateway.service - INFO - Request processed
|
|
```
|
|
|
|
Format components:
|
|
- Timestamp (ISO format with milliseconds)
|
|
- Logger name (module path)
|
|
- Log level
|
|
- Message
|
|
|
|
### Loki Queries for Monitoring
|
|
|
|
Common monitoring queries:
|
|
|
|
```logql
|
|
# Error rate by processor
|
|
rate({severity="ERROR"}[5m]) by (processor)
|
|
|
|
# Top error-producing processors
|
|
topk(5, count_over_time({severity="ERROR"}[1h]) by (processor))
|
|
|
|
# Recent errors with processor name
|
|
{severity="ERROR"} | line_format "{{.processor}}: {{.message}}"
|
|
|
|
# All agent processors
|
|
{processor=~".*agent.*"} |= "exception"
|
|
|
|
# Specific processor error count
|
|
count_over_time({processor="config-svc", severity="ERROR"}[1h])
|
|
```
|
|
|
|
## Security Considerations
|
|
|
|
- **Never log sensitive information** (passwords, API keys, personal data, tokens)
|
|
- **Sanitize user input** before logging
|
|
- **Use placeholders** for sensitive fields: `user_id=****1234`
|
|
- **Loki authentication**: Use `--loki-username` and `--loki-password` for secure deployments
|
|
- **Secure transport**: Use HTTPS for Loki URL in production: `https://loki.prod:3100/loki/api/v1/push`
|
|
|
|
## Dependencies
|
|
|
|
The centralized logging module requires:
|
|
- `python-logging-loki` - For Loki integration (optional, graceful degradation if missing)
|
|
|
|
Already included in `trustgraph-base/pyproject.toml` and `requirements.txt`.
|
|
|
|
## Migration Path
|
|
|
|
For existing code:
|
|
|
|
1. **Services already using AsyncProcessor**: No changes needed, Loki support is automatic
|
|
2. **Services not using AsyncProcessor** (api-gateway, mcp-server): Already updated
|
|
3. **CLI tools**: Out of scope - continue using print() or simple logging
|
|
|
|
### From print() to logging:
|
|
```python
|
|
# Before
|
|
print(f"Processing document {doc_id}")
|
|
|
|
# After
|
|
logger = logging.getLogger(__name__)
|
|
logger.info(f"Processing document {doc_id}")
|
|
```
|
|
|
|
## Configuration Summary
|
|
|
|
| Argument | Default | Environment Variable | Description |
|
|
|----------|---------|---------------------|-------------|
|
|
| `--log-level` | `INFO` | - | Console and Loki log level |
|
|
| `--loki-enabled` | `True` | - | Enable Loki logging |
|
|
| `--loki-url` | `http://loki:3100/loki/api/v1/push` | `LOKI_URL` | Loki push endpoint |
|
|
| `--loki-username` | `None` | `LOKI_USERNAME` | Loki auth username |
|
|
| `--loki-password` | `None` | `LOKI_PASSWORD` | Loki auth password |
|