---
layout: default
title: "TrustGraph Logging Strategy"
parent: "Tech Specs"
---

# TrustGraph Logging Strategy

## Overview

TrustGraph uses Python's built-in `logging` module for all logging, with centralized configuration and optional Loki integration for log aggregation. This gives every component the same log levels, output format, and command-line options.

## Default Configuration

### Logging Level
- **Default Level**: `INFO`
- **Configurable via**: `--log-level` command-line argument
- **Choices**: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`

### Output Destinations
1. **Console (stdout)**: Always enabled - ensures compatibility with containerized environments
2. **Loki**: Optional centralized log aggregation (enabled by default, can be disabled)

## Centralized Logging Module

All logging configuration is managed by the `trustgraph.base.logging` module, which provides:
- `add_logging_args(parser)` - Adds the standard logging CLI arguments
- `setup_logging(args)` - Configures logging from the parsed arguments

This module is used by all server-side components:
- AsyncProcessor-based services
- API Gateway
- MCP Server

## Implementation Guidelines

### 1. Logger Initialization

Each module should create its own logger using the module's `__name__`:

```python
import logging

logger = logging.getLogger(__name__)
```

The logger name is automatically used as a label in Loki for filtering and searching.

### 2. Service Initialization

All server-side services automatically get logging configuration through the centralized module:

```python
import argparse
import logging

from trustgraph.base import add_logging_args, setup_logging

def main():
    parser = argparse.ArgumentParser()

    # Add standard logging arguments (includes Loki configuration)
    add_logging_args(parser)

    # Add your service-specific arguments
    parser.add_argument('--port', type=int, default=8080)

    # setup_logging() takes the parsed arguments as a dict
    args = parser.parse_args()
    args = vars(args)

    # Setup logging early in startup
    setup_logging(args)

    # Rest of your service initialization
    logger = logging.getLogger(__name__)
    logger.info("Service starting...")
```

### 3. Command-Line Arguments

All services support these logging arguments:

**Log Level:**
```bash
--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
```

**Loki Configuration:**
```bash
--loki-enabled              # Enable Loki (default)
--no-loki-enabled           # Disable Loki
--loki-url URL              # Loki push URL (default: http://loki:3100/loki/api/v1/push)
--loki-username USERNAME    # Optional authentication
--loki-password PASSWORD    # Optional authentication
```

**Examples:**
```bash
# Default - INFO level, Loki enabled
./my-service

# Debug mode, console only
./my-service --log-level DEBUG --no-loki-enabled

# Custom Loki server with auth
./my-service --loki-url http://loki.prod:3100/loki/api/v1/push \
             --loki-username admin --loki-password secret
```

### 4. Environment Variables

Loki configuration supports environment variable fallbacks:

```bash
export LOKI_URL=http://loki.prod:3100/loki/api/v1/push
export LOKI_USERNAME=admin
export LOKI_PASSWORD=secret
```

Command-line arguments take precedence over environment variables.

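
One way to get this precedence is to use the environment value as the `argparse` default, so an explicit flag always overrides it. The sketch below is an illustration only and is not the actual `add_logging_args` implementation: `add_loki_args_sketch` and `parse` are hypothetical names, and `argparse.BooleanOptionalAction` assumes Python 3.9 or later.

```python
import argparse
import os

DEFAULT_LOKI_URL = "http://loki:3100/loki/api/v1/push"

def add_loki_args_sketch(parser, environ=None):
    """Hypothetical stand-in for the Loki part of add_logging_args()."""
    env = os.environ if environ is None else environ
    parser.add_argument(
        "--loki-enabled",
        action=argparse.BooleanOptionalAction,  # also creates --no-loki-enabled
        default=True,
    )
    # The environment variable is only the default; a CLI value replaces it.
    parser.add_argument("--loki-url", default=env.get("LOKI_URL", DEFAULT_LOKI_URL))
    parser.add_argument("--loki-username", default=env.get("LOKI_USERNAME"))
    parser.add_argument("--loki-password", default=env.get("LOKI_PASSWORD"))

def parse(argv, environ):
    """Parse argv against a given environment mapping and return a dict."""
    parser = argparse.ArgumentParser()
    add_loki_args_sketch(parser, environ)
    return vars(parser.parse_args(argv))

env = {"LOKI_URL": "http://env:3100/loki/api/v1/push", "LOKI_USERNAME": "admin"}
from_env = parse([], env)
from_cli = parse(["--loki-url", "http://cli:3100/loki/api/v1/push", "--no-loki-enabled"], env)
```

Here `from_env` picks up the URL and username from the environment, while `from_cli` uses the URL given on the command line and has Loki disabled.
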
### 5. Logging Best Practices

#### Log Levels Usage
- **DEBUG**: Detailed information for diagnosing problems (variable values, function entry/exit)
- **INFO**: General informational messages (service started, configuration loaded, processing milestones)
- **WARNING**: Warning messages for potentially harmful situations (deprecated features, recoverable errors)
- **ERROR**: Error messages for serious problems (failed operations, exceptions)
- **CRITICAL**: Critical messages for system failures requiring immediate attention

#### Message Format
```python
# Good - includes context
logger.info(f"Processing document: {doc_id}, size: {doc_size} bytes")
logger.error(f"Failed to connect to database: {error}", exc_info=True)

# Avoid - lacks context
logger.info("Processing document")
logger.error("Connection failed")
```

#### Performance Considerations
```python
# Pass arguments instead of pre-formatting: the message string is only built
# if the record is emitted. Note that the arguments are still evaluated,
# so expensive_function() runs even when DEBUG is disabled.
logger.debug("Expensive operation result: %s", expensive_function())

# Check log level for very expensive debug operations
if logger.isEnabledFor(logging.DEBUG):
    debug_data = compute_expensive_debug_info()
    logger.debug(f"Debug data: {debug_data}")
```

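
The difference between deferred formatting and deferred evaluation is easy to miss, so here is a small standard-library sketch of it. `%s`-style arguments are computed before `logger.debug` is called; only the string interpolation is skipped when the level is disabled. The `Lazy` wrapper and the `lazy_demo` logger name are hypothetical helpers for this illustration, not part of TrustGraph.

```python
import logging

class Lazy:
    """Defers a call until the log message is actually formatted."""
    def __init__(self, func):
        self._func = func

    def __str__(self):
        return str(self._func())

calls = []

def expensive_function():
    calls.append(1)  # record every invocation
    return "result"

logger = logging.getLogger("lazy_demo")
logger.addHandler(logging.NullHandler())
logger.propagate = False
logger.setLevel(logging.INFO)  # DEBUG is disabled

# The argument is evaluated eagerly, even though the message is discarded.
logger.debug("Result: %s", expensive_function())
eager_calls = len(calls)

# The wrapped call never runs, because no record is ever formatted.
logger.debug("Result: %s", Lazy(expensive_function))
lazy_calls = len(calls) - eager_calls
```

After this runs, `eager_calls` is 1 and `lazy_calls` is 0. For most code the explicit `isEnabledFor` guard is simpler to read than a wrapper.
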
### 6. Structured Logging with Loki

For complex data, use structured logging with extra tags for Loki:

```python
logger.info("Request processed", extra={
    'tags': {
        'request_id': request_id,
        'user_id': user_id,
        'status': 'success'
    }
})
```

These tags become searchable labels in Loki, in addition to the automatic labels:
- `processor` - Processor identity (e.g., `config-svc`)
- `severity` - Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
- `logger` - Module name (from `__name__`)

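
To see how the `extra` dictionary reaches a handler, note that the standard library copies each key of `extra` onto the `LogRecord` as an attribute. The sketch below uses a hypothetical `CaptureHandler` in place of a Loki handler to show where a handler would find the tags; how the real Loki handler consumes them is not shown here.

```python
import logging

class CaptureHandler(logging.Handler):
    """Illustrative handler that keeps every record it receives."""
    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)

logger = logging.getLogger("tags_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
capture = CaptureHandler()
logger.addHandler(capture)

logger.info("Request processed", extra={
    "tags": {"request_id": "req-1", "status": "success"},
})

record = capture.records[0]
# Keys of `extra` become attributes on the LogRecord.
tags = getattr(record, "tags", {})
message = record.getMessage()
```

Reading the attribute with `getattr(record, "tags", {})` keeps the handler working for records that were logged without any tags.
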
### 7. Exception Logging

Always include stack traces for exceptions:

```python
try:
    process_data()
except Exception as e:
    logger.error(f"Failed to process data: {e}", exc_info=True)
    raise
```

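
The standard library also offers `logger.exception(...)`, which logs at `ERROR` level and attaches the active exception's traceback, equivalent to passing `exc_info=True`. A minimal sketch follows; `process_data`, `CaptureHandler`, and the logger name are placeholders, and the exception is not re-raised so that the snippet runs to completion.

```python
import logging

class CaptureHandler(logging.Handler):
    """Illustrative handler that keeps every record it receives."""
    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)

logger = logging.getLogger("exception_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
capture = CaptureHandler()
logger.addHandler(capture)

def process_data():
    raise ValueError("bad input")

try:
    process_data()
except Exception:
    # Must be called from inside an except block.
    logger.exception("Failed to process data")

record = capture.records[0]
# The default Formatter appends the traceback to the message.
formatted = logging.Formatter("%(levelname)s - %(message)s").format(record)
```

In service code, keep the `raise` after logging, as in the example above, unless the error is fully handled at that point.
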
### 8. Async Logging Considerations

The logging system uses non-blocking queued handlers for Loki:
- Console output is synchronous (fast)
- Loki output is queued with a 500-message buffer
- A background thread handles Loki transmission
- Main application code is never blocked by Loki

```python
import asyncio
import logging

async def async_operation():
    logger = logging.getLogger(__name__)
    # Logging is thread-safe and won't block async operations
    logger.info(f"Starting async operation in task: {asyncio.current_task().get_name()}")
```

## Loki Integration

### Architecture

The logging system uses Python's built-in `QueueHandler` and `QueueListener` for non-blocking Loki integration:

1. **QueueHandler**: Logs are placed in a 500-message queue (non-blocking)
2. **Background Thread**: QueueListener sends logs to Loki asynchronously
3. **Graceful Degradation**: If Loki is unavailable, console logging continues

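
The queue-based pattern described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the TrustGraph implementation: `RemoteSink` is a hypothetical stand-in for the Loki handler, and the logger name is arbitrary.

```python
import logging
import logging.handlers
import queue

class RemoteSink(logging.Handler):
    """Stand-in for a remote handler such as a Loki handler."""
    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())

# Bounded buffer of 500 records between the application and the sink.
log_queue = queue.Queue(maxsize=500)
queue_handler = logging.handlers.QueueHandler(log_queue)

sink = RemoteSink()
listener = logging.handlers.QueueListener(log_queue, sink)
listener.start()  # background thread drains the queue into the sink

logger = logging.getLogger("queue_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(queue_handler)

# This call only enqueues the record and returns immediately.
logger.info("hello from the main thread")

# stop() processes what is still queued, then joins the thread.
listener.stop()
```

`QueueHandler` enqueues with `put_nowait`, so when a bounded queue is full the record is handed to `handleError` and dropped rather than blocking the caller.
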
### Automatic Labels

Every log sent to Loki includes:
- `processor`: Processor identity (e.g., `config-svc`, `text-completion`, `embeddings`)
- `severity`: Log level (DEBUG, INFO, etc.)
- `logger`: Module name (e.g., `trustgraph.gateway.service`, `trustgraph.agent.react.service`)

### Custom Labels

Add custom labels via the `extra` parameter:

```python
logger.info("User action", extra={
    'tags': {
        'user_id': user_id,
        'action': 'document_upload',
        'collection': collection_name
    }
})
```

### Querying Logs in Loki

```logql
# All logs from a specific processor (recommended - matches Prometheus metrics)
{processor="config-svc"}
{processor="text-completion"}
{processor="embeddings"}

# Error logs from a specific processor
{processor="config-svc", severity="ERROR"}

# Error logs from all processors
{severity="ERROR"}

# Logs from a specific processor with text filter
{processor="text-completion"} |= "Processing"

# All logs from API gateway
{processor="api-gateway"}

# Logs from processors matching pattern
{processor=~".*-completion"}

# Logs with custom tags
{processor="api-gateway"} | json | user_id="12345"
```

### Graceful Degradation

If Loki is unavailable or `python-logging-loki` is not installed:
- Warning message printed to console
- Console logging continues normally
- Application continues running
- No retry logic for Loki connection (fail fast, degrade gracefully)

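
A common way to implement this kind of optional dependency is to attempt the import and fall back to console-only logging when it fails. The sketch below shows that pattern under stated assumptions; `optional_module` and `setup_sketch` are hypothetical names, and the real `setup_logging` may be structured differently.

```python
import importlib
import logging
import sys

def optional_module(module_name="logging_loki"):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        print(f"WARNING: {module_name} unavailable ({exc}); "
              "continuing with console logging only", file=sys.stderr)
        return None

def setup_sketch(module_name="logging_loki"):
    """Attach the console handler first, then try the optional backend."""
    root = logging.getLogger("degrade_demo")
    root.setLevel(logging.INFO)
    root.propagate = False
    root.handlers.clear()
    # The console handler is attached unconditionally.
    root.addHandler(logging.StreamHandler(sys.stdout))
    remote = optional_module(module_name)
    # A real implementation would attach the remote handler here.
    return root, remote is not None

# Simulate a missing dependency with a module name that does not exist.
root, remote_available = setup_sketch("surely_missing_module_xyz")
root.info("still logging to the console")
```

Attaching the console handler before touching the optional backend means a failure in the Loki path can never leave the service without any log output.
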
## Testing

During tests, consider using a different logging configuration:

```python
# In test setup
import logging

from trustgraph.base import setup_logging

# Reduce noise during tests
logging.getLogger().setLevel(logging.WARNING)

# Or disable Loki for tests
setup_logging({'log_level': 'WARNING', 'loki_enabled': False})
```

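
Tests can also assert on what a component logs, without any Loki or console setup, by using `assertLogs` from the standard `unittest` module. The following sketch is only an example: the `process` function and the `trustgraph.example` logger name are hypothetical.

```python
import logging
import unittest

logger = logging.getLogger("trustgraph.example")  # hypothetical module name

def process(doc_id):
    logger.info("Processing document: %s", doc_id)
    return doc_id.upper()

class ProcessLoggingTest(unittest.TestCase):
    def test_logs_document_id(self):
        # assertLogs captures records and fails if nothing is logged.
        with self.assertLogs("trustgraph.example", level="INFO") as captured:
            result = process("doc-1")
        self.assertEqual(result, "DOC-1")
        self.assertEqual(
            captured.output,
            ["INFO:trustgraph.example:Processing document: doc-1"],
        )

suite = unittest.defaultTestLoader.loadTestsFromTestCase(ProcessLoggingTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Projects that use pytest can get the same effect with its built-in `caplog` fixture.
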
## Monitoring Integration

### Standard Format
All logs use a consistent format:
```
2025-01-09 10:30:45,123 - trustgraph.gateway.service - INFO - Request processed
```

Format components:
- Timestamp (`YYYY-MM-DD HH:MM:SS,mmm`, Python's default `asctime` layout with milliseconds)
- Logger name (module path)
- Log level
- Message

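
A line in this layout can be produced with a standard `logging.Formatter`. The format string below is an assumption inferred from the sample line, not copied from the TrustGraph source, and the record is built by hand only to keep the snippet self-contained.

```python
import logging
import re

# Assumed format string, inferred from the sample log line.
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

formatter = logging.Formatter(LOG_FORMAT)

record = logging.LogRecord(
    name="trustgraph.gateway.service",
    level=logging.INFO,
    pathname="example.py",
    lineno=1,
    msg="Request processed",
    args=None,
    exc_info=None,
)
line = formatter.format(record)

# Timestamp, logger name, level, and message separated by " - ".
PATTERN = (
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}"
    r" - trustgraph\.gateway\.service - INFO - Request processed"
)
matches = re.fullmatch(PATTERN, line) is not None
```

With no `datefmt` argument, `Formatter` renders `%(asctime)s` with a space between date and time and a comma before the milliseconds, which is why the timestamp is not strict ISO 8601.
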
### Loki Queries for Monitoring

Common monitoring queries:

```logql
# Error rate by processor
sum by (processor) (rate({severity="ERROR"}[5m]))

# Top error-producing processors
topk(5, sum by (processor) (count_over_time({severity="ERROR"}[1h])))

# Recent errors with processor name
{severity="ERROR"} | line_format "{{.processor}}: {{__line__}}"

# Lines mentioning "exception" from all agent processors
{processor=~".*agent.*"} |= "exception"

# Specific processor error count
count_over_time({processor="config-svc", severity="ERROR"}[1h])
```

## Security Considerations

- **Never log sensitive information** (passwords, API keys, personal data, tokens)
- **Sanitize user input** before logging
- **Mask sensitive fields** instead of logging them in full: `user_id=****1234`
- **Loki authentication**: Use `--loki-username` and `--loki-password` for secure deployments
- **Secure transport**: Use HTTPS for the Loki URL in production: `https://loki.prod:3100/loki/api/v1/push`

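
Masking can be partly automated with a `logging.Filter`, as a safety net rather than a replacement for careful call sites. The sketch below is one possible approach, not a TrustGraph feature: `mask_id`, `RedactingFilter`, and the key names in `SECRET_PATTERN` are all hypothetical.

```python
import logging
import re

# Hypothetical key names to redact in key=value pairs.
SECRET_PATTERN = re.compile(r"(password|token|api_key)=(\S+)", re.IGNORECASE)

def mask_id(value, visible=4):
    """Keep only the last `visible` characters, e.g. ****1234."""
    text = str(value)
    if len(text) <= visible:
        return "****"
    return "****" + text[-visible:]

class RedactingFilter(logging.Filter):
    """Replaces the value of known secret keys before handlers run."""
    def filter(self, record):
        message = record.getMessage()
        record.msg = SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=****", message)
        record.args = None  # the message is already fully formatted
        return True

class CaptureHandler(logging.Handler):
    """Illustrative handler that stores formatted messages."""
    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())

logger = logging.getLogger("redact_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addFilter(RedactingFilter())
capture = CaptureHandler()
logger.addHandler(capture)

logger.info("login user_id=%s password=%s", mask_id("991234"), "hunter2")
```

A filter attached to a logger only sees records logged directly on that logger, so attaching it to the handlers instead would cover every logger that feeds them.
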
## Dependencies

The centralized logging module requires:
- `python-logging-loki` - For Loki integration (optional, graceful degradation if missing)

This package is already included in `trustgraph-base/pyproject.toml` and `requirements.txt`.

## Migration Path

For existing code:

1. **Services already using AsyncProcessor**: No changes needed, Loki support is automatic
2. **Services not using AsyncProcessor** (api-gateway, mcp-server): Already updated
3. **CLI tools**: Out of scope - continue using print() or simple logging

### From print() to logging:
```python
# Before
print(f"Processing document {doc_id}")

# After
import logging

logger = logging.getLogger(__name__)
logger.info(f"Processing document {doc_id}")
```

## Configuration Summary

| Argument | Default | Environment Variable | Description |
|----------|---------|---------------------|-------------|
| `--log-level` | `INFO` | - | Console and Loki log level |
| `--loki-enabled` | `True` | - | Enable Loki logging |
| `--loki-url` | `http://loki:3100/loki/api/v1/push` | `LOKI_URL` | Loki push endpoint |
| `--loki-username` | `None` | `LOKI_USERNAME` | Loki auth username |
| `--loki-password` | `None` | `LOKI_PASSWORD` | Loki auth password |