More config cli (#466)

* Extra config CLI tech spec * Describe packaging * Added CLI commands * Add tests
2026-04-25 00:16:23 +02:00 · 2025-08-22 13:36:10 +01:00 · 2025-08-22 13:36:10 +01:00 · 28190fea8a
commit 28190fea8a
parent 5e71d0cadb
14 changed files with 1361 additions and 0 deletions
--- a/docs/tech-specs/logging-strategy.md
+++ b/docs/tech-specs/logging-strategy.md
@ -0,0 +1,169 @@
+# TrustGraph Logging Strategy
+
+## Overview
+
+TrustGraph uses Python's built-in `logging` module for all logging operations. This provides a standardized, flexible approach to logging across all components of the system.
+
+## Default Configuration
+
+### Logging Level
+- **Default Level**: `INFO`
+- **Debug Mode**: `DEBUG` (enabled via command-line argument)
+- **Production**: `WARNING` or `ERROR` as appropriate
+
+### Output Destination
+All logs should be written to **standard output (stdout)** to ensure compatibility with containerized environments and log aggregation systems.
+
+## Implementation Guidelines
+
+### 1. Logger Initialization
+
+Each module should create its own logger using the module's `__name__`:
+
+```python
+import logging
+
+logger = logging.getLogger(__name__)
+```
+
+### 2. Centralized Configuration
+
+The logging configuration should be centralized in `async_processor.py` (or a dedicated logging configuration module) since it's inherited by much of the codebase:
+
+```python
+import logging
+import argparse
+
+def setup_logging(log_level='INFO'):
+    """Configure logging for the entire application"""
+    logging.basicConfig(
+        level=getattr(logging, log_level.upper()),
+        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
+        handlers=[logging.StreamHandler()]
+    )
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        '--log-level',
+        default='INFO',
+        choices=['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'],
+        help='Set the logging level (default: INFO)'
+    )
+    return parser.parse_args()
+
+# In main execution
+if __name__ == '__main__':
+    args = parse_args()
+    setup_logging(args.log_level)
+```
+
+### 3. Logging Best Practices
+
+#### Log Levels Usage
+- **DEBUG**: Detailed information for diagnosing problems (variable values, function entry/exit)
+- **INFO**: General informational messages (service started, configuration loaded, processing milestones)
+- **WARNING**: Warning messages for potentially harmful situations (deprecated features, recoverable errors)
+- **ERROR**: Error messages for serious problems (failed operations, exceptions)
+- **CRITICAL**: Critical messages for system failures requiring immediate attention
+
+#### Message Format
+```python
+# Good - includes context
+logger.info(f"Processing document: {doc_id}, size: {doc_size} bytes")
+logger.error(f"Failed to connect to database: {error}", exc_info=True)
+
+# Avoid - lacks context
+logger.info("Processing document")
+logger.error("Connection failed")
+```
+
+#### Performance Considerations
+```python
+# Use lazy formatting for expensive operations
+logger.debug("Expensive operation result: %s", expensive_function())
+
+# Check log level for very expensive debug operations
+if logger.isEnabledFor(logging.DEBUG):
+    debug_data = compute_expensive_debug_info()
+    logger.debug(f"Debug data: {debug_data}")
+```
+
+### 4. Structured Logging
+
+For complex data, use structured logging:
+
+```python
+logger.info("Request processed", extra={
+    'request_id': request_id,
+    'duration_ms': duration,
+    'status_code': status_code,
+    'user_id': user_id
+})
+```
+
+### 5. Exception Logging
+
+Always include stack traces for exceptions:
+
+```python
+try:
+    process_data()
+except Exception as e:
+    logger.error(f"Failed to process data: {e}", exc_info=True)
+    raise
+```
+
+### 6. Async Logging Considerations
+
+For async code, ensure thread-safe logging:
+
+```python
+import asyncio
+import logging
+
+async def async_operation():
+    logger = logging.getLogger(__name__)
+    logger.info(f"Starting async operation in task: {asyncio.current_task().get_name()}")
+```
+
+## Environment Variables
+
+Support environment-based configuration as a fallback:
+
+```python
+import os
+
+log_level = os.environ.get('TRUSTGRAPH_LOG_LEVEL', 'INFO')
+```
+
+## Testing
+
+During tests, consider using a different logging configuration:
+
+```python
+# In test setup
+logging.getLogger().setLevel(logging.WARNING)  # Reduce noise during tests
+```
+
+## Monitoring Integration
+
+Ensure log format is compatible with monitoring tools:
+- Include timestamps in ISO format
+- Use consistent field names
+- Include correlation IDs where applicable
+- Structure logs for easy parsing (JSON format for production)
+
+## Security Considerations
+
+- Never log sensitive information (passwords, API keys, personal data)
+- Sanitize user input before logging
+- Use placeholders for sensitive fields: `user_id=****1234`
+
+## Migration Path
+
+For existing code using print statements:
+1. Replace `print()` with appropriate logger calls
+2. Choose appropriate log levels based on message importance
+3. Add context to make logs more useful
+4. Test logging output at different levels