mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-25 08:26:21 +02:00
Native CLI i18n: The TrustGraph CLI has built-in translation support that dynamically loads language strings. You can test and use different languages by simply passing the --lang flag (e.g., --lang es for Spanish, --lang ru for Russian) or by configuring your environment's LANG variable. Automated Docs Translations: This PR introduces autonomously translated Markdown documentation into several target languages, including Spanish, Swahili, Portuguese, Turkish, Hindi, Hebrew, Arabic, Simplified Chinese, and Russian.
337 lines
12 KiB
Markdown
337 lines
12 KiB
Markdown
---
|
|
layout: default
|
|
title: "Tech Spec: Cassandra Configuration Consolidation"
|
|
parent: "Tech Specs"
|
|
---
|
|
|
|
# Tech Spec: Cassandra Configuration Consolidation
|
|
|
|
**Status:** Draft
|
|
**Author:** Assistant
|
|
**Date:** 2024-09-03
|
|
|
|
## Overview
|
|
|
|
This specification addresses the inconsistent naming and configuration patterns for Cassandra connection parameters across the TrustGraph codebase. Currently, two different parameter naming schemes exist (`cassandra_*` vs `graph_*`), leading to confusion and maintenance complexity.
|
|
|
|
## Problem Statement
|
|
|
|
The codebase currently uses two distinct sets of Cassandra configuration parameters:
|
|
|
|
1. **Knowledge/Config/Library modules** use:
|
|
- `cassandra_host` (list of hosts)
|
|
- `cassandra_user`
|
|
- `cassandra_password`
|
|
|
|
2. **Graph/Storage modules** use:
|
|
- `graph_host` (single host, sometimes converted to list)
|
|
- `graph_username`
|
|
- `graph_password`
|
|
|
|
3. **Inconsistent command-line exposure**:
|
|
- Some processors (e.g., `kg-store`) don't expose Cassandra settings as command-line arguments
|
|
- Other processors expose them with different names and formats
|
|
- Help text doesn't reflect environment variable defaults
|
|
|
|
Both parameter sets connect to the same Cassandra cluster but with different naming conventions, causing:
|
|
- Configuration confusion for users
|
|
- Increased maintenance burden
|
|
- Inconsistent documentation
|
|
- Potential for misconfiguration
|
|
- Inability to override settings via command-line in some processors
|
|
|
|
## Proposed Solution
|
|
|
|
### 1. Standardize Parameter Names
|
|
|
|
All modules will use consistent `cassandra_*` parameter names:
|
|
- `cassandra_host` - List of hosts (internally stored as list)
|
|
- `cassandra_username` - Username for authentication
|
|
- `cassandra_password` - Password for authentication
|
|
|
|
### 2. Command-Line Arguments
|
|
|
|
All processors MUST expose Cassandra configuration via command-line arguments:
|
|
- `--cassandra-host` - Comma-separated list of hosts
|
|
- `--cassandra-username` - Username for authentication
|
|
- `--cassandra-password` - Password for authentication
|
|
|
|
### 3. Environment Variable Fallback
|
|
|
|
If command-line parameters are not explicitly provided, the system will check environment variables:
|
|
- `CASSANDRA_HOST` - Comma-separated list of hosts
|
|
- `CASSANDRA_USERNAME` - Username for authentication
|
|
- `CASSANDRA_PASSWORD` - Password for authentication
|
|
|
|
### 4. Default Values
|
|
|
|
If neither command-line parameters nor environment variables are specified:
|
|
- `cassandra_host` defaults to `["cassandra"]`
|
|
- `cassandra_username` defaults to `None` (no authentication)
|
|
- `cassandra_password` defaults to `None` (no authentication)
|
|
|
|
### 5. Help Text Requirements
|
|
|
|
The `--help` output must:
|
|
- Show environment variable values as defaults when set
|
|
- Never display password values (show `****` or `<set>` instead)
|
|
- Clearly indicate the resolution order in help text
|
|
|
|
Example help output:
|
|
```
|
|
--cassandra-host HOST
|
|
Cassandra host list, comma-separated (default: prod-cluster-1,prod-cluster-2)
|
|
[from CASSANDRA_HOST environment variable]
|
|
|
|
--cassandra-username USERNAME
|
|
Cassandra username (default: cassandra_user)
|
|
[from CASSANDRA_USERNAME environment variable]
|
|
|
|
--cassandra-password PASSWORD
|
|
Cassandra password (default: <set from environment>)
|
|
```
|
|
|
|
## Implementation Details
|
|
|
|
### Parameter Resolution Order
|
|
|
|
For each Cassandra parameter, the resolution order will be:
|
|
1. Command-line argument value
|
|
2. Environment variable (`CASSANDRA_*`)
|
|
3. Default value
|
|
|
|
### Host Parameter Handling
|
|
|
|
The `cassandra_host` parameter:
|
|
- Command-line accepts comma-separated string: `--cassandra-host "host1,host2,host3"`
|
|
- Environment variable accepts comma-separated string: `CASSANDRA_HOST="host1,host2,host3"`
|
|
- Internally always stored as list: `["host1", "host2", "host3"]`
|
|
- Single host: `"localhost"` → converted to `["localhost"]`
|
|
- Already a list: `["host1", "host2"]` → used as-is
|
|
|
|
### Authentication Logic
|
|
|
|
Authentication will be used when both `cassandra_username` and `cassandra_password` are provided:
|
|
```python
|
|
if cassandra_username and cassandra_password:
|
|
# Use SSL context and PlainTextAuthProvider
|
|
else:
|
|
# Connect without authentication
|
|
```
|
|
|
|
## Files to Modify
|
|
|
|
### Modules using `graph_*` parameters (to be changed):
|
|
- `trustgraph-flow/trustgraph/storage/triples/cassandra/write.py`
|
|
- `trustgraph-flow/trustgraph/storage/objects/cassandra/write.py`
|
|
- `trustgraph-flow/trustgraph/storage/rows/cassandra/write.py`
|
|
- `trustgraph-flow/trustgraph/query/triples/cassandra/service.py`
|
|
|
|
### Modules using `cassandra_*` parameters (to be updated with env fallback):
|
|
- `trustgraph-flow/trustgraph/tables/config.py`
|
|
- `trustgraph-flow/trustgraph/tables/knowledge.py`
|
|
- `trustgraph-flow/trustgraph/tables/library.py`
|
|
- `trustgraph-flow/trustgraph/storage/knowledge/store.py`
|
|
- `trustgraph-flow/trustgraph/cores/knowledge.py`
|
|
- `trustgraph-flow/trustgraph/librarian/librarian.py`
|
|
- `trustgraph-flow/trustgraph/librarian/service.py`
|
|
- `trustgraph-flow/trustgraph/config/service/service.py`
|
|
- `trustgraph-flow/trustgraph/cores/service.py`
|
|
|
|
### Test Files to Update:
|
|
- `tests/unit/test_cores/test_knowledge_manager.py`
|
|
- `tests/unit/test_storage/test_triples_cassandra_storage.py`
|
|
- `tests/unit/test_query/test_triples_cassandra_query.py`
|
|
- `tests/integration/test_objects_cassandra_integration.py`
|
|
|
|
## Implementation Strategy
|
|
|
|
### Phase 1: Create Common Configuration Helper
|
|
Create utility functions to standardize Cassandra configuration across all processors:
|
|
|
|
```python
|
|
import os
|
|
import argparse
|
|
|
|
def get_cassandra_defaults():
|
|
"""Get default values from environment variables or fallback."""
|
|
return {
|
|
'host': os.getenv('CASSANDRA_HOST', 'cassandra'),
|
|
'username': os.getenv('CASSANDRA_USERNAME'),
|
|
'password': os.getenv('CASSANDRA_PASSWORD')
|
|
}
|
|
|
|
def add_cassandra_args(parser: argparse.ArgumentParser):
|
|
"""
|
|
Add standardized Cassandra arguments to an argument parser.
|
|
Shows environment variable values in help text.
|
|
"""
|
|
defaults = get_cassandra_defaults()
|
|
|
|
# Format help text with env var indication
|
|
host_help = f"Cassandra host list, comma-separated (default: {defaults['host']})"
|
|
if 'CASSANDRA_HOST' in os.environ:
|
|
host_help += " [from CASSANDRA_HOST]"
|
|
|
|
username_help = f"Cassandra username"
|
|
if defaults['username']:
|
|
username_help += f" (default: {defaults['username']})"
|
|
if 'CASSANDRA_USERNAME' in os.environ:
|
|
username_help += " [from CASSANDRA_USERNAME]"
|
|
|
|
password_help = "Cassandra password"
|
|
if defaults['password']:
|
|
password_help += " (default: <set>)"
|
|
if 'CASSANDRA_PASSWORD' in os.environ:
|
|
password_help += " [from CASSANDRA_PASSWORD]"
|
|
|
|
parser.add_argument(
|
|
'--cassandra-host',
|
|
default=defaults['host'],
|
|
help=host_help
|
|
)
|
|
|
|
parser.add_argument(
|
|
'--cassandra-username',
|
|
default=defaults['username'],
|
|
help=username_help
|
|
)
|
|
|
|
parser.add_argument(
|
|
'--cassandra-password',
|
|
default=defaults['password'],
|
|
help=password_help
|
|
)
|
|
|
|
def resolve_cassandra_config(args) -> tuple[list[str], str|None, str|None]:
|
|
"""
|
|
Convert argparse args to Cassandra configuration.
|
|
|
|
Returns:
|
|
tuple: (hosts_list, username, password)
|
|
"""
|
|
# Convert host string to list
|
|
if isinstance(args.cassandra_host, str):
|
|
hosts = [h.strip() for h in args.cassandra_host.split(',')]
|
|
else:
|
|
hosts = args.cassandra_host
|
|
|
|
return hosts, args.cassandra_username, args.cassandra_password
|
|
```
|
|
|
|
### Phase 2: Update Modules Using `graph_*` Parameters
|
|
1. Change parameter names from `graph_*` to `cassandra_*`
|
|
2. Replace custom `add_args()` methods with standardized `add_cassandra_args()`
|
|
3. Use the common configuration helper functions
|
|
4. Update documentation strings
|
|
|
|
Example transformation:
|
|
```python
|
|
# OLD CODE
|
|
@staticmethod
|
|
def add_args(parser):
|
|
parser.add_argument(
|
|
'-g', '--graph-host',
|
|
default="localhost",
|
|
help=f'Graph host (default: localhost)'
|
|
)
|
|
parser.add_argument(
|
|
'--graph-username',
|
|
default=None,
|
|
help=f'Cassandra username'
|
|
)
|
|
|
|
# NEW CODE
|
|
@staticmethod
|
|
def add_args(parser):
|
|
FlowProcessor.add_args(parser)
|
|
add_cassandra_args(parser) # Use standard helper
|
|
```
|
|
|
|
### Phase 3: Update Modules Using `cassandra_*` Parameters
|
|
1. Add command-line argument support where missing (e.g., `kg-store`)
|
|
2. Replace existing argument definitions with `add_cassandra_args()`
|
|
3. Use `resolve_cassandra_config()` for consistent resolution
|
|
4. Ensure consistent host list handling
|
|
|
|
### Phase 4: Update Tests and Documentation
|
|
1. Update all test files to use new parameter names
|
|
2. Update CLI documentation
|
|
3. Update API documentation
|
|
4. Add environment variable documentation
|
|
|
|
## Backward Compatibility
|
|
|
|
To maintain backward compatibility during transition:
|
|
|
|
1. **Deprecation warnings** for `graph_*` parameters
|
|
2. **Parameter aliasing** - accept both old and new names initially
|
|
3. **Phased rollout** over multiple releases
|
|
4. **Documentation updates** with migration guide
|
|
|
|
Example backward compatibility code:
|
|
```python
|
|
def __init__(self, **params):
|
|
# Handle deprecated graph_* parameters
|
|
if 'graph_host' in params:
|
|
warnings.warn("graph_host is deprecated, use cassandra_host", DeprecationWarning)
|
|
params.setdefault('cassandra_host', params.pop('graph_host'))
|
|
|
|
if 'graph_username' in params:
|
|
warnings.warn("graph_username is deprecated, use cassandra_username", DeprecationWarning)
|
|
params.setdefault('cassandra_username', params.pop('graph_username'))
|
|
|
|
# ... continue with standard resolution
|
|
```
|
|
|
|
## Testing Strategy
|
|
|
|
1. **Unit tests** for configuration resolution logic
|
|
2. **Integration tests** with various configuration combinations
|
|
3. **Environment variable tests**
|
|
4. **Backward compatibility tests** with deprecated parameters
|
|
5. **Docker compose tests** with environment variables
|
|
|
|
## Documentation Updates
|
|
|
|
1. Update all CLI command documentation
|
|
2. Update API documentation
|
|
3. Create migration guide
|
|
4. Update Docker compose examples
|
|
5. Update configuration reference documentation
|
|
|
|
## Risks and Mitigation
|
|
|
|
| Risk | Impact | Mitigation |
|
|
|------|--------|------------|
|
|
| Breaking changes for users | High | Implement backward compatibility period |
|
|
| Configuration confusion during transition | Medium | Clear documentation and deprecation warnings |
|
|
| Test failures | Medium | Comprehensive test updates |
|
|
| Docker deployment issues | High | Update all Docker compose examples |
|
|
|
|
## Success Criteria
|
|
|
|
- [ ] All modules use consistent `cassandra_*` parameter names
|
|
- [ ] All processors expose Cassandra settings via command-line arguments
|
|
- [ ] Command-line help text shows environment variable defaults
|
|
- [ ] Password values are never displayed in help text
|
|
- [ ] Environment variable fallback works correctly
|
|
- [ ] `cassandra_host` is consistently handled as a list internally
|
|
- [ ] Backward compatibility maintained for at least 2 releases
|
|
- [ ] All tests pass with new configuration system
|
|
- [ ] Documentation fully updated
|
|
- [ ] Docker compose examples work with environment variables
|
|
|
|
## Timeline
|
|
|
|
- **Week 1:** Implement common configuration helper and update `graph_*` modules
|
|
- **Week 2:** Add environment variable support to existing `cassandra_*` modules
|
|
- **Week 3:** Update tests and documentation
|
|
- **Week 4:** Integration testing and bug fixes
|
|
|
|
## Future Considerations
|
|
|
|
- Consider extending this pattern to other database configurations (e.g., Elasticsearch)
|
|
- Implement configuration validation and better error messages
|
|
- Add support for Cassandra connection pooling configuration
|
|
- Consider adding configuration file support (.env files)
|