trustgraph/tests/integration/README.md

269 lines
10 KiB
Markdown
Raw Normal View History

Release/v1.2 (#457) * Bump setup.py versions for 1.1 * PoC MCP server (#419) * Very initial MCP server PoC for TrustGraph * Put service on port 8000 * Add MCP container and packages to buildout * Update docs for API/CLI changes in 1.0 (#421) * Update some API basics for the 0.23/1.0 API change * Add MCP container push (#425) * Add command args to the MCP server (#426) * Host and port parameters * Added websocket arg * More docs * MCP client support (#427) - MCP client service - Tool request/response schema - API gateway support for mcp-tool - Message translation for tool request & response - Make mcp-tool using configuration service for information about where the MCP services are. * Feature/react call mcp (#428) Key Features - MCP Tool Integration: Added core MCP tool support with ToolClientSpec and ToolClient classes - API Enhancement: New mcp_tool method for flow-specific tool invocation - CLI Tooling: New tg-invoke-mcp-tool command for testing MCP integration - React Agent Enhancement: Fixed and improved multi-tool invocation capabilities - Tool Management: Enhanced CLI for tool configuration and management Changes - Added MCP tool invocation to API with flow-specific integration - Implemented ToolClientSpec and ToolClient for tool call handling - Updated agent-manager-react to invoke MCP tools with configurable types - Enhanced CLI with new commands and improved help text - Added comprehensive documentation for new CLI commands - Improved tool configuration management Testing - Added tg-invoke-mcp-tool CLI command for isolated MCP integration testing - Enhanced agent capability to invoke multiple tools simultaneously * Test suite executed from CI pipeline (#433) * Test strategy & test cases * Unit tests * Integration tests * Extending test coverage (#434) * Contract tests * Testing embeedings * Agent unit tests * Knowledge pipeline tests * Turn on contract tests * Increase storage test coverage (#435) * Fixing storage and adding tests * PR pipeline only runs quick tests * Empty configuration is returned as empty list, previously was not in response (#436) * Update config util to take files as well as command-line text (#437) * Updated CLI invocation and config model for tools and mcp (#438) * Updated CLI invocation and config model for tools and mcp * CLI anomalies * Tweaked the MCP tool implementation for new model * Update agent implementation to match the new model * Fix agent tools, now all tested * Fixed integration tests * Fix MCP delete tool params * Update Python deps to 1.2 * Update to enable knowledge extraction using the agent framework (#439) * Implement KG extraction agent (kg-extract-agent) * Using ReAct framework (agent-manager-react) * ReAct manager had an issue when emitting JSON, which conflicts which ReAct manager's own JSON messages, so refactored ReAct manager to use traditional ReAct messages, non-JSON structure. * Minor refactor to take the prompt template client out of prompt-template so it can be more readily used by other modules. kg-extract-agent uses this framework. * Migrate from setup.py to pyproject.toml (#440) * Converted setup.py to pyproject.toml * Modern package infrastructure as recommended by py docs * Install missing build deps (#441) * Install missing build deps (#442) * Implement logging strategy (#444) * Logging strategy and convert all prints() to logging invocations * Fix/startup failure (#445) * Fix loggin startup problems * Fix logging startup problems (#446) * Fix logging startup problems (#447) * Fixed Mistral OCR to use current API (#448) * Fixed Mistral OCR to use current API * Added PDF decoder tests * Fix Mistral OCR ident to be standard pdf-decoder (#450) * Fix Mistral OCR ident to be standard pdf-decoder * Correct test * Schema structure refactor (#451) * Write schema refactor spec * Implemented schema refactor spec * Structure data mvp (#452) * Structured data tech spec * Architecture principles * New schemas * Updated schemas and specs * Object extractor * Add .coveragerc * New tests * Cassandra object storage * Trying to object extraction working, issues exist * Validate librarian collection (#453) * Fix token chunker, broken API invocation (#454) * Fix token chunker, broken API invocation (#455) * Knowledge load utility CLI (#456) * Knowledge loader * More tests
2025-08-18 20:56:09 +01:00
# Integration Test Pattern for TrustGraph
This directory contains integration tests that verify the coordination between multiple TrustGraph services and components, following the patterns outlined in [TEST_STRATEGY.md](../../TEST_STRATEGY.md).
## Integration Test Approach
Integration tests focus on **service-to-service communication patterns** and **end-to-end message flows** while still using mocks for external infrastructure.
### Key Principles
1. **Test Service Coordination**: Verify that services work together correctly
2. **Mock External Dependencies**: Use mocks for databases, APIs, and infrastructure
3. **Real Business Logic**: Exercise actual service logic and data transformations
4. **Error Propagation**: Test how errors flow through the system
5. **Configuration Testing**: Verify services respond correctly to different configurations
## Test Structure
### Fixtures (conftest.py)
Common fixtures for integration tests:
- `mock_pulsar_client`: Mock Pulsar messaging client
- `mock_flow_context`: Mock flow context for service coordination
- `integration_config`: Standard configuration for integration tests
- `sample_documents`: Test document collections
- `sample_embeddings`: Test embedding vectors
- `sample_queries`: Test query sets
### Test Patterns
#### 1. End-to-End Flow Testing
```python
@pytest.mark.integration
@pytest.mark.asyncio
async def test_service_end_to_end_flow(self, service_instance, mock_clients):
"""Test complete service pipeline from input to output"""
# Arrange - Set up realistic test data
# Act - Execute the full service workflow
# Assert - Verify coordination between all components
```
#### 2. Error Propagation Testing
```python
@pytest.mark.integration
@pytest.mark.asyncio
async def test_service_error_handling(self, service_instance, mock_clients):
"""Test how errors propagate through service coordination"""
# Arrange - Set up failure scenarios
# Act - Execute service with failing dependency
# Assert - Verify proper error handling and cleanup
```
#### 3. Configuration Testing
```python
@pytest.mark.integration
@pytest.mark.asyncio
async def test_service_configuration_scenarios(self, service_instance):
"""Test service behavior with different configurations"""
# Test multiple configuration scenarios
# Verify service adapts correctly to each configuration
```
## Running Integration Tests
### Run All Integration Tests
```bash
pytest tests/integration/ -m integration
```
### Run Specific Test
```bash
pytest tests/integration/test_document_rag_integration.py::TestDocumentRagIntegration::test_document_rag_end_to_end_flow -v
```
### Run with Coverage (Skip Coverage Requirement)
```bash
pytest tests/integration/ -m integration --cov=trustgraph --cov-fail-under=0
```
### Run Slow Tests
```bash
pytest tests/integration/ -m "integration and slow"
```
### Skip Slow Tests
```bash
pytest tests/integration/ -m "integration and not slow"
```
## Examples: Integration Test Implementations
### 1. Document RAG Integration Test
The `test_document_rag_integration.py` demonstrates the integration test pattern:
### What It Tests
- **Service Coordination**: Embeddings → Document Retrieval → Prompt Generation
- **Error Handling**: Failure scenarios for each service dependency
- **Configuration**: Different document limits, users, and collections
- **Performance**: Large document set handling
### Key Features
- **Realistic Data Flow**: Uses actual service logic with mocked dependencies
- **Multiple Scenarios**: Success, failure, and edge cases
- **Verbose Logging**: Tests logging functionality
- **Multi-User Support**: Tests user and collection isolation
### Test Coverage
- ✅ End-to-end happy path
- ✅ No documents found scenario
- ✅ Service failure scenarios (embeddings, documents, prompt)
- ✅ Configuration variations
- ✅ Multi-user isolation
- ✅ Performance testing
- ✅ Verbose logging
### 2. Text Completion Integration Test
The `test_text_completion_integration.py` demonstrates external API integration testing:
### What It Tests
- **External API Integration**: OpenAI API connectivity and authentication
- **Rate Limiting**: Proper handling of API rate limits and retries
- **Error Handling**: API failures, connection timeouts, and error propagation
- **Token Tracking**: Accurate input/output token counting and metrics
- **Configuration**: Different model parameters and settings
- **Concurrency**: Multiple simultaneous API requests
### Key Features
- **Realistic Mock Responses**: Uses actual OpenAI API response structures
- **Authentication Testing**: API key validation and base URL configuration
- **Error Scenarios**: Rate limits, connection failures, invalid requests
- **Performance Metrics**: Timing and token usage validation
- **Model Flexibility**: Tests different GPT models and parameters
### Test Coverage
- ✅ Successful text completion generation
- ✅ Multiple model configurations (GPT-3.5, GPT-4, GPT-4-turbo)
- ✅ Rate limit handling (RateLimitError → TooManyRequests)
- ✅ API error handling and propagation
- ✅ Token counting accuracy
- ✅ Prompt construction and parameter validation
- ✅ Authentication patterns and API key validation
- ✅ Concurrent request processing
- ✅ Response content extraction and validation
- ✅ Performance timing measurements
### 3. Agent Manager Integration Test
The `test_agent_manager_integration.py` demonstrates complex service coordination testing:
### What It Tests
- **ReAct Pattern**: Think-Act-Observe cycles with multi-step reasoning
- **Tool Coordination**: Selection and execution of different tools (knowledge query, text completion, MCP tools)
- **Conversation State**: Management of conversation history and context
- **Multi-Service Integration**: Coordination between prompt, graph RAG, and tool services
- **Error Handling**: Tool failures, unknown tools, and error propagation
- **Configuration Management**: Dynamic tool loading and configuration
### Key Features
- **Complex Coordination**: Tests agent reasoning with multiple tool options
- **Stateful Processing**: Maintains conversation history across interactions
- **Dynamic Tool Selection**: Tests tool selection based on context and reasoning
- **Callback Pattern**: Tests think/observe callback mechanisms
- **JSON Serialization**: Handles complex data structures in prompts
- **Performance Testing**: Large conversation history handling
### Test Coverage
- ✅ Basic reasoning cycle with tool selection
- ✅ Final answer generation (ending ReAct cycle)
- ✅ Full ReAct cycle with tool execution
- ✅ Conversation history management
- ✅ Multiple tool coordination and selection
- ✅ Tool argument validation and processing
- ✅ Error handling (unknown tools, execution failures)
- ✅ Context integration and additional prompting
- ✅ Empty tool configuration handling
- ✅ Tool response processing and cleanup
- ✅ Performance with large conversation history
- ✅ JSON serialization in complex prompts
### 4. Knowledge Graph Extract → Store Pipeline Integration Test
The `test_kg_extract_store_integration.py` demonstrates multi-stage pipeline testing:
### What It Tests
- **Text-to-Graph Transformation**: Complete pipeline from text chunks to graph triples
- **Entity Extraction**: Definition extraction with proper URI generation
- **Relationship Extraction**: Subject-predicate-object relationship extraction
- **Graph Database Integration**: Storage coordination with Cassandra knowledge store
- **Data Validation**: Entity filtering, validation, and consistency checks
- **Pipeline Coordination**: Multi-stage processing with proper data flow
### Key Features
- **Multi-Stage Pipeline**: Tests definitions → relationships → storage coordination
- **Graph Data Structures**: RDF triples, entity contexts, and graph embeddings
- **URI Generation**: Consistent entity URI creation across pipeline stages
- **Data Transformation**: Complex text analysis to structured graph data
- **Batch Processing**: Large document set processing performance
- **Error Resilience**: Graceful handling of extraction failures
### Test Coverage
- ✅ Definitions extraction pipeline (text → entities + definitions)
- ✅ Relationships extraction pipeline (text → subject-predicate-object)
- ✅ URI generation consistency between processors
- ✅ Triple generation from definitions and relationships
- ✅ Knowledge store integration (triples and embeddings storage)
- ✅ End-to-end pipeline coordination
- ✅ Error handling in extraction services
- ✅ Empty and invalid extraction results handling
- ✅ Entity filtering and validation
- ✅ Large batch processing performance
- ✅ Metadata propagation through pipeline stages
## Best Practices
### Test Organization
- Group related tests in classes
- Use descriptive test names that explain the scenario
- Follow the Arrange-Act-Assert pattern
- Use appropriate pytest markers (`@pytest.mark.integration`, `@pytest.mark.slow`)
### Mock Strategy
- Mock external services (databases, APIs, message brokers)
- Use real service logic and data transformations
- Create realistic mock responses that match actual service behavior
- Reset mocks between tests to ensure isolation
### Test Data
- Use realistic test data that reflects actual usage patterns
- Create reusable fixtures for common test scenarios
- Test with various data sizes and edge cases
- Include both success and failure scenarios
### Error Testing
- Test each dependency failure scenario
- Verify proper error propagation and cleanup
- Test timeout and retry mechanisms
- Validate error response formats
### Performance Testing
- Mark performance tests with `@pytest.mark.slow`
- Test with realistic data volumes
- Set reasonable performance expectations
- Monitor resource usage during tests
## Adding New Integration Tests
1. **Identify Service Dependencies**: Map out which services your target service coordinates with
2. **Create Mock Fixtures**: Set up mocks for each dependency in conftest.py
3. **Design Test Scenarios**: Plan happy path, error cases, and edge conditions
4. **Implement Tests**: Follow the established patterns in this directory
5. **Add Documentation**: Update this README with your new test patterns
## Test Markers
- `@pytest.mark.integration`: Marks tests as integration tests
- `@pytest.mark.slow`: Marks tests that take longer to run
- `@pytest.mark.asyncio`: Required for async test functions
## Future Enhancements
- Add tests with real test containers for database integration
- Implement contract testing for service interfaces
- Add performance benchmarking for critical paths
- Create integration test templates for common service patterns