# Integration Test Pattern for TrustGraph

This directory contains integration tests that verify the coordination between multiple TrustGraph services and components, following the patterns outlined in [TEST_STRATEGY.md](../../TEST_STRATEGY.md).

## Integration Test Approach

Integration tests focus on **service-to-service communication patterns** and **end-to-end message flows** while still using mocks for external infrastructure.

### Key Principles

1. **Test Service Coordination**: Verify that services work together correctly
2. **Mock External Dependencies**: Use mocks for databases, APIs, and infrastructure
3. **Real Business Logic**: Exercise actual service logic and data transformations
4. **Error Propagation**: Test how errors flow through the system
5. **Configuration Testing**: Verify services respond correctly to different configurations

## Test Structure

### Fixtures (conftest.py)

Common fixtures for integration tests:
- `mock_pulsar_client`: Mock Pulsar messaging client
- `mock_flow_context`: Mock flow context for service coordination
- `integration_config`: Standard configuration for integration tests
- `sample_documents`: Test document collections
- `sample_embeddings`: Test embedding vectors
- `sample_queries`: Test query sets
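
A minimal sketch of how fixtures like these might be built with `unittest.mock`. The factory names, configuration keys, and sample values are assumptions for illustration and are not taken from the real `conftest.py`; in a conftest, each factory would be wrapped in a `@pytest.fixture`.

```python
from unittest.mock import MagicMock


def make_mock_pulsar_client():
    """Stand-in for a Pulsar client; no broker is contacted."""
    client = MagicMock(name="pulsar_client")
    # create_producer()/subscribe() return further mocks that record calls.
    client.create_producer.return_value = MagicMock(name="producer")
    client.subscribe.return_value = MagicMock(name="consumer")
    return client


def make_integration_config():
    """Hypothetical configuration values shared across tests."""
    return {"user": "test-user", "collection": "test-collection", "doc_limit": 10}


def make_sample_embeddings(dimensions=4, count=3):
    """Small deterministic vectors, enough to exercise the data flow."""
    return [[float(i + j) for j in range(dimensions)] for i in range(count)]
```

Deterministic sample data keeps assertions stable across runs, which matters more here than realism of the vector values.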

### Test Patterns

#### 1. End-to-End Flow Testing

```python
@pytest.mark.integration
@pytest.mark.asyncio
async def test_service_end_to_end_flow(self, service_instance, mock_clients):
    """Test complete service pipeline from input to output"""
    # Arrange - Set up realistic test data
    # Act - Execute the full service workflow
    # Assert - Verify coordination between all components
```
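
As a hedged illustration of the template above, the sketch below fills in the three phases for a hypothetical `RagPipeline` class. The class, its method names, and the mock return values are assumptions and do not describe the real TrustGraph services. The pytest markers are left out so the sketch runs with the standard library alone; in the suite this would be a test method carrying `@pytest.mark.integration` and `@pytest.mark.asyncio`.

```python
import asyncio
from unittest.mock import AsyncMock


class RagPipeline:
    """Hypothetical service under test: embeddings -> retrieval -> prompt."""

    def __init__(self, embeddings, documents, prompt):
        self.embeddings = embeddings
        self.documents = documents
        self.prompt = prompt

    async def query(self, question, limit=5):
        vectors = await self.embeddings.embed(question)
        docs = await self.documents.query(vectors, limit=limit)
        return await self.prompt.answer(question, docs)


async def check_rag_end_to_end_flow():
    # Arrange: mock every external dependency, keep the pipeline logic real.
    embeddings = AsyncMock()
    embeddings.embed.return_value = [[0.1, 0.2, 0.3]]
    documents = AsyncMock()
    documents.query.return_value = ["doc one", "doc two"]
    prompt = AsyncMock()
    prompt.answer.return_value = "final answer"
    service = RagPipeline(embeddings, documents, prompt)

    # Act: run the full workflow.
    result = await service.query("What is TrustGraph?", limit=2)

    # Assert: each stage received the previous stage's output.
    assert result == "final answer"
    embeddings.embed.assert_awaited_once_with("What is TrustGraph?")
    documents.query.assert_awaited_once_with([[0.1, 0.2, 0.3]], limit=2)
    prompt.answer.assert_awaited_once_with("What is TrustGraph?", ["doc one", "doc two"])
    return result
```

Outside pytest, the coroutine can be driven with `asyncio.run(check_rag_end_to_end_flow())`.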

#### 2. Error Propagation Testing

```python
@pytest.mark.integration
@pytest.mark.asyncio
async def test_service_error_handling(self, service_instance, mock_clients):
    """Test how errors propagate through service coordination"""
    # Arrange - Set up failure scenarios
    # Act - Execute service with failing dependency
    # Assert - Verify proper error handling and cleanup
```
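
A minimal sketch of the error propagation template, assuming a hypothetical `answer_question` coordinator and a `ConnectionError` as the injected failure. Neither name comes from the TrustGraph code base. With pytest available, the `try`/`except` block would normally be written as `with pytest.raises(ConnectionError):`.

```python
import asyncio
from unittest.mock import AsyncMock


async def answer_question(embeddings, documents, question):
    """Hypothetical coordination logic: embed, then retrieve."""
    vectors = await embeddings.embed(question)
    return await documents.query(vectors)


async def check_embeddings_failure_propagates():
    # Arrange: the first dependency fails.
    embeddings = AsyncMock()
    embeddings.embed.side_effect = ConnectionError("embeddings unavailable")
    documents = AsyncMock()

    # Act: capture the error raised by the coordinating code.
    try:
        await answer_question(embeddings, documents, "any question")
    except ConnectionError as exc:
        error = exc
    else:
        raise AssertionError("expected ConnectionError")

    # Assert: the error surfaces unchanged and later stages never run.
    assert "embeddings unavailable" in str(error)
    documents.query.assert_not_awaited()
    return error
```

Checking that the downstream mock was never awaited is what distinguishes a propagation test from a plain exception test.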

#### 3. Configuration Testing

```python
@pytest.mark.integration
@pytest.mark.asyncio
async def test_service_configuration_scenarios(self, service_instance):
    """Test service behavior with different configurations"""
    # Test multiple configuration scenarios
    # Verify service adapts correctly to each configuration
```
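
One way to fill in the configuration template is to loop over scenarios and inspect what the mocked dependency received. The `retrieve` function and the configuration keys below are assumptions made for the sketch; under pytest, `@pytest.mark.parametrize` would be a natural replacement for the loop.

```python
import asyncio
from unittest.mock import AsyncMock


async def retrieve(documents, vectors, config):
    """Hypothetical retrieval step that forwards configuration to the store."""
    return await documents.query(
        vectors,
        limit=config.get("doc_limit", 10),
        user=config["user"],
        collection=config["collection"],
    )


async def check_configuration_scenarios():
    scenarios = [
        {"user": "alice", "collection": "research", "doc_limit": 5},
        {"user": "bob", "collection": "default"},  # falls back to limit=10
    ]
    seen_limits = []
    for config in scenarios:
        documents = AsyncMock()  # fresh mock per scenario keeps calls isolated
        await retrieve(documents, [[0.5, 0.5]], config)
        kwargs = documents.query.await_args.kwargs
        assert kwargs["user"] == config["user"]
        assert kwargs["collection"] == config["collection"]
        seen_limits.append(kwargs["limit"])
    return seen_limits
```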

## Running Integration Tests

### Run All Integration Tests
```bash
pytest tests/integration/ -m integration
```

### Run Specific Test
```bash
pytest tests/integration/test_document_rag_integration.py::TestDocumentRagIntegration::test_document_rag_end_to_end_flow -v
```

### Run with Coverage (Skip Coverage Requirement)
```bash
pytest tests/integration/ -m integration --cov=trustgraph --cov-fail-under=0
```

### Run Slow Tests
```bash
pytest tests/integration/ -m "integration and slow"
```

### Skip Slow Tests
```bash
pytest tests/integration/ -m "integration and not slow"
```

## Examples: Integration Test Implementations

### 1. Document RAG Integration Test

`test_document_rag_integration.py` demonstrates the integration test pattern:

#### What It Tests
- **Service Coordination**: Embeddings → Document Retrieval → Prompt Generation
- **Error Handling**: Failure scenarios for each service dependency
- **Configuration**: Different document limits, users, and collections
- **Performance**: Large document set handling

#### Key Features
- **Realistic Data Flow**: Uses actual service logic with mocked dependencies
- **Multiple Scenarios**: Success, failure, and edge cases
- **Verbose Logging**: Tests logging functionality
- **Multi-User Support**: Tests user and collection isolation

#### Test Coverage
- ✅ End-to-end happy path
- ✅ No documents found scenario
- ✅ Service failure scenarios (embeddings, documents, prompt)
- ✅ Configuration variations
- ✅ Multi-user isolation
- ✅ Performance testing
- ✅ Verbose logging

### 2. Text Completion Integration Test

`test_text_completion_integration.py` demonstrates external API integration testing:

#### What It Tests
- **External API Integration**: OpenAI API connectivity and authentication
- **Rate Limiting**: Proper handling of API rate limits and retries
- **Error Handling**: API failures, connection timeouts, and error propagation
- **Token Tracking**: Accurate input/output token counting and metrics
- **Configuration**: Different model parameters and settings
- **Concurrency**: Multiple simultaneous API requests

#### Key Features
- **Realistic Mock Responses**: Uses actual OpenAI API response structures
- **Authentication Testing**: API key validation and base URL configuration
- **Error Scenarios**: Rate limits, connection failures, invalid requests
- **Performance Metrics**: Timing and token usage validation
- **Model Flexibility**: Tests different GPT models and parameters

#### Test Coverage
- ✅ Successful text completion generation
- ✅ Multiple model configurations (GPT-3.5, GPT-4, GPT-4-turbo)
- ✅ Rate limit handling (RateLimitError → TooManyRequests)
- ✅ API error handling and propagation
- ✅ Token counting accuracy
- ✅ Prompt construction and parameter validation
- ✅ Authentication patterns and API key validation
- ✅ Concurrent request processing
- ✅ Response content extraction and validation
- ✅ Performance timing measurements
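
The rate-limit translation listed above (`RateLimitError` → `TooManyRequests`) can be sketched as follows. Both exception classes are local stand-ins defined for this example, and `complete` and `client.create` are hypothetical names; the real test would use the provider's and the service's own exception types.

```python
import asyncio
from unittest.mock import AsyncMock


class RateLimitError(Exception):
    """Local stand-in for the provider's rate-limit exception."""


class TooManyRequests(Exception):
    """Local stand-in for the service-level exception."""


async def complete(client, prompt):
    """Hypothetical wrapper: translate provider errors into service errors."""
    try:
        return await client.create(prompt=prompt)
    except RateLimitError as exc:
        raise TooManyRequests(str(exc)) from exc


async def check_rate_limit_translation():
    client = AsyncMock()
    client.create.side_effect = RateLimitError("rate limit exceeded")
    try:
        await complete(client, "hello")
    except TooManyRequests as exc:
        # The original error stays attached as the cause.
        return type(exc.__cause__).__name__
    raise AssertionError("expected TooManyRequests")
```

Raising with `from exc` preserves the provider error for debugging while callers only need to handle the service-level type.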

### 3. Agent Manager Integration Test

`test_agent_manager_integration.py` demonstrates complex service coordination testing:

#### What It Tests
- **ReAct Pattern**: Think-Act-Observe cycles with multi-step reasoning
- **Tool Coordination**: Selection and execution of different tools (knowledge query, text completion, MCP tools)
- **Conversation State**: Management of conversation history and context
- **Multi-Service Integration**: Coordination between prompt, graph RAG, and tool services
- **Error Handling**: Tool failures, unknown tools, and error propagation
- **Configuration Management**: Dynamic tool loading and configuration

#### Key Features
- **Complex Coordination**: Tests agent reasoning with multiple tool options
- **Stateful Processing**: Maintains conversation history across interactions
- **Dynamic Tool Selection**: Tests tool selection based on context and reasoning
- **Callback Pattern**: Tests think/observe callback mechanisms
- **JSON Serialization**: Handles complex data structures in prompts
- **Performance Testing**: Large conversation history handling

#### Test Coverage
- ✅ Basic reasoning cycle with tool selection
- ✅ Final answer generation (ending ReAct cycle)
- ✅ Full ReAct cycle with tool execution
- ✅ Conversation history management
- ✅ Multiple tool coordination and selection
- ✅ Tool argument validation and processing
- ✅ Error handling (unknown tools, execution failures)
- ✅ Context integration and additional prompting
- ✅ Empty tool configuration handling
- ✅ Tool response processing and cleanup
- ✅ Performance with large conversation history
- ✅ JSON serialization in complex prompts
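
To make the think/observe callback idea concrete, here is a rough sketch of a ReAct-style loop driven by a scripted reasoner. The `react` function, the step dictionary layout, and the tool name are all assumptions for illustration and are not the agent manager's actual interface.

```python
import asyncio
from unittest.mock import AsyncMock


async def react(reasoner, tools, question, think, observe, max_steps=5):
    """Hypothetical ReAct loop: think, act with a tool, observe, repeat."""
    history = []
    for _ in range(max_steps):
        step = await reasoner(question, history)
        await think(step["thought"])
        if "final_answer" in step:
            return step["final_answer"], history
        if step["tool"] not in tools:
            raise RuntimeError(f"unknown tool: {step['tool']}")
        observation = await tools[step["tool"]](**step["arguments"])
        await observe(observation)
        history.append((step["tool"], observation))
    raise RuntimeError("no final answer within max_steps")


async def check_full_react_cycle():
    # Scripted reasoning: one tool call, then a final answer.
    reasoner = AsyncMock(side_effect=[
        {"thought": "look it up", "tool": "knowledge_query",
         "arguments": {"question": "capital of France"}},
        {"thought": "I know the answer", "final_answer": "Paris"},
    ])
    tools = {"knowledge_query": AsyncMock(return_value="Paris is the capital")}
    think, observe = AsyncMock(), AsyncMock()

    answer, history = await react(reasoner, tools, "capital of France?", think, observe)

    # Both reasoning steps emitted a thought; only the tool step was observed.
    assert think.await_count == 2
    observe.assert_awaited_once_with("Paris is the capital")
    tools["knowledge_query"].assert_awaited_once_with(question="capital of France")
    return answer, history
```

Scripting the reasoner with `side_effect` keeps the loop logic real while removing any dependency on a language model.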

### 4. Knowledge Graph Extract → Store Pipeline Integration Test

`test_kg_extract_store_integration.py` demonstrates multi-stage pipeline testing:

#### What It Tests
- **Text-to-Graph Transformation**: Complete pipeline from text chunks to graph triples
- **Entity Extraction**: Definition extraction with proper URI generation
- **Relationship Extraction**: Subject-predicate-object relationship extraction
- **Graph Database Integration**: Storage coordination with Cassandra knowledge store
- **Data Validation**: Entity filtering, validation, and consistency checks
- **Pipeline Coordination**: Multi-stage processing with proper data flow

#### Key Features
- **Multi-Stage Pipeline**: Tests definitions → relationships → storage coordination
- **Graph Data Structures**: RDF triples, entity contexts, and graph embeddings
- **URI Generation**: Consistent entity URI creation across pipeline stages
- **Data Transformation**: Complex text analysis to structured graph data
- **Batch Processing**: Large document set processing performance
- **Error Resilience**: Graceful handling of extraction failures

#### Test Coverage
- ✅ Definitions extraction pipeline (text → entities + definitions)
- ✅ Relationships extraction pipeline (text → subject-predicate-object)
- ✅ URI generation consistency between processors
- ✅ Triple generation from definitions and relationships
- ✅ Knowledge store integration (triples and embeddings storage)
- ✅ End-to-end pipeline coordination
- ✅ Error handling in extraction services
- ✅ Empty and invalid extraction results handling
- ✅ Entity filtering and validation
- ✅ Large batch processing performance
- ✅ Metadata propagation through pipeline stages
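
The URI consistency check can be pictured with a small sketch in which the definitions and relationships stages share one URI helper. The `to_uri` scheme, the `example.org` namespace, the record layout, and the choice of the SKOS `definition` predicate are assumptions for this example and may differ from what the extractors actually emit.

```python
from urllib.parse import quote

ENTITY_BASE = "http://example.org/e/"  # placeholder namespace
DEFINITION = "http://www.w3.org/2004/02/skos/core#definition"


def to_uri(name):
    """Hypothetical URI scheme: normalise the name, then percent-encode it."""
    return ENTITY_BASE + quote(name.strip().lower().replace(" ", "-"))


def definition_triples(definitions):
    """Turn {'entity', 'definition'} records into (s, p, o) triples."""
    return [
        (to_uri(d["entity"]), DEFINITION, d["definition"])
        for d in definitions
        if d.get("entity") and d.get("definition")  # drop empty extractions
    ]


def relationship_triples(relationships):
    """Turn subject-predicate-object records into triples of URIs."""
    return [
        (to_uri(r["subject"]), to_uri(r["predicate"]), to_uri(r["object"]))
        for r in relationships
    ]
```

A consistency test then asserts that the same entity name yields the same subject URI from both functions, for example `definition_triples(...)[0][0] == relationship_triples(...)[0][0]`.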

## Best Practices

### Test Organization
- Group related tests in classes
- Use descriptive test names that explain the scenario
- Follow the Arrange-Act-Assert pattern
- Use appropriate pytest markers (`@pytest.mark.integration`, `@pytest.mark.slow`)

### Mock Strategy
- Mock external services (databases, APIs, message brokers)
- Use real service logic and data transformations
- Create realistic mock responses that match actual service behavior
- Reset mocks between tests to ensure isolation
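
For shared mocks, resetting between tests can look like the sketch below. The `knowledge_store` mock and its `add_triples` method are illustrative names only. Note that `reset_mock()` clears recorded calls but, by default, keeps configured return values.

```python
from unittest.mock import MagicMock

store = MagicMock(name="knowledge_store")
store.add_triples.return_value = True

# First "test" uses the shared mock.
store.add_triples([("s", "p", "o")])
assert store.add_triples.call_count == 1

# reset_mock() clears recorded calls but keeps configured return values.
store.reset_mock()
assert store.add_triples.call_count == 0
assert store.add_triples.return_value is True
```

Function-scoped fixtures that build a fresh mock per test avoid the need for explicit resets altogether.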

### Test Data
- Use realistic test data that reflects actual usage patterns
- Create reusable fixtures for common test scenarios
- Test with various data sizes and edge cases
- Include both success and failure scenarios

### Error Testing
- Test each dependency failure scenario
- Verify proper error propagation and cleanup
- Test timeout and retry mechanisms
- Validate error response formats
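
Timeout and retry behaviour can be tested without real delays by scripting the failures. The sketch below assumes a hypothetical `call_with_retry` helper; it is not part of TrustGraph and only illustrates the testing technique.

```python
import asyncio
from unittest.mock import AsyncMock


async def call_with_retry(operation, attempts=3, timeout=0.05):
    """Hypothetical helper: retry on timeout, re-raise after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return await asyncio.wait_for(operation(), timeout=timeout)
        except asyncio.TimeoutError:
            if attempt == attempts:
                raise


async def check_retry_then_success():
    # Two scripted timeouts followed by a successful response.
    operation = AsyncMock(
        side_effect=[asyncio.TimeoutError(), asyncio.TimeoutError(), "ok"]
    )
    result = await call_with_retry(operation)
    return result, operation.await_count


async def check_retry_exhausted():
    # Every attempt times out, so the error must reach the caller.
    operation = AsyncMock(side_effect=asyncio.TimeoutError())
    try:
        await call_with_retry(operation, attempts=2)
    except asyncio.TimeoutError:
        return operation.await_count
    raise AssertionError("expected TimeoutError")
```

Because the mock raises the timeout itself, the test stays fast and deterministic instead of depending on wall-clock sleeps.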

### Performance Testing
- Mark performance tests with `@pytest.mark.slow`
- Test with realistic data volumes
- Set reasonable performance expectations
- Monitor resource usage during tests

## Adding New Integration Tests

1. **Identify Service Dependencies**: Map out which services your target service coordinates with
2. **Create Mock Fixtures**: Set up mocks for each dependency in `conftest.py`
3. **Design Test Scenarios**: Plan happy path, error cases, and edge conditions
4. **Implement Tests**: Follow the established patterns in this directory
5. **Add Documentation**: Update this README with your new test patterns

## Test Markers

- `@pytest.mark.integration`: Marks tests as integration tests
- `@pytest.mark.slow`: Marks tests that take longer to run
- `@pytest.mark.asyncio`: Required for async test functions
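
If the custom markers are not already registered in the project's pytest configuration (an assumption; the project may already declare them elsewhere), a `conftest.py` hook along these lines would register them so that `-m integration` selections do not trigger unknown-marker warnings. The marker descriptions are taken from the list above.

```python
def pytest_configure(config):
    """Register the custom markers used by the integration tests."""
    config.addinivalue_line(
        "markers", "integration: marks tests as integration tests"
    )
    config.addinivalue_line(
        "markers", "slow: marks tests that take longer to run"
    )
```

The `asyncio` marker is not registered here because it is supplied by the pytest-asyncio plugin.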

## Future Enhancements

- Add tests with real test containers for database integration
- Implement contract testing for service interfaces
- Add performance benchmarking for critical paths
- Create integration test templates for common service patterns