Integration Test Pattern for TrustGraph

This directory contains integration tests that verify the coordination between multiple TrustGraph services and components, following the patterns outlined in TEST_STRATEGY.md.

Integration Test Approach

Integration tests focus on service-to-service communication patterns and end-to-end message flows while still using mocks for external infrastructure.

Key Principles

  1. Test Service Coordination: Verify that services work together correctly
  2. Mock External Dependencies: Use mocks for databases, APIs, and infrastructure
  3. Real Business Logic: Exercise actual service logic and data transformations
  4. Error Propagation: Test how errors flow through the system
  5. Configuration Testing: Verify services respond correctly to different configurations

Test Structure

Fixtures (conftest.py)

Common fixtures for integration tests:

  • mock_pulsar_client: Mock Pulsar messaging client
  • mock_flow_context: Mock flow context for service coordination
  • integration_config: Standard configuration for integration tests
  • sample_documents: Test document collections
  • sample_embeddings: Test embedding vectors
  • sample_queries: Test query sets
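The fixtures above can be sketched with plain `unittest.mock` builders, each wrapped by `@pytest.fixture` in conftest.py. The builder bodies below (topic names, config keys, vector sizes) are illustrative assumptions, not the real fixture contents:

```python
from unittest.mock import MagicMock

def make_mock_pulsar_client():
    # Pulsar client stub: producers record every message they send.
    # In conftest.py this would be exposed via @pytest.fixture.
    client = MagicMock()
    producer = MagicMock()
    client.create_producer.return_value = producer
    return client

def make_integration_config():
    # Keys and values are illustrative, not the real fixture contents
    return {"user": "test-user", "collection": "test-collection", "doc_limit": 10}

def make_sample_embeddings(count=3, dim=8):
    # Small deterministic vectors stand in for real embedding output
    return [[i / dim for i in range(dim)] for _ in range(count)]
```

Tests can then assert on producer call history without any running Pulsar broker.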

Test Patterns

1. End-to-End Flow Testing

```python
@pytest.mark.integration
@pytest.mark.asyncio
async def test_service_end_to_end_flow(self, service_instance, mock_clients):
    """Test complete service pipeline from input to output"""
    # Arrange - Set up realistic test data
    # Act - Execute the full service workflow
    # Assert - Verify coordination between all components
```

2. Error Propagation Testing

```python
@pytest.mark.integration
@pytest.mark.asyncio
async def test_service_error_handling(self, service_instance, mock_clients):
    """Test how errors propagate through service coordination"""
    # Arrange - Set up failure scenarios
    # Act - Execute service with failing dependency
    # Assert - Verify proper error handling and cleanup
```
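A minimal, self-contained sketch of the error propagation pattern using stdlib `unittest.mock`; `run_pipeline` and the client method names are hypothetical stand-ins for real TrustGraph services:

```python
import asyncio
from unittest.mock import AsyncMock

async def run_pipeline(embeddings_client, doc_client):
    # Hypothetical stand-in for a real service workflow:
    # embed the query, then retrieve matching documents
    vec = await embeddings_client.embed("query")
    return await doc_client.query(vec)

async def demo():
    embeddings_client = AsyncMock()
    doc_client = AsyncMock()
    # Arrange - make the embeddings dependency fail
    embeddings_client.embed.side_effect = RuntimeError("embeddings service down")
    # Act - run the workflow and capture the propagated error
    try:
        await run_pipeline(embeddings_client, doc_client)
    except RuntimeError as e:
        error = str(e)
    # Assert - the failure surfaced and downstream work never ran
    assert error == "embeddings service down"
    doc_client.query.assert_not_called()
    return error

asyncio.run(demo())
```

In a real pytest test the try/except would usually be written as `with pytest.raises(RuntimeError):`.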

3. Configuration Testing

```python
@pytest.mark.integration
@pytest.mark.asyncio
async def test_service_configuration_scenarios(self, service_instance):
    """Test service behavior with different configurations"""
    # Test multiple configuration scenarios
    # Verify service adapts correctly to each configuration
```
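A runnable sketch of the configuration pattern; `retrieve` and the `doc_limit` key are illustrative assumptions, not real TrustGraph names:

```python
import asyncio
from unittest.mock import AsyncMock

async def retrieve(doc_client, config):
    # Hypothetical service call that must honour a doc_limit setting
    docs = await doc_client.query("question", limit=config["doc_limit"])
    return docs[: config["doc_limit"]]

async def demo():
    lengths = []
    for config in [{"doc_limit": 1}, {"doc_limit": 5}]:
        doc_client = AsyncMock()
        doc_client.query.return_value = ["doc"] * 10
        result = await retrieve(doc_client, config)
        # The service adapts its output to each configuration
        assert len(result) == config["doc_limit"]
        lengths.append(len(result))
    return lengths

asyncio.run(demo())
```

In practice this loop is usually expressed with `@pytest.mark.parametrize`, one scenario per parameter set.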

Running Integration Tests

Run All Integration Tests

```bash
pytest tests/integration/ -m integration
```

Run a Specific Test

```bash
pytest tests/integration/test_document_rag_integration.py::TestDocumentRagIntegration::test_document_rag_end_to_end_flow -v
```

Run with Coverage (Skip Coverage Requirement)

```bash
pytest tests/integration/ -m integration --cov=trustgraph --cov-fail-under=0
```

Run Slow Tests

```bash
pytest tests/integration/ -m "integration and slow"
```

Skip Slow Tests

```bash
pytest tests/integration/ -m "integration and not slow"
```

Examples: Integration Test Implementations

1. Document RAG Integration Test

The test_document_rag_integration.py demonstrates the integration test pattern:

What It Tests

  • Service Coordination: Embeddings → Document Retrieval → Prompt Generation
  • Error Handling: Failure scenarios for each service dependency
  • Configuration: Different document limits, users, and collections
  • Performance: Large document set handling

Key Features

  • Realistic Data Flow: Uses actual service logic with mocked dependencies
  • Multiple Scenarios: Success, failure, and edge cases
  • Verbose Logging: Tests logging functionality
  • Multi-User Support: Tests user and collection isolation

Test Coverage

  • End-to-end happy path
  • No documents found scenario
  • Service failure scenarios (embeddings, documents, prompt)
  • Configuration variations
  • Multi-user isolation
  • Performance testing
  • Verbose logging

2. Text Completion Integration Test

The test_text_completion_integration.py demonstrates external API integration testing:

What It Tests

  • External API Integration: OpenAI API connectivity and authentication
  • Rate Limiting: Proper handling of API rate limits and retries
  • Error Handling: API failures, connection timeouts, and error propagation
  • Token Tracking: Accurate input/output token counting and metrics
  • Configuration: Different model parameters and settings
  • Concurrency: Multiple simultaneous API requests

Key Features

  • Realistic Mock Responses: Uses actual OpenAI API response structures
  • Authentication Testing: API key validation and base URL configuration
  • Error Scenarios: Rate limits, connection failures, invalid requests
  • Performance Metrics: Timing and token usage validation
  • Model Flexibility: Tests different GPT models and parameters

Test Coverage

  • Successful text completion generation
  • Multiple model configurations (GPT-3.5, GPT-4, GPT-4-turbo)
  • Rate limit handling (RateLimitError → TooManyRequests)
  • API error handling and propagation
  • Token counting accuracy
  • Prompt construction and parameter validation
  • Authentication patterns and API key validation
  • Concurrent request processing
  • Response content extraction and validation
  • Performance timing measurements
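The concurrency item above can be sketched with `asyncio.gather` against a mocked completion client; the `llm.complete` name is illustrative, not the real TrustGraph client API:

```python
import asyncio
from unittest.mock import AsyncMock

async def demo():
    llm = AsyncMock()
    llm.complete.return_value = "ok"
    # Fire several completion requests concurrently, as the
    # concurrency tests do; no real API is contacted
    results = await asyncio.gather(
        *[llm.complete(f"prompt {i}") for i in range(5)]
    )
    # Every request completed and every call was awaited
    assert results == ["ok"] * 5
    assert llm.complete.await_count == 5
    return results

asyncio.run(demo())
```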

3. Agent Manager Integration Test

The test_agent_manager_integration.py demonstrates complex service coordination testing:

What It Tests

  • ReAct Pattern: Think-Act-Observe cycles with multi-step reasoning
  • Tool Coordination: Selection and execution of different tools (knowledge query, text completion, MCP tools)
  • Conversation State: Management of conversation history and context
  • Multi-Service Integration: Coordination between prompt, graph RAG, and tool services
  • Error Handling: Tool failures, unknown tools, and error propagation
  • Configuration Management: Dynamic tool loading and configuration

Key Features

  • Complex Coordination: Tests agent reasoning with multiple tool options
  • Stateful Processing: Maintains conversation history across interactions
  • Dynamic Tool Selection: Tests tool selection based on context and reasoning
  • Callback Pattern: Tests think/observe callback mechanisms
  • JSON Serialization: Handles complex data structures in prompts
  • Performance Testing: Large conversation history handling

Test Coverage

  • Basic reasoning cycle with tool selection
  • Final answer generation (ending ReAct cycle)
  • Full ReAct cycle with tool execution
  • Conversation history management
  • Multiple tool coordination and selection
  • Tool argument validation and processing
  • Error handling (unknown tools, execution failures)
  • Context integration and additional prompting
  • Empty tool configuration handling
  • Tool response processing and cleanup
  • Performance with large conversation history
  • JSON serialization in complex prompts
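The think/observe callback mechanism can be exercised in isolation; `react_step` and `invoke_tool` below are hypothetical stand-ins for the agent manager's actual interfaces:

```python
import asyncio
from unittest.mock import AsyncMock

async def react_step(agent, think, observe):
    # Hypothetical single ReAct iteration: think, act via a tool, observe
    think("I should query the knowledge graph")
    result = await agent.invoke_tool("knowledge-query", {"question": "..."})
    observe(result)
    return result

async def demo():
    thoughts, observations = [], []
    agent = AsyncMock()
    agent.invoke_tool.return_value = "3 matching triples"
    await react_step(agent, thoughts.append, observations.append)
    # Both callbacks fired, in think -> act -> observe order
    assert thoughts == ["I should query the knowledge graph"]
    assert observations == ["3 matching triples"]
    return thoughts, observations

asyncio.run(demo())
```

Capturing callbacks into plain lists like this keeps the assertions on ordering and content trivial.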

4. Knowledge Graph Extract → Store Pipeline Integration Test

The test_kg_extract_store_integration.py demonstrates multi-stage pipeline testing:

What It Tests

  • Text-to-Graph Transformation: Complete pipeline from text chunks to graph triples
  • Entity Extraction: Definition extraction with proper URI generation
  • Relationship Extraction: Subject-predicate-object relationship extraction
  • Graph Database Integration: Storage coordination with Cassandra knowledge store
  • Data Validation: Entity filtering, validation, and consistency checks
  • Pipeline Coordination: Multi-stage processing with proper data flow

Key Features

  • Multi-Stage Pipeline: Tests definitions → relationships → storage coordination
  • Graph Data Structures: RDF triples, entity contexts, and graph embeddings
  • URI Generation: Consistent entity URI creation across pipeline stages
  • Data Transformation: Complex text analysis to structured graph data
  • Batch Processing: Large document set processing performance
  • Error Resilience: Graceful handling of extraction failures

Test Coverage

  • Definitions extraction pipeline (text → entities + definitions)
  • Relationships extraction pipeline (text → subject-predicate-object)
  • URI generation consistency between processors
  • Triple generation from definitions and relationships
  • Knowledge store integration (triples and embeddings storage)
  • End-to-end pipeline coordination
  • Error handling in extraction services
  • Empty and invalid extraction results handling
  • Entity filtering and validation
  • Large batch processing performance
  • Metadata propagation through pipeline stages
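The URI-consistency item above hinges on a deterministic label-to-URI mapping; a sketch, with a hypothetical namespace and naming scheme (not TrustGraph's real ones):

```python
from urllib.parse import quote

BASE = "http://example.org/entity/"  # hypothetical namespace

def entity_uri(label):
    # Deterministic label -> URI mapping; consistency is what lets the
    # definitions and relationships processors agree on the same entity
    return BASE + quote(label.strip().lower().replace(" ", "-"))

def triple_from_relationship(subject, predicate, obj):
    # One extracted subject-predicate-object result as an RDF-style triple
    return (entity_uri(subject), entity_uri(predicate), entity_uri(obj))

t = triple_from_relationship("Alan Turing", "born in", "London")
assert t == (
    "http://example.org/entity/alan-turing",
    "http://example.org/entity/born-in",
    "http://example.org/entity/london",
)
```

A test can then feed the same label through both extraction stages and assert the URIs match.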

Best Practices

Test Organization

  • Group related tests in classes
  • Use descriptive test names that explain the scenario
  • Follow the Arrange-Act-Assert pattern
  • Use appropriate pytest markers (@pytest.mark.integration, @pytest.mark.slow)

Mock Strategy

  • Mock external services (databases, APIs, message brokers)
  • Use real service logic and data transformations
  • Create realistic mock responses that match actual service behavior
  • Reset mocks between tests to ensure isolation
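Two of these points can be sketched together: a mock response shaped like an OpenAI chat completion (a simplified sketch of the real payload, so mocked calls exercise the same extraction code paths), plus a reset between tests:

```python
from types import SimpleNamespace
from unittest.mock import MagicMock

def make_chat_response(text, prompt_tokens=10, completion_tokens=5):
    # Simplified sketch of an OpenAI chat-completion payload
    return SimpleNamespace(
        choices=[SimpleNamespace(message=SimpleNamespace(content=text))],
        usage=SimpleNamespace(
            prompt_tokens=prompt_tokens,
            completion_tokens=completion_tokens,
        ),
    )

client = MagicMock()
client.chat.completions.create.return_value = make_chat_response("Hello")
resp = client.chat.completions.create(model="gpt-4", messages=[])
assert resp.choices[0].message.content == "Hello"
assert resp.usage.prompt_tokens == 10

# Reset between tests so call counts don't leak across cases
client.reset_mock()
assert client.chat.completions.create.call_count == 0
```

Note that `reset_mock()` clears call history but keeps the configured `return_value`, so the same fixture can serve multiple tests.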

Test Data

  • Use realistic test data that reflects actual usage patterns
  • Create reusable fixtures for common test scenarios
  • Test with various data sizes and edge cases
  • Include both success and failure scenarios

Error Testing

  • Test each dependency failure scenario
  • Verify proper error propagation and cleanup
  • Test timeout and retry mechanisms
  • Validate error response formats

Performance Testing

  • Mark performance tests with @pytest.mark.slow
  • Test with realistic data volumes
  • Set reasonable performance expectations
  • Monitor resource usage during tests

Adding New Integration Tests

  1. Identify Service Dependencies: Map out which services your target service coordinates with
  2. Create Mock Fixtures: Set up mocks for each dependency in conftest.py
  3. Design Test Scenarios: Plan happy path, error cases, and edge conditions
  4. Implement Tests: Follow the established patterns in this directory
  5. Add Documentation: Update this README with your new test patterns

Test Markers

  • @pytest.mark.integration: Marks tests as integration tests
  • @pytest.mark.slow: Marks tests that take longer to run
  • @pytest.mark.asyncio: Required for async test functions
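Custom markers such as `integration` and `slow` must be registered, otherwise pytest warns about unknown marks; a sketch assuming pyproject.toml-based configuration (an equivalent `markers` section works in pytest.ini):

```toml
[tool.pytest.ini_options]
markers = [
    "integration: tests that verify coordination between services",
    "slow: tests that take longer to run",
]
```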

Future Enhancements

  • Add tests with real test containers for database integration
  • Implement contract testing for service interfaces
  • Add performance benchmarking for critical paths
  • Create integration test templates for common service patterns