mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-05-22 22:05:13 +02:00
Release/v1.2 (#457)
* Bump setup.py versions for 1.1 * PoC MCP server (#419) * Very initial MCP server PoC for TrustGraph * Put service on port 8000 * Add MCP container and packages to buildout * Update docs for API/CLI changes in 1.0 (#421) * Update some API basics for the 0.23/1.0 API change * Add MCP container push (#425) * Add command args to the MCP server (#426) * Host and port parameters * Added websocket arg * More docs * MCP client support (#427) - MCP client service - Tool request/response schema - API gateway support for mcp-tool - Message translation for tool request & response - Make mcp-tool using configuration service for information about where the MCP services are. * Feature/react call mcp (#428) Key Features - MCP Tool Integration: Added core MCP tool support with ToolClientSpec and ToolClient classes - API Enhancement: New mcp_tool method for flow-specific tool invocation - CLI Tooling: New tg-invoke-mcp-tool command for testing MCP integration - React Agent Enhancement: Fixed and improved multi-tool invocation capabilities - Tool Management: Enhanced CLI for tool configuration and management Changes - Added MCP tool invocation to API with flow-specific integration - Implemented ToolClientSpec and ToolClient for tool call handling - Updated agent-manager-react to invoke MCP tools with configurable types - Enhanced CLI with new commands and improved help text - Added comprehensive documentation for new CLI commands - Improved tool configuration management Testing - Added tg-invoke-mcp-tool CLI command for isolated MCP integration testing - Enhanced agent capability to invoke multiple tools simultaneously * Test suite executed from CI pipeline (#433) * Test strategy & test cases * Unit tests * Integration tests * Extending test coverage (#434) * Contract tests * Testing embeedings * Agent unit tests * Knowledge pipeline tests * Turn on contract tests * Increase storage test coverage (#435) * Fixing storage and adding tests * PR pipeline only runs quick tests * Empty configuration is returned as empty list, previously was not in response (#436) * Update config util to take files as well as command-line text (#437) * Updated CLI invocation and config model for tools and mcp (#438) * Updated CLI invocation and config model for tools and mcp * CLI anomalies * Tweaked the MCP tool implementation for new model * Update agent implementation to match the new model * Fix agent tools, now all tested * Fixed integration tests * Fix MCP delete tool params * Update Python deps to 1.2 * Update to enable knowledge extraction using the agent framework (#439) * Implement KG extraction agent (kg-extract-agent) * Using ReAct framework (agent-manager-react) * ReAct manager had an issue when emitting JSON, which conflicts which ReAct manager's own JSON messages, so refactored ReAct manager to use traditional ReAct messages, non-JSON structure. * Minor refactor to take the prompt template client out of prompt-template so it can be more readily used by other modules. kg-extract-agent uses this framework. * Migrate from setup.py to pyproject.toml (#440) * Converted setup.py to pyproject.toml * Modern package infrastructure as recommended by py docs * Install missing build deps (#441) * Install missing build deps (#442) * Implement logging strategy (#444) * Logging strategy and convert all prints() to logging invocations * Fix/startup failure (#445) * Fix loggin startup problems * Fix logging startup problems (#446) * Fix logging startup problems (#447) * Fixed Mistral OCR to use current API (#448) * Fixed Mistral OCR to use current API * Added PDF decoder tests * Fix Mistral OCR ident to be standard pdf-decoder (#450) * Fix Mistral OCR ident to be standard pdf-decoder * Correct test * Schema structure refactor (#451) * Write schema refactor spec * Implemented schema refactor spec * Structure data mvp (#452) * Structured data tech spec * Architecture principles * New schemas * Updated schemas and specs * Object extractor * Add .coveragerc * New tests * Cassandra object storage * Trying to object extraction working, issues exist * Validate librarian collection (#453) * Fix token chunker, broken API invocation (#454) * Fix token chunker, broken API invocation (#455) * Knowledge load utility CLI (#456) * Knowledge loader * More tests
This commit is contained in:
parent
c85ba197be
commit
89be656990
509 changed files with 49632 additions and 5159 deletions
243
tests/contract/README.md
Normal file
243
tests/contract/README.md
Normal file
|
|
@ -0,0 +1,243 @@
|
|||
# Contract Tests for TrustGraph
|
||||
|
||||
This directory contains contract tests that verify service interface contracts, message schemas, and API compatibility across the TrustGraph microservices architecture.
|
||||
|
||||
## Overview
|
||||
|
||||
Contract tests ensure that:
|
||||
- **Message schemas remain compatible** across service versions
|
||||
- **API interfaces stay stable** for consumers
|
||||
- **Service communication contracts** are maintained
|
||||
- **Schema evolution** doesn't break existing integrations
|
||||
|
||||
## Test Categories
|
||||
|
||||
### 1. Pulsar Message Schema Contracts (`test_message_contracts.py`)
|
||||
|
||||
Tests the contracts for all Pulsar message schemas used in TrustGraph service communication.
|
||||
|
||||
#### **Coverage:**
|
||||
- ✅ **Text Completion Messages**: `TextCompletionRequest` ↔ `TextCompletionResponse`
|
||||
- ✅ **Document RAG Messages**: `DocumentRagQuery` ↔ `DocumentRagResponse`
|
||||
- ✅ **Agent Messages**: `AgentRequest` ↔ `AgentResponse` ↔ `AgentStep`
|
||||
- ✅ **Graph Messages**: `Chunk` → `Triple` → `Triples` → `EntityContext`
|
||||
- ✅ **Common Messages**: `Metadata`, `Value`, `Error` schemas
|
||||
- ✅ **Message Routing**: Properties, correlation IDs, routing keys
|
||||
- ✅ **Schema Evolution**: Backward/forward compatibility testing
|
||||
- ✅ **Serialization**: Schema validation and data integrity
|
||||
|
||||
#### **Key Features:**
|
||||
- **Schema Validation**: Ensures all message schemas accept valid data and reject invalid data
|
||||
- **Field Contracts**: Validates required vs optional fields and type constraints
|
||||
- **Nested Schema Support**: Tests complex schemas with embedded objects and arrays
|
||||
- **Routing Contracts**: Validates message properties and routing conventions
|
||||
- **Evolution Testing**: Backward compatibility and schema versioning support
|
||||
|
||||
## Running Contract Tests
|
||||
|
||||
### Run All Contract Tests
|
||||
```bash
|
||||
pytest tests/contract/ -m contract
|
||||
```
|
||||
|
||||
### Run Specific Contract Test Categories
|
||||
```bash
|
||||
# Message schema contracts
|
||||
pytest tests/contract/test_message_contracts.py -v
|
||||
|
||||
# Specific test class
|
||||
pytest tests/contract/test_message_contracts.py::TestTextCompletionMessageContracts -v
|
||||
|
||||
# Schema evolution tests
|
||||
pytest tests/contract/test_message_contracts.py::TestSchemaEvolutionContracts -v
|
||||
```
|
||||
|
||||
### Run with Coverage
|
||||
```bash
|
||||
pytest tests/contract/ -m contract --cov=trustgraph.schema --cov-report=html
|
||||
```
|
||||
|
||||
## Contract Test Patterns
|
||||
|
||||
### 1. Schema Validation Pattern
|
||||
```python
|
||||
@pytest.mark.contract
|
||||
def test_schema_contract(self, sample_message_data):
|
||||
"""Test that schema accepts valid data and rejects invalid data"""
|
||||
# Arrange
|
||||
valid_data = sample_message_data["SchemaName"]
|
||||
|
||||
# Act & Assert
|
||||
assert validate_schema_contract(SchemaClass, valid_data)
|
||||
|
||||
# Test field constraints
|
||||
instance = SchemaClass(**valid_data)
|
||||
assert hasattr(instance, 'required_field')
|
||||
assert isinstance(instance.required_field, expected_type)
|
||||
```
|
||||
|
||||
### 2. Serialization Contract Pattern
|
||||
```python
|
||||
@pytest.mark.contract
|
||||
def test_serialization_contract(self, sample_message_data):
|
||||
"""Test schema serialization/deserialization contracts"""
|
||||
# Arrange
|
||||
data = sample_message_data["SchemaName"]
|
||||
|
||||
# Act & Assert
|
||||
assert serialize_deserialize_test(SchemaClass, data)
|
||||
```
|
||||
|
||||
### 3. Evolution Contract Pattern
|
||||
```python
|
||||
@pytest.mark.contract
|
||||
def test_backward_compatibility_contract(self, schema_evolution_data):
|
||||
"""Test that new schema versions accept old data formats"""
|
||||
# Arrange
|
||||
old_version_data = schema_evolution_data["SchemaName_v1"]
|
||||
|
||||
# Act - Should work with current schema
|
||||
instance = CurrentSchema(**old_version_data)
|
||||
|
||||
# Assert - Required fields maintained
|
||||
assert instance.required_field == expected_value
|
||||
```
|
||||
|
||||
## Schema Registry
|
||||
|
||||
The contract tests maintain a registry of all TrustGraph schemas:
|
||||
|
||||
```python
|
||||
schema_registry = {
|
||||
# Text Completion
|
||||
"TextCompletionRequest": TextCompletionRequest,
|
||||
"TextCompletionResponse": TextCompletionResponse,
|
||||
|
||||
# Document RAG
|
||||
"DocumentRagQuery": DocumentRagQuery,
|
||||
"DocumentRagResponse": DocumentRagResponse,
|
||||
|
||||
# Agent
|
||||
"AgentRequest": AgentRequest,
|
||||
"AgentResponse": AgentResponse,
|
||||
|
||||
# Graph/Knowledge
|
||||
"Chunk": Chunk,
|
||||
"Triple": Triple,
|
||||
"Triples": Triples,
|
||||
"Value": Value,
|
||||
|
||||
# Common
|
||||
"Metadata": Metadata,
|
||||
"Error": Error,
|
||||
}
|
||||
```
|
||||
|
||||
## Message Contract Specifications
|
||||
|
||||
### Text Completion Service Contract
|
||||
```yaml
|
||||
TextCompletionRequest:
|
||||
required_fields: [system, prompt]
|
||||
field_types:
|
||||
system: string
|
||||
prompt: string
|
||||
|
||||
TextCompletionResponse:
|
||||
required_fields: [error, response, model]
|
||||
field_types:
|
||||
error: Error | null
|
||||
response: string | null
|
||||
in_token: integer | null
|
||||
out_token: integer | null
|
||||
model: string
|
||||
```
|
||||
|
||||
### Document RAG Service Contract
|
||||
```yaml
|
||||
DocumentRagQuery:
|
||||
required_fields: [query, user, collection]
|
||||
field_types:
|
||||
query: string
|
||||
user: string
|
||||
collection: string
|
||||
doc_limit: integer
|
||||
|
||||
DocumentRagResponse:
|
||||
required_fields: [error, response]
|
||||
field_types:
|
||||
error: Error | null
|
||||
response: string | null
|
||||
```
|
||||
|
||||
### Agent Service Contract
|
||||
```yaml
|
||||
AgentRequest:
|
||||
required_fields: [question, history]
|
||||
field_types:
|
||||
question: string
|
||||
plan: string
|
||||
state: string
|
||||
history: Array<AgentStep>
|
||||
|
||||
AgentResponse:
|
||||
required_fields: [error]
|
||||
field_types:
|
||||
answer: string | null
|
||||
error: Error | null
|
||||
thought: string | null
|
||||
observation: string | null
|
||||
```
|
||||
|
||||
## Best Practices
|
||||
|
||||
### Contract Test Design
|
||||
1. **Test Both Valid and Invalid Data**: Ensure schemas accept valid data and reject invalid data
|
||||
2. **Verify Field Constraints**: Test type constraints, required vs optional fields
|
||||
3. **Test Nested Schemas**: Validate complex objects with embedded schemas
|
||||
4. **Test Array Fields**: Ensure array serialization maintains order and content
|
||||
5. **Test Optional Fields**: Verify optional field handling in serialization
|
||||
|
||||
### Schema Evolution
|
||||
1. **Backward Compatibility**: New schema versions must accept old message formats
|
||||
2. **Required Field Stability**: Required fields should never become optional or be removed
|
||||
3. **Additive Changes**: New fields should be optional to maintain compatibility
|
||||
4. **Deprecation Strategy**: Plan deprecation path for schema changes
|
||||
|
||||
### Error Handling
|
||||
1. **Error Schema Consistency**: All error responses use consistent Error schema
|
||||
2. **Error Type Contracts**: Error types follow naming conventions
|
||||
3. **Error Message Format**: Error messages provide actionable information
|
||||
|
||||
## Adding New Contract Tests
|
||||
|
||||
When adding new message schemas or modifying existing ones:
|
||||
|
||||
1. **Add to Schema Registry**: Update `conftest.py` schema registry
|
||||
2. **Add Sample Data**: Create valid sample data in `conftest.py`
|
||||
3. **Create Contract Tests**: Follow existing patterns for validation
|
||||
4. **Test Evolution**: Add backward compatibility tests
|
||||
5. **Update Documentation**: Document schema contracts in this README
|
||||
|
||||
## Integration with CI/CD
|
||||
|
||||
Contract tests should be run:
|
||||
- **On every commit** to detect breaking changes early
|
||||
- **Before releases** to ensure API stability
|
||||
- **On schema changes** to validate compatibility
|
||||
- **In dependency updates** to catch breaking changes
|
||||
|
||||
```bash
|
||||
# CI/CD pipeline command
|
||||
pytest tests/contract/ -m contract --junitxml=contract-test-results.xml
|
||||
```
|
||||
|
||||
## Contract Test Results
|
||||
|
||||
Contract tests provide:
|
||||
- ✅ **Schema Compatibility Reports**: Which schemas pass/fail validation
|
||||
- ✅ **Breaking Change Detection**: Identifies contract violations
|
||||
- ✅ **Evolution Validation**: Confirms backward compatibility
|
||||
- ✅ **Field Constraint Verification**: Validates data type contracts
|
||||
|
||||
This ensures that TrustGraph services can evolve independently while maintaining stable, compatible interfaces for all service communication.
|
||||
0
tests/contract/__init__.py
Normal file
0
tests/contract/__init__.py
Normal file
224
tests/contract/conftest.py
Normal file
224
tests/contract/conftest.py
Normal file
|
|
@ -0,0 +1,224 @@
|
|||
"""
|
||||
Contract test fixtures and configuration
|
||||
|
||||
This file provides common fixtures for contract testing, focusing on
|
||||
message schema validation, API interface contracts, and service compatibility.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
import json
|
||||
from typing import Dict, Any, Type
|
||||
from pulsar.schema import Record
|
||||
from unittest.mock import MagicMock
|
||||
|
||||
from trustgraph.schema import (
|
||||
TextCompletionRequest, TextCompletionResponse,
|
||||
DocumentRagQuery, DocumentRagResponse,
|
||||
AgentRequest, AgentResponse, AgentStep,
|
||||
Chunk, Triple, Triples, Value, Error,
|
||||
EntityContext, EntityContexts,
|
||||
GraphEmbeddings, EntityEmbeddings,
|
||||
Metadata
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def schema_registry():
|
||||
"""Registry of all Pulsar schemas used in TrustGraph"""
|
||||
return {
|
||||
# Text Completion
|
||||
"TextCompletionRequest": TextCompletionRequest,
|
||||
"TextCompletionResponse": TextCompletionResponse,
|
||||
|
||||
# Document RAG
|
||||
"DocumentRagQuery": DocumentRagQuery,
|
||||
"DocumentRagResponse": DocumentRagResponse,
|
||||
|
||||
# Agent
|
||||
"AgentRequest": AgentRequest,
|
||||
"AgentResponse": AgentResponse,
|
||||
"AgentStep": AgentStep,
|
||||
|
||||
# Graph
|
||||
"Chunk": Chunk,
|
||||
"Triple": Triple,
|
||||
"Triples": Triples,
|
||||
"Value": Value,
|
||||
"Error": Error,
|
||||
"EntityContext": EntityContext,
|
||||
"EntityContexts": EntityContexts,
|
||||
"GraphEmbeddings": GraphEmbeddings,
|
||||
"EntityEmbeddings": EntityEmbeddings,
|
||||
|
||||
# Common
|
||||
"Metadata": Metadata,
|
||||
}
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def sample_message_data():
|
||||
"""Sample message data for contract testing"""
|
||||
return {
|
||||
"TextCompletionRequest": {
|
||||
"system": "You are a helpful assistant.",
|
||||
"prompt": "What is machine learning?"
|
||||
},
|
||||
"TextCompletionResponse": {
|
||||
"error": None,
|
||||
"response": "Machine learning is a subset of artificial intelligence.",
|
||||
"in_token": 50,
|
||||
"out_token": 100,
|
||||
"model": "gpt-3.5-turbo"
|
||||
},
|
||||
"DocumentRagQuery": {
|
||||
"query": "What is artificial intelligence?",
|
||||
"user": "test_user",
|
||||
"collection": "test_collection",
|
||||
"doc_limit": 10
|
||||
},
|
||||
"DocumentRagResponse": {
|
||||
"error": None,
|
||||
"response": "Artificial intelligence is the simulation of human intelligence in machines."
|
||||
},
|
||||
"AgentRequest": {
|
||||
"question": "What is machine learning?",
|
||||
"plan": "",
|
||||
"state": "",
|
||||
"history": []
|
||||
},
|
||||
"AgentResponse": {
|
||||
"answer": "Machine learning is a subset of AI.",
|
||||
"error": None,
|
||||
"thought": "I need to provide information about machine learning.",
|
||||
"observation": None
|
||||
},
|
||||
"Metadata": {
|
||||
"id": "test-doc-123",
|
||||
"user": "test_user",
|
||||
"collection": "test_collection",
|
||||
"metadata": []
|
||||
},
|
||||
"Value": {
|
||||
"value": "http://example.com/entity",
|
||||
"is_uri": True,
|
||||
"type": ""
|
||||
},
|
||||
"Triple": {
|
||||
"s": Value(
|
||||
value="http://example.com/subject",
|
||||
is_uri=True,
|
||||
type=""
|
||||
),
|
||||
"p": Value(
|
||||
value="http://example.com/predicate",
|
||||
is_uri=True,
|
||||
type=""
|
||||
),
|
||||
"o": Value(
|
||||
value="Object value",
|
||||
is_uri=False,
|
||||
type=""
|
||||
)
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def invalid_message_data():
|
||||
"""Invalid message data for contract validation testing"""
|
||||
return {
|
||||
"TextCompletionRequest": [
|
||||
{"system": None, "prompt": "test"}, # Invalid system (None)
|
||||
{"system": "test", "prompt": None}, # Invalid prompt (None)
|
||||
{"system": 123, "prompt": "test"}, # Invalid system (not string)
|
||||
{}, # Missing required fields
|
||||
],
|
||||
"DocumentRagQuery": [
|
||||
{"query": None, "user": "test", "collection": "test", "doc_limit": 10}, # Invalid query
|
||||
{"query": "test", "user": None, "collection": "test", "doc_limit": 10}, # Invalid user
|
||||
{"query": "test", "user": "test", "collection": "test", "doc_limit": -1}, # Invalid doc_limit
|
||||
{"query": "test"}, # Missing required fields
|
||||
],
|
||||
"Value": [
|
||||
{"value": None, "is_uri": True, "type": ""}, # Invalid value (None)
|
||||
{"value": "test", "is_uri": "not_boolean", "type": ""}, # Invalid is_uri
|
||||
{"value": 123, "is_uri": True, "type": ""}, # Invalid value (not string)
|
||||
]
|
||||
}
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def message_properties():
|
||||
"""Standard message properties for contract testing"""
|
||||
return {
|
||||
"id": "test-message-123",
|
||||
"routing_key": "test.routing.key",
|
||||
"timestamp": "2024-01-01T00:00:00Z",
|
||||
"source_service": "test-service",
|
||||
"correlation_id": "correlation-123"
|
||||
}
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def schema_evolution_data():
|
||||
"""Data for testing schema evolution and backward compatibility"""
|
||||
return {
|
||||
"TextCompletionRequest_v1": {
|
||||
"system": "You are helpful.",
|
||||
"prompt": "Test prompt"
|
||||
},
|
||||
"TextCompletionRequest_v2": {
|
||||
"system": "You are helpful.",
|
||||
"prompt": "Test prompt",
|
||||
"temperature": 0.7, # New field
|
||||
"max_tokens": 100 # New field
|
||||
},
|
||||
"TextCompletionResponse_v1": {
|
||||
"error": None,
|
||||
"response": "Test response",
|
||||
"model": "gpt-3.5-turbo"
|
||||
},
|
||||
"TextCompletionResponse_v2": {
|
||||
"error": None,
|
||||
"response": "Test response",
|
||||
"in_token": 50, # New field
|
||||
"out_token": 100, # New field
|
||||
"model": "gpt-3.5-turbo"
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
def validate_schema_contract(schema_class: Type[Record], data: Dict[str, Any]) -> bool:
|
||||
"""Helper function to validate schema contracts"""
|
||||
try:
|
||||
# Create instance from data
|
||||
instance = schema_class(**data)
|
||||
|
||||
# Verify all fields are accessible
|
||||
for field_name in data.keys():
|
||||
assert hasattr(instance, field_name)
|
||||
assert getattr(instance, field_name) == data[field_name]
|
||||
|
||||
return True
|
||||
except Exception:
|
||||
return False
|
||||
|
||||
|
||||
def serialize_deserialize_test(schema_class: Type[Record], data: Dict[str, Any]) -> bool:
|
||||
"""Helper function to test serialization/deserialization"""
|
||||
try:
|
||||
# Create instance
|
||||
instance = schema_class(**data)
|
||||
|
||||
# This would test actual Pulsar serialization if we had the client
|
||||
# For now, we test the schema construction and field access
|
||||
for field_name, field_value in data.items():
|
||||
assert getattr(instance, field_name) == field_value
|
||||
|
||||
return True
|
||||
except Exception:
|
||||
return False
|
||||
|
||||
|
||||
# Test markers for contract tests
|
||||
pytestmark = pytest.mark.contract
|
||||
614
tests/contract/test_message_contracts.py
Normal file
614
tests/contract/test_message_contracts.py
Normal file
|
|
@ -0,0 +1,614 @@
|
|||
"""
|
||||
Contract tests for Pulsar Message Schemas
|
||||
|
||||
These tests verify the contracts for all Pulsar message schemas used in TrustGraph,
|
||||
ensuring schema compatibility, serialization contracts, and service interface stability.
|
||||
Following the TEST_STRATEGY.md approach for contract testing.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
import json
|
||||
from typing import Dict, Any, Type
|
||||
from pulsar.schema import Record
|
||||
|
||||
from trustgraph.schema import (
|
||||
TextCompletionRequest, TextCompletionResponse,
|
||||
DocumentRagQuery, DocumentRagResponse,
|
||||
AgentRequest, AgentResponse, AgentStep,
|
||||
Chunk, Triple, Triples, Value, Error,
|
||||
EntityContext, EntityContexts,
|
||||
GraphEmbeddings, EntityEmbeddings,
|
||||
Metadata, Field, RowSchema,
|
||||
StructuredDataSubmission, ExtractedObject,
|
||||
NLPToStructuredQueryRequest, NLPToStructuredQueryResponse,
|
||||
StructuredQueryRequest, StructuredQueryResponse,
|
||||
StructuredObjectEmbedding
|
||||
)
|
||||
from .conftest import validate_schema_contract, serialize_deserialize_test
|
||||
|
||||
|
||||
@pytest.mark.contract
|
||||
class TestTextCompletionMessageContracts:
|
||||
"""Contract tests for Text Completion message schemas"""
|
||||
|
||||
def test_text_completion_request_schema_contract(self, sample_message_data):
|
||||
"""Test TextCompletionRequest schema contract"""
|
||||
# Arrange
|
||||
request_data = sample_message_data["TextCompletionRequest"]
|
||||
|
||||
# Act & Assert
|
||||
assert validate_schema_contract(TextCompletionRequest, request_data)
|
||||
|
||||
# Test required fields
|
||||
request = TextCompletionRequest(**request_data)
|
||||
assert hasattr(request, 'system')
|
||||
assert hasattr(request, 'prompt')
|
||||
assert isinstance(request.system, str)
|
||||
assert isinstance(request.prompt, str)
|
||||
|
||||
def test_text_completion_response_schema_contract(self, sample_message_data):
|
||||
"""Test TextCompletionResponse schema contract"""
|
||||
# Arrange
|
||||
response_data = sample_message_data["TextCompletionResponse"]
|
||||
|
||||
# Act & Assert
|
||||
assert validate_schema_contract(TextCompletionResponse, response_data)
|
||||
|
||||
# Test required fields
|
||||
response = TextCompletionResponse(**response_data)
|
||||
assert hasattr(response, 'error')
|
||||
assert hasattr(response, 'response')
|
||||
assert hasattr(response, 'in_token')
|
||||
assert hasattr(response, 'out_token')
|
||||
assert hasattr(response, 'model')
|
||||
|
||||
def test_text_completion_request_serialization_contract(self, sample_message_data):
|
||||
"""Test TextCompletionRequest serialization/deserialization contract"""
|
||||
# Arrange
|
||||
request_data = sample_message_data["TextCompletionRequest"]
|
||||
|
||||
# Act & Assert
|
||||
assert serialize_deserialize_test(TextCompletionRequest, request_data)
|
||||
|
||||
def test_text_completion_response_serialization_contract(self, sample_message_data):
|
||||
"""Test TextCompletionResponse serialization/deserialization contract"""
|
||||
# Arrange
|
||||
response_data = sample_message_data["TextCompletionResponse"]
|
||||
|
||||
# Act & Assert
|
||||
assert serialize_deserialize_test(TextCompletionResponse, response_data)
|
||||
|
||||
def test_text_completion_request_field_constraints(self):
|
||||
"""Test TextCompletionRequest field type constraints"""
|
||||
# Test valid data
|
||||
valid_request = TextCompletionRequest(
|
||||
system="You are helpful.",
|
||||
prompt="Test prompt"
|
||||
)
|
||||
assert valid_request.system == "You are helpful."
|
||||
assert valid_request.prompt == "Test prompt"
|
||||
|
||||
def test_text_completion_response_field_constraints(self):
|
||||
"""Test TextCompletionResponse field type constraints"""
|
||||
# Test valid response with no error
|
||||
valid_response = TextCompletionResponse(
|
||||
error=None,
|
||||
response="Test response",
|
||||
in_token=50,
|
||||
out_token=100,
|
||||
model="gpt-3.5-turbo"
|
||||
)
|
||||
assert valid_response.error is None
|
||||
assert valid_response.response == "Test response"
|
||||
assert valid_response.in_token == 50
|
||||
assert valid_response.out_token == 100
|
||||
assert valid_response.model == "gpt-3.5-turbo"
|
||||
|
||||
# Test response with error
|
||||
error_response = TextCompletionResponse(
|
||||
error=Error(type="rate-limit", message="Rate limit exceeded"),
|
||||
response=None,
|
||||
in_token=None,
|
||||
out_token=None,
|
||||
model=None
|
||||
)
|
||||
assert error_response.error is not None
|
||||
assert error_response.error.type == "rate-limit"
|
||||
assert error_response.response is None
|
||||
|
||||
|
||||
@pytest.mark.contract
|
||||
class TestDocumentRagMessageContracts:
|
||||
"""Contract tests for Document RAG message schemas"""
|
||||
|
||||
def test_document_rag_query_schema_contract(self, sample_message_data):
|
||||
"""Test DocumentRagQuery schema contract"""
|
||||
# Arrange
|
||||
query_data = sample_message_data["DocumentRagQuery"]
|
||||
|
||||
# Act & Assert
|
||||
assert validate_schema_contract(DocumentRagQuery, query_data)
|
||||
|
||||
# Test required fields
|
||||
query = DocumentRagQuery(**query_data)
|
||||
assert hasattr(query, 'query')
|
||||
assert hasattr(query, 'user')
|
||||
assert hasattr(query, 'collection')
|
||||
assert hasattr(query, 'doc_limit')
|
||||
|
||||
def test_document_rag_response_schema_contract(self, sample_message_data):
|
||||
"""Test DocumentRagResponse schema contract"""
|
||||
# Arrange
|
||||
response_data = sample_message_data["DocumentRagResponse"]
|
||||
|
||||
# Act & Assert
|
||||
assert validate_schema_contract(DocumentRagResponse, response_data)
|
||||
|
||||
# Test required fields
|
||||
response = DocumentRagResponse(**response_data)
|
||||
assert hasattr(response, 'error')
|
||||
assert hasattr(response, 'response')
|
||||
|
||||
def test_document_rag_query_field_constraints(self):
|
||||
"""Test DocumentRagQuery field constraints"""
|
||||
# Test valid query
|
||||
valid_query = DocumentRagQuery(
|
||||
query="What is AI?",
|
||||
user="test_user",
|
||||
collection="test_collection",
|
||||
doc_limit=5
|
||||
)
|
||||
assert valid_query.query == "What is AI?"
|
||||
assert valid_query.user == "test_user"
|
||||
assert valid_query.collection == "test_collection"
|
||||
assert valid_query.doc_limit == 5
|
||||
|
||||
def test_document_rag_response_error_contract(self):
|
||||
"""Test DocumentRagResponse error handling contract"""
|
||||
# Test successful response
|
||||
success_response = DocumentRagResponse(
|
||||
error=None,
|
||||
response="AI is artificial intelligence."
|
||||
)
|
||||
assert success_response.error is None
|
||||
assert success_response.response == "AI is artificial intelligence."
|
||||
|
||||
# Test error response
|
||||
error_response = DocumentRagResponse(
|
||||
error=Error(type="no-documents", message="No documents found"),
|
||||
response=None
|
||||
)
|
||||
assert error_response.error is not None
|
||||
assert error_response.error.type == "no-documents"
|
||||
assert error_response.response is None
|
||||
|
||||
|
||||
@pytest.mark.contract
|
||||
class TestAgentMessageContracts:
|
||||
"""Contract tests for Agent message schemas"""
|
||||
|
||||
def test_agent_request_schema_contract(self, sample_message_data):
|
||||
"""Test AgentRequest schema contract"""
|
||||
# Arrange
|
||||
request_data = sample_message_data["AgentRequest"]
|
||||
|
||||
# Act & Assert
|
||||
assert validate_schema_contract(AgentRequest, request_data)
|
||||
|
||||
# Test required fields
|
||||
request = AgentRequest(**request_data)
|
||||
assert hasattr(request, 'question')
|
||||
assert hasattr(request, 'plan')
|
||||
assert hasattr(request, 'state')
|
||||
assert hasattr(request, 'history')
|
||||
|
||||
def test_agent_response_schema_contract(self, sample_message_data):
|
||||
"""Test AgentResponse schema contract"""
|
||||
# Arrange
|
||||
response_data = sample_message_data["AgentResponse"]
|
||||
|
||||
# Act & Assert
|
||||
assert validate_schema_contract(AgentResponse, response_data)
|
||||
|
||||
# Test required fields
|
||||
response = AgentResponse(**response_data)
|
||||
assert hasattr(response, 'answer')
|
||||
assert hasattr(response, 'error')
|
||||
assert hasattr(response, 'thought')
|
||||
assert hasattr(response, 'observation')
|
||||
|
||||
def test_agent_step_schema_contract(self):
|
||||
"""Test AgentStep schema contract"""
|
||||
# Arrange
|
||||
step_data = {
|
||||
"thought": "I need to search for information",
|
||||
"action": "knowledge_query",
|
||||
"arguments": {"question": "What is AI?"},
|
||||
"observation": "AI is artificial intelligence"
|
||||
}
|
||||
|
||||
# Act & Assert
|
||||
assert validate_schema_contract(AgentStep, step_data)
|
||||
|
||||
step = AgentStep(**step_data)
|
||||
assert step.thought == "I need to search for information"
|
||||
assert step.action == "knowledge_query"
|
||||
assert step.arguments == {"question": "What is AI?"}
|
||||
assert step.observation == "AI is artificial intelligence"
|
||||
|
||||
def test_agent_request_with_history_contract(self):
|
||||
"""Test AgentRequest with conversation history contract"""
|
||||
# Arrange
|
||||
history_steps = [
|
||||
AgentStep(
|
||||
thought="First thought",
|
||||
action="first_action",
|
||||
arguments={"param": "value"},
|
||||
observation="First observation"
|
||||
),
|
||||
AgentStep(
|
||||
thought="Second thought",
|
||||
action="second_action",
|
||||
arguments={"param2": "value2"},
|
||||
observation="Second observation"
|
||||
)
|
||||
]
|
||||
|
||||
# Act
|
||||
request = AgentRequest(
|
||||
question="What comes next?",
|
||||
plan="Multi-step plan",
|
||||
state="processing",
|
||||
history=history_steps
|
||||
)
|
||||
|
||||
# Assert
|
||||
assert len(request.history) == 2
|
||||
assert request.history[0].thought == "First thought"
|
||||
assert request.history[1].action == "second_action"
|
||||
|
||||
|
||||
@pytest.mark.contract
|
||||
class TestGraphMessageContracts:
|
||||
"""Contract tests for Graph/Knowledge message schemas"""
|
||||
|
||||
def test_value_schema_contract(self, sample_message_data):
|
||||
"""Test Value schema contract"""
|
||||
# Arrange
|
||||
value_data = sample_message_data["Value"]
|
||||
|
||||
# Act & Assert
|
||||
assert validate_schema_contract(Value, value_data)
|
||||
|
||||
# Test URI value
|
||||
uri_value = Value(**value_data)
|
||||
assert uri_value.value == "http://example.com/entity"
|
||||
assert uri_value.is_uri is True
|
||||
|
||||
# Test literal value
|
||||
literal_value = Value(
|
||||
value="Literal text value",
|
||||
is_uri=False,
|
||||
type=""
|
||||
)
|
||||
assert literal_value.value == "Literal text value"
|
||||
assert literal_value.is_uri is False
|
||||
|
||||
def test_triple_schema_contract(self, sample_message_data):
|
||||
"""Test Triple schema contract"""
|
||||
# Arrange
|
||||
triple_data = sample_message_data["Triple"]
|
||||
|
||||
# Act & Assert - Triple uses Value objects, not dict validation
|
||||
triple = Triple(
|
||||
s=triple_data["s"],
|
||||
p=triple_data["p"],
|
||||
o=triple_data["o"]
|
||||
)
|
||||
assert triple.s.value == "http://example.com/subject"
|
||||
assert triple.p.value == "http://example.com/predicate"
|
||||
assert triple.o.value == "Object value"
|
||||
assert triple.s.is_uri is True
|
||||
assert triple.p.is_uri is True
|
||||
assert triple.o.is_uri is False
|
||||
|
||||
def test_triples_schema_contract(self, sample_message_data):
|
||||
"""Test Triples (batch) schema contract"""
|
||||
# Arrange
|
||||
metadata = Metadata(**sample_message_data["Metadata"])
|
||||
triple = Triple(**sample_message_data["Triple"])
|
||||
|
||||
triples_data = {
|
||||
"metadata": metadata,
|
||||
"triples": [triple]
|
||||
}
|
||||
|
||||
# Act & Assert
|
||||
assert validate_schema_contract(Triples, triples_data)
|
||||
|
||||
triples = Triples(**triples_data)
|
||||
assert triples.metadata.id == "test-doc-123"
|
||||
assert len(triples.triples) == 1
|
||||
assert triples.triples[0].s.value == "http://example.com/subject"
|
||||
|
||||
def test_chunk_schema_contract(self, sample_message_data):
|
||||
"""Test Chunk schema contract"""
|
||||
# Arrange
|
||||
metadata = Metadata(**sample_message_data["Metadata"])
|
||||
chunk_data = {
|
||||
"metadata": metadata,
|
||||
"chunk": b"This is a text chunk for processing"
|
||||
}
|
||||
|
||||
# Act & Assert
|
||||
assert validate_schema_contract(Chunk, chunk_data)
|
||||
|
||||
chunk = Chunk(**chunk_data)
|
||||
assert chunk.metadata.id == "test-doc-123"
|
||||
assert chunk.chunk == b"This is a text chunk for processing"
|
||||
|
||||
def test_entity_context_schema_contract(self):
|
||||
"""Test EntityContext schema contract"""
|
||||
# Arrange
|
||||
entity_value = Value(value="http://example.com/entity", is_uri=True, type="")
|
||||
entity_context_data = {
|
||||
"entity": entity_value,
|
||||
"context": "Context information about the entity"
|
||||
}
|
||||
|
||||
# Act & Assert
|
||||
assert validate_schema_contract(EntityContext, entity_context_data)
|
||||
|
||||
entity_context = EntityContext(**entity_context_data)
|
||||
assert entity_context.entity.value == "http://example.com/entity"
|
||||
assert entity_context.context == "Context information about the entity"
|
||||
|
||||
def test_entity_contexts_batch_schema_contract(self, sample_message_data):
|
||||
"""Test EntityContexts (batch) schema contract"""
|
||||
# Arrange
|
||||
metadata = Metadata(**sample_message_data["Metadata"])
|
||||
entity_value = Value(value="http://example.com/entity", is_uri=True, type="")
|
||||
entity_context = EntityContext(
|
||||
entity=entity_value,
|
||||
context="Entity context"
|
||||
)
|
||||
|
||||
entity_contexts_data = {
|
||||
"metadata": metadata,
|
||||
"entities": [entity_context]
|
||||
}
|
||||
|
||||
# Act & Assert
|
||||
assert validate_schema_contract(EntityContexts, entity_contexts_data)
|
||||
|
||||
entity_contexts = EntityContexts(**entity_contexts_data)
|
||||
assert entity_contexts.metadata.id == "test-doc-123"
|
||||
assert len(entity_contexts.entities) == 1
|
||||
assert entity_contexts.entities[0].context == "Entity context"
|
||||
|
||||
|
||||
@pytest.mark.contract
|
||||
class TestMetadataMessageContracts:
|
||||
"""Contract tests for Metadata and common message schemas"""
|
||||
|
||||
def test_metadata_schema_contract(self, sample_message_data):
|
||||
"""Test Metadata schema contract"""
|
||||
# Arrange
|
||||
metadata_data = sample_message_data["Metadata"]
|
||||
|
||||
# Act & Assert
|
||||
assert validate_schema_contract(Metadata, metadata_data)
|
||||
|
||||
metadata = Metadata(**metadata_data)
|
||||
assert metadata.id == "test-doc-123"
|
||||
assert metadata.user == "test_user"
|
||||
assert metadata.collection == "test_collection"
|
||||
assert isinstance(metadata.metadata, list)
|
||||
|
||||
def test_metadata_with_triples_contract(self, sample_message_data):
|
||||
"""Test Metadata with embedded triples contract"""
|
||||
# Arrange
|
||||
triple = Triple(**sample_message_data["Triple"])
|
||||
metadata_data = {
|
||||
"id": "doc-with-triples",
|
||||
"user": "test_user",
|
||||
"collection": "test_collection",
|
||||
"metadata": [triple]
|
||||
}
|
||||
|
||||
# Act & Assert
|
||||
assert validate_schema_contract(Metadata, metadata_data)
|
||||
|
||||
metadata = Metadata(**metadata_data)
|
||||
assert len(metadata.metadata) == 1
|
||||
assert metadata.metadata[0].s.value == "http://example.com/subject"
|
||||
|
||||
def test_error_schema_contract(self):
|
||||
"""Test Error schema contract"""
|
||||
# Arrange
|
||||
error_data = {
|
||||
"type": "validation-error",
|
||||
"message": "Invalid input data provided"
|
||||
}
|
||||
|
||||
# Act & Assert
|
||||
assert validate_schema_contract(Error, error_data)
|
||||
|
||||
error = Error(**error_data)
|
||||
assert error.type == "validation-error"
|
||||
assert error.message == "Invalid input data provided"
|
||||
|
||||
|
||||
@pytest.mark.contract
|
||||
class TestMessageRoutingContracts:
|
||||
"""Contract tests for message routing and properties"""
|
||||
|
||||
def test_message_property_contracts(self, message_properties):
|
||||
"""Test standard message property contracts"""
|
||||
# Act & Assert
|
||||
required_properties = ["id", "routing_key", "timestamp", "source_service"]
|
||||
|
||||
for prop in required_properties:
|
||||
assert prop in message_properties
|
||||
assert message_properties[prop] is not None
|
||||
assert isinstance(message_properties[prop], str)
|
||||
|
||||
def test_message_id_format_contract(self, message_properties):
|
||||
"""Test message ID format contract"""
|
||||
# Act & Assert
|
||||
message_id = message_properties["id"]
|
||||
assert isinstance(message_id, str)
|
||||
assert len(message_id) > 0
|
||||
# Message IDs should follow a consistent format
|
||||
assert "test-message-" in message_id
|
||||
|
||||
def test_routing_key_format_contract(self, message_properties):
|
||||
"""Test routing key format contract"""
|
||||
# Act & Assert
|
||||
routing_key = message_properties["routing_key"]
|
||||
assert isinstance(routing_key, str)
|
||||
assert "." in routing_key # Should use dot notation
|
||||
assert routing_key.count(".") >= 2 # Should have at least 3 parts
|
||||
|
||||
def test_correlation_id_contract(self, message_properties):
|
||||
"""Test correlation ID contract for request/response tracking"""
|
||||
# Act & Assert
|
||||
correlation_id = message_properties.get("correlation_id")
|
||||
if correlation_id is not None:
|
||||
assert isinstance(correlation_id, str)
|
||||
assert len(correlation_id) > 0
|
||||
|
||||
|
||||
@pytest.mark.contract
|
||||
class TestSchemaEvolutionContracts:
|
||||
"""Contract tests for schema evolution and backward compatibility"""
|
||||
|
||||
def test_schema_backward_compatibility(self, schema_evolution_data):
|
||||
"""Test schema backward compatibility"""
|
||||
# Test that v1 data can still be processed
|
||||
v1_request = schema_evolution_data["TextCompletionRequest_v1"]
|
||||
|
||||
# Should work with current schema (optional fields default)
|
||||
request = TextCompletionRequest(**v1_request)
|
||||
assert request.system == "You are helpful."
|
||||
assert request.prompt == "Test prompt"
|
||||
|
||||
def test_schema_forward_compatibility(self, schema_evolution_data):
|
||||
"""Test schema forward compatibility with new fields"""
|
||||
# Test that v2 data works with additional fields
|
||||
v2_request = schema_evolution_data["TextCompletionRequest_v2"]
|
||||
|
||||
# Current schema should handle new fields gracefully
|
||||
# (This would require actual schema versioning implementation)
|
||||
base_fields = {"system": v2_request["system"], "prompt": v2_request["prompt"]}
|
||||
request = TextCompletionRequest(**base_fields)
|
||||
assert request.system == "You are helpful."
|
||||
assert request.prompt == "Test prompt"
|
||||
|
||||
def test_required_field_stability_contract(self):
|
||||
"""Test that required fields remain stable across versions"""
|
||||
# These fields should never become optional or be removed
|
||||
required_fields = {
|
||||
"TextCompletionRequest": ["system", "prompt"],
|
||||
"TextCompletionResponse": ["error", "response", "model"],
|
||||
"DocumentRagQuery": ["query", "user", "collection"],
|
||||
"DocumentRagResponse": ["error", "response"],
|
||||
"AgentRequest": ["question", "history"],
|
||||
"AgentResponse": ["error"],
|
||||
}
|
||||
|
||||
# Verify required fields are present in schema definitions
|
||||
for schema_name, fields in required_fields.items():
|
||||
# This would be implemented with actual schema introspection
|
||||
# For now, we verify by attempting to create instances
|
||||
assert len(fields) > 0 # Ensure we have defined required fields
|
||||
|
||||
|
||||
@pytest.mark.contract
|
||||
class TestSerializationContracts:
|
||||
"""Contract tests for message serialization/deserialization"""
|
||||
|
||||
def test_all_schemas_serialization_contract(self, schema_registry, sample_message_data):
|
||||
"""Test serialization contract for all schemas"""
|
||||
# Test each schema in the registry
|
||||
for schema_name, schema_class in schema_registry.items():
|
||||
if schema_name in sample_message_data:
|
||||
# Skip Triple schema as it requires special handling with Value objects
|
||||
if schema_name == "Triple":
|
||||
continue
|
||||
|
||||
# Act & Assert
|
||||
data = sample_message_data[schema_name]
|
||||
assert serialize_deserialize_test(schema_class, data), f"Serialization failed for {schema_name}"
|
||||
|
||||
def test_triple_serialization_contract(self, sample_message_data):
|
||||
"""Test Triple schema serialization contract with Value objects"""
|
||||
# Arrange
|
||||
triple_data = sample_message_data["Triple"]
|
||||
|
||||
# Act
|
||||
triple = Triple(
|
||||
s=triple_data["s"],
|
||||
p=triple_data["p"],
|
||||
o=triple_data["o"]
|
||||
)
|
||||
|
||||
# Assert - Test that Value objects are properly constructed and accessible
|
||||
assert triple.s.value == "http://example.com/subject"
|
||||
assert triple.p.value == "http://example.com/predicate"
|
||||
assert triple.o.value == "Object value"
|
||||
assert isinstance(triple.s, Value)
|
||||
assert isinstance(triple.p, Value)
|
||||
assert isinstance(triple.o, Value)
|
||||
|
||||
def test_nested_schema_serialization_contract(self, sample_message_data):
|
||||
"""Test serialization of nested schemas"""
|
||||
# Test Triples (contains Metadata and Triple objects)
|
||||
metadata = Metadata(**sample_message_data["Metadata"])
|
||||
triple = Triple(**sample_message_data["Triple"])
|
||||
|
||||
triples = Triples(metadata=metadata, triples=[triple])
|
||||
|
||||
# Verify nested objects maintain their contracts
|
||||
assert triples.metadata.id == "test-doc-123"
|
||||
assert triples.triples[0].s.value == "http://example.com/subject"
|
||||
|
||||
def test_array_field_serialization_contract(self):
|
||||
"""Test serialization of array fields"""
|
||||
# Test AgentRequest with history array
|
||||
steps = [
|
||||
AgentStep(
|
||||
thought=f"Step {i}",
|
||||
action=f"action_{i}",
|
||||
arguments={f"param_{i}": f"value_{i}"},
|
||||
observation=f"Observation {i}"
|
||||
)
|
||||
for i in range(3)
|
||||
]
|
||||
|
||||
request = AgentRequest(
|
||||
question="Test with array",
|
||||
plan="Test plan",
|
||||
state="Test state",
|
||||
history=steps
|
||||
)
|
||||
|
||||
# Verify array serialization maintains order and content
|
||||
assert len(request.history) == 3
|
||||
assert request.history[0].thought == "Step 0"
|
||||
assert request.history[2].action == "action_2"
|
||||
|
||||
def test_optional_field_serialization_contract(self):
|
||||
"""Test serialization contract for optional fields"""
|
||||
# Test with minimal required fields
|
||||
minimal_response = TextCompletionResponse(
|
||||
error=None,
|
||||
response="Test",
|
||||
in_token=None, # Optional field
|
||||
out_token=None, # Optional field
|
||||
model="test-model"
|
||||
)
|
||||
|
||||
assert minimal_response.response == "Test"
|
||||
assert minimal_response.in_token is None
|
||||
assert minimal_response.out_token is None
|
||||
306
tests/contract/test_objects_cassandra_contracts.py
Normal file
306
tests/contract/test_objects_cassandra_contracts.py
Normal file
|
|
@ -0,0 +1,306 @@
|
|||
"""
|
||||
Contract tests for Cassandra Object Storage
|
||||
|
||||
These tests verify the message contracts and schema compatibility
|
||||
for the objects storage processor.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
import json
|
||||
from pulsar.schema import AvroSchema
|
||||
|
||||
from trustgraph.schema import ExtractedObject, Metadata, RowSchema, Field
|
||||
from trustgraph.storage.objects.cassandra.write import Processor
|
||||
|
||||
|
||||
@pytest.mark.contract
|
||||
class TestObjectsCassandraContracts:
|
||||
"""Contract tests for Cassandra object storage messages"""
|
||||
|
||||
def test_extracted_object_input_contract(self):
|
||||
"""Test that ExtractedObject schema matches expected input format"""
|
||||
# Create test object with all required fields
|
||||
test_metadata = Metadata(
|
||||
id="test-doc-001",
|
||||
user="test_user",
|
||||
collection="test_collection",
|
||||
metadata=[]
|
||||
)
|
||||
|
||||
test_object = ExtractedObject(
|
||||
metadata=test_metadata,
|
||||
schema_name="customer_records",
|
||||
values={
|
||||
"customer_id": "CUST123",
|
||||
"name": "Test Customer",
|
||||
"email": "test@example.com"
|
||||
},
|
||||
confidence=0.95,
|
||||
source_span="Customer data from document..."
|
||||
)
|
||||
|
||||
# Verify all required fields are present
|
||||
assert hasattr(test_object, 'metadata')
|
||||
assert hasattr(test_object, 'schema_name')
|
||||
assert hasattr(test_object, 'values')
|
||||
assert hasattr(test_object, 'confidence')
|
||||
assert hasattr(test_object, 'source_span')
|
||||
|
||||
# Verify metadata structure
|
||||
assert hasattr(test_object.metadata, 'id')
|
||||
assert hasattr(test_object.metadata, 'user')
|
||||
assert hasattr(test_object.metadata, 'collection')
|
||||
assert hasattr(test_object.metadata, 'metadata')
|
||||
|
||||
# Verify types
|
||||
assert isinstance(test_object.schema_name, str)
|
||||
assert isinstance(test_object.values, dict)
|
||||
assert isinstance(test_object.confidence, float)
|
||||
assert isinstance(test_object.source_span, str)
|
||||
|
||||
def test_row_schema_structure_contract(self):
|
||||
"""Test RowSchema structure used for table definitions"""
|
||||
# Create test schema
|
||||
test_fields = [
|
||||
Field(
|
||||
name="id",
|
||||
type="string",
|
||||
size=50,
|
||||
primary=True,
|
||||
description="Primary key",
|
||||
required=True,
|
||||
enum_values=[],
|
||||
indexed=False
|
||||
),
|
||||
Field(
|
||||
name="status",
|
||||
type="string",
|
||||
size=20,
|
||||
primary=False,
|
||||
description="Status field",
|
||||
required=False,
|
||||
enum_values=["active", "inactive", "pending"],
|
||||
indexed=True
|
||||
)
|
||||
]
|
||||
|
||||
test_schema = RowSchema(
|
||||
name="test_table",
|
||||
description="Test table schema",
|
||||
fields=test_fields
|
||||
)
|
||||
|
||||
# Verify schema structure
|
||||
assert hasattr(test_schema, 'name')
|
||||
assert hasattr(test_schema, 'description')
|
||||
assert hasattr(test_schema, 'fields')
|
||||
assert isinstance(test_schema.fields, list)
|
||||
|
||||
# Verify field structure
|
||||
for field in test_schema.fields:
|
||||
assert hasattr(field, 'name')
|
||||
assert hasattr(field, 'type')
|
||||
assert hasattr(field, 'size')
|
||||
assert hasattr(field, 'primary')
|
||||
assert hasattr(field, 'description')
|
||||
assert hasattr(field, 'required')
|
||||
assert hasattr(field, 'enum_values')
|
||||
assert hasattr(field, 'indexed')
|
||||
|
||||
def test_schema_config_format_contract(self):
|
||||
"""Test the expected configuration format for schemas"""
|
||||
# Define expected config structure
|
||||
config_format = {
|
||||
"schema": {
|
||||
"table_name": json.dumps({
|
||||
"name": "table_name",
|
||||
"description": "Table description",
|
||||
"fields": [
|
||||
{
|
||||
"name": "field_name",
|
||||
"type": "string",
|
||||
"size": 0,
|
||||
"primary_key": True,
|
||||
"description": "Field description",
|
||||
"required": True,
|
||||
"enum": [],
|
||||
"indexed": False
|
||||
}
|
||||
]
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
# Verify config can be parsed
|
||||
schema_json = json.loads(config_format["schema"]["table_name"])
|
||||
assert "name" in schema_json
|
||||
assert "fields" in schema_json
|
||||
assert isinstance(schema_json["fields"], list)
|
||||
|
||||
# Verify field format
|
||||
field = schema_json["fields"][0]
|
||||
required_field_keys = {"name", "type"}
|
||||
optional_field_keys = {"size", "primary_key", "description", "required", "enum", "indexed"}
|
||||
|
||||
assert required_field_keys.issubset(field.keys())
|
||||
assert set(field.keys()).issubset(required_field_keys | optional_field_keys)
|
||||
|
||||
def test_cassandra_type_mapping_contract(self):
|
||||
"""Test that all supported field types have Cassandra mappings"""
|
||||
processor = Processor.__new__(Processor)
|
||||
|
||||
# All field types that should be supported
|
||||
supported_types = [
|
||||
("string", "text"),
|
||||
("integer", "int"), # or bigint based on size
|
||||
("float", "float"), # or double based on size
|
||||
("boolean", "boolean"),
|
||||
("timestamp", "timestamp"),
|
||||
("date", "date"),
|
||||
("time", "time"),
|
||||
("uuid", "uuid")
|
||||
]
|
||||
|
||||
for field_type, expected_cassandra_type in supported_types:
|
||||
cassandra_type = processor.get_cassandra_type(field_type)
|
||||
# For integer and float, the exact type depends on size
|
||||
if field_type in ["integer", "float"]:
|
||||
assert cassandra_type in ["int", "bigint", "float", "double"]
|
||||
else:
|
||||
assert cassandra_type == expected_cassandra_type
|
||||
|
||||
def test_value_conversion_contract(self):
|
||||
"""Test value conversion for all supported types"""
|
||||
processor = Processor.__new__(Processor)
|
||||
|
||||
# Test conversions maintain data integrity
|
||||
test_cases = [
|
||||
# (input_value, field_type, expected_output, expected_type)
|
||||
("123", "integer", 123, int),
|
||||
("123.45", "float", 123.45, float),
|
||||
("true", "boolean", True, bool),
|
||||
("false", "boolean", False, bool),
|
||||
("test string", "string", "test string", str),
|
||||
(None, "string", None, type(None)),
|
||||
]
|
||||
|
||||
for input_val, field_type, expected_val, expected_type in test_cases:
|
||||
result = processor.convert_value(input_val, field_type)
|
||||
assert result == expected_val
|
||||
assert isinstance(result, expected_type) or result is None
|
||||
|
||||
def test_extracted_object_serialization_contract(self):
|
||||
"""Test that ExtractedObject can be serialized/deserialized correctly"""
|
||||
# Create test object
|
||||
original = ExtractedObject(
|
||||
metadata=Metadata(
|
||||
id="serial-001",
|
||||
user="test_user",
|
||||
collection="test_coll",
|
||||
metadata=[]
|
||||
),
|
||||
schema_name="test_schema",
|
||||
values={"field1": "value1", "field2": "123"},
|
||||
confidence=0.85,
|
||||
source_span="Test span"
|
||||
)
|
||||
|
||||
# Test serialization using schema
|
||||
schema = AvroSchema(ExtractedObject)
|
||||
|
||||
# Encode and decode
|
||||
encoded = schema.encode(original)
|
||||
decoded = schema.decode(encoded)
|
||||
|
||||
# Verify round-trip
|
||||
assert decoded.metadata.id == original.metadata.id
|
||||
assert decoded.metadata.user == original.metadata.user
|
||||
assert decoded.metadata.collection == original.metadata.collection
|
||||
assert decoded.schema_name == original.schema_name
|
||||
assert decoded.values == original.values
|
||||
assert decoded.confidence == original.confidence
|
||||
assert decoded.source_span == original.source_span
|
||||
|
||||
def test_cassandra_table_naming_contract(self):
|
||||
"""Test Cassandra naming conventions and constraints"""
|
||||
processor = Processor.__new__(Processor)
|
||||
|
||||
# Test table naming (always gets o_ prefix)
|
||||
table_test_names = [
|
||||
("simple_name", "o_simple_name"),
|
||||
("Name-With-Dashes", "o_name_with_dashes"),
|
||||
("name.with.dots", "o_name_with_dots"),
|
||||
("123_numbers", "o_123_numbers"),
|
||||
("special!@#chars", "o_special___chars"), # 3 special chars become 3 underscores
|
||||
("UPPERCASE", "o_uppercase"),
|
||||
("CamelCase", "o_camelcase"),
|
||||
("", "o_"), # Edge case - empty string becomes o_
|
||||
]
|
||||
|
||||
for input_name, expected_name in table_test_names:
|
||||
result = processor.sanitize_table(input_name)
|
||||
assert result == expected_name
|
||||
# Verify result is valid Cassandra identifier (starts with letter)
|
||||
assert result.startswith('o_')
|
||||
assert result.replace('o_', '').replace('_', '').isalnum() or result == 'o_'
|
||||
|
||||
# Test regular name sanitization (only adds o_ prefix if starts with number)
|
||||
name_test_cases = [
|
||||
("simple_name", "simple_name"),
|
||||
("Name-With-Dashes", "name_with_dashes"),
|
||||
("name.with.dots", "name_with_dots"),
|
||||
("123_numbers", "o_123_numbers"), # Only this gets o_ prefix
|
||||
("special!@#chars", "special___chars"), # 3 special chars become 3 underscores
|
||||
("UPPERCASE", "uppercase"),
|
||||
("CamelCase", "camelcase"),
|
||||
]
|
||||
|
||||
for input_name, expected_name in name_test_cases:
|
||||
result = processor.sanitize_name(input_name)
|
||||
assert result == expected_name
|
||||
|
||||
def test_primary_key_structure_contract(self):
|
||||
"""Test that primary key structure follows Cassandra best practices"""
|
||||
# Verify partition key always includes collection
|
||||
processor = Processor.__new__(Processor)
|
||||
processor.schemas = {}
|
||||
processor.known_keyspaces = set()
|
||||
processor.known_tables = {}
|
||||
processor.session = None
|
||||
|
||||
# Test schema with primary key
|
||||
schema_with_pk = RowSchema(
|
||||
name="test",
|
||||
fields=[
|
||||
Field(name="id", type="string", primary=True),
|
||||
Field(name="data", type="string")
|
||||
]
|
||||
)
|
||||
|
||||
# The primary key should be ((collection, id))
|
||||
# This is verified in the implementation where collection
|
||||
# is always first in the partition key
|
||||
|
||||
def test_metadata_field_usage_contract(self):
|
||||
"""Test that metadata fields are used correctly in storage"""
|
||||
# Create test object
|
||||
test_obj = ExtractedObject(
|
||||
metadata=Metadata(
|
||||
id="meta-001",
|
||||
user="user123", # -> keyspace
|
||||
collection="coll456", # -> partition key
|
||||
metadata=[{"key": "value"}]
|
||||
),
|
||||
schema_name="table789", # -> table name
|
||||
values={"field": "value"},
|
||||
confidence=0.9,
|
||||
source_span="Source"
|
||||
)
|
||||
|
||||
# Verify mapping contract:
|
||||
# - metadata.user -> Cassandra keyspace
|
||||
# - schema_name -> Cassandra table
|
||||
# - metadata.collection -> Part of primary key
|
||||
assert test_obj.metadata.user # Required for keyspace
|
||||
assert test_obj.schema_name # Required for table
|
||||
assert test_obj.metadata.collection # Required for partition key
|
||||
308
tests/contract/test_structured_data_contracts.py
Normal file
308
tests/contract/test_structured_data_contracts.py
Normal file
|
|
@ -0,0 +1,308 @@
|
|||
"""
|
||||
Contract tests for Structured Data Pulsar Message Schemas
|
||||
|
||||
These tests verify the contracts for all structured data Pulsar message schemas,
|
||||
ensuring schema compatibility, serialization contracts, and service interface stability.
|
||||
Following the TEST_STRATEGY.md approach for contract testing.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
import json
|
||||
from typing import Dict, Any
|
||||
|
||||
from trustgraph.schema import (
|
||||
StructuredDataSubmission, ExtractedObject,
|
||||
NLPToStructuredQueryRequest, NLPToStructuredQueryResponse,
|
||||
StructuredQueryRequest, StructuredQueryResponse,
|
||||
StructuredObjectEmbedding, Field, RowSchema,
|
||||
Metadata, Error, Value
|
||||
)
|
||||
from .conftest import serialize_deserialize_test
|
||||
|
||||
|
||||
@pytest.mark.contract
|
||||
class TestStructuredDataSchemaContracts:
|
||||
"""Contract tests for structured data schemas"""
|
||||
|
||||
def test_field_schema_contract(self):
|
||||
"""Test enhanced Field schema contract"""
|
||||
# Arrange & Act - create Field instance directly
|
||||
field = Field(
|
||||
name="customer_id",
|
||||
type="string",
|
||||
size=0,
|
||||
primary=True,
|
||||
description="Unique customer identifier",
|
||||
required=True,
|
||||
enum_values=[],
|
||||
indexed=True
|
||||
)
|
||||
|
||||
# Assert - test field properties
|
||||
assert field.name == "customer_id"
|
||||
assert field.type == "string"
|
||||
assert field.primary is True
|
||||
assert field.indexed is True
|
||||
assert isinstance(field.enum_values, list)
|
||||
assert len(field.enum_values) == 0
|
||||
|
||||
# Test with enum values
|
||||
field_with_enum = Field(
|
||||
name="status",
|
||||
type="string",
|
||||
size=0,
|
||||
primary=False,
|
||||
description="Status field",
|
||||
required=False,
|
||||
enum_values=["active", "inactive"],
|
||||
indexed=True
|
||||
)
|
||||
|
||||
assert len(field_with_enum.enum_values) == 2
|
||||
assert "active" in field_with_enum.enum_values
|
||||
|
||||
def test_row_schema_contract(self):
|
||||
"""Test RowSchema contract"""
|
||||
# Arrange & Act
|
||||
field = Field(
|
||||
name="email",
|
||||
type="string",
|
||||
size=255,
|
||||
primary=False,
|
||||
description="Customer email",
|
||||
required=True,
|
||||
enum_values=[],
|
||||
indexed=True
|
||||
)
|
||||
|
||||
schema = RowSchema(
|
||||
name="customers",
|
||||
description="Customer records schema",
|
||||
fields=[field]
|
||||
)
|
||||
|
||||
# Assert
|
||||
assert schema.name == "customers"
|
||||
assert schema.description == "Customer records schema"
|
||||
assert len(schema.fields) == 1
|
||||
assert schema.fields[0].name == "email"
|
||||
assert schema.fields[0].indexed is True
|
||||
|
||||
def test_structured_data_submission_contract(self):
|
||||
"""Test StructuredDataSubmission schema contract"""
|
||||
# Arrange
|
||||
metadata = Metadata(
|
||||
id="structured-data-001",
|
||||
user="test_user",
|
||||
collection="test_collection",
|
||||
metadata=[]
|
||||
)
|
||||
|
||||
# Act
|
||||
submission = StructuredDataSubmission(
|
||||
metadata=metadata,
|
||||
format="csv",
|
||||
schema_name="customer_records",
|
||||
data=b"id,name,email\n1,John,john@example.com",
|
||||
options={"delimiter": ",", "header": "true"}
|
||||
)
|
||||
|
||||
# Assert
|
||||
assert submission.format == "csv"
|
||||
assert submission.schema_name == "customer_records"
|
||||
assert submission.options["delimiter"] == ","
|
||||
assert submission.metadata.id == "structured-data-001"
|
||||
assert len(submission.data) > 0
|
||||
|
||||
def test_extracted_object_contract(self):
|
||||
"""Test ExtractedObject schema contract"""
|
||||
# Arrange
|
||||
metadata = Metadata(
|
||||
id="extracted-obj-001",
|
||||
user="test_user",
|
||||
collection="test_collection",
|
||||
metadata=[]
|
||||
)
|
||||
|
||||
# Act
|
||||
obj = ExtractedObject(
|
||||
metadata=metadata,
|
||||
schema_name="customer_records",
|
||||
values={"id": "123", "name": "John Doe", "email": "john@example.com"},
|
||||
confidence=0.95,
|
||||
source_span="John Doe (john@example.com) customer ID 123"
|
||||
)
|
||||
|
||||
# Assert
|
||||
assert obj.schema_name == "customer_records"
|
||||
assert obj.values["name"] == "John Doe"
|
||||
assert obj.confidence == 0.95
|
||||
assert len(obj.source_span) > 0
|
||||
assert obj.metadata.id == "extracted-obj-001"
|
||||
|
||||
|
||||
@pytest.mark.contract
|
||||
class TestStructuredQueryServiceContracts:
|
||||
"""Contract tests for structured query services"""
|
||||
|
||||
def test_nlp_to_structured_query_request_contract(self):
|
||||
"""Test NLPToStructuredQueryRequest schema contract"""
|
||||
# Act
|
||||
request = NLPToStructuredQueryRequest(
|
||||
natural_language_query="Show me all customers who registered last month",
|
||||
max_results=100,
|
||||
context_hints={"time_range": "last_month", "entity_type": "customer"}
|
||||
)
|
||||
|
||||
# Assert
|
||||
assert "customers" in request.natural_language_query
|
||||
assert request.max_results == 100
|
||||
assert request.context_hints["time_range"] == "last_month"
|
||||
|
||||
def test_nlp_to_structured_query_response_contract(self):
|
||||
"""Test NLPToStructuredQueryResponse schema contract"""
|
||||
# Act
|
||||
response = NLPToStructuredQueryResponse(
|
||||
error=None,
|
||||
graphql_query="query { customers(filter: {registered: {gte: \"2024-01-01\"}}) { id name email } }",
|
||||
variables={"start_date": "2024-01-01"},
|
||||
detected_schemas=["customers"],
|
||||
confidence=0.92
|
||||
)
|
||||
|
||||
# Assert
|
||||
assert response.error is None
|
||||
assert "customers" in response.graphql_query
|
||||
assert response.detected_schemas[0] == "customers"
|
||||
assert response.confidence > 0.9
|
||||
|
||||
def test_structured_query_request_contract(self):
|
||||
"""Test StructuredQueryRequest schema contract"""
|
||||
# Act
|
||||
request = StructuredQueryRequest(
|
||||
query="query GetCustomers($limit: Int) { customers(limit: $limit) { id name email } }",
|
||||
variables={"limit": "10"},
|
||||
operation_name="GetCustomers"
|
||||
)
|
||||
|
||||
# Assert
|
||||
assert "customers" in request.query
|
||||
assert request.variables["limit"] == "10"
|
||||
assert request.operation_name == "GetCustomers"
|
||||
|
||||
def test_structured_query_response_contract(self):
|
||||
"""Test StructuredQueryResponse schema contract"""
|
||||
# Act
|
||||
response = StructuredQueryResponse(
|
||||
error=None,
|
||||
data='{"customers": [{"id": "1", "name": "John", "email": "john@example.com"}]}',
|
||||
errors=[]
|
||||
)
|
||||
|
||||
# Assert
|
||||
assert response.error is None
|
||||
assert "customers" in response.data
|
||||
assert len(response.errors) == 0
|
||||
|
||||
def test_structured_query_response_with_errors_contract(self):
|
||||
"""Test StructuredQueryResponse with GraphQL errors contract"""
|
||||
# Act
|
||||
response = StructuredQueryResponse(
|
||||
error=None,
|
||||
data=None,
|
||||
errors=["Field 'invalid_field' not found in schema 'customers'"]
|
||||
)
|
||||
|
||||
# Assert
|
||||
assert response.data is None
|
||||
assert len(response.errors) == 1
|
||||
assert "invalid_field" in response.errors[0]
|
||||
|
||||
|
||||
@pytest.mark.contract
|
||||
class TestStructuredEmbeddingsContracts:
|
||||
"""Contract tests for structured object embeddings"""
|
||||
|
||||
def test_structured_object_embedding_contract(self):
|
||||
"""Test StructuredObjectEmbedding schema contract"""
|
||||
# Arrange
|
||||
metadata = Metadata(
|
||||
id="struct-embed-001",
|
||||
user="test_user",
|
||||
collection="test_collection",
|
||||
metadata=[]
|
||||
)
|
||||
|
||||
# Act
|
||||
embedding = StructuredObjectEmbedding(
|
||||
metadata=metadata,
|
||||
vectors=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
|
||||
schema_name="customer_records",
|
||||
object_id="customer_123",
|
||||
field_embeddings={
|
||||
"name": [0.1, 0.2, 0.3],
|
||||
"email": [0.4, 0.5, 0.6]
|
||||
}
|
||||
)
|
||||
|
||||
# Assert
|
||||
assert embedding.schema_name == "customer_records"
|
||||
assert embedding.object_id == "customer_123"
|
||||
assert len(embedding.vectors) == 2
|
||||
assert len(embedding.field_embeddings) == 2
|
||||
assert "name" in embedding.field_embeddings
|
||||
|
||||
|
||||
@pytest.mark.contract
|
||||
class TestStructuredDataSerializationContracts:
|
||||
"""Contract tests for structured data serialization/deserialization"""
|
||||
|
||||
def test_structured_data_submission_serialization(self):
|
||||
"""Test StructuredDataSubmission serialization contract"""
|
||||
# Arrange
|
||||
metadata = Metadata(id="test", user="user", collection="col", metadata=[])
|
||||
submission_data = {
|
||||
"metadata": metadata,
|
||||
"format": "json",
|
||||
"schema_name": "test_schema",
|
||||
"data": b'{"test": "data"}',
|
||||
"options": {"encoding": "utf-8"}
|
||||
}
|
||||
|
||||
# Act & Assert
|
||||
assert serialize_deserialize_test(StructuredDataSubmission, submission_data)
|
||||
|
||||
def test_extracted_object_serialization(self):
|
||||
"""Test ExtractedObject serialization contract"""
|
||||
# Arrange
|
||||
metadata = Metadata(id="test", user="user", collection="col", metadata=[])
|
||||
object_data = {
|
||||
"metadata": metadata,
|
||||
"schema_name": "test_schema",
|
||||
"values": {"field1": "value1"},
|
||||
"confidence": 0.8,
|
||||
"source_span": "test span"
|
||||
}
|
||||
|
||||
# Act & Assert
|
||||
assert serialize_deserialize_test(ExtractedObject, object_data)
|
||||
|
||||
def test_nlp_query_serialization(self):
|
||||
"""Test NLP query request/response serialization contract"""
|
||||
# Test request
|
||||
request_data = {
|
||||
"natural_language_query": "test query",
|
||||
"max_results": 10,
|
||||
"context_hints": {}
|
||||
}
|
||||
assert serialize_deserialize_test(NLPToStructuredQueryRequest, request_data)
|
||||
|
||||
# Test response
|
||||
response_data = {
|
||||
"error": None,
|
||||
"graphql_query": "query { test }",
|
||||
"variables": {},
|
||||
"detected_schemas": ["test"],
|
||||
"confidence": 0.9
|
||||
}
|
||||
assert serialize_deserialize_test(NLPToStructuredQueryResponse, response_data)
|
||||
Loading…
Add table
Add a link
Reference in a new issue