Release/v1.2 (#457)

* Bump setup.py versions for 1.1

* PoC MCP server (#419)

* Very initial MCP server PoC for TrustGraph

* Put service on port 8000

* Add MCP container and packages to buildout

* Update docs for API/CLI changes in 1.0 (#421)

* Update some API basics for the 0.23/1.0 API change

* Add MCP container push (#425)

* Add command args to the MCP server (#426)

* Host and port parameters

* Added websocket arg

* More docs

* MCP client support (#427)

- MCP client service
- Tool request/response schema
- API gateway support for mcp-tool
- Message translation for tool request & response
- Make mcp-tool use the configuration service to find out
  where the MCP services are.

* Feature/react call mcp (#428)

Key Features

  - MCP Tool Integration: Added core MCP tool support with ToolClientSpec and ToolClient classes
  - API Enhancement: New mcp_tool method for flow-specific tool invocation
  - CLI Tooling: New tg-invoke-mcp-tool command for testing MCP integration
  - React Agent Enhancement: Fixed and improved multi-tool invocation capabilities
  - Tool Management: Enhanced CLI for tool configuration and management

Changes

  - Added MCP tool invocation to API with flow-specific integration
  - Implemented ToolClientSpec and ToolClient for tool call handling
  - Updated agent-manager-react to invoke MCP tools with configurable types
  - Enhanced CLI with new commands and improved help text
  - Added comprehensive documentation for new CLI commands
  - Improved tool configuration management

Testing

  - Added tg-invoke-mcp-tool CLI command for isolated MCP integration testing
  - Enhanced agent capability to invoke multiple tools simultaneously

* Test suite executed from CI pipeline (#433)

* Test strategy & test cases

* Unit tests

* Integration tests

* Extending test coverage (#434)

* Contract tests

* Testing embeddings

* Agent unit tests

* Knowledge pipeline tests

* Turn on contract tests

* Increase storage test coverage (#435)

* Fixing storage and adding tests

* PR pipeline only runs quick tests

* Empty configuration is returned as empty list, previously was not in response (#436)

* Update config util to take files as well as command-line text (#437)

* Updated CLI invocation and config model for tools and mcp (#438)

* Updated CLI invocation and config model for tools and mcp

* CLI anomalies

* Tweaked the MCP tool implementation for new model

* Update agent implementation to match the new model

* Fix agent tools, now all tested

* Fixed integration tests

* Fix MCP delete tool params

* Update Python deps to 1.2

* Update to enable knowledge extraction using the agent framework (#439)

* Implement KG extraction agent (kg-extract-agent)

* Using ReAct framework (agent-manager-react)
 
* ReAct manager had an issue when emitting JSON, which conflicts with the ReAct manager's own JSON messages, so refactored the ReAct manager to use traditional ReAct messages with a non-JSON structure.
 
* Minor refactor to take the prompt template client out of prompt-template so it can be more readily used by other modules. kg-extract-agent uses this framework.

* Migrate from setup.py to pyproject.toml (#440)

* Converted setup.py to pyproject.toml

* Modern package infrastructure as recommended by py docs

* Install missing build deps (#441)

* Install missing build deps (#442)

* Implement logging strategy (#444)

* Logging strategy and convert all print() calls to logging invocations

* Fix/startup failure (#445)

* Fix logging startup problems

* Fix logging startup problems (#446)

* Fix logging startup problems (#447)

* Fixed Mistral OCR to use current API (#448)

* Fixed Mistral OCR to use current API

* Added PDF decoder tests

* Fix Mistral OCR ident to be standard pdf-decoder (#450)

* Fix Mistral OCR ident to be standard pdf-decoder

* Correct test

* Schema structure refactor (#451)

* Write schema refactor spec

* Implemented schema refactor spec

* Structure data mvp (#452)

* Structured data tech spec

* Architecture principles

* New schemas

* Updated schemas and specs

* Object extractor

* Add .coveragerc

* New tests

* Cassandra object storage

* Trying to get object extraction working, issues exist

* Validate librarian collection (#453)

* Fix token chunker, broken API invocation (#454)

* Fix token chunker, broken API invocation (#455)

* Knowledge load utility CLI (#456)

* Knowledge loader

* More tests
cybermaggedon 2025-08-18 20:56:09 +01:00 committed by GitHub
parent c85ba197be
commit 89be656990
509 changed files with 49632 additions and 5159 deletions

243
tests/contract/README.md Normal file

@ -0,0 +1,243 @@
# Contract Tests for TrustGraph
This directory contains contract tests that verify service interface contracts, message schemas, and API compatibility across the TrustGraph microservices architecture.
## Overview
Contract tests ensure that:
- **Message schemas remain compatible** across service versions
- **API interfaces stay stable** for consumers
- **Service communication contracts** are maintained
- **Schema evolution** doesn't break existing integrations
## Test Categories
### 1. Pulsar Message Schema Contracts (`test_message_contracts.py`)
Tests the contracts for all Pulsar message schemas used in TrustGraph service communication.
#### **Coverage:**
- ✅ **Text Completion Messages**: `TextCompletionRequest`, `TextCompletionResponse`
- ✅ **Document RAG Messages**: `DocumentRagQuery`, `DocumentRagResponse`
- ✅ **Agent Messages**: `AgentRequest`, `AgentResponse`, `AgentStep`
- ✅ **Graph Messages**: `Chunk`, `Triple`, `Triples`, `EntityContext`
- ✅ **Common Messages**: `Metadata`, `Value`, `Error` schemas
- ✅ **Message Routing**: Properties, correlation IDs, routing keys
- ✅ **Schema Evolution**: Backward/forward compatibility testing
- ✅ **Serialization**: Schema validation and data integrity
#### **Key Features:**
- **Schema Validation**: Ensures all message schemas accept valid data and reject invalid data
- **Field Contracts**: Validates required vs optional fields and type constraints
- **Nested Schema Support**: Tests complex schemas with embedded objects and arrays
- **Routing Contracts**: Validates message properties and routing conventions
- **Evolution Testing**: Backward compatibility and schema versioning support
## Running Contract Tests
### Run All Contract Tests
```bash
pytest tests/contract/ -m contract
```
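The `-m contract` filter relies on pytest knowing the `contract` marker. A minimal sketch of registering it from a `conftest.py`, assuming the project does not already declare the marker in its pytest configuration; the description string is illustrative:

```python
def pytest_configure(config):
    """Register the `contract` marker so `-m contract` selects tests
    without triggering unknown-marker warnings."""
    config.addinivalue_line(
        "markers",
        "contract: service interface and message schema contract tests",
    )
```

`pytest_configure` is a standard pytest hook and `addinivalue_line` is the usual way to add a marker programmatically; declaring the marker in the project's pytest settings file would work equally well.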
### Run Specific Contract Test Categories
```bash
# Message schema contracts
pytest tests/contract/test_message_contracts.py -v
# Specific test class
pytest tests/contract/test_message_contracts.py::TestTextCompletionMessageContracts -v
# Schema evolution tests
pytest tests/contract/test_message_contracts.py::TestSchemaEvolutionContracts -v
```
### Run with Coverage
```bash
pytest tests/contract/ -m contract --cov=trustgraph.schema --cov-report=html
```
## Contract Test Patterns
### 1. Schema Validation Pattern
```python
@pytest.mark.contract
def test_schema_contract(self, sample_message_data):
    """Test that schema accepts valid data and rejects invalid data"""
    # Arrange
    valid_data = sample_message_data["SchemaName"]
    # Act & Assert
    assert validate_schema_contract(SchemaClass, valid_data)
    # Test field constraints
    instance = SchemaClass(**valid_data)
    assert hasattr(instance, 'required_field')
    assert isinstance(instance.required_field, expected_type)
```
### 2. Serialization Contract Pattern
```python
@pytest.mark.contract
def test_serialization_contract(self, sample_message_data):
    """Test schema serialization/deserialization contracts"""
    # Arrange
    data = sample_message_data["SchemaName"]
    # Act & Assert
    assert serialize_deserialize_test(SchemaClass, data)
```
### 3. Evolution Contract Pattern
```python
@pytest.mark.contract
def test_backward_compatibility_contract(self, schema_evolution_data):
    """Test that new schema versions accept old data formats"""
    # Arrange
    old_version_data = schema_evolution_data["SchemaName_v1"]
    # Act - Should work with current schema
    instance = CurrentSchema(**old_version_data)
    # Assert - Required fields maintained
    assert instance.required_field == expected_value
```
## Schema Registry
The contract tests maintain a registry of all TrustGraph schemas:
```python
schema_registry = {
    # Text Completion
    "TextCompletionRequest": TextCompletionRequest,
    "TextCompletionResponse": TextCompletionResponse,
    # Document RAG
    "DocumentRagQuery": DocumentRagQuery,
    "DocumentRagResponse": DocumentRagResponse,
    # Agent
    "AgentRequest": AgentRequest,
    "AgentResponse": AgentResponse,
    # Graph/Knowledge
    "Chunk": Chunk,
    "Triple": Triple,
    "Triples": Triples,
    "Value": Value,
    # Common
    "Metadata": Metadata,
    "Error": Error,
}
```
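One way to keep the registry and the sample data in step is a check that reports schemas with no sample payload. The sketch below is self-contained, so it uses placeholder classes rather than the real `trustgraph.schema` imports; `missing_samples` is a hypothetical helper, not part of the test suite.

```python
from typing import Any, Dict, List


def missing_samples(registry: Dict[str, type], samples: Dict[str, Any]) -> List[str]:
    """Return the registry names that have no entry in the sample data, sorted."""
    return sorted(name for name in registry if name not in samples)


# Placeholder classes standing in for the real schema classes (assumption).
class TextCompletionRequest: ...
class AgentStep: ...


registry = {"TextCompletionRequest": TextCompletionRequest, "AgentStep": AgentStep}
samples = {"TextCompletionRequest": {"system": "s", "prompt": "p"}}

print(missing_samples(registry, samples))  # ['AgentStep']
```

A test built on this idea would fail as soon as a schema is registered without matching sample data.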
## Message Contract Specifications
### Text Completion Service Contract
```yaml
TextCompletionRequest:
  required_fields: [system, prompt]
  field_types:
    system: string
    prompt: string
TextCompletionResponse:
  required_fields: [error, response, model]
  field_types:
    error: Error | null
    response: string | null
    in_token: integer | null
    out_token: integer | null
    model: string
```
### Document RAG Service Contract
```yaml
DocumentRagQuery:
  required_fields: [query, user, collection]
  field_types:
    query: string
    user: string
    collection: string
    doc_limit: integer
DocumentRagResponse:
  required_fields: [error, response]
  field_types:
    error: Error | null
    response: string | null
```
### Agent Service Contract
```yaml
AgentRequest:
  required_fields: [question, history]
  field_types:
    question: string
    plan: string
    state: string
    history: Array<AgentStep>
AgentResponse:
  required_fields: [error]
  field_types:
    answer: string | null
    error: Error | null
    thought: string | null
    observation: string | null
```
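Specifications like these can be checked mechanically. A minimal sketch, assuming the `TextCompletionResponse` spec is transcribed by hand into a Python dict and that payloads are plain dicts rather than Pulsar records; `SPEC` and `contract_violations` are illustrative names, not TrustGraph APIs.

```python
from typing import Any, Dict, List

# Hand transcription of the TextCompletionResponse spec.
# Each field maps to (expected type, whether null is allowed).
SPEC: Dict[str, Any] = {
    "required_fields": ["error", "response", "model"],
    "field_types": {
        "error": (dict, True),      # Error | null, modelled as a dict here
        "response": (str, True),
        "in_token": (int, True),
        "out_token": (int, True),
        "model": (str, False),
    },
}


def contract_violations(payload: Dict[str, Any], spec: Dict[str, Any]) -> List[str]:
    """List every way the payload breaks the spec; an empty list means it conforms."""
    problems = []
    for name in spec["required_fields"]:
        if name not in payload:
            problems.append(f"missing required field: {name}")
    for name, (expected, nullable) in spec["field_types"].items():
        if name not in payload:
            continue
        value = payload[name]
        if value is None:
            if not nullable:
                problems.append(f"null not allowed: {name}")
        elif not isinstance(value, expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    return problems
```

In this sketch "required" means the key must be present, and nullability is tracked separately, which matches specs where a required field such as `error` may still be null.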
## Best Practices
### Contract Test Design
1. **Test Both Valid and Invalid Data**: Ensure schemas accept valid data and reject invalid data
2. **Verify Field Constraints**: Test type constraints, required vs optional fields
3. **Test Nested Schemas**: Validate complex objects with embedded schemas
4. **Test Array Fields**: Ensure array serialization maintains order and content
5. **Test Optional Fields**: Verify optional field handling in serialization
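Points 4 and 5 can be illustrated with a round trip. The sketch below uses JSON and dataclasses purely as stand-ins for Pulsar serialization and the `AgentRequest`/`AgentStep` records, so it shows the shape of the check rather than the real wire format; `StepStandIn`, `RequestStandIn` and `round_trip` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class StepStandIn:
    thought: str
    action: str
    observation: Optional[str] = None  # optional field


@dataclass
class RequestStandIn:
    question: str
    history: List[StepStandIn] = field(default_factory=list)


def round_trip(request: RequestStandIn) -> RequestStandIn:
    """Serialize to JSON and rebuild the object, as a stand-in for the message bus."""
    raw = json.loads(json.dumps(asdict(request)))
    return RequestStandIn(
        question=raw["question"],
        history=[StepStandIn(**step) for step in raw["history"]],
    )


original = RequestStandIn(
    question="What comes next?",
    history=[
        StepStandIn("First thought", "first_action", "First observation"),
        StepStandIn("Second thought", "second_action"),  # observation omitted
    ],
)
restored = round_trip(original)

assert restored == original                                    # content preserved
assert [s.action for s in restored.history] == ["first_action", "second_action"]  # order preserved
assert restored.history[1].observation is None                 # optional field survives
```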
### Schema Evolution
1. **Backward Compatibility**: New schema versions must accept old message formats
2. **Required Field Stability**: Required fields should never become optional or be removed
3. **Additive Changes**: New fields should be optional to maintain compatibility
4. **Deprecation Strategy**: Plan deprecation path for schema changes
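The additive-change rule can be sketched with a dataclass stand-in for a later `TextCompletionRequest` version. The `temperature` and `max_tokens` fields mirror the v2 sample data in `conftest.py`; `from_payload` and the `top_p` field are hypothetical, and dropping unknown fields is one possible forward-compatibility policy, not something the current schemas are known to do.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class RequestV2StandIn:
    system: str
    prompt: str
    temperature: Optional[float] = None  # additive field, optional with a default
    max_tokens: Optional[int] = None     # additive field, optional with a default


def from_payload(payload: Dict[str, Any]) -> RequestV2StandIn:
    """Build a request from a dict, ignoring fields this version does not know."""
    known = {f.name for f in fields(RequestV2StandIn)}
    return RequestV2StandIn(**{k: v for k, v in payload.items() if k in known})


# Backward compatibility: a v1 payload still constructs, new fields take defaults.
v1 = {"system": "You are helpful.", "prompt": "Test prompt"}
request = from_payload(v1)
assert request.temperature is None and request.max_tokens is None

# Forward compatibility: a payload with an unknown field is accepted.
future = dict(v1, top_p=0.9)
assert from_payload(future).prompt == "Test prompt"
```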
### Error Handling
1. **Error Schema Consistency**: All error responses use consistent Error schema
2. **Error Type Contracts**: Error types follow naming conventions
3. **Error Message Format**: Error messages provide actionable information
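A check for error contracts might look like the following sketch. It assumes error types are lowercase and hyphen-separated, a convention inferred only from the examples used in the tests (`rate-limit`, `no-documents`, `validation-error`); `ErrorStandIn` and `is_well_formed` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumed naming convention: lowercase words joined by hyphens.
ERROR_TYPE_PATTERN = re.compile(r"^[a-z]+(-[a-z]+)*$")


@dataclass
class ErrorStandIn:
    type: str
    message: str


def is_well_formed(error: ErrorStandIn) -> bool:
    """True when the type follows the assumed convention and the message is not blank."""
    return bool(ERROR_TYPE_PATTERN.match(error.type)) and bool(error.message.strip())
```

For example, `is_well_formed(ErrorStandIn("rate-limit", "Rate limit exceeded"))` holds, while a type such as `RateLimit` or a blank message does not.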
## Adding New Contract Tests
When adding new message schemas or modifying existing ones:
1. **Add to Schema Registry**: Update `conftest.py` schema registry
2. **Add Sample Data**: Create valid sample data in `conftest.py`
3. **Create Contract Tests**: Follow existing patterns for validation
4. **Test Evolution**: Add backward compatibility tests
5. **Update Documentation**: Document schema contracts in this README
## Integration with CI/CD
Contract tests should be run:
- **On every commit** to detect breaking changes early
- **Before releases** to ensure API stability
- **On schema changes** to validate compatibility
- **In dependency updates** to catch breaking changes
```bash
# CI/CD pipeline command
pytest tests/contract/ -m contract --junitxml=contract-test-results.xml
```
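The JUnit XML report can be summarized in a later pipeline step. A minimal sketch using only the standard library, assuming the report has the usual layout of `testsuite` elements carrying `tests`, `failures`, `errors` and `skipped` attributes; `summarize_junit` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def summarize_junit(xml_text: str) -> Dict[str, int]:
    """Add up the per-suite counters found in a JUnit XML report."""
    root = ET.fromstring(xml_text)
    # The root is either a single <testsuite> or a <testsuites> wrapper.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for suite in suites:
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    return totals
```

A pipeline could fail the build when `failures` or `errors` is non-zero, although pytest's own exit code already signals that.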
## Contract Test Results
Contract tests provide:
- ✅ **Schema Compatibility Reports**: Which schemas pass/fail validation
- ✅ **Breaking Change Detection**: Identifies contract violations
- ✅ **Evolution Validation**: Confirms backward compatibility
- ✅ **Field Constraint Verification**: Validates data type contracts
This ensures that TrustGraph services can evolve independently while maintaining stable, compatible interfaces for all service communication.


224
tests/contract/conftest.py Normal file

@ -0,0 +1,224 @@
"""
Contract test fixtures and configuration
This file provides common fixtures for contract testing, focusing on
message schema validation, API interface contracts, and service compatibility.
"""
import pytest
import json
from typing import Dict, Any, Type
from pulsar.schema import Record
from unittest.mock import MagicMock
from trustgraph.schema import (
    TextCompletionRequest, TextCompletionResponse,
    DocumentRagQuery, DocumentRagResponse,
    AgentRequest, AgentResponse, AgentStep,
    Chunk, Triple, Triples, Value, Error,
    EntityContext, EntityContexts,
    GraphEmbeddings, EntityEmbeddings,
    Metadata
)


@pytest.fixture
def schema_registry():
    """Registry of all Pulsar schemas used in TrustGraph"""
    return {
        # Text Completion
        "TextCompletionRequest": TextCompletionRequest,
        "TextCompletionResponse": TextCompletionResponse,
        # Document RAG
        "DocumentRagQuery": DocumentRagQuery,
        "DocumentRagResponse": DocumentRagResponse,
        # Agent
        "AgentRequest": AgentRequest,
        "AgentResponse": AgentResponse,
        "AgentStep": AgentStep,
        # Graph
        "Chunk": Chunk,
        "Triple": Triple,
        "Triples": Triples,
        "Value": Value,
        "Error": Error,
        "EntityContext": EntityContext,
        "EntityContexts": EntityContexts,
        "GraphEmbeddings": GraphEmbeddings,
        "EntityEmbeddings": EntityEmbeddings,
        # Common
        "Metadata": Metadata,
    }


@pytest.fixture
def sample_message_data():
    """Sample message data for contract testing"""
    return {
        "TextCompletionRequest": {
            "system": "You are a helpful assistant.",
            "prompt": "What is machine learning?"
        },
        "TextCompletionResponse": {
            "error": None,
            "response": "Machine learning is a subset of artificial intelligence.",
            "in_token": 50,
            "out_token": 100,
            "model": "gpt-3.5-turbo"
        },
        "DocumentRagQuery": {
            "query": "What is artificial intelligence?",
            "user": "test_user",
            "collection": "test_collection",
            "doc_limit": 10
        },
        "DocumentRagResponse": {
            "error": None,
            "response": "Artificial intelligence is the simulation of human intelligence in machines."
        },
        "AgentRequest": {
            "question": "What is machine learning?",
            "plan": "",
            "state": "",
            "history": []
        },
        "AgentResponse": {
            "answer": "Machine learning is a subset of AI.",
            "error": None,
            "thought": "I need to provide information about machine learning.",
            "observation": None
        },
        "Metadata": {
            "id": "test-doc-123",
            "user": "test_user",
            "collection": "test_collection",
            "metadata": []
        },
        "Value": {
            "value": "http://example.com/entity",
            "is_uri": True,
            "type": ""
        },
        "Triple": {
            "s": Value(
                value="http://example.com/subject",
                is_uri=True,
                type=""
            ),
            "p": Value(
                value="http://example.com/predicate",
                is_uri=True,
                type=""
            ),
            "o": Value(
                value="Object value",
                is_uri=False,
                type=""
            )
        }
    }


@pytest.fixture
def invalid_message_data():
    """Invalid message data for contract validation testing"""
    return {
        "TextCompletionRequest": [
            {"system": None, "prompt": "test"},  # Invalid system (None)
            {"system": "test", "prompt": None},  # Invalid prompt (None)
            {"system": 123, "prompt": "test"},   # Invalid system (not string)
            {},                                  # Missing required fields
        ],
        "DocumentRagQuery": [
            {"query": None, "user": "test", "collection": "test", "doc_limit": 10},    # Invalid query
            {"query": "test", "user": None, "collection": "test", "doc_limit": 10},    # Invalid user
            {"query": "test", "user": "test", "collection": "test", "doc_limit": -1},  # Invalid doc_limit
            {"query": "test"},                                                         # Missing required fields
        ],
        "Value": [
            {"value": None, "is_uri": True, "type": ""},             # Invalid value (None)
            {"value": "test", "is_uri": "not_boolean", "type": ""},  # Invalid is_uri
            {"value": 123, "is_uri": True, "type": ""},              # Invalid value (not string)
        ]
    }


@pytest.fixture
def message_properties():
    """Standard message properties for contract testing"""
    return {
        "id": "test-message-123",
        "routing_key": "test.routing.key",
        "timestamp": "2024-01-01T00:00:00Z",
        "source_service": "test-service",
        "correlation_id": "correlation-123"
    }


@pytest.fixture
def schema_evolution_data():
    """Data for testing schema evolution and backward compatibility"""
    return {
        "TextCompletionRequest_v1": {
            "system": "You are helpful.",
            "prompt": "Test prompt"
        },
        "TextCompletionRequest_v2": {
            "system": "You are helpful.",
            "prompt": "Test prompt",
            "temperature": 0.7,  # New field
            "max_tokens": 100    # New field
        },
        "TextCompletionResponse_v1": {
            "error": None,
            "response": "Test response",
            "model": "gpt-3.5-turbo"
        },
        "TextCompletionResponse_v2": {
            "error": None,
            "response": "Test response",
            "in_token": 50,    # New field
            "out_token": 100,  # New field
            "model": "gpt-3.5-turbo"
        }
    }


def validate_schema_contract(schema_class: Type[Record], data: Dict[str, Any]) -> bool:
    """Helper function to validate schema contracts"""
    try:
        # Create instance from data
        instance = schema_class(**data)
        # Verify all fields are accessible
        for field_name in data.keys():
            assert hasattr(instance, field_name)
            assert getattr(instance, field_name) == data[field_name]
        return True
    except Exception:
        return False


def serialize_deserialize_test(schema_class: Type[Record], data: Dict[str, Any]) -> bool:
    """Helper function to test serialization/deserialization"""
    try:
        # Create instance
        instance = schema_class(**data)
        # This would test actual Pulsar serialization if we had the client
        # For now, we test the schema construction and field access
        for field_name, field_value in data.items():
            assert getattr(instance, field_name) == field_value
        return True
    except Exception:
        return False
# Test markers for contract tests
pytestmark = pytest.mark.contract


@ -0,0 +1,614 @@
"""
Contract tests for Pulsar Message Schemas
These tests verify the contracts for all Pulsar message schemas used in TrustGraph,
ensuring schema compatibility, serialization contracts, and service interface stability.
Following the TEST_STRATEGY.md approach for contract testing.
"""
import pytest
import json
from typing import Dict, Any, Type
from pulsar.schema import Record
from trustgraph.schema import (
    TextCompletionRequest, TextCompletionResponse,
    DocumentRagQuery, DocumentRagResponse,
    AgentRequest, AgentResponse, AgentStep,
    Chunk, Triple, Triples, Value, Error,
    EntityContext, EntityContexts,
    GraphEmbeddings, EntityEmbeddings,
    Metadata, Field, RowSchema,
    StructuredDataSubmission, ExtractedObject,
    NLPToStructuredQueryRequest, NLPToStructuredQueryResponse,
    StructuredQueryRequest, StructuredQueryResponse,
    StructuredObjectEmbedding
)
from .conftest import validate_schema_contract, serialize_deserialize_test


@pytest.mark.contract
class TestTextCompletionMessageContracts:
    """Contract tests for Text Completion message schemas"""

    def test_text_completion_request_schema_contract(self, sample_message_data):
        """Test TextCompletionRequest schema contract"""
        # Arrange
        request_data = sample_message_data["TextCompletionRequest"]
        # Act & Assert
        assert validate_schema_contract(TextCompletionRequest, request_data)
        # Test required fields
        request = TextCompletionRequest(**request_data)
        assert hasattr(request, 'system')
        assert hasattr(request, 'prompt')
        assert isinstance(request.system, str)
        assert isinstance(request.prompt, str)

    def test_text_completion_response_schema_contract(self, sample_message_data):
        """Test TextCompletionResponse schema contract"""
        # Arrange
        response_data = sample_message_data["TextCompletionResponse"]
        # Act & Assert
        assert validate_schema_contract(TextCompletionResponse, response_data)
        # Test required fields
        response = TextCompletionResponse(**response_data)
        assert hasattr(response, 'error')
        assert hasattr(response, 'response')
        assert hasattr(response, 'in_token')
        assert hasattr(response, 'out_token')
        assert hasattr(response, 'model')

    def test_text_completion_request_serialization_contract(self, sample_message_data):
        """Test TextCompletionRequest serialization/deserialization contract"""
        # Arrange
        request_data = sample_message_data["TextCompletionRequest"]
        # Act & Assert
        assert serialize_deserialize_test(TextCompletionRequest, request_data)

    def test_text_completion_response_serialization_contract(self, sample_message_data):
        """Test TextCompletionResponse serialization/deserialization contract"""
        # Arrange
        response_data = sample_message_data["TextCompletionResponse"]
        # Act & Assert
        assert serialize_deserialize_test(TextCompletionResponse, response_data)

    def test_text_completion_request_field_constraints(self):
        """Test TextCompletionRequest field type constraints"""
        # Test valid data
        valid_request = TextCompletionRequest(
            system="You are helpful.",
            prompt="Test prompt"
        )
        assert valid_request.system == "You are helpful."
        assert valid_request.prompt == "Test prompt"

    def test_text_completion_response_field_constraints(self):
        """Test TextCompletionResponse field type constraints"""
        # Test valid response with no error
        valid_response = TextCompletionResponse(
            error=None,
            response="Test response",
            in_token=50,
            out_token=100,
            model="gpt-3.5-turbo"
        )
        assert valid_response.error is None
        assert valid_response.response == "Test response"
        assert valid_response.in_token == 50
        assert valid_response.out_token == 100
        assert valid_response.model == "gpt-3.5-turbo"
        # Test response with error
        error_response = TextCompletionResponse(
            error=Error(type="rate-limit", message="Rate limit exceeded"),
            response=None,
            in_token=None,
            out_token=None,
            model=None
        )
        assert error_response.error is not None
        assert error_response.error.type == "rate-limit"
        assert error_response.response is None


@pytest.mark.contract
class TestDocumentRagMessageContracts:
    """Contract tests for Document RAG message schemas"""

    def test_document_rag_query_schema_contract(self, sample_message_data):
        """Test DocumentRagQuery schema contract"""
        # Arrange
        query_data = sample_message_data["DocumentRagQuery"]
        # Act & Assert
        assert validate_schema_contract(DocumentRagQuery, query_data)
        # Test required fields
        query = DocumentRagQuery(**query_data)
        assert hasattr(query, 'query')
        assert hasattr(query, 'user')
        assert hasattr(query, 'collection')
        assert hasattr(query, 'doc_limit')

    def test_document_rag_response_schema_contract(self, sample_message_data):
        """Test DocumentRagResponse schema contract"""
        # Arrange
        response_data = sample_message_data["DocumentRagResponse"]
        # Act & Assert
        assert validate_schema_contract(DocumentRagResponse, response_data)
        # Test required fields
        response = DocumentRagResponse(**response_data)
        assert hasattr(response, 'error')
        assert hasattr(response, 'response')

    def test_document_rag_query_field_constraints(self):
        """Test DocumentRagQuery field constraints"""
        # Test valid query
        valid_query = DocumentRagQuery(
            query="What is AI?",
            user="test_user",
            collection="test_collection",
            doc_limit=5
        )
        assert valid_query.query == "What is AI?"
        assert valid_query.user == "test_user"
        assert valid_query.collection == "test_collection"
        assert valid_query.doc_limit == 5

    def test_document_rag_response_error_contract(self):
        """Test DocumentRagResponse error handling contract"""
        # Test successful response
        success_response = DocumentRagResponse(
            error=None,
            response="AI is artificial intelligence."
        )
        assert success_response.error is None
        assert success_response.response == "AI is artificial intelligence."
        # Test error response
        error_response = DocumentRagResponse(
            error=Error(type="no-documents", message="No documents found"),
            response=None
        )
        assert error_response.error is not None
        assert error_response.error.type == "no-documents"
        assert error_response.response is None


@pytest.mark.contract
class TestAgentMessageContracts:
    """Contract tests for Agent message schemas"""

    def test_agent_request_schema_contract(self, sample_message_data):
        """Test AgentRequest schema contract"""
        # Arrange
        request_data = sample_message_data["AgentRequest"]
        # Act & Assert
        assert validate_schema_contract(AgentRequest, request_data)
        # Test required fields
        request = AgentRequest(**request_data)
        assert hasattr(request, 'question')
        assert hasattr(request, 'plan')
        assert hasattr(request, 'state')
        assert hasattr(request, 'history')

    def test_agent_response_schema_contract(self, sample_message_data):
        """Test AgentResponse schema contract"""
        # Arrange
        response_data = sample_message_data["AgentResponse"]
        # Act & Assert
        assert validate_schema_contract(AgentResponse, response_data)
        # Test required fields
        response = AgentResponse(**response_data)
        assert hasattr(response, 'answer')
        assert hasattr(response, 'error')
        assert hasattr(response, 'thought')
        assert hasattr(response, 'observation')

    def test_agent_step_schema_contract(self):
        """Test AgentStep schema contract"""
        # Arrange
        step_data = {
            "thought": "I need to search for information",
            "action": "knowledge_query",
            "arguments": {"question": "What is AI?"},
            "observation": "AI is artificial intelligence"
        }
        # Act & Assert
        assert validate_schema_contract(AgentStep, step_data)
        step = AgentStep(**step_data)
        assert step.thought == "I need to search for information"
        assert step.action == "knowledge_query"
        assert step.arguments == {"question": "What is AI?"}
        assert step.observation == "AI is artificial intelligence"

    def test_agent_request_with_history_contract(self):
        """Test AgentRequest with conversation history contract"""
        # Arrange
        history_steps = [
            AgentStep(
                thought="First thought",
                action="first_action",
                arguments={"param": "value"},
                observation="First observation"
            ),
            AgentStep(
                thought="Second thought",
                action="second_action",
                arguments={"param2": "value2"},
                observation="Second observation"
            )
        ]
        # Act
        request = AgentRequest(
            question="What comes next?",
            plan="Multi-step plan",
            state="processing",
            history=history_steps
        )
        # Assert
        assert len(request.history) == 2
        assert request.history[0].thought == "First thought"
        assert request.history[1].action == "second_action"


@pytest.mark.contract
class TestGraphMessageContracts:
    """Contract tests for Graph/Knowledge message schemas"""

    def test_value_schema_contract(self, sample_message_data):
        """Test Value schema contract"""
        # Arrange
        value_data = sample_message_data["Value"]
        # Act & Assert
        assert validate_schema_contract(Value, value_data)
        # Test URI value
        uri_value = Value(**value_data)
        assert uri_value.value == "http://example.com/entity"
        assert uri_value.is_uri is True
        # Test literal value
        literal_value = Value(
            value="Literal text value",
            is_uri=False,
            type=""
        )
        assert literal_value.value == "Literal text value"
        assert literal_value.is_uri is False

    def test_triple_schema_contract(self, sample_message_data):
        """Test Triple schema contract"""
        # Arrange
        triple_data = sample_message_data["Triple"]
        # Act & Assert - Triple uses Value objects, not dict validation
        triple = Triple(
            s=triple_data["s"],
            p=triple_data["p"],
            o=triple_data["o"]
        )
        assert triple.s.value == "http://example.com/subject"
        assert triple.p.value == "http://example.com/predicate"
        assert triple.o.value == "Object value"
        assert triple.s.is_uri is True
        assert triple.p.is_uri is True
        assert triple.o.is_uri is False

    def test_triples_schema_contract(self, sample_message_data):
        """Test Triples (batch) schema contract"""
        # Arrange
        metadata = Metadata(**sample_message_data["Metadata"])
        triple = Triple(**sample_message_data["Triple"])
        triples_data = {
            "metadata": metadata,
            "triples": [triple]
        }
        # Act & Assert
        assert validate_schema_contract(Triples, triples_data)
        triples = Triples(**triples_data)
        assert triples.metadata.id == "test-doc-123"
        assert len(triples.triples) == 1
        assert triples.triples[0].s.value == "http://example.com/subject"

    def test_chunk_schema_contract(self, sample_message_data):
        """Test Chunk schema contract"""
        # Arrange
        metadata = Metadata(**sample_message_data["Metadata"])
        chunk_data = {
            "metadata": metadata,
            "chunk": b"This is a text chunk for processing"
        }
        # Act & Assert
        assert validate_schema_contract(Chunk, chunk_data)
        chunk = Chunk(**chunk_data)
        assert chunk.metadata.id == "test-doc-123"
        assert chunk.chunk == b"This is a text chunk for processing"

    def test_entity_context_schema_contract(self):
        """Test EntityContext schema contract"""
        # Arrange
        entity_value = Value(value="http://example.com/entity", is_uri=True, type="")
        entity_context_data = {
            "entity": entity_value,
            "context": "Context information about the entity"
        }
        # Act & Assert
        assert validate_schema_contract(EntityContext, entity_context_data)
        entity_context = EntityContext(**entity_context_data)
        assert entity_context.entity.value == "http://example.com/entity"
        assert entity_context.context == "Context information about the entity"

    def test_entity_contexts_batch_schema_contract(self, sample_message_data):
        """Test EntityContexts (batch) schema contract"""
        # Arrange
        metadata = Metadata(**sample_message_data["Metadata"])
        entity_value = Value(value="http://example.com/entity", is_uri=True, type="")
        entity_context = EntityContext(
            entity=entity_value,
            context="Entity context"
        )
        entity_contexts_data = {
            "metadata": metadata,
            "entities": [entity_context]
        }
        # Act & Assert
        assert validate_schema_contract(EntityContexts, entity_contexts_data)
        entity_contexts = EntityContexts(**entity_contexts_data)
        assert entity_contexts.metadata.id == "test-doc-123"
        assert len(entity_contexts.entities) == 1
        assert entity_contexts.entities[0].context == "Entity context"


@pytest.mark.contract
class TestMetadataMessageContracts:
    """Contract tests for Metadata and common message schemas"""

    def test_metadata_schema_contract(self, sample_message_data):
        """Test Metadata schema contract"""
        # Arrange
        metadata_data = sample_message_data["Metadata"]
        # Act & Assert
        assert validate_schema_contract(Metadata, metadata_data)
        metadata = Metadata(**metadata_data)
        assert metadata.id == "test-doc-123"
        assert metadata.user == "test_user"
        assert metadata.collection == "test_collection"
        assert isinstance(metadata.metadata, list)

    def test_metadata_with_triples_contract(self, sample_message_data):
        """Test Metadata with embedded triples contract"""
        # Arrange
        triple = Triple(**sample_message_data["Triple"])
        metadata_data = {
            "id": "doc-with-triples",
            "user": "test_user",
            "collection": "test_collection",
            "metadata": [triple]
        }
        # Act & Assert
        assert validate_schema_contract(Metadata, metadata_data)
        metadata = Metadata(**metadata_data)
        assert len(metadata.metadata) == 1
        assert metadata.metadata[0].s.value == "http://example.com/subject"

    def test_error_schema_contract(self):
        """Test Error schema contract"""
        # Arrange
        error_data = {
            "type": "validation-error",
            "message": "Invalid input data provided"
        }
        # Act & Assert
        assert validate_schema_contract(Error, error_data)
        error = Error(**error_data)
        assert error.type == "validation-error"
        assert error.message == "Invalid input data provided"


@pytest.mark.contract
class TestMessageRoutingContracts:
    """Contract tests for message routing and properties"""

    def test_message_property_contracts(self, message_properties):
        """Test standard message property contracts"""
        # Act & Assert
        required_properties = ["id", "routing_key", "timestamp", "source_service"]
        for prop in required_properties:
            assert prop in message_properties
            assert message_properties[prop] is not None
            assert isinstance(message_properties[prop], str)

    def test_message_id_format_contract(self, message_properties):
        """Test message ID format contract"""
        # Act & Assert
        message_id = message_properties["id"]
        assert isinstance(message_id, str)
        assert len(message_id) > 0
        # Message IDs should follow a consistent format
        assert "test-message-" in message_id

    def test_routing_key_format_contract(self, message_properties):
        """Test routing key format contract"""
        # Act & Assert
        routing_key = message_properties["routing_key"]
        assert isinstance(routing_key, str)
        assert "." in routing_key  # Should use dot notation
        assert routing_key.count(".") >= 2  # Should have at least 3 parts

    def test_correlation_id_contract(self, message_properties):
        """Test correlation ID contract for request/response tracking"""
        # Act & Assert
        correlation_id = message_properties.get("correlation_id")
        if correlation_id is not None:
            assert isinstance(correlation_id, str)
            assert len(correlation_id) > 0


@pytest.mark.contract
class TestSchemaEvolutionContracts:
    """Contract tests for schema evolution and backward compatibility"""

    def test_schema_backward_compatibility(self, schema_evolution_data):
        """Test schema backward compatibility"""
        # Test that v1 data can still be processed
        v1_request = schema_evolution_data["TextCompletionRequest_v1"]
        # Should work with current schema (optional fields default)
        request = TextCompletionRequest(**v1_request)
        assert request.system == "You are helpful."
        assert request.prompt == "Test prompt"

    def test_schema_forward_compatibility(self, schema_evolution_data):
        """Test schema forward compatibility with new fields"""
        # Test that v2 data works with additional fields
        v2_request = schema_evolution_data["TextCompletionRequest_v2"]
        # Current schema should handle new fields gracefully
        # (This would require actual schema versioning implementation)
        base_fields = {"system": v2_request["system"], "prompt": v2_request["prompt"]}
        request = TextCompletionRequest(**base_fields)
        assert request.system == "You are helpful."
        assert request.prompt == "Test prompt"

    def test_required_field_stability_contract(self):
        """Test that required fields remain stable across versions"""
        # These fields should never become optional or be removed
        required_fields = {
            "TextCompletionRequest": ["system", "prompt"],
            "TextCompletionResponse": ["error", "response", "model"],
            "DocumentRagQuery": ["query", "user", "collection"],
            "DocumentRagResponse": ["error", "response"],
            "AgentRequest": ["question", "history"],
            "AgentResponse": ["error"],
        }
        # Verify required fields are present in schema definitions
        for schema_name, fields in required_fields.items():
            # This would be implemented with actual schema introspection
            # For now, we verify by attempting to create instances
            assert len(fields) > 0  # Ensure we have defined required fields

@pytest.mark.contract
class TestSerializationContracts:
    """Contract tests for message serialization/deserialization"""

    def test_all_schemas_serialization_contract(self, schema_registry, sample_message_data):
        """Test serialization contract for all schemas"""
        # Test each schema in the registry
        for schema_name, schema_class in schema_registry.items():
            if schema_name in sample_message_data:
                # Skip Triple schema as it requires special handling with Value objects
                if schema_name == "Triple":
                    continue
                # Act & Assert
                data = sample_message_data[schema_name]
                assert serialize_deserialize_test(schema_class, data), f"Serialization failed for {schema_name}"

    def test_triple_serialization_contract(self, sample_message_data):
        """Test Triple schema serialization contract with Value objects"""
        # Arrange
        triple_data = sample_message_data["Triple"]
        # Act
        triple = Triple(
            s=triple_data["s"],
            p=triple_data["p"],
            o=triple_data["o"]
        )
        # Assert - Test that Value objects are properly constructed and accessible
        assert triple.s.value == "http://example.com/subject"
        assert triple.p.value == "http://example.com/predicate"
        assert triple.o.value == "Object value"
        assert isinstance(triple.s, Value)
        assert isinstance(triple.p, Value)
        assert isinstance(triple.o, Value)

    def test_nested_schema_serialization_contract(self, sample_message_data):
        """Test serialization of nested schemas"""
        # Test Triples (contains Metadata and Triple objects)
        metadata = Metadata(**sample_message_data["Metadata"])
        triple = Triple(**sample_message_data["Triple"])
        triples = Triples(metadata=metadata, triples=[triple])
        # Verify nested objects maintain their contracts
        assert triples.metadata.id == "test-doc-123"
        assert triples.triples[0].s.value == "http://example.com/subject"

    def test_array_field_serialization_contract(self):
        """Test serialization of array fields"""
        # Test AgentRequest with history array
        steps = [
            AgentStep(
                thought=f"Step {i}",
                action=f"action_{i}",
                arguments={f"param_{i}": f"value_{i}"},
                observation=f"Observation {i}"
            )
            for i in range(3)
        ]
        request = AgentRequest(
            question="Test with array",
            plan="Test plan",
            state="Test state",
            history=steps
        )
        # Verify array serialization maintains order and content
        assert len(request.history) == 3
        assert request.history[0].thought == "Step 0"
        assert request.history[2].action == "action_2"

    def test_optional_field_serialization_contract(self):
        """Test serialization contract for optional fields"""
        # Test with minimal required fields
        minimal_response = TextCompletionResponse(
            error=None,
            response="Test",
            in_token=None,  # Optional field
            out_token=None,  # Optional field
            model="test-model"
        )
        assert minimal_response.response == "Test"
        assert minimal_response.in_token is None
        assert minimal_response.out_token is None
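
The forward-compatibility test above picks the known fields out of the v2 payload by hand. A minimal sketch of doing the same thing generically is shown here. It uses a plain dataclass as a stand-in for the schema class; `CompletionRequest`, `build_ignoring_unknown`, and the `temperature` field are hypothetical names for illustration and are not part of the TrustGraph schema package.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class CompletionRequest:
    """Hypothetical stand-in for the current version of a request schema."""
    system: str
    prompt: str


def build_ignoring_unknown(cls, payload: Dict[str, Any]):
    """Construct cls from payload, dropping keys the class does not declare."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in payload.items() if k in known})


# A payload from a hypothetical newer producer that carries an extra field.
v2_payload = {
    "system": "You are helpful.",
    "prompt": "Test prompt",
    "temperature": 0.7,  # assumed newer field, unknown to CompletionRequest
}

request = build_ignoring_unknown(CompletionRequest, v2_payload)
```

Filtering on the declared field names keeps old consumers working when producers add fields, at the cost of silently discarding data the consumer does not understand.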

@ -0,0 +1,306 @@
"""
Contract tests for Cassandra Object Storage
These tests verify the message contracts and schema compatibility
for the objects storage processor.
"""
import pytest
import json
from pulsar.schema import AvroSchema
from trustgraph.schema import ExtractedObject, Metadata, RowSchema, Field
from trustgraph.storage.objects.cassandra.write import Processor

@pytest.mark.contract
class TestObjectsCassandraContracts:
    """Contract tests for Cassandra object storage messages"""

    def test_extracted_object_input_contract(self):
        """Test that ExtractedObject schema matches expected input format"""
        # Create test object with all required fields
        test_metadata = Metadata(
            id="test-doc-001",
            user="test_user",
            collection="test_collection",
            metadata=[]
        )
        test_object = ExtractedObject(
            metadata=test_metadata,
            schema_name="customer_records",
            values={
                "customer_id": "CUST123",
                "name": "Test Customer",
                "email": "test@example.com"
            },
            confidence=0.95,
            source_span="Customer data from document..."
        )
        # Verify all required fields are present
        assert hasattr(test_object, 'metadata')
        assert hasattr(test_object, 'schema_name')
        assert hasattr(test_object, 'values')
        assert hasattr(test_object, 'confidence')
        assert hasattr(test_object, 'source_span')
        # Verify metadata structure
        assert hasattr(test_object.metadata, 'id')
        assert hasattr(test_object.metadata, 'user')
        assert hasattr(test_object.metadata, 'collection')
        assert hasattr(test_object.metadata, 'metadata')
        # Verify types
        assert isinstance(test_object.schema_name, str)
        assert isinstance(test_object.values, dict)
        assert isinstance(test_object.confidence, float)
        assert isinstance(test_object.source_span, str)

    def test_row_schema_structure_contract(self):
        """Test RowSchema structure used for table definitions"""
        # Create test schema
        test_fields = [
            Field(
                name="id",
                type="string",
                size=50,
                primary=True,
                description="Primary key",
                required=True,
                enum_values=[],
                indexed=False
            ),
            Field(
                name="status",
                type="string",
                size=20,
                primary=False,
                description="Status field",
                required=False,
                enum_values=["active", "inactive", "pending"],
                indexed=True
            )
        ]
        test_schema = RowSchema(
            name="test_table",
            description="Test table schema",
            fields=test_fields
        )
        # Verify schema structure
        assert hasattr(test_schema, 'name')
        assert hasattr(test_schema, 'description')
        assert hasattr(test_schema, 'fields')
        assert isinstance(test_schema.fields, list)
        # Verify field structure
        for field in test_schema.fields:
            assert hasattr(field, 'name')
            assert hasattr(field, 'type')
            assert hasattr(field, 'size')
            assert hasattr(field, 'primary')
            assert hasattr(field, 'description')
            assert hasattr(field, 'required')
            assert hasattr(field, 'enum_values')
            assert hasattr(field, 'indexed')

    def test_schema_config_format_contract(self):
        """Test the expected configuration format for schemas"""
        # Define expected config structure
        config_format = {
            "schema": {
                "table_name": json.dumps({
                    "name": "table_name",
                    "description": "Table description",
                    "fields": [
                        {
                            "name": "field_name",
                            "type": "string",
                            "size": 0,
                            "primary_key": True,
                            "description": "Field description",
                            "required": True,
                            "enum": [],
                            "indexed": False
                        }
                    ]
                })
            }
        }
        # Verify config can be parsed
        schema_json = json.loads(config_format["schema"]["table_name"])
        assert "name" in schema_json
        assert "fields" in schema_json
        assert isinstance(schema_json["fields"], list)
        # Verify field format
        field = schema_json["fields"][0]
        required_field_keys = {"name", "type"}
        optional_field_keys = {"size", "primary_key", "description", "required", "enum", "indexed"}
        assert required_field_keys.issubset(field.keys())
        assert set(field.keys()).issubset(required_field_keys | optional_field_keys)

    def test_cassandra_type_mapping_contract(self):
        """Test that all supported field types have Cassandra mappings"""
        processor = Processor.__new__(Processor)
        # All field types that should be supported
        supported_types = [
            ("string", "text"),
            ("integer", "int"),  # or bigint based on size
            ("float", "float"),  # or double based on size
            ("boolean", "boolean"),
            ("timestamp", "timestamp"),
            ("date", "date"),
            ("time", "time"),
            ("uuid", "uuid")
        ]
        for field_type, expected_cassandra_type in supported_types:
            cassandra_type = processor.get_cassandra_type(field_type)
            # For integer and float, the exact type depends on size
            if field_type in ["integer", "float"]:
                assert cassandra_type in ["int", "bigint", "float", "double"]
            else:
                assert cassandra_type == expected_cassandra_type

    def test_value_conversion_contract(self):
        """Test value conversion for all supported types"""
        processor = Processor.__new__(Processor)
        # Test conversions maintain data integrity
        test_cases = [
            # (input_value, field_type, expected_output, expected_type)
            ("123", "integer", 123, int),
            ("123.45", "float", 123.45, float),
            ("true", "boolean", True, bool),
            ("false", "boolean", False, bool),
            ("test string", "string", "test string", str),
            (None, "string", None, type(None)),
        ]
        for input_val, field_type, expected_val, expected_type in test_cases:
            result = processor.convert_value(input_val, field_type)
            assert result == expected_val
            assert isinstance(result, expected_type) or result is None

    def test_extracted_object_serialization_contract(self):
        """Test that ExtractedObject can be serialized/deserialized correctly"""
        # Create test object
        original = ExtractedObject(
            metadata=Metadata(
                id="serial-001",
                user="test_user",
                collection="test_coll",
                metadata=[]
            ),
            schema_name="test_schema",
            values={"field1": "value1", "field2": "123"},
            confidence=0.85,
            source_span="Test span"
        )
        # Test serialization using schema
        schema = AvroSchema(ExtractedObject)
        # Encode and decode
        encoded = schema.encode(original)
        decoded = schema.decode(encoded)
        # Verify round-trip
        assert decoded.metadata.id == original.metadata.id
        assert decoded.metadata.user == original.metadata.user
        assert decoded.metadata.collection == original.metadata.collection
        assert decoded.schema_name == original.schema_name
        assert decoded.values == original.values
        assert decoded.confidence == original.confidence
        assert decoded.source_span == original.source_span

    def test_cassandra_table_naming_contract(self):
        """Test Cassandra naming conventions and constraints"""
        processor = Processor.__new__(Processor)
        # Test table naming (always gets o_ prefix)
        table_test_names = [
            ("simple_name", "o_simple_name"),
            ("Name-With-Dashes", "o_name_with_dashes"),
            ("name.with.dots", "o_name_with_dots"),
            ("123_numbers", "o_123_numbers"),
            ("special!@#chars", "o_special___chars"),  # 3 special chars become 3 underscores
            ("UPPERCASE", "o_uppercase"),
            ("CamelCase", "o_camelcase"),
            ("", "o_"),  # Edge case - empty string becomes o_
        ]
        for input_name, expected_name in table_test_names:
            result = processor.sanitize_table(input_name)
            assert result == expected_name
            # Verify result is valid Cassandra identifier (starts with letter)
            assert result.startswith('o_')
            assert result.replace('o_', '').replace('_', '').isalnum() or result == 'o_'
        # Test regular name sanitization (only adds o_ prefix if starts with number)
        name_test_cases = [
            ("simple_name", "simple_name"),
            ("Name-With-Dashes", "name_with_dashes"),
            ("name.with.dots", "name_with_dots"),
            ("123_numbers", "o_123_numbers"),  # Only this gets o_ prefix
            ("special!@#chars", "special___chars"),  # 3 special chars become 3 underscores
            ("UPPERCASE", "uppercase"),
            ("CamelCase", "camelcase"),
        ]
        for input_name, expected_name in name_test_cases:
            result = processor.sanitize_name(input_name)
            assert result == expected_name

    def test_primary_key_structure_contract(self):
        """Test that primary key structure follows Cassandra best practices"""
        # Verify partition key always includes collection
        processor = Processor.__new__(Processor)
        processor.schemas = {}
        processor.known_keyspaces = set()
        processor.known_tables = {}
        processor.session = None
        # Test schema with primary key
        schema_with_pk = RowSchema(
            name="test",
            fields=[
                Field(name="id", type="string", primary=True),
                Field(name="data", type="string")
            ]
        )
        # The primary key should be ((collection, id))
        # This is verified in the implementation where collection
        # is always first in the partition key

    def test_metadata_field_usage_contract(self):
        """Test that metadata fields are used correctly in storage"""
        # Create test object
        test_obj = ExtractedObject(
            metadata=Metadata(
                id="meta-001",
                user="user123",  # -> keyspace
                collection="coll456",  # -> partition key
                metadata=[{"key": "value"}]
            ),
            schema_name="table789",  # -> table name
            values={"field": "value"},
            confidence=0.9,
            source_span="Source"
        )
        # Verify mapping contract:
        # - metadata.user -> Cassandra keyspace
        # - schema_name -> Cassandra table
        # - metadata.collection -> Part of primary key
        assert test_obj.metadata.user  # Required for keyspace
        assert test_obj.schema_name  # Required for table
        assert test_obj.metadata.collection  # Required for partition key
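
The naming expectations in `test_cassandra_table_naming_contract` can be satisfied by a small pair of functions. The sketch below is inferred only from the test tables above and is not the `Processor` implementation; the module-level function names and the regular expression are assumptions made for illustration.

```python
import re

# Assumed rule: anything outside ASCII letters, digits and underscore
# is replaced by a single underscore, one per offending character.
_UNSAFE = re.compile(r"[^a-zA-Z0-9_]")


def sanitize_name(name: str) -> str:
    """Lowercase the name, replace unsafe characters, and add an
    'o_' prefix only when the result starts with a digit."""
    safe = _UNSAFE.sub("_", name).lower()
    if safe and safe[0].isdigit():
        safe = "o_" + safe
    return safe


def sanitize_table(name: str) -> str:
    """Lowercase the name, replace unsafe characters, and always
    add an 'o_' prefix, so an empty input becomes 'o_'."""
    return "o_" + _UNSAFE.sub("_", name).lower()
```

Always prefixing table names guarantees that the identifier starts with a letter, whatever the schema name was. Column-style names only need the prefix in the leading-digit case.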

@ -0,0 +1,308 @@
"""
Contract tests for Structured Data Pulsar Message Schemas
These tests verify the contracts for all structured data Pulsar message schemas,
ensuring schema compatibility, serialization contracts, and service interface stability.
Following the TEST_STRATEGY.md approach for contract testing.
"""
import pytest
import json
from typing import Dict, Any
from trustgraph.schema import (
    StructuredDataSubmission, ExtractedObject,
    NLPToStructuredQueryRequest, NLPToStructuredQueryResponse,
    StructuredQueryRequest, StructuredQueryResponse,
    StructuredObjectEmbedding, Field, RowSchema,
    Metadata, Error, Value
)
from .conftest import serialize_deserialize_test

@pytest.mark.contract
class TestStructuredDataSchemaContracts:
    """Contract tests for structured data schemas"""

    def test_field_schema_contract(self):
        """Test enhanced Field schema contract"""
        # Arrange & Act - create Field instance directly
        field = Field(
            name="customer_id",
            type="string",
            size=0,
            primary=True,
            description="Unique customer identifier",
            required=True,
            enum_values=[],
            indexed=True
        )
        # Assert - test field properties
        assert field.name == "customer_id"
        assert field.type == "string"
        assert field.primary is True
        assert field.indexed is True
        assert isinstance(field.enum_values, list)
        assert len(field.enum_values) == 0
        # Test with enum values
        field_with_enum = Field(
            name="status",
            type="string",
            size=0,
            primary=False,
            description="Status field",
            required=False,
            enum_values=["active", "inactive"],
            indexed=True
        )
        assert len(field_with_enum.enum_values) == 2
        assert "active" in field_with_enum.enum_values

    def test_row_schema_contract(self):
        """Test RowSchema contract"""
        # Arrange & Act
        field = Field(
            name="email",
            type="string",
            size=255,
            primary=False,
            description="Customer email",
            required=True,
            enum_values=[],
            indexed=True
        )
        schema = RowSchema(
            name="customers",
            description="Customer records schema",
            fields=[field]
        )
        # Assert
        assert schema.name == "customers"
        assert schema.description == "Customer records schema"
        assert len(schema.fields) == 1
        assert schema.fields[0].name == "email"
        assert schema.fields[0].indexed is True

    def test_structured_data_submission_contract(self):
        """Test StructuredDataSubmission schema contract"""
        # Arrange
        metadata = Metadata(
            id="structured-data-001",
            user="test_user",
            collection="test_collection",
            metadata=[]
        )
        # Act
        submission = StructuredDataSubmission(
            metadata=metadata,
            format="csv",
            schema_name="customer_records",
            data=b"id,name,email\n1,John,john@example.com",
            options={"delimiter": ",", "header": "true"}
        )
        # Assert
        assert submission.format == "csv"
        assert submission.schema_name == "customer_records"
        assert submission.options["delimiter"] == ","
        assert submission.metadata.id == "structured-data-001"
        assert len(submission.data) > 0

    def test_extracted_object_contract(self):
        """Test ExtractedObject schema contract"""
        # Arrange
        metadata = Metadata(
            id="extracted-obj-001",
            user="test_user",
            collection="test_collection",
            metadata=[]
        )
        # Act
        obj = ExtractedObject(
            metadata=metadata,
            schema_name="customer_records",
            values={"id": "123", "name": "John Doe", "email": "john@example.com"},
            confidence=0.95,
            source_span="John Doe (john@example.com) customer ID 123"
        )
        # Assert
        assert obj.schema_name == "customer_records"
        assert obj.values["name"] == "John Doe"
        assert obj.confidence == 0.95
        assert len(obj.source_span) > 0
        assert obj.metadata.id == "extracted-obj-001"

@pytest.mark.contract
class TestStructuredQueryServiceContracts:
    """Contract tests for structured query services"""

    def test_nlp_to_structured_query_request_contract(self):
        """Test NLPToStructuredQueryRequest schema contract"""
        # Act
        request = NLPToStructuredQueryRequest(
            natural_language_query="Show me all customers who registered last month",
            max_results=100,
            context_hints={"time_range": "last_month", "entity_type": "customer"}
        )
        # Assert
        assert "customers" in request.natural_language_query
        assert request.max_results == 100
        assert request.context_hints["time_range"] == "last_month"

    def test_nlp_to_structured_query_response_contract(self):
        """Test NLPToStructuredQueryResponse schema contract"""
        # Act
        response = NLPToStructuredQueryResponse(
            error=None,
            graphql_query="query { customers(filter: {registered: {gte: \"2024-01-01\"}}) { id name email } }",
            variables={"start_date": "2024-01-01"},
            detected_schemas=["customers"],
            confidence=0.92
        )
        # Assert
        assert response.error is None
        assert "customers" in response.graphql_query
        assert response.detected_schemas[0] == "customers"
        assert response.confidence > 0.9

    def test_structured_query_request_contract(self):
        """Test StructuredQueryRequest schema contract"""
        # Act
        request = StructuredQueryRequest(
            query="query GetCustomers($limit: Int) { customers(limit: $limit) { id name email } }",
            variables={"limit": "10"},
            operation_name="GetCustomers"
        )
        # Assert
        assert "customers" in request.query
        assert request.variables["limit"] == "10"
        assert request.operation_name == "GetCustomers"

    def test_structured_query_response_contract(self):
        """Test StructuredQueryResponse schema contract"""
        # Act
        response = StructuredQueryResponse(
            error=None,
            data='{"customers": [{"id": "1", "name": "John", "email": "john@example.com"}]}',
            errors=[]
        )
        # Assert
        assert response.error is None
        assert "customers" in response.data
        assert len(response.errors) == 0

    def test_structured_query_response_with_errors_contract(self):
        """Test StructuredQueryResponse with GraphQL errors contract"""
        # Act
        response = StructuredQueryResponse(
            error=None,
            data=None,
            errors=["Field 'invalid_field' not found in schema 'customers'"]
        )
        # Assert
        assert response.data is None
        assert len(response.errors) == 1
        assert "invalid_field" in response.errors[0]

@pytest.mark.contract
class TestStructuredEmbeddingsContracts:
    """Contract tests for structured object embeddings"""

    def test_structured_object_embedding_contract(self):
        """Test StructuredObjectEmbedding schema contract"""
        # Arrange
        metadata = Metadata(
            id="struct-embed-001",
            user="test_user",
            collection="test_collection",
            metadata=[]
        )
        # Act
        embedding = StructuredObjectEmbedding(
            metadata=metadata,
            vectors=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
            schema_name="customer_records",
            object_id="customer_123",
            field_embeddings={
                "name": [0.1, 0.2, 0.3],
                "email": [0.4, 0.5, 0.6]
            }
        )
        # Assert
        assert embedding.schema_name == "customer_records"
        assert embedding.object_id == "customer_123"
        assert len(embedding.vectors) == 2
        assert len(embedding.field_embeddings) == 2
        assert "name" in embedding.field_embeddings

@pytest.mark.contract
class TestStructuredDataSerializationContracts:
    """Contract tests for structured data serialization/deserialization"""

    def test_structured_data_submission_serialization(self):
        """Test StructuredDataSubmission serialization contract"""
        # Arrange
        metadata = Metadata(id="test", user="user", collection="col", metadata=[])
        submission_data = {
            "metadata": metadata,
            "format": "json",
            "schema_name": "test_schema",
            "data": b'{"test": "data"}',
            "options": {"encoding": "utf-8"}
        }
        # Act & Assert
        assert serialize_deserialize_test(StructuredDataSubmission, submission_data)

    def test_extracted_object_serialization(self):
        """Test ExtractedObject serialization contract"""
        # Arrange
        metadata = Metadata(id="test", user="user", collection="col", metadata=[])
        object_data = {
            "metadata": metadata,
            "schema_name": "test_schema",
            "values": {"field1": "value1"},
            "confidence": 0.8,
            "source_span": "test span"
        }
        # Act & Assert
        assert serialize_deserialize_test(ExtractedObject, object_data)

    def test_nlp_query_serialization(self):
        """Test NLP query request/response serialization contract"""
        # Test request
        request_data = {
            "natural_language_query": "test query",
            "max_results": 10,
            "context_hints": {}
        }
        assert serialize_deserialize_test(NLPToStructuredQueryRequest, request_data)
        # Test response
        response_data = {
            "error": None,
            "graphql_query": "query { test }",
            "variables": {},
            "detected_schemas": ["test"],
            "confidence": 0.9
        }
        assert serialize_deserialize_test(NLPToStructuredQueryResponse, response_data)