Contract Tests for TrustGraph

This directory contains contract tests that verify service interface contracts, message schemas, and API compatibility across the TrustGraph microservices architecture.

Overview

Contract tests ensure that:

Message schemas remain compatible across service versions
API interfaces stay stable for consumers
Service communication contracts are maintained
Schema evolution doesn't break existing integrations

Test Categories

1. Pulsar Message Schema Contracts (`test_message_contracts.py`)

Tests the contracts for all Pulsar message schemas used in TrustGraph service communication.

Coverage:

✅ Text Completion Messages: TextCompletionRequest ↔ TextCompletionResponse
✅ Document RAG Messages: DocumentRagQuery ↔ DocumentRagResponse
✅ Agent Messages: AgentRequest ↔ AgentResponse ↔ AgentStep
✅ Graph Messages: Chunk → Triple → Triples → EntityContext
✅ Common Messages: Metadata, Value, Error schemas
✅ Message Routing: Properties, correlation IDs, routing keys
✅ Schema Evolution: Backward/forward compatibility testing
✅ Serialization: Schema validation and data integrity

Key Features:

Schema Validation: Ensures all message schemas accept valid data and reject invalid data
Field Contracts: Validates required vs optional fields and type constraints
Nested Schema Support: Tests complex schemas with embedded objects and arrays
Routing Contracts: Validates message properties and routing conventions
Evolution Testing: Backward compatibility and schema versioning support

Running Contract Tests

Run All Contract Tests

pytest tests/contract/ -m contract

Run Specific Contract Test Categories

# Message schema contracts
pytest tests/contract/test_message_contracts.py -v

# Specific test class
pytest tests/contract/test_message_contracts.py::TestTextCompletionMessageContracts -v

# Schema evolution tests
pytest tests/contract/test_message_contracts.py::TestSchemaEvolutionContracts -v

Run with Coverage

pytest tests/contract/ -m contract --cov=trustgraph.schema --cov-report=html
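
The `-m contract` filter selects tests decorated with `@pytest.mark.contract`. pytest warns about markers it does not know, so the marker has to be registered somewhere. The following is a minimal sketch of registering it from a `conftest.py`; the description string is an assumption, and the repository may already register the marker elsewhere (for example in `pytest.ini`).

```python
# conftest.py (sketch): register the custom "contract" marker so that
# `pytest -m contract` does not emit unknown-marker warnings.

def pytest_configure(config):
    # addinivalue_line appends one line to the "markers" ini option.
    config.addinivalue_line(
        "markers",
        "contract: service interface and message schema contract tests",
    )
```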

Contract Test Patterns

In the patterns below, SchemaClass, CurrentSchema, "SchemaName", required_field, expected_type and expected_value are placeholders for the schema under test. validate_schema_contract and serialize_deserialize_test are helper functions referenced by the tests.

1. Schema Validation Pattern

@pytest.mark.contract
def test_schema_contract(self, sample_message_data):
    """Test that schema accepts valid data and rejects invalid data"""
    # Arrange
    valid_data = sample_message_data["SchemaName"]
    
    # Act & Assert
    assert validate_schema_contract(SchemaClass, valid_data)
    
    # Test field constraints
    instance = SchemaClass(**valid_data)
    assert hasattr(instance, 'required_field')
    assert isinstance(instance.required_field, expected_type)
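
The pattern above relies on a `validate_schema_contract` helper whose body is not shown in this README. One possible shape, assuming schema classes can be built from keyword arguments like dataclasses, is sketched here. `ExampleRequest` is a hypothetical stand-in, and the real helper may behave differently.

```python
from dataclasses import dataclass, fields


@dataclass
class ExampleRequest:
    """Hypothetical stand-in for a real schema class."""
    system: str
    prompt: str


def validate_schema_contract(schema_class, data):
    """Return True if `data` builds an instance that keeps every value.

    Assumption: the schema is dataclass-like. Dataclasses do not enforce
    type annotations, so type checks still belong in the test itself.
    """
    try:
        instance = schema_class(**data)
    except TypeError:
        # Raised for a missing required field or an unknown field name.
        return False
    return all(
        getattr(instance, f.name) == data[f.name]
        for f in fields(instance)
        if f.name in data
    )
```

With this sketch, a payload that omits `prompt` or adds an unknown key is rejected, while a complete payload is accepted.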

2. Serialization Contract Pattern

@pytest.mark.contract
def test_serialization_contract(self, sample_message_data):
    """Test schema serialization/deserialization contracts"""
    # Arrange
    data = sample_message_data["SchemaName"]
    
    # Act & Assert
    assert serialize_deserialize_test(SchemaClass, data)
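
A round-trip helper such as `serialize_deserialize_test` could be sketched as follows. This version uses JSON as a stand-in wire format and a hypothetical `ExampleResponse` dataclass. The actual tests may use the Pulsar schema encoding instead, so treat this as an illustration of the round-trip idea only.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ExampleResponse:
    """Hypothetical stand-in for a real schema class."""
    response: str
    model: str


def serialize_deserialize_test(schema_class, data):
    """Encode an instance to bytes, decode it, and compare with the original."""
    original = schema_class(**data)
    # JSON is an assumed stand-in for the real message encoding.
    encoded = json.dumps(asdict(original)).encode("utf-8")
    decoded = schema_class(**json.loads(encoded.decode("utf-8")))
    # Dataclass equality compares all fields.
    return decoded == original
```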

3. Evolution Contract Pattern

@pytest.mark.contract
def test_backward_compatibility_contract(self, schema_evolution_data):
    """Test that new schema versions accept old data formats"""
    # Arrange
    old_version_data = schema_evolution_data["SchemaName_v1"]
    
    # Act - Should work with current schema
    instance = CurrentSchema(**old_version_data)
    
    # Assert - Required fields maintained
    assert instance.required_field == expected_value
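
As a concrete instance of the evolution pattern, the sketch below adds an optional field to a schema and shows that a payload written for the older version still loads. `DocumentQueryV2` and the sample values are hypothetical and only illustrate the mechanism.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocumentQueryV2:
    """Hypothetical current schema: `doc_limit` was added after v1."""
    query: str
    user: str
    collection: str
    doc_limit: Optional[int] = None  # new field, optional with a default


# Payload as a v1 producer would have sent it (no doc_limit).
old_version_data = {
    "query": "What is X?",
    "user": "alice",
    "collection": "docs",
}

# The current schema accepts the old payload and fills in the default.
instance = DocumentQueryV2(**old_version_data)
```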

Schema Registry

The contract tests maintain a registry of the TrustGraph schemas they cover:

schema_registry = {
    # Text Completion
    "TextCompletionRequest": TextCompletionRequest,
    "TextCompletionResponse": TextCompletionResponse,
    
    # Document RAG
    "DocumentRagQuery": DocumentRagQuery,
    "DocumentRagResponse": DocumentRagResponse,
    
    # Agent
    "AgentRequest": AgentRequest,
    "AgentResponse": AgentResponse,
    
    # Graph/Knowledge
    "Chunk": Chunk,
    "Triple": Triple,
    "Triples": Triples,
    "Value": Value,
    
    # Common
    "Metadata": Metadata,
    "Error": Error,
}
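
A registry like this is most useful when it is checked against the sample data, so that a newly registered schema cannot be left without a fixture. The sketch below shows one way to do that. The two schema classes and the `check_registry` function are hypothetical and stand in for the real registry contents.

```python
from dataclasses import dataclass


@dataclass
class ExampleQuery:
    """Hypothetical stand-in for a registered schema."""
    query: str


@dataclass
class ExampleError:
    """Hypothetical stand-in for a registered schema."""
    type: str
    message: str


schema_registry = {"ExampleQuery": ExampleQuery, "ExampleError": ExampleError}

sample_message_data = {
    "ExampleQuery": {"query": "What is X?"},
    "ExampleError": {"type": "timeout", "message": "request timed out"},
}


def check_registry(registry, samples):
    """Return a list of problems; an empty list means the registry is consistent."""
    problems = []
    for name, schema_class in registry.items():
        if schema_class.__name__ != name:
            problems.append(f"{name}: key does not match class name")
        if name not in samples:
            problems.append(f"{name}: no sample data")
            continue
        try:
            schema_class(**samples[name])
        except TypeError as exc:
            problems.append(f"{name}: sample data rejected ({exc})")
    return problems
```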

Message Contract Specifications

Text Completion Service Contract

TextCompletionRequest:
  required_fields: [system, prompt]
  field_types:
    system: string
    prompt: string

TextCompletionResponse:
  required_fields: [error, response, model]
  field_types:
    error: Error | null
    response: string | null
    in_token: integer | null
    out_token: integer | null
    model: string

Document RAG Service Contract

DocumentRagQuery:
  required_fields: [query, user, collection]
  field_types:
    query: string
    user: string
    collection: string
    doc_limit: integer

DocumentRagResponse:
  required_fields: [error, response]
  field_types:
    error: Error | null
    response: string | null

Agent Service Contract

AgentRequest:
  required_fields: [question, history]
  field_types:
    question: string
    plan: string
    state: string
    history: Array<AgentStep>

AgentResponse:
  required_fields: [error]
  field_types:
    answer: string | null
    error: Error | null
    thought: string | null
    observation: string | null
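
Specifications in this form can also be checked mechanically. The sketch below encodes the TextCompletionRequest contract from above as a Python dictionary and reports violations for a given payload. The dictionary layout and the `contract_violations` function are assumptions made for illustration, not part of the test suite.

```python
# TextCompletionRequest contract, transcribed from the specification above.
TEXT_COMPLETION_REQUEST_SPEC = {
    "required_fields": ["system", "prompt"],
    "field_types": {"system": str, "prompt": str},
}


def contract_violations(spec, payload):
    """Return human-readable violations of `spec` found in `payload`."""
    violations = []
    # Every required field must be present.
    for name in spec["required_fields"]:
        if name not in payload:
            violations.append(f"missing required field: {name}")
    # Every present field must have the declared type.
    for name, expected in spec["field_types"].items():
        if name in payload and not isinstance(payload[name], expected):
            actual = type(payload[name]).__name__
            violations.append(f"{name}: expected {expected.__name__}, got {actual}")
    return violations
```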

Best Practices

Contract Test Design

Test Both Valid and Invalid Data: Ensure schemas accept valid data and reject invalid data
Verify Field Constraints: Test type constraints, required vs optional fields
Test Nested Schemas: Validate complex objects with embedded schemas
Test Array Fields: Ensure array serialization maintains order and content
Test Optional Fields: Verify optional field handling in serialization

Schema Evolution

Backward Compatibility: New schema versions must accept old message formats
Required Field Stability: Required fields should never become optional or be removed
Additive Changes: New fields should be optional to maintain compatibility
Deprecation Strategy: Plan a deprecation path before removing or changing any schema field
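
The evolution rules above can be turned into an automated comparison between two versions of a schema. The sketch below assumes dataclass-based schemas. The version classes and the `breaking_changes` function are hypothetical.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Optional


def required_fields(schema_class):
    """Names of fields that have neither a default nor a default factory."""
    return {
        f.name
        for f in fields(schema_class)
        if f.default is MISSING and f.default_factory is MISSING
    }


def breaking_changes(old, new):
    """List rule violations when moving from schema `old` to schema `new`."""
    old_all = {f.name for f in fields(old)}
    new_all = {f.name for f in fields(new)}
    old_req, new_req = required_fields(old), required_fields(new)

    problems = [f"removed field: {n}" for n in sorted(old_all - new_all)]
    # Required fields should stay required.
    problems += [
        f"required field became optional: {n}"
        for n in sorted((old_req - new_req) & new_all)
    ]
    # New fields must be optional so that old messages still load.
    problems += [f"new required field: {n}" for n in sorted(new_req - old_all)]
    return problems


@dataclass
class RequestV1:
    question: str


@dataclass
class RequestV2Good:
    question: str
    plan: Optional[str] = None  # additive and optional


@dataclass
class RequestV2Bad:
    state: str                      # new and required
    question: Optional[str] = None  # no longer required
```

Here `breaking_changes(RequestV1, RequestV2Good)` returns an empty list, while the comparison against `RequestV2Bad` reports both violations.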

Error Handling

Error Schema Consistency: All error responses use a consistent Error schema
Error Type Contracts: Error types follow naming conventions
Error Message Format: Error messages provide actionable information

Adding New Contract Tests

When adding new message schemas or modifying existing ones:

Add to Schema Registry: Update conftest.py schema registry
Add Sample Data: Create valid sample data in conftest.py
Create Contract Tests: Follow existing patterns for validation
Test Evolution: Add backward compatibility tests
Update Documentation: Document schema contracts in this README

Integration with CI/CD

Contract tests should be run:

On every commit to detect breaking changes early
Before releases to ensure API stability
On schema changes to validate compatibility
On dependency updates to catch breaking changes

# CI/CD pipeline command
pytest tests/contract/ -m contract --junitxml=contract-test-results.xml
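
The JUnit XML report written by `--junitxml` can be post-processed in a pipeline step, for example to print a one-line summary. The sketch below sums the counters on each `testsuite` element using only the standard library. The `summarize_junit` function is hypothetical, and the sample report is hand-written for illustration.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text):
    """Sum the tests, failures and errors counters over all test suites."""
    root = ET.fromstring(xml_text)
    totals = {"tests": 0, "failures": 0, "errors": 0}
    # iter() also visits the root, so a bare <testsuite> root works too.
    for suite in root.iter("testsuite"):
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    return totals


# Hand-written sample, not real output from the TrustGraph test suite.
sample_report = (
    '<testsuites>'
    '<testsuite name="pytest" tests="3" failures="1" errors="0" skipped="0"/>'
    '</testsuites>'
)
summary = summarize_junit(sample_report)
```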

Contract Test Results

Contract tests provide:

✅ Schema Compatibility Reports: Which schemas pass/fail validation
✅ Breaking Change Detection: Identifies contract violations
✅ Evolution Validation: Confirms backward compatibility
✅ Field Constraint Verification: Validates data type contracts

Together, these checks ensure that TrustGraph services can evolve independently while maintaining stable, compatible interfaces for all service communication.