This specification describes the implementation of a GraphQL query interface for TrustGraph's structured data storage in Apache Cassandra. Building upon the structured data capabilities outlined in the structured-data.md specification, this document details how GraphQL queries will be executed against Cassandra tables containing extracted and ingested structured objects.
The GraphQL query service will provide a flexible, type-safe interface for querying structured data stored in Cassandra. It will dynamically adapt to schema changes, support complex queries including relationships between objects, and integrate seamlessly with TrustGraph's existing message-based architecture.
## Goals
- **Dynamic Schema Support**: Automatically adapt to schema changes in configuration without service restarts
- **GraphQL Standards Compliance**: Provide a standard GraphQL interface compatible with existing GraphQL tooling and clients
- **Efficient Cassandra Queries**: Translate GraphQL queries into efficient Cassandra CQL queries respecting partition keys and indexes
- **Relationship Resolution**: Support GraphQL field resolvers for relationships between different object types
- **Type Safety**: Ensure type-safe query execution and response generation based on schema definitions
- **Scalable Performance**: Handle concurrent queries efficiently with proper connection pooling and query optimization
- **Request/Response Integration**: Maintain compatibility with TrustGraph's Pulsar-based request/response pattern
- **Error Handling**: Provide comprehensive error reporting for schema mismatches, query errors, and data validation issues
## Background
The structured data storage implementation (trustgraph-flow/trustgraph/storage/objects/cassandra/) writes objects to Cassandra tables based on schema definitions stored in TrustGraph's configuration system. These tables use a composite partition key structure with collection and schema-defined primary keys, enabling efficient queries within collections.
Current limitations that this specification addresses:
- No query interface for the structured data stored in Cassandra
- Inability to leverage GraphQL's powerful query capabilities for structured data
- Missing support for relationship traversal between related objects
- Lack of a standardized query language for structured data access
The GraphQL query service will bridge these gaps by:
- Providing a standard GraphQL interface for querying Cassandra tables
- Dynamically generating GraphQL schemas from TrustGraph configuration
- Efficiently translating GraphQL queries to Cassandra CQL
- Supporting relationship resolution through field resolvers
## Technical Design
### Architecture
The GraphQL query service will be implemented as a new TrustGraph flow processor following established patterns:
- Support both single object and list relationships
### Query Execution Flow
1.**Request Reception**:
- Receive ObjectsQueryRequest from Pulsar
- Extract GraphQL query string and variables
- Identify user and collection context
2.**Query Validation**:
- Parse GraphQL query using Strawberry
- Validate against current schema
- Check field selections and argument types
3.**CQL Generation**:
- Analyze GraphQL selections
- Build CQL query with proper WHERE clauses
- Include collection in partition key
- Apply filters based on GraphQL arguments
4.**Query Execution**:
- Execute CQL query against Cassandra
- Map results to GraphQL response structure
- Resolve any relationship fields
- Format response according to GraphQL spec
5.**Response Delivery**:
- Create ObjectsQueryResponse with results
- Include any execution errors
- Send response via Pulsar with correlation ID
### Data Models
> **Note**: An existing StructuredQueryRequest/Response schema exists in `trustgraph-base/trustgraph/schema/services/structured_query.py`. However, it lacks critical fields (user, collection) and uses suboptimal types. The schemas below represent the recommended evolution, which should either replace the existing schemas or be created as new ObjectsQueryRequest/Response types.
#### Request Schema (ObjectsQueryRequest)
```python
from pulsar.schema import Record, String, Map, Array
class ObjectsQueryRequest(Record):
user = String() # Cassandra keyspace (follows pattern from TriplesQueryRequest)
collection = String() # Data collection identifier (required for partition key)
query = String() # GraphQL query string
variables = Map(String()) # GraphQL variables (consider enhancing to support all JSON types)
operation_name = String() # Operation to execute for multi-operation documents
```
**Rationale for changes from existing StructuredQueryRequest:**
- Added `user` and `collection` fields to match other query services pattern
- These fields are essential for identifying the Cassandra keyspace and collection
- Variables remain as Map(String()) for now but should ideally support all JSON types
#### Response Schema (ObjectsQueryResponse)
```python
from pulsar.schema import Record, String, Array
from ..core.primitives import Error
class GraphQLError(Record):
message = String()
path = Array(String()) # Path to the field that caused the error