trustgraph/docs/tech-specs/pubsub.md

964 lines
33 KiB
Markdown
Raw Normal View History

---
layout: default
title: "Pub/Sub Infrastructure"
parent: "Tech Specs"
---
# Pub/Sub Infrastructure
## Overview
This document catalogs all connections between the TrustGraph codebase and the pub/sub infrastructure. Currently, the system is hardcoded to use Apache Pulsar. This analysis identifies all integration points to inform future refactoring toward a configurable pub/sub abstraction.
## Current State: Pulsar Integration Points
### 1. Direct Pulsar Client Usage
**Location:** `trustgraph-flow/trustgraph/gateway/service.py`
The API gateway directly imports and instantiates the Pulsar client:
- **Line 20:** `import pulsar`
- **Lines 54-61:** Direct instantiation of `pulsar.Client()` with optional `pulsar.AuthenticationToken()`
- **Lines 33-35:** Default Pulsar host configuration from environment variables
- **Lines 178-192:** CLI arguments for `--pulsar-host`, `--pulsar-api-key`, and `--pulsar-listener`
- **Lines 78, 124:** Passes `pulsar_client` to `ConfigReceiver` and `DispatcherManager`
This is the only location that directly instantiates a Pulsar client outside of the abstraction layer.
### 2. Base Processor Framework
**Location:** `trustgraph-base/trustgraph/base/async_processor.py`
The base class for all processors provides Pulsar connectivity:
- **Line 9:** `import _pulsar` (for exception handling)
- **Line 18:** `from . pubsub import PulsarClient`
- **Line 38:** Creates `pulsar_client_object = PulsarClient(**params)`
- **Lines 104-108:** Properties exposing `pulsar_host` and `pulsar_client`
- **Line 250:** Static method `add_args()` calls `PulsarClient.add_args(parser)` for CLI arguments
- **Lines 223-225:** Exception handling for `_pulsar.Interrupted`
All processors inherit from `AsyncProcessor`, making this the central integration point.
### 3. Consumer Abstraction
**Location:** `trustgraph-base/trustgraph/base/consumer.py`
Consumes messages from queues and invokes handler functions:
**Pulsar imports:**
- **Line 12:** `from pulsar.schema import JsonSchema`
- **Line 13:** `import pulsar`
- **Line 14:** `import _pulsar`
**Pulsar-specific usage:**
- **Lines 100, 102:** `pulsar.InitialPosition.Earliest` / `pulsar.InitialPosition.Latest`
- **Line 108:** `JsonSchema(self.schema)` wrapper
- **Line 110:** `pulsar.ConsumerType.Shared`
- **Lines 104-111:** `self.client.subscribe()` with Pulsar-specific parameters
- **Lines 143, 150, 65:** `consumer.unsubscribe()` and `consumer.close()` methods
- **Line 162:** `_pulsar.Timeout` exception
- **Lines 182, 205, 232:** `consumer.acknowledge()` / `consumer.negative_acknowledge()`
**Spec file:** `trustgraph-base/trustgraph/base/consumer_spec.py`
- **Line 22:** References `processor.pulsar_client`
### 4. Producer Abstraction
**Location:** `trustgraph-base/trustgraph/base/producer.py`
Sends messages to queues:
**Pulsar imports:**
- **Line 2:** `from pulsar.schema import JsonSchema`
**Pulsar-specific usage:**
- **Line 49:** `JsonSchema(self.schema)` wrapper
- **Lines 47-51:** `self.client.create_producer()` with Pulsar-specific parameters (topic, schema, chunking_enabled)
- **Lines 31, 76:** `producer.close()` method
- **Lines 64-65:** `producer.send()` with message and properties
**Spec file:** `trustgraph-base/trustgraph/base/producer_spec.py`
- **Line 18:** References `processor.pulsar_client`
### 5. Publisher Abstraction
**Location:** `trustgraph-base/trustgraph/base/publisher.py`
Asynchronous message publishing with queue buffering:
**Pulsar imports:**
- **Line 2:** `from pulsar.schema import JsonSchema`
- **Line 6:** `import pulsar`
**Pulsar-specific usage:**
- **Line 52:** `JsonSchema(self.schema)` wrapper
- **Lines 50-54:** `self.client.create_producer()` with Pulsar-specific parameters
- **Lines 101, 103:** `producer.send()` with message and optional properties
- **Lines 106-107:** `producer.flush()` and `producer.close()` methods
### 6. Subscriber Abstraction
**Location:** `trustgraph-base/trustgraph/base/subscriber.py`
Provides multi-recipient message distribution from queues:
**Pulsar imports:**
- **Line 6:** `from pulsar.schema import JsonSchema`
- **Line 8:** `import _pulsar`
**Pulsar-specific usage:**
- **Line 55:** `JsonSchema(self.schema)` wrapper
- **Line 57:** `self.client.subscribe(**subscribe_args)`
- **Lines 101, 136, 160, 167-172:** Pulsar exceptions: `_pulsar.Timeout`, `_pulsar.InvalidConfiguration`, `_pulsar.AlreadyClosed`
- **Lines 159, 166, 170:** Consumer methods: `negative_acknowledge()`, `unsubscribe()`, `close()`
- **Lines 247, 251:** Message acknowledgment: `acknowledge()`, `negative_acknowledge()`
**Spec file:** `trustgraph-base/trustgraph/base/subscriber_spec.py`
- **Line 19:** References `processor.pulsar_client`
### 7. Schema System (Heart of Darkness)
**Location:** `trustgraph-base/trustgraph/schema/`
Every message schema in the system is defined using Pulsar's schema framework.
**Core primitives:** `schema/core/primitives.py`
- **Line 2:** `from pulsar.schema import Record, String, Boolean, Array, Integer`
- All schemas inherit from Pulsar's `Record` base class
- All field types are Pulsar types: `String()`, `Integer()`, `Boolean()`, `Array()`, `Map()`, `Double()`
**Example schemas:**
- `schema/services/llm.py` (Line 2): `from pulsar.schema import Record, String, Array, Double, Integer, Boolean`
- `schema/services/config.py` (Line 2): `from pulsar.schema import Record, Bytes, String, Boolean, Array, Map, Integer`
**Topic naming:** `schema/core/topic.py`
- **Lines 2-3:** Topic format: `{kind}://{tenant}/{namespace}/{topic}`
- This URI structure is Pulsar-specific (e.g., `persistent://tg/flow/config`)
**Impact:**
- All request/response message definitions throughout the codebase use Pulsar schemas
- This includes services for: config, flow, llm, prompt, query, storage, agent, collection, diagnosis, library, lookup, nlp_query, objects_query, retrieval, structured_query
- Schema definitions are imported and used extensively across all processors and services
## Summary
### Pulsar Dependencies by Category
1. **Client instantiation:**
- Direct: `gateway/service.py`
- Abstracted: `async_processor.py``pubsub.py` (PulsarClient)
2. **Message transport:**
- Consumer: `consumer.py`, `consumer_spec.py`
- Producer: `producer.py`, `producer_spec.py`
- Publisher: `publisher.py`
- Subscriber: `subscriber.py`, `subscriber_spec.py`
3. **Schema system:**
- Base types: `schema/core/primitives.py`
- All service schemas: `schema/services/*.py`
- Topic naming: `schema/core/topic.py`
4. **Pulsar-specific concepts required:**
- Topic-based messaging
- Schema system (Record, field types)
- Shared subscriptions
- Message acknowledgment (positive/negative)
- Consumer positioning (earliest/latest)
- Message properties
- Initial positions and consumer types
- Chunking support
- Persistent vs non-persistent topics
### Refactoring Challenges
The good news: The abstraction layer (Consumer, Producer, Publisher, Subscriber) provides a clean encapsulation of most Pulsar interactions.
The challenges:
1. **Schema system pervasiveness:** Every message definition uses `pulsar.schema.Record` and Pulsar field types
2. **Pulsar-specific enums:** `InitialPosition`, `ConsumerType`
3. **Pulsar exceptions:** `_pulsar.Timeout`, `_pulsar.Interrupted`, `_pulsar.InvalidConfiguration`, `_pulsar.AlreadyClosed`
4. **Method signatures:** `acknowledge()`, `negative_acknowledge()`, `subscribe()`, `create_producer()`, etc.
5. **Topic URI format:** Pulsar's `kind://tenant/namespace/topic` structure
### Next Steps
To make the pub/sub infrastructure configurable, we need to:
1. Create an abstraction interface for the client/schema system
2. Abstract Pulsar-specific enums and exceptions
3. Create schema wrappers or alternative schema definitions
4. Implement the interface for both Pulsar and alternative systems (Kafka, RabbitMQ, Redis Streams, etc.)
5. Update `pubsub.py` to be configurable and support multiple backends
6. Provide migration path for existing deployments
## Approach Draft 1: Adapter Pattern with Schema Translation Layer
### Key Insight
The **schema system** is the deepest integration point - everything else flows from it. We need to solve this first, or we'll be rewriting the entire codebase.
### Strategy: Minimal Disruption with Adapters
**1. Keep Pulsar schemas as the internal representation**
- Don't rewrite all the schema definitions
- Schemas remain `pulsar.schema.Record` internally
- Use adapters to translate at the boundary between our code and the pub/sub backend
**2. Create a pub/sub abstraction layer:**
```
┌─────────────────────────────────────┐
│ Existing Code (unchanged) │
│ - Uses Pulsar schemas internally │
│ - Consumer/Producer/Publisher │
└──────────────┬──────────────────────┘
┌──────────────┴──────────────────────┐
│ PubSubFactory (configurable) │
│ - Creates backend-specific client │
└──────────────┬──────────────────────┘
┌──────┴──────┐
│ │
┌───────▼─────┐ ┌────▼─────────┐
│ PulsarAdapter│ │ KafkaAdapter │ etc...
│ (passthrough)│ │ (translates) │
└──────────────┘ └──────────────┘
```
**3. Define abstract interfaces:**
- `PubSubClient` - client connection
- `PubSubProducer` - sending messages
- `PubSubConsumer` - receiving messages
- `SchemaAdapter` - translating Pulsar schemas to/from JSON or backend-specific formats
**4. Implementation details:**
For **Pulsar adapter**: Nearly passthrough, minimal translation
For **other backends** (Kafka, RabbitMQ, etc.):
- Serialize Pulsar Record objects to JSON/bytes
- Map concepts like:
- `InitialPosition.Earliest/Latest` → Kafka's auto.offset.reset
- `acknowledge()` → Kafka's commit
- `negative_acknowledge()` → Re-queue or DLQ pattern
- Topic URIs → Backend-specific topic names
### Analysis
**Pros:**
- ✅ Minimal code changes to existing services
- ✅ Schemas stay as-is (no massive rewrite)
- ✅ Gradual migration path
- ✅ Pulsar users see no difference
- ✅ New backends added via adapters
**Cons:**
- ⚠️ Still carries Pulsar dependency (for schema definitions)
- ⚠️ Some impedance mismatch translating concepts
### Alternative Consideration
Create a **TrustGraph schema system** that's pub/sub agnostic (using dataclasses or Pydantic), then generate Pulsar/Kafka/etc schemas from it. This requires rewriting every schema file and potentially breaking changes.
### Recommendation for Draft 1
Start with the **adapter approach** because:
1. It's pragmatic - works with existing code
2. Proves the concept with minimal risk
3. Can evolve to a native schema system later if needed
4. Configuration-driven: one env var switches backends
## Approach Draft 2: Backend-Agnostic Schema System with Dataclasses
### Core Concept
Use Python **dataclasses** as the neutral schema definition format. Each pub/sub backend provides its own serialization/deserialization for dataclasses, eliminating the need for Pulsar schemas to remain in the codebase.
### Schema Polymorphism at the Factory Level
Instead of translating Pulsar schemas, **each backend provides its own schema handling** that works with standard Python dataclasses.
### Publisher Flow
```python
# 1. Get the configured backend from factory
pubsub = get_pubsub() # Returns PulsarBackend, MQTTBackend, etc.
# 2. Get schema class from the backend
# (Can be imported directly - backend-agnostic)
from trustgraph.schema.services.llm import TextCompletionRequest
# 3. Create a producer/publisher for a specific topic
producer = pubsub.create_producer(
topic="text-completion-requests",
schema=TextCompletionRequest # Tells backend what schema to use
)
# 4. Create message instances (same API regardless of backend)
request = TextCompletionRequest(
system="You are helpful",
prompt="Hello world",
streaming=False
)
# 5. Send the message
producer.send(request) # Backend serializes appropriately
```
### Consumer Flow
```python
# 1. Get the configured backend
pubsub = get_pubsub()
# 2. Create a consumer
consumer = pubsub.subscribe(
topic="text-completion-requests",
schema=TextCompletionRequest # Tells backend how to deserialize
)
# 3. Receive and deserialize
msg = consumer.receive()
request = msg.value() # Returns TextCompletionRequest dataclass instance
# 4. Use the data (type-safe access)
print(request.system) # "You are helpful"
print(request.prompt) # "Hello world"
print(request.streaming) # False
```
### What Happens Behind the Scenes
**For Pulsar backend:**
- `create_producer()` → creates Pulsar producer with JSON schema or dynamically generated Record
- `send(request)` → serializes dataclass to JSON/Pulsar format, sends to Pulsar
- `receive()` → gets Pulsar message, deserializes back to dataclass
**For MQTT backend:**
- `create_producer()` → connects to MQTT broker, no schema registration needed
- `send(request)` → converts dataclass to JSON, publishes to MQTT topic
- `receive()` → subscribes to MQTT topic, deserializes JSON to dataclass
**For Kafka backend:**
- `create_producer()` → creates Kafka producer, registers Avro schema if needed
- `send(request)` → serializes dataclass to Avro format, sends to Kafka
- `receive()` → gets Kafka message, deserializes Avro back to dataclass
### Key Design Points
1. **Schema object creation**: The dataclass instance (`TextCompletionRequest(...)`) is identical regardless of backend
2. **Backend handles encoding**: Each backend knows how to serialize its dataclass to the wire format
3. **Schema definition at creation**: When creating producer/consumer, you specify the schema type
4. **Type safety preserved**: You get back a proper `TextCompletionRequest` object, not a dict
5. **No backend leakage**: Application code never imports backend-specific libraries
### Example Transformation
**Current (Pulsar-specific):**
```python
# schema/services/llm.py
from pulsar.schema import Record, String, Boolean, Integer
class TextCompletionRequest(Record):
system = String()
prompt = String()
streaming = Boolean()
```
**New (Backend-agnostic):**
```python
# schema/services/llm.py
from dataclasses import dataclass
@dataclass
class TextCompletionRequest:
system: str
prompt: str
streaming: bool = False
```
### Backend Integration
Each backend handles serialization/deserialization of dataclasses:
**Pulsar backend:**
- Dynamically generate `pulsar.schema.Record` classes from dataclasses
- Or serialize dataclasses to JSON and use Pulsar's JSON schema
- Maintains compatibility with existing Pulsar deployments
**MQTT/Redis backend:**
- Direct JSON serialization of dataclass instances
- Use `dataclasses.asdict()` / `from_dict()`
- Lightweight, no schema registry needed
**Kafka backend:**
- Generate Avro schemas from dataclass definitions
- Use Confluent's schema registry
- Type-safe serialization with schema evolution support
### Architecture
```
┌─────────────────────────────────────┐
│ Application Code │
│ - Uses dataclass schemas │
│ - Backend-agnostic │
└──────────────┬──────────────────────┘
┌──────────────┴──────────────────────┐
│ PubSubFactory (configurable) │
│ - get_pubsub() returns backend │
└──────────────┬──────────────────────┘
┌──────┴──────┐
│ │
┌───────▼─────────┐ ┌────▼──────────────┐
│ PulsarBackend │ │ MQTTBackend │
│ - JSON schema │ │ - JSON serialize │
│ - or dynamic │ │ - Simple queues │
│ Record gen │ │ │
└─────────────────┘ └───────────────────┘
```
### Implementation Details
**1. Schema definitions:** Plain dataclasses with type hints
- `str`, `int`, `bool`, `float` for primitives
- `list[T]` for arrays
- `dict[str, T]` for maps
- Nested dataclasses for complex types
**2. Each backend provides:**
- Serializer: `dataclass → bytes/wire format`
- Deserializer: `bytes/wire format → dataclass`
- Schema registration (if needed, like Pulsar/Kafka)
**3. Consumer/Producer abstraction:**
- Already exists (consumer.py, producer.py)
- Update to use backend's serialization
- Remove direct Pulsar imports
**4. Type mappings:**
- Pulsar `String()` → Python `str`
- Pulsar `Integer()` → Python `int`
- Pulsar `Boolean()` → Python `bool`
- Pulsar `Array(T)` → Python `list[T]`
- Pulsar `Map(K, V)` → Python `dict[K, V]`
- Pulsar `Double()` → Python `float`
- Pulsar `Bytes()` → Python `bytes`
### Migration Path
1. **Create dataclass versions** of all schemas in `trustgraph/schema/`
2. **Update backend classes** (Consumer, Producer, Publisher, Subscriber) to use backend-provided serialization
3. **Implement PulsarBackend** with JSON schema or dynamic Record generation
4. **Test with Pulsar** to ensure backward compatibility with existing deployments
5. **Add new backends** (MQTT, Kafka, Redis, etc.) as needed
6. **Remove Pulsar imports** from schema files
### Benefits
**No pub/sub dependency** in schema definitions
**Standard Python** - easy to understand, type-check, document
**Modern tooling** - works with mypy, IDE autocomplete, linters
**Backend-optimized** - each backend uses native serialization
**No translation overhead** - direct serialization, no adapters
**Type safety** - real objects with proper types
**Easy validation** - can use Pydantic if needed
### Challenges & Solutions
**Challenge:** Pulsar's `Record` has runtime field validation
**Solution:** Use Pydantic dataclasses for validation if needed, or Python 3.10+ dataclass features with `__post_init__`
**Challenge:** Some Pulsar-specific features (like `Bytes` type)
**Solution:** Map to `bytes` type in dataclass, backend handles encoding appropriately
**Challenge:** Topic naming (`persistent://tenant/namespace/topic`)
**Solution:** Abstract topic names in schema definitions, backend converts to proper format
**Challenge:** Schema evolution and versioning
**Solution:** Each backend handles this according to its capabilities (Pulsar schema versions, Kafka schema registry, etc.)
**Challenge:** Nested complex types
**Solution:** Use nested dataclasses, backends recursively serialize/deserialize
### Design Decisions
1. **Plain dataclasses or Pydantic?**
-**Decision: Use plain Python dataclasses**
- Simpler, no additional dependencies
- Validation not required in practice
- Easier to understand and maintain
2. **Schema evolution:**
-**Decision: No versioning mechanism needed**
- Schemas are stable and long-lasting
- Updates typically add new fields (backward compatible)
- Backends handle schema evolution according to their capabilities
3. **Backward compatibility:**
-**Decision: Major version change, no backward compatibility required**
- Will be a breaking change with migration instructions
- Clean break allows for better design
- Migration guide will be provided for existing deployments
4. **Nested types and complex structures:**
-**Decision: Use nested dataclasses naturally**
- Python dataclasses handle nesting perfectly
- `list[T]` for arrays, `dict[K, V]` for maps
- Backends recursively serialize/deserialize
- Example:
```python
@dataclass
class Value:
value: str
is_uri: bool
@dataclass
class Triple:
s: Value # Nested dataclass
p: Value
o: Value
@dataclass
class GraphQuery:
triples: list[Triple] # Array of nested dataclasses
metadata: dict[str, str]
```
5. **Default values and optional fields:**
-**Decision: Mix of required, defaults, and optional fields**
- Required fields: No default value
- Fields with defaults: Always present, have sensible default
- Truly optional fields: `T | None = None`, omitted from serialization when `None`
- Example:
```python
@dataclass
class TextCompletionRequest:
system: str # Required, no default
prompt: str # Required, no default
streaming: bool = False # Optional with default value
metadata: dict | None = None # Truly optional, can be absent
```
**Important serialization semantics:**
When `metadata = None`:
```json
{
"system": "...",
"prompt": "...",
"streaming": false
// metadata field NOT PRESENT
}
```
When `metadata = {}` (explicitly empty):
```json
{
"system": "...",
"prompt": "...",
"streaming": false,
"metadata": {} // Field PRESENT but empty
}
```
**Key distinction:**
- `None` → field absent from JSON (not serialized)
- Empty value (`{}`, `[]`, `""`) → field present with empty value
- This matters semantically: "not provided" vs "explicitly empty"
- Serialization backends must skip `None` fields, not encode as `null`
## Approach Draft 3: Implementation Details
### Generic Queue Naming Format
Replace backend-specific queue names with a generic format that backends can map appropriately.
**Format:** `{qos}/{tenant}/{namespace}/{queue-name}`
Where:
- `qos`: Quality of Service level
- `q0` = best-effort (fire and forget, no acknowledgment)
- `q1` = at-least-once (requires acknowledgment)
- `q2` = exactly-once (two-phase acknowledgment)
- `tenant`: Logical grouping for multi-tenancy
- `namespace`: Sub-grouping within tenant
- `queue-name`: Actual queue/topic name
**Examples:**
```
q1/tg/flow/text-completion-requests
q2/tg/config/config-push
q0/tg/metrics/stats
```
### Backend Topic Mapping
Each backend maps the generic format to its native format:
**Pulsar Backend:**
```python
def map_topic(self, generic_topic: str) -> str:
# Parse: q1/tg/flow/text-completion-requests
qos, tenant, namespace, queue = generic_topic.split('/', 3)
# Map QoS to persistence
persistence = 'persistent' if qos in ['q1', 'q2'] else 'non-persistent'
# Return Pulsar URI: persistent://tg/flow/text-completion-requests
return f"{persistence}://{tenant}/{namespace}/{queue}"
```
**MQTT Backend:**
```python
def map_topic(self, generic_topic: str) -> tuple[str, int]:
# Parse: q1/tg/flow/text-completion-requests
qos, tenant, namespace, queue = generic_topic.split('/', 3)
# Map QoS level
qos_level = {'q0': 0, 'q1': 1, 'q2': 2}[qos]
# Build MQTT topic including tenant/namespace for proper namespacing
mqtt_topic = f"{tenant}/{namespace}/{queue}"
return mqtt_topic, qos_level
```
### Updated Topic Helper Function
```python
# schema/core/topic.py
def topic(queue_name, qos='q1', tenant='tg', namespace='flow'):
"""
Create a generic topic identifier that can be mapped by backends.
Args:
queue_name: The queue/topic name
qos: Quality of service
- 'q0' = best-effort (no ack)
- 'q1' = at-least-once (ack required)
- 'q2' = exactly-once (two-phase ack)
tenant: Tenant identifier for multi-tenancy
namespace: Namespace within tenant
Returns:
Generic topic string: qos/tenant/namespace/queue_name
Examples:
topic('my-queue') # q1/tg/flow/my-queue
topic('config', qos='q2', namespace='config') # q2/tg/config/config
"""
return f"{qos}/{tenant}/{namespace}/{queue_name}"
```
### Configuration and Initialization
**Command-Line Arguments + Environment Variables:**
```python
# In base/async_processor.py - add_args() method
@staticmethod
def add_args(parser):
# Pub/sub backend selection
parser.add_argument(
'--pubsub-backend',
default=os.getenv('PUBSUB_BACKEND', 'pulsar'),
choices=['pulsar', 'mqtt'],
help='Pub/sub backend (default: pulsar, env: PUBSUB_BACKEND)'
)
# Pulsar-specific configuration
parser.add_argument(
'--pulsar-host',
default=os.getenv('PULSAR_HOST', 'pulsar://localhost:6650'),
help='Pulsar host (default: pulsar://localhost:6650, env: PULSAR_HOST)'
)
parser.add_argument(
'--pulsar-api-key',
default=os.getenv('PULSAR_API_KEY', None),
help='Pulsar API key (env: PULSAR_API_KEY)'
)
parser.add_argument(
'--pulsar-listener',
default=os.getenv('PULSAR_LISTENER', None),
help='Pulsar listener name (env: PULSAR_LISTENER)'
)
# MQTT-specific configuration
parser.add_argument(
'--mqtt-host',
default=os.getenv('MQTT_HOST', 'localhost'),
help='MQTT broker host (default: localhost, env: MQTT_HOST)'
)
parser.add_argument(
'--mqtt-port',
type=int,
default=int(os.getenv('MQTT_PORT', '1883')),
help='MQTT broker port (default: 1883, env: MQTT_PORT)'
)
parser.add_argument(
'--mqtt-username',
default=os.getenv('MQTT_USERNAME', None),
help='MQTT username (env: MQTT_USERNAME)'
)
parser.add_argument(
'--mqtt-password',
default=os.getenv('MQTT_PASSWORD', None),
help='MQTT password (env: MQTT_PASSWORD)'
)
```
**Factory Function:**
```python
# In base/pubsub.py or base/pubsub_factory.py
def get_pubsub(**config) -> PubSubBackend:
"""
Create and return a pub/sub backend based on configuration.
Args:
config: Configuration dict from command-line args
Must include 'pubsub_backend' key
Returns:
Backend instance (PulsarBackend, MQTTBackend, etc.)
"""
backend_type = config.get('pubsub_backend', 'pulsar')
if backend_type == 'pulsar':
return PulsarBackend(
host=config.get('pulsar_host'),
api_key=config.get('pulsar_api_key'),
listener=config.get('pulsar_listener'),
)
elif backend_type == 'mqtt':
return MQTTBackend(
host=config.get('mqtt_host'),
port=config.get('mqtt_port'),
username=config.get('mqtt_username'),
password=config.get('mqtt_password'),
)
else:
raise ValueError(f"Unknown pub/sub backend: {backend_type}")
```
**Usage in AsyncProcessor:**
```python
# In async_processor.py
class AsyncProcessor:
def __init__(self, **params):
self.id = params.get("id")
# Create backend from config (replaces PulsarClient)
self.pubsub = get_pubsub(**params)
# Rest of initialization...
```
### Backend Interface
```python
class PubSubBackend(Protocol):
"""Protocol defining the interface all pub/sub backends must implement."""
def create_producer(self, topic: str, schema: type, **options) -> BackendProducer:
"""
Create a producer for a topic.
Args:
topic: Generic topic format (qos/tenant/namespace/queue)
schema: Dataclass type for messages
options: Backend-specific options (e.g., chunking_enabled)
Returns:
Backend-specific producer instance
"""
...
def create_consumer(
self,
topic: str,
subscription: str,
schema: type,
initial_position: str = 'latest',
consumer_type: str = 'shared',
**options
) -> BackendConsumer:
"""
Create a consumer for a topic.
Args:
topic: Generic topic format (qos/tenant/namespace/queue)
subscription: Subscription/consumer group name
schema: Dataclass type for messages
initial_position: 'earliest' or 'latest' (MQTT may ignore)
consumer_type: 'shared', 'exclusive', 'failover' (MQTT may ignore)
options: Backend-specific options
Returns:
Backend-specific consumer instance
"""
...
def close(self) -> None:
"""Close the backend connection."""
...
```
```python
class BackendProducer(Protocol):
"""Protocol for backend-specific producer."""
def send(self, message: Any, properties: dict = {}) -> None:
"""Send a message (dataclass instance) with optional properties."""
...
def flush(self) -> None:
"""Flush any buffered messages."""
...
def close(self) -> None:
"""Close the producer."""
...
```
```python
class BackendConsumer(Protocol):
"""Protocol for backend-specific consumer."""
def receive(self, timeout_millis: int = 2000) -> Message:
"""
Receive a message from the topic.
Raises:
TimeoutError: If no message received within timeout
"""
...
def acknowledge(self, message: Message) -> None:
"""Acknowledge successful processing of a message."""
...
def negative_acknowledge(self, message: Message) -> None:
"""Negative acknowledge - triggers redelivery."""
...
def unsubscribe(self) -> None:
"""Unsubscribe from the topic."""
...
def close(self) -> None:
"""Close the consumer."""
...
```
```python
class Message(Protocol):
"""Protocol for a received message."""
def value(self) -> Any:
"""Get the deserialized message (dataclass instance)."""
...
def properties(self) -> dict:
"""Get message properties/metadata."""
...
```
### Existing Classes Refactoring
The existing `Consumer`, `Producer`, `Publisher`, `Subscriber` classes remain largely intact:
**Current responsibilities (keep):**
- Async threading model and taskgroups
- Reconnection logic and retry handling
- Metrics collection
- Rate limiting
- Concurrency management
**Changes needed:**
- Remove direct Pulsar imports (`pulsar.schema`, `pulsar.InitialPosition`, etc.)
- Accept `BackendProducer`/`BackendConsumer` instead of Pulsar client
- Delegate actual pub/sub operations to backend instances
- Map generic concepts to backend calls
**Example refactoring:**
```python
# OLD - consumer.py
class Consumer:
def __init__(self, client, topic, subscriber, schema, ...):
self.client = client # Direct Pulsar client
# ...
async def consumer_run(self):
# Uses pulsar.InitialPosition, pulsar.ConsumerType
self.consumer = self.client.subscribe(
topic=self.topic,
schema=JsonSchema(self.schema),
initial_position=pulsar.InitialPosition.Earliest,
consumer_type=pulsar.ConsumerType.Shared,
)
# NEW - consumer.py
class Consumer:
def __init__(self, backend_consumer, schema, ...):
self.backend_consumer = backend_consumer # Backend-specific consumer
self.schema = schema
# ...
async def consumer_run(self):
# Backend consumer already created with right settings
# Just use it directly
while self.running:
msg = await asyncio.to_thread(
self.backend_consumer.receive,
timeout_millis=2000
)
await self.handle_message(msg)
```
### Backend-Specific Behaviors
**Pulsar Backend:**
- Maps `q0``non-persistent://`, `q1`/`q2``persistent://`
- Supports all consumer types (shared, exclusive, failover)
- Supports initial position (earliest/latest)
- Native message acknowledgment
- Schema registry support
**MQTT Backend:**
- Maps `q0`/`q1`/`q2` → MQTT QoS levels 0/1/2
- Includes tenant/namespace in topic path for namespacing
- Auto-generates client IDs from subscription names
- Ignores initial position (no message history in basic MQTT)
- Ignores consumer type (MQTT uses client IDs, not consumer groups)
- Simple publish/subscribe model
### Design Decisions Summary
1.**Generic queue naming**: `qos/tenant/namespace/queue-name` format
2.**QoS in queue ID**: Determined by queue definition, not configuration
3.**Reconnection**: Handled by Consumer/Producer classes, not backends
4.**MQTT topics**: Include tenant/namespace for proper namespacing
5.**Message history**: MQTT ignores `initial_position` parameter (future enhancement)
6.**Client IDs**: MQTT backend auto-generates from subscription name
### Future Enhancements
**MQTT message history:**
- Could add optional persistence layer (e.g., retained messages, external store)
- Would allow supporting `initial_position='earliest'`
- Not required for initial implementation