trustgraph/docs/tech-specs/pubsub.md
cybermaggedon 34eb083836
Messaging fabric plugins (#592)
* Plugin architecture for messaging fabric

* Schemas use a technology neutral expression

* Schemas strictness has uncovered some incorrect schema use which is fixed
2025-12-17 21:40:43 +00:00

Pub/Sub Infrastructure

Overview

This document catalogs all connections between the TrustGraph codebase and the pub/sub infrastructure. Currently, the system is hardcoded to use Apache Pulsar. This analysis identifies all integration points to inform future refactoring toward a configurable pub/sub abstraction.

Current State: Pulsar Integration Points

1. Direct Pulsar Client Usage

Location: trustgraph-flow/trustgraph/gateway/service.py

The API gateway directly imports and instantiates the Pulsar client:

  • Line 20: import pulsar
  • Lines 54-61: Direct instantiation of pulsar.Client() with optional pulsar.AuthenticationToken()
  • Lines 33-35: Default Pulsar host configuration from environment variables
  • Lines 178-192: CLI arguments for --pulsar-host, --pulsar-api-key, and --pulsar-listener
  • Lines 78, 124: Passes pulsar_client to ConfigReceiver and DispatcherManager

This is the only location that directly instantiates a Pulsar client outside of the abstraction layer.

2. Base Processor Framework

Location: trustgraph-base/trustgraph/base/async_processor.py

The base class for all processors provides Pulsar connectivity:

  • Line 9: import _pulsar (for exception handling)
  • Line 18: from . pubsub import PulsarClient
  • Line 38: Creates pulsar_client_object = PulsarClient(**params)
  • Lines 104-108: Properties exposing pulsar_host and pulsar_client
  • Line 250: Static method add_args() calls PulsarClient.add_args(parser) for CLI arguments
  • Lines 223-225: Exception handling for _pulsar.Interrupted

All processors inherit from AsyncProcessor, making this the central integration point.

3. Consumer Abstraction

Location: trustgraph-base/trustgraph/base/consumer.py

Consumes messages from queues and invokes handler functions:

Pulsar imports:

  • Line 12: from pulsar.schema import JsonSchema
  • Line 13: import pulsar
  • Line 14: import _pulsar

Pulsar-specific usage:

  • Lines 100, 102: pulsar.InitialPosition.Earliest / pulsar.InitialPosition.Latest
  • Line 108: JsonSchema(self.schema) wrapper
  • Line 110: pulsar.ConsumerType.Shared
  • Lines 104-111: self.client.subscribe() with Pulsar-specific parameters
  • Lines 143, 150, 65: consumer.unsubscribe() and consumer.close() methods
  • Line 162: _pulsar.Timeout exception
  • Lines 182, 205, 232: consumer.acknowledge() / consumer.negative_acknowledge()

Spec file: trustgraph-base/trustgraph/base/consumer_spec.py

  • Line 22: References processor.pulsar_client

4. Producer Abstraction

Location: trustgraph-base/trustgraph/base/producer.py

Sends messages to queues:

Pulsar imports:

  • Line 2: from pulsar.schema import JsonSchema

Pulsar-specific usage:

  • Line 49: JsonSchema(self.schema) wrapper
  • Lines 47-51: self.client.create_producer() with Pulsar-specific parameters (topic, schema, chunking_enabled)
  • Lines 31, 76: producer.close() method
  • Lines 64-65: producer.send() with message and properties

Spec file: trustgraph-base/trustgraph/base/producer_spec.py

  • Line 18: References processor.pulsar_client

5. Publisher Abstraction

Location: trustgraph-base/trustgraph/base/publisher.py

Asynchronous message publishing with queue buffering:

Pulsar imports:

  • Line 2: from pulsar.schema import JsonSchema
  • Line 6: import pulsar

Pulsar-specific usage:

  • Line 52: JsonSchema(self.schema) wrapper
  • Lines 50-54: self.client.create_producer() with Pulsar-specific parameters
  • Lines 101, 103: producer.send() with message and optional properties
  • Lines 106-107: producer.flush() and producer.close() methods

6. Subscriber Abstraction

Location: trustgraph-base/trustgraph/base/subscriber.py

Provides multi-recipient message distribution from queues:

Pulsar imports:

  • Line 6: from pulsar.schema import JsonSchema
  • Line 8: import _pulsar

Pulsar-specific usage:

  • Line 55: JsonSchema(self.schema) wrapper
  • Line 57: self.client.subscribe(**subscribe_args)
  • Lines 101, 136, 160, 167-172: Pulsar exceptions: _pulsar.Timeout, _pulsar.InvalidConfiguration, _pulsar.AlreadyClosed
  • Lines 159, 166, 170: Consumer methods: negative_acknowledge(), unsubscribe(), close()
  • Lines 247, 251: Message acknowledgment: acknowledge(), negative_acknowledge()

Spec file: trustgraph-base/trustgraph/base/subscriber_spec.py

  • Line 19: References processor.pulsar_client

7. Schema System (Heart of Darkness)

Location: trustgraph-base/trustgraph/schema/

Every message schema in the system is defined using Pulsar's schema framework.

Core primitives: schema/core/primitives.py

  • Line 2: from pulsar.schema import Record, String, Boolean, Array, Integer
  • All schemas inherit from Pulsar's Record base class
  • All field types are Pulsar types: String(), Integer(), Boolean(), Array(), Map(), Double()

Example schemas:

  • schema/services/llm.py (Line 2): from pulsar.schema import Record, String, Array, Double, Integer, Boolean
  • schema/services/config.py (Line 2): from pulsar.schema import Record, Bytes, String, Boolean, Array, Map, Integer

Topic naming: schema/core/topic.py

  • Lines 2-3: Topic format: {kind}://{tenant}/{namespace}/{topic}
  • This URI structure is Pulsar-specific (e.g., persistent://tg/flow/config)
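
Based on that format, the helper presumably looks something like the following (a reconstruction for illustration only; the actual parameter names and defaults in topic.py may differ):

```python
# Hypothetical reconstruction of the helper in schema/core/topic.py,
# based on the documented {kind}://{tenant}/{namespace}/{topic} format.
def topic(name, kind="persistent", tenant="tg", namespace="flow"):
    return f"{kind}://{tenant}/{namespace}/{name}"
```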

Impact:

  • All request/response message definitions throughout the codebase use Pulsar schemas
  • This includes services for: config, flow, llm, prompt, query, storage, agent, collection, diagnosis, library, lookup, nlp_query, objects_query, retrieval, structured_query
  • Schema definitions are imported and used extensively across all processors and services

Summary

Pulsar Dependencies by Category

  1. Client instantiation:

    • Direct: gateway/service.py
    • Abstracted: async_processor.py → pubsub.py (PulsarClient)
  2. Message transport:

    • Consumer: consumer.py, consumer_spec.py
    • Producer: producer.py, producer_spec.py
    • Publisher: publisher.py
    • Subscriber: subscriber.py, subscriber_spec.py
  3. Schema system:

    • Base types: schema/core/primitives.py
    • All service schemas: schema/services/*.py
    • Topic naming: schema/core/topic.py
  4. Pulsar-specific concepts required:

    • Topic-based messaging
    • Schema system (Record, field types)
    • Shared subscriptions
    • Message acknowledgment (positive/negative)
    • Consumer positioning (earliest/latest)
    • Message properties
    • Initial positions and consumer types
    • Chunking support
    • Persistent vs non-persistent topics

Refactoring Challenges

The good news: The abstraction layer (Consumer, Producer, Publisher, Subscriber) provides a clean encapsulation of most Pulsar interactions.

The challenges:

  1. Schema system pervasiveness: Every message definition uses pulsar.schema.Record and Pulsar field types
  2. Pulsar-specific enums: InitialPosition, ConsumerType
  3. Pulsar exceptions: _pulsar.Timeout, _pulsar.Interrupted, _pulsar.InvalidConfiguration, _pulsar.AlreadyClosed
  4. Method signatures: acknowledge(), negative_acknowledge(), subscribe(), create_producer(), etc.
  5. Topic URI format: Pulsar's kind://tenant/namespace/topic structure

Next Steps

To make the pub/sub infrastructure configurable, we need to:

  1. Create an abstraction interface for the client/schema system
  2. Abstract Pulsar-specific enums and exceptions
  3. Create schema wrappers or alternative schema definitions
  4. Implement the interface for both Pulsar and alternative systems (Kafka, RabbitMQ, Redis Streams, etc.)
  5. Update pubsub.py to be configurable and support multiple backends
  6. Provide migration path for existing deployments

Approach Draft 1: Adapter Pattern with Schema Translation Layer

Key Insight

The schema system is the deepest integration point - everything else flows from it. We need to solve this first, or we'll be rewriting the entire codebase.

Strategy: Minimal Disruption with Adapters

1. Keep Pulsar schemas as the internal representation

  • Don't rewrite all the schema definitions
  • Schemas remain pulsar.schema.Record internally
  • Use adapters to translate at the boundary between our code and the pub/sub backend

2. Create a pub/sub abstraction layer:

┌─────────────────────────────────────┐
│   Existing Code (unchanged)         │
│   - Uses Pulsar schemas internally  │
│   - Consumer/Producer/Publisher     │
└──────────────┬──────────────────────┘
               │
┌──────────────┴──────────────────────┐
│   PubSubFactory (configurable)      │
│   - Creates backend-specific client │
└──────────────┬──────────────────────┘
               │
        ┌──────┴──────┐
        │             │
┌───────▼──────┐  ┌───▼──────────┐
│ PulsarAdapter│  │ KafkaAdapter │  etc...
│ (passthrough)│  │ (translates) │
└──────────────┘  └──────────────┘

3. Define abstract interfaces:

  • PubSubClient - client connection
  • PubSubProducer - sending messages
  • PubSubConsumer - receiving messages
  • SchemaAdapter - translating Pulsar schemas to/from JSON or backend-specific formats

4. Implementation details:

For Pulsar adapter: Nearly passthrough, minimal translation

For other backends (Kafka, RabbitMQ, etc.):

  • Serialize Pulsar Record objects to JSON/bytes
  • Map concepts like:
    • InitialPosition.Earliest/Latest → Kafka's auto.offset.reset
    • acknowledge() → Kafka's commit
    • negative_acknowledge() → Re-queue or DLQ pattern
    • Topic URIs → Backend-specific topic names
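
As a concrete illustration of that mapping, a hypothetical KafkaAdapter might derive its consumer configuration from the Pulsar-style options. This is plain dict translation only; no Kafka client library is involved, and the helper name is an assumption:

```python
# Sketch: translate Pulsar-style consumer options into Kafka
# consumer configuration keys (hypothetical KafkaAdapter internals).
def kafka_consumer_config(subscription: str,
                          initial_position: str = "latest") -> dict:
    return {
        # Pulsar subscription name ~ Kafka consumer group
        "group.id": subscription,
        # InitialPosition.Earliest/Latest ~ auto.offset.reset
        "auto.offset.reset": {
            "earliest": "earliest",
            "latest": "latest",
        }[initial_position],
        # acknowledge() maps to an explicit commit, so disable auto-commit
        "enable.auto.commit": False,
    }
```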

Analysis

Pros:

  • Minimal code changes to existing services
  • Schemas stay as-is (no massive rewrite)
  • Gradual migration path
  • Pulsar users see no difference
  • New backends added via adapters

Cons:

  • ⚠️ Still carries Pulsar dependency (for schema definitions)
  • ⚠️ Some impedance mismatch translating concepts

Alternative Consideration

Create a TrustGraph schema system that's pub/sub agnostic (using dataclasses or Pydantic), then generate Pulsar/Kafka/etc schemas from it. This requires rewriting every schema file and potentially breaking changes.

Recommendation for Draft 1

Start with the adapter approach because:

  1. It's pragmatic - works with existing code
  2. Proves the concept with minimal risk
  3. Can evolve to a native schema system later if needed
  4. Configuration-driven: one env var switches backends

Approach Draft 2: Backend-Agnostic Schema System with Dataclasses

Core Concept

Use Python dataclasses as the neutral schema definition format. Each pub/sub backend provides its own serialization/deserialization for dataclasses, eliminating the need for Pulsar schemas to remain in the codebase.

Schema Polymorphism at the Factory Level

Instead of translating Pulsar schemas, each backend provides its own schema handling that works with standard Python dataclasses.

Publisher Flow

# 1. Get the configured backend from factory
pubsub = get_pubsub()  # Returns PulsarBackend, MQTTBackend, etc.

# 2. Get schema class from the backend
# (Can be imported directly - backend-agnostic)
from trustgraph.schema.services.llm import TextCompletionRequest

# 3. Create a producer/publisher for a specific topic
producer = pubsub.create_producer(
    topic="text-completion-requests",
    schema=TextCompletionRequest  # Tells backend what schema to use
)

# 4. Create message instances (same API regardless of backend)
request = TextCompletionRequest(
    system="You are helpful",
    prompt="Hello world",
    streaming=False
)

# 5. Send the message
producer.send(request)  # Backend serializes appropriately

Consumer Flow

# 1. Get the configured backend
pubsub = get_pubsub()

# 2. Create a consumer
consumer = pubsub.subscribe(
    topic="text-completion-requests",
    schema=TextCompletionRequest  # Tells backend how to deserialize
)

# 3. Receive and deserialize
msg = consumer.receive()
request = msg.value()  # Returns TextCompletionRequest dataclass instance

# 4. Use the data (type-safe access)
print(request.system)   # "You are helpful"
print(request.prompt)   # "Hello world"
print(request.streaming)  # False

What Happens Behind the Scenes

For Pulsar backend:

  • create_producer() → creates Pulsar producer with JSON schema or dynamically generated Record
  • send(request) → serializes dataclass to JSON/Pulsar format, sends to Pulsar
  • receive() → gets Pulsar message, deserializes back to dataclass

For MQTT backend:

  • create_producer() → connects to MQTT broker, no schema registration needed
  • send(request) → converts dataclass to JSON, publishes to MQTT topic
  • receive() → subscribes to MQTT topic, deserializes JSON to dataclass

For Kafka backend:

  • create_producer() → creates Kafka producer, registers Avro schema if needed
  • send(request) → serializes dataclass to Avro format, sends to Kafka
  • receive() → gets Kafka message, deserializes Avro back to dataclass

Key Design Points

  1. Schema object creation: The dataclass instance (TextCompletionRequest(...)) is identical regardless of backend
  2. Backend handles encoding: Each backend knows how to serialize its dataclass to the wire format
  3. Schema definition at creation: When creating producer/consumer, you specify the schema type
  4. Type safety preserved: You get back a proper TextCompletionRequest object, not a dict
  5. No backend leakage: Application code never imports backend-specific libraries

Example Transformation

Current (Pulsar-specific):

# schema/services/llm.py
from pulsar.schema import Record, String, Boolean, Integer

class TextCompletionRequest(Record):
    system = String()
    prompt = String()
    streaming = Boolean()

New (Backend-agnostic):

# schema/services/llm.py
from dataclasses import dataclass

@dataclass
class TextCompletionRequest:
    system: str
    prompt: str
    streaming: bool = False

Backend Integration

Each backend handles serialization/deserialization of dataclasses:

Pulsar backend:

  • Dynamically generate pulsar.schema.Record classes from dataclasses
  • Or serialize dataclasses to JSON and use Pulsar's JSON schema
  • Maintains compatibility with existing Pulsar deployments

MQTT/Redis backend:

  • Direct JSON serialization of dataclass instances
  • Use dataclasses.asdict() for serialization and a small from_dict() helper (not in the stdlib) for deserialization
  • Lightweight, no schema registry needed

Kafka backend:

  • Generate Avro schemas from dataclass definitions
  • Use Confluent's schema registry
  • Type-safe serialization with schema evolution support
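
For illustration, the dataclass-to-Avro mapping for flat records could look like this. It is a sketch under stated assumptions: avro_schema_for is a hypothetical helper, and a real backend would also need nested records, optional fields, and registry integration:

```python
import dataclasses
import typing

# Primitive type mapping only; nested and optional types need more cases.
AVRO_PRIMITIVES = {str: "string", int: "long", bool: "boolean",
                   float: "double", bytes: "bytes"}

def avro_schema_for(dc: type) -> dict:
    """Derive a flat Avro record schema (as a dict) from a dataclass."""
    hints = typing.get_type_hints(dc)
    return {
        "type": "record",
        "name": dc.__name__,
        "fields": [
            {"name": f.name, "type": AVRO_PRIMITIVES[hints[f.name]]}
            for f in dataclasses.fields(dc)
        ],
    }
```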

Architecture

┌─────────────────────────────────────┐
│   Application Code                  │
│   - Uses dataclass schemas          │
│   - Backend-agnostic                │
└──────────────┬──────────────────────┘
               │
┌──────────────┴──────────────────────┐
│   PubSubFactory (configurable)      │
│   - get_pubsub() returns backend    │
└──────────────┬──────────────────────┘
               │
        ┌──────┴──────┐
        │             │
┌───────▼─────────┐  ┌────▼──────────────┐
│ PulsarBackend   │  │ MQTTBackend       │
│ - JSON schema   │  │ - JSON serialize  │
│ - or dynamic    │  │ - Simple queues   │
│   Record gen    │  │                   │
└─────────────────┘  └───────────────────┘

Implementation Details

1. Schema definitions: Plain dataclasses with type hints

  • str, int, bool, float for primitives
  • list[T] for arrays
  • dict[str, T] for maps
  • Nested dataclasses for complex types

2. Each backend provides:

  • Serializer: dataclass → bytes/wire format
  • Deserializer: bytes/wire format → dataclass
  • Schema registration (if needed, like Pulsar/Kafka)

3. Consumer/Producer abstraction:

  • Already exists (consumer.py, producer.py)
  • Update to use backend's serialization
  • Remove direct Pulsar imports

4. Type mappings:

  • Pulsar String() → Python str
  • Pulsar Integer() → Python int
  • Pulsar Boolean() → Python bool
  • Pulsar Array(T) → Python list[T]
  • Pulsar Map(K, V) → Python dict[K, V]
  • Pulsar Double() → Python float
  • Pulsar Bytes() → Python bytes
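
Given these type mappings, deserialization back into nested dataclasses can be driven entirely by the type hints. A minimal sketch (stdlib only, no validation; from_dict is a hypothetical helper, not part of the codebase):

```python
import dataclasses
import typing

def from_dict(cls, data):
    """Recursively build a value of type `cls` from plain JSON-style
    data, following the type hints: nested dataclasses, list[T] and
    dict[K, V] are handled; primitives pass through unchanged."""
    if dataclasses.is_dataclass(cls):
        hints = typing.get_type_hints(cls)
        return cls(**{
            f.name: from_dict(hints[f.name], data[f.name])
            for f in dataclasses.fields(cls)
            if f.name in data  # missing fields fall back to defaults
        })
    origin = typing.get_origin(cls)
    if origin is list:
        (item_type,) = typing.get_args(cls)
        return [from_dict(item_type, v) for v in data]
    if origin is dict:
        _, value_type = typing.get_args(cls)
        return {k: from_dict(value_type, v) for k, v in data.items()}
    return data  # str, int, bool, float, bytes pass through
```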

Migration Path

  1. Create dataclass versions of all schemas in trustgraph/schema/
  2. Update backend classes (Consumer, Producer, Publisher, Subscriber) to use backend-provided serialization
  3. Implement PulsarBackend with JSON schema or dynamic Record generation
  4. Test with Pulsar to ensure backward compatibility with existing deployments
  5. Add new backends (MQTT, Kafka, Redis, etc.) as needed
  6. Remove Pulsar imports from schema files

Benefits

  • No pub/sub dependency in schema definitions
  • Standard Python: easy to understand, type-check, and document
  • Modern tooling: works with mypy, IDE autocomplete, linters
  • Backend-optimized: each backend uses its native serialization
  • No translation overhead: direct serialization, no adapters
  • Type safety: real objects with proper types
  • Easy validation: can layer on Pydantic if needed

Challenges & Solutions

Challenge: Pulsar's Record has runtime field validation
Solution: Use Pydantic dataclasses for validation if needed, or standard dataclass __post_init__ hooks

Challenge: Some Pulsar-specific features (like the Bytes type)
Solution: Map to the bytes type in the dataclass; the backend handles encoding appropriately

Challenge: Topic naming (persistent://tenant/namespace/topic)
Solution: Abstract topic names in schema definitions; the backend converts them to its native format

Challenge: Schema evolution and versioning
Solution: Each backend handles this according to its capabilities (Pulsar schema versions, Kafka schema registry, etc.)

Challenge: Nested complex types
Solution: Use nested dataclasses; backends recursively serialize/deserialize

Design Decisions

  1. Plain dataclasses or Pydantic?

    • Decision: Use plain Python dataclasses
    • Simpler, no additional dependencies
    • Validation not required in practice
    • Easier to understand and maintain
  2. Schema evolution:

    • Decision: No versioning mechanism needed
    • Schemas are stable and long-lasting
    • Updates typically add new fields (backward compatible)
    • Backends handle schema evolution according to their capabilities
  3. Backward compatibility:

    • Decision: Major version change, no backward compatibility required
    • Will be a breaking change with migration instructions
    • Clean break allows for better design
    • Migration guide will be provided for existing deployments
  4. Nested types and complex structures:

    • Decision: Use nested dataclasses naturally
    • Python dataclasses handle nesting perfectly
    • list[T] for arrays, dict[K, V] for maps
    • Backends recursively serialize/deserialize
    • Example:
      @dataclass
      class Value:
          value: str
          is_uri: bool
      
      @dataclass
      class Triple:
          s: Value              # Nested dataclass
          p: Value
          o: Value
      
      @dataclass
      class GraphQuery:
          triples: list[Triple]  # Array of nested dataclasses
          metadata: dict[str, str]
      
  5. Default values and optional fields:

    • Decision: Mix of required, defaults, and optional fields
    • Required fields: No default value
    • Fields with defaults: Always present, have sensible default
    • Truly optional fields: T | None = None, omitted from serialization when None
    • Example:
      @dataclass
      class TextCompletionRequest:
          system: str              # Required, no default
          prompt: str              # Required, no default
          streaming: bool = False  # Optional with default value
          metadata: dict | None = None  # Truly optional, can be absent
      

    Important serialization semantics:

    When metadata = None:

    {
        "system": "...",
        "prompt": "...",
        "streaming": false
        // metadata field NOT PRESENT
    }
    

    When metadata = {} (explicitly empty):

    {
        "system": "...",
        "prompt": "...",
        "streaming": false,
        "metadata": {}  // Field PRESENT but empty
    }
    

    Key distinction:

    • None → field absent from JSON (not serialized)
    • Empty value ({}, [], "") → field present with empty value
    • This matters semantically: "not provided" vs "explicitly empty"
    • Serialization backends must skip None fields, not encode as null

Approach Draft 3: Implementation Details

Generic Queue Naming Format

Replace backend-specific queue names with a generic format that backends can map appropriately.

Format: {qos}/{tenant}/{namespace}/{queue-name}

Where:

  • qos: Quality of Service level
    • q0 = best-effort (fire and forget, no acknowledgment)
    • q1 = at-least-once (requires acknowledgment)
    • q2 = exactly-once (two-phase acknowledgment)
  • tenant: Logical grouping for multi-tenancy
  • namespace: Sub-grouping within tenant
  • queue-name: Actual queue/topic name

Examples:

q1/tg/flow/text-completion-requests
q2/tg/config/config-push
q0/tg/metrics/stats

Backend Topic Mapping

Each backend maps the generic format to its native format:

Pulsar Backend:

def map_topic(self, generic_topic: str) -> str:
    # Parse: q1/tg/flow/text-completion-requests
    qos, tenant, namespace, queue = generic_topic.split('/', 3)

    # Map QoS to persistence
    persistence = 'persistent' if qos in ['q1', 'q2'] else 'non-persistent'

    # Return Pulsar URI: persistent://tg/flow/text-completion-requests
    return f"{persistence}://{tenant}/{namespace}/{queue}"

MQTT Backend:

def map_topic(self, generic_topic: str) -> tuple[str, int]:
    # Parse: q1/tg/flow/text-completion-requests
    qos, tenant, namespace, queue = generic_topic.split('/', 3)

    # Map QoS level
    qos_level = {'q0': 0, 'q1': 1, 'q2': 2}[qos]

    # Build MQTT topic including tenant/namespace for proper namespacing
    mqtt_topic = f"{tenant}/{namespace}/{queue}"

    return mqtt_topic, qos_level

Updated Topic Helper Function

# schema/core/topic.py
def topic(queue_name, qos='q1', tenant='tg', namespace='flow'):
    """
    Create a generic topic identifier that can be mapped by backends.

    Args:
        queue_name: The queue/topic name
        qos: Quality of service
             - 'q0' = best-effort (no ack)
             - 'q1' = at-least-once (ack required)
             - 'q2' = exactly-once (two-phase ack)
        tenant: Tenant identifier for multi-tenancy
        namespace: Namespace within tenant

    Returns:
        Generic topic string: qos/tenant/namespace/queue_name

    Examples:
        topic('my-queue')  # q1/tg/flow/my-queue
        topic('config', qos='q2', namespace='config')  # q2/tg/config/config
    """
    return f"{qos}/{tenant}/{namespace}/{queue_name}"

Configuration and Initialization

Command-Line Arguments + Environment Variables:

# In base/async_processor.py - add_args() method
@staticmethod
def add_args(parser):
    # Pub/sub backend selection
    parser.add_argument(
        '--pubsub-backend',
        default=os.getenv('PUBSUB_BACKEND', 'pulsar'),
        choices=['pulsar', 'mqtt'],
        help='Pub/sub backend (default: pulsar, env: PUBSUB_BACKEND)'
    )

    # Pulsar-specific configuration
    parser.add_argument(
        '--pulsar-host',
        default=os.getenv('PULSAR_HOST', 'pulsar://localhost:6650'),
        help='Pulsar host (default: pulsar://localhost:6650, env: PULSAR_HOST)'
    )

    parser.add_argument(
        '--pulsar-api-key',
        default=os.getenv('PULSAR_API_KEY', None),
        help='Pulsar API key (env: PULSAR_API_KEY)'
    )

    parser.add_argument(
        '--pulsar-listener',
        default=os.getenv('PULSAR_LISTENER', None),
        help='Pulsar listener name (env: PULSAR_LISTENER)'
    )

    # MQTT-specific configuration
    parser.add_argument(
        '--mqtt-host',
        default=os.getenv('MQTT_HOST', 'localhost'),
        help='MQTT broker host (default: localhost, env: MQTT_HOST)'
    )

    parser.add_argument(
        '--mqtt-port',
        type=int,
        default=int(os.getenv('MQTT_PORT', '1883')),
        help='MQTT broker port (default: 1883, env: MQTT_PORT)'
    )

    parser.add_argument(
        '--mqtt-username',
        default=os.getenv('MQTT_USERNAME', None),
        help='MQTT username (env: MQTT_USERNAME)'
    )

    parser.add_argument(
        '--mqtt-password',
        default=os.getenv('MQTT_PASSWORD', None),
        help='MQTT password (env: MQTT_PASSWORD)'
    )

Factory Function:

# In base/pubsub.py or base/pubsub_factory.py
def get_pubsub(**config) -> PubSubBackend:
    """
    Create and return a pub/sub backend based on configuration.

    Args:
        config: Configuration dict from command-line args
                Must include 'pubsub_backend' key

    Returns:
        Backend instance (PulsarBackend, MQTTBackend, etc.)
    """
    backend_type = config.get('pubsub_backend', 'pulsar')

    if backend_type == 'pulsar':
        return PulsarBackend(
            host=config.get('pulsar_host'),
            api_key=config.get('pulsar_api_key'),
            listener=config.get('pulsar_listener'),
        )
    elif backend_type == 'mqtt':
        return MQTTBackend(
            host=config.get('mqtt_host'),
            port=config.get('mqtt_port'),
            username=config.get('mqtt_username'),
            password=config.get('mqtt_password'),
        )
    else:
        raise ValueError(f"Unknown pub/sub backend: {backend_type}")

Usage in AsyncProcessor:

# In async_processor.py
class AsyncProcessor:
    def __init__(self, **params):
        self.id = params.get("id")

        # Create backend from config (replaces PulsarClient)
        self.pubsub = get_pubsub(**params)

        # Rest of initialization...

Backend Interface

class PubSubBackend(Protocol):
    """Protocol defining the interface all pub/sub backends must implement."""

    def create_producer(self, topic: str, schema: type, **options) -> BackendProducer:
        """
        Create a producer for a topic.

        Args:
            topic: Generic topic format (qos/tenant/namespace/queue)
            schema: Dataclass type for messages
            options: Backend-specific options (e.g., chunking_enabled)

        Returns:
            Backend-specific producer instance
        """
        ...

    def create_consumer(
        self,
        topic: str,
        subscription: str,
        schema: type,
        initial_position: str = 'latest',
        consumer_type: str = 'shared',
        **options
    ) -> BackendConsumer:
        """
        Create a consumer for a topic.

        Args:
            topic: Generic topic format (qos/tenant/namespace/queue)
            subscription: Subscription/consumer group name
            schema: Dataclass type for messages
            initial_position: 'earliest' or 'latest' (MQTT may ignore)
            consumer_type: 'shared', 'exclusive', 'failover' (MQTT may ignore)
            options: Backend-specific options

        Returns:
            Backend-specific consumer instance
        """
        ...

    def close(self) -> None:
        """Close the backend connection."""
        ...

class BackendProducer(Protocol):
    """Protocol for backend-specific producer."""

    def send(self, message: Any, properties: dict | None = None) -> None:
        """Send a message (dataclass instance) with optional properties."""
        ...

    def flush(self) -> None:
        """Flush any buffered messages."""
        ...

    def close(self) -> None:
        """Close the producer."""
        ...

class BackendConsumer(Protocol):
    """Protocol for backend-specific consumer."""

    def receive(self, timeout_millis: int = 2000) -> Message:
        """
        Receive a message from the topic.

        Raises:
            TimeoutError: If no message received within timeout
        """
        ...

    def acknowledge(self, message: Message) -> None:
        """Acknowledge successful processing of a message."""
        ...

    def negative_acknowledge(self, message: Message) -> None:
        """Negative acknowledge - triggers redelivery."""
        ...

    def unsubscribe(self) -> None:
        """Unsubscribe from the topic."""
        ...

    def close(self) -> None:
        """Close the consumer."""
        ...

class Message(Protocol):
    """Protocol for a received message."""

    def value(self) -> Any:
        """Get the deserialized message (dataclass instance)."""
        ...

    def properties(self) -> dict:
        """Get message properties/metadata."""
        ...

Existing Classes Refactoring

The existing Consumer, Producer, Publisher, Subscriber classes remain largely intact:

Current responsibilities (keep):

  • Async threading model and taskgroups
  • Reconnection logic and retry handling
  • Metrics collection
  • Rate limiting
  • Concurrency management

Changes needed:

  • Remove direct Pulsar imports (pulsar.schema, pulsar.InitialPosition, etc.)
  • Accept BackendProducer/BackendConsumer instead of Pulsar client
  • Delegate actual pub/sub operations to backend instances
  • Map generic concepts to backend calls

Example refactoring:

# OLD - consumer.py
class Consumer:
    def __init__(self, client, topic, subscriber, schema, ...):
        self.client = client  # Direct Pulsar client
        # ...

    async def consumer_run(self):
        # Uses pulsar.InitialPosition, pulsar.ConsumerType
        self.consumer = self.client.subscribe(
            topic=self.topic,
            schema=JsonSchema(self.schema),
            initial_position=pulsar.InitialPosition.Earliest,
            consumer_type=pulsar.ConsumerType.Shared,
        )

# NEW - consumer.py
class Consumer:
    def __init__(self, backend_consumer, schema, ...):
        self.backend_consumer = backend_consumer  # Backend-specific consumer
        self.schema = schema
        # ...

    async def consumer_run(self):
        # Backend consumer already created with right settings
        # Just use it directly
        while self.running:
            msg = await asyncio.to_thread(
                self.backend_consumer.receive,
                timeout_millis=2000
            )
            await self.handle_message(msg)

Backend-Specific Behaviors

Pulsar Backend:

  • Maps q0 → non-persistent://, q1/q2 → persistent://
  • Supports all consumer types (shared, exclusive, failover)
  • Supports initial position (earliest/latest)
  • Native message acknowledgment
  • Schema registry support

MQTT Backend:

  • Maps q0/q1/q2 → MQTT QoS levels 0/1/2
  • Includes tenant/namespace in topic path for namespacing
  • Auto-generates client IDs from subscription names
  • Ignores initial position (no message history in basic MQTT)
  • Ignores consumer type (MQTT uses client IDs, not consumer groups)
  • Simple publish/subscribe model

Design Decisions Summary

  1. Generic queue naming: qos/tenant/namespace/queue-name format
  2. QoS in queue ID: Determined by queue definition, not configuration
  3. Reconnection: Handled by Consumer/Producer classes, not backends
  4. MQTT topics: Include tenant/namespace for proper namespacing
  5. Message history: MQTT ignores initial_position parameter (future enhancement)
  6. Client IDs: MQTT backend auto-generates from subscription name

Future Enhancements

MQTT message history:

  • Could add optional persistence layer (e.g., retained messages, external store)
  • Would allow supporting initial_position='earliest'
  • Not required for initial implementation