Messaging fabric plugins (#592)

* Plugin architecture for messaging fabric
* Schemas use a technology neutral expression
* Schemas strictness has uncovered some incorrect schema use which is fixed
2026-06-09 06:45:13 +02:00 · 2025-12-17 21:40:43 +00:00 · 2025-12-17 21:40:43 +00:00 · 34eb083836
commit 34eb083836
parent 1865b3f3c8
100 changed files with 2342 additions and 828 deletions
--- a/docs/tech-specs/pubsub.md
+++ b/docs/tech-specs/pubsub.md
@@ -0,0 +1,958 @@
+# Pub/Sub Infrastructure
+
+## Overview
+
+This document catalogs all connections between the TrustGraph codebase and the pub/sub infrastructure. Currently, the system is hardcoded to use Apache Pulsar. This analysis identifies all integration points to inform future refactoring toward a configurable pub/sub abstraction.
+
+## Current State: Pulsar Integration Points
+
+### 1. Direct Pulsar Client Usage
+
+**Location:** `trustgraph-flow/trustgraph/gateway/service.py`
+
+The API gateway directly imports and instantiates the Pulsar client:
+
+- **Line 20:** `import pulsar`
+- **Lines 54-61:** Direct instantiation of `pulsar.Client()` with optional `pulsar.AuthenticationToken()`
+- **Lines 33-35:** Default Pulsar host configuration from environment variables
+- **Lines 178-192:** CLI arguments for `--pulsar-host`, `--pulsar-api-key`, and `--pulsar-listener`
+- **Lines 78, 124:** Passes `pulsar_client` to `ConfigReceiver` and `DispatcherManager`
+
+This is the only location that directly instantiates a Pulsar client outside of the abstraction layer.
+
+### 2. Base Processor Framework
+
+**Location:** `trustgraph-base/trustgraph/base/async_processor.py`
+
+The base class for all processors provides Pulsar connectivity:
+
+- **Line 9:** `import _pulsar` (for exception handling)
+- **Line 18:** `from . pubsub import PulsarClient`
+- **Line 38:** Creates `pulsar_client_object = PulsarClient(**params)`
+- **Lines 104-108:** Properties exposing `pulsar_host` and `pulsar_client`
+- **Line 250:** Static method `add_args()` calls `PulsarClient.add_args(parser)` for CLI arguments
+- **Lines 223-225:** Exception handling for `_pulsar.Interrupted`
+
+All processors inherit from `AsyncProcessor`, making this the central integration point.
+
+### 3. Consumer Abstraction
+
+**Location:** `trustgraph-base/trustgraph/base/consumer.py`
+
+Consumes messages from queues and invokes handler functions:
+
+**Pulsar imports:**
+- **Line 12:** `from pulsar.schema import JsonSchema`
+- **Line 13:** `import pulsar`
+- **Line 14:** `import _pulsar`
+
+**Pulsar-specific usage:**
+- **Lines 100, 102:** `pulsar.InitialPosition.Earliest` / `pulsar.InitialPosition.Latest`
+- **Line 108:** `JsonSchema(self.schema)` wrapper
+- **Line 110:** `pulsar.ConsumerType.Shared`
+- **Lines 104-111:** `self.client.subscribe()` with Pulsar-specific parameters
+- **Lines 143, 150, 65:** `consumer.unsubscribe()` and `consumer.close()` methods
+- **Line 162:** `_pulsar.Timeout` exception
+- **Lines 182, 205, 232:** `consumer.acknowledge()` / `consumer.negative_acknowledge()`
+
+**Spec file:** `trustgraph-base/trustgraph/base/consumer_spec.py`
+- **Line 22:** References `processor.pulsar_client`
+
+### 4. Producer Abstraction
+
+**Location:** `trustgraph-base/trustgraph/base/producer.py`
+
+Sends messages to queues:
+
+**Pulsar imports:**
+- **Line 2:** `from pulsar.schema import JsonSchema`
+
+**Pulsar-specific usage:**
+- **Line 49:** `JsonSchema(self.schema)` wrapper
+- **Lines 47-51:** `self.client.create_producer()` with Pulsar-specific parameters (topic, schema, chunking_enabled)
+- **Lines 31, 76:** `producer.close()` method
+- **Lines 64-65:** `producer.send()` with message and properties
+
+**Spec file:** `trustgraph-base/trustgraph/base/producer_spec.py`
+- **Line 18:** References `processor.pulsar_client`
+
+### 5. Publisher Abstraction
+
+**Location:** `trustgraph-base/trustgraph/base/publisher.py`
+
+Asynchronous message publishing with queue buffering:
+
+**Pulsar imports:**
+- **Line 2:** `from pulsar.schema import JsonSchema`
+- **Line 6:** `import pulsar`
+
+**Pulsar-specific usage:**
+- **Line 52:** `JsonSchema(self.schema)` wrapper
+- **Lines 50-54:** `self.client.create_producer()` with Pulsar-specific parameters
+- **Lines 101, 103:** `producer.send()` with message and optional properties
+- **Lines 106-107:** `producer.flush()` and `producer.close()` methods
+
+### 6. Subscriber Abstraction
+
+**Location:** `trustgraph-base/trustgraph/base/subscriber.py`
+
+Provides multi-recipient message distribution from queues:
+
+**Pulsar imports:**
+- **Line 6:** `from pulsar.schema import JsonSchema`
+- **Line 8:** `import _pulsar`
+
+**Pulsar-specific usage:**
+- **Line 55:** `JsonSchema(self.schema)` wrapper
+- **Line 57:** `self.client.subscribe(**subscribe_args)`
+- **Lines 101, 136, 160, 167-172:** Pulsar exceptions: `_pulsar.Timeout`, `_pulsar.InvalidConfiguration`, `_pulsar.AlreadyClosed`
+- **Lines 159, 166, 170:** Consumer methods: `negative_acknowledge()`, `unsubscribe()`, `close()`
+- **Lines 247, 251:** Message acknowledgment: `acknowledge()`, `negative_acknowledge()`
+
+**Spec file:** `trustgraph-base/trustgraph/base/subscriber_spec.py`
+- **Line 19:** References `processor.pulsar_client`
+
+### 7. Schema System (Heart of Darkness)
+
+**Location:** `trustgraph-base/trustgraph/schema/`
+
+Every message schema in the system is defined using Pulsar's schema framework.
+
+**Core primitives:** `schema/core/primitives.py`
+- **Line 2:** `from pulsar.schema import Record, String, Boolean, Array, Integer`
+- All schemas inherit from Pulsar's `Record` base class
+- All field types are Pulsar types: `String()`, `Integer()`, `Boolean()`, `Array()`, `Map()`, `Double()`
+
+**Example schemas:**
+- `schema/services/llm.py` (Line 2): `from pulsar.schema import Record, String, Array, Double, Integer, Boolean`
+- `schema/services/config.py` (Line 2): `from pulsar.schema import Record, Bytes, String, Boolean, Array, Map, Integer`
+
+**Topic naming:** `schema/core/topic.py`
+- **Lines 2-3:** Topic format: `{kind}://{tenant}/{namespace}/{topic}`
+- This URI structure is Pulsar-specific (e.g., `persistent://tg/flow/config`)
+
+**Impact:**
+- All request/response message definitions throughout the codebase use Pulsar schemas
+- This includes services for: config, flow, llm, prompt, query, storage, agent, collection, diagnosis, library, lookup, nlp_query, objects_query, retrieval, structured_query
+- Schema definitions are imported and used extensively across all processors and services
+
+## Summary
+
+### Pulsar Dependencies by Category
+
+1. **Client instantiation:**
+   - Direct: `gateway/service.py`
+   - Abstracted: `async_processor.py` → `pubsub.py` (PulsarClient)
+
+2. **Message transport:**
+   - Consumer: `consumer.py`, `consumer_spec.py`
+   - Producer: `producer.py`, `producer_spec.py`
+   - Publisher: `publisher.py`
+   - Subscriber: `subscriber.py`, `subscriber_spec.py`
+
+3. **Schema system:**
+   - Base types: `schema/core/primitives.py`
+   - All service schemas: `schema/services/*.py`
+   - Topic naming: `schema/core/topic.py`
+
+4. **Pulsar-specific concepts required:**
+   - Topic-based messaging
+   - Schema system (Record, field types)
+   - Shared subscriptions
+   - Message acknowledgment (positive/negative)
+   - Initial consumer position (earliest/latest)
+   - Message properties
+   - Consumer types (shared, exclusive, failover)
+   - Chunking support
+   - Persistent vs non-persistent topics
+
+### Refactoring Challenges
+
+The good news: The abstraction layer (Consumer, Producer, Publisher, Subscriber) provides a clean encapsulation of most Pulsar interactions.
+
+The challenges:
+1. **Schema system pervasiveness:** Every message definition uses `pulsar.schema.Record` and Pulsar field types
+2. **Pulsar-specific enums:** `InitialPosition`, `ConsumerType`
+3. **Pulsar exceptions:** `_pulsar.Timeout`, `_pulsar.Interrupted`, `_pulsar.InvalidConfiguration`, `_pulsar.AlreadyClosed`
+4. **Method signatures:** `acknowledge()`, `negative_acknowledge()`, `subscribe()`, `create_producer()`, etc.
+5. **Topic URI format:** Pulsar's `kind://tenant/namespace/topic` structure
+
+### Next Steps
+
+To make the pub/sub infrastructure configurable, we need to:
+
+1. Create an abstraction interface for the client/schema system
+2. Abstract Pulsar-specific enums and exceptions
+3. Create schema wrappers or alternative schema definitions
+4. Implement the interface for both Pulsar and alternative systems (Kafka, RabbitMQ, Redis Streams, etc.)
+5. Update `pubsub.py` to be configurable and support multiple backends
+6. Provide migration path for existing deployments
+
+## Approach Draft 1: Adapter Pattern with Schema Translation Layer
+
+### Key Insight
+The **schema system** is the deepest integration point - everything else flows from it. We need to solve this first, or we'll be rewriting the entire codebase.
+
+### Strategy: Minimal Disruption with Adapters
+
+**1. Keep Pulsar schemas as the internal representation**
+- Don't rewrite all the schema definitions
+- Schemas remain `pulsar.schema.Record` internally
+- Use adapters to translate at the boundary between our code and the pub/sub backend
+
+**2. Create a pub/sub abstraction layer:**
+
+```
+┌─────────────────────────────────────┐
+│   Existing Code (unchanged)         │
+│   - Uses Pulsar schemas internally  │
+│   - Consumer/Producer/Publisher     │
+└──────────────┬──────────────────────┘
+               │
+┌──────────────┴──────────────────────┐
+│   PubSubFactory (configurable)      │
+│   - Creates backend-specific client │
+└──────────────┬──────────────────────┘
+               │
+        ┌──────┴──────┐
+        │             │
+┌───────▼──────┐  ┌───▼──────────┐
+│ PulsarAdapter│  │ KafkaAdapter │  etc...
+│ (passthrough)│  │ (translates) │
+└──────────────┘  └──────────────┘
+```
+
+**3. Define abstract interfaces:**
+- `PubSubClient` - client connection
+- `PubSubProducer` - sending messages
+- `PubSubConsumer` - receiving messages
+- `SchemaAdapter` - translating Pulsar schemas to/from JSON or backend-specific formats
+
+**4. Implementation details:**
+
+For **Pulsar adapter**: Nearly passthrough, minimal translation
+
+For **other backends** (Kafka, RabbitMQ, etc.):
+- Serialize Pulsar Record objects to JSON/bytes
+- Map concepts like:
+  - `InitialPosition.Earliest/Latest` → Kafka's auto.offset.reset
+  - `acknowledge()` → Kafka's commit
+  - `negative_acknowledge()` → Re-queue or DLQ pattern
+  - Topic URIs → Backend-specific topic names
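+
+A minimal sketch of the first two mappings, assuming a hypothetical `kafka_consumer_config()` helper that is not part of the codebase. The keys `group.id`, `auto.offset.reset` and `enable.auto.commit` are standard Kafka consumer settings; treating the subscription name as the consumer group is an assumption made for illustration.
+
+```python
+# Hypothetical helper: the function name and signature are illustrative only.
+def kafka_consumer_config(subscription: str, initial_position: str) -> dict:
+    """Translate generic consumer settings into Kafka consumer config keys."""
+    if initial_position not in ("earliest", "latest"):
+        raise ValueError(f"Unsupported initial position: {initial_position}")
+    return {
+        # Assumption: a subscription name plays the role of a consumer group
+        "group.id": subscription,
+        # Used only when the group has no committed offset yet
+        "auto.offset.reset": initial_position,
+        # Commit manually so acknowledge() can map to an explicit commit
+        "enable.auto.commit": False,
+    }
+
+
+config = kafka_consumer_config("text-completion", "earliest")
+```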
+
+### Analysis
+
+**Pros:**
+- ✅ Minimal code changes to existing services
+- ✅ Schemas stay as-is (no massive rewrite)
+- ✅ Gradual migration path
+- ✅ Pulsar users see no difference
+- ✅ New backends added via adapters
+
+**Cons:**
+- ⚠️ Still carries Pulsar dependency (for schema definitions)
+- ⚠️ Some impedance mismatch translating concepts
+
+### Alternative Consideration
+
+Create a **TrustGraph schema system** that's pub/sub agnostic (using dataclasses or Pydantic), then generate Pulsar/Kafka/etc schemas from it. This requires rewriting every schema file and may introduce breaking changes.
+
+### Recommendation for Draft 1
+
+Start with the **adapter approach** because:
+1. It's pragmatic - works with existing code
+2. Proves the concept with minimal risk
+3. Can evolve to a native schema system later if needed
+4. Configuration-driven: one env var switches backends
+
+## Approach Draft 2: Backend-Agnostic Schema System with Dataclasses
+
+### Core Concept
+
+Use Python **dataclasses** as the neutral schema definition format. Each pub/sub backend provides its own serialization/deserialization for dataclasses, eliminating the need for Pulsar schemas to remain in the codebase.
+
+### Schema Polymorphism at the Factory Level
+
+Instead of translating Pulsar schemas, **each backend provides its own schema handling** that works with standard Python dataclasses.
+
+### Publisher Flow
+
+```python
+# 1. Get the configured backend from factory
+pubsub = get_pubsub()  # Returns PulsarBackend, MQTTBackend, etc.
+
+# 2. Import the schema class
+# (A plain dataclass, so the import is the same for every backend)
+from trustgraph.schema.services.llm import TextCompletionRequest
+
+# 3. Create a producer/publisher for a specific topic
+producer = pubsub.create_producer(
+    topic="text-completion-requests",
+    schema=TextCompletionRequest  # Tells backend what schema to use
+)
+
+# 4. Create message instances (same API regardless of backend)
+request = TextCompletionRequest(
+    system="You are helpful",
+    prompt="Hello world",
+    streaming=False
+)
+
+# 5. Send the message
+producer.send(request)  # Backend serializes appropriately
+```
+
+### Consumer Flow
+
+```python
+from trustgraph.schema.services.llm import TextCompletionRequest
+
+# 1. Get the configured backend
+pubsub = get_pubsub()
+
+# 2. Create a consumer
+consumer = pubsub.subscribe(
+    topic="text-completion-requests",
+    schema=TextCompletionRequest  # Tells backend how to deserialize
+)
+
+# 3. Receive and deserialize
+msg = consumer.receive()
+request = msg.value()  # Returns TextCompletionRequest dataclass instance
+
+# 4. Use the data (type-safe access)
+print(request.system)   # "You are helpful"
+print(request.prompt)   # "Hello world"
+print(request.streaming)  # False
+```
+
+### What Happens Behind the Scenes
+
+**For Pulsar backend:**
+- `create_producer()` → creates Pulsar producer with JSON schema or dynamically generated Record
+- `send(request)` → serializes dataclass to JSON/Pulsar format, sends to Pulsar
+- `receive()` → gets Pulsar message, deserializes back to dataclass
+
+**For MQTT backend:**
+- `create_producer()` → connects to MQTT broker, no schema registration needed
+- `send(request)` → converts dataclass to JSON, publishes to MQTT topic
+- `receive()` → subscribes to MQTT topic, deserializes JSON to dataclass
+
+**For Kafka backend:**
+- `create_producer()` → creates Kafka producer, registers Avro schema if needed
+- `send(request)` → serializes dataclass to Avro format, sends to Kafka
+- `receive()` → gets Kafka message, deserializes Avro back to dataclass
+
+### Key Design Points
+
+1. **Schema object creation**: The dataclass instance (`TextCompletionRequest(...)`) is identical regardless of backend
+2. **Backend handles encoding**: Each backend knows how to serialize a dataclass to its own wire format
+3. **Schema definition at creation**: When creating producer/consumer, you specify the schema type
+4. **Type safety preserved**: You get back a proper `TextCompletionRequest` object, not a dict
+5. **No backend leakage**: Application code never imports backend-specific libraries
+
+### Example Transformation
+
+**Current (Pulsar-specific):**
+```python
+# schema/services/llm.py
+from pulsar.schema import Record, String, Boolean, Integer
+
+class TextCompletionRequest(Record):
+    system = String()
+    prompt = String()
+    streaming = Boolean()
+```
+
+**New (Backend-agnostic):**
+```python
+# schema/services/llm.py
+from dataclasses import dataclass
+
+@dataclass
+class TextCompletionRequest:
+    system: str
+    prompt: str
+    streaming: bool = False
+```
+
+### Backend Integration
+
+Each backend handles serialization/deserialization of dataclasses:
+
+**Pulsar backend:**
+- Dynamically generate `pulsar.schema.Record` classes from dataclasses
+- Or serialize dataclasses to JSON and use Pulsar's JSON schema
+- Maintains compatibility with existing Pulsar deployments
+
+**MQTT/Redis backend:**
+- Direct JSON serialization of dataclass instances
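+- Use `dataclasses.asdict()` to serialize; deserialization needs a small custom helper, since the standard library has no `from_dict()` counterpart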
+- Lightweight, no schema registry needed
+
+**Kafka backend:**
+- Generate Avro schemas from dataclass definitions
+- Use Confluent's schema registry
+- Type-safe serialization with schema evolution support
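+
+For the JSON-based backends, `dataclasses.asdict()` covers serialization, but the standard library offers no inverse. A minimal sketch of one, assuming hypothetical `from_dict()` and `_convert()` helpers (neither name comes from the codebase) and handling only the field types this document uses: primitives, `list[T]`, `dict[str, T]`, `T | None` and nested dataclasses.
+
+```python
+import types
+from dataclasses import dataclass, fields, is_dataclass
+from typing import Any, Union, get_args, get_origin, get_type_hints
+
+
+def from_dict(cls: type, data: dict) -> Any:
+    """Rebuild a dataclass instance (including nested ones) from a plain dict."""
+    hints = get_type_hints(cls)
+    kwargs = {}
+    for f in fields(cls):
+        if f.name in data:  # absent keys fall back to the dataclass default
+            kwargs[f.name] = _convert(hints[f.name], data[f.name])
+    return cls(**kwargs)
+
+
+def _convert(tp: Any, value: Any) -> Any:
+    if value is None:
+        return None
+    origin = get_origin(tp)
+    if origin is list:
+        return [_convert(get_args(tp)[0], v) for v in value]
+    if origin is dict:
+        return {k: _convert(get_args(tp)[1], v) for k, v in value.items()}
+    if origin in (Union, types.UnionType):  # e.g. T | None
+        candidates = [a for a in get_args(tp) if a is not type(None)]
+        return _convert(candidates[0], value) if len(candidates) == 1 else value
+    if is_dataclass(tp):
+        return from_dict(tp, value)
+    return value  # primitives pass through unchanged
+
+
+@dataclass
+class Value:
+    value: str
+    is_uri: bool
+
+
+@dataclass
+class Triple:
+    s: Value
+    p: Value
+    o: Value
+
+
+@dataclass
+class GraphQuery:
+    triples: list[Triple]
+    metadata: dict[str, str]
+
+
+node = {"value": "http://example.org/a", "is_uri": True}
+query = from_dict(
+    GraphQuery,
+    {"triples": [{"s": node, "p": node, "o": node}], "metadata": {"k": "v"}},
+)
+```
+
+The helper trusts its input; it performs no type checking beyond what the dataclass constructors do.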
+
+### Architecture
+
+```
+┌─────────────────────────────────────┐
+│   Application Code                  │
+│   - Uses dataclass schemas          │
+│   - Backend-agnostic                │
+└──────────────┬──────────────────────┘
+               │
+┌──────────────┴──────────────────────┐
+│   PubSubFactory (configurable)      │
+│   - get_pubsub() returns backend    │
+└──────────────┬──────────────────────┘
+               │
+        ┌──────┴──────────┐
+        │                 │
+┌───────▼─────────┐  ┌────▼──────────────┐
+│ PulsarBackend   │  │ MQTTBackend       │
+│ - JSON schema   │  │ - JSON serialize  │
+│ - or dynamic    │  │ - Simple queues   │
+│   Record gen    │  │                   │
+└─────────────────┘  └───────────────────┘
+```
+
+### Implementation Details
+
+**1. Schema definitions:** Plain dataclasses with type hints
+   - `str`, `int`, `bool`, `float` for primitives
+   - `list[T]` for arrays
+   - `dict[str, T]` for maps
+   - Nested dataclasses for complex types
+
+**2. Each backend provides:**
+   - Serializer: `dataclass → bytes/wire format`
+   - Deserializer: `bytes/wire format → dataclass`
+   - Schema registration (if needed, like Pulsar/Kafka)
+
+**3. Consumer/Producer abstraction:**
+   - Already exists (consumer.py, producer.py)
+   - Update to use backend's serialization
+   - Remove direct Pulsar imports
+
+**4. Type mappings:**
+   - Pulsar `String()` → Python `str`
+   - Pulsar `Integer()` → Python `int`
+   - Pulsar `Boolean()` → Python `bool`
+   - Pulsar `Array(T)` → Python `list[T]`
+   - Pulsar `Map(V)` → Python `dict[str, V]` (Pulsar map keys are always strings)
+   - Pulsar `Double()` → Python `float`
+   - Pulsar `Bytes()` → Python `bytes`
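+
+The same kind of table can drive schema generation for backends that need a declared schema, such as the Avro option mentioned for Kafka. A minimal sketch, assuming a hypothetical `avro_schema()` helper; the choice of `long` for `int` and `double` for `float` is an assumption, and optional fields are not handled.
+
+```python
+from dataclasses import dataclass, fields, is_dataclass
+from typing import get_args, get_origin, get_type_hints
+
+# Python primitive -> Avro primitive type name (mapping chosen for illustration)
+PRIMITIVES = {str: "string", int: "long", bool: "boolean", float: "double", bytes: "bytes"}
+
+
+def avro_schema(tp, seen=None):
+    """Build an Avro schema (as plain dicts) from a dataclass or type hint."""
+    seen = set() if seen is None else seen
+    origin = get_origin(tp)
+    if origin is list:
+        return {"type": "array", "items": avro_schema(get_args(tp)[0], seen)}
+    if origin is dict:
+        # Avro map keys are always strings; only the value type is declared
+        return {"type": "map", "values": avro_schema(get_args(tp)[1], seen)}
+    if is_dataclass(tp):
+        if tp.__name__ in seen:
+            return tp.__name__  # refer to an already-defined record by name
+        seen.add(tp.__name__)
+        hints = get_type_hints(tp)
+        return {
+            "type": "record",
+            "name": tp.__name__,
+            "fields": [
+                {"name": f.name, "type": avro_schema(hints[f.name], seen)}
+                for f in fields(tp)
+            ],
+        }
+    return PRIMITIVES[tp]
+
+
+@dataclass
+class Value:
+    value: str
+    is_uri: bool
+
+
+@dataclass
+class Triple:
+    s: Value
+    p: Value
+    o: Value
+
+
+schema = avro_schema(Triple)
+```
+
+The `seen` set matters because Avro allows a named record to be defined only once; later uses refer to it by name.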
+
+### Migration Path
+
+1. **Create dataclass versions** of all schemas in `trustgraph/schema/`
+2. **Update the messaging classes** (Consumer, Producer, Publisher, Subscriber) to use backend-provided serialization
+3. **Implement PulsarBackend** with JSON schema or dynamic Record generation
+4. **Test with Pulsar** to ensure backward compatibility with existing deployments
+5. **Add new backends** (MQTT, Kafka, Redis, etc.) as needed
+6. **Remove Pulsar imports** from schema files
+
+### Benefits
+
+✅ **No pub/sub dependency** in schema definitions
+✅ **Standard Python** - easy to understand, type-check, document
+✅ **Modern tooling** - works with mypy, IDE autocomplete, linters
+✅ **Backend-optimized** - each backend uses native serialization
+✅ **No translation overhead** - direct serialization, no adapters
+✅ **Type safety** - real objects with proper types
+✅ **Easy validation** - can use Pydantic if needed
+
+### Challenges & Solutions
+
+**Challenge:** Pulsar's `Record` has runtime field validation
+**Solution:** Use Pydantic dataclasses for validation if needed, or validate in the dataclass's `__post_init__` method
+
+**Challenge:** Some Pulsar-specific features (like `Bytes` type)
+**Solution:** Map to `bytes` type in dataclass, backend handles encoding appropriately
+
+**Challenge:** Topic naming (`persistent://tenant/namespace/topic`)
+**Solution:** Abstract topic names in schema definitions, backend converts to proper format
+
+**Challenge:** Schema evolution and versioning
+**Solution:** Each backend handles this according to its capabilities (Pulsar schema versions, Kafka schema registry, etc.)
+
+**Challenge:** Nested complex types
+**Solution:** Use nested dataclasses, backends recursively serialize/deserialize
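+
+For the runtime-validation challenge, a minimal sketch of the `__post_init__` option. The checks shown are an assumption about what validation might be wanted; plain dataclasses do not enforce type hints on their own.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class TextCompletionRequest:
+    system: str
+    prompt: str
+    streaming: bool = False
+
+    def __post_init__(self):
+        # Runs right after the generated __init__; check each field explicitly
+        expected_types = (("system", str), ("prompt", str), ("streaming", bool))
+        for name, expected in expected_types:
+            value = getattr(self, name)
+            if not isinstance(value, expected):
+                raise TypeError(
+                    f"{name} must be {expected.__name__}, "
+                    f"got {type(value).__name__}"
+                )
+
+
+request = TextCompletionRequest(system="You are helpful", prompt="Hello world")
+```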
+
+### Design Decisions
+
+1. **Plain dataclasses or Pydantic?**
+   - ✅ **Decision: Use plain Python dataclasses**
+   - Simpler, no additional dependencies
+   - Validation not required in practice
+   - Easier to understand and maintain
+
+2. **Schema evolution:**
+   - ✅ **Decision: No versioning mechanism needed**
+   - Schemas are stable and long-lasting
+   - Updates typically add new fields (backward compatible)
+   - Backends handle schema evolution according to their capabilities
+
+3. **Backward compatibility:**
+   - ✅ **Decision: Major version change, no backward compatibility required**
+   - Will be a breaking change with migration instructions
+   - Clean break allows for better design
+   - Migration guide will be provided for existing deployments
+
+4. **Nested types and complex structures:**
+   - ✅ **Decision: Use nested dataclasses naturally**
+   - Python dataclasses handle nesting perfectly
+   - `list[T]` for arrays, `dict[str, T]` for maps
+   - Backends recursively serialize/deserialize
+   - Example:
+     ```python
+     @dataclass
+     class Value:
+         value: str
+         is_uri: bool
+
+     @dataclass
+     class Triple:
+         s: Value              # Nested dataclass
+         p: Value
+         o: Value
+
+     @dataclass
+     class GraphQuery:
+         triples: list[Triple]  # Array of nested dataclasses
+         metadata: dict[str, str]
+     ```
+
+5. **Default values and optional fields:**
+   - ✅ **Decision: Mix of required, defaults, and optional fields**
+   - Required fields: No default value
+   - Fields with defaults: Always present, have sensible default
+   - Truly optional fields: `T | None = None`, omitted from serialization when `None`
+   - Example:
+     ```python
+     @dataclass
+     class TextCompletionRequest:
+         system: str              # Required, no default
+         prompt: str              # Required, no default
+         streaming: bool = False  # Optional with default value
+         metadata: dict | None = None  # Truly optional, can be absent
+     ```
+
+   **Important serialization semantics:**
+
+   When `metadata = None`, the `metadata` field is not present at all:
+   ```json
+   {
+       "system": "...",
+       "prompt": "...",
+       "streaming": false
+   }
+   ```
+
+   When `metadata = {}` (explicitly empty), the field is present but empty:
+   ```json
+   {
+       "system": "...",
+       "prompt": "...",
+       "streaming": false,
+       "metadata": {}
+   }
+   ```
+
+   **Key distinction:**
+   - `None` → field absent from JSON (not serialized)
+   - Empty value (`{}`, `[]`, `""`) → field present with empty value
+   - This matters semantically: "not provided" vs "explicitly empty"
+   - Serialization backends must skip `None` fields, not encode as `null`
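+
+`dataclasses.asdict()` keeps `None` fields, which `json.dumps()` would then encode as `null`, so the rule above needs a custom encoder. A minimal sketch, assuming hypothetical `to_plain()` and `encode()` helpers that are not part of the codebase:
+
+```python
+import json
+from dataclasses import dataclass, fields, is_dataclass
+from typing import Optional
+
+
+def to_plain(obj):
+    """Convert dataclasses to dicts recursively, dropping fields set to None."""
+    if is_dataclass(obj) and not isinstance(obj, type):
+        return {
+            f.name: to_plain(getattr(obj, f.name))
+            for f in fields(obj)
+            if getattr(obj, f.name) is not None
+        }
+    if isinstance(obj, (list, tuple)):
+        return [to_plain(v) for v in obj]
+    if isinstance(obj, dict):
+        return {k: to_plain(v) for k, v in obj.items()}
+    return obj
+
+
+def encode(message) -> bytes:
+    return json.dumps(to_plain(message)).encode("utf-8")
+
+
+@dataclass
+class TextCompletionRequest:
+    system: str
+    prompt: str
+    streaming: bool = False
+    metadata: Optional[dict] = None  # same meaning as dict | None
+
+
+absent = json.loads(encode(TextCompletionRequest("s", "p")))
+empty = json.loads(encode(TextCompletionRequest("s", "p", metadata={})))
+```
+
+Here `absent` has no `metadata` key, while `empty` carries `"metadata": {}`. Only dataclass fields are filtered; `None` values stored inside a dict are left as they are.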
+
+## Approach Draft 3: Implementation Details
+
+### Generic Queue Naming Format
+
+Replace backend-specific queue names with a generic format that backends can map appropriately.
+
+**Format:** `{qos}/{tenant}/{namespace}/{queue-name}`
+
+Where:
+- `qos`: Quality of Service level
+  - `q0` = best-effort (fire and forget, no acknowledgment)
+  - `q1` = at-least-once (requires acknowledgment)
+  - `q2` = exactly-once (two-phase acknowledgment)
+- `tenant`: Logical grouping for multi-tenancy
+- `namespace`: Sub-grouping within tenant
+- `queue-name`: Actual queue/topic name
+
+**Examples:**
+```
+q1/tg/flow/text-completion-requests
+q2/tg/config/config-push
+q0/tg/metrics/stats
+```
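+
+Backends will all need to split this format, so a shared parser is one option. A minimal sketch, assuming hypothetical `TopicParts` and `parse_topic()` names and assuming that malformed identifiers should be rejected rather than passed through:
+
+```python
+from typing import NamedTuple
+
+VALID_QOS = ("q0", "q1", "q2")
+
+
+class TopicParts(NamedTuple):
+    qos: str
+    tenant: str
+    namespace: str
+    queue: str
+
+
+def parse_topic(generic_topic: str) -> TopicParts:
+    """Split qos/tenant/namespace/queue-name into its four components."""
+    # maxsplit=3 keeps any further '/' characters inside the queue name
+    parts = generic_topic.split("/", 3)
+    if len(parts) != 4 or not all(parts):
+        raise ValueError(f"Expected qos/tenant/namespace/queue: {generic_topic!r}")
+    if parts[0] not in VALID_QOS:
+        raise ValueError(f"Unknown QoS level: {parts[0]!r}")
+    return TopicParts(*parts)
+
+
+parts = parse_topic("q1/tg/flow/text-completion-requests")
+```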
+
+### Backend Topic Mapping
+
+Each backend maps the generic format to its native format:
+
+**Pulsar Backend:**
+```python
+def map_topic(self, generic_topic: str) -> str:
+    # Parse: q1/tg/flow/text-completion-requests
+    qos, tenant, namespace, queue = generic_topic.split('/', 3)
+
+    # Map QoS to persistence
+    persistence = 'persistent' if qos in ['q1', 'q2'] else 'non-persistent'
+
+    # Return Pulsar URI: persistent://tg/flow/text-completion-requests
+    return f"{persistence}://{tenant}/{namespace}/{queue}"
+```
+
+**MQTT Backend:**
+```python
+def map_topic(self, generic_topic: str) -> tuple[str, int]:
+    # Parse: q1/tg/flow/text-completion-requests
+    qos, tenant, namespace, queue = generic_topic.split('/', 3)
+
+    # Map QoS level
+    qos_level = {'q0': 0, 'q1': 1, 'q2': 2}[qos]
+
+    # Build MQTT topic including tenant/namespace for proper namespacing
+    mqtt_topic = f"{tenant}/{namespace}/{queue}"
+
+    return mqtt_topic, qos_level
+```
+
+### Updated Topic Helper Function
+
+```python
+# schema/core/topic.py
+def topic(queue_name, qos='q1', tenant='tg', namespace='flow'):
+    """
+    Create a generic topic identifier that can be mapped by backends.
+
+    Args:
+        queue_name: The queue/topic name
+        qos: Quality of service
+             - 'q0' = best-effort (no ack)
+             - 'q1' = at-least-once (ack required)
+             - 'q2' = exactly-once (two-phase ack)
+        tenant: Tenant identifier for multi-tenancy
+        namespace: Namespace within tenant
+
+    Returns:
+        Generic topic string: qos/tenant/namespace/queue_name
+
+    Examples:
+        topic('my-queue')  # q1/tg/flow/my-queue
+        topic('config', qos='q2', namespace='config')  # q2/tg/config/config
+    """
+    return f"{qos}/{tenant}/{namespace}/{queue_name}"
+```
+
+### Configuration and Initialization
+
+**Command-Line Arguments + Environment Variables:**
+
+```python
+# In base/async_processor.py - add_args() method
+# (requires `import os` at module level)
+@staticmethod
+def add_args(parser):
+    # Pub/sub backend selection
+    parser.add_argument(
+        '--pubsub-backend',
+        default=os.getenv('PUBSUB_BACKEND', 'pulsar'),
+        choices=['pulsar', 'mqtt'],
+        help='Pub/sub backend (default: pulsar, env: PUBSUB_BACKEND)'
+    )
+
+    # Pulsar-specific configuration
+    parser.add_argument(
+        '--pulsar-host',
+        default=os.getenv('PULSAR_HOST', 'pulsar://localhost:6650'),
+        help='Pulsar host (default: pulsar://localhost:6650, env: PULSAR_HOST)'
+    )
+
+    parser.add_argument(
+        '--pulsar-api-key',
+        default=os.getenv('PULSAR_API_KEY', None),
+        help='Pulsar API key (env: PULSAR_API_KEY)'
+    )
+
+    parser.add_argument(
+        '--pulsar-listener',
+        default=os.getenv('PULSAR_LISTENER', None),
+        help='Pulsar listener name (env: PULSAR_LISTENER)'
+    )
+
+    # MQTT-specific configuration
+    parser.add_argument(
+        '--mqtt-host',
+        default=os.getenv('MQTT_HOST', 'localhost'),
+        help='MQTT broker host (default: localhost, env: MQTT_HOST)'
+    )
+
+    parser.add_argument(
+        '--mqtt-port',
+        type=int,
+        default=int(os.getenv('MQTT_PORT', '1883')),
+        help='MQTT broker port (default: 1883, env: MQTT_PORT)'
+    )
+
+    parser.add_argument(
+        '--mqtt-username',
+        default=os.getenv('MQTT_USERNAME', None),
+        help='MQTT username (env: MQTT_USERNAME)'
+    )
+
+    parser.add_argument(
+        '--mqtt-password',
+        default=os.getenv('MQTT_PASSWORD', None),
+        help='MQTT password (env: MQTT_PASSWORD)'
+    )
+```
+
+**Factory Function:**
+
+```python
+# In base/pubsub.py or base/pubsub_factory.py
+def get_pubsub(**config) -> PubSubBackend:
+    """
+    Create and return a pub/sub backend based on configuration.
+
+    Args:
+        config: Configuration dict from command-line args
+                Must include 'pubsub_backend' key
+
+    Returns:
+        Backend instance (PulsarBackend, MQTTBackend, etc.)
+    """
+    backend_type = config.get('pubsub_backend', 'pulsar')
+
+    if backend_type == 'pulsar':
+        return PulsarBackend(
+            host=config.get('pulsar_host'),
+            api_key=config.get('pulsar_api_key'),
+            listener=config.get('pulsar_listener'),
+        )
+    elif backend_type == 'mqtt':
+        return MQTTBackend(
+            host=config.get('mqtt_host'),
+            port=config.get('mqtt_port'),
+            username=config.get('mqtt_username'),
+            password=config.get('mqtt_password'),
+        )
+    else:
+        raise ValueError(f"Unknown pub/sub backend: {backend_type}")
+```
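+
+An alternative to the `if`/`elif` chain is a registry that backends add themselves to, which fits a plugin-style layout. A minimal sketch of that variation, not the factory above: `register_backend()` and `MemoryBackend` are hypothetical names introduced for illustration.
+
+```python
+from typing import Callable
+
+# Maps backend name -> callable that builds a backend from the config dict
+_BACKENDS: dict[str, Callable[..., object]] = {}
+
+
+def register_backend(name: str):
+    """Class decorator that makes a backend selectable by name."""
+    def decorator(factory):
+        _BACKENDS[name] = factory
+        return factory
+    return decorator
+
+
+def get_pubsub(**config):
+    name = config.get("pubsub_backend", "pulsar")
+    try:
+        factory = _BACKENDS[name]
+    except KeyError:
+        raise ValueError(f"Unknown pub/sub backend: {name}") from None
+    return factory(**config)
+
+
+@register_backend("memory")
+class MemoryBackend:
+    """Placeholder backend used only to demonstrate registration."""
+
+    def __init__(self, **config):
+        self.config = config
+
+
+backend = get_pubsub(pubsub_backend="memory", id="demo")
+```
+
+With this layout, adding a backend does not require editing the factory, at the cost of having to import each backend module before it can be selected.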
+
+**Usage in AsyncProcessor:**
+
+```python
+# In async_processor.py
+class AsyncProcessor:
+    def __init__(self, **params):
+        self.id = params.get("id")
+
+        # Create backend from config (replaces PulsarClient)
+        self.pubsub = get_pubsub(**params)
+
+        # Rest of initialization...
+```
+
+### Backend Interface
+
+```python
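+from __future__ import annotations  # allow forward references in annotations
+
+from typing import Any, Protocol
+
+class PubSubBackend(Protocol):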
+    """Protocol defining the interface all pub/sub backends must implement."""
+
+    def create_producer(self, topic: str, schema: type, **options) -> BackendProducer:
+        """
+        Create a producer for a topic.
+
+        Args:
+            topic: Generic topic format (qos/tenant/namespace/queue)
+            schema: Dataclass type for messages
+            options: Backend-specific options (e.g., chunking_enabled)
+
+        Returns:
+            Backend-specific producer instance
+        """
+        ...
+
+    def create_consumer(
+        self,
+        topic: str,
+        subscription: str,
+        schema: type,
+        initial_position: str = 'latest',
+        consumer_type: str = 'shared',
+        **options
+    ) -> BackendConsumer:
+        """
+        Create a consumer for a topic.
+
+        Args:
+            topic: Generic topic format (qos/tenant/namespace/queue)
+            subscription: Subscription/consumer group name
+            schema: Dataclass type for messages
+            initial_position: 'earliest' or 'latest' (MQTT may ignore)
+            consumer_type: 'shared', 'exclusive', 'failover' (MQTT may ignore)
+            options: Backend-specific options
+
+        Returns:
+            Backend-specific consumer instance
+        """
+        ...
+
+    def close(self) -> None:
+        """Close the backend connection."""
+        ...
+```
+
+```python
+class BackendProducer(Protocol):
+    """Protocol for backend-specific producer."""
+
+    def send(self, message: Any, properties: dict | None = None) -> None:
+        """Send a message (dataclass instance) with optional properties."""
+        ...
+
+    def flush(self) -> None:
+        """Flush any buffered messages."""
+        ...
+
+    def close(self) -> None:
+        """Close the producer."""
+        ...
+```
+
+```python
+class BackendConsumer(Protocol):
+    """Protocol for backend-specific consumer."""
+
+    def receive(self, timeout_millis: int = 2000) -> Message:
+        """
+        Receive a message from the topic.
+
+        Raises:
+            TimeoutError: If no message received within timeout
+        """
+        ...
+
+    def acknowledge(self, message: Message) -> None:
+        """Acknowledge successful processing of a message."""
+        ...
+
+    def negative_acknowledge(self, message: Message) -> None:
+        """Negative acknowledge - triggers redelivery."""
+        ...
+
+    def unsubscribe(self) -> None:
+        """Unsubscribe from the topic."""
+        ...
+
+    def close(self) -> None:
+        """Close the consumer."""
+        ...
+```
+
+```python
+class Message(Protocol):
+    """Protocol for a received message."""
+
+    def value(self) -> Any:
+        """Get the deserialized message (dataclass instance)."""
+        ...
+
+    def properties(self) -> dict:
+        """Get message properties/metadata."""
+        ...
+```
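+
+One way to check that the protocols are sufficient is to implement them without a broker. A minimal sketch of an in-process backend, assuming hypothetical `Memory*` class names; it keeps one queue per topic, so all subscriptions compete for messages, and it ignores `schema`, `initial_position` and `consumer_type`.
+
+```python
+import queue
+
+
+class MemoryMessage:
+    def __init__(self, value, properties):
+        self._value = value
+        self._properties = properties
+
+    def value(self):
+        return self._value
+
+    def properties(self):
+        return self._properties
+
+
+class MemoryProducer:
+    def __init__(self, q):
+        self._q = q
+
+    def send(self, message, properties=None):
+        self._q.put(MemoryMessage(message, dict(properties or {})))
+
+    def flush(self):
+        pass  # nothing is buffered
+
+    def close(self):
+        pass
+
+
+class MemoryConsumer:
+    def __init__(self, q):
+        self._q = q
+
+    def receive(self, timeout_millis=2000):
+        try:
+            return self._q.get(timeout=timeout_millis / 1000)
+        except queue.Empty:
+            raise TimeoutError("No message received within timeout") from None
+
+    def acknowledge(self, message):
+        pass  # the message was already removed from the queue
+
+    def negative_acknowledge(self, message):
+        self._q.put(message)  # put it back so it is delivered again
+
+    def unsubscribe(self):
+        pass
+
+    def close(self):
+        pass
+
+
+class MemoryBackend:
+    def __init__(self):
+        self._queues = {}
+
+    def _queue(self, topic):
+        return self._queues.setdefault(topic, queue.Queue())
+
+    def create_producer(self, topic, schema, **options):
+        return MemoryProducer(self._queue(topic))
+
+    def create_consumer(self, topic, subscription, schema,
+                        initial_position="latest", consumer_type="shared",
+                        **options):
+        return MemoryConsumer(self._queue(topic))
+
+    def close(self):
+        self._queues.clear()
+
+
+backend = MemoryBackend()
+producer = backend.create_producer("q1/tg/flow/demo", schema=dict)
+consumer = backend.create_consumer("q1/tg/flow/demo", "sub", schema=dict)
+producer.send({"prompt": "Hello world"}, properties={"id": "1"})
+received = consumer.receive(timeout_millis=100)
+```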
+
+### Existing Classes Refactoring
+
+The existing `Consumer`, `Producer`, `Publisher`, `Subscriber` classes remain largely intact:
+
+**Current responsibilities (keep):**
+- Async threading model and taskgroups
+- Reconnection logic and retry handling
+- Metrics collection
+- Rate limiting
+- Concurrency management
+
+**Changes needed:**
+- Remove direct Pulsar imports (`pulsar.schema`, `pulsar.InitialPosition`, etc.)
+- Accept `BackendProducer`/`BackendConsumer` instead of Pulsar client
+- Delegate actual pub/sub operations to backend instances
+- Map generic concepts to backend calls
+
+**Example refactoring:**
+
+```python
+# OLD - consumer.py
+class Consumer:
+    def __init__(self, client, topic, subscriber, schema, ...):
+        self.client = client  # Direct Pulsar client
+        # ...
+
+    async def consumer_run(self):
+        # Uses pulsar.InitialPosition, pulsar.ConsumerType
+        self.consumer = self.client.subscribe(
+            topic=self.topic,
+            schema=JsonSchema(self.schema),
+            initial_position=pulsar.InitialPosition.Earliest,
+            consumer_type=pulsar.ConsumerType.Shared,
+        )
+
+# NEW - consumer.py
+class Consumer:
+    def __init__(self, backend_consumer, schema, ...):
+        self.backend_consumer = backend_consumer  # Backend-specific consumer
+        self.schema = schema
+        # ...
+
+    async def consumer_run(self):
+        # Backend consumer already created with right settings
+        # Just use it directly
+        while self.running:
+            try:
+                msg = await asyncio.to_thread(
+                    self.backend_consumer.receive,
+                    timeout_millis=2000
+                )
+            except TimeoutError:
+                # No message within the timeout; poll again
+                continue
+            await self.handle_message(msg)
+```
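+
+The delegation pattern in the NEW version can be exercised without a broker. A minimal sketch, assuming a hypothetical `FakeConsumer` stand-in and a `consume()` loop that stops after one idle poll so the example terminates (the real class would loop while `self.running`). Acknowledging on success and negatively acknowledging on handler failure is an assumption about the intended policy.
+
+```python
+import asyncio
+import queue
+
+
+class FakeConsumer:
+    """Stand-in for a BackendConsumer, backed by a local queue."""
+
+    def __init__(self, items):
+        self._q = queue.Queue()
+        for item in items:
+            self._q.put(item)
+        self.acked, self.nacked = [], []
+
+    def receive(self, timeout_millis=2000):
+        try:
+            return self._q.get(timeout=timeout_millis / 1000)
+        except queue.Empty:
+            raise TimeoutError("No message received within timeout") from None
+
+    def acknowledge(self, message):
+        self.acked.append(message)
+
+    def negative_acknowledge(self, message):
+        self.nacked.append(message)
+
+
+async def consume(backend_consumer, handler, max_idle_polls=1):
+    idle = 0
+    while idle < max_idle_polls:
+        try:
+            # receive() blocks, so run it in a worker thread
+            msg = await asyncio.to_thread(
+                backend_consumer.receive, timeout_millis=50
+            )
+        except TimeoutError:
+            idle += 1
+            continue
+        idle = 0
+        try:
+            await handler(msg)
+        except Exception:
+            backend_consumer.negative_acknowledge(msg)
+        else:
+            backend_consumer.acknowledge(msg)
+
+
+async def handler(msg):
+    if msg == "bad":
+        raise ValueError("cannot process")
+
+
+consumer = FakeConsumer(["ok", "bad"])
+asyncio.run(consume(consumer, handler))
+```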
+
+### Backend-Specific Behaviors
+
+**Pulsar Backend:**
+- Maps `q0` → `non-persistent://`, `q1`/`q2` → `persistent://`
+- Supports all consumer types (shared, exclusive, failover)
+- Supports initial position (earliest/latest)
+- Native message acknowledgment
+- Schema registry support
+
+**MQTT Backend:**
+- Maps `q0`/`q1`/`q2` → MQTT QoS levels 0/1/2
+- Includes tenant/namespace in topic path for namespacing
+- Auto-generates client IDs from subscription names
+- Ignores initial position (no message history in basic MQTT)
+- Ignores consumer type (MQTT uses client IDs, not consumer groups)
+- Simple publish/subscribe model
+
+### Design Decisions Summary
+
+1. ✅ **Generic queue naming**: `qos/tenant/namespace/queue-name` format
+2. ✅ **QoS in queue ID**: Determined by queue definition, not configuration
+3. ✅ **Reconnection**: Handled by Consumer/Producer classes, not backends
+4. ✅ **MQTT topics**: Include tenant/namespace for proper namespacing
+5. ✅ **Message history**: MQTT ignores `initial_position` parameter (future enhancement)
+6. ✅ **Client IDs**: MQTT backend auto-generates from subscription name
+
+### Future Enhancements
+
+**MQTT message history:**
+- Could add optional persistence layer (e.g., retained messages, external store)
+- Would allow supporting `initial_position='earliest'`
+- Not required for initial implementation
+