mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-25 08:26:21 +02:00
959 lines
33 KiB
Markdown
959 lines
33 KiB
Markdown
|
|
# Pub/Sub Infrastructure
|
||
|
|
|
||
|
|
## Overview
|
||
|
|
|
||
|
|
This document catalogs all connections between the TrustGraph codebase and the pub/sub infrastructure. Currently, the system is hardcoded to use Apache Pulsar. This analysis identifies all integration points to inform future refactoring toward a configurable pub/sub abstraction.
|
||
|
|
|
||
|
|
## Current State: Pulsar Integration Points
|
||
|
|
|
||
|
|
### 1. Direct Pulsar Client Usage
|
||
|
|
|
||
|
|
**Location:** `trustgraph-flow/trustgraph/gateway/service.py`
|
||
|
|
|
||
|
|
The API gateway directly imports and instantiates the Pulsar client:
|
||
|
|
|
||
|
|
- **Line 20:** `import pulsar`
|
||
|
|
- **Lines 54-61:** Direct instantiation of `pulsar.Client()` with optional `pulsar.AuthenticationToken()`
|
||
|
|
- **Lines 33-35:** Default Pulsar host configuration from environment variables
|
||
|
|
- **Lines 178-192:** CLI arguments for `--pulsar-host`, `--pulsar-api-key`, and `--pulsar-listener`
|
||
|
|
- **Lines 78, 124:** Passes `pulsar_client` to `ConfigReceiver` and `DispatcherManager`
|
||
|
|
|
||
|
|
This is the only location that directly instantiates a Pulsar client outside of the abstraction layer.
|
||
|
|
|
||
|
|
### 2. Base Processor Framework
|
||
|
|
|
||
|
|
**Location:** `trustgraph-base/trustgraph/base/async_processor.py`
|
||
|
|
|
||
|
|
The base class for all processors provides Pulsar connectivity:
|
||
|
|
|
||
|
|
- **Line 9:** `import _pulsar` (for exception handling)
|
||
|
|
- **Line 18:** `from . pubsub import PulsarClient`
|
||
|
|
- **Line 38:** Creates `pulsar_client_object = PulsarClient(**params)`
|
||
|
|
- **Lines 104-108:** Properties exposing `pulsar_host` and `pulsar_client`
|
||
|
|
- **Line 250:** Static method `add_args()` calls `PulsarClient.add_args(parser)` for CLI arguments
|
||
|
|
- **Lines 223-225:** Exception handling for `_pulsar.Interrupted`
|
||
|
|
|
||
|
|
All processors inherit from `AsyncProcessor`, making this the central integration point.
|
||
|
|
|
||
|
|
### 3. Consumer Abstraction
|
||
|
|
|
||
|
|
**Location:** `trustgraph-base/trustgraph/base/consumer.py`
|
||
|
|
|
||
|
|
Consumes messages from queues and invokes handler functions:
|
||
|
|
|
||
|
|
**Pulsar imports:**
|
||
|
|
- **Line 12:** `from pulsar.schema import JsonSchema`
|
||
|
|
- **Line 13:** `import pulsar`
|
||
|
|
- **Line 14:** `import _pulsar`
|
||
|
|
|
||
|
|
**Pulsar-specific usage:**
|
||
|
|
- **Lines 100, 102:** `pulsar.InitialPosition.Earliest` / `pulsar.InitialPosition.Latest`
|
||
|
|
- **Line 108:** `JsonSchema(self.schema)` wrapper
|
||
|
|
- **Line 110:** `pulsar.ConsumerType.Shared`
|
||
|
|
- **Lines 104-111:** `self.client.subscribe()` with Pulsar-specific parameters
|
||
|
|
- **Lines 143, 150, 65:** `consumer.unsubscribe()` and `consumer.close()` methods
|
||
|
|
- **Line 162:** `_pulsar.Timeout` exception
|
||
|
|
- **Lines 182, 205, 232:** `consumer.acknowledge()` / `consumer.negative_acknowledge()`
|
||
|
|
|
||
|
|
**Spec file:** `trustgraph-base/trustgraph/base/consumer_spec.py`
|
||
|
|
- **Line 22:** References `processor.pulsar_client`
|
||
|
|
|
||
|
|
### 4. Producer Abstraction
|
||
|
|
|
||
|
|
**Location:** `trustgraph-base/trustgraph/base/producer.py`
|
||
|
|
|
||
|
|
Sends messages to queues:
|
||
|
|
|
||
|
|
**Pulsar imports:**
|
||
|
|
- **Line 2:** `from pulsar.schema import JsonSchema`
|
||
|
|
|
||
|
|
**Pulsar-specific usage:**
|
||
|
|
- **Line 49:** `JsonSchema(self.schema)` wrapper
|
||
|
|
- **Lines 47-51:** `self.client.create_producer()` with Pulsar-specific parameters (topic, schema, chunking_enabled)
|
||
|
|
- **Lines 31, 76:** `producer.close()` method
|
||
|
|
- **Lines 64-65:** `producer.send()` with message and properties
|
||
|
|
|
||
|
|
**Spec file:** `trustgraph-base/trustgraph/base/producer_spec.py`
|
||
|
|
- **Line 18:** References `processor.pulsar_client`
|
||
|
|
|
||
|
|
### 5. Publisher Abstraction
|
||
|
|
|
||
|
|
**Location:** `trustgraph-base/trustgraph/base/publisher.py`
|
||
|
|
|
||
|
|
Asynchronous message publishing with queue buffering:
|
||
|
|
|
||
|
|
**Pulsar imports:**
|
||
|
|
- **Line 2:** `from pulsar.schema import JsonSchema`
|
||
|
|
- **Line 6:** `import pulsar`
|
||
|
|
|
||
|
|
**Pulsar-specific usage:**
|
||
|
|
- **Line 52:** `JsonSchema(self.schema)` wrapper
|
||
|
|
- **Lines 50-54:** `self.client.create_producer()` with Pulsar-specific parameters
|
||
|
|
- **Lines 101, 103:** `producer.send()` with message and optional properties
|
||
|
|
- **Lines 106-107:** `producer.flush()` and `producer.close()` methods
|
||
|
|
|
||
|
|
### 6. Subscriber Abstraction
|
||
|
|
|
||
|
|
**Location:** `trustgraph-base/trustgraph/base/subscriber.py`
|
||
|
|
|
||
|
|
Provides multi-recipient message distribution from queues:
|
||
|
|
|
||
|
|
**Pulsar imports:**
|
||
|
|
- **Line 6:** `from pulsar.schema import JsonSchema`
|
||
|
|
- **Line 8:** `import _pulsar`
|
||
|
|
|
||
|
|
**Pulsar-specific usage:**
|
||
|
|
- **Line 55:** `JsonSchema(self.schema)` wrapper
|
||
|
|
- **Line 57:** `self.client.subscribe(**subscribe_args)`
|
||
|
|
- **Lines 101, 136, 160, 167-172:** Pulsar exceptions: `_pulsar.Timeout`, `_pulsar.InvalidConfiguration`, `_pulsar.AlreadyClosed`
|
||
|
|
- **Lines 159, 166, 170:** Consumer methods: `negative_acknowledge()`, `unsubscribe()`, `close()`
|
||
|
|
- **Lines 247, 251:** Message acknowledgment: `acknowledge()`, `negative_acknowledge()`
|
||
|
|
|
||
|
|
**Spec file:** `trustgraph-base/trustgraph/base/subscriber_spec.py`
|
||
|
|
- **Line 19:** References `processor.pulsar_client`
|
||
|
|
|
||
|
|
### 7. Schema System (Heart of Darkness)
|
||
|
|
|
||
|
|
**Location:** `trustgraph-base/trustgraph/schema/`
|
||
|
|
|
||
|
|
Every message schema in the system is defined using Pulsar's schema framework.
|
||
|
|
|
||
|
|
**Core primitives:** `schema/core/primitives.py`
|
||
|
|
- **Line 2:** `from pulsar.schema import Record, String, Boolean, Array, Integer`
|
||
|
|
- All schemas inherit from Pulsar's `Record` base class
|
||
|
|
- All field types are Pulsar types: `String()`, `Integer()`, `Boolean()`, `Array()`, `Map()`, `Double()`
|
||
|
|
|
||
|
|
**Example schemas:**
|
||
|
|
- `schema/services/llm.py` (Line 2): `from pulsar.schema import Record, String, Array, Double, Integer, Boolean`
|
||
|
|
- `schema/services/config.py` (Line 2): `from pulsar.schema import Record, Bytes, String, Boolean, Array, Map, Integer`
|
||
|
|
|
||
|
|
**Topic naming:** `schema/core/topic.py`
|
||
|
|
- **Lines 2-3:** Topic format: `{kind}://{tenant}/{namespace}/{topic}`
|
||
|
|
- This URI structure is Pulsar-specific (e.g., `persistent://tg/flow/config`)
|
||
|
|
|
||
|
|
**Impact:**
|
||
|
|
- All request/response message definitions throughout the codebase use Pulsar schemas
|
||
|
|
- This includes services for: config, flow, llm, prompt, query, storage, agent, collection, diagnosis, library, lookup, nlp_query, objects_query, retrieval, structured_query
|
||
|
|
- Schema definitions are imported and used extensively across all processors and services
|
||
|
|
|
||
|
|
## Summary
|
||
|
|
|
||
|
|
### Pulsar Dependencies by Category
|
||
|
|
|
||
|
|
1. **Client instantiation:**
|
||
|
|
- Direct: `gateway/service.py`
|
||
|
|
- Abstracted: `async_processor.py` → `pubsub.py` (PulsarClient)
|
||
|
|
|
||
|
|
2. **Message transport:**
|
||
|
|
- Consumer: `consumer.py`, `consumer_spec.py`
|
||
|
|
- Producer: `producer.py`, `producer_spec.py`
|
||
|
|
- Publisher: `publisher.py`
|
||
|
|
- Subscriber: `subscriber.py`, `subscriber_spec.py`
|
||
|
|
|
||
|
|
3. **Schema system:**
|
||
|
|
- Base types: `schema/core/primitives.py`
|
||
|
|
- All service schemas: `schema/services/*.py`
|
||
|
|
- Topic naming: `schema/core/topic.py`
|
||
|
|
|
||
|
|
4. **Pulsar-specific concepts required:**
|
||
|
|
- Topic-based messaging
|
||
|
|
- Schema system (Record, field types)
|
||
|
|
- Shared subscriptions
|
||
|
|
- Message acknowledgment (positive/negative)
|
||
|
|
- Consumer positioning (earliest/latest)
|
||
|
|
- Message properties
|
||
|
|
- Initial positions and consumer types
|
||
|
|
- Chunking support
|
||
|
|
- Persistent vs non-persistent topics
|
||
|
|
|
||
|
|
### Refactoring Challenges
|
||
|
|
|
||
|
|
The good news: The abstraction layer (Consumer, Producer, Publisher, Subscriber) provides a clean encapsulation of most Pulsar interactions.
|
||
|
|
|
||
|
|
The challenges:
|
||
|
|
1. **Schema system pervasiveness:** Every message definition uses `pulsar.schema.Record` and Pulsar field types
|
||
|
|
2. **Pulsar-specific enums:** `InitialPosition`, `ConsumerType`
|
||
|
|
3. **Pulsar exceptions:** `_pulsar.Timeout`, `_pulsar.Interrupted`, `_pulsar.InvalidConfiguration`, `_pulsar.AlreadyClosed`
|
||
|
|
4. **Method signatures:** `acknowledge()`, `negative_acknowledge()`, `subscribe()`, `create_producer()`, etc.
|
||
|
|
5. **Topic URI format:** Pulsar's `kind://tenant/namespace/topic` structure
|
||
|
|
|
||
|
|
### Next Steps
|
||
|
|
|
||
|
|
To make the pub/sub infrastructure configurable, we need to:
|
||
|
|
|
||
|
|
1. Create an abstraction interface for the client/schema system
|
||
|
|
2. Abstract Pulsar-specific enums and exceptions
|
||
|
|
3. Create schema wrappers or alternative schema definitions
|
||
|
|
4. Implement the interface for both Pulsar and alternative systems (Kafka, RabbitMQ, Redis Streams, etc.)
|
||
|
|
5. Update `pubsub.py` to be configurable and support multiple backends
|
||
|
|
6. Provide migration path for existing deployments
|
||
|
|
|
||
|
|
## Approach Draft 1: Adapter Pattern with Schema Translation Layer
|
||
|
|
|
||
|
|
### Key Insight
|
||
|
|
The **schema system** is the deepest integration point - everything else flows from it. We need to solve this first, or we'll be rewriting the entire codebase.
|
||
|
|
|
||
|
|
### Strategy: Minimal Disruption with Adapters
|
||
|
|
|
||
|
|
**1. Keep Pulsar schemas as the internal representation**
|
||
|
|
- Don't rewrite all the schema definitions
|
||
|
|
- Schemas remain `pulsar.schema.Record` internally
|
||
|
|
- Use adapters to translate at the boundary between our code and the pub/sub backend
|
||
|
|
|
||
|
|
**2. Create a pub/sub abstraction layer:**
|
||
|
|
|
||
|
|
```
|
||
|
|
┌─────────────────────────────────────┐
|
||
|
|
│ Existing Code (unchanged) │
|
||
|
|
│ - Uses Pulsar schemas internally │
|
||
|
|
│ - Consumer/Producer/Publisher │
|
||
|
|
└──────────────┬──────────────────────┘
|
||
|
|
│
|
||
|
|
┌──────────────┴──────────────────────┐
|
||
|
|
│ PubSubFactory (configurable) │
|
||
|
|
│ - Creates backend-specific client │
|
||
|
|
└──────────────┬──────────────────────┘
|
||
|
|
│
|
||
|
|
┌──────┴──────┐
|
||
|
|
│ │
|
||
|
|
┌───────▼─────┐ ┌────▼─────────┐
|
||
|
|
│ PulsarAdapter│ │ KafkaAdapter │ etc...
|
||
|
|
│ (passthrough)│ │ (translates) │
|
||
|
|
└──────────────┘ └──────────────┘
|
||
|
|
```
|
||
|
|
|
||
|
|
**3. Define abstract interfaces:**
|
||
|
|
- `PubSubClient` - client connection
|
||
|
|
- `PubSubProducer` - sending messages
|
||
|
|
- `PubSubConsumer` - receiving messages
|
||
|
|
- `SchemaAdapter` - translating Pulsar schemas to/from JSON or backend-specific formats
|
||
|
|
|
||
|
|
**4. Implementation details:**
|
||
|
|
|
||
|
|
For **Pulsar adapter**: Nearly passthrough, minimal translation
|
||
|
|
|
||
|
|
For **other backends** (Kafka, RabbitMQ, etc.):
|
||
|
|
- Serialize Pulsar Record objects to JSON/bytes
|
||
|
|
- Map concepts like:
|
||
|
|
- `InitialPosition.Earliest/Latest` → Kafka's auto.offset.reset
|
||
|
|
- `acknowledge()` → Kafka's commit
|
||
|
|
- `negative_acknowledge()` → Re-queue or DLQ pattern
|
||
|
|
- Topic URIs → Backend-specific topic names
|
||
|
|
|
||
|
|
### Analysis
|
||
|
|
|
||
|
|
**Pros:**
|
||
|
|
- ✅ Minimal code changes to existing services
|
||
|
|
- ✅ Schemas stay as-is (no massive rewrite)
|
||
|
|
- ✅ Gradual migration path
|
||
|
|
- ✅ Pulsar users see no difference
|
||
|
|
- ✅ New backends added via adapters
|
||
|
|
|
||
|
|
**Cons:**
|
||
|
|
- ⚠️ Still carries Pulsar dependency (for schema definitions)
|
||
|
|
- ⚠️ Some impedance mismatch translating concepts
|
||
|
|
|
||
|
|
### Alternative Consideration
|
||
|
|
|
||
|
|
Create a **TrustGraph schema system** that's pub/sub agnostic (using dataclasses or Pydantic), then generate Pulsar/Kafka/etc schemas from it. This requires rewriting every schema file and potentially breaking changes.
|
||
|
|
|
||
|
|
### Recommendation for Draft 1
|
||
|
|
|
||
|
|
Start with the **adapter approach** because:
|
||
|
|
1. It's pragmatic - works with existing code
|
||
|
|
2. Proves the concept with minimal risk
|
||
|
|
3. Can evolve to a native schema system later if needed
|
||
|
|
4. Configuration-driven: one env var switches backends
|
||
|
|
|
||
|
|
## Approach Draft 2: Backend-Agnostic Schema System with Dataclasses
|
||
|
|
|
||
|
|
### Core Concept
|
||
|
|
|
||
|
|
Use Python **dataclasses** as the neutral schema definition format. Each pub/sub backend provides its own serialization/deserialization for dataclasses, eliminating the need for Pulsar schemas to remain in the codebase.
|
||
|
|
|
||
|
|
### Schema Polymorphism at the Factory Level
|
||
|
|
|
||
|
|
Instead of translating Pulsar schemas, **each backend provides its own schema handling** that works with standard Python dataclasses.
|
||
|
|
|
||
|
|
### Publisher Flow
|
||
|
|
|
||
|
|
```python
|
||
|
|
# 1. Get the configured backend from factory
|
||
|
|
pubsub = get_pubsub() # Returns PulsarBackend, MQTTBackend, etc.
|
||
|
|
|
||
|
|
# 2. Get schema class from the backend
|
||
|
|
# (Can be imported directly - backend-agnostic)
|
||
|
|
from trustgraph.schema.services.llm import TextCompletionRequest
|
||
|
|
|
||
|
|
# 3. Create a producer/publisher for a specific topic
|
||
|
|
producer = pubsub.create_producer(
|
||
|
|
topic="text-completion-requests",
|
||
|
|
schema=TextCompletionRequest # Tells backend what schema to use
|
||
|
|
)
|
||
|
|
|
||
|
|
# 4. Create message instances (same API regardless of backend)
|
||
|
|
request = TextCompletionRequest(
|
||
|
|
system="You are helpful",
|
||
|
|
prompt="Hello world",
|
||
|
|
streaming=False
|
||
|
|
)
|
||
|
|
|
||
|
|
# 5. Send the message
|
||
|
|
producer.send(request) # Backend serializes appropriately
|
||
|
|
```
|
||
|
|
|
||
|
|
### Consumer Flow
|
||
|
|
|
||
|
|
```python
|
||
|
|
# 1. Get the configured backend
|
||
|
|
pubsub = get_pubsub()
|
||
|
|
|
||
|
|
# 2. Create a consumer
|
||
|
|
consumer = pubsub.subscribe(
|
||
|
|
topic="text-completion-requests",
|
||
|
|
schema=TextCompletionRequest # Tells backend how to deserialize
|
||
|
|
)
|
||
|
|
|
||
|
|
# 3. Receive and deserialize
|
||
|
|
msg = consumer.receive()
|
||
|
|
request = msg.value() # Returns TextCompletionRequest dataclass instance
|
||
|
|
|
||
|
|
# 4. Use the data (type-safe access)
|
||
|
|
print(request.system) # "You are helpful"
|
||
|
|
print(request.prompt) # "Hello world"
|
||
|
|
print(request.streaming) # False
|
||
|
|
```
|
||
|
|
|
||
|
|
### What Happens Behind the Scenes
|
||
|
|
|
||
|
|
**For Pulsar backend:**
|
||
|
|
- `create_producer()` → creates Pulsar producer with JSON schema or dynamically generated Record
|
||
|
|
- `send(request)` → serializes dataclass to JSON/Pulsar format, sends to Pulsar
|
||
|
|
- `receive()` → gets Pulsar message, deserializes back to dataclass
|
||
|
|
|
||
|
|
**For MQTT backend:**
|
||
|
|
- `create_producer()` → connects to MQTT broker, no schema registration needed
|
||
|
|
- `send(request)` → converts dataclass to JSON, publishes to MQTT topic
|
||
|
|
- `receive()` → subscribes to MQTT topic, deserializes JSON to dataclass
|
||
|
|
|
||
|
|
**For Kafka backend:**
|
||
|
|
- `create_producer()` → creates Kafka producer, registers Avro schema if needed
|
||
|
|
- `send(request)` → serializes dataclass to Avro format, sends to Kafka
|
||
|
|
- `receive()` → gets Kafka message, deserializes Avro back to dataclass
|
||
|
|
|
||
|
|
### Key Design Points
|
||
|
|
|
||
|
|
1. **Schema object creation**: The dataclass instance (`TextCompletionRequest(...)`) is identical regardless of backend
|
||
|
|
2. **Backend handles encoding**: Each backend knows how to serialize its dataclass to the wire format
|
||
|
|
3. **Schema definition at creation**: When creating producer/consumer, you specify the schema type
|
||
|
|
4. **Type safety preserved**: You get back a proper `TextCompletionRequest` object, not a dict
|
||
|
|
5. **No backend leakage**: Application code never imports backend-specific libraries
|
||
|
|
|
||
|
|
### Example Transformation
|
||
|
|
|
||
|
|
**Current (Pulsar-specific):**
|
||
|
|
```python
|
||
|
|
# schema/services/llm.py
|
||
|
|
from pulsar.schema import Record, String, Boolean, Integer
|
||
|
|
|
||
|
|
class TextCompletionRequest(Record):
|
||
|
|
system = String()
|
||
|
|
prompt = String()
|
||
|
|
streaming = Boolean()
|
||
|
|
```
|
||
|
|
|
||
|
|
**New (Backend-agnostic):**
|
||
|
|
```python
|
||
|
|
# schema/services/llm.py
|
||
|
|
from dataclasses import dataclass
|
||
|
|
|
||
|
|
@dataclass
|
||
|
|
class TextCompletionRequest:
|
||
|
|
system: str
|
||
|
|
prompt: str
|
||
|
|
streaming: bool = False
|
||
|
|
```
|
||
|
|
|
||
|
|
### Backend Integration
|
||
|
|
|
||
|
|
Each backend handles serialization/deserialization of dataclasses:
|
||
|
|
|
||
|
|
**Pulsar backend:**
|
||
|
|
- Dynamically generate `pulsar.schema.Record` classes from dataclasses
|
||
|
|
- Or serialize dataclasses to JSON and use Pulsar's JSON schema
|
||
|
|
- Maintains compatibility with existing Pulsar deployments
|
||
|
|
|
||
|
|
**MQTT/Redis backend:**
|
||
|
|
- Direct JSON serialization of dataclass instances
|
||
|
|
- Use `dataclasses.asdict()` / `from_dict()`
|
||
|
|
- Lightweight, no schema registry needed
|
||
|
|
|
||
|
|
**Kafka backend:**
|
||
|
|
- Generate Avro schemas from dataclass definitions
|
||
|
|
- Use Confluent's schema registry
|
||
|
|
- Type-safe serialization with schema evolution support
|
||
|
|
|
||
|
|
### Architecture
|
||
|
|
|
||
|
|
```
|
||
|
|
┌─────────────────────────────────────┐
|
||
|
|
│ Application Code │
|
||
|
|
│ - Uses dataclass schemas │
|
||
|
|
│ - Backend-agnostic │
|
||
|
|
└──────────────┬──────────────────────┘
|
||
|
|
│
|
||
|
|
┌──────────────┴──────────────────────┐
|
||
|
|
│ PubSubFactory (configurable) │
|
||
|
|
│ - get_pubsub() returns backend │
|
||
|
|
└──────────────┬──────────────────────┘
|
||
|
|
│
|
||
|
|
┌──────┴──────┐
|
||
|
|
│ │
|
||
|
|
┌───────▼─────────┐ ┌────▼──────────────┐
|
||
|
|
│ PulsarBackend │ │ MQTTBackend │
|
||
|
|
│ - JSON schema │ │ - JSON serialize │
|
||
|
|
│ - or dynamic │ │ - Simple queues │
|
||
|
|
│ Record gen │ │ │
|
||
|
|
└─────────────────┘ └───────────────────┘
|
||
|
|
```
|
||
|
|
|
||
|
|
### Implementation Details
|
||
|
|
|
||
|
|
**1. Schema definitions:** Plain dataclasses with type hints
|
||
|
|
- `str`, `int`, `bool`, `float` for primitives
|
||
|
|
- `list[T]` for arrays
|
||
|
|
- `dict[str, T]` for maps
|
||
|
|
- Nested dataclasses for complex types
|
||
|
|
|
||
|
|
**2. Each backend provides:**
|
||
|
|
- Serializer: `dataclass → bytes/wire format`
|
||
|
|
- Deserializer: `bytes/wire format → dataclass`
|
||
|
|
- Schema registration (if needed, like Pulsar/Kafka)
|
||
|
|
|
||
|
|
**3. Consumer/Producer abstraction:**
|
||
|
|
- Already exists (consumer.py, producer.py)
|
||
|
|
- Update to use backend's serialization
|
||
|
|
- Remove direct Pulsar imports
|
||
|
|
|
||
|
|
**4. Type mappings:**
|
||
|
|
- Pulsar `String()` → Python `str`
|
||
|
|
- Pulsar `Integer()` → Python `int`
|
||
|
|
- Pulsar `Boolean()` → Python `bool`
|
||
|
|
- Pulsar `Array(T)` → Python `list[T]`
|
||
|
|
- Pulsar `Map(K, V)` → Python `dict[K, V]`
|
||
|
|
- Pulsar `Double()` → Python `float`
|
||
|
|
- Pulsar `Bytes()` → Python `bytes`
|
||
|
|
|
||
|
|
### Migration Path
|
||
|
|
|
||
|
|
1. **Create dataclass versions** of all schemas in `trustgraph/schema/`
|
||
|
|
2. **Update backend classes** (Consumer, Producer, Publisher, Subscriber) to use backend-provided serialization
|
||
|
|
3. **Implement PulsarBackend** with JSON schema or dynamic Record generation
|
||
|
|
4. **Test with Pulsar** to ensure backward compatibility with existing deployments
|
||
|
|
5. **Add new backends** (MQTT, Kafka, Redis, etc.) as needed
|
||
|
|
6. **Remove Pulsar imports** from schema files
|
||
|
|
|
||
|
|
### Benefits
|
||
|
|
|
||
|
|
✅ **No pub/sub dependency** in schema definitions
|
||
|
|
✅ **Standard Python** - easy to understand, type-check, document
|
||
|
|
✅ **Modern tooling** - works with mypy, IDE autocomplete, linters
|
||
|
|
✅ **Backend-optimized** - each backend uses native serialization
|
||
|
|
✅ **No translation overhead** - direct serialization, no adapters
|
||
|
|
✅ **Type safety** - real objects with proper types
|
||
|
|
✅ **Easy validation** - can use Pydantic if needed
|
||
|
|
|
||
|
|
### Challenges & Solutions
|
||
|
|
|
||
|
|
**Challenge:** Pulsar's `Record` has runtime field validation
|
||
|
|
**Solution:** Use Pydantic dataclasses for validation if needed, or Python 3.10+ dataclass features with `__post_init__`
|
||
|
|
|
||
|
|
**Challenge:** Some Pulsar-specific features (like `Bytes` type)
|
||
|
|
**Solution:** Map to `bytes` type in dataclass, backend handles encoding appropriately
|
||
|
|
|
||
|
|
**Challenge:** Topic naming (`persistent://tenant/namespace/topic`)
|
||
|
|
**Solution:** Abstract topic names in schema definitions, backend converts to proper format
|
||
|
|
|
||
|
|
**Challenge:** Schema evolution and versioning
|
||
|
|
**Solution:** Each backend handles this according to its capabilities (Pulsar schema versions, Kafka schema registry, etc.)
|
||
|
|
|
||
|
|
**Challenge:** Nested complex types
|
||
|
|
**Solution:** Use nested dataclasses, backends recursively serialize/deserialize
|
||
|
|
|
||
|
|
### Design Decisions
|
||
|
|
|
||
|
|
1. **Plain dataclasses or Pydantic?**
|
||
|
|
- ✅ **Decision: Use plain Python dataclasses**
|
||
|
|
- Simpler, no additional dependencies
|
||
|
|
- Validation not required in practice
|
||
|
|
- Easier to understand and maintain
|
||
|
|
|
||
|
|
2. **Schema evolution:**
|
||
|
|
- ✅ **Decision: No versioning mechanism needed**
|
||
|
|
- Schemas are stable and long-lasting
|
||
|
|
- Updates typically add new fields (backward compatible)
|
||
|
|
- Backends handle schema evolution according to their capabilities
|
||
|
|
|
||
|
|
3. **Backward compatibility:**
|
||
|
|
- ✅ **Decision: Major version change, no backward compatibility required**
|
||
|
|
- Will be a breaking change with migration instructions
|
||
|
|
- Clean break allows for better design
|
||
|
|
- Migration guide will be provided for existing deployments
|
||
|
|
|
||
|
|
4. **Nested types and complex structures:**
|
||
|
|
- ✅ **Decision: Use nested dataclasses naturally**
|
||
|
|
- Python dataclasses handle nesting perfectly
|
||
|
|
- `list[T]` for arrays, `dict[K, V]` for maps
|
||
|
|
- Backends recursively serialize/deserialize
|
||
|
|
- Example:
|
||
|
|
```python
|
||
|
|
@dataclass
|
||
|
|
class Value:
|
||
|
|
value: str
|
||
|
|
is_uri: bool
|
||
|
|
|
||
|
|
@dataclass
|
||
|
|
class Triple:
|
||
|
|
s: Value # Nested dataclass
|
||
|
|
p: Value
|
||
|
|
o: Value
|
||
|
|
|
||
|
|
@dataclass
|
||
|
|
class GraphQuery:
|
||
|
|
triples: list[Triple] # Array of nested dataclasses
|
||
|
|
metadata: dict[str, str]
|
||
|
|
```
|
||
|
|
|
||
|
|
5. **Default values and optional fields:**
|
||
|
|
- ✅ **Decision: Mix of required, defaults, and optional fields**
|
||
|
|
- Required fields: No default value
|
||
|
|
- Fields with defaults: Always present, have sensible default
|
||
|
|
- Truly optional fields: `T | None = None`, omitted from serialization when `None`
|
||
|
|
- Example:
|
||
|
|
```python
|
||
|
|
@dataclass
|
||
|
|
class TextCompletionRequest:
|
||
|
|
system: str # Required, no default
|
||
|
|
prompt: str # Required, no default
|
||
|
|
streaming: bool = False # Optional with default value
|
||
|
|
metadata: dict | None = None # Truly optional, can be absent
|
||
|
|
```
|
||
|
|
|
||
|
|
**Important serialization semantics:**
|
||
|
|
|
||
|
|
When `metadata = None`:
|
||
|
|
```json
|
||
|
|
{
|
||
|
|
"system": "...",
|
||
|
|
"prompt": "...",
|
||
|
|
"streaming": false
|
||
|
|
// metadata field NOT PRESENT
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
When `metadata = {}` (explicitly empty):
|
||
|
|
```json
|
||
|
|
{
|
||
|
|
"system": "...",
|
||
|
|
"prompt": "...",
|
||
|
|
"streaming": false,
|
||
|
|
"metadata": {} // Field PRESENT but empty
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
**Key distinction:**
|
||
|
|
- `None` → field absent from JSON (not serialized)
|
||
|
|
- Empty value (`{}`, `[]`, `""`) → field present with empty value
|
||
|
|
- This matters semantically: "not provided" vs "explicitly empty"
|
||
|
|
- Serialization backends must skip `None` fields, not encode as `null`
|
||
|
|
|
||
|
|
## Approach Draft 3: Implementation Details
|
||
|
|
|
||
|
|
### Generic Queue Naming Format
|
||
|
|
|
||
|
|
Replace backend-specific queue names with a generic format that backends can map appropriately.
|
||
|
|
|
||
|
|
**Format:** `{qos}/{tenant}/{namespace}/{queue-name}`
|
||
|
|
|
||
|
|
Where:
|
||
|
|
- `qos`: Quality of Service level
|
||
|
|
- `q0` = best-effort (fire and forget, no acknowledgment)
|
||
|
|
- `q1` = at-least-once (requires acknowledgment)
|
||
|
|
- `q2` = exactly-once (two-phase acknowledgment)
|
||
|
|
- `tenant`: Logical grouping for multi-tenancy
|
||
|
|
- `namespace`: Sub-grouping within tenant
|
||
|
|
- `queue-name`: Actual queue/topic name
|
||
|
|
|
||
|
|
**Examples:**
|
||
|
|
```
|
||
|
|
q1/tg/flow/text-completion-requests
|
||
|
|
q2/tg/config/config-push
|
||
|
|
q0/tg/metrics/stats
|
||
|
|
```
|
||
|
|
|
||
|
|
### Backend Topic Mapping
|
||
|
|
|
||
|
|
Each backend maps the generic format to its native format:
|
||
|
|
|
||
|
|
**Pulsar Backend:**
|
||
|
|
```python
|
||
|
|
def map_topic(self, generic_topic: str) -> str:
|
||
|
|
# Parse: q1/tg/flow/text-completion-requests
|
||
|
|
qos, tenant, namespace, queue = generic_topic.split('/', 3)
|
||
|
|
|
||
|
|
# Map QoS to persistence
|
||
|
|
persistence = 'persistent' if qos in ['q1', 'q2'] else 'non-persistent'
|
||
|
|
|
||
|
|
# Return Pulsar URI: persistent://tg/flow/text-completion-requests
|
||
|
|
return f"{persistence}://{tenant}/{namespace}/{queue}"
|
||
|
|
```
|
||
|
|
|
||
|
|
**MQTT Backend:**
|
||
|
|
```python
|
||
|
|
def map_topic(self, generic_topic: str) -> tuple[str, int]:
|
||
|
|
# Parse: q1/tg/flow/text-completion-requests
|
||
|
|
qos, tenant, namespace, queue = generic_topic.split('/', 3)
|
||
|
|
|
||
|
|
# Map QoS level
|
||
|
|
qos_level = {'q0': 0, 'q1': 1, 'q2': 2}[qos]
|
||
|
|
|
||
|
|
# Build MQTT topic including tenant/namespace for proper namespacing
|
||
|
|
mqtt_topic = f"{tenant}/{namespace}/{queue}"
|
||
|
|
|
||
|
|
return mqtt_topic, qos_level
|
||
|
|
```
|
||
|
|
|
||
|
|
### Updated Topic Helper Function
|
||
|
|
|
||
|
|
```python
|
||
|
|
# schema/core/topic.py
|
||
|
|
def topic(queue_name, qos='q1', tenant='tg', namespace='flow'):
|
||
|
|
"""
|
||
|
|
Create a generic topic identifier that can be mapped by backends.
|
||
|
|
|
||
|
|
Args:
|
||
|
|
queue_name: The queue/topic name
|
||
|
|
qos: Quality of service
|
||
|
|
- 'q0' = best-effort (no ack)
|
||
|
|
- 'q1' = at-least-once (ack required)
|
||
|
|
- 'q2' = exactly-once (two-phase ack)
|
||
|
|
tenant: Tenant identifier for multi-tenancy
|
||
|
|
namespace: Namespace within tenant
|
||
|
|
|
||
|
|
Returns:
|
||
|
|
Generic topic string: qos/tenant/namespace/queue_name
|
||
|
|
|
||
|
|
Examples:
|
||
|
|
topic('my-queue') # q1/tg/flow/my-queue
|
||
|
|
topic('config', qos='q2', namespace='config') # q2/tg/config/config
|
||
|
|
"""
|
||
|
|
return f"{qos}/{tenant}/{namespace}/{queue_name}"
|
||
|
|
```
|
||
|
|
|
||
|
|
### Configuration and Initialization
|
||
|
|
|
||
|
|
**Command-Line Arguments + Environment Variables:**
|
||
|
|
|
||
|
|
```python
|
||
|
|
# In base/async_processor.py - add_args() method
|
||
|
|
@staticmethod
|
||
|
|
def add_args(parser):
|
||
|
|
# Pub/sub backend selection
|
||
|
|
parser.add_argument(
|
||
|
|
'--pubsub-backend',
|
||
|
|
default=os.getenv('PUBSUB_BACKEND', 'pulsar'),
|
||
|
|
choices=['pulsar', 'mqtt'],
|
||
|
|
help='Pub/sub backend (default: pulsar, env: PUBSUB_BACKEND)'
|
||
|
|
)
|
||
|
|
|
||
|
|
# Pulsar-specific configuration
|
||
|
|
parser.add_argument(
|
||
|
|
'--pulsar-host',
|
||
|
|
default=os.getenv('PULSAR_HOST', 'pulsar://localhost:6650'),
|
||
|
|
help='Pulsar host (default: pulsar://localhost:6650, env: PULSAR_HOST)'
|
||
|
|
)
|
||
|
|
|
||
|
|
parser.add_argument(
|
||
|
|
'--pulsar-api-key',
|
||
|
|
default=os.getenv('PULSAR_API_KEY', None),
|
||
|
|
help='Pulsar API key (env: PULSAR_API_KEY)'
|
||
|
|
)
|
||
|
|
|
||
|
|
parser.add_argument(
|
||
|
|
'--pulsar-listener',
|
||
|
|
default=os.getenv('PULSAR_LISTENER', None),
|
||
|
|
help='Pulsar listener name (env: PULSAR_LISTENER)'
|
||
|
|
)
|
||
|
|
|
||
|
|
# MQTT-specific configuration
|
||
|
|
parser.add_argument(
|
||
|
|
'--mqtt-host',
|
||
|
|
default=os.getenv('MQTT_HOST', 'localhost'),
|
||
|
|
help='MQTT broker host (default: localhost, env: MQTT_HOST)'
|
||
|
|
)
|
||
|
|
|
||
|
|
parser.add_argument(
|
||
|
|
'--mqtt-port',
|
||
|
|
type=int,
|
||
|
|
default=int(os.getenv('MQTT_PORT', '1883')),
|
||
|
|
help='MQTT broker port (default: 1883, env: MQTT_PORT)'
|
||
|
|
)
|
||
|
|
|
||
|
|
parser.add_argument(
|
||
|
|
'--mqtt-username',
|
||
|
|
default=os.getenv('MQTT_USERNAME', None),
|
||
|
|
help='MQTT username (env: MQTT_USERNAME)'
|
||
|
|
)
|
||
|
|
|
||
|
|
parser.add_argument(
|
||
|
|
'--mqtt-password',
|
||
|
|
default=os.getenv('MQTT_PASSWORD', None),
|
||
|
|
help='MQTT password (env: MQTT_PASSWORD)'
|
||
|
|
)
|
||
|
|
```
|
||
|
|
|
||
|
|
**Factory Function:**
|
||
|
|
|
||
|
|
```python
|
||
|
|
# In base/pubsub.py or base/pubsub_factory.py
|
||
|
|
def get_pubsub(**config) -> PubSubBackend:
|
||
|
|
"""
|
||
|
|
Create and return a pub/sub backend based on configuration.
|
||
|
|
|
||
|
|
Args:
|
||
|
|
config: Configuration dict from command-line args
|
||
|
|
Must include 'pubsub_backend' key
|
||
|
|
|
||
|
|
Returns:
|
||
|
|
Backend instance (PulsarBackend, MQTTBackend, etc.)
|
||
|
|
"""
|
||
|
|
backend_type = config.get('pubsub_backend', 'pulsar')
|
||
|
|
|
||
|
|
if backend_type == 'pulsar':
|
||
|
|
return PulsarBackend(
|
||
|
|
host=config.get('pulsar_host'),
|
||
|
|
api_key=config.get('pulsar_api_key'),
|
||
|
|
listener=config.get('pulsar_listener'),
|
||
|
|
)
|
||
|
|
elif backend_type == 'mqtt':
|
||
|
|
return MQTTBackend(
|
||
|
|
host=config.get('mqtt_host'),
|
||
|
|
port=config.get('mqtt_port'),
|
||
|
|
username=config.get('mqtt_username'),
|
||
|
|
password=config.get('mqtt_password'),
|
||
|
|
)
|
||
|
|
else:
|
||
|
|
raise ValueError(f"Unknown pub/sub backend: {backend_type}")
|
||
|
|
```
|
||
|
|
|
||
|
|
**Usage in AsyncProcessor:**
|
||
|
|
|
||
|
|
```python
|
||
|
|
# In async_processor.py
|
||
|
|
class AsyncProcessor:
|
||
|
|
def __init__(self, **params):
|
||
|
|
self.id = params.get("id")
|
||
|
|
|
||
|
|
# Create backend from config (replaces PulsarClient)
|
||
|
|
self.pubsub = get_pubsub(**params)
|
||
|
|
|
||
|
|
# Rest of initialization...
|
||
|
|
```
|
||
|
|
|
||
|
|
### Backend Interface
|
||
|
|
|
||
|
|
```python
|
||
|
|
class PubSubBackend(Protocol):
|
||
|
|
"""Protocol defining the interface all pub/sub backends must implement."""
|
||
|
|
|
||
|
|
def create_producer(self, topic: str, schema: type, **options) -> BackendProducer:
|
||
|
|
"""
|
||
|
|
Create a producer for a topic.
|
||
|
|
|
||
|
|
Args:
|
||
|
|
topic: Generic topic format (qos/tenant/namespace/queue)
|
||
|
|
schema: Dataclass type for messages
|
||
|
|
options: Backend-specific options (e.g., chunking_enabled)
|
||
|
|
|
||
|
|
Returns:
|
||
|
|
Backend-specific producer instance
|
||
|
|
"""
|
||
|
|
...
|
||
|
|
|
||
|
|
def create_consumer(
|
||
|
|
self,
|
||
|
|
topic: str,
|
||
|
|
subscription: str,
|
||
|
|
schema: type,
|
||
|
|
initial_position: str = 'latest',
|
||
|
|
consumer_type: str = 'shared',
|
||
|
|
**options
|
||
|
|
) -> BackendConsumer:
|
||
|
|
"""
|
||
|
|
Create a consumer for a topic.
|
||
|
|
|
||
|
|
Args:
|
||
|
|
topic: Generic topic format (qos/tenant/namespace/queue)
|
||
|
|
subscription: Subscription/consumer group name
|
||
|
|
schema: Dataclass type for messages
|
||
|
|
initial_position: 'earliest' or 'latest' (MQTT may ignore)
|
||
|
|
consumer_type: 'shared', 'exclusive', 'failover' (MQTT may ignore)
|
||
|
|
options: Backend-specific options
|
||
|
|
|
||
|
|
Returns:
|
||
|
|
Backend-specific consumer instance
|
||
|
|
"""
|
||
|
|
...
|
||
|
|
|
||
|
|
def close(self) -> None:
|
||
|
|
"""Close the backend connection."""
|
||
|
|
...
|
||
|
|
```
|
||
|
|
|
||
|
|
```python
|
||
|
|
class BackendProducer(Protocol):
|
||
|
|
"""Protocol for backend-specific producer."""
|
||
|
|
|
||
|
|
def send(self, message: Any, properties: dict = {}) -> None:
|
||
|
|
"""Send a message (dataclass instance) with optional properties."""
|
||
|
|
...
|
||
|
|
|
||
|
|
def flush(self) -> None:
|
||
|
|
"""Flush any buffered messages."""
|
||
|
|
...
|
||
|
|
|
||
|
|
def close(self) -> None:
|
||
|
|
"""Close the producer."""
|
||
|
|
...
|
||
|
|
```
|
||
|
|
|
||
|
|
```python
|
||
|
|
class BackendConsumer(Protocol):
|
||
|
|
"""Protocol for backend-specific consumer."""
|
||
|
|
|
||
|
|
def receive(self, timeout_millis: int = 2000) -> Message:
|
||
|
|
"""
|
||
|
|
Receive a message from the topic.
|
||
|
|
|
||
|
|
Raises:
|
||
|
|
TimeoutError: If no message received within timeout
|
||
|
|
"""
|
||
|
|
...
|
||
|
|
|
||
|
|
def acknowledge(self, message: Message) -> None:
|
||
|
|
"""Acknowledge successful processing of a message."""
|
||
|
|
...
|
||
|
|
|
||
|
|
def negative_acknowledge(self, message: Message) -> None:
|
||
|
|
"""Negative acknowledge - triggers redelivery."""
|
||
|
|
...
|
||
|
|
|
||
|
|
def unsubscribe(self) -> None:
|
||
|
|
"""Unsubscribe from the topic."""
|
||
|
|
...
|
||
|
|
|
||
|
|
def close(self) -> None:
|
||
|
|
"""Close the consumer."""
|
||
|
|
...
|
||
|
|
```
|
||
|
|
|
||
|
|
```python
|
||
|
|
class Message(Protocol):
|
||
|
|
"""Protocol for a received message."""
|
||
|
|
|
||
|
|
def value(self) -> Any:
|
||
|
|
"""Get the deserialized message (dataclass instance)."""
|
||
|
|
...
|
||
|
|
|
||
|
|
def properties(self) -> dict:
|
||
|
|
"""Get message properties/metadata."""
|
||
|
|
...
|
||
|
|
```
|
||
|
|
|
||
|
|
### Existing Classes Refactoring
|
||
|
|
|
||
|
|
The existing `Consumer`, `Producer`, `Publisher`, `Subscriber` classes remain largely intact:
|
||
|
|
|
||
|
|
**Current responsibilities (keep):**
|
||
|
|
- Async threading model and taskgroups
|
||
|
|
- Reconnection logic and retry handling
|
||
|
|
- Metrics collection
|
||
|
|
- Rate limiting
|
||
|
|
- Concurrency management
|
||
|
|
|
||
|
|
**Changes needed:**
|
||
|
|
- Remove direct Pulsar imports (`pulsar.schema`, `pulsar.InitialPosition`, etc.)
|
||
|
|
- Accept `BackendProducer`/`BackendConsumer` instead of Pulsar client
|
||
|
|
- Delegate actual pub/sub operations to backend instances
|
||
|
|
- Map generic concepts to backend calls
|
||
|
|
|
||
|
|
**Example refactoring:**
|
||
|
|
|
||
|
|
```python
|
||
|
|
# OLD - consumer.py
|
||
|
|
class Consumer:
|
||
|
|
def __init__(self, client, topic, subscriber, schema, ...):
|
||
|
|
self.client = client # Direct Pulsar client
|
||
|
|
# ...
|
||
|
|
|
||
|
|
async def consumer_run(self):
|
||
|
|
# Uses pulsar.InitialPosition, pulsar.ConsumerType
|
||
|
|
self.consumer = self.client.subscribe(
|
||
|
|
topic=self.topic,
|
||
|
|
schema=JsonSchema(self.schema),
|
||
|
|
initial_position=pulsar.InitialPosition.Earliest,
|
||
|
|
consumer_type=pulsar.ConsumerType.Shared,
|
||
|
|
)
|
||
|
|
|
||
|
|
# NEW - consumer.py
|
||
|
|
class Consumer:
|
||
|
|
def __init__(self, backend_consumer, schema, ...):
|
||
|
|
self.backend_consumer = backend_consumer # Backend-specific consumer
|
||
|
|
self.schema = schema
|
||
|
|
# ...
|
||
|
|
|
||
|
|
async def consumer_run(self):
|
||
|
|
# Backend consumer already created with right settings
|
||
|
|
# Just use it directly
|
||
|
|
while self.running:
|
||
|
|
msg = await asyncio.to_thread(
|
||
|
|
self.backend_consumer.receive,
|
||
|
|
timeout_millis=2000
|
||
|
|
)
|
||
|
|
await self.handle_message(msg)
|
||
|
|
```
|
||
|
|
|
||
|
|
### Backend-Specific Behaviors
|
||
|
|
|
||
|
|
**Pulsar Backend:**
|
||
|
|
- Maps `q0` → `non-persistent://`, `q1`/`q2` → `persistent://`
|
||
|
|
- Supports all consumer types (shared, exclusive, failover)
|
||
|
|
- Supports initial position (earliest/latest)
|
||
|
|
- Native message acknowledgment
|
||
|
|
- Schema registry support
|
||
|
|
|
||
|
|
**MQTT Backend:**
|
||
|
|
- Maps `q0`/`q1`/`q2` → MQTT QoS levels 0/1/2
|
||
|
|
- Includes tenant/namespace in topic path for namespacing
|
||
|
|
- Auto-generates client IDs from subscription names
|
||
|
|
- Ignores initial position (no message history in basic MQTT)
|
||
|
|
- Ignores consumer type (MQTT uses client IDs, not consumer groups)
|
||
|
|
- Simple publish/subscribe model
|
||
|
|
|
||
|
|
### Design Decisions Summary
|
||
|
|
|
||
|
|
1. ✅ **Generic queue naming**: `qos/tenant/namespace/queue-name` format
|
||
|
|
2. ✅ **QoS in queue ID**: Determined by queue definition, not configuration
|
||
|
|
3. ✅ **Reconnection**: Handled by Consumer/Producer classes, not backends
|
||
|
|
4. ✅ **MQTT topics**: Include tenant/namespace for proper namespacing
|
||
|
|
5. ✅ **Message history**: MQTT ignores `initial_position` parameter (future enhancement)
|
||
|
|
6. ✅ **Client IDs**: MQTT backend auto-generates from subscription name
|
||
|
|
|
||
|
|
### Future Enhancements
|
||
|
|
|
||
|
|
**MQTT message history:**
|
||
|
|
- Could add optional persistence layer (e.g., retained messages, external store)
|
||
|
|
- Would allow supporting `initial_position='earliest'`
|
||
|
|
- Not required for initial implementation
|
||
|
|
|