mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-25 00:16:23 +02:00
Messaging fabric plugins (#592)
* Plugin architecture for messaging fabric * Schemas use a technology neutral expression * Schemas strictness has uncovered some incorrect schema use which is fixed
This commit is contained in:
parent
1865b3f3c8
commit
34eb083836
100 changed files with 2342 additions and 828 deletions
958
docs/tech-specs/pubsub.md
Normal file
958
docs/tech-specs/pubsub.md
Normal file
|
|
@ -0,0 +1,958 @@
|
|||
# Pub/Sub Infrastructure
|
||||
|
||||
## Overview
|
||||
|
||||
This document catalogs all connections between the TrustGraph codebase and the pub/sub infrastructure. Currently, the system is hardcoded to use Apache Pulsar. This analysis identifies all integration points to inform future refactoring toward a configurable pub/sub abstraction.
|
||||
|
||||
## Current State: Pulsar Integration Points
|
||||
|
||||
### 1. Direct Pulsar Client Usage
|
||||
|
||||
**Location:** `trustgraph-flow/trustgraph/gateway/service.py`
|
||||
|
||||
The API gateway directly imports and instantiates the Pulsar client:
|
||||
|
||||
- **Line 20:** `import pulsar`
|
||||
- **Lines 54-61:** Direct instantiation of `pulsar.Client()` with optional `pulsar.AuthenticationToken()`
|
||||
- **Lines 33-35:** Default Pulsar host configuration from environment variables
|
||||
- **Lines 178-192:** CLI arguments for `--pulsar-host`, `--pulsar-api-key`, and `--pulsar-listener`
|
||||
- **Lines 78, 124:** Passes `pulsar_client` to `ConfigReceiver` and `DispatcherManager`
|
||||
|
||||
This is the only location that directly instantiates a Pulsar client outside of the abstraction layer.
|
||||
|
||||
### 2. Base Processor Framework
|
||||
|
||||
**Location:** `trustgraph-base/trustgraph/base/async_processor.py`
|
||||
|
||||
The base class for all processors provides Pulsar connectivity:
|
||||
|
||||
- **Line 9:** `import _pulsar` (for exception handling)
|
||||
- **Line 18:** `from . pubsub import PulsarClient`
|
||||
- **Line 38:** Creates `pulsar_client_object = PulsarClient(**params)`
|
||||
- **Lines 104-108:** Properties exposing `pulsar_host` and `pulsar_client`
|
||||
- **Line 250:** Static method `add_args()` calls `PulsarClient.add_args(parser)` for CLI arguments
|
||||
- **Lines 223-225:** Exception handling for `_pulsar.Interrupted`
|
||||
|
||||
All processors inherit from `AsyncProcessor`, making this the central integration point.
|
||||
|
||||
### 3. Consumer Abstraction
|
||||
|
||||
**Location:** `trustgraph-base/trustgraph/base/consumer.py`
|
||||
|
||||
Consumes messages from queues and invokes handler functions:
|
||||
|
||||
**Pulsar imports:**
|
||||
- **Line 12:** `from pulsar.schema import JsonSchema`
|
||||
- **Line 13:** `import pulsar`
|
||||
- **Line 14:** `import _pulsar`
|
||||
|
||||
**Pulsar-specific usage:**
|
||||
- **Lines 100, 102:** `pulsar.InitialPosition.Earliest` / `pulsar.InitialPosition.Latest`
|
||||
- **Line 108:** `JsonSchema(self.schema)` wrapper
|
||||
- **Line 110:** `pulsar.ConsumerType.Shared`
|
||||
- **Lines 104-111:** `self.client.subscribe()` with Pulsar-specific parameters
|
||||
- **Lines 143, 150, 65:** `consumer.unsubscribe()` and `consumer.close()` methods
|
||||
- **Line 162:** `_pulsar.Timeout` exception
|
||||
- **Lines 182, 205, 232:** `consumer.acknowledge()` / `consumer.negative_acknowledge()`
|
||||
|
||||
**Spec file:** `trustgraph-base/trustgraph/base/consumer_spec.py`
|
||||
- **Line 22:** References `processor.pulsar_client`
|
||||
|
||||
### 4. Producer Abstraction
|
||||
|
||||
**Location:** `trustgraph-base/trustgraph/base/producer.py`
|
||||
|
||||
Sends messages to queues:
|
||||
|
||||
**Pulsar imports:**
|
||||
- **Line 2:** `from pulsar.schema import JsonSchema`
|
||||
|
||||
**Pulsar-specific usage:**
|
||||
- **Line 49:** `JsonSchema(self.schema)` wrapper
|
||||
- **Lines 47-51:** `self.client.create_producer()` with Pulsar-specific parameters (topic, schema, chunking_enabled)
|
||||
- **Lines 31, 76:** `producer.close()` method
|
||||
- **Lines 64-65:** `producer.send()` with message and properties
|
||||
|
||||
**Spec file:** `trustgraph-base/trustgraph/base/producer_spec.py`
|
||||
- **Line 18:** References `processor.pulsar_client`
|
||||
|
||||
### 5. Publisher Abstraction
|
||||
|
||||
**Location:** `trustgraph-base/trustgraph/base/publisher.py`
|
||||
|
||||
Asynchronous message publishing with queue buffering:
|
||||
|
||||
**Pulsar imports:**
|
||||
- **Line 2:** `from pulsar.schema import JsonSchema`
|
||||
- **Line 6:** `import pulsar`
|
||||
|
||||
**Pulsar-specific usage:**
|
||||
- **Line 52:** `JsonSchema(self.schema)` wrapper
|
||||
- **Lines 50-54:** `self.client.create_producer()` with Pulsar-specific parameters
|
||||
- **Lines 101, 103:** `producer.send()` with message and optional properties
|
||||
- **Lines 106-107:** `producer.flush()` and `producer.close()` methods
|
||||
|
||||
### 6. Subscriber Abstraction
|
||||
|
||||
**Location:** `trustgraph-base/trustgraph/base/subscriber.py`
|
||||
|
||||
Provides multi-recipient message distribution from queues:
|
||||
|
||||
**Pulsar imports:**
|
||||
- **Line 6:** `from pulsar.schema import JsonSchema`
|
||||
- **Line 8:** `import _pulsar`
|
||||
|
||||
**Pulsar-specific usage:**
|
||||
- **Line 55:** `JsonSchema(self.schema)` wrapper
|
||||
- **Line 57:** `self.client.subscribe(**subscribe_args)`
|
||||
- **Lines 101, 136, 160, 167-172:** Pulsar exceptions: `_pulsar.Timeout`, `_pulsar.InvalidConfiguration`, `_pulsar.AlreadyClosed`
|
||||
- **Lines 159, 166, 170:** Consumer methods: `negative_acknowledge()`, `unsubscribe()`, `close()`
|
||||
- **Lines 247, 251:** Message acknowledgment: `acknowledge()`, `negative_acknowledge()`
|
||||
|
||||
**Spec file:** `trustgraph-base/trustgraph/base/subscriber_spec.py`
|
||||
- **Line 19:** References `processor.pulsar_client`
|
||||
|
||||
### 7. Schema System (Heart of Darkness)
|
||||
|
||||
**Location:** `trustgraph-base/trustgraph/schema/`
|
||||
|
||||
Every message schema in the system is defined using Pulsar's schema framework.
|
||||
|
||||
**Core primitives:** `schema/core/primitives.py`
|
||||
- **Line 2:** `from pulsar.schema import Record, String, Boolean, Array, Integer`
|
||||
- All schemas inherit from Pulsar's `Record` base class
|
||||
- All field types are Pulsar types: `String()`, `Integer()`, `Boolean()`, `Array()`, `Map()`, `Double()`
|
||||
|
||||
**Example schemas:**
|
||||
- `schema/services/llm.py` (Line 2): `from pulsar.schema import Record, String, Array, Double, Integer, Boolean`
|
||||
- `schema/services/config.py` (Line 2): `from pulsar.schema import Record, Bytes, String, Boolean, Array, Map, Integer`
|
||||
|
||||
**Topic naming:** `schema/core/topic.py`
|
||||
- **Lines 2-3:** Topic format: `{kind}://{tenant}/{namespace}/{topic}`
|
||||
- This URI structure is Pulsar-specific (e.g., `persistent://tg/flow/config`)
|
||||
|
||||
**Impact:**
|
||||
- All request/response message definitions throughout the codebase use Pulsar schemas
|
||||
- This includes services for: config, flow, llm, prompt, query, storage, agent, collection, diagnosis, library, lookup, nlp_query, objects_query, retrieval, structured_query
|
||||
- Schema definitions are imported and used extensively across all processors and services
|
||||
|
||||
## Summary
|
||||
|
||||
### Pulsar Dependencies by Category
|
||||
|
||||
1. **Client instantiation:**
|
||||
- Direct: `gateway/service.py`
|
||||
- Abstracted: `async_processor.py` → `pubsub.py` (PulsarClient)
|
||||
|
||||
2. **Message transport:**
|
||||
- Consumer: `consumer.py`, `consumer_spec.py`
|
||||
- Producer: `producer.py`, `producer_spec.py`
|
||||
- Publisher: `publisher.py`
|
||||
- Subscriber: `subscriber.py`, `subscriber_spec.py`
|
||||
|
||||
3. **Schema system:**
|
||||
- Base types: `schema/core/primitives.py`
|
||||
- All service schemas: `schema/services/*.py`
|
||||
- Topic naming: `schema/core/topic.py`
|
||||
|
||||
4. **Pulsar-specific concepts required:**
|
||||
- Topic-based messaging
|
||||
- Schema system (Record, field types)
|
||||
- Shared subscriptions
|
||||
- Message acknowledgment (positive/negative)
|
||||
- Consumer positioning (earliest/latest)
|
||||
- Message properties
|
||||
- Initial positions and consumer types
|
||||
- Chunking support
|
||||
- Persistent vs non-persistent topics
|
||||
|
||||
### Refactoring Challenges
|
||||
|
||||
The good news: The abstraction layer (Consumer, Producer, Publisher, Subscriber) provides a clean encapsulation of most Pulsar interactions.
|
||||
|
||||
The challenges:
|
||||
1. **Schema system pervasiveness:** Every message definition uses `pulsar.schema.Record` and Pulsar field types
|
||||
2. **Pulsar-specific enums:** `InitialPosition`, `ConsumerType`
|
||||
3. **Pulsar exceptions:** `_pulsar.Timeout`, `_pulsar.Interrupted`, `_pulsar.InvalidConfiguration`, `_pulsar.AlreadyClosed`
|
||||
4. **Method signatures:** `acknowledge()`, `negative_acknowledge()`, `subscribe()`, `create_producer()`, etc.
|
||||
5. **Topic URI format:** Pulsar's `kind://tenant/namespace/topic` structure
|
||||
|
||||
### Next Steps
|
||||
|
||||
To make the pub/sub infrastructure configurable, we need to:
|
||||
|
||||
1. Create an abstraction interface for the client/schema system
|
||||
2. Abstract Pulsar-specific enums and exceptions
|
||||
3. Create schema wrappers or alternative schema definitions
|
||||
4. Implement the interface for both Pulsar and alternative systems (Kafka, RabbitMQ, Redis Streams, etc.)
|
||||
5. Update `pubsub.py` to be configurable and support multiple backends
|
||||
6. Provide migration path for existing deployments
|
||||
|
||||
## Approach Draft 1: Adapter Pattern with Schema Translation Layer
|
||||
|
||||
### Key Insight
|
||||
The **schema system** is the deepest integration point - everything else flows from it. We need to solve this first, or we'll be rewriting the entire codebase.
|
||||
|
||||
### Strategy: Minimal Disruption with Adapters
|
||||
|
||||
**1. Keep Pulsar schemas as the internal representation**
|
||||
- Don't rewrite all the schema definitions
|
||||
- Schemas remain `pulsar.schema.Record` internally
|
||||
- Use adapters to translate at the boundary between our code and the pub/sub backend
|
||||
|
||||
**2. Create a pub/sub abstraction layer:**
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────┐
|
||||
│ Existing Code (unchanged) │
|
||||
│ - Uses Pulsar schemas internally │
|
||||
│ - Consumer/Producer/Publisher │
|
||||
└──────────────┬──────────────────────┘
|
||||
│
|
||||
┌──────────────┴──────────────────────┐
|
||||
│ PubSubFactory (configurable) │
|
||||
│ - Creates backend-specific client │
|
||||
└──────────────┬──────────────────────┘
|
||||
│
|
||||
┌──────┴──────┐
|
||||
│ │
|
||||
┌───────▼─────┐ ┌────▼─────────┐
|
||||
│ PulsarAdapter│ │ KafkaAdapter │ etc...
|
||||
│ (passthrough)│ │ (translates) │
|
||||
└──────────────┘ └──────────────┘
|
||||
```
|
||||
|
||||
**3. Define abstract interfaces:**
|
||||
- `PubSubClient` - client connection
|
||||
- `PubSubProducer` - sending messages
|
||||
- `PubSubConsumer` - receiving messages
|
||||
- `SchemaAdapter` - translating Pulsar schemas to/from JSON or backend-specific formats
|
||||
|
||||
**4. Implementation details:**
|
||||
|
||||
For **Pulsar adapter**: Nearly passthrough, minimal translation
|
||||
|
||||
For **other backends** (Kafka, RabbitMQ, etc.):
|
||||
- Serialize Pulsar Record objects to JSON/bytes
|
||||
- Map concepts like:
|
||||
- `InitialPosition.Earliest/Latest` → Kafka's auto.offset.reset
|
||||
- `acknowledge()` → Kafka's commit
|
||||
- `negative_acknowledge()` → Re-queue or DLQ pattern
|
||||
- Topic URIs → Backend-specific topic names
|
||||
|
||||
### Analysis
|
||||
|
||||
**Pros:**
|
||||
- ✅ Minimal code changes to existing services
|
||||
- ✅ Schemas stay as-is (no massive rewrite)
|
||||
- ✅ Gradual migration path
|
||||
- ✅ Pulsar users see no difference
|
||||
- ✅ New backends added via adapters
|
||||
|
||||
**Cons:**
|
||||
- ⚠️ Still carries Pulsar dependency (for schema definitions)
|
||||
- ⚠️ Some impedance mismatch translating concepts
|
||||
|
||||
### Alternative Consideration
|
||||
|
||||
Create a **TrustGraph schema system** that's pub/sub agnostic (using dataclasses or Pydantic), then generate Pulsar/Kafka/etc schemas from it. This requires rewriting every schema file and potentially breaking changes.
|
||||
|
||||
### Recommendation for Draft 1
|
||||
|
||||
Start with the **adapter approach** because:
|
||||
1. It's pragmatic - works with existing code
|
||||
2. Proves the concept with minimal risk
|
||||
3. Can evolve to a native schema system later if needed
|
||||
4. Configuration-driven: one env var switches backends
|
||||
|
||||
## Approach Draft 2: Backend-Agnostic Schema System with Dataclasses
|
||||
|
||||
### Core Concept
|
||||
|
||||
Use Python **dataclasses** as the neutral schema definition format. Each pub/sub backend provides its own serialization/deserialization for dataclasses, eliminating the need for Pulsar schemas to remain in the codebase.
|
||||
|
||||
### Schema Polymorphism at the Factory Level
|
||||
|
||||
Instead of translating Pulsar schemas, **each backend provides its own schema handling** that works with standard Python dataclasses.
|
||||
|
||||
### Publisher Flow
|
||||
|
||||
```python
|
||||
# 1. Get the configured backend from factory
|
||||
pubsub = get_pubsub() # Returns PulsarBackend, MQTTBackend, etc.
|
||||
|
||||
# 2. Get schema class from the backend
|
||||
# (Can be imported directly - backend-agnostic)
|
||||
from trustgraph.schema.services.llm import TextCompletionRequest
|
||||
|
||||
# 3. Create a producer/publisher for a specific topic
|
||||
producer = pubsub.create_producer(
|
||||
topic="text-completion-requests",
|
||||
schema=TextCompletionRequest # Tells backend what schema to use
|
||||
)
|
||||
|
||||
# 4. Create message instances (same API regardless of backend)
|
||||
request = TextCompletionRequest(
|
||||
system="You are helpful",
|
||||
prompt="Hello world",
|
||||
streaming=False
|
||||
)
|
||||
|
||||
# 5. Send the message
|
||||
producer.send(request) # Backend serializes appropriately
|
||||
```
|
||||
|
||||
### Consumer Flow
|
||||
|
||||
```python
|
||||
# 1. Get the configured backend
|
||||
pubsub = get_pubsub()
|
||||
|
||||
# 2. Create a consumer
|
||||
consumer = pubsub.subscribe(
|
||||
topic="text-completion-requests",
|
||||
schema=TextCompletionRequest # Tells backend how to deserialize
|
||||
)
|
||||
|
||||
# 3. Receive and deserialize
|
||||
msg = consumer.receive()
|
||||
request = msg.value() # Returns TextCompletionRequest dataclass instance
|
||||
|
||||
# 4. Use the data (type-safe access)
|
||||
print(request.system) # "You are helpful"
|
||||
print(request.prompt) # "Hello world"
|
||||
print(request.streaming) # False
|
||||
```
|
||||
|
||||
### What Happens Behind the Scenes
|
||||
|
||||
**For Pulsar backend:**
|
||||
- `create_producer()` → creates Pulsar producer with JSON schema or dynamically generated Record
|
||||
- `send(request)` → serializes dataclass to JSON/Pulsar format, sends to Pulsar
|
||||
- `receive()` → gets Pulsar message, deserializes back to dataclass
|
||||
|
||||
**For MQTT backend:**
|
||||
- `create_producer()` → connects to MQTT broker, no schema registration needed
|
||||
- `send(request)` → converts dataclass to JSON, publishes to MQTT topic
|
||||
- `receive()` → subscribes to MQTT topic, deserializes JSON to dataclass
|
||||
|
||||
**For Kafka backend:**
|
||||
- `create_producer()` → creates Kafka producer, registers Avro schema if needed
|
||||
- `send(request)` → serializes dataclass to Avro format, sends to Kafka
|
||||
- `receive()` → gets Kafka message, deserializes Avro back to dataclass
|
||||
|
||||
### Key Design Points
|
||||
|
||||
1. **Schema object creation**: The dataclass instance (`TextCompletionRequest(...)`) is identical regardless of backend
|
||||
2. **Backend handles encoding**: Each backend knows how to serialize its dataclass to the wire format
|
||||
3. **Schema definition at creation**: When creating producer/consumer, you specify the schema type
|
||||
4. **Type safety preserved**: You get back a proper `TextCompletionRequest` object, not a dict
|
||||
5. **No backend leakage**: Application code never imports backend-specific libraries
|
||||
|
||||
### Example Transformation
|
||||
|
||||
**Current (Pulsar-specific):**
|
||||
```python
|
||||
# schema/services/llm.py
|
||||
from pulsar.schema import Record, String, Boolean, Integer
|
||||
|
||||
class TextCompletionRequest(Record):
|
||||
system = String()
|
||||
prompt = String()
|
||||
streaming = Boolean()
|
||||
```
|
||||
|
||||
**New (Backend-agnostic):**
|
||||
```python
|
||||
# schema/services/llm.py
|
||||
from dataclasses import dataclass
|
||||
|
||||
@dataclass
|
||||
class TextCompletionRequest:
|
||||
system: str
|
||||
prompt: str
|
||||
streaming: bool = False
|
||||
```
|
||||
|
||||
### Backend Integration
|
||||
|
||||
Each backend handles serialization/deserialization of dataclasses:
|
||||
|
||||
**Pulsar backend:**
|
||||
- Dynamically generate `pulsar.schema.Record` classes from dataclasses
|
||||
- Or serialize dataclasses to JSON and use Pulsar's JSON schema
|
||||
- Maintains compatibility with existing Pulsar deployments
|
||||
|
||||
**MQTT/Redis backend:**
|
||||
- Direct JSON serialization of dataclass instances
|
||||
- Use `dataclasses.asdict()` / `from_dict()`
|
||||
- Lightweight, no schema registry needed
|
||||
|
||||
**Kafka backend:**
|
||||
- Generate Avro schemas from dataclass definitions
|
||||
- Use Confluent's schema registry
|
||||
- Type-safe serialization with schema evolution support
|
||||
|
||||
### Architecture
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────┐
|
||||
│ Application Code │
|
||||
│ - Uses dataclass schemas │
|
||||
│ - Backend-agnostic │
|
||||
└──────────────┬──────────────────────┘
|
||||
│
|
||||
┌──────────────┴──────────────────────┐
|
||||
│ PubSubFactory (configurable) │
|
||||
│ - get_pubsub() returns backend │
|
||||
└──────────────┬──────────────────────┘
|
||||
│
|
||||
┌──────┴──────┐
|
||||
│ │
|
||||
┌───────▼─────────┐ ┌────▼──────────────┐
|
||||
│ PulsarBackend │ │ MQTTBackend │
|
||||
│ - JSON schema │ │ - JSON serialize │
|
||||
│ - or dynamic │ │ - Simple queues │
|
||||
│ Record gen │ │ │
|
||||
└─────────────────┘ └───────────────────┘
|
||||
```
|
||||
|
||||
### Implementation Details
|
||||
|
||||
**1. Schema definitions:** Plain dataclasses with type hints
|
||||
- `str`, `int`, `bool`, `float` for primitives
|
||||
- `list[T]` for arrays
|
||||
- `dict[str, T]` for maps
|
||||
- Nested dataclasses for complex types
|
||||
|
||||
**2. Each backend provides:**
|
||||
- Serializer: `dataclass → bytes/wire format`
|
||||
- Deserializer: `bytes/wire format → dataclass`
|
||||
- Schema registration (if needed, like Pulsar/Kafka)
|
||||
|
||||
**3. Consumer/Producer abstraction:**
|
||||
- Already exists (consumer.py, producer.py)
|
||||
- Update to use backend's serialization
|
||||
- Remove direct Pulsar imports
|
||||
|
||||
**4. Type mappings:**
|
||||
- Pulsar `String()` → Python `str`
|
||||
- Pulsar `Integer()` → Python `int`
|
||||
- Pulsar `Boolean()` → Python `bool`
|
||||
- Pulsar `Array(T)` → Python `list[T]`
|
||||
- Pulsar `Map(K, V)` → Python `dict[K, V]`
|
||||
- Pulsar `Double()` → Python `float`
|
||||
- Pulsar `Bytes()` → Python `bytes`
|
||||
|
||||
### Migration Path
|
||||
|
||||
1. **Create dataclass versions** of all schemas in `trustgraph/schema/`
|
||||
2. **Update backend classes** (Consumer, Producer, Publisher, Subscriber) to use backend-provided serialization
|
||||
3. **Implement PulsarBackend** with JSON schema or dynamic Record generation
|
||||
4. **Test with Pulsar** to ensure backward compatibility with existing deployments
|
||||
5. **Add new backends** (MQTT, Kafka, Redis, etc.) as needed
|
||||
6. **Remove Pulsar imports** from schema files
|
||||
|
||||
### Benefits
|
||||
|
||||
✅ **No pub/sub dependency** in schema definitions
|
||||
✅ **Standard Python** - easy to understand, type-check, document
|
||||
✅ **Modern tooling** - works with mypy, IDE autocomplete, linters
|
||||
✅ **Backend-optimized** - each backend uses native serialization
|
||||
✅ **No translation overhead** - direct serialization, no adapters
|
||||
✅ **Type safety** - real objects with proper types
|
||||
✅ **Easy validation** - can use Pydantic if needed
|
||||
|
||||
### Challenges & Solutions
|
||||
|
||||
**Challenge:** Pulsar's `Record` has runtime field validation
|
||||
**Solution:** Use Pydantic dataclasses for validation if needed, or Python 3.10+ dataclass features with `__post_init__`
|
||||
|
||||
**Challenge:** Some Pulsar-specific features (like `Bytes` type)
|
||||
**Solution:** Map to `bytes` type in dataclass, backend handles encoding appropriately
|
||||
|
||||
**Challenge:** Topic naming (`persistent://tenant/namespace/topic`)
|
||||
**Solution:** Abstract topic names in schema definitions, backend converts to proper format
|
||||
|
||||
**Challenge:** Schema evolution and versioning
|
||||
**Solution:** Each backend handles this according to its capabilities (Pulsar schema versions, Kafka schema registry, etc.)
|
||||
|
||||
**Challenge:** Nested complex types
|
||||
**Solution:** Use nested dataclasses, backends recursively serialize/deserialize
|
||||
|
||||
### Design Decisions
|
||||
|
||||
1. **Plain dataclasses or Pydantic?**
|
||||
- ✅ **Decision: Use plain Python dataclasses**
|
||||
- Simpler, no additional dependencies
|
||||
- Validation not required in practice
|
||||
- Easier to understand and maintain
|
||||
|
||||
2. **Schema evolution:**
|
||||
- ✅ **Decision: No versioning mechanism needed**
|
||||
- Schemas are stable and long-lasting
|
||||
- Updates typically add new fields (backward compatible)
|
||||
- Backends handle schema evolution according to their capabilities
|
||||
|
||||
3. **Backward compatibility:**
|
||||
- ✅ **Decision: Major version change, no backward compatibility required**
|
||||
- Will be a breaking change with migration instructions
|
||||
- Clean break allows for better design
|
||||
- Migration guide will be provided for existing deployments
|
||||
|
||||
4. **Nested types and complex structures:**
|
||||
- ✅ **Decision: Use nested dataclasses naturally**
|
||||
- Python dataclasses handle nesting perfectly
|
||||
- `list[T]` for arrays, `dict[K, V]` for maps
|
||||
- Backends recursively serialize/deserialize
|
||||
- Example:
|
||||
```python
|
||||
@dataclass
|
||||
class Value:
|
||||
value: str
|
||||
is_uri: bool
|
||||
|
||||
@dataclass
|
||||
class Triple:
|
||||
s: Value # Nested dataclass
|
||||
p: Value
|
||||
o: Value
|
||||
|
||||
@dataclass
|
||||
class GraphQuery:
|
||||
triples: list[Triple] # Array of nested dataclasses
|
||||
metadata: dict[str, str]
|
||||
```
|
||||
|
||||
5. **Default values and optional fields:**
|
||||
- ✅ **Decision: Mix of required, defaults, and optional fields**
|
||||
- Required fields: No default value
|
||||
- Fields with defaults: Always present, have sensible default
|
||||
- Truly optional fields: `T | None = None`, omitted from serialization when `None`
|
||||
- Example:
|
||||
```python
|
||||
@dataclass
|
||||
class TextCompletionRequest:
|
||||
system: str # Required, no default
|
||||
prompt: str # Required, no default
|
||||
streaming: bool = False # Optional with default value
|
||||
metadata: dict | None = None # Truly optional, can be absent
|
||||
```
|
||||
|
||||
**Important serialization semantics:**
|
||||
|
||||
When `metadata = None`:
|
||||
```json
|
||||
{
|
||||
"system": "...",
|
||||
"prompt": "...",
|
||||
"streaming": false
|
||||
// metadata field NOT PRESENT
|
||||
}
|
||||
```
|
||||
|
||||
When `metadata = {}` (explicitly empty):
|
||||
```json
|
||||
{
|
||||
"system": "...",
|
||||
"prompt": "...",
|
||||
"streaming": false,
|
||||
"metadata": {} // Field PRESENT but empty
|
||||
}
|
||||
```
|
||||
|
||||
**Key distinction:**
|
||||
- `None` → field absent from JSON (not serialized)
|
||||
- Empty value (`{}`, `[]`, `""`) → field present with empty value
|
||||
- This matters semantically: "not provided" vs "explicitly empty"
|
||||
- Serialization backends must skip `None` fields, not encode as `null`
|
||||
|
||||
## Approach Draft 3: Implementation Details
|
||||
|
||||
### Generic Queue Naming Format
|
||||
|
||||
Replace backend-specific queue names with a generic format that backends can map appropriately.
|
||||
|
||||
**Format:** `{qos}/{tenant}/{namespace}/{queue-name}`
|
||||
|
||||
Where:
|
||||
- `qos`: Quality of Service level
|
||||
- `q0` = best-effort (fire and forget, no acknowledgment)
|
||||
- `q1` = at-least-once (requires acknowledgment)
|
||||
- `q2` = exactly-once (two-phase acknowledgment)
|
||||
- `tenant`: Logical grouping for multi-tenancy
|
||||
- `namespace`: Sub-grouping within tenant
|
||||
- `queue-name`: Actual queue/topic name
|
||||
|
||||
**Examples:**
|
||||
```
|
||||
q1/tg/flow/text-completion-requests
|
||||
q2/tg/config/config-push
|
||||
q0/tg/metrics/stats
|
||||
```
|
||||
|
||||
### Backend Topic Mapping
|
||||
|
||||
Each backend maps the generic format to its native format:
|
||||
|
||||
**Pulsar Backend:**
|
||||
```python
|
||||
def map_topic(self, generic_topic: str) -> str:
|
||||
# Parse: q1/tg/flow/text-completion-requests
|
||||
qos, tenant, namespace, queue = generic_topic.split('/', 3)
|
||||
|
||||
# Map QoS to persistence
|
||||
persistence = 'persistent' if qos in ['q1', 'q2'] else 'non-persistent'
|
||||
|
||||
# Return Pulsar URI: persistent://tg/flow/text-completion-requests
|
||||
return f"{persistence}://{tenant}/{namespace}/{queue}"
|
||||
```
|
||||
|
||||
**MQTT Backend:**
|
||||
```python
|
||||
def map_topic(self, generic_topic: str) -> tuple[str, int]:
|
||||
# Parse: q1/tg/flow/text-completion-requests
|
||||
qos, tenant, namespace, queue = generic_topic.split('/', 3)
|
||||
|
||||
# Map QoS level
|
||||
qos_level = {'q0': 0, 'q1': 1, 'q2': 2}[qos]
|
||||
|
||||
# Build MQTT topic including tenant/namespace for proper namespacing
|
||||
mqtt_topic = f"{tenant}/{namespace}/{queue}"
|
||||
|
||||
return mqtt_topic, qos_level
|
||||
```
|
||||
|
||||
### Updated Topic Helper Function
|
||||
|
||||
```python
|
||||
# schema/core/topic.py
|
||||
def topic(queue_name, qos='q1', tenant='tg', namespace='flow'):
|
||||
"""
|
||||
Create a generic topic identifier that can be mapped by backends.
|
||||
|
||||
Args:
|
||||
queue_name: The queue/topic name
|
||||
qos: Quality of service
|
||||
- 'q0' = best-effort (no ack)
|
||||
- 'q1' = at-least-once (ack required)
|
||||
- 'q2' = exactly-once (two-phase ack)
|
||||
tenant: Tenant identifier for multi-tenancy
|
||||
namespace: Namespace within tenant
|
||||
|
||||
Returns:
|
||||
Generic topic string: qos/tenant/namespace/queue_name
|
||||
|
||||
Examples:
|
||||
topic('my-queue') # q1/tg/flow/my-queue
|
||||
topic('config', qos='q2', namespace='config') # q2/tg/config/config
|
||||
"""
|
||||
return f"{qos}/{tenant}/{namespace}/{queue_name}"
|
||||
```
|
||||
|
||||
### Configuration and Initialization
|
||||
|
||||
**Command-Line Arguments + Environment Variables:**
|
||||
|
||||
```python
|
||||
# In base/async_processor.py - add_args() method
|
||||
@staticmethod
|
||||
def add_args(parser):
|
||||
# Pub/sub backend selection
|
||||
parser.add_argument(
|
||||
'--pubsub-backend',
|
||||
default=os.getenv('PUBSUB_BACKEND', 'pulsar'),
|
||||
choices=['pulsar', 'mqtt'],
|
||||
help='Pub/sub backend (default: pulsar, env: PUBSUB_BACKEND)'
|
||||
)
|
||||
|
||||
# Pulsar-specific configuration
|
||||
parser.add_argument(
|
||||
'--pulsar-host',
|
||||
default=os.getenv('PULSAR_HOST', 'pulsar://localhost:6650'),
|
||||
help='Pulsar host (default: pulsar://localhost:6650, env: PULSAR_HOST)'
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
'--pulsar-api-key',
|
||||
default=os.getenv('PULSAR_API_KEY', None),
|
||||
help='Pulsar API key (env: PULSAR_API_KEY)'
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
'--pulsar-listener',
|
||||
default=os.getenv('PULSAR_LISTENER', None),
|
||||
help='Pulsar listener name (env: PULSAR_LISTENER)'
|
||||
)
|
||||
|
||||
# MQTT-specific configuration
|
||||
parser.add_argument(
|
||||
'--mqtt-host',
|
||||
default=os.getenv('MQTT_HOST', 'localhost'),
|
||||
help='MQTT broker host (default: localhost, env: MQTT_HOST)'
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
'--mqtt-port',
|
||||
type=int,
|
||||
default=int(os.getenv('MQTT_PORT', '1883')),
|
||||
help='MQTT broker port (default: 1883, env: MQTT_PORT)'
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
'--mqtt-username',
|
||||
default=os.getenv('MQTT_USERNAME', None),
|
||||
help='MQTT username (env: MQTT_USERNAME)'
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
'--mqtt-password',
|
||||
default=os.getenv('MQTT_PASSWORD', None),
|
||||
help='MQTT password (env: MQTT_PASSWORD)'
|
||||
)
|
||||
```
|
||||
|
||||
**Factory Function:**
|
||||
|
||||
```python
|
||||
# In base/pubsub.py or base/pubsub_factory.py
|
||||
def get_pubsub(**config) -> PubSubBackend:
|
||||
"""
|
||||
Create and return a pub/sub backend based on configuration.
|
||||
|
||||
Args:
|
||||
config: Configuration dict from command-line args
|
||||
Must include 'pubsub_backend' key
|
||||
|
||||
Returns:
|
||||
Backend instance (PulsarBackend, MQTTBackend, etc.)
|
||||
"""
|
||||
backend_type = config.get('pubsub_backend', 'pulsar')
|
||||
|
||||
if backend_type == 'pulsar':
|
||||
return PulsarBackend(
|
||||
host=config.get('pulsar_host'),
|
||||
api_key=config.get('pulsar_api_key'),
|
||||
listener=config.get('pulsar_listener'),
|
||||
)
|
||||
elif backend_type == 'mqtt':
|
||||
return MQTTBackend(
|
||||
host=config.get('mqtt_host'),
|
||||
port=config.get('mqtt_port'),
|
||||
username=config.get('mqtt_username'),
|
||||
password=config.get('mqtt_password'),
|
||||
)
|
||||
else:
|
||||
raise ValueError(f"Unknown pub/sub backend: {backend_type}")
|
||||
```
|
||||
|
||||
**Usage in AsyncProcessor:**
|
||||
|
||||
```python
|
||||
# In async_processor.py
|
||||
class AsyncProcessor:
|
||||
def __init__(self, **params):
|
||||
self.id = params.get("id")
|
||||
|
||||
# Create backend from config (replaces PulsarClient)
|
||||
self.pubsub = get_pubsub(**params)
|
||||
|
||||
# Rest of initialization...
|
||||
```
|
||||
|
||||
### Backend Interface
|
||||
|
||||
```python
|
||||
class PubSubBackend(Protocol):
|
||||
"""Protocol defining the interface all pub/sub backends must implement."""
|
||||
|
||||
def create_producer(self, topic: str, schema: type, **options) -> BackendProducer:
|
||||
"""
|
||||
Create a producer for a topic.
|
||||
|
||||
Args:
|
||||
topic: Generic topic format (qos/tenant/namespace/queue)
|
||||
schema: Dataclass type for messages
|
||||
options: Backend-specific options (e.g., chunking_enabled)
|
||||
|
||||
Returns:
|
||||
Backend-specific producer instance
|
||||
"""
|
||||
...
|
||||
|
||||
def create_consumer(
|
||||
self,
|
||||
topic: str,
|
||||
subscription: str,
|
||||
schema: type,
|
||||
initial_position: str = 'latest',
|
||||
consumer_type: str = 'shared',
|
||||
**options
|
||||
) -> BackendConsumer:
|
||||
"""
|
||||
Create a consumer for a topic.
|
||||
|
||||
Args:
|
||||
topic: Generic topic format (qos/tenant/namespace/queue)
|
||||
subscription: Subscription/consumer group name
|
||||
schema: Dataclass type for messages
|
||||
initial_position: 'earliest' or 'latest' (MQTT may ignore)
|
||||
consumer_type: 'shared', 'exclusive', 'failover' (MQTT may ignore)
|
||||
options: Backend-specific options
|
||||
|
||||
Returns:
|
||||
Backend-specific consumer instance
|
||||
"""
|
||||
...
|
||||
|
||||
def close(self) -> None:
|
||||
"""Close the backend connection."""
|
||||
...
|
||||
```
|
||||
|
||||
```python
|
||||
class BackendProducer(Protocol):
|
||||
"""Protocol for backend-specific producer."""
|
||||
|
||||
def send(self, message: Any, properties: dict = {}) -> None:
|
||||
"""Send a message (dataclass instance) with optional properties."""
|
||||
...
|
||||
|
||||
def flush(self) -> None:
|
||||
"""Flush any buffered messages."""
|
||||
...
|
||||
|
||||
def close(self) -> None:
|
||||
"""Close the producer."""
|
||||
...
|
||||
```
|
||||
|
||||
```python
|
||||
class BackendConsumer(Protocol):
|
||||
"""Protocol for backend-specific consumer."""
|
||||
|
||||
def receive(self, timeout_millis: int = 2000) -> Message:
|
||||
"""
|
||||
Receive a message from the topic.
|
||||
|
||||
Raises:
|
||||
TimeoutError: If no message received within timeout
|
||||
"""
|
||||
...
|
||||
|
||||
def acknowledge(self, message: Message) -> None:
|
||||
"""Acknowledge successful processing of a message."""
|
||||
...
|
||||
|
||||
def negative_acknowledge(self, message: Message) -> None:
|
||||
"""Negative acknowledge - triggers redelivery."""
|
||||
...
|
||||
|
||||
def unsubscribe(self) -> None:
|
||||
"""Unsubscribe from the topic."""
|
||||
...
|
||||
|
||||
def close(self) -> None:
|
||||
"""Close the consumer."""
|
||||
...
|
||||
```
|
||||
|
||||
```python
|
||||
class Message(Protocol):
|
||||
"""Protocol for a received message."""
|
||||
|
||||
def value(self) -> Any:
|
||||
"""Get the deserialized message (dataclass instance)."""
|
||||
...
|
||||
|
||||
def properties(self) -> dict:
|
||||
"""Get message properties/metadata."""
|
||||
...
|
||||
```
|
||||
|
||||
### Existing Classes Refactoring
|
||||
|
||||
The existing `Consumer`, `Producer`, `Publisher`, `Subscriber` classes remain largely intact:
|
||||
|
||||
**Current responsibilities (keep):**
|
||||
- Async threading model and taskgroups
|
||||
- Reconnection logic and retry handling
|
||||
- Metrics collection
|
||||
- Rate limiting
|
||||
- Concurrency management
|
||||
|
||||
**Changes needed:**
|
||||
- Remove direct Pulsar imports (`pulsar.schema`, `pulsar.InitialPosition`, etc.)
|
||||
- Accept `BackendProducer`/`BackendConsumer` instead of Pulsar client
|
||||
- Delegate actual pub/sub operations to backend instances
|
||||
- Map generic concepts to backend calls
|
||||
|
||||
**Example refactoring:**
|
||||
|
||||
```python
|
||||
# OLD - consumer.py
|
||||
class Consumer:
|
||||
def __init__(self, client, topic, subscriber, schema, ...):
|
||||
self.client = client # Direct Pulsar client
|
||||
# ...
|
||||
|
||||
async def consumer_run(self):
|
||||
# Uses pulsar.InitialPosition, pulsar.ConsumerType
|
||||
self.consumer = self.client.subscribe(
|
||||
topic=self.topic,
|
||||
schema=JsonSchema(self.schema),
|
||||
initial_position=pulsar.InitialPosition.Earliest,
|
||||
consumer_type=pulsar.ConsumerType.Shared,
|
||||
)
|
||||
|
||||
# NEW - consumer.py
|
||||
class Consumer:
|
||||
def __init__(self, backend_consumer, schema, ...):
|
||||
self.backend_consumer = backend_consumer # Backend-specific consumer
|
||||
self.schema = schema
|
||||
# ...
|
||||
|
||||
async def consumer_run(self):
|
||||
# Backend consumer already created with right settings
|
||||
# Just use it directly
|
||||
while self.running:
|
||||
msg = await asyncio.to_thread(
|
||||
self.backend_consumer.receive,
|
||||
timeout_millis=2000
|
||||
)
|
||||
await self.handle_message(msg)
|
||||
```
|
||||
|
||||
### Backend-Specific Behaviors
|
||||
|
||||
**Pulsar Backend:**
|
||||
- Maps `q0` → `non-persistent://`, `q1`/`q2` → `persistent://`
|
||||
- Supports all consumer types (shared, exclusive, failover)
|
||||
- Supports initial position (earliest/latest)
|
||||
- Native message acknowledgment
|
||||
- Schema registry support
|
||||
|
||||
**MQTT Backend:**
|
||||
- Maps `q0`/`q1`/`q2` → MQTT QoS levels 0/1/2
|
||||
- Includes tenant/namespace in topic path for namespacing
|
||||
- Auto-generates client IDs from subscription names
|
||||
- Ignores initial position (no message history in basic MQTT)
|
||||
- Ignores consumer type (MQTT uses client IDs, not consumer groups)
|
||||
- Simple publish/subscribe model
|
||||
|
||||
### Design Decisions Summary
|
||||
|
||||
1. ✅ **Generic queue naming**: `qos/tenant/namespace/queue-name` format
|
||||
2. ✅ **QoS in queue ID**: Determined by queue definition, not configuration
|
||||
3. ✅ **Reconnection**: Handled by Consumer/Producer classes, not backends
|
||||
4. ✅ **MQTT topics**: Include tenant/namespace for proper namespacing
|
||||
5. ✅ **Message history**: MQTT ignores `initial_position` parameter (future enhancement)
|
||||
6. ✅ **Client IDs**: MQTT backend auto-generates from subscription name
|
||||
|
||||
### Future Enhancements
|
||||
|
||||
**MQTT message history:**
|
||||
- Could add optional persistence layer (e.g., retained messages, external store)
|
||||
- Would allow supporting `initial_position='earliest'`
|
||||
- Not required for initial implementation
|
||||
|
||||
Loading…
Add table
Add a link
Reference in a new issue