Messaging fabric plugins (#592)

* Plugin architecture for messaging fabric

* Schemas use a technology neutral expression

* Schema strictness has uncovered some incorrect schema usage, which is now fixed
cybermaggedon 2025-12-17 21:40:43 +00:00 committed by GitHub
parent 1865b3f3c8
commit 34eb083836
GPG key ID: B5690EEEBB952194
100 changed files with 2342 additions and 828 deletions

docs/tech-specs/pubsub.md

@@ -0,0 +1,958 @@
# Pub/Sub Infrastructure
## Overview
This document catalogs all connections between the TrustGraph codebase and the pub/sub infrastructure. Currently, the system is hardcoded to use Apache Pulsar. This analysis identifies all integration points to inform future refactoring toward a configurable pub/sub abstraction.
## Current State: Pulsar Integration Points
### 1. Direct Pulsar Client Usage
**Location:** `trustgraph-flow/trustgraph/gateway/service.py`
The API gateway directly imports and instantiates the Pulsar client:
- **Line 20:** `import pulsar`
- **Lines 54-61:** Direct instantiation of `pulsar.Client()` with optional `pulsar.AuthenticationToken()`
- **Lines 33-35:** Default Pulsar host configuration from environment variables
- **Lines 178-192:** CLI arguments for `--pulsar-host`, `--pulsar-api-key`, and `--pulsar-listener`
- **Lines 78, 124:** Passes `pulsar_client` to `ConfigReceiver` and `DispatcherManager`
This is the only location that directly instantiates a Pulsar client outside of the abstraction layer.
### 2. Base Processor Framework
**Location:** `trustgraph-base/trustgraph/base/async_processor.py`
The base class for all processors provides Pulsar connectivity:
- **Line 9:** `import _pulsar` (for exception handling)
- **Line 18:** `from . pubsub import PulsarClient`
- **Line 38:** Creates `pulsar_client_object = PulsarClient(**params)`
- **Lines 104-108:** Properties exposing `pulsar_host` and `pulsar_client`
- **Line 250:** Static method `add_args()` calls `PulsarClient.add_args(parser)` for CLI arguments
- **Lines 223-225:** Exception handling for `_pulsar.Interrupted`
All processors inherit from `AsyncProcessor`, making this the central integration point.
### 3. Consumer Abstraction
**Location:** `trustgraph-base/trustgraph/base/consumer.py`
Consumes messages from queues and invokes handler functions:
**Pulsar imports:**
- **Line 12:** `from pulsar.schema import JsonSchema`
- **Line 13:** `import pulsar`
- **Line 14:** `import _pulsar`
**Pulsar-specific usage:**
- **Lines 100, 102:** `pulsar.InitialPosition.Earliest` / `pulsar.InitialPosition.Latest`
- **Line 108:** `JsonSchema(self.schema)` wrapper
- **Line 110:** `pulsar.ConsumerType.Shared`
- **Lines 104-111:** `self.client.subscribe()` with Pulsar-specific parameters
- **Lines 143, 150, 65:** `consumer.unsubscribe()` and `consumer.close()` methods
- **Line 162:** `_pulsar.Timeout` exception
- **Lines 182, 205, 232:** `consumer.acknowledge()` / `consumer.negative_acknowledge()`
**Spec file:** `trustgraph-base/trustgraph/base/consumer_spec.py`
- **Line 22:** References `processor.pulsar_client`
### 4. Producer Abstraction
**Location:** `trustgraph-base/trustgraph/base/producer.py`
Sends messages to queues:
**Pulsar imports:**
- **Line 2:** `from pulsar.schema import JsonSchema`
**Pulsar-specific usage:**
- **Line 49:** `JsonSchema(self.schema)` wrapper
- **Lines 47-51:** `self.client.create_producer()` with Pulsar-specific parameters (topic, schema, chunking_enabled)
- **Lines 31, 76:** `producer.close()` method
- **Lines 64-65:** `producer.send()` with message and properties
**Spec file:** `trustgraph-base/trustgraph/base/producer_spec.py`
- **Line 18:** References `processor.pulsar_client`
### 5. Publisher Abstraction
**Location:** `trustgraph-base/trustgraph/base/publisher.py`
Asynchronous message publishing with queue buffering:
**Pulsar imports:**
- **Line 2:** `from pulsar.schema import JsonSchema`
- **Line 6:** `import pulsar`
**Pulsar-specific usage:**
- **Line 52:** `JsonSchema(self.schema)` wrapper
- **Lines 50-54:** `self.client.create_producer()` with Pulsar-specific parameters
- **Lines 101, 103:** `producer.send()` with message and optional properties
- **Lines 106-107:** `producer.flush()` and `producer.close()` methods
### 6. Subscriber Abstraction
**Location:** `trustgraph-base/trustgraph/base/subscriber.py`
Provides multi-recipient message distribution from queues:
**Pulsar imports:**
- **Line 6:** `from pulsar.schema import JsonSchema`
- **Line 8:** `import _pulsar`
**Pulsar-specific usage:**
- **Line 55:** `JsonSchema(self.schema)` wrapper
- **Line 57:** `self.client.subscribe(**subscribe_args)`
- **Lines 101, 136, 160, 167-172:** Pulsar exceptions: `_pulsar.Timeout`, `_pulsar.InvalidConfiguration`, `_pulsar.AlreadyClosed`
- **Lines 159, 166, 170:** Consumer methods: `negative_acknowledge()`, `unsubscribe()`, `close()`
- **Lines 247, 251:** Message acknowledgment: `acknowledge()`, `negative_acknowledge()`
**Spec file:** `trustgraph-base/trustgraph/base/subscriber_spec.py`
- **Line 19:** References `processor.pulsar_client`
### 7. Schema System (Heart of Darkness)
**Location:** `trustgraph-base/trustgraph/schema/`
Every message schema in the system is defined using Pulsar's schema framework.
**Core primitives:** `schema/core/primitives.py`
- **Line 2:** `from pulsar.schema import Record, String, Boolean, Array, Integer`
- All schemas inherit from Pulsar's `Record` base class
- All field types are Pulsar types: `String()`, `Integer()`, `Boolean()`, `Array()`, `Map()`, `Double()`
**Example schemas:**
- `schema/services/llm.py` (Line 2): `from pulsar.schema import Record, String, Array, Double, Integer, Boolean`
- `schema/services/config.py` (Line 2): `from pulsar.schema import Record, Bytes, String, Boolean, Array, Map, Integer`
**Topic naming:** `schema/core/topic.py`
- **Lines 2-3:** Topic format: `{kind}://{tenant}/{namespace}/{topic}`
- This URI structure is Pulsar-specific (e.g., `persistent://tg/flow/config`)
**Impact:**
- All request/response message definitions throughout the codebase use Pulsar schemas
- This includes services for: config, flow, llm, prompt, query, storage, agent, collection, diagnosis, library, lookup, nlp_query, objects_query, retrieval, structured_query
- Schema definitions are imported and used extensively across all processors and services
## Summary
### Pulsar Dependencies by Category
1. **Client instantiation:**
   - Direct: `gateway/service.py`
   - Abstracted: `async_processor.py` → `pubsub.py` (PulsarClient)
2. **Message transport:**
   - Consumer: `consumer.py`, `consumer_spec.py`
   - Producer: `producer.py`, `producer_spec.py`
   - Publisher: `publisher.py`
   - Subscriber: `subscriber.py`, `subscriber_spec.py`
3. **Schema system:**
   - Base types: `schema/core/primitives.py`
   - All service schemas: `schema/services/*.py`
   - Topic naming: `schema/core/topic.py`
4. **Pulsar-specific concepts required:**
   - Topic-based messaging
   - Schema system (Record, field types)
   - Shared subscriptions
   - Message acknowledgment (positive/negative)
   - Consumer positioning (earliest/latest)
   - Message properties
   - Consumer types
   - Chunking support
   - Persistent vs non-persistent topics
### Refactoring Challenges
The good news: the abstraction layer (Consumer, Producer, Publisher, Subscriber) already encapsulates most Pulsar interactions cleanly.
The challenges:
1. **Schema system pervasiveness:** Every message definition uses `pulsar.schema.Record` and Pulsar field types
2. **Pulsar-specific enums:** `InitialPosition`, `ConsumerType`
3. **Pulsar exceptions:** `_pulsar.Timeout`, `_pulsar.Interrupted`, `_pulsar.InvalidConfiguration`, `_pulsar.AlreadyClosed`
4. **Method signatures:** `acknowledge()`, `negative_acknowledge()`, `subscribe()`, `create_producer()`, etc.
5. **Topic URI format:** Pulsar's `kind://tenant/namespace/topic` structure
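Challenges 2 and 3 could be handled by giving the codebase its own enums and exception hierarchy, with each backend translating into them. The sketch below is illustrative only: every name in it (`InitialPosition`, `ConsumerType`, `PubSubError` and its subclasses, `translate_errors`) is hypothetical, and backend exception classes are passed in as a mapping so the example runs without the `pulsar-client` package.

```python
import functools
from enum import Enum


class InitialPosition(Enum):
    """Backend-neutral stand-in for pulsar.InitialPosition (hypothetical)."""
    EARLIEST = "earliest"
    LATEST = "latest"


class ConsumerType(Enum):
    """Backend-neutral stand-in for pulsar.ConsumerType (hypothetical)."""
    SHARED = "shared"
    EXCLUSIVE = "exclusive"
    FAILOVER = "failover"


class PubSubError(Exception):
    """Base class for backend-neutral pub/sub errors."""


class PubSubTimeout(PubSubError):
    """Would replace _pulsar.Timeout."""


class PubSubInterrupted(PubSubError):
    """Would replace _pulsar.Interrupted."""


class PubSubConfigError(PubSubError):
    """Would replace _pulsar.InvalidConfiguration."""


class PubSubClosed(PubSubError):
    """Would replace _pulsar.AlreadyClosed."""


def translate_errors(mapping):
    """Decorator that re-raises backend exceptions as neutral PubSubErrors.

    `mapping` maps backend exception classes to PubSubError subclasses;
    each backend would supply its own table.
    """
    backend_errors = tuple(mapping)

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except backend_errors as exc:
                for backend_exc, neutral_exc in mapping.items():
                    if isinstance(exc, backend_exc):
                        raise neutral_exc(str(exc)) from exc
                raise  # not reached: exc matched one of the mapped classes
        return wrapper
    return decorator
```

A Pulsar backend could then wrap its blocking calls with something like `@translate_errors({_pulsar.Timeout: PubSubTimeout})`, so that `Consumer` and `Subscriber` only ever catch `PubSubError` subclasses.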
### Next Steps
To make the pub/sub infrastructure configurable, we need to:
1. Create an abstraction interface for the client/schema system
2. Abstract Pulsar-specific enums and exceptions
3. Create schema wrappers or alternative schema definitions
4. Implement the interface for both Pulsar and alternative systems (Kafka, RabbitMQ, Redis Streams, etc.)
5. Update `pubsub.py` to be configurable and support multiple backends
6. Provide a migration path for existing deployments
## Approach Draft 1: Adapter Pattern with Schema Translation Layer
### Key Insight
The **schema system** is the deepest integration point: everything else flows from it. We need to solve this first, or we'll end up rewriting the entire codebase.
### Strategy: Minimal Disruption with Adapters
**1. Keep Pulsar schemas as the internal representation**
- Don't rewrite all the schema definitions
- Schemas remain `pulsar.schema.Record` internally
- Use adapters to translate at the boundary between our code and the pub/sub backend
**2. Create a pub/sub abstraction layer:**
```
┌─────────────────────────────────────┐
│ Existing Code (unchanged)           │
│ - Uses Pulsar schemas internally    │
│ - Consumer/Producer/Publisher       │
└──────────────┬──────────────────────┘
┌──────────────┴──────────────────────┐
│ PubSubFactory (configurable)        │
│ - Creates backend-specific client   │
└──────────────┬──────────────────────┘
        ┌──────┴──────┐
        │             │
┌───────▼───────┐ ┌───▼──────────┐
│ PulsarAdapter │ │ KafkaAdapter │ etc...
│ (passthrough) │ │ (translates) │
└───────────────┘ └──────────────┘
```
**3. Define abstract interfaces:**
- `PubSubClient` - client connection
- `PubSubProducer` - sending messages
- `PubSubConsumer` - receiving messages
- `SchemaAdapter` - translating Pulsar schemas to/from JSON or backend-specific formats
**4. Implementation details:**
For the **Pulsar adapter**: nearly passthrough, minimal translation.

For **other backends** (Kafka, RabbitMQ, etc.):
- Serialize Pulsar Record objects to JSON/bytes
- Map concepts like:
  - `InitialPosition.Earliest/Latest` → Kafka's `auto.offset.reset`
  - `acknowledge()` → Kafka's commit
  - `negative_acknowledge()` → Re-queue or DLQ pattern
  - Topic URIs → Backend-specific topic names
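To make the Kafka mappings concrete, here is a minimal sketch of how a Kafka adapter might derive consumer settings and a topic name from the Pulsar-style inputs. It assumes the librdkafka-style configuration keys used by the `confluent-kafka` client (`group.id`, `auto.offset.reset`, `enable.auto.commit`); the function names and the dot-joined topic naming scheme are hypothetical. Kafka topic names may only contain ASCII letters, digits, `.`, `_` and `-`, so the `/`-separated Pulsar path has to be flattened somehow.

```python
import re

# Pulsar-style topic URI, e.g. persistent://tg/flow/config
_PULSAR_URI = re.compile(r"^(persistent|non-persistent)://([^/]+)/([^/]+)/(.+)$")


def kafka_topic_name(pulsar_uri: str) -> str:
    """Flatten a Pulsar topic URI into a legal Kafka topic name.

    Joins tenant/namespace/topic with '.' (a hypothetical scheme).
    """
    match = _PULSAR_URI.match(pulsar_uri)
    if match is None:
        raise ValueError(f"Not a Pulsar topic URI: {pulsar_uri!r}")
    _kind, tenant, namespace, topic = match.groups()
    name = f"{tenant}.{namespace}.{topic}"
    if not re.fullmatch(r"[A-Za-z0-9._-]{1,249}", name):
        raise ValueError(f"Cannot map {pulsar_uri!r} to a legal Kafka topic name")
    return name


def kafka_consumer_config(subscription: str, initial_position: str) -> dict:
    """Translate Pulsar subscription settings into Kafka consumer config.

    - subscription name -> consumer group (group.id)
    - InitialPosition.Earliest/Latest -> auto.offset.reset
    - acknowledge() becomes an explicit commit, so auto-commit is disabled
    """
    if initial_position not in ("earliest", "latest"):
        raise ValueError(f"Unknown initial position: {initial_position!r}")
    return {
        "group.id": subscription,
        "auto.offset.reset": initial_position,
        "enable.auto.commit": False,
    }
```

There is no per-message Kafka equivalent of `negative_acknowledge()`, which is why the list above falls back to a re-queue or DLQ pattern for that case.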
### Analysis
**Pros:**
- ✅ Minimal code changes to existing services
- ✅ Schemas stay as-is (no massive rewrite)
- ✅ Gradual migration path
- ✅ Pulsar users see no difference
- ✅ New backends added via adapters
**Cons:**
- ⚠️ Still carries a Pulsar dependency (for schema definitions)
- ⚠️ Some impedance mismatch when translating concepts
### Alternative Consideration
Create a **TrustGraph schema system** that's pub/sub agnostic (using dataclasses or Pydantic), then generate Pulsar/Kafka/etc. schemas from it. This requires rewriting every schema file and may introduce breaking changes.
### Recommendation for Draft 1
Start with the **adapter approach** because:
1. It's pragmatic - works with existing code
2. Proves the concept with minimal risk
3. Can evolve to a native schema system later if needed
4. Configuration-driven: one env var switches backends
## Approach Draft 2: Backend-Agnostic Schema System with Dataclasses
### Core Concept
Use Python **dataclasses** as the neutral schema definition format. Each pub/sub backend provides its own serialization/deserialization for dataclasses, eliminating the need for Pulsar schemas to remain in the codebase.
### Schema Polymorphism at the Factory Level
Instead of translating Pulsar schemas, **each backend provides its own schema handling** that works with standard Python dataclasses.
### Publisher Flow
```python
# 1. Get the configured backend from the factory
pubsub = get_pubsub()  # Returns PulsarBackend, MQTTBackend, etc.

# 2. Import the schema class directly (a plain dataclass, backend-agnostic)
from trustgraph.schema.services.llm import TextCompletionRequest

# 3. Create a producer/publisher for a specific topic
producer = pubsub.create_producer(
    topic="text-completion-requests",
    schema=TextCompletionRequest,  # Tells the backend which schema to use
)

# 4. Create message instances (same API regardless of backend)
request = TextCompletionRequest(
    system="You are helpful",
    prompt="Hello world",
    streaming=False,
)

# 5. Send the message
producer.send(request)  # Backend serializes appropriately
```
### Consumer Flow
```python
# 1. Get the configured backend
pubsub = get_pubsub()

# 2. Create a consumer
consumer = pubsub.subscribe(
    topic="text-completion-requests",
    schema=TextCompletionRequest,  # Tells the backend how to deserialize
)

# 3. Receive and deserialize
msg = consumer.receive()
request = msg.value()  # Returns a TextCompletionRequest dataclass instance

# 4. Use the data (type-safe access)
print(request.system)     # "You are helpful"
print(request.prompt)     # "Hello world"
print(request.streaming)  # False
```
### What Happens Behind the Scenes
**For Pulsar backend:**
- `create_producer()` → creates Pulsar producer with JSON schema or dynamically generated Record
- `send(request)` → serializes dataclass to JSON/Pulsar format, sends to Pulsar
- `receive()` → gets Pulsar message, deserializes back to dataclass
**For MQTT backend:**
- `create_producer()` → connects to MQTT broker, no schema registration needed
- `send(request)` → converts dataclass to JSON, publishes to MQTT topic
- `receive()` → subscribes to MQTT topic, deserializes JSON to dataclass
**For Kafka backend:**
- `create_producer()` → creates Kafka producer, registers Avro schema if needed
- `send(request)` → serializes dataclass to Avro format, sends to Kafka
- `receive()` → gets Kafka message, deserializes Avro back to dataclass
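For the JSON-based backends (the MQTT case above), the serialize/deserialize step can be very small. The sketch below handles only flat dataclasses whose fields are JSON-native types; the helper names `to_wire` and `from_wire` are hypothetical, and `TextCompletionRequest` is copied from the example in this draft.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TextCompletionRequest:
    system: str
    prompt: str
    streaming: bool = False


def to_wire(message) -> bytes:
    """Serialize a flat dataclass instance to UTF-8 JSON bytes."""
    return json.dumps(asdict(message)).encode("utf-8")


def from_wire(schema: type, payload: bytes):
    """Rebuild an instance of `schema` from JSON bytes.

    Only works when every field is a JSON-native type (str, int, float,
    bool, list, dict); nested dataclasses need a recursive decoder.
    """
    return schema(**json.loads(payload.decode("utf-8")))
```

Because dataclasses generate `__eq__`, `from_wire(TextCompletionRequest, to_wire(req)) == req` holds for such messages, and an unexpected key in the payload surfaces as a `TypeError` from the generated `__init__`.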
### Key Design Points
1. **Schema object creation**: The dataclass instance (`TextCompletionRequest(...)`) is identical regardless of backend
2. **Backend handles encoding**: Each backend knows how to serialize dataclasses to its own wire format
3. **Schema definition at creation**: When creating producer/consumer, you specify the schema type
4. **Type safety preserved**: You get back a proper `TextCompletionRequest` object, not a dict
5. **No backend leakage**: Application code never imports backend-specific libraries
### Example Transformation
**Current (Pulsar-specific):**
```python
# schema/services/llm.py
from pulsar.schema import Record, String, Boolean, Integer

class TextCompletionRequest(Record):
    system = String()
    prompt = String()
    streaming = Boolean()
```
**New (Backend-agnostic):**
```python
# schema/services/llm.py
from dataclasses import dataclass

@dataclass
class TextCompletionRequest:
    system: str
    prompt: str
    streaming: bool = False
```
### Backend Integration
Each backend handles serialization/deserialization of dataclasses:
**Pulsar backend:**
- Dynamically generate `pulsar.schema.Record` classes from dataclasses
- Or serialize dataclasses to JSON and use Pulsar's JSON schema
- Maintains compatibility with existing Pulsar deployments
**MQTT/Redis backend:**
- Direct JSON serialization of dataclass instances
- Use `dataclasses.asdict()` to serialize and a small `from_dict()` helper to deserialize (the standard library has no `from_dict()`)
- Lightweight, no schema registry needed
**Kafka backend:**
- Generate Avro schemas from dataclass definitions
- Use Confluent's schema registry
- Type-safe serialization with schema evolution support
### Architecture
```
┌─────────────────────────────────────┐
│ Application Code                    │
│ - Uses dataclass schemas            │
│ - Backend-agnostic                  │
└──────────────┬──────────────────────┘
┌──────────────┴──────────────────────┐
│ PubSubFactory (configurable)        │
│ - get_pubsub() returns backend      │
└──────────────┬──────────────────────┘
        ┌──────┴─────────┐
        │                │
┌───────▼─────────┐ ┌────▼──────────────┐
│ PulsarBackend   │ │ MQTTBackend       │
│ - JSON schema   │ │ - JSON serialize  │
│ - or dynamic    │ │ - Simple queues   │
│   Record gen    │ │                   │
└─────────────────┘ └───────────────────┘
```
### Implementation Details
**1. Schema definitions:** Plain dataclasses with type hints
- `str`, `int`, `bool`, `float` for primitives
- `list[T]` for arrays
- `dict[str, T]` for maps
- Nested dataclasses for complex types
**2. Each backend provides:**
- Serializer: `dataclass → bytes/wire format`
- Deserializer: `bytes/wire format → dataclass`
- Schema registration (if needed, like Pulsar/Kafka)
**3. Consumer/Producer abstraction:**
- Already exists (consumer.py, producer.py)
- Update to use backend's serialization
- Remove direct Pulsar imports
**4. Type mappings:**
- Pulsar `String()` → Python `str`
- Pulsar `Integer()` → Python `int`
- Pulsar `Boolean()` → Python `bool`
- Pulsar `Array(T)` → Python `list[T]`
- Pulsar `Map(K, V)` → Python `dict[K, V]`
- Pulsar `Double()` → Python `float`
- Pulsar `Bytes()` → Python `bytes`
### Migration Path
1. **Create dataclass versions** of all schemas in `trustgraph/schema/`
2. **Update backend classes** (Consumer, Producer, Publisher, Subscriber) to use backend-provided serialization
3. **Implement PulsarBackend** with JSON schema or dynamic Record generation
4. **Test with Pulsar** to ensure backward compatibility with existing deployments
5. **Add new backends** (MQTT, Kafka, Redis, etc.) as needed
6. **Remove Pulsar imports** from schema files
### Benefits
- **No pub/sub dependency** in schema definitions
- **Standard Python**: easy to understand, type-check, document
- **Modern tooling**: works with mypy, IDE autocomplete, linters
- **Backend-optimized**: each backend uses native serialization
- **No translation overhead**: direct serialization, no adapters
- **Type safety**: real objects with proper types
- **Easy validation**: can use Pydantic if needed
### Challenges & Solutions
- **Challenge:** Pulsar's `Record` has runtime field validation
  - **Solution:** Use Pydantic dataclasses for validation if needed, or validate in a plain dataclass's `__post_init__` method
- **Challenge:** Some Pulsar-specific features (like the `Bytes` type)
  - **Solution:** Map to the `bytes` type in the dataclass; the backend handles encoding appropriately
- **Challenge:** Topic naming (`persistent://tenant/namespace/topic`)
  - **Solution:** Abstract topic names in schema definitions; the backend converts them to the proper format
- **Challenge:** Schema evolution and versioning
  - **Solution:** Each backend handles this according to its capabilities (Pulsar schema versions, Kafka schema registry, etc.)
- **Challenge:** Nested complex types
  - **Solution:** Use nested dataclasses; backends recursively serialize/deserialize
### Design Decisions
1. **Plain dataclasses or Pydantic?**
   - ✅ **Decision: Use plain Python dataclasses**
   - Simpler, no additional dependencies
   - Validation not required in practice
   - Easier to understand and maintain
2. **Schema evolution:**
   - ✅ **Decision: No versioning mechanism needed**
   - Schemas are stable and long-lasting
   - Updates typically add new fields (backward compatible)
   - Backends handle schema evolution according to their capabilities
3. **Backward compatibility:**
   - ✅ **Decision: Major version change, no backward compatibility required**
   - Will be a breaking change with migration instructions
   - Clean break allows for better design
   - Migration guide will be provided for existing deployments
4. **Nested types and complex structures:**
   - ✅ **Decision: Use nested dataclasses naturally**
   - Python dataclasses support nesting directly
   - `list[T]` for arrays, `dict[K, V]` for maps
   - Backends recursively serialize/deserialize
   - Example:
     ```python
     from dataclasses import dataclass

     @dataclass
     class Value:
         value: str
         is_uri: bool

     @dataclass
     class Triple:
         s: Value  # Nested dataclass
         p: Value
         o: Value

     @dataclass
     class GraphQuery:
         triples: list[Triple]  # Array of nested dataclasses
         metadata: dict[str, str]
     ```
5. **Default values and optional fields:**
   - ✅ **Decision: Mix of required, defaults, and optional fields**
   - Required fields: No default value
   - Fields with defaults: Always present, with a sensible default
   - Truly optional fields: `T | None = None`, omitted from serialization when `None`
   - Example:
     ```python
     from dataclasses import dataclass

     @dataclass
     class TextCompletionRequest:
         system: str                   # Required, no default
         prompt: str                   # Required, no default
         streaming: bool = False       # Has a default value
         metadata: dict | None = None  # Truly optional, can be absent
     ```

   **Important serialization semantics:**

   When `metadata = None`:
   ```jsonc
   {
     "system": "...",
     "prompt": "...",
     "streaming": false
     // metadata field NOT PRESENT
   }
   ```
   When `metadata = {}` (explicitly empty):
   ```jsonc
   {
     "system": "...",
     "prompt": "...",
     "streaming": false,
     "metadata": {}  // Field PRESENT but empty
   }
   ```

   **Key distinction:**
   - `None` → field absent from JSON (not serialized)
   - Empty value (`{}`, `[]`, `""`) → field present with empty value
   - This matters semantically: "not provided" vs "explicitly empty"
   - Serialization backends must skip `None` fields, not encode them as `null`
## Approach Draft 3: Implementation Details
### Generic Queue Naming Format
Replace backend-specific queue names with a generic format that backends can map appropriately.
**Format:** `{qos}/{tenant}/{namespace}/{queue-name}`
Where:
- `qos`: Quality of Service level
  - `q0` = best-effort (fire and forget, no acknowledgment)
  - `q1` = at-least-once (requires acknowledgment)
  - `q2` = exactly-once (two-phase acknowledgment)
- `tenant`: Logical grouping for multi-tenancy
- `namespace`: Sub-grouping within tenant
- `queue-name`: Actual queue/topic name
**Examples:**
```
q1/tg/flow/text-completion-requests
q2/tg/config/config-push
q0/tg/metrics/stats
```
### Backend Topic Mapping
Each backend maps the generic format to its native format:
**Pulsar Backend:**
```python
def map_topic(self, generic_topic: str) -> str:
    # Parse: q1/tg/flow/text-completion-requests
    qos, tenant, namespace, queue = generic_topic.split('/', 3)
    # Map QoS to persistence
    persistence = 'persistent' if qos in ['q1', 'q2'] else 'non-persistent'
    # Return Pulsar URI: persistent://tg/flow/text-completion-requests
    return f"{persistence}://{tenant}/{namespace}/{queue}"
```
**MQTT Backend:**
```python
def map_topic(self, generic_topic: str) -> tuple[str, int]:
    # Parse: q1/tg/flow/text-completion-requests
    qos, tenant, namespace, queue = generic_topic.split('/', 3)
    # Map QoS level
    qos_level = {'q0': 0, 'q1': 1, 'q2': 2}[qos]
    # Build MQTT topic including tenant/namespace for proper namespacing
    mqtt_topic = f"{tenant}/{namespace}/{queue}"
    return mqtt_topic, qos_level
```
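The two `map_topic` methods above can be exercised without a broker. Below they are restated as standalone functions with a validation step added; the `parse_topic` helper, the function names, and the rejection of malformed names or unknown QoS codes are assumptions of this sketch rather than part of the design.

```python
QOS_LEVELS = {"q0": 0, "q1": 1, "q2": 2}


def parse_topic(generic_topic: str) -> tuple[str, str, str, str]:
    """Split qos/tenant/namespace/queue and validate the QoS prefix."""
    parts = generic_topic.split("/", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"Expected qos/tenant/namespace/queue, got {generic_topic!r}")
    qos, tenant, namespace, queue = parts
    if qos not in QOS_LEVELS:
        raise ValueError(f"Unknown QoS {qos!r}; expected one of {sorted(QOS_LEVELS)}")
    return qos, tenant, namespace, queue


def pulsar_topic(generic_topic: str) -> str:
    """q0 -> non-persistent://..., q1/q2 -> persistent://..."""
    qos, tenant, namespace, queue = parse_topic(generic_topic)
    persistence = "non-persistent" if qos == "q0" else "persistent"
    return f"{persistence}://{tenant}/{namespace}/{queue}"


def mqtt_topic(generic_topic: str) -> tuple[str, int]:
    """Keep tenant/namespace in the MQTT topic path; map qN to MQTT QoS N."""
    qos, tenant, namespace, queue = parse_topic(generic_topic)
    return f"{tenant}/{namespace}/{queue}", QOS_LEVELS[qos]
```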
### Updated Topic Helper Function
```python
# schema/core/topic.py
def topic(queue_name, qos='q1', tenant='tg', namespace='flow'):
    """
    Create a generic topic identifier that can be mapped by backends.

    Args:
        queue_name: The queue/topic name
        qos: Quality of service
            - 'q0' = best-effort (no ack)
            - 'q1' = at-least-once (ack required)
            - 'q2' = exactly-once (two-phase ack)
        tenant: Tenant identifier for multi-tenancy
        namespace: Namespace within tenant

    Returns:
        Generic topic string: qos/tenant/namespace/queue_name

    Examples:
        topic('my-queue')  # q1/tg/flow/my-queue
        topic('config', qos='q2', namespace='config')  # q2/tg/config/config
    """
    return f"{qos}/{tenant}/{namespace}/{queue_name}"
```
### Configuration and Initialization
**Command-Line Arguments + Environment Variables:**
```python
# In base/async_processor.py (module-level import needed for the env defaults)
import os

# add_args() method of AsyncProcessor
@staticmethod
def add_args(parser):
    # Pub/sub backend selection
    parser.add_argument(
        '--pubsub-backend',
        default=os.getenv('PUBSUB_BACKEND', 'pulsar'),
        choices=['pulsar', 'mqtt'],
        help='Pub/sub backend (default: pulsar, env: PUBSUB_BACKEND)'
    )

    # Pulsar-specific configuration
    parser.add_argument(
        '--pulsar-host',
        default=os.getenv('PULSAR_HOST', 'pulsar://localhost:6650'),
        help='Pulsar host (default: pulsar://localhost:6650, env: PULSAR_HOST)'
    )
    parser.add_argument(
        '--pulsar-api-key',
        default=os.getenv('PULSAR_API_KEY', None),
        help='Pulsar API key (env: PULSAR_API_KEY)'
    )
    parser.add_argument(
        '--pulsar-listener',
        default=os.getenv('PULSAR_LISTENER', None),
        help='Pulsar listener name (env: PULSAR_LISTENER)'
    )

    # MQTT-specific configuration
    parser.add_argument(
        '--mqtt-host',
        default=os.getenv('MQTT_HOST', 'localhost'),
        help='MQTT broker host (default: localhost, env: MQTT_HOST)'
    )
    parser.add_argument(
        '--mqtt-port',
        type=int,
        default=int(os.getenv('MQTT_PORT', '1883')),
        help='MQTT broker port (default: 1883, env: MQTT_PORT)'
    )
    parser.add_argument(
        '--mqtt-username',
        default=os.getenv('MQTT_USERNAME', None),
        help='MQTT username (env: MQTT_USERNAME)'
    )
    parser.add_argument(
        '--mqtt-password',
        default=os.getenv('MQTT_PASSWORD', None),
        help='MQTT password (env: MQTT_PASSWORD)'
    )
```
**Factory Function:**
```python
# In base/pubsub.py or base/pubsub_factory.py
def get_pubsub(**config) -> PubSubBackend:
    """
    Create and return a pub/sub backend based on configuration.

    Args:
        config: Configuration dict from command-line args.
            Should include a 'pubsub_backend' key; defaults to 'pulsar'.

    Returns:
        Backend instance (PulsarBackend, MQTTBackend, etc.)
    """
    backend_type = config.get('pubsub_backend', 'pulsar')

    if backend_type == 'pulsar':
        return PulsarBackend(
            host=config.get('pulsar_host'),
            api_key=config.get('pulsar_api_key'),
            listener=config.get('pulsar_listener'),
        )
    elif backend_type == 'mqtt':
        return MQTTBackend(
            host=config.get('mqtt_host'),
            port=config.get('mqtt_port'),
            username=config.get('mqtt_username'),
            password=config.get('mqtt_password'),
        )
    else:
        raise ValueError(f"Unknown pub/sub backend: {backend_type}")
```
**Usage in AsyncProcessor:**
```python
# In async_processor.py
class AsyncProcessor:
    def __init__(self, **params):
        self.id = params.get("id")

        # Create backend from config (replaces PulsarClient)
        self.pubsub = get_pubsub(**params)

        # Rest of initialization...
```
### Backend Interface
```python
# Imports shared by the protocol definitions in this section
from __future__ import annotations

from typing import Any, Protocol


class PubSubBackend(Protocol):
    """Protocol defining the interface all pub/sub backends must implement."""

    def create_producer(self, topic: str, schema: type, **options) -> BackendProducer:
        """
        Create a producer for a topic.

        Args:
            topic: Generic topic format (qos/tenant/namespace/queue)
            schema: Dataclass type for messages
            options: Backend-specific options (e.g., chunking_enabled)

        Returns:
            Backend-specific producer instance
        """
        ...

    def create_consumer(
        self,
        topic: str,
        subscription: str,
        schema: type,
        initial_position: str = 'latest',
        consumer_type: str = 'shared',
        **options
    ) -> BackendConsumer:
        """
        Create a consumer for a topic.

        Args:
            topic: Generic topic format (qos/tenant/namespace/queue)
            subscription: Subscription/consumer group name
            schema: Dataclass type for messages
            initial_position: 'earliest' or 'latest' (MQTT may ignore)
            consumer_type: 'shared', 'exclusive', 'failover' (MQTT may ignore)
            options: Backend-specific options

        Returns:
            Backend-specific consumer instance
        """
        ...

    def close(self) -> None:
        """Close the backend connection."""
        ...
```
```python
class BackendProducer(Protocol):
    """Protocol for backend-specific producer."""

    # None instead of a mutable {} default, which would be shared between calls
    def send(self, message: Any, properties: dict | None = None) -> None:
        """Send a message (dataclass instance) with optional properties."""
        ...

    def flush(self) -> None:
        """Flush any buffered messages."""
        ...

    def close(self) -> None:
        """Close the producer."""
        ...
```
```python
class BackendConsumer(Protocol):
    """Protocol for backend-specific consumer."""

    def receive(self, timeout_millis: int = 2000) -> Message:
        """
        Receive a message from the topic.

        Raises:
            TimeoutError: If no message received within timeout
        """
        ...

    def acknowledge(self, message: Message) -> None:
        """Acknowledge successful processing of a message."""
        ...

    def negative_acknowledge(self, message: Message) -> None:
        """Negative acknowledge - triggers redelivery."""
        ...

    def unsubscribe(self) -> None:
        """Unsubscribe from the topic."""
        ...

    def close(self) -> None:
        """Close the consumer."""
        ...
```
```python
class Message(Protocol):
    """Protocol for a received message."""

    def value(self) -> Any:
        """Get the deserialized message (dataclass instance)."""
        ...

    def properties(self) -> dict:
        """Get message properties/metadata."""
        ...
```
### Existing Classes Refactoring
The existing `Consumer`, `Producer`, `Publisher`, `Subscriber` classes remain largely intact:
**Current responsibilities (keep):**
- Async threading model and taskgroups
- Reconnection logic and retry handling
- Metrics collection
- Rate limiting
- Concurrency management
**Changes needed:**
- Remove direct Pulsar imports (`pulsar.schema`, `pulsar.InitialPosition`, etc.)
- Accept `BackendProducer`/`BackendConsumer` instead of Pulsar client
- Delegate actual pub/sub operations to backend instances
- Map generic concepts to backend calls
**Example refactoring:**
```python
# OLD - consumer.py
class Consumer:
    def __init__(self, client, topic, subscriber, schema, ...):
        self.client = client  # Direct Pulsar client
        # ...

    async def consumer_run(self):
        # Uses pulsar.InitialPosition, pulsar.ConsumerType
        self.consumer = self.client.subscribe(
            topic=self.topic,
            schema=JsonSchema(self.schema),
            initial_position=pulsar.InitialPosition.Earliest,
            consumer_type=pulsar.ConsumerType.Shared,
        )

# NEW - consumer.py
class Consumer:
    def __init__(self, backend_consumer, schema, ...):
        self.backend_consumer = backend_consumer  # Backend-specific consumer
        self.schema = schema
        # ...

    async def consumer_run(self):
        # Backend consumer already created with right settings
        # Just use it directly
        while self.running:
            msg = await asyncio.to_thread(
                self.backend_consumer.receive,
                timeout_millis=2000
            )
            await self.handle_message(msg)
```
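The NEW `consumer_run` leaves out what happens on idle timeouts and handler failures. A minimal sketch of a complete loop follows, assuming the `receive()` contract defined earlier (raises `TimeoutError` when idle) and that a message is acknowledged only after its handler succeeds. The `run_consumer` function and its `handler`/`stop` parameters are hypothetical stand-ins for the Consumer class's own attributes.

```python
import asyncio


async def run_consumer(backend_consumer, handler, stop: asyncio.Event,
                       timeout_millis: int = 2000) -> None:
    """Receive, dispatch and acknowledge messages until `stop` is set."""
    while not stop.is_set():
        try:
            # receive() blocks, so run it in a worker thread to keep the event loop free
            msg = await asyncio.to_thread(
                backend_consumer.receive, timeout_millis=timeout_millis
            )
        except TimeoutError:
            continue  # idle: re-check the stop flag and poll again
        try:
            await handler(msg.value())
        except Exception:
            backend_consumer.negative_acknowledge(msg)  # ask for redelivery
        else:
            backend_consumer.acknowledge(msg)
```

Catching `Exception` rather than `BaseException` lets task cancellation (`asyncio.CancelledError`) propagate instead of being negatively acknowledged.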
### Backend-Specific Behaviors
**Pulsar Backend:**
- Maps `q0` → `non-persistent://`, `q1`/`q2` → `persistent://`
- Supports all consumer types (shared, exclusive, failover)
- Supports initial position (earliest/latest)
- Native message acknowledgment
- Schema registry support
**MQTT Backend:**
- Maps `q0`/`q1`/`q2` → MQTT QoS levels 0/1/2
- Includes tenant/namespace in topic path for namespacing
- Auto-generates client IDs from subscription names
- Ignores initial position (no message history in basic MQTT)
- Ignores consumer type (MQTT uses client IDs, not consumer groups)
- Simple publish/subscribe model
### Design Decisions Summary
1. ✅ **Generic queue naming**: `qos/tenant/namespace/queue-name` format
2. ✅ **QoS in queue ID**: Determined by queue definition, not configuration
3. ✅ **Reconnection**: Handled by Consumer/Producer classes, not backends
4. ✅ **MQTT topics**: Include tenant/namespace for proper namespacing
5. ✅ **Message history**: MQTT ignores `initial_position` parameter (future enhancement)
6. ✅ **Client IDs**: MQTT backend auto-generates from subscription name
### Future Enhancements
**MQTT message history:**
- Could add optional persistence layer (e.g., retained messages, external store)
- Would allow supporting `initial_position='earliest'`
- Not required for initial implementation


@@ -159,12 +159,12 @@ class AsyncFlowInstance:
         result = await self.request("text-completion", request_data)
         return result.get("response", "")

-    async def graph_rag(self, question: str, user: str, collection: str,
+    async def graph_rag(self, query: str, user: str, collection: str,
                         max_subgraph_size: int = 1000, max_subgraph_count: int = 5,
                         max_entity_distance: int = 3, **kwargs: Any) -> str:
         """Graph RAG (non-streaming, use async_socket for streaming)"""
         request_data = {
-            "question": question,
+            "query": query,
             "user": user,
             "collection": collection,
             "max-subgraph-size": max_subgraph_size,
@@ -177,11 +177,11 @@ class AsyncFlowInstance:
         result = await self.request("graph-rag", request_data)
         return result.get("response", "")

-    async def document_rag(self, question: str, user: str, collection: str,
+    async def document_rag(self, query: str, user: str, collection: str,
                            doc_limit: int = 10, **kwargs: Any) -> str:
         """Document RAG (non-streaming, use async_socket for streaming)"""
         request_data = {
-            "question": question,
+            "query": query,
             "user": user,
             "collection": collection,
             "doc-limit": doc_limit,

@ -208,12 +208,12 @@ class AsyncSocketFlowInstance:
if hasattr(chunk, 'content'): if hasattr(chunk, 'content'):
yield chunk.content yield chunk.content
async def graph_rag(self, question: str, user: str, collection: str, async def graph_rag(self, query: str, user: str, collection: str,
max_subgraph_size: int = 1000, max_subgraph_count: int = 5, max_subgraph_size: int = 1000, max_subgraph_count: int = 5,
max_entity_distance: int = 3, streaming: bool = False, **kwargs): max_entity_distance: int = 3, streaming: bool = False, **kwargs):
"""Graph RAG with optional streaming""" """Graph RAG with optional streaming"""
request = { request = {
"question": question, "query": query,
"user": user, "user": user,
"collection": collection, "collection": collection,
"max-subgraph-size": max_subgraph_size, "max-subgraph-size": max_subgraph_size,
@ -235,11 +235,11 @@ class AsyncSocketFlowInstance:
if hasattr(chunk, 'content'): if hasattr(chunk, 'content'):
yield chunk.content yield chunk.content
async def document_rag(self, question: str, user: str, collection: str, async def document_rag(self, query: str, user: str, collection: str,
doc_limit: int = 10, streaming: bool = False, **kwargs): doc_limit: int = 10, streaming: bool = False, **kwargs):
"""Document RAG with optional streaming""" """Document RAG with optional streaming"""
request = { request = {
"question": question, "query": query,
"user": user, "user": user,
"collection": collection, "collection": collection,
"doc-limit": doc_limit, "doc-limit": doc_limit,

@ -160,14 +160,14 @@ class FlowInstance:
)["answer"] )["answer"]
def graph_rag( def graph_rag(
self, question, user="trustgraph", collection="default", self, query, user="trustgraph", collection="default",
entity_limit=50, triple_limit=30, max_subgraph_size=150, entity_limit=50, triple_limit=30, max_subgraph_size=150,
max_path_length=2, max_path_length=2,
): ):
# The input consists of a question # The input consists of a question
input = { input = {
"query": question, "query": query,
"user": user, "user": user,
"collection": collection, "collection": collection,
"entity-limit": entity_limit, "entity-limit": entity_limit,
@ -182,13 +182,13 @@ class FlowInstance:
)["response"] )["response"]
def document_rag( def document_rag(
self, question, user="trustgraph", collection="default", self, query, user="trustgraph", collection="default",
doc_limit=10, doc_limit=10,
): ):
# The input consists of a question # The input consists of a question
input = { input = {
"query": question, "query": query,
"user": user, "user": user,
"collection": collection, "collection": collection,
"doc-limit": doc_limit, "doc-limit": doc_limit,

@ -284,7 +284,7 @@ class SocketFlowInstance:
def graph_rag( def graph_rag(
self, self,
question: str, query: str,
user: str, user: str,
collection: str, collection: str,
max_subgraph_size: int = 1000, max_subgraph_size: int = 1000,
@ -295,7 +295,7 @@ class SocketFlowInstance:
) -> Union[str, Iterator[str]]: ) -> Union[str, Iterator[str]]:
"""Graph RAG with optional streaming""" """Graph RAG with optional streaming"""
request = { request = {
"question": question, "query": query,
"user": user, "user": user,
"collection": collection, "collection": collection,
"max-subgraph-size": max_subgraph_size, "max-subgraph-size": max_subgraph_size,
@ -316,7 +316,7 @@ class SocketFlowInstance:
def document_rag( def document_rag(
self, self,
question: str, query: str,
user: str, user: str,
collection: str, collection: str,
doc_limit: int = 10, doc_limit: int = 10,
@ -325,7 +325,7 @@ class SocketFlowInstance:
) -> Union[str, Iterator[str]]: ) -> Union[str, Iterator[str]]:
"""Document RAG with optional streaming""" """Document RAG with optional streaming"""
request = { request = {
"question": question, "query": query,
"user": user, "user": user,
"collection": collection, "collection": collection,
"doc-limit": doc_limit, "doc-limit": doc_limit,

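The rename from `question` to `query` changes the Python keyword argument in all four client classes. For `AsyncFlowInstance`, `AsyncSocketFlowInstance` and `SocketFlowInstance` it also changes the wire key from `"question"` to `"query"`; `FlowInstance` already sent `"query"`. Positional callers are unaffected, but keyword callers still passing `question=` now get a `TypeError`. If a deprecation period were wanted, a shim like the sketch below could map the old keyword. `accept_legacy_question` and `build_document_rag_request` are hypothetical names; the payload keys and defaults are copied from the diff.

```python
import functools
import warnings


def accept_legacy_question(func):
    """Hypothetical shim: accept the old question= keyword and forward it as query=."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if "question" in kwargs:
            if "query" in kwargs:
                raise TypeError("pass query= or question=, not both")
            warnings.warn(
                "'question' is deprecated, use 'query'",
                DeprecationWarning,
                stacklevel=2,
            )
            kwargs["query"] = kwargs.pop("question")
        return func(*args, **kwargs)
    return wrapper


@accept_legacy_question
def build_document_rag_request(query, user="trustgraph", collection="default", doc_limit=10):
    """Build the request body that the document_rag methods send after this change."""
    return {
        "query": query,
        "user": user,
        "collection": collection,
        "doc-limit": doc_limit,
    }
```

The wrapper only returns whatever the wrapped function produces. The same decorator would therefore also work on the `async def` methods, whose calls return coroutines or async generators.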
@ -15,7 +15,7 @@ from prometheus_client import start_http_server, Info
from .. schema import ConfigPush, config_push_queue from .. schema import ConfigPush, config_push_queue
from .. log_level import LogLevel from .. log_level import LogLevel
from . pubsub import PulsarClient from . pubsub import PulsarClient, get_pubsub
from . producer import Producer from . producer import Producer
from . consumer import Consumer from . consumer import Consumer
from . metrics import ProcessorMetrics, ConsumerMetrics from . metrics import ProcessorMetrics, ConsumerMetrics
@ -34,8 +34,11 @@ class AsyncProcessor:
# Store the identity # Store the identity
self.id = params.get("id") self.id = params.get("id")
# Register a pulsar client # Create pub/sub backend via factory
self.pulsar_client_object = PulsarClient(**params) self.pubsub_backend = get_pubsub(**params)
# Store pulsar_host for backward compatibility
self._pulsar_host = params.get("pulsar_host", "pulsar://pulsar:6650")
# Initialise metrics, records the parameters # Initialise metrics, records the parameters
ProcessorMetrics(processor = self.id).info({ ProcessorMetrics(processor = self.id).info({
@ -70,7 +73,7 @@ class AsyncProcessor:
self.config_sub_task = Consumer( self.config_sub_task = Consumer(
taskgroup = self.taskgroup, taskgroup = self.taskgroup,
client = self.pulsar_client, backend = self.pubsub_backend, # Changed from client to backend
subscriber = config_subscriber_id, subscriber = config_subscriber_id,
flow = None, flow = None,
@ -96,16 +99,16 @@ class AsyncProcessor:
# This is called to stop all threads. An over-ride point for extra # This is called to stop all threads. An over-ride point for extra
# functionality # functionality
def stop(self): def stop(self):
self.pulsar_client.close() self.pubsub_backend.close()
self.running = False self.running = False
# Returns the pulsar host # Returns the pub/sub backend (new interface)
@property @property
def pulsar_host(self): return self.pulsar_client_object.pulsar_host def pubsub(self): return self.pubsub_backend
# Returns the pulsar client # Returns the pulsar host (backward compatibility)
@property @property
def pulsar_client(self): return self.pulsar_client_object.client def pulsar_host(self): return self._pulsar_host
# Register a new event handler for configuration change # Register a new event handler for configuration change
def register_config_handler(self, handler): def register_config_handler(self, handler):
@ -247,6 +250,14 @@ class AsyncProcessor:
@staticmethod @staticmethod
def add_args(parser): def add_args(parser):
# Pub/sub backend selection
parser.add_argument(
'--pubsub-backend',
default=os.getenv('PUBSUB_BACKEND', 'pulsar'),
choices=['pulsar', 'mqtt'],
help='Pub/sub backend (default: pulsar, env: PUBSUB_BACKEND)',
)
PulsarClient.add_args(parser) PulsarClient.add_args(parser)
add_logging_args(parser) add_logging_args(parser)
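
argparse turns `--pubsub-backend` into the attribute `pubsub_backend`. That is the key `get_pubsub` reads, assuming `params` is built from the parsed namespace, as that key name suggests. Two argparse details matter here:

- The environment variable is read when the parser is built, not when it parses.
- `choices` is checked only for values given on the command line, not for defaults. An unsupported `PUBSUB_BACKEND` value therefore gets through parsing and is only rejected by `get_pubsub`'s `ValueError`.

The sketch below repeats the same option. The `environ` parameter is a hypothetical hook added so the behaviour can be tested.

```python
import argparse
import os
from collections.abc import Mapping


def add_pubsub_args(parser: argparse.ArgumentParser,
                    environ: Mapping[str, str] = os.environ) -> None:
    """Same --pubsub-backend option as AsyncProcessor.add_args (environ is a test hook)."""
    parser.add_argument(
        '--pubsub-backend',
        default=environ.get('PUBSUB_BACKEND', 'pulsar'),
        choices=['pulsar', 'mqtt'],
        help='Pub/sub backend (default: pulsar, env: PUBSUB_BACKEND)',
    )


parser = argparse.ArgumentParser()
add_pubsub_args(parser, environ={'PUBSUB_BACKEND': 'mqtt'})
params = vars(parser.parse_args([]))  # {'pubsub_backend': 'mqtt'}
```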

@ -0,0 +1,148 @@
"""
Backend abstraction interfaces for pub/sub systems.
This module defines Protocol classes that all pub/sub backends must implement,
allowing TrustGraph to work with different messaging systems (Pulsar, MQTT, Kafka, etc.)
"""

from typing import Protocol, Any, runtime_checkable


@runtime_checkable
class Message(Protocol):
    """Protocol for a received message."""

    def value(self) -> Any:
        """
        Get the deserialized message content.
        Returns:
            Dataclass instance representing the message
        """
        ...

    def properties(self) -> dict:
        """
        Get message properties/metadata.
        Returns:
            Dictionary of message properties
        """
        ...


@runtime_checkable
class BackendProducer(Protocol):
    """Protocol for backend-specific producer."""

    def send(self, message: Any, properties: dict = {}) -> None:
        """
        Send a message (dataclass instance) with optional properties.
        Args:
            message: Dataclass instance to send
            properties: Optional metadata properties
        """
        ...

    def flush(self) -> None:
        """Flush any buffered messages."""
        ...

    def close(self) -> None:
        """Close the producer."""
        ...


@runtime_checkable
class BackendConsumer(Protocol):
    """Protocol for backend-specific consumer."""

    def receive(self, timeout_millis: int = 2000) -> Message:
        """
        Receive a message from the topic.
        Args:
            timeout_millis: Timeout in milliseconds
        Returns:
            Message object
        Raises:
            TimeoutError: If no message received within timeout
        """
        ...

    def acknowledge(self, message: Message) -> None:
        """
        Acknowledge successful processing of a message.
        Args:
            message: The message to acknowledge
        """
        ...

    def negative_acknowledge(self, message: Message) -> None:
        """
        Negative acknowledge - triggers redelivery.
        Args:
            message: The message to negatively acknowledge
        """
        ...

    def unsubscribe(self) -> None:
        """Unsubscribe from the topic."""
        ...

    def close(self) -> None:
        """Close the consumer."""
        ...


@runtime_checkable
class PubSubBackend(Protocol):
    """Protocol defining the interface all pub/sub backends must implement."""

    def create_producer(self, topic: str, schema: type, **options) -> BackendProducer:
        """
        Create a producer for a topic.
        Args:
            topic: Generic topic format (qos/tenant/namespace/queue)
            schema: Dataclass type for messages
            **options: Backend-specific options (e.g., chunking_enabled)
        Returns:
            Backend-specific producer instance
        """
        ...

    def create_consumer(
        self,
        topic: str,
        subscription: str,
        schema: type,
        initial_position: str = 'latest',
        consumer_type: str = 'shared',
        **options
    ) -> BackendConsumer:
        """
        Create a consumer for a topic.
        Args:
            topic: Generic topic format (qos/tenant/namespace/queue)
            subscription: Subscription/consumer group name
            schema: Dataclass type for messages
            initial_position: 'earliest' or 'latest' (some backends may ignore)
            consumer_type: 'shared', 'exclusive', 'failover' (some backends may ignore)
            **options: Backend-specific options
        Returns:
            Backend-specific consumer instance
        """
        ...

    def close(self) -> None:
        """Close the backend connection."""
        ...

@ -9,9 +9,6 @@
# one handler, and a single thread of concurrency, nothing too outrageous # one handler, and a single thread of concurrency, nothing too outrageous
# will happen if synchronous / blocking code is used # will happen if synchronous / blocking code is used
from pulsar.schema import JsonSchema
import pulsar
import _pulsar
import asyncio import asyncio
import time import time
import logging import logging
@ -21,11 +18,15 @@ from .. exceptions import TooManyRequests
# Module logger # Module logger
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
# Timeout exception - can come from different backends
class TimeoutError(Exception):
    pass
class Consumer: class Consumer:
def __init__( def __init__(
self, taskgroup, flow, client, topic, subscriber, schema, self, taskgroup, flow, backend, topic, subscriber, schema,
handler, handler,
metrics = None, metrics = None,
start_of_messages=False, start_of_messages=False,
rate_limit_retry_time = 10, rate_limit_timeout = 7200, rate_limit_retry_time = 10, rate_limit_timeout = 7200,
@ -35,7 +36,7 @@ class Consumer:
self.taskgroup = taskgroup self.taskgroup = taskgroup
self.flow = flow self.flow = flow
self.client = client self.backend = backend # Changed from 'client' to 'backend'
self.topic = topic self.topic = topic
self.subscriber = subscriber self.subscriber = subscriber
self.schema = schema self.schema = schema
@ -96,18 +97,20 @@ class Consumer:
logger.info(f"Subscribing to topic: {self.topic}") logger.info(f"Subscribing to topic: {self.topic}")
# Determine initial position
if self.start_of_messages: if self.start_of_messages:
pos = pulsar.InitialPosition.Earliest initial_pos = 'earliest'
else: else:
pos = pulsar.InitialPosition.Latest initial_pos = 'latest'
# Create consumer via backend
self.consumer = await asyncio.to_thread( self.consumer = await asyncio.to_thread(
self.client.subscribe, self.backend.create_consumer,
topic = self.topic, topic = self.topic,
subscription_name = self.subscriber, subscription = self.subscriber,
schema = JsonSchema(self.schema), schema = self.schema,
initial_position = pos, initial_position = initial_pos,
consumer_type = pulsar.ConsumerType.Shared, consumer_type = 'shared',
) )
except Exception as e: except Exception as e:
@ -159,9 +162,10 @@ class Consumer:
self.consumer.receive, self.consumer.receive,
timeout_millis=2000 timeout_millis=2000
) )
except _pulsar.Timeout:
continue
except Exception as e: except Exception as e:
# Handle timeout from any backend
if 'timeout' in str(type(e)).lower() or 'timeout' in str(e).lower():
continue
raise e raise e
await self.handle_one_from_queue(msg) await self.handle_one_from_queue(msg)
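
The new timeout check treats any exception whose class name or message contains "timeout" as an empty poll. With the Pulsar backend this works, because the exception is `_pulsar.Timeout`. It would also quietly keep polling on, say, a connection failure whose message mentions a timeout. The module-level `class TimeoutError(Exception)` added in both `consumer.py` and `subscriber.py` also shadows the builtin, and nothing in the hunks shown raises it.

One alternative is to have each backend translate its own timeout type into a single exception. In the sketch below:

- `ReceiveTimeout`, `normalise_timeouts` and `poll` are hypothetical.
- For Pulsar, `backend_timeout_types` would presumably be `(_pulsar.Timeout,)`, the exception the pre-change code caught.

```python
import asyncio


class ReceiveTimeout(TimeoutError):
    """Hypothetical backend-neutral timeout; subclasses the builtin TimeoutError."""


def normalise_timeouts(receive, backend_timeout_types):
    """Wrap a backend receive() so only its own timeout types become ReceiveTimeout."""
    def wrapped(*args, **kwargs):
        try:
            return receive(*args, **kwargs)
        except backend_timeout_types as e:
            raise ReceiveTimeout(str(e)) from e
    return wrapped


async def poll(receive, max_polls: int) -> list:
    """Simplified receive loop: skip timeouts, let every other error propagate."""
    received = []
    for _ in range(max_polls):
        try:
            msg = await asyncio.to_thread(receive, timeout_millis=10)
        except TimeoutError:  # the builtin, so ReceiveTimeout is caught as well
            continue
        received.append(msg)
    return received
```

A non-timeout error whose text happens to contain "timeout" now propagates instead of being retried.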

@ -19,7 +19,7 @@ class ConsumerSpec(Spec):
consumer = Consumer( consumer = Consumer(
taskgroup = processor.taskgroup, taskgroup = processor.taskgroup,
flow = flow, flow = flow,
client = processor.pulsar_client, backend = processor.pubsub,
topic = definition[self.name], topic = definition[self.name],
subscriber = processor.id + "--" + flow.name + "--" + self.name, subscriber = processor.id + "--" + flow.name + "--" + self.name,
schema = self.schema, schema = self.schema,

@ -1,5 +1,4 @@
from pulsar.schema import JsonSchema
import asyncio import asyncio
import logging import logging
@ -8,10 +7,10 @@ logger = logging.getLogger(__name__)
class Producer: class Producer:
def __init__(self, client, topic, schema, metrics=None, def __init__(self, backend, topic, schema, metrics=None,
chunking_enabled=True): chunking_enabled=True):
self.client = client self.backend = backend # Changed from 'client' to 'backend'
self.topic = topic self.topic = topic
self.schema = schema self.schema = schema
@ -44,9 +43,9 @@ class Producer:
try: try:
logger.info(f"Connecting publisher to {self.topic}...") logger.info(f"Connecting publisher to {self.topic}...")
self.producer = self.client.create_producer( self.producer = self.backend.create_producer(
topic = self.topic, topic = self.topic,
schema = JsonSchema(self.schema), schema = self.schema,
chunking_enabled = self.chunking_enabled, chunking_enabled = self.chunking_enabled,
) )
logger.info(f"Connected publisher to {self.topic}") logger.info(f"Connected publisher to {self.topic}")

@ -15,7 +15,7 @@ class ProducerSpec(Spec):
) )
producer = Producer( producer = Producer(
client = processor.pulsar_client, backend = processor.pubsub,
topic = definition[self.name], topic = definition[self.name],
schema = self.schema, schema = self.schema,
metrics = producer_metrics, metrics = producer_metrics,

@ -37,21 +37,20 @@ class PromptClient(RequestResponse):
else: else:
logger.info("DEBUG prompt_client: Streaming path") logger.info("DEBUG prompt_client: Streaming path")
# Streaming path - collect all chunks # Streaming path - just forward chunks, don't accumulate
full_text = "" last_text = ""
full_object = None last_object = None
async def collect_chunks(resp): async def forward_chunks(resp):
nonlocal full_text, full_object nonlocal last_text, last_object
logger.info(f"DEBUG prompt_client: collect_chunks called, resp.text={resp.text[:50] if resp.text else None}, end_of_stream={getattr(resp, 'end_of_stream', False)}") logger.info(f"DEBUG prompt_client: forward_chunks called, resp.text={resp.text[:50] if resp.text else None}, end_of_stream={getattr(resp, 'end_of_stream', False)}")
if resp.error: if resp.error:
logger.error(f"DEBUG prompt_client: Error in response: {resp.error.message}") logger.error(f"DEBUG prompt_client: Error in response: {resp.error.message}")
raise RuntimeError(resp.error.message) raise RuntimeError(resp.error.message)
if resp.text: if resp.text:
full_text += resp.text last_text = resp.text
logger.info(f"DEBUG prompt_client: Accumulated {len(full_text)} chars")
# Call chunk callback if provided # Call chunk callback if provided
if chunk_callback: if chunk_callback:
logger.info(f"DEBUG prompt_client: Calling chunk_callback") logger.info(f"DEBUG prompt_client: Calling chunk_callback")
@ -61,7 +60,7 @@ class PromptClient(RequestResponse):
chunk_callback(resp.text) chunk_callback(resp.text)
elif resp.object: elif resp.object:
logger.info(f"DEBUG prompt_client: Got object response") logger.info(f"DEBUG prompt_client: Got object response")
full_object = resp.object last_object = resp.object
end_stream = getattr(resp, 'end_of_stream', False) end_stream = getattr(resp, 'end_of_stream', False)
logger.info(f"DEBUG prompt_client: Returning end_of_stream={end_stream}") logger.info(f"DEBUG prompt_client: Returning end_of_stream={end_stream}")
@ -79,17 +78,17 @@ class PromptClient(RequestResponse):
logger.info(f"DEBUG prompt_client: About to call self.request with recipient, timeout={timeout}") logger.info(f"DEBUG prompt_client: About to call self.request with recipient, timeout={timeout}")
await self.request( await self.request(
req, req,
recipient=collect_chunks, recipient=forward_chunks,
timeout=timeout timeout=timeout
) )
logger.info(f"DEBUG prompt_client: self.request returned, full_text has {len(full_text)} chars") logger.info(f"DEBUG prompt_client: self.request returned, last_text={last_text[:50] if last_text else None}")
if full_text: if last_text:
logger.info("DEBUG prompt_client: Returning full_text") logger.info("DEBUG prompt_client: Returning last_text")
return full_text return last_text
logger.info("DEBUG prompt_client: Returning parsed full_object") logger.info("DEBUG prompt_client: Returning parsed last_object")
return json.loads(full_object) return json.loads(last_object) if last_object else None
async def extract_definitions(self, text, timeout=600): async def extract_definitions(self, text, timeout=600):
return await self.prompt( return await self.prompt(
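
After this change the streaming path keeps only the most recent chunk, because `last_text = resp.text` replaces `full_text += resp.text`. Unless the final streamed message carries the complete text (this diff does not show whether it does), `prompt()` now returns the last chunk rather than the whole response. Callers that need the full text have to rebuild it from `chunk_callback`. The model below reproduces that control flow in isolation. `Resp` and `fake_request` are stand-ins for the response schema and `RequestResponse.request`, not TrustGraph classes.

```python
import asyncio
import json


class Resp:
    """Stand-in for a streamed prompt response."""
    def __init__(self, text=None, obj=None, end=False):
        self.text = text
        self.object = obj
        self.end_of_stream = end
        self.error = None


async def fake_request(responses, recipient):
    """Stand-in for RequestResponse.request: feed responses until recipient returns True."""
    for resp in responses:
        if await recipient(resp):
            break


async def stream_prompt(responses, chunk_callback=None):
    """Mirror of the post-change streaming path: forward chunks, keep only the last."""
    last_text = ""
    last_object = None

    async def forward_chunks(resp):
        nonlocal last_text, last_object
        if resp.error:
            raise RuntimeError(resp.error)
        if resp.text:
            last_text = resp.text
            if chunk_callback:
                chunk_callback(resp.text)
        elif resp.object:
            last_object = resp.object
        return getattr(resp, 'end_of_stream', False)

    await fake_request(responses, forward_chunks)
    if last_text:
        return last_text
    return json.loads(last_object) if last_object else None


pieces = []
result = asyncio.run(
    stream_prompt([Resp("Hel"), Resp("lo, "), Resp("world", end=True)], pieces.append)
)
# result is only the final chunk; the callback saw every chunk
```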

@ -1,9 +1,6 @@
from pulsar.schema import JsonSchema
import asyncio import asyncio
import time import time
import pulsar
import logging import logging
# Module logger # Module logger
@ -11,9 +8,9 @@ logger = logging.getLogger(__name__)
class Publisher: class Publisher:
def __init__(self, client, topic, schema=None, max_size=10, def __init__(self, backend, topic, schema=None, max_size=10,
chunking_enabled=True, drain_timeout=5.0): chunking_enabled=True, drain_timeout=5.0):
self.client = client self.backend = backend # Changed from 'client' to 'backend'
self.topic = topic self.topic = topic
self.schema = schema self.schema = schema
self.q = asyncio.Queue(maxsize=max_size) self.q = asyncio.Queue(maxsize=max_size)
@ -47,9 +44,9 @@ class Publisher:
try: try:
producer = self.client.create_producer( producer = self.backend.create_producer(
topic=self.topic, topic=self.topic,
schema=JsonSchema(self.schema), schema=self.schema,
chunking_enabled=self.chunking_enabled, chunking_enabled=self.chunking_enabled,
) )

@ -4,8 +4,45 @@ import pulsar
import _pulsar import _pulsar
import uuid import uuid
from pulsar.schema import JsonSchema from pulsar.schema import JsonSchema
import logging
from .. log_level import LogLevel from .. log_level import LogLevel
from .pulsar_backend import PulsarBackend
logger = logging.getLogger(__name__)
def get_pubsub(**config):
    """
    Factory function to create a pub/sub backend based on configuration.
    Args:
        config: Configuration dictionary from command-line args
            Must include 'pubsub_backend' key
    Returns:
        Backend instance (PulsarBackend, MQTTBackend, etc.)
    Example:
        backend = get_pubsub(
            pubsub_backend='pulsar',
            pulsar_host='pulsar://localhost:6650'
        )
    """
    backend_type = config.get('pubsub_backend', 'pulsar')
    if backend_type == 'pulsar':
        return PulsarBackend(
            host=config.get('pulsar_host', PulsarClient.default_pulsar_host),
            api_key=config.get('pulsar_api_key', PulsarClient.default_pulsar_api_key),
            listener=config.get('pulsar_listener'),
        )
    elif backend_type == 'mqtt':
        # TODO: Implement MQTT backend
        raise NotImplementedError("MQTT backend not yet implemented")
    else:
        raise ValueError(f"Unknown pub/sub backend: {backend_type}")
class PulsarClient: class PulsarClient:
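
`get_pubsub` picks a backend with an `if`/`elif` chain, so adding MQTT (still a `NotImplementedError` stub here) means editing the factory itself. A name-to-factory registry is one common alternative for a plugin architecture. The sketch below is hypothetical: `register_backend`, `create_backend` and `StubBackend` are illustrative names. The default host is the `pulsar://pulsar:6650` value `AsyncProcessor` falls back to.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Maps a --pubsub-backend value to a callable that builds the backend from **config
_BACKENDS: dict[str, Callable[..., Any]] = {}


def register_backend(name: str):
    """Decorator that records a backend factory under a name."""
    def decorator(factory: Callable[..., Any]) -> Callable[..., Any]:
        _BACKENDS[name] = factory
        return factory
    return decorator


def create_backend(**config) -> Any:
    backend_type = config.get('pubsub_backend', 'pulsar')
    try:
        factory = _BACKENDS[backend_type]
    except KeyError:
        raise ValueError(f"Unknown pub/sub backend: {backend_type}") from None
    return factory(**config)


@dataclass
class StubBackend:
    """Placeholder standing in for PulsarBackend in this sketch."""
    name: str
    options: dict = field(default_factory=dict)


@register_backend('pulsar')
def _make_pulsar(**config) -> StubBackend:
    # The real branch would call PulsarBackend(host=..., api_key=..., listener=...)
    return StubBackend('pulsar', {
        'host': config.get('pulsar_host', 'pulsar://pulsar:6650'),
        'listener': config.get('pulsar_listener'),
    })
```

With a registry, a new backend module registers itself with `@register_backend('mqtt')` and `create_backend` stays unchanged. The trade-off is that the module must be imported before lookup for its registration to run.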

@ -0,0 +1,350 @@
"""
Pulsar backend implementation for pub/sub abstraction.
This module provides a Pulsar-specific implementation of the backend interfaces,
handling topic mapping, serialization, and Pulsar client management.
"""
import pulsar
import _pulsar
import json
import logging
import base64
import types
from dataclasses import asdict, is_dataclass
from typing import Any
from .backend import PubSubBackend, BackendProducer, BackendConsumer, Message
logger = logging.getLogger(__name__)
def dataclass_to_dict(obj: Any) -> dict:
    """
    Recursively convert a dataclass to a dictionary, handling None values and bytes.
    None values are excluded from the dictionary (not serialized).
    Bytes values are decoded as UTF-8 strings for JSON serialization (matching Pulsar behavior).
    """
    if obj is None:
        return None
    if is_dataclass(obj):
        result = {}
        for key, value in asdict(obj).items():
            if value is not None:
                if isinstance(value, bytes):
                    # Decode bytes as UTF-8 for JSON serialization (like Pulsar did)
                    result[key] = value.decode('utf-8')
                elif is_dataclass(value):
                    result[key] = dataclass_to_dict(value)
                elif isinstance(value, list):
                    result[key] = [
                        item.decode('utf-8') if isinstance(item, bytes)
                        else dataclass_to_dict(item) if is_dataclass(item)
                        else item
                        for item in value
                    ]
                elif isinstance(value, dict):
                    result[key] = {k: dataclass_to_dict(v) if is_dataclass(v) else v for k, v in value.items()}
                else:
                    result[key] = value
        return result
    return obj


def dict_to_dataclass(data: dict, cls: type) -> Any:
    """
    Convert a dictionary back to a dataclass instance.
    Handles nested dataclasses and missing fields.
    """
    if data is None:
        return None
    if not is_dataclass(cls):
        return data
    # Get field types from the dataclass
    field_types = {f.name: f.type for f in cls.__dataclass_fields__.values()}
    kwargs = {}
    for key, value in data.items():
        if key in field_types:
            field_type = field_types[key]
            # Handle modern union types (X | Y)
            if isinstance(field_type, types.UnionType):
                # Check if it's Optional (X | None)
                if type(None) in field_type.__args__:
                    # Get the non-None type
                    actual_type = next((t for t in field_type.__args__ if t is not type(None)), None)
                    if actual_type and is_dataclass(actual_type) and isinstance(value, dict):
                        kwargs[key] = dict_to_dataclass(value, actual_type)
                    else:
                        kwargs[key] = value
                else:
                    kwargs[key] = value
            # Check if this is a generic type (list, dict, etc.)
            elif hasattr(field_type, '__origin__'):
                # Handle list[T]
                if field_type.__origin__ == list:
                    item_type = field_type.__args__[0] if field_type.__args__ else None
                    if item_type and is_dataclass(item_type) and isinstance(value, list):
                        kwargs[key] = [
                            dict_to_dataclass(item, item_type) if isinstance(item, dict) else item
                            for item in value
                        ]
                    else:
                        kwargs[key] = value
                # Handle old-style Optional[T] (which is Union[T, None])
                elif hasattr(field_type, '__args__') and type(None) in field_type.__args__:
                    # Get the non-None type from Union
                    actual_type = next((t for t in field_type.__args__ if t is not type(None)), None)
                    if actual_type and is_dataclass(actual_type) and isinstance(value, dict):
                        kwargs[key] = dict_to_dataclass(value, actual_type)
                    else:
                        kwargs[key] = value
                else:
                    kwargs[key] = value
            # Handle direct dataclass fields
            elif is_dataclass(field_type) and isinstance(value, dict):
                kwargs[key] = dict_to_dataclass(value, field_type)
            # Handle bytes fields (UTF-8 encoded strings from JSON)
            elif field_type == bytes and isinstance(value, str):
                kwargs[key] = value.encode('utf-8')
            else:
                kwargs[key] = value
    return cls(**kwargs)
class PulsarMessage:
    """Wrapper for Pulsar messages to match Message protocol."""

    def __init__(self, pulsar_msg, schema_cls):
        self._msg = pulsar_msg
        self._schema_cls = schema_cls
        self._value = None

    def value(self) -> Any:
        """Deserialize and return the message value as a dataclass."""
        if self._value is None:
            # Get JSON string from Pulsar message
            json_data = self._msg.data().decode('utf-8')
            data_dict = json.loads(json_data)
            # Convert to dataclass
            self._value = dict_to_dataclass(data_dict, self._schema_cls)
        return self._value

    def properties(self) -> dict:
        """Return message properties."""
        return self._msg.properties()


class PulsarBackendProducer:
    """Pulsar-specific producer implementation."""

    def __init__(self, pulsar_producer, schema_cls):
        self._producer = pulsar_producer
        self._schema_cls = schema_cls

    def send(self, message: Any, properties: dict = {}) -> None:
        """Send a dataclass message."""
        # Convert dataclass to dict, excluding None values
        data_dict = dataclass_to_dict(message)
        # Serialize to JSON
        json_data = json.dumps(data_dict)
        # Send via Pulsar
        self._producer.send(json_data.encode('utf-8'), properties=properties)

    def flush(self) -> None:
        """Flush buffered messages."""
        self._producer.flush()

    def close(self) -> None:
        """Close the producer."""
        self._producer.close()


class PulsarBackendConsumer:
    """Pulsar-specific consumer implementation."""

    def __init__(self, pulsar_consumer, schema_cls):
        self._consumer = pulsar_consumer
        self._schema_cls = schema_cls

    def receive(self, timeout_millis: int = 2000) -> Message:
        """Receive a message."""
        pulsar_msg = self._consumer.receive(timeout_millis=timeout_millis)
        return PulsarMessage(pulsar_msg, self._schema_cls)

    def acknowledge(self, message: Message) -> None:
        """Acknowledge a message."""
        if isinstance(message, PulsarMessage):
            self._consumer.acknowledge(message._msg)

    def negative_acknowledge(self, message: Message) -> None:
        """Negative acknowledge a message."""
        if isinstance(message, PulsarMessage):
            self._consumer.negative_acknowledge(message._msg)

    def unsubscribe(self) -> None:
        """Unsubscribe from the topic."""
        self._consumer.unsubscribe()

    def close(self) -> None:
        """Close the consumer."""
        self._consumer.close()
class PulsarBackend:
    """
    Pulsar backend implementation.
    Handles topic mapping, client management, and creation of Pulsar-specific
    producers and consumers.
    """

    def __init__(self, host: str, api_key: str = None, listener: str = None):
        """
        Initialize Pulsar backend.
        Args:
            host: Pulsar broker URL (e.g., pulsar://localhost:6650)
            api_key: Optional API key for authentication
            listener: Optional listener name for multi-homed setups
        """
        self.host = host
        self.api_key = api_key
        self.listener = listener
        # Create Pulsar client
        client_args = {'service_url': host}
        if listener:
            client_args['listener_name'] = listener
        if api_key:
            client_args['authentication'] = pulsar.AuthenticationToken(api_key)
        self.client = pulsar.Client(**client_args)
        logger.info(f"Pulsar client connected to {host}")

    def map_topic(self, generic_topic: str) -> str:
        """
        Map generic topic format to Pulsar URI.
        Format: qos/tenant/namespace/queue
        Example: q1/tg/flow/my-queue -> persistent://tg/flow/my-queue
        Args:
            generic_topic: Generic topic string or already-formatted Pulsar URI
        Returns:
            Pulsar topic URI
        """
        # If already a Pulsar URI, return as-is
        if '://' in generic_topic:
            return generic_topic
        parts = generic_topic.split('/', 3)
        if len(parts) != 4:
            raise ValueError(f"Invalid topic format: {generic_topic}, expected qos/tenant/namespace/queue")
        qos, tenant, namespace, queue = parts
        # Map QoS to persistence
        if qos == 'q0':
            persistence = 'non-persistent'
        elif qos in ['q1', 'q2']:
            persistence = 'persistent'
        else:
            raise ValueError(f"Invalid QoS level: {qos}, expected q0, q1, or q2")
        return f"{persistence}://{tenant}/{namespace}/{queue}"

    def create_producer(self, topic: str, schema: type, **options) -> BackendProducer:
        """
        Create a Pulsar producer.
        Args:
            topic: Generic topic format (qos/tenant/namespace/queue)
            schema: Dataclass type for messages
            **options: Backend-specific options (e.g., chunking_enabled)
        Returns:
            PulsarBackendProducer instance
        """
        pulsar_topic = self.map_topic(topic)
        producer_args = {
            'topic': pulsar_topic,
            'schema': pulsar.schema.BytesSchema(), # We handle serialization ourselves
        }
        # Add optional parameters
        if 'chunking_enabled' in options:
            producer_args['chunking_enabled'] = options['chunking_enabled']
        pulsar_producer = self.client.create_producer(**producer_args)
        logger.debug(f"Created producer for topic: {pulsar_topic}")
        return PulsarBackendProducer(pulsar_producer, schema)

    def create_consumer(
        self,
        topic: str,
        subscription: str,
        schema: type,
        initial_position: str = 'latest',
        consumer_type: str = 'shared',
        **options
    ) -> BackendConsumer:
        """
        Create a Pulsar consumer.
        Args:
            topic: Generic topic format (qos/tenant/namespace/queue)
            subscription: Subscription name
            schema: Dataclass type for messages
            initial_position: 'earliest' or 'latest'
            consumer_type: 'shared', 'exclusive', or 'failover'
            **options: Backend-specific options
        Returns:
            PulsarBackendConsumer instance
        """
        pulsar_topic = self.map_topic(topic)
        # Map initial position
        if initial_position == 'earliest':
            pos = pulsar.InitialPosition.Earliest
        else:
            pos = pulsar.InitialPosition.Latest
        # Map consumer type
        if consumer_type == 'exclusive':
            ctype = pulsar.ConsumerType.Exclusive
        elif consumer_type == 'failover':
            ctype = pulsar.ConsumerType.Failover
        else:
            ctype = pulsar.ConsumerType.Shared
        consumer_args = {
            'topic': pulsar_topic,
            'subscription_name': subscription,
            'schema': pulsar.schema.BytesSchema(), # We handle deserialization ourselves
            'initial_position': pos,
            'consumer_type': ctype,
        }
        pulsar_consumer = self.client.subscribe(**consumer_args)
        logger.debug(f"Created consumer for topic: {pulsar_topic}, subscription: {subscription}")
        return PulsarBackendConsumer(pulsar_consumer, schema)

    def close(self) -> None:
        """Close the Pulsar client."""
        self.client.close()
        logger.info("Pulsar client closed")
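
`map_topic` collapses `q1` and `q2` onto Pulsar's `persistent` topics. The `q0`/`q1`/`q2` prefixes also correspond naturally to MQTT's QoS levels 0, 1 and 2. The sketch below shows one way a future MQTT backend (still a `NotImplementedError` stub in `get_pubsub`) might map the same topic strings. It rests on two assumptions: that the QoS prefixes map one-to-one, and that `consumer_type='shared'` uses MQTT 5 shared subscriptions (`$share/<group>/<topic>`). `map_topic_mqtt`, `shared_filter` and `MqttTopic` are hypothetical names.

```python
from typing import NamedTuple

QOS_LEVELS = {'q0': 0, 'q1': 1, 'q2': 2}


class MqttTopic(NamedTuple):
    topic: str
    qos: int


def map_topic_mqtt(generic_topic: str) -> MqttTopic:
    """Map qos/tenant/namespace/queue onto an MQTT topic name plus QoS level."""
    parts = generic_topic.split('/', 3)
    if len(parts) != 4:
        raise ValueError(f"Invalid topic format: {generic_topic}, expected qos/tenant/namespace/queue")
    qos, tenant, namespace, queue = parts
    if qos not in QOS_LEVELS:
        raise ValueError(f"Invalid QoS level: {qos}, expected q0, q1, or q2")
    return MqttTopic(f"{tenant}/{namespace}/{queue}", QOS_LEVELS[qos])


def shared_filter(topic: str, subscription: str, consumer_type: str = 'shared') -> str:
    """Build the subscription filter; 'shared' uses the MQTT 5 $share/<group>/<topic> form."""
    if consumer_type != 'shared':
        return topic
    # MQTT 5 share names must not contain '/', '+' or '#'
    if any(ch in subscription for ch in '/+#'):
        raise ValueError(f"Subscription name not usable as an MQTT share name: {subscription}")
    return f"$share/{subscription}/{topic}"
```

TrustGraph subscription names such as `processor.id + "--" + flow.name + "--" + self.name` contain only `--` separators. They would be valid share names unless an id itself contains one of the reserved characters.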

@ -14,7 +14,7 @@ logger = logging.getLogger(__name__)
class RequestResponse(Subscriber): class RequestResponse(Subscriber):
def __init__( def __init__(
self, client, subscription, consumer_name, self, backend, subscription, consumer_name,
request_topic, request_schema, request_topic, request_schema,
request_metrics, request_metrics,
response_topic, response_schema, response_topic, response_schema,
@ -22,7 +22,7 @@ class RequestResponse(Subscriber):
): ):
super(RequestResponse, self).__init__( super(RequestResponse, self).__init__(
client = client, backend = backend,
subscription = subscription, subscription = subscription,
consumer_name = consumer_name, consumer_name = consumer_name,
topic = response_topic, topic = response_topic,
@ -31,7 +31,7 @@ class RequestResponse(Subscriber):
) )
self.producer = Producer( self.producer = Producer(
client = client, backend = backend,
topic = request_topic, topic = request_topic,
schema = request_schema, schema = request_schema,
metrics = request_metrics, metrics = request_metrics,
@ -126,7 +126,7 @@ class RequestResponseSpec(Spec):
) )
rr = self.impl( rr = self.impl(
client = processor.pulsar_client, backend = processor.pubsub,
# Make subscription names unique, so that all subscribers get # Make subscription names unique, so that all subscribers get
# to see all response messages # to see all response messages

@ -3,9 +3,7 @@
# off of a queue and make it available using an internal broker system, # off of a queue and make it available using an internal broker system,
# so suitable for when multiple recipients are reading from the same queue # so suitable for when multiple recipients are reading from the same queue
from pulsar.schema import JsonSchema
import asyncio import asyncio
import _pulsar
import time import time
import logging import logging
import uuid import uuid
@ -13,12 +11,16 @@ import uuid
# Module logger # Module logger
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
# Timeout exception - can come from different backends
class TimeoutError(Exception):
    pass
class Subscriber: class Subscriber:
def __init__(self, client, topic, subscription, consumer_name, def __init__(self, backend, topic, subscription, consumer_name,
schema=None, max_size=100, metrics=None, schema=None, max_size=100, metrics=None,
backpressure_strategy="block", drain_timeout=5.0): backpressure_strategy="block", drain_timeout=5.0):
self.client = client self.backend = backend # Changed from 'client' to 'backend'
self.topic = topic self.topic = topic
self.subscription = subscription self.subscription = subscription
self.consumer_name = consumer_name self.consumer_name = consumer_name
@ -43,18 +45,14 @@ class Subscriber:
async def start(self): async def start(self):
# Build subscribe arguments # Create consumer via backend
subscribe_args = { self.consumer = await asyncio.to_thread(
'topic': self.topic, self.backend.create_consumer,
'subscription_name': self.subscription, topic=self.topic,
'consumer_name': self.consumer_name, subscription=self.subscription,
} schema=self.schema,
consumer_type='shared',
# Only add schema if provided (omit if None) )
if self.schema is not None:
subscribe_args['schema'] = JsonSchema(self.schema)
self.consumer = self.client.subscribe(**subscribe_args)
self.task = asyncio.create_task(self.run()) self.task = asyncio.create_task(self.run())
@ -94,12 +92,13 @@ class Subscriber:
drain_end_time = time.time() + self.drain_timeout drain_end_time = time.time() + self.drain_timeout
logger.info(f"Subscriber entering drain mode, timeout={self.drain_timeout}s") logger.info(f"Subscriber entering drain mode, timeout={self.drain_timeout}s")
# Stop accepting new messages from Pulsar during drain # Stop accepting new messages during drain
if self.consumer: # Note: Not all backends support pausing message listeners
if self.consumer and hasattr(self.consumer, 'pause_message_listener'):
try: try:
self.consumer.pause_message_listener() self.consumer.pause_message_listener()
except _pulsar.InvalidConfiguration: except Exception:
# Not all consumers have message listeners (e.g., blocking receive mode) # Not all consumers support message listeners
pass pass
# Check drain timeout # Check drain timeout
@ -133,9 +132,10 @@ class Subscriber:
self.consumer.receive, self.consumer.receive,
timeout_millis=250 timeout_millis=250
) )
except _pulsar.Timeout:
continue
except Exception as e: except Exception as e:
# Handle timeout from any backend
if 'timeout' in str(type(e)).lower() or 'timeout' in str(e).lower():
continue
logger.error(f"Exception in subscriber receive: {e}", exc_info=True) logger.error(f"Exception in subscriber receive: {e}", exc_info=True)
raise e raise e
@ -157,19 +157,20 @@ class Subscriber:
for msg in self.pending_acks.values(): for msg in self.pending_acks.values():
try: try:
self.consumer.negative_acknowledge(msg) self.consumer.negative_acknowledge(msg)
except _pulsar.AlreadyClosed: except Exception:
pass # Consumer already closed pass # Consumer already closed or error
self.pending_acks.clear() self.pending_acks.clear()
if self.consumer: if self.consumer:
try: if hasattr(self.consumer, 'unsubscribe'):
self.consumer.unsubscribe() try:
except _pulsar.AlreadyClosed: self.consumer.unsubscribe()
pass # Already closed except Exception:
pass # Already closed or error
try: try:
self.consumer.close() self.consumer.close()
except _pulsar.AlreadyClosed: except Exception:
pass # Already closed pass # Already closed or error
self.consumer = None self.consumer = None
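
The reworked `Subscriber` no longer catches `_pulsar.Timeout`. Instead, it treats any exception whose type name or message contains "timeout" as an empty poll and keeps looping. The sketch below shows that pattern in isolation. `ReceiveTimeout`, `InMemoryConsumer` and `drain` are hypothetical stand-ins, not TrustGraph APIs. The consumer mimics the blocking `receive(timeout_millis=...)` call, which the subscriber runs in a worker thread through `asyncio.to_thread`.

```python
import asyncio
import queue


class ReceiveTimeout(Exception):
    """Hypothetical backend-specific timeout exception."""


class InMemoryConsumer:
    """Hypothetical stand-in for a backend consumer with a blocking receive()."""

    def __init__(self, messages):
        self._q = queue.Queue()
        for m in messages:
            self._q.put(m)

    def receive(self, timeout_millis=250):
        try:
            return self._q.get(timeout=timeout_millis / 1000)
        except queue.Empty:
            raise ReceiveTimeout("receive timeout") from None


def is_timeout(exc):
    # Same duck-typed test as the new Subscriber: look for "timeout" in the
    # exception's type or message instead of catching a Pulsar class.
    return "timeout" in str(type(exc)).lower() or "timeout" in str(exc).lower()


async def drain(consumer, max_empty_polls=2):
    """Collect messages until max_empty_polls consecutive timeouts occur."""
    received = []
    empty = 0
    while empty < max_empty_polls:
        try:
            # Run the blocking receive in a worker thread so the loop stays free.
            msg = await asyncio.to_thread(consumer.receive, timeout_millis=50)
        except Exception as e:
            if is_timeout(e):
                empty += 1
                continue
            raise  # anything else is a real error
        empty = 0
        received.append(msg)
    return received


if __name__ == "__main__":
    print(asyncio.run(drain(InMemoryConsumer(["a", "b", "c"]))))
```

Matching on text is backend-neutral, but the check is broad. Any error that merely mentions "timeout", such as a connection timeout, is also retried silently. A narrower alternative would be for each backend to raise one agreed exception type.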

@ -16,7 +16,7 @@ class SubscriberSpec(Spec):
) )
subscriber = Subscriber( subscriber = Subscriber(
client = processor.pulsar_client, backend = processor.pubsub,
topic = definition[self.name], topic = definition[self.name],
subscription = flow.id, subscription = flow.id,
consumer_name = flow.id, consumer_name = flow.id,

@ -7,6 +7,7 @@ import time
from pulsar.schema import JsonSchema from pulsar.schema import JsonSchema
from .. exceptions import * from .. exceptions import *
from ..base.pubsub import get_pubsub
# Default timeout for a request/response. In seconds. # Default timeout for a request/response. In seconds.
DEFAULT_TIMEOUT=300 DEFAULT_TIMEOUT=300
@ -39,30 +40,25 @@ class BaseClient:
if subscriber == None: if subscriber == None:
subscriber = str(uuid.uuid4()) subscriber = str(uuid.uuid4())
if pulsar_api_key: # Create backend using factory
auth = pulsar.AuthenticationToken(pulsar_api_key) self.backend = get_pubsub(
self.client = pulsar.Client( pulsar_host=pulsar_host,
pulsar_host, pulsar_api_key=pulsar_api_key,
logger=pulsar.ConsoleLogger(log_level), pulsar_listener=listener,
authentication=auth, pubsub_backend='pulsar'
listener=listener, )
)
else:
self.client = pulsar.Client(
pulsar_host,
logger=pulsar.ConsoleLogger(log_level),
listener_name=listener,
)
self.producer = self.client.create_producer( self.producer = self.backend.create_producer(
topic=input_queue, topic=input_queue,
schema=JsonSchema(input_schema), schema=input_schema,
chunking_enabled=True, chunking_enabled=True,
) )
self.consumer = self.client.subscribe( self.consumer = self.backend.create_consumer(
output_queue, subscriber, topic=output_queue,
schema=JsonSchema(output_schema), subscription=subscriber,
schema=output_schema,
consumer_type='shared',
) )
self.input_schema = input_schema self.input_schema = input_schema
@ -136,10 +132,11 @@ class BaseClient:
if hasattr(self, "consumer"): if hasattr(self, "consumer"):
self.consumer.close() self.consumer.close()
if hasattr(self, "producer"): if hasattr(self, "producer"):
self.producer.flush() self.producer.flush()
self.producer.close() self.producer.close()
self.client.close() if hasattr(self, "backend"):
self.backend.close()
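
`BaseClient` now obtains its transport from a `get_pubsub(...)` factory rather than building a `pulsar.Client` itself. The body of that factory is not part of this diff. The sketch below shows one plausible shape, a name-to-constructor registry. Everything in it is hypothetical and is not the code in `trustgraph.base.pubsub`: `make_pubsub` (a stand-in for `get_pubsub`), `PubSubBackend`, `RecordingPulsarBackend` and `register_backend`. Only the method names `create_producer`, `create_consumer` and `close` come from the calls in the diff.

```python
from typing import Any, Callable, Dict


class PubSubBackend:
    """Hypothetical interface implied by BaseClient's calls on self.backend."""

    def create_producer(self, topic, schema, **options):
        raise NotImplementedError

    def create_consumer(self, topic, subscription, schema,
                        consumer_type="shared", **options):
        raise NotImplementedError

    def close(self):
        pass


class RecordingPulsarBackend(PubSubBackend):
    """Stand-in that only records connection settings; opens no connection."""

    def __init__(self, pulsar_host=None, pulsar_api_key=None,
                 pulsar_listener=None, **_ignored):
        self.host = pulsar_host
        self.api_key = pulsar_api_key
        self.listener = pulsar_listener


# Registry of backend name -> constructor.
_BACKENDS: Dict[str, Callable[..., PubSubBackend]] = {
    "pulsar": RecordingPulsarBackend,
}


def register_backend(name: str, factory: Callable[..., PubSubBackend]) -> None:
    _BACKENDS[name] = factory


def make_pubsub(pubsub_backend: str = "pulsar", **params: Any) -> PubSubBackend:
    """Pick a backend by name and hand it the remaining connection settings."""
    factory = _BACKENDS.get(pubsub_backend)
    if factory is None:
        raise ValueError(f"Unknown pub/sub backend: {pubsub_backend!r}")
    return factory(**params)


backend = make_pubsub(
    pulsar_host="pulsar://localhost:6650",
    pulsar_api_key=None,
    pulsar_listener=None,
    pubsub_backend="pulsar",
)
```

Passing the remaining keywords straight through lets each backend ignore settings it does not understand. This fits the call in `BaseClient`, which supplies Pulsar-specific keywords alongside `pubsub_backend`.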

@ -64,7 +64,6 @@ class ConfigClient(BaseClient):
def get(self, keys, timeout=300): def get(self, keys, timeout=300):
resp = self.call( resp = self.call(
id=id,
operation="get", operation="get",
keys=[ keys=[
ConfigKey( ConfigKey(
@ -88,7 +87,6 @@ class ConfigClient(BaseClient):
def list(self, type, timeout=300): def list(self, type, timeout=300):
resp = self.call( resp = self.call(
id=id,
operation="list", operation="list",
type=type, type=type,
timeout=timeout timeout=timeout
@ -99,7 +97,6 @@ class ConfigClient(BaseClient):
def getvalues(self, type, timeout=300): def getvalues(self, type, timeout=300):
resp = self.call( resp = self.call(
id=id,
operation="getvalues", operation="getvalues",
type=type, type=type,
timeout=timeout timeout=timeout
@ -117,7 +114,6 @@ class ConfigClient(BaseClient):
def delete(self, keys, timeout=300): def delete(self, keys, timeout=300):
resp = self.call( resp = self.call(
id=id,
operation="delete", operation="delete",
keys=[ keys=[
ConfigKey( ConfigKey(
@ -134,7 +130,6 @@ class ConfigClient(BaseClient):
def put(self, values, timeout=300): def put(self, values, timeout=300):
resp = self.call( resp = self.call(
id=id,
operation="put", operation="put",
values=[ values=[
ConfigValue( ConfigValue(
@ -152,7 +147,6 @@ class ConfigClient(BaseClient):
def config(self, timeout=300): def config(self, timeout=300):
resp = self.call( resp = self.call(
id=id,
operation="config", operation="config",
timeout=timeout timeout=timeout
) )
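
The `id=id` arguments removed from `ConfigClient` appear to have passed Python's builtin `id` function, assuming no module-level `id` is defined, because `id` is not bound anywhere in the methods shown. `ConfigRequest` has no `id` field. The sketch below shows how a plain dataclass schema rejects such a stray keyword at construction time, which matches the "incorrect schema use" the commit message mentions. The schema fields are copied from this commit. `build_request` is a hypothetical helper, since how `call()` builds the request is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigKey:
    type: str = ""
    key: str = ""


@dataclass
class ConfigValue:
    type: str = ""
    key: str = ""
    value: str = ""


@dataclass
class ConfigRequest:
    # Fields as declared by ConfigRequest in this commit; there is no `id`.
    operation: str = ""
    keys: list[ConfigKey] = field(default_factory=list)
    type: str = ""
    values: list[ConfigValue] = field(default_factory=list)


def build_request(**kwargs):
    """Hypothetical helper standing in for how a client might build a request."""
    return ConfigRequest(**kwargs)


good = build_request(operation="get", keys=[ConfigKey(type="prompt", key="system")])

rejected = False
try:
    # Here `id` is the builtin id() function, as in the removed lines.
    build_request(id=id, operation="get")
except TypeError as exc:
    rejected = True
    print("rejected:", exc)
```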

@ -34,14 +34,12 @@ class DocumentRagResponseTranslator(MessageTranslator):
def from_pulsar(self, obj: DocumentRagResponse) -> Dict[str, Any]: def from_pulsar(self, obj: DocumentRagResponse) -> Dict[str, Any]:
result = {} result = {}
# Check if this is a streaming response (has chunk) # Include response content (chunk or complete)
if hasattr(obj, 'chunk') and obj.chunk: if obj.response:
result["chunk"] = obj.chunk result["response"] = obj.response
result["end_of_stream"] = getattr(obj, "end_of_stream", False)
else: # Include end_of_stream flag
# Non-streaming response result["end_of_stream"] = getattr(obj, "end_of_stream", False)
if obj.response:
result["response"] = obj.response
# Always include error if present # Always include error if present
if hasattr(obj, 'error') and obj.error and obj.error.message: if hasattr(obj, 'error') and obj.error and obj.error.message:
@ -51,13 +49,7 @@ class DocumentRagResponseTranslator(MessageTranslator):
def from_response_with_completion(self, obj: DocumentRagResponse) -> Tuple[Dict[str, Any], bool]: def from_response_with_completion(self, obj: DocumentRagResponse) -> Tuple[Dict[str, Any], bool]:
"""Returns (response_dict, is_final)""" """Returns (response_dict, is_final)"""
# For streaming responses, check end_of_stream is_final = getattr(obj, 'end_of_stream', False)
if hasattr(obj, 'chunk') and obj.chunk:
is_final = getattr(obj, 'end_of_stream', False)
else:
# For non-streaming responses, it's always final
is_final = True
return self.from_pulsar(obj), is_final return self.from_pulsar(obj), is_final
@ -98,14 +90,12 @@ class GraphRagResponseTranslator(MessageTranslator):
def from_pulsar(self, obj: GraphRagResponse) -> Dict[str, Any]: def from_pulsar(self, obj: GraphRagResponse) -> Dict[str, Any]:
result = {} result = {}
# Check if this is a streaming response (has chunk) # Include response content (chunk or complete)
if hasattr(obj, 'chunk') and obj.chunk: if obj.response:
result["chunk"] = obj.chunk result["response"] = obj.response
result["end_of_stream"] = getattr(obj, "end_of_stream", False)
else: # Include end_of_stream flag
# Non-streaming response result["end_of_stream"] = getattr(obj, "end_of_stream", False)
if obj.response:
result["response"] = obj.response
# Always include error if present # Always include error if present
if hasattr(obj, 'error') and obj.error and obj.error.message: if hasattr(obj, 'error') and obj.error and obj.error.message:
@ -115,11 +105,5 @@ class GraphRagResponseTranslator(MessageTranslator):
def from_response_with_completion(self, obj: GraphRagResponse) -> Tuple[Dict[str, Any], bool]: def from_response_with_completion(self, obj: GraphRagResponse) -> Tuple[Dict[str, Any], bool]:
"""Returns (response_dict, is_final)""" """Returns (response_dict, is_final)"""
# For streaming responses, check end_of_stream is_final = getattr(obj, 'end_of_stream', False)
if hasattr(obj, 'chunk') and obj.chunk:
is_final = getattr(obj, 'end_of_stream', False)
else:
# For non-streaming responses, it's always final
is_final = True
return self.from_pulsar(obj), is_final return self.from_pulsar(obj), is_final
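
After this change, both RAG translators always emit `end_of_stream` and derive `is_final` from that field alone. A complete, non-streaming response that leaves `end_of_stream` unset is therefore no longer treated as final. The sketch below restates that logic against a stand-in dataclass. `RagResponse`, `to_dict`, `with_completion` and `collect` are hypothetical. The shape of the `error` entry is also an assumption, because the diff context cuts off before it.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class Error:
    type: str = ""
    message: str = ""


@dataclass
class RagResponse:
    """Stand-in for GraphRagResponse / DocumentRagResponse (fields assumed)."""
    response: str = ""
    end_of_stream: bool = False
    error: Optional[Error] = None


def to_dict(obj) -> Dict[str, Any]:
    result: Dict[str, Any] = {}
    if obj.response:                      # chunk or complete answer
        result["response"] = obj.response
    result["end_of_stream"] = getattr(obj, "end_of_stream", False)
    if obj.error and obj.error.message:   # error shape here is an assumption
        result["error"] = {"type": obj.error.type, "message": obj.error.message}
    return result


def with_completion(obj) -> Tuple[Dict[str, Any], bool]:
    # As in the new translators, "final" means end_of_stream and nothing else.
    return to_dict(obj), getattr(obj, "end_of_stream", False)


def collect(stream):
    """Concatenate response text until a message marks end_of_stream."""
    text = []
    for msg in stream:
        body, final = with_completion(msg)
        text.append(body.get("response", ""))
        if final:
            break
    return "".join(text)
```

A producer that used to send a single complete answer presumably now needs to set `end_of_stream=True` on it, or the gateway will keep waiting.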

@ -1,16 +1,14 @@
from dataclasses import dataclass, field
from pulsar.schema import Record, String, Array
from .primitives import Triple from .primitives import Triple
class Metadata(Record): @dataclass
class Metadata:
# Source identifier # Source identifier
id = String() id: str = ""
# Subgraph # Subgraph
metadata = Array(Triple()) metadata: list[Triple] = field(default_factory=list)
# Collection management # Collection management
user = String() user: str = ""
collection = String() collection: str = ""
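
The dataclass schemas declare list- and dict-valued fields with `field(default_factory=...)` and never with a bare `[]` or `{}`. The standard `dataclasses` module rejects a mutable default at class-definition time, and `default_factory` gives every instance its own fresh container. The sketch below illustrates this with a trimmed copy of `Metadata`, using `list` in place of `list[Triple]` to stay self-contained. `Broken` is a deliberately invalid example.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    id: str = ""
    metadata: list = field(default_factory=list)  # list[Triple] in the real schema
    user: str = ""
    collection: str = ""


a = Metadata(id="doc-1")
b = Metadata(id="doc-2")
a.metadata.append("triple")
print(b.metadata)  # prints [] because each instance owns its own list

# A bare mutable default is refused when the class is created.
bare_default_rejected = False
try:
    @dataclass
    class Broken:
        items: list = []
except ValueError:
    bare_default_rejected = True
```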

@ -1,34 +1,39 @@
from pulsar.schema import Record, String, Boolean, Array, Integer from dataclasses import dataclass, field
class Error(Record): @dataclass
type = String() class Error:
message = String() type: str = ""
message: str = ""
class Value(Record): @dataclass
value = String() class Value:
is_uri = Boolean() value: str = ""
type = String() is_uri: bool = False
type: str = ""
class Triple(Record): @dataclass
s = Value() class Triple:
p = Value() s: Value | None = None
o = Value() p: Value | None = None
o: Value | None = None
class Field(Record): @dataclass
name = String() class Field:
name: str = ""
# int, string, long, bool, float, double, timestamp # int, string, long, bool, float, double, timestamp
type = String() type: str = ""
size = Integer() size: int = 0
primary = Boolean() primary: bool = False
description = String() description: str = ""
# NEW FIELDS for structured data: # NEW FIELDS for structured data:
required = Boolean() # Whether field is required required: bool = False # Whether field is required
enum_values = Array(String()) # For enum type fields enum_values: list[str] = field(default_factory=list) # For enum type fields
indexed = Boolean() # Whether field should be indexed indexed: bool = False # Whether field should be indexed
class RowSchema(Record): @dataclass
name = String() class RowSchema:
description = String() name: str = ""
fields = Array(Field()) description: str = ""
fields: list[Field] = field(default_factory=list)

View file

@ -1,4 +1,23 @@
def topic(topic, kind='persistent', tenant='tg', namespace='flow'): def topic(queue_name, qos='q1', tenant='tg', namespace='flow'):
return f"{kind}://{tenant}/{namespace}/{topic}" """
Create a generic topic identifier that can be mapped by backends.
Args:
queue_name: The queue/topic name
qos: Quality of service
- 'q0' = best-effort (no ack)
- 'q1' = at-least-once (ack required)
- 'q2' = exactly-once (two-phase ack)
tenant: Tenant identifier for multi-tenancy
namespace: Namespace within tenant
Returns:
Generic topic string: qos/tenant/namespace/queue_name
Examples:
topic('my-queue') # q1/tg/flow/my-queue
topic('config', qos='q2', namespace='config') # q2/tg/config/config
"""
return f"{qos}/{tenant}/{namespace}/{queue_name}"

@ -1,4 +1,4 @@
from pulsar.schema import Record, Bytes from dataclasses import dataclass
from ..core.metadata import Metadata from ..core.metadata import Metadata
from ..core.topic import topic from ..core.topic import topic
@ -6,24 +6,27 @@ from ..core.topic import topic
############################################################################ ############################################################################
# PDF docs etc. # PDF docs etc.
class Document(Record): @dataclass
metadata = Metadata() class Document:
data = Bytes() metadata: Metadata | None = None
data: bytes = b""
############################################################################ ############################################################################
# Text documents / text from PDF # Text documents / text from PDF
class TextDocument(Record): @dataclass
metadata = Metadata() class TextDocument:
text = Bytes() metadata: Metadata | None = None
text: bytes = b""
############################################################################ ############################################################################
# Chunks of text # Chunks of text
class Chunk(Record): @dataclass
metadata = Metadata() class Chunk:
chunk = Bytes() metadata: Metadata | None = None
chunk: bytes = b""
############################################################################ ############################################################################

@ -1,4 +1,4 @@
from pulsar.schema import Record, Bytes, String, Boolean, Integer, Array, Double, Map from dataclasses import dataclass, field
from ..core.metadata import Metadata from ..core.metadata import Metadata
from ..core.primitives import Value, RowSchema from ..core.primitives import Value, RowSchema
@ -8,49 +8,55 @@ from ..core.topic import topic
# Graph embeddings are embeddings associated with a graph entity # Graph embeddings are embeddings associated with a graph entity
class EntityEmbeddings(Record): @dataclass
entity = Value() class EntityEmbeddings:
vectors = Array(Array(Double())) entity: Value | None = None
vectors: list[list[float]] = field(default_factory=list)
# This is a 'batching' mechanism for the above data # This is a 'batching' mechanism for the above data
class GraphEmbeddings(Record): @dataclass
metadata = Metadata() class GraphEmbeddings:
entities = Array(EntityEmbeddings()) metadata: Metadata | None = None
entities: list[EntityEmbeddings] = field(default_factory=list)
############################################################################ ############################################################################
# Document embeddings are embeddings associated with a chunk # Document embeddings are embeddings associated with a chunk
class ChunkEmbeddings(Record): @dataclass
chunk = Bytes() class ChunkEmbeddings:
vectors = Array(Array(Double())) chunk: bytes = b""
vectors: list[list[float]] = field(default_factory=list)
# This is a 'batching' mechanism for the above data # This is a 'batching' mechanism for the above data
class DocumentEmbeddings(Record): @dataclass
metadata = Metadata() class DocumentEmbeddings:
chunks = Array(ChunkEmbeddings()) metadata: Metadata | None = None
chunks: list[ChunkEmbeddings] = field(default_factory=list)
############################################################################ ############################################################################
# Object embeddings are embeddings associated with the primary key of an # Object embeddings are embeddings associated with the primary key of an
# object # object
class ObjectEmbeddings(Record): @dataclass
metadata = Metadata() class ObjectEmbeddings:
vectors = Array(Array(Double())) metadata: Metadata | None = None
name = String() vectors: list[list[float]] = field(default_factory=list)
key_name = String() name: str = ""
id = String() key_name: str = ""
id: str = ""
############################################################################ ############################################################################
# Structured object embeddings with enhanced capabilities # Structured object embeddings with enhanced capabilities
class StructuredObjectEmbedding(Record): @dataclass
metadata = Metadata() class StructuredObjectEmbedding:
vectors = Array(Array(Double())) metadata: Metadata | None = None
schema_name = String() vectors: list[list[float]] = field(default_factory=list)
object_id = String() # Primary key value schema_name: str = ""
field_embeddings = Map(Array(Double())) # Per-field embeddings object_id: str = "" # Primary key value
field_embeddings: dict[str, list[float]] = field(default_factory=dict) # Per-field embeddings
############################################################################ ############################################################################

@ -1,4 +1,4 @@
from pulsar.schema import Record, String, Array from dataclasses import dataclass, field
from ..core.primitives import Value, Triple from ..core.primitives import Value, Triple
from ..core.metadata import Metadata from ..core.metadata import Metadata
@ -8,21 +8,24 @@ from ..core.topic import topic
# Entity context are an entity associated with textual context # Entity context are an entity associated with textual context
class EntityContext(Record): @dataclass
entity = Value() class EntityContext:
context = String() entity: Value | None = None
context: str = ""
# This is a 'batching' mechanism for the above data # This is a 'batching' mechanism for the above data
class EntityContexts(Record): @dataclass
metadata = Metadata() class EntityContexts:
entities = Array(EntityContext()) metadata: Metadata | None = None
entities: list[EntityContext] = field(default_factory=list)
############################################################################ ############################################################################
# Graph triples # Graph triples
class Triples(Record): @dataclass
metadata = Metadata() class Triples:
triples = Array(Triple()) metadata: Metadata | None = None
triples: list[Triple] = field(default_factory=list)
############################################################################ ############################################################################

@ -1,5 +1,4 @@
from dataclasses import dataclass, field
from pulsar.schema import Record, Bytes, String, Array, Long, Boolean
from ..core.primitives import Triple, Error from ..core.primitives import Triple, Error
from ..core.topic import topic from ..core.topic import topic
from ..core.metadata import Metadata from ..core.metadata import Metadata
@ -22,40 +21,40 @@ from .embeddings import GraphEmbeddings
# <- () # <- ()
# <- (error) # <- (error)
class KnowledgeRequest(Record): @dataclass
class KnowledgeRequest:
# get-kg-core, delete-kg-core, list-kg-cores, put-kg-core # get-kg-core, delete-kg-core, list-kg-cores, put-kg-core
# load-kg-core, unload-kg-core # load-kg-core, unload-kg-core
operation = String() operation: str = ""
# list-kg-cores, delete-kg-core, put-kg-core # list-kg-cores, delete-kg-core, put-kg-core
user = String() user: str = ""
# get-kg-core, list-kg-cores, delete-kg-core, put-kg-core, # get-kg-core, list-kg-cores, delete-kg-core, put-kg-core,
# load-kg-core, unload-kg-core # load-kg-core, unload-kg-core
id = String() id: str = ""
# load-kg-core # load-kg-core
flow = String() flow: str = ""
# load-kg-core # load-kg-core
collection = String() collection: str = ""
# put-kg-core # put-kg-core
triples = Triples() triples: Triples | None = None
graph_embeddings = GraphEmbeddings() graph_embeddings: GraphEmbeddings | None = None
class KnowledgeResponse(Record): @dataclass
error = Error() class KnowledgeResponse:
ids = Array(String()) error: Error | None = None
eos = Boolean() # Indicates end of knowledge core stream ids: list[str] = field(default_factory=list)
triples = Triples() eos: bool = False # Indicates end of knowledge core stream
graph_embeddings = GraphEmbeddings() triples: Triples | None = None
graph_embeddings: GraphEmbeddings | None = None
knowledge_request_queue = topic( knowledge_request_queue = topic(
'knowledge', kind='non-persistent', namespace='request' 'knowledge', qos='q0', namespace='request'
) )
knowledge_response_queue = topic( knowledge_response_queue = topic(
'knowledge', kind='non-persistent', namespace='response', 'knowledge', qos='q0', namespace='response',
) )

@ -1,4 +1,4 @@
from pulsar.schema import Record, String, Boolean from dataclasses import dataclass
from ..core.topic import topic from ..core.topic import topic
@ -6,21 +6,25 @@ from ..core.topic import topic
# NLP extraction data types # NLP extraction data types
class Definition(Record): @dataclass
name = String() class Definition:
definition = String() name: str = ""
definition: str = ""
class Topic(Record): @dataclass
name = String() class Topic:
definition = String() name: str = ""
definition: str = ""
class Relationship(Record): @dataclass
s = String() class Relationship:
p = String() s: str = ""
o = String() p: str = ""
o_entity = Boolean() o: str = ""
o_entity: bool = False
class Fact(Record): @dataclass
s = String() class Fact:
p = String() s: str = ""
o = String() p: str = ""
o: str = ""

@ -1,4 +1,4 @@
from pulsar.schema import Record, String, Map, Double, Array from dataclasses import dataclass, field
from ..core.metadata import Metadata from ..core.metadata import Metadata
from ..core.topic import topic from ..core.topic import topic
@ -7,11 +7,13 @@ from ..core.topic import topic
# Extracted object from text processing # Extracted object from text processing
class ExtractedObject(Record): @dataclass
metadata = Metadata() class ExtractedObject:
schema_name = String() # Which schema this object belongs to metadata: Metadata | None = None
values = Array(Map(String())) # Array of objects, each object is field name -> value schema_name: str = "" # Which schema this object belongs to
confidence = Double() values: list[dict[str, str]] = field(default_factory=list) # Array of objects, each object is field name -> value
source_span = String() # Text span where object was found confidence: float = 0.0
source_span: str = "" # Text span where object was found
############################################################################
############################################################################

@ -1,4 +1,4 @@
from pulsar.schema import Record, Array, Map, String from dataclasses import dataclass, field
from ..core.metadata import Metadata from ..core.metadata import Metadata
from ..core.primitives import RowSchema from ..core.primitives import RowSchema
@ -8,9 +8,10 @@ from ..core.topic import topic
# Stores rows of information # Stores rows of information
class Rows(Record): @dataclass
metadata = Metadata() class Rows:
row_schema = RowSchema() metadata: Metadata | None = None
rows = Array(Map(String())) row_schema: RowSchema | None = None
rows: list[dict[str, str]] = field(default_factory=list)
############################################################################ ############################################################################

@ -1,4 +1,4 @@
from pulsar.schema import Record, String, Bytes, Map from dataclasses import dataclass, field
from ..core.metadata import Metadata from ..core.metadata import Metadata
from ..core.topic import topic from ..core.topic import topic
@ -7,11 +7,13 @@ from ..core.topic import topic
# Structured data submission for fire-and-forget processing # Structured data submission for fire-and-forget processing
class StructuredDataSubmission(Record): @dataclass
metadata = Metadata() class StructuredDataSubmission:
format = String() # "json", "csv", "xml" metadata: Metadata | None = None
schema_name = String() # Reference to schema in config format: str = "" # "json", "csv", "xml"
data = Bytes() # Raw data to ingest schema_name: str = "" # Reference to schema in config
options = Map(String()) # Format-specific options data: bytes = b"" # Raw data to ingest
options: dict[str, str] = field(default_factory=dict) # Format-specific options
############################################################################
############################################################################

@ -1,5 +1,5 @@
from pulsar.schema import Record, String, Array, Map, Boolean from dataclasses import dataclass, field
from ..core.topic import topic from ..core.topic import topic
from ..core.primitives import Error from ..core.primitives import Error
@ -8,33 +8,36 @@ from ..core.primitives import Error
# Prompt services, abstract the prompt generation # Prompt services, abstract the prompt generation
class AgentStep(Record): @dataclass
thought = String() class AgentStep:
action = String() thought: str = ""
arguments = Map(String()) action: str = ""
observation = String() arguments: dict[str, str] = field(default_factory=dict)
user = String() # User context for the step observation: str = ""
user: str = "" # User context for the step
class AgentRequest(Record): @dataclass
question = String() class AgentRequest:
state = String() question: str = ""
group = Array(String()) state: str = ""
history = Array(AgentStep()) group: list[str] | None = None
user = String() # User context for multi-tenancy history: list[AgentStep] = field(default_factory=list)
streaming = Boolean() # NEW: Enable streaming response delivery (default false) user: str = "" # User context for multi-tenancy
streaming: bool = False # NEW: Enable streaming response delivery (default false)
class AgentResponse(Record): @dataclass
class AgentResponse:
# Streaming-first design # Streaming-first design
chunk_type = String() # "thought", "action", "observation", "answer", "error" chunk_type: str = "" # "thought", "action", "observation", "answer", "error"
content = String() # The actual content (interpretation depends on chunk_type) content: str = "" # The actual content (interpretation depends on chunk_type)
end_of_message = Boolean() # Current chunk type (thought/action/etc.) is complete end_of_message: bool = False # Current chunk type (thought/action/etc.) is complete
end_of_dialog = Boolean() # Entire agent dialog is complete end_of_dialog: bool = False # Entire agent dialog is complete
# Legacy fields (deprecated but kept for backward compatibility) # Legacy fields (deprecated but kept for backward compatibility)
answer = String() answer: str = ""
error = Error() error: Error | None = None
thought = String() thought: str = ""
observation = String() observation: str = ""
############################################################################ ############################################################################
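
`AgentResponse` is now streaming-first. Each message carries a `chunk_type` and a piece of `content`. `end_of_message` closes the current thought, action, observation or answer, and `end_of_dialog` closes the whole exchange. The sketch below reassembles such a stream, assuming that consecutive chunks of one message should be concatenated in order. `assemble` is hypothetical, and the legacy fields are left out.

```python
from dataclasses import dataclass


@dataclass
class AgentResponse:
    # Streaming fields from the new AgentResponse schema (legacy fields omitted).
    chunk_type: str = ""
    content: str = ""
    end_of_message: bool = False
    end_of_dialog: bool = False


def assemble(stream):
    """Group streamed chunks into complete (chunk_type, text) messages."""
    messages = []
    buffer = []
    for resp in stream:
        buffer.append(resp.content)
        if resp.end_of_message:
            messages.append((resp.chunk_type, "".join(buffer)))
            buffer = []
        if resp.end_of_dialog:
            break  # nothing after end_of_dialog belongs to this exchange
    return messages


stream = [
    AgentResponse("thought", "I should look "),
    AgentResponse("thought", "this up.", end_of_message=True),
    AgentResponse("answer", "42", end_of_message=True, end_of_dialog=True),
    AgentResponse("answer", "ignored"),
]
print(assemble(stream))
# [('thought', 'I should look this up.'), ('answer', '42')]
```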

@ -1,4 +1,4 @@
from pulsar.schema import Record, String, Integer, Array from dataclasses import dataclass, field
from datetime import datetime from datetime import datetime
from ..core.primitives import Error from ..core.primitives import Error
@ -10,37 +10,40 @@ from ..core.topic import topic
# Collection metadata operations (for librarian service) # Collection metadata operations (for librarian service)
class CollectionMetadata(Record): @dataclass
class CollectionMetadata:
"""Collection metadata record""" """Collection metadata record"""
user = String() user: str = ""
collection = String() collection: str = ""
name = String() name: str = ""
description = String() description: str = ""
tags = Array(String()) tags: list[str] = field(default_factory=list)
############################################################################ ############################################################################
class CollectionManagementRequest(Record): @dataclass
class CollectionManagementRequest:
"""Request for collection management operations""" """Request for collection management operations"""
operation = String() # e.g., "delete-collection" operation: str = "" # e.g., "delete-collection"
# For 'list-collections' # For 'list-collections'
user = String() user: str = ""
collection = String() collection: str = ""
timestamp = String() # ISO timestamp timestamp: str = "" # ISO timestamp
name = String() name: str = ""
description = String() description: str = ""
tags = Array(String()) tags: list[str] = field(default_factory=list)
# For list # For list
tag_filter = Array(String()) # Optional filter by tags tag_filter: list[str] = field(default_factory=list) # Optional filter by tags
limit = Integer() limit: int = 0
class CollectionManagementResponse(Record): @dataclass
class CollectionManagementResponse:
"""Response for collection management operations""" """Response for collection management operations"""
error = Error() # Only populated if there's an error error: Error | None = None # Only populated if there's an error
timestamp = String() # ISO timestamp timestamp: str = "" # ISO timestamp
collections = Array(CollectionMetadata()) collections: list[CollectionMetadata] = field(default_factory=list)
############################################################################ ############################################################################
@ -48,8 +51,9 @@ class CollectionManagementResponse(Record):
# Topics # Topics
collection_request_queue = topic( collection_request_queue = topic(
'collection', kind='non-persistent', namespace='request' 'collection', qos='q0', namespace='request'
) )
collection_response_queue = topic( collection_response_queue = topic(
'collection', kind='non-persistent', namespace='response' 'collection', qos='q0', namespace='response'
) )

@ -1,5 +1,5 @@
from pulsar.schema import Record, Bytes, String, Boolean, Array, Map, Integer from dataclasses import dataclass, field
from ..core.topic import topic from ..core.topic import topic
from ..core.primitives import Error from ..core.primitives import Error
@ -13,58 +13,61 @@ from ..core.primitives import Error
# put(values) -> () # put(values) -> ()
# delete(keys) -> () # delete(keys) -> ()
# config() -> (version, config) # config() -> (version, config)
class ConfigKey(Record): @dataclass
type = String() class ConfigKey:
key = String() type: str = ""
key: str = ""
class ConfigValue(Record): @dataclass
type = String() class ConfigValue:
key = String() type: str = ""
value = String() key: str = ""
value: str = ""
# Prompt services, abstract the prompt generation # Prompt services, abstract the prompt generation
class ConfigRequest(Record): @dataclass
class ConfigRequest:
operation = String() # get, list, getvalues, delete, put, config operation: str = "" # get, list, getvalues, delete, put, config
# get, delete # get, delete
keys = Array(ConfigKey()) keys: list[ConfigKey] = field(default_factory=list)
# list, getvalues # list, getvalues
type = String() type: str = ""
# put # put
values = Array(ConfigValue()) values: list[ConfigValue] = field(default_factory=list)
class ConfigResponse(Record):
@dataclass
class ConfigResponse:
# get, list, getvalues, config # get, list, getvalues, config
version = Integer() version: int = 0
# get, getvalues # get, getvalues
values = Array(ConfigValue()) values: list[ConfigValue] = field(default_factory=list)
# list # list
directory = Array(String()) directory: list[str] = field(default_factory=list)
# config # config
config = Map(Map(String())) config: dict[str, dict[str, str]] = field(default_factory=dict)
# Everything # Everything
error = Error() error: Error | None = None
class ConfigPush(Record): @dataclass
version = Integer() class ConfigPush:
config = Map(Map(String())) version: int = 0
config: dict[str, dict[str, str]] = field(default_factory=dict)
config_request_queue = topic( config_request_queue = topic(
'config', kind='non-persistent', namespace='request' 'config', qos='q0', namespace='request'
) )
config_response_queue = topic( config_response_queue = topic(
'config', kind='non-persistent', namespace='response' 'config', qos='q0', namespace='response'
) )
config_push_queue = topic( config_push_queue = topic(
'config', kind='persistent', namespace='config' 'config', qos='q2', namespace='config'
) )
############################################################################ ############################################################################
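The config hunk above replaces Pulsar `Record` fields with dataclass fields. List and dict fields use `field(default_factory=...)` because a dataclass refuses a bare mutable default such as `[]`, and each message needs its own container. The sketch below trims two of those classes. Encoding with `json.dumps(asdict(...))` is an assumption made for illustration, because this diff does not show how the backend plugins serialize messages.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class ConfigKey:
    type: str = ""
    key: str = ""

@dataclass
class ConfigRequest:
    operation: str = ""
    keys: list[ConfigKey] = field(default_factory=list)
    type: str = ""

# default_factory gives every instance its own list, so appending to
# one request never leaks into another.
a = ConfigRequest(operation="get")
b = ConfigRequest(operation="get")
a.keys.append(ConfigKey(type="prompt", key="system"))
print(b.keys)  # []

# Assumption: JSON via asdict() as the wire format, for illustration only.
# asdict() recurses into nested dataclasses and lists.
payload = json.dumps(asdict(a))
print(payload)
# {"operation": "get", "keys": [{"type": "prompt", "key": "system"}], "type": ""}

# A bare mutable default is rejected when the class is defined.
try:
    @dataclass
    class Broken:
        keys: list = []
except ValueError as e:
    print("rejected:", e)
```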


@@ -1,33 +1,36 @@
from pulsar.schema import Record, String, Map, Double, Array from dataclasses import dataclass, field
from ..core.primitives import Error from ..core.primitives import Error
############################################################################ ############################################################################
# Structured data diagnosis services # Structured data diagnosis services
class StructuredDataDiagnosisRequest(Record): @dataclass
operation = String() # "detect-type", "generate-descriptor", "diagnose", or "schema-selection" class StructuredDataDiagnosisRequest:
sample = String() # Data sample to analyze (text content) operation: str = "" # "detect-type", "generate-descriptor", "diagnose", or "schema-selection"
type = String() # Data type (csv, json, xml) - optional, required for generate-descriptor sample: str = "" # Data sample to analyze (text content)
schema_name = String() # Target schema name for descriptor generation - optional type: str = "" # Data type (csv, json, xml) - optional, required for generate-descriptor
schema_name: str = "" # Target schema name for descriptor generation - optional
# JSON encoded options (e.g., delimiter for CSV) # JSON encoded options (e.g., delimiter for CSV)
options = Map(String()) options: dict[str, str] = field(default_factory=dict)
class StructuredDataDiagnosisResponse(Record): @dataclass
error = Error() class StructuredDataDiagnosisResponse:
error: Error | None = None
operation = String() # The operation that was performed operation: str = "" # The operation that was performed
detected_type = String() # Detected data type (for detect-type/diagnose) - optional detected_type: str = "" # Detected data type (for detect-type/diagnose) - optional
confidence = Double() # Confidence score for type detection - optional confidence: float = 0.0 # Confidence score for type detection - optional
# JSON encoded descriptor (for generate-descriptor/diagnose) - optional # JSON encoded descriptor (for generate-descriptor/diagnose) - optional
descriptor = String() descriptor: str = ""
# JSON encoded additional metadata (e.g., field count, sample records) # JSON encoded additional metadata (e.g., field count, sample records)
metadata = Map(String()) metadata: dict[str, str] = field(default_factory=dict)
# Array of matching schema IDs (for schema-selection operation) - optional # Array of matching schema IDs (for schema-selection operation) - optional
schema_matches = Array(String()) schema_matches: list[str] = field(default_factory=list)
############################################################################
############################################################################


@@ -1,5 +1,5 @@
from pulsar.schema import Record, Bytes, String, Boolean, Array, Map, Integer from dataclasses import dataclass, field
from ..core.topic import topic from ..core.topic import topic
from ..core.primitives import Error from ..core.primitives import Error
@@ -11,61 +11,61 @@ from ..core.primitives import Error
# get_class(classname) -> (class) # get_class(classname) -> (class)
# put_class(class) -> (class) # put_class(class) -> (class)
# delete_class(classname) -> () # delete_class(classname) -> ()
# #
# list_flows() -> (flowid[]) # list_flows() -> (flowid[])
# get_flow(flowid) -> (flow) # get_flow(flowid) -> (flow)
# start_flow(flowid, classname) -> () # start_flow(flowid, classname) -> ()
# stop_flow(flowid) -> () # stop_flow(flowid) -> ()
# Prompt services, abstract the prompt generation # Prompt services, abstract the prompt generation
class FlowRequest(Record): @dataclass
class FlowRequest:
operation = String() # list-classes, get-class, put-class, delete-class operation: str = "" # list-classes, get-class, put-class, delete-class
# list-flows, get-flow, start-flow, stop-flow # list-flows, get-flow, start-flow, stop-flow
# get_class, put_class, delete_class, start_flow # get_class, put_class, delete_class, start_flow
class_name = String() class_name: str = ""
# put_class # put_class
class_definition = String() class_definition: str = ""
# start_flow # start_flow
description = String() description: str = ""
# get_flow, start_flow, stop_flow # get_flow, start_flow, stop_flow
flow_id = String() flow_id: str = ""
# start_flow - optional parameters for flow customization # start_flow - optional parameters for flow customization
parameters = Map(String()) parameters: dict[str, str] = field(default_factory=dict)
class FlowResponse(Record):
@dataclass
class FlowResponse:
# list_classes # list_classes
class_names = Array(String()) class_names: list[str] = field(default_factory=list)
# list_flows # list_flows
flow_ids = Array(String()) flow_ids: list[str] = field(default_factory=list)
# get_class # get_class
class_definition = String() class_definition: str = ""
# get_flow # get_flow
flow = String() flow: str = ""
# get_flow # get_flow
description = String() description: str = ""
# get_flow - parameters used when flow was started # get_flow - parameters used when flow was started
parameters = Map(String()) parameters: dict[str, str] = field(default_factory=dict)
# Everything # Everything
error = Error() error: Error | None = None
flow_request_queue = topic( flow_request_queue = topic(
'flow', kind='non-persistent', namespace='request' 'flow', qos='q0', namespace='request'
) )
flow_response_queue = topic( flow_response_queue = topic(
'flow', kind='non-persistent', namespace='response' 'flow', qos='q0', namespace='response'
) )
############################################################################ ############################################################################
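The `topic()` calls in these hunks now take a transport-neutral `qos` level in place of Pulsar's `kind`. How a backend interprets `q0`, `q1` and `q2` is not shown in this diff. The hunks only suggest that topics which were `non-persistent` became `q0` and topics which were `persistent` became `q1` or `q2`. The sketch below is a hypothetical Pulsar-side mapping built on that assumption. `QOS_TO_PULSAR` and `pulsar_topic` are illustrative names, not TrustGraph APIs, and the default tenant `tg` is likewise assumed.

```python
# Hypothetical mapping from transport-neutral QoS levels to Pulsar topic
# persistence. Assumption: q0 = non-persistent, q1 and q2 = persistent.
QOS_TO_PULSAR = {
    "q0": "non-persistent",
    "q1": "persistent",
    "q2": "persistent",
}

def pulsar_topic(name: str, qos: str = "q1",
                 tenant: str = "tg", namespace: str = "flow") -> str:
    """Build a fully qualified Pulsar topic name (hypothetical helper)."""
    try:
        kind = QOS_TO_PULSAR[qos]
    except KeyError:
        raise ValueError(f"unknown qos level: {qos!r}") from None
    # Pulsar topic names take the form {persistence}://tenant/namespace/topic
    return f"{kind}://{tenant}/{namespace}/{name}"

print(pulsar_topic("flow", qos="q0", namespace="request"))
# non-persistent://tg/request/flow
```

Keeping the mapping inside the backend leaves the schema modules free of Pulsar URI syntax. Another backend plugin could translate the same levels into its own delivery settings.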


@@ -1,9 +1,8 @@
from dataclasses import dataclass, field
from pulsar.schema import Record, Bytes, String, Array, Long
from ..core.primitives import Triple, Error from ..core.primitives import Triple, Error
from ..core.topic import topic from ..core.topic import topic
from ..core.metadata import Metadata from ..core.metadata import Metadata
from ..knowledge.document import Document, TextDocument # Note: Document imports will be updated after knowledge schemas are converted
# add-document # add-document
# -> (document_id, document_metadata, content) # -> (document_id, document_metadata, content)
@@ -50,76 +49,79 @@ from ..knowledge.document import Document, TextDocument
# <- (processing_metadata[]) # <- (processing_metadata[])
# <- (error) # <- (error)
class DocumentMetadata(Record): @dataclass
id = String() class DocumentMetadata:
time = Long() id: str = ""
kind = String() time: int = 0
title = String() kind: str = ""
comments = String() title: str = ""
metadata = Array(Triple()) comments: str = ""
user = String() metadata: list[Triple] = field(default_factory=list)
tags = Array(String()) user: str = ""
tags: list[str] = field(default_factory=list)
class ProcessingMetadata(Record): @dataclass
id = String() class ProcessingMetadata:
document_id = String() id: str = ""
time = Long() document_id: str = ""
flow = String() time: int = 0
user = String() flow: str = ""
collection = String() user: str = ""
tags = Array(String()) collection: str = ""
tags: list[str] = field(default_factory=list)
class Criteria(Record): @dataclass
key = String() class Criteria:
value = String() key: str = ""
operator = String() value: str = ""
operator: str = ""
class LibrarianRequest(Record):
@dataclass
class LibrarianRequest:
# add-document, remove-document, update-document, get-document-metadata, # add-document, remove-document, update-document, get-document-metadata,
# get-document-content, add-processing, remove-processing, list-documents, # get-document-content, add-processing, remove-processing, list-documents,
# list-processing # list-processing
operation = String() operation: str = ""
# add-document, remove-document, update-document, get-document-metadata, # add-document, remove-document, update-document, get-document-metadata,
# get-document-content # get-document-content
document_id = String() document_id: str = ""
# add-processing, remove-processing # add-processing, remove-processing
processing_id = String() processing_id: str = ""
# add-document, update-document # add-document, update-document
document_metadata = DocumentMetadata() document_metadata: DocumentMetadata | None = None
# add-processing # add-processing
processing_metadata = ProcessingMetadata() processing_metadata: ProcessingMetadata | None = None
# add-document # add-document
content = Bytes() content: bytes = b""
# list-documents, list-processing # list-documents, list-processing
user = String() user: str = ""
# list-documents?, list-processing? # list-documents?, list-processing?
collection = String() collection: str = ""
# #
criteria = Array(Criteria()) criteria: list[Criteria] = field(default_factory=list)
class LibrarianResponse(Record): @dataclass
error = Error() class LibrarianResponse:
document_metadata = DocumentMetadata() error: Error | None = None
content = Bytes() document_metadata: DocumentMetadata | None = None
document_metadatas = Array(DocumentMetadata()) content: bytes = b""
processing_metadatas = Array(ProcessingMetadata()) document_metadatas: list[DocumentMetadata] = field(default_factory=list)
processing_metadatas: list[ProcessingMetadata] = field(default_factory=list)
# FIXME: Is this right? Using persistence on librarian so that # FIXME: Is this right? Using persistence on librarian so that
# message chunking works # message chunking works
librarian_request_queue = topic( librarian_request_queue = topic(
'librarian', kind='persistent', namespace='request' 'librarian', qos='q1', namespace='request'
) )
librarian_response_queue = topic( librarian_response_queue = topic(
'librarian', kind='persistent', namespace='response', 'librarian', qos='q1', namespace='response',
) )


@@ -1,5 +1,5 @@
from pulsar.schema import Record, String, Array, Double, Integer, Boolean from dataclasses import dataclass, field
from ..core.topic import topic from ..core.topic import topic
from ..core.primitives import Error from ..core.primitives import Error
@@ -8,46 +8,49 @@ from ..core.primitives import Error
# LLM text completion # LLM text completion
class TextCompletionRequest(Record): @dataclass
system = String() class TextCompletionRequest:
prompt = String() system: str = ""
streaming = Boolean() # Default false for backward compatibility prompt: str = ""
streaming: bool = False # Default false for backward compatibility
class TextCompletionResponse(Record): @dataclass
error = Error() class TextCompletionResponse:
response = String() error: Error | None = None
in_token = Integer() response: str = ""
out_token = Integer() in_token: int = 0
model = String() out_token: int = 0
end_of_stream = Boolean() # Indicates final message in stream model: str = ""
end_of_stream: bool = False # Indicates final message in stream
############################################################################ ############################################################################
# Embeddings # Embeddings
class EmbeddingsRequest(Record): @dataclass
text = String() class EmbeddingsRequest:
text: str = ""
class EmbeddingsResponse(Record): @dataclass
error = Error() class EmbeddingsResponse:
vectors = Array(Array(Double())) error: Error | None = None
vectors: list[list[float]] = field(default_factory=list)
############################################################################ ############################################################################
# Tool request/response # Tool request/response
class ToolRequest(Record): @dataclass
name = String() class ToolRequest:
name: str = ""
# Parameters are JSON encoded # Parameters are JSON encoded
parameters = String() parameters: str = ""
class ToolResponse(Record):
error = Error()
@dataclass
class ToolResponse:
error: Error | None = None
# Plain text aka "unstructured" # Plain text aka "unstructured"
text = String() text: str = ""
# JSON-encoded object aka "structured" # JSON-encoded object aka "structured"
object = String() object: str = ""


@@ -1,5 +1,4 @@
from dataclasses import dataclass
from pulsar.schema import Record, String
from ..core.primitives import Error, Value, Triple from ..core.primitives import Error, Value, Triple
from ..core.topic import topic from ..core.topic import topic
@@ -9,13 +8,14 @@ from ..core.metadata import Metadata
# Lookups # Lookups
class LookupRequest(Record): @dataclass
kind = String() class LookupRequest:
term = String() kind: str = ""
term: str = ""
class LookupResponse(Record): @dataclass
text = String() class LookupResponse:
error = Error() text: str = ""
error: Error | None = None
############################################################################ ############################################################################


@@ -1,4 +1,4 @@
from pulsar.schema import Record, String, Array, Map, Integer, Double from dataclasses import dataclass, field
from ..core.primitives import Error from ..core.primitives import Error
from ..core.topic import topic from ..core.topic import topic
@@ -7,15 +7,18 @@ from ..core.topic import topic
# NLP to Structured Query Service - converts natural language to GraphQL # NLP to Structured Query Service - converts natural language to GraphQL
class QuestionToStructuredQueryRequest(Record): @dataclass
question = String() class QuestionToStructuredQueryRequest:
max_results = Integer() question: str = ""
max_results: int = 0
class QuestionToStructuredQueryResponse(Record): @dataclass
error = Error() class QuestionToStructuredQueryResponse:
graphql_query = String() # Generated GraphQL query error: Error | None = None
variables = Map(String()) # GraphQL variables if any graphql_query: str = "" # Generated GraphQL query
detected_schemas = Array(String()) # Which schemas the query targets variables: dict[str, str] = field(default_factory=dict) # GraphQL variables if any
confidence = Double() detected_schemas: list[str] = field(default_factory=list) # Which schemas the query targets
confidence: float = 0.0
############################################################################ ############################################################################


@@ -1,4 +1,4 @@
from pulsar.schema import Record, String, Map, Array from dataclasses import dataclass, field
from ..core.primitives import Error from ..core.primitives import Error
from ..core.topic import topic from ..core.topic import topic
@@ -7,22 +7,25 @@ from ..core.topic import topic
# Objects Query Service - executes GraphQL queries against structured data # Objects Query Service - executes GraphQL queries against structured data
class GraphQLError(Record): @dataclass
message = String() class GraphQLError:
path = Array(String()) # Path to the field that caused the error message: str = ""
extensions = Map(String()) # Additional error metadata path: list[str] = field(default_factory=list) # Path to the field that caused the error
extensions: dict[str, str] = field(default_factory=dict) # Additional error metadata
class ObjectsQueryRequest(Record): @dataclass
user = String() # Cassandra keyspace (follows pattern from TriplesQueryRequest) class ObjectsQueryRequest:
collection = String() # Data collection identifier (required for partition key) user: str = "" # Cassandra keyspace (follows pattern from TriplesQueryRequest)
query = String() # GraphQL query string collection: str = "" # Data collection identifier (required for partition key)
variables = Map(String()) # GraphQL variables query: str = "" # GraphQL query string
operation_name = String() # Operation to execute for multi-operation documents variables: dict[str, str] = field(default_factory=dict) # GraphQL variables
operation_name: str = "" # Operation to execute for multi-operation documents
class ObjectsQueryResponse(Record): @dataclass
error = Error() # System-level error (connection, timeout, etc.) class ObjectsQueryResponse:
data = String() # JSON-encoded GraphQL response data error: Error | None = None # System-level error (connection, timeout, etc.)
errors = Array(GraphQLError()) # GraphQL field-level errors data: str = "" # JSON-encoded GraphQL response data
extensions = Map(String()) # Query metadata (execution time, etc.) errors: list[GraphQLError] = field(default_factory=list) # GraphQL field-level errors
extensions: dict[str, str] = field(default_factory=dict) # Query metadata (execution time, etc.)
############################################################################ ############################################################################


@@ -1,4 +1,4 @@
from pulsar.schema import Record, String, Map, Boolean from dataclasses import dataclass, field
from ..core.primitives import Error from ..core.primitives import Error
from ..core.topic import topic from ..core.topic import topic
@@ -18,27 +18,28 @@ from ..core.topic import topic
# extract-rows # extract-rows
# schema, chunk -> rows # schema, chunk -> rows
class PromptRequest(Record): @dataclass
id = String() class PromptRequest:
id: str = ""
# JSON encoded values # JSON encoded values
terms = Map(String()) terms: dict[str, str] = field(default_factory=dict)
# Streaming support (default false for backward compatibility) # Streaming support (default false for backward compatibility)
streaming = Boolean() streaming: bool = False
class PromptResponse(Record):
@dataclass
class PromptResponse:
# Error case # Error case
error = Error() error: Error | None = None
# Just plain text # Just plain text
text = String() text: str = ""
# JSON encoded # JSON encoded
object = String() object: str = ""
# Indicates final message in stream # Indicates final message in stream
end_of_stream = Boolean() end_of_stream: bool = False
############################################################################ ############################################################################


@@ -1,4 +1,4 @@
from pulsar.schema import Record, String, Integer, Array, Double from dataclasses import dataclass, field
from ..core.primitives import Error, Value, Triple from ..core.primitives import Error, Value, Triple
from ..core.topic import topic from ..core.topic import topic
@@ -7,49 +7,55 @@ from ..core.topic import topic
# Graph embeddings query # Graph embeddings query
class GraphEmbeddingsRequest(Record): @dataclass
vectors = Array(Array(Double())) class GraphEmbeddingsRequest:
limit = Integer() vectors: list[list[float]] = field(default_factory=list)
user = String() limit: int = 0
collection = String() user: str = ""
collection: str = ""
class GraphEmbeddingsResponse(Record): @dataclass
error = Error() class GraphEmbeddingsResponse:
entities = Array(Value()) error: Error | None = None
entities: list[Value] = field(default_factory=list)
############################################################################ ############################################################################
# Graph triples query # Graph triples query
class TriplesQueryRequest(Record): @dataclass
user = String() class TriplesQueryRequest:
collection = String() user: str = ""
s = Value() collection: str = ""
p = Value() s: Value | None = None
o = Value() p: Value | None = None
limit = Integer() o: Value | None = None
limit: int = 0
class TriplesQueryResponse(Record): @dataclass
error = Error() class TriplesQueryResponse:
triples = Array(Triple()) error: Error | None = None
triples: list[Triple] = field(default_factory=list)
############################################################################ ############################################################################
# Doc embeddings query # Doc embeddings query
class DocumentEmbeddingsRequest(Record): @dataclass
vectors = Array(Array(Double())) class DocumentEmbeddingsRequest:
limit = Integer() vectors: list[list[float]] = field(default_factory=list)
user = String() limit: int = 0
collection = String() user: str = ""
collection: str = ""
class DocumentEmbeddingsResponse(Record): @dataclass
error = Error() class DocumentEmbeddingsResponse:
chunks = Array(String()) error: Error | None = None
chunks: list[str] = field(default_factory=list)
document_embeddings_request_queue = topic( document_embeddings_request_queue = topic(
"non-persistent://trustgraph/document-embeddings-request" "document-embeddings-request", qos='q0', tenant='trustgraph', namespace='flow'
) )
document_embeddings_response_queue = topic( document_embeddings_response_queue = topic(
"non-persistent://trustgraph/document-embeddings-response" "document-embeddings-response", qos='q0', tenant='trustgraph', namespace='flow'
) )


@@ -1,5 +1,4 @@
from dataclasses import dataclass
from pulsar.schema import Record, Bytes, String, Boolean, Integer, Array, Double
from ..core.topic import topic from ..core.topic import topic
from ..core.primitives import Error, Value from ..core.primitives import Error, Value
@@ -7,36 +6,37 @@ from ..core.primitives import Error, Value
# Graph RAG text retrieval # Graph RAG text retrieval
class GraphRagQuery(Record): @dataclass
query = String() class GraphRagQuery:
user = String() query: str = ""
collection = String() user: str = ""
entity_limit = Integer() collection: str = ""
triple_limit = Integer() entity_limit: int = 0
max_subgraph_size = Integer() triple_limit: int = 0
max_path_length = Integer() max_subgraph_size: int = 0
streaming = Boolean() max_path_length: int = 0
streaming: bool = False
class GraphRagResponse(Record): @dataclass
error = Error() class GraphRagResponse:
response = String() error: Error | None = None
chunk = String() response: str = ""
end_of_stream = Boolean() end_of_stream: bool = False
############################################################################ ############################################################################
# Document RAG text retrieval # Document RAG text retrieval
class DocumentRagQuery(Record): @dataclass
query = String() class DocumentRagQuery:
user = String() query: str = ""
collection = String() user: str = ""
doc_limit = Integer() collection: str = ""
streaming = Boolean() doc_limit: int = 0
streaming: bool = False
class DocumentRagResponse(Record):
error = Error()
response = String()
chunk = String()
end_of_stream = Boolean()
@dataclass
class DocumentRagResponse:
error: Error | None = None
response: str = ""
end_of_stream: bool = False


@@ -1,4 +1,4 @@
from pulsar.schema import Record, String from dataclasses import dataclass
from ..core.primitives import Error from ..core.primitives import Error
from ..core.topic import topic from ..core.topic import topic
@@ -7,15 +7,17 @@ from ..core.topic import topic
# Storage management operations # Storage management operations
class StorageManagementRequest(Record): @dataclass
class StorageManagementRequest:
"""Request for storage management operations sent to store processors""" """Request for storage management operations sent to store processors"""
operation = String() # e.g., "delete-collection" operation: str = "" # e.g., "delete-collection"
user = String() user: str = ""
collection = String() collection: str = ""
class StorageManagementResponse(Record): @dataclass
class StorageManagementResponse:
"""Response from storage processors for management operations""" """Response from storage processors for management operations"""
error = Error() # Only populated if there's an error, if null success error: Error | None = None # Only populated if there's an error, if null success
############################################################################ ############################################################################
@@ -23,20 +25,21 @@ class StorageManagementResponse(Record):
# Topics for sending collection management requests to different storage types # Topics for sending collection management requests to different storage types
vector_storage_management_topic = topic( vector_storage_management_topic = topic(
'vector-storage-management', kind='non-persistent', namespace='request' 'vector-storage-management', qos='q0', namespace='request'
) )
object_storage_management_topic = topic( object_storage_management_topic = topic(
'object-storage-management', kind='non-persistent', namespace='request' 'object-storage-management', qos='q0', namespace='request'
) )
triples_storage_management_topic = topic( triples_storage_management_topic = topic(
'triples-storage-management', kind='non-persistent', namespace='request' 'triples-storage-management', qos='q0', namespace='request'
) )
# Topic for receiving responses from storage processors # Topic for receiving responses from storage processors
storage_management_response_topic = topic( storage_management_response_topic = topic(
'storage-management', kind='non-persistent', namespace='response' 'storage-management', qos='q0', namespace='response'
) )
############################################################################ ############################################################################


@@ -1,4 +1,4 @@
from pulsar.schema import Record, String, Map, Array from dataclasses import dataclass, field
from ..core.primitives import Error from ..core.primitives import Error
from ..core.topic import topic from ..core.topic import topic
@@ -7,14 +7,17 @@ from ..core.topic import topic
# Structured Query Service - executes GraphQL queries # Structured Query Service - executes GraphQL queries
class StructuredQueryRequest(Record): @dataclass
question = String() class StructuredQueryRequest:
user = String() # Cassandra keyspace identifier question: str = ""
collection = String() # Data collection identifier user: str = "" # Cassandra keyspace identifier
collection: str = "" # Data collection identifier
class StructuredQueryResponse(Record): @dataclass
error = Error() class StructuredQueryResponse:
data = String() # JSON-encoded GraphQL response data error: Error | None = None
errors = Array(String()) # GraphQL errors if any data: str = "" # JSON-encoded GraphQL response data
errors: list[str] = field(default_factory=list) # GraphQL errors if any
############################################################################ ############################################################################


@@ -17,6 +17,7 @@ from datetime import datetime
import argparse import argparse
from trustgraph.base.subscriber import Subscriber from trustgraph.base.subscriber import Subscriber
from trustgraph.base.pubsub import get_pubsub
def format_message(queue_name, msg): def format_message(queue_name, msg):
"""Format a message with timestamp and queue name.""" """Format a message with timestamp and queue name."""
@@ -167,11 +168,11 @@ async def async_main(queues, output_file, pulsar_host, listener_name, subscriber
print(f"Mode: {'append' if append_mode else 'overwrite'}") print(f"Mode: {'append' if append_mode else 'overwrite'}")
print(f"Press Ctrl+C to stop\n") print(f"Press Ctrl+C to stop\n")
# Connect to Pulsar # Create backend connection
try: try:
client = pulsar.Client(pulsar_host, listener_name=listener_name) backend = get_pubsub(pulsar_host=pulsar_host, pulsar_listener=listener_name, pubsub_backend='pulsar')
except Exception as e: except Exception as e:
print(f"Error connecting to Pulsar at {pulsar_host}: {e}", file=sys.stderr) print(f"Error connecting to backend at {pulsar_host}: {e}", file=sys.stderr)
sys.exit(1) sys.exit(1)
# Create Subscribers and central queue # Create Subscribers and central queue
@@ -181,7 +182,7 @@ async def async_main(queues, output_file, pulsar_host, listener_name, subscriber
for queue_name in queues: for queue_name in queues:
try: try:
sub = Subscriber( sub = Subscriber(
client=client, backend=backend,
topic=queue_name, topic=queue_name,
subscription=subscriber_name, subscription=subscriber_name,
consumer_name=f"{subscriber_name}-{queue_name}", consumer_name=f"{subscriber_name}-{queue_name}",
@@ -195,7 +196,7 @@ async def async_main(queues, output_file, pulsar_host, listener_name, subscriber
if not subscribers: if not subscribers:
print("\nNo subscribers created. Exiting.", file=sys.stderr) print("\nNo subscribers created. Exiting.", file=sys.stderr)
client.close() backend.close()
sys.exit(1) sys.exit(1)
print(f"\nListening for messages...\n") print(f"\nListening for messages...\n")
@@ -256,7 +257,7 @@ async def async_main(queues, output_file, pulsar_host, listener_name, subscriber
# Clean shutdown of Subscribers # Clean shutdown of Subscribers
for _, sub in subscribers: for _, sub in subscribers:
await sub.stop() await sub.stop()
client.close() backend.close()
print(f"\nMessages logged to: {output_file}") print(f"\nMessages logged to: {output_file}")
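The dump tool no longer constructs `pulsar.Client` itself. It asks `get_pubsub(...)` for a backend object, passes that object to `Subscriber`, and calls `close()` on it at shutdown. A name-to-factory registry is one common way to make such backends pluggable, and the sketch below shows that pattern under stated assumptions. `register_backend`, `InMemoryBackend` and `get_pubsub_sketch` are hypothetical names, and the real `get_pubsub` in `trustgraph.base.pubsub` may be organized differently.

```python
from typing import Callable, Protocol

class PubSubBackend(Protocol):
    """Minimal surface the tool above relies on (assumed)."""
    def close(self) -> None: ...

# Hypothetical registry: backend name -> factory accepting keyword params.
_BACKENDS: dict[str, Callable[..., PubSubBackend]] = {}

def register_backend(name: str):
    """Decorator adding a backend factory to the registry."""
    def wrap(factory):
        _BACKENDS[name] = factory
        return factory
    return wrap

@register_backend("memory")
class InMemoryBackend:
    """Toy backend standing in for a real plugin."""
    def __init__(self, **params):
        self.params = params
        self.closed = False

    def close(self) -> None:
        self.closed = True

def get_pubsub_sketch(pubsub_backend: str = "memory", **params) -> PubSubBackend:
    """Look up the named backend and pass the remaining params through."""
    try:
        factory = _BACKENDS[pubsub_backend]
    except KeyError:
        raise ValueError(
            f"unknown pubsub backend {pubsub_backend!r}; "
            f"available: {sorted(_BACKENDS)}"
        ) from None
    return factory(**params)

backend = get_pubsub_sketch("memory", host="localhost")
backend.close()
print(backend.closed)  # True
```

Callers depend only on the small protocol, so adding another transport means registering one more factory and leaves tools such as this one unchanged.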


@@ -24,7 +24,7 @@ def question(url, flow_id, question, user, collection, doc_limit, streaming=True
try: try:
response = flow.document_rag( response = flow.document_rag(
question=question, query=question,
user=user, user=user,
collection=collection, collection=collection,
doc_limit=doc_limit, doc_limit=doc_limit,
@@ -42,7 +42,7 @@ def question(url, flow_id, question, user, collection, doc_limit, streaming=True
# Use REST API for non-streaming # Use REST API for non-streaming
flow = api.flow().id(flow_id) flow = api.flow().id(flow_id)
resp = flow.document_rag( resp = flow.document_rag(
question=question, query=question,
user=user, user=user,
collection=collection, collection=collection,
doc_limit=doc_limit, doc_limit=doc_limit,


@@ -30,7 +30,7 @@ def question(
try: try:
response = flow.graph_rag( response = flow.graph_rag(
question=question, query=question,
user=user, user=user,
collection=collection, collection=collection,
entity_limit=entity_limit, entity_limit=entity_limit,
@@ -51,7 +51,7 @@ def question(
# Use REST API for non-streaming # Use REST API for non-streaming
flow = api.flow().id(flow_id) flow = api.flow().id(flow_id)
resp = flow.graph_rag( resp = flow.graph_rag(
question=question, query=question,
user=user, user=user,
collection=collection, collection=collection,
entity_limit=entity_limit, entity_limit=entity_limit,


@@ -433,13 +433,11 @@ class Processor(AgentService):
end_of_dialog=True, end_of_dialog=True,
# Legacy fields for backward compatibility # Legacy fields for backward compatibility
error=error_obj, error=error_obj,
response=None,
) )
else: else:
# Legacy format # Legacy format
r = AgentResponse( r = AgentResponse(
error=error_obj, error=error_obj,
response=None,
) )
await respond(r) await respond(r)


@@ -95,9 +95,6 @@ class Configuration:
return ConfigResponse( return ConfigResponse(
version = await self.get_version(), version = await self.get_version(),
values = values, values = values,
directory = None,
config = None,
error = None,
) )
async def handle_list(self, v): async def handle_list(self, v):
@@ -117,10 +114,7 @@ class Configuration:
return ConfigResponse( return ConfigResponse(
version = await self.get_version(), version = await self.get_version(),
values = None,
directory = await self.table_store.get_keys(v.type), directory = await self.table_store.get_keys(v.type),
config = None,
error = None,
) )
async def handle_getvalues(self, v): async def handle_getvalues(self, v):
@@ -150,9 +144,6 @@ class Configuration:
return ConfigResponse( return ConfigResponse(
version = await self.get_version(), version = await self.get_version(),
values = list(values), values = list(values),
directory = None,
config = None,
error = None,
) )
async def handle_delete(self, v): async def handle_delete(self, v):
@@ -179,12 +170,6 @@ class Configuration:
await self.push() await self.push()
return ConfigResponse( return ConfigResponse(
version = None,
value = None,
directory = None,
values = None,
config = None,
error = None,
) )
async def handle_put(self, v): async def handle_put(self, v):
@ -198,11 +183,6 @@ class Configuration:
await self.push() await self.push()
return ConfigResponse( return ConfigResponse(
version = None,
value = None,
directory = None,
values = None,
error = None,
) )
async def get_config(self): async def get_config(self):
@ -224,11 +204,7 @@ class Configuration:
return ConfigResponse( return ConfigResponse(
version = await self.get_version(), version = await self.get_version(),
value = None,
directory = None,
values = None,
config = config, config = config,
error = None,
) )
async def handle(self, msg): async def handle(self, msg):
@ -262,9 +238,6 @@ class Configuration:
else: else:
resp = ConfigResponse( resp = ConfigResponse(
value=None,
directory=None,
values=None,
error=Error( error=Error(
type = "bad-operation", type = "bad-operation",
message = "Bad operation" message = "Bad operation"


@ -361,9 +361,6 @@ class FlowConfig:
else: else:
resp = FlowResponse( resp = FlowResponse(
value=None,
directory=None,
values=None,
error=Error( error=Error(
type = "bad-operation", type = "bad-operation",
message = "Bad operation" message = "Bad operation"


@ -112,7 +112,7 @@ class Processor(AsyncProcessor):
self.config_request_consumer = Consumer( self.config_request_consumer = Consumer(
taskgroup = self.taskgroup, taskgroup = self.taskgroup,
client = self.pulsar_client, backend = self.pubsub,
flow = None, flow = None,
topic = config_request_queue, topic = config_request_queue,
subscriber = id, subscriber = id,
@ -122,14 +122,14 @@ class Processor(AsyncProcessor):
) )
self.config_response_producer = Producer( self.config_response_producer = Producer(
client = self.pulsar_client, backend = self.pubsub,
topic = config_response_queue, topic = config_response_queue,
schema = ConfigResponse, schema = ConfigResponse,
metrics = config_response_metrics, metrics = config_response_metrics,
) )
self.config_push_producer = Producer( self.config_push_producer = Producer(
client = self.pulsar_client, backend = self.pubsub,
topic = config_push_queue, topic = config_push_queue,
schema = ConfigPush, schema = ConfigPush,
metrics = config_push_metrics, metrics = config_push_metrics,
@ -137,7 +137,7 @@ class Processor(AsyncProcessor):
self.flow_request_consumer = Consumer( self.flow_request_consumer = Consumer(
taskgroup = self.taskgroup, taskgroup = self.taskgroup,
client = self.pulsar_client, backend = self.pubsub,
flow = None, flow = None,
topic = flow_request_queue, topic = flow_request_queue,
subscriber = id, subscriber = id,
@ -147,7 +147,7 @@ class Processor(AsyncProcessor):
) )
self.flow_response_producer = Producer( self.flow_response_producer = Producer(
client = self.pulsar_client, backend = self.pubsub,
topic = flow_response_queue, topic = flow_response_queue,
schema = FlowResponse, schema = FlowResponse,
metrics = flow_response_metrics, metrics = flow_response_metrics,
@ -178,11 +178,7 @@ class Processor(AsyncProcessor):
resp = ConfigPush( resp = ConfigPush(
version = version, version = version,
value = None,
directory = None,
values = None,
config = config, config = config,
error = None,
) )
await self.config_push_producer.send(resp) await self.config_push_producer.send(resp)
@ -215,7 +211,6 @@ class Processor(AsyncProcessor):
type = "config-error", type = "config-error",
message = str(e), message = str(e),
), ),
text=None,
) )
await self.config_response_producer.send( await self.config_response_producer.send(
@ -240,13 +235,12 @@ class Processor(AsyncProcessor):
) )
except Exception as e: except Exception as e:
resp = FlowResponse( resp = FlowResponse(
error=Error( error=Error(
type = "flow-error", type = "flow-error",
message = str(e), message = str(e),
), ),
text=None,
) )
await self.flow_response_producer.send( await self.flow_response_producer.send(

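Several hunks above drop keyword arguments from response constructors. Some are explicit `None` values on `ConfigResponse` and `ConfigPush`. Others are `text=None` on the `ConfigResponse`/`FlowResponse` error replies. This matches the commit note that stricter schemas uncovered incorrect schema use. The sketch below shows why this matters. It assumes the schema classes behave like Python dataclasses, which is an assumption: the real TrustGraph schema definitions are not part of this diff, and the fields below are hypothetical. Under that assumption, unset optional fields already default to `None`, and a keyword the schema does not declare fails at construction time.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for the schema classes in this diff; the field
# names are illustrative, not TrustGraph's real definitions.
@dataclass
class Error:
    type: str = ""
    message: str = ""


@dataclass
class FlowResponse:
    # Optional fields default to None, so callers can simply omit them.
    flow: Optional[str] = None
    error: Optional[Error] = None


def error_response(exc: Exception) -> FlowResponse:
    # Pass only the fields the schema declares.
    return FlowResponse(error=Error(type="flow-error", message=str(exc)))


def construct_with_stray_field() -> str:
    # A dataclass rejects keywords it does not declare, which is how a
    # leftover argument such as text=None shows up under a strict schema.
    try:
        FlowResponse(error=None, text=None)  # type: ignore[call-arg]
    except TypeError as e:
        return str(e)
    return "accepted"


if __name__ == "__main__":
    resp = error_response(RuntimeError("boom"))
    print(resp.error.message, resp.flow)  # boom None
    print(construct_with_stray_field())
```

Omitting a field and passing `None` explicitly produce the same object. Only the undeclared keyword fails, so removing such arguments does not change behaviour.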

@ -234,11 +234,11 @@ class KnowledgeManager:
logger.debug(f"Graph embeddings queue: {ge_q}") logger.debug(f"Graph embeddings queue: {ge_q}")
t_pub = Publisher( t_pub = Publisher(
self.flow_config.pulsar_client, t_q, self.flow_config.pubsub, t_q,
schema=Triples, schema=Triples,
) )
ge_pub = Publisher( ge_pub = Publisher(
self.flow_config.pulsar_client, ge_q, self.flow_config.pubsub, ge_q,
schema=GraphEmbeddings schema=GraphEmbeddings
) )


@ -84,7 +84,7 @@ class Processor(AsyncProcessor):
self.knowledge_request_consumer = Consumer( self.knowledge_request_consumer = Consumer(
taskgroup = self.taskgroup, taskgroup = self.taskgroup,
client = self.pulsar_client, backend = self.pubsub,
flow = None, flow = None,
topic = knowledge_request_queue, topic = knowledge_request_queue,
subscriber = id, subscriber = id,
@ -94,7 +94,7 @@ class Processor(AsyncProcessor):
) )
self.knowledge_response_producer = Producer( self.knowledge_response_producer = Producer(
client = self.pulsar_client, backend = self.pubsub,
topic = knowledge_response_queue, topic = knowledge_response_queue,
schema = KnowledgeResponse, schema = KnowledgeResponse,
metrics = knowledge_response_metrics, metrics = knowledge_response_metrics,

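The `Consumer`/`Producer` hunks replace `client = self.pulsar_client` with `backend = self.pubsub`. The messaging layer is now passed in rather than fixed to Pulsar. Below is a minimal sketch of that dependency-injection shape. It uses a hypothetical two-method `PubSubBackend` protocol and a toy in-memory implementation. None of these names or signatures come from the TrustGraph backend API, which this diff does not show.

```python
import asyncio
from typing import Any, Dict, Protocol


class PubSubBackend(Protocol):
    # Hypothetical minimal interface for this sketch only.
    async def send(self, topic: str, message: Any) -> None: ...
    async def receive(self, topic: str) -> Any: ...


class InMemoryBackend:
    """Toy backend: one asyncio.Queue per topic (tests, local runs)."""

    def __init__(self) -> None:
        self._queues: Dict[str, asyncio.Queue] = {}

    def _queue(self, topic: str) -> asyncio.Queue:
        # Topic queues are created lazily on first use.
        if topic not in self._queues:
            self._queues[topic] = asyncio.Queue()
        return self._queues[topic]

    async def send(self, topic: str, message: Any) -> None:
        await self._queue(topic).put(message)

    async def receive(self, topic: str) -> Any:
        return await self._queue(topic).get()


class Producer:
    # Receives a backend rather than a Pulsar client.
    def __init__(self, backend: PubSubBackend, topic: str, schema: type) -> None:
        self.backend = backend
        self.topic = topic
        self.schema = schema

    async def send(self, msg: Any) -> None:
        # Check the declared schema before handing off to the backend.
        if not isinstance(msg, self.schema):
            raise TypeError(
                f"expected {self.schema.__name__}, got {type(msg).__name__}"
            )
        await self.backend.send(self.topic, msg)


async def demo() -> Any:
    backend = InMemoryBackend()
    producer = Producer(backend=backend, topic="config-push", schema=dict)
    await producer.send({"version": 1})
    return await backend.receive("config-push")


if __name__ == "__main__":
    print(asyncio.run(demo()))  # {'version': 1}
```

`Producer` depends only on the protocol. A Pulsar-backed class with the same two methods could therefore be substituted without touching the calling code.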

@ -34,9 +34,9 @@ logger.setLevel(logging.INFO)
class ConfigReceiver: class ConfigReceiver:
def __init__(self, pulsar_client): def __init__(self, backend):
self.pulsar_client = pulsar_client self.backend = backend
self.flow_handlers = [] self.flow_handlers = []
@ -104,8 +104,8 @@ class ConfigReceiver:
self.config_cons = Consumer( self.config_cons = Consumer(
taskgroup = tg, taskgroup = tg,
flow = None, flow = None,
client = self.pulsar_client, backend = self.backend,
subscriber = f"gateway-{id}", subscriber = f"gateway-{id}",
topic = config_push_queue, topic = config_push_queue,
schema = ConfigPush, schema = ConfigPush,
handler = self.on_config, handler = self.on_config,


@ -6,12 +6,12 @@ from . requestor import ServiceRequestor
class AgentRequestor(ServiceRequestor): class AgentRequestor(ServiceRequestor):
def __init__( def __init__(
self, pulsar_client, request_queue, response_queue, timeout, self, backend, request_queue, response_queue, timeout,
consumer, subscriber, consumer, subscriber,
): ):
super(AgentRequestor, self).__init__( super(AgentRequestor, self).__init__(
pulsar_client=pulsar_client, backend=backend,
request_queue=request_queue, request_queue=request_queue,
response_queue=response_queue, response_queue=response_queue,
request_schema=AgentRequest, request_schema=AgentRequest,


@ -5,7 +5,7 @@ from ... messaging import TranslatorRegistry
from . requestor import ServiceRequestor from . requestor import ServiceRequestor
class CollectionManagementRequestor(ServiceRequestor): class CollectionManagementRequestor(ServiceRequestor):
def __init__(self, pulsar_client, consumer, subscriber, timeout=120, def __init__(self, backend, consumer, subscriber, timeout=120,
request_queue=None, response_queue=None): request_queue=None, response_queue=None):
if request_queue is None: if request_queue is None:
@ -14,7 +14,7 @@ class CollectionManagementRequestor(ServiceRequestor):
response_queue = collection_response_queue response_queue = collection_response_queue
super(CollectionManagementRequestor, self).__init__( super(CollectionManagementRequestor, self).__init__(
pulsar_client=pulsar_client, backend=backend,
consumer_name = consumer, consumer_name = consumer,
subscription = subscriber, subscription = subscriber,
request_queue=request_queue, request_queue=request_queue,


@ -7,7 +7,7 @@ from ... messaging import TranslatorRegistry
from . requestor import ServiceRequestor from . requestor import ServiceRequestor
class ConfigRequestor(ServiceRequestor): class ConfigRequestor(ServiceRequestor):
def __init__(self, pulsar_client, consumer, subscriber, timeout=120, def __init__(self, backend, consumer, subscriber, timeout=120,
request_queue=None, response_queue=None): request_queue=None, response_queue=None):
if request_queue is None: if request_queue is None:
@ -16,7 +16,7 @@ class ConfigRequestor(ServiceRequestor):
response_queue = config_response_queue response_queue = config_response_queue
super(ConfigRequestor, self).__init__( super(ConfigRequestor, self).__init__(
pulsar_client=pulsar_client, backend=backend,
consumer_name = consumer, consumer_name = consumer,
subscription = subscriber, subscription = subscriber,
request_queue=request_queue, request_queue=request_queue,


@ -10,9 +10,9 @@ logger = logging.getLogger(__name__)
class CoreExport: class CoreExport:
def __init__(self, pulsar_client): def __init__(self, backend):
self.pulsar_client = pulsar_client self.backend = backend
async def process(self, data, error, ok, request): async def process(self, data, error, ok, request):
id = request.query["id"] id = request.query["id"]
@ -21,7 +21,7 @@ class CoreExport:
response = await ok() response = await ok()
kr = KnowledgeRequestor( kr = KnowledgeRequestor(
pulsar_client = self.pulsar_client, backend = self.backend,
consumer = "api-gateway-core-export-" + str(uuid.uuid4()), consumer = "api-gateway-core-export-" + str(uuid.uuid4()),
subscriber = "api-gateway-core-export-" + str(uuid.uuid4()), subscriber = "api-gateway-core-export-" + str(uuid.uuid4()),
) )


@ -11,8 +11,8 @@ logger = logging.getLogger(__name__)
class CoreImport: class CoreImport:
def __init__(self, pulsar_client): def __init__(self, backend):
self.pulsar_client = pulsar_client self.backend = backend
async def process(self, data, error, ok, request): async def process(self, data, error, ok, request):
@ -20,7 +20,7 @@ class CoreImport:
user = request.query["user"] user = request.query["user"]
kr = KnowledgeRequestor( kr = KnowledgeRequestor(
pulsar_client = self.pulsar_client, backend = self.backend,
consumer = "api-gateway-core-import-" + str(uuid.uuid4()), consumer = "api-gateway-core-import-" + str(uuid.uuid4()),
subscriber = "api-gateway-core-import-" + str(uuid.uuid4()), subscriber = "api-gateway-core-import-" + str(uuid.uuid4()),
) )


@ -15,12 +15,12 @@ logger = logging.getLogger(__name__)
class DocumentEmbeddingsExport: class DocumentEmbeddingsExport:
def __init__( def __init__(
self, ws, running, pulsar_client, queue, consumer, subscriber self, ws, running, backend, queue, consumer, subscriber
): ):
self.ws = ws self.ws = ws
self.running = running self.running = running
self.pulsar_client = pulsar_client self.backend = backend
self.queue = queue self.queue = queue
self.consumer = consumer self.consumer = consumer
self.subscriber = subscriber self.subscriber = subscriber
@ -48,9 +48,9 @@ class DocumentEmbeddingsExport:
async def run(self): async def run(self):
"""Enhanced run with better error handling""" """Enhanced run with better error handling"""
self.subs = Subscriber( self.subs = Subscriber(
client = self.pulsar_client, backend = self.backend,
topic = self.queue, topic = self.queue,
consumer_name = self.consumer, consumer_name = self.consumer,
subscription = self.subscriber, subscription = self.subscriber,
schema = DocumentEmbeddings, schema = DocumentEmbeddings,
backpressure_strategy = "block" # Configurable backpressure_strategy = "block" # Configurable


@ -15,7 +15,7 @@ logger = logging.getLogger(__name__)
class DocumentEmbeddingsImport: class DocumentEmbeddingsImport:
def __init__( def __init__(
self, ws, running, pulsar_client, queue self, ws, running, backend, queue
): ):
self.ws = ws self.ws = ws
@ -23,7 +23,7 @@ class DocumentEmbeddingsImport:
self.translator = DocumentEmbeddingsTranslator() self.translator = DocumentEmbeddingsTranslator()
self.publisher = Publisher( self.publisher = Publisher(
pulsar_client, topic = queue, schema = DocumentEmbeddings backend, topic = queue, schema = DocumentEmbeddings
) )
async def start(self): async def start(self):


@ -11,10 +11,10 @@ from . sender import ServiceSender
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
class DocumentLoad(ServiceSender): class DocumentLoad(ServiceSender):
def __init__(self, pulsar_client, queue): def __init__(self, backend, queue):
super(DocumentLoad, self).__init__( super(DocumentLoad, self).__init__(
pulsar_client = pulsar_client, backend = backend,
queue = queue, queue = queue,
schema = Document, schema = Document,
) )


@ -6,12 +6,12 @@ from . requestor import ServiceRequestor
class DocumentRagRequestor(ServiceRequestor): class DocumentRagRequestor(ServiceRequestor):
def __init__( def __init__(
self, pulsar_client, request_queue, response_queue, timeout, self, backend, request_queue, response_queue, timeout,
consumer, subscriber, consumer, subscriber,
): ):
super(DocumentRagRequestor, self).__init__( super(DocumentRagRequestor, self).__init__(
pulsar_client=pulsar_client, backend=backend,
request_queue=request_queue, request_queue=request_queue,
response_queue=response_queue, response_queue=response_queue,
request_schema=DocumentRagQuery, request_schema=DocumentRagQuery,


@ -6,12 +6,12 @@ from . requestor import ServiceRequestor
class EmbeddingsRequestor(ServiceRequestor): class EmbeddingsRequestor(ServiceRequestor):
def __init__( def __init__(
self, pulsar_client, request_queue, response_queue, timeout, self, backend, request_queue, response_queue, timeout,
consumer, subscriber, consumer, subscriber,
): ):
super(EmbeddingsRequestor, self).__init__( super(EmbeddingsRequestor, self).__init__(
pulsar_client=pulsar_client, backend=backend,
request_queue=request_queue, request_queue=request_queue,
response_queue=response_queue, response_queue=response_queue,
request_schema=EmbeddingsRequest, request_schema=EmbeddingsRequest,


@ -15,12 +15,12 @@ logger = logging.getLogger(__name__)
class EntityContextsExport: class EntityContextsExport:
def __init__( def __init__(
self, ws, running, pulsar_client, queue, consumer, subscriber self, ws, running, backend, queue, consumer, subscriber
): ):
self.ws = ws self.ws = ws
self.running = running self.running = running
self.pulsar_client = pulsar_client self.backend = backend
self.queue = queue self.queue = queue
self.consumer = consumer self.consumer = consumer
self.subscriber = subscriber self.subscriber = subscriber
@ -48,9 +48,9 @@ class EntityContextsExport:
async def run(self): async def run(self):
"""Enhanced run with better error handling""" """Enhanced run with better error handling"""
self.subs = Subscriber( self.subs = Subscriber(
client = self.pulsar_client, backend = self.backend,
topic = self.queue, topic = self.queue,
consumer_name = self.consumer, consumer_name = self.consumer,
subscription = self.subscriber, subscription = self.subscriber,
schema = EntityContexts, schema = EntityContexts,
backpressure_strategy = "block" # Configurable backpressure_strategy = "block" # Configurable


@ -16,14 +16,14 @@ logger = logging.getLogger(__name__)
class EntityContextsImport: class EntityContextsImport:
def __init__( def __init__(
self, ws, running, pulsar_client, queue self, ws, running, backend, queue
): ):
self.ws = ws self.ws = ws
self.running = running self.running = running
self.publisher = Publisher( self.publisher = Publisher(
pulsar_client, topic = queue, schema = EntityContexts backend, topic = queue, schema = EntityContexts
) )
async def start(self): async def start(self):


@ -7,7 +7,7 @@ from ... messaging import TranslatorRegistry
from . requestor import ServiceRequestor from . requestor import ServiceRequestor
class FlowRequestor(ServiceRequestor): class FlowRequestor(ServiceRequestor):
def __init__(self, pulsar_client, consumer, subscriber, timeout=120, def __init__(self, backend, consumer, subscriber, timeout=120,
request_queue=None, response_queue=None): request_queue=None, response_queue=None):
if request_queue is None: if request_queue is None:
@ -16,7 +16,7 @@ class FlowRequestor(ServiceRequestor):
response_queue = flow_response_queue response_queue = flow_response_queue
super(FlowRequestor, self).__init__( super(FlowRequestor, self).__init__(
pulsar_client=pulsar_client, backend=backend,
consumer_name = consumer, consumer_name = consumer,
subscription = subscriber, subscription = subscriber,
request_queue=request_queue, request_queue=request_queue,


@ -15,12 +15,12 @@ logger = logging.getLogger(__name__)
class GraphEmbeddingsExport: class GraphEmbeddingsExport:
def __init__( def __init__(
self, ws, running, pulsar_client, queue, consumer, subscriber self, ws, running, backend, queue, consumer, subscriber
): ):
self.ws = ws self.ws = ws
self.running = running self.running = running
self.pulsar_client = pulsar_client self.backend = backend
self.queue = queue self.queue = queue
self.consumer = consumer self.consumer = consumer
self.subscriber = subscriber self.subscriber = subscriber
@ -48,9 +48,9 @@ class GraphEmbeddingsExport:
async def run(self): async def run(self):
"""Enhanced run with better error handling""" """Enhanced run with better error handling"""
self.subs = Subscriber( self.subs = Subscriber(
client = self.pulsar_client, backend = self.backend,
topic = self.queue, topic = self.queue,
consumer_name = self.consumer, consumer_name = self.consumer,
subscription = self.subscriber, subscription = self.subscriber,
schema = GraphEmbeddings, schema = GraphEmbeddings,
backpressure_strategy = "block" # Configurable backpressure_strategy = "block" # Configurable


@ -16,14 +16,14 @@ logger = logging.getLogger(__name__)
class GraphEmbeddingsImport: class GraphEmbeddingsImport:
def __init__( def __init__(
self, ws, running, pulsar_client, queue self, ws, running, backend, queue
): ):
self.ws = ws self.ws = ws
self.running = running self.running = running
self.publisher = Publisher( self.publisher = Publisher(
pulsar_client, topic = queue, schema = GraphEmbeddings backend, topic = queue, schema = GraphEmbeddings
) )
async def start(self): async def start(self):


@ -6,12 +6,12 @@ from . requestor import ServiceRequestor
class GraphEmbeddingsQueryRequestor(ServiceRequestor): class GraphEmbeddingsQueryRequestor(ServiceRequestor):
def __init__( def __init__(
self, pulsar_client, request_queue, response_queue, timeout, self, backend, request_queue, response_queue, timeout,
consumer, subscriber, consumer, subscriber,
): ):
super(GraphEmbeddingsQueryRequestor, self).__init__( super(GraphEmbeddingsQueryRequestor, self).__init__(
pulsar_client=pulsar_client, backend=backend,
request_queue=request_queue, request_queue=request_queue,
response_queue=response_queue, response_queue=response_queue,
request_schema=GraphEmbeddingsRequest, request_schema=GraphEmbeddingsRequest,


@ -6,12 +6,12 @@ from . requestor import ServiceRequestor
class GraphRagRequestor(ServiceRequestor): class GraphRagRequestor(ServiceRequestor):
def __init__( def __init__(
self, pulsar_client, request_queue, response_queue, timeout, self, backend, request_queue, response_queue, timeout,
consumer, subscriber, consumer, subscriber,
): ):
super(GraphRagRequestor, self).__init__( super(GraphRagRequestor, self).__init__(
pulsar_client=pulsar_client, backend=backend,
request_queue=request_queue, request_queue=request_queue,
response_queue=response_queue, response_queue=response_queue,
request_schema=GraphRagQuery, request_schema=GraphRagQuery,


@ -10,7 +10,7 @@ from ... messaging import TranslatorRegistry
from . requestor import ServiceRequestor from . requestor import ServiceRequestor
class KnowledgeRequestor(ServiceRequestor): class KnowledgeRequestor(ServiceRequestor):
def __init__(self, pulsar_client, consumer, subscriber, timeout=120, def __init__(self, backend, consumer, subscriber, timeout=120,
request_queue=None, response_queue=None): request_queue=None, response_queue=None):
if request_queue is None: if request_queue is None:
@ -19,7 +19,7 @@ class KnowledgeRequestor(ServiceRequestor):
response_queue = knowledge_response_queue response_queue = knowledge_response_queue
super(KnowledgeRequestor, self).__init__( super(KnowledgeRequestor, self).__init__(
pulsar_client=pulsar_client, backend=backend,
consumer_name = consumer, consumer_name = consumer,
subscription = subscriber, subscription = subscriber,
request_queue=request_queue, request_queue=request_queue,


@ -9,7 +9,7 @@ from ... messaging import TranslatorRegistry
from . requestor import ServiceRequestor from . requestor import ServiceRequestor
class LibrarianRequestor(ServiceRequestor): class LibrarianRequestor(ServiceRequestor):
def __init__(self, pulsar_client, consumer, subscriber, timeout=120, def __init__(self, backend, consumer, subscriber, timeout=120,
request_queue=None, response_queue=None): request_queue=None, response_queue=None):
if request_queue is None: if request_queue is None:
@ -18,7 +18,7 @@ class LibrarianRequestor(ServiceRequestor):
response_queue = librarian_response_queue response_queue = librarian_response_queue
super(LibrarianRequestor, self).__init__( super(LibrarianRequestor, self).__init__(
pulsar_client=pulsar_client, backend=backend,
consumer_name = consumer, consumer_name = consumer,
subscription = subscriber, subscription = subscriber,
request_queue=request_queue, request_queue=request_queue,


@ -98,9 +98,9 @@ class DispatcherWrapper:
class DispatcherManager: class DispatcherManager:
def __init__(self, pulsar_client, config_receiver, prefix="api-gateway", def __init__(self, backend, config_receiver, prefix="api-gateway",
queue_overrides=None): queue_overrides=None):
self.pulsar_client = pulsar_client self.backend = backend
self.config_receiver = config_receiver self.config_receiver = config_receiver
self.config_receiver.add_handler(self) self.config_receiver.add_handler(self)
self.prefix = prefix self.prefix = prefix
@ -133,12 +133,12 @@ class DispatcherManager:
async def process_core_import(self, data, error, ok, request): async def process_core_import(self, data, error, ok, request):
ci = CoreImport(self.pulsar_client) ci = CoreImport(self.backend)
return await ci.process(data, error, ok, request) return await ci.process(data, error, ok, request)
async def process_core_export(self, data, error, ok, request): async def process_core_export(self, data, error, ok, request):
ce = CoreExport(self.pulsar_client) ce = CoreExport(self.backend)
return await ce.process(data, error, ok, request) return await ce.process(data, error, ok, request)
async def process_global_service(self, data, responder, params): async def process_global_service(self, data, responder, params):
@ -161,7 +161,7 @@ class DispatcherManager:
response_queue = self.queue_overrides[kind].get("response") response_queue = self.queue_overrides[kind].get("response")
dispatcher = global_dispatchers[kind]( dispatcher = global_dispatchers[kind](
pulsar_client = self.pulsar_client, backend = self.backend,
timeout = 120, timeout = 120,
consumer = f"{self.prefix}-{kind}-request", consumer = f"{self.prefix}-{kind}-request",
subscriber = f"{self.prefix}-{kind}-request", subscriber = f"{self.prefix}-{kind}-request",
@ -216,7 +216,7 @@ class DispatcherManager:
id = str(uuid.uuid4()) id = str(uuid.uuid4())
dispatcher = import_dispatchers[kind]( dispatcher = import_dispatchers[kind](
pulsar_client = self.pulsar_client, backend = self.backend,
ws = ws, ws = ws,
running = running, running = running,
queue = qconfig, queue = qconfig,
@ -254,7 +254,7 @@ class DispatcherManager:
id = str(uuid.uuid4()) id = str(uuid.uuid4())
dispatcher = export_dispatchers[kind]( dispatcher = export_dispatchers[kind](
pulsar_client = self.pulsar_client, backend = self.backend,
ws = ws, ws = ws,
running = running, running = running,
queue = qconfig, queue = qconfig,
@ -296,7 +296,7 @@ class DispatcherManager:
if kind in request_response_dispatchers: if kind in request_response_dispatchers:
dispatcher = request_response_dispatchers[kind]( dispatcher = request_response_dispatchers[kind](
pulsar_client = self.pulsar_client, backend = self.backend,
request_queue = qconfig["request"], request_queue = qconfig["request"],
response_queue = qconfig["response"], response_queue = qconfig["response"],
timeout = 120, timeout = 120,
@ -305,7 +305,7 @@ class DispatcherManager:
) )
elif kind in sender_dispatchers: elif kind in sender_dispatchers:
dispatcher = sender_dispatchers[kind]( dispatcher = sender_dispatchers[kind](
pulsar_client = self.pulsar_client, backend = self.backend,
queue = qconfig, queue = qconfig,
) )
else: else:


@ -6,12 +6,12 @@ from . requestor import ServiceRequestor
class McpToolRequestor(ServiceRequestor): class McpToolRequestor(ServiceRequestor):
def __init__( def __init__(
self, pulsar_client, request_queue, response_queue, timeout, self, backend, request_queue, response_queue, timeout,
consumer, subscriber, consumer, subscriber,
): ):
super(McpToolRequestor, self).__init__( super(McpToolRequestor, self).__init__(
pulsar_client=pulsar_client, backend=backend,
request_queue=request_queue, request_queue=request_queue,
response_queue=response_queue, response_queue=response_queue,
request_schema=ToolRequest, request_schema=ToolRequest,


@ -5,12 +5,12 @@ from . requestor import ServiceRequestor
class NLPQueryRequestor(ServiceRequestor): class NLPQueryRequestor(ServiceRequestor):
def __init__( def __init__(
self, pulsar_client, request_queue, response_queue, timeout, self, backend, request_queue, response_queue, timeout,
consumer, subscriber, consumer, subscriber,
): ):
super(NLPQueryRequestor, self).__init__( super(NLPQueryRequestor, self).__init__(
pulsar_client=pulsar_client, backend=backend,
request_queue=request_queue, request_queue=request_queue,
response_queue=response_queue, response_queue=response_queue,
request_schema=QuestionToStructuredQueryRequest, request_schema=QuestionToStructuredQueryRequest,


@ -15,14 +15,14 @@ logger = logging.getLogger(__name__)
class ObjectsImport: class ObjectsImport:
def __init__( def __init__(
self, ws, running, pulsar_client, queue self, ws, running, backend, queue
): ):
self.ws = ws self.ws = ws
self.running = running self.running = running
self.publisher = Publisher( self.publisher = Publisher(
pulsar_client, topic = queue, schema = ExtractedObject backend, topic = queue, schema = ExtractedObject
) )
async def start(self): async def start(self):


@ -5,12 +5,12 @@ from . requestor import ServiceRequestor
class ObjectsQueryRequestor(ServiceRequestor): class ObjectsQueryRequestor(ServiceRequestor):
def __init__( def __init__(
self, pulsar_client, request_queue, response_queue, timeout, self, backend, request_queue, response_queue, timeout,
consumer, subscriber, consumer, subscriber,
): ):
super(ObjectsQueryRequestor, self).__init__( super(ObjectsQueryRequestor, self).__init__(
pulsar_client=pulsar_client, backend=backend,
request_queue=request_queue, request_queue=request_queue,
response_queue=response_queue, response_queue=response_queue,
request_schema=ObjectsQueryRequest, request_schema=ObjectsQueryRequest,


@ -8,12 +8,12 @@ from . requestor import ServiceRequestor
class PromptRequestor(ServiceRequestor): class PromptRequestor(ServiceRequestor):
def __init__( def __init__(
self, pulsar_client, request_queue, response_queue, timeout, self, backend, request_queue, response_queue, timeout,
consumer, subscriber, consumer, subscriber,
): ):
super(PromptRequestor, self).__init__( super(PromptRequestor, self).__init__(
pulsar_client=pulsar_client, backend=backend,
request_queue=request_queue, request_queue=request_queue,
response_queue=response_queue, response_queue=response_queue,
request_schema=PromptRequest, request_schema=PromptRequest,


@ -13,7 +13,7 @@ class ServiceRequestor:
def __init__( def __init__(
self, self,
pulsar_client, backend,
request_queue, request_schema, request_queue, request_schema,
response_queue, response_schema, response_queue, response_schema,
subscription="api-gateway", consumer_name="api-gateway", subscription="api-gateway", consumer_name="api-gateway",
@ -21,12 +21,12 @@ class ServiceRequestor:
): ):
self.pub = Publisher( self.pub = Publisher(
pulsar_client, request_queue, backend, request_queue,
schema=request_schema, schema=request_schema,
) )
self.sub = Subscriber( self.sub = Subscriber(
pulsar_client, response_queue, backend, response_queue,
subscription, consumer_name, subscription, consumer_name,
response_schema response_schema
) )

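`ServiceRequestor` now builds both its `Publisher` (request queue) and its `Subscriber` (response queue) from the same `backend`. The sketch below illustrates that request/response pairing over a toy in-memory backend, matching replies to requests with a correlation id. The correlation-id scheme, the `Requestor` class and the echo service are assumptions for illustration. They are not TrustGraph's implementation.

```python
import asyncio
import contextlib
import uuid
from typing import Any, Dict, List, Optional


class InMemoryBackend:
    # Hypothetical toy backend: one asyncio.Queue per topic.
    def __init__(self) -> None:
        self.queues: Dict[str, asyncio.Queue] = {}

    def queue(self, topic: str) -> asyncio.Queue:
        if topic not in self.queues:
            self.queues[topic] = asyncio.Queue()
        return self.queues[topic]


class Requestor:
    """Publishes on request_queue and routes replies from response_queue
    back to callers by correlation id."""

    def __init__(self, backend: InMemoryBackend, request_queue: str,
                 response_queue: str, timeout: float = 120.0) -> None:
        self.backend = backend
        self.request_queue = request_queue
        self.response_queue = response_queue
        self.timeout = timeout
        self.pending: Dict[str, asyncio.Future] = {}
        self.reader: Optional[asyncio.Task] = None

    async def start(self) -> None:
        self.reader = asyncio.create_task(self._read_responses())

    async def stop(self) -> None:
        if self.reader is not None:
            self.reader.cancel()
            with contextlib.suppress(asyncio.CancelledError):
                await self.reader

    async def _read_responses(self) -> None:
        responses = self.backend.queue(self.response_queue)
        while True:
            cid, reply = await responses.get()
            fut = self.pending.pop(cid, None)
            # Replies for unknown or timed-out requests are ignored.
            if fut is not None and not fut.done():
                fut.set_result(reply)

    async def request(self, body: Any) -> Any:
        cid = str(uuid.uuid4())
        fut = asyncio.get_running_loop().create_future()
        self.pending[cid] = fut
        await self.backend.queue(self.request_queue).put((cid, body))
        try:
            return await asyncio.wait_for(fut, self.timeout)
        finally:
            self.pending.pop(cid, None)


async def echo_service(backend: InMemoryBackend, request_queue: str,
                       response_queue: str) -> None:
    # Toy service: upper-cases each request and replies with the same id.
    requests = backend.queue(request_queue)
    while True:
        cid, body = await requests.get()
        await backend.queue(response_queue).put((cid, body.upper()))


async def demo() -> List[Any]:
    backend = InMemoryBackend()
    service = asyncio.create_task(echo_service(backend, "req", "resp"))
    requestor = Requestor(backend, "req", "resp", timeout=1.0)
    await requestor.start()
    try:
        return await asyncio.gather(requestor.request("a"), requestor.request("b"))
    finally:
        await requestor.stop()
        service.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await service


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['A', 'B']
```

Requests and replies travel on separate queues. A single reader task therefore routes each reply to the waiting caller, so concurrent requests can share one response subscription.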

@ -14,12 +14,12 @@ class ServiceSender:
def __init__( def __init__(
self, self,
pulsar_client, backend,
queue, schema, queue, schema,
): ):
self.pub = Publisher( self.pub = Publisher(
pulsar_client, queue, backend, queue,
schema=schema, schema=schema,
) )


@ -13,7 +13,7 @@ class ServiceRequestor:
def __init__( def __init__(
self, self,
pulsar_client, backend,
queue, schema, queue, schema,
handler, handler,
subscription="api-gateway", consumer_name="api-gateway", subscription="api-gateway", consumer_name="api-gateway",
@ -21,7 +21,7 @@ class ServiceRequestor:
): ):
self.sub = Subscriber( self.sub = Subscriber(
pulsar_client, queue, backend, queue,
subscription, consumer_name, subscription, consumer_name,
schema schema
) )


@ -5,12 +5,12 @@ from . requestor import ServiceRequestor
class StructuredDiagRequestor(ServiceRequestor): class StructuredDiagRequestor(ServiceRequestor):
def __init__( def __init__(
self, pulsar_client, request_queue, response_queue, timeout, self, backend, request_queue, response_queue, timeout,
consumer, subscriber, consumer, subscriber,
): ):
super(StructuredDiagRequestor, self).__init__( super(StructuredDiagRequestor, self).__init__(
pulsar_client=pulsar_client, backend=backend,
request_queue=request_queue, request_queue=request_queue,
response_queue=response_queue, response_queue=response_queue,
request_schema=StructuredDataDiagnosisRequest, request_schema=StructuredDataDiagnosisRequest,


@ -5,12 +5,12 @@ from . requestor import ServiceRequestor
class StructuredQueryRequestor(ServiceRequestor): class StructuredQueryRequestor(ServiceRequestor):
def __init__( def __init__(
self, pulsar_client, request_queue, response_queue, timeout, self, backend, request_queue, response_queue, timeout,
consumer, subscriber, consumer, subscriber,
): ):
super(StructuredQueryRequestor, self).__init__( super(StructuredQueryRequestor, self).__init__(
pulsar_client=pulsar_client, backend=backend,
request_queue=request_queue, request_queue=request_queue,
response_queue=response_queue, response_queue=response_queue,
request_schema=StructuredQueryRequest, request_schema=StructuredQueryRequest,


@ -6,12 +6,12 @@ from . requestor import ServiceRequestor
class TextCompletionRequestor(ServiceRequestor): class TextCompletionRequestor(ServiceRequestor):
def __init__( def __init__(
self, pulsar_client, request_queue, response_queue, timeout, self, backend, request_queue, response_queue, timeout,
consumer, subscriber, consumer, subscriber,
): ):
super(TextCompletionRequestor, self).__init__( super(TextCompletionRequestor, self).__init__(
pulsar_client=pulsar_client, backend=backend,
request_queue=request_queue, request_queue=request_queue,
response_queue=response_queue, response_queue=response_queue,
request_schema=TextCompletionRequest, request_schema=TextCompletionRequest,


@ -11,10 +11,10 @@ from . sender import ServiceSender
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
class TextLoad(ServiceSender): class TextLoad(ServiceSender):
def __init__(self, pulsar_client, queue): def __init__(self, backend, queue):
super(TextLoad, self).__init__( super(TextLoad, self).__init__(
pulsar_client = pulsar_client, backend = backend,
queue = queue, queue = queue,
schema = TextDocument, schema = TextDocument,
) )


@@ -15,12 +15,12 @@ logger = logging.getLogger(__name__)
 class TriplesExport:
 def __init__(
-self, ws, running, pulsar_client, queue, consumer, subscriber
+self, ws, running, backend, queue, consumer, subscriber
 ):
 self.ws = ws
 self.running = running
-self.pulsar_client = pulsar_client
+self.backend = backend
 self.queue = queue
 self.consumer = consumer
 self.subscriber = subscriber
@@ -48,9 +48,9 @@ class TriplesExport:
 async def run(self):
 """Enhanced run with better error handling"""
 self.subs = Subscriber(
-client = self.pulsar_client,
+backend = self.backend,
 topic = self.queue,
 consumer_name = self.consumer,
 subscription = self.subscriber,
 schema = Triples,
 backpressure_strategy = "block" # Configurable

@@ -16,14 +16,14 @@ logger = logging.getLogger(__name__)
 class TriplesImport:
 def __init__(
-self, ws, running, pulsar_client, queue
+self, ws, running, backend, queue
 ):
 self.ws = ws
 self.running = running
 self.publisher = Publisher(
-pulsar_client, topic = queue, schema = Triples
+backend, topic = queue, schema = Triples
 )
 async def start(self):

@@ -6,12 +6,12 @@ from . requestor import ServiceRequestor
 class TriplesQueryRequestor(ServiceRequestor):
 def __init__(
-self, pulsar_client, request_queue, response_queue, timeout,
+self, backend, request_queue, response_queue, timeout,
 consumer, subscriber,
 ):
 super(TriplesQueryRequestor, self).__init__(
-pulsar_client=pulsar_client,
+backend=backend,
 request_queue=request_queue,
 response_queue=response_queue,
 request_schema=TriplesQueryRequest,

@@ -10,6 +10,7 @@ import logging
 import os
 from trustgraph.base.logging import setup_logging
+from trustgraph.base.pubsub import get_pubsub
 from . auth import Authenticator
 from . config.receiver import ConfigReceiver
@@ -50,15 +51,8 @@ class Api:
 self.pulsar_listener = config.get("pulsar_listener", None)
-if self.pulsar_api_key:
-self.pulsar_client = pulsar.Client(
-self.pulsar_host, listener_name=self.pulsar_listener,
-authentication=pulsar.AuthenticationToken(self.pulsar_api_key)
-)
-else:
-self.pulsar_client = pulsar.Client(
-self.pulsar_host, listener_name=self.pulsar_listener,
-)
+# Create backend using factory
+self.pubsub_backend = get_pubsub(**config)
 self.prometheus_url = config.get(
 "prometheus_url", default_prometheus_url,
@@ -75,7 +69,7 @@ class Api:
 else:
 self.auth = Authenticator(allow_all=True)
-self.config_receiver = ConfigReceiver(self.pulsar_client)
+self.config_receiver = ConfigReceiver(self.pubsub_backend)
 # Build queue overrides dictionary from CLI arguments
 queue_overrides = {}
@@ -121,7 +115,7 @@ class Api:
 queue_overrides["librarian"]["response"] = librarian_resp
 self.dispatcher_manager = DispatcherManager(
-pulsar_client = self.pulsar_client,
+backend = self.pubsub_backend,
 config_receiver = self.config_receiver,
 prefix = "gateway",
 queue_overrides = queue_overrides,
@@ -174,6 +168,14 @@ def run():
 help='Service identifier for logging and metrics (default: api-gateway)',
 )
+# Pub/sub backend selection
+parser.add_argument(
+'--pubsub-backend',
+default=os.getenv('PUBSUB_BACKEND', 'pulsar'),
+choices=['pulsar', 'mqtt'],
+help='Pub/sub backend (default: pulsar, env: PUBSUB_BACKEND)',
+)
 parser.add_argument(
 '-p', '--pulsar-host',
 default=default_pulsar_host,

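The gateway now passes its whole `config` dictionary to `get_pubsub`, and `--pubsub-backend` (or `PUBSUB_BACKEND`) selects `pulsar` or `mqtt`. The factory's body is not part of this diff, so the block below is only a hypothetical sketch of how such a dispatch could work. `get_pubsub_sketch`, the stub backend classes, and the assumption that the choice arrives under a `pubsub_backend` key (argparse's destination name for `--pubsub-backend`) are illustrative, not TrustGraph's API.

```python
import argparse
import os
from typing import Any, Dict, Type


class _StubBackend:
    """Hypothetical placeholder: records the params a real backend would use."""

    def __init__(self, **params: Any) -> None:
        self.params: Dict[str, Any] = params

    def close(self) -> None:
        pass  # a real backend would release its client connection here


class PulsarBackendStub(_StubBackend):
    """Hypothetical stand-in for a Pulsar-backed implementation."""


class MqttBackendStub(_StubBackend):
    """Hypothetical stand-in for an MQTT-backed implementation."""


# Registry keyed by the --pubsub-backend choices shown in the diff
_BACKENDS: Dict[str, Type[_StubBackend]] = {
    "pulsar": PulsarBackendStub,
    "mqtt": MqttBackendStub,
}


def get_pubsub_sketch(**config: Any) -> _StubBackend:
    # Assumption: the backend name arrives as 'pubsub_backend'; fall back to Pulsar
    name = config.get("pubsub_backend") or "pulsar"
    try:
        backend_cls = _BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown pub/sub backend: {name!r}") from None
    return backend_cls(**config)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="api-gateway-sketch")
    parser.add_argument(
        "--pubsub-backend",
        default=os.getenv("PUBSUB_BACKEND", "pulsar"),
        choices=["pulsar", "mqtt"],
    )
    # Assumed default, borrowed from the reverse gateway's PULSAR_HOST fallback
    parser.add_argument("-p", "--pulsar-host", default="pulsar://pulsar:6650")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--pubsub-backend", "mqtt"])
    backend = get_pubsub_sketch(**vars(args))
    print(type(backend).__name__)  # MqttBackendStub
```

Falling back to Pulsar when no backend key is present keeps callers that pass only Pulsar settings working. Whether the real factory behaves this way is an assumption.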

@@ -143,7 +143,7 @@ class Processor(AsyncProcessor):
 self.librarian_request_consumer = Consumer(
 taskgroup = self.taskgroup,
-client = self.pulsar_client,
+backend = self.pubsub,
 flow = None,
 topic = librarian_request_queue,
 subscriber = id,
@@ -153,7 +153,7 @@ class Processor(AsyncProcessor):
 )
 self.librarian_response_producer = Producer(
-client = self.pulsar_client,
+backend = self.pubsub,
 topic = librarian_response_queue,
 schema = LibrarianResponse,
 metrics = librarian_response_metrics,
@@ -161,7 +161,7 @@ class Processor(AsyncProcessor):
 self.collection_request_consumer = Consumer(
 taskgroup = self.taskgroup,
-client = self.pulsar_client,
+backend = self.pubsub,
 flow = None,
 topic = collection_request_queue,
 subscriber = id,
@@ -171,7 +171,7 @@ class Processor(AsyncProcessor):
 )
 self.collection_response_producer = Producer(
-client = self.pulsar_client,
+backend = self.pubsub,
 topic = collection_response_queue,
 schema = CollectionManagementResponse,
 metrics = collection_response_metrics,
@@ -183,7 +183,7 @@ class Processor(AsyncProcessor):
 )
 self.config_request_producer = Producer(
-client = self.pulsar_client,
+backend = self.pubsub,
 topic = config_request_queue,
 schema = ConfigRequest,
 metrics = config_request_metrics,
@@ -195,7 +195,7 @@ class Processor(AsyncProcessor):
 self.config_response_consumer = Consumer(
 taskgroup = self.taskgroup,
-client = self.pulsar_client,
+backend = self.pubsub,
 flow = None,
 topic = config_response_queue,
 subscriber = f"{id}-config",
@@ -299,14 +299,13 @@ class Processor(AsyncProcessor):
 collection = processing.collection
 ),
 data = base64.b64encode(content).decode("utf-8")
 )
 schema = Document
 logger.debug(f"Submitting to queue {q}...")
 pub = Publisher(
-self.pulsar_client, q, schema=schema
+self.pubsub, q, schema=schema
 )
 await pub.start()

@@ -98,16 +98,16 @@ class Processor(FlowProcessor):
 async def send_chunk(chunk):
 await flow("response").send(
 DocumentRagResponse(
-chunk=chunk,
+response=chunk,
 end_of_stream=False,
-response=None,
 error=None
 ),
 properties={"id": id}
 )
 # Query with streaming enabled
-full_response = await self.rag.query(
+# The query returns the last chunk (not accumulated text)
+final_response = await self.rag.query(
 v.query,
 user=v.user,
 collection=v.collection,
@@ -116,12 +116,11 @@ class Processor(FlowProcessor):
 chunk_callback=send_chunk,
 )
-# Send final message with complete response
+# Send final message with last chunk
 await flow("response").send(
 DocumentRagResponse(
-chunk=None,
+response=final_response if final_response else "",
 end_of_stream=True,
-response=full_response,
 error=None
 ),
 properties={"id": id}

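After this change both RAG processors carry streamed text in a single `response` field. Each callback chunk is sent with `end_of_stream=False`, and one terminal message carries whatever the query returned, or an empty string when it returned nothing. The self-contained sketch below reproduces that send pattern. A hypothetical `RagResponse` dataclass stands in for `DocumentRagResponse`/`GraphRagResponse`, and a fake query replaces `rag.query`. It deliberately does not model what the real query returns, because the two processors' comments describe that differently.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class RagResponse:
    # Hypothetical stand-in for DocumentRagResponse / GraphRagResponse,
    # limited to the three fields the processors set after this change
    response: Optional[str]
    end_of_stream: bool
    error: Optional[str]


Send = Callable[[RagResponse], Awaitable[None]]
Query = Callable[..., Awaitable[Optional[str]]]


async def stream_reply(query: Query, send: Send) -> None:
    async def send_chunk(chunk: str) -> None:
        # Intermediate message: text in `response`, stream still open
        await send(RagResponse(response=chunk, end_of_stream=False, error=None))

    final_response = await query(chunk_callback=send_chunk)

    # Terminal message: pass through whatever the query returned, never None
    await send(RagResponse(
        response=final_response if final_response else "",
        end_of_stream=True,
        error=None,
    ))


async def demo() -> List[RagResponse]:
    sent: List[RagResponse] = []

    async def send(msg: RagResponse) -> None:
        sent.append(msg)

    async def fake_query(chunk_callback):
        # Hypothetical query that streams two chunks and returns nothing
        await chunk_callback("Hello, ")
        await chunk_callback("world")
        return None

    await stream_reply(fake_query, send)
    return sent


if __name__ == "__main__":
    for msg in asyncio.run(demo()):
        print(msg)
```

Because the terminal message always has a string `response`, a consumer can stop on `end_of_stream` without also checking for `None`.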

@@ -141,16 +141,16 @@ class Processor(FlowProcessor):
 async def send_chunk(chunk):
 await flow("response").send(
 GraphRagResponse(
-chunk=chunk,
+response=chunk,
 end_of_stream=False,
-response=None,
 error=None
 ),
 properties={"id": id}
 )
 # Query with streaming enabled
-full_response = await rag.query(
+# The query will send chunks via callback AND return the complete text
+final_response = await rag.query(
 query = v.query, user = v.user, collection = v.collection,
 entity_limit = entity_limit, triple_limit = triple_limit,
 max_subgraph_size = max_subgraph_size,
@@ -159,12 +159,12 @@ class Processor(FlowProcessor):
 chunk_callback = send_chunk,
 )
-# Send final message with complete response
+# Send final message - may have last chunk of content with end_of_stream=True
+# (prompt service may send final chunk with text, so we pass through whatever we got)
 await flow("response").send(
 GraphRagResponse(
-chunk=None,
+response=final_response if final_response else "",
 end_of_stream=True,
-response=full_response,
 error=None
 ),
 properties={"id": id}

@@ -26,19 +26,19 @@ class WebSocketResponder:
 self.completed = True
 class MessageDispatcher:
-def __init__(self, max_workers: int = 10, config_receiver=None, pulsar_client=None):
+def __init__(self, max_workers: int = 10, config_receiver=None, backend=None):
 self.max_workers = max_workers
 self.semaphore = asyncio.Semaphore(max_workers)
 self.active_tasks = set()
-self.pulsar_client = pulsar_client
+self.backend = backend
 # Use DispatcherManager for flow and service management
-if pulsar_client and config_receiver:
-self.dispatcher_manager = DispatcherManager(pulsar_client, config_receiver, prefix="rev-gateway")
+if backend and config_receiver:
+self.dispatcher_manager = DispatcherManager(backend, config_receiver, prefix="rev-gateway")
 else:
 self.dispatcher_manager = None
-logger.warning("No pulsar_client or config_receiver provided - using fallback mode")
+logger.warning("No backend or config_receiver provided - using fallback mode")
 # Service name mapping from websocket protocol to translator registry
 self.service_mapping = {
@@ -78,7 +78,7 @@ class MessageDispatcher:
 try:
 if not self.dispatcher_manager:
-raise RuntimeError("DispatcherManager not available - pulsar_client and config_receiver required")
+raise RuntimeError("DispatcherManager not available - backend and config_receiver required")
 # Use DispatcherManager for flow-based processing
 responder = WebSocketResponder()

@@ -7,10 +7,10 @@ import os
 from aiohttp import ClientSession, WSMsgType, ClientWebSocketResponse
 from typing import Optional
 from urllib.parse import urlparse, urlunparse
-import pulsar
 from .dispatcher import MessageDispatcher
 from ..gateway.config.receiver import ConfigReceiver
+from ..base import get_pubsub
 logger = logging.getLogger("rev_gateway")
 logger.setLevel(logging.INFO)
@@ -56,25 +56,20 @@ class ReverseGateway:
 self.pulsar_host = pulsar_host or os.getenv("PULSAR_HOST", "pulsar://pulsar:6650")
 self.pulsar_api_key = pulsar_api_key or os.getenv("PULSAR_API_KEY", None)
 self.pulsar_listener = pulsar_listener
-# Initialize Pulsar client
-if self.pulsar_api_key:
-self.pulsar_client = pulsar.Client(
-self.pulsar_host,
-listener_name=self.pulsar_listener,
-authentication=pulsar.AuthenticationToken(self.pulsar_api_key)
-)
-else:
-self.pulsar_client = pulsar.Client(
-self.pulsar_host,
-listener_name=self.pulsar_listener
-)
+# Create backend using factory
+backend_params = {
+'pulsar_host': self.pulsar_host,
+'pulsar_api_key': self.pulsar_api_key,
+'pulsar_listener': self.pulsar_listener,
+}
+self.backend = get_pubsub(**backend_params)
 # Initialize config receiver
-self.config_receiver = ConfigReceiver(self.pulsar_client)
+self.config_receiver = ConfigReceiver(self.backend)
-# Initialize dispatcher with config_receiver and pulsar_client - must be created after config_receiver
-self.dispatcher = MessageDispatcher(max_workers, self.config_receiver, self.pulsar_client)
+# Initialize dispatcher with config_receiver and backend - must be created after config_receiver
+self.dispatcher = MessageDispatcher(max_workers, self.config_receiver, self.backend)
 async def connect(self) -> bool:
 try:
@@ -170,10 +165,10 @@ class ReverseGateway:
 self.running = False
 await self.dispatcher.shutdown()
 await self.disconnect()
-# Close Pulsar client
-if hasattr(self, 'pulsar_client'):
-self.pulsar_client.close()
+# Close backend
+if hasattr(self, 'backend'):
+self.backend.close()
 def stop(self):
 self.running = False


@@ -78,7 +78,7 @@ class Processor(FlowProcessor):
 # Create storage management consumer
 self.storage_request_consumer = Consumer(
 taskgroup=self.taskgroup,
-client=self.pulsar_client,
+backend=self.pubsub,
 flow=None,
 topic=object_storage_management_topic,
 subscriber=f"{id}-storage",
@@ -89,7 +89,7 @@ class Processor(FlowProcessor):
 # Create storage management response producer
 self.storage_response_producer = Producer(
-client=self.pulsar_client,
+backend=self.pubsub,
 topic=storage_management_response_topic,
 schema=StorageManagementResponse,
 metrics=storage_response_metrics,

@@ -338,7 +338,6 @@ class LibraryTableStore:
 for m in row[5]
 ],
 tags = row[6] if row[6] else [],
-object_id = row[7],
 )
 for row in resp
 ]
@@ -384,7 +383,6 @@ class LibraryTableStore:
 for m in row[4]
 ],
 tags = row[5] if row[5] else [],
-object_id = row[6],
 )
 logger.debug("Done")