trustgraph/tests/unit/test_chunking/conftest.py
cybermaggedon d35473f7f7
feat: workspace-based multi-tenancy, replacing user as tenancy axis (#840)
Introduces `workspace` as the isolation boundary for config, flows,
library, and knowledge data. Removes `user` as a schema-level field
throughout the code, API specs, and tests; workspace provides the
same separation more cleanly via the trusted `flow.workspace`
field rather than through client-supplied message fields.

Design
------
- IAM tech spec (docs/tech-specs/iam.md) documents current state,
  proposed auth/access model, and migration direction.
- Data ownership model (docs/tech-specs/data-ownership-model.md)
  captures the workspace/collection/flow hierarchy.

Schema + messaging
------------------
- Drop `user` field from AgentRequest/Step, GraphRagQuery,
  DocumentRagQuery, Triples/Graph/Document/Row EmbeddingsRequest,
  Sparql/Rows/Structured QueryRequest, ToolServiceRequest.
- Keep collection/workspace routing via flow.workspace at the
  service layer.
- Translators updated to not serialise/deserialise user.

API specs
---------
- OpenAPI schemas and path examples cleaned of user fields.
- Websocket async-api messages updated.
- Removed the unused parameters/User.yaml.

Services + base
---------------
- Librarian, collection manager, knowledge, config: all operations
  scoped by workspace. Config client API takes workspace as first
  positional arg.
- `flow.workspace` set at flow start time by the infrastructure;
  no longer passed through from clients.
- Tool service drops user-personalisation passthrough.
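
  The workspace-first calling convention can be sketched with a
  hypothetical stub (the class name, method names, and backing store are
  illustrative assumptions, not the real TrustGraph config client):

  ```python
  # Hypothetical stub: every config operation takes workspace as its
  # first positional argument, so lookups are isolated per workspace.
  class ConfigClient:
      def __init__(self):
          self.store = {}

      def put(self, workspace: str, key: str, value):
          self.store[(workspace, key)] = value

      def get(self, workspace: str, key: str):
          # keys never collide across workspaces
          return self.store.get((workspace, key))

  client = ConfigClient()
  client.put("ws-a", "prompt", "v1")
  client.put("ws-b", "prompt", "v2")
  print(client.get("ws-a", "prompt"))  # v1 — scoped to ws-a only
  ```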

CLI + SDK
---------
- tg-init-workspace and workspace-aware import/export.
- All tg-* commands drop user args; accept --workspace.
- Python API/SDK (flow, socket_client, async_*, explainability,
  library) drop user kwargs from every method signature.

MCP server
----------
- All tool endpoints drop user parameters; socket_manager no longer
  keyed per user.

Flow service
------------
- Closure-based topic cleanup on flow stop: only delete topics
  whose blueprint template was parameterised AND no remaining
  live flow (across all workspaces) still resolves to that topic.
  Four scopes fall out naturally from template analysis:
    * {id} -> per-flow, deleted on stop
    * {blueprint} -> per-blueprint, kept while any flow of the
      same blueprint exists
    * {workspace} -> per-workspace, kept while any flow in the
      workspace exists
    * literal -> global, never deleted (e.g. tg.request.librarian)
  Fixes a bug where stopping a flow silently destroyed the global
  librarian exchange, wedging all library operations until manual
  restart.
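
  The scope rules above can be sketched as a pure classification
  function (the function name and return labels are illustrative, not
  the flow service's actual API):

  ```python
  # Sketch of classifying a topic template by its placeholders, in the
  # precedence implied above: per-flow beats per-blueprint beats
  # per-workspace; a template with no placeholders is global.
  def classify_topic_scope(template: str) -> str:
      if "{id}" in template:
          return "per-flow"        # deleted when the flow stops
      if "{blueprint}" in template:
          return "per-blueprint"   # kept while any flow of the blueprint runs
      if "{workspace}" in template:
          return "per-workspace"   # kept while any flow in the workspace runs
      return "global"              # literal topic, never deleted

  print(classify_topic_scope("chunk.load.{id}"))       # per-flow
  print(classify_topic_scope("tg.request.librarian"))  # global
  ```

  Under this rule a literal topic like `tg.request.librarian` can never
  be selected for deletion, which is what closes the bug described above.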

RabbitMQ backend
----------------
- heartbeat=60, blocked_connection_timeout=300. Catches silently
  dead connections (broker restart, orphaned channels, network
  partitions) within ~2 heartbeat windows, so the consumer
  reconnects and re-binds its queue rather than sitting forever
  on a zombie connection.
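
  Assuming the backend connects via pika, the tuning above maps onto
  `pika.ConnectionParameters` (the host name here is a placeholder):

  ```python
  import pika

  # Sketch of the connection tuning described above. With heartbeat=60,
  # a connection that misses roughly two heartbeat windows is declared
  # dead, prompting the consumer to reconnect and re-bind its queue.
  params = pika.ConnectionParameters(
      host="rabbitmq",                 # placeholder host
      heartbeat=60,
      blocked_connection_timeout=300,  # abort if the broker blocks us
  )
  ```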

Tests
-----
- Full test refresh: unit, integration, contract, provenance.
- Dropped user-field assertions and constructor kwargs across
  ~100 test files.
- Renamed user-collection isolation tests to workspace-collection.
2026-04-21 23:23:01 +01:00


import pytest
from unittest.mock import AsyncMock, Mock, patch
from trustgraph.schema import TextDocument, Metadata
from trustgraph.chunking.recursive.chunker import Processor as RecursiveChunker
from trustgraph.chunking.token.chunker import Processor as TokenChunker
from prometheus_client import REGISTRY


@pytest.fixture
def mock_flow():
    """Mock flow function that returns a mock output producer."""
    output_mock = AsyncMock()
    flow_mock = Mock(return_value=output_mock)
    return flow_mock, output_mock
@pytest.fixture
def mock_consumer():
    """Mock consumer with test attributes."""
    consumer = Mock()
    consumer.id = "test-consumer"
    consumer.flow = "test-flow"
    return consumer


@pytest.fixture
def sample_text_document():
    """Sample document with moderate length text."""
    metadata = Metadata(
        id="test-doc-1",
        collection="test-collection"
    )
    text = "The quick brown fox jumps over the lazy dog. " * 20
    return TextDocument(
        metadata=metadata,
        text=text.encode("utf-8")
    )


@pytest.fixture
def long_text_document():
    """Long document for testing multiple chunks."""
    metadata = Metadata(
        id="test-doc-long",
        collection="test-collection"
    )
    # Create a long text that will definitely be chunked
    text = " ".join(
        f"Sentence number {i}. This is part of a long document."
        for i in range(200)
    )
    return TextDocument(
        metadata=metadata,
        text=text.encode("utf-8")
    )
@pytest.fixture
def unicode_text_document():
    """Document with various unicode characters."""
    metadata = Metadata(
        id="test-doc-unicode",
        collection="test-collection"
    )
    text = """
English: Hello World!
Chinese: 你好世界
Japanese: こんにちは世界
Korean: 안녕하세요 세계
Arabic: مرحبا بالعالم
Russian: Привет мир
Emoji: 🌍 🌎 🌏 😀 🎉
Math: ∑ ∏ ∫ ∞ √ π
Symbols: © ® ™ € £ ¥
"""
    return TextDocument(
        metadata=metadata,
        text=text.encode("utf-8")
    )


@pytest.fixture
def empty_text_document():
    """Empty document for edge case testing."""
    metadata = Metadata(
        id="test-doc-empty",
        collection="test-collection"
    )
    return TextDocument(
        metadata=metadata,
        text=b""
    )


@pytest.fixture
def mock_message(sample_text_document):
    """Mock message containing a document."""
    msg = Mock()
    msg.value.return_value = sample_text_document
    return msg
@pytest.fixture(autouse=True)
def clear_metrics():
    """Clear metrics before each test to avoid duplicates."""
    # Clear the chunk_metric class attribute if it exists
    if hasattr(RecursiveChunker, 'chunk_metric'):
        # Unregister from Prometheus registry first
        try:
            REGISTRY.unregister(RecursiveChunker.chunk_metric)
        except KeyError:
            pass  # Already unregistered
        delattr(RecursiveChunker, 'chunk_metric')
    if hasattr(TokenChunker, 'chunk_metric'):
        try:
            REGISTRY.unregister(TokenChunker.chunk_metric)
        except KeyError:
            pass  # Already unregistered
        delattr(TokenChunker, 'chunk_metric')
    yield
    # Clean up after test as well
    if hasattr(RecursiveChunker, 'chunk_metric'):
        try:
            REGISTRY.unregister(RecursiveChunker.chunk_metric)
        except KeyError:
            pass
        delattr(RecursiveChunker, 'chunk_metric')
    if hasattr(TokenChunker, 'chunk_metric'):
        try:
            REGISTRY.unregister(TokenChunker.chunk_metric)
        except KeyError:
            pass
        delattr(TokenChunker, 'chunk_metric')


@pytest.fixture
def mock_async_processor_init():
    """Mock AsyncProcessor.__init__ to avoid taskgroup requirement."""
    def init_mock(self, **kwargs):
        # Set attributes that AsyncProcessor would normally set,
        # without calling the real __init__
        self.config_handlers = []
        self.specifications = []
        self.flows = {}
        self.id = kwargs.get('id', 'test-processor')

    with patch('trustgraph.base.async_processor.AsyncProcessor.__init__', init_mock):
        yield