trustgraph/tests/contract/test_schema_field_contracts.py

"""
Contract tests for schema dataclass field sets.

These pin the *field names* of small, widely-constructed schema dataclasses
so that any rename, removal, or accidental addition fails CI loudly instead
of waiting for a runtime TypeError on the next websocket message.

Background: in v2.2 the `Metadata` dataclass dropped a `metadata: list[Triple]`
field but several call sites kept passing `Metadata(metadata=...)`. The bug
was only discovered when a websocket import dispatcher received its first
real message in production. A trivial structural assertion of the kind
below would have caught it at unit-test time.

Add to this file whenever a schema rename burns you. The cost of a frozen
field set is a one-line update when you intentionally evolve the schema; the
benefit is that every call site is forced to come along for the ride.
"""

import dataclasses
import pytest

from trustgraph.schema import (
    Metadata,
    EntityContext,
    EntityEmbeddings,
    ChunkEmbeddings,
)


def _field_names(dc):
    """Return the set of field names declared on the dataclass `dc`."""
    return {f.name for f in dataclasses.fields(dc)}


@pytest.mark.contract
class TestSchemaFieldContracts:
    """Pin the field set of dataclasses that get constructed all over the
    codebase. If you intentionally change one of these, update the
    expected set in the same commit — that diff will surface every call
    site that needs to come along."""

    def test_metadata_fields(self):
        # NOTE: there is no `metadata` field. A previous regression
        # constructed Metadata(metadata=...) and crashed at runtime.
        # `user` was also dropped in the workspace refactor — workspace
        # now flows via flow.workspace, not via message payload.
        assert _field_names(Metadata) == {
            "id",
            "root",
            "collection",
        }

    def test_entity_embeddings_fields(self):
        # NOTE: the embedding field is `vector` (singular, list[float]).
        # There is no `vectors` field. Several call sites historically
        # passed `vectors=` and crashed at runtime.
        assert _field_names(EntityEmbeddings) == {
            "entity",
            "vector",
            "chunk_id",
        }

    def test_chunk_embeddings_fields(self):
        # Same `vector` (singular) convention as EntityEmbeddings.
        assert _field_names(ChunkEmbeddings) == {
            "chunk_id",
            "vector",
        }

    def test_entity_context_fields(self):
        assert _field_names(EntityContext) == {
            "entity",
            "context",
            "chunk_id",
        }
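
The failure mode these tests guard against can be reproduced in isolation. The sketch below is illustrative only: `ExampleMetadata`, `stale_call_fails`, and `contract_diff` are hypothetical names and are not part of `trustgraph.schema`. It assumes a dataclass shaped like the pinned `Metadata` field set, and shows that a stale keyword argument raises `TypeError` only when the call runs, while a field-set comparison reports the mismatch without executing any call site.

```python
import dataclasses


# Hypothetical stand-in for a schema dataclass; NOT the real trustgraph Metadata.
@dataclasses.dataclass
class ExampleMetadata:
    id: str = ""
    root: str = ""
    collection: str = ""


def field_names(dc):
    """Return the set of field names declared on a dataclass."""
    return {f.name for f in dataclasses.fields(dc)}


def stale_call_fails():
    """Return True if constructing with a removed field raises TypeError."""
    try:
        # `metadata` is not a field of ExampleMetadata, so this call only
        # fails at the moment it is executed.
        ExampleMetadata(id="doc-1", metadata=[])
    except TypeError:
        return True
    return False


def contract_diff(dc, expected):
    """Return (missing, unexpected) field names relative to a pinned set."""
    actual = field_names(dc)
    return expected - actual, actual - expected


if __name__ == "__main__":
    print(stale_call_fails())
    # A pinned set that still lists the removed field shows up as "missing".
    print(contract_diff(ExampleMetadata, {"id", "root", "collection", "metadata"}))
```

Comparing whole sets, rather than checking individual names with `hasattr`, means that an accidental addition is reported as well as a removal or rename.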