trustgraph

mirror of https://github.com/trustgraph-ai/trustgraph.git synced 2026-04-26 08:56:21 +02:00

Author	SHA1	Message	Date
cybermaggedon	6cbaf88fc6	fix: ontology extractor reads .objects, not .object, from PromptResult (#842 ) The extract-with-ontologies prompt is a JSONL prompt, which means the prompt service returns a PromptResult with response_type="jsonl" and the parsed items in `.objects` (plural). The ontology extractor was reading `.object` (singular) — the field used for response_type="json" — which is always None for JSONL prompts. Effect: the parser received None on every chunk, hit its "Unexpected response type: <class 'NoneType'>" branch, returned no ExtractionResult, and extract_with_simplified_format returned []. Every extraction silently produced zero triples. Graphs populated only with the seed ontology schema (TBox) and document/chunk provenance — no instance triples at all. The e2e test threshold of >=100 edges per collection was met by schema + provenance alone, so the failure mode was invisible until RAG queries couldn't find any content. Regression introduced in v2.3 with the token-usage work (commit `56d700f3` / `14e49d83`) when PromptClient.prompt() began returning a PromptResult wrapper instead of the raw text/dict/list. All other call sites of .prompt() across retrieval/, agent/, orchestrator/ were already reading the correct field for their prompt's response_type; ontology extraction was the sole stranded caller. Also adds tests/unit/test_extract/test_ontology/test_extract_with_simplified_format.py covering: - happy path: populated .objects produces non-empty triples - production failure shape: .objects=None returns [] cleanly - empty .objects returns [] without raising - defensive: do not silently fall back to .object for a JSONL prompt	2026-04-22 12:10:42 +01:00
cybermaggedon	d35473f7f7	feat: workspace-based multi-tenancy, replacing user as tenancy axis (#840 ) Introduces `workspace` as the isolation boundary for config, flows, library, and knowledge data. Removes `user` as a schema-level field throughout the code, API specs, and tests; workspace provides the same separation more cleanly at the trusted flow.workspace layer rather than through client-supplied message fields. Design ------ - IAM tech spec (docs/tech-specs/iam.md) documents current state, proposed auth/access model, and migration direction. - Data ownership model (docs/tech-specs/data-ownership-model.md) captures the workspace/collection/flow hierarchy. Schema + messaging ------------------ - Drop `user` field from AgentRequest/Step, GraphRagQuery, DocumentRagQuery, Triples/Graph/Document/Row EmbeddingsRequest, Sparql/Rows/Structured QueryRequest, ToolServiceRequest. - Keep collection/workspace routing via flow.workspace at the service layer. - Translators updated to not serialise/deserialise user. API specs --------- - OpenAPI schemas and path examples cleaned of user fields. - Websocket async-api messages updated. - Removed the unused parameters/User.yaml. Services + base --------------- - Librarian, collection manager, knowledge, config: all operations scoped by workspace. Config client API takes workspace as first positional arg. - `flow.workspace` set at flow start time by the infrastructure; no longer pass-through from clients. - Tool service drops user-personalisation passthrough. CLI + SDK --------- - tg-init-workspace and workspace-aware import/export. - All tg-* commands drop user args; accept --workspace. - Python API/SDK (flow, socket_client, async_, explainability, library) drop user kwargs from every method signature. MCP server ---------- - All tool endpoints drop user parameters; socket_manager no longer keyed per user. Flow service ------------ - Closure-based topic cleanup on flow stop: only delete topics whose blueprint template was parameterised AND no remaining live flow (across all workspaces) still resolves to that topic. Three scopes fall out naturally from template analysis: {id} -> per-flow, deleted on stop * {blueprint} -> per-blueprint, kept while any flow of the same blueprint exists * {workspace} -> per-workspace, kept while any flow in the workspace exists * literal -> global, never deleted (e.g. tg.request.librarian) Fixes a bug where stopping a flow silently destroyed the global librarian exchange, wedging all library operations until manual restart. RabbitMQ backend ---------------- - heartbeat=60, blocked_connection_timeout=300. Catches silently dead connections (broker restart, orphaned channels, network partitions) within ~2 heartbeat windows, so the consumer reconnects and re-binds its queue rather than sitting forever on a zombie connection. Tests ----- - Full test refresh: unit, integration, contract, provenance. - Dropped user-field assertions and constructor kwargs across ~100 test files. - Renamed user-collection isolation tests to workspace-collection.	2026-04-21 23:23:01 +01:00
Het Patel	391b9076f3	feat: add domain and range validation to triple extraction in extract.py (#825 )	2026-04-17 11:29:57 +01:00
cybermaggedon	14e49d83c7	Expose LLM token usage across all service layers (#782 ) Expose LLM token usage (in_token, out_token, model) across all service layers Propagate token counts from LLM services through the prompt, text-completion, graph-RAG, document-RAG, and agent orchestrator pipelines to the API gateway and Python SDK. All fields are Optional — None means "not available", distinguishing from a real zero count. Key changes: - Schema: Add in_token/out_token/model to TextCompletionResponse, PromptResponse, GraphRagResponse, DocumentRagResponse, AgentResponse - TextCompletionClient: New TextCompletionResult return type. Split into text_completion() (non-streaming) and text_completion_stream() (streaming with per-chunk handler callback) - PromptClient: New PromptResult with response_type (text/json/jsonl), typed fields (text/object/objects), and token usage. All callers updated. - RAG services: Accumulate token usage across all prompt calls (extract-concepts, edge-scoring, edge-reasoning, synthesis). Non-streaming path sends single combined response instead of chunk + end_of_session. - Agent orchestrator: UsageTracker accumulates tokens across meta-router, pattern prompt calls, and react reasoning. Attached to end_of_dialog. - Translators: Encode token fields when not None (is not None, not truthy) - Python SDK: RAG and text-completion methods return TextCompletionResult (non-streaming) or RAGChunk/AgentAnswer with token fields (streaming) - CLI: --show-usage flag on tg-invoke-llm, tg-invoke-prompt, tg-invoke-graph-rag, tg-invoke-document-rag, tg-invoke-agent	2026-04-13 14:38:34 +01:00
cybermaggedon	29b4300808	Updated test suite for explainability & provenance (#696 ) * Provenance tests * Embeddings tests * Test librarian * Test triples stream * Test concurrency * Entity centric graph writes * Agent tool service tests * Structured data tests * RDF tests * Addition LLM tests * Reliability tests	2026-03-13 14:27:42 +00:00
cybermaggedon	cf0daedefa	Changed schema for Value -> Term, majorly breaking change (#622 ) * Changed schema for Value -> Term, majorly breaking change * Following the schema change, Value -> Term into all processing * Updated Cassandra for g, p, s, o index patterns (7 indexes) * Reviewed and updated all tests * Neo4j, Memgraph and FalkorDB remain broken, will look at once settled down	2026-01-27 13:48:08 +00:00
cybermaggedon	6c85038c75	Ontology extraction tests (#560 )	2025-11-13 20:02:12 +00:00
cybermaggedon	83f0c1e7f3	Structure data mvp (#452 ) * Structured data tech spec * Architecture principles * New schemas * Updated schemas and specs * Object extractor * Add .coveragerc * New tests * Cassandra object storage * Trying to object extraction working, issues exist	2025-08-07 20:47:20 +01:00

8 commits