trustgraph

mirror of https://github.com/trustgraph-ai/trustgraph.git synced 2026-07-01 09:29:38 +02:00

Author	SHA1	Message	Date
cybermaggedon	14e49d83c7	Expose LLM token usage across all service layers (#782 ) Expose LLM token usage (in_token, out_token, model) across all service layers Propagate token counts from LLM services through the prompt, text-completion, graph-RAG, document-RAG, and agent orchestrator pipelines to the API gateway and Python SDK. All fields are Optional — None means "not available", distinguishing from a real zero count. Key changes: - Schema: Add in_token/out_token/model to TextCompletionResponse, PromptResponse, GraphRagResponse, DocumentRagResponse, AgentResponse - TextCompletionClient: New TextCompletionResult return type. Split into text_completion() (non-streaming) and text_completion_stream() (streaming with per-chunk handler callback) - PromptClient: New PromptResult with response_type (text/json/jsonl), typed fields (text/object/objects), and token usage. All callers updated. - RAG services: Accumulate token usage across all prompt calls (extract-concepts, edge-scoring, edge-reasoning, synthesis). Non-streaming path sends single combined response instead of chunk + end_of_session. - Agent orchestrator: UsageTracker accumulates tokens across meta-router, pattern prompt calls, and react reasoning. Attached to end_of_dialog. - Translators: Encode token fields when not None (is not None, not truthy) - Python SDK: RAG and text-completion methods return TextCompletionResult (non-streaming) or RAGChunk/AgentAnswer with token fields (streaming) - CLI: --show-usage flag on tg-invoke-llm, tg-invoke-prompt, tg-invoke-graph-rag, tg-invoke-document-rag, tg-invoke-agent	2026-04-13 14:38:34 +01:00
cybermaggedon	67cfa80836	SPARQL CLI reports errors from service (#794) SPARQL query CLI ignores errors from the SPARQL service and just emits a zero row output. This change causes an error to be reported	2026-04-13 14:31:33 +01:00
elpresidank	5a6ea1e70e	merged in V2T	2026-04-12 17:34:39 -05:00
elpresidank	ee45cb4850	feat: fix RAG pipelines, Beep Graph branding, PWA, and ambient glow UI Pipeline fixes: - Fix agent getting empty response from graph-rag by combining answer + explain data in single message (RequestResponse returns first msg) - Fix Doc RAG pipeline: add content field to Qdrant doc payload, seed 10 document chunks, fix type mismatches across base/flow/client - Forward explainability events from agent's KnowledgeQuery to client - Add "agent" to TERM_BEARING_RESPONSE_SERVICES for triple translation - Fix embeddings env var (OLLAMA_URL), user/collection threading, edge scoring threshold, and various protocol mismatches Branding: - Rename TrustGraph → Beep Graph (title, sidebar, settings, about) - Custom lambda + ThugLife pixel glasses SVG logo component - Forest green color palette (brand-50 through brand-900) - SVG favicon + PNG icons (16/32/180/192/512) - PWA manifest with service worker for offline shell caching - Splash screen with animated logo pulse on app load - Ambient glow background with drifting green radial blobs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 10:19:10 -05:00
elpresidank	87f6e5eb05	feat: chat message actions, explainability graphs, and graph query filters Add chat UX improvements: message actions toolbar (copy/delete/regenerate) on hover, inline explainability subgraph visualization from RAG/agent queries, and token metadata for all chat modes. Enhance graph page with SPO query filters, configurable triple limit, and type legend overlay. Extract shared graph utilities for reuse across components. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 02:55:46 -05:00
elpresidank	d5dd15be72	feat: MCP Tools management UI with QA accessibility fixes Add dedicated /mcp-tools page for managing MCP servers and tools from the workbench. Includes CRUD dialogs, config API integration, and feature flag gating via mcpTools switch. QA pass also fixes accessibility across existing pages: aria-expanded on chat phase blocks, tabpanel tabindex on prompts, toggle contrast ratio (WCAG 2.1 SC 1.4.11) on settings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 00:59:20 -05:00
elpresidank	f4d6e49217	Merge remote-tracking branch 'origin/master' into ts-port	2026-04-11 23:56:34 -05:00
elpresidank	338adf8668	fix: global focus-visible rings and light-mode border contrast - Add global focus-visible outline for buttons, switches, selects, and inputs so all interactive elements show a visible brand-500 ring on keyboard focus (not just NavLinks and dialog close) - Darken light-mode --color-border from #e4e4e7 to #d4d4d8 so input borders, dividers, and mode selector outlines are visible on white Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 23:28:38 -05:00
elpresidank	d097b790ff	fix: comprehensive a11y and contrast QA pass across workbench Automated QA loop (6 parallel browser agents, 2 rounds) found and fixed 15 accessibility, contrast, and responsive issues across all 8 pages: - WCAG contrast: light-mode warning (#854d0e), error (#b91c1c), toggle off-state (surface-400), connection badge (fg-muted) - ARIA: mode selector group+pressed, tab pattern ids+labelledby, nav and aside labels, dialog focus-return, alert roles on banners - Responsive: library header flex-wrap, search/button aria-labels - Focus: NavLink visible ring, dialog close button ring Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 23:26:28 -05:00
cybermaggedon	ffe310af7c	Fix RabbitMQ request/response race and chunker Flow API drift (#779 ) * Fix Metadata/EntityEmbeddings schema migration tail and add regression tests (#776) The Metadata dataclass dropped its `metadata: list[Triple]` field and EntityEmbeddings/ChunkEmbeddings settled on a singular `vector: list[float]` field, but several call sites kept passing `Metadata(metadata=...)` and `EntityEmbeddings(vectors=...)`. The bugs were latent until a websocket client first hit `/api/v1/flow/default/import/entity-contexts`, at which point the dispatcher TypeError'd on construction. Production fixes (5 call sites on the same migration tail): * trustgraph-flow gateway dispatchers entity_contexts_import.py and graph_embeddings_import.py — drop the stale Metadata(metadata=...) kwarg; switch graph_embeddings_import to the singular `vector` wire key. * trustgraph-base messaging translators knowledge.py and document_loading.py — fix decode side to read the singular `"vector"` key, matching what their own encode sides have always written. * trustgraph-flow tables/knowledge.py — fix Cassandra row deserialiser to construct EntityEmbeddings(vector=...) instead of vectors=. * trustgraph-flow gateway core_import/core_export — switch the kg-core msgpack wire format to the singular `"v"`/`"vector"` key and drop the dead `m["m"]` envelope field that referenced the removed Metadata.metadata triples list (it was a guaranteed KeyError on the export side). Defense-in-depth regression coverage (32 new tests across 7 files): * tests/contract/test_schema_field_contracts.py — pin the field set of Metadata, EntityEmbeddings, ChunkEmbeddings, EntityContext so any future schema rename fails CI loudly with a clear diff. * tests/unit/test_translators/test_knowledge_translator_roundtrip.py and test_document_embeddings_translator_roundtrip.py - encode→decode round-trip the affected translators end to end, locking in the singular `"vector"` wire key. 
* tests/unit/test_gateway/test_entity_contexts_import_dispatcher.py and test_graph_embeddings_import_dispatcher.py — exercise the websocket dispatchers' receive() path with realistic payloads, the direct regression test for the original production crash. * tests/unit/test_gateway/test_core_import_export_roundtrip.py — pack/unpack the kg-core msgpack format through the real dispatcher classes (with KnowledgeRequestor mocked), including a full export→import round-trip. * tests/unit/test_tables/test_knowledge_table_store.py — exercise the Cassandra row → schema conversion via __new__ to bypass the live cluster connection. Also fixes an unrelated leaked-coroutine RuntimeWarning in test_gateway/test_service.py::test_run_method_calls_web_run_app: the mocked aiohttp.web.run_app now closes the coroutine that Api.run() hands it, mirroring what the real run_app would do, instead of leaving it for the GC to complain about. * Fix RabbitMQ request/response race and chunker Flow API drift Two unrelated regressions surfaced after the v2.2 queue class refactor. Bundled here because both are small and both block production. 1. Request/response race against ephemeral RabbitMQ response queues Commit `feeb92b3` switched response/notify queues to per-subscriber auto-delete exclusive queues. That fixed orphaned-queue accumulation but introduced a setup race: Subscriber.start() created the run() task and returned immediately, while the underlying RabbitMQ consumer only declared and bound its queue lazily on the first receive() call. RequestResponse.request() therefore published the request before any queue was bound to the matching routing key, and the broker dropped the reply. Symptoms: "Failed to fetch config on notify" / "Request timeout exception" repeating roughly every 10s in api-gateway, document-embeddings and any other service exercising the config notify path. 
Fix: * Add ensure_connected() to the BackendConsumer protocol; implement it on RabbitMQBackendConsumer (calls _connect synchronously, declaring and binding the queue) and as a no-op on PulsarBackendConsumer (Pulsar's client.subscribe is already synchronous at construction). * Convert Subscriber's readiness signal from a non-existent Event to an asyncio.Future created in start(). run() calls consumer.ensure_connected() immediately after create_consumer() and sets _ready.set_result(None) on first successful bind. start() awaits the future via asyncio.wait so it returns only once the consumer is fully bound. Any reply published after start() returns is therefore guaranteed to land in a bound queue. * First-attempt connection failures call _ready.set_exception(e) and exit run() so start() unblocks with the error rather than hanging forever — the existing higher-level retry pattern in fetch_and_apply_config takes over from there. Runtime failures after a successful start still go through the existing retry-with-backoff path. * Update the two existing graceful-shutdown tests that monkey-patch Subscriber.run with a custom coroutine to honor the new contract by signalling _ready themselves. * Add tests/unit/test_base/test_subscriber_readiness.py with five regression tests pinning the readiness contract: ensure_connected must be called before start() returns; start() must block while ensure_connected runs (race-condition guard with a threading.Event gate); first-attempt create_consumer and ensure_connected failures must propagate to start() instead of hanging; ensure_connected must run before any receive() call. 2. Chunker Flow parameter lookup using the wrong attribute trustgraph-base/trustgraph/base/chunking_service.py was reading flow.parameters.get("chunk-size") and chunk-overlap, but the Flow class has no `parameters` attribute — parameter lookup is exposed through Flow.__call__ (flow("chunk-size") returns the resolved value or None). 
The exception was caught and logged as a WARNING, so chunking continued with the default sizes and any configured chunk-size / chunk-overlap was silently ignored: chunker - WARNING - Could not parse chunk-size parameter: 'Flow' object has no attribute 'parameters' The chunker tests didn't catch this because they constructed mock_flow = MagicMock() and configured mock_flow.parameters.get.side_effect = ..., which is the same phantom attribute MagicMock auto-creates on demand. Tests and production agreed on the wrong API. Fix: switch chunking_service.py to flow("chunk-size") / flow("chunk-overlap"). Update both chunker test files to mock the __call__ side_effect instead of the phantom parameters.get, merging parameter values into the existing flow() lookup the on_message tests already used for producer resolution.	2026-04-11 01:29:38 +01:00
cybermaggedon	c23e28aa66	Fix Metadata/EntityEmbeddings schema migration tail and add regression tests (#777 ) The Metadata dataclass dropped its `metadata: list[Triple]` field and EntityEmbeddings/ChunkEmbeddings settled on a singular `vector: list[float]` field, but several call sites kept passing `Metadata(metadata=...)` and `EntityEmbeddings(vectors=...)`. The bugs were latent until a websocket client first hit `/api/v1/flow/default/import/entity-contexts`, at which point the dispatcher TypeError'd on construction. Production fixes (5 call sites on the same migration tail): * trustgraph-flow gateway dispatchers entity_contexts_import.py and graph_embeddings_import.py — drop the stale Metadata(metadata=...) kwarg; switch graph_embeddings_import to the singular `vector` wire key. * trustgraph-base messaging translators knowledge.py and document_loading.py — fix decode side to read the singular `"vector"` key, matching what their own encode sides have always written. * trustgraph-flow tables/knowledge.py — fix Cassandra row deserialiser to construct EntityEmbeddings(vector=...) instead of vectors=. * trustgraph-flow gateway core_import/core_export — switch the kg-core msgpack wire format to the singular `"v"`/`"vector"` key and drop the dead `m["m"]` envelope field that referenced the removed Metadata.metadata triples list (it was a guaranteed KeyError on the export side). Defense-in-depth regression coverage (32 new tests across 7 files): * tests/contract/test_schema_field_contracts.py — pin the field set of Metadata, EntityEmbeddings, ChunkEmbeddings, EntityContext so any future schema rename fails CI loudly with a clear diff. * tests/unit/test_translators/test_knowledge_translator_roundtrip.py and test_document_embeddings_translator_roundtrip.py - encode→decode round-trip the affected translators end to end, locking in the singular `"vector"` wire key. 
* tests/unit/test_gateway/test_entity_contexts_import_dispatcher.py and test_graph_embeddings_import_dispatcher.py — exercise the websocket dispatchers' receive() path with realistic payloads, the direct regression test for the original production crash. * tests/unit/test_gateway/test_core_import_export_roundtrip.py — pack/unpack the kg-core msgpack format through the real dispatcher classes (with KnowledgeRequestor mocked), including a full export→import round-trip. * tests/unit/test_tables/test_knowledge_table_store.py — exercise the Cassandra row → schema conversion via __new__ to bypass the live cluster connection. Also fixes an unrelated leaked-coroutine RuntimeWarning in test_gateway/test_service.py::test_run_method_calls_web_run_app: the mocked aiohttp.web.run_app now closes the coroutine that Api.run() hands it, mirroring what the real run_app would do, instead of leaving it for the GC to complain about.	2026-04-10 20:43:45 +01:00
cybermaggedon	0994d4b05f	Open 2.3 release branch (#775) * Update packages and CI for new release branch	2026-04-10 14:42:19 +01:00
cybermaggedon	ad0bff10ee	master -> release/v2.3 (#774) * Mainly README changes	2026-04-10 14:38:46 +01:00
cybermaggedon	ec8f740de3	release/v2.2 -> master (#773)	2026-04-10 14:36:58 +01:00
elpresidank	77a5fa5044	fix: QA regression pass — graph sizing, focus trap, contrast, accessibility - Fix graph canvas using window dimensions instead of container by using ResizeObserver ref callback (attaches when conditionally-rendered container mounts) - Fix dialog focus trap escaping to background — filter hidden/disabled elements from focusable selector - Fix sidebar connection status and disconnection banner contrast — use semantic text-warning/bg-warning instead of amber-400 (1.65:1 → 4.5:1+) - Add aria-label to chat textarea, htmlFor/id pairs to flows dialog inputs - Add ARIA tab pattern to prompts page (role=tablist/tab/tabpanel, aria-selected, aria-controls) - Fix prompts heading hierarchy (H1→H2 instead of H1→H3) - Add flex-wrap to flows page header, fix badge contrast across pages - Fix service-call race condition with early returns instead of console.log Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 07:48:01 -05:00
elpresidank	b854b56558	feat: MCP tool client infrastructure for agent extensibility Add the full MCP tool pipeline enabling agents to invoke external tools (like Brave Search) via MCP servers: - Add ToolRequest/ToolResponse types and mcp-tool topics to @trustgraph/base - Create McpToolService (FlowProcessor) that connects to external MCP servers via @modelcontextprotocol/sdk StreamableHTTP transport - Add createMcpTool() to wire MCP tools into the agent's ReAct loop - Implement config-driven tool registration in AgentService with backward- compatible fallback to hardcoded tools - Add tool filtering by group and state (port of Python tool_filter.py) - Register mcp-tool in gateway dispatcher and export from @trustgraph/flow - Fix flow restart race condition: skip restart when flow definitions unchanged - Update seed config with MCP server config and tool definitions - Add run scripts for MCP tool service and Brave Search MCP server Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 05:45:46 -05:00
elpresidank	f2b376abef	fix: FalkorDB result parsing, embeddings routing, triples query response, graph visualization - Fix FalkorDB triples query: client v5 returns objects not arrays, use named field access - Fix embeddings service: align spec names to "embeddings-request"/"embeddings-response" - Fix client triplesQuery: read `triples` field instead of `response` from backend - Fix graph page crash: guard against non-array triples, accept literals as entity nodes - Add seed:demo script for AI industry knowledge graph (254 triples, 64 entities) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 04:59:36 -05:00
cybermaggedon	feeb92b33f	Refactor: Derive consumer behaviour from queue class (#772) Derive consumer behaviour from queue class, remove consumer_type parameter The queue class prefix (flow, request, response, notify) now fully determines consumer behaviour in both RabbitMQ and Pulsar backends. Added 'notify' class for ephemeral broadcast (config push notifications). Response and notify classes always create per-subscriber auto-delete queues, eliminating orphaned queues that accumulated on service restarts. Change init-trustgraph to set up the 'notify' namespace in Pulsar instead of old hangover 'state'. Fixes 'stuck backlog' on RabbitMQ config notification queue.	2026-04-09 09:55:41 +01:00
cybermaggedon	befe951b2e	Merge 2.2 into master (#771)	2026-04-08 14:17:18 +01:00
cybermaggedon	aff96e57cb	Added Explainable AI agent demo in Typescript (#770) (Not functional code)	2026-04-08 14:16:14 +01:00
cybermaggedon	e81418c58f	fix: preserve literal types in focus quoted triples and document tracing (#769) The triples client returns Uri/Literal (str subclasses), not Term objects. _quoted_triple() treated all values as IRIs, so literal objects like skos:definition values were mistyped in focus provenance events, and trace_source_documents could not match them in the store. Added to_term() to convert Uri/Literal back to Term, threaded a term_map from follow_edges_batch through get_subgraph/get_labelgraph into uri_map, and updated _quoted_triple to accept Term objects directly.	2026-04-08 13:37:02 +01:00
cybermaggedon	4b5bfacab1	Forward missing explain_triples through RAG clients and agent tool callback (#768) fix: forward explain_triples through RAG clients and agent tool callback - RAG clients and the KnowledgeQueryImpl tool callback were dropping explain_triples from explain events, losing provenance data (including focus edge selections) when graph-rag is invoked via the agent. Tests for provenance and explainability (56 new): - Client-level forwarding of explain_triples - Graph-RAG structural chain (question → grounding → exploration → focus → synthesis) - Graph-RAG integration with mocked subsidiary clients - Document-RAG integration (question → grounding → exploration → synthesis) - Agent-orchestrator all 3 patterns: react, plan-then-execute, supervisor	2026-04-08 11:41:17 +01:00
Cyber MacGeddon	dc72ed3cca	Merge branch 'release/v2.2'	2026-04-07 22:29:55 +01:00
cybermaggedon	e899370d98	Update docs for 2.2 release (#766) - Update protocol specs - Update protocol docs - Update API specs	2026-04-07 22:24:59 +01:00
elpresidank	580ee319a3	fix: prevent dispatcher race condition via promise-based lazy init Store the initialization Promise in the requestors map synchronously before yielding, so concurrent callers for the same key await the same instance — prevents orphaned RequestResponse objects and duplicate NATS subscriptions. Mirrors upstream fix `8f18ba02`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 12:11:11 -05:00
elpresidank	2a2e8e76a3	Merge remote-tracking branch 'origin/master' into ts-port	2026-04-07 10:51:24 -05:00
elpresidank	5e3929a883	fix: comprehensive QA audit — light mode, accessibility, error handling, code quality - Fix light mode: theme-aware graph node labels, remove prose-invert for theme-safe markdown, add brand/semantic color overrides for light backgrounds - Add 404 catch-all route redirecting unknown paths to /chat - FalkorDB: add .catch() to connectPromise, add ensureConnected() to all store methods (createLiteral, relateNode, relateLiteral, deleteCollection) - Accessibility: dialog role/aria-modal, toast aria-live, dismiss/zoom/search button aria-labels, close panel aria-label - Lazy-load ForceGraph2D (splits 189KB into separate chunk, main bundle -26%) - Cap conversation localStorage at 200 messages to prevent quota overflow - Fix pnpm test: add --passWithNoTests to cli/mcp packages - Add upload error notification instead of silent catch - Remove unused class-variance-authority dep and dead tabs.tsx component - Add @types/node to flow package devDependencies - Remove stale FIXME comment in messages.ts Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 09:15:59 -05:00
cybermaggedon	c20e6540ec	Subscriber resilience and RabbitMQ fixes (#765 ) Subscriber resilience: recreate consumer after connection failure - Move consumer creation from Subscriber.start() into the run() loop, matching the pattern used by Consumer. If the connection drops and the consumer is closed in the finally block, the loop now recreates it on the next iteration instead of spinning forever on a None consumer. Consumer thread safety: - Dedicated ThreadPoolExecutor per consumer so all pika operations (create, receive, acknowledge, negative_acknowledge) run on the same thread — pika BlockingConnection is not thread-safe - Applies to both Consumer and Subscriber classes Config handler type audit — fix four mismatched type registrations: - librarian: was ["librarian"] (non-existent type), now ["flow", "active-flow"] (matches config["flow"] that the handler reads) - cores/service: was ["kg-core"], now ["flow"] (reads config["flow"]) - metering/counter: was ["token-costs"], now ["token-cost"] (singular) - agent/mcp_tool: was ["mcp-tool"], now ["mcp"] (reads config["mcp"]) Update tests	2026-04-07 14:51:14 +01:00
elpresidank	9ef9ef854f	fix: iterative QA pass — resolve remaining bugs, UX and accessibility improvements Three QA iterations to convergence (zero issues remaining): Workbench UI: - Connection badge: amber "Connected (no auth)" for unauthenticated state - Theme persistence: restore script in index.html + localStorage sync - Settings About section: add bottom padding so content isn't clipped - Clear messages: cancel in-flight requests when clearing chat - Feature switch labels: proper casing + acronym handling (MCP, LLM) - Token Cost badge: hidden during loading state - ARIA: role="switch", aria-checked on toggles, aria-labels on buttons - ConfigApi: null-safe chaining for getPrompts/getSystemPrompt Grafana dashboards: - Auto-refresh 30s on all 3 dashboards - Panel heights reduced to fit viewport without scrolling - Anonymous role upgraded to Editor for Explore access Infrastructure: - Nginx: DNS resolver with variable-based upstream (prevents crash loop) - Workbench port set to 3002 in .env Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 06:33:22 -05:00
cybermaggedon	ddd4bd7790	Deliver explainability triples inline in retrieval response stream (#763 ) Provenance triples are now included directly in explain messages from GraphRAG, DocumentRAG, and Agent services, eliminating the need for follow-up knowledge graph queries to retrieve explainability details. Each explain message in the response stream now carries: - explain_id: root URI for this provenance step (unchanged) - explain_graph: named graph where triples are stored (unchanged) - explain_triples: the actual provenance triples for this step (new) Changes across the stack: - Schema: added explain_triples field to GraphRagResponse, DocumentRagResponse, and AgentResponse - Services: all explain message call sites pass triples through (graph_rag, document_rag, agent react, agent orchestrator) - Translators: encode explain_triples via TripleTranslator for gateway wire format - Python SDK: ProvenanceEvent now includes parsed ExplainEntity and raw triples; expanded event_type detection - CLI: invoke_graph_rag, invoke_agent, invoke_document_rag use inline entity when available, fall back to graph query - Tech specs updated Additional explainability test	2026-04-07 12:19:05 +01:00
cybermaggedon	2f8d6a3ffb	Fix agent config handler registration, remove debug prints, disable RabbitMQ heartbeats (#764 ) - Fix agent react and orchestrator services appending bare methods to config_handlers instead of using register_config_handler() — caused 'method object is not subscriptable' on config notify - Add exc_info to config fetch retry logging for proper tracebacks - Remove debug print statements from collection management dispatcher and translator - Disable RabbitMQ heartbeats (heartbeat=0) to prevent broker closing idle producer connections that can't process heartbeat frames from BlockingConnection	2026-04-07 12:11:12 +01:00
Sreeram Venkatasubramanian	c737e8c356	fix: reduce consumer poll timeout from 2000ms to 100ms (#761)	2026-04-07 12:09:20 +01:00
Sreeram Venkatasubramanian	f0c9039b76	fix: reduce consumer poll timeout from 2000ms to 100ms	2026-04-07 12:02:27 +01:00
elpresidank	3a80872482	fix: comprehensive QA — resolve 13 bugs, add UX improvements across all services Client SDK: add .catch() to graphRagStreaming/documentRagStreaming (silent timeout), null-guard JSON.parse in getPrompts/getSystemPrompt/getPrompt. Backend: implement "getvalues" config operation for token costs, null-check createTerm() in FalkorDB triples query, add knowledge-cores service entrypoint and Docker entry, return proper HTTP 400/404 for gateway error responses. Workbench: cancel button + elapsed timer for chat, clear agent spinner on error, flow dialog inline validation, responsive header wrapping, knowledge cores loading timeout, sidebar/page naming consistency, theme toggle indicator. Infrastructure: enable Grafana Explore for viewers, add gateway Prometheus scrape target, fix RAG pipeline dashboard layout (6 panels visible), filter Service Health to configured targets only. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 05:20:10 -05:00
elpresidank	72870a7e2e	feat: add unit tests, Docker polish, and workbench UX improvements Unit tests: Consumer class (7), recursive-splitter (10), parseJsonResponse (11) — 28 total. Docker: add 5 commented LLM provider services, dev compose override, .env.example. Workbench: chat persistence, error boundary, disconnect banner, prompts error handling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 03:51:29 -05:00
elpresidank	c7eefee607	feat: add Docker entrypoints, LLM providers, pipeline hardening, workbench pages Phase 9 — four parallel workstreams: - Stream A: 14 Docker entrypoints for containerized deployment - Stream B: Pipeline hardening — robust JSON parsing, LLM retry logic, consumer negative-ack, FalkorDB test import fix - Stream C: Azure OpenAI, OpenAI-compatible, and Mistral LLM providers - Stream D: Workbench Prompts, Token Cost, Knowledge Cores pages + Settings feature switches Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 03:22:55 -05:00
elpresidank	50fb311d2d	feat: real PDF pipeline test — end-to-end knowledge extraction working Add full pipeline test that generates a real PDF, processes it through the entire pipeline, and verifies knowledge lands in FalkorDB: - Create test PDF generator using pdf-lib (2-page doc about Acme Corp) - Add testFullPipeline() to integration tests with store verification - Fix FalkorDB client connect() — createClient returns unconnected client in both TriplesStore and TriplesQuery classes Results: PDF decoded (2 pages) → chunked (2 chunks) → extracted (4 relationships) → 16 triples stored in FalkorDB including: alice-johnson → is-a-senior-engineer → acme-corporation cloudsync → uses-aws-for-hosting → amazon-web-services provenance: pages → prov:wasDerivedFrom → source document Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 02:19:12 -05:00
elpresidank	5bc7a1b6fc	fix: resolve FlowProcessor topic collisions, librarian timeout, tests Two bugs found during end-to-end testing: 1. FlowProcessor never restarted flows when config changed — it only started them once. Stale NATS JetStream data from previous sessions caused services to bind to wrong topics. Fix: stop and restart flows on every config push that includes flow definitions. 2. Gateway publishToTopic sent messages without an id property. Pipeline FlowProcessor handlers check properties.id and silently return if missing. Fix: auto-generate a message id when publishing to topics. Both fixes validated: 13/13 integration tests passing, PDF decoder correctly receives and processes document messages through the pipeline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 01:53:55 -05:00
elpresidank	c545213224	feat: add query/retrieval FlowProcessor services and missing runner scripts Wire up the query and retrieval side of the pipeline so the agent can answer questions from stored knowledge: - Triples query service (FalkorDB) — all SPO pattern queries via NATS - Graph embeddings query service (Qdrant) — entity vector similarity - Document embeddings query service (Qdrant) — chunk vector similarity - Graph RAG service — full concept→entity→traverse→score→synthesize pipeline - Document RAG service — embed→find chunks→synthesize pipeline - Runner scripts for chunker, extractor, embeddings (missing from Phase 5) - Add DocumentEmbeddingsRequest/Response schema types - Add RAG prompt templates (extract-concepts, edge-scoring, synthesize) - Add graph/doc embeddings query topics to seed config + flow manager - Add all pipeline/query/retrieval services to docker-compose - 8 new runner scripts, 8 new pnpm script aliases Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 01:05:54 -05:00
elpresidank	8f7008822a	feat: add document pipeline — PDF decoder, Ollama LLM, storage services Add end-to-end document processing pipeline: - PDF decoder service (pdfjs-dist) extracts text per page from librarian docs - Ollama native LLM service for local model inference - FalkorDB triples store FlowProcessor consumer - Qdrant graph embeddings store FlowProcessor consumer - Fix spec name collisions in chunker/extractor (input→chunk-input, etc.) - Gateway /load endpoint to trigger document processing - Align flow manager blueprint and seed config with full pipeline topics - Add runner scripts and test coverage for document load Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 23:47:43 -05:00
elpresidank	8f9de7604e	fix: make abstract class constructors protected Marks FlowProcessor and EmbeddingsService constructors as protected since these classes should only be instantiated via subclasses. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 21:52:00 -05:00
cybermaggedon	4acd853023	Config push notify pattern: replace stateful pub/sub with signal+fetch (#760) Replace the config push mechanism that broadcast the full config blob on a 'state' class pub/sub queue with a lightweight notify signal containing only the version number and affected config types. Processors fetch the full config via request/response from the config service when notified. This eliminates the need for the pub/sub 'state' queue class and stateful pub/sub services entirely. The config push queue moves from 'state' to 'flow' class — a simple transient signal rather than a retained message. This solves the RabbitMQ late-subscriber problem where restarting processes never received the current config because their fresh queue had no historical messages. Key changes: - ConfigPush schema: config dict replaced with types list - Subscribe-then-fetch startup with retry: processors subscribe to notify queue, fetch config via request/response, then process buffered notifies with version comparison to avoid race conditions - register_config_handler() accepts optional types parameter so handlers only fire when their config types change - Short-lived config request/response clients to avoid subscriber contention on non-persistent response topics - Config service passes affected types through put/delete/flow operations - Gateway ConfigReceiver rewritten with same notify pattern and retry loop Tests updated New tests: - register_config_handler: without types, with types, multiple types, multiple handlers - on_config_notify: old/same version skipped, irrelevant types skipped (version still updated), relevant type triggers fetch, handler without types always called, mixed handler filtering, empty types invokes all, fetch failure handled gracefully - fetch_config: returns config+version, raises on error response, stops client even on exception - fetch_and_apply_config: applies to all handlers on startup, retries on failure	2026-04-06 16:57:27 +01:00
V.Sreeram	d4723566cb	fix: prevent duplicate dispatcher creation race condition in invoke_global_service (#715) * fix: prevent duplicate dispatcher creation race condition in invoke_global_service Concurrent coroutines could all pass the `if key in self.dispatchers` check before any of them wrote the result back, because `await dispatcher.start()` yields to the event loop. This caused multiple Pulsar consumers to be created on the same shared subscription, distributing responses round-robin and dropping ~2/3 of them — manifesting as a permanent spinner in the Workbench UI. Apply a double-checked asyncio.Lock in both `invoke_global_service` and `invoke_flow_service` so only one dispatcher is ever created per service key. * test: add concurrent-dispatch tests for race condition fix Add asyncio.gather-based tests that verify invoke_global_service and invoke_flow_service create exactly one dispatcher under concurrent calls, preventing the duplicate Pulsar consumer bug.	2026-04-06 11:14:32 +01:00
V.Sreeram	8f18ba0257	fix: prevent duplicate dispatcher creation race condition in invoke_global_service (#715) * fix: prevent duplicate dispatcher creation race condition in invoke_global_service Concurrent coroutines could all pass the `if key in self.dispatchers` check before any of them wrote the result back, because `await dispatcher.start()` yields to the event loop. This caused multiple Pulsar consumers to be created on the same shared subscription, distributing responses round-robin and dropping ~2/3 of them — manifesting as a permanent spinner in the Workbench UI. Apply a double-checked asyncio.Lock in both `invoke_global_service` and `invoke_flow_service` so only one dispatcher is ever created per service key. * test: add concurrent-dispatch tests for race condition fix Add asyncio.gather-based tests that verify invoke_global_service and invoke_flow_service create exactly one dispatcher under concurrent calls, preventing the duplicate Pulsar consumer bug.	2026-04-06 11:13:59 +01:00
Alex Jenkins	10a931f04c	Feat: Auto-pull missing Ollama models (#757) * fix deadlink in readme Signed-off-by: Jenkins, Kenneth Alexander <kjenkins60@gatech.edu> * feat: Auto-pull Ollama models Signed-off-by: Jenkins, Kenneth Alexander <kjenkins60@gatech.edu> * fix: Restore namespace __init__.py files for package resolution Signed-off-by: Jenkins, Kenneth Alexander <kjenkins60@gatech.edu> * fix CI Signed-off-by: Jenkins, Kenneth Alexander <kjenkins60@gatech.edu>	2026-04-06 11:10:53 +01:00
Alex Jenkins	7daa06e9e4	Feat: Auto-pull missing Ollama models (#757) * fix deadlink in readme Signed-off-by: Jenkins, Kenneth Alexander <kjenkins60@gatech.edu> * feat: Auto-pull Ollama models Signed-off-by: Jenkins, Kenneth Alexander <kjenkins60@gatech.edu> * fix: Restore namespace __init__.py files for package resolution Signed-off-by: Jenkins, Kenneth Alexander <kjenkins60@gatech.edu> * fix CI Signed-off-by: Jenkins, Kenneth Alexander <kjenkins60@gatech.edu>	2026-04-06 11:10:14 +01:00
elpresidank	25d4227cb5	fix: resolve FlowProcessor topic collisions, librarian timeout, tests Fix critical bug where all FlowProcessor services shared the same spec names ("request"/"response"), causing them to steal each other's NATS topics. Now each service uses unique spec names matching the flow config topic keys (e.g., "text-completion-request", "prompt-request", "agent-request"). Fix librarian NATS consumer timeout (500ms → 2000ms, below NATS minimum). Update seed-config and test-pipeline with correct flow topic mappings. Add prompt template runner script. Smoke test results: 11/11 passing (config CRUD, WebSocket, LLM, librarian CRUD). Agent routing verified via manual curl test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 01:02:10 -05:00
elpresidank	515fc0c264	fix: Docker build fixes, add agent/librarian/flow-manager to compose Fix Containerfiles: - Move tsconfig.json to workspace config layer for early availability - Add missing workspace package.json entries for pnpm lockfile resolution Docker Compose: - Move Grafana from port 3000 to 3030 (avoid conflicts) - Add agent, librarian, and flow-manager app services - Add librarian-data volume for document persistence Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 00:41:01 -05:00
elpresidank	7db5a1023e	feat: add flow manager, config seeding, and expanded integration tests Flow Management Service: - FlowManagerService (AsyncProcessor) handling list/get/start/stop flows and list/get blueprints via kebab-case wire format - Default blueprint with all service topic mappings - Pushes flow config to config service on start/stop Config Seeding: - seed-config.ts script pushes prompt templates (extract-relationships, extract-definitions, document-prompt, kg-prompt) and default flow definition via gateway REST API Integration Tests: - Librarian CRUD: add-document, list-documents, get-content, delete - Agent query: verifies routing through gateway to agent service - Skip flags: SKIP_LIBRARIAN=1, SKIP_AGENT=1 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 00:37:03 -05:00
elpresidank	d1f24cf759	feat: add Docker deployment with Containerfile, entrypoints, and nginx Multi-stage Containerfile for all Node.js services (single image, different CMD per docker-compose service). ESM entrypoints for gateway, config, text-completion, prompt, embeddings, agent, and librarian. Workbench gets a separate Containerfile (nginx:alpine) with SPA routing and API/WebSocket proxy to gateway. Docker Compose updated with 6 app services (gateway, config-service, text-completion, prompt, embeddings, workbench) using shared trustgraph-ts:local image. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-06 00:21:00 -05:00
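
The double-checked asyncio.Lock fix described in the #715 commits above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `Dispatcher`, `DispatcherManager`, and `get_dispatcher` are hypothetical names, and `asyncio.sleep(0)` stands in for whatever real consumer setup causes `start()` to yield to the event loop.

```python
import asyncio


class Dispatcher:
    """Hypothetical stand-in for a dispatcher whose start() yields."""

    started = 0  # how many dispatchers were actually started

    async def start(self):
        await asyncio.sleep(0)  # yield to the event loop (assumed behaviour)
        Dispatcher.started += 1


class DispatcherManager:
    """Hypothetical manager that keeps one dispatcher per service key."""

    def __init__(self):
        self.dispatchers = {}
        self.lock = asyncio.Lock()

    async def get_dispatcher(self, key):
        # First check: fast path, no lock once the dispatcher exists.
        if key in self.dispatchers:
            return self.dispatchers[key]
        async with self.lock:
            # Second check: another coroutine may have created the
            # dispatcher while this one was waiting for the lock.
            if key not in self.dispatchers:
                dispatcher = Dispatcher()
                await dispatcher.start()
                self.dispatchers[key] = dispatcher
            return self.dispatchers[key]


async def demo(n=10):
    # Fire n concurrent requests for the same key, as an
    # asyncio.gather-based test would.
    manager = DispatcherManager()
    results = await asyncio.gather(
        *(manager.get_dispatcher("config") for _ in range(n))
    )
    return manager, results


if __name__ == "__main__":
    manager, results = asyncio.run(demo())
    print(Dispatcher.started, len(manager.dispatchers))
```

Without the lock and the second check, every coroutine that passed the first `if` before the first `start()` finished would create its own dispatcher, which is the race the commit message describes.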


1463 commits