Pull requests back-ported to release/v2.2: 801, 802, 804, 805
Restructure container builds for multi-platform support, enabling
ARM-based deployments (e.g. Apple Silicon via Docker Desktop).
Makefile:
- Replace per-container named targets with pattern rules
(container-%, manifest-%, platform-%-{amd64,arm64},
combine-manifest-%)
- Add parallel CI targets: platform builds push per-arch images,
combine-manifest creates and pushes the multi-arch manifest list
- Remove obsolete legacy targets (update-dcs, update-templates)
CI (release.yaml):
- Split single deploy job into build-platform-image (16 parallel
jobs: 8 containers x 2 platforms) and combine-manifests (8 jobs,
metadata only)
- Use native ARM runners (ubuntu-24.04-arm)
Containerfile.hf:
- Downgrade to Python 3.12 (PyTorch lacks arm64 wheels for 3.13)
- Use standard PyTorch package instead of +cpu variant (no arm64 wheels
on the cpu index)
SPARQL 1.1 query service wrapping pub/sub triples interface
Add a backend-agnostic SPARQL query service that parses SPARQL
queries using rdflib, decomposes them into triple pattern lookups
via the existing TriplesClient pub/sub interface, and performs
in-memory joins, filters, and projections.
Includes:
- SPARQL parser, algebra evaluator, expression evaluator, solution
sequence operations (BGP, JOIN, OPTIONAL, UNION, FILTER, BIND,
VALUES, GROUP BY, ORDER BY, LIMIT/OFFSET, DISTINCT, aggregates)
- FlowProcessor service with TriplesClientSpec
- Gateway dispatcher, request/response translators, API spec
- Python SDK method (FlowInstance.sparql_query)
- CLI command (tg-invoke-sparql-query)
- Tech spec (docs/tech-specs/sparql-query.md)
New unit tests for SPARQL query
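A rough sketch of the parsing step, using rdflib's public SPARQL machinery (the service's internal evaluator and TriplesClient wiring are not shown; the SDK call at the end uses the method name from this change with the surrounding setup assumed):

```python
from rdflib.plugins.sparql import prepareQuery

# rdflib parses the query text into a nested algebra tree; the
# evaluator walks this tree, issuing one TriplesClient lookup per
# triple pattern and joining the resulting bindings in memory.
q = prepareQuery("""
    SELECT ?s ?o WHERE {
        ?s a ?type .
        OPTIONAL { ?s ?p ?o }
    } LIMIT 10
""")

print(q.algebra)  # e.g. SelectQuery -> Slice -> Project -> LeftJoin -> ...

# SDK-side usage (FlowInstance setup assumed):
#   results = flow.sparql_query("SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")
```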
Adds a RabbitMQ backend as an alternative to Pulsar, selectable via
PUBSUB_BACKEND=rabbitmq. Both backends implement the same PubSubBackend
protocol — no application code changes needed to switch.
RabbitMQ topology:
- Single topic exchange per topicspace (e.g. 'tg')
- Routing key derived from queue class and topic name
- Shared consumers: named queue bound to exchange (competing, round-robin)
- Exclusive consumers: anonymous auto-delete queue (broadcast, each gets
every message). Used by Subscriber and config push consumer.
- Thread-local producer connections (pika is not thread-safe)
- Push-based consumption via basic_consume with process_data_events
for heartbeat processing
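A minimal pika sketch of the two consumer shapes, assuming the 'tg' exchange above; routing-key derivation is simplified to a literal:

```python
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
ch = conn.channel()

# One topic exchange per topicspace.
ch.exchange_declare(exchange="tg", exchange_type="topic", durable=True)

routing_key = "flow.example-topic"  # derived from queue class + topic name

# Shared consumer: a named queue; all consumers bound to it compete
# for messages (round-robin).
ch.queue_declare(queue="shared-q", durable=True)
ch.queue_bind(queue="shared-q", exchange="tg", routing_key=routing_key)

# Exclusive consumer: an anonymous auto-delete queue per consumer,
# so each consumer receives its own copy of every message (broadcast).
anon = ch.queue_declare(queue="", exclusive=True, auto_delete=True)
ch.queue_bind(queue=anon.method.queue, exchange="tg", routing_key=routing_key)

def on_message(channel, method, properties, body):
    channel.basic_ack(delivery_tag=method.delivery_tag)

ch.basic_consume(queue="shared-q", on_message_callback=on_message)

# Push-based consumption; process_data_events also services AMQP
# heartbeats, keeping the connection alive.
while True:
    conn.process_data_events(time_limit=1)
```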
Consumer model changes:
- Consumer class creates one backend consumer per concurrent task
(required for pika thread safety, harmless for Pulsar)
- Consumer class accepts consumer_type parameter
- Subscriber passes consumer_type='exclusive' for broadcast semantics
- Config push consumer uses consumer_type='exclusive' so every
processor instance receives config updates
- handle_one_from_queue receives consumer as parameter for correct
per-connection ack/nack
LibrarianClient:
- New shared client class replacing duplicated librarian request-response
code across 6+ services (chunking, decoders, RAG, etc.)
- Uses stream-document instead of get-document-content for fetching
document content in 1MB chunks (avoids broker message size limits)
- Standalone object (self.librarian = LibrarianClient(...)), not a mixin
- get-document-content marked deprecated in schema and OpenAPI spec
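A sketch of the composition pattern, with hypothetical method and constructor signatures (the real LibrarianClient API may differ):

```python
class ChunkingService:
    def __init__(self, pubsub):
        # Standalone client object, not a mixin: the service holds a
        # LibrarianClient rather than inheriting its plumbing.
        self.librarian = LibrarianClient(pubsub)  # hypothetical signature

    async def load(self, document_id):
        # stream-document delivers content in ~1MB chunks, so no
        # single pub/sub message approaches the broker size limit.
        content = bytearray()
        async for chunk in self.librarian.stream_document(document_id):
            content.extend(chunk)
        return bytes(content)
```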
Serialisation:
- Extracted dataclass_to_dict/dict_to_dataclass to shared
serialization.py (used by both Pulsar and RabbitMQ backends)
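A minimal sketch of what these helpers do, assuming flat or simply nested dataclasses (the shared implementation may cover more cases):

```python
import dataclasses

def dataclass_to_dict(obj):
    # Recursively convert a dataclass instance into plain dicts/lists,
    # ready for JSON encoding on either backend.
    return dataclasses.asdict(obj)

def dict_to_dataclass(cls, data):
    # Rebuild a dataclass from a dict, recursing into dataclass fields.
    kwargs = {}
    for f in dataclasses.fields(cls):
        value = data.get(f.name)
        if dataclasses.is_dataclass(f.type) and isinstance(value, dict):
            value = dict_to_dataclass(f.type, value)
        kwargs[f.name] = value
    return cls(**kwargs)
```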
Librarian queues:
- Changed from flow class (persistent) back to request/response class
now that stream-document eliminates large single messages
- API upload chunk size reduced from 5MB to 3MB to stay under broker
limits after base64 encoding
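The arithmetic: base64 expands payloads by a factor of 4/3, so a 5MB chunk encodes to about 6.7MB (over a 5MB broker message limit) while a 3MB chunk encodes to exactly 4MB:

```python
import base64

MB = 1024 * 1024

for chunk_mb in (5, 3):
    encoded = base64.b64encode(b"\0" * (chunk_mb * MB))
    print(f"{chunk_mb} MB -> {len(encoded) / MB:.2f} MB encoded")

# 5 MB -> 6.67 MB encoded (over a 5 MB limit)
# 3 MB -> 4.00 MB encoded (safely under)
```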
Factory and CLI:
- get_pubsub() handles 'rabbitmq' backend with RabbitMQ connection params
- add_pubsub_args() includes RabbitMQ options (host, port, credentials)
- add_pubsub_args(standalone=True) defaults to localhost for CLI tools
- init_trustgraph skips Pulsar admin setup for non-Pulsar backends
- tg-dump-queues and tg-monitor-prompts use backend abstraction
- BaseClient and ConfigClient accept generic pubsub config
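A hypothetical sketch of the factory dispatch (get_pubsub and PUBSUB_BACKEND come from this change; module paths and class names are assumptions):

```python
import os

def get_pubsub(**params):
    backend = params.get("backend") or os.environ.get("PUBSUB_BACKEND", "pulsar")
    if backend == "rabbitmq":
        # Assumed module path and class name.
        from trustgraph.messaging.rabbitmq import RabbitMQBackend
        return RabbitMQBackend(
            host=params.get("rabbitmq_host", "localhost"),
            port=params.get("rabbitmq_port", 5672),
        )
    # Default: Pulsar (assumed module path and class name).
    from trustgraph.messaging.pulsar import PulsarBackend
    return PulsarBackend(url=params.get("pulsar_url"))
```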
Add universal document decoder with multi-format support
using 'unstructured'.
New universal decoder service powered by the unstructured
library, handling DOCX, XLSX, PPTX, HTML, Markdown, CSV, RTF,
ODT, EPUB and more through a single service. Tables are preserved
as HTML markup for better downstream extraction. Images are
stored in the librarian but excluded from the text
pipeline. Configurable section grouping strategies
(whole-document, heading, element-type, count, size) for non-page
formats. Page-based formats (PDF, PPTX, XLSX) are automatically
grouped by page.
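The decoding core can be illustrated with the unstructured partitioning API (a sketch; grouping, librarian storage and provenance sit on top):

```python
from unstructured.partition.auto import partition

# partition() auto-detects the format (DOCX, XLSX, PPTX, HTML, EPUB, ...)
# and returns a flat list of typed elements.
elements = partition(filename="report.docx")

for el in elements:
    if el.category == "Table":
        # Tables carry an HTML rendering (when available), preserved
        # for downstream extraction rather than flattened to text.
        print(el.metadata.text_as_html)
    elif el.category == "Image":
        pass  # stored in the librarian, excluded from the text pipeline
    else:
        print(el.category, (el.text or "")[:80])
```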
All four decoders (PDF, Mistral OCR, Tesseract OCR, universal)
now share the "document-decoder" ident so they are
interchangeable. PDF-only decoders fetch document metadata to
check MIME type and gracefully skip unsupported formats.
Librarian changes: removed MIME type whitelist validation so any
document format can be ingested. Simplified routing so text/plain
goes to text-load and everything else goes to document-load.
Removed dual inline/streaming data paths — documents always use
document_id for content retrieval.
New provenance entity types (tg:Section, tg:Image) and metadata
predicates (tg:elementTypes, tg:tableCount, tg:imageCount) for
richer explainability.
Universal decoder is in its own package (trustgraph-unstructured)
and container image (trustgraph-unstructured).
* Changed schema from Value -> Term, a major breaking change
* Following the schema change, propagated Value -> Term through all processing
* Updated Cassandra for g, p, s, o index patterns (7 indexes)
* Reviewed and updated all tests
* Neo4j, Memgraph and FalkorDB remain broken; will revisit once the schema change settles down
- Keeps processing in different flows separate so that data can be routed to different stores, collections, etc.
- Potentially supports different processing flows
- Tidies the processing API with common base classes (e.g. for LLMs) and automatic configuration of clients to use the correct queue names within a flow
* Fixed error reporting in config
- Updated tg-init-pulsar to load initial config into config-svc
- Tweaked API naming and added more config calls
* Tools to dump out prompts and agent tools
* Locked 0.11 packages to 0.11 deps
- Added 'trustgraph' uber-package which installs the rest
- Added dependency to set package versions before building packages
* Bump version
* Some basic structure for workflows
* Add PyPI publication for 0.12
* Bump version
* Test bundle generation
* Install jsonnet
* Use release action to automate release creation
* Renaming what will become the core package
* Tweaking to get package build working
* Fix metering merge
* Rename to core directory
* Bump version. Use namespace searching for packaging trustgraph-core
* Change references to trustgraph-core
* Forming embeddings-hf package
* Reference modules in core package.
* Build both packages to one container, bump version
* Update YAMLs
* Separate Prom metrics, different processors as different jobs
* Create producers before consumers; may streamline startup.
* Bump version
* Add Pulsar init command, will replace pulsar-admin invocations.
* Integrate tg-init-pulsar with YAMLs
* Update YAMLs
* Add a Prom metric to consumers & consumer/producers to track the running
state.
* New script that gets processor state using Prometheus
* Bump version, add tg-processor-state to package
* Update templates