trustgraph

mirror of https://github.com/trustgraph-ai/trustgraph.git synced 2026-07-10 22:02:12 +02:00

Author	SHA1	Message	Date
cybermaggedon	03c7a7c5a8	Add multi-arch (amd64/arm64) container builds and parallel CI Pull requests back-ported to release/v2.2: 801, 802, 804, 805 Restructure container builds for multi-platform support, enabling ARM-based deployments (e.g. Apple Silicon via Docker Desktop). Makefile: - Replace per-container named targets with pattern rules (container-%, manifest-%, platform-%-{amd64,arm64}, combine-manifest-%) - Add parallel CI targets: platform builds push per-arch images, combine-manifest creates and pushes the multi-arch manifest list - Remove legacy cruft targets (update-dcs, update-templates) CI (release.yaml): - Split single deploy job into build-platform-image (16 parallel jobs: 8 containers x 2 platforms) and combine-manifests (8 jobs, metadata only) - Use native ARM runners (ubuntu-24.04-arm) Containerfile.hf: - Downgrade to Python 3.12 (PyTorch lacks arm64 wheels for 3.13) - Use standard PyTorch package instead of +cpu variant (no arm64 wheels on the cpu index)	2026-04-14 12:22:12 +01:00
cybermaggedon	413f917676	Add missing pdf extra to unstructured dependency (#728) * Fix PDF processing deps so that PDF processing works	2026-03-29 20:22:45 +01:00
cybermaggedon	5c6fe90fe2	Add universal document decoder with multi-format support (#705) Add universal document decoder with multi-format support using 'unstructured'. New universal decoder service powered by the unstructured library, handling DOCX, XLSX, PPTX, HTML, Markdown, CSV, RTF, ODT, EPUB and more through a single service. Tables are preserved as HTML markup for better downstream extraction. Images are stored in the librarian but excluded from the text pipeline. Configurable section grouping strategies (whole-document, heading, element-type, count, size) for non-page formats. Page-based formats (PDF, PPTX, XLSX) are automatically grouped by page. All four decoders (PDF, Mistral OCR, Tesseract OCR, universal) now share the "document-decoder" ident so they are interchangeable. PDF-only decoders fetch document metadata to check MIME type and gracefully skip unsupported formats. Librarian changes: removed MIME type whitelist validation so any document format can be ingested. Simplified routing so text/plain goes to text-load and everything else goes to document-load. Removed dual inline/streaming data paths: documents always use document_id for content retrieval. New provenance entity types (tg:Section, tg:Image) and metadata predicates (tg:elementTypes, tg:tableCount, tg:imageCount) for richer explainability. Universal decoder is in its own package (trustgraph-unstructured) and container image (trustgraph-unstructured).	2026-03-23 12:56:35 +00:00
cybermaggedon	08063a5ee9	Remove unused deps (#640) * Removed the Google GenAI hard-coded install	2026-02-20 10:13:44 +00:00
cybermaggedon	05b9063fea	Feature/python3.13 (#553) * Python to 3.13 * cassandra-driver -> scylla-driver (cassandra-driver doesn't work with Python 3.13)	2025-10-11 12:19:26 +01:00
cybermaggedon	98022d6af4	Migrate from setup.py to pyproject.toml (#440) * Converted setup.py to pyproject.toml * Modern package infrastructure as recommended by py docs	2025-07-23 21:22:08 +01:00
cybermaggedon	f907ea7db8	PoC MCP server (#419) * Very initial MCP server PoC for TrustGraph * Put service on port 8000 * Add MCP container and packages to buildout	2025-07-02 18:19:23 +01:00
cybermaggedon	81d73445bd	Add missing dependencies to the PDF OCR container (#411)	2025-06-16 14:15:16 +01:00
cybermaggedon	448819ed47	Updates to Google AI: (#394) - Changed GoogleAIStudio LLM code to match latest documentation - Very minor tweak to vertexai LLM code - just matching what's in SDK docs no actual change to implementation. - Tweaked VertexAI container build to speed up in dev - Comments in LLM code to mention which docs it was built from. Google SDKs are confusing ATM.	2025-05-24 12:09:43 +01:00
cybermaggedon	b380c2054d	Change containers to Python 3.12 (#386)	2025-05-17 23:15:29 +01:00
cybermaggedon	e04d3631fd	Update container deps (#385) - Fedora 42 container - Pulsar client 3.7.0 - Latest AI libs	2025-05-17 20:54:56 +01:00
cybermaggedon	322725be04	Fix container build (#325)	2025-03-20 09:38:54 +00:00
cybermaggedon	c759d55734	Added module which does OCR for PDF, pdf-ocr in a separate package (#324) (has a lot of dependencies). Uses Tesseract.	2025-03-20 09:29:40 +00:00
JackColquitt	d676804107	Added Mistral jsonnet templates	2025-03-14 18:07:51 -07:00
cybermaggedon	edcdc4d59d	Feature/separate containers (#287) * Separate containerfiles * Add push to Makefile * Update image names in the templates	2025-01-28 19:36:05 +00:00

15 commits