mirror of
https://github.com/trustgraph-ai/trustgraph.git
synced 2026-04-25 00:16:23 +02:00
Add universal document decoder with multi-format support using 'unstructured'.
New universal decoder service powered by the unstructured library, handling DOCX, XLSX, PPTX, HTML, Markdown, CSV, RTF, ODT, EPUB and more through a single service. Tables are preserved as HTML markup for better downstream extraction. Images are stored in the librarian but excluded from the text pipeline. Configurable section grouping strategies (whole-document, heading, element-type, count, size) for non-page formats. Page-based formats (PDF, PPTX, XLSX) are automatically grouped by page.
All four decoders (PDF, Mistral OCR, Tesseract OCR, universal) now share the "document-decoder" ident so they are interchangeable. PDF-only decoders fetch document metadata to check MIME type and gracefully skip unsupported formats.
Librarian changes: removed MIME type whitelist validation so any document format can be ingested. Simplified routing so text/plain goes to text-load and everything else goes to document-load. Removed dual inline/streaming data paths; documents always use document_id for content retrieval.
New provenance entity types (tg:Section, tg:Image) and metadata predicates (tg:elementTypes, tg:tableCount, tg:imageCount) for richer explainability.
Universal decoder is in its own package (trustgraph-unstructured) and container image (trustgraph-unstructured).
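The simplified routing rule in the commit message above (text/plain goes to text-load, everything else goes to document-load) can be illustrated with a minimal sketch. The function name `route_for_mime_type` is hypothetical and is not taken from the TrustGraph code base. Stripping MIME parameters and ignoring case are also assumptions, since the commit message does not say how the librarian normalises MIME types.

```python
def route_for_mime_type(mime_type: str) -> str:
    """Return the loader name a document would be routed to.

    Hypothetical helper for illustration only; the real librarian
    routing code may look quite different.
    """
    # Assumption: ignore parameters such as "; charset=utf-8" and
    # compare case-insensitively.
    base = mime_type.split(";", 1)[0].strip().lower()

    # Rule stated in the commit message: plain text goes to text-load,
    # every other format goes to document-load.
    if base == "text/plain":
        return "text-load"
    return "document-load"


if __name__ == "__main__":
    for mime in ("text/plain", "application/pdf", "text/html"):
        print(mime, "->", route_for_mime_type(mime))
```

Under this rule no MIME type is rejected at ingestion time, which matches the removal of whitelist validation; a decoder that cannot handle a format is expected to skip it downstream.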
Contents of this directory:
- tech-specs
- api-gateway-changes-v1.8-to-v2.1.md
- api.html
- cli-changes-v1.8-to-v2.1.md
- generate-api-docs.py
- python-api.md
- README.api-docs.md
- README.cats
- README.challenger
- README.md
- websocket.html
TrustGraph Documentation
Welcome to TrustGraph! For comprehensive documentation, please visit:
📖 https://docs.trustgraph.ai
The main documentation site includes:
- Overview - Introduction to TrustGraph concepts and architecture
- Guides - Step-by-step tutorials and how-to guides
- Deployment - Deployment options and configuration
- Reference - API specifications and CLI documentation
Getting Started
New to TrustGraph? Start with the Overview to understand the system.
Ready to deploy? Check out the Deployment Guide.
Integrating with code? See the API Reference for REST, WebSocket, and SDK documentation.