trustgraph/docs
cybermaggedon 5c6fe90fe2
Add universal document decoder with multi-format support (#705)

New universal decoder service powered by the unstructured
library, handling DOCX, XLSX, PPTX, HTML, Markdown, CSV, RTF,
ODT, EPUB and more through a single service. Tables are preserved
as HTML markup for better downstream extraction. Images are
stored in the librarian but excluded from the text
pipeline. Non-page formats support configurable section grouping
strategies (whole-document, heading, element-type, count, size);
page-based formats (PDF, PPTX, XLSX) are automatically grouped by
page.
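The grouping behaviour above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not TrustGraph's implementation. `Element`, `group_sections` and the `page_based`, `count` and `max_chars` parameters are hypothetical. The exact semantics of each strategy are also guesses: that `heading` starts a new section at each `Title` element, that `element-type` splits wherever the element category changes, and that `size` caps a section's character count.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional


@dataclass
class Element:
    # Hypothetical stand-in for one parsed element (heading, paragraph,
    # table, ...) plus the page it came from, if the format has pages.
    category: str
    text: str
    page: Optional[int] = None


def group_sections(elements: List[Element], strategy: str = "whole-document",
                   page_based: bool = False, count: int = 10,
                   max_chars: int = 4000) -> List[List[Element]]:
    """Split a flat element list into sections (assumed semantics)."""
    if not elements:
        return []
    if page_based:
        # PDF, PPTX, XLSX: one section per page, whatever the strategy.
        return [list(g) for _, g in groupby(elements, key=lambda e: e.page)]
    if strategy == "whole-document":
        return [list(elements)]
    if strategy == "count":
        # Fixed number of elements per section.
        return [elements[i:i + count] for i in range(0, len(elements), count)]
    if strategy == "element-type":
        # New section whenever the element category changes.
        return [list(g) for _, g in groupby(elements, key=lambda e: e.category)]
    if strategy not in ("heading", "size"):
        raise ValueError(f"unknown strategy: {strategy}")
    sections: List[List[Element]] = []
    current: List[Element] = []
    size = 0
    for e in elements:
        if strategy == "heading":
            # Start a new section at each heading (assumed category "Title").
            split = e.category == "Title" and bool(current)
        else:
            # Start a new section once this element would exceed max_chars.
            split = bool(current) and size + len(e.text) > max_chars
        if split:
            sections.append(current)
            current, size = [], 0
        current.append(e)
        size += len(e.text)
    sections.append(current)
    return sections
```

Forcing page-based formats to group by page keeps section boundaries aligned with the page provenance that those formats already carry.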

All four decoders (PDF, Mistral OCR, Tesseract OCR, universal)
now share the "document-decoder" ident so they are
interchangeable. PDF-only decoders fetch document metadata to
check the MIME type and gracefully skip unsupported formats.
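Because all four decoders share one ident, a PDF-only decoder may receive documents it cannot handle. The following is a hypothetical sketch of the skip check, assuming metadata can be fetched by document ID. `DocumentMetadata`, `handle`, `fetch_metadata` and `decode` are illustrative names, not TrustGraph's actual API.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List, Optional

logger = logging.getLogger("pdf-decoder")


@dataclass
class DocumentMetadata:
    # Hypothetical metadata record; the real schema may differ.
    document_id: str
    mime_type: str


PDF_MIME_TYPES = {"application/pdf"}


def handle(document_id: str,
           fetch_metadata: Callable[[str], DocumentMetadata],
           decode: Callable[[str], List[str]]) -> Optional[List[str]]:
    """Decode the document only if it is a PDF; otherwise skip it."""
    meta = fetch_metadata(document_id)
    if meta.mime_type not in PDF_MIME_TYPES:
        # Not an error: another decoder type is expected to handle it.
        logger.info("skipping %s: unsupported MIME type %s",
                    document_id, meta.mime_type)
        return None
    return decode(document_id)
```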

Librarian changes: removed MIME type whitelist validation so any
document format can be ingested. Simplified routing so text/plain
goes to text-load and everything else goes to document-load.
Removed the dual inline/streaming data paths; documents now always
use document_id for content retrieval.
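The simplified librarian routing rule amounts to a single comparison. The function name `route` is hypothetical. Stripping MIME parameters such as `; charset=utf-8` and ignoring case before comparing are assumptions, not confirmed behaviour.

```python
def route(mime_type: str) -> str:
    """Pick the load queue for a document: text/plain vs. everything else."""
    # Assumption: ignore parameters (e.g. charset) and letter case.
    base = mime_type.split(";", 1)[0].strip().lower()
    return "text-load" if base == "text/plain" else "document-load"
```

Because no whitelist is checked, unfamiliar types still reach document-load. It is then up to the decoders to accept or skip them.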

New provenance entity types (tg:Section, tg:Image) and metadata
predicates (tg:elementTypes, tg:tableCount, tg:imageCount) for
richer explainability.

The universal decoder lives in its own package (trustgraph-unstructured)
and container image (trustgraph-unstructured).
2026-03-23 12:56:35 +00:00
tech-specs Add universal document decoder with multi-format support (#705) 2026-03-23 12:56:35 +00:00
api-gateway-changes-v1.8-to-v2.1.md Update API specs for 2.1 (#699) 2026-03-17 20:36:31 +00:00
api.html Update API specs for 2.1 (#699) 2026-03-17 20:36:31 +00:00
cli-changes-v1.8-to-v2.1.md Update API specs for 2.1 (#699) 2026-03-17 20:36:31 +00:00
generate-api-docs.py Python API docs (#614) 2026-01-15 15:12:32 +00:00
python-api.md Update API specs for 2.1 (#699) 2026-03-17 20:36:31 +00:00
README.api-docs.md Python API docs (#614) 2026-01-15 15:12:32 +00:00
README.cats Added agent support to templates (#150) 2024-11-12 00:22:18 +00:00
README.challenger Added agent support to templates (#150) 2024-11-12 00:22:18 +00:00
README.md Add AsyncAPI spec for websocket (#613) 2026-01-15 11:57:16 +00:00
websocket.html Update API specs for 2.1 (#699) 2026-03-17 20:36:31 +00:00

TrustGraph Documentation

Welcome to TrustGraph! For comprehensive documentation, please visit:

📖 https://docs.trustgraph.ai

The main documentation site includes:

  • Overview - Introduction to TrustGraph concepts and architecture
  • Guides - Step-by-step tutorials and how-to guides
  • Deployment - Deployment options and configuration
  • Reference - API specifications and CLI documentation

Getting Started

New to TrustGraph? Start with the Overview to understand the system.

Ready to deploy? Check out the Deployment Guide.

Integrating with code? See the API Reference for REST, WebSocket, and SDK documentation.