trustgraph/docs
cybermaggedon a630e143ef
Incremental / large document loading (#659)
Tech spec

BlobStore (trustgraph-flow/trustgraph/librarian/blob_store.py):
- get_stream() - yields document content in chunks for streaming retrieval
- create_multipart_upload() - initializes S3 multipart upload, returns
  upload_id
- upload_part() - uploads a single part, returns etag
- complete_multipart_upload() - finalizes upload with part etags
- abort_multipart_upload() - cancels and cleans up
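
The multipart lifecycle listed above can be sketched with an in-memory stand-in. Only the method names come from the list. The class name InMemoryBlobStore, the argument names, and the use of a SHA-256 digest as the etag are assumptions made for illustration, not the actual blob_store.py code or S3 behaviour.

```python
import hashlib
import uuid


class InMemoryBlobStore:
    """Hypothetical stand-in that mirrors the method names listed above."""

    def __init__(self):
        self.objects = {}  # object_id -> assembled bytes
        self.uploads = {}  # upload_id -> {part_number: bytes}

    def create_multipart_upload(self, object_id):
        # Start a new upload and hand back an opaque upload_id.
        upload_id = str(uuid.uuid4())
        self.uploads[upload_id] = {}
        return upload_id

    def upload_part(self, object_id, upload_id, part_number, data):
        # Store one part and return an etag (assumed here: a SHA-256 digest).
        self.uploads[upload_id][part_number] = data
        return hashlib.sha256(data).hexdigest()

    def complete_multipart_upload(self, object_id, upload_id, parts):
        # parts is a list of (part_number, etag); verify, then assemble in order.
        stored = self.uploads[upload_id]
        ordered = sorted(parts)
        for number, etag in ordered:
            if hashlib.sha256(stored[number]).hexdigest() != etag:
                raise ValueError(f"etag mismatch for part {number}")
        self.objects[object_id] = b"".join(stored[n] for n, _ in ordered)
        del self.uploads[upload_id]

    def abort_multipart_upload(self, object_id, upload_id):
        # Drop any parts received so far.
        self.uploads.pop(upload_id, None)

    def get_stream(self, object_id, chunk_size=4):
        # Yield the stored content in fixed-size chunks.
        data = self.objects[object_id]
        for offset in range(0, len(data), chunk_size):
            yield data[offset:offset + chunk_size]
```

A caller would create the upload, send parts in any order, and pass the collected (part_number, etag) pairs to complete_multipart_upload(). Calling abort_multipart_upload() instead discards the parts.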

Cassandra schema (trustgraph-flow/trustgraph/tables/library.py):
- New upload_session table with 24-hour TTL
- Index on user for listing sessions
- Prepared statements for all operations
- Methods: create_upload_session(), get_upload_session(),
  update_upload_session_chunk(), delete_upload_session(),
  list_upload_sessions()
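
As a rough model of the session table above, the sketch below keeps sessions in a dict and hides any that are older than 24 hours. Cassandra expires rows by itself through the TTL, so the explicit age check here only imitates the visible effect. The field names and the now parameter are assumptions for illustration, not the real library.py schema.

```python
from dataclasses import dataclass, field

SESSION_TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL, as stated above


@dataclass
class UploadSession:
    # Field names are assumptions made for this sketch.
    upload_id: str
    user: str
    total_chunks: int
    created_at: float
    chunks: dict = field(default_factory=dict)  # chunk index -> etag

    def missing_chunks(self):
        """Chunk indexes not yet recorded, i.e. what a resume would still send."""
        return [i for i in range(self.total_chunks) if i not in self.chunks]


class SessionTable:
    """Hypothetical in-memory model using the method names listed above."""

    def __init__(self):
        self.rows = {}  # upload_id -> UploadSession

    def _live(self, session, now):
        return now - session.created_at < SESSION_TTL_SECONDS

    def create_upload_session(self, session):
        self.rows[session.upload_id] = session

    def get_upload_session(self, upload_id, now):
        session = self.rows.get(upload_id)
        if session is None or not self._live(session, now):
            return None
        return session

    def update_upload_session_chunk(self, upload_id, chunk_index, etag):
        self.rows[upload_id].chunks[chunk_index] = etag

    def delete_upload_session(self, upload_id):
        self.rows.pop(upload_id, None)

    def list_upload_sessions(self, user, now):
        # Stands in for the "index on user" lookup.
        return [s for s in self.rows.values()
                if s.user == user and self._live(s, now)]
```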

- Schema extended with UploadSession, UploadProgress, and new
  request/response fields
- Librarian methods: begin_upload, upload_chunk, complete_upload,
  abort_upload, get_upload_status, list_uploads
- Service routing for all new operations
- Python SDK with transparent chunked upload:
  - add_document() auto-switches to chunked for files > 10MB
  - Progress callback support (on_progress)
  - get_pending_uploads(), get_upload_status(), abort_upload(),
    resume_upload()
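
The auto-switch and progress callback described above might look roughly like the following. This is a sketch under assumptions, not the SDK's code: add_document_sketch, send_whole, send_chunk, and the default chunk size are hypothetical, and the exact byte value used for the 10MB threshold is assumed.

```python
CHUNKED_THRESHOLD = 10 * 1024 * 1024  # "> 10MB" from the list above; byte value assumed
DEFAULT_CHUNK_SIZE = 5 * 1024 * 1024  # assumed, not taken from the SDK


def add_document_sketch(data, send_whole, send_chunk,
                        threshold=CHUNKED_THRESHOLD,
                        chunk_size=DEFAULT_CHUNK_SIZE,
                        on_progress=None):
    """Send small payloads in one call and large ones chunk by chunk.

    send_whole(data) and send_chunk(index, chunk) are hypothetical
    transport callbacks supplied by the caller.
    """
    total = len(data)
    if total <= threshold:
        send_whole(data)
        return "single"

    sent = 0
    for offset in range(0, total, chunk_size):
        chunk = data[offset:offset + chunk_size]
        send_chunk(offset // chunk_size, chunk)
        sent += len(chunk)
        if on_progress is not None:
            on_progress(sent, total)  # bytes sent so far, total bytes
    return "chunked"
```

The caller sees one entry point either way, which is what makes the chunking transparent. The callback receives running totals, so it can drive a progress bar directly.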

- Document table: Added parent_id and document_type columns with index
- Document schema (knowledge/document.py): Added document_id field for
  streaming retrieval
- Librarian operations:
  - add-child-document for extracted PDF pages
  - list-children to get child documents
  - stream-document for chunked content retrieval
  - Cascade delete removes children when parent is deleted
  - list-documents filters children by default
- PDF decoder (decoding/pdf/pdf_decoder.py): Updated to stream large
  documents from librarian API to temp file
- Librarian service (librarian/service.py): Sends document_id instead of
  content for large PDFs (>2MB)
- Deprecated tools (load_pdf.py, load_text.py): Added deprecation
  warnings directing users to tg-add-library-document +
  tg-start-library-processing
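
The parent/child behaviour in the list above (child documents, filtered listing, cascade delete) can be illustrated with a small in-memory index. The class DocumentIndex, the Python method names, and the document_type values "source" and "page" are assumptions for this sketch. Only the parent_id and document_type columns and the described behaviour come from the text.

```python
class DocumentIndex:
    """Hypothetical in-memory model of the parent/child document rules."""

    def __init__(self):
        # doc_id -> {"parent_id": ..., "document_type": ...}
        self.docs = {}

    def add_document(self, doc_id, document_type="source"):
        self.docs[doc_id] = {"parent_id": None, "document_type": document_type}

    def add_child_document(self, parent_id, doc_id, document_type="page"):
        # A child (e.g. an extracted PDF page) must point at an existing parent.
        if parent_id not in self.docs:
            raise KeyError(f"unknown parent: {parent_id}")
        self.docs[doc_id] = {"parent_id": parent_id,
                             "document_type": document_type}

    def list_children(self, parent_id):
        return [d for d, meta in self.docs.items()
                if meta["parent_id"] == parent_id]

    def list_documents(self, include_children=False):
        # Children are hidden unless the caller asks for them.
        return [d for d, meta in self.docs.items()
                if include_children or meta["parent_id"] is None]

    def delete_document(self, doc_id):
        # Cascade: remove all descendants before the document itself.
        for child in self.list_children(doc_id):
            self.delete_document(child)
        self.docs.pop(doc_id, None)
```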

Remove load_pdf and load_text utils

Move chunker/librarian comms to base class

Updating tests
2026-03-04 16:57:58 +00:00
tech-specs            Incremental / large document loading (#659)  2026-03-04 16:57:58 +00:00
api.html              Add AsyncAPI spec for websocket (#613)       2026-01-15 11:57:16 +00:00
generate-api-docs.py  Python API docs (#614)                       2026-01-15 15:12:32 +00:00
python-api.md         Python API docs (#614)                       2026-01-15 15:12:32 +00:00
README.api-docs.md    Python API docs (#614)                       2026-01-15 15:12:32 +00:00
README.cats           Added agent support to templates (#150)      2024-11-12 00:22:18 +00:00
README.challenger     Added agent support to templates (#150)      2024-11-12 00:22:18 +00:00
README.md             Add AsyncAPI spec for websocket (#613)       2026-01-15 11:57:16 +00:00
websocket.html        Add AsyncAPI spec for websocket (#613)       2026-01-15 11:57:16 +00:00

TrustGraph Documentation

Welcome to TrustGraph! For comprehensive documentation, please visit:

📖 https://docs.trustgraph.ai

The main documentation site includes:

  • Overview - Introduction to TrustGraph concepts and architecture
  • Guides - Step-by-step tutorials and how-to guides
  • Deployment - Deployment options and configuration
  • Reference - API specifications and CLI documentation

Getting Started

New to TrustGraph? Start with the Overview to understand the system.

Ready to deploy? Check out the Deployment Guide.

Integrating with code? See the API Reference for REST, WebSocket, and SDK documentation.