trustgraph

mirror of https://github.com/trustgraph-ai/trustgraph.git synced 2026-06-11 15:55:12 +02:00

Author	SHA1	Message	Date
cybermaggedon	96fd1eab15	Use UUID-based URNs for page and chunk IDs (#703) Page and chunk document IDs were deterministic ({doc_id}/p{num}, {doc_id}/p{num}/c{num}), causing "Document already exists" errors when reprocessing documents through different flows. Content may differ between runs due to different parameters or extractors, so deterministic IDs are incorrect. Pages now use urn:page:{uuid}, chunks use urn:chunk:{uuid}. Parent-child relationships are tracked via librarian metadata and provenance triples. Also brings Mistral OCR and Tesseract OCR decoders up to parity with the PDF decoder: librarian fetch/save support, per-page output with unique IDs, and provenance triple emission. Fixes Mistral OCR bug where only the first 5 pages were processed.	2026-03-21 21:17:03 +00:00
cybermaggedon	88fe8468bc	Update CI for 2.1 release (#653)	2026-02-28 11:10:11 +00:00
cybermaggedon	cf0daedefa	Changed schema for Value -> Term, majorly breaking change (#622) * Changed schema for Value -> Term, majorly breaking change * Following the schema change, Value -> Term into all processing * Updated Cassandra for g, p, s, o index patterns (7 indexes) * Reviewed and updated all tests * Neo4j, Memgraph and FalkorDB remain broken, will look at once settled down	2026-01-27 13:48:08 +00:00
cybermaggedon	e4f0013841	Open 1.9 branch (#620)	2026-01-26 17:36:25 +00:00
Cyber MacGeddon	1865b3f3c8	Start 1.8 release branch	2025-12-17 21:32:13 +00:00
Cyber MacGeddon	98aaa4f67e	Configure for 1.7 release branch	2025-12-03 09:46:55 +00:00
cybermaggedon	97d8b84d7f	Open 1.6 release branch (#564)	2025-11-24 10:05:29 +00:00
cybermaggedon	ad35656811	Prepare 1.5 release branch (#550)	2025-10-11 11:44:00 +01:00
cybermaggedon	0b59f0c828	Maint/open 1.4 release branch (#508) * Change pyproject files for 1.4 * Fix tests to track 1.4	2025-09-10 22:11:03 +01:00
cybermaggedon	5139c6ad5d	Bump pyproject.toml constraints (#477)	2025-08-28 13:45:58 +01:00
cybermaggedon	dd70aade11	Implement logging strategy (#444) * Logging strategy and convert all prints() to logging invocations	2025-07-30 23:18:38 +01:00
cybermaggedon	98022d6af4	Migrate from setup.py to pyproject.toml (#440) * Converted setup.py to pyproject.toml * Modern package infrastructure as recommended by py docs	2025-07-23 21:22:08 +01:00
Cyber MacGeddon	1fe4ed5226	Update Python deps to 1.2	2025-07-17 19:26:19 +01:00
Cyber MacGeddon	f0b2752abf	Bump setup.py versions for 1.1	2025-07-02 16:40:13 +01:00
cybermaggedon	6dc7b4cbfc	Merge pull request #382 from trustgraph-ai/fix/import-queues-not-working Fix/import queues not working	2025-05-17 13:02:58 +01:00
Cyber MacGeddon	848d93922b	Port Tesseract OCR code to new API	2025-05-12 16:27:04 +01:00
Cyber MacGeddon	6dadf30c66	Bump package versions	2025-05-08 22:06:58 +01:00
cybermaggedon	099018e103	Update package versions (#352)	2025-04-25 19:45:02 +01:00
cybermaggedon	a9197d11ee	Feature/configure flows (#345) - Keeps processing in different flows separate so that data can go to different stores / collections etc. - Potentially supports different processing flows - Tidies the processing API with common base-classes for e.g. LLMs, and automatic configuration of 'clients' to use the right queue names in a flow	2025-04-22 20:21:38 +01:00
Cyber MacGeddon	b1cefbe1f7	Update setup.py files to prep 0.22 branch	2025-03-31 22:14:38 +01:00
cybermaggedon	c759d55734	Added module which does OCR for PDF, pdf-ocr in a separate package (#324) (has a lot of dependencies). Uses Tesseract.	2025-03-20 09:29:40 +00:00

21 commits
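
Commit 96fd1eab15 replaces deterministic page and chunk IDs with urn:page:{uuid} and urn:chunk:{uuid}. The following is a minimal sketch of why that avoids the "Document already exists" error on reprocessing. It is not TrustGraph code: the `Store` class and all function names are hypothetical stand-ins, and only the two ID formats are taken from the commit message.

```python
import uuid

# All names below are illustrative assumptions, not TrustGraph's API.

def legacy_page_id(doc_id: str, num: int) -> str:
    """Old deterministic scheme from the commit message: {doc_id}/p{num}."""
    return f"{doc_id}/p{num}"

def new_page_id() -> str:
    """New scheme: urn:page:{uuid}, unique on every call."""
    return f"urn:page:{uuid.uuid4()}"

def new_chunk_id() -> str:
    """New scheme: urn:chunk:{uuid}, unique on every call."""
    return f"urn:chunk:{uuid.uuid4()}"

class Store:
    """Toy stand-in for a document store that rejects duplicate IDs."""

    def __init__(self):
        self.docs = {}

    def save(self, doc_id: str, parent_id: str, content: str) -> None:
        if doc_id in self.docs:
            raise ValueError("Document already exists")
        # The ID no longer encodes the parent, so the parent is kept as metadata.
        self.docs[doc_id] = {"parent": parent_id, "content": content}

def process(store: Store, doc_id: str, pages: list, deterministic: bool) -> list:
    """Save one record per page and return the page IDs that were used."""
    ids = []
    for num, text in enumerate(pages, start=1):
        page_id = legacy_page_id(doc_id, num) if deterministic else new_page_id()
        store.save(page_id, doc_id, text)
        ids.append(page_id)
    return ids
```

With `deterministic=True`, a second call to `process` for the same document raises on the first page, because the ID is the same as in the first run. With `deterministic=False`, both runs succeed and each page record still points back to its document through the stored parent field.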