From 3ac49528d6f1e20accacbec3759990b4a43a4165 Mon Sep 17 00:00:00 2001
From: Cyber MacGeddon
Date: Thu, 11 Jul 2024 22:41:42 +0100
Subject: [PATCH] Update docs

---
 README.md | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 406778dd..2ee161b9 100644
--- a/README.md
+++ b/README.md
@@ -79,6 +79,14 @@ Pulsar provides two types of connectivity:
   to answer prompts.
 - `loader` - Takes a document and loads into the processing pipeline.
   Used e.g. to add PDF documents.
-- `pdf-decoder` - 
-- `vector-write-milvus` - 
+- `pdf-decoder` - Takes a PDF document and emits the text extracted from it.
+  Text extraction from PDF is not a perfect science, because PDF is a
+  print-oriented format. For instance, line wrapping in a PDF document
+  is not semantically encoded, so the decoder sees wrapped lines as
+  space-separated text.
+- `vector-write-milvus` - Takes vector-entity mappings and records them
+  in the graph.
 
+## Getting started
+
+TBD
\ No newline at end of file