Update README.md

Ray 2025-08-09 01:39:50 +08:00 committed by GitHub
parent f0d5642681
commit 87633b2cbb

### PageIndex OCR (Update on 2025/08/07)
This repo is designed to generate PageIndex tree structures from text input, but many real-world use cases involve PDFs that must first be converted to Markdown via OCR. However, extracting high-quality text from PDF documents remains a non-trivial challenge: most OCR tools only extract page-level content, losing the broader document context and hierarchy.
To address this, we introduced PageIndex OCR, the first OCR system designed to preserve the global structure of documents. PageIndex OCR significantly outperforms other leading OCR tools, such as those from Mistral and Contextual AI, in recognizing the true hierarchy and semantic relationships across document pages.
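Lost hierarchy matters because a tree index can only be as deep as the heading structure it receives. The following is a minimal, hypothetical sketch that nests Markdown headings into a tree. It is not PageIndex's actual implementation. `Node`, `build_tree`, and `depth` are illustrative names. The `flat` input assumes an OCR tool that emits every heading at the same level, which is one way page-level extraction can lose structure.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One heading in a hypothetical document tree (illustrative, not PageIndex's schema)."""
    title: str
    level: int
    children: list["Node"] = field(default_factory=list)


# ATX-style Markdown heading: 1-6 '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def build_tree(markdown: str) -> Node:
    """Nest each heading under its nearest shallower ancestor (hypothetical helper)."""
    root = Node(title="<root>", level=0)
    stack = [root]  # path from the root to the most recently seen heading
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if not match:
            continue  # body text is ignored in this sketch
        node = Node(title=match.group(2), level=len(match.group(1)))
        # Pop until the top of the stack is a strict ancestor of this heading.
        while stack[-1].level >= node.level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def depth(node: Node) -> int:
    """Number of heading levels beneath this node (0 for a leaf)."""
    return max((1 + depth(child) for child in node.children), default=0)


# Headings with correct levels, as a structure-preserving conversion would emit them.
structured = """# Annual Report
## 1 Financials
### 1.1 Revenue
### 1.2 Costs
## 2 Outlook
"""

# The same headings flattened to one level (assumed OCR failure mode).
flat = """# Annual Report
# 1 Financials
# 1.1 Revenue
# 1.2 Costs
# 2 Outlook
"""

print(depth(build_tree(structured)))  # → 3
print(depth(build_tree(flat)))        # → 1
```

With correct heading levels, the sketch recovers a three-level tree. With flattened levels, every section becomes a sibling of the root, and the parent/child relationships a tree index would navigate are gone.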
- Experience next-level OCR quality with PageIndex OCR at our [Dashboard](https://dash.pageindex.ai).
- Seamlessly integrate PageIndex OCR into your stack via our [API](https://lnkd.in/e4Kbk59T).
<img width="3016" height="1644" alt="image" src="https://github.com/user-attachments/assets/eb35d8ae-865c-4e60-a33b-ebbd00c41732" />