Mirror of https://github.com/VectifyAI/PageIndex.git (synced 2026-04-24 23:56:21 +02:00)
Update README.md
parent 3eb7a9f11d
commit a3baeaf52c
1 changed file with 12 additions and 10 deletions
README.md: 22 lines changed
@@ -15,16 +15,6 @@ Are you frustrated with vector database retrieval accuracy for long professional
 
 Self-host it with this open-source repo, or try our ☁️ [Cloud service](https://dash.pageindex.ai/) - no setup required.
 
-### PageIndex OCR (Updates On 2025/08/07)
-This repo is designed for generating PageIndex tree structure with text input, but many real-world use cases involve PDFs that require OCR to convert them into Markdown. However, extracting high-quality text from PDF documents remains a non-trivial challenge. Most OCR tools only extract page-level content, losing the broader document context and hierarchy.
-
-To address this, we introduced PageIndex OCR — the first OCR system designed to preserve the global structure of documents. PageIndex OCR significantly outperforms other leading OCR tools, such as those from Mistral and Contextual AI, in recognizing true hierarchy and semantic relationships across document pages.
-
-- Experience next-level OCR quality with PageIndex OCR at our [Dashboard](https://dash.pageindex.ai).
-- Integrate seamlessly PageIndex OCR into your stack via our [API](https://docs.pageindex.ai/quickstart).
-
-<img width="3016" height="1644" alt="image" src="https://github.com/user-attachments/assets/eb35d8ae-865c-4e60-a33b-ebbd00c41732" />
-
 ---
 
 # **⭐ What is PageIndex**
@@ -127,6 +117,18 @@ Leave your email in [this form](https://ii2abc2jejf.typeform.com/to/meB40zV0) to
 
 ---
 
+### PageIndex OCR (Updates On 2025/08/07)
+This repo is designed for generating PageIndex tree structure with text input, but many real-world use cases involve PDFs that require OCR to convert them into Markdown. However, extracting high-quality text from PDF documents remains a non-trivial challenge. Most OCR tools only extract page-level content, losing the broader document context and hierarchy.
+
+To address this, we introduced PageIndex OCR — the first OCR system designed to preserve the global structure of documents. PageIndex OCR significantly outperforms other leading OCR tools, such as those from Mistral and Contextual AI, in recognizing true hierarchy and semantic relationships across document pages.
+
+- Experience next-level OCR quality with PageIndex OCR at our [Dashboard](https://dash.pageindex.ai).
+- Integrate seamlessly PageIndex OCR into your stack via our [API](https://docs.pageindex.ai/quickstart).
+
+<img width="3016" height="1644" alt="image" src="https://github.com/user-attachments/assets/eb35d8ae-865c-4e60-a33b-ebbd00c41732" />
+
+---
+
 # 📈 Case Study: Mafin 2.5 on FinanceBench
 
 [Mafin 2.5](https://vectify.ai/mafin) is a state-of-the-art reasoning-based RAG model designed specifically for financial document analysis. Powered by **PageIndex**, it achieved a market-leading [**98.7% accuracy**](https://vectify.ai/blog/Mafin2.5) on the [FinanceBench](https://arxiv.org/abs/2311.11944) benchmark — significantly outperforming traditional vector-based RAG systems.