mirror of
https://github.com/VectifyAI/PageIndex.git
synced 2026-04-24 23:56:21 +02:00
Update README.md
This commit is contained in:
parent ae60af60fd
commit d2c92a9310
1 changed files with 3 additions and 3 deletions
@@ -76,7 +76,7 @@ PageIndex can transform lengthy PDF documents into a semantic **tree structure**

 Here is an example output. See more [example documents](https://github.com/VectifyAI/PageIndex/tree/main/tests/pdfs) and [generated trees](https://github.com/VectifyAI/PageIndex/tree/main/tests/results).

-```
+```python
 ...
 {
 "title": "Financial Stability",
@@ -164,12 +164,12 @@ python3 run_pageindex.py --md_path /path/to/your/document.md

 # ☁️ Improved Tree Generation with PageIndex OCR

-This repo is designed for generating PageIndex tree structure for simple PDFs, but many real-world use cases involve complex PDFs that are hard to parsed by classic python tools. However, extracting high-quality text from PDF documents remains a non-trivial challenge. Most OCR tools only extract page-level content, losing the broader document context and hierarchy.
+This repo is designed for generating PageIndex tree structure for simple PDFs, but many real-world use cases involve complex PDFs that are hard to parse by classic Python tools. However, extracting high-quality text from PDF documents remains a non-trivial challenge. Most OCR tools only extract page-level content, losing the broader document context and hierarchy.

 To address this, we introduced PageIndex OCR — the first long-context OCR model designed to preserve the global structure of documents. PageIndex OCR significantly outperforms other leading OCR tools, such as those from Mistral and Contextual AI, in recognizing true hierarchy and semantic relationships across document pages.

 - Experience next-level OCR quality with PageIndex OCR at our [Dashboard](https://dash.pageindex.ai/).
-- Integrate seamlessly PageIndex OCR into your stack via our [API](https://docs.pageindex.ai/quickstart).
+- Integrate PageIndex OCR seamlessly into your stack via our [API](https://docs.pageindex.ai/quickstart).

 <p align="center">
 <img src="https://github.com/user-attachments/assets/eb35d8ae-865c-4e60-a33b-ebbd00c41732" width="80%">