Mirror of https://github.com/VectifyAI/PageIndex.git (synced 2026-06-12 19:55:17 +02:00)
Default behavior unchanged. Users can opt in via pdf_parser="pypdfium2" for cleaner text extraction (no broken words, correct Unicode) and 3-5x faster parsing. PyPDF2 remains the only required dependency; pypdfium2 is lazy-imported.
| Name |
|---|
| __init__.py |
| client.py |
| config.yaml |
| page_index.py |
| page_index_md.py |
| retrieve.py |
| utils.py |