Commit graph

8 commits

Author SHA1 Message Date
Ray
108cb28518 Move pdf_parser off doc dict, pass via call args 2026-05-11 16:40:32 +08:00
Ray
ec1aaca4c9 Centralize default parser as DEFAULT_PDF_PARSER constant 2026-05-11 16:24:01 +08:00
Ray
1629ef4318 Take pdf_parser out of ConfigLoader, use plain function arg 2026-05-11 16:20:45 +08:00
Ray
9539fe7513 Add pypdfium2 as optional PDF parser
Default behavior unchanged. Users can opt in via pdf_parser="pypdfium2"
for cleaner text extraction (no broken words, correct Unicode) and
3-5x faster parsing. PyPDF2 remains the only required dependency;
pypdfium2 is lazy-imported.
2026-05-11 16:04:07 +08:00
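The opt-in parser described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code. The function name `extract_pages`, the tuple `SUPPORTED_PDF_PARSERS`, and the value assigned to `DEFAULT_PDF_PARSER` are assumptions; only the `pdf_parser` argument, the `"pypdfium2"` option, and the constant's name come from the commit messages. The pypdfium2 calls assume the 4.x API.

```python
import importlib

# Constant name comes from the commit log; its value here is an assumption
# based on "PyPDF2 remains the only required dependency".
DEFAULT_PDF_PARSER = "PyPDF2"

# Hypothetical list of accepted values, for illustration only.
SUPPORTED_PDF_PARSERS = ("PyPDF2", "pypdfium2")


def extract_pages(pdf_path, pdf_parser=DEFAULT_PDF_PARSER):
    """Return a list with the extracted text of each page (hypothetical helper)."""
    if pdf_parser not in SUPPORTED_PDF_PARSERS:
        raise ValueError(
            f"Unknown pdf_parser {pdf_parser!r}; expected one of {SUPPORTED_PDF_PARSERS}"
        )

    if pdf_parser == "pypdfium2":
        # Lazy import: the optional dependency is only loaded when requested.
        try:
            pdfium = importlib.import_module("pypdfium2")
        except ImportError as exc:
            raise ImportError(
                "pdf_parser='pypdfium2' requires the optional package: "
                "pip install pypdfium2"
            ) from exc

        doc = pdfium.PdfDocument(pdf_path)
        try:
            texts = []
            for index in range(len(doc)):
                textpage = doc[index].get_textpage()
                texts.append(textpage.get_text_range())
            return texts
        finally:
            doc.close()

    # Default path: the required dependency.
    import PyPDF2

    reader = PyPDF2.PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]
```

Validating the parser name before any import means a typo fails fast with a clear message, and users who never pass `pdf_parser="pypdfium2"` never need the extra package installed.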
Ray
a108c021ae
Disable agent tracing and auto-add litellm/ prefix for retrieve_model
* Disable agent tracing and auto-add litellm/ prefix for retrieve_model
* Preserve supported retrieve_model prefixes
* Remove temporary retrieve_model tests
* Limit tracing disablement to demo execution
2026-03-29 00:55:57 +08:00
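The prefix handling named in the commit above might look something like the sketch below. It is an illustration under assumptions, not the repository's implementation: the function name `normalize_retrieve_model` and the contents of `PRESERVED_PREFIXES` are hypothetical, and the real set of supported prefixes is not stated in the log.

```python
# Hypothetical set of prefixes that are left untouched; the actual list
# used by the project is not given in the commit log.
PRESERVED_PREFIXES = ("litellm/", "openai/")


def normalize_retrieve_model(model):
    """Add a 'litellm/' prefix unless the name already has a preserved prefix."""
    if model.startswith(PRESERVED_PREFIXES):
        return model
    return "litellm/" + model


print(normalize_retrieve_model("gpt-4o"))
print(normalize_retrieve_model("litellm/anthropic/claude-3"))
```

A provider-qualified name such as `anthropic/claude-3` still receives the `litellm/` prefix in this sketch, because only the prefixes listed in `PRESERVED_PREFIXES` are treated as already routed.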
Ray
4002dc94de Rename demo script and update README wording 2026-03-28 04:56:05 +08:00
Ray
77722838e1
Restructure examples directory and improve document storage (#189)
* Consolidate tests/ into examples/documents/
* Add line_count and reorder structure keys
* Lazy-load documents with _meta.json index
* Update demo script and add pre-shipped workspace
* Extract shared helpers for JSON reading and meta entry building
2026-03-28 04:28:59 +08:00
Kylin
5d4491f3bf
Add PageIndexClient with agent-based retrieval via OpenAI Agents SDK (#125)
* Add PageIndexClient with retrieve, streaming support and litellm integration
* Add OpenAI agents demo example
* Update README with example agent demo section
* Support separate retrieve_model configuration for index and retrieve
2026-03-26 23:19:50 +08:00