Add PageIndexClient with agent-based retrieval via OpenAI Agents SDK (#125)

* Add PageIndexClient with retrieve, streaming support and litellm integration
* Add OpenAI agents demo example
* Update README with example agent demo section
* Support separate retrieve_model configuration for index and retrieve
Kylin 2026-03-26 23:19:50 +08:00 committed by GitHub
parent 2403be8f27
commit 5d4491f3bf
GPG key ID: B5690EEEBB952194
9 changed files with 501 additions and 7 deletions


@@ -147,15 +147,17 @@ You can follow these steps to generate a PageIndex tree from a PDF document.
pip3 install --upgrade -r requirements.txt
```
### 2. Set your OpenAI API key
### 2. Set your LLM API key
Create a `.env` file in the root directory and add your API key:
Create a `.env` file in the root directory with your LLM API key:
```bash
CHATGPT_API_KEY=your_openai_key_here
OPENAI_API_KEY=your_openai_key_here
# or
CHATGPT_API_KEY=your_openai_key_here # legacy, still supported
```
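The two variable names above suggest a fallback lookup. The following is a minimal sketch of how that could work, not the repository's actual loading code: the helper name `resolve_api_key` is hypothetical, and the precedence (`OPENAI_API_KEY` first, then the legacy `CHATGPT_API_KEY`) is an assumption based only on the wording of this hunk.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-empty API key found, or None.

    Hypothetical helper: the lookup order below is an assumption,
    not something this commit's diff confirms.
    """
    env = os.environ if env is None else env
    # Assumed precedence: new name first, legacy name as a fallback.
    for name in ("OPENAI_API_KEY", "CHATGPT_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None


if __name__ == "__main__":
    key = resolve_api_key()
    print("API key found" if key else "No API key set")
```

Passing a plain dict instead of reading `os.environ` keeps the lookup easy to test without touching the real environment.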
### 3. Run PageIndex on your PDF
### 3. Generate PageIndex structure for your PDF
```bash
python3 run_pageindex.py --pdf_path /path/to/your/document.pdf
@@ -189,7 +191,21 @@ python3 run_pageindex.py --md_path /path/to/your/document.md
> Note: in this function, we use "#" to determine node heading and their levels. For example, "##" is level 2, "###" is level 3, etc. Make sure your markdown file is formatted correctly. If your Markdown file was converted from a PDF or HTML, we don't recommend using this function, since most existing conversion tools cannot preserve the original hierarchy. Instead, use our [PageIndex OCR](https://pageindex.ai/blog/ocr), which is designed to preserve the original hierarchy, to convert the PDF to a markdown file and then use this function.
</details>
<!--
### A Complete Agentic RAG Example
For a complete agent-based QA example using the [OpenAI Agents SDK](https://github.com/openai/openai-agents-python), see [`examples/openai_agents_demo.py`](examples/openai_agents_demo.py).
```bash
# Install optional dependency
pip3 install openai-agents
# Run the demo
python3 examples/openai_agents_demo.py
```
---
<!--
# ☁️ Improved Tree Generation with PageIndex OCR
This repo is designed for generating PageIndex tree structure for simple PDFs, but many real-world use cases involve complex PDFs that are hard to parse by classic Python tools. However, extracting high-quality text from PDF documents remains a non-trivial challenge. Most OCR tools only extract page-level content, losing the broader document context and hierarchy.