mirror of
https://github.com/VectifyAI/PageIndex.git
synced 2026-04-24 23:56:21 +02:00
Add PageIndexClient with agent-based retrieval via OpenAI Agents SDK (#125)
* Add PageIndexClient with retrieve, streaming support and litellm integration
* Add OpenAI agents demo example
* Update README with example agent demo section
* Support separate retrieve_model configuration for index and retrieve
parent 2403be8f27
commit 5d4491f3bf
9 changed files with 501 additions and 7 deletions
26
README.md
@@ -147,15 +147,17 @@ You can follow these steps to generate a PageIndex tree from a PDF document.
 pip3 install --upgrade -r requirements.txt
 ```
 
-### 2. Set your OpenAI API key
+### 2. Set your LLM API key
 
-Create a `.env` file in the root directory and add your API key:
+Create a `.env` file in the root directory with your LLM API key::
 
 ```bash
-CHATGPT_API_KEY=your_openai_key_here
+OPENAI_API_KEY=your_openai_key_here
+# or
+CHATGPT_API_KEY=your_openai_key_here # legacy, still supported
 ```
 
-### 3. Run PageIndex on your PDF
+### 3. Generate PageIndex structure for your PDF
 
 ```bash
 python3 run_pageindex.py --pdf_path /path/to/your/document.pdf
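The hunk above documents two accepted variable names for the key: `OPENAI_API_KEY` and the legacy `CHATGPT_API_KEY`. As a rough sketch of how such a fallback could be resolved, assuming the variables are already exported into the environment (for example by a `.env` loader) and assuming the newer name takes precedence, which the diff does not state. The helper `resolve_api_key` is hypothetical and is not taken from the repository:

```python
import os


def resolve_api_key(env=None):
    """Return the first non-empty key among the documented variable names.

    Hypothetical helper for illustration only. The precedence order
    (OPENAI_API_KEY before CHATGPT_API_KEY) is an assumption.
    """
    env = os.environ if env is None else env
    for name in ("OPENAI_API_KEY", "CHATGPT_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError(
        "Set OPENAI_API_KEY (or the legacy CHATGPT_API_KEY) in your .env file"
    )
```

Passing a plain dictionary instead of the real environment, as in `resolve_api_key({"CHATGPT_API_KEY": "sk-test"})`, makes the fallback easy to check without touching actual credentials.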
@@ -189,7 +191,21 @@ python3 run_pageindex.py --md_path /path/to/your/document.md
 > Note: in this function, we use "#" to determine node heading and their levels. For example, "##" is level 2, "###" is level 3, etc. Make sure your markdown file is formatted correctly. If your Markdown file was converted from a PDF or HTML, we don't recommend using this function, since most existing conversion tools cannot preserve the original hierarchy. Instead, use our [PageIndex OCR](https://pageindex.ai/blog/ocr), which is designed to preserve the original hierarchy, to convert the PDF to a markdown file and then use this function.
 </details>
 
-<!--
+### A Complete Agentic RAG Example
+
+For a complete agent-based QA example using the [OpenAI Agents SDK](https://github.com/openai/openai-agents-python), see [`examples/openai_agents_demo.py`](examples/openai_agents_demo.py).
+
+```bash
+# Install optional dependency
+pip3 install openai-agents
+
+# Run the demo
+python3 examples/openai_agents_demo.py
+```
+
+---
+
+<!--
 # ☁️ Improved Tree Generation with PageIndex OCR
 
 This repo is designed for generating PageIndex tree structure for simple PDFs, but many real-world use cases involve complex PDFs that are hard to parse by classic Python tools. However, extracting high-quality text from PDF documents remains a non-trivial challenge. Most OCR tools only extract page-level content, losing the broader document context and hierarchy.
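The Note in the hunk above says that the Markdown path derives each node's level from the number of leading "#" characters. A minimal sketch of that rule, assuming ATX-style headings only and skipping fenced code blocks so that shell comments such as `# Install optional dependency` are not mistaken for headings. The function `heading_levels` is hypothetical and is not the repository's parser:

```python
import re

# One to six "#" characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def heading_levels(markdown_text):
    """Return (level, title) pairs for headings found outside code fences."""
    headings = []
    in_fence = False
    for line in markdown_text.splitlines():
        # Toggle fence state on ``` lines and never treat them as headings.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings
```

The level sequence is only as good as the input: a file whose headings jump from level 1 to level 4, as converted documents often do, would still parse but would yield a distorted hierarchy, which is the concern the Note raises.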