Update demo example paper and polish README

Ray 2026-03-27 01:22:03 +08:00
parent 5d4491f3bf
commit 9798aaae19
3 changed files with 12 additions and 10 deletions

.gitignore vendored

@@ -7,6 +7,7 @@ chroma-collections.parquet
chroma-embeddings.parquet
.DS_Store
.env*
+.venv/
notebook
SDK/*
log/*

README.md

@@ -101,6 +101,7 @@ The PageIndex service is available as a ChatGPT-style [chat platform](https://ch
---
# 🌲 PageIndex Tree Structure
PageIndex can transform lengthy PDF documents into a semantic **tree structure**, similar to a _"table of contents"_ but optimized for use with Large Language Models (LLMs). It's ideal for: financial reports, regulatory filings, academic textbooks, legal or technical manuals, and any document that exceeds LLM context limits.
Below is an example PageIndex tree structure. Also see more example [documents](https://github.com/VectifyAI/PageIndex/tree/main/tests/pdfs) and generated [tree structures](https://github.com/VectifyAI/PageIndex/tree/main/tests/results).
@@ -133,7 +134,7 @@ Below is an example PageIndex tree structure. Also see more example [documents](
...
```
-You can generate the PageIndex tree structure with this open-source repo, or use our [API](https://docs.pageindex.ai/quickstart)
+You can generate the PageIndex tree structure with this open-source repo, or use our [API](https://docs.pageindex.ai/quickstart).
---
@@ -149,7 +150,7 @@ pip3 install --upgrade -r requirements.txt
### 2. Set your LLM API key
-Create a `.env` file in the root directory with your LLM API key::
+Create a `.env` file in the root directory with your LLM API key, with multi-LLM support via [LiteLLM](https://docs.litellm.ai/docs/providers):
```bash
OPENAI_API_KEY=your_openai_key_here
@@ -169,7 +170,7 @@ python3 run_pageindex.py --pdf_path /path/to/your/document.pdf
You can customize the processing with additional optional arguments:
```
---model OpenAI model to use (default: gpt-4o-2024-11-20)
+--model LLM model to use (default: gpt-4o-2024-11-20)
--toc-check-pages Pages to check for table of contents (default: 20)
--max-pages-per-node Max pages per node (default: 10)
--max-tokens-per-node Max tokens per node (default: 20000)
@@ -182,7 +183,7 @@ You can customize the processing with additional optional arguments:
<details>
<summary><strong>Markdown support</strong></summary>
<br>
-We also provide markdown support for PageIndex. You can use the `-md_path` flag to generate a tree structure for a markdown file.
+We also provide markdown support for PageIndex. You can use the `--md_path` flag to generate a tree structure for a markdown file.
```bash
python3 run_pageindex.py --md_path /path/to/your/document.md
@@ -193,7 +194,7 @@ python3 run_pageindex.py --md_path /path/to/your/document.md
### A Complete Agentic RAG Example
-For a complete agent-based QA example using the [OpenAI Agents SDK](https://github.com/openai/openai-agents-python), see [`examples/openai_agents_demo.py`](examples/openai_agents_demo.py).
+For a complete example on **agentic RAG with PageIndex** (using [OpenAI Agents SDK](https://github.com/openai/openai-agents-python)), see [`examples/openai_agents_demo.py`](examples/openai_agents_demo.py).
```bash
# Install optional dependency

examples/openai_agents_demo.py

@@ -32,8 +32,8 @@ from openai.types.responses import ResponseTextDeltaEvent, ResponseReasoningSumm
from pageindex import PageIndexClient
import pageindex.utils as utils
-PDF_URL = "https://arxiv.org/pdf/2501.12948.pdf"
-PDF_PATH = "tests/pdfs/deepseek-r1.pdf"
+PDF_URL = "https://arxiv.org/pdf/2603.15031"
+PDF_PATH = "tests/pdfs/attention-residuals.pdf"
WORKSPACE = "./pageindex_workspace"
AGENT_SYSTEM_PROMPT = """
@@ -168,6 +168,6 @@ print(client.get_document(doc_id))
print("\n" + "=" * 60)
print("Step 3: Agent Query (auto tool-use)")
print("=" * 60)
-question = "What reward design does DeepSeek-R1-Zero use, and why was it chosen over supervised fine-tuning?"
+question = "Explain Attention Residuals in simple language."
print(f"\nQuestion: '{question}'\n")
query_agent(client, doc_id, question, verbose=True)