diff --git a/cookbook/pageindex_RAG_simple.ipynb b/cookbook/pageindex_RAG_simple.ipynb
index fa6400d..fed785c 100644
--- a/cookbook/pageindex_RAG_simple.ipynb
+++ b/cookbook/pageindex_RAG_simple.ipynb
@@ -39,13 +39,19 @@
"\n",
"- **No Vectors Needed**: Uses document structure and LLM reasoning for retrieval.\n",
"- **No Chunking Needed**: Documents are organized into natural sections rather than artificial chunks.\n",
- "- **No Top-K Needed**: The LLM decides how many nodes need to be retrieved.\n",
- "- **Transparent Retrieval Process**: Retrieval based on reasoning — say goodbye to approximate semantic search ('vibe retrieval').\n",
+ "- **Human-like Retrieval**: Simulates how human experts navigate and extract knowledge from complex documents.\n",
+ "- **Transparent Retrieval Process**: Retrieval based on reasoning — say goodbye to approximate semantic search ('vibe retrieval')."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 📝 About this Notebook\n",
"\n",
- "# 📝 About this Notebook\n",
"This notebook demonstrates a simple example of **vectorless RAG** with PageIndex. You will learn:\n",
- "- [x] How to generate PageIndex tree structure of a document.\n",
- "- [x] How to perform retrieval with tree search.\n",
+ "- [x] How to build a PageIndex tree structure of a document.\n",
+ "- [x] How to perform reasoning-based retrieval with tree search.\n",
"- [x] How to generate the answer based on the retrieved context."
]
},
@@ -55,7 +61,7 @@
"id": "7ziuTbbWcG1L"
},
"source": [
- "# Preparation\n",
+ "## Preparation\n",
"\n"
]
},
@@ -65,7 +71,7 @@
"id": "edTfrizMFK4c"
},
"source": [
- "## Install Dependencies"
+ "### Install dependencies"
]
},
{
@@ -86,7 +92,7 @@
"id": "WVEWzPKGcG1M"
},
"source": [
- "## Setup Environment"
+ "### Set up environment"
]
},
{
@@ -114,7 +120,7 @@
"id": "AR7PLeVbcG1N"
},
"source": [
- "## Define Utility Functions"
+ "### Define utility functions"
]
},
{
@@ -169,7 +175,7 @@
"id": "heGtIMOVcG1N"
},
"source": [
- "# Step 1: PageIndex Tree Generation"
+ "## Step 1: PageIndex Tree Generation"
]
},
{
@@ -178,7 +184,7 @@
"id": "Mzd1VWjwMUJL"
},
"source": [
- "## Submit a document with PageIndex SDK"
+ "### Submit a document with the PageIndex SDK"
]
},
{
@@ -224,7 +230,7 @@
"id": "4-Hrh0azcG1N"
},
"source": [
- "## Get the generated PageIndex tree structure"
+ "### Get the generated PageIndex tree structure"
]
},
{
@@ -329,9 +335,9 @@
"id": "USoCLOiQcG1O"
},
"source": [
- "# Step 2: Reasoning-Based Retrieval with Tree Search\n",
+ "## Step 2: Reasoning-Based Retrieval with Tree Search\n",
"\n",
- "#### Use LLM to search the PageIndex tree and decide which nodes may contain the relevant context."
+ "### Use an LLM to search the tree and decide which nodes might contain relevant context"
]
},
{
@@ -367,6 +373,13 @@
"tree_search_result = await call_llm(search_prompt)"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Print retrieved nodes and reasoning process"
+ ]
+ },
{
"cell_type": "code",
"execution_count": null,
@@ -426,8 +439,6 @@
}
],
"source": [
- "### Print retrieval nodes\n",
- "\n",
"node_map = create_node_mapping(tree)\n",
"tree_search_result_json = json.loads(tree_search_result)\n",
"\n",
@@ -446,9 +457,9 @@
"id": "10wOZDG_cG1O"
},
"source": [
- "# Step 3: Answer Generation\n",
+ "## Step 3: Answer Generation\n",
"\n",
- "#### Extract context from relevant nodes and generate the final answer."
+ "### Extract relevant context from retrieved nodes"
]
},
{
@@ -496,12 +507,18 @@
}
],
"source": [
- "# Prepare Retrieved Context\n",
- "\n",
"node_list = json.loads(tree_search_result)[\"node_list\"]\n",
"relevant_content = \"\\n\\n\".join(node_map[node_id][\"text\"] for node_id in node_list)\n",
+ "\n",
"print_markdown('## Retrieved Context', '---')\n",
- "print_markdown(f'{relevant_content[:1000]} ...')"
+ "print_markdown(relevant_content[:1000] + ' ...')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Generate answer based on retrieved context"
]
},
{
@@ -548,8 +565,6 @@
}
],
"source": [
- "# Generate Answer\n",
- "\n",
"answer_prompt = f\"\"\"\n",
"Answer the question based on the context:\n",
"\n",
@@ -572,15 +587,21 @@
"source": [
"# 🎯 What's Next\n",
"\n",
- "This notebook has demonstrated a basic example of **reasoning-based**, **vectorless** RAG with PageIndex. The workflow illustrates the core idea:\n",
- "> *Generating a hierarchical tree structure from a document, reasoning over that tree structure, and extracting relevant context without relying on a vector database or top-k similarity search*.\n",
+ "This notebook has demonstrated a minimal example of **reasoning-based**, **vectorless** RAG with PageIndex. The workflow illustrates the core idea:\n",
+ "> *Generating a hierarchical tree structure from a document, reasoning over that tree structure, and extracting relevant context, without relying on a vector database or top-k similarity search*.\n",
"\n",
"While this notebook highlights a minimal workflow, the PageIndex framework is built to support **far more advanced** use cases. In upcoming tutorials, we will introduce:\n",
- "* **Multi-node reasoning for complex query** — Scale tree search to handle queries that require context from multiple nodes.\n",
- "* **Multi-document search** — Enable reasoning-based navigation across large document collections, extending beyond a single file.\n",
- "* **Efficient Tree search** — Improve tree search efficiency for long documents with a large number of nodes.\n",
+ "* **Multi-Node Reasoning with Content Extraction** — Scale tree search to extract and select relevant content from multiple nodes.\n",
+ "* **Multi-Document Search** — Enable reasoning-based navigation across large document collections, extending beyond a single file.\n",
+ "* **Efficient Tree Search** — Improve tree search efficiency for long documents with a large number of nodes.\n",
"* **Expert Knowledge Integration and Preference Alignment** — Incorporate user preferences or expert insights by adding knowledge directly into the LLM tree search, without the need for fine-tuning.\n",
- "\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
"# 🔎 Learn More About PageIndex\n",
" 🏠 Homepage • \n",
" 🖥️ Dashboard • \n",
@@ -591,8 +612,7 @@
"\n",
"\n",
"\n",
- "© 2025 [Vectify AI](https://vectify.ai)\n",
- "\n"
+ "© 2025 [Vectify AI](https://vectify.ai)"
]
}
],