mirror of https://github.com/VectifyAI/PageIndex.git (synced 2026-04-24 23:56:21 +02:00)
fix notebook
parent 6b7e15bd84
commit faeeb81972
1 changed file with 51 additions and 31 deletions
@@ -39,13 +39,19 @@
"\n",
"- **No Vectors Needed**: Uses document structure and LLM reasoning for retrieval.\n",
"- **No Chunking Needed**: Documents are organized into natural sections rather than artificial chunks.\n",
"- **No Top-K Needed**: The LLM decides how many nodes need to be retrieved.\n",
"- **Transparent Retrieval Process**: Retrieval based on reasoning — say goodbye to approximate semantic search ('vibe retrieval').\n",
"- **Human-like Retrieval**: Simulates how human experts navigate and extract knowledge from complex documents. \n",
"- **Transparent Retrieval Process**: Retrieval based on reasoning — say goodbye to approximate semantic search ('vibe retrieval')."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 📝 About this Notebook\n",
"\n",
"# 📝 About this Notebook\n",
"This notebook demonstrates a simple example of **vectorless RAG** with PageIndex. You will learn:\n",
"- [x] How to generate PageIndex tree structure of a document.\n",
"- [x] How to perform retrieval with tree search.\n",
"- [x] How to build a PageIndex tree structure of a document.\n",
"- [x] How to perform reasoning-based retrieval with tree search.\n",
"- [x] How to generate the answer based on the retrieved context."
]
},
@@ -55,7 +61,7 @@
"id": "7ziuTbbWcG1L"
},
"source": [
"# Preparation\n",
"## Preparation\n",
"\n"
]
},
@@ -65,7 +71,7 @@
"id": "edTfrizMFK4c"
},
"source": [
"## Install Dependencies"
"### Install dependencies"
]
},
{
@@ -86,7 +92,7 @@
"id": "WVEWzPKGcG1M"
},
"source": [
"## Setup Environment"
"### Setup environment"
]
},
{
@@ -114,7 +120,7 @@
"id": "AR7PLeVbcG1N"
},
"source": [
"## Define Utility Functions"
"### Define utility functions"
]
},
{
@@ -169,7 +175,7 @@
"id": "heGtIMOVcG1N"
},
"source": [
"# Step 1: PageIndex Tree Generation"
"## Step 1: PageIndex Tree Generation"
]
},
{
@@ -178,7 +184,7 @@
"id": "Mzd1VWjwMUJL"
},
"source": [
"## Submit a document with PageIndex SDK"
"### Submit a document with PageIndex SDK"
]
},
{
@@ -224,7 +230,7 @@
"id": "4-Hrh0azcG1N"
},
"source": [
"## Get the generated PageIndex tree structure"
"### Get the generated PageIndex tree structure"
]
},
{
@@ -329,9 +335,9 @@
"id": "USoCLOiQcG1O"
},
"source": [
"# Step 2: Reasoning-Based Retrieval with Tree Search\n",
"## Step 2: Reasoning-Based Retrieval with Tree Search\n",
"\n",
"#### Use LLM to search the PageIndex tree and decide which nodes may contain the relevant context."
"### Use LLM for tree search and decide which nodes might contain relevant context"
]
},
{
@@ -367,6 +373,13 @@
"tree_search_result = await call_llm(search_prompt)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Print retrieved nodes and reasoning process"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -426,8 +439,6 @@
}
],
"source": [
"### Print retrieval nodes\n",
"\n",
"node_map = create_node_mapping(tree)\n",
"tree_search_result_json = json.loads(tree_search_result)\n",
"\n",
@@ -446,9 +457,9 @@
"id": "10wOZDG_cG1O"
},
"source": [
"# Step 3: Answer Generation\n",
"## Step 3: Answer Generation\n",
"\n",
"#### Extract context from relevant nodes and generate the final answer."
"### Extract relevant context from retrieved nodes"
]
},
{
@@ -496,12 +507,18 @@
}
],
"source": [
"# Prepare Retrieved Context\n",
"\n",
"node_list = json.loads(tree_search_result)[\"node_list\"]\n",
"relevant_content = \"\\n\\n\".join(node_map[node_id][\"text\"] for node_id in node_list)\n",
"\n",
"print_markdown('## Retrieved Context', '---')\n",
"print_markdown(f'{relevant_content[:1000]} ...')"
"print_markdown(relevant_content[:1000] + ' ...')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generate answer based on retrieved context"
]
},
{
@@ -548,8 +565,6 @@
}
],
"source": [
"# Generate Answer\n",
"\n",
"answer_prompt = f\"\"\"\n",
"Answer the question based on the context:\n",
"\n",
@@ -572,15 +587,21 @@
"source": [
"# 🎯 What's Next\n",
"\n",
"This notebook has demonstrated a basic example of **reasoning-based**, **vectorless** RAG with PageIndex. The workflow illustrates the core idea:\n",
"> *Generating a hierarchical tree structure from a document, reasoning over that tree structure, and extracting relevant context without relying on a vector database or top-k similarity search*.\n",
"This notebook has demonstrated a basic, minimal example of **reasoning-based**, **vectorless** RAG with PageIndex. The workflow illustrates the core idea:\n",
"> *Generating a hierarchical tree structure from a document, reasoning over that tree structure, and extracting relevant context, without relying on a vector database or top-k similarity search*.\n",
"\n",
"While this notebook highlights a minimal workflow, the PageIndex framework is built to support **far more advanced** use cases. In upcoming tutorials, we will introduce:\n",
"* **Multi-node reasoning for complex query** — Scale tree search to handle queries that require context from multiple nodes.\n",
"* **Multi-document search** — Enable reasoning-based navigation across large document collections, extending beyond a single file.\n",
"* **Efficient Tree search** — Improve tree search efficiency for long documents with a large number of nodes.\n",
"* **Multi-Node Reasoning with Content Extraction** — Scale tree search to extract and select relevant content from multiple nodes.\n",
"* **Multi-Document Search** — Enable reasoning-based navigation across large document collections, extending beyond a single file.\n",
"* **Efficient Tree Search** — Improve tree search efficiency for long documents with a large number of nodes.\n",
"* **Expert Knowledge Integration and Preference Alignment** — Incorporate user preferences or expert insights by adding knowledge directly into the LLM tree search, without the need for fine-tuning.\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 🔎 Learn More About PageIndex\n",
" <a href=\"https://vectify.ai\">🏠 Homepage</a>&nbsp; • &nbsp;\n",
" <a href=\"https://dash.pageindex.ai\">🖥️ Dashboard</a>&nbsp; • &nbsp;\n",
@@ -591,8 +612,7 @@
"\n",
"<br>\n",
"\n",
"© 2025 [Vectify AI](https://vectify.ai)\n",
"\n"
"© 2025 [Vectify AI](https://vectify.ai)"
]
}
],
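Note on the retrieval cells touched in this diff: the lines `node_map = create_node_mapping(tree)`, `json.loads(tree_search_result)["node_list"]`, and the `relevant_content` join can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code. The body of `create_node_mapping`, the `node_id` and `nodes` keys, the `thinking` field, and the sample tree are all assumptions made for illustration; only `node_list`, `text`, and the join expression appear in the diff itself.

```python
import json


def create_node_mapping(tree):
    """Flatten a nested tree into {node_id: node}.

    Hypothetical stand-in for the notebook's helper of the same name;
    the "node_id" and "nodes" keys are assumed, not taken from the diff.
    """
    mapping = {}
    stack = list(tree) if isinstance(tree, list) else [tree]
    while stack:
        node = stack.pop()
        mapping[node["node_id"]] = node
        stack.extend(node.get("nodes", []))  # descend into child nodes, if any
    return mapping


# Made-up tree standing in for a generated PageIndex tree.
tree = [
    {
        "node_id": "0001",
        "title": "Introduction",
        "text": "Intro text.",
        "nodes": [
            {"node_id": "0002", "title": "Background",
             "text": "Background text.", "nodes": []},
        ],
    },
    {"node_id": "0003", "title": "Results", "text": "Results text.", "nodes": []},
]

# Made-up LLM reply; the diff only shows that it is JSON with a "node_list" key.
tree_search_result = '{"thinking": "...", "node_list": ["0002", "0003"]}'

node_map = create_node_mapping(tree)
node_list = json.loads(tree_search_result)["node_list"]

# Same join as in the diff: concatenate the text of each selected node.
relevant_content = "\n\n".join(node_map[node_id]["text"] for node_id in node_list)
print(relevant_content[:1000] + " ...")
```

Flattening the tree once into a dictionary keeps the lookup per retrieved node ID constant-time, which is presumably why the notebook builds `node_map` before joining the node texts.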