diff --git a/cookbook/pageindex_RAG_simple.ipynb b/cookbook/pageindex_RAG_simple.ipynb
index 0bf4b29..fbfef47 100644
--- a/cookbook/pageindex_RAG_simple.ipynb
+++ b/cookbook/pageindex_RAG_simple.ipynb
@@ -40,12 +40,16 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "PageIndex generates a searchable tree structure of documents, enabling reasoning-based retrieval through tree search. \n",
+    "## Introduction\n",
+    "PageIndex is a new **vectorless RAG** framework. It conducts retrieval in two steps: \n",
+    "1. Generate a tree structure to index documents \n",
+    "2. Perform reasoning-based retrieval through tree search \n",
     "\n",
     "\n",
-    " \n",
+    " \n",
     "\n",
     "\n",
+    "Compared to classic vector-based RAG, PageIndex features:\n",
     "- **No Vectors Needed**: Uses document structure and LLM reasoning for retrieval.\n",
     "- **No Chunking Needed**: Documents are organized into natural sections rather than artificial chunks.\n",
     "- **Human-like Retrieval**: Simulates how human experts navigate and extract knowledge from complex documents. \n",
@@ -61,21 +65,18 @@
     "This notebook demonstrates a simple example of **vectorless RAG** with PageIndex through the following steps:\n",
     "- [x] Build a PageIndex tree structure of a document\n",
     "- [x] Perform reasoning-based retrieval with tree search\n",
-    "- [x] Generate answers based on the retrieved context"
+    "- [x] Generate answers based on the retrieved context\n",
+    "\n",
+    "---"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": []
-  },
   {
    "cell_type": "markdown",
    "metadata": {
     "id": "7ziuTbbWcG1L"
    },
    "source": [
-    "## Preparation\n",
+    "## Step 0: Preparation\n",
     "\n"
    ]
   },
@@ -85,7 +86,7 @@
     "id": "edTfrizMFK4c"
    },
    "source": [
-    "### Install dependencies"
+    "#### 0.1 Install dependencies"
    ]
   },
   {
@@ -106,7 +107,7 @@
     "id": "WVEWzPKGcG1M"
    },
    "source": [
-    "### Setup environment"
+    "#### 0.2 Setup environment"
    ]
   },
   {
@@ -134,7 +135,7 @@
     "id": "AR7PLeVbcG1N"
    },
    "source": [
-    "### Define utility functions"
+    "#### 0.3 Define utility functions"
    ]
   },
   {
@@ -198,7 +199,7 @@
     "id": "Mzd1VWjwMUJL"
    },
    "source": [
-    "### Submit a document with PageIndex SDK"
+    "#### 1.1 Submit a document with PageIndex SDK"
    ]
   },
   {
@@ -244,7 +245,7 @@
     "id": "4-Hrh0azcG1N"
    },
    "source": [
-    "### Get the generated PageIndex tree structure"
+    "#### 1.2 Get the generated PageIndex tree structure"
    ]
   },
   {
@@ -356,7 +357,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Use LLM for tree search and identify nodes that might contain relevant context"
+    "#### 2.1 Use LLM for tree search and identify nodes that might contain relevant context"
    ]
   },
   {
@@ -396,7 +397,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Print retrieved nodes and reasoning process"
+    "#### 2.2 Print retrieved nodes and reasoning process"
    ]
   },
   {
@@ -455,7 +456,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Extract relevant context from retrieved nodes"
+    "#### 3.1 Extract relevant context from retrieved nodes"
    ]
   },
   {
@@ -507,7 +508,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Generate answer based on retrieved context"
+    "#### 3.2 Generate answer based on retrieved context"
    ]
   },
   {