{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "TCh9BTedHJK1"
},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nD0hb4TFHWTt"
},
"source": [
" <p align=\"center\"><i>Reasoning-based RAG ✧ No Vector DB ✧ No Chunking ✧ Human-like Retrieval</i></p>\n",
"\n",
"<p align=\"center\">\n",
" <a href=\"https://vectify.ai\">🏠 Homepage</a> • \n",
" <a href=\"https://dash.pageindex.ai\">🖥️ Dashboard</a> • \n",
" <a href=\"https://docs.pageindex.ai/quickstart\">📚 API Docs</a> • \n",
" <a href=\"https://github.com/vectifyai/pageindex\">📦 GitHub</a> • \n",
" <a href=\"https://discord.com/invite/VuXuf29EUj\">💬 Discord</a> • \n",
" <a href=\"https://ii2abc2jejf.typeform.com/to/tK3AXl8T\">✉️ Contact</a> \n",
"</p>\n",
"\n",
"<p align=\"center\">\n",
" <a href=\"https://github.com/VectifyAI/PageIndex/stargazers\">\n",
" <img src=\"https://img.shields.io/github/stars/VectifyAI/PageIndex?style=for-the-badge&logo=github&label=⭐%20Star%20Us\" alt=\"Star us on GitHub\" />\n",
" </a>\n",
"</p>\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Ebvn5qfpcG1K"
},
"source": [
"# Simple Vectorless RAG with PageIndex"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## PageIndex Introduction\n",
"PageIndex is a new **reasoning-based**, **vectorless RAG** framework that performs retrieval in two steps: \n",
"1. Generate a tree-structure index of the document \n",
"2. Perform reasoning-based retrieval through tree search \n",
"\n",
"<div align=\"center\">\n",
" <img src=\"https://docs.pageindex.ai/images/cookbook/vectorless-rag.png\" width=\"70%\">\n",
"</div>\n",
"\n",
"Compared to traditional vector-based RAG, PageIndex features:\n",
"- **No Vectors Needed**: Uses document structure and LLM reasoning for retrieval.\n",
"- **No Chunking Needed**: Documents are organized into natural sections rather than artificial chunks.\n",
"- **Human-like Retrieval**: Simulates how human experts navigate and extract knowledge from complex documents. \n",
"- **Transparent Retrieval Process**: Retrieval based on reasoning — say goodbye to approximate semantic search (\"vibe retrieval\")."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 📝 Notebook Overview\n",
"\n",
"This notebook demonstrates a simple example of **vectorless RAG** with PageIndex. You will learn how to:\n",
"- [x] Build a PageIndex tree structure of a document\n",
"- [x] Perform reasoning-based retrieval with tree search\n",
"- [x] Generate answers based on the retrieved context\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7ziuTbbWcG1L"
},
"source": [
"## Step 0: Preparation\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "edTfrizMFK4c"
},
"source": [
"#### 0.1 Install PageIndex"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"id": "LaoB58wQFNDh"
},
"outputs": [],
"source": [
"%pip install -q --upgrade pageindex"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WVEWzPKGcG1M"
},
"source": [
"#### 0.2 Set up PageIndex"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "StvqfcK4cG1M"
},
"outputs": [],
"source": [
"from pageindex import PageIndexClient\n",
"import pageindex.utils as utils\n",
"\n",
"# Get your PageIndex API key from https://dash.pageindex.ai/api-keys\n",
"PAGEINDEX_API_KEY = \"YOUR_PAGEINDEX_API_KEY\"\n",
"pi_client = PageIndexClient(api_key=PAGEINDEX_API_KEY)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 0.3 Set up the LLM\n",
"\n",
"Choose your preferred LLM for reasoning-based retrieval. In this example, we use OpenAI’s GPT-4.1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import openai\n",
"OPENAI_API_KEY = \"YOUR_OPENAI_API_KEY\"\n",
"\n",
"async def call_llm(prompt, model=\"gpt-4.1\", temperature=0):\n",
"    client = openai.AsyncOpenAI(api_key=OPENAI_API_KEY)\n",
"    response = await client.chat.completions.create(\n",
"        model=model,\n",
"        messages=[{\"role\": \"user\", \"content\": prompt}],\n",
"        temperature=temperature\n",
"    )\n",
"    return response.choices[0].message.content.strip()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "heGtIMOVcG1N"
},
"source": [
"## Step 1: PageIndex Tree Generation"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Mzd1VWjwMUJL"
},
"source": [
"#### 1.1 Submit a document to generate a PageIndex tree"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "f6--eZPLcG1N",
"outputId": "ca688cfd-6c4b-4a57-dac2-f3c2604c4112"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloaded https://arxiv.org/pdf/2501.12948.pdf\n",
"Document Submitted: pi-cmeseq08w00vt0bo3u6tr244g\n"
]
}
],
"source": [
"import os, requests\n",
"\n",
"# You can also use our GitHub repo to generate the PageIndex tree\n",
"# https://github.com/VectifyAI/PageIndex\n",
"\n",
"pdf_url = \"https://arxiv.org/pdf/2501.12948.pdf\"\n",
"pdf_path = os.path.join(\"../data\", pdf_url.split('/')[-1])\n",
"os.makedirs(os.path.dirname(pdf_path), exist_ok=True)\n",
"\n",
"response = requests.get(pdf_url)\n",
"with open(pdf_path, \"wb\") as f:\n",
"    f.write(response.content)\n",
"print(f\"Downloaded {pdf_url}\")\n",
"\n",
"doc_id = pi_client.submit_document(pdf_path)[\"doc_id\"]\n",
"print('Document Submitted:', doc_id)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4-Hrh0azcG1N"
},
"source": [
"#### 1.2 Get the generated PageIndex tree structure"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "b1Q1g6vrcG1O",
"outputId": "dc944660-38ad-47ea-d358-be422edbae53"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Simplified Tree Structure of the Document:\n",
"[{'title': 'DeepSeek-R1: Incentivizing Reasoning Cap...',\n",
" 'node_id': '0000',\n",
" 'prefix_summary': '# DeepSeek-R1: Incentivizing Reasoning C...',\n",
" 'nodes': [{'title': 'Abstract',\n",
" 'node_id': '0001',\n",
" 'summary': 'The partial document introduces two reas...'},\n",
" {'title': 'Contents',\n",
" 'node_id': '0002',\n",
" 'summary': 'This partial document provides a detaile...'},\n",
" {'title': '1. Introduction',\n",
" 'node_id': '0003',\n",
" 'prefix_summary': 'The partial document introduces recent a...',\n",
" 'nodes': [{'title': '1.1. Contributions',\n",
" 'node_id': '0004',\n",
" 'summary': 'This partial document outlines the main ...'},\n",
" {'title': '1.2. Summary of Evaluation Results',\n",
" 'node_id': '0005',\n",
" 'summary': 'The partial document provides a summary ...'}]},\n",
" {'title': '2. Approach',\n",
" 'node_id': '0006',\n",
" 'prefix_summary': '## 2. Approach\\n',\n",
" 'nodes': [{'title': '2.1. Overview',\n",
" 'node_id': '0007',\n",
" 'summary': '### 2.1. Overview\\n\\nPrevious work has hea...'},\n",
" {'title': '2.2. DeepSeek-R1-Zero: Reinforcement Lea...',\n",
" 'node_id': '0008',\n",
" 'prefix_summary': '### 2.2. DeepSeek-R1-Zero: Reinforcement...',\n",
" 'nodes': [{'title': '2.2.1. Reinforcement Learning Algorithm',\n",
" 'node_id': '0009',\n",
" 'summary': 'The partial document describes the Group...'},\n",
" {'title': '2.2.2. Reward Modeling',\n",
" 'node_id': '0010',\n",
" 'summary': 'This partial document discusses the rewa...'},\n",
" {'title': '2.2.3. Training Template',\n",
" 'node_id': '0011',\n",
" 'summary': '#### 2.2.3. Training Template\\n\\nTo train ...'},\n",
" {'title': '2.2.4. Performance, Self-evolution Proce...',\n",
" 'node_id': '0012',\n",
" 'summary': 'This partial document discusses the perf...'}]},\n",
" {'title': '2.3. DeepSeek-R1: Reinforcement Learning...',\n",
" 'node_id': '0013',\n",
" 'summary': 'This partial document describes the trai...'},\n",
" {'title': '2.4. Distillation: Empower Small Models ...',\n",
" 'node_id': '0014',\n",
" 'summary': 'This partial document discusses the proc...'}]},\n",
" {'title': '3. Experiment',\n",
" 'node_id': '0015',\n",
" 'prefix_summary': 'The partial document describes the exper...',\n",
" 'nodes': [{'title': '3.1. DeepSeek-R1 Evaluation',\n",
" 'node_id': '0016',\n",
" 'summary': 'This partial document presents a compreh...'},\n",
" {'title': '3.2. Distilled Model Evaluation',\n",
" 'node_id': '0017',\n",
" 'summary': 'This partial document presents an evalua...'}]},\n",
" {'title': '4. Discussion',\n",
" 'node_id': '0018',\n",
" 'summary': 'This partial document discusses the comp...'},\n",
" {'title': '5. Conclusion, Limitations, and Future W...',\n",
" 'node_id': '0019',\n",
" 'summary': 'This partial document presents the concl...'},\n",
" {'title': 'References',\n",
" 'node_id': '0020',\n",
" 'summary': 'This partial document consists of the re...'},\n",
" {'title': 'Appendix', 'node_id': '0021', 'summary': '## Appendix\\n'},\n",
" {'title': 'A. Contributions and Acknowledgments',\n",
" 'node_id': '0022',\n",
" 'summary': 'This partial document section details th...'}]}]\n"
]
}
],
"source": [
"if pi_client.is_retrieval_ready(doc_id):\n",
"    tree = pi_client.get_tree(doc_id, node_summary=True)['result']\n",
"    print('Simplified Tree Structure of the Document:')\n",
"    utils.print_tree(tree)\n",
"else:\n",
"    print(\"Processing document, please try again later...\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "USoCLOiQcG1O"
},
"source": [
"## Step 2: Reasoning-Based Retrieval with Tree Search"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2.1 Use an LLM to search the tree and identify nodes that may contain relevant context"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"id": "LLHNJAtTcG1O"
},
"outputs": [],
"source": [
"import json\n",
"\n",
"query = \"What are the conclusions in this document?\"\n",
"\n",
"tree_without_text = utils.remove_fields(tree.copy(), fields=['text'])\n",
"\n",
"search_prompt = f\"\"\"\n",
"You are given a question and a tree structure of a document.\n",
"Each node contains a node id, node title, and a corresponding summary.\n",
"Your task is to find all nodes that are likely to contain the answer to the question.\n",
"\n",
"Question: {query}\n",
"\n",
"Document tree structure:\n",
"{json.dumps(tree_without_text, indent=2)}\n",
"\n",
"Please reply in the following JSON format:\n",
"{{\n",
"  \"thinking\": \"<Your thinking process on which nodes are relevant to the question>\",\n",
"  \"node_list\": [\"node_id_1\", \"node_id_2\", ..., \"node_id_n\"]\n",
"}}\n",
"Directly return the final JSON structure. Do not output anything else.\n",
"\"\"\"\n",
"\n",
"tree_search_result = await call_llm(search_prompt)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2.2 Print retrieved nodes and reasoning process"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "P8DVUOuAen5u",
"outputId": "6bb6d052-ef30-4716-f88e-be98bcb7ebdb"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Reasoning Process:\n",
"The question asks for the conclusions in the document. Typically, conclusions are found in sections\n",
"explicitly titled 'Conclusion' or in sections summarizing the findings and implications of the work.\n",
"In this document tree, node 0019 ('5. Conclusion, Limitations, and Future Work') is the most\n",
"directly relevant, as it is dedicated to the conclusion and related topics. Additionally, the\n",
"'Abstract' (node 0001) may contain a high-level summary that sometimes includes concluding remarks,\n",
"but it is less likely to contain the full conclusions. Other sections like 'Discussion' (node 0018)\n",
"may discuss implications but are not explicitly conclusions. Therefore, the primary node is 0019.\n",
"\n",
"Retrieved Nodes:\n",
"Node ID: 0019\t Page: 16\t Title: 5. Conclusion, Limitations, and Future Work\n"
]
}
],
"source": [
"node_map = utils.create_node_mapping(tree)\n",
"tree_search_result_json = json.loads(tree_search_result)\n",
"\n",
"print('Reasoning Process:')\n",
"utils.print_wrapped(tree_search_result_json['thinking'])\n",
"\n",
"print('\\nRetrieved Nodes:')\n",
"for node_id in tree_search_result_json[\"node_list\"]:\n",
"    node = node_map[node_id]\n",
"    print(f\"Node ID: {node['node_id']}\\t Page: {node['page_index']}\\t Title: {node['title']}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "10wOZDG_cG1O"
},
"source": [
"## Step 3: Answer Generation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 3.1 Extract relevant context from retrieved nodes"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 279
},
"id": "a7UCBnXlcG1O",
"outputId": "8a026ea3-4ef3-473a-a57b-b4565409749e"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Retrieved Context:\n",
"\n",
"## 5. Conclusion, Limitations, and Future Work\n",
"\n",
"In this work, we share our journey in enhancing model reasoning abilities through reinforcement\n",
"learning. DeepSeek-R1-Zero represents a pure RL approach without relying on cold-start data,\n",
"achieving strong performance across various tasks. DeepSeek-R1 is more powerful, leveraging cold-\n",
"start data alongside iterative RL fine-tuning. Ultimately, DeepSeek-R1 achieves performance\n",
"comparable to OpenAI-o1-1217 on a range of tasks.\n",
"\n",
"We further explore distillation the reasoning capability to small dense models. We use DeepSeek-R1\n",
"as the teacher model to generate 800K training samples, and fine-tune several small dense models.\n",
"The results are promising: DeepSeek-R1-Distill-Qwen-1.5B outperforms GPT-4o and Claude-3.5-Sonnet on\n",
"math benchmarks with $28.9 \\%$ on AIME and $83.9 \\%$ on MATH. Other dense models also achieve\n",
"impressive results, significantly outperforming other instructiontuned models based on the same\n",
"underlying checkpoints.\n",
"\n",
"In the fut...\n"
]
}
],
"source": [
"node_list = json.loads(tree_search_result)[\"node_list\"]\n",
"relevant_content = \"\\n\\n\".join(node_map[node_id][\"text\"] for node_id in node_list)\n",
"\n",
"print('Retrieved Context:\\n')\n",
"utils.print_wrapped(relevant_content[:1000] + '...')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 3.2 Generate answer based on retrieved context"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 210
},
"id": "tcp_PhHzcG1O",
"outputId": "187ff116-9bb0-4ab4-bacb-13944460b5ff"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Generated Answer:\n",
"\n",
"The conclusions in this document are:\n",
"\n",
"- DeepSeek-R1-Zero, a pure reinforcement learning (RL) approach without cold-start data, achieves\n",
"strong performance across various tasks.\n",
"- DeepSeek-R1, which combines cold-start data with iterative RL fine-tuning, is more powerful and\n",
"achieves performance comparable to OpenAI-o1-1217 on a range of tasks.\n",
"- Distilling DeepSeek-R1’s reasoning capabilities into smaller dense models is promising; for\n",
"example, DeepSeek-R1-Distill-Qwen-1.5B outperforms GPT-4o and Claude-3.5-Sonnet on math benchmarks,\n",
"and other dense models also show significant improvements over similar instruction-tuned models.\n",
"\n",
"These results demonstrate the effectiveness of the RL-based approach and the potential for\n",
"distilling reasoning abilities into smaller models.\n"
]
}
],
"source": [
"answer_prompt = f\"\"\"\n",
"Answer the question based on the context:\n",
"\n",
"Question: {query}\n",
"Context: {relevant_content}\n",
"\n",
"Provide a clear, concise answer based only on the context provided.\n",
"\"\"\"\n",
"\n",
"print('Generated Answer:\\n')\n",
"answer = await call_llm(answer_prompt)\n",
"utils.print_wrapped(answer)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_1kaGD3GcG1O"
},
"source": [
"---\n",
"\n",
"## 🎯 What's Next\n",
"\n",
"This notebook has demonstrated a minimal example of **reasoning-based**, **vectorless** RAG with PageIndex. The workflow illustrates the core idea:\n",
"> *Generating a hierarchical tree structure from a document, reasoning over that tree structure, and extracting relevant context, without relying on a vector database or top-k similarity search*.\n",
"\n",
"While this notebook highlights a minimal workflow, the PageIndex framework is built to support **far more advanced** use cases. In upcoming tutorials, we will introduce:\n",
"* **Multi-Node Reasoning with Content Extraction** — Scale tree search to extract and select relevant content from multiple nodes.\n",
"* **Multi-Document Search** — Enable reasoning-based navigation across large document collections, extending beyond a single file.\n",
"* **Efficient Tree Search** — Improve tree search efficiency for long documents with a large number of nodes.\n",
"* **Expert Knowledge Integration and Preference Alignment** — Incorporate user preferences or expert insights by adding knowledge directly into the LLM tree search, without the need for fine-tuning.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 🔎 Learn More About PageIndex\n",
" <a href=\"https://vectify.ai\">🏠 Homepage</a> • \n",
" <a href=\"https://dash.pageindex.ai\">🖥️ Dashboard</a> • \n",
" <a href=\"https://docs.pageindex.ai/quickstart\">📚 API Docs</a> • \n",
" <a href=\"https://github.com/vectifyai/pageindex\">📦 GitHub</a> • \n",
" <a href=\"https://discord.com/invite/VuXuf29EUj\">💬 Discord</a> • \n",
" <a href=\"https://ii2abc2jejf.typeform.com/to/tK3AXl8T\">✉️ Contact</a>\n",
"\n",
"<br>\n",
"\n",
"© 2025 [Vectify AI](https://vectify.ai)"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 0
}