Merge branch 'MODSetter:main' into main

Anshul Sharma 2025-06-01 09:38:21 +05:30 committed by GitHub
commit 2ae8d227bf
106 changed files with 8506 additions and 2268 deletions

.env.example (new file)

@@ -0,0 +1,17 @@
# Frontend Configuration
FRONTEND_PORT=3000
NEXT_PUBLIC_API_URL=http://backend:8000
# Backend Configuration
BACKEND_PORT=8000
# Database Configuration
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=surfsense
POSTGRES_PORT=5432
# pgAdmin Configuration
PGADMIN_PORT=5050
PGADMIN_DEFAULT_EMAIL=admin@surfsense.com
PGADMIN_DEFAULT_PASSWORD=surfsense

.github/PULL_REQUEST_TEMPLATE.md (new file)

@@ -0,0 +1,45 @@
<!--- Provide a general summary of your changes in the Title above -->
## Description
<!--- Describe your changes in detail -->
## Motivation and Context
<!--- Why is this change required? What problem does it solve? -->
<!--- If this PR relates to an open issue, please link to the issue here: FIX #123 -->
FIX #
## Changes Overview
<!-- List the primary changes/improvements made in this PR -->
-
## Screenshots
<!-- If applicable, add screenshots or images to demonstrate the changes visually -->
## API Changes
<!-- Document any API changes if applicable -->
- [ ] This PR includes API changes
## Types of changes
<!--- What types of changes does your code introduce? Put an `x` in all the boxes that apply: -->
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Performance improvement (non-breaking change which enhances performance)
- [ ] Documentation update
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Testing
<!-- Describe the tests that have been run to verify your changes -->
- [ ] I have tested these changes locally
- [ ] I have added/updated unit tests
- [ ] I have added/updated integration tests
## Checklist:
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
<!--- If you're unsure about any of these, don't hesitate to ask. We're here to help! -->
- [ ] My code follows the code style of this project
- [ ] My change requires documentation updates
- [ ] I have updated the documentation accordingly
- [ ] My change requires dependency updates
- [ ] I have updated the dependencies accordingly
- [ ] My code builds clean without any errors or warnings
- [ ] All new and existing tests passed

.github/workflows/docker-publish.yml (new file)

@@ -0,0 +1,76 @@
name: Docker Publish
on:
push:
branches: [ "main" ]
jobs:
build_and_push_backend:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Log in to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build and push backend image
uses: docker/build-push-action@v5
with:
context: ./surfsense_backend
file: ./surfsense_backend/Dockerfile
push: true
tags: ghcr.io/${{ github.repository_owner }}/surfsense_backend:${{ github.sha }}
platforms: linux/amd64,linux/arm64
labels: |
org.opencontainers.image.source=${{ github.repositoryUrl }}
org.opencontainers.image.created=${{ fromJSON(steps.meta.outputs.json).labels['org.opencontainers.image.created'] }}
org.opencontainers.image.revision=${{ github.sha }}
build_and_push_frontend:
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Log in to GitHub Container Registry
uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build and push frontend image
uses: docker/build-push-action@v5
with:
context: ./surfsense_web
file: ./surfsense_web/Dockerfile
push: true
tags: ghcr.io/${{ github.repository_owner }}/surfsense_web:${{ github.sha }}
platforms: linux/amd64,linux/arm64
labels: |
org.opencontainers.image.source=${{ github.repositoryUrl }}
org.opencontainers.image.created=${{ fromJSON(steps.meta.outputs.json).labels['org.opencontainers.image.created'] }}
org.opencontainers.image.revision=${{ github.sha }}

.gitignore

@@ -1 +1,2 @@
.flashrank_cache*
podcasts/

DEPLOYMENT_GUIDE.md (new file)

@@ -0,0 +1,124 @@
# SurfSense Deployment Guide
This guide explains the different deployment options available for SurfSense using Docker Compose.
## Deployment Options
SurfSense uses a flexible Docker Compose configuration that allows you to easily switch between deployment modes without manually editing files. Our approach uses Docker's built-in override functionality with two configuration files:
1. **docker-compose.yml**: Contains essential core services (database and pgAdmin)
2. **docker-compose.override.yml**: Contains application services (frontend and backend)
This structure provides several advantages:
- No need to comment/uncomment services manually
- Clear separation between core infrastructure and application services
- Easy switching between development and production environments
## Deployment Modes
### Full Stack Mode (Development)
This mode runs everything: frontend, backend, database, and pgAdmin. It's ideal for development environments where you need the complete application stack.
```bash
# Both files are automatically used (docker-compose.yml + docker-compose.override.yml)
docker compose up -d
```
### Core Services Mode (Production)
This mode runs only the database and pgAdmin services. It's suitable for production environments where you might want to deploy the frontend and backend separately or need to run database migrations.
```bash
# Explicitly use only the main file
docker compose -f docker-compose.yml up -d
```
## Custom Deployment Options
### Running Specific Services
You can specify which services to start by naming them:
```bash
# Start only database
docker compose up -d db
# Start database and pgAdmin
docker compose up -d db pgadmin
# Start only backend (requires db to be running)
docker compose up -d backend
```
### Using Custom Override Files
You can create and use custom override files for different environments:
```bash
# Create a staging configuration
docker compose -f docker-compose.yml -f docker-compose.staging.yml up -d
```
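For reference, a `docker-compose.staging.yml` of this kind might simply pin prebuilt images instead of building from source. The sketch below is only illustrative; the image names, tags, and ports are assumptions rather than files shipped in this repository:
```yaml
# docker-compose.staging.yml (hypothetical example)
services:
  backend:
    image: ghcr.io/modsetter/surfsense_backend:latest  # assumed image tag
    env_file:
      - ./surfsense_backend/.env
    ports:
      - "8000:8000"
  frontend:
    image: ghcr.io/modsetter/surfsense_web:latest      # assumed image tag
    environment:
      - NEXT_PUBLIC_API_URL=http://backend:8000
    ports:
      - "3000:3000"
```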
## Environment Variables
The deployment can be customized using environment variables:
```bash
# Change default ports
FRONTEND_PORT=4000 BACKEND_PORT=9000 docker compose up -d
# Or use a .env file
# Create or modify .env file with your desired values
docker compose up -d
```
## Common Deployment Workflows
### Initial Setup
```bash
# Clone the repository
git clone https://github.com/MODSetter/SurfSense.git
cd SurfSense
# Copy example env files
cp .env.example .env
cp surfsense_backend/.env.example surfsense_backend/.env
cp surfsense_web/.env.example surfsense_web/.env
# Edit the .env files with your configuration
# Start full stack for development
docker compose up -d
```
### Database-Only Mode (for migrations or maintenance)
```bash
# Start just the database
docker compose -f docker-compose.yml up -d db
# Run migrations or maintenance tasks
docker compose exec db psql -U postgres -d surfsense
```
### Scaling in Production
For production deployments, you might want to:
1. Run core services with Docker Compose
2. Deploy frontend/backend with specialized services like Vercel, Netlify, or dedicated application servers
This separation allows for better scaling and resource utilization in production environments.
## Troubleshooting
If you encounter issues with the deployment:
- Check container logs: `docker compose logs -f [service_name]`
- Ensure all required environment variables are set
- Verify network connectivity between containers
- Check that required ports are available and not blocked by firewalls
For more detailed setup instructions, refer to [DOCKER_SETUP.md](DOCKER_SETUP.md).

DOCKER_SETUP.md

@@ -7,73 +7,186 @@ This document explains how to run the SurfSense project using Docker Compose.
- Docker and Docker Compose installed on your machine
- Git (to clone the repository)
## Environment Variables Configuration
SurfSense Docker setup supports configuration through environment variables. You can set these variables in two ways:
1. Create a `.env` file in the project root directory (copy from `.env.example`)
2. Set environment variables directly in your shell before running Docker Compose
The following environment variables are available:
```
# Frontend Configuration
FRONTEND_PORT=3000
NEXT_PUBLIC_API_URL=http://backend:8000
# Backend Configuration
BACKEND_PORT=8000
# Database Configuration
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=surfsense
POSTGRES_PORT=5432
# pgAdmin Configuration
PGADMIN_PORT=5050
PGADMIN_DEFAULT_EMAIL=admin@surfsense.com
PGADMIN_DEFAULT_PASSWORD=surfsense
```
## Deployment Options
SurfSense uses a flexible Docker Compose setup that allows you to choose between different deployment modes:
### Option 1: Full-Stack Deployment (Development Mode)
Includes frontend, backend, database, and pgAdmin. This is the default when running `docker compose up`.
### Option 2: Core Services Only (Production Mode)
Includes only database and pgAdmin, suitable for production environments where you might deploy frontend/backend separately.
Our setup uses two files:
- `docker-compose.yml`: Contains core services (database and pgAdmin)
- `docker-compose.override.yml`: Contains application services (frontend and backend)
## Setup
1. Make sure you have all the necessary environment variables set up:
- Copy `surfsense_backend/.env.example` to `surfsense_backend/.env` and fill in the required values
- Copy `surfsense_web/.env.example` to `surfsense_web/.env` and fill in the required values
- Optionally: Copy `.env.example` to `.env` in the project root to customize Docker settings
2. Deploy based on your needs:
**Full Stack (Development Mode)**:
```bash
# Both files are automatically used
docker compose up --build
```
**Core Services Only (Production Mode)**:
```bash
# Explicitly use only the main file
docker compose -f docker-compose.yml up --build
```
3. To run in detached mode (in the background):
```bash
# Full stack
docker compose up -d
# Core services only
docker compose -f docker-compose.yml up -d
```
4. Access the applications:
- Frontend: http://localhost:3000 (when using full stack)
- Backend API: http://localhost:8000 (when using full stack)
- API Documentation: http://localhost:8000/docs (when using full stack)
- pgAdmin: http://localhost:5050
## Customizing the Deployment
If you need to make temporary changes to either full stack or core services deployment, you can:
1. **Temporarily disable override file**:
```bash
docker compose -f docker-compose.yml up -d
```
2. **Use a custom override file**:
```bash
docker compose -f docker-compose.yml -f custom-override.yml up -d
```
3. **Temporarily modify which services start**:
```bash
docker compose up -d db pgadmin
```
## Useful Commands
- Stop the containers:
```bash
docker compose down
```
- View logs:
```bash
# All services
docker compose logs -f
# Specific service
docker compose logs -f backend
docker compose logs -f frontend
docker compose logs -f db
docker compose logs -f pgadmin
```
- Restart a specific service:
```bash
docker compose restart backend
```
- Execute commands in a running container:
```bash
# Backend
docker compose exec backend python -m pytest
# Frontend
docker compose exec frontend pnpm lint
```
## Database
The PostgreSQL database with pgvector extensions is available at:
- Host: localhost
- Port: 5432 (configurable via POSTGRES_PORT)
- Username: postgres (configurable via POSTGRES_USER)
- Password: postgres (configurable via POSTGRES_PASSWORD)
- Database: surfsense (configurable via POSTGRES_DB)
You can connect to it using any PostgreSQL client or the included pgAdmin.
## pgAdmin
pgAdmin is a web-based administration tool for PostgreSQL. It is included in the Docker setup for easier database management.
- URL: http://localhost:5050 (configurable via PGADMIN_PORT)
- Default Email: admin@surfsense.com (configurable via PGADMIN_DEFAULT_EMAIL)
- Default Password: surfsense (configurable via PGADMIN_DEFAULT_PASSWORD)
### Connecting to the Database in pgAdmin
1. Log in to pgAdmin using the credentials above
2. Right-click on "Servers" in the left sidebar and select "Create" > "Server"
3. In the "General" tab, give your connection a name (e.g., "SurfSense DB")
4. In the "Connection" tab, enter the following:
- Host: db
- Port: 5432
- Maintenance database: surfsense
- Username: postgres
- Password: postgres
5. Click "Save" to establish the connection
## Troubleshooting
- If you encounter permission errors, you may need to run the docker commands with `sudo`.
- If ports are already in use, modify the port mappings in the `.env` file or directly in the `docker-compose.yml` file.
- For backend dependency issues, you may need to modify the `Dockerfile` in the backend directory.
- If you encounter frontend dependency errors, adjust the frontend's `Dockerfile` accordingly.
- If pgAdmin doesn't connect to the database, ensure you're using `db` as the hostname, not `localhost`, since `db` is the service name on the Docker network.
- If you need only specific services, you can explicitly name them: `docker compose up db pgadmin`
## Understanding Docker Compose File Structure
The project uses Docker's default override mechanism:
1. **docker-compose.yml**: Contains essential services (database and pgAdmin)
2. **docker-compose.override.yml**: Contains development services (frontend and backend)
When you run `docker compose up` without additional flags, Docker automatically merges both files.
When you run `docker compose -f docker-compose.yml up`, only the specified file is used.
This approach lets you maintain a cleaner codebase without manually commenting/uncommenting services in your configuration files.

README.md

@@ -1,39 +1,53 @@
![new_header](https://github.com/user-attachments/assets/e236b764-0ddc-42ff-a1f1-8fbb3d2e0e65)
# SurfSense
While tools like NotebookLM and Perplexity are impressive and highly effective for conducting research on any topic/query, SurfSense elevates this capability by integrating with your personal knowledge base. It is a highly customizable AI research agent, connected to external sources such as search engines (Tavily, LinkUp), Slack, Linear, Notion, YouTube, GitHub and more to come.
<div align="center">
<a href="https://trendshift.io/repositories/13606" target="_blank"><img src="https://trendshift.io/api/badge/repositories/13606" alt="MODSetter%2FSurfSense | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</div>
# Video
https://github.com/user-attachments/assets/48142909-6391-4084-b7e8-81da388bb1fc
# Podcasts
https://github.com/user-attachments/assets/d516982f-de00-4c41-9e4c-632a7d942f41
## Podcast Sample
https://github.com/user-attachments/assets/bf64a6ca-934b-47ac-9e1b-edac5fe972ec
## Key Features
### 1. Latest
### 💡 **Idea**:
Have your own highly customizable private NotebookLM and Perplexity integrated with external sources.
### 📁 **Multiple File Format Uploading Support**
Save content from your own personal files *(Documents, images, videos and supports **50+ file extensions**)* to your own personal knowledge base.
### 🔍 **Powerful Search**
Quickly research or find anything in your saved content.
### 💬 **Chat with your Saved Content**
Interact in Natural Language and get cited answers.
### 📄 **Cited Answers**
Get Cited answers just like Perplexity.
### 🔔 **Privacy & Local LLM Support**
Works Flawlessly with Ollama local LLMs.
### 🏠 **Self Hostable**
Open source and easy to deploy locally.
### 🎙️ Podcasts
- Blazingly fast podcast generation agent. (Creates a 3-minute podcast in under 20 seconds.)
- Convert your chat conversations into engaging audio content
- Support for multiple TTS providers (OpenAI, Azure, Google Vertex AI)
### 📊 **Advanced RAG Techniques**
- Supports 150+ LLMs
- Supports 6000+ Embedding Models.
- Supports all major Rerankers (Pinecone, Cohere, Flashrank etc.)
@@ -41,8 +55,8 @@ Open source and easy to deploy locally.
- Utilizes Hybrid Search (Semantic + Full Text Search combined with Reciprocal Rank Fusion).
- RAG as a Service API Backend.
### **External Sources**
- Search Engines (Tavily, LinkUp)
- Slack
- Linear
- Notion
@@ -50,17 +64,41 @@ Open source and easy to deploy locally.
- GitHub
- and more to come.....
## 📄 **Supported File Extensions**
> **Note**: File format support depends on your ETL service configuration. LlamaCloud supports 50+ formats, while Unstructured supports 34+ core formats.
### Documents & Text
**LlamaCloud**: `.pdf`, `.doc`, `.docx`, `.docm`, `.dot`, `.dotm`, `.rtf`, `.txt`, `.xml`, `.epub`, `.odt`, `.wpd`, `.pages`, `.key`, `.numbers`, `.602`, `.abw`, `.cgm`, `.cwk`, `.hwp`, `.lwp`, `.mw`, `.mcw`, `.pbd`, `.sda`, `.sdd`, `.sdp`, `.sdw`, `.sgl`, `.sti`, `.sxi`, `.sxw`, `.stw`, `.sxg`, `.uof`, `.uop`, `.uot`, `.vor`, `.wps`, `.zabw`
**Unstructured**: `.doc`, `.docx`, `.odt`, `.rtf`, `.pdf`, `.xml`, `.txt`, `.md`, `.markdown`, `.rst`, `.html`, `.org`, `.epub`
### Presentations
**LlamaCloud**: `.ppt`, `.pptx`, `.pptm`, `.pot`, `.potm`, `.potx`, `.odp`, `.key`
**Unstructured**: `.ppt`, `.pptx`
### Spreadsheets & Data
**LlamaCloud**: `.xlsx`, `.xls`, `.xlsm`, `.xlsb`, `.xlw`, `.csv`, `.tsv`, `.ods`, `.fods`, `.numbers`, `.dbf`, `.123`, `.dif`, `.sylk`, `.slk`, `.prn`, `.et`, `.uos1`, `.uos2`, `.wk1`, `.wk2`, `.wk3`, `.wk4`, `.wks`, `.wq1`, `.wq2`, `.wb1`, `.wb2`, `.wb3`, `.qpw`, `.xlr`, `.eth`
**Unstructured**: `.xls`, `.xlsx`, `.csv`, `.tsv`
### Images
**LlamaCloud**: `.jpg`, `.jpeg`, `.png`, `.gif`, `.bmp`, `.svg`, `.tiff`, `.webp`, `.html`, `.htm`, `.web`
**Unstructured**: `.jpg`, `.jpeg`, `.png`, `.bmp`, `.tiff`, `.heic`
### Audio & Video *(Always Supported)*
`.mp3`, `.mpga`, `.m4a`, `.wav`, `.mp4`, `.mpeg`, `.webm`
### Email & Communication
**Unstructured**: `.eml`, `.msg`, `.p7s`
### 🔖 Cross Browser Extension
- The SurfSense extension can be used to save any webpage you like.
- Its main use case is to save any webpages protected beyond authentication.
### 2. Temporarily Deprecated
#### Podcasts
- The SurfSense Podcast feature is currently being reworked for better UI and stability. Expect it soon.
## FEATURE REQUESTS AND FUTURE
@@ -76,7 +114,13 @@ Join the [SurfSense Discord](https://discord.gg/ejRNvftDp9) and help shape the f
SurfSense provides two installation methods:
1. **[Docker Installation](https://www.surfsense.net/docs/docker-installation)** - The easiest way to get SurfSense up and running with all dependencies containerized.
- Includes pgAdmin for database management through a web UI
- Supports environment variable customization via `.env` file
- Flexible deployment options (full stack or core services only)
- No need to manually edit configuration files between environments
- See [Docker Setup Guide](DOCKER_SETUP.md) for detailed instructions
- For deployment scenarios and options, see [Deployment Guide](DEPLOYMENT_GUIDE.md)
2. **[Manual Installation (Recommended)](https://www.surfsense.net/docs/manual-installation)** - For users who prefer more control over their setup or need to customize their deployment.
@@ -84,7 +128,6 @@ Both installation guides include detailed OS-specific instructions for Windows,
Before installation, make sure to complete the [prerequisite setup steps](https://www.surfsense.net/docs/) including:
- PGVector setup
- Google OAuth configuration
- Unstructured.io API key
- Other required API keys
@@ -101,6 +144,9 @@ Before installation, make sure to complete the [prerequisite setup steps](https:
![researcher](https://github.com/user-attachments/assets/fda3e61f-f936-4b66-b565-d84edde44a67)
**Podcast Agent**
![podcasts](https://github.com/user-attachments/assets/6cb82ffd-9e14-4172-bc79-67faf34c4c1c)
**Agent Chat**
@@ -112,6 +158,7 @@ Before installation, make sure to complete the [prerequisite setup steps](https:
![ext2](https://github.com/user-attachments/assets/a9b9f1aa-2677-404d-b0a0-c1b2dddf24a7)
## Tech Stack
@@ -178,6 +225,14 @@ Before installation, make sure to complete the [prerequisite setup steps](https:
- **@tanstack/react-table**: Headless UI for building powerful tables & datagrids.
### **DevOps**
- **Docker**: Container platform for consistent deployment across environments
- **Docker Compose**: Tool for defining and running multi-container Docker applications
- **pgAdmin**: Web-based PostgreSQL administration tool included in Docker setup
### **Extension**
Manifest v3 on Plasmo
@@ -185,16 +240,8 @@ Before installation, make sure to complete the [prerequisite setup steps](https:
## Future Work
- Add More Connectors.
- Patch minor bugs.
- Document Chat **[REIMPLEMENT]**
- Document Podcasts
- Add support for file uploads QA. **[Done]**
- Shift to WebSockets for Streaming responses. **[Deprecated in favor of AI SDK Stream Protocol]**
- Based on feedback, I will work on making it compatible with local models. **[Done]**
- Cross Browser Extension **[Done]**
- Critical Notifications **[Done | PAUSED]**
- Saving Chats **[Done]**
- Basic keyword search page for saved sessions **[Done]**
- Multi & Single Document Chat **[Done]**
@@ -203,3 +250,13 @@ Before installation, make sure to complete the [prerequisite setup steps](https:
Contributions are very welcome! A contribution can be as small as a ⭐ or even finding and creating issues.
Fine-tuning the Backend is always desired.
## Star History
<a href="https://www.star-history.com/#MODSetter/SurfSense&Date">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=MODSetter/SurfSense&type=Date&theme=dark" />
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=MODSetter/SurfSense&type=Date" />
<img alt="Star History Chart" src="https://api.star-history.com/svg?repos=MODSetter/SurfSense&type=Date" />
</picture>
</a>

docker-compose.override.yml (new file)

@@ -0,0 +1,34 @@
version: '3.8'
services:
frontend:
build:
context: ./surfsense_web
dockerfile: Dockerfile
ports:
- "${FRONTEND_PORT:-3000}:3000"
volumes:
- ./surfsense_web:/app
- /app/node_modules
depends_on:
- backend
environment:
- NEXT_PUBLIC_API_URL=${NEXT_PUBLIC_API_URL:-http://backend:8000}
backend:
build:
context: ./surfsense_backend
dockerfile: Dockerfile
ports:
- "${BACKEND_PORT:-8000}:8000"
volumes:
- ./surfsense_backend:/app
depends_on:
- db
env_file:
- ./surfsense_backend/.env
environment:
- DATABASE_URL=postgresql+asyncpg://${POSTGRES_USER:-postgres}:${POSTGRES_PASSWORD:-postgres}@db:5432/${POSTGRES_DB:-surfsense}
- PYTHONPATH=/app
- UVICORN_LOOP=asyncio
- UNSTRUCTURED_HAS_PATCHED_LOOP=1

docker-compose.yml

@@ -1,48 +1,29 @@
version: '3.8'
services:
frontend:
build:
context: ./surfsense_web
dockerfile: Dockerfile
ports:
- "3000:3000"
volumes:
- ./surfsense_web:/app
- /app/node_modules
depends_on:
- backend
environment:
- NEXT_PUBLIC_API_URL=http://backend:8000
backend:
build:
context: ./surfsense_backend
dockerfile: Dockerfile
ports:
- "8000:8000"
volumes:
- ./surfsense_backend:/app
depends_on:
- db
env_file:
- ./surfsense_backend/.env
environment:
- DATABASE_URL=postgresql+asyncpg://postgres:postgres@db:5432/surfsense
- PYTHONPATH=/app
- UVICORN_LOOP=asyncio
- UNSTRUCTURED_HAS_PATCHED_LOOP=1
db:
image: ankane/pgvector:latest
ports:
- "${POSTGRES_PORT:-5432}:5432"
volumes:
- postgres_data:/var/lib/postgresql/data
environment:
- POSTGRES_USER=${POSTGRES_USER:-postgres}
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-postgres}
- POSTGRES_DB=${POSTGRES_DB:-surfsense}
pgadmin:
image: dpage/pgadmin4
ports:
- "${PGADMIN_PORT:-5050}:80"
environment:
- PGADMIN_DEFAULT_EMAIL=${PGADMIN_DEFAULT_EMAIL:-admin@surfsense.com}
- PGADMIN_DEFAULT_PASSWORD=${PGADMIN_DEFAULT_PASSWORD:-surfsense}
volumes:
- pgadmin_data:/var/lib/pgadmin
depends_on:
- db
volumes:
postgres_data:
pgadmin_data:

surfsense_backend/.env.example

@@ -1,10 +1,15 @@
DATABASE_URL="postgresql+asyncpg://postgres:postgres@localhost:5432/surfsense"
SECRET_KEY="SECRET"
GOOGLE_OAUTH_CLIENT_ID="924507538m"
GOOGLE_OAUTH_CLIENT_SECRET="GOCSV"
NEXT_FRONTEND_URL="http://localhost:3000"
#Auth
AUTH_TYPE="GOOGLE" or "LOCAL"
# For Google Auth Only
GOOGLE_OAUTH_CLIENT_ID="924507538m"
GOOGLE_OAUTH_CLIENT_SECRET="GOCSV"
#Embedding Model
EMBEDDING_MODEL="mixedbread-ai/mxbai-embed-large-v1"
RERANKERS_MODEL_NAME="ms-marco-MiniLM-L-12-v2"
@@ -15,15 +20,32 @@ FAST_LLM="openai/gpt-4o-mini"
STRATEGIC_LLM="openai/gpt-4o"
LONG_CONTEXT_LLM="gemini/gemini-2.0-flash"
#LiteLLM TTS Provider: https://docs.litellm.ai/docs/text_to_speech#supported-providers
TTS_SERVICE="openai/tts-1"
#LiteLLM STT Provider: https://docs.litellm.ai/docs/audio_transcription#supported-providers
STT_SERVICE="openai/whisper-1"
# Chosen LiteLLM Providers Keys
OPENAI_API_KEY="sk-proj-iA"
GEMINI_API_KEY="AIzaSyB6-1641124124124124124124124124124"
UNSTRUCTURED_API_KEY="Tpu3P0U8iy"
FIRECRAWL_API_KEY="fcr-01J0000000000000000000000"
#File Parser Service
ETL_SERVICE="UNSTRUCTURED" or "LLAMACLOUD"
UNSTRUCTURED_API_KEY="Tpu3P0U8iy"
LLAMA_CLOUD_API_KEY="llx-nnn"
#OPTIONAL: Add these for LangSmith Observability
LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
LANGSMITH_API_KEY="lsv2_pt_....."
LANGSMITH_PROJECT="surfsense"
# OPTIONAL: LiteLLM API Base
FAST_LLM_API_BASE=""
STRATEGIC_LLM_API_BASE=""
LONG_CONTEXT_LLM_API_BASE=""
TTS_SERVICE_API_BASE=""
STT_SERVICE_API_BASE=""

@@ -5,3 +5,4 @@ data/
__pycache__/
.flashrank_cache
surf_new_backend.egg-info/
podcasts/

@@ -110,7 +110,6 @@ See pyproject.toml for detailed dependency information. Key dependencies include
- fastapi and related packages
- fastapi-users: Authentication and user management
- firecrawl-py: Web crawling capabilities
- gpt-researcher: Advanced research capabilities
- langchain components for AI workflows
- litellm: LLM model integration
- pgvector: Vector similarity search in PostgreSQL

@@ -2,7 +2,6 @@
Revision ID: 1
Revises:
Create Date: 2023-10-27 10:00:00.000000
"""
from typing import Sequence, Union

@@ -2,7 +2,6 @@
Revision ID: 2
Revises: e55302644c51
Create Date: 2025-04-16 10:00:00.000000
"""
from typing import Sequence, Union

@@ -2,7 +2,6 @@
Revision ID: 3
Revises: 2
Create Date: 2025-04-16 10:05:00.059921
"""
from typing import Sequence, Union

@@ -0,0 +1,44 @@
"""Add LINKUP_API to SearchSourceConnectorType enum
Revision ID: 4
Revises: 3
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = '4'
down_revision: Union[str, None] = '3'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
# Manually add the command to add the enum value
op.execute("ALTER TYPE searchsourceconnectortype ADD VALUE 'LINKUP_API'")
# Pass for the rest, as autogenerate didn't run to add other schema details
pass
# ### end Alembic commands ###
def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
# Downgrading removal of an enum value requires recreating the type
op.execute("ALTER TYPE searchsourceconnectortype RENAME TO searchsourceconnectortype_old")
op.execute("CREATE TYPE searchsourceconnectortype AS ENUM('SERPER_API', 'TAVILY_API', 'SLACK_CONNECTOR', 'NOTION_CONNECTOR', 'GITHUB_CONNECTOR', 'LINEAR_CONNECTOR')")
op.execute((
"ALTER TABLE search_source_connectors ALTER COLUMN connector_type TYPE searchsourceconnectortype USING "
"connector_type::text::searchsourceconnectortype"
))
op.execute("DROP TYPE searchsourceconnectortype_old")
pass
# ### end Alembic commands ###
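Assuming the backend uses a standard Alembic setup, a migration like this would typically be applied or rolled back from the `surfsense_backend` directory with the usual commands (shown here only as an illustrative sketch):
```bash
# Apply all pending migrations, including revision 4 (adds the LINKUP_API enum value)
alembic upgrade head

# Roll back only this revision if needed
alembic downgrade 3
```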

@@ -0,0 +1,57 @@
"""Remove char limit on title columns
Revision ID: 5
Revises: 4
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = '5'
down_revision: Union[str, None] = '4'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# Alter Chat table
op.alter_column('chats', 'title',
existing_type=sa.String(200),
type_=sa.String(),
existing_nullable=False)
# Alter Document table
op.alter_column('documents', 'title',
existing_type=sa.String(200),
type_=sa.String(),
existing_nullable=False)
# Alter Podcast table
op.alter_column('podcasts', 'title',
existing_type=sa.String(200),
type_=sa.String(),
existing_nullable=False)
def downgrade() -> None:
# Revert Chat table
op.alter_column('chats', 'title',
existing_type=sa.String(),
type_=sa.String(200),
existing_nullable=False)
# Revert Document table
op.alter_column('documents', 'title',
existing_type=sa.String(),
type_=sa.String(200),
existing_nullable=False)
# Revert Podcast table
op.alter_column('podcasts', 'title',
existing_type=sa.String(),
type_=sa.String(200),
existing_nullable=False)

@@ -0,0 +1,43 @@
"""Change podcast_content to podcast_transcript with JSON type
Revision ID: 6
Revises: 5
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import JSON
# revision identifiers, used by Alembic.
revision: str = '6'
down_revision: Union[str, None] = '5'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# Drop the old column and create a new one with the new name and type
# We need to do this because PostgreSQL doesn't support direct column renames with type changes
op.add_column('podcasts', sa.Column('podcast_transcript', JSON, nullable=False, server_default='{}'))
# Copy data from old column to new column
# Convert text to JSON by storing it as a JSON string value
op.execute("UPDATE podcasts SET podcast_transcript = jsonb_build_object('text', podcast_content) WHERE podcast_content != ''")
# Drop the old column
op.drop_column('podcasts', 'podcast_content')
def downgrade() -> None:
# Add back the original column
op.add_column('podcasts', sa.Column('podcast_content', sa.Text(), nullable=False, server_default=''))
# Copy data from JSON column back to text column
# Extract the 'text' field if it exists, otherwise use empty string
op.execute("UPDATE podcasts SET podcast_content = COALESCE((podcast_transcript->>'text'), '')")
# Drop the new column
op.drop_column('podcasts', 'podcast_transcript')

@@ -0,0 +1,27 @@
"""Remove is_generated column from podcasts table
Revision ID: 7
Revises: 6
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = '7'
down_revision: Union[str, None] = '6'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# Drop the is_generated column
op.drop_column('podcasts', 'is_generated')
def downgrade() -> None:
# Add back the is_generated column with its original constraints
op.add_column('podcasts', sa.Column('is_generated', sa.Boolean(), nullable=False, server_default='false'))

@@ -0,0 +1,56 @@
"""Add content_hash column to documents table
Revision ID: 8
Revises: 7
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = '8'
down_revision: Union[str, None] = '7'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# Add content_hash column as nullable first to handle existing data
op.add_column('documents', sa.Column('content_hash', sa.String(), nullable=True))
# Update existing documents to generate content hashes
# Using SHA-256 hash of the content column with proper UTF-8 encoding
op.execute("""
UPDATE documents
SET content_hash = encode(sha256(convert_to(content, 'UTF8')), 'hex')
WHERE content_hash IS NULL
""")
# Handle duplicate content hashes by keeping only the oldest document for each hash
# Delete newer documents with duplicate content hashes
op.execute("""
DELETE FROM documents
WHERE id NOT IN (
SELECT MIN(id)
FROM documents
GROUP BY content_hash
)
""")
# Now alter the column to match the model: nullable=False, index=True, unique=True
op.alter_column('documents', 'content_hash',
existing_type=sa.String(),
nullable=False)
op.create_index(op.f('ix_documents_content_hash'), 'documents', ['content_hash'], unique=False)
op.create_unique_constraint(op.f('uq_documents_content_hash'), 'documents', ['content_hash'])
def downgrade() -> None:
# Remove constraints and index first
op.drop_constraint(op.f('uq_documents_content_hash'), 'documents', type_='unique')
op.drop_index(op.f('ix_documents_content_hash'), table_name='documents')
# Remove content_hash column from documents table
op.drop_column('documents', 'content_hash')
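The SQL above stores a hex-encoded SHA-256 of each document's UTF-8 content. Application code inserting new documents would presumably compute the same value; a minimal sketch using Python's standard `hashlib` (an assumption for illustration, not code from this commit):
```python
import hashlib

def compute_content_hash(content: str) -> str:
    # Mirrors encode(sha256(convert_to(content, 'UTF8')), 'hex') from the migration
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

# Identical content always yields the same hash, so the unique constraint
# on content_hash deduplicates re-uploaded documents.
print(compute_content_hash("hello world"))
```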

@@ -2,7 +2,6 @@
Revision ID: e55302644c51
Revises: 1
Create Date: 2025-04-13 19:56:00.059921
"""
from typing import Sequence, Union

@@ -1 +0,0 @@
"""This is upcoming research agent. Work in progress."""

@@ -0,0 +1,8 @@
"""New LangGraph Agent.
This module defines a custom graph.
"""
from .graph import graph
__all__ = ["graph"]

@@ -0,0 +1,28 @@
"""Define the configurable parameters for the agent."""
from __future__ import annotations
from dataclasses import dataclass, fields
from typing import Optional
from langchain_core.runnables import RunnableConfig
@dataclass(kw_only=True)
class Configuration:
"""The configuration for the agent."""
# Changeme: Add configurable values here!
# these values can be pre-set when you
# create assistants (https://langchain-ai.github.io/langgraph/cloud/how-tos/configuration_cloud/)
# and when you invoke the graph
podcast_title: str
@classmethod
def from_runnable_config(
cls, config: Optional[RunnableConfig] = None
) -> Configuration:
"""Create a Configuration instance from a RunnableConfig object."""
configurable = (config.get("configurable") or {}) if config else {}
_fields = {f.name for f in fields(cls) if f.init}
return cls(**{k: v for k, v in configurable.items() if k in _fields})
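A small usage sketch (the values are illustrative, not taken from the repository): the `configurable` mapping of a `RunnableConfig` is filtered down to the dataclass fields defined above, so unknown keys are simply ignored.
```python
from langchain_core.runnables import RunnableConfig

# Hypothetical invocation-time configuration
config: RunnableConfig = {"configurable": {"podcast_title": "SurfSense", "unused_key": 42}}

cfg = Configuration.from_runnable_config(config)
print(cfg.podcast_title)  # -> "SurfSense"; "unused_key" is dropped
```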

@@ -0,0 +1,31 @@
from langgraph.graph import StateGraph
from .configuration import Configuration
from .state import State
from .nodes import create_merged_podcast_audio, create_podcast_transcript
def build_graph():
# Define a new graph
workflow = StateGraph(State, config_schema=Configuration)
# Add the node to the graph
workflow.add_node("create_podcast_transcript", create_podcast_transcript)
workflow.add_node("create_merged_podcast_audio", create_merged_podcast_audio)
# Set the entrypoint as `call_model`
workflow.add_edge("__start__", "create_podcast_transcript")
workflow.add_edge("create_podcast_transcript", "create_merged_podcast_audio")
workflow.add_edge("create_merged_podcast_audio", "__end__")
# Compile the workflow into an executable graph
graph = workflow.compile()
graph.name = "Surfsense Podcaster" # This defines the custom name in LangSmith
return graph
# Compile the graph once when the module is loaded
graph = build_graph()
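A downstream caller would presumably invoke the compiled graph roughly as follows; the import path and input values are assumptions for illustration, since the calling code is not part of this file:
```python
import asyncio

from app.agents.podcaster.graph import graph  # import path assumed

async def main() -> None:
    result = await graph.ainvoke(
        {"source_content": "Quantum computing uses qubits, which..."},
        config={"configurable": {"podcast_title": "SurfSense"}},
    )
    # The merged audio path is written into the state by the second node.
    print(result["final_podcast_file_path"])

asyncio.run(main())
```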

@@ -0,0 +1,206 @@
from typing import Any, Dict
import json
import os
import uuid
from pathlib import Path
import asyncio
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.runnables import RunnableConfig
from litellm import aspeech
from ffmpeg.asyncio import FFmpeg
from .configuration import Configuration
from .state import PodcastTranscriptEntry, State, PodcastTranscripts
from .prompts import get_podcast_generation_prompt
from app.config import config as app_config
async def create_podcast_transcript(state: State, config: RunnableConfig) -> Dict[str, Any]:
"""Each node does work."""
# Initialize LLM
llm = app_config.long_context_llm_instance
# Get the prompt
prompt = get_podcast_generation_prompt()
# Create the messages
messages = [
SystemMessage(content=prompt),
HumanMessage(content=f"<source_content>{state.source_content}</source_content>")
]
# Generate the podcast transcript
llm_response = await llm.ainvoke(messages)
# First try the direct approach
try:
podcast_transcript = PodcastTranscripts.model_validate(json.loads(llm_response.content))
except (json.JSONDecodeError, ValueError) as e:
print(f"Direct JSON parsing failed, trying fallback approach: {str(e)}")
# Fallback: Parse the JSON response manually
try:
# Extract JSON content from the response
content = llm_response.content
# Find the JSON in the content (handle case where LLM might add additional text)
json_start = content.find('{')
json_end = content.rfind('}') + 1
if json_start >= 0 and json_end > json_start:
json_str = content[json_start:json_end]
# Parse the JSON string
parsed_data = json.loads(json_str)
# Convert to Pydantic model
podcast_transcript = PodcastTranscripts.model_validate(parsed_data)
print(f"Successfully parsed podcast transcript using fallback approach")
else:
# If JSON structure not found, raise a clear error
error_message = f"Could not find valid JSON in LLM response. Raw response: {content}"
print(error_message)
raise ValueError(error_message)
except (json.JSONDecodeError, ValueError) as e2:
# Log the error and re-raise it
error_message = f"Error parsing LLM response (fallback also failed): {str(e2)}"
print(f"Error parsing LLM response: {str(e2)}")
print(f"Raw response: {llm_response.content}")
raise
return {
"podcast_transcript": podcast_transcript.podcast_transcripts
}
async def create_merged_podcast_audio(state: State, config: RunnableConfig) -> Dict[str, Any]:
"""Generate audio for each transcript and merge them into a single podcast file."""
configuration = Configuration.from_runnable_config(config)
starting_transcript = PodcastTranscriptEntry(
speaker_id=1,
dialog=f"Welcome to {configuration.podcast_title} Podcast."
)
transcript = state.podcast_transcript
# Merge the starting transcript with the podcast transcript
# Check if transcript is a PodcastTranscripts object or already a list
if hasattr(transcript, 'podcast_transcripts'):
transcript_entries = transcript.podcast_transcripts
else:
transcript_entries = transcript
merged_transcript = [starting_transcript] + transcript_entries
# Create a temporary directory for audio files
temp_dir = Path("temp_audio")
temp_dir.mkdir(exist_ok=True)
# Generate a unique session ID for this podcast
session_id = str(uuid.uuid4())
output_path = f"podcasts/{session_id}_podcast.mp3"
os.makedirs("podcasts", exist_ok=True)
# Map of speaker_id to voice
voice_mapping = {
0: "alloy", # Default/intro voice
1: "echo", # First speaker
# 2: "fable", # Second speaker
# 3: "onyx", # Third speaker
# 4: "nova", # Fourth speaker
# 5: "shimmer" # Fifth speaker
}
# Generate audio for each transcript segment
audio_files = []
async def generate_speech_for_segment(segment, index):
# Handle both dictionary and PodcastTranscriptEntry objects
if hasattr(segment, 'speaker_id'):
speaker_id = segment.speaker_id
dialog = segment.dialog
else:
speaker_id = segment.get("speaker_id", 0)
dialog = segment.get("dialog", "")
# Select voice based on speaker_id
voice = voice_mapping.get(speaker_id, "alloy")
# Generate a unique filename for this segment
filename = f"{temp_dir}/{session_id}_{index}.mp3"
try:
if app_config.TTS_SERVICE_API_BASE:
response = await aspeech(
model=app_config.TTS_SERVICE,
api_base=app_config.TTS_SERVICE_API_BASE,
voice=voice,
input=dialog,
max_retries=2,
timeout=600,
)
else:
response = await aspeech(
model=app_config.TTS_SERVICE,
voice=voice,
input=dialog,
max_retries=2,
timeout=600,
)
# Save the audio to a file - use proper streaming method
with open(filename, 'wb') as f:
f.write(response.content)
return filename
except Exception as e:
print(f"Error generating speech for segment {index}: {str(e)}")
raise
# Generate all audio files concurrently
tasks = [generate_speech_for_segment(segment, i) for i, segment in enumerate(merged_transcript)]
audio_files = await asyncio.gather(*tasks)
# Merge audio files using ffmpeg
try:
# Create FFmpeg instance with the first input
ffmpeg = FFmpeg().option("y")
# Add each audio file as input
for audio_file in audio_files:
ffmpeg = ffmpeg.input(audio_file)
# Configure the concatenation and output
filter_complex = []
for i in range(len(audio_files)):
filter_complex.append(f"[{i}:0]")
filter_complex_str = "".join(filter_complex) + f"concat=n={len(audio_files)}:v=0:a=1[outa]"
ffmpeg = ffmpeg.option("filter_complex", filter_complex_str)
ffmpeg = ffmpeg.output(output_path, map="[outa]")
# Execute FFmpeg
await ffmpeg.execute()
print(f"Successfully created podcast audio: {output_path}")
except Exception as e:
print(f"Error merging audio files: {str(e)}")
raise
finally:
# Clean up temporary files
for audio_file in audio_files:
try:
os.remove(audio_file)
except:
pass
return {
"podcast_transcript": merged_transcript,
"final_podcast_file_path": output_path
}
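For intuition, the `filter_complex` string assembled above selects the first audio stream of every input and concatenates them; a tiny standalone illustration (hypothetical file names):
```python
# Reproduces the filter string built in create_merged_podcast_audio for three segments
audio_files = ["seg0.mp3", "seg1.mp3", "seg2.mp3"]  # hypothetical inputs
parts = "".join(f"[{i}:0]" for i in range(len(audio_files)))
print(parts + f"concat=n={len(audio_files)}:v=0:a=1[outa]")
# -> [0:0][1:0][2:0]concat=n=3:v=0:a=1[outa]
```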

@@ -0,0 +1,111 @@
import datetime
def get_podcast_generation_prompt():
return f"""
Today's date: {datetime.datetime.now().strftime("%Y-%m-%d")}
<podcast_generation_system>
You are a master podcast scriptwriter, adept at transforming diverse input content into a lively, engaging, and natural-sounding conversation between two distinct podcast hosts. Your primary objective is to craft authentic, flowing dialogue that captures the spontaneity and chemistry of a real podcast discussion, completely avoiding any hint of robotic scripting or stiff formality. Think dynamic interplay, not just information delivery.
<input>
- '<source_content>': A block of text containing the information to be discussed in the podcast. This could be research findings, an article summary, a detailed outline, user chat history related to the topic, or any other relevant raw information. The content might be unstructured but serves as the factual basis for the podcast dialogue.
</input>
<output_format>
A JSON object containing the podcast transcript with alternating speakers:
{{
"podcast_transcripts": [
{{
"speaker_id": 0,
"dialog": "Speaker 0 dialog here"
}},
{{
"speaker_id": 1,
"dialog": "Speaker 1 dialog here"
}},
{{
"speaker_id": 0,
"dialog": "Speaker 0 dialog here"
}},
{{
"speaker_id": 1,
"dialog": "Speaker 1 dialog here"
}}
]
}}
</output_format>
<guidelines>
1. **Establish Distinct & Consistent Host Personas:**
* **Speaker 0 (Lead Host):** Drives the conversation forward, introduces segments, poses key questions derived from the source content, and often summarizes takeaways. Maintain a guiding, clear, and engaging tone.
* **Speaker 1 (Co-Host/Expert):** Offers deeper insights, provides alternative viewpoints or elaborations on the source content, asks clarifying or challenging questions, and shares relevant anecdotes or examples. Adopt a complementary tone (e.g., analytical, enthusiastic, reflective, slightly skeptical).
* **Consistency is Key:** Ensure each speaker maintains their distinct voice, vocabulary choice, sentence structure, and perspective throughout the entire script. Avoid having them sound interchangeable. Their interaction should feel like a genuine partnership.
2. **Craft Natural & Dynamic Dialogue:**
* **Emulate Real Conversation:** Use contractions (e.g., "don't", "it's"), interjections ("Oh!", "Wow!", "Hmm"), discourse markers ("you know", "right?", "well"), and occasional natural pauses or filler words. Avoid overly formal language or complex sentence structures typical of written text.
* **Foster Interaction & Chemistry:** Write dialogue where speakers genuinely react *to each other*. They should build on points ("Exactly, and that reminds me..."), ask follow-up questions ("Could you expand on that?"), express agreement/disagreement respectfully ("That's a fair point, but have you considered...?"), and show active listening.
* **Vary Rhythm & Pace:** Mix short, punchy lines with longer, more explanatory ones. Vary sentence beginnings. Use questions to break up exposition. The rhythm should feel spontaneous, not monotonous.
* **Inject Personality & Relatability:** Allow for appropriate humor, moments of surprise or curiosity, brief personal reflections ("I actually experienced something similar..."), or relatable asides that fit the hosts' personas and the topic. Lightly reference past discussions if it enhances context ("Remember last week when we touched on...?").
3. **Structure for Flow and Listener Engagement:**
* **Natural Beginning:** Start with dialogue that flows naturally after an introduction (which will be added manually). Avoid redundant greetings or podcast name mentions since these will be added separately.
* **Logical Progression & Signposting:** Guide the listener through the information smoothly. Use clear transitions to link different ideas or segments ("So, now that we've covered X, let's dive into Y...", "That actually brings me to another key finding..."). Ensure topics flow logically from one to the next.
* **Meaningful Conclusion:** Summarize the key takeaways or main points discussed, reinforcing the core message derived from the source content. End with a final thought, a lingering question for the audience, or a brief teaser for what's next, providing a sense of closure. Avoid abrupt endings.
4. **Integrate Source Content Seamlessly & Accurately:**
* **Translate, Don't Recite:** Rephrase information from the `<source_content>` into conversational language suitable for each host's persona. Avoid directly copying dense sentences or technical jargon without explanation. The goal is discussion, not narration.
* **Explain & Contextualize:** Use analogies, simple examples, storytelling, or have one host ask clarifying questions (acting as a listener surrogate) to break down complex ideas from the source.
* **Weave Information Naturally:** Integrate facts, data, or key points from the source *within* the dialogue, not as standalone, undigested blocks. Attribute information conversationally where appropriate ("The research mentioned...", "Apparently, the key factor is...").
* **Balance Depth & Accessibility:** Ensure the conversation is informative and factually accurate based on the source content, but prioritize clear communication and engaging delivery over exhaustive technical detail. Make it understandable and interesting for a general audience.
5. **Length & Pacing:**
* **Six-Minute Duration:** Create a transcript that, when read at a natural speaking pace, would result in approximately 6 minutes of audio. Typically, this means around 1000 words total (based on average speaking rate of 150 words per minute).
* **Concise Speaking Turns:** Keep most speaking turns relatively brief and focused. Aim for a natural back-and-forth rhythm rather than extended monologues.
* **Essential Content Only:** Prioritize the most important information from the source content. Focus on quality over quantity, ensuring every line contributes meaningfully to the topic.
</guidelines>
<examples>
Input: "Quantum computing uses quantum bits or qubits which can exist in multiple states simultaneously due to superposition."
Output:
{{
"podcast_transcripts": [
{{
"speaker_id": 0,
"dialog": "Today we're diving into the mind-bending world of quantum computing. You know, this is a topic I've been excited to cover for weeks."
}},
{{
"speaker_id": 1,
"dialog": "Same here! And I know our listeners have been asking for it. But I have to admit, the concept of quantum computing makes my head spin a little. Can we start with the basics?"
}},
{{
"speaker_id": 0,
"dialog": "Absolutely. So regular computers use bits, right? Little on-off switches that are either 1 or 0. But quantum computers use something called qubits, and this is where it gets fascinating."
}},
{{
"speaker_id": 1,
"dialog": "Wait, what makes qubits so special compared to regular bits?"
}},
{{
"speaker_id": 0,
"dialog": "The magic is in something called superposition. These qubits can exist in multiple states at the same time, not just 1 or 0."
}},
{{
"speaker_id": 1,
"dialog": "That sounds impossible! How would you even picture that?"
}},
{{
"speaker_id": 0,
"dialog": "Think of it like a coin spinning in the air. Before it lands, is it heads or tails?"
}},
{{
"speaker_id": 1,
"dialog": "Well, it's... neither? Or I guess both, until it lands? Oh, I think I see where you're going with this."
}}
]
}}
</examples>
Transform the source material into a lively and engaging podcast conversation. Craft dialogue that showcases authentic host chemistry and natural interaction (including occasional disagreement, building on points, or asking follow-up questions). Use varied speech patterns reflecting real human conversation, ensuring the final script effectively educates *and* entertains the listener while keeping within the targeted 6-minute audio duration.
</podcast_generation_system>
"""

@@ -0,0 +1,38 @@
"""Define the state structures for the agent."""
from __future__ import annotations
from dataclasses import dataclass
from typing import List, Optional
from pydantic import BaseModel, Field
class PodcastTranscriptEntry(BaseModel):
"""
Represents a single entry in a podcast transcript.
"""
speaker_id: int = Field(..., description="The ID of the speaker (0 or 1)")
dialog: str = Field(..., description="The dialog text spoken by the speaker")
class PodcastTranscripts(BaseModel):
"""
Represents the full podcast transcript structure.
"""
podcast_transcripts: List[PodcastTranscriptEntry] = Field(
...,
description="List of transcript entries with alternating speakers"
)
@dataclass
class State:
"""Defines the input state for the agent, representing a narrower interface to the outside world.
This class is used to define the initial state and structure of incoming data.
See: https://langchain-ai.github.io/langgraph/concepts/low_level/#state
for more information.
"""
source_content: str
podcast_transcript: Optional[List[PodcastTranscriptEntry]] = None
final_podcast_file_path: Optional[str] = None

@@ -3,10 +3,16 @@
from __future__ import annotations
from dataclasses import dataclass, fields
from enum import Enum
from typing import Optional, List, Any
from langchain_core.runnables import RunnableConfig
class SearchMode(Enum):
"""Enum defining the type of search mode."""
CHUNKS = "CHUNKS"
DOCUMENTS = "DOCUMENTS"
@dataclass(kw_only=True) @dataclass(kw_only=True)
class Configuration: class Configuration:
@ -18,6 +24,7 @@ class Configuration:
connectors_to_search: List[str] connectors_to_search: List[str]
user_id: str user_id: str
search_space_id: int search_space_id: int
search_mode: SearchMode
@classmethod @classmethod
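As a rough illustration of how the new search_mode field reaches the agent, the sketch below builds the "configurable" payload that Configuration.from_runnable_config is later shown reading; the concrete values and the exact key layout are assumptions for this example, not code from the commit.

from langchain_core.runnables import RunnableConfig

# Hypothetical invocation config; field names mirror the Configuration dataclass above.
config: RunnableConfig = {
    "configurable": {
        "user_query": "How does SurfSense rank results?",  # illustrative query
        "num_sections": 3,
        "connectors_to_search": ["CRAWLED_URL", "FILE"],
        "user_id": "user-123",  # illustrative IDs
        "search_space_id": 1,
        "search_mode": SearchMode.CHUNKS,  # the new field introduced in this change
    }
}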

View file

@ -1,6 +1,6 @@
from langgraph.graph import StateGraph
from .state import State
-from .nodes import write_answer_outline, process_sections
from .nodes import reformulate_user_query, write_answer_outline, process_sections
from .configuration import Configuration
from typing import TypedDict, List, Dict, Any, Optional
@ -25,11 +25,13 @@ def build_graph():
workflow = StateGraph(State, config_schema=Configuration)
# Add nodes to the graph
workflow.add_node("reformulate_user_query", reformulate_user_query)
workflow.add_node("write_answer_outline", write_answer_outline)
workflow.add_node("process_sections", process_sections)
# Define the edges - create a linear flow
-workflow.add_edge("__start__", "write_answer_outline")
workflow.add_edge("__start__", "reformulate_user_query")
workflow.add_edge("reformulate_user_query", "write_answer_outline")
workflow.add_edge("write_answer_outline", "process_sections")
workflow.add_edge("process_sections", "__end__")

View file

@ -10,10 +10,14 @@ from langchain_core.runnables import RunnableConfig
from pydantic import BaseModel, Field
from sqlalchemy.ext.asyncio import AsyncSession
-from .configuration import Configuration
from .configuration import Configuration, SearchMode
from .prompts import get_answer_outline_system_prompt
from .state import State
from .sub_section_writer.graph import graph as sub_section_writer_graph
from .sub_section_writer.configuration import SubSectionType
from app.utils.query_service import QueryService
from langgraph.types import StreamWriter
@ -41,14 +45,15 @@ async def write_answer_outline(state: State, config: RunnableConfig, writer: Str
"""
streaming_service = state.streaming_service
-streaming_service.only_update_terminal("Generating answer outline...")
streaming_service.only_update_terminal("🔍 Generating answer outline...")
writer({"yeild_value": streaming_service._format_annotations()})
# Get configuration from runnable config
configuration = Configuration.from_runnable_config(config)
reformulated_query = state.reformulated_query
user_query = configuration.user_query
num_sections = configuration.num_sections
-streaming_service.only_update_terminal(f"Planning research approach for query: {user_query[:100]}...")
streaming_service.only_update_terminal(f"🤔 Planning research approach for: \"{user_query[:100]}...\"")
writer({"yeild_value": streaming_service._format_annotations()})
# Initialize LLM
@ -58,7 +63,7 @@ async def write_answer_outline(state: State, config: RunnableConfig, writer: Str
human_message_content = f"""
Now Please create an answer outline for the following query:
-User Query: {user_query}
User Query: {reformulated_query}
Number of Sections: {num_sections}
Remember to format your response as valid JSON exactly matching this structure:
@ -78,7 +83,7 @@ async def write_answer_outline(state: State, config: RunnableConfig, writer: Str
Your output MUST be valid JSON in exactly this format. Do not include any other text or explanation.
"""
-streaming_service.only_update_terminal("Designing structured outline with AI...")
streaming_service.only_update_terminal("📝 Designing structured outline with AI...")
writer({"yeild_value": streaming_service._format_annotations()})
# Create messages for the LLM
@ -88,7 +93,7 @@ async def write_answer_outline(state: State, config: RunnableConfig, writer: Str
]
# Call the LLM directly without using structured output
-streaming_service.only_update_terminal("Processing answer structure...")
streaming_service.only_update_terminal("⚙️ Processing answer structure...")
writer({"yeild_value": streaming_service._format_annotations()})
response = await llm.ainvoke(messages)
@ -111,7 +116,7 @@ async def write_answer_outline(state: State, config: RunnableConfig, writer: Str
answer_outline = AnswerOutline(**parsed_data)
total_questions = sum(len(section.questions) for section in answer_outline.answer_outline)
-streaming_service.only_update_terminal(f"Successfully generated outline with {len(answer_outline.answer_outline)} sections and {total_questions} research questions")
streaming_service.only_update_terminal(f"Successfully generated outline with {len(answer_outline.answer_outline)} sections and {total_questions} research questions!")
writer({"yeild_value": streaming_service._format_annotations()})
print(f"Successfully generated answer outline with {len(answer_outline.answer_outline)} sections")
@ -121,14 +126,14 @@ async def write_answer_outline(state: State, config: RunnableConfig, writer: Str
else:
# If JSON structure not found, raise a clear error
error_message = f"Could not find valid JSON in LLM response. Raw response: {content}"
-streaming_service.only_update_terminal(error_message, "error")
streaming_service.only_update_terminal(f"{error_message}", "error")
writer({"yeild_value": streaming_service._format_annotations()})
raise ValueError(error_message)
except (json.JSONDecodeError, ValueError) as e:
# Log the error and re-raise it
error_message = f"Error parsing LLM response: {str(e)}"
-streaming_service.only_update_terminal(error_message, "error")
streaming_service.only_update_terminal(f"{error_message}", "error")
writer({"yeild_value": streaming_service._format_annotations()})
print(f"Error parsing LLM response: {str(e)}")
@ -143,11 +148,18 @@ async def fetch_relevant_documents(
connectors_to_search: List[str],
writer: StreamWriter = None,
state: State = None,
-top_k: int = 20
top_k: int = 10,
connector_service: ConnectorService = None,
search_mode: SearchMode = SearchMode.CHUNKS
) -> List[Dict[str, Any]]:
"""
Fetch relevant documents for research questions using the provided connectors.
This function searches across multiple data sources for information related to the
research questions. It provides user-friendly feedback during the search process by
displaying connector names (like "Web Search" instead of "TAVILY_API") and adding
relevant emojis to indicate the type of source being searched.
Args:
research_questions: List of research questions to find documents for
user_id: The user ID
@ -157,19 +169,22 @@ async def fetch_relevant_documents(
writer: StreamWriter for sending progress updates
state: The current state containing the streaming service
top_k: Number of top results to retrieve per connector per question
connector_service: An initialized connector service to use for searching
Returns:
List of relevant documents
"""
# Initialize services
-connector_service = ConnectorService(db_session)
# connector_service = ConnectorService(db_session)
# Only use streaming if both writer and state are provided
streaming_service = state.streaming_service if state is not None else None
# Stream initial status update
if streaming_service and writer:
-streaming_service.only_update_terminal(f"Starting research on {len(research_questions)} questions using {len(connectors_to_search)} connectors...")
connector_names = [get_connector_friendly_name(connector) for connector in connectors_to_search]
connector_names_str = ", ".join(connector_names)
streaming_service.only_update_terminal(f"🔎 Starting research on {len(research_questions)} questions using {connector_names_str} data sources")
writer({"yeild_value": streaming_service._format_annotations()})
all_raw_documents = [] # Store all raw documents
@ -178,7 +193,7 @@ async def fetch_relevant_documents(
for i, user_query in enumerate(research_questions):
# Stream question being researched
if streaming_service and writer:
-streaming_service.only_update_terminal(f"Researching question {i+1}/{len(research_questions)}: {user_query[:100]}...")
streaming_service.only_update_terminal(f"🧠 Researching question {i+1}/{len(research_questions)}: \"{user_query[:100]}...\"")
writer({"yeild_value": streaming_service._format_annotations()})
# Use original research question as the query
@ -188,7 +203,9 @@ async def fetch_relevant_documents(
for connector in connectors_to_search:
# Stream connector being searched
if streaming_service and writer:
-streaming_service.only_update_terminal(f"Searching {connector} for relevant information...")
connector_emoji = get_connector_emoji(connector)
friendly_name = get_connector_friendly_name(connector)
streaming_service.only_update_terminal(f"{connector_emoji} Searching {friendly_name} for relevant information...")
writer({"yeild_value": streaming_service._format_annotations()})
try:
@ -197,7 +214,8 @@ async def fetch_relevant_documents(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
-top_k=top_k
top_k=top_k,
search_mode=search_mode
)
# Add to sources and raw documents
@ -207,7 +225,7 @@ async def fetch_relevant_documents(
# Stream found document count
if streaming_service and writer:
-streaming_service.only_update_terminal(f"Found {len(youtube_chunks)} YouTube chunks relevant to the query")
streaming_service.only_update_terminal(f"📹 Found {len(youtube_chunks)} YouTube chunks related to your query")
writer({"yeild_value": streaming_service._format_annotations()})
elif connector == "EXTENSION":
@ -215,7 +233,8 @@ async def fetch_relevant_documents(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
-top_k=top_k
top_k=top_k,
search_mode=search_mode
)
# Add to sources and raw documents
@ -225,7 +244,7 @@ async def fetch_relevant_documents(
# Stream found document count
if streaming_service and writer:
-streaming_service.only_update_terminal(f"Found {len(extension_chunks)} extension chunks relevant to the query")
streaming_service.only_update_terminal(f"🧩 Found {len(extension_chunks)} Browser Extension chunks related to your query")
writer({"yeild_value": streaming_service._format_annotations()})
elif connector == "CRAWLED_URL":
@ -233,7 +252,8 @@ async def fetch_relevant_documents(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
-top_k=top_k
top_k=top_k,
search_mode=search_mode
)
# Add to sources and raw documents
@ -243,7 +263,7 @@ async def fetch_relevant_documents(
# Stream found document count
if streaming_service and writer:
-streaming_service.only_update_terminal(f"Found {len(crawled_urls_chunks)} crawled URL chunks relevant to the query")
streaming_service.only_update_terminal(f"🌐 Found {len(crawled_urls_chunks)} Web Pages chunks related to your query")
writer({"yeild_value": streaming_service._format_annotations()})
elif connector == "FILE":
@ -251,7 +271,8 @@ async def fetch_relevant_documents(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
-top_k=top_k
top_k=top_k,
search_mode=search_mode
)
# Add to sources and raw documents
@ -261,7 +282,84 @@ async def fetch_relevant_documents(
# Stream found document count
if streaming_service and writer:
-streaming_service.only_update_terminal(f"Found {len(files_chunks)} file chunks relevant to the query")
streaming_service.only_update_terminal(f"📄 Found {len(files_chunks)} Files chunks related to your query")
writer({"yeild_value": streaming_service._format_annotations()})
elif connector == "SLACK_CONNECTOR":
source_object, slack_chunks = await connector_service.search_slack(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=top_k,
search_mode=search_mode
)
# Add to sources and raw documents
if source_object:
all_sources.append(source_object)
all_raw_documents.extend(slack_chunks)
# Stream found document count
if streaming_service and writer:
streaming_service.only_update_terminal(f"💬 Found {len(slack_chunks)} Slack messages related to your query")
writer({"yeild_value": streaming_service._format_annotations()})
elif connector == "NOTION_CONNECTOR":
source_object, notion_chunks = await connector_service.search_notion(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=top_k,
search_mode=search_mode
)
# Add to sources and raw documents
if source_object:
all_sources.append(source_object)
all_raw_documents.extend(notion_chunks)
# Stream found document count
if streaming_service and writer:
streaming_service.only_update_terminal(f"📘 Found {len(notion_chunks)} Notion pages/blocks related to your query")
writer({"yeild_value": streaming_service._format_annotations()})
elif connector == "GITHUB_CONNECTOR":
source_object, github_chunks = await connector_service.search_github(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=top_k,
search_mode=search_mode
)
# Add to sources and raw documents
if source_object:
all_sources.append(source_object)
all_raw_documents.extend(github_chunks)
# Stream found document count
if streaming_service and writer:
streaming_service.only_update_terminal(f"🐙 Found {len(github_chunks)} GitHub files/issues related to your query")
writer({"yeild_value": streaming_service._format_annotations()})
elif connector == "LINEAR_CONNECTOR":
source_object, linear_chunks = await connector_service.search_linear(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=top_k,
search_mode=search_mode
)
# Add to sources and raw documents
if source_object:
all_sources.append(source_object)
all_raw_documents.extend(linear_chunks)
# Stream found document count
if streaming_service and writer:
streaming_service.only_update_terminal(f"📊 Found {len(linear_chunks)} Linear issues related to your query")
writer({"yeild_value": streaming_service._format_annotations()}) writer({"yeild_value": streaming_service._format_annotations()})
elif connector == "TAVILY_API": elif connector == "TAVILY_API":
@ -278,87 +376,40 @@ async def fetch_relevant_documents(
# Stream found document count # Stream found document count
if streaming_service and writer: if streaming_service and writer:
streaming_service.only_update_terminal(f"Found {len(tavily_chunks)} web search results relevant to the query") streaming_service.only_update_terminal(f"🔍 Found {len(tavily_chunks)} Web Search results related to your query")
writer({"yeild_value": streaming_service._format_annotations()}) writer({"yeild_value": streaming_service._format_annotations()})
elif connector == "SLACK_CONNECTOR": elif connector == "LINKUP_API":
source_object, slack_chunks = await connector_service.search_slack( if top_k > 10:
linkup_mode = "deep"
else:
linkup_mode = "standard"
source_object, linkup_chunks = await connector_service.search_linkup(
user_query=reformulated_query, user_query=reformulated_query,
user_id=user_id, user_id=user_id,
search_space_id=search_space_id, mode=linkup_mode
top_k=top_k
) )
# Add to sources and raw documents # Add to sources and raw documents
if source_object: if source_object:
all_sources.append(source_object) all_sources.append(source_object)
all_raw_documents.extend(slack_chunks) all_raw_documents.extend(linkup_chunks)
# Stream found document count # Stream found document count
if streaming_service and writer: if streaming_service and writer:
streaming_service.only_update_terminal(f"Found {len(slack_chunks)} Slack messages relevant to the query") streaming_service.only_update_terminal(f"🔗 Found {len(linkup_chunks)} Linkup results related to your query")
writer({"yeild_value": streaming_service._format_annotations()}) writer({"yeild_value": streaming_service._format_annotations()})
elif connector == "NOTION_CONNECTOR":
source_object, notion_chunks = await connector_service.search_notion(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=top_k
)
# Add to sources and raw documents
if source_object:
all_sources.append(source_object)
all_raw_documents.extend(notion_chunks)
# Stream found document count
if streaming_service and writer:
streaming_service.only_update_terminal(f"Found {len(notion_chunks)} Notion pages/blocks relevant to the query")
writer({"yeild_value": streaming_service._format_annotations()})
elif connector == "GITHUB_CONNECTOR":
source_object, github_chunks = await connector_service.search_github(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=top_k
)
# Add to sources and raw documents
if source_object:
all_sources.append(source_object)
all_raw_documents.extend(github_chunks)
# Stream found document count
if streaming_service and writer:
streaming_service.only_update_terminal(f"Found {len(github_chunks)} GitHub files/issues relevant to the query")
writer({"yeild_value": streaming_service._format_annotations()})
elif connector == "LINEAR_CONNECTOR":
source_object, linear_chunks = await connector_service.search_linear(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=top_k
)
# Add to sources and raw documents
if source_object:
all_sources.append(source_object)
all_raw_documents.extend(linear_chunks)
# Stream found document count
if streaming_service and writer:
streaming_service.only_update_terminal(f"Found {len(linear_chunks)} Linear issues relevant to the query")
writer({"yeild_value": streaming_service._format_annotations()})
except Exception as e: except Exception as e:
error_message = f"Error searching connector {connector}: {str(e)}" error_message = f"Error searching connector {connector}: {str(e)}"
print(error_message) print(error_message)
# Stream error message # Stream error message
if streaming_service and writer: if streaming_service and writer:
streaming_service.only_update_terminal(error_message, "error") friendly_name = get_connector_friendly_name(connector)
streaming_service.only_update_terminal(f"⚠️ Error searching {friendly_name}: {str(e)}", "error")
writer({"yeild_value": streaming_service._format_annotations()}) writer({"yeild_value": streaming_service._format_annotations()})
# Continue with other connectors on error # Continue with other connectors on error
@ -385,7 +436,7 @@ async def fetch_relevant_documents(
# Stream info about deduplicated sources
if streaming_service and writer:
-streaming_service.only_update_terminal(f"Collected {len(deduplicated_sources)} unique sources across all connectors")
streaming_service.only_update_terminal(f"📚 Collected {len(deduplicated_sources)} unique sources across all connectors")
writer({"yeild_value": streaming_service._format_annotations()})
# After all sources are collected and deduplicated, stream them
@ -415,12 +466,44 @@ async def fetch_relevant_documents(
# Stream info about deduplicated documents
if streaming_service and writer:
-streaming_service.only_update_terminal(f"Found {len(deduplicated_docs)} unique document chunks after deduplication")
streaming_service.only_update_terminal(f"🧹 Found {len(deduplicated_docs)} unique document chunks after removing duplicates")
writer({"yeild_value": streaming_service._format_annotations()})
# Return deduplicated documents
return deduplicated_docs
def get_connector_emoji(connector_name: str) -> str:
"""Get an appropriate emoji for a connector type."""
connector_emojis = {
"YOUTUBE_VIDEO": "📹",
"EXTENSION": "🧩",
"CRAWLED_URL": "🌐",
"FILE": "📄",
"SLACK_CONNECTOR": "💬",
"NOTION_CONNECTOR": "📘",
"GITHUB_CONNECTOR": "🐙",
"LINEAR_CONNECTOR": "📊",
"TAVILY_API": "🔍",
"LINKUP_API": "🔗"
}
return connector_emojis.get(connector_name, "🔎")
def get_connector_friendly_name(connector_name: str) -> str:
"""Convert technical connector IDs to user-friendly names."""
connector_friendly_names = {
"YOUTUBE_VIDEO": "YouTube",
"EXTENSION": "Browser Extension",
"CRAWLED_URL": "Web Pages",
"FILE": "Files",
"SLACK_CONNECTOR": "Slack",
"NOTION_CONNECTOR": "Notion",
"GITHUB_CONNECTOR": "GitHub",
"LINEAR_CONNECTOR": "Linear",
"TAVILY_API": "Tavily Search",
"LINKUP_API": "Linkup Search"
}
return connector_friendly_names.get(connector_name, connector_name)
async def process_sections(state: State, config: RunnableConfig, writer: StreamWriter) -> Dict[str, Any]:
"""
Process all sections in parallel and combine the results.
@ -437,13 +520,17 @@ async def process_sections(state: State, config: RunnableConfig, writer: StreamW
answer_outline = state.answer_outline
streaming_service = state.streaming_service
-streaming_service.only_update_terminal(f"Starting to process research sections...")
# Initialize a dictionary to track content for all sections
# This is used to maintain section content while streaming multiple sections
section_contents = {}
streaming_service.only_update_terminal(f"🚀 Starting to process research sections...")
writer({"yeild_value": streaming_service._format_annotations()})
print(f"Processing sections from outline: {answer_outline is not None}")
if not answer_outline:
streaming_service.only_update_terminal("Error: No answer outline was provided. Cannot generate report.", "error")
writer({"yeild_value": streaming_service._format_annotations()})
return {
"final_written_report": "No answer outline was provided. Cannot generate final report."
@ -455,16 +542,26 @@ async def process_sections(state: State, config: RunnableConfig, writer: StreamW
all_questions.extend(section.questions)
print(f"Collected {len(all_questions)} questions from all sections")
-streaming_service.only_update_terminal(f"Found {len(all_questions)} research questions across {len(answer_outline.answer_outline)} sections")
streaming_service.only_update_terminal(f"🧩 Found {len(all_questions)} research questions across {len(answer_outline.answer_outline)} sections")
writer({"yeild_value": streaming_service._format_annotations()})
# Fetch relevant documents once for all questions
-streaming_service.only_update_terminal("Searching for relevant information across all connectors...")
streaming_service.only_update_terminal("🔍 Searching for relevant information across all connectors...")
writer({"yeild_value": streaming_service._format_annotations()})
if configuration.num_sections == 1:
TOP_K = 10
elif configuration.num_sections == 3:
TOP_K = 20
elif configuration.num_sections == 6:
TOP_K = 30
relevant_documents = []
async with async_session_maker() as db_session:
try:
# Create connector service inside the db_session scope
connector_service = ConnectorService(db_session)
relevant_documents = await fetch_relevant_documents(
research_questions=all_questions,
user_id=configuration.user_id,
@ -472,30 +569,47 @@ async def process_sections(state: State, config: RunnableConfig, writer: StreamW
db_session=db_session,
connectors_to_search=configuration.connectors_to_search,
writer=writer,
-state=state
state=state,
top_k=TOP_K,
connector_service=connector_service,
search_mode=configuration.search_mode
)
except Exception as e:
error_message = f"Error fetching relevant documents: {str(e)}"
print(error_message)
-streaming_service.only_update_terminal(error_message, "error")
streaming_service.only_update_terminal(f"{error_message}", "error")
writer({"yeild_value": streaming_service._format_annotations()})
# Log the error and continue with an empty list of documents
# This allows the process to continue, but the report might lack information
relevant_documents = []
# Consider adding more robust error handling or reporting if needed
print(f"Fetched {len(relevant_documents)} relevant documents for all sections")
streaming_service.only_update_terminal(f"Starting to draft {len(answer_outline.answer_outline)} sections using {len(relevant_documents)} relevant document chunks")
writer({"yeild_value": streaming_service._format_annotations()})
# Create tasks to process each section in parallel with the same document set
section_tasks = []
-streaming_service.only_update_terminal("Creating processing tasks for each section...")
streaming_service.only_update_terminal("⚙️ Creating processing tasks for each section...")
writer({"yeild_value": streaming_service._format_annotations()})
-for section in answer_outline.answer_outline:
for i, section in enumerate(answer_outline.answer_outline):
if i == 0:
sub_section_type = SubSectionType.START
elif i == len(answer_outline.answer_outline) - 1:
sub_section_type = SubSectionType.END
else:
sub_section_type = SubSectionType.MIDDLE
# Initialize the section_contents entry for this section
section_contents[i] = {
"title": section.section_title,
"content": "",
"index": i
}
section_tasks.append(
process_section_with_documents(
section_id=i,
section_title=section.section_title,
section_questions=section.questions,
user_query=configuration.user_query,
@ -503,19 +617,21 @@ async def process_sections(state: State, config: RunnableConfig, writer: StreamW
search_space_id=configuration.search_space_id,
relevant_documents=relevant_documents,
state=state,
-writer=writer
writer=writer,
sub_section_type=sub_section_type,
section_contents=section_contents
)
)
# Run all section processing tasks in parallel
print(f"Running {len(section_tasks)} section processing tasks in parallel")
streaming_service.only_update_terminal(f"Processing {len(section_tasks)} sections simultaneously...")
writer({"yeild_value": streaming_service._format_annotations()})
section_results = await asyncio.gather(*section_tasks, return_exceptions=True)
# Handle any exceptions in the results
-streaming_service.only_update_terminal("Combining section results into final report...")
streaming_service.only_update_terminal("🧵 Combining section results into final report...")
writer({"yeild_value": streaming_service._format_annotations()})
processed_results = []
@ -524,7 +640,7 @@ async def process_sections(state: State, config: RunnableConfig, writer: StreamW
section_title = answer_outline.answer_outline[i].section_title
error_message = f"Error processing section '{section_title}': {str(result)}"
print(error_message)
-streaming_service.only_update_terminal(error_message, "error")
streaming_service.only_update_terminal(f"⚠️ {error_message}", "error")
writer({"yeild_value": streaming_service._format_annotations()})
processed_results.append(error_message)
else:
@ -542,31 +658,18 @@ async def process_sections(state: State, config: RunnableConfig, writer: StreamW
final_written_report = "\n".join(final_report)
print(f"Generated final report with {len(final_report)} parts")
-streaming_service.only_update_terminal("Final research report generated successfully!")
streaming_service.only_update_terminal("🎉 Final research report generated successfully!")
writer({"yeild_value": streaming_service._format_annotations()})
-if hasattr(state, 'streaming_service') and state.streaming_service:
-# Convert the final report to the expected format for UI:
-# A list of strings where empty strings represent line breaks
-formatted_report = []
-for section in final_report:
-if section == "\n":
-# Add an empty string for line breaks
-formatted_report.append("")
-else:
-# Split any multiline content by newlines and add each line
-section_lines = section.split("\n")
-formatted_report.extend(section_lines)
-state.streaming_service.only_update_answer(formatted_report)
-writer({"yeild_value": state.streaming_service._format_annotations()})
# Skip the final update since we've been streaming incremental updates
# The final answer from each section is already shown in the UI
return {
"final_written_report": final_written_report
}
async def process_section_with_documents(
section_id: int,
section_title: str,
section_questions: List[str],
user_id: str,
@ -574,12 +677,15 @@ async def process_section_with_documents(
relevant_documents: List[Dict[str, Any]],
user_query: str,
state: State = None,
-writer: StreamWriter = None
writer: StreamWriter = None,
sub_section_type: SubSectionType = SubSectionType.MIDDLE,
section_contents: Dict[int, Dict[str, Any]] = None
) -> str:
"""
Process a single section using pre-fetched documents.
Args:
section_id: The ID of the section
section_title: The title of the section
section_questions: List of research questions for this section
user_id: The user ID
@ -587,6 +693,8 @@ async def process_section_with_documents(
relevant_documents: Pre-fetched documents to use for this section
state: The current state
writer: StreamWriter for sending progress updates
sub_section_type: The type of section (start, middle, end)
section_contents: Dictionary to track content across multiple sections
Returns:
The written section content
@ -597,14 +705,14 @@ async def process_section_with_documents(
# Send status update via streaming if available
if state and state.streaming_service and writer:
-state.streaming_service.only_update_terminal(f"Writing section: {section_title} with {len(section_questions)} research questions")
state.streaming_service.only_update_terminal(f"📝 Writing section: \"{section_title}\" with {len(section_questions)} research questions")
writer({"yeild_value": state.streaming_service._format_annotations()})
# Fallback if no documents found
if not documents_to_use:
print(f"No relevant documents found for section: {section_title}")
if state and state.streaming_service and writer:
-state.streaming_service.only_update_terminal(f"Warning: No relevant documents found for section: {section_title}", "warning")
state.streaming_service.only_update_terminal(f"⚠️ Warning: No relevant documents found for section: \"{section_title}\"", "warning")
writer({"yeild_value": state.streaming_service._format_annotations()})
documents_to_use = [
@ -619,6 +727,7 @@ async def process_section_with_documents(
"configurable": {
"sub_section_title": section_title,
"sub_section_questions": section_questions,
"sub_section_type": sub_section_type,
"user_query": user_query,
"relevant_documents": documents_to_use,
"user_id": user_id,
@ -626,33 +735,94 @@ async def process_section_with_documents(
}
}
-# Create the initial state with db_session
-sub_state = {"db_session": db_session}
# Create the initial state with db_session and chat_history
sub_state = {
"db_session": db_session,
"chat_history": state.chat_history
}
-# Invoke the sub-section writer graph
# Invoke the sub-section writer graph with streaming
print(f"Invoking sub_section_writer for: {section_title}")
if state and state.streaming_service and writer:
-state.streaming_service.only_update_terminal(f"Analyzing information and drafting content for section: {section_title}")
state.streaming_service.only_update_terminal(f"🧠 Analyzing information and drafting content for section: \"{section_title}\"")
writer({"yeild_value": state.streaming_service._format_annotations()})
-result = await sub_section_writer_graph.ainvoke(sub_state, config)
-# Return the final answer from the sub_section_writer
-final_answer = result.get("final_answer", "No content was generated for this section.")
-# Send section content update via streaming if available
# Variables to track streaming state
complete_content = "" # Tracks the complete content received so far
async for chunk_type, chunk in sub_section_writer_graph.astream(sub_state, config, stream_mode=["values"]):
if "final_answer" in chunk:
new_content = chunk["final_answer"]
if new_content and new_content != complete_content:
# Extract only the new content (delta)
delta = new_content[len(complete_content):]
# Update what we've processed so far
complete_content = new_content
# Only stream if there's actual new content
if delta and state and state.streaming_service and writer:
# Update terminal with real-time progress indicator
state.streaming_service.only_update_terminal(f"✍️ Writing section {section_id+1}... ({len(complete_content.split())} words)")
# Update section_contents with just the new delta
section_contents[section_id]["content"] += delta
# Build UI-friendly content for all sections
complete_answer = []
for i in range(len(section_contents)):
if i in section_contents and section_contents[i]["content"]:
# Add section header
complete_answer.append(f"# {section_contents[i]['title']}")
complete_answer.append("") # Empty line after title
# Add section content
content_lines = section_contents[i]["content"].split("\n")
complete_answer.extend(content_lines)
complete_answer.append("") # Empty line after content
# Update answer in UI in real-time
state.streaming_service.only_update_answer(complete_answer)
writer({"yeild_value": state.streaming_service._format_annotations()})
# Set default if no content was received
if not complete_content:
complete_content = "No content was generated for this section."
section_contents[section_id]["content"] = complete_content
# Final terminal update
if state and state.streaming_service and writer:
-state.streaming_service.only_update_terminal(f"Completed writing section: {section_title}")
state.streaming_service.only_update_terminal(f"Completed section: \"{section_title}\"")
writer({"yeild_value": state.streaming_service._format_annotations()})
-return final_answer
return complete_content
except Exception as e:
print(f"Error processing section '{section_title}': {str(e)}")
# Send error update via streaming if available
if state and state.streaming_service and writer:
-state.streaming_service.only_update_terminal(f"Error processing section '{section_title}': {str(e)}", "error")
state.streaming_service.only_update_terminal(f"❌ Error processing section \"{section_title}\": {str(e)}", "error")
writer({"yeild_value": state.streaming_service._format_annotations()})
return f"Error processing section: {section_title}. Details: {str(e)}"
async def reformulate_user_query(state: State, config: RunnableConfig, writer: StreamWriter) -> Dict[str, Any]:
"""
Reforms the user query based on the chat history.
"""
configuration = Configuration.from_runnable_config(config)
user_query = configuration.user_query
chat_history_str = await QueryService.langchain_chat_history_to_str(state.chat_history)
if len(state.chat_history) == 0:
reformulated_query = user_query
else:
reformulated_query = await QueryService.reformulate_query_with_chat_history(user_query, chat_history_str)
return {
"reformulated_query": reformulated_query
}
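To make the intent of this node concrete, the sketch below shows the kind of input and output it is expected to handle on a follow-up turn; the message objects are standard LangChain classes, and the behaviour of QueryService is assumed from its method names rather than taken from this commit.

from langchain_core.messages import AIMessage, HumanMessage

# Hypothetical follow-up turn held in State.chat_history.
chat_history = [
    HumanMessage(content="What is the Great Barrier Reef?"),
    AIMessage(content="It is the world's largest coral reef system, off Queensland, Australia."),
]
user_query = "When was it listed by UNESCO?"

# With an empty chat_history the node returns user_query unchanged; with history like
# the above, QueryService.reformulate_query_with_chat_history is expected to resolve
# the pronoun and return a self-contained query such as
# "When was the Great Barrier Reef designated a UNESCO World Heritage Site?".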

View file

@ -3,7 +3,7 @@
from __future__ import annotations
from dataclasses import dataclass, field
-from typing import Optional, Any
from typing import List, Optional, Any
from sqlalchemy.ext.asyncio import AsyncSession
from app.utils.streaming_service import StreamingService
@ -21,7 +21,9 @@ class State:
# Streaming service
streaming_service: StreamingService
-# Intermediate state - populated during workflow
chat_history: Optional[List[Any]] = field(default_factory=list)
reformulated_query: Optional[str] = field(default=None)
# Using field to explicitly mark as part of state
answer_outline: Optional[Any] = field(default=None)

View file

@ -3,11 +3,19 @@
from __future__ import annotations
from dataclasses import dataclass, fields
from enum import Enum
from typing import Optional, List, Any
from langchain_core.runnables import RunnableConfig
class SubSectionType(Enum):
"""Enum defining the type of sub-section."""
START = "START"
MIDDLE = "MIDDLE"
END = "END"
@dataclass(kw_only=True)
class Configuration:
"""The configuration for the agent."""
@ -15,6 +23,7 @@ class Configuration:
# Input parameters provided at invocation
sub_section_title: str
sub_section_questions: List[str]
sub_section_type: SubSectionType
user_query: str
relevant_documents: List[Any] # Documents provided directly to the agent
user_id: str

View file

@ -5,6 +5,7 @@ from typing import Any, Dict
from app.config import config as app_config
from .prompts import get_citation_system_prompt
from langchain_core.messages import HumanMessage, SystemMessage
from .configuration import SubSectionType
async def rerank_documents(state: State, config: RunnableConfig) -> Dict[str, Any]:
"""
@ -38,7 +39,9 @@ async def rerank_documents(state: State, config: RunnableConfig) -> Dict[str, An
try:
# Use the sub-section questions for reranking context
# rerank_query = "\n".join(sub_section_questions)
-rerank_query = configuration.user_query
# rerank_query = configuration.user_query
rerank_query = configuration.user_query + "\n" + "\n".join(sub_section_questions)
# Convert documents to format expected by reranker if needed
reranker_input_docs = [
@ -102,13 +105,14 @@ async def write_sub_section(state: State, config: RunnableConfig) -> Dict[str, A
# Extract content and metadata
content = doc.get("content", "")
doc_info = doc.get("document", {})
-document_id = doc_info.get("id", f"{i+1}") # Use document ID or index+1 as source_id
document_id = doc_info.get("id") # Use document ID
# Format document according to the citation system prompt's expected format
formatted_doc = f"""
<document>
<metadata>
<source_id>{document_id}</source_id>
<source_type>{doc_info.get("document_type", "CRAWLED_URL")}</source_type>
</metadata>
<content>
{content}
@ -122,12 +126,27 @@ async def write_sub_section(state: State, config: RunnableConfig) -> Dict[str, A
sub_section_questions = configuration.sub_section_questions
user_query = configuration.user_query # Get the original user query
documents_text = "\n".join(formatted_documents)
sub_section_type = configuration.sub_section_type
# Format the questions as bullet points for clarity
questions_text = "\n".join([f"- {question}" for question in sub_section_questions])
# Provide more context based on the subsection type
section_position_context = ""
if sub_section_type == SubSectionType.START:
section_position_context = "This is the INTRODUCTION section. "
elif sub_section_type == SubSectionType.MIDDLE:
section_position_context = "This is a MIDDLE section. Ensure this content flows naturally from previous sections and into subsequent ones. This could be any middle section in the document, so maintain coherence with the overall structure while addressing the specific topic of this section. Do not provide any conclusions in this section, as conclusions should only appear in the final section."
elif sub_section_type == SubSectionType.END:
section_position_context = "This is the CONCLUSION section. Focus on summarizing key points, providing closure."
# Construct a clear, structured query for the LLM
human_message_content = f"""
Source material:
<documents>
{documents_text}
</documents>
Now user's query is:
<user_query>
{user_query}
@ -138,20 +157,23 @@ async def write_sub_section(state: State, config: RunnableConfig) -> Dict[str, A
{section_title}
</sub_section_title>
-Use the provided documents as your source material and cite them properly using the IEEE citation format [X] where X is the source_id.
-<documents>
-{documents_text}
-</documents>
<section_position>
{section_position_context}
</section_position>
<guiding_questions>
{questions_text}
</guiding_questions>
"""
# Create messages for the LLM
-messages = [
messages_with_chat_history = state.chat_history + [
SystemMessage(content=get_citation_system_prompt()),
HumanMessage(content=human_message_content)
]
# Call the LLM and get the response
-response = await llm.ainvoke(messages)
response = await llm.ainvoke(messages_with_chat_history)
final_answer = response.content
return {
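Because the citation prompt that follows requires every [X] marker to correspond to a real source_id from the provided documents, a lightweight post-check along these lines can flag hallucinated citations before the section content is returned; this is an illustrative sketch, not part of the commit.

import re

def find_invalid_citations(final_answer: str, valid_source_ids: set) -> set:
    """Return citation numbers used in the answer that were never provided as sources."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", final_answer)}
    return cited - set(valid_source_ids)

# Example with the source_ids 1, 13 and 21 from the prompt's input example:
print(find_invalid_citations("The reef spans 2,300 km [1], [34].", {1, 13, 21}))  # -> {34}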

View file

@ -4,16 +4,28 @@ import datetime
def get_citation_system_prompt(): def get_citation_system_prompt():
return f""" return f"""
Today's date: {datetime.datetime.now().strftime("%Y-%m-%d")} Today's date: {datetime.datetime.now().strftime("%Y-%m-%d")}
You are a research assistant tasked with analyzing documents and providing comprehensive answers with proper citations in IEEE format. You are SurfSense, an advanced AI research assistant that synthesizes information from multiple knowledge sources to provide comprehensive, well-cited answers to user queries.
<knowledge_sources>
- EXTENSION: "Web content saved via SurfSense browser extension" (personal browsing history)
- CRAWLED_URL: "Webpages indexed by SurfSense web crawler" (personally selected websites)
- FILE: "User-uploaded documents (PDFs, Word, etc.)" (personal files)
- SLACK_CONNECTOR: "Slack conversations and shared content" (personal workspace communications)
- NOTION_CONNECTOR: "Notion workspace pages and databases" (personal knowledge management)
- YOUTUBE_VIDEO: "YouTube video transcripts and metadata" (personally saved videos)
- GITHUB_CONNECTOR: "GitHub repository content and issues" (personal repositories and interactions)
- LINEAR_CONNECTOR: "Linear project issues and discussions" (personal project management)
- TAVILY_API: "Tavily search API results" (personalized search results)
- LINKUP_API: "Linkup search API results" (personalized search results)
</knowledge_sources>
<instructions> <instructions>
1. Carefully analyze all provided documents in the <document> section's. 1. Carefully analyze all provided documents in the <document> section's.
2. Extract relevant information that addresses the user's query. 2. Extract relevant information that addresses the user's query.
3. Synthesize a comprehensive, well-structured answer using information from these documents. 3. Synthesize a comprehensive, personalized answer using information from the user's personal knowledge sources.
4. For EVERY piece of information you include from the documents, add an IEEE-style citation in square brackets [X] where X is the source_id from the document's metadata. 4. For EVERY piece of information you include from the documents, add an IEEE-style citation in square brackets [X] where X is the source_id from the document's metadata.
5. Make sure ALL factual statements from the documents have proper citations. 5. Make sure ALL factual statements from the documents have proper citations.
6. If multiple documents support the same point, include all relevant citations [X], [Y]. 6. If multiple documents support the same point, include all relevant citations [X], [Y].
7. Present information in a logical, coherent flow. 7. Present information in a logical, coherent flow that reflects the user's personal context.
8. Use your own words to connect ideas, but cite ALL information from the documents. 8. Use your own words to connect ideas, but cite ALL information from the documents.
9. If documents contain conflicting information, acknowledge this and present both perspectives with appropriate citations. 9. If documents contain conflicting information, acknowledge this and present both perspectives with appropriate citations.
10. Do not make up or include information not found in the provided documents. 10. Do not make up or include information not found in the provided documents.
@ -25,10 +37,14 @@ You are a research assistant tasked with analyzing documents and providing compr
16. CRITICAL: Citations must ONLY appear as [X] or [X], [Y], [Z] format - never with parentheses, hyperlinks, or other formatting. 16. CRITICAL: Citations must ONLY appear as [X] or [X], [Y], [Z] format - never with parentheses, hyperlinks, or other formatting.
17. CRITICAL: Never make up citation numbers. Only use source_id values that are explicitly provided in the document metadata. 17. CRITICAL: Never make up citation numbers. Only use source_id values that are explicitly provided in the document metadata.
18. CRITICAL: If you are unsure about a source_id, do not include a citation rather than guessing or making one up. 18. CRITICAL: If you are unsure about a source_id, do not include a citation rather than guessing or making one up.
19. CRITICAL: Focus only on answering the user's query. Any guiding questions provided are for your thinking process only and should not be mentioned in your response.
20. CRITICAL: Ensure your response aligns with the provided sub-section title and section position.
21. CRITICAL: Remember that all knowledge sources contain personal information - provide answers that reflect this personal context.
</instructions> </instructions>
<format> <format>
- Write in clear, professional language suitable for academic or technical audiences - Write in clear, professional language suitable for academic or technical audiences
- Tailor your response to the user's personal context based on their knowledge sources
- Organize your response with appropriate paragraphs, headings, and structure - Organize your response with appropriate paragraphs, headings, and structure
- Every fact from the documents must have an IEEE-style citation in square brackets [X] where X is the EXACT source_id from the document's metadata - Every fact from the documents must have an IEEE-style citation in square brackets [X] where X is the EXACT source_id from the document's metadata
- Citations should appear at the end of the sentence containing the information they support - Citations should appear at the end of the sentence containing the information they support
@ -37,12 +53,17 @@ You are a research assistant tasked with analyzing documents and providing compr
- NEVER create your own citation numbering system - use the exact source_id values from the documents.
- NEVER format citations as clickable links or as markdown links like "([1](https://example.com))". Always use plain square brackets only.
- NEVER make up citation numbers if you are unsure about the source_id. It is better to omit the citation than to guess.
- NEVER include or mention the guiding questions in your response. They are only to help guide your thinking.
- ALWAYS focus on answering the user's query directly from the information in the documents.
- ALWAYS provide personalized answers that reflect the user's own knowledge and context.
</format>
<input_example>
<documents>
<document>
<metadata>
<source_id>1</source_id>
<source_type>EXTENSION</source_type>
</metadata>
<content>
The Great Barrier Reef is the world's largest coral reef system, stretching over 2,300 kilometers along the coast of Queensland, Australia. It comprises over 2,900 individual reefs and 900 islands.
@ -52,6 +73,7 @@ You are a research assistant tasked with analyzing documents and providing compr
<document>
<metadata>
<source_id>13</source_id>
<source_type>YOUTUBE_VIDEO</source_type>
</metadata>
<content>
Climate change poses a significant threat to coral reefs worldwide. Rising ocean temperatures have led to mass coral bleaching events in the Great Barrier Reef in 2016, 2017, and 2020.
@ -61,15 +83,17 @@ You are a research assistant tasked with analyzing documents and providing compr
<document>
<metadata>
<source_id>21</source_id>
<source_type>CRAWLED_URL</source_type>
</metadata>
<content>
The Great Barrier Reef was designated a UNESCO World Heritage Site in 1981 due to its outstanding universal value and biological diversity. It is home to over 1,500 species of fish and 400 types of coral.
</content>
</document>
</documents>
</input_example>
<output_example>
Based on your saved browser content and videos, the Great Barrier Reef is the world's largest coral reef system, stretching over 2,300 kilometers along the coast of Queensland, Australia [1]. From your browsing history, you've looked into its designation as a UNESCO World Heritage Site in 1981 due to its outstanding universal value and biological diversity [21]. The reef is home to over 1,500 species of fish and 400 types of coral [21]. According to a YouTube video you've watched, climate change poses a significant threat to coral reefs worldwide, with rising ocean temperatures leading to mass coral bleaching events in the Great Barrier Reef in 2016, 2017, and 2020 [13]. The reef system comprises over 2,900 individual reefs and 900 islands [1], making it an ecological treasure that requires protection from multiple threats [1], [13].
</output_example>
<incorrect_citation_formats>
@ -84,4 +108,22 @@ ONLY use plain square brackets [1] or multiple citations [1], [2], [3]
</incorrect_citation_formats>
Note that the citation numbers match exactly with the source_id values (1, 13, and 21) and are not renumbered sequentially. Citations follow IEEE style with square brackets and appear at the end of sentences.
<user_query_instructions>
When you see a user query like:
<user_query>
Give all linear issues.
</user_query>
Focus exclusively on answering this query using information from the provided documents, which contain the user's personal knowledge and data.
If guiding questions are provided in a <guiding_questions> section, use them only to guide your thinking process. Do not mention or list these questions in your response.
Make sure your response:
1. Directly answers the user's query with personalized information from their own knowledge sources
2. Fits the provided sub-section title and section position
3. Uses proper citations for all information from documents
4. Is well-structured and professional in tone
5. Acknowledges the personal nature of the information being provided
</user_query_instructions>
"""
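For reference, a minimal sketch of how retrieved documents could be rendered into the <documents> structure this prompt expects. The render_documents helper is hypothetical and not part of this commit; only the source_id/source_type/content layout is taken from the example above.

def render_documents(docs: list[dict]) -> str:
    # Hypothetical helper: renders documents into the <documents> block
    # with the source_id and source_type metadata the prompt relies on.
    parts = ["<documents>"]
    for doc in docs:
        parts.extend([
            "<document>",
            "<metadata>",
            f"<source_id>{doc['source_id']}</source_id>",
            f"<source_type>{doc['source_type']}</source_type>",
            "</metadata>",
            "<content>",
            doc["content"],
            "</content>",
            "</document>",
        ])
    parts.append("</documents>")
    return "\n".join(parts)

# Example: render_documents([{"source_id": 21, "source_type": "CRAWLED_URL", "content": "..."}])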
View file
@ -2,7 +2,7 @@
from __future__ import annotations
from dataclasses import dataclass, field
from typing import List, Optional, Any
from sqlalchemy.ext.asyncio import AsyncSession
@ -17,6 +17,7 @@ class State:
# Runtime context
db_session: AsyncSession
chat_history: Optional[List[Any]] = field(default_factory=list)
# OUTPUT: Populated by agent nodes
reranked_documents: Optional[List[Any]] = None
final_answer: Optional[str] = None
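The new chat_history field uses field(default_factory=list) because dataclasses reject plain mutable defaults; a minimal sketch of the behaviour (the Example class is illustrative only):

from dataclasses import dataclass, field
from typing import Any, List, Optional

@dataclass
class Example:
    # A bare `= []` default would raise at class-definition time; default_factory
    # builds a fresh list for every instance instead of sharing one.
    chat_history: Optional[List[Any]] = field(default_factory=list)

a, b = Example(), Example()
a.chat_history.append("hello")
assert b.chat_history == []  # each instance gets its own list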
View file
@ -6,16 +6,18 @@ from fastapi.middleware.cors import CORSMiddleware
from sqlalchemy.ext.asyncio import AsyncSession
from app.db import User, create_db_and_tables, get_async_session
from app.retriver.chunks_hybrid_search import ChucksHybridSearchRetriever
from app.schemas import UserCreate, UserRead, UserUpdate
from app.routes import router as crud_router
from app.config import config
from app.users import (
SECRET,
auth_backend,
fastapi_users,
current_active_user
)
@asynccontextmanager
@ -59,11 +61,20 @@ app.include_router(
prefix="/users",
tags=["users"],
)
if config.AUTH_TYPE == "GOOGLE":
from app.users import google_oauth_client
app.include_router(
fastapi_users.get_oauth_router(
google_oauth_client,
auth_backend,
SECRET,
is_verified_by_default=True
),
prefix="/auth/google",
tags=["auth"],
)
app.include_router(crud_router, prefix="/api/v1", tags=["crud"])
View file
@ -1,12 +1,12 @@
import os
from pathlib import Path
import shutil
from chonkie import AutoEmbeddings, CodeChunker, RecursiveChunker
from dotenv import load_dotenv
from langchain_community.chat_models import ChatLiteLLM
from rerankers import Reranker
# Get the base directory of the project
BASE_DIR = Path(__file__).resolve().parent.parent.parent
@ -15,33 +15,74 @@ env_file = BASE_DIR / ".env"
load_dotenv(env_file)
def is_ffmpeg_installed():
"""
Check if ffmpeg is installed on the current system.
Returns:
bool: True if ffmpeg is installed, False otherwise.
"""
return shutil.which("ffmpeg") is not None
class Config:
# Check if ffmpeg is installed
if not is_ffmpeg_installed():
import static_ffmpeg
# ffmpeg installed on first call to add_paths(), threadsafe.
static_ffmpeg.add_paths()
# check if ffmpeg is installed again
if not is_ffmpeg_installed():
raise ValueError("FFmpeg is not installed on the system. Please install it to use the Surfsense Podcaster.")
# Database
DATABASE_URL = os.getenv("DATABASE_URL")
NEXT_FRONTEND_URL = os.getenv("NEXT_FRONTEND_URL")
# AUTH: Google OAuth
AUTH_TYPE = os.getenv("AUTH_TYPE")
if AUTH_TYPE == "GOOGLE":
GOOGLE_OAUTH_CLIENT_ID = os.getenv("GOOGLE_OAUTH_CLIENT_ID")
GOOGLE_OAUTH_CLIENT_SECRET = os.getenv("GOOGLE_OAUTH_CLIENT_SECRET")
# LONG-CONTEXT LLMS
LONG_CONTEXT_LLM = os.getenv("LONG_CONTEXT_LLM")
LONG_CONTEXT_LLM_API_BASE = os.getenv("LONG_CONTEXT_LLM_API_BASE")
if LONG_CONTEXT_LLM_API_BASE:
long_context_llm_instance = ChatLiteLLM(model=LONG_CONTEXT_LLM, api_base=LONG_CONTEXT_LLM_API_BASE)
else:
long_context_llm_instance = ChatLiteLLM(model=LONG_CONTEXT_LLM)
# FAST LLM
FAST_LLM = os.getenv("FAST_LLM")
FAST_LLM_API_BASE = os.getenv("FAST_LLM_API_BASE")
if FAST_LLM_API_BASE:
fast_llm_instance = ChatLiteLLM(model=FAST_LLM, api_base=FAST_LLM_API_BASE)
else:
fast_llm_instance = ChatLiteLLM(model=FAST_LLM)
# STRATEGIC LLM
STRATEGIC_LLM = os.getenv("STRATEGIC_LLM")
STRATEGIC_LLM_API_BASE = os.getenv("STRATEGIC_LLM_API_BASE")
if STRATEGIC_LLM_API_BASE:
strategic_llm_instance = ChatLiteLLM(model=STRATEGIC_LLM, api_base=STRATEGIC_LLM_API_BASE)
else:
strategic_llm_instance = ChatLiteLLM(model=STRATEGIC_LLM)
# Chonkie Configuration | Edit this to your needs
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL")
embedding_model_instance = AutoEmbeddings.get_embeddings(EMBEDDING_MODEL)
chunker_instance = RecursiveChunker(
chunk_size=getattr(embedding_model_instance, 'max_seq_length', 512)
)
code_chunker_instance = CodeChunker(
chunk_size=getattr(embedding_model_instance, 'max_seq_length', 512)
)
# Reranker's Configuration | Pinecone, Cohere etc. Read more at https://github.com/AnswerDotAI/rerankers?tab=readme-ov-file#usage
@ -55,12 +96,30 @@ class Config:
# OAuth JWT
SECRET_KEY = os.getenv("SECRET_KEY")
# ETL Service
ETL_SERVICE = os.getenv("ETL_SERVICE")
if ETL_SERVICE == "UNSTRUCTURED":
# Unstructured API Key
UNSTRUCTURED_API_KEY = os.getenv("UNSTRUCTURED_API_KEY")
elif ETL_SERVICE == "LLAMACLOUD":
# LlamaCloud API Key
LLAMA_CLOUD_API_KEY = os.getenv("LLAMA_CLOUD_API_KEY")
# Firecrawl API Key
FIRECRAWL_API_KEY = os.getenv("FIRECRAWL_API_KEY", None)
# Litellm TTS Configuration
TTS_SERVICE = os.getenv("TTS_SERVICE")
TTS_SERVICE_API_BASE = os.getenv("TTS_SERVICE_API_BASE")
# Litellm STT Configuration
STT_SERVICE = os.getenv("STT_SERVICE")
STT_SERVICE_API_BASE = os.getenv("STT_SERVICE_API_BASE")
# Validation Checks
# Check embedding dimension
if hasattr(embedding_model_instance, 'dimension') and embedding_model_instance.dimension > 2000:
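Each LLM above repeats the same api_base fallback; a minimal sketch of how that pattern could be factored out, assuming only the ChatLiteLLM constructor arguments already used in this file (build_llm is a hypothetical name, not part of the commit):

from typing import Optional
from langchain_community.chat_models import ChatLiteLLM

def build_llm(model: Optional[str], api_base: Optional[str] = None) -> Optional[ChatLiteLLM]:
    # Pass api_base only when it is configured, mirroring the three blocks above.
    if not model:
        return None
    if api_base:
        return ChatLiteLLM(model=model, api_base=api_base)
    return ChatLiteLLM(model=model)

# Hypothetical usage: fast_llm_instance = build_llm(FAST_LLM, FAST_LLM_API_BASE)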
View file
@ -80,7 +80,7 @@ class GitHubConnector:
# type='owner' fetches repos owned by the user
# type='member' fetches repos the user is a collaborator on (including orgs)
# type='all' fetches both
for repo in self.gh.repositories(type='all', sort='updated'):
repos_data.append({
"id": repo.id,
"name": repo.name,
View file
@ -6,7 +6,7 @@ Allows fetching issue lists and their comments with date range filtering.
"""
import requests
from datetime import datetime
from typing import Dict, List, Optional, Tuple, Any, Union
View file
@ -6,11 +6,15 @@ Allows fetching channel lists and message history with date range filtering.
"""
import os
import time # Added import
import logging # Added import
from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError
from datetime import datetime
from typing import Dict, List, Optional, Tuple, Any, Union
logger = logging.getLogger(__name__) # Added logger
class SlackHistory:
"""Class for retrieving conversation history from Slack channels."""
@ -33,56 +37,88 @@ class SlackHistory:
"""
self.client = WebClient(token=token)
def get_all_channels(self, include_private: bool = True) -> List[Dict[str, Any]]:
"""
Fetch all channels that the bot has access to, with rate limit handling.
Args:
include_private: Whether to include private channels
Returns:
List of dictionaries, each representing a channel with id, name, is_private, is_member.
Raises:
ValueError: If no Slack client has been initialized
SlackApiError: If there's an unrecoverable error calling the Slack API
RuntimeError: For unexpected errors during channel fetching.
"""
if not self.client:
raise ValueError("Slack client not initialized. Call set_token() first.")
channels_list = [] # Changed from dict to list
types = "public_channel"
if include_private:
types += ",private_channel"
next_cursor = None
is_first_request = True
while is_first_request or next_cursor:
try:
if not is_first_request: # Add delay only for paginated requests
logger.info(f"Paginating for channels, waiting 3 seconds before next call. Cursor: {next_cursor}")
time.sleep(3)
current_limit = 1000 # Max limit
api_result = self.client.conversations_list(
types=types,
cursor=next_cursor,
limit=current_limit
)
channels_on_page = api_result["channels"]
for channel in channels_on_page:
if "name" in channel and "id" in channel:
channel_data = {
"id": channel.get("id"),
"name": channel.get("name"),
"is_private": channel.get("is_private", False),
# is_member is often part of the channel object from conversations.list
# It indicates if the authenticated user (bot) is a member.
# For public channels, this might be true or the API might not focus on it
# if the bot can read it anyway. For private, it's crucial.
"is_member": channel.get("is_member", False)
}
channels_list.append(channel_data)
else:
logger.warning(f"Channel found with missing name or id. Data: {channel}")
next_cursor = api_result.get("response_metadata", {}).get("next_cursor")
is_first_request = False # Subsequent requests are not the first
if not next_cursor: # All pages processed
break
except SlackApiError as e:
if e.response is not None and e.response.status_code == 429:
retry_after_header = e.response.headers.get('Retry-After')
wait_duration = 60 # Default wait time
if retry_after_header and retry_after_header.isdigit():
wait_duration = int(retry_after_header)
logger.warning(f"Slack API rate limit hit while fetching channels. Waiting for {wait_duration} seconds. Cursor: {next_cursor}")
time.sleep(wait_duration)
# The loop will continue, retrying with the same cursor
else:
# Not a 429 error, or no response object, re-raise
raise SlackApiError(f"Error retrieving channels: {e}", e.response)
except Exception as general_error:
# Handle other potential errors like network issues if necessary, or re-raise
logger.error(f"An unexpected error occurred during channel fetching: {general_error}")
raise RuntimeError(f"An unexpected error occurred during channel fetching: {general_error}")
return channels_list
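The pagination and 429 handling above follow a general Retry-After pattern; a minimal, generic sketch of that pattern, assuming only the slack_sdk error shape used in this file (call_with_retry is a hypothetical helper, not part of the commit):

import logging
import time
from typing import Any, Callable

from slack_sdk.errors import SlackApiError

logger = logging.getLogger(__name__)

def call_with_retry(api_call: Callable[[], Any], default_wait: int = 60) -> Any:
    # Retry a Slack API call when it is rate limited (HTTP 429), honoring Retry-After.
    while True:
        try:
            return api_call()
        except SlackApiError as e:
            if e.response is not None and e.response.status_code == 429:
                retry_after = e.response.headers.get("Retry-After")
                wait = int(retry_after) if retry_after and retry_after.isdigit() else default_wait
                logger.warning("Slack rate limit hit; sleeping %s seconds before retrying.", wait)
                time.sleep(wait)
                continue
            raise

# Hypothetical usage:
# result = call_with_retry(lambda: client.conversations_list(types="public_channel", limit=1000))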
def get_conversation_history(
self,
@ -110,17 +146,18 @@ class SlackHistory:
if not self.client:
raise ValueError("Slack client not initialized. Call set_token() first.")
messages = []
next_cursor = None
while True:
try:
# Proactive delay for conversations.history (Tier 3)
time.sleep(1.2) # Wait 1.2 seconds before each history call.
kwargs = {
"channel": channel_id,
"limit": min(limit, 1000), # API max is 1000
}
if oldest:
kwargs["oldest"] = oldest
if latest:
@ -128,21 +165,56 @@ class SlackHistory:
if next_cursor:
kwargs["cursor"] = next_cursor
current_api_call_successful = False
result = None # Ensure result is defined
try:
result = self.client.conversations_history(**kwargs)
current_api_call_successful = True
except SlackApiError as e_history:
if e_history.response is not None and e_history.response.status_code == 429:
retry_after_str = e_history.response.headers.get('Retry-After')
wait_time = 60 # Default
if retry_after_str and retry_after_str.isdigit():
wait_time = int(retry_after_str)
logger.warning(
f"Rate limited by Slack on conversations.history for channel {channel_id}. "
f"Retrying after {wait_time} seconds. Cursor: {next_cursor}"
)
time.sleep(wait_time)
# current_api_call_successful remains False, loop will retry this page
else:
raise # Re-raise to outer handler for not_in_channel or other SlackApiErrors
if not current_api_call_successful:
continue # Retry the current page fetch due to handled rate limit
# Process result if successful
batch = result["messages"]
messages.extend(batch)
if result.get("has_more", False) and len(messages) < limit:
next_cursor = result["response_metadata"]["next_cursor"]
else:
break # Exit pagination loop
except SlackApiError as e: # Outer catch for not_in_channel or unhandled SlackApiErrors from inner try
if (e.response is not None and
hasattr(e.response, 'data') and
isinstance(e.response.data, dict) and
e.response.data.get('error') == 'not_in_channel'):
logger.warning(
f"Bot is not in channel '{channel_id}'. Cannot fetch history. "
"Please add the bot to this channel."
)
return []
# For other SlackApiErrors from inner block or this level
raise SlackApiError(f"Error retrieving history for channel {channel_id}: {e}", e.response)
except Exception as general_error: # Catch any other unexpected errors
logger.error(f"Unexpected error in get_conversation_history for channel {channel_id}: {general_error}")
# Re-raise the general error to allow higher-level handling or visibility
raise
return messages[:limit]
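A short usage sketch of the updated API, assuming the parameter names implied by the method bodies above (the token and date values are placeholders, and the date format accepted by convert_date_to_timestamp is an assumption):

history = SlackHistory(token="xoxb-your-token")
oldest = SlackHistory.convert_date_to_timestamp("2024-01-01")
for channel in history.get_all_channels(include_private=True):
    if not channel["is_member"]:
        continue  # history can only be fetched for channels the bot has joined
    messages = history.get_conversation_history(channel_id=channel["id"], limit=200, oldest=oldest)
    print(channel["name"], len(messages))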
@staticmethod
def convert_date_to_timestamp(date_str: str) -> Optional[int]:
@ -221,11 +293,30 @@ class SlackHistory:
if not self.client:
raise ValueError("Slack client not initialized. Call set_token() first.")
while True:
try:
# Proactive delay for users.info (Tier 4) - generally not needed unless called extremely rapidly.
# For now, we are only adding Retry-After as per plan.
# time.sleep(0.6) # Optional: ~100 req/min if ever needed.
result = self.client.users_info(user=user_id)
return result["user"] # Success, return and exit loop implicitly
except SlackApiError as e_user_info:
if e_user_info.response is not None and e_user_info.response.status_code == 429:
retry_after_str = e_user_info.response.headers.get('Retry-After')
wait_time = 30 # Default for Tier 4, can be adjusted
if retry_after_str and retry_after_str.isdigit():
wait_time = int(retry_after_str)
logger.warning(f"Rate limited by Slack on users.info for user {user_id}. Retrying after {wait_time} seconds.")
time.sleep(wait_time)
continue # Retry the API call
else:
# Not a 429 error, or no response object, re-raise
raise SlackApiError(f"Error retrieving user info for {user_id}: {e_user_info}", e_user_info.response)
except Exception as general_error: # Catch any other unexpected errors
logger.error(f"Unexpected error in get_user_info for user {user_id}: {general_error}")
raise # Re-raise unexpected errors
def format_message(self, msg: Dict[str, Any], include_user_info: bool = False) -> Dict[str, Any]:
"""
View file
@ -0,0 +1,154 @@
import unittest
from unittest.mock import patch, Mock, call
from datetime import datetime
# Adjust the import path based on the actual location if test_github_connector.py
# is not in the same directory as github_connector.py or if paths are set up differently.
# Assuming surfsense_backend/app/connectors/test_github_connector.py
from surfsense_backend.app.connectors.github_connector import GitHubConnector
from github3.exceptions import ForbiddenError # Import the specific exception
class TestGitHubConnector(unittest.TestCase):
@patch('surfsense_backend.app.connectors.github_connector.github_login')
def test_get_user_repositories_uses_type_all(self, mock_github_login):
# Mock the GitHub client object and its methods
mock_gh_instance = Mock()
mock_github_login.return_value = mock_gh_instance
# Mock the self.gh.me() call in __init__ to prevent an actual API call
mock_gh_instance.me.return_value = Mock() # Simple mock to pass initialization
# Prepare mock repository data
mock_repo1_data = Mock()
mock_repo1_data.id = 1
mock_repo1_data.name = "repo1"
mock_repo1_data.full_name = "user/repo1"
mock_repo1_data.private = False
mock_repo1_data.html_url = "http://example.com/user/repo1"
mock_repo1_data.description = "Test repo 1"
mock_repo1_data.updated_at = datetime(2023, 1, 1, 10, 30, 0) # Added time component
mock_repo2_data = Mock()
mock_repo2_data.id = 2
mock_repo2_data.name = "org-repo"
mock_repo2_data.full_name = "org/org-repo"
mock_repo2_data.private = True
mock_repo2_data.html_url = "http://example.com/org/org-repo"
mock_repo2_data.description = "Org repo"
mock_repo2_data.updated_at = datetime(2023, 1, 2, 12, 0, 0) # Added time component
# Configure the mock for gh.repositories() call
# This method is an iterator, so it should return an iterable (e.g., a list)
mock_gh_instance.repositories.return_value = [mock_repo1_data, mock_repo2_data]
connector = GitHubConnector(token="fake_token")
repositories = connector.get_user_repositories()
# Assert that gh.repositories was called correctly
mock_gh_instance.repositories.assert_called_once_with(type='all', sort='updated')
# Assert the structure and content of the returned data
expected_repositories = [
{
"id": 1, "name": "repo1", "full_name": "user/repo1", "private": False,
"url": "http://example.com/user/repo1", "description": "Test repo 1",
"last_updated": datetime(2023, 1, 1, 10, 30, 0)
},
{
"id": 2, "name": "org-repo", "full_name": "org/org-repo", "private": True,
"url": "http://example.com/org/org-repo", "description": "Org repo",
"last_updated": datetime(2023, 1, 2, 12, 0, 0)
}
]
self.assertEqual(repositories, expected_repositories)
self.assertEqual(len(repositories), 2)
@patch('surfsense_backend.app.connectors.github_connector.github_login')
def test_get_user_repositories_handles_empty_description_and_none_updated_at(self, mock_github_login):
# Mock the GitHub client object and its methods
mock_gh_instance = Mock()
mock_github_login.return_value = mock_gh_instance
mock_gh_instance.me.return_value = Mock()
mock_repo_data = Mock()
mock_repo_data.id = 1
mock_repo_data.name = "repo_no_desc"
mock_repo_data.full_name = "user/repo_no_desc"
mock_repo_data.private = False
mock_repo_data.html_url = "http://example.com/user/repo_no_desc"
mock_repo_data.description = None # Test None description
mock_repo_data.updated_at = None # Test None updated_at
mock_gh_instance.repositories.return_value = [mock_repo_data]
connector = GitHubConnector(token="fake_token")
repositories = connector.get_user_repositories()
mock_gh_instance.repositories.assert_called_once_with(type='all', sort='updated')
expected_repositories = [
{
"id": 1, "name": "repo_no_desc", "full_name": "user/repo_no_desc", "private": False,
"url": "http://example.com/user/repo_no_desc", "description": "", # Expect empty string
"last_updated": None # Expect None
}
]
self.assertEqual(repositories, expected_repositories)
@patch('surfsense_backend.app.connectors.github_connector.github_login')
def test_github_connector_initialization_failure_forbidden(self, mock_github_login):
# Test that __init__ raises ValueError on auth failure (ForbiddenError)
mock_gh_instance = Mock()
mock_github_login.return_value = mock_gh_instance
# Create a mock response object for the ForbiddenError
# The actual response structure might vary, but github3.py's ForbiddenError
# can be instantiated with just a response object that has a status_code.
mock_response = Mock()
mock_response.status_code = 403 # Typically Forbidden
# Setup the side_effect for self.gh.me()
mock_gh_instance.me.side_effect = ForbiddenError(mock_response)
with self.assertRaises(ValueError) as context:
GitHubConnector(token="invalid_token_forbidden")
self.assertIn("Invalid GitHub token or insufficient permissions.", str(context.exception))
@patch('surfsense_backend.app.connectors.github_connector.github_login')
def test_github_connector_initialization_failure_authentication_failed(self, mock_github_login):
# Test that __init__ raises ValueError on auth failure (AuthenticationFailed, which is a subclass of ForbiddenError)
# For github3.py, AuthenticationFailed is more specific for token issues.
from github3.exceptions import AuthenticationFailed
mock_gh_instance = Mock()
mock_github_login.return_value = mock_gh_instance
mock_response = Mock()
mock_response.status_code = 401 # Typically Unauthorized
mock_gh_instance.me.side_effect = AuthenticationFailed(mock_response)
with self.assertRaises(ValueError) as context:
GitHubConnector(token="invalid_token_authfailed")
self.assertIn("Invalid GitHub token or insufficient permissions.", str(context.exception))
@patch('surfsense_backend.app.connectors.github_connector.github_login')
def test_get_user_repositories_handles_api_exception(self, mock_github_login):
mock_gh_instance = Mock()
mock_github_login.return_value = mock_gh_instance
mock_gh_instance.me.return_value = Mock()
# Simulate an exception when calling repositories
mock_gh_instance.repositories.side_effect = Exception("API Error")
connector = GitHubConnector(token="fake_token")
# We expect it to log an error and return an empty list
with patch('surfsense_backend.app.connectors.github_connector.logger') as mock_logger:
repositories = connector.get_user_repositories()
self.assertEqual(repositories, [])
mock_logger.error.assert_called_once()
self.assertIn("Failed to fetch GitHub repositories: API Error", mock_logger.error.call_args[0][0])
if __name__ == '__main__':
unittest.main()
View file
@ -0,0 +1,420 @@
import unittest
import time # Imported to be available for patching target module
from unittest.mock import patch, Mock, call
from slack_sdk.errors import SlackApiError
# Since test_slack_history.py is in the same directory as slack_history.py
from .slack_history import SlackHistory
class TestSlackHistoryGetAllChannels(unittest.TestCase):
@patch('surfsense_backend.app.connectors.slack_history.logger')
@patch('surfsense_backend.app.connectors.slack_history.time.sleep')
@patch('slack_sdk.WebClient')
def test_get_all_channels_pagination_with_delay(self, MockWebClient, mock_sleep, mock_logger):
mock_client_instance = MockWebClient.return_value
# Mock API responses now include is_private and is_member
page1_response = {
"channels": [
{"name": "general", "id": "C1", "is_private": False, "is_member": True},
{"name": "dev", "id": "C0", "is_private": False, "is_member": True}
],
"response_metadata": {"next_cursor": "cursor123"}
}
page2_response = {
"channels": [{"name": "random", "id": "C2", "is_private": True, "is_member": True}],
"response_metadata": {"next_cursor": ""}
}
mock_client_instance.conversations_list.side_effect = [
page1_response,
page2_response
]
slack_history = SlackHistory(token="fake_token")
channels_list = slack_history.get_all_channels(include_private=True)
expected_channels_list = [
{"id": "C1", "name": "general", "is_private": False, "is_member": True},
{"id": "C0", "name": "dev", "is_private": False, "is_member": True},
{"id": "C2", "name": "random", "is_private": True, "is_member": True}
]
self.assertEqual(len(channels_list), 3)
self.assertListEqual(channels_list, expected_channels_list) # Assert list equality
expected_calls = [
call(types="public_channel,private_channel", cursor=None, limit=1000),
call(types="public_channel,private_channel", cursor="cursor123", limit=1000)
]
mock_client_instance.conversations_list.assert_has_calls(expected_calls)
self.assertEqual(mock_client_instance.conversations_list.call_count, 2)
mock_sleep.assert_called_once_with(3)
mock_logger.info.assert_called_once_with("Paginating for channels, waiting 3 seconds before next call. Cursor: cursor123")
@patch('surfsense_backend.app.connectors.slack_history.logger')
@patch('surfsense_backend.app.connectors.slack_history.time.sleep')
@patch('slack_sdk.WebClient')
def test_get_all_channels_rate_limit_with_retry_after(self, MockWebClient, mock_sleep, mock_logger):
mock_client_instance = MockWebClient.return_value
mock_error_response = Mock()
mock_error_response.status_code = 429
mock_error_response.headers = {'Retry-After': '5'}
successful_response = {
"channels": [{"name": "general", "id": "C1", "is_private": False, "is_member": True}],
"response_metadata": {"next_cursor": ""}
}
mock_client_instance.conversations_list.side_effect = [
SlackApiError(message="ratelimited", response=mock_error_response),
successful_response
]
slack_history = SlackHistory(token="fake_token")
channels_list = slack_history.get_all_channels(include_private=True)
expected_channels_list = [{"id": "C1", "name": "general", "is_private": False, "is_member": True}]
self.assertEqual(len(channels_list), 1)
self.assertListEqual(channels_list, expected_channels_list)
mock_sleep.assert_called_once_with(5)
mock_logger.warning.assert_called_once_with("Slack API rate limit hit while fetching channels. Waiting for 5 seconds. Cursor: None")
expected_calls = [
call(types="public_channel,private_channel", cursor=None, limit=1000),
call(types="public_channel,private_channel", cursor=None, limit=1000)
]
mock_client_instance.conversations_list.assert_has_calls(expected_calls)
self.assertEqual(mock_client_instance.conversations_list.call_count, 2)
@patch('surfsense_backend.app.connectors.slack_history.logger')
@patch('surfsense_backend.app.connectors.slack_history.time.sleep')
@patch('slack_sdk.WebClient')
def test_get_all_channels_rate_limit_no_retry_after_valid_header(self, MockWebClient, mock_sleep, mock_logger):
mock_client_instance = MockWebClient.return_value
mock_error_response = Mock()
mock_error_response.status_code = 429
mock_error_response.headers = {'Retry-After': 'invalid_value'}
successful_response = {
"channels": [{"name": "general", "id": "C1", "is_private": False, "is_member": True}],
"response_metadata": {"next_cursor": ""}
}
mock_client_instance.conversations_list.side_effect = [
SlackApiError(message="ratelimited", response=mock_error_response),
successful_response
]
slack_history = SlackHistory(token="fake_token")
channels_list = slack_history.get_all_channels(include_private=True)
expected_channels_list = [{"id": "C1", "name": "general", "is_private": False, "is_member": True}]
self.assertListEqual(channels_list, expected_channels_list)
mock_sleep.assert_called_once_with(60) # Default fallback
mock_logger.warning.assert_called_once_with("Slack API rate limit hit while fetching channels. Waiting for 60 seconds. Cursor: None")
self.assertEqual(mock_client_instance.conversations_list.call_count, 2)
@patch('surfsense_backend.app.connectors.slack_history.logger')
@patch('surfsense_backend.app.connectors.slack_history.time.sleep')
@patch('slack_sdk.WebClient')
def test_get_all_channels_rate_limit_no_retry_after_header(self, MockWebClient, mock_sleep, mock_logger):
mock_client_instance = MockWebClient.return_value
mock_error_response = Mock()
mock_error_response.status_code = 429
mock_error_response.headers = {}
successful_response = {
"channels": [{"name": "general", "id": "C1", "is_private": False, "is_member": True}],
"response_metadata": {"next_cursor": ""}
}
mock_client_instance.conversations_list.side_effect = [
SlackApiError(message="ratelimited", response=mock_error_response),
successful_response
]
slack_history = SlackHistory(token="fake_token")
channels_list = slack_history.get_all_channels(include_private=True)
expected_channels_list = [{"id": "C1", "name": "general", "is_private": False, "is_member": True}]
self.assertListEqual(channels_list, expected_channels_list)
mock_sleep.assert_called_once_with(60) # Default fallback
mock_logger.warning.assert_called_once_with("Slack API rate limit hit while fetching channels. Waiting for 60 seconds. Cursor: None")
self.assertEqual(mock_client_instance.conversations_list.call_count, 2)
@patch('surfsense_backend.app.connectors.slack_history.logger')
@patch('surfsense_backend.app.connectors.slack_history.time.sleep')
@patch('slack_sdk.WebClient')
def test_get_all_channels_other_slack_api_error(self, MockWebClient, mock_sleep, mock_logger):
mock_client_instance = MockWebClient.return_value
mock_error_response = Mock()
mock_error_response.status_code = 500
mock_error_response.headers = {}
mock_error_response.data = {"ok": False, "error": "internal_error"}
original_error = SlackApiError(message="server error", response=mock_error_response)
mock_client_instance.conversations_list.side_effect = original_error
slack_history = SlackHistory(token="fake_token")
with self.assertRaises(SlackApiError) as context:
slack_history.get_all_channels(include_private=True)
self.assertEqual(context.exception.response.status_code, 500)
self.assertIn("server error", str(context.exception))
mock_sleep.assert_not_called()
mock_logger.warning.assert_not_called() # Ensure no rate limit log
mock_client_instance.conversations_list.assert_called_once_with(
types="public_channel,private_channel", cursor=None, limit=1000
)
@patch('surfsense_backend.app.connectors.slack_history.logger')
@patch('surfsense_backend.app.connectors.slack_history.time.sleep')
@patch('slack_sdk.WebClient')
def test_get_all_channels_handles_missing_name_id_gracefully(self, MockWebClient, mock_sleep, mock_logger):
mock_client_instance = MockWebClient.return_value
response_with_malformed_data = {
"channels": [
{"id": "C1_missing_name", "is_private": False, "is_member": True},
{"name": "channel_missing_id", "is_private": False, "is_member": True},
{"name": "general", "id": "C2_valid", "is_private": False, "is_member": True}
],
"response_metadata": {"next_cursor": ""}
}
mock_client_instance.conversations_list.return_value = response_with_malformed_data
slack_history = SlackHistory(token="fake_token")
channels_list = slack_history.get_all_channels(include_private=True)
expected_channels_list = [
{"id": "C2_valid", "name": "general", "is_private": False, "is_member": True}
]
self.assertEqual(len(channels_list), 1)
self.assertListEqual(channels_list, expected_channels_list)
self.assertEqual(mock_logger.warning.call_count, 2)
mock_logger.warning.assert_any_call("Channel found with missing name or id. Data: {'id': 'C1_missing_name', 'is_private': False, 'is_member': True}")
mock_logger.warning.assert_any_call("Channel found with missing name or id. Data: {'name': 'channel_missing_id', 'is_private': False, 'is_member': True}")
mock_sleep.assert_not_called()
mock_client_instance.conversations_list.assert_called_once_with(
types="public_channel,private_channel", cursor=None, limit=1000
)
if __name__ == '__main__':
unittest.main()
class TestSlackHistoryGetConversationHistory(unittest.TestCase):
@patch('surfsense_backend.app.connectors.slack_history.logger')
@patch('surfsense_backend.app.connectors.slack_history.time.sleep')
@patch('slack_sdk.WebClient')
def test_proactive_delay_single_page(self, MockWebClient, mock_time_sleep, mock_logger):
mock_client_instance = MockWebClient.return_value
mock_client_instance.conversations_history.return_value = {
"messages": [{"text": "msg1"}],
"has_more": False
}
slack_history = SlackHistory(token="fake_token")
slack_history.get_conversation_history(channel_id="C123")
mock_time_sleep.assert_called_once_with(1.2) # Proactive delay
@patch('surfsense_backend.app.connectors.slack_history.logger')
@patch('surfsense_backend.app.connectors.slack_history.time.sleep')
@patch('slack_sdk.WebClient')
def test_proactive_delay_multiple_pages(self, MockWebClient, mock_time_sleep, mock_logger):
mock_client_instance = MockWebClient.return_value
mock_client_instance.conversations_history.side_effect = [
{
"messages": [{"text": "msg1"}],
"has_more": True,
"response_metadata": {"next_cursor": "cursor1"}
},
{
"messages": [{"text": "msg2"}],
"has_more": False
}
]
slack_history = SlackHistory(token="fake_token")
slack_history.get_conversation_history(channel_id="C123")
# Expected calls: 1.2 (page1), 1.2 (page2)
self.assertEqual(mock_time_sleep.call_count, 2)
mock_time_sleep.assert_has_calls([call(1.2), call(1.2)])
@patch('surfsense_backend.app.connectors.slack_history.logger')
@patch('surfsense_backend.app.connectors.slack_history.time.sleep')
@patch('slack_sdk.WebClient')
def test_retry_after_logic(self, MockWebClient, mock_time_sleep, mock_logger):
mock_client_instance = MockWebClient.return_value
mock_error_response = Mock()
mock_error_response.status_code = 429
mock_error_response.headers = {'Retry-After': '5'}
mock_client_instance.conversations_history.side_effect = [
SlackApiError(message="ratelimited", response=mock_error_response),
{"messages": [{"text": "msg1"}], "has_more": False}
]
slack_history = SlackHistory(token="fake_token")
messages = slack_history.get_conversation_history(channel_id="C123")
self.assertEqual(len(messages), 1)
self.assertEqual(messages[0]["text"], "msg1")
# Expected sleep calls: 1.2 (proactive for 1st attempt), 5 (rate limit), 1.2 (proactive for 2nd attempt)
mock_time_sleep.assert_has_calls([call(1.2), call(5), call(1.2)], any_order=False)
mock_logger.warning.assert_called_once() # Check that a warning was logged for rate limiting
@patch('surfsense_backend.app.connectors.slack_history.logger')
@patch('surfsense_backend.app.connectors.slack_history.time.sleep')
@patch('slack_sdk.WebClient')
def test_not_in_channel_error(self, MockWebClient, mock_time_sleep, mock_logger):
mock_client_instance = MockWebClient.return_value
mock_error_response = Mock()
mock_error_response.status_code = 403 # Typical for not_in_channel, but data matters more
mock_error_response.data = {'ok': False, 'error': 'not_in_channel'}
# This error is now raised by the inner try-except, then caught by the outer one
mock_client_instance.conversations_history.side_effect = SlackApiError(
message="not_in_channel error",
response=mock_error_response
)
slack_history = SlackHistory(token="fake_token")
messages = slack_history.get_conversation_history(channel_id="C123")
self.assertEqual(messages, [])
mock_logger.warning.assert_called_with(
"Bot is not in channel 'C123'. Cannot fetch history. Please add the bot to this channel."
)
mock_time_sleep.assert_called_once_with(1.2) # Proactive delay before the API call
@patch('surfsense_backend.app.connectors.slack_history.logger')
@patch('surfsense_backend.app.connectors.slack_history.time.sleep')
@patch('slack_sdk.WebClient')
def test_other_slack_api_error_propagates(self, MockWebClient, mock_time_sleep, mock_logger):
mock_client_instance = MockWebClient.return_value
mock_error_response = Mock()
mock_error_response.status_code = 500
mock_error_response.data = {'ok': False, 'error': 'internal_error'}
original_error = SlackApiError(message="server error", response=mock_error_response)
mock_client_instance.conversations_history.side_effect = original_error
slack_history = SlackHistory(token="fake_token")
with self.assertRaises(SlackApiError) as context:
slack_history.get_conversation_history(channel_id="C123")
self.assertIn("Error retrieving history for channel C123", str(context.exception))
self.assertIs(context.exception.response, mock_error_response)
mock_time_sleep.assert_called_once_with(1.2) # Proactive delay
@patch('surfsense_backend.app.connectors.slack_history.logger')
@patch('surfsense_backend.app.connectors.slack_history.time.sleep')
@patch('slack_sdk.WebClient')
def test_general_exception_propagates(self, MockWebClient, mock_time_sleep, mock_logger):
mock_client_instance = MockWebClient.return_value
original_error = Exception("Something broke")
mock_client_instance.conversations_history.side_effect = original_error
slack_history = SlackHistory(token="fake_token")
with self.assertRaises(Exception) as context: # Check for generic Exception
slack_history.get_conversation_history(channel_id="C123")
self.assertIs(context.exception, original_error) # Should re-raise the original error
mock_logger.error.assert_called_once_with("Unexpected error in get_conversation_history for channel C123: Something broke")
mock_time_sleep.assert_called_once_with(1.2) # Proactive delay
class TestSlackHistoryGetUserInfo(unittest.TestCase):
@patch('surfsense_backend.app.connectors.slack_history.logger')
@patch('surfsense_backend.app.connectors.slack_history.time.sleep')
@patch('slack_sdk.WebClient')
def test_retry_after_logic(self, MockWebClient, mock_time_sleep, mock_logger):
mock_client_instance = MockWebClient.return_value
mock_error_response = Mock()
mock_error_response.status_code = 429
mock_error_response.headers = {'Retry-After': '3'} # Using 3 seconds for test
successful_user_data = {"id": "U123", "name": "testuser"}
mock_client_instance.users_info.side_effect = [
SlackApiError(message="ratelimited_userinfo", response=mock_error_response),
{"user": successful_user_data}
]
slack_history = SlackHistory(token="fake_token")
user_info = slack_history.get_user_info(user_id="U123")
self.assertEqual(user_info, successful_user_data)
# Assert that time.sleep was called for the rate limit
mock_time_sleep.assert_called_once_with(3)
mock_logger.warning.assert_called_once_with(
"Rate limited by Slack on users.info for user U123. Retrying after 3 seconds."
)
# Assert users_info was called twice (original + retry)
self.assertEqual(mock_client_instance.users_info.call_count, 2)
mock_client_instance.users_info.assert_has_calls([call(user="U123"), call(user="U123")])
@patch('surfsense_backend.app.connectors.slack_history.logger')
@patch('surfsense_backend.app.connectors.slack_history.time.sleep') # time.sleep might be called by other logic, but not expected here
@patch('slack_sdk.WebClient')
def test_other_slack_api_error_propagates(self, MockWebClient, mock_time_sleep, mock_logger):
mock_client_instance = MockWebClient.return_value
mock_error_response = Mock()
mock_error_response.status_code = 500 # Some other error
mock_error_response.data = {'ok': False, 'error': 'internal_server_error'}
original_error = SlackApiError(message="internal server error", response=mock_error_response)
mock_client_instance.users_info.side_effect = original_error
slack_history = SlackHistory(token="fake_token")
with self.assertRaises(SlackApiError) as context:
slack_history.get_user_info(user_id="U123")
# Check that the raised error is the one we expect
self.assertIn("Error retrieving user info for U123", str(context.exception))
self.assertIs(context.exception.response, mock_error_response)
mock_time_sleep.assert_not_called() # No rate limit sleep
@patch('surfsense_backend.app.connectors.slack_history.logger')
@patch('surfsense_backend.app.connectors.slack_history.time.sleep')
@patch('slack_sdk.WebClient')
def test_general_exception_propagates(self, MockWebClient, mock_time_sleep, mock_logger):
mock_client_instance = MockWebClient.return_value
original_error = Exception("A very generic problem")
mock_client_instance.users_info.side_effect = original_error
slack_history = SlackHistory(token="fake_token")
with self.assertRaises(Exception) as context:
slack_history.get_user_info(user_id="U123")
self.assertIs(context.exception, original_error) # Check it's the exact same exception
mock_logger.error.assert_called_once_with(
"Unexpected error in get_user_info for user U123: A very generic problem"
)
mock_time_sleep.assert_not_called() # No rate limit sleep
View file
@ -3,11 +3,7 @@ from datetime import datetime, timezone
from enum import Enum
from fastapi import Depends
from pgvector.sqlalchemy import Vector
from sqlalchemy import (
ARRAY,
@ -30,6 +26,18 @@ from app.config import config
from app.retriver.chunks_hybrid_search import ChucksHybridSearchRetriever
from app.retriver.documents_hybrid_search import DocumentHybridSearchRetriever
if config.AUTH_TYPE == "GOOGLE":
from fastapi_users.db import (
SQLAlchemyBaseOAuthAccountTableUUID,
SQLAlchemyBaseUserTableUUID,
SQLAlchemyUserDatabase,
)
else:
from fastapi_users.db import (
SQLAlchemyBaseUserTableUUID,
SQLAlchemyUserDatabase,
)
DATABASE_URL = config.DATABASE_URL
@ -44,8 +52,9 @@ class DocumentType(str, Enum):
LINEAR_CONNECTOR = "LINEAR_CONNECTOR"
class SearchSourceConnectorType(str, Enum):
SERPER_API = "SERPER_API" # NOT IMPLEMENTED YET : DON'T REMEMBER WHY : MOST PROBABLY BECAUSE WE NEED TO CRAWL THE RESULTS RETURNED BY IT
TAVILY_API = "TAVILY_API"
LINKUP_API = "LINKUP_API"
SLACK_CONNECTOR = "SLACK_CONNECTOR"
NOTION_CONNECTOR = "NOTION_CONNECTOR"
GITHUB_CONNECTOR = "GITHUB_CONNECTOR"
@ -75,7 +84,7 @@ class Chat(BaseModel, TimestampMixin):
__tablename__ = "chats"
type = Column(SQLAlchemyEnum(ChatType), nullable=False)
title = Column(String, nullable=False, index=True)
initial_connectors = Column(ARRAY(String), nullable=True)
messages = Column(JSON, nullable=False)
@ -85,11 +94,12 @@ class Chat(BaseModel, TimestampMixin):
class Document(BaseModel, TimestampMixin):
__tablename__ = "documents"
title = Column(String, nullable=False, index=True)
document_type = Column(SQLAlchemyEnum(DocumentType), nullable=False)
document_metadata = Column(JSON, nullable=True)
content = Column(Text, nullable=False)
content_hash = Column(String, nullable=False, index=True, unique=True)
embedding = Column(Vector(config.embedding_model_instance.dimension))
search_space_id = Column(Integer, ForeignKey("searchspaces.id", ondelete='CASCADE'), nullable=False)
@ -108,9 +118,8 @@ class Chunk(BaseModel, TimestampMixin):
class Podcast(BaseModel, TimestampMixin):
__tablename__ = "podcasts"
title = Column(String, nullable=False, index=True)
podcast_transcript = Column(JSON, nullable=False, default={})
file_location = Column(String(500), nullable=False, default="")
search_space_id = Column(Integer, ForeignKey("searchspaces.id", ondelete='CASCADE'), nullable=False)
@ -141,17 +150,22 @@ class SearchSourceConnector(BaseModel, TimestampMixin):
user_id = Column(UUID(as_uuid=True), ForeignKey("user.id", ondelete='CASCADE'), nullable=False)
user = relationship("User", back_populates="search_source_connectors")
if config.AUTH_TYPE == "GOOGLE":
class OAuthAccount(SQLAlchemyBaseOAuthAccountTableUUID, Base):
pass
class User(SQLAlchemyBaseUserTableUUID, Base):
oauth_accounts: Mapped[list[OAuthAccount]] = relationship(
"OAuthAccount", lazy="joined"
)
search_spaces = relationship("SearchSpace", back_populates="user")
search_source_connectors = relationship("SearchSourceConnector", back_populates="user")
else:
class User(SQLAlchemyBaseUserTableUUID, Base):
search_spaces = relationship("SearchSpace", back_populates="user")
search_source_connectors = relationship("SearchSourceConnector", back_populates="user")
engine = create_async_engine(DATABASE_URL)
@ -180,8 +194,12 @@ async def get_async_session() -> AsyncGenerator[AsyncSession, None]:
yield session
if config.AUTH_TYPE == "GOOGLE":
async def get_user_db(session: AsyncSession = Depends(get_async_session)):
yield SQLAlchemyUserDatabase(session, User, OAuthAccount)
else:
async def get_user_db(session: AsyncSession = Depends(get_async_session)):
yield SQLAlchemyUserDatabase(session, User)
async def get_chucks_hybrid_search_retriever(session: AsyncSession = Depends(get_async_session)):
return ChucksHybridSearchRetriever(session)
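The new content_hash column is unique and indexed; a minimal sketch of how a caller might populate it so re-ingested content is deduplicated (assuming SHA-256 over the raw text; the helper name is hypothetical, not part of the commit):

import hashlib

def compute_content_hash(content: str) -> str:
    # Stable digest of the document body, suitable for the unique content_hash column.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

# Hypothetical usage when creating a Document:
# doc = Document(title=title, content=content, content_hash=compute_content_hash(content), ...)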
View file
@ -113,8 +113,6 @@ class DocumentHybridSearchRetriever:
search_space_id: Optional search space ID to filter results
document_type: Optional document type to filter results (e.g., "FILE", "CRAWLED_URL")
"""
from sqlalchemy import select, func, text
from sqlalchemy.orm import joinedload
@ -224,10 +222,22 @@ class DocumentHybridSearchRetriever:
# Convert to serializable dictionaries
serialized_results = []
for document, score in documents_with_scores:
# Fetch associated chunks for this document
from sqlalchemy import select
from app.db import Chunk
chunks_query = select(Chunk).where(Chunk.document_id == document.id).order_by(Chunk.id)
chunks_result = await self.db_session.execute(chunks_query)
chunks = chunks_result.scalars().all()
# Concatenate chunks content
concatenated_chunks_content = " ".join([chunk.content for chunk in chunks]) if chunks else document.content
serialized_results.append({ serialized_results.append({
"document_id": document.id, "document_id": document.id,
"title": document.title, "title": document.title,
"content": document.content, "content": document.content,
"chunks_content": concatenated_chunks_content,
"document_type": document.document_type.value if hasattr(document, 'document_type') else None, "document_type": document.document_type.value if hasattr(document, 'document_type') else None,
"metadata": document.document_metadata, "metadata": document.document_metadata,
"score": float(score), # Ensure score is a Python float "score": float(score), # Ensure score is a Python float

@@ -10,6 +10,8 @@ from fastapi.responses import StreamingResponse
from sqlalchemy.exc import IntegrityError, OperationalError
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.future import select
from langchain.schema import HumanMessage, AIMessage

router = APIRouter()

@@ -20,15 +22,17 @@ async def handle_chat_data(
    user: User = Depends(current_active_user)
):
    messages = request.messages
    if messages[-1]['role'] != "user":
        raise HTTPException(
            status_code=400, detail="Last message must be a user message")

    user_query = messages[-1]['content']
    search_space_id = request.data.get('search_space_id')
    research_mode: str = request.data.get('research_mode')
    selected_connectors: List[str] = request.data.get('selected_connectors')
    search_mode_str = request.data.get('search_mode', "CHUNKS")

    # Convert search_space_id to integer if it's a string
    if search_space_id and isinstance(search_space_id, str):
        try:

@@ -44,13 +48,30 @@ async def handle_chat_data(
        raise HTTPException(
            status_code=403, detail="You don't have access to this search space")

    langchain_chat_history = []
    for message in messages[:-1]:
        if message['role'] == "user":
            langchain_chat_history.append(HumanMessage(content=message['content']))
        elif message['role'] == "assistant":
            # Last annotation type will always be "ANSWER" here
            answer_annotation = message['annotations'][-1]
            answer_text = ""
            if answer_annotation['type'] == "ANSWER":
                answer_text = answer_annotation['content']
                # If content is a list, join it into a single string
                if isinstance(answer_text, list):
                    answer_text = "\n".join(answer_text)
            langchain_chat_history.append(AIMessage(content=answer_text))

    response = StreamingResponse(stream_connector_search_results(
        user_query,
        user.id,
        search_space_id,  # Already converted to int in lines 32-37
        session,
        research_mode,
        selected_connectors,
        langchain_chat_history,
        search_mode_str
    ))
    response.headers['x-vercel-ai-data-stream'] = 'v1'
    return response
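
Because AISDKChatRequest.messages is now a list of plain dicts (see the schema change later in this diff), the handler effectively expects a Vercel AI SDK style payload. A sketch of such a request body; only the keys read by the code above are grounded in this diff, the values are illustrative:

# Illustrative request body for the chat endpoint (values made up).
example_payload = {
    "messages": [
        {"role": "user", "content": "What did we ship last week?"},
        {
            "role": "assistant",
            "annotations": [
                {"type": "ANSWER", "content": ["We shipped the new connectors."]},
            ],
        },
        {"role": "user", "content": "Summarize the Slack discussion about it."},
    ],
    "data": {
        "search_space_id": "1",
        "research_mode": "GENERAL",          # placeholder value
        "selected_connectors": ["SLACK_CONNECTOR"],
        "search_mode": "CHUNKS",
    },
}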

@@ -1,3 +1,4 @@
from litellm import atranscription
from fastapi import APIRouter, Depends, BackgroundTasks, UploadFile, Form, HTTPException
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.future import select

@@ -6,7 +7,8 @@ from app.db import get_async_session, User, SearchSpace, Document, DocumentType
from app.schemas import DocumentsCreate, DocumentUpdate, DocumentRead
from app.users import current_active_user
from app.utils.check_ownership import check_ownership
from app.tasks.background_tasks import add_received_markdown_file_document, add_extension_received_document, add_received_file_document_using_unstructured, add_crawled_url_document, add_youtube_video_document, add_received_file_document_using_llamacloud
from app.config import config as app_config

# Force asyncio to use standard event loop before unstructured imports
import asyncio
try:

@@ -15,12 +17,11 @@ except RuntimeError:
    pass

import os
os.environ["UNSTRUCTURED_HAS_PATCHED_LOOP"] = "1"

router = APIRouter()

@router.post("/documents/")
async def create_documents(
    request: DocumentsCreate,

@@ -70,6 +71,7 @@ async def create_documents(
            detail=f"Failed to process documents: {str(e)}"
        )

@router.post("/documents/fileupload")
async def create_documents(
    files: list[UploadFile],

@@ -100,7 +102,6 @@ async def create_documents(
            with open(temp_path, "wb") as f:
                f.write(content)

            fastapi_background_tasks.add_task(
                process_file_in_background_with_new_session,
                temp_path,

@@ -132,40 +133,136 @@ async def process_file_in_background(
    session: AsyncSession
):
    try:
        # Check if the file is a markdown file
        if filename.lower().endswith(('.md', '.markdown')):
            # For markdown files, read the content directly
            with open(file_path, 'r', encoding='utf-8') as f:
                markdown_content = f.read()

            # Clean up the temp file
            import os
            try:
                os.unlink(file_path)
            except:
                pass

            # Process markdown directly through specialized function
            await add_received_markdown_file_document(
                session,
                filename,
                markdown_content,
                search_space_id
            )
        # Check if the file is an audio file
        elif filename.lower().endswith(('.mp3', '.mp4', '.mpeg', '.mpga', '.m4a', '.wav', '.webm')):
            # Open the audio file for transcription
            with open(file_path, "rb") as audio_file:
                # Use LiteLLM for audio transcription
                if app_config.STT_SERVICE_API_BASE:
                    transcription_response = await atranscription(
                        model=app_config.STT_SERVICE,
                        file=audio_file,
                        api_base=app_config.STT_SERVICE_API_BASE
                    )
                else:
                    transcription_response = await atranscription(
                        model=app_config.STT_SERVICE,
                        file=audio_file
                    )

                # Extract the transcribed text
                transcribed_text = transcription_response.get("text", "")

                # Add metadata about the transcription
                transcribed_text = f"# Transcription of {filename}\n\n{transcribed_text}"

            # Clean up the temp file
            import os
            try:
                os.unlink(file_path)
            except:
                pass

            # Process transcription as markdown document
            await add_received_markdown_file_document(
                session,
                filename,
                transcribed_text,
                search_space_id
            )
        else:
            if app_config.ETL_SERVICE == "UNSTRUCTURED":
                from langchain_unstructured import UnstructuredLoader

                # Process the file
                loader = UnstructuredLoader(
                    file_path,
                    mode="elements",
                    post_processors=[],
                    languages=["eng"],
                    include_orig_elements=False,
                    include_metadata=False,
                    strategy="auto",
                )

                docs = await loader.aload()

                # Clean up the temp file
                import os
                try:
                    os.unlink(file_path)
                except:
                    pass

                # Pass the documents to the existing background task
                await add_received_file_document_using_unstructured(
                    session,
                    filename,
                    docs,
                    search_space_id
                )
            elif app_config.ETL_SERVICE == "LLAMACLOUD":
                from llama_cloud_services import LlamaParse
                from llama_cloud_services.parse.utils import ResultType

                # Create LlamaParse parser instance
                parser = LlamaParse(
                    api_key=app_config.LLAMA_CLOUD_API_KEY,
                    num_workers=1,  # Use single worker for file processing
                    verbose=True,
                    language="en",
                    result_type=ResultType.MD
                )

                # Parse the file asynchronously
                result = await parser.aparse(file_path)

                # Clean up the temp file
                import os
                try:
                    os.unlink(file_path)
                except:
                    pass

                # Get markdown documents from the result
                markdown_documents = await result.aget_markdown_documents(split_by_page=False)

                for doc in markdown_documents:
                    # Extract text content from the markdown documents
                    markdown_content = doc.text

                    # Process the documents using our LlamaCloud background task
                    await add_received_file_document_using_llamacloud(
                        session,
                        filename,
                        llamacloud_markdown_document=markdown_content,
                        search_space_id=search_space_id
                    )
    except Exception as e:
        import logging
        logging.error(f"Error processing file in background: {str(e)}")
@router.get("/documents/", response_model=List[DocumentRead]) @router.get("/documents/", response_model=List[DocumentRead])
async def read_documents( async def read_documents(
skip: int = 0, skip: int = 0,
@ -175,7 +272,8 @@ async def read_documents(
user: User = Depends(current_active_user) user: User = Depends(current_active_user)
): ):
try: try:
query = select(Document).join(SearchSpace).filter(SearchSpace.user_id == user.id) query = select(Document).join(SearchSpace).filter(
SearchSpace.user_id == user.id)
# Filter by search_space_id if provided # Filter by search_space_id if provided
if search_space_id is not None: if search_space_id is not None:
@ -206,6 +304,7 @@ async def read_documents(
detail=f"Failed to fetch documents: {str(e)}" detail=f"Failed to fetch documents: {str(e)}"
) )
@router.get("/documents/{document_id}", response_model=DocumentRead) @router.get("/documents/{document_id}", response_model=DocumentRead)
async def read_document( async def read_document(
document_id: int, document_id: int,
@ -242,6 +341,7 @@ async def read_document(
detail=f"Failed to fetch document: {str(e)}" detail=f"Failed to fetch document: {str(e)}"
) )
@router.put("/documents/{document_id}", response_model=DocumentRead) @router.put("/documents/{document_id}", response_model=DocumentRead)
async def update_document( async def update_document(
document_id: int, document_id: int,
@ -289,6 +389,7 @@ async def update_document(
detail=f"Failed to update document: {str(e)}" detail=f"Failed to update document: {str(e)}"
) )
@router.delete("/documents/{document_id}", response_model=dict) @router.delete("/documents/{document_id}", response_model=dict)
async def delete_document( async def delete_document(
document_id: int, document_id: int,
@ -337,6 +438,7 @@ async def process_extension_document_with_new_session(
import logging import logging
logging.error(f"Error processing extension document: {str(e)}") logging.error(f"Error processing extension document: {str(e)}")
async def process_crawled_url_with_new_session( async def process_crawled_url_with_new_session(
url: str, url: str,
search_space_id: int search_space_id: int
@ -351,6 +453,7 @@ async def process_crawled_url_with_new_session(
import logging import logging
logging.error(f"Error processing crawled URL: {str(e)}") logging.error(f"Error processing crawled URL: {str(e)}")
async def process_file_in_background_with_new_session( async def process_file_in_background_with_new_session(
file_path: str, file_path: str,
filename: str, filename: str,
@ -362,6 +465,7 @@ async def process_file_in_background_with_new_session(
async with async_session_maker() as session: async with async_session_maker() as session:
await process_file_in_background(file_path, filename, search_space_id, session) await process_file_in_background(file_path, filename, search_space_id, session)
async def process_youtube_video_with_new_session( async def process_youtube_video_with_new_session(
url: str, url: str,
search_space_id: int search_space_id: int
@ -376,3 +480,4 @@ async def process_youtube_video_with_new_session(
import logging import logging
logging.error(f"Error processing YouTube video: {str(e)}") logging.error(f"Error processing YouTube video: {str(e)}")

@@ -1,12 +1,16 @@
from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.future import select
from sqlalchemy.exc import IntegrityError, SQLAlchemyError
from typing import List
from app.db import get_async_session, User, SearchSpace, Podcast, Chat
from app.schemas import PodcastCreate, PodcastUpdate, PodcastRead, PodcastGenerateRequest
from app.users import current_active_user
from app.utils.check_ownership import check_ownership
from app.tasks.podcast_tasks import generate_chat_podcast
from fastapi.responses import StreamingResponse
import os
from pathlib import Path

router = APIRouter()

@@ -120,3 +124,120 @@ async def delete_podcast(
    except SQLAlchemyError:
        await session.rollback()
        raise HTTPException(status_code=500, detail="Database error occurred while deleting podcast")

async def generate_chat_podcast_with_new_session(
    chat_id: int,
    search_space_id: int,
    podcast_title: str = "SurfSense Podcast"
):
    """Create a new session and process chat podcast generation."""
    from app.db import async_session_maker
    async with async_session_maker() as session:
        try:
            await generate_chat_podcast(session, chat_id, search_space_id, podcast_title)
        except Exception as e:
            import logging
            logging.error(f"Error generating podcast from chat: {str(e)}")

@router.post("/podcasts/generate/")
async def generate_podcast(
    request: PodcastGenerateRequest,
    session: AsyncSession = Depends(get_async_session),
    user: User = Depends(current_active_user),
    fastapi_background_tasks: BackgroundTasks = BackgroundTasks()
):
    try:
        # Check if the user owns the search space
        await check_ownership(session, SearchSpace, request.search_space_id, user)

        if request.type == "CHAT":
            # Verify that all chat IDs belong to this user and search space
            query = select(Chat).filter(
                Chat.id.in_(request.ids),
                Chat.search_space_id == request.search_space_id
            ).join(SearchSpace).filter(SearchSpace.user_id == user.id)

            result = await session.execute(query)
            valid_chats = result.scalars().all()
            valid_chat_ids = [chat.id for chat in valid_chats]

            # If any requested ID is not in valid IDs, raise error immediately
            if len(valid_chat_ids) != len(request.ids):
                raise HTTPException(
                    status_code=403,
                    detail="One or more chat IDs do not belong to this user or search space"
                )

            # Queue one background task per valid chat ID
            for chat_id in valid_chat_ids:
                fastapi_background_tasks.add_task(
                    generate_chat_podcast_with_new_session,
                    chat_id,
                    request.search_space_id,
                    request.podcast_title
                )

        return {
            "message": "Podcast generation started",
        }
    except HTTPException as he:
        raise he
    except IntegrityError as e:
        await session.rollback()
        raise HTTPException(status_code=400, detail="Podcast generation failed due to constraint violation")
    except SQLAlchemyError as e:
        await session.rollback()
        raise HTTPException(status_code=500, detail="Database error occurred while generating podcast")
    except Exception as e:
        await session.rollback()
        raise HTTPException(status_code=500, detail=f"An unexpected error occurred: {str(e)}")

@router.get("/podcasts/{podcast_id}/stream")
async def stream_podcast(
    podcast_id: int,
    session: AsyncSession = Depends(get_async_session),
    user: User = Depends(current_active_user)
):
    """Stream a podcast audio file."""
    try:
        # Get the podcast and check if user has access
        result = await session.execute(
            select(Podcast)
            .join(SearchSpace)
            .filter(Podcast.id == podcast_id, SearchSpace.user_id == user.id)
        )
        podcast = result.scalars().first()

        if not podcast:
            raise HTTPException(
                status_code=404,
                detail="Podcast not found or you don't have permission to access it"
            )

        # Get the file path
        file_path = podcast.file_location

        # Check if the file exists
        if not os.path.isfile(file_path):
            raise HTTPException(status_code=404, detail="Podcast audio file not found")

        # Define a generator function to stream the file
        def iterfile():
            with open(file_path, mode="rb") as file_like:
                yield from file_like

        # Return a streaming response with appropriate headers
        return StreamingResponse(
            iterfile(),
            media_type="audio/mpeg",
            headers={
                "Accept-Ranges": "bytes",
                "Content-Disposition": f"inline; filename={Path(file_path).name}"
            }
        )
    except HTTPException as he:
        raise he
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Error streaming podcast: {str(e)}")

@@ -21,7 +21,7 @@ from app.utils.check_ownership import check_ownership
from pydantic import BaseModel, Field, ValidationError
from app.tasks.connectors_indexing_tasks import index_slack_messages, index_notion_pages, index_github_repos, index_linear_issues
from app.connectors.github_connector import GitHubConnector
from datetime import datetime, timedelta
import logging

# Set up logging

@@ -10,7 +10,7 @@ from .documents import (
    DocumentRead,
)
from .chunks import ChunkBase, ChunkCreate, ChunkUpdate, ChunkRead
from .podcasts import PodcastBase, PodcastCreate, PodcastUpdate, PodcastRead, PodcastGenerateRequest
from .chats import ChatBase, ChatCreate, ChatUpdate, ChatRead, AISDKChatRequest
from .search_source_connector import SearchSourceConnectorBase, SearchSourceConnectorCreate, SearchSourceConnectorUpdate, SearchSourceConnectorRead

@@ -39,6 +39,7 @@ __all__ = [
    "PodcastCreate",
    "PodcastUpdate",
    "PodcastRead",
    "PodcastGenerateRequest",
    "ChatBase",
    "ChatCreate",
    "ChatUpdate",

@@ -1,8 +1,10 @@
from datetime import datetime
from pydantic import BaseModel, ConfigDict

class TimestampModel(BaseModel):
    created_at: datetime

    model_config = ConfigDict(from_attributes=True)

class IDModel(BaseModel):
    id: int

    model_config = ConfigDict(from_attributes=True)
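
model_config = ConfigDict(from_attributes=True) is the Pydantic v2 replacement for the old `class Config: from_attributes = True` pattern swapped out across these schemas; it lets a read model be built straight from an ORM instance. A small usage sketch (the ORM variable name is illustrative):

# Pydantic v2 usage enabled by from_attributes=True.
chunk_schema = ChunkRead.model_validate(chunk_orm_instance)  # chunk_orm_instance: a Chunk row
payload = chunk_schema.model_dump()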

@@ -1,8 +1,10 @@
from typing import Any, Dict, List, Optional
from app.db import ChatType
from pydantic import BaseModel, ConfigDict

from .base import IDModel, TimestampModel

class ChatBase(BaseModel):
    type: ChatType

@@ -25,14 +27,14 @@ class ToolInvocation(BaseModel):
    result: dict

# class ClientMessage(BaseModel):
#     role: str
#     content: str
#     experimental_attachments: Optional[List[ClientAttachment]] = None
#     toolInvocations: Optional[List[ToolInvocation]] = None

class AISDKChatRequest(BaseModel):
    messages: List[Any]
    data: Optional[Dict[str, Any]] = None

class ChatCreate(ChatBase):

@@ -42,5 +44,4 @@ class ChatUpdate(ChatBase):
    pass

class ChatRead(ChatBase, IDModel, TimestampModel):
    model_config = ConfigDict(from_attributes=True)

@@ -1,4 +1,4 @@
from pydantic import BaseModel, ConfigDict
from .base import IDModel, TimestampModel

class ChunkBase(BaseModel):

@@ -12,5 +12,4 @@ class ChunkUpdate(ChunkBase):
    pass

class ChunkRead(ChunkBase, IDModel, TimestampModel):
    model_config = ConfigDict(from_attributes=True)

@@ -1,7 +1,5 @@
from typing import List, Any
from pydantic import BaseModel, ConfigDict
from app.db import DocumentType
from datetime import datetime

@@ -37,6 +35,5 @@ class DocumentRead(BaseModel):
    created_at: datetime
    search_space_id: int

    model_config = ConfigDict(from_attributes=True)

@@ -1,10 +1,10 @@
from pydantic import BaseModel, ConfigDict
from typing import Any, List, Literal
from .base import IDModel, TimestampModel

class PodcastBase(BaseModel):
    title: str
    podcast_transcript: List[Any]
    file_location: str = ""
    search_space_id: int

@@ -15,5 +15,10 @@ class PodcastUpdate(PodcastBase):
    pass

class PodcastRead(PodcastBase, IDModel, TimestampModel):
    model_config = ConfigDict(from_attributes=True)

class PodcastGenerateRequest(BaseModel):
    type: Literal["DOCUMENT", "CHAT"]
    ids: List[int]
    search_space_id: int
    podcast_title: str = "SurfSense Podcast"

@@ -1,7 +1,7 @@
from datetime import datetime
import uuid
from typing import Dict, Any, Optional
from pydantic import BaseModel, field_validator, ConfigDict
from .base import IDModel, TimestampModel
from app.db import SearchSourceConnectorType

@@ -37,6 +37,16 @@ class SearchSourceConnectorBase(BaseModel):
            if not config.get("TAVILY_API_KEY"):
                raise ValueError("TAVILY_API_KEY cannot be empty")

        elif connector_type == SearchSourceConnectorType.LINKUP_API:
            # For LINKUP_API, only allow LINKUP_API_KEY
            allowed_keys = ["LINKUP_API_KEY"]
            if set(config.keys()) != set(allowed_keys):
                raise ValueError(f"For LINKUP_API connector type, config must only contain these keys: {allowed_keys}")

            # Ensure the API key is not empty
            if not config.get("LINKUP_API_KEY"):
                raise ValueError("LINKUP_API_KEY cannot be empty")

        elif connector_type == SearchSourceConnectorType.SLACK_CONNECTOR:
            # For SLACK_CONNECTOR, only allow SLACK_BOT_TOKEN
            allowed_keys = ["SLACK_BOT_TOKEN"]

@@ -96,5 +106,4 @@ class SearchSourceConnectorUpdate(BaseModel):
class SearchSourceConnectorRead(SearchSourceConnectorBase, IDModel, TimestampModel):
    user_id: uuid.UUID

    model_config = ConfigDict(from_attributes=True)
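
Under the validator above, a LINKUP_API connector is accepted only when its config contains exactly the LINKUP_API_KEY key with a non-empty value. A minimal config that would pass that branch (the key value is a placeholder):

# Illustrative config dict satisfying the LINKUP_API validation branch.
linkup_config = {"LINKUP_API_KEY": "lk-placeholder"}
assert set(linkup_config.keys()) == {"LINKUP_API_KEY"} and linkup_config["LINKUP_API_KEY"]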

@@ -1,7 +1,7 @@
from datetime import datetime
import uuid
from typing import Optional
from pydantic import BaseModel, ConfigDict
from .base import IDModel, TimestampModel

class SearchSpaceBase(BaseModel):

@@ -19,5 +19,4 @@ class SearchSpaceRead(SearchSpaceBase, IDModel, TimestampModel):
    created_at: datetime
    user_id: uuid.UUID

    model_config = ConfigDict(from_attributes=True)

@@ -1,27 +1,29 @@
from typing import Optional, List
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.exc import SQLAlchemyError
from sqlalchemy.future import select
from app.db import Document, DocumentType, Chunk
from app.schemas import ExtensionDocumentContent
from app.config import config
from app.prompts import SUMMARY_PROMPT_TEMPLATE
from datetime import datetime
from app.utils.document_converters import convert_document_to_markdown, generate_content_hash
from langchain_core.documents import Document as LangChainDocument
from langchain_community.document_loaders import FireCrawlLoader, AsyncChromiumLoader
from langchain_community.document_transformers import MarkdownifyTransformer
import validators
from youtube_transcript_api import YouTubeTranscriptApi
from urllib.parse import urlparse, parse_qs
import aiohttp
import logging

md = MarkdownifyTransformer()

async def add_crawled_url_document(
    session: AsyncSession, url: str, search_space_id: int
) -> Optional[Document]:
    try:
        if not validators.url(url):
            raise ValueError(f"Url {url} is not a valid URL address")

@@ -33,7 +35,7 @@ async def add_crawled_url_document(
                params={
                    "formats": ["markdown"],
                    "excludeTags": ["a"],
                },
            )
        else:
            crawl_loader = AsyncChromiumLoader(urls=[url], headless=True)

@@ -43,20 +45,21 @@ async def add_crawled_url_document(
        if type(crawl_loader) == FireCrawlLoader:
            content_in_markdown = url_crawled[0].page_content
        elif type(crawl_loader) == AsyncChromiumLoader:
            content_in_markdown = md.transform_documents(url_crawled)[0].page_content

        # Format document metadata in a more maintainable way
        metadata_sections = [
            (
                "METADATA",
                [
                    f"{key.upper()}: {value}"
                    for key, value in url_crawled[0].metadata.items()
                ],
            ),
            (
                "CONTENT",
                ["FORMAT: markdown", "TEXT_START", content_in_markdown, "TEXT_END"],
            ),
        ]

        # Build the document string more efficiently

@@ -69,31 +72,48 @@ async def add_crawled_url_document(
            document_parts.append(f"</{section_title}>")
        document_parts.append("</DOCUMENT>")

        combined_document_string = "\n".join(document_parts)
        content_hash = generate_content_hash(combined_document_string)

        # Check if document with this content hash already exists
        existing_doc_result = await session.execute(
            select(Document).where(Document.content_hash == content_hash)
        )
        existing_document = existing_doc_result.scalars().first()

        if existing_document:
            logging.info(f"Document with content hash {content_hash} already exists. Skipping processing.")
            return existing_document

        # Generate summary
        summary_chain = SUMMARY_PROMPT_TEMPLATE | config.long_context_llm_instance
        summary_result = await summary_chain.ainvoke(
            {"document": combined_document_string}
        )
        summary_content = summary_result.content
        summary_embedding = config.embedding_model_instance.embed(summary_content)

        # Process chunks
        chunks = [
            Chunk(
                content=chunk.text,
                embedding=config.embedding_model_instance.embed(chunk.text),
            )
            for chunk in config.chunker_instance.chunk(content_in_markdown)
        ]

        # Create and store document
        document = Document(
            search_space_id=search_space_id,
            title=url_crawled[0].metadata["title"]
            if type(crawl_loader) == FireCrawlLoader
            else url_crawled[0].metadata["source"],
            document_type=DocumentType.CRAWLED_URL,
            document_metadata=url_crawled[0].metadata,
            content=summary_content,
            embedding=summary_embedding,
            chunks=chunks,
            content_hash=content_hash,
        )

        session.add(document)
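
The duplicate check above keys on generate_content_hash, imported from app.utils.document_converters. Its implementation is not part of this hunk; a plausible sketch under the assumption of a SHA-256 digest over the combined document string, which is how the value is used here:

# Plausible sketch only; the real helper lives in app/utils/document_converters.py and may differ.
import hashlib

def generate_content_hash(content: str) -> str:
    # Stable hex digest used to detect already-ingested documents.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()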
@@ -111,9 +131,7 @@ async def add_crawled_url_document(

async def add_extension_received_document(
    session: AsyncSession, content: ExtensionDocumentContent, search_space_id: int
) -> Optional[Document]:
    """
    Process and store document content received from the SurfSense Extension.

@@ -129,20 +147,21 @@ async def add_extension_received_document(
    try:
        # Format document metadata in a more maintainable way
        metadata_sections = [
            (
                "METADATA",
                [
                    f"SESSION_ID: {content.metadata.BrowsingSessionId}",
                    f"URL: {content.metadata.VisitedWebPageURL}",
                    f"TITLE: {content.metadata.VisitedWebPageTitle}",
                    f"REFERRER: {content.metadata.VisitedWebPageReffererURL}",
                    f"TIMESTAMP: {content.metadata.VisitedWebPageDateWithTimeInISOString}",
                    f"DURATION_MS: {content.metadata.VisitedWebPageVisitDurationInMilliseconds}",
                ],
            ),
            (
                "CONTENT",
                ["FORMAT: markdown", "TEXT_START", content.pageContent, "TEXT_END"],
            ),
        ]

        # Build the document string more efficiently

@@ -155,18 +174,33 @@ async def add_extension_received_document(
            document_parts.append(f"</{section_title}>")
        document_parts.append("</DOCUMENT>")

        combined_document_string = "\n".join(document_parts)
        content_hash = generate_content_hash(combined_document_string)

        # Check if document with this content hash already exists
        existing_doc_result = await session.execute(
            select(Document).where(Document.content_hash == content_hash)
        )
        existing_document = existing_doc_result.scalars().first()

        if existing_document:
            logging.info(f"Document with content hash {content_hash} already exists. Skipping processing.")
            return existing_document

        # Generate summary
        summary_chain = SUMMARY_PROMPT_TEMPLATE | config.long_context_llm_instance
        summary_result = await summary_chain.ainvoke(
            {"document": combined_document_string}
        )
        summary_content = summary_result.content
        summary_embedding = config.embedding_model_instance.embed(summary_content)

        # Process chunks
        chunks = [
            Chunk(
                content=chunk.text,
                embedding=config.embedding_model_instance.embed(chunk.text),
            )
            for chunk in config.chunker_instance.chunk(content.pageContent)
        ]

@@ -178,7 +212,8 @@ async def add_extension_received_document(
            document_metadata=content.metadata.model_dump(),
            content=summary_content,
            embedding=summary_embedding,
            chunks=chunks,
            content_hash=content_hash,
        )

        session.add(document)

@@ -195,27 +230,34 @@ async def add_extension_received_document(
        raise RuntimeError(f"Failed to process extension document: {str(e)}")

async def add_received_markdown_file_document(
    session: AsyncSession, file_name: str, file_in_markdown: str, search_space_id: int
) -> Optional[Document]:
    try:
        content_hash = generate_content_hash(file_in_markdown)

        # Check if document with this content hash already exists
        existing_doc_result = await session.execute(
            select(Document).where(Document.content_hash == content_hash)
        )
        existing_document = existing_doc_result.scalars().first()

        if existing_document:
            logging.info(f"Document with content hash {content_hash} already exists. Skipping processing.")
            return existing_document

        # Generate summary
        summary_chain = SUMMARY_PROMPT_TEMPLATE | config.long_context_llm_instance
        summary_result = await summary_chain.ainvoke({"document": file_in_markdown})
        summary_content = summary_result.content
        summary_embedding = config.embedding_model_instance.embed(summary_content)

        # Process chunks
        chunks = [
            Chunk(
                content=chunk.text,
                embedding=config.embedding_model_instance.embed(chunk.text),
            )
            for chunk in config.chunker_instance.chunk(file_in_markdown)
        ]

@@ -226,11 +268,11 @@ async def add_received_file_document(
            document_type=DocumentType.FILE,
            document_metadata={
                "FILE_NAME": file_name,
            },
            content=summary_content,
            embedding=summary_embedding,
            chunks=chunks,
            content_hash=content_hash,
        )

        session.add(document)
@@ -246,21 +288,173 @@ async def add_received_file_document(
        raise RuntimeError(f"Failed to process file document: {str(e)}")

async def add_received_file_document_using_unstructured(
    session: AsyncSession,
    file_name: str,
    unstructured_processed_elements: List[LangChainDocument],
    search_space_id: int,
) -> Optional[Document]:
    try:
        file_in_markdown = await convert_document_to_markdown(
            unstructured_processed_elements
        )
        content_hash = generate_content_hash(file_in_markdown)

        # Check if document with this content hash already exists
        existing_doc_result = await session.execute(
            select(Document).where(Document.content_hash == content_hash)
        )
        existing_document = existing_doc_result.scalars().first()

        if existing_document:
            logging.info(f"Document with content hash {content_hash} already exists. Skipping processing.")
            return existing_document

        # TODO: Check if file_markdown exceeds token limit of embedding model

        # Generate summary
        summary_chain = SUMMARY_PROMPT_TEMPLATE | config.long_context_llm_instance
        summary_result = await summary_chain.ainvoke({"document": file_in_markdown})
        summary_content = summary_result.content
        summary_embedding = config.embedding_model_instance.embed(summary_content)

        # Process chunks
        chunks = [
            Chunk(
                content=chunk.text,
                embedding=config.embedding_model_instance.embed(chunk.text),
            )
            for chunk in config.chunker_instance.chunk(file_in_markdown)
        ]

        # Create and store document
        document = Document(
            search_space_id=search_space_id,
            title=file_name,
            document_type=DocumentType.FILE,
            document_metadata={
                "FILE_NAME": file_name,
                "ETL_SERVICE": "UNSTRUCTURED",
            },
            content=summary_content,
            embedding=summary_embedding,
            chunks=chunks,
            content_hash=content_hash,
        )

        session.add(document)
        await session.commit()
        await session.refresh(document)

        return document

    except SQLAlchemyError as db_error:
        await session.rollback()
        raise db_error
    except Exception as e:
        await session.rollback()
        raise RuntimeError(f"Failed to process file document: {str(e)}")

async def add_received_file_document_using_llamacloud(
    session: AsyncSession,
    file_name: str,
    llamacloud_markdown_document: str,
    search_space_id: int,
) -> Optional[Document]:
    """
    Process and store document content parsed by LlamaCloud.

    Args:
        session: Database session
        file_name: Name of the processed file
        llamacloud_markdown_document: Markdown content from LlamaCloud parsing
        search_space_id: ID of the search space

    Returns:
        Document object if successful, None if failed
    """
    try:
        file_in_markdown = llamacloud_markdown_document
        content_hash = generate_content_hash(file_in_markdown)

        # Check if document with this content hash already exists
        existing_doc_result = await session.execute(
            select(Document).where(Document.content_hash == content_hash)
        )
        existing_document = existing_doc_result.scalars().first()

        if existing_document:
            logging.info(f"Document with content hash {content_hash} already exists. Skipping processing.")
            return existing_document

        # Generate summary
        summary_chain = SUMMARY_PROMPT_TEMPLATE | config.long_context_llm_instance
        summary_result = await summary_chain.ainvoke({"document": file_in_markdown})
        summary_content = summary_result.content
        summary_embedding = config.embedding_model_instance.embed(summary_content)

        # Process chunks
        chunks = [
            Chunk(
                content=chunk.text,
                embedding=config.embedding_model_instance.embed(chunk.text),
            )
            for chunk in config.chunker_instance.chunk(file_in_markdown)
        ]

        # Create and store document
        document = Document(
            search_space_id=search_space_id,
            title=file_name,
            document_type=DocumentType.FILE,
            document_metadata={
                "FILE_NAME": file_name,
                "ETL_SERVICE": "LLAMACLOUD",
            },
            content=summary_content,
            embedding=summary_embedding,
            chunks=chunks,
            content_hash=content_hash,
        )

        session.add(document)
        await session.commit()
        await session.refresh(document)

        return document

    except SQLAlchemyError as db_error:
        await session.rollback()
        raise db_error
    except Exception as e:
        await session.rollback()
        raise RuntimeError(f"Failed to process file document using LlamaCloud: {str(e)}")

async def add_youtube_video_document(
    session: AsyncSession, url: str, search_space_id: int
):
    """
    Process a YouTube video URL, extract transcripts, and store as a document.

    Args:
        session: Database session for storing the document
        url: YouTube video URL (supports standard, shortened, and embed formats)
        search_space_id: ID of the search space to add the document to

    Returns:
        Document: The created document object

    Raises:
        ValueError: If the YouTube video ID cannot be extracted from the URL
        SQLAlchemyError: If there's a database error
        RuntimeError: If the video processing fails
    """
    try:
        # Extract video ID from URL
        def get_youtube_video_id(url: str):
            parsed_url = urlparse(url)
            hostname = parsed_url.hostname

@@ -281,19 +475,16 @@ async def add_youtube_video_document(
        if not video_id:
            raise ValueError(f"Could not extract video ID from URL: {url}")

        # Get video metadata using async HTTP client
        params = {
            "format": "json",
            "url": f"https://www.youtube.com/watch?v={video_id}",
        }
        oembed_url = "https://www.youtube.com/oembed"

        async with aiohttp.ClientSession() as http_session:
            async with http_session.get(oembed_url, params=params) as response:
                video_data = await response.json()

        # Get video transcript
        try:

@@ -312,19 +503,20 @@ async def add_youtube_video_document(
        # Format document metadata in a more maintainable way
        metadata_sections = [
            (
                "METADATA",
                [
                    f"TITLE: {video_data.get('title', 'YouTube Video')}",
                    f"URL: {url}",
                    f"VIDEO_ID: {video_id}",
                    f"AUTHOR: {video_data.get('author_name', 'Unknown')}",
                    f"THUMBNAIL: {video_data.get('thumbnail_url', '')}",
                ],
            ),
            (
                "CONTENT",
                ["FORMAT: transcript", "TEXT_START", transcript_text, "TEXT_END"],
            ),
        ]

        # Build the document string more efficiently

@@ -337,22 +529,37 @@ async def add_youtube_video_document(
            document_parts.append(f"</{section_title}>")
        document_parts.append("</DOCUMENT>")

        combined_document_string = "\n".join(document_parts)
        content_hash = generate_content_hash(combined_document_string)

        # Check if document with this content hash already exists
        existing_doc_result = await session.execute(
            select(Document).where(Document.content_hash == content_hash)
        )
        existing_document = existing_doc_result.scalars().first()

        if existing_document:
            logging.info(f"Document with content hash {content_hash} already exists. Skipping processing.")
            return existing_document

        # Generate summary
        summary_chain = SUMMARY_PROMPT_TEMPLATE | config.long_context_llm_instance
        summary_result = await summary_chain.ainvoke(
            {"document": combined_document_string}
        )
        summary_content = summary_result.content
        summary_embedding = config.embedding_model_instance.embed(summary_content)

        # Process chunks
        chunks = [
            Chunk(
                content=chunk.text,
                embedding=config.embedding_model_instance.embed(chunk.text),
            )
            for chunk in config.chunker_instance.chunk(combined_document_string)
        ]

        # Create document
        document = Document(
            title=video_data.get("title", "YouTube Video"),

@@ -362,12 +569,13 @@ async def add_youtube_video_document(
                "video_id": video_id,
                "video_title": video_data.get("title", "YouTube Video"),
                "author": video_data.get("author_name", "Unknown"),
                "thumbnail": video_data.get("thumbnail_url", ""),
            },
            content=summary_content,
            embedding=summary_embedding,
            chunks=chunks,
            search_space_id=search_space_id,
            content_hash=content_hash,
        )

        session.add(document)

@@ -380,6 +588,5 @@ async def add_youtube_video_document(
        raise db_error
    except Exception as e:
        await session.rollback()
        logging.error(f"Failed to process YouTube video: {str(e)}")
        raise
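
The docstring above notes that standard, shortened, and embed URL formats are handled by the nested get_youtube_video_id helper. For illustration, inputs of these forms should all resolve to the same video ID (the ID below is a made-up example):

# Illustrative URL formats accepted by get_youtube_video_id.
urls = [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/dQw4w9WgXcQ",
    "https://www.youtube.com/embed/dQw4w9WgXcQ",
]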

@@ -14,6 +14,8 @@ from app.connectors.linear_connector import LinearConnector
from slack_sdk.errors import SlackApiError
import logging

from app.utils.document_converters import generate_content_hash

# Set up logging
logger = logging.getLogger(__name__)

@@ -67,13 +69,13 @@ async def index_slack_messages(
            # Check if last_indexed_at is in the future or after end_date
            if last_indexed_naive > end_date:
                logger.warning(f"Last indexed date ({last_indexed_naive.strftime('%Y-%m-%d')}) is in the future. Using 365 days ago instead.")
                start_date = end_date - timedelta(days=365)
            else:
                start_date = last_indexed_naive
                logger.info(f"Using last_indexed_at ({start_date.strftime('%Y-%m-%d')}) as start date")
        else:
            start_date = end_date - timedelta(days=365)  # Use 365 days as default
            logger.info(f"No last_indexed_at found, using {start_date.strftime('%Y-%m-%d')} (365 days ago) as start date")

        # Format dates for Slack API
@@ -89,58 +91,31 @@ async def index_slack_messages(
        if not channels:
            return 0, "No Slack channels found"

        # Track the number of documents indexed
        documents_indexed = 0
        documents_skipped = 0
        skipped_channels = []

        # Process each channel
        for channel_obj in channels:  # Modified loop to iterate over list of channel objects
            channel_id = channel_obj["id"]
            channel_name = channel_obj["name"]
            is_private = channel_obj["is_private"]
            is_member = channel_obj["is_member"]  # This might be False for public channels too

            try:
                # If it's a private channel and the bot is not a member, skip.
                # For public channels, if they are listed by conversations.list, the bot can typically read history.
                # The `not_in_channel` error in get_conversation_history will be the ultimate gatekeeper if history is inaccessible.
                if is_private and not is_member:
                    logger.warning(f"Bot is not a member of private channel {channel_name} ({channel_id}). Skipping.")
                    skipped_channels.append(f"{channel_name} (private, bot not a member)")
                    documents_skipped += 1
                    continue

                # Get messages for this channel
                # The get_history_by_date_range now uses get_conversation_history,
                # which handles 'not_in_channel' by returning [] and logging.
                messages, error = slack_client.get_history_by_date_range(
                    channel_id=channel_id,
                    start_date=start_date_str,
@@ -189,10 +164,9 @@ async def index_slack_messages(
                    ("METADATA", [
                        f"CHANNEL_NAME: {channel_name}",
                        f"CHANNEL_ID: {channel_id}",
                        # f"START_DATE: {start_date_str}",
                        # f"END_DATE: {end_date_str}",
                        f"MESSAGE_COUNT: {len(formatted_messages)}"
                    ]),
                    ("CONTENT", [
                        "FORMAT: markdown",

@@ -213,6 +187,18 @@ async def index_slack_messages(
                document_parts.append("</DOCUMENT>")
                combined_document_string = '\n'.join(document_parts)
                content_hash = generate_content_hash(combined_document_string)

                # Check if document with this content hash already exists
                existing_doc_by_hash_result = await session.execute(
                    select(Document).where(Document.content_hash == content_hash)
                )
                existing_document_by_hash = existing_doc_by_hash_result.scalars().first()

                if existing_document_by_hash:
                    logger.info(f"Document with content hash {content_hash} already exists for channel {channel_name}. Skipping processing.")
                    documents_skipped += 1
                    continue

                # Generate summary
                summary_chain = SUMMARY_PROMPT_TEMPLATE | config.long_context_llm_instance
@ -222,65 +208,32 @@ async def index_slack_messages(
# Process chunks # Process chunks
chunks = [ chunks = [
Chunk(content=chunk.text, embedding=chunk.embedding) Chunk(content=chunk.text, embedding=config.embedding_model_instance.embed(chunk.text))
for chunk in config.chunker_instance.chunk(channel_content) for chunk in config.chunker_instance.chunk(channel_content)
] ]
- # Check if this channel already exists in our database
- existing_document = existing_docs_by_channel_id.get(channel_id)
- if existing_document:
-     # Update existing document instead of creating a new one
-     logger.info(f"Updating existing document for channel {channel_name}")
-     # Update document fields
-     existing_document.title = f"Slack - {channel_name}"
-     existing_document.document_metadata = {
-         "channel_name": channel_name,
-         "channel_id": channel_id,
-         "start_date": start_date_str,
-         "end_date": end_date_str,
-         "message_count": len(formatted_messages),
-         "indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
-         "last_updated": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
-     }
-     existing_document.content = summary_content
-     existing_document.embedding = summary_embedding
-     # Delete existing chunks and add new ones
-     await session.execute(
-         delete(Chunk)
-         .where(Chunk.document_id == existing_document.id)
-     )
-     # Assign new chunks to existing document
-     for chunk in chunks:
-         chunk.document_id = existing_document.id
-         session.add(chunk)
-     documents_updated += 1
- else:
-     # Create and store new document
-     document = Document(
-         search_space_id=search_space_id,
-         title=f"Slack - {channel_name}",
-         document_type=DocumentType.SLACK_CONNECTOR,
-         document_metadata={
-             "channel_name": channel_name,
-             "channel_id": channel_id,
-             "start_date": start_date_str,
-             "end_date": end_date_str,
-             "message_count": len(formatted_messages),
-             "indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
-         },
-         content=summary_content,
-         embedding=summary_embedding,
-         chunks=chunks
-     )
-     session.add(document)
-     documents_indexed += 1
-     logger.info(f"Successfully indexed new channel {channel_name} with {len(formatted_messages)} messages")
+ # Create and store new document
+ document = Document(
+     search_space_id=search_space_id,
+     title=f"Slack - {channel_name}",
+     document_type=DocumentType.SLACK_CONNECTOR,
+     document_metadata={
+         "channel_name": channel_name,
+         "channel_id": channel_id,
+         "start_date": start_date_str,
+         "end_date": end_date_str,
+         "message_count": len(formatted_messages),
+         "indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+     },
+     content=summary_content,
+     embedding=summary_embedding,
+     chunks=chunks,
+     content_hash=content_hash,
+ )
+ session.add(document)
+ documents_indexed += 1
+ logger.info(f"Successfully indexed new channel {channel_name} with {len(formatted_messages)} messages")
except SlackApiError as slack_error:
logger.error(f"Slack API error for channel {channel_name}: {str(slack_error)}")
@@ -295,7 +248,7 @@ async def index_slack_messages(
# Update the last_indexed_at timestamp for the connector only if requested
# and if we successfully indexed at least one channel
- total_processed = documents_indexed + documents_updated
+ total_processed = documents_indexed
if update_last_indexed and total_processed > 0:
connector.last_indexed_at = datetime.now()
@@ -305,11 +258,11 @@ async def index_slack_messages(
# Prepare result message
result_message = None
if skipped_channels:
- result_message = f"Processed {total_processed} channels ({documents_indexed} new, {documents_updated} updated). Skipped {len(skipped_channels)} channels: {', '.join(skipped_channels)}"
+ result_message = f"Processed {total_processed} channels. Skipped {len(skipped_channels)} channels: {', '.join(skipped_channels)}"
else:
- result_message = f"Processed {total_processed} channels ({documents_indexed} new, {documents_updated} updated)."
+ result_message = f"Processed {total_processed} channels."
- logger.info(f"Slack indexing completed: {documents_indexed} new channels, {documents_updated} updated, {documents_skipped} skipped")
+ logger.info(f"Slack indexing completed: {documents_indexed} new channels, {documents_skipped} skipped")
return total_processed, result_message
except SQLAlchemyError as db_error:
@@ -386,27 +339,8 @@ async def index_notion_pages(
logger.info("No Notion pages found to index")
return 0, "No Notion pages found"
# Get existing documents for this search space and connector type to prevent duplicates
existing_docs_result = await session.execute(
select(Document)
.filter(
Document.search_space_id == search_space_id,
Document.document_type == DocumentType.NOTION_CONNECTOR
)
)
existing_docs = existing_docs_result.scalars().all()
# Create a lookup dictionary of existing documents by page_id
existing_docs_by_page_id = {}
for doc in existing_docs:
if "page_id" in doc.document_metadata:
existing_docs_by_page_id[doc.document_metadata["page_id"]] = doc
logger.info(f"Found {len(existing_docs_by_page_id)} existing Notion documents in database")
# Track the number of documents indexed
documents_indexed = 0
- documents_updated = 0
documents_skipped = 0
skipped_pages = []
@@ -482,8 +416,7 @@ async def index_notion_pages(
metadata_sections = [
("METADATA", [
f"PAGE_TITLE: {page_title}",
- f"PAGE_ID: {page_id}",
- f"INDEXED_AT: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}"
+ f"PAGE_ID: {page_id}"
]),
("CONTENT", [
"FORMAT: markdown",
@@ -504,6 +437,18 @@ async def index_notion_pages(
document_parts.append("</DOCUMENT>")
combined_document_string = '\n'.join(document_parts)
content_hash = generate_content_hash(combined_document_string)
# Check if document with this content hash already exists
existing_doc_by_hash_result = await session.execute(
select(Document).where(Document.content_hash == content_hash)
)
existing_document_by_hash = existing_doc_by_hash_result.scalars().first()
if existing_document_by_hash:
logger.info(f"Document with content hash {content_hash} already exists for page {page_title}. Skipping processing.")
documents_skipped += 1
continue
# Generate summary
logger.debug(f"Generating summary for page {page_title}")
@@ -515,59 +460,29 @@ async def index_notion_pages(
# Process chunks
logger.debug(f"Chunking content for page {page_title}")
chunks = [
- Chunk(content=chunk.text, embedding=chunk.embedding)
+ Chunk(content=chunk.text, embedding=config.embedding_model_instance.embed(chunk.text))
for chunk in config.chunker_instance.chunk(markdown_content)
]
- # Check if this page already exists in our database
- existing_document = existing_docs_by_page_id.get(page_id)
- if existing_document:
-     # Update existing document instead of creating a new one
-     logger.info(f"Updating existing document for page {page_title}")
-     # Update document fields
-     existing_document.title = f"Notion - {page_title}"
-     existing_document.document_metadata = {
-         "page_title": page_title,
-         "page_id": page_id,
-         "indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
-         "last_updated": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
-     }
-     existing_document.content = summary_content
-     existing_document.embedding = summary_embedding
-     # Delete existing chunks and add new ones
-     await session.execute(
-         delete(Chunk)
-         .where(Chunk.document_id == existing_document.id)
-     )
-     # Assign new chunks to existing document
-     for chunk in chunks:
-         chunk.document_id = existing_document.id
-         session.add(chunk)
-     documents_updated += 1
- else:
-     # Create and store new document
-     document = Document(
-         search_space_id=search_space_id,
-         title=f"Notion - {page_title}",
-         document_type=DocumentType.NOTION_CONNECTOR,
-         document_metadata={
-             "page_title": page_title,
-             "page_id": page_id,
-             "indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
-         },
-         content=summary_content,
-         embedding=summary_embedding,
-         chunks=chunks
-     )
-     session.add(document)
-     documents_indexed += 1
-     logger.info(f"Successfully indexed new Notion page: {page_title}")
+ # Create and store new document
+ document = Document(
+     search_space_id=search_space_id,
+     title=f"Notion - {page_title}",
+     document_type=DocumentType.NOTION_CONNECTOR,
+     document_metadata={
+         "page_title": page_title,
+         "page_id": page_id,
+         "indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+     },
+     content=summary_content,
+     content_hash=content_hash,
+     embedding=summary_embedding,
+     chunks=chunks
+ )
+ session.add(document)
+ documents_indexed += 1
+ logger.info(f"Successfully indexed new Notion page: {page_title}")
except Exception as e:
logger.error(f"Error processing Notion page {page.get('title', 'Unknown')}: {str(e)}", exc_info=True)
@@ -577,7 +492,7 @@ async def index_notion_pages(
# Update the last_indexed_at timestamp for the connector only if requested
# and if we successfully indexed at least one page
- total_processed = documents_indexed + documents_updated
+ total_processed = documents_indexed
if update_last_indexed and total_processed > 0:
connector.last_indexed_at = datetime.now()
logger.info(f"Updated last_indexed_at for connector {connector_id}")
@@ -588,11 +503,11 @@ async def index_notion_pages(
# Prepare result message
result_message = None
if skipped_pages:
- result_message = f"Processed {total_processed} pages ({documents_indexed} new, {documents_updated} updated). Skipped {len(skipped_pages)} pages: {', '.join(skipped_pages)}"
+ result_message = f"Processed {total_processed} pages. Skipped {len(skipped_pages)} pages: {', '.join(skipped_pages)}"
else:
- result_message = f"Processed {total_processed} pages ({documents_indexed} new, {documents_updated} updated)."
+ result_message = f"Processed {total_processed} pages."
- logger.info(f"Notion indexing completed: {documents_indexed} new pages, {documents_updated} updated, {documents_skipped} skipped")
+ logger.info(f"Notion indexing completed: {documents_indexed} new pages, {documents_skipped} skipped")
return total_processed, result_message
except SQLAlchemyError as db_error:
@@ -660,19 +575,6 @@ async def index_github_repos(
# If a repo is inaccessible, get_repository_files will likely fail gracefully later.
logger.info(f"Starting indexing for {len(repo_full_names_to_index)} selected repositories.")
# 5. Get existing documents for this search space and connector type to prevent duplicates
existing_docs_result = await session.execute(
select(Document)
.filter(
Document.search_space_id == search_space_id,
Document.document_type == DocumentType.GITHUB_CONNECTOR
)
)
existing_docs = existing_docs_result.scalars().all()
# Create a lookup dict: key=repo_fullname/file_path, value=Document object
existing_docs_lookup = {doc.document_metadata.get("full_path"): doc for doc in existing_docs if doc.document_metadata.get("full_path")}
logger.info(f"Found {len(existing_docs_lookup)} existing GitHub documents in database for search space {search_space_id}")
# 6. Iterate through selected repositories and index files
for repo_full_name in repo_full_names_to_index:
if not repo_full_name or not isinstance(repo_full_name, str):
@@ -699,12 +601,6 @@ async def index_github_repos(
logger.warning(f"Skipping file with missing info in {repo_full_name}: {file_info}")
continue
# Check if document already exists and if content hash matches
existing_doc = existing_docs_lookup.get(full_path_key)
if existing_doc and existing_doc.document_metadata.get("sha") == file_sha:
logger.debug(f"Skipping unchanged file: {full_path_key}")
continue # Skip if SHA matches (content hasn't changed)
# Get file content
file_content = github_client.get_file_content(repo_full_name, file_path)
@@ -712,6 +608,18 @@ async def index_github_repos(
logger.warning(f"Could not retrieve content for {full_path_key}. Skipping.")
continue # Skip if content fetch failed
content_hash = generate_content_hash(file_content)
# Check if document with this content hash already exists
existing_doc_by_hash_result = await session.execute(
select(Document).where(Document.content_hash == content_hash)
)
existing_document_by_hash = existing_doc_by_hash_result.scalars().first()
if existing_document_by_hash:
logger.info(f"Document with content hash {content_hash} already exists for file {full_path_key}. Skipping processing.")
continue
# Use file_content directly for chunking, maybe summary for main content?
# For now, let's use the full content for both, might need refinement
summary_content = f"GitHub file: {full_path_key}\n\n{file_content[:1000]}..." # Simple summary
@@ -720,8 +628,8 @@ async def index_github_repos(
# Chunk the content
try:
chunks_data = [
- Chunk(content=chunk.text, embedding=chunk.embedding)
+ Chunk(content=chunk.text, embedding=config.embedding_model_instance.embed(chunk.text))
- for chunk in config.chunker_instance.chunk(file_content)
+ for chunk in config.code_chunker_instance.chunk(file_content)
]
except Exception as chunk_err:
logger.error(f"Failed to chunk file {full_path_key}: {chunk_err}")
@@ -738,42 +646,20 @@ async def index_github_repos(
"indexed_at": datetime.now(timezone.utc).isoformat()
}
- if existing_doc:
-     # Update existing document
-     logger.info(f"Updating document for file: {full_path_key}")
-     existing_doc.title = f"GitHub - {file_path}"
-     existing_doc.document_metadata = doc_metadata
-     existing_doc.content = summary_content # Update summary
-     existing_doc.embedding = summary_embedding # Update embedding
-     # Delete old chunks
-     await session.execute(
-         delete(Chunk)
-         .where(Chunk.document_id == existing_doc.id)
-     )
-     # Add new chunks
-     for chunk_obj in chunks_data:
-         chunk_obj.document_id = existing_doc.id
-         session.add(chunk_obj)
-     documents_processed += 1
- else:
-     # Create new document
-     logger.info(f"Creating new document for file: {full_path_key}")
-     document = Document(
-         title=f"GitHub - {file_path}",
-         document_type=DocumentType.GITHUB_CONNECTOR,
-         document_metadata=doc_metadata,
-         content=summary_content, # Store summary
-         embedding=summary_embedding,
-         search_space_id=search_space_id,
-         chunks=chunks_data # Associate chunks directly
-     )
-     session.add(document)
-     documents_processed += 1
- # Commit periodically or at the end? For now, commit per repo
- # await session.commit()
+ # Create new document
+ logger.info(f"Creating new document for file: {full_path_key}")
+ document = Document(
+     title=f"GitHub - {file_path}",
+     document_type=DocumentType.GITHUB_CONNECTOR,
+     document_metadata=doc_metadata,
+     content=summary_content, # Store summary
+     content_hash=content_hash,
+     embedding=summary_embedding,
+     search_space_id=search_space_id,
+     chunks=chunks_data # Associate chunks directly
+ )
+ session.add(document)
+ documents_processed += 1
except Exception as repo_err:
logger.error(f"Failed to process repository {repo_full_name}: {repo_err}")
@@ -847,14 +733,14 @@ async def index_linear_issues(
# Check if last_indexed_at is in the future or after end_date
if last_indexed_naive > end_date:
- logger.warning(f"Last indexed date ({last_indexed_naive.strftime('%Y-%m-%d')}) is in the future. Using 30 days ago instead.")
+ logger.warning(f"Last indexed date ({last_indexed_naive.strftime('%Y-%m-%d')}) is in the future. Using 365 days ago instead.")
- start_date = end_date - timedelta(days=30)
+ start_date = end_date - timedelta(days=365)
else:
start_date = last_indexed_naive
logger.info(f"Using last_indexed_at ({start_date.strftime('%Y-%m-%d')}) as start date")
else:
- start_date = end_date - timedelta(days=30) # Use 30 days instead of 365 to catch recent issues
+ start_date = end_date - timedelta(days=365) # Use 365 days as default
- logger.info(f"No last_indexed_at found, using {start_date.strftime('%Y-%m-%d')} (30 days ago) as start date")
+ logger.info(f"No last_indexed_at found, using {start_date.strftime('%Y-%m-%d')} (365 days ago) as start date")
# Format dates for Linear API
start_date_str = start_date.strftime("%Y-%m-%d")
@@ -905,35 +791,8 @@ async def index_linear_issues(
if len(issues) > 10:
logger.info(f" ...and {len(issues) - 10} more issues")
# Get existing documents for this search space and connector type to prevent duplicates
existing_docs_result = await session.execute(
select(Document)
.filter(
Document.search_space_id == search_space_id,
Document.document_type == DocumentType.LINEAR_CONNECTOR
)
)
existing_docs = existing_docs_result.scalars().all()
# Create a lookup dictionary of existing documents by issue_id
existing_docs_by_issue_id = {}
for doc in existing_docs:
if "issue_id" in doc.document_metadata:
existing_docs_by_issue_id[doc.document_metadata["issue_id"]] = doc
logger.info(f"Found {len(existing_docs_by_issue_id)} existing Linear documents in database")
# Log existing document IDs for debugging
if existing_docs_by_issue_id:
logger.info("Existing Linear document issue IDs in database:")
for idx, (issue_id, doc) in enumerate(list(existing_docs_by_issue_id.items())[:10]): # Log first 10
logger.info(f" {idx+1}. {issue_id} - {doc.document_metadata.get('issue_identifier', 'Unknown')} - {doc.document_metadata.get('issue_title', 'Unknown')}")
if len(existing_docs_by_issue_id) > 10:
logger.info(f" ...and {len(existing_docs_by_issue_id) - 10} more existing documents")
# Track the number of documents indexed
documents_indexed = 0
- documents_updated = 0
documents_skipped = 0
skipped_issues = []
@@ -979,71 +838,51 @@ async def index_linear_issues(
comment_count = len(formatted_issue.get("comments", []))
summary_content += f"Comments: {comment_count}"
content_hash = generate_content_hash(issue_content)
# Check if document with this content hash already exists
existing_doc_by_hash_result = await session.execute(
select(Document).where(Document.content_hash == content_hash)
)
existing_document_by_hash = existing_doc_by_hash_result.scalars().first()
if existing_document_by_hash:
logger.info(f"Document with content hash {content_hash} already exists for issue {issue_identifier}. Skipping processing.")
documents_skipped += 1
continue
# Generate embedding for the summary
summary_embedding = config.embedding_model_instance.embed(summary_content)
# Process chunks - using the full issue content with comments
chunks = [
- Chunk(content=chunk.text, embedding=chunk.embedding)
+ Chunk(content=chunk.text, embedding=config.embedding_model_instance.embed(chunk.text))
for chunk in config.chunker_instance.chunk(issue_content)
]
- # Check if this issue already exists in our database
- existing_document = existing_docs_by_issue_id.get(issue_id)
- if existing_document:
-     # Update existing document instead of creating a new one
-     logger.info(f"Updating existing document for issue {issue_identifier} - {issue_title}")
-     # Update document fields
-     existing_document.title = f"Linear - {issue_identifier}: {issue_title}"
-     existing_document.document_metadata = {
-         "issue_id": issue_id,
-         "issue_identifier": issue_identifier,
-         "issue_title": issue_title,
-         "state": state,
-         "comment_count": comment_count,
-         "indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
-         "last_updated": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
-     }
-     existing_document.content = summary_content
-     existing_document.embedding = summary_embedding
-     # Delete existing chunks and add new ones
-     await session.execute(
-         delete(Chunk)
-         .where(Chunk.document_id == existing_document.id)
-     )
-     # Assign new chunks to existing document
-     for chunk in chunks:
-         chunk.document_id = existing_document.id
-         session.add(chunk)
-     documents_updated += 1
- else:
-     # Create and store new document
-     logger.info(f"Creating new document for issue {issue_identifier} - {issue_title}")
-     document = Document(
-         search_space_id=search_space_id,
-         title=f"Linear - {issue_identifier}: {issue_title}",
-         document_type=DocumentType.LINEAR_CONNECTOR,
-         document_metadata={
-             "issue_id": issue_id,
-             "issue_identifier": issue_identifier,
-             "issue_title": issue_title,
-             "state": state,
-             "comment_count": comment_count,
-             "indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
-         },
-         content=summary_content,
-         embedding=summary_embedding,
-         chunks=chunks
-     )
-     session.add(document)
-     documents_indexed += 1
-     logger.info(f"Successfully indexed new issue {issue_identifier} - {issue_title}")
+ # Create and store new document
+ logger.info(f"Creating new document for issue {issue_identifier} - {issue_title}")
+ document = Document(
+     search_space_id=search_space_id,
+     title=f"Linear - {issue_identifier}: {issue_title}",
+     document_type=DocumentType.LINEAR_CONNECTOR,
+     document_metadata={
+         "issue_id": issue_id,
+         "issue_identifier": issue_identifier,
+         "issue_title": issue_title,
+         "state": state,
+         "comment_count": comment_count,
+         "indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+     },
+     content=summary_content,
+     content_hash=content_hash,
+     embedding=summary_embedding,
+     chunks=chunks
+ )
+ session.add(document)
+ documents_indexed += 1
+ logger.info(f"Successfully indexed new issue {issue_identifier} - {issue_title}")
except Exception as e:
logger.error(f"Error processing issue {issue.get('identifier', 'Unknown')}: {str(e)}", exc_info=True)
@@ -1052,7 +891,7 @@ async def index_linear_issues(
continue # Skip this issue and continue with others
# Update the last_indexed_at timestamp for the connector only if requested
- total_processed = documents_indexed + documents_updated
+ total_processed = documents_indexed
if update_last_indexed:
connector.last_indexed_at = datetime.now()
logger.info(f"Updated last_indexed_at to {connector.last_indexed_at}")
@@ -1062,7 +901,7 @@ async def index_linear_issues(
logger.info(f"Successfully committed all Linear document changes to database")
- logger.info(f"Linear indexing completed: {documents_indexed} new issues, {documents_updated} updated, {documents_skipped} skipped")
+ logger.info(f"Linear indexing completed: {documents_indexed} new issues, {documents_skipped} skipped")
return total_processed, None # Return None as the error message to indicate success
except SQLAlchemyError as db_error:


@@ -0,0 +1,93 @@
from app.agents.podcaster.graph import graph as podcaster_graph
from app.agents.podcaster.state import State
from app.db import Chat, Podcast
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
async def generate_document_podcast(
session: AsyncSession,
document_id: int,
search_space_id: int,
user_id: int
):
# TODO: Need to fetch the document chunks, then concatenate them and pass them to the podcast generation model
pass
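# A possible shape for the TODO above, sketched by analogy with generate_chat_podcast
# below. It is not the shipped implementation; it assumes the Document and Chunk models
# from app.db (Chunk.content holds the chunk text, Chunk.document_id points at its Document).
async def _sketch_generate_document_podcast(
    session: AsyncSession,
    document_id: int,
    search_space_id: int,
) -> Podcast:
    from app.db import Chunk, Document  # assumed models

    # Fetch the document's chunks scoped to the search space, oldest first
    chunk_query = (
        select(Chunk)
        .join(Document, Chunk.document_id == Document.id)
        .filter(Document.id == document_id, Document.search_space_id == search_space_id)
        .order_by(Chunk.id)
    )
    chunk_result = await session.execute(chunk_query)
    source_content = "\n\n".join(chunk.content for chunk in chunk_result.scalars().all())

    # Drive the podcaster graph the same way the chat variant does
    config = {"configurable": {"podcast_title": "Surfsense"}}
    result = await podcaster_graph.ainvoke(State(source_content=source_content), config=config)

    podcast = Podcast(
        title="Document Podcast",
        podcast_transcript=[
            {"speaker_id": entry.speaker_id, "dialog": entry.dialog}
            for entry in result["podcast_transcript"]
        ],
        file_location=result["final_podcast_file_path"],
        search_space_id=search_space_id,
    )
    session.add(podcast)
    await session.commit()
    await session.refresh(podcast)
    return podcast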
async def generate_chat_podcast(
session: AsyncSession,
chat_id: int,
search_space_id: int,
podcast_title: str
):
# Fetch the chat with the specified ID
query = select(Chat).filter(
Chat.id == chat_id,
Chat.search_space_id == search_space_id
)
result = await session.execute(query)
chat = result.scalars().first()
if not chat:
raise ValueError(f"Chat with id {chat_id} not found in search space {search_space_id}")
# Create chat history structure
chat_history_str = "<chat_history>"
for message in chat.messages:
if message["role"] == "user":
chat_history_str += f"<user_message>{message['content']}</user_message>"
elif message["role"] == "assistant":
# Last annotation type will always be "ANSWER" here
answer_annotation = message["annotations"][-1]
answer_text = ""
if answer_annotation["type"] == "ANSWER":
answer_text = answer_annotation["content"]
# If content is a list, join it into a single string
if isinstance(answer_text, list):
answer_text = "\n".join(answer_text)
chat_history_str += f"<assistant_message>{answer_text}</assistant_message>"
chat_history_str += "</chat_history>"
# Pass it to the SurfSense Podcaster
config = {
"configurable": {
"podcast_title" : "Surfsense",
}
}
# Initialize state with database session and streaming service
initial_state = State(
source_content=chat_history_str,
)
# Run the graph directly
result = await podcaster_graph.ainvoke(initial_state, config=config)
# Convert podcast transcript entries to serializable format
serializable_transcript = []
for entry in result["podcast_transcript"]:
serializable_transcript.append({
"speaker_id": entry.speaker_id,
"dialog": entry.dialog
})
# Create a new podcast entry
podcast = Podcast(
title=f"{podcast_title}",
podcast_transcript=serializable_transcript,
file_location=result["final_podcast_file_path"],
search_space_id=search_space_id
)
# Add to session and commit
session.add(podcast)
await session.commit()
await session.refresh(podcast)
return podcast
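A brief usage sketch for the helper above; the async_session_maker import path is hypothetical and stands in for however the application builds its AsyncSession instances:

from app.db import async_session_maker  # hypothetical import path

async def make_podcast_for_chat(chat_id: int, search_space_id: int) -> str:
    async with async_session_maker() as session:
        podcast = await generate_chat_podcast(
            session=session,
            chat_id=chat_id,
            search_space_id=search_space_id,
            podcast_title="Chat recap",
        )
        return podcast.file_location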


@@ -1,4 +1,4 @@
- from typing import AsyncGenerator, List, Union
+ from typing import Any, AsyncGenerator, List, Union
from uuid import UUID
from app.agents.researcher.graph import graph as researcher_graph
@@ -6,6 +6,8 @@ from app.agents.researcher.state import State
from app.utils.streaming_service import StreamingService
from sqlalchemy.ext.asyncio import AsyncSession
+ from app.agents.researcher.configuration import SearchMode
async def stream_connector_search_results(
user_query: str,
@@ -13,7 +15,9 @@ async def stream_connector_search_results(
search_space_id: int,
session: AsyncSession,
research_mode: str,
- selected_connectors: List[str]
+ selected_connectors: List[str],
+ langchain_chat_history: List[Any],
+ search_mode_str: str
) -> AsyncGenerator[str, None]:
"""
Stream connector search results to the client
@@ -40,6 +44,11 @@ async def stream_connector_search_results(
# Convert UUID to string if needed
user_id_str = str(user_id) if isinstance(user_id, UUID) else user_id
+ if search_mode_str == "CHUNKS":
+     search_mode = SearchMode.CHUNKS
+ elif search_mode_str == "DOCUMENTS":
+     search_mode = SearchMode.DOCUMENTS
# Sample configuration
config = {
"configurable": {
@@ -47,13 +56,15 @@ async def stream_connector_search_results(
"num_sections": NUM_SECTIONS,
"connectors_to_search": selected_connectors,
"user_id": user_id_str,
- "search_space_id": search_space_id
+ "search_space_id": search_space_id,
+ "search_mode": search_mode
}
}
# Initialize state with database session and streaming service
initial_state = State(
db_session=session,
- streaming_service=streaming_service
+ streaming_service=streaming_service,
+ chat_history=langchain_chat_history
)
# Run the graph directly
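Note that the CHUNKS/DOCUMENTS mapping above leaves search_mode unset for any other string; a defensive variant could default to chunk-level search. A one-line sketch, assuming SearchMode only exposes these two members:

search_mode = SearchMode.DOCUMENTS if search_mode_str == "DOCUMENTS" else SearchMode.CHUNKS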


@@ -10,8 +10,8 @@ from fastapi_users.authentication import (
JWTStrategy,
)
from fastapi_users.db import SQLAlchemyUserDatabase
- from httpx_oauth.clients.google import GoogleOAuth2
+ from fastapi.responses import JSONResponse
+ from fastapi_users.schemas import model_dump
from app.config import config
from app.db import User, get_user_db
from pydantic import BaseModel
@@ -22,10 +22,13 @@ class BearerResponse(BaseModel):
SECRET = config.SECRET_KEY
- google_oauth_client = GoogleOAuth2(
-     config.GOOGLE_OAUTH_CLIENT_ID,
-     config.GOOGLE_OAUTH_CLIENT_SECRET,
- )
+ if config.AUTH_TYPE == "GOOGLE":
+     from httpx_oauth.clients.google import GoogleOAuth2
+     google_oauth_client = GoogleOAuth2(
+         config.GOOGLE_OAUTH_CLIENT_ID,
+         config.GOOGLE_OAUTH_CLIENT_SECRET,
+     )
class UserManager(UUIDIDMixin, BaseUserManager[User, uuid.UUID]):
@@ -79,7 +82,10 @@ class CustomBearerTransport(BearerTransport):
async def get_login_response(self, token: str) -> Response:
bearer_response = BearerResponse(access_token=token, token_type="bearer")
redirect_url = f"{config.NEXT_FRONTEND_URL}/auth/callback?token={bearer_response.access_token}"
- return RedirectResponse(redirect_url, status_code=302)
+ if config.AUTH_TYPE == "GOOGLE":
+     return RedirectResponse(redirect_url, status_code=302)
+ else:
+     return JSONResponse(model_dump(bearer_response))
bearer_transport = CustomBearerTransport(tokenUrl="auth/jwt/login")
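With the non-Google branch returning plain JSON, a local client can log in against the standard fastapi-users endpoint and reuse the token. A sketch, assuming the backend listens on localhost:8000 and local (non-Google) auth is configured:

import httpx

resp = httpx.post(
    "http://localhost:8000/auth/jwt/login",
    data={"username": "user@example.com", "password": "secret"},  # form fields expected by fastapi-users
)
resp.raise_for_status()
token = resp.json()["access_token"]
headers = {"Authorization": f"Bearer {token}"}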

File diff suppressed because it is too large.


@@ -1,3 +1,6 @@
+ import hashlib
async def convert_element_to_markdown(element) -> str:
"""
Convert an Unstructured element to markdown format based on its category.
@@ -55,6 +58,7 @@ async def convert_document_to_markdown(elements):
return "".join(markdown_parts)
def convert_chunks_to_langchain_documents(chunks):
"""
Convert chunks from hybrid search results to LangChain Document objects.
@@ -97,7 +101,8 @@ def convert_chunks_to_langchain_documents(chunks):
# Add document metadata if available
if "metadata" in doc:
# Prefix document metadata keys to avoid conflicts
- doc_metadata = {f"doc_meta_{k}": v for k, v in doc.get("metadata", {}).items()}
+ doc_metadata = {f"doc_meta_{k}": v for k,
+                 v in doc.get("metadata", {}).items()}
metadata.update(doc_metadata)
# Add source URL if available in metadata
@@ -134,3 +139,8 @@ def convert_chunks_to_langchain_documents(chunks):
langchain_docs.append(langchain_doc)
return langchain_docs
def generate_content_hash(content: str) -> str:
"""Generate SHA-256 hash for the given content."""
return hashlib.sha256(content.encode('utf-8')).hexdigest()
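The connector indexers above use this helper to make ingestion idempotent: hash the serialized document, look the hash up, and skip the item if it already exists. The same pattern condensed into one function (a sketch; assumes the Document model from app.db):

from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from app.db import Document  # assumed import path

async def content_already_indexed(session: AsyncSession, combined_document_string: str) -> bool:
    # True when an identical document body was already stored
    content_hash = generate_content_hash(combined_document_string)
    result = await session.execute(
        select(Document).where(Document.content_hash == content_hash)
    )
    return result.scalars().first() is not None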


@@ -1,8 +1,8 @@
- """
- NOTE: This is not used anymore. Might be removed in the future.
- """
- from langchain.schema import HumanMessage, SystemMessage
+ import datetime
+ from langchain.schema import HumanMessage, SystemMessage, AIMessage
from app.config import config
+ from typing import Any, List, Optional
class QueryService:
"""
@@ -10,13 +10,14 @@ class QueryService:
"""
@staticmethod
- async def reformulate_query(user_query: str) -> str:
+ async def reformulate_query_with_chat_history(user_query: str, chat_history_str: Optional[str] = None) -> str:
"""
Reformulate the user query using the STRATEGIC_LLM to make it more
effective for information retrieval and research purposes.
Args:
    user_query: The original user query
+   chat_history: Optional list of previous chat messages
Returns:
    str: The reformulated query
@@ -30,31 +31,30 @@ class QueryService:
# Create system message with instructions
system_message = SystemMessage(
- content="""
- You are an expert at reformulating user queries to optimize information retrieval.
- Your job is to take a user query and reformulate it to:
- 1. Make it more specific and detailed
- 2. Expand ambiguous terms
- 3. Include relevant synonyms and alternative phrasings
- 4. Break down complex questions into their core components
- 5. Ensure it's comprehensive for research purposes
- The query will be used with the following data sources/connectors:
- - SERPER_API: Web search for retrieving current information from the internet
- - TAVILY_API: Research-focused search API for comprehensive information
- - SLACK_CONNECTOR: Retrieves information from indexed Slack workspace conversations
- - NOTION_CONNECTOR: Retrieves information from indexed Notion documents and databases
- - FILE: Searches through user's uploaded files
- - CRAWLED_URL: Searches through previously crawled web pages
- IMPORTANT: Keep the reformulated query as concise as possible while still being effective.
- Avoid unnecessary verbosity and limit the query to only essential terms and concepts.
- Please optimize the query to work effectively across these different data sources.
- Return ONLY the reformulated query without explanations, prefixes, or commentary.
- Do not include phrases like "Reformulated query:" or any other text except the query itself.
+ content=f"""
+ Today's date: {datetime.datetime.now().strftime("%Y-%m-%d")}
+ You are a highly skilled AI assistant specializing in query optimization for advanced research.
+ Your primary objective is to transform a user's initial query into a highly effective search query.
+ This reformulated query will be used to retrieve information from diverse data sources.
+ **Chat History Context:**
+ {chat_history_str if chat_history_str else "No prior conversation history is available."}
+ If chat history is provided, analyze it to understand the user's evolving information needs and the broader context of their request. Use this understanding to refine the current query, ensuring it builds upon or clarifies previous interactions.
+ **Query Reformulation Guidelines:**
+ Your reformulated query should:
+ 1. **Enhance Specificity and Detail:** Add precision to narrow the search focus effectively, making the query less ambiguous and more targeted.
+ 2. **Resolve Ambiguities:** Identify and clarify vague terms or phrases. If a term has multiple meanings, orient the query towards the most likely one given the context.
+ 3. **Expand Key Concepts:** Incorporate relevant synonyms, related terms, and alternative phrasings for core concepts. This helps capture a wider range of relevant documents.
+ 4. **Deconstruct Complex Questions:** If the original query is multifaceted, break it down into its core searchable components or rephrase it to address each aspect clearly. The final output must still be a single, coherent query string.
+ 5. **Optimize for Comprehensiveness:** Ensure the query is structured to uncover all essential facets of the original request, aiming for thorough information retrieval suitable for research.
+ 6. **Maintain User Intent:** The reformulated query must stay true to the original intent of the user's query. Do not introduce new topics or shift the focus significantly.
+ **Crucial Constraints:**
+ * **Conciseness and Effectiveness:** While aiming for comprehensiveness, the reformulated query MUST be as concise as possible. Eliminate all unnecessary verbosity. Focus on essential keywords, entities, and concepts that directly contribute to effective retrieval.
+ * **Single, Direct Output:** Return ONLY the reformulated query itself. Do NOT include any explanations, introductory phrases (e.g., "Reformulated query:", "Here is the optimized query:"), or any other surrounding text or markdown formatting.
+ Your output should be a single, optimized query string, ready for immediate use in a search system.
"""
)
@@ -79,3 +79,22 @@ class QueryService:
# Log the error and return the original query
print(f"Error reformulating query: {e}")
return user_query
@staticmethod
async def langchain_chat_history_to_str(chat_history: List[Any]) -> str:
"""
Convert a list of chat history messages to a string.
"""
chat_history_str = "<chat_history>\n"
for chat_message in chat_history:
if isinstance(chat_message, HumanMessage):
chat_history_str += f"<user>{chat_message.content}</user>\n"
elif isinstance(chat_message, AIMessage):
chat_history_str += f"<assistant>{chat_message.content}</assistant>\n"
elif isinstance(chat_message, SystemMessage):
chat_history_str += f"<system>{chat_message.content}</system>\n"
chat_history_str += "</chat_history>"
return chat_history_str
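Taken together with reformulate_query_with_chat_history, the intended flow looks roughly like this (message contents are illustrative):

from langchain.schema import AIMessage, HumanMessage

async def demo_reformulation() -> str:
    history = [
        HumanMessage(content="What is SurfSense?"),
        AIMessage(content="SurfSense is a self-hosted research assistant."),
    ]
    history_str = await QueryService.langchain_chat_history_to_str(history)
    return await QueryService.reformulate_query_with_chat_history(
        "How do I index my Slack workspace?",
        chat_history_str=history_str,
    )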


@@ -1,5 +0,0 @@
from app.agents.researcher.graph import graph as researcher_graph
from app.agents.researcher.sub_section_writer.graph import graph as sub_section_writer_graph
print(researcher_graph.get_graph().draw_mermaid())
print(sub_section_writer_graph.get_graph().draw_mermaid())


@@ -1,13 +1,13 @@
[project]
name = "surf-new-backend"
- version = "0.0.6"
+ version = "0.0.7"
description = "SurfSense Backend"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "alembic>=1.13.0",
    "asyncpg>=0.30.0",
-   "chonkie[all]>=0.4.1",
+   "chonkie[all]>=1.0.6",
    "fastapi>=0.115.8",
    "fastapi-users[oauth,sqlalchemy]>=14.0.1",
    "firecrawl-py>=1.12.0",
@@ -15,14 +15,18 @@ dependencies = [
    "langchain-community>=0.3.17",
    "langchain-unstructured>=0.1.6",
    "langgraph>=0.3.29",
+   "linkup-sdk>=0.2.4",
    "litellm>=1.61.4",
+   "llama-cloud-services>=0.6.25",
    "markdownify>=0.14.1",
    "notion-client>=2.3.0",
    "pgvector>=0.3.6",
    "playwright>=1.50.0",
+   "python-ffmpeg>=2.0.12",
    "rerankers[flashrank]>=0.7.1",
    "sentence-transformers>=3.4.1",
    "slack-sdk>=3.34.0",
+   "static-ffmpeg>=2.13",
    "tavily-python>=0.3.2",
    "unstructured-client>=0.30.0",
    "unstructured[all-docs]>=0.16.25",


@@ -13,6 +13,24 @@ resolution-markers = [
"(python_full_version < '3.12.4' and platform_machine != 'aarch64' and sys_platform == 'linux') or (python_full_version < '3.12.4' and sys_platform != 'darwin' and sys_platform != 'linux')",
]
[[package]]
name = "accelerate"
version = "1.6.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "huggingface-hub" },
{ name = "numpy" },
{ name = "packaging" },
{ name = "psutil" },
{ name = "pyyaml" },
{ name = "safetensors" },
{ name = "torch" },
]
sdist = { url = "https://files.pythonhosted.org/packages/8a/6e/c29a1dcde7db07f47870ed63e5124086b11874ad52ccd533dc1ca2c799da/accelerate-1.6.0.tar.gz", hash = "sha256:28c1ef1846e690944f98b68dc7b8bb6c51d032d45e85dcbb3adb0c8b99dffb32", size = 363804 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/63/b1/8198e3cdd11a426b1df2912e3381018c4a4a55368f6d0857ba3ca418ef93/accelerate-1.6.0-py3-none-any.whl", hash = "sha256:1aee717d3d3735ad6d09710a7c26990ee4652b79b4e93df46551551b5227c2aa", size = 354748 },
]
[[package]] [[package]]
name = "aiofiles" name = "aiofiles"
version = "24.1.0" version = "24.1.0"
@ -92,6 +110,18 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/ec/6a/bc7e17a3e87a2985d3e8f4da4cd0f481060eb78fb08596c42be62c90a4d9/aiosignal-1.3.2-py2.py3-none-any.whl", hash = "sha256:45cde58e409a301715980c2b01d0c28bdde3770d8290b5eb2173759d9acb31a5", size = 7597 }, { url = "https://files.pythonhosted.org/packages/ec/6a/bc7e17a3e87a2985d3e8f4da4cd0f481060eb78fb08596c42be62c90a4d9/aiosignal-1.3.2-py2.py3-none-any.whl", hash = "sha256:45cde58e409a301715980c2b01d0c28bdde3770d8290b5eb2173759d9acb31a5", size = 7597 },
] ]
[[package]]
name = "aiosqlite"
version = "0.21.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/13/7d/8bca2bf9a247c2c5dfeec1d7a5f40db6518f88d314b8bca9da29670d2671/aiosqlite-0.21.0.tar.gz", hash = "sha256:131bb8056daa3bc875608c631c678cda73922a2d4ba8aec373b19f18c17e7aa3", size = 13454 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/f5/10/6c25ed6de94c49f88a91fa5018cb4c0f3625f31d5be9f771ebe5cc7cd506/aiosqlite-0.21.0-py3-none-any.whl", hash = "sha256:2549cf4057f95f53dcba16f2b64e8e2791d7e1adedb13197dd8ed77bb226d7d0", size = 15792 },
]
[[package]] [[package]]
name = "alembic" name = "alembic"
version = "1.15.2" version = "1.15.2"
@ -201,19 +231,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/fc/30/d4986a882011f9df997a55e6becd864812ccfcd821d64aac8570ee39f719/attrs-25.1.0-py3-none-any.whl", hash = "sha256:c75a69e28a550a7e93789579c22aa26b0f5b83b75dc4e08fe092980051e1090a", size = 63152 }, { url = "https://files.pythonhosted.org/packages/fc/30/d4986a882011f9df997a55e6becd864812ccfcd821d64aac8570ee39f719/attrs-25.1.0-py3-none-any.whl", hash = "sha256:c75a69e28a550a7e93789579c22aa26b0f5b83b75dc4e08fe092980051e1090a", size = 63152 },
] ]
[[package]]
name = "autotiktokenizer"
version = "0.2.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "huggingface-hub" },
{ name = "tiktoken" },
]
sdist = { url = "https://files.pythonhosted.org/packages/a6/1a/c6f494750dc67c2e5b06b91ae9565d46adb384f25f61a7136ff79dd02413/autotiktokenizer-0.2.2.tar.gz", hash = "sha256:f0954f14cedfe538b96ba0eed2e39996378c0bdf649fd977d6a047e419e05fdb", size = 15401 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/d8/7b/c34469a1495d755bac1c80fbf3c0c2c29eb03ffe61172d889426025173bd/autotiktokenizer-0.2.2-py3-none-any.whl", hash = "sha256:ebbf15d9d5516fcb3287a8153bd8efbcc932f9c99089b2357255413cf37815d9", size = 8957 },
]
[[package]] [[package]]
name = "backoff" name = "backoff"
version = "2.2.1" version = "2.2.1"
@ -223,6 +240,22 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/df/73/b6e24bd22e6720ca8ee9a85a0c4a2971af8497d8f3193fa05390cbd46e09/backoff-2.2.1-py3-none-any.whl", hash = "sha256:63579f9a0628e06278f7e47b7d7d5b6ce20dc65c5e96a6f3ca99a6adca0396e8", size = 15148 }, { url = "https://files.pythonhosted.org/packages/df/73/b6e24bd22e6720ca8ee9a85a0c4a2971af8497d8f3193fa05390cbd46e09/backoff-2.2.1-py3-none-any.whl", hash = "sha256:63579f9a0628e06278f7e47b7d7d5b6ce20dc65c5e96a6f3ca99a6adca0396e8", size = 15148 },
] ]
[[package]]
name = "banks"
version = "2.1.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "deprecated" },
{ name = "griffe" },
{ name = "jinja2" },
{ name = "platformdirs" },
{ name = "pydantic" },
]
sdist = { url = "https://files.pythonhosted.org/packages/77/34/2b6697f02ffb68bee50e5fd37d6c64432244d3245603fd62950169dfed7e/banks-2.1.2.tar.gz", hash = "sha256:a0651db9d14b57fa2e115e78f68dbb1b36fe226ad6eef96192542908b1d20c1f", size = 173332 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/04/4a/7fdca29d1db62f5f5c3446bf8f668beacdb0b5a8aff4247574ddfddc6bcd/banks-2.1.2-py3-none-any.whl", hash = "sha256:7fba451069f6bea376483b8136a0f29cb1e6883133626d00e077e20a3d102c0e", size = 28064 },
]
[[package]]
name = "bcrypt"
version = "4.2.1"
@@ -363,23 +396,36 @@ wheels = [
[[package]]
name = "chonkie"
- version = "0.4.1"
+ version = "1.0.6"
source = { registry = "https://pypi.org/simple" }
dependencies = [
-     { name = "autotiktokenizer" },
+     { name = "tokenizers" },
    { name = "tqdm" },
]
- sdist = { url = "https://files.pythonhosted.org/packages/2e/94/4a1bc8bdf06e7327bb256abb85767647125286c9bbc7cbcd77a550b96d63/chonkie-0.4.1.tar.gz", hash = "sha256:164216efa01af02e750e7cb218cea87918a18f83ebbd8f020b25557f1ed36aa9", size = 43284 }
+ sdist = { url = "https://files.pythonhosted.org/packages/5a/db/16d5d23a216db734bcb68e61c466ff48a55dc0d2cdc7ecdd73aaea1f6f7d/chonkie-1.0.6.tar.gz", hash = "sha256:feefad3cbbb62b4a55f4c6409bd8d8f0ee180d8319c4d32e31539a768955b3b0", size = 70056 }
wheels = [
-     { url = "https://files.pythonhosted.org/packages/c0/b5/c0d77500a413794773edb630bdc7061121c237a4eaf6ce222226c200d603/chonkie-0.4.1-py3-none-any.whl", hash = "sha256:af7d95d17f4ed60a26e32f0bad60f807287e3301189114755d727657ed2ef964", size = 51193 },
+     { url = "https://files.pythonhosted.org/packages/bc/46/d6d9789eb6e61bfa073a13fd2b5cbbcf022a7781adbb060a25d82f16437e/chonkie-1.0.6-py3-none-any.whl", hash = "sha256:d8cfcf665cb6a64ac6ca87da61207372a88b9e5a7bb697faade78069c853e4b1", size = 89526 },
]
[package.optional-dependencies]
all = [
+     { name = "accelerate" },
+     { name = "cohere" },
+     { name = "google-genai" },
+     { name = "huggingface-hub" },
+     { name = "jsonschema" },
+     { name = "magika" },
    { name = "model2vec" },
    { name = "numpy" },
    { name = "openai" },
+     { name = "pydantic" },
+     { name = "rich" },
    { name = "sentence-transformers" },
+     { name = "tiktoken" },
+     { name = "torch" },
+     { name = "transformers" },
+     { name = "tree-sitter" },
+     { name = "tree-sitter-language-pack" },
]
[[package]]
@@ -394,6 +440,26 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/7e/d4/7ebdbd03970677812aac39c869717059dbb71a4cfc033ca6e5221787892c/click-8.1.8-py3-none-any.whl", hash = "sha256:63c132bbbed01578a06712a2d1f497bb62d9c1c0d329b7903a866228027263b2", size = 98188 },
]
[[package]]
name = "cohere"
version = "5.15.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "fastavro" },
{ name = "httpx" },
{ name = "httpx-sse" },
{ name = "pydantic" },
{ name = "pydantic-core" },
{ name = "requests" },
{ name = "tokenizers" },
{ name = "types-requests" },
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/a1/33/69c7d1b25a20eafef4197a1444c7f87d5241e936194e54876ea8996157e6/cohere-5.15.0.tar.gz", hash = "sha256:e802d4718ddb0bb655654382ebbce002756a3800faac30296cde7f1bdc6ff2cc", size = 135021 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/c7/87/94694db7fe6df979fbc03286eaabdfa98f1c8fa532960e5afdf965e10960/cohere-5.15.0-py3-none-any.whl", hash = "sha256:22ff867c2a6f2fc2b585360c6072f584f11f275ef6d9242bac24e0fa2df1dfb5", size = 259522 },
]
[[package]] [[package]]
name = "colorama" name = "colorama"
version = "0.4.6" version = "0.4.6"
@ -534,6 +600,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/6e/c6/ac0b6c1e2d138f1002bcf799d330bd6d85084fece321e662a14223794041/Deprecated-1.2.18-py2.py3-none-any.whl", hash = "sha256:bd5011788200372a32418f888e326a09ff80d0214bd961147cfed01b5c018eec", size = 9998 }, { url = "https://files.pythonhosted.org/packages/6e/c6/ac0b6c1e2d138f1002bcf799d330bd6d85084fece321e662a14223794041/Deprecated-1.2.18-py2.py3-none-any.whl", hash = "sha256:bd5011788200372a32418f888e326a09ff80d0214bd961147cfed01b5c018eec", size = 9998 },
] ]
[[package]]
name = "dirtyjson"
version = "1.0.8"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/db/04/d24f6e645ad82ba0ef092fa17d9ef7a21953781663648a01c9371d9e8e98/dirtyjson-1.0.8.tar.gz", hash = "sha256:90ca4a18f3ff30ce849d100dcf4a003953c79d3a2348ef056f1d9c22231a25fd", size = 30782 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/68/69/1bcf70f81de1b4a9f21b3a62ec0c83bdff991c88d6cc2267d02408457e88/dirtyjson-1.0.8-py3-none-any.whl", hash = "sha256:125e27248435a58acace26d5c2c4c11a1c0de0a9c5124c5a94ba78e517d74f53", size = 25197 },
]
[[package]] [[package]]
name = "distro" name = "distro"
version = "1.9.0" version = "1.9.0"
@ -552,6 +627,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/68/1b/e0a87d256e40e8c888847551b20a017a6b98139178505dc7ffb96f04e954/dnspython-2.7.0-py3-none-any.whl", hash = "sha256:b4c34b7d10b51bcc3a5071e7b8dee77939f1e878477eeecc965e9835f63c6c86", size = 313632 }, { url = "https://files.pythonhosted.org/packages/68/1b/e0a87d256e40e8c888847551b20a017a6b98139178505dc7ffb96f04e954/dnspython-2.7.0-py3-none-any.whl", hash = "sha256:b4c34b7d10b51bcc3a5071e7b8dee77939f1e878477eeecc965e9835f63c6c86", size = 313632 },
] ]
[[package]]
name = "docutils"
version = "0.21.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/ae/ed/aefcc8cd0ba62a0560c3c18c33925362d46c6075480bfa4df87b28e169a9/docutils-0.21.2.tar.gz", hash = "sha256:3a6b18732edf182daa3cd12775bbb338cf5691468f91eeeb109deff6ebfa986f", size = 2204444 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/8f/d7/9322c609343d929e75e7e5e6255e614fcc67572cfd083959cdef3b7aad79/docutils-0.21.2-py3-none-any.whl", hash = "sha256:dafca5b9e384f0e419294eb4d2ff9fa826435bf15f15b7bd45723e8ad76811b2", size = 587408 },
]
[[package]] [[package]]
name = "effdet" name = "effdet"
version = "0.4.1" version = "0.4.1"
@ -660,6 +744,26 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/a6/08/9968963c1fb8c34627b7f1fbcdfe9438540f87dc7c9bfb59bb4fd19a4ecf/fastapi_users_db_sqlalchemy-7.0.0-py3-none-any.whl", hash = "sha256:5fceac018e7cfa69efc70834dd3035b3de7988eb4274154a0dbe8b14f5aa001e", size = 6891 }, { url = "https://files.pythonhosted.org/packages/a6/08/9968963c1fb8c34627b7f1fbcdfe9438540f87dc7c9bfb59bb4fd19a4ecf/fastapi_users_db_sqlalchemy-7.0.0-py3-none-any.whl", hash = "sha256:5fceac018e7cfa69efc70834dd3035b3de7988eb4274154a0dbe8b14f5aa001e", size = 6891 },
] ]
[[package]]
name = "fastavro"
version = "1.10.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/f3/67/7121d2221e998706cac00fa779ec44c1c943cb65e8a7ed1bd57d78d93f2c/fastavro-1.10.0.tar.gz", hash = "sha256:47bf41ac6d52cdfe4a3da88c75a802321321b37b663a900d12765101a5d6886f", size = 987970 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/9c/a4/8e69c0a5cd121e5d476237de1bde5a7947f791ae45768ae52ed0d3ea8d18/fastavro-1.10.0-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:cfe57cb0d72f304bd0dcc5a3208ca6a7363a9ae76f3073307d095c9d053b29d4", size = 1036343 },
{ url = "https://files.pythonhosted.org/packages/1e/01/aa219e2b33e5873d27b867ec0fad9f35f23d461114e1135a7e46c06786d2/fastavro-1.10.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:74e517440c824cb65fb29d3e3903a9406f4d7c75490cef47e55c4c82cdc66270", size = 3263368 },
{ url = "https://files.pythonhosted.org/packages/a7/ba/1766e2d7d95df2e95e9e9a089dc7a537c0616720b053a111a918fa7ee6b6/fastavro-1.10.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:203c17d44cadde76e8eecb30f2d1b4f33eb478877552d71f049265dc6f2ecd10", size = 3328933 },
{ url = "https://files.pythonhosted.org/packages/2e/40/26e56696b9696ab4fbba25a96b8037ca3f9fd8a8cc55b4b36400ef023e49/fastavro-1.10.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:6575be7f2b5f94023b5a4e766b0251924945ad55e9a96672dc523656d17fe251", size = 3258045 },
{ url = "https://files.pythonhosted.org/packages/4e/bc/2f6c92c06c5363372abe828bccdd95762f2c1983b261509f94189c38c8a1/fastavro-1.10.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:fe471deb675ed2f01ee2aac958fbf8ebb13ea00fa4ce7f87e57710a0bc592208", size = 3418001 },
{ url = "https://files.pythonhosted.org/packages/0c/ce/cfd16546c04ebbca1be80873b533c788cec76f7bfac231bfac6786047572/fastavro-1.10.0-cp312-cp312-win_amd64.whl", hash = "sha256:567ff515f2a5d26d9674b31c95477f3e6022ec206124c62169bc2ffaf0889089", size = 487855 },
{ url = "https://files.pythonhosted.org/packages/c9/c4/163cf154cc694c2dccc70cd6796db6214ac668a1260bf0310401dad188dc/fastavro-1.10.0-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:82263af0adfddb39c85f9517d736e1e940fe506dfcc35bc9ab9f85e0fa9236d8", size = 1022741 },
{ url = "https://files.pythonhosted.org/packages/38/01/a24598f5f31b8582a92fe9c41bf91caeed50d5b5eaa7576e6f8b23cb488d/fastavro-1.10.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:566c193109ff0ff84f1072a165b7106c4f96050078a4e6ac7391f81ca1ef3efa", size = 3237421 },
{ url = "https://files.pythonhosted.org/packages/a7/bf/08bcf65cfb7feb0e5b1329fafeb4a9b95b7b5ec723ba58c7dbd0d04ded34/fastavro-1.10.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e400d2e55d068404d9fea7c5021f8b999c6f9d9afa1d1f3652ec92c105ffcbdd", size = 3300222 },
{ url = "https://files.pythonhosted.org/packages/53/4d/a6c25f3166328f8306ec2e6be1123ed78a55b8ab774a43a661124508881f/fastavro-1.10.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:9b8227497f71565270f9249fc9af32a93644ca683a0167cfe66d203845c3a038", size = 3233276 },
{ url = "https://files.pythonhosted.org/packages/47/1c/b2b2ce2bf866a248ae23e96a87b3b8369427ff79be9112073039bee1d245/fastavro-1.10.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:8e62d04c65461b30ac6d314e4197ad666371e97ae8cb2c16f971d802f6c7f514", size = 3388936 },
{ url = "https://files.pythonhosted.org/packages/1f/2c/43927e22a2d57587b3aa09765098a6d833246b672d34c10c5f135414745a/fastavro-1.10.0-cp313-cp313-win_amd64.whl", hash = "sha256:86baf8c9740ab570d0d4d18517da71626fe9be4d1142bea684db52bd5adb078f", size = 483967 },
]
[[package]] [[package]]
name = "filelock" name = "filelock"
version = "3.17.0" version = "3.17.0"
@ -858,6 +962,24 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/94/b6/60f2910485d32f7bba92cc33e5053b3f29d61fccaa57e5e58c600bb7e0d2/google_cloud_vision-3.10.1-py3-none-any.whl", hash = "sha256:91959ea12b0d6a8442e30c0a5062cd305f349a4840f9184b5061b3153bbd8476", size = 526076 }, { url = "https://files.pythonhosted.org/packages/94/b6/60f2910485d32f7bba92cc33e5053b3f29d61fccaa57e5e58c600bb7e0d2/google_cloud_vision-3.10.1-py3-none-any.whl", hash = "sha256:91959ea12b0d6a8442e30c0a5062cd305f349a4840f9184b5061b3153bbd8476", size = 526076 },
] ]
[[package]]
name = "google-genai"
version = "1.12.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "anyio" },
{ name = "google-auth" },
{ name = "httpx" },
{ name = "pydantic" },
{ name = "requests" },
{ name = "typing-extensions" },
{ name = "websockets" },
]
sdist = { url = "https://files.pythonhosted.org/packages/38/9c/c907dbea921663bb7c41f415337bedd08259d17da8d156396c7237611744/google_genai-1.12.1.tar.gz", hash = "sha256:5c7eda422360643ce602a3f6b23152470ec1039310ef40080cbe4e71237f6391", size = 167752 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/29/2c/5b454dec837328eb167e78f45a14da502af223f8b94a4824e2fd0df74f19/google_genai-1.12.1-py3-none-any.whl", hash = "sha256:7cbc1bc029712946ce41bcf80c0eaa89eb8c09c308efbbfe30fd491f402c258a", size = 165940 },
]
[[package]]
name = "googleapis-common-protos"
version = "1.69.2"
@@ -903,6 +1025,18 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/ac/38/08cc303ddddc4b3d7c628c3039a61a3aae36c241ed01393d00c2fd663473/greenlet-3.1.1-cp313-cp313t-musllinux_1_1_x86_64.whl", hash = "sha256:411f015496fec93c1c8cd4e5238da364e1da7a124bcb293f085bf2860c32c6f6", size = 1142112 },
]
[[package]]
name = "griffe"
version = "1.7.3"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "colorama" },
]
sdist = { url = "https://files.pythonhosted.org/packages/a9/3e/5aa9a61f7c3c47b0b52a1d930302992229d191bf4bc76447b324b731510a/griffe-1.7.3.tar.gz", hash = "sha256:52ee893c6a3a968b639ace8015bec9d36594961e156e23315c8e8e51401fa50b", size = 395137 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/58/c6/5c20af38c2a57c15d87f7f38bee77d63c1d2a3689f74fefaf35915dd12b2/griffe-1.7.3-py3-none-any.whl", hash = "sha256:c6b3ee30c2f0f17f30bcdef5068d6ab7a2a4f1b8bf1a3e74b56fffd21e1c5f75", size = 129303 },
]
[[package]]
name = "grpcio"
version = "1.71.0"
@@ -1068,6 +1202,18 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/f0/0f/310fb31e39e2d734ccaa2c0fb981ee41f7bd5056ce9bc29b2248bd569169/humanfriendly-10.0-py2.py3-none-any.whl", hash = "sha256:1697e1a8a8f550fd43c2865cd84542fc175a61dcb779b6fee18cf6b6ccba1477", size = 86794 },
]
[[package]]
name = "id"
version = "1.5.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "requests" },
]
sdist = { url = "https://files.pythonhosted.org/packages/22/11/102da08f88412d875fa2f1a9a469ff7ad4c874b0ca6fed0048fe385bdb3d/id-1.5.0.tar.gz", hash = "sha256:292cb8a49eacbbdbce97244f47a97b4c62540169c976552e497fd57df0734c1d", size = 15237 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/9f/cb/18326d2d89ad3b0dd143da971e77afd1e6ca6674f1b1c3df4b6bec6279fc/id-1.5.0-py3-none-any.whl", hash = "sha256:f1434e1cef91f2cbb8a4ec64663d5a23b9ed43ef44c4c957d02583d61714c658", size = 13611 },
]
[[package]]
name = "idna"
version = "3.10"
@@ -1089,6 +1235,48 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/79/9d/0fb148dc4d6fa4a7dd1d8378168d9b4cd8d4560a6fbf6f0121c5fc34eb68/importlib_metadata-8.6.1-py3-none-any.whl", hash = "sha256:02a89390c1e15fdfdc0d7c6b25cb3e62650d0494005c97d6f148bf5b9787525e", size = 26971 },
]
[[package]]
name = "jaraco-classes"
version = "3.4.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "more-itertools" },
]
sdist = { url = "https://files.pythonhosted.org/packages/06/c0/ed4a27bc5571b99e3cff68f8a9fa5b56ff7df1c2251cc715a652ddd26402/jaraco.classes-3.4.0.tar.gz", hash = "sha256:47a024b51d0239c0dd8c8540c6c7f484be3b8fcf0b2d85c13825780d3b3f3acd", size = 11780 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/7f/66/b15ce62552d84bbfcec9a4873ab79d993a1dd4edb922cbfccae192bd5b5f/jaraco.classes-3.4.0-py3-none-any.whl", hash = "sha256:f662826b6bed8cace05e7ff873ce0f9283b5c924470fe664fff1c2f00f581790", size = 6777 },
]
[[package]]
name = "jaraco-context"
version = "6.0.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/df/ad/f3777b81bf0b6e7bc7514a1656d3e637b2e8e15fab2ce3235730b3e7a4e6/jaraco_context-6.0.1.tar.gz", hash = "sha256:9bae4ea555cf0b14938dc0aee7c9f32ed303aa20a3b73e7dc80111628792d1b3", size = 13912 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/ff/db/0c52c4cf5e4bd9f5d7135ec7669a3a767af21b3a308e1ed3674881e52b62/jaraco.context-6.0.1-py3-none-any.whl", hash = "sha256:f797fc481b490edb305122c9181830a3a5b76d84ef6d1aef2fb9b47ab956f9e4", size = 6825 },
]
[[package]]
name = "jaraco-functools"
version = "4.1.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "more-itertools" },
]
sdist = { url = "https://files.pythonhosted.org/packages/ab/23/9894b3df5d0a6eb44611c36aec777823fc2e07740dabbd0b810e19594013/jaraco_functools-4.1.0.tar.gz", hash = "sha256:70f7e0e2ae076498e212562325e805204fc092d7b4c17e0e86c959e249701a9d", size = 19159 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/9f/4f/24b319316142c44283d7540e76c7b5a6dbd5db623abd86bb7b3491c21018/jaraco.functools-4.1.0-py3-none-any.whl", hash = "sha256:ad159f13428bc4acbf5541ad6dec511f91573b90fba04df61dafa2a1231cf649", size = 10187 },
]
[[package]]
name = "jeepney"
version = "0.9.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/7b/6f/357efd7602486741aa73ffc0617fb310a29b588ed0fd69c2399acbb85b0c/jeepney-0.9.0.tar.gz", hash = "sha256:cf0e9e845622b81e4a28df94c40345400256ec608d0e55bb8a3feaa9163f5732", size = 106758 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/b2/a3/e137168c9c44d18eff0376253da9f1e9234d0239e0ee230d2fee6cea8e55/jeepney-0.9.0-py3-none-any.whl", hash = "sha256:97e5714520c16fc0a45695e5365a2e11b81ea79bba796e26f9f1d178cb182683", size = 49010 },
]
[[package]]
name = "jinja2"
version = "3.1.5"
@@ -1193,6 +1381,23 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/d1/0f/8910b19ac0670a0f80ce1008e5e751c4a57e14d2c4c13a482aa6079fa9d6/jsonschema_specifications-2024.10.1-py3-none-any.whl", hash = "sha256:a09a0680616357d9a0ecf05c12ad234479f549239d0f5b55f3deea67475da9bf", size = 18459 },
]
[[package]]
name = "keyring"
version = "25.6.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "jaraco-classes" },
{ name = "jaraco-context" },
{ name = "jaraco-functools" },
{ name = "jeepney", marker = "sys_platform == 'linux'" },
{ name = "pywin32-ctypes", marker = "sys_platform == 'win32'" },
{ name = "secretstorage", marker = "sys_platform == 'linux'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/70/09/d904a6e96f76ff214be59e7aa6ef7190008f52a0ab6689760a98de0bf37d/keyring-25.6.0.tar.gz", hash = "sha256:0b39998aa941431eb3d9b0d4b2460bc773b9df6fed7621c2dfb291a7e0187a66", size = 62750 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/d3/32/da7f44bcb1105d3e88a0b74ebdca50c59121d2ddf71c9e34ba47df7f3a56/keyring-25.6.0-py3-none-any.whl", hash = "sha256:552a3f7af126ece7ed5c89753650eec89c7eaae8617d0aa4d9ad2b75111266bd", size = 39085 },
]
[[package]]
name = "kiwisolver"
version = "1.4.8"
@@ -1413,6 +1618,19 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/8b/e4/5380e8229c442e406404977d2ec71a9db6a3e6a89fce7791c6ad7cd2bdbe/langsmith-0.3.8-py3-none-any.whl", hash = "sha256:fbb9dd97b0f090219447fca9362698d07abaeda1da85aa7cc6ec6517b36581b1", size = 332800 },
]
[[package]]
name = "linkup-sdk"
version = "0.2.4"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "httpx" },
{ name = "pydantic" },
]
sdist = { url = "https://files.pythonhosted.org/packages/c2/c7/d9a85331bf2611ecac67f1ad92a6ced641b2e2e93eea26b17a9af701b3d1/linkup_sdk-0.2.4.tar.gz", hash = "sha256:2b8fd1894b9b4715bc14aabcbf53df6def9024f2cc426f234cc59e1807ec4c12", size = 9392 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/18/d8/bb9e01328fe5ad979e3e459c0f76321d295663906deef56eeaa5ce0cf269/linkup_sdk-0.2.4-py3-none-any.whl", hash = "sha256:8bc4c4f34de93529136a14e42441d803868d681c2bf3fd59be51923e44f1f1d4", size = 8325 },
]
[[package]]
name = "litellm"
version = "1.61.4"
@@ -1435,6 +1653,72 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/f9/c2/1b6c502909b7af9054736af61e27558a3341e8c1ba28e7f82473e6dd936f/litellm-1.61.4-py3-none-any.whl", hash = "sha256:e87e0d397a191795b4217f9299fc9b21eaacaab91409695f0a4780cceccda6e1", size = 6814517 },
]
[[package]]
name = "llama-cloud"
version = "0.1.23"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "certifi" },
{ name = "httpx" },
{ name = "pydantic" },
]
sdist = { url = "https://files.pythonhosted.org/packages/5b/e4/d1a30167ed6690a408382be1cf7de220a506085f4371baaf067d65bad8fd/llama_cloud-0.1.23.tar.gz", hash = "sha256:3d84a24a860f046d39a106c06742ec0ea39a574ac42bbf91706fe025f44e233e", size = 101292 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/8a/15/3b56acef877dbc5d01d7e1a782c2cc50ef8a08d5773121c3bc20546de582/llama_cloud-0.1.23-py3-none-any.whl", hash = "sha256:ce95b0705d85c99b3b27b0af0d16a17d9a81b14c96bf13c1063a1bd13d8d0446", size = 267343 },
]
[[package]]
name = "llama-cloud-services"
version = "0.6.25"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "click" },
{ name = "llama-cloud" },
{ name = "llama-index-core" },
{ name = "platformdirs" },
{ name = "pydantic" },
{ name = "python-dotenv" },
]
sdist = { url = "https://files.pythonhosted.org/packages/79/c0/89f89dfc2c2b6c2d5c1c5fde9f445696eb12f9c2a4e17637ab0aaf7cc373/llama_cloud_services-0.6.25.tar.gz", hash = "sha256:3608004b0cf984640a3a36657b8b40394d7ce2c48e3eb9dd24fc654df7643595", size = 32303 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/e6/f1/99b8ef4a636dafd5f1ae1e1b19eb9f793f51573d782919bf01d9b9f797f4/llama_cloud_services-0.6.25-py3-none-any.whl", hash = "sha256:aef0afbbf0d6dc485e6566af2daeeefa8caa7bc7f6511d860036bc0aac15361b", size = 37231 },
]
[[package]]
name = "llama-index-core"
version = "0.12.39"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "aiohttp" },
{ name = "aiosqlite" },
{ name = "banks" },
{ name = "dataclasses-json" },
{ name = "deprecated" },
{ name = "dirtyjson" },
{ name = "filetype" },
{ name = "fsspec" },
{ name = "httpx" },
{ name = "nest-asyncio" },
{ name = "networkx" },
{ name = "nltk" },
{ name = "numpy" },
{ name = "pillow" },
{ name = "pydantic" },
{ name = "pyyaml" },
{ name = "requests" },
{ name = "sqlalchemy", extra = ["asyncio"] },
{ name = "tenacity" },
{ name = "tiktoken" },
{ name = "tqdm" },
{ name = "typing-extensions" },
{ name = "typing-inspect" },
{ name = "wrapt" },
]
sdist = { url = "https://files.pythonhosted.org/packages/f7/45/163806502804ff75ace474f868cc33158774c4eb31d565133f32932e930e/llama_index_core-0.12.39.tar.gz", hash = "sha256:0cca9de59953542a3c2f1db61327c5204e0b1e997f31f1200e49392b2879593a", size = 7292040 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/dd/a3/583d80764df75aefc9885f28dcc06a0e5aefc993fa5318186e70f2340d73/llama_index_core-0.12.39-py3-none-any.whl", hash = "sha256:c255ed87aa85e43893f2bb05870b61ce7701d7a6a931d174ba925def5856b4c2", size = 7664906 },
]
[[package]]
name = "lxml"
version = "5.3.1"
@@ -1477,6 +1761,24 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/80/83/8c54533b3576f4391eebea88454738978669a6cad0d8e23266224007939d/lxml-5.3.1-cp313-cp313-win_amd64.whl", hash = "sha256:91fb6a43d72b4f8863d21f347a9163eecbf36e76e2f51068d59cd004c506f332", size = 3814484 },
]
[[package]]
name = "magika"
version = "0.6.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "click" },
{ name = "numpy" },
{ name = "onnxruntime" },
{ name = "python-dotenv" },
]
sdist = { url = "https://files.pythonhosted.org/packages/6d/18/ea70f6abd36f455037340f12c8125918c726d08cd6e01f0b76b6884e0c38/magika-0.6.1.tar.gz", hash = "sha256:e3dd22c73936630b1cd79d0f412d6d9a53dc99ba5e3709b1ac53f56bc998e635", size = 3030234 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/1f/be/c9f7bb9ee94abe8d344b660672001313e459c67b867b24abe32d5c80a9ce/magika-0.6.1-py3-none-any.whl", hash = "sha256:15838d2469f1394d8e9598bc7fceea1ede7f35aebe9675c6b45c6b5c48315931", size = 2968516 },
{ url = "https://files.pythonhosted.org/packages/3c/b9/016b174520e81faef5edb31b6c7a73966dc84ee33acd23a2e7b775df7ba4/magika-0.6.1-py3-none-macosx_11_0_arm64.whl", hash = "sha256:dadd036296a2e4840fd48fa0712848fe122da438e8f607dc8f19ca4663c359dc", size = 12408519 },
{ url = "https://files.pythonhosted.org/packages/02/b7/e7dfeb235823a82d676c68a748541c24db0249b854f945f6e3cec11c1b7e/magika-0.6.1-py3-none-manylinux_2_28_x86_64.whl", hash = "sha256:133c0e1a844361de86ca2dd7c530e38b324e86177d30c52e36fd82101c190b5c", size = 15089294 },
{ url = "https://files.pythonhosted.org/packages/64/f0/bec5bff0125d08c1bc3baef88beeb910121085249f67b5994ea961615b55/magika-0.6.1-py3-none-win_amd64.whl", hash = "sha256:0342b6230ea9aea7ab4b8fa92e1b46f1cc62e724d452ee8d6821a37f56738d22", size = 12378455 },
]
[[package]]
name = "makefun"
version = "1.15.6"
@@ -1630,7 +1932,7 @@ wheels = [
[[package]]
name = "model2vec"
-version = "0.4.0"
+version = "0.4.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "jinja2" },
@@ -1642,9 +1944,18 @@ dependencies = [
{ name = "tokenizers" },
{ name = "tqdm" },
]
-sdist = { url = "https://files.pythonhosted.org/packages/83/e2/3fb7bd8c612f71ad3abded92e7401f97f1e71427d3a68a3fb85f39394b17/model2vec-0.4.0.tar.gz", hash = "sha256:48d4a3da040499b0090f736eb8f22ea0fdd35b67462d81d789c70004423adbae", size = 2486998 }
+sdist = { url = "https://files.pythonhosted.org/packages/b8/c1/3cd6cab10e8b7da8c32acebf85672d38a26f5f03165bfeaa617a5ec0bb61/model2vec-0.4.1.tar.gz", hash = "sha256:fc6038416679eebe448951708f2d0bebdee8510f47970af1c81a8f054a3c3f9f", size = 2660626 }
wheels = [
-{ url = "https://files.pythonhosted.org/packages/93/7d/39ff093c4e45303a06e3c5825c6144cbd21f18a1393a154bbf93232b0f1a/model2vec-0.4.0-py3-none-any.whl", hash = "sha256:df30685a55841c61c6638e4f329648e76b148507bd778801d7bfcd6b970a4f2f", size = 38593 },
+{ url = "https://files.pythonhosted.org/packages/cd/76/c8575f90f521017597c5e57e3bfef61e3f27d9cb6c741a82a24d72b10a60/model2vec-0.4.1-py3-none-any.whl", hash = "sha256:04a397a17da9b967082b6baa4c494f0be48c89ec4e1a3975b4f290f045238a38", size = 41972 },
]
[[package]]
name = "more-itertools"
version = "10.7.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/ce/a0/834b0cebabbfc7e311f30b46c8188790a37f89fc8d756660346fe5abfd09/more_itertools-10.7.0.tar.gz", hash = "sha256:9fddd5403be01a94b204faadcff459ec3568cf110265d3c54323e1e866ad29d3", size = 127671 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/2b/9f/7ba6f94fc1e9ac3d2b853fdff3035fb2fa5afbed898c4a72b8a020610594/more_itertools-10.7.0-py3-none-any.whl", hash = "sha256:d43980384673cb07d2f7d2d918c616b30c659c089ee23953f601d6609c67510e", size = 65278 },
] ]
[[package]]
@@ -1722,6 +2033,37 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/b9/54/dd730b32ea14ea797530a4479b2ed46a6fb250f682a9cfb997e968bf0261/networkx-3.4.2-py3-none-any.whl", hash = "sha256:df5d4365b724cf81b8c6a7312509d0c22386097011ad1abe274afd5e9d3bbc5f", size = 1723263 },
]
[[package]]
name = "nh3"
version = "0.2.21"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/37/30/2f81466f250eb7f591d4d193930df661c8c23e9056bdc78e365b646054d8/nh3-0.2.21.tar.gz", hash = "sha256:4990e7ee6a55490dbf00d61a6f476c9a3258e31e711e13713b2ea7d6616f670e", size = 16581 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/7f/81/b83775687fcf00e08ade6d4605f0be9c4584cb44c4973d9f27b7456a31c9/nh3-0.2.21-cp313-cp313t-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:fcff321bd60c6c5c9cb4ddf2554e22772bb41ebd93ad88171bbbb6f271255286", size = 1297678 },
{ url = "https://files.pythonhosted.org/packages/22/ee/d0ad8fb4b5769f073b2df6807f69a5e57ca9cea504b78809921aef460d20/nh3-0.2.21-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:31eedcd7d08b0eae28ba47f43fd33a653b4cdb271d64f1aeda47001618348fde", size = 733774 },
{ url = "https://files.pythonhosted.org/packages/ea/76/b450141e2d384ede43fe53953552f1c6741a499a8c20955ad049555cabc8/nh3-0.2.21-cp313-cp313t-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:d426d7be1a2f3d896950fe263332ed1662f6c78525b4520c8e9861f8d7f0d243", size = 760012 },
{ url = "https://files.pythonhosted.org/packages/97/90/1182275db76cd8fbb1f6bf84c770107fafee0cb7da3e66e416bcb9633da2/nh3-0.2.21-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:9d67709bc0d7d1f5797b21db26e7a8b3d15d21c9c5f58ccfe48b5328483b685b", size = 923619 },
{ url = "https://files.pythonhosted.org/packages/29/c7/269a7cfbec9693fad8d767c34a755c25ccb8d048fc1dfc7a7d86bc99375c/nh3-0.2.21-cp313-cp313t-musllinux_1_2_armv7l.whl", hash = "sha256:55823c5ea1f6b267a4fad5de39bc0524d49a47783e1fe094bcf9c537a37df251", size = 1000384 },
{ url = "https://files.pythonhosted.org/packages/68/a9/48479dbf5f49ad93f0badd73fbb48b3d769189f04c6c69b0df261978b009/nh3-0.2.21-cp313-cp313t-musllinux_1_2_i686.whl", hash = "sha256:818f2b6df3763e058efa9e69677b5a92f9bc0acff3295af5ed013da544250d5b", size = 918908 },
{ url = "https://files.pythonhosted.org/packages/d7/da/0279c118f8be2dc306e56819880b19a1cf2379472e3b79fc8eab44e267e3/nh3-0.2.21-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:b3b5c58161e08549904ac4abd450dacd94ff648916f7c376ae4b2c0652b98ff9", size = 909180 },
{ url = "https://files.pythonhosted.org/packages/26/16/93309693f8abcb1088ae143a9c8dbcece9c8f7fb297d492d3918340c41f1/nh3-0.2.21-cp313-cp313t-win32.whl", hash = "sha256:637d4a10c834e1b7d9548592c7aad760611415fcd5bd346f77fd8a064309ae6d", size = 532747 },
{ url = "https://files.pythonhosted.org/packages/a2/3a/96eb26c56cbb733c0b4a6a907fab8408ddf3ead5d1b065830a8f6a9c3557/nh3-0.2.21-cp313-cp313t-win_amd64.whl", hash = "sha256:713d16686596e556b65e7f8c58328c2df63f1a7abe1277d87625dcbbc012ef82", size = 528908 },
{ url = "https://files.pythonhosted.org/packages/ba/1d/b1ef74121fe325a69601270f276021908392081f4953d50b03cbb38b395f/nh3-0.2.21-cp38-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:a772dec5b7b7325780922dd904709f0f5f3a79fbf756de5291c01370f6df0967", size = 1316133 },
{ url = "https://files.pythonhosted.org/packages/b8/f2/2c7f79ce6de55b41e7715f7f59b159fd59f6cdb66223c05b42adaee2b645/nh3-0.2.21-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d002b648592bf3033adfd875a48f09b8ecc000abd7f6a8769ed86b6ccc70c759", size = 758328 },
{ url = "https://files.pythonhosted.org/packages/6d/ad/07bd706fcf2b7979c51b83d8b8def28f413b090cf0cb0035ee6b425e9de5/nh3-0.2.21-cp38-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:2a5174551f95f2836f2ad6a8074560f261cf9740a48437d6151fd2d4d7d617ab", size = 747020 },
{ url = "https://files.pythonhosted.org/packages/75/99/06a6ba0b8a0d79c3d35496f19accc58199a1fb2dce5e711a31be7e2c1426/nh3-0.2.21-cp38-abi3-manylinux_2_17_ppc64.manylinux2014_ppc64.whl", hash = "sha256:b8d55ea1fc7ae3633d758a92aafa3505cd3cc5a6e40470c9164d54dff6f96d42", size = 944878 },
{ url = "https://files.pythonhosted.org/packages/79/d4/dc76f5dc50018cdaf161d436449181557373869aacf38a826885192fc587/nh3-0.2.21-cp38-abi3-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:6ae319f17cd8960d0612f0f0ddff5a90700fa71926ca800e9028e7851ce44a6f", size = 903460 },
{ url = "https://files.pythonhosted.org/packages/cd/c3/d4f8037b2ab02ebf5a2e8637bd54736ed3d0e6a2869e10341f8d9085f00e/nh3-0.2.21-cp38-abi3-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:63ca02ac6f27fc80f9894409eb61de2cb20ef0a23740c7e29f9ec827139fa578", size = 839369 },
{ url = "https://files.pythonhosted.org/packages/11/a9/1cd3c6964ec51daed7b01ca4686a5c793581bf4492cbd7274b3f544c9abe/nh3-0.2.21-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a5f77e62aed5c4acad635239ac1290404c7e940c81abe561fd2af011ff59f585", size = 739036 },
{ url = "https://files.pythonhosted.org/packages/fd/04/bfb3ff08d17a8a96325010ae6c53ba41de6248e63cdb1b88ef6369a6cdfc/nh3-0.2.21-cp38-abi3-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:087ffadfdcd497658c3adc797258ce0f06be8a537786a7217649fc1c0c60c293", size = 768712 },
{ url = "https://files.pythonhosted.org/packages/9e/aa/cfc0bf545d668b97d9adea4f8b4598667d2b21b725d83396c343ad12bba7/nh3-0.2.21-cp38-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:ac7006c3abd097790e611fe4646ecb19a8d7f2184b882f6093293b8d9b887431", size = 930559 },
{ url = "https://files.pythonhosted.org/packages/78/9d/6f5369a801d3a1b02e6a9a097d56bcc2f6ef98cffebf03c4bb3850d8e0f0/nh3-0.2.21-cp38-abi3-musllinux_1_2_armv7l.whl", hash = "sha256:6141caabe00bbddc869665b35fc56a478eb774a8c1dfd6fba9fe1dfdf29e6efa", size = 1008591 },
{ url = "https://files.pythonhosted.org/packages/a6/df/01b05299f68c69e480edff608248313cbb5dbd7595c5e048abe8972a57f9/nh3-0.2.21-cp38-abi3-musllinux_1_2_i686.whl", hash = "sha256:20979783526641c81d2f5bfa6ca5ccca3d1e4472474b162c6256745fbfe31cd1", size = 925670 },
{ url = "https://files.pythonhosted.org/packages/3d/79/bdba276f58d15386a3387fe8d54e980fb47557c915f5448d8c6ac6f7ea9b/nh3-0.2.21-cp38-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:a7ea28cd49293749d67e4fcf326c554c83ec912cd09cd94aa7ec3ab1921c8283", size = 917093 },
{ url = "https://files.pythonhosted.org/packages/e7/d8/c6f977a5cd4011c914fb58f5ae573b071d736187ccab31bfb1d539f4af9f/nh3-0.2.21-cp38-abi3-win32.whl", hash = "sha256:6c9c30b8b0d291a7c5ab0967ab200598ba33208f754f2f4920e9343bdd88f79a", size = 537623 },
{ url = "https://files.pythonhosted.org/packages/23/fc/8ce756c032c70ae3dd1d48a3552577a325475af2a2f629604b44f571165c/nh3-0.2.21-cp38-abi3-win_amd64.whl", hash = "sha256:bb0014948f04d7976aabae43fcd4cb7f551f9f8ce785a4c9ef66e6c2590f8629", size = 535283 },
]
[[package]]
name = "nltk"
version = "3.9.1"
@@ -1751,18 +2093,40 @@ wheels = [
[[package]]
name = "numpy"
-version = "1.26.4"
+version = "2.2.5"
source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/65/6e/09db70a523a96d25e115e71cc56a6f9031e7b8cd166c1ac8438307c14058/numpy-1.26.4.tar.gz", hash = "sha256:2a02aba9ed12e4ac4eb3ea9421c420301a0c6460d9830d74a9df87efa4912010", size = 15786129 }
+sdist = { url = "https://files.pythonhosted.org/packages/dc/b2/ce4b867d8cd9c0ee84938ae1e6a6f7926ebf928c9090d036fc3c6a04f946/numpy-2.2.5.tar.gz", hash = "sha256:a9c0d994680cd991b1cb772e8b297340085466a6fe964bc9d4e80f5e2f43c291", size = 20273920 }
wheels = [
-{ url = "https://files.pythonhosted.org/packages/95/12/8f2020a8e8b8383ac0177dc9570aad031a3beb12e38847f7129bacd96228/numpy-1.26.4-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:b3ce300f3644fb06443ee2222c2201dd3a89ea6040541412b8fa189341847218", size = 20335901 },
+{ url = "https://files.pythonhosted.org/packages/e2/f7/1fd4ff108cd9d7ef929b8882692e23665dc9c23feecafbb9c6b80f4ec583/numpy-2.2.5-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:ee461a4eaab4f165b68780a6a1af95fb23a29932be7569b9fab666c407969051", size = 20948633 },
-{ url = "https://files.pythonhosted.org/packages/75/5b/ca6c8bd14007e5ca171c7c03102d17b4f4e0ceb53957e8c44343a9546dcc/numpy-1.26.4-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:03a8c78d01d9781b28a6989f6fa1bb2c4f2d51201cf99d3dd875df6fbd96b23b", size = 13685868 },
+{ url = "https://files.pythonhosted.org/packages/12/03/d443c278348371b20d830af155ff2079acad6a9e60279fac2b41dbbb73d8/numpy-2.2.5-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:ec31367fd6a255dc8de4772bd1658c3e926d8e860a0b6e922b615e532d320ddc", size = 14176123 },
-{ url = "https://files.pythonhosted.org/packages/79/f8/97f10e6755e2a7d027ca783f63044d5b1bc1ae7acb12afe6a9b4286eac17/numpy-1.26.4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9fad7dcb1aac3c7f0584a5a8133e3a43eeb2fe127f47e3632d43d677c66c102b", size = 13925109 },
+{ url = "https://files.pythonhosted.org/packages/2b/0b/5ca264641d0e7b14393313304da48b225d15d471250376f3fbdb1a2be603/numpy-2.2.5-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:47834cde750d3c9f4e52c6ca28a7361859fcaf52695c7dc3cc1a720b8922683e", size = 5163817 },
-{ url = "https://files.pythonhosted.org/packages/0f/50/de23fde84e45f5c4fda2488c759b69990fd4512387a8632860f3ac9cd225/numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:675d61ffbfa78604709862923189bad94014bef562cc35cf61d3a07bba02a7ed", size = 17950613 },
+{ url = "https://files.pythonhosted.org/packages/04/b3/d522672b9e3d28e26e1613de7675b441bbd1eaca75db95680635dd158c67/numpy-2.2.5-cp312-cp312-macosx_14_0_x86_64.whl", hash = "sha256:2c1a1c6ccce4022383583a6ded7bbcda22fc635eb4eb1e0a053336425ed36dfa", size = 6698066 },
-{ url = "https://files.pythonhosted.org/packages/4c/0c/9c603826b6465e82591e05ca230dfc13376da512b25ccd0894709b054ed0/numpy-1.26.4-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:ab47dbe5cc8210f55aa58e4805fe224dac469cde56b9f731a4c098b91917159a", size = 13572172 },
+{ url = "https://files.pythonhosted.org/packages/a0/93/0f7a75c1ff02d4b76df35079676b3b2719fcdfb39abdf44c8b33f43ef37d/numpy-2.2.5-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9d75f338f5f79ee23548b03d801d28a505198297534f62416391857ea0479571", size = 14087277 },
-{ url = "https://files.pythonhosted.org/packages/76/8c/2ba3902e1a0fc1c74962ea9bb33a534bb05984ad7ff9515bf8d07527cadd/numpy-1.26.4-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:1dda2e7b4ec9dd512f84935c5f126c8bd8b9f2fc001e9f54af255e8c5f16b0e0", size = 17786643 },
+{ url = "https://files.pythonhosted.org/packages/b0/d9/7c338b923c53d431bc837b5b787052fef9ae68a56fe91e325aac0d48226e/numpy-2.2.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3a801fef99668f309b88640e28d261991bfad9617c27beda4a3aec4f217ea073", size = 16135742 },
-{ url = "https://files.pythonhosted.org/packages/28/4a/46d9e65106879492374999e76eb85f87b15328e06bd1550668f79f7b18c6/numpy-1.26.4-cp312-cp312-win32.whl", hash = "sha256:50193e430acfc1346175fcbdaa28ffec49947a06918b7b92130744e81e640110", size = 5677803 },
+{ url = "https://files.pythonhosted.org/packages/2d/10/4dec9184a5d74ba9867c6f7d1e9f2e0fb5fe96ff2bf50bb6f342d64f2003/numpy-2.2.5-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:abe38cd8381245a7f49967a6010e77dbf3680bd3627c0fe4362dd693b404c7f8", size = 15581825 },
-{ url = "https://files.pythonhosted.org/packages/16/2e/86f24451c2d530c88daf997cb8d6ac622c1d40d19f5a031ed68a4b73a374/numpy-1.26.4-cp312-cp312-win_amd64.whl", hash = "sha256:08beddf13648eb95f8d867350f6a018a4be2e5ad54c8d8caed89ebca558b2818", size = 15517754 },
+{ url = "https://files.pythonhosted.org/packages/80/1f/2b6fcd636e848053f5b57712a7d1880b1565eec35a637fdfd0a30d5e738d/numpy-2.2.5-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:5a0ac90e46fdb5649ab6369d1ab6104bfe5854ab19b645bf5cda0127a13034ae", size = 17899600 },
{ url = "https://files.pythonhosted.org/packages/ec/87/36801f4dc2623d76a0a3835975524a84bd2b18fe0f8835d45c8eae2f9ff2/numpy-2.2.5-cp312-cp312-win32.whl", hash = "sha256:0cd48122a6b7eab8f06404805b1bd5856200e3ed6f8a1b9a194f9d9054631beb", size = 6312626 },
{ url = "https://files.pythonhosted.org/packages/8b/09/4ffb4d6cfe7ca6707336187951992bd8a8b9142cf345d87ab858d2d7636a/numpy-2.2.5-cp312-cp312-win_amd64.whl", hash = "sha256:ced69262a8278547e63409b2653b372bf4baff0870c57efa76c5703fd6543282", size = 12645715 },
{ url = "https://files.pythonhosted.org/packages/e2/a0/0aa7f0f4509a2e07bd7a509042967c2fab635690d4f48c6c7b3afd4f448c/numpy-2.2.5-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:059b51b658f4414fff78c6d7b1b4e18283ab5fa56d270ff212d5ba0c561846f4", size = 20935102 },
{ url = "https://files.pythonhosted.org/packages/7e/e4/a6a9f4537542912ec513185396fce52cdd45bdcf3e9d921ab02a93ca5aa9/numpy-2.2.5-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:47f9ed103af0bc63182609044b0490747e03bd20a67e391192dde119bf43d52f", size = 14191709 },
{ url = "https://files.pythonhosted.org/packages/be/65/72f3186b6050bbfe9c43cb81f9df59ae63603491d36179cf7a7c8d216758/numpy-2.2.5-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:261a1ef047751bb02f29dfe337230b5882b54521ca121fc7f62668133cb119c9", size = 5149173 },
{ url = "https://files.pythonhosted.org/packages/e5/e9/83e7a9432378dde5802651307ae5e9ea07bb72b416728202218cd4da2801/numpy-2.2.5-cp313-cp313-macosx_14_0_x86_64.whl", hash = "sha256:4520caa3807c1ceb005d125a75e715567806fed67e315cea619d5ec6e75a4191", size = 6684502 },
{ url = "https://files.pythonhosted.org/packages/ea/27/b80da6c762394c8ee516b74c1f686fcd16c8f23b14de57ba0cad7349d1d2/numpy-2.2.5-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:3d14b17b9be5f9c9301f43d2e2a4886a33b53f4e6fdf9ca2f4cc60aeeee76372", size = 14084417 },
{ url = "https://files.pythonhosted.org/packages/aa/fc/ebfd32c3e124e6a1043e19c0ab0769818aa69050ce5589b63d05ff185526/numpy-2.2.5-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2ba321813a00e508d5421104464510cc962a6f791aa2fca1c97b1e65027da80d", size = 16133807 },
{ url = "https://files.pythonhosted.org/packages/bf/9b/4cc171a0acbe4666f7775cfd21d4eb6bb1d36d3a0431f48a73e9212d2278/numpy-2.2.5-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:a4cbdef3ddf777423060c6f81b5694bad2dc9675f110c4b2a60dc0181543fac7", size = 15575611 },
{ url = "https://files.pythonhosted.org/packages/a3/45/40f4135341850df48f8edcf949cf47b523c404b712774f8855a64c96ef29/numpy-2.2.5-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:54088a5a147ab71a8e7fdfd8c3601972751ded0739c6b696ad9cb0343e21ab73", size = 17895747 },
{ url = "https://files.pythonhosted.org/packages/f8/4c/b32a17a46f0ffbde8cc82df6d3daeaf4f552e346df143e1b188a701a8f09/numpy-2.2.5-cp313-cp313-win32.whl", hash = "sha256:c8b82a55ef86a2d8e81b63da85e55f5537d2157165be1cb2ce7cfa57b6aef38b", size = 6309594 },
{ url = "https://files.pythonhosted.org/packages/13/ae/72e6276feb9ef06787365b05915bfdb057d01fceb4a43cb80978e518d79b/numpy-2.2.5-cp313-cp313-win_amd64.whl", hash = "sha256:d8882a829fd779f0f43998e931c466802a77ca1ee0fe25a3abe50278616b1471", size = 12638356 },
{ url = "https://files.pythonhosted.org/packages/79/56/be8b85a9f2adb688e7ded6324e20149a03541d2b3297c3ffc1a73f46dedb/numpy-2.2.5-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:e8b025c351b9f0e8b5436cf28a07fa4ac0204d67b38f01433ac7f9b870fa38c6", size = 20963778 },
{ url = "https://files.pythonhosted.org/packages/ff/77/19c5e62d55bff507a18c3cdff82e94fe174957bad25860a991cac719d3ab/numpy-2.2.5-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:8dfa94b6a4374e7851bbb6f35e6ded2120b752b063e6acdd3157e4d2bb922eba", size = 14207279 },
{ url = "https://files.pythonhosted.org/packages/75/22/aa11f22dc11ff4ffe4e849d9b63bbe8d4ac6d5fae85ddaa67dfe43be3e76/numpy-2.2.5-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:97c8425d4e26437e65e1d189d22dff4a079b747ff9c2788057bfb8114ce1e133", size = 5199247 },
{ url = "https://files.pythonhosted.org/packages/4f/6c/12d5e760fc62c08eded0394f62039f5a9857f758312bf01632a81d841459/numpy-2.2.5-cp313-cp313t-macosx_14_0_x86_64.whl", hash = "sha256:352d330048c055ea6db701130abc48a21bec690a8d38f8284e00fab256dc1376", size = 6711087 },
{ url = "https://files.pythonhosted.org/packages/ef/94/ece8280cf4218b2bee5cec9567629e61e51b4be501e5c6840ceb593db945/numpy-2.2.5-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:8b4c0773b6ada798f51f0f8e30c054d32304ccc6e9c5d93d46cb26f3d385ab19", size = 14059964 },
{ url = "https://files.pythonhosted.org/packages/39/41/c5377dac0514aaeec69115830a39d905b1882819c8e65d97fc60e177e19e/numpy-2.2.5-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:55f09e00d4dccd76b179c0f18a44f041e5332fd0e022886ba1c0bbf3ea4a18d0", size = 16121214 },
{ url = "https://files.pythonhosted.org/packages/db/54/3b9f89a943257bc8e187145c6bc0eb8e3d615655f7b14e9b490b053e8149/numpy-2.2.5-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:02f226baeefa68f7d579e213d0f3493496397d8f1cff5e2b222af274c86a552a", size = 15575788 },
{ url = "https://files.pythonhosted.org/packages/b1/c4/2e407e85df35b29f79945751b8f8e671057a13a376497d7fb2151ba0d290/numpy-2.2.5-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:c26843fd58f65da9491165072da2cccc372530681de481ef670dcc8e27cfb066", size = 17893672 },
{ url = "https://files.pythonhosted.org/packages/29/7e/d0b44e129d038dba453f00d0e29ebd6eaf2f06055d72b95b9947998aca14/numpy-2.2.5-cp313-cp313t-win32.whl", hash = "sha256:1a161c2c79ab30fe4501d5a2bbfe8b162490757cf90b7f05be8b80bc02f7bb8e", size = 6377102 },
{ url = "https://files.pythonhosted.org/packages/63/be/b85e4aa4bf42c6502851b971f1c326d583fcc68227385f92089cf50a7b45/numpy-2.2.5-cp313-cp313t-win_amd64.whl", hash = "sha256:d403c84991b5ad291d3809bace5e85f4bbf44a04bdc9a88ed2bb1807b3360bb8", size = 12750096 },
] ]
[[package]]
@@ -2219,6 +2583,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/cf/6c/41c21c6c8af92b9fea313aa47c75de49e2f9a467964ee33eb0135d47eb64/pillow-11.1.0-cp313-cp313t-win_arm64.whl", hash = "sha256:67cd427c68926108778a9005f2a04adbd5e67c442ed21d95389fe1d595458756", size = 2377651 },
]
[[package]]
name = "platformdirs"
version = "4.3.8"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/fe/8b/3c73abc9c759ecd3f1f7ceff6685840859e8070c4d947c93fae71f6a0bf2/platformdirs-4.3.8.tar.gz", hash = "sha256:3d512d96e16bcb959a814c9f348431070822a6496326a4be0911c40b5a74c2bc", size = 21362 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/fe/39/979e8e21520d4e47a0bbe349e2713c0aac6f3d853d0e5b34d76206c439aa/platformdirs-4.3.8-py3-none-any.whl", hash = "sha256:ff7059bb7eb1179e2685604f4aaf157cfd9535242bd23742eadc3c13542139b4", size = 18567 },
]
[[package]]
name = "playwright"
version = "1.50.0"
@@ -2237,6 +2610,12 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/bc/2b/e944e10c9b18e77e43d3bb4d6faa323f6cc27597db37b75bc3fd796adfd5/playwright-1.50.0-py3-none-win_amd64.whl", hash = "sha256:1859423da82de631704d5e3d88602d755462b0906824c1debe140979397d2e8d", size = 34784546 },
]
[[package]]
name = "progress"
version = "1.6"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/2a/68/d8412d1e0d70edf9791cbac5426dc859f4649afc22f2abbeb0d947cf70fd/progress-1.6.tar.gz", hash = "sha256:c9c86e98b5c03fa1fe11e3b67c1feda4788b8d0fe7336c2ff7d5644ccfba34cd", size = 7842 }
[[package]]
name = "propcache"
version = "0.2.1"
@@ -2576,6 +2955,19 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/6a/3e/b68c118422ec867fa7ab88444e1274aa40681c606d59ac27de5a5588f082/python_dotenv-1.0.1-py3-none-any.whl", hash = "sha256:f7b63ef50f1b690dddf550d03497b66d609393b40b564ed0d674909a68ebf16a", size = 19863 },
]
[[package]]
name = "python-ffmpeg"
version = "2.0.12"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "pyee" },
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/dd/4d/7ecffb341d646e016be76e36f5a42cb32f409c9ca21a57b68f067fad3fc7/python_ffmpeg-2.0.12.tar.gz", hash = "sha256:19ac80af5a064a2f53c245af1a909b2d7648ea045500d96d3bcd507b88d43dc7", size = 14126292 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/7f/6d/02e817aec661defe148cb9eb0c4eca2444846305f625c2243fb9f92a9045/python_ffmpeg-2.0.12-py3-none-any.whl", hash = "sha256:d86697da8dfb39335183e336d31baf42fb217468adf5ac97fd743898240faae3", size = 14411 },
]
[[package]]
name = "python-iso639"
version = "2025.2.18"
@@ -2641,6 +3033,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/eb/38/ac33370d784287baa1c3d538978b5e2ea064d4c1b93ffbd12826c190dd10/pytz-2025.1-py2.py3-none-any.whl", hash = "sha256:89dd22dca55b46eac6eda23b2d72721bf1bdfef212645d81513ef5d03038de57", size = 507930 },
]
[[package]]
name = "pywin32-ctypes"
version = "0.2.3"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/85/9f/01a1a99704853cb63f253eea009390c88e7131c67e66a0a02099a8c917cb/pywin32-ctypes-0.2.3.tar.gz", hash = "sha256:d162dc04946d704503b2edc4d55f3dba5c1d539ead017afa00142c38b9885755", size = 29471 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/de/3d/8161f7711c017e01ac9f008dfddd9410dff3674334c233bde66e7ba65bbf/pywin32_ctypes-0.2.3-py3-none-any.whl", hash = "sha256:8a1513379d709975552d202d942d9837758905c8d01eb82b8bcc30918929e7b8", size = 30756 },
]
[[package]]
name = "pyyaml"
version = "6.0.2"
@@ -2705,6 +3106,20 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/4b/43/ca3d1018b392f49131843648e10b08ace23afe8dad3bee5f136e4346b7cd/rapidfuzz-3.12.2-cp313-cp313-win_arm64.whl", hash = "sha256:69f6ecdf1452139f2b947d0c169a605de578efdb72cbb2373cb0a94edca1fd34", size = 863535 },
]
[[package]]
name = "readme-renderer"
version = "44.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "docutils" },
{ name = "nh3" },
{ name = "pygments" },
]
sdist = { url = "https://files.pythonhosted.org/packages/5a/a9/104ec9234c8448c4379768221ea6df01260cd6c2ce13182d4eac531c8342/readme_renderer-44.0.tar.gz", hash = "sha256:8712034eabbfa6805cacf1402b4eeb2a73028f72d1166d6f5cb7f9c047c5d1e1", size = 32056 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/e1/67/921ec3024056483db83953ae8e48079ad62b92db7880013ca77632921dd0/readme_renderer-44.0-py3-none-any.whl", hash = "sha256:2fbca89b81a08526aadf1357a8c2ae889ec05fb03f5da67f9769c9a592166151", size = 13310 },
]
[[package]]
name = "referencing"
version = "0.36.2"
@@ -2798,17 +3213,26 @@ flashrank = [
{ name = "flashrank" },
]
[[package]]
name = "rfc3986"
version = "2.0.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/85/40/1520d68bfa07ab5a6f065a186815fb6610c86fe957bc065754e47f7b0840/rfc3986-2.0.0.tar.gz", hash = "sha256:97aacf9dbd4bfd829baad6e6309fa6573aaf1be3f6fa735c8ab05e46cecb261c", size = 49026 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/ff/9a/9afaade874b2fa6c752c36f1548f718b5b83af81ed9b76628329dab81c1b/rfc3986-2.0.0-py2.py3-none-any.whl", hash = "sha256:50b1502b60e289cb37883f3dfd34532b8873c7de9f49bb546641ce9cbd256ebd", size = 31326 },
]
[[package]]
name = "rich"
-version = "13.9.4"
+version = "14.0.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "markdown-it-py" },
{ name = "pygments" },
]
-sdist = { url = "https://files.pythonhosted.org/packages/ab/3a/0316b28d0761c6734d6bc14e770d85506c986c85ffb239e688eeaab2c2bc/rich-13.9.4.tar.gz", hash = "sha256:439594978a49a09530cff7ebc4b5c7103ef57baf48d5ea3184f21d9a2befa098", size = 223149 }
+sdist = { url = "https://files.pythonhosted.org/packages/a1/53/830aa4c3066a8ab0ae9a9955976fb770fe9c6102117c8ec4ab3ea62d89e8/rich-14.0.0.tar.gz", hash = "sha256:82f1bc23a6a21ebca4ae0c45af9bdbc492ed20231dcb63f297d6d1021a9d5725", size = 224078 }
wheels = [
-{ url = "https://files.pythonhosted.org/packages/19/71/39c7c0d87f8d4e6c020a393182060eaefeeae6c01dab6a84ec346f2567df/rich-13.9.4-py3-none-any.whl", hash = "sha256:6049d5e6ec054bf2779ab3358186963bac2ea89175919d699e378b99738c2a90", size = 242424 },
+{ url = "https://files.pythonhosted.org/packages/0d/9b/63f4c7ebc259242c89b3acafdb37b41d1185c07ff0011164674e9076b491/rich-14.0.0-py3-none-any.whl", hash = "sha256:1c9491e1951aac09caffd42f448ee3d04e58923ffe14993f6e83068dc395d7e0", size = 243229 },
]
[[package]]
@@ -2954,6 +3378,19 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/e4/1f/5d46a8d94e9f6d2c913cbb109e57e7eed914de38ea99e2c4d69a9fc93140/scipy-1.15.1-cp313-cp313t-win_amd64.whl", hash = "sha256:bc7136626261ac1ed988dca56cfc4ab5180f75e0ee52e58f1e6aa74b5f3eacd5", size = 43181730 },
]
[[package]]
name = "secretstorage"
version = "3.3.3"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "cryptography", marker = "sys_platform != 'darwin'" },
{ name = "jeepney", marker = "sys_platform != 'darwin'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/53/a4/f48c9d79cb507ed1373477dbceaba7401fd8a23af63b837fa61f1dcd3691/SecretStorage-3.3.3.tar.gz", hash = "sha256:2403533ef369eca6d2ba81718576c5e0f564d5cca1b58f73a8b23e7d4eeebd77", size = 19739 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/54/24/b4293291fa1dd830f353d2cb163295742fa87f179fcc8a20a306a81978b7/SecretStorage-3.3.3-py3-none-any.whl", hash = "sha256:f356e6628222568e3af06f2eba8df495efa13b3b63081dafd4f7d9a7b7bc9f99", size = 15221 },
]
[[package]]
name = "sentence-transformers"
version = "3.4.1"
@@ -3063,9 +3500,23 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/d9/61/f2b52e107b1fc8944b33ef56bf6ac4ebbe16d91b94d2b87ce013bf63fb84/starlette-0.45.3-py3-none-any.whl", hash = "sha256:dfb6d332576f136ec740296c7e8bb8c8a7125044e7c6da30744718880cdd059d", size = 71507 },
]
[[package]]
name = "static-ffmpeg"
version = "2.13"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "filelock" },
{ name = "progress" },
{ name = "requests" },
{ name = "twine" },
]
wheels = [
{ url = "https://files.pythonhosted.org/packages/09/39/1a5d0603280dd681ec52a2a6717c05dab530190dff7887b7603740a1741b/static_ffmpeg-2.13-py3-none-any.whl", hash = "sha256:3bed55a7979f9de9d1eec1126b98774a1d41c2e323811f59973d54b9c94d6dac", size = 7586 },
]
[[package]]
name = "surf-new-backend"
-version = "0.0.6"
+version = "0.0.7"
source = { virtual = "." }
dependencies = [
{ name = "alembic" },
@@ -3078,14 +3529,18 @@ dependencies = [
{ name = "langchain-community" },
{ name = "langchain-unstructured" },
{ name = "langgraph" },
{ name = "linkup-sdk" },
{ name = "litellm" }, { name = "litellm" },
{ name = "llama-cloud-services" },
{ name = "markdownify" }, { name = "markdownify" },
{ name = "notion-client" }, { name = "notion-client" },
{ name = "pgvector" }, { name = "pgvector" },
{ name = "playwright" }, { name = "playwright" },
{ name = "python-ffmpeg" },
{ name = "rerankers", extra = ["flashrank"] }, { name = "rerankers", extra = ["flashrank"] },
{ name = "sentence-transformers" }, { name = "sentence-transformers" },
{ name = "slack-sdk" }, { name = "slack-sdk" },
{ name = "static-ffmpeg" },
{ name = "tavily-python" }, { name = "tavily-python" },
{ name = "unstructured", extra = ["all-docs"] }, { name = "unstructured", extra = ["all-docs"] },
{ name = "unstructured-client" }, { name = "unstructured-client" },
@ -3098,7 +3553,7 @@ dependencies = [
requires-dist = [ requires-dist = [
{ name = "alembic", specifier = ">=1.13.0" }, { name = "alembic", specifier = ">=1.13.0" },
{ name = "asyncpg", specifier = ">=0.30.0" }, { name = "asyncpg", specifier = ">=0.30.0" },
{ name = "chonkie", extras = ["all"], specifier = ">=0.4.1" }, { name = "chonkie", extras = ["all"], specifier = ">=1.0.6" },
{ name = "fastapi", specifier = ">=0.115.8" }, { name = "fastapi", specifier = ">=0.115.8" },
{ name = "fastapi-users", extras = ["oauth", "sqlalchemy"], specifier = ">=14.0.1" }, { name = "fastapi-users", extras = ["oauth", "sqlalchemy"], specifier = ">=14.0.1" },
{ name = "firecrawl-py", specifier = ">=1.12.0" }, { name = "firecrawl-py", specifier = ">=1.12.0" },
@@ -3106,14 +3561,18 @@ requires-dist = [
{ name = "langchain-community", specifier = ">=0.3.17" },
{ name = "langchain-unstructured", specifier = ">=0.1.6" },
{ name = "langgraph", specifier = ">=0.3.29" },
{ name = "linkup-sdk", specifier = ">=0.2.4" },
{ name = "litellm", specifier = ">=1.61.4" }, { name = "litellm", specifier = ">=1.61.4" },
{ name = "llama-cloud-services", specifier = ">=0.6.25" },
{ name = "markdownify", specifier = ">=0.14.1" }, { name = "markdownify", specifier = ">=0.14.1" },
{ name = "notion-client", specifier = ">=2.3.0" }, { name = "notion-client", specifier = ">=2.3.0" },
{ name = "pgvector", specifier = ">=0.3.6" }, { name = "pgvector", specifier = ">=0.3.6" },
{ name = "playwright", specifier = ">=1.50.0" }, { name = "playwright", specifier = ">=1.50.0" },
{ name = "python-ffmpeg", specifier = ">=2.0.12" },
{ name = "rerankers", extras = ["flashrank"], specifier = ">=0.7.1" }, { name = "rerankers", extras = ["flashrank"], specifier = ">=0.7.1" },
{ name = "sentence-transformers", specifier = ">=3.4.1" }, { name = "sentence-transformers", specifier = ">=3.4.1" },
{ name = "slack-sdk", specifier = ">=3.34.0" }, { name = "slack-sdk", specifier = ">=3.34.0" },
{ name = "static-ffmpeg", specifier = ">=2.13" },
{ name = "tavily-python", specifier = ">=0.3.2" }, { name = "tavily-python", specifier = ">=0.3.2" },
{ name = "unstructured", extras = ["all-docs"], specifier = ">=0.16.25" }, { name = "unstructured", extras = ["all-docs"], specifier = ">=0.16.25" },
{ name = "unstructured-client", specifier = ">=0.30.0" }, { name = "unstructured-client", specifier = ">=0.30.0" },
@ -3324,6 +3783,91 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/b6/1a/efeecb8d83705f2f4beac98d46f2148c95ecd7babfb31b5c0f1e7017e83d/transformers-4.48.3-py3-none-any.whl", hash = "sha256:78697f990f5ef350c23b46bf86d5081ce96b49479ab180b2de7687267de8fd36", size = 9669412 }, { url = "https://files.pythonhosted.org/packages/b6/1a/efeecb8d83705f2f4beac98d46f2148c95ecd7babfb31b5c0f1e7017e83d/transformers-4.48.3-py3-none-any.whl", hash = "sha256:78697f990f5ef350c23b46bf86d5081ce96b49479ab180b2de7687267de8fd36", size = 9669412 },
] ]
[[package]]
name = "tree-sitter"
version = "0.24.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/a7/a2/698b9d31d08ad5558f8bfbfe3a0781bd4b1f284e89bde3ad18e05101a892/tree-sitter-0.24.0.tar.gz", hash = "sha256:abd95af65ca2f4f7eca356343391ed669e764f37748b5352946f00f7fc78e734", size = 168304 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/e9/57/3a590f287b5aa60c07d5545953912be3d252481bf5e178f750db75572bff/tree_sitter-0.24.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:14beeff5f11e223c37be7d5d119819880601a80d0399abe8c738ae2288804afc", size = 140788 },
{ url = "https://files.pythonhosted.org/packages/61/0b/fc289e0cba7dbe77c6655a4dd949cd23c663fd62a8b4d8f02f97e28d7fe5/tree_sitter-0.24.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:26a5b130f70d5925d67b47db314da209063664585a2fd36fa69e0717738efaf4", size = 133945 },
{ url = "https://files.pythonhosted.org/packages/86/d7/80767238308a137e0b5b5c947aa243e3c1e3e430e6d0d5ae94b9a9ffd1a2/tree_sitter-0.24.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5fc5c3c26d83c9d0ecb4fc4304fba35f034b7761d35286b936c1db1217558b4e", size = 564819 },
{ url = "https://files.pythonhosted.org/packages/bf/b3/6c5574f4b937b836601f5fb556b24804b0a6341f2eb42f40c0e6464339f4/tree_sitter-0.24.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:772e1bd8c0931c866b848d0369b32218ac97c24b04790ec4b0e409901945dd8e", size = 579303 },
{ url = "https://files.pythonhosted.org/packages/0a/f4/bd0ddf9abe242ea67cca18a64810f8af230fc1ea74b28bb702e838ccd874/tree_sitter-0.24.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:24a8dd03b0d6b8812425f3b84d2f4763322684e38baf74e5bb766128b5633dc7", size = 581054 },
{ url = "https://files.pythonhosted.org/packages/8c/1c/ff23fa4931b6ef1bbeac461b904ca7e49eaec7e7e5398584e3eef836ec96/tree_sitter-0.24.0-cp312-cp312-win_amd64.whl", hash = "sha256:f9e8b1605ab60ed43803100f067eed71b0b0e6c1fb9860a262727dbfbbb74751", size = 120221 },
{ url = "https://files.pythonhosted.org/packages/b2/2a/9979c626f303177b7612a802237d0533155bf1e425ff6f73cc40f25453e2/tree_sitter-0.24.0-cp312-cp312-win_arm64.whl", hash = "sha256:f733a83d8355fc95561582b66bbea92ffd365c5d7a665bc9ebd25e049c2b2abb", size = 108234 },
{ url = "https://files.pythonhosted.org/packages/61/cd/2348339c85803330ce38cee1c6cbbfa78a656b34ff58606ebaf5c9e83bd0/tree_sitter-0.24.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:0d4a6416ed421c4210f0ca405a4834d5ccfbb8ad6692d4d74f7773ef68f92071", size = 140781 },
{ url = "https://files.pythonhosted.org/packages/8b/a3/1ea9d8b64e8dcfcc0051028a9c84a630301290995cd6e947bf88267ef7b1/tree_sitter-0.24.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:e0992d483677e71d5c5d37f30dfb2e3afec2f932a9c53eec4fca13869b788c6c", size = 133928 },
{ url = "https://files.pythonhosted.org/packages/fe/ae/55c1055609c9428a4aedf4b164400ab9adb0b1bf1538b51f4b3748a6c983/tree_sitter-0.24.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:57277a12fbcefb1c8b206186068d456c600dbfbc3fd6c76968ee22614c5cd5ad", size = 564497 },
{ url = "https://files.pythonhosted.org/packages/ce/d0/f2ffcd04882c5aa28d205a787353130cbf84b2b8a977fd211bdc3b399ae3/tree_sitter-0.24.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d25fa22766d63f73716c6fec1a31ee5cf904aa429484256bd5fdf5259051ed74", size = 578917 },
{ url = "https://files.pythonhosted.org/packages/af/82/aebe78ea23a2b3a79324993d4915f3093ad1af43d7c2208ee90be9273273/tree_sitter-0.24.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:7d5d9537507e1c8c5fa9935b34f320bfec4114d675e028f3ad94f11cf9db37b9", size = 581148 },
{ url = "https://files.pythonhosted.org/packages/a1/b4/6b0291a590c2b0417cfdb64ccb8ea242f270a46ed429c641fbc2bfab77e0/tree_sitter-0.24.0-cp313-cp313-win_amd64.whl", hash = "sha256:f58bb4956917715ec4d5a28681829a8dad5c342cafd4aea269f9132a83ca9b34", size = 120207 },
{ url = "https://files.pythonhosted.org/packages/a8/18/542fd844b75272630229c9939b03f7db232c71a9d82aadc59c596319ea6a/tree_sitter-0.24.0-cp313-cp313-win_arm64.whl", hash = "sha256:23641bd25dcd4bb0b6fa91b8fb3f46cc9f1c9f475efe4d536d3f1f688d1b84c8", size = 108232 },
]
[[package]]
name = "tree-sitter-c-sharp"
version = "0.23.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/22/85/a61c782afbb706a47d990eaee6977e7c2bd013771c5bf5c81c617684f286/tree_sitter_c_sharp-0.23.1.tar.gz", hash = "sha256:322e2cfd3a547a840375276b2aea3335fa6458aeac082f6c60fec3f745c967eb", size = 1317728 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/58/04/f6c2df4c53a588ccd88d50851155945cff8cd887bd70c175e00aaade7edf/tree_sitter_c_sharp-0.23.1-cp39-abi3-macosx_10_9_x86_64.whl", hash = "sha256:2b612a6e5bd17bb7fa2aab4bb6fc1fba45c94f09cb034ab332e45603b86e32fd", size = 372235 },
{ url = "https://files.pythonhosted.org/packages/99/10/1aa9486f1e28fc22810fa92cbdc54e1051e7f5536a5e5b5e9695f609b31e/tree_sitter_c_sharp-0.23.1-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:1a8b98f62bc53efcd4d971151950c9b9cd5cbe3bacdb0cd69fdccac63350d83e", size = 419046 },
{ url = "https://files.pythonhosted.org/packages/0f/21/13df29f8fcb9ba9f209b7b413a4764b673dfd58989a0dd67e9c7e19e9c2e/tree_sitter_c_sharp-0.23.1-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:986e93d845a438ec3c4416401aa98e6a6f6631d644bbbc2e43fcb915c51d255d", size = 415999 },
{ url = "https://files.pythonhosted.org/packages/ca/72/fc6846795bcdae2f8aa94cc8b1d1af33d634e08be63e294ff0d6794b1efc/tree_sitter_c_sharp-0.23.1-cp39-abi3-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a8024e466b2f5611c6dc90321f232d8584893c7fb88b75e4a831992f877616d2", size = 402830 },
{ url = "https://files.pythonhosted.org/packages/fe/3a/b6028c5890ce6653807d5fa88c72232c027c6ceb480dbeb3b186d60e5971/tree_sitter_c_sharp-0.23.1-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:7f9bf876866835492281d336b9e1f9626ab668737f74e914c31d285261507da7", size = 397880 },
{ url = "https://files.pythonhosted.org/packages/47/d2/4facaa34b40f8104d8751746d0e1cd2ddf0beb9f1404b736b97f372bd1f3/tree_sitter_c_sharp-0.23.1-cp39-abi3-win_amd64.whl", hash = "sha256:ae9a9e859e8f44e2b07578d44f9a220d3fa25b688966708af6aa55d42abeebb3", size = 377562 },
{ url = "https://files.pythonhosted.org/packages/d8/88/3cf6bd9959d94d1fec1e6a9c530c5f08ff4115a474f62aedb5fedb0f7241/tree_sitter_c_sharp-0.23.1-cp39-abi3-win_arm64.whl", hash = "sha256:c81548347a93347be4f48cb63ec7d60ef4b0efa91313330e69641e49aa5a08c5", size = 375157 },
]
[[package]]
name = "tree-sitter-embedded-template"
version = "0.23.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/28/d6/5a58ea2f0480f5ed188b733114a8c275532a2fd1568b3898793b13d28af5/tree_sitter_embedded_template-0.23.2.tar.gz", hash = "sha256:7b24dcf2e92497f54323e617564d36866230a8bfb719dbb7b45b461510dcddaa", size = 8471 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/ef/c1/be0c48ed9609b720e74ade86f24ea086e353fe9c7405ee9630c3d52d09a2/tree_sitter_embedded_template-0.23.2-cp39-abi3-macosx_10_9_x86_64.whl", hash = "sha256:a505c2d2494464029d79db541cab52f6da5fb326bf3d355e69bf98b84eb89ae0", size = 9554 },
{ url = "https://files.pythonhosted.org/packages/6d/a5/7c12f5d302525ee36d1eafc28a68e4454da5bad208436d547326bee4ed76/tree_sitter_embedded_template-0.23.2-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:28028b93b42cc3753261ae7ce066675d407f59de512417524f9c3ab7792b1d37", size = 10051 },
{ url = "https://files.pythonhosted.org/packages/cd/87/95aaba8b64b849200bd7d4ae510cc394ecaef46a031499cbff301766970d/tree_sitter_embedded_template-0.23.2-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ec399d59ce93ffb60759a2d96053eed529f3c3f6a27128f261710d0d0de60e10", size = 17532 },
{ url = "https://files.pythonhosted.org/packages/13/f8/8c837b898f00b35f9f3f76a4abc525e80866a69343083c9ff329e17ecb03/tree_sitter_embedded_template-0.23.2-cp39-abi3-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bcfa01f62b88d50dbcb736cc23baec8ddbfe08daacfdc613eee8c04ab65efd09", size = 17394 },
{ url = "https://files.pythonhosted.org/packages/89/9b/893adf9e465d2d7f14870871bf2f3b30045e5ac417cb596f667a72eda493/tree_sitter_embedded_template-0.23.2-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:6debd24791466f887109a433c31aa4a5deeba2b217817521c745a4e748a944ed", size = 16439 },
{ url = "https://files.pythonhosted.org/packages/40/96/e79934572723673db9f867000500c6eea61a37705e02c7aee9ee031bbb6f/tree_sitter_embedded_template-0.23.2-cp39-abi3-win_amd64.whl", hash = "sha256:158fecb38be5b15db0190ef7238e5248f24bf32ae3cab93bc1197e293a5641eb", size = 12572 },
{ url = "https://files.pythonhosted.org/packages/63/06/27f678b9874e4e2e39ddc6f5cce3374c8c60e6046ea8588a491ab6fc9fcb/tree_sitter_embedded_template-0.23.2-cp39-abi3-win_arm64.whl", hash = "sha256:9f1f3b79fe273f3d15a5b64c85fc6ebfb48decfbe8542accd05f5b7694860df0", size = 11232 },
]
[[package]]
name = "tree-sitter-language-pack"
version = "0.7.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "tree-sitter" },
{ name = "tree-sitter-c-sharp" },
{ name = "tree-sitter-embedded-template" },
{ name = "tree-sitter-yaml" },
]
sdist = { url = "https://files.pythonhosted.org/packages/9b/1e/2d63d93025fd5b527327c3fd348955cebaec02a3f1bcec88ab4d88ddfc39/tree_sitter_language_pack-0.7.2.tar.gz", hash = "sha256:46fc96cc3bddfee7091fdedec2ae7e34218679e58241e8319bf82026f6d02eae", size = 59264078 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/da/9d/2c6272bf4fd18a22d8c07d3c983940dbece4f0e9e21f5c78f15a2740f435/tree_sitter_language_pack-0.7.2-cp39-abi3-macosx_10_13_universal2.whl", hash = "sha256:4036603020bd32060d9931a64f8c3d8637de575f350f11534971012e51a27a95", size = 28132977 },
{ url = "https://files.pythonhosted.org/packages/2b/e2/0f2511019c27b870061f9ad719074095ef84cd7857a730765bfa066384be/tree_sitter_language_pack-0.7.2-cp39-abi3-manylinux2014_aarch64.whl", hash = "sha256:801926dbc81eeca4ce97b846cc899dcf3fecfdc3b2514a68eeeb118f70ac686d", size = 17576769 },
{ url = "https://files.pythonhosted.org/packages/3a/88/7b38233def5c359503ad4d36533f96f9fe2943a8eeeced66b36312c49e1b/tree_sitter_language_pack-0.7.2-cp39-abi3-manylinux2014_x86_64.whl", hash = "sha256:77be80335fb585f48eb268b0e07ca54f3da8f30c2eab7be749113f116c3ef316", size = 17433872 },
{ url = "https://files.pythonhosted.org/packages/f8/27/fc5dce240b68a1ed876bc80b2238fbaaa0f695dbaf88660728a0239a2b20/tree_sitter_language_pack-0.7.2-cp39-abi3-win_amd64.whl", hash = "sha256:d71c6b4c14b3370ca783319ede7a581a10e6dd1bdfe5d31d316d9216981a6406", size = 14316050 },
]
[[package]]
name = "tree-sitter-yaml"
version = "0.7.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/93/04/6de8be8112c50450cab753fcd6b74d8368c60f6099bf551cee0bec69563a/tree_sitter_yaml-0.7.0.tar.gz", hash = "sha256:9c8bb17d9755c3b0e757260917240c0d19883cd3b59a5d74f205baa8bf8435a4", size = 85085 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/69/1d/243dbdf59fae8a4109e19f0994e2627ddedb2e16b7cf99bd42be64367742/tree_sitter_yaml-0.7.0-cp39-abi3-macosx_10_9_x86_64.whl", hash = "sha256:e21553ac190ae05bf82796df8beb4d9158ba195b5846018cb36fbc3a35bd0679", size = 43335 },
{ url = "https://files.pythonhosted.org/packages/e2/63/e5d5868a1498e20fd07e7db62933766fd64950279862e3e7f150b88ec69d/tree_sitter_yaml-0.7.0-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:c022054f1f9b54201082ea83073a6c24c42d0436ad8ee99ff2574cba8f928c28", size = 44574 },
{ url = "https://files.pythonhosted.org/packages/f5/ba/9cff9a3fddb1b6b38bc71ce1dfdb8892ab15a4042c104f4582e30318b412/tree_sitter_yaml-0.7.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:1cd1725142f19e41c51d27c99cfc60780f596e069eb181cfa6433d993a19aa3d", size = 93088 },
{ url = "https://files.pythonhosted.org/packages/19/09/39d29d9a22cee0b3c3e4f3fdbd23e4534b9c2a84b5f962f369eafcfbf88c/tree_sitter_yaml-0.7.0-cp39-abi3-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9d1b268378254f75bb27396d83c96d886ccbfcda6bd8c2778e94e3e1d2459085", size = 91367 },
{ url = "https://files.pythonhosted.org/packages/b0/b7/285653b894b351436917b5fe5e738eecaeb2128b4e4bf72bfe0c6043f62e/tree_sitter_yaml-0.7.0-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:27c2e7f4f49ddf410003abbb82a7b00ec77ea263d8ef08dbce1a15d293eed2fd", size = 87405 },
{ url = "https://files.pythonhosted.org/packages/bb/73/0cdc82ea653c190475a4f63dd4a1f4efd5d1c7d09d2668b8d84008a4c4f8/tree_sitter_yaml-0.7.0-cp39-abi3-win_amd64.whl", hash = "sha256:98dce0d6bc376f842cfb1d3c32512eea95b37e61cd2c87074bb4b05c999917c8", size = 45360 },
{ url = "https://files.pythonhosted.org/packages/2e/32/af2d676b0176a958f22a75b04be836e09476a10844baab78c018a5030297/tree_sitter_yaml-0.7.0-cp39-abi3-win_arm64.whl", hash = "sha256:f0f8d8e05fa8e70f08d0f18a209d6026e171844f4ea7090e7c779b9c375b3a31", size = 43650 },
]
[[package]]
name = "triton"
version = "3.2.0"
@ -3333,6 +3877,38 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/c7/30/37a3384d1e2e9320331baca41e835e90a3767303642c7a80d4510152cbcf/triton-3.2.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e5dfa23ba84541d7c0a531dfce76d8bcd19159d50a4a8b14ad01e91734a5c1b0", size = 253154278 }, { url = "https://files.pythonhosted.org/packages/c7/30/37a3384d1e2e9320331baca41e835e90a3767303642c7a80d4510152cbcf/triton-3.2.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e5dfa23ba84541d7c0a531dfce76d8bcd19159d50a4a8b14ad01e91734a5c1b0", size = 253154278 },
] ]
[[package]]
name = "twine"
version = "6.1.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "id" },
{ name = "keyring", marker = "platform_machine != 'ppc64le' and platform_machine != 's390x'" },
{ name = "packaging" },
{ name = "readme-renderer" },
{ name = "requests" },
{ name = "requests-toolbelt" },
{ name = "rfc3986" },
{ name = "rich" },
{ name = "urllib3" },
]
sdist = { url = "https://files.pythonhosted.org/packages/c8/a2/6df94fc5c8e2170d21d7134a565c3a8fb84f9797c1dd65a5976aaf714418/twine-6.1.0.tar.gz", hash = "sha256:be324f6272eff91d07ee93f251edf232fc647935dd585ac003539b42404a8dbd", size = 168404 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/7c/b6/74e927715a285743351233f33ea3c684528a0d374d2e43ff9ce9585b73fe/twine-6.1.0-py3-none-any.whl", hash = "sha256:a47f973caf122930bf0fbbf17f80b83bc1602c9ce393c7845f289a3001dc5384", size = 40791 },
]
[[package]]
name = "types-requests"
version = "2.32.0.20250328"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "urllib3" },
]
sdist = { url = "https://files.pythonhosted.org/packages/00/7d/eb174f74e3f5634eaacb38031bbe467dfe2e545bc255e5c90096ec46bc46/types_requests-2.32.0.20250328.tar.gz", hash = "sha256:c9e67228ea103bd811c96984fac36ed2ae8da87a36a633964a21f199d60baf32", size = 22995 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/cc/15/3700282a9d4ea3b37044264d3e4d1b1f0095a4ebf860a99914fd544e3be3/types_requests-2.32.0.20250328-py3-none-any.whl", hash = "sha256:72ff80f84b15eb3aa7a8e2625fffb6a93f2ad5a0c20215fc1dcfa61117bcb2a2", size = 20663 },
]
[[package]]
name = "typing-extensions"
version = "4.12.2"
@ -3366,7 +3942,7 @@ wheels = [
[[package]]
name = "unstructured"
version = "0.17.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "backoff" },
@ -3391,9 +3967,9 @@ dependencies = [
{ name = "unstructured-client" }, { name = "unstructured-client" },
{ name = "wrapt" }, { name = "wrapt" },
] ]
sdist = { url = "https://files.pythonhosted.org/packages/64/31/98c4c78e305d1294888adf87fd5ee30577a4c393951341ca32b43f167f1e/unstructured-0.16.25.tar.gz", hash = "sha256:73b9b0f51dbb687af572ecdb849a6811710b9cac797ddeab8ee80fa07d8aa5e6", size = 1683097 } sdist = { url = "https://files.pythonhosted.org/packages/b4/49/b95ff4b609d7328cd0394ac9d8ad69839e11a1f879462496afcf4887154a/unstructured-0.17.2.tar.gz", hash = "sha256:af18c3caef0a6c562cf77e34ee8b6ff522b605031d2336ffe565df66f126aa46", size = 1684745 }
wheels = [ wheels = [
{ url = "https://files.pythonhosted.org/packages/12/4f/ad08585b5c8a33c82ea119494c4d3023f4796958c56e668b15cc282ec0a0/unstructured-0.16.25-py3-none-any.whl", hash = "sha256:14719ccef2830216cf1c5bf654f75e2bf07b17ca5dcee9da5ac74618130fd337", size = 1769286 }, { url = "https://files.pythonhosted.org/packages/cb/88/061a9dedd4e8cc0c31097c3275a9ef1fd7307e26afac5cd582487386e1b8/unstructured-0.17.2-py3-none-any.whl", hash = "sha256:527dd26a4b273aebef2f9119c9d4f0d0ce17640038d92296d23abe89be123840", size = 1771563 },
] ]
[package.optional-dependencies] [package.optional-dependencies]
@ -3403,6 +3979,7 @@ all-docs = [
{ name = "markdown" }, { name = "markdown" },
{ name = "networkx" }, { name = "networkx" },
{ name = "onnx" }, { name = "onnx" },
{ name = "onnxruntime" },
{ name = "openpyxl" }, { name = "openpyxl" },
{ name = "pandas" }, { name = "pandas" },
{ name = "pdf2image" }, { name = "pdf2image" },

View file

@ -1,7 +1,7 @@
{
"name": "surfsense_browser_extension",
"displayName": "Surfsense Browser Extension",
"version": "0.0.7",
"description": "Extension to collect Browsing History for SurfSense.",
"author": "https://github.com/MODSetter",
"scripts": {
View file
@ -1 +1,3 @@
NEXT_PUBLIC_FASTAPI_BACKEND_URL=http://localhost:8000
NEXT_PUBLIC_FASTAPI_BACKEND_AUTH_TYPE=LOCAL or GOOGLE
NEXT_PUBLIC_ETL_SERVICE=UNSTRUCTURED or LLAMACLOUD
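The two new variables above are read at build time by the Next.js frontend. A minimal illustrative sketch, not part of the diff, of how client code can branch on them; the variable names come from the env example, the exported constants are assumptions for the example:

// config-flags.ts (illustrative only)
export const isGoogleAuth =
  process.env.NEXT_PUBLIC_FASTAPI_BACKEND_AUTH_TYPE === "GOOGLE";

export const usesLlamaCloud =
  process.env.NEXT_PUBLIC_ETL_SERVICE === "LLAMACLOUD";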
View file
@ -3,7 +3,7 @@
import { useState, useEffect } from 'react';
import { motion, AnimatePresence } from 'framer-motion';
import { useSearchParams } from 'next/navigation';
import { MessageCircleMore, Search, Calendar, Tag, Trash2, ExternalLink, MoreHorizontal, Radio, CheckCircle, Circle, Podcast } from 'lucide-react';
import { format } from 'date-fns';
// UI Components
@ -42,6 +42,9 @@ import {
SelectTrigger,
SelectValue,
} from "@/components/ui/select";
import { Checkbox } from "@/components/ui/checkbox";
import { Label } from "@/components/ui/label";
import { toast } from "sonner";
interface Chat {
created_at: string;
@ -92,6 +95,18 @@ export default function ChatsPageClient({ searchSpaceId }: ChatsPageClientProps)
const [chatToDelete, setChatToDelete] = useState<{ id: number, title: string } | null>(null);
const [isDeleting, setIsDeleting] = useState(false);
// New state for podcast generation
const [selectedChats, setSelectedChats] = useState<number[]>([]);
const [selectionMode, setSelectionMode] = useState(false);
const [podcastDialogOpen, setPodcastDialogOpen] = useState(false);
const [podcastTitle, setPodcastTitle] = useState("");
const [isGeneratingPodcast, setIsGeneratingPodcast] = useState(false);
// New state for individual podcast generation
const [currentChatIndex, setCurrentChatIndex] = useState(0);
const [podcastTitles, setPodcastTitles] = useState<{[key: number]: string}>({});
const [processingChat, setProcessingChat] = useState<Chat | null>(null);
const chatsPerPage = 9;
const searchParams = useSearchParams();
@ -234,6 +249,177 @@ export default function ChatsPageClient({ searchSpaceId }: ChatsPageClientProps)
// Get unique chat types for filter dropdown
const chatTypes = ['all', ...Array.from(new Set(chats.map(chat => chat.type)))];
// Generate individual podcasts from selected chats
const handleGeneratePodcast = async () => {
if (selectedChats.length === 0) {
toast.error("Please select at least one chat");
return;
}
const currentChatId = selectedChats[currentChatIndex];
const currentTitle = podcastTitles[currentChatId] || podcastTitle;
if (!currentTitle.trim()) {
toast.error("Please enter a podcast title");
return;
}
setIsGeneratingPodcast(true);
try {
const token = localStorage.getItem('surfsense_bearer_token');
if (!token) {
toast.error("Authentication error. Please log in again.");
setIsGeneratingPodcast(false);
return;
}
// Create payload for single chat
const payload = {
type: "CHAT",
ids: [currentChatId], // Single chat ID
search_space_id: parseInt(searchSpaceId),
podcast_title: currentTitle
};
const response = await fetch(`${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/podcasts/generate/`, {
method: 'POST',
headers: {
'Authorization': `Bearer ${token}`,
'Content-Type': 'application/json',
},
body: JSON.stringify(payload)
});
if (!response.ok) {
const errorData = await response.json().catch(() => ({}));
throw new Error(errorData.detail || "Failed to generate podcast");
}
const data = await response.json();
toast.success(`Podcast "${currentTitle}" generation started!`);
// Move to the next chat or finish
if (currentChatIndex < selectedChats.length - 1) {
// Set up for next chat
setCurrentChatIndex(currentChatIndex + 1);
// Find the next chat from the chats array
const nextChatId = selectedChats[currentChatIndex + 1];
const nextChat = chats.find(chat => chat.id === nextChatId) || null;
setProcessingChat(nextChat);
// Default title for the next chat
if (!podcastTitles[nextChatId]) {
setPodcastTitle(nextChat?.title || `Podcast from Chat ${nextChatId}`);
} else {
setPodcastTitle(podcastTitles[nextChatId]);
}
setIsGeneratingPodcast(false);
} else {
// All done
finishPodcastGeneration();
}
} catch (error) {
console.error('Error generating podcast:', error);
toast.error(error instanceof Error ? error.message : 'Failed to generate podcast');
setIsGeneratingPodcast(false);
}
};
// Helper to finish the podcast generation process
const finishPodcastGeneration = () => {
toast.success("All podcasts are being generated! Check the podcasts tab to see them when ready.");
setPodcastDialogOpen(false);
setSelectedChats([]);
setSelectionMode(false);
setCurrentChatIndex(0);
setPodcastTitles({});
setProcessingChat(null);
setPodcastTitle("");
setIsGeneratingPodcast(false);
};
// Start podcast generation flow
const startPodcastGeneration = () => {
if (selectedChats.length === 0) {
toast.error("Please select at least one chat");
return;
}
// Reset the state for podcast generation
setCurrentChatIndex(0);
setPodcastTitles({});
// Set up for the first chat
const firstChatId = selectedChats[0];
const firstChat = chats.find(chat => chat.id === firstChatId) || null;
setProcessingChat(firstChat);
// Set default title for the first chat
setPodcastTitle(firstChat?.title || `Podcast from Chat ${firstChatId}`);
setPodcastDialogOpen(true);
};
// Update the title for the current chat
const updateCurrentChatTitle = (title: string) => {
const currentChatId = selectedChats[currentChatIndex];
setPodcastTitle(title);
setPodcastTitles(prev => ({
...prev,
[currentChatId]: title
}));
};
// Skip generating a podcast for the current chat
const skipCurrentChat = () => {
if (currentChatIndex < selectedChats.length - 1) {
// Move to the next chat
setCurrentChatIndex(currentChatIndex + 1);
// Find the next chat
const nextChatId = selectedChats[currentChatIndex + 1];
const nextChat = chats.find(chat => chat.id === nextChatId) || null;
setProcessingChat(nextChat);
// Set default title for the next chat
if (!podcastTitles[nextChatId]) {
setPodcastTitle(nextChat?.title || `Podcast from Chat ${nextChatId}`);
} else {
setPodcastTitle(podcastTitles[nextChatId]);
}
} else {
// All done (all skipped)
finishPodcastGeneration();
}
};
// Toggle chat selection
const toggleChatSelection = (chatId: number) => {
setSelectedChats(prev =>
prev.includes(chatId)
? prev.filter(id => id !== chatId)
: [...prev, chatId]
);
};
// Select all visible chats
const selectAllVisibleChats = () => {
const visibleChatIds = currentChats.map(chat => chat.id);
setSelectedChats(prev => {
const allSelected = visibleChatIds.every(id => prev.includes(id));
return allSelected
? prev.filter(id => !visibleChatIds.includes(id)) // Deselect all visible if all are selected
: [...new Set([...prev, ...visibleChatIds])]; // Add all visible, ensuring no duplicates
});
};
// Cancel selection mode
const cancelSelectionMode = () => {
setSelectionMode(false);
setSelectedChats([]);
};
return (
<motion.div
className="container p-6 mx-auto"
@ -278,18 +464,63 @@ export default function ChatsPageClient({ searchSpaceId }: ChatsPageClientProps)
</Select>
</div>
<div className="flex items-center gap-2">
{selectionMode ? (
<>
<Button
variant="outline"
size="sm"
onClick={selectAllVisibleChats}
className="gap-1"
title="Select or deselect all chats on the current page"
>
<CheckCircle className="h-4 w-4" />
{currentChats.every(chat => selectedChats.includes(chat.id))
? "Deselect Page"
: "Select Page"}
</Button>
<Button
variant="default"
size="sm"
onClick={startPodcastGeneration}
className="gap-1"
disabled={selectedChats.length === 0}
>
<Podcast className="h-4 w-4" />
Generate Podcast ({selectedChats.length})
</Button>
<Button
variant="ghost"
size="sm"
onClick={cancelSelectionMode}
>
Cancel
</Button>
</>
) : (
<>
<Button
variant="outline"
size="sm"
onClick={() => setSelectionMode(true)}
className="gap-1"
>
<Podcast className="h-4 w-4" />
Podcaster
</Button>
<Select value={sortOrder} onValueChange={setSortOrder}>
<SelectTrigger className="w-40">
<SelectValue placeholder="Sort order" />
</SelectTrigger>
<SelectContent>
<SelectGroup>
<SelectItem value="newest">Newest First</SelectItem>
<SelectItem value="oldest">Oldest First</SelectItem>
</SelectGroup>
</SelectContent>
</Select>
</>
)}
</div>
</div>
@ -334,44 +565,79 @@ export default function ChatsPageClient({ searchSpaceId }: ChatsPageClientProps)
animate="animate" animate="animate"
exit="exit" exit="exit"
transition={{ duration: 0.2, delay: index * 0.05 }} transition={{ duration: 0.2, delay: index * 0.05 }}
className="overflow-hidden hover:shadow-md transition-shadow" className={`overflow-hidden hover:shadow-md transition-shadow
${selectionMode && selectedChats.includes(chat.id)
? 'ring-2 ring-primary ring-offset-2' : ''}`}
onClick={(e) => {
if (!selectionMode) return;
// Ignore clicks coming from interactive elements
if ((e.target as HTMLElement).closest('button, a, [data-stop-selection]')) return;
toggleChatSelection(chat.id);
}}
> >
<CardHeader className="pb-3"> <CardHeader className="pb-3">
<div className="flex justify-between items-start"> <div className="flex justify-between items-start">
<div className="space-y-1"> <div className="space-y-1 flex items-start gap-2">
<CardTitle className="line-clamp-1">{chat.title || `Chat ${chat.id}`}</CardTitle> {selectionMode && (
<CardDescription> <div className="mt-1">
<span className="flex items-center gap-1"> {selectedChats.includes(chat.id)
<Calendar className="h-3.5 w-3.5" /> ? <CheckCircle className="h-4 w-4 text-primary" />
<span>{format(new Date(chat.created_at), 'MMM d, yyyy')}</span> : <Circle className="h-4 w-4 text-muted-foreground" />}
</span> </div>
</CardDescription> )}
<div>
<CardTitle className="line-clamp-1">{chat.title || `Chat ${chat.id}`}</CardTitle>
<CardDescription>
<span className="flex items-center gap-1">
<Calendar className="h-3.5 w-3.5" />
<span>{format(new Date(chat.created_at), 'MMM d, yyyy')}</span>
</span>
</CardDescription>
</div>
</div> </div>
<DropdownMenu> {!selectionMode && (
<DropdownMenuTrigger asChild> <DropdownMenu>
<Button variant="ghost" size="icon" className="h-8 w-8"> <DropdownMenuTrigger asChild>
<MoreHorizontal className="h-4 w-4" /> <Button
<span className="sr-only">Open menu</span> variant="ghost"
</Button> size="icon"
</DropdownMenuTrigger> className="h-8 w-8"
<DropdownMenuContent align="end"> data-stop-selection
<DropdownMenuItem onClick={() => window.location.href = `/dashboard/${chat.search_space_id}/researcher/${chat.id}`}> >
<ExternalLink className="mr-2 h-4 w-4" /> <MoreHorizontal className="h-4 w-4" />
<span>View Chat</span> <span className="sr-only">Open menu</span>
</DropdownMenuItem> </Button>
<DropdownMenuSeparator /> </DropdownMenuTrigger>
<DropdownMenuItem <DropdownMenuContent align="end">
className="text-destructive focus:text-destructive" <DropdownMenuItem onClick={() => window.location.href = `/dashboard/${chat.search_space_id}/researcher/${chat.id}`}>
onClick={() => { <ExternalLink className="mr-2 h-4 w-4" />
setChatToDelete({ id: chat.id, title: chat.title || `Chat ${chat.id}` }); <span>View Chat</span>
setDeleteDialogOpen(true); </DropdownMenuItem>
}} <DropdownMenuItem
> onClick={() => {
<Trash2 className="mr-2 h-4 w-4" /> setSelectedChats([chat.id]);
<span>Delete Chat</span> setPodcastTitle(chat.title || `Chat ${chat.id}`);
</DropdownMenuItem> setPodcastDialogOpen(true);
</DropdownMenuContent> }}
</DropdownMenu> >
<Podcast className="mr-2 h-4 w-4" />
<span>Generate Podcast</span>
</DropdownMenuItem>
<DropdownMenuSeparator />
<DropdownMenuItem
className="text-destructive focus:text-destructive"
onClick={(e) => {
e.stopPropagation();
setChatToDelete({ id: chat.id, title: chat.title || `Chat ${chat.id}` });
setDeleteDialogOpen(true);
}}
>
<Trash2 className="mr-2 h-4 w-4" />
<span>Delete Chat</span>
</DropdownMenuItem>
</DropdownMenuContent>
</DropdownMenu>
)}
</div> </div>
</CardHeader> </CardHeader>
<CardContent> <CardContent>
@ -505,6 +771,104 @@ export default function ChatsPageClient({ searchSpaceId }: ChatsPageClientProps)
</DialogFooter>
</DialogContent>
</Dialog>
{/* Podcast Generation Dialog */}
<Dialog
open={podcastDialogOpen}
onOpenChange={(isOpen: boolean) => {
if (!isOpen) {
// Cancel the process if dialog is closed
setPodcastDialogOpen(false);
setSelectedChats([]);
setSelectionMode(false);
setCurrentChatIndex(0);
setPodcastTitles({});
setProcessingChat(null);
setPodcastTitle("");
} else {
setPodcastDialogOpen(true);
}
}}
>
<DialogContent className="sm:max-w-md">
<DialogHeader>
<DialogTitle className="flex items-center gap-2">
<Podcast className="h-5 w-5 text-primary" />
<span>Generate Podcast {currentChatIndex + 1} of {selectedChats.length}</span>
</DialogTitle>
<DialogDescription>
{selectedChats.length > 1 ? (
<>Creating individual podcasts for each selected chat. Currently processing: <span className="font-medium">{processingChat?.title || `Chat ${selectedChats[currentChatIndex]}`}</span></>
) : (
<>Create a podcast from this chat. The podcast will be available in the podcasts section once generated.</>
)}
</DialogDescription>
</DialogHeader>
<div className="space-y-4 py-2">
<div className="space-y-2">
<Label htmlFor="podcast-title">Podcast Title</Label>
<Input
id="podcast-title"
placeholder="Enter podcast title"
value={podcastTitle}
onChange={(e) => updateCurrentChatTitle(e.target.value)}
/>
</div>
{selectedChats.length > 1 && (
<div className="w-full bg-muted rounded-full h-2.5 mt-4">
<div
className="bg-primary h-2.5 rounded-full transition-all duration-300"
style={{ width: `${((currentChatIndex) / selectedChats.length) * 100}%` }}
></div>
</div>
)}
</div>
<DialogFooter className="flex gap-2 sm:justify-end">
{selectedChats.length > 1 && !isGeneratingPodcast && (
<Button
variant="outline"
onClick={skipCurrentChat}
className="gap-1"
>
Skip
</Button>
)}
<Button
variant="outline"
onClick={() => {
setPodcastDialogOpen(false);
setCurrentChatIndex(0);
setPodcastTitles({});
setProcessingChat(null);
}}
disabled={isGeneratingPodcast}
>
Cancel
</Button>
<Button
variant="default"
onClick={handleGeneratePodcast}
disabled={isGeneratingPodcast}
className="gap-2"
>
{isGeneratingPodcast ? (
<>
<span className="h-4 w-4 animate-spin rounded-full border-2 border-current border-t-transparent" />
Generating...
</>
) : (
<>
<Podcast className="h-4 w-4" />
Generate Podcast
</>
)}
</Button>
</DialogFooter>
</DialogContent>
</Dialog>
</motion.div>
);
}
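For reference, the chats page above posts one chat per request to the backend's podcast endpoint. A hedged sketch of that call follows; the endpoint path, headers, and payload fields are taken from handleGeneratePodcast in the diff, while the helper name and the response typing are assumptions (the response body is not shown, so it is left untyped):

// Illustrative sketch only; the real page inlines this logic in handleGeneratePodcast.
interface PodcastGenerationRequest {
  type: "CHAT";
  ids: number[];              // a single chat id per request in this flow
  search_space_id: number;
  podcast_title: string;
}

async function requestPodcast(
  baseUrl: string,
  token: string,
  payload: PodcastGenerationRequest,
) {
  const response = await fetch(`${baseUrl}/api/v1/podcasts/generate/`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  });
  if (!response.ok) {
    const errorData = await response.json().catch(() => ({}));
    throw new Error(errorData.detail || "Failed to generate podcast");
  }
  return response.json();
}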
View file
@ -8,8 +8,8 @@ interface PageProps {
}
export default async function ChatsPage({ params }: PageProps) {
// Get search space ID from the route parameter
const { search_space_id: searchSpaceId } = await Promise.resolve(params);
return (
<Suspense fallback={<div className="flex items-center justify-center h-[60vh]">
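The hunk above switches from reading params.search_space_id directly to awaiting the params object, since newer Next.js releases deliver dynamic route params asynchronously. A minimal sketch of the pattern, with the interface shape inferred from the hunk context and the page name purely illustrative; Promise.resolve keeps it working whether params arrives as a plain object or as a Promise:

// Illustrative sketch of the awaited-params pattern used above.
interface ExamplePageProps {
  params: { search_space_id: string } | Promise<{ search_space_id: string }>;
}

export default async function ExamplePage({ params }: ExamplePageProps) {
  const { search_space_id: searchSpaceId } = await Promise.resolve(params);
  return <div>Search space: {searchSpaceId}</div>;
}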
View file
@ -4,255 +4,308 @@ import { useState, useEffect } from "react";
import { useRouter, useParams } from "next/navigation"; import { useRouter, useParams } from "next/navigation";
import { motion } from "framer-motion"; import { motion } from "framer-motion";
import { toast } from "sonner"; import { toast } from "sonner";
import { Edit, Plus, Search, Trash2, ExternalLink, RefreshCw } from "lucide-react"; import {
Edit,
Plus,
Search,
Trash2,
ExternalLink,
RefreshCw,
} from "lucide-react";
import { useSearchSourceConnectors } from "@/hooks/useSearchSourceConnectors"; import { useSearchSourceConnectors } from "@/hooks/useSearchSourceConnectors";
import { Button } from "@/components/ui/button"; import { Button } from "@/components/ui/button";
import { import {
Card, Card,
CardContent, CardContent,
CardDescription, CardDescription,
CardFooter, CardFooter,
CardHeader, CardHeader,
CardTitle, CardTitle,
} from "@/components/ui/card"; } from "@/components/ui/card";
import { import {
Table, Table,
TableBody, TableBody,
TableCell, TableCell,
TableHead, TableHead,
TableHeader, TableHeader,
TableRow, TableRow,
} from "@/components/ui/table"; } from "@/components/ui/table";
import { import {
AlertDialog, AlertDialog,
AlertDialogAction, AlertDialogAction,
AlertDialogCancel, AlertDialogCancel,
AlertDialogContent, AlertDialogContent,
AlertDialogDescription, AlertDialogDescription,
AlertDialogFooter, AlertDialogFooter,
AlertDialogHeader, AlertDialogHeader,
AlertDialogTitle, AlertDialogTitle,
AlertDialogTrigger, AlertDialogTrigger,
} from "@/components/ui/alert-dialog"; } from "@/components/ui/alert-dialog";
import { Tooltip, TooltipContent, TooltipProvider, TooltipTrigger } from "@/components/ui/tooltip"; import {
Tooltip,
TooltipContent,
TooltipProvider,
TooltipTrigger,
} from "@/components/ui/tooltip";
import { getConnectorIcon } from "@/components/chat";
// Helper function to get connector type display name // Helper function to get connector type display name
const getConnectorTypeDisplay = (type: string): string => { const getConnectorTypeDisplay = (type: string): string => {
const typeMap: Record<string, string> = { const typeMap: Record<string, string> = {
"SERPER_API": "Serper API", SERPER_API: "Serper API",
"TAVILY_API": "Tavily API", TAVILY_API: "Tavily API",
"SLACK_CONNECTOR": "Slack", SLACK_CONNECTOR: "Slack",
"NOTION_CONNECTOR": "Notion", NOTION_CONNECTOR: "Notion",
"GITHUB_CONNECTOR": "GitHub", GITHUB_CONNECTOR: "GitHub",
"LINEAR_CONNECTOR": "Linear", LINEAR_CONNECTOR: "Linear",
// Add other connector types here as needed LINKUP_API: "Linkup",
}; // Add other connector types here as needed
return typeMap[type] || type; };
return typeMap[type] || type;
}; };
// Helper function to format date with time // Helper function to format date with time
const formatDateTime = (dateString: string | null): string => { const formatDateTime = (dateString: string | null): string => {
if (!dateString) return "Never"; if (!dateString) return "Never";
const date = new Date(dateString); const date = new Date(dateString);
return new Intl.DateTimeFormat('en-US', { return new Intl.DateTimeFormat("en-US", {
year: 'numeric', year: "numeric",
month: 'short', month: "short",
day: 'numeric', day: "numeric",
hour: '2-digit', hour: "2-digit",
minute: '2-digit' minute: "2-digit",
}).format(date); }).format(date);
}; };
export default function ConnectorsPage() { export default function ConnectorsPage() {
const router = useRouter(); const router = useRouter();
const params = useParams(); const params = useParams();
const searchSpaceId = params.search_space_id as string; const searchSpaceId = params.search_space_id as string;
const { connectors, isLoading, error, deleteConnector, indexConnector } = useSearchSourceConnectors(); const { connectors, isLoading, error, deleteConnector, indexConnector } =
const [connectorToDelete, setConnectorToDelete] = useState<number | null>(null); useSearchSourceConnectors();
const [indexingConnectorId, setIndexingConnectorId] = useState<number | null>(null); const [connectorToDelete, setConnectorToDelete] = useState<number | null>(
null,
);
const [indexingConnectorId, setIndexingConnectorId] = useState<number | null>(
null,
);
useEffect(() => { useEffect(() => {
if (error) { if (error) {
toast.error("Failed to load connectors"); toast.error("Failed to load connectors");
console.error("Error fetching connectors:", error); console.error("Error fetching connectors:", error);
} }
}, [error]); }, [error]);
// Handle connector deletion // Handle connector deletion
const handleDeleteConnector = async () => { const handleDeleteConnector = async () => {
if (connectorToDelete === null) return; if (connectorToDelete === null) return;
try { try {
await deleteConnector(connectorToDelete); await deleteConnector(connectorToDelete);
toast.success("Connector deleted successfully"); toast.success("Connector deleted successfully");
} catch (error) { } catch (error) {
console.error("Error deleting connector:", error); console.error("Error deleting connector:", error);
toast.error("Failed to delete connector"); toast.error("Failed to delete connector");
} finally { } finally {
setConnectorToDelete(null); setConnectorToDelete(null);
} }
}; };
// Handle connector indexing // Handle connector indexing
const handleIndexConnector = async (connectorId: number) => { const handleIndexConnector = async (connectorId: number) => {
setIndexingConnectorId(connectorId); setIndexingConnectorId(connectorId);
try { try {
await indexConnector(connectorId, searchSpaceId); await indexConnector(connectorId, searchSpaceId);
toast.success("Connector content indexed successfully"); toast.success("Connector content indexed successfully");
} catch (error) { } catch (error) {
console.error("Error indexing connector content:", error); console.error("Error indexing connector content:", error);
toast.error(error instanceof Error ? error.message : "Failed to index connector content"); toast.error(
} finally { error instanceof Error
setIndexingConnectorId(null); ? error.message
} : "Failed to index connector content",
}; );
} finally {
setIndexingConnectorId(null);
}
};
return ( return (
<div className="container mx-auto py-8 max-w-6xl"> <div className="container mx-auto py-8 max-w-6xl">
<motion.div <motion.div
initial={{ opacity: 0, y: 20 }} initial={{ opacity: 0, y: 20 }}
animate={{ opacity: 1, y: 0 }} animate={{ opacity: 1, y: 0 }}
transition={{ duration: 0.5 }} transition={{ duration: 0.5 }}
className="mb-8 flex items-center justify-between" className="mb-8 flex items-center justify-between"
> >
<div> <div>
<h1 className="text-3xl font-bold tracking-tight">Connectors</h1> <h1 className="text-3xl font-bold tracking-tight">Connectors</h1>
<p className="text-muted-foreground mt-2"> <p className="text-muted-foreground mt-2">
Manage your connected services and data sources. Manage your connected services and data sources.
</p> </p>
</div> </div>
<Button onClick={() => router.push(`/dashboard/${searchSpaceId}/connectors/add`)}> <Button
<Plus className="mr-2 h-4 w-4" /> onClick={() =>
Add Connector router.push(`/dashboard/${searchSpaceId}/connectors/add`)
</Button> }
</motion.div> >
<Plus className="mr-2 h-4 w-4" />
Add Connector
</Button>
</motion.div>
<Card> <Card>
<CardHeader className="pb-3"> <CardHeader className="pb-3">
<CardTitle>Your Connectors</CardTitle> <CardTitle>Your Connectors</CardTitle>
<CardDescription> <CardDescription>
View and manage all your connected services. View and manage all your connected services.
</CardDescription> </CardDescription>
</CardHeader> </CardHeader>
<CardContent> <CardContent>
{isLoading ? ( {isLoading ? (
<div className="flex justify-center py-8"> <div className="flex justify-center py-8">
<div className="animate-pulse text-center"> <div className="animate-pulse text-center">
<div className="h-6 w-32 bg-muted rounded mx-auto mb-2"></div> <div className="h-6 w-32 bg-muted rounded mx-auto mb-2"></div>
<div className="h-4 w-48 bg-muted rounded mx-auto"></div> <div className="h-4 w-48 bg-muted rounded mx-auto"></div>
</div> </div>
</div> </div>
) : connectors.length === 0 ? ( ) : connectors.length === 0 ? (
<div className="text-center py-12"> <div className="text-center py-12">
<h3 className="text-lg font-medium mb-2">No connectors found</h3> <h3 className="text-lg font-medium mb-2">No connectors found</h3>
<p className="text-muted-foreground mb-6"> <p className="text-muted-foreground mb-6">
You haven't added any connectors yet. Add one to enhance your search capabilities. You haven't added any connectors yet. Add one to enhance your
</p> search capabilities.
<Button onClick={() => router.push(`/dashboard/${searchSpaceId}/connectors/add`)}> </p>
<Plus className="mr-2 h-4 w-4" /> <Button
Add Your First Connector onClick={() =>
</Button> router.push(`/dashboard/${searchSpaceId}/connectors/add`)
</div> }
) : ( >
<div className="rounded-md border"> <Plus className="mr-2 h-4 w-4" />
<Table> Add Your First Connector
<TableHeader> </Button>
<TableRow> </div>
<TableHead>Name</TableHead> ) : (
<TableHead>Type</TableHead> <div className="rounded-md border">
<TableHead>Last Indexed</TableHead> <Table>
<TableHead className="text-right">Actions</TableHead> <TableHeader>
</TableRow> <TableRow>
</TableHeader> <TableHead>Name</TableHead>
<TableBody> <TableHead>Type</TableHead>
{connectors.map((connector) => ( <TableHead>Last Indexed</TableHead>
<TableRow key={connector.id}> <TableHead className="text-right">Actions</TableHead>
<TableCell className="font-medium">{connector.name}</TableCell> </TableRow>
<TableCell>{getConnectorTypeDisplay(connector.connector_type)}</TableCell> </TableHeader>
<TableCell> <TableBody>
{connector.is_indexable {connectors.map((connector) => (
? formatDateTime(connector.last_indexed_at) <TableRow key={connector.id}>
: "Not indexable"} <TableCell className="font-medium">
</TableCell> {connector.name}
<TableCell className="text-right"> </TableCell>
<div className="flex justify-end gap-2"> <TableCell>
{connector.is_indexable && ( {getConnectorIcon(connector.connector_type)}
<TooltipProvider> </TableCell>
<Tooltip> <TableCell>
<TooltipTrigger asChild> {connector.is_indexable
<Button ? formatDateTime(connector.last_indexed_at)
variant="outline" : "Not indexable"}
size="sm" </TableCell>
onClick={() => handleIndexConnector(connector.id)} <TableCell className="text-right">
disabled={indexingConnectorId === connector.id} <div className="flex justify-end gap-2">
> {connector.is_indexable && (
{indexingConnectorId === connector.id ? ( <TooltipProvider>
<RefreshCw className="h-4 w-4 animate-spin" /> <Tooltip>
) : ( <TooltipTrigger asChild>
<RefreshCw className="h-4 w-4" /> <Button
)} variant="outline"
<span className="sr-only">Index Content</span> size="sm"
</Button> onClick={() =>
</TooltipTrigger> handleIndexConnector(connector.id)
<TooltipContent> }
<p>Index Content</p> disabled={
</TooltipContent> indexingConnectorId === connector.id
</Tooltip> }
</TooltipProvider> >
)} {indexingConnectorId === connector.id ? (
<Button <RefreshCw className="h-4 w-4 animate-spin" />
variant="outline" ) : (
size="sm" <RefreshCw className="h-4 w-4" />
onClick={() => router.push(`/dashboard/${searchSpaceId}/connectors/${connector.id}/edit`)} )}
> <span className="sr-only">
<Edit className="h-4 w-4" /> Index Content
<span className="sr-only">Edit</span> </span>
</Button> </Button>
<AlertDialog> </TooltipTrigger>
<AlertDialogTrigger asChild> <TooltipContent>
<Button <p>Index Content</p>
variant="outline" </TooltipContent>
size="sm" </Tooltip>
className="text-destructive-foreground hover:bg-destructive/10" </TooltipProvider>
onClick={() => setConnectorToDelete(connector.id)} )}
> <Button
<Trash2 className="h-4 w-4" /> variant="outline"
<span className="sr-only">Delete</span> size="sm"
</Button> onClick={() =>
</AlertDialogTrigger> router.push(
<AlertDialogContent> `/dashboard/${searchSpaceId}/connectors/${connector.id}/edit`,
<AlertDialogHeader> )
<AlertDialogTitle>Delete Connector</AlertDialogTitle> }
<AlertDialogDescription> >
Are you sure you want to delete this connector? This action cannot be undone. <Edit className="h-4 w-4" />
</AlertDialogDescription> <span className="sr-only">Edit</span>
</AlertDialogHeader> </Button>
<AlertDialogFooter> <AlertDialog>
<AlertDialogCancel onClick={() => setConnectorToDelete(null)}> <AlertDialogTrigger asChild>
Cancel <Button
</AlertDialogCancel> variant="outline"
<AlertDialogAction size="sm"
className="bg-destructive text-destructive-foreground hover:bg-destructive/90" className="text-destructive-foreground hover:bg-destructive/10"
onClick={handleDeleteConnector} onClick={() =>
> setConnectorToDelete(connector.id)
Delete }
</AlertDialogAction> >
</AlertDialogFooter> <Trash2 className="h-4 w-4" />
</AlertDialogContent> <span className="sr-only">Delete</span>
</AlertDialog> </Button>
</div> </AlertDialogTrigger>
</TableCell> <AlertDialogContent>
</TableRow> <AlertDialogHeader>
))} <AlertDialogTitle>
</TableBody> Delete Connector
</Table> </AlertDialogTitle>
</div> <AlertDialogDescription>
)} Are you sure you want to delete this
</CardContent> connector? This action cannot be undone.
</Card> </AlertDialogDescription>
</div> </AlertDialogHeader>
); <AlertDialogFooter>
<AlertDialogCancel
onClick={() => setConnectorToDelete(null)}
>
Cancel
</AlertDialogCancel>
<AlertDialogAction
className="bg-destructive text-destructive-foreground hover:bg-destructive/90"
onClick={handleDeleteConnector}
>
Delete
</AlertDialogAction>
</AlertDialogFooter>
</AlertDialogContent>
</AlertDialog>
</div>
</TableCell>
</TableRow>
))}
</TableBody>
</Table>
</div>
)}
</CardContent>
</Card>
</div>
);
} }
View file
@ -1,6 +1,6 @@
"use client"; "use client";
import React, { useEffect } from 'react'; import React, { useEffect } from "react";
import { useRouter, useParams } from "next/navigation"; import { useRouter, useParams } from "next/navigation";
import { motion } from "framer-motion"; import { motion } from "framer-motion";
import { toast } from "sonner"; import { toast } from "sonner";
@ -8,169 +8,208 @@ import { ArrowLeft, Check, Loader2, Github } from "lucide-react";
import { Form } from "@/components/ui/form"; import { Form } from "@/components/ui/form";
import { Button } from "@/components/ui/button"; import { Button } from "@/components/ui/button";
import { Card, CardContent, CardDescription, CardFooter, CardHeader, CardTitle } from "@/components/ui/card"; import {
Card,
CardContent,
CardDescription,
CardFooter,
CardHeader,
CardTitle,
} from "@/components/ui/card";
// Import Utils, Types, Hook, and Components // Import Utils, Types, Hook, and Components
import { getConnectorTypeDisplay } from '@/lib/connectors/utils'; import { getConnectorTypeDisplay } from "@/lib/connectors/utils";
import { useConnectorEditPage } from '@/hooks/useConnectorEditPage'; import { useConnectorEditPage } from "@/hooks/useConnectorEditPage";
import { EditConnectorLoadingSkeleton } from "@/components/editConnector/EditConnectorLoadingSkeleton"; import { EditConnectorLoadingSkeleton } from "@/components/editConnector/EditConnectorLoadingSkeleton";
import { EditConnectorNameForm } from "@/components/editConnector/EditConnectorNameForm"; import { EditConnectorNameForm } from "@/components/editConnector/EditConnectorNameForm";
import { EditGitHubConnectorConfig } from "@/components/editConnector/EditGitHubConnectorConfig"; import { EditGitHubConnectorConfig } from "@/components/editConnector/EditGitHubConnectorConfig";
import { EditSimpleTokenForm } from "@/components/editConnector/EditSimpleTokenForm"; import { EditSimpleTokenForm } from "@/components/editConnector/EditSimpleTokenForm";
import { getConnectorIcon } from "@/components/chat";
export default function EditConnectorPage() { export default function EditConnectorPage() {
const router = useRouter(); const router = useRouter();
const params = useParams(); const params = useParams();
const searchSpaceId = params.search_space_id as string; const searchSpaceId = params.search_space_id as string;
// Ensure connectorId is parsed safely // Ensure connectorId is parsed safely
const connectorIdParam = params.connector_id as string; const connectorIdParam = params.connector_id as string;
const connectorId = connectorIdParam ? parseInt(connectorIdParam, 10) : NaN; const connectorId = connectorIdParam ? parseInt(connectorIdParam, 10) : NaN;
// Use the custom hook to manage state and logic // Use the custom hook to manage state and logic
const { const {
connectorsLoading, connectorsLoading,
connector, connector,
isSaving, isSaving,
editForm, editForm,
patForm, // Needed for GitHub child component patForm, // Needed for GitHub child component
handleSaveChanges, handleSaveChanges,
// GitHub specific props for the child component // GitHub specific props for the child component
editMode, editMode,
setEditMode, // Pass down if needed by GitHub component setEditMode, // Pass down if needed by GitHub component
originalPat, originalPat,
currentSelectedRepos, currentSelectedRepos,
fetchedRepos, fetchedRepos,
setFetchedRepos, setFetchedRepos,
newSelectedRepos, newSelectedRepos,
setNewSelectedRepos, setNewSelectedRepos,
isFetchingRepos, isFetchingRepos,
handleFetchRepositories, handleFetchRepositories,
handleRepoSelectionChange, handleRepoSelectionChange,
} = useConnectorEditPage(connectorId, searchSpaceId); } = useConnectorEditPage(connectorId, searchSpaceId);
// Redirect if connectorId is not a valid number after parsing // Redirect if connectorId is not a valid number after parsing
useEffect(() => { useEffect(() => {
if (isNaN(connectorId)) { if (isNaN(connectorId)) {
toast.error("Invalid Connector ID."); toast.error("Invalid Connector ID.");
router.push(`/dashboard/${searchSpaceId}/connectors`); router.push(`/dashboard/${searchSpaceId}/connectors`);
} }
}, [connectorId, router, searchSpaceId]); }, [connectorId, router, searchSpaceId]);
// Loading State // Loading State
if (connectorsLoading || !connector) { if (connectorsLoading || !connector) {
// Handle NaN case before showing skeleton // Handle NaN case before showing skeleton
if (isNaN(connectorId)) return null; if (isNaN(connectorId)) return null;
return <EditConnectorLoadingSkeleton />; return <EditConnectorLoadingSkeleton />;
} }
// Main Render using data/handlers from the hook // Main Render using data/handlers from the hook
return ( return (
<div className="container mx-auto py-8 max-w-3xl"> <div className="container mx-auto py-8 max-w-3xl">
<Button variant="ghost" className="mb-6" onClick={() => router.push(`/dashboard/${searchSpaceId}/connectors`)}> <Button
<ArrowLeft className="mr-2 h-4 w-4" /> Back to Connectors variant="ghost"
</Button> className="mb-6"
onClick={() => router.push(`/dashboard/${searchSpaceId}/connectors`)}
>
<ArrowLeft className="mr-2 h-4 w-4" /> Back to Connectors
</Button>
<motion.div initial={{ opacity: 0, y: 20 }} animate={{ opacity: 1, y: 0 }} transition={{ duration: 0.5 }}> <motion.div
<Card className="border-2 border-border"> initial={{ opacity: 0, y: 20 }}
<CardHeader> animate={{ opacity: 1, y: 0 }}
<CardTitle className="text-2xl font-bold flex items-center gap-2"> transition={{ duration: 0.5 }}
<Github className="h-6 w-6" /> {/* TODO: Dynamic icon */} >
Edit {getConnectorTypeDisplay(connector.connector_type)} Connector <Card className="border-2 border-border">
</CardTitle> <CardHeader>
<CardDescription>Modify connector name and configuration.</CardDescription> <CardTitle className="text-2xl font-bold flex items-center gap-2">
</CardHeader> {getConnectorIcon(connector.connector_type)}
Edit {getConnectorTypeDisplay(connector.connector_type)} Connector
</CardTitle>
<CardDescription>
Modify connector name and configuration.
</CardDescription>
</CardHeader>
<Form {...editForm}> <Form {...editForm}>
{/* Pass hook's handleSaveChanges */} {/* Pass hook's handleSaveChanges */}
<form onSubmit={editForm.handleSubmit(handleSaveChanges)} className="space-y-6"> <form
<CardContent className="space-y-6"> onSubmit={editForm.handleSubmit(handleSaveChanges)}
{/* Pass form control from hook */} className="space-y-6"
<EditConnectorNameForm control={editForm.control} /> >
<CardContent className="space-y-6">
{/* Pass form control from hook */}
<EditConnectorNameForm control={editForm.control} />
<hr /> <hr />
<h3 className="text-lg font-semibold">Configuration</h3> <h3 className="text-lg font-semibold">Configuration</h3>
{/* == GitHub == */} {/* == GitHub == */}
{connector.connector_type === 'GITHUB_CONNECTOR' && ( {connector.connector_type === "GITHUB_CONNECTOR" && (
<EditGitHubConnectorConfig <EditGitHubConnectorConfig
// Pass relevant state and handlers from hook // Pass relevant state and handlers from hook
editMode={editMode} editMode={editMode}
setEditMode={setEditMode} // Pass setter if child manages mode setEditMode={setEditMode} // Pass setter if child manages mode
originalPat={originalPat} originalPat={originalPat}
currentSelectedRepos={currentSelectedRepos} currentSelectedRepos={currentSelectedRepos}
fetchedRepos={fetchedRepos} fetchedRepos={fetchedRepos}
newSelectedRepos={newSelectedRepos} newSelectedRepos={newSelectedRepos}
isFetchingRepos={isFetchingRepos} isFetchingRepos={isFetchingRepos}
patForm={patForm} patForm={patForm}
handleFetchRepositories={handleFetchRepositories} handleFetchRepositories={handleFetchRepositories}
handleRepoSelectionChange={handleRepoSelectionChange} handleRepoSelectionChange={handleRepoSelectionChange}
setNewSelectedRepos={setNewSelectedRepos} setNewSelectedRepos={setNewSelectedRepos}
setFetchedRepos={setFetchedRepos} setFetchedRepos={setFetchedRepos}
/> />
)} )}
{/* == Slack == */} {/* == Slack == */}
{connector.connector_type === 'SLACK_CONNECTOR' && ( {connector.connector_type === "SLACK_CONNECTOR" && (
<EditSimpleTokenForm <EditSimpleTokenForm
control={editForm.control} control={editForm.control}
fieldName="SLACK_BOT_TOKEN" fieldName="SLACK_BOT_TOKEN"
fieldLabel="Slack Bot Token" fieldLabel="Slack Bot Token"
fieldDescription="Update the Slack Bot Token if needed." fieldDescription="Update the Slack Bot Token if needed."
placeholder="Begins with xoxb-..." placeholder="Begins with xoxb-..."
/> />
)} )}
{/* == Notion == */} {/* == Notion == */}
{connector.connector_type === 'NOTION_CONNECTOR' && ( {connector.connector_type === "NOTION_CONNECTOR" && (
<EditSimpleTokenForm <EditSimpleTokenForm
control={editForm.control} control={editForm.control}
fieldName="NOTION_INTEGRATION_TOKEN" fieldName="NOTION_INTEGRATION_TOKEN"
fieldLabel="Notion Integration Token" fieldLabel="Notion Integration Token"
fieldDescription="Update the Notion Integration Token if needed." fieldDescription="Update the Notion Integration Token if needed."
placeholder="Begins with secret_..." placeholder="Begins with secret_..."
/> />
)} )}
{/* == Serper == */} {/* == Serper == */}
{connector.connector_type === 'SERPER_API' && ( {connector.connector_type === "SERPER_API" && (
<EditSimpleTokenForm <EditSimpleTokenForm
control={editForm.control} control={editForm.control}
fieldName="SERPER_API_KEY" fieldName="SERPER_API_KEY"
fieldLabel="Serper API Key" fieldLabel="Serper API Key"
fieldDescription="Update the Serper API Key if needed." fieldDescription="Update the Serper API Key if needed."
/> />
)} )}
{/* == Tavily == */} {/* == Tavily == */}
{connector.connector_type === 'TAVILY_API' && ( {connector.connector_type === "TAVILY_API" && (
<EditSimpleTokenForm <EditSimpleTokenForm
control={editForm.control} control={editForm.control}
fieldName="TAVILY_API_KEY" fieldName="TAVILY_API_KEY"
fieldLabel="Tavily API Key" fieldLabel="Tavily API Key"
fieldDescription="Update the Tavily API Key if needed." fieldDescription="Update the Tavily API Key if needed."
/> />
)} )}
{/* == Linear == */} {/* == Linear == */}
{connector.connector_type === 'LINEAR_CONNECTOR' && ( {connector.connector_type === "LINEAR_CONNECTOR" && (
<EditSimpleTokenForm <EditSimpleTokenForm
control={editForm.control} control={editForm.control}
fieldName="LINEAR_API_KEY" fieldName="LINEAR_API_KEY"
fieldLabel="Linear API Key" fieldLabel="Linear API Key"
fieldDescription="Update your Linear API Key if needed." fieldDescription="Update your Linear API Key if needed."
placeholder="Begins with lin_api_..." placeholder="Begins with lin_api_..."
/> />
)} )}
</CardContent> {/* == Linkup == */}
<CardFooter className="border-t pt-6"> {connector.connector_type === "LINKUP_API" && (
<Button type="submit" disabled={isSaving} className="w-full sm:w-auto"> <EditSimpleTokenForm
{isSaving ? <Loader2 className="mr-2 h-4 w-4 animate-spin" /> : <Check className="mr-2 h-4 w-4" />} control={editForm.control}
Save Changes fieldName="LINKUP_API_KEY"
</Button> fieldLabel="Linkup API Key"
</CardFooter> fieldDescription="Update your Linkup API Key if needed."
</form> placeholder="Begins with linkup_..."
</Form> />
</Card> )}
</motion.div> </CardContent>
</div> <CardFooter className="border-t pt-6">
); <Button
type="submit"
disabled={isSaving}
className="w-full sm:w-auto"
>
{isSaving ? (
<Loader2 className="mr-2 h-4 w-4 animate-spin" />
) : (
<Check className="mr-2 h-4 w-4" />
)}
Save Changes
</Button>
</CardFooter>
</form>
</Form>
</Card>
</motion.div>
</div>
);
} }
View file
@ -52,6 +52,7 @@ const getConnectorTypeDisplay = (type: string): string => {
"SLACK_CONNECTOR": "Slack Connector", "SLACK_CONNECTOR": "Slack Connector",
"NOTION_CONNECTOR": "Notion Connector", "NOTION_CONNECTOR": "Notion Connector",
"GITHUB_CONNECTOR": "GitHub Connector", "GITHUB_CONNECTOR": "GitHub Connector",
"LINKUP_API": "Linkup",
// Add other connector types here as needed // Add other connector types here as needed
}; };
return typeMap[type] || type; return typeMap[type] || type;
@ -87,7 +88,8 @@ export default function EditConnectorPage() {
"TAVILY_API": "TAVILY_API_KEY", "TAVILY_API": "TAVILY_API_KEY",
"SLACK_CONNECTOR": "SLACK_BOT_TOKEN", "SLACK_CONNECTOR": "SLACK_BOT_TOKEN",
"NOTION_CONNECTOR": "NOTION_INTEGRATION_TOKEN", "NOTION_CONNECTOR": "NOTION_INTEGRATION_TOKEN",
"GITHUB_CONNECTOR": "GITHUB_PAT" "GITHUB_CONNECTOR": "GITHUB_PAT",
"LINKUP_API": "LINKUP_API_KEY"
}; };
return fieldMap[connectorType] || ""; return fieldMap[connectorType] || "";
}; };
@ -229,7 +231,9 @@ export default function EditConnectorPage() {
? "Notion Integration Token" ? "Notion Integration Token"
: connector?.connector_type === "GITHUB_CONNECTOR" : connector?.connector_type === "GITHUB_CONNECTOR"
? "GitHub Personal Access Token (PAT)" ? "GitHub Personal Access Token (PAT)"
: "API Key"} : connector?.connector_type === "LINKUP_API"
? "Linkup API Key"
: "API Key"}
</FormLabel> </FormLabel>
<FormControl> <FormControl>
<Input <Input
@ -241,7 +245,9 @@ export default function EditConnectorPage() {
? "Enter new Notion Token (optional)" ? "Enter new Notion Token (optional)"
: connector?.connector_type === "GITHUB_CONNECTOR" : connector?.connector_type === "GITHUB_CONNECTOR"
? "Enter new GitHub PAT (optional)" ? "Enter new GitHub PAT (optional)"
: "Enter new API key (optional)" : connector?.connector_type === "LINKUP_API"
? "Enter new Linkup API Key (optional)"
: "Enter new API key (optional)"
} }
{...field} {...field}
/> />
@ -253,7 +259,9 @@ export default function EditConnectorPage() {
? "Enter a new Notion Integration Token or leave blank to keep your existing token." ? "Enter a new Notion Integration Token or leave blank to keep your existing token."
: connector?.connector_type === "GITHUB_CONNECTOR" : connector?.connector_type === "GITHUB_CONNECTOR"
? "Enter a new GitHub PAT or leave blank to keep your existing token." ? "Enter a new GitHub PAT or leave blank to keep your existing token."
: "Enter a new API key or leave blank to keep your existing key."} : connector?.connector_type === "LINKUP_API"
? "Enter a new Linkup API Key or leave blank to keep your existing key."
: "Enter a new API key or leave blank to keep your existing key."}
</FormDescription> </FormDescription>
<FormMessage /> <FormMessage />
</FormItem> </FormItem>
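The hunks above only show fragments of the two lookup maps being extended for Linkup. A consolidated sketch of the token-field mapping follows; the entries and the fallback come from the diff, while the SERPER_API entry and the helper name getConfigField are assumptions filled in for illustration:

// Illustrative consolidation of the fragments above.
const fieldMap: Record<string, string> = {
  SERPER_API: "SERPER_API_KEY",
  TAVILY_API: "TAVILY_API_KEY",
  SLACK_CONNECTOR: "SLACK_BOT_TOKEN",
  NOTION_CONNECTOR: "NOTION_INTEGRATION_TOKEN",
  GITHUB_CONNECTOR: "GITHUB_PAT",
  LINKUP_API: "LINKUP_API_KEY",
};

function getConfigField(connectorType: string): string {
  return fieldMap[connectorType] || "";
}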
View file
@ -0,0 +1,207 @@
"use client";
import { useState } from "react";
import { useRouter, useParams } from "next/navigation";
import { motion } from "framer-motion";
import { zodResolver } from "@hookform/resolvers/zod";
import { useForm } from "react-hook-form";
import * as z from "zod";
import { toast } from "sonner";
import { ArrowLeft, Check, Info, Loader2 } from "lucide-react";
import { useSearchSourceConnectors } from "@/hooks/useSearchSourceConnectors";
import {
Form,
FormControl,
FormDescription,
FormField,
FormItem,
FormLabel,
FormMessage,
} from "@/components/ui/form";
import { Input } from "@/components/ui/input";
import { Button } from "@/components/ui/button";
import {
Card,
CardContent,
CardDescription,
CardFooter,
CardHeader,
CardTitle,
} from "@/components/ui/card";
import {
Alert,
AlertDescription,
AlertTitle,
} from "@/components/ui/alert";
// Define the form schema with Zod
const linkupApiFormSchema = z.object({
name: z.string().min(3, {
message: "Connector name must be at least 3 characters.",
}),
api_key: z.string().min(10, {
message: "API key is required and must be valid.",
}),
});
// Define the type for the form values
type LinkupApiFormValues = z.infer<typeof linkupApiFormSchema>;
export default function LinkupApiPage() {
const router = useRouter();
const params = useParams();
const searchSpaceId = params.search_space_id as string;
const [isSubmitting, setIsSubmitting] = useState(false);
const { createConnector } = useSearchSourceConnectors();
// Initialize the form
const form = useForm<LinkupApiFormValues>({
resolver: zodResolver(linkupApiFormSchema),
defaultValues: {
name: "Linkup API Connector",
api_key: "",
},
});
// Handle form submission
const onSubmit = async (values: LinkupApiFormValues) => {
setIsSubmitting(true);
try {
await createConnector({
name: values.name,
connector_type: "LINKUP_API",
config: {
LINKUP_API_KEY: values.api_key,
},
is_indexable: false,
last_indexed_at: null,
});
toast.success("Linkup API connector created successfully!");
// Navigate back to connectors page
router.push(`/dashboard/${searchSpaceId}/connectors`);
} catch (error) {
console.error("Error creating connector:", error);
toast.error(error instanceof Error ? error.message : "Failed to create connector");
} finally {
setIsSubmitting(false);
}
};
return (
<div className="container mx-auto py-8 max-w-3xl">
<Button
variant="ghost"
className="mb-6"
onClick={() => router.push(`/dashboard/${searchSpaceId}/connectors/add`)}
>
<ArrowLeft className="mr-2 h-4 w-4" />
Back to Connectors
</Button>
<motion.div
initial={{ opacity: 0, y: 20 }}
animate={{ opacity: 1, y: 0 }}
transition={{ duration: 0.5 }}
>
<Card className="border-2 border-border">
<CardHeader>
<CardTitle className="text-2xl font-bold">Connect Linkup API</CardTitle>
<CardDescription>
Integrate with Linkup API to enhance your search capabilities with AI-powered search results.
</CardDescription>
</CardHeader>
<CardContent>
<Alert className="mb-6 bg-muted">
<Info className="h-4 w-4" />
<AlertTitle>API Key Required</AlertTitle>
<AlertDescription>
You'll need a Linkup API key to use this connector. You can get one by signing up at{" "}
<a
href="https://linkup.so"
target="_blank"
rel="noopener noreferrer"
className="font-medium underline underline-offset-4"
>
linkup.so
</a>
</AlertDescription>
</Alert>
<Form {...form}>
<form onSubmit={form.handleSubmit(onSubmit)} className="space-y-6">
<FormField
control={form.control}
name="name"
render={({ field }) => (
<FormItem>
<FormLabel>Connector Name</FormLabel>
<FormControl>
<Input placeholder="My Linkup API Connector" {...field} />
</FormControl>
<FormDescription>
A friendly name to identify this connector.
</FormDescription>
<FormMessage />
</FormItem>
)}
/>
<FormField
control={form.control}
name="api_key"
render={({ field }) => (
<FormItem>
<FormLabel>Linkup API Key</FormLabel>
<FormControl>
<Input
type="password"
placeholder="Enter your Linkup API key"
{...field}
/>
</FormControl>
<FormDescription>
Your API key will be encrypted and stored securely.
</FormDescription>
<FormMessage />
</FormItem>
)}
/>
<div className="flex justify-end">
<Button
type="submit"
disabled={isSubmitting}
className="w-full sm:w-auto"
>
{isSubmitting ? (
<>
<Loader2 className="mr-2 h-4 w-4 animate-spin" />
Connecting...
</>
) : (
<>
<Check className="mr-2 h-4 w-4" />
Connect Linkup API
</>
)}
</Button>
</div>
</form>
</Form>
</CardContent>
<CardFooter className="flex flex-col items-start border-t bg-muted/50 px-6 py-4">
<h4 className="text-sm font-medium">What you get with Linkup API:</h4>
<ul className="mt-2 list-disc pl-5 text-sm text-muted-foreground">
<li>AI-powered search results tailored to your queries</li>
<li>Real-time information from the web</li>
<li>Enhanced search capabilities for your projects</li>
</ul>
</CardFooter>
</Card>
</motion.div>
</div>
);
}

View file

@ -16,6 +16,7 @@ import {
IconWorldWww,
IconTicket,
IconLayoutKanban,
IconLinkPlus,
} from "@tabler/icons-react";
import { AnimatePresence, motion } from "framer-motion";
import Link from "next/link";
@ -50,7 +51,13 @@ const connectorCategories: ConnectorCategory[] = [
icon: <IconWorldWww className="h-6 w-6" />,
status: "available",
},
// Add other search engine connectors like Tavily, Serper if they have UI config
{
id: "linkup-api",
title: "Linkup API",
description: "Search the web using the Linkup API",
icon: <IconLinkPlus className="h-6 w-6" />,
status: "available",
},
],
},
{

View file

@ -42,34 +42,95 @@ export default function FileUploader() {
const router = useRouter();
const fileInputRef = useRef<HTMLInputElement>(null);
const acceptedFileTypes = {
'image/bmp': ['.bmp'],
'text/csv': ['.csv'],
'application/msword': ['.doc'],
'application/vnd.openxmlformats-officedocument.wordprocessingml.document': ['.docx'],
'message/rfc822': ['.eml'],
'application/epub+zip': ['.epub'],
'image/heic': ['.heic'],
'text/html': ['.html'],
'image/jpeg': ['.jpeg', '.jpg'],
'image/png': ['.png'],
'text/markdown': ['.md'],
'application/vnd.ms-outlook': ['.msg'],
'application/vnd.oasis.opendocument.text': ['.odt'],
'text/x-org': ['.org'],
'application/pkcs7-signature': ['.p7s'],
'application/pdf': ['.pdf'],
'application/vnd.ms-powerpoint': ['.ppt'],
'application/vnd.openxmlformats-officedocument.presentationml.presentation': ['.pptx'],
'text/x-rst': ['.rst'],
'application/rtf': ['.rtf'],
'image/tiff': ['.tiff'],
'text/plain': ['.txt'],
'text/tab-separated-values': ['.tsv'],
'application/vnd.ms-excel': ['.xls'],
'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': ['.xlsx'],
'application/xml': ['.xml'],
}
// Audio files are always supported (using whisper)
const audioFileTypes = {
'audio/mpeg': ['.mp3', '.mpeg', '.mpga'],
'audio/mp4': ['.mp4', '.m4a'],
'audio/wav': ['.wav'],
'audio/webm': ['.webm'],
};
// Conditionally set accepted file types based on ETL service
const acceptedFileTypes = process.env.NEXT_PUBLIC_ETL_SERVICE === 'LLAMACLOUD'
? {
// LlamaCloud supported file types
'application/pdf': ['.pdf'],
'application/msword': ['.doc'],
'application/vnd.openxmlformats-officedocument.wordprocessingml.document': ['.docx'],
'application/vnd.ms-word.document.macroEnabled.12': ['.docm'],
'application/msword-template': ['.dot'],
'application/vnd.ms-word.template.macroEnabled.12': ['.dotm'],
'application/vnd.ms-powerpoint': ['.ppt'],
'application/vnd.ms-powerpoint.template.macroEnabled.12': ['.pptm'],
'application/vnd.openxmlformats-officedocument.presentationml.presentation': ['.pptx'],
'application/vnd.ms-powerpoint.template': ['.pot'],
'application/vnd.openxmlformats-officedocument.presentationml.template': ['.potx'],
'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': ['.xlsx'],
'application/vnd.ms-excel': ['.xls'],
'application/vnd.ms-excel.sheet.macroEnabled.12': ['.xlsm'],
'application/vnd.ms-excel.sheet.binary.macroEnabled.12': ['.xlsb'],
'application/vnd.ms-excel.workspace': ['.xlw'],
'application/rtf': ['.rtf'],
'application/xml': ['.xml'],
'application/epub+zip': ['.epub'],
'application/vnd.apple.keynote': ['.key'],
'application/vnd.apple.pages': ['.pages'],
'application/vnd.apple.numbers': ['.numbers'],
'application/vnd.wordperfect': ['.wpd'],
'application/vnd.oasis.opendocument.text': ['.odt'],
'application/vnd.oasis.opendocument.presentation': ['.odp'],
'application/vnd.oasis.opendocument.graphics': ['.odg'],
'application/vnd.oasis.opendocument.spreadsheet': ['.ods'],
'application/vnd.oasis.opendocument.formula': ['.fods'],
'text/plain': ['.txt'],
'text/csv': ['.csv'],
'text/tab-separated-values': ['.tsv'],
'text/html': ['.html', '.htm', '.web'],
'image/jpeg': ['.jpg', '.jpeg'],
'image/png': ['.png'],
'image/gif': ['.gif'],
'image/bmp': ['.bmp'],
'image/svg+xml': ['.svg'],
'image/tiff': ['.tiff'],
'image/webp': ['.webp'],
'application/dbase': ['.dbf'],
'application/vnd.lotus-1-2-3': ['.123'],
'text/x-web-markdown': ['.602', '.abw', '.cgm', '.cwk', '.hwp', '.lwp', '.mw', '.mcw', '.pbd', '.sda', '.sdd', '.sdp', '.sdw', '.sgl', '.sti', '.sxi', '.sxw', '.stw', '.sxg', '.uof', '.uop', '.uot', '.vor', '.wps', '.zabw'],
'text/x-spreadsheet': ['.dif', '.sylk', '.slk', '.prn', '.et', '.uos1', '.uos2', '.wk1', '.wk2', '.wk3', '.wk4', '.wks', '.wq1', '.wq2', '.wb1', '.wb2', '.wb3', '.qpw', '.xlr', '.eth'],
// Audio files (always supported)
...audioFileTypes,
}
: {
// Unstructured supported file types
'image/bmp': ['.bmp'],
'text/csv': ['.csv'],
'application/msword': ['.doc'],
'application/vnd.openxmlformats-officedocument.wordprocessingml.document': ['.docx'],
'message/rfc822': ['.eml'],
'application/epub+zip': ['.epub'],
'image/heic': ['.heic'],
'text/html': ['.html'],
'image/jpeg': ['.jpeg', '.jpg'],
'image/png': ['.png'],
'text/markdown': ['.md', '.markdown'],
'application/vnd.ms-outlook': ['.msg'],
'application/vnd.oasis.opendocument.text': ['.odt'],
'text/x-org': ['.org'],
'application/pkcs7-signature': ['.p7s'],
'application/pdf': ['.pdf'],
'application/vnd.ms-powerpoint': ['.ppt'],
'application/vnd.openxmlformats-officedocument.presentationml.presentation': ['.pptx'],
'text/x-rst': ['.rst'],
'application/rtf': ['.rtf'],
'image/tiff': ['.tiff'],
'text/plain': ['.txt'],
'text/tab-separated-values': ['.tsv'],
'application/vnd.ms-excel': ['.xls'],
'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': ['.xlsx'],
'application/xml': ['.xml'],
// Audio files (always supported)
...audioFileTypes,
};
const supportedExtensions = Array.from(new Set(Object.values(acceptedFileTypes).flat())).sort() const supportedExtensions = Array.from(new Set(Object.values(acceptedFileTypes).flat())).sort()

View file

@ -73,6 +73,13 @@ export default function DashboardLayout({
},
],
},
{
title: "Podcasts",
url: `/dashboard/${search_space_id}/podcasts`,
icon: "Podcast",
items: [
],
}
// TODO: Add research synthesizer's
// {
// title: "Research Synthesizer's",

View file

@ -0,0 +1,20 @@
import { Suspense } from 'react';
import PodcastsPageClient from './podcasts-client';
interface PageProps {
params: {
search_space_id: string;
};
}
export default async function PodcastsPage({ params }: PageProps) {
const { search_space_id: searchSpaceId } = await Promise.resolve(params);
return (
<Suspense fallback={<div className="flex items-center justify-center h-[60vh]">
<div className="h-8 w-8 animate-spin rounded-full border-4 border-primary border-t-transparent"></div>
</div>}>
<PodcastsPageClient searchSpaceId={searchSpaceId} />
</Suspense>
);
}

View file

@ -0,0 +1,968 @@
'use client';
import { format } from 'date-fns';
import { AnimatePresence, motion } from 'framer-motion';
import {
Calendar,
MoreHorizontal,
Pause,
Play,
Podcast,
Search,
SkipBack,
SkipForward,
Trash2,
Volume2, VolumeX
} from 'lucide-react';
import { useEffect, useRef, useState } from 'react';
// UI Components
import { Button } from '@/components/ui/button';
import { Card } from '@/components/ui/card';
import {
Dialog,
DialogContent,
DialogDescription,
DialogFooter,
DialogHeader,
DialogTitle,
} from "@/components/ui/dialog";
import {
DropdownMenu,
DropdownMenuContent,
DropdownMenuItem,
DropdownMenuTrigger
} from '@/components/ui/dropdown-menu';
import { Input } from '@/components/ui/input';
import {
Select,
SelectContent,
SelectGroup,
SelectItem,
SelectTrigger,
SelectValue,
} from "@/components/ui/select";
import { Slider } from '@/components/ui/slider';
import { toast } from "sonner";
interface PodcastItem {
id: number;
title: string;
created_at: string;
file_location: string;
podcast_transcript: any[];
search_space_id: number;
}
interface PodcastsPageClientProps {
searchSpaceId: string;
}
const pageVariants = {
initial: { opacity: 0 },
enter: { opacity: 1, transition: { duration: 0.4, ease: 'easeInOut', staggerChildren: 0.1 } },
exit: { opacity: 0, transition: { duration: 0.3, ease: 'easeInOut' } }
};
const podcastCardVariants = {
initial: { scale: 0.95, y: 20, opacity: 0 },
animate: { scale: 1, y: 0, opacity: 1, transition: { type: "spring", stiffness: 300, damping: 25 } },
exit: { scale: 0.95, y: -20, opacity: 0 },
hover: { y: -5, scale: 1.02, transition: { duration: 0.2 } }
};
const MotionCard = motion(Card);
export default function PodcastsPageClient({ searchSpaceId }: PodcastsPageClientProps) {
const [podcasts, setPodcasts] = useState<PodcastItem[]>([]);
const [filteredPodcasts, setFilteredPodcasts] = useState<PodcastItem[]>([]);
const [isLoading, setIsLoading] = useState(true);
const [error, setError] = useState<string | null>(null);
const [searchQuery, setSearchQuery] = useState('');
const [sortOrder, setSortOrder] = useState<string>('newest');
const [deleteDialogOpen, setDeleteDialogOpen] = useState(false);
const [podcastToDelete, setPodcastToDelete] = useState<{ id: number, title: string } | null>(null);
const [isDeleting, setIsDeleting] = useState(false);
// Audio player state
const [currentPodcast, setCurrentPodcast] = useState<PodcastItem | null>(null);
const [audioSrc, setAudioSrc] = useState<string | undefined>(undefined);
const [isAudioLoading, setIsAudioLoading] = useState(false);
const [isPlaying, setIsPlaying] = useState(false);
const [currentTime, setCurrentTime] = useState(0);
const [duration, setDuration] = useState(0);
const [volume, setVolume] = useState(0.7);
const [isMuted, setIsMuted] = useState(false);
const audioRef = useRef<HTMLAudioElement | null>(null);
const currentObjectUrlRef = useRef<string | null>(null);
// Add podcast image URL constant
const PODCAST_IMAGE_URL = "https://static.vecteezy.com/system/resources/thumbnails/002/157/611/small_2x/illustrations-concept-design-podcast-channel-free-vector.jpg";
// Fetch podcasts from API
useEffect(() => {
const fetchPodcasts = async () => {
try {
setIsLoading(true);
// Get token from localStorage
const token = localStorage.getItem('surfsense_bearer_token');
if (!token) {
setError('Authentication token not found. Please log in again.');
setIsLoading(false);
return;
}
// Fetch all podcasts for this search space
const response = await fetch(
`${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/podcasts/`,
{
headers: {
'Authorization': `Bearer ${token}`,
'Content-Type': 'application/json',
},
cache: 'no-store',
}
);
if (!response.ok) {
const errorData = await response.json().catch(() => null);
throw new Error(`Failed to fetch podcasts: ${response.status} ${errorData?.detail || ''}`);
}
const data: PodcastItem[] = await response.json();
setPodcasts(data);
setFilteredPodcasts(data);
setError(null);
} catch (error) {
console.error('Error fetching podcasts:', error);
setError(error instanceof Error ? error.message : 'Unknown error occurred');
setPodcasts([]);
setFilteredPodcasts([]);
} finally {
setIsLoading(false);
}
};
fetchPodcasts();
}, [searchSpaceId]);
// Filter and sort podcasts based on search query and sort order
useEffect(() => {
let result = [...podcasts];
// Filter by search term
if (searchQuery) {
const query = searchQuery.toLowerCase();
result = result.filter(podcast =>
podcast.title.toLowerCase().includes(query)
);
}
// Filter by search space
result = result.filter(podcast =>
podcast.search_space_id === parseInt(searchSpaceId)
);
// Sort podcasts
result.sort((a, b) => {
const dateA = new Date(a.created_at).getTime();
const dateB = new Date(b.created_at).getTime();
return sortOrder === 'newest' ? dateB - dateA : dateA - dateB;
});
setFilteredPodcasts(result);
}, [podcasts, searchQuery, sortOrder, searchSpaceId]);
// Cleanup object URL on unmount or when currentPodcast changes
useEffect(() => {
return () => {
if (currentObjectUrlRef.current) {
URL.revokeObjectURL(currentObjectUrlRef.current);
currentObjectUrlRef.current = null;
}
};
}, []);
// Audio player time update handler
const handleTimeUpdate = () => {
if (audioRef.current) {
setCurrentTime(audioRef.current.currentTime);
}
};
// Audio player metadata loaded handler
const handleMetadataLoaded = () => {
if (audioRef.current) {
setDuration(audioRef.current.duration);
}
};
// Play/pause toggle
const togglePlayPause = () => {
if (audioRef.current) {
if (isPlaying) {
audioRef.current.pause();
} else {
audioRef.current.play();
}
setIsPlaying(!isPlaying);
}
};
// Seek to position
const handleSeek = (value: number[]) => {
if (audioRef.current) {
audioRef.current.currentTime = value[0];
setCurrentTime(value[0]);
}
};
// Volume change
const handleVolumeChange = (value: number[]) => {
if (audioRef.current) {
const newVolume = value[0];
// Set volume
audioRef.current.volume = newVolume;
setVolume(newVolume);
// Handle mute state based on volume
if (newVolume === 0) {
audioRef.current.muted = true;
setIsMuted(true);
} else {
audioRef.current.muted = false;
setIsMuted(false);
}
}
};
// Toggle mute
const toggleMute = () => {
if (audioRef.current) {
const newMutedState = !isMuted;
audioRef.current.muted = newMutedState;
setIsMuted(newMutedState);
// If unmuting, restore previous volume if it was 0
if (!newMutedState && volume === 0) {
const restoredVolume = 0.5;
audioRef.current.volume = restoredVolume;
setVolume(restoredVolume);
}
}
};
// Skip forward 10 seconds
const skipForward = () => {
if (audioRef.current) {
audioRef.current.currentTime = Math.min(audioRef.current.duration, audioRef.current.currentTime + 10);
}
};
// Skip backward 10 seconds
const skipBackward = () => {
if (audioRef.current) {
audioRef.current.currentTime = Math.max(0, audioRef.current.currentTime - 10);
}
};
// Format time in MM:SS
const formatTime = (time: number) => {
const minutes = Math.floor(time / 60);
const seconds = Math.floor(time % 60);
return `${minutes}:${seconds < 10 ? '0' : ''}${seconds}`;
};
// Play podcast - Fetch blob and set object URL
const playPodcast = async (podcast: PodcastItem) => {
// If the same podcast is selected, just toggle play/pause
if (currentPodcast && currentPodcast.id === podcast.id) {
togglePlayPause();
return;
}
// Prevent multiple simultaneous loading requests
if (isAudioLoading) {
return;
}
try {
// Reset player state and show loading
setCurrentPodcast(podcast);
setAudioSrc(undefined);
setCurrentTime(0);
setDuration(0);
setIsPlaying(false);
setIsAudioLoading(true);
const token = localStorage.getItem('surfsense_bearer_token');
if (!token) {
throw new Error('Authentication token not found.');
}
// Revoke previous object URL if exists (only after we've started the new request)
if (currentObjectUrlRef.current) {
URL.revokeObjectURL(currentObjectUrlRef.current);
currentObjectUrlRef.current = null;
}
// Use AbortController to handle timeout or cancellation
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 30000); // 30 second timeout
try {
const response = await fetch(
`${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/podcasts/${podcast.id}/stream`,
{
headers: {
'Authorization': `Bearer ${token}`,
},
signal: controller.signal
}
);
if (!response.ok) {
throw new Error(`Failed to fetch audio stream: ${response.statusText}`);
}
const blob = await response.blob();
const objectUrl = URL.createObjectURL(blob);
currentObjectUrlRef.current = objectUrl;
// Set audio source
setAudioSrc(objectUrl);
// Wait for the audio to be ready before playing
// We'll handle actual playback in the onLoadedData event instead of here
} catch (error) {
if (error instanceof DOMException && error.name === 'AbortError') {
throw new Error('Request timed out. Please try again.');
}
throw error;
} finally {
clearTimeout(timeoutId);
}
} catch (error) {
console.error('Error fetching or playing podcast:', error);
toast.error(error instanceof Error ? error.message : 'Failed to load podcast audio.');
// Reset state on error
setCurrentPodcast(null);
setAudioSrc(undefined);
} finally {
setIsAudioLoading(false);
}
};
// Function to handle podcast deletion
const handleDeletePodcast = async () => {
if (!podcastToDelete) return;
setIsDeleting(true);
try {
const token = localStorage.getItem('surfsense_bearer_token');
if (!token) {
setIsDeleting(false);
return;
}
const response = await fetch(`${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/podcasts/${podcastToDelete.id}`, {
method: 'DELETE',
headers: {
'Authorization': `Bearer ${token}`,
'Content-Type': 'application/json',
}
});
if (!response.ok) {
throw new Error(`Failed to delete podcast: ${response.statusText}`);
}
// Close dialog and refresh podcasts
setDeleteDialogOpen(false);
setPodcastToDelete(null);
// Update local state by removing the deleted podcast
setPodcasts(prevPodcasts => prevPodcasts.filter(podcast => podcast.id !== podcastToDelete.id));
// If the current playing podcast is deleted, stop playback
if (currentPodcast && currentPodcast.id === podcastToDelete.id) {
if (audioRef.current) {
audioRef.current.pause();
}
setCurrentPodcast(null);
setIsPlaying(false);
}
toast.success('Podcast deleted successfully');
} catch (error) {
console.error('Error deleting podcast:', error);
toast.error(error instanceof Error ? error.message : 'Failed to delete podcast');
} finally {
setIsDeleting(false);
}
};
return (
<motion.div
className="container p-6 mx-auto"
initial="initial"
animate="enter"
exit="exit"
variants={pageVariants}
>
<div className="flex flex-col space-y-4 md:space-y-6">
<div className="flex flex-col space-y-2">
<h1 className="text-3xl font-bold tracking-tight">Podcasts</h1>
<p className="text-muted-foreground">Listen to generated podcasts.</p>
</div>
{/* Filter and Search Bar */}
<div className="flex flex-col space-y-4 md:flex-row md:items-center md:justify-between md:space-y-0">
<div className="flex flex-1 items-center gap-2">
<div className="relative w-full md:w-80">
<Search className="absolute left-2.5 top-2.5 h-4 w-4 text-muted-foreground" />
<Input
type="text"
placeholder="Search podcasts..."
className="pl-8"
value={searchQuery}
onChange={(e) => setSearchQuery(e.target.value)}
/>
</div>
</div>
<div>
<Select value={sortOrder} onValueChange={setSortOrder}>
<SelectTrigger className="w-40">
<SelectValue placeholder="Sort order" />
</SelectTrigger>
<SelectContent>
<SelectGroup>
<SelectItem value="newest">Newest First</SelectItem>
<SelectItem value="oldest">Oldest First</SelectItem>
</SelectGroup>
</SelectContent>
</Select>
</div>
</div>
{/* Status Messages */}
{isLoading && (
<div className="flex items-center justify-center h-40">
<div className="flex flex-col items-center gap-2">
<div className="h-8 w-8 animate-spin rounded-full border-4 border-primary border-t-transparent"></div>
<p className="text-sm text-muted-foreground">Loading podcasts...</p>
</div>
</div>
)}
{error && !isLoading && (
<div className="border border-destructive/50 text-destructive p-4 rounded-md">
<h3 className="font-medium">Error loading podcasts</h3>
<p className="text-sm">{error}</p>
</div>
)}
{!isLoading && !error && filteredPodcasts.length === 0 && (
<div className="flex flex-col items-center justify-center h-40 gap-2 text-center">
<Podcast className="h-8 w-8 text-muted-foreground" />
<h3 className="font-medium">No podcasts found</h3>
<p className="text-sm text-muted-foreground">
{searchQuery
? 'Try adjusting your search filters'
: 'Generate podcasts from your chats to get started'}
</p>
</div>
)}
{/* Podcast Grid */}
{!isLoading && !error && filteredPodcasts.length > 0 && (
<AnimatePresence mode="wait">
<motion.div
className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-6"
variants={pageVariants}
initial="initial"
animate="enter"
exit="exit"
>
{filteredPodcasts.map((podcast, index) => (
<MotionCard
key={podcast.id}
variants={podcastCardVariants}
initial="initial"
animate="animate"
exit="exit"
whileHover="hover"
transition={{ duration: 0.2, delay: index * 0.05 }}
className={`
bg-card/60 dark:bg-card/40 backdrop-blur-lg rounded-xl p-4
shadow-md hover:shadow-xl transition-all duration-300
border-border overflow-hidden cursor-pointer
${currentPodcast?.id === podcast.id ? 'ring-2 ring-primary ring-offset-2 ring-offset-background' : ''}
`}
layout
onClick={() => playPodcast(podcast)}
>
<div
className="relative w-full aspect-[16/10] mb-4 rounded-lg overflow-hidden"
>
{/* Podcast image with gradient overlay */}
<img
src={PODCAST_IMAGE_URL}
alt="Podcast illustration"
className="w-full h-full object-cover transition-transform duration-500 group-hover:scale-105 brightness-[0.85] contrast-[1.1]"
loading="lazy"
/>
{/* Better overlay with gradient for improved text legibility */}
<div className="absolute inset-0 bg-gradient-to-t from-black/60 to-black/10 transition-opacity duration-300"></div>
{/* Loading indicator with improved animation */}
{currentPodcast?.id === podcast.id && isAudioLoading && (
<motion.div
className="absolute inset-0 flex items-center justify-center bg-background/60 backdrop-blur-md z-10"
initial={{ opacity: 0 }}
animate={{ opacity: 1 }}
exit={{ opacity: 0 }}
transition={{ duration: 0.2 }}
>
<motion.div
className="flex flex-col items-center gap-3"
initial={{ scale: 0.9 }}
animate={{ scale: 1 }}
transition={{ type: "spring", damping: 20 }}
>
<div className="h-14 w-14 rounded-full border-4 border-primary/30 border-t-primary animate-spin"></div>
<p className="text-sm text-foreground font-medium">Loading podcast...</p>
</motion.div>
</motion.div>
)}
{/* Play button with animations */}
{!(currentPodcast?.id === podcast.id && (isPlaying || isAudioLoading)) && (
<motion.div
className="absolute top-1/2 left-1/2 -translate-x-1/2 -translate-y-1/2 z-10"
whileHover={{ scale: 1.1 }}
whileTap={{ scale: 0.9 }}
>
<Button
variant="secondary"
size="icon"
className="h-16 w-16 rounded-full
bg-background/80 hover:bg-background/95 backdrop-blur-md
transition-all duration-200 shadow-xl border-0
flex items-center justify-center"
onClick={(e) => {
e.stopPropagation();
playPodcast(podcast);
}}
disabled={isAudioLoading}
>
<motion.div
initial={{ scale: 0.8 }}
animate={{ scale: 1 }}
transition={{ type: "spring", stiffness: 400, damping: 10 }}
className="text-primary w-10 h-10 flex items-center justify-center"
>
<Play className="h-8 w-8 ml-1" />
</motion.div>
</Button>
</motion.div>
)}
{/* Pause button with animations */}
{currentPodcast?.id === podcast.id && isPlaying && !isAudioLoading && (
<motion.div
className="absolute top-1/2 left-1/2 -translate-x-1/2 -translate-y-1/2 z-10"
whileHover={{ scale: 1.1 }}
whileTap={{ scale: 0.9 }}
>
<Button
variant="secondary"
size="icon"
className="h-16 w-16 rounded-full
bg-background/80 hover:bg-background/95 backdrop-blur-md
transition-all duration-200 shadow-xl border-0
flex items-center justify-center"
onClick={(e) => {
e.stopPropagation();
togglePlayPause();
}}
disabled={isAudioLoading}
>
<motion.div
initial={{ scale: 0.8 }}
animate={{ scale: 1 }}
transition={{ type: "spring", stiffness: 400, damping: 10 }}
className="text-primary w-10 h-10 flex items-center justify-center"
>
<Pause className="h-8 w-8" />
</motion.div>
</Button>
</motion.div>
)}
{/* Now playing indicator */}
{currentPodcast?.id === podcast.id && !isAudioLoading && (
<div className="absolute top-2 left-2 bg-primary text-primary-foreground text-xs px-2 py-1 rounded-full z-10 flex items-center gap-1.5">
<span className="relative flex h-2 w-2">
<span className="animate-ping absolute inline-flex h-full w-full rounded-full bg-primary-foreground opacity-75"></span>
<span className="relative inline-flex rounded-full h-2 w-2 bg-primary-foreground"></span>
</span>
Now Playing
</div>
)}
</div>
<div className="mb-3 px-1">
<h3 className="text-base font-semibold text-foreground truncate" title={podcast.title}>
{podcast.title || 'Untitled Podcast'}
</h3>
<p className="text-xs text-muted-foreground mt-0.5 flex items-center gap-1.5">
<Calendar className="h-3 w-3" />
{format(new Date(podcast.created_at), 'MMM d, yyyy')}
</p>
</div>
{currentPodcast?.id === podcast.id && !isAudioLoading && (
<motion.div
className="mb-3 px-1"
initial={{ opacity: 0, y: 5 }}
animate={{ opacity: 1, y: 0 }}
transition={{ delay: 0.1 }}
>
<div
className="h-1.5 bg-muted rounded-full cursor-pointer group relative overflow-hidden"
onClick={(e) => {
e.stopPropagation();
if (!audioRef.current || !duration) return;
const container = e.currentTarget;
const rect = container.getBoundingClientRect();
const x = e.clientX - rect.left;
const percentage = Math.max(0, Math.min(1, x / rect.width));
const newTime = percentage * duration;
handleSeek([newTime]);
}}
>
<motion.div
className="h-full bg-primary rounded-full relative"
style={{ width: `${(currentTime / duration) * 100}%` }}
transition={{ ease: "linear" }}
>
<motion.div
className="absolute right-0 top-1/2 -translate-y-1/2 w-3 h-3
bg-primary rounded-full shadow-md transform scale-0
group-hover:scale-100 transition-transform"
whileHover={{ scale: 1.5 }}
/>
</motion.div>
</div>
<div className="flex justify-between mt-1.5 text-xs text-muted-foreground">
<span>{formatTime(currentTime)}</span>
<span>{formatTime(duration)}</span>
</div>
</motion.div>
)}
{currentPodcast?.id === podcast.id && !isAudioLoading && (
<motion.div
className="flex items-center justify-between px-2 mt-1"
initial={{ opacity: 0, y: 5 }}
animate={{ opacity: 1, y: 0 }}
transition={{ delay: 0.2 }}
>
<motion.div whileHover={{ scale: 1.2 }} whileTap={{ scale: 0.95 }}>
<Button
variant="ghost"
size="icon"
onClick={(e) => {
e.stopPropagation();
skipBackward();
}}
className="w-9 h-9 text-muted-foreground hover:text-primary transition-colors"
title="Rewind 10 seconds"
disabled={!duration}
>
<SkipBack className="w-5 h-5" />
</Button>
</motion.div>
<motion.div whileHover={{ scale: 1.2 }} whileTap={{ scale: 0.95 }}>
<Button
variant="ghost"
size="icon"
onClick={(e) => {
e.stopPropagation();
togglePlayPause();
}}
className="w-10 h-10 text-primary hover:bg-primary/10 rounded-full transition-colors"
disabled={!duration}
>
{isPlaying ?
<Pause className="w-6 h-6" /> :
<Play className="w-6 h-6 ml-0.5" />
}
</Button>
</motion.div>
<motion.div whileHover={{ scale: 1.2 }} whileTap={{ scale: 0.95 }}>
<Button
variant="ghost"
size="icon"
onClick={(e) => {
e.stopPropagation();
skipForward();
}}
className="w-9 h-9 text-muted-foreground hover:text-primary transition-colors"
title="Forward 10 seconds"
disabled={!duration}
>
<SkipForward className="w-5 h-5" />
</Button>
</motion.div>
</motion.div>
)}
<div className="absolute top-2 right-2 z-20">
<DropdownMenu>
<DropdownMenuTrigger asChild>
<Button
variant="ghost"
size="icon"
className="h-7 w-7 bg-background/50 hover:bg-background/80 rounded-full backdrop-blur-sm"
onClick={(e) => e.stopPropagation()}
>
<MoreHorizontal className="h-4 w-4" />
<span className="sr-only">Open menu</span>
</Button>
</DropdownMenuTrigger>
<DropdownMenuContent align="end">
<DropdownMenuItem
className="text-destructive focus:text-destructive"
onClick={(e) => {
e.stopPropagation();
setPodcastToDelete({ id: podcast.id, title: podcast.title });
setDeleteDialogOpen(true);
}}
>
<Trash2 className="mr-2 h-4 w-4" />
<span>Delete Podcast</span>
</DropdownMenuItem>
</DropdownMenuContent>
</DropdownMenu>
</div>
</MotionCard>
))}
</motion.div>
</AnimatePresence>
)}
{/* Current Podcast Player (Fixed at bottom) */}
{currentPodcast && !isAudioLoading && audioSrc && (
<motion.div
initial={{ y: 100, opacity: 0 }}
animate={{ y: 0, opacity: 1 }}
exit={{ y: 100, opacity: 0 }}
transition={{ type: "spring", stiffness: 300, damping: 30 }}
className="fixed bottom-0 left-0 right-0 bg-background/95 backdrop-blur-sm border-t p-4 shadow-lg z-50"
>
<div className="container mx-auto">
<div className="flex flex-col md:flex-row items-center gap-4">
<div className="flex-shrink-0">
<motion.div
className="w-12 h-12 bg-primary/20 rounded-md flex items-center justify-center"
animate={{ scale: isPlaying ? [1, 1.05, 1] : 1 }}
transition={{ repeat: isPlaying ? Infinity : 0, duration: 2 }}
>
<Podcast className="h-6 w-6 text-primary" />
</motion.div>
</div>
<div className="flex-grow min-w-0">
<h4 className="font-medium text-sm line-clamp-1">{currentPodcast.title}</h4>
<div className="flex items-center gap-2 mt-2">
<div className="flex-grow relative">
<Slider
value={[currentTime]}
min={0}
max={duration || 100}
step={0.1}
onValueChange={handleSeek}
className="relative z-10"
/>
<motion.div
className="absolute left-0 top-1/2 h-2 bg-primary/25 rounded-full -translate-y-1/2"
style={{ width: `${(currentTime / (duration || 100)) * 100}%` }}
transition={{ ease: "linear" }}
/>
</div>
<div className="flex-shrink-0 text-xs text-muted-foreground whitespace-nowrap">
{formatTime(currentTime)} / {formatTime(duration)}
</div>
</div>
</div>
<div className="flex items-center gap-2">
<motion.div whileHover={{ scale: 1.1 }} whileTap={{ scale: 0.95 }}>
<Button
variant="ghost"
size="icon"
onClick={skipBackward}
className="h-8 w-8"
>
<SkipBack className="h-4 w-4" />
</Button>
</motion.div>
<motion.div whileHover={{ scale: 1.1 }} whileTap={{ scale: 0.95 }}>
<Button
variant="default"
size="icon"
onClick={togglePlayPause}
className="h-10 w-10 rounded-full"
>
{isPlaying ? <Pause className="h-5 w-5" /> : <Play className="h-5 w-5 ml-0.5" />}
</Button>
</motion.div>
<motion.div whileHover={{ scale: 1.1 }} whileTap={{ scale: 0.95 }}>
<Button
variant="ghost"
size="icon"
onClick={skipForward}
className="h-8 w-8"
>
<SkipForward className="h-4 w-4" />
</Button>
</motion.div>
<div className="hidden md:flex items-center gap-2 ml-4 w-32">
<motion.div whileHover={{ scale: 1.1 }} whileTap={{ scale: 0.95 }}>
<Button
variant="ghost"
size="icon"
onClick={toggleMute}
className={`h-8 w-8 ${isMuted ? "text-muted-foreground" : "text-primary"}`}
>
{isMuted ? <VolumeX className="h-4 w-4" /> : <Volume2 className="h-4 w-4" />}
</Button>
</motion.div>
<div className="relative w-24">
<Slider
value={[isMuted ? 0 : volume]}
min={0}
max={1}
step={0.01}
onValueChange={handleVolumeChange}
className="w-24"
disabled={isMuted}
/>
<motion.div
className={`absolute left-0 bottom-0 h-1 bg-primary/30 rounded-full ${isMuted ? "opacity-50" : ""}`}
initial={false}
animate={{ width: `${(isMuted ? 0 : volume) * 100}%` }}
/>
</div>
</div>
</div>
</div>
</div>
</motion.div>
)}
</div>
{/* Delete Confirmation Dialog */}
<Dialog open={deleteDialogOpen} onOpenChange={setDeleteDialogOpen}>
<DialogContent className="sm:max-w-md">
<DialogHeader>
<DialogTitle className="flex items-center gap-2">
<Trash2 className="h-5 w-5 text-destructive" />
<span>Delete Podcast</span>
</DialogTitle>
<DialogDescription>
Are you sure you want to delete <span className="font-medium">{podcastToDelete?.title}</span>? This action cannot be undone.
</DialogDescription>
</DialogHeader>
<DialogFooter className="flex gap-2 sm:justify-end">
<Button
variant="outline"
onClick={() => setDeleteDialogOpen(false)}
disabled={isDeleting}
>
Cancel
</Button>
<Button
variant="destructive"
onClick={handleDeletePodcast}
disabled={isDeleting}
className="gap-2"
>
{isDeleting ? (
<>
<span className="h-4 w-4 animate-spin rounded-full border-2 border-current border-t-transparent" />
Deleting...
</>
) : (
<>
<Trash2 className="h-4 w-4" />
Delete
</>
)}
</Button>
</DialogFooter>
</DialogContent>
</Dialog>
{/* Hidden audio element for playback */}
<audio
ref={audioRef}
src={audioSrc}
preload="auto"
onTimeUpdate={handleTimeUpdate}
onLoadedMetadata={handleMetadataLoaded}
onLoadedData={() => {
// Only auto-play when audio is fully loaded
if (audioRef.current && currentPodcast && audioSrc) {
// Small delay to ensure browser is ready to play
setTimeout(() => {
if (audioRef.current) {
audioRef.current.play()
.then(() => {
setIsPlaying(true);
})
.catch(error => {
console.error('Error playing audio:', error);
// Don't show error if it's just the user navigating away
if (error.name !== 'AbortError') {
toast.error('Failed to play audio.');
}
setIsPlaying(false);
});
}
}, 100);
}
}}
onEnded={() => setIsPlaying(false)}
onError={(e) => {
console.error('Audio error:', e);
if (audioRef.current?.error) {
// Log the specific error code for debugging
console.error('Audio error code:', audioRef.current.error.code);
// Don't show error message for aborted loads
if (audioRef.current.error.code !== audioRef.current.error.MEDIA_ERR_ABORTED) {
toast.error('Error playing audio. Please try again.');
}
}
// Reset playing state on error
setIsPlaying(false);
}}
/>
</motion.div>
);
}

View file

@ -13,7 +13,9 @@ import {
ArrowDown,
CircleUser,
Database,
SendHorizontal
SendHorizontal,
FileText,
Grid3x3
} from 'lucide-react';
import { Card, CardContent, CardHeader, CardTitle } from '@/components/ui/card';
import { Button } from '@/components/ui/button';
@ -46,7 +48,6 @@ import {
researcherOptions
} from '@/components/chat';
import { MarkdownViewer } from '@/components/markdown-viewer';
import { connectorSourcesMenu as defaultConnectorSourcesMenu } from '@/components/chat/connector-sources';
import { Logo } from '@/components/Logo';
import { useSearchSourceConnectors } from '@/hooks';
@ -239,7 +240,6 @@ const SourcesDialogContent = ({
const ChatPage = () => {
const [token, setToken] = React.useState<string | null>(null);
const [activeTab, setActiveTab] = useState("");
const [dialogOpenId, setDialogOpenId] = useState<number | null>(null);
const [sourcesPage, setSourcesPage] = useState(1);
const [expandedSources, setExpandedSources] = useState(false);
@ -249,10 +249,10 @@ const ChatPage = () => {
const tabsListRef = useRef<HTMLDivElement>(null);
const [terminalExpanded, setTerminalExpanded] = useState(false);
const [selectedConnectors, setSelectedConnectors] = useState<string[]>(["CRAWLED_URL"]);
const [searchMode, setSearchMode] = useState<'DOCUMENTS' | 'CHUNKS'>('DOCUMENTS');
const [researchMode, setResearchMode] = useState<ResearchMode>("GENERAL");
const [currentTime, setCurrentTime] = useState<string>('');
const [currentDate, setCurrentDate] = useState<string>('');
const [connectorSources, setConnectorSources] = useState<any[]>([]);
const terminalMessagesRef = useRef<HTMLDivElement>(null);
const { connectorSourceItems, isLoading: isLoadingConnectors } = useSearchSourceConnectors();
@ -364,7 +364,8 @@ const ChatPage = () => {
data: {
search_space_id: search_space_id,
selected_connectors: selectedConnectors,
research_mode: researchMode
research_mode: researchMode,
search_mode: searchMode
}
},
onError: (error) => {
@ -476,43 +477,10 @@ const ChatPage = () => {
updateChat();
}, [messages, status, chat_id, researchMode, selectedConnectors, search_space_id]);
// Memoize connector sources to prevent excessive re-renders
const processedConnectorSources = React.useMemo(() => {
if (messages.length === 0) return connectorSources;
// Only process when we have a complete message (not streaming)
if (status !== 'ready') return connectorSources;
// Find the latest assistant message
const assistantMessages = messages.filter(msg => msg.role === 'assistant');
if (assistantMessages.length === 0) return connectorSources;
const latestAssistantMessage = assistantMessages[assistantMessages.length - 1];
if (!latestAssistantMessage?.annotations) return connectorSources;
// Find the latest SOURCES annotation
const annotations = latestAssistantMessage.annotations as any[];
const sourcesAnnotations = annotations.filter(a => a.type === 'SOURCES');
if (sourcesAnnotations.length === 0) return connectorSources;
const latestSourcesAnnotation = sourcesAnnotations[sourcesAnnotations.length - 1];
if (!latestSourcesAnnotation.content) return connectorSources;
// Use this content if it differs from current
return latestSourcesAnnotation.content;
}, [messages, status, connectorSources]);
// Update connector sources when processed value changes
useEffect(() => {
if (processedConnectorSources !== connectorSources) {
setConnectorSources(processedConnectorSources);
}
}, [processedConnectorSources, connectorSources]);
// Check and scroll terminal when terminal info is available
useEffect(() => {
if (messages.length === 0 || status !== 'ready') return;
// Modified to trigger during streaming as well (removed status check)
if (messages.length === 0) return;
// Find the latest assistant message
const assistantMessages = messages.filter(msg => msg.role === 'assistant');
@ -526,10 +494,27 @@ const ChatPage = () => {
const terminalInfoAnnotations = annotations.filter(a => a.type === 'TERMINAL_INFO');
if (terminalInfoAnnotations.length > 0) {
// Schedule scrolling after the DOM has been updated
setTimeout(scrollTerminalToBottom, 100);
// Always scroll to bottom when terminal info is updated, even during streaming
scrollTerminalToBottom();
}
}, [messages, status]);
}, [messages]); // Removed status from dependencies to ensure it triggers during streaming
// Pure function to get connector sources for a specific message
const getMessageConnectorSources = (message: any): any[] => {
if (!message || message.role !== 'assistant' || !message.annotations) return [];
// Find all SOURCES annotations
const annotations = message.annotations as any[];
const sourcesAnnotations = annotations.filter(a => a.type === 'SOURCES');
// Get the latest SOURCES annotation
if (sourcesAnnotations.length === 0) return [];
const latestSourcesAnnotation = sourcesAnnotations[sourcesAnnotations.length - 1];
if (!latestSourcesAnnotation.content) return [];
return latestSourcesAnnotation.content;
};
// Custom handleSubmit function to include selected connectors and answer type
const handleSubmit = (e: React.FormEvent) => {
@ -561,17 +546,12 @@ const ChatPage = () => {
scrollToBottom();
}, [messages]);
// Set activeTab when connectorSources change using a memoized value
const activeTabValue = React.useMemo(() => {
return connectorSources.length > 0 ? connectorSources[0].type : "";
}, [connectorSources]);
// Update activeTab when the memoized value changes
// Reset sources page when new messages arrive
useEffect(() => {
if (activeTabValue && activeTabValue !== activeTab) {
setActiveTab(activeTabValue);
}
}, [activeTabValue, activeTab]);
// Reset pagination when we get new messages
setSourcesPage(1);
setExpandedSources(false);
}, [messages]);
// Scroll terminal to bottom when expanded
useEffect(() => {
@ -580,11 +560,6 @@ const ChatPage = () => {
}
}, [terminalExpanded]);
// Get total sources count for a connector type
const getSourcesCount = (connectorType: string) => {
return getSourcesCountUtil(connectorSources, connectorType);
};
// Function to check scroll position and update indicators
const updateScrollIndicators = () => {
updateScrollIndicatorsUtil(tabsListRef as React.RefObject<HTMLDivElement>, setCanScrollLeft, setCanScrollRight);
@ -610,23 +585,6 @@ const ChatPage = () => {
// Use the scroll to bottom hook
useScrollToBottom(messagesEndRef as React.RefObject<HTMLDivElement>, [messages]);
// Function to get sources for the main view
const getMainViewSources = (connector: any) => {
return getMainViewSourcesUtil(connector, INITIAL_SOURCES_DISPLAY);
};
// Function to get filtered sources for the dialog with null check
const getFilteredSourcesWithCheck = (connector: any, sourceFilter: string) => {
if (!connector?.sources) return [];
return getFilteredSourcesUtil(connector, sourceFilter);
};
// Function to get paginated dialog sources with null check
const getPaginatedDialogSourcesWithCheck = (connector: any, sourceFilter: string, expandedSources: boolean, sourcesPage: number, sourcesPerPage: number) => {
if (!connector?.sources) return [];
return getPaginatedDialogSourcesUtil(connector, sourceFilter, expandedSources, sourcesPage, sourcesPerPage);
};
// Function to get a citation source by ID
const getCitationSource = React.useCallback((citationId: number, messageIndex?: number): Source | null => {
if (!messages || messages.length === 0) return null;
@ -638,23 +596,14 @@ const ChatPage = () => {
if (assistantMessages.length === 0) return null;
const latestAssistantMessage = assistantMessages[assistantMessages.length - 1];
if (!latestAssistantMessage?.annotations) return null;
// Find all SOURCES annotations
const annotations = latestAssistantMessage.annotations as any[];
const sourcesAnnotations = annotations.filter(
(annotation) => annotation.type === 'SOURCES'
);
// Get the latest SOURCES annotation
if (sourcesAnnotations.length === 0) return null;
const latestSourcesAnnotation = sourcesAnnotations[sourcesAnnotations.length - 1];
if (!latestSourcesAnnotation.content) return null;
// Use our helper function to get sources
const sources = getMessageConnectorSources(latestAssistantMessage);
if (sources.length === 0) return null;
// Flatten all sources from all connectors
const allSources: Source[] = [];
latestSourcesAnnotation.content.forEach((connector: ConnectorSource) => {
sources.forEach((connector: ConnectorSource) => {
if (connector.sources && Array.isArray(connector.sources)) {
connector.sources.forEach((source: SourceItem) => {
allSources.push({
@ -675,23 +624,14 @@ const ChatPage = () => {
} else {
// Use the specific message by index
const message = messages[messageIndex];
if (!message || message.role !== 'assistant' || !message.annotations) return null;
// Find all SOURCES annotations
const annotations = message.annotations as any[];
const sourcesAnnotations = annotations.filter(
(annotation) => annotation.type === 'SOURCES'
);
// Get the latest SOURCES annotation
if (sourcesAnnotations.length === 0) return null;
const latestSourcesAnnotation = sourcesAnnotations[sourcesAnnotations.length - 1];
if (!latestSourcesAnnotation.content) return null;
// Use our helper function to get sources
const sources = getMessageConnectorSources(message);
if (sources.length === 0) return null;
// Flatten all sources from all connectors
const allSources: Source[] = [];
latestSourcesAnnotation.content.forEach((connector: ConnectorSource) => {
sources.forEach((connector: ConnectorSource) => {
if (connector.sources && Array.isArray(connector.sources)) {
connector.sources.forEach((source: SourceItem) => {
allSources.push({
@ -712,6 +652,34 @@ const ChatPage = () => {
}
}, [messages]);
// Pure function for rendering terminal content - no hooks allowed here
const renderTerminalContent = (message: any) => {
if (!message.annotations) return null;
// Get all TERMINAL_INFO annotations
const terminalInfoAnnotations = (message.annotations as any[])
.filter(a => a.type === 'TERMINAL_INFO');
// Get the latest TERMINAL_INFO annotation
const latestTerminalInfo = terminalInfoAnnotations.length > 0
? terminalInfoAnnotations[terminalInfoAnnotations.length - 1]
: null;
// Render the content of the latest TERMINAL_INFO annotation
return latestTerminalInfo?.content.map((item: any, idx: number) => (
<div key={idx} className="py-0.5 flex items-start text-gray-300">
<span className="text-gray-500 text-xs mr-2 w-10 flex-shrink-0">[{String(idx).padStart(2, '0')}:{String(Math.floor(idx * 2)).padStart(2, '0')}]</span>
<span className="mr-2 opacity-70">{'>'}</span>
<span className={`
${item.type === 'info' ? 'text-blue-300' : ''}
${item.type === 'success' ? 'text-green-300' : ''}
${item.type === 'error' ? 'text-red-300' : ''}
${item.type === 'warning' ? 'text-yellow-300' : ''}
`}>{item.text}</span>
</div>
));
};
return (
<>
<div className="flex flex-col min-h-[calc(100vh-4rem)] min-w-4xl max-w-4xl mx-auto px-4 py-8 overflow-x-hidden justify-center gap-4">
@ -781,30 +749,9 @@ const ChatPage = () => {
<span className="mr-1">$</span> <span className="mr-1">$</span>
<span>surfsense-researcher</span> <span>surfsense-researcher</span>
</div> </div>
{message.annotations && (() => {
// Get all TERMINAL_INFO annotations
const terminalInfoAnnotations = (message.annotations as any[])
.filter(a => a.type === 'TERMINAL_INFO');
// Get the latest TERMINAL_INFO annotation
const latestTerminalInfo = terminalInfoAnnotations.length > 0
? terminalInfoAnnotations[terminalInfoAnnotations.length - 1]
: null;
// Render the content of the latest TERMINAL_INFO annotation
return latestTerminalInfo?.content.map((item: any, idx: number) => (
<div key={idx} className="py-0.5 flex items-start text-gray-300">
<span className="text-gray-500 text-xs mr-2 w-10 flex-shrink-0">[{String(idx).padStart(2, '0')}:{String(Math.floor(idx * 2)).padStart(2, '0')}]</span>
<span className="mr-2 opacity-70">{'>'}</span>
<span className={`
${item.type === 'info' ? 'text-blue-300' : ''}
${item.type === 'success' ? 'text-green-300' : ''}
${item.type === 'error' ? 'text-red-300' : ''}
${item.type === 'warning' ? 'text-yellow-300' : ''}
`}>{item.text}</span>
</div>
));
})()}
{renderTerminalContent(message)}
<div className="mt-2 flex items-center"> <div className="mt-2 flex items-center">
<span className="text-gray-500 text-xs mr-2 w-10 flex-shrink-0">[00:13]</span> <span className="text-gray-500 text-xs mr-2 w-10 flex-shrink-0">[00:13]</span>
<span className="text-green-400 mr-1">researcher@surfsense</span> <span className="text-green-400 mr-1">researcher@surfsense</span>
@ -836,105 +783,120 @@ const ChatPage = () => {
<span className="font-medium">Sources</span> <span className="font-medium">Sources</span>
</div> </div>
<Tabs
defaultValue={connectorSources.length > 0 ? connectorSources[0].type : "CRAWLED_URL"}
className="w-full"
onValueChange={setActiveTab}
>
<div className="mb-4">
<div className="flex items-center">
<Button
variant="ghost"
size="icon"
onClick={scrollTabsLeft}
className="flex-shrink-0 mr-2 z-10"
disabled={!canScrollLeft}
>
<ChevronLeft className="h-4 w-4" />
</Button>
<div className="flex-1 overflow-hidden">
<div className="flex overflow-x-auto hide-scrollbar" ref={tabsListRef} onScroll={updateScrollIndicators}>
<TabsList className="flex-1 bg-transparent border-0 p-0 custom-tabs-list">
{connectorSources.map((connector) => (
<TabsTrigger
key={connector.id}
value={connector.type}
className="flex items-center gap-1 mx-1 data-[state=active]:bg-gray-100 dark:data-[state=active]:bg-gray-800 rounded-md"
>
{getConnectorIcon(connector.type)}
<span className="hidden sm:inline ml-1">{connector.name.split(' ')[0]}</span>
<span className="bg-gray-200 dark:bg-gray-700 px-1.5 py-0.5 rounded text-xs">
{getSourcesCount(connector.type)}
</span>
</TabsTrigger>
))}
</TabsList>
</div>
</div>
<Button
variant="ghost"
size="icon"
onClick={scrollTabsRight}
className="flex-shrink-0 ml-2 z-10"
disabled={!canScrollRight}
>
<ChevronRight className="h-4 w-4" />
</Button>
</div>
</div>
{connectorSources.map(connector => (
<TabsContent key={connector.id} value={connector.type} className="mt-0">
<div className="space-y-3">
{getMainViewSources(connector)?.map((source: any) => (
<Card key={source.id} className="p-3 hover:bg-gray-50 dark:hover:bg-gray-800 cursor-pointer">
<div className="flex items-start gap-3">
<div className="flex-shrink-0 w-6 h-6 flex items-center justify-center">
{getConnectorIcon(connector.type)}
</div>
<div className="flex-1">
<h3 className="font-medium text-sm">{source.title}</h3>
<p className="text-sm text-gray-500 dark:text-gray-400">{source.description}</p>
</div>
<Button
variant="ghost"
size="icon"
className="h-6 w-6"
onClick={() => window.open(source.url, '_blank')}
>
<ExternalLink className="h-4 w-4" />
</Button>
</div>
</Card>
))}
{connector.sources.length > INITIAL_SOURCES_DISPLAY && (
<Dialog open={dialogOpenId === connector.id} onOpenChange={(open) => setDialogOpenId(open ? connector.id : null)}>
<DialogTrigger asChild>
<Button variant="ghost" className="w-full text-sm text-gray-500 dark:text-gray-400">
Show {connector.sources.length - INITIAL_SOURCES_DISPLAY} More Sources
</Button>
</DialogTrigger>
<DialogContent className="sm:max-w-[600px] max-h-[80vh] overflow-y-auto dark:border-gray-700">
<SourcesDialogContent
connector={connector}
sourceFilter={sourceFilter}
expandedSources={expandedSources}
sourcesPage={sourcesPage}
setSourcesPage={setSourcesPage}
setSourceFilter={setSourceFilter}
setExpandedSources={setExpandedSources}
isLoadingMore={false}
/>
</DialogContent>
</Dialog>
)}
</div>
</TabsContent>
))}
</Tabs>
{(() => {
// Get sources for this specific message
const messageConnectorSources = getMessageConnectorSources(message);
if (messageConnectorSources.length === 0) {
return (
<div className="text-center py-8 text-gray-500 dark:text-gray-400 border border-dashed rounded-md">
<Database className="h-8 w-8 mx-auto mb-2 opacity-50" />
</div>
);
}
// Use these message-specific sources for the Tabs component
return (
<Tabs
defaultValue={messageConnectorSources.length > 0 ? messageConnectorSources[0].type : "CRAWLED_URL"}
className="w-full"
>
<div className="mb-4">
<div className="flex items-center">
<Button
variant="ghost"
size="icon"
onClick={scrollTabsLeft}
className="flex-shrink-0 mr-2 z-10"
disabled={!canScrollLeft}
>
<ChevronLeft className="h-4 w-4" />
</Button>
<div className="flex-1 overflow-hidden">
<div className="flex overflow-x-auto hide-scrollbar" ref={tabsListRef} onScroll={updateScrollIndicators}>
<TabsList className="flex-1 bg-transparent border-0 p-0 custom-tabs-list">
{messageConnectorSources.map((connector) => (
<TabsTrigger
key={connector.id}
value={connector.type}
className="flex items-center gap-1 mx-1 data-[state=active]:bg-gray-100 dark:data-[state=active]:bg-gray-800 rounded-md"
>
{getConnectorIcon(connector.type)}
<span className="hidden sm:inline ml-1">{connector.name.split(' ')[0]}</span>
<span className="bg-gray-200 dark:bg-gray-700 px-1.5 py-0.5 rounded text-xs">
{connector.sources?.length || 0}
</span>
</TabsTrigger>
))}
</TabsList>
</div>
</div>
<Button
variant="ghost"
size="icon"
onClick={scrollTabsRight}
className="flex-shrink-0 ml-2 z-10"
disabled={!canScrollRight}
>
<ChevronRight className="h-4 w-4" />
</Button>
</div>
</div>
{messageConnectorSources.map(connector => (
<TabsContent key={connector.id} value={connector.type} className="mt-0">
<div className="space-y-3">
{connector.sources?.slice(0, INITIAL_SOURCES_DISPLAY)?.map((source: any) => (
<Card key={source.id} className="p-3 hover:bg-gray-50 dark:hover:bg-gray-800 cursor-pointer">
<div className="flex items-start gap-3">
<div className="flex-shrink-0 w-6 h-6 flex items-center justify-center">
{getConnectorIcon(connector.type)}
</div>
<div className="flex-1">
<h3 className="font-medium text-sm">{source.title}</h3>
<p className="text-sm text-gray-500 dark:text-gray-400">{source.description}</p>
</div>
<Button
variant="ghost"
size="icon"
className="h-6 w-6"
onClick={() => window.open(source.url, '_blank')}
>
<ExternalLink className="h-4 w-4" />
</Button>
</div>
</Card>
))}
{connector.sources?.length > INITIAL_SOURCES_DISPLAY && (
<Dialog open={dialogOpenId === connector.id} onOpenChange={(open) => setDialogOpenId(open ? connector.id : null)}>
<DialogTrigger asChild>
<Button variant="ghost" className="w-full text-sm text-gray-500 dark:text-gray-400">
Show {connector.sources.length - INITIAL_SOURCES_DISPLAY} More Sources
</Button>
</DialogTrigger>
<DialogContent className="sm:max-w-[600px] max-h-[80vh] overflow-y-auto dark:border-gray-700">
<SourcesDialogContent
connector={connector}
sourceFilter={sourceFilter}
expandedSources={expandedSources}
sourcesPage={sourcesPage}
setSourcesPage={setSourcesPage}
setSourceFilter={setSourceFilter}
setExpandedSources={setExpandedSources}
isLoadingMore={false}
/>
</DialogContent>
</Dialog>
)}
</div>
</TabsContent>
))}
</Tabs>
);
})()}
</div>
{/* Answer Section */}
@ -1014,15 +976,17 @@ const ChatPage = () => {
<span className="sr-only">Send</span> <span className="sr-only">Send</span>
</Button> </Button>
</form> </form>
<div className="flex items-center justify-between px-2 py-1 mt-8"> <div className="flex items-center justify-between px-2 py-2 mt-3">
<div className="flex items-center gap-4"> <div className="flex items-center space-x-3">
{/* Connector Selection Dialog */} {/* Connector Selection Dialog */}
<Dialog> <Dialog>
<DialogTrigger asChild> <DialogTrigger asChild>
<ConnectorButton <div className="h-8">
selectedConnectors={selectedConnectors} <ConnectorButton
onClick={() => { }} selectedConnectors={selectedConnectors}
/> onClick={() => { }}
/>
</div>
</DialogTrigger> </DialogTrigger>
<DialogContent className="sm:max-w-md"> <DialogContent className="sm:max-w-md">
<DialogHeader> <DialogHeader>
@ -1089,12 +1053,40 @@ const ChatPage = () => {
</DialogContent> </DialogContent>
</Dialog> </Dialog>
{/* Search Mode Control */}
<div className="flex items-center p-0.5 rounded-md border border-border bg-muted/20 h-8">
<button
onClick={() => setSearchMode('DOCUMENTS')}
className={`flex h-full items-center justify-center gap-1 px-2 rounded text-xs font-medium transition-colors flex-1 whitespace-nowrap overflow-hidden ${
searchMode === 'DOCUMENTS'
? 'bg-primary text-primary-foreground shadow-sm'
: 'text-muted-foreground hover:text-foreground hover:bg-muted/50'
}`}
>
<FileText className="h-3 w-3 flex-shrink-0 mr-1" />
<span>Full Document</span>
</button>
<button
onClick={() => setSearchMode('CHUNKS')}
className={`flex h-full items-center justify-center gap-1 px-2 rounded text-xs font-medium transition-colors flex-1 whitespace-nowrap overflow-hidden ${
searchMode === 'CHUNKS'
? 'bg-primary text-primary-foreground shadow-sm'
: 'text-muted-foreground hover:text-foreground hover:bg-muted/50'
}`}
>
<Grid3x3 className="h-3 w-3 flex-shrink-0 mr-1" />
<span>Document Chunks</span>
</button>
</div>
{/* Research Mode Segmented Control */} {/* Research Mode Segmented Control */}
<SegmentedControl<ResearchMode> <div className="h-8">
value={researchMode} <SegmentedControl<ResearchMode>
onChange={setResearchMode} value={researchMode}
options={researcherOptions} onChange={setResearchMode}
/> options={researcherOptions}
/>
</div>
</div> </div>
</div> </div>
</div> </div>

View file

@ -4,7 +4,7 @@ import React from 'react'
import Link from 'next/link' import Link from 'next/link'
import { motion } from 'framer-motion' import { motion } from 'framer-motion'
import { Button } from '@/components/ui/button' import { Button } from '@/components/ui/button'
import { Plus, Search, Trash2, AlertCircle, Loader2 } from 'lucide-react' import { Plus, Search, Trash2, AlertCircle, Loader2, LogOut } from 'lucide-react'
import { Tilt } from '@/components/ui/tilt' import { Tilt } from '@/components/ui/tilt'
import { Spotlight } from '@/components/ui/spotlight' import { Spotlight } from '@/components/ui/spotlight'
import { Logo } from '@/components/Logo'; import { Logo } from '@/components/Logo';
@ -145,11 +145,19 @@ const DashboardPage = () => {
}, },
}; };
const router = useRouter();
const { searchSpaces, loading, error, refreshSearchSpaces } = useSearchSpaces(); const { searchSpaces, loading, error, refreshSearchSpaces } = useSearchSpaces();
if (loading) return <LoadingScreen />; if (loading) return <LoadingScreen />;
if (error) return <ErrorScreen message={error} />; if (error) return <ErrorScreen message={error} />;
const handleLogout = () => {
if (typeof window !== 'undefined') {
localStorage.removeItem('surfsense_bearer_token');
router.push('/');
}
};
const handleDeleteSearchSpace = async (id: number) => { const handleDeleteSearchSpace = async (id: number) => {
// Send DELETE request to the API // Send DELETE request to the API
try { try {
@ -193,7 +201,18 @@ const DashboardPage = () => {
</p> </p>
</div> </div>
</div> </div>
<ThemeTogglerComponent /> <div className="flex items-center space-x-3">
<Button
variant="ghost"
size="icon"
onClick={handleLogout}
className="h-9 w-9 rounded-full"
aria-label="Logout"
>
<LogOut className="h-5 w-5" />
</Button>
<ThemeTogglerComponent />
</div>
</div> </div>
<div className="flex flex-col space-y-6 mt-6"> <div className="flex flex-col space-y-6 mt-6">

View file

@ -45,6 +45,7 @@
--sidebar-accent-foreground: oklch(0.205 0 0); --sidebar-accent-foreground: oklch(0.205 0 0);
--sidebar-border: oklch(0.922 0 0); --sidebar-border: oklch(0.922 0 0);
--sidebar-ring: oklch(0.708 0 0); --sidebar-ring: oklch(0.708 0 0);
--syntax-bg: #f5f5f5;
} }
.dark { .dark {
@ -80,6 +81,7 @@
--sidebar-accent-foreground: oklch(0.985 0 0); --sidebar-accent-foreground: oklch(0.985 0 0);
--sidebar-border: oklch(0.269 0 0); --sidebar-border: oklch(0.269 0 0);
--sidebar-ring: oklch(0.439 0 0); --sidebar-ring: oklch(0.439 0 0);
--syntax-bg: #1e1e1e;
} }
@theme inline { @theme inline {

View file

@ -15,35 +15,67 @@ const roboto = Roboto({
}); });
export const metadata: Metadata = { export const metadata: Metadata = {
title: "SurfSense - A Personal NotebookLM and Perplexity-like AI Assistant for Everyone.", title: "SurfSense Customizable AI Research & Knowledge Management Assistant",
description: description:
"Have your own private NotebookLM and Perplexity with better integrations.", "SurfSense is an AI-powered research assistant that integrates with tools like Notion, GitHub, Slack, and more to help you efficiently manage, search, and chat with your documents. Generate podcasts, perform hybrid search, and unlock insights from your knowledge base.",
openGraph: { keywords: [
images: [ "SurfSense",
{ "AI research assistant",
url: "https://surfsense.net/og-image.png", "AI knowledge management",
width: 1200, "AI document assistant",
height: 630, "customizable AI assistant",
alt: "SurfSense - A Personal NotebookLM and Perplexity-like AI Assistant for Everyone.", "notion integration",
}, "slack integration",
], "github integration",
}, "hybrid search",
twitter: { "vector search",
card: "summary_large_image", "RAG",
site: "https://surfsense.net", "LangChain",
creator: "https://surfsense.net", "FastAPI",
title: "SurfSense - A Personal NotebookLM and Perplexity-like AI Assistant for Everyone.", "LLM apps",
description: "AI document chat",
"Have your own private NotebookLM and Perplexity with better integrations.", "knowledge management AI",
images: [ "AI-powered document search",
{ "personal AI assistant",
url: "https://surfsense.net/og-image.png", "AI research tools",
width: 1200, "AI podcast generator",
height: 630, "AI knowledge base",
alt: "SurfSense - A Personal NotebookLM and Perplexity-like AI Assistant for Everyone.", "AI document assistant tools",
}, "AI-powered search assistant",
], ],
}, openGraph: {
title: "SurfSense AI Research & Knowledge Management Assistant",
description:
"Connect your documents and tools like Notion, Slack, GitHub, and more to your private AI assistant. SurfSense offers powerful search, document chat, podcast generation, and RAG APIs to enhance your workflow.",
url: "https://surfsense.net",
siteName: "SurfSense",
type: "website",
images: [
{
url: "https://surfsense.net/og-image.png",
width: 1200,
height: 630,
alt: "SurfSense AI Research Assistant",
},
],
locale: "en_US",
},
twitter: {
card: "summary_large_image",
title: "SurfSense AI Assistant for Research & Knowledge Management",
description:
"Have your own NotebookLM or Perplexity, but better. SurfSense connects external tools, allows chat with your documents, and generates fast, high-quality podcasts.",
creator: "https://surfsense.net",
site: "https://surfsense.net",
images: [
{
url: "https://surfsense.net/og-image-twitter.png",
width: 1200,
height: 630,
alt: "SurfSense AI Assistant Preview",
},
],
}
}; };
export default async function RootLayout({ export default async function RootLayout({

View file

@ -0,0 +1,43 @@
"use client";
import React from "react";
export const AmbientBackground = () => {
return (
<div className="pointer-events-none absolute left-0 top-0 z-0 h-screen w-screen">
<div
style={{
transform: "translateY(-350px) rotate(-45deg)",
width: "560px",
height: "1380px",
background:
"radial-gradient(68.54% 68.72% at 55.02% 31.46%, rgba(59, 130, 246, 0.08) 0%, rgba(59, 130, 246, 0.02) 50%, rgba(59, 130, 246, 0) 100%)",
}}
className="absolute left-0 top-0"
/>
<div
style={{
transform: "rotate(-45deg) translate(5%, -50%)",
transformOrigin: "top left",
width: "240px",
height: "1380px",
background:
"radial-gradient(50% 50% at 50% 50%, rgba(59, 130, 246, 0.06) 0%, rgba(59, 130, 246, 0.02) 80%, transparent 100%)",
}}
className="absolute left-0 top-0"
/>
<div
style={{
position: "absolute",
borderRadius: "20px",
transform: "rotate(-45deg) translate(-180%, -70%)",
transformOrigin: "top left",
width: "240px",
height: "1380px",
background:
"radial-gradient(50% 50% at 50% 50%, rgba(59, 130, 246, 0.04) 0%, rgba(59, 130, 246, 0.02) 80%, transparent 100%)",
}}
className="absolute left-0 top-0"
/>
</div>
);
};

View file

@ -3,6 +3,7 @@ import React from "react";
import { IconBrandGoogleFilled } from "@tabler/icons-react"; import { IconBrandGoogleFilled } from "@tabler/icons-react";
import { motion } from "framer-motion"; import { motion } from "framer-motion";
import { Logo } from "@/components/Logo"; import { Logo } from "@/components/Logo";
import { AmbientBackground } from "./AmbientBackground";
export function GoogleLoginButton() { export function GoogleLoginButton() {
const handleGoogleLogin = () => { const handleGoogleLogin = () => {
@ -34,6 +35,42 @@ export function GoogleLoginButton() {
Welcome Back Welcome Back
</h1> </h1>
<motion.div
initial={{ opacity: 0, y: -5 }}
animate={{ opacity: 1, y: 0 }}
transition={{ duration: 0.3 }}
className="mb-4 w-full overflow-hidden rounded-lg border border-yellow-200 bg-yellow-50 text-yellow-900 shadow-sm dark:border-yellow-900/30 dark:bg-yellow-900/20 dark:text-yellow-200"
>
<motion.div
className="flex items-center gap-2 p-4"
initial={{ x: -5 }}
animate={{ x: 0 }}
transition={{ delay: 0.1, duration: 0.2 }}
>
<svg
xmlns="http://www.w3.org/2000/svg"
width="16"
height="16"
viewBox="0 0 24 24"
fill="none"
stroke="currentColor"
strokeWidth="2"
strokeLinecap="round"
strokeLinejoin="round"
className="flex-shrink-0"
>
<path d="M10.29 3.86L1.82 18a2 2 0 0 0 1.71 3h16.94a2 2 0 0 0 1.71-3L13.71 3.86a2 2 0 0 0-3.42 0z"/>
<line x1="12" y1="9" x2="12" y2="13"/>
<line x1="12" y1="17" x2="12.01" y2="17"/>
</svg>
<div className="ml-1">
<p className="text-sm font-medium">
SurfSense Cloud is currently in development. Check <a href="/docs" className="text-blue-600 underline dark:text-blue-400 hover:text-blue-800 dark:hover:text-blue-300">Docs</a> for more information on the self-hosted version.
</p>
</div>
</motion.div>
</motion.div>
<motion.button <motion.button
whileHover={{ scale: 1.02 }} whileHover={{ scale: 1.02 }}
whileTap={{ scale: 0.98 }} whileTap={{ scale: 0.98 }}
@ -53,46 +90,3 @@ export function GoogleLoginButton() {
</div> </div>
); );
} }
const AmbientBackground = () => {
return (
<div className="pointer-events-none absolute left-0 top-0 z-0 h-screen w-screen">
<div
style={{
transform: "translateY(-350px) rotate(-45deg)",
width: "560px",
height: "1380px",
background:
"radial-gradient(68.54% 68.72% at 55.02% 31.46%, rgba(59, 130, 246, 0.08) 0%, rgba(59, 130, 246, 0.02) 50%, rgba(59, 130, 246, 0) 100%)",
}}
className="absolute left-0 top-0"
/>
<div
style={{
transform: "rotate(-45deg) translate(5%, -50%)",
transformOrigin: "top left",
width: "240px",
height: "1380px",
background:
"radial-gradient(50% 50% at 50% 50%, rgba(59, 130, 246, 0.06) 0%, rgba(59, 130, 246, 0.02) 80%, transparent 100%)",
}}
className="absolute left-0 top-0"
/>
<div
style={{
position: "absolute",
borderRadius: "20px",
transform: "rotate(-45deg) translate(-180%, -70%)",
transformOrigin: "top left",
width: "240px",
height: "1380px",
background:
"radial-gradient(50% 50% at 50% 50%, rgba(59, 130, 246, 0.04) 0%, rgba(59, 130, 246, 0.02) 80%, transparent 100%)",
}}
className="absolute left-0 top-0"
/>
</div>
);
};

View file

@ -0,0 +1,114 @@
"use client";
import React, { useState, useEffect } from "react";
import { useRouter } from "next/navigation";
import Link from "next/link";
export function LocalLoginForm() {
const [username, setUsername] = useState("");
const [password, setPassword] = useState("");
const [error, setError] = useState("");
const [isLoading, setIsLoading] = useState(false);
const [authType, setAuthType] = useState<string | null>(null);
const router = useRouter();
useEffect(() => {
// Get the auth type from environment variables
setAuthType(process.env.NEXT_PUBLIC_FASTAPI_BACKEND_AUTH_TYPE || "GOOGLE");
}, []);
const handleSubmit = async (e: React.FormEvent) => {
e.preventDefault();
setIsLoading(true);
setError("");
try {
// Create form data for the API request
const formData = new URLSearchParams();
formData.append("username", username);
formData.append("password", password);
formData.append("grant_type", "password");
const response = await fetch(
`${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/auth/jwt/login`,
{
method: "POST",
headers: {
"Content-Type": "application/x-www-form-urlencoded",
},
body: formData.toString(),
}
);
const data = await response.json();
if (!response.ok) {
throw new Error(data.detail || "Failed to login");
}
router.push("/auth/callback?token=" + data.access_token);
} catch (err: any) {
setError(err.message || "An error occurred during login");
} finally {
setIsLoading(false);
}
};
return (
<div className="w-full max-w-md">
<form onSubmit={handleSubmit} className="space-y-4">
{error && (
<div className="rounded-md bg-red-50 p-4 text-sm text-red-500 dark:bg-red-900/20 dark:text-red-200">
{error}
</div>
)}
<div>
<label htmlFor="email" className="block text-sm font-medium text-gray-700 dark:text-gray-300">
Email
</label>
<input
id="email"
type="email"
required
value={username}
onChange={(e) => setUsername(e.target.value)}
className="mt-1 block w-full rounded-md border border-gray-300 bg-white px-3 py-2 shadow-sm focus:border-blue-500 focus:outline-none focus:ring-blue-500 dark:border-gray-700 dark:bg-gray-800 dark:text-white"
/>
</div>
<div>
<label htmlFor="password" className="block text-sm font-medium text-gray-700 dark:text-gray-300">
Password
</label>
<input
id="password"
type="password"
required
value={password}
onChange={(e) => setPassword(e.target.value)}
className="mt-1 block w-full rounded-md border border-gray-300 bg-white px-3 py-2 shadow-sm focus:border-blue-500 focus:outline-none focus:ring-blue-500 dark:border-gray-700 dark:bg-gray-800 dark:text-white"
/>
</div>
<button
type="submit"
disabled={isLoading}
className="w-full rounded-md bg-blue-600 px-4 py-2 text-white shadow-sm hover:bg-blue-700 focus:outline-none focus:ring-2 focus:ring-blue-500 focus:ring-offset-2 disabled:cursor-not-allowed disabled:opacity-50"
>
{isLoading ? "Signing in..." : "Sign in"}
</button>
</form>
{authType === "LOCAL" && (
<div className="mt-4 text-center text-sm">
<p className="text-gray-600 dark:text-gray-400">
Don&apos;t have an account?{" "}
<Link href="/register" className="font-medium text-blue-600 hover:text-blue-500 dark:text-blue-400">
Register here
</Link>
</p>
</div>
)}
</div>
);
}

View file

@ -1,5 +1,89 @@
"use client";
import { useState, useEffect, Suspense } from "react";
import { GoogleLoginButton } from "./GoogleLoginButton"; import { GoogleLoginButton } from "./GoogleLoginButton";
import { LocalLoginForm } from "./LocalLoginForm";
import { Logo } from "@/components/Logo";
import { AmbientBackground } from "./AmbientBackground";
import { useSearchParams } from "next/navigation";
import { Loader2 } from "lucide-react";
function LoginContent() {
const [authType, setAuthType] = useState<string | null>(null);
const [registrationSuccess, setRegistrationSuccess] = useState(false);
const [isLoading, setIsLoading] = useState(true);
const searchParams = useSearchParams();
useEffect(() => {
// Check if the user was redirected from registration
if (searchParams.get("registered") === "true") {
setRegistrationSuccess(true);
}
// Get the auth type from environment variables
setAuthType(process.env.NEXT_PUBLIC_FASTAPI_BACKEND_AUTH_TYPE || "GOOGLE");
setIsLoading(false);
}, [searchParams]);
// Show loading state while determining auth type
if (isLoading) {
return (
<div className="relative w-full overflow-hidden">
<AmbientBackground />
<div className="mx-auto flex h-screen max-w-lg flex-col items-center justify-center">
<Logo className="rounded-md" />
<div className="mt-8 flex items-center space-x-2">
<Loader2 className="h-5 w-5 animate-spin text-muted-foreground" />
<span className="text-muted-foreground">Loading...</span>
</div>
</div>
</div>
);
}
if (authType === "GOOGLE") {
return <GoogleLoginButton />;
}
return (
<div className="relative w-full overflow-hidden">
<AmbientBackground />
<div className="mx-auto flex h-screen max-w-lg flex-col items-center justify-center">
<Logo className="rounded-md" />
<h1 className="my-8 text-xl font-bold text-neutral-800 dark:text-neutral-100 md:text-4xl">
Sign In
</h1>
{registrationSuccess && (
<div className="mb-4 w-full rounded-md bg-green-50 p-4 text-sm text-green-500 dark:bg-green-900/20 dark:text-green-200">
Registration successful! You can now sign in with your credentials.
</div>
)}
<LocalLoginForm />
</div>
</div>
);
}
// Loading fallback for Suspense
const LoadingFallback = () => (
<div className="relative w-full overflow-hidden">
<AmbientBackground />
<div className="mx-auto flex h-screen max-w-lg flex-col items-center justify-center">
<Logo className="rounded-md" />
<div className="mt-8 flex items-center space-x-2">
<Loader2 className="h-5 w-5 animate-spin text-muted-foreground" />
<span className="text-muted-foreground">Loading...</span>
</div>
</div>
</div>
);
export default function LoginPage() { export default function LoginPage() {
return <GoogleLoginButton />; return (
<Suspense fallback={<LoadingFallback />}>
<LoginContent />
</Suspense>
);
} }

View file

@ -0,0 +1,149 @@
"use client";
import React, { useState, useEffect } from "react";
import { useRouter } from "next/navigation";
import Link from "next/link";
import { Logo } from "@/components/Logo";
import { AmbientBackground } from "../login/AmbientBackground";
export default function RegisterPage() {
const [email, setEmail] = useState("");
const [password, setPassword] = useState("");
const [confirmPassword, setConfirmPassword] = useState("");
const [error, setError] = useState("");
const [isLoading, setIsLoading] = useState(false);
const router = useRouter();
// Check authentication type and redirect if not LOCAL
useEffect(() => {
const authType = process.env.NEXT_PUBLIC_FASTAPI_BACKEND_AUTH_TYPE || "GOOGLE";
if (authType !== "LOCAL") {
router.push("/login");
}
}, [router]);
const handleSubmit = async (e: React.FormEvent) => {
e.preventDefault();
// Form validation
if (password !== confirmPassword) {
setError("Passwords do not match");
return;
}
setIsLoading(true);
setError("");
try {
const response = await fetch(
`${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/auth/register`,
{
method: "POST",
headers: {
"Content-Type": "application/json",
},
body: JSON.stringify({
email,
password,
is_active: true,
is_superuser: false,
is_verified: false,
}),
}
);
const data = await response.json();
if (!response.ok) {
throw new Error(data.detail || "Registration failed");
}
// Redirect to login page after successful registration
router.push("/login?registered=true");
} catch (err: any) {
setError(err.message || "An error occurred during registration");
} finally {
setIsLoading(false);
}
};
return (
<div className="relative w-full overflow-hidden">
<AmbientBackground />
<div className="mx-auto flex h-screen max-w-lg flex-col items-center justify-center">
<Logo className="rounded-md" />
<h1 className="my-8 text-xl font-bold text-neutral-800 dark:text-neutral-100 md:text-4xl">
Create an Account
</h1>
<div className="w-full max-w-md">
<form onSubmit={handleSubmit} className="space-y-4">
{error && (
<div className="rounded-md bg-red-50 p-4 text-sm text-red-500 dark:bg-red-900/20 dark:text-red-200">
{error}
</div>
)}
<div>
<label htmlFor="email" className="block text-sm font-medium text-gray-700 dark:text-gray-300">
Email
</label>
<input
id="email"
type="email"
required
value={email}
onChange={(e) => setEmail(e.target.value)}
className="mt-1 block w-full rounded-md border border-gray-300 bg-white px-3 py-2 shadow-sm focus:border-blue-500 focus:outline-none focus:ring-blue-500 dark:border-gray-700 dark:bg-gray-800 dark:text-white"
/>
</div>
<div>
<label htmlFor="password" className="block text-sm font-medium text-gray-700 dark:text-gray-300">
Password
</label>
<input
id="password"
type="password"
required
value={password}
onChange={(e) => setPassword(e.target.value)}
className="mt-1 block w-full rounded-md border border-gray-300 bg-white px-3 py-2 shadow-sm focus:border-blue-500 focus:outline-none focus:ring-blue-500 dark:border-gray-700 dark:bg-gray-800 dark:text-white"
/>
</div>
<div>
<label htmlFor="confirmPassword" className="block text-sm font-medium text-gray-700 dark:text-gray-300">
Confirm Password
</label>
<input
id="confirmPassword"
type="password"
required
value={confirmPassword}
onChange={(e) => setConfirmPassword(e.target.value)}
className="mt-1 block w-full rounded-md border border-gray-300 bg-white px-3 py-2 shadow-sm focus:border-blue-500 focus:outline-none focus:ring-blue-500 dark:border-gray-700 dark:bg-gray-800 dark:text-white"
/>
</div>
<button
type="submit"
disabled={isLoading}
className="w-full rounded-md bg-blue-600 px-4 py-2 text-white shadow-sm hover:bg-blue-700 focus:outline-none focus:ring-2 focus:ring-blue-500 focus:ring-offset-2 disabled:cursor-not-allowed disabled:opacity-50"
>
{isLoading ? "Creating account..." : "Register"}
</button>
</form>
<div className="mt-4 text-center text-sm">
<p className="text-gray-600 dark:text-gray-400">
Already have an account?{" "}
<Link href="/login" className="font-medium text-blue-600 hover:text-blue-500 dark:text-blue-400">
Sign in
</Link>
</p>
</div>
</div>
</div>
</div>
);
}

View file

@ -0,0 +1,48 @@
import type { MetadataRoute } from 'next'
export default function sitemap(): MetadataRoute.Sitemap {
return [
{
url: 'https://www.surfsense.net/',
lastModified: new Date(),
changeFrequency: 'yearly',
priority: 1,
},
{
url: 'https://www.surfsense.net/privacy',
lastModified: new Date(),
changeFrequency: 'monthly',
priority: 0.9,
},
{
url: 'https://www.surfsense.net/terms',
lastModified: new Date(),
changeFrequency: 'monthly',
priority: 0.9,
},
{
url: 'https://www.surfsense.net/docs',
lastModified: new Date(),
changeFrequency: 'weekly',
priority: 0.9,
},
{
url: 'https://www.surfsense.net/docs/installation',
lastModified: new Date(),
changeFrequency: 'weekly',
priority: 0.9,
},
{
url: 'https://www.surfsense.net/docs/docker-installation',
lastModified: new Date(),
changeFrequency: 'weekly',
priority: 0.9,
},
{
url: 'https://www.surfsense.net/docs/manual-installation',
lastModified: new Date(),
changeFrequency: 'weekly',
priority: 0.9,
},
]
}

View file

@ -19,6 +19,17 @@ export function ModernHeroWithGradients() {
<DarkModeGradient /> <DarkModeGradient />
<div className="relative z-20 flex flex-col items-center justify-center overflow-hidden rounded-3xl p-4 md:p-12 lg:p-16"> <div className="relative z-20 flex flex-col items-center justify-center overflow-hidden rounded-3xl p-4 md:p-12 lg:p-16">
<div className="flex justify-center w-full mb-4">
<Link href="https://github.com/MODSetter/SurfSense" target="_blank" rel="noopener noreferrer">
<img
src="https://trendshift.io/api/badge/repositories/13606"
alt="MODSetter%2FSurfSense | Trendshift"
style={{ width: "250px", height: "55px" }}
width={250}
height={55}
/>
</Link>
</div>
<Link <Link
href="/docs" href="/docs"
className="flex items-center gap-1 rounded-full border border-gray-200 bg-gradient-to-b from-gray-50 to-gray-100 px-4 py-1 text-center text-sm text-gray-800 shadow-sm dark:border-[#404040] dark:bg-gradient-to-b dark:from-[#5B5B5D] dark:to-[#262627] dark:text-white dark:shadow-inner dark:shadow-purple-500/10" className="flex items-center gap-1 rounded-full border border-gray-200 bg-gradient-to-b from-gray-50 to-gray-100 px-4 py-1 text-center text-sm text-gray-800 shadow-sm dark:border-[#404040] dark:bg-gradient-to-b dark:from-[#5B5B5D] dark:to-[#262627] dark:text-white dark:shadow-inner dark:shadow-purple-500/10"
@ -36,7 +47,7 @@ export function ModernHeroWithGradients() {
</h1> </h1>
</div> </div>
<p className="mx-auto max-w-3xl py-6 text-center text-base text-gray-600 dark:text-neutral-300 md:text-lg lg:text-xl"> <p className="mx-auto max-w-3xl py-6 text-center text-base text-gray-600 dark:text-neutral-300 md:text-lg lg:text-xl">
A Customizable AI Research Agent just like NotebookLM or Perplexity, but connected to external sources such as search engines (Tavily), Slack, Linear, Notion, YouTube, GitHub and more. A Customizable AI Research Agent just like NotebookLM or Perplexity, but connected to external sources such as search engines (Tavily, LinkUp), Slack, Linear, Notion, YouTube, GitHub and more.
</p> </p>
<div className="flex flex-col items-center gap-6 py-6 sm:flex-row"> <div className="flex flex-col items-center gap-6 py-6 sm:flex-row">
<Link <Link

View file

@ -1,6 +1,6 @@
"use client"; "use client";
import { cn } from "@/lib/utils"; import { cn } from "@/lib/utils";
import { IconMenu2, IconX, IconBrandGoogleFilled } from "@tabler/icons-react"; import { IconMenu2, IconX, IconBrandGoogleFilled, IconUser } from "@tabler/icons-react";
import { import {
motion, motion,
AnimatePresence, AnimatePresence,
@ -64,24 +64,8 @@ const DesktopNav = ({ navItems, visible }: NavbarProps) => {
const [hoveredIndex, setHoveredIndex] = useState<number | null>(null); const [hoveredIndex, setHoveredIndex] = useState<number | null>(null);
const handleGoogleLogin = () => { const handleGoogleLogin = () => {
// Redirect to Google OAuth authorization URL // Redirect to the login page
fetch(`${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/auth/google/authorize`) window.location.href = '/login';
.then((response) => {
if (!response.ok) {
throw new Error('Failed to get authorization URL');
}
return response.json();
})
.then((data) => {
if (data.authorization_url) {
window.location.href = data.authorization_url;
} else {
console.error('No authorization URL received');
}
})
.catch((error) => {
console.error('Error during Google login:', error);
});
}; };
return ( return (
@ -191,8 +175,8 @@ const DesktopNav = ({ navItems, visible }: NavbarProps) => {
variant="outline" variant="outline"
className="hidden cursor-pointer md:flex items-center gap-2 rounded-full dark:bg-white/20 dark:hover:bg-white/30 dark:text-white bg-gray-100 hover:bg-gray-200 text-gray-800 border-0" className="hidden cursor-pointer md:flex items-center gap-2 rounded-full dark:bg-white/20 dark:hover:bg-white/30 dark:text-white bg-gray-100 hover:bg-gray-200 text-gray-800 border-0"
> >
<IconBrandGoogleFilled className="h-4 w-4" /> <IconUser className="h-4 w-4" />
<span>Sign in with Google</span> <span>Sign in</span>
</Button> </Button>
</motion.div> </motion.div>
)} )}
@ -241,7 +225,7 @@ const MobileNav = ({ navItems, visible }: NavbarProps) => {
} as React.CSSProperties} } as React.CSSProperties}
> >
<div className="flex flex-row justify-between items-center w-full"> <div className="flex flex-row justify-between items-center w-full">
<Logo className="h-8 w-8 rounded-md" /> <Logo className="h-8 w-8 rounded-md" />
<div className="flex items-center gap-2"> <div className="flex items-center gap-2">
<ThemeTogglerComponent /> <ThemeTogglerComponent />
{open ? ( {open ? (
@ -294,8 +278,8 @@ const MobileNav = ({ navItems, visible }: NavbarProps) => {
variant="outline" variant="outline"
className="flex cursor-pointer items-center gap-2 mt-4 w-full justify-center rounded-full dark:bg-white/20 dark:hover:bg-white/30 dark:text-white bg-gray-100 hover:bg-gray-200 text-gray-800 border-0" className="flex cursor-pointer items-center gap-2 mt-4 w-full justify-center rounded-full dark:bg-white/20 dark:hover:bg-white/30 dark:text-white bg-gray-100 hover:bg-gray-200 text-gray-800 border-0"
> >
<IconBrandGoogleFilled className="h-4 w-4" /> <IconUser className="h-4 w-4" />
<span>Sign in with Google</span> <span>Sign in</span>
</Button> </Button>
</motion.div> </motion.div>
)} )}

View file

@ -11,7 +11,7 @@ import {
Link, Link,
Webhook, Webhook,
} from 'lucide-react'; } from 'lucide-react';
import { IconBrandNotion, IconBrandSlack, IconBrandYoutube, IconBrandGithub, IconLayoutKanban } from "@tabler/icons-react"; import { IconBrandNotion, IconBrandSlack, IconBrandYoutube, IconBrandGithub, IconLayoutKanban, IconLinkPlus } from "@tabler/icons-react";
import { Button } from '@/components/ui/button'; import { Button } from '@/components/ui/button';
import { Connector, ResearchMode } from './types'; import { Connector, ResearchMode } from './types';
@ -20,6 +20,8 @@ export const getConnectorIcon = (connectorType: string) => {
const iconProps = { className: "h-4 w-4" }; const iconProps = { className: "h-4 w-4" };
switch(connectorType) { switch(connectorType) {
case 'LINKUP_API':
return <IconLinkPlus {...iconProps} />;
case 'LINEAR_CONNECTOR': case 'LINEAR_CONNECTOR':
return <IconLayoutKanban {...iconProps} />; return <IconLayoutKanban {...iconProps} />;
case 'GITHUB_CONNECTOR': case 'GITHUB_CONNECTOR':
@ -145,7 +147,7 @@ export const ConnectorButton = ({ selectedConnectors, onClick, connectorSources
return ( return (
<Button <Button
variant="outline" variant="outline"
className="h-7 px-2 text-xs font-medium rounded-md border-border relative overflow-hidden group scale-90 origin-left" className="h-8 px-2 text-xs font-medium rounded-md border-border relative overflow-hidden group"
onClick={onClick} onClick={onClick}
aria-label={selectedCount === 0 ? "Select Connectors" : `${selectedCount} connectors selected`} aria-label={selectedCount === 0 ? "Select Connectors" : `${selectedCount} connectors selected`}
> >

View file

@ -15,11 +15,11 @@ type SegmentedControlProps<T extends string> = {
*/ */
function SegmentedControl<T extends string>({ value, onChange, options }: SegmentedControlProps<T>) { function SegmentedControl<T extends string>({ value, onChange, options }: SegmentedControlProps<T>) {
return ( return (
<div className="flex rounded-md border border-border overflow-hidden scale-90 origin-left"> <div className="flex h-7 rounded-md border border-border overflow-hidden">
{options.map((option) => ( {options.map((option) => (
<button <button
key={option.value} key={option.value}
className={`flex items-center gap-1 px-2 py-1 text-xs transition-colors ${ className={`flex h-full items-center gap-1 px-2 text-xs transition-colors ${
value === option.value value === option.value
? 'bg-primary text-primary-foreground' ? 'bg-primary text-primary-foreground'
: 'hover:bg-muted' : 'hover:bg-muted'

View file

@ -30,5 +30,6 @@ export const editConnectorSchema = z.object({
SERPER_API_KEY: z.string().optional(), SERPER_API_KEY: z.string().optional(),
TAVILY_API_KEY: z.string().optional(), TAVILY_API_KEY: z.string().optional(),
LINEAR_API_KEY: z.string().optional(), LINEAR_API_KEY: z.string().optional(),
LINKUP_API_KEY: z.string().optional(),
}); });
export type EditConnectorFormValues = z.infer<typeof editConnectorSchema>; export type EditConnectorFormValues = z.infer<typeof editConnectorSchema>;

View file

@ -1,4 +1,4 @@
import React, { useMemo } from "react"; import React, { useMemo, useState, useEffect } from "react";
import ReactMarkdown from "react-markdown"; import ReactMarkdown from "react-markdown";
import rehypeRaw from "rehype-raw"; import rehypeRaw from "rehype-raw";
import rehypeSanitize from "rehype-sanitize"; import rehypeSanitize from "rehype-sanitize";
@ -6,6 +6,10 @@ import remarkGfm from "remark-gfm";
import { cn } from "@/lib/utils"; import { cn } from "@/lib/utils";
import { Citation } from "./chat/Citation"; import { Citation } from "./chat/Citation";
import { Source } from "./chat/types"; import { Source } from "./chat/types";
import { Prism as SyntaxHighlighter } from "react-syntax-highlighter";
import { oneLight, oneDark } from "react-syntax-highlighter/dist/cjs/styles/prism";
import { Check, Copy } from "lucide-react";
import { useTheme } from "next-themes";
interface MarkdownViewerProps { interface MarkdownViewerProps {
content: string; content: string;
@ -75,16 +79,19 @@ export function MarkdownViewer({ content, className, getCitationSource }: Markdo
td: ({node, ...props}: any) => <td className="px-3 py-2 border-t border-border" {...props} />, td: ({node, ...props}: any) => <td className="px-3 py-2 border-t border-border" {...props} />,
code: ({node, className, children, ...props}: any) => { code: ({node, className, children, ...props}: any) => {
const match = /language-(\w+)/.exec(className || ''); const match = /language-(\w+)/.exec(className || '');
const language = match ? match[1] : '';
const isInline = !match; const isInline = !match;
if (isInline) {
  return <code className="bg-muted px-1 py-0.5 rounded text-xs" {...props}>{children}</code>;
}

// For code blocks, add syntax highlighting and copy functionality
return (
  <CodeBlock language={language} {...props}>
    {String(children).replace(/\n$/, '')}
  </CodeBlock>
);
} }
}; };
}, [getCitationSource]); }, [getCitationSource]);
@ -102,6 +109,102 @@ export function MarkdownViewer({ content, className, getCitationSource }: Markdo
); );
} }
// Code block component with syntax highlighting and copy functionality
const CodeBlock = ({ children, language }: { children: string, language: string }) => {
const [copied, setCopied] = useState(false);
const { resolvedTheme, theme } = useTheme();
const [mounted, setMounted] = useState(false);
// Prevent hydration issues
useEffect(() => {
setMounted(true);
}, []);
const handleCopy = async () => {
await navigator.clipboard.writeText(children);
setCopied(true);
setTimeout(() => setCopied(false), 2000);
};
// Choose theme based on current system/user preference
const isDarkTheme = mounted && (resolvedTheme === 'dark' || theme === 'dark');
const syntaxTheme = isDarkTheme ? oneDark : oneLight;
return (
<div className="relative my-4 group">
<div className="absolute right-2 top-2 z-10">
<button
onClick={handleCopy}
className="p-1.5 rounded-md bg-background/80 hover:bg-background border border-border flex items-center justify-center transition-colors"
aria-label="Copy code"
>
{copied ?
<Check size={14} className="text-green-500" /> :
<Copy size={14} className="text-muted-foreground" />
}
</button>
</div>
{mounted ? (
<SyntaxHighlighter
language={language || 'text'}
style={{
...syntaxTheme,
'pre[class*="language-"]': {
...syntaxTheme['pre[class*="language-"]'],
margin: 0,
border: 'none',
borderRadius: '0.375rem',
background: 'var(--syntax-bg)'
},
'code[class*="language-"]': {
...syntaxTheme['code[class*="language-"]'],
border: 'none',
background: 'var(--syntax-bg)'
}
}}
customStyle={{
margin: 0,
borderRadius: '0.375rem',
fontSize: '0.75rem',
lineHeight: '1.5rem',
backgroundColor: 'var(--syntax-bg)',
border: 'none',
}}
codeTagProps={{
className: "font-mono",
style: {
border: 'none',
background: 'var(--syntax-bg)'
}
}}
showLineNumbers={false}
wrapLines={false}
lineProps={{
style: {
wordBreak: 'break-all',
whiteSpace: 'pre-wrap',
border: 'none',
borderBottom: 'none',
paddingLeft: 0,
paddingRight: 0,
margin: '0.25rem 0'
}
}}
PreTag="div"
>
{children}
</SyntaxHighlighter>
) : (
<div className="bg-muted p-4 rounded-md">
<pre className="m-0 p-0 border-0">
<code className="text-xs font-mono border-0 leading-6">{children}</code>
</pre>
</div>
)}
</div>
);
};
// Helper function to process citations within React children // Helper function to process citations within React children
const processCitationsInReactChildren = (children: React.ReactNode, getCitationSource: (id: number) => Source | null): React.ReactNode => { const processCitationsInReactChildren = (children: React.ReactNode, getCitationSource: (id: number) => Source | null): React.ReactNode => {
// If children is not an array or string, just return it // If children is not an array or string, just return it

View file

@ -14,6 +14,7 @@ import {
Info, Info,
ExternalLink, ExternalLink,
Trash2, Trash2,
Podcast,
type LucideIcon, type LucideIcon,
} from "lucide-react" } from "lucide-react"
@ -45,7 +46,8 @@ export const iconMap: Record<string, LucideIcon> = {
AlertCircle, AlertCircle,
Info, Info,
ExternalLink, ExternalLink,
Trash2 Trash2,
Podcast
} }
const defaultData = { const defaultData = {

View file

@ -0,0 +1,28 @@
"use client"
import * as React from "react"
import * as SliderPrimitive from "@radix-ui/react-slider"
import { cn } from "@/lib/utils"
const Slider = React.forwardRef<
React.ElementRef<typeof SliderPrimitive.Root>,
React.ComponentPropsWithoutRef<typeof SliderPrimitive.Root>
>(({ className, ...props }, ref) => (
<SliderPrimitive.Root
ref={ref}
className={cn(
"relative flex w-full touch-none select-none items-center",
className
)}
{...props}
>
<SliderPrimitive.Track className="relative h-2 w-full grow overflow-hidden rounded-full bg-secondary">
<SliderPrimitive.Range className="absolute h-full bg-primary" />
</SliderPrimitive.Track>
<SliderPrimitive.Thumb className="block h-5 w-5 rounded-full border-2 border-primary bg-background ring-offset-background transition-colors focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 disabled:pointer-events-none disabled:opacity-50" />
</SliderPrimitive.Root>
))
Slider.displayName = SliderPrimitive.Root.displayName
export { Slider }

View file

@ -3,6 +3,7 @@ title: Docker Installation
description: Setting up SurfSense using Docker description: Setting up SurfSense using Docker
full: true full: true
--- ---
## Known Limitations ## Known Limitations
⚠️ **Important Note:** Currently, the following features have limited functionality when running in Docker: ⚠️ **Important Note:** Currently, the following features have limited functionality when running in Docker:
@ -12,7 +13,6 @@ full: true
We're actively working to resolve these limitations in future releases. We're actively working to resolve these limitations in future releases.
# Docker Installation # Docker Installation
This guide explains how to run SurfSense using Docker Compose, which is the preferred and recommended method for deployment. This guide explains how to run SurfSense using Docker Compose, which is the preferred and recommended method for deployment.
@ -32,84 +32,158 @@ Before you begin, ensure you have:
## Installation Steps ## Installation Steps
1. **Configure Environment Variables** 1. **Configure Environment Variables**
Set up the necessary environment variables:

**Linux/macOS:**

```bash
# Copy example environment files
cp surfsense_backend/.env.example surfsense_backend/.env
cp surfsense_web/.env.example surfsense_web/.env
cp .env.example .env # For Docker-specific settings
```

**Windows (Command Prompt):**

```cmd
copy surfsense_backend\.env.example surfsense_backend\.env
copy surfsense_web\.env.example surfsense_web\.env
copy .env.example .env
```

**Windows (PowerShell):**

```powershell
Copy-Item -Path surfsense_backend\.env.example -Destination surfsense_backend\.env
Copy-Item -Path surfsense_web\.env.example -Destination surfsense_web\.env
Copy-Item -Path .env.example -Destination .env
```

Edit all `.env` files and fill in the required values:

### Docker-Specific Environment Variables

| ENV VARIABLE | DESCRIPTION | DEFAULT VALUE |
|----------------------------|-----------------------------------------------------------------------------|---------------------|
| FRONTEND_PORT | Port for the frontend service | 3000 |
| BACKEND_PORT | Port for the backend API service | 8000 |
| POSTGRES_PORT | Port for the PostgreSQL database | 5432 |
| PGADMIN_PORT | Port for pgAdmin web interface | 5050 |
| POSTGRES_USER | PostgreSQL username | postgres |
| POSTGRES_PASSWORD | PostgreSQL password | postgres |
| POSTGRES_DB | PostgreSQL database name | surfsense |
| PGADMIN_DEFAULT_EMAIL | Email for pgAdmin login | admin@surfsense.com |
| PGADMIN_DEFAULT_PASSWORD | Password for pgAdmin login | surfsense |
| NEXT_PUBLIC_API_URL | URL of the backend API (used by frontend) | http://backend:8000 |

**Backend Environment Variables:**

| ENV VARIABLE | DESCRIPTION |
| -------------------------- | ----------------------------------------------------------------------------------------------------------- |
| DATABASE_URL | PostgreSQL connection string (e.g., `postgresql+asyncpg://postgres:postgres@localhost:5432/surfsense`) |
| SECRET_KEY | JWT Secret key for authentication (should be a secure random string) |
| AUTH_TYPE | Authentication method: `GOOGLE` for OAuth with Google, `LOCAL` for email/password authentication |
| NEXT_FRONTEND_URL | URL where your frontend application is hosted (e.g., `http://localhost:3000`) |
| EMBEDDING_MODEL | Name of the embedding model (e.g., `openai://text-embedding-ada-002`, `anthropic://claude-v1`, `mixedbread-ai/mxbai-embed-large-v1`) |
| RERANKERS_MODEL_NAME | Name of the reranker model (e.g., `ms-marco-MiniLM-L-12-v2`) |
| RERANKERS_MODEL_TYPE | Type of reranker model (e.g., `flashrank`) |
| FAST_LLM | LiteLLM routed smaller, faster LLM (e.g., `openai/gpt-4o-mini`, `ollama/deepseek-r1:8b`) |
| STRATEGIC_LLM | LiteLLM routed advanced LLM for complex tasks (e.g., `openai/gpt-4o`, `ollama/gemma3:12b`) |
| LONG_CONTEXT_LLM | LiteLLM routed LLM for longer context windows (e.g., `gemini/gemini-2.0-flash`, `ollama/deepseek-r1:8b`) |
| ETL_SERVICE | Document parsing service: `UNSTRUCTURED` (supports 34+ formats) or `LLAMACLOUD` (supports 50+ formats including legacy document types) |
| UNSTRUCTURED_API_KEY | API key for Unstructured.io service for document parsing (required if ETL_SERVICE=UNSTRUCTURED) |
| LLAMA_CLOUD_API_KEY | API key for LlamaCloud service for document parsing (required if ETL_SERVICE=LLAMACLOUD) |
| FIRECRAWL_API_KEY | API key for Firecrawl service for web crawling |
| TTS_SERVICE | Text-to-Speech API provider for Podcasts (e.g., `openai/tts-1`, `azure/neural`, `vertex_ai/`). See [supported providers](https://docs.litellm.ai/docs/text_to_speech#supported-providers) |
| STT_SERVICE | Speech-to-Text API provider for Podcasts (e.g., `openai/whisper-1`). See [supported providers](https://docs.litellm.ai/docs/audio_transcription#supported-providers) |
Include API keys for your chosen LLM providers:
| ENV VARIABLE | DESCRIPTION |
|--------------------|-----------------------------------------------------------------------------|
| `OPENAI_API_KEY` | Required if using OpenAI models |
| `GEMINI_API_KEY` | Required if using Google Gemini models |
| `ANTHROPIC_API_KEY`| Required if using Anthropic models |
### Google OAuth Configuration (if AUTH_TYPE=GOOGLE)
| ENV VARIABLE | DESCRIPTION |
|----------------------------|-----------------------------------------------------------------------------|
| `GOOGLE_OAUTH_CLIENT_ID` | Client ID from Google Cloud Console |
| `GOOGLE_OAUTH_CLIENT_SECRET` | Client secret from Google Cloud Console |
**Optional Backend LangSmith Observability:**
| ENV VARIABLE | DESCRIPTION |
|--------------|-------------|
| LANGSMITH_TRACING | Enable LangSmith tracing (e.g., `true`) |
| LANGSMITH_ENDPOINT | LangSmith API endpoint (e.g., `https://api.smith.langchain.com`) |
| LANGSMITH_API_KEY | Your LangSmith API key |
| LANGSMITH_PROJECT | LangSmith project name (e.g., `surfsense`) |
**Optional Backend LiteLLM API Base URLs:**
| ENV VARIABLE | DESCRIPTION |
|--------------|-------------|
| FAST_LLM_API_BASE | Custom API base URL for the fast LLM |
| STRATEGIC_LLM_API_BASE | Custom API base URL for the strategic LLM |
| LONG_CONTEXT_LLM_API_BASE | Custom API base URL for the long context LLM |
| TTS_SERVICE_API_BASE | Custom API base URL for the Text-to-Speech (TTS) service |
| STT_SERVICE_API_BASE | Custom API base URL for the Speech-to-Text (STT) service |
For other LLM providers, refer to the [LiteLLM documentation](https://docs.litellm.ai/docs/providers).
### Frontend Environment Variables
| ENV VARIABLE | DESCRIPTION |
| ------------------------------- | ---------------------------------------------------------- |
| NEXT_PUBLIC_FASTAPI_BACKEND_URL | URL of the backend service (e.g., `http://localhost:8000`) |
| NEXT_PUBLIC_FASTAPI_BACKEND_AUTH_TYPE | Same value as set in the backend AUTH_TYPE, i.e. `GOOGLE` for OAuth with Google or `LOCAL` for email/password authentication |
| NEXT_PUBLIC_ETL_SERVICE | Document parsing service (should match backend ETL_SERVICE): `UNSTRUCTURED` or `LLAMACLOUD` - affects supported file formats in upload interface |
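For orientation, here is a minimal, purely illustrative TypeScript sketch (not a file in this repository) of how the web frontend typically reads these values. The fallback defaults mirror the login code elsewhere in this diff; note that `NEXT_PUBLIC_*` variables are inlined by Next.js at build time, so they must be set before the image is built.

```ts
// Hypothetical config helper – illustrates how the frontend consumes the
// variables above; the env var names come from the table, but this helper
// itself is not part of SurfSense.
export const backendUrl =
  process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL ?? "http://localhost:8000";

// "GOOGLE" (OAuth) or "LOCAL" (email/password); must match the backend AUTH_TYPE.
export const authType =
  process.env.NEXT_PUBLIC_FASTAPI_BACKEND_AUTH_TYPE ?? "GOOGLE";

// Example: pick the login flow based on the configured auth type.
export const usesLocalAuth = authType === "LOCAL";
```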
2. **Build and Start Containers** 2. **Build and Start Containers**
Start the Docker containers: Start the Docker containers:
**Linux/macOS/Windows:** **Linux/macOS/Windows:**
```bash ```bash
docker-compose up --build docker compose up --build
``` ```
To run in detached mode (in the background): To run in detached mode (in the background):
**Linux/macOS/Windows:** **Linux/macOS/Windows:**
```bash ```bash
docker-compose up -d docker compose up -d
``` ```
**Note for Windows users:** If you're using older Docker Desktop versions, you might need to use `docker compose` (with a space) instead of `docker-compose`. **Note for Windows users:** If you're using an older Docker Desktop version, you might need to use the legacy `docker-compose` (with a hyphen) instead of `docker compose`.
3. **Access the Applications** 3. **Access the Applications**
Once the containers are running, you can access: Once the containers are running, you can access:
- Frontend: [http://localhost:3000](http://localhost:3000) - Frontend: [http://localhost:3000](http://localhost:3000)
- Backend API: [http://localhost:8000](http://localhost:8000) - Backend API: [http://localhost:8000](http://localhost:8000)
- API Documentation: [http://localhost:8000/docs](http://localhost:8000/docs) - API Documentation: [http://localhost:8000/docs](http://localhost:8000/docs)
- pgAdmin: [http://localhost:5050](http://localhost:5050)
## Using pgAdmin
pgAdmin is included in the Docker setup to help manage your PostgreSQL database. To connect:
1. Open pgAdmin at [http://localhost:5050](http://localhost:5050)
2. Login with the credentials from your `.env` file (default: admin@surfsense.com / surfsense)
3. Right-click "Servers" > "Create" > "Server"
4. In the "General" tab, name your connection (e.g., "SurfSense DB")
5. In the "Connection" tab:
- Host: `db`
- Port: `5432`
- Maintenance database: `surfsense`
- Username: `postgres` (or your custom POSTGRES_USER)
- Password: `postgres` (or your custom POSTGRES_PASSWORD)
6. Click "Save" to connect
## Useful Docker Commands ## Useful Docker Commands
@ -118,39 +192,43 @@ Before you begin, ensure you have:
- **Stop containers:** - **Stop containers:**
**Linux/macOS/Windows:** **Linux/macOS/Windows:**
```bash ```bash
docker-compose down docker compose down
``` ```
- **View logs:** - **View logs:**
**Linux/macOS/Windows:** **Linux/macOS/Windows:**
```bash ```bash
# All services # All services
docker-compose logs -f docker compose logs -f
# Specific service # Specific service
docker-compose logs -f backend docker compose logs -f backend
docker-compose logs -f frontend docker compose logs -f frontend
docker-compose logs -f db docker compose logs -f db
``` ```
- **Restart a specific service:** - **Restart a specific service:**
**Linux/macOS/Windows:** **Linux/macOS/Windows:**
```bash ```bash
docker-compose restart backend docker compose restart backend
``` ```
- **Execute commands in a running container:** - **Execute commands in a running container:**
**Linux/macOS/Windows:** **Linux/macOS/Windows:**
```bash ```bash
# Backend # Backend
docker-compose exec backend python -m pytest docker compose exec backend python -m pytest
# Frontend # Frontend
docker-compose exec frontend pnpm lint docker compose exec frontend pnpm lint
``` ```
## Troubleshooting ## Troubleshooting
@ -162,7 +240,6 @@ Before you begin, ensure you have:
- For frontend dependency issues, check the `Dockerfile` in the frontend directory. - For frontend dependency issues, check the `Dockerfile` in the frontend directory.
- **Windows-specific:** If you encounter line ending issues (CRLF vs LF), configure Git to handle line endings properly with `git config --global core.autocrlf true` before cloning the repository. - **Windows-specific:** If you encounter line ending issues (CRLF vs LF), configure Git to handle line endings properly with `git config --global core.autocrlf true` before cloning the repository.
## Next Steps ## Next Steps
Once your installation is complete, you can start using SurfSense! Navigate to the frontend URL and log in using your Google account. Once your installation is complete, you can start using SurfSense! Navigate to the frontend URL and log in using your Google account.

Some files were not shown because too many files have changed in this diff.