Merge branch 'MODSetter:main' into anshulss/buildimage

This commit is contained in:
Anshul Sharma 2025-04-26 22:52:14 +05:30 committed by GitHub
commit 73350c7f92
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
92 changed files with 8163 additions and 1785 deletions

162
README.md
View file

@ -1,11 +1,12 @@
![headnew](https://github.com/user-attachments/assets/a44fd1e7-1861-46d0-aff7-19cf33e86baa)
![new_header](https://github.com/user-attachments/assets/e236b764-0ddc-42ff-a1f1-8fbb3d2e0e65)
# SurfSense
While tools like NotebookLM and Perplexity are impressive and highly effective for conducting research on any topic, SurfSense elevates this capability by integrating with your personal knowledge base. It is a highly customizable AI research agent, connected to external sources such as search engines (Tavily), Slack, Notion, and more to come.
While tools like NotebookLM and Perplexity are impressive and highly effective for conducting research on any topic/query, SurfSense elevates this capability by integrating with your personal knowledge base. It is a highly customizable AI research agent, connected to external sources such as search engines (Tavily), Slack, Linear, Notion, YouTube, GitHub and more to come.
# Video
@ -43,8 +44,10 @@ Open source and easy to deploy locally.
#### **External Sources**
- Search Engines (Tavily)
- Slack
- Linear
- Notion
- Youtube Videos
- GitHub
- and more to come.....
#### 🔖 Cross Browser Extension
@ -69,144 +72,69 @@ Join the [SurfSense Discord](https://discord.gg/ejRNvftDp9) and help shape the f
## How to get started?
### PRE-START CHECKS
### Installation Options
#### PGVector
Make sure pgvector extension is installed on your machine. Setup Guide https://github.com/pgvector/pgvector?tab=readme-ov-file#installation
SurfSense provides two installation methods:
#### File Uploading Support
For File uploading you need Unstructured.io API key. You can get it at http://platform.unstructured.io/
1. **[Docker Installation](https://www.surfsense.net/docs/docker-installation)** - The easiest way to get SurfSense up and running with all dependencies containerized. Less Customization.
#### Auth
SurfSense now only works with Google OAuth. Make sure to set your OAuth Client at https://developers.google.com/identity/protocols/oauth2 . We need client id and client secret for backend. Make sure to enable people api and add the required scopes under data access (openid, userinfo.email, userinfo.profile)
2. **[Manual Installation (Recommended)](https://www.surfsense.net/docs/manual-installation)** - For users who prefer more control over their setup or need to customize their deployment.
![gauth](https://github.com/user-attachments/assets/80d60fe5-889b-48a6-b947-200fdaf544c1)
Both installation guides include detailed OS-specific instructions for Windows, macOS, and Linux.
Before installation, make sure to complete the [prerequisite setup steps](https://www.surfsense.net/docs/) including:
- PGVector setup
- Google OAuth configuration
- Unstructured.io API key
- Other required API keys
## Screenshots
**Search Spaces**
![search_spaces](https://github.com/user-attachments/assets/e254c38c-f937-44b6-9e9d-770db583d099)
**Manage Documents**
![documents](https://github.com/user-attachments/assets/7001e306-eb06-4009-89c6-8fadfdc3fc4d)
**Research Agent**
![researcher](https://github.com/user-attachments/assets/fda3e61f-f936-4b66-b565-d84edde44a67)
#### Crawler Support
SurfSense currently uses [Firecrawl.py](https://www.firecrawl.dev/) right now. Playwright crawler support will be added soon.
**Agent Chat**
![chat](https://github.com/user-attachments/assets/bb352d52-1c6d-4020-926b-722d0b98b491)
## Quick Start
### Preferred Method: Docker Setup
The recommended way to run SurfSense is using Docker, which ensures consistent environment across different systems.
1. Make sure you have Docker and Docker Compose installed
2. Follow the detailed instructions in our [Docker Setup Guide](DOCKER_SETUP.md)
```bash
# Start all services with one command
docker-compose up --build
```
---
### Alternative: Manual Setup
### Backend (./surfsense_backend)
This is the core of SurfSense. Before we begin let's look at `.env` variables' that we need to successfully setup SurfSense.
|ENV VARIABLE|DESCRIPTION|
|--|--|
| DATABASE_URL| Your PostgreSQL database connection string. Eg. `postgresql+asyncpg://postgres:postgres@localhost:5432/surfsense`|
| SECRET_KEY| JWT Secret key used for authentication. Should be a secure random string. Eg. `SURFSENSE_SECRET_KEY_123456789`|
| GOOGLE_OAUTH_CLIENT_ID| Google OAuth client ID obtained from Google Cloud Console when setting up OAuth authentication|
| GOOGLE_OAUTH_CLIENT_SECRET| Google OAuth client secret obtained from Google Cloud Console when setting up OAuth authentication|
| NEXT_FRONTEND_URL| URL where your frontend application is hosted. Eg. `http://localhost:3000`|
| EMBEDDING_MODEL| Name of the embedding model to use for vector embeddings. Currently works with Sentence Transformers only. Expect other embeddings soon. Eg. `mixedbread-ai/mxbai-embed-large-v1`|
| RERANKERS_MODEL_NAME| Name of the reranker model for search result reranking. Eg. `ms-marco-MiniLM-L-12-v2`|
| RERANKERS_MODEL_TYPE| Type of reranker model being used. Eg. `flashrank`|
| FAST_LLM| LiteLLM routed Smaller, faster LLM for quick responses. Eg. `litellm:openai/gpt-4o`|
| SMART_LLM| LiteLLM routed Balanced LLM for general use. Eg. `litellm:openai/gpt-4o`|
| STRATEGIC_LLM| LiteLLM routed Advanced LLM for complex reasoning tasks. Eg. `litellm:openai/gpt-4o`|
| LONG_CONTEXT_LLM| LiteLLM routed LLM capable of handling longer context windows. Eg. `litellm:gemini/gemini-2.0-flash`|
| UNSTRUCTURED_API_KEY| API key for Unstructured.io service for document parsing|
| FIRECRAWL_API_KEY| API key for Firecrawl service for web crawling and data extraction|
IMPORTANT: Since LLM calls are routed through LiteLLM make sure to include API keys of LLM models you are using. For example if you used `litellm:openai/gpt-4o` make sure to include OpenAI API Key `OPENAI_API_KEY` or if you use `litellm:gemini/gemini-2.0-flash` then you include `GEMINI_API_KEY`.
You can also integrate any LLM just follow this https://docs.litellm.ai/docs/providers
Now once you have everything let's proceed to run SurfSense.
1. Install `uv` : https://docs.astral.sh/uv/getting-started/installation/
2. Now just run this command to install dependencies i.e `uv sync`
3. That's it. Now just run the `main.py` file using `uv run main.py`. You can also optionally pass `--reload` as an argument to enable hot reloading.
4. If everything worked fine you should see screen like this.
![backend](https://i.ibb.co/542Vhqw/backendrunning.png)
---
### FrontEnd (./surfsense_web)
For local frontend setup just fill out the `.env` file of frontend.
|ENV VARIABLE|DESCRIPTION|
|--|--|
| NEXT_PUBLIC_FASTAPI_BACKEND_URL | Give hosted backend url here. Eg. `http://localhost:8000`|
1. Now install dependencies using `pnpm install`
2. Run it using `pnpm run dev`
You should see your Next.js frontend running at `localhost:3000`
---
### Extension (./surfsense_browser_extension)
Extension is in plasmo framework which is a cross browser extension framework. Extension main usecase is to save any webpages protected beyond authentication.
For building extension just fill out the `.env` file of frontend.
|ENV VARIABLE|DESCRIPTION|
|--|--|
| PLASMO_PUBLIC_BACKEND_URL| SurfSense Backend URL eg. "http://127.0.0.1:8000" |
Build the extension for your favorite browser using this guide: https://docs.plasmo.com/framework/workflows/build#with-a-specific-target
When you load and start the extension you should see a Apu page like this
**Browser Extension**
![ext1](https://github.com/user-attachments/assets/1f042b7a-6349-422b-94fb-d40d0df16c40)
After filling in your SurfSense API key you should be able to use extension now.
![ext2](https://github.com/user-attachments/assets/a9b9f1aa-2677-404d-b0a0-c1b2dddf24a7)
|Options|Explanations|
|--|--|
| Search Space | Search Space to save your dynamic bookmarks. |
| Clear Inactive History Sessions | It clears the saved content for Inactive Tab Sessions. |
| Save Current Webpage Snapshot | Stores the current webpage session info into SurfSense history store|
| Save to SurfSense | Processes the SurfSense History Store & Initiates a Save Job |
## Tech Stack
## Tech Stack
### **BackEnd**
- **FastAPI**: Modern, fast web framework for building APIs with Python
- **PostgreSQL with pgvector**: Database with vector search capabilities for similarity searches
- **SQLAlchemy**: SQL toolkit and ORM (Object-Relational Mapping) for database interactions
- **FastAPI Users**: Authentication and user management with JWT and OAuth support
- **LangChain**: Framework for developing AI-powered applications
- **Alembic**: A database migrations tool for SQLAlchemy.
- **GPT Integration**: Integration with LLM models through LiteLLM
- **FastAPI Users**: Authentication and user management with JWT and OAuth support
- **LangGraph**: Framework for developing AI-agents.
- **LangChain**: Framework for developing AI-powered applications.
- **LLM Integration**: Integration with LLM models through LiteLLM
- **Rerankers**: Advanced result ranking for improved search relevance
- **GPT-Researcher**: Advanced research capabilities
- **Hybrid Search**: Combines vector similarity and full-text search for optimal results using Reciprocal Rank Fusion (RRF)
- **Vector Embeddings**: Document and text embeddings for semantic search
@ -214,10 +142,8 @@ After filling in your SurfSense API key you should be able to use extension now.
- **pgvector**: PostgreSQL extension for efficient vector similarity operations
- **Chonkie**: Advanced document chunking and embedding library
- Uses `AutoEmbeddings` for flexible embedding model selection
- `LateChunker` for optimized document chunking based on embedding model's max sequence length
- Uses `AutoEmbeddings` for flexible embedding model selection
- `LateChunker` for optimized document chunking based on embedding model's max sequence length

View file

@ -4,18 +4,26 @@ SECRET_KEY="SECRET"
GOOGLE_OAUTH_CLIENT_ID="924507538m"
GOOGLE_OAUTH_CLIENT_SECRET="GOCSV"
NEXT_FRONTEND_URL="http://localhost:3000"
EMBEDDING_MODEL="mixedbread-ai/mxbai-embed-large-v1"
RERANKERS_MODEL_NAME="ms-marco-MiniLM-L-12-v2"
RERANKERS_MODEL_TYPE="flashrank"
FAST_LLM="litellm:openai/gpt-4o-mini"
SMART_LLM="litellm:openai/gpt-4o-mini"
STRATEGIC_LLM="litellm:openai/gpt-4o-mini"
LONG_CONTEXT_LLM="litellm:gemini/gemini-2.0-flash-thinking-exp-01-21"
# https://docs.litellm.ai/docs/providers
FAST_LLM="openai/gpt-4o-mini"
STRATEGIC_LLM="openai/gpt-4o"
LONG_CONTEXT_LLM="gemini/gemini-2.0-flash"
# Chosen LiteLLM Providers Keys
OPENAI_API_KEY="sk-proj-iA"
GEMINI_API_KEY="AIzaSyB6-1641124124124124124124124124124"
UNSTRUCTURED_API_KEY="Tpu3P0U8iy"
FIRECRAWL_API_KEY="fcr-01J0000000000000000000000"
#OPTIONAL: Add these for LangSmith Observability
LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
LANGSMITH_API_KEY="lsv2_pt_....."
LANGSMITH_PROJECT="surfsense"

View file

@ -3,4 +3,5 @@
venv/
data/
__pycache__/
.flashrank_cache
.flashrank_cache
surf_new_backend.egg-info/

View file

@ -0,0 +1,119 @@
# A generic, single database configuration.
[alembic]
# path to migration scripts.
# Use forward slashes (/) also on windows to provide an os agnostic path
script_location = alembic
# template used to generate migration file names; The default value is %%(rev)s_%%(slug)s
# Uncomment the line below if you want the files to be prepended with date and time
# file_template = %%(year)d_%%(month).2d_%%(day).2d_%%(hour).2d%%(minute).2d-%%(rev)s_%%(slug)s
# sys.path path, will be prepended to sys.path if present.
# defaults to the current working directory.
prepend_sys_path = .
# timezone to use when rendering the date within the migration file
# as well as the filename.
# If specified, requires the python>=3.9 or backports.zoneinfo library and tzdata library.
# Any required deps can installed by adding `alembic[tz]` to the pip requirements
# string value is passed to ZoneInfo()
# leave blank for localtime
# timezone =
# max length of characters to apply to the "slug" field
# truncate_slug_length = 40
# set to 'true' to run the environment during
# the 'revision' command, regardless of autogenerate
# revision_environment = false
# set to 'true' to allow .pyc and .pyo files without
# a source .py file to be detected as revisions in the
# versions/ directory
# sourceless = false
# version location specification; This defaults
# to alembic/versions. When using multiple version
# directories, initial revisions must be specified with --version-path.
# The path separator used here should be the separator specified by "version_path_separator" below.
# version_locations = %(here)s/bar:%(here)s/bat:alembic/versions
# version path separator; As mentioned above, this is the character used to split
# version_locations. The default within new alembic.ini files is "os", which uses os.pathsep.
# If this key is omitted entirely, it falls back to the legacy behavior of splitting on spaces and/or commas.
# Valid values for version_path_separator are:
#
# version_path_separator = :
# version_path_separator = ;
# version_path_separator = space
# version_path_separator = newline
#
# Use os.pathsep. Default configuration used for new projects.
version_path_separator = os
# set to 'true' to search source files recursively
# in each "version_locations" directory
# new in Alembic version 1.10
# recursive_version_locations = false
# the output encoding used when revision files
# are written from script.py.mako
# output_encoding = utf-8
# The SQLAlchemy URL to connect to
# IMPORTANT: Replace this with your actual async database URL
sqlalchemy.url = postgresql+asyncpg://postgres:postgres@localhost:5432/surfsense
[post_write_hooks]
# post_write_hooks defines scripts or Python functions that are run
# on newly generated revision scripts. See the documentation for further
# detail and examples
# format using "black" - use the console_scripts runner, against the "black" entrypoint
# hooks = black
# black.type = console_scripts
# black.entrypoint = black
# black.options = -l 79 REVISION_SCRIPT_FILENAME
# lint with attempts to fix using "ruff" - use the exec runner, execute a binary
# hooks = ruff
# ruff.type = exec
# ruff.executable = %(here)s/.venv/bin/ruff
# ruff.options = check --fix REVISION_SCRIPT_FILENAME
# Logging configuration
[loggers]
keys = root,sqlalchemy,alembic
[handlers]
keys = console
[formatters]
keys = generic
[logger_root]
level = WARNING
handlers = console
qualname =
[logger_sqlalchemy]
level = WARNING
handlers =
qualname = sqlalchemy.engine
[logger_alembic]
level = INFO
handlers =
qualname = alembic
[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic
[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S

View file

@ -0,0 +1 @@
Generic single-database configuration with an async dbapi.

View file

@ -0,0 +1,98 @@
import asyncio
from logging.config import fileConfig
import os
import sys
from sqlalchemy import pool
from sqlalchemy.engine import Connection
from sqlalchemy.ext.asyncio import async_engine_from_config
from alembic import context
# Ensure the app directory is in the Python path
# This allows Alembic to find your models
sys.path.insert(0, os.path.realpath(os.path.join(os.path.dirname(__file__), '..')))
# Import your models base
from app.db import Base # Assuming your Base is defined in app.db
# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config
# Interpret the config file for Python logging.
# This line sets up loggers basically.
if config.config_file_name is not None:
fileConfig(config.config_file_name)
# add your model's MetaData object here
# for 'autogenerate' support
# from myapp import mymodel
# target_metadata = mymodel.Base.metadata
target_metadata = Base.metadata
# other values from the config, defined by the needs of env.py,
# can be acquired:
# my_important_option = config.get_main_option("my_important_option")
# ... etc.
def run_migrations_offline() -> None:
"""Run migrations in 'offline' mode.
This configures the context with just a URL
and not an Engine, though an Engine is acceptable
here as well. By skipping the Engine creation
we don't even need a DBAPI to be available.
Calls to context.execute() here emit the given string to the
script output.
"""
url = config.get_main_option("sqlalchemy.url")
context.configure(
url=url,
target_metadata=target_metadata,
literal_binds=True,
dialect_opts={"paramstyle": "named"},
)
with context.begin_transaction():
context.run_migrations()
def do_run_migrations(connection: Connection) -> None:
context.configure(connection=connection, target_metadata=target_metadata)
with context.begin_transaction():
context.run_migrations()
async def run_async_migrations() -> None:
"""In this scenario we need to create an Engine
and associate a connection with the context.
"""
connectable = async_engine_from_config(
config.get_section(config.config_ini_section, {}),
prefix="sqlalchemy.",
poolclass=pool.NullPool,
)
async with connectable.connect() as connection:
await connection.run_sync(do_run_migrations)
await connectable.dispose()
def run_migrations_online() -> None:
"""Run migrations in 'online' mode."""
asyncio.run(run_async_migrations())
if context.is_offline_mode():
run_migrations_offline()
else:
run_migrations_online()

View file

@ -0,0 +1,28 @@
"""${message}
Revision ID: ${up_revision}
Revises: ${down_revision | comma,n}
Create Date: ${create_date}
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
${imports if imports else ""}
# revision identifiers, used by Alembic.
revision: str = ${repr(up_revision)}
down_revision: Union[str, None] = ${repr(down_revision)}
branch_labels: Union[str, Sequence[str], None] = ${repr(branch_labels)}
depends_on: Union[str, Sequence[str], None] = ${repr(depends_on)}
def upgrade() -> None:
"""Upgrade schema."""
${upgrades if upgrades else "pass"}
def downgrade() -> None:
"""Downgrade schema."""
${downgrades if downgrades else "pass"}

View file

@ -0,0 +1,53 @@
"""Add GITHUB_CONNECTOR to SearchSourceConnectorType enum
Revision ID: 1
Revises:
Create Date: 2023-10-27 10:00:00.000000
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# Import pgvector if needed for other types, though not for this ENUM change
# import pgvector
# revision identifiers, used by Alembic.
revision: str = '1'
down_revision: Union[str, None] = None
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
# Manually add the command to add the enum value
# Note: It's generally better to let autogenerate handle this, but we're bypassing it
op.execute("ALTER TYPE searchsourceconnectortype ADD VALUE 'GITHUB_CONNECTOR'")
# Pass for the rest, as autogenerate didn't run to add other schema details
pass
# ### end Alembic commands ###
def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
# Downgrading removal of an enum value is complex and potentially dangerous
# if the value is in use. Often omitted or requires manual SQL based on context.
# For now, we'll just pass. If you needed to reverse this, you'd likely
# have to manually check if 'GITHUB_CONNECTOR' is used in the table
# and then potentially recreate the type without it.
op.execute("ALTER TYPE searchsourceconnectortype RENAME TO searchsourceconnectortype_old")
op.execute("CREATE TYPE searchsourceconnectortype AS ENUM('SERPER_API', 'TAVILY_API', 'SLACK_CONNECTOR', 'NOTION_CONNECTOR')")
op.execute((
"ALTER TABLE search_source_connectors ALTER COLUMN connector_type TYPE searchsourceconnectortype USING "
"connector_type::text::searchsourceconnectortype"
))
op.execute("DROP TYPE searchsourceconnectortype_old")
pass
# ### end Alembic commands ###

View file

@ -0,0 +1,45 @@
"""Add LINEAR_CONNECTOR to SearchSourceConnectorType enum
Revision ID: 2
Revises: e55302644c51
Create Date: 2025-04-16 10:00:00.000000
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = '2'
down_revision: Union[str, None] = 'e55302644c51'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
# Manually add the command to add the enum value
op.execute("ALTER TYPE searchsourceconnectortype ADD VALUE 'LINEAR_CONNECTOR'")
# Pass for the rest, as autogenerate didn't run to add other schema details
pass
# ### end Alembic commands ###
def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
# Downgrading removal of an enum value requires recreating the type
op.execute("ALTER TYPE searchsourceconnectortype RENAME TO searchsourceconnectortype_old")
op.execute("CREATE TYPE searchsourceconnectortype AS ENUM('SERPER_API', 'TAVILY_API', 'SLACK_CONNECTOR', 'NOTION_CONNECTOR', 'GITHUB_CONNECTOR')")
op.execute((
"ALTER TABLE search_source_connectors ALTER COLUMN connector_type TYPE searchsourceconnectortype USING "
"connector_type::text::searchsourceconnectortype"
))
op.execute("DROP TYPE searchsourceconnectortype_old")
pass
# ### end Alembic commands ###

View file

@ -0,0 +1,71 @@
"""Add LINEAR_CONNECTOR to DocumentType enum
Revision ID: 3
Revises: 2
Create Date: 2025-04-16 10:05:00.059921
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = '3'
down_revision: Union[str, None] = '2'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
# Define the ENUM type name and the new value
ENUM_NAME = 'documenttype' # Make sure this matches the name in your DB (usually lowercase class name)
NEW_VALUE = 'LINEAR_CONNECTOR'
def upgrade() -> None:
"""Upgrade schema."""
op.execute(f"ALTER TYPE {ENUM_NAME} ADD VALUE '{NEW_VALUE}'")
# Warning: This will delete all rows with the new value
def downgrade() -> None:
"""Downgrade schema - remove LINEAR_CONNECTOR from enum."""
# The old type name
old_enum_name = f"{ENUM_NAME}_old"
# Enum values *before* LINEAR_CONNECTOR was added
old_values = (
'EXTENSION',
'CRAWLED_URL',
'FILE',
'SLACK_CONNECTOR',
'NOTION_CONNECTOR',
'YOUTUBE_VIDEO',
'GITHUB_CONNECTOR'
)
old_values_sql = ", ".join([f"'{v}'" for v in old_values])
# Table and column names (adjust if different)
table_name = 'documents'
column_name = 'document_type'
# 1. Rename the current enum type
op.execute(f"ALTER TYPE {ENUM_NAME} RENAME TO {old_enum_name}")
# 2. Create the new enum type with the old values
op.execute(f"CREATE TYPE {ENUM_NAME} AS ENUM({old_values_sql})")
# 3. Update the table:
op.execute(
f"DELETE FROM {table_name} WHERE {column_name}::text = '{NEW_VALUE}'"
)
# 4. Alter the column to use the new enum type (casting old values)
op.execute(
f"ALTER TABLE {table_name} ALTER COLUMN {column_name} "
f"TYPE {ENUM_NAME} USING {column_name}::text::{ENUM_NAME}"
)
# 5. Drop the old enum type
op.execute(f"DROP TYPE {old_enum_name}")
# ### end Alembic commands ###

View file

@ -0,0 +1,70 @@
"""Add GITHUB_CONNECTOR to DocumentType enum
Revision ID: e55302644c51
Revises: 1
Create Date: 2025-04-13 19:56:00.059921
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = 'e55302644c51'
down_revision: Union[str, None] = '1'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
# Define the ENUM type name and the new value
ENUM_NAME = 'documenttype' # Make sure this matches the name in your DB (usually lowercase class name)
NEW_VALUE = 'GITHUB_CONNECTOR'
def upgrade() -> None:
"""Upgrade schema."""
op.execute(f"ALTER TYPE {ENUM_NAME} ADD VALUE '{NEW_VALUE}'")
# Warning: This will delete all rows with the new value
def downgrade() -> None:
"""Downgrade schema - remove GITHUB_CONNECTOR from enum."""
# The old type name
old_enum_name = f"{ENUM_NAME}_old"
# Enum values *before* GITHUB_CONNECTOR was added
old_values = (
'EXTENSION',
'CRAWLED_URL',
'FILE',
'SLACK_CONNECTOR',
'NOTION_CONNECTOR',
'YOUTUBE_VIDEO'
)
old_values_sql = ", ".join([f"'{v}'" for v in old_values])
# Table and column names (adjust if different)
table_name = 'documents'
column_name = 'document_type'
# 1. Rename the current enum type
op.execute(f"ALTER TYPE {ENUM_NAME} RENAME TO {old_enum_name}")
# 2. Create the new enum type with the old values
op.execute(f"CREATE TYPE {ENUM_NAME} AS ENUM({old_values_sql})")
# 3. Update the table:
op.execute(
f"DELETE FROM {table_name} WHERE {column_name}::text = '{NEW_VALUE}'"
)
# 4. Alter the column to use the new enum type (casting old values)
op.execute(
f"ALTER TABLE {table_name} ALTER COLUMN {column_name} "
f"TYPE {ENUM_NAME} USING {column_name}::text::{ENUM_NAME}"
)
# 5. Drop the old enum type
op.execute(f"DROP TYPE {old_enum_name}")
# ### end Alembic commands ###

View file

@ -0,0 +1 @@
"""This is upcoming research agent. Work in progress."""

View file

@ -0,0 +1,30 @@
"""Define the configurable parameters for the agent."""
from __future__ import annotations
from dataclasses import dataclass, fields
from typing import Optional, List, Any
from langchain_core.runnables import RunnableConfig
@dataclass(kw_only=True)
class Configuration:
"""The configuration for the agent."""
# Input parameters provided at invocation
user_query: str
num_sections: int
connectors_to_search: List[str]
user_id: str
search_space_id: int
@classmethod
def from_runnable_config(
cls, config: Optional[RunnableConfig] = None
) -> Configuration:
"""Create a Configuration instance from a RunnableConfig object."""
configurable = (config.get("configurable") or {}) if config else {}
_fields = {f.name for f in fields(cls) if f.init}
return cls(**{k: v for k, v in configurable.items() if k in _fields})

View file

@ -0,0 +1,43 @@
from langgraph.graph import StateGraph
from .state import State
from .nodes import write_answer_outline, process_sections
from .configuration import Configuration
from typing import TypedDict, List, Dict, Any, Optional
# Define what keys are in our state dict
class GraphState(TypedDict):
# Intermediate data produced during workflow
answer_outline: Optional[Any]
# Final output
final_written_report: Optional[str]
def build_graph():
"""
Build and return the LangGraph workflow.
This function constructs the researcher agent graph with proper state management
and node connections following LangGraph best practices.
Returns:
A compiled LangGraph workflow
"""
# Define a new graph with state class
workflow = StateGraph(State, config_schema=Configuration)
# Add nodes to the graph
workflow.add_node("write_answer_outline", write_answer_outline)
workflow.add_node("process_sections", process_sections)
# Define the edges - create a linear flow
workflow.add_edge("__start__", "write_answer_outline")
workflow.add_edge("write_answer_outline", "process_sections")
workflow.add_edge("process_sections", "__end__")
# Compile the workflow into an executable graph
graph = workflow.compile()
graph.name = "Surfsense Researcher" # This defines the custom name in LangSmith
return graph
# Compile the graph once when the module is loaded
graph = build_graph()

View file

@ -0,0 +1,658 @@
import asyncio
import json
from typing import Any, Dict, List
from app.config import config as app_config
from app.db import async_session_maker
from app.utils.connector_service import ConnectorService
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.runnables import RunnableConfig
from pydantic import BaseModel, Field
from sqlalchemy.ext.asyncio import AsyncSession
from .configuration import Configuration
from .prompts import get_answer_outline_system_prompt
from .state import State
from .sub_section_writer.graph import graph as sub_section_writer_graph
from langgraph.types import StreamWriter
class Section(BaseModel):
"""A section in the answer outline."""
section_id: int = Field(..., description="The zero-based index of the section")
section_title: str = Field(..., description="The title of the section")
questions: List[str] = Field(..., description="Questions to research for this section")
class AnswerOutline(BaseModel):
"""The complete answer outline with all sections."""
answer_outline: List[Section] = Field(..., description="List of sections in the answer outline")
async def write_answer_outline(state: State, config: RunnableConfig, writer: StreamWriter) -> Dict[str, Any]:
"""
Create a structured answer outline based on the user query.
This node takes the user query and number of sections from the configuration and uses
an LLM to generate a comprehensive outline with logical sections and research questions
for each section.
Returns:
Dict containing the answer outline in the "answer_outline" key for state update.
"""
streaming_service = state.streaming_service
streaming_service.only_update_terminal("Generating answer outline...")
writer({"yeild_value": streaming_service._format_annotations()})
# Get configuration from runnable config
configuration = Configuration.from_runnable_config(config)
user_query = configuration.user_query
num_sections = configuration.num_sections
streaming_service.only_update_terminal(f"Planning research approach for query: {user_query[:100]}...")
writer({"yeild_value": streaming_service._format_annotations()})
# Initialize LLM
llm = app_config.strategic_llm_instance
# Create the human message content
human_message_content = f"""
Now Please create an answer outline for the following query:
User Query: {user_query}
Number of Sections: {num_sections}
Remember to format your response as valid JSON exactly matching this structure:
{{
"answer_outline": [
{{
"section_id": 0,
"section_title": "Section Title",
"questions": [
"Question 1 to research for this section",
"Question 2 to research for this section"
]
}}
]
}}
Your output MUST be valid JSON in exactly this format. Do not include any other text or explanation.
"""
streaming_service.only_update_terminal("Designing structured outline with AI...")
writer({"yeild_value": streaming_service._format_annotations()})
# Create messages for the LLM
messages = [
SystemMessage(content=get_answer_outline_system_prompt()),
HumanMessage(content=human_message_content)
]
# Call the LLM directly without using structured output
streaming_service.only_update_terminal("Processing answer structure...")
writer({"yeild_value": streaming_service._format_annotations()})
response = await llm.ainvoke(messages)
# Parse the JSON response manually
try:
# Extract JSON content from the response
content = response.content
# Find the JSON in the content (handle case where LLM might add additional text)
json_start = content.find('{')
json_end = content.rfind('}') + 1
if json_start >= 0 and json_end > json_start:
json_str = content[json_start:json_end]
# Parse the JSON string
parsed_data = json.loads(json_str)
# Convert to Pydantic model
answer_outline = AnswerOutline(**parsed_data)
total_questions = sum(len(section.questions) for section in answer_outline.answer_outline)
streaming_service.only_update_terminal(f"Successfully generated outline with {len(answer_outline.answer_outline)} sections and {total_questions} research questions")
writer({"yeild_value": streaming_service._format_annotations()})
print(f"Successfully generated answer outline with {len(answer_outline.answer_outline)} sections")
# Return state update
return {"answer_outline": answer_outline}
else:
# If JSON structure not found, raise a clear error
error_message = f"Could not find valid JSON in LLM response. Raw response: {content}"
streaming_service.only_update_terminal(error_message, "error")
writer({"yeild_value": streaming_service._format_annotations()})
raise ValueError(error_message)
except (json.JSONDecodeError, ValueError) as e:
# Log the error and re-raise it
error_message = f"Error parsing LLM response: {str(e)}"
streaming_service.only_update_terminal(error_message, "error")
writer({"yeild_value": streaming_service._format_annotations()})
print(f"Error parsing LLM response: {str(e)}")
print(f"Raw response: {response.content}")
raise
async def fetch_relevant_documents(
research_questions: List[str],
user_id: str,
search_space_id: int,
db_session: AsyncSession,
connectors_to_search: List[str],
writer: StreamWriter = None,
state: State = None,
top_k: int = 20
) -> List[Dict[str, Any]]:
"""
Fetch relevant documents for research questions using the provided connectors.
Args:
research_questions: List of research questions to find documents for
user_id: The user ID
search_space_id: The search space ID
db_session: The database session
connectors_to_search: List of connectors to search
writer: StreamWriter for sending progress updates
state: The current state containing the streaming service
top_k: Number of top results to retrieve per connector per question
Returns:
List of relevant documents
"""
# Initialize services
connector_service = ConnectorService(db_session)
# Only use streaming if both writer and state are provided
streaming_service = state.streaming_service if state is not None else None
# Stream initial status update
if streaming_service and writer:
streaming_service.only_update_terminal(f"Starting research on {len(research_questions)} questions using {len(connectors_to_search)} connectors...")
writer({"yeild_value": streaming_service._format_annotations()})
all_raw_documents = [] # Store all raw documents
all_sources = [] # Store all sources
for i, user_query in enumerate(research_questions):
# Stream question being researched
if streaming_service and writer:
streaming_service.only_update_terminal(f"Researching question {i+1}/{len(research_questions)}: {user_query[:100]}...")
writer({"yeild_value": streaming_service._format_annotations()})
# Use original research question as the query
reformulated_query = user_query
# Process each selected connector
for connector in connectors_to_search:
# Stream connector being searched
if streaming_service and writer:
streaming_service.only_update_terminal(f"Searching {connector} for relevant information...")
writer({"yeild_value": streaming_service._format_annotations()})
try:
if connector == "YOUTUBE_VIDEO":
source_object, youtube_chunks = await connector_service.search_youtube(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=top_k
)
# Add to sources and raw documents
if source_object:
all_sources.append(source_object)
all_raw_documents.extend(youtube_chunks)
# Stream found document count
if streaming_service and writer:
streaming_service.only_update_terminal(f"Found {len(youtube_chunks)} YouTube chunks relevant to the query")
writer({"yeild_value": streaming_service._format_annotations()})
elif connector == "EXTENSION":
source_object, extension_chunks = await connector_service.search_extension(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=top_k
)
# Add to sources and raw documents
if source_object:
all_sources.append(source_object)
all_raw_documents.extend(extension_chunks)
# Stream found document count
if streaming_service and writer:
streaming_service.only_update_terminal(f"Found {len(extension_chunks)} extension chunks relevant to the query")
writer({"yeild_value": streaming_service._format_annotations()})
elif connector == "CRAWLED_URL":
source_object, crawled_urls_chunks = await connector_service.search_crawled_urls(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=top_k
)
# Add to sources and raw documents
if source_object:
all_sources.append(source_object)
all_raw_documents.extend(crawled_urls_chunks)
# Stream found document count
if streaming_service and writer:
streaming_service.only_update_terminal(f"Found {len(crawled_urls_chunks)} crawled URL chunks relevant to the query")
writer({"yeild_value": streaming_service._format_annotations()})
elif connector == "FILE":
source_object, files_chunks = await connector_service.search_files(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=top_k
)
# Add to sources and raw documents
if source_object:
all_sources.append(source_object)
all_raw_documents.extend(files_chunks)
# Stream found document count
if streaming_service and writer:
streaming_service.only_update_terminal(f"Found {len(files_chunks)} file chunks relevant to the query")
writer({"yeild_value": streaming_service._format_annotations()})
elif connector == "TAVILY_API":
source_object, tavily_chunks = await connector_service.search_tavily(
user_query=reformulated_query,
user_id=user_id,
top_k=top_k
)
# Add to sources and raw documents
if source_object:
all_sources.append(source_object)
all_raw_documents.extend(tavily_chunks)
# Stream found document count
if streaming_service and writer:
streaming_service.only_update_terminal(f"Found {len(tavily_chunks)} web search results relevant to the query")
writer({"yeild_value": streaming_service._format_annotations()})
elif connector == "SLACK_CONNECTOR":
source_object, slack_chunks = await connector_service.search_slack(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=top_k
)
# Add to sources and raw documents
if source_object:
all_sources.append(source_object)
all_raw_documents.extend(slack_chunks)
# Stream found document count
if streaming_service and writer:
streaming_service.only_update_terminal(f"Found {len(slack_chunks)} Slack messages relevant to the query")
writer({"yeild_value": streaming_service._format_annotations()})
elif connector == "NOTION_CONNECTOR":
source_object, notion_chunks = await connector_service.search_notion(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=top_k
)
# Add to sources and raw documents
if source_object:
all_sources.append(source_object)
all_raw_documents.extend(notion_chunks)
# Stream found document count
if streaming_service and writer:
streaming_service.only_update_terminal(f"Found {len(notion_chunks)} Notion pages/blocks relevant to the query")
writer({"yeild_value": streaming_service._format_annotations()})
elif connector == "GITHUB_CONNECTOR":
source_object, github_chunks = await connector_service.search_github(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=top_k
)
# Add to sources and raw documents
if source_object:
all_sources.append(source_object)
all_raw_documents.extend(github_chunks)
# Stream found document count
if streaming_service and writer:
streaming_service.only_update_terminal(f"Found {len(github_chunks)} GitHub files/issues relevant to the query")
writer({"yeild_value": streaming_service._format_annotations()})
elif connector == "LINEAR_CONNECTOR":
source_object, linear_chunks = await connector_service.search_linear(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=top_k
)
# Add to sources and raw documents
if source_object:
all_sources.append(source_object)
all_raw_documents.extend(linear_chunks)
# Stream found document count
if streaming_service and writer:
streaming_service.only_update_terminal(f"Found {len(linear_chunks)} Linear issues relevant to the query")
writer({"yeild_value": streaming_service._format_annotations()})
except Exception as e:
error_message = f"Error searching connector {connector}: {str(e)}"
print(error_message)
# Stream error message
if streaming_service and writer:
streaming_service.only_update_terminal(error_message, "error")
writer({"yeild_value": streaming_service._format_annotations()})
# Continue with other connectors on error
continue
# Deduplicate source objects by ID before streaming
deduplicated_sources = []
seen_source_keys = set()
for source_obj in all_sources:
# Use combination of source ID and type as a unique identifier
# This ensures we don't accidentally deduplicate sources from different connectors
source_id = source_obj.get('id')
source_type = source_obj.get('type')
if source_id and source_type:
source_key = f"{source_type}_{source_id}"
if source_key not in seen_source_keys:
seen_source_keys.add(source_key)
deduplicated_sources.append(source_obj)
else:
# If there's no ID or type, just add it to be safe
deduplicated_sources.append(source_obj)
# Stream info about deduplicated sources
if streaming_service and writer:
streaming_service.only_update_terminal(f"Collected {len(deduplicated_sources)} unique sources across all connectors")
writer({"yeild_value": streaming_service._format_annotations()})
# After all sources are collected and deduplicated, stream them
if streaming_service and writer:
streaming_service.only_update_sources(deduplicated_sources)
writer({"yeild_value": streaming_service._format_annotations()})
# Deduplicate raw documents based on chunk_id or content
seen_chunk_ids = set()
seen_content_hashes = set()
deduplicated_docs = []
for doc in all_raw_documents:
chunk_id = doc.get("chunk_id")
content = doc.get("content", "")
content_hash = hash(content)
# Skip if we've seen this chunk_id or content before
if (chunk_id and chunk_id in seen_chunk_ids) or content_hash in seen_content_hashes:
continue
# Add to our tracking sets and keep this document
if chunk_id:
seen_chunk_ids.add(chunk_id)
seen_content_hashes.add(content_hash)
deduplicated_docs.append(doc)
# Stream info about deduplicated documents
if streaming_service and writer:
streaming_service.only_update_terminal(f"Found {len(deduplicated_docs)} unique document chunks after deduplication")
writer({"yeild_value": streaming_service._format_annotations()})
# Return deduplicated documents
return deduplicated_docs
async def process_sections(state: State, config: RunnableConfig, writer: StreamWriter) -> Dict[str, Any]:
"""
Process all sections in parallel and combine the results.
This node takes the answer outline from the previous step, fetches relevant documents
for all questions across all sections once, and then processes each section in parallel
using the sub_section_writer graph with the shared document pool.
Returns:
Dict containing the final written report in the "final_written_report" key.
"""
# Get configuration and answer outline from state
configuration = Configuration.from_runnable_config(config)
answer_outline = state.answer_outline
streaming_service = state.streaming_service
streaming_service.only_update_terminal(f"Starting to process research sections...")
writer({"yeild_value": streaming_service._format_annotations()})
print(f"Processing sections from outline: {answer_outline is not None}")
if not answer_outline:
streaming_service.only_update_terminal("Error: No answer outline was provided. Cannot generate report.", "error")
writer({"yeild_value": streaming_service._format_annotations()})
return {
"final_written_report": "No answer outline was provided. Cannot generate final report."
}
# Collect all questions from all sections
all_questions = []
for section in answer_outline.answer_outline:
all_questions.extend(section.questions)
print(f"Collected {len(all_questions)} questions from all sections")
streaming_service.only_update_terminal(f"Found {len(all_questions)} research questions across {len(answer_outline.answer_outline)} sections")
writer({"yeild_value": streaming_service._format_annotations()})
# Fetch relevant documents once for all questions
streaming_service.only_update_terminal("Searching for relevant information across all connectors...")
writer({"yeild_value": streaming_service._format_annotations()})
relevant_documents = []
async with async_session_maker() as db_session:
try:
relevant_documents = await fetch_relevant_documents(
research_questions=all_questions,
user_id=configuration.user_id,
search_space_id=configuration.search_space_id,
db_session=db_session,
connectors_to_search=configuration.connectors_to_search,
writer=writer,
state=state
)
except Exception as e:
error_message = f"Error fetching relevant documents: {str(e)}"
print(error_message)
streaming_service.only_update_terminal(error_message, "error")
writer({"yeild_value": streaming_service._format_annotations()})
# Log the error and continue with an empty list of documents
# This allows the process to continue, but the report might lack information
relevant_documents = []
# Consider adding more robust error handling or reporting if needed
print(f"Fetched {len(relevant_documents)} relevant documents for all sections")
streaming_service.only_update_terminal(f"Starting to draft {len(answer_outline.answer_outline)} sections using {len(relevant_documents)} relevant document chunks")
writer({"yeild_value": streaming_service._format_annotations()})
# Create tasks to process each section in parallel with the same document set
section_tasks = []
streaming_service.only_update_terminal("Creating processing tasks for each section...")
writer({"yeild_value": streaming_service._format_annotations()})
for section in answer_outline.answer_outline:
section_tasks.append(
process_section_with_documents(
section_title=section.section_title,
section_questions=section.questions,
user_query=configuration.user_query,
user_id=configuration.user_id,
search_space_id=configuration.search_space_id,
relevant_documents=relevant_documents,
state=state,
writer=writer
)
)
# Run all section processing tasks in parallel
print(f"Running {len(section_tasks)} section processing tasks in parallel")
streaming_service.only_update_terminal(f"Processing {len(section_tasks)} sections simultaneously...")
writer({"yeild_value": streaming_service._format_annotations()})
section_results = await asyncio.gather(*section_tasks, return_exceptions=True)
# Handle any exceptions in the results
streaming_service.only_update_terminal("Combining section results into final report...")
writer({"yeild_value": streaming_service._format_annotations()})
processed_results = []
for i, result in enumerate(section_results):
if isinstance(result, Exception):
section_title = answer_outline.answer_outline[i].section_title
error_message = f"Error processing section '{section_title}': {str(result)}"
print(error_message)
streaming_service.only_update_terminal(error_message, "error")
writer({"yeild_value": streaming_service._format_annotations()})
processed_results.append(error_message)
else:
processed_results.append(result)
# Combine the results into a final report with section titles
final_report = []
for i, (section, content) in enumerate(zip(answer_outline.answer_outline, processed_results)):
# Skip adding the section header since the content already contains the title
final_report.append(content)
final_report.append("\n")
# Join all sections with newlines
final_written_report = "\n".join(final_report)
print(f"Generated final report with {len(final_report)} parts")
streaming_service.only_update_terminal("Final research report generated successfully!")
writer({"yeild_value": streaming_service._format_annotations()})
if hasattr(state, 'streaming_service') and state.streaming_service:
# Convert the final report to the expected format for UI:
# A list of strings where empty strings represent line breaks
formatted_report = []
for section in final_report:
if section == "\n":
# Add an empty string for line breaks
formatted_report.append("")
else:
# Split any multiline content by newlines and add each line
section_lines = section.split("\n")
formatted_report.extend(section_lines)
state.streaming_service.only_update_answer(formatted_report)
writer({"yeild_value": state.streaming_service._format_annotations()})
return {
"final_written_report": final_written_report
}
async def process_section_with_documents(
section_title: str,
section_questions: List[str],
user_id: str,
search_space_id: int,
relevant_documents: List[Dict[str, Any]],
user_query: str,
state: State = None,
writer: StreamWriter = None
) -> str:
"""
Process a single section using pre-fetched documents.
Args:
section_title: The title of the section
section_questions: List of research questions for this section
user_id: The user ID
search_space_id: The search space ID
relevant_documents: Pre-fetched documents to use for this section
state: The current state
writer: StreamWriter for sending progress updates
Returns:
The written section content
"""
try:
# Use the provided documents
documents_to_use = relevant_documents
# Send status update via streaming if available
if state and state.streaming_service and writer:
state.streaming_service.only_update_terminal(f"Writing section: {section_title} with {len(section_questions)} research questions")
writer({"yeild_value": state.streaming_service._format_annotations()})
# Fallback if no documents found
if not documents_to_use:
print(f"No relevant documents found for section: {section_title}")
if state and state.streaming_service and writer:
state.streaming_service.only_update_terminal(f"Warning: No relevant documents found for section: {section_title}", "warning")
writer({"yeild_value": state.streaming_service._format_annotations()})
documents_to_use = [
{"content": f"No specific information was found for: {question}"}
for question in section_questions
]
# Create a new database session for this section
async with async_session_maker() as db_session:
# Call the sub_section_writer graph with the appropriate config
config = {
"configurable": {
"sub_section_title": section_title,
"sub_section_questions": section_questions,
"user_query": user_query,
"relevant_documents": documents_to_use,
"user_id": user_id,
"search_space_id": search_space_id
}
}
# Create the initial state with db_session
sub_state = {"db_session": db_session}
# Invoke the sub-section writer graph
print(f"Invoking sub_section_writer for: {section_title}")
if state and state.streaming_service and writer:
state.streaming_service.only_update_terminal(f"Analyzing information and drafting content for section: {section_title}")
writer({"yeild_value": state.streaming_service._format_annotations()})
result = await sub_section_writer_graph.ainvoke(sub_state, config)
# Return the final answer from the sub_section_writer
final_answer = result.get("final_answer", "No content was generated for this section.")
# Send section content update via streaming if available
if state and state.streaming_service and writer:
state.streaming_service.only_update_terminal(f"Completed writing section: {section_title}")
writer({"yeild_value": state.streaming_service._format_annotations()})
return final_answer
except Exception as e:
print(f"Error processing section '{section_title}': {str(e)}")
# Send error update via streaming if available
if state and state.streaming_service and writer:
state.streaming_service.only_update_terminal(f"Error processing section '{section_title}': {str(e)}", "error")
writer({"yeild_value": state.streaming_service._format_annotations()})
return f"Error processing section: {section_title}. Details: {str(e)}"

View file

@ -0,0 +1,92 @@
import datetime
def get_answer_outline_system_prompt():
return f"""
Today's date: {datetime.datetime.now().strftime("%Y-%m-%d")}
<answer_outline_system>
You are an expert research assistant specializing in structuring information. Your task is to create a detailed and logical research outline based on the user's query. This outline will serve as the blueprint for generating a comprehensive research report.
<input>
- user_query (string): The main question or topic the user wants researched. This guides the entire outline creation process.
- num_sections (integer): The target number of distinct sections the final research report should have. This helps control the granularity and structure of the outline.
</input>
<output_format>
A JSON object with the following structure:
{{
"answer_outline": [
{{
"section_id": 0,
"section_title": "Section Title",
"questions": [
"Question 1 to research for this section",
"Question 2 to research for this section"
]
}}
]
}}
</output_format>
<instructions>
1. **Deconstruct the `user_query`:** Identify the key concepts, entities, and the core information requested by the user.
2. **Determine Section Themes:** Based on the analysis and the requested `num_sections`, divide the topic into distinct, logical themes or sub-topics. Each theme will become a section. Ensure these themes collectively address the `user_query` comprehensively.
3. **Develop Sections:** For *each* of the `num_sections`:
* **Assign `section_id`:** Start with 0 and increment sequentially for each section.
* **Craft `section_title`:** Write a concise, descriptive title that clearly defines the scope and focus of the section's theme.
* **Formulate Research `questions`:** Generate 2 to 5 specific, targeted research questions for this section. These questions must:
* Directly relate to the `section_title` and explore its key aspects.
* Be answerable through focused research (e.g., searching documents, databases, or knowledge bases).
* Be distinct from each other and from questions in other sections. Avoid redundancy.
* Collectively guide the gathering of information needed to fully address the section's theme.
4. **Ensure Logical Flow:** Arrange the sections in a coherent and intuitive sequence. Consider structures like:
* General background -> Specific details -> Analysis/Comparison -> Applications/Implications
* Problem definition -> Proposed solutions -> Evaluation -> Conclusion
* Chronological progression
5. **Verify Completeness and Cohesion:** Review the entire outline (`section_titles` and `questions`) to confirm that:
* All sections together provide a complete and well-structured answer to the original `user_query`.
* There are no significant overlaps or gaps in coverage between sections.
6. **Adhere Strictly to Output Format:** Ensure the final output is a valid JSON object matching the specified structure exactly, including correct field names (`answer_outline`, `section_id`, `section_title`, `questions`) and data types.
</instructions>
<examples>
User Query: "What are the health benefits of meditation?"
Number of Sections: 3
{{
"answer_outline": [
{{
"section_id": 0,
"section_title": "Physical Health Benefits of Meditation",
"questions": [
"What physiological changes occur in the body during meditation?",
"How does regular meditation affect blood pressure and heart health?",
"What impact does meditation have on inflammation and immune function?",
"Can meditation help with pain management, and if so, how?"
]
}},
{{
"section_id": 1,
"section_title": "Mental Health Benefits of Meditation",
"questions": [
"How does meditation affect stress and anxiety levels?",
"What changes in brain structure or function have been observed in meditation practitioners?",
"Can meditation help with depression and mood disorders?",
"What is the relationship between meditation and cognitive function?"
]
}},
{{
"section_id": 2,
"section_title": "Best Meditation Practices for Maximum Benefits",
"questions": [
"What are the most effective meditation techniques for beginners?",
"How long and how frequently should one meditate to see benefits?",
"Are there specific meditation approaches best suited for particular health goals?",
"What common obstacles prevent people from experiencing meditation benefits?"
]
}}
]
}}
</examples>
</answer_outline_system>
"""

View file

@ -0,0 +1,31 @@
"""Define the state structures for the agent."""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional, Any
from sqlalchemy.ext.asyncio import AsyncSession
from app.utils.streaming_service import StreamingService
@dataclass
class State:
"""Defines the dynamic state for the agent during execution.
This state tracks the database session and the outputs generated by the agent's nodes.
See: https://langchain-ai.github.io/langgraph/concepts/low_level/#state
for more information.
"""
# Runtime context (not part of actual graph state)
db_session: AsyncSession
# Streaming service
streaming_service: StreamingService
# Intermediate state - populated during workflow
# Using field to explicitly mark as part of state
answer_outline: Optional[Any] = field(default=None)
# OUTPUT: Populated by agent nodes
# Using field to explicitly mark as part of state
final_written_report: Optional[str] = field(default=None)

View file

@ -0,0 +1,8 @@
"""New LangGraph Agent.
This module defines a custom graph.
"""
from .graph import graph
__all__ = ["graph"]

View file

@ -0,0 +1,31 @@
"""Define the configurable parameters for the agent."""
from __future__ import annotations
from dataclasses import dataclass, fields
from typing import Optional, List, Any
from langchain_core.runnables import RunnableConfig
@dataclass(kw_only=True)
class Configuration:
"""The configuration for the agent."""
# Input parameters provided at invocation
sub_section_title: str
sub_section_questions: List[str]
user_query: str
relevant_documents: List[Any] # Documents provided directly to the agent
user_id: str
search_space_id: int
@classmethod
def from_runnable_config(
cls, config: Optional[RunnableConfig] = None
) -> Configuration:
"""Create a Configuration instance from a RunnableConfig object."""
configurable = (config.get("configurable") or {}) if config else {}
_fields = {f.name for f in fields(cls) if f.init}
return cls(**{k: v for k, v in configurable.items() if k in _fields})

View file

@ -0,0 +1,20 @@
from langgraph.graph import StateGraph
from .state import State
from .nodes import write_sub_section, rerank_documents
from .configuration import Configuration
# Define a new graph
workflow = StateGraph(State, config_schema=Configuration)
# Add the nodes to the graph
workflow.add_node("rerank_documents", rerank_documents)
workflow.add_node("write_sub_section", write_sub_section)
# Connect the nodes
workflow.add_edge("__start__", "rerank_documents")
workflow.add_edge("rerank_documents", "write_sub_section")
workflow.add_edge("write_sub_section", "__end__")
# Compile the workflow into an executable graph
graph = workflow.compile()
graph.name = "Sub Section Writer" # This defines the custom name in LangSmith

View file

@ -0,0 +1,160 @@
from .configuration import Configuration
from langchain_core.runnables import RunnableConfig
from .state import State
from typing import Any, Dict
from app.config import config as app_config
from .prompts import get_citation_system_prompt
from langchain_core.messages import HumanMessage, SystemMessage
async def rerank_documents(state: State, config: RunnableConfig) -> Dict[str, Any]:
"""
Rerank the documents based on relevance to the sub-section title.
This node takes the relevant documents provided in the configuration,
reranks them using the reranker service based on the sub-section title,
and updates the state with the reranked documents.
Returns:
Dict containing the reranked documents.
"""
# Get configuration and relevant documents
configuration = Configuration.from_runnable_config(config)
documents = configuration.relevant_documents
sub_section_questions = configuration.sub_section_questions
# If no documents were provided, return empty list
if not documents or len(documents) == 0:
return {
"reranked_documents": []
}
# Get reranker service from app config
reranker_service = getattr(app_config, "reranker_service", None)
# Use documents as is if no reranker service is available
reranked_docs = documents
if reranker_service:
try:
# Use the sub-section questions for reranking context
# rerank_query = "\n".join(sub_section_questions)
rerank_query = configuration.user_query
# Convert documents to format expected by reranker if needed
reranker_input_docs = [
{
"chunk_id": doc.get("chunk_id", f"chunk_{i}"),
"content": doc.get("content", ""),
"score": doc.get("score", 0.0),
"document": {
"id": doc.get("document", {}).get("id", ""),
"title": doc.get("document", {}).get("title", ""),
"document_type": doc.get("document", {}).get("document_type", ""),
"metadata": doc.get("document", {}).get("metadata", {})
}
} for i, doc in enumerate(documents)
]
# Rerank documents using the section title
reranked_docs = reranker_service.rerank_documents(rerank_query, reranker_input_docs)
# Sort by score in descending order
reranked_docs.sort(key=lambda x: x.get("score", 0), reverse=True)
print(f"Reranked {len(reranked_docs)} documents for section: {configuration.sub_section_title}")
except Exception as e:
print(f"Error during reranking: {str(e)}")
# Use original docs if reranking fails
return {
"reranked_documents": reranked_docs
}
async def write_sub_section(state: State, config: RunnableConfig) -> Dict[str, Any]:
"""
Write the sub-section using the provided documents.
This node takes the relevant documents provided in the configuration and uses
an LLM to generate a comprehensive answer to the sub-section title with
proper citations. The citations follow IEEE format using source IDs from the
documents.
Returns:
Dict containing the final answer in the "final_answer" key.
"""
# Get configuration and relevant documents from configuration
configuration = Configuration.from_runnable_config(config)
documents = configuration.relevant_documents
# Initialize LLM
llm = app_config.fast_llm_instance
# If no documents were provided, return a message indicating this
if not documents or len(documents) == 0:
return {
"final_answer": "No relevant documents were provided to answer this question. Please provide documents or try a different approach."
}
# Prepare documents for citation formatting
formatted_documents = []
for i, doc in enumerate(documents):
# Extract content and metadata
content = doc.get("content", "")
doc_info = doc.get("document", {})
document_id = doc_info.get("id", f"{i+1}") # Use document ID or index+1 as source_id
# Format document according to the citation system prompt's expected format
formatted_doc = f"""
<document>
<metadata>
<source_id>{document_id}</source_id>
</metadata>
<content>
{content}
</content>
</document>
"""
formatted_documents.append(formatted_doc)
# Create the query that uses the section title and questions
section_title = configuration.sub_section_title
sub_section_questions = configuration.sub_section_questions
user_query = configuration.user_query # Get the original user query
documents_text = "\n".join(formatted_documents)
# Format the questions as bullet points for clarity
questions_text = "\n".join([f"- {question}" for question in sub_section_questions])
# Construct a clear, structured query for the LLM
human_message_content = f"""
Now user's query is:
<user_query>
{user_query}
</user_query>
The sub-section title is:
<sub_section_title>
{section_title}
</sub_section_title>
Use the provided documents as your source material and cite them properly using the IEEE citation format [X] where X is the source_id.
<documents>
{documents_text}
</documents>
"""
# Create messages for the LLM
messages = [
SystemMessage(content=get_citation_system_prompt()),
HumanMessage(content=human_message_content)
]
# Call the LLM and get the response
response = await llm.ainvoke(messages)
final_answer = response.content
return {
"final_answer": final_answer
}

View file

@ -0,0 +1,87 @@
import datetime
def get_citation_system_prompt():
return f"""
Today's date: {datetime.datetime.now().strftime("%Y-%m-%d")}
You are a research assistant tasked with analyzing documents and providing comprehensive answers with proper citations in IEEE format.
<instructions>
1. Carefully analyze all provided documents in the <document> section's.
2. Extract relevant information that addresses the user's query.
3. Synthesize a comprehensive, well-structured answer using information from these documents.
4. For EVERY piece of information you include from the documents, add an IEEE-style citation in square brackets [X] where X is the source_id from the document's metadata.
5. Make sure ALL factual statements from the documents have proper citations.
6. If multiple documents support the same point, include all relevant citations [X], [Y].
7. Present information in a logical, coherent flow.
8. Use your own words to connect ideas, but cite ALL information from the documents.
9. If documents contain conflicting information, acknowledge this and present both perspectives with appropriate citations.
10. Do not make up or include information not found in the provided documents.
11. CRITICAL: You MUST use the exact source_id value from each document's metadata for citations. Do not create your own citation numbers.
12. CRITICAL: Every citation MUST be in the IEEE format [X] where X is the exact source_id value.
13. CRITICAL: Never renumber or reorder citations - always use the original source_id values.
14. CRITICAL: Do not return citations as clickable links.
15. CRITICAL: Never format citations as markdown links like "([1](https://example.com))". Always use plain square brackets only.
16. CRITICAL: Citations must ONLY appear as [X] or [X], [Y], [Z] format - never with parentheses, hyperlinks, or other formatting.
17. CRITICAL: Never make up citation numbers. Only use source_id values that are explicitly provided in the document metadata.
18. CRITICAL: If you are unsure about a source_id, do not include a citation rather than guessing or making one up.
</instructions>
<format>
- Write in clear, professional language suitable for academic or technical audiences
- Organize your response with appropriate paragraphs, headings, and structure
- Every fact from the documents must have an IEEE-style citation in square brackets [X] where X is the EXACT source_id from the document's metadata
- Citations should appear at the end of the sentence containing the information they support
- Multiple citations should be separated by commas: [X], [Y], [Z]
- No need to return references section. Just citation numbers in answer.
- NEVER create your own citation numbering system - use the exact source_id values from the documents.
- NEVER format citations as clickable links or as markdown links like "([1](https://example.com))". Always use plain square brackets only.
- NEVER make up citation numbers if you are unsure about the source_id. It is better to omit the citation than to guess.
</format>
<input_example>
<document>
<metadata>
<source_id>1</source_id>
</metadata>
<content>
The Great Barrier Reef is the world's largest coral reef system, stretching over 2,300 kilometers along the coast of Queensland, Australia. It comprises over 2,900 individual reefs and 900 islands.
</content>
</document>
<document>
<metadata>
<source_id>13</source_id>
</metadata>
<content>
Climate change poses a significant threat to coral reefs worldwide. Rising ocean temperatures have led to mass coral bleaching events in the Great Barrier Reef in 2016, 2017, and 2020.
</content>
</document>
<document>
<metadata>
<source_id>21</source_id>
</metadata>
<content>
The Great Barrier Reef was designated a UNESCO World Heritage Site in 1981 due to its outstanding universal value and biological diversity. It is home to over 1,500 species of fish and 400 types of coral.
</content>
</document>
</input_example>
<output_example>
The Great Barrier Reef is the world's largest coral reef system, stretching over 2,300 kilometers along the coast of Queensland, Australia [1]. It was designated a UNESCO World Heritage Site in 1981 due to its outstanding universal value and biological diversity [21]. The reef is home to over 1,500 species of fish and 400 types of coral [21]. Unfortunately, climate change poses a significant threat to coral reefs worldwide, with rising ocean temperatures leading to mass coral bleaching events in the Great Barrier Reef in 2016, 2017, and 2020 [13]. The reef system comprises over 2,900 individual reefs and 900 islands [1], making it an ecological treasure that requires protection from multiple threats [1], [13].
</output_example>
<incorrect_citation_formats>
DO NOT use any of these incorrect citation formats:
- Using parentheses and markdown links: ([1](https://github.com/MODSetter/SurfSense))
- Using parentheses around brackets: ([1])
- Using hyperlinked text: [link to source 1](https://example.com)
- Using footnote style: ... reef system¹
- Making up citation numbers when source_id is unknown
ONLY use plain square brackets [1] or multiple citations [1], [2], [3]
</incorrect_citation_formats>
Note that the citation numbers match exactly with the source_id values (1, 13, and 21) and are not renumbered sequentially. Citations follow IEEE style with square brackets and appear at the end of sentences.
"""

View file

@ -0,0 +1,23 @@
"""Define the state structures for the agent."""
from __future__ import annotations
from dataclasses import dataclass
from typing import List, Optional, Any
from sqlalchemy.ext.asyncio import AsyncSession
@dataclass
class State:
"""Defines the dynamic state for the agent during execution.
This state tracks the database session and the outputs generated by the agent's nodes.
See: https://langchain-ai.github.io/langgraph/concepts/low_level/#state
for more information.
"""
# Runtime context
db_session: AsyncSession
# OUTPUT: Populated by agent nodes
reranked_documents: Optional[List[Any]] = None
final_answer: Optional[str] = None

View file

@ -15,17 +15,6 @@ env_file = BASE_DIR / ".env"
load_dotenv(env_file)
def extract_model_name(llm_string: str) -> str:
"""Extract the model name from an LLM string.
Example: "litellm:openai/gpt-4o-mini" -> "openai/gpt-4o-mini"
Args:
llm_string: The LLM string with optional prefix
Returns:
str: The extracted model name
"""
return llm_string.split(":", 1)[1] if ":" in llm_string else llm_string
class Config:
# Database
@ -38,15 +27,13 @@ class Config:
# LONG-CONTEXT LLMS
LONG_CONTEXT_LLM = os.getenv("LONG_CONTEXT_LLM")
long_context_llm_instance = ChatLiteLLM(model=extract_model_name(LONG_CONTEXT_LLM))
long_context_llm_instance = ChatLiteLLM(model=LONG_CONTEXT_LLM)
# GPT Researcher
FAST_LLM = os.getenv("FAST_LLM")
SMART_LLM = os.getenv("SMART_LLM")
STRATEGIC_LLM = os.getenv("STRATEGIC_LLM")
fast_llm_instance = ChatLiteLLM(model=extract_model_name(FAST_LLM))
smart_llm_instance = ChatLiteLLM(model=extract_model_name(SMART_LLM))
strategic_llm_instance = ChatLiteLLM(model=extract_model_name(STRATEGIC_LLM))
fast_llm_instance = ChatLiteLLM(model=FAST_LLM)
strategic_llm_instance = ChatLiteLLM(model=STRATEGIC_LLM)
# Chonkie Configuration | Edit this to your needs

View file

@ -0,0 +1,211 @@
import base64
import logging
from typing import List, Optional, Dict, Any, Tuple
from github3 import login as github_login, exceptions as github_exceptions
from github3.repos.contents import Contents
from github3.exceptions import ForbiddenError, NotFoundError
logger = logging.getLogger(__name__)
# List of common code file extensions to target
CODE_EXTENSIONS = {
'.py', '.js', '.jsx', '.ts', '.tsx', '.java', '.c', '.cpp', '.h', '.hpp',
'.cs', '.go', '.rb', '.php', '.swift', '.kt', '.scala', '.rs', '.m',
'.sh', '.bash', '.ps1', '.lua', '.pl', '.pm', '.r', '.dart', '.sql'
}
# List of common documentation/text file extensions
DOC_EXTENSIONS = {
'.md', '.txt', '.rst', '.adoc', '.html', '.htm', '.xml', '.json', '.yaml', '.yml', '.toml'
}
# Maximum file size in bytes (e.g., 1MB)
MAX_FILE_SIZE = 1 * 1024 * 1024
class GitHubConnector:
"""Connector for interacting with the GitHub API."""
# Directories to skip during file traversal
SKIPPED_DIRS = {
# Version control
'.git',
# Dependencies
'node_modules',
'vendor',
# Build artifacts / Caches
'build',
'dist',
'target',
'__pycache__',
# Virtual environments
'venv',
'.venv',
'env',
# IDE/Editor config
'.vscode',
'.idea',
'.project',
'.settings',
# Temporary / Logs
'tmp',
'logs',
# Add other project-specific irrelevant directories if needed
}
def __init__(self, token: str):
"""
Initializes the GitHub connector.
Args:
token: GitHub Personal Access Token (PAT).
"""
if not token:
raise ValueError("GitHub token cannot be empty.")
try:
self.gh = github_login(token=token)
# Try a simple authenticated call to check token validity
self.gh.me()
logger.info("Successfully authenticated with GitHub API.")
except (github_exceptions.AuthenticationFailed, ForbiddenError) as e:
logger.error(f"GitHub authentication failed: {e}")
raise ValueError("Invalid GitHub token or insufficient permissions.")
except Exception as e:
logger.error(f"Failed to initialize GitHub client: {e}")
raise
def get_user_repositories(self) -> List[Dict[str, Any]]:
"""Fetches repositories accessible by the authenticated user."""
repos_data = []
try:
# type='owner' fetches repos owned by the user
# type='member' fetches repos the user is a collaborator on (including orgs)
# type='all' fetches both
for repo in self.gh.repositories(type='owner', sort='updated'):
repos_data.append({
"id": repo.id,
"name": repo.name,
"full_name": repo.full_name,
"private": repo.private,
"url": repo.html_url,
"description": repo.description or "",
"last_updated": repo.updated_at if repo.updated_at else None,
})
logger.info(f"Fetched {len(repos_data)} repositories.")
return repos_data
except Exception as e:
logger.error(f"Failed to fetch GitHub repositories: {e}")
return [] # Return empty list on error
def get_repository_files(self, repo_full_name: str, path: str = '') -> List[Dict[str, Any]]:
"""
Recursively fetches details of relevant files (code, docs) within a repository path.
Args:
repo_full_name: The full name of the repository (e.g., 'owner/repo').
path: The starting path within the repository (default is root).
Returns:
A list of dictionaries, each containing file details (path, sha, url, size).
Returns an empty list if the repository or path is not found or on error.
"""
files_list = []
try:
owner, repo_name = repo_full_name.split('/')
repo = self.gh.repository(owner, repo_name)
if not repo:
logger.warning(f"Repository '{repo_full_name}' not found.")
return []
contents = repo.directory_contents(directory_path=path) # Use directory_contents for clarity
# contents returns a list of tuples (name, content_obj)
for item_name, content_item in contents:
if not isinstance(content_item, Contents):
continue
if content_item.type == 'dir':
# Check if the directory name is in the skipped list
if content_item.name in self.SKIPPED_DIRS:
logger.debug(f"Skipping directory: {content_item.path}")
continue # Skip recursion for this directory
# Recursively fetch contents of subdirectory
files_list.extend(self.get_repository_files(repo_full_name, path=content_item.path))
elif content_item.type == 'file':
# Check if the file extension is relevant and size is within limits
file_extension = '.' + content_item.name.split('.')[-1].lower() if '.' in content_item.name else ''
is_code = file_extension in CODE_EXTENSIONS
is_doc = file_extension in DOC_EXTENSIONS
if (is_code or is_doc) and content_item.size <= MAX_FILE_SIZE:
files_list.append({
"path": content_item.path,
"sha": content_item.sha,
"url": content_item.html_url,
"size": content_item.size,
"type": "code" if is_code else "doc"
})
elif content_item.size > MAX_FILE_SIZE:
logger.debug(f"Skipping large file: {content_item.path} ({content_item.size} bytes)")
else:
logger.debug(f"Skipping irrelevant file type: {content_item.path}")
except (NotFoundError, ForbiddenError) as e:
logger.warning(f"Cannot access path '{path}' in '{repo_full_name}': {e}")
except Exception as e:
logger.error(f"Failed to get files for {repo_full_name} at path '{path}': {e}")
# Return what we have collected so far in case of partial failure
return files_list
def get_file_content(self, repo_full_name: str, file_path: str) -> Optional[str]:
"""
Fetches the decoded content of a specific file.
Args:
repo_full_name: The full name of the repository (e.g., 'owner/repo').
file_path: The path to the file within the repository.
Returns:
The decoded file content as a string, or None if fetching fails or file is too large.
"""
try:
owner, repo_name = repo_full_name.split('/')
repo = self.gh.repository(owner, repo_name)
if not repo:
logger.warning(f"Repository '{repo_full_name}' not found when fetching file '{file_path}'.")
return None
content_item = repo.file_contents(path=file_path) # Use file_contents for clarity
if not content_item or not isinstance(content_item, Contents) or content_item.type != 'file':
logger.warning(f"File '{file_path}' not found or is not a file in '{repo_full_name}'.")
return None
if content_item.size > MAX_FILE_SIZE:
logger.warning(f"File '{file_path}' in '{repo_full_name}' exceeds max size ({content_item.size} > {MAX_FILE_SIZE}). Skipping content fetch.")
return None
# Content is base64 encoded
if content_item.content:
try:
decoded_content = base64.b64decode(content_item.content).decode('utf-8')
return decoded_content
except UnicodeDecodeError:
logger.warning(f"Could not decode file '{file_path}' in '{repo_full_name}' as UTF-8. Trying with 'latin-1'.")
try:
# Try a fallback encoding
decoded_content = base64.b64decode(content_item.content).decode('latin-1')
return decoded_content
except Exception as decode_err:
logger.error(f"Failed to decode file '{file_path}' with fallback encoding: {decode_err}")
return None # Give up if fallback fails
else:
logger.warning(f"No content returned for file '{file_path}' in '{repo_full_name}'. It might be empty.")
return "" # Return empty string for empty files
except (NotFoundError, ForbiddenError) as e:
logger.warning(f"Cannot access file '{file_path}' in '{repo_full_name}': {e}")
return None
except Exception as e:
logger.error(f"Failed to get content for file '{file_path}' in '{repo_full_name}': {e}")
return None

View file

@ -0,0 +1,454 @@
"""
Linear Connector Module
A module for retrieving issues and comments from Linear.
Allows fetching issue lists and their comments with date range filtering.
"""
import requests
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple, Any, Union
class LinearConnector:
"""Class for retrieving issues and comments from Linear."""
def __init__(self, token: str = None):
"""
Initialize the LinearConnector class.
Args:
token: Linear API token (optional, can be set later with set_token)
"""
self.token = token
self.api_url = "https://api.linear.app/graphql"
def set_token(self, token: str) -> None:
"""
Set the Linear API token.
Args:
token: Linear API token
"""
self.token = token
def get_headers(self) -> Dict[str, str]:
"""
Get headers for Linear API requests.
Returns:
Dictionary of headers
Raises:
ValueError: If no Linear token has been set
"""
if not self.token:
raise ValueError("Linear token not initialized. Call set_token() first.")
return {
'Content-Type': 'application/json',
'Authorization': self.token
}
def execute_graphql_query(self, query: str, variables: Dict[str, Any] = None) -> Dict[str, Any]:
"""
Execute a GraphQL query against the Linear API.
Args:
query: GraphQL query string
variables: Variables for the GraphQL query (optional)
Returns:
Response data from the API
Raises:
ValueError: If no Linear token has been set
Exception: If the API request fails
"""
if not self.token:
raise ValueError("Linear token not initialized. Call set_token() first.")
headers = self.get_headers()
payload = {'query': query}
if variables:
payload['variables'] = variables
response = requests.post(
self.api_url,
headers=headers,
json=payload
)
if response.status_code == 200:
return response.json()
else:
raise Exception(f"Query failed with status code {response.status_code}: {response.text}")
def get_all_issues(self, include_comments: bool = True) -> List[Dict[str, Any]]:
"""
Fetch all issues from Linear.
Args:
include_comments: Whether to include comments in the response
Returns:
List of issue objects
Raises:
ValueError: If no Linear token has been set
Exception: If the API request fails
"""
comments_query = ""
if include_comments:
comments_query = """
comments {
nodes {
id
body
user {
id
name
email
}
createdAt
updatedAt
}
}
"""
query = f"""
query {{
issues {{
nodes {{
id
identifier
title
description
state {{
id
name
type
}}
assignee {{
id
name
email
}}
creator {{
id
name
email
}}
createdAt
updatedAt
{comments_query}
}}
}}
}}
"""
result = self.execute_graphql_query(query)
# Extract issues from the response
if "data" in result and "issues" in result["data"] and "nodes" in result["data"]["issues"]:
return result["data"]["issues"]["nodes"]
return []
def get_issues_by_date_range(
self,
start_date: str,
end_date: str,
include_comments: bool = True
) -> Tuple[List[Dict[str, Any]], Optional[str]]:
"""
Fetch issues within a date range.
Args:
start_date: Start date in YYYY-MM-DD format
end_date: End date in YYYY-MM-DD format (inclusive)
include_comments: Whether to include comments in the response
Returns:
Tuple containing (issues list, error message or None)
"""
# Convert date strings to ISO format
try:
# For Linear API: we need to use a more specific format for the filter
# Instead of DateTime, use a string in the filter for DateTimeOrDuration
comments_query = ""
if include_comments:
comments_query = """
comments {
nodes {
id
body
user {
id
name
email
}
createdAt
updatedAt
}
}
"""
# Query issues that were either created OR updated within the date range
# This ensures we catch both new issues and updated existing issues
query = f"""
query IssuesByDateRange($after: String) {{
issues(
first: 100,
after: $after,
filter: {{
or: [
{{
createdAt: {{
gte: "{start_date}T00:00:00Z"
lte: "{end_date}T23:59:59Z"
}}
}},
{{
updatedAt: {{
gte: "{start_date}T00:00:00Z"
lte: "{end_date}T23:59:59Z"
}}
}}
]
}}
) {{
nodes {{
id
identifier
title
description
state {{
id
name
type
}}
assignee {{
id
name
email
}}
creator {{
id
name
email
}}
createdAt
updatedAt
{comments_query}
}}
pageInfo {{
hasNextPage
endCursor
}}
}}
}}
"""
try:
all_issues = []
has_next_page = True
cursor = None
# Handle pagination to get all issues
while has_next_page:
variables = {"after": cursor} if cursor else {}
result = self.execute_graphql_query(query, variables)
# Check for errors
if "errors" in result:
error_message = "; ".join([error.get("message", "Unknown error") for error in result["errors"]])
return [], f"GraphQL errors: {error_message}"
# Extract issues from the response
if "data" in result and "issues" in result["data"]:
issues_page = result["data"]["issues"]
# Add issues from this page
if "nodes" in issues_page:
all_issues.extend(issues_page["nodes"])
# Check if there are more pages
if "pageInfo" in issues_page:
page_info = issues_page["pageInfo"]
has_next_page = page_info.get("hasNextPage", False)
cursor = page_info.get("endCursor") if has_next_page else None
else:
has_next_page = False
else:
has_next_page = False
if not all_issues:
return [], "No issues found in the specified date range."
return all_issues, None
except Exception as e:
return [], f"Error fetching issues: {str(e)}"
except ValueError as e:
return [], f"Invalid date format: {str(e)}. Please use YYYY-MM-DD."
def format_issue(self, issue: Dict[str, Any]) -> Dict[str, Any]:
"""
Format an issue for easier consumption.
Args:
issue: The issue object from Linear API
Returns:
Formatted issue dictionary
"""
# Extract basic issue details
formatted = {
"id": issue.get("id", ""),
"identifier": issue.get("identifier", ""),
"title": issue.get("title", ""),
"description": issue.get("description", ""),
"state": issue.get("state", {}).get("name", "Unknown") if issue.get("state") else "Unknown",
"state_type": issue.get("state", {}).get("type", "Unknown") if issue.get("state") else "Unknown",
"created_at": issue.get("createdAt", ""),
"updated_at": issue.get("updatedAt", ""),
"creator": {
"id": issue.get("creator", {}).get("id", "") if issue.get("creator") else "",
"name": issue.get("creator", {}).get("name", "Unknown") if issue.get("creator") else "Unknown",
"email": issue.get("creator", {}).get("email", "") if issue.get("creator") else ""
} if issue.get("creator") else {"id": "", "name": "Unknown", "email": ""},
"assignee": {
"id": issue.get("assignee", {}).get("id", ""),
"name": issue.get("assignee", {}).get("name", "Unknown"),
"email": issue.get("assignee", {}).get("email", "")
} if issue.get("assignee") else None,
"comments": []
}
# Extract comments if available
if "comments" in issue and "nodes" in issue["comments"]:
for comment in issue["comments"]["nodes"]:
formatted_comment = {
"id": comment.get("id", ""),
"body": comment.get("body", ""),
"created_at": comment.get("createdAt", ""),
"updated_at": comment.get("updatedAt", ""),
"user": {
"id": comment.get("user", {}).get("id", "") if comment.get("user") else "",
"name": comment.get("user", {}).get("name", "Unknown") if comment.get("user") else "Unknown",
"email": comment.get("user", {}).get("email", "") if comment.get("user") else ""
} if comment.get("user") else {"id": "", "name": "Unknown", "email": ""}
}
formatted["comments"].append(formatted_comment)
return formatted
def format_issue_to_markdown(self, issue: Dict[str, Any]) -> str:
"""
Convert an issue to markdown format.
Args:
issue: The issue object (either raw or formatted)
Returns:
Markdown string representation of the issue
"""
# Format the issue if it's not already formatted
if "identifier" not in issue:
issue = self.format_issue(issue)
# Build the markdown content
markdown = f"# {issue.get('identifier', 'No ID')}: {issue.get('title', 'No Title')}\n\n"
if issue.get('state'):
markdown += f"**Status:** {issue['state']}\n\n"
if issue.get('assignee') and issue['assignee'].get('name'):
markdown += f"**Assignee:** {issue['assignee']['name']}\n"
if issue.get('creator') and issue['creator'].get('name'):
markdown += f"**Created by:** {issue['creator']['name']}\n"
if issue.get('created_at'):
created_date = self.format_date(issue['created_at'])
markdown += f"**Created:** {created_date}\n"
if issue.get('updated_at'):
updated_date = self.format_date(issue['updated_at'])
markdown += f"**Updated:** {updated_date}\n\n"
if issue.get('description'):
markdown += f"## Description\n\n{issue['description']}\n\n"
if issue.get('comments'):
markdown += f"## Comments ({len(issue['comments'])})\n\n"
for comment in issue['comments']:
user_name = "Unknown"
if comment.get('user') and comment['user'].get('name'):
user_name = comment['user']['name']
comment_date = "Unknown date"
if comment.get('created_at'):
comment_date = self.format_date(comment['created_at'])
markdown += f"### {user_name} ({comment_date})\n\n{comment.get('body', '')}\n\n---\n\n"
return markdown
@staticmethod
def format_date(iso_date: str) -> str:
"""
Format an ISO date string to a more readable format.
Args:
iso_date: ISO format date string
Returns:
Formatted date string
"""
if not iso_date or not isinstance(iso_date, str):
return "Unknown date"
try:
dt = datetime.fromisoformat(iso_date.replace('Z', '+00:00'))
return dt.strftime('%Y-%m-%d %H:%M:%S')
except ValueError:
return iso_date
# Example usage (uncomment to use):
"""
if __name__ == "__main__":
# Set your token here
token = "YOUR_LINEAR_API_KEY"
linear = LinearConnector(token)
try:
# Get all issues with comments
issues = linear.get_all_issues()
print(f"Retrieved {len(issues)} issues")
# Format and print the first issue as markdown
if issues:
issue_md = linear.format_issue_to_markdown(issues[0])
print("\nSample Issue in Markdown:\n")
print(issue_md)
# Get issues by date range
start_date = "2023-01-01"
end_date = "2023-01-31"
date_issues, error = linear.get_issues_by_date_range(start_date, end_date)
if error:
print(f"Error: {error}")
else:
print(f"\nRetrieved {len(date_issues)} issues from {start_date} to {end_date}")
except Exception as e:
print(f"Error: {e}")
"""

View file

@ -40,12 +40,16 @@ class DocumentType(str, Enum):
SLACK_CONNECTOR = "SLACK_CONNECTOR"
NOTION_CONNECTOR = "NOTION_CONNECTOR"
YOUTUBE_VIDEO = "YOUTUBE_VIDEO"
GITHUB_CONNECTOR = "GITHUB_CONNECTOR"
LINEAR_CONNECTOR = "LINEAR_CONNECTOR"
class SearchSourceConnectorType(str, Enum):
SERPER_API = "SERPER_API"
TAVILY_API = "TAVILY_API"
SLACK_CONNECTOR = "SLACK_CONNECTOR"
NOTION_CONNECTOR = "NOTION_CONNECTOR"
GITHUB_CONNECTOR = "GITHUB_CONNECTOR"
LINEAR_CONNECTOR = "LINEAR_CONNECTOR"
class ChatType(str, Enum):
GENERAL = "GENERAL"

View file

@ -1,14 +1,15 @@
from fastapi import APIRouter, Depends, HTTPException, Query
from fastapi.responses import StreamingResponse
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.future import select
from sqlalchemy.exc import IntegrityError, OperationalError
from typing import List
from app.db import get_async_session, User, SearchSpace, Chat
from app.schemas import ChatCreate, ChatUpdate, ChatRead, AISDKChatRequest
from app.db import Chat, SearchSpace, User, get_async_session
from app.schemas import AISDKChatRequest, ChatCreate, ChatRead, ChatUpdate
from app.tasks.stream_connector_search_results import stream_connector_search_results
from app.users import current_active_user
from app.utils.check_ownership import check_ownership
from fastapi import APIRouter, Depends, HTTPException
from fastapi.responses import StreamingResponse
from sqlalchemy.exc import IntegrityError, OperationalError
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.future import select
router = APIRouter()
@ -46,7 +47,7 @@ async def handle_chat_data(
response = StreamingResponse(stream_connector_search_results(
user_query,
user.id,
search_space_id,
search_space_id, # Already converted to int in lines 32-37
session,
research_mode,
selected_connectors
@ -88,16 +89,19 @@ async def create_chat(
async def read_chats(
skip: int = 0,
limit: int = 100,
search_space_id: int = None,
session: AsyncSession = Depends(get_async_session),
user: User = Depends(current_active_user)
):
try:
query = select(Chat).join(SearchSpace).filter(SearchSpace.user_id == user.id)
# Filter by search_space_id if provided
if search_space_id is not None:
query = query.filter(Chat.search_space_id == search_space_id)
result = await session.execute(
select(Chat)
.join(SearchSpace)
.filter(SearchSpace.user_id == user.id)
.offset(skip)
.limit(limit)
query.offset(skip).limit(limit)
)
return result.scalars().all()
except OperationalError:

View file

@ -170,16 +170,19 @@ async def process_file_in_background(
async def read_documents(
skip: int = 0,
limit: int = 300,
search_space_id: int = None,
session: AsyncSession = Depends(get_async_session),
user: User = Depends(current_active_user)
):
try:
query = select(Document).join(SearchSpace).filter(SearchSpace.user_id == user.id)
# Filter by search_space_id if provided
if search_space_id is not None:
query = query.filter(Document.search_space_id == search_space_id)
result = await session.execute(
select(Document)
.join(SearchSpace)
.filter(SearchSpace.user_id == user.id)
.offset(skip)
.limit(limit)
query.offset(skip).limit(limit)
)
db_documents = result.scalars().all()

View file

@ -7,20 +7,21 @@ PUT /search-source-connectors/{connector_id} - Update a specific connector
DELETE /search-source-connectors/{connector_id} - Delete a specific connector
POST /search-source-connectors/{connector_id}/index - Index content from a connector to a search space
Note: Each user can have only one connector of each type (SERPER_API, TAVILY_API, SLACK_CONNECTOR, NOTION_CONNECTOR).
Note: Each user can have only one connector of each type (SERPER_API, TAVILY_API, SLACK_CONNECTOR, NOTION_CONNECTOR, GITHUB_CONNECTOR, LINEAR_CONNECTOR).
"""
from fastapi import APIRouter, Depends, HTTPException, Query, BackgroundTasks
from fastapi import APIRouter, Depends, HTTPException, Query, BackgroundTasks, Body
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.future import select
from sqlalchemy.exc import IntegrityError
from typing import List, Dict, Any
from app.db import get_async_session, User, SearchSourceConnector, SearchSourceConnectorType, SearchSpace
from app.schemas import SearchSourceConnectorCreate, SearchSourceConnectorUpdate, SearchSourceConnectorRead
from app.db import get_async_session, User, SearchSourceConnector, SearchSourceConnectorType, SearchSpace, async_session_maker
from app.schemas import SearchSourceConnectorCreate, SearchSourceConnectorUpdate, SearchSourceConnectorRead, SearchSourceConnectorBase
from app.users import current_active_user
from app.utils.check_ownership import check_ownership
from pydantic import ValidationError
from app.tasks.connectors_indexing_tasks import index_slack_messages, index_notion_pages
from datetime import datetime
from pydantic import BaseModel, Field, ValidationError
from app.tasks.connectors_indexing_tasks import index_slack_messages, index_notion_pages, index_github_repos, index_linear_issues
from app.connectors.github_connector import GitHubConnector
from datetime import datetime, timezone, timedelta
import logging
# Set up logging
@ -28,6 +29,34 @@ logger = logging.getLogger(__name__)
router = APIRouter()
# Use Pydantic's BaseModel here
class GitHubPATRequest(BaseModel):
github_pat: str = Field(..., description="GitHub Personal Access Token")
# --- New Endpoint to list GitHub Repositories ---
@router.post("/github/repositories/", response_model=List[Dict[str, Any]])
async def list_github_repositories(
pat_request: GitHubPATRequest,
user: User = Depends(current_active_user) # Ensure the user is logged in
):
"""
Fetches a list of repositories accessible by the provided GitHub PAT.
The PAT is used for this request only and is not stored.
"""
try:
# Initialize GitHubConnector with the provided PAT
github_client = GitHubConnector(token=pat_request.github_pat)
# Fetch repositories
repositories = github_client.get_user_repositories()
return repositories
except ValueError as e:
# Handle invalid token error specifically
logger.error(f"GitHub PAT validation failed for user {user.id}: {str(e)}")
raise HTTPException(status_code=400, detail=f"Invalid GitHub PAT: {str(e)}")
except Exception as e:
logger.error(f"Failed to fetch GitHub repositories for user {user.id}: {str(e)}")
raise HTTPException(status_code=500, detail="Failed to fetch GitHub repositories.")
@router.post("/search-source-connectors/", response_model=SearchSourceConnectorRead)
async def create_search_source_connector(
connector: SearchSourceConnectorCreate,
@ -37,7 +66,7 @@ async def create_search_source_connector(
"""
Create a new search source connector.
Each user can have only one connector of each type (SERPER_API, TAVILY_API, SLACK_CONNECTOR).
Each user can have only one connector of each type (SERPER_API, TAVILY_API, SLACK_CONNECTOR, etc.).
The config must contain the appropriate keys for the connector type.
"""
try:
@ -50,13 +79,11 @@ async def create_search_source_connector(
)
)
existing_connector = result.scalars().first()
if existing_connector:
raise HTTPException(
status_code=409,
detail=f"A connector with type {connector.connector_type} already exists. Each user can have only one connector of each type."
)
db_connector = SearchSourceConnector(**connector.model_dump(), user_id=user.id)
session.add(db_connector)
await session.commit()
@ -78,6 +105,7 @@ async def create_search_source_connector(
await session.rollback()
raise
except Exception as e:
logger.error(f"Failed to create search source connector: {str(e)}")
await session.rollback()
raise HTTPException(
status_code=500,
@ -88,16 +116,18 @@ async def create_search_source_connector(
async def read_search_source_connectors(
skip: int = 0,
limit: int = 100,
search_space_id: int = None,
session: AsyncSession = Depends(get_async_session),
user: User = Depends(current_active_user)
):
"""List all search source connectors for the current user."""
try:
query = select(SearchSourceConnector).filter(SearchSourceConnector.user_id == user.id)
# No need to filter by search_space_id as connectors are user-owned, not search space specific
result = await session.execute(
select(SearchSourceConnector)
.filter(SearchSourceConnector.user_id == user.id)
.offset(skip)
.limit(limit)
query.offset(skip).limit(limit)
)
return result.scalars().all()
except Exception as e:
@ -132,54 +162,84 @@ async def update_search_source_connector(
):
"""
Update a search source connector.
Each user can have only one connector of each type (SERPER_API, TAVILY_API, SLACK_CONNECTOR).
The config must contain the appropriate keys for the connector type.
Handles partial updates, including merging changes into the 'config' field.
"""
try:
db_connector = await check_ownership(session, SearchSourceConnector, connector_id, user)
db_connector = await check_ownership(session, SearchSourceConnector, connector_id, user)
# Convert the sparse update data (only fields present in request) to a dict
update_data = connector_update.model_dump(exclude_unset=True)
# Special handling for 'config' field
if "config" in update_data:
incoming_config = update_data["config"] # Config data from the request
existing_config = db_connector.config if db_connector.config else {} # Current config from DB
# If connector type is being changed, check if one of that type already exists
if connector_update.connector_type != db_connector.connector_type:
# Merge incoming config into existing config
# This preserves existing keys (like GITHUB_PAT) if they are not in the incoming data
merged_config = existing_config.copy()
merged_config.update(incoming_config)
# -- Validation after merging --
# Validate the *merged* config based on the connector type
# We need the connector type - use the one from the update if provided, else the existing one
current_connector_type = connector_update.connector_type if connector_update.connector_type is not None else db_connector.connector_type
try:
# We can reuse the base validator by creating a temporary base model instance
# Note: This assumes 'name' and 'is_indexable' are not crucial for config validation itself
temp_data_for_validation = {
"name": db_connector.name, # Use existing name
"connector_type": current_connector_type,
"is_indexable": db_connector.is_indexable, # Use existing value
"last_indexed_at": db_connector.last_indexed_at, # Not used by validator
"config": merged_config
}
SearchSourceConnectorBase.model_validate(temp_data_for_validation)
except ValidationError as e:
# Raise specific validation error for the merged config
raise HTTPException(
status_code=422,
detail=f"Validation error for merged config: {str(e)}"
)
# If validation passes, update the main update_data dict with the merged config
update_data["config"] = merged_config
# Apply all updates (including the potentially merged config)
for key, value in update_data.items():
# Prevent changing connector_type if it causes a duplicate (check moved here)
if key == "connector_type" and value != db_connector.connector_type:
result = await session.execute(
select(SearchSourceConnector)
.filter(
SearchSourceConnector.user_id == user.id,
SearchSourceConnector.connector_type == connector_update.connector_type,
SearchSourceConnector.connector_type == value,
SearchSourceConnector.id != connector_id
)
)
existing_connector = result.scalars().first()
if existing_connector:
raise HTTPException(
status_code=409,
detail=f"A connector with type {connector_update.connector_type} already exists. Each user can have only one connector of each type."
detail=f"A connector with type {value} already exists. Each user can have only one connector of each type."
)
update_data = connector_update.model_dump(exclude_unset=True)
for key, value in update_data.items():
setattr(db_connector, key, value)
setattr(db_connector, key, value)
try:
await session.commit()
await session.refresh(db_connector)
return db_connector
except ValidationError as e:
await session.rollback()
raise HTTPException(
status_code=422,
detail=f"Validation error: {str(e)}"
)
except IntegrityError as e:
await session.rollback()
# This might occur if connector_type constraint is violated somehow after the check
raise HTTPException(
status_code=409,
detail=f"Integrity error: A connector with this type already exists. {str(e)}"
detail=f"Database integrity error during update: {str(e)}"
)
except HTTPException:
await session.rollback()
raise
except Exception as e:
await session.rollback()
logger.error(f"Failed to update search source connector {connector_id}: {e}", exc_info=True)
raise HTTPException(
status_code=500,
detail=f"Failed to update search source connector: {str(e)}"
@ -218,10 +278,10 @@ async def index_connector_content(
Index content from a connector to a search space.
Currently supports:
- SLACK_CONNECTOR: Indexes messages from all accessible Slack channels since the last indexing
(or the last 365 days if never indexed before)
- NOTION_CONNECTOR: Indexes pages from all accessible Notion pages since the last indexing
(or the last 365 days if never indexed before)
- SLACK_CONNECTOR: Indexes messages from all accessible Slack channels
- NOTION_CONNECTOR: Indexes pages from all accessible Notion pages
- GITHUB_CONNECTOR: Indexes code and documentation from GitHub repositories
- LINEAR_CONNECTOR: Indexes issues and comments from Linear
Args:
connector_id: ID of the connector to use
@ -239,43 +299,65 @@ async def index_connector_content(
search_space = await check_ownership(session, SearchSpace, search_space_id, user)
# Handle different connector types
response_message = ""
indexing_from = None
indexing_to = None
today_str = datetime.now().strftime("%Y-%m-%d")
if connector.connector_type == SearchSourceConnectorType.SLACK_CONNECTOR:
# Determine the time range that will be indexed
if not connector.last_indexed_at:
start_date = "365 days ago"
start_date = "365 days ago" # Or perhaps set a specific date if needed
else:
# Check if last_indexed_at is today
today = datetime.now().date()
if connector.last_indexed_at.date() == today:
# If last indexed today, go back 1 day to ensure we don't miss anything
start_date = (today - datetime.timedelta(days=1)).strftime("%Y-%m-%d")
start_date = (today - timedelta(days=1)).strftime("%Y-%m-%d")
else:
start_date = connector.last_indexed_at.strftime("%Y-%m-%d")
# Add the indexing task to background tasks
if background_tasks:
background_tasks.add_task(
run_slack_indexing_with_new_session,
connector_id,
search_space_id
)
return {
"success": True,
"message": "Slack indexing started in the background",
"connector_type": connector.connector_type,
"search_space": search_space.name,
"indexing_from": start_date,
"indexing_to": datetime.now().strftime("%Y-%m-%d")
}
else:
# For testing or if background tasks are not available
return {
"success": False,
"message": "Background tasks not available",
"connector_type": connector.connector_type
}
indexing_from = start_date
indexing_to = today_str
# Run indexing in background
logger.info(f"Triggering Slack indexing for connector {connector_id} into search space {search_space_id}")
background_tasks.add_task(run_slack_indexing_with_new_session, connector_id, search_space_id)
response_message = "Slack indexing started in the background."
elif connector.connector_type == SearchSourceConnectorType.NOTION_CONNECTOR:
# Determine the time range that will be indexed
if not connector.last_indexed_at:
start_date = "365 days ago" # Or perhaps set a specific date
else:
# Check if last_indexed_at is today
today = datetime.now().date()
if connector.last_indexed_at.date() == today:
# If last indexed today, go back 1 day to ensure we don't miss anything
start_date = (today - timedelta(days=1)).strftime("%Y-%m-%d")
else:
start_date = connector.last_indexed_at.strftime("%Y-%m-%d")
indexing_from = start_date
indexing_to = today_str
# Run indexing in background
logger.info(f"Triggering Notion indexing for connector {connector_id} into search space {search_space_id}")
background_tasks.add_task(run_notion_indexing_with_new_session, connector_id, search_space_id)
response_message = "Notion indexing started in the background."
elif connector.connector_type == SearchSourceConnectorType.GITHUB_CONNECTOR:
# GitHub connector likely indexes everything relevant, or uses internal logic
# Setting indexing_from to None and indexing_to to today
indexing_from = None
indexing_to = today_str
# Run indexing in background
logger.info(f"Triggering GitHub indexing for connector {connector_id} into search space {search_space_id}")
background_tasks.add_task(run_github_indexing_with_new_session, connector_id, search_space_id)
response_message = "GitHub indexing started in the background."
elif connector.connector_type == SearchSourceConnectorType.LINEAR_CONNECTOR:
# Determine the time range that will be indexed
if not connector.last_indexed_at:
start_date = "365 days ago"
@ -284,48 +366,39 @@ async def index_connector_content(
today = datetime.now().date()
if connector.last_indexed_at.date() == today:
# If last indexed today, go back 1 day to ensure we don't miss anything
start_date = (today - datetime.timedelta(days=1)).strftime("%Y-%m-%d")
start_date = (today - timedelta(days=1)).strftime("%Y-%m-%d")
else:
start_date = connector.last_indexed_at.strftime("%Y-%m-%d")
# Add the indexing task to background tasks
if background_tasks:
background_tasks.add_task(
run_notion_indexing_with_new_session,
connector_id,
search_space_id
)
return {
"success": True,
"message": "Notion indexing started in the background",
"connector_type": connector.connector_type,
"search_space": search_space.name,
"indexing_from": start_date,
"indexing_to": datetime.now().strftime("%Y-%m-%d")
}
else:
# For testing or if background tasks are not available
return {
"success": False,
"message": "Background tasks not available",
"connector_type": connector.connector_type
}
indexing_from = start_date
indexing_to = today_str
# Run indexing in background
logger.info(f"Triggering Linear indexing for connector {connector_id} into search space {search_space_id}")
background_tasks.add_task(run_linear_indexing_with_new_session, connector_id, search_space_id)
response_message = "Linear indexing started in the background."
else:
raise HTTPException(
status_code=400,
detail=f"Indexing not supported for connector type: {connector.connector_type}"
)
return {
"message": response_message,
"connector_id": connector_id,
"search_space_id": search_space_id,
"indexing_from": indexing_from,
"indexing_to": indexing_to
}
except HTTPException:
raise
except Exception as e:
logger.error(f"Failed to start indexing: {str(e)}")
logger.error(f"Failed to initiate indexing for connector {connector_id}: {e}", exc_info=True)
raise HTTPException(
status_code=500,
detail=f"Failed to start indexing: {str(e)}"
)
detail=f"Failed to initiate indexing: {str(e)}"
)
async def update_connector_last_indexed(
session: AsyncSession,
@ -361,8 +434,6 @@ async def run_slack_indexing_with_new_session(
Create a new session and run the Slack indexing task.
This prevents session leaks by creating a dedicated session for the background task.
"""
from app.db import async_session_maker
async with async_session_maker() as session:
await run_slack_indexing(session, connector_id, search_space_id)
@ -405,8 +476,6 @@ async def run_notion_indexing_with_new_session(
Create a new session and run the Notion indexing task.
This prevents session leaks by creating a dedicated session for the background task.
"""
from app.db import async_session_maker
async with async_session_maker() as session:
await run_notion_indexing(session, connector_id, search_space_id)
@ -439,4 +508,72 @@ async def run_notion_indexing(
else:
logger.error(f"Notion indexing failed or no documents processed: {error_or_warning}")
except Exception as e:
logger.error(f"Error in background Notion indexing task: {str(e)}")
logger.error(f"Error in background Notion indexing task: {str(e)}")
# Add new helper functions for GitHub indexing
async def run_github_indexing_with_new_session(
connector_id: int,
search_space_id: int
):
"""Wrapper to run GitHub indexing with its own database session."""
logger.info(f"Background task started: Indexing GitHub connector {connector_id} into space {search_space_id}")
async with async_session_maker() as session:
await run_github_indexing(session, connector_id, search_space_id)
logger.info(f"Background task finished: Indexing GitHub connector {connector_id}")
async def run_github_indexing(
session: AsyncSession,
connector_id: int,
search_space_id: int
):
"""Runs the GitHub indexing task and updates the timestamp."""
try:
indexed_count, error_message = await index_github_repos(
session, connector_id, search_space_id, update_last_indexed=False
)
if error_message:
logger.error(f"GitHub indexing failed for connector {connector_id}: {error_message}")
# Optionally update status in DB to indicate failure
else:
logger.info(f"GitHub indexing successful for connector {connector_id}. Indexed {indexed_count} documents.")
# Update the last indexed timestamp only on success
await update_connector_last_indexed(session, connector_id)
await session.commit() # Commit timestamp update
except Exception as e:
await session.rollback()
logger.error(f"Critical error in run_github_indexing for connector {connector_id}: {e}", exc_info=True)
# Optionally update status in DB to indicate failure
# Add new helper functions for Linear indexing
async def run_linear_indexing_with_new_session(
connector_id: int,
search_space_id: int
):
"""Wrapper to run Linear indexing with its own database session."""
logger.info(f"Background task started: Indexing Linear connector {connector_id} into space {search_space_id}")
async with async_session_maker() as session:
await run_linear_indexing(session, connector_id, search_space_id)
logger.info(f"Background task finished: Indexing Linear connector {connector_id}")
async def run_linear_indexing(
session: AsyncSession,
connector_id: int,
search_space_id: int
):
"""Runs the Linear indexing task and updates the timestamp."""
try:
indexed_count, error_message = await index_linear_issues(
session, connector_id, search_space_id, update_last_indexed=False
)
if error_message:
logger.error(f"Linear indexing failed for connector {connector_id}: {error_message}")
# Optionally update status in DB to indicate failure
else:
logger.info(f"Linear indexing successful for connector {connector_id}. Indexed {indexed_count} documents.")
# Update the last indexed timestamp only on success
await update_connector_last_indexed(session, connector_id)
await session.commit() # Commit timestamp update
except Exception as e:
await session.rollback()
logger.error(f"Critical error in run_linear_indexing for connector {connector_id}: {e}", exc_info=True)
# Optionally update status in DB to indicate failure

View file

@ -1,16 +1,15 @@
from datetime import datetime
import uuid
from typing import Dict, Any
from typing import Dict, Any, Optional
from pydantic import BaseModel, field_validator
from .base import IDModel, TimestampModel
from app.db import SearchSourceConnectorType
from fastapi import HTTPException
class SearchSourceConnectorBase(BaseModel):
name: str
connector_type: SearchSourceConnectorType
is_indexable: bool
last_indexed_at: datetime | None
last_indexed_at: Optional[datetime] = None
config: Dict[str, Any]
@field_validator('config')
@ -57,17 +56,45 @@ class SearchSourceConnectorBase(BaseModel):
# Ensure the integration token is not empty
if not config.get("NOTION_INTEGRATION_TOKEN"):
raise ValueError("NOTION_INTEGRATION_TOKEN cannot be empty")
elif connector_type == SearchSourceConnectorType.GITHUB_CONNECTOR:
# For GITHUB_CONNECTOR, only allow GITHUB_PAT and repo_full_names
allowed_keys = ["GITHUB_PAT", "repo_full_names"]
if set(config.keys()) != set(allowed_keys):
raise ValueError(f"For GITHUB_CONNECTOR connector type, config must only contain these keys: {allowed_keys}")
# Ensure the token is not empty
if not config.get("GITHUB_PAT"):
raise ValueError("GITHUB_PAT cannot be empty")
# Ensure the repo_full_names is present and is a non-empty list
repo_full_names = config.get("repo_full_names")
if not isinstance(repo_full_names, list) or not repo_full_names:
raise ValueError("repo_full_names must be a non-empty list of strings")
elif connector_type == SearchSourceConnectorType.LINEAR_CONNECTOR:
# For LINEAR_CONNECTOR, only allow LINEAR_API_KEY
allowed_keys = ["LINEAR_API_KEY"]
if set(config.keys()) != set(allowed_keys):
raise ValueError(f"For LINEAR_CONNECTOR connector type, config must only contain these keys: {allowed_keys}")
# Ensure the token is not empty
if not config.get("LINEAR_API_KEY"):
raise ValueError("LINEAR_API_KEY cannot be empty")
return config
class SearchSourceConnectorCreate(SearchSourceConnectorBase):
pass
class SearchSourceConnectorUpdate(SearchSourceConnectorBase):
pass
class SearchSourceConnectorUpdate(BaseModel):
name: Optional[str] = None
connector_type: Optional[SearchSourceConnectorType] = None
is_indexable: Optional[bool] = None
last_indexed_at: Optional[datetime] = None
config: Optional[Dict[str, Any]] = None
class SearchSourceConnectorRead(SearchSourceConnectorBase, IDModel, TimestampModel):
user_id: uuid.UUID
class Config:
from_attributes = True
from_attributes = True

View file

@ -1,14 +1,16 @@
from typing import Optional, List, Dict, Any, Tuple
from typing import Optional, Tuple
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.exc import SQLAlchemyError
from sqlalchemy.future import select
from sqlalchemy import delete
from datetime import datetime, timedelta
from datetime import datetime, timedelta, timezone
from app.db import Document, DocumentType, Chunk, SearchSourceConnector, SearchSourceConnectorType
from app.config import config
from app.prompts import SUMMARY_PROMPT_TEMPLATE
from app.connectors.slack_history import SlackHistory
from app.connectors.notion_history import NotionHistoryConnector
from app.connectors.github_connector import GitHubConnector
from app.connectors.linear_connector import LinearConnector
from slack_sdk.errors import SlackApiError
import logging
@ -59,8 +61,20 @@ async def index_slack_messages(
end_date = datetime.now()
# Use last_indexed_at as start date if available, otherwise use 365 days ago
start_date = end_date - timedelta(days=365)
if connector.last_indexed_at:
# Convert dates to be comparable (both timezone-naive)
last_indexed_naive = connector.last_indexed_at.replace(tzinfo=None) if connector.last_indexed_at.tzinfo else connector.last_indexed_at
# Check if last_indexed_at is in the future or after end_date
if last_indexed_naive > end_date:
logger.warning(f"Last indexed date ({last_indexed_naive.strftime('%Y-%m-%d')}) is in the future. Using 30 days ago instead.")
start_date = end_date - timedelta(days=30)
else:
start_date = last_indexed_naive
logger.info(f"Using last_indexed_at ({start_date.strftime('%Y-%m-%d')}) as start date")
else:
start_date = end_date - timedelta(days=30) # Use 30 days instead of 365 to catch recent issues
logger.info(f"No last_indexed_at found, using {start_date.strftime('%Y-%m-%d')} (30 days ago) as start date")
# Format dates for Slack API
start_date_str = start_date.strftime("%Y-%m-%d")
@ -589,3 +603,473 @@ async def index_notion_pages(
await session.rollback()
logger.error(f"Failed to index Notion pages: {str(e)}", exc_info=True)
return 0, f"Failed to index Notion pages: {str(e)}"
async def index_github_repos(
session: AsyncSession,
connector_id: int,
search_space_id: int,
update_last_indexed: bool = True
) -> Tuple[int, Optional[str]]:
"""
Index code and documentation files from accessible GitHub repositories.
Args:
session: Database session
connector_id: ID of the GitHub connector
search_space_id: ID of the search space to store documents in
update_last_indexed: Whether to update the last_indexed_at timestamp (default: True)
Returns:
Tuple containing (number of documents indexed, error message or None)
"""
documents_processed = 0
errors = []
try:
# 1. Get the GitHub connector from the database
result = await session.execute(
select(SearchSourceConnector)
.filter(
SearchSourceConnector.id == connector_id,
SearchSourceConnector.connector_type == SearchSourceConnectorType.GITHUB_CONNECTOR
)
)
connector = result.scalars().first()
if not connector:
return 0, f"Connector with ID {connector_id} not found or is not a GitHub connector"
# 2. Get the GitHub PAT and selected repositories from the connector config
github_pat = connector.config.get("GITHUB_PAT")
repo_full_names_to_index = connector.config.get("repo_full_names")
if not github_pat:
return 0, "GitHub Personal Access Token (PAT) not found in connector config"
if not repo_full_names_to_index or not isinstance(repo_full_names_to_index, list):
return 0, "'repo_full_names' not found or is not a list in connector config"
# 3. Initialize GitHub connector client
try:
github_client = GitHubConnector(token=github_pat)
except ValueError as e:
return 0, f"Failed to initialize GitHub client: {str(e)}"
# 4. Validate selected repositories
# For simplicity, we'll proceed with the list provided.
# If a repo is inaccessible, get_repository_files will likely fail gracefully later.
logger.info(f"Starting indexing for {len(repo_full_names_to_index)} selected repositories.")
# 5. Get existing documents for this search space and connector type to prevent duplicates
existing_docs_result = await session.execute(
select(Document)
.filter(
Document.search_space_id == search_space_id,
Document.document_type == DocumentType.GITHUB_CONNECTOR
)
)
existing_docs = existing_docs_result.scalars().all()
# Create a lookup dict: key=repo_fullname/file_path, value=Document object
existing_docs_lookup = {doc.document_metadata.get("full_path"): doc for doc in existing_docs if doc.document_metadata.get("full_path")}
logger.info(f"Found {len(existing_docs_lookup)} existing GitHub documents in database for search space {search_space_id}")
# 6. Iterate through selected repositories and index files
for repo_full_name in repo_full_names_to_index:
if not repo_full_name or not isinstance(repo_full_name, str):
logger.warning(f"Skipping invalid repository entry: {repo_full_name}")
continue
logger.info(f"Processing repository: {repo_full_name}")
try:
files_to_index = github_client.get_repository_files(repo_full_name)
if not files_to_index:
logger.info(f"No indexable files found in repository: {repo_full_name}")
continue
logger.info(f"Found {len(files_to_index)} files to process in {repo_full_name}")
for file_info in files_to_index:
file_path = file_info.get("path")
file_url = file_info.get("url")
file_sha = file_info.get("sha")
file_type = file_info.get("type") # 'code' or 'doc'
full_path_key = f"{repo_full_name}/{file_path}"
if not file_path or not file_url or not file_sha:
logger.warning(f"Skipping file with missing info in {repo_full_name}: {file_info}")
continue
# Check if document already exists and if content hash matches
existing_doc = existing_docs_lookup.get(full_path_key)
if existing_doc and existing_doc.document_metadata.get("sha") == file_sha:
logger.debug(f"Skipping unchanged file: {full_path_key}")
continue # Skip if SHA matches (content hasn't changed)
# Get file content
file_content = github_client.get_file_content(repo_full_name, file_path)
if file_content is None:
logger.warning(f"Could not retrieve content for {full_path_key}. Skipping.")
continue # Skip if content fetch failed
# Use file_content directly for chunking, maybe summary for main content?
# For now, let's use the full content for both, might need refinement
summary_content = f"GitHub file: {full_path_key}\n\n{file_content[:1000]}..." # Simple summary
summary_embedding = config.embedding_model_instance.embed(summary_content)
# Chunk the content
try:
chunks_data = [
Chunk(content=chunk.text, embedding=chunk.embedding)
for chunk in config.chunker_instance.chunk(file_content)
]
except Exception as chunk_err:
logger.error(f"Failed to chunk file {full_path_key}: {chunk_err}")
errors.append(f"Chunking failed for {full_path_key}: {chunk_err}")
continue # Skip this file if chunking fails
doc_metadata = {
"repository_full_name": repo_full_name,
"file_path": file_path,
"full_path": full_path_key, # For easier lookup
"url": file_url,
"sha": file_sha,
"type": file_type,
"indexed_at": datetime.now(timezone.utc).isoformat()
}
if existing_doc:
# Update existing document
logger.info(f"Updating document for file: {full_path_key}")
existing_doc.title = f"GitHub - {file_path}"
existing_doc.document_metadata = doc_metadata
existing_doc.content = summary_content # Update summary
existing_doc.embedding = summary_embedding # Update embedding
# Delete old chunks
await session.execute(
delete(Chunk)
.where(Chunk.document_id == existing_doc.id)
)
# Add new chunks
for chunk_obj in chunks_data:
chunk_obj.document_id = existing_doc.id
session.add(chunk_obj)
documents_processed += 1
else:
# Create new document
logger.info(f"Creating new document for file: {full_path_key}")
document = Document(
title=f"GitHub - {file_path}",
document_type=DocumentType.GITHUB_CONNECTOR,
document_metadata=doc_metadata,
content=summary_content, # Store summary
embedding=summary_embedding,
search_space_id=search_space_id,
chunks=chunks_data # Associate chunks directly
)
session.add(document)
documents_processed += 1
# Commit periodically or at the end? For now, commit per repo
# await session.commit()
except Exception as repo_err:
logger.error(f"Failed to process repository {repo_full_name}: {repo_err}")
errors.append(f"Failed processing {repo_full_name}: {repo_err}")
# Commit all changes at the end
await session.commit()
logger.info(f"Finished GitHub indexing for connector {connector_id}. Processed {documents_processed} files.")
except SQLAlchemyError as db_err:
await session.rollback()
logger.error(f"Database error during GitHub indexing for connector {connector_id}: {db_err}")
errors.append(f"Database error: {db_err}")
return documents_processed, "; ".join(errors) if errors else str(db_err)
except Exception as e:
await session.rollback()
logger.error(f"Unexpected error during GitHub indexing for connector {connector_id}: {e}", exc_info=True)
errors.append(f"Unexpected error: {e}")
return documents_processed, "; ".join(errors) if errors else str(e)
error_message = "; ".join(errors) if errors else None
return documents_processed, error_message
async def index_linear_issues(
session: AsyncSession,
connector_id: int,
search_space_id: int,
update_last_indexed: bool = True
) -> Tuple[int, Optional[str]]:
"""
Index Linear issues and comments.
Args:
session: Database session
connector_id: ID of the Linear connector
search_space_id: ID of the search space to store documents in
update_last_indexed: Whether to update the last_indexed_at timestamp (default: True)
Returns:
Tuple containing (number of documents indexed, error message or None)
"""
try:
# Get the connector
result = await session.execute(
select(SearchSourceConnector)
.filter(
SearchSourceConnector.id == connector_id,
SearchSourceConnector.connector_type == SearchSourceConnectorType.LINEAR_CONNECTOR
)
)
connector = result.scalars().first()
if not connector:
return 0, f"Connector with ID {connector_id} not found or is not a Linear connector"
# Get the Linear token from the connector config
linear_token = connector.config.get("LINEAR_API_KEY")
if not linear_token:
return 0, "Linear API token not found in connector config"
# Initialize Linear client
linear_client = LinearConnector(token=linear_token)
# Calculate date range
end_date = datetime.now()
# Use last_indexed_at as start date if available, otherwise use 365 days ago
if connector.last_indexed_at:
# Convert dates to be comparable (both timezone-naive)
last_indexed_naive = connector.last_indexed_at.replace(tzinfo=None) if connector.last_indexed_at.tzinfo else connector.last_indexed_at
# Check if last_indexed_at is in the future or after end_date
if last_indexed_naive > end_date:
logger.warning(f"Last indexed date ({last_indexed_naive.strftime('%Y-%m-%d')}) is in the future. Using 30 days ago instead.")
start_date = end_date - timedelta(days=30)
else:
start_date = last_indexed_naive
logger.info(f"Using last_indexed_at ({start_date.strftime('%Y-%m-%d')}) as start date")
else:
start_date = end_date - timedelta(days=30) # Use 30 days instead of 365 to catch recent issues
logger.info(f"No last_indexed_at found, using {start_date.strftime('%Y-%m-%d')} (30 days ago) as start date")
# Format dates for Linear API
start_date_str = start_date.strftime("%Y-%m-%d")
end_date_str = end_date.strftime("%Y-%m-%d")
logger.info(f"Fetching Linear issues from {start_date_str} to {end_date_str}")
# Get issues within date range
try:
issues, error = linear_client.get_issues_by_date_range(
start_date=start_date_str,
end_date=end_date_str,
include_comments=True
)
if error:
logger.error(f"Failed to get Linear issues: {error}")
# Don't treat "No issues found" as an error that should stop indexing
if "No issues found" in error:
logger.info("No issues found is not a critical error, continuing with update")
if update_last_indexed:
connector.last_indexed_at = datetime.now()
await session.commit()
logger.info(f"Updated last_indexed_at to {connector.last_indexed_at} despite no issues found")
return 0, None
else:
return 0, f"Failed to get Linear issues: {error}"
logger.info(f"Retrieved {len(issues)} issues from Linear API")
except Exception as e:
logger.error(f"Exception when calling Linear API: {str(e)}", exc_info=True)
return 0, f"Failed to get Linear issues: {str(e)}"
if not issues:
logger.info("No Linear issues found for the specified date range")
if update_last_indexed:
connector.last_indexed_at = datetime.now()
await session.commit()
logger.info(f"Updated last_indexed_at to {connector.last_indexed_at} despite no issues found")
return 0, None # Return None instead of error message when no issues found
# Log issue IDs and titles for debugging
logger.info("Issues retrieved from Linear API:")
for idx, issue in enumerate(issues[:10]): # Log first 10 issues
logger.info(f" {idx+1}. {issue.get('identifier', 'Unknown')} - {issue.get('title', 'Unknown')} - Created: {issue.get('createdAt', 'Unknown')} - Updated: {issue.get('updatedAt', 'Unknown')}")
if len(issues) > 10:
logger.info(f" ...and {len(issues) - 10} more issues")
# Get existing documents for this search space and connector type to prevent duplicates
existing_docs_result = await session.execute(
select(Document)
.filter(
Document.search_space_id == search_space_id,
Document.document_type == DocumentType.LINEAR_CONNECTOR
)
)
existing_docs = existing_docs_result.scalars().all()
# Create a lookup dictionary of existing documents by issue_id
existing_docs_by_issue_id = {}
for doc in existing_docs:
if "issue_id" in doc.document_metadata:
existing_docs_by_issue_id[doc.document_metadata["issue_id"]] = doc
logger.info(f"Found {len(existing_docs_by_issue_id)} existing Linear documents in database")
# Log existing document IDs for debugging
if existing_docs_by_issue_id:
logger.info("Existing Linear document issue IDs in database:")
for idx, (issue_id, doc) in enumerate(list(existing_docs_by_issue_id.items())[:10]): # Log first 10
logger.info(f" {idx+1}. {issue_id} - {doc.document_metadata.get('issue_identifier', 'Unknown')} - {doc.document_metadata.get('issue_title', 'Unknown')}")
if len(existing_docs_by_issue_id) > 10:
logger.info(f" ...and {len(existing_docs_by_issue_id) - 10} more existing documents")
# Track the number of documents indexed
documents_indexed = 0
documents_updated = 0
documents_skipped = 0
skipped_issues = []
# Process each issue
for issue in issues:
try:
issue_id = issue.get("id")
issue_identifier = issue.get("identifier", "")
issue_title = issue.get("title", "")
if not issue_id or not issue_title:
logger.warning(f"Skipping issue with missing ID or title: {issue_id or 'Unknown'}")
skipped_issues.append(f"{issue_identifier or 'Unknown'} (missing data)")
documents_skipped += 1
continue
# Format the issue first to get well-structured data
formatted_issue = linear_client.format_issue(issue)
# Convert issue to markdown format
issue_content = linear_client.format_issue_to_markdown(formatted_issue)
if not issue_content:
logger.warning(f"Skipping issue with no content: {issue_identifier} - {issue_title}")
skipped_issues.append(f"{issue_identifier} (no content)")
documents_skipped += 1
continue
# Create a short summary for the embedding
# This avoids using the LLM and just uses the issue data directly
state = formatted_issue.get("state", "Unknown")
description = formatted_issue.get("description", "")
# Truncate description if it's too long for the summary
if description and len(description) > 500:
description = description[:497] + "..."
# Create a simple summary from the issue data
summary_content = f"Linear Issue {issue_identifier}: {issue_title}\n\nStatus: {state}\n\n"
if description:
summary_content += f"Description: {description}\n\n"
# Add comment count
comment_count = len(formatted_issue.get("comments", []))
summary_content += f"Comments: {comment_count}"
# Generate embedding for the summary
summary_embedding = config.embedding_model_instance.embed(summary_content)
# Process chunks - using the full issue content with comments
chunks = [
Chunk(content=chunk.text, embedding=chunk.embedding)
for chunk in config.chunker_instance.chunk(issue_content)
]
# Check if this issue already exists in our database
existing_document = existing_docs_by_issue_id.get(issue_id)
if existing_document:
# Update existing document instead of creating a new one
logger.info(f"Updating existing document for issue {issue_identifier} - {issue_title}")
# Update document fields
existing_document.title = f"Linear - {issue_identifier}: {issue_title}"
existing_document.document_metadata = {
"issue_id": issue_id,
"issue_identifier": issue_identifier,
"issue_title": issue_title,
"state": state,
"comment_count": comment_count,
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
"last_updated": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
}
existing_document.content = summary_content
existing_document.embedding = summary_embedding
# Delete existing chunks and add new ones
await session.execute(
delete(Chunk)
.where(Chunk.document_id == existing_document.id)
)
# Assign new chunks to existing document
for chunk in chunks:
chunk.document_id = existing_document.id
session.add(chunk)
documents_updated += 1
else:
# Create and store new document
logger.info(f"Creating new document for issue {issue_identifier} - {issue_title}")
document = Document(
search_space_id=search_space_id,
title=f"Linear - {issue_identifier}: {issue_title}",
document_type=DocumentType.LINEAR_CONNECTOR,
document_metadata={
"issue_id": issue_id,
"issue_identifier": issue_identifier,
"issue_title": issue_title,
"state": state,
"comment_count": comment_count,
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
},
content=summary_content,
embedding=summary_embedding,
chunks=chunks
)
session.add(document)
documents_indexed += 1
logger.info(f"Successfully indexed new issue {issue_identifier} - {issue_title}")
except Exception as e:
logger.error(f"Error processing issue {issue.get('identifier', 'Unknown')}: {str(e)}", exc_info=True)
skipped_issues.append(f"{issue.get('identifier', 'Unknown')} (processing error)")
documents_skipped += 1
continue # Skip this issue and continue with others
# Update the last_indexed_at timestamp for the connector only if requested
total_processed = documents_indexed + documents_updated
if update_last_indexed:
connector.last_indexed_at = datetime.now()
logger.info(f"Updated last_indexed_at to {connector.last_indexed_at}")
# Commit all changes
await session.commit()
logger.info(f"Successfully committed all Linear document changes to database")
logger.info(f"Linear indexing completed: {documents_indexed} new issues, {documents_updated} updated, {documents_skipped} skipped")
return total_processed, None # Return None as the error message to indicate success
except SQLAlchemyError as db_error:
await session.rollback()
logger.error(f"Database error: {str(db_error)}", exc_info=True)
return 0, f"Database error: {str(db_error)}"
except Exception as e:
await session.rollback()
logger.error(f"Failed to index Linear issues: {str(e)}", exc_info=True)
return 0, f"Failed to index Linear issues: {str(e)}"

View file

@ -1,20 +1,15 @@
import json
from sqlalchemy.ext.asyncio import AsyncSession
from typing import List, AsyncGenerator, Dict, Any
import asyncio
import re
from typing import AsyncGenerator, List, Union
from uuid import UUID
from app.utils.connector_service import ConnectorService
from app.utils.research_service import ResearchService
from app.agents.researcher.graph import graph as researcher_graph
from app.agents.researcher.state import State
from app.utils.streaming_service import StreamingService
from app.utils.reranker_service import RerankerService
from app.utils.query_service import QueryService
from app.config import config
from app.utils.document_converters import convert_chunks_to_langchain_documents
from sqlalchemy.ext.asyncio import AsyncSession
async def stream_connector_search_results(
user_query: str,
user_id: int,
user_id: Union[str, UUID],
search_space_id: int,
session: AsyncSession,
research_mode: str,
@ -25,7 +20,7 @@ async def stream_connector_search_results(
Args:
user_query: The user's query
user_id: The user's ID
user_id: The user's ID (can be UUID object or string)
search_space_id: The search space ID
session: The database session
research_mode: The research mode
@ -34,365 +29,45 @@ async def stream_connector_search_results(
Yields:
str: Formatted response strings
"""
# Initialize services
connector_service = ConnectorService(session)
streaming_service = StreamingService()
# Reformulate the user query using the strategic LLM
yield streaming_service.add_terminal_message("Reformulating your query for better results...", "info")
reformulated_query = await QueryService.reformulate_query(user_query)
yield streaming_service.add_terminal_message(f"Searching for: {reformulated_query}", "success")
reranker_service = RerankerService.get_reranker_instance(config)
all_raw_documents = [] # Store all raw documents before reranking
all_sources = []
TOP_K = 20
if research_mode == "GENERAL":
TOP_K = 20
NUM_SECTIONS = 1
elif research_mode == "DEEP":
TOP_K = 40
NUM_SECTIONS = 3
elif research_mode == "DEEPER":
TOP_K = 60
NUM_SECTIONS = 6
# Process each selected connector
for connector in selected_connectors:
if connector == "YOUTUBE_VIDEO":
# Send terminal message about starting search
yield streaming_service.add_terminal_message("Starting to search for youtube videos...")
# Search for YouTube videos using reformulated query
result_object, youtube_chunks = await connector_service.search_youtube(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=TOP_K
)
# Send terminal message about search results
yield streaming_service.add_terminal_message(
f"Found {len(result_object['sources'])} relevant YouTube videos",
"success"
)
# Update sources
all_sources.append(result_object)
yield streaming_service.update_sources(all_sources)
# Add documents to collection
all_raw_documents.extend(youtube_chunks)
# Extension Docs
if connector == "EXTENSION":
# Send terminal message about starting search
yield streaming_service.add_terminal_message("Starting to search for extension...")
# Search for crawled URLs using reformulated query
result_object, extension_chunks = await connector_service.search_extension(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=TOP_K
)
# Send terminal message about search results
yield streaming_service.add_terminal_message(
f"Found {len(result_object['sources'])} relevant extension documents",
"success"
)
# Update sources
all_sources.append(result_object)
yield streaming_service.update_sources(all_sources)
# Add documents to collection
all_raw_documents.extend(extension_chunks)
# Crawled URLs
if connector == "CRAWLED_URL":
# Send terminal message about starting search
yield streaming_service.add_terminal_message("Starting to search for crawled URLs...")
# Search for crawled URLs using reformulated query
result_object, crawled_urls_chunks = await connector_service.search_crawled_urls(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=TOP_K
)
# Send terminal message about search results
yield streaming_service.add_terminal_message(
f"Found {len(result_object['sources'])} relevant crawled URLs",
"success"
)
# Update sources
all_sources.append(result_object)
yield streaming_service.update_sources(all_sources)
# Add documents to collection
all_raw_documents.extend(crawled_urls_chunks)
# Files
if connector == "FILE":
# Send terminal message about starting search
yield streaming_service.add_terminal_message("Starting to search for files...")
# Search for files using reformulated query
result_object, files_chunks = await connector_service.search_files(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=TOP_K
)
# Send terminal message about search results
yield streaming_service.add_terminal_message(
f"Found {len(result_object['sources'])} relevant files",
"success"
)
# Update sources
all_sources.append(result_object)
yield streaming_service.update_sources(all_sources)
# Add documents to collection
all_raw_documents.extend(files_chunks)
# Tavily Connector
if connector == "TAVILY_API":
# Send terminal message about starting search
yield streaming_service.add_terminal_message("Starting to search with Tavily API...")
# Search using Tavily API with reformulated query
result_object, tavily_chunks = await connector_service.search_tavily(
user_query=reformulated_query,
user_id=user_id,
top_k=TOP_K
)
# Send terminal message about search results
yield streaming_service.add_terminal_message(
f"Found {len(result_object['sources'])} relevant results from Tavily",
"success"
)
# Update sources
all_sources.append(result_object)
yield streaming_service.update_sources(all_sources)
# Add documents to collection
all_raw_documents.extend(tavily_chunks)
# Slack Connector
if connector == "SLACK_CONNECTOR":
# Send terminal message about starting search
yield streaming_service.add_terminal_message("Starting to search for slack connector...")
# Search using Slack API with reformulated query
result_object, slack_chunks = await connector_service.search_slack(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=TOP_K
)
# Send terminal message about search results
yield streaming_service.add_terminal_message(
f"Found {len(result_object['sources'])} relevant results from Slack",
"success"
)
# Update sources
all_sources.append(result_object)
yield streaming_service.update_sources(all_sources)
# Add documents to collection
all_raw_documents.extend(slack_chunks)
# Notion Connector
if connector == "NOTION_CONNECTOR":
# Send terminal message about starting search
yield streaming_service.add_terminal_message("Starting to search for notion connector...")
# Search using Notion API with reformulated query
result_object, notion_chunks = await connector_service.search_notion(
user_query=reformulated_query,
user_id=user_id,
search_space_id=search_space_id,
top_k=TOP_K
)
# Send terminal message about search results
yield streaming_service.add_terminal_message(
f"Found {len(result_object['sources'])} relevant results from Notion",
"success"
)
# Update sources
all_sources.append(result_object)
yield streaming_service.update_sources(all_sources)
# Add documents to collection
all_raw_documents.extend(notion_chunks)
# Convert UUID to string if needed
user_id_str = str(user_id) if isinstance(user_id, UUID) else user_id
# If we have documents to research
if all_raw_documents:
# Rerank all documents if reranker is available
if reranker_service:
yield streaming_service.add_terminal_message("Reranking documents for better relevance...", "info")
# Convert documents to format expected by reranker
reranker_input_docs = [
{
"chunk_id": doc.get("chunk_id", f"chunk_{i}"),
"content": doc.get("content", ""),
"score": doc.get("score", 0.0),
"document": {
"id": doc.get("document", {}).get("id", ""),
"title": doc.get("document", {}).get("title", ""),
"document_type": doc.get("document", {}).get("document_type", ""),
"metadata": doc.get("document", {}).get("metadata", {})
}
} for i, doc in enumerate(all_raw_documents)
]
# Rerank documents using the reformulated query
reranked_docs = reranker_service.rerank_documents(reformulated_query, reranker_input_docs)
# Sort by score in descending order
reranked_docs.sort(key=lambda x: x.get("score", 0), reverse=True)
# Convert back to langchain documents format
from langchain.schema import Document as LangchainDocument
all_langchain_documents_to_research = [
LangchainDocument(
page_content= f"""<document><metadata><source_id>{doc.get("document", {}).get("id", "")}</source_id></metadata><content>{doc.get("content", "")}</content></document>""",
metadata={
# **doc.get("document", {}).get("metadata", {}),
# "score": doc.get("score", 0.0),
# "rank": doc.get("rank", 0),
# "document_id": doc.get("document", {}).get("id", ""),
# "document_title": doc.get("document", {}).get("title", ""),
# "document_type": doc.get("document", {}).get("document_type", ""),
# # Explicitly set source_id for citation purposes
"source_id": str(doc.get("document", {}).get("id", ""))
}
) for doc in reranked_docs
]
yield streaming_service.add_terminal_message(f"Reranked {len(all_langchain_documents_to_research)} documents", "success")
else:
# Use raw documents if no reranker is available
all_langchain_documents_to_research = convert_chunks_to_langchain_documents(all_raw_documents)
# Send terminal message about starting research
yield streaming_service.add_terminal_message("Starting to research...", "info")
# Create a buffer to collect report content
report_buffer = []
# Use the streaming research method
yield streaming_service.add_terminal_message("Generating report...", "info")
# Create a wrapper to handle the streaming
class StreamHandler:
def __init__(self):
self.queue = asyncio.Queue()
async def handle_progress(self, data):
result = None
if data.get("type") == "logs":
# Handle log messages
result = streaming_service.add_terminal_message(data.get("output", ""), "info")
elif data.get("type") == "report":
# Handle report content
content = data.get("output", "")
# Fix incorrect citation formats using regex
# More specific pattern to match only numeric citations in markdown-style links
# This matches patterns like ([1](https://github.com/...)) but not general links like ([Click here](https://...))
pattern = r'\(\[(\d+)\]\((https?://[^\)]+)\)\)'
# Replace with just [X] where X is the number
content = re.sub(pattern, r'[\1]', content)
# Also match other incorrect formats like ([1]) and convert to [1]
# Only match if the content inside brackets is a number
content = re.sub(r'\(\[(\d+)\]\)', r'[\1]', content)
report_buffer.append(content)
# Update the answer with the accumulated content
result = streaming_service.update_answer(report_buffer)
if result:
await self.queue.put(result)
return result
async def get_next(self):
try:
return await self.queue.get()
except Exception as e:
print(f"Error getting next item from queue: {e}")
return None
def task_done(self):
self.queue.task_done()
# Create the stream handler
stream_handler = StreamHandler()
# Start the research process in a separate task
research_task = asyncio.create_task(
ResearchService.stream_research(
user_query=reformulated_query,
documents=all_langchain_documents_to_research,
on_progress=stream_handler.handle_progress,
research_mode=research_mode
)
)
# Stream results as they become available
while not research_task.done() or not stream_handler.queue.empty():
try:
# Get the next result with a timeout
result = await asyncio.wait_for(stream_handler.get_next(), timeout=0.1)
stream_handler.task_done()
yield result
except asyncio.TimeoutError:
# No result available yet, check if the research task is done
if research_task.done():
# If the queue is empty and the task is done, we're finished
if stream_handler.queue.empty():
break
# Get the final report
try:
final_report = await research_task
# Send terminal message about research completion
yield streaming_service.add_terminal_message("Research completed", "success")
# Update the answer with the final report
final_report_lines = final_report.split('\n')
yield streaming_service.update_answer(final_report_lines)
except Exception as e:
# Handle any exceptions
yield streaming_service.add_terminal_message(f"Error during research: {str(e)}", "error")
# Send completion message
yield streaming_service.format_completion()
# Sample configuration
config = {
"configurable": {
"user_query": user_query,
"num_sections": NUM_SECTIONS,
"connectors_to_search": selected_connectors,
"user_id": user_id_str,
"search_space_id": search_space_id
}
}
# Initialize state with database session and streaming service
initial_state = State(
db_session=session,
streaming_service=streaming_service
)
# Run the graph directly
print("\nRunning the complete researcher workflow...")
# Use streaming with config parameter
async for chunk in researcher_graph.astream(
initial_state,
config=config,
stream_mode="custom",
):
# If the chunk contains a 'yeild_value' key, print its value
# Note: there's a typo in 'yeild_value' in the code, but we need to match it
if isinstance(chunk, dict) and 'yeild_value' in chunk:
yield chunk['yeild_value']
yield streaming_service.format_completion()

View file

@ -13,7 +13,7 @@ class ConnectorService:
self.retriever = ChucksHybridSearchRetriever(session)
self.source_id_counter = 1
async def search_crawled_urls(self, user_query: str, user_id: int, search_space_id: int, top_k: int = 20) -> tuple:
async def search_crawled_urls(self, user_query: str, user_id: str, search_space_id: int, top_k: int = 20) -> tuple:
"""
Search for crawled URLs and return both the source information and langchain documents
@ -28,16 +28,16 @@ class ConnectorService:
document_type="CRAWLED_URL"
)
# Map crawled_urls_chunks to the required format
mapped_sources = {}
# Process each chunk and create sources directly without deduplication
sources_list = []
for i, chunk in enumerate(crawled_urls_chunks):
#Fix for UI
# Fix for UI
crawled_urls_chunks[i]['document']['id'] = self.source_id_counter
# Extract document metadata
document = chunk.get('document', {})
metadata = document.get('metadata', {})
# Create a mapped source entry
# Create a source entry
source = {
"id": self.source_id_counter,
"title": document.get('title', 'Untitled Document'),
@ -46,14 +46,7 @@ class ConnectorService:
}
self.source_id_counter += 1
# Use a unique identifier for tracking unique sources
source_key = source.get("url") or source.get("title")
if source_key and source_key not in mapped_sources:
mapped_sources[source_key] = source
# Convert to list of sources
sources_list = list(mapped_sources.values())
sources_list.append(source)
# Create result object
result_object = {
@ -63,10 +56,9 @@ class ConnectorService:
"sources": sources_list,
}
return result_object, crawled_urls_chunks
async def search_files(self, user_query: str, user_id: int, search_space_id: int, top_k: int = 20) -> tuple:
async def search_files(self, user_query: str, user_id: str, search_space_id: int, top_k: int = 20) -> tuple:
"""
Search for files and return both the source information and langchain documents
@ -81,16 +73,16 @@ class ConnectorService:
document_type="FILE"
)
# Map crawled_urls_chunks to the required format
mapped_sources = {}
# Process each chunk and create sources directly without deduplication
sources_list = []
for i, chunk in enumerate(files_chunks):
#Fix for UI
# Fix for UI
files_chunks[i]['document']['id'] = self.source_id_counter
# Extract document metadata
document = chunk.get('document', {})
metadata = document.get('metadata', {})
# Create a mapped source entry
# Create a source entry
source = {
"id": self.source_id_counter,
"title": document.get('title', 'Untitled Document'),
@ -99,14 +91,7 @@ class ConnectorService:
}
self.source_id_counter += 1
# Use a unique identifier for tracking unique sources
source_key = source.get("url") or source.get("title")
if source_key and source_key not in mapped_sources:
mapped_sources[source_key] = source
# Convert to list of sources
sources_list = list(mapped_sources.values())
sources_list.append(source)
# Create result object
result_object = {
@ -118,7 +103,7 @@ class ConnectorService:
return result_object, files_chunks
async def get_connector_by_type(self, user_id: int, connector_type: SearchSourceConnectorType) -> Optional[SearchSourceConnector]:
async def get_connector_by_type(self, user_id: str, connector_type: SearchSourceConnectorType) -> Optional[SearchSourceConnector]:
"""
Get a connector by type for a specific user
@ -138,7 +123,7 @@ class ConnectorService:
)
return result.scalars().first()
async def search_tavily(self, user_query: str, user_id: int, top_k: int = 20) -> tuple:
async def search_tavily(self, user_query: str, user_id: str, top_k: int = 20) -> tuple:
"""
Search using Tavily API and return both the source information and documents
@ -177,13 +162,10 @@ class ConnectorService:
# Extract results from Tavily response
tavily_results = response.get("results", [])
# Map Tavily results to the required format
# Process each result and create sources directly without deduplication
sources_list = []
documents = []
# Start IDs from 1000 to avoid conflicts with other connectors
base_id = 100
for i, result in enumerate(tavily_results):
# Create a source entry
@ -234,7 +216,7 @@ class ConnectorService:
"sources": [],
}, []
async def search_slack(self, user_query: str, user_id: int, search_space_id: int, top_k: int = 20) -> tuple:
async def search_slack(self, user_query: str, user_id: str, search_space_id: int, top_k: int = 20) -> tuple:
"""
Search for slack and return both the source information and langchain documents
@ -249,10 +231,10 @@ class ConnectorService:
document_type="SLACK_CONNECTOR"
)
# Map slack_chunks to the required format
mapped_sources = {}
# Process each chunk and create sources directly without deduplication
sources_list = []
for i, chunk in enumerate(slack_chunks):
#Fix for UI
# Fix for UI
slack_chunks[i]['document']['id'] = self.source_id_counter
# Extract document metadata
document = chunk.get('document', {})
@ -286,14 +268,7 @@ class ConnectorService:
}
self.source_id_counter += 1
# Use channel_id and content as a unique identifier for tracking unique sources
source_key = f"{channel_id}_{chunk.get('chunk_id', i)}"
if source_key and source_key not in mapped_sources:
mapped_sources[source_key] = source
# Convert to list of sources
sources_list = list(mapped_sources.values())
sources_list.append(source)
# Create result object
result_object = {
@ -305,7 +280,7 @@ class ConnectorService:
return result_object, slack_chunks
async def search_notion(self, user_query: str, user_id: int, search_space_id: int, top_k: int = 20) -> tuple:
async def search_notion(self, user_query: str, user_id: str, search_space_id: int, top_k: int = 20) -> tuple:
"""
Search for Notion pages and return both the source information and langchain documents
@ -326,8 +301,8 @@ class ConnectorService:
document_type="NOTION_CONNECTOR"
)
# Map notion_chunks to the required format
mapped_sources = {}
# Process each chunk and create sources directly without deduplication
sources_list = []
for i, chunk in enumerate(notion_chunks):
# Fix for UI
notion_chunks[i]['document']['id'] = self.source_id_counter
@ -365,14 +340,7 @@ class ConnectorService:
}
self.source_id_counter += 1
# Use page_id and content as a unique identifier for tracking unique sources
source_key = f"{page_id}_{chunk.get('chunk_id', i)}"
if source_key and source_key not in mapped_sources:
mapped_sources[source_key] = source
# Convert to list of sources
sources_list = list(mapped_sources.values())
sources_list.append(source)
# Create result object
result_object = {
@ -384,7 +352,7 @@ class ConnectorService:
return result_object, notion_chunks
async def search_extension(self, user_query: str, user_id: int, search_space_id: int, top_k: int = 20) -> tuple:
async def search_extension(self, user_query: str, user_id: str, search_space_id: int, top_k: int = 20) -> tuple:
"""
Search for extension data and return both the source information and langchain documents
@ -405,8 +373,8 @@ class ConnectorService:
document_type="EXTENSION"
)
# Map extension_chunks to the required format
mapped_sources = {}
# Process each chunk and create sources directly without deduplication
sources_list = []
for i, chunk in enumerate(extension_chunks):
# Fix for UI
extension_chunks[i]['document']['id'] = self.source_id_counter
@ -462,14 +430,7 @@ class ConnectorService:
}
self.source_id_counter += 1
# Use URL and timestamp as a unique identifier for tracking unique sources
source_key = f"{webpage_url}_{visit_date}"
if source_key and source_key not in mapped_sources:
mapped_sources[source_key] = source
# Convert to list of sources
sources_list = list(mapped_sources.values())
sources_list.append(source)
# Create result object
result_object = {
@ -481,7 +442,7 @@ class ConnectorService:
return result_object, extension_chunks
async def search_youtube(self, user_query: str, user_id: int, search_space_id: int, top_k: int = 20) -> tuple:
async def search_youtube(self, user_query: str, user_id: str, search_space_id: int, top_k: int = 20) -> tuple:
"""
Search for YouTube videos and return both the source information and langchain documents
@ -502,8 +463,8 @@ class ConnectorService:
document_type="YOUTUBE_VIDEO"
)
# Map youtube_chunks to the required format
mapped_sources = {}
# Process each chunk and create sources directly without deduplication
sources_list = []
for i, chunk in enumerate(youtube_chunks):
# Fix for UI
youtube_chunks[i]['document']['id'] = self.source_id_counter
@ -541,21 +502,144 @@ class ConnectorService:
}
self.source_id_counter += 1
# Use video_id as a unique identifier for tracking unique sources
source_key = video_id or f"youtube_{i}"
if source_key and source_key not in mapped_sources:
mapped_sources[source_key] = source
# Convert to list of sources
sources_list = list(mapped_sources.values())
sources_list.append(source)
# Create result object
result_object = {
"id": 6, # Assign a unique ID for the YouTube connector
"id": 7, # Assign a unique ID for the YouTube connector
"name": "YouTube Videos",
"type": "YOUTUBE_VIDEO",
"sources": sources_list,
}
return result_object, youtube_chunks
return result_object, youtube_chunks
async def search_github(self, user_query: str, user_id: int, search_space_id: int, top_k: int = 20) -> tuple:
"""
Search for GitHub documents and return both the source information and langchain documents
Returns:
tuple: (sources_info, langchain_documents)
"""
github_chunks = await self.retriever.hybrid_search(
query_text=user_query,
top_k=top_k,
user_id=user_id,
search_space_id=search_space_id,
document_type="GITHUB_CONNECTOR"
)
# Process each chunk and create sources directly without deduplication
sources_list = []
for i, chunk in enumerate(github_chunks):
# Fix for UI - assign a unique ID for citation/source tracking
github_chunks[i]['document']['id'] = self.source_id_counter
# Extract document metadata
document = chunk.get('document', {})
metadata = document.get('metadata', {})
# Create a source entry
source = {
"id": self.source_id_counter,
"title": document.get('title', 'GitHub Document'), # Use specific title if available
"description": metadata.get('description', chunk.get('content', '')[:100]), # Use description or content preview
"url": metadata.get('url', '') # Use URL if available in metadata
}
self.source_id_counter += 1
sources_list.append(source)
# Create result object
result_object = {
"id": 8,
"name": "GitHub",
"type": "GITHUB_CONNECTOR",
"sources": sources_list,
}
return result_object, github_chunks
async def search_linear(self, user_query: str, user_id: str, search_space_id: int, top_k: int = 20) -> tuple:
"""
Search for Linear issues and comments and return both the source information and langchain documents
Args:
user_query: The user's query
user_id: The user's ID
search_space_id: The search space ID to search in
top_k: Maximum number of results to return
Returns:
tuple: (sources_info, langchain_documents)
"""
linear_chunks = await self.retriever.hybrid_search(
query_text=user_query,
top_k=top_k,
user_id=user_id,
search_space_id=search_space_id,
document_type="LINEAR_CONNECTOR"
)
# Process each chunk and create sources directly without deduplication
sources_list = []
for i, chunk in enumerate(linear_chunks):
# Fix for UI
linear_chunks[i]['document']['id'] = self.source_id_counter
# Extract document metadata
document = chunk.get('document', {})
metadata = document.get('metadata', {})
# Extract Linear-specific metadata
issue_identifier = metadata.get('issue_identifier', '')
issue_title = metadata.get('issue_title', 'Untitled Issue')
issue_state = metadata.get('state', '')
comment_count = metadata.get('comment_count', 0)
# Create a more descriptive title for Linear issues
title = f"Linear: {issue_identifier} - {issue_title}"
if issue_state:
title += f" ({issue_state})"
# Create a more descriptive description for Linear issues
description = chunk.get('content', '')[:100]
if len(description) == 100:
description += "..."
# Add comment count info to description
if comment_count:
if description:
description += f" | Comments: {comment_count}"
else:
description = f"Comments: {comment_count}"
# For URL, we could construct a URL to the Linear issue if we have the workspace info
# For now, use a generic placeholder
url = ""
if issue_identifier:
# This is a generic format, may need to be adjusted based on actual Linear workspace
url = f"https://linear.app/issue/{issue_identifier}"
source = {
"id": self.source_id_counter,
"title": title,
"description": description,
"url": url,
"issue_identifier": issue_identifier,
"state": issue_state,
"comment_count": comment_count
}
self.source_id_counter += 1
sources_list.append(source)
# Create result object
result_object = {
"id": 9, # Assign a unique ID for the Linear connector
"name": "Linear Issues",
"type": "LINEAR_CONNECTOR",
"sources": sources_list,
}
return result_object, linear_chunks

View file

@ -1,5 +1,7 @@
from typing import Dict, Any
from langchain.schema import LLMResult, HumanMessage, SystemMessage
"""
NOTE: This is not used anymore. Might be removed in the future.
"""
from langchain.schema import HumanMessage, SystemMessage
from app.config import config
class QueryService:

View file

@ -1,211 +0,0 @@
import asyncio
import re
from typing import List, Dict, Any, AsyncGenerator, Callable, Optional
from langchain.schema import Document
from gpt_researcher.agent import GPTResearcher
from gpt_researcher.utils.enum import ReportType, Tone, ReportSource
from dotenv import load_dotenv
load_dotenv()
class ResearchService:
@staticmethod
async def create_custom_prompt(user_query: str) -> str:
citation_prompt = f"""
You are a research assistant tasked with analyzing documents and providing comprehensive answers with proper citations in IEEE format.
<instructions>
1. Carefully analyze all provided documents in the <document> section's.
2. Extract relevant information that addresses the user's query.
3. Synthesize a comprehensive, well-structured answer using information from these documents.
4. For EVERY piece of information you include from the documents, add an IEEE-style citation in square brackets [X] where X is the source_id from the document's metadata.
5. Make sure ALL factual statements from the documents have proper citations.
6. If multiple documents support the same point, include all relevant citations [X], [Y].
7. Present information in a logical, coherent flow.
8. Use your own words to connect ideas, but cite ALL information from the documents.
9. If documents contain conflicting information, acknowledge this and present both perspectives with appropriate citations.
10. Do not make up or include information not found in the provided documents.
11. CRITICAL: You MUST use the exact source_id value from each document's metadata for citations. Do not create your own citation numbers.
12. CRITICAL: Every citation MUST be in the IEEE format [X] where X is the exact source_id value.
13. CRITICAL: Never renumber or reorder citations - always use the original source_id values.
14. CRITICAL: Do not return citations as clickable links.
15. CRITICAL: Never format citations as markdown links like "([1](https://example.com))". Always use plain square brackets only.
16. CRITICAL: Citations must ONLY appear as [X] or [X], [Y], [Z] format - never with parentheses, hyperlinks, or other formatting.
17. CRITICAL: Never make up citation numbers. Only use source_id values that are explicitly provided in the document metadata.
18. CRITICAL: If you are unsure about a source_id, do not include a citation rather than guessing or making one up.
</instructions>
<format>
- Write in clear, professional language suitable for academic or technical audiences
- Organize your response with appropriate paragraphs, headings, and structure
- Every fact from the documents must have an IEEE-style citation in square brackets [X] where X is the EXACT source_id from the document's metadata
- Citations should appear at the end of the sentence containing the information they support
- Multiple citations should be separated by commas: [X], [Y], [Z]
- No need to return references section. Just citation numbers in answer.
- NEVER create your own citation numbering system - use the exact source_id values from the documents.
- NEVER format citations as clickable links or as markdown links like "([1](https://example.com))". Always use plain square brackets only.
- NEVER make up citation numbers if you are unsure about the source_id. It is better to omit the citation than to guess.
</format>
<input_example>
<document>
<metadata>
<source_id>1</source_id>
</metadata>
<content>
<text>
The Great Barrier Reef is the world's largest coral reef system, stretching over 2,300 kilometers along the coast of Queensland, Australia. It comprises over 2,900 individual reefs and 900 islands.
</text>
</content>
</document>
<document>
<metadata>
<source_id>13</source_id>
</metadata>
<content>
<text>
Climate change poses a significant threat to coral reefs worldwide. Rising ocean temperatures have led to mass coral bleaching events in the Great Barrier Reef in 2016, 2017, and 2020.
</text>
</content>
</document>
<document>
<metadata>
<source_id>21</source_id>
</metadata>
<content>
<text>
The Great Barrier Reef was designated a UNESCO World Heritage Site in 1981 due to its outstanding universal value and biological diversity. It is home to over 1,500 species of fish and 400 types of coral.
</text>
</content>
</document>
</input_example>
<output_example>
The Great Barrier Reef is the world's largest coral reef system, stretching over 2,300 kilometers along the coast of Queensland, Australia [1]. It was designated a UNESCO World Heritage Site in 1981 due to its outstanding universal value and biological diversity [21]. The reef is home to over 1,500 species of fish and 400 types of coral [21]. Unfortunately, climate change poses a significant threat to coral reefs worldwide, with rising ocean temperatures leading to mass coral bleaching events in the Great Barrier Reef in 2016, 2017, and 2020 [13]. The reef system comprises over 2,900 individual reefs and 900 islands [1], making it an ecological treasure that requires protection from multiple threats [1], [13].
</output_example>
<incorrect_citation_formats>
DO NOT use any of these incorrect citation formats:
- Using parentheses and markdown links: ([1](https://github.com/MODSetter/SurfSense))
- Using parentheses around brackets: ([1])
- Using hyperlinked text: [link to source 1](https://example.com)
- Using footnote style: ... reef system¹
- Making up citation numbers when source_id is unknown
ONLY use plain square brackets [1] or multiple citations [1], [2], [3]
</incorrect_citation_formats>
Note that the citation numbers match exactly with the source_id values (1, 13, and 21) and are not renumbered sequentially. Citations follow IEEE style with square brackets and appear at the end of sentences.
Now, please research the following query:
<user_query_to_research>
{user_query}
</user_query_to_research>
"""
return citation_prompt
@staticmethod
async def stream_research(
user_query: str,
documents: List[Document] = None,
on_progress: Optional[Callable] = None,
research_mode: str = "GENERAL"
) -> str:
"""
Stream the research process using GPTResearcher
Args:
user_query: The user's query
documents: List of Document objects to use for research
on_progress: Optional callback for progress updates
research_mode: Research mode to use
Returns:
str: The final research report
"""
# Create a custom websocket-like object to capture streaming output
class StreamingWebsocket:
async def send_json(self, data):
if on_progress:
try:
# Filter out excessive logging of the prompt
if data.get("type") == "logs":
output = data.get("output", "")
# Check if this is a verbose prompt log
if "You are a research assistant tasked with analyzing documents" in output and len(output) > 500:
# Replace with a shorter message
data["output"] = f"Processing research for query: {user_query}"
result = await on_progress(data)
return result
except Exception as e:
print(f"Error in on_progress callback: {e}")
return None
streaming_websocket = StreamingWebsocket()
custom_prompt_for_ieee_citations = await ResearchService.create_custom_prompt(user_query)
if(research_mode == "GENERAL"):
research_report_type = ReportType.CustomReport.value
elif(research_mode == "DEEP"):
research_report_type = ReportType.ResearchReport.value
elif(research_mode == "DEEPER"):
research_report_type = ReportType.DetailedReport.value
# elif(research_mode == "DEEPEST"):
# research_report_type = ReportType.DeepResearch.value
# Initialize GPTResearcher with the streaming websocket
researcher = GPTResearcher(
query=custom_prompt_for_ieee_citations,
report_type=research_report_type,
report_format="IEEE",
report_source=ReportSource.LangChainDocuments.value,
tone=Tone.Formal,
documents=documents,
verbose=True,
websocket=streaming_websocket
)
# Conduct research
await researcher.conduct_research()
# Generate report with streaming
report = await researcher.write_report()
# Fix citation format
report = ResearchService.fix_citation_format(report)
return report
@staticmethod
def fix_citation_format(text: str) -> str:
"""
Fix any incorrectly formatted citations in the text.
Args:
text: The text to fix
Returns:
str: The text with fixed citations
"""
if not text:
return text
# More specific pattern to match only numeric citations in markdown-style links
# This matches patterns like ([1](https://github.com/...)) but not general links like ([Click here](https://...))
pattern = r'\(\[(\d+)\]\((https?://[^\)]+)\)\)'
# Replace with just [X] where X is the number
text = re.sub(pattern, r'[\1]', text)
# Also match other incorrect formats like ([1]) and convert to [1]
# Only match if the content inside brackets is a number
text = re.sub(r'\(\[(\d+)\]\)', r'[\1]', text)
return text

View file

@ -1,5 +1,6 @@
import json
from typing import List, Dict, Any, Generator
from typing import Any, Dict, List
class StreamingService:
def __init__(self):
@ -18,55 +19,7 @@ class StreamingService:
"content": []
}
]
def add_terminal_message(self, text: str, message_type: str = "info") -> str:
"""
Add a terminal message to the annotations and return the formatted response
Args:
text: The message text
message_type: The message type (info, success, error)
Returns:
str: The formatted response string
"""
self.message_annotations[0]["content"].append({
"id": self.terminal_idx,
"text": text,
"type": message_type
})
self.terminal_idx += 1
return self._format_annotations()
def update_sources(self, sources: List[Dict[str, Any]]) -> str:
"""
Update the sources in the annotations and return the formatted response
Args:
sources: List of source objects
Returns:
str: The formatted response string
"""
self.message_annotations[1]["content"] = sources
return self._format_annotations()
def update_answer(self, answer_content: List[str]) -> str:
"""
Update the answer in the annotations and return the formatted response
Args:
answer_content: The answer content as a list of strings
Returns:
str: The formatted response string
"""
self.message_annotations[2] = {
"type": "ANSWER",
"content": answer_content
}
return self._format_annotations()
# It is used to send annotations to the frontend
def _format_annotations(self) -> str:
"""
Format the annotations as a string
@ -76,6 +29,7 @@ class StreamingService:
"""
return f'8:{json.dumps(self.message_annotations)}\n'
# It is used to end Streaming
def format_completion(self, prompt_tokens: int = 156, completion_tokens: int = 204) -> str:
"""
Format a completion message
@ -96,4 +50,23 @@ class StreamingService:
"totalTokens": total_tokens
}
}
return f'd:{json.dumps(completion_data)}\n'
return f'd:{json.dumps(completion_data)}\n'
def only_update_terminal(self, text: str, message_type: str = "info") -> str:
self.message_annotations[0]["content"].append({
"id": self.terminal_idx,
"text": text,
"type": message_type
})
self.terminal_idx += 1
return self.message_annotations
def only_update_sources(self, sources: List[Dict[str, Any]]) -> str:
self.message_annotations[1]["content"] = sources
return self.message_annotations
def only_update_answer(self, answer: List[str]) -> str:
self.message_annotations[2]["content"] = answer
return self.message_annotations

View file

@ -0,0 +1,5 @@
from app.agents.researcher.graph import graph as researcher_graph
from app.agents.researcher.sub_section_writer.graph import graph as sub_section_writer_graph
print(researcher_graph.get_graph().draw_mermaid())
print(sub_section_writer_graph.get_graph().draw_mermaid())

View file

@ -1,5 +1,12 @@
import uvicorn
import argparse
import logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Run the SurfSense application')

View file

@ -1,18 +1,20 @@
[project]
name = "surf-new-backend"
version = "0.1.0"
description = "Add your description here"
version = "0.0.6"
description = "SurfSense Backend"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
"alembic>=1.13.0",
"asyncpg>=0.30.0",
"chonkie[all]>=0.4.1",
"fastapi>=0.115.8",
"fastapi-users[oauth,sqlalchemy]>=14.0.1",
"firecrawl-py>=1.12.0",
"gpt-researcher>=0.12.12",
"github3.py==4.0.1",
"langchain-community>=0.3.17",
"langchain-unstructured>=0.1.6",
"langgraph>=0.3.29",
"litellm>=1.61.4",
"markdownify>=0.14.1",
"notion-client>=2.3.0",

View file

@ -92,6 +92,20 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/ec/6a/bc7e17a3e87a2985d3e8f4da4cd0f481060eb78fb08596c42be62c90a4d9/aiosignal-1.3.2-py2.py3-none-any.whl", hash = "sha256:45cde58e409a301715980c2b01d0c28bdde3770d8290b5eb2173759d9acb31a5", size = 7597 },
]
[[package]]
name = "alembic"
version = "1.15.2"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "mako" },
{ name = "sqlalchemy" },
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/e6/57/e314c31b261d1e8a5a5f1908065b4ff98270a778ce7579bd4254477209a7/alembic-1.15.2.tar.gz", hash = "sha256:1c72391bbdeffccfe317eefba686cb9a3c078005478885413b95c3b26c57a8a7", size = 1925573 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/41/18/d89a443ed1ab9bcda16264716f809c663866d4ca8de218aa78fd50b38ead/alembic-1.15.2-py3-none-any.whl", hash = "sha256:2e76bd916d547f6900ec4bb5a90aeac1485d2c92536923d0b138c02b126edc53", size = 231911 },
]
[[package]]
name = "annotated-types"
version = "0.7.0"
@ -154,19 +168,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/5a/e4/bf8034d25edaa495da3c8a3405627d2e35758e44ff6eaa7948092646fdcc/argon2_cffi_bindings-21.2.0-cp38-abi3-macosx_10_9_universal2.whl", hash = "sha256:e415e3f62c8d124ee16018e491a009937f8cf7ebf5eb430ffc5de21b900dad93", size = 53104 },
]
[[package]]
name = "arxiv"
version = "2.1.3"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "feedparser" },
{ name = "requests" },
]
sdist = { url = "https://files.pythonhosted.org/packages/fe/59/fe41f54bdfed776c2e9bcd6289e4c71349eb938241d89b4c97d0f33e8013/arxiv-2.1.3.tar.gz", hash = "sha256:32365221994d2cf05657c1fadf63a26efc8ccdec18590281ee03515bfef8bc4e", size = 16747 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/b7/7b/7bf42178d227b26d3daf94cdd22a72a4ed5bf235548c4f5aea49c51c6458/arxiv-2.1.3-py3-none-any.whl", hash = "sha256:6f43673ab770a9e848d7d4fc1894824df55edeac3c3572ea280c9ba2e3c0f39f", size = 11478 },
]
[[package]]
name = "asyncpg"
version = "0.30.0"
@ -265,61 +266,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/f9/49/6abb616eb3cbab6a7cca303dc02fdf3836de2e0b834bf966a7f5271a34d8/beautifulsoup4-4.13.3-py3-none-any.whl", hash = "sha256:99045d7d3f08f91f0d656bc9b7efbae189426cd913d830294a15eefa0ea4df16", size = 186015 },
]
[[package]]
name = "brotli"
version = "1.1.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/2f/c2/f9e977608bdf958650638c3f1e28f85a1b075f075ebbe77db8555463787b/Brotli-1.1.0.tar.gz", hash = "sha256:81de08ac11bcb85841e440c13611c00b67d3bf82698314928d0b676362546724", size = 7372270 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/5c/d0/5373ae13b93fe00095a58efcbce837fd470ca39f703a235d2a999baadfbc/Brotli-1.1.0-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:32d95b80260d79926f5fab3c41701dbb818fde1c9da590e77e571eefd14abe28", size = 815693 },
{ url = "https://files.pythonhosted.org/packages/8e/48/f6e1cdf86751300c288c1459724bfa6917a80e30dbfc326f92cea5d3683a/Brotli-1.1.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:b760c65308ff1e462f65d69c12e4ae085cff3b332d894637f6273a12a482d09f", size = 422489 },
{ url = "https://files.pythonhosted.org/packages/06/88/564958cedce636d0f1bed313381dfc4b4e3d3f6015a63dae6146e1b8c65c/Brotli-1.1.0-cp312-cp312-macosx_10_9_universal2.whl", hash = "sha256:316cc9b17edf613ac76b1f1f305d2a748f1b976b033b049a6ecdfd5612c70409", size = 873081 },
{ url = "https://files.pythonhosted.org/packages/58/79/b7026a8bb65da9a6bb7d14329fd2bd48d2b7f86d7329d5cc8ddc6a90526f/Brotli-1.1.0-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:caf9ee9a5775f3111642d33b86237b05808dafcd6268faa492250e9b78046eb2", size = 446244 },
{ url = "https://files.pythonhosted.org/packages/e5/18/c18c32ecea41b6c0004e15606e274006366fe19436b6adccc1ae7b2e50c2/Brotli-1.1.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:70051525001750221daa10907c77830bc889cb6d865cc0b813d9db7fefc21451", size = 2906505 },
{ url = "https://files.pythonhosted.org/packages/08/c8/69ec0496b1ada7569b62d85893d928e865df29b90736558d6c98c2031208/Brotli-1.1.0-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:7f4bf76817c14aa98cc6697ac02f3972cb8c3da93e9ef16b9c66573a68014f91", size = 2944152 },
{ url = "https://files.pythonhosted.org/packages/ab/fb/0517cea182219d6768113a38167ef6d4eb157a033178cc938033a552ed6d/Brotli-1.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d0c5516f0aed654134a2fc936325cc2e642f8a0e096d075209672eb321cff408", size = 2919252 },
{ url = "https://files.pythonhosted.org/packages/c7/53/73a3431662e33ae61a5c80b1b9d2d18f58dfa910ae8dd696e57d39f1a2f5/Brotli-1.1.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:6c3020404e0b5eefd7c9485ccf8393cfb75ec38ce75586e046573c9dc29967a0", size = 2845955 },
{ url = "https://files.pythonhosted.org/packages/55/ac/bd280708d9c5ebdbf9de01459e625a3e3803cce0784f47d633562cf40e83/Brotli-1.1.0-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:4ed11165dd45ce798d99a136808a794a748d5dc38511303239d4e2363c0695dc", size = 2914304 },
{ url = "https://files.pythonhosted.org/packages/76/58/5c391b41ecfc4527d2cc3350719b02e87cb424ef8ba2023fb662f9bf743c/Brotli-1.1.0-cp312-cp312-musllinux_1_1_i686.whl", hash = "sha256:4093c631e96fdd49e0377a9c167bfd75b6d0bad2ace734c6eb20b348bc3ea180", size = 2814452 },
{ url = "https://files.pythonhosted.org/packages/c7/4e/91b8256dfe99c407f174924b65a01f5305e303f486cc7a2e8a5d43c8bec3/Brotli-1.1.0-cp312-cp312-musllinux_1_1_ppc64le.whl", hash = "sha256:7e4c4629ddad63006efa0ef968c8e4751c5868ff0b1c5c40f76524e894c50248", size = 2938751 },
{ url = "https://files.pythonhosted.org/packages/5a/a6/e2a39a5d3b412938362bbbeba5af904092bf3f95b867b4a3eb856104074e/Brotli-1.1.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:861bf317735688269936f755fa136a99d1ed526883859f86e41a5d43c61d8966", size = 2933757 },
{ url = "https://files.pythonhosted.org/packages/13/f0/358354786280a509482e0e77c1a5459e439766597d280f28cb097642fc26/Brotli-1.1.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:87a3044c3a35055527ac75e419dfa9f4f3667a1e887ee80360589eb8c90aabb9", size = 2936146 },
{ url = "https://files.pythonhosted.org/packages/80/f7/daf538c1060d3a88266b80ecc1d1c98b79553b3f117a485653f17070ea2a/Brotli-1.1.0-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:c5529b34c1c9d937168297f2c1fde7ebe9ebdd5e121297ff9c043bdb2ae3d6fb", size = 2848055 },
{ url = "https://files.pythonhosted.org/packages/ad/cf/0eaa0585c4077d3c2d1edf322d8e97aabf317941d3a72d7b3ad8bce004b0/Brotli-1.1.0-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:ca63e1890ede90b2e4454f9a65135a4d387a4585ff8282bb72964fab893f2111", size = 3035102 },
{ url = "https://files.pythonhosted.org/packages/d8/63/1c1585b2aa554fe6dbce30f0c18bdbc877fa9a1bf5ff17677d9cca0ac122/Brotli-1.1.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:e79e6520141d792237c70bcd7a3b122d00f2613769ae0cb61c52e89fd3443839", size = 2930029 },
{ url = "https://files.pythonhosted.org/packages/5f/3b/4e3fd1893eb3bbfef8e5a80d4508bec17a57bb92d586c85c12d28666bb13/Brotli-1.1.0-cp312-cp312-win32.whl", hash = "sha256:5f4d5ea15c9382135076d2fb28dde923352fe02951e66935a9efaac8f10e81b0", size = 333276 },
{ url = "https://files.pythonhosted.org/packages/3d/d5/942051b45a9e883b5b6e98c041698b1eb2012d25e5948c58d6bf85b1bb43/Brotli-1.1.0-cp312-cp312-win_amd64.whl", hash = "sha256:906bc3a79de8c4ae5b86d3d75a8b77e44404b0f4261714306e3ad248d8ab0951", size = 357255 },
{ url = "https://files.pythonhosted.org/packages/0a/9f/fb37bb8ffc52a8da37b1c03c459a8cd55df7a57bdccd8831d500e994a0ca/Brotli-1.1.0-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:8bf32b98b75c13ec7cf774164172683d6e7891088f6316e54425fde1efc276d5", size = 815681 },
{ url = "https://files.pythonhosted.org/packages/06/b3/dbd332a988586fefb0aa49c779f59f47cae76855c2d00f450364bb574cac/Brotli-1.1.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:7bc37c4d6b87fb1017ea28c9508b36bbcb0c3d18b4260fcdf08b200c74a6aee8", size = 422475 },
{ url = "https://files.pythonhosted.org/packages/bb/80/6aaddc2f63dbcf2d93c2d204e49c11a9ec93a8c7c63261e2b4bd35198283/Brotli-1.1.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:3c0ef38c7a7014ffac184db9e04debe495d317cc9c6fb10071f7fefd93100a4f", size = 2906173 },
{ url = "https://files.pythonhosted.org/packages/ea/1d/e6ca79c96ff5b641df6097d299347507d39a9604bde8915e76bf026d6c77/Brotli-1.1.0-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:91d7cc2a76b5567591d12c01f019dd7afce6ba8cba6571187e21e2fc418ae648", size = 2943803 },
{ url = "https://files.pythonhosted.org/packages/ac/a3/d98d2472e0130b7dd3acdbb7f390d478123dbf62b7d32bda5c830a96116d/Brotli-1.1.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a93dde851926f4f2678e704fadeb39e16c35d8baebd5252c9fd94ce8ce68c4a0", size = 2918946 },
{ url = "https://files.pythonhosted.org/packages/c4/a5/c69e6d272aee3e1423ed005d8915a7eaa0384c7de503da987f2d224d0721/Brotli-1.1.0-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:f0db75f47be8b8abc8d9e31bc7aad0547ca26f24a54e6fd10231d623f183d089", size = 2845707 },
{ url = "https://files.pythonhosted.org/packages/58/9f/4149d38b52725afa39067350696c09526de0125ebfbaab5acc5af28b42ea/Brotli-1.1.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:6967ced6730aed543b8673008b5a391c3b1076d834ca438bbd70635c73775368", size = 2936231 },
{ url = "https://files.pythonhosted.org/packages/5a/5a/145de884285611838a16bebfdb060c231c52b8f84dfbe52b852a15780386/Brotli-1.1.0-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:7eedaa5d036d9336c95915035fb57422054014ebdeb6f3b42eac809928e40d0c", size = 2848157 },
{ url = "https://files.pythonhosted.org/packages/50/ae/408b6bfb8525dadebd3b3dd5b19d631da4f7d46420321db44cd99dcf2f2c/Brotli-1.1.0-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:d487f5432bf35b60ed625d7e1b448e2dc855422e87469e3f450aa5552b0eb284", size = 3035122 },
{ url = "https://files.pythonhosted.org/packages/af/85/a94e5cfaa0ca449d8f91c3d6f78313ebf919a0dbd55a100c711c6e9655bc/Brotli-1.1.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:832436e59afb93e1836081a20f324cb185836c617659b07b129141a8426973c7", size = 2930206 },
{ url = "https://files.pythonhosted.org/packages/c2/f0/a61d9262cd01351df22e57ad7c34f66794709acab13f34be2675f45bf89d/Brotli-1.1.0-cp313-cp313-win32.whl", hash = "sha256:43395e90523f9c23a3d5bdf004733246fba087f2948f87ab28015f12359ca6a0", size = 333804 },
{ url = "https://files.pythonhosted.org/packages/7e/c1/ec214e9c94000d1c1974ec67ced1c970c148aa6b8d8373066123fc3dbf06/Brotli-1.1.0-cp313-cp313-win_amd64.whl", hash = "sha256:9011560a466d2eb3f5a6e4929cf4a09be405c64154e12df0dd72713f6500e32b", size = 358517 },
]
[[package]]
name = "brotlicffi"
version = "1.1.0.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "cffi" },
]
sdist = { url = "https://files.pythonhosted.org/packages/95/9d/70caa61192f570fcf0352766331b735afa931b4c6bc9a348a0925cc13288/brotlicffi-1.1.0.0.tar.gz", hash = "sha256:b77827a689905143f87915310b93b273ab17888fd43ef350d4832c4a71083c13", size = 465192 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/a2/11/7b96009d3dcc2c931e828ce1e157f03824a69fb728d06bfd7b2fc6f93718/brotlicffi-1.1.0.0-cp37-abi3-macosx_10_9_x86_64.whl", hash = "sha256:9b7ae6bd1a3f0df532b6d67ff674099a96d22bc0948955cb338488c31bfb8851", size = 453786 },
{ url = "https://files.pythonhosted.org/packages/d6/e6/a8f46f4a4ee7856fbd6ac0c6fb0dc65ed181ba46cd77875b8d9bbe494d9e/brotlicffi-1.1.0.0-cp37-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:19ffc919fa4fc6ace69286e0a23b3789b4219058313cf9b45625016bf7ff996b", size = 2911165 },
{ url = "https://files.pythonhosted.org/packages/be/20/201559dff14e83ba345a5ec03335607e47467b6633c210607e693aefac40/brotlicffi-1.1.0.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9feb210d932ffe7798ee62e6145d3a757eb6233aa9a4e7db78dd3690d7755814", size = 2927895 },
{ url = "https://files.pythonhosted.org/packages/cd/15/695b1409264143be3c933f708a3f81d53c4a1e1ebbc06f46331decbf6563/brotlicffi-1.1.0.0-cp37-abi3-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:84763dbdef5dd5c24b75597a77e1b30c66604725707565188ba54bab4f114820", size = 2851834 },
{ url = "https://files.pythonhosted.org/packages/b4/40/b961a702463b6005baf952794c2e9e0099bde657d0d7e007f923883b907f/brotlicffi-1.1.0.0-cp37-abi3-win32.whl", hash = "sha256:1b12b50e07c3911e1efa3a8971543e7648100713d4e0971b13631cce22c587eb", size = 341731 },
{ url = "https://files.pythonhosted.org/packages/1c/fa/5408a03c041114ceab628ce21766a4ea882aa6f6f0a800e04ee3a30ec6b9/brotlicffi-1.1.0.0-cp37-abi3-win_amd64.whl", hash = "sha256:994a4f0681bb6c6c3b0925530a1926b7a189d878e6e5e38fae8efa47c5d9c613", size = 366783 },
]
[[package]]
name = "cachetools"
version = "5.5.2"
@ -545,19 +491,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/d2/05/5533d30f53f10239616a357f080892026db2d550a40c393d0a8a7af834a9/cryptography-44.0.1-cp39-abi3-win_amd64.whl", hash = "sha256:e403f7f766ded778ecdb790da786b418a9f2394f36e8cc8b796cc056ab05f44f", size = 3207303 },
]
[[package]]
name = "cssselect2"
version = "0.8.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "tinycss2" },
{ name = "webencodings" },
]
sdist = { url = "https://files.pythonhosted.org/packages/9f/86/fd7f58fc498b3166f3a7e8e0cddb6e620fe1da35b02248b1bd59e95dbaaa/cssselect2-0.8.0.tar.gz", hash = "sha256:7674ffb954a3b46162392aee2a3a0aedb2e14ecf99fcc28644900f4e6e3e9d3a", size = 35716 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/0f/e7/aa315e6a749d9b96c2504a1ba0ba031ba2d0517e972ce22682e3fccecb09/cssselect2-0.8.0-py3-none-any.whl", hash = "sha256:46fc70ebc41ced7a32cd42d58b1884d72ade23d21e5a4eaaf022401c13f0e76e", size = 15454 },
]
[[package]]
name = "cycler"
version = "0.12.1"
@ -619,12 +552,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/68/1b/e0a87d256e40e8c888847551b20a017a6b98139178505dc7ffb96f04e954/dnspython-2.7.0-py3-none-any.whl", hash = "sha256:b4c34b7d10b51bcc3a5071e7b8dee77939f1e878477eeecc965e9835f63c6c86", size = 313632 },
]
[[package]]
name = "docopt"
version = "0.6.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/a2/55/8f8cab2afd404cf578136ef2cc5dfb50baa1761b68c9da1fb1e4eed343c9/docopt-0.6.2.tar.gz", hash = "sha256:49b3a825280bd66b3aa83585ef59c4a8c82f2c8a522dbe754a8bc8d08c85c491", size = 25901 }
[[package]]
name = "effdet"
version = "0.4.1"
@ -733,18 +660,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/a6/08/9968963c1fb8c34627b7f1fbcdfe9438540f87dc7c9bfb59bb4fd19a4ecf/fastapi_users_db_sqlalchemy-7.0.0-py3-none-any.whl", hash = "sha256:5fceac018e7cfa69efc70834dd3035b3de7988eb4274154a0dbe8b14f5aa001e", size = 6891 },
]
[[package]]
name = "feedparser"
version = "6.0.11"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "sgmllib3k" },
]
sdist = { url = "https://files.pythonhosted.org/packages/ff/aa/7af346ebeb42a76bf108027fe7f3328bb4e57a3a96e53e21fd9ef9dd6dd0/feedparser-6.0.11.tar.gz", hash = "sha256:c9d0407b64c6f2a065d0ebb292c2b35c01050cc0dc33757461aaabdc4c4184d5", size = 286197 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/7c/d4/8c31aad9cc18f451c49f7f9cfb5799dadffc88177f7917bc90a66459b1d7/feedparser-6.0.11-py3-none-any.whl", hash = "sha256:0be7ee7b395572b19ebeb1d6aafb0028dee11169f1c934e0ed67d54992f4ad45", size = 81343 },
]
[[package]]
name = "filelock"
version = "3.17.0"
@ -829,13 +744,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/bf/ff/44934a031ce5a39125415eb405b9efb76fe7f9586b75291d66ae5cbfc4e6/fonttools-4.56.0-py3-none-any.whl", hash = "sha256:1088182f68c303b50ca4dc0c82d42083d176cba37af1937e1a976a31149d4d14", size = 1089800 },
]
[package.optional-dependencies]
woff = [
{ name = "brotli", marker = "platform_python_implementation == 'CPython'" },
{ name = "brotlicffi", marker = "platform_python_implementation != 'CPython'" },
{ name = "zopfli" },
]
[[package]]
name = "frozenlist"
version = "1.5.0"
@ -884,6 +792,21 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/e2/94/758680531a00d06e471ef649e4ec2ed6bf185356a7f9fbfbb7368a40bd49/fsspec-2025.2.0-py3-none-any.whl", hash = "sha256:9de2ad9ce1f85e1931858535bc882543171d197001a0a5eb2ddc04f1781ab95b", size = 184484 },
]
[[package]]
name = "github3-py"
version = "4.0.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "pyjwt", extra = ["crypto"] },
{ name = "python-dateutil" },
{ name = "requests" },
{ name = "uritemplate" },
]
sdist = { url = "https://files.pythonhosted.org/packages/89/91/603bcaf8cd1b3927de64bf56c3a8915f6653ea7281919140c5bcff2bfe7b/github3.py-4.0.1.tar.gz", hash = "sha256:30d571076753efc389edc7f9aaef338a4fcb24b54d8968d5f39b1342f45ddd36", size = 36214038 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/61/ad/2394d4fb542574678b0ba342daf734d4d811768da3c2ee0c84d509dcb26c/github3.py-4.0.1-py3-none-any.whl", hash = "sha256:a89af7de25650612d1da2f0609622bcdeb07ee8a45a1c06b2d16a05e4234e753", size = 151800 },
]
[[package]]
name = "google-api-core"
version = "2.24.2"
@ -947,42 +870,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/f9/53/d35476d547a286506f0a6a634ccf1e5d288fffd53d48f0bd5fef61d68684/googleapis_common_protos-1.69.2-py3-none-any.whl", hash = "sha256:0b30452ff9c7a27d80bfc5718954063e8ab53dd3697093d3bc99581f5fd24212", size = 293215 },
]
[[package]]
name = "gpt-researcher"
version = "0.12.12"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "aiofiles" },
{ name = "arxiv" },
{ name = "beautifulsoup4" },
{ name = "colorama" },
{ name = "htmldocx" },
{ name = "json-repair" },
{ name = "json5" },
{ name = "langchain" },
{ name = "langchain-community" },
{ name = "langchain-openai" },
{ name = "loguru" },
{ name = "lxml-html-clean" },
{ name = "markdown" },
{ name = "md2pdf" },
{ name = "mistune" },
{ name = "pydantic" },
{ name = "pymupdf" },
{ name = "python-docx" },
{ name = "python-dotenv" },
{ name = "python-multipart" },
{ name = "pyyaml" },
{ name = "requests" },
{ name = "tiktoken" },
{ name = "unstructured" },
{ name = "websockets" },
]
sdist = { url = "https://files.pythonhosted.org/packages/55/d2/3a1acfd8b71ead6ab9a6833bcd1725e468e65817e7b04c233cd2d5e0a629/gpt_researcher-0.12.12.tar.gz", hash = "sha256:e3fa6faae4a3dc7e4280521eceb94a081a0aae277eb1ded25152579f65195844", size = 123669 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/09/0b/80ade14566946ca2253d1718746b63059f5bb1a2e489804c87e24332082d/gpt_researcher-0.12.12-py3-none-any.whl", hash = "sha256:3db51994406844d8acb28cb2a4897b9e9aaa34f0127edd7f8ded103fe2e544d4", size = 161671 },
]
[[package]]
name = "greenlet"
version = "3.1.1"
@ -1080,19 +967,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/6c/dd/a834df6482147d48e225a49515aabc28974ad5a4ca3215c18a882565b028/html5lib-1.1-py2.py3-none-any.whl", hash = "sha256:0d78f8fde1c230e99fe37986a60526d7049ed4bf8a9fadbad5f00e22e58e041d", size = 112173 },
]
[[package]]
name = "htmldocx"
version = "0.0.6"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "beautifulsoup4" },
{ name = "python-docx" },
]
sdist = { url = "https://files.pythonhosted.org/packages/8b/61/91a6b70ee576a4b07310d81efd4c688fe2e6f63ea42ec95b8f1d436b887e/htmldocx-0.0.6.tar.gz", hash = "sha256:b4bcec895f86d7a50ffc7133ca24d85c24f3614db2b37d33a30d9d04654a5486", size = 9418 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/8f/da/c70fc2ce54c1d1ce7c16f9656589273a6c94cbbc8867b3a512618d977309/htmldocx-0.0.6-py3-none-any.whl", hash = "sha256:adf5e95ad8ba8121e606cf138c614de13327a1192a5782acdb4a0abdc23db1b7", size = 9490 },
]
[[package]]
name = "httpcore"
version = "1.0.7"
@ -1271,24 +1145,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/91/29/df4b9b42f2be0b623cbd5e2140cafcaa2bef0759a00b7b70104dcfe2fb51/joblib-1.4.2-py3-none-any.whl", hash = "sha256:06d478d5674cbc267e7496a410ee875abd68e4340feff4490bcb7afb88060ae6", size = 301817 },
]
[[package]]
name = "json-repair"
version = "0.39.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/95/60/6d1599bc01070d9fe3840d245ae80fd24b981c732d962842825ce7a9fde6/json_repair-0.39.1.tar.gz", hash = "sha256:e90a489f247e1a8fc86612a5c719872a3dbf9cbaffd6d55f238ec571a77740fa", size = 30040 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/ff/b9/2e445481555422b907dab468b53574bc1e995099ca1a1201d0d876ca05e9/json_repair-0.39.1-py3-none-any.whl", hash = "sha256:3001409a2f319249f13e13d6c622117a5b70ea7e0c6f43864a0233cdffc3a599", size = 20686 },
]
[[package]]
name = "json5"
version = "0.10.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/85/3d/bbe62f3d0c05a689c711cff57b2e3ac3d3e526380adb7c781989f075115c/json5-0.10.0.tar.gz", hash = "sha256:e66941c8f0a02026943c52c2eb34ebeb2a6f819a0be05920a6f5243cd30fd559", size = 48202 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/aa/42/797895b952b682c3dafe23b1834507ee7f02f4d6299b65aaa61425763278/json5-0.10.0-py3-none-any.whl", hash = "sha256:19b23410220a7271e8377f81ba8aacba2fdd56947fbb137ee5977cbe1f5e8dfa", size = 34049 },
]
[[package]]
name = "jsonpatch"
version = "1.33"
@ -1450,20 +1306,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/20/0e/ddf9f5dc46b178df5c101666bb3bc7fc526d68cd81cdd60cbe1b6b438b30/langchain_core-0.3.43-py3-none-any.whl", hash = "sha256:caa6bc1f4c6ab71d3c2e400f8b62e1cd6dc5ac2c37e03f12f3e2c60befd5b273", size = 415421 },
]
[[package]]
name = "langchain-openai"
version = "0.3.8"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "langchain-core" },
{ name = "openai" },
{ name = "tiktoken" },
]
sdist = { url = "https://files.pythonhosted.org/packages/2e/04/ae071af0b04d1c3a8040498714091afd21149f6f8ae1dbab584317d9dfd7/langchain_openai-0.3.8.tar.gz", hash = "sha256:4d73727eda8102d1d07a2ca036278fccab0bb5e0abf353cec9c3973eb72550ec", size = 256898 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/a5/43/9c6a1101bcd751d52a3328a06956f85122f9aaa31da1b15a8e0f99a70317/langchain_openai-0.3.8-py3-none-any.whl", hash = "sha256:9004dc8ef853aece0d8f0feca7753dc97f710fa3e53874c8db66466520436dbb", size = 55446 },
]
[[package]]
name = "langchain-text-splitters"
version = "0.3.6"
@ -1499,6 +1341,61 @@ dependencies = [
]
sdist = { url = "https://files.pythonhosted.org/packages/0e/72/a3add0e4eec4eb9e2569554f7c70f4a3c27712f40e3284d483e88094cc0e/langdetect-1.0.9.tar.gz", hash = "sha256:cbc1fef89f8d062739774bd51eda3da3274006b3661d199c2655f6b3f6d605a0", size = 981474 }
[[package]]
name = "langgraph"
version = "0.3.29"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "langchain-core" },
{ name = "langgraph-checkpoint" },
{ name = "langgraph-prebuilt" },
{ name = "langgraph-sdk" },
{ name = "xxhash" },
]
sdist = { url = "https://files.pythonhosted.org/packages/26/00/6a38988d472835845ee6837402dc6050e012117b84ef2b838b7abd3268f1/langgraph-0.3.29.tar.gz", hash = "sha256:2bfa6d6b04541ddfcb03b56efd1fca6294a1700ff61a52c1582a8bb4f2d55a94", size = 119970 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/66/b4/89d81ed78efeec5b3d554a9244cdc6aa6cbf544da9c53738d7c2c6d4be57/langgraph-0.3.29-py3-none-any.whl", hash = "sha256:6045fbbe9ccc5af3fd7295a86f88e0d2b111243a36290e41248af379009e4cc1", size = 144692 },
]
[[package]]
name = "langgraph-checkpoint"
version = "2.0.24"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "langchain-core" },
{ name = "ormsgpack" },
]
sdist = { url = "https://files.pythonhosted.org/packages/0d/df/bacef68562ba4c391ded751eecda8e579ec78a581506064cf625e0ebd93a/langgraph_checkpoint-2.0.24.tar.gz", hash = "sha256:9596dad332344e7e871257be464df8a07c2e9bac66143081b11b9422b0167e5b", size = 37328 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/bc/60/30397e8fd2b7dead3754aa79d708caff9dbb371f30b4cd21802c60f6b921/langgraph_checkpoint-2.0.24-py3-none-any.whl", hash = "sha256:3836e2909ef2387d1fa8d04ee3e2a353f980d519fd6c649af352676dc73d66b8", size = 42028 },
]
[[package]]
name = "langgraph-prebuilt"
version = "0.1.8"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "langchain-core" },
{ name = "langgraph-checkpoint" },
]
sdist = { url = "https://files.pythonhosted.org/packages/57/30/f31f0e076c37d097b53e4cff5d479a3686e1991f6c86a1a4727d5d1f5489/langgraph_prebuilt-0.1.8.tar.gz", hash = "sha256:4de7659151829b2b955b6798df6800e580e617782c15c2c5b29b139697491831", size = 24543 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/36/72/9e092665502f8f52f2708065ed14fbbba3f95d1a1b65d62049b0c5fcdf00/langgraph_prebuilt-0.1.8-py3-none-any.whl", hash = "sha256:ae97b828ae00be2cefec503423aa782e1bff165e9b94592e224da132f2526968", size = 25903 },
]
[[package]]
name = "langgraph-sdk"
version = "0.1.61"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "httpx" },
{ name = "orjson" },
]
sdist = { url = "https://files.pythonhosted.org/packages/f0/c6/a11de2c770e1ac2774e2f19fdbd982b8df079e4206376456e14af395a3f0/langgraph_sdk-0.1.61.tar.gz", hash = "sha256:87dd1f07ab82da8875ac343268ece8bf5414632017ebc9d1cef4b523962fd601", size = 44136 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/fb/2b/85e796d8b4aad892c5d2bccc0def124fcdc2c9852dfa121adadfc41085b2/langgraph_sdk-0.1.61-py3-none-any.whl", hash = "sha256:f2d774b12497c428862993090622d51e0dbc3f53e0cee3d74a13c7495d835cc6", size = 47249 },
]
[[package]]
name = "langsmith"
version = "0.3.8"
@ -1538,19 +1435,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/f9/c2/1b6c502909b7af9054736af61e27558a3341e8c1ba28e7f82473e6dd936f/litellm-1.61.4-py3-none-any.whl", hash = "sha256:e87e0d397a191795b4217f9299fc9b21eaacaab91409695f0a4780cceccda6e1", size = 6814517 },
]
[[package]]
name = "loguru"
version = "0.7.3"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "colorama", marker = "sys_platform == 'win32'" },
{ name = "win32-setctime", marker = "sys_platform == 'win32'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/3a/05/a1dae3dffd1116099471c643b8924f5aa6524411dc6c63fdae648c4f1aca/loguru-0.7.3.tar.gz", hash = "sha256:19480589e77d47b8d85b2c827ad95d49bf31b0dcde16593892eb51dd18706eb6", size = 63559 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/0c/29/0348de65b8cc732daa3e33e67806420b2ae89bdce2b04af740289c5c6c8c/loguru-0.7.3-py3-none-any.whl", hash = "sha256:31a33c10c8e1e10422bfd431aeb5d351c7cf7fa671e3c4df004162264b28220c", size = 61595 },
]
[[package]]
name = "lxml"
version = "5.3.1"
@ -1593,18 +1477,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/80/83/8c54533b3576f4391eebea88454738978669a6cad0d8e23266224007939d/lxml-5.3.1-cp313-cp313-win_amd64.whl", hash = "sha256:91fb6a43d72b4f8863d21f347a9163eecbf36e76e2f51068d59cd004c506f332", size = 3814484 },
]
[[package]]
name = "lxml-html-clean"
version = "0.4.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "lxml" },
]
sdist = { url = "https://files.pythonhosted.org/packages/81/f2/fe319e3c5cb505a361b95d1e0d0d793fe28d4dcc2fc39d3cae9324dc4233/lxml_html_clean-0.4.1.tar.gz", hash = "sha256:40c838bbcf1fc72ba4ce811fbb3135913017b27820d7c16e8bc412ae1d8bc00b", size = 21378 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/f7/ba/2af7a60b45bf21375e111c1e2d5d721108d06c80e3d9a3cc1d767afe1731/lxml_html_clean-0.4.1-py3-none-any.whl", hash = "sha256:b704f2757e61d793b1c08bf5ad69e4c0b68d6696f4c3c1429982caf90050bcaf", size = 14114 },
]
[[package]]
name = "makefun"
version = "1.15.6"
@ -1614,6 +1486,18 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/89/a1/3e145759e776c8866488a71270c399bf7c4e554551ac2e247aa0a18a0596/makefun-1.15.6-py2.py3-none-any.whl", hash = "sha256:e69b870f0bb60304765b1e3db576aaecf2f9b3e5105afe8cfeff8f2afe6ad067", size = 22946 },
]
[[package]]
name = "mako"
version = "1.3.10"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "markupsafe" },
]
sdist = { url = "https://files.pythonhosted.org/packages/9e/38/bd5b78a920a64d708fe6bc8e0a2c075e1389d53bef8413725c63ba041535/mako-1.3.10.tar.gz", hash = "sha256:99579a6f39583fa7e5630a28c3c1f440e4e97a414b80372649c0ce338da2ea28", size = 392474 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/87/fb/99f81ac72ae23375f22b7afdb7642aba97c00a713c217124420147681a2f/mako-1.3.10-py3-none-any.whl", hash = "sha256:baef24a52fc4fc514a0887ac600f9f1cff3d82c61d4d700a1fa84d597b88db59", size = 78509 },
]
[[package]]
name = "markdown"
version = "3.7"
@ -1635,15 +1519,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/42/d7/1ec15b46af6af88f19b8e5ffea08fa375d433c998b8a7639e76935c14f1f/markdown_it_py-3.0.0-py3-none-any.whl", hash = "sha256:355216845c60bd96232cd8d8c40e8f9765cc86f46880e43a8fd22dc1a1a8cab1", size = 87528 },
]
[[package]]
name = "markdown2"
version = "2.5.3"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/44/52/d7dcc6284d59edb8301b8400435fbb4926a9b0f13a12b5cbaf3a4a54bb7b/markdown2-2.5.3.tar.gz", hash = "sha256:4d502953a4633408b0ab3ec503c5d6984d1b14307e32b325ec7d16ea57524895", size = 141676 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/84/37/0a13c83ccf5365b8e08ea572dfbc04b8cb87cadd359b2451a567f5248878/markdown2-2.5.3-py3-none-any.whl", hash = "sha256:a8ebb7e84b8519c37bf7382b3db600f1798a22c245bfd754a1f87ca8d7ea63b3", size = 48550 },
]
[[package]]
name = "markdownify"
version = "0.14.1"
@ -1744,17 +1619,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/ac/c2/0d5aae823bdcc42cc99327ecdd4d28585e15ccd5218c453b7bcd827f3421/matplotlib-3.10.1-cp313-cp313t-win_amd64.whl", hash = "sha256:bc411ebd5889a78dabbc457b3fa153203e22248bfa6eedc6797be5df0164dbf9", size = 8134832 },
]
[[package]]
name = "md2pdf"
version = "1.0.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "docopt" },
{ name = "markdown2" },
{ name = "weasyprint" },
]
sdist = { url = "https://files.pythonhosted.org/packages/de/b0/adbef5356f97a6d33c7811805b06e3774c7a58ea70dc28039ae4ad1ba1be/md2pdf-1.0.1.tar.gz", hash = "sha256:3d5aab77dcd5b6f5827b193819ab1a8c1cec506ce5f6c777c3411b703352cd98", size = 6377 }
[[package]]
name = "mdurl"
version = "0.1.2"
@ -1764,15 +1628,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/b3/38/89ba8ad64ae25be8de66a6d463314cf1eb366222074cfda9ee839c56a4b4/mdurl-0.1.2-py3-none-any.whl", hash = "sha256:84008a41e51615a49fc9966191ff91509e3c40b939176e643fd50a5c2196b8f8", size = 9979 },
]
[[package]]
name = "mistune"
version = "3.1.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/80/f7/f6d06304c61c2a73213c0a4815280f70d985429cda26272f490e42119c1a/mistune-3.1.2.tar.gz", hash = "sha256:733bf018ba007e8b5f2d3a9eb624034f6ee26c4ea769a98ec533ee111d504dff", size = 94613 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/12/92/30b4e54c4d7c48c06db61595cffbbf4f19588ea177896f9b78f0fbe021fd/mistune-3.1.2-py3-none-any.whl", hash = "sha256:4b47731332315cdca99e0ded46fc0004001c1299ff773dfb48fbe1fd226de319", size = 53696 },
]
[[package]]
name = "model2vec"
version = "0.4.0"
@ -2169,6 +2024,30 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/27/f1/1d7ec15b20f8ce9300bc850de1e059132b88990e46cd0ccac29cbf11e4f9/orjson-3.10.15-cp313-cp313-win_amd64.whl", hash = "sha256:fd56a26a04f6ba5fb2045b0acc487a63162a958ed837648c5781e1fe3316cfbf", size = 133444 },
]
[[package]]
name = "ormsgpack"
version = "1.9.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/25/a7/462cf8ff5e29241868b82d3a5ec124d690eb6a6a5c6fa5bb1367b839e027/ormsgpack-1.9.1.tar.gz", hash = "sha256:3da6e63d82565e590b98178545e64f0f8506137b92bd31a2d04fd7c82baf5794", size = 56887 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/dd/f1/155a598cc8030526ccaaf91ba4d61530f87900645559487edba58b0a90a2/ormsgpack-1.9.1-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:1ede445fc3fdba219bb0e0d1f289df26a9c7602016b7daac6fafe8fe4e91548f", size = 383225 },
{ url = "https://files.pythonhosted.org/packages/23/1c/ef3097ba550fad55c79525f461febdd4e0d9cc18d065248044536f09488e/ormsgpack-1.9.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:db50b9f918e25b289114312ed775794d0978b469831b992bdc65bfe20b91fe30", size = 214056 },
{ url = "https://files.pythonhosted.org/packages/27/77/64d0da25896b2cbb99505ca518c109d7dd1964d7fde14c10943731738b60/ormsgpack-1.9.1-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:8c7d8fc58e4333308f58ec720b1ee6b12b2b3fe2d2d8f0766ab751cb351e8757", size = 217339 },
{ url = "https://files.pythonhosted.org/packages/6c/10/c3a7fd0a0068b0bb52cccbfeb5656db895d69e895a3abbc210c4b3f98ff8/ormsgpack-1.9.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:aeee6d08c040db265cb8563444aba343ecb32cbdbe2414a489dcead9f70c6765", size = 223816 },
{ url = "https://files.pythonhosted.org/packages/43/e7/aee1238dba652f2116c2523d36fd1c5f9775436032be5c233108fd2a1415/ormsgpack-1.9.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:2fbb8181c198bdc413a4e889e5200f010724eea4b6d5a9a7eee2df039ac04aca", size = 394287 },
{ url = "https://files.pythonhosted.org/packages/c7/09/1b452a92376f29d7a2da7c18fb01cf09978197a8eccbb8b204e72fd5a970/ormsgpack-1.9.1-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:16488f094ac0e2250cceea6caf72962614aa432ee11dd57ef45e1ad25ece3eff", size = 480709 },
{ url = "https://files.pythonhosted.org/packages/de/13/7fa9fee5a73af8a73a42bf8c2e69489605714f65f5a41454400a05e84a3b/ormsgpack-1.9.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:422d960bfd6ad88be20794f50ec7953d8f7a0f2df60e19d0e8feb994e2ed64ee", size = 397247 },
{ url = "https://files.pythonhosted.org/packages/a1/2d/2e87cb28110db0d3bb750edd4d8719b5068852a2eef5e96b0bf376bb8a81/ormsgpack-1.9.1-cp312-cp312-win_amd64.whl", hash = "sha256:e6e2f9eab527cf43fb4a4293e493370276b1c8716cf305689202d646c6a782ef", size = 125368 },
{ url = "https://files.pythonhosted.org/packages/b8/54/0390d5d092831e4df29dbafe32402891fc14b3e6ffe5a644b16cbbc9d9bc/ormsgpack-1.9.1-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:ac61c18d9dd085e8519b949f7e655f7fb07909fd09c53b4338dd33309012e289", size = 383226 },
{ url = "https://files.pythonhosted.org/packages/47/64/8b15d262d1caefead8fb22ec144f5ff7d9505fc31c22bc34598053d46fbe/ormsgpack-1.9.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:134840b8c6615da2c24ce77bd12a46098015c808197a9995c7a2d991e1904eec", size = 214057 },
{ url = "https://files.pythonhosted.org/packages/57/00/65823609266bad4d5ed29ea753d24a3bdb01c7edaf923da80967fc31f9c5/ormsgpack-1.9.1-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:38fd42618f626394b2c7713c5d4bcbc917254e9753d5d4cde460658b51b11a74", size = 217340 },
{ url = "https://files.pythonhosted.org/packages/a0/51/e535c50f7f87b49110233647f55300d7975139ef5e51f1adb4c55f58c124/ormsgpack-1.9.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9d36397333ad07b9eba4c2e271fa78951bd81afc059c85a6e9f6c0eb2de07cda", size = 223815 },
{ url = "https://files.pythonhosted.org/packages/0c/ee/393e4a6de2a62124bf589602648f295a9fb3907a0e2fe80061b88899d072/ormsgpack-1.9.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:603063089597917d04e4c1b1d53988a34f7dc2ff1a03adcfd1cf4ae966d5fba6", size = 394287 },
{ url = "https://files.pythonhosted.org/packages/c6/d8/e56d7c3cb73a0e533e3e2a21ae5838b2aa36a9dac1ca9c861af6bae5a369/ormsgpack-1.9.1-cp313-cp313-musllinux_1_2_armv7l.whl", hash = "sha256:94bbf2b185e0cb721ceaba20e64b7158e6caf0cecd140ca29b9f05a8d5e91e2f", size = 480707 },
{ url = "https://files.pythonhosted.org/packages/e6/e0/6a3c6a6dc98583a721c54b02f5195bde8f801aebdeda9b601fa2ab30ad39/ormsgpack-1.9.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:c38f380b1e8c96a712eb302b9349347385161a8e29046868ae2bfdfcb23e2692", size = 397246 },
{ url = "https://files.pythonhosted.org/packages/b0/60/0ee5d790f13507e1f75ac21fc82dc1ef29afe1f520bd0f249d65b2f4839b/ormsgpack-1.9.1-cp313-cp313-win_amd64.whl", hash = "sha256:a4bc63fb30db94075611cedbbc3d261dd17cf2aa8ff75a0fd684cd45ca29cb1b", size = 125371 },
]
[[package]]
name = "packaging"
version = "24.2"
@ -2572,15 +2451,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/b4/46/93416fdae86d40879714f72956ac14df9c7b76f7d41a4d68aa9f71a0028b/pydantic_settings-2.7.1-py3-none-any.whl", hash = "sha256:590be9e6e24d06db33a4262829edef682500ef008565a969c73d39d5f8bfb3fd", size = 29718 },
]
[[package]]
name = "pydyf"
version = "0.11.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/2e/c2/97fc6ce4ce0045080dc99446def812081b57750ed8aa67bfdfafa4561fe5/pydyf-0.11.0.tar.gz", hash = "sha256:394dddf619cca9d0c55715e3c55ea121a9bf9cbc780cdc1201a2427917b86b64", size = 17769 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/c9/ac/d5db977deaf28c6ecbc61bbca269eb3e8f0b3a1f55c8549e5333e606e005/pydyf-0.11.0-py3-none-any.whl", hash = "sha256:0aaf9e2ebbe786ec7a78ec3fbffa4cdcecde53fd6f563221d53c6bc1328848a3", size = 8104 },
]
[[package]]
name = "pyee"
version = "12.1.1"
@ -2616,21 +2486,6 @@ crypto = [
{ name = "cryptography" },
]
[[package]]
name = "pymupdf"
version = "1.25.3"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/06/47/b61c1c44b87cbdaeecdec3f43ce524ed6b3c72172bc6184eb82c94fbc43d/pymupdf-1.25.3.tar.gz", hash = "sha256:b640187c64c5ac5d97505a92e836da299da79c2f689f3f94a67a37a493492193", size = 67259841 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/61/9b/98ef4b98309e9db3baa9fe572f0e61b6130bb9852d13189970f35b703499/pymupdf-1.25.3-cp39-abi3-macosx_10_9_x86_64.whl", hash = "sha256:96878e1b748f9c2011aecb2028c5f96b5a347a9a91169130ad0133053d97915e", size = 19343576 },
{ url = "https://files.pythonhosted.org/packages/14/62/4e12126db174c8cfbf692281cda971cc4046c5f5226032c2cfaa6f83e08d/pymupdf-1.25.3-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:6ef753005b72ebfd23470f72f7e30f61e21b0b5e748045ec5b8f89e6e3068d62", size = 18580114 },
{ url = "https://files.pythonhosted.org/packages/ec/c5/cf7ecf005e4f8ba3664d6aaa0613adeba4c2ab524832c452c69857e7184f/pymupdf-1.25.3-cp39-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:cbff443d899f37b17f1e67563cc03673d50b4bf33ccc237e73d34f18f3a07ccf", size = 19442580 },
{ url = "https://files.pythonhosted.org/packages/52/de/bd1418e31f73d37b8381cd5deacfd681e6be702b8890e123e83724569ee1/pymupdf-1.25.3-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:46d90c4f9e62d1856e8db4b9f04a202ff4a7f086a816af73abdc86adb7f5e25a", size = 19999825 },
{ url = "https://files.pythonhosted.org/packages/42/ee/3c449b0de061440ba1ac984aa845315e9e2dca0ff2003c5adfc6febff203/pymupdf-1.25.3-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:a5de51efdbe4d486b6c1111c84e8a231cbfb426f3d6ff31ab530ad70e6f39756", size = 21123157 },
{ url = "https://files.pythonhosted.org/packages/83/53/71faaaf91c56f2883b13f3dd849bf2697f012eb35eb7b952d62734cff41f/pymupdf-1.25.3-cp39-abi3-win32.whl", hash = "sha256:bca72e6089f985d800596e22973f79cc08af6cbff1d93e5bda9248326a03857c", size = 15094211 },
{ url = "https://files.pythonhosted.org/packages/09/e0/d72e88a1d5e23aa381fd463057dc3d0fb29090e1e7308a870c334716579c/pymupdf-1.25.3-cp39-abi3-win_amd64.whl", hash = "sha256:4fb357438c9129fbf939b5af85323434df64e36759c399c376b62ad6da95498c", size = 16542949 },
]
[[package]]
name = "pypandoc"
version = "1.15"
@ -2678,15 +2533,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/e1/6b/2706497c86e8d69fb76afe5ea857fe1794621aa0f3b1d863feb953fe0f22/pypdfium2-4.30.1-py3-none-win_arm64.whl", hash = "sha256:c2b6d63f6d425d9416c08d2511822b54b8e3ac38e639fc41164b1d75584b3a8c", size = 2814810 },
]
[[package]]
name = "pyphen"
version = "0.17.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/69/56/e4d7e1bd70d997713649c5ce530b2d15a5fc2245a74ca820fc2d51d89d4d/pyphen-0.17.2.tar.gz", hash = "sha256:f60647a9c9b30ec6c59910097af82bc5dd2d36576b918e44148d8b07ef3b4aa3", size = 2079470 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/7b/1f/c2142d2edf833a90728e5cdeb10bdbdc094dde8dbac078cee0cf33f5e11b/pyphen-0.17.2-py3-none-any.whl", hash = "sha256:3a07fb017cb2341e1d9ff31b8634efb1ae4dc4b130468c7c39dd3d32e7c3affd", size = 2079358 },
]
[[package]]
name = "pyreadline3"
version = "3.5.4"
@ -3135,12 +2981,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/69/8a/b9dc7678803429e4a3bc9ba462fa3dd9066824d3c607490235c6a796be5a/setuptools-75.8.0-py3-none-any.whl", hash = "sha256:e3982f444617239225d675215d51f6ba05f845d4eec313da4418fdbb56fb27e3", size = 1228782 },
]
[[package]]
name = "sgmllib3k"
version = "1.0.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/9e/bd/3704a8c3e0942d711c1299ebf7b9091930adae6675d7c8f476a7ce48653c/sgmllib3k-1.0.0.tar.gz", hash = "sha256:7868fb1c8bfa764c1ac563d3cf369c381d1325d36124933a726f29fcdaa812e9", size = 5750 }
[[package]]
name = "six"
version = "1.17.0"
@ -3225,17 +3065,19 @@ wheels = [
[[package]]
name = "surf-new-backend"
version = "0.1.0"
version = "0.0.6"
source = { virtual = "." }
dependencies = [
{ name = "alembic" },
{ name = "asyncpg" },
{ name = "chonkie", extra = ["all"] },
{ name = "fastapi" },
{ name = "fastapi-users", extra = ["oauth", "sqlalchemy"] },
{ name = "firecrawl-py" },
{ name = "gpt-researcher" },
{ name = "github3-py" },
{ name = "langchain-community" },
{ name = "langchain-unstructured" },
{ name = "langgraph" },
{ name = "litellm" },
{ name = "markdownify" },
{ name = "notion-client" },
@ -3254,14 +3096,16 @@ dependencies = [
[package.metadata]
requires-dist = [
{ name = "alembic", specifier = ">=1.13.0" },
{ name = "asyncpg", specifier = ">=0.30.0" },
{ name = "chonkie", extras = ["all"], specifier = ">=0.4.1" },
{ name = "fastapi", specifier = ">=0.115.8" },
{ name = "fastapi-users", extras = ["oauth", "sqlalchemy"], specifier = ">=14.0.1" },
{ name = "firecrawl-py", specifier = ">=1.12.0" },
{ name = "gpt-researcher", specifier = ">=0.12.12" },
{ name = "github3-py", specifier = "==4.0.1" },
{ name = "langchain-community", specifier = ">=0.3.17" },
{ name = "langchain-unstructured", specifier = ">=0.1.6" },
{ name = "langgraph", specifier = ">=0.3.29" },
{ name = "litellm", specifier = ">=1.61.4" },
{ name = "markdownify", specifier = ">=0.14.1" },
{ name = "notion-client", specifier = ">=2.3.0" },
@ -3362,30 +3206,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/6c/d0/179abca8b984b3deefd996f362b612c39da73b60f685921e6cd58b6125b4/timm-1.0.15-py3-none-any.whl", hash = "sha256:5a3dc460c24e322ecc7fd1f3e3eb112423ddee320cb059cc1956fbc9731748ef", size = 2361373 },
]
[[package]]
name = "tinycss2"
version = "1.4.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "webencodings" },
]
sdist = { url = "https://files.pythonhosted.org/packages/7a/fd/7a5ee21fd08ff70d3d33a5781c255cbe779659bd03278feb98b19ee550f4/tinycss2-1.4.0.tar.gz", hash = "sha256:10c0972f6fc0fbee87c3edb76549357415e94548c1ae10ebccdea16fb404a9b7", size = 87085 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/e6/34/ebdc18bae6aa14fbee1a08b63c015c72b64868ff7dae68808ab500c492e2/tinycss2-1.4.0-py3-none-any.whl", hash = "sha256:3a49cf47b7675da0b15d0c6e1df8df4ebd96e9394bb905a5775adb0d884c5289", size = 26610 },
]
[[package]]
name = "tinyhtml5"
version = "2.0.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "webencodings" },
]
sdist = { url = "https://files.pythonhosted.org/packages/fd/03/6111ed99e9bf7dfa1c30baeef0e0fb7e0bd387bd07f8e5b270776fe1de3f/tinyhtml5-2.0.0.tar.gz", hash = "sha256:086f998833da24c300c414d9fe81d9b368fd04cb9d2596a008421cbc705fcfcc", size = 179507 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/5c/de/27c57899297163a4a84104d5cec0af3b1ac5faf62f44667e506373c6b8ce/tinyhtml5-2.0.0-py3-none-any.whl", hash = "sha256:13683277c5b176d070f82d099d977194b7a1e26815b016114f581a74bbfbf47e", size = 39793 },
]
[[package]]
name = "tokenizers"
version = "0.21.0"
@ -3658,6 +3478,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/10/6d/adb955ecf60811a3735d508974bbb5358e7745b635dc001329267529c6f2/unstructured.pytesseract-0.3.15-py3-none-any.whl", hash = "sha256:a3f505c5efb7ff9f10379051a7dd6aa624b3be6b0f023ed6767cc80d0b1613d1", size = 14992 },
]
[[package]]
name = "uritemplate"
version = "4.1.1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/d2/5a/4742fdba39cd02a56226815abfa72fe0aa81c33bed16ed045647d6000eba/uritemplate-4.1.1.tar.gz", hash = "sha256:4346edfc5c3b79f694bccd6d6099a322bbeb628dbf2cd86eea55a456ce5124f0", size = 273898 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/81/c0/7461b49cd25aeece13766f02ee576d1db528f1c37ce69aee300e075b485b/uritemplate-4.1.1-py2.py3-none-any.whl", hash = "sha256:830c08b8d99bdd312ea4ead05994a38e8936266f84b9a7878232db50b044e02e", size = 10356 },
]
[[package]]
name = "urllib3"
version = "2.3.0"
@ -3756,25 +3585,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/f0/e5/96b8e55271685ddbadc50ce8bc53aa2dff278fb7ac4c2e473df890def2dc/watchfiles-1.0.4-cp313-cp313-win_amd64.whl", hash = "sha256:d6097538b0ae5c1b88c3b55afa245a66793a8fec7ada6755322e465fb1a0e8cc", size = 285216 },
]
[[package]]
name = "weasyprint"
version = "64.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "cffi" },
{ name = "cssselect2" },
{ name = "fonttools", extra = ["woff"] },
{ name = "pillow" },
{ name = "pydyf" },
{ name = "pyphen" },
{ name = "tinycss2" },
{ name = "tinyhtml5" },
]
sdist = { url = "https://files.pythonhosted.org/packages/6c/a0/f6b3ef688e747488b17b3b39d27fe7438d3ec88d1b79d5524485a5458020/weasyprint-64.1.tar.gz", hash = "sha256:28b02f2c6409bafce1b1220d9d76a7345875bd3bd08c4f6dfbf510bb92a94757", size = 498647 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/42/95/bf333fbbaf73c1c211b6b801b9ac2563db8e2225f69902d1ba8b25c70e9c/weasyprint-64.1-py3-none-any.whl", hash = "sha256:f7c88ea8ce0ce0c527cbb9c802689e035fae50016d7efc5dfdaba4b75abf68f4", size = 302025 },
]
[[package]]
name = "webencodings"
version = "0.5.1"
@ -3815,15 +3625,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/7b/c8/d529f8a32ce40d98309f4470780631e971a5a842b60aec864833b3615786/websockets-14.2-py3-none-any.whl", hash = "sha256:7a6ceec4ea84469f15cf15807a747e9efe57e369c384fa86e022b3bea679b79b", size = 157416 },
]
[[package]]
name = "win32-setctime"
version = "1.2.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/b3/8f/705086c9d734d3b663af0e9bb3d4de6578d08f46b1b101c2442fd9aecaa2/win32_setctime-1.2.0.tar.gz", hash = "sha256:ae1fdf948f5640aae05c511ade119313fb6a30d7eabe25fef9764dca5873c4c0", size = 4867 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/e1/07/c6fe3ad3e685340704d314d765b7912993bcb8dc198f0e7a89382d37974b/win32_setctime-1.2.0-py3-none-any.whl", hash = "sha256:95d644c4e708aba81dc3704a116d8cbc974d70b3bdb8be1d150e36be6e9d1390", size = 4083 },
]
[[package]]
name = "wrapt"
version = "1.17.2"
@ -3884,6 +3685,44 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/9b/07/df054f7413bdfff5e98f75056e4ed0977d0c8716424011fac2587864d1d3/XlsxWriter-3.2.2-py3-none-any.whl", hash = "sha256:272ce861e7fa5e82a4a6ebc24511f2cb952fde3461f6c6e1a1e81d3272db1471", size = 165121 },
]
[[package]]
name = "xxhash"
version = "3.5.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/00/5e/d6e5258d69df8b4ed8c83b6664f2b47d30d2dec551a29ad72a6c69eafd31/xxhash-3.5.0.tar.gz", hash = "sha256:84f2caddf951c9cbf8dc2e22a89d4ccf5d86391ac6418fe81e3c67d0cf60b45f", size = 84241 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/07/0e/1bfce2502c57d7e2e787600b31c83535af83746885aa1a5f153d8c8059d6/xxhash-3.5.0-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:14470ace8bd3b5d51318782cd94e6f94431974f16cb3b8dc15d52f3b69df8e00", size = 31969 },
{ url = "https://files.pythonhosted.org/packages/3f/d6/8ca450d6fe5b71ce521b4e5db69622383d039e2b253e9b2f24f93265b52c/xxhash-3.5.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:59aa1203de1cb96dbeab595ded0ad0c0056bb2245ae11fac11c0ceea861382b9", size = 30787 },
{ url = "https://files.pythonhosted.org/packages/5b/84/de7c89bc6ef63d750159086a6ada6416cc4349eab23f76ab870407178b93/xxhash-3.5.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:08424f6648526076e28fae6ea2806c0a7d504b9ef05ae61d196d571e5c879c84", size = 220959 },
{ url = "https://files.pythonhosted.org/packages/fe/86/51258d3e8a8545ff26468c977101964c14d56a8a37f5835bc0082426c672/xxhash-3.5.0-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:61a1ff00674879725b194695e17f23d3248998b843eb5e933007ca743310f793", size = 200006 },
{ url = "https://files.pythonhosted.org/packages/02/0a/96973bd325412feccf23cf3680fd2246aebf4b789122f938d5557c54a6b2/xxhash-3.5.0-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:f2f2c61bee5844d41c3eb015ac652a0229e901074951ae48581d58bfb2ba01be", size = 428326 },
{ url = "https://files.pythonhosted.org/packages/11/a7/81dba5010f7e733de88af9555725146fc133be97ce36533867f4c7e75066/xxhash-3.5.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9d32a592cac88d18cc09a89172e1c32d7f2a6e516c3dfde1b9adb90ab5df54a6", size = 194380 },
{ url = "https://files.pythonhosted.org/packages/fb/7d/f29006ab398a173f4501c0e4977ba288f1c621d878ec217b4ff516810c04/xxhash-3.5.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:70dabf941dede727cca579e8c205e61121afc9b28516752fd65724be1355cc90", size = 207934 },
{ url = "https://files.pythonhosted.org/packages/8a/6e/6e88b8f24612510e73d4d70d9b0c7dff62a2e78451b9f0d042a5462c8d03/xxhash-3.5.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:e5d0ddaca65ecca9c10dcf01730165fd858533d0be84c75c327487c37a906a27", size = 216301 },
{ url = "https://files.pythonhosted.org/packages/af/51/7862f4fa4b75a25c3b4163c8a873f070532fe5f2d3f9b3fc869c8337a398/xxhash-3.5.0-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:3e5b5e16c5a480fe5f59f56c30abdeba09ffd75da8d13f6b9b6fd224d0b4d0a2", size = 203351 },
{ url = "https://files.pythonhosted.org/packages/22/61/8d6a40f288f791cf79ed5bb113159abf0c81d6efb86e734334f698eb4c59/xxhash-3.5.0-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:149b7914451eb154b3dfaa721315117ea1dac2cc55a01bfbd4df7c68c5dd683d", size = 210294 },
{ url = "https://files.pythonhosted.org/packages/17/02/215c4698955762d45a8158117190261b2dbefe9ae7e5b906768c09d8bc74/xxhash-3.5.0-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:eade977f5c96c677035ff39c56ac74d851b1cca7d607ab3d8f23c6b859379cab", size = 414674 },
{ url = "https://files.pythonhosted.org/packages/31/5c/b7a8db8a3237cff3d535261325d95de509f6a8ae439a5a7a4ffcff478189/xxhash-3.5.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:fa9f547bd98f5553d03160967866a71056a60960be00356a15ecc44efb40ba8e", size = 192022 },
{ url = "https://files.pythonhosted.org/packages/78/e3/dd76659b2811b3fd06892a8beb850e1996b63e9235af5a86ea348f053e9e/xxhash-3.5.0-cp312-cp312-win32.whl", hash = "sha256:f7b58d1fd3551b8c80a971199543379be1cee3d0d409e1f6d8b01c1a2eebf1f8", size = 30170 },
{ url = "https://files.pythonhosted.org/packages/d9/6b/1c443fe6cfeb4ad1dcf231cdec96eb94fb43d6498b4469ed8b51f8b59a37/xxhash-3.5.0-cp312-cp312-win_amd64.whl", hash = "sha256:fa0cafd3a2af231b4e113fba24a65d7922af91aeb23774a8b78228e6cd785e3e", size = 30040 },
{ url = "https://files.pythonhosted.org/packages/0f/eb/04405305f290173acc0350eba6d2f1a794b57925df0398861a20fbafa415/xxhash-3.5.0-cp312-cp312-win_arm64.whl", hash = "sha256:586886c7e89cb9828bcd8a5686b12e161368e0064d040e225e72607b43858ba2", size = 26796 },
{ url = "https://files.pythonhosted.org/packages/c9/b8/e4b3ad92d249be5c83fa72916c9091b0965cb0faeff05d9a0a3870ae6bff/xxhash-3.5.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:37889a0d13b0b7d739cfc128b1c902f04e32de17b33d74b637ad42f1c55101f6", size = 31795 },
{ url = "https://files.pythonhosted.org/packages/fc/d8/b3627a0aebfbfa4c12a41e22af3742cf08c8ea84f5cc3367b5de2d039cce/xxhash-3.5.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:97a662338797c660178e682f3bc180277b9569a59abfb5925e8620fba00b9fc5", size = 30792 },
{ url = "https://files.pythonhosted.org/packages/c3/cc/762312960691da989c7cd0545cb120ba2a4148741c6ba458aa723c00a3f8/xxhash-3.5.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:7f85e0108d51092bdda90672476c7d909c04ada6923c14ff9d913c4f7dc8a3bc", size = 220950 },
{ url = "https://files.pythonhosted.org/packages/fe/e9/cc266f1042c3c13750e86a535496b58beb12bf8c50a915c336136f6168dc/xxhash-3.5.0-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:cd2fd827b0ba763ac919440042302315c564fdb797294d86e8cdd4578e3bc7f3", size = 199980 },
{ url = "https://files.pythonhosted.org/packages/bf/85/a836cd0dc5cc20376de26b346858d0ac9656f8f730998ca4324921a010b9/xxhash-3.5.0-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:82085c2abec437abebf457c1d12fccb30cc8b3774a0814872511f0f0562c768c", size = 428324 },
{ url = "https://files.pythonhosted.org/packages/b4/0e/15c243775342ce840b9ba34aceace06a1148fa1630cd8ca269e3223987f5/xxhash-3.5.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:07fda5de378626e502b42b311b049848c2ef38784d0d67b6f30bb5008642f8eb", size = 194370 },
{ url = "https://files.pythonhosted.org/packages/87/a1/b028bb02636dfdc190da01951d0703b3d904301ed0ef6094d948983bef0e/xxhash-3.5.0-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:c279f0d2b34ef15f922b77966640ade58b4ccdfef1c4d94b20f2a364617a493f", size = 207911 },
{ url = "https://files.pythonhosted.org/packages/80/d5/73c73b03fc0ac73dacf069fdf6036c9abad82de0a47549e9912c955ab449/xxhash-3.5.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:89e66ceed67b213dec5a773e2f7a9e8c58f64daeb38c7859d8815d2c89f39ad7", size = 216352 },
{ url = "https://files.pythonhosted.org/packages/b6/2a/5043dba5ddbe35b4fe6ea0a111280ad9c3d4ba477dd0f2d1fe1129bda9d0/xxhash-3.5.0-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:bcd51708a633410737111e998ceb3b45d3dbc98c0931f743d9bb0a209033a326", size = 203410 },
{ url = "https://files.pythonhosted.org/packages/a2/b2/9a8ded888b7b190aed75b484eb5c853ddd48aa2896e7b59bbfbce442f0a1/xxhash-3.5.0-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:3ff2c0a34eae7df88c868be53a8dd56fbdf592109e21d4bfa092a27b0bf4a7bf", size = 210322 },
{ url = "https://files.pythonhosted.org/packages/98/62/440083fafbc917bf3e4b67c2ade621920dd905517e85631c10aac955c1d2/xxhash-3.5.0-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:4e28503dccc7d32e0b9817aa0cbfc1f45f563b2c995b7a66c4c8a0d232e840c7", size = 414725 },
{ url = "https://files.pythonhosted.org/packages/75/db/009206f7076ad60a517e016bb0058381d96a007ce3f79fa91d3010f49cc2/xxhash-3.5.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:a6c50017518329ed65a9e4829154626f008916d36295b6a3ba336e2458824c8c", size = 192070 },
{ url = "https://files.pythonhosted.org/packages/1f/6d/c61e0668943a034abc3a569cdc5aeae37d686d9da7e39cf2ed621d533e36/xxhash-3.5.0-cp313-cp313-win32.whl", hash = "sha256:53a068fe70301ec30d868ece566ac90d873e3bb059cf83c32e76012c889b8637", size = 30172 },
{ url = "https://files.pythonhosted.org/packages/96/14/8416dce965f35e3d24722cdf79361ae154fa23e2ab730e5323aa98d7919e/xxhash-3.5.0-cp313-cp313-win_amd64.whl", hash = "sha256:80babcc30e7a1a484eab952d76a4f4673ff601f54d5142c26826502740e70b43", size = 30041 },
{ url = "https://files.pythonhosted.org/packages/27/ee/518b72faa2073f5aa8e3262408d284892cb79cf2754ba0c3a5870645ef73/xxhash-3.5.0-cp313-cp313-win_arm64.whl", hash = "sha256:4811336f1ce11cac89dcbd18f3a25c527c16311709a89313c3acaf771def2d4b", size = 26801 },
]
[[package]]
name = "yarl"
version = "1.18.3"
@ -3952,34 +3791,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/b7/1a/7e4798e9339adc931158c9d69ecc34f5e6791489d469f5e50ec15e35f458/zipp-3.21.0-py3-none-any.whl", hash = "sha256:ac1bbe05fd2991f160ebce24ffbac5f6d11d83dc90891255885223d42b3cd931", size = 9630 },
]
[[package]]
name = "zopfli"
version = "0.2.3.post1"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/5e/7c/a8f6696e694709e2abcbccd27d05ef761e9b6efae217e11d977471555b62/zopfli-0.2.3.post1.tar.gz", hash = "sha256:96484dc0f48be1c5d7ae9f38ed1ce41e3675fd506b27c11a6607f14b49101e99", size = 175629 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/3f/ce/b6441cc01881d06e0b5883f32c44e7cc9772e0d04e3e59277f59f80b9a19/zopfli-0.2.3.post1-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:3f0197b6aa6eb3086ae9e66d6dd86c4d502b6c68b0ec490496348ae8c05ecaef", size = 295489 },
{ url = "https://files.pythonhosted.org/packages/93/f0/24dd708f00ae0a925bc5c9edae858641c80f6a81a516810dc4d21688a930/zopfli-0.2.3.post1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:5fcfc0dc2761e4fcc15ad5d273b4d58c2e8e059d3214a7390d4d3c8e2aee644e", size = 163010 },
{ url = "https://files.pythonhosted.org/packages/65/57/0378eeeb5e3e1e83b1b0958616b2bf954f102ba5b0755b9747dafbd8cb72/zopfli-0.2.3.post1-cp312-cp312-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:cac2b37ab21c2b36a10b685b1893ebd6b0f83ae26004838ac817680881576567", size = 823649 },
{ url = "https://files.pythonhosted.org/packages/ab/8a/3ab8a616d4655acf5cf63c40ca84e434289d7d95518a1a42d28b4a7228f8/zopfli-0.2.3.post1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:8d5ab297d660b75c159190ce6d73035502310e40fd35170aed7d1a1aea7ddd65", size = 826557 },
{ url = "https://files.pythonhosted.org/packages/ed/4d/7f6820af119c4fec6efaf007bffee7bc9052f695853a711a951be7afd26b/zopfli-0.2.3.post1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9ba214f4f45bec195ee8559651154d3ac2932470b9d91c5715fc29c013349f8c", size = 851127 },
{ url = "https://files.pythonhosted.org/packages/e1/db/1ef5353ab06f9f2fb0c25ed0cddf1418fe275cc2ee548bc4a29340c44fe1/zopfli-0.2.3.post1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:c1e0ed5d84ffa2d677cc9582fc01e61dab2e7ef8b8996e055f0a76167b1b94df", size = 1754183 },
{ url = "https://files.pythonhosted.org/packages/39/03/44f8f39950354d330fa798e4bab1ac8e38ec787d3fde25d5b9c7770065a2/zopfli-0.2.3.post1-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:bfa1eb759e07d8b7aa7a310a2bc535e127ee70addf90dc8d4b946b593c3e51a8", size = 1905945 },
{ url = "https://files.pythonhosted.org/packages/74/7b/94b920c33cc64255f59e3cfc77c829b5c6e60805d189baeada728854a342/zopfli-0.2.3.post1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:cd2c002f160502608dcc822ed2441a0f4509c52e86fcfd1a09e937278ed1ca14", size = 1835885 },
{ url = "https://files.pythonhosted.org/packages/ad/89/c869ac844351e285a6165e2da79b715b0619a122e3160d183805adf8ab45/zopfli-0.2.3.post1-cp312-cp312-win32.whl", hash = "sha256:7be5cc6732eb7b4df17305d8a7b293223f934a31783a874a01164703bc1be6cd", size = 82743 },
{ url = "https://files.pythonhosted.org/packages/29/e6/c98912fd3a589d8a7316c408fd91519f72c237805c4400b753e3942fda0b/zopfli-0.2.3.post1-cp312-cp312-win_amd64.whl", hash = "sha256:4e50ffac74842c1c1018b9b73875a0d0a877c066ab06bf7cccbaa84af97e754f", size = 99403 },
{ url = "https://files.pythonhosted.org/packages/2b/24/0e552e2efce9a20625b56e9609d1e33c2966be33fc008681121ec267daec/zopfli-0.2.3.post1-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:ecb7572df5372abce8073df078207d9d1749f20b8b136089916a4a0868d56051", size = 295485 },
{ url = "https://files.pythonhosted.org/packages/08/83/b2564369fb98797a617fe2796097b1d719a4937234375757ad2a3febc04b/zopfli-0.2.3.post1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:a1cf720896d2ce998bc8e051d4b4ce0d8bec007aab6243102e8e1d22a0b2fb3f", size = 163000 },
{ url = "https://files.pythonhosted.org/packages/3c/55/81d419739c2aab35e19b58bce5498dcb58e6446e5eb69f2d3c748b1c9151/zopfli-0.2.3.post1-cp313-cp313-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:5aad740b4d4fcbaaae4887823925166ffd062db3b248b3f432198fc287381d1a", size = 823699 },
{ url = "https://files.pythonhosted.org/packages/9e/91/89f07c8ea3c9bc64099b3461627b07a8384302235ee0f357eaa86f98f509/zopfli-0.2.3.post1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:6617fb10f9e4393b331941861d73afb119cd847e88e4974bdbe8068ceef3f73f", size = 826612 },
{ url = "https://files.pythonhosted.org/packages/41/31/46670fc0c7805d42bc89702440fa9b73491d68abbc39e28d687180755178/zopfli-0.2.3.post1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a53b18797cdef27e019db595d66c4b077325afe2fd62145953275f53d84ce40c", size = 851148 },
{ url = "https://files.pythonhosted.org/packages/22/00/71ad39277bbb88f9fd20fb786bd3ff2ea4025c53b31652a0da796fb546cd/zopfli-0.2.3.post1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:b78008a69300d929ca2efeffec951b64a312e9a811e265ea4a907ab546d79fa6", size = 1754215 },
{ url = "https://files.pythonhosted.org/packages/d0/4e/e542c508d20c3dfbef1b90fcf726f824f505e725747f777b0b7b7d1deb95/zopfli-0.2.3.post1-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:0aa5f90d6298bda02a95bc8dc8c3c19004d5a4e44bda00b67ca7431d857b4b54", size = 1905988 },
{ url = "https://files.pythonhosted.org/packages/ba/a5/817ac1ecc888723e91dc172e8c6eeab9f48a1e52285803b965084e11bbd5/zopfli-0.2.3.post1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:2768c877f76c8a0e7519b1c86c93757f3c01492ddde55751e9988afb7eff64e1", size = 1835907 },
{ url = "https://files.pythonhosted.org/packages/cd/35/2525f90c972d8aafc39784a8c00244eeee8e8221b26cbc576748ee9dc1cd/zopfli-0.2.3.post1-cp313-cp313-win32.whl", hash = "sha256:71390dbd3fbf6ebea9a5d85ffed8c26ee1453ee09248e9b88486e30e0397b775", size = 82742 },
{ url = "https://files.pythonhosted.org/packages/2f/c6/49b27570923956d52d37363e8f5df3a31a61bd7719bb8718527a9df3ae5f/zopfli-0.2.3.post1-cp313-cp313-win_amd64.whl", hash = "sha256:a86eb88e06bd87e1fff31dac878965c26b0c26db59ddcf78bb0379a954b120de", size = 99408 },
]
[[package]]
name = "zstandard"
version = "0.23.0"

View file

@ -1,7 +1,7 @@
{
"name": "surfsense",
"displayName": "Surfsense",
"version": "0.0.1",
"name": "surfsense_browser_extension",
"displayName": "Surfsense Browser Extension",
"version": "0.0.6",
"description": "Extension to collect Browsing History for SurfSense.",
"author": "https://github.com/MODSetter",
"scripts": {

View file

@ -39,3 +39,6 @@ yarn-error.log*
# typescript
*.tsbuildinfo
next-env.d.ts
# source
/.source/

View file

@ -8,8 +8,15 @@ RUN npm install -g pnpm
# Copy package files
COPY package.json pnpm-lock.yaml ./
# Install dependencies
RUN pnpm install
# First copy the config file to avoid fumadocs-mdx postinstall error
COPY source.config.ts ./
COPY content ./content
# Install dependencies with --ignore-scripts to skip postinstall
RUN pnpm install --ignore-scripts
# Now run the postinstall script manually
RUN pnpm fumadocs-mdx
# Copy source code
COPY . .

View file

@ -0,0 +1,4 @@
import { source } from '@/lib/source';
import { createFromSource } from 'fumadocs-core/search/server';
export const { GET } = createFromSource(source);

View file

@ -7,12 +7,15 @@ interface PageProps {
};
}
export default function ChatsPage({ params }: PageProps) {
export default async function ChatsPage({ params }: PageProps) {
// Await params to properly access dynamic route parameters
const searchSpaceId = params.search_space_id;
return (
<Suspense fallback={<div className="flex items-center justify-center h-[60vh]">
<div className="h-8 w-8 animate-spin rounded-full border-4 border-primary border-t-transparent"></div>
</div>}>
<ChatsPageClient searchSpaceId={params.search_space_id} />
<ChatsPageClient searchSpaceId={searchSpaceId} />
</Suspense>
);
}

View file

@ -44,6 +44,8 @@ const getConnectorTypeDisplay = (type: string): string => {
"TAVILY_API": "Tavily API",
"SLACK_CONNECTOR": "Slack",
"NOTION_CONNECTOR": "Notion",
"GITHUB_CONNECTOR": "GitHub",
"LINEAR_CONNECTOR": "Linear",
// Add other connector types here as needed
};
return typeMap[type] || type;
@ -204,7 +206,7 @@ export default function ConnectorsPage() {
<Button
variant="outline"
size="sm"
onClick={() => router.push(`/dashboard/${searchSpaceId}/connectors/${connector.id}`)}
onClick={() => router.push(`/dashboard/${searchSpaceId}/connectors/${connector.id}/edit`)}
>
<Edit className="h-4 w-4" />
<span className="sr-only">Edit</span>
@ -253,4 +255,4 @@ export default function ConnectorsPage() {
</Card>
</div>
);
}
}

View file

@ -0,0 +1,176 @@
"use client";
import React, { useEffect } from 'react';
import { useRouter, useParams } from "next/navigation";
import { motion } from "framer-motion";
import { toast } from "sonner";
import { ArrowLeft, Check, Loader2, Github } from "lucide-react";
import { Form } from "@/components/ui/form";
import { Button } from "@/components/ui/button";
import { Card, CardContent, CardDescription, CardFooter, CardHeader, CardTitle } from "@/components/ui/card";
// Import Utils, Types, Hook, and Components
import { getConnectorTypeDisplay } from '@/lib/connectors/utils';
import { useConnectorEditPage } from '@/hooks/useConnectorEditPage';
import { EditConnectorLoadingSkeleton } from "@/components/editConnector/EditConnectorLoadingSkeleton";
import { EditConnectorNameForm } from "@/components/editConnector/EditConnectorNameForm";
import { EditGitHubConnectorConfig } from "@/components/editConnector/EditGitHubConnectorConfig";
import { EditSimpleTokenForm } from "@/components/editConnector/EditSimpleTokenForm";
export default function EditConnectorPage() {
const router = useRouter();
const params = useParams();
const searchSpaceId = params.search_space_id as string;
// Ensure connectorId is parsed safely
const connectorIdParam = params.connector_id as string;
const connectorId = connectorIdParam ? parseInt(connectorIdParam, 10) : NaN;
// Use the custom hook to manage state and logic
const {
connectorsLoading,
connector,
isSaving,
editForm,
patForm, // Needed for GitHub child component
handleSaveChanges,
// GitHub specific props for the child component
editMode,
setEditMode, // Pass down if needed by GitHub component
originalPat,
currentSelectedRepos,
fetchedRepos,
setFetchedRepos,
newSelectedRepos,
setNewSelectedRepos,
isFetchingRepos,
handleFetchRepositories,
handleRepoSelectionChange,
} = useConnectorEditPage(connectorId, searchSpaceId);
// Redirect if connectorId is not a valid number after parsing
useEffect(() => {
if (isNaN(connectorId)) {
toast.error("Invalid Connector ID.");
router.push(`/dashboard/${searchSpaceId}/connectors`);
}
}, [connectorId, router, searchSpaceId]);
// Loading State
if (connectorsLoading || !connector) {
// Handle NaN case before showing skeleton
if (isNaN(connectorId)) return null;
return <EditConnectorLoadingSkeleton />;
}
// Main Render using data/handlers from the hook
return (
<div className="container mx-auto py-8 max-w-3xl">
<Button variant="ghost" className="mb-6" onClick={() => router.push(`/dashboard/${searchSpaceId}/connectors`)}>
<ArrowLeft className="mr-2 h-4 w-4" /> Back to Connectors
</Button>
<motion.div initial={{ opacity: 0, y: 20 }} animate={{ opacity: 1, y: 0 }} transition={{ duration: 0.5 }}>
<Card className="border-2 border-border">
<CardHeader>
<CardTitle className="text-2xl font-bold flex items-center gap-2">
<Github className="h-6 w-6" /> {/* TODO: Dynamic icon */}
Edit {getConnectorTypeDisplay(connector.connector_type)} Connector
</CardTitle>
<CardDescription>Modify connector name and configuration.</CardDescription>
</CardHeader>
<Form {...editForm}>
{/* Pass hook's handleSaveChanges */}
<form onSubmit={editForm.handleSubmit(handleSaveChanges)} className="space-y-6">
<CardContent className="space-y-6">
{/* Pass form control from hook */}
<EditConnectorNameForm control={editForm.control} />
<hr />
<h3 className="text-lg font-semibold">Configuration</h3>
{/* == GitHub == */}
{connector.connector_type === 'GITHUB_CONNECTOR' && (
<EditGitHubConnectorConfig
// Pass relevant state and handlers from hook
editMode={editMode}
setEditMode={setEditMode} // Pass setter if child manages mode
originalPat={originalPat}
currentSelectedRepos={currentSelectedRepos}
fetchedRepos={fetchedRepos}
newSelectedRepos={newSelectedRepos}
isFetchingRepos={isFetchingRepos}
patForm={patForm}
handleFetchRepositories={handleFetchRepositories}
handleRepoSelectionChange={handleRepoSelectionChange}
setNewSelectedRepos={setNewSelectedRepos}
setFetchedRepos={setFetchedRepos}
/>
)}
{/* == Slack == */}
{connector.connector_type === 'SLACK_CONNECTOR' && (
<EditSimpleTokenForm
control={editForm.control}
fieldName="SLACK_BOT_TOKEN"
fieldLabel="Slack Bot Token"
fieldDescription="Update the Slack Bot Token if needed."
placeholder="Begins with xoxb-..."
/>
)}
{/* == Notion == */}
{connector.connector_type === 'NOTION_CONNECTOR' && (
<EditSimpleTokenForm
control={editForm.control}
fieldName="NOTION_INTEGRATION_TOKEN"
fieldLabel="Notion Integration Token"
fieldDescription="Update the Notion Integration Token if needed."
placeholder="Begins with secret_..."
/>
)}
{/* == Serper == */}
{connector.connector_type === 'SERPER_API' && (
<EditSimpleTokenForm
control={editForm.control}
fieldName="SERPER_API_KEY"
fieldLabel="Serper API Key"
fieldDescription="Update the Serper API Key if needed."
/>
)}
{/* == Tavily == */}
{connector.connector_type === 'TAVILY_API' && (
<EditSimpleTokenForm
control={editForm.control}
fieldName="TAVILY_API_KEY"
fieldLabel="Tavily API Key"
fieldDescription="Update the Tavily API Key if needed."
/>
)}
{/* == Linear == */}
{connector.connector_type === 'LINEAR_CONNECTOR' && (
<EditSimpleTokenForm
control={editForm.control}
fieldName="LINEAR_API_KEY"
fieldLabel="Linear API Key"
fieldDescription="Update your Linear API Key if needed."
placeholder="Begins with lin_api_..."
/>
)}
</CardContent>
<CardFooter className="border-t pt-6">
<Button type="submit" disabled={isSaving} className="w-full sm:w-auto">
{isSaving ? <Loader2 className="mr-2 h-4 w-4 animate-spin" /> : <Check className="mr-2 h-4 w-4" />}
Save Changes
</Button>
</CardFooter>
</form>
</Form>
</Card>
</motion.div>
</div>
);
}

View file

@ -51,6 +51,7 @@ const getConnectorTypeDisplay = (type: string): string => {
"TAVILY_API": "Tavily API",
"SLACK_CONNECTOR": "Slack Connector",
"NOTION_CONNECTOR": "Notion Connector",
"GITHUB_CONNECTOR": "GitHub Connector",
// Add other connector types here as needed
};
return typeMap[type] || type;
@ -69,7 +70,7 @@ export default function EditConnectorPage() {
const [connector, setConnector] = useState<SearchSourceConnector | null>(null);
const [isLoading, setIsLoading] = useState(true);
const [isSubmitting, setIsSubmitting] = useState(false);
console.log("connector", connector);
// Initialize the form
const form = useForm<ApiConnectorFormValues>({
resolver: zodResolver(apiConnectorFormSchema),
@ -85,7 +86,8 @@ export default function EditConnectorPage() {
"SERPER_API": "SERPER_API_KEY",
"TAVILY_API": "TAVILY_API_KEY",
"SLACK_CONNECTOR": "SLACK_BOT_TOKEN",
"NOTION_CONNECTOR": "NOTION_INTEGRATION_TOKEN"
"NOTION_CONNECTOR": "NOTION_INTEGRATION_TOKEN",
"GITHUB_CONNECTOR": "GITHUB_PAT"
};
return fieldMap[connectorType] || "";
};
@ -136,6 +138,8 @@ export default function EditConnectorPage() {
name: values.name,
connector_type: connector.connector_type,
config: updatedConfig,
is_indexable: connector.is_indexable,
last_indexed_at: connector.last_indexed_at,
});
toast.success("Connector updated successfully!");
@ -223,17 +227,21 @@ export default function EditConnectorPage() {
? "Slack Bot Token"
: connector?.connector_type === "NOTION_CONNECTOR"
? "Notion Integration Token"
: "API Key"}
: connector?.connector_type === "GITHUB_CONNECTOR"
? "GitHub Personal Access Token (PAT)"
: "API Key"}
</FormLabel>
<FormControl>
<Input
type="password"
placeholder={
connector?.connector_type === "SLACK_CONNECTOR"
? "Enter your Slack Bot Token"
? "Enter new Slack Bot Token (optional)"
: connector?.connector_type === "NOTION_CONNECTOR"
? "Enter your Notion Integration Token"
: "Enter your API key"
? "Enter new Notion Token (optional)"
: connector?.connector_type === "GITHUB_CONNECTOR"
? "Enter new GitHub PAT (optional)"
: "Enter new API key (optional)"
}
{...field}
/>
@ -243,7 +251,9 @@ export default function EditConnectorPage() {
? "Enter a new Slack Bot Token or leave blank to keep your existing token."
: connector?.connector_type === "NOTION_CONNECTOR"
? "Enter a new Notion Integration Token or leave blank to keep your existing token."
: "Enter a new API key or leave blank to keep your existing key."}
: connector?.connector_type === "GITHUB_CONNECTOR"
? "Enter a new GitHub PAT or leave blank to keep your existing token."
: "Enter a new API key or leave blank to keep your existing key."}
</FormDescription>
<FormMessage />
</FormItem>
@ -276,4 +286,4 @@ export default function EditConnectorPage() {
</motion.div>
</div>
);
}
}

View file

@ -0,0 +1,456 @@
"use client";
import { useState } from "react";
import { useRouter, useParams } from "next/navigation";
import { motion } from "framer-motion";
import { zodResolver } from "@hookform/resolvers/zod";
import { useForm } from "react-hook-form";
import * as z from "zod";
import { toast } from "sonner";
import { ArrowLeft, Check, Info, Loader2, Github, CircleAlert, ListChecks } from "lucide-react";
// Assuming useSearchSourceConnectors hook exists and works similarly
import { useSearchSourceConnectors } from "@/hooks/useSearchSourceConnectors";
import {
Form,
FormControl,
FormDescription,
FormField,
FormItem,
FormLabel,
FormMessage,
} from "@/components/ui/form";
import { Input } from "@/components/ui/input";
import { Button } from "@/components/ui/button";
import {
Card,
CardContent,
CardDescription,
CardFooter,
CardHeader,
CardTitle,
} from "@/components/ui/card";
import {
Alert,
AlertDescription,
AlertTitle,
} from "@/components/ui/alert";
import {
Accordion,
AccordionContent,
AccordionItem,
AccordionTrigger,
} from "@/components/ui/accordion";
import { Tabs, TabsContent, TabsList, TabsTrigger } from "@/components/ui/tabs";
import { Checkbox } from "@/components/ui/checkbox";
// Define the form schema with Zod for GitHub PAT entry step
const githubPatFormSchema = z.object({
name: z.string().min(3, {
message: "Connector name must be at least 3 characters.",
}),
github_pat: z.string()
.min(20, { // Apply min length first
message: "GitHub Personal Access Token seems too short.",
})
.refine(pat => pat.startsWith('ghp_') || pat.startsWith('github_pat_'), { // Then refine the pattern
message: "GitHub PAT should start with 'ghp_' or 'github_pat_'",
}),
});
// Define the type for the form values
type GithubPatFormValues = z.infer<typeof githubPatFormSchema>;
// Type for fetched GitHub repositories
interface GithubRepo {
id: number;
name: string;
full_name: string;
private: boolean;
url: string;
description: string | null;
last_updated: string | null;
}
export default function GithubConnectorPage() {
const router = useRouter();
const params = useParams();
const searchSpaceId = params.search_space_id as string;
const [step, setStep] = useState<'enter_pat' | 'select_repos'>('enter_pat');
const [isFetchingRepos, setIsFetchingRepos] = useState(false);
const [isCreatingConnector, setIsCreatingConnector] = useState(false);
const [repositories, setRepositories] = useState<GithubRepo[]>([]);
const [selectedRepos, setSelectedRepos] = useState<string[]>([]);
const [connectorName, setConnectorName] = useState<string>("GitHub Connector");
const [validatedPat, setValidatedPat] = useState<string>(""); // Store the validated PAT
const { createConnector } = useSearchSourceConnectors();
// Initialize the form for PAT entry
const form = useForm<GithubPatFormValues>({
resolver: zodResolver(githubPatFormSchema),
defaultValues: {
name: connectorName,
github_pat: "",
},
});
// Function to fetch repositories using the new backend endpoint
const fetchRepositories = async (values: GithubPatFormValues) => {
setIsFetchingRepos(true);
setConnectorName(values.name); // Store the name
setValidatedPat(values.github_pat); // Store the PAT temporarily
try {
const token = localStorage.getItem('surfsense_bearer_token');
if (!token) {
throw new Error('No authentication token found');
}
const response = await fetch(
`${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/github/repositories/`,
{
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${token}`
},
body: JSON.stringify({ github_pat: values.github_pat })
}
);
if (!response.ok) {
const errorData = await response.json();
throw new Error(errorData.detail || `Failed to fetch repositories: ${response.statusText}`);
}
const data: GithubRepo[] = await response.json();
setRepositories(data);
setStep('select_repos'); // Move to the next step
toast.success(`Found ${data.length} repositories.`);
} catch (error) {
console.error("Error fetching GitHub repositories:", error);
const errorMessage = error instanceof Error ? error.message : "Failed to fetch repositories. Please check the PAT and try again.";
toast.error(errorMessage);
} finally {
setIsFetchingRepos(false);
}
};
// Handle final connector creation
const handleCreateConnector = async () => {
if (selectedRepos.length === 0) {
toast.warning("Please select at least one repository to index.");
return;
}
setIsCreatingConnector(true);
try {
await createConnector({
name: connectorName, // Use the stored name
connector_type: "GITHUB_CONNECTOR",
config: {
GITHUB_PAT: validatedPat, // Use the stored validated PAT
repo_full_names: selectedRepos, // Add the selected repo names
},
is_indexable: true,
last_indexed_at: null,
});
toast.success("GitHub connector created successfully!");
router.push(`/dashboard/${searchSpaceId}/connectors`);
} catch (error) {
console.error("Error creating GitHub connector:", error);
const errorMessage = error instanceof Error ? error.message : "Failed to create GitHub connector.";
toast.error(errorMessage);
} finally {
setIsCreatingConnector(false);
}
};
// Handle checkbox changes
const handleRepoSelection = (repoFullName: string, checked: boolean) => {
setSelectedRepos(prev =>
checked
? [...prev, repoFullName]
: prev.filter(name => name !== repoFullName)
);
};
return (
<div className="container mx-auto py-8 max-w-3xl">
<Button
variant="ghost"
className="mb-6"
onClick={() => {
if (step === 'select_repos') {
// Go back to PAT entry, clear sensitive/fetched data
setStep('enter_pat');
setRepositories([]);
setSelectedRepos([]);
setValidatedPat("");
// Reset form PAT field, keep name
form.reset({ name: connectorName, github_pat: "" });
} else {
router.push(`/dashboard/${searchSpaceId}/connectors/add`);
}
}}
>
<ArrowLeft className="mr-2 h-4 w-4" />
{step === 'select_repos' ? "Back to PAT Entry" : "Back to Add Connectors"}
</Button>
<motion.div
initial={{ opacity: 0, y: 20 }}
animate={{ opacity: 1, y: 0 }}
transition={{ duration: 0.5 }}
>
<Tabs defaultValue="connect" className="w-full">
<TabsList className="grid w-full grid-cols-2 mb-6">
<TabsTrigger value="connect">Connect GitHub</TabsTrigger>
<TabsTrigger value="documentation">Setup Guide</TabsTrigger>
</TabsList>
<TabsContent value="connect">
<Card className="border-2 border-border">
<CardHeader>
<CardTitle className="text-2xl font-bold flex items-center gap-2">
{step === 'enter_pat' ? <Github className="h-6 w-6" /> : <ListChecks className="h-6 w-6" />}
{step === 'enter_pat' ? "Connect GitHub Account" : "Select Repositories to Index"}
</CardTitle>
<CardDescription>
{step === 'enter_pat'
? "Provide a name and GitHub Personal Access Token (PAT) to fetch accessible repositories."
: `Select which repositories you want SurfSense to index for search. Found ${repositories.length} repositories accessible via your PAT.`
}
</CardDescription>
</CardHeader>
<Form {...form}>
{step === 'enter_pat' && (
<CardContent>
<Alert className="mb-6 bg-muted">
<Info className="h-4 w-4" />
<AlertTitle>GitHub Personal Access Token (PAT) Required</AlertTitle>
<AlertDescription>
You'll need a GitHub PAT with the appropriate scopes (e.g., 'repo') to fetch repositories. You can create one from your{' '}
<a
href="https://github.com/settings/personal-access-tokens"
target="_blank"
rel="noopener noreferrer"
className="font-medium underline underline-offset-4"
>
GitHub Developer Settings
</a>. The PAT will be used to fetch repositories and then stored securely to enable indexing.
</AlertDescription>
</Alert>
<form onSubmit={form.handleSubmit(fetchRepositories)} className="space-y-6">
<FormField
control={form.control}
name="name"
render={({ field }) => (
<FormItem>
<FormLabel>Connector Name</FormLabel>
<FormControl>
<Input placeholder="My GitHub Connector" {...field} />
</FormControl>
<FormDescription>
A friendly name to identify this GitHub connection.
</FormDescription>
<FormMessage />
</FormItem>
)}
/>
<FormField
control={form.control}
name="github_pat"
render={({ field }) => (
<FormItem>
<FormLabel>GitHub Personal Access Token (PAT)</FormLabel>
<FormControl>
<Input
type="password"
placeholder="ghp_... or github_pat_..."
{...field}
/>
</FormControl>
<FormDescription>
Enter your GitHub PAT here to fetch your repositories. It will be stored encrypted later.
</FormDescription>
<FormMessage />
</FormItem>
)}
/>
<div className="flex justify-end">
<Button
type="submit"
disabled={isFetchingRepos}
className="w-full sm:w-auto"
>
{isFetchingRepos ? (
<>
<Loader2 className="mr-2 h-4 w-4 animate-spin" />
Fetching Repositories...
</>
) : (
"Fetch Repositories"
)}
</Button>
</div>
</form>
</CardContent>
)}
{step === 'select_repos' && (
<CardContent>
{repositories.length === 0 ? (
<Alert variant="destructive">
<CircleAlert className="h-4 w-4" />
<AlertTitle>No Repositories Found</AlertTitle>
<AlertDescription>
No repositories were found or accessible with the provided PAT. Please check the token and its permissions, then go back and try again.
</AlertDescription>
</Alert>
) : (
<div className="space-y-4">
<FormLabel>Repositories ({selectedRepos.length} selected)</FormLabel>
<div className="h-64 w-full rounded-md border p-4 overflow-y-auto">
{repositories.map((repo) => (
<div key={repo.id} className="flex items-center space-x-2 mb-2 py-1">
<Checkbox
id={`repo-${repo.id}`}
checked={selectedRepos.includes(repo.full_name)}
onCheckedChange={(checked) => handleRepoSelection(repo.full_name, !!checked)}
/>
<label
htmlFor={`repo-${repo.id}`}
className="text-sm font-medium leading-none peer-disabled:cursor-not-allowed peer-disabled:opacity-70"
>
{repo.full_name} {repo.private && "(Private)"}
</label>
</div>
))}
</div>
<FormDescription>
Select the repositories you wish to index. Only checked repositories will be processed.
</FormDescription>
<div className="flex justify-between items-center pt-4">
<Button
variant="outline"
onClick={() => {
setStep('enter_pat');
setRepositories([]);
setSelectedRepos([]);
setValidatedPat("");
form.reset({ name: connectorName, github_pat: "" });
}}
>
Back
</Button>
<Button
onClick={handleCreateConnector}
disabled={isCreatingConnector || selectedRepos.length === 0}
className="w-full sm:w-auto"
>
{isCreatingConnector ? (
<>
<Loader2 className="mr-2 h-4 w-4 animate-spin" />
Creating Connector...
</>
) : (
<>
<Check className="mr-2 h-4 w-4" />
Create Connector
</>
)}
</Button>
</div>
</div>
)}
</CardContent>
)}
</Form>
<CardFooter className="flex flex-col items-start border-t bg-muted/50 px-6 py-4">
<h4 className="text-sm font-medium">What you get with GitHub integration:</h4>
<ul className="mt-2 list-disc pl-5 text-sm text-muted-foreground">
<li>Search through code and documentation in your selected repositories</li>
<li>Access READMEs, Markdown files, and common code files</li>
<li>Connect your project knowledge directly to your search space</li>
<li>Index your selected repositories for enhanced search capabilities</li>
</ul>
</CardFooter>
</Card>
</TabsContent>
<TabsContent value="documentation">
<Card className="border-2 border-border">
<CardHeader>
<CardTitle className="text-2xl font-bold">GitHub Connector Setup Guide</CardTitle>
<CardDescription>
Learn how to generate a Personal Access Token (PAT) and connect your GitHub account.
</CardDescription>
</CardHeader>
<CardContent className="space-y-6">
<div>
<h3 className="text-xl font-semibold mb-2">How it works</h3>
<p className="text-muted-foreground">
The GitHub connector uses a Personal Access Token (PAT) to authenticate with the GitHub API. First, it fetches a list of repositories accessible to the token. You then select which repositories you want to index. The connector indexes relevant files (code, markdown, text) from only the selected repositories.
</p>
<ul className="mt-2 list-disc pl-5 text-muted-foreground">
<li>The connector indexes files based on common code and documentation extensions.</li>
<li>Large files (over 1MB) are skipped during indexing.</li>
<li>Only selected repositories are indexed.</li>
<li>Indexing runs periodically (check connector settings for frequency) to keep content up-to-date.</li>
</ul>
</div>
<Accordion type="single" collapsible className="w-full">
<AccordionItem value="create_pat">
<AccordionTrigger className="text-lg font-medium">Step 1: Generate GitHub PAT</AccordionTrigger>
<AccordionContent>
<div className="space-y-6">
<div>
<h4 className="font-medium mb-2">Generating a Token:</h4>
<ol className="list-decimal pl-5 space-y-3">
<li>Go to your GitHub <a href="https://github.com/settings/tokens" target="_blank" rel="noopener noreferrer" className="font-medium underline underline-offset-4">Developer settings</a>.</li>
<li>Click on <strong>Personal access tokens</strong>, then choose <strong>Tokens (classic)</strong> or <strong>Fine-grained tokens</strong> (recommended if available and suitable).</li>
<li>Click <strong>Generate new token</strong> (and choose the appropriate type).</li>
<li>Give your token a descriptive name (e.g., "SurfSense Connector").</li>
<li>Set an expiration date for the token (recommended for security).</li>
<li>Under <strong>Select scopes</strong> (for classic tokens) or <strong>Repository access</strong> (for fine-grained), grant the necessary permissions. At minimum, the <strong>`repo`</strong> scope (or equivalent read access to repositories for fine-grained tokens) is required to read repository content.</li>
<li>Click <strong>Generate token</strong>.</li>
<li><strong>Important:</strong> Copy your new PAT immediately. You won't be able to see it again after leaving the page.</li>
</ol>
</div>
</div>
</AccordionContent>
</AccordionItem>
<AccordionItem value="connect_app">
<AccordionTrigger className="text-lg font-medium">Step 2: Connect in SurfSense</AccordionTrigger>
<AccordionContent className="space-y-4">
<ol className="list-decimal pl-5 space-y-3">
<li>Navigate to the "Connect GitHub" tab.</li>
<li>Enter a name for your connector.</li>
<li>Paste the copied GitHub PAT into the "GitHub Personal Access Token (PAT)" field.</li>
<li>Click <strong>Fetch Repositories</strong>.</li>
<li>If the PAT is valid, you'll see a list of your accessible repositories.</li>
<li>Select the repositories you want SurfSense to index using the checkboxes.</li>
<li>Click the <strong>Create Connector</strong> button.</li>
<li>If the connection is successful, you will be redirected and can start indexing from the Connectors page.</li>
</ol>
</AccordionContent>
</AccordionItem>
</Accordion>
</CardContent>
</Card>
</TabsContent>
</Tabs>
</motion.div>
</div>
);
}

View file

@ -0,0 +1,321 @@
"use client";
import { useState } from "react";
import { useRouter, useParams } from "next/navigation";
import { motion } from "framer-motion";
import { zodResolver } from "@hookform/resolvers/zod";
import { useForm } from "react-hook-form";
import * as z from "zod";
import { toast } from "sonner";
import { ArrowLeft, Check, Info, Loader2 } from "lucide-react";
import { useSearchSourceConnectors } from "@/hooks/useSearchSourceConnectors";
import {
Form,
FormControl,
FormDescription,
FormField,
FormItem,
FormLabel,
FormMessage,
} from "@/components/ui/form";
import { Input } from "@/components/ui/input";
import { Button } from "@/components/ui/button";
import {
Card,
CardContent,
CardDescription,
CardFooter,
CardHeader,
CardTitle,
} from "@/components/ui/card";
import {
Alert,
AlertDescription,
AlertTitle,
} from "@/components/ui/alert";
import {
Accordion,
AccordionContent,
AccordionItem,
AccordionTrigger,
} from "@/components/ui/accordion";
import { Tabs, TabsContent, TabsList, TabsTrigger } from "@/components/ui/tabs";
// Define the form schema with Zod
const linearConnectorFormSchema = z.object({
name: z.string().min(3, {
message: "Connector name must be at least 3 characters.",
}),
api_key: z.string().min(10, {
message: "Linear API Key is required and must be valid.",
}).regex(/^lin_api_/, {
message: "Linear API Key should start with 'lin_api_'",
}),
});
// Define the type for the form values
type LinearConnectorFormValues = z.infer<typeof linearConnectorFormSchema>;
export default function LinearConnectorPage() {
const router = useRouter();
const params = useParams();
const searchSpaceId = params.search_space_id as string;
const [isSubmitting, setIsSubmitting] = useState(false);
const { createConnector } = useSearchSourceConnectors();
// Initialize the form
const form = useForm<LinearConnectorFormValues>({
resolver: zodResolver(linearConnectorFormSchema),
defaultValues: {
name: "Linear Connector",
api_key: "",
},
});
// Handle form submission
const onSubmit = async (values: LinearConnectorFormValues) => {
setIsSubmitting(true);
try {
await createConnector({
name: values.name,
connector_type: "LINEAR_CONNECTOR",
config: {
LINEAR_API_KEY: values.api_key,
},
is_indexable: true,
last_indexed_at: null,
});
toast.success("Linear connector created successfully!");
// Navigate back to connectors page
router.push(`/dashboard/${searchSpaceId}/connectors`);
} catch (error) {
console.error("Error creating connector:", error);
toast.error(error instanceof Error ? error.message : "Failed to create connector");
} finally {
setIsSubmitting(false);
}
};
return (
<div className="container mx-auto py-8 max-w-3xl">
<Button
variant="ghost"
className="mb-6"
onClick={() => router.push(`/dashboard/${searchSpaceId}/connectors/add`)}
>
<ArrowLeft className="mr-2 h-4 w-4" />
Back to Connectors
</Button>
<motion.div
initial={{ opacity: 0, y: 20 }}
animate={{ opacity: 1, y: 0 }}
transition={{ duration: 0.5 }}
>
<Tabs defaultValue="connect" className="w-full">
<TabsList className="grid w-full grid-cols-2 mb-6">
<TabsTrigger value="connect">Connect</TabsTrigger>
<TabsTrigger value="documentation">Documentation</TabsTrigger>
</TabsList>
<TabsContent value="connect">
<Card className="border-2 border-border">
<CardHeader>
<CardTitle className="text-2xl font-bold">Connect Linear Workspace</CardTitle>
<CardDescription>
Integrate with Linear to search and retrieve information from your issues and comments. This connector can index your Linear content for search.
</CardDescription>
</CardHeader>
<CardContent>
<Alert className="mb-6 bg-muted">
<Info className="h-4 w-4" />
<AlertTitle>Linear API Key Required</AlertTitle>
<AlertDescription>
You'll need a Linear API Key to use this connector. You can create a Linear API key from{" "}
<a
href="https://linear.app/settings/api"
target="_blank"
rel="noopener noreferrer"
className="font-medium underline underline-offset-4"
>
Linear API Settings
</a>
</AlertDescription>
</Alert>
<Form {...form}>
<form onSubmit={form.handleSubmit(onSubmit)} className="space-y-6">
<FormField
control={form.control}
name="name"
render={({ field }) => (
<FormItem>
<FormLabel>Connector Name</FormLabel>
<FormControl>
<Input placeholder="My Linear Connector" {...field} />
</FormControl>
<FormDescription>
A friendly name to identify this connector.
</FormDescription>
<FormMessage />
</FormItem>
)}
/>
<FormField
control={form.control}
name="api_key"
render={({ field }) => (
<FormItem>
<FormLabel>Linear API Key</FormLabel>
<FormControl>
<Input
type="password"
placeholder="lin_api_..."
{...field}
/>
</FormControl>
<FormDescription>
Your Linear API Key will be encrypted and stored securely. It typically starts with "lin_api_".
</FormDescription>
<FormMessage />
</FormItem>
)}
/>
<div className="flex justify-end">
<Button
type="submit"
disabled={isSubmitting}
className="w-full sm:w-auto"
>
{isSubmitting ? (
<>
<Loader2 className="mr-2 h-4 w-4 animate-spin" />
Connecting...
</>
) : (
<>
<Check className="mr-2 h-4 w-4" />
Connect Linear
</>
)}
</Button>
</div>
</form>
</Form>
</CardContent>
<CardFooter className="flex flex-col items-start border-t bg-muted/50 px-6 py-4">
<h4 className="text-sm font-medium">What you get with Linear integration:</h4>
<ul className="mt-2 list-disc pl-5 text-sm text-muted-foreground">
<li>Search through all your Linear issues and comments</li>
<li>Access issue titles, descriptions, and full discussion threads</li>
<li>Connect your team's project management directly to your search space</li>
<li>Keep your search results up-to-date with latest Linear content</li>
<li>Index your Linear issues for enhanced search capabilities</li>
</ul>
</CardFooter>
</Card>
</TabsContent>
<TabsContent value="documentation">
<Card className="border-2 border-border">
<CardHeader>
<CardTitle className="text-2xl font-bold">Linear Connector Documentation</CardTitle>
<CardDescription>
Learn how to set up and use the Linear connector to index your project management data.
</CardDescription>
</CardHeader>
<CardContent className="space-y-6">
<div>
<h3 className="text-xl font-semibold mb-2">How it works</h3>
<p className="text-muted-foreground">
The Linear connector uses the Linear GraphQL API to fetch all issues and comments that the API key has access to within a workspace.
</p>
<ul className="mt-2 list-disc pl-5 text-muted-foreground">
<li>For follow up indexing runs, the connector retrieves issues and comments that have been updated since the last indexing attempt.</li>
<li>Indexing is configured to run periodically, so updates should appear in your search results within minutes.</li>
</ul>
</div>
<Accordion type="single" collapsible className="w-full">
<AccordionItem value="authorization">
<AccordionTrigger className="text-lg font-medium">Authorization</AccordionTrigger>
<AccordionContent className="space-y-4">
<Alert className="bg-muted">
<Info className="h-4 w-4" />
<AlertTitle>Read-Only Access is Sufficient</AlertTitle>
<AlertDescription>
You only need a read-only API key for this connector to work. This limits the permissions to just reading your Linear data.
</AlertDescription>
</Alert>
<div className="space-y-6">
<div>
<h4 className="font-medium mb-2">Step 1: Create an API key</h4>
<ol className="list-decimal pl-5 space-y-3">
<li>Log in to your Linear account</li>
<li>Navigate to <a href="https://linear.app/settings/api" target="_blank" rel="noopener noreferrer" className="font-medium underline underline-offset-4">https://linear.app/settings/api</a> in your browser.</li>
<li>Alternatively, click on your profile picture Settings API</li>
<li>Click the <strong>+ New API key</strong> button.</li>
<li>Enter a description for your key (like "Search Connector").</li>
<li>Select "Read-only" as the permission.</li>
<li>Click <strong>Create</strong> to generate the API key.</li>
<li>Copy the generated API key that starts with 'lin_api_' as it will only be shown once.</li>
</ol>
</div>
<div>
<h4 className="font-medium mb-2">Step 2: Grant necessary access</h4>
<p className="text-muted-foreground mb-3">
The API key will have access to all issues and comments that your user account can see. If you're creating the key as an admin, it will have access to all issues in the workspace.
</p>
<Alert className="bg-muted">
<Info className="h-4 w-4" />
<AlertTitle>Data Privacy</AlertTitle>
<AlertDescription>
Only issues and comments will be indexed. Linear attachments and linked files are not indexed by this connector.
</AlertDescription>
</Alert>
</div>
</div>
</AccordionContent>
</AccordionItem>
<AccordionItem value="indexing">
<AccordionTrigger className="text-lg font-medium">Indexing</AccordionTrigger>
<AccordionContent className="space-y-4">
<ol className="list-decimal pl-5 space-y-3">
<li>Navigate to the Connector Dashboard and select the <strong>Linear</strong> Connector.</li>
<li>Place the <strong>API Key</strong> in the form field.</li>
<li>Click <strong>Connect</strong> to establish the connection.</li>
<li>Once connected, your Linear issues will be indexed automatically.</li>
</ol>
<Alert className="bg-muted">
<Info className="h-4 w-4" />
<AlertTitle>What Gets Indexed</AlertTitle>
<AlertDescription>
<p className="mb-2">The Linear connector indexes the following data:</p>
<ul className="list-disc pl-5">
<li>Issue titles and identifiers (e.g., PROJ-123)</li>
<li>Issue descriptions</li>
<li>Issue comments</li>
<li>Issue status and metadata</li>
</ul>
</AlertDescription>
</Alert>
</AccordionContent>
</AccordionItem>
</Accordion>
</CardContent>
</Card>
</TabsContent>
</Tabs>
</motion.div>
</div>
);
}

View file

@ -1,57 +1,61 @@
"use client";
import { cn } from "@/lib/utils";
import { Badge } from "@/components/ui/badge";
import { Button } from "@/components/ui/button";
import { Card, CardContent, CardFooter, CardHeader } from "@/components/ui/card";
import { Collapsible, CollapsibleContent, CollapsibleTrigger } from "@/components/ui/collapsible";
import {
IconBrandGoogle,
IconBrandSlack,
IconBrandWindows,
IconBrandDiscord,
IconSearch,
IconMessages,
IconDatabase,
IconCloud,
IconBrandGithub,
IconBrandNotion,
IconMail,
IconBrandSlack,
IconBrandWindows,
IconBrandZoom,
IconChevronDown,
IconChevronRight,
IconMail,
IconWorldWww,
IconTicket,
IconLayoutKanban,
} from "@tabler/icons-react";
import { motion, AnimatePresence } from "framer-motion";
import { useState } from "react";
import { useParams } from "next/navigation";
import { AnimatePresence, motion } from "framer-motion";
import Link from "next/link";
import { Button } from "@/components/ui/button";
import { Separator } from "@/components/ui/separator";
import { Collapsible, CollapsibleContent, CollapsibleTrigger } from "@/components/ui/collapsible";
import { useParams } from "next/navigation";
import { useState } from "react";
// Define the Connector type
interface Connector {
id: string;
title: string;
description: string;
icon: React.ReactNode;
status: "available" | "coming-soon" | "connected";
}
interface ConnectorCategory {
id: string;
title: string;
connectors: Connector[];
}
// Define connector categories and their connectors
const connectorCategories = [
const connectorCategories: ConnectorCategory[] = [
{
id: "search-engines",
title: "Search Engines",
description: "Connect to search engines to enhance your research capabilities.",
icon: <IconSearch className="h-5 w-5" />,
connectors: [
{
id: "tavily-api",
title: "Tavily Search API",
description: "Connect to Tavily Search API to search the web.",
icon: <IconSearch className="h-6 w-6" />,
title: "Tavily API",
description: "Search the web using the Tavily API",
icon: <IconWorldWww className="h-6 w-6" />,
status: "available",
},
{
id: "serper-api",
title: "Serper API",
description: "Connect to Serper API to search the web.",
icon: <IconBrandGoogle className="h-6 w-6" />,
status: "coming-soon",
},
// Add other search engine connectors like Tavily, Serper if they have UI config
],
},
{
id: "team-chats",
title: "Team Chats",
description: "Connect to your team communication platforms.",
icon: <IconMessages className="h-5 w-5" />,
connectors: [
{
id: "slack-connector",
@ -76,11 +80,29 @@ const connectorCategories = [
},
],
},
{
id: "project-management",
title: "Project Management",
connectors: [
{
id: "linear-connector",
title: "Linear",
description: "Connect to Linear to search issues, comments and project data.",
icon: <IconLayoutKanban className="h-6 w-6" />,
status: "available",
},
{
id: "jira-connector",
title: "Jira",
description: "Connect to Jira to search issues, tickets and project data.",
icon: <IconTicket className="h-6 w-6" />,
status: "coming-soon",
},
],
},
{
id: "knowledge-bases",
title: "Knowledge Bases",
description: "Connect to your knowledge bases and documentation.",
icon: <IconDatabase className="h-5 w-5" />,
connectors: [
{
id: "notion-connector",
@ -90,19 +112,17 @@ const connectorCategories = [
status: "available",
},
{
id: "github",
id: "github-connector",
title: "GitHub",
description: "Connect to GitHub repositories to access code and documentation.",
description: "Connect a GitHub PAT to index code and docs from accessible repositories.",
icon: <IconBrandGithub className="h-6 w-6" />,
status: "coming-soon",
status: "available",
},
],
},
{
id: "communication",
title: "Communication",
description: "Connect to your email and meeting platforms.",
icon: <IconMail className="h-5 w-5" />,
connectors: [
{
id: "gmail",
@ -122,10 +142,48 @@ const connectorCategories = [
},
];
// Animation variants
const fadeIn = {
hidden: { opacity: 0 },
visible: { opacity: 1, transition: { duration: 0.4 } }
};
const staggerContainer = {
hidden: { opacity: 0 },
visible: {
opacity: 1,
transition: {
staggerChildren: 0.1
}
}
};
const cardVariants = {
hidden: { opacity: 0, y: 20 },
visible: {
opacity: 1,
y: 0,
transition: {
type: "spring",
stiffness: 260,
damping: 20
}
},
hover: {
scale: 1.02,
boxShadow: "0 10px 15px -3px rgba(0, 0, 0, 0.1), 0 4px 6px -2px rgba(0, 0, 0, 0.05)",
transition: {
type: "spring",
stiffness: 400,
damping: 10
}
}
};
export default function ConnectorsPage() {
const params = useParams();
const searchSpaceId = params.search_space_id as string;
const [expandedCategories, setExpandedCategories] = useState<string[]>(["search-engines"]);
const [expandedCategories, setExpandedCategories] = useState<string[]>(["search-engines", "knowledge-bases", "project-management"]);
const toggleCategory = (categoryId: string) => {
setExpandedCategories(prev =>
@ -136,121 +194,142 @@ export default function ConnectorsPage() {
};
return (
<div className="container mx-auto py-8 max-w-6xl">
<div className="container mx-auto py-12 max-w-6xl">
<motion.div
initial={{ opacity: 0, y: 20 }}
initial={{ opacity: 0, y: 30 }}
animate={{ opacity: 1, y: 0 }}
transition={{ duration: 0.5 }}
className="mb-8 text-center"
transition={{
duration: 0.6,
ease: [0.22, 1, 0.36, 1]
}}
className="mb-12 text-center"
>
<h1 className="text-3xl font-bold tracking-tight">Connect Your Tools</h1>
<p className="text-muted-foreground mt-2">
<h1 className="text-4xl font-bold tracking-tight bg-gradient-to-r from-indigo-500 to-purple-500 bg-clip-text text-transparent">
Connect Your Tools
</h1>
<p className="text-muted-foreground mt-3 text-lg max-w-2xl mx-auto">
Integrate with your favorite services to enhance your research capabilities.
</p>
</motion.div>
<div className="space-y-6">
{connectorCategories.map((category, categoryIndex) => (
<Collapsible
<motion.div
className="space-y-8"
initial="hidden"
animate="visible"
variants={staggerContainer}
>
{connectorCategories.map((category) => (
<motion.div
key={category.id}
open={expandedCategories.includes(category.id)}
onOpenChange={() => toggleCategory(category.id)}
className="border rounded-lg overflow-hidden bg-card"
variants={fadeIn}
className="rounded-lg border bg-card text-card-foreground shadow-sm"
>
<CollapsibleTrigger asChild>
<motion.div
initial={{ opacity: 0, y: 10 }}
animate={{ opacity: 1, y: 0 }}
transition={{ duration: 0.3, delay: categoryIndex * 0.1 }}
className="p-4 flex items-center justify-between cursor-pointer hover:bg-accent/50 transition-colors"
>
<div className="flex items-center gap-3">
<div className="p-2 rounded-md bg-primary/10 text-primary">
{category.icon}
</div>
<div>
<h2 className="text-xl font-semibold">{category.title}</h2>
<p className="text-sm text-muted-foreground">{category.description}</p>
</div>
</div>
<IconChevronRight
className={cn(
"h-5 w-5 text-muted-foreground transition-transform duration-200",
expandedCategories.includes(category.id) && "rotate-90"
)}
/>
</motion.div>
</CollapsibleTrigger>
<CollapsibleContent>
<Separator />
<div className="p-4 grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">
<AnimatePresence>
{category.connectors.map((connector, index) => (
<Collapsible
open={expandedCategories.includes(category.id)}
onOpenChange={() => toggleCategory(category.id)}
className="w-full"
>
<div className="flex items-center justify-between space-x-4 p-4">
<h3 className="text-xl font-semibold">{category.title}</h3>
<CollapsibleTrigger asChild>
<Button variant="ghost" size="sm" className="w-9 p-0 hover:bg-muted">
<motion.div
key={connector.id}
initial={{ opacity: 0, scale: 0.95 }}
animate={{ opacity: 1, scale: 1 }}
exit={{ opacity: 0, scale: 0.95 }}
transition={{
duration: 0.2,
delay: index * 0.05,
type: "spring",
stiffness: 300,
damping: 30
}}
className={cn(
"relative group flex flex-col p-4 rounded-lg border",
connector.status === "coming-soon" ? "opacity-70" : ""
)}
animate={{ rotate: expandedCategories.includes(category.id) ? 180 : 0 }}
transition={{ duration: 0.3, ease: "easeInOut" }}
>
<div className="absolute inset-0 opacity-0 group-hover:opacity-100 transition duration-200 bg-gradient-to-t from-accent/50 to-transparent rounded-lg pointer-events-none" />
<div className="mb-4 relative z-10 text-primary">
{connector.icon}
</div>
<div className="flex items-center justify-between mb-2">
<h3 className="text-lg font-semibold group-hover:translate-x-1 transition duration-200">
{connector.title}
</h3>
{connector.status === "coming-soon" && (
<span className="text-xs bg-muted px-2 py-1 rounded-full">Coming soon</span>
)}
</div>
<p className="text-sm text-muted-foreground mb-4 flex-grow">
{connector.description}
</p>
{connector.status === "available" ? (
<Link
href={`/dashboard/${searchSpaceId}/connectors/add/${connector.id}`}
className="w-full mt-auto"
>
<Button
variant="default"
className="w-full"
>
Connect
</Button>
</Link>
) : (
<Button
variant="outline"
className="w-full mt-auto"
disabled
>
Notify Me
</Button>
)}
<IconChevronDown className="h-5 w-5" />
</motion.div>
))}
</AnimatePresence>
<span className="sr-only">Toggle</span>
</Button>
</CollapsibleTrigger>
</div>
</CollapsibleContent>
</Collapsible>
<CollapsibleContent>
<AnimatePresence>
<motion.div
className="grid grid-cols-1 gap-6 sm:grid-cols-2 lg:grid-cols-3 p-4"
variants={staggerContainer}
initial="hidden"
animate="visible"
exit="hidden"
>
{category.connectors.map((connector) => (
<motion.div
key={connector.id}
variants={cardVariants}
whileHover="hover"
className="col-span-1"
>
<Card className="h-full flex flex-col overflow-hidden border-transparent transition-all duration-200 hover:border-primary/50">
<CardHeader className="flex-row items-center gap-4 pb-2">
<div className="flex h-12 w-12 items-center justify-center rounded-lg bg-primary/10 dark:bg-primary/20">
<motion.div
whileHover={{ rotate: 5, scale: 1.1 }}
className="text-primary"
>
{connector.icon}
</motion.div>
</div>
<div>
<div className="flex items-center gap-2">
<h3 className="font-medium">{connector.title}</h3>
{connector.status === "coming-soon" && (
<Badge variant="outline" className="text-xs bg-amber-100 dark:bg-amber-950 text-amber-800 dark:text-amber-300 border-amber-200 dark:border-amber-800">
Coming soon
</Badge>
)}
{connector.status === "connected" && (
<Badge variant="outline" className="text-xs bg-green-100 dark:bg-green-950 text-green-800 dark:text-green-300 border-green-200 dark:border-green-800">
Connected
</Badge>
)}
</div>
</div>
</CardHeader>
<CardContent className="pb-4">
<p className="text-sm text-muted-foreground">
{connector.description}
</p>
</CardContent>
<CardFooter className="mt-auto pt-2">
{connector.status === 'available' && (
<Link href={`/dashboard/${searchSpaceId}/connectors/add/${connector.id}`} className="w-full">
<Button variant="default" className="w-full group">
<span>Connect</span>
<motion.div
className="ml-1"
initial={{ x: 0 }}
whileHover={{ x: 3 }}
transition={{ type: "spring", stiffness: 400, damping: 10 }}
>
<IconChevronRight className="h-4 w-4" />
</motion.div>
</Button>
</Link>
)}
{connector.status === 'coming-soon' && (
<Button variant="outline" disabled className="w-full opacity-70">
Coming Soon
</Button>
)}
{connector.status === 'connected' && (
<Button variant="outline" className="w-full border-green-500 text-green-600 hover:bg-green-50 dark:hover:bg-green-950">
Manage
</Button>
)}
</CardFooter>
</Card>
</motion.div>
))}
</motion.div>
</AnimatePresence>
</CollapsibleContent>
</Collapsible>
</motion.div>
))}
</div>
</motion.div>
</div>
);
}

View file

@ -1,6 +1,7 @@
"use client";
import { cn } from "@/lib/utils";
import { DocumentViewer } from "@/components/document-viewer";
import { JsonMetadataViewer } from "@/components/json-metadata-viewer";
import {
AlertDialog,
AlertDialogAction,
@ -12,7 +13,6 @@ import {
AlertDialogTitle,
AlertDialogTrigger,
} from "@/components/ui/alert-dialog";
import { Badge } from "@/components/ui/badge";
import { Button } from "@/components/ui/button";
import { Checkbox } from "@/components/ui/checkbox";
import {
@ -43,6 +43,9 @@ import {
TableHeader,
TableRow,
} from "@/components/ui/table";
import { useDocuments } from "@/hooks/use-documents";
import { cn } from "@/lib/utils";
import { IconBrandGithub, IconBrandNotion, IconBrandSlack, IconBrandYoutube, IconLayoutKanban } from "@tabler/icons-react";
import {
ColumnDef,
ColumnFiltersState,
@ -59,6 +62,7 @@ import {
getSortedRowModel,
useReactTable,
} from "@tanstack/react-table";
import { AnimatePresence, motion } from "framer-motion";
import {
AlertCircle,
ChevronDown,
@ -70,31 +74,22 @@ import {
CircleAlert,
CircleX,
Columns3,
Filter,
ListFilter,
Plus,
FileText,
Globe,
MessageSquare,
FileX,
File,
Trash,
FileX,
Filter,
Globe,
ListFilter,
MoreHorizontal,
Webhook,
Trash,
Webhook
} from "lucide-react";
import { useEffect, useId, useMemo, useRef, useState, useContext } from "react";
import { motion, AnimatePresence } from "framer-motion";
import { useParams } from "next/navigation";
import { useDocuments } from "@/hooks/use-documents";
import React from "react";
import { toast } from "sonner";
import React, { useContext, useEffect, useId, useMemo, useRef, useState } from "react";
import ReactMarkdown from "react-markdown";
import rehypeRaw from "rehype-raw";
import rehypeSanitize from "rehype-sanitize";
import remarkGfm from "remark-gfm";
import { DocumentViewer } from "@/components/document-viewer";
import { JsonMetadataViewer } from "@/components/json-metadata-viewer";
import { IconBrandNotion, IconBrandSlack, IconBrandYoutube } from "@tabler/icons-react";
import { toast } from "sonner";
// Define animation variants for reuse
const fadeInScale = {
@ -114,7 +109,7 @@ const fadeInScale = {
type Document = {
id: number;
title: string;
document_type: "EXTENSION" | "CRAWLED_URL" | "SLACK_CONNECTOR" | "NOTION_CONNECTOR" | "FILE" | "YOUTUBE_VIDEO";
document_type: "EXTENSION" | "CRAWLED_URL" | "SLACK_CONNECTOR" | "NOTION_CONNECTOR" | "FILE" | "YOUTUBE_VIDEO" | "LINEAR_CONNECTOR";
document_metadata: any;
content: string;
created_at: string;
@ -142,6 +137,8 @@ const documentTypeIcons = {
NOTION_CONNECTOR: IconBrandNotion,
FILE: File,
YOUTUBE_VIDEO: IconBrandYoutube,
GITHUB_CONNECTOR: IconBrandGithub,
LINEAR_CONNECTOR: IconLayoutKanban,
} as const;
const columns: ColumnDef<Document>[] = [
@ -1028,4 +1025,5 @@ function RowActions({ row }: { row: Row<Document> }) {
);
}
export { DocumentsTable }
export { DocumentsTable };

View file

@ -240,7 +240,7 @@ const SourcesDialogContent = ({
const ChatPage = () => {
const [token, setToken] = React.useState<string | null>(null);
const [activeTab, setActiveTab] = useState("");
const [dialogOpen, setDialogOpen] = useState(false);
const [dialogOpenId, setDialogOpenId] = useState<number | null>(null);
const [sourcesPage, setSourcesPage] = useState(1);
const [expandedSources, setExpandedSources] = useState(false);
const [canScrollLeft, setCanScrollLeft] = useState(false);
@ -260,6 +260,13 @@ const ChatPage = () => {
const { search_space_id, chat_id } = useParams();
// Function to scroll terminal to bottom
const scrollTerminalToBottom = () => {
if (terminalMessagesRef.current) {
terminalMessagesRef.current.scrollTop = terminalMessagesRef.current.scrollHeight;
}
};
// Get token from localStorage on client side only
React.useEffect(() => {
setToken(localStorage.getItem('surfsense_bearer_token'));
@ -469,54 +476,60 @@ const ChatPage = () => {
updateChat();
}, [messages, status, chat_id, researchMode, selectedConnectors, search_space_id]);
// Log messages whenever they update and extract annotations from the latest assistant message if available
useEffect(() => {
console.log('Messages updated:', messages);
// Extract annotations from the latest assistant message if available
// Memoize connector sources to prevent excessive re-renders
const processedConnectorSources = React.useMemo(() => {
if (messages.length === 0) return connectorSources;
// Only process when we have a complete message (not streaming)
if (status !== 'ready') return connectorSources;
// Find the latest assistant message
const assistantMessages = messages.filter(msg => msg.role === 'assistant');
if (assistantMessages.length > 0) {
const latestAssistantMessage = assistantMessages[assistantMessages.length - 1];
if (latestAssistantMessage?.annotations) {
const annotations = latestAssistantMessage.annotations as any[];
// Debug log to track streaming annotations
if (process.env.NODE_ENV === 'development') {
console.log('Streaming annotations:', annotations);
// Log counts of each annotation type
const terminalInfoCount = annotations.filter(a => a.type === 'TERMINAL_INFO').length;
const sourcesCount = annotations.filter(a => a.type === 'SOURCES').length;
const answerCount = annotations.filter(a => a.type === 'ANSWER').length;
console.log(`Annotation counts - Terminal: ${terminalInfoCount}, Sources: ${sourcesCount}, Answer: ${answerCount}`);
}
// Process SOURCES annotation - get the last one to ensure we have the latest
const sourcesAnnotations = annotations.filter(
(annotation) => annotation.type === 'SOURCES'
);
if (sourcesAnnotations.length > 0) {
// Get the last SOURCES annotation to ensure we have the most recent one
const latestSourcesAnnotation = sourcesAnnotations[sourcesAnnotations.length - 1];
if (latestSourcesAnnotation.content) {
setConnectorSources(latestSourcesAnnotation.content);
}
}
// Check for terminal info annotations and scroll terminal to bottom if they exist
const terminalInfoAnnotations = annotations.filter(
(annotation) => annotation.type === 'TERMINAL_INFO'
);
if (terminalInfoAnnotations.length > 0) {
// Schedule scrolling after the DOM has been updated
setTimeout(scrollTerminalToBottom, 100);
}
}
if (assistantMessages.length === 0) return connectorSources;
const latestAssistantMessage = assistantMessages[assistantMessages.length - 1];
if (!latestAssistantMessage?.annotations) return connectorSources;
// Find the latest SOURCES annotation
const annotations = latestAssistantMessage.annotations as any[];
const sourcesAnnotations = annotations.filter(a => a.type === 'SOURCES');
if (sourcesAnnotations.length === 0) return connectorSources;
const latestSourcesAnnotation = sourcesAnnotations[sourcesAnnotations.length - 1];
if (!latestSourcesAnnotation.content) return connectorSources;
// Use this content if it differs from current
return latestSourcesAnnotation.content;
}, [messages, status, connectorSources]);
// Update connector sources when processed value changes
useEffect(() => {
if (processedConnectorSources !== connectorSources) {
setConnectorSources(processedConnectorSources);
}
}, [messages]);
}, [processedConnectorSources, connectorSources]);
// Check and scroll terminal when terminal info is available
useEffect(() => {
if (messages.length === 0 || status !== 'ready') return;
// Find the latest assistant message
const assistantMessages = messages.filter(msg => msg.role === 'assistant');
if (assistantMessages.length === 0) return;
const latestAssistantMessage = assistantMessages[assistantMessages.length - 1];
if (!latestAssistantMessage?.annotations) return;
// Check for terminal info annotations
const annotations = latestAssistantMessage.annotations as any[];
const terminalInfoAnnotations = annotations.filter(a => a.type === 'TERMINAL_INFO');
if (terminalInfoAnnotations.length > 0) {
// Schedule scrolling after the DOM has been updated
setTimeout(scrollTerminalToBottom, 100);
}
}, [messages, status]);
// Custom handleSubmit function to include selected connectors and answer type
const handleSubmit = (e: React.FormEvent) => {
@ -543,24 +556,22 @@ const ChatPage = () => {
messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' });
};
// Function to scroll terminal to bottom
const scrollTerminalToBottom = () => {
if (terminalMessagesRef.current) {
terminalMessagesRef.current.scrollTop = terminalMessagesRef.current.scrollHeight;
}
};
// Scroll to bottom when messages change
useEffect(() => {
scrollToBottom();
}, [messages]);
// Set activeTab when connectorSources change
useEffect(() => {
if (connectorSources.length > 0) {
setActiveTab(connectorSources[0].type);
}
// Set activeTab when connectorSources change using a memoized value
const activeTabValue = React.useMemo(() => {
return connectorSources.length > 0 ? connectorSources[0].type : "";
}, [connectorSources]);
// Update activeTab when the memoized value changes
useEffect(() => {
if (activeTabValue && activeTabValue !== activeTab) {
setActiveTab(activeTabValue);
}
}, [activeTabValue, activeTab]);
// Scroll terminal to bottom when expanded
useEffect(() => {
@ -617,49 +628,89 @@ const ChatPage = () => {
};
// Function to get a citation source by ID
const getCitationSource = (citationId: number): Source | null => {
const getCitationSource = React.useCallback((citationId: number, messageIndex?: number): Source | null => {
if (!messages || messages.length === 0) return null;
// Find the latest assistant message
const assistantMessages = messages.filter(msg => msg.role === 'assistant');
if (assistantMessages.length === 0) return null;
// If no specific message index is provided, use the latest assistant message
if (messageIndex === undefined) {
// Find the latest assistant message
const assistantMessages = messages.filter(msg => msg.role === 'assistant');
if (assistantMessages.length === 0) return null;
const latestAssistantMessage = assistantMessages[assistantMessages.length - 1];
if (!latestAssistantMessage?.annotations) return null;
const latestAssistantMessage = assistantMessages[assistantMessages.length - 1];
if (!latestAssistantMessage?.annotations) return null;
// Find all SOURCES annotations
const annotations = latestAssistantMessage.annotations as any[];
const sourcesAnnotations = annotations.filter(
(annotation) => annotation.type === 'SOURCES'
);
// Find all SOURCES annotations
const annotations = latestAssistantMessage.annotations as any[];
const sourcesAnnotations = annotations.filter(
(annotation) => annotation.type === 'SOURCES'
);
// Get the latest SOURCES annotation
if (sourcesAnnotations.length === 0) return null;
const latestSourcesAnnotation = sourcesAnnotations[sourcesAnnotations.length - 1];
// Get the latest SOURCES annotation
if (sourcesAnnotations.length === 0) return null;
const latestSourcesAnnotation = sourcesAnnotations[sourcesAnnotations.length - 1];
if (!latestSourcesAnnotation.content) return null;
if (!latestSourcesAnnotation.content) return null;
// Flatten all sources from all connectors
const allSources: Source[] = [];
latestSourcesAnnotation.content.forEach((connector: ConnectorSource) => {
if (connector.sources && Array.isArray(connector.sources)) {
connector.sources.forEach((source: SourceItem) => {
allSources.push({
id: source.id,
title: source.title,
description: source.description,
url: source.url,
connectorType: connector.type
// Flatten all sources from all connectors
const allSources: Source[] = [];
latestSourcesAnnotation.content.forEach((connector: ConnectorSource) => {
if (connector.sources && Array.isArray(connector.sources)) {
connector.sources.forEach((source: SourceItem) => {
allSources.push({
id: source.id,
title: source.title,
description: source.description,
url: source.url,
connectorType: connector.type
});
});
});
}
});
}
});
// Find the source with the matching ID
const foundSource = allSources.find(source => source.id === citationId);
// Find the source with the matching ID
const foundSource = allSources.find(source => source.id === citationId);
return foundSource || null;
};
return foundSource || null;
} else {
// Use the specific message by index
const message = messages[messageIndex];
if (!message || message.role !== 'assistant' || !message.annotations) return null;
// Find all SOURCES annotations
const annotations = message.annotations as any[];
const sourcesAnnotations = annotations.filter(
(annotation) => annotation.type === 'SOURCES'
);
// Get the latest SOURCES annotation
if (sourcesAnnotations.length === 0) return null;
const latestSourcesAnnotation = sourcesAnnotations[sourcesAnnotations.length - 1];
if (!latestSourcesAnnotation.content) return null;
// Flatten all sources from all connectors
const allSources: Source[] = [];
latestSourcesAnnotation.content.forEach((connector: ConnectorSource) => {
if (connector.sources && Array.isArray(connector.sources)) {
connector.sources.forEach((source: SourceItem) => {
allSources.push({
id: source.id,
title: source.title,
description: source.description,
url: source.url,
connectorType: connector.type
});
});
}
});
// Find the source with the matching ID
const foundSource = allSources.find(source => source.id === citationId);
return foundSource || null;
}
}, [messages]);
return (
<>
@ -685,7 +736,11 @@ const ChatPage = () => {
<div className="flex-1">
<Card className="border-gray-300 dark:border-gray-700">
<CardContent className="p-3">
<MarkdownViewer content={message.content} getCitationSource={getCitationSource} className="text-sm" />
<MarkdownViewer
content={message.content}
getCitationSource={(id) => getCitationSource(id, index)}
className="text-sm"
/>
</CardContent>
</Card>
</div>
@ -856,7 +911,7 @@ const ChatPage = () => {
))}
{connector.sources.length > INITIAL_SOURCES_DISPLAY && (
<Dialog open={dialogOpen && activeTab === connector.type} onOpenChange={(open) => setDialogOpen(open)}>
<Dialog open={dialogOpenId === connector.id} onOpenChange={(open) => setDialogOpenId(open ? connector.id : null)}>
<DialogTrigger asChild>
<Button variant="ghost" className="w-full text-sm text-gray-500 dark:text-gray-400">
Show {connector.sources.length - INITIAL_SOURCES_DISPLAY} More Sources
@ -901,13 +956,16 @@ const ChatPage = () => {
return (
<MarkdownViewer
content={latestAnswer.content.join('\n')}
getCitationSource={getCitationSource}
getCitationSource={(id) => getCitationSource(id, index)}
/>
);
}
// Fallback to the message content if no ANSWER annotation is available
return <MarkdownViewer content={message.content} getCitationSource={getCitationSource} />;
return <MarkdownViewer
content={message.content}
getCitationSource={(id) => getCitationSource(id, index)}
/>;
})()}
</div>
}

View file

@ -0,0 +1,46 @@
import { source } from '@/lib/source';
import {
DocsBody,
DocsDescription,
DocsPage,
DocsTitle,
} from 'fumadocs-ui/page';
import { notFound } from 'next/navigation';
import { getMDXComponents } from '@/mdx-components';
export default async function Page(props: {
params: Promise<{ slug?: string[] }>;
}) {
const params = await props.params;
const page = source.getPage(params.slug);
if (!page) notFound();
const MDX = page.data.body;
return (
<DocsPage toc={page.data.toc} full={page.data.full}>
<DocsTitle>{page.data.title}</DocsTitle>
<DocsDescription>{page.data.description}</DocsDescription>
<DocsBody>
<MDX components={getMDXComponents()} />
</DocsBody>
</DocsPage>
);
}
export async function generateStaticParams() {
return source.generateParams();
}
export async function generateMetadata(props: {
params: Promise<{ slug?: string[] }>;
}) {
const params = await props.params;
const page = source.getPage(params.slug);
if (!page) notFound();
return {
title: page.data.title,
description: page.data.description,
};
}

View file

@ -0,0 +1,12 @@
import { source } from '@/lib/source';
import { DocsLayout } from 'fumadocs-ui/layouts/docs';
import type { ReactNode } from 'react';
import { baseOptions } from '@/app/layout.config';
export default function Layout({ children }: { children: ReactNode }) {
return (
<DocsLayout tree={source.pageTree} {...baseOptions}>
{children}
</DocsLayout>
);
}

View file

@ -1,4 +1,6 @@
@import "tailwindcss";
@import 'tailwindcss';
@import 'fumadocs-ui/css/neutral.css';
@import 'fumadocs-ui/css/preset.css';
@plugin "tailwindcss-animate";

View file

@ -0,0 +1,7 @@
import { BaseLayoutProps } from 'fumadocs-ui/layouts/shared';
export const baseOptions: BaseLayoutProps = {
nav: {
title: 'SurfSense Documentation',
},
};

View file

@ -5,6 +5,7 @@ import { Roboto } from "next/font/google";
import { Toaster } from "@/components/ui/sonner";
import { ThemeProvider } from "@/components/theme/theme-provider";
import { RootProvider } from 'fumadocs-ui/provider';
const roboto = Roboto({
subsets: ["latin"],
@ -64,8 +65,10 @@ export default async function RootLayout({
disableTransitionOnChange
defaultTheme="light"
>
{children}
<Toaster />
<RootProvider>
{children}
<Toaster />
</RootProvider>
</ThemeProvider>
</body>
</html>

View file

@ -1,6 +1,6 @@
"use client";
import { cn } from "@/lib/utils";
import { IconArrowRight, IconBrandGithub, IconBrandDiscord } from "@tabler/icons-react";
import { IconFileTypeDoc, IconBrandGithub, IconBrandDiscord } from "@tabler/icons-react";
import Link from "next/link";
import React from "react";
import { motion } from "framer-motion";
@ -20,11 +20,11 @@ export function ModernHeroWithGradients() {
<div className="relative z-20 flex flex-col items-center justify-center overflow-hidden rounded-3xl p-4 md:p-12 lg:p-16">
<Link
href="https://github.com/MODSetter/SurfSense"
href="/docs"
className="flex items-center gap-1 rounded-full border border-gray-200 bg-gradient-to-b from-gray-50 to-gray-100 px-4 py-1 text-center text-sm text-gray-800 shadow-sm dark:border-[#404040] dark:bg-gradient-to-b dark:from-[#5B5B5D] dark:to-[#262627] dark:text-white dark:shadow-inner dark:shadow-purple-500/10"
>
<span>SurfSense v0.0.6 Released</span>
<IconArrowRight className="h-4 w-4 text-gray-800 dark:text-white" />
<IconFileTypeDoc className="h-4 w-4 text-gray-800 dark:text-white" />
<span>Documentation</span>
</Link>
{/* Import the Logo component or define it in this file */}
<div className="flex items-center justify-center gap-4 mt-10 mb-2">
@ -36,7 +36,7 @@ export function ModernHeroWithGradients() {
</h1>
</div>
<p className="mx-auto max-w-3xl py-6 text-center text-base text-gray-600 dark:text-neutral-300 md:text-lg lg:text-xl">
A Customizable AI Research Agent just like NotebookLM or Perplexity, but connected to external sources such as search engines (Tavily), Slack, Notion, and more.
A Customizable AI Research Agent just like NotebookLM or Perplexity, but connected to external sources such as search engines (Tavily), Slack, Linear, Notion, YouTube, GitHub and more.
</p>
<div className="flex flex-col items-center gap-6 py-6 sm:flex-row">
<Link

View file

@ -24,8 +24,8 @@ interface NavbarProps {
export const Navbar = () => {
const navItems = [
{
name: "",
link: "/",
name: "Docs",
link: "/docs",
},
// {
// name: "Product",
@ -118,53 +118,52 @@ const DesktopNav = ({ navItems, visible }: NavbarProps) => {
<Logo className="h-8 w-8 rounded-md" />
<span className="dark:text-white/90 text-gray-800 text-lg font-bold">SurfSense</span>
</div>
<motion.div
className="lg:flex flex-row flex-1 items-center justify-center space-x-1 text-sm"
animate={{
scale: visible ? 0.9 : 1,
justifyContent: visible ? "flex-end" : "center",
}}
>
{navItems.map((navItem, idx) => (
<motion.div
key={`nav-item-${idx}`}
onHoverStart={() => setHoveredIndex(idx)}
className="relative"
>
<Link
className="dark:text-white/90 text-gray-800 relative px-3 py-1.5 transition-colors"
href={navItem.link}
<div className="flex items-center gap-4">
<motion.div
className="lg:flex flex-row items-center justify-end space-x-1 text-sm"
animate={{
scale: visible ? 0.9 : 1,
}}
>
{navItems.map((navItem, idx) => (
<motion.div
key={`nav-item-${idx}`}
onHoverStart={() => setHoveredIndex(idx)}
className="relative"
>
<span className="relative z-10">{navItem.name}</span>
{hoveredIndex === idx && (
<motion.div
layoutId="menu-hover"
className="absolute inset-0 rounded-full dark:bg-gradient-to-r dark:from-white/10 dark:to-white/20 bg-gradient-to-r from-gray-200 to-gray-300"
initial={{ opacity: 0, scale: 0.8 }}
animate={{
opacity: 1,
scale: 1.1,
background: "var(--tw-dark) ? radial-gradient(circle at center, rgba(255,255,255,0.2) 0%, rgba(255,255,255,0.1) 50%, transparent 100%) : radial-gradient(circle at center, rgba(0,0,0,0.05) 0%, rgba(0,0,0,0.03) 50%, transparent 100%)",
}}
exit={{
opacity: 0,
scale: 0.8,
transition: {
duration: 0.2,
},
}}
transition={{
type: "spring",
bounce: 0.4,
duration: 0.4,
}}
/>
)}
</Link>
</motion.div>
))}
</motion.div>
<div className="flex items-center gap-2">
<Link
className="dark:text-white/90 text-gray-800 relative px-3 py-1.5 transition-colors"
href={navItem.link}
>
<span className="relative z-10">{navItem.name}</span>
{hoveredIndex === idx && (
<motion.div
layoutId="menu-hover"
className="absolute inset-0 rounded-full dark:bg-gradient-to-r dark:from-white/10 dark:to-white/20 bg-gradient-to-r from-gray-200 to-gray-300"
initial={{ opacity: 0, scale: 0.8 }}
animate={{
opacity: 1,
scale: 1.1,
background: "var(--tw-dark) ? radial-gradient(circle at center, rgba(255,255,255,0.2) 0%, rgba(255,255,255,0.1) 50%, transparent 100%) : radial-gradient(circle at center, rgba(0,0,0,0.05) 0%, rgba(0,0,0,0.03) 50%, transparent 100%)",
}}
exit={{
opacity: 0,
scale: 0.8,
transition: {
duration: 0.2,
},
}}
transition={{
type: "spring",
bounce: 0.4,
duration: 0.4,
}}
/>
)}
</Link>
</motion.div>
))}
</motion.div>
<ThemeTogglerComponent />
<AnimatePresence mode="popLayout" initial={false}>
{!visible && (

View file

@ -20,7 +20,7 @@ type CitationProps = {
/**
* Citation component to handle individual citations
*/
export const Citation = ({ citationId, citationText, position, source }: CitationProps) => {
export const Citation = React.memo(({ citationId, citationText, position, source }: CitationProps) => {
const [open, setOpen] = useState(false);
const citationKey = `citation-${citationId}-${position}`;
@ -38,37 +38,41 @@ export const Citation = ({ citationId, citationText, position, source }: Citatio
</span>
</sup>
</DropdownMenuTrigger>
<DropdownMenuContent align="start" className="w-80 p-0">
<Card className="border-0 shadow-none">
<div className="p-3 flex items-start gap-3">
<div className="flex-shrink-0 w-7 h-7 flex items-center justify-center bg-muted rounded-full">
{getConnectorIcon(source.connectorType || '')}
</div>
<div className="flex-1">
<div className="flex items-center gap-2 mb-1">
<h3 className="font-medium text-sm text-card-foreground">{source.title}</h3>
{open && (
<DropdownMenuContent align="start" className="w-80 p-0" forceMount>
<Card className="border-0 shadow-none">
<div className="p-3 flex items-start gap-3">
<div className="flex-shrink-0 w-7 h-7 flex items-center justify-center bg-muted rounded-full">
{getConnectorIcon(source.connectorType || '')}
</div>
<p className="text-sm text-muted-foreground mt-0.5">{source.description}</p>
<div className="mt-2 flex items-center text-xs text-muted-foreground">
<span className="truncate max-w-[200px]">{source.url}</span>
<div className="flex-1">
<div className="flex items-center gap-2 mb-1">
<h3 className="font-medium text-sm text-card-foreground">{source.title}</h3>
</div>
<p className="text-sm text-muted-foreground mt-0.5">{source.description}</p>
<div className="mt-2 flex items-center text-xs text-muted-foreground">
<span className="truncate max-w-[200px]">{source.url}</span>
</div>
</div>
<Button
variant="ghost"
size="icon"
className="h-7 w-7 rounded-full"
onClick={() => window.open(source.url, '_blank', 'noopener,noreferrer')}
title="Open in new tab"
>
<ExternalLink className="h-3.5 w-3.5" />
</Button>
</div>
<Button
variant="ghost"
size="icon"
className="h-7 w-7 rounded-full"
onClick={() => window.open(source.url, '_blank')}
title="Open in new tab"
>
<ExternalLink className="h-3.5 w-3.5" />
</Button>
</div>
</Card>
</DropdownMenuContent>
</Card>
</DropdownMenuContent>
)}
</DropdownMenu>
</span>
);
};
});
Citation.displayName = 'Citation';
/**
* Function to render text with citations

View file

@ -11,7 +11,7 @@ import {
Link,
Webhook,
} from 'lucide-react';
import { IconBrandNotion, IconBrandSlack, IconBrandYoutube } from "@tabler/icons-react";
import { IconBrandNotion, IconBrandSlack, IconBrandYoutube, IconBrandGithub, IconLayoutKanban } from "@tabler/icons-react";
import { Button } from '@/components/ui/button';
import { Connector, ResearchMode } from './types';
@ -20,6 +20,10 @@ export const getConnectorIcon = (connectorType: string) => {
const iconProps = { className: "h-4 w-4" };
switch(connectorType) {
case 'LINEAR_CONNECTOR':
return <IconLayoutKanban {...iconProps} />;
case 'GITHUB_CONNECTOR':
return <IconBrandGithub {...iconProps} />;
case 'YOUTUBE_VIDEO':
return <IconBrandYoutube {...iconProps} />;
case 'CRAWLED_URL':

View file

@ -0,0 +1,21 @@
import React from 'react';
import { Skeleton } from "@/components/ui/skeleton";
import { Card, CardContent, CardHeader } from "@/components/ui/card";
export function EditConnectorLoadingSkeleton() {
return (
<div className="container mx-auto py-8 max-w-3xl">
<Skeleton className="h-8 w-48 mb-6" />
<Card className="border-2 border-border">
<CardHeader>
<Skeleton className="h-7 w-3/4 mb-2" />
<Skeleton className="h-4 w-full" />
</CardHeader>
<CardContent className="space-y-4">
<Skeleton className="h-10 w-full" />
<Skeleton className="h-20 w-full" />
</CardContent>
</Card>
</div>
);
}

View file

@ -0,0 +1,25 @@
import React from 'react';
import { Control } from 'react-hook-form';
import { FormField, FormItem, FormLabel, FormControl, FormMessage } from "@/components/ui/form";
import { Input } from "@/components/ui/input";
// Assuming EditConnectorFormValues is defined elsewhere or passed as generic
interface EditConnectorNameFormProps {
control: Control<any>; // Use Control<EditConnectorFormValues> if type is available
}
export function EditConnectorNameForm({ control }: EditConnectorNameFormProps) {
return (
<FormField
control={control}
name="name"
render={({ field }) => (
<FormItem>
<FormLabel>Connector Name</FormLabel>
<FormControl><Input {...field} /></FormControl>
<FormMessage />
</FormItem>
)}
/>
);
}

View file

@ -0,0 +1,160 @@
import React from 'react';
import { UseFormReturn } from 'react-hook-form';
import { FormField, FormItem, FormLabel, FormControl, FormDescription, FormMessage } from "@/components/ui/form";
import { Input } from "@/components/ui/input";
import { Button } from "@/components/ui/button";
import { Checkbox } from "@/components/ui/checkbox";
import { Alert, AlertDescription, AlertTitle } from "@/components/ui/alert";
import { Skeleton } from "@/components/ui/skeleton";
import { Edit, KeyRound, Loader2, CircleAlert } from 'lucide-react';
// Types needed from parent
interface GithubRepo {
id: number;
name: string;
full_name: string;
private: boolean;
url: string;
description: string | null;
last_updated: string | null;
}
type GithubPatFormValues = { github_pat: string; };
type EditMode = 'viewing' | 'editing_repos';
interface EditGitHubConnectorConfigProps {
// State from parent
editMode: EditMode;
originalPat: string;
currentSelectedRepos: string[];
fetchedRepos: GithubRepo[] | null;
newSelectedRepos: string[];
isFetchingRepos: boolean;
// Forms from parent
patForm: UseFormReturn<GithubPatFormValues>;
// Handlers from parent
setEditMode: (mode: EditMode) => void;
handleFetchRepositories: (values: GithubPatFormValues) => Promise<void>;
handleRepoSelectionChange: (repoFullName: string, checked: boolean) => void;
setNewSelectedRepos: React.Dispatch<React.SetStateAction<string[]>>;
setFetchedRepos: React.Dispatch<React.SetStateAction<GithubRepo[] | null>>;
}
export function EditGitHubConnectorConfig({
editMode,
originalPat,
currentSelectedRepos,
fetchedRepos,
newSelectedRepos,
isFetchingRepos,
patForm,
setEditMode,
handleFetchRepositories,
handleRepoSelectionChange,
setNewSelectedRepos,
setFetchedRepos
}: EditGitHubConnectorConfigProps) {
return (
<div className="space-y-4">
<h4 className="font-medium text-muted-foreground">Repository Selection & Access</h4>
{/* Viewing Mode */}
{editMode === 'viewing' && (
<div className="space-y-3 p-4 border rounded-md bg-muted/50">
<FormLabel>Currently Indexed Repositories:</FormLabel>
{currentSelectedRepos.length > 0 ? (
<ul className="list-disc pl-5 text-sm">
{currentSelectedRepos.map(repo => <li key={repo}>{repo}</li>)}
</ul>
) : (
<p className="text-sm text-muted-foreground">(No repositories currently selected)</p>
)}
<Button type="button" variant="outline" size="sm" onClick={() => setEditMode('editing_repos')}>
<Edit className="mr-2 h-4 w-4" /> Change Selection / Update PAT
</Button>
<FormDescription>To change repo selections or update the PAT, click above.</FormDescription>
</div>
)}
{/* Editing Mode */}
{editMode === 'editing_repos' && (
<div className="space-y-4 p-4 border rounded-md">
{/* PAT Input */}
<div className="flex items-end gap-4 p-4 border rounded-md bg-muted/90">
<FormField
control={patForm.control}
name="github_pat"
render={({ field }) => (
<FormItem className="flex-grow">
<FormLabel className="flex items-center gap-1"><KeyRound className="h-4 w-4" /> GitHub PAT</FormLabel>
<FormControl><Input type="password" placeholder="ghp_... or github_pat_..." {...field} /></FormControl>
<FormDescription>Enter PAT to fetch/update repos or if you need to update the stored token.</FormDescription>
<FormMessage />
</FormItem>
)}
/>
<Button
type="button"
disabled={isFetchingRepos}
size="sm"
onClick={async () => {
const isValid = await patForm.trigger('github_pat');
if (isValid) {
handleFetchRepositories(patForm.getValues());
}
}}
>
{isFetchingRepos ? <Loader2 className="h-4 w-4 animate-spin" /> : "Fetch Repositories"}
</Button>
</div>
{/* Repo List */}
{isFetchingRepos && <Skeleton className="h-40 w-full" />}
{!isFetchingRepos && fetchedRepos !== null && (
fetchedRepos.length === 0 ? (
<Alert variant="destructive">
<CircleAlert className="h-4 w-4" />
<AlertTitle>No Repositories Found</AlertTitle>
<AlertDescription>Check PAT & permissions.</AlertDescription>
</Alert>
) : (
<div className="space-y-2">
<FormLabel>Select Repositories to Index ({newSelectedRepos.length} selected):</FormLabel>
<div className="h-64 w-full rounded-md border p-4 overflow-y-auto">
{fetchedRepos.map((repo) => (
<div key={repo.id} className="flex items-center space-x-2 mb-2 py-1">
<Checkbox
id={`repo-${repo.id}`}
checked={newSelectedRepos.includes(repo.full_name)}
onCheckedChange={(checked) => handleRepoSelectionChange(repo.full_name, !!checked)}
/>
<label
htmlFor={`repo-${repo.id}`}
className="text-sm font-medium leading-none peer-disabled:cursor-not-allowed peer-disabled:opacity-70"
>
{repo.full_name} {repo.private && "(Private)"}
</label>
</div>
))}
</div>
</div>
)
)}
<Button
type="button"
variant="ghost"
size="sm"
onClick={() => {
setEditMode('viewing');
setFetchedRepos(null);
setNewSelectedRepos(currentSelectedRepos);
patForm.reset({ github_pat: originalPat }); // Reset PAT form on cancel
}}
>
Cancel Repo Change
</Button>
</div>
)}
</div>
);
}

View file

@ -0,0 +1,37 @@
import React from 'react';
import { Control } from 'react-hook-form';
import { FormField, FormItem, FormLabel, FormControl, FormDescription, FormMessage } from "@/components/ui/form";
import { Input } from "@/components/ui/input";
import { KeyRound } from 'lucide-react';
// Assuming EditConnectorFormValues is defined elsewhere or passed as generic
interface EditSimpleTokenFormProps {
control: Control<any>;
fieldName: string; // e.g., "SLACK_BOT_TOKEN"
fieldLabel: string; // e.g., "Slack Bot Token"
fieldDescription: string;
placeholder?: string;
}
export function EditSimpleTokenForm({
control,
fieldName,
fieldLabel,
fieldDescription,
placeholder
}: EditSimpleTokenFormProps) {
return (
<FormField
control={control}
name={fieldName}
render={({ field }) => (
<FormItem>
<FormLabel className="flex items-center gap-1"><KeyRound className="h-4 w-4" /> {fieldLabel}</FormLabel>
<FormControl><Input type="password" placeholder={placeholder} {...field} /></FormControl>
<FormDescription>{fieldDescription}</FormDescription>
<FormMessage />
</FormItem>
)}
/>
);
}

View file

@ -0,0 +1,34 @@
import * as z from "zod";
// Types
export interface GithubRepo {
id: number;
name: string;
full_name: string;
private: boolean;
url: string;
description: string | null;
last_updated: string | null;
}
export type EditMode = 'viewing' | 'editing_repos';
// Schemas
export const githubPatSchema = z.object({
github_pat: z.string()
.min(20, { message: "GitHub Personal Access Token seems too short." })
.refine(pat => pat.startsWith('ghp_') || pat.startsWith('github_pat_'), {
message: "GitHub PAT should start with 'ghp_' or 'github_pat_'",
}),
});
export type GithubPatFormValues = z.infer<typeof githubPatSchema>;
export const editConnectorSchema = z.object({
name: z.string().min(3, { message: "Connector name must be at least 3 characters." }),
SLACK_BOT_TOKEN: z.string().optional(),
NOTION_INTEGRATION_TOKEN: z.string().optional(),
SERPER_API_KEY: z.string().optional(),
TAVILY_API_KEY: z.string().optional(),
LINEAR_API_KEY: z.string().optional(),
});
export type EditConnectorFormValues = z.infer<typeof editConnectorSchema>;

View file

@ -1,4 +1,4 @@
import React from "react";
import React, { useMemo } from "react";
import ReactMarkdown from "react-markdown";
import rehypeRaw from "rehype-raw";
import rehypeSanitize from "rehype-sanitize";
@ -14,75 +14,87 @@ interface MarkdownViewerProps {
}
export function MarkdownViewer({ content, className, getCitationSource }: MarkdownViewerProps) {
// Memoize the markdown components to prevent unnecessary re-renders
const components = useMemo(() => {
return {
// Define custom components for markdown elements
p: ({node, children, ...props}: any) => {
// If there's no getCitationSource function, just render normally
if (!getCitationSource) {
return <p className="my-2" {...props}>{children}</p>;
}
// Process citations within paragraph content
return <p className="my-2" {...props}>{processCitationsInReactChildren(children, getCitationSource)}</p>;
},
a: ({node, children, ...props}: any) => {
// Process citations within link content if needed
const processedChildren = getCitationSource
? processCitationsInReactChildren(children, getCitationSource)
: children;
return <a className="text-primary hover:underline" {...props}>{processedChildren}</a>;
},
li: ({node, children, ...props}: any) => {
// Process citations within list item content
const processedChildren = getCitationSource
? processCitationsInReactChildren(children, getCitationSource)
: children;
return <li {...props}>{processedChildren}</li>;
},
ul: ({node, ...props}: any) => <ul className="list-disc pl-5 my-2" {...props} />,
ol: ({node, ...props}: any) => <ol className="list-decimal pl-5 my-2" {...props} />,
h1: ({node, children, ...props}: any) => {
const processedChildren = getCitationSource
? processCitationsInReactChildren(children, getCitationSource)
: children;
return <h1 className="text-2xl font-bold mt-6 mb-2" {...props}>{processedChildren}</h1>;
},
h2: ({node, children, ...props}: any) => {
const processedChildren = getCitationSource
? processCitationsInReactChildren(children, getCitationSource)
: children;
return <h2 className="text-xl font-bold mt-5 mb-2" {...props}>{processedChildren}</h2>;
},
h3: ({node, children, ...props}: any) => {
const processedChildren = getCitationSource
? processCitationsInReactChildren(children, getCitationSource)
: children;
return <h3 className="text-lg font-bold mt-4 mb-2" {...props}>{processedChildren}</h3>;
},
h4: ({node, children, ...props}: any) => {
const processedChildren = getCitationSource
? processCitationsInReactChildren(children, getCitationSource)
: children;
return <h4 className="text-base font-bold mt-3 mb-1" {...props}>{processedChildren}</h4>;
},
blockquote: ({node, ...props}: any) => <blockquote className="border-l-4 border-muted pl-4 italic my-2" {...props} />,
hr: ({node, ...props}: any) => <hr className="my-4 border-muted" {...props} />,
img: ({node, ...props}: any) => <img className="max-w-full h-auto my-4 rounded" {...props} />,
table: ({node, ...props}: any) => <div className="overflow-x-auto my-4"><table className="min-w-full divide-y divide-border" {...props} /></div>,
th: ({node, ...props}: any) => <th className="px-3 py-2 text-left font-medium bg-muted" {...props} />,
td: ({node, ...props}: any) => <td className="px-3 py-2 border-t border-border" {...props} />,
code: ({node, className, children, ...props}: any) => {
const match = /language-(\w+)/.exec(className || '');
const isInline = !match;
return isInline
? <code className="bg-muted px-1 py-0.5 rounded text-xs" {...props}>{children}</code>
: (
<div className="relative my-4">
<pre className="bg-muted p-4 rounded-md overflow-x-auto">
<code className="text-xs" {...props}>{children}</code>
</pre>
</div>
);
}
};
}, [getCitationSource]);
return (
<div className={cn("prose prose-sm dark:prose-invert max-w-none", className)}>
<ReactMarkdown
rehypePlugins={[rehypeRaw, rehypeSanitize]}
remarkPlugins={[remarkGfm]}
components={{
// Define custom components for markdown elements
p: ({node, children, ...props}) => {
// If there's no getCitationSource function, just render normally
if (!getCitationSource) {
return <p className="my-2" {...props}>{children}</p>;
}
// Process citations within paragraph content
return <p className="my-2" {...props}>{processCitationsInReactChildren(children, getCitationSource)}</p>;
},
a: ({node, children, ...props}) => {
// Process citations within link content if needed
const processedChildren = getCitationSource
? processCitationsInReactChildren(children, getCitationSource)
: children;
return <a className="text-primary hover:underline" {...props}>{processedChildren}</a>;
},
ul: ({node, ...props}) => <ul className="list-disc pl-5 my-2" {...props} />,
ol: ({node, ...props}) => <ol className="list-decimal pl-5 my-2" {...props} />,
h1: ({node, children, ...props}) => {
const processedChildren = getCitationSource
? processCitationsInReactChildren(children, getCitationSource)
: children;
return <h1 className="text-2xl font-bold mt-6 mb-2" {...props}>{processedChildren}</h1>;
},
h2: ({node, children, ...props}) => {
const processedChildren = getCitationSource
? processCitationsInReactChildren(children, getCitationSource)
: children;
return <h2 className="text-xl font-bold mt-5 mb-2" {...props}>{processedChildren}</h2>;
},
h3: ({node, children, ...props}) => {
const processedChildren = getCitationSource
? processCitationsInReactChildren(children, getCitationSource)
: children;
return <h3 className="text-lg font-bold mt-4 mb-2" {...props}>{processedChildren}</h3>;
},
h4: ({node, children, ...props}) => {
const processedChildren = getCitationSource
? processCitationsInReactChildren(children, getCitationSource)
: children;
return <h4 className="text-base font-bold mt-3 mb-1" {...props}>{processedChildren}</h4>;
},
blockquote: ({node, ...props}) => <blockquote className="border-l-4 border-muted pl-4 italic my-2" {...props} />,
hr: ({node, ...props}) => <hr className="my-4 border-muted" {...props} />,
img: ({node, ...props}) => <img className="max-w-full h-auto my-4 rounded" {...props} />,
table: ({node, ...props}) => <div className="overflow-x-auto my-4"><table className="min-w-full divide-y divide-border" {...props} /></div>,
th: ({node, ...props}) => <th className="px-3 py-2 text-left font-medium bg-muted" {...props} />,
td: ({node, ...props}) => <td className="px-3 py-2 border-t border-border" {...props} />,
code: ({node, className, children, ...props}: any) => {
const match = /language-(\w+)/.exec(className || '');
const isInline = !match;
return isInline
? <code className="bg-muted px-1 py-0.5 rounded text-xs" {...props}>{children}</code>
: (
<div className="relative my-4">
<pre className="bg-muted p-4 rounded-md overflow-x-auto">
<code className="text-xs" {...props}>{children}</code>
</pre>
</div>
);
}
}}
components={components}
>
{content}
</ReactMarkdown>
@ -91,7 +103,7 @@ export function MarkdownViewer({ content, className, getCitationSource }: Markdo
}
// Helper function to process citations within React children
function processCitationsInReactChildren(children: React.ReactNode, getCitationSource: (id: number) => Source | null): React.ReactNode {
const processCitationsInReactChildren = (children: React.ReactNode, getCitationSource: (id: number) => Source | null): React.ReactNode => {
// If children is not an array or string, just return it
if (!children || (typeof children !== 'string' && !Array.isArray(children))) {
return children;
@ -113,16 +125,18 @@ function processCitationsInReactChildren(children: React.ReactNode, getCitationS
}
return children;
}
};
// Process citation references in text content
function processCitationsInText(text: string, getCitationSource: (id: number) => Source | null): React.ReactNode[] {
const processCitationsInText = (text: string, getCitationSource: (id: number) => Source | null): React.ReactNode[] => {
// Use improved regex to catch citation numbers more reliably
// This will match patterns like [1], [42], etc. including when they appear at the end of a line or sentence
const citationRegex = /\[(\d+)\]/g;
const parts: React.ReactNode[] = [];
let lastIndex = 0;
let match;
let position = 0;
while ((match = citationRegex.exec(text)) !== null) {
// Add text before the citation
if (match.index > lastIndex) {
@ -131,13 +145,15 @@ function processCitationsInText(text: string, getCitationSource: (id: number) =>
// Add the citation component
const citationId = parseInt(match[1], 10);
const source = getCitationSource(citationId);
parts.push(
<Citation
key={`citation-${citationId}-${position}`}
citationId={citationId}
citationText={match[0]}
position={position}
source={getCitationSource(citationId)}
source={source}
/>
);
@ -151,4 +167,4 @@ function processCitationsInText(text: string, getCitationSource: (id: number) =>
}
return parts;
}
};

View file

@ -118,8 +118,8 @@ export function AppSidebarProvider({
if (typeof window === 'undefined') return;
try {
// Use the API client instead of direct fetch
const chats: Chat[] = await apiClient.get<Chat[]>('api/v1/chats/?limit=5&skip=0');
// Use the API client instead of direct fetch - filter by current search space ID
const chats: Chat[] = await apiClient.get<Chat[]>(`api/v1/chats/?limit=5&skip=0&search_space_id=${searchSpaceId}`);
// Transform API response to the format expected by AppSidebar
const formattedChats = chats.map(chat => ({
@ -170,7 +170,7 @@ export function AppSidebarProvider({
// Clean up interval on component unmount
return () => clearInterval(intervalId);
}, []);
}, [searchSpaceId]);
// Handle delete chat
const handleDeleteChat = async () => {

View file

@ -0,0 +1,168 @@
---
title: Docker Installation
description: Setting up SurfSense using Docker
full: true
---
## Known Limitations
⚠️ **Important Note:** Currently, the following features have limited functionality when running in Docker:
- **Ollama integration:** Local Ollama models do not work when running SurfSense in Docker. Please use other LLM providers like OpenAI or Gemini instead.
- **Web crawler functionality:** The web crawler feature currently doesn't work properly within the Docker environment.
We're actively working to resolve these limitations in future releases.
# Docker Installation
This guide explains how to run SurfSense using Docker Compose, which is the preferred and recommended method for deployment.
## Prerequisites
Before you begin, ensure you have:
- [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/install/) installed on your machine
- [Git](https://git-scm.com/downloads) (to clone the repository)
- Completed all the [prerequisite setup steps](/docs) including:
- PGVector setup
- Google OAuth configuration
- Unstructured.io API key
- Other required API keys
## Installation Steps
1. **Configure Environment Variables**
Set up the necessary environment variables:
**Linux/macOS:**
```bash
# Copy example environment files
cp surfsense_backend/.env.example surfsense_backend/.env
cp surfsense_web/.env.example surfsense_web/.env
```
**Windows (Command Prompt):**
```cmd
copy surfsense_backend\.env.example surfsense_backend\.env
copy surfsense_web\.env.example surfsense_web\.env
```
**Windows (PowerShell):**
```powershell
Copy-Item -Path surfsense_backend\.env.example -Destination surfsense_backend\.env
Copy-Item -Path surfsense_web\.env.example -Destination surfsense_web\.env
```
Edit both `.env` files and fill in the required values:
**Backend Environment Variables:**
| ENV VARIABLE | DESCRIPTION |
|--------------|-------------|
| DATABASE_URL | PostgreSQL connection string (e.g., `postgresql+asyncpg://postgres:postgres@localhost:5432/surfsense`) |
| SECRET_KEY | JWT Secret key for authentication (should be a secure random string) |
| GOOGLE_OAUTH_CLIENT_ID | Google OAuth client ID obtained from Google Cloud Console |
| GOOGLE_OAUTH_CLIENT_SECRET | Google OAuth client secret obtained from Google Cloud Console |
| NEXT_FRONTEND_URL | URL where your frontend application is hosted (e.g., `http://localhost:3000`) |
| EMBEDDING_MODEL | Name of the embedding model (e.g., `mixedbread-ai/mxbai-embed-large-v1`) |
| RERANKERS_MODEL_NAME | Name of the reranker model (e.g., `ms-marco-MiniLM-L-12-v2`) |
| RERANKERS_MODEL_TYPE | Type of reranker model (e.g., `flashrank`) |
| FAST_LLM | LiteLLM routed smaller, faster LLM (e.g., `openai/gpt-4o-mini`, `ollama/deepseek-r1:8b`) |
| STRATEGIC_LLM | LiteLLM routed advanced LLM for complex tasks (e.g., `openai/gpt-4o`, `ollama/gemma3:12b`) |
| LONG_CONTEXT_LLM | LiteLLM routed LLM for longer context windows (e.g., `gemini/gemini-2.0-flash`, `ollama/deepseek-r1:8b`) |
| UNSTRUCTURED_API_KEY | API key for Unstructured.io service for document parsing |
| FIRECRAWL_API_KEY | API key for Firecrawl service for web crawling |
Include API keys for the LLM providers you're using. For example:
- `OPENAI_API_KEY`: If using OpenAI models
- `GEMINI_API_KEY`: If using Google Gemini models
For other LLM providers, refer to the [LiteLLM documentation](https://docs.litellm.ai/docs/providers).
**Frontend Environment Variables:**
| ENV VARIABLE | DESCRIPTION |
|--------------|-------------|
| NEXT_PUBLIC_FASTAPI_BACKEND_URL | URL of the backend service (e.g., `http://localhost:8000`) |
2. **Build and Start Containers**
Start the Docker containers:
**Linux/macOS/Windows:**
```bash
docker-compose up --build
```
To run in detached mode (in the background):
**Linux/macOS/Windows:**
```bash
docker-compose up -d
```
**Note for Windows users:** If you're using older Docker Desktop versions, you might need to use `docker compose` (with a space) instead of `docker-compose`.
3. **Access the Applications**
Once the containers are running, you can access:
- Frontend: [http://localhost:3000](http://localhost:3000)
- Backend API: [http://localhost:8000](http://localhost:8000)
- API Documentation: [http://localhost:8000/docs](http://localhost:8000/docs)
## Useful Docker Commands
### Container Management
- **Stop containers:**
**Linux/macOS/Windows:**
```bash
docker-compose down
```
- **View logs:**
**Linux/macOS/Windows:**
```bash
# All services
docker-compose logs -f
# Specific service
docker-compose logs -f backend
docker-compose logs -f frontend
docker-compose logs -f db
```
- **Restart a specific service:**
**Linux/macOS/Windows:**
```bash
docker-compose restart backend
```
- **Execute commands in a running container:**
**Linux/macOS/Windows:**
```bash
# Backend
docker-compose exec backend python -m pytest
# Frontend
docker-compose exec frontend pnpm lint
```
## Troubleshooting
- **Linux/macOS:** If you encounter permission errors, you may need to run the docker commands with `sudo`.
- **Windows:** If you see access denied errors, make sure you're running Command Prompt or PowerShell as Administrator.
- If ports are already in use, modify the port mappings in the `docker-compose.yml` file.
- For backend dependency issues, check the `Dockerfile` in the backend directory.
- For frontend dependency issues, check the `Dockerfile` in the frontend directory.
- **Windows-specific:** If you encounter line ending issues (CRLF vs LF), configure Git to handle line endings properly with `git config --global core.autocrlf true` before cloning the repository.
## Next Steps
Once your installation is complete, you can start using SurfSense! Navigate to the frontend URL and log in using your Google account.

View file

@ -0,0 +1,99 @@
---
title: Prerequisites
description: Required setup's before setting up SurfSense
full: true
---
## PGVector installation Guide
SurfSense requires the pgvector extension for PostgreSQL:
### Linux and Mac
Compile and install the extension (supports Postgres 13+)
```sh
cd /tmp
git clone --branch v0.8.0 https://github.com/pgvector/pgvector.git
cd pgvector
make
make install # may need sudo
```
See the [installation notes](https://github.com/pgvector/pgvector/tree/master#installation-notes---linux-and-mac) if you run into issues
### Windows
Ensure [C++ support in Visual Studio](https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170#download-and-install-the-tools) is installed, and run:
```cmd
call "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"
```
Note: The exact path will vary depending on your Visual Studio version and edition
Then use `nmake` to build:
```cmd
set "PGROOT=C:\Program Files\PostgreSQL\16"
cd %TEMP%
git clone --branch v0.8.0 https://github.com/pgvector/pgvector.git
cd pgvector
nmake /F Makefile.win
nmake /F Makefile.win install
```
See the [installation notes](https://github.com/pgvector/pgvector/tree/master#installation-notes---windows) if you run into issues
---
## Google OAuth Setup
SurfSense user management and authentication works on Google OAuth. Lets set it up.
1. Login to your [Google Developer Console](https://console.cloud.google.com/)
2. Enable People API.
![Google Developer Console People API](/docs/google_oauth_people_api.png)
3. Set up OAuth consent screen.
![Google Developer Console OAuth consent screen](/docs/google_oauth_screen.png)
4. Create OAuth client ID and secret.
![Google Developer Console OAuth client ID](/docs/google_oauth_client.png)
5. It should look like this.
![Google Developer Console Config](/docs/google_oauth_config.png)
---
## File Upload's
Files are converted to LLM friendly formats using [Unstructured](https://github.com/Unstructured-IO/unstructured)
1. Get an Unstructured.io API key from [Unstructured Platform](https://platform.unstructured.io/)
2. You should be able to generate API keys once registered
![Unstructured Dashboard](/docs/unstructured.png)
---
## LLM Observability (Optional)
This is not required for SurfSense to work. But it is always a good idea to monitor LLM interactions. So we do not have those WTH moments.
1. Get a LangSmith API key from [smith.langchain.com](https://smith.langchain.com/)
2. This helps in observing SurfSense Researcher Agent.
![LangSmith](/docs/langsmith.png)
---
## Crawler
SurfSense have 2 options for saving webpages:
- [SurfSense Extension](https://github.com/MODSetter/SurfSense/tree/main/surfsense_browser_extension) (Overall better experience & ability to save private webpages, recommended)
- Crawler (If you want to save public webpages)
**NOTE:** SurfSense currently uses [Firecrawl.py](https://www.firecrawl.dev/) for web crawling. If you plan on using the crawler, you will need to create a Firecrawl account and get an API key.
---
## Next Steps
Once you have all prerequisites in place, proceed to the [installation guide](/docs/installation) to set up SurfSense.

View file

@ -0,0 +1,21 @@
---
title: Installation
description: Current ways to use SurfSense
full: true
---
# Installing SurfSense
There are two ways to install SurfSense, but both require the repository to be cloned first. Clone [SurfSense](https://github.com/MODSetter/SurfSense) and then:
## Docker Installation
This method provides a containerized environment with all dependencies pre-configured. Less Customization.
[Learn more about Docker installation](/docs/docker-installation)
## Manual Installation (Preferred)
For users who prefer more control over the installation process or need to customize their setup, we also provide manual installation instructions.
[Learn more about Manual installation](/docs/manual-installation)

View file

@ -0,0 +1,258 @@
---
title: Manual Installation
description: Setting up SurfSense manually for customized deployments (Preferred)
full: true
---
# Manual Installation (Preferred)
This guide provides step-by-step instructions for setting up SurfSense without Docker. This approach gives you more control over the installation process and allows for customization of the environment.
## Prerequisites
Before beginning the manual installation, ensure you have completed all the [prerequisite setup steps](/docs), including:
- PGVector installation
- Google OAuth setup
- Unstructured.io API key
- LLM observability (optional)
- Crawler setup (if needed)
## Backend Setup
The backend is the core of SurfSense. Follow these steps to set it up:
### 1. Environment Configuration
First, create and configure your environment variables by copying the example file:
**Linux/macOS:**
```bash
cd surfsense_backend
cp .env.example .env
```
**Windows (Command Prompt):**
```cmd
cd surfsense_backend
copy .env.example .env
```
**Windows (PowerShell):**
```powershell
cd surfsense_backend
Copy-Item -Path .env.example -Destination .env
```
Edit the `.env` file and set the following variables:
| ENV VARIABLE | DESCRIPTION |
|--------------|-------------|
| DATABASE_URL | PostgreSQL connection string (e.g., `postgresql+asyncpg://postgres:postgres@localhost:5432/surfsense`) |
| SECRET_KEY | JWT Secret key for authentication (should be a secure random string) |
| GOOGLE_OAUTH_CLIENT_ID | Google OAuth client ID |
| GOOGLE_OAUTH_CLIENT_SECRET | Google OAuth client secret |
| NEXT_FRONTEND_URL | Frontend application URL (e.g., `http://localhost:3000`) |
| EMBEDDING_MODEL | Name of the embedding model (e.g., `mixedbread-ai/mxbai-embed-large-v1`) |
| RERANKERS_MODEL_NAME | Name of the reranker model (e.g., `ms-marco-MiniLM-L-12-v2`) |
| RERANKERS_MODEL_TYPE | Type of reranker model (e.g., `flashrank`) |
| FAST_LLM | LiteLLM routed faster LLM (e.g., `openai/gpt-4o-mini`, `ollama/deepseek-r1:8b`) |
| STRATEGIC_LLM | LiteLLM routed advanced LLM (e.g., `openai/gpt-4o`, `ollama/gemma3:12b`) |
| LONG_CONTEXT_LLM | LiteLLM routed long-context LLM (e.g., `gemini/gemini-2.0-flash`, `ollama/deepseek-r1:8b`) |
| UNSTRUCTURED_API_KEY | API key for Unstructured.io service |
| FIRECRAWL_API_KEY | API key for Firecrawl service (if using crawler) |
**Important**: Since LLM calls are routed through LiteLLM, include API keys for the LLM providers you're using:
- For OpenAI models: `OPENAI_API_KEY`
- For Google Gemini models: `GEMINI_API_KEY`
- For other providers, refer to the [LiteLLM documentation](https://docs.litellm.ai/docs/providers)
### 2. Install Dependencies
Install the backend dependencies using `uv`:
**Linux/macOS:**
```bash
# Install uv if you don't have it
curl -fsSL https://astral.sh/uv/install.sh | bash
# Install dependencies
uv sync
```
**Windows (PowerShell):**
```powershell
# Install uv if you don't have it
iwr -useb https://astral.sh/uv/install.ps1 | iex
# Install dependencies
uv sync
```
**Windows (Command Prompt):**
```cmd
# Install dependencies with uv (after installing uv)
uv sync
```
### 3. Run the Backend
Start the backend server:
**Linux/macOS/Windows:**
```bash
# Run without hot reloading
uv run main.py
# Or with hot reloading for development
uv run main.py --reload
```
If everything is set up correctly, you should see output indicating the server is running on `http://localhost:8000`.
## Frontend Setup
### 1. Environment Configuration
Set up the frontend environment:
**Linux/macOS:**
```bash
cd surfsense_web
cp .env.example .env
```
**Windows (Command Prompt):**
```cmd
cd surfsense_web
copy .env.example .env
```
**Windows (PowerShell):**
```powershell
cd surfsense_web
Copy-Item -Path .env.example -Destination .env
```
Edit the `.env` file and set:
| ENV VARIABLE | DESCRIPTION |
|--------------|-------------|
| NEXT_PUBLIC_FASTAPI_BACKEND_URL | Backend URL (e.g., `http://localhost:8000`) |
### 2. Install Dependencies
Install the frontend dependencies:
**Linux/macOS:**
```bash
# Install pnpm if you don't have it
npm install -g pnpm
# Install dependencies
pnpm install
```
**Windows:**
```powershell
# Install pnpm if you don't have it
npm install -g pnpm
# Install dependencies
pnpm install
```
### 3. Run the Frontend
Start the Next.js development server:
**Linux/macOS/Windows:**
```bash
pnpm run dev
```
The frontend should now be running at `http://localhost:3000`.
## Browser Extension Setup (Optional)
The SurfSense browser extension allows you to save any webpage, including those protected behind authentication.
### 1. Environment Configuration
**Linux/macOS:**
```bash
cd surfsense_browser_extension
cp .env.example .env
```
**Windows (Command Prompt):**
```cmd
cd surfsense_browser_extension
copy .env.example .env
```
**Windows (PowerShell):**
```powershell
cd surfsense_browser_extension
Copy-Item -Path .env.example -Destination .env
```
Edit the `.env` file:
| ENV VARIABLE | DESCRIPTION |
|--------------|-------------|
| PLASMO_PUBLIC_BACKEND_URL | SurfSense Backend URL (e.g., `http://127.0.0.1:8000`) |
### 2. Build the Extension
Build the extension for your browser using the [Plasmo framework](https://docs.plasmo.com/framework/workflows/build#with-a-specific-target).
**Linux/macOS/Windows:**
```bash
# Install dependencies
pnpm install
# Build for Chrome (default)
pnpm build
# Or for other browsers
pnpm build --target=firefox
pnpm build --target=edge
```
### 3. Load the Extension
Load the extension in your browser's developer mode and configure it with your SurfSense API key.
## Verification
To verify your installation:
1. Open your browser and navigate to `http://localhost:3000`
2. Sign in with your Google account
3. Create a search space and try uploading a document
4. Test the chat functionality with your uploaded content
## Troubleshooting
- **Database Connection Issues**: Verify your PostgreSQL server is running and pgvector is properly installed
- **Authentication Problems**: Check your Google OAuth configuration and ensure redirect URIs are set correctly
- **LLM Errors**: Confirm your LLM API keys are valid and the selected models are accessible
- **File Upload Failures**: Validate your Unstructured.io API key
- **Windows-specific**: If you encounter path issues, ensure you're using the correct path separator (`\` instead of `/`)
- **macOS-specific**: If you encounter permission issues, you may need to use `sudo` for some installation commands
## Next Steps
Now that you have SurfSense running locally, you can explore its features:
- Create search spaces for organizing your content
- Upload documents or use the browser extension to save webpages
- Ask questions about your saved content
- Explore the advanced RAG capabilities
For production deployments, consider setting up:
- A reverse proxy like Nginx
- SSL certificates for secure connections
- Proper database backups
- User access controls

View file

@ -0,0 +1,12 @@
{
"title": "Setup",
"description": "The setup guide for Surfsense",
"root": true,
"pages": [
"---Setup---",
"index",
"installation",
"docker-installation",
"manual-installation"
]
}

View file

@ -22,7 +22,7 @@ export function useDocuments(searchSpaceId: number) {
try {
setLoading(true);
const response = await fetch(
`${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/documents`,
`${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/documents?search_space_id=${searchSpaceId}`,
{
headers: {
Authorization: `Bearer ${localStorage.getItem('surfsense_bearer_token')}`,
@ -57,7 +57,7 @@ export function useDocuments(searchSpaceId: number) {
setLoading(true);
try {
const response = await fetch(
`${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/documents`,
`${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/documents?search_space_id=${searchSpaceId}`,
{
headers: {
Authorization: `Bearer ${localStorage.getItem('surfsense_bearer_token')}`,

View file

@ -0,0 +1,240 @@
import { useState, useEffect, useCallback } from 'react';
import { useRouter } from 'next/navigation';
import { useForm } from 'react-hook-form';
import { zodResolver } from '@hookform/resolvers/zod';
import { toast } from 'sonner';
import { useSearchSourceConnectors, SearchSourceConnector } from '@/hooks/useSearchSourceConnectors';
import {
GithubRepo,
EditMode,
githubPatSchema,
editConnectorSchema,
GithubPatFormValues,
EditConnectorFormValues
} from '@/components/editConnector/types';
export function useConnectorEditPage(connectorId: number, searchSpaceId: string) {
const router = useRouter();
const { connectors, updateConnector, isLoading: connectorsLoading } = useSearchSourceConnectors();
// State managed by the hook
const [connector, setConnector] = useState<SearchSourceConnector | null>(null);
const [originalConfig, setOriginalConfig] = useState<Record<string, any> | null>(null);
const [isSaving, setIsSaving] = useState(false);
const [currentSelectedRepos, setCurrentSelectedRepos] = useState<string[]>([]);
const [originalPat, setOriginalPat] = useState<string>("");
const [editMode, setEditMode] = useState<EditMode>('viewing');
const [fetchedRepos, setFetchedRepos] = useState<GithubRepo[] | null>(null);
const [newSelectedRepos, setNewSelectedRepos] = useState<string[]>([]);
const [isFetchingRepos, setIsFetchingRepos] = useState(false);
// Forms managed by the hook
const patForm = useForm<GithubPatFormValues>({
resolver: zodResolver(githubPatSchema),
defaultValues: { github_pat: "" },
});
const editForm = useForm<EditConnectorFormValues>({
resolver: zodResolver(editConnectorSchema),
defaultValues: {
name: "",
SLACK_BOT_TOKEN: "",
NOTION_INTEGRATION_TOKEN: "",
SERPER_API_KEY: "",
TAVILY_API_KEY: "",
LINEAR_API_KEY: ""
},
});
// Effect to load initial data
useEffect(() => {
if (!connectorsLoading && connectors.length > 0 && !connector) {
const currentConnector = connectors.find(c => c.id === connectorId);
if (currentConnector) {
setConnector(currentConnector);
const config = currentConnector.config || {};
setOriginalConfig(config);
editForm.reset({
name: currentConnector.name,
SLACK_BOT_TOKEN: config.SLACK_BOT_TOKEN || "",
NOTION_INTEGRATION_TOKEN: config.NOTION_INTEGRATION_TOKEN || "",
SERPER_API_KEY: config.SERPER_API_KEY || "",
TAVILY_API_KEY: config.TAVILY_API_KEY || "",
LINEAR_API_KEY: config.LINEAR_API_KEY || ""
});
if (currentConnector.connector_type === 'GITHUB_CONNECTOR') {
const savedRepos = config.repo_full_names || [];
const savedPat = config.GITHUB_PAT || "";
setCurrentSelectedRepos(savedRepos);
setNewSelectedRepos(savedRepos);
setOriginalPat(savedPat);
patForm.reset({ github_pat: savedPat });
setEditMode('viewing');
}
} else {
toast.error("Connector not found.");
router.push(`/dashboard/${searchSpaceId}/connectors`);
}
}
}, [connectorId, connectors, connectorsLoading, router, searchSpaceId, connector, editForm, patForm]);
// Handlers managed by the hook
const handleFetchRepositories = useCallback(async (values: GithubPatFormValues) => {
setIsFetchingRepos(true);
setFetchedRepos(null);
try {
const token = localStorage.getItem('surfsense_bearer_token');
if (!token) throw new Error('No auth token');
const response = await fetch(
`${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/github/repositories/`,
{ method: 'POST', headers: { 'Content-Type': 'application/json', 'Authorization': `Bearer ${token}` }, body: JSON.stringify({ github_pat: values.github_pat }) }
);
if (!response.ok) { const err = await response.json(); throw new Error(err.detail || 'Fetch failed'); }
const data: GithubRepo[] = await response.json();
setFetchedRepos(data);
setNewSelectedRepos(currentSelectedRepos);
toast.success(`Found ${data.length} repos.`);
} catch (error) {
console.error("Error fetching GitHub repositories:", error);
toast.error(error instanceof Error ? error.message : "Failed to fetch repositories.");
} finally { setIsFetchingRepos(false); }
}, [currentSelectedRepos]); // Added dependency
const handleRepoSelectionChange = useCallback((repoFullName: string, checked: boolean) => {
setNewSelectedRepos(prev => checked ? [...prev, repoFullName] : prev.filter(name => name !== repoFullName));
}, []);
const handleSaveChanges = useCallback(async (formData: EditConnectorFormValues) => {
if (!connector || !originalConfig) return;
setIsSaving(true);
const updatePayload: Partial<SearchSourceConnector> = {};
let configChanged = false;
let newConfig: Record<string, any> | null = null;
if (formData.name !== connector.name) {
updatePayload.name = formData.name;
}
switch (connector.connector_type) {
case 'GITHUB_CONNECTOR':
const currentPatInForm = patForm.getValues('github_pat');
const patChanged = currentPatInForm !== originalPat;
const initialRepoSet = new Set(currentSelectedRepos);
const newRepoSet = new Set(newSelectedRepos);
const reposChanged = initialRepoSet.size !== newRepoSet.size || ![...initialRepoSet].every(repo => newRepoSet.has(repo));
if (patChanged || (editMode === 'editing_repos' && reposChanged && fetchedRepos !== null)) {
if (!currentPatInForm || !(currentPatInForm.startsWith('ghp_') || currentPatInForm.startsWith('github_pat_'))) {
toast.error("Invalid GitHub PAT format. Cannot save."); setIsSaving(false); return;
}
newConfig = { GITHUB_PAT: currentPatInForm, repo_full_names: newSelectedRepos };
if (reposChanged && newSelectedRepos.length === 0) { toast.warning("Warning: No repositories selected."); }
}
break;
case 'SLACK_CONNECTOR':
if (formData.SLACK_BOT_TOKEN !== originalConfig.SLACK_BOT_TOKEN) {
if (!formData.SLACK_BOT_TOKEN) { toast.error("Slack Token empty."); setIsSaving(false); return; }
newConfig = { SLACK_BOT_TOKEN: formData.SLACK_BOT_TOKEN };
}
break;
case 'NOTION_CONNECTOR':
if (formData.NOTION_INTEGRATION_TOKEN !== originalConfig.NOTION_INTEGRATION_TOKEN) {
if (!formData.NOTION_INTEGRATION_TOKEN) { toast.error("Notion Token empty."); setIsSaving(false); return; }
newConfig = { NOTION_INTEGRATION_TOKEN: formData.NOTION_INTEGRATION_TOKEN };
}
break;
case 'SERPER_API':
if (formData.SERPER_API_KEY !== originalConfig.SERPER_API_KEY) {
if (!formData.SERPER_API_KEY) { toast.error("Serper Key empty."); setIsSaving(false); return; }
newConfig = { SERPER_API_KEY: formData.SERPER_API_KEY };
}
break;
case 'TAVILY_API':
if (formData.TAVILY_API_KEY !== originalConfig.TAVILY_API_KEY) {
if (!formData.TAVILY_API_KEY) { toast.error("Tavily Key empty."); setIsSaving(false); return; }
newConfig = { TAVILY_API_KEY: formData.TAVILY_API_KEY };
}
break;
case 'LINEAR_CONNECTOR':
if (formData.LINEAR_API_KEY !== originalConfig.LINEAR_API_KEY) {
if (!formData.LINEAR_API_KEY) {
toast.error("Linear API Key cannot be empty.");
setIsSaving(false);
return;
}
newConfig = { LINEAR_API_KEY: formData.LINEAR_API_KEY };
}
break;
}
if (newConfig !== null) {
updatePayload.config = newConfig;
configChanged = true;
}
if (Object.keys(updatePayload).length === 0) {
toast.info("No changes detected.");
setIsSaving(false);
if (connector.connector_type === 'GITHUB_CONNECTOR') { setEditMode('viewing'); patForm.reset({ github_pat: originalPat }); }
return;
}
try {
await updateConnector(connectorId, updatePayload);
toast.success("Connector updated!");
const newlySavedConfig = updatePayload.config || originalConfig;
setOriginalConfig(newlySavedConfig);
if (updatePayload.name) {
setConnector(prev => prev ? { ...prev, name: updatePayload.name!, config: newlySavedConfig } : null);
}
if (configChanged) {
if (connector.connector_type === 'GITHUB_CONNECTOR') {
const savedGitHubConfig = newlySavedConfig as { GITHUB_PAT?: string; repo_full_names?: string[] };
setCurrentSelectedRepos(savedGitHubConfig.repo_full_names || []);
setOriginalPat(savedGitHubConfig.GITHUB_PAT || "");
setNewSelectedRepos(savedGitHubConfig.repo_full_names || []);
patForm.reset({ github_pat: savedGitHubConfig.GITHUB_PAT || "" });
} else if(connector.connector_type === 'SLACK_CONNECTOR') {
editForm.setValue('SLACK_BOT_TOKEN', newlySavedConfig.SLACK_BOT_TOKEN || "");
} else if(connector.connector_type === 'NOTION_CONNECTOR') {
editForm.setValue('NOTION_INTEGRATION_TOKEN', newlySavedConfig.NOTION_INTEGRATION_TOKEN || "");
} else if(connector.connector_type === 'SERPER_API') {
editForm.setValue('SERPER_API_KEY', newlySavedConfig.SERPER_API_KEY || "");
} else if(connector.connector_type === 'TAVILY_API') {
editForm.setValue('TAVILY_API_KEY', newlySavedConfig.TAVILY_API_KEY || "");
} else if(connector.connector_type === 'LINEAR_CONNECTOR') {
editForm.setValue('LINEAR_API_KEY', newlySavedConfig.LINEAR_API_KEY || "");
}
}
if (connector.connector_type === 'GITHUB_CONNECTOR') {
setEditMode('viewing');
setFetchedRepos(null);
}
// Resetting simple form values is handled by useEffect if connector state updates
} catch (error) {
console.error("Error updating connector:", error);
toast.error(error instanceof Error ? error.message : "Failed to update connector.");
} finally { setIsSaving(false); }
}, [connector, originalConfig, updateConnector, connectorId, patForm, originalPat, currentSelectedRepos, newSelectedRepos, editMode, fetchedRepos, editForm]); // Added editForm to dependencies
// Return values needed by the component
return {
connectorsLoading,
connector,
isSaving,
editForm,
patForm,
handleSaveChanges,
// GitHub specific props
editMode,
setEditMode,
originalPat,
currentSelectedRepos,
fetchedRepos,
setFetchedRepos,
newSelectedRepos,
setNewSelectedRepos,
isFetchingRepos,
handleFetchRepositories,
handleRepoSelectionChange,
};
}

View file

@ -0,0 +1,12 @@
// Helper function to get connector type display name
export const getConnectorTypeDisplay = (type: string): string => {
const typeMap: Record<string, string> = {
"SERPER_API": "Serper API",
"TAVILY_API": "Tavily API",
"SLACK_CONNECTOR": "Slack",
"NOTION_CONNECTOR": "Notion",
"GITHUB_CONNECTOR": "GitHub",
"LINEAR_CONNECTOR": "Linear",
};
return typeMap[type] || type;
};

View file

@ -0,0 +1,7 @@
import { docs } from '@/.source';
import { loader } from 'fumadocs-core/source';
export const source = loader({
baseUrl: '/docs',
source: docs.toFumadocsSource(),
});

View file

@ -0,0 +1,9 @@
import defaultMdxComponents from 'fumadocs-ui/mdx';
import type { MDXComponents } from 'mdx/types';
export function getMDXComponents(components?: MDXComponents): MDXComponents {
return {
...defaultMdxComponents,
...components,
};
}

View file

@ -1,4 +1,5 @@
import type { NextConfig } from "next";
import { createMDX } from 'fumadocs-mdx/next';
const nextConfig: NextConfig = {
typescript: {
@ -9,4 +10,7 @@ const nextConfig: NextConfig = {
},
};
export default nextConfig;
// Wrap the config with createMDX
const withMDX = createMDX({});
export default withMDX(nextConfig);

View file

@ -1,15 +1,18 @@
{
"name": "surf_new_frontend",
"version": "0.1.0",
"name": "surfsense_web",
"version": "0.0.6",
"private": true,
"description": "SurfSense Frontend",
"scripts": {
"dev": "next dev --turbopack",
"dev": "next dev",
"dev:turbopack": "next dev --turbopack",
"build": "next build",
"start": "next start",
"lint": "next lint",
"debug": "cross-env NODE_OPTIONS=--inspect next dev --turbopack",
"debug:browser": "cross-env NODE_OPTIONS=--inspect next dev --turbopack",
"debug:server": "cross-env NODE_OPTIONS=--inspect=0.0.0.0:9229 next dev --turbopack"
"debug:server": "cross-env NODE_OPTIONS=--inspect=0.0.0.0:9229 next dev --turbopack",
"postinstall": "fumadocs-mdx"
},
"dependencies": {
"@ai-sdk/react": "^1.1.21",
@ -30,12 +33,16 @@
"@radix-ui/react-tooltip": "^1.1.8",
"@tabler/icons-react": "^3.30.0",
"@tanstack/react-table": "^8.21.2",
"@types/mdx": "^2.0.13",
"ai": "^4.1.54",
"class-variance-authority": "^0.7.1",
"clsx": "^2.1.1",
"date-fns": "^4.1.0",
"emblor": "^1.4.7",
"framer-motion": "^12.4.7",
"fumadocs-core": "^15.2.9",
"fumadocs-mdx": "^11.6.1",
"fumadocs-ui": "^15.2.9",
"geist": "^1.3.1",
"lucide-react": "^0.477.0",
"next": "15.2.3",

File diff suppressed because it is too large Load diff

Binary file not shown.

After

Width:  |  Height:  |  Size: 98 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 47 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 68 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 71 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 301 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 108 KiB

View file

@ -0,0 +1,5 @@
import { defineDocs } from 'fumadocs-mdx/config';
export const docs = defineDocs({
dir: 'content/docs',
});

View file

@ -22,6 +22,6 @@
"@/*": ["./*"]
}
},
"include": ["next-env.d.ts", "**/*.ts", "**/*.tsx", ".next/types/**/*.ts"],
"include": ["next-env.d.ts", "**/*.ts", "**/*.tsx", ".next/types/**/*.ts", "next.config.mjs"],
"exclude": ["node_modules"]
}