Merge branch 'MODSetter:main' into anshulss/buildimage

2026-05-17 18:35:19 +02:00 · 2025-04-26 22:52:14 +05:30 · 2025-04-26 22:52:14 +05:30 · 73350c7f92
commit 73350c7f92
parent 7e5dd5c146 273c16a611
92 changed files with 8163 additions and 1785 deletions
--- a/README.md
+++ b/README.md
@ -1,11 +1,12 @@
-![headnew](https://github.com/user-attachments/assets/a44fd1e7-1861-46d0-aff7-19cf33e86baa)
+
 ![new_header](https://github.com/user-attachments/assets/e236b764-0ddc-42ff-a1f1-8fbb3d2e0e65)
 # SurfSense
-While tools like NotebookLM and Perplexity are impressive and highly effective for conducting research on any topic, SurfSense elevates this capability by integrating with your personal knowledge base. It is a highly customizable AI research agent, connected to external sources such as search engines (Tavily), Slack, Notion, and more to come.
+While tools like NotebookLM and Perplexity are impressive and highly effective for conducting research on any topic/query, SurfSense elevates this capability by integrating with your personal knowledge base. It is a highly customizable AI research agent, connected to external sources such as search engines (Tavily), Slack, Linear, Notion, YouTube, GitHub and more to come.
 # Video
@ -43,8 +44,10 @@ Open source and easy to deploy locally.
 #### ℹ️ **External Sources**
 - Search Engines (Tavily)
 - Slack
 - Linear
 - Notion
 - YouTube Videos
 - GitHub
 - and more to come.....
 #### 🔖 Cross Browser Extension
@ -69,144 +72,69 @@ Join the [SurfSense Discord](https://discord.gg/ejRNvftDp9) and help shape the f
 ## How to get started?
-### PRE-START CHECKS
+### Installation Options
-#### PGVector
+SurfSense provides two installation methods:
 Make sure pgvector extension is installed on your machine. Setup Guide https://github.com/pgvector/pgvector?tab=readme-ov-file#installation
-#### File Uploading Support
+1. **[Docker Installation](https://www.surfsense.net/docs/docker-installation)** - The easiest way to get SurfSense up and running with all dependencies containerized. Less Customization.
 For File uploading you need Unstructured.io API key. You can get it at http://platform.unstructured.io/
-#### Auth
+2. **[Manual Installation (Recommended)](https://www.surfsense.net/docs/manual-installation)** - For users who prefer more control over their setup or need to customize their deployment.
 SurfSense now only works with Google OAuth. Make sure to set your OAuth Client at https://developers.google.com/identity/protocols/oauth2 . We need client id and client secret for backend. Make sure to enable people api and add the required scopes under data access (openid, userinfo.email, userinfo.profile)
-![gauth](https://github.com/user-attachments/assets/80d60fe5-889b-48a6-b947-200fdaf544c1)
+Both installation guides include detailed OS-specific instructions for Windows, macOS, and Linux.
 Before installation, make sure to complete the [prerequisite setup steps](https://www.surfsense.net/docs/) including:
 - PGVector setup
 - Google OAuth configuration
 - Unstructured.io API key
 - Other required API keys
 ## Screenshots
 **Search Spaces** 
 ![search_spaces](https://github.com/user-attachments/assets/e254c38c-f937-44b6-9e9d-770db583d099)
 **Manage Documents** 
 ![documents](https://github.com/user-attachments/assets/7001e306-eb06-4009-89c6-8fadfdc3fc4d)
 **Research Agent** 
 ![researcher](https://github.com/user-attachments/assets/fda3e61f-f936-4b66-b565-d84edde44a67)
-#### Crawler Support
+**Agent Chat** 
 SurfSense currently uses [Firecrawl.py](https://www.firecrawl.dev/). Playwright crawler support will be added soon. 
 ![chat](https://github.com/user-attachments/assets/bb352d52-1c6d-4020-926b-722d0b98b491)
-## Quick Start
+**Browser Extension**
 ### Preferred Method: Docker Setup
 The recommended way to run SurfSense is using Docker, which ensures a consistent environment across different systems.
 1. Make sure you have Docker and Docker Compose installed
 2. Follow the detailed instructions in our [Docker Setup Guide](DOCKER_SETUP.md)
 ```bash
 # Start all services with one command
 docker-compose up --build
 ```
 ---
 ### Alternative: Manual Setup
 ### Backend (./surfsense_backend)
 This is the core of SurfSense. Before we begin, let's look at the `.env` variables that we need to successfully set up SurfSense.
 |ENV VARIABLE|DESCRIPTION|
 |--|--|
 | DATABASE_URL| Your PostgreSQL database connection string. Eg. `postgresql+asyncpg://postgres:postgres@localhost:5432/surfsense`|
 | SECRET_KEY| JWT Secret key used for authentication. Should be a secure random string. Eg. `SURFSENSE_SECRET_KEY_123456789`|
 | GOOGLE_OAUTH_CLIENT_ID| Google OAuth client ID obtained from Google Cloud Console when setting up OAuth authentication|
 | GOOGLE_OAUTH_CLIENT_SECRET| Google OAuth client secret obtained from Google Cloud Console when setting up OAuth authentication|
 | NEXT_FRONTEND_URL| URL where your frontend application is hosted. Eg. `http://localhost:3000`|
 | EMBEDDING_MODEL| Name of the embedding model to use for vector embeddings. Currently works with Sentence Transformers only. Expect other embeddings soon. Eg. `mixedbread-ai/mxbai-embed-large-v1`|
 | RERANKERS_MODEL_NAME| Name of the reranker model for search result reranking. Eg. `ms-marco-MiniLM-L-12-v2`|
 | RERANKERS_MODEL_TYPE| Type of reranker model being used. Eg. `flashrank`|
 | FAST_LLM| LiteLLM-routed smaller, faster LLM for quick responses. Eg. `litellm:openai/gpt-4o`|
 | SMART_LLM| LiteLLM-routed balanced LLM for general use. Eg. `litellm:openai/gpt-4o`|
 | STRATEGIC_LLM| LiteLLM-routed advanced LLM for complex reasoning tasks. Eg. `litellm:openai/gpt-4o`|
 | LONG_CONTEXT_LLM| LiteLLM-routed LLM capable of handling longer context windows. Eg. `litellm:gemini/gemini-2.0-flash`|
 | UNSTRUCTURED_API_KEY| API key for Unstructured.io service for document parsing|
 | FIRECRAWL_API_KEY| API key for Firecrawl service for web crawling and data extraction|
 IMPORTANT: Since LLM calls are routed through LiteLLM, make sure to include the API keys of the LLM providers you are using. For example, if you use `litellm:openai/gpt-4o`, include the OpenAI API key `OPENAI_API_KEY`; if you use `litellm:gemini/gemini-2.0-flash`, include `GEMINI_API_KEY`.
 You can also integrate any other LLM by following https://docs.litellm.ai/docs/providers
 Now that you have everything, let's proceed to run SurfSense. 
 1. Install `uv` : https://docs.astral.sh/uv/getting-started/installation/
 2. Install the dependencies by running `uv sync`
 3. That's it. Now run the `main.py` file using `uv run main.py`. You can optionally pass `--reload` as an argument to enable hot reloading.
 4. If everything worked, you should see a screen like this.
 ![backend](https://i.ibb.co/542Vhqw/backendrunning.png)
 ---
 ### FrontEnd (./surfsense_web)
 For local frontend setup, just fill out the `.env` file of the frontend.
 |ENV VARIABLE|DESCRIPTION|
 |--|--|
 | NEXT_PUBLIC_FASTAPI_BACKEND_URL | Give hosted backend url here. Eg. `http://localhost:8000`|
 1. Now install dependencies using `pnpm install`
 2. Run it using `pnpm run dev`
 You should see your Next.js frontend running at `localhost:3000`
 ---
 ### Extension (./surfsense_browser_extension)
 The extension is built with Plasmo, a cross-browser extension framework. Its main use case is to save webpages that sit behind authentication.
 To build the extension, just fill out the `.env` file of the frontend.
 |ENV VARIABLE|DESCRIPTION|
 |--|--|
 | PLASMO_PUBLIC_BACKEND_URL| SurfSense Backend URL eg. "http://127.0.0.1:8000" |
 Build the extension for your favorite browser using this guide: https://docs.plasmo.com/framework/workflows/build#with-a-specific-target 
 When you load and start the extension you should see a Apu page like this
 ![ext1](https://github.com/user-attachments/assets/1f042b7a-6349-422b-94fb-d40d0df16c40)
 After filling in your SurfSense API key you should be able to use extension now.
 ![ext2](https://github.com/user-attachments/assets/a9b9f1aa-2677-404d-b0a0-c1b2dddf24a7)
-
+## Tech Stack
 |Options|Explanations|
 |--|--|
 | Search Space | Search Space to save your dynamic bookmarks.  |
 | Clear Inactive History Sessions | It clears the saved content for Inactive Tab Sessions.  |
 | Save Current Webpage Snapshot | Stores the current webpage session info into SurfSense history store|
 | Save to SurfSense | Processes the SurfSense History Store & Initiates a Save Job |
 ##  Tech Stack
 ### **BackEnd** 
 -  **FastAPI**: Modern, fast web framework for building APIs with Python
-
+  
 -  **PostgreSQL with pgvector**: Database with vector search capabilities for similarity searches
 -  **SQLAlchemy**: SQL toolkit and ORM (Object-Relational Mapping) for database interactions
-  **FastAPI Users**: Authentication and user management with JWT and OAuth support
+-  **Alembic**: A database migrations tool for SQLAlchemy.
 -  **LangChain**: Framework for developing AI-powered applications
-  **GPT Integration**: Integration with LLM models through LiteLLM
+-  **FastAPI Users**: Authentication and user management with JWT and OAuth support
 -  **LangGraph**: Framework for developing AI-agents.
 -  **LangChain**: Framework for developing AI-powered applications.
 -  **LLM Integration**: Integration with LLM models through LiteLLM
 -  **Rerankers**: Advanced result ranking for improved search relevance
 -  **GPT-Researcher**: Advanced research capabilities
 -  **Hybrid Search**: Combines vector similarity and full-text search for optimal results using Reciprocal Rank Fusion (RRF)
 -  **Vector Embeddings**: Document and text embeddings for semantic search
@ -214,10 +142,8 @@ After filling in your SurfSense API key you should be able to use extension now.
 -  **pgvector**: PostgreSQL extension for efficient vector similarity operations
 -  **Chonkie**: Advanced document chunking and embedding library
-
+ - Uses `AutoEmbeddings` for flexible embedding model selection
- Uses `AutoEmbeddings` for flexible embedding model selection
+ -  `LateChunker` for optimized document chunking based on embedding model's max sequence length
 -  `LateChunker` for optimized document chunking based on embedding model's max sequence length
--- a/surfsense_backend/.env.example
+++ b/surfsense_backend/.env.example
@ -4,18 +4,26 @@ SECRET_KEY="SECRET"
 GOOGLE_OAUTH_CLIENT_ID="924507538m"
 GOOGLE_OAUTH_CLIENT_SECRET="GOCSV"
 NEXT_FRONTEND_URL="http://localhost:3000"
 EMBEDDING_MODEL="mixedbread-ai/mxbai-embed-large-v1"
 RERANKERS_MODEL_NAME="ms-marco-MiniLM-L-12-v2"
 RERANKERS_MODEL_TYPE="flashrank"
-FAST_LLM="litellm:openai/gpt-4o-mini"
+# https://docs.litellm.ai/docs/providers
-SMART_LLM="litellm:openai/gpt-4o-mini"
+FAST_LLM="openai/gpt-4o-mini"
-STRATEGIC_LLM="litellm:openai/gpt-4o-mini"
+STRATEGIC_LLM="openai/gpt-4o"
-LONG_CONTEXT_LLM="litellm:gemini/gemini-2.0-flash-thinking-exp-01-21"
+LONG_CONTEXT_LLM="gemini/gemini-2.0-flash"
 # Chosen LiteLLM Providers Keys
 OPENAI_API_KEY="sk-proj-iA"
 GEMINI_API_KEY="AIzaSyB6-1641124124124124124124124124124"
 UNSTRUCTURED_API_KEY="Tpu3P0U8iy"
 FIRECRAWL_API_KEY="fcr-01J0000000000000000000000"
 #OPTIONAL: Add these for LangSmith Observability
 LANGSMITH_TRACING=true
 LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
 LANGSMITH_API_KEY="lsv2_pt_....."
 LANGSMITH_PROJECT="surfsense"
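The hunk above drops the `litellm:` prefix, so model values now take the plain `provider/model` form, and each provider needs its own API key. A minimal sketch of checking that pairing, assuming a hypothetical helper that is not part of SurfSense or LiteLLM; the provider-to-key mapping covers only the two keys shown in this `.env.example`:

```python
from typing import List, Mapping, Tuple

# Hypothetical mapping for illustration; only these two keys appear in .env.example.
PROVIDER_KEYS = {"openai": "OPENAI_API_KEY", "gemini": "GEMINI_API_KEY"}


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split 'provider/model' into its parts, tolerating the old 'litellm:' prefix."""
    if model_id.startswith("litellm:"):
        model_id = model_id[len("litellm:"):]
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, model


def missing_provider_keys(model_ids: List[str], env: Mapping[str, str]) -> List[str]:
    """Return the API-key variables the configured models need but env lacks."""
    missing: List[str] = []
    for model_id in model_ids:
        provider, _ = split_model_id(model_id)
        key = PROVIDER_KEYS.get(provider)
        if key and not env.get(key) and key not in missing:
            missing.append(key)
    return missing
```

Calling `missing_provider_keys` with the values of `FAST_LLM`, `STRATEGIC_LLM`, and `LONG_CONTEXT_LLM` plus `os.environ` at startup would surface a missing key before the first LLM call.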
--- a/surfsense_backend/.gitignore
+++ b/surfsense_backend/.gitignore
@ -3,4 +3,5 @@
 venv/
 data/
 __pycache__/
-.flashrank_cache
+.flashrank_cache
 surf_new_backend.egg-info/
--- a/surfsense_backend/alembic.ini
+++ b/surfsense_backend/alembic.ini
@ -0,0 +1,119 @@
 # A generic, single database configuration.
 [alembic]
 # path to migration scripts.
 # Use forward slashes (/) also on windows to provide an os agnostic path
 script_location = alembic
 # template used to generate migration file names; The default value is %%(rev)s_%%(slug)s
 # Uncomment the line below if you want the files to be prepended with date and time
 # file_template = %%(year)d_%%(month).2d_%%(day).2d_%%(hour).2d%%(minute).2d-%%(rev)s_%%(slug)s
 # sys.path path, will be prepended to sys.path if present.
 # defaults to the current working directory.
 prepend_sys_path = .
 # timezone to use when rendering the date within the migration file
 # as well as the filename.
 # If specified, requires the python>=3.9 or backports.zoneinfo library and tzdata library.
 # Any required deps can be installed by adding `alembic[tz]` to the pip requirements
 # string value is passed to ZoneInfo()
 # leave blank for localtime
 # timezone =
 # max length of characters to apply to the "slug" field
 # truncate_slug_length = 40
 # set to 'true' to run the environment during
 # the 'revision' command, regardless of autogenerate
 # revision_environment = false
 # set to 'true' to allow .pyc and .pyo files without
 # a source .py file to be detected as revisions in the
 # versions/ directory
 # sourceless = false
 # version location specification; This defaults
 # to alembic/versions.  When using multiple version
 # directories, initial revisions must be specified with --version-path.
 # The path separator used here should be the separator specified by "version_path_separator" below.
 # version_locations = %(here)s/bar:%(here)s/bat:alembic/versions
 # version path separator; As mentioned above, this is the character used to split
 # version_locations. The default within new alembic.ini files is "os", which uses os.pathsep.
 # If this key is omitted entirely, it falls back to the legacy behavior of splitting on spaces and/or commas.
 # Valid values for version_path_separator are:
 #
 # version_path_separator = :
 # version_path_separator = ;
 # version_path_separator = space
 # version_path_separator = newline
 #
 # Use os.pathsep. Default configuration used for new projects.
 version_path_separator = os
 # set to 'true' to search source files recursively
 # in each "version_locations" directory
 # new in Alembic version 1.10
 # recursive_version_locations = false
 # the output encoding used when revision files
 # are written from script.py.mako
 # output_encoding = utf-8
 # The SQLAlchemy URL to connect to
 # IMPORTANT: Replace this with your actual async database URL
 sqlalchemy.url = postgresql+asyncpg://postgres:postgres@localhost:5432/surfsense
 [post_write_hooks]
 # post_write_hooks defines scripts or Python functions that are run
 # on newly generated revision scripts.  See the documentation for further
 # detail and examples
 # format using "black" - use the console_scripts runner, against the "black" entrypoint
 # hooks = black
 # black.type = console_scripts
 # black.entrypoint = black
 # black.options = -l 79 REVISION_SCRIPT_FILENAME
 # lint with attempts to fix using "ruff" - use the exec runner, execute a binary
 # hooks = ruff
 # ruff.type = exec
 # ruff.executable = %(here)s/.venv/bin/ruff
 # ruff.options = check --fix REVISION_SCRIPT_FILENAME
 # Logging configuration
 [loggers]
 keys = root,sqlalchemy,alembic
 [handlers]
 keys = console
 [formatters]
 keys = generic
 [logger_root]
 level = WARNING
 handlers = console
 qualname =
 [logger_sqlalchemy]
 level = WARNING
 handlers =
 qualname = sqlalchemy.engine
 [logger_alembic]
 level = INFO
 handlers =
 qualname = alembic
 [handler_console]
 class = StreamHandler
 args = (sys.stderr,)
 level = NOTSET
 formatter = generic
 [formatter_generic]
 format = %(levelname)-5.5s [%(name)s] %(message)s
 datefmt = %H:%M:%S
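`alembic.ini` hardcodes `sqlalchemy.url`, while the backend reads its connection string from `DATABASE_URL`. One way to keep the two in sync is to let the environment variable take precedence. The sketch below shows only that precedence rule using the standard library; `resolve_db_url` and `SAMPLE_INI` are hypothetical names, and the `env.py` in this diff does not do this.

```python
import configparser
from typing import Mapping

# Trimmed-down stand-in for alembic.ini, used only for this illustration.
SAMPLE_INI = """
[alembic]
script_location = alembic
sqlalchemy.url = postgresql+asyncpg://postgres:postgres@localhost:5432/surfsense
"""


def resolve_db_url(ini_text: str, env: Mapping[str, str]) -> str:
    """Prefer DATABASE_URL from the environment, else fall back to the ini value."""
    # interpolation=None keeps '%' characters in the file from being expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return env.get("DATABASE_URL") or parser["alembic"]["sqlalchemy.url"]
```

Inside `env.py`, the resolved value could then be applied with Alembic's `config.set_main_option("sqlalchemy.url", url)` before the engine is created.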
--- a/surfsense_backend/alembic/README
+++ b/surfsense_backend/alembic/README
@ -0,0 +1 @@
 Generic single-database configuration with an async dbapi.
--- a/surfsense_backend/alembic/env.py
+++ b/surfsense_backend/alembic/env.py
@ -0,0 +1,98 @@
 import asyncio
 from logging.config import fileConfig
 import os
 import sys
 from sqlalchemy import pool
 from sqlalchemy.engine import Connection
 from sqlalchemy.ext.asyncio import async_engine_from_config
 from alembic import context
 # Ensure the app directory is in the Python path
 # This allows Alembic to find your models
 sys.path.insert(0, os.path.realpath(os.path.join(os.path.dirname(__file__), '..')))
 # Import your models base
 from app.db import Base # Assuming your Base is defined in app.db
 # this is the Alembic Config object, which provides
 # access to the values within the .ini file in use.
 config = context.config
 # Interpret the config file for Python logging.
 # This line sets up loggers basically.
 if config.config_file_name is not None:
    fileConfig(config.config_file_name)
 # add your model's MetaData object here
 # for 'autogenerate' support
 # from myapp import mymodel
 # target_metadata = mymodel.Base.metadata
 target_metadata = Base.metadata
 # other values from the config, defined by the needs of env.py,
 # can be acquired:
 # my_important_option = config.get_main_option("my_important_option")
 # ... etc.
 def run_migrations_offline() -> None:
    """Run migrations in 'offline' mode.
    This configures the context with just a URL
    and not an Engine, though an Engine is acceptable
    here as well.  By skipping the Engine creation
    we don't even need a DBAPI to be available.
    Calls to context.execute() here emit the given string to the
    script output.
    """
    url = config.get_main_option("sqlalchemy.url")
    context.configure(
        url=url,
        target_metadata=target_metadata,
        literal_binds=True,
        dialect_opts={"paramstyle": "named"},
    )
    with context.begin_transaction():
        context.run_migrations()
 def do_run_migrations(connection: Connection) -> None:
    context.configure(connection=connection, target_metadata=target_metadata)
    with context.begin_transaction():
        context.run_migrations()
 async def run_async_migrations() -> None:
    """In this scenario we need to create an Engine
    and associate a connection with the context.
    """
    connectable = async_engine_from_config(
        config.get_section(config.config_ini_section, {}),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )
    async with connectable.connect() as connection:
        await connection.run_sync(do_run_migrations)
    await connectable.dispose()
 def run_migrations_online() -> None:
    """Run migrations in 'online' mode."""
    asyncio.run(run_async_migrations())
 if context.is_offline_mode():
    run_migrations_offline()
 else:
    run_migrations_online()
--- a/surfsense_backend/alembic/script.py.mako
+++ b/surfsense_backend/alembic/script.py.mako
@ -0,0 +1,28 @@
 """${message}
 Revision ID: ${up_revision}
 Revises: ${down_revision | comma,n}
 Create Date: ${create_date}
 """
 from typing import Sequence, Union
 from alembic import op
 import sqlalchemy as sa
 ${imports if imports else ""}
 # revision identifiers, used by Alembic.
 revision: str = ${repr(up_revision)}
 down_revision: Union[str, None] = ${repr(down_revision)}
 branch_labels: Union[str, Sequence[str], None] = ${repr(branch_labels)}
 depends_on: Union[str, Sequence[str], None] = ${repr(depends_on)}
 def upgrade() -> None:
    """Upgrade schema."""
    ${upgrades if upgrades else "pass"}
 def downgrade() -> None:
    """Downgrade schema."""
    ${downgrades if downgrades else "pass"}
--- a/surfsense_backend/alembic/versions/1_add_github_connector_enum.py
+++ b/surfsense_backend/alembic/versions/1_add_github_connector_enum.py
@ -0,0 +1,53 @@
 """Add GITHUB_CONNECTOR to SearchSourceConnectorType enum
 Revision ID: 1
 Revises: 
 Create Date: 2023-10-27 10:00:00.000000 
 """
 from typing import Sequence, Union
 from alembic import op
 import sqlalchemy as sa
 # Import pgvector if needed for other types, though not for this ENUM change
 # import pgvector
 # revision identifiers, used by Alembic.
 revision: str = '1'
 down_revision: Union[str, None] = None
 branch_labels: Union[str, Sequence[str], None] = None
 depends_on: Union[str, Sequence[str], None] = None
 def upgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    # Manually add the command to add the enum value
    # Note: It's generally better to let autogenerate handle this, but we're bypassing it
    op.execute("ALTER TYPE searchsourceconnectortype ADD VALUE 'GITHUB_CONNECTOR'")
    # Pass for the rest, as autogenerate didn't run to add other schema details
    pass
    # ### end Alembic commands ###
 def downgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    # Removing an enum value requires recreating the type without it.
    # Note: the cast below fails if any row still uses 'GITHUB_CONNECTOR';
    # such rows must be deleted or updated before running this downgrade.
    op.execute("ALTER TYPE searchsourceconnectortype RENAME TO searchsourceconnectortype_old")
    op.execute("CREATE TYPE searchsourceconnectortype AS ENUM('SERPER_API', 'TAVILY_API', 'SLACK_CONNECTOR', 'NOTION_CONNECTOR')")
    op.execute((
        "ALTER TABLE search_source_connectors ALTER COLUMN connector_type TYPE searchsourceconnectortype USING "
        "connector_type::text::searchsourceconnectortype"
    ))
    op.execute("DROP TYPE searchsourceconnectortype_old")
    pass
    # ### end Alembic commands ### 
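The `upgrade()` above raises an error if the enum value already exists, for example on a database whose type was created from models that already include it. PostgreSQL accepts `ADD VALUE IF NOT EXISTS`, which makes the statement safe to re-run. A minimal sketch of building that statement; `add_enum_value_sql` is a hypothetical helper, not something this migration uses:

```python
def add_enum_value_sql(enum_name: str, value: str) -> str:
    """Build an idempotent ALTER TYPE statement for adding one enum value."""
    # Double any single quote so the value stays a valid SQL string literal.
    escaped = value.replace("'", "''")
    return f"ALTER TYPE {enum_name} ADD VALUE IF NOT EXISTS '{escaped}'"
```

The result could be passed to `op.execute(...)` in place of the literal string used in the migration.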
--- a/surfsense_backend/alembic/versions/2_add_linear_connector_enum.py
+++ b/surfsense_backend/alembic/versions/2_add_linear_connector_enum.py
@ -0,0 +1,45 @@
 """Add LINEAR_CONNECTOR to SearchSourceConnectorType enum
 Revision ID: 2
 Revises: e55302644c51
 Create Date: 2025-04-16 10:00:00.000000 
 """
 from typing import Sequence, Union
 from alembic import op
 import sqlalchemy as sa
 # revision identifiers, used by Alembic.
 revision: str = '2'
 down_revision: Union[str, None] = 'e55302644c51'
 branch_labels: Union[str, Sequence[str], None] = None
 depends_on: Union[str, Sequence[str], None] = None
 def upgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    # Manually add the command to add the enum value
    op.execute("ALTER TYPE searchsourceconnectortype ADD VALUE 'LINEAR_CONNECTOR'")
    # Pass for the rest, as autogenerate didn't run to add other schema details
    pass
    # ### end Alembic commands ###
 def downgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    # Removing an enum value on downgrade requires recreating the type
    op.execute("ALTER TYPE searchsourceconnectortype RENAME TO searchsourceconnectortype_old")
    op.execute("CREATE TYPE searchsourceconnectortype AS ENUM('SERPER_API', 'TAVILY_API', 'SLACK_CONNECTOR', 'NOTION_CONNECTOR', 'GITHUB_CONNECTOR')")
    op.execute((
        "ALTER TABLE search_source_connectors ALTER COLUMN connector_type TYPE searchsourceconnectortype USING "
        "connector_type::text::searchsourceconnectortype"
    ))
    op.execute("DROP TYPE searchsourceconnectortype_old")
    pass
    # ### end Alembic commands ### 
--- a/surfsense_backend/alembic/versions/3_add_linear_connector_to_documenttype_.py
+++ b/surfsense_backend/alembic/versions/3_add_linear_connector_to_documenttype_.py
@ -0,0 +1,71 @@
 """Add LINEAR_CONNECTOR to DocumentType enum
 Revision ID: 3
 Revises: 2
 Create Date: 2025-04-16 10:05:00.059921
 """
 from typing import Sequence, Union
 from alembic import op
 import sqlalchemy as sa
 # revision identifiers, used by Alembic.
 revision: str = '3'
 down_revision: Union[str, None] = '2'
 branch_labels: Union[str, Sequence[str], None] = None
 depends_on: Union[str, Sequence[str], None] = None
 # Define the ENUM type name and the new value
 ENUM_NAME = 'documenttype' # Make sure this matches the name in your DB (usually lowercase class name)
 NEW_VALUE = 'LINEAR_CONNECTOR'
 def upgrade() -> None:
    """Upgrade schema."""
    op.execute(f"ALTER TYPE {ENUM_NAME} ADD VALUE '{NEW_VALUE}'")
 # Warning: This will delete all rows with the new value
 def downgrade() -> None:
    """Downgrade schema - remove LINEAR_CONNECTOR from enum."""
    # The old type name
    old_enum_name = f"{ENUM_NAME}_old"
    # Enum values *before* LINEAR_CONNECTOR was added
    old_values = (
        'EXTENSION',
        'CRAWLED_URL',
        'FILE',
        'SLACK_CONNECTOR',
        'NOTION_CONNECTOR',
        'YOUTUBE_VIDEO',
        'GITHUB_CONNECTOR'
    )
    old_values_sql = ", ".join([f"'{v}'" for v in old_values])
    # Table and column names (adjust if different)
    table_name = 'documents'
    column_name = 'document_type'
    # 1. Rename the current enum type
    op.execute(f"ALTER TYPE {ENUM_NAME} RENAME TO {old_enum_name}")
    # 2. Create the new enum type with the old values
    op.execute(f"CREATE TYPE {ENUM_NAME} AS ENUM({old_values_sql})")
    # 3. Delete rows that still use the value being removed
    op.execute(
        f"DELETE FROM {table_name} WHERE {column_name}::text = '{NEW_VALUE}'"
    )
    # 4. Alter the column to use the new enum type (casting old values)
    op.execute(
        f"ALTER TABLE {table_name} ALTER COLUMN {column_name} "
        f"TYPE {ENUM_NAME} USING {column_name}::text::{ENUM_NAME}"
    )
    # 5. Drop the old enum type
    op.execute(f"DROP TYPE {old_enum_name}")
    # ### end Alembic commands ### 
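The five-step downgrade above (rename, recreate, delete, recast, drop) is repeated almost verbatim in the other `documenttype` migration. A sketch of factoring it into one statement builder, assuming a hypothetical helper named `enum_downgrade_sql`; the migrations in this diff inline the statements instead:

```python
from typing import List, Sequence


def enum_downgrade_sql(
    enum_name: str,
    old_values: Sequence[str],
    table: str,
    column: str,
    removed_value: str,
) -> List[str]:
    """Return the SQL statements that remove one value from a PostgreSQL enum."""
    old_name = f"{enum_name}_old"
    values_sql = ", ".join(f"'{v}'" for v in old_values)
    return [
        # 1. Move the current type out of the way.
        f"ALTER TYPE {enum_name} RENAME TO {old_name}",
        # 2. Recreate the type with only the values that remain.
        f"CREATE TYPE {enum_name} AS ENUM({values_sql})",
        # 3. Delete rows that would not survive the cast in step 4.
        f"DELETE FROM {table} WHERE {column}::text = '{removed_value}'",
        # 4. Point the column at the recreated type.
        f"ALTER TABLE {table} ALTER COLUMN {column} "
        f"TYPE {enum_name} USING {column}::text::{enum_name}",
        # 5. Drop the renamed original.
        f"DROP TYPE {old_name}",
    ]
```

A migration could then run `for stmt in enum_downgrade_sql(...): op.execute(stmt)`. Note that step 3 is destructive by design, as the warning above the `downgrade()` function says.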
--- a/surfsense_backend/alembic/versions/e55302644c51_add_github_connector_to_documenttype_.py
+++ b/surfsense_backend/alembic/versions/e55302644c51_add_github_connector_to_documenttype_.py
@ -0,0 +1,70 @@
 """Add GITHUB_CONNECTOR to DocumentType enum
 Revision ID: e55302644c51
 Revises: 1
 Create Date: 2025-04-13 19:56:00.059921
 """
 from typing import Sequence, Union
 from alembic import op
 import sqlalchemy as sa
 # revision identifiers, used by Alembic.
 revision: str = 'e55302644c51'
 down_revision: Union[str, None] = '1'
 branch_labels: Union[str, Sequence[str], None] = None
 depends_on: Union[str, Sequence[str], None] = None
 # Define the ENUM type name and the new value
 ENUM_NAME = 'documenttype' # Make sure this matches the name in your DB (usually lowercase class name)
 NEW_VALUE = 'GITHUB_CONNECTOR'
 def upgrade() -> None:
    """Upgrade schema."""
    op.execute(f"ALTER TYPE {ENUM_NAME} ADD VALUE '{NEW_VALUE}'")
 # Warning: This will delete all rows with the new value
 def downgrade() -> None:
    """Downgrade schema - remove GITHUB_CONNECTOR from enum."""
    # The old type name
    old_enum_name = f"{ENUM_NAME}_old"
    # Enum values *before* GITHUB_CONNECTOR was added
    old_values = (
        'EXTENSION',
        'CRAWLED_URL',
        'FILE',
        'SLACK_CONNECTOR',
        'NOTION_CONNECTOR',
        'YOUTUBE_VIDEO'
    )
    old_values_sql = ", ".join([f"'{v}'" for v in old_values])
    # Table and column names (adjust if different)
    table_name = 'documents'
    column_name = 'document_type'
    # 1. Rename the current enum type
    op.execute(f"ALTER TYPE {ENUM_NAME} RENAME TO {old_enum_name}")
    # 2. Create the new enum type with the old values
    op.execute(f"CREATE TYPE {ENUM_NAME} AS ENUM({old_values_sql})")
    # 3. Delete rows that still use the value being removed
    op.execute(
        f"DELETE FROM {table_name} WHERE {column_name}::text = '{NEW_VALUE}'"
    )
    # 4. Alter the column to use the new enum type (casting old values)
    op.execute(
        f"ALTER TABLE {table_name} ALTER COLUMN {column_name} "
        f"TYPE {ENUM_NAME} USING {column_name}::text::{ENUM_NAME}"
    )
    # 5. Drop the old enum type
    op.execute(f"DROP TYPE {old_enum_name}")
    # ### end Alembic commands ### 
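The four migration files do not sort into their run order by filename: revision `2` revises `e55302644c51`, which in turn revises `1`. The sketch below walks the `down_revision` links declared in the files above to recover the upgrade order. `upgrade_order` is a hypothetical helper written for this note, and it assumes a single linear history with no branches.

```python
from typing import Dict, List, Optional

# revision -> down_revision, copied from the four migration files in this diff.
DOWN_REVISIONS: Dict[str, Optional[str]] = {
    "1": None,
    "e55302644c51": "1",
    "2": "e55302644c51",
    "3": "2",
}


def upgrade_order(down_revisions: Dict[str, Optional[str]]) -> List[str]:
    """Return revisions from base to head for a linear (unbranched) history."""
    # Invert the mapping: parent revision -> the revision that builds on it.
    children: Dict[Optional[str], str] = {
        down: rev for rev, down in down_revisions.items()
    }
    order: List[str] = []
    current = children.get(None)  # the base revision has no parent
    while current is not None:
        order.append(current)
        current = children.get(current)
    if len(order) != len(down_revisions):
        raise ValueError("revision history is not a single linear chain")
    return order
```

For the mapping above this yields `1`, `e55302644c51`, `2`, `3`, which is the order `alembic upgrade head` is expected to apply them in.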
--- a/surfsense_backend/app/agents/__init__.py
+++ b/surfsense_backend/app/agents/__init__.py
@ -0,0 +1 @@
 """This is the upcoming research agent. Work in progress."""
--- a/surfsense_backend/app/agents/researcher/__init__.py
+++ b/surfsense_backend/app/agents/researcher/__init__.py
--- a/surfsense_backend/app/agents/researcher/configuration.py
+++ b/surfsense_backend/app/agents/researcher/configuration.py
@ -0,0 +1,30 @@
 """Define the configurable parameters for the agent."""
 from __future__ import annotations
 from dataclasses import dataclass, fields
 from typing import Optional, List, Any
 from langchain_core.runnables import RunnableConfig
 @dataclass(kw_only=True)
 class Configuration:
    """The configuration for the agent."""
    # Input parameters provided at invocation
    user_query: str
    num_sections: int
    connectors_to_search: List[str]
    user_id: str
    search_space_id: int
    @classmethod
    def from_runnable_config(
        cls, config: Optional[RunnableConfig] = None
    ) -> Configuration:
        """Create a Configuration instance from a RunnableConfig object."""
        configurable = (config.get("configurable") or {}) if config else {}
        _fields = {f.name for f in fields(cls) if f.init}
        return cls(**{k: v for k, v in configurable.items() if k in _fields})
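`from_runnable_config` keeps only the keys under `"configurable"` that match dataclass fields and silently ignores the rest. The stand-alone sketch below reproduces that filtering with a plain dict in place of `RunnableConfig`, so it runs without LangChain. `DemoConfiguration` mirrors the fields above but is a hypothetical copy, `kw_only=True` is left out so it also runs on older Python versions, and the sample values are made up.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class DemoConfiguration:
    """Stand-in for Configuration with the same five fields."""

    user_query: str
    num_sections: int
    connectors_to_search: List[str]
    user_id: str
    search_space_id: int

    @classmethod
    def from_runnable_config(
        cls, config: Optional[Dict[str, Any]] = None
    ) -> "DemoConfiguration":
        # Read the "configurable" mapping, treating a missing config as empty.
        configurable = (config.get("configurable") or {}) if config else {}
        # Keep only keys that are real dataclass fields; drop everything else.
        names = {f.name for f in fields(cls) if f.init}
        return cls(**{k: v for k, v in configurable.items() if k in names})


demo = DemoConfiguration.from_runnable_config(
    {
        "configurable": {
            "user_query": "What changed in the latest release?",
            "num_sections": 3,
            "connectors_to_search": ["GITHUB_CONNECTOR", "LINEAR_CONNECTOR"],
            "user_id": "user-123",
            "search_space_id": 1,
            "thread_id": "ignored",  # not a field, so it is filtered out
        }
    }
)
```

Because none of the fields has a default, a config that omits any of them raises `TypeError` at construction time rather than failing later inside a node.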
--- a/surfsense_backend/app/agents/researcher/graph.py
+++ b/surfsense_backend/app/agents/researcher/graph.py
@ -0,0 +1,43 @@
 from langgraph.graph import StateGraph
 from .state import State
 from .nodes import write_answer_outline, process_sections
 from .configuration import Configuration
 from typing import TypedDict, List, Dict, Any, Optional
 # Define what keys are in our state dict
 class GraphState(TypedDict):
    # Intermediate data produced during workflow
    answer_outline: Optional[Any]
    # Final output
    final_written_report: Optional[str]
 def build_graph():
    """
    Build and return the LangGraph workflow.
    This function constructs the researcher agent graph with proper state management
    and node connections following LangGraph best practices.
    Returns:
        A compiled LangGraph workflow
    """
    # Define a new graph with state class
    workflow = StateGraph(State, config_schema=Configuration)
    # Add nodes to the graph
    workflow.add_node("write_answer_outline", write_answer_outline)
    workflow.add_node("process_sections", process_sections)
    # Define the edges - create a linear flow
    workflow.add_edge("__start__", "write_answer_outline")
    workflow.add_edge("write_answer_outline", "process_sections")
    workflow.add_edge("process_sections", "__end__")
    # Compile the workflow into an executable graph
    graph = workflow.compile()
    graph.name = "Surfsense Researcher"  # This defines the custom name in LangSmith
    return graph
 # Compile the graph once when the module is loaded
 graph = build_graph()
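The graph above is a straight line: `__start__`, then `write_answer_outline`, then `process_sections`, then `__end__`, with each node returning a partial state update. The sketch below is a plain-Python stand-in for that control flow, not LangGraph itself. `run_linear` and the two toy node bodies are hypothetical, and the real nodes are async and also receive `config` and a stream `writer`.

```python
from typing import Any, Callable, Dict, List, Tuple

Node = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_linear(nodes: List[Tuple[str, Node]], state: Dict[str, Any]) -> Dict[str, Any]:
    """Run nodes in order, merging each node's returned update into the state."""
    state = dict(state)  # copy so the caller's dict is not mutated
    for _name, node in nodes:
        state.update(node(state))
    return state


def write_answer_outline(state: Dict[str, Any]) -> Dict[str, Any]:
    # Toy outline; the real node asks an LLM for sections and questions.
    return {"answer_outline": ["Background", "Findings"]}


def process_sections(state: Dict[str, Any]) -> Dict[str, Any]:
    # Toy report; the real node researches and writes each section.
    body = "\n".join(f"# {title}" for title in state["answer_outline"])
    return {"final_written_report": body}


result = run_linear(
    [
        ("write_answer_outline", write_answer_outline),
        ("process_sections", process_sections),
    ],
    {},
)
```

The second node can read `answer_outline` only because the first node's update was merged before it ran, which is the dependency the two `add_edge` calls encode.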
--- a/surfsense_backend/app/agents/researcher/nodes.py
+++ b/surfsense_backend/app/agents/researcher/nodes.py
@ -0,0 +1,658 @@
 import asyncio
 import json
 from typing import Any, Dict, List
 from app.config import config as app_config
 from app.db import async_session_maker
 from app.utils.connector_service import ConnectorService
 from langchain_core.messages import HumanMessage, SystemMessage
 from langchain_core.runnables import RunnableConfig
 from pydantic import BaseModel, Field
 from sqlalchemy.ext.asyncio import AsyncSession
 from .configuration import Configuration
 from .prompts import get_answer_outline_system_prompt
 from .state import State
 from .sub_section_writer.graph import graph as sub_section_writer_graph
 from langgraph.types import StreamWriter
 class Section(BaseModel):
    """A section in the answer outline."""
    section_id: int = Field(..., description="The zero-based index of the section")
    section_title: str = Field(..., description="The title of the section")
    questions: List[str] = Field(..., description="Questions to research for this section")
 class AnswerOutline(BaseModel):
    """The complete answer outline with all sections."""
    answer_outline: List[Section] = Field(..., description="List of sections in the answer outline")
 async def write_answer_outline(state: State, config: RunnableConfig, writer: StreamWriter) -> Dict[str, Any]:
    """
    Create a structured answer outline based on the user query.
    This node takes the user query and number of sections from the configuration and uses
    an LLM to generate a comprehensive outline with logical sections and research questions
    for each section.
    Returns:
        Dict containing the answer outline in the "answer_outline" key for state update.
    """
    streaming_service = state.streaming_service
    streaming_service.only_update_terminal("Generating answer outline...")
    writer({"yeild_value": streaming_service._format_annotations()})
    # Get configuration from runnable config
    configuration = Configuration.from_runnable_config(config)
    user_query = configuration.user_query
    num_sections = configuration.num_sections
    streaming_service.only_update_terminal(f"Planning research approach for query: {user_query[:100]}...")
    writer({"yeild_value": streaming_service._format_annotations()})
    # Initialize LLM
    llm = app_config.strategic_llm_instance
    # Create the human message content
    human_message_content = f"""
    Now please create an answer outline for the following query:
    User Query: {user_query}
    Number of Sections: {num_sections}
    Remember to format your response as valid JSON exactly matching this structure:
    {{
      "answer_outline": [
        {{
          "section_id": 0,
          "section_title": "Section Title",
          "questions": [
            "Question 1 to research for this section",
            "Question 2 to research for this section"
          ]
        }}
      ]
    }}
    Your output MUST be valid JSON in exactly this format. Do not include any other text or explanation.
    """
    streaming_service.only_update_terminal("Designing structured outline with AI...")
    writer({"yeild_value": streaming_service._format_annotations()})
    # Create messages for the LLM
    messages = [
        SystemMessage(content=get_answer_outline_system_prompt()),
        HumanMessage(content=human_message_content)
    ]
    # Call the LLM directly without using structured output
    streaming_service.only_update_terminal("Processing answer structure...")
    writer({"yeild_value": streaming_service._format_annotations()})
    response = await llm.ainvoke(messages)
    # Parse the JSON response manually
    try:
        # Extract JSON content from the response
        content = response.content
        # Find the JSON in the content (handle case where LLM might add additional text)
        json_start = content.find('{')
        json_end = content.rfind('}') + 1
        if json_start >= 0 and json_end > json_start:
            json_str = content[json_start:json_end]
            # Parse the JSON string
            parsed_data = json.loads(json_str)
            # Convert to Pydantic model
            answer_outline = AnswerOutline(**parsed_data)
            total_questions = sum(len(section.questions) for section in answer_outline.answer_outline)
            streaming_service.only_update_terminal(f"Successfully generated outline with {len(answer_outline.answer_outline)} sections and {total_questions} research questions")
            writer({"yeild_value": streaming_service._format_annotations()})
            print(f"Successfully generated answer outline with {len(answer_outline.answer_outline)} sections")
            # Return state update
            return {"answer_outline": answer_outline}
        else:
            # If no JSON structure is found, raise a clear error. The except block
            # below streams it to the terminal, so it is reported only once.
            raise ValueError(f"Could not find valid JSON in LLM response. Raw response: {content}")
    except (json.JSONDecodeError, ValueError) as e:
        # Log the error and re-raise it
        error_message = f"Error parsing LLM response: {str(e)}"
        streaming_service.only_update_terminal(error_message, "error")
        writer({"yeild_value": streaming_service._format_annotations()})
        print(f"Error parsing LLM response: {str(e)}")
        print(f"Raw response: {response.content}")
        raise
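Review note: the brace-scanning step in `write_answer_outline` can be tried in isolation. The following is a minimal sketch, assuming a hypothetical helper named `extract_json_object` (not part of this commit) and a plain dict in place of the Pydantic `AnswerOutline` model.

```python
import json
from typing import Any, Dict


def extract_json_object(content: str) -> Dict[str, Any]:
    """Parse the outermost {...} span of an LLM reply as JSON.

    Hypothetical helper for illustration; it mirrors the find/rfind logic
    used in write_answer_outline.
    """
    json_start = content.find("{")
    json_end = content.rfind("}") + 1
    if json_start < 0 or json_end <= json_start:
        raise ValueError("Could not find a JSON object in the response")
    # json.JSONDecodeError subclasses ValueError, so callers can catch one type.
    return json.loads(content[json_start:json_end])


# A reply that wraps the JSON in extra prose, as LLMs sometimes do.
reply = (
    'Here is the outline: {"answer_outline": [{"section_id": 0, '
    '"section_title": "Intro", "questions": ["Q1"]}]} Hope this helps.'
)
outline = extract_json_object(reply)
print(outline["answer_outline"][0]["section_title"])
```

Scanning from the first `{` to the last `}` tolerates leading and trailing prose, but it would still fail on a reply that contains two separate JSON objects; that case surfaces as a `json.JSONDecodeError`.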
 async def fetch_relevant_documents(
    research_questions: List[str],
    user_id: str,
    search_space_id: int,
    db_session: AsyncSession,
    connectors_to_search: List[str],
    writer: StreamWriter = None,
    state: State = None,
    top_k: int = 20
 ) -> List[Dict[str, Any]]:
    """
    Fetch relevant documents for research questions using the provided connectors.
    Args:
        research_questions: List of research questions to find documents for
        user_id: The user ID
        search_space_id: The search space ID
        db_session: The database session
        connectors_to_search: List of connectors to search
        writer: StreamWriter for sending progress updates
        state: The current state containing the streaming service
        top_k: Number of top results to retrieve per connector per question
    Returns:
        List of relevant documents
    """
    # Initialize services
    connector_service = ConnectorService(db_session)
    # Only use streaming if both writer and state are provided
    streaming_service = state.streaming_service if state is not None else None
    # Stream initial status update
    if streaming_service and writer:
        streaming_service.only_update_terminal(f"Starting research on {len(research_questions)} questions using {len(connectors_to_search)} connectors...")
        writer({"yeild_value": streaming_service._format_annotations()})
    all_raw_documents = []  # Store all raw documents
    all_sources = []  # Store all sources
    for i, user_query in enumerate(research_questions):
        # Stream question being researched
        if streaming_service and writer:
            streaming_service.only_update_terminal(f"Researching question {i+1}/{len(research_questions)}: {user_query[:100]}...")
            writer({"yeild_value": streaming_service._format_annotations()})
        # Use original research question as the query
        reformulated_query = user_query
        # Process each selected connector
        for connector in connectors_to_search:
            # Stream connector being searched
            if streaming_service and writer:
                streaming_service.only_update_terminal(f"Searching {connector} for relevant information...")
                writer({"yeild_value": streaming_service._format_annotations()})
            try:
                if connector == "YOUTUBE_VIDEO":
                    source_object, youtube_chunks = await connector_service.search_youtube(
                        user_query=reformulated_query,
                        user_id=user_id,
                        search_space_id=search_space_id,
                        top_k=top_k
                    )
                    # Add to sources and raw documents
                    if source_object:
                        all_sources.append(source_object)
                    all_raw_documents.extend(youtube_chunks)
                    # Stream found document count
                    if streaming_service and writer:
                        streaming_service.only_update_terminal(f"Found {len(youtube_chunks)} YouTube chunks relevant to the query")
                        writer({"yeild_value": streaming_service._format_annotations()})
                elif connector == "EXTENSION":
                    source_object, extension_chunks = await connector_service.search_extension(
                        user_query=reformulated_query,
                        user_id=user_id,
                        search_space_id=search_space_id,
                        top_k=top_k
                    )
                    # Add to sources and raw documents
                    if source_object:
                        all_sources.append(source_object)
                    all_raw_documents.extend(extension_chunks)
                    # Stream found document count
                    if streaming_service and writer:
                        streaming_service.only_update_terminal(f"Found {len(extension_chunks)} extension chunks relevant to the query")
                        writer({"yeild_value": streaming_service._format_annotations()})
                elif connector == "CRAWLED_URL":
                    source_object, crawled_urls_chunks = await connector_service.search_crawled_urls(
                        user_query=reformulated_query,
                        user_id=user_id,
                        search_space_id=search_space_id,
                        top_k=top_k
                    )
                    # Add to sources and raw documents
                    if source_object:
                        all_sources.append(source_object)
                    all_raw_documents.extend(crawled_urls_chunks)
                    # Stream found document count
                    if streaming_service and writer:
                        streaming_service.only_update_terminal(f"Found {len(crawled_urls_chunks)} crawled URL chunks relevant to the query")
                        writer({"yeild_value": streaming_service._format_annotations()})
                elif connector == "FILE":
                    source_object, files_chunks = await connector_service.search_files(
                        user_query=reformulated_query,
                        user_id=user_id,
                        search_space_id=search_space_id,
                        top_k=top_k
                    )
                    # Add to sources and raw documents
                    if source_object:
                        all_sources.append(source_object)
                    all_raw_documents.extend(files_chunks)
                    # Stream found document count
                    if streaming_service and writer:
                        streaming_service.only_update_terminal(f"Found {len(files_chunks)} file chunks relevant to the query")
                        writer({"yeild_value": streaming_service._format_annotations()})
                elif connector == "TAVILY_API":
                    source_object, tavily_chunks = await connector_service.search_tavily(
                        user_query=reformulated_query,
                        user_id=user_id,
                        top_k=top_k
                    )
                    # Add to sources and raw documents
                    if source_object:
                        all_sources.append(source_object)
                    all_raw_documents.extend(tavily_chunks)
                    # Stream found document count
                    if streaming_service and writer:
                        streaming_service.only_update_terminal(f"Found {len(tavily_chunks)} web search results relevant to the query")
                        writer({"yeild_value": streaming_service._format_annotations()})
                elif connector == "SLACK_CONNECTOR":
                    source_object, slack_chunks = await connector_service.search_slack(
                        user_query=reformulated_query,
                        user_id=user_id,
                        search_space_id=search_space_id,
                        top_k=top_k
                    )
                    # Add to sources and raw documents
                    if source_object:
                        all_sources.append(source_object)
                    all_raw_documents.extend(slack_chunks)
                    # Stream found document count
                    if streaming_service and writer:
                        streaming_service.only_update_terminal(f"Found {len(slack_chunks)} Slack messages relevant to the query")
                        writer({"yeild_value": streaming_service._format_annotations()})
                elif connector == "NOTION_CONNECTOR":
                    source_object, notion_chunks = await connector_service.search_notion(
                        user_query=reformulated_query,
                        user_id=user_id,
                        search_space_id=search_space_id,
                        top_k=top_k
                    )
                    # Add to sources and raw documents
                    if source_object:
                        all_sources.append(source_object)
                    all_raw_documents.extend(notion_chunks)
                    # Stream found document count
                    if streaming_service and writer:
                        streaming_service.only_update_terminal(f"Found {len(notion_chunks)} Notion pages/blocks relevant to the query")
                        writer({"yeild_value": streaming_service._format_annotations()})
                elif connector == "GITHUB_CONNECTOR":
                    source_object, github_chunks = await connector_service.search_github(
                        user_query=reformulated_query,
                        user_id=user_id,
                        search_space_id=search_space_id,
                        top_k=top_k
                    )
                    # Add to sources and raw documents
                    if source_object:
                        all_sources.append(source_object)
                    all_raw_documents.extend(github_chunks)
                    # Stream found document count
                    if streaming_service and writer:
                        streaming_service.only_update_terminal(f"Found {len(github_chunks)} GitHub files/issues relevant to the query")
                        writer({"yeild_value": streaming_service._format_annotations()})
                elif connector == "LINEAR_CONNECTOR":
                    source_object, linear_chunks = await connector_service.search_linear(
                        user_query=reformulated_query,
                        user_id=user_id,
                        search_space_id=search_space_id,
                        top_k=top_k
                    )
                    # Add to sources and raw documents
                    if source_object:
                        all_sources.append(source_object)
                    all_raw_documents.extend(linear_chunks)
                    # Stream found document count
                    if streaming_service and writer:
                        streaming_service.only_update_terminal(f"Found {len(linear_chunks)} Linear issues relevant to the query")
                        writer({"yeild_value": streaming_service._format_annotations()})
            except Exception as e:
                error_message = f"Error searching connector {connector}: {str(e)}"
                print(error_message)
                # Stream error message
                if streaming_service and writer:
                    streaming_service.only_update_terminal(error_message, "error")
                    writer({"yeild_value": streaming_service._format_annotations()})
                # Continue with other connectors on error
                continue
    # Deduplicate source objects by ID before streaming
    deduplicated_sources = []
    seen_source_keys = set()
    for source_obj in all_sources:
        # Use combination of source ID and type as a unique identifier
        # This ensures we don't accidentally deduplicate sources from different connectors
        source_id = source_obj.get('id')
        source_type = source_obj.get('type')
        if source_id and source_type:
            source_key = f"{source_type}_{source_id}"
            if source_key not in seen_source_keys:
                seen_source_keys.add(source_key)
                deduplicated_sources.append(source_obj)
        else:
            # If there's no ID or type, just add it to be safe
            deduplicated_sources.append(source_obj)
    # Stream info about deduplicated sources
    if streaming_service and writer:
        streaming_service.only_update_terminal(f"Collected {len(deduplicated_sources)} unique sources across all connectors")
        writer({"yeild_value": streaming_service._format_annotations()})
    # After all sources are collected and deduplicated, stream them
    if streaming_service and writer:
        streaming_service.only_update_sources(deduplicated_sources)
        writer({"yeild_value": streaming_service._format_annotations()})
    # Deduplicate raw documents based on chunk_id or content
    seen_chunk_ids = set()
    seen_content_hashes = set()
    deduplicated_docs = []
    for doc in all_raw_documents:
        chunk_id = doc.get("chunk_id")
        content = doc.get("content", "")
        content_hash = hash(content)
        # Skip if we've seen this chunk_id or content before
        if (chunk_id and chunk_id in seen_chunk_ids) or content_hash in seen_content_hashes:
            continue
        # Add to our tracking sets and keep this document
        if chunk_id:
            seen_chunk_ids.add(chunk_id)
        seen_content_hashes.add(content_hash)
        deduplicated_docs.append(doc)
    # Stream info about deduplicated documents
    if streaming_service and writer:
        streaming_service.only_update_terminal(f"Found {len(deduplicated_docs)} unique document chunks after deduplication")
        writer({"yeild_value": streaming_service._format_annotations()})
    # Return deduplicated documents
    return deduplicated_docs
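Review note: the chunk deduplication at the end of `fetch_relevant_documents` can be sketched as a standalone function. This is an illustration only: `dedupe_chunks` is a hypothetical name, and it stores the content strings directly instead of `hash(content)`, which trades some memory for immunity to hash collisions.

```python
from typing import Any, Dict, List


def dedupe_chunks(docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first document seen for each chunk_id and each content string."""
    seen_chunk_ids = set()
    seen_contents = set()
    unique = []
    for doc in docs:
        chunk_id = doc.get("chunk_id")
        content = doc.get("content", "")
        # Drop the document if either its id or its content was already seen.
        if (chunk_id and chunk_id in seen_chunk_ids) or content in seen_contents:
            continue
        if chunk_id:
            seen_chunk_ids.add(chunk_id)
        seen_contents.add(content)
        unique.append(doc)
    return unique


docs = [
    {"chunk_id": 1, "content": "alpha"},
    {"chunk_id": 1, "content": "alpha (re-fetched)"},  # same id: dropped
    {"chunk_id": 2, "content": "alpha"},               # same content: dropped
    {"content": "beta"},                                # no id: kept
]
print([d["content"] for d in dedupe_chunks(docs)])
```

As in the diff, a falsy `chunk_id` (including `0`) is treated as missing, so such documents are deduplicated by content alone.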
 async def process_sections(state: State, config: RunnableConfig, writer: StreamWriter) -> Dict[str, Any]:
    """
    Process all sections in parallel and combine the results.
    This node takes the answer outline from the previous step, fetches relevant documents 
    for all questions across all sections once, and then processes each section in parallel 
    using the sub_section_writer graph with the shared document pool.
    Returns:
        Dict containing the final written report in the "final_written_report" key.
    """
    # Get configuration and answer outline from state
    configuration = Configuration.from_runnable_config(config)
    answer_outline = state.answer_outline
    streaming_service = state.streaming_service
    streaming_service.only_update_terminal("Starting to process research sections...")
    writer({"yeild_value": streaming_service._format_annotations()})
    print(f"Processing sections from outline: {answer_outline is not None}")
    if not answer_outline:
        streaming_service.only_update_terminal("Error: No answer outline was provided. Cannot generate report.", "error")
        writer({"yeild_value": streaming_service._format_annotations()})
        return {
            "final_written_report": "No answer outline was provided. Cannot generate final report."
        }
    # Collect all questions from all sections
    all_questions = []
    for section in answer_outline.answer_outline:
        all_questions.extend(section.questions)
    print(f"Collected {len(all_questions)} questions from all sections")
    streaming_service.only_update_terminal(f"Found {len(all_questions)} research questions across {len(answer_outline.answer_outline)} sections")
    writer({"yeild_value": streaming_service._format_annotations()})
    # Fetch relevant documents once for all questions
    streaming_service.only_update_terminal("Searching for relevant information across all connectors...")
    writer({"yeild_value": streaming_service._format_annotations()})
    relevant_documents = []
    async with async_session_maker() as db_session:
        try:
            relevant_documents = await fetch_relevant_documents(
                research_questions=all_questions,
                user_id=configuration.user_id,
                search_space_id=configuration.search_space_id,
                db_session=db_session,
                connectors_to_search=configuration.connectors_to_search,
                writer=writer,
                state=state
            )
        except Exception as e:
            error_message = f"Error fetching relevant documents: {str(e)}"
            print(error_message)
            streaming_service.only_update_terminal(error_message, "error")
            writer({"yeild_value": streaming_service._format_annotations()})
            # Log the error and continue with an empty list of documents
            # This allows the process to continue, but the report might lack information
            relevant_documents = []
            # Consider adding more robust error handling or reporting if needed
    print(f"Fetched {len(relevant_documents)} relevant documents for all sections")
    streaming_service.only_update_terminal(f"Starting to draft {len(answer_outline.answer_outline)} sections using {len(relevant_documents)} relevant document chunks")
    writer({"yeild_value": streaming_service._format_annotations()})
    # Create tasks to process each section in parallel with the same document set
    section_tasks = []
    streaming_service.only_update_terminal("Creating processing tasks for each section...")
    writer({"yeild_value": streaming_service._format_annotations()})
    for section in answer_outline.answer_outline:
        section_tasks.append(
            process_section_with_documents(
                section_title=section.section_title,
                section_questions=section.questions,
                user_query=configuration.user_query,
                user_id=configuration.user_id,
                search_space_id=configuration.search_space_id,
                relevant_documents=relevant_documents,
                state=state,
                writer=writer
            )
        )
    # Run all section processing tasks in parallel
    print(f"Running {len(section_tasks)} section processing tasks in parallel")
    streaming_service.only_update_terminal(f"Processing {len(section_tasks)} sections simultaneously...")
    writer({"yeild_value": streaming_service._format_annotations()})
    section_results = await asyncio.gather(*section_tasks, return_exceptions=True)
    # Handle any exceptions in the results
    streaming_service.only_update_terminal("Combining section results into final report...")
    writer({"yeild_value": streaming_service._format_annotations()})
    processed_results = []
    for i, result in enumerate(section_results):
        if isinstance(result, Exception):
            section_title = answer_outline.answer_outline[i].section_title
            error_message = f"Error processing section '{section_title}': {str(result)}"
            print(error_message)
            streaming_service.only_update_terminal(error_message, "error")
            writer({"yeild_value": streaming_service._format_annotations()})
            processed_results.append(error_message)
        else:
            processed_results.append(result)
    # Combine the section results into a final report
    final_report = []
    for content in processed_results:
        # Skip adding the section header since the content already contains the title
        final_report.append(content)
        final_report.append("\n")
    # Join all sections with newlines
    final_written_report = "\n".join(final_report)
    print(f"Generated final report with {len(final_report)} parts")
    streaming_service.only_update_terminal("Final research report generated successfully!")
    writer({"yeild_value": streaming_service._format_annotations()})
    if hasattr(state, 'streaming_service') and state.streaming_service:
        # Convert the final report to the expected format for UI:
        # A list of strings where empty strings represent line breaks
        formatted_report = []
        for section in final_report:
            if section == "\n":
                # Add an empty string for line breaks
                formatted_report.append("")
            else:
                # Split any multiline content by newlines and add each line
                section_lines = section.split("\n")
                formatted_report.extend(section_lines)
        state.streaming_service.only_update_answer(formatted_report)
        writer({"yeild_value": state.streaming_service._format_annotations()})
    return {
        "final_written_report": final_written_report
    }
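Review note: `process_sections` relies on `asyncio.gather(..., return_exceptions=True)` so that one failing section does not cancel the rest. A minimal sketch of that pattern, assuming a hypothetical `write_section` coroutine standing in for `process_section_with_documents`:

```python
import asyncio
from typing import List


async def write_section(title: str) -> str:
    """Hypothetical stand-in for process_section_with_documents."""
    if title == "Broken":
        raise RuntimeError("writer failed")
    await asyncio.sleep(0)  # yield control, as a real LLM call would
    return f"# {title}"


async def write_all(titles: List[str]) -> List[str]:
    # With return_exceptions=True, exceptions are returned as results
    # in the same positions as the awaitables that raised them.
    results = await asyncio.gather(
        *(write_section(t) for t in titles), return_exceptions=True
    )
    report = []
    for title, result in zip(titles, results):
        if isinstance(result, Exception):
            report.append(f"Error processing section '{title}': {result}")
        else:
            report.append(result)
    return report


report = asyncio.run(write_all(["Intro", "Broken", "Summary"]))
print(report)
```

Because `gather` preserves input order, each result can be paired with its section by position, which is what the index lookup into `answer_outline.answer_outline` depends on.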
 async def process_section_with_documents(
    section_title: str,
    section_questions: List[str],
    user_id: str,
    search_space_id: int,
    relevant_documents: List[Dict[str, Any]],
    user_query: str,
    state: State = None,
    writer: StreamWriter = None
 ) -> str:
    """
    Process a single section using pre-fetched documents.
    Args:
        section_title: The title of the section
        section_questions: List of research questions for this section
        user_id: The user ID
        search_space_id: The search space ID
        relevant_documents: Pre-fetched documents to use for this section
        user_query: The original user query the report is answering
        state: The current state
        writer: StreamWriter for sending progress updates
    Returns:
        The written section content
    """
    try:
        # Use the provided documents
        documents_to_use = relevant_documents
        # Send status update via streaming if available
        if state and state.streaming_service and writer:
            state.streaming_service.only_update_terminal(f"Writing section: {section_title} with {len(section_questions)} research questions")
            writer({"yeild_value": state.streaming_service._format_annotations()})
        # Fallback if no documents found
        if not documents_to_use:
            print(f"No relevant documents found for section: {section_title}")
            if state and state.streaming_service and writer:
                state.streaming_service.only_update_terminal(f"Warning: No relevant documents found for section: {section_title}", "warning")
                writer({"yeild_value": state.streaming_service._format_annotations()})
            documents_to_use = [
                {"content": f"No specific information was found for: {question}"}
                for question in section_questions
            ]
        # Create a new database session for this section
        async with async_session_maker() as db_session:
            # Call the sub_section_writer graph with the appropriate config
            config = {
                "configurable": {
                    "sub_section_title": section_title,
                    "sub_section_questions": section_questions,
                    "user_query": user_query,
                    "relevant_documents": documents_to_use,
                    "user_id": user_id,
                    "search_space_id": search_space_id
                }
            }
            # Create the initial state with db_session
            sub_state = {"db_session": db_session}
            # Invoke the sub-section writer graph
            print(f"Invoking sub_section_writer for: {section_title}")
            if state and state.streaming_service and writer:
                state.streaming_service.only_update_terminal(f"Analyzing information and drafting content for section: {section_title}")
                writer({"yeild_value": state.streaming_service._format_annotations()})
            result = await sub_section_writer_graph.ainvoke(sub_state, config)
            # Return the final answer from the sub_section_writer
            final_answer = result.get("final_answer", "No content was generated for this section.")
            # Send section content update via streaming if available
            if state and state.streaming_service and writer:
                state.streaming_service.only_update_terminal(f"Completed writing section: {section_title}")
                writer({"yeild_value": state.streaming_service._format_annotations()})
            return final_answer
    except Exception as e:
        print(f"Error processing section '{section_title}': {str(e)}")
        # Send error update via streaming if available
        if state and state.streaming_service and writer:
            state.streaming_service.only_update_terminal(f"Error processing section '{section_title}': {str(e)}", "error")
            writer({"yeild_value": state.streaming_service._format_annotations()})
        return f"Error processing section: {section_title}. Details: {str(e)}"
--- a/surfsense_backend/app/agents/researcher/prompts.py
+++ b/surfsense_backend/app/agents/researcher/prompts.py
@ -0,0 +1,92 @@
 import datetime
 def get_answer_outline_system_prompt():
    return f"""
 Today's date: {datetime.datetime.now().strftime("%Y-%m-%d")}
 <answer_outline_system>
 You are an expert research assistant specializing in structuring information. Your task is to create a detailed and logical research outline based on the user's query. This outline will serve as the blueprint for generating a comprehensive research report.
 <input>
 - user_query (string): The main question or topic the user wants researched. This guides the entire outline creation process.
 - num_sections (integer): The target number of distinct sections the final research report should have. This helps control the granularity and structure of the outline.
 </input>
 <output_format>
 A JSON object with the following structure:
 {{
  "answer_outline": [
    {{
      "section_id": 0,
      "section_title": "Section Title",
      "questions": [
        "Question 1 to research for this section",
        "Question 2 to research for this section"
      ]
    }}
  ]
 }}
 </output_format>
 <instructions>
 1.  **Deconstruct the `user_query`:** Identify the key concepts, entities, and the core information requested by the user.
 2.  **Determine Section Themes:** Based on the analysis and the requested `num_sections`, divide the topic into distinct, logical themes or sub-topics. Each theme will become a section. Ensure these themes collectively address the `user_query` comprehensively.
 3.  **Develop Sections:** For *each* of the `num_sections`:
    *   **Assign `section_id`:** Start with 0 and increment sequentially for each section.
    *   **Craft `section_title`:** Write a concise, descriptive title that clearly defines the scope and focus of the section's theme.
    *   **Formulate Research `questions`:** Generate 2 to 5 specific, targeted research questions for this section. These questions must:
        *   Directly relate to the `section_title` and explore its key aspects.
        *   Be answerable through focused research (e.g., searching documents, databases, or knowledge bases).
        *   Be distinct from each other and from questions in other sections. Avoid redundancy.
        *   Collectively guide the gathering of information needed to fully address the section's theme.
 4.  **Ensure Logical Flow:** Arrange the sections in a coherent and intuitive sequence. Consider structures like:
    *   General background -> Specific details -> Analysis/Comparison -> Applications/Implications
    *   Problem definition -> Proposed solutions -> Evaluation -> Conclusion
    *   Chronological progression
 5.  **Verify Completeness and Cohesion:** Review the entire outline (every `section_title` and its `questions`) to confirm that:
    *   All sections together provide a complete and well-structured answer to the original `user_query`.
    *   There are no significant overlaps or gaps in coverage between sections.
 6.  **Adhere Strictly to Output Format:** Ensure the final output is a valid JSON object matching the specified structure exactly, including correct field names (`answer_outline`, `section_id`, `section_title`, `questions`) and data types.
 </instructions>
 <examples>
 User Query: "What are the health benefits of meditation?"
 Number of Sections: 3
 {{
  "answer_outline": [
    {{
      "section_id": 0,
      "section_title": "Physical Health Benefits of Meditation",
      "questions": [
        "What physiological changes occur in the body during meditation?",
        "How does regular meditation affect blood pressure and heart health?",
        "What impact does meditation have on inflammation and immune function?",
        "Can meditation help with pain management, and if so, how?"
      ]
    }},
    {{
      "section_id": 1,
      "section_title": "Mental Health Benefits of Meditation",
      "questions": [
        "How does meditation affect stress and anxiety levels?",
        "What changes in brain structure or function have been observed in meditation practitioners?",
        "Can meditation help with depression and mood disorders?",
        "What is the relationship between meditation and cognitive function?"
      ]
    }},
    {{
      "section_id": 2,
      "section_title": "Best Meditation Practices for Maximum Benefits",
      "questions": [
        "What are the most effective meditation techniques for beginners?",
        "How long and how frequently should one meditate to see benefits?",
        "Are there specific meditation approaches best suited for particular health goals?",
        "What common obstacles prevent people from experiencing meditation benefits?"
      ]
    }}
  ]
 }}
 </examples>
 </answer_outline_system>
 """
--- a/surfsense_backend/app/agents/researcher/state.py
+++ b/surfsense_backend/app/agents/researcher/state.py
@ -0,0 +1,31 @@
 """Define the state structures for the agent."""
 from __future__ import annotations
 from dataclasses import dataclass, field
 from typing import Optional, Any
 from sqlalchemy.ext.asyncio import AsyncSession
 from app.utils.streaming_service import StreamingService
@dataclass
 class State:
    """Defines the dynamic state for the agent during execution.
    This state tracks the database session and the outputs generated by the agent's nodes.
    See: https://langchain-ai.github.io/langgraph/concepts/low_level/#state
    for more information.
    """
    # Runtime context (not part of actual graph state)
    db_session: AsyncSession
    # Streaming service
    streaming_service: StreamingService
    # Intermediate state - populated during workflow
    # Using field to explicitly mark as part of state
    answer_outline: Optional[Any] = field(default=None)
    # OUTPUT: Populated by agent nodes
    # Using field to explicitly mark as part of state
    final_written_report: Optional[str] = field(default=None)
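The `answer_outline` field above is typed `Optional[Any]`, so nothing in the state itself checks the outline's shape before later nodes read it. A minimal sketch of such a check follows, assuming the model returns the JSON structure named in the outline prompt (`answer_outline`, `section_id`, `section_title`, `questions`). The helper name `validate_answer_outline` is hypothetical and not part of this commit.

```python
import json
from typing import Any, Dict, List

# Field names come from the outline prompt; the expected types are an assumption.
REQUIRED_SECTION_FIELDS = {"section_id": int, "section_title": str, "questions": list}


def validate_answer_outline(raw: str) -> List[Dict[str, Any]]:
    """Parse the model output and return its sections, or raise ValueError."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("answer_outline"), list):
        raise ValueError("expected an object with an 'answer_outline' list")
    sections = data["answer_outline"]
    for index, section in enumerate(sections):
        if not isinstance(section, dict):
            raise ValueError(f"section {index} is not an object")
        for name, expected_type in REQUIRED_SECTION_FIELDS.items():
            if not isinstance(section.get(name), expected_type):
                raise ValueError(f"section {index}: bad or missing field '{name}'")
        if not all(isinstance(question, str) for question in section["questions"]):
            raise ValueError(f"section {index}: every question must be a string")
    return sections
```

A node could call this on the raw model response before storing the result in `State.answer_outline`, so that malformed output fails early rather than in a downstream writer.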
--- a/surfsense_backend/app/agents/researcher/sub_section_writer/__init__.py
+++ b/surfsense_backend/app/agents/researcher/sub_section_writer/__init__.py
@ -0,0 +1,8 @@
 """New LangGraph Agent.
 This module defines a custom graph.
 """
 from .graph import graph
 __all__ = ["graph"]
--- a/surfsense_backend/app/agents/researcher/sub_section_writer/configuration.py
+++ b/surfsense_backend/app/agents/researcher/sub_section_writer/configuration.py
@ -0,0 +1,31 @@
 """Define the configurable parameters for the agent."""
 from __future__ import annotations
 from dataclasses import dataclass, fields
 from typing import Optional, List, Any
 from langchain_core.runnables import RunnableConfig
@dataclass(kw_only=True)
 class Configuration:
    """The configuration for the agent."""
    # Input parameters provided at invocation
    sub_section_title: str
    sub_section_questions: List[str]
    user_query: str
    relevant_documents: List[Any]  # Documents provided directly to the agent
    user_id: str
    search_space_id: int
    @classmethod
    def from_runnable_config(
        cls, config: Optional[RunnableConfig] = None
    ) -> Configuration:
        """Create a Configuration instance from a RunnableConfig object."""
        configurable = (config.get("configurable") or {}) if config else {}
        _fields = {f.name for f in fields(cls) if f.init}
        return cls(**{k: v for k, v in configurable.items() if k in _fields})
--- a/surfsense_backend/app/agents/researcher/sub_section_writer/graph.py
+++ b/surfsense_backend/app/agents/researcher/sub_section_writer/graph.py
@ -0,0 +1,20 @@
 from langgraph.graph import StateGraph
 from .state import State
 from .nodes import write_sub_section, rerank_documents
 from .configuration import Configuration
 # Define a new graph
 workflow = StateGraph(State, config_schema=Configuration)
 # Add the nodes to the graph
 workflow.add_node("rerank_documents", rerank_documents)
 workflow.add_node("write_sub_section", write_sub_section)
 # Connect the nodes
 workflow.add_edge("__start__", "rerank_documents")
 workflow.add_edge("rerank_documents", "write_sub_section")
 workflow.add_edge("write_sub_section", "__end__")
 # Compile the workflow into an executable graph
 graph = workflow.compile()
 graph.name = "Sub Section Writer"  # This defines the custom name in LangSmith
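The graph above is a straight line: `rerank_documents`, then `write_sub_section`. The sketch below is a plain-Python stand-in for that flow, not LangGraph itself. It assumes each node returns a partial update whose keys overwrite the matching state fields; `DemoState`, `run_linear`, and the stub node bodies are all hypothetical.

```python
import asyncio
from dataclasses import dataclass, replace
from typing import Any, Awaitable, Callable, Dict, List, Optional


@dataclass
class DemoState:
    """Reduced stand-in for the sub-section writer state (hypothetical)."""

    reranked_documents: Optional[List[Any]] = None
    final_answer: Optional[str] = None


Node = Callable[[DemoState], Awaitable[Dict[str, Any]]]


async def rerank_documents(state: DemoState) -> Dict[str, Any]:
    # Stub: a real node would call the reranker service here.
    return {"reranked_documents": ["doc-b", "doc-a"]}


async def write_sub_section(state: DemoState) -> Dict[str, Any]:
    # Stub: a real node would prompt the LLM with the documents.
    docs = state.reranked_documents or []
    return {"final_answer": f"Answer written from {len(docs)} documents"}


async def run_linear(nodes: List[Node], state: DemoState) -> DemoState:
    """Run nodes in order, merging each partial update into a new state."""
    for node in nodes:
        update = await node(state)
        state = replace(state, **update)
    return state


final_state = asyncio.run(
    run_linear([rerank_documents, write_sub_section], DemoState())
)
```

The point of the sketch is the data flow: whatever the first node returns under `reranked_documents` is what the second node can read from state.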
--- a/surfsense_backend/app/agents/researcher/sub_section_writer/nodes.py
+++ b/surfsense_backend/app/agents/researcher/sub_section_writer/nodes.py
@ -0,0 +1,160 @@
 from .configuration import Configuration
 from langchain_core.runnables import RunnableConfig
 from .state import State
 from typing import Any, Dict
 from app.config import config as app_config
 from .prompts import get_citation_system_prompt
 from langchain_core.messages import HumanMessage, SystemMessage
 async def rerank_documents(state: State, config: RunnableConfig) -> Dict[str, Any]:
    """
    Rerank the documents based on relevance to the original user query.
    This node takes the relevant documents provided in the configuration,
    reranks them with the reranker service using the user query,
    and updates the state with the reranked documents.
    Returns:
        Dict containing the reranked documents.
    """
    # Get configuration and relevant documents
    configuration = Configuration.from_runnable_config(config)
    documents = configuration.relevant_documents
    sub_section_questions = configuration.sub_section_questions
    # If no documents were provided, return empty list
    if not documents or len(documents) == 0:
        return {
            "reranked_documents": []
        }
    # Get reranker service from app config
    reranker_service = getattr(app_config, "reranker_service", None)
    # Use documents as is if no reranker service is available
    reranked_docs = documents
    if reranker_service:
        try:
            # Use the original user query for reranking context
            # (alternative: rerank_query = "\n".join(sub_section_questions))
            rerank_query = configuration.user_query
            # Convert documents to format expected by reranker if needed
            reranker_input_docs = [
                {
                    "chunk_id": doc.get("chunk_id", f"chunk_{i}"),
                    "content": doc.get("content", ""),
                    "score": doc.get("score", 0.0),
                    "document": {
                        "id": doc.get("document", {}).get("id", ""),
                        "title": doc.get("document", {}).get("title", ""),
                        "document_type": doc.get("document", {}).get("document_type", ""),
                        "metadata": doc.get("document", {}).get("metadata", {})
                    }
                } for i, doc in enumerate(documents)
            ]
            # Rerank documents against the query chosen above
            reranked_docs = reranker_service.rerank_documents(rerank_query, reranker_input_docs)
            # Sort by score in descending order
            reranked_docs.sort(key=lambda x: x.get("score", 0), reverse=True)
            print(f"Reranked {len(reranked_docs)} documents for section: {configuration.sub_section_title}")
        except Exception as e:
            print(f"Error during reranking: {str(e)}")
            # Use original docs if reranking fails
    return {
        "reranked_documents": reranked_docs
    }
 async def write_sub_section(state: State, config: RunnableConfig) -> Dict[str, Any]:
    """
    Write the sub-section using the provided documents.
    This node takes the relevant documents provided in the configuration and uses
    an LLM to generate a comprehensive answer to the sub-section title with
    proper citations. The citations follow IEEE format using source IDs from the
    documents.
    Returns:
        Dict containing the final answer in the "final_answer" key.
    """
    # Get configuration; prefer the reranked documents from state and fall back
    # to the documents supplied in the configuration
    configuration = Configuration.from_runnable_config(config)
    documents = state.reranked_documents or configuration.relevant_documents
    # Initialize LLM
    llm = app_config.fast_llm_instance
    # If no documents were provided, return a message indicating this
    if not documents or len(documents) == 0:
        return {
            "final_answer": "No relevant documents were provided to answer this question. Please provide documents or try a different approach."
        }
    # Prepare documents for citation formatting
    formatted_documents = []
    for i, doc in enumerate(documents):
        # Extract content and metadata
        content = doc.get("content", "")
        doc_info = doc.get("document", {})
        document_id = doc_info.get("id", f"{i+1}")  # Use document ID or index+1 as source_id
        # Format document according to the citation system prompt's expected format
        formatted_doc = f"""
        <document>
            <metadata>
                <source_id>{document_id}</source_id>
            </metadata>
            <content>
                {content}
            </content>
        </document>
        """
        formatted_documents.append(formatted_doc)
    # Create the query that uses the section title and questions
    section_title = configuration.sub_section_title
    sub_section_questions = configuration.sub_section_questions
    user_query = configuration.user_query  # Get the original user query
    documents_text = "\n".join(formatted_documents)
    # Format the questions as bullet points for clarity
    questions_text = "\n".join([f"- {question}" for question in sub_section_questions])
    # Construct a clear, structured query for the LLM
    human_message_content = f"""
    The user's query is:
    <user_query>
        {user_query}
    </user_query>
    The sub-section title is:
    <sub_section_title>
        {section_title}
    </sub_section_title>
    Use the provided documents as your source material and cite them properly using the IEEE citation format [X] where X is the source_id.
    <documents>
        {documents_text}
    </documents>
    """
    # Create messages for the LLM
    messages = [
        SystemMessage(content=get_citation_system_prompt()),
        HumanMessage(content=human_message_content)
    ]
    # Call the LLM and get the response
    response = await llm.ainvoke(messages)
    final_answer = response.content
    return {
        "final_answer": final_answer
    }
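The citation prompt asks the model to cite only `source_id` values present in the supplied documents, but `write_sub_section` returns `final_answer` without checking this. A minimal sketch of a post-hoc check follows, assuming numeric `source_id` values and plain `[X]` citations as in the prompt's examples. The helper `find_unknown_citations` is hypothetical and not part of this commit.

```python
import re
from typing import Iterable, Set

# Matches plain bracketed numeric citations such as [1] or [13].
CITATION_PATTERN = re.compile(r"\[(\d+)\]")


def find_unknown_citations(answer: str, source_ids: Iterable[object]) -> Set[str]:
    """Return cited ids that do not appear among the provided source ids."""
    known = {str(source_id) for source_id in source_ids}
    cited = set(CITATION_PATTERN.findall(answer))
    return cited - known


unknown = find_unknown_citations(
    "The reef spans over 2,300 kilometers [1], [13]. An invented claim [99].",
    [1, 13, 21],
)
```

A non-empty result indicates citations the model made up; whether to strip them, retry, or log them is a design choice left open here.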
--- a/surfsense_backend/app/agents/researcher/sub_section_writer/prompts.py
+++ b/surfsense_backend/app/agents/researcher/sub_section_writer/prompts.py
@ -0,0 +1,87 @@
 import datetime
 def get_citation_system_prompt():
    return f"""
 Today's date: {datetime.datetime.now().strftime("%Y-%m-%d")}
 You are a research assistant tasked with analyzing documents and providing comprehensive answers with proper citations in IEEE format.
 <instructions>
 1. Carefully analyze all provided documents in the <document> sections.
 2. Extract relevant information that addresses the user's query.
 3. Synthesize a comprehensive, well-structured answer using information from these documents.
 4. For EVERY piece of information you include from the documents, add an IEEE-style citation in square brackets [X] where X is the source_id from the document's metadata.
 5. Make sure ALL factual statements from the documents have proper citations.
 6. If multiple documents support the same point, include all relevant citations [X], [Y].
 7. Present information in a logical, coherent flow.
 8. Use your own words to connect ideas, but cite ALL information from the documents.
 9. If documents contain conflicting information, acknowledge this and present both perspectives with appropriate citations.
 10. Do not make up or include information not found in the provided documents.
 11. CRITICAL: You MUST use the exact source_id value from each document's metadata for citations. Do not create your own citation numbers.
 12. CRITICAL: Every citation MUST be in the IEEE format [X] where X is the exact source_id value.
 13. CRITICAL: Never renumber or reorder citations - always use the original source_id values.
 14. CRITICAL: Do not return citations as clickable links.
 15. CRITICAL: Never format citations as markdown links like "([1](https://example.com))". Always use plain square brackets only.
 16. CRITICAL: Citations must ONLY appear as [X] or [X], [Y], [Z] format - never with parentheses, hyperlinks, or other formatting.
 17. CRITICAL: Never make up citation numbers. Only use source_id values that are explicitly provided in the document metadata.
 18. CRITICAL: If you are unsure about a source_id, do not include a citation rather than guessing or making one up.
 </instructions>
 <format>
 - Write in clear, professional language suitable for academic or technical audiences
 - Organize your response with appropriate paragraphs, headings, and structure
 - Every fact from the documents must have an IEEE-style citation in square brackets [X] where X is the EXACT source_id from the document's metadata
 - Citations should appear at the end of the sentence containing the information they support
 - Multiple citations should be separated by commas: [X], [Y], [Z]
 - Do not include a references section; use only the citation numbers in the answer.
 - NEVER create your own citation numbering system - use the exact source_id values from the documents.
 - NEVER format citations as clickable links or as markdown links like "([1](https://example.com))". Always use plain square brackets only.
 - NEVER make up citation numbers if you are unsure about the source_id. It is better to omit the citation than to guess.
 </format>
 <input_example>
    <document>
        <metadata>
            <source_id>1</source_id>
        </metadata>
        <content>
            The Great Barrier Reef is the world's largest coral reef system, stretching over 2,300 kilometers along the coast of Queensland, Australia. It comprises over 2,900 individual reefs and 900 islands.
        </content>
    </document>
    <document>
        <metadata>
            <source_id>13</source_id>
        </metadata>
        <content>
            Climate change poses a significant threat to coral reefs worldwide. Rising ocean temperatures have led to mass coral bleaching events in the Great Barrier Reef in 2016, 2017, and 2020.
        </content>
    </document>
    <document>
        <metadata>
            <source_id>21</source_id>
        </metadata>
        <content>
            The Great Barrier Reef was designated a UNESCO World Heritage Site in 1981 due to its outstanding universal value and biological diversity. It is home to over 1,500 species of fish and 400 types of coral.
        </content>
    </document>
 </input_example>
 <output_example>
    The Great Barrier Reef is the world's largest coral reef system, stretching over 2,300 kilometers along the coast of Queensland, Australia [1]. It was designated a UNESCO World Heritage Site in 1981 due to its outstanding universal value and biological diversity [21]. The reef is home to over 1,500 species of fish and 400 types of coral [21]. Unfortunately, climate change poses a significant threat to coral reefs worldwide, with rising ocean temperatures leading to mass coral bleaching events in the Great Barrier Reef in 2016, 2017, and 2020 [13]. The reef system comprises over 2,900 individual reefs and 900 islands [1], making it an ecological treasure that requires protection from multiple threats [1], [13].
 </output_example>
 <incorrect_citation_formats>
 DO NOT use any of these incorrect citation formats:
 - Using parentheses and markdown links: ([1](https://github.com/MODSetter/SurfSense))
 - Using parentheses around brackets: ([1])
 - Using hyperlinked text: [link to source 1](https://example.com)
 - Using footnote style: ... reef system¹
 - Making up citation numbers when source_id is unknown
 ONLY use plain square brackets [1] or multiple citations [1], [2], [3]
 </incorrect_citation_formats>
 Note that the citation numbers match exactly with the source_id values (1, 13, and 21) and are not renumbered sequentially. Citations follow IEEE style with square brackets and appear at the end of sentences.
 """
--- a/surfsense_backend/app/agents/researcher/sub_section_writer/state.py
+++ b/surfsense_backend/app/agents/researcher/sub_section_writer/state.py
@ -0,0 +1,23 @@
 """Define the state structures for the agent."""
 from __future__ import annotations
 from dataclasses import dataclass
 from typing import List, Optional, Any
 from sqlalchemy.ext.asyncio import AsyncSession
@dataclass
 class State:
    """Defines the dynamic state for the agent during execution.
    This state tracks the database session and the outputs generated by the agent's nodes.
    See: https://langchain-ai.github.io/langgraph/concepts/low_level/#state
    for more information.
    """
    # Runtime context
    db_session: AsyncSession
    # OUTPUT: Populated by agent nodes
    reranked_documents: Optional[List[Any]] = None
    final_answer: Optional[str] = None
--- a/surfsense_backend/app/config/__init__.py
+++ b/surfsense_backend/app/config/__init__.py
@ -15,17 +15,6 @@ env_file = BASE_DIR / ".env"
 load_dotenv(env_file)
 def extract_model_name(llm_string: str) -> str:
    """Extract the model name from an LLM string.
    Example: "litellm:openai/gpt-4o-mini" -> "openai/gpt-4o-mini"
    Args:
        llm_string: The LLM string with optional prefix
    Returns:
        str: The extracted model name
    """
    return llm_string.split(":", 1)[1] if ":" in llm_string else llm_string
 class Config:
    # Database
@ -38,15 +27,13 @@ class Config:
    # LONG-CONTEXT LLMS
    LONG_CONTEXT_LLM = os.getenv("LONG_CONTEXT_LLM")
-    long_context_llm_instance = ChatLiteLLM(model=extract_model_name(LONG_CONTEXT_LLM))
+    long_context_llm_instance = ChatLiteLLM(model=LONG_CONTEXT_LLM)
    # GPT Researcher
    FAST_LLM = os.getenv("FAST_LLM")
    SMART_LLM = os.getenv("SMART_LLM")
    STRATEGIC_LLM = os.getenv("STRATEGIC_LLM")
-    fast_llm_instance = ChatLiteLLM(model=extract_model_name(FAST_LLM))
+    fast_llm_instance = ChatLiteLLM(model=FAST_LLM)
-    smart_llm_instance = ChatLiteLLM(model=extract_model_name(SMART_LLM))
+    strategic_llm_instance = ChatLiteLLM(model=STRATEGIC_LLM)
    strategic_llm_instance = ChatLiteLLM(model=extract_model_name(STRATEGIC_LLM))
    # Chonkie Configuration | Edit this to your needs
--- a/surfsense_backend/app/connectors/github_connector.py
+++ b/surfsense_backend/app/connectors/github_connector.py
@ -0,0 +1,211 @@
 import base64
 import logging
 from typing import List, Optional, Dict, Any, Tuple
 from github3 import login as github_login, exceptions as github_exceptions
 from github3.repos.contents import Contents
 from github3.exceptions import ForbiddenError, NotFoundError
 logger = logging.getLogger(__name__)
 # List of common code file extensions to target
 CODE_EXTENSIONS = {
    '.py', '.js', '.jsx', '.ts', '.tsx', '.java', '.c', '.cpp', '.h', '.hpp',
    '.cs', '.go', '.rb', '.php', '.swift', '.kt', '.scala', '.rs', '.m',
    '.sh', '.bash', '.ps1', '.lua', '.pl', '.pm', '.r', '.dart', '.sql'
 }
 # List of common documentation/text file extensions
 DOC_EXTENSIONS = {
    '.md', '.txt', '.rst', '.adoc', '.html', '.htm', '.xml', '.json', '.yaml', '.yml', '.toml'
 }
 # Maximum file size in bytes (e.g., 1MB)
 MAX_FILE_SIZE = 1 * 1024 * 1024
 class GitHubConnector:
    """Connector for interacting with the GitHub API."""
    # Directories to skip during file traversal
    SKIPPED_DIRS = {
        # Version control
        '.git',
        # Dependencies
        'node_modules',
        'vendor', 
        # Build artifacts / Caches
        'build',
        'dist',
        'target',
        '__pycache__',
        # Virtual environments
        'venv',
        '.venv',
        'env',
        # IDE/Editor config
        '.vscode',
        '.idea',
        '.project',
        '.settings',
        # Temporary / Logs
        'tmp',
        'logs',
        # Add other project-specific irrelevant directories if needed
    }
    def __init__(self, token: str):
        """
        Initializes the GitHub connector.
        Args:
            token: GitHub Personal Access Token (PAT).
        """
        if not token:
            raise ValueError("GitHub token cannot be empty.")
        try:
            self.gh = github_login(token=token)
            # Try a simple authenticated call to check token validity
            self.gh.me()
            logger.info("Successfully authenticated with GitHub API.")
        except (github_exceptions.AuthenticationFailed, ForbiddenError) as e:
            logger.error(f"GitHub authentication failed: {e}")
            raise ValueError("Invalid GitHub token or insufficient permissions.")
        except Exception as e:
            logger.error(f"Failed to initialize GitHub client: {e}")
            raise
    def get_user_repositories(self) -> List[Dict[str, Any]]:
        """Fetches repositories owned by the authenticated user."""
        repos_data = []
        try:
            # type='owner' fetches repos owned by the user
            # type='member' fetches repos the user is a collaborator on (including orgs)
            # type='all' fetches both
            for repo in self.gh.repositories(type='owner', sort='updated'):
                repos_data.append({
                    "id": repo.id,
                    "name": repo.name,
                    "full_name": repo.full_name,
                    "private": repo.private,
                    "url": repo.html_url,
                    "description": repo.description or "",
                    "last_updated": repo.updated_at if repo.updated_at else None,
                })
            logger.info(f"Fetched {len(repos_data)} repositories.")
            return repos_data
        except Exception as e:
            logger.error(f"Failed to fetch GitHub repositories: {e}")
            return [] # Return empty list on error
    def get_repository_files(self, repo_full_name: str, path: str = '') -> List[Dict[str, Any]]:
        """
        Recursively fetches details of relevant files (code, docs) within a repository path.
        Args:
            repo_full_name: The full name of the repository (e.g., 'owner/repo').
            path: The starting path within the repository (default is root).
        Returns:
            A list of dictionaries, each containing file details (path, sha, url, size, type).
            Returns an empty list if the repository or path is not found or on error.
        """
        files_list = []
        try:
            owner, repo_name = repo_full_name.split('/')
            repo = self.gh.repository(owner, repo_name)
            if not repo:
                logger.warning(f"Repository '{repo_full_name}' not found.")
                return []
            contents = repo.directory_contents(directory_path=path) # Use directory_contents for clarity
            # contents returns a list of tuples (name, content_obj)
            for item_name, content_item in contents:
                if not isinstance(content_item, Contents):
                    continue
                if content_item.type == 'dir':
                    # Check if the directory name is in the skipped list
                    if content_item.name in self.SKIPPED_DIRS:
                        logger.debug(f"Skipping directory: {content_item.path}")
                        continue # Skip recursion for this directory
                    # Recursively fetch contents of subdirectory
                    files_list.extend(self.get_repository_files(repo_full_name, path=content_item.path))
                elif content_item.type == 'file':
                    # Check if the file extension is relevant and size is within limits
                    file_extension = '.' + content_item.name.split('.')[-1].lower() if '.' in content_item.name else ''
                    is_code = file_extension in CODE_EXTENSIONS
                    is_doc = file_extension in DOC_EXTENSIONS
                    if (is_code or is_doc) and content_item.size <= MAX_FILE_SIZE:
                        files_list.append({
                            "path": content_item.path,
                            "sha": content_item.sha,
                            "url": content_item.html_url,
                            "size": content_item.size,
                            "type": "code" if is_code else "doc"
                        })
                    elif content_item.size > MAX_FILE_SIZE:
                         logger.debug(f"Skipping large file: {content_item.path} ({content_item.size} bytes)")
                    else:
                         logger.debug(f"Skipping irrelevant file type: {content_item.path}")
        except (NotFoundError, ForbiddenError) as e:
             logger.warning(f"Cannot access path '{path}' in '{repo_full_name}': {e}")
        except Exception as e:
            logger.error(f"Failed to get files for {repo_full_name} at path '{path}': {e}")
            # Return what we have collected so far in case of partial failure
        return files_list
    def get_file_content(self, repo_full_name: str, file_path: str) -> Optional[str]:
        """
        Fetches the decoded content of a specific file.
        Args:
            repo_full_name: The full name of the repository (e.g., 'owner/repo').
            file_path: The path to the file within the repository.
        Returns:
            The decoded file content as a string, or None if fetching fails or file is too large.
        """
        try:
            owner, repo_name = repo_full_name.split('/')
            repo = self.gh.repository(owner, repo_name)
            if not repo:
                logger.warning(f"Repository '{repo_full_name}' not found when fetching file '{file_path}'.")
                return None
            content_item = repo.file_contents(path=file_path) # Use file_contents for clarity
            if not content_item or not isinstance(content_item, Contents) or content_item.type != 'file':
                logger.warning(f"File '{file_path}' not found or is not a file in '{repo_full_name}'.")
                return None
            if content_item.size > MAX_FILE_SIZE:
                logger.warning(f"File '{file_path}' in '{repo_full_name}' exceeds max size ({content_item.size} > {MAX_FILE_SIZE}). Skipping content fetch.")
                return None
            # Content is base64 encoded
            if content_item.content:
                try:
                    decoded_content = base64.b64decode(content_item.content).decode('utf-8')
                    return decoded_content
                except UnicodeDecodeError:
                    logger.warning(f"Could not decode file '{file_path}' in '{repo_full_name}' as UTF-8. Trying with 'latin-1'.")
                    try:
                        # Try a fallback encoding
                        decoded_content = base64.b64decode(content_item.content).decode('latin-1')
                        return decoded_content
                    except Exception as decode_err:
                        logger.error(f"Failed to decode file '{file_path}' with fallback encoding: {decode_err}")
                        return None # Give up if fallback fails
            else:
                logger.warning(f"No content returned for file '{file_path}' in '{repo_full_name}'. It might be empty.")
                return "" # Return empty string for empty files
        except (NotFoundError, ForbiddenError) as e:
             logger.warning(f"Cannot access file '{file_path}' in '{repo_full_name}': {e}")
             return None
        except Exception as e:
            logger.error(f"Failed to get content for file '{file_path}' in '{repo_full_name}': {e}")
            return None 
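The decoding step in `get_file_content` tries UTF-8 first and falls back to Latin-1. The sketch below isolates that step using only the standard library; `decode_base64_text` is a hypothetical helper name, not a function from this commit.

```python
import base64


def decode_base64_text(encoded: str) -> str:
    """Decode base64 content as UTF-8, falling back to Latin-1."""
    raw = base64.b64decode(encoded)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never raises,
        # but binary or differently encoded files will come out as mojibake.
        return raw.decode("latin-1")


utf8_payload = base64.b64encode("héllo".encode("utf-8")).decode("ascii")
latin1_payload = base64.b64encode("héllo".encode("latin-1")).decode("ascii")
```

Since the Latin-1 fallback cannot fail, the `return None` branch for a failed fallback in the connector is effectively unreachable for decoding errors; the extension allow-list is what keeps binary files out.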
--- a/surfsense_backend/app/connectors/linear_connector.py
+++ b/surfsense_backend/app/connectors/linear_connector.py
@ -0,0 +1,454 @@
 """
 Linear Connector Module
 A module for retrieving issues and comments from Linear.
 Allows fetching issue lists and their comments with date range filtering.
 """
 import requests
 from datetime import datetime, timedelta
 from typing import Dict, List, Optional, Tuple, Any, Union
 class LinearConnector:
    """Class for retrieving issues and comments from Linear."""
    def __init__(self, token: Optional[str] = None):
        """
        Initialize the LinearConnector class.
        Args:
            token: Linear API token (optional, can be set later with set_token)
        """
        self.token = token
        self.api_url = "https://api.linear.app/graphql"
    def set_token(self, token: str) -> None:
        """
        Set the Linear API token.
        Args:
            token: Linear API token
        """
        self.token = token
    def get_headers(self) -> Dict[str, str]:
        """
        Get headers for Linear API requests.
        Returns:
            Dictionary of headers
        Raises:
            ValueError: If no Linear token has been set
        """
        if not self.token:
            raise ValueError("Linear token not initialized. Call set_token() first.")
        return {
            'Content-Type': 'application/json',
            'Authorization': self.token
        }
    def execute_graphql_query(self, query: str, variables: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        """
        Execute a GraphQL query against the Linear API.
        Args:
            query: GraphQL query string
            variables: Variables for the GraphQL query (optional)
        Returns:
            Response data from the API
        Raises:
            ValueError: If no Linear token has been set
            Exception: If the API request fails
        """
        if not self.token:
            raise ValueError("Linear token not initialized. Call set_token() first.")
        headers = self.get_headers()
        payload = {'query': query}
        if variables:
            payload['variables'] = variables
        response = requests.post(
            self.api_url,
            headers=headers,
            json=payload
        )
        if response.status_code == 200:
            return response.json()
        else:
            raise Exception(f"Query failed with status code {response.status_code}: {response.text}")
    def get_all_issues(self, include_comments: bool = True) -> List[Dict[str, Any]]:
        """
        Fetch all issues from Linear.
        Args:
            include_comments: Whether to include comments in the response
        Returns:
            List of issue objects
        Raises:
            ValueError: If no Linear token has been set
            Exception: If the API request fails
        """
        comments_query = ""
        if include_comments:
            comments_query = """
            comments {
                nodes {
                    id
                    body
                    user {
                        id
                        name
                        email
                    }
                    createdAt
                    updatedAt
                }
            }
            """
        query = f"""
        query {{
            issues {{
                nodes {{
                    id
                    identifier
                    title
                    description
                    state {{
                        id
                        name
                        type
                    }}
                    assignee {{
                        id
                        name
                        email
                    }}
                    creator {{
                        id
                        name
                        email
                    }}
                    createdAt
                    updatedAt
                    {comments_query}
                }}
            }}
        }}
        """
        result = self.execute_graphql_query(query)
        # Extract issues from the response
        if "data" in result and "issues" in result["data"] and "nodes" in result["data"]["issues"]:
            return result["data"]["issues"]["nodes"]
        return []
    def get_issues_by_date_range(
        self, 
        start_date: str, 
        end_date: str,
        include_comments: bool = True
    ) -> Tuple[List[Dict[str, Any]], Optional[str]]:
        """
        Fetch issues within a date range.
        Args:
            start_date: Start date in YYYY-MM-DD format
            end_date: End date in YYYY-MM-DD format (inclusive)
            include_comments: Whether to include comments in the response
        Returns:
            Tuple containing (issues list, error message or None)
        """
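        # Validate the date strings before interpolating them into the query;
        # strptime raises ValueError, which the handler at the end of this method reports
        try:
            datetime.strptime(start_date, "%Y-%m-%d")
            datetime.strptime(end_date, "%Y-%m-%d")
            # The Linear filter takes its bounds as ISO 8601 strings (DateTimeOrDuration),
            # so the validated dates are embedded directly in the query text below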
            comments_query = ""
            if include_comments:
                comments_query = """
                comments {
                    nodes {
                        id
                        body
                        user {
                            id
                            name
                            email
                        }
                        createdAt
                        updatedAt
                    }
                }
                """
            # Query issues that were either created OR updated within the date range
            # This ensures we catch both new issues and updated existing issues
            query = f"""
            query IssuesByDateRange($after: String) {{
                issues(
                    first: 100,
                    after: $after,
                    filter: {{
                        or: [
                            {{
                                createdAt: {{
                                    gte: "{start_date}T00:00:00Z"
                                    lte: "{end_date}T23:59:59Z"
                                }}
                            }},
                            {{
                                updatedAt: {{
                                    gte: "{start_date}T00:00:00Z"
                                    lte: "{end_date}T23:59:59Z"
                                }}
                            }}
                        ]
                    }}
                ) {{
                    nodes {{
                        id
                        identifier
                        title
                        description
                        state {{
                            id
                            name
                            type
                        }}
                        assignee {{
                            id
                            name
                            email
                        }}
                        creator {{
                            id
                            name
                            email
                        }}
                        createdAt
                        updatedAt
                        {comments_query}
                    }}
                    pageInfo {{
                        hasNextPage
                        endCursor
                    }}
                }}
            }}
            """
            try:
                all_issues = []
                has_next_page = True
                cursor = None
                # Handle pagination to get all issues
                while has_next_page:
                    variables = {"after": cursor} if cursor else {}
                    result = self.execute_graphql_query(query, variables)
                    # Check for errors
                    if "errors" in result:
                        error_message = "; ".join([error.get("message", "Unknown error") for error in result["errors"]])
                        return [], f"GraphQL errors: {error_message}"
                    # Extract issues from the response
                    if "data" in result and "issues" in result["data"]:
                        issues_page = result["data"]["issues"]
                        # Add issues from this page
                        if "nodes" in issues_page:
                            all_issues.extend(issues_page["nodes"])
                        # Check if there are more pages
                        if "pageInfo" in issues_page:
                            page_info = issues_page["pageInfo"]
                            has_next_page = page_info.get("hasNextPage", False)
                            cursor = page_info.get("endCursor") if has_next_page else None
                        else:
                            has_next_page = False
                    else:
                        has_next_page = False
                if not all_issues:
                    return [], "No issues found in the specified date range."
                return all_issues, None
            except Exception as e:
                return [], f"Error fetching issues: {str(e)}"
        except ValueError as e:
            return [], f"Invalid date format: {str(e)}. Please use YYYY-MM-DD."
    def format_issue(self, issue: Dict[str, Any]) -> Dict[str, Any]:
        """
        Format an issue for easier consumption.
        Args:
            issue: The issue object from Linear API
        Returns:
            Formatted issue dictionary
        """
        # Extract basic issue details
        formatted = {
            "id": issue.get("id", ""),
            "identifier": issue.get("identifier", ""),
            "title": issue.get("title", ""),
            "description": issue.get("description", ""),
            "state": issue.get("state", {}).get("name", "Unknown") if issue.get("state") else "Unknown",
            "state_type": issue.get("state", {}).get("type", "Unknown") if issue.get("state") else "Unknown",
            "created_at": issue.get("createdAt", ""),
            "updated_at": issue.get("updatedAt", ""),
            "creator": {
                "id": issue.get("creator", {}).get("id", "") if issue.get("creator") else "",
                "name": issue.get("creator", {}).get("name", "Unknown") if issue.get("creator") else "Unknown",
                "email": issue.get("creator", {}).get("email", "") if issue.get("creator") else ""
            } if issue.get("creator") else {"id": "", "name": "Unknown", "email": ""},
            "assignee": {
                "id": issue.get("assignee", {}).get("id", ""),
                "name": issue.get("assignee", {}).get("name", "Unknown"),
                "email": issue.get("assignee", {}).get("email", "")
            } if issue.get("assignee") else None,
            "comments": []
        }
        # Extract comments if available
        if "comments" in issue and "nodes" in issue["comments"]:
            for comment in issue["comments"]["nodes"]:
                formatted_comment = {
                    "id": comment.get("id", ""),
                    "body": comment.get("body", ""),
                    "created_at": comment.get("createdAt", ""),
                    "updated_at": comment.get("updatedAt", ""),
                    "user": {
                        "id": comment.get("user", {}).get("id", "") if comment.get("user") else "",
                        "name": comment.get("user", {}).get("name", "Unknown") if comment.get("user") else "Unknown",
                        "email": comment.get("user", {}).get("email", "") if comment.get("user") else ""
                    } if comment.get("user") else {"id": "", "name": "Unknown", "email": ""}
                }
                formatted["comments"].append(formatted_comment)
        return formatted
    def format_issue_to_markdown(self, issue: Dict[str, Any]) -> str:
        """
        Convert an issue to markdown format.
        Args:
            issue: The issue object (either raw or formatted)
        Returns:
            Markdown string representation of the issue
        """
        # Format the issue if it is still in raw API form. Raw issues also carry
        # "identifier", so check for "created_at", which only format_issue() emits
        if "created_at" not in issue:
            issue = self.format_issue(issue)
        # Build the markdown content
        markdown = f"# {issue.get('identifier', 'No ID')}: {issue.get('title', 'No Title')}\n\n"
        if issue.get('state'):
            markdown += f"**Status:** {issue['state']}\n\n"
        if issue.get('assignee') and issue['assignee'].get('name'):
            markdown += f"**Assignee:** {issue['assignee']['name']}\n"
        if issue.get('creator') and issue['creator'].get('name'):
            markdown += f"**Created by:** {issue['creator']['name']}\n"
        if issue.get('created_at'):
            created_date = self.format_date(issue['created_at'])
            markdown += f"**Created:** {created_date}\n"
        if issue.get('updated_at'):
            updated_date = self.format_date(issue['updated_at'])
            markdown += f"**Updated:** {updated_date}\n\n"
        if issue.get('description'):
            markdown += f"## Description\n\n{issue['description']}\n\n"
        if issue.get('comments'):
            markdown += f"## Comments ({len(issue['comments'])})\n\n"
            for comment in issue['comments']:
                user_name = "Unknown"
                if comment.get('user') and comment['user'].get('name'):
                    user_name = comment['user']['name']
                comment_date = "Unknown date"
                if comment.get('created_at'):
                    comment_date = self.format_date(comment['created_at'])
                markdown += f"### {user_name} ({comment_date})\n\n{comment.get('body', '')}\n\n---\n\n"
        return markdown
    @staticmethod
    def format_date(iso_date: str) -> str:
        """
        Format an ISO date string to a more readable format.
        Args:
            iso_date: ISO format date string
        Returns:
            Formatted date string
        """
        if not iso_date or not isinstance(iso_date, str):
            return "Unknown date"
        try:
            dt = datetime.fromisoformat(iso_date.replace('Z', '+00:00'))
            return dt.strftime('%Y-%m-%d %H:%M:%S')
        except ValueError:
            return iso_date
 # Example usage (uncomment to use):
 """
 if __name__ == "__main__":
    # Set your token here
    token = "YOUR_LINEAR_API_KEY"
    linear = LinearConnector(token)
    try:
        # Get all issues with comments
        issues = linear.get_all_issues()
        print(f"Retrieved {len(issues)} issues")
        # Format and print the first issue as markdown
        if issues:
            issue_md = linear.format_issue_to_markdown(issues[0])
            print("\nSample Issue in Markdown:\n")
            print(issue_md)
        # Get issues by date range
        start_date = "2023-01-01"
        end_date = "2023-01-31"
        date_issues, error = linear.get_issues_by_date_range(start_date, end_date)
        if error:
            print(f"Error: {error}")
        else:
            print(f"\nRetrieved {len(date_issues)} issues from {start_date} to {end_date}")
    except Exception as e:
        print(f"Error: {e}")
 """
--- a/surfsense_backend/app/db.py
+++ b/surfsense_backend/app/db.py
@ -40,12 +40,16 @@ class DocumentType(str, Enum):
    SLACK_CONNECTOR = "SLACK_CONNECTOR"
    NOTION_CONNECTOR = "NOTION_CONNECTOR"
    YOUTUBE_VIDEO = "YOUTUBE_VIDEO"
    GITHUB_CONNECTOR = "GITHUB_CONNECTOR"
    LINEAR_CONNECTOR = "LINEAR_CONNECTOR"
 class SearchSourceConnectorType(str, Enum):
    SERPER_API = "SERPER_API"
    TAVILY_API = "TAVILY_API"
    SLACK_CONNECTOR = "SLACK_CONNECTOR"
    NOTION_CONNECTOR = "NOTION_CONNECTOR"
    GITHUB_CONNECTOR = "GITHUB_CONNECTOR"
    LINEAR_CONNECTOR = "LINEAR_CONNECTOR"
 class ChatType(str, Enum):
    GENERAL = "GENERAL"
--- a/surfsense_backend/app/routes/chats_routes.py
+++ b/surfsense_backend/app/routes/chats_routes.py
@ -1,14 +1,15 @@
-from fastapi import APIRouter, Depends, HTTPException, Query
-from fastapi.responses import StreamingResponse
-from sqlalchemy.ext.asyncio import AsyncSession
-from sqlalchemy.future import select
-from sqlalchemy.exc import IntegrityError, OperationalError
 from typing import List
-from app.db import get_async_session, User, SearchSpace, Chat
+
-from app.schemas import ChatCreate, ChatUpdate, ChatRead, AISDKChatRequest
+from app.db import Chat, SearchSpace, User, get_async_session
+from app.schemas import AISDKChatRequest, ChatCreate, ChatRead, ChatUpdate
 from app.tasks.stream_connector_search_results import stream_connector_search_results
 from app.users import current_active_user
 from app.utils.check_ownership import check_ownership
+from fastapi import APIRouter, Depends, HTTPException
+from fastapi.responses import StreamingResponse
+from sqlalchemy.exc import IntegrityError, OperationalError
+from sqlalchemy.ext.asyncio import AsyncSession
+from sqlalchemy.future import select
 router = APIRouter()
@ -46,7 +47,7 @@ async def handle_chat_data(
    response = StreamingResponse(stream_connector_search_results(
        user_query,
        user.id,
-        search_space_id,
+        search_space_id,  # Already converted to int in lines 32-37
        session,
        research_mode,
        selected_connectors
@ -88,16 +89,19 @@ async def create_chat(
 async def read_chats(
    skip: int = 0,
    limit: int = 100,
    search_space_id: int = None,
    session: AsyncSession = Depends(get_async_session),
    user: User = Depends(current_active_user)
 ):
    try:
        query = select(Chat).join(SearchSpace).filter(SearchSpace.user_id == user.id)
        # Filter by search_space_id if provided
        if search_space_id is not None:
            query = query.filter(Chat.search_space_id == search_space_id)
        result = await session.execute(
-            select(Chat)
+            query.offset(skip).limit(limit)
-            .join(SearchSpace)
-            .filter(SearchSpace.user_id == user.id)
-            .offset(skip)
-            .limit(limit)
        )
        return result.scalars().all()
    except OperationalError:
--- a/surfsense_backend/app/routes/documents_routes.py
+++ b/surfsense_backend/app/routes/documents_routes.py
@ -170,16 +170,19 @@ async def process_file_in_background(
 async def read_documents(
    skip: int = 0,
    limit: int = 300,
    search_space_id: int = None,
    session: AsyncSession = Depends(get_async_session),
    user: User = Depends(current_active_user)
 ):
    try:
        query = select(Document).join(SearchSpace).filter(SearchSpace.user_id == user.id)
        # Filter by search_space_id if provided
        if search_space_id is not None:
            query = query.filter(Document.search_space_id == search_space_id)
        result = await session.execute(
-            select(Document)
+            query.offset(skip).limit(limit)
-            .join(SearchSpace)
-            .filter(SearchSpace.user_id == user.id)
-            .offset(skip)
-            .limit(limit)
        )
        db_documents = result.scalars().all()
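The `read_chats` and `read_documents` changes share one pattern: build the ownership-scoped base query first, append the `search_space_id` filter only when the caller supplies it, and apply `offset`/`limit` last. The sketch below shows the same ordering with the standard-library `sqlite3` module standing in for SQLAlchemy. The table names, the `list_documents` helper, and the `ORDER BY` clause are assumptions made for the illustration and do not come from the project.

```python
import sqlite3
from typing import List, Optional, Tuple


def list_documents(
    conn: sqlite3.Connection,
    user_id: int,
    search_space_id: Optional[int] = None,
    skip: int = 0,
    limit: int = 300,
) -> List[Tuple[int, int]]:
    """Return (document id, search_space_id) rows owned by user_id."""
    sql = (
        "SELECT d.id, d.search_space_id FROM documents d "
        "JOIN searchspaces s ON s.id = d.search_space_id "
        "WHERE s.user_id = ?"
    )
    params: List[int] = [user_id]
    # Add the extra filter only when the caller supplied a search space
    if search_space_id is not None:
        sql += " AND d.search_space_id = ?"
        params.append(search_space_id)
    # Pagination is applied last, after every filter is in place
    sql += " ORDER BY d.id LIMIT ? OFFSET ?"
    params.extend([limit, skip])
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE searchspaces (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE documents (id INTEGER PRIMARY KEY, search_space_id INTEGER);
    INSERT INTO searchspaces VALUES (1, 10), (2, 10), (3, 99);
    INSERT INTO documents VALUES (1, 1), (2, 2), (3, 2), (4, 3);
    """
)
everything = list_documents(conn, user_id=10)
only_space_2 = list_documents(conn, user_id=10, search_space_id=2)
second_page = list_documents(conn, user_id=10, skip=1, limit=1)
```

Because the ownership join is part of the base query, passing a `search_space_id` that belongs to another user simply yields no rows instead of leaking that user's data.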
--- a/surfsense_backend/app/routes/search_source_connectors_routes.py
+++ b/surfsense_backend/app/routes/search_source_connectors_routes.py
@ -7,20 +7,21 @@ PUT /search-source-connectors/{connector_id} - Update a specific connector
 DELETE /search-source-connectors/{connector_id} - Delete a specific connector
 POST /search-source-connectors/{connector_id}/index - Index content from a connector to a search space
-Note: Each user can have only one connector of each type (SERPER_API, TAVILY_API, SLACK_CONNECTOR, NOTION_CONNECTOR).
+Note: Each user can have only one connector of each type (SERPER_API, TAVILY_API, SLACK_CONNECTOR, NOTION_CONNECTOR, GITHUB_CONNECTOR, LINEAR_CONNECTOR).
 """
-from fastapi import APIRouter, Depends, HTTPException, Query, BackgroundTasks
+from fastapi import APIRouter, Depends, HTTPException, Query, BackgroundTasks, Body
 from sqlalchemy.ext.asyncio import AsyncSession
 from sqlalchemy.future import select
 from sqlalchemy.exc import IntegrityError
 from typing import List, Dict, Any
-from app.db import get_async_session, User, SearchSourceConnector, SearchSourceConnectorType, SearchSpace
+from app.db import get_async_session, User, SearchSourceConnector, SearchSourceConnectorType, SearchSpace, async_session_maker
-from app.schemas import SearchSourceConnectorCreate, SearchSourceConnectorUpdate, SearchSourceConnectorRead
+from app.schemas import SearchSourceConnectorCreate, SearchSourceConnectorUpdate, SearchSourceConnectorRead, SearchSourceConnectorBase
 from app.users import current_active_user
 from app.utils.check_ownership import check_ownership
-from pydantic import ValidationError
+from pydantic import BaseModel, Field, ValidationError
-from app.tasks.connectors_indexing_tasks import index_slack_messages, index_notion_pages
+from app.tasks.connectors_indexing_tasks import index_slack_messages, index_notion_pages, index_github_repos, index_linear_issues
-from datetime import datetime
+from app.connectors.github_connector import GitHubConnector
 from datetime import datetime, timezone, timedelta
 import logging
 # Set up logging
@ -28,6 +29,34 @@ logger = logging.getLogger(__name__)
 router = APIRouter()
 # Use Pydantic's BaseModel here
 class GitHubPATRequest(BaseModel):
    github_pat: str = Field(..., description="GitHub Personal Access Token")
 # --- New Endpoint to list GitHub Repositories ---
@router.post("/github/repositories/", response_model=List[Dict[str, Any]])
 async def list_github_repositories(
    pat_request: GitHubPATRequest,
    user: User = Depends(current_active_user) # Ensure the user is logged in
 ):
    """
    Fetches a list of repositories accessible by the provided GitHub PAT.
    The PAT is used for this request only and is not stored.
    """
    try:
        # Initialize GitHubConnector with the provided PAT
        github_client = GitHubConnector(token=pat_request.github_pat)
        # Fetch repositories
        repositories = github_client.get_user_repositories()
        return repositories
    except ValueError as e:
        # Handle invalid token error specifically
        logger.error(f"GitHub PAT validation failed for user {user.id}: {str(e)}")
        raise HTTPException(status_code=400, detail=f"Invalid GitHub PAT: {str(e)}")
    except Exception as e:
        logger.error(f"Failed to fetch GitHub repositories for user {user.id}: {str(e)}")
        raise HTTPException(status_code=500, detail="Failed to fetch GitHub repositories.")
@router.post("/search-source-connectors/", response_model=SearchSourceConnectorRead)
 async def create_search_source_connector(
    connector: SearchSourceConnectorCreate,
@ -37,7 +66,7 @@ async def create_search_source_connector(
    """
    Create a new search source connector.
-    Each user can have only one connector of each type (SERPER_API, TAVILY_API, SLACK_CONNECTOR).
+    Each user can have only one connector of each type (SERPER_API, TAVILY_API, SLACK_CONNECTOR, etc.).
    The config must contain the appropriate keys for the connector type.
    """
    try:
@ -50,13 +79,11 @@ async def create_search_source_connector(
            )
        )
        existing_connector = result.scalars().first()
        if existing_connector:
            raise HTTPException(
                status_code=409,
                detail=f"A connector with type {connector.connector_type} already exists. Each user can have only one connector of each type."
            )
        db_connector = SearchSourceConnector(**connector.model_dump(), user_id=user.id)
        session.add(db_connector)
        await session.commit()
@ -78,6 +105,7 @@ async def create_search_source_connector(
        await session.rollback()
        raise
    except Exception as e:
        logger.error(f"Failed to create search source connector: {str(e)}")
        await session.rollback()
        raise HTTPException(
            status_code=500,
@ -88,16 +116,18 @@ async def create_search_source_connector(
 async def read_search_source_connectors(
    skip: int = 0,
    limit: int = 100,
    search_space_id: int = None,
    session: AsyncSession = Depends(get_async_session),
    user: User = Depends(current_active_user)
 ):
    """List all search source connectors for the current user."""
    try:
        query = select(SearchSourceConnector).filter(SearchSourceConnector.user_id == user.id)
        # No need to filter by search_space_id as connectors are user-owned, not search space specific
        result = await session.execute(
-            select(SearchSourceConnector)
+            query.offset(skip).limit(limit)
-            .filter(SearchSourceConnector.user_id == user.id)
-            .offset(skip)
-            .limit(limit)
        )
        return result.scalars().all()
    except Exception as e:
@ -132,54 +162,84 @@ async def update_search_source_connector(
 ):
    """
    Update a search source connector.
-    
+    Handles partial updates, including merging changes into the 'config' field.
    Each user can have only one connector of each type (SERPER_API, TAVILY_API, SLACK_CONNECTOR).
    The config must contain the appropriate keys for the connector type.
    """
-    try:
+    db_connector = await check_ownership(session, SearchSourceConnector, connector_id, user)
-        db_connector = await check_ownership(session, SearchSourceConnector, connector_id, user)
+    
    # Convert the sparse update data (only fields present in request) to a dict
    update_data = connector_update.model_dump(exclude_unset=True)
    # Special handling for 'config' field
    if "config" in update_data:
        incoming_config = update_data["config"] # Config data from the request
        existing_config = db_connector.config if db_connector.config else {} # Current config from DB
-        # If connector type is being changed, check if one of that type already exists
+        # Merge incoming config into existing config
-        if connector_update.connector_type != db_connector.connector_type:
+        # This preserves existing keys (like GITHUB_PAT) if they are not in the incoming data
        merged_config = existing_config.copy()
        merged_config.update(incoming_config)
        # -- Validation after merging --
        # Validate the *merged* config based on the connector type
        # We need the connector type - use the one from the update if provided, else the existing one
        current_connector_type = connector_update.connector_type if connector_update.connector_type is not None else db_connector.connector_type
        try:
            # We can reuse the base validator by creating a temporary base model instance
            # Note: This assumes 'name' and 'is_indexable' are not crucial for config validation itself
            temp_data_for_validation = {
                "name": db_connector.name, # Use existing name
                "connector_type": current_connector_type,
                "is_indexable": db_connector.is_indexable, # Use existing value
                "last_indexed_at": db_connector.last_indexed_at, # Not used by validator
                "config": merged_config
            }
            SearchSourceConnectorBase.model_validate(temp_data_for_validation)
        except ValidationError as e:
            # Raise specific validation error for the merged config
            raise HTTPException(
                status_code=422,
                detail=f"Validation error for merged config: {str(e)}"
            )
        # If validation passes, update the main update_data dict with the merged config
        update_data["config"] = merged_config
    # Apply all updates (including the potentially merged config)
    for key, value in update_data.items():
        # Prevent changing connector_type if it causes a duplicate (check moved here)
        if key == "connector_type" and value != db_connector.connector_type:
            result = await session.execute(
                select(SearchSourceConnector)
                .filter(
                    SearchSourceConnector.user_id == user.id,
-                    SearchSourceConnector.connector_type == connector_update.connector_type,
+                    SearchSourceConnector.connector_type == value,
                    SearchSourceConnector.id != connector_id
                )
            )
            existing_connector = result.scalars().first()
            if existing_connector:
                raise HTTPException(
                    status_code=409,
-                    detail=f"A connector with type {connector_update.connector_type} already exists. Each user can have only one connector of each type."
+                    detail=f"A connector with type {value} already exists. Each user can have only one connector of each type."
                )
-        update_data = connector_update.model_dump(exclude_unset=True)
+        setattr(db_connector, key, value)
-        for key, value in update_data.items():
+
-            setattr(db_connector, key, value)
+    try:
        await session.commit()
        await session.refresh(db_connector)
        return db_connector
    except ValidationError as e:
        await session.rollback()
        raise HTTPException(
            status_code=422,
            detail=f"Validation error: {str(e)}"
        )
    except IntegrityError as e:
        await session.rollback()
        # This might occur if connector_type constraint is violated somehow after the check
        raise HTTPException(
            status_code=409,
-            detail=f"Integrity error: A connector with this type already exists. {str(e)}"
+            detail=f"Database integrity error during update: {str(e)}"
        )
    except HTTPException:
        await session.rollback()
        raise
    except Exception as e:
        await session.rollback()
        logger.error(f"Failed to update search source connector {connector_id}: {e}", exc_info=True)
        raise HTTPException(
            status_code=500,
            detail=f"Failed to update search source connector: {str(e)}"
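The partial-update logic in `update_search_source_connector` depends on merging the incoming `config` over a copy of the stored one, so keys missing from the request (the code comments give `GITHUB_PAT` as the example) survive the update. A minimal sketch of that merge on its own, assuming a hypothetical `merge_config` helper and a hypothetical `repo_full_names` key used purely for illustration:

```python
from typing import Any, Dict, Optional


def merge_config(
    existing: Optional[Dict[str, Any]], incoming: Dict[str, Any]
) -> Dict[str, Any]:
    """Shallow-merge incoming keys over a copy of the stored config."""
    merged = dict(existing or {})  # copy, so the stored dict is never mutated
    merged.update(incoming)  # incoming keys win; untouched keys are preserved
    return merged


stored = {"GITHUB_PAT": "stored-token", "repo_full_names": ["org/a"]}
patch = {"repo_full_names": ["org/a", "org/b"]}
merged = merge_config(stored, patch)
print(merged)
```

The merge is shallow: a nested dictionary or list in the request replaces the stored value wholesale rather than being combined with it, which matches the `copy()` followed by `update()` used in the route.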
@ -218,10 +278,10 @@ async def index_connector_content(
    Index content from a connector to a search space.
    Currently supports:
-    - SLACK_CONNECTOR: Indexes messages from all accessible Slack channels since the last indexing
+    - SLACK_CONNECTOR: Indexes messages from all accessible Slack channels
-      (or the last 365 days if never indexed before)
+    - NOTION_CONNECTOR: Indexes pages from all accessible Notion pages
-    - NOTION_CONNECTOR: Indexes pages from all accessible Notion pages since the last indexing
+    - GITHUB_CONNECTOR: Indexes code and documentation from GitHub repositories
-      (or the last 365 days if never indexed before)
+    - LINEAR_CONNECTOR: Indexes issues and comments from Linear
    Args:
        connector_id: ID of the connector to use
@ -239,43 +299,65 @@ async def index_connector_content(
        search_space = await check_ownership(session, SearchSpace, search_space_id, user)
        # Handle different connector types
        response_message = ""
        indexing_from = None
        indexing_to = None
        today_str = datetime.now().strftime("%Y-%m-%d")
        if connector.connector_type == SearchSourceConnectorType.SLACK_CONNECTOR:
            # Determine the time range that will be indexed
            if not connector.last_indexed_at:
-                start_date = "365 days ago"
+                start_date = "365 days ago" # Or perhaps set a specific date if needed
            else:
                # Check if last_indexed_at is today
                today = datetime.now().date()
                if connector.last_indexed_at.date() == today:
                    # If last indexed today, go back 1 day to ensure we don't miss anything
-                    start_date = (today - datetime.timedelta(days=1)).strftime("%Y-%m-%d")
+                    start_date = (today - timedelta(days=1)).strftime("%Y-%m-%d")
                else:
                    start_date = connector.last_indexed_at.strftime("%Y-%m-%d")
-            # Add the indexing task to background tasks
+            indexing_from = start_date
-            if background_tasks:
+            indexing_to = today_str
-                background_tasks.add_task(
+            
-                    run_slack_indexing_with_new_session,
+            # Run indexing in background
-                    connector_id,
+            logger.info(f"Triggering Slack indexing for connector {connector_id} into search space {search_space_id}")
-                    search_space_id
+            background_tasks.add_task(run_slack_indexing_with_new_session, connector_id, search_space_id)
-                )
+            response_message = "Slack indexing started in the background."
-                
+
-                return {
-                    "success": True,
-                    "message": "Slack indexing started in the background",
-                    "connector_type": connector.connector_type,
-                    "search_space": search_space.name,
-                    "indexing_from": start_date,
-                    "indexing_to": datetime.now().strftime("%Y-%m-%d")
-                }
-            else:
-                # For testing or if background tasks are not available
-                return {
-                    "success": False,
-                    "message": "Background tasks not available",
-                    "connector_type": connector.connector_type
-                }
        elif connector.connector_type == SearchSourceConnectorType.NOTION_CONNECTOR:
            # Determine the time range that will be indexed
            if not connector.last_indexed_at:
                start_date = "365 days ago" # Or perhaps set a specific date
            else:
                # Check if last_indexed_at is today
                today = datetime.now().date()
                if connector.last_indexed_at.date() == today:
                    # If last indexed today, go back 1 day to ensure we don't miss anything
                    start_date = (today - timedelta(days=1)).strftime("%Y-%m-%d")
                else:
                    start_date = connector.last_indexed_at.strftime("%Y-%m-%d")
            indexing_from = start_date
            indexing_to = today_str
            # Run indexing in background
            logger.info(f"Triggering Notion indexing for connector {connector_id} into search space {search_space_id}")
            background_tasks.add_task(run_notion_indexing_with_new_session, connector_id, search_space_id)
            response_message = "Notion indexing started in the background."
        elif connector.connector_type == SearchSourceConnectorType.GITHUB_CONNECTOR:
            # GitHub connector likely indexes everything relevant, or uses internal logic
            # Setting indexing_from to None and indexing_to to today
            indexing_from = None 
            indexing_to = today_str
            # Run indexing in background
            logger.info(f"Triggering GitHub indexing for connector {connector_id} into search space {search_space_id}")
            background_tasks.add_task(run_github_indexing_with_new_session, connector_id, search_space_id)
            response_message = "GitHub indexing started in the background."
        elif connector.connector_type == SearchSourceConnectorType.LINEAR_CONNECTOR:
            # Determine the time range that will be indexed
            if not connector.last_indexed_at:
                start_date = "365 days ago"
@ -284,48 +366,39 @@ async def index_connector_content(
                today = datetime.now().date()
                if connector.last_indexed_at.date() == today:
                    # If last indexed today, go back 1 day to ensure we don't miss anything
-                    start_date = (today - datetime.timedelta(days=1)).strftime("%Y-%m-%d")
+                    start_date = (today - timedelta(days=1)).strftime("%Y-%m-%d")
                else:
                    start_date = connector.last_indexed_at.strftime("%Y-%m-%d")
-            # Add the indexing task to background tasks
+            indexing_from = start_date
-            if background_tasks:
+            indexing_to = today_str
-                background_tasks.add_task(
+
-                    run_notion_indexing_with_new_session,
+            # Run indexing in background
-                    connector_id,
+            logger.info(f"Triggering Linear indexing for connector {connector_id} into search space {search_space_id}")
-                    search_space_id
+            background_tasks.add_task(run_linear_indexing_with_new_session, connector_id, search_space_id)
-                )
+            response_message = "Linear indexing started in the background."
-                
+
                return {
                    "success": True,
                    "message": "Notion indexing started in the background",
                    "connector_type": connector.connector_type,
                    "search_space": search_space.name,
                    "indexing_from": start_date,
                    "indexing_to": datetime.now().strftime("%Y-%m-%d")
                }
            else:
                # For testing or if background tasks are not available
                return {
                    "success": False,
                    "message": "Background tasks not available",
                    "connector_type": connector.connector_type
                }
        else:
            raise HTTPException(
                status_code=400,
                detail=f"Indexing not supported for connector type: {connector.connector_type}"
            )
-    
+
        return {
            "message": response_message, 
            "connector_id": connector_id, 
            "search_space_id": search_space_id,
            "indexing_from": indexing_from,
            "indexing_to": indexing_to
        }
    except HTTPException:
        raise
    except Exception as e:
-        logger.error(f"Failed to start indexing: {str(e)}")
+        logger.error(f"Failed to initiate indexing for connector {connector_id}: {e}", exc_info=True)
        raise HTTPException(
            status_code=500,
-            detail=f"Failed to start indexing: {str(e)}"
+            detail=f"Failed to initiate indexing: {str(e)}"
-        ) 
+        )
 async def update_connector_last_indexed(
    session: AsyncSession,
@ -361,8 +434,6 @@ async def run_slack_indexing_with_new_session(
    Create a new session and run the Slack indexing task.
    This prevents session leaks by creating a dedicated session for the background task.
    """
    from app.db import async_session_maker
    async with async_session_maker() as session:
        await run_slack_indexing(session, connector_id, search_space_id)
@ -405,8 +476,6 @@ async def run_notion_indexing_with_new_session(
    Create a new session and run the Notion indexing task.
    This prevents session leaks by creating a dedicated session for the background task.
    """
    from app.db import async_session_maker
    async with async_session_maker() as session:
        await run_notion_indexing(session, connector_id, search_space_id)
@ -439,4 +508,72 @@ async def run_notion_indexing(
        else:
            logger.error(f"Notion indexing failed or no documents processed: {error_or_warning}")
    except Exception as e:
-        logger.error(f"Error in background Notion indexing task: {str(e)}")
+        logger.error(f"Error in background Notion indexing task: {str(e)}")
 # Add new helper functions for GitHub indexing
 async def run_github_indexing_with_new_session(
    connector_id: int,
    search_space_id: int
 ):
    """Wrapper to run GitHub indexing with its own database session."""
    logger.info(f"Background task started: Indexing GitHub connector {connector_id} into space {search_space_id}")
    async with async_session_maker() as session:
        await run_github_indexing(session, connector_id, search_space_id)
    logger.info(f"Background task finished: Indexing GitHub connector {connector_id}")
 async def run_github_indexing(
    session: AsyncSession,
    connector_id: int,
    search_space_id: int
 ):
    """Runs the GitHub indexing task and updates the timestamp."""
    try:
        indexed_count, error_message = await index_github_repos(
            session, connector_id, search_space_id, update_last_indexed=False
        )
        if error_message:
            logger.error(f"GitHub indexing failed for connector {connector_id}: {error_message}")
            # Optionally update status in DB to indicate failure
        else:
            logger.info(f"GitHub indexing successful for connector {connector_id}. Indexed {indexed_count} documents.")
            # Update the last indexed timestamp only on success
            await update_connector_last_indexed(session, connector_id)
            await session.commit() # Commit timestamp update
    except Exception as e:
        await session.rollback()
        logger.error(f"Critical error in run_github_indexing for connector {connector_id}: {e}", exc_info=True)
        # Optionally update status in DB to indicate failure
 # Add new helper functions for Linear indexing
 async def run_linear_indexing_with_new_session(
    connector_id: int,
    search_space_id: int
 ):
    """Wrapper to run Linear indexing with its own database session."""
    logger.info(f"Background task started: Indexing Linear connector {connector_id} into space {search_space_id}")
    async with async_session_maker() as session:
        await run_linear_indexing(session, connector_id, search_space_id)
    logger.info(f"Background task finished: Indexing Linear connector {connector_id}")
 async def run_linear_indexing(
    session: AsyncSession,
    connector_id: int,
    search_space_id: int
 ):
    """Runs the Linear indexing task and updates the timestamp."""
    try:
        indexed_count, error_message = await index_linear_issues(
            session, connector_id, search_space_id, update_last_indexed=False
        )
        if error_message:
            logger.error(f"Linear indexing failed for connector {connector_id}: {error_message}")
            # Optionally update status in DB to indicate failure
        else:
            logger.info(f"Linear indexing successful for connector {connector_id}. Indexed {indexed_count} documents.")
            # Update the last indexed timestamp only on success
            await update_connector_last_indexed(session, connector_id)
            await session.commit() # Commit timestamp update
    except Exception as e:
        await session.rollback()
        logger.error(f"Critical error in run_linear_indexing for connector {connector_id}: {e}", exc_info=True)
        # Optionally update status in DB to indicate failure
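Note on the route change above: the Linear branch reports an `indexing_from`/`indexing_to` window derived from `last_indexed_at`, and the `timedelta` fix matters because `datetime.timedelta` raises `AttributeError` when `datetime` is the class imported via `from datetime import datetime`. The following is a minimal sketch of that window logic, not the route itself. The function name `indexing_window` is hypothetical, and it assumes `today_str` in the route is today's date formatted as `%Y-%m-%d`.

```python
from datetime import date, datetime, timedelta
from typing import Optional, Tuple


def indexing_window(last_indexed_at: Optional[datetime], today: date) -> Tuple[str, str]:
    """Return (indexing_from, indexing_to) in the shape the route reports them.

    Hypothetical helper: the diff computes these values inline per connector type.
    """
    if last_indexed_at is None:
        # Same placeholder string the route uses when nothing was indexed before.
        start = "365 days ago"
    elif last_indexed_at.date() == today:
        # Already indexed today: step back one day so nothing is missed.
        start = (today - timedelta(days=1)).strftime("%Y-%m-%d")
    else:
        start = last_indexed_at.strftime("%Y-%m-%d")
    return start, today.strftime("%Y-%m-%d")
```

Passing `today` in explicitly, rather than calling `datetime.now()` inside, keeps the sketch deterministic and easy to check.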
--- a/surfsense_backend/app/schemas/search_source_connector.py
+++ b/surfsense_backend/app/schemas/search_source_connector.py
@ -1,16 +1,15 @@
 from datetime import datetime
 import uuid
-from typing import Dict, Any
+from typing import Dict, Any, Optional
 from pydantic import BaseModel, field_validator
 from .base import IDModel, TimestampModel
 from app.db import SearchSourceConnectorType
 from fastapi import HTTPException
 class SearchSourceConnectorBase(BaseModel):
    name: str
    connector_type: SearchSourceConnectorType
    is_indexable: bool
-    last_indexed_at: datetime | None
+    last_indexed_at: Optional[datetime] = None
    config: Dict[str, Any]
    @field_validator('config')
@ -57,17 +56,45 @@ class SearchSourceConnectorBase(BaseModel):
            # Ensure the integration token is not empty
            if not config.get("NOTION_INTEGRATION_TOKEN"):
                raise ValueError("NOTION_INTEGRATION_TOKEN cannot be empty")
        elif connector_type == SearchSourceConnectorType.GITHUB_CONNECTOR:
            # For GITHUB_CONNECTOR, only allow GITHUB_PAT and repo_full_names
            allowed_keys = ["GITHUB_PAT", "repo_full_names"]
            if set(config.keys()) != set(allowed_keys):
                raise ValueError(f"For GITHUB_CONNECTOR connector type, config must only contain these keys: {allowed_keys}")
            # Ensure the token is not empty
            if not config.get("GITHUB_PAT"):
                raise ValueError("GITHUB_PAT cannot be empty")
            # Ensure the repo_full_names is present and is a non-empty list
            repo_full_names = config.get("repo_full_names")
            if not isinstance(repo_full_names, list) or not repo_full_names:
                raise ValueError("repo_full_names must be a non-empty list of strings")
        elif connector_type == SearchSourceConnectorType.LINEAR_CONNECTOR:
            # For LINEAR_CONNECTOR, only allow LINEAR_API_KEY
            allowed_keys = ["LINEAR_API_KEY"]
            if set(config.keys()) != set(allowed_keys):
                raise ValueError(f"For LINEAR_CONNECTOR connector type, config must only contain these keys: {allowed_keys}")
            # Ensure the token is not empty
            if not config.get("LINEAR_API_KEY"):
                raise ValueError("LINEAR_API_KEY cannot be empty")
        return config
 class SearchSourceConnectorCreate(SearchSourceConnectorBase):
    pass
-class SearchSourceConnectorUpdate(SearchSourceConnectorBase):
+class SearchSourceConnectorUpdate(BaseModel):
-    pass
+    name: Optional[str] = None
    connector_type: Optional[SearchSourceConnectorType] = None
    is_indexable: Optional[bool] = None
    last_indexed_at: Optional[datetime] = None
    config: Optional[Dict[str, Any]] = None
 class SearchSourceConnectorRead(SearchSourceConnectorBase, IDModel, TimestampModel):
    user_id: uuid.UUID
    class Config:
-        from_attributes = True 
+        from_attributes = True 
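Note on the schema change above: the new GitHub and Linear branches of the `config` validator both require an exact key set and a non-empty credential. A rough standalone sketch of the same checks, without pydantic, might look like the following. The `ALLOWED_KEYS` table and the `validate_config` function are hypothetical names introduced for illustration; the diff hard-codes the key lists inside each branch.

```python
from typing import Any, Dict, List

# Hypothetical lookup table; the credential key is listed first for each type.
ALLOWED_KEYS: Dict[str, List[str]] = {
    "GITHUB_CONNECTOR": ["GITHUB_PAT", "repo_full_names"],
    "LINEAR_CONNECTOR": ["LINEAR_API_KEY"],
}


def validate_config(connector_type: str, config: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror the validator's checks for the two connector types added here."""
    allowed = ALLOWED_KEYS[connector_type]
    # Exact match: no missing keys and no extra keys.
    if set(config.keys()) != set(allowed):
        raise ValueError(
            f"For {connector_type} connector type, config must only contain these keys: {allowed}"
        )
    credential_key = allowed[0]
    if not config.get(credential_key):
        raise ValueError(f"{credential_key} cannot be empty")
    if connector_type == "GITHUB_CONNECTOR":
        repos = config.get("repo_full_names")
        # As in the diff, only the container is checked, not each element's type.
        if not isinstance(repos, list) or not repos:
            raise ValueError("repo_full_names must be a non-empty list of strings")
    return config
```

Because the comparison uses set equality, a config with an extra key is rejected just like one with a missing key.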
--- a/surfsense_backend/app/tasks/connectors_indexing_tasks.py
+++ b/surfsense_backend/app/tasks/connectors_indexing_tasks.py
@ -1,14 +1,16 @@
-from typing import Optional, List, Dict, Any, Tuple
+from typing import Optional, Tuple
 from sqlalchemy.ext.asyncio import AsyncSession
 from sqlalchemy.exc import SQLAlchemyError
 from sqlalchemy.future import select
 from sqlalchemy import delete
-from datetime import datetime, timedelta
+from datetime import datetime, timedelta, timezone
 from app.db import Document, DocumentType, Chunk, SearchSourceConnector, SearchSourceConnectorType
 from app.config import config
 from app.prompts import SUMMARY_PROMPT_TEMPLATE
 from app.connectors.slack_history import SlackHistory
 from app.connectors.notion_history import NotionHistoryConnector
 from app.connectors.github_connector import GitHubConnector
 from app.connectors.linear_connector import LinearConnector
 from slack_sdk.errors import SlackApiError
 import logging
@ -59,8 +61,20 @@ async def index_slack_messages(
        end_date = datetime.now()
        # Use last_indexed_at as start date if available, otherwise use 30 days ago
-
+        if connector.last_indexed_at:
-        start_date = end_date - timedelta(days=365)
+            # Convert dates to be comparable (both timezone-naive)
            last_indexed_naive = connector.last_indexed_at.replace(tzinfo=None) if connector.last_indexed_at.tzinfo else connector.last_indexed_at
            # Check if last_indexed_at is in the future or after end_date
            if last_indexed_naive > end_date:
                logger.warning(f"Last indexed date ({last_indexed_naive.strftime('%Y-%m-%d')}) is in the future. Using 30 days ago instead.")
                start_date = end_date - timedelta(days=30)
            else:
                start_date = last_indexed_naive
                logger.info(f"Using last_indexed_at ({start_date.strftime('%Y-%m-%d')}) as start date")
        else:
            start_date = end_date - timedelta(days=30)  # Use 30 days instead of 365 to catch recent issues
            logger.info(f"No last_indexed_at found, using {start_date.strftime('%Y-%m-%d')} (30 days ago) as start date")
        # Format dates for Slack API
        start_date_str = start_date.strftime("%Y-%m-%d")
@ -589,3 +603,473 @@ async def index_notion_pages(
        await session.rollback()
        logger.error(f"Failed to index Notion pages: {str(e)}", exc_info=True)
        return 0, f"Failed to index Notion pages: {str(e)}"
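Note on the start-date fallback: `index_slack_messages` and `index_linear_issues` in this file use the same rule for choosing a start date. A minimal sketch of that rule follows, assuming a naive `end_date` as produced by `datetime.now()`. The function name `resolve_start_date` and the `fallback_days` parameter are hypothetical; the diff inlines the logic with a fixed 30 days.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def resolve_start_date(
    last_indexed_at: Optional[datetime],
    end_date: datetime,
    fallback_days: int = 30,
) -> datetime:
    """Pick the start of the indexing range (hypothetical helper)."""
    if last_indexed_at is None:
        # Never indexed: look back a fixed window.
        return end_date - timedelta(days=fallback_days)
    # Make both values timezone-naive so they can be compared.
    # Note: replace(tzinfo=None) drops the offset without converting the clock time.
    naive = last_indexed_at.replace(tzinfo=None) if last_indexed_at.tzinfo else last_indexed_at
    if naive > end_date:
        # A timestamp in the future is treated as invalid; fall back to the window.
        return end_date - timedelta(days=fallback_days)
    return naive
```

For example, with `end_date = datetime(2025, 4, 26, 12, 0)`, a missing or future `last_indexed_at` resolves to 2025-03-27 12:00.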
 async def index_github_repos(
    session: AsyncSession,
    connector_id: int,
    search_space_id: int,
    update_last_indexed: bool = True
 ) -> Tuple[int, Optional[str]]:
    """
    Index code and documentation files from accessible GitHub repositories.
    Args:
        session: Database session
        connector_id: ID of the GitHub connector
        search_space_id: ID of the search space to store documents in
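        update_last_indexed: Whether to update the last_indexed_at timestamp (default: True).
            Not currently used by this function; run_github_indexing updates the timestamp after a successful run.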
    Returns:
        Tuple containing (number of documents indexed, error message or None)
    """
    documents_processed = 0
    errors = []
    try:
        # 1. Get the GitHub connector from the database
        result = await session.execute(
            select(SearchSourceConnector)
            .filter(
                SearchSourceConnector.id == connector_id,
                SearchSourceConnector.connector_type == SearchSourceConnectorType.GITHUB_CONNECTOR
            )
        )
        connector = result.scalars().first()
        if not connector:
            return 0, f"Connector with ID {connector_id} not found or is not a GitHub connector"
        # 2. Get the GitHub PAT and selected repositories from the connector config
        github_pat = connector.config.get("GITHUB_PAT")
        repo_full_names_to_index = connector.config.get("repo_full_names")
        if not github_pat:
            return 0, "GitHub Personal Access Token (PAT) not found in connector config"
        if not repo_full_names_to_index or not isinstance(repo_full_names_to_index, list):
            return 0, "'repo_full_names' not found or is not a list in connector config"
        # 3. Initialize GitHub connector client
        try:
            github_client = GitHubConnector(token=github_pat)
        except ValueError as e:
            return 0, f"Failed to initialize GitHub client: {str(e)}"
        # 4. Validate selected repositories
        #    For simplicity, we'll proceed with the list provided.
        #    If a repo is inaccessible, get_repository_files will likely fail gracefully later.
        logger.info(f"Starting indexing for {len(repo_full_names_to_index)} selected repositories.")
        # 5. Get existing documents for this search space and connector type to prevent duplicates
        existing_docs_result = await session.execute(
            select(Document)
            .filter(
                Document.search_space_id == search_space_id,
                Document.document_type == DocumentType.GITHUB_CONNECTOR
            )
        )
        existing_docs = existing_docs_result.scalars().all()
        # Create a lookup dict: key=repo_fullname/file_path, value=Document object
        existing_docs_lookup = {doc.document_metadata.get("full_path"): doc for doc in existing_docs if doc.document_metadata.get("full_path")}
        logger.info(f"Found {len(existing_docs_lookup)} existing GitHub documents in database for search space {search_space_id}")
        # 6. Iterate through selected repositories and index files
        for repo_full_name in repo_full_names_to_index:
            if not repo_full_name or not isinstance(repo_full_name, str):
                logger.warning(f"Skipping invalid repository entry: {repo_full_name}")
                continue
            logger.info(f"Processing repository: {repo_full_name}")
            try:
                files_to_index = github_client.get_repository_files(repo_full_name)
                if not files_to_index:
                    logger.info(f"No indexable files found in repository: {repo_full_name}")
                    continue
                logger.info(f"Found {len(files_to_index)} files to process in {repo_full_name}")
                for file_info in files_to_index:
                    file_path = file_info.get("path")
                    file_url = file_info.get("url")
                    file_sha = file_info.get("sha")
                    file_type = file_info.get("type") # 'code' or 'doc'
                    full_path_key = f"{repo_full_name}/{file_path}"
                    if not file_path or not file_url or not file_sha:
                        logger.warning(f"Skipping file with missing info in {repo_full_name}: {file_info}")
                        continue
                    # Check if document already exists and if content hash matches
                    existing_doc = existing_docs_lookup.get(full_path_key)
                    if existing_doc and existing_doc.document_metadata.get("sha") == file_sha:
                        logger.debug(f"Skipping unchanged file: {full_path_key}")
                        continue # Skip if SHA matches (content hasn't changed)
                    # Get file content
                    file_content = github_client.get_file_content(repo_full_name, file_path)
                    if file_content is None:
                        logger.warning(f"Could not retrieve content for {full_path_key}. Skipping.")
                        continue # Skip if content fetch failed
                    # Chunk the full file content; the document-level summary is just the
                    # path plus the first 1000 characters (no LLM summary yet, may need refinement).
                    summary_content = f"GitHub file: {full_path_key}\n\n{file_content[:1000]}..." # Simple summary
                    summary_embedding = config.embedding_model_instance.embed(summary_content)
                    # Chunk the content
                    try:
                        chunks_data = [
                            Chunk(content=chunk.text, embedding=chunk.embedding)
                            for chunk in config.chunker_instance.chunk(file_content)
                        ]
                    except Exception as chunk_err:
                        logger.error(f"Failed to chunk file {full_path_key}: {chunk_err}")
                        errors.append(f"Chunking failed for {full_path_key}: {chunk_err}")
                        continue # Skip this file if chunking fails
                    doc_metadata = {
                        "repository_full_name": repo_full_name,
                        "file_path": file_path,
                        "full_path": full_path_key, # For easier lookup
                        "url": file_url,
                        "sha": file_sha,
                        "type": file_type,
                        "indexed_at": datetime.now(timezone.utc).isoformat()
                    }
                    if existing_doc:
                        # Update existing document
                        logger.info(f"Updating document for file: {full_path_key}")
                        existing_doc.title = f"GitHub - {file_path}"
                        existing_doc.document_metadata = doc_metadata
                        existing_doc.content = summary_content # Update summary
                        existing_doc.embedding = summary_embedding # Update embedding
                        # Delete old chunks
                        await session.execute(
                            delete(Chunk)
                            .where(Chunk.document_id == existing_doc.id)
                        )
                        # Add new chunks
                        for chunk_obj in chunks_data:
                            chunk_obj.document_id = existing_doc.id
                            session.add(chunk_obj)
                        documents_processed += 1
                    else:
                        # Create new document
                        logger.info(f"Creating new document for file: {full_path_key}")
                        document = Document(
                            title=f"GitHub - {file_path}",
                            document_type=DocumentType.GITHUB_CONNECTOR,
                            document_metadata=doc_metadata,
                            content=summary_content, # Store summary
                            embedding=summary_embedding,
                            search_space_id=search_space_id,
                            chunks=chunks_data # Associate chunks directly
                        )
                        session.add(document)
                        documents_processed += 1
                    # No per-file or per-repo commit here; everything is committed once after the loop.
            except Exception as repo_err:
                logger.error(f"Failed to process repository {repo_full_name}: {repo_err}")
                errors.append(f"Failed processing {repo_full_name}: {repo_err}")
        # Commit all changes at the end
        await session.commit()
        logger.info(f"Finished GitHub indexing for connector {connector_id}. Processed {documents_processed} files.")
    except SQLAlchemyError as db_err:
        await session.rollback()
        logger.error(f"Database error during GitHub indexing for connector {connector_id}: {db_err}")
        errors.append(f"Database error: {db_err}")
        return documents_processed, "; ".join(errors) if errors else str(db_err)
    except Exception as e:
        await session.rollback()
        logger.error(f"Unexpected error during GitHub indexing for connector {connector_id}: {e}", exc_info=True)
        errors.append(f"Unexpected error: {e}")
        return documents_processed, "; ".join(errors) if errors else str(e)
    error_message = "; ".join(errors) if errors else None
    return documents_processed, error_message
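Note on change detection in `index_github_repos`: a file is skipped when a document with the same `full_path` already exists and its stored `sha` matches, updated when the document exists but the SHA differs, and created otherwise. The sketch below isolates that decision from the database and embedding work. `plan_file_actions` and its input shapes are hypothetical and only illustrate the branching.

```python
from typing import Dict, List, Optional


def plan_file_actions(
    existing_sha_by_path: Dict[str, Optional[str]],
    remote_files: List[dict],
    repo_full_name: str,
) -> Dict[str, List[str]]:
    """Classify remote files as create / update / skip (hypothetical helper)."""
    plan: Dict[str, List[str]] = {"create": [], "update": [], "skip": []}
    for info in remote_files:
        path, url, sha = info.get("path"), info.get("url"), info.get("sha")
        if not path or not url or not sha:
            continue  # Incomplete file info is ignored, as in the diff.
        key = f"{repo_full_name}/{path}"
        if key not in existing_sha_by_path:
            plan["create"].append(key)
        elif existing_sha_by_path[key] == sha:
            plan["skip"].append(key)  # Same SHA: content has not changed.
        else:
            plan["update"].append(key)
    return plan
```

Keying on `repo_full_name/path` rather than the bare path keeps files with the same name in different repositories apart.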
 async def index_linear_issues(
    session: AsyncSession,
    connector_id: int,
    search_space_id: int,
    update_last_indexed: bool = True
 ) -> Tuple[int, Optional[str]]:
    """
    Index Linear issues and comments.
    Args:
        session: Database session
        connector_id: ID of the Linear connector
        search_space_id: ID of the search space to store documents in
        update_last_indexed: Whether to update the last_indexed_at timestamp (default: True)
    Returns:
        Tuple containing (number of documents indexed, error message or None)
    """
    try:
        # Get the connector
        result = await session.execute(
            select(SearchSourceConnector)
            .filter(
                SearchSourceConnector.id == connector_id,
                SearchSourceConnector.connector_type == SearchSourceConnectorType.LINEAR_CONNECTOR
            )
        )
        connector = result.scalars().first()
        if not connector:
            return 0, f"Connector with ID {connector_id} not found or is not a Linear connector"
        # Get the Linear token from the connector config
        linear_token = connector.config.get("LINEAR_API_KEY")
        if not linear_token:
            return 0, "Linear API token not found in connector config"
        # Initialize Linear client
        linear_client = LinearConnector(token=linear_token)
        # Calculate date range
        end_date = datetime.now()
        # Use last_indexed_at as start date if available, otherwise use 30 days ago
        if connector.last_indexed_at:
            # Convert dates to be comparable (both timezone-naive)
            last_indexed_naive = connector.last_indexed_at.replace(tzinfo=None) if connector.last_indexed_at.tzinfo else connector.last_indexed_at
            # Check if last_indexed_at is in the future or after end_date
            if last_indexed_naive > end_date:
                logger.warning(f"Last indexed date ({last_indexed_naive.strftime('%Y-%m-%d')}) is in the future. Using 30 days ago instead.")
                start_date = end_date - timedelta(days=30)
            else:
                start_date = last_indexed_naive
                logger.info(f"Using last_indexed_at ({start_date.strftime('%Y-%m-%d')}) as start date")
        else:
            start_date = end_date - timedelta(days=30)  # Use 30 days instead of 365 to catch recent issues
            logger.info(f"No last_indexed_at found, using {start_date.strftime('%Y-%m-%d')} (30 days ago) as start date")
        # Format dates for Linear API
        start_date_str = start_date.strftime("%Y-%m-%d")
        end_date_str = end_date.strftime("%Y-%m-%d")
        logger.info(f"Fetching Linear issues from {start_date_str} to {end_date_str}")
        # Get issues within date range
        try:
            issues, error = linear_client.get_issues_by_date_range(
                start_date=start_date_str,
                end_date=end_date_str,
                include_comments=True
            )
            if error:
                logger.error(f"Failed to get Linear issues: {error}")
                # Don't treat "No issues found" as an error that should stop indexing
                if "No issues found" in error:
                    logger.info("No issues found is not a critical error, continuing with update")
                    if update_last_indexed:
                        connector.last_indexed_at = datetime.now()
                        await session.commit()
                        logger.info(f"Updated last_indexed_at to {connector.last_indexed_at} despite no issues found")
                    return 0, None
                else:
                    return 0, f"Failed to get Linear issues: {error}"
            logger.info(f"Retrieved {len(issues)} issues from Linear API")
        except Exception as e:
            logger.error(f"Exception when calling Linear API: {str(e)}", exc_info=True)
            return 0, f"Failed to get Linear issues: {str(e)}"
        if not issues:
            logger.info("No Linear issues found for the specified date range")
            if update_last_indexed:
                connector.last_indexed_at = datetime.now()
                await session.commit()
                logger.info(f"Updated last_indexed_at to {connector.last_indexed_at} despite no issues found")
            return 0, None  # Return None instead of error message when no issues found
        # Log issue IDs and titles for debugging
        logger.info("Issues retrieved from Linear API:")
        for idx, issue in enumerate(issues[:10]):  # Log first 10 issues
            logger.info(f"  {idx+1}. {issue.get('identifier', 'Unknown')} - {issue.get('title', 'Unknown')} - Created: {issue.get('createdAt', 'Unknown')} - Updated: {issue.get('updatedAt', 'Unknown')}")
        if len(issues) > 10:
            logger.info(f"  ...and {len(issues) - 10} more issues")
        # Get existing documents for this search space and connector type to prevent duplicates
        existing_docs_result = await session.execute(
            select(Document)
            .filter(
                Document.search_space_id == search_space_id,
                Document.document_type == DocumentType.LINEAR_CONNECTOR
            )
        )
        existing_docs = existing_docs_result.scalars().all()
        # Create a lookup dictionary of existing documents by issue_id
        existing_docs_by_issue_id = {}
        for doc in existing_docs:
            if "issue_id" in doc.document_metadata:
                existing_docs_by_issue_id[doc.document_metadata["issue_id"]] = doc
        logger.info(f"Found {len(existing_docs_by_issue_id)} existing Linear documents in database")
        # Log existing document IDs for debugging
        if existing_docs_by_issue_id:
            logger.info("Existing Linear document issue IDs in database:")
            for idx, (issue_id, doc) in enumerate(list(existing_docs_by_issue_id.items())[:10]):  # Log first 10
                logger.info(f"  {idx+1}. {issue_id} - {doc.document_metadata.get('issue_identifier', 'Unknown')} - {doc.document_metadata.get('issue_title', 'Unknown')}")
            if len(existing_docs_by_issue_id) > 10:
                logger.info(f"  ...and {len(existing_docs_by_issue_id) - 10} more existing documents")
        # Track the number of documents indexed
        documents_indexed = 0
        documents_updated = 0
        documents_skipped = 0
        skipped_issues = []
        # Process each issue
        for issue in issues:
            try:
                issue_id = issue.get("id")
                issue_identifier = issue.get("identifier", "")
                issue_title = issue.get("title", "")
                if not issue_id or not issue_title:
                    logger.warning(f"Skipping issue with missing ID or title: {issue_id or 'Unknown'}")
                    skipped_issues.append(f"{issue_identifier or 'Unknown'} (missing data)")
                    documents_skipped += 1
                    continue
                # Format the issue first to get well-structured data
                formatted_issue = linear_client.format_issue(issue)
                # Convert issue to markdown format
                issue_content = linear_client.format_issue_to_markdown(formatted_issue)
                if not issue_content:
                    logger.warning(f"Skipping issue with no content: {issue_identifier} - {issue_title}")
                    skipped_issues.append(f"{issue_identifier} (no content)")
                    documents_skipped += 1
                    continue
                # Create a short summary for the embedding
                # This avoids using the LLM and just uses the issue data directly
                state = formatted_issue.get("state", "Unknown")
                description = formatted_issue.get("description", "")
                # Truncate description if it's too long for the summary
                if description and len(description) > 500:
                    description = description[:497] + "..."
                # Create a simple summary from the issue data
                summary_content = f"Linear Issue {issue_identifier}: {issue_title}\n\nStatus: {state}\n\n"
                if description:
                    summary_content += f"Description: {description}\n\n"
                # Add comment count
                comment_count = len(formatted_issue.get("comments", []))
                summary_content += f"Comments: {comment_count}"
                # Generate embedding for the summary
                summary_embedding = config.embedding_model_instance.embed(summary_content)
                # Process chunks - using the full issue content with comments
                chunks = [
                    Chunk(content=chunk.text, embedding=chunk.embedding)
                    for chunk in config.chunker_instance.chunk(issue_content)
                ]
                # Check if this issue already exists in our database
                existing_document = existing_docs_by_issue_id.get(issue_id)
                if existing_document:
                    # Update existing document instead of creating a new one
                    logger.info(f"Updating existing document for issue {issue_identifier} - {issue_title}")
                    # Update document fields
                    existing_document.title = f"Linear - {issue_identifier}: {issue_title}"
                    existing_document.document_metadata = {
                        "issue_id": issue_id,
                        "issue_identifier": issue_identifier,
                        "issue_title": issue_title,
                        "state": state,
                        "comment_count": comment_count,
                        "indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
                        "last_updated": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
                    }
                    existing_document.content = summary_content
                    existing_document.embedding = summary_embedding
                    # Delete existing chunks and add new ones
                    await session.execute(
                        delete(Chunk)
                        .where(Chunk.document_id == existing_document.id)
                    )
                    # Assign new chunks to existing document
                    for chunk in chunks:
                        chunk.document_id = existing_document.id
                        session.add(chunk)
                    documents_updated += 1
                else:
                    # Create and store new document
                    logger.info(f"Creating new document for issue {issue_identifier} - {issue_title}")
                    document = Document(
                        search_space_id=search_space_id,
                        title=f"Linear - {issue_identifier}: {issue_title}",
                        document_type=DocumentType.LINEAR_CONNECTOR,
                        document_metadata={
                            "issue_id": issue_id,
                            "issue_identifier": issue_identifier,
                            "issue_title": issue_title,
                            "state": state,
                            "comment_count": comment_count,
                            "indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
                        },
                        content=summary_content,
                        embedding=summary_embedding,
                        chunks=chunks
                    )
                    session.add(document)
                    documents_indexed += 1
                    logger.info(f"Successfully indexed new issue {issue_identifier} - {issue_title}")
            except Exception as e:
                logger.error(f"Error processing issue {issue.get('identifier', 'Unknown')}: {str(e)}", exc_info=True)
                skipped_issues.append(f"{issue.get('identifier', 'Unknown')} (processing error)")
                documents_skipped += 1
                continue  # Skip this issue and continue with others
        # Update the last_indexed_at timestamp for the connector only if requested
        total_processed = documents_indexed + documents_updated
        if update_last_indexed:
            connector.last_indexed_at = datetime.now()
            logger.info(f"Updated last_indexed_at to {connector.last_indexed_at}")
        # Commit all changes
        await session.commit()
        logger.info("Successfully committed all Linear document changes to database")
        logger.info(f"Linear indexing completed: {documents_indexed} new issues, {documents_updated} updated, {documents_skipped} skipped")
        return total_processed, None  # Return None as the error message to indicate success
    except SQLAlchemyError as db_error:
        await session.rollback()
        logger.error(f"Database error: {str(db_error)}", exc_info=True)
        return 0, f"Database error: {str(db_error)}"
    except Exception as e:
        await session.rollback()
        logger.error(f"Failed to index Linear issues: {str(e)}", exc_info=True)
        return 0, f"Failed to index Linear issues: {str(e)}"
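
Reviewer note: the update-or-create branch above can be illustrated in isolation. The following is a minimal sketch, assuming an in-memory dict in place of the database session; `Doc` and `upsert_issue` are hypothetical names introduced only for illustration and are not part of this commit.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Doc:
    """Hypothetical stand-in for the ORM Document model."""
    title: str
    content: str
    metadata: dict = field(default_factory=dict)


def upsert_issue(store: dict, issue: dict) -> str:
    """Update the stored document for an issue, or create one if it is new."""
    title = f"Linear - {issue['identifier']}: {issue['title']}"
    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    existing = store.get(issue["id"])
    if existing is not None:
        # Same issue seen before: refresh fields in place, no duplicate row.
        existing.title = title
        existing.content = issue["content"]
        existing.metadata["last_updated"] = now
        return "updated"
    # First time this issue is seen: create a new document.
    store[issue["id"]] = Doc(
        title=title,
        content=issue["content"],
        metadata={"issue_id": issue["id"], "indexed_at": now},
    )
    return "created"
```

Keying the lookup on the Linear issue ID, as the diff does with `existing_docs_by_issue_id`, is what keeps re-indexing from creating duplicate documents.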
--- a/surfsense_backend/app/tasks/stream_connector_search_results.py
+++ b/surfsense_backend/app/tasks/stream_connector_search_results.py
@ -1,20 +1,15 @@
-import json
+from typing import AsyncGenerator, List, Union
-from sqlalchemy.ext.asyncio import AsyncSession
+from uuid import UUID
 from typing import List, AsyncGenerator, Dict, Any
 import asyncio
 import re
-from app.utils.connector_service import ConnectorService
+from app.agents.researcher.graph import graph as researcher_graph
-from app.utils.research_service import ResearchService
+from app.agents.researcher.state import State
 from app.utils.streaming_service import StreamingService
-from app.utils.reranker_service import RerankerService
+from sqlalchemy.ext.asyncio import AsyncSession
-from app.utils.query_service import QueryService
+
 from app.config import config
 from app.utils.document_converters import convert_chunks_to_langchain_documents
 async def stream_connector_search_results(
    user_query: str, 
-    user_id: int, 
+    user_id: Union[str, UUID], 
    search_space_id: int, 
    session: AsyncSession, 
    research_mode: str, 
@ -25,7 +20,7 @@ async def stream_connector_search_results(
    Args:
        user_query: The user's query
-        user_id: The user's ID
+        user_id: The user's ID (can be UUID object or string)
        search_space_id: The search space ID
        session: The database session
        research_mode: The research mode
@ -34,365 +29,45 @@ async def stream_connector_search_results(
    Yields:
        str: Formatted response strings
    """
    # Initialize services
    connector_service = ConnectorService(session)
    streaming_service = StreamingService()
    # Reformulate the user query using the strategic LLM
    yield streaming_service.add_terminal_message("Reformulating your query for better results...", "info")
    reformulated_query = await QueryService.reformulate_query(user_query)
    yield streaming_service.add_terminal_message(f"Searching for: {reformulated_query}", "success")
    reranker_service = RerankerService.get_reranker_instance(config)
    all_raw_documents = []  # Store all raw documents before reranking
    all_sources = []
    TOP_K = 20
    if research_mode == "GENERAL":
-        TOP_K = 20
+        NUM_SECTIONS = 1
    elif research_mode == "DEEP":
-        TOP_K = 40
+        NUM_SECTIONS = 3
    elif research_mode == "DEEPER":
-        TOP_K = 60
+        NUM_SECTIONS = 6
-
+    # Convert UUID to string if needed
-    # Process each selected connector
+    user_id_str = str(user_id) if isinstance(user_id, UUID) else user_id
    for connector in selected_connectors:
        if connector == "YOUTUBE_VIDEO":
            # Send terminal message about starting search
            yield streaming_service.add_terminal_message("Starting to search for youtube videos...")
            # Search for YouTube videos using reformulated query
            result_object, youtube_chunks = await connector_service.search_youtube(
                user_query=reformulated_query,
                user_id=user_id,
                search_space_id=search_space_id,
                top_k=TOP_K
            )
            # Send terminal message about search results
            yield streaming_service.add_terminal_message(
                f"Found {len(result_object['sources'])} relevant YouTube videos",
                "success"
            )
            # Update sources
            all_sources.append(result_object)
            yield streaming_service.update_sources(all_sources)
            # Add documents to collection
            all_raw_documents.extend(youtube_chunks)
        # Extension Docs
        if connector == "EXTENSION":
            # Send terminal message about starting search
            yield streaming_service.add_terminal_message("Starting to search for extension...")
            # Search for crawled URLs using reformulated query
            result_object, extension_chunks = await connector_service.search_extension(
                user_query=reformulated_query,
                user_id=user_id,
                search_space_id=search_space_id,
                top_k=TOP_K
            )
            # Send terminal message about search results
            yield streaming_service.add_terminal_message(
                f"Found {len(result_object['sources'])} relevant extension documents",
                "success"
            )
            # Update sources
            all_sources.append(result_object)
            yield streaming_service.update_sources(all_sources)
            # Add documents to collection
            all_raw_documents.extend(extension_chunks)
        # Crawled URLs
        if connector == "CRAWLED_URL":
            # Send terminal message about starting search
            yield streaming_service.add_terminal_message("Starting to search for crawled URLs...")
            # Search for crawled URLs using reformulated query
            result_object, crawled_urls_chunks = await connector_service.search_crawled_urls(
                user_query=reformulated_query,
                user_id=user_id,
                search_space_id=search_space_id,
                top_k=TOP_K
            )
            # Send terminal message about search results
            yield streaming_service.add_terminal_message(
                f"Found {len(result_object['sources'])} relevant crawled URLs",
                "success"
            )
            # Update sources
            all_sources.append(result_object)
            yield streaming_service.update_sources(all_sources)
            # Add documents to collection
            all_raw_documents.extend(crawled_urls_chunks)
        # Files
        if connector == "FILE":
            # Send terminal message about starting search
            yield streaming_service.add_terminal_message("Starting to search for files...")
            # Search for files using reformulated query
            result_object, files_chunks = await connector_service.search_files(
                user_query=reformulated_query,
                user_id=user_id,
                search_space_id=search_space_id,
                top_k=TOP_K
            )
            # Send terminal message about search results
            yield streaming_service.add_terminal_message(
                f"Found {len(result_object['sources'])} relevant files",
                "success"
            )
            # Update sources
            all_sources.append(result_object)
            yield streaming_service.update_sources(all_sources) 
            # Add documents to collection
            all_raw_documents.extend(files_chunks)
        # Tavily Connector
        if connector == "TAVILY_API":
            # Send terminal message about starting search
            yield streaming_service.add_terminal_message("Starting to search with Tavily API...")
            # Search using Tavily API with reformulated query
            result_object, tavily_chunks = await connector_service.search_tavily(
                user_query=reformulated_query,
                user_id=user_id,
                top_k=TOP_K
            )
            # Send terminal message about search results
            yield streaming_service.add_terminal_message(
                f"Found {len(result_object['sources'])} relevant results from Tavily",
                "success"
            )
            # Update sources
            all_sources.append(result_object)
            yield streaming_service.update_sources(all_sources)
            # Add documents to collection
            all_raw_documents.extend(tavily_chunks)
        # Slack Connector
        if connector == "SLACK_CONNECTOR":
            # Send terminal message about starting search
            yield streaming_service.add_terminal_message("Starting to search for slack connector...")   
            # Search using Slack API with reformulated query
            result_object, slack_chunks = await connector_service.search_slack(
                user_query=reformulated_query,
                user_id=user_id,
                search_space_id=search_space_id,
                top_k=TOP_K
            )
            # Send terminal message about search results
            yield streaming_service.add_terminal_message(
                f"Found {len(result_object['sources'])} relevant results from Slack",
                "success"
            )
            # Update sources
            all_sources.append(result_object)
            yield streaming_service.update_sources(all_sources)
            # Add documents to collection
            all_raw_documents.extend(slack_chunks)
        # Notion Connector
        if connector == "NOTION_CONNECTOR":
            # Send terminal message about starting search
            yield streaming_service.add_terminal_message("Starting to search for notion connector...")  
            # Search using Notion API with reformulated query
            result_object, notion_chunks = await connector_service.search_notion(
                user_query=reformulated_query,
                user_id=user_id,
                search_space_id=search_space_id,
                top_k=TOP_K
            )
            # Send terminal message about search results
            yield streaming_service.add_terminal_message(
                f"Found {len(result_object['sources'])} relevant results from Notion",
                "success"
            )
            # Update sources
            all_sources.append(result_object)
            yield streaming_service.update_sources(all_sources)
            # Add documents to collection
            all_raw_documents.extend(notion_chunks)
-
+    # Sample configuration
-    # If we have documents to research
+    config = {
-    if all_raw_documents:
+        "configurable": {
-        # Rerank all documents if reranker is available
+            "user_query": user_query,
-        if reranker_service:
+            "num_sections": NUM_SECTIONS,
-            yield streaming_service.add_terminal_message("Reranking documents for better relevance...", "info")
+            "connectors_to_search": selected_connectors,
-            
+            "user_id": user_id_str,
-            # Convert documents to format expected by reranker
+            "search_space_id": search_space_id
-            reranker_input_docs = [
+        }
-                {
+    }
-                    "chunk_id": doc.get("chunk_id", f"chunk_{i}"),
+    # Initialize state with database session and streaming service
-                    "content": doc.get("content", ""),
+    initial_state = State(
-                    "score": doc.get("score", 0.0),
+        db_session=session,
-                    "document": {
+        streaming_service=streaming_service
-                        "id": doc.get("document", {}).get("id", ""),
+    )
-                        "title": doc.get("document", {}).get("title", ""),
+    
-                        "document_type": doc.get("document", {}).get("document_type", ""),
+    # Run the graph directly
-                        "metadata": doc.get("document", {}).get("metadata", {})
+    print("\nRunning the complete researcher workflow...")
-                    }
+    
-                } for i, doc in enumerate(all_raw_documents)
+    # Use streaming with config parameter
-            ]
+    async for chunk in researcher_graph.astream(
-            
+        initial_state,
-            # Rerank documents using the reformulated query
+        config=config,
-            reranked_docs = reranker_service.rerank_documents(reformulated_query, reranker_input_docs)
+        stream_mode="custom",
-            
+    ):
-            # Sort by score in descending order
+        # If the chunk contains a 'yeild_value' key, yield its value
-            reranked_docs.sort(key=lambda x: x.get("score", 0), reverse=True)
+        # Note: there's a typo in 'yeild_value' in the code, but we need to match it
-            
+        if isinstance(chunk, dict) and 'yeild_value' in chunk:
-           
+            yield chunk['yeild_value']
-            
+    
-            # Convert back to langchain documents format
+    yield streaming_service.format_completion()
            from langchain.schema import Document as LangchainDocument
            all_langchain_documents_to_research = [
                LangchainDocument(
                    page_content= f"""<document><metadata><source_id>{doc.get("document", {}).get("id", "")}</source_id></metadata><content>{doc.get("content", "")}</content></document>""",
                    metadata={
                        # **doc.get("document", {}).get("metadata", {}),
                        # "score": doc.get("score", 0.0),
                        # "rank": doc.get("rank", 0),
                        # "document_id": doc.get("document", {}).get("id", ""),
                        # "document_title": doc.get("document", {}).get("title", ""),
                        # "document_type": doc.get("document", {}).get("document_type", ""),
                        # # Explicitly set source_id for citation purposes
                        "source_id": str(doc.get("document", {}).get("id", ""))
                    }
                ) for doc in reranked_docs
            ]
            yield streaming_service.add_terminal_message(f"Reranked {len(all_langchain_documents_to_research)} documents", "success")
        else:
            # Use raw documents if no reranker is available
            all_langchain_documents_to_research = convert_chunks_to_langchain_documents(all_raw_documents)
        # Send terminal message about starting research
        yield streaming_service.add_terminal_message("Starting to research...", "info")
        # Create a buffer to collect report content
        report_buffer = []
        # Use the streaming research method
        yield streaming_service.add_terminal_message("Generating report...", "info")
        # Create a wrapper to handle the streaming
        class StreamHandler:
            def __init__(self):
                self.queue = asyncio.Queue()
            async def handle_progress(self, data):
                result = None
                if data.get("type") == "logs":
                    # Handle log messages
                    result = streaming_service.add_terminal_message(data.get("output", ""), "info")
                elif data.get("type") == "report":
                    # Handle report content
                    content = data.get("output", "")
                    # Fix incorrect citation formats using regex
                    # More specific pattern to match only numeric citations in markdown-style links
                    # This matches patterns like ([1](https://github.com/...)) but not general links like ([Click here](https://...))
                    pattern = r'\(\[(\d+)\]\((https?://[^\)]+)\)\)'
                    # Replace with just [X] where X is the number
                    content = re.sub(pattern, r'[\1]', content)
                    # Also match other incorrect formats like ([1]) and convert to [1]
                    # Only match if the content inside brackets is a number
                    content = re.sub(r'\(\[(\d+)\]\)', r'[\1]', content)
                    report_buffer.append(content)
                    # Update the answer with the accumulated content
                    result = streaming_service.update_answer(report_buffer)
                if result:
                    await self.queue.put(result)
                return result
            async def get_next(self):
                try:
                    return await self.queue.get()
                except Exception as e:
                    print(f"Error getting next item from queue: {e}")
                    return None
            def task_done(self):
                self.queue.task_done()
        # Create the stream handler
        stream_handler = StreamHandler()
        # Start the research process in a separate task
        research_task = asyncio.create_task(
            ResearchService.stream_research(
                user_query=reformulated_query,
                documents=all_langchain_documents_to_research,
                on_progress=stream_handler.handle_progress,
                research_mode=research_mode
            )
        )
        # Stream results as they become available
        while not research_task.done() or not stream_handler.queue.empty():
            try:
                # Get the next result with a timeout
                result = await asyncio.wait_for(stream_handler.get_next(), timeout=0.1)
                stream_handler.task_done()
                yield result
            except asyncio.TimeoutError:
                # No result available yet, check if the research task is done
                if research_task.done():
                    # If the queue is empty and the task is done, we're finished
                    if stream_handler.queue.empty():
                        break
        # Get the final report
        try:
            final_report = await research_task
            # Send terminal message about research completion
            yield streaming_service.add_terminal_message("Research completed", "success")
            # Update the answer with the final report
            final_report_lines = final_report.split('\n')
            yield streaming_service.update_answer(final_report_lines)
        except Exception as e:
            # Handle any exceptions
            yield streaming_service.add_terminal_message(f"Error during research: {str(e)}", "error")
        # Send completion message
        yield streaming_service.format_completion()
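
Reviewer note: the new streaming path can be sketched without the real graph. This is a minimal sketch under stated assumptions: `fake_graph_stream` is a hypothetical stand-in for `researcher_graph.astream(...)`, and `build_config`, `relay`, and `collect` are illustrative names not found in the commit. The fallback of one section for an unknown research mode is also an assumption. The diff as written defines `NUM_SECTIONS` only for the three known modes.

```python
import asyncio
from typing import Any, AsyncIterator, Union
from uuid import UUID

# Section counts per research mode, as set in the diff above.
SECTIONS_BY_MODE = {"GENERAL": 1, "DEEP": 3, "DEEPER": 6}


def build_config(user_query: str, user_id: Union[str, UUID], research_mode: str) -> dict:
    """Build the 'configurable' dict; unknown modes fall back to 1 section (an assumption)."""
    return {
        "configurable": {
            "user_query": user_query,
            "num_sections": SECTIONS_BY_MODE.get(research_mode, 1),
            # str() is a no-op for strings and converts UUID objects.
            "user_id": str(user_id),
        }
    }


async def fake_graph_stream() -> AsyncIterator[Any]:
    """Hypothetical stand-in for the researcher graph's custom stream."""
    yield {"yeild_value": "terminal: searching"}
    yield {"other": "ignored"}
    yield "not a dict"
    yield {"yeild_value": "answer: done"}


async def relay(stream: AsyncIterator[Any]) -> AsyncIterator[str]:
    """Forward only chunks that carry the 'yeild_value' key (spelling matches the graph)."""
    async for chunk in stream:
        if isinstance(chunk, dict) and "yeild_value" in chunk:
            yield chunk["yeild_value"]


async def collect() -> list:
    """Drain the relayed stream into a list for inspection."""
    return [item async for item in relay(fake_graph_stream())]
```

A dict lookup with a default avoids the `NameError` that an `if`/`elif` chain would raise when `research_mode` matches none of the branches.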
--- a/surfsense_backend/app/utils/connector_service.py
+++ b/surfsense_backend/app/utils/connector_service.py
@ -13,7 +13,7 @@ class ConnectorService:
        self.retriever = ChucksHybridSearchRetriever(session)
        self.source_id_counter = 1
-    async def search_crawled_urls(self, user_query: str, user_id: int, search_space_id: int, top_k: int = 20) -> tuple:
+    async def search_crawled_urls(self, user_query: str, user_id: str, search_space_id: int, top_k: int = 20) -> tuple:
        """
        Search for crawled URLs and return both the source information and langchain documents
@ -28,16 +28,16 @@ class ConnectorService:
            document_type="CRAWLED_URL"
        )
-        # Map crawled_urls_chunks to the required format
+        # Process each chunk and create sources directly without deduplication
-        mapped_sources = {}
+        sources_list = []
        for i, chunk in enumerate(crawled_urls_chunks):
-            #Fix for UI
+            # Fix for UI
            crawled_urls_chunks[i]['document']['id'] = self.source_id_counter
            # Extract document metadata
            document = chunk.get('document', {})
            metadata = document.get('metadata', {})
-            # Create a mapped source entry
+            # Create a source entry
            source = {
                "id":  self.source_id_counter,
                "title": document.get('title', 'Untitled Document'),
@ -46,14 +46,7 @@ class ConnectorService:
            }
            self.source_id_counter += 1
-
+            sources_list.append(source)
            # Use a unique identifier for tracking unique sources
            source_key = source.get("url") or source.get("title")
            if source_key and source_key not in mapped_sources:
                mapped_sources[source_key] = source
        # Convert to list of sources
        sources_list = list(mapped_sources.values())
        # Create result object
        result_object = {
@ -63,10 +56,9 @@ class ConnectorService:
            "sources": sources_list,
        }
        return result_object, crawled_urls_chunks
-    async def search_files(self, user_query: str, user_id: int, search_space_id: int, top_k: int = 20) -> tuple:
+    async def search_files(self, user_query: str, user_id: str, search_space_id: int, top_k: int = 20) -> tuple:
        """
        Search for files and return both the source information and langchain documents
@ -81,16 +73,16 @@ class ConnectorService:
            document_type="FILE"
        )
-        # Map crawled_urls_chunks to the required format
+        # Process each chunk and create sources directly without deduplication
-        mapped_sources = {}
+        sources_list = []
        for i, chunk in enumerate(files_chunks):
-            #Fix for UI
+            # Fix for UI
            files_chunks[i]['document']['id'] = self.source_id_counter
            # Extract document metadata
            document = chunk.get('document', {})
            metadata = document.get('metadata', {})
-            # Create a mapped source entry
+            # Create a source entry
            source = {
                "id":  self.source_id_counter,
                "title": document.get('title', 'Untitled Document'),
@ -99,14 +91,7 @@ class ConnectorService:
            }
            self.source_id_counter += 1
-
+            sources_list.append(source)
            # Use a unique identifier for tracking unique sources
            source_key = source.get("url") or source.get("title")
            if source_key and source_key not in mapped_sources:
                mapped_sources[source_key] = source
        # Convert to list of sources
        sources_list = list(mapped_sources.values())
        # Create result object
        result_object = {
@ -118,7 +103,7 @@ class ConnectorService:
        return result_object, files_chunks
-    async def get_connector_by_type(self, user_id: int, connector_type: SearchSourceConnectorType) -> Optional[SearchSourceConnector]:
+    async def get_connector_by_type(self, user_id: str, connector_type: SearchSourceConnectorType) -> Optional[SearchSourceConnector]:
        """
        Get a connector by type for a specific user
@ -138,7 +123,7 @@ class ConnectorService:
        )
        return result.scalars().first()
-    async def search_tavily(self, user_query: str, user_id: int, top_k: int = 20) -> tuple:
+    async def search_tavily(self, user_query: str, user_id: str, top_k: int = 20) -> tuple:
        """
        Search using Tavily API and return both the source information and documents
@ -177,13 +162,10 @@ class ConnectorService:
            # Extract results from Tavily response
            tavily_results = response.get("results", [])
-            # Map Tavily results to the required format
+            # Process each result and create sources directly without deduplication
            sources_list = []
            documents = []
            # Start IDs from 1000 to avoid conflicts with other connectors
            base_id = 100
            for i, result in enumerate(tavily_results):
                # Create a source entry
@ -234,7 +216,7 @@ class ConnectorService:
                "sources": [],
            }, []
-    async def search_slack(self, user_query: str, user_id: int, search_space_id: int, top_k: int = 20) -> tuple:
+    async def search_slack(self, user_query: str, user_id: str, search_space_id: int, top_k: int = 20) -> tuple:
        """
        Search for slack and return both the source information and langchain documents
@ -249,10 +231,10 @@ class ConnectorService:
            document_type="SLACK_CONNECTOR"
        )
-        # Map slack_chunks to the required format
+        # Process each chunk and create sources directly without deduplication
-        mapped_sources = {}
+        sources_list = []
        for i, chunk in enumerate(slack_chunks):
-            #Fix for UI
+            # Fix for UI
            slack_chunks[i]['document']['id'] = self.source_id_counter
            # Extract document metadata
            document = chunk.get('document', {})
@ -286,14 +268,7 @@ class ConnectorService:
            }
            self.source_id_counter += 1
-
+            sources_list.append(source)
            # Use channel_id and content as a unique identifier for tracking unique sources
            source_key = f"{channel_id}_{chunk.get('chunk_id', i)}"
            if source_key and source_key not in mapped_sources:
                mapped_sources[source_key] = source
        # Convert to list of sources
        sources_list = list(mapped_sources.values())
        # Create result object
        result_object = {
@ -305,7 +280,7 @@ class ConnectorService:
        return result_object, slack_chunks
-    async def search_notion(self, user_query: str, user_id: int, search_space_id: int, top_k: int = 20) -> tuple:
+    async def search_notion(self, user_query: str, user_id: str, search_space_id: int, top_k: int = 20) -> tuple:
        """
        Search for Notion pages and return both the source information and langchain documents
@ -326,8 +301,8 @@ class ConnectorService:
            document_type="NOTION_CONNECTOR"
        )
-        # Map notion_chunks to the required format
+        # Process each chunk and create sources directly without deduplication
-        mapped_sources = {}
+        sources_list = []
        for i, chunk in enumerate(notion_chunks):
            # Fix for UI
            notion_chunks[i]['document']['id'] = self.source_id_counter
@ -365,14 +340,7 @@ class ConnectorService:
            }
            self.source_id_counter += 1
-
+            sources_list.append(source)
            # Use page_id and content as a unique identifier for tracking unique sources
            source_key = f"{page_id}_{chunk.get('chunk_id', i)}"
            if source_key and source_key not in mapped_sources:
                mapped_sources[source_key] = source
        # Convert to list of sources
        sources_list = list(mapped_sources.values())
        # Create result object
        result_object = {
@ -384,7 +352,7 @@ class ConnectorService:
        return result_object, notion_chunks
-    async def search_extension(self, user_query: str, user_id: int, search_space_id: int, top_k: int = 20) -> tuple:
+    async def search_extension(self, user_query: str, user_id: str, search_space_id: int, top_k: int = 20) -> tuple:
        """
        Search for extension data and return both the source information and langchain documents
@ -405,8 +373,8 @@ class ConnectorService:
            document_type="EXTENSION"
        )
-        # Map extension_chunks to the required format
+        # Process each chunk and create sources directly without deduplication
-        mapped_sources = {}
+        sources_list = []
        for i, chunk in enumerate(extension_chunks):
            # Fix for UI
            extension_chunks[i]['document']['id'] = self.source_id_counter
@ -462,14 +430,7 @@ class ConnectorService:
            }
            self.source_id_counter += 1
-
+            sources_list.append(source)
            # Use URL and timestamp as a unique identifier for tracking unique sources
            source_key = f"{webpage_url}_{visit_date}"
            if source_key and source_key not in mapped_sources:
                mapped_sources[source_key] = source
        # Convert to list of sources
        sources_list = list(mapped_sources.values())
        # Create result object
        result_object = {
@ -481,7 +442,7 @@ class ConnectorService:
        return result_object, extension_chunks
-    async def search_youtube(self, user_query: str, user_id: int, search_space_id: int, top_k: int = 20) -> tuple:
+    async def search_youtube(self, user_query: str, user_id: str, search_space_id: int, top_k: int = 20) -> tuple:
        """
        Search for YouTube videos and return both the source information and langchain documents
@ -502,8 +463,8 @@ class ConnectorService:
            document_type="YOUTUBE_VIDEO"
        )
-        # Map youtube_chunks to the required format
+        # Process each chunk and create sources directly without deduplication
-        mapped_sources = {}
+        sources_list = []
        for i, chunk in enumerate(youtube_chunks):
            # Fix for UI
            youtube_chunks[i]['document']['id'] = self.source_id_counter
@ -541,21 +502,144 @@ class ConnectorService:
            }
            self.source_id_counter += 1
-
+            sources_list.append(source)
            # Use video_id as a unique identifier for tracking unique sources
            source_key = video_id or f"youtube_{i}"
            if source_key and source_key not in mapped_sources:
                mapped_sources[source_key] = source
        # Convert to list of sources
        sources_list = list(mapped_sources.values())
        # Create result object
        result_object = {
-            "id": 6,  # Assign a unique ID for the YouTube connector
+            "id": 7,  # Assign a unique ID for the YouTube connector
            "name": "YouTube Videos",
            "type": "YOUTUBE_VIDEO",
            "sources": sources_list,
        }
-        return result_object, youtube_chunks
+        return result_object, youtube_chunks
    async def search_github(self, user_query: str, user_id: str, search_space_id: int, top_k: int = 20) -> tuple:
        """
        Search for GitHub documents and return both the source information and langchain documents
        Returns:
            tuple: (sources_info, langchain_documents)
        """
        github_chunks = await self.retriever.hybrid_search(
            query_text=user_query,
            top_k=top_k,
            user_id=user_id,
            search_space_id=search_space_id,
            document_type="GITHUB_CONNECTOR"
        )
        # Process each chunk and create sources directly without deduplication
        sources_list = []
        for i, chunk in enumerate(github_chunks):
            # Fix for UI - assign a unique ID for citation/source tracking
            github_chunks[i]['document']['id'] = self.source_id_counter
            # Extract document metadata
            document = chunk.get('document', {})
            metadata = document.get('metadata', {})
            # Create a source entry
            source = {
                "id": self.source_id_counter,
                "title": document.get('title', 'GitHub Document'), # Use specific title if available
                "description": metadata.get('description', chunk.get('content', '')[:100]), # Use description or content preview
                "url": metadata.get('url', '') # Use URL if available in metadata
            }
            self.source_id_counter += 1
            sources_list.append(source)
        # Create result object
        result_object = {
            "id": 8,
            "name": "GitHub",
            "type": "GITHUB_CONNECTOR",
            "sources": sources_list,
        }
        return result_object, github_chunks
    async def search_linear(self, user_query: str, user_id: str, search_space_id: int, top_k: int = 20) -> tuple:
        """
        Search for Linear issues and comments and return both the source information and langchain documents
        Args:
            user_query: The user's query
            user_id: The user's ID
            search_space_id: The search space ID to search in
            top_k: Maximum number of results to return
        Returns:
            tuple: (sources_info, langchain_documents)
        """
        linear_chunks = await self.retriever.hybrid_search(
            query_text=user_query,
            top_k=top_k,
            user_id=user_id,
            search_space_id=search_space_id,
            document_type="LINEAR_CONNECTOR"
        )
        # Process each chunk and create sources directly without deduplication
        sources_list = []
        for i, chunk in enumerate(linear_chunks):
            # Fix for UI
            linear_chunks[i]['document']['id'] = self.source_id_counter
            # Extract document metadata
            document = chunk.get('document', {})
            metadata = document.get('metadata', {})
            # Extract Linear-specific metadata
            issue_identifier = metadata.get('issue_identifier', '')
            issue_title = metadata.get('issue_title', 'Untitled Issue')
            issue_state = metadata.get('state', '')
            comment_count = metadata.get('comment_count', 0)
            # Create a more descriptive title for Linear issues
            title = f"Linear: {issue_identifier} - {issue_title}"
            if issue_state:
                title += f" ({issue_state})"
            # Create a more descriptive description for Linear issues
            description = chunk.get('content', '')[:100]
            if len(description) == 100:
                description += "..."
            # Add comment count info to description
            if comment_count:
                if description:
                    description += f" | Comments: {comment_count}"
                else:
                    description = f"Comments: {comment_count}"
            # For URL, we could construct a URL to the Linear issue if we have the workspace info
            # For now, use a generic placeholder
            url = ""
            if issue_identifier:
                # This is a generic format, may need to be adjusted based on actual Linear workspace
                url = f"https://linear.app/issue/{issue_identifier}"
            source = {
                "id": self.source_id_counter,
                "title": title,
                "description": description,
                "url": url,
                "issue_identifier": issue_identifier,
                "state": issue_state,
                "comment_count": comment_count
            }
            self.source_id_counter += 1
            sources_list.append(source)
        # Create result object
        result_object = {
            "id": 9,  # Assign a unique ID for the Linear connector
            "name": "Linear Issues",
            "type": "LINEAR_CONNECTOR",
            "sources": sources_list,
        }
        return result_object, linear_chunks
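
The description handling in `search_linear` can be isolated into a small helper. The following is a minimal sketch under stated assumptions, not the project's code: `build_description` and its `limit` parameter are hypothetical names introduced here for illustration. Unlike a plain length-equals-limit check, it appends the ellipsis only when text was actually cut off.

```python
def build_description(content: str, comment_count: int = 0, limit: int = 100) -> str:
    """Return a short preview of `content`, optionally suffixed with a comment count.

    Hypothetical helper: the name and the `limit` parameter are illustrative only.
    """
    preview = content[:limit]
    # Append an ellipsis only when the content was actually truncated.
    if len(content) > limit:
        preview += "..."
    if comment_count:
        suffix = f"Comments: {comment_count}"
        # Join with a separator when there is a preview, else use the suffix alone.
        preview = f"{preview} | {suffix}" if preview else suffix
    return preview
```

A connector could call `build_description(chunk.get('content', ''), comment_count)` when assembling each source entry, which keeps the truncation rule in one place.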
--- a/surfsense_backend/app/utils/query_service.py
+++ b/surfsense_backend/app/utils/query_service.py
@ -1,5 +1,7 @@
-from typing import Dict, Any
-from langchain.schema import LLMResult, HumanMessage, SystemMessage
+"""
+NOTE: This is not used anymore. Might be removed in the future.
+"""
+from langchain.schema import HumanMessage, SystemMessage
 from app.config import config
 class QueryService:
--- a/surfsense_backend/app/utils/research_service.py
+++ b/surfsense_backend/app/utils/research_service.py
@ -1,211 +0,0 @@
 import asyncio
 import re
 from typing import List, Dict, Any, AsyncGenerator, Callable, Optional
 from langchain.schema import Document
 from gpt_researcher.agent import GPTResearcher
 from gpt_researcher.utils.enum import ReportType, Tone, ReportSource
 from dotenv import load_dotenv
 load_dotenv()
 class ResearchService:
    @staticmethod
    async def create_custom_prompt(user_query: str) -> str:
        citation_prompt = f"""
        You are a research assistant tasked with analyzing documents and providing comprehensive answers with proper citations in IEEE format.
        <instructions>
        1. Carefully analyze all provided documents in the <document> sections.
        2. Extract relevant information that addresses the user's query.
        3. Synthesize a comprehensive, well-structured answer using information from these documents.
        4. For EVERY piece of information you include from the documents, add an IEEE-style citation in square brackets [X] where X is the source_id from the document's metadata.
        5. Make sure ALL factual statements from the documents have proper citations.
        6. If multiple documents support the same point, include all relevant citations [X], [Y].
        7. Present information in a logical, coherent flow.
        8. Use your own words to connect ideas, but cite ALL information from the documents.
        9. If documents contain conflicting information, acknowledge this and present both perspectives with appropriate citations.
        10. Do not make up or include information not found in the provided documents.
        11. CRITICAL: You MUST use the exact source_id value from each document's metadata for citations. Do not create your own citation numbers.
        12. CRITICAL: Every citation MUST be in the IEEE format [X] where X is the exact source_id value.
        13. CRITICAL: Never renumber or reorder citations - always use the original source_id values.
        14. CRITICAL: Do not return citations as clickable links.
        15. CRITICAL: Never format citations as markdown links like "([1](https://example.com))". Always use plain square brackets only.
        16. CRITICAL: Citations must ONLY appear as [X] or [X], [Y], [Z] format - never with parentheses, hyperlinks, or other formatting.
        17. CRITICAL: Never make up citation numbers. Only use source_id values that are explicitly provided in the document metadata.
        18. CRITICAL: If you are unsure about a source_id, do not include a citation rather than guessing or making one up.
        </instructions>
        <format>
        - Write in clear, professional language suitable for academic or technical audiences
        - Organize your response with appropriate paragraphs, headings, and structure
        - Every fact from the documents must have an IEEE-style citation in square brackets [X] where X is the EXACT source_id from the document's metadata
        - Citations should appear at the end of the sentence containing the information they support
        - Multiple citations should be separated by commas: [X], [Y], [Z]
        - No need to return references section. Just citation numbers in answer.
        - NEVER create your own citation numbering system - use the exact source_id values from the documents.
        - NEVER format citations as clickable links or as markdown links like "([1](https://example.com))". Always use plain square brackets only.
        - NEVER make up citation numbers if you are unsure about the source_id. It is better to omit the citation than to guess.
        </format>
        <input_example>
            <document>
                <metadata>
                    <source_id>1</source_id>
                </metadata>
                <content>
                    <text>
                        The Great Barrier Reef is the world's largest coral reef system, stretching over 2,300 kilometers along the coast of Queensland, Australia. It comprises over 2,900 individual reefs and 900 islands.
                    </text>
                </content>
            </document>
            <document>
                <metadata>
                    <source_id>13</source_id>
                </metadata>
                <content>
                    <text>
                        Climate change poses a significant threat to coral reefs worldwide. Rising ocean temperatures have led to mass coral bleaching events in the Great Barrier Reef in 2016, 2017, and 2020.
                    </text>
                </content>
            </document>
            <document>
                <metadata>
                    <source_id>21</source_id>
                </metadata>
                <content>
                    <text>
                        The Great Barrier Reef was designated a UNESCO World Heritage Site in 1981 due to its outstanding universal value and biological diversity. It is home to over 1,500 species of fish and 400 types of coral.
                    </text>
                </content>
            </document>
        </input_example>
        <output_example>
            The Great Barrier Reef is the world's largest coral reef system, stretching over 2,300 kilometers along the coast of Queensland, Australia [1]. It was designated a UNESCO World Heritage Site in 1981 due to its outstanding universal value and biological diversity [21]. The reef is home to over 1,500 species of fish and 400 types of coral [21]. Unfortunately, climate change poses a significant threat to coral reefs worldwide, with rising ocean temperatures leading to mass coral bleaching events in the Great Barrier Reef in 2016, 2017, and 2020 [13]. The reef system comprises over 2,900 individual reefs and 900 islands [1], making it an ecological treasure that requires protection from multiple threats [1], [13].
        </output_example>
        <incorrect_citation_formats>
        DO NOT use any of these incorrect citation formats:
        - Using parentheses and markdown links: ([1](https://github.com/MODSetter/SurfSense))
        - Using parentheses around brackets: ([1])
        - Using hyperlinked text: [link to source 1](https://example.com)
        - Using footnote style: ... reef system¹
        - Making up citation numbers when source_id is unknown
        ONLY use plain square brackets [1] or multiple citations [1], [2], [3]
        </incorrect_citation_formats>
        Note that the citation numbers match exactly with the source_id values (1, 13, and 21) and are not renumbered sequentially. Citations follow IEEE style with square brackets and appear at the end of sentences.
        Now, please research the following query:
        <user_query_to_research>
            {user_query}
        </user_query_to_research>
        """
        return citation_prompt
    @staticmethod
    async def stream_research(
        user_query: str, 
        documents: List[Document] = None,
        on_progress: Optional[Callable] = None,
        research_mode: str = "GENERAL"
    ) -> str:
        """
        Stream the research process using GPTResearcher
        Args:
            user_query: The user's query
            documents: List of Document objects to use for research
            on_progress: Optional callback for progress updates
            research_mode: Research mode to use 
        Returns:
            str: The final research report
        """
        # Create a custom websocket-like object to capture streaming output
        class StreamingWebsocket:
            async def send_json(self, data):
                if on_progress:
                    try:
                        # Filter out excessive logging of the prompt
                        if data.get("type") == "logs":
                            output = data.get("output", "")
                            # Check if this is a verbose prompt log
                            if "You are a research assistant tasked with analyzing documents" in output and len(output) > 500:
                                # Replace with a shorter message
                                data["output"] = f"Processing research for query: {user_query}"
                        result = await on_progress(data)
                        return result
                    except Exception as e:
                        print(f"Error in on_progress callback: {e}")
                return None
        streaming_websocket = StreamingWebsocket()
        custom_prompt_for_ieee_citations = await ResearchService.create_custom_prompt(user_query)
        if research_mode == "GENERAL":
            research_report_type = ReportType.CustomReport.value
        elif research_mode == "DEEP":
            research_report_type = ReportType.ResearchReport.value
        elif research_mode == "DEEPER":
            research_report_type = ReportType.DetailedReport.value
        # elif research_mode == "DEEPEST":
        #     research_report_type = ReportType.DeepResearch.value
        else:
            # Fail early instead of leaving research_report_type unbound
            raise ValueError(f"Unsupported research_mode: {research_mode}")
        # Initialize GPTResearcher with the streaming websocket
        researcher = GPTResearcher(
            query=custom_prompt_for_ieee_citations,
            report_type=research_report_type,
            report_format="IEEE",
            report_source=ReportSource.LangChainDocuments.value,
            tone=Tone.Formal,
            documents=documents,
            verbose=True,
            websocket=streaming_websocket
        )
        # Conduct research
        await researcher.conduct_research()
        # Generate report with streaming
        report = await researcher.write_report()
        # Fix citation format
        report = ResearchService.fix_citation_format(report)
        return report 
    @staticmethod
    def fix_citation_format(text: str) -> str:
        """
        Fix any incorrectly formatted citations in the text.
        Args:
            text: The text to fix
        Returns:
            str: The text with fixed citations
        """
        if not text:
            return text
        # More specific pattern to match only numeric citations in markdown-style links
        # This matches patterns like ([1](https://github.com/...)) but not general links like ([Click here](https://...))
        pattern = r'\(\[(\d+)\]\((https?://[^\)]+)\)\)'
        # Replace with just [X] where X is the number
        text = re.sub(pattern, r'[\1]', text)
        # Also match other incorrect formats like ([1]) and convert to [1]
        # Only match if the content inside brackets is a number
        text = re.sub(r'\(\[(\d+)\]\)', r'[\1]', text)
        return text
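
To make the behaviour of the removed `fix_citation_format` helper concrete, here is a standalone sketch that reuses the two regular expressions shown above. The sample sentence is invented for illustration; only the patterns come from the diff.

```python
import re


def fix_citation_format(text: str) -> str:
    """Normalise ([N](url)) and ([N]) citations to plain [N]."""
    if not text:
        return text
    # ([1](https://...)) -> [1]; only numeric link labels are matched
    text = re.sub(r'\(\[(\d+)\]\((https?://[^\)]+)\)\)', r'[\1]', text)
    # ([1]) -> [1]
    text = re.sub(r'\(\[(\d+)\]\)', r'[\1]', text)
    return text


sample = "Reefs are bleaching ([13](https://example.com/a)) and protected ([21])."
print(fix_citation_format(sample))
# Reefs are bleaching [13] and protected [21].
```

Links whose label is not a number, such as `([Click here](https://example.com))`, are left unchanged because both patterns require `\d+` inside the square brackets.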
--- a/surfsense_backend/app/utils/streaming_service.py
+++ b/surfsense_backend/app/utils/streaming_service.py
@ -1,5 +1,6 @@
 import json
-from typing import List, Dict, Any, Generator
+from typing import Any, Dict, List
 class StreamingService:
    def __init__(self):
@ -18,55 +19,7 @@ class StreamingService:
                "content": []
            }
        ]
-    
+    # It is used to send annotations to the frontend
-    def add_terminal_message(self, text: str, message_type: str = "info") -> str:
-        """
-        Add a terminal message to the annotations and return the formatted response
-        Args:
-            text: The message text
-            message_type: The message type (info, success, error)
-        Returns:
-            str: The formatted response string
-        """
-        self.message_annotations[0]["content"].append({
-            "id": self.terminal_idx,
-            "text": text,
-            "type": message_type
-        })
-        self.terminal_idx += 1
-        return self._format_annotations()
-    def update_sources(self, sources: List[Dict[str, Any]]) -> str:
-        """
-        Update the sources in the annotations and return the formatted response
-        Args:
-            sources: List of source objects
-        Returns:
-            str: The formatted response string
-        """
-        self.message_annotations[1]["content"] = sources
-        return self._format_annotations()
-    def update_answer(self, answer_content: List[str]) -> str:
-        """
-        Update the answer in the annotations and return the formatted response
-        Args:
-            answer_content: The answer content as a list of strings
-        Returns:
-            str: The formatted response string
-        """
-        self.message_annotations[2] = {
-            "type": "ANSWER",
-            "content": answer_content
-        }
-        return self._format_annotations()
    def _format_annotations(self) -> str:
        """
        Format the annotations as a string
@ -76,6 +29,7 @@ class StreamingService:
        """
        return f'8:{json.dumps(self.message_annotations)}\n'
    # It is used to end Streaming
    def format_completion(self, prompt_tokens: int = 156, completion_tokens: int = 204) -> str:
        """
        Format a completion message
@ -96,4 +50,23 @@ class StreamingService:
                "totalTokens": total_tokens
            }
        }
-        return f'd:{json.dumps(completion_data)}\n' 
+        return f'd:{json.dumps(completion_data)}\n' 
    def only_update_terminal(self, text: str, message_type: str = "info") -> List[Dict[str, Any]]:
        # Append a terminal message and return the raw annotations (not a formatted string)
        self.message_annotations[0]["content"].append({
            "id": self.terminal_idx,
            "text": text,
            "type": message_type
        })
        self.terminal_idx += 1
        return self.message_annotations
    def only_update_sources(self, sources: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        # Replace the sources annotation and return the raw annotations
        self.message_annotations[1]["content"] = sources
        return self.message_annotations
    def only_update_answer(self, answer: List[str]) -> List[Dict[str, Any]]:
        # Replace the answer content and return the raw annotations
        self.message_annotations[2]["content"] = answer
        return self.message_annotations
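
`_format_annotations` and `format_completion` both emit newline-terminated frames of the form `<prefix>:<json>`, with `8` for annotations and `d` for completion. The sketch below shows one way a consumer might split such a frame. `parse_frame` is a hypothetical helper, not part of the codebase, and the sample payload is invented for illustration.

```python
import json
from typing import Any, Tuple


def parse_frame(frame: str) -> Tuple[str, Any]:
    """Split a '<prefix>:<json>' frame into its prefix and decoded payload.

    Hypothetical helper for illustration; it assumes the prefix itself
    contains no colon, as with the '8' and 'd' prefixes in the diff.
    """
    # partition splits on the first colon only, so colons inside the JSON are safe
    prefix, _, payload = frame.rstrip("\n").partition(":")
    return prefix, json.loads(payload)


# Build a frame the same way _format_annotations does, then decode it again.
annotations = [{"type": "ANSWER", "content": ["Hello"]}]
frame = f'8:{json.dumps(annotations)}\n'
prefix, payload = parse_frame(frame)
```

Here `prefix` identifies the frame kind and `payload` is the decoded annotations list, so a client can dispatch on the prefix before touching the JSON body.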
--- a/surfsense_backend/draw.py
+++ b/surfsense_backend/draw.py
@ -0,0 +1,5 @@
 from app.agents.researcher.graph import graph as researcher_graph
 from app.agents.researcher.sub_section_writer.graph import graph as sub_section_writer_graph
 print(researcher_graph.get_graph().draw_mermaid())
 print(sub_section_writer_graph.get_graph().draw_mermaid())
--- a/surfsense_backend/main.py
+++ b/surfsense_backend/main.py
@ -1,5 +1,12 @@
 import uvicorn
 import argparse
 import logging
 logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'
 )
 if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Run the SurfSense application')
--- a/surfsense_backend/pyproject.toml
+++ b/surfsense_backend/pyproject.toml
@ -1,18 +1,20 @@
 [project]
 name = "surf-new-backend"
-version = "0.1.0"
+version = "0.0.6"
-description = "Add your description here"
+description = "SurfSense Backend"
 readme = "README.md"
 requires-python = ">=3.12"
 dependencies = [
    "alembic>=1.13.0",
    "asyncpg>=0.30.0",
    "chonkie[all]>=0.4.1",
    "fastapi>=0.115.8",
    "fastapi-users[oauth,sqlalchemy]>=14.0.1",
    "firecrawl-py>=1.12.0",
-    "gpt-researcher>=0.12.12",
+    "github3.py==4.0.1",
    "langchain-community>=0.3.17",
    "langchain-unstructured>=0.1.6",
    "langgraph>=0.3.29",
    "litellm>=1.61.4",
    "markdownify>=0.14.1",
    "notion-client>=2.3.0",
--- a/surfsense_backend/uv.lock
+++ b/surfsense_backend/uv.lock
@ -92,6 +92,20 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/ec/6a/bc7e17a3e87a2985d3e8f4da4cd0f481060eb78fb08596c42be62c90a4d9/aiosignal-1.3.2-py2.py3-none-any.whl", hash = "sha256:45cde58e409a301715980c2b01d0c28bdde3770d8290b5eb2173759d9acb31a5", size = 7597 },
 ]
 [[package]]
 name = "alembic"
 version = "1.15.2"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "mako" },
    { name = "sqlalchemy" },
    { name = "typing-extensions" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/e6/57/e314c31b261d1e8a5a5f1908065b4ff98270a778ce7579bd4254477209a7/alembic-1.15.2.tar.gz", hash = "sha256:1c72391bbdeffccfe317eefba686cb9a3c078005478885413b95c3b26c57a8a7", size = 1925573 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/41/18/d89a443ed1ab9bcda16264716f809c663866d4ca8de218aa78fd50b38ead/alembic-1.15.2-py3-none-any.whl", hash = "sha256:2e76bd916d547f6900ec4bb5a90aeac1485d2c92536923d0b138c02b126edc53", size = 231911 },
 ]
 [[package]]
 name = "annotated-types"
 version = "0.7.0"
@ -154,19 +168,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/5a/e4/bf8034d25edaa495da3c8a3405627d2e35758e44ff6eaa7948092646fdcc/argon2_cffi_bindings-21.2.0-cp38-abi3-macosx_10_9_universal2.whl", hash = "sha256:e415e3f62c8d124ee16018e491a009937f8cf7ebf5eb430ffc5de21b900dad93", size = 53104 },
 ]
 [[package]]
 name = "arxiv"
 version = "2.1.3"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "feedparser" },
    { name = "requests" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/fe/59/fe41f54bdfed776c2e9bcd6289e4c71349eb938241d89b4c97d0f33e8013/arxiv-2.1.3.tar.gz", hash = "sha256:32365221994d2cf05657c1fadf63a26efc8ccdec18590281ee03515bfef8bc4e", size = 16747 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/b7/7b/7bf42178d227b26d3daf94cdd22a72a4ed5bf235548c4f5aea49c51c6458/arxiv-2.1.3-py3-none-any.whl", hash = "sha256:6f43673ab770a9e848d7d4fc1894824df55edeac3c3572ea280c9ba2e3c0f39f", size = 11478 },
 ]
 [[package]]
 name = "asyncpg"
 version = "0.30.0"
@ -265,61 +266,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/f9/49/6abb616eb3cbab6a7cca303dc02fdf3836de2e0b834bf966a7f5271a34d8/beautifulsoup4-4.13.3-py3-none-any.whl", hash = "sha256:99045d7d3f08f91f0d656bc9b7efbae189426cd913d830294a15eefa0ea4df16", size = 186015 },
 ]
 [[package]]
 name = "brotli"
 version = "1.1.0"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/2f/c2/f9e977608bdf958650638c3f1e28f85a1b075f075ebbe77db8555463787b/Brotli-1.1.0.tar.gz", hash = "sha256:81de08ac11bcb85841e440c13611c00b67d3bf82698314928d0b676362546724", size = 7372270 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/5c/d0/5373ae13b93fe00095a58efcbce837fd470ca39f703a235d2a999baadfbc/Brotli-1.1.0-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:32d95b80260d79926f5fab3c41701dbb818fde1c9da590e77e571eefd14abe28", size = 815693 },
    { url = "https://files.pythonhosted.org/packages/8e/48/f6e1cdf86751300c288c1459724bfa6917a80e30dbfc326f92cea5d3683a/Brotli-1.1.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:b760c65308ff1e462f65d69c12e4ae085cff3b332d894637f6273a12a482d09f", size = 422489 },
    { url = "https://files.pythonhosted.org/packages/06/88/564958cedce636d0f1bed313381dfc4b4e3d3f6015a63dae6146e1b8c65c/Brotli-1.1.0-cp312-cp312-macosx_10_9_universal2.whl", hash = "sha256:316cc9b17edf613ac76b1f1f305d2a748f1b976b033b049a6ecdfd5612c70409", size = 873081 },
    { url = "https://files.pythonhosted.org/packages/58/79/b7026a8bb65da9a6bb7d14329fd2bd48d2b7f86d7329d5cc8ddc6a90526f/Brotli-1.1.0-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:caf9ee9a5775f3111642d33b86237b05808dafcd6268faa492250e9b78046eb2", size = 446244 },
    { url = "https://files.pythonhosted.org/packages/e5/18/c18c32ecea41b6c0004e15606e274006366fe19436b6adccc1ae7b2e50c2/Brotli-1.1.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:70051525001750221daa10907c77830bc889cb6d865cc0b813d9db7fefc21451", size = 2906505 },
    { url = "https://files.pythonhosted.org/packages/08/c8/69ec0496b1ada7569b62d85893d928e865df29b90736558d6c98c2031208/Brotli-1.1.0-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:7f4bf76817c14aa98cc6697ac02f3972cb8c3da93e9ef16b9c66573a68014f91", size = 2944152 },
    { url = "https://files.pythonhosted.org/packages/ab/fb/0517cea182219d6768113a38167ef6d4eb157a033178cc938033a552ed6d/Brotli-1.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d0c5516f0aed654134a2fc936325cc2e642f8a0e096d075209672eb321cff408", size = 2919252 },
    { url = "https://files.pythonhosted.org/packages/c7/53/73a3431662e33ae61a5c80b1b9d2d18f58dfa910ae8dd696e57d39f1a2f5/Brotli-1.1.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:6c3020404e0b5eefd7c9485ccf8393cfb75ec38ce75586e046573c9dc29967a0", size = 2845955 },
    { url = "https://files.pythonhosted.org/packages/55/ac/bd280708d9c5ebdbf9de01459e625a3e3803cce0784f47d633562cf40e83/Brotli-1.1.0-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:4ed11165dd45ce798d99a136808a794a748d5dc38511303239d4e2363c0695dc", size = 2914304 },
    { url = "https://files.pythonhosted.org/packages/76/58/5c391b41ecfc4527d2cc3350719b02e87cb424ef8ba2023fb662f9bf743c/Brotli-1.1.0-cp312-cp312-musllinux_1_1_i686.whl", hash = "sha256:4093c631e96fdd49e0377a9c167bfd75b6d0bad2ace734c6eb20b348bc3ea180", size = 2814452 },
    { url = "https://files.pythonhosted.org/packages/c7/4e/91b8256dfe99c407f174924b65a01f5305e303f486cc7a2e8a5d43c8bec3/Brotli-1.1.0-cp312-cp312-musllinux_1_1_ppc64le.whl", hash = "sha256:7e4c4629ddad63006efa0ef968c8e4751c5868ff0b1c5c40f76524e894c50248", size = 2938751 },
    { url = "https://files.pythonhosted.org/packages/5a/a6/e2a39a5d3b412938362bbbeba5af904092bf3f95b867b4a3eb856104074e/Brotli-1.1.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:861bf317735688269936f755fa136a99d1ed526883859f86e41a5d43c61d8966", size = 2933757 },
    { url = "https://files.pythonhosted.org/packages/13/f0/358354786280a509482e0e77c1a5459e439766597d280f28cb097642fc26/Brotli-1.1.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:87a3044c3a35055527ac75e419dfa9f4f3667a1e887ee80360589eb8c90aabb9", size = 2936146 },
    { url = "https://files.pythonhosted.org/packages/80/f7/daf538c1060d3a88266b80ecc1d1c98b79553b3f117a485653f17070ea2a/Brotli-1.1.0-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:c5529b34c1c9d937168297f2c1fde7ebe9ebdd5e121297ff9c043bdb2ae3d6fb", size = 2848055 },
    { url = "https://files.pythonhosted.org/packages/ad/cf/0eaa0585c4077d3c2d1edf322d8e97aabf317941d3a72d7b3ad8bce004b0/Brotli-1.1.0-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:ca63e1890ede90b2e4454f9a65135a4d387a4585ff8282bb72964fab893f2111", size = 3035102 },
    { url = "https://files.pythonhosted.org/packages/d8/63/1c1585b2aa554fe6dbce30f0c18bdbc877fa9a1bf5ff17677d9cca0ac122/Brotli-1.1.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:e79e6520141d792237c70bcd7a3b122d00f2613769ae0cb61c52e89fd3443839", size = 2930029 },
    { url = "https://files.pythonhosted.org/packages/5f/3b/4e3fd1893eb3bbfef8e5a80d4508bec17a57bb92d586c85c12d28666bb13/Brotli-1.1.0-cp312-cp312-win32.whl", hash = "sha256:5f4d5ea15c9382135076d2fb28dde923352fe02951e66935a9efaac8f10e81b0", size = 333276 },
    { url = "https://files.pythonhosted.org/packages/3d/d5/942051b45a9e883b5b6e98c041698b1eb2012d25e5948c58d6bf85b1bb43/Brotli-1.1.0-cp312-cp312-win_amd64.whl", hash = "sha256:906bc3a79de8c4ae5b86d3d75a8b77e44404b0f4261714306e3ad248d8ab0951", size = 357255 },
    { url = "https://files.pythonhosted.org/packages/0a/9f/fb37bb8ffc52a8da37b1c03c459a8cd55df7a57bdccd8831d500e994a0ca/Brotli-1.1.0-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:8bf32b98b75c13ec7cf774164172683d6e7891088f6316e54425fde1efc276d5", size = 815681 },
    { url = "https://files.pythonhosted.org/packages/06/b3/dbd332a988586fefb0aa49c779f59f47cae76855c2d00f450364bb574cac/Brotli-1.1.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:7bc37c4d6b87fb1017ea28c9508b36bbcb0c3d18b4260fcdf08b200c74a6aee8", size = 422475 },
    { url = "https://files.pythonhosted.org/packages/bb/80/6aaddc2f63dbcf2d93c2d204e49c11a9ec93a8c7c63261e2b4bd35198283/Brotli-1.1.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:3c0ef38c7a7014ffac184db9e04debe495d317cc9c6fb10071f7fefd93100a4f", size = 2906173 },
    { url = "https://files.pythonhosted.org/packages/ea/1d/e6ca79c96ff5b641df6097d299347507d39a9604bde8915e76bf026d6c77/Brotli-1.1.0-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:91d7cc2a76b5567591d12c01f019dd7afce6ba8cba6571187e21e2fc418ae648", size = 2943803 },
    { url = "https://files.pythonhosted.org/packages/ac/a3/d98d2472e0130b7dd3acdbb7f390d478123dbf62b7d32bda5c830a96116d/Brotli-1.1.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a93dde851926f4f2678e704fadeb39e16c35d8baebd5252c9fd94ce8ce68c4a0", size = 2918946 },
    { url = "https://files.pythonhosted.org/packages/c4/a5/c69e6d272aee3e1423ed005d8915a7eaa0384c7de503da987f2d224d0721/Brotli-1.1.0-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:f0db75f47be8b8abc8d9e31bc7aad0547ca26f24a54e6fd10231d623f183d089", size = 2845707 },
    { url = "https://files.pythonhosted.org/packages/58/9f/4149d38b52725afa39067350696c09526de0125ebfbaab5acc5af28b42ea/Brotli-1.1.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:6967ced6730aed543b8673008b5a391c3b1076d834ca438bbd70635c73775368", size = 2936231 },
    { url = "https://files.pythonhosted.org/packages/5a/5a/145de884285611838a16bebfdb060c231c52b8f84dfbe52b852a15780386/Brotli-1.1.0-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:7eedaa5d036d9336c95915035fb57422054014ebdeb6f3b42eac809928e40d0c", size = 2848157 },
    { url = "https://files.pythonhosted.org/packages/50/ae/408b6bfb8525dadebd3b3dd5b19d631da4f7d46420321db44cd99dcf2f2c/Brotli-1.1.0-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:d487f5432bf35b60ed625d7e1b448e2dc855422e87469e3f450aa5552b0eb284", size = 3035122 },
    { url = "https://files.pythonhosted.org/packages/af/85/a94e5cfaa0ca449d8f91c3d6f78313ebf919a0dbd55a100c711c6e9655bc/Brotli-1.1.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:832436e59afb93e1836081a20f324cb185836c617659b07b129141a8426973c7", size = 2930206 },
    { url = "https://files.pythonhosted.org/packages/c2/f0/a61d9262cd01351df22e57ad7c34f66794709acab13f34be2675f45bf89d/Brotli-1.1.0-cp313-cp313-win32.whl", hash = "sha256:43395e90523f9c23a3d5bdf004733246fba087f2948f87ab28015f12359ca6a0", size = 333804 },
    { url = "https://files.pythonhosted.org/packages/7e/c1/ec214e9c94000d1c1974ec67ced1c970c148aa6b8d8373066123fc3dbf06/Brotli-1.1.0-cp313-cp313-win_amd64.whl", hash = "sha256:9011560a466d2eb3f5a6e4929cf4a09be405c64154e12df0dd72713f6500e32b", size = 358517 },
 ]
 [[package]]
 name = "brotlicffi"
 version = "1.1.0.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "cffi" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/95/9d/70caa61192f570fcf0352766331b735afa931b4c6bc9a348a0925cc13288/brotlicffi-1.1.0.0.tar.gz", hash = "sha256:b77827a689905143f87915310b93b273ab17888fd43ef350d4832c4a71083c13", size = 465192 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/a2/11/7b96009d3dcc2c931e828ce1e157f03824a69fb728d06bfd7b2fc6f93718/brotlicffi-1.1.0.0-cp37-abi3-macosx_10_9_x86_64.whl", hash = "sha256:9b7ae6bd1a3f0df532b6d67ff674099a96d22bc0948955cb338488c31bfb8851", size = 453786 },
    { url = "https://files.pythonhosted.org/packages/d6/e6/a8f46f4a4ee7856fbd6ac0c6fb0dc65ed181ba46cd77875b8d9bbe494d9e/brotlicffi-1.1.0.0-cp37-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:19ffc919fa4fc6ace69286e0a23b3789b4219058313cf9b45625016bf7ff996b", size = 2911165 },
    { url = "https://files.pythonhosted.org/packages/be/20/201559dff14e83ba345a5ec03335607e47467b6633c210607e693aefac40/brotlicffi-1.1.0.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9feb210d932ffe7798ee62e6145d3a757eb6233aa9a4e7db78dd3690d7755814", size = 2927895 },
    { url = "https://files.pythonhosted.org/packages/cd/15/695b1409264143be3c933f708a3f81d53c4a1e1ebbc06f46331decbf6563/brotlicffi-1.1.0.0-cp37-abi3-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:84763dbdef5dd5c24b75597a77e1b30c66604725707565188ba54bab4f114820", size = 2851834 },
    { url = "https://files.pythonhosted.org/packages/b4/40/b961a702463b6005baf952794c2e9e0099bde657d0d7e007f923883b907f/brotlicffi-1.1.0.0-cp37-abi3-win32.whl", hash = "sha256:1b12b50e07c3911e1efa3a8971543e7648100713d4e0971b13631cce22c587eb", size = 341731 },
    { url = "https://files.pythonhosted.org/packages/1c/fa/5408a03c041114ceab628ce21766a4ea882aa6f6f0a800e04ee3a30ec6b9/brotlicffi-1.1.0.0-cp37-abi3-win_amd64.whl", hash = "sha256:994a4f0681bb6c6c3b0925530a1926b7a189d878e6e5e38fae8efa47c5d9c613", size = 366783 },
 ]
 [[package]]
 name = "cachetools"
 version = "5.5.2"
@ -545,19 +491,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/d2/05/5533d30f53f10239616a357f080892026db2d550a40c393d0a8a7af834a9/cryptography-44.0.1-cp39-abi3-win_amd64.whl", hash = "sha256:e403f7f766ded778ecdb790da786b418a9f2394f36e8cc8b796cc056ab05f44f", size = 3207303 },
 ]
 [[package]]
 name = "cssselect2"
 version = "0.8.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "tinycss2" },
    { name = "webencodings" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/9f/86/fd7f58fc498b3166f3a7e8e0cddb6e620fe1da35b02248b1bd59e95dbaaa/cssselect2-0.8.0.tar.gz", hash = "sha256:7674ffb954a3b46162392aee2a3a0aedb2e14ecf99fcc28644900f4e6e3e9d3a", size = 35716 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/0f/e7/aa315e6a749d9b96c2504a1ba0ba031ba2d0517e972ce22682e3fccecb09/cssselect2-0.8.0-py3-none-any.whl", hash = "sha256:46fc70ebc41ced7a32cd42d58b1884d72ade23d21e5a4eaaf022401c13f0e76e", size = 15454 },
 ]
 [[package]]
 name = "cycler"
 version = "0.12.1"
@ -619,12 +552,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/68/1b/e0a87d256e40e8c888847551b20a017a6b98139178505dc7ffb96f04e954/dnspython-2.7.0-py3-none-any.whl", hash = "sha256:b4c34b7d10b51bcc3a5071e7b8dee77939f1e878477eeecc965e9835f63c6c86", size = 313632 },
 ]
 [[package]]
 name = "docopt"
 version = "0.6.2"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/a2/55/8f8cab2afd404cf578136ef2cc5dfb50baa1761b68c9da1fb1e4eed343c9/docopt-0.6.2.tar.gz", hash = "sha256:49b3a825280bd66b3aa83585ef59c4a8c82f2c8a522dbe754a8bc8d08c85c491", size = 25901 }
 [[package]]
 name = "effdet"
 version = "0.4.1"
@ -733,18 +660,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/a6/08/9968963c1fb8c34627b7f1fbcdfe9438540f87dc7c9bfb59bb4fd19a4ecf/fastapi_users_db_sqlalchemy-7.0.0-py3-none-any.whl", hash = "sha256:5fceac018e7cfa69efc70834dd3035b3de7988eb4274154a0dbe8b14f5aa001e", size = 6891 },
 ]
 [[package]]
 name = "feedparser"
 version = "6.0.11"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "sgmllib3k" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/ff/aa/7af346ebeb42a76bf108027fe7f3328bb4e57a3a96e53e21fd9ef9dd6dd0/feedparser-6.0.11.tar.gz", hash = "sha256:c9d0407b64c6f2a065d0ebb292c2b35c01050cc0dc33757461aaabdc4c4184d5", size = 286197 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/7c/d4/8c31aad9cc18f451c49f7f9cfb5799dadffc88177f7917bc90a66459b1d7/feedparser-6.0.11-py3-none-any.whl", hash = "sha256:0be7ee7b395572b19ebeb1d6aafb0028dee11169f1c934e0ed67d54992f4ad45", size = 81343 },
 ]
 [[package]]
 name = "filelock"
 version = "3.17.0"
@ -829,13 +744,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/bf/ff/44934a031ce5a39125415eb405b9efb76fe7f9586b75291d66ae5cbfc4e6/fonttools-4.56.0-py3-none-any.whl", hash = "sha256:1088182f68c303b50ca4dc0c82d42083d176cba37af1937e1a976a31149d4d14", size = 1089800 },
 ]
 [package.optional-dependencies]
 woff = [
    { name = "brotli", marker = "platform_python_implementation == 'CPython'" },
    { name = "brotlicffi", marker = "platform_python_implementation != 'CPython'" },
    { name = "zopfli" },
 ]
 [[package]]
 name = "frozenlist"
 version = "1.5.0"
@ -884,6 +792,21 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/e2/94/758680531a00d06e471ef649e4ec2ed6bf185356a7f9fbfbb7368a40bd49/fsspec-2025.2.0-py3-none-any.whl", hash = "sha256:9de2ad9ce1f85e1931858535bc882543171d197001a0a5eb2ddc04f1781ab95b", size = 184484 },
 ]
 [[package]]
 name = "github3-py"
 version = "4.0.1"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "pyjwt", extra = ["crypto"] },
    { name = "python-dateutil" },
    { name = "requests" },
    { name = "uritemplate" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/89/91/603bcaf8cd1b3927de64bf56c3a8915f6653ea7281919140c5bcff2bfe7b/github3.py-4.0.1.tar.gz", hash = "sha256:30d571076753efc389edc7f9aaef338a4fcb24b54d8968d5f39b1342f45ddd36", size = 36214038 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/61/ad/2394d4fb542574678b0ba342daf734d4d811768da3c2ee0c84d509dcb26c/github3.py-4.0.1-py3-none-any.whl", hash = "sha256:a89af7de25650612d1da2f0609622bcdeb07ee8a45a1c06b2d16a05e4234e753", size = 151800 },
 ]
 [[package]]
 name = "google-api-core"
 version = "2.24.2"
@ -947,42 +870,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/f9/53/d35476d547a286506f0a6a634ccf1e5d288fffd53d48f0bd5fef61d68684/googleapis_common_protos-1.69.2-py3-none-any.whl", hash = "sha256:0b30452ff9c7a27d80bfc5718954063e8ab53dd3697093d3bc99581f5fd24212", size = 293215 },
 ]
 [[package]]
 name = "gpt-researcher"
 version = "0.12.12"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "aiofiles" },
    { name = "arxiv" },
    { name = "beautifulsoup4" },
    { name = "colorama" },
    { name = "htmldocx" },
    { name = "json-repair" },
    { name = "json5" },
    { name = "langchain" },
    { name = "langchain-community" },
    { name = "langchain-openai" },
    { name = "loguru" },
    { name = "lxml-html-clean" },
    { name = "markdown" },
    { name = "md2pdf" },
    { name = "mistune" },
    { name = "pydantic" },
    { name = "pymupdf" },
    { name = "python-docx" },
    { name = "python-dotenv" },
    { name = "python-multipart" },
    { name = "pyyaml" },
    { name = "requests" },
    { name = "tiktoken" },
    { name = "unstructured" },
    { name = "websockets" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/55/d2/3a1acfd8b71ead6ab9a6833bcd1725e468e65817e7b04c233cd2d5e0a629/gpt_researcher-0.12.12.tar.gz", hash = "sha256:e3fa6faae4a3dc7e4280521eceb94a081a0aae277eb1ded25152579f65195844", size = 123669 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/09/0b/80ade14566946ca2253d1718746b63059f5bb1a2e489804c87e24332082d/gpt_researcher-0.12.12-py3-none-any.whl", hash = "sha256:3db51994406844d8acb28cb2a4897b9e9aaa34f0127edd7f8ded103fe2e544d4", size = 161671 },
 ]
 [[package]]
 name = "greenlet"
 version = "3.1.1"
@ -1080,19 +967,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/6c/dd/a834df6482147d48e225a49515aabc28974ad5a4ca3215c18a882565b028/html5lib-1.1-py2.py3-none-any.whl", hash = "sha256:0d78f8fde1c230e99fe37986a60526d7049ed4bf8a9fadbad5f00e22e58e041d", size = 112173 },
 ]
 [[package]]
 name = "htmldocx"
 version = "0.0.6"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "beautifulsoup4" },
    { name = "python-docx" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/8b/61/91a6b70ee576a4b07310d81efd4c688fe2e6f63ea42ec95b8f1d436b887e/htmldocx-0.0.6.tar.gz", hash = "sha256:b4bcec895f86d7a50ffc7133ca24d85c24f3614db2b37d33a30d9d04654a5486", size = 9418 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/8f/da/c70fc2ce54c1d1ce7c16f9656589273a6c94cbbc8867b3a512618d977309/htmldocx-0.0.6-py3-none-any.whl", hash = "sha256:adf5e95ad8ba8121e606cf138c614de13327a1192a5782acdb4a0abdc23db1b7", size = 9490 },
 ]
 [[package]]
 name = "httpcore"
 version = "1.0.7"
@ -1271,24 +1145,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/91/29/df4b9b42f2be0b623cbd5e2140cafcaa2bef0759a00b7b70104dcfe2fb51/joblib-1.4.2-py3-none-any.whl", hash = "sha256:06d478d5674cbc267e7496a410ee875abd68e4340feff4490bcb7afb88060ae6", size = 301817 },
 ]
 [[package]]
 name = "json-repair"
 version = "0.39.1"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/95/60/6d1599bc01070d9fe3840d245ae80fd24b981c732d962842825ce7a9fde6/json_repair-0.39.1.tar.gz", hash = "sha256:e90a489f247e1a8fc86612a5c719872a3dbf9cbaffd6d55f238ec571a77740fa", size = 30040 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/ff/b9/2e445481555422b907dab468b53574bc1e995099ca1a1201d0d876ca05e9/json_repair-0.39.1-py3-none-any.whl", hash = "sha256:3001409a2f319249f13e13d6c622117a5b70ea7e0c6f43864a0233cdffc3a599", size = 20686 },
 ]
 [[package]]
 name = "json5"
 version = "0.10.0"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/85/3d/bbe62f3d0c05a689c711cff57b2e3ac3d3e526380adb7c781989f075115c/json5-0.10.0.tar.gz", hash = "sha256:e66941c8f0a02026943c52c2eb34ebeb2a6f819a0be05920a6f5243cd30fd559", size = 48202 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/aa/42/797895b952b682c3dafe23b1834507ee7f02f4d6299b65aaa61425763278/json5-0.10.0-py3-none-any.whl", hash = "sha256:19b23410220a7271e8377f81ba8aacba2fdd56947fbb137ee5977cbe1f5e8dfa", size = 34049 },
 ]
 [[package]]
 name = "jsonpatch"
 version = "1.33"
@ -1450,20 +1306,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/20/0e/ddf9f5dc46b178df5c101666bb3bc7fc526d68cd81cdd60cbe1b6b438b30/langchain_core-0.3.43-py3-none-any.whl", hash = "sha256:caa6bc1f4c6ab71d3c2e400f8b62e1cd6dc5ac2c37e03f12f3e2c60befd5b273", size = 415421 },
 ]
 [[package]]
 name = "langchain-openai"
 version = "0.3.8"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "langchain-core" },
    { name = "openai" },
    { name = "tiktoken" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/2e/04/ae071af0b04d1c3a8040498714091afd21149f6f8ae1dbab584317d9dfd7/langchain_openai-0.3.8.tar.gz", hash = "sha256:4d73727eda8102d1d07a2ca036278fccab0bb5e0abf353cec9c3973eb72550ec", size = 256898 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/a5/43/9c6a1101bcd751d52a3328a06956f85122f9aaa31da1b15a8e0f99a70317/langchain_openai-0.3.8-py3-none-any.whl", hash = "sha256:9004dc8ef853aece0d8f0feca7753dc97f710fa3e53874c8db66466520436dbb", size = 55446 },
 ]
 [[package]]
 name = "langchain-text-splitters"
 version = "0.3.6"
@ -1499,6 +1341,61 @@ dependencies = [
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/0e/72/a3add0e4eec4eb9e2569554f7c70f4a3c27712f40e3284d483e88094cc0e/langdetect-1.0.9.tar.gz", hash = "sha256:cbc1fef89f8d062739774bd51eda3da3274006b3661d199c2655f6b3f6d605a0", size = 981474 }
 [[package]]
 name = "langgraph"
 version = "0.3.29"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "langchain-core" },
    { name = "langgraph-checkpoint" },
    { name = "langgraph-prebuilt" },
    { name = "langgraph-sdk" },
    { name = "xxhash" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/26/00/6a38988d472835845ee6837402dc6050e012117b84ef2b838b7abd3268f1/langgraph-0.3.29.tar.gz", hash = "sha256:2bfa6d6b04541ddfcb03b56efd1fca6294a1700ff61a52c1582a8bb4f2d55a94", size = 119970 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/66/b4/89d81ed78efeec5b3d554a9244cdc6aa6cbf544da9c53738d7c2c6d4be57/langgraph-0.3.29-py3-none-any.whl", hash = "sha256:6045fbbe9ccc5af3fd7295a86f88e0d2b111243a36290e41248af379009e4cc1", size = 144692 },
 ]
 [[package]]
 name = "langgraph-checkpoint"
 version = "2.0.24"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "langchain-core" },
    { name = "ormsgpack" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/0d/df/bacef68562ba4c391ded751eecda8e579ec78a581506064cf625e0ebd93a/langgraph_checkpoint-2.0.24.tar.gz", hash = "sha256:9596dad332344e7e871257be464df8a07c2e9bac66143081b11b9422b0167e5b", size = 37328 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/bc/60/30397e8fd2b7dead3754aa79d708caff9dbb371f30b4cd21802c60f6b921/langgraph_checkpoint-2.0.24-py3-none-any.whl", hash = "sha256:3836e2909ef2387d1fa8d04ee3e2a353f980d519fd6c649af352676dc73d66b8", size = 42028 },
 ]
 [[package]]
 name = "langgraph-prebuilt"
 version = "0.1.8"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "langchain-core" },
    { name = "langgraph-checkpoint" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/57/30/f31f0e076c37d097b53e4cff5d479a3686e1991f6c86a1a4727d5d1f5489/langgraph_prebuilt-0.1.8.tar.gz", hash = "sha256:4de7659151829b2b955b6798df6800e580e617782c15c2c5b29b139697491831", size = 24543 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/36/72/9e092665502f8f52f2708065ed14fbbba3f95d1a1b65d62049b0c5fcdf00/langgraph_prebuilt-0.1.8-py3-none-any.whl", hash = "sha256:ae97b828ae00be2cefec503423aa782e1bff165e9b94592e224da132f2526968", size = 25903 },
 ]
 [[package]]
 name = "langgraph-sdk"
 version = "0.1.61"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "httpx" },
    { name = "orjson" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/f0/c6/a11de2c770e1ac2774e2f19fdbd982b8df079e4206376456e14af395a3f0/langgraph_sdk-0.1.61.tar.gz", hash = "sha256:87dd1f07ab82da8875ac343268ece8bf5414632017ebc9d1cef4b523962fd601", size = 44136 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/fb/2b/85e796d8b4aad892c5d2bccc0def124fcdc2c9852dfa121adadfc41085b2/langgraph_sdk-0.1.61-py3-none-any.whl", hash = "sha256:f2d774b12497c428862993090622d51e0dbc3f53e0cee3d74a13c7495d835cc6", size = 47249 },
 ]
 [[package]]
 name = "langsmith"
 version = "0.3.8"
@ -1538,19 +1435,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/f9/c2/1b6c502909b7af9054736af61e27558a3341e8c1ba28e7f82473e6dd936f/litellm-1.61.4-py3-none-any.whl", hash = "sha256:e87e0d397a191795b4217f9299fc9b21eaacaab91409695f0a4780cceccda6e1", size = 6814517 },
 ]
 [[package]]
 name = "loguru"
 version = "0.7.3"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "colorama", marker = "sys_platform == 'win32'" },
    { name = "win32-setctime", marker = "sys_platform == 'win32'" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/3a/05/a1dae3dffd1116099471c643b8924f5aa6524411dc6c63fdae648c4f1aca/loguru-0.7.3.tar.gz", hash = "sha256:19480589e77d47b8d85b2c827ad95d49bf31b0dcde16593892eb51dd18706eb6", size = 63559 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/0c/29/0348de65b8cc732daa3e33e67806420b2ae89bdce2b04af740289c5c6c8c/loguru-0.7.3-py3-none-any.whl", hash = "sha256:31a33c10c8e1e10422bfd431aeb5d351c7cf7fa671e3c4df004162264b28220c", size = 61595 },
 ]
 [[package]]
 name = "lxml"
 version = "5.3.1"
@ -1593,18 +1477,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/80/83/8c54533b3576f4391eebea88454738978669a6cad0d8e23266224007939d/lxml-5.3.1-cp313-cp313-win_amd64.whl", hash = "sha256:91fb6a43d72b4f8863d21f347a9163eecbf36e76e2f51068d59cd004c506f332", size = 3814484 },
 ]
 [[package]]
 name = "lxml-html-clean"
 version = "0.4.1"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "lxml" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/81/f2/fe319e3c5cb505a361b95d1e0d0d793fe28d4dcc2fc39d3cae9324dc4233/lxml_html_clean-0.4.1.tar.gz", hash = "sha256:40c838bbcf1fc72ba4ce811fbb3135913017b27820d7c16e8bc412ae1d8bc00b", size = 21378 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/f7/ba/2af7a60b45bf21375e111c1e2d5d721108d06c80e3d9a3cc1d767afe1731/lxml_html_clean-0.4.1-py3-none-any.whl", hash = "sha256:b704f2757e61d793b1c08bf5ad69e4c0b68d6696f4c3c1429982caf90050bcaf", size = 14114 },
 ]
 [[package]]
 name = "makefun"
 version = "1.15.6"
@ -1614,6 +1486,18 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/89/a1/3e145759e776c8866488a71270c399bf7c4e554551ac2e247aa0a18a0596/makefun-1.15.6-py2.py3-none-any.whl", hash = "sha256:e69b870f0bb60304765b1e3db576aaecf2f9b3e5105afe8cfeff8f2afe6ad067", size = 22946 },
 ]
 [[package]]
 name = "mako"
 version = "1.3.10"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "markupsafe" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/9e/38/bd5b78a920a64d708fe6bc8e0a2c075e1389d53bef8413725c63ba041535/mako-1.3.10.tar.gz", hash = "sha256:99579a6f39583fa7e5630a28c3c1f440e4e97a414b80372649c0ce338da2ea28", size = 392474 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/87/fb/99f81ac72ae23375f22b7afdb7642aba97c00a713c217124420147681a2f/mako-1.3.10-py3-none-any.whl", hash = "sha256:baef24a52fc4fc514a0887ac600f9f1cff3d82c61d4d700a1fa84d597b88db59", size = 78509 },
 ]
 [[package]]
 name = "markdown"
 version = "3.7"
@ -1635,15 +1519,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/42/d7/1ec15b46af6af88f19b8e5ffea08fa375d433c998b8a7639e76935c14f1f/markdown_it_py-3.0.0-py3-none-any.whl", hash = "sha256:355216845c60bd96232cd8d8c40e8f9765cc86f46880e43a8fd22dc1a1a8cab1", size = 87528 },
 ]
 [[package]]
 name = "markdown2"
 version = "2.5.3"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/44/52/d7dcc6284d59edb8301b8400435fbb4926a9b0f13a12b5cbaf3a4a54bb7b/markdown2-2.5.3.tar.gz", hash = "sha256:4d502953a4633408b0ab3ec503c5d6984d1b14307e32b325ec7d16ea57524895", size = 141676 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/84/37/0a13c83ccf5365b8e08ea572dfbc04b8cb87cadd359b2451a567f5248878/markdown2-2.5.3-py3-none-any.whl", hash = "sha256:a8ebb7e84b8519c37bf7382b3db600f1798a22c245bfd754a1f87ca8d7ea63b3", size = 48550 },
 ]
 [[package]]
 name = "markdownify"
 version = "0.14.1"
@ -1744,17 +1619,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/ac/c2/0d5aae823bdcc42cc99327ecdd4d28585e15ccd5218c453b7bcd827f3421/matplotlib-3.10.1-cp313-cp313t-win_amd64.whl", hash = "sha256:bc411ebd5889a78dabbc457b3fa153203e22248bfa6eedc6797be5df0164dbf9", size = 8134832 },
 ]
 [[package]]
 name = "md2pdf"
 version = "1.0.1"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "docopt" },
    { name = "markdown2" },
    { name = "weasyprint" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/de/b0/adbef5356f97a6d33c7811805b06e3774c7a58ea70dc28039ae4ad1ba1be/md2pdf-1.0.1.tar.gz", hash = "sha256:3d5aab77dcd5b6f5827b193819ab1a8c1cec506ce5f6c777c3411b703352cd98", size = 6377 }
 [[package]]
 name = "mdurl"
 version = "0.1.2"
@ -1764,15 +1628,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/b3/38/89ba8ad64ae25be8de66a6d463314cf1eb366222074cfda9ee839c56a4b4/mdurl-0.1.2-py3-none-any.whl", hash = "sha256:84008a41e51615a49fc9966191ff91509e3c40b939176e643fd50a5c2196b8f8", size = 9979 },
 ]
 [[package]]
 name = "mistune"
 version = "3.1.2"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/80/f7/f6d06304c61c2a73213c0a4815280f70d985429cda26272f490e42119c1a/mistune-3.1.2.tar.gz", hash = "sha256:733bf018ba007e8b5f2d3a9eb624034f6ee26c4ea769a98ec533ee111d504dff", size = 94613 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/12/92/30b4e54c4d7c48c06db61595cffbbf4f19588ea177896f9b78f0fbe021fd/mistune-3.1.2-py3-none-any.whl", hash = "sha256:4b47731332315cdca99e0ded46fc0004001c1299ff773dfb48fbe1fd226de319", size = 53696 },
 ]
 [[package]]
 name = "model2vec"
 version = "0.4.0"
@ -2169,6 +2024,30 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/27/f1/1d7ec15b20f8ce9300bc850de1e059132b88990e46cd0ccac29cbf11e4f9/orjson-3.10.15-cp313-cp313-win_amd64.whl", hash = "sha256:fd56a26a04f6ba5fb2045b0acc487a63162a958ed837648c5781e1fe3316cfbf", size = 133444 },
 ]
 [[package]]
 name = "ormsgpack"
 version = "1.9.1"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/25/a7/462cf8ff5e29241868b82d3a5ec124d690eb6a6a5c6fa5bb1367b839e027/ormsgpack-1.9.1.tar.gz", hash = "sha256:3da6e63d82565e590b98178545e64f0f8506137b92bd31a2d04fd7c82baf5794", size = 56887 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/dd/f1/155a598cc8030526ccaaf91ba4d61530f87900645559487edba58b0a90a2/ormsgpack-1.9.1-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:1ede445fc3fdba219bb0e0d1f289df26a9c7602016b7daac6fafe8fe4e91548f", size = 383225 },
    { url = "https://files.pythonhosted.org/packages/23/1c/ef3097ba550fad55c79525f461febdd4e0d9cc18d065248044536f09488e/ormsgpack-1.9.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:db50b9f918e25b289114312ed775794d0978b469831b992bdc65bfe20b91fe30", size = 214056 },
    { url = "https://files.pythonhosted.org/packages/27/77/64d0da25896b2cbb99505ca518c109d7dd1964d7fde14c10943731738b60/ormsgpack-1.9.1-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:8c7d8fc58e4333308f58ec720b1ee6b12b2b3fe2d2d8f0766ab751cb351e8757", size = 217339 },
    { url = "https://files.pythonhosted.org/packages/6c/10/c3a7fd0a0068b0bb52cccbfeb5656db895d69e895a3abbc210c4b3f98ff8/ormsgpack-1.9.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:aeee6d08c040db265cb8563444aba343ecb32cbdbe2414a489dcead9f70c6765", size = 223816 },
    { url = "https://files.pythonhosted.org/packages/43/e7/aee1238dba652f2116c2523d36fd1c5f9775436032be5c233108fd2a1415/ormsgpack-1.9.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:2fbb8181c198bdc413a4e889e5200f010724eea4b6d5a9a7eee2df039ac04aca", size = 394287 },
    { url = "https://files.pythonhosted.org/packages/c7/09/1b452a92376f29d7a2da7c18fb01cf09978197a8eccbb8b204e72fd5a970/ormsgpack-1.9.1-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:16488f094ac0e2250cceea6caf72962614aa432ee11dd57ef45e1ad25ece3eff", size = 480709 },
    { url = "https://files.pythonhosted.org/packages/de/13/7fa9fee5a73af8a73a42bf8c2e69489605714f65f5a41454400a05e84a3b/ormsgpack-1.9.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:422d960bfd6ad88be20794f50ec7953d8f7a0f2df60e19d0e8feb994e2ed64ee", size = 397247 },
    { url = "https://files.pythonhosted.org/packages/a1/2d/2e87cb28110db0d3bb750edd4d8719b5068852a2eef5e96b0bf376bb8a81/ormsgpack-1.9.1-cp312-cp312-win_amd64.whl", hash = "sha256:e6e2f9eab527cf43fb4a4293e493370276b1c8716cf305689202d646c6a782ef", size = 125368 },
    { url = "https://files.pythonhosted.org/packages/b8/54/0390d5d092831e4df29dbafe32402891fc14b3e6ffe5a644b16cbbc9d9bc/ormsgpack-1.9.1-cp313-cp313-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl", hash = "sha256:ac61c18d9dd085e8519b949f7e655f7fb07909fd09c53b4338dd33309012e289", size = 383226 },
    { url = "https://files.pythonhosted.org/packages/47/64/8b15d262d1caefead8fb22ec144f5ff7d9505fc31c22bc34598053d46fbe/ormsgpack-1.9.1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:134840b8c6615da2c24ce77bd12a46098015c808197a9995c7a2d991e1904eec", size = 214057 },
    { url = "https://files.pythonhosted.org/packages/57/00/65823609266bad4d5ed29ea753d24a3bdb01c7edaf923da80967fc31f9c5/ormsgpack-1.9.1-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:38fd42618f626394b2c7713c5d4bcbc917254e9753d5d4cde460658b51b11a74", size = 217340 },
    { url = "https://files.pythonhosted.org/packages/a0/51/e535c50f7f87b49110233647f55300d7975139ef5e51f1adb4c55f58c124/ormsgpack-1.9.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9d36397333ad07b9eba4c2e271fa78951bd81afc059c85a6e9f6c0eb2de07cda", size = 223815 },
    { url = "https://files.pythonhosted.org/packages/0c/ee/393e4a6de2a62124bf589602648f295a9fb3907a0e2fe80061b88899d072/ormsgpack-1.9.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:603063089597917d04e4c1b1d53988a34f7dc2ff1a03adcfd1cf4ae966d5fba6", size = 394287 },
    { url = "https://files.pythonhosted.org/packages/c6/d8/e56d7c3cb73a0e533e3e2a21ae5838b2aa36a9dac1ca9c861af6bae5a369/ormsgpack-1.9.1-cp313-cp313-musllinux_1_2_armv7l.whl", hash = "sha256:94bbf2b185e0cb721ceaba20e64b7158e6caf0cecd140ca29b9f05a8d5e91e2f", size = 480707 },
    { url = "https://files.pythonhosted.org/packages/e6/e0/6a3c6a6dc98583a721c54b02f5195bde8f801aebdeda9b601fa2ab30ad39/ormsgpack-1.9.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:c38f380b1e8c96a712eb302b9349347385161a8e29046868ae2bfdfcb23e2692", size = 397246 },
    { url = "https://files.pythonhosted.org/packages/b0/60/0ee5d790f13507e1f75ac21fc82dc1ef29afe1f520bd0f249d65b2f4839b/ormsgpack-1.9.1-cp313-cp313-win_amd64.whl", hash = "sha256:a4bc63fb30db94075611cedbbc3d261dd17cf2aa8ff75a0fd684cd45ca29cb1b", size = 125371 },
 ]
 [[package]]
 name = "packaging"
 version = "24.2"
@ -2572,15 +2451,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/b4/46/93416fdae86d40879714f72956ac14df9c7b76f7d41a4d68aa9f71a0028b/pydantic_settings-2.7.1-py3-none-any.whl", hash = "sha256:590be9e6e24d06db33a4262829edef682500ef008565a969c73d39d5f8bfb3fd", size = 29718 },
 ]
 [[package]]
 name = "pydyf"
 version = "0.11.0"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/2e/c2/97fc6ce4ce0045080dc99446def812081b57750ed8aa67bfdfafa4561fe5/pydyf-0.11.0.tar.gz", hash = "sha256:394dddf619cca9d0c55715e3c55ea121a9bf9cbc780cdc1201a2427917b86b64", size = 17769 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/c9/ac/d5db977deaf28c6ecbc61bbca269eb3e8f0b3a1f55c8549e5333e606e005/pydyf-0.11.0-py3-none-any.whl", hash = "sha256:0aaf9e2ebbe786ec7a78ec3fbffa4cdcecde53fd6f563221d53c6bc1328848a3", size = 8104 },
 ]
 [[package]]
 name = "pyee"
 version = "12.1.1"
@ -2616,21 +2486,6 @@ crypto = [
    { name = "cryptography" },
 ]
 [[package]]
 name = "pymupdf"
 version = "1.25.3"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/06/47/b61c1c44b87cbdaeecdec3f43ce524ed6b3c72172bc6184eb82c94fbc43d/pymupdf-1.25.3.tar.gz", hash = "sha256:b640187c64c5ac5d97505a92e836da299da79c2f689f3f94a67a37a493492193", size = 67259841 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/61/9b/98ef4b98309e9db3baa9fe572f0e61b6130bb9852d13189970f35b703499/pymupdf-1.25.3-cp39-abi3-macosx_10_9_x86_64.whl", hash = "sha256:96878e1b748f9c2011aecb2028c5f96b5a347a9a91169130ad0133053d97915e", size = 19343576 },
    { url = "https://files.pythonhosted.org/packages/14/62/4e12126db174c8cfbf692281cda971cc4046c5f5226032c2cfaa6f83e08d/pymupdf-1.25.3-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:6ef753005b72ebfd23470f72f7e30f61e21b0b5e748045ec5b8f89e6e3068d62", size = 18580114 },
    { url = "https://files.pythonhosted.org/packages/ec/c5/cf7ecf005e4f8ba3664d6aaa0613adeba4c2ab524832c452c69857e7184f/pymupdf-1.25.3-cp39-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:cbff443d899f37b17f1e67563cc03673d50b4bf33ccc237e73d34f18f3a07ccf", size = 19442580 },
    { url = "https://files.pythonhosted.org/packages/52/de/bd1418e31f73d37b8381cd5deacfd681e6be702b8890e123e83724569ee1/pymupdf-1.25.3-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:46d90c4f9e62d1856e8db4b9f04a202ff4a7f086a816af73abdc86adb7f5e25a", size = 19999825 },
    { url = "https://files.pythonhosted.org/packages/42/ee/3c449b0de061440ba1ac984aa845315e9e2dca0ff2003c5adfc6febff203/pymupdf-1.25.3-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:a5de51efdbe4d486b6c1111c84e8a231cbfb426f3d6ff31ab530ad70e6f39756", size = 21123157 },
    { url = "https://files.pythonhosted.org/packages/83/53/71faaaf91c56f2883b13f3dd849bf2697f012eb35eb7b952d62734cff41f/pymupdf-1.25.3-cp39-abi3-win32.whl", hash = "sha256:bca72e6089f985d800596e22973f79cc08af6cbff1d93e5bda9248326a03857c", size = 15094211 },
    { url = "https://files.pythonhosted.org/packages/09/e0/d72e88a1d5e23aa381fd463057dc3d0fb29090e1e7308a870c334716579c/pymupdf-1.25.3-cp39-abi3-win_amd64.whl", hash = "sha256:4fb357438c9129fbf939b5af85323434df64e36759c399c376b62ad6da95498c", size = 16542949 },
 ]
 [[package]]
 name = "pypandoc"
 version = "1.15"
@ -2678,15 +2533,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/e1/6b/2706497c86e8d69fb76afe5ea857fe1794621aa0f3b1d863feb953fe0f22/pypdfium2-4.30.1-py3-none-win_arm64.whl", hash = "sha256:c2b6d63f6d425d9416c08d2511822b54b8e3ac38e639fc41164b1d75584b3a8c", size = 2814810 },
 ]
 [[package]]
 name = "pyphen"
 version = "0.17.2"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/69/56/e4d7e1bd70d997713649c5ce530b2d15a5fc2245a74ca820fc2d51d89d4d/pyphen-0.17.2.tar.gz", hash = "sha256:f60647a9c9b30ec6c59910097af82bc5dd2d36576b918e44148d8b07ef3b4aa3", size = 2079470 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/7b/1f/c2142d2edf833a90728e5cdeb10bdbdc094dde8dbac078cee0cf33f5e11b/pyphen-0.17.2-py3-none-any.whl", hash = "sha256:3a07fb017cb2341e1d9ff31b8634efb1ae4dc4b130468c7c39dd3d32e7c3affd", size = 2079358 },
 ]
 [[package]]
 name = "pyreadline3"
 version = "3.5.4"
@ -3135,12 +2981,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/69/8a/b9dc7678803429e4a3bc9ba462fa3dd9066824d3c607490235c6a796be5a/setuptools-75.8.0-py3-none-any.whl", hash = "sha256:e3982f444617239225d675215d51f6ba05f845d4eec313da4418fdbb56fb27e3", size = 1228782 },
 ]
 [[package]]
 name = "sgmllib3k"
 version = "1.0.0"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/9e/bd/3704a8c3e0942d711c1299ebf7b9091930adae6675d7c8f476a7ce48653c/sgmllib3k-1.0.0.tar.gz", hash = "sha256:7868fb1c8bfa764c1ac563d3cf369c381d1325d36124933a726f29fcdaa812e9", size = 5750 }
 [[package]]
 name = "six"
 version = "1.17.0"
@ -3225,17 +3065,19 @@ wheels = [
 [[package]]
 name = "surf-new-backend"
-version = "0.1.0"
+version = "0.0.6"
 source = { virtual = "." }
 dependencies = [
    { name = "alembic" },
    { name = "asyncpg" },
    { name = "chonkie", extra = ["all"] },
    { name = "fastapi" },
    { name = "fastapi-users", extra = ["oauth", "sqlalchemy"] },
    { name = "firecrawl-py" },
-    { name = "gpt-researcher" },
+    { name = "github3-py" },
    { name = "langchain-community" },
    { name = "langchain-unstructured" },
    { name = "langgraph" },
    { name = "litellm" },
    { name = "markdownify" },
    { name = "notion-client" },
@ -3254,14 +3096,16 @@ dependencies = [
 [package.metadata]
 requires-dist = [
    { name = "alembic", specifier = ">=1.13.0" },
    { name = "asyncpg", specifier = ">=0.30.0" },
    { name = "chonkie", extras = ["all"], specifier = ">=0.4.1" },
    { name = "fastapi", specifier = ">=0.115.8" },
    { name = "fastapi-users", extras = ["oauth", "sqlalchemy"], specifier = ">=14.0.1" },
    { name = "firecrawl-py", specifier = ">=1.12.0" },
-    { name = "gpt-researcher", specifier = ">=0.12.12" },
+    { name = "github3-py", specifier = "==4.0.1" },
    { name = "langchain-community", specifier = ">=0.3.17" },
    { name = "langchain-unstructured", specifier = ">=0.1.6" },
    { name = "langgraph", specifier = ">=0.3.29" },
    { name = "litellm", specifier = ">=1.61.4" },
    { name = "markdownify", specifier = ">=0.14.1" },
    { name = "notion-client", specifier = ">=2.3.0" },
@ -3362,30 +3206,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/6c/d0/179abca8b984b3deefd996f362b612c39da73b60f685921e6cd58b6125b4/timm-1.0.15-py3-none-any.whl", hash = "sha256:5a3dc460c24e322ecc7fd1f3e3eb112423ddee320cb059cc1956fbc9731748ef", size = 2361373 },
 ]
 [[package]]
 name = "tinycss2"
 version = "1.4.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "webencodings" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/7a/fd/7a5ee21fd08ff70d3d33a5781c255cbe779659bd03278feb98b19ee550f4/tinycss2-1.4.0.tar.gz", hash = "sha256:10c0972f6fc0fbee87c3edb76549357415e94548c1ae10ebccdea16fb404a9b7", size = 87085 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/e6/34/ebdc18bae6aa14fbee1a08b63c015c72b64868ff7dae68808ab500c492e2/tinycss2-1.4.0-py3-none-any.whl", hash = "sha256:3a49cf47b7675da0b15d0c6e1df8df4ebd96e9394bb905a5775adb0d884c5289", size = 26610 },
 ]
 [[package]]
 name = "tinyhtml5"
 version = "2.0.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "webencodings" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/fd/03/6111ed99e9bf7dfa1c30baeef0e0fb7e0bd387bd07f8e5b270776fe1de3f/tinyhtml5-2.0.0.tar.gz", hash = "sha256:086f998833da24c300c414d9fe81d9b368fd04cb9d2596a008421cbc705fcfcc", size = 179507 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/5c/de/27c57899297163a4a84104d5cec0af3b1ac5faf62f44667e506373c6b8ce/tinyhtml5-2.0.0-py3-none-any.whl", hash = "sha256:13683277c5b176d070f82d099d977194b7a1e26815b016114f581a74bbfbf47e", size = 39793 },
 ]
 [[package]]
 name = "tokenizers"
 version = "0.21.0"
@ -3658,6 +3478,15 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/10/6d/adb955ecf60811a3735d508974bbb5358e7745b635dc001329267529c6f2/unstructured.pytesseract-0.3.15-py3-none-any.whl", hash = "sha256:a3f505c5efb7ff9f10379051a7dd6aa624b3be6b0f023ed6767cc80d0b1613d1", size = 14992 },
 ]
 [[package]]
 name = "uritemplate"
 version = "4.1.1"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/d2/5a/4742fdba39cd02a56226815abfa72fe0aa81c33bed16ed045647d6000eba/uritemplate-4.1.1.tar.gz", hash = "sha256:4346edfc5c3b79f694bccd6d6099a322bbeb628dbf2cd86eea55a456ce5124f0", size = 273898 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/81/c0/7461b49cd25aeece13766f02ee576d1db528f1c37ce69aee300e075b485b/uritemplate-4.1.1-py2.py3-none-any.whl", hash = "sha256:830c08b8d99bdd312ea4ead05994a38e8936266f84b9a7878232db50b044e02e", size = 10356 },
 ]
 [[package]]
 name = "urllib3"
 version = "2.3.0"
@ -3756,25 +3585,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/f0/e5/96b8e55271685ddbadc50ce8bc53aa2dff278fb7ac4c2e473df890def2dc/watchfiles-1.0.4-cp313-cp313-win_amd64.whl", hash = "sha256:d6097538b0ae5c1b88c3b55afa245a66793a8fec7ada6755322e465fb1a0e8cc", size = 285216 },
 ]
 [[package]]
 name = "weasyprint"
 version = "64.1"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "cffi" },
    { name = "cssselect2" },
    { name = "fonttools", extra = ["woff"] },
    { name = "pillow" },
    { name = "pydyf" },
    { name = "pyphen" },
    { name = "tinycss2" },
    { name = "tinyhtml5" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/6c/a0/f6b3ef688e747488b17b3b39d27fe7438d3ec88d1b79d5524485a5458020/weasyprint-64.1.tar.gz", hash = "sha256:28b02f2c6409bafce1b1220d9d76a7345875bd3bd08c4f6dfbf510bb92a94757", size = 498647 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/42/95/bf333fbbaf73c1c211b6b801b9ac2563db8e2225f69902d1ba8b25c70e9c/weasyprint-64.1-py3-none-any.whl", hash = "sha256:f7c88ea8ce0ce0c527cbb9c802689e035fae50016d7efc5dfdaba4b75abf68f4", size = 302025 },
 ]
 [[package]]
 name = "webencodings"
 version = "0.5.1"
@ -3815,15 +3625,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/7b/c8/d529f8a32ce40d98309f4470780631e971a5a842b60aec864833b3615786/websockets-14.2-py3-none-any.whl", hash = "sha256:7a6ceec4ea84469f15cf15807a747e9efe57e369c384fa86e022b3bea679b79b", size = 157416 },
 ]
 [[package]]
 name = "win32-setctime"
 version = "1.2.0"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/b3/8f/705086c9d734d3b663af0e9bb3d4de6578d08f46b1b101c2442fd9aecaa2/win32_setctime-1.2.0.tar.gz", hash = "sha256:ae1fdf948f5640aae05c511ade119313fb6a30d7eabe25fef9764dca5873c4c0", size = 4867 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/e1/07/c6fe3ad3e685340704d314d765b7912993bcb8dc198f0e7a89382d37974b/win32_setctime-1.2.0-py3-none-any.whl", hash = "sha256:95d644c4e708aba81dc3704a116d8cbc974d70b3bdb8be1d150e36be6e9d1390", size = 4083 },
 ]
 [[package]]
 name = "wrapt"
 version = "1.17.2"
@ -3884,6 +3685,44 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/9b/07/df054f7413bdfff5e98f75056e4ed0977d0c8716424011fac2587864d1d3/XlsxWriter-3.2.2-py3-none-any.whl", hash = "sha256:272ce861e7fa5e82a4a6ebc24511f2cb952fde3461f6c6e1a1e81d3272db1471", size = 165121 },
 ]
 [[package]]
 name = "xxhash"
 version = "3.5.0"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/00/5e/d6e5258d69df8b4ed8c83b6664f2b47d30d2dec551a29ad72a6c69eafd31/xxhash-3.5.0.tar.gz", hash = "sha256:84f2caddf951c9cbf8dc2e22a89d4ccf5d86391ac6418fe81e3c67d0cf60b45f", size = 84241 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/07/0e/1bfce2502c57d7e2e787600b31c83535af83746885aa1a5f153d8c8059d6/xxhash-3.5.0-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:14470ace8bd3b5d51318782cd94e6f94431974f16cb3b8dc15d52f3b69df8e00", size = 31969 },
    { url = "https://files.pythonhosted.org/packages/3f/d6/8ca450d6fe5b71ce521b4e5db69622383d039e2b253e9b2f24f93265b52c/xxhash-3.5.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:59aa1203de1cb96dbeab595ded0ad0c0056bb2245ae11fac11c0ceea861382b9", size = 30787 },
    { url = "https://files.pythonhosted.org/packages/5b/84/de7c89bc6ef63d750159086a6ada6416cc4349eab23f76ab870407178b93/xxhash-3.5.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:08424f6648526076e28fae6ea2806c0a7d504b9ef05ae61d196d571e5c879c84", size = 220959 },
    { url = "https://files.pythonhosted.org/packages/fe/86/51258d3e8a8545ff26468c977101964c14d56a8a37f5835bc0082426c672/xxhash-3.5.0-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:61a1ff00674879725b194695e17f23d3248998b843eb5e933007ca743310f793", size = 200006 },
    { url = "https://files.pythonhosted.org/packages/02/0a/96973bd325412feccf23cf3680fd2246aebf4b789122f938d5557c54a6b2/xxhash-3.5.0-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:f2f2c61bee5844d41c3eb015ac652a0229e901074951ae48581d58bfb2ba01be", size = 428326 },
    { url = "https://files.pythonhosted.org/packages/11/a7/81dba5010f7e733de88af9555725146fc133be97ce36533867f4c7e75066/xxhash-3.5.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9d32a592cac88d18cc09a89172e1c32d7f2a6e516c3dfde1b9adb90ab5df54a6", size = 194380 },
    { url = "https://files.pythonhosted.org/packages/fb/7d/f29006ab398a173f4501c0e4977ba288f1c621d878ec217b4ff516810c04/xxhash-3.5.0-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:70dabf941dede727cca579e8c205e61121afc9b28516752fd65724be1355cc90", size = 207934 },
    { url = "https://files.pythonhosted.org/packages/8a/6e/6e88b8f24612510e73d4d70d9b0c7dff62a2e78451b9f0d042a5462c8d03/xxhash-3.5.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:e5d0ddaca65ecca9c10dcf01730165fd858533d0be84c75c327487c37a906a27", size = 216301 },
    { url = "https://files.pythonhosted.org/packages/af/51/7862f4fa4b75a25c3b4163c8a873f070532fe5f2d3f9b3fc869c8337a398/xxhash-3.5.0-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:3e5b5e16c5a480fe5f59f56c30abdeba09ffd75da8d13f6b9b6fd224d0b4d0a2", size = 203351 },
    { url = "https://files.pythonhosted.org/packages/22/61/8d6a40f288f791cf79ed5bb113159abf0c81d6efb86e734334f698eb4c59/xxhash-3.5.0-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:149b7914451eb154b3dfaa721315117ea1dac2cc55a01bfbd4df7c68c5dd683d", size = 210294 },
    { url = "https://files.pythonhosted.org/packages/17/02/215c4698955762d45a8158117190261b2dbefe9ae7e5b906768c09d8bc74/xxhash-3.5.0-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:eade977f5c96c677035ff39c56ac74d851b1cca7d607ab3d8f23c6b859379cab", size = 414674 },
    { url = "https://files.pythonhosted.org/packages/31/5c/b7a8db8a3237cff3d535261325d95de509f6a8ae439a5a7a4ffcff478189/xxhash-3.5.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:fa9f547bd98f5553d03160967866a71056a60960be00356a15ecc44efb40ba8e", size = 192022 },
    { url = "https://files.pythonhosted.org/packages/78/e3/dd76659b2811b3fd06892a8beb850e1996b63e9235af5a86ea348f053e9e/xxhash-3.5.0-cp312-cp312-win32.whl", hash = "sha256:f7b58d1fd3551b8c80a971199543379be1cee3d0d409e1f6d8b01c1a2eebf1f8", size = 30170 },
    { url = "https://files.pythonhosted.org/packages/d9/6b/1c443fe6cfeb4ad1dcf231cdec96eb94fb43d6498b4469ed8b51f8b59a37/xxhash-3.5.0-cp312-cp312-win_amd64.whl", hash = "sha256:fa0cafd3a2af231b4e113fba24a65d7922af91aeb23774a8b78228e6cd785e3e", size = 30040 },
    { url = "https://files.pythonhosted.org/packages/0f/eb/04405305f290173acc0350eba6d2f1a794b57925df0398861a20fbafa415/xxhash-3.5.0-cp312-cp312-win_arm64.whl", hash = "sha256:586886c7e89cb9828bcd8a5686b12e161368e0064d040e225e72607b43858ba2", size = 26796 },
    { url = "https://files.pythonhosted.org/packages/c9/b8/e4b3ad92d249be5c83fa72916c9091b0965cb0faeff05d9a0a3870ae6bff/xxhash-3.5.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:37889a0d13b0b7d739cfc128b1c902f04e32de17b33d74b637ad42f1c55101f6", size = 31795 },
    { url = "https://files.pythonhosted.org/packages/fc/d8/b3627a0aebfbfa4c12a41e22af3742cf08c8ea84f5cc3367b5de2d039cce/xxhash-3.5.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:97a662338797c660178e682f3bc180277b9569a59abfb5925e8620fba00b9fc5", size = 30792 },
    { url = "https://files.pythonhosted.org/packages/c3/cc/762312960691da989c7cd0545cb120ba2a4148741c6ba458aa723c00a3f8/xxhash-3.5.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:7f85e0108d51092bdda90672476c7d909c04ada6923c14ff9d913c4f7dc8a3bc", size = 220950 },
    { url = "https://files.pythonhosted.org/packages/fe/e9/cc266f1042c3c13750e86a535496b58beb12bf8c50a915c336136f6168dc/xxhash-3.5.0-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:cd2fd827b0ba763ac919440042302315c564fdb797294d86e8cdd4578e3bc7f3", size = 199980 },
    { url = "https://files.pythonhosted.org/packages/bf/85/a836cd0dc5cc20376de26b346858d0ac9656f8f730998ca4324921a010b9/xxhash-3.5.0-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:82085c2abec437abebf457c1d12fccb30cc8b3774a0814872511f0f0562c768c", size = 428324 },
    { url = "https://files.pythonhosted.org/packages/b4/0e/15c243775342ce840b9ba34aceace06a1148fa1630cd8ca269e3223987f5/xxhash-3.5.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:07fda5de378626e502b42b311b049848c2ef38784d0d67b6f30bb5008642f8eb", size = 194370 },
    { url = "https://files.pythonhosted.org/packages/87/a1/b028bb02636dfdc190da01951d0703b3d904301ed0ef6094d948983bef0e/xxhash-3.5.0-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:c279f0d2b34ef15f922b77966640ade58b4ccdfef1c4d94b20f2a364617a493f", size = 207911 },
    { url = "https://files.pythonhosted.org/packages/80/d5/73c73b03fc0ac73dacf069fdf6036c9abad82de0a47549e9912c955ab449/xxhash-3.5.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:89e66ceed67b213dec5a773e2f7a9e8c58f64daeb38c7859d8815d2c89f39ad7", size = 216352 },
    { url = "https://files.pythonhosted.org/packages/b6/2a/5043dba5ddbe35b4fe6ea0a111280ad9c3d4ba477dd0f2d1fe1129bda9d0/xxhash-3.5.0-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:bcd51708a633410737111e998ceb3b45d3dbc98c0931f743d9bb0a209033a326", size = 203410 },
    { url = "https://files.pythonhosted.org/packages/a2/b2/9a8ded888b7b190aed75b484eb5c853ddd48aa2896e7b59bbfbce442f0a1/xxhash-3.5.0-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:3ff2c0a34eae7df88c868be53a8dd56fbdf592109e21d4bfa092a27b0bf4a7bf", size = 210322 },
    { url = "https://files.pythonhosted.org/packages/98/62/440083fafbc917bf3e4b67c2ade621920dd905517e85631c10aac955c1d2/xxhash-3.5.0-cp313-cp313-musllinux_1_2_s390x.whl", hash = "sha256:4e28503dccc7d32e0b9817aa0cbfc1f45f563b2c995b7a66c4c8a0d232e840c7", size = 414725 },
    { url = "https://files.pythonhosted.org/packages/75/db/009206f7076ad60a517e016bb0058381d96a007ce3f79fa91d3010f49cc2/xxhash-3.5.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:a6c50017518329ed65a9e4829154626f008916d36295b6a3ba336e2458824c8c", size = 192070 },
    { url = "https://files.pythonhosted.org/packages/1f/6d/c61e0668943a034abc3a569cdc5aeae37d686d9da7e39cf2ed621d533e36/xxhash-3.5.0-cp313-cp313-win32.whl", hash = "sha256:53a068fe70301ec30d868ece566ac90d873e3bb059cf83c32e76012c889b8637", size = 30172 },
    { url = "https://files.pythonhosted.org/packages/96/14/8416dce965f35e3d24722cdf79361ae154fa23e2ab730e5323aa98d7919e/xxhash-3.5.0-cp313-cp313-win_amd64.whl", hash = "sha256:80babcc30e7a1a484eab952d76a4f4673ff601f54d5142c26826502740e70b43", size = 30041 },
    { url = "https://files.pythonhosted.org/packages/27/ee/518b72faa2073f5aa8e3262408d284892cb79cf2754ba0c3a5870645ef73/xxhash-3.5.0-cp313-cp313-win_arm64.whl", hash = "sha256:4811336f1ce11cac89dcbd18f3a25c527c16311709a89313c3acaf771def2d4b", size = 26801 },
 ]
 [[package]]
 name = "yarl"
 version = "1.18.3"
@ -3952,34 +3791,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/b7/1a/7e4798e9339adc931158c9d69ecc34f5e6791489d469f5e50ec15e35f458/zipp-3.21.0-py3-none-any.whl", hash = "sha256:ac1bbe05fd2991f160ebce24ffbac5f6d11d83dc90891255885223d42b3cd931", size = 9630 },
 ]
 [[package]]
 name = "zopfli"
 version = "0.2.3.post1"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/5e/7c/a8f6696e694709e2abcbccd27d05ef761e9b6efae217e11d977471555b62/zopfli-0.2.3.post1.tar.gz", hash = "sha256:96484dc0f48be1c5d7ae9f38ed1ce41e3675fd506b27c11a6607f14b49101e99", size = 175629 }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/3f/ce/b6441cc01881d06e0b5883f32c44e7cc9772e0d04e3e59277f59f80b9a19/zopfli-0.2.3.post1-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:3f0197b6aa6eb3086ae9e66d6dd86c4d502b6c68b0ec490496348ae8c05ecaef", size = 295489 },
    { url = "https://files.pythonhosted.org/packages/93/f0/24dd708f00ae0a925bc5c9edae858641c80f6a81a516810dc4d21688a930/zopfli-0.2.3.post1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:5fcfc0dc2761e4fcc15ad5d273b4d58c2e8e059d3214a7390d4d3c8e2aee644e", size = 163010 },
    { url = "https://files.pythonhosted.org/packages/65/57/0378eeeb5e3e1e83b1b0958616b2bf954f102ba5b0755b9747dafbd8cb72/zopfli-0.2.3.post1-cp312-cp312-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:cac2b37ab21c2b36a10b685b1893ebd6b0f83ae26004838ac817680881576567", size = 823649 },
    { url = "https://files.pythonhosted.org/packages/ab/8a/3ab8a616d4655acf5cf63c40ca84e434289d7d95518a1a42d28b4a7228f8/zopfli-0.2.3.post1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:8d5ab297d660b75c159190ce6d73035502310e40fd35170aed7d1a1aea7ddd65", size = 826557 },
    { url = "https://files.pythonhosted.org/packages/ed/4d/7f6820af119c4fec6efaf007bffee7bc9052f695853a711a951be7afd26b/zopfli-0.2.3.post1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9ba214f4f45bec195ee8559651154d3ac2932470b9d91c5715fc29c013349f8c", size = 851127 },
    { url = "https://files.pythonhosted.org/packages/e1/db/1ef5353ab06f9f2fb0c25ed0cddf1418fe275cc2ee548bc4a29340c44fe1/zopfli-0.2.3.post1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:c1e0ed5d84ffa2d677cc9582fc01e61dab2e7ef8b8996e055f0a76167b1b94df", size = 1754183 },
    { url = "https://files.pythonhosted.org/packages/39/03/44f8f39950354d330fa798e4bab1ac8e38ec787d3fde25d5b9c7770065a2/zopfli-0.2.3.post1-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:bfa1eb759e07d8b7aa7a310a2bc535e127ee70addf90dc8d4b946b593c3e51a8", size = 1905945 },
    { url = "https://files.pythonhosted.org/packages/74/7b/94b920c33cc64255f59e3cfc77c829b5c6e60805d189baeada728854a342/zopfli-0.2.3.post1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:cd2c002f160502608dcc822ed2441a0f4509c52e86fcfd1a09e937278ed1ca14", size = 1835885 },
    { url = "https://files.pythonhosted.org/packages/ad/89/c869ac844351e285a6165e2da79b715b0619a122e3160d183805adf8ab45/zopfli-0.2.3.post1-cp312-cp312-win32.whl", hash = "sha256:7be5cc6732eb7b4df17305d8a7b293223f934a31783a874a01164703bc1be6cd", size = 82743 },
    { url = "https://files.pythonhosted.org/packages/29/e6/c98912fd3a589d8a7316c408fd91519f72c237805c4400b753e3942fda0b/zopfli-0.2.3.post1-cp312-cp312-win_amd64.whl", hash = "sha256:4e50ffac74842c1c1018b9b73875a0d0a877c066ab06bf7cccbaa84af97e754f", size = 99403 },
    { url = "https://files.pythonhosted.org/packages/2b/24/0e552e2efce9a20625b56e9609d1e33c2966be33fc008681121ec267daec/zopfli-0.2.3.post1-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:ecb7572df5372abce8073df078207d9d1749f20b8b136089916a4a0868d56051", size = 295485 },
    { url = "https://files.pythonhosted.org/packages/08/83/b2564369fb98797a617fe2796097b1d719a4937234375757ad2a3febc04b/zopfli-0.2.3.post1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:a1cf720896d2ce998bc8e051d4b4ce0d8bec007aab6243102e8e1d22a0b2fb3f", size = 163000 },
    { url = "https://files.pythonhosted.org/packages/3c/55/81d419739c2aab35e19b58bce5498dcb58e6446e5eb69f2d3c748b1c9151/zopfli-0.2.3.post1-cp313-cp313-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:5aad740b4d4fcbaaae4887823925166ffd062db3b248b3f432198fc287381d1a", size = 823699 },
    { url = "https://files.pythonhosted.org/packages/9e/91/89f07c8ea3c9bc64099b3461627b07a8384302235ee0f357eaa86f98f509/zopfli-0.2.3.post1-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:6617fb10f9e4393b331941861d73afb119cd847e88e4974bdbe8068ceef3f73f", size = 826612 },
    { url = "https://files.pythonhosted.org/packages/41/31/46670fc0c7805d42bc89702440fa9b73491d68abbc39e28d687180755178/zopfli-0.2.3.post1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a53b18797cdef27e019db595d66c4b077325afe2fd62145953275f53d84ce40c", size = 851148 },
    { url = "https://files.pythonhosted.org/packages/22/00/71ad39277bbb88f9fd20fb786bd3ff2ea4025c53b31652a0da796fb546cd/zopfli-0.2.3.post1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:b78008a69300d929ca2efeffec951b64a312e9a811e265ea4a907ab546d79fa6", size = 1754215 },
    { url = "https://files.pythonhosted.org/packages/d0/4e/e542c508d20c3dfbef1b90fcf726f824f505e725747f777b0b7b7d1deb95/zopfli-0.2.3.post1-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:0aa5f90d6298bda02a95bc8dc8c3c19004d5a4e44bda00b67ca7431d857b4b54", size = 1905988 },
    { url = "https://files.pythonhosted.org/packages/ba/a5/817ac1ecc888723e91dc172e8c6eeab9f48a1e52285803b965084e11bbd5/zopfli-0.2.3.post1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:2768c877f76c8a0e7519b1c86c93757f3c01492ddde55751e9988afb7eff64e1", size = 1835907 },
    { url = "https://files.pythonhosted.org/packages/cd/35/2525f90c972d8aafc39784a8c00244eeee8e8221b26cbc576748ee9dc1cd/zopfli-0.2.3.post1-cp313-cp313-win32.whl", hash = "sha256:71390dbd3fbf6ebea9a5d85ffed8c26ee1453ee09248e9b88486e30e0397b775", size = 82742 },
    { url = "https://files.pythonhosted.org/packages/2f/c6/49b27570923956d52d37363e8f5df3a31a61bd7719bb8718527a9df3ae5f/zopfli-0.2.3.post1-cp313-cp313-win_amd64.whl", hash = "sha256:a86eb88e06bd87e1fff31dac878965c26b0c26db59ddcf78bb0379a954b120de", size = 99408 },
 ]
 [[package]]
 name = "zstandard"
 version = "0.23.0"
--- a/surfsense_browser_extension/package.json
+++ b/surfsense_browser_extension/package.json
@ -1,7 +1,7 @@
 {
-  "name": "surfsense",
+  "name": "surfsense_browser_extension",
-  "displayName": "Surfsense",
+  "displayName": "Surfsense Browser Extension",
-  "version": "0.0.1",
+  "version": "0.0.6",
  "description": "Extension to collect Browsing History for SurfSense.",
  "author": "https://github.com/MODSetter",
  "scripts": {
--- a/surfsense_web/.gitignore
+++ b/surfsense_web/.gitignore
@ -39,3 +39,6 @@ yarn-error.log*
 # typescript
 *.tsbuildinfo
 next-env.d.ts
 # source
 /.source/
--- a/surfsense_web/Dockerfile
+++ b/surfsense_web/Dockerfile
@ -8,8 +8,15 @@ RUN npm install -g pnpm
 # Copy package files
 COPY package.json pnpm-lock.yaml ./
-# Install dependencies
+# First copy the config file to avoid fumadocs-mdx postinstall error
-RUN pnpm install
+COPY source.config.ts ./
 COPY content ./content
 # Install dependencies with --ignore-scripts to skip postinstall
 RUN pnpm install --ignore-scripts
 # Now run the postinstall script manually
 RUN pnpm fumadocs-mdx
 # Copy source code
 COPY . .
--- a/surfsense_web/app/api/search/route.ts
+++ b/surfsense_web/app/api/search/route.ts
@ -0,0 +1,4 @@
 import { source } from '@/lib/source';
 import { createFromSource } from 'fumadocs-core/search/server';
 export const { GET } = createFromSource(source);
--- a/surfsense_web/app/dashboard/[search_space_id]/chats/page.tsx
+++ b/surfsense_web/app/dashboard/[search_space_id]/chats/page.tsx
@ -7,12 +7,15 @@ interface PageProps {
  };
 }
-export default function ChatsPage({ params }: PageProps) {
+export default async function ChatsPage({ params }: PageProps) {
  // Await params to properly access dynamic route parameters
  const { search_space_id: searchSpaceId } = await params;
  return (
    <Suspense fallback={<div className="flex items-center justify-center h-[60vh]">
      <div className="h-8 w-8 animate-spin rounded-full border-4 border-primary border-t-transparent"></div>
    </div>}>
-      <ChatsPageClient searchSpaceId={params.search_space_id} />
+      <ChatsPageClient searchSpaceId={searchSpaceId} />
    </Suspense>
  );
 } 
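Review note: depending on the Next.js version, dynamic route `params` may be handed to a page as a plain object or (from Next.js 15) as a Promise. A minimal sketch of a helper that tolerates both shapes is shown below. `ChatsParams` and `resolveSearchSpaceId` are hypothetical names used only for illustration; they are not part of this change.

```typescript
// Hypothetical type mirroring the single dynamic segment used by the chats page.
type ChatsParams = { search_space_id: string };

// Hypothetical helper: accepts params either as a plain object or as a Promise.
// Awaiting a non-Promise value simply yields that value, so both cases work.
async function resolveSearchSpaceId(
  params: ChatsParams | Promise<ChatsParams>,
): Promise<string> {
  const resolved = await params;
  return resolved.search_space_id;
}
```

Inside an `async` page component this would be called as `const searchSpaceId = await resolveSearchSpaceId(params);`.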
--- a/surfsense_web/app/dashboard/[search_space_id]/connectors/(manage)/page.tsx
+++ b/surfsense_web/app/dashboard/[search_space_id]/connectors/(manage)/page.tsx
@ -44,6 +44,8 @@ const getConnectorTypeDisplay = (type: string): string => {
    "TAVILY_API": "Tavily API",
    "SLACK_CONNECTOR": "Slack",
    "NOTION_CONNECTOR": "Notion",
    "GITHUB_CONNECTOR": "GitHub",
    "LINEAR_CONNECTOR": "Linear",
    // Add other connector types here as needed
  };
  return typeMap[type] || type;
@ -204,7 +206,7 @@ export default function ConnectorsPage() {
                          <Button
                            variant="outline"
                            size="sm"
-                            onClick={() => router.push(`/dashboard/${searchSpaceId}/connectors/${connector.id}`)}
+                            onClick={() => router.push(`/dashboard/${searchSpaceId}/connectors/${connector.id}/edit`)}
                          >
                            <Edit className="h-4 w-4" />
                            <span className="sr-only">Edit</span>
@ -253,4 +255,4 @@ export default function ConnectorsPage() {
      </Card>
    </div>
  );
-} 
+} 
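Review note: the `typeMap[type] || type` lookup in `getConnectorTypeDisplay` reads from a plain object, so an inherited key such as `"constructor"` would return a function rather than falling back to the raw type. A minimal sketch of the same lookup backed by a `Map` follows. The labels are copied from the entries visible in this hunk; `CONNECTOR_TYPE_LABELS` is a hypothetical name, and entries outside the hunk are deliberately left out.

```typescript
// Hypothetical constant: display labels for the connector types visible in this diff.
const CONNECTOR_TYPE_LABELS = new Map<string, string>([
  ["TAVILY_API", "Tavily API"],
  ["SLACK_CONNECTOR", "Slack"],
  ["NOTION_CONNECTOR", "Notion"],
  ["GITHUB_CONNECTOR", "GitHub"],
  ["LINEAR_CONNECTOR", "Linear"],
]);

// Returns the display label, or the raw type string when no label is registered.
// A Map has no inherited keys, so unknown strings always fall through to the fallback.
function getConnectorTypeDisplay(type: string): string {
  return CONNECTOR_TYPE_LABELS.get(type) ?? type;
}
```

Keeping one shared map would also avoid the two pages in this diff drifting apart ("Slack" in one, "Slack Connector" in the other), if that difference is not intentional.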
--- a/surfsense_web/app/dashboard/[search_space_id]/connectors/[connector_id]/edit/page.tsx
+++ b/surfsense_web/app/dashboard/[search_space_id]/connectors/[connector_id]/edit/page.tsx
@ -0,0 +1,176 @@
 "use client";
 import React, { useEffect } from 'react';
 import { useRouter, useParams } from "next/navigation";
 import { motion } from "framer-motion";
 import { toast } from "sonner";
 import { ArrowLeft, Check, Loader2, Github } from "lucide-react";
 import { Form } from "@/components/ui/form";
 import { Button } from "@/components/ui/button";
 import { Card, CardContent, CardDescription, CardFooter, CardHeader, CardTitle } from "@/components/ui/card";
 // Import Utils, Types, Hook, and Components
 import { getConnectorTypeDisplay } from '@/lib/connectors/utils';
 import { useConnectorEditPage } from '@/hooks/useConnectorEditPage';
 import { EditConnectorLoadingSkeleton } from "@/components/editConnector/EditConnectorLoadingSkeleton";
 import { EditConnectorNameForm } from "@/components/editConnector/EditConnectorNameForm";
 import { EditGitHubConnectorConfig } from "@/components/editConnector/EditGitHubConnectorConfig";
 import { EditSimpleTokenForm } from "@/components/editConnector/EditSimpleTokenForm";
 export default function EditConnectorPage() {
    const router = useRouter();
    const params = useParams();
    const searchSpaceId = params.search_space_id as string;
    // Ensure connectorId is parsed safely
    const connectorIdParam = params.connector_id as string;
    const connectorId = connectorIdParam ? parseInt(connectorIdParam, 10) : NaN;
    // Use the custom hook to manage state and logic
    const {
        connectorsLoading,
        connector,
        isSaving,
        editForm,
        patForm, // Needed for GitHub child component
        handleSaveChanges,
        // GitHub specific props for the child component
        editMode,
        setEditMode, // Pass down if needed by GitHub component
        originalPat,
        currentSelectedRepos,
        fetchedRepos,
        setFetchedRepos,
        newSelectedRepos,
        setNewSelectedRepos,
        isFetchingRepos,
        handleFetchRepositories,
        handleRepoSelectionChange,
    } = useConnectorEditPage(connectorId, searchSpaceId);
    // Redirect if connectorId is not a valid number after parsing
    useEffect(() => {
        if (isNaN(connectorId)) {
            toast.error("Invalid Connector ID.");
            router.push(`/dashboard/${searchSpaceId}/connectors`);
        }
    }, [connectorId, router, searchSpaceId]);
    // Loading State
    if (connectorsLoading || !connector) {
        // Handle NaN case before showing skeleton
        if (isNaN(connectorId)) return null; 
        return <EditConnectorLoadingSkeleton />;
    }
    // Main Render using data/handlers from the hook
    return (
        <div className="container mx-auto py-8 max-w-3xl">
            <Button variant="ghost" className="mb-6" onClick={() => router.push(`/dashboard/${searchSpaceId}/connectors`)}>
                <ArrowLeft className="mr-2 h-4 w-4" /> Back to Connectors
            </Button>
            <motion.div initial={{ opacity: 0, y: 20 }} animate={{ opacity: 1, y: 0 }} transition={{ duration: 0.5 }}>
                <Card className="border-2 border-border">
                    <CardHeader>
                        <CardTitle className="text-2xl font-bold flex items-center gap-2">
                            <Github className="h-6 w-6" /> {/* TODO: Dynamic icon */}
                            Edit {getConnectorTypeDisplay(connector.connector_type)} Connector
                        </CardTitle>
                        <CardDescription>Modify connector name and configuration.</CardDescription>
                    </CardHeader>
                    <Form {...editForm}> 
                        {/* Pass hook's handleSaveChanges */}
                        <form onSubmit={editForm.handleSubmit(handleSaveChanges)} className="space-y-6">
                            <CardContent className="space-y-6">
                                {/* Pass form control from hook */}
                                <EditConnectorNameForm control={editForm.control} />
                                <hr />
                                <h3 className="text-lg font-semibold">Configuration</h3>
                                {/* == GitHub == */}
                                {connector.connector_type === 'GITHUB_CONNECTOR' && (
                                    <EditGitHubConnectorConfig
                                        // Pass relevant state and handlers from hook
                                        editMode={editMode}
                                        setEditMode={setEditMode} // Pass setter if child manages mode
                                        originalPat={originalPat}
                                        currentSelectedRepos={currentSelectedRepos}
                                        fetchedRepos={fetchedRepos}
                                        newSelectedRepos={newSelectedRepos}
                                        isFetchingRepos={isFetchingRepos}
                                        patForm={patForm}
                                        handleFetchRepositories={handleFetchRepositories}
                                        handleRepoSelectionChange={handleRepoSelectionChange}
                                        setNewSelectedRepos={setNewSelectedRepos}
                                        setFetchedRepos={setFetchedRepos}
                                    />
                                )}
                                {/* == Slack == */}
                                {connector.connector_type === 'SLACK_CONNECTOR' && (
                                    <EditSimpleTokenForm
                                        control={editForm.control}
                                        fieldName="SLACK_BOT_TOKEN"
                                        fieldLabel="Slack Bot Token"
                                        fieldDescription="Update the Slack Bot Token if needed."
                                        placeholder="Begins with xoxb-..."
                                    />
                                )}
                                {/* == Notion == */}
                                {connector.connector_type === 'NOTION_CONNECTOR' && (
                                    <EditSimpleTokenForm
                                        control={editForm.control}
                                        fieldName="NOTION_INTEGRATION_TOKEN"
                                        fieldLabel="Notion Integration Token"
                                        fieldDescription="Update the Notion Integration Token if needed."
                                        placeholder="Begins with secret_..."
                                    />
                                )}
                                {/* == Serper == */}
                                {connector.connector_type === 'SERPER_API' && (
                                    <EditSimpleTokenForm
                                        control={editForm.control}
                                        fieldName="SERPER_API_KEY"
                                        fieldLabel="Serper API Key"
                                        fieldDescription="Update the Serper API Key if needed."
                                    />
                                )}
                                {/* == Tavily == */}
                                {connector.connector_type === 'TAVILY_API' && (
                                    <EditSimpleTokenForm
                                        control={editForm.control}
                                        fieldName="TAVILY_API_KEY"
                                        fieldLabel="Tavily API Key"
                                        fieldDescription="Update the Tavily API Key if needed."
                                    />
                                )}
                                {/* == Linear == */}
                                {connector.connector_type === 'LINEAR_CONNECTOR' && (
                                    <EditSimpleTokenForm
                                        control={editForm.control}
                                        fieldName="LINEAR_API_KEY"
                                        fieldLabel="Linear API Key"
                                        fieldDescription="Update your Linear API Key if needed."
                                        placeholder="Begins with lin_api_..."
                                    />
                                )}
                            </CardContent>
                            <CardFooter className="border-t pt-6">
                                <Button type="submit" disabled={isSaving} className="w-full sm:w-auto">
                                    {isSaving ? <Loader2 className="mr-2 h-4 w-4 animate-spin" /> : <Check className="mr-2 h-4 w-4" />}
                                    Save Changes
                                </Button>
                            </CardFooter>
                        </form>
                    </Form>
                </Card>
            </motion.div>
        </div>
    );
 } 
--- a/surfsense_web/app/dashboard/[search_space_id]/connectors/[connector_id]/page.tsx
+++ b/surfsense_web/app/dashboard/[search_space_id]/connectors/[connector_id]/page.tsx
@ -51,6 +51,7 @@ const getConnectorTypeDisplay = (type: string): string => {
    "TAVILY_API": "Tavily API",
    "SLACK_CONNECTOR": "Slack Connector",
    "NOTION_CONNECTOR": "Notion Connector",
    "GITHUB_CONNECTOR": "GitHub Connector",
    // Add other connector types here as needed
  };
  return typeMap[type] || type;
@ -69,7 +70,7 @@ export default function EditConnectorPage() {
  const [connector, setConnector] = useState<SearchSourceConnector | null>(null);
  const [isLoading, setIsLoading] = useState(true);
  const [isSubmitting, setIsSubmitting] = useState(false);
-
+  console.log("connector", connector);
  // Initialize the form
  const form = useForm<ApiConnectorFormValues>({
    resolver: zodResolver(apiConnectorFormSchema),
@ -85,7 +86,8 @@ export default function EditConnectorPage() {
      "SERPER_API": "SERPER_API_KEY",
      "TAVILY_API": "TAVILY_API_KEY",
      "SLACK_CONNECTOR": "SLACK_BOT_TOKEN",
-      "NOTION_CONNECTOR": "NOTION_INTEGRATION_TOKEN"
+      "NOTION_CONNECTOR": "NOTION_INTEGRATION_TOKEN",
      "GITHUB_CONNECTOR": "GITHUB_PAT"
    };
    return fieldMap[connectorType] || "";
  };
@ -136,6 +138,8 @@ export default function EditConnectorPage() {
        name: values.name,
        connector_type: connector.connector_type,
        config: updatedConfig,
        is_indexable: connector.is_indexable,
        last_indexed_at: connector.last_indexed_at,
      });
      toast.success("Connector updated successfully!");
@ -223,17 +227,21 @@ export default function EditConnectorPage() {
                          ? "Slack Bot Token" 
                          : connector?.connector_type === "NOTION_CONNECTOR" 
                            ? "Notion Integration Token" 
-                            : "API Key"}
+                            : connector?.connector_type === "GITHUB_CONNECTOR"
                              ? "GitHub Personal Access Token (PAT)"
                              : "API Key"}
                      </FormLabel>
                      <FormControl>
                        <Input 
                          type="password" 
                          placeholder={
                            connector?.connector_type === "SLACK_CONNECTOR" 
-                              ? "Enter your Slack Bot Token" 
+                              ? "Enter new Slack Bot Token (optional)" 
                              : connector?.connector_type === "NOTION_CONNECTOR" 
-                                ? "Enter your Notion Integration Token" 
+                                ? "Enter new Notion Token (optional)"
-                                : "Enter your API key"
+                                : connector?.connector_type === "GITHUB_CONNECTOR"
                                  ? "Enter new GitHub PAT (optional)"
                                  : "Enter new API key (optional)"
                          } 
                          {...field} 
                        />
@ -243,7 +251,9 @@ export default function EditConnectorPage() {
                          ? "Enter a new Slack Bot Token or leave blank to keep your existing token." 
                          : connector?.connector_type === "NOTION_CONNECTOR" 
                            ? "Enter a new Notion Integration Token or leave blank to keep your existing token." 
-                            : "Enter a new API key or leave blank to keep your existing key."}
+                            : connector?.connector_type === "GITHUB_CONNECTOR"
                              ? "Enter a new GitHub PAT or leave blank to keep your existing token."
                              : "Enter a new API key or leave blank to keep your existing key."}
                      </FormDescription>
                      <FormMessage />
                    </FormItem>
@ -276,4 +286,4 @@ export default function EditConnectorPage() {
      </motion.div>
    </div>
  );
-} 
+} 
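Review note: the label, placeholder, and description in this file each grow by one nested ternary branch per connector type. A lookup table is one possible alternative. The sketch below is only an illustration: `credentialLabels` and `getCredentialLabel` are hypothetical names that do not exist in the repository, and the label strings are copied from the hunk above, on the assumption that the first branch belongs to `SLACK_CONNECTOR`.

```typescript
// Hypothetical alternative to the nested ternary; not part of this PR.
// Label strings are copied from the diff above.
const credentialLabels = new Map<string, string>([
    ["SLACK_CONNECTOR", "Slack Bot Token"],
    ["NOTION_CONNECTOR", "Notion Integration Token"],
    ["GITHUB_CONNECTOR", "GitHub Personal Access Token (PAT)"],
]);

// Returns the label for a connector type, falling back to the generic
// "API Key" label used by the final branch of the ternary.
function getCredentialLabel(connectorType: string | undefined): string {
    return credentialLabels.get(connectorType ?? "") ?? "API Key";
}
```

With a table like this, supporting another connector would mean adding one entry rather than another level of nesting.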
--- a/surfsense_web/app/dashboard/[search_space_id]/connectors/add/github-connector/page.tsx
+++ b/surfsense_web/app/dashboard/[search_space_id]/connectors/add/github-connector/page.tsx
@ -0,0 +1,456 @@
 "use client";
 import { useState } from "react";
 import { useRouter, useParams } from "next/navigation";
 import { motion } from "framer-motion";
 import { zodResolver } from "@hookform/resolvers/zod";
 import { useForm } from "react-hook-form";
 import * as z from "zod";
 import { toast } from "sonner";
 import { ArrowLeft, Check, Info, Loader2, Github, CircleAlert, ListChecks } from "lucide-react";
 // Hook that exposes createConnector for search source connectors
 import { useSearchSourceConnectors } from "@/hooks/useSearchSourceConnectors";
 import {
    Form,
    FormControl,
    FormDescription,
    FormField,
    FormItem,
    FormLabel,
    FormMessage,
 } from "@/components/ui/form";
 import { Input } from "@/components/ui/input";
 import { Button } from "@/components/ui/button";
 import {
    Card,
    CardContent,
    CardDescription,
    CardFooter,
    CardHeader,
    CardTitle,
 } from "@/components/ui/card";
 import {
    Alert,
    AlertDescription,
    AlertTitle,
 } from "@/components/ui/alert";
 import {
    Accordion,
    AccordionContent,
    AccordionItem,
    AccordionTrigger,
 } from "@/components/ui/accordion";
 import { Tabs, TabsContent, TabsList, TabsTrigger } from "@/components/ui/tabs";
 import { Checkbox } from "@/components/ui/checkbox";
 // Define the form schema with Zod for GitHub PAT entry step
 const githubPatFormSchema = z.object({
    name: z.string().min(3, {
        message: "Connector name must be at least 3 characters.",
    }),
    github_pat: z.string()
        .min(20, { // Apply min length first
            message: "GitHub Personal Access Token seems too short.",
        })
        .refine(pat => pat.startsWith('ghp_') || pat.startsWith('github_pat_'), { // Then refine the pattern
            message: "GitHub PAT should start with 'ghp_' or 'github_pat_'",
        }),
 });
 // Define the type for the form values
 type GithubPatFormValues = z.infer<typeof githubPatFormSchema>;
 // Type for fetched GitHub repositories
 interface GithubRepo {
    id: number;
    name: string;
    full_name: string;
    private: boolean;
    url: string;
    description: string | null;
    last_updated: string | null;
 }
 export default function GithubConnectorPage() {
    const router = useRouter();
    const params = useParams();
    const searchSpaceId = params.search_space_id as string;
    const [step, setStep] = useState<'enter_pat' | 'select_repos'>('enter_pat');
    const [isFetchingRepos, setIsFetchingRepos] = useState(false);
    const [isCreatingConnector, setIsCreatingConnector] = useState(false);
    const [repositories, setRepositories] = useState<GithubRepo[]>([]);
    const [selectedRepos, setSelectedRepos] = useState<string[]>([]);
    const [connectorName, setConnectorName] = useState<string>("GitHub Connector");
    const [validatedPat, setValidatedPat] = useState<string>(""); // Store the validated PAT
    const { createConnector } = useSearchSourceConnectors();
    // Initialize the form for PAT entry
    const form = useForm<GithubPatFormValues>({
        resolver: zodResolver(githubPatFormSchema),
        defaultValues: {
            name: connectorName,
            github_pat: "",
        },
    });
    // Function to fetch repositories using the new backend endpoint
    const fetchRepositories = async (values: GithubPatFormValues) => {
        setIsFetchingRepos(true);
        setConnectorName(values.name); // Store the name
        setValidatedPat(values.github_pat); // Store the PAT temporarily
        try {
            const token = localStorage.getItem('surfsense_bearer_token');
            if (!token) {
                throw new Error('No authentication token found');
            }
            const response = await fetch(
                `${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/github/repositories/`,
                {
                    method: 'POST',
                    headers: {
                        'Content-Type': 'application/json',
                        'Authorization': `Bearer ${token}`
                    },
                    body: JSON.stringify({ github_pat: values.github_pat })
                }
            );
            if (!response.ok) {
                // The error body may not be valid JSON, so fall back to an empty object
                const errorData = await response.json().catch(() => ({}));
                throw new Error(errorData.detail || `Failed to fetch repositories: ${response.statusText}`);
            }
            const data: GithubRepo[] = await response.json();
            setRepositories(data);
            setStep('select_repos'); // Move to the next step
            toast.success(`Found ${data.length} repositories.`);
        } catch (error) {
            console.error("Error fetching GitHub repositories:", error);
            const errorMessage = error instanceof Error ? error.message : "Failed to fetch repositories. Please check the PAT and try again.";
            toast.error(errorMessage);
        } finally {
            setIsFetchingRepos(false);
        }
    };
    // Handle final connector creation
    const handleCreateConnector = async () => {
        if (selectedRepos.length === 0) {
            toast.warning("Please select at least one repository to index.");
            return;
        }
        setIsCreatingConnector(true);
        try {
            await createConnector({
                name: connectorName, // Use the stored name
                connector_type: "GITHUB_CONNECTOR",
                config: {
                    GITHUB_PAT: validatedPat, // Use the stored validated PAT
                    repo_full_names: selectedRepos, // Add the selected repo names
                },
                is_indexable: true,
                last_indexed_at: null,
            });
            toast.success("GitHub connector created successfully!");
            router.push(`/dashboard/${searchSpaceId}/connectors`);
        } catch (error) {
            console.error("Error creating GitHub connector:", error);
            const errorMessage = error instanceof Error ? error.message : "Failed to create GitHub connector.";
            toast.error(errorMessage);
        } finally {
            setIsCreatingConnector(false);
        }
    };
    // Handle checkbox changes
    const handleRepoSelection = (repoFullName: string, checked: boolean) => {
        setSelectedRepos(prev =>
            checked
                ? [...prev, repoFullName]
                : prev.filter(name => name !== repoFullName)
        );
    };
    return (
        <div className="container mx-auto py-8 max-w-3xl">
            <Button
                variant="ghost"
                className="mb-6"
                onClick={() => {
                    if (step === 'select_repos') {
                        // Go back to PAT entry, clear sensitive/fetched data
                        setStep('enter_pat');
                        setRepositories([]);
                        setSelectedRepos([]);
                        setValidatedPat("");
                        // Reset form PAT field, keep name
                        form.reset({ name: connectorName, github_pat: "" });
                    } else {
                        router.push(`/dashboard/${searchSpaceId}/connectors/add`);
                    }
                }}
            >
                <ArrowLeft className="mr-2 h-4 w-4" />
                {step === 'select_repos' ? "Back to PAT Entry" : "Back to Add Connectors"}
            </Button>
            <motion.div
                initial={{ opacity: 0, y: 20 }}
                animate={{ opacity: 1, y: 0 }}
                transition={{ duration: 0.5 }}
            >
                <Tabs defaultValue="connect" className="w-full">
                    <TabsList className="grid w-full grid-cols-2 mb-6">
                        <TabsTrigger value="connect">Connect GitHub</TabsTrigger>
                        <TabsTrigger value="documentation">Setup Guide</TabsTrigger>
                    </TabsList>
                    <TabsContent value="connect">
                        <Card className="border-2 border-border">
                            <CardHeader>
                                <CardTitle className="text-2xl font-bold flex items-center gap-2">
                                    {step === 'enter_pat' ? <Github className="h-6 w-6" /> : <ListChecks className="h-6 w-6" />}
                                    {step === 'enter_pat' ? "Connect GitHub Account" : "Select Repositories to Index"}
                                </CardTitle>
                                <CardDescription>
                                    {step === 'enter_pat'
                                        ? "Provide a name and GitHub Personal Access Token (PAT) to fetch accessible repositories."
                                        : `Select which repositories you want SurfSense to index for search. Found ${repositories.length} repositories accessible via your PAT.`
                                    }
                                </CardDescription>
                            </CardHeader>
                            <Form {...form}>
                                {step === 'enter_pat' && (
                                    <CardContent>
                                        <Alert className="mb-6 bg-muted">
                                            <Info className="h-4 w-4" />
                                            <AlertTitle>GitHub Personal Access Token (PAT) Required</AlertTitle>
                                            <AlertDescription>
                                                You'll need a GitHub PAT with the appropriate scopes (e.g., 'repo') to fetch repositories. You can create one from your{' '}
                                                <a
                                                    href="https://github.com/settings/personal-access-tokens"
                                                    target="_blank"
                                                    rel="noopener noreferrer"
                                                    className="font-medium underline underline-offset-4"
                                                >
                                                    GitHub Developer Settings
                                                </a>. The PAT will be used to fetch repositories and then stored securely to enable indexing.
                                            </AlertDescription>
                                        </Alert>
                                        <form onSubmit={form.handleSubmit(fetchRepositories)} className="space-y-6">
                                            <FormField
                                                control={form.control}
                                                name="name"
                                                render={({ field }) => (
                                                    <FormItem>
                                                        <FormLabel>Connector Name</FormLabel>
                                                        <FormControl>
                                                            <Input placeholder="My GitHub Connector" {...field} />
                                                        </FormControl>
                                                        <FormDescription>
                                                            A friendly name to identify this GitHub connection.
                                                        </FormDescription>
                                                        <FormMessage />
                                                    </FormItem>
                                                )}
                                            />
                                            <FormField
                                                control={form.control}
                                                name="github_pat"
                                                render={({ field }) => (
                                                    <FormItem>
                                                        <FormLabel>GitHub Personal Access Token (PAT)</FormLabel>
                                                        <FormControl>
                                                            <Input
                                                                type="password"
                                                                placeholder="ghp_... or github_pat_..."
                                                                {...field}
                                                            />
                                                        </FormControl>
                                                        <FormDescription>
                                                            Enter your GitHub PAT here to fetch your repositories. It is saved with the connector once you create it.
                                                        </FormDescription>
                                                        <FormMessage />
                                                    </FormItem>
                                                )}
                                            />
                                            <div className="flex justify-end">
                                                <Button
                                                    type="submit"
                                                    disabled={isFetchingRepos}
                                                    className="w-full sm:w-auto"
                                                >
                                                    {isFetchingRepos ? (
                                                        <>
                                                            <Loader2 className="mr-2 h-4 w-4 animate-spin" />
                                                            Fetching Repositories...
                                                        </>
                                                    ) : (
                                                        "Fetch Repositories"
                                                    )}
                                                </Button>
                                            </div>
                                        </form>
                                    </CardContent>
                                )}
                                {step === 'select_repos' && (
                                    <CardContent>
                                        {repositories.length === 0 ? (
                                            <Alert variant="destructive">
                                                <CircleAlert className="h-4 w-4" />
                                                <AlertTitle>No Repositories Found</AlertTitle>
                                                <AlertDescription>
                                                    No accessible repositories were found with the provided PAT. Please check the token and its permissions, then go back and try again.
                                                </AlertDescription>
                                            </Alert>
                                        ) : (
                                            <div className="space-y-4">
                                                <FormLabel>Repositories ({selectedRepos.length} selected)</FormLabel>
                                                <div className="h-64 w-full rounded-md border p-4 overflow-y-auto">
                                                    {repositories.map((repo) => (
                                                        <div key={repo.id} className="flex items-center space-x-2 mb-2 py-1">
                                                            <Checkbox
                                                                id={`repo-${repo.id}`}
                                                                checked={selectedRepos.includes(repo.full_name)}
                                                                onCheckedChange={(checked) => handleRepoSelection(repo.full_name, !!checked)}
                                                            />
                                                            <label
                                                                htmlFor={`repo-${repo.id}`}
                                                                className="text-sm font-medium leading-none peer-disabled:cursor-not-allowed peer-disabled:opacity-70"
                                                            >
                                                                {repo.full_name} {repo.private && "(Private)"}
                                                            </label>
                                                        </div>
                                                    ))}
                                                </div>
                                                <FormDescription>
                                                    Select the repositories you wish to index. Only checked repositories will be processed.
                                                </FormDescription>
                                                <div className="flex justify-between items-center pt-4">
                                                    <Button
                                                        variant="outline"
                                                        onClick={() => {
                                                            setStep('enter_pat');
                                                            setRepositories([]);
                                                            setSelectedRepos([]);
                                                            setValidatedPat("");
                                                            form.reset({ name: connectorName, github_pat: "" });
                                                        }}
                                                    >
                                                        Back
                                                    </Button>
                                                    <Button
                                                        onClick={handleCreateConnector}
                                                        disabled={isCreatingConnector || selectedRepos.length === 0}
                                                        className="w-full sm:w-auto"
                                                    >
                                                        {isCreatingConnector ? (
                                                            <>
                                                                <Loader2 className="mr-2 h-4 w-4 animate-spin" />
                                                                Creating Connector...
                                                            </>
                                                        ) : (
                                                            <>
                                                                <Check className="mr-2 h-4 w-4" />
                                                                Create Connector
                                                            </>
                                                        )}
                                                    </Button>
                                                </div>
                                            </div>
                                        )}
                                    </CardContent>
                                )}
                            </Form>
                            <CardFooter className="flex flex-col items-start border-t bg-muted/50 px-6 py-4">
                                <h4 className="text-sm font-medium">What you get with GitHub integration:</h4>
                                <ul className="mt-2 list-disc pl-5 text-sm text-muted-foreground">
                                    <li>Search through code and documentation in your selected repositories</li>
                                    <li>Access READMEs, Markdown files, and common code files</li>
                                    <li>Connect your project knowledge directly to your search space</li>
                                    <li>Index your selected repositories for enhanced search capabilities</li>
                                </ul>
                            </CardFooter>
                        </Card>
                    </TabsContent>
                    <TabsContent value="documentation">
                        <Card className="border-2 border-border">
                            <CardHeader>
                                <CardTitle className="text-2xl font-bold">GitHub Connector Setup Guide</CardTitle>
                                <CardDescription>
                                    Learn how to generate a Personal Access Token (PAT) and connect your GitHub account.
                                </CardDescription>
                            </CardHeader>
                            <CardContent className="space-y-6">
                                <div>
                                    <h3 className="text-xl font-semibold mb-2">How it works</h3>
                                    <p className="text-muted-foreground">
                                        The GitHub connector uses a Personal Access Token (PAT) to authenticate with the GitHub API. First, it fetches a list of repositories accessible to the token. You then select which repositories you want to index. The connector indexes relevant files (code, markdown, text) from only the selected repositories.
                                    </p>
                                    <ul className="mt-2 list-disc pl-5 text-muted-foreground">
                                        <li>The connector indexes files based on common code and documentation extensions.</li>
                                        <li>Large files (over 1MB) are skipped during indexing.</li>
                                        <li>Only selected repositories are indexed.</li>
                                        <li>Indexing runs periodically (check connector settings for frequency) to keep content up-to-date.</li>
                                    </ul>
                                </div>
                                <Accordion type="single" collapsible className="w-full">
                                    <AccordionItem value="create_pat">
                                        <AccordionTrigger className="text-lg font-medium">Step 1: Generate GitHub PAT</AccordionTrigger>
                                        <AccordionContent>
                                            <div className="space-y-6">
                                                <div>
                                                    <h4 className="font-medium mb-2">Generating a Token:</h4>
                                                    <ol className="list-decimal pl-5 space-y-3">
                                                        <li>Go to your GitHub <a href="https://github.com/settings/tokens" target="_blank" rel="noopener noreferrer" className="font-medium underline underline-offset-4">Developer settings</a>.</li>
                                                        <li>Click on <strong>Personal access tokens</strong>, then choose <strong>Tokens (classic)</strong> or <strong>Fine-grained tokens</strong> (recommended if available and suitable).</li>
                                                        <li>Click <strong>Generate new token</strong> (and choose the appropriate type).</li>
                                                        <li>Give your token a descriptive name (e.g., "SurfSense Connector").</li>
                                                        <li>Set an expiration date for the token (recommended for security).</li>
                                                        <li>Under <strong>Select scopes</strong> (for classic tokens) or <strong>Repository access</strong> (for fine-grained tokens), grant the necessary permissions. At minimum, the <strong>repo</strong> scope (or equivalent read access to repositories for fine-grained tokens) is required to read repository content.</li>
                                                        <li>Click <strong>Generate token</strong>.</li>
                                                        <li><strong>Important:</strong> Copy your new PAT immediately. You won't be able to see it again after leaving the page.</li>
                                                    </ol>
                                                </div>
                                            </div>
                                        </AccordionContent>
                                    </AccordionItem>
                                    <AccordionItem value="connect_app">
                                        <AccordionTrigger className="text-lg font-medium">Step 2: Connect in SurfSense</AccordionTrigger>
                                        <AccordionContent className="space-y-4">
                                            <ol className="list-decimal pl-5 space-y-3">
                                                <li>Navigate to the "Connect GitHub" tab.</li>
                                                <li>Enter a name for your connector.</li>
                                                <li>Paste the copied GitHub PAT into the "GitHub Personal Access Token (PAT)" field.</li>
                                                <li>Click <strong>Fetch Repositories</strong>.</li>
                                                <li>If the PAT is valid, you'll see a list of your accessible repositories.</li>
                                                <li>Select the repositories you want SurfSense to index using the checkboxes.</li>
                                                <li>Click the <strong>Create Connector</strong> button.</li>
                                                <li>If the connection is successful, you will be redirected and can start indexing from the Connectors page.</li>
                                            </ol>
                                        </AccordionContent>
                                    </AccordionItem>
                                </Accordion>
                            </CardContent>
                        </Card>
                    </TabsContent>
                </Tabs>
            </motion.div>
        </div>
    );
 } 
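
The "Fetch Repositories" step documented above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: it assumes the PAT is sent directly to GitHub's REST endpoint `GET /user/repos`, whereas SurfSense may well route the call through its own backend. `buildRepoListRequest` and `RepoListRequest` are hypothetical names introduced only for this example.

```typescript
// Hypothetical helper (not part of this PR): describes the HTTP request that
// lists the repositories visible to a GitHub Personal Access Token.
interface RepoListRequest {
  url: string;
  method: "GET";
  headers: Record<string, string>;
}

function buildRepoListRequest(pat: string, perPage = 100): RepoListRequest {
  // Pasted tokens often carry stray whitespace, so trim before use.
  const token = pat.trim();
  if (token.length === 0) {
    throw new Error("A GitHub PAT is required");
  }
  return {
    // GitHub caps per_page at 100; larger accounts need pagination.
    url: `https://api.github.com/user/repos?per_page=${perPage}`,
    method: "GET",
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${token}`,
    },
  };
}
```

The returned object can be passed to `fetch(req.url, { method: req.method, headers: req.headers })`. A 401 response would then correspond to the "PAT is not valid" case in the steps above.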
--- a/surfsense_web/app/dashboard/[search_space_id]/connectors/add/linear-connector/page.tsx
+++ b/surfsense_web/app/dashboard/[search_space_id]/connectors/add/linear-connector/page.tsx
@ -0,0 +1,321 @@
 "use client";
 import { useState } from "react";
 import { useRouter, useParams } from "next/navigation";
 import { motion } from "framer-motion";
 import { zodResolver } from "@hookform/resolvers/zod";
 import { useForm } from "react-hook-form";
 import * as z from "zod";
 import { toast } from "sonner";
 import { ArrowLeft, Check, Info, Loader2 } from "lucide-react";
 import { useSearchSourceConnectors } from "@/hooks/useSearchSourceConnectors";
 import {
  Form,
  FormControl,
  FormDescription,
  FormField,
  FormItem,
  FormLabel,
  FormMessage,
 } from "@/components/ui/form";
 import { Input } from "@/components/ui/input";
 import { Button } from "@/components/ui/button";
 import {
  Card,
  CardContent,
  CardDescription,
  CardFooter,
  CardHeader,
  CardTitle,
 } from "@/components/ui/card";
 import {
  Alert,
  AlertDescription,
  AlertTitle,
 } from "@/components/ui/alert";
 import {
  Accordion,
  AccordionContent,
  AccordionItem,
  AccordionTrigger,
 } from "@/components/ui/accordion";
 import { Tabs, TabsContent, TabsList, TabsTrigger } from "@/components/ui/tabs";
 // Define the form schema with Zod
 const linearConnectorFormSchema = z.object({
  name: z.string().min(3, {
    message: "Connector name must be at least 3 characters.",
  }),
  api_key: z.string().min(10, {
    message: "Linear API Key is required and must be valid.",
  }).regex(/^lin_api_/, {
    message: "Linear API Key should start with 'lin_api_'",
  }),
 });
 // Define the type for the form values
 type LinearConnectorFormValues = z.infer<typeof linearConnectorFormSchema>;
 export default function LinearConnectorPage() {
  const router = useRouter();
  const params = useParams();
  const searchSpaceId = params.search_space_id as string;
  const [isSubmitting, setIsSubmitting] = useState(false);
  const { createConnector } = useSearchSourceConnectors();
  // Initialize the form
  const form = useForm<LinearConnectorFormValues>({
    resolver: zodResolver(linearConnectorFormSchema),
    defaultValues: {
      name: "Linear Connector",
      api_key: "",
    },
  });
  // Handle form submission
  const onSubmit = async (values: LinearConnectorFormValues) => {
    setIsSubmitting(true);
    try {
      await createConnector({
        name: values.name,
        connector_type: "LINEAR_CONNECTOR",
        config: {
          LINEAR_API_KEY: values.api_key,
        },
        is_indexable: true,
        last_indexed_at: null,
      });
      toast.success("Linear connector created successfully!");
      // Navigate back to connectors page
      router.push(`/dashboard/${searchSpaceId}/connectors`);
    } catch (error) {
      console.error("Error creating connector:", error);
      toast.error(error instanceof Error ? error.message : "Failed to create connector");
    } finally {
      setIsSubmitting(false);
    }
  };
  return (
    <div className="container mx-auto py-8 max-w-3xl">
      <Button
        variant="ghost"
        className="mb-6"
        onClick={() => router.push(`/dashboard/${searchSpaceId}/connectors/add`)}
      >
        <ArrowLeft className="mr-2 h-4 w-4" />
        Back to Connectors
      </Button>
      <motion.div
        initial={{ opacity: 0, y: 20 }}
        animate={{ opacity: 1, y: 0 }}
        transition={{ duration: 0.5 }}
      >
        <Tabs defaultValue="connect" className="w-full">
          <TabsList className="grid w-full grid-cols-2 mb-6">
            <TabsTrigger value="connect">Connect</TabsTrigger>
            <TabsTrigger value="documentation">Documentation</TabsTrigger>
          </TabsList>
          <TabsContent value="connect">
            <Card className="border-2 border-border">
              <CardHeader>
                <CardTitle className="text-2xl font-bold">Connect Linear Workspace</CardTitle>
                <CardDescription>
                  Integrate with Linear to search and retrieve information from your issues and comments. This connector can index your Linear content for search.
                </CardDescription>
              </CardHeader>
              <CardContent>
                <Alert className="mb-6 bg-muted">
                  <Info className="h-4 w-4" />
                  <AlertTitle>Linear API Key Required</AlertTitle>
                  <AlertDescription>
                    You'll need a Linear API key to use this connector. You can create one in{" "}
                    <a 
                      href="https://linear.app/settings/api" 
                      target="_blank" 
                      rel="noopener noreferrer"
                      className="font-medium underline underline-offset-4"
                    >
                      Linear API Settings
                    </a>
                  </AlertDescription>
                </Alert>
                <Form {...form}>
                  <form onSubmit={form.handleSubmit(onSubmit)} className="space-y-6">
                    <FormField
                      control={form.control}
                      name="name"
                      render={({ field }) => (
                        <FormItem>
                          <FormLabel>Connector Name</FormLabel>
                          <FormControl>
                            <Input placeholder="My Linear Connector" {...field} />
                          </FormControl>
                          <FormDescription>
                            A friendly name to identify this connector.
                          </FormDescription>
                          <FormMessage />
                        </FormItem>
                      )}
                    />
                    <FormField
                      control={form.control}
                      name="api_key"
                      render={({ field }) => (
                        <FormItem>
                          <FormLabel>Linear API Key</FormLabel>
                          <FormControl>
                            <Input 
                              type="password" 
                              placeholder="lin_api_..." 
                              {...field} 
                            />
                          </FormControl>
                          <FormDescription>
                            Your Linear API Key will be encrypted and stored securely. It typically starts with "lin_api_".
                          </FormDescription>
                          <FormMessage />
                        </FormItem>
                      )}
                    />
                    <div className="flex justify-end">
                      <Button 
                        type="submit" 
                        disabled={isSubmitting}
                        className="w-full sm:w-auto"
                      >
                        {isSubmitting ? (
                          <>
                            <Loader2 className="mr-2 h-4 w-4 animate-spin" />
                            Connecting...
                          </>
                        ) : (
                          <>
                            <Check className="mr-2 h-4 w-4" />
                            Connect Linear
                          </>
                        )}
                      </Button>
                    </div>
                  </form>
                </Form>
              </CardContent>
              <CardFooter className="flex flex-col items-start border-t bg-muted/50 px-6 py-4">
                <h4 className="text-sm font-medium">What you get with Linear integration:</h4>
                <ul className="mt-2 list-disc pl-5 text-sm text-muted-foreground">
                  <li>Search through all your Linear issues and comments</li>
                  <li>Access issue titles, descriptions, and full discussion threads</li>
                  <li>Connect your team's project management directly to your search space</li>
                  <li>Keep your search results up to date with the latest Linear content</li>
                  <li>Index your Linear issues for enhanced search capabilities</li>
                </ul>
              </CardFooter>
            </Card>
          </TabsContent>
          <TabsContent value="documentation">
            <Card className="border-2 border-border">
              <CardHeader>
                <CardTitle className="text-2xl font-bold">Linear Connector Documentation</CardTitle>
                <CardDescription>
                  Learn how to set up and use the Linear connector to index your project management data.
                </CardDescription>
              </CardHeader>
              <CardContent className="space-y-6">
                <div>
                  <h3 className="text-xl font-semibold mb-2">How it works</h3>
                  <p className="text-muted-foreground">
                    The Linear connector uses the Linear GraphQL API to fetch all issues and comments that the API key has access to within a workspace.
                  </p>
                  <ul className="mt-2 list-disc pl-5 text-muted-foreground">
                    <li>For follow-up indexing runs, the connector retrieves only the issues and comments that have been updated since the last indexing attempt.</li>
                    <li>Indexing is configured to run periodically, so updates should appear in your search results within minutes.</li>
                  </ul>
                </div>
                <Accordion type="single" collapsible className="w-full">
                  <AccordionItem value="authorization">
                    <AccordionTrigger className="text-lg font-medium">Authorization</AccordionTrigger>
                    <AccordionContent className="space-y-4">
                      <Alert className="bg-muted">
                        <Info className="h-4 w-4" />
                        <AlertTitle>Read-Only Access is Sufficient</AlertTitle>
                        <AlertDescription>
                          You only need a read-only API key for this connector to work. This limits the permissions to just reading your Linear data.
                        </AlertDescription>
                      </Alert>
                      <div className="space-y-6">
                        <div>
                          <h4 className="font-medium mb-2">Step 1: Create an API key</h4>
                          <ol className="list-decimal pl-5 space-y-3">
                            <li>Log in to your Linear account</li>
                            <li>Navigate to <a href="https://linear.app/settings/api" target="_blank" rel="noopener noreferrer" className="font-medium underline underline-offset-4">https://linear.app/settings/api</a> in your browser.</li>
                            <li>Alternatively, click on your profile picture → Settings → API</li>
                            <li>Click the <strong>+ New API key</strong> button.</li>
                            <li>Enter a description for your key (like "Search Connector").</li>
                            <li>Select "Read-only" as the permission.</li>
                            <li>Click <strong>Create</strong> to generate the API key.</li>
                            <li>Copy the generated API key, which starts with 'lin_api_', right away. It will only be shown once.</li>
                          </ol>
                        </div>
                        <div>
                          <h4 className="font-medium mb-2">Step 2: Grant necessary access</h4>
                          <p className="text-muted-foreground mb-3">
                            The API key will have access to all issues and comments that your user account can see. If you're creating the key as an admin, it will have access to all issues in the workspace.
                          </p>
                          <Alert className="bg-muted">
                            <Info className="h-4 w-4" />
                            <AlertTitle>Data Privacy</AlertTitle>
                            <AlertDescription>
                              Only issues and comments will be indexed. Linear attachments and linked files are not indexed by this connector.
                            </AlertDescription>
                          </Alert>
                        </div>
                      </div>
                    </AccordionContent>
                  </AccordionItem>
                  <AccordionItem value="indexing">
                    <AccordionTrigger className="text-lg font-medium">Indexing</AccordionTrigger>
                    <AccordionContent className="space-y-4">
                      <ol className="list-decimal pl-5 space-y-3">
                        <li>Navigate to the Connector Dashboard and select the <strong>Linear</strong> Connector.</li>
                        <li>Paste the <strong>API Key</strong> into the form field.</li>
                        <li>Click <strong>Connect Linear</strong> to establish the connection.</li>
                        <li>Once connected, your Linear issues will be indexed automatically.</li>
                      </ol>
                      <Alert className="bg-muted">
                        <Info className="h-4 w-4" />
                        <AlertTitle>What Gets Indexed</AlertTitle>
                        <AlertDescription>
                          <p className="mb-2">The Linear connector indexes the following data:</p>
                          <ul className="list-disc pl-5">
                            <li>Issue titles and identifiers (e.g., PROJ-123)</li>
                            <li>Issue descriptions</li>
                            <li>Issue comments</li>
                            <li>Issue status and metadata</li>
                          </ul>
                        </AlertDescription>
                      </Alert>
                    </AccordionContent>
                  </AccordionItem>
                </Accordion>
              </CardContent>
            </Card>
          </TabsContent>
        </Tabs>
      </motion.div>
    </div>
  );
 }
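
The "How it works" text in this page describes a full fetch on the first run and an incremental fetch (only items updated since the last attempt) on follow-up runs. A minimal sketch of how such a request could be built is shown below. It is not the connector's actual implementation: the helper name `buildIssuesRequest` is hypothetical, and the query shape (an `issues` connection filtered by `updatedAt`) is an assumption about Linear's GraphQL schema that should be checked against Linear's API documentation.

```typescript
// Hypothetical helper (not part of this PR): builds the JSON body for a
// Linear GraphQL request. Field and filter names are assumptions.
interface GraphQLRequest {
  query: string;
  variables: Record<string, unknown>;
}

function buildIssuesRequest(
  lastIndexedAt: string | null,
  pageSize = 50,
): GraphQLRequest {
  const query = `
    query Issues($first: Int!, $filter: IssueFilter) {
      issues(first: $first, filter: $filter) {
        nodes { id identifier title description updatedAt }
        pageInfo { hasNextPage endCursor }
      }
    }`;

  // First run: no filter, fetch everything the API key can see.
  // Follow-up runs: only issues updated after the last indexing attempt.
  const variables: Record<string, unknown> = { first: pageSize };
  if (lastIndexedAt !== null) {
    variables.filter = { updatedAt: { gt: lastIndexedAt } };
  }
  return { query, variables };
}
```

The body would then be sent as JSON in a POST to Linear's GraphQL endpoint with the API key in the `Authorization` header. Keeping the request construction in a pure function makes the first-run and follow-up cases easy to test without network access.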
--- a/surfsense_web/app/dashboard/[search_space_id]/connectors/add/page.tsx
+++ b/surfsense_web/app/dashboard/[search_space_id]/connectors/add/page.tsx
@ -1,57 +1,61 @@
 "use client";
-import { cn } from "@/lib/utils";
+import { Badge } from "@/components/ui/badge";
 import { Button } from "@/components/ui/button";
 import { Card, CardContent, CardFooter, CardHeader } from "@/components/ui/card";
 import { Collapsible, CollapsibleContent, CollapsibleTrigger } from "@/components/ui/collapsible";
 import {
  IconBrandGoogle,
  IconBrandSlack,
  IconBrandWindows,
  IconBrandDiscord,
  IconSearch,
  IconMessages,
  IconDatabase,
  IconCloud,
  IconBrandGithub,
  IconBrandNotion,
-  IconMail,
+  IconBrandSlack,
  IconBrandWindows,
  IconBrandZoom,
  IconChevronDown,
  IconChevronRight,
  IconMail,
  IconWorldWww,
  IconTicket,
  IconLayoutKanban,
 } from "@tabler/icons-react";
-import { motion, AnimatePresence } from "framer-motion";
+import { AnimatePresence, motion } from "framer-motion";
 import { useState } from "react";
 import { useParams } from "next/navigation";
 import Link from "next/link";
-import { Button } from "@/components/ui/button";
+import { useParams } from "next/navigation";
-import { Separator } from "@/components/ui/separator";
+import { useState } from "react";
-import { Collapsible, CollapsibleContent, CollapsibleTrigger } from "@/components/ui/collapsible";
+
 // Define the Connector type
 interface Connector {
  id: string;
  title: string;
  description: string;
  icon: React.ReactNode;
  status: "available" | "coming-soon" | "connected";
 }
 interface ConnectorCategory {
  id: string;
  title: string;
  connectors: Connector[];
 }
 // Define connector categories and their connectors
-const connectorCategories = [
+const connectorCategories: ConnectorCategory[] = [
  {
    id: "search-engines",
    title: "Search Engines",
    description: "Connect to search engines to enhance your research capabilities.",
    icon: <IconSearch className="h-5 w-5" />,
    connectors: [
      {
        id: "tavily-api",
-        title: "Tavily Search API",
+        title: "Tavily API",
-        description: "Connect to Tavily Search API to search the web.",
+        description: "Search the web using the Tavily API",
-        icon: <IconSearch className="h-6 w-6" />,
+        icon: <IconWorldWww className="h-6 w-6" />,
        status: "available",
      },
-      {
+      // Add other search engine connectors like Tavily, Serper if they have UI config
        id: "serper-api",
        title: "Serper API",
        description: "Connect to Serper API to search the web.",
        icon: <IconBrandGoogle className="h-6 w-6" />,
        status: "coming-soon",
      },
    ],
  },
  {
    id: "team-chats",
    title: "Team Chats",
    description: "Connect to your team communication platforms.",
    icon: <IconMessages className="h-5 w-5" />,
    connectors: [
      {
        id: "slack-connector",
@ -76,11 +80,29 @@ const connectorCategories = [
      },
    ],
  },
  {
    id: "project-management",
    title: "Project Management",
    connectors: [
      {
        id: "linear-connector",
        title: "Linear",
        description: "Connect to Linear to search issues, comments and project data.",
        icon: <IconLayoutKanban className="h-6 w-6" />,
        status: "available",
      },
      {
        id: "jira-connector",
        title: "Jira",
        description: "Connect to Jira to search issues, tickets and project data.",
        icon: <IconTicket className="h-6 w-6" />,
        status: "coming-soon",
      },
    ],
  },
  {
    id: "knowledge-bases",
    title: "Knowledge Bases",
    description: "Connect to your knowledge bases and documentation.",
    icon: <IconDatabase className="h-5 w-5" />,
    connectors: [
      {
        id: "notion-connector",
@ -90,19 +112,17 @@ const connectorCategories = [
        status: "available",
      },
      {
-        id: "github",
+        id: "github-connector",
        title: "GitHub",
-        description: "Connect to GitHub repositories to access code and documentation.",
+        description: "Connect a GitHub PAT to index code and docs from accessible repositories.",
        icon: <IconBrandGithub className="h-6 w-6" />,
-        status: "coming-soon",
+        status: "available",
      },
    ],
  },
  {
    id: "communication",
    title: "Communication",
    description: "Connect to your email and meeting platforms.",
    icon: <IconMail className="h-5 w-5" />,
    connectors: [
      {
        id: "gmail",
@ -122,10 +142,48 @@ const connectorCategories = [
  },
 ];
 // Animation variants
 const fadeIn = {
  hidden: { opacity: 0 },
  visible: { opacity: 1, transition: { duration: 0.4 } }
 };
 const staggerContainer = {
  hidden: { opacity: 0 },
  visible: {
    opacity: 1,
    transition: {
      staggerChildren: 0.1
    }
  }
 };
 const cardVariants = {
  hidden: { opacity: 0, y: 20 },
  visible: { 
    opacity: 1, 
    y: 0,
    transition: { 
      type: "spring",
      stiffness: 260,
      damping: 20
    }
  },
  hover: { 
    scale: 1.02,
    boxShadow: "0 10px 15px -3px rgba(0, 0, 0, 0.1), 0 4px 6px -2px rgba(0, 0, 0, 0.05)",
    transition: { 
      type: "spring",
      stiffness: 400,
      damping: 10
    }
  }
 };
 export default function ConnectorsPage() {
  const params = useParams();
  const searchSpaceId = params.search_space_id as string;
-  const [expandedCategories, setExpandedCategories] = useState<string[]>(["search-engines"]);
+  const [expandedCategories, setExpandedCategories] = useState<string[]>(["search-engines", "knowledge-bases", "project-management"]);
  const toggleCategory = (categoryId: string) => {
    setExpandedCategories(prev => 
@ -136,121 +194,142 @@ export default function ConnectorsPage() {
  };
  return (
-    <div className="container mx-auto py-8 max-w-6xl">
+    <div className="container mx-auto py-12 max-w-6xl">
      <motion.div
-        initial={{ opacity: 0, y: 20 }}
+        initial={{ opacity: 0, y: 30 }}
        animate={{ opacity: 1, y: 0 }}
-        transition={{ duration: 0.5 }}
+        transition={{ 
-        className="mb-8 text-center"
+          duration: 0.6,
          ease: [0.22, 1, 0.36, 1]
        }}
        className="mb-12 text-center"
      >
-        <h1 className="text-3xl font-bold tracking-tight">Connect Your Tools</h1>
+        <h1 className="text-4xl font-bold tracking-tight bg-gradient-to-r from-indigo-500 to-purple-500 bg-clip-text text-transparent">
-        <p className="text-muted-foreground mt-2">
+          Connect Your Tools
        </h1>
        <p className="text-muted-foreground mt-3 text-lg max-w-2xl mx-auto">
          Integrate with your favorite services to enhance your research capabilities.
        </p>
      </motion.div>
-      <div className="space-y-6">
+      <motion.div 
-        {connectorCategories.map((category, categoryIndex) => (
+        className="space-y-8"
-          <Collapsible
+        initial="hidden"
        animate="visible"
        variants={staggerContainer}
      >
        {connectorCategories.map((category) => (
          <motion.div 
            key={category.id}
-            open={expandedCategories.includes(category.id)}
+            variants={fadeIn}
-            onOpenChange={() => toggleCategory(category.id)}
+            className="rounded-lg border bg-card text-card-foreground shadow-sm"
            className="border rounded-lg overflow-hidden bg-card"
          >
-            <CollapsibleTrigger asChild>
+            <Collapsible
-              <motion.div
+              open={expandedCategories.includes(category.id)}
-                initial={{ opacity: 0, y: 10 }}
+              onOpenChange={() => toggleCategory(category.id)}
-                animate={{ opacity: 1, y: 0 }}
+              className="w-full"
-                transition={{ duration: 0.3, delay: categoryIndex * 0.1 }}
+            >
-                className="p-4 flex items-center justify-between cursor-pointer hover:bg-accent/50 transition-colors"
+              <div className="flex items-center justify-between space-x-4 p-4">
-              >
+                <h3 className="text-xl font-semibold">{category.title}</h3>
-                <div className="flex items-center gap-3">
+                <CollapsibleTrigger asChild>
-                  <div className="p-2 rounded-md bg-primary/10 text-primary">
+                  <Button variant="ghost" size="sm" className="w-9 p-0 hover:bg-muted">
                    {category.icon}
                  </div>
                  <div>
                    <h2 className="text-xl font-semibold">{category.title}</h2>
                    <p className="text-sm text-muted-foreground">{category.description}</p>
                  </div>
                </div>
                <IconChevronRight 
                  className={cn(
                    "h-5 w-5 text-muted-foreground transition-transform duration-200",
                    expandedCategories.includes(category.id) && "rotate-90"
                  )} 
                />
              </motion.div>
            </CollapsibleTrigger>
            <CollapsibleContent>
              <Separator />
              <div className="p-4 grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">
                <AnimatePresence>
                  {category.connectors.map((connector, index) => (
                    <motion.div
-                      key={connector.id}
+                      animate={{ rotate: expandedCategories.includes(category.id) ? 180 : 0 }}
-                      initial={{ opacity: 0, scale: 0.95 }}
+                      transition={{ duration: 0.3, ease: "easeInOut" }}
                      animate={{ opacity: 1, scale: 1 }}
                      exit={{ opacity: 0, scale: 0.95 }}
                      transition={{ 
                        duration: 0.2, 
                        delay: index * 0.05,
                        type: "spring",
                        stiffness: 300,
                        damping: 30
                      }}
                      className={cn(
                        "relative group flex flex-col p-4 rounded-lg border",
                        connector.status === "coming-soon" ? "opacity-70" : ""
                      )}
                    >
-                      <div className="absolute inset-0 opacity-0 group-hover:opacity-100 transition duration-200 bg-gradient-to-t from-accent/50 to-transparent rounded-lg pointer-events-none" />
+                      <IconChevronDown className="h-5 w-5" />
                      <div className="mb-4 relative z-10 text-primary">
                        {connector.icon}
                      </div>
                      <div className="flex items-center justify-between mb-2">
                        <h3 className="text-lg font-semibold group-hover:translate-x-1 transition duration-200">
                          {connector.title}
                        </h3>
                        {connector.status === "coming-soon" && (
                          <span className="text-xs bg-muted px-2 py-1 rounded-full">Coming soon</span>
                        )}
                      </div>
                      <p className="text-sm text-muted-foreground mb-4 flex-grow">
                        {connector.description}
                      </p>
                      {connector.status === "available" ? (
                        <Link 
                          href={`/dashboard/${searchSpaceId}/connectors/add/${connector.id}`}
                          className="w-full mt-auto"
                        >
                          <Button 
                            variant="default"
                            className="w-full"
                          >
                            Connect
                          </Button>
                        </Link>
                      ) : (
                        <Button 
                          variant="outline"
                          className="w-full mt-auto"
                          disabled
                        >
                          Notify Me
                        </Button>
                      )}
                    </motion.div>
-                  ))}
+                    <span className="sr-only">Toggle</span>
-                </AnimatePresence>
+                  </Button>
                </CollapsibleTrigger>
              </div>
-            </CollapsibleContent>
+              
-          </Collapsible>
+              <CollapsibleContent>
                <AnimatePresence>
                  <motion.div 
                    className="grid grid-cols-1 gap-6 sm:grid-cols-2 lg:grid-cols-3 p-4"
                    variants={staggerContainer}
                    initial="hidden"
                    animate="visible"
                    exit="hidden"
                  >
                    {category.connectors.map((connector) => (
                      <motion.div
                        key={connector.id}
                        variants={cardVariants}
                        whileHover="hover"
                        className="col-span-1"
                      >
                        <Card className="h-full flex flex-col overflow-hidden border-transparent transition-all duration-200 hover:border-primary/50">
                          <CardHeader className="flex-row items-center gap-4 pb-2">
                            <div className="flex h-12 w-12 items-center justify-center rounded-lg bg-primary/10 dark:bg-primary/20">
                              <motion.div
                                whileHover={{ rotate: 5, scale: 1.1 }}
                                className="text-primary"
                              >
                                {connector.icon}
                              </motion.div>
                            </div>
                            <div>
                              <div className="flex items-center gap-2">
                                <h3 className="font-medium">{connector.title}</h3>
                                {connector.status === "coming-soon" && (
                                  <Badge variant="outline" className="text-xs bg-amber-100 dark:bg-amber-950 text-amber-800 dark:text-amber-300 border-amber-200 dark:border-amber-800">
                                    Coming soon
                                  </Badge>
                                )}
                                {connector.status === "connected" && (
                                  <Badge variant="outline" className="text-xs bg-green-100 dark:bg-green-950 text-green-800 dark:text-green-300 border-green-200 dark:border-green-800">
                                    Connected
                                  </Badge>
                                )}
                              </div>
                            </div>
                          </CardHeader>
                          <CardContent className="pb-4">
                            <p className="text-sm text-muted-foreground">
                              {connector.description}
                            </p>
                          </CardContent>
                          <CardFooter className="mt-auto pt-2">
                            {connector.status === 'available' && (
                              <Link href={`/dashboard/${searchSpaceId}/connectors/add/${connector.id}`} className="w-full">
                                <Button variant="default" className="w-full group">
                                  <span>Connect</span>
                                  <motion.div
                                    className="ml-1"
                                    initial={{ x: 0 }}
                                    whileHover={{ x: 3 }}
                                    transition={{ type: "spring", stiffness: 400, damping: 10 }}
                                  >
                                    <IconChevronRight className="h-4 w-4" />
                                  </motion.div>
                                </Button>
                              </Link>
                            )}
                            {connector.status === 'coming-soon' && (
                              <Button variant="outline" disabled className="w-full opacity-70">
                                Coming Soon
                              </Button>
                            )}
                            {connector.status === 'connected' && (
                              <Button variant="outline" className="w-full border-green-500 text-green-600 hover:bg-green-50 dark:hover:bg-green-950">
                                Manage
                              </Button>
                            )}
                          </CardFooter>
                        </Card>
                      </motion.div>
                    ))}
                  </motion.div>
                </AnimatePresence>
              </CollapsibleContent>
            </Collapsible>
          </motion.div>
        ))}
-      </div>
+      </motion.div>
    </div>
  );
 }
--- a/surfsense_web/app/dashboard/[search_space_id]/documents/(manage)/page.tsx
+++ b/surfsense_web/app/dashboard/[search_space_id]/documents/(manage)/page.tsx
@ -1,6 +1,7 @@
 "use client";
-import { cn } from "@/lib/utils";
+import { DocumentViewer } from "@/components/document-viewer";
 import { JsonMetadataViewer } from "@/components/json-metadata-viewer";
 import {
    AlertDialog,
    AlertDialogAction,
@ -12,7 +13,6 @@ import {
    AlertDialogTitle,
    AlertDialogTrigger,
 } from "@/components/ui/alert-dialog";
 import { Badge } from "@/components/ui/badge";
 import { Button } from "@/components/ui/button";
 import { Checkbox } from "@/components/ui/checkbox";
 import {
@ -43,6 +43,9 @@ import {
    TableHeader,
    TableRow,
 } from "@/components/ui/table";
 import { useDocuments } from "@/hooks/use-documents";
 import { cn } from "@/lib/utils";
 import { IconBrandGithub, IconBrandNotion, IconBrandSlack, IconBrandYoutube, IconLayoutKanban } from "@tabler/icons-react";
 import {
    ColumnDef,
    ColumnFiltersState,
@ -59,6 +62,7 @@ import {
    getSortedRowModel,
    useReactTable,
 } from "@tanstack/react-table";
 import { AnimatePresence, motion } from "framer-motion";
 import {
    AlertCircle,
    ChevronDown,
@ -70,31 +74,22 @@ import {
    CircleAlert,
    CircleX,
    Columns3,
    Filter,
    ListFilter,
    Plus,
    FileText,
    Globe,
    MessageSquare,
    FileX,
    File,
-    Trash,
+    FileX,
    Filter,
    Globe,
    ListFilter,
    MoreHorizontal,
-    Webhook,
+    Trash,
    Webhook
 } from "lucide-react";
 import { useEffect, useId, useMemo, useRef, useState, useContext } from "react";
 import { motion, AnimatePresence } from "framer-motion";
 import { useParams } from "next/navigation";
-import { useDocuments } from "@/hooks/use-documents";
+import React, { useContext, useEffect, useId, useMemo, useRef, useState } from "react";
 import React from "react";
 import { toast } from "sonner";
 import ReactMarkdown from "react-markdown";
 import rehypeRaw from "rehype-raw";
 import rehypeSanitize from "rehype-sanitize";
 import remarkGfm from "remark-gfm";
-import { DocumentViewer } from "@/components/document-viewer";
+import { toast } from "sonner";
 import { JsonMetadataViewer } from "@/components/json-metadata-viewer";
 import { IconBrandNotion, IconBrandSlack, IconBrandYoutube } from "@tabler/icons-react";
 // Define animation variants for reuse
 const fadeInScale = {
@ -114,7 +109,7 @@ const fadeInScale = {
 type Document = {
    id: number;
    title: string;
-    document_type: "EXTENSION" | "CRAWLED_URL" | "SLACK_CONNECTOR" | "NOTION_CONNECTOR" | "FILE" | "YOUTUBE_VIDEO";
+    document_type: "EXTENSION" | "CRAWLED_URL" | "SLACK_CONNECTOR" | "NOTION_CONNECTOR" | "FILE" | "YOUTUBE_VIDEO" | "LINEAR_CONNECTOR";
    document_metadata: any;
    content: string;
    created_at: string;
@ -142,6 +137,8 @@ const documentTypeIcons = {
    NOTION_CONNECTOR: IconBrandNotion,
    FILE: File,
    YOUTUBE_VIDEO: IconBrandYoutube,
    GITHUB_CONNECTOR: IconBrandGithub,
    LINEAR_CONNECTOR: IconLayoutKanban,
 } as const;
 const columns: ColumnDef<Document>[] = [
@ -1028,4 +1025,5 @@ function RowActions({ row }: { row: Row<Document> }) {
    );
 }
-export { DocumentsTable }
+export { DocumentsTable };
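
Review note on the hunks above: as shown, the `document_type` union gains `"LINEAR_CONNECTOR"` but not `"GITHUB_CONNECTOR"`, while `documentTypeIcons` lists both. One way to keep the two from drifting is to derive the union from the keys of the icon map. The following is a minimal sketch of that idea, not the code in this commit. The string values stand in for the real icon components, the `EXTENSION` and `CRAWLED_URL` entries are assumed (they are not visible in the hunk), and `DocumentType` and `isDocumentType` are hypothetical names.

```typescript
// Sketch only: string placeholders replace the icon components so the
// example is self-contained. EXTENSION and CRAWLED_URL values are assumed.
const documentTypeIcons = {
  EXTENSION: "Webhook",
  CRAWLED_URL: "Globe",
  SLACK_CONNECTOR: "IconBrandSlack",
  NOTION_CONNECTOR: "IconBrandNotion",
  FILE: "File",
  YOUTUBE_VIDEO: "IconBrandYoutube",
  GITHUB_CONNECTOR: "IconBrandGithub",
  LINEAR_CONNECTOR: "IconLayoutKanban",
} as const;

// The union is derived from the map, so adding a key to the map
// automatically extends the type (hypothetical name).
type DocumentType = keyof typeof documentTypeIcons;

// Runtime guard for values arriving from the API (hypothetical helper).
// Only own keys count, so inherited names like "toString" are rejected.
function isDocumentType(value: string): value is DocumentType {
  return Object.prototype.hasOwnProperty.call(documentTypeIcons, value);
}

// Typed lookup that falls back to a default for unknown types.
function iconNameFor(value: string, fallback: string = "File"): string {
  return isDocumentType(value) ? documentTypeIcons[value] : fallback;
}
```

With this shape, `document_type: DocumentType` would accept `"GITHUB_CONNECTOR"` without a second edit to the union.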
--- a/surfsense_web/app/dashboard/[search_space_id]/researcher/[chat_id]/page.tsx
+++ b/surfsense_web/app/dashboard/[search_space_id]/researcher/[chat_id]/page.tsx
@ -240,7 +240,7 @@ const SourcesDialogContent = ({
 const ChatPage = () => {
  const [token, setToken] = React.useState<string | null>(null);
  const [activeTab, setActiveTab] = useState("");
-  const [dialogOpen, setDialogOpen] = useState(false);
+  const [dialogOpenId, setDialogOpenId] = useState<number | null>(null);
  const [sourcesPage, setSourcesPage] = useState(1);
  const [expandedSources, setExpandedSources] = useState(false);
  const [canScrollLeft, setCanScrollLeft] = useState(false);
@ -260,6 +260,13 @@ const ChatPage = () => {
  const { search_space_id, chat_id } = useParams();
  // Function to scroll terminal to bottom
  const scrollTerminalToBottom = () => {
    if (terminalMessagesRef.current) {
      terminalMessagesRef.current.scrollTop = terminalMessagesRef.current.scrollHeight;
    }
  };
  // Get token from localStorage on client side only
  React.useEffect(() => {
    setToken(localStorage.getItem('surfsense_bearer_token'));
@ -469,54 +476,60 @@ const ChatPage = () => {
    updateChat();
  }, [messages, status, chat_id, researchMode, selectedConnectors, search_space_id]);
-  // Log messages whenever they update and extract annotations from the latest assistant message if available
+  // Memoize connector sources to prevent excessive re-renders
-  useEffect(() => {
+  const processedConnectorSources = React.useMemo(() => {
-    console.log('Messages updated:', messages);
+    if (messages.length === 0) return connectorSources;
-
+    
-    // Extract annotations from the latest assistant message if available
+    // Only process when we have a complete message (not streaming)
    if (status !== 'ready') return connectorSources;
    // Find the latest assistant message
    const assistantMessages = messages.filter(msg => msg.role === 'assistant');
-    if (assistantMessages.length > 0) {
+    if (assistantMessages.length === 0) return connectorSources;
-      const latestAssistantMessage = assistantMessages[assistantMessages.length - 1];
+    
-      if (latestAssistantMessage?.annotations) {
+    const latestAssistantMessage = assistantMessages[assistantMessages.length - 1];
-        const annotations = latestAssistantMessage.annotations as any[];
+    if (!latestAssistantMessage?.annotations) return connectorSources;
-
+    
-        // Debug log to track streaming annotations
+    // Find the latest SOURCES annotation
-        if (process.env.NODE_ENV === 'development') {
+    const annotations = latestAssistantMessage.annotations as any[];
-          console.log('Streaming annotations:', annotations);
+    const sourcesAnnotations = annotations.filter(a => a.type === 'SOURCES');
-
+    
-          // Log counts of each annotation type
+    if (sourcesAnnotations.length === 0) return connectorSources;
-          const terminalInfoCount = annotations.filter(a => a.type === 'TERMINAL_INFO').length;
+    
-          const sourcesCount = annotations.filter(a => a.type === 'SOURCES').length;
+    const latestSourcesAnnotation = sourcesAnnotations[sourcesAnnotations.length - 1];
-          const answerCount = annotations.filter(a => a.type === 'ANSWER').length;
+    if (!latestSourcesAnnotation.content) return connectorSources;
-
+    
-          console.log(`Annotation counts - Terminal: ${terminalInfoCount}, Sources: ${sourcesCount}, Answer: ${answerCount}`);
+    // Use this content if it differs from current
-        }
+    return latestSourcesAnnotation.content;
-
+  }, [messages, status, connectorSources]);
-        // Process SOURCES annotation - get the last one to ensure we have the latest
+  
-        const sourcesAnnotations = annotations.filter(
+  // Update connector sources when processed value changes
-          (annotation) => annotation.type === 'SOURCES'
+  useEffect(() => {
-        );
+    if (processedConnectorSources !== connectorSources) {
-
+      setConnectorSources(processedConnectorSources);
        if (sourcesAnnotations.length > 0) {
          // Get the last SOURCES annotation to ensure we have the most recent one
          const latestSourcesAnnotation = sourcesAnnotations[sourcesAnnotations.length - 1];
          if (latestSourcesAnnotation.content) {
            setConnectorSources(latestSourcesAnnotation.content);
          }
        }
        // Check for terminal info annotations and scroll terminal to bottom if they exist
        const terminalInfoAnnotations = annotations.filter(
          (annotation) => annotation.type === 'TERMINAL_INFO'
        );
        if (terminalInfoAnnotations.length > 0) {
          // Schedule scrolling after the DOM has been updated
          setTimeout(scrollTerminalToBottom, 100);
        }
      }
    }
-  }, [messages]);
+  }, [processedConnectorSources, connectorSources]);
  // Check and scroll terminal when terminal info is available
  useEffect(() => {
    if (messages.length === 0 || status !== 'ready') return;
    // Find the latest assistant message
    const assistantMessages = messages.filter(msg => msg.role === 'assistant');
    if (assistantMessages.length === 0) return;
    const latestAssistantMessage = assistantMessages[assistantMessages.length - 1];
    if (!latestAssistantMessage?.annotations) return;
    // Check for terminal info annotations
    const annotations = latestAssistantMessage.annotations as any[];
    const terminalInfoAnnotations = annotations.filter(a => a.type === 'TERMINAL_INFO');
    if (terminalInfoAnnotations.length > 0) {
      // Schedule scrolling after the DOM has been updated
      setTimeout(scrollTerminalToBottom, 100);
    }
  }, [messages, status]);
  // Custom handleSubmit function to include selected connectors and answer type
  const handleSubmit = (e: React.FormEvent) => {
@ -543,24 +556,22 @@ const ChatPage = () => {
    messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' });
  };
  // Function to scroll terminal to bottom
  const scrollTerminalToBottom = () => {
    if (terminalMessagesRef.current) {
      terminalMessagesRef.current.scrollTop = terminalMessagesRef.current.scrollHeight;
    }
  };
  // Scroll to bottom when messages change
  useEffect(() => {
    scrollToBottom();
  }, [messages]);
-  // Set activeTab when connectorSources change
+  // Set activeTab when connectorSources change using a memoized value
-  useEffect(() => {
+  const activeTabValue = React.useMemo(() => {
-    if (connectorSources.length > 0) {
+    return connectorSources.length > 0 ? connectorSources[0].type : "";
      setActiveTab(connectorSources[0].type);
    }
  }, [connectorSources]);
  // Update activeTab when the memoized value changes
  useEffect(() => {
    if (activeTabValue && activeTabValue !== activeTab) {
      setActiveTab(activeTabValue);
    }
  }, [activeTabValue, activeTab]);
  // Scroll terminal to bottom when expanded
  useEffect(() => {
@ -617,49 +628,89 @@ const ChatPage = () => {
  };
  // Function to get a citation source by ID
-  const getCitationSource = (citationId: number): Source | null => {
+  const getCitationSource = React.useCallback((citationId: number, messageIndex?: number): Source | null => {
    if (!messages || messages.length === 0) return null;
-    // Find the latest assistant message
+    // If no specific message index is provided, use the latest assistant message
-    const assistantMessages = messages.filter(msg => msg.role === 'assistant');
+    if (messageIndex === undefined) {
-    if (assistantMessages.length === 0) return null;
+      // Find the latest assistant message
      const assistantMessages = messages.filter(msg => msg.role === 'assistant');
      if (assistantMessages.length === 0) return null;
-    const latestAssistantMessage = assistantMessages[assistantMessages.length - 1];
+      const latestAssistantMessage = assistantMessages[assistantMessages.length - 1];
-    if (!latestAssistantMessage?.annotations) return null;
+      if (!latestAssistantMessage?.annotations) return null;
-    // Find all SOURCES annotations
+      // Find all SOURCES annotations
-    const annotations = latestAssistantMessage.annotations as any[];
+      const annotations = latestAssistantMessage.annotations as any[];
-    const sourcesAnnotations = annotations.filter(
+      const sourcesAnnotations = annotations.filter(
-      (annotation) => annotation.type === 'SOURCES'
+        (annotation) => annotation.type === 'SOURCES'
-    );
+      );
-    // Get the latest SOURCES annotation
+      // Get the latest SOURCES annotation
-    if (sourcesAnnotations.length === 0) return null;
+      if (sourcesAnnotations.length === 0) return null;
-    const latestSourcesAnnotation = sourcesAnnotations[sourcesAnnotations.length - 1];
+      const latestSourcesAnnotation = sourcesAnnotations[sourcesAnnotations.length - 1];
-    if (!latestSourcesAnnotation.content) return null;
+      if (!latestSourcesAnnotation.content) return null;
-    // Flatten all sources from all connectors
+      // Flatten all sources from all connectors
-    const allSources: Source[] = [];
+      const allSources: Source[] = [];
-    latestSourcesAnnotation.content.forEach((connector: ConnectorSource) => {
+      latestSourcesAnnotation.content.forEach((connector: ConnectorSource) => {
-      if (connector.sources && Array.isArray(connector.sources)) {
+        if (connector.sources && Array.isArray(connector.sources)) {
-        connector.sources.forEach((source: SourceItem) => {
+          connector.sources.forEach((source: SourceItem) => {
-          allSources.push({
+            allSources.push({
-            id: source.id,
+              id: source.id,
-            title: source.title,
+              title: source.title,
-            description: source.description,
+              description: source.description,
-            url: source.url,
+              url: source.url,
-            connectorType: connector.type
+              connectorType: connector.type
            });
          });
-        });
+        }
-      }
+      });
    });
-    // Find the source with the matching ID
+      // Find the source with the matching ID
-    const foundSource = allSources.find(source => source.id === citationId);
+      const foundSource = allSources.find(source => source.id === citationId);
-    return foundSource || null;
+      return foundSource || null;
-  };
+    } else {
      // Use the specific message by index
      const message = messages[messageIndex];
      if (!message || message.role !== 'assistant' || !message.annotations) return null;
      // Find all SOURCES annotations
      const annotations = message.annotations as any[];
      const sourcesAnnotations = annotations.filter(
        (annotation) => annotation.type === 'SOURCES'
      );
      // Get the latest SOURCES annotation
      if (sourcesAnnotations.length === 0) return null;
      const latestSourcesAnnotation = sourcesAnnotations[sourcesAnnotations.length - 1];
      if (!latestSourcesAnnotation.content) return null;
      // Flatten all sources from all connectors
      const allSources: Source[] = [];
      latestSourcesAnnotation.content.forEach((connector: ConnectorSource) => {
        if (connector.sources && Array.isArray(connector.sources)) {
          connector.sources.forEach((source: SourceItem) => {
            allSources.push({
              id: source.id,
              title: source.title,
              description: source.description,
              url: source.url,
              connectorType: connector.type
            });
          });
        }
      });
      // Find the source with the matching ID
      const foundSource = allSources.find(source => source.id === citationId);
      return foundSource || null;
    }
  }, [messages]);
  return (
    <>
@ -685,7 +736,11 @@ const ChatPage = () => {
                <div className="flex-1">
                  <Card className="border-gray-300 dark:border-gray-700">
                    <CardContent className="p-3">
-                      <MarkdownViewer content={message.content} getCitationSource={getCitationSource} className="text-sm" />
+                      <MarkdownViewer 
                        content={message.content}
                        getCitationSource={(id) => getCitationSource(id, index)}
                        className="text-sm" 
                      />
                    </CardContent>
                  </Card>
                </div>
@ -856,7 +911,7 @@ const ChatPage = () => {
                              ))}
                              {connector.sources.length > INITIAL_SOURCES_DISPLAY && (
-                                <Dialog open={dialogOpen && activeTab === connector.type} onOpenChange={(open) => setDialogOpen(open)}>
+                                <Dialog open={dialogOpenId === connector.id} onOpenChange={(open) => setDialogOpenId(open ? connector.id : null)}>
                                  <DialogTrigger asChild>
                                    <Button variant="ghost" className="w-full text-sm text-gray-500 dark:text-gray-400">
                                      Show {connector.sources.length - INITIAL_SOURCES_DISPLAY} More Sources
@ -901,13 +956,16 @@ const ChatPage = () => {
                              return (
                                <MarkdownViewer
                                  content={latestAnswer.content.join('\n')}
-                                  getCitationSource={getCitationSource}
+                                  getCitationSource={(id) => getCitationSource(id, index)}
                                />
                              );
                            }
                            // Fallback to the message content if no ANSWER annotation is available
-                            return <MarkdownViewer content={message.content} getCitationSource={getCitationSource} />;
+                            return <MarkdownViewer 
                              content={message.content} 
                              getCitationSource={(id) => getCitationSource(id, index)} 
                            />;
                          })()}
                        </div>
                      }
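
Review note on `getCitationSource` above: the two branches differ only in which message they pick (the latest assistant message, or `messages[messageIndex]`), and the rest of the lookup is duplicated. A minimal sketch of one way to share that logic follows. It is not the code in this commit. The type shapes are assumptions inferred from the fields the diff reads, and `latestSourcesContent` and `findCitationSource` are hypothetical helper names. Unlike the original, the sketch also checks that the annotation content is an array before iterating.

```typescript
// Assumed shapes, inferred from the fields used in the diff.
type SourceItem = { id: number; title: string; description: string; url: string };
type ConnectorSource = { type: string; sources?: SourceItem[] };
type Source = SourceItem & { connectorType: string };
type Annotation = { type: string; content?: unknown };
type ChatMessage = { role: string; annotations?: Annotation[] };

// Return the content of the last SOURCES annotation on an assistant message.
function latestSourcesContent(message: ChatMessage | undefined): ConnectorSource[] | null {
  if (!message || message.role !== "assistant" || !message.annotations) return null;
  const sources = message.annotations.filter((a) => a.type === "SOURCES");
  if (sources.length === 0) return null;
  const latest = sources[sources.length - 1];
  return Array.isArray(latest.content) ? (latest.content as ConnectorSource[]) : null;
}

// Resolve a citation against one message: the one at messageIndex if given,
// otherwise the latest assistant message.
function findCitationSource(
  messages: ChatMessage[],
  citationId: number,
  messageIndex?: number
): Source | null {
  let message: ChatMessage | undefined;
  if (messageIndex === undefined) {
    // Walk backwards to find the most recent assistant message.
    for (let i = messages.length - 1; i >= 0; i--) {
      if (messages[i].role === "assistant") {
        message = messages[i];
        break;
      }
    }
  } else {
    message = messages[messageIndex];
  }

  const content = latestSourcesContent(message);
  if (!content) return null;

  // First match in connector order, same as flattening and then calling find().
  for (const connector of content) {
    if (!Array.isArray(connector.sources)) continue;
    const match = connector.sources.find((s) => s.id === citationId);
    if (match) {
      return {
        id: match.id,
        title: match.title,
        description: match.description,
        url: match.url,
        connectorType: connector.type,
      };
    }
  }
  return null;
}
```

Inside the component, the callback could then reduce to `React.useCallback((id, index) => findCitationSource(messages, id, index), [messages])`, and the pure helper can be unit-tested without rendering.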
--- a/surfsense_web/app/docs/[[...slug]]/page.tsx
+++ b/surfsense_web/app/docs/[[...slug]]/page.tsx
@ -0,0 +1,46 @@
 import { source } from '@/lib/source';
 import {
  DocsBody,
  DocsDescription,
  DocsPage,
  DocsTitle,
 } from 'fumadocs-ui/page';
 import { notFound } from 'next/navigation';
 import { getMDXComponents } from '@/mdx-components';
 export default async function Page(props: {
  params: Promise<{ slug?: string[] }>;
 }) {
  const params = await props.params;
  const page = source.getPage(params.slug);
  if (!page) notFound();
  const MDX = page.data.body;
  return (
    <DocsPage toc={page.data.toc} full={page.data.full}>
      <DocsTitle>{page.data.title}</DocsTitle>
      <DocsDescription>{page.data.description}</DocsDescription>
      <DocsBody>
        <MDX components={getMDXComponents()} />
      </DocsBody>
    </DocsPage>
  );
 }
 export async function generateStaticParams() {
  return source.generateParams();
 }
 export async function generateMetadata(props: {
  params: Promise<{ slug?: string[] }>;
 }) {
  const params = await props.params;
  const page = source.getPage(params.slug);
  if (!page) notFound();
  return {
    title: page.data.title,
    description: page.data.description,
  };
 }
--- a/surfsense_web/app/docs/layout.tsx
+++ b/surfsense_web/app/docs/layout.tsx
@ -0,0 +1,12 @@
 import { source } from '@/lib/source';
 import { DocsLayout } from 'fumadocs-ui/layouts/docs';
 import type { ReactNode } from 'react';
 import { baseOptions } from '@/app/layout.config';
 export default function Layout({ children }: { children: ReactNode }) {
  return (
    <DocsLayout tree={source.pageTree} {...baseOptions}>
      {children}
    </DocsLayout>
  );
 }
--- a/surfsense_web/app/globals.css
+++ b/surfsense_web/app/globals.css
@ -1,4 +1,6 @@
-@import "tailwindcss";
+@import 'tailwindcss';
@import 'fumadocs-ui/css/neutral.css';
@import 'fumadocs-ui/css/preset.css';
@plugin "tailwindcss-animate";
--- a/surfsense_web/app/layout.config.tsx
+++ b/surfsense_web/app/layout.config.tsx
@ -0,0 +1,7 @@
 import { BaseLayoutProps } from 'fumadocs-ui/layouts/shared';
 export const baseOptions: BaseLayoutProps = {
  nav: {
    title: 'SurfSense Documentation',
  },
 };
--- a/surfsense_web/app/layout.tsx
+++ b/surfsense_web/app/layout.tsx
@ -5,6 +5,7 @@ import { Roboto } from "next/font/google";
 import { Toaster } from "@/components/ui/sonner";
 import { ThemeProvider } from "@/components/theme/theme-provider";
 import { RootProvider } from 'fumadocs-ui/provider';
 const roboto = Roboto({ 
  subsets: ["latin"],
@ -64,8 +65,10 @@ export default async function RootLayout({
          disableTransitionOnChange
          defaultTheme="light"
        >
-          {children}
+          <RootProvider>
-          <Toaster />
+            {children}
            <Toaster />
          </RootProvider>
        </ThemeProvider>
      </body>
    </html>
--- a/surfsense_web/components/ModernHeroWithGradients.tsx
+++ b/surfsense_web/components/ModernHeroWithGradients.tsx
@ -1,6 +1,6 @@
 "use client";
 import { cn } from "@/lib/utils";
-import { IconArrowRight, IconBrandGithub, IconBrandDiscord } from "@tabler/icons-react";
+import { IconFileTypeDoc, IconBrandGithub, IconBrandDiscord } from "@tabler/icons-react";
 import Link from "next/link";
 import React from "react";
 import { motion } from "framer-motion";
@ -20,11 +20,11 @@ export function ModernHeroWithGradients() {
                    <div className="relative z-20 flex flex-col items-center justify-center overflow-hidden rounded-3xl p-4 md:p-12 lg:p-16">
                        <Link
-                            href="https://github.com/MODSetter/SurfSense"
+                            href="/docs"
                            className="flex items-center gap-1 rounded-full border border-gray-200 bg-gradient-to-b from-gray-50 to-gray-100 px-4 py-1 text-center text-sm text-gray-800 shadow-sm dark:border-[#404040] dark:bg-gradient-to-b dark:from-[#5B5B5D] dark:to-[#262627] dark:text-white dark:shadow-inner dark:shadow-purple-500/10"
                        >
-                            <span>SurfSense v0.0.6 Released</span>
+                            <IconFileTypeDoc className="h-4 w-4 text-gray-800 dark:text-white" />
-                            <IconArrowRight className="h-4 w-4 text-gray-800 dark:text-white" />
+                            <span>Documentation</span>
                        </Link>
                        {/* Import the Logo component or define it in this file */}
                        <div className="flex items-center justify-center gap-4 mt-10 mb-2">
@ -36,7 +36,7 @@ export function ModernHeroWithGradients() {
                            </h1>
                        </div>
                        <p className="mx-auto max-w-3xl py-6 text-center text-base text-gray-600 dark:text-neutral-300 md:text-lg lg:text-xl">
-                            A Customizable AI Research Agent just like NotebookLM or Perplexity, but connected to external sources such as search engines (Tavily), Slack, Notion, and more.
+                            A Customizable AI Research Agent just like NotebookLM or Perplexity, but connected to external sources such as search engines (Tavily), Slack, Linear, Notion, YouTube, GitHub and more.
                        </p>
                        <div className="flex flex-col items-center gap-6 py-6 sm:flex-row">
                            <Link
--- a/surfsense_web/components/Navbar.tsx
+++ b/surfsense_web/components/Navbar.tsx
@ -24,8 +24,8 @@ interface NavbarProps {
 export const Navbar = () => {
  const navItems = [
    {
-      name: "",
+      name: "Docs",
-      link: "/",
+      link: "/docs",
    },
    // {
    //   name: "Product",
@ -118,53 +118,52 @@ const DesktopNav = ({ navItems, visible }: NavbarProps) => {
        <Logo className="h-8 w-8 rounded-md" /> 
        <span className="dark:text-white/90 text-gray-800 text-lg font-bold">SurfSense</span>
      </div>
-      <motion.div
+      <div className="flex items-center gap-4">
-        className="lg:flex flex-row flex-1 items-center justify-center space-x-1 text-sm"
+        <motion.div
-        animate={{
+          className="lg:flex flex-row items-center justify-end space-x-1 text-sm"
-          scale: visible ? 0.9 : 1,
+          animate={{
-          justifyContent: visible ? "flex-end" : "center",
+            scale: visible ? 0.9 : 1,
-        }}
+          }}
-      >
+        >
-        {navItems.map((navItem, idx) => (
+          {navItems.map((navItem, idx) => (
-          <motion.div
+            <motion.div
-            key={`nav-item-${idx}`}
+              key={`nav-item-${idx}`}
-            onHoverStart={() => setHoveredIndex(idx)}
+              onHoverStart={() => setHoveredIndex(idx)}
-            className="relative"
+              className="relative"
          >
            <Link
              className="dark:text-white/90 text-gray-800 relative px-3 py-1.5 transition-colors"
              href={navItem.link}
            >
-              <span className="relative z-10">{navItem.name}</span>
+              <Link
-              {hoveredIndex === idx && (
+                className="dark:text-white/90 text-gray-800 relative px-3 py-1.5 transition-colors"
-                <motion.div
+                href={navItem.link}
-                  layoutId="menu-hover"
+              >
-                  className="absolute inset-0 rounded-full dark:bg-gradient-to-r dark:from-white/10 dark:to-white/20 bg-gradient-to-r from-gray-200 to-gray-300"
+                <span className="relative z-10">{navItem.name}</span>
-                  initial={{ opacity: 0, scale: 0.8 }}
+                {hoveredIndex === idx && (
-                  animate={{
+                  <motion.div
-                    opacity: 1,
+                    layoutId="menu-hover"
-                    scale: 1.1,
+                    className="absolute inset-0 rounded-full dark:bg-gradient-to-r dark:from-white/10 dark:to-white/20 bg-gradient-to-r from-gray-200 to-gray-300"
-                    background: "var(--tw-dark) ? radial-gradient(circle at center, rgba(255,255,255,0.2) 0%, rgba(255,255,255,0.1) 50%, transparent 100%) : radial-gradient(circle at center, rgba(0,0,0,0.05) 0%, rgba(0,0,0,0.03) 50%, transparent 100%)",
+                    initial={{ opacity: 0, scale: 0.8 }}
-                  }}
+                    animate={{
-                  exit={{
+                      opacity: 1,
-                    opacity: 0,
+                      scale: 1.1,
-                    scale: 0.8,
+                      background: "var(--tw-dark) ? radial-gradient(circle at center, rgba(255,255,255,0.2) 0%, rgba(255,255,255,0.1) 50%, transparent 100%) : radial-gradient(circle at center, rgba(0,0,0,0.05) 0%, rgba(0,0,0,0.03) 50%, transparent 100%)",
-                    transition: {
+                    }}
-                      duration: 0.2,
+                    exit={{
-                    },
+                      opacity: 0,
-                  }}
+                      scale: 0.8,
-                  transition={{
+                      transition: {
-                    type: "spring",
+                        duration: 0.2,
-                    bounce: 0.4,
+                      },
-                    duration: 0.4,
+                    }}
-                  }}
+                    transition={{
-                />
+                      type: "spring",
-              )}
+                      bounce: 0.4,
-            </Link>
+                      duration: 0.4,
-          </motion.div>
+                    }}
-        ))}
+                  />
-      </motion.div>
+                )}
-      <div className="flex items-center gap-2">
+              </Link>
            </motion.div>
          ))}
        </motion.div>
        <ThemeTogglerComponent />
        <AnimatePresence mode="popLayout" initial={false}>
          {!visible && (
--- a/surfsense_web/components/chat/Citation.tsx
+++ b/surfsense_web/components/chat/Citation.tsx
@ -20,7 +20,7 @@ type CitationProps = {
 /**
 * Citation component to handle individual citations
 */
-export const Citation = ({ citationId, citationText, position, source }: CitationProps) => {
+export const Citation = React.memo(({ citationId, citationText, position, source }: CitationProps) => {
  const [open, setOpen] = useState(false);
  const citationKey = `citation-${citationId}-${position}`;
@ -38,37 +38,41 @@ export const Citation = ({ citationId, citationText, position, source }: Citatio
            </span>
          </sup>
        </DropdownMenuTrigger>
-        <DropdownMenuContent align="start" className="w-80 p-0">
+        {open && (
-          <Card className="border-0 shadow-none">
+          <DropdownMenuContent align="start" className="w-80 p-0" forceMount>
-            <div className="p-3 flex items-start gap-3">
+            <Card className="border-0 shadow-none">
-              <div className="flex-shrink-0 w-7 h-7 flex items-center justify-center bg-muted rounded-full">
+              <div className="p-3 flex items-start gap-3">
-                {getConnectorIcon(source.connectorType || '')}
+                <div className="flex-shrink-0 w-7 h-7 flex items-center justify-center bg-muted rounded-full">
-              </div>
+                  {getConnectorIcon(source.connectorType || '')}
              <div className="flex-1">
                <div className="flex items-center gap-2 mb-1">
                  <h3 className="font-medium text-sm text-card-foreground">{source.title}</h3>
                </div>
-                <p className="text-sm text-muted-foreground mt-0.5">{source.description}</p>
+                <div className="flex-1">
-                <div className="mt-2 flex items-center text-xs text-muted-foreground">
+                  <div className="flex items-center gap-2 mb-1">
-                  <span className="truncate max-w-[200px]">{source.url}</span>
+                    <h3 className="font-medium text-sm text-card-foreground">{source.title}</h3>
                  </div>
                  <p className="text-sm text-muted-foreground mt-0.5">{source.description}</p>
                  <div className="mt-2 flex items-center text-xs text-muted-foreground">
                    <span className="truncate max-w-[200px]">{source.url}</span>
                  </div>
                </div>
                <Button 
                  variant="ghost" 
                  size="icon" 
                  className="h-7 w-7 rounded-full"
                  onClick={() => window.open(source.url, '_blank', 'noopener,noreferrer')}
                  title="Open in new tab"
                >
                  <ExternalLink className="h-3.5 w-3.5" />
                </Button>
              </div>
-              <Button 
+            </Card>
-                variant="ghost" 
+          </DropdownMenuContent>
-                size="icon" 
+        )}
                className="h-7 w-7 rounded-full"
                onClick={() => window.open(source.url, '_blank')}
                title="Open in new tab"
              >
                <ExternalLink className="h-3.5 w-3.5" />
              </Button>
            </div>
          </Card>
        </DropdownMenuContent>
      </DropdownMenu>
    </span>
  );
-};
+});
 Citation.displayName = 'Citation';
 /**
 * Function to render text with citations
--- a/surfsense_web/components/chat/ConnectorComponents.tsx
+++ b/surfsense_web/components/chat/ConnectorComponents.tsx
@ -11,7 +11,7 @@ import {
  Link,
  Webhook,
 } from 'lucide-react';
-import { IconBrandNotion, IconBrandSlack, IconBrandYoutube } from "@tabler/icons-react";
+import { IconBrandNotion, IconBrandSlack, IconBrandYoutube, IconBrandGithub, IconLayoutKanban } from "@tabler/icons-react";
 import { Button } from '@/components/ui/button';
 import { Connector, ResearchMode } from './types';
@ -20,6 +20,10 @@ export const getConnectorIcon = (connectorType: string) => {
  const iconProps = { className: "h-4 w-4" };
  switch(connectorType) {
    case 'LINEAR_CONNECTOR':
      return <IconLayoutKanban {...iconProps} />;
    case 'GITHUB_CONNECTOR':
      return <IconBrandGithub {...iconProps} />;
    case 'YOUTUBE_VIDEO':
      return <IconBrandYoutube {...iconProps} />;
    case 'CRAWLED_URL':
--- a/surfsense_web/components/editConnector/EditConnectorLoadingSkeleton.tsx
+++ b/surfsense_web/components/editConnector/EditConnectorLoadingSkeleton.tsx
@ -0,0 +1,21 @@
 import React from 'react';
 import { Skeleton } from "@/components/ui/skeleton";
 import { Card, CardContent, CardHeader } from "@/components/ui/card";
 export function EditConnectorLoadingSkeleton() {
    return (
        <div className="container mx-auto py-8 max-w-3xl">
            <Skeleton className="h-8 w-48 mb-6" />
            <Card className="border-2 border-border">
                <CardHeader>
                    <Skeleton className="h-7 w-3/4 mb-2" />
                    <Skeleton className="h-4 w-full" />
                </CardHeader>
                <CardContent className="space-y-4">
                    <Skeleton className="h-10 w-full" />
                    <Skeleton className="h-20 w-full" />
                </CardContent>
            </Card>
        </div>
    );
 } 
--- a/surfsense_web/components/editConnector/EditConnectorNameForm.tsx
+++ b/surfsense_web/components/editConnector/EditConnectorNameForm.tsx
@ -0,0 +1,25 @@
 import React from 'react';
 import { Control } from 'react-hook-form';
 import { FormField, FormItem, FormLabel, FormControl, FormMessage } from "@/components/ui/form";
 import { Input } from "@/components/ui/input";
 // Assuming EditConnectorFormValues is defined elsewhere or passed as generic
 interface EditConnectorNameFormProps {
    control: Control<any>; // Use Control<EditConnectorFormValues> if type is available
 }
 export function EditConnectorNameForm({ control }: EditConnectorNameFormProps) {
    return (
        <FormField
            control={control}
            name="name"
            render={({ field }) => (
                <FormItem>
                    <FormLabel>Connector Name</FormLabel>
                    <FormControl><Input {...field} /></FormControl>
                    <FormMessage />
                </FormItem>
            )}
        />
    );
 } 
--- a/surfsense_web/components/editConnector/EditGitHubConnectorConfig.tsx
+++ b/surfsense_web/components/editConnector/EditGitHubConnectorConfig.tsx
@ -0,0 +1,160 @@
 import React from 'react';
 import { UseFormReturn } from 'react-hook-form';
 import { FormField, FormItem, FormLabel, FormControl, FormDescription, FormMessage } from "@/components/ui/form";
 import { Input } from "@/components/ui/input";
 import { Button } from "@/components/ui/button";
 import { Checkbox } from "@/components/ui/checkbox";
 import { Alert, AlertDescription, AlertTitle } from "@/components/ui/alert";
 import { Skeleton } from "@/components/ui/skeleton";
 import { Edit, KeyRound, Loader2, CircleAlert } from 'lucide-react';
 // Types needed from parent
 interface GithubRepo {
    id: number;
    name: string;
    full_name: string;
    private: boolean;
    url: string;
    description: string | null;
    last_updated: string | null;
 }
 type GithubPatFormValues = { github_pat: string; };
 type EditMode = 'viewing' | 'editing_repos';
 interface EditGitHubConnectorConfigProps {
    // State from parent
    editMode: EditMode;
    originalPat: string;
    currentSelectedRepos: string[];
    fetchedRepos: GithubRepo[] | null;
    newSelectedRepos: string[];
    isFetchingRepos: boolean;
    // Forms from parent
    patForm: UseFormReturn<GithubPatFormValues>;
    // Handlers from parent
    setEditMode: (mode: EditMode) => void;
    handleFetchRepositories: (values: GithubPatFormValues) => Promise<void>;
    handleRepoSelectionChange: (repoFullName: string, checked: boolean) => void;
    setNewSelectedRepos: React.Dispatch<React.SetStateAction<string[]>>;
    setFetchedRepos: React.Dispatch<React.SetStateAction<GithubRepo[] | null>>;
 }
 export function EditGitHubConnectorConfig({
    editMode,
    originalPat,
    currentSelectedRepos,
    fetchedRepos,
    newSelectedRepos,
    isFetchingRepos,
    patForm,
    setEditMode,
    handleFetchRepositories,
    handleRepoSelectionChange,
    setNewSelectedRepos,
    setFetchedRepos
 }: EditGitHubConnectorConfigProps) {
    return (
        <div className="space-y-4">
            <h4 className="font-medium text-muted-foreground">Repository Selection & Access</h4>
            {/* Viewing Mode */}
            {editMode === 'viewing' && (
                <div className="space-y-3 p-4 border rounded-md bg-muted/50">
                    <FormLabel>Currently Indexed Repositories:</FormLabel>
                    {currentSelectedRepos.length > 0 ? (
                        <ul className="list-disc pl-5 text-sm">
                            {currentSelectedRepos.map(repo => <li key={repo}>{repo}</li>)}
                        </ul>
                    ) : (
                        <p className="text-sm text-muted-foreground">(No repositories currently selected)</p>
                    )}
                    <Button type="button" variant="outline" size="sm" onClick={() => setEditMode('editing_repos')}>
                        <Edit className="mr-2 h-4 w-4" /> Change Selection / Update PAT
                    </Button>
                    <FormDescription>To change repo selections or update the PAT, click above.</FormDescription>
                </div>
            )}
            {/* Editing Mode */}
            {editMode === 'editing_repos' && (
                <div className="space-y-4 p-4 border rounded-md">
                    {/* PAT Input */}
                    <div className="flex items-end gap-4 p-4 border rounded-md bg-muted/90">
                        <FormField
                            control={patForm.control}
                            name="github_pat"
                            render={({ field }) => (
                                <FormItem className="flex-grow">
                                    <FormLabel className="flex items-center gap-1"><KeyRound className="h-4 w-4" /> GitHub PAT</FormLabel>
                                    <FormControl><Input type="password" placeholder="ghp_... or github_pat_..." {...field} /></FormControl>
                                    <FormDescription>Enter PAT to fetch/update repos or if you need to update the stored token.</FormDescription>
                                    <FormMessage />
                                </FormItem>
                            )}
                        />
                        <Button
                            type="button"
                            disabled={isFetchingRepos}
                            size="sm"
                            onClick={async () => {
                                const isValid = await patForm.trigger('github_pat');
                                if (isValid) {
                                    handleFetchRepositories(patForm.getValues());
                                }
                            }}
                        >
                            {isFetchingRepos ? <Loader2 className="h-4 w-4 animate-spin" /> : "Fetch Repositories"}
                        </Button>
                    </div>
                    {/* Repo List */}
                    {isFetchingRepos && <Skeleton className="h-40 w-full" />}
                    {!isFetchingRepos && fetchedRepos !== null && (
                        fetchedRepos.length === 0 ? (
                            <Alert variant="destructive">
                                <CircleAlert className="h-4 w-4" />
                                <AlertTitle>No Repositories Found</AlertTitle>
                                <AlertDescription>Check PAT & permissions.</AlertDescription>
                            </Alert>
                        ) : (
                            <div className="space-y-2">
                                <FormLabel>Select Repositories to Index ({newSelectedRepos.length} selected):</FormLabel>
                                <div className="h-64 w-full rounded-md border p-4 overflow-y-auto">
                                    {fetchedRepos.map((repo) => (
                                        <div key={repo.id} className="flex items-center space-x-2 mb-2 py-1">
                                            <Checkbox
                                                id={`repo-${repo.id}`}
                                                checked={newSelectedRepos.includes(repo.full_name)}
                                                onCheckedChange={(checked) => handleRepoSelectionChange(repo.full_name, !!checked)}
                                            />
                                            <label
                                                htmlFor={`repo-${repo.id}`}
                                                className="text-sm font-medium leading-none peer-disabled:cursor-not-allowed peer-disabled:opacity-70"
                                            >
                                                {repo.full_name} {repo.private && "(Private)"}
                                            </label>
                                        </div>
                                    ))}
                                </div>
                            </div>
                        )
                    )}
                    <Button
                        type="button"
                        variant="ghost"
                        size="sm"
                        onClick={() => {
                            setEditMode('viewing');
                            setFetchedRepos(null);
                            setNewSelectedRepos(currentSelectedRepos);
                            patForm.reset({ github_pat: originalPat }); // Reset PAT form on cancel
                        }}
                    >
                        Cancel Repo Change
                    </Button>
                </div>
            )}
        </div>
    );
 } 
--- a/surfsense_web/components/editConnector/EditSimpleTokenForm.tsx
+++ b/surfsense_web/components/editConnector/EditSimpleTokenForm.tsx
@ -0,0 +1,37 @@
 import React from 'react';
 import { Control } from 'react-hook-form';
 import { FormField, FormItem, FormLabel, FormControl, FormDescription, FormMessage } from "@/components/ui/form";
 import { Input } from "@/components/ui/input";
 import { KeyRound } from 'lucide-react';
 // Assuming EditConnectorFormValues is defined elsewhere or passed as generic
 interface EditSimpleTokenFormProps {
    control: Control<any>;
    fieldName: string; // e.g., "SLACK_BOT_TOKEN"
    fieldLabel: string; // e.g., "Slack Bot Token"
    fieldDescription: string;
    placeholder?: string;
 }
 export function EditSimpleTokenForm({
    control,
    fieldName,
    fieldLabel,
    fieldDescription,
    placeholder
 }: EditSimpleTokenFormProps) {
    return (
        <FormField
            control={control}
            name={fieldName}
            render={({ field }) => (
                <FormItem>
                    <FormLabel className="flex items-center gap-1"><KeyRound className="h-4 w-4" /> {fieldLabel}</FormLabel>
                    <FormControl><Input type="password" placeholder={placeholder} {...field} /></FormControl>
                    <FormDescription>{fieldDescription}</FormDescription>
                    <FormMessage />
                </FormItem>
            )}
        />
    );
 } 
--- a/surfsense_web/components/editConnector/types.ts
+++ b/surfsense_web/components/editConnector/types.ts
@ -0,0 +1,34 @@
 import * as z from "zod";
 // Types
 export interface GithubRepo {
    id: number;
    name: string;
    full_name: string;
    private: boolean;
    url: string;
    description: string | null;
    last_updated: string | null;
 }
 export type EditMode = 'viewing' | 'editing_repos';
 // Schemas
 export const githubPatSchema = z.object({
    github_pat: z.string()
        .min(20, { message: "GitHub Personal Access Token seems too short." })
        .refine(pat => pat.startsWith('ghp_') || pat.startsWith('github_pat_'), {
            message: "GitHub PAT should start with 'ghp_' or 'github_pat_'",
        }),
 });
 export type GithubPatFormValues = z.infer<typeof githubPatSchema>;
 export const editConnectorSchema = z.object({
    name: z.string().min(3, { message: "Connector name must be at least 3 characters." }),
    SLACK_BOT_TOKEN: z.string().optional(),
    NOTION_INTEGRATION_TOKEN: z.string().optional(),
    SERPER_API_KEY: z.string().optional(),
    TAVILY_API_KEY: z.string().optional(),
    LINEAR_API_KEY: z.string().optional(),
 });
 export type EditConnectorFormValues = z.infer<typeof editConnectorSchema>; 
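The two rules that `githubPatSchema` places on `github_pat` (a minimum length of 20 and one of two accepted prefixes) can be restated as a plain function. The sketch below is only an illustration with no zod dependency: `validateGithubPat` is a hypothetical helper that is not part of this commit, and it returns only the first failing rule's message, which may differ from how zod reports multiple issues.

```typescript
// Hypothetical helper, not part of the commit: restates the two PAT rules
// from githubPatSchema as a plain function with no zod dependency.
function validateGithubPat(pat: string): string | null {
  // Rule 1: minimum length of 20 characters.
  if (pat.length < 20) {
    return "GitHub Personal Access Token seems too short.";
  }
  // Rule 2: the value must start with one of the two accepted prefixes.
  if (!pat.startsWith("ghp_") && !pat.startsWith("github_pat_")) {
    return "GitHub PAT should start with 'ghp_' or 'github_pat_'";
  }
  return null; // null means the value passed both checks
}

// Sample values are made up for illustration; they are not real tokens.
const samplePats = ["ghp_short", "x".repeat(40), "ghp_" + "a".repeat(36)];
for (const pat of samplePats) {
  console.log(validateGithubPat(pat) ?? "ok");
}
```

These checks only look at the shape of the string. Whether a token is actually valid is still decided by GitHub when the repositories are fetched.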
--- a/surfsense_web/components/markdown-viewer.tsx
+++ b/surfsense_web/components/markdown-viewer.tsx
@ -1,4 +1,4 @@
-import React from "react";
+import React, { useMemo } from "react";
 import ReactMarkdown from "react-markdown";
 import rehypeRaw from "rehype-raw";
 import rehypeSanitize from "rehype-sanitize";
@ -14,75 +14,87 @@ interface MarkdownViewerProps {
 }
 export function MarkdownViewer({ content, className, getCitationSource }: MarkdownViewerProps) {
  // Memoize the markdown components to prevent unnecessary re-renders
  const components = useMemo(() => {
    return {
      // Define custom components for markdown elements
      p: ({node, children, ...props}: any) => {
        // If there's no getCitationSource function, just render normally
        if (!getCitationSource) {
          return <p className="my-2" {...props}>{children}</p>;
        }
        // Process citations within paragraph content
        return <p className="my-2" {...props}>{processCitationsInReactChildren(children, getCitationSource)}</p>;
      },
      a: ({node, children, ...props}: any) => {
        // Process citations within link content if needed
        const processedChildren = getCitationSource 
          ? processCitationsInReactChildren(children, getCitationSource) 
          : children;
        return <a className="text-primary hover:underline" {...props}>{processedChildren}</a>;
      },
      li: ({node, children, ...props}: any) => {
        // Process citations within list item content
        const processedChildren = getCitationSource 
          ? processCitationsInReactChildren(children, getCitationSource) 
          : children;
        return <li {...props}>{processedChildren}</li>;
      },
      ul: ({node, ...props}: any) => <ul className="list-disc pl-5 my-2" {...props} />,
      ol: ({node, ...props}: any) => <ol className="list-decimal pl-5 my-2" {...props} />,
      h1: ({node, children, ...props}: any) => {
        const processedChildren = getCitationSource 
          ? processCitationsInReactChildren(children, getCitationSource) 
          : children;
        return <h1 className="text-2xl font-bold mt-6 mb-2" {...props}>{processedChildren}</h1>;
      },
      h2: ({node, children, ...props}: any) => {
        const processedChildren = getCitationSource 
          ? processCitationsInReactChildren(children, getCitationSource) 
          : children;
        return <h2 className="text-xl font-bold mt-5 mb-2" {...props}>{processedChildren}</h2>;
      },
      h3: ({node, children, ...props}: any) => {
        const processedChildren = getCitationSource 
          ? processCitationsInReactChildren(children, getCitationSource) 
          : children;
        return <h3 className="text-lg font-bold mt-4 mb-2" {...props}>{processedChildren}</h3>;
      },
      h4: ({node, children, ...props}: any) => {
        const processedChildren = getCitationSource 
          ? processCitationsInReactChildren(children, getCitationSource) 
          : children;
        return <h4 className="text-base font-bold mt-3 mb-1" {...props}>{processedChildren}</h4>;
      },
      blockquote: ({node, ...props}: any) => <blockquote className="border-l-4 border-muted pl-4 italic my-2" {...props} />,
      hr: ({node, ...props}: any) => <hr className="my-4 border-muted" {...props} />,
      img: ({node, ...props}: any) => <img className="max-w-full h-auto my-4 rounded" {...props} />,
      table: ({node, ...props}: any) => <div className="overflow-x-auto my-4"><table className="min-w-full divide-y divide-border" {...props} /></div>,
      th: ({node, ...props}: any) => <th className="px-3 py-2 text-left font-medium bg-muted" {...props} />,
      td: ({node, ...props}: any) => <td className="px-3 py-2 border-t border-border" {...props} />,
      code: ({node, className, children, ...props}: any) => {
        const match = /language-(\w+)/.exec(className || '');
        const isInline = !match;
        return isInline 
          ? <code className="bg-muted px-1 py-0.5 rounded text-xs" {...props}>{children}</code>
          : (
            <div className="relative my-4">
              <pre className="bg-muted p-4 rounded-md overflow-x-auto">
                <code className="text-xs" {...props}>{children}</code>
              </pre>
            </div>
          );
      }
    };
  }, [getCitationSource]);
  return (
    <div className={cn("prose prose-sm dark:prose-invert max-w-none", className)}>
      <ReactMarkdown
        rehypePlugins={[rehypeRaw, rehypeSanitize]}
        remarkPlugins={[remarkGfm]}
-        components={{
+        components={components}
          // Define custom components for markdown elements
          p: ({node, children, ...props}) => {
            // If there's no getCitationSource function, just render normally
            if (!getCitationSource) {
              return <p className="my-2" {...props}>{children}</p>;
            }
            // Process citations within paragraph content
            return <p className="my-2" {...props}>{processCitationsInReactChildren(children, getCitationSource)}</p>;
          },
          a: ({node, children, ...props}) => {
            // Process citations within link content if needed
            const processedChildren = getCitationSource 
              ? processCitationsInReactChildren(children, getCitationSource) 
              : children;
            return <a className="text-primary hover:underline" {...props}>{processedChildren}</a>;
          },
          ul: ({node, ...props}) => <ul className="list-disc pl-5 my-2" {...props} />,
          ol: ({node, ...props}) => <ol className="list-decimal pl-5 my-2" {...props} />,
          h1: ({node, children, ...props}) => {
            const processedChildren = getCitationSource 
              ? processCitationsInReactChildren(children, getCitationSource) 
              : children;
            return <h1 className="text-2xl font-bold mt-6 mb-2" {...props}>{processedChildren}</h1>;
          },
          h2: ({node, children, ...props}) => {
            const processedChildren = getCitationSource 
              ? processCitationsInReactChildren(children, getCitationSource) 
              : children;
            return <h2 className="text-xl font-bold mt-5 mb-2" {...props}>{processedChildren}</h2>;
          },
          h3: ({node, children, ...props}) => {
            const processedChildren = getCitationSource 
              ? processCitationsInReactChildren(children, getCitationSource) 
              : children;
            return <h3 className="text-lg font-bold mt-4 mb-2" {...props}>{processedChildren}</h3>;
          },
          h4: ({node, children, ...props}) => {
            const processedChildren = getCitationSource 
              ? processCitationsInReactChildren(children, getCitationSource) 
              : children;
            return <h4 className="text-base font-bold mt-3 mb-1" {...props}>{processedChildren}</h4>;
          },
          blockquote: ({node, ...props}) => <blockquote className="border-l-4 border-muted pl-4 italic my-2" {...props} />,
          hr: ({node, ...props}) => <hr className="my-4 border-muted" {...props} />,
          img: ({node, ...props}) => <img className="max-w-full h-auto my-4 rounded" {...props} />,
          table: ({node, ...props}) => <div className="overflow-x-auto my-4"><table className="min-w-full divide-y divide-border" {...props} /></div>,
          th: ({node, ...props}) => <th className="px-3 py-2 text-left font-medium bg-muted" {...props} />,
          td: ({node, ...props}) => <td className="px-3 py-2 border-t border-border" {...props} />,
          code: ({node, className, children, ...props}: any) => {
            const match = /language-(\w+)/.exec(className || '');
            const isInline = !match;
            return isInline 
              ? <code className="bg-muted px-1 py-0.5 rounded text-xs" {...props}>{children}</code>
              : (
                <div className="relative my-4">
                  <pre className="bg-muted p-4 rounded-md overflow-x-auto">
                    <code className="text-xs" {...props}>{children}</code>
                  </pre>
                </div>
              );
          }
        }}
      >
        {content}
      </ReactMarkdown>
@ -91,7 +103,7 @@ export function MarkdownViewer({ content, className, getCitationSource }: Markdo
 }
 // Helper function to process citations within React children
-function processCitationsInReactChildren(children: React.ReactNode, getCitationSource: (id: number) => Source | null): React.ReactNode {
+const processCitationsInReactChildren = (children: React.ReactNode, getCitationSource: (id: number) => Source | null): React.ReactNode => {
  // If children is not an array or string, just return it
  if (!children || (typeof children !== 'string' && !Array.isArray(children))) {
    return children;
@ -113,16 +125,18 @@ function processCitationsInReactChildren(children: React.ReactNode, getCitationS
  }
  return children;
-}
+};
 // Process citation references in text content
-function processCitationsInText(text: string, getCitationSource: (id: number) => Source | null): React.ReactNode[] {
+const processCitationsInText = (text: string, getCitationSource: (id: number) => Source | null): React.ReactNode[] => {
  // Use improved regex to catch citation numbers more reliably
  // This will match patterns like [1], [42], etc. including when they appear at the end of a line or sentence
  const citationRegex = /\[(\d+)\]/g;
  const parts: React.ReactNode[] = [];
  let lastIndex = 0;
  let match;
  let position = 0;
-
+  
  while ((match = citationRegex.exec(text)) !== null) {
    // Add text before the citation
    if (match.index > lastIndex) {
@ -131,13 +145,15 @@ function processCitationsInText(text: string, getCitationSource: (id: number) =>
    // Add the citation component
    const citationId = parseInt(match[1], 10);
    const source = getCitationSource(citationId);
    parts.push(
      <Citation 
        key={`citation-${citationId}-${position}`}
        citationId={citationId}
        citationText={match[0]}
        position={position}
-        source={getCitationSource(citationId)}
+        source={source}
      />
    );
@ -151,4 +167,4 @@ function processCitationsInText(text: string, getCitationSource: (id: number) =>
  }
  return parts;
-} 
+}; 
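The citation handling in `processCitationsInText` relies on a global regex loop that alternates between plain text and `[n]` markers. A minimal sketch of that splitting step, without React, might look like the following. `splitCitations` and the `Part` type are hypothetical names introduced here, and the hunks above elide parts of the real function, so this is an approximation of the technique rather than the component's actual code.

```typescript
// Hypothetical types and helper, not part of the commit.
type Part =
  | { kind: "text"; value: string }
  | { kind: "citation"; id: number; raw: string };

function splitCitations(text: string): Part[] {
  const citationRegex = /\[(\d+)\]/g; // same pattern as in the diff
  const parts: Part[] = [];
  let lastIndex = 0;
  let match: RegExpExecArray | null;

  while ((match = citationRegex.exec(text)) !== null) {
    // Keep any plain text that precedes this citation marker.
    if (match.index > lastIndex) {
      parts.push({ kind: "text", value: text.slice(lastIndex, match.index) });
    }
    // Record the citation itself: numeric id plus the raw "[n]" text.
    parts.push({ kind: "citation", id: parseInt(match[1], 10), raw: match[0] });
    lastIndex = match.index + match[0].length;
  }

  // Keep whatever text follows the last citation.
  if (lastIndex < text.length) {
    parts.push({ kind: "text", value: text.slice(lastIndex) });
  }
  return parts;
}

console.log(splitCitations("See [1] and [42]."));
```

In the component, each `citation` part corresponds to a `<Citation>` element and each `text` part stays a plain string.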
--- a/surfsense_web/components/sidebar/AppSidebarProvider.tsx
+++ b/surfsense_web/components/sidebar/AppSidebarProvider.tsx
@ -118,8 +118,8 @@ export function AppSidebarProvider({
        if (typeof window === 'undefined') return;
        try {
-          // Use the API client instead of direct fetch
+          // Use the API client instead of direct fetch - filter by current search space ID
-          const chats: Chat[] = await apiClient.get<Chat[]>('api/v1/chats/?limit=5&skip=0');
+          const chats: Chat[] = await apiClient.get<Chat[]>(`api/v1/chats/?limit=5&skip=0&search_space_id=${searchSpaceId}`);
          // Transform API response to the format expected by AppSidebar
          const formattedChats = chats.map(chat => ({
@ -170,7 +170,7 @@ export function AppSidebarProvider({
    // Clean up interval on component unmount
    return () => clearInterval(intervalId);
-  }, []);
+  }, [searchSpaceId]);
  // Handle delete chat
  const handleDeleteChat = async () => {
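The change above interpolates `searchSpaceId` into the request path and adds it to the effect's dependency array, so the recent-chats fetch runs again when the search space changes. One way to keep the query string in a single place is a small builder function. This is a sketch only: `buildRecentChatsPath` is a hypothetical helper that does not exist in the commit, and the defaults simply mirror the `limit=5&skip=0` values in the diff.

```typescript
// Hypothetical helper, not part of the commit.
function buildRecentChatsPath(
  searchSpaceId: string | number,
  limit: number = 5,
  skip: number = 0
): string {
  // Encode the id so unexpected characters cannot break the query string.
  const encodedId = encodeURIComponent(String(searchSpaceId));
  return `api/v1/chats/?limit=${limit}&skip=${skip}&search_space_id=${encodedId}`;
}

console.log(buildRecentChatsPath(7));
```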
--- a/surfsense_web/content/docs/docker-installation.mdx
+++ b/surfsense_web/content/docs/docker-installation.mdx
@ -0,0 +1,168 @@
 ---
 title: Docker Installation
 description: Setting up SurfSense using Docker 
 full: true
 ---
 ## Known Limitations
 ⚠️ **Important Note:** Currently, the following features have limited functionality when running in Docker:
 - **Ollama integration:** Local Ollama models do not work when running SurfSense in Docker. Please use other LLM providers like OpenAI or Gemini instead.
 - **Web crawler functionality:** The web crawler feature currently doesn't work properly within the Docker environment.
 We're actively working to resolve these limitations in future releases.
 # Docker Installation 
 This guide explains how to run SurfSense using Docker Compose, which is the preferred and recommended method for deployment.
 ## Prerequisites
 Before you begin, ensure you have:
 - [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/install/) installed on your machine
 - [Git](https://git-scm.com/downloads) (to clone the repository)
 - Completed all the [prerequisite setup steps](/docs) including:
  - PGVector setup
  - Google OAuth configuration
  - Unstructured.io API key
  - Other required API keys
 ## Installation Steps
 1. **Configure Environment Variables**
   Set up the necessary environment variables:
   **Linux/macOS:**
   ```bash
   # Copy example environment files
   cp surfsense_backend/.env.example surfsense_backend/.env
   cp surfsense_web/.env.example surfsense_web/.env
   ```
   **Windows (Command Prompt):**
   ```cmd
   copy surfsense_backend\.env.example surfsense_backend\.env
   copy surfsense_web\.env.example surfsense_web\.env
   ```
   **Windows (PowerShell):**
   ```powershell
   Copy-Item -Path surfsense_backend\.env.example -Destination surfsense_backend\.env
   Copy-Item -Path surfsense_web\.env.example -Destination surfsense_web\.env
   ```
   Edit both `.env` files and fill in the required values:
   **Backend Environment Variables:**
   | ENV VARIABLE | DESCRIPTION |
   |--------------|-------------|
   | DATABASE_URL | PostgreSQL connection string (e.g., `postgresql+asyncpg://postgres:postgres@localhost:5432/surfsense`) |
   | SECRET_KEY | JWT Secret key for authentication (should be a secure random string) |
   | GOOGLE_OAUTH_CLIENT_ID | Google OAuth client ID obtained from Google Cloud Console |
   | GOOGLE_OAUTH_CLIENT_SECRET | Google OAuth client secret obtained from Google Cloud Console |
   | NEXT_FRONTEND_URL | URL where your frontend application is hosted (e.g., `http://localhost:3000`) |
   | EMBEDDING_MODEL | Name of the embedding model (e.g., `mixedbread-ai/mxbai-embed-large-v1`) |
   | RERANKERS_MODEL_NAME | Name of the reranker model (e.g., `ms-marco-MiniLM-L-12-v2`) |
   | RERANKERS_MODEL_TYPE | Type of reranker model (e.g., `flashrank`) |
   | FAST_LLM | LiteLLM routed smaller, faster LLM (e.g., `openai/gpt-4o-mini`, `ollama/deepseek-r1:8b`) |
   | STRATEGIC_LLM | LiteLLM routed advanced LLM for complex tasks (e.g., `openai/gpt-4o`, `ollama/gemma3:12b`) |
   | LONG_CONTEXT_LLM | LiteLLM routed LLM for longer context windows (e.g., `gemini/gemini-2.0-flash`, `ollama/deepseek-r1:8b`) |
   | UNSTRUCTURED_API_KEY | API key for Unstructured.io service for document parsing |
   | FIRECRAWL_API_KEY | API key for Firecrawl service for web crawling |
   Include API keys for the LLM providers you're using. For example:
   - `OPENAI_API_KEY`: If using OpenAI models
   - `GEMINI_API_KEY`: If using Google Gemini models
   For other LLM providers, refer to the [LiteLLM documentation](https://docs.litellm.ai/docs/providers).
   **Frontend Environment Variables:**
   | ENV VARIABLE | DESCRIPTION |
   |--------------|-------------|
   | NEXT_PUBLIC_FASTAPI_BACKEND_URL | URL of the backend service (e.g., `http://localhost:8000`) |
 2. **Build and Start Containers**
   Start the Docker containers:
   **Linux/macOS/Windows:**
   ```bash
   docker-compose up --build
   ```
   To run in detached mode (in the background):
   **Linux/macOS/Windows:**
   ```bash
   docker-compose up -d
   ```
   **Note for Windows users:** Newer Docker Desktop versions ship Compose V2, so you might need to use `docker compose` (with a space) instead of `docker-compose`.
 3. **Access the Applications**
   Once the containers are running, you can access:
   - Frontend: [http://localhost:3000](http://localhost:3000)
   - Backend API: [http://localhost:8000](http://localhost:8000)
   - API Documentation: [http://localhost:8000/docs](http://localhost:8000/docs)
 ## Useful Docker Commands
 ### Container Management
 - **Stop containers:**
  **Linux/macOS/Windows:**
  ```bash
  docker-compose down
  ```
 - **View logs:**
  **Linux/macOS/Windows:**
  ```bash
  # All services
  docker-compose logs -f
  # Specific service
  docker-compose logs -f backend
  docker-compose logs -f frontend
  docker-compose logs -f db
  ```
 - **Restart a specific service:**
  **Linux/macOS/Windows:**
  ```bash
  docker-compose restart backend
  ```
 - **Execute commands in a running container:**
  **Linux/macOS/Windows:**
  ```bash
  # Backend
  docker-compose exec backend python -m pytest
  # Frontend
  docker-compose exec frontend pnpm lint
  ```
 ## Troubleshooting
 - **Linux/macOS:** If you encounter permission errors, you may need to run the docker commands with `sudo`.
 - **Windows:** If you see access denied errors, make sure you're running Command Prompt or PowerShell as Administrator.
 - If ports are already in use, modify the port mappings in the `docker-compose.yml` file.
 - For backend dependency issues, check the `Dockerfile` in the backend directory.
 - For frontend dependency issues, check the `Dockerfile` in the frontend directory.
 - **Windows-specific:** If you encounter line ending issues (CRLF vs LF), configure Git to handle line endings properly with `git config --global core.autocrlf true` before cloning the repository.
 ## Next Steps
 Once your installation is complete, you can start using SurfSense! Navigate to the frontend URL and log in using your Google account. 
--- a/surfsense_web/content/docs/index.mdx
+++ b/surfsense_web/content/docs/index.mdx
@ -0,0 +1,99 @@
 ---
 title: Prerequisites
 description: Setup required before installing SurfSense
 full: true
 ---
 ## PGVector Installation Guide
 SurfSense requires the pgvector extension for PostgreSQL:
 ### Linux and Mac
 Compile and install the extension (supports Postgres 13+)
 ```sh
 cd /tmp
 git clone --branch v0.8.0 https://github.com/pgvector/pgvector.git
 cd pgvector
 make
 make install # may need sudo
 ```
 See the [installation notes](https://github.com/pgvector/pgvector/tree/master#installation-notes---linux-and-mac) if you run into issues
 ### Windows
 Ensure [C++ support in Visual Studio](https://learn.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-170#download-and-install-the-tools) is installed, and run:
 ```cmd
 call "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"
 ```
 Note: The exact path will vary depending on your Visual Studio version and edition
 Then use `nmake` to build:
 ```cmd
 set "PGROOT=C:\Program Files\PostgreSQL\16"
 cd %TEMP%
 git clone --branch v0.8.0 https://github.com/pgvector/pgvector.git
 cd pgvector
 nmake /F Makefile.win
 nmake /F Makefile.win install
 ```
 See the [installation notes](https://github.com/pgvector/pgvector/tree/master#installation-notes---windows) if you run into issues
 ---
 ## Google OAuth Setup
 SurfSense uses Google OAuth for user management and authentication. Let's set it up.
 1. Log in to your [Google Developer Console](https://console.cloud.google.com/)
 2. Enable People API.
 ![Google Developer Console People API](/docs/google_oauth_people_api.png)
 3. Set up OAuth consent screen.
 ![Google Developer Console OAuth consent screen](/docs/google_oauth_screen.png)
 4. Create OAuth client ID and secret.
 ![Google Developer Console OAuth client ID](/docs/google_oauth_client.png)
 5. It should look like this.
 ![Google Developer Console Config](/docs/google_oauth_config.png)
 ---
 ## File Uploads
 Files are converted to LLM-friendly formats using [Unstructured](https://github.com/Unstructured-IO/unstructured).
 1. Get an Unstructured.io API key from [Unstructured Platform](https://platform.unstructured.io/)
 2. You should be able to generate API keys once registered
 ![Unstructured Dashboard](/docs/unstructured.png)
 ---
 ## LLM Observability (Optional)
 This is not required for SurfSense to work, but monitoring LLM interactions is always a good idea, so we do not have those WTH moments.
 1. Get a LangSmith API key from [smith.langchain.com](https://smith.langchain.com/)
 2. This lets you observe the SurfSense Researcher Agent.
 ![LangSmith](/docs/langsmith.png)
 ---
 ## Crawler 
 SurfSense has two options for saving webpages:
 - [SurfSense Extension](https://github.com/MODSetter/SurfSense/tree/main/surfsense_browser_extension) (Overall better experience & ability to save private webpages, recommended)
 - Crawler (If you want to save public webpages)
 **NOTE:** SurfSense currently uses [Firecrawl.py](https://www.firecrawl.dev/) for web crawling. If you plan on using the crawler, you will need to create a Firecrawl account and get an API key.
 ---
 ## Next Steps
 Once you have all prerequisites in place, proceed to the [installation guide](/docs/installation) to set up SurfSense.
--- a/surfsense_web/content/docs/installation.mdx
+++ b/surfsense_web/content/docs/installation.mdx
@ -0,0 +1,21 @@
 ---
 title: Installation
 description: Current ways to use SurfSense
 full: true
 ---
 # Installing SurfSense
 There are two ways to install SurfSense, but both require the repository to be cloned first. Clone [SurfSense](https://github.com/MODSetter/SurfSense) and then:
 ## Docker Installation 
 This method provides a containerized environment with all dependencies pre-configured, at the cost of less customization.
 [Learn more about Docker installation](/docs/docker-installation)
 ## Manual Installation (Preferred)
 For users who prefer more control over the installation process or need to customize their setup, we also provide manual installation instructions.
 [Learn more about Manual installation](/docs/manual-installation)
--- a/surfsense_web/content/docs/manual-installation.mdx
+++ b/surfsense_web/content/docs/manual-installation.mdx
@ -0,0 +1,258 @@
 ---
 title: Manual Installation
 description: Setting up SurfSense manually for customized deployments (Preferred)
 full: true
 ---
 # Manual Installation (Preferred)
 This guide provides step-by-step instructions for setting up SurfSense without Docker. This approach gives you more control over the installation process and allows for customization of the environment.
 ## Prerequisites
 Before beginning the manual installation, ensure you have completed all the [prerequisite setup steps](/docs), including:
 - PGVector installation
 - Google OAuth setup
 - Unstructured.io API key
 - LLM observability (optional)
 - Crawler setup (if needed)
 ## Backend Setup
 The backend is the core of SurfSense. Follow these steps to set it up:
 ### 1. Environment Configuration
 First, create and configure your environment variables by copying the example file:
 **Linux/macOS:**
 ```bash
 cd surfsense_backend
 cp .env.example .env
 ```
 **Windows (Command Prompt):**
 ```cmd
 cd surfsense_backend
 copy .env.example .env
 ```
 **Windows (PowerShell):**
 ```powershell
 cd surfsense_backend
 Copy-Item -Path .env.example -Destination .env
 ```
 Edit the `.env` file and set the following variables:
 | ENV VARIABLE | DESCRIPTION |
 |--------------|-------------|
 | DATABASE_URL | PostgreSQL connection string (e.g., `postgresql+asyncpg://postgres:postgres@localhost:5432/surfsense`) |
 | SECRET_KEY | JWT Secret key for authentication (should be a secure random string) |
 | GOOGLE_OAUTH_CLIENT_ID | Google OAuth client ID |
 | GOOGLE_OAUTH_CLIENT_SECRET | Google OAuth client secret |
 | NEXT_FRONTEND_URL | Frontend application URL (e.g., `http://localhost:3000`) |
 | EMBEDDING_MODEL | Name of the embedding model (e.g., `mixedbread-ai/mxbai-embed-large-v1`) |
 | RERANKERS_MODEL_NAME | Name of the reranker model (e.g., `ms-marco-MiniLM-L-12-v2`) |
 | RERANKERS_MODEL_TYPE | Type of reranker model (e.g., `flashrank`) |
 | FAST_LLM | LiteLLM routed faster LLM (e.g., `openai/gpt-4o-mini`, `ollama/deepseek-r1:8b`) |
 | STRATEGIC_LLM | LiteLLM routed advanced LLM (e.g., `openai/gpt-4o`, `ollama/gemma3:12b`) |
 | LONG_CONTEXT_LLM | LiteLLM routed long-context LLM (e.g., `gemini/gemini-2.0-flash`, `ollama/deepseek-r1:8b`) |
 | UNSTRUCTURED_API_KEY | API key for Unstructured.io service |
 | FIRECRAWL_API_KEY | API key for Firecrawl service (if using crawler) |
 **Important**: Since LLM calls are routed through LiteLLM, include API keys for the LLM providers you're using:
 - For OpenAI models: `OPENAI_API_KEY`
 - For Google Gemini models: `GEMINI_API_KEY`
 - For other providers, refer to the [LiteLLM documentation](https://docs.litellm.ai/docs/providers)
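Before starting the backend, it can help to confirm that every variable from the table above has a value. The following is a minimal sketch of such a check, not part of SurfSense itself: `parseDotEnv`, `missingKeys`, and `REQUIRED_KEYS` are hypothetical names, the parser only handles plain `KEY=VALUE` lines, and treating `FIRECRAWL_API_KEY` as optional is an assumption based on the "if using crawler" note.

```typescript
// Hypothetical key list mirroring the table above (FIRECRAWL_API_KEY left out as optional).
const REQUIRED_KEYS: string[] = [
  "DATABASE_URL",
  "SECRET_KEY",
  "GOOGLE_OAUTH_CLIENT_ID",
  "GOOGLE_OAUTH_CLIENT_SECRET",
  "NEXT_FRONTEND_URL",
  "EMBEDDING_MODEL",
  "RERANKERS_MODEL_NAME",
  "RERANKERS_MODEL_TYPE",
  "FAST_LLM",
  "STRATEGIC_LLM",
  "LONG_CONTEXT_LLM",
  "UNSTRUCTURED_API_KEY",
];

// Parse simple KEY=VALUE lines; blank lines and lines starting with # are skipped.
function parseDotEnv(text: string): Record<string, string> {
  const values: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before the equals sign
    const key = line.slice(0, eq).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const value = line.slice(eq + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
    values[key] = value;
  }
  return values;
}

// Return the required keys that are absent or set to an empty string.
function missingKeys(text: string, required: string[] = REQUIRED_KEYS): string[] {
  const values = parseDotEnv(text);
  return required.filter((key) => !values[key]);
}
```

Passing the contents of `surfsense_backend/.env` to `missingKeys` returns the names that still need a value. Provider keys such as `OPENAI_API_KEY` depend on the models you choose, so they are not in the default list.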
 ### 2. Install Dependencies
 Install the backend dependencies using `uv`:
 **Linux/macOS:**
 ```bash
 # Install uv if you don't have it
 curl -fsSL https://astral.sh/uv/install.sh | bash
 # Install dependencies
 uv sync
 ```
 **Windows (PowerShell):**
 ```powershell
 # Install uv if you don't have it
 iwr -useb https://astral.sh/uv/install.ps1 | iex
 # Install dependencies
 uv sync
 ```
 **Windows (Command Prompt):**
 ```cmd
 REM Install dependencies with uv (after installing uv)
 uv sync
 ```
 ### 3. Run the Backend
 Start the backend server:
 **Linux/macOS/Windows:**
 ```bash
 # Run without hot reloading
 uv run main.py
 # Or with hot reloading for development
 uv run main.py --reload
 ```
 If everything is set up correctly, you should see output indicating the server is running on `http://localhost:8000`.
 ## Frontend Setup
 ### 1. Environment Configuration
 Set up the frontend environment:
 **Linux/macOS:**
 ```bash
 cd surfsense_web
 cp .env.example .env
 ```
 **Windows (Command Prompt):**
 ```cmd
 cd surfsense_web
 copy .env.example .env
 ```
 **Windows (PowerShell):**
 ```powershell
 cd surfsense_web
 Copy-Item -Path .env.example -Destination .env
 ```
 Edit the `.env` file and set:
 | ENV VARIABLE | DESCRIPTION |
 |--------------|-------------|
 | NEXT_PUBLIC_FASTAPI_BACKEND_URL | Backend URL (e.g., `http://localhost:8000`) |
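The frontend hooks build request URLs by appending paths such as `/api/v1/documents` directly to this value, so a trailing slash in `NEXT_PUBLIC_FASTAPI_BACKEND_URL` would produce a double slash. A minimal sketch of a defensive join follows; `joinBackendUrl` is a hypothetical helper and not something the codebase is known to contain.

```typescript
// Hypothetical helper: join a base URL and a path with exactly one slash between them.
function joinBackendUrl(base: string, path: string): string {
  const trimmedBase = base.replace(/\/+$/, ""); // drop trailing slashes from the base
  const trimmedPath = path.replace(/^\/+/, ""); // drop leading slashes from the path
  return trimmedBase + "/" + trimmedPath;
}
```

Setting the variable without a trailing slash, as in the example value above, avoids the issue without any helper.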
 ### 2. Install Dependencies
 Install the frontend dependencies:
 **Linux/macOS:**
 ```bash
 # Install pnpm if you don't have it
 npm install -g pnpm
 # Install dependencies
 pnpm install
 ```
 **Windows:**
 ```powershell
 # Install pnpm if you don't have it
 npm install -g pnpm
 # Install dependencies
 pnpm install
 ```
 ### 3. Run the Frontend
 Start the Next.js development server:
 **Linux/macOS/Windows:**
 ```bash
 pnpm run dev
 ```
 The frontend should now be running at `http://localhost:3000`.
 ## Browser Extension Setup (Optional)
 The SurfSense browser extension allows you to save any webpage, including those protected behind authentication.
 ### 1. Environment Configuration
 **Linux/macOS:**
 ```bash
 cd surfsense_browser_extension
 cp .env.example .env
 ```
 **Windows (Command Prompt):**
 ```cmd
 cd surfsense_browser_extension
 copy .env.example .env
 ```
 **Windows (PowerShell):**
 ```powershell
 cd surfsense_browser_extension
 Copy-Item -Path .env.example -Destination .env
 ```
 Edit the `.env` file:
 | ENV VARIABLE | DESCRIPTION |
 |--------------|-------------|
 | PLASMO_PUBLIC_BACKEND_URL | SurfSense Backend URL (e.g., `http://127.0.0.1:8000`) |
 ### 2. Build the Extension
 Build the extension for your browser using the [Plasmo framework](https://docs.plasmo.com/framework/workflows/build#with-a-specific-target).
 **Linux/macOS/Windows:**
 ```bash
 # Install dependencies
 pnpm install
 # Build for Chrome (default)
 pnpm build
 # Or for other browsers
 pnpm build --target=firefox
 pnpm build --target=edge
 ```
 ### 3. Load the Extension
 Load the extension in your browser's developer mode and configure it with your SurfSense API key.
 ## Verification
 To verify your installation:
 1. Open your browser and navigate to `http://localhost:3000`
 2. Sign in with your Google account
 3. Create a search space and try uploading a document
 4. Test the chat functionality with your uploaded content
 ## Troubleshooting
 - **Database Connection Issues**: Verify your PostgreSQL server is running and pgvector is properly installed
 - **Authentication Problems**: Check your Google OAuth configuration and ensure redirect URIs are set correctly
 - **LLM Errors**: Confirm your LLM API keys are valid and the selected models are accessible
 - **File Upload Failures**: Validate your Unstructured.io API key
 - **Windows-specific**: If you encounter path issues, ensure you're using the correct path separator (`\` instead of `/`)
 - **macOS-specific**: If you encounter permission issues, you may need to use `sudo` for some installation commands
 ## Next Steps
 Now that you have SurfSense running locally, you can explore its features:
 - Create search spaces for organizing your content
 - Upload documents or use the browser extension to save webpages
 - Ask questions about your saved content
 - Explore the advanced RAG capabilities
 For production deployments, consider setting up:
 - A reverse proxy like Nginx
 - SSL certificates for secure connections
 - Proper database backups
 - User access controls 
--- a/surfsense_web/content/docs/meta.json
+++ b/surfsense_web/content/docs/meta.json
@ -0,0 +1,12 @@
 {
    "title": "Setup",
    "description": "The setup guide for Surfsense",
    "root": true,
    "pages": [
      "---Setup---",
      "index",
      "installation",
      "docker-installation",
      "manual-installation"
    ]
  }
--- a/surfsense_web/hooks/use-documents.ts
+++ b/surfsense_web/hooks/use-documents.ts
@ -22,7 +22,7 @@ export function useDocuments(searchSpaceId: number) {
      try {
        setLoading(true);
        const response = await fetch(
-          `${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/documents`, 
+          `${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/documents?search_space_id=${searchSpaceId}`, 
          {
            headers: {
              Authorization: `Bearer ${localStorage.getItem('surfsense_bearer_token')}`,
@ -57,7 +57,7 @@ export function useDocuments(searchSpaceId: number) {
    setLoading(true);
    try {
      const response = await fetch(
-        `${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/documents`, 
+        `${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/documents?search_space_id=${searchSpaceId}`, 
        {
          headers: {
            Authorization: `Bearer ${localStorage.getItem('surfsense_bearer_token')}`,
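Both hunks above add `search_space_id` to the documents URL by string interpolation, which is safe for a numeric ID. If more filters are added later, a small query-string builder keeps the encoding in one place. This is a minimal sketch under that assumption; `withQuery` is a hypothetical name and is not part of this change.

```typescript
// Hypothetical helper: append URL-encoded query parameters to a URL that has no query yet.
function withQuery(url: string, params: Record<string, string | number>): string {
  const pairs = Object.keys(params).map(
    (key) => encodeURIComponent(key) + "=" + encodeURIComponent(String(params[key]))
  );
  return pairs.length === 0 ? url : url + "?" + pairs.join("&");
}
```

For example, `withQuery(backendUrl + "/api/v1/documents", { search_space_id: searchSpaceId })` yields the same URL as the template literal in the hook.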
--- a/surfsense_web/hooks/useConnectorEditPage.ts
+++ b/surfsense_web/hooks/useConnectorEditPage.ts
@ -0,0 +1,240 @@
 import { useState, useEffect, useCallback } from 'react';
 import { useRouter } from 'next/navigation';
 import { useForm } from 'react-hook-form';
 import { zodResolver } from '@hookform/resolvers/zod';
 import { toast } from 'sonner';
 import { useSearchSourceConnectors, SearchSourceConnector } from '@/hooks/useSearchSourceConnectors';
 import { 
    GithubRepo, 
    EditMode, 
    githubPatSchema, 
    editConnectorSchema, 
    GithubPatFormValues, 
    EditConnectorFormValues 
 } from '@/components/editConnector/types';
 export function useConnectorEditPage(connectorId: number, searchSpaceId: string) {
    const router = useRouter();
    const { connectors, updateConnector, isLoading: connectorsLoading } = useSearchSourceConnectors();
    // State managed by the hook
    const [connector, setConnector] = useState<SearchSourceConnector | null>(null);
    const [originalConfig, setOriginalConfig] = useState<Record<string, any> | null>(null);
    const [isSaving, setIsSaving] = useState(false);
    const [currentSelectedRepos, setCurrentSelectedRepos] = useState<string[]>([]);
    const [originalPat, setOriginalPat] = useState<string>("");
    const [editMode, setEditMode] = useState<EditMode>('viewing');
    const [fetchedRepos, setFetchedRepos] = useState<GithubRepo[] | null>(null);
    const [newSelectedRepos, setNewSelectedRepos] = useState<string[]>([]);
    const [isFetchingRepos, setIsFetchingRepos] = useState(false);
    // Forms managed by the hook
    const patForm = useForm<GithubPatFormValues>({
        resolver: zodResolver(githubPatSchema),
        defaultValues: { github_pat: "" },
    });
    const editForm = useForm<EditConnectorFormValues>({
        resolver: zodResolver(editConnectorSchema),
        defaultValues: { 
            name: "", 
            SLACK_BOT_TOKEN: "", 
            NOTION_INTEGRATION_TOKEN: "", 
            SERPER_API_KEY: "", 
            TAVILY_API_KEY: "",
            LINEAR_API_KEY: ""
        }, 
    });
    // Effect to load initial data
    useEffect(() => {
        if (!connectorsLoading && connectors.length > 0 && !connector) {
            const currentConnector = connectors.find(c => c.id === connectorId);
            if (currentConnector) {
                setConnector(currentConnector);
                const config = currentConnector.config || {};
                setOriginalConfig(config);
                editForm.reset({
                    name: currentConnector.name,
                    SLACK_BOT_TOKEN: config.SLACK_BOT_TOKEN || "",
                    NOTION_INTEGRATION_TOKEN: config.NOTION_INTEGRATION_TOKEN || "",
                    SERPER_API_KEY: config.SERPER_API_KEY || "",
                    TAVILY_API_KEY: config.TAVILY_API_KEY || "",
                    LINEAR_API_KEY: config.LINEAR_API_KEY || ""
                });
                if (currentConnector.connector_type === 'GITHUB_CONNECTOR') {
                    const savedRepos = config.repo_full_names || [];
                    const savedPat = config.GITHUB_PAT || "";
                    setCurrentSelectedRepos(savedRepos);
                    setNewSelectedRepos(savedRepos);
                    setOriginalPat(savedPat);
                    patForm.reset({ github_pat: savedPat });
                    setEditMode('viewing');
                }
            } else {
                toast.error("Connector not found.");
                router.push(`/dashboard/${searchSpaceId}/connectors`);
            }
        }
    }, [connectorId, connectors, connectorsLoading, router, searchSpaceId, connector, editForm, patForm]);
    // Handlers managed by the hook
    const handleFetchRepositories = useCallback(async (values: GithubPatFormValues) => {
        setIsFetchingRepos(true);
        setFetchedRepos(null);
        try {
            const token = localStorage.getItem('surfsense_bearer_token');
            if (!token) throw new Error('No auth token');
            const response = await fetch(
                `${process.env.NEXT_PUBLIC_FASTAPI_BACKEND_URL}/api/v1/github/repositories/`,
                { method: 'POST', headers: { 'Content-Type': 'application/json', 'Authorization': `Bearer ${token}` }, body: JSON.stringify({ github_pat: values.github_pat }) }
            );
            if (!response.ok) { const err = await response.json(); throw new Error(err.detail || 'Fetch failed'); }
            const data: GithubRepo[] = await response.json();
            setFetchedRepos(data);
            setNewSelectedRepos(currentSelectedRepos);
            toast.success(`Found ${data.length} repos.`);
        } catch (error) {
            console.error("Error fetching GitHub repositories:", error);
            toast.error(error instanceof Error ? error.message : "Failed to fetch repositories.");
        } finally { setIsFetchingRepos(false); }
    }, [currentSelectedRepos]); // Added dependency
    const handleRepoSelectionChange = useCallback((repoFullName: string, checked: boolean) => {
        setNewSelectedRepos(prev => checked ? [...prev, repoFullName] : prev.filter(name => name !== repoFullName));
    }, []);
    const handleSaveChanges = useCallback(async (formData: EditConnectorFormValues) => {
        if (!connector || !originalConfig) return;
        setIsSaving(true);
        const updatePayload: Partial<SearchSourceConnector> = {};
        let configChanged = false;
        let newConfig: Record<string, any> | null = null;
        if (formData.name !== connector.name) {
            updatePayload.name = formData.name;
        }
        switch (connector.connector_type) {
            case 'GITHUB_CONNECTOR':
                const currentPatInForm = patForm.getValues('github_pat');
                const patChanged = currentPatInForm !== originalPat;
                const initialRepoSet = new Set(currentSelectedRepos);
                const newRepoSet = new Set(newSelectedRepos);
                const reposChanged = initialRepoSet.size !== newRepoSet.size || ![...initialRepoSet].every(repo => newRepoSet.has(repo));
                if (patChanged || (editMode === 'editing_repos' && reposChanged && fetchedRepos !== null)) {
                    if (!currentPatInForm || !(currentPatInForm.startsWith('ghp_') || currentPatInForm.startsWith('github_pat_'))) {
                        toast.error("Invalid GitHub PAT format. Cannot save."); setIsSaving(false); return;
                    }
                    newConfig = { GITHUB_PAT: currentPatInForm, repo_full_names: newSelectedRepos };
                    if (reposChanged && newSelectedRepos.length === 0) { toast.warning("Warning: No repositories selected."); }
                }
                break;
            case 'SLACK_CONNECTOR':
                 if (formData.SLACK_BOT_TOKEN !== originalConfig.SLACK_BOT_TOKEN) {
                     if (!formData.SLACK_BOT_TOKEN) { toast.error("Slack Token empty."); setIsSaving(false); return; }
                     newConfig = { SLACK_BOT_TOKEN: formData.SLACK_BOT_TOKEN };
                 }
                 break;
             case 'NOTION_CONNECTOR':
                  if (formData.NOTION_INTEGRATION_TOKEN !== originalConfig.NOTION_INTEGRATION_TOKEN) {
                      if (!formData.NOTION_INTEGRATION_TOKEN) { toast.error("Notion Token empty."); setIsSaving(false); return; }
                      newConfig = { NOTION_INTEGRATION_TOKEN: formData.NOTION_INTEGRATION_TOKEN };
                  }
                  break;
               case 'SERPER_API':
                   if (formData.SERPER_API_KEY !== originalConfig.SERPER_API_KEY) {
                       if (!formData.SERPER_API_KEY) { toast.error("Serper Key empty."); setIsSaving(false); return; }
                       newConfig = { SERPER_API_KEY: formData.SERPER_API_KEY };
                   }
                   break;
               case 'TAVILY_API':
                   if (formData.TAVILY_API_KEY !== originalConfig.TAVILY_API_KEY) {
                       if (!formData.TAVILY_API_KEY) { toast.error("Tavily Key empty."); setIsSaving(false); return; }
                       newConfig = { TAVILY_API_KEY: formData.TAVILY_API_KEY };
                   }
                   break;
            case 'LINEAR_CONNECTOR':
                if (formData.LINEAR_API_KEY !== originalConfig.LINEAR_API_KEY) {
                    if (!formData.LINEAR_API_KEY) { 
                        toast.error("Linear API Key cannot be empty."); 
                        setIsSaving(false); 
                        return; 
                    }
                    newConfig = { LINEAR_API_KEY: formData.LINEAR_API_KEY };
                }
                break;
        }
        if (newConfig !== null) {
            updatePayload.config = newConfig;
            configChanged = true;
        }
        if (Object.keys(updatePayload).length === 0) {
            toast.info("No changes detected.");
            setIsSaving(false);
            if (connector.connector_type === 'GITHUB_CONNECTOR') { setEditMode('viewing'); patForm.reset({ github_pat: originalPat }); }
            return;
        }
        try {
            await updateConnector(connectorId, updatePayload);
            toast.success("Connector updated!");
            const newlySavedConfig = updatePayload.config || originalConfig;
            setOriginalConfig(newlySavedConfig);
            if (updatePayload.name) {
                 setConnector(prev => prev ? { ...prev, name: updatePayload.name!, config: newlySavedConfig } : null);
            }
            if (configChanged) {
                if (connector.connector_type === 'GITHUB_CONNECTOR') {
                     const savedGitHubConfig = newlySavedConfig as { GITHUB_PAT?: string; repo_full_names?: string[] };
                     setCurrentSelectedRepos(savedGitHubConfig.repo_full_names || []);
                     setOriginalPat(savedGitHubConfig.GITHUB_PAT || "");
                     setNewSelectedRepos(savedGitHubConfig.repo_full_names || []);
                     patForm.reset({ github_pat: savedGitHubConfig.GITHUB_PAT || "" });
                 } else if(connector.connector_type === 'SLACK_CONNECTOR') {
                    editForm.setValue('SLACK_BOT_TOKEN', newlySavedConfig.SLACK_BOT_TOKEN || "");
                 } else if(connector.connector_type === 'NOTION_CONNECTOR') {
                    editForm.setValue('NOTION_INTEGRATION_TOKEN', newlySavedConfig.NOTION_INTEGRATION_TOKEN || "");
                 } else if(connector.connector_type === 'SERPER_API') {
                    editForm.setValue('SERPER_API_KEY', newlySavedConfig.SERPER_API_KEY || "");
                 } else if(connector.connector_type === 'TAVILY_API') {
                    editForm.setValue('TAVILY_API_KEY', newlySavedConfig.TAVILY_API_KEY || "");
                 } else if(connector.connector_type === 'LINEAR_CONNECTOR') {
                    editForm.setValue('LINEAR_API_KEY', newlySavedConfig.LINEAR_API_KEY || "");
                 }
             }
            if (connector.connector_type === 'GITHUB_CONNECTOR') {
                 setEditMode('viewing');
                 setFetchedRepos(null);
             }
            // Resetting simple form values is handled by useEffect if connector state updates
        } catch (error) {
            console.error("Error updating connector:", error);
            toast.error(error instanceof Error ? error.message : "Failed to update connector.");
        } finally { setIsSaving(false); }
    }, [connector, originalConfig, updateConnector, connectorId, patForm, originalPat, currentSelectedRepos, newSelectedRepos, editMode, fetchedRepos, editForm]); // Added editForm to dependencies
    // Return values needed by the component
    return {
        connectorsLoading,
        connector,
        isSaving,
        editForm,
        patForm,
        handleSaveChanges,
        // GitHub specific props
        editMode,
        setEditMode,
        originalPat,
        currentSelectedRepos,
        fetchedRepos,
        setFetchedRepos,
        newSelectedRepos,
        setNewSelectedRepos,
        isFetchingRepos,
        handleFetchRepositories,
        handleRepoSelectionChange,
    };
 } 
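In `handleSaveChanges`, the GitHub branch decides whether the repository selection changed by comparing the saved and the new lists as sets, so order and duplicates do not matter. The same comparison can be sketched as a standalone function. `sameRepoSelection` is a hypothetical name, and this version uses sorted de-duplicated copies instead of the `Set` objects the hook uses.

```typescript
// Hypothetical helper: true when both lists contain the same repository names,
// ignoring order and duplicates.
function sameRepoSelection(a: string[], b: string[]): boolean {
  // filter() returns a new array, so sort() never mutates the caller's list.
  const uniqueSorted = (items: string[]): string[] =>
    items.filter((item, i) => items.indexOf(item) === i).sort();
  const left = uniqueSorted(a);
  const right = uniqueSorted(b);
  return left.length === right.length && left.every((name, i) => name === right[i]);
}
```

With such a helper, `reposChanged` would be `!sameRepoSelection(currentSelectedRepos, newSelectedRepos)`.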
--- a/surfsense_web/lib/connectors/utils.ts
+++ b/surfsense_web/lib/connectors/utils.ts
@ -0,0 +1,12 @@
 // Helper function to get connector type display name
 export const getConnectorTypeDisplay = (type: string): string => {
    const typeMap: Record<string, string> = {
        "SERPER_API": "Serper API",
        "TAVILY_API": "Tavily API",
        "SLACK_CONNECTOR": "Slack",
        "NOTION_CONNECTOR": "Notion",
        "GITHUB_CONNECTOR": "GitHub",
        "LINEAR_CONNECTOR": "Linear",
    };
    return typeMap[type] || type;
 }; 
--- a/surfsense_web/lib/source.ts
+++ b/surfsense_web/lib/source.ts
@ -0,0 +1,7 @@
 import { docs } from '@/.source';
 import { loader } from 'fumadocs-core/source';
 export const source = loader({
  baseUrl: '/docs',
  source: docs.toFumadocsSource(),
 });
--- a/surfsense_web/mdx-components.tsx
+++ b/surfsense_web/mdx-components.tsx
@ -0,0 +1,9 @@
 import defaultMdxComponents from 'fumadocs-ui/mdx';
 import type { MDXComponents } from 'mdx/types';
 export function getMDXComponents(components?: MDXComponents): MDXComponents {
  return {
    ...defaultMdxComponents,
    ...components,
  };
 }
--- a/surfsense_web/next.config.ts
+++ b/surfsense_web/next.config.ts
@ -1,4 +1,5 @@
 import type { NextConfig } from "next";
 import { createMDX } from 'fumadocs-mdx/next';
 const nextConfig: NextConfig = {
  typescript: {
@ -9,4 +10,7 @@ const nextConfig: NextConfig = {
  },
 };
-export default nextConfig;
+// Wrap the config with createMDX
 const withMDX = createMDX({});
 export default withMDX(nextConfig);
--- a/surfsense_web/package.json
+++ b/surfsense_web/package.json
@ -1,15 +1,18 @@
 {
-  "name": "surf_new_frontend",
+  "name": "surfsense_web",
-  "version": "0.1.0",
+  "version": "0.0.6",
  "private": true,
  "description": "SurfSense Frontend",
  "scripts": {
-    "dev": "next dev --turbopack",
+    "dev": "next dev",
    "dev:turbopack": "next dev --turbopack",
    "build": "next build",
    "start": "next start",
    "lint": "next lint",
    "debug": "cross-env NODE_OPTIONS=--inspect next dev --turbopack",
    "debug:browser": "cross-env NODE_OPTIONS=--inspect next dev --turbopack",
-    "debug:server": "cross-env NODE_OPTIONS=--inspect=0.0.0.0:9229 next dev --turbopack"
+    "debug:server": "cross-env NODE_OPTIONS=--inspect=0.0.0.0:9229 next dev --turbopack",
    "postinstall": "fumadocs-mdx"
  },
  "dependencies": {
    "@ai-sdk/react": "^1.1.21",
@ -30,12 +33,16 @@
    "@radix-ui/react-tooltip": "^1.1.8",
    "@tabler/icons-react": "^3.30.0",
    "@tanstack/react-table": "^8.21.2",
    "@types/mdx": "^2.0.13",
    "ai": "^4.1.54",
    "class-variance-authority": "^0.7.1",
    "clsx": "^2.1.1",
    "date-fns": "^4.1.0",
    "emblor": "^1.4.7",
    "framer-motion": "^12.4.7",
    "fumadocs-core": "^15.2.9",
    "fumadocs-mdx": "^11.6.1",
    "fumadocs-ui": "^15.2.9",
    "geist": "^1.3.1",
    "lucide-react": "^0.477.0",
    "next": "15.2.3",
--- a/surfsense_web/pnpm-lock.yaml
+++ b/surfsense_web/pnpm-lock.yaml
--- a/surfsense_web/public/docs/google_oauth_client.png
+++ b/surfsense_web/public/docs/google_oauth_client.png
--- a/surfsense_web/public/docs/google_oauth_config.png
+++ b/surfsense_web/public/docs/google_oauth_config.png
--- a/surfsense_web/public/docs/google_oauth_people_api.png
+++ b/surfsense_web/public/docs/google_oauth_people_api.png
--- a/surfsense_web/public/docs/google_oauth_screen.png
+++ b/surfsense_web/public/docs/google_oauth_screen.png
--- a/surfsense_web/public/docs/langsmith.png
+++ b/surfsense_web/public/docs/langsmith.png
--- a/surfsense_web/public/docs/unstructured.png
+++ b/surfsense_web/public/docs/unstructured.png
--- a/surfsense_web/source.config.ts
+++ b/surfsense_web/source.config.ts
@ -0,0 +1,5 @@
 import { defineDocs } from 'fumadocs-mdx/config';
 export const docs = defineDocs({
  dir: 'content/docs',
 });
--- a/surfsense_web/tsconfig.json
+++ b/surfsense_web/tsconfig.json
@ -22,6 +22,6 @@
      "@/*": ["./*"]
    }
  },
-  "include": ["next-env.d.ts", "**/*.ts", "**/*.tsx", ".next/types/**/*.ts"],
+  "include": ["next-env.d.ts", "**/*.ts", "**/*.tsx", ".next/types/**/*.ts", "next.config.mjs"],
  "exclude": ["node_modules"]
 }
@ -0,0 +1 @
 Generic single-database configuration with an async dbapi.
@ -0,0 +1 @
 """This is upcoming research agent. Work in progress."""