mirror of
https://github.com/MODSetter/SurfSense.git
synced 2026-07-02 22:01:05 +02:00
Merge remote-tracking branch 'upstream/dev' into feat/web-search
commit 60d12b0a70
45 changed files with 377 additions and 198 deletions
@ -14,6 +14,7 @@ SurfSense now supports the following Chinese LLMs:
- ✅ **Alibaba Qwen (阿里通义千问)** - Alibaba Cloud's Qwen (Tongyi Qianwen) large language model
- ✅ **Moonshot Kimi (月之暗面)** - Moonshot AI's Kimi large language model
- ✅ **Zhipu AI GLM (智谱)** - Zhipu AI's GLM model series
- ✅ **MiniMax** - MiniMax large language model (M2.5 series, 204K context)

---

@ -197,6 +198,52 @@ API Base URL: https://open.bigmodel.cn/api/paas/v4

---

## 5️⃣ MiniMax Configuration

### Get an API Key

1. Visit the [MiniMax Open Platform](https://platform.minimaxi.com/)
2. Register and log in
3. Go to the **API Keys** page
4. Create a new API Key
5. Copy the API Key

### Configure in SurfSense

| Field | Value | Notes |
|------|-----|------|
| **Configuration Name** | `MiniMax M2.5` | Configuration name (your choice) |
| **Provider** | `MINIMAX` | Select MiniMax |
| **Model Name** | `MiniMax-M2.5` | Recommended model<br>Other option: `MiniMax-M2.5-highspeed` |
| **API Key** | `eyJ...` | Your MiniMax API Key |
| **API Base URL** | `https://api.minimax.io/v1` | MiniMax API endpoint |
| **Parameters** | `{"temperature": 1.0}` | Note: temperature must be in the range (0.0, 1.0] and cannot be 0 |

### Example Configuration

```
Configuration Name: MiniMax M2.5
Provider: MINIMAX
Model Name: MiniMax-M2.5
API Key: eyJxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
API Base URL: https://api.minimax.io/v1
```

### Available Models

- **MiniMax-M2.5**: High-performance general-purpose model, 204K context window (recommended)
- **MiniMax-M2.5-highspeed**: High-speed inference variant, 204K context window

### Notes

- **temperature parameter**: MiniMax requires temperature to be in the range (0.0, 1.0]; it cannot be set to 0. A value of 1.0 is recommended.
- Both models support a 204K context window, which suits long-text tasks.

### Pricing
- See the [MiniMax pricing page](https://platform.minimaxi.com/document/Price) for current prices
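
The temperature constraint described in this section can be checked before a request is sent. The following is a minimal sketch, not SurfSense's actual code: `build_minimax_params` is a hypothetical helper, and the `openai/` model prefix is an assumption based on this commit mapping the `MINIMAX` provider to `"openai"`.

```python
MINIMAX_API_BASE = "https://api.minimax.io/v1"  # value from the configuration table


def build_minimax_params(
    api_key: str,
    model_name: str = "MiniMax-M2.5",
    temperature: float = 1.0,
) -> dict:
    """Hypothetical helper: build request parameters for a MiniMax model.

    Rejects temperatures outside (0.0, 1.0], the range the documentation
    says MiniMax requires.
    """
    if not (0.0 < temperature <= 1.0):
        raise ValueError(
            f"temperature must be in (0.0, 1.0], got {temperature}"
        )
    return {
        # Assumption: MiniMax is reached through the OpenAI-compatible route.
        "model": f"openai/{model_name}",
        "api_key": api_key,
        "api_base": MINIMAX_API_BASE,
        "temperature": temperature,
    }


params = build_minimax_params("eyJ-example-key")
```

A dictionary of this shape could plausibly be passed as keyword arguments to `litellm.completion` together with a `messages` list; that wiring is not shown in the commit and is left as an assumption here.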
|
||||||
|
|
||||||
|
---

## ⚙️ Advanced Configuration

### Custom Parameters

@ -268,8 +315,8 @@ docker compose logs backend | grep -i "error"
 |---------|---------|------|
 | **Document summarization** | Qwen-Plus, GLM-4 | Balances performance and cost |
 | **Code analysis** | DeepSeek-Coder | Code-specialized |
-| **Long-text processing** | Kimi 128K | Very long context |
-| **Fast response** | Qwen-Turbo, GLM-4-Flash | Speed first |
+| **Long-text processing** | Kimi 128K, MiniMax-M2.5 (204K) | Very long context |
+| **Fast response** | Qwen-Turbo, GLM-4-Flash, MiniMax-M2.5-highspeed | Speed first |
 
 ### 2. Cost Optimization
 
@ -294,6 +341,7 @@ docker compose logs backend | grep -i "error"
- [Alibaba Cloud Bailian documentation](https://help.aliyun.com/zh/model-studio/)
- [Moonshot AI documentation](https://platform.moonshot.cn/docs)
- [Zhipu AI documentation](https://open.bigmodel.cn/dev/api)
- [MiniMax documentation](https://platform.minimaxi.com/document/Guides)

### SurfSense Documentation

@ -0,0 +1,23 @@
"""Add MINIMAX to LiteLLMProvider enum

Revision ID: 106
Revises: 105
"""

from collections.abc import Sequence

from alembic import op

revision: str = "106"
down_revision: str | None = "105"
branch_labels: str | Sequence[str] | None = None
depends_on: str | Sequence[str] | None = None


def upgrade() -> None:
    # End the migration's open transaction first. This is likely needed
    # because older PostgreSQL versions reject ALTER TYPE ... ADD VALUE
    # inside a transaction block.
    op.execute("COMMIT")
    op.execute("ALTER TYPE litellmprovider ADD VALUE IF NOT EXISTS 'MINIMAX'")


def downgrade() -> None:
    # No-op: PostgreSQL has no ALTER TYPE ... DROP VALUE.
    pass

@ -59,6 +59,7 @@ PROVIDER_MAP = {
|
||||||
"DATABRICKS": "databricks",
|
"DATABRICKS": "databricks",
|
||||||
"COMETAPI": "cometapi",
|
"COMETAPI": "cometapi",
|
||||||
"HUGGINGFACE": "huggingface",
|
"HUGGINGFACE": "huggingface",
|
||||||
|
"MINIMAX": "openai",
|
||||||
"CUSTOM": "custom",
|
"CUSTOM": "custom",
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -183,6 +183,23 @@ global_llm_configs:
    use_default_system_instructions: true
    citations_enabled: true

  # Example: MiniMax M2.5 - High-performance with 204K context window
  - id: -8
    name: "Global MiniMax M2.5"
    description: "MiniMax M2.5 with 204K context window and competitive pricing"
    provider: "MINIMAX"
    model_name: "MiniMax-M2.5"
    api_key: "your-minimax-api-key-here"
    api_base: "https://api.minimax.io/v1"
    rpm: 60
    tpm: 100000
    litellm_params:
      temperature: 1.0  # MiniMax requires temperature in (0.0, 1.0], cannot be 0
      max_tokens: 4000
    system_instructions: ""
    use_default_system_instructions: true
    citations_enabled: true

# =============================================================================
# Image Generation Configuration
# =============================================================================

@ -463,7 +463,7 @@ async def _process_gmail_messages_phase2(
|
||||||
"connector_id": connector_id,
|
"connector_id": connector_id,
|
||||||
"source": "composio",
|
"source": "composio",
|
||||||
}
|
}
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
document.status = DocumentStatus.ready()
|
document.status = DocumentStatus.ready()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -477,7 +477,7 @@ async def index_composio_google_calendar(
|
||||||
"connector_id": connector_id,
|
"connector_id": connector_id,
|
||||||
"source": "composio",
|
"source": "composio",
|
||||||
}
|
}
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
document.status = DocumentStatus.ready()
|
document.status = DocumentStatus.ready()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -1112,7 +1112,7 @@ async def _index_composio_drive_delta_sync(
|
||||||
"connector_id": connector_id,
|
"connector_id": connector_id,
|
||||||
"source": "composio",
|
"source": "composio",
|
||||||
}
|
}
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
document.status = DocumentStatus.ready()
|
document.status = DocumentStatus.ready()
|
||||||
|
|
||||||
|
|
@ -1520,7 +1520,7 @@ async def _index_composio_drive_full_scan(
|
||||||
"connector_id": connector_id,
|
"connector_id": connector_id,
|
||||||
"source": "composio",
|
"source": "composio",
|
||||||
}
|
}
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
document.status = DocumentStatus.ready()
|
document.status = DocumentStatus.ready()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -215,6 +215,7 @@ class LiteLLMProvider(StrEnum):
|
||||||
COMETAPI = "COMETAPI"
|
COMETAPI = "COMETAPI"
|
||||||
HUGGINGFACE = "HUGGINGFACE"
|
HUGGINGFACE = "HUGGINGFACE"
|
||||||
GITHUB_MODELS = "GITHUB_MODELS"
|
GITHUB_MODELS = "GITHUB_MODELS"
|
||||||
|
MINIMAX = "MINIMAX"
|
||||||
CUSTOM = "CUSTOM"
|
CUSTOM = "CUSTOM"
|
||||||
|
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -1,3 +1,4 @@
+import asyncio
 import time
 from datetime import datetime
 
@ -49,7 +50,7 @@ class ChucksHybridSearchRetriever:
         # Get embedding for the query
         embedding_model = config.embedding_model_instance
         t_embed = time.perf_counter()
-        query_embedding = embedding_model.embed(query_text)
+        query_embedding = await asyncio.to_thread(embedding_model.embed, query_text)
         perf.debug(
             "[chunk_search] vector_search embedding in %.3fs",
             time.perf_counter() - t_embed,
@ -195,7 +196,7 @@ class ChucksHybridSearchRetriever:
         if query_embedding is None:
             embedding_model = config.embedding_model_instance
             t_embed = time.perf_counter()
-            query_embedding = embedding_model.embed(query_text)
+            query_embedding = await asyncio.to_thread(embedding_model.embed, query_text)
             perf.debug(
                 "[chunk_search] hybrid_search embedding in %.3fs",
                 time.perf_counter() - t_embed,
@ -427,4 +428,4 @@ class ChucksHybridSearchRetriever:
             search_space_id,
             document_type,
         )
         return final_docs
|
||||||
|
|
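
The retriever change above moves a synchronous `embed` call off the event loop with `asyncio.to_thread`. The sketch below illustrates that pattern in isolation. `SlowEmbedder` and `heartbeat` are hypothetical stand-ins introduced for this example and are not part of SurfSense.

```python
import asyncio
import time


class SlowEmbedder:
    """Hypothetical stand-in for a synchronous embedding model."""

    def embed(self, text: str) -> list[float]:
        time.sleep(0.2)  # simulate blocking work
        return [float(len(text))]


async def heartbeat(ticks: list[float]) -> None:
    """Record timestamps to show the event loop keeps running."""
    for _ in range(3):
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.05)


async def main() -> tuple[list[float], int]:
    model = SlowEmbedder()
    ticks: list[float] = []
    hb = asyncio.create_task(heartbeat(ticks))
    # The blocking call runs in a worker thread; the loop stays free
    # to schedule the heartbeat task in the meantime.
    embedding = await asyncio.to_thread(model.embed, "hello")
    await hb
    return embedding, len(ticks)


embedding, tick_count = asyncio.run(main())
```

Calling `model.embed("hello")` directly inside `main` would instead hold the event loop for the full duration of the call, delaying every other coroutine.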
@ -1,11 +1,10 @@
|
||||||
import logging
|
import logging
|
||||||
from datetime import datetime
|
from datetime import datetime
|
||||||
|
|
||||||
from sqlalchemy import delete
|
|
||||||
from sqlalchemy.ext.asyncio import AsyncSession
|
from sqlalchemy.ext.asyncio import AsyncSession
|
||||||
|
|
||||||
from app.connectors.linear_connector import LinearConnector
|
from app.connectors.linear_connector import LinearConnector
|
||||||
from app.db import Chunk, Document
|
from app.db import Document
|
||||||
from app.services.llm_service import get_user_long_context_llm
|
from app.services.llm_service import get_user_long_context_llm
|
||||||
from app.utils.document_converters import (
|
from app.utils.document_converters import (
|
||||||
create_document_chunks,
|
create_document_chunks,
|
||||||
|
|
@ -105,10 +104,6 @@ class LinearKBSyncService:
|
||||||
)
|
)
|
||||||
summary_embedding = embed_text(summary_content)
|
summary_embedding = embed_text(summary_content)
|
||||||
|
|
||||||
await self.db_session.execute(
|
|
||||||
delete(Chunk).where(Chunk.document_id == document.id)
|
|
||||||
)
|
|
||||||
|
|
||||||
chunks = await create_document_chunks(issue_content)
|
chunks = await create_document_chunks(issue_content)
|
||||||
|
|
||||||
document.title = f"{issue_identifier}: {issue_title}"
|
document.title = f"{issue_identifier}: {issue_title}"
|
||||||
|
|
@ -131,7 +126,7 @@ class LinearKBSyncService:
|
||||||
"connector_id": connector_id,
|
"connector_id": connector_id,
|
||||||
}
|
}
|
||||||
flag_modified(document, "document_metadata")
|
flag_modified(document, "document_metadata")
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(self.db_session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
|
|
||||||
await self.db_session.commit()
|
await self.db_session.commit()
|
||||||
|
|
|
||||||
|
|
@ -85,6 +85,7 @@ PROVIDER_MAP = {
|
||||||
"ZHIPU": "openai",
|
"ZHIPU": "openai",
|
||||||
"GITHUB_MODELS": "github",
|
"GITHUB_MODELS": "github",
|
||||||
"HUGGINGFACE": "huggingface",
|
"HUGGINGFACE": "huggingface",
|
||||||
|
"MINIMAX": "openai",
|
||||||
"CUSTOM": "custom",
|
"CUSTOM": "custom",
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -127,6 +127,7 @@ async def validate_llm_config(
|
||||||
"ALIBABA_QWEN": "openai",
|
"ALIBABA_QWEN": "openai",
|
||||||
"MOONSHOT": "openai",
|
"MOONSHOT": "openai",
|
||||||
"ZHIPU": "openai", # GLM needs special handling
|
"ZHIPU": "openai", # GLM needs special handling
|
||||||
|
"MINIMAX": "openai",
|
||||||
"GITHUB_MODELS": "github",
|
"GITHUB_MODELS": "github",
|
||||||
}
|
}
|
||||||
provider_prefix = provider_map.get(provider, provider.lower())
|
provider_prefix = provider_map.get(provider, provider.lower())
|
||||||
|
|
@ -277,6 +278,7 @@ async def get_search_space_llm_instance(
|
||||||
"ALIBABA_QWEN": "openai",
|
"ALIBABA_QWEN": "openai",
|
||||||
"MOONSHOT": "openai",
|
"MOONSHOT": "openai",
|
||||||
"ZHIPU": "openai",
|
"ZHIPU": "openai",
|
||||||
|
"MINIMAX": "openai",
|
||||||
}
|
}
|
||||||
provider_prefix = provider_map.get(
|
provider_prefix = provider_map.get(
|
||||||
global_config["provider"], global_config["provider"].lower()
|
global_config["provider"], global_config["provider"].lower()
|
||||||
|
|
@ -350,6 +352,7 @@ async def get_search_space_llm_instance(
|
||||||
"ALIBABA_QWEN": "openai",
|
"ALIBABA_QWEN": "openai",
|
||||||
"MOONSHOT": "openai",
|
"MOONSHOT": "openai",
|
||||||
"ZHIPU": "openai",
|
"ZHIPU": "openai",
|
||||||
|
"MINIMAX": "openai",
|
||||||
"GITHUB_MODELS": "github",
|
"GITHUB_MODELS": "github",
|
||||||
}
|
}
|
||||||
provider_prefix = provider_map.get(
|
provider_prefix = provider_map.get(
|
||||||
|
|
|
||||||
|
|
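
The hunks above add `"MINIMAX": "openai"` to each provider map and show the lookup `provider_map.get(provider, provider.lower())`. A minimal sketch of how such a prefix might be combined with a model name follows. The `resolve_model_string` helper and the `prefix/model` join are assumptions for illustration, and the map is only an excerpt of the entries visible in the diff.

```python
# Excerpt of the mapping shown in the diff; other providers omitted.
PROVIDER_MAP = {
    "ALIBABA_QWEN": "openai",
    "MOONSHOT": "openai",
    "ZHIPU": "openai",
    "MINIMAX": "openai",
    "GITHUB_MODELS": "github",
}


def resolve_model_string(provider: str, model_name: str) -> str:
    """Hypothetical helper: map a provider enum value to a routed model name.

    Providers missing from the map fall back to their lowercased name,
    mirroring the `.get(provider, provider.lower())` lookup in the diff.
    """
    prefix = PROVIDER_MAP.get(provider, provider.lower())
    return f"{prefix}/{model_name}"


minimax_model = resolve_model_string("MINIMAX", "MiniMax-M2.5")
fallback_model = resolve_model_string("HUGGINGFACE", "some-model")
```

Because MiniMax exposes an OpenAI-compatible endpoint, routing it through the `openai` prefix together with a custom `api_base` avoids adding a dedicated client.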
@ -1,10 +1,9 @@
|
||||||
import logging
|
import logging
|
||||||
from datetime import datetime
|
from datetime import datetime
|
||||||
|
|
||||||
from sqlalchemy import delete
|
|
||||||
from sqlalchemy.ext.asyncio import AsyncSession
|
from sqlalchemy.ext.asyncio import AsyncSession
|
||||||
|
|
||||||
from app.db import Chunk, Document
|
from app.db import Document
|
||||||
from app.services.llm_service import get_user_long_context_llm
|
from app.services.llm_service import get_user_long_context_llm
|
||||||
from app.utils.document_converters import (
|
from app.utils.document_converters import (
|
||||||
create_document_chunks,
|
create_document_chunks,
|
||||||
|
|
@ -130,11 +129,6 @@ class NotionKBSyncService:
|
||||||
summary_content = f"Notion Page: {document.document_metadata.get('page_title')}\n\n{full_content}"
|
summary_content = f"Notion Page: {document.document_metadata.get('page_title')}\n\n{full_content}"
|
||||||
summary_embedding = embed_text(summary_content)
|
summary_embedding = embed_text(summary_content)
|
||||||
|
|
||||||
logger.debug(f"Deleting old chunks for document {document_id}")
|
|
||||||
await self.db_session.execute(
|
|
||||||
delete(Chunk).where(Chunk.document_id == document.id)
|
|
||||||
)
|
|
||||||
|
|
||||||
logger.debug("Creating new chunks")
|
logger.debug("Creating new chunks")
|
||||||
chunks = await create_document_chunks(full_content)
|
chunks = await create_document_chunks(full_content)
|
||||||
logger.debug(f"Created {len(chunks)} chunks")
|
logger.debug(f"Created {len(chunks)} chunks")
|
||||||
|
|
@ -147,7 +141,7 @@ class NotionKBSyncService:
|
||||||
**document.document_metadata,
|
**document.document_metadata,
|
||||||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||||
}
|
}
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(self.db_session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
|
|
||||||
logger.debug("Committing changes to database")
|
logger.debug("Committing changes to database")
|
||||||
|
|
|
||||||
|
|
@ -432,7 +432,7 @@ async def index_airtable_records(
|
||||||
"table_name": item["table_name"],
|
"table_name": item["table_name"],
|
||||||
"connector_id": connector_id,
|
"connector_id": connector_id,
|
||||||
}
|
}
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
document.status = DocumentStatus.ready()
|
document.status = DocumentStatus.ready()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -28,45 +28,37 @@ def get_current_timestamp() -> datetime:
     return datetime.now(UTC)
 
 
-def safe_set_chunks(document: Document, chunks: list) -> None:
+async def safe_set_chunks(
+    session: "AsyncSession", document: Document, chunks: list
+) -> None:
     """
-    Safely assign chunks to a document without triggering lazy loading.
+    Delete old chunks and assign new ones to a document.
 
-    ALWAYS use this instead of `document.chunks = chunks` to avoid
-    SQLAlchemy async errors (MissingGreenlet / greenlet_spawn).
-
-    Why this is needed:
-    - Direct assignment `document.chunks = chunks` triggers SQLAlchemy to
-      load the OLD chunks first (for comparison/orphan detection)
-    - This lazy loading fails in async context with asyncpg driver
-    - set_committed_value bypasses this by setting the value directly
-
-    This function is safe regardless of how the document was loaded
-    (with or without selectinload).
+    This replaces direct ``document.chunks = chunks`` which triggers lazy
+    loading (and MissingGreenlet errors in async contexts). It also
+    explicitly deletes pre-existing chunks so they don't accumulate across
+    repeated re-indexes — ``set_committed_value`` bypasses SQLAlchemy's
+    delete-orphan cascade.
 
     Args:
-        document: The Document object to update
-        chunks: List of Chunk objects to assign
-
-    Example:
-        # Instead of: document.chunks = chunks (DANGEROUS!)
-        safe_set_chunks(document, chunks)  # Always safe
+        session: The current async database session.
+        document: The Document object to update.
+        chunks: List of Chunk objects to assign.
     """
-    from sqlalchemy.orm import object_session
+    from sqlalchemy import delete
     from sqlalchemy.orm.attributes import set_committed_value
 
-    # Keep relationship assignment lazy-load-safe.
-    set_committed_value(document, "chunks", chunks)
-
-    # Ensure chunk rows are actually persisted.
-    # set_committed_value bypasses normal unit-of-work tracking, so we need to
-    # explicitly attach chunk objects to the current session.
-    session = object_session(document)
-    if session is not None:
-        if document.id is not None:
-            for chunk in chunks:
-                chunk.document_id = document.id
-        session.add_all(chunks)
+    from app.db import Chunk
+
+    if document.id is not None:
+        await session.execute(
+            delete(Chunk).where(Chunk.document_id == document.id)
+        )
+        for chunk in chunks:
+            chunk.document_id = document.id
+
+    set_committed_value(document, "chunks", chunks)
+    session.add_all(chunks)
 
 
 def parse_date_flexible(date_str: str) -> datetime:
|
||||||
|
|
|
||||||
|
|
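
The rewritten `safe_set_chunks` deletes a document's existing chunk rows before attaching the new ones, so repeated re-indexing does not accumulate stale chunks. The sketch below shows the same delete-then-insert idea using the standard-library `sqlite3` module instead of SQLAlchemy. The `chunks` table layout and the `replace_chunks` helper are assumptions made for this example.

```python
import sqlite3


def replace_chunks(
    conn: sqlite3.Connection, document_id: int, new_chunks: list[str]
) -> None:
    """Delete a document's old chunk rows, then insert the new ones."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "DELETE FROM chunks WHERE document_id = ?", (document_id,)
        )
        conn.executemany(
            "INSERT INTO chunks (document_id, content) VALUES (?, ?)",
            [(document_id, content) for content in new_chunks],
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks ("
    "id INTEGER PRIMARY KEY, document_id INTEGER, content TEXT)"
)

replace_chunks(conn, 1, ["a", "b", "c"])  # initial index
replace_chunks(conn, 1, ["a2", "b2"])     # re-index of the same document

rows = conn.execute(
    "SELECT content FROM chunks WHERE document_id = 1 ORDER BY id"
).fetchall()
```

Without the `DELETE` step, the second call would leave five rows for document 1 instead of two, which is the accumulation the new docstring describes.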
@ -430,7 +430,7 @@ async def index_bookstack_pages(
|
||||||
document.content_hash = item["content_hash"]
|
document.content_hash = item["content_hash"]
|
||||||
document.embedding = summary_embedding
|
document.embedding = summary_embedding
|
||||||
document.document_metadata = doc_metadata
|
document.document_metadata = doc_metadata
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
document.status = DocumentStatus.ready()
|
document.status = DocumentStatus.ready()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -439,7 +439,7 @@ async def index_clickup_tasks(
|
||||||
"connector_id": connector_id,
|
"connector_id": connector_id,
|
||||||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||||
}
|
}
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
document.status = DocumentStatus.ready()
|
document.status = DocumentStatus.ready()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -413,7 +413,7 @@ async def index_confluence_pages(
|
||||||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||||
"connector_id": connector_id,
|
"connector_id": connector_id,
|
||||||
}
|
}
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
document.status = DocumentStatus.ready()
|
document.status = DocumentStatus.ready()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -690,7 +690,7 @@ async def index_discord_messages(
|
||||||
"indexed_at": datetime.now(UTC).strftime("%Y-%m-%d %H:%M:%S"),
|
"indexed_at": datetime.now(UTC).strftime("%Y-%m-%d %H:%M:%S"),
|
||||||
"connector_id": connector_id,
|
"connector_id": connector_id,
|
||||||
}
|
}
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
document.status = DocumentStatus.ready()
|
document.status = DocumentStatus.ready()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -386,7 +386,7 @@ async def index_elasticsearch_documents(
|
||||||
document.content_hash = item["content_hash"]
|
document.content_hash = item["content_hash"]
|
||||||
document.unique_identifier_hash = item["unique_identifier_hash"]
|
document.unique_identifier_hash = item["unique_identifier_hash"]
|
||||||
document.document_metadata = metadata
|
document.document_metadata = metadata
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
document.status = DocumentStatus.ready()
|
document.status = DocumentStatus.ready()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -415,7 +415,7 @@ async def index_github_repos(
|
||||||
document.content_hash = item["content_hash"]
|
document.content_hash = item["content_hash"]
|
||||||
document.embedding = summary_embedding
|
document.embedding = summary_embedding
|
||||||
document.document_metadata = doc_metadata
|
document.document_metadata = doc_metadata
|
||||||
safe_set_chunks(document, chunks_data)
|
await safe_set_chunks(session, document, chunks_data)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
document.status = DocumentStatus.ready()
|
document.status = DocumentStatus.ready()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -528,7 +528,7 @@ async def index_google_calendar_events(
|
||||||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||||
"connector_id": connector_id,
|
"connector_id": connector_id,
|
||||||
}
|
}
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
document.status = DocumentStatus.ready()
|
document.status = DocumentStatus.ready()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -451,7 +451,7 @@ async def index_google_gmail_messages(
|
||||||
"date": item["date_str"],
|
"date": item["date_str"],
|
||||||
"connector_id": connector_id,
|
"connector_id": connector_id,
|
||||||
}
|
}
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
document.status = DocumentStatus.ready()
|
document.status = DocumentStatus.ready()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -393,7 +393,7 @@ async def index_jira_issues(
|
||||||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||||
"connector_id": connector_id,
|
"connector_id": connector_id,
|
||||||
}
|
}
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
document.status = DocumentStatus.ready()
|
document.status = DocumentStatus.ready()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -431,7 +431,7 @@ async def index_linear_issues(
|
||||||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||||
"connector_id": connector_id,
|
"connector_id": connector_id,
|
||||||
}
|
}
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
document.status = DocumentStatus.ready()
|
document.status = DocumentStatus.ready()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -488,7 +488,7 @@ async def index_luma_events(
|
||||||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||||
"connector_id": connector_id,
|
"connector_id": connector_id,
|
||||||
}
|
}
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
document.status = DocumentStatus.ready()
|
document.status = DocumentStatus.ready()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -479,7 +479,7 @@ async def index_notion_pages(
|
||||||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||||
"connector_id": connector_id,
|
"connector_id": connector_id,
|
||||||
}
|
}
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
document.status = DocumentStatus.ready()
|
document.status = DocumentStatus.ready()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -571,7 +571,7 @@ async def index_obsidian_vault(
|
||||||
document.content_hash = content_hash
|
document.content_hash = content_hash
|
||||||
document.embedding = embedding
|
document.embedding = embedding
|
||||||
document.document_metadata = document_metadata
|
document.document_metadata = document_metadata
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
document.status = DocumentStatus.ready()
|
document.status = DocumentStatus.ready()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -564,7 +564,7 @@ async def index_slack_messages(
|
||||||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||||
"connector_id": connector_id,
|
"connector_id": connector_id,
|
||||||
}
|
}
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
document.status = DocumentStatus.ready()
|
document.status = DocumentStatus.ready()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -603,7 +603,7 @@ async def index_teams_messages(
|
||||||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||||
"connector_id": connector_id,
|
"connector_id": connector_id,
|
||||||
}
|
}
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
document.status = DocumentStatus.ready()
|
document.status = DocumentStatus.ready()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -410,7 +410,7 @@ async def index_crawled_urls(
|
||||||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||||
"connector_id": connector_id,
|
"connector_id": connector_id,
|
||||||
}
|
}
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.status = DocumentStatus.ready() # READY status
|
document.status = DocumentStatus.ready() # READY status
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -14,45 +14,37 @@ from app.db import Document
 md = MarkdownifyTransformer()
 
 
-def safe_set_chunks(document: Document, chunks: list) -> None:
+async def safe_set_chunks(
+    session: "AsyncSession", document: Document, chunks: list
+) -> None:
     """
-    Safely assign chunks to a document without triggering lazy loading.
+    Delete old chunks and assign new ones to a document.
 
-    ALWAYS use this instead of `document.chunks = chunks` to avoid
-    SQLAlchemy async errors (MissingGreenlet / greenlet_spawn).
-
-    Why this is needed:
-    - Direct assignment `document.chunks = chunks` triggers SQLAlchemy to
-      load the OLD chunks first (for comparison/orphan detection)
-    - This lazy loading fails in async context with asyncpg driver
-    - set_committed_value bypasses this by setting the value directly
-
-    This function is safe regardless of how the document was loaded
-    (with or without selectinload).
+    This replaces direct ``document.chunks = chunks`` which triggers lazy
+    loading (and MissingGreenlet errors in async contexts). It also
+    explicitly deletes pre-existing chunks so they don't accumulate across
+    repeated re-indexes — ``set_committed_value`` bypasses SQLAlchemy's
+    delete-orphan cascade.
 
     Args:
-        document: The Document object to update
-        chunks: List of Chunk objects to assign
-
-    Example:
-        # Instead of: document.chunks = chunks (DANGEROUS!)
-        safe_set_chunks(document, chunks)  # Always safe
+        session: The current async database session.
+        document: The Document object to update.
+        chunks: List of Chunk objects to assign.
     """
-    from sqlalchemy.orm import object_session
+    from sqlalchemy import delete
     from sqlalchemy.orm.attributes import set_committed_value
 
-    # Keep relationship assignment lazy-load-safe.
-    set_committed_value(document, "chunks", chunks)
-
-    # Ensure chunk rows are actually persisted.
-    # set_committed_value bypasses normal unit-of-work tracking, so we need to
-    # explicitly attach chunk objects to the current session.
-    session = object_session(document)
-    if session is not None:
-        if document.id is not None:
-            for chunk in chunks:
-                chunk.document_id = document.id
-        session.add_all(chunks)
+    from app.db import Chunk
+
+    if document.id is not None:
+        await session.execute(
+            delete(Chunk).where(Chunk.document_id == document.id)
+        )
+        for chunk in chunks:
+            chunk.document_id = document.id
+
+    set_committed_value(document, "chunks", chunks)
+    session.add_all(chunks)
 
 
 def get_current_timestamp() -> datetime:
|
||||||
|
|
|
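Review note: the new docstring says the explicit delete is there so that old chunks do not accumulate across repeated re-indexes. The sketch below illustrates only that accumulation effect, under loud assumptions: it uses the standard-library `sqlite3` module instead of SQLAlchemy, and the `chunks` table, `reindex`, `count_chunks`, and `demo` are hypothetical names that do not appear in the commit. It is not the project's implementation, just a minimal model of "insert without deleting" versus "delete, then insert".

```python
import sqlite3


def reindex(conn, document_id, texts, delete_old=True):
    """Write a fresh set of chunk rows for one document (hypothetical helper)."""
    if delete_old:
        # Analogous to the explicit DELETE in the diff: drop stale rows first.
        conn.execute("DELETE FROM chunks WHERE document_id = ?", (document_id,))
    conn.executemany(
        "INSERT INTO chunks (document_id, content) VALUES (?, ?)",
        [(document_id, text) for text in texts],
    )


def count_chunks(conn, document_id):
    """Return how many chunk rows currently belong to the document."""
    row = conn.execute(
        "SELECT COUNT(*) FROM chunks WHERE document_id = ?", (document_id,)
    ).fetchone()
    return row[0]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE chunks ("
        "id INTEGER PRIMARY KEY, document_id INTEGER, content TEXT)"
    )

    # Document 1: re-indexed twice without deleting, so rows pile up.
    reindex(conn, 1, ["a", "b"], delete_old=False)
    reindex(conn, 1, ["a2", "b2"], delete_old=False)
    leaked = count_chunks(conn, 1)

    # Document 2: re-indexed twice with the delete, so only the new rows remain.
    reindex(conn, 2, ["a", "b"])
    reindex(conn, 2, ["a2", "b2"])
    clean = count_chunks(conn, 2)

    conn.close()
    return leaked, clean
```

Calling `demo()` returns the row counts for the two documents, which makes the difference between the two strategies visible without needing a Postgres or asyncpg setup.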
||||||
|
|
@ -227,7 +227,7 @@ async def add_circleback_meeting_document(
|
||||||
if summary_embedding is not None:
|
if summary_embedding is not None:
|
||||||
document.embedding = summary_embedding
|
document.embedding = summary_embedding
|
||||||
document.document_metadata = document_metadata
|
document.document_metadata = document_metadata
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.source_markdown = markdown_content
|
document.source_markdown = markdown_content
|
||||||
document.content_needs_reindexing = False
|
document.content_needs_reindexing = False
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
|
|
|
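Review note: because `safe_set_chunks` changes from a plain function to a coroutine function in this commit, every call site has to add `await` (as the hunks here do). The sketch below shows, with the standard-library `asyncio` only, what happens if a call site is missed: the call merely creates a coroutine object and the body never runs. `set_chunks`, `calls`, `main`, and `demo` are hypothetical stand-ins, not names from the repository.

```python
import asyncio

# Records every time the coroutine body actually executes.
calls = []


async def set_chunks(document, chunks):
    """Hypothetical stand-in for an async helper such as safe_set_chunks."""
    calls.append((document, len(chunks)))


async def main():
    calls.clear()

    # Old-style call without `await`: this only builds a coroutine object.
    pending = set_chunks("doc", ["c1"])
    ran_without_await = len(calls)
    # Discard the unused coroutine explicitly.
    pending.close()

    # Correct call: the body runs and the side effect is recorded.
    await set_chunks("doc", ["c1", "c2"])
    return ran_without_await, len(calls)


def demo():
    return asyncio.run(main())
```

In the real code the missed `await` would mean the delete and `session.add_all` never execute, so a search for remaining un-awaited `safe_set_chunks(` calls is probably worth doing before merging.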
||||||
|
|
@ -21,6 +21,7 @@ from app.utils.document_converters import (
|
||||||
from .base import (
|
from .base import (
|
||||||
check_document_by_unique_identifier,
|
check_document_by_unique_identifier,
|
||||||
get_current_timestamp,
|
get_current_timestamp,
|
||||||
|
safe_set_chunks,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
|
@ -154,7 +155,7 @@ async def add_extension_received_document(
|
||||||
existing_document.content_hash = content_hash
|
existing_document.content_hash = content_hash
|
||||||
existing_document.embedding = summary_embedding
|
existing_document.embedding = summary_embedding
|
||||||
existing_document.document_metadata = content.metadata.model_dump()
|
existing_document.document_metadata = content.metadata.model_dump()
|
||||||
existing_document.chunks = chunks
|
await safe_set_chunks(session, existing_document, chunks)
|
||||||
existing_document.source_markdown = combined_document_string
|
existing_document.source_markdown = combined_document_string
|
||||||
existing_document.updated_at = get_current_timestamp()
|
existing_document.updated_at = get_current_timestamp()
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -35,6 +35,7 @@ from .base import (
|
||||||
check_document_by_unique_identifier,
|
check_document_by_unique_identifier,
|
||||||
check_duplicate_document,
|
check_duplicate_document,
|
||||||
get_current_timestamp,
|
get_current_timestamp,
|
||||||
|
safe_set_chunks,
|
||||||
)
|
)
|
||||||
from .markdown_processor import add_received_markdown_file_document
|
from .markdown_processor import add_received_markdown_file_document
|
||||||
|
|
||||||
|
|
@ -488,7 +489,7 @@ async def add_received_file_document_using_unstructured(
|
||||||
"FILE_NAME": file_name,
|
"FILE_NAME": file_name,
|
||||||
"ETL_SERVICE": "UNSTRUCTURED",
|
"ETL_SERVICE": "UNSTRUCTURED",
|
||||||
}
|
}
|
||||||
existing_document.chunks = chunks
|
await safe_set_chunks(session, existing_document, chunks)
|
||||||
existing_document.source_markdown = file_in_markdown
|
existing_document.source_markdown = file_in_markdown
|
||||||
existing_document.content_needs_reindexing = False
|
existing_document.content_needs_reindexing = False
|
||||||
existing_document.updated_at = get_current_timestamp()
|
existing_document.updated_at = get_current_timestamp()
|
||||||
|
|
@ -622,7 +623,7 @@ async def add_received_file_document_using_llamacloud(
|
||||||
"FILE_NAME": file_name,
|
"FILE_NAME": file_name,
|
||||||
"ETL_SERVICE": "LLAMACLOUD",
|
"ETL_SERVICE": "LLAMACLOUD",
|
||||||
}
|
}
|
||||||
existing_document.chunks = chunks
|
await safe_set_chunks(session, existing_document, chunks)
|
||||||
existing_document.source_markdown = file_in_markdown
|
existing_document.source_markdown = file_in_markdown
|
||||||
existing_document.content_needs_reindexing = False
|
existing_document.content_needs_reindexing = False
|
||||||
existing_document.updated_at = get_current_timestamp()
|
existing_document.updated_at = get_current_timestamp()
|
||||||
|
|
@ -777,7 +778,7 @@ async def add_received_file_document_using_docling(
|
||||||
"FILE_NAME": file_name,
|
"FILE_NAME": file_name,
|
||||||
"ETL_SERVICE": "DOCLING",
|
"ETL_SERVICE": "DOCLING",
|
||||||
}
|
}
|
||||||
existing_document.chunks = chunks
|
await safe_set_chunks(session, existing_document, chunks)
|
||||||
existing_document.source_markdown = file_in_markdown
|
existing_document.source_markdown = file_in_markdown
|
||||||
existing_document.content_needs_reindexing = False
|
existing_document.content_needs_reindexing = False
|
||||||
existing_document.updated_at = get_current_timestamp()
|
existing_document.updated_at = get_current_timestamp()
|
||||||
|
|
|
||||||
|
|
@ -21,6 +21,7 @@ from .base import (
|
||||||
check_document_by_unique_identifier,
|
check_document_by_unique_identifier,
|
||||||
check_duplicate_document,
|
check_duplicate_document,
|
||||||
get_current_timestamp,
|
get_current_timestamp,
|
||||||
|
safe_set_chunks,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
|
@ -258,7 +259,7 @@ async def add_received_markdown_file_document(
|
||||||
existing_document.document_metadata = {
|
existing_document.document_metadata = {
|
||||||
"FILE_NAME": file_name,
|
"FILE_NAME": file_name,
|
||||||
}
|
}
|
||||||
existing_document.chunks = chunks
|
await safe_set_chunks(session, existing_document, chunks)
|
||||||
existing_document.source_markdown = file_in_markdown
|
existing_document.source_markdown = file_in_markdown
|
||||||
existing_document.updated_at = get_current_timestamp()
|
existing_document.updated_at = get_current_timestamp()
|
||||||
existing_document.status = DocumentStatus.ready() # Mark as ready
|
existing_document.status = DocumentStatus.ready() # Mark as ready
|
||||||
|
|
|
||||||
|
|
@ -419,7 +419,7 @@ async def add_youtube_video_document(
|
||||||
"author": video_data.get("author_name", "Unknown"),
|
"author": video_data.get("author_name", "Unknown"),
|
||||||
"thumbnail": video_data.get("thumbnail_url", ""),
|
"thumbnail": video_data.get("thumbnail_url", ""),
|
||||||
}
|
}
|
||||||
safe_set_chunks(document, chunks)
|
await safe_set_chunks(session, document, chunks)
|
||||||
document.source_markdown = combined_document_string
|
document.source_markdown = combined_document_string
|
||||||
document.status = DocumentStatus.ready() # READY status - fully processed
|
document.status = DocumentStatus.ready() # READY status - fully processed
|
||||||
document.updated_at = get_current_timestamp()
|
document.updated_at = get_current_timestamp()
|
||||||
|
|
|
||||||
|
|
@ -13,12 +13,32 @@ from sqlalchemy import select
|
||||||
from sqlalchemy.ext.asyncio import AsyncSession
|
from sqlalchemy.ext.asyncio import AsyncSession
|
||||||
from sqlalchemy.orm import selectinload
|
from sqlalchemy.orm import selectinload
|
||||||
|
|
||||||
|
from sqlalchemy import delete as sa_delete
|
||||||
|
from sqlalchemy.orm.attributes import set_committed_value
|
||||||
|
|
||||||
from app.config import config
|
from app.config import config
|
||||||
from app.db import SurfsenseDocsChunk, SurfsenseDocsDocument, async_session_maker
|
from app.db import SurfsenseDocsChunk, SurfsenseDocsDocument, async_session_maker
|
||||||
from app.utils.document_converters import embed_text
|
from app.utils.document_converters import embed_text
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
async def _safe_set_docs_chunks(
|
||||||
|
session: AsyncSession, document: SurfsenseDocsDocument, chunks: list
|
||||||
|
) -> None:
|
||||||
|
"""safe_set_chunks variant for the SurfsenseDocsDocument/Chunk models."""
|
||||||
|
if document.id is not None:
|
||||||
|
await session.execute(
|
||||||
|
sa_delete(SurfsenseDocsChunk).where(
|
||||||
|
SurfsenseDocsChunk.document_id == document.id
|
||||||
|
)
|
||||||
|
)
|
||||||
|
for chunk in chunks:
|
||||||
|
chunk.document_id = document.id
|
||||||
|
|
||||||
|
set_committed_value(document, "chunks", chunks)
|
||||||
|
session.add_all(chunks)
|
||||||
|
|
||||||
# Path to docs relative to project root
|
# Path to docs relative to project root
|
||||||
DOCS_DIR = (
|
DOCS_DIR = (
|
||||||
Path(__file__).resolve().parent.parent.parent.parent
|
Path(__file__).resolve().parent.parent.parent.parent
|
||||||
|
|
@ -156,7 +176,7 @@ async def index_surfsense_docs(session: AsyncSession) -> tuple[int, int, int, in
|
||||||
existing_doc.content = content
|
existing_doc.content = content
|
||||||
existing_doc.content_hash = content_hash
|
existing_doc.content_hash = content_hash
|
||||||
existing_doc.embedding = embed_text(content)
|
existing_doc.embedding = embed_text(content)
|
||||||
existing_doc.chunks = chunks
|
await _safe_set_docs_chunks(session, existing_doc, chunks)
|
||||||
existing_doc.updated_at = datetime.now(UTC)
|
existing_doc.updated_at = datetime.now(UTC)
|
||||||
|
|
||||||
updated += 1
|
updated += 1
|
||||||
|
|
|
||||||
|
|
@ -19,10 +19,12 @@ import {
|
||||||
ChevronRightIcon,
|
ChevronRightIcon,
|
||||||
CopyIcon,
|
CopyIcon,
|
||||||
DownloadIcon,
|
DownloadIcon,
|
||||||
|
Plus,
|
||||||
RefreshCwIcon,
|
RefreshCwIcon,
|
||||||
|
Settings2,
|
||||||
SquareIcon,
|
SquareIcon,
|
||||||
Unplug,
|
Unplug,
|
||||||
Wrench,
|
Upload,
|
||||||
X,
|
X,
|
||||||
} from "lucide-react";
|
} from "lucide-react";
|
||||||
import { useParams } from "next/navigation";
|
import { useParams } from "next/navigation";
|
||||||
|
|
@ -53,6 +55,7 @@ import { currentUserAtom } from "@/atoms/user/user-query.atoms";
|
||||||
import { AssistantMessage } from "@/components/assistant-ui/assistant-message";
|
import { AssistantMessage } from "@/components/assistant-ui/assistant-message";
|
||||||
import { ChatSessionStatus } from "@/components/assistant-ui/chat-session-status";
|
import { ChatSessionStatus } from "@/components/assistant-ui/chat-session-status";
|
||||||
import { ConnectorIndicator } from "@/components/assistant-ui/connector-popup";
|
import { ConnectorIndicator } from "@/components/assistant-ui/connector-popup";
|
||||||
|
import { useDocumentUploadDialog } from "@/components/assistant-ui/document-upload-popup";
|
||||||
import {
|
import {
|
||||||
InlineMentionEditor,
|
InlineMentionEditor,
|
||||||
type InlineMentionEditorRef,
|
type InlineMentionEditorRef,
|
||||||
|
|
@ -73,6 +76,13 @@ import {
|
||||||
import type { ThinkingStep } from "@/components/tool-ui/deepagent-thinking";
|
import type { ThinkingStep } from "@/components/tool-ui/deepagent-thinking";
|
||||||
import { Avatar, AvatarFallback, AvatarGroup } from "@/components/ui/avatar";
|
import { Avatar, AvatarFallback, AvatarGroup } from "@/components/ui/avatar";
|
||||||
import { Button } from "@/components/ui/button";
|
import { Button } from "@/components/ui/button";
|
||||||
|
import { Drawer, DrawerContent, DrawerHandle, DrawerTitle } from "@/components/ui/drawer";
|
||||||
|
import {
|
||||||
|
DropdownMenu,
|
||||||
|
DropdownMenuContent,
|
||||||
|
DropdownMenuItem,
|
||||||
|
DropdownMenuTrigger,
|
||||||
|
} from "@/components/ui/dropdown-menu";
|
||||||
import { Popover, PopoverContent, PopoverTrigger } from "@/components/ui/popover";
|
import { Popover, PopoverContent, PopoverTrigger } from "@/components/ui/popover";
|
||||||
import { Switch } from "@/components/ui/switch";
|
import { Switch } from "@/components/ui/switch";
|
||||||
import { Tooltip, TooltipContent, TooltipTrigger } from "@/components/ui/tooltip";
|
import { Tooltip, TooltipContent, TooltipTrigger } from "@/components/ui/tooltip";
|
||||||
|
|
@ -278,21 +288,14 @@ const ConnectToolsBanner: FC = () => {
|
||||||
</Avatar>
|
</Avatar>
|
||||||
))}
|
))}
|
||||||
</AvatarGroup>
|
</AvatarGroup>
|
||||||
<span
|
<button
|
||||||
role="button"
|
type="button"
|
||||||
tabIndex={0}
|
|
||||||
onClick={handleDismiss}
|
onClick={handleDismiss}
|
||||||
onKeyDown={(e) => {
|
|
||||||
if (e.key === "Enter" || e.key === " ") {
|
|
||||||
e.preventDefault();
|
|
||||||
handleDismiss(e as unknown as React.MouseEvent);
|
|
||||||
}
|
|
||||||
}}
|
|
||||||
className="shrink-0 ml-0.5 p-0.5 text-muted-foreground/40 hover:text-foreground transition-colors"
|
className="shrink-0 ml-0.5 p-0.5 text-muted-foreground/40 hover:text-foreground transition-colors"
|
||||||
aria-label="Dismiss"
|
aria-label="Dismiss"
|
||||||
>
|
>
|
||||||
<X className="size-3.5" />
|
<X className="size-3.5" />
|
||||||
</span>
|
</button>
|
||||||
</button>
|
</button>
|
||||||
</div>
|
</div>
|
||||||
);
|
);
|
||||||
|
|
@ -564,6 +567,7 @@ const ComposerAction: FC<ComposerActionProps> = ({ isBlockedByOtherUser = false
|
||||||
const setConnectorDialogOpen = useSetAtom(connectorDialogOpenAtom);
|
const setConnectorDialogOpen = useSetAtom(connectorDialogOpenAtom);
|
||||||
const [toolsPopoverOpen, setToolsPopoverOpen] = useState(false);
|
const [toolsPopoverOpen, setToolsPopoverOpen] = useState(false);
|
||||||
const isDesktop = useMediaQuery("(min-width: 640px)");
|
const isDesktop = useMediaQuery("(min-width: 640px)");
|
||||||
|
const { openDialog: openUploadDialog } = useDocumentUploadDialog();
|
||||||
const [toolsScrollPos, setToolsScrollPos] = useState<"top" | "middle" | "bottom">("top");
|
const [toolsScrollPos, setToolsScrollPos] = useState<"top" | "middle" | "bottom">("top");
|
||||||
const handleToolsScroll = useCallback((e: React.UIEvent<HTMLDivElement>) => {
|
const handleToolsScroll = useCallback((e: React.UIEvent<HTMLDivElement>) => {
|
||||||
const el = e.currentTarget;
|
const el = e.currentTarget;
|
||||||
|
|
@ -607,87 +611,144 @@ const ComposerAction: FC<ComposerActionProps> = ({ isBlockedByOtherUser = false
|
||||||
return (
|
return (
|
||||||
<div className="aui-composer-action-wrapper relative mx-3 mb-2 flex items-center justify-between">
|
<div className="aui-composer-action-wrapper relative mx-3 mb-2 flex items-center justify-between">
|
||||||
<div className="flex items-center gap-1">
|
<div className="flex items-center gap-1">
|
||||||
<Popover open={toolsPopoverOpen} onOpenChange={setToolsPopoverOpen}>
|
{!isDesktop ? (
|
||||||
<PopoverTrigger asChild>
|
<>
|
||||||
<TooltipIconButton
|
<DropdownMenu>
|
||||||
tooltip="Manage tools"
|
<DropdownMenuTrigger asChild>
|
||||||
side="bottom"
|
<Button
|
||||||
|
variant="ghost"
|
||||||
|
size="icon"
|
||||||
|
className="size-[34px] rounded-full p-1 font-semibold text-xs hover:bg-muted-foreground/15 dark:border-muted-foreground/15 dark:hover:bg-muted-foreground/30"
|
||||||
|
aria-label="More actions"
|
||||||
|
data-joyride="connector-icon"
|
||||||
|
>
|
||||||
|
<Plus className="size-4" />
|
||||||
|
</Button>
|
||||||
|
</DropdownMenuTrigger>
|
||||||
|
<DropdownMenuContent side="top" align="start" sideOffset={8}>
|
||||||
|
<DropdownMenuItem onSelect={() => setToolsPopoverOpen(true)}>
|
||||||
|
<Settings2 className="size-4" />
|
||||||
|
Manage Tools
|
||||||
|
</DropdownMenuItem>
|
||||||
|
<DropdownMenuItem onSelect={() => openUploadDialog()}>
|
||||||
|
<Upload className="size-4" />
|
||||||
|
Upload Files
|
||||||
|
</DropdownMenuItem>
|
||||||
|
</DropdownMenuContent>
|
||||||
|
</DropdownMenu>
|
||||||
|
<Drawer open={toolsPopoverOpen} onOpenChange={setToolsPopoverOpen}>
|
||||||
|
<DrawerContent className="max-h-[60dvh]">
|
||||||
|
<DrawerHandle />
|
||||||
|
<div className="flex items-center justify-between px-4 py-2">
|
||||||
|
<DrawerTitle className="text-sm font-medium">Agent Tools</DrawerTitle>
|
||||||
|
<span className="text-xs text-muted-foreground">
|
||||||
|
{enabledCount}/{agentTools?.length ?? 0} enabled
|
||||||
|
</span>
|
||||||
|
</div>
|
||||||
|
<div className="overflow-y-auto pb-6" onScroll={handleToolsScroll}>
|
||||||
|
{agentTools?.map((tool) => {
|
||||||
|
const isDisabled = disabledTools.includes(tool.name);
|
||||||
|
return (
|
||||||
|
<div
|
||||||
|
key={tool.name}
|
||||||
|
className="flex w-full items-center gap-3 px-4 py-2 hover:bg-muted-foreground/10 transition-colors"
|
||||||
|
>
|
||||||
|
<span className="flex-1 min-w-0 text-sm font-medium truncate">
|
||||||
|
{formatToolName(tool.name)}
|
||||||
|
</span>
|
||||||
|
<Switch
|
||||||
|
checked={!isDisabled}
|
||||||
|
onCheckedChange={() => toggleTool(tool.name)}
|
||||||
|
className="shrink-0"
|
||||||
|
/>
|
||||||
|
</div>
|
||||||
|
);
|
||||||
|
})}
|
||||||
|
{!agentTools?.length && (
|
||||||
|
<div className="px-4 py-6 text-center text-sm text-muted-foreground">
|
||||||
|
Loading tools...
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
</DrawerContent>
|
||||||
|
</Drawer>
|
||||||
|
<Button
|
||||||
variant="ghost"
|
variant="ghost"
|
||||||
size="icon"
|
size="icon"
|
||||||
className="size-[34px] rounded-full p-1 font-semibold text-xs hover:bg-muted-foreground/15 dark:border-muted-foreground/15 dark:hover:bg-muted-foreground/30"
|
className="size-[34px] rounded-full p-1 font-semibold text-xs hover:bg-muted-foreground/15 dark:border-muted-foreground/15 dark:hover:bg-muted-foreground/30"
|
||||||
aria-label="Manage tools"
|
aria-label="Manage connectors"
|
||||||
data-joyride="connector-icon"
|
onClick={() => setConnectorDialogOpen(true)}
|
||||||
>
|
>
|
||||||
<Wrench className="size-4" />
|
<Unplug className="size-4" />
|
||||||
</TooltipIconButton>
|
</Button>
|
||||||
</PopoverTrigger>
|
</>
|
||||||
<PopoverContent
|
) : (
|
||||||
side="bottom"
|
<Popover open={toolsPopoverOpen} onOpenChange={setToolsPopoverOpen}>
|
||||||
align="start"
|
<PopoverTrigger asChild>
|
||||||
sideOffset={12}
|
<TooltipIconButton
|
||||||
className="w-[calc(100vw-2rem)] max-w-56 sm:max-w-72 sm:w-72 p-0 select-none"
|
tooltip="Manage tools"
|
||||||
onOpenAutoFocus={(e) => e.preventDefault()}
|
side="bottom"
|
||||||
>
|
variant="ghost"
|
||||||
<div className="flex items-center justify-between px-2.5 py-2 sm:px-3 sm:py-2.5 border-b">
|
size="icon"
|
||||||
<span className="text-xs sm:text-sm font-medium">Agent Tools</span>
|
className="size-[34px] rounded-full p-1 font-semibold text-xs hover:bg-muted-foreground/15 dark:border-muted-foreground/15 dark:hover:bg-muted-foreground/30"
|
||||||
<span className="text-[10px] sm:text-xs text-muted-foreground">
|
aria-label="Manage tools"
|
||||||
{enabledCount}/{agentTools?.length ?? 0} enabled
|
data-joyride="connector-icon"
|
||||||
</span>
|
>
|
||||||
</div>
|
<Settings2 className="size-4" />
|
||||||
<div
|
</TooltipIconButton>
|
||||||
className="max-h-48 sm:max-h-64 overflow-y-auto py-0.5 sm:py-1"
|
</PopoverTrigger>
|
||||||
onScroll={handleToolsScroll}
|
<PopoverContent
|
||||||
style={{
|
side="bottom"
|
||||||
maskImage: `linear-gradient(to bottom, ${toolsScrollPos === "top" ? "black" : "transparent"}, black 16px, black calc(100% - 16px), ${toolsScrollPos === "bottom" ? "black" : "transparent"})`,
|
align="start"
|
||||||
WebkitMaskImage: `linear-gradient(to bottom, ${toolsScrollPos === "top" ? "black" : "transparent"}, black 16px, black calc(100% - 16px), ${toolsScrollPos === "bottom" ? "black" : "transparent"})`,
|
sideOffset={12}
|
||||||
}}
|
className="w-[calc(100vw-2rem)] max-w-56 sm:max-w-72 sm:w-72 p-0 select-none"
|
||||||
|
onOpenAutoFocus={(e) => e.preventDefault()}
|
||||||
>
|
>
|
||||||
{agentTools?.map((tool) => {
|
<div className="flex items-center justify-between px-2.5 py-2 sm:px-3 sm:py-2.5 border-b">
|
||||||
const isDisabled = disabledTools.includes(tool.name);
|
<span className="text-xs sm:text-sm font-medium">Agent Tools</span>
|
||||||
const row = (
|
<span className="text-[10px] sm:text-xs text-muted-foreground">
|
||||||
<label className="flex items-center gap-2 sm:gap-3 px-2.5 sm:px-3 py-1 sm:py-1.5 cursor-pointer hover:bg-muted-foreground/10 transition-colors">
|
{enabledCount}/{agentTools?.length ?? 0} enabled
|
||||||
<span className="flex-1 min-w-0 text-xs sm:text-sm font-medium truncate">
|
</span>
|
||||||
{formatToolName(tool.name)}
|
</div>
|
||||||
</span>
|
<div
|
||||||
<Switch
|
className="max-h-48 sm:max-h-64 overflow-y-auto py-0.5 sm:py-1"
|
||||||
checked={!isDisabled}
|
onScroll={handleToolsScroll}
|
||||||
onCheckedChange={() => toggleTool(tool.name)}
|
style={{
|
||||||
className="shrink-0 scale-[0.6] sm:scale-75"
|
maskImage: `linear-gradient(to bottom, ${toolsScrollPos === "top" ? "black" : "transparent"}, black 16px, black calc(100% - 16px), ${toolsScrollPos === "bottom" ? "black" : "transparent"})`,
|
||||||
/>
|
WebkitMaskImage: `linear-gradient(to bottom, ${toolsScrollPos === "top" ? "black" : "transparent"}, black 16px, black calc(100% - 16px), ${toolsScrollPos === "bottom" ? "black" : "transparent"})`,
|
||||||
</label>
|
}}
|
||||||
);
|
>
|
||||||
if (!isDesktop) {
|
{agentTools?.map((tool) => {
|
||||||
return <div key={tool.name}>{row}</div>;
|
const isDisabled = disabledTools.includes(tool.name);
|
||||||
}
|
const row = (
|
||||||
return (
|
<div className="flex w-full items-center gap-2 sm:gap-3 px-2.5 sm:px-3 py-1 sm:py-1.5 hover:bg-muted-foreground/10 transition-colors">
|
||||||
<Tooltip key={tool.name}>
|
<span className="flex-1 min-w-0 text-xs sm:text-sm font-medium truncate">
|
||||||
<TooltipTrigger asChild>{row}</TooltipTrigger>
|
{formatToolName(tool.name)}
|
||||||
<TooltipContent side="right" className="max-w-64 text-xs">
|
</span>
|
||||||
{tool.description}
|
<Switch
|
||||||
</TooltipContent>
|
checked={!isDisabled}
|
||||||
</Tooltip>
|
onCheckedChange={() => toggleTool(tool.name)}
|
||||||
);
|
className="shrink-0 scale-[0.6] sm:scale-75"
|
||||||
})}
|
/>
|
||||||
{!agentTools?.length && (
|
</div>
|
||||||
<div className="px-3 py-4 text-center text-xs text-muted-foreground">
|
);
|
||||||
Loading tools...
|
return (
|
||||||
</div>
|
<Tooltip key={tool.name}>
|
||||||
)}
|
<TooltipTrigger asChild>{row}</TooltipTrigger>
|
||||||
</div>
|
<TooltipContent side="right" className="max-w-64 text-xs">
|
||||||
</PopoverContent>
|
{tool.description}
|
||||||
</Popover>
|
</TooltipContent>
|
||||||
{!isDesktop && (
|
</Tooltip>
|
||||||
<TooltipIconButton
|
);
|
||||||
tooltip="Manage connectors"
|
})}
|
||||||
side="bottom"
|
{!agentTools?.length && (
|
||||||
variant="ghost"
|
<div className="px-3 py-4 text-center text-xs text-muted-foreground">
|
||||||
size="icon"
|
Loading tools...
|
||||||
className="size-[34px] rounded-full p-1 font-semibold text-xs hover:bg-muted-foreground/15 dark:border-muted-foreground/15 dark:hover:bg-muted-foreground/30"
|
</div>
|
||||||
aria-label="Manage connectors"
|
)}
|
||||||
onClick={() => setConnectorDialogOpen(true)}
|
</div>
|
||||||
>
|
</PopoverContent>
|
||||||
<Unplug className="size-4" />
|
</Popover>
|
||||||
</TooltipIconButton>
|
|
||||||
)}
|
)}
|
||||||
{sidebarDocs.length > 0 && (
|
{sidebarDocs.length > 0 && (
|
||||||
<button
|
<button
|
||||||
|
|
|
||||||
|
|
@ -12,6 +12,7 @@ export { default as FireworksAiIcon } from "./fireworksai.svg";
|
||||||
export { default as GeminiIcon } from "./gemini.svg";
|
export { default as GeminiIcon } from "./gemini.svg";
|
||||||
export { default as GroqIcon } from "./groq.svg";
|
export { default as GroqIcon } from "./groq.svg";
|
||||||
export { default as HuggingFaceIcon } from "./huggingface.svg";
|
export { default as HuggingFaceIcon } from "./huggingface.svg";
|
||||||
|
export { default as MiniMaxIcon } from "./minimax.svg";
|
||||||
export { default as MistralIcon } from "./mistral.svg";
|
export { default as MistralIcon } from "./mistral.svg";
|
||||||
export { default as MoonshotIcon } from "./moonshot.svg";
|
export { default as MoonshotIcon } from "./moonshot.svg";
|
||||||
export { default as NscaleIcon } from "./nscale.svg";
|
export { default as NscaleIcon } from "./nscale.svg";
|
||||||
|
|
|
||||||
1
surfsense_web/components/icons/providers/minimax.svg
Normal file
|
|
@ -0,0 +1 @@
|
||||||
|
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path fill="currentColor" d="M21.6 4.8h-2.4l-4.2 7.2-3-5.16h-.01L9 12 4.8 4.8H2.4L9 16.14V20.4h2.4v-4.2h1.2v4.2h2.4v-4.26z"/></svg>
|
||||||
|
After: Size 192 B
|
|
@ -1525,6 +1525,20 @@ export const LLM_MODELS: LLMModel[] = [
|
||||||
provider: "GITHUB_MODELS",
|
provider: "GITHUB_MODELS",
|
||||||
contextWindow: "64K",
|
contextWindow: "64K",
|
||||||
},
|
},
|
||||||
|
|
||||||
|
// MiniMax
|
||||||
|
{
|
||||||
|
value: "MiniMax-M2.5",
|
||||||
|
label: "MiniMax M2.5",
|
||||||
|
provider: "MINIMAX",
|
||||||
|
contextWindow: "204K",
|
||||||
|
},
|
||||||
|
{
|
||||||
|
value: "MiniMax-M2.5-highspeed",
|
||||||
|
label: "MiniMax M2.5 Highspeed",
|
||||||
|
provider: "MINIMAX",
|
||||||
|
contextWindow: "204K",
|
||||||
|
},
|
||||||
];
|
];
|
||||||
|
|
||||||
// Helper function to get models by provider
|
// Helper function to get models by provider
|
||||||
|
|
|
||||||
|
|
@ -181,6 +181,13 @@ export const LLM_PROVIDERS: LLMProvider[] = [
|
||||||
description: "AI models from GitHub Marketplace",
|
description: "AI models from GitHub Marketplace",
|
||||||
apiBase: "https://models.github.ai/inference",
|
apiBase: "https://models.github.ai/inference",
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
value: "MINIMAX",
|
||||||
|
label: "MiniMax",
|
||||||
|
example: "MiniMax-M2.5, MiniMax-M2.5-highspeed",
|
||||||
|
description: "High-performance models with 204K context",
|
||||||
|
apiBase: "https://api.minimax.io/v1",
|
||||||
|
},
|
||||||
{
|
{
|
||||||
value: "CUSTOM",
|
value: "CUSTOM",
|
||||||
label: "Custom Provider",
|
label: "Custom Provider",
|
||||||
|
|
|
||||||
|
|
@ -34,6 +34,7 @@ export const liteLLMProviderEnum = z.enum([
|
||||||
"COMETAPI",
|
"COMETAPI",
|
||||||
"HUGGINGFACE",
|
"HUGGINGFACE",
|
||||||
"GITHUB_MODELS",
|
"GITHUB_MODELS",
|
||||||
|
"MINIMAX",
|
||||||
"CUSTOM",
|
"CUSTOM",
|
||||||
]);
|
]);
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -15,6 +15,7 @@ import {
|
||||||
GeminiIcon,
|
GeminiIcon,
|
||||||
GroqIcon,
|
GroqIcon,
|
||||||
HuggingFaceIcon,
|
HuggingFaceIcon,
|
||||||
|
MiniMaxIcon,
|
||||||
MistralIcon,
|
MistralIcon,
|
||||||
MoonshotIcon,
|
MoonshotIcon,
|
||||||
NscaleIcon,
|
NscaleIcon,
|
||||||
|
|
@ -85,6 +86,8 @@ export function getProviderIcon(
|
||||||
return <GroqIcon className={cn(className)} />;
|
return <GroqIcon className={cn(className)} />;
|
||||||
case "HUGGINGFACE":
|
case "HUGGINGFACE":
|
||||||
return <HuggingFaceIcon className={cn(className)} />;
|
return <HuggingFaceIcon className={cn(className)} />;
|
||||||
|
case "MINIMAX":
|
||||||
|
return <MiniMaxIcon className={cn(className)} />;
|
||||||
case "MISTRAL":
|
case "MISTRAL":
|
||||||
return <MistralIcon className={cn(className)} />;
|
return <MistralIcon className={cn(className)} />;
|
||||||
case "MOONSHOT":
|
case "MOONSHOT":
|
||||||
|
|
|
||||||