mirror of
https://github.com/MODSetter/SurfSense.git
synced 2026-06-08 20:25:19 +02:00
Merge pull request #898 from MODSetter/dev
feat: SearXNG search, Electron desktop app, video agent & UI overhaul
commit
1013586506
220 changed files with 12155 additions and 3102 deletions
136
.cursor/skills/system-architecture/SKILL.md
Executable file
|
|
@ -0,0 +1,136 @@
|
|||
---
name: system-architecture
description: Design systems with appropriate complexity - no more, no less. Use when the user asks to architect applications, design system boundaries, plan service decomposition, evaluate monolith vs microservices, make scaling decisions, or review structural trade-offs. Applies to new system design, refactoring, and migration planning.
---
|
||||
|
||||
# System Architecture
|
||||
|
||||
Design real structures with clear boundaries, explicit trade-offs, and appropriate complexity. Match architecture to actual requirements, not imagined future needs.
|
||||
|
||||
## Workflow
|
||||
|
||||
When the user requests an architecture, follow these steps:
|
||||
|
||||
```
Task Progress:
- [ ] Step 1: Clarify constraints
- [ ] Step 2: Identify domains
- [ ] Step 3: Map data flow
- [ ] Step 4: Draw boundaries with rationale
- [ ] Step 5: Run complexity checklist
- [ ] Step 6: Present architecture with trade-offs
```
|
||||
|
||||
**Step 1 - Clarify constraints.** Ask about:
|
||||
|
||||
| Constraint | Question | Why it matters |
|------------|----------|----------------|
| Scale | What's the real load? (users, requests/sec, data size) | Design for 10x current, not 1000x |
| Team | How many developers? How many teams? | Deployable units ≤ number of teams |
| Lifespan | Prototype? MVP? Long-term product? | Temporary systems need temporary solutions |
| Change vectors | What actually varies? | Abstract only where you have evidence of variation |
|
||||
|
||||
**Step 2 - Identify domains.** Group by business capability, not technical layer. Look for things that change for different reasons and at different rates.
|
||||
|
||||
**Step 3 - Map data flow.** Trace: where does data enter → how does it transform → where does it exit? Make the flow obvious.
|
||||
|
||||
**Step 4 - Draw boundaries.** Every boundary needs a reason: different team, different change rate, different compliance requirement, or different scaling need.
|
||||
|
||||
**Step 5 - Run complexity checklist.** Before adding any non-trivial pattern:
|
||||
|
||||
```
[ ] Have I tried the simple solution?
[ ] Do I have evidence it's insufficient?
[ ] Can my team operate this?
[ ] Will this still make sense in 6 months?
[ ] Can I explain why this complexity is necessary?
```
|
||||
|
||||
If any answer is "no", keep it simple.
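The checklist can be read as a strict gate: every answer has to be "yes" before a pattern goes in. A minimal sketch of that rule follows; the `ComplexityCheck` class and its field names are hypothetical and only mirror the five questions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexityCheck:
    """Hypothetical record of the five checklist answers (True means "yes")."""
    tried_simple_solution: bool
    evidence_simple_is_insufficient: bool
    team_can_operate: bool
    still_sensible_in_six_months: bool
    can_explain_necessity: bool


def should_add_pattern(check: ComplexityCheck) -> bool:
    """Allow the non-trivial pattern only when every answer is "yes"."""
    return all(vars(check).values())
```

A single "no" is enough to reject the pattern, which keeps the default biased toward the simple solution.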
|
||||
|
||||
**Step 6 - Present the architecture** using the output template below.
|
||||
|
||||
## Output Template
|
||||
|
||||
```markdown
### System: [Name]

**Constraints**:
- Scale: [current and expected load]
- Team: [size and structure]
- Lifespan: [prototype / MVP / long-term]

**Architecture**:
[Component diagram or description of components and their relationships]

**Data Flow**:
[How data enters → transforms → exits]

**Key Boundaries**:
| Boundary | Reason | Change Rate |
|----------|--------|-------------|
| ... | ... | ... |

**Trade-offs**:
- Chose X over Y because [reason]
- Accepted [limitation] to gain [benefit]

**Complexity Justification**:
- [Each non-trivial pattern] → [why it's needed, with evidence]
```
|
||||
|
||||
## Core Principles
|
||||
|
||||
1. **Boundaries at real differences.** Separate concerns that change for different reasons and at different rates.
|
||||
2. **Dependencies flow inward.** Core logic depends on nothing. Infrastructure depends on core.
|
||||
3. **Follow the data.** Architecture should make data flow obvious.
|
||||
4. **Design for failure.** Networks fail. Databases time out. Build compensation into the structure.
|
||||
5. **Design for operations.** You will debug this at 3am. Every request needs a trace. Every error needs context for replay.
|
||||
|
||||
For concrete good/bad examples of each principle, see [examples.md](examples.md).
|
||||
|
||||
## Anti-Patterns
|
||||
|
||||
| Don't | Do Instead |
|-------|------------|
| Microservices for a 3-person team | Well-structured monolith |
| Event sourcing for CRUD | Simple state storage |
| Message queues within the same process | Just call the function |
| Distributed transactions | Redesign to avoid, or accept eventual consistency |
| Repository wrapping an ORM | Use the ORM directly |
| Interfaces with one implementation | Mock at boundaries only |
| AbstractFactoryFactoryBean | Just instantiate the thing |
| DI containers for simple graphs | Constructor injection is enough |
| Clean Architecture for a TODO app | Match layers to actual complexity |
| DDD tactics without strategic design | Aggregates need bounded contexts |
| Hexagonal ports with one adapter | Just call the database |
| CQRS when reads = writes | Add when they diverge |
| "We might swap databases" | You won't; rewrite if you do |
| "Multi-tenant someday" | Build it when you have tenant #2 |
| "Microservices for team scale" | Helps at 50+ engineers, not 4 |
|
||||
|
||||
## Success Criteria
|
||||
|
||||
Your architecture is right-sized when:
|
||||
|
||||
1. **You can draw it** - dependency graph fits on a whiteboard
|
||||
2. **You can explain it** - new team member understands data flow in 30 minutes
|
||||
3. **You can change it** - adding a feature touches 1-3 modules, not 10
|
||||
4. **You can delete it** - removing a component needs no archaeology
|
||||
5. **You can debug it** - tracing a request takes minutes, not hours
|
||||
6. **It matches your team** - deployable units ≤ number of teams
|
||||
|
||||
## When the Simple Solution Isn't Enough
|
||||
|
||||
If the complexity checklist says "yes, scale is real", see [scaling-checklist.md](scaling-checklist.md) for concrete techniques covering caching, async processing, partitioning, horizontal scaling, and multi-region.
|
||||
|
||||
## Iterative Architecture
|
||||
|
||||
Architecture is discovered, not designed upfront:
|
||||
|
||||
1. **Start obvious** - group by domain, not by technical layer
|
||||
2. **Let hotspots emerge** - monitor which modules change together
|
||||
3. **Extract when painful** - split only when the current form causes measurable problems
|
||||
4. **Document decisions** - record why boundaries exist so future you knows what's load-bearing
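One way to let hotspots emerge is to count which modules change in the same commit. The sketch below is an assumption about how that monitoring could look, not part of the skill itself: it parses text in the shape produced by `git log --name-only --pretty=format:---`, and the function name and `depth` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(log_text: str, depth: int = 1) -> Counter:
    """Count how often each pair of modules appears in the same commit.

    Each commit is expected to start with a line containing only `---`,
    followed by the changed file paths, one per line. A "module" is the
    first `depth` path components of a changed file.
    """
    commits = []
    for line in log_text.splitlines():
        line = line.strip()
        if line == "---":
            commits.append(set())  # start of a new commit
        elif line and commits:
            commits[-1].add("/".join(line.split("/")[:depth]))

    pairs = Counter()
    for modules in commits:
        # Sorting gives each pair a stable (a, b) order.
        pairs.update(combinations(sorted(modules), 2))
    return pairs
```

Pairs with high counts are candidates for merging, or for a closer look at the boundary between them; pairs that never change together suggest the boundary is holding.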
|
||||
|
||||
Every senior engineer has a graveyard of over-engineered systems they regret. Learn from their pain. Build boring systems that work.
|
||||
120
.cursor/skills/system-architecture/examples.md
Normal file
|
|
@ -0,0 +1,120 @@
|
|||
# Architecture Examples
|
||||
|
||||
Concrete good/bad examples for each core principle in SKILL.md.
|
||||
|
||||
---
|
||||
|
||||
## Boundaries at Real Differences
|
||||
|
||||
**Good** - Meaningful boundary:
|
||||
```
# Users and Billing are separate bounded contexts
# - Different teams own them
# - Different change cadences (users: weekly, billing: quarterly)
# - Different compliance requirements

src/
  users/           # User management domain
    models.py
    services.py
    api.py
  billing/         # Billing domain
    models.py
    services.py
    api.py
  shared/          # Truly shared utilities
    auth.py
```
|
||||
|
||||
**Bad** - Ceremony without purpose:
|
||||
```
# UserService → UserRepository → UserRepositoryImpl
# ...when you'll never swap the database

src/
  interfaces/
    IUserRepository.py      # One implementation exists
  repositories/
    UserRepositoryImpl.py   # Wraps SQLAlchemy, which is already a repository
  services/
    UserService.py          # Just calls the repository
```
|
||||
|
||||
---
|
||||
|
||||
## Dependencies Flow Inward
|
||||
|
||||
**Good** - Clear dependency direction:
|
||||
```
# Dependency flows inward: infrastructure → application → domain

domain/             # Pure business logic, no imports from outer layers
  order.py          # Order entity with business rules

application/        # Use cases, orchestrates domain
  place_order.py    # Imports from domain/, not infrastructure/

infrastructure/     # External concerns
  postgres.py       # Implements persistence, imports from application/
  stripe.py         # Implements payments
```
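The layout above can be sketched in a single file to show the direction of the dependencies. Everything here is illustrative: `Order`, `place_order`, and `InMemoryOrderStore` are hypothetical stand-ins for `domain/order.py`, `application/place_order.py`, and an infrastructure module, and an in-memory store replaces a real database.

```python
from dataclasses import dataclass
from typing import Callable


# --- domain: pure business rules, imports nothing from outer layers ---
@dataclass(frozen=True)
class Order:
    order_id: str
    total_cents: int

    def validate(self) -> None:
        if self.total_cents <= 0:
            raise ValueError("order total must be positive")


# --- application: orchestrates the domain, receives I/O as a parameter ---
def place_order(order: Order, save: Callable[[Order], None]) -> Order:
    order.validate()
    save(order)
    return order


# --- infrastructure: knows about storage and plugs into the application ---
class InMemoryOrderStore:
    def __init__(self) -> None:
        self.rows = {}

    def save(self, order: Order) -> None:
        self.rows[order.order_id] = order


store = InMemoryOrderStore()
place_order(Order("o-1", 1999), save=store.save)
```

Passing `save` as a plain callable keeps the application layer free of infrastructure imports without introducing an interface that has only one implementation.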
|
||||
|
||||
---
|
||||
|
||||
## Follow the Data
|
||||
|
||||
**Good** - Obvious data flow:
|
||||
```
Request → Validate → Transform → Store → Respond

# Each step is a clear function/module:
api/routes.py        # Request enters
validators.py        # Validation
transformers.py      # Business logic transformation
repositories.py      # Storage
serializers.py       # Response shaping
```
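As a rough illustration of the same flow in code, each stage can be a small function whose output feeds the next. The function names, the `email` field, and the list used as a stand-in database are all assumptions made for this sketch.

```python
def validate(payload: dict) -> dict:
    """Reject payloads without a plausible email address."""
    email = payload.get("email")
    if not isinstance(email, str) or "@" not in email:
        raise ValueError("email is required")
    return payload


def transform(payload: dict) -> dict:
    """Normalise the input into the shape that gets stored."""
    return {"email": payload["email"].strip().lower()}


def store(record: dict, db: list) -> dict:
    """Append the record to the stand-in database and assign an id."""
    stored = {"id": len(db) + 1, **record}
    db.append(stored)
    return stored


def respond(stored: dict) -> dict:
    """Shape the response sent back to the caller."""
    return {"status": 201, "body": {"id": stored["id"]}}


def handle(payload: dict, db: list) -> dict:
    # Request -> Validate -> Transform -> Store -> Respond
    return respond(store(transform(validate(payload)), db))
```

Reading `handle` from the inside out gives the whole data flow in one line, which is the property the principle asks for.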
|
||||
|
||||
---
|
||||
|
||||
## Design for Failure
|
||||
|
||||
**Good** - Failure-aware design with compensation:
|
||||
```python
class OrderService:
    def place_order(self, order: Order) -> Result:
        inventory = self.inventory.reserve(order.items)
        if inventory.failed:
            return Result.failure("Items unavailable", retry=False)

        payment = self.payments.charge(order.total)
        if payment.failed:
            self.inventory.release(inventory.reservation_id)  # Compensate
            return Result.failure("Payment failed", retry=True)

        return Result.success(order)
```
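The snippet above leaves `Order`, `Result`, and the inventory and payment clients undefined. To see the compensation path actually run, here is a simplified, self-contained variant with in-memory fakes; `FakeInventory`, `DecliningPayments`, and this flattened `Result` are hypothetical and exist only for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    ok: bool
    error: str = ""
    retry: bool = False


@dataclass
class FakeInventory:
    """In-memory stand-in that tracks open reservations."""
    reserved: set = field(default_factory=set)

    def reserve(self, items) -> str:
        reservation_id = f"r-{len(self.reserved) + 1}"
        self.reserved.add(reservation_id)
        return reservation_id

    def release(self, reservation_id: str) -> None:
        self.reserved.discard(reservation_id)


class DecliningPayments:
    """Stand-in payment client that always declines the charge."""

    def charge(self, total) -> bool:
        return False


def place_order(items, total, inventory, payments) -> Result:
    reservation_id = inventory.reserve(items)
    if not payments.charge(total):
        inventory.release(reservation_id)  # Compensate: undo the reservation
        return Result(ok=False, error="Payment failed", retry=True)
    return Result(ok=True)
```

After a declined payment the inventory holds no reservations, so a retry starts from a clean state.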
|
||||
|
||||
---
|
||||
|
||||
## Design for Operations
|
||||
|
||||
**Good** - Observable architecture:
|
||||
```python
@trace
def handle_request(request):
    log.info("Processing", request_id=request.id, user=request.user_id)
    try:
        result = process(request)
        log.info("Completed", request_id=request.id, result=result.status)
        return result
    except Exception as e:
        log.error("Failed", request_id=request.id, error=str(e),
                  context=request.to_dict())  # Full context for replay
        raise
```
|
||||
|
||||
Key elements:
|
||||
- Every request gets a correlation ID
|
||||
- Every service logs with that ID
|
||||
- Every error includes full context for reproduction
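The example above assumes a `@trace` decorator and a structured logger that are not defined here. One possible way to get the same correlation-ID behaviour with only the standard library is a `contextvars` variable plus a logging filter. This is a sketch under that assumption; the logger name and helper names are hypothetical.

```python
import contextvars
import io
import logging
import uuid

# Holds the correlation ID for the request currently being handled.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current correlation ID to every log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


def build_logger(stream):
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s request_id=%(request_id)s %(message)s")
    )
    handler.addFilter(RequestIdFilter())
    logger = logging.getLogger("app.requests")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger, payload, process):
    token = request_id_var.set(uuid.uuid4().hex)  # one ID per request
    try:
        logger.info("processing")
        result = process(payload)
        logger.info("completed")
        return result
    except Exception:
        # Log the payload so the failure can be reproduced later.
        logger.exception("failed payload=%r", payload)
        raise
    finally:
        request_id_var.reset(token)


buffer = io.StringIO()
logger = build_logger(buffer)
handle_request(logger, {"user": 42}, process=lambda payload: "ok")
```

Every line written while a request is in flight carries the same `request_id`, so a single search on that ID reconstructs the request's path through the logs.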
|
||||
76
.cursor/skills/system-architecture/scaling-checklist.md
Normal file
|
|
@ -0,0 +1,76 @@
|
|||
# Scaling Checklist
|
||||
|
||||
Concrete techniques for when the complexity checklist in SKILL.md confirms scale is a real problem. Apply in order - each level solves the previous level's bottleneck.
|
||||
|
||||
---
|
||||
|
||||
## Level 0: Optimize First
|
||||
|
||||
Before adding infrastructure, exhaust these:
|
||||
|
||||
- [ ] Database queries have proper indexes
|
||||
- [ ] N+1 queries eliminated
|
||||
- [ ] Connection pooling configured
|
||||
- [ ] Slow endpoints profiled and optimized
|
||||
- [ ] Static assets served via CDN
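For the N+1 item, the difference is easiest to see side by side. The sketch below uses an in-memory SQLite database with a hypothetical `authors`/`posts` schema; the first function issues one query per author, the second gets the same result from a single join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE INDEX idx_posts_author ON posts(author_id);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Lin');
    INSERT INTO posts VALUES (1, 1, 'Intro'), (2, 1, 'Follow-up'), (3, 2, 'Notes');
""")


def titles_n_plus_one(conn):
    """One query for the authors, then one more per author (N+1 round trips)."""
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        result[name] = [title for (title,) in rows]
    return result


def titles_single_query(conn):
    """Same result from a single joined query."""
    result = {}
    query = """
        SELECT a.name, p.title
        FROM authors a JOIN posts p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """
    for name, title in conn.execute(query):
        result.setdefault(name, []).append(title)
    return result
```

With two authors the difference is invisible; with thousands, the per-author round trips dominate the request time.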
|
||||
|
||||
## Level 1: Read-Heavy
|
||||
|
||||
**Symptom**: Database reads are the bottleneck.
|
||||
|
||||
| Technique | When | Trade-off |
|-----------|------|-----------|
| Application cache (in-memory) | Small, frequently accessed data | Stale data, memory pressure |
| Redis/Memcached | Shared cache across instances | Network hop, cache invalidation complexity |
| Read replicas | High read volume, slight staleness OK | Replication lag, eventual consistency |
| CDN | Static or semi-static content | Cache invalidation delay |
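The first row, an in-memory application cache, can be as small as a dictionary with expiry times. A minimal sketch, assuming a single process and no eviction beyond expiry; `TTLCache` and its `clock` parameter are hypothetical names, and the injectable clock exists only to make the behaviour easy to test.

```python
import time


class TTLCache:
    """Tiny in-process cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit
        value = loader(key)  # miss or expired: reload from the source
        self._entries[key] = (now + self.ttl, value)
        return value
```

The TTL is the staleness budget named in the trade-off column: data can be up to `ttl_seconds` old, and nothing here bounds memory use.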
|
||||
|
||||
## Level 2: Write-Heavy
|
||||
|
||||
**Symptom**: Database writes or processing are the bottleneck.
|
||||
|
||||
| Technique | When | Trade-off |
|-----------|------|-----------|
| Async task queue (Celery, SQS) | Work can be deferred | Eventual consistency, failure handling |
| Write-behind cache | Batch frequent writes | Data loss risk on crash |
| Event streaming (Kafka) | Multiple consumers of same data | Operational complexity, ordering guarantees |
| CQRS | Reads and writes have diverged significantly | Two models to maintain |
|
||||
|
||||
## Level 3: Traffic Spikes
|
||||
|
||||
**Symptom**: Individual instances can't handle peak load.
|
||||
|
||||
| Technique | When | Trade-off |
|-----------|------|-----------|
| Horizontal scaling + load balancer | Stateless services | Session management, deploy complexity |
| Auto-scaling | Unpredictable traffic patterns | Cold start latency, cost spikes |
| Rate limiting | Protect against abuse/spikes | Legitimate users may be throttled |
| Circuit breakers | Downstream services degrade | Partial functionality during failures |
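To make the circuit breaker row concrete, here is a deliberately small sketch: after a run of consecutive failures the breaker stops calling the downstream service for a cool-down period, then lets one trial call through. The class name, thresholds, and injectable clock are assumptions for illustration; production libraries handle concurrency and metrics that this omits.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("downstream call skipped")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers fail fast and can fall back to degraded behaviour, which is the "partial functionality" trade-off in the table.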
|
||||
|
||||
## Level 4: Data Growth
|
||||
|
||||
**Symptom**: Single database can't hold or query all the data efficiently.
|
||||
|
||||
| Technique | When | Trade-off |
|-----------|------|-----------|
| Table partitioning | Time-series or naturally partitioned data | Query complexity, partition management |
| Archival / cold storage | Old data rarely accessed | Access latency for archived data |
| Database sharding | Partitioning insufficient, clear shard key exists | Cross-shard queries, operational burden |
| Search index (Elasticsearch) | Full-text or complex queries on large datasets | Index lag, another system to operate |
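For the sharding row, "clear shard key" means every request can be routed from the key alone. A minimal routing sketch, assuming a fixed shard count and a string key such as a tenant ID; `shard_for` is a hypothetical helper, and a stable hash is used because Python's built-in `hash()` is randomised per process for strings.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a shard key to a shard index in [0, shard_count)."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the count.
    return int.from_bytes(digest[:8], "big") % shard_count
```

Plain modulo routing moves most keys when `shard_count` changes, which is part of the operational burden listed above; schemes such as consistent hashing reduce that movement at the cost of more machinery.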
|
||||
|
||||
## Level 5: Multi-Region
|
||||
|
||||
**Symptom**: Users are geographically distributed, latency matters.
|
||||
|
||||
| Technique | When | Trade-off |
|-----------|------|-----------|
| CDN + edge caching | Static/semi-static content | Cache invalidation |
| Read replicas per region | Read-heavy, slight staleness OK | Replication lag |
| Active-passive failover | Disaster recovery | Failover time, cost of standby |
| Active-active multi-region | True global low-latency required | Conflict resolution, extreme complexity |
|
||||
|
||||
---
|
||||
|
||||
## Decision Rule
|
||||
|
||||
Always start at Level 0. Move to the next level only when you have **measured evidence** that the current level is insufficient. Skipping levels is how you end up with Kafka for a TODO app.
|
||||
78
.github/workflows/desktop-release.yml
vendored
Normal file
|
|
@ -0,0 +1,78 @@
|
|||
name: Desktop Release

on:
  push:
    tags:
      - 'v*'
      - 'beta-v*'

permissions:
  contents: write

jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        include:
          - os: macos-latest
            platform: --mac
          - os: ubuntu-latest
            platform: --linux
          - os: windows-latest
            platform: --win

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Extract version from tag
        id: version
        shell: bash
        run: |
          TAG=${GITHUB_REF#refs/tags/}
          VERSION=${TAG#beta-}
          VERSION=${VERSION#v}
          echo "VERSION=$VERSION" >> "$GITHUB_OUTPUT"

      - name: Setup pnpm
        uses: pnpm/action-setup@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'pnpm'
          cache-dependency-path: |
            surfsense_web/pnpm-lock.yaml
            surfsense_desktop/pnpm-lock.yaml

      - name: Install web dependencies
        run: pnpm install
        working-directory: surfsense_web

      - name: Build Next.js standalone
        run: pnpm build
        working-directory: surfsense_web
        env:
          NEXT_PUBLIC_FASTAPI_BACKEND_URL: ${{ vars.NEXT_PUBLIC_FASTAPI_BACKEND_URL }}
          NEXT_PUBLIC_ELECTRIC_URL: ${{ vars.NEXT_PUBLIC_ELECTRIC_URL }}
          NEXT_PUBLIC_DEPLOYMENT_MODE: ${{ vars.NEXT_PUBLIC_DEPLOYMENT_MODE }}
          NEXT_PUBLIC_FASTAPI_BACKEND_AUTH_TYPE: ${{ vars.NEXT_PUBLIC_FASTAPI_BACKEND_AUTH_TYPE }}

      - name: Install desktop dependencies
        run: pnpm install
        working-directory: surfsense_desktop

      - name: Build Electron
        run: pnpm build
        working-directory: surfsense_desktop
        env:
          HOSTED_FRONTEND_URL: ${{ vars.HOSTED_FRONTEND_URL }}

      - name: Package & Publish
        run: pnpm exec electron-builder ${{ matrix.platform }} --config electron-builder.yml --publish always -c.extraMetadata.version=${{ steps.version.outputs.VERSION }}
        working-directory: surfsense_desktop
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||
2
.gitignore
vendored
|
|
@ -5,4 +5,4 @@ node_modules/
|
|||
.ruff_cache/
|
||||
.venv
|
||||
.pnpm-store
|
||||
.DS_Store
|
||||
.DS_Store
|
||||
35
.vscode/launch.json
vendored
|
|
@ -22,7 +22,11 @@
|
|||
"console": "integratedTerminal",
|
||||
"justMyCode": false,
|
||||
"cwd": "${workspaceFolder}/surfsense_backend",
|
||||
"python": "${command:python.interpreterPath}"
|
||||
"python": "uv",
|
||||
"pythonArgs": [
|
||||
"run",
|
||||
"python"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "Backend: FastAPI (No Reload)",
|
||||
|
|
@ -32,7 +36,11 @@
|
|||
"console": "integratedTerminal",
|
||||
"justMyCode": false,
|
||||
"cwd": "${workspaceFolder}/surfsense_backend",
|
||||
"python": "${command:python.interpreterPath}"
|
||||
"python": "uv",
|
||||
"pythonArgs": [
|
||||
"run",
|
||||
"python"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "Backend: FastAPI (main.py)",
|
||||
|
|
@ -41,14 +49,19 @@
|
|||
"program": "${workspaceFolder}/surfsense_backend/main.py",
|
||||
"console": "integratedTerminal",
|
||||
"justMyCode": false,
|
||||
"cwd": "${workspaceFolder}/surfsense_backend"
|
||||
"cwd": "${workspaceFolder}/surfsense_backend",
|
||||
"python": "uv",
|
||||
"pythonArgs": [
|
||||
"run",
|
||||
"python"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "Frontend: Next.js",
|
||||
"type": "node",
|
||||
"request": "launch",
|
||||
"cwd": "${workspaceFolder}/surfsense_web",
|
||||
"runtimeExecutable": "npm",
|
||||
"runtimeExecutable": "pnpm",
|
||||
"runtimeArgs": ["run", "dev"],
|
||||
"console": "integratedTerminal",
|
||||
"serverReadyAction": {
|
||||
|
|
@ -62,7 +75,7 @@
|
|||
"type": "node",
|
||||
"request": "launch",
|
||||
"cwd": "${workspaceFolder}/surfsense_web",
|
||||
"runtimeExecutable": "npm",
|
||||
"runtimeExecutable": "pnpm",
|
||||
"runtimeArgs": ["run", "debug:server"],
|
||||
"console": "integratedTerminal",
|
||||
"serverReadyAction": {
|
||||
|
|
@ -87,7 +100,11 @@
|
|||
"console": "integratedTerminal",
|
||||
"justMyCode": false,
|
||||
"cwd": "${workspaceFolder}/surfsense_backend",
|
||||
"python": "${command:python.interpreterPath}"
|
||||
"python": "uv",
|
||||
"pythonArgs": [
|
||||
"run",
|
||||
"python"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "Celery: Beat Scheduler",
|
||||
|
|
@ -103,7 +120,11 @@
|
|||
"console": "integratedTerminal",
|
||||
"justMyCode": false,
|
||||
"cwd": "${workspaceFolder}/surfsense_backend",
|
||||
"python": "${command:python.interpreterPath}"
|
||||
"python": "uv",
|
||||
"pythonArgs": [
|
||||
"run",
|
||||
"python"
|
||||
]
|
||||
}
|
||||
],
|
||||
"compounds": [
|
||||
|
|
|
|||
3
.vscode/settings.json
vendored
|
|
@ -1,3 +1,4 @@
|
|||
{
|
||||
"biome.configurationPath": "./surfsense_web/biome.json"
|
||||
"biome.configurationPath": "./surfsense_web/biome.json",
|
||||
"deepscan.ignoreConfirmWarning": true
|
||||
}
|
||||
22
README.es.md
|
|
@ -27,11 +27,18 @@ SurfSense es un agente de investigación de IA altamente personalizable, conecta
|
|||
|
||||
|
||||
|
||||
# Video
|
||||
# Demo
|
||||
|
||||
https://github.com/user-attachments/assets/cc0c84d3-1f2f-4f7a-b519-2ecce22310b1
|
||||
|
||||
## Ejemplo de Podcast
|
||||
## Ejemplo de Agente de Video
|
||||
|
||||
|
||||
https://github.com/user-attachments/assets/cc977e6d-8292-4ffe-abb8-3b0560ef5562
|
||||
|
||||
|
||||
|
||||
## Ejemplo de Agente de Podcast
|
||||
|
||||
https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7
|
||||
|
||||
|
|
@ -46,20 +53,25 @@ https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7
|
|||
|
||||
2. Conecta tus conectores y sincroniza. Activa la sincronización periódica para mantenerlos actualizados.
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/59da61d7-da05-4576-b7c0-dbc09f5985e8" alt="Conectores" /></p>
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/0740f351-23fa-4909-9880-70aa1dcc1df7" alt="Conectores" /></p>
|
||||
|
||||
3. Mientras se indexan los datos de los conectores, sube documentos.
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/d1e8b2e2-9eac-41d8-bdc0-f0cdc405d128" alt="Subir Documentos" /></p>
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/daf3dbae-ef86-4e86-82ea-fcbcad988761" alt="Subir Documentos" /></p>
|
||||
|
||||
4. Una vez que todo esté indexado, pregunta lo que quieras (Casos de uso):
|
||||
|
||||
- Generación de videos
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/af85c0f3-6cfd-4757-9706-07fd5e32c857" alt="Generación de Videos" /></p>
|
||||
|
||||
- Búsqueda básica y citaciones
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/81e797a1-e01a-4003-8e60-0a0b3a9789df" alt="Búsqueda y Citación" /></p>
|
||||
|
||||
- QNA con mención de documentos
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/65c3bf06-1d46-4dd5-b169-4d934c9b6798" alt="QNA con Mención de Documentos" /></p>
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/be958295-0a8c-4707-998c-9fe1f1c007be" alt="QNA con Mención de Documentos" /></p>
|
||||
|
||||
- Generación de informes y exportaciones (PDF, DOCX, HTML, LaTeX, EPUB, ODT, texto plano)
|
||||
|
|
@ -133,6 +145,8 @@ Para Docker Compose, instalación manual y otras opciones de despliegue, consult
|
|||
| Soporte Universal de LLM | 100+ LLMs, 6000+ modelos de embeddings, todos los principales rerankers vía OpenAI spec y LiteLLM |
|
||||
| Privacidad Primero | Soporte completo de LLM local (vLLM, Ollama) tus datos son tuyos |
|
||||
| Colaboración en Equipo | RBAC con roles de Propietario / Admin / Editor / Visor, chat en tiempo real e hilos de comentarios |
|
||||
| Generación de Videos | Genera videos con narración y visuales |
|
||||
| Generación de Presentaciones | Crea presentaciones editables basadas en diapositivas |
|
||||
| Generación de Podcasts | Podcast de 3 min en menos de 20 segundos; múltiples proveedores TTS (OpenAI, Azure, Kokoro) |
|
||||
| Extensión de Navegador | Extensión multi-navegador para guardar cualquier página web, incluyendo páginas protegidas por autenticación |
|
||||
| 25+ Conectores | Motores de búsqueda, Google Drive, Slack, Teams, Jira, Notion, GitHub, Discord y [más](#fuentes-externas) |
|
||||
|
|
|
|||
22
README.hi.md
|
|
@ -27,11 +27,18 @@ SurfSense एक अत्यधिक अनुकूलन योग्य AI
|
|||
|
||||
|
||||
|
||||
# वीडियो
|
||||
# डेमो
|
||||
|
||||
https://github.com/user-attachments/assets/cc0c84d3-1f2f-4f7a-b519-2ecce22310b1
|
||||
|
||||
## पॉडकास्ट नमूना
|
||||
## वीडियो एजेंट नमूना
|
||||
|
||||
|
||||
https://github.com/user-attachments/assets/cc977e6d-8292-4ffe-abb8-3b0560ef5562
|
||||
|
||||
|
||||
|
||||
## पॉडकास्ट एजेंट नमूना
|
||||
|
||||
https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7
|
||||
|
||||
|
|
@ -46,20 +53,25 @@ https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7
|
|||
|
||||
2. अपने कनेक्टर जोड़ें और सिंक करें। कनेक्टर्स को अपडेट रखने के लिए आवधिक सिंकिंग सक्षम करें।
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/59da61d7-da05-4576-b7c0-dbc09f5985e8" alt="कनेक्टर्स" /></p>
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/0740f351-23fa-4909-9880-70aa1dcc1df7" alt="कनेक्टर्स" /></p>
|
||||
|
||||
3. जब तक कनेक्टर्स का डेटा इंडेक्स हो रहा है, दस्तावेज़ अपलोड करें।
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/d1e8b2e2-9eac-41d8-bdc0-f0cdc405d128" alt="दस्तावेज़ अपलोड करें" /></p>
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/daf3dbae-ef86-4e86-82ea-fcbcad988761" alt="दस्तावेज़ अपलोड करें" /></p>
|
||||
|
||||
4. सब कुछ इंडेक्स हो जाने के बाद, कुछ भी पूछें (उपयोग के मामले):
|
||||
|
||||
- वीडियो जनरेशन
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/af85c0f3-6cfd-4757-9706-07fd5e32c857" alt="वीडियो जनरेशन" /></p>
|
||||
|
||||
- बेसिक सर्च और उद्धरण
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/81e797a1-e01a-4003-8e60-0a0b3a9789df" alt="सर्च और उद्धरण" /></p>
|
||||
|
||||
- दस्तावेज़ मेंशन QNA
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/65c3bf06-1d46-4dd5-b169-4d934c9b6798" alt="दस्तावेज़ मेंशन QNA" /></p>
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/be958295-0a8c-4707-998c-9fe1f1c007be" alt="दस्तावेज़ मेंशन QNA" /></p>
|
||||
|
||||
- रिपोर्ट जनरेशन और एक्सपोर्ट (PDF, DOCX, HTML, LaTeX, EPUB, ODT, सादा टेक्स्ट)
|
||||
|
|
@ -133,6 +145,8 @@ Docker Compose, मैनुअल इंस्टॉलेशन और अन
|
|||
| यूनिवर्सल LLM सपोर्ट | 100+ LLMs, 6000+ एम्बेडिंग मॉडल, सभी प्रमुख रीरैंकर्स OpenAI spec और LiteLLM के माध्यम से |
|
||||
| प्राइवेसी फर्स्ट | पूर्ण लोकल LLM सपोर्ट (vLLM, Ollama) आपका डेटा आपका रहता है |
|
||||
| टीम सहयोग | मालिक / एडमिन / संपादक / दर्शक भूमिकाओं के साथ RBAC, रीयल-टाइम चैट और कमेंट थ्रेड |
|
||||
| वीडियो जनरेशन | नैरेशन और विज़ुअल के साथ वीडियो बनाएं |
|
||||
| प्रेजेंटेशन जनरेशन | संपादन योग्य, स्लाइड आधारित प्रेजेंटेशन बनाएं |
|
||||
| पॉडकास्ट जनरेशन | 20 सेकंड से कम में 3 मिनट का पॉडकास्ट; कई TTS प्रदाता (OpenAI, Azure, Kokoro) |
|
||||
| ब्राउज़र एक्सटेंशन | किसी भी वेबपेज को सहेजने के लिए क्रॉस-ब्राउज़र एक्सटेंशन, प्रमाणीकरण सुरक्षित पेज सहित |
|
||||
| 25+ कनेक्टर्स | सर्च इंजन, Google Drive, Slack, Teams, Jira, Notion, GitHub, Discord और [अधिक](#बाहरी-स्रोत) |
|
||||
|
|
|
|||
22
README.md
|
|
@ -27,11 +27,18 @@ SurfSense is a highly customizable AI research agent, connected to external sour
|
|||
|
||||
|
||||
|
||||
# Video
|
||||
# Demo
|
||||
|
||||
https://github.com/user-attachments/assets/cc0c84d3-1f2f-4f7a-b519-2ecce22310b1
|
||||
|
||||
## Podcast Sample
|
||||
## Video Agent Sample
|
||||
|
||||
|
||||
https://github.com/user-attachments/assets/cc977e6d-8292-4ffe-abb8-3b0560ef5562
|
||||
|
||||
|
||||
|
||||
## Podcast Agent Sample
|
||||
|
||||
https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7
|
||||
|
||||
|
|
@ -46,20 +53,25 @@ https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7
|
|||
|
||||
2. Connect your connectors and sync. Enable periodic syncing to keep connectors synced.
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/59da61d7-da05-4576-b7c0-dbc09f5985e8" alt="Connectors" /></p>
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/0740f351-23fa-4909-9880-70aa1dcc1df7" alt="Connectors" /></p>
|
||||
|
||||
3. While connector data is being indexed, upload documents.
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/d1e8b2e2-9eac-41d8-bdc0-f0cdc405d128" alt="Upload Documents" /></p>
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/daf3dbae-ef86-4e86-82ea-fcbcad988761" alt="Upload Documents" /></p>
|
||||
|
||||
4. Once everything is indexed, Ask Away (Use Cases):
|
||||
|
||||
- Video Generation
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/af85c0f3-6cfd-4757-9706-07fd5e32c857" alt="Video Generation" /></p>
|
||||
|
||||
- Basic search and citation
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/81e797a1-e01a-4003-8e60-0a0b3a9789df" alt="Search and Citation" /></p>
|
||||
|
||||
- Document Mention QNA
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/65c3bf06-1d46-4dd5-b169-4d934c9b6798" alt="Document Mention QNA" /></p>
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/be958295-0a8c-4707-998c-9fe1f1c007be" alt="Document Mention QNA" /></p>
|
||||
|
||||
- Report Generations and Exports (PDF, DOCX, HTML, LaTeX, EPUB, ODT, Plain Text)
|
||||
|
|
@ -133,6 +145,8 @@ For Docker Compose, manual installation, and other deployment options, see the [
|
|||
| Universal LLM Support | 100+ LLMs, 6000+ embedding models, all major rerankers via OpenAI spec & LiteLLM |
|
||||
| Privacy First | Full local LLM support (vLLM, Ollama) your data stays yours |
|
||||
| Team Collaboration | RBAC with Owner / Admin / Editor / Viewer roles, real time chat & comment threads |
|
||||
| Video Generation | Generate videos with narration and visuals |
|
||||
| Presentation Generation | Create editable, slide based presentations |
|
||||
| Podcast Generation | 3 min podcast in under 20 seconds; multiple TTS providers (OpenAI, Azure, Kokoro) |
|
||||
| Browser Extension | Cross browser extension to save any webpage, including auth protected pages |
|
||||
| 25+ Connectors | Search Engines, Google Drive, Slack, Teams, Jira, Notion, GitHub, Discord & [more](#external-sources) |
|
||||
|
|
|
|||
|
|
@ -27,11 +27,18 @@ SurfSense é um agente de pesquisa de IA altamente personalizável, conectado a
|
|||
|
||||
|
||||
|
||||
# Vídeo
|
||||
# Demo
|
||||
|
||||
https://github.com/user-attachments/assets/cc0c84d3-1f2f-4f7a-b519-2ecce22310b1
|
||||
|
||||
## Exemplo de Podcast
|
||||
## Exemplo de Agente de Vídeo
|
||||
|
||||
|
||||
https://github.com/user-attachments/assets/cc977e6d-8292-4ffe-abb8-3b0560ef5562
|
||||
|
||||
|
||||
|
||||
## Exemplo de Agente de Podcast
|
||||
|
||||
https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7
|
||||
|
||||
|
|
@ -46,20 +53,25 @@ https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7
|
|||
|
||||
2. Conecte seus conectores e sincronize. Ative a sincronização periódica para manter os conectores atualizados.
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/59da61d7-da05-4576-b7c0-dbc09f5985e8" alt="Conectores" /></p>
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/0740f351-23fa-4909-9880-70aa1dcc1df7" alt="Conectores" /></p>
|
||||
|
||||
3. Enquanto os dados dos conectores são indexados, faça upload de documentos.
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/d1e8b2e2-9eac-41d8-bdc0-f0cdc405d128" alt="Upload de Documentos" /></p>
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/daf3dbae-ef86-4e86-82ea-fcbcad988761" alt="Upload de Documentos" /></p>
|
||||
|
||||
4. Quando tudo estiver indexado, pergunte o que quiser (Casos de uso):
|
||||
|
||||
- Geração de vídeos
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/af85c0f3-6cfd-4757-9706-07fd5e32c857" alt="Geração de Vídeos" /></p>
|
||||
|
||||
- Busca básica e citações
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/81e797a1-e01a-4003-8e60-0a0b3a9789df" alt="Busca e Citação" /></p>
|
||||
|
||||
- QNA com menção de documentos
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/65c3bf06-1d46-4dd5-b169-4d934c9b6798" alt="QNA com Menção de Documentos" /></p>
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/be958295-0a8c-4707-998c-9fe1f1c007be" alt="QNA com Menção de Documentos" /></p>
|
||||
|
||||
- Geração de relatórios e exportações (PDF, DOCX, HTML, LaTeX, EPUB, ODT, texto simples)
|
||||
|
|
@ -133,6 +145,8 @@ Para Docker Compose, instalação manual e outras opções de implantação, con
|
|||
| Suporte Universal de LLM | 100+ LLMs, 6000+ modelos de embeddings, todos os principais rerankers via OpenAI spec e LiteLLM |
|
||||
| Privacidade em Primeiro Lugar | Suporte completo a LLM local (vLLM, Ollama) seus dados ficam com você |
|
||||
| Colaboração em Equipe | RBAC com papéis de Proprietário / Admin / Editor / Visualizador, chat em tempo real e threads de comentários |
|
||||
| Geração de Vídeos | Gera vídeos com narração e visuais |
|
||||
| Geração de Apresentações | Cria apresentações editáveis baseadas em slides |
|
||||
| Geração de Podcasts | Podcast de 3 min em menos de 20 segundos; múltiplos provedores TTS (OpenAI, Azure, Kokoro) |
|
||||
| Extensão de Navegador | Extensão multi-navegador para salvar qualquer página web, incluindo páginas protegidas por autenticação |
|
||||
| 25+ Conectores | Mecanismos de busca, Google Drive, Slack, Teams, Jira, Notion, GitHub, Discord e [mais](#fontes-externas) |
|
||||
|
|
|
|||
|
|
@ -27,11 +27,18 @@ SurfSense 是一个高度可定制的 AI 研究助手,可以连接外部数据
|
|||
|
||||
|
||||
|
||||
# 视频
|
||||
# 演示
|
||||
|
||||
https://github.com/user-attachments/assets/cc0c84d3-1f2f-4f7a-b519-2ecce22310b1
|
||||
|
||||
## 播客示例
|
||||
## 视频代理示例
|
||||
|
||||
|
||||
https://github.com/user-attachments/assets/cc977e6d-8292-4ffe-abb8-3b0560ef5562
|
||||
|
||||
|
||||
|
||||
## 播客代理示例
|
||||
|
||||
https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7
|
||||
|
||||
|
|
@ -46,20 +53,25 @@ https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7
|
|||
|
||||
2. 连接您的连接器并同步。启用定期同步以保持连接器数据更新。
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/59da61d7-da05-4576-b7c0-dbc09f5985e8" alt="连接器" /></p>
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/0740f351-23fa-4909-9880-70aa1dcc1df7" alt="连接器" /></p>
|
||||
|
||||
3. 在连接器数据索引期间,上传文档。
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/d1e8b2e2-9eac-41d8-bdc0-f0cdc405d128" alt="上传文档" /></p>
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/daf3dbae-ef86-4e86-82ea-fcbcad988761" alt="上传文档" /></p>
|
||||
|
||||
4. 一切索引完成后,尽管提问(使用场景):
|
||||
|
||||
- 视频生成
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/af85c0f3-6cfd-4757-9706-07fd5e32c857" alt="视频生成" /></p>
|
||||
|
||||
- 基本搜索和引用
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/81e797a1-e01a-4003-8e60-0a0b3a9789df" alt="搜索和引用" /></p>
|
||||
|
||||
- 文档提及问答
|
||||
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/65c3bf06-1d46-4dd5-b169-4d934c9b6798" alt="文档提及问答" /></p>
|
||||
<p align="center"><img src="https://github.com/user-attachments/assets/be958295-0a8c-4707-998c-9fe1f1c007be" alt="文档提及问答" /></p>
|
||||
|
||||
- 报告生成和导出(PDF、DOCX、HTML、LaTeX、EPUB、ODT、纯文本)
|
||||
|
|
@ -133,6 +145,8 @@ irm https://raw.githubusercontent.com/MODSetter/SurfSense/main/docker/scripts/in
|
|||
| 通用 LLM 支持 | 100+ LLM、6000+ 嵌入模型、所有主流重排序器,通过 OpenAI spec 和 LiteLLM |
|
||||
| 隐私优先 | 完整本地 LLM 支持(vLLM、Ollama),您的数据由您掌控 |
|
||||
| 团队协作 | RBAC 角色控制(所有者/管理员/编辑者/查看者),实时聊天和评论线程 |
|
||||
| 视频生成 | 生成带有旁白和视觉效果的视频 |
|
||||
| 演示文稿生成 | 创建可编辑的幻灯片式演示文稿 |
|
||||
| 播客生成 | 20 秒内生成 3 分钟播客;多种 TTS 提供商(OpenAI、Azure、Kokoro) |
|
||||
| 浏览器扩展 | 跨浏览器扩展,保存任何网页,包括需要身份验证的页面 |
|
||||
| 25+ 连接器 | 搜索引擎、Google Drive、Slack、Teams、Jira、Notion、GitHub、Discord 等[更多](#外部数据源) |
|
||||
|
|
|
|||
|
|
@ -36,6 +36,7 @@ EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
|
|||
# BACKEND_PORT=8929
|
||||
# FRONTEND_PORT=3929
|
||||
# ELECTRIC_PORT=5929
|
||||
# SEARXNG_PORT=8888
|
||||
# FLOWER_PORT=5555
|
||||
|
||||
# ==============================================================================
|
||||
|
|
@ -199,6 +200,16 @@ STT_SERVICE=local/base
|
|||
# COMPOSIO_ENABLED=TRUE
|
||||
# COMPOSIO_REDIRECT_URI=http://localhost:8000/api/v1/auth/composio/connector/callback
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# SearXNG (bundled web search — works out of the box, no config needed)
|
||||
# ------------------------------------------------------------------------------
|
||||
# SearXNG provides web search to all search spaces automatically.
|
||||
# To access the SearXNG UI directly: http://localhost:8888
|
||||
# To disable the service entirely: docker compose up --scale searxng=0
|
||||
# To point at your own SearXNG instance instead of the bundled one:
|
||||
# SEARXNG_DEFAULT_HOST=http://your-searxng:8080
|
||||
# SEARXNG_SECRET=surfsense-searxng-secret
|
||||
|
||||
# ------------------------------------------------------------------------------
|
||||
# Daytona Sandbox (optional — cloud code execution for the deep agent)
|
||||
# ------------------------------------------------------------------------------
|
||||
|
|
|
|||
|
|
@ -57,6 +57,20 @@ services:
|
|||
timeout: 5s
|
||||
retries: 5
|
||||
|
||||
searxng:
|
||||
image: searxng/searxng:2026.3.13-3c1f68c59
|
||||
ports:
|
||||
- "${SEARXNG_PORT:-8888}:8080"
|
||||
volumes:
|
||||
- ./searxng:/etc/searxng
|
||||
environment:
|
||||
- SEARXNG_SECRET=${SEARXNG_SECRET:-surfsense-searxng-secret}
|
||||
healthcheck:
|
||||
test: ["CMD", "wget", "--spider", "-q", "http://localhost:8080/healthz"]
|
||||
interval: 10s
|
||||
timeout: 5s
|
||||
retries: 5
|
||||
|
||||
backend:
|
||||
build: ../surfsense_backend
|
||||
ports:
|
||||
|
|
@ -81,6 +95,7 @@ services:
|
|||
- ELECTRIC_DB_PASSWORD=${ELECTRIC_DB_PASSWORD:-electric_password}
|
||||
- AUTH_TYPE=${AUTH_TYPE:-LOCAL}
|
||||
- NEXT_FRONTEND_URL=${NEXT_FRONTEND_URL:-http://localhost:3000}
|
||||
- SEARXNG_DEFAULT_HOST=${SEARXNG_DEFAULT_HOST:-http://searxng:8080}
|
||||
# Daytona Sandbox – uncomment and set credentials to enable cloud code execution
|
||||
# - DAYTONA_SANDBOX_ENABLED=TRUE
|
||||
# - DAYTONA_API_KEY=${DAYTONA_API_KEY:-}
|
||||
|
|
@ -92,6 +107,8 @@ services:
|
|||
condition: service_healthy
|
||||
redis:
|
||||
condition: service_healthy
|
||||
searxng:
|
||||
condition: service_healthy
|
||||
healthcheck:
|
||||
test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
|
||||
interval: 15s
|
||||
|
|
@ -115,6 +132,7 @@ services:
|
|||
- PYTHONPATH=/app
|
||||
- ELECTRIC_DB_USER=${ELECTRIC_DB_USER:-electric}
|
||||
- ELECTRIC_DB_PASSWORD=${ELECTRIC_DB_PASSWORD:-electric_password}
|
||||
- SEARXNG_DEFAULT_HOST=${SEARXNG_DEFAULT_HOST:-http://searxng:8080}
|
||||
- SERVICE_ROLE=worker
|
||||
depends_on:
|
||||
db:
|
||||
|
|
|
|||
|
|
@ -42,6 +42,19 @@ services:
|
|||
timeout: 5s
|
||||
retries: 5
|
||||
|
||||
searxng:
|
||||
image: searxng/searxng:2026.3.13-3c1f68c59
|
||||
volumes:
|
||||
- ./searxng:/etc/searxng
|
||||
environment:
|
||||
SEARXNG_SECRET: ${SEARXNG_SECRET:-surfsense-searxng-secret}
|
||||
restart: unless-stopped
|
||||
healthcheck:
|
||||
test: ["CMD", "wget", "--spider", "-q", "http://localhost:8080/healthz"]
|
||||
interval: 10s
|
||||
timeout: 5s
|
||||
retries: 5
|
||||
|
||||
backend:
|
||||
image: ghcr.io/modsetter/surfsense-backend:${SURFSENSE_VERSION:-latest}
|
||||
ports:
|
||||
|
|
@ -62,6 +75,7 @@ services:
|
|||
ELECTRIC_DB_USER: ${ELECTRIC_DB_USER:-electric}
|
||||
ELECTRIC_DB_PASSWORD: ${ELECTRIC_DB_PASSWORD:-electric_password}
|
||||
NEXT_FRONTEND_URL: ${NEXT_FRONTEND_URL:-http://localhost:${FRONTEND_PORT:-3929}}
|
||||
SEARXNG_DEFAULT_HOST: ${SEARXNG_DEFAULT_HOST:-http://searxng:8080}
|
||||
# Daytona Sandbox – uncomment and set credentials to enable cloud code execution
|
||||
# DAYTONA_SANDBOX_ENABLED: "TRUE"
|
||||
# DAYTONA_API_KEY: ${DAYTONA_API_KEY:-}
|
||||
|
|
@ -75,6 +89,8 @@ services:
|
|||
condition: service_healthy
|
||||
redis:
|
||||
condition: service_healthy
|
||||
searxng:
|
||||
condition: service_healthy
|
||||
restart: unless-stopped
|
||||
healthcheck:
|
||||
test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
|
||||
|
|
@ -98,6 +114,7 @@ services:
|
|||
PYTHONPATH: /app
|
||||
ELECTRIC_DB_USER: ${ELECTRIC_DB_USER:-electric}
|
||||
ELECTRIC_DB_PASSWORD: ${ELECTRIC_DB_PASSWORD:-electric_password}
|
||||
SEARXNG_DEFAULT_HOST: ${SEARXNG_DEFAULT_HOST:-http://searxng:8080}
|
||||
SERVICE_ROLE: worker
|
||||
depends_on:
|
||||
db:
|
||||
|
|
|
|||
|
|
@ -103,6 +103,7 @@ Write-Step "Downloading SurfSense files"
|
|||
Write-Info "Installation directory: $InstallDir"
|
||||
|
||||
New-Item -ItemType Directory -Path "$InstallDir\scripts" -Force | Out-Null
|
||||
New-Item -ItemType Directory -Path "$InstallDir\searxng" -Force | Out-Null
|
||||
|
||||
$Files = @(
|
||||
@{ Src = "docker/docker-compose.yml"; Dest = "docker-compose.yml" }
|
||||
|
|
@ -110,6 +111,8 @@ $Files = @(
|
|||
@{ Src = "docker/postgresql.conf"; Dest = "postgresql.conf" }
|
||||
@{ Src = "docker/scripts/init-electric-user.sh"; Dest = "scripts/init-electric-user.sh" }
|
||||
@{ Src = "docker/scripts/migrate-database.ps1"; Dest = "scripts/migrate-database.ps1" }
|
||||
@{ Src = "docker/searxng/settings.yml"; Dest = "searxng/settings.yml" }
|
||||
@{ Src = "docker/searxng/limiter.toml"; Dest = "searxng/limiter.toml" }
|
||||
)
|
||||
|
||||
foreach ($f in $Files) {
|
||||
|
|
|
|||
|
|
@ -102,6 +102,7 @@ wait_for_pg() {
|
|||
step "Downloading SurfSense files"
|
||||
info "Installation directory: ${INSTALL_DIR}"
|
||||
mkdir -p "${INSTALL_DIR}/scripts"
|
||||
mkdir -p "${INSTALL_DIR}/searxng"
|
||||
|
||||
FILES=(
|
||||
"docker/docker-compose.yml:docker-compose.yml"
|
||||
|
|
@ -109,6 +110,8 @@ FILES=(
|
|||
"docker/postgresql.conf:postgresql.conf"
|
||||
"docker/scripts/init-electric-user.sh:scripts/init-electric-user.sh"
|
||||
"docker/scripts/migrate-database.sh:scripts/migrate-database.sh"
|
||||
"docker/searxng/settings.yml:searxng/settings.yml"
|
||||
"docker/searxng/limiter.toml:searxng/limiter.toml"
|
||||
)
|
||||
|
||||
for entry in "${FILES[@]}"; do
|
||||
|
|
|
|||
5
docker/searxng/limiter.toml
Normal file
|
|
@ -0,0 +1,5 @@
|
|||
[botdetection.ip_limit]
|
||||
link_token = false
|
||||
|
||||
[botdetection.ip_lists]
|
||||
pass_ip = ["0.0.0.0/0"]
|
||||
90
docker/searxng/settings.yml
Normal file
|
|
@ -0,0 +1,90 @@
|
|||
use_default_settings:
|
||||
engines:
|
||||
remove:
|
||||
- ahmia
|
||||
- torch
|
||||
- qwant
|
||||
- qwant news
|
||||
- qwant images
|
||||
- qwant videos
|
||||
- mojeek
|
||||
- mojeek images
|
||||
- mojeek news
|
||||
|
||||
server:
|
||||
secret_key: "override-me-via-env"
|
||||
limiter: false
|
||||
image_proxy: false
|
||||
method: "GET"
|
||||
default_http_headers:
|
||||
X-Robots-Tag: "noindex, nofollow"
|
||||
|
||||
search:
|
||||
formats:
|
||||
- html
|
||||
- json
|
||||
default_lang: "auto"
|
||||
autocomplete: ""
|
||||
safe_search: 0
|
||||
ban_time_on_fail: 5
|
||||
max_ban_time_on_fail: 120
|
||||
suspended_times:
|
||||
SearxEngineAccessDenied: 3600
|
||||
SearxEngineCaptcha: 3600
|
||||
SearxEngineTooManyRequests: 600
|
||||
cf_SearxEngineCaptcha: 7200
|
||||
cf_SearxEngineAccessDenied: 3600
|
||||
recaptcha_SearxEngineCaptcha: 7200
|
||||
|
||||
ui:
|
||||
static_use_hash: true
|
||||
|
||||
outgoing:
|
||||
request_timeout: 12.0
|
||||
max_request_timeout: 20.0
|
||||
pool_connections: 100
|
||||
pool_maxsize: 20
|
||||
enable_http2: true
|
||||
extra_proxy_timeout: 10
|
||||
retries: 1
|
||||
# Uncomment and set your residential proxy URL to route search engine requests through it.
|
||||
# Format: http://<username>:<base64_password>@<hostname>:<port>/
|
||||
#
|
||||
# proxies:
|
||||
# all://:
|
||||
# - http://user:pass@proxy-host:port/
|
||||
|
||||
engines:
|
||||
- name: google
|
||||
disabled: false
|
||||
weight: 1.2
|
||||
retry_on_http_error: [429, 503]
|
||||
- name: duckduckgo
|
||||
disabled: false
|
||||
weight: 1.1
|
||||
retry_on_http_error: [429, 503]
|
||||
- name: brave
|
||||
disabled: false
|
||||
weight: 1.0
|
||||
retry_on_http_error: [429, 503]
|
||||
- name: bing
|
||||
disabled: false
|
||||
weight: 0.9
|
||||
retry_on_http_error: [429, 503]
|
||||
- name: wikipedia
|
||||
disabled: false
|
||||
weight: 0.8
|
||||
- name: stackoverflow
|
||||
disabled: false
|
||||
weight: 0.7
|
||||
- name: yahoo
|
||||
disabled: false
|
||||
weight: 0.7
|
||||
retry_on_http_error: [429, 503]
|
||||
- name: wikidata
|
||||
disabled: false
|
||||
weight: 0.6
|
||||
- name: currency
|
||||
disabled: false
|
||||
- name: ddg definitions
|
||||
disabled: false
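The `search.formats` list in this settings file enables `json` alongside `html`, which is what allows a client to read results programmatically. A minimal sketch of such a client follows, assuming the instance is reachable at the bundled default `http://localhost:8888`; the helper names `build_search_url` and `search` are hypothetical and are not taken from the SurfSense backend.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url: str, query: str, page: int = 1) -> str:
    """Build a SearXNG search URL that asks for JSON output."""
    params = urlencode({"q": query, "format": "json", "pageno": page})
    return f"{base_url.rstrip('/')}/search?{params}"


def search(base_url: str, query: str, timeout: float = 15.0) -> list[dict]:
    """Run one query and return the list under the "results" key.

    Requires a running SearXNG instance with the json format enabled.
    """
    with urlopen(build_search_url(base_url, query), timeout=timeout) as resp:
        payload = json.load(resp)
    return payload.get("results", [])


if __name__ == "__main__":
    # Hypothetical host; adjust to SEARXNG_PORT if it was changed.
    for hit in search("http://localhost:8888", "current USD to INR exchange rate")[:3]:
        print(hit.get("title"), hit.get("url"))
```

If `json` were missing from `search.formats`, the same request would be expected to be rejected by the instance, so the format list and the client need to agree.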
|
||||
|
|
@ -14,6 +14,7 @@ SurfSense 现已支持以下国产 LLM:
|
|||
- ✅ **阿里通义千问 (Alibaba Qwen)** - 阿里云通义千问大模型
|
||||
- ✅ **月之暗面 Kimi (Moonshot)** - 月之暗面 Kimi 大模型
|
||||
- ✅ **智谱 AI GLM (Zhipu)** - 智谱 AI GLM 系列模型
|
||||
- ✅ **MiniMax** - MiniMax 大模型 (M2.5 系列,204K 上下文)
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -197,6 +198,52 @@ API Base URL: https://open.bigmodel.cn/api/paas/v4
|
|||
|
||||
---
|
||||
|
||||
## 5️⃣ MiniMax 配置 | MiniMax Configuration
|
||||
|
||||
### 获取 API Key
|
||||
|
||||
1. 访问 [MiniMax 开放平台](https://platform.minimaxi.com/)
|
||||
2. 注册并登录账号
|
||||
3. 进入 **API Keys** 页面
|
||||
4. 创建新的 API Key
|
||||
5. 复制 API Key
|
||||
|
||||
### 在 SurfSense 中配置
|
||||
|
||||
| 字段 | 值 | 说明 |
|
||||
|------|-----|------|
|
||||
| **Configuration Name** | `MiniMax M2.5` | 配置名称(自定义) |
|
||||
| **Provider** | `MINIMAX` | 选择 MiniMax |
|
||||
| **Model Name** | `MiniMax-M2.5` | 推荐模型<br>其他选项: `MiniMax-M2.5-highspeed` |
|
||||
| **API Key** | `eyJ...` | 你的 MiniMax API Key |
|
||||
| **API Base URL** | `https://api.minimax.io/v1` | MiniMax API 地址 |
|
||||
| **Parameters** | `{"temperature": 1.0}` | 注意:temperature 必须在 (0.0, 1.0] 范围内,不能为 0 |
|
||||
|
||||
### 示例配置
|
||||
|
||||
```
|
||||
Configuration Name: MiniMax M2.5
|
||||
Provider: MINIMAX
|
||||
Model Name: MiniMax-M2.5
|
||||
API Key: eyJxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
|
||||
API Base URL: https://api.minimax.io/v1
|
||||
```
|
||||
|
||||
### 可用模型
|
||||
|
||||
- **MiniMax-M2.5**: 高性能通用模型,204K 上下文窗口(推荐)
|
||||
- **MiniMax-M2.5-highspeed**: 高速推理版本,204K 上下文窗口
|
||||
|
||||
### 注意事项
|
||||
|
||||
- **temperature 参数**: MiniMax 要求 temperature 必须在 (0.0, 1.0] 范围内,不能设置为 0。建议使用 1.0。
|
||||
- 两个模型都支持 204K 超长上下文窗口,适合处理长文本任务。
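The temperature constraint above (strictly greater than 0.0, at most 1.0) can be checked before a configuration is saved. The sketch below is illustrative only: `validate_minimax_temperature` and `build_minimax_parameters` are hypothetical helpers, not SurfSense functions, and the default of 1.0 simply follows the recommendation in the notes.

```python
def validate_minimax_temperature(temperature: float) -> float:
    """Return the temperature unchanged if it lies in the interval (0.0, 1.0]."""
    if not (0.0 < temperature <= 1.0):
        raise ValueError(
            f"MiniMax temperature must be in (0.0, 1.0], got {temperature}"
        )
    return temperature


def build_minimax_parameters(temperature: float = 1.0) -> dict:
    """Build the value for the Parameters field after validating it."""
    return {"temperature": validate_minimax_temperature(temperature)}


if __name__ == "__main__":
    print(build_minimax_parameters())     # uses the recommended 1.0
    print(build_minimax_parameters(0.7))  # any value in (0.0, 1.0] passes
```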
|
||||
|
||||
### 定价
|
||||
- 请访问 [MiniMax 定价页面](https://platform.minimaxi.com/document/Price) 查看最新价格
|
||||
|
||||
---
|
||||
|
||||
## ⚙️ 高级配置 | Advanced Configuration
|
||||
|
||||
### 自定义参数 | Custom Parameters
|
||||
|
|
@ -268,8 +315,8 @@ docker compose logs backend | grep -i "error"
|
|||
|---------|---------|------|
|
||||
| **文档摘要** | Qwen-Plus, GLM-4 | 平衡性能和成本 |
|
||||
| **代码分析** | DeepSeek-Coder | 代码专用 |
|
||||
| **长文本处理** | Kimi 128K | 超长上下文 |
|
||||
| **快速响应** | Qwen-Turbo, GLM-4-Flash | 速度优先 |
|
||||
| **长文本处理** | Kimi 128K, MiniMax-M2.5 (204K) | 超长上下文 |
|
||||
| **快速响应** | Qwen-Turbo, GLM-4-Flash, MiniMax-M2.5-highspeed | 速度优先 |
|
||||
|
||||
### 2. 成本优化
|
||||
|
||||
|
|
@ -294,6 +341,7 @@ docker compose logs backend | grep -i "error"
|
|||
- [阿里云百炼文档](https://help.aliyun.com/zh/model-studio/)
|
||||
- [Moonshot AI 文档](https://platform.moonshot.cn/docs)
|
||||
- [智谱 AI 文档](https://open.bigmodel.cn/dev/api)
|
||||
- [MiniMax 文档](https://platform.minimaxi.com/document/Guides)
|
||||
|
||||
### SurfSense 文档
|
||||
|
||||
|
|
|
|||
|
|
@ -12,6 +12,11 @@ REDIS_APP_URL=redis://localhost:6379/0
|
|||
# Optional: TTL in seconds for connector indexing lock key
|
||||
# CONNECTOR_INDEXING_LOCK_TTL_SECONDS=28800
|
||||
|
||||
# Platform Web Search (SearXNG)
|
||||
# Set this to enable built-in web search. Docker Compose sets it automatically.
|
||||
# Only uncomment if running the backend outside Docker (e.g. uvicorn on host).
|
||||
# SEARXNG_DEFAULT_HOST=http://localhost:8888
|
||||
|
||||
#Electric(for migrations only)
|
||||
ELECTRIC_DB_USER=electric
|
||||
ELECTRIC_DB_PASSWORD=electric_password
|
||||
|
|
|
|||
1
surfsense_backend/.gitignore
vendored
|
|
@ -6,6 +6,7 @@ __pycache__/
|
|||
.flashrank_cache
|
||||
surf_new_backend.egg-info/
|
||||
podcasts/
|
||||
video_presentation_audio/
|
||||
sandbox_files/
|
||||
temp_audio/
|
||||
celerybeat-schedule*
|
||||
|
|
|
|||
|
|
@ -0,0 +1,23 @@
|
|||
"""Add MINIMAX to LiteLLMProvider enum
|
||||
|
||||
Revision ID: 106
|
||||
Revises: 105
|
||||
"""
|
||||
|
||||
from collections.abc import Sequence
|
||||
|
||||
from alembic import op
|
||||
|
||||
revision: str = "106"
|
||||
down_revision: str | None = "105"
|
||||
branch_labels: str | Sequence[str] | None = None
|
||||
depends_on: str | Sequence[str] | None = None
|
||||
|
||||
|
||||
def upgrade() -> None:
|
||||
op.execute("COMMIT")
|
||||
op.execute("ALTER TYPE litellmprovider ADD VALUE IF NOT EXISTS 'MINIMAX'")
|
||||
|
||||
|
||||
def downgrade() -> None:
|
||||
pass
|
||||
|
|
@ -0,0 +1,85 @@
|
|||
"""Add video_presentations table and video_presentation_status enum
|
||||
|
||||
Revision ID: 107
|
||||
Revises: 106
|
||||
"""
|
||||
|
||||
from collections.abc import Sequence
|
||||
|
||||
import sqlalchemy as sa
|
||||
from sqlalchemy.dialects.postgresql import JSONB
|
||||
|
||||
from alembic import op
|
||||
|
||||
revision: str = "107"
|
||||
down_revision: str | None = "106"
|
||||
branch_labels: str | Sequence[str] | None = None
|
||||
depends_on: str | Sequence[str] | None = None
|
||||
|
||||
video_presentation_status_enum = sa.Enum(
|
||||
"pending",
|
||||
"generating",
|
||||
"ready",
|
||||
"failed",
|
||||
name="video_presentation_status",
|
||||
)
|
||||
|
||||
|
||||
def upgrade() -> None:
|
||||
video_presentation_status_enum.create(op.get_bind(), checkfirst=True)
|
||||
|
||||
op.create_table(
|
||||
"video_presentations",
|
||||
sa.Column("id", sa.Integer(), autoincrement=True, nullable=False),
|
||||
sa.Column("title", sa.String(length=500), nullable=False),
|
||||
sa.Column("slides", JSONB(), nullable=True),
|
||||
sa.Column("scene_codes", JSONB(), nullable=True),
|
||||
sa.Column(
|
||||
"status",
|
||||
video_presentation_status_enum,
|
||||
server_default="ready",
|
||||
nullable=False,
|
||||
),
|
||||
sa.Column("search_space_id", sa.Integer(), nullable=False),
|
||||
sa.Column("thread_id", sa.Integer(), nullable=True),
|
||||
sa.Column(
|
||||
"created_at",
|
||||
sa.TIMESTAMP(timezone=True),
|
||||
server_default=sa.text("now()"),
|
||||
nullable=False,
|
||||
),
|
||||
sa.ForeignKeyConstraint(
|
||||
["search_space_id"],
|
||||
["searchspaces.id"],
|
||||
ondelete="CASCADE",
|
||||
),
|
||||
sa.ForeignKeyConstraint(
|
||||
["thread_id"],
|
||||
["new_chat_threads.id"],
|
||||
ondelete="SET NULL",
|
||||
),
|
||||
sa.PrimaryKeyConstraint("id"),
|
||||
)
|
||||
op.create_index(
|
||||
"ix_video_presentations_status",
|
||||
"video_presentations",
|
||||
["status"],
|
||||
)
|
||||
op.create_index(
|
||||
"ix_video_presentations_thread_id",
|
||||
"video_presentations",
|
||||
["thread_id"],
|
||||
)
|
||||
op.create_index(
|
||||
"ix_video_presentations_created_at",
|
||||
"video_presentations",
|
||||
["created_at"],
|
||||
)
|
||||
|
||||
|
||||
def downgrade() -> None:
|
||||
op.drop_index("ix_video_presentations_created_at", table_name="video_presentations")
|
||||
op.drop_index("ix_video_presentations_thread_id", table_name="video_presentations")
|
||||
op.drop_index("ix_video_presentations_status", table_name="video_presentations")
|
||||
op.drop_table("video_presentations")
|
||||
video_presentation_status_enum.drop(op.get_bind(), checkfirst=True)
|
||||
|
|
@ -37,13 +37,15 @@ _perf_log = get_perf_logger()
|
|||
# =============================================================================
|
||||
|
||||
# Maps SearchSourceConnectorType enum values to the searchable document/connector types
|
||||
# used by the knowledge_base tool. Some connectors map to different document types.
|
||||
# used by the knowledge_base and web_search tools.
|
||||
# Live search connectors (TAVILY_API, LINKUP_API, BAIDU_SEARCH_API) are routed to
|
||||
# the web_search tool; all others go to search_knowledge_base.
|
||||
_CONNECTOR_TYPE_TO_SEARCHABLE: dict[str, str] = {
|
||||
# Direct mappings (connector type == searchable type)
|
||||
# Live search connectors (handled by web_search tool)
|
||||
"TAVILY_API": "TAVILY_API",
|
||||
"SEARXNG_API": "SEARXNG_API",
|
||||
"LINKUP_API": "LINKUP_API",
|
||||
"BAIDU_SEARCH_API": "BAIDU_SEARCH_API",
|
||||
# Local/indexed connectors (handled by search_knowledge_base tool)
|
||||
"SLACK_CONNECTOR": "SLACK_CONNECTOR",
|
||||
"TEAMS_CONNECTOR": "TEAMS_CONNECTOR",
|
||||
"NOTION_CONNECTOR": "NOTION_CONNECTOR",
|
||||
|
|
@ -233,6 +235,7 @@ async def create_surfsense_deep_agent(
|
|||
available_document_types = await connector_service.get_available_document_types(
|
||||
search_space_id
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
logging.warning(f"Failed to discover available connectors/document types: {e}")
|
||||
_perf_log.info(
|
||||
|
|
|
|||
|
|
@ -59,6 +59,7 @@ PROVIDER_MAP = {
|
|||
"DATABRICKS": "databricks",
|
||||
"COMETAPI": "cometapi",
|
||||
"HUGGINGFACE": "huggingface",
|
||||
"MINIMAX": "openai",
|
||||
"CUSTOM": "custom",
|
||||
}
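Mapping `MINIMAX` to `"openai"` suggests that MiniMax is reached through an OpenAI-compatible endpoint rather than a dedicated provider prefix. The sketch below shows how such a map could be turned into a `prefix/model` string; it assumes the backend joins the two with a slash, which this diff does not show, and `build_model_string` is a hypothetical helper. Only the three map entries visible in the hunk are reproduced.

```python
# Subset of the provider map shown in the hunk above.
PROVIDER_MAP = {
    "HUGGINGFACE": "huggingface",
    "MINIMAX": "openai",  # routed through the OpenAI-compatible client
    "CUSTOM": "custom",
}


def build_model_string(provider: str, model_name: str) -> str:
    """Combine the mapped provider prefix and the model name (assumed format)."""
    prefix = PROVIDER_MAP.get(provider.upper())
    if prefix is None:
        raise ValueError(f"Unknown provider: {provider}")
    return f"{prefix}/{model_name}"


if __name__ == "__main__":
    print(build_model_string("MINIMAX", "MiniMax-M2.5"))
```

Under this assumption the API Base URL from the configuration is what actually directs the request to MiniMax, since the prefix alone would point at OpenAI.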
|
||||
|
||||
|
|
|
|||
|
|
@ -99,14 +99,8 @@ _TOOL_INSTRUCTIONS["search_knowledge_base"] = """
|
|||
- IMPORTANT: When searching for information (meetings, schedules, notes, tasks, etc.), ALWAYS search broadly
|
||||
across ALL sources first by omitting connectors_to_search. The user may store information in various places
|
||||
including calendar apps, note-taking apps (Obsidian, Notion), chat apps (Slack, Discord), and more.
|
||||
- IMPORTANT (REAL-TIME / PUBLIC WEB QUERIES): For questions that require current public web data
|
||||
(e.g., live exchange rates, stock prices, breaking news, weather, current events), you MUST call
|
||||
`search_knowledge_base` using live web connectors via `connectors_to_search`:
|
||||
["LINKUP_API", "TAVILY_API", "SEARXNG_API", "BAIDU_SEARCH_API"].
|
||||
- For these real-time/public web queries, DO NOT answer from memory and DO NOT say you lack internet
|
||||
access before attempting a live connector search.
|
||||
- If the live connectors return no relevant results, explain that live web sources did not return enough
|
||||
data and ask the user if they want you to retry with a refined query.
|
||||
- This tool searches ONLY local/indexed data (uploaded files, Notion, Slack, browser extension captures, etc.).
|
||||
For real-time web search (current events, news, live data), use the `web_search` tool instead.
|
||||
- FALLBACK BEHAVIOR: If the search returns no relevant results, you MAY then answer using your own
|
||||
general knowledge, but clearly indicate that no matching information was found in the knowledge base.
|
||||
- Only narrow to specific connectors if the user explicitly asks (e.g., "check my Slack" or "in my calendar").
|
||||
|
|
@ -138,6 +132,17 @@ _TOOL_INSTRUCTIONS["generate_podcast"] = """
|
|||
- After calling this tool, inform the user that podcast generation has started and they will see the player when it's ready (takes 3-5 minutes).
|
||||
"""
|
||||
|
||||
_TOOL_INSTRUCTIONS["generate_video_presentation"] = """
|
||||
- generate_video_presentation: Generate a video presentation from provided content.
|
||||
- Use this when the user asks to create a video, presentation, slides, or slide deck.
|
||||
- Trigger phrases: "give me a presentation", "create slides", "generate a video", "make a slide deck", "turn this into a presentation"
|
||||
- Args:
|
||||
- source_content: The text content to turn into a presentation. The more detailed, the better.
|
||||
- video_title: Optional title (default: "SurfSense Presentation")
|
||||
- user_prompt: Optional style instructions (e.g., "Make it technical and detailed")
|
||||
- After calling this tool, inform the user that generation has started and they will see the presentation when it's ready.
|
||||
"""
|
||||
|
||||
_TOOL_INSTRUCTIONS["generate_report"] = """
|
||||
- generate_report: Generate or revise a structured Markdown report artifact.
|
||||
- WHEN TO CALL THIS TOOL — the message must contain a creation or modification VERB directed at producing a deliverable:
|
||||
|
|
@ -271,6 +276,24 @@ _TOOL_INSTRUCTIONS["scrape_webpage"] = """
|
|||
* Don't show every image - just the most relevant 1-3 images that enhance understanding.
|
||||
"""
|
||||
|
||||
_TOOL_INSTRUCTIONS["web_search"] = """
|
||||
- web_search: Search the web for real-time information using all configured search engines.
|
||||
- Use this for current events, news, prices, weather, public facts, or any question requiring
|
||||
up-to-date information from the internet.
|
||||
- This tool dispatches to all configured search engines (SearXNG, Tavily, Linkup, Baidu) in
|
||||
parallel and merges the results.
|
||||
- IMPORTANT (REAL-TIME / PUBLIC WEB QUERIES): For questions that require current public web data
|
||||
(e.g., live exchange rates, stock prices, breaking news, weather, current events), you MUST call
|
||||
`web_search` instead of answering from memory.
|
||||
- For these real-time/public web queries, DO NOT answer from memory and DO NOT say you lack internet
|
||||
access before attempting a web search.
|
||||
- If the search returns no relevant results, explain that web sources did not return enough
|
||||
data and ask the user if they want you to retry with a refined query.
|
||||
- Args:
|
||||
- query: The search query - use specific, descriptive terms
|
||||
- top_k: Number of results to retrieve (default: 10, max: 50)
|
||||
"""
|
||||
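The `web_search` instructions describe dispatching to every configured engine in parallel and merging the results. One way that behaviour could look is sketched below using `asyncio.gather`; this is not the tool's actual implementation. The engine callables, the deduplication by URL, and the name `merged_web_search` are assumptions made for illustration, while the `top_k` default of 10 and cap of 50 follow the argument description.

```python
import asyncio
from collections.abc import Awaitable, Callable

# Hypothetical engine signature: takes a query, returns a list of result dicts.
Engine = Callable[[str], Awaitable[list[dict]]]


async def merged_web_search(
    query: str, engines: list[Engine], top_k: int = 10
) -> list[dict]:
    """Query all engines concurrently and merge results, deduplicated by URL."""
    top_k = max(1, min(top_k, 50))  # clamp to the documented maximum
    batches = await asyncio.gather(
        *(engine(query) for engine in engines), return_exceptions=True
    )
    merged: list[dict] = []
    seen: set[str] = set()
    for batch in batches:
        if isinstance(batch, BaseException):
            continue  # one failing engine should not sink the whole search
        for item in batch:
            url = item.get("url")
            if url and url not in seen:
                seen.add(url)
                merged.append(item)
    return merged[:top_k]


async def _demo_engine(query: str) -> list[dict]:
    # Stand-in for a real engine call; returns a fixed result.
    return [{"title": f"Result for {query}", "url": "https://example.com/article"}]


if __name__ == "__main__":
    print(asyncio.run(merged_web_search("latest AI news today", [_demo_engine])))
```

Passing `return_exceptions=True` keeps a single unavailable engine from cancelling the others, which matches the instruction to report thin results rather than fail outright.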
|
||||
# Memory tool instructions have private and shared variants.
|
||||
# We store them keyed as "save_memory" / "recall_memory" with sub-keys.
|
||||
_MEMORY_TOOL_INSTRUCTIONS: dict[str, dict[str, str]] = {
|
||||
|
|
@ -401,7 +424,7 @@ _TOOL_EXAMPLES["search_knowledge_base"] = """
|
|||
- User: "Check my Obsidian notes for meeting notes"
|
||||
- Call: `search_knowledge_base(query="meeting notes", connectors_to_search=["OBSIDIAN_CONNECTOR"])`
|
||||
- User: "search me current usd to inr rate"
|
||||
- Call: `search_knowledge_base(query="current USD to INR exchange rate", connectors_to_search=["LINKUP_API", "TAVILY_API", "SEARXNG_API", "BAIDU_SEARCH_API"])`
|
||||
- Call: `web_search(query="current USD to INR exchange rate")`
|
||||
- Then answer using the returned live web results with citations.
|
||||
"""
|
||||
|
||||
|
|
@ -426,6 +449,16 @@ _TOOL_EXAMPLES["generate_podcast"] = """
|
|||
- Then: `generate_podcast(source_content="Key insights about quantum computing from the knowledge base:\\n\\n[Comprehensive summary of all relevant search results with key facts, concepts, and findings]", podcast_title="Quantum Computing Explained")`
|
||||
"""
|
||||
|
||||
_TOOL_EXAMPLES["generate_video_presentation"] = """
|
||||
- User: "Give me a presentation about AI trends based on what we discussed"
|
||||
- First search for relevant content, then call: `generate_video_presentation(source_content="Based on our conversation and search results: [detailed summary of chat + search findings]", video_title="AI Trends Presentation")`
|
||||
- User: "Create slides summarizing this conversation"
|
||||
- Call: `generate_video_presentation(source_content="Complete conversation summary:\\n\\nUser asked about [topic 1]:\\n[Your detailed response]\\n\\nUser then asked about [topic 2]:\\n[Your detailed response]\\n\\n[Continue for all exchanges in the conversation]", video_title="Conversation Summary")`
|
||||
- User: "Make a video presentation about quantum computing"
|
||||
- First search: `search_knowledge_base(query="quantum computing")`
|
||||
- Then: `generate_video_presentation(source_content="Key insights about quantum computing from the knowledge base:\\n\\n[Comprehensive summary of all relevant search results with key facts, concepts, and findings]", video_title="Quantum Computing Explained")`
|
||||
"""
|
||||
|
||||
_TOOL_EXAMPLES["generate_report"] = """
|
||||
- User: "Generate a report about AI trends"
|
||||
- Call: `generate_report(topic="AI Trends Report", source_strategy="kb_search", search_queries=["AI trends recent developments", "artificial intelligence industry trends", "AI market growth and predictions"], report_style="detailed")`
|
||||
|
|
@ -471,11 +504,23 @@ _TOOL_EXAMPLES["generate_image"] = """
|
|||
- Step 2: `display_image(src="<returned_url>", alt="Bean Dream coffee shop logo", title="Generated Image")`
|
||||
"""
|
||||
|
||||
_TOOL_EXAMPLES["web_search"] = """
|
||||
- User: "What's the current USD to INR exchange rate?"
|
||||
- Call: `web_search(query="current USD to INR exchange rate")`
|
||||
- Then answer using the returned web results with citations.
|
||||
- User: "What's the latest news about AI?"
|
||||
- Call: `web_search(query="latest AI news today")`
|
||||
- User: "What's the weather in New York?"
|
||||
- Call: `web_search(query="weather New York today")`
|
||||
"""
|
||||
|
||||
# All tool names that have prompt instructions (order matters for prompt readability)
|
||||
_ALL_TOOL_NAMES_ORDERED = [
|
||||
"search_surfsense_docs",
|
||||
"search_knowledge_base",
|
||||
"web_search",
|
||||
"generate_podcast",
|
||||
"generate_video_presentation",
|
||||
"generate_report",
|
||||
"link_preview",
|
||||
"display_image",
|
||||
|
|
@ -543,7 +588,7 @@ DISABLED TOOLS (by user):
|
|||
The following tools are available in SurfSense but have been disabled by the user for this session: {disabled_list}.
|
||||
You do NOT have access to these tools and MUST NOT claim you can use them.
|
||||
If the user asks about a capability provided by a disabled tool, let them know the relevant tool
|
||||
is currently disabled and they can re-enable it from the tools menu (wrench icon) in the composer toolbar.
|
||||
is currently disabled and they can re-enable it.
|
||||
""")
|
||||
|
||||
parts.append("\n</tools>\n")
|
||||
|
|
@ -595,11 +640,10 @@ The documents you receive are structured like this:
|
|||
</document_content>
|
||||
</document>
|
||||
|
||||
**Live web search results (URL chunk IDs):**
|
||||
**Web search results (URL chunk IDs):**
|
||||
<document>
|
||||
<document_metadata>
|
||||
<document_id>TAVILY_API::Some Title::https://example.com/article</document_id>
|
||||
<document_type>TAVILY_API</document_type>
|
||||
<document_type>WEB_SEARCH</document_type>
|
||||
<title><![CDATA[Some web search result]]></title>
|
||||
<url><![CDATA[https://example.com/article]]></url>
|
||||
</document_metadata>
|
||||
|
|
|
|||
|
|
@ -8,6 +8,7 @@ Available tools:
|
|||
- search_knowledge_base: Search the user's personal knowledge base
|
||||
- search_surfsense_docs: Search Surfsense documentation for usage help
|
||||
- generate_podcast: Generate audio podcasts from content
|
||||
- generate_video_presentation: Generate video presentations with slides and narration
|
||||
- generate_image: Generate images from text descriptions using AI models
|
||||
- link_preview: Fetch rich previews for URLs
|
||||
- display_image: Display images in chat
|
||||
|
|
@ -39,6 +40,7 @@ from .registry import (
|
|||
from .scrape_webpage import create_scrape_webpage_tool
|
||||
from .search_surfsense_docs import create_search_surfsense_docs_tool
|
||||
from .user_memory import create_recall_memory_tool, create_save_memory_tool
|
||||
from .video_presentation import create_generate_video_presentation_tool
|
||||
|
||||
__all__ = [
|
||||
# Registry
|
||||
|
|
@ -51,6 +53,7 @@ __all__ = [
|
|||
"create_display_image_tool",
|
||||
"create_generate_image_tool",
|
||||
"create_generate_podcast_tool",
|
||||
"create_generate_video_presentation_tool",
|
||||
"create_link_preview_tool",
|
||||
"create_recall_memory_tool",
|
||||
"create_save_memory_tool",
|
||||
|
|
|
|||
|
|
@ -23,11 +23,10 @@ from app.db import shielded_async_session
|
|||
from app.services.connector_service import ConnectorService
|
||||
from app.utils.perf import get_perf_logger
|
||||
|
||||
# Connectors that call external live-search APIs (no local DB / embedding needed).
|
||||
# These are never filtered by available_document_types.
|
||||
# Connectors that call external live-search APIs. These are handled by the
|
||||
# ``web_search`` tool and must be excluded from knowledge-base searches.
|
||||
_LIVE_SEARCH_CONNECTORS: set[str] = {
|
||||
"TAVILY_API",
|
||||
"SEARXNG_API",
|
||||
"LINKUP_API",
|
||||
"BAIDU_SEARCH_API",
|
||||
}
|
||||
|
|
@ -190,10 +189,6 @@ _ALL_CONNECTORS: list[str] = [
|
|||
"GOOGLE_DRIVE_FILE",
|
||||
"DISCORD_CONNECTOR",
|
||||
"AIRTABLE_CONNECTOR",
|
||||
"TAVILY_API",
|
||||
"SEARXNG_API",
|
||||
"LINKUP_API",
|
||||
"BAIDU_SEARCH_API",
|
||||
"LUMA_CONNECTOR",
|
||||
"NOTE",
|
||||
"BOOKSTACK_CONNECTOR",
|
||||
|
|
@ -227,10 +222,6 @@ CONNECTOR_DESCRIPTIONS: dict[str, str] = {
|
|||
"GOOGLE_DRIVE_FILE": "Google Drive files and documents (personal cloud storage)",
|
||||
"DISCORD_CONNECTOR": "Discord server conversations and shared content (personal community)",
|
||||
"AIRTABLE_CONNECTOR": "Airtable records, tables, and database content (personal data)",
|
||||
"TAVILY_API": "Tavily web search API results (real-time web search)",
|
||||
"SEARXNG_API": "SearxNG search API results (privacy-focused web search)",
|
||||
"LINKUP_API": "Linkup search API results (web search)",
|
||||
"BAIDU_SEARCH_API": "Baidu search API results (Chinese web search)",
|
||||
"LUMA_CONNECTOR": "Luma events and meetings",
|
||||
"WEBCRAWLER_CONNECTOR": "Webpages indexed by SurfSense (personally selected websites)",
|
||||
"CRAWLED_URL": "Webpages indexed by SurfSense (personally selected websites)",
|
||||
|
|
@ -268,14 +259,15 @@ def _normalize_connectors(
|
|||
valid_set = (
|
||||
set(available_connectors) if available_connectors else set(_ALL_CONNECTORS)
|
||||
)
|
||||
valid_set -= _LIVE_SEARCH_CONNECTORS
|
||||
|
||||
if not connectors_to_search:
|
||||
# Search all available connectors if none specified
|
||||
return (
|
||||
base = (
|
||||
list(available_connectors)
|
||||
if available_connectors
|
||||
else list(_ALL_CONNECTORS)
|
||||
)
|
||||
return [c for c in base if c not in _LIVE_SEARCH_CONNECTORS]
|
||||
|
||||
normalized: list[str] = []
|
||||
for raw in connectors_to_search:
|
||||
|
|
@ -302,15 +294,14 @@ def _normalize_connectors(
|
|||
out.append(c)
|
||||
|
||||
# Fallback to all available if nothing matched
|
||||
return (
|
||||
out
|
||||
if out
|
||||
else (
|
||||
if not out:
|
||||
base = (
|
||||
list(available_connectors)
|
||||
if available_connectors
|
||||
else list(_ALL_CONNECTORS)
|
||||
)
|
||||
)
|
||||
return [c for c in base if c not in _LIVE_SEARCH_CONNECTORS]
|
||||
return out
|
||||
|
||||
|
||||
# =============================================================================
|
||||
|
|
@ -479,7 +470,6 @@ def format_documents_for_context(
|
|||
# a numeric chunk_id (the numeric IDs are meaningless auto-incremented counters).
|
||||
live_search_connectors = {
|
||||
"TAVILY_API",
|
||||
"SEARXNG_API",
|
||||
"LINKUP_API",
|
||||
"BAIDU_SEARCH_API",
|
||||
}
|
||||
|
|
@ -623,13 +613,11 @@ async def search_knowledge_base_async(
|
|||
|
||||
connectors = _normalize_connectors(connectors_to_search, available_connectors)
|
||||
|
||||
# --- Optimization 1: skip local connectors that have zero indexed documents ---
|
||||
# --- Optimization 1: skip connectors that have zero indexed documents ---
|
||||
if available_document_types:
|
||||
doc_types_set = set(available_document_types)
|
||||
before_count = len(connectors)
|
||||
connectors = [
|
||||
c for c in connectors if c in _LIVE_SEARCH_CONNECTORS or c in doc_types_set
|
||||
]
|
||||
connectors = [c for c in connectors if c in doc_types_set]
|
||||
skipped = before_count - len(connectors)
|
||||
if skipped:
|
||||
perf.info(
|
||||
|
|
@ -664,9 +652,7 @@ async def search_knowledge_base_async(
|
|||
"[kb_search] degenerate query %r detected - falling back to recency browse",
|
||||
query,
|
||||
)
|
||||
local_connectors = [c for c in connectors if c not in _LIVE_SEARCH_CONNECTORS]
|
||||
if not local_connectors:
|
||||
local_connectors = [None] # type: ignore[list-item]
|
||||
browse_connectors = connectors if connectors else [None] # type: ignore[list-item]
|
||||
|
||||
browse_results = await asyncio.gather(
|
||||
*[
|
||||
|
|
@ -677,7 +663,7 @@ async def search_knowledge_base_async(
|
|||
start_date=resolved_start_date,
|
||||
end_date=resolved_end_date,
|
||||
)
|
||||
for c in local_connectors
|
||||
for c in browse_connectors
|
||||
]
|
||||
)
|
||||
for docs in browse_results:
|
||||
|
|
@ -702,66 +688,20 @@ async def search_knowledge_base_async(
|
|||
)
|
||||
return result
|
||||
|
||||
# Specs for live-search connectors (external APIs, no local DB/embedding).
|
||||
live_connector_specs: dict[str, tuple[str, bool, bool, dict[str, Any]]] = {
|
||||
"TAVILY_API": ("search_tavily", False, True, {}),
|
||||
"SEARXNG_API": ("search_searxng", False, True, {}),
|
||||
"LINKUP_API": ("search_linkup", False, False, {"mode": "standard"}),
|
||||
"BAIDU_SEARCH_API": ("search_baidu", False, True, {}),
|
||||
}
|
||||
|
||||
# --- Optimization 2: compute the query embedding once, share across all local searches ---
|
||||
precomputed_embedding: list[float] | None = None
|
||||
has_local_connectors = any(c not in _LIVE_SEARCH_CONNECTORS for c in connectors)
|
||||
if has_local_connectors:
|
||||
from app.config import config as app_config
|
||||
from app.config import config as app_config
|
||||
|
||||
t_embed = time.perf_counter()
|
||||
precomputed_embedding = app_config.embedding_model_instance.embed(query)
|
||||
perf.info(
|
||||
"[kb_search] shared embedding computed in %.3fs",
|
||||
time.perf_counter() - t_embed,
|
||||
)
|
||||
t_embed = time.perf_counter()
|
||||
precomputed_embedding = app_config.embedding_model_instance.embed(query)
|
||||
perf.info(
|
||||
"[kb_search] shared embedding computed in %.3fs",
|
||||
time.perf_counter() - t_embed,
|
||||
)
|
||||
|
||||
max_parallel_searches = 4
|
||||
semaphore = asyncio.Semaphore(max_parallel_searches)
|
||||
|
||||
async def _search_one_connector(connector: str) -> list[dict[str, Any]]:
|
||||
is_live = connector in _LIVE_SEARCH_CONNECTORS
|
||||
|
||||
if is_live:
|
||||
spec = live_connector_specs.get(connector)
|
||||
if spec is None:
|
||||
return []
|
||||
method_name, includes_date_range, includes_top_k, extra_kwargs = spec
|
||||
kwargs: dict[str, Any] = {
|
||||
"user_query": query,
|
||||
"search_space_id": search_space_id,
|
||||
**extra_kwargs,
|
||||
}
|
||||
if includes_top_k:
|
||||
kwargs["top_k"] = top_k
|
||||
if includes_date_range:
|
||||
kwargs["start_date"] = resolved_start_date
|
||||
kwargs["end_date"] = resolved_end_date
|
||||
|
||||
try:
|
||||
t_conn = time.perf_counter()
|
||||
async with semaphore, shielded_async_session() as isolated_session:
|
||||
svc = ConnectorService(isolated_session, search_space_id)
|
||||
_, chunks = await getattr(svc, method_name)(**kwargs)
|
||||
perf.info(
|
||||
"[kb_search] connector=%s results=%d in %.3fs",
|
||||
connector,
|
||||
len(chunks),
|
||||
time.perf_counter() - t_conn,
|
||||
)
|
||||
return chunks
|
||||
except Exception as e:
|
||||
perf.warning("[kb_search] connector=%s FAILED: %s", connector, e)
|
||||
return []
|
||||
|
||||
# --- Optimization 3: call _combined_rrf_search directly with shared embedding ---
|
||||
try:
|
||||
t_conn = time.perf_counter()
|
||||
async with semaphore, shielded_async_session() as isolated_session:
|
||||
|
|
@ -967,7 +907,9 @@ Focus searches on these types for best results."""
|
|||
# This is what the LLM sees when deciding whether/how to use the tool
|
||||
dynamic_description = f"""Search the user's personal knowledge base for relevant information.
|
||||
|
||||
Use this tool to find documents, notes, files, web pages, and other content that may help answer the user's question.
|
||||
Use this tool to find documents, notes, files, web pages, and other content the user has indexed.
|
||||
This searches ONLY local/indexed data (uploaded files, Notion, Slack, browser extension captures, etc.).
|
||||
For real-time web search (current events, news, live data), use the `web_search` tool instead.
|
||||
|
||||
IMPORTANT:
|
||||
- Always craft specific, descriptive search queries using natural language keywords.
|
||||
|
|
@ -977,9 +919,6 @@ IMPORTANT:
|
|||
- If the user requests a specific source type (e.g. "my notes", "Slack messages"), pass `connectors_to_search=[...]` using the enums below.
|
||||
- If `connectors_to_search` is omitted/empty, the system will search broadly.
|
||||
- Only connectors that are enabled/configured for this search space are available.{doc_types_info}
|
||||
- For real-time/public web queries (e.g., current exchange rates, stock prices, breaking news, weather),
|
||||
explicitly include live web connectors in `connectors_to_search`, prioritizing:
|
||||
["LINKUP_API", "TAVILY_API", "SEARXNG_API", "BAIDU_SEARCH_API"].
|
||||
|
||||
## Available connector enums for `connectors_to_search`
|
||||
|
||||
|
|
|
|||
|
|
@ -4,60 +4,15 @@ Podcast generation tool for the SurfSense agent.
|
|||
This module provides a factory function for creating the generate_podcast tool
|
||||
that submits a Celery task for background podcast generation. The frontend
|
||||
polls for completion and auto-updates when the podcast is ready.
|
||||
|
||||
Duplicate request prevention:
|
||||
- Only one podcast can be generated at a time per search space
|
||||
- Uses Redis to track active podcast tasks
|
||||
- Returns a friendly message if a podcast is already being generated
|
||||
"""
|
||||
|
||||
from typing import Any
|
||||
|
||||
import redis
|
||||
from langchain_core.tools import tool
|
||||
from sqlalchemy.ext.asyncio import AsyncSession
|
||||
|
||||
from app.config import config
|
||||
from app.db import Podcast, PodcastStatus
|
||||
|
||||
# Redis connection for tracking active podcast tasks
|
||||
# Defaults to the Celery broker when REDIS_APP_URL is not set
|
||||
REDIS_URL = config.REDIS_APP_URL
|
||||
_redis_client: redis.Redis | None = None
|
||||
|
||||
|
||||
def get_redis_client() -> redis.Redis:
|
||||
"""Get or create Redis client for podcast task tracking."""
|
||||
global _redis_client
|
||||
if _redis_client is None:
|
||||
_redis_client = redis.from_url(REDIS_URL, decode_responses=True)
|
||||
return _redis_client
|
||||
|
||||
|
||||
def _redis_key(search_space_id: int) -> str:
|
||||
return f"podcast:generating:{search_space_id}"
|
||||
|
||||
|
||||
def get_generating_podcast_id(search_space_id: int) -> int | None:
|
||||
"""Get the podcast ID currently being generated for this search space."""
|
||||
try:
|
||||
client = get_redis_client()
|
||||
value = client.get(_redis_key(search_space_id))
|
||||
return int(value) if value else None
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
|
||||
def set_generating_podcast(search_space_id: int, podcast_id: int) -> None:
|
||||
"""Mark a podcast as currently generating for this search space."""
|
||||
try:
|
||||
client = get_redis_client()
|
||||
client.setex(_redis_key(search_space_id), 1800, str(podcast_id))
|
||||
except Exception as e:
|
||||
print(
|
||||
f"[generate_podcast] Warning: Could not set generating podcast in Redis: {e}"
|
||||
)
|
||||
|
||||
|
||||
def create_generate_podcast_tool(
|
||||
search_space_id: int,
|
||||
|
|
@ -109,18 +64,6 @@ def create_generate_podcast_tool(
|
|||
- message: Status message (or "error" field if status is failed)
|
||||
"""
|
||||
try:
|
||||
generating_podcast_id = get_generating_podcast_id(search_space_id)
|
||||
if generating_podcast_id:
|
||||
print(
|
||||
f"[generate_podcast] Blocked duplicate request. Generating podcast: {generating_podcast_id}"
|
||||
)
|
||||
return {
|
||||
"status": PodcastStatus.GENERATING.value,
|
||||
"podcast_id": generating_podcast_id,
|
||||
"title": podcast_title,
|
||||
"message": "A podcast is already being generated. Please wait for it to complete.",
|
||||
}
|
||||
|
||||
podcast = Podcast(
|
||||
title=podcast_title,
|
||||
status=PodcastStatus.PENDING,
|
||||
|
|
@ -142,8 +85,6 @@ def create_generate_podcast_tool(
|
|||
user_prompt=user_prompt,
|
||||
)
|
||||
|
||||
set_generating_podcast(search_space_id, podcast.id)
|
||||
|
||||
print(f"[generate_podcast] Created podcast {podcast.id}, task: {task.id}")
|
||||
|
||||
return {
|
||||
|
|
|
|||
|
|
@ -73,6 +73,8 @@ from .shared_memory import (
|
|||
create_save_shared_memory_tool,
|
||||
)
|
||||
from .user_memory import create_recall_memory_tool, create_save_memory_tool
|
||||
from .video_presentation import create_generate_video_presentation_tool
|
||||
from .web_search import create_web_search_tool
|
||||
|
||||
# =============================================================================
|
||||
# Tool Definition
|
||||
|
|
@ -135,6 +137,17 @@ BUILTIN_TOOLS: list[ToolDefinition] = [
|
|||
),
|
||||
requires=["search_space_id", "db_session", "thread_id"],
|
||||
),
|
||||
# Video presentation generation tool
|
||||
ToolDefinition(
|
||||
name="generate_video_presentation",
|
||||
description="Generate a video presentation with slides and narration from provided content",
|
||||
factory=lambda deps: create_generate_video_presentation_tool(
|
||||
search_space_id=deps["search_space_id"],
|
||||
db_session=deps["db_session"],
|
||||
thread_id=deps["thread_id"],
|
||||
),
|
||||
requires=["search_space_id", "db_session", "thread_id"],
|
||||
),
|
||||
# Report generation tool (inline, short-lived sessions for DB ops)
|
||||
# Supports internal KB search via source_strategy so the agent doesn't
|
||||
# need to call search_knowledge_base separately before generating.
|
||||
|
|
@ -186,7 +199,16 @@ BUILTIN_TOOLS: list[ToolDefinition] = [
|
|||
),
|
||||
requires=[], # firecrawl_api_key is optional
|
||||
),
|
||||
# Note: write_todos is now provided by TodoListMiddleware from deepagents
|
||||
# Web search tool — real-time web search via SearXNG + user-configured engines
|
||||
ToolDefinition(
|
||||
name="web_search",
|
||||
description="Search the web for real-time information using configured search engines",
|
||||
factory=lambda deps: create_web_search_tool(
|
||||
search_space_id=deps.get("search_space_id"),
|
||||
available_connectors=deps.get("available_connectors"),
|
||||
),
|
||||
requires=[],
|
||||
),
|
||||
# Surfsense documentation search tool
|
||||
ToolDefinition(
|
||||
name="search_surfsense_docs",
|
||||
|
|
|
|||
|
|
@ -0,0 +1,87 @@
|
|||
"""
|
||||
Video presentation generation tool for the SurfSense agent.
|
||||
|
||||
This module provides a factory function for creating the generate_video_presentation
|
||||
tool that submits a Celery task for background video presentation generation.
|
||||
The frontend polls for completion and auto-updates when the presentation is ready.
|
||||
"""
|
||||
|
||||
from typing import Any
|
||||
|
||||
from langchain_core.tools import tool
|
||||
from sqlalchemy.ext.asyncio import AsyncSession
|
||||
|
||||
from app.db import VideoPresentation, VideoPresentationStatus
|
||||
|
||||
|
||||
def create_generate_video_presentation_tool(
|
||||
search_space_id: int,
|
||||
db_session: AsyncSession,
|
||||
thread_id: int | None = None,
|
||||
):
|
||||
"""
|
||||
Factory function to create the generate_video_presentation tool with injected dependencies.
|
||||
|
||||
Pre-creates video presentation record with pending status so the ID is available
|
||||
immediately for frontend polling.
|
||||
"""
|
||||
|
||||
@tool
|
||||
async def generate_video_presentation(
|
||||
source_content: str,
|
||||
video_title: str = "SurfSense Presentation",
|
||||
user_prompt: str | None = None,
|
||||
) -> dict[str, Any]:
|
||||
"""Generate a video presentation from the provided content.
|
||||
|
||||
Use this tool when the user asks to create a video, presentation, slides, or slide deck.
|
||||
|
||||
Args:
|
||||
source_content: The text content to turn into a presentation.
|
||||
video_title: Title for the presentation (default: "SurfSense Presentation")
|
||||
user_prompt: Optional style/tone instructions.
|
||||
"""
|
||||
try:
|
||||
video_pres = VideoPresentation(
|
||||
title=video_title,
|
||||
status=VideoPresentationStatus.PENDING,
|
||||
search_space_id=search_space_id,
|
||||
thread_id=thread_id,
|
||||
)
|
||||
db_session.add(video_pres)
|
||||
await db_session.commit()
|
||||
await db_session.refresh(video_pres)
|
||||
|
||||
from app.tasks.celery_tasks.video_presentation_tasks import (
|
||||
generate_video_presentation_task,
|
||||
)
|
||||
|
||||
task = generate_video_presentation_task.delay(
|
||||
video_presentation_id=video_pres.id,
|
||||
source_content=source_content,
|
||||
search_space_id=search_space_id,
|
||||
user_prompt=user_prompt,
|
||||
)
|
||||
|
||||
print(
|
||||
f"[generate_video_presentation] Created video presentation {video_pres.id}, task: {task.id}"
|
||||
)
|
||||
|
||||
return {
|
||||
"status": VideoPresentationStatus.PENDING.value,
|
||||
"video_presentation_id": video_pres.id,
|
||||
"title": video_title,
|
||||
"message": "Video presentation generation started. This may take a few minutes.",
|
||||
}
|
||||
|
||||
except Exception as e:
|
||||
error_message = str(e)
|
||||
print(f"[generate_video_presentation] Error: {error_message}")
|
||||
return {
|
||||
"status": VideoPresentationStatus.FAILED.value,
|
||||
"error": error_message,
|
||||
"title": video_title,
|
||||
"video_presentation_id": None,
|
||||
}
|
||||
|
||||
return generate_video_presentation
|
||||
247
surfsense_backend/app/agents/new_chat/tools/web_search.py
Normal file
|
|
@ -0,0 +1,247 @@
|
|||
"""
|
||||
Web search tool for the SurfSense agent.
|
||||
|
||||
Provides a unified tool for real-time web searches that dispatches to all
|
||||
configured search engines: the platform SearXNG instance (always available)
|
||||
plus any user-configured live-search connectors (Tavily, Linkup, Baidu).
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import time
|
||||
from typing import Any
|
||||
|
||||
from langchain_core.tools import StructuredTool
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
from app.db import shielded_async_session
|
||||
from app.services.connector_service import ConnectorService
|
||||
from app.utils.perf import get_perf_logger
|
||||
|
||||
_LIVE_SEARCH_CONNECTORS: set[str] = {
|
||||
"TAVILY_API",
|
||||
"LINKUP_API",
|
||||
"BAIDU_SEARCH_API",
|
||||
}
|
||||
|
||||
_LIVE_CONNECTOR_SPECS: dict[str, tuple[str, bool, bool, dict[str, Any]]] = {
|
||||
"TAVILY_API": ("search_tavily", False, True, {}),
|
||||
"LINKUP_API": ("search_linkup", False, False, {"mode": "standard"}),
|
||||
"BAIDU_SEARCH_API": ("search_baidu", False, True, {}),
|
||||
}
|
||||
|
||||
_CONNECTOR_LABELS: dict[str, str] = {
|
||||
"TAVILY_API": "Tavily",
|
||||
"LINKUP_API": "Linkup",
|
||||
"BAIDU_SEARCH_API": "Baidu",
|
||||
}
|
||||
|
||||
|
||||
class WebSearchInput(BaseModel):
|
||||
"""Input schema for the web_search tool."""
|
||||
|
||||
query: str = Field(
|
||||
description="The search query to look up on the web. Use specific, descriptive terms.",
|
||||
)
|
||||
top_k: int = Field(
|
||||
default=10,
|
||||
description="Number of results to retrieve (default: 10, max: 50).",
|
||||
)
|
||||
|
||||
|
||||
def _format_web_results(
|
||||
documents: list[dict[str, Any]],
|
||||
*,
|
||||
max_chars: int = 50_000,
|
||||
) -> str:
|
||||
"""Format web search results into XML suitable for the LLM context."""
|
||||
if not documents:
|
||||
return "No web search results found."
|
||||
|
||||
parts: list[str] = []
|
||||
total_chars = 0
|
||||
|
||||
for doc in documents:
|
||||
doc_info = doc.get("document") or {}
|
||||
metadata = doc_info.get("metadata") or {}
|
||||
title = doc_info.get("title") or "Web Result"
|
||||
url = metadata.get("url") or ""
|
||||
content = (doc.get("content") or "").strip()
|
||||
source = metadata.get("document_type") or doc.get("source") or "WEB_SEARCH"
|
||||
if not content:
|
||||
continue
|
||||
|
||||
metadata_json = json.dumps(metadata, ensure_ascii=False)
|
||||
doc_xml = "\n".join(
|
||||
[
|
||||
"<document>",
|
||||
"<document_metadata>",
|
||||
f" <document_type>{source}</document_type>",
|
||||
f" <title><![CDATA[{title}]]></title>",
|
||||
f" <url><![CDATA[{url}]]></url>",
|
||||
f" <metadata_json><![CDATA[{metadata_json}]]></metadata_json>",
|
||||
"</document_metadata>",
|
||||
"<document_content>",
|
||||
f" <chunk id='{url}'><![CDATA[{content}]]></chunk>",
|
||||
"</document_content>",
|
||||
"</document>",
|
||||
"",
|
||||
]
|
||||
)
|
||||
|
||||
if total_chars + len(doc_xml) > max_chars:
|
||||
parts.append("<!-- Output truncated to fit context window -->")
|
||||
break
|
||||
|
||||
parts.append(doc_xml)
|
||||
total_chars += len(doc_xml)
|
||||
|
||||
return "\n".join(parts).strip() or "No web search results found."
|
||||
|
||||
|
||||
async def _search_live_connector(
|
||||
connector: str,
|
||||
query: str,
|
||||
search_space_id: int,
|
||||
top_k: int,
|
||||
semaphore: asyncio.Semaphore,
|
||||
) -> list[dict[str, Any]]:
|
||||
"""Dispatch a single live-search connector (Tavily / Linkup / Baidu)."""
|
||||
perf = get_perf_logger()
|
||||
spec = _LIVE_CONNECTOR_SPECS.get(connector)
|
||||
if spec is None:
|
||||
return []
|
||||
|
||||
method_name, _includes_date_range, includes_top_k, extra_kwargs = spec
|
||||
kwargs: dict[str, Any] = {
|
||||
"user_query": query,
|
||||
"search_space_id": search_space_id,
|
||||
**extra_kwargs,
|
||||
}
|
||||
if includes_top_k:
|
||||
kwargs["top_k"] = top_k
|
||||
|
||||
try:
|
||||
t0 = time.perf_counter()
|
||||
async with semaphore, shielded_async_session() as session:
|
||||
svc = ConnectorService(session, search_space_id)
|
||||
_, chunks = await getattr(svc, method_name)(**kwargs)
|
||||
perf.info(
|
||||
"[web_search] connector=%s results=%d in %.3fs",
|
||||
connector,
|
||||
len(chunks),
|
||||
time.perf_counter() - t0,
|
||||
)
|
||||
return chunks
|
||||
except Exception as e:
|
||||
perf.warning("[web_search] connector=%s FAILED: %s", connector, e)
|
||||
return []
|
||||
|
||||
|
||||
def create_web_search_tool(
|
||||
search_space_id: int | None = None,
|
||||
available_connectors: list[str] | None = None,
|
||||
) -> StructuredTool:
|
||||
"""Factory for the ``web_search`` tool.
|
||||
|
||||
Dispatches in parallel to the platform SearXNG instance and any
|
||||
user-configured live-search connectors (Tavily, Linkup, Baidu).
|
||||
"""
|
||||
active_live_connectors: list[str] = []
|
||||
if available_connectors:
|
||||
active_live_connectors = [
|
||||
c for c in available_connectors if c in _LIVE_SEARCH_CONNECTORS
|
||||
]
|
||||
|
||||
engine_names = ["SearXNG (platform default)"]
|
||||
engine_names.extend(_CONNECTOR_LABELS.get(c, c) for c in active_live_connectors)
|
||||
engines_summary = ", ".join(engine_names)
|
||||
|
||||
description = (
|
||||
"Search the web for real-time information. "
|
||||
"Use this for current events, news, prices, weather, public facts, or any "
|
||||
"question that requires up-to-date information from the internet.\n\n"
|
||||
f"Active search engines: {engines_summary}.\n"
|
||||
"All configured engines are queried in parallel and results are merged."
|
||||
)
|
||||
|
||||
_search_space_id = search_space_id
|
||||
_active_live = active_live_connectors
|
||||
|
||||
async def _web_search_impl(query: str, top_k: int = 10) -> str:
|
||||
from app.services import web_search_service
|
||||
|
||||
perf = get_perf_logger()
|
||||
t0 = time.perf_counter()
|
||||
clamped_top_k = min(max(1, top_k), 50)
|
||||
|
||||
semaphore = asyncio.Semaphore(4)
|
||||
tasks: list[asyncio.Task[list[dict[str, Any]]]] = []
|
||||
|
||||
if web_search_service.is_available():
|
||||
|
||||
async def _searxng() -> list[dict[str, Any]]:
|
||||
async with semaphore:
|
||||
_result_obj, docs = await web_search_service.search(
|
||||
query=query,
|
||||
top_k=clamped_top_k,
|
||||
)
|
||||
return docs
|
||||
|
||||
tasks.append(asyncio.ensure_future(_searxng()))
|
||||
|
||||
if _search_space_id is not None:
|
||||
for connector in _active_live:
|
||||
tasks.append(
|
||||
asyncio.ensure_future(
|
||||
_search_live_connector(
|
||||
connector=connector,
|
||||
query=query,
|
||||
search_space_id=_search_space_id,
|
||||
top_k=clamped_top_k,
|
||||
semaphore=semaphore,
|
||||
)
|
||||
)
|
||||
)
|
||||
|
||||
if not tasks:
|
||||
return "Web search is not available — no search engines are configured."
|
||||
|
||||
results_lists = await asyncio.gather(*tasks, return_exceptions=True)
|
||||
|
||||
all_documents: list[dict[str, Any]] = []
|
||||
for result in results_lists:
|
||||
if isinstance(result, BaseException):
|
||||
perf.warning("[web_search] a search engine failed: %s", result)
|
||||
continue
|
||||
all_documents.extend(result)
|
||||
|
||||
seen_urls: set[str] = set()
|
||||
deduplicated: list[dict[str, Any]] = []
|
||||
for doc in all_documents:
|
||||
url = ((doc.get("document") or {}).get("metadata") or {}).get("url", "")
|
||||
if url and url in seen_urls:
|
||||
continue
|
||||
if url:
|
||||
seen_urls.add(url)
|
||||
deduplicated.append(doc)
|
||||
|
||||
formatted = _format_web_results(deduplicated)
|
||||
|
||||
perf.info(
|
||||
"[web_search] query=%r engines=%d results=%d deduped=%d chars=%d in %.3fs",
|
||||
query[:60],
|
||||
len(tasks),
|
||||
len(all_documents),
|
||||
len(deduplicated),
|
||||
len(formatted),
|
||||
time.perf_counter() - t0,
|
||||
)
|
||||
return formatted
|
||||
|
||||
return StructuredTool(
|
||||
name="web_search",
|
||||
description=description,
|
||||
coroutine=_web_search_impl,
|
||||
args_schema=WebSearchInput,
|
||||
)
|
||||
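The URL-based de-duplication inside `_web_search_impl` above can be read in isolation. The following is a minimal sketch, not part of the commit. The function name `dedupe_by_url` is hypothetical, and the nested `document -> metadata -> url` shape is assumed from the code above.

```python
from typing import Any


def dedupe_by_url(documents: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep the first result seen for each URL, preserving input order.

    Results with no URL are always kept, since there is nothing to compare.
    """
    seen_urls: set[str] = set()
    deduplicated: list[dict[str, Any]] = []
    for doc in documents:
        # Tolerate missing or None "document" / "metadata" entries.
        url = ((doc.get("document") or {}).get("metadata") or {}).get("url", "")
        if url and url in seen_urls:
            continue
        if url:
            seen_urls.add(url)
        deduplicated.append(doc)
    return deduplicated
```

Because the first occurrence wins, the order in which engine results are merged decides which engine's copy of a page survives.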
10
surfsense_backend/app/agents/video_presentation/__init__.py
Normal file
|
|
@ -0,0 +1,10 @@
|
|||
"""Video Presentation LangGraph Agent.
|
||||
|
||||
This module defines a graph for generating video presentations
|
||||
from source content, similar to the podcaster agent but producing
|
||||
slide-based video presentations with TTS narration.
|
||||
"""
|
||||
|
||||
from .graph import graph
|
||||
|
||||
__all__ = ["graph"]
|
||||
|
|
@ -0,0 +1,25 @@
|
|||
"""Define the configurable parameters for the video presentation agent."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass, fields
|
||||
|
||||
from langchain_core.runnables import RunnableConfig
|
||||
|
||||
|
||||
@dataclass(kw_only=True)
|
||||
class Configuration:
|
||||
"""The configuration for the video presentation agent."""
|
||||
|
||||
video_title: str
|
||||
search_space_id: int
|
||||
user_prompt: str | None = None
|
||||
|
||||
@classmethod
|
||||
def from_runnable_config(
|
||||
cls, config: RunnableConfig | None = None
|
||||
) -> Configuration:
|
||||
"""Create a Configuration instance from a RunnableConfig object."""
|
||||
configurable = (config.get("configurable") or {}) if config else {}
|
||||
_fields = {f.name for f in fields(cls) if f.init}
|
||||
return cls(**{k: v for k, v in configurable.items() if k in _fields})
|
||||
39
surfsense_backend/app/agents/video_presentation/graph.py
Normal file
|
|
@ -0,0 +1,39 @@
|
|||
from langgraph.graph import StateGraph
|
||||
|
||||
from .configuration import Configuration
|
||||
from .nodes import (
|
||||
assign_slide_themes,
|
||||
create_presentation_slides,
|
||||
create_slide_audio,
|
||||
generate_slide_scene_codes,
|
||||
)
|
||||
from .state import State
|
||||
|
||||
|
||||
def build_graph():
|
||||
workflow = StateGraph(State, config_schema=Configuration)
|
||||
|
||||
workflow.add_node("create_presentation_slides", create_presentation_slides)
|
||||
workflow.add_node("create_slide_audio", create_slide_audio)
|
||||
workflow.add_node("assign_slide_themes", assign_slide_themes)
|
||||
workflow.add_node("generate_slide_scene_codes", generate_slide_scene_codes)
|
||||
|
||||
# Fan-out: after slides are parsed, run audio generation and theme
|
||||
# assignment in parallel (themes only need slide metadata, not audio).
|
||||
workflow.add_edge("__start__", "create_presentation_slides")
|
||||
workflow.add_edge("create_presentation_slides", "create_slide_audio")
|
||||
workflow.add_edge("create_presentation_slides", "assign_slide_themes")
|
||||
|
||||
# Fan-in: scene code generation waits for both audio and themes.
|
||||
workflow.add_edge("create_slide_audio", "generate_slide_scene_codes")
|
||||
workflow.add_edge("assign_slide_themes", "generate_slide_scene_codes")
|
||||
|
||||
workflow.add_edge("generate_slide_scene_codes", "__end__")
|
||||
|
||||
graph = workflow.compile()
|
||||
graph.name = "Surfsense Video Presentation"
|
||||
|
||||
return graph
|
||||
|
||||
|
||||
graph = build_graph()
|
||||
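The graph above fans out from slide creation to two independent nodes and fans back in before scene code generation. The following sketch shows the same shape with plain `asyncio` instead of LangGraph. It is an illustration only: every function here is a hypothetical stand-in, and the real nodes call an LLM and a TTS service.

```python
import asyncio


async def make_slides(source: str) -> list[str]:
    # Stand-in for create_presentation_slides: one slide per sentence.
    return [s.strip() for s in source.split(".") if s.strip()]


async def make_audio(slides: list[str]) -> list[str]:
    # Stand-in for create_slide_audio.
    return [f"audio-{i}" for i, _ in enumerate(slides)]


async def make_themes(slides: list[str]) -> list[str]:
    # Stand-in for assign_slide_themes; needs only slide data, not audio.
    return ["light" if i % 2 == 0 else "dark" for i, _ in enumerate(slides)]


async def run_pipeline(source: str) -> list[tuple[str, str, str]]:
    slides = await make_slides(source)
    # Fan-out: audio and themes depend only on the slides, so run them together.
    audio, themes = await asyncio.gather(make_audio(slides), make_themes(slides))
    # Fan-in: the final step waits for both results.
    return list(zip(slides, audio, themes))
```

The final step starts only after both branches finish, which is the ordering the two edges into `generate_slide_scene_codes` express.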
580
surfsense_backend/app/agents/video_presentation/nodes.py
Normal file
|
|
@ -0,0 +1,580 @@
|
|||
import asyncio
|
||||
import contextlib
|
||||
import json
|
||||
import math
|
||||
import os
|
||||
import shutil
|
||||
import uuid
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from ffmpeg.asyncio import FFmpeg
|
||||
from langchain_core.messages import HumanMessage, SystemMessage
|
||||
from langchain_core.runnables import RunnableConfig
|
||||
from litellm import aspeech
|
||||
|
||||
from app.config import config as app_config
|
||||
from app.services.kokoro_tts_service import get_kokoro_tts_service
|
||||
from app.services.llm_service import get_agent_llm
|
||||
|
||||
from .configuration import Configuration
|
||||
from .prompts import (
|
||||
DEFAULT_DURATION_IN_FRAMES,
|
||||
FPS,
|
||||
REFINE_SCENE_SYSTEM_PROMPT,
|
||||
REMOTION_SCENE_SYSTEM_PROMPT,
|
||||
THEME_PRESETS,
|
||||
build_scene_generation_user_prompt,
|
||||
build_theme_assignment_user_prompt,
|
||||
get_slide_generation_prompt,
|
||||
get_theme_assignment_system_prompt,
|
||||
pick_theme_and_mode_fallback,
|
||||
)
|
||||
from .state import (
|
||||
PresentationSlides,
|
||||
SlideAudioResult,
|
||||
SlideContent,
|
||||
SlideSceneCode,
|
||||
State,
|
||||
)
|
||||
from .utils import get_voice_for_provider
|
||||
|
||||
MAX_REFINE_ATTEMPTS = 3
|
||||
|
||||
|
||||
async def create_presentation_slides(
|
||||
state: State, config: RunnableConfig
|
||||
) -> dict[str, Any]:
|
||||
"""Parse source content into structured presentation slides using LLM."""
|
||||
|
||||
configuration = Configuration.from_runnable_config(config)
|
||||
search_space_id = configuration.search_space_id
|
||||
user_prompt = configuration.user_prompt
|
||||
|
||||
llm = await get_agent_llm(state.db_session, search_space_id)
|
||||
if not llm:
|
||||
error_message = f"No LLM configured for search space {search_space_id}"
|
||||
print(error_message)
|
||||
raise RuntimeError(error_message)
|
||||
|
||||
prompt = get_slide_generation_prompt(user_prompt)
|
||||
|
||||
messages = [
|
||||
SystemMessage(content=prompt),
|
||||
HumanMessage(
|
||||
content=f"<source_content>{state.source_content}</source_content>"
|
||||
),
|
||||
]
|
||||
|
||||
llm_response = await llm.ainvoke(messages)
|
||||
|
||||
try:
|
||||
presentation = PresentationSlides.model_validate(
|
||||
json.loads(llm_response.content)
|
||||
)
|
||||
except (json.JSONDecodeError, ValueError) as e:
|
||||
print(f"Direct JSON parsing failed, trying fallback approach: {e!s}")
|
||||
|
||||
try:
|
||||
content = llm_response.content
|
||||
json_start = content.find("{")
|
||||
json_end = content.rfind("}") + 1
|
||||
if json_start >= 0 and json_end > json_start:
|
||||
json_str = content[json_start:json_end]
|
||||
parsed_data = json.loads(json_str)
|
||||
presentation = PresentationSlides.model_validate(parsed_data)
|
||||
print("Successfully parsed presentation slides using fallback approach")
|
||||
else:
|
||||
error_message = f"Could not find valid JSON in LLM response. Raw response: {content}"
|
||||
print(error_message)
|
||||
raise ValueError(error_message)
|
||||
|
||||
except (json.JSONDecodeError, ValueError) as e2:
|
||||
error_message = f"Error parsing LLM response (fallback also failed): {e2!s}"
|
||||
print(error_message)
|
||||
print(f"Raw response: {llm_response.content}")
|
||||
raise
|
||||
|
||||
return {"slides": presentation.slides}
|
||||
|
||||
|
||||
async def create_slide_audio(state: State, config: RunnableConfig) -> dict[str, Any]:
|
||||
"""Generate TTS audio for each slide.
|
||||
|
||||
Each slide's speaker_transcripts are generated as individual TTS chunks,
|
||||
then concatenated with ffmpeg (matching the POC in RemotionTets/api/tts).
|
||||
"""
|
||||
|
||||
session_id = str(uuid.uuid4())
|
||||
temp_dir = Path("temp_audio")
|
||||
temp_dir.mkdir(exist_ok=True)
|
||||
output_dir = Path("video_presentation_audio")
|
||||
output_dir.mkdir(exist_ok=True)
|
||||
|
||||
slides = state.slides or []
|
||||
voice = get_voice_for_provider(app_config.TTS_SERVICE, speaker_id=0)
|
||||
ext = "wav" if app_config.TTS_SERVICE == "local/kokoro" else "mp3"
|
||||
|
||||
async def _generate_tts_chunk(text: str, chunk_path: str) -> str:
|
||||
"""Generate a single TTS chunk and write it to *chunk_path*."""
|
||||
if app_config.TTS_SERVICE == "local/kokoro":
|
||||
kokoro_service = await get_kokoro_tts_service(lang_code="a")
|
||||
await kokoro_service.generate_speech(
|
||||
text=text,
|
||||
voice=voice,
|
||||
speed=1.0,
|
||||
output_path=chunk_path,
|
||||
)
|
||||
else:
|
||||
kwargs: dict[str, Any] = {
|
||||
"model": app_config.TTS_SERVICE,
|
||||
"api_key": app_config.TTS_SERVICE_API_KEY,
|
||||
"voice": voice,
|
||||
"input": text,
|
||||
"max_retries": 2,
|
||||
"timeout": 600,
|
||||
}
|
||||
if app_config.TTS_SERVICE_API_BASE:
|
||||
kwargs["api_base"] = app_config.TTS_SERVICE_API_BASE
|
||||
|
||||
response = await aspeech(**kwargs)
|
||||
with open(chunk_path, "wb") as f:
|
||||
f.write(response.content)
|
||||
|
||||
return chunk_path
|
||||
|
||||
async def _concat_with_ffmpeg(chunk_paths: list[str], output_file: str) -> None:
|
||||
"""Concatenate multiple audio chunks into one file using async ffmpeg."""
|
||||
ffmpeg = FFmpeg().option("y")
|
||||
for chunk in chunk_paths:
|
||||
ffmpeg = ffmpeg.input(chunk)
|
||||
|
||||
filter_parts = [f"[{i}:0]" for i in range(len(chunk_paths))]
|
||||
filter_str = (
|
||||
"".join(filter_parts) + f"concat=n={len(chunk_paths)}:v=0:a=1[outa]"
|
||||
)
|
||||
ffmpeg = ffmpeg.option("filter_complex", filter_str)
|
||||
ffmpeg = ffmpeg.output(output_file, map="[outa]")
|
||||
await ffmpeg.execute()
|
||||
|
||||
async def generate_audio_for_slide(slide: SlideContent) -> SlideAudioResult:
|
||||
has_transcripts = (
|
||||
slide.speaker_transcripts and len(slide.speaker_transcripts) > 0
|
||||
)
|
||||
|
||||
if not has_transcripts:
|
||||
print(
|
||||
f"Slide {slide.slide_number}: no speaker_transcripts, "
|
||||
f"using default duration ({DEFAULT_DURATION_IN_FRAMES} frames)"
|
||||
)
|
||||
return SlideAudioResult(
|
||||
slide_number=slide.slide_number,
|
||||
audio_file="",
|
||||
duration_seconds=DEFAULT_DURATION_IN_FRAMES / FPS,
|
||||
duration_in_frames=DEFAULT_DURATION_IN_FRAMES,
|
||||
)
|
||||
|
||||
output_file = str(output_dir / f"{session_id}_slide_{slide.slide_number}.{ext}")
|
||||
|
||||
chunk_paths: list[str] = []
|
||||
try:
|
||||
chunk_paths = [
|
||||
str(
|
||||
temp_dir
|
||||
/ f"{session_id}_slide_{slide.slide_number}_chunk_{i}.{ext}"
|
||||
)
|
||||
for i in range(len(slide.speaker_transcripts))
|
||||
]
|
||||
|
||||
for i, text in enumerate(slide.speaker_transcripts):
|
||||
print(
|
||||
f" Slide {slide.slide_number} chunk {i + 1}/"
|
||||
f"{len(slide.speaker_transcripts)}: "
|
||||
f'"{text[:60]}..."'
|
||||
)
|
||||
|
||||
await asyncio.gather(
|
||||
*[
|
||||
_generate_tts_chunk(text, path)
|
||||
for text, path in zip(
|
||||
slide.speaker_transcripts, chunk_paths, strict=False
|
||||
)
|
||||
]
|
||||
)
|
||||
|
||||
if len(chunk_paths) == 1:
|
||||
shutil.move(chunk_paths[0], output_file)
|
||||
else:
|
||||
print(
|
||||
f" Concatenating {len(chunk_paths)} chunks for slide "
|
||||
f"{slide.slide_number} with ffmpeg"
|
||||
)
|
||||
await _concat_with_ffmpeg(chunk_paths, output_file)
|
||||
|
||||
duration_seconds = await _get_audio_duration(output_file)
|
||||
duration_in_frames = math.ceil(duration_seconds * FPS)
|
||||
|
||||
return SlideAudioResult(
|
||||
slide_number=slide.slide_number,
|
||||
audio_file=output_file,
|
||||
duration_seconds=duration_seconds,
|
||||
duration_in_frames=max(duration_in_frames, DEFAULT_DURATION_IN_FRAMES),
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
print(f"Error generating audio for slide {slide.slide_number}: {e!s}")
|
||||
raise
|
||||
finally:
|
||||
for p in chunk_paths:
|
||||
with contextlib.suppress(OSError):
|
||||
os.remove(p)
|
||||
|
||||
tasks = [generate_audio_for_slide(slide) for slide in slides]
|
||||
audio_results = await asyncio.gather(*tasks)
|
||||
|
||||
audio_results_sorted = sorted(audio_results, key=lambda r: r.slide_number)
|
||||
|
||||
print(
|
||||
f"Generated audio for {len(audio_results_sorted)} slides "
|
||||
f"(total duration: {sum(r.duration_seconds for r in audio_results_sorted):.1f}s)"
|
||||
)
|
||||
|
||||
return {"slide_audio_results": audio_results_sorted}
|
||||
|
||||
|
||||
async def _get_audio_duration(file_path: str) -> float:
|
||||
"""Get audio duration in seconds by running ffprobe as a subprocess.
|
||||
|
||||
Falls back to file-size estimation if ffprobe fails.
|
||||
"""
|
||||
try:
|
||||
import subprocess
|
||||
|
||||
proc = await asyncio.create_subprocess_exec(
|
||||
"ffprobe",
|
||||
"-v",
|
||||
"error",
|
||||
"-show_entries",
|
||||
"format=duration",
|
||||
"-of",
|
||||
"default=noprint_wrappers=1:nokey=1",
|
||||
file_path,
|
||||
stdout=subprocess.PIPE,
|
||||
stderr=subprocess.PIPE,
|
||||
)
|
||||
stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=10)
|
||||
if proc.returncode == 0 and stdout.strip():
|
||||
return float(stdout.strip())
|
||||
except Exception as e:
|
||||
print(f"ffprobe failed for {file_path}: {e!s}, using file-size estimation")
|
||||
|
||||
try:
|
||||
file_size = os.path.getsize(file_path)
|
||||
if file_path.endswith(".wav"):
|
||||
return file_size / (16000 * 2)
|
||||
else:
|
||||
return file_size / 16000
|
||||
except Exception:
|
||||
return DEFAULT_DURATION_IN_FRAMES / FPS
|
||||
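As a worked example of the duration arithmetic in `create_slide_audio` and `_get_audio_duration`, the sketch below assumes the constants from `prompts.py` (`FPS = 30`, `DEFAULT_DURATION_IN_FRAMES = 300`). The helper names `frames_for_duration` and `estimate_seconds_from_size` are hypothetical and do not exist in the module.

```python
import math

FPS = 30  # assumed to match prompts.py
DEFAULT_DURATION_IN_FRAMES = 300  # assumed to match prompts.py


def frames_for_duration(duration_seconds: float) -> int:
    """Hypothetical helper: round up to whole frames, then apply the minimum."""
    frames = math.ceil(duration_seconds * FPS)
    return max(frames, DEFAULT_DURATION_IN_FRAMES)


def estimate_seconds_from_size(file_size_bytes: int, is_wav: bool) -> float:
    """Hypothetical helper mirroring the file-size fallback.

    32000 bytes/s would correspond to 16 kHz, 16-bit mono PCM; 16000 bytes/s
    would correspond to a 128 kbit/s compressed stream. Both are rough guesses.
    """
    bytes_per_second = 16000 * 2 if is_wav else 16000
    return file_size_bytes / bytes_per_second


short_clip = frames_for_duration(4.0)    # 120 frames, raised to the 300-frame minimum
long_clip = frames_for_duration(12.34)   # 370.2 frames, rounded up to 371
```

Under these assumptions, any slide whose narration is shorter than 10 seconds is still shown for the full 300 frames.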
|
||||
|
||||
async def _assign_themes_with_llm(
|
||||
llm, slides: list[SlideContent]
|
||||
) -> dict[int, tuple[str, str]]:
|
||||
"""Ask the LLM to assign a theme+mode to each slide in one call.
|
||||
|
||||
Returns a dict mapping slide_number → (theme, mode).
|
||||
Falls back to round-robin if the LLM response can't be parsed.
|
||||
"""
|
||||
total = len(slides)
|
||||
slide_summaries = [
|
||||
{
|
||||
"slide_number": s.slide_number,
|
||||
"title": s.title,
|
||||
"subtitle": s.subtitle or "",
|
||||
"background_explanation": s.background_explanation or "",
|
||||
}
|
||||
for s in slides
|
||||
]
|
||||
|
||||
system = get_theme_assignment_system_prompt()
|
||||
user = build_theme_assignment_user_prompt(slide_summaries)
|
||||
|
||||
try:
|
||||
response = await llm.ainvoke(
|
||||
[
|
||||
SystemMessage(content=system),
|
||||
HumanMessage(content=user),
|
||||
]
|
||||
)
|
||||
|
||||
text = response.content.strip()
|
||||
if text.startswith("```"):
|
||||
lines = text.split("\n")
|
||||
text = "\n".join(
|
||||
line for line in lines if not line.strip().startswith("```")
|
||||
).strip()
|
||||
|
||||
assignments = json.loads(text)
|
||||
valid_themes = set(THEME_PRESETS)
|
||||
result: dict[int, tuple[str, str]] = {}
|
||||
for entry in assignments:
|
||||
sn = entry.get("slide_number")
|
||||
theme = entry.get("theme", "").upper()
|
||||
mode = entry.get("mode", "dark").lower()
|
||||
if sn and theme in valid_themes and mode in ("dark", "light"):
|
||||
result[sn] = (theme, mode)
|
||||
|
||||
if len(result) == total:
|
||||
print(
|
||||
"LLM theme assignment: "
|
||||
+ ", ".join(f"S{sn}={t}/{m}" for sn, (t, m) in sorted(result.items()))
|
||||
)
|
||||
return result
|
||||
|
||||
print(
|
||||
f"LLM returned {len(result)}/{total} valid assignments, "
|
||||
"filling gaps with fallback"
|
||||
)
|
||||
for s in slides:
|
||||
if s.slide_number not in result:
|
||||
result[s.slide_number] = pick_theme_and_mode_fallback(
|
||||
s.slide_number - 1, total
|
||||
)
|
||||
return result
|
||||
|
||||
except Exception as e:
|
||||
print(f"LLM theme assignment failed ({e!s}), using fallback")
|
||||
return {
|
||||
s.slide_number: pick_theme_and_mode_fallback(s.slide_number - 1, total)
|
||||
for s in slides
|
||||
}
|
||||
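To illustrate how the theme-assignment reply is filtered, here is a minimal standalone sketch of the same steps: strip markdown fences, parse the JSON array, and keep only valid entries. `parse_assignments`, `VALID_THEMES`, and the sample reply are illustrative assumptions, and the theme set is a shortened subset of `THEME_PRESETS`.

```python
import json

FENCE = "`" * 3  # markdown code-fence marker
VALID_THEMES = {"TERRA", "OCEAN"}  # illustrative subset of THEME_PRESETS


def parse_assignments(raw: str) -> dict[int, tuple[str, str]]:
    """Hypothetical helper: keep only entries with a known theme and mode."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop every fence line, keeping only the JSON payload.
        text = "\n".join(
            line for line in text.split("\n") if not line.strip().startswith(FENCE)
        ).strip()

    result: dict[int, tuple[str, str]] = {}
    for entry in json.loads(text):
        slide_number = entry.get("slide_number")
        theme = str(entry.get("theme", "")).upper()
        mode = str(entry.get("mode", "dark")).lower()
        if slide_number and theme in VALID_THEMES and mode in ("dark", "light"):
            result[slide_number] = (theme, mode)
    return result


reply = (
    FENCE + "json\n"
    '[{"slide_number": 1, "theme": "terra", "mode": "Dark"},'
    ' {"slide_number": 2, "theme": "PLASMA", "mode": "light"}]\n' + FENCE
)
parsed = parse_assignments(reply)
```

In this sample the second entry names an unknown theme, so only slide 1 survives; in the node above, such gaps are then filled by `pick_theme_and_mode_fallback`.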
|
||||
|
||||
async def assign_slide_themes(state: State, config: RunnableConfig) -> dict[str, Any]:
|
||||
"""Assign a theme preset + dark/light mode to every slide via a single LLM call.
|
||||
|
||||
Runs in parallel with audio generation since it only needs slide metadata.
|
||||
"""
|
||||
configuration = Configuration.from_runnable_config(config)
|
||||
search_space_id = configuration.search_space_id
|
||||
|
||||
llm = await get_agent_llm(state.db_session, search_space_id)
|
||||
if not llm:
|
||||
raise RuntimeError(f"No LLM configured for search space {search_space_id}")
|
||||
|
||||
slides = state.slides or []
|
||||
assignments = await _assign_themes_with_llm(llm, slides)
|
||||
return {"slide_theme_assignments": assignments}
|
||||
|
||||
|
||||
async def generate_slide_scene_codes(
|
||||
state: State, config: RunnableConfig
|
||||
) -> dict[str, Any]:
|
||||
"""Generate Remotion component code for each slide using LLM.
|
||||
|
||||
Reads pre-assigned themes from state (produced by the parallel
|
||||
assign_slide_themes node) and generates scene code concurrently.
|
||||
"""
|
||||
|
||||
configuration = Configuration.from_runnable_config(config)
|
||||
search_space_id = configuration.search_space_id
|
||||
|
||||
llm = await get_agent_llm(state.db_session, search_space_id)
|
||||
if not llm:
|
||||
raise RuntimeError(f"No LLM configured for search space {search_space_id}")
|
||||
|
||||
slides = state.slides or []
|
||||
audio_results = state.slide_audio_results or []
|
||||
|
||||
audio_map: dict[int, SlideAudioResult] = {r.slide_number: r for r in audio_results}
|
||||
total_slides = len(slides)
|
||||
|
||||
theme_assignments = state.slide_theme_assignments or {}
|
||||
|
||||
async def _generate_scene_for_slide(slide: SlideContent) -> SlideSceneCode:
|
||||
audio = audio_map.get(slide.slide_number)
|
||||
duration = audio.duration_in_frames if audio else DEFAULT_DURATION_IN_FRAMES
|
||||
|
||||
theme, mode = theme_assignments.get(
|
||||
slide.slide_number,
|
||||
pick_theme_and_mode_fallback(slide.slide_number - 1, total_slides),
|
||||
)
|
||||
|
||||
user_prompt = build_scene_generation_user_prompt(
|
||||
slide_number=slide.slide_number,
|
||||
total_slides=total_slides,
|
||||
title=slide.title,
|
||||
subtitle=slide.subtitle,
|
||||
content_in_markdown=slide.content_in_markdown,
|
||||
background_explanation=slide.background_explanation,
|
||||
duration_in_frames=duration,
|
||||
theme=theme,
|
||||
mode=mode,
|
||||
)
|
||||
|
||||
messages = [
|
||||
SystemMessage(content=REMOTION_SCENE_SYSTEM_PROMPT),
|
||||
HumanMessage(content=user_prompt),
|
||||
]
|
||||
|
||||
print(
|
||||
f"Generating scene code for slide {slide.slide_number}/{total_slides}: "
|
||||
f'"{slide.title}" ({duration} frames)'
|
||||
)
|
||||
|
||||
llm_response = await llm.ainvoke(messages)
|
||||
code, scene_title = _extract_code_and_title(llm_response.content)
|
||||
|
||||
code = await _refine_if_needed(llm, code, slide.slide_number)
|
||||
|
||||
print(f"Scene code ready for slide {slide.slide_number} ({len(code)} chars)")
|
||||
|
||||
return SlideSceneCode(
|
||||
slide_number=slide.slide_number,
|
||||
code=code,
|
||||
title=scene_title or slide.title,
|
||||
)
|
||||
|
||||
scene_codes = list(
|
||||
await asyncio.gather(*[_generate_scene_for_slide(s) for s in slides])
|
||||
)
|
||||
|
||||
return {"slide_scene_codes": scene_codes}
|
||||
|
||||
|
||||
def _extract_code_and_title(content: str) -> tuple[str, str | None]:
|
||||
"""Extract code and optional title from LLM response.
|
||||
|
||||
The LLM may return a JSON object like the POC's structured output:
|
||||
{ "code": "...", "title": "..." }
|
||||
Or it may return raw code (with optional markdown fences).
|
||||
|
||||
Returns (code, title) where title may be None.
|
||||
"""
|
||||
text = content.strip()
|
||||
|
||||
if text.startswith("{"):
|
||||
try:
|
||||
parsed = json.loads(text)
|
||||
if isinstance(parsed, dict) and "code" in parsed:
|
||||
return parsed["code"], parsed.get("title")
|
||||
except (json.JSONDecodeError, ValueError):
|
||||
pass
|
||||
|
||||
json_start = text.find("{")
|
||||
json_end = text.rfind("}") + 1
|
||||
if json_start >= 0 and json_end > json_start:
|
||||
try:
|
||||
parsed = json.loads(text[json_start:json_end])
|
||||
if isinstance(parsed, dict) and "code" in parsed:
|
||||
return parsed["code"], parsed.get("title")
|
||||
except (json.JSONDecodeError, ValueError):
|
||||
pass
|
||||
|
||||
code = text
|
||||
if code.startswith("```"):
|
||||
lines = code.split("\n")
|
||||
start = 1
|
||||
end = len(lines)
|
||||
for i in range(len(lines) - 1, 0, -1):
|
||||
if lines[i].strip().startswith("```"):
|
||||
end = i
|
||||
break
|
||||
code = "\n".join(lines[start:end]).strip()
|
||||
|
||||
return code, None
|
||||
|
||||
|
||||
async def _refine_if_needed(llm, code: str, slide_number: int) -> str:
|
||||
"""Attempt basic syntax validation and auto-repair via LLM if needed.
|
||||
|
||||
Raises RuntimeError if the code is still invalid after MAX_REFINE_ATTEMPTS,
|
||||
matching the POC's behavior where a failed slide aborts the pipeline.
|
||||
"""
|
||||
error = _basic_syntax_check(code)
|
||||
if error is None:
|
||||
return code
|
||||
|
||||
for attempt in range(1, MAX_REFINE_ATTEMPTS + 1):
|
||||
print(
|
||||
f"Slide {slide_number}: syntax issue (attempt {attempt}/{MAX_REFINE_ATTEMPTS}): {error}"
|
||||
)
|
||||
|
||||
messages = [
|
||||
SystemMessage(content=REFINE_SCENE_SYSTEM_PROMPT),
|
||||
HumanMessage(
|
||||
content=(
|
||||
f"Here is the broken Remotion component code:\n\n{code}\n\n"
|
||||
f"Compilation error:\n{error}\n\nFix the code."
|
||||
)
|
||||
),
|
||||
]
|
||||
|
||||
response = await llm.ainvoke(messages)
|
||||
code, _ = _extract_code_and_title(response.content)
|
||||
|
||||
error = _basic_syntax_check(code)
|
||||
if error is None:
|
||||
print(f"Slide {slide_number}: fixed on attempt {attempt}")
|
||||
return code
|
||||
|
||||
raise RuntimeError(
|
||||
f"Slide {slide_number} failed to compile after {MAX_REFINE_ATTEMPTS} "
|
||||
f"refine attempts. Last error: {error}"
|
||||
)
|
||||
|
||||
|
||||
def _basic_syntax_check(code: str) -> str | None:
|
||||
"""Run a lightweight syntax check on the generated code.
|
||||
|
||||
Full Babel-based compilation happens on the frontend. This backend check
|
||||
catches the most common LLM code-generation mistakes so the refine loop
|
||||
can fix them before persisting.
|
||||
|
||||
Returns an error description or None if the code looks valid.
|
||||
"""
|
||||
if not code or not code.strip():
|
||||
return "Empty code"
|
||||
|
||||
if "export" not in code and "MyComposition" not in code:
|
||||
return "Missing exported component (expected 'export const MyComposition')"
|
||||
|
||||
brace_count = 0
|
||||
paren_count = 0
|
||||
bracket_count = 0
|
||||
for ch in code:
|
||||
if ch == "{":
|
||||
brace_count += 1
|
||||
elif ch == "}":
|
||||
brace_count -= 1
|
||||
elif ch == "(":
|
||||
paren_count += 1
|
||||
elif ch == ")":
|
||||
paren_count -= 1
|
||||
elif ch == "[":
|
||||
bracket_count += 1
|
||||
elif ch == "]":
|
||||
bracket_count -= 1
|
||||
|
||||
if brace_count < 0:
|
||||
return "Unmatched closing brace '}'"
|
||||
if paren_count < 0:
|
||||
return "Unmatched closing parenthesis ')'"
|
||||
if bracket_count < 0:
|
||||
return "Unmatched closing bracket ']'"
|
||||
|
||||
if brace_count != 0:
|
||||
return f"Unbalanced braces: {brace_count} unclosed"
|
||||
if paren_count != 0:
|
||||
return f"Unbalanced parentheses: {paren_count} unclosed"
|
||||
if bracket_count != 0:
|
||||
return f"Unbalanced brackets: {bracket_count} unclosed"
|
||||
|
||||
if "useCurrentFrame" not in code:
|
||||
return "Missing useCurrentFrame() — required for Remotion animations"
|
||||
|
||||
if "AbsoluteFill" not in code:
|
||||
return "Missing AbsoluteFill — required as the root layout component"
|
||||
|
||||
return None
|
||||
509
surfsense_backend/app/agents/video_presentation/prompts.py
Normal file
|
|
@@ -0,0 +1,509 @@
|
|||
import datetime
|
||||
|
||||
# TODO: move these to config file
|
||||
MAX_SLIDES = 5
|
||||
FPS = 30
|
||||
DEFAULT_DURATION_IN_FRAMES = 300
|
||||
|
||||
THEME_PRESETS = [
|
||||
"TERRA",
|
||||
"OCEAN",
|
||||
"SUNSET",
|
||||
"EMERALD",
|
||||
"ECLIPSE",
|
||||
"ROSE",
|
||||
"FROST",
|
||||
"NEBULA",
|
||||
"AURORA",
|
||||
"CORAL",
|
||||
"MIDNIGHT",
|
||||
"AMBER",
|
||||
"LAVENDER",
|
||||
"STEEL",
|
||||
"CITRUS",
|
||||
"CHERRY",
|
||||
]
|
||||
|
||||
THEME_DESCRIPTIONS: dict[str, str] = {
|
||||
"TERRA": "Warm earthy tones — terracotta, olive. Heritage, tradition, organic warmth.",
|
||||
"OCEAN": "Cool oceanic depth — teal, coral accents. Calm, marine, fluid elegance.",
|
||||
"SUNSET": "Vibrant warm energy — orange, purple. Passion, creativity, bold expression.",
|
||||
"EMERALD": "Fresh natural life — green, mint. Growth, health, sustainability.",
|
||||
"ECLIPSE": "Dramatic luxury — black, gold. Premium, power, prestige.",
|
||||
"ROSE": "Soft elegance — dusty pink, mauve. Beauty, care, refined femininity.",
|
||||
"FROST": "Crisp clarity — ice blue, silver. Tech, data, precision analytics.",
|
||||
"NEBULA": "Cosmic mystery — magenta, deep purple. AI, innovation, cutting-edge future.",
|
||||
"AURORA": "Ethereal northern lights — green-teal, violet. Mystical, transformative, wonder.",
|
||||
"CORAL": "Tropical warmth — coral, turquoise. Inviting, lively, community.",
|
||||
"MIDNIGHT": "Deep sophistication — navy, silver. Contemplative, trust, authority.",
|
||||
"AMBER": "Rich honey warmth — amber, brown. Comfort, wisdom, organic richness.",
|
||||
"LAVENDER": "Gentle dreaminess — purple, lilac. Calm, imaginative, serene.",
|
||||
"STEEL": "Industrial strength — gray, steel blue. Modern professional, reliability.",
|
||||
"CITRUS": "Bright optimism — yellow, lime. Energy, joy, fresh starts.",
|
||||
"CHERRY": "Bold impact — deep red, dark. Power, urgency, passionate conviction.",
|
||||
}
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# LLM-based theme assignment (replaces keyword-based pick_theme_and_mode)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
THEME_ASSIGNMENT_SYSTEM_PROMPT = """You are a visual design director assigning color themes to presentation slides.
|
||||
Given a list of slides, assign each slide a theme preset and color mode (dark or light).
|
||||
|
||||
Available themes (name — description):
|
||||
{theme_list}
|
||||
|
||||
Rules:
|
||||
1. Pick the theme that best matches each slide's mood, content, and visual direction.
|
||||
2. Maximize visual variety — avoid repeating the same theme on consecutive slides.
|
||||
3. Mix dark and light modes across the presentation for contrast and rhythm.
|
||||
4. Opening slides often benefit from a bold dark theme; closing/summary slides can go either way.
|
||||
5. The "background_explanation" field is the primary signal — it describes the intended mood and color direction.
|
||||
|
||||
Return ONLY a JSON array (no markdown fences, no explanation):
|
||||
[
|
||||
{{"slide_number": 1, "theme": "THEME_NAME", "mode": "dark"}},
|
||||
{{"slide_number": 2, "theme": "THEME_NAME", "mode": "light"}}
|
||||
]
|
||||
""".strip()
|
||||
|
||||
|
||||
def build_theme_assignment_user_prompt(
|
||||
slides: list[dict[str, str]],
|
||||
) -> str:
|
||||
"""Build the user prompt for LLM theme assignment.
|
||||
|
||||
*slides* is a list of dicts with keys: slide_number, title, subtitle,
|
||||
background_explanation (mood).
|
||||
"""
|
||||
lines = ["Assign a theme and mode to each of these slides:", ""]
|
||||
for s in slides:
|
||||
lines.append(
|
||||
f'Slide {s["slide_number"]}: "{s["title"]}" '
|
||||
f'(subtitle: "{s.get("subtitle", "")}") — '
|
||||
f'Mood: "{s.get("background_explanation", "neutral")}"'
|
||||
)
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def get_theme_assignment_system_prompt() -> str:
|
||||
"""Return the theme assignment system prompt with the full theme list injected."""
|
||||
theme_list = "\n".join(
|
||||
f"- {name}: {desc}" for name, desc in THEME_DESCRIPTIONS.items()
|
||||
)
|
||||
return THEME_ASSIGNMENT_SYSTEM_PROMPT.format(theme_list=theme_list)
|
||||
|
||||
|
||||
def pick_theme_and_mode_fallback(
|
||||
slide_index: int, total_slides: int
|
||||
) -> tuple[str, str]:
|
||||
"""Simple round-robin fallback when LLM theme assignment fails."""
|
||||
theme = THEME_PRESETS[slide_index % len(THEME_PRESETS)]
|
||||
mode = "dark" if slide_index % 2 == 0 else "light"
|
||||
if total_slides == 1:
|
||||
mode = "dark"
|
||||
return theme, mode
|
||||
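To show what the round-robin fallback produces, the sketch below mirrors its logic over a shortened, illustrative theme list. `THEMES` and `pick_fallback` are stand-ins for `THEME_PRESETS` and `pick_theme_and_mode_fallback`, not the module's own names.

```python
THEMES = ["TERRA", "OCEAN", "SUNSET"]  # shortened, illustrative subset of THEME_PRESETS


def pick_fallback(slide_index: int, total_slides: int) -> tuple[str, str]:
    """Mirror of the fallback logic over the shortened list above."""
    theme = THEMES[slide_index % len(THEMES)]
    # Even indices are dark, odd indices are light; a lone slide is always dark.
    mode = "dark" if slide_index % 2 == 0 or total_slides == 1 else "light"
    return theme, mode


assignments = [pick_fallback(i, 4) for i in range(4)]
# [("TERRA", "dark"), ("OCEAN", "light"), ("SUNSET", "dark"), ("TERRA", "light")]
```

Themes repeat only after the list is exhausted, and the mode alternates on every slide, so two consecutive slides never share both.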
|
||||
|
||||
def get_slide_generation_prompt(user_prompt: str | None = None) -> str:
|
||||
return f"""
|
||||
Today's date: {datetime.datetime.now().strftime("%Y-%m-%d")}
|
||||
<video_presentation_system>
|
||||
You are a content-to-slides converter. You receive raw source content (articles, notes, transcripts,
|
||||
product descriptions, chat conversations, etc.) and break it into a sequence of presentation slides
|
||||
for a video presentation with voiceover narration.
|
||||
|
||||
{
|
||||
f'''
|
||||
You **MUST** strictly adhere to the following user instruction while generating the slides:
|
||||
<user_instruction>
|
||||
{user_prompt}
|
||||
</user_instruction>
|
||||
'''
|
||||
if user_prompt
|
||||
else ""
|
||||
}
|
||||
|
||||
<input>
|
||||
- '<source_content>': A block of text containing the information to be presented. This could be
|
||||
research findings, an article summary, a detailed outline, user chat history, or any relevant
|
||||
raw information. The content serves as the factual basis for the video presentation.
|
||||
</input>
|
||||
|
||||
<output_format>
|
||||
A JSON object containing the presentation slides:
|
||||
{{
|
||||
"slides": [
|
||||
{{
|
||||
"slide_number": 1,
|
||||
"title": "Concise slide title",
|
||||
"subtitle": "One-line subtitle or tagline",
|
||||
"content_in_markdown": "## Heading\\n- Bullet point 1\\n- **Bold text**\\n- Bullet point 3",
|
||||
"speaker_transcripts": [
|
||||
"First narration sentence for this slide.",
|
||||
"Second narration sentence expanding on the point.",
|
||||
"Third sentence wrapping up this slide."
|
||||
],
|
||||
"background_explanation": "Emotional mood and color direction for this slide"
|
||||
}}
|
||||
]
|
||||
}}
|
||||
</output_format>
|
||||
|
||||
<guidelines>
|
||||
=== SLIDE COUNT ===
|
||||
|
||||
Dynamically decide the number of slides between 1 and {MAX_SLIDES} (inclusive).
|
||||
Base your decision entirely on the content's depth, richness, and how many distinct ideas it contains.
|
||||
Thin or simple content should produce fewer slides; dense or multi-faceted content may use more.
|
||||
Do NOT inflate or pad slides to reach {
|
||||
MAX_SLIDES
|
||||
} — only use what the content genuinely warrants.
|
||||
Do NOT treat {MAX_SLIDES} as a target; it is a hard ceiling, not a goal.
|
||||
|
||||
=== SLIDE STRUCTURE ===
|
||||
|
||||
- Each slide should cover ONE distinct key idea or section.
|
||||
- Keep slides focused: 2-5 bullet points of content per slide max.
|
||||
- The first slide should be a title/intro slide.
|
||||
- The last slide should be a summary or closing slide ONLY if there are 3+ slides.
|
||||
For 1-2 slides, skip the closing slide — just cover the content.
|
||||
- Do NOT create a separate closing slide if its content would just repeat earlier slides.
|
||||
|
||||
=== CONTENT FIELDS ===
|
||||
|
||||
- Write speaker_transcripts as if a human presenter is narrating — natural, conversational, 2-4 sentences per slide.
|
||||
These will be converted to TTS audio, so write in a way that sounds great when spoken aloud.
|
||||
- background_explanation should describe a visual style matching the slide's mood:
|
||||
- Describe the emotional feel: "warm and organic", "dramatic and urgent", "clean and optimistic",
|
||||
"technical and precise", "celebratory", "earthy and grounded", "cosmic and futuristic"
|
||||
- Mention color direction: warm tones, cool tones, earth tones, neon accents, gold/black, etc.
|
||||
- Vary the mood across slides — do NOT always say "dark blue gradient".
|
||||
- content_in_markdown should use proper markdown: ## headings, **bold**, - bullets, etc.
|
||||
|
||||
=== NARRATION QUALITY ===
|
||||
|
||||
- Speaker transcripts should explain the slide content in an engaging, presenter-like voice.
|
||||
- Keep narration concise: 2-4 sentences per slide (targeting ~10-15 seconds of audio per slide).
|
||||
- The narration should add context beyond what's on the slide — don't just read the bullets.
|
||||
- Use natural language: contractions, conversational tone, occasional enthusiasm.
|
||||
</guidelines>
|
||||
|
||||
<examples>
|
||||
Input: "Quantum computing uses quantum bits or qubits which can exist in multiple states simultaneously due to superposition."
|
||||
|
||||
Output:
|
||||
{{
|
||||
"slides": [
|
||||
{{
|
||||
"slide_number": 1,
|
||||
"title": "Quantum Computing",
|
||||
"subtitle": "Beyond Classical Bits",
|
||||
"content_in_markdown": "## The Quantum Leap\\n- Classical computers use **bits** (0 or 1)\\n- Quantum computers use **qubits**\\n- Qubits leverage **superposition**",
|
||||
"speaker_transcripts": [
|
||||
"Let's explore quantum computing, a technology that's fundamentally different from the computers we use every day.",
|
||||
"While traditional computers work with bits that are either zero or one, quantum computers use something called qubits.",
|
||||
"The magic of qubits is superposition — they can exist in multiple states at the same time."
|
||||
],
|
||||
"background_explanation": "Cosmic and futuristic with deep purple and magenta tones, evoking the mystery of quantum mechanics"
|
||||
}}
|
||||
]
|
||||
}}
|
||||
</examples>
|
||||
|
||||
Transform the source material into well-structured presentation slides with engaging narration.
|
||||
Ensure each slide has a clear visual mood and natural-sounding speaker transcripts.
|
||||
</video_presentation_system>
|
||||
"""
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Remotion scene code generation prompt
|
||||
# Ported from RemotionTets POC /api/generate system prompt
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
REMOTION_SCENE_SYSTEM_PROMPT = """
|
||||
You are a Remotion component generator that creates cinematic, modern motion graphics.
|
||||
Generate a single self-contained React component that uses Remotion.
|
||||
|
||||
=== THEME PRESETS (pick ONE per slide — see user prompt for which to use) ===
|
||||
|
||||
Each slide MUST use a DIFFERENT preset. The user prompt will tell you which preset to use.
|
||||
Use ALL colors from that preset — background, surface, text, accent, glow. Do NOT mix presets.
|
||||
|
||||
TERRA (warm earth — terracotta + olive):
|
||||
dark: bg #1C1510 surface #261E16 border #3D3024 text #E8DDD0 muted #9A8A78 accent #C2623D secondary #7D8C52 glow rgba(194,98,61,0.12)
|
||||
light: bg #F7F0E8 surface #FFF8F0 border #DDD0BF text #2C1D0E muted #8A7A68 accent #B85430 secondary #6B7A42 glow rgba(184,84,48,0.08)
|
||||
gradient-dark: radial-gradient(ellipse at 30% 80%, rgba(194,98,61,0.18), transparent 60%), linear-gradient(180deg, #1C1510, #261E16)
|
||||
gradient-light: radial-gradient(ellipse at 70% 20%, rgba(107,122,66,0.12), transparent 55%), linear-gradient(180deg, #F7F0E8, #FFF8F0)
|
||||
|
||||
OCEAN (cool depth — teal + coral):
|
||||
dark: bg #0B1A1E surface #122428 border #1E3740 text #D5EAF0 muted #6A9AA8 accent #1DB6A8 secondary #E87461 glow rgba(29,182,168,0.12)
|
||||
light: bg #F0F8FA surface #FFFFFF border #C8E0E8 text #0E2830 muted #5A8A98 accent #0EA69A secondary #D05F4E glow rgba(14,166,154,0.08)
|
||||
gradient-dark: radial-gradient(ellipse at 80% 30%, rgba(29,182,168,0.20), transparent 55%), radial-gradient(circle at 20% 80%, rgba(232,116,97,0.10), transparent 50%), #0B1A1E
|
||||
gradient-light: radial-gradient(ellipse at 20% 40%, rgba(14,166,154,0.10), transparent 55%), linear-gradient(180deg, #F0F8FA, #FFFFFF)
|
||||
|
||||
SUNSET (warm energy — orange + purple):
|
||||
dark: bg #1E130F surface #2A1B14 border #42291C text #F0DDD0 muted #A08878 accent #E86A20 secondary #A855C0 glow rgba(232,106,32,0.12)
|
||||
light: bg #FFF5ED surface #FFFFFF border #EADAC8 text #2E1508 muted #907860 accent #D05A18 secondary #9045A8 glow rgba(208,90,24,0.08)
|
||||
gradient-dark: linear-gradient(135deg, rgba(232,106,32,0.15) 0%, transparent 40%), radial-gradient(circle at 80% 70%, rgba(168,85,192,0.15), transparent 50%), #1E130F
|
||||
gradient-light: linear-gradient(135deg, rgba(208,90,24,0.08) 0%, rgba(144,69,168,0.06) 100%), #FFF5ED
|
||||
|
||||
EMERALD (fresh life — green + mint):
|
||||
dark: bg #0B1E14 surface #12281A border #1E3C28 text #D0F0E0 muted #5EA880 accent #10B981 secondary #84CC16 glow rgba(16,185,129,0.12)
|
||||
light: bg #F0FAF5 surface #FFFFFF border #C0E8D0 text #0E2C18 muted #489068 accent #059669 secondary #65A30D glow rgba(5,150,105,0.08)
|
||||
gradient-dark: radial-gradient(ellipse at 50% 50%, rgba(16,185,129,0.18), transparent 60%), linear-gradient(180deg, #0B1E14, #12281A)
|
||||
gradient-light: radial-gradient(ellipse at 60% 30%, rgba(101,163,13,0.10), transparent 55%), linear-gradient(180deg, #F0FAF5, #FFFFFF)
|
||||
|
||||
ECLIPSE (dramatic — black + gold):
|
||||
dark: bg #100C05 surface #1A1508 border #2E2510 text #D4B96A muted #8A7840 accent #E8B830 secondary #C09020 glow rgba(232,184,48,0.14)
|
||||
light: bg #FAF6ED surface #FFFFFF border #E0D8C0 text #1A1408 muted #7A6818 accent #C09820 secondary #A08018 glow rgba(192,152,32,0.08)
|
||||
gradient-dark: radial-gradient(circle at 50% 40%, rgba(232,184,48,0.20), transparent 50%), radial-gradient(ellipse at 50% 90%, rgba(192,144,32,0.08), transparent 50%), #100C05
|
||||
gradient-light: radial-gradient(circle at 50% 40%, rgba(192,152,32,0.10), transparent 55%), linear-gradient(180deg, #FAF6ED, #FFFFFF)
|
||||
|
||||
ROSE (soft elegance — dusty pink + mauve):
|
||||
dark: bg #1E1018 surface #281820 border #3D2830 text #F0D8E0 muted #A08090 accent #E4508C secondary #B06498 glow rgba(228,80,140,0.12)
|
||||
light: bg #FDF2F5 surface #FFFFFF border #F0D0D8 text #2C1018 muted #906878 accent #D43D78 secondary #9A5080 glow rgba(212,61,120,0.08)
|
||||
gradient-dark: radial-gradient(ellipse at 70% 30%, rgba(228,80,140,0.18), transparent 55%), radial-gradient(circle at 20% 80%, rgba(176,100,152,0.10), transparent 50%), #1E1018
|
||||
gradient-light: radial-gradient(ellipse at 30% 60%, rgba(212,61,120,0.08), transparent 55%), linear-gradient(180deg, #FDF2F5, #FFFFFF)
|
||||
|
||||
FROST (crisp clarity — ice blue + silver):
|
||||
dark: bg #0A1520 surface #101D2A border #1A3040 text #D0E5F5 muted #6090B0 accent #5AB4E8 secondary #8BA8C0 glow rgba(90,180,232,0.12)
|
||||
light: bg #F0F6FC surface #FFFFFF border #C8D8E8 text #0C1820 muted #5080A0 accent #3A96D0 secondary #7090A8 glow rgba(58,150,208,0.08)
|
||||
gradient-dark: radial-gradient(ellipse at 40% 20%, rgba(90,180,232,0.16), transparent 55%), linear-gradient(180deg, #0A1520, #101D2A)
|
||||
gradient-light: radial-gradient(ellipse at 50% 50%, rgba(58,150,208,0.08), transparent 55%), linear-gradient(180deg, #F0F6FC, #FFFFFF)
|
||||
|
||||
NEBULA (cosmic — magenta + deep purple):
|
||||
dark: bg #150A1E surface #1E1028 border #351A48 text #E0D0F0 muted #8060A0 accent #C850E0 secondary #8030C0 glow rgba(200,80,224,0.14)
|
||||
light: bg #F8F0FF surface #FFFFFF border #E0C8F0 text #1A0A24 muted #7050A0 accent #A840C0 secondary #6820A0 glow rgba(168,64,192,0.08)
|
||||
gradient-dark: radial-gradient(circle at 60% 40%, rgba(200,80,224,0.18), transparent 50%), radial-gradient(ellipse at 30% 80%, rgba(128,48,192,0.12), transparent 50%), #150A1E
|
||||
gradient-light: radial-gradient(circle at 40% 30%, rgba(168,64,192,0.10), transparent 55%), linear-gradient(180deg, #F8F0FF, #FFFFFF)
|
||||
|
||||
AURORA (ethereal lights — green-teal + violet):
|
||||
dark: bg #0A1A1A surface #102020 border #1A3838 text #D0F0F0 muted #60A0A0 accent #30D0B0 secondary #8040D0 glow rgba(48,208,176,0.12)
|
||||
light: bg #F0FAF8 surface #FFFFFF border #C0E8E0 text #0A2020 muted #508080 accent #20B090 secondary #6830B0 glow rgba(32,176,144,0.08)
|
||||
gradient-dark: radial-gradient(ellipse at 30% 70%, rgba(48,208,176,0.18), transparent 55%), radial-gradient(circle at 70% 30%, rgba(128,64,208,0.12), transparent 50%), #0A1A1A
|
||||
gradient-light: radial-gradient(ellipse at 50% 40%, rgba(32,176,144,0.10), transparent 55%), linear-gradient(180deg, #F0FAF8, #FFFFFF)
|
||||
|
||||
CORAL (tropical warmth — coral + turquoise):
|
||||
dark: bg #1E0F0F surface #281818 border #402828 text #F0D8D8 muted #A07070 accent #F06050 secondary #30B8B0 glow rgba(240,96,80,0.12)
|
||||
light: bg #FFF5F3 surface #FFFFFF border #F0D0C8 text #2E1010 muted #906060 accent #E04838 secondary #20A098 glow rgba(224,72,56,0.08)
|
||||
gradient-dark: radial-gradient(ellipse at 60% 60%, rgba(240,96,80,0.18), transparent 55%), radial-gradient(circle at 30% 30%, rgba(48,184,176,0.10), transparent 50%), #1E0F0F
|
||||
gradient-light: radial-gradient(ellipse at 40% 50%, rgba(224,72,56,0.08), transparent 55%), linear-gradient(180deg, #FFF5F3, #FFFFFF)
|
||||
|
||||
MIDNIGHT (deep sophistication — navy + silver):
|
||||
dark: bg #080C18 surface #0E1420 border #1A2438 text #C8D8F0 muted #5070A0 accent #4080E0 secondary #A0B0D0 glow rgba(64,128,224,0.12)
|
||||
light: bg #F0F2F8 surface #FFFFFF border #C8D0E0 text #101828 muted #506080 accent #3060C0 secondary #8090B0 glow rgba(48,96,192,0.08)
|
||||
gradient-dark: radial-gradient(ellipse at 50% 30%, rgba(64,128,224,0.16), transparent 55%), linear-gradient(180deg, #080C18, #0E1420)
|
||||
gradient-light: radial-gradient(ellipse at 50% 50%, rgba(48,96,192,0.08), transparent 55%), linear-gradient(180deg, #F0F2F8, #FFFFFF)
|
||||
|
||||
AMBER (rich honey warmth — amber + brown):
|
||||
dark: bg #1A1208 surface #221A0E border #3A2C18 text #F0E0C0 muted #A09060 accent #E0A020 secondary #C08030 glow rgba(224,160,32,0.12)
|
||||
light: bg #FFF8E8 surface #FFFFFF border #E8D8B8 text #2A1C08 muted #907840 accent #C88810 secondary #A86820 glow rgba(200,136,16,0.08)
|
||||
gradient-dark: radial-gradient(ellipse at 40% 60%, rgba(224,160,32,0.18), transparent 55%), linear-gradient(180deg, #1A1208, #221A0E)
|
||||
gradient-light: radial-gradient(ellipse at 60% 40%, rgba(200,136,16,0.10), transparent 55%), linear-gradient(180deg, #FFF8E8, #FFFFFF)
|
||||
|
||||
LAVENDER (gentle dreaminess — purple + lilac):
|
||||
dark: bg #14101E surface #1C1628 border #302840 text #E0D8F0 muted #8070A0 accent #A060E0 secondary #C090D0 glow rgba(160,96,224,0.12)
|
||||
light: bg #F8F0FF surface #FFFFFF border #E0D0F0 text #1C1028 muted #706090 accent #8848C0 secondary #A878B8 glow rgba(136,72,192,0.08)
|
||||
gradient-dark: radial-gradient(ellipse at 60% 40%, rgba(160,96,224,0.18), transparent 55%), radial-gradient(circle at 30% 70%, rgba(192,144,208,0.10), transparent 50%), #14101E
|
||||
gradient-light: radial-gradient(ellipse at 40% 30%, rgba(136,72,192,0.10), transparent 55%), linear-gradient(180deg, #F8F0FF, #FFFFFF)
|
||||
|
||||
STEEL (industrial strength — gray + steel blue):
|
||||
dark: bg #101214 surface #181C20 border #282E38 text #D0D8E0 muted #708090 accent #5088B0 secondary #90A0B0 glow rgba(80,136,176,0.12)
|
||||
light: bg #F2F4F6 surface #FFFFFF border #D0D8E0 text #181C24 muted #607080 accent #3870A0 secondary #708898 glow rgba(56,112,160,0.08)
|
||||
gradient-dark: radial-gradient(ellipse at 50% 50%, rgba(80,136,176,0.14), transparent 55%), linear-gradient(180deg, #101214, #181C20)
|
||||
gradient-light: radial-gradient(ellipse at 50% 40%, rgba(56,112,160,0.08), transparent 55%), linear-gradient(180deg, #F2F4F6, #FFFFFF)
|
||||
|
||||
CITRUS (bright optimism — yellow + lime):
|
||||
dark: bg #181808 surface #202010 border #383818 text #F0F0C0 muted #A0A060 accent #E8D020 secondary #90D030 glow rgba(232,208,32,0.12)
|
||||
light: bg #FFFFF0 surface #FFFFFF border #E8E8C0 text #282808 muted #808040 accent #C8B010 secondary #70B020 glow rgba(200,176,16,0.08)
|
||||
gradient-dark: radial-gradient(ellipse at 40% 40%, rgba(232,208,32,0.18), transparent 55%), radial-gradient(circle at 70% 70%, rgba(144,208,48,0.10), transparent 50%), #181808
|
||||
gradient-light: radial-gradient(ellipse at 50% 30%, rgba(200,176,16,0.10), transparent 55%), linear-gradient(180deg, #FFFFF0, #FFFFFF)
|
||||
|
||||
CHERRY (bold impact — deep red + dark):
|
||||
dark: bg #1A0808 surface #241010 border #401818 text #F0D0D0 muted #A06060 accent #D02030 secondary #E05060 glow rgba(208,32,48,0.14)
|
||||
light: bg #FFF0F0 surface #FFFFFF border #F0C8C8 text #280808 muted #904848 accent #B01828 secondary #C83848 glow rgba(176,24,40,0.08)
|
||||
gradient-dark: radial-gradient(ellipse at 50% 40%, rgba(208,32,48,0.20), transparent 50%), linear-gradient(180deg, #1A0808, #241010)
|
||||
gradient-light: radial-gradient(ellipse at 50% 50%, rgba(176,24,40,0.10), transparent 55%), linear-gradient(180deg, #FFF0F0, #FFFFFF)
|
||||
|
||||
=== SHARED TOKENS (use with any theme above) ===
|
||||
|
||||
SPACING: xs 8px, sm 16px, md 24px, lg 32px, xl 48px, 2xl 64px, 3xl 96px, 4xl 128px
|
||||
TYPOGRAPHY: fontFamily "Inter, system-ui, -apple-system, sans-serif"
|
||||
caption 14px/1.4, body 18px/1.6, subhead 24px/1.4, title 40px/1.2 w600, headline 64px/1.1 w700, display 96px/1.0 w800
|
||||
letterSpacing: tight "-0.02em", normal "0", wide "0.05em"
|
||||
BORDER RADIUS: 12px (cards), 8px (buttons), 9999px (pills)
|
||||
|
||||
=== VISUAL VARIETY (CRITICAL) ===
|
||||
|
||||
The user prompt assigns each slide a specific theme preset AND mode (dark/light).
|
||||
You MUST use EXACTLY the assigned preset and mode. Additionally:
|
||||
|
||||
1. Use the preset's gradient as the AbsoluteFill background.
|
||||
2. Use the preset's accent/secondary colors for highlights, pill badges, and card accents.
|
||||
3. Use the preset's glow value for all boxShadow effects.
|
||||
4. LAYOUT VARIATION: Vary layout between slides:
|
||||
- One slide: bold centered headline + subtle stat
|
||||
- Another: two-column card layout
|
||||
- Another: single large number or quote as hero
|
||||
Do NOT use the same layout pattern for every slide.
|
||||
|
||||
=== LAYOUT RULES (CRITICAL — elements must NEVER overlap) ===
|
||||
|
||||
The canvas is 1920x1080. You MUST use a SINGLE-LAYER layout. NO stacking, NO multiple AbsoluteFill layers.
|
||||
|
||||
STRUCTURE — every component must follow this exact pattern:
|
||||
<AbsoluteFill style={{ backgroundColor: "...", display: "flex", flexDirection: "column", justifyContent: "center", alignItems: "center", padding: 80 }}>
|
||||
{/* ALL content goes here as direct children in normal flow */}
|
||||
</AbsoluteFill>
|
||||
|
||||
ABSOLUTE RULES:
|
||||
- Use exactly ONE AbsoluteFill as the root. Set its background color/gradient via its style prop.
|
||||
- NEVER nest AbsoluteFill inside AbsoluteFill.
|
||||
- NEVER use position "absolute" or position "fixed" on ANY element.
|
||||
- NEVER use multiple layers or z-index.
|
||||
- ALL elements must be in normal document flow inside the single root AbsoluteFill.
|
||||
|
||||
SPACING:
|
||||
- Root padding: 80px on all sides (safe area).
|
||||
- Use flexDirection "column" with gap for vertical stacking, flexDirection "row" with gap for horizontal.
|
||||
- Minimum gap between elements: 24px vertical, 32px horizontal.
|
||||
- Text hierarchy gaps: headline→subheading 16px, subheading→body 12px, body→button 32px.
|
||||
- Cards/panels: padding 32px-48px, borderRadius 12px.
|
||||
- NEVER use margin to space siblings — always use the parent's gap property.
|
||||
|
||||
=== DESIGN STYLE ===
|
||||
|
||||
- Premium aesthetic — use the exact colors from the assigned theme preset (do NOT invent your own)
|
||||
- Background: use the preset's gradient-dark or gradient-light value directly as the AbsoluteFill's background
|
||||
- Card/surface backgrounds: use the preset's surface color
|
||||
- Text colors: use the preset's text, muted values
|
||||
- Borders: use the preset's border color
|
||||
- Glows: use the preset's glow value for all boxShadow — do NOT substitute other colors
|
||||
- Generous whitespace — less is more, let elements breathe
|
||||
- NO decorative background shapes, blurs, or overlapping ornaments
|
||||
|
||||
=== REMOTION RULES ===
|
||||
|
||||
- Export the component as: export const MyComposition = () => { ... }
|
||||
- Use useCurrentFrame() and useVideoConfig() from "remotion"
|
||||
- Do NOT use Sequence
|
||||
- Do NOT manually calculate animation timings or frame offsets
|
||||
|
||||
=== ANIMATION (use the stagger() helper for ALL element animations) ===
|
||||
|
||||
A pre-built helper function called stagger() is available globally.
|
||||
It handles enter, hold, and exit phases automatically — you MUST use it.
|
||||
|
||||
Signature:
|
||||
stagger(frame, fps, index, total) → { opacity: number, transform: string }
|
||||
|
||||
Parameters:
|
||||
frame — from useCurrentFrame()
|
||||
fps — from useVideoConfig()
|
||||
index — 0-based index of this element in the entrance order
|
||||
total — total number of animated elements in the scene
|
||||
|
||||
It returns a style object with opacity and transform that you spread onto the element.
|
||||
Timing is handled for you: staggered spring entrances, ambient hold motion, and a graceful exit.
|
||||
|
||||
Usage pattern:
|
||||
const frame = useCurrentFrame();
|
||||
const { fps } = useVideoConfig();
|
||||
|
||||
<div style={stagger(frame, fps, 0, 4)}>Headline</div>
|
||||
<div style={stagger(frame, fps, 1, 4)}>Subtitle</div>
|
||||
<div style={stagger(frame, fps, 2, 4)}>Card</div>
|
||||
<div style={stagger(frame, fps, 3, 4)}>Footer</div>
|
||||
|
||||
Rules:
|
||||
- Count ALL animated elements in your scene and pass that count as the "total" parameter.
|
||||
- Assign each element a sequential index starting from 0.
|
||||
- You can merge stagger's return with additional styles:
|
||||
<div style={{ ...stagger(frame, fps, 0, 3), fontSize: 64, color: "#fafafa" }}>
|
||||
- For non-animated static elements (backgrounds, borders), just use normal styles without stagger.
|
||||
- You may still use spring() and interpolate() for EXTRA custom effects (e.g., a number counter,
|
||||
color shift, or typewriter effect), but stagger() must drive all entrance/exit animations.
|
||||
|
||||
=== AVAILABLE GLOBALS (injected at runtime, do NOT import anything else) ===
|
||||
|
||||
- React (available globally)
|
||||
- AbsoluteFill, useCurrentFrame, useVideoConfig, spring, interpolate, Easing from "remotion"
|
||||
- stagger(frame, fps, index, total) — animation helper described above
|
||||
|
||||
=== CODE RULES ===
|
||||
|
||||
- Output ONLY the raw code, no markdown fences, no explanations
|
||||
- Keep it fully self-contained, no external dependencies or images
|
||||
- Use inline styles only (no CSS imports, no className)
|
||||
- Target 1920x1080 resolution
|
||||
- Every container must use display "flex" with explicit gap values
|
||||
- NEVER use marginTop/marginBottom to space siblings — use the parent's gap instead
|
||||
""".strip()
|
||||
|
||||
|
||||
def build_scene_generation_user_prompt(
    slide_number: int,
    total_slides: int,
    title: str,
    subtitle: str,
    content_in_markdown: str,
    background_explanation: str,
    duration_in_frames: int,
    theme: str,
    mode: str,
) -> str:
    """Build the user prompt for generating a single slide's Remotion scene code.

    *theme* and *mode* are pre-assigned (by LLM or fallback) before this is called.
    """
    return "\n".join(
        [
            "Create a cinematic, visually striking Remotion scene.",
            f"The video is {duration_in_frames} frames at {FPS}fps ({duration_in_frames / FPS:.1f}s total).",
            "",
            f"This is slide {slide_number} of {total_slides} in the video.",
            "",
            f"=== ASSIGNED THEME: {theme} / {mode.upper()} mode ===",
            f"You MUST use the {theme} preset in {mode} mode from the theme presets above.",
            f"Use its exact background gradient (gradient-{mode}), surface, text, accent, secondary, border, and glow colors.",
            "Do NOT substitute, invent, or default to blue/violet colors.",
            "",
            f'The scene should communicate this message: "{title} — {subtitle}"',
            "",
            "Key ideas to convey (use as creative inspiration, NOT literal text to dump on screen):",
            content_in_markdown,
            "",
            "Pick only the 1-2 most impactful phrases or numbers to display as text.",
            "",
            f"Mood & tone: {background_explanation}",
        ]
    )
|
||||
|
||||
|
||||
REFINE_SCENE_SYSTEM_PROMPT = """
|
||||
You are a code repair assistant. You will receive a Remotion React component that failed to compile,
|
||||
along with the exact error message from the Babel transpiler.
|
||||
|
||||
Your job is to fix the code so it compiles and runs correctly.
|
||||
|
||||
RULES:
|
||||
- Output ONLY the fixed raw code as a string — no markdown fences, no explanations.
|
||||
- Preserve the original intent, design, and animations as closely as possible.
|
||||
- The component must be exported as: export const MyComposition = () => { ... }
|
||||
- Only these globals are available at runtime (they are injected, not actually imported):
|
||||
React, AbsoluteFill, useCurrentFrame, useVideoConfig, spring, interpolate, Easing,
|
||||
stagger (a helper: stagger(frame, fps, index, total) → { opacity, transform })
|
||||
- Keep import statements at the top (they get stripped by the compiler) but do NOT import anything
|
||||
other than "react" and "remotion".
|
||||
- Use inline styles only (no CSS, no className).
|
||||
- Common fixes:
|
||||
- Mismatched braces/brackets in JSX style objects (e.g. }}, instead of }}>)
|
||||
- Missing closing tags
|
||||
- Trailing commas before > in JSX
|
||||
- Undefined variables or typos
|
||||
- Invalid JSX expressions
|
||||
- After fixing, mentally walk through every brace pair { } and JSX tag to verify they match.
|
||||
""".strip()
|
||||
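Each theme preset in the system prompt lists its colors as a flat run of name/value pairs per mode. As a minimal sketch of how such a line could be read into a token dictionary, assuming names and values strictly alternate and no value contains a space (true for the `dark:` and `light:` lines, not for the `gradient-*` lines). The helper name `parse_preset_line` is hypothetical and does not appear in the repository.

```python
def parse_preset_line(line: str) -> tuple[str, dict[str, str]]:
    """Split 'dark: bg #0B1E14 surface ...' into ('dark', {'bg': '#0B1E14', ...})."""
    mode, _, rest = line.partition(":")
    parts = rest.split()
    if len(parts) % 2 != 0:
        raise ValueError(f"expected name/value pairs, got {len(parts)} tokens")
    # Names sit at even positions, values at odd positions.
    tokens = dict(zip(parts[0::2], parts[1::2]))
    return mode.strip(), tokens


# EMERALD dark line copied from the presets in the prompt.
emerald_dark = (
    "dark: bg #0B1E14 surface #12281A border #1E3C28 text #D0F0E0 "
    "muted #5EA880 accent #10B981 secondary #84CC16 glow rgba(16,185,129,0.12)"
)
mode, tokens = parse_preset_line(emerald_dark)
print(mode, tokens["accent"], tokens["glow"])
```

A check like this could be used to confirm that generated scene code only references colors that exist in the assigned preset, though nothing in the diff indicates the backend does so.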
73
surfsense_backend/app/agents/video_presentation/state.py
Normal file
|
|
@ -0,0 +1,73 @@
"""Define the state structures for the video presentation agent."""

from __future__ import annotations

from dataclasses import dataclass

from pydantic import BaseModel, Field
from sqlalchemy.ext.asyncio import AsyncSession


class SlideContent(BaseModel):
    """Represents a single parsed slide from content analysis."""

    slide_number: int = Field(..., description="1-based slide number")
    title: str = Field(..., description="Concise slide title")
    subtitle: str = Field(..., description="One-line subtitle or tagline")
    content_in_markdown: str = Field(
        ..., description="Slide body content formatted as markdown"
    )
    speaker_transcripts: list[str] = Field(
        ...,
        description="2-4 short sentences a presenter would say while this slide is shown",
    )
    background_explanation: str = Field(
        ...,
        description="Emotional mood and color direction for this slide",
    )


class PresentationSlides(BaseModel):
    """Represents the full set of parsed slides from the LLM."""

    slides: list[SlideContent] = Field(
        ..., description="Ordered array of presentation slides"
    )


class SlideAudioResult(BaseModel):
    """Audio generation result for a single slide."""

    slide_number: int
    audio_file: str = Field(..., description="Path to the per-slide audio file")
    duration_seconds: float = Field(..., description="Audio duration in seconds")
    duration_in_frames: int = Field(
        ..., description="Audio duration in frames (at 30fps)"
    )


class SlideSceneCode(BaseModel):
    """Generated Remotion component code for a single slide."""

    slide_number: int
    code: str = Field(
        ..., description="Raw Remotion React component source code for this slide"
    )
    title: str = Field(..., description="Short title for the composition")


@dataclass
class State:
    """State for the video presentation agent graph.

    Pipeline: parse slides → (TTS audio ∥ theme assignment) → generate Remotion code
    The frontend receives the slides + code + audio and handles compilation/rendering.
    """

    db_session: AsyncSession
    source_content: str

    slides: list[SlideContent] | None = None
    slide_audio_results: list[SlideAudioResult] | None = None
    slide_theme_assignments: dict[int, tuple[str, str]] | None = None
    slide_scene_codes: list[SlideSceneCode] | None = None
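`SlideAudioResult` stores each slide's audio length both in seconds and in frames at 30fps. The conversion itself is not part of this file, so the following is only a sketch of one plausible approach. The function name `seconds_to_frames` and the choice to round up are assumptions, not the repository's code.

```python
import math

FPS = 30  # Frame rate named in the SlideAudioResult field description.


def seconds_to_frames(duration_seconds: float, fps: int = FPS) -> int:
    """Convert an audio duration to a whole number of frames.

    Rounds up so the scene never ends before the audio does
    (an assumption; the real pipeline may round differently).
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    # Round away float noise first (0.1 * 30 is slightly above 3.0).
    return math.ceil(round(duration_seconds * fps, 6))


print(seconds_to_frames(4.5))
print(seconds_to_frames(2.01))
```

Rounding up rather than down trades a fraction of a frame of silence for the guarantee that narration is never cut off.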
30
surfsense_backend/app/agents/video_presentation/utils.py
Normal file
|
|
@ -0,0 +1,30 @@
def get_voice_for_provider(provider: str, speaker_id: int = 0) -> dict | str:
    """
    Get the appropriate voice configuration based on the TTS provider.

    Currently single-speaker only (speaker_id=0). Multi-speaker support
    will be added in a future iteration.

    Args:
        provider: The TTS provider (e.g., "openai/tts-1", "vertex_ai/test")
        speaker_id: The ID of the speaker (default 0, single speaker for now)

    Returns:
        Voice configuration: a string for OpenAI, Azure, and local Kokoro,
        a dict for Vertex AI, and an empty dict for unrecognized providers
    """
    if provider == "local/kokoro":
        return "af_heart"

    provider_type = (
        provider.split("/")[0].lower() if "/" in provider else provider.lower()
    )

    voices = {
        "openai": "alloy",
        "vertex_ai": {
            "languageCode": "en-US",
            "name": "en-US-Studio-O",
        },
        "azure": "alloy",
    }
    return voices.get(provider_type, {})
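To make the return shapes concrete, here is a small runnable demo. It uses a condensed copy of the helper above (with `Union` in place of `dict | str` so it also runs on older Python versions). The `"unknown/model"` provider string is made up for illustration.

```python
from typing import Union


def get_voice_for_provider(provider: str, speaker_id: int = 0) -> Union[dict, str]:
    """Condensed copy of the helper above, kept only to make the demo runnable."""
    if provider == "local/kokoro":
        return "af_heart"
    # Everything before the first "/" identifies the provider family.
    provider_type = provider.split("/")[0].lower() if "/" in provider else provider.lower()
    voices = {
        "openai": "alloy",
        "vertex_ai": {"languageCode": "en-US", "name": "en-US-Studio-O"},
        "azure": "alloy",
    }
    return voices.get(provider_type, {})


for provider in ("openai/tts-1", "vertex_ai/test", "local/kokoro", "unknown/model"):
    print(provider, "->", get_voice_for_provider(provider))
```

Because an unrecognized provider yields an empty dict rather than raising, callers that need a usable voice may want to check for a falsy result before passing it to the TTS call.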
|
|
@ -340,20 +340,17 @@ if config.NEXT_FRONTEND_URL:
     if www_url not in allowed_origins:
         allowed_origins.append(www_url)

-# For local development, also allow common localhost origins
-if not config.BACKEND_URL or (
-    config.NEXT_FRONTEND_URL and "localhost" in config.NEXT_FRONTEND_URL
-):
-    allowed_origins.extend(
-        [
-            "http://localhost:3000",
-            "http://127.0.0.1:3000",
-        ]
-    )
+allowed_origins.extend(
+    [  # For local development and desktop app
+        "http://localhost:3000",
+        "http://127.0.0.1:3000",
+    ]
+)

 app.add_middleware(
     CORSMiddleware,
     allow_origins=allowed_origins,
+    allow_origin_regex=r"^https?://(localhost|127\.0\.0\.1)(:\d+)?$",
     allow_credentials=True,
     allow_methods=["*"],  # Allows all methods
     allow_headers=["*"],  # Allows all headers
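The new `allow_origin_regex` accepts any localhost or 127.0.0.1 origin on any port, over http or https. The sketch below exercises only that pattern with the standard `re` module. It does not involve the middleware, and the sample origins are made up for illustration.

```python
import re

# Pattern copied from the allow_origin_regex argument in the hunk above.
LOCAL_ORIGIN = re.compile(r"^https?://(localhost|127\.0\.0\.1)(:\d+)?$")

candidates = [
    "http://localhost:3000",
    "https://127.0.0.1",
    "http://localhost:5173",
    "http://localhost.example.com",  # Look-alike host, must be rejected.
    "https://example.com",
]
for origin in candidates:
    print(origin, bool(LOCAL_ORIGIN.fullmatch(origin)))
```

The anchors and the escaped dots matter here: without them a host such as `localhost.example.com` could slip through.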
|
|
|
|||
|
|
@ -77,6 +77,7 @@ celery_app = Celery(
|
|||
include=[
|
||||
"app.tasks.celery_tasks.document_tasks",
|
||||
"app.tasks.celery_tasks.podcast_tasks",
|
||||
"app.tasks.celery_tasks.video_presentation_tasks",
|
||||
"app.tasks.celery_tasks.connector_tasks",
|
||||
"app.tasks.celery_tasks.schedule_checker_task",
|
||||
"app.tasks.celery_tasks.document_reindex_tasks",
|
||||
|
|
|
|||
|
|
@ -224,6 +224,9 @@ class Config:
|
|||
os.getenv("CONNECTOR_INDEXING_LOCK_TTL_SECONDS", str(8 * 60 * 60))
|
||||
)
|
||||
|
||||
# Platform web search (SearXNG)
|
||||
SEARXNG_DEFAULT_HOST = os.getenv("SEARXNG_DEFAULT_HOST")
|
||||
|
||||
NEXT_FRONTEND_URL = os.getenv("NEXT_FRONTEND_URL")
|
||||
# Backend URL to override the http to https in the OAuth redirect URI
|
||||
BACKEND_URL = os.getenv("BACKEND_URL")
|
||||
|
|
|
|||
|
|
@ -183,6 +183,23 @@ global_llm_configs:
|
|||
use_default_system_instructions: true
|
||||
citations_enabled: true
|
||||
|
||||
# Example: MiniMax M2.5 - High-performance with 204K context window
|
||||
- id: -8
|
||||
name: "Global MiniMax M2.5"
|
||||
description: "MiniMax M2.5 with 204K context window and competitive pricing"
|
||||
provider: "MINIMAX"
|
||||
model_name: "MiniMax-M2.5"
|
||||
api_key: "your-minimax-api-key-here"
|
||||
api_base: "https://api.minimax.io/v1"
|
||||
rpm: 60
|
||||
tpm: 100000
|
||||
litellm_params:
|
||||
temperature: 1.0 # MiniMax requires temperature in (0.0, 1.0], cannot be 0
|
||||
max_tokens: 4000
|
||||
system_instructions: ""
|
||||
use_default_system_instructions: true
|
||||
citations_enabled: true
|
||||
|
||||
# =============================================================================
|
||||
# Image Generation Configuration
|
||||
# =============================================================================
|
||||
|
|
|
|||
|
|
@ -463,7 +463,7 @@ async def _process_gmail_messages_phase2(
     "connector_id": connector_id,
     "source": "composio",
 }
-safe_set_chunks(document, chunks)
+await safe_set_chunks(session, document, chunks)
 document.updated_at = get_current_timestamp()
 document.status = DocumentStatus.ready()

|
|
|
|||
|
|
@ -477,7 +477,7 @@ async def index_composio_google_calendar(
     "connector_id": connector_id,
     "source": "composio",
 }
-safe_set_chunks(document, chunks)
+await safe_set_chunks(session, document, chunks)
 document.updated_at = get_current_timestamp()
 document.status = DocumentStatus.ready()

|
|
|
|||
|
|
@ -1112,7 +1112,7 @@ async def _index_composio_drive_delta_sync(
     "connector_id": connector_id,
     "source": "composio",
 }
-safe_set_chunks(document, chunks)
+await safe_set_chunks(session, document, chunks)
 document.updated_at = get_current_timestamp()
 document.status = DocumentStatus.ready()

|
|
@ -1520,7 +1520,7 @@ async def _index_composio_drive_full_scan(
     "connector_id": connector_id,
     "source": "composio",
 }
-safe_set_chunks(document, chunks)
+await safe_set_chunks(session, document, chunks)
 document.updated_at = get_current_timestamp()
 document.status = DocumentStatus.ready()

|
|
|
|||
|
|
@ -103,6 +103,13 @@ class PodcastStatus(StrEnum):
|
|||
FAILED = "failed"
|
||||
|
||||
|
||||
class VideoPresentationStatus(StrEnum):
|
||||
PENDING = "pending"
|
||||
GENERATING = "generating"
|
||||
READY = "ready"
|
||||
FAILED = "failed"
|
||||
|
||||
|
||||
class DocumentStatus:
|
||||
"""
|
||||
Helper class for document processing status (stored as JSONB).
|
||||
|
|
@ -215,6 +222,7 @@ class LiteLLMProvider(StrEnum):
|
|||
COMETAPI = "COMETAPI"
|
||||
HUGGINGFACE = "HUGGINGFACE"
|
||||
GITHUB_MODELS = "GITHUB_MODELS"
|
||||
MINIMAX = "MINIMAX"
|
||||
CUSTOM = "CUSTOM"
|
||||
|
||||
|
||||
|
|
@ -336,6 +344,12 @@ class Permission(StrEnum):
|
|||
PODCASTS_UPDATE = "podcasts:update"
|
||||
PODCASTS_DELETE = "podcasts:delete"
|
||||
|
||||
# Video Presentations
|
||||
VIDEO_PRESENTATIONS_CREATE = "video_presentations:create"
|
||||
VIDEO_PRESENTATIONS_READ = "video_presentations:read"
|
||||
VIDEO_PRESENTATIONS_UPDATE = "video_presentations:update"
|
||||
VIDEO_PRESENTATIONS_DELETE = "video_presentations:delete"
|
||||
|
||||
# Image Generations
|
||||
IMAGE_GENERATIONS_CREATE = "image_generations:create"
|
||||
IMAGE_GENERATIONS_READ = "image_generations:read"
|
||||
|
|
@ -402,6 +416,10 @@ DEFAULT_ROLE_PERMISSIONS = {
|
|||
Permission.PODCASTS_CREATE.value,
|
||||
Permission.PODCASTS_READ.value,
|
||||
Permission.PODCASTS_UPDATE.value,
|
||||
# Video Presentations (no delete)
|
||||
Permission.VIDEO_PRESENTATIONS_CREATE.value,
|
||||
Permission.VIDEO_PRESENTATIONS_READ.value,
|
||||
Permission.VIDEO_PRESENTATIONS_UPDATE.value,
|
||||
# Image Generations (create and read, no delete)
|
||||
Permission.IMAGE_GENERATIONS_CREATE.value,
|
||||
Permission.IMAGE_GENERATIONS_READ.value,
|
||||
|
|
@ -434,6 +452,8 @@ DEFAULT_ROLE_PERMISSIONS = {
|
|||
Permission.LLM_CONFIGS_READ.value,
|
||||
# Podcasts (read only)
|
||||
Permission.PODCASTS_READ.value,
|
||||
# Video Presentations (read only)
|
||||
Permission.VIDEO_PRESENTATIONS_READ.value,
|
||||
# Image Generations (read only)
|
||||
Permission.IMAGE_GENERATIONS_READ.value,
|
||||
# Connectors (read only)
|
||||
|
|
@ -1043,6 +1063,46 @@ class Podcast(BaseModel, TimestampMixin):
|
|||
thread = relationship("NewChatThread")
|
||||
|
||||
|
||||
class VideoPresentation(BaseModel, TimestampMixin):
|
||||
"""Video presentation model for storing AI-generated video presentations.
|
||||
|
||||
The slides JSONB stores per-slide data including Remotion component code,
|
||||
audio file paths, and durations. The frontend compiles the code and renders
|
||||
the video using Remotion Player.
|
||||
"""
|
||||
|
||||
__tablename__ = "video_presentations"
|
||||
|
||||
title = Column(String(500), nullable=False)
|
||||
slides = Column(JSONB, nullable=True)
|
||||
scene_codes = Column(JSONB, nullable=True)
|
||||
status = Column(
|
||||
SQLAlchemyEnum(
|
||||
VideoPresentationStatus,
|
||||
name="video_presentation_status",
|
||||
create_type=False,
|
||||
values_callable=lambda x: [e.value for e in x],
|
||||
),
|
||||
nullable=False,
|
||||
default=VideoPresentationStatus.READY,
|
||||
server_default="ready",
|
||||
index=True,
|
||||
)
|
||||
|
||||
search_space_id = Column(
|
||||
Integer, ForeignKey("searchspaces.id", ondelete="CASCADE"), nullable=False
|
||||
)
|
||||
search_space = relationship("SearchSpace", back_populates="video_presentations")
|
||||
|
||||
thread_id = Column(
|
||||
Integer,
|
||||
ForeignKey("new_chat_threads.id", ondelete="SET NULL"),
|
||||
nullable=True,
|
||||
index=True,
|
||||
)
|
||||
thread = relationship("NewChatThread")
|
||||
|
||||
|
||||
class Report(BaseModel, TimestampMixin):
|
||||
"""Report model for storing generated Markdown reports."""
|
||||
|
||||
|
|
@ -1227,6 +1287,12 @@ class SearchSpace(BaseModel, TimestampMixin):
|
|||
order_by="Podcast.id.desc()",
|
||||
cascade="all, delete-orphan",
|
||||
)
|
||||
video_presentations = relationship(
|
||||
"VideoPresentation",
|
||||
back_populates="search_space",
|
||||
order_by="VideoPresentation.id.desc()",
|
||||
cascade="all, delete-orphan",
|
||||
)
|
||||
reports = relationship(
|
||||
"Report",
|
||||
back_populates="search_space",
|
||||
|
|
|
|||
|
|
@ -1,3 +1,4 @@
+import asyncio
 import time
 from datetime import datetime

|
|
@ -49,7 +50,7 @@ class ChucksHybridSearchRetriever:
         # Get embedding for the query
         embedding_model = config.embedding_model_instance
         t_embed = time.perf_counter()
-        query_embedding = embedding_model.embed(query_text)
+        query_embedding = await asyncio.to_thread(embedding_model.embed, query_text)
         perf.debug(
             "[chunk_search] vector_search embedding in %.3fs",
             time.perf_counter() - t_embed,
|
|
@ -195,7 +196,7 @@ class ChucksHybridSearchRetriever:
         if query_embedding is None:
             embedding_model = config.embedding_model_instance
             t_embed = time.perf_counter()
-            query_embedding = embedding_model.embed(query_text)
+            query_embedding = await asyncio.to_thread(embedding_model.embed, query_text)
             perf.debug(
                 "[chunk_search] hybrid_search embedding in %.3fs",
                 time.perf_counter() - t_embed,
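Both hunks move the synchronous `embed` call off the event loop with `asyncio.to_thread`. A minimal, self-contained sketch of that pattern follows. `blocking_embed` is a hypothetical stand-in for the real embedding model, and the sleep durations are arbitrary.

```python
import asyncio
import time


def blocking_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a synchronous embed call."""
    time.sleep(0.05)  # Simulate work that would otherwise block the event loop.
    return [float(len(text))]


async def main() -> list[float]:
    # The blocking call runs in a worker thread, so the second coroutine
    # can make progress while the embedding is being computed.
    embedding, _ = await asyncio.gather(
        asyncio.to_thread(blocking_embed, "hello"),
        asyncio.sleep(0.01),
    )
    return embedding


result = asyncio.run(main())
print(result)
```

Calling `blocking_embed("hello")` directly inside `main` would return the same value but stall every other coroutine on the loop for the duration of the call, which is the likely motivation for the change in the diff.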
|
|
|
|||
|
|
@ -42,6 +42,7 @@ from .search_spaces_routes import router as search_spaces_router
|
|||
from .slack_add_connector_route import router as slack_add_connector_router
|
||||
from .surfsense_docs_routes import router as surfsense_docs_router
|
||||
from .teams_add_connector_route import router as teams_add_connector_router
|
||||
from .video_presentations_routes import router as video_presentations_router
|
||||
from .youtube_routes import router as youtube_router
|
||||
|
||||
router = APIRouter()
|
||||
|
|
@ -55,6 +56,9 @@ router.include_router(new_chat_router) # Chat with assistant-ui persistence
|
|||
router.include_router(sandbox_router) # Sandbox file downloads (Daytona)
|
||||
router.include_router(chat_comments_router)
|
||||
router.include_router(podcasts_router) # Podcast task status and audio
|
||||
router.include_router(
|
||||
video_presentations_router
|
||||
) # Video presentation status and streaming
|
||||
router.include_router(reports_router) # Report CRUD and multi-format export
|
||||
router.include_router(image_generation_router) # Image generation via litellm
|
||||
router.include_router(search_source_connectors_router)
|
||||
|
|
|
|||
|
|
@ -21,6 +21,7 @@ from app.services.public_chat_service import (
|
|||
get_public_chat,
|
||||
get_snapshot_podcast,
|
||||
get_snapshot_report,
|
||||
get_snapshot_video_presentation,
|
||||
)
|
||||
from app.users import current_active_user
|
||||
|
||||
|
|
@ -117,6 +118,119 @@ async def stream_public_podcast(
|
|||
)
|
||||
|
||||
|
||||
@router.get("/{share_token}/video-presentations/{video_presentation_id}")
|
||||
async def get_public_video_presentation(
|
||||
share_token: str,
|
||||
video_presentation_id: int,
|
||||
session: AsyncSession = Depends(get_async_session),
|
||||
):
|
||||
"""
|
||||
Get video presentation details from a public chat snapshot.
|
||||
|
||||
No authentication required - the share_token provides access.
|
||||
Returns slide data (with public audio URLs) and scene codes.
|
||||
"""
|
||||
vp_info = await get_snapshot_video_presentation(
|
||||
session, share_token, video_presentation_id
|
||||
)
|
||||
|
||||
if not vp_info:
|
||||
raise HTTPException(status_code=404, detail="Video presentation not found")
|
||||
|
||||
slides = vp_info.get("slides") or []
|
||||
public_slides = _replace_audio_paths_with_public_urls(
|
||||
share_token, video_presentation_id, slides
|
||||
)
|
||||
|
||||
return {
|
||||
"id": vp_info.get("original_id"),
|
||||
"title": vp_info.get("title"),
|
||||
"status": "ready",
|
||||
"slides": public_slides,
|
||||
"scene_codes": vp_info.get("scene_codes"),
|
||||
"slide_count": len(slides) if slides else None,
|
||||
}
|
||||
|
||||
|
||||
@router.get(
|
||||
"/{share_token}/video-presentations/{video_presentation_id}/slides/{slide_number}/audio"
|
||||
)
|
||||
async def stream_public_slide_audio(
|
||||
share_token: str,
|
||||
video_presentation_id: int,
|
||||
slide_number: int,
|
||||
session: AsyncSession = Depends(get_async_session),
|
||||
):
|
||||
"""
|
||||
Stream a slide's audio from a public chat snapshot.
|
||||
|
||||
No authentication required - the share_token provides access.
|
||||
"""
|
||||
from pathlib import Path
|
||||
|
||||
vp_info = await get_snapshot_video_presentation(
|
||||
session, share_token, video_presentation_id
|
||||
)
|
||||
|
||||
if not vp_info:
|
||||
raise HTTPException(status_code=404, detail="Video presentation not found")
|
||||
|
||||
slides = vp_info.get("slides") or []
|
||||
slide_data = None
|
||||
for s in slides:
|
||||
if s.get("slide_number") == slide_number:
|
||||
slide_data = s
|
||||
break
|
||||
|
||||
if not slide_data:
|
||||
raise HTTPException(status_code=404, detail=f"Slide {slide_number} not found")
|
||||
|
||||
file_path = slide_data.get("audio_file")
|
||||
if not file_path or not os.path.isfile(file_path):
|
||||
raise HTTPException(status_code=404, detail="Slide audio file not found")
|
||||
|
||||
ext = Path(file_path).suffix.lower()
|
||||
media_type = "audio/wav" if ext == ".wav" else "audio/mpeg"
|
||||
|
||||
def iterfile():
|
||||
with open(file_path, mode="rb") as file_like:
|
||||
yield from file_like
|
||||
|
||||
return StreamingResponse(
|
||||
iterfile(),
|
||||
media_type=media_type,
|
||||
headers={
|
||||
"Accept-Ranges": "bytes",
|
||||
"Content-Disposition": f"inline; filename={Path(file_path).name}",
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
def _replace_audio_paths_with_public_urls(
|
||||
share_token: str,
|
||||
video_presentation_id: int,
|
||||
slides: list[dict],
|
||||
) -> list[dict]:
|
||||
"""Replace server-local audio_file paths with public streaming API URLs."""
|
||||
result = []
|
||||
for slide in slides:
|
||||
slide_copy = dict(slide)
|
||||
slide_number = slide_copy.get("slide_number")
|
||||
audio_file = slide_copy.pop("audio_file", None)
|
||||
|
||||
if audio_file and slide_number is not None:
|
||||
slide_copy["audio_url"] = (
|
||||
f"/api/v1/public/{share_token}"
|
||||
f"/video-presentations/{video_presentation_id}"
|
||||
f"/slides/{slide_number}/audio"
|
||||
)
|
||||
else:
|
||||
slide_copy["audio_url"] = None
|
||||
|
||||
result.append(slide_copy)
|
||||
return result
|
||||
|
||||
|
||||
@router.get("/{share_token}/reports/{report_id}/content")
|
||||
async def get_public_report_content(
|
||||
share_token: str,
|
||||
|
|
|
|||
242
surfsense_backend/app/routes/video_presentations_routes.py
Normal file
|
|
@ -0,0 +1,242 @@
"""
Video presentation routes for CRUD operations and per-slide audio streaming.

These routes support the video presentation generation feature in new-chat.
Frontend polls GET /video-presentations/{id} to check status field.
When ready, the slides JSONB contains per-slide Remotion code and audio file paths.
The frontend compiles the Remotion code via Babel and renders with Remotion Player.
"""

import os
from pathlib import Path

from fastapi import APIRouter, Depends, HTTPException
from fastapi.responses import StreamingResponse
from sqlalchemy import select
from sqlalchemy.exc import SQLAlchemyError
from sqlalchemy.ext.asyncio import AsyncSession

from app.db import (
    Permission,
    SearchSpace,
    SearchSpaceMembership,
    User,
    VideoPresentation,
    get_async_session,
)
from app.schemas import VideoPresentationRead
from app.users import current_active_user
from app.utils.rbac import check_permission

router = APIRouter()


@router.get("/video-presentations", response_model=list[VideoPresentationRead])
async def read_video_presentations(
    skip: int = 0,
    limit: int = 100,
    search_space_id: int | None = None,
    session: AsyncSession = Depends(get_async_session),
    user: User = Depends(current_active_user),
):
    """
    List video presentations the user has access to.
    Requires VIDEO_PRESENTATIONS_READ permission for the search space(s).
    """
    if skip < 0 or limit < 1:
        raise HTTPException(status_code=400, detail="Invalid pagination parameters")
    try:
        if search_space_id is not None:
            await check_permission(
                session,
                user,
                search_space_id,
                Permission.VIDEO_PRESENTATIONS_READ.value,
                "You don't have permission to read video presentations in this search space",
            )
            result = await session.execute(
                select(VideoPresentation)
                .filter(VideoPresentation.search_space_id == search_space_id)
                .offset(skip)
                .limit(limit)
            )
        else:
            result = await session.execute(
                select(VideoPresentation)
                .join(SearchSpace)
                .join(SearchSpaceMembership)
                .filter(SearchSpaceMembership.user_id == user.id)
                .offset(skip)
                .limit(limit)
            )
        return [
            VideoPresentationRead.from_orm_with_slides(vp)
            for vp in result.scalars().all()
        ]
    except HTTPException:
        raise
    except SQLAlchemyError:
        raise HTTPException(
            status_code=500,
            detail="Database error occurred while fetching video presentations",
        ) from None


@router.get(
    "/video-presentations/{video_presentation_id}",
    response_model=VideoPresentationRead,
)
async def read_video_presentation(
    video_presentation_id: int,
    session: AsyncSession = Depends(get_async_session),
    user: User = Depends(current_active_user),
):
    """
    Get a specific video presentation by ID.
    Requires authentication with VIDEO_PRESENTATIONS_READ permission.

    When status is "ready", the response includes:
    - slides: parsed slide data with per-slide audio_url and durations
    - scene_codes: Remotion component source code per slide
    """
    try:
        result = await session.execute(
            select(VideoPresentation).filter(
                VideoPresentation.id == video_presentation_id
            )
        )
        video_pres = result.scalars().first()

        if not video_pres:
            raise HTTPException(status_code=404, detail="Video presentation not found")

        await check_permission(
            session,
            user,
            video_pres.search_space_id,
            Permission.VIDEO_PRESENTATIONS_READ.value,
            "You don't have permission to read video presentations in this search space",
        )

        return VideoPresentationRead.from_orm_with_slides(video_pres)
    except HTTPException as he:
        raise he
    except SQLAlchemyError:
        raise HTTPException(
            status_code=500,
            detail="Database error occurred while fetching video presentation",
        ) from None


@router.delete("/video-presentations/{video_presentation_id}", response_model=dict)
async def delete_video_presentation(
    video_presentation_id: int,
    session: AsyncSession = Depends(get_async_session),
    user: User = Depends(current_active_user),
):
    """
    Delete a video presentation.
    Requires VIDEO_PRESENTATIONS_DELETE permission for the search space.
    """
    try:
        result = await session.execute(
            select(VideoPresentation).filter(
                VideoPresentation.id == video_presentation_id
            )
        )
        db_video_pres = result.scalars().first()

        if not db_video_pres:
            raise HTTPException(status_code=404, detail="Video presentation not found")

        await check_permission(
            session,
            user,
            db_video_pres.search_space_id,
            Permission.VIDEO_PRESENTATIONS_DELETE.value,
            "You don't have permission to delete video presentations in this search space",
        )

        await session.delete(db_video_pres)
        await session.commit()
        return {"message": "Video presentation deleted successfully"}
    except HTTPException as he:
        raise he
    except SQLAlchemyError:
        await session.rollback()
        raise HTTPException(
            status_code=500,
            detail="Database error occurred while deleting video presentation",
        ) from None


@router.get("/video-presentations/{video_presentation_id}/slides/{slide_number}/audio")
async def stream_slide_audio(
    video_presentation_id: int,
    slide_number: int,
    session: AsyncSession = Depends(get_async_session),
    user: User = Depends(current_active_user),
):
    """
    Stream the audio file for a specific slide in a video presentation.
    The slide_number is 1-based. Audio path is read from the slides JSONB.
    """
    try:
        result = await session.execute(
            select(VideoPresentation).filter(
                VideoPresentation.id == video_presentation_id
            )
        )
        video_pres = result.scalars().first()

        if not video_pres:
            raise HTTPException(status_code=404, detail="Video presentation not found")

        await check_permission(
            session,
            user,
            video_pres.search_space_id,
            Permission.VIDEO_PRESENTATIONS_READ.value,
            "You don't have permission to access video presentations in this search space",
        )

        slides = video_pres.slides or []
        slide_data = None
        for s in slides:
            if s.get("slide_number") == slide_number:
                slide_data = s
                break

        if not slide_data:
            raise HTTPException(
                status_code=404,
                detail=f"Slide {slide_number} not found",
            )

        file_path = slide_data.get("audio_file")
        if not file_path or not os.path.isfile(file_path):
            raise HTTPException(status_code=404, detail="Slide audio file not found")

        ext = Path(file_path).suffix.lower()
        media_type = "audio/wav" if ext == ".wav" else "audio/mpeg"

        def iterfile():
            with open(file_path, mode="rb") as file_like:
                yield from file_like

        return StreamingResponse(
            iterfile(),
            media_type=media_type,
            headers={
                "Accept-Ranges": "bytes",
                "Content-Disposition": f"inline; filename={Path(file_path).name}",
            },
        )

    except HTTPException as he:
        raise he
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Error streaming slide audio: {e!s}",
        ) from e
|
||||
|
|
@ -101,6 +101,12 @@ from .search_space import (
|
|||
SearchSpaceWithStats,
|
||||
)
|
||||
from .users import UserCreate, UserRead, UserUpdate
|
||||
from .video_presentations import (
|
||||
VideoPresentationBase,
|
||||
VideoPresentationCreate,
|
||||
VideoPresentationRead,
|
||||
VideoPresentationUpdate,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
# Chat schemas (assistant-ui integration)
|
||||
|
|
@ -220,4 +226,9 @@ __all__ = [
|
|||
"UserRead",
|
||||
"UserSearchSpaceAccess",
|
||||
"UserUpdate",
|
||||
# Video Presentation schemas
|
||||
"VideoPresentationBase",
|
||||
"VideoPresentationCreate",
|
||||
"VideoPresentationRead",
|
||||
"VideoPresentationUpdate",
|
||||
]
|
||||
|
|
|
|||
|
|
@ -12,13 +12,11 @@ class SearchSpaceBase(BaseModel):
|
|||
|
||||
|
||||
class SearchSpaceCreate(SearchSpaceBase):
|
||||
# Optional on create, will use defaults if not provided
|
||||
citations_enabled: bool = True
|
||||
qna_custom_instructions: str | None = None
|
||||
|
||||
|
||||
class SearchSpaceUpdate(BaseModel):
|
||||
# All fields optional on update - only send what you want to change
|
||||
name: str | None = None
|
||||
description: str | None = None
|
||||
citations_enabled: bool | None = None
|
||||
|
|
@ -29,7 +27,6 @@ class SearchSpaceRead(SearchSpaceBase, IDModel, TimestampModel):
|
|||
id: int
|
||||
created_at: datetime
|
||||
user_id: uuid.UUID
|
||||
# QnA configuration
|
||||
citations_enabled: bool
|
||||
qna_custom_instructions: str | None = None
|
||||
|
||||
|
|
|
|||
103
surfsense_backend/app/schemas/video_presentations.py
Normal file
|
|
@ -0,0 +1,103 @@
"""Video presentation schemas for API responses."""

from datetime import datetime
from enum import StrEnum
from typing import Any

from pydantic import BaseModel


class VideoPresentationStatusEnum(StrEnum):
    PENDING = "pending"
    GENERATING = "generating"
    READY = "ready"
    FAILED = "failed"


class VideoPresentationBase(BaseModel):
    """Base video presentation schema."""

    title: str
    slides: list[dict[str, Any]] | None = None
    scene_codes: list[dict[str, Any]] | None = None
    search_space_id: int


class VideoPresentationCreate(VideoPresentationBase):
    """Schema for creating a video presentation."""

    pass


class VideoPresentationUpdate(BaseModel):
    """Schema for updating a video presentation."""

    title: str | None = None
    slides: list[dict[str, Any]] | None = None
    scene_codes: list[dict[str, Any]] | None = None


class VideoPresentationRead(VideoPresentationBase):
    """Schema for reading a video presentation."""

    id: int
    status: VideoPresentationStatusEnum = VideoPresentationStatusEnum.READY
    created_at: datetime
    slide_count: int | None = None

    class Config:
        from_attributes = True

    @classmethod
    def from_orm_with_slides(cls, obj):
        """Create VideoPresentationRead with slide_count computed.

        Replaces raw server file paths in `audio_file` with API streaming
        URLs so the frontend can use them directly in Remotion <Audio />.
        """
        slides = obj.slides
        if slides:
            slides = _replace_audio_paths_with_urls(obj.id, slides)

        data = {
            "id": obj.id,
            "title": obj.title,
            "slides": slides,
            "scene_codes": obj.scene_codes,
            "search_space_id": obj.search_space_id,
            "status": obj.status,
            "created_at": obj.created_at,
            "slide_count": len(obj.slides) if obj.slides else None,
        }
        return cls(**data)


def _replace_audio_paths_with_urls(
    video_presentation_id: int,
    slides: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Replace server-local audio_file paths with streaming API URLs.

    Transforms:
        "audio_file": "video_presentation_audio/abc_slide_1.mp3"
    Into:
        "audio_url": "/api/v1/video-presentations/42/slides/1/audio"

    The frontend passes this URL to Remotion's <Audio src={slide.audio_url} />.
    """
    result = []
    for slide in slides:
        slide_copy = dict(slide)
        slide_number = slide_copy.get("slide_number")
        audio_file = slide_copy.pop("audio_file", None)

        if audio_file and slide_number is not None:
            slide_copy["audio_url"] = (
                f"/api/v1/video-presentations/{video_presentation_id}"
                f"/slides/{slide_number}/audio"
            )
        else:
            slide_copy["audio_url"] = None

        result.append(slide_copy)
    return result
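
Note on the path-to-URL rewrite in this file: a small worked example may make the transformation easier to check. The sketch below is a condensed restatement for illustration, not the PR's code; `rewrite_audio_paths` is a hypothetical name, and the second slide (one without narration) is an assumed input. It shows that `audio_file` never reaches the response, and that the stored list is left unmodified because each slide is shallow-copied first.

```python
from typing import Any


def rewrite_audio_paths(
    video_presentation_id: int, slides: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Condensed restatement of the rewrite above, for illustration only."""
    rewritten = []
    for slide in slides:
        slide_copy = dict(slide)  # shallow copy so the stored slide dict is untouched
        number = slide_copy.get("slide_number")
        has_audio = bool(slide_copy.pop("audio_file", None))
        slide_copy["audio_url"] = (
            f"/api/v1/video-presentations/{video_presentation_id}/slides/{number}/audio"
            if has_audio and number is not None
            else None
        )
        rewritten.append(slide_copy)
    return rewritten


stored = [
    {"slide_number": 1, "audio_file": "video_presentation_audio/abc_slide_1.mp3"},
    {"slide_number": 2},  # assumed: a slide with no audio file recorded
]
public = rewrite_audio_paths(42, stored)
# public[0] -> {"slide_number": 1, "audio_url": "/api/v1/video-presentations/42/slides/1/audio"}
# public[1] -> {"slide_number": 2, "audio_url": None}
```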
|
||||
|
|
@ -2,7 +2,6 @@ import asyncio
|
|||
import time
|
||||
from datetime import datetime
|
||||
from typing import Any
|
||||
from urllib.parse import urljoin
|
||||
|
||||
import httpx
|
||||
from linkup import LinkupClient
|
||||
|
|
@ -577,185 +576,27 @@ class ConnectorService:
|
|||
search_space_id: int,
|
||||
top_k: int = 20,
|
||||
) -> tuple:
|
||||
"""Search using the platform SearXNG instance.
|
||||
|
||||
Delegates to ``WebSearchService`` which handles caching, circuit
|
||||
breaking, and retries. SearXNG configuration comes from the
|
||||
docker/searxng/settings.yml file.
|
||||
"""
|
||||
Search using a configured SearxNG instance and return both sources and documents.
|
||||
"""
|
||||
searx_connector = await self.get_connector_by_type(
|
||||
SearchSourceConnectorType.SEARXNG_API, search_space_id
|
||||
from app.services import web_search_service
|
||||
|
||||
if not web_search_service.is_available():
|
||||
return {
|
||||
"id": 11,
|
||||
"name": "Web Search",
|
||||
"type": "SEARXNG_API",
|
||||
"sources": [],
|
||||
}, []
|
||||
|
||||
return await web_search_service.search(
|
||||
query=user_query,
|
||||
top_k=top_k,
|
||||
)
|
||||
|
||||
if not searx_connector:
|
||||
return {
|
||||
"id": 11,
|
||||
"name": "SearxNG Search",
|
||||
"type": "SEARXNG_API",
|
||||
"sources": [],
|
||||
}, []
|
||||
|
||||
config = searx_connector.config or {}
|
||||
host = config.get("SEARXNG_HOST")
|
||||
|
||||
if not host:
|
||||
print("SearxNG connector is missing SEARXNG_HOST configuration")
|
||||
return {
|
||||
"id": 11,
|
||||
"name": "SearxNG Search",
|
||||
"type": "SEARXNG_API",
|
||||
"sources": [],
|
||||
}, []
|
||||
|
||||
api_key = config.get("SEARXNG_API_KEY")
|
||||
engines = config.get("SEARXNG_ENGINES")
|
||||
categories = config.get("SEARXNG_CATEGORIES")
|
||||
language = config.get("SEARXNG_LANGUAGE")
|
||||
safesearch = config.get("SEARXNG_SAFESEARCH")
|
||||
|
||||
def _parse_bool(value: Any, default: bool = True) -> bool:
|
||||
if isinstance(value, bool):
|
||||
return value
|
||||
if isinstance(value, str):
|
||||
lowered = value.strip().lower()
|
||||
if lowered in {"true", "1", "yes", "on"}:
|
||||
return True
|
||||
if lowered in {"false", "0", "no", "off"}:
|
||||
return False
|
||||
return default
|
||||
|
||||
verify_ssl = _parse_bool(config.get("SEARXNG_VERIFY_SSL", True))
|
||||
|
||||
safesearch_value: int | None = None
|
||||
if isinstance(safesearch, str):
|
||||
safesearch_clean = safesearch.strip()
|
||||
if safesearch_clean.isdigit():
|
||||
safesearch_value = int(safesearch_clean)
|
||||
elif isinstance(safesearch, int | float):
|
||||
safesearch_value = int(safesearch)
|
||||
|
||||
if safesearch_value is not None and not (0 <= safesearch_value <= 2):
|
||||
safesearch_value = None
|
||||
|
||||
def _format_list(value: Any) -> str | None:
|
||||
if value is None:
|
||||
return None
|
||||
if isinstance(value, str):
|
||||
value = value.strip()
|
||||
return value or None
|
||||
if isinstance(value, list | tuple | set):
|
||||
cleaned = [str(item).strip() for item in value if str(item).strip()]
|
||||
return ",".join(cleaned) if cleaned else None
|
||||
return str(value)
|
||||
|
||||
params: dict[str, Any] = {
|
||||
"q": user_query,
|
||||
"format": "json",
|
||||
"language": language or "",
|
||||
"limit": max(1, min(top_k, 50)),
|
||||
}
|
||||
|
||||
engines_param = _format_list(engines)
|
||||
if engines_param:
|
||||
params["engines"] = engines_param
|
||||
|
||||
categories_param = _format_list(categories)
|
||||
if categories_param:
|
||||
params["categories"] = categories_param
|
||||
|
||||
if safesearch_value is not None:
|
||||
params["safesearch"] = safesearch_value
|
||||
|
||||
if not params.get("language"):
|
||||
params.pop("language")
|
||||
|
||||
headers = {"Accept": "application/json"}
|
||||
if api_key:
|
||||
headers["X-API-KEY"] = api_key
|
||||
|
||||
searx_endpoint = urljoin(host if host.endswith("/") else f"{host}/", "search")
|
||||
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=20.0, verify=verify_ssl) as client:
|
||||
response = await client.get(
|
||||
searx_endpoint,
|
||||
params=params,
|
||||
headers=headers,
|
||||
)
|
||||
response.raise_for_status()
|
||||
except httpx.HTTPError as exc:
|
||||
print(f"Error searching with SearxNG: {exc!s}")
|
||||
return {
|
||||
"id": 11,
|
||||
"name": "SearxNG Search",
|
||||
"type": "SEARXNG_API",
|
||||
"sources": [],
|
||||
}, []
|
||||
|
||||
try:
|
||||
data = response.json()
|
||||
except ValueError:
|
||||
print("Failed to decode JSON response from SearxNG")
|
||||
return {
|
||||
"id": 11,
|
||||
"name": "SearxNG Search",
|
||||
"type": "SEARXNG_API",
|
||||
"sources": [],
|
||||
}, []
|
||||
|
||||
searx_results = data.get("results", [])
|
||||
if not searx_results:
|
||||
return {
|
||||
"id": 11,
|
||||
"name": "SearxNG Search",
|
||||
"type": "SEARXNG_API",
|
||||
"sources": [],
|
||||
}, []
|
||||
|
||||
sources_list: list[dict[str, Any]] = []
|
||||
documents: list[dict[str, Any]] = []
|
||||
|
||||
async with self.counter_lock:
|
||||
for result in searx_results:
|
||||
description = result.get("content") or result.get("snippet") or ""
|
||||
if len(description) > 160:
|
||||
description = f"{description}"
|
||||
|
||||
source = {
|
||||
"id": self.source_id_counter,
|
||||
"title": result.get("title", "SearxNG Result"),
|
||||
"description": description,
|
||||
"url": result.get("url", ""),
|
||||
}
|
||||
sources_list.append(source)
|
||||
|
||||
metadata = {
|
||||
"url": result.get("url", ""),
|
||||
"engines": result.get("engines", []),
|
||||
"category": result.get("category"),
|
||||
"source": "SEARXNG_API",
|
||||
}
|
||||
|
||||
document = {
|
||||
"chunk_id": self.source_id_counter,
|
||||
"content": description or result.get("content", ""),
|
||||
"score": result.get("score", 0.0),
|
||||
"document": {
|
||||
"id": self.source_id_counter,
|
||||
"title": result.get("title", "SearxNG Result"),
|
||||
"document_type": "SEARXNG_API",
|
||||
"metadata": metadata,
|
||||
},
|
||||
}
|
||||
documents.append(document)
|
||||
self.source_id_counter += 1
|
||||
|
||||
result_object = {
|
||||
"id": 11,
|
||||
"name": "SearxNG Search",
|
||||
"type": "SEARXNG_API",
|
||||
"sources": sources_list,
|
||||
}
|
||||
|
||||
return result_object, documents
|
||||
|
||||
async def search_baidu(
|
||||
self,
|
||||
user_query: str,
|
||||
|
|
|
|||
|
|
@ -1,11 +1,10 @@
|
|||
import logging
|
||||
from datetime import datetime
|
||||
|
||||
from sqlalchemy import delete
|
||||
from sqlalchemy.ext.asyncio import AsyncSession
|
||||
|
||||
from app.connectors.linear_connector import LinearConnector
|
||||
from app.db import Chunk, Document
|
||||
from app.db import Document
|
||||
from app.services.llm_service import get_user_long_context_llm
|
||||
from app.utils.document_converters import (
|
||||
create_document_chunks,
|
||||
|
|
@ -105,10 +104,6 @@ class LinearKBSyncService:
|
|||
)
|
||||
summary_embedding = embed_text(summary_content)
|
||||
|
||||
await self.db_session.execute(
|
||||
delete(Chunk).where(Chunk.document_id == document.id)
|
||||
)
|
||||
|
||||
chunks = await create_document_chunks(issue_content)
|
||||
|
||||
document.title = f"{issue_identifier}: {issue_title}"
|
||||
|
|
@ -131,7 +126,7 @@ class LinearKBSyncService:
|
|||
"connector_id": connector_id,
|
||||
}
|
||||
flag_modified(document, "document_metadata")
|
||||
safe_set_chunks(document, chunks)
|
||||
await safe_set_chunks(self.db_session, document, chunks)
|
||||
document.updated_at = get_current_timestamp()
|
||||
|
||||
await self.db_session.commit()
|
||||
|
|
|
|||
|
|
@ -85,6 +85,7 @@ PROVIDER_MAP = {
|
|||
"ZHIPU": "openai",
|
||||
"GITHUB_MODELS": "github",
|
||||
"HUGGINGFACE": "huggingface",
|
||||
"MINIMAX": "openai",
|
||||
"CUSTOM": "custom",
|
||||
}
|
||||
|
||||
|
|
|
|||
|
|
@ -127,6 +127,7 @@ async def validate_llm_config(
|
|||
"ALIBABA_QWEN": "openai",
|
||||
"MOONSHOT": "openai",
|
||||
"ZHIPU": "openai", # GLM needs special handling
|
||||
"MINIMAX": "openai",
|
||||
"GITHUB_MODELS": "github",
|
||||
}
|
||||
provider_prefix = provider_map.get(provider, provider.lower())
|
||||
|
|
@ -277,6 +278,7 @@ async def get_search_space_llm_instance(
|
|||
"ALIBABA_QWEN": "openai",
|
||||
"MOONSHOT": "openai",
|
||||
"ZHIPU": "openai",
|
||||
"MINIMAX": "openai",
|
||||
}
|
||||
provider_prefix = provider_map.get(
|
||||
global_config["provider"], global_config["provider"].lower()
|
||||
|
|
@ -350,6 +352,7 @@ async def get_search_space_llm_instance(
|
|||
"ALIBABA_QWEN": "openai",
|
||||
"MOONSHOT": "openai",
|
||||
"ZHIPU": "openai",
|
||||
"MINIMAX": "openai",
|
||||
"GITHUB_MODELS": "github",
|
||||
}
|
||||
provider_prefix = provider_map.get(
|
||||
|
|
|
|||
|
|
@ -1,10 +1,9 @@
|
|||
import logging
|
||||
from datetime import datetime
|
||||
|
||||
from sqlalchemy import delete
|
||||
from sqlalchemy.ext.asyncio import AsyncSession
|
||||
|
||||
from app.db import Chunk, Document
|
||||
from app.db import Document
|
||||
from app.services.llm_service import get_user_long_context_llm
|
||||
from app.utils.document_converters import (
|
||||
create_document_chunks,
|
||||
|
|
@ -130,11 +129,6 @@ class NotionKBSyncService:
|
|||
summary_content = f"Notion Page: {document.document_metadata.get('page_title')}\n\n{full_content}"
|
||||
summary_embedding = embed_text(summary_content)
|
||||
|
||||
logger.debug(f"Deleting old chunks for document {document_id}")
|
||||
await self.db_session.execute(
|
||||
delete(Chunk).where(Chunk.document_id == document.id)
|
||||
)
|
||||
|
||||
logger.debug("Creating new chunks")
|
||||
chunks = await create_document_chunks(full_content)
|
||||
logger.debug(f"Created {len(chunks)} chunks")
|
||||
|
|
@ -147,7 +141,7 @@ class NotionKBSyncService:
|
|||
**document.document_metadata,
|
||||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||
}
|
||||
safe_set_chunks(document, chunks)
|
||||
await safe_set_chunks(self.db_session, document, chunks)
|
||||
document.updated_at = get_current_timestamp()
|
||||
|
||||
logger.debug("Committing changes to database")
|
||||
|
|
|
|||
|
|
@ -32,6 +32,8 @@ from app.db import (
|
|||
Report,
|
||||
SearchSpaceMembership,
|
||||
User,
|
||||
VideoPresentation,
|
||||
VideoPresentationStatus,
|
||||
)
|
||||
from app.utils.rbac import check_permission
|
||||
|
||||
|
|
@ -40,6 +42,7 @@ UI_TOOLS = {
|
|||
"link_preview",
|
||||
"generate_podcast",
|
||||
"generate_report",
|
||||
"generate_video_presentation",
|
||||
"scrape_webpage",
|
||||
"multi_link_preview",
|
||||
}
|
||||
|
|
@ -199,6 +202,8 @@ async def create_snapshot(
|
|||
podcast_ids_seen: set[int] = set()
|
||||
reports_data = []
|
||||
report_ids_seen: set[int] = set()
|
||||
video_presentations_data = []
|
||||
video_presentation_ids_seen: set[int] = set()
|
||||
|
||||
for msg in sorted(thread.messages, key=lambda m: m.created_at):
|
||||
author = await get_author_display(session, msg.author_id, user_cache)
|
||||
|
|
@ -225,6 +230,18 @@ async def create_snapshot(
|
|||
# Update status to "ready" so frontend renders PodcastPlayer
|
||||
part["result"] = {**result_data, "status": "ready"}
|
||||
|
||||
elif tool_name == "generate_video_presentation":
|
||||
result_data = part.get("result", {})
|
||||
vp_id = result_data.get("video_presentation_id")
|
||||
if vp_id and vp_id not in video_presentation_ids_seen:
|
||||
vp_info = await _get_video_presentation_for_snapshot(
|
||||
session, vp_id
|
||||
)
|
||||
if vp_info:
|
||||
video_presentations_data.append(vp_info)
|
||||
video_presentation_ids_seen.add(vp_id)
|
||||
part["result"] = {**result_data, "status": "ready"}
|
||||
|
||||
elif tool_name == "generate_report":
|
||||
result_data = part.get("result", {})
|
||||
report_id = result_data.get("report_id")
|
||||
|
|
@ -283,6 +300,7 @@ async def create_snapshot(
|
|||
"messages": messages_data,
|
||||
"podcasts": podcasts_data,
|
||||
"reports": reports_data,
|
||||
"video_presentations": video_presentations_data,
|
||||
}
|
||||
|
||||
# Create new snapshot
|
||||
|
|
@ -326,6 +344,27 @@ async def _get_podcast_for_snapshot(
|
|||
}
|
||||
|
||||
|
||||
async def _get_video_presentation_for_snapshot(
|
||||
session: AsyncSession,
|
||||
video_presentation_id: int,
|
||||
) -> dict | None:
|
||||
"""Get video presentation info for embedding in snapshot_data."""
|
||||
result = await session.execute(
|
||||
select(VideoPresentation).filter(VideoPresentation.id == video_presentation_id)
|
||||
)
|
||||
vp = result.scalars().first()
|
||||
|
||||
if not vp or vp.status != VideoPresentationStatus.READY:
|
||||
return None
|
||||
|
||||
return {
|
||||
"original_id": vp.id,
|
||||
"title": vp.title,
|
||||
"slides": vp.slides,
|
||||
"scene_codes": vp.scene_codes,
|
||||
}
|
||||
|
||||
|
||||
async def _get_report_for_snapshot(
|
||||
session: AsyncSession,
|
||||
report_id: int,
|
||||
|
|
@ -769,6 +808,31 @@ async def get_snapshot_podcast(
|
|||
return None
|
||||
|
||||
|
||||
async def get_snapshot_video_presentation(
|
||||
session: AsyncSession,
|
||||
share_token: str,
|
||||
video_presentation_id: int,
|
||||
) -> dict | None:
|
||||
"""
|
||||
Get video presentation info from a snapshot by original video presentation ID.
|
||||
|
||||
Used for rendering video presentation in public view.
|
||||
Looks up the presentation by its original_id in the snapshot's video_presentations array.
|
||||
"""
|
||||
snapshot = await get_snapshot_by_token(session, share_token)
|
||||
|
||||
if not snapshot:
|
||||
return None
|
||||
|
||||
video_presentations = snapshot.snapshot_data.get("video_presentations", [])
|
||||
|
||||
for vp in video_presentations:
|
||||
if vp.get("original_id") == video_presentation_id:
|
||||
return vp
|
||||
|
||||
return None
|
||||
|
||||
|
||||
async def get_snapshot_report(
|
||||
session: AsyncSession,
|
||||
share_token: str,
|
||||
|
|
|
|||
290
surfsense_backend/app/services/web_search_service.py
Normal file
|
|
@ -0,0 +1,290 @@
|
|||
"""
|
||||
Platform-level web search service backed by SearXNG.
|
||||
|
||||
Redis is used only for result caching (graceful degradation if unavailable).
|
||||
The circuit breaker is fully in-process — no external dependency, zero
|
||||
latency overhead.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import contextlib
|
||||
import hashlib
|
||||
import json
|
||||
import logging
|
||||
import threading
|
||||
import time
|
||||
from typing import Any
|
||||
from urllib.parse import urljoin
|
||||
|
||||
import httpx
|
||||
import redis
|
||||
|
||||
from app.config import config
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
_EMPTY_RESULT: dict[str, Any] = {
|
||||
"id": 11,
|
||||
"name": "Web Search",
|
||||
"type": "SEARXNG_API",
|
||||
"sources": [],
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Redis — used only for result caching
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_redis_client: redis.Redis | None = None
|
||||
|
||||
|
||||
def _get_redis() -> redis.Redis:
|
||||
global _redis_client
|
||||
if _redis_client is None:
|
||||
_redis_client = redis.from_url(config.REDIS_APP_URL, decode_responses=True)
|
||||
return _redis_client
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# In-process Circuit Breaker (no Redis dependency)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_CB_FAILURE_THRESHOLD = 5
|
||||
_CB_FAILURE_WINDOW_SECONDS = 60
|
||||
_CB_COOLDOWN_SECONDS = 30
|
||||
|
||||
_cb_lock = threading.Lock()
|
||||
_cb_failure_count: int = 0
|
||||
_cb_last_failure_time: float = 0.0
|
||||
_cb_open_until: float = 0.0
|
||||
|
||||
|
||||
def _circuit_is_open() -> bool:
|
||||
return time.monotonic() < _cb_open_until
|
||||
|
||||
|
||||
def _record_failure() -> None:
|
||||
global _cb_failure_count, _cb_last_failure_time, _cb_open_until
|
||||
now = time.monotonic()
|
||||
with _cb_lock:
|
||||
if now - _cb_last_failure_time > _CB_FAILURE_WINDOW_SECONDS:
|
||||
_cb_failure_count = 0
|
||||
_cb_failure_count += 1
|
||||
_cb_last_failure_time = now
|
||||
if _cb_failure_count >= _CB_FAILURE_THRESHOLD:
|
||||
_cb_open_until = now + _CB_COOLDOWN_SECONDS
|
||||
logger.warning(
|
||||
"Circuit breaker OPENED after %d failures — "
|
||||
"SearXNG calls paused for %ds",
|
||||
_cb_failure_count,
|
||||
_CB_COOLDOWN_SECONDS,
|
||||
)
|
||||
|
||||
|
||||
def _record_success() -> None:
|
||||
global _cb_failure_count, _cb_open_until
|
||||
with _cb_lock:
|
||||
_cb_failure_count = 0
|
||||
_cb_open_until = 0.0
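The module-level breaker above can be read as a small state machine: failures are counted inside a rolling window, and once the count reaches the threshold the circuit stays open until a cooldown expires. The sketch below restates that logic as a class so it can be exercised in isolation. The names `CircuitBreaker`, `FakeClock`, and the injectable `clock` parameter are illustrative assumptions and do not appear in the commit.

```python
from __future__ import annotations

import threading
import time
from typing import Callable


class CircuitBreaker:
    """Hypothetical class form of the in-process breaker (not the commit's API)."""

    def __init__(
        self,
        threshold: int = 5,
        window: float = 60.0,
        cooldown: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = threshold
        self._window = window
        self._cooldown = cooldown
        self._clock = clock  # injectable so tests need not sleep
        self._lock = threading.Lock()
        self._failures = 0
        self._last_failure = 0.0
        self._open_until = 0.0

    def is_open(self) -> bool:
        # Open means "skip the call": the cooldown deadline is still ahead.
        return self._clock() < self._open_until

    def record_failure(self) -> None:
        now = self._clock()
        with self._lock:
            # A long quiet gap since the last failure restarts the count.
            if now - self._last_failure > self._window:
                self._failures = 0
            self._failures += 1
            self._last_failure = now
            if self._failures >= self._threshold:
                self._open_until = now + self._cooldown

    def record_success(self) -> None:
        with self._lock:
            self._failures = 0
            self._open_until = 0.0


class FakeClock:
    """Manually advanced clock, used only to demonstrate the breaker."""

    def __init__(self, start: float = 1000.0) -> None:
        self.now = start

    def __call__(self) -> float:
        return self.now
```

As in the original functions, the failure count is not cleared when the cooldown ends, so one more failure inside the window reopens the circuit immediately.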
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Result Caching (Redis, graceful degradation)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_CACHE_TTL_SECONDS = 300 # 5 minutes
|
||||
_CACHE_PREFIX = "websearch:cache:"
|
||||
|
||||
|
||||
def _cache_key(query: str, engines: str | None, language: str | None) -> str:
|
||||
raw = f"{query}|{engines or ''}|{language or ''}"
|
||||
digest = hashlib.sha256(raw.encode()).hexdigest()[:24]
|
||||
return f"{_CACHE_PREFIX}{digest}"
|
||||
|
||||
|
||||
def _cache_get(key: str) -> dict | None:
|
||||
try:
|
||||
data = _get_redis().get(key)
|
||||
if data:
|
||||
return json.loads(data)
|
||||
except (redis.RedisError, json.JSONDecodeError):
|
||||
pass
|
||||
return None
|
||||
|
||||
|
||||
def _cache_set(key: str, value: dict) -> None:
|
||||
with contextlib.suppress(redis.RedisError):
|
||||
_get_redis().setex(key, _CACHE_TTL_SECONDS, json.dumps(value))
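The cache helpers above derive a key by joining request fields and hashing the result. A minimal sketch of that idea follows, assuming a hypothetical helper `make_cache_key` that accepts any number of fields. It is meant to show the property that matters for correctness: two requests share a cache entry exactly when every field that went into the key is equal.

```python
from __future__ import annotations

import hashlib


def make_cache_key(prefix: str, *parts: object) -> str:
    """Hypothetical helper: hash the joined request fields into a short key."""
    # None and "" are treated alike, mirroring the `x or ''` idiom.
    raw = "|".join("" if p is None else str(p) for p in parts)
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:24]
    return f"{prefix}{digest}"
```

Any request parameter that is left out of `parts` cannot distinguish cache entries, so callers that vary such a parameter would receive each other's cached results until the TTL expires. The plain `|` separator also means that fields which themselves contain `|` can collide. A length-prefixed or JSON encoding would avoid that, at the cost of a slightly longer key derivation.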
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Public API
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def is_available() -> bool:
|
||||
"""Return ``True`` when the platform SearXNG host is configured."""
|
||||
return bool(config.SEARXNG_DEFAULT_HOST)
|
||||
|
||||
|
||||
async def health_check() -> dict[str, Any]:
|
||||
"""Ping the SearXNG ``/healthz`` endpoint and return status info."""
|
||||
host = config.SEARXNG_DEFAULT_HOST
|
||||
if not host:
|
||||
return {"status": "unavailable", "error": "SEARXNG_DEFAULT_HOST not set"}
|
||||
|
||||
healthz_url = urljoin(host if host.endswith("/") else f"{host}/", "healthz")
|
||||
t0 = time.perf_counter()
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=5.0, verify=False) as client:
|
||||
resp = await client.get(healthz_url)
|
||||
resp.raise_for_status()
|
||||
elapsed_ms = round((time.perf_counter() - t0) * 1000)
|
||||
return {
|
||||
"status": "healthy",
|
||||
"response_time_ms": elapsed_ms,
|
||||
"circuit_breaker": "open" if _circuit_is_open() else "closed",
|
||||
}
|
||||
except Exception as exc:
|
||||
elapsed_ms = round((time.perf_counter() - t0) * 1000)
|
||||
return {
|
||||
"status": "unhealthy",
|
||||
"error": str(exc),
|
||||
"response_time_ms": elapsed_ms,
|
||||
"circuit_breaker": "open" if _circuit_is_open() else "closed",
|
||||
}
|
||||
|
||||
|
||||
async def search(
|
||||
query: str,
|
||||
top_k: int = 20,
|
||||
*,
|
||||
engines: str | None = None,
|
||||
language: str | None = None,
|
||||
safesearch: int | None = None,
|
||||
) -> tuple[dict[str, Any], list[dict[str, Any]]]:
|
||||
"""Execute a web search against the platform SearXNG instance.
|
||||
|
||||
Returns the standard ``(result_object, documents)`` tuple expected by
|
||||
``ConnectorService.search_searxng``.
|
||||
"""
|
||||
host = config.SEARXNG_DEFAULT_HOST
|
||||
if not host:
|
||||
return dict(_EMPTY_RESULT), []
|
||||
|
||||
if _circuit_is_open():
|
||||
logger.info("Web search skipped — circuit breaker is open")
|
||||
result = dict(_EMPTY_RESULT)
|
||||
result["error"] = "Web search temporarily unavailable (circuit open)"
|
||||
result["status"] = "degraded"
|
||||
return result, []
|
||||
|
||||
ck = _cache_key(f"{query}|{safesearch}|{top_k}", engines, language)
|
||||
cached = _cache_get(ck)
|
||||
if cached is not None:
|
||||
logger.debug("Web search cache HIT for query=%r", query[:60])
|
||||
return cached["result"], cached["documents"]
|
||||
|
||||
params: dict[str, Any] = {
|
||||
"q": query,
|
||||
"format": "json",
|
||||
"limit": max(1, min(top_k, 50)),
|
||||
}
|
||||
if engines:
|
||||
params["engines"] = engines
|
||||
if language:
|
||||
params["language"] = language
|
||||
if safesearch is not None and 0 <= safesearch <= 2:
|
||||
params["safesearch"] = safesearch
|
||||
|
||||
searx_endpoint = urljoin(host if host.endswith("/") else f"{host}/", "search")
|
||||
headers = {"Accept": "application/json"}
|
||||
|
||||
data: dict[str, Any] | None = None
|
||||
last_error: Exception | None = None
|
||||
|
||||
for attempt in range(2):
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=15.0, verify=False) as client:
|
||||
response = await client.get(
|
||||
searx_endpoint,
|
||||
params=params,
|
||||
headers=headers,
|
||||
)
|
||||
response.raise_for_status()
|
||||
data = response.json()
|
||||
break
|
||||
except (httpx.HTTPStatusError, httpx.TimeoutException) as exc:
|
||||
last_error = exc
|
||||
if attempt == 0 and (
|
||||
isinstance(exc, httpx.TimeoutException)
|
||||
or (
|
||||
isinstance(exc, httpx.HTTPStatusError)
|
||||
and exc.response.status_code >= 500
|
||||
)
|
||||
):
|
||||
continue
|
||||
break
|
||||
except httpx.HTTPError as exc:
|
||||
last_error = exc
|
||||
break
|
||||
except ValueError as exc:
|
||||
last_error = exc
|
||||
break
|
||||
|
||||
if data is None:
|
||||
_record_failure()
|
||||
logger.warning("Web search failed after retries: %s", last_error)
|
||||
return dict(_EMPTY_RESULT), []
|
||||
|
||||
_record_success()
|
||||
|
||||
searx_results = data.get("results", [])
|
||||
if not searx_results:
|
||||
return dict(_EMPTY_RESULT), []
|
||||
|
||||
sources_list: list[dict[str, Any]] = []
|
||||
documents: list[dict[str, Any]] = []
|
||||
|
||||
for idx, result in enumerate(searx_results):
|
||||
source_id = 200_000 + idx
|
||||
description = result.get("content") or result.get("snippet") or ""
|
||||
|
||||
sources_list.append(
|
||||
{
|
||||
"id": source_id,
|
||||
"title": result.get("title", "Web Search Result"),
|
||||
"description": description,
|
||||
"url": result.get("url", ""),
|
||||
}
|
||||
)
|
||||
|
||||
documents.append(
|
||||
{
|
||||
"chunk_id": source_id,
|
||||
"content": description or result.get("content", ""),
|
||||
"score": result.get("score", 0.0),
|
||||
"document": {
|
||||
"id": source_id,
|
||||
"title": result.get("title", "Web Search Result"),
|
||||
"document_type": "SEARXNG_API",
|
||||
"metadata": {
|
||||
"url": result.get("url", ""),
|
||||
"engines": result.get("engines", []),
|
||||
"category": result.get("category"),
|
||||
"source": "SEARXNG_API",
|
||||
},
|
||||
},
|
||||
}
|
||||
)
|
||||
|
||||
result_object: dict[str, Any] = {
|
||||
"id": 11,
|
||||
"name": "Web Search",
|
||||
"type": "SEARXNG_API",
|
||||
"sources": sources_list,
|
||||
}
|
||||
|
||||
_cache_set(ck, {"result": result_object, "documents": documents})
|
||||
|
||||
return result_object, documents
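The retry loop in `search` makes at most two attempts and only repeats the request for failures that are likely transient: a timeout or a 5xx response. Client errors (4xx), other transport errors, and JSON decoding errors end the loop immediately. The sketch below isolates that decision as a pure function. The name `should_retry` and its parameters are assumptions made for illustration, and it deliberately avoids importing `httpx` so it can run anywhere.

```python
from __future__ import annotations

MAX_ATTEMPTS = 2  # one initial try plus one retry, as in the loop above


def should_retry(attempt: int, *, timed_out: bool, status_code: int | None) -> bool:
    """Hypothetical predicate: retry only transient failures, and only once."""
    if attempt >= MAX_ATTEMPTS - 1:
        return False  # the final attempt has already been used
    if timed_out:
        return True
    # 5xx suggests a server-side problem that may clear; 4xx will not.
    return status_code is not None and status_code >= 500
```

Keeping the policy in one predicate makes it easier to see that a failed request is recorded against the circuit breaker only after the retry budget is exhausted.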
|
||||
|
|
@ -9,7 +9,6 @@ from sqlalchemy import select
|
|||
from app.agents.podcaster.graph import graph as podcaster_graph
|
||||
from app.agents.podcaster.state import State as PodcasterState
|
||||
from app.celery_app import celery_app
|
||||
from app.config import config
|
||||
from app.db import Podcast, PodcastStatus
|
||||
from app.tasks.celery_tasks import get_celery_session_maker
|
||||
|
||||
|
|
@ -29,21 +28,6 @@ if sys.platform.startswith("win"):
|
|||
# =============================================================================
|
||||
|
||||
|
||||
def _clear_generating_podcast(search_space_id: int) -> None:
|
||||
"""Clear the generating podcast marker from Redis when task completes."""
|
||||
import redis
|
||||
|
||||
try:
|
||||
client = redis.from_url(config.REDIS_APP_URL, decode_responses=True)
|
||||
key = f"podcast:generating:{search_space_id}"
|
||||
client.delete(key)
|
||||
logger.info(
|
||||
f"Cleared generating podcast key for search_space_id={search_space_id}"
|
||||
)
|
||||
except Exception as e:
|
||||
logger.warning(f"Could not clear generating podcast key: {e}")
|
||||
|
||||
|
||||
@celery_app.task(name="generate_content_podcast", bind=True)
|
||||
def generate_content_podcast_task(
|
||||
self,
|
||||
|
|
@ -75,7 +59,6 @@ def generate_content_podcast_task(
|
|||
loop.run_until_complete(_mark_podcast_failed(podcast_id))
|
||||
return {"status": "failed", "podcast_id": podcast_id}
|
||||
finally:
|
||||
_clear_generating_podcast(search_space_id)
|
||||
asyncio.set_event_loop(None)
|
||||
loop.close()
|
||||
|
||||
|
|
|
|||
|
|
@ -0,0 +1,161 @@
|
|||
"""Celery tasks for video presentation generation."""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import sys
|
||||
|
||||
from sqlalchemy import select
|
||||
|
||||
from app.agents.video_presentation.graph import graph as video_presentation_graph
|
||||
from app.agents.video_presentation.state import State as VideoPresentationState
|
||||
from app.celery_app import celery_app
|
||||
from app.db import VideoPresentation, VideoPresentationStatus
|
||||
from app.tasks.celery_tasks import get_celery_session_maker
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
if sys.platform.startswith("win"):
|
||||
try:
|
||||
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
|
||||
except AttributeError:
|
||||
logger.warning(
|
||||
"WindowsProactorEventLoopPolicy is unavailable; async subprocess support may fail."
|
||||
)
|
||||
|
||||
|
||||
@celery_app.task(name="generate_video_presentation", bind=True)
|
||||
def generate_video_presentation_task(
|
||||
self,
|
||||
video_presentation_id: int,
|
||||
source_content: str,
|
||||
search_space_id: int,
|
||||
user_prompt: str | None = None,
|
||||
) -> dict:
|
||||
"""
|
||||
Celery task to generate video presentation from source content.
|
||||
Updates existing video presentation record created by the tool.
|
||||
"""
|
||||
loop = asyncio.new_event_loop()
|
||||
asyncio.set_event_loop(loop)
|
||||
|
||||
try:
|
||||
result = loop.run_until_complete(
|
||||
_generate_video_presentation(
|
||||
video_presentation_id,
|
||||
source_content,
|
||||
search_space_id,
|
||||
user_prompt,
|
||||
)
|
||||
)
|
||||
loop.run_until_complete(loop.shutdown_asyncgens())
|
||||
return result
|
||||
except Exception as e:
|
||||
logger.error(f"Error generating video presentation: {e!s}")
|
||||
loop.run_until_complete(_mark_video_presentation_failed(video_presentation_id))
|
||||
return {"status": "failed", "video_presentation_id": video_presentation_id}
|
||||
finally:
|
||||
asyncio.set_event_loop(None)
|
||||
loop.close()
|
||||
|
||||
|
||||
async def _mark_video_presentation_failed(video_presentation_id: int) -> None:
|
||||
"""Mark a video presentation as failed in the database."""
|
||||
async with get_celery_session_maker()() as session:
|
||||
try:
|
||||
result = await session.execute(
|
||||
select(VideoPresentation).filter(
|
||||
VideoPresentation.id == video_presentation_id
|
||||
)
|
||||
)
|
||||
video_pres = result.scalars().first()
|
||||
if video_pres:
|
||||
video_pres.status = VideoPresentationStatus.FAILED
|
||||
await session.commit()
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to mark video presentation as failed: {e}")
|
||||
|
||||
|
||||
async def _generate_video_presentation(
|
||||
video_presentation_id: int,
|
||||
source_content: str,
|
||||
search_space_id: int,
|
||||
user_prompt: str | None = None,
|
||||
) -> dict:
|
||||
"""Generate video presentation and update existing record."""
|
||||
async with get_celery_session_maker()() as session:
|
||||
result = await session.execute(
|
||||
select(VideoPresentation).filter(
|
||||
VideoPresentation.id == video_presentation_id
|
||||
)
|
||||
)
|
||||
video_pres = result.scalars().first()
|
||||
|
||||
if not video_pres:
|
||||
raise ValueError(f"VideoPresentation {video_presentation_id} not found")
|
||||
|
||||
try:
|
||||
video_pres.status = VideoPresentationStatus.GENERATING
|
||||
await session.commit()
|
||||
|
||||
graph_config = {
|
||||
"configurable": {
|
||||
"video_title": video_pres.title,
|
||||
"search_space_id": search_space_id,
|
||||
"user_prompt": user_prompt,
|
||||
}
|
||||
}
|
||||
|
||||
initial_state = VideoPresentationState(
|
||||
source_content=source_content,
|
||||
db_session=session,
|
||||
)
|
||||
|
||||
graph_result = await video_presentation_graph.ainvoke(
|
||||
initial_state, config=graph_config
|
||||
)
|
||||
|
||||
# Serialize slides (parsed content + audio info merged)
|
||||
slides_raw = graph_result.get("slides", [])
|
||||
audio_results_raw = graph_result.get("slide_audio_results", [])
|
||||
scene_codes_raw = graph_result.get("slide_scene_codes", [])
|
||||
|
||||
audio_map = {}
|
||||
for ar in audio_results_raw:
|
||||
data = ar.model_dump() if hasattr(ar, "model_dump") else ar
|
||||
audio_map[data.get("slide_number", 0)] = data
|
||||
|
||||
serializable_slides = []
|
||||
for slide in slides_raw:
|
||||
slide_data = (
|
||||
slide.model_dump() if hasattr(slide, "model_dump") else dict(slide)
|
||||
)
|
||||
audio_data = audio_map.get(slide_data.get("slide_number", 0), {})
|
||||
slide_data["audio_file"] = audio_data.get("audio_file")
|
||||
slide_data["duration_seconds"] = audio_data.get("duration_seconds")
|
||||
slide_data["duration_in_frames"] = audio_data.get("duration_in_frames")
|
||||
serializable_slides.append(slide_data)
|
||||
|
||||
serializable_scene_codes = []
|
||||
for sc in scene_codes_raw:
|
||||
sc_data = sc.model_dump() if hasattr(sc, "model_dump") else dict(sc)
|
||||
serializable_scene_codes.append(sc_data)
|
||||
|
||||
video_pres.slides = serializable_slides
|
||||
video_pres.scene_codes = serializable_scene_codes
|
||||
video_pres.status = VideoPresentationStatus.READY
|
||||
await session.commit()
|
||||
|
||||
logger.info(f"Successfully generated video presentation: {video_pres.id}")
|
||||
|
||||
return {
|
||||
"status": "ready",
|
||||
"video_presentation_id": video_pres.id,
|
||||
"title": video_pres.title,
|
||||
"slide_count": len(serializable_slides),
|
||||
}
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Error in _generate_video_presentation: {e!s}")
|
||||
video_pres.status = VideoPresentationStatus.FAILED
|
||||
await session.commit()
|
||||
raise
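The serialization step in `_generate_video_presentation` merges each slide with the audio result that carries the same `slide_number`, accepting either Pydantic-style objects (anything exposing `model_dump`) or plain mappings. A minimal sketch of that merge as a standalone function is shown below. The names `to_plain_dict` and `merge_slides_with_audio` are illustrative and are not part of the commit.

```python
from __future__ import annotations

from typing import Any, Iterable

AUDIO_FIELDS = ("audio_file", "duration_seconds", "duration_in_frames")


def to_plain_dict(item: Any) -> dict[str, Any]:
    """Accept a Pydantic-style model or a mapping and return a new dict."""
    if hasattr(item, "model_dump"):
        return item.model_dump()
    return dict(item)


def merge_slides_with_audio(
    slides: Iterable[Any], audio_results: Iterable[Any]
) -> list[dict[str, Any]]:
    """Hypothetical helper: attach audio info to slides by slide_number."""
    audio_by_number: dict[Any, dict[str, Any]] = {}
    for audio in audio_results:
        data = to_plain_dict(audio)
        audio_by_number[data.get("slide_number", 0)] = data

    merged: list[dict[str, Any]] = []
    for slide in slides:
        slide_data = to_plain_dict(slide)
        audio = audio_by_number.get(slide_data.get("slide_number", 0), {})
        # Missing audio yields None values rather than a KeyError.
        for field in AUDIO_FIELDS:
            slide_data[field] = audio.get(field)
        merged.append(slide_data)
    return merged
```

Because `to_plain_dict` copies mappings, the input slides are left unmodified, and a slide without a matching audio result still serializes with the three audio fields set to `None`.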
|
||||
|
|
@ -613,6 +613,41 @@ async def _stream_agent_events(
|
|||
status="completed",
|
||||
items=completed_items,
|
||||
)
|
||||
elif tool_name == "generate_video_presentation":
|
||||
vp_status = (
|
||||
tool_output.get("status", "unknown")
|
||||
if isinstance(tool_output, dict)
|
||||
else "unknown"
|
||||
)
|
||||
vp_title = (
|
||||
tool_output.get("title", "Presentation")
|
||||
if isinstance(tool_output, dict)
|
||||
else "Presentation"
|
||||
)
|
||||
if vp_status in ("pending", "generating"):
|
||||
completed_items = [
|
||||
f"Title: {vp_title}",
|
||||
"Presentation generation started",
|
||||
"Processing in background...",
|
||||
]
|
||||
elif vp_status == "failed":
|
||||
error_msg = (
|
||||
tool_output.get("error", "Unknown error")
|
||||
if isinstance(tool_output, dict)
|
||||
else "Unknown error"
|
||||
)
|
||||
completed_items = [
|
||||
f"Title: {vp_title}",
|
||||
f"Error: {str(error_msg)[:50]}",
|
||||
]
|
||||
else:
|
||||
completed_items = last_active_step_items
|
||||
yield streaming_service.format_thinking_step(
|
||||
step_id=original_step_id,
|
||||
title="Generating video presentation",
|
||||
status="completed",
|
||||
items=completed_items,
|
||||
)
|
||||
elif tool_name == "generate_report":
|
||||
report_status = (
|
||||
tool_output.get("status", "unknown")
|
||||
|
|
@ -756,6 +791,34 @@ async def _stream_agent_events(
|
|||
f"Podcast generation failed: {error_msg}",
|
||||
"error",
|
||||
)
|
||||
elif tool_name == "generate_video_presentation":
|
||||
yield streaming_service.format_tool_output_available(
|
||||
tool_call_id,
|
||||
tool_output
|
||||
if isinstance(tool_output, dict)
|
||||
else {"result": tool_output},
|
||||
)
|
||||
if (
|
||||
isinstance(tool_output, dict)
|
||||
and tool_output.get("status") == "pending"
|
||||
):
|
||||
yield streaming_service.format_terminal_info(
|
||||
f"Video presentation queued: {tool_output.get('title', 'Presentation')}",
|
||||
"success",
|
||||
)
|
||||
elif (
|
||||
isinstance(tool_output, dict)
|
||||
and tool_output.get("status") == "failed"
|
||||
):
|
||||
error_msg = (
|
||||
tool_output.get("error", "Unknown error")
|
||||
if isinstance(tool_output, dict)
|
||||
else "Unknown error"
|
||||
)
|
||||
yield streaming_service.format_terminal_info(
|
||||
f"Presentation generation failed: {error_msg}",
|
||||
"error",
|
||||
)
|
||||
elif tool_name == "link_preview":
|
||||
yield streaming_service.format_tool_output_available(
|
||||
tool_call_id,
|
||||
|
|
|
|||
|
|
@ -432,7 +432,7 @@ async def index_airtable_records(
|
|||
"table_name": item["table_name"],
|
||||
"connector_id": connector_id,
|
||||
}
|
||||
safe_set_chunks(document, chunks)
|
||||
await safe_set_chunks(session, document, chunks)
|
||||
document.updated_at = get_current_timestamp()
|
||||
document.status = DocumentStatus.ready()
|
||||
|
||||
|
|
|
|||
|
|
@ -28,45 +28,35 @@ def get_current_timestamp() -> datetime:
|
|||
return datetime.now(UTC)
|
||||
|
||||
|
||||
def safe_set_chunks(document: Document, chunks: list) -> None:
|
||||
async def safe_set_chunks(
|
||||
session: "AsyncSession", document: Document, chunks: list
|
||||
) -> None:
|
||||
"""
|
||||
Safely assign chunks to a document without triggering lazy loading.
|
||||
Delete old chunks and assign new ones to a document.
|
||||
|
||||
ALWAYS use this instead of `document.chunks = chunks` to avoid
|
||||
SQLAlchemy async errors (MissingGreenlet / greenlet_spawn).
|
||||
|
||||
Why this is needed:
|
||||
- Direct assignment `document.chunks = chunks` triggers SQLAlchemy to
|
||||
load the OLD chunks first (for comparison/orphan detection)
|
||||
- This lazy loading fails in async context with asyncpg driver
|
||||
- set_committed_value bypasses this by setting the value directly
|
||||
|
||||
This function is safe regardless of how the document was loaded
|
||||
(with or without selectinload).
|
||||
This replaces direct ``document.chunks = chunks`` which triggers lazy
|
||||
loading (and MissingGreenlet errors in async contexts). It also
|
||||
explicitly deletes pre-existing chunks so they don't accumulate across
|
||||
repeated re-indexes — ``set_committed_value`` bypasses SQLAlchemy's
|
||||
delete-orphan cascade.
|
||||
|
||||
Args:
|
||||
document: The Document object to update
|
||||
chunks: List of Chunk objects to assign
|
||||
|
||||
Example:
|
||||
# Instead of: document.chunks = chunks (DANGEROUS!)
|
||||
safe_set_chunks(document, chunks) # Always safe
|
||||
session: The current async database session.
|
||||
document: The Document object to update.
|
||||
chunks: List of Chunk objects to assign.
|
||||
"""
|
||||
from sqlalchemy.orm import object_session
|
||||
from sqlalchemy import delete
|
||||
from sqlalchemy.orm.attributes import set_committed_value
|
||||
|
||||
# Keep relationship assignment lazy-load-safe.
|
||||
set_committed_value(document, "chunks", chunks)
|
||||
from app.db import Chunk
|
||||
|
||||
# Ensure chunk rows are actually persisted.
|
||||
# set_committed_value bypasses normal unit-of-work tracking, so we need to
|
||||
# explicitly attach chunk objects to the current session.
|
||||
session = object_session(document)
|
||||
if session is not None:
|
||||
if document.id is not None:
|
||||
for chunk in chunks:
|
||||
chunk.document_id = document.id
|
||||
session.add_all(chunks)
|
||||
if document.id is not None:
|
||||
await session.execute(delete(Chunk).where(Chunk.document_id == document.id))
|
||||
for chunk in chunks:
|
||||
chunk.document_id = document.id
|
||||
|
||||
set_committed_value(document, "chunks", chunks)
|
||||
session.add_all(chunks)
|
||||
|
||||
|
||||
def parse_date_flexible(date_str: str) -> datetime:
|
||||
|
|
|
|||
|
|
@ -430,7 +430,7 @@ async def index_bookstack_pages(
|
|||
document.content_hash = item["content_hash"]
|
||||
document.embedding = summary_embedding
|
||||
document.document_metadata = doc_metadata
|
||||
safe_set_chunks(document, chunks)
|
||||
await safe_set_chunks(session, document, chunks)
|
||||
document.updated_at = get_current_timestamp()
|
||||
document.status = DocumentStatus.ready()
|
||||
|
||||
|
|
|
|||
|
|
@ -439,7 +439,7 @@ async def index_clickup_tasks(
|
|||
"connector_id": connector_id,
|
||||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||
}
|
||||
safe_set_chunks(document, chunks)
|
||||
await safe_set_chunks(session, document, chunks)
|
||||
document.updated_at = get_current_timestamp()
|
||||
document.status = DocumentStatus.ready()
|
||||
|
||||
|
|
|
|||
|
|
@ -413,7 +413,7 @@ async def index_confluence_pages(
|
|||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||
"connector_id": connector_id,
|
||||
}
|
||||
safe_set_chunks(document, chunks)
|
||||
await safe_set_chunks(session, document, chunks)
|
||||
document.updated_at = get_current_timestamp()
|
||||
document.status = DocumentStatus.ready()
|
||||
|
||||
|
|
|
|||
|
|
@ -690,7 +690,7 @@ async def index_discord_messages(
|
|||
"indexed_at": datetime.now(UTC).strftime("%Y-%m-%d %H:%M:%S"),
|
||||
"connector_id": connector_id,
|
||||
}
|
||||
safe_set_chunks(document, chunks)
|
||||
await safe_set_chunks(session, document, chunks)
|
||||
document.updated_at = get_current_timestamp()
|
||||
document.status = DocumentStatus.ready()
|
||||
|
||||
|
|
|
|||
|
|
@ -386,7 +386,7 @@ async def index_elasticsearch_documents(
|
|||
document.content_hash = item["content_hash"]
|
||||
document.unique_identifier_hash = item["unique_identifier_hash"]
|
||||
document.document_metadata = metadata
|
||||
safe_set_chunks(document, chunks)
|
||||
await safe_set_chunks(session, document, chunks)
|
||||
document.updated_at = get_current_timestamp()
|
||||
document.status = DocumentStatus.ready()
|
||||
|
||||
|
|
|
|||
|
|
@ -415,7 +415,7 @@ async def index_github_repos(
|
|||
document.content_hash = item["content_hash"]
|
||||
document.embedding = summary_embedding
|
||||
document.document_metadata = doc_metadata
|
||||
safe_set_chunks(document, chunks_data)
|
||||
await safe_set_chunks(session, document, chunks_data)
|
||||
document.updated_at = get_current_timestamp()
|
||||
document.status = DocumentStatus.ready()
|
||||
|
||||
|
|
|
|||
|
|
@ -528,7 +528,7 @@ async def index_google_calendar_events(
|
|||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||
"connector_id": connector_id,
|
||||
}
|
||||
safe_set_chunks(document, chunks)
|
||||
await safe_set_chunks(session, document, chunks)
|
||||
document.updated_at = get_current_timestamp()
|
||||
document.status = DocumentStatus.ready()
|
||||
|
||||
|
|
|
|||
|
|
@ -451,7 +451,7 @@ async def index_google_gmail_messages(
|
|||
"date": item["date_str"],
|
||||
"connector_id": connector_id,
|
||||
}
|
||||
safe_set_chunks(document, chunks)
|
||||
await safe_set_chunks(session, document, chunks)
|
||||
document.updated_at = get_current_timestamp()
|
||||
document.status = DocumentStatus.ready()
|
||||
|
||||
|
|
|
|||
|
|
@ -393,7 +393,7 @@ async def index_jira_issues(
|
|||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||
"connector_id": connector_id,
|
||||
}
|
||||
safe_set_chunks(document, chunks)
|
||||
await safe_set_chunks(session, document, chunks)
|
||||
document.updated_at = get_current_timestamp()
|
||||
document.status = DocumentStatus.ready()
|
||||
|
||||
|
|
|
|||
|
|
@ -431,7 +431,7 @@ async def index_linear_issues(
|
|||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||
"connector_id": connector_id,
|
||||
}
|
||||
safe_set_chunks(document, chunks)
|
||||
await safe_set_chunks(session, document, chunks)
|
||||
document.updated_at = get_current_timestamp()
|
||||
document.status = DocumentStatus.ready()
|
||||
|
||||
|
|
|
|||
|
|
@ -488,7 +488,7 @@ async def index_luma_events(
|
|||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||
"connector_id": connector_id,
|
||||
}
|
||||
safe_set_chunks(document, chunks)
|
||||
await safe_set_chunks(session, document, chunks)
|
||||
document.updated_at = get_current_timestamp()
|
||||
document.status = DocumentStatus.ready()
|
||||
|
||||
|
|
|
|||
|
|
@ -479,7 +479,7 @@ async def index_notion_pages(
|
|||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||
"connector_id": connector_id,
|
||||
}
|
||||
safe_set_chunks(document, chunks)
|
||||
await safe_set_chunks(session, document, chunks)
|
||||
document.updated_at = get_current_timestamp()
|
||||
document.status = DocumentStatus.ready()
|
||||
|
||||
|
|
|
|||
|
|
@ -571,7 +571,7 @@ async def index_obsidian_vault(
|
|||
document.content_hash = content_hash
|
||||
document.embedding = embedding
|
||||
document.document_metadata = document_metadata
|
||||
safe_set_chunks(document, chunks)
|
||||
await safe_set_chunks(session, document, chunks)
|
||||
document.updated_at = get_current_timestamp()
|
||||
document.status = DocumentStatus.ready()
|
||||
|
||||
|
|
|
|||
|
|
@ -564,7 +564,7 @@ async def index_slack_messages(
|
|||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||
"connector_id": connector_id,
|
||||
}
|
||||
safe_set_chunks(document, chunks)
|
||||
await safe_set_chunks(session, document, chunks)
|
||||
document.updated_at = get_current_timestamp()
|
||||
document.status = DocumentStatus.ready()
|
||||
|
||||
|
|
|
|||
|
|
@ -603,7 +603,7 @@ async def index_teams_messages(
|
|||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||
"connector_id": connector_id,
|
||||
}
|
||||
safe_set_chunks(document, chunks)
|
||||
await safe_set_chunks(session, document, chunks)
|
||||
document.updated_at = get_current_timestamp()
|
||||
document.status = DocumentStatus.ready()
|
||||
|
||||
|
|
|
|||
|
|
@ -410,7 +410,7 @@ async def index_crawled_urls(
|
|||
"indexed_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
|
||||
"connector_id": connector_id,
|
||||
}
|
||||
safe_set_chunks(document, chunks)
|
||||
await safe_set_chunks(session, document, chunks)
|
||||
document.status = DocumentStatus.ready() # READY status
|
||||
document.updated_at = get_current_timestamp()
|
||||
|
||||
|
|
|
|||
|
|
@ -14,45 +14,35 @@ from app.db import Document
|
|||
md = MarkdownifyTransformer()
|
||||
|
||||
|
||||
def safe_set_chunks(document: Document, chunks: list) -> None:
|
||||
async def safe_set_chunks(
|
||||
session: "AsyncSession", document: Document, chunks: list
|
||||
) -> None:
|
||||
"""
|
||||
Safely assign chunks to a document without triggering lazy loading.
|
||||
Delete old chunks and assign new ones to a document.
|
||||
|
||||
ALWAYS use this instead of `document.chunks = chunks` to avoid
|
||||
SQLAlchemy async errors (MissingGreenlet / greenlet_spawn).
|
||||
|
||||
Why this is needed:
|
||||
- Direct assignment `document.chunks = chunks` triggers SQLAlchemy to
|
||||
load the OLD chunks first (for comparison/orphan detection)
|
||||
- This lazy loading fails in async context with asyncpg driver
|
||||
- set_committed_value bypasses this by setting the value directly
|
||||
|
||||
This function is safe regardless of how the document was loaded
|
||||
(with or without selectinload).
|
||||
This replaces direct ``document.chunks = chunks`` which triggers lazy
|
||||
loading (and MissingGreenlet errors in async contexts). It also
|
||||
explicitly deletes pre-existing chunks so they don't accumulate across
|
||||
repeated re-indexes — ``set_committed_value`` bypasses SQLAlchemy's
|
||||
delete-orphan cascade.
|
||||
|
||||
Args:
|
||||
document: The Document object to update
|
||||
chunks: List of Chunk objects to assign
|
||||
|
||||
Example:
|
||||
# Instead of: document.chunks = chunks (DANGEROUS!)
|
||||
safe_set_chunks(document, chunks) # Always safe
|
||||
session: The current async database session.
|
||||
document: The Document object to update.
|
||||
chunks: List of Chunk objects to assign.
|
||||
"""
|
||||
from sqlalchemy.orm import object_session
|
||||
from sqlalchemy import delete
|
||||
from sqlalchemy.orm.attributes import set_committed_value
|
||||
|
||||
# Keep relationship assignment lazy-load-safe.
|
||||
set_committed_value(document, "chunks", chunks)
|
||||
from app.db import Chunk
|
||||
|
||||
# Ensure chunk rows are actually persisted.
|
||||
# set_committed_value bypasses normal unit-of-work tracking, so we need to
|
||||
# explicitly attach chunk objects to the current session.
|
||||
session = object_session(document)
|
||||
if session is not None:
|
||||
if document.id is not None:
|
||||
for chunk in chunks:
|
||||
chunk.document_id = document.id
|
||||
session.add_all(chunks)
|
||||
if document.id is not None:
|
||||
await session.execute(delete(Chunk).where(Chunk.document_id == document.id))
|
||||
for chunk in chunks:
|
||||
chunk.document_id = document.id
|
||||
|
||||
set_committed_value(document, "chunks", chunks)
|
||||
session.add_all(chunks)
|
||||
|
||||
|
||||
def get_current_timestamp() -> datetime:
|
||||
|
|
|
|||
|
|
@ -227,7 +227,7 @@ async def add_circleback_meeting_document(
|
|||
if summary_embedding is not None:
|
||||
document.embedding = summary_embedding
|
||||
document.document_metadata = document_metadata
|
||||
safe_set_chunks(document, chunks)
|
||||
await safe_set_chunks(session, document, chunks)
|
||||
document.source_markdown = markdown_content
|
||||
document.content_needs_reindexing = False
|
||||
document.updated_at = get_current_timestamp()
|
||||
|
|
|
|||
|
|
@ -21,6 +21,7 @@ from app.utils.document_converters import (
|
|||
from .base import (
|
||||
check_document_by_unique_identifier,
|
||||
get_current_timestamp,
|
||||
safe_set_chunks,
|
||||
)
|
||||
|
||||
|
||||
|
|
@ -154,7 +155,7 @@ async def add_extension_received_document(
|
|||
existing_document.content_hash = content_hash
|
||||
existing_document.embedding = summary_embedding
|
||||
existing_document.document_metadata = content.metadata.model_dump()
|
||||
existing_document.chunks = chunks
|
||||
await safe_set_chunks(session, existing_document, chunks)
|
||||
existing_document.source_markdown = combined_document_string
|
||||
existing_document.updated_at = get_current_timestamp()
|
||||
|
||||
|
|
|
|||
|
|
@ -35,6 +35,7 @@ from .base import (
|
|||
check_document_by_unique_identifier,
|
||||
check_duplicate_document,
|
||||
get_current_timestamp,
|
||||
safe_set_chunks,
|
||||
)
|
||||
from .markdown_processor import add_received_markdown_file_document
|
||||
|
||||
|
|
@ -488,7 +489,7 @@ async def add_received_file_document_using_unstructured(
|
|||
"FILE_NAME": file_name,
|
||||
"ETL_SERVICE": "UNSTRUCTURED",
|
||||
}
|
||||
existing_document.chunks = chunks
|
||||
await safe_set_chunks(session, existing_document, chunks)
|
||||
existing_document.source_markdown = file_in_markdown
|
||||
existing_document.content_needs_reindexing = False
|
||||
existing_document.updated_at = get_current_timestamp()
|
||||
|
|
@ -622,7 +623,7 @@ async def add_received_file_document_using_llamacloud(
|
|||
"FILE_NAME": file_name,
|
||||
"ETL_SERVICE": "LLAMACLOUD",
|
||||
}
|
||||
existing_document.chunks = chunks
|
||||
await safe_set_chunks(session, existing_document, chunks)
|
||||
existing_document.source_markdown = file_in_markdown
|
||||
existing_document.content_needs_reindexing = False
|
||||
existing_document.updated_at = get_current_timestamp()
|
||||
|
|
@ -777,7 +778,7 @@ async def add_received_file_document_using_docling(
|
|||
"FILE_NAME": file_name,
|
||||
"ETL_SERVICE": "DOCLING",
|
||||
}
|
||||
existing_document.chunks = chunks
|
||||
await safe_set_chunks(session, existing_document, chunks)
|
||||
existing_document.source_markdown = file_in_markdown
|
||||
existing_document.content_needs_reindexing = False
|
||||
existing_document.updated_at = get_current_timestamp()
|
||||
|
|
|
|||
|
|
@ -21,6 +21,7 @@ from .base import (
|
|||
check_document_by_unique_identifier,
|
||||
check_duplicate_document,
|
||||
get_current_timestamp,
|
||||
safe_set_chunks,
|
||||
)
|
||||
|
||||
|
||||
|
|
@ -258,7 +259,7 @@ async def add_received_markdown_file_document(
|
|||
existing_document.document_metadata = {
|
||||
"FILE_NAME": file_name,
|
||||
}
|
||||
existing_document.chunks = chunks
|
||||
await safe_set_chunks(session, existing_document, chunks)
|
||||
existing_document.source_markdown = file_in_markdown
|
||||
existing_document.updated_at = get_current_timestamp()
|
||||
existing_document.status = DocumentStatus.ready() # Mark as ready
|
||||
|
|
|
|||
|
|
@ -419,7 +419,7 @@ async def add_youtube_video_document(
|
|||
"author": video_data.get("author_name", "Unknown"),
|
||||
"thumbnail": video_data.get("thumbnail_url", ""),
|
||||
}
|
||||
safe_set_chunks(document, chunks)
|
||||
await safe_set_chunks(session, document, chunks)
|
||||
document.source_markdown = combined_document_string
|
||||
document.status = DocumentStatus.ready() # READY status - fully processed
|
||||
document.updated_at = get_current_timestamp()
|
||||
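Reviewer note: the call in this hunk changes from `safe_set_chunks(document, chunks)` to `await safe_set_chunks(session, document, chunks)`. Whether the helper was synchronous before this change is not visible here. The minimal sketch below only illustrates the general Python behaviour that makes the `await` matter once a helper is a coroutine function; `set_chunks` and the dict-based document are hypothetical stand-ins, not the project's code.

```python
import asyncio

calls = []


async def set_chunks(document: dict, chunks: list) -> None:
    """Hypothetical stand-in for an async helper such as safe_set_chunks."""
    calls.append("ran")
    document["chunks"] = chunks


async def main() -> dict:
    document = {"chunks": []}

    # Without await: only a coroutine object is created; the body never runs.
    pending = set_chunks(document, ["a"])
    unawaited_result = list(document["chunks"])
    pending.close()  # suppress the "coroutine was never awaited" warning

    # With await: the body runs and the document is updated.
    await set_chunks(document, ["a", "b"])
    return {"unawaited": unawaited_result, "awaited": document["chunks"]}


result = asyncio.run(main())
print(result)
```

If the helper is async, a call without `await` would silently leave the document's chunks unchanged, which is why the call site and the helper signature have to change together.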
|
|
|
|||
|
|
@ -9,9 +9,10 @@ import re
|
|||
from datetime import UTC, datetime
|
||||
from pathlib import Path
|
||||
|
||||
from sqlalchemy import select
|
||||
from sqlalchemy import delete as sa_delete, select
|
||||
from sqlalchemy.ext.asyncio import AsyncSession
|
||||
from sqlalchemy.orm import selectinload
|
||||
from sqlalchemy.orm.attributes import set_committed_value
|
||||
|
||||
from app.config import config
|
||||
from app.db import SurfsenseDocsChunk, SurfsenseDocsDocument, async_session_maker
|
||||
|
|
@ -19,6 +20,24 @@ from app.utils.document_converters import embed_text
|
|||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
async def _safe_set_docs_chunks(
|
||||
session: AsyncSession, document: SurfsenseDocsDocument, chunks: list
|
||||
) -> None:
|
||||
"""safe_set_chunks variant for the SurfsenseDocsDocument/Chunk models."""
|
||||
if document.id is not None:
|
||||
await session.execute(
|
||||
sa_delete(SurfsenseDocsChunk).where(
|
||||
SurfsenseDocsChunk.document_id == document.id
|
||||
)
|
||||
)
|
||||
for chunk in chunks:
|
||||
chunk.document_id = document.id
|
||||
|
||||
set_committed_value(document, "chunks", chunks)
|
||||
session.add_all(chunks)
|
||||
|
||||
|
||||
# Path to docs relative to project root
|
||||
DOCS_DIR = (
|
||||
Path(__file__).resolve().parent.parent.parent.parent
|
||||
|
|
@ -156,7 +175,7 @@ async def index_surfsense_docs(session: AsyncSession) -> tuple[int, int, int, in
|
|||
existing_doc.content = content
|
||||
existing_doc.content_hash = content_hash
|
||||
existing_doc.embedding = embed_text(content)
|
||||
existing_doc.chunks = chunks
|
||||
await _safe_set_docs_chunks(session, existing_doc, chunks)
|
||||
existing_doc.updated_at = datetime.now(UTC)
|
||||
|
||||
updated += 1
|
||||
|
|
|
|||
|
|
@ -6,6 +6,9 @@ requires-python = ">=3.12"
|
|||
dependencies = [
|
||||
"alembic>=1.13.0",
|
||||
"asyncpg>=0.30.0",
|
||||
"authlib>=1.6.9",
|
||||
"PyJWT>=2.12.0",
|
||||
"tornado>=6.5.5",
|
||||
"datasets>=2.21.0",
|
||||
"pyarrow>=15.0.0,<19.0.0",
|
||||
"discord-py>=2.5.2",
|
||||
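Reviewer note: the three specifiers shown in this hunk (`authlib>=1.6.9`, `PyJWT>=2.12.0`, `tornado>=6.5.5`) should be matched by the versions recorded in `uv.lock` later in this diff. The sketch below is one way to sanity-check that by hand. It assumes plain `>=X.Y.Z` specifiers and purely numeric versions, and the helper names are illustrative; a real check would more likely rely on the `packaging` library or on `uv` itself.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.12.1' into (2, 12, 1). Assumes purely numeric components."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(specifier: str, locked: str) -> bool:
    """Check a locked version against a simple '>=X.Y.Z' specifier."""
    match = re.fullmatch(r">=\s*([0-9]+(?:\.[0-9]+)*)", specifier)
    if match is None:
        raise ValueError(f"unsupported specifier: {specifier!r}")
    return parse_version(locked) >= parse_version(match.group(1))


# (specifier from pyproject.toml, old locked version, new locked version),
# all copied from the hunks in this diff.
pins = {
    "authlib": (">=1.6.9", "1.6.8", "1.6.9"),
    "pyjwt": (">=2.12.0", "2.11.0", "2.12.1"),
    "tornado": (">=6.5.5", "6.5.4", "6.5.5"),
}

old_ok = {name: satisfies_minimum(spec, old) for name, (spec, old, _) in pins.items()}
new_ok = {name: satisfies_minimum(spec, new) for name, (spec, _, new) in pins.items()}
print(old_ok)
print(new_ok)
```

Under these assumptions the previously locked versions fail the new minimums and the newly locked ones pass, which is consistent with the lock file being regenerated in the same commit.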
|
|
|
|||
47
surfsense_backend/uv.lock
generated
|
|
@ -413,14 +413,14 @@ wheels = [
|
|||
|
||||
[[package]]
|
||||
name = "authlib"
|
||||
version = "1.6.8"
|
||||
version = "1.6.9"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "cryptography" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/6b/6c/c88eac87468c607f88bc24df1f3b31445ee6fc9ba123b09e666adf687cd9/authlib-1.6.8.tar.gz", hash = "sha256:41ae180a17cf672bc784e4a518e5c82687f1fe1e98b0cafaeda80c8e4ab2d1cb", size = 165074 }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/af/98/00d3dd826d46959ad8e32af2dbb2398868fd9fd0683c26e56d0789bd0e68/authlib-1.6.9.tar.gz", hash = "sha256:d8f2421e7e5980cc1ddb4e32d3f5fa659cfaf60d8eaf3281ebed192e4ab74f04", size = 165134 }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/9b/73/f7084bf12755113cd535ae586782ff3a6e710bfbe6a0d13d1c2f81ffbbfa/authlib-1.6.8-py2.py3-none-any.whl", hash = "sha256:97286fd7a15e6cfefc32771c8ef9c54f0ed58028f1322de6a2a7c969c3817888", size = 244116 },
|
||||
{ url = "https://files.pythonhosted.org/packages/53/23/b65f568ed0c22f1efacb744d2db1a33c8068f384b8c9b482b52ebdbc3ef6/authlib-1.6.9-py2.py3-none-any.whl", hash = "sha256:f08b4c14e08f0861dc18a32357b33fbcfd2ea86cfe3fe149484b4d764c4a0ac3", size = 244197 },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
|
|
@ -6353,11 +6353,11 @@ wheels = [
|
|||
|
||||
[[package]]
|
||||
name = "pyjwt"
|
||||
version = "2.11.0"
|
||||
version = "2.12.1"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/5c/5a/b46fa56bf322901eee5b0454a34343cdbdae202cd421775a8ee4e42fd519/pyjwt-2.11.0.tar.gz", hash = "sha256:35f95c1f0fbe5d5ba6e43f00271c275f7a1a4db1dab27bf708073b75318ea623", size = 98019 }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/c2/27/a3b6e5bf6ff856d2509292e95c8f57f0df7017cf5394921fc4e4ef40308a/pyjwt-2.12.1.tar.gz", hash = "sha256:c74a7a2adf861c04d002db713dd85f84beb242228e671280bf709d765b03672b", size = 102564 }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/6f/01/c26ce75ba460d5cd503da9e13b21a33804d38c2165dec7b716d06b13010c/pyjwt-2.11.0-py3-none-any.whl", hash = "sha256:94a6bde30eb5c8e04fee991062b534071fd1439ef58d2adc9ccb823e7bcd0469", size = 28224 },
|
||||
{ url = "https://files.pythonhosted.org/packages/e5/7a/8dd906bd22e79e47397a61742927f6747fe93242ef86645ee9092e610244/pyjwt-2.12.1-py3-none-any.whl", hash = "sha256:28ca37c070cad8ba8cd9790cd940535d40274d22f80ab87f3ac6a713e6e8454c", size = 29726 },
|
||||
]
|
||||
|
||||
[package.optional-dependencies]
|
||||
|
|
@ -7854,6 +7854,7 @@ source = { editable = "." }
|
|||
dependencies = [
|
||||
{ name = "alembic" },
|
||||
{ name = "asyncpg" },
|
||||
{ name = "authlib" },
|
||||
{ name = "boto3" },
|
||||
{ name = "celery", extra = ["redis"] },
|
||||
{ name = "chonkie", extra = ["all"] },
|
||||
|
|
@ -7894,6 +7895,7 @@ dependencies = [
|
|||
{ name = "playwright" },
|
||||
{ name = "psycopg", extra = ["binary", "pool"] },
|
||||
{ name = "pyarrow" },
|
||||
{ name = "pyjwt" },
|
||||
{ name = "pypandoc" },
|
||||
{ name = "pypandoc-binary" },
|
||||
{ name = "pypdf" },
|
||||
|
|
@ -7909,6 +7911,7 @@ dependencies = [
|
|||
{ name = "starlette" },
|
||||
{ name = "static-ffmpeg" },
|
||||
{ name = "tavily-python" },
|
||||
{ name = "tornado" },
|
||||
{ name = "trafilatura" },
|
||||
{ name = "typst" },
|
||||
{ name = "unstructured", extra = ["all-docs"] },
|
||||
|
|
@ -7931,6 +7934,7 @@ dev = [
|
|||
requires-dist = [
|
||||
{ name = "alembic", specifier = ">=1.13.0" },
|
||||
{ name = "asyncpg", specifier = ">=0.30.0" },
|
||||
{ name = "authlib", specifier = ">=1.6.9" },
|
||||
{ name = "boto3", specifier = ">=1.35.0" },
|
||||
{ name = "celery", extras = ["redis"], specifier = ">=5.5.3" },
|
||||
{ name = "chonkie", extras = ["all"], specifier = ">=1.5.0" },
|
||||
|
|
@ -7971,6 +7975,7 @@ requires-dist = [
|
|||
{ name = "playwright", specifier = ">=1.50.0" },
|
||||
{ name = "psycopg", extras = ["binary", "pool"], specifier = ">=3.3.2" },
|
||||
{ name = "pyarrow", specifier = ">=15.0.0,<19.0.0" },
|
||||
{ name = "pyjwt", specifier = ">=2.12.0" },
|
||||
{ name = "pypandoc", specifier = ">=1.16.2" },
|
||||
{ name = "pypandoc-binary", specifier = ">=1.16.2" },
|
||||
{ name = "pypdf", specifier = ">=5.1.0" },
|
||||
|
|
@ -7986,6 +7991,7 @@ requires-dist = [
|
|||
{ name = "starlette", specifier = ">=0.40.0,<0.51.0" },
|
||||
{ name = "static-ffmpeg", specifier = ">=2.13" },
|
||||
{ name = "tavily-python", specifier = ">=0.3.2" },
|
||||
{ name = "tornado", specifier = ">=6.5.5" },
|
||||
{ name = "trafilatura", specifier = ">=2.0.0" },
|
||||
{ name = "typst", specifier = ">=0.14.0" },
|
||||
{ name = "unstructured", extras = ["all-docs"], specifier = ">=0.18.31" },
|
||||
|
|
@ -8252,6 +8258,11 @@ dependencies = [
|
|||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/d3/54/a2ba279afcca44bbd320d4e73675b282fcee3d81400ea1b53934efca6462/torch-2.10.0-2-cp312-none-macosx_11_0_arm64.whl", hash = "sha256:13ec4add8c3faaed8d13e0574f5cd4a323c11655546f91fbe6afa77b57423574", size = 79498202 },
|
||||
{ url = "https://files.pythonhosted.org/packages/ec/23/2c9fe0c9c27f7f6cb865abcea8a4568f29f00acaeadfc6a37f6801f84cb4/torch-2.10.0-2-cp313-none-macosx_11_0_arm64.whl", hash = "sha256:e521c9f030a3774ed770a9c011751fb47c4d12029a3d6522116e48431f2ff89e", size = 79498254 },
|
||||
{ url = "https://files.pythonhosted.org/packages/b3/7a/abada41517ce0011775f0f4eacc79659bc9bc6c361e6bfe6f7052a6b9363/torch-2.10.0-3-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:98c01b8bb5e3240426dcde1446eed6f40c778091c8544767ef1168fc663a05a6", size = 915622781 },
|
||||
{ url = "https://files.pythonhosted.org/packages/ab/c6/4dfe238342ffdcec5aef1c96c457548762d33c40b45a1ab7033bb26d2ff2/torch-2.10.0-3-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:80b1b5bfe38eb0e9f5ff09f206dcac0a87aadd084230d4a36eea5ec5232c115b", size = 915627275 },
|
||||
{ url = "https://files.pythonhosted.org/packages/d8/f0/72bf18847f58f877a6a8acf60614b14935e2f156d942483af1ffc081aea0/torch-2.10.0-3-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:46b3574d93a2a8134b3f5475cfb98e2eb46771794c57015f6ad1fb795ec25e49", size = 915523474 },
|
||||
{ url = "https://files.pythonhosted.org/packages/f4/39/590742415c3030551944edc2ddc273ea1fdfe8ffb2780992e824f1ebee98/torch-2.10.0-3-cp314-cp314-manylinux_2_28_x86_64.whl", hash = "sha256:b1d5e2aba4eb7f8e87fbe04f86442887f9167a35f092afe4c237dfcaaef6e328", size = 915632474 },
|
||||
{ url = "https://files.pythonhosted.org/packages/b6/8e/34949484f764dde5b222b7fe3fede43e4a6f0da9d7f8c370bb617d629ee2/torch-2.10.0-3-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:0228d20b06701c05a8f978357f657817a4a63984b0c90745def81c18aedfa591", size = 915523882 },
|
||||
{ url = "https://files.pythonhosted.org/packages/cc/af/758e242e9102e9988969b5e621d41f36b8f258bb4a099109b7a4b4b50ea4/torch-2.10.0-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:5fd4117d89ffd47e3dcc71e71a22efac24828ad781c7e46aaaf56bf7f2796acf", size = 145996088 },
|
||||
{ url = "https://files.pythonhosted.org/packages/23/8e/3c74db5e53bff7ed9e34c8123e6a8bfef718b2450c35eefab85bb4a7e270/torch-2.10.0-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:787124e7db3b379d4f1ed54dd12ae7c741c16a4d29b49c0226a89bea50923ffb", size = 915711952 },
|
||||
{ url = "https://files.pythonhosted.org/packages/6e/01/624c4324ca01f66ae4c7cd1b74eb16fb52596dce66dbe51eff95ef9e7a4c/torch-2.10.0-cp312-cp312-win_amd64.whl", hash = "sha256:2c66c61f44c5f903046cc696d088e21062644cbe541c7f1c4eaae88b2ad23547", size = 113757972 },
|
||||
|
|
@ -8308,21 +8319,19 @@ wheels = [
|
|||
|
||||
[[package]]
|
||||
name = "tornado"
|
||||
version = "6.5.4"
|
||||
version = "6.5.5"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/37/1d/0a336abf618272d53f62ebe274f712e213f5a03c0b2339575430b8362ef2/tornado-6.5.4.tar.gz", hash = "sha256:a22fa9047405d03260b483980635f0b041989d8bcc9a313f8fe18b411d84b1d7", size = 513632 }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/f8/f1/3173dfa4a18db4a9b03e5d55325559dab51ee653763bb8745a75af491286/tornado-6.5.5.tar.gz", hash = "sha256:192b8f3ea91bd7f1f50c06955416ed76c6b72f96779b962f07f911b91e8d30e9", size = 516006 }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/ab/a9/e94a9d5224107d7ce3cc1fab8d5dc97f5ea351ccc6322ee4fb661da94e35/tornado-6.5.4-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:d6241c1a16b1c9e4cc28148b1cda97dd1c6cb4fb7068ac1bedc610768dff0ba9", size = 443909 },
|
||||
{ url = "https://files.pythonhosted.org/packages/db/7e/f7b8d8c4453f305a51f80dbb49014257bb7d28ccb4bbb8dd328ea995ecad/tornado-6.5.4-cp39-abi3-macosx_10_9_x86_64.whl", hash = "sha256:2d50f63dda1d2cac3ae1fa23d254e16b5e38153758470e9956cbc3d813d40843", size = 442163 },
|
||||
{ url = "https://files.pythonhosted.org/packages/ba/b5/206f82d51e1bfa940ba366a8d2f83904b15942c45a78dd978b599870ab44/tornado-6.5.4-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:d1cf66105dc6acb5af613c054955b8137e34a03698aa53272dbda4afe252be17", size = 445746 },
|
||||
{ url = "https://files.pythonhosted.org/packages/8e/9d/1a3338e0bd30ada6ad4356c13a0a6c35fbc859063fa7eddb309183364ac1/tornado-6.5.4-cp39-abi3-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:50ff0a58b0dc97939d29da29cd624da010e7f804746621c78d14b80238669335", size = 445083 },
|
||||
{ url = "https://files.pythonhosted.org/packages/50/d4/e51d52047e7eb9a582da59f32125d17c0482d065afd5d3bc435ff2120dc5/tornado-6.5.4-cp39-abi3-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e5fb5e04efa54cf0baabdd10061eb4148e0be137166146fff835745f59ab9f7f", size = 445315 },
|
||||
{ url = "https://files.pythonhosted.org/packages/27/07/2273972f69ca63dbc139694a3fc4684edec3ea3f9efabf77ed32483b875c/tornado-6.5.4-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:9c86b1643b33a4cd415f8d0fe53045f913bf07b4a3ef646b735a6a86047dda84", size = 446003 },
|
||||
{ url = "https://files.pythonhosted.org/packages/d1/83/41c52e47502bf7260044413b6770d1a48dda2f0246f95ee1384a3cd9c44a/tornado-6.5.4-cp39-abi3-musllinux_1_2_i686.whl", hash = "sha256:6eb82872335a53dd063a4f10917b3efd28270b56a33db69009606a0312660a6f", size = 445412 },
|
||||
{ url = "https://files.pythonhosted.org/packages/10/c7/bc96917f06cbee182d44735d4ecde9c432e25b84f4c2086143013e7b9e52/tornado-6.5.4-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:6076d5dda368c9328ff41ab5d9dd3608e695e8225d1cd0fd1e006f05da3635a8", size = 445392 },
|
||||
{ url = "https://files.pythonhosted.org/packages/0c/1a/d7592328d037d36f2d2462f4bc1fbb383eec9278bc786c1b111cbbd44cfa/tornado-6.5.4-cp39-abi3-win32.whl", hash = "sha256:1768110f2411d5cd281bac0a090f707223ce77fd110424361092859e089b38d1", size = 446481 },
|
||||
{ url = "https://files.pythonhosted.org/packages/d6/6d/c69be695a0a64fd37a97db12355a035a6d90f79067a3cf936ec2b1dc38cd/tornado-6.5.4-cp39-abi3-win_amd64.whl", hash = "sha256:fa07d31e0cd85c60713f2b995da613588aa03e1303d75705dca6af8babc18ddc", size = 446886 },
|
||||
{ url = "https://files.pythonhosted.org/packages/50/49/8dc3fd90902f70084bd2cd059d576ddb4f8bb44c2c7c0e33a11422acb17e/tornado-6.5.4-cp39-abi3-win_arm64.whl", hash = "sha256:053e6e16701eb6cbe641f308f4c1a9541f91b6261991160391bfc342e8a551a1", size = 445910 },
|
||||
{ url = "https://files.pythonhosted.org/packages/59/8c/77f5097695f4dd8255ecbd08b2a1ed8ba8b953d337804dd7080f199e12bf/tornado-6.5.5-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:487dc9cc380e29f58c7ab88f9e27cdeef04b2140862e5076a66fb6bb68bb1bfa", size = 445983 },
|
||||
{ url = "https://files.pythonhosted.org/packages/ab/5e/7625b76cd10f98f1516c36ce0346de62061156352353ef2da44e5c21523c/tornado-6.5.5-cp39-abi3-macosx_10_9_x86_64.whl", hash = "sha256:65a7f1d46d4bb41df1ac99f5fcb685fb25c7e61613742d5108b010975a9a6521", size = 444246 },
|
||||
{ url = "https://files.pythonhosted.org/packages/b2/04/7b5705d5b3c0fab088f434f9c83edac1573830ca49ccf29fb83bf7178eec/tornado-6.5.5-cp39-abi3-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:e74c92e8e65086b338fd56333fb9a68b9f6f2fe7ad532645a290a464bcf46be5", size = 447229 },
|
||||
{ url = "https://files.pythonhosted.org/packages/34/01/74e034a30ef59afb4097ef8659515e96a39d910b712a89af76f5e4e1f93c/tornado-6.5.5-cp39-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:435319e9e340276428bbdb4e7fa732c2d399386d1de5686cb331ec8eee754f07", size = 448192 },
|
||||
{ url = "https://files.pythonhosted.org/packages/be/00/fe9e02c5a96429fce1a1d15a517f5d8444f9c412e0bb9eadfbe3b0fc55bf/tornado-6.5.5-cp39-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:3f54aa540bdbfee7b9eb268ead60e7d199de5021facd276819c193c0fb28ea4e", size = 448039 },
|
||||
{ url = "https://files.pythonhosted.org/packages/82/9e/656ee4cec0398b1d18d0f1eb6372c41c6b889722641d84948351ae19556d/tornado-6.5.5-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:36abed1754faeb80fbd6e64db2758091e1320f6bba74a4cf8c09cd18ccce8aca", size = 447445 },
|
||||
{ url = "https://files.pythonhosted.org/packages/5a/76/4921c00511f88af86a33de770d64141170f1cfd9c00311aea689949e274e/tornado-6.5.5-cp39-abi3-win32.whl", hash = "sha256:dd3eafaaeec1c7f2f8fdcd5f964e8907ad788fe8a5a32c4426fbbdda621223b7", size = 448582 },
|
||||
{ url = "https://files.pythonhosted.org/packages/2c/23/f6c6112a04d28eed765e374435fb1a9198f73e1ec4b4024184f21faeb1ad/tornado-6.5.5-cp39-abi3-win_amd64.whl", hash = "sha256:6443a794ba961a9f619b1ae926a2e900ac20c34483eea67be4ed8f1e58d3ef7b", size = 448990 },
|
||||
{ url = "https://files.pythonhosted.org/packages/b7/c8/876602cbc96469911f0939f703453c1157b0c826ecb05bdd32e023397d4e/tornado-6.5.5-cp39-abi3-win_arm64.whl", hash = "sha256:2c9a876e094109333f888539ddb2de4361743e5d21eece20688e3e351e4990a6", size = 448016 },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
|
|
|
|||
6
surfsense_desktop/.env
Normal file
|
|
@ -0,0 +1,6 @@
|
|||
# Electron-specific build-time configuration.
|
||||
# Set before running pnpm dist:mac / dist:win / dist:linux.
|
||||
|
||||
# The hosted web frontend URL. Used to intercept OAuth redirects and keep them
|
||||
# inside the desktop app. Set to your production frontend domain.
|
||||
HOSTED_FRONTEND_URL=https://surfsense.net
|
||||
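Reviewer note: the comment in this `.env` file says `HOSTED_FRONTEND_URL` is used to intercept OAuth redirects and keep them inside the desktop app. The Electron code that does this is not shown in this part of the diff, so the following is only a sketch of the kind of origin check such interception might rely on. The `/auth/callback` path, the local base URL, and both helper names are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

HOSTED_FRONTEND_URL = "https://surfsense.net"  # value from surfsense_desktop/.env

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return (scheme, host, port) with default ports filled in."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, (parts.hostname or "").lower(), port)


def is_hosted_frontend(url: str, hosted: str = HOSTED_FRONTEND_URL) -> bool:
    """True when the URL has exactly the same origin as the hosted frontend."""
    return origin(url) == origin(hosted)


def rewrite_to_local(url: str, local_base: str) -> str:
    """Keep path, query and fragment, but point the URL at the local server."""
    target = urlsplit(url)
    local = urlsplit(local_base)
    return urlunsplit(
        (local.scheme, local.netloc, target.path, target.query, target.fragment)
    )


# Hypothetical redirect URL and local server address.
redirect = "https://surfsense.net/auth/callback?token=abc"
print(is_hosted_frontend(redirect))
print(rewrite_to_local(redirect, "http://localhost:3000"))
```

Comparing full origins rather than string prefixes avoids treating a look-alike host such as `surfsense.net.evil.example` as the hosted frontend.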
3
surfsense_desktop/.gitignore
vendored
Normal file
|
|
@ -0,0 +1,3 @@
|
|||
node_modules/
|
||||
dist/
|
||||
release/
|
||||
58
surfsense_desktop/README.md
Normal file
|
|
@ -0,0 +1,58 @@
|
|||
# SurfSense Desktop
|
||||
|
||||
Electron wrapper around the SurfSense web app. Packages the Next.js standalone build into a native desktop application with OAuth support, deep linking, and system browser integration.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Node.js 18+
|
||||
- pnpm 10+
|
||||
- The `surfsense_web` project dependencies installed (`pnpm install` in `surfsense_web/`)
|
||||
|
||||
## Development
|
||||
|
||||
```bash
|
||||
pnpm install
|
||||
pnpm dev
|
||||
```
|
||||
|
||||
This starts the Next.js dev server and Electron concurrently. Hot reload works — edit the web app and changes appear immediately.
|
||||
|
||||
## Configuration
|
||||
|
||||
Two `.env` files control the build:
|
||||
|
||||
**`surfsense_web/.env`** — Next.js environment variables baked into the frontend at build time:
|
||||
|
||||
**`surfsense_desktop/.env`** — Electron-specific configuration:
|
||||
|
||||
Set these before building.
|
||||
|
||||
## Build & Package
|
||||
|
||||
**Step 1** — Build the Next.js standalone output:
|
||||
|
||||
```bash
|
||||
cd ../surfsense_web
|
||||
pnpm build
|
||||
```
|
||||
|
||||
**Step 2** — Compile Electron and prepare the standalone output:
|
||||
|
||||
```bash
|
||||
cd ../surfsense_desktop
|
||||
pnpm build
|
||||
```
|
||||
|
||||
**Step 3** — Package into a distributable:
|
||||
|
||||
```bash
|
||||
pnpm dist:mac # macOS (.dmg + .zip)
|
||||
pnpm dist:win # Windows (.exe)
|
||||
pnpm dist:linux # Linux (.deb + .AppImage)
|
||||
```
|
||||
|
||||
**Step 4** — Find the output:
|
||||
|
||||
```bash
|
||||
ls release/
|
||||
```
|
||||
BIN
surfsense_desktop/assets/icon.icns
Normal file
Binary file not shown.
BIN
surfsense_desktop/assets/icon.ico
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 151 KiB |
BIN
surfsense_desktop/assets/icon.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 6.3 MiB |
67
surfsense_desktop/electron-builder.yml
Normal file
|
|
@ -0,0 +1,67 @@
|
|||
appId: com.surfsense.desktop
|
||||
productName: SurfSense
|
||||
publish:
|
||||
provider: github
|
||||
owner: MODSetter
|
||||
repo: SurfSense
|
||||
directories:
|
||||
output: release
|
||||
files:
|
||||
- dist/**/*
|
||||
- "!node_modules"
|
||||
- "!src"
|
||||
- "!scripts"
|
||||
- "!release"
|
||||
extraResources:
|
||||
- from: ../surfsense_web/.next/standalone/surfsense_web/
|
||||
to: standalone/
|
||||
filter:
|
||||
- "**/*"
|
||||
- "!**/node_modules"
|
||||
- from: ../surfsense_web/.next/standalone/surfsense_web/node_modules/
|
||||
to: standalone/node_modules/
|
||||
filter: ["**/*"]
|
||||
- from: ../surfsense_web/.next/static/
|
||||
to: standalone/.next/static/
|
||||
filter: ["**/*"]
|
||||
- from: ../surfsense_web/public/
|
||||
to: standalone/public/
|
||||
filter: ["**/*"]
|
||||
asarUnpack:
|
||||
- "**/*.node"
|
||||
mac:
|
||||
icon: assets/icon.icns
|
||||
category: public.app-category.productivity
|
||||
artifactName: "${productName}-${version}-${arch}.${ext}"
|
||||
hardenedRuntime: true
|
||||
gatekeeperAssess: false
|
||||
target:
|
||||
- target: dmg
|
||||
arch: [x64, arm64]
|
||||
- target: zip
|
||||
arch: [x64, arm64]
|
||||
win:
|
||||
icon: assets/icon.ico
|
||||
target:
|
||||
- target: nsis
|
||||
arch: [x64, arm64]
|
||||
nsis:
|
||||
oneClick: false
|
||||
perMachine: false
|
||||
allowToChangeInstallationDirectory: true
|
||||
createDesktopShortcut: true
|
||||
createStartMenuShortcut: true
|
||||
linux:
|
||||
icon: assets/icon.png
|
||||
category: Utility
|
||||
artifactName: "${productName}-${version}-${arch}.${ext}"
|
||||
mimeTypes:
|
||||
- x-scheme-handler/surfsense
|
||||
desktop:
|
||||
entry:
|
||||
Name: SurfSense
|
||||
Comment: AI-powered research assistant
|
||||
Categories: Utility;Office;
|
||||
target:
|
||||
- deb
|
||||
- AppImage
|
||||
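Reviewer note: the `mac` and `linux` sections above set `artifactName: "${productName}-${version}-${arch}.${ext}"`. As a rough illustration of what that template yields, the sketch below expands it with Python's `string.Template`, using `productName` from this config and `version` from `package.json` in this diff. This only mimics the substitution; electron-builder performs its own expansion and the file names it actually produces may differ.

```python
from string import Template

# Template copied from electron-builder.yml; values come from this diff.
ARTIFACT_NAME = "${productName}-${version}-${arch}.${ext}"


def artifact_name(product_name: str, version: str, arch: str, ext: str) -> str:
    """Expand the artifactName template for one target/arch combination."""
    return Template(ARTIFACT_NAME).substitute(
        productName=product_name, version=version, arch=arch, ext=ext
    )


# macOS targets listed in the config: dmg and zip, each for x64 and arm64.
mac_names = [
    artifact_name("SurfSense", "0.1.0", arch, ext)
    for ext in ("dmg", "zip")
    for arch in ("x64", "arm64")
]
print(mac_names)
```

Including `${arch}` in the name keeps the x64 and arm64 builds from overwriting each other in the `release/` output directory.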
33
surfsense_desktop/package.json
Normal file
|
|
@ -0,0 +1,33 @@
|
|||
{
|
||||
"name": "surfsense-desktop",
|
||||
"version": "0.1.0",
|
||||
"description": "SurfSense Desktop App",
|
||||
"main": "dist/main.js",
|
||||
"scripts": {
|
||||
"dev": "concurrently -k \"pnpm --dir ../surfsense_web dev\" \"wait-on http://localhost:3000 && electron .\"",
|
||||
"build": "node scripts/build-electron.mjs",
|
||||
"pack:dir": "pnpm build && electron-builder --dir --config electron-builder.yml",
|
||||
"dist": "pnpm build && electron-builder --config electron-builder.yml",
|
||||
"dist:mac": "pnpm build && electron-builder --mac --config electron-builder.yml",
|
||||
"dist:win": "pnpm build && electron-builder --win --config electron-builder.yml",
|
||||
"dist:linux": "pnpm build && electron-builder --linux --config electron-builder.yml",
|
||||
"typecheck": "tsc --noEmit"
|
||||
},
|
||||
"author": "MODSetter",
|
||||
"license": "MIT",
|
||||
"packageManager": "pnpm@10.24.0",
|
||||
"devDependencies": {
|
||||
"@types/node": "^25.5.0",
|
||||
"concurrently": "^9.2.1",
|
||||
"dotenv": "^17.3.1",
|
||||
"electron": "^41.0.2",
|
||||
"electron-builder": "^26.8.1",
|
||||
"esbuild": "^0.27.4",
|
||||
"typescript": "^5.9.3",
|
||||
"wait-on": "^9.0.4"
|
||||
},
|
||||
"dependencies": {
|
||||
"electron-updater": "^6.8.3",
|
||||
"get-port-please": "^3.2.0"
|
||||
}
|
||||
}
|
||||
Some files were not shown because too many files have changed in this diff.