diff --git a/.cursor/skills/system-architecture/SKILL.md b/.cursor/skills/system-architecture/SKILL.md new file mode 100755 index 000000000..70683a7ac --- /dev/null +++ b/.cursor/skills/system-architecture/SKILL.md @@ -0,0 +1,136 @@ +--- +name: system-architecture +description: Design systems with appropriate complexity - no more, no less. Use when the user asks to architect applications, design system boundaries, plan service decomposition, evaluate monolith vs microservices, make scaling decisions, or review structural trade-offs. Applies to new system design, refactoring, and migration planning. +--- + +# System Architecture + +Design real structures with clear boundaries, explicit trade-offs, and appropriate complexity. Match architecture to actual requirements, not imagined future needs. + +## Workflow + +When the user requests an architecture, follow these steps: + +``` +Task Progress: +- [ ] Step 1: Clarify constraints +- [ ] Step 2: Identify domains +- [ ] Step 3: Map data flow +- [ ] Step 4: Draw boundaries with rationale +- [ ] Step 5: Run complexity checklist +- [ ] Step 6: Present architecture with trade-offs +``` + +**Step 1 - Clarify constraints.** Ask about: + +| Constraint | Question | Why it matters | +|------------|----------|----------------| +| Scale | What's the real load? (users, requests/sec, data size) | Design for 10x current, not 1000x | +| Team | How many developers? How many teams? | Deployable units ≤ number of teams | +| Lifespan | Prototype? MVP? Long-term product? | Temporary systems need temporary solutions | +| Change vectors | What actually varies? | Abstract only where you have evidence of variation | + +**Step 2 - Identify domains.** Group by business capability, not technical layer. Look for things that change for different reasons and at different rates. + +**Step 3 - Map data flow.** Trace: where does data enter → how does it transform → where does it exit? Make the flow obvious. 
+ +**Step 4 - Draw boundaries.** Every boundary needs a reason: different team, different change rate, different compliance requirement, or different scaling need. + +**Step 5 - Run complexity checklist.** Before adding any non-trivial pattern: + +``` +[ ] Have I tried the simple solution? +[ ] Do I have evidence it's insufficient? +[ ] Can my team operate this? +[ ] Will this still make sense in 6 months? +[ ] Can I explain why this complexity is necessary? +``` + +If any answer is "no", keep it simple. + +**Step 6 - Present the architecture** using the output template below. + +## Output Template + +```markdown +### System: [Name] + +**Constraints**: +- Scale: [current and expected load] +- Team: [size and structure] +- Lifespan: [prototype / MVP / long-term] + +**Architecture**: +[Component diagram or description of components and their relationships] + +**Data Flow**: +[How data enters → transforms → exits] + +**Key Boundaries**: +| Boundary | Reason | Change Rate | +|----------|--------|-------------| +| ... | ... | ... | + +**Trade-offs**: +- Chose X over Y because [reason] +- Accepted [limitation] to gain [benefit] + +**Complexity Justification**: +- [Each non-trivial pattern] → [why it's needed, with evidence] +``` + +## Core Principles + +1. **Boundaries at real differences.** Separate concerns that change for different reasons and at different rates. +2. **Dependencies flow inward.** Core logic depends on nothing. Infrastructure depends on core. +3. **Follow the data.** Architecture should make data flow obvious. +4. **Design for failure.** Networks fail. Databases time out. Build compensation into the structure. +5. **Design for operations.** You will debug this at 3am. Every request needs a trace. Every error needs context for replay. + +For concrete good/bad examples of each principle, see [examples.md](examples.md). 
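
The "dependencies flow inward" principle above can be illustrated with a minimal Python sketch. Every name here (`Order`, `place_order`, `InMemoryOrderStore`) is hypothetical and chosen for illustration only; nothing below comes from the SurfSense codebase. The point is the direction of the arrows: the domain class imports nothing from outer layers, the use case receives persistence as a plain callable, and only the outermost layer knows how data is stored.

```python
from dataclasses import dataclass
from typing import Callable


# --- domain: pure business rules, no imports from outer layers ---
@dataclass(frozen=True)
class Order:
    order_id: str
    quantity: int
    unit_price_cents: int

    def total_cents(self) -> int:
        if self.quantity <= 0:
            raise ValueError("quantity must be positive")
        return self.quantity * self.unit_price_cents


# --- application: orchestrates the domain; persistence is passed in ---
def place_order(order: Order, save: Callable[[Order, int], None]) -> int:
    total = order.total_cents()
    save(order, total)
    return total


# --- infrastructure: outermost layer, the only one that knows about storage ---
class InMemoryOrderStore:
    """Hypothetical stand-in for a real database adapter."""

    def __init__(self) -> None:
        self.rows = {}  # order_id -> total in cents

    def save(self, order: Order, total_cents: int) -> None:
        self.rows[order.order_id] = total_cents


store = InMemoryOrderStore()
total = place_order(Order("o-1", quantity=3, unit_price_cents=250), store.save)
```

Passing a function rather than declaring an interface keeps the sketch in line with the anti-pattern table: there is no port or abstract base class with a single implementation, yet the domain and use case still run without a database.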
+ +## Anti-Patterns + +| Don't | Do Instead | +|-------|------------| +| Microservices for a 3-person team | Well-structured monolith | +| Event sourcing for CRUD | Simple state storage | +| Message queues within the same process | Just call the function | +| Distributed transactions | Redesign to avoid, or accept eventual consistency | +| Repository wrapping an ORM | Use the ORM directly | +| Interfaces with one implementation | Mock at boundaries only | +| AbstractFactoryFactoryBean | Just instantiate the thing | +| DI containers for simple graphs | Constructor injection is enough | +| Clean Architecture for a TODO app | Match layers to actual complexity | +| DDD tactics without strategic design | Aggregates need bounded contexts | +| Hexagonal ports with one adapter | Just call the database | +| CQRS when reads = writes | Add when they diverge | +| "We might swap databases" | You won't; rewrite if you do | +| "Multi-tenant someday" | Build it when you have tenant #2 | +| "Microservices for team scale" | Helps at 50+ engineers, not 4 | + +## Success Criteria + +Your architecture is right-sized when: + +1. **You can draw it** - dependency graph fits on a whiteboard +2. **You can explain it** - new team member understands data flow in 30 minutes +3. **You can change it** - adding a feature touches 1-3 modules, not 10 +4. **You can delete it** - removing a component needs no archaeology +5. **You can debug it** - tracing a request takes minutes, not hours +6. **It matches your team** - deployable units ≤ number of teams + +## When the Simple Solution Isn't Enough + +If the complexity checklist says "yes, scale is real", see [scaling-checklist.md](scaling-checklist.md) for concrete techniques covering caching, async processing, partitioning, horizontal scaling, and multi-region. + +## Iterative Architecture + +Architecture is discovered, not designed upfront: + +1. **Start obvious** - group by domain, not by technical layer +2. 
**Let hotspots emerge** - monitor which modules change together +3. **Extract when painful** - split only when the current form causes measurable problems +4. **Document decisions** - record why boundaries exist so future you knows what's load-bearing + +Every senior engineer has a graveyard of over-engineered systems they regret. Learn from their pain. Build boring systems that work. diff --git a/.cursor/skills/system-architecture/examples.md b/.cursor/skills/system-architecture/examples.md new file mode 100644 index 000000000..fa72f92ce --- /dev/null +++ b/.cursor/skills/system-architecture/examples.md @@ -0,0 +1,120 @@ +# Architecture Examples + +Concrete good/bad examples for each core principle in SKILL.md. + +--- + +## Boundaries at Real Differences + +**Good** - Meaningful boundary: +``` +# Users and Billing are separate bounded contexts +# - Different teams own them +# - Different change cadences (users: weekly, billing: quarterly) +# - Different compliance requirements + +src/ + users/ # User management domain + models.py + services.py + api.py + billing/ # Billing domain + models.py + services.py + api.py + shared/ # Truly shared utilities + auth.py +``` + +**Bad** - Ceremony without purpose: +``` +# UserService → UserRepository → UserRepositoryImpl +# ...when you'll never swap the database + +src/ + interfaces/ + IUserRepository.py # One implementation exists + repositories/ + UserRepositoryImpl.py # Wraps SQLAlchemy, which is already a repository + services/ + UserService.py # Just calls the repository +``` + +--- + +## Dependencies Flow Inward + +**Good** - Clear dependency direction: +``` +# Dependency flows inward: infrastructure → application → domain + +domain/ # Pure business logic, no imports from outer layers + order.py # Order entity with business rules + +application/ # Use cases, orchestrates domain + place_order.py # Imports from domain/, not infrastructure/ + +infrastructure/ # External concerns + postgres.py # Implements persistence, 
imports from application/ + stripe.py # Implements payments +``` + +--- + +## Follow the Data + +**Good** - Obvious data flow: +``` +Request → Validate → Transform → Store → Respond + +# Each step is a clear function/module: +api/routes.py # Request enters +validators.py # Validation +transformers.py # Business logic transformation +repositories.py # Storage +serializers.py # Response shaping +``` + +--- + +## Design for Failure + +**Good** - Failure-aware design with compensation: +```python +class OrderService: + def place_order(self, order: Order) -> Result: + inventory = self.inventory.reserve(order.items) + if inventory.failed: + return Result.failure("Items unavailable", retry=False) + + payment = self.payments.charge(order.total) + if payment.failed: + self.inventory.release(inventory.reservation_id) # Compensate + return Result.failure("Payment failed", retry=True) + + return Result.success(order) +``` + +--- + +## Design for Operations + +**Good** - Observable architecture: +```python +@trace +def handle_request(request): + log.info("Processing", request_id=request.id, user=request.user_id) + try: + result = process(request) + log.info("Completed", request_id=request.id, result=result.status) + return result + except Exception as e: + log.error("Failed", request_id=request.id, error=str(e), + context=request.to_dict()) # Full context for replay + raise +``` + +Key elements: +- Every request gets a correlation ID +- Every service logs with that ID +- Every error includes full context for reproduction diff --git a/.cursor/skills/system-architecture/scaling-checklist.md b/.cursor/skills/system-architecture/scaling-checklist.md new file mode 100644 index 000000000..d9cfdce43 --- /dev/null +++ b/.cursor/skills/system-architecture/scaling-checklist.md @@ -0,0 +1,76 @@ +# Scaling Checklist + +Concrete techniques for when the complexity checklist in SKILL.md confirms scale is a real problem. Apply in order - each level solves the previous level's bottleneck. 
+ +--- + +## Level 0: Optimize First + +Before adding infrastructure, exhaust these: + +- [ ] Database queries have proper indexes +- [ ] N+1 queries eliminated +- [ ] Connection pooling configured +- [ ] Slow endpoints profiled and optimized +- [ ] Static assets served via CDN + +## Level 1: Read-Heavy + +**Symptom**: Database reads are the bottleneck. + +| Technique | When | Trade-off | +|-----------|------|-----------| +| Application cache (in-memory) | Small, frequently accessed data | Stale data, memory pressure | +| Redis/Memcached | Shared cache across instances | Network hop, cache invalidation complexity | +| Read replicas | High read volume, slight staleness OK | Replication lag, eventual consistency | +| CDN | Static or semi-static content | Cache invalidation delay | + +## Level 2: Write-Heavy + +**Symptom**: Database writes or processing are the bottleneck. + +| Technique | When | Trade-off | +|-----------|------|-----------| +| Async task queue (Celery, SQS) | Work can be deferred | Eventual consistency, failure handling | +| Write-behind cache | Batch frequent writes | Data loss risk on crash | +| Event streaming (Kafka) | Multiple consumers of same data | Operational complexity, ordering guarantees | +| CQRS | Reads and writes have diverged significantly | Two models to maintain | + +## Level 3: Traffic Spikes + +**Symptom**: Individual instances can't handle peak load. + +| Technique | When | Trade-off | +|-----------|------|-----------| +| Horizontal scaling + load balancer | Stateless services | Session management, deploy complexity | +| Auto-scaling | Unpredictable traffic patterns | Cold start latency, cost spikes | +| Rate limiting | Protect against abuse/spikes | Legitimate users may be throttled | +| Circuit breakers | Downstream services degrade | Partial functionality during failures | + +## Level 4: Data Growth + +**Symptom**: Single database can't hold or query all the data efficiently. 
+ +| Technique | When | Trade-off | +|-----------|------|-----------| +| Table partitioning | Time-series or naturally partitioned data | Query complexity, partition management | +| Archival / cold storage | Old data rarely accessed | Access latency for archived data | +| Database sharding | Partitioning insufficient, clear shard key exists | Cross-shard queries, operational burden | +| Search index (Elasticsearch) | Full-text or complex queries on large datasets | Index lag, another system to operate | + +## Level 5: Multi-Region + +**Symptom**: Users are geographically distributed, latency matters. + +| Technique | When | Trade-off | +|-----------|------|-----------| +| CDN + edge caching | Static/semi-static content | Cache invalidation | +| Read replicas per region | Read-heavy, slight staleness OK | Replication lag | +| Active-passive failover | Disaster recovery | Failover time, cost of standby | +| Active-active multi-region | True global low-latency required | Conflict resolution, extreme complexity | + +--- + +## Decision Rule + +Always start at Level 0. Move to the next level only when you have **measured evidence** that the current level is insufficient. Skipping levels is how you end up with Kafka for a TODO app. 
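
As one illustration of a Level 1 technique from the checklist above, an in-memory application cache with a time-to-live might look like the sketch below. `TTLCache`, `get_or_load`, and `load_plan` are hypothetical names, not part of any library, and the clock is injectable only so that expiry can be demonstrated without sleeping. The sketch makes the "stale data" trade-off from the table visible: a hit returns whatever was loaded up to `ttl_seconds` ago.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """In-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: may lag behind the database by up to the TTL
        value = loader()  # miss or expired: go to the source of truth
        self._entries[key] = (now + self.ttl_seconds, value)
        return value


# Usage with a fake clock so the expiry path is deterministic.
fake_now = [0.0]
calls = []


def load_plan():
    calls.append(1)  # stands in for a database query
    return {"plan": "pro"}


cache = TTLCache(ttl_seconds=30, clock=lambda: fake_now[0])
cache.get_or_load("user:1", load_plan)  # miss: loader runs
cache.get_or_load("user:1", load_plan)  # hit: served from memory
fake_now[0] = 31.0
cache.get_or_load("user:1", load_plan)  # expired: loader runs again
```

The sketch never evicts entries, so the "memory pressure" trade-off from the table still applies; a bounded size or periodic cleanup would be needed before using something like this for a large key space.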
diff --git a/.github/workflows/desktop-release.yml b/.github/workflows/desktop-release.yml new file mode 100644 index 000000000..7119fcb6d --- /dev/null +++ b/.github/workflows/desktop-release.yml @@ -0,0 +1,78 @@ +name: Desktop Release + +on: + push: + tags: + - 'v*' + - 'beta-v*' + +permissions: + contents: write + +jobs: + build: + runs-on: ${{ matrix.os }} + strategy: + fail-fast: false + matrix: + include: + - os: macos-latest + platform: --mac + - os: ubuntu-latest + platform: --linux + - os: windows-latest + platform: --win + + steps: + - name: Checkout + uses: actions/checkout@v4 + + - name: Extract version from tag + id: version + shell: bash + run: | + TAG=${GITHUB_REF#refs/tags/} + VERSION=${TAG#beta-} + VERSION=${VERSION#v} + echo "VERSION=$VERSION" >> "$GITHUB_OUTPUT" + + - name: Setup pnpm + uses: pnpm/action-setup@v4 + + - name: Setup Node.js + uses: actions/setup-node@v4 + with: + node-version: 20 + cache: 'pnpm' + cache-dependency-path: | + surfsense_web/pnpm-lock.yaml + surfsense_desktop/pnpm-lock.yaml + + - name: Install web dependencies + run: pnpm install + working-directory: surfsense_web + + - name: Build Next.js standalone + run: pnpm build + working-directory: surfsense_web + env: + NEXT_PUBLIC_FASTAPI_BACKEND_URL: ${{ vars.NEXT_PUBLIC_FASTAPI_BACKEND_URL }} + NEXT_PUBLIC_ELECTRIC_URL: ${{ vars.NEXT_PUBLIC_ELECTRIC_URL }} + NEXT_PUBLIC_DEPLOYMENT_MODE: ${{ vars.NEXT_PUBLIC_DEPLOYMENT_MODE }} + NEXT_PUBLIC_FASTAPI_BACKEND_AUTH_TYPE: ${{ vars.NEXT_PUBLIC_FASTAPI_BACKEND_AUTH_TYPE }} + + - name: Install desktop dependencies + run: pnpm install + working-directory: surfsense_desktop + + - name: Build Electron + run: pnpm build + working-directory: surfsense_desktop + env: + HOSTED_FRONTEND_URL: ${{ vars.HOSTED_FRONTEND_URL }} + + - name: Package & Publish + run: pnpm exec electron-builder ${{ matrix.platform }} --config electron-builder.yml --publish always -c.extraMetadata.version=${{ steps.version.outputs.VERSION }} + working-directory: 
surfsense_desktop + env: + GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} diff --git a/.gitignore b/.gitignore index 559918a61..a5c44ce73 100644 --- a/.gitignore +++ b/.gitignore @@ -5,4 +5,4 @@ node_modules/ .ruff_cache/ .venv .pnpm-store -.DS_Store +.DS_Store \ No newline at end of file diff --git a/.vscode/launch.json b/.vscode/launch.json index 2c4784c0e..029e7c647 100644 --- a/.vscode/launch.json +++ b/.vscode/launch.json @@ -22,7 +22,11 @@ "console": "integratedTerminal", "justMyCode": false, "cwd": "${workspaceFolder}/surfsense_backend", - "python": "${command:python.interpreterPath}" + "python": "uv", + "pythonArgs": [ + "run", + "python" + ] }, { "name": "Backend: FastAPI (No Reload)", @@ -32,7 +36,11 @@ "console": "integratedTerminal", "justMyCode": false, "cwd": "${workspaceFolder}/surfsense_backend", - "python": "${command:python.interpreterPath}" + "python": "uv", + "pythonArgs": [ + "run", + "python" + ] }, { "name": "Backend: FastAPI (main.py)", @@ -41,14 +49,19 @@ "program": "${workspaceFolder}/surfsense_backend/main.py", "console": "integratedTerminal", "justMyCode": false, - "cwd": "${workspaceFolder}/surfsense_backend" + "cwd": "${workspaceFolder}/surfsense_backend", + "python": "uv", + "pythonArgs": [ + "run", + "python" + ] }, { "name": "Frontend: Next.js", "type": "node", "request": "launch", "cwd": "${workspaceFolder}/surfsense_web", - "runtimeExecutable": "npm", + "runtimeExecutable": "pnpm", "runtimeArgs": ["run", "dev"], "console": "integratedTerminal", "serverReadyAction": { @@ -62,7 +75,7 @@ "type": "node", "request": "launch", "cwd": "${workspaceFolder}/surfsense_web", - "runtimeExecutable": "npm", + "runtimeExecutable": "pnpm", "runtimeArgs": ["run", "debug:server"], "console": "integratedTerminal", "serverReadyAction": { @@ -87,7 +100,11 @@ "console": "integratedTerminal", "justMyCode": false, "cwd": "${workspaceFolder}/surfsense_backend", - "python": "${command:python.interpreterPath}" + "python": "uv", + "pythonArgs": [ + "run", + "python" + 
] }, { "name": "Celery: Beat Scheduler", @@ -103,7 +120,11 @@ "console": "integratedTerminal", "justMyCode": false, "cwd": "${workspaceFolder}/surfsense_backend", - "python": "${command:python.interpreterPath}" + "python": "uv", + "pythonArgs": [ + "run", + "python" + ] } ], "compounds": [ diff --git a/.vscode/settings.json b/.vscode/settings.json index f134660b6..05bd30702 100644 --- a/.vscode/settings.json +++ b/.vscode/settings.json @@ -1,3 +1,4 @@ { - "biome.configurationPath": "./surfsense_web/biome.json" + "biome.configurationPath": "./surfsense_web/biome.json", + "deepscan.ignoreConfirmWarning": true } \ No newline at end of file diff --git a/README.es.md b/README.es.md index a1f5b80d8..e5bc9be7e 100644 --- a/README.es.md +++ b/README.es.md @@ -27,11 +27,18 @@ SurfSense es un agente de investigación de IA altamente personalizable, conecta -# Video +# Demo https://github.com/user-attachments/assets/cc0c84d3-1f2f-4f7a-b519-2ecce22310b1 -## Ejemplo de Podcast +## Ejemplo de Agente de Video + + +https://github.com/user-attachments/assets/cc977e6d-8292-4ffe-abb8-3b0560ef5562 + + + +## Ejemplo de Agente de Podcast https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7 @@ -46,20 +53,25 @@ https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7 2. Conecta tus conectores y sincroniza. Activa la sincronización periódica para mantenerlos actualizados. -

Conectores

+

Conectores

3. Mientras se indexan los datos de los conectores, sube documentos. -

Subir Documentos

+

Subir Documentos

4. Una vez que todo esté indexado, pregunta lo que quieras (Casos de uso): + - Generación de videos + +

Generación de Videos

+ - Búsqueda básica y citaciones

Búsqueda y Citación

- QNA con mención de documentos +

QNA con Mención de Documentos

QNA con Mención de Documentos

- Generación de informes y exportaciones (PDF, DOCX, HTML, LaTeX, EPUB, ODT, texto plano) @@ -133,6 +145,8 @@ Para Docker Compose, instalación manual y otras opciones de despliegue, consult | Soporte Universal de LLM | 100+ LLMs, 6000+ modelos de embeddings, todos los principales rerankers vía OpenAI spec y LiteLLM | | Privacidad Primero | Soporte completo de LLM local (vLLM, Ollama) tus datos son tuyos | | Colaboración en Equipo | RBAC con roles de Propietario / Admin / Editor / Visor, chat en tiempo real e hilos de comentarios | +| Generación de Videos | Genera videos con narración y visuales | +| Generación de Presentaciones | Crea presentaciones editables basadas en diapositivas | | Generación de Podcasts | Podcast de 3 min en menos de 20 segundos; múltiples proveedores TTS (OpenAI, Azure, Kokoro) | | Extensión de Navegador | Extensión multi-navegador para guardar cualquier página web, incluyendo páginas protegidas por autenticación | | 25+ Conectores | Motores de búsqueda, Google Drive, Slack, Teams, Jira, Notion, GitHub, Discord y [más](#fuentes-externas) | diff --git a/README.hi.md b/README.hi.md index 7a4822e68..2966ef4a3 100644 --- a/README.hi.md +++ b/README.hi.md @@ -27,11 +27,18 @@ SurfSense एक अत्यधिक अनुकूलन योग्य AI -# वीडियो +# डेमो https://github.com/user-attachments/assets/cc0c84d3-1f2f-4f7a-b519-2ecce22310b1 -## पॉडकास्ट नमूना +## वीडियो एजेंट नमूना + + +https://github.com/user-attachments/assets/cc977e6d-8292-4ffe-abb8-3b0560ef5562 + + + +## पॉडकास्ट एजेंट नमूना https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7 @@ -46,20 +53,25 @@ https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7 2. अपने कनेक्टर जोड़ें और सिंक करें। कनेक्टर्स को अपडेट रखने के लिए आवधिक सिंकिंग सक्षम करें। -

कनेक्टर्स

+

कनेक्टर्स

3. जब तक कनेक्टर्स का डेटा इंडेक्स हो रहा है, दस्तावेज़ अपलोड करें। -

दस्तावेज़ अपलोड करें

+

दस्तावेज़ अपलोड करें

4. सब कुछ इंडेक्स हो जाने के बाद, कुछ भी पूछें (उपयोग के मामले): + - वीडियो जनरेशन + +

वीडियो जनरेशन

+ - बेसिक सर्च और उद्धरण

सर्च और उद्धरण

- दस्तावेज़ मेंशन QNA +

दस्तावेज़ मेंशन QNA

दस्तावेज़ मेंशन QNA

- रिपोर्ट जनरेशन और एक्सपोर्ट (PDF, DOCX, HTML, LaTeX, EPUB, ODT, सादा टेक्स्ट) @@ -133,6 +145,8 @@ Docker Compose, मैनुअल इंस्टॉलेशन और अन | यूनिवर्सल LLM सपोर्ट | 100+ LLMs, 6000+ एम्बेडिंग मॉडल, सभी प्रमुख रीरैंकर्स OpenAI spec और LiteLLM के माध्यम से | | प्राइवेसी फर्स्ट | पूर्ण लोकल LLM सपोर्ट (vLLM, Ollama) आपका डेटा आपका रहता है | | टीम सहयोग | मालिक / एडमिन / संपादक / दर्शक भूमिकाओं के साथ RBAC, रीयल-टाइम चैट और कमेंट थ्रेड | +| वीडियो जनरेशन | नैरेशन और विज़ुअल के साथ वीडियो बनाएं | +| प्रेजेंटेशन जनरेशन | संपादन योग्य, स्लाइड आधारित प्रेजेंटेशन बनाएं | | पॉडकास्ट जनरेशन | 20 सेकंड से कम में 3 मिनट का पॉडकास्ट; कई TTS प्रदाता (OpenAI, Azure, Kokoro) | | ब्राउज़र एक्सटेंशन | किसी भी वेबपेज को सहेजने के लिए क्रॉस-ब्राउज़र एक्सटेंशन, प्रमाणीकरण सुरक्षित पेज सहित | | 25+ कनेक्टर्स | सर्च इंजन, Google Drive, Slack, Teams, Jira, Notion, GitHub, Discord और [अधिक](#बाहरी-स्रोत) | diff --git a/README.md b/README.md index f37664dd7..7ad66e0d9 100644 --- a/README.md +++ b/README.md @@ -27,11 +27,18 @@ SurfSense is a highly customizable AI research agent, connected to external sour -# Video +# Demo https://github.com/user-attachments/assets/cc0c84d3-1f2f-4f7a-b519-2ecce22310b1 -## Podcast Sample +## Video Agent Sample + + +https://github.com/user-attachments/assets/cc977e6d-8292-4ffe-abb8-3b0560ef5562 + + + +## Podcast Agent Sample https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7 @@ -46,20 +53,25 @@ https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7 2. Connect your connectors and sync. Enable periodic syncing to keep connectors synced. -

Connectors

+

Connectors

 3. While connector data is being indexed, upload documents. -

Upload Documents

+

Upload Documents

4. Once everything is indexed, Ask Away (Use Cases): + - Video Generation + +

Video Generation

+ - Basic search and citation

Search and Citation

- Document Mention QNA +

Document Mention QNA

Document Mention QNA

- Report Generations and Exports (PDF, DOCX, HTML, LaTeX, EPUB, ODT, Plain Text) @@ -133,6 +145,8 @@ For Docker Compose, manual installation, and other deployment options, see the [ | Universal LLM Support | 100+ LLMs, 6000+ embedding models, all major rerankers via OpenAI spec & LiteLLM | | Privacy First | Full local LLM support (vLLM, Ollama) your data stays yours | | Team Collaboration | RBAC with Owner / Admin / Editor / Viewer roles, real time chat & comment threads | +| Video Generation | Generate videos with narration and visuals | +| Presentation Generation | Create editable, slide based presentations | | Podcast Generation | 3 min podcast in under 20 seconds; multiple TTS providers (OpenAI, Azure, Kokoro) | | Browser Extension | Cross browser extension to save any webpage, including auth protected pages | | 25+ Connectors | Search Engines, Google Drive, Slack, Teams, Jira, Notion, GitHub, Discord & [more](#external-sources) | diff --git a/README.pt-BR.md b/README.pt-BR.md index 5461d8824..4b93a8036 100644 --- a/README.pt-BR.md +++ b/README.pt-BR.md @@ -27,11 +27,18 @@ SurfSense é um agente de pesquisa de IA altamente personalizável, conectado a -# Vídeo +# Demo https://github.com/user-attachments/assets/cc0c84d3-1f2f-4f7a-b519-2ecce22310b1 -## Exemplo de Podcast +## Exemplo de Agente de Vídeo + + +https://github.com/user-attachments/assets/cc977e6d-8292-4ffe-abb8-3b0560ef5562 + + + +## Exemplo de Agente de Podcast https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7 @@ -46,20 +53,25 @@ https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7 2. Conecte seus conectores e sincronize. Ative a sincronização periódica para manter os conectores atualizados. -

Conectores

+

Conectores

3. Enquanto os dados dos conectores são indexados, faça upload de documentos. -

Upload de Documentos

+

Upload de Documentos

4. Quando tudo estiver indexado, pergunte o que quiser (Casos de uso): + - Geração de vídeos + +

Geração de Vídeos

+ - Busca básica e citações

Busca e Citação

- QNA com menção de documentos +

QNA com Menção de Documentos

QNA com Menção de Documentos

- Geração de relatórios e exportações (PDF, DOCX, HTML, LaTeX, EPUB, ODT, texto simples) @@ -133,6 +145,8 @@ Para Docker Compose, instalação manual e outras opções de implantação, con | Suporte Universal de LLM | 100+ LLMs, 6000+ modelos de embeddings, todos os principais rerankers via OpenAI spec e LiteLLM | | Privacidade em Primeiro Lugar | Suporte completo a LLM local (vLLM, Ollama) seus dados ficam com você | | Colaboração em Equipe | RBAC com papéis de Proprietário / Admin / Editor / Visualizador, chat em tempo real e threads de comentários | +| Geração de Vídeos | Gera vídeos com narração e visuais | +| Geração de Apresentações | Cria apresentações editáveis baseadas em slides | | Geração de Podcasts | Podcast de 3 min em menos de 20 segundos; múltiplos provedores TTS (OpenAI, Azure, Kokoro) | | Extensão de Navegador | Extensão multi-navegador para salvar qualquer página web, incluindo páginas protegidas por autenticação | | 25+ Conectores | Mecanismos de busca, Google Drive, Slack, Teams, Jira, Notion, GitHub, Discord e [mais](#fontes-externas) | diff --git a/README.zh-CN.md b/README.zh-CN.md index 9333348b6..5230a5b80 100644 --- a/README.zh-CN.md +++ b/README.zh-CN.md @@ -27,11 +27,18 @@ SurfSense 是一个高度可定制的 AI 研究助手,可以连接外部数据 -# 视频 +# 演示 https://github.com/user-attachments/assets/cc0c84d3-1f2f-4f7a-b519-2ecce22310b1 -## 播客示例 +## 视频代理示例 + + +https://github.com/user-attachments/assets/cc977e6d-8292-4ffe-abb8-3b0560ef5562 + + + +## 播客代理示例 https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7 @@ -46,20 +53,25 @@ https://github.com/user-attachments/assets/a0a16566-6967-4374-ac51-9b3e07fbecd7 2. 连接您的连接器并同步。启用定期同步以保持连接器数据更新。 -

连接器

+

连接器

3. 在连接器数据索引期间,上传文档。 -

上传文档

+

上传文档

4. 一切索引完成后,尽管提问(使用场景): + - 视频生成 + +

视频生成

+ - 基本搜索和引用

搜索和引用

- 文档提及问答 +

文档提及问答

文档提及问答

- 报告生成和导出(PDF、DOCX、HTML、LaTeX、EPUB、ODT、纯文本) @@ -133,6 +145,8 @@ irm https://raw.githubusercontent.com/MODSetter/SurfSense/main/docker/scripts/in | 通用 LLM 支持 | 100+ LLM、6000+ 嵌入模型、所有主流重排序器,通过 OpenAI spec 和 LiteLLM | | 隐私优先 | 完整本地 LLM 支持(vLLM、Ollama),您的数据由您掌控 | | 团队协作 | RBAC 角色控制(所有者/管理员/编辑者/查看者),实时聊天和评论线程 | +| 视频生成 | 生成带有旁白和视觉效果的视频 | +| 演示文稿生成 | 创建可编辑的幻灯片式演示文稿 | | 播客生成 | 20 秒内生成 3 分钟播客;多种 TTS 提供商(OpenAI、Azure、Kokoro) | | 浏览器扩展 | 跨浏览器扩展,保存任何网页,包括需要身份验证的页面 | | 25+ 连接器 | 搜索引擎、Google Drive、Slack、Teams、Jira、Notion、GitHub、Discord 等[更多](#外部数据源) | diff --git a/docker/.env.example b/docker/.env.example index c31b87185..a226c2624 100644 --- a/docker/.env.example +++ b/docker/.env.example @@ -36,6 +36,7 @@ EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2 # BACKEND_PORT=8929 # FRONTEND_PORT=3929 # ELECTRIC_PORT=5929 +# SEARXNG_PORT=8888 # FLOWER_PORT=5555 # ============================================================================== @@ -199,6 +200,16 @@ STT_SERVICE=local/base # COMPOSIO_ENABLED=TRUE # COMPOSIO_REDIRECT_URI=http://localhost:8000/api/v1/auth/composio/connector/callback +# ------------------------------------------------------------------------------ +# SearXNG (bundled web search — works out of the box, no config needed) +# ------------------------------------------------------------------------------ +# SearXNG provides web search to all search spaces automatically. 
+# To access the SearXNG UI directly: http://localhost:8888 +# To disable the service entirely: docker compose up --scale searxng=0 +# To point at your own SearXNG instance instead of the bundled one: +# SEARXNG_DEFAULT_HOST=http://your-searxng:8080 +# SEARXNG_SECRET=surfsense-searxng-secret + # ------------------------------------------------------------------------------ # Daytona Sandbox (optional — cloud code execution for the deep agent) # ------------------------------------------------------------------------------ diff --git a/docker/docker-compose.dev.yml b/docker/docker-compose.dev.yml index 4d602f584..15531bf55 100644 --- a/docker/docker-compose.dev.yml +++ b/docker/docker-compose.dev.yml @@ -57,6 +57,20 @@ services: timeout: 5s retries: 5 + searxng: + image: searxng/searxng:2026.3.13-3c1f68c59 + ports: + - "${SEARXNG_PORT:-8888}:8080" + volumes: + - ./searxng:/etc/searxng + environment: + - SEARXNG_SECRET=${SEARXNG_SECRET:-surfsense-searxng-secret} + healthcheck: + test: ["CMD", "wget", "--spider", "-q", "http://localhost:8080/healthz"] + interval: 10s + timeout: 5s + retries: 5 + backend: build: ../surfsense_backend ports: @@ -81,6 +95,7 @@ services: - ELECTRIC_DB_PASSWORD=${ELECTRIC_DB_PASSWORD:-electric_password} - AUTH_TYPE=${AUTH_TYPE:-LOCAL} - NEXT_FRONTEND_URL=${NEXT_FRONTEND_URL:-http://localhost:3000} + - SEARXNG_DEFAULT_HOST=${SEARXNG_DEFAULT_HOST:-http://searxng:8080} # Daytona Sandbox – uncomment and set credentials to enable cloud code execution # - DAYTONA_SANDBOX_ENABLED=TRUE # - DAYTONA_API_KEY=${DAYTONA_API_KEY:-} @@ -92,6 +107,8 @@ services: condition: service_healthy redis: condition: service_healthy + searxng: + condition: service_healthy healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] interval: 15s @@ -115,6 +132,7 @@ services: - PYTHONPATH=/app - ELECTRIC_DB_USER=${ELECTRIC_DB_USER:-electric} - ELECTRIC_DB_PASSWORD=${ELECTRIC_DB_PASSWORD:-electric_password} + - 
SEARXNG_DEFAULT_HOST=${SEARXNG_DEFAULT_HOST:-http://searxng:8080} - SERVICE_ROLE=worker depends_on: db: diff --git a/docker/docker-compose.yml b/docker/docker-compose.yml index ca20e3ed4..8c85248d2 100644 --- a/docker/docker-compose.yml +++ b/docker/docker-compose.yml @@ -42,6 +42,19 @@ services: timeout: 5s retries: 5 + searxng: + image: searxng/searxng:2026.3.13-3c1f68c59 + volumes: + - ./searxng:/etc/searxng + environment: + SEARXNG_SECRET: ${SEARXNG_SECRET:-surfsense-searxng-secret} + restart: unless-stopped + healthcheck: + test: ["CMD", "wget", "--spider", "-q", "http://localhost:8080/healthz"] + interval: 10s + timeout: 5s + retries: 5 + backend: image: ghcr.io/modsetter/surfsense-backend:${SURFSENSE_VERSION:-latest} ports: @@ -62,6 +75,7 @@ services: ELECTRIC_DB_USER: ${ELECTRIC_DB_USER:-electric} ELECTRIC_DB_PASSWORD: ${ELECTRIC_DB_PASSWORD:-electric_password} NEXT_FRONTEND_URL: ${NEXT_FRONTEND_URL:-http://localhost:${FRONTEND_PORT:-3929}} + SEARXNG_DEFAULT_HOST: ${SEARXNG_DEFAULT_HOST:-http://searxng:8080} # Daytona Sandbox – uncomment and set credentials to enable cloud code execution # DAYTONA_SANDBOX_ENABLED: "TRUE" # DAYTONA_API_KEY: ${DAYTONA_API_KEY:-} @@ -75,6 +89,8 @@ services: condition: service_healthy redis: condition: service_healthy + searxng: + condition: service_healthy restart: unless-stopped healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] @@ -98,6 +114,7 @@ services: PYTHONPATH: /app ELECTRIC_DB_USER: ${ELECTRIC_DB_USER:-electric} ELECTRIC_DB_PASSWORD: ${ELECTRIC_DB_PASSWORD:-electric_password} + SEARXNG_DEFAULT_HOST: ${SEARXNG_DEFAULT_HOST:-http://searxng:8080} SERVICE_ROLE: worker depends_on: db: diff --git a/docker/scripts/install.ps1 b/docker/scripts/install.ps1 index 5f41ef7d6..b7004bae2 100644 --- a/docker/scripts/install.ps1 +++ b/docker/scripts/install.ps1 @@ -103,6 +103,7 @@ Write-Step "Downloading SurfSense files" Write-Info "Installation directory: $InstallDir" New-Item -ItemType Directory -Path 
"$InstallDir\scripts" -Force | Out-Null +New-Item -ItemType Directory -Path "$InstallDir\searxng" -Force | Out-Null $Files = @( @{ Src = "docker/docker-compose.yml"; Dest = "docker-compose.yml" } @@ -110,6 +111,8 @@ $Files = @( @{ Src = "docker/postgresql.conf"; Dest = "postgresql.conf" } @{ Src = "docker/scripts/init-electric-user.sh"; Dest = "scripts/init-electric-user.sh" } @{ Src = "docker/scripts/migrate-database.ps1"; Dest = "scripts/migrate-database.ps1" } + @{ Src = "docker/searxng/settings.yml"; Dest = "searxng/settings.yml" } + @{ Src = "docker/searxng/limiter.toml"; Dest = "searxng/limiter.toml" } ) foreach ($f in $Files) { diff --git a/docker/scripts/install.sh b/docker/scripts/install.sh index eb6aeb83d..7a68a9bd1 100644 --- a/docker/scripts/install.sh +++ b/docker/scripts/install.sh @@ -102,6 +102,7 @@ wait_for_pg() { step "Downloading SurfSense files" info "Installation directory: ${INSTALL_DIR}" mkdir -p "${INSTALL_DIR}/scripts" +mkdir -p "${INSTALL_DIR}/searxng" FILES=( "docker/docker-compose.yml:docker-compose.yml" @@ -109,6 +110,8 @@ FILES=( "docker/postgresql.conf:postgresql.conf" "docker/scripts/init-electric-user.sh:scripts/init-electric-user.sh" "docker/scripts/migrate-database.sh:scripts/migrate-database.sh" + "docker/searxng/settings.yml:searxng/settings.yml" + "docker/searxng/limiter.toml:searxng/limiter.toml" ) for entry in "${FILES[@]}"; do diff --git a/docker/searxng/limiter.toml b/docker/searxng/limiter.toml new file mode 100644 index 000000000..dce84146f --- /dev/null +++ b/docker/searxng/limiter.toml @@ -0,0 +1,5 @@ +[botdetection.ip_limit] +link_token = false + +[botdetection.ip_lists] +pass_ip = ["0.0.0.0/0"] diff --git a/docker/searxng/settings.yml b/docker/searxng/settings.yml new file mode 100644 index 000000000..0b805b6aa --- /dev/null +++ b/docker/searxng/settings.yml @@ -0,0 +1,90 @@ +use_default_settings: + engines: + remove: + - ahmia + - torch + - qwant + - qwant news + - qwant images + - qwant videos + - mojeek + - mojeek 
images + - mojeek news + +server: + secret_key: "override-me-via-env" + limiter: false + image_proxy: false + method: "GET" + default_http_headers: + X-Robots-Tag: "noindex, nofollow" + +search: + formats: + - html + - json + default_lang: "auto" + autocomplete: "" + safe_search: 0 + ban_time_on_fail: 5 + max_ban_time_on_fail: 120 + suspended_times: + SearxEngineAccessDenied: 3600 + SearxEngineCaptcha: 3600 + SearxEngineTooManyRequests: 600 + cf_SearxEngineCaptcha: 7200 + cf_SearxEngineAccessDenied: 3600 + recaptcha_SearxEngineCaptcha: 7200 + +ui: + static_use_hash: true + +outgoing: + request_timeout: 12.0 + max_request_timeout: 20.0 + pool_connections: 100 + pool_maxsize: 20 + enable_http2: true + extra_proxy_timeout: 10 + retries: 1 + # Uncomment and set your residential proxy URL to route search engine requests through it. + # Format: http://:@:/ + # + # proxies: + # all://: + # - http://user:pass@proxy-host:port/ + +engines: + - name: google + disabled: false + weight: 1.2 + retry_on_http_error: [429, 503] + - name: duckduckgo + disabled: false + weight: 1.1 + retry_on_http_error: [429, 503] + - name: brave + disabled: false + weight: 1.0 + retry_on_http_error: [429, 503] + - name: bing + disabled: false + weight: 0.9 + retry_on_http_error: [429, 503] + - name: wikipedia + disabled: false + weight: 0.8 + - name: stackoverflow + disabled: false + weight: 0.7 + - name: yahoo + disabled: false + weight: 0.7 + retry_on_http_error: [429, 503] + - name: wikidata + disabled: false + weight: 0.6 + - name: currency + disabled: false + - name: ddg definitions + disabled: false diff --git a/docs/chinese-llm-setup.md b/docs/chinese-llm-setup.md index 2a184608f..37042aa2f 100644 --- a/docs/chinese-llm-setup.md +++ b/docs/chinese-llm-setup.md @@ -14,6 +14,7 @@ SurfSense now supports the following Chinese LLMs: - ✅ **阿里通义千问 (Alibaba Qwen)** - Alibaba Cloud's Tongyi Qianwen large language model - ✅ **月之暗面 Kimi (Moonshot)** - Moonshot AI's Kimi large language model - ✅ **智谱 AI GLM (Zhipu)** - Zhipu AI's GLM model series +- ✅ **MiniMax** - MiniMax large language models (M2.5 series, 204K context) --- @@ 
-197,6 +198,52 @@ API Base URL: https://open.bigmodel.cn/api/paas/v4 --- +## 5️⃣ MiniMax Configuration + +### Get an API Key + +1. Visit the [MiniMax Open Platform](https://platform.minimaxi.com/) +2. Register and log in +3. Go to the **API Keys** page +4. Create a new API Key +5. Copy the API Key + +### Configure in SurfSense + +| Field | Value | Notes | +|------|-----|------| +| **Configuration Name** | `MiniMax M2.5` | Configuration name (user-defined) | +| **Provider** | `MINIMAX` | Select MiniMax | +| **Model Name** | `MiniMax-M2.5` | Recommended model; other option: `MiniMax-M2.5-highspeed` |
+| **API Key** | `eyJ...` | Your MiniMax API Key | +| **API Base URL** | `https://api.minimax.io/v1` | MiniMax API endpoint | +| **Parameters** | `{"temperature": 1.0}` | Note: temperature must be in the range (0.0, 1.0]; it cannot be 0 | + +### Example Configuration + +``` +Configuration Name: MiniMax M2.5 +Provider: MINIMAX +Model Name: MiniMax-M2.5 +API Key: eyJxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +API Base URL: https://api.minimax.io/v1 +``` + +### Available Models + +- **MiniMax-M2.5**: High-performance general-purpose model, 204K context window (recommended) +- **MiniMax-M2.5-highspeed**: High-speed inference variant, 204K context window + +### Notes + +- **temperature parameter**: MiniMax requires temperature to be in the range (0.0, 1.0]; it cannot be set to 0. A value of 1.0 is recommended. +- Both models support a 204K long context window, which suits long-text tasks. + +### Pricing +- See the [MiniMax pricing page](https://platform.minimaxi.com/document/Price) for current prices + +--- + ## ⚙️ Advanced Configuration ### Custom Parameters @@ -268,8 +315,8 @@ docker compose logs backend | grep -i "error" |---------|---------|------| | **Document summarization** | Qwen-Plus, GLM-4 | Balances performance and cost | | **Code analysis** | DeepSeek-Coder | Code-specialized | -| **Long-text processing** | Kimi 128K | Very long context | -| **Fast response** | Qwen-Turbo, GLM-4-Flash | Speed first | +| **Long-text processing** | Kimi 128K, MiniMax-M2.5 (204K) | Very long context | +| **Fast response** | Qwen-Turbo, GLM-4-Flash, MiniMax-M2.5-highspeed | Speed first | ### 2. Cost Optimization @@ -294,6 +341,7 @@ docker compose logs backend | grep -i "error" - [Alibaba Cloud Bailian documentation](https://help.aliyun.com/zh/model-studio/) - [Moonshot AI documentation](https://platform.moonshot.cn/docs) - [Zhipu AI documentation](https://open.bigmodel.cn/dev/api) +- [MiniMax documentation](https://platform.minimaxi.com/document/Guides) ### SurfSense Documentation diff --git a/surfsense_backend/.env.example b/surfsense_backend/.env.example index 413be03c4..621c8cf99 100644 --- a/surfsense_backend/.env.example +++ b/surfsense_backend/.env.example @@ -12,6 +12,11 @@ REDIS_APP_URL=redis://localhost:6379/0 # Optional: TTL in seconds for connector indexing lock key # CONNECTOR_INDEXING_LOCK_TTL_SECONDS=28800 +# Platform Web Search (SearXNG) +# Set this to enable built-in web search. Docker Compose sets it automatically. 
+# Only uncomment if running the backend outside Docker (e.g. uvicorn on host). +# SEARXNG_DEFAULT_HOST=http://localhost:8888 + #Electric(for migrations only) ELECTRIC_DB_USER=electric ELECTRIC_DB_PASSWORD=electric_password diff --git a/surfsense_backend/.gitignore b/surfsense_backend/.gitignore index 443c85e9c..1cd7fd32c 100644 --- a/surfsense_backend/.gitignore +++ b/surfsense_backend/.gitignore @@ -6,6 +6,7 @@ __pycache__/ .flashrank_cache surf_new_backend.egg-info/ podcasts/ +video_presentation_audio/ sandbox_files/ temp_audio/ celerybeat-schedule* diff --git a/surfsense_backend/alembic/versions/106_add_minimax_to_litellmprovider_enum.py b/surfsense_backend/alembic/versions/106_add_minimax_to_litellmprovider_enum.py new file mode 100644 index 000000000..fed3bc7c3 --- /dev/null +++ b/surfsense_backend/alembic/versions/106_add_minimax_to_litellmprovider_enum.py @@ -0,0 +1,23 @@ +"""Add MINIMAX to LiteLLMProvider enum + +Revision ID: 106 +Revises: 105 +""" + +from collections.abc import Sequence + +from alembic import op + +revision: str = "106" +down_revision: str | None = "105" +branch_labels: str | Sequence[str] | None = None +depends_on: str | Sequence[str] | None = None + + +def upgrade() -> None: + op.execute("COMMIT") + op.execute("ALTER TYPE litellmprovider ADD VALUE IF NOT EXISTS 'MINIMAX'") + + +def downgrade() -> None: + pass diff --git a/surfsense_backend/alembic/versions/107_add_video_presentations_table.py b/surfsense_backend/alembic/versions/107_add_video_presentations_table.py new file mode 100644 index 000000000..e6f928b50 --- /dev/null +++ b/surfsense_backend/alembic/versions/107_add_video_presentations_table.py @@ -0,0 +1,85 @@ +"""Add video_presentations table and video_presentation_status enum + +Revision ID: 107 +Revises: 106 +""" + +from collections.abc import Sequence + +import sqlalchemy as sa +from sqlalchemy.dialects.postgresql import JSONB + +from alembic import op + +revision: str = "107" +down_revision: str | None = "106" 
+branch_labels: str | Sequence[str] | None = None +depends_on: str | Sequence[str] | None = None + +video_presentation_status_enum = sa.Enum( + "pending", + "generating", + "ready", + "failed", + name="video_presentation_status", +) + + +def upgrade() -> None: + video_presentation_status_enum.create(op.get_bind(), checkfirst=True) + + op.create_table( + "video_presentations", + sa.Column("id", sa.Integer(), autoincrement=True, nullable=False), + sa.Column("title", sa.String(length=500), nullable=False), + sa.Column("slides", JSONB(), nullable=True), + sa.Column("scene_codes", JSONB(), nullable=True), + sa.Column( + "status", + video_presentation_status_enum, + server_default="ready", + nullable=False, + ), + sa.Column("search_space_id", sa.Integer(), nullable=False), + sa.Column("thread_id", sa.Integer(), nullable=True), + sa.Column( + "created_at", + sa.TIMESTAMP(timezone=True), + server_default=sa.text("now()"), + nullable=False, + ), + sa.ForeignKeyConstraint( + ["search_space_id"], + ["searchspaces.id"], + ondelete="CASCADE", + ), + sa.ForeignKeyConstraint( + ["thread_id"], + ["new_chat_threads.id"], + ondelete="SET NULL", + ), + sa.PrimaryKeyConstraint("id"), + ) + op.create_index( + "ix_video_presentations_status", + "video_presentations", + ["status"], + ) + op.create_index( + "ix_video_presentations_thread_id", + "video_presentations", + ["thread_id"], + ) + op.create_index( + "ix_video_presentations_created_at", + "video_presentations", + ["created_at"], + ) + + +def downgrade() -> None: + op.drop_index("ix_video_presentations_created_at", table_name="video_presentations") + op.drop_index("ix_video_presentations_thread_id", table_name="video_presentations") + op.drop_index("ix_video_presentations_status", table_name="video_presentations") + op.drop_table("video_presentations") + video_presentation_status_enum.drop(op.get_bind(), checkfirst=True) diff --git a/surfsense_backend/app/agents/new_chat/chat_deepagent.py 
b/surfsense_backend/app/agents/new_chat/chat_deepagent.py index f3d988e5b..c247ada61 100644 --- a/surfsense_backend/app/agents/new_chat/chat_deepagent.py +++ b/surfsense_backend/app/agents/new_chat/chat_deepagent.py @@ -37,13 +37,15 @@ _perf_log = get_perf_logger() # ============================================================================= # Maps SearchSourceConnectorType enum values to the searchable document/connector types -# used by the knowledge_base tool. Some connectors map to different document types. +# used by the knowledge_base and web_search tools. +# Live search connectors (TAVILY_API, LINKUP_API, BAIDU_SEARCH_API) are routed to +# the web_search tool; all others go to search_knowledge_base. _CONNECTOR_TYPE_TO_SEARCHABLE: dict[str, str] = { - # Direct mappings (connector type == searchable type) + # Live search connectors (handled by web_search tool) "TAVILY_API": "TAVILY_API", - "SEARXNG_API": "SEARXNG_API", "LINKUP_API": "LINKUP_API", "BAIDU_SEARCH_API": "BAIDU_SEARCH_API", + # Local/indexed connectors (handled by search_knowledge_base tool) "SLACK_CONNECTOR": "SLACK_CONNECTOR", "TEAMS_CONNECTOR": "TEAMS_CONNECTOR", "NOTION_CONNECTOR": "NOTION_CONNECTOR", @@ -233,6 +235,7 @@ async def create_surfsense_deep_agent( available_document_types = await connector_service.get_available_document_types( search_space_id ) + except Exception as e: logging.warning(f"Failed to discover available connectors/document types: {e}") _perf_log.info( diff --git a/surfsense_backend/app/agents/new_chat/llm_config.py b/surfsense_backend/app/agents/new_chat/llm_config.py index 4ddb47330..60cd2a452 100644 --- a/surfsense_backend/app/agents/new_chat/llm_config.py +++ b/surfsense_backend/app/agents/new_chat/llm_config.py @@ -59,6 +59,7 @@ PROVIDER_MAP = { "DATABRICKS": "databricks", "COMETAPI": "cometapi", "HUGGINGFACE": "huggingface", + "MINIMAX": "openai", "CUSTOM": "custom", } diff --git a/surfsense_backend/app/agents/new_chat/system_prompt.py 
b/surfsense_backend/app/agents/new_chat/system_prompt.py index b042f75c3..f8ac62787 100644 --- a/surfsense_backend/app/agents/new_chat/system_prompt.py +++ b/surfsense_backend/app/agents/new_chat/system_prompt.py @@ -99,14 +99,8 @@ _TOOL_INSTRUCTIONS["search_knowledge_base"] = """ - IMPORTANT: When searching for information (meetings, schedules, notes, tasks, etc.), ALWAYS search broadly across ALL sources first by omitting connectors_to_search. The user may store information in various places including calendar apps, note-taking apps (Obsidian, Notion), chat apps (Slack, Discord), and more. - - IMPORTANT (REAL-TIME / PUBLIC WEB QUERIES): For questions that require current public web data - (e.g., live exchange rates, stock prices, breaking news, weather, current events), you MUST call - `search_knowledge_base` using live web connectors via `connectors_to_search`: - ["LINKUP_API", "TAVILY_API", "SEARXNG_API", "BAIDU_SEARCH_API"]. - - For these real-time/public web queries, DO NOT answer from memory and DO NOT say you lack internet - access before attempting a live connector search. - - If the live connectors return no relevant results, explain that live web sources did not return enough - data and ask the user if they want you to retry with a refined query. + - This tool searches ONLY local/indexed data (uploaded files, Notion, Slack, browser extension captures, etc.). + For real-time web search (current events, news, live data), use the `web_search` tool instead. - FALLBACK BEHAVIOR: If the search returns no relevant results, you MAY then answer using your own general knowledge, but clearly indicate that no matching information was found in the knowledge base. - Only narrow to specific connectors if the user explicitly asks (e.g., "check my Slack" or "in my calendar"). 
@@ -138,6 +132,17 @@ _TOOL_INSTRUCTIONS["generate_podcast"] = """ - After calling this tool, inform the user that podcast generation has started and they will see the player when it's ready (takes 3-5 minutes). """ +_TOOL_INSTRUCTIONS["generate_video_presentation"] = """ +- generate_video_presentation: Generate a video presentation from provided content. + - Use this when the user asks to create a video, presentation, slides, or slide deck. + - Trigger phrases: "give me a presentation", "create slides", "generate a video", "make a slide deck", "turn this into a presentation" + - Args: + - source_content: The text content to turn into a presentation. The more detailed, the better. + - video_title: Optional title (default: "SurfSense Presentation") + - user_prompt: Optional style instructions (e.g., "Make it technical and detailed") + - After calling this tool, inform the user that generation has started and they will see the presentation when it's ready. +""" + _TOOL_INSTRUCTIONS["generate_report"] = """ - generate_report: Generate or revise a structured Markdown report artifact. - WHEN TO CALL THIS TOOL — the message must contain a creation or modification VERB directed at producing a deliverable: @@ -271,6 +276,24 @@ _TOOL_INSTRUCTIONS["scrape_webpage"] = """ * Don't show every image - just the most relevant 1-3 images that enhance understanding. """ +_TOOL_INSTRUCTIONS["web_search"] = """ +- web_search: Search the web for real-time information using all configured search engines. + - Use this for current events, news, prices, weather, public facts, or any question requiring + up-to-date information from the internet. + - This tool dispatches to all configured search engines (SearXNG, Tavily, Linkup, Baidu) in + parallel and merges the results. 
+ - IMPORTANT (REAL-TIME / PUBLIC WEB QUERIES): For questions that require current public web data + (e.g., live exchange rates, stock prices, breaking news, weather, current events), you MUST call + `web_search` instead of answering from memory. + - For these real-time/public web queries, DO NOT answer from memory and DO NOT say you lack internet + access before attempting a web search. + - If the search returns no relevant results, explain that web sources did not return enough + data and ask the user if they want you to retry with a refined query. + - Args: + - query: The search query - use specific, descriptive terms + - top_k: Number of results to retrieve (default: 10, max: 50) +""" + # Memory tool instructions have private and shared variants. # We store them keyed as "save_memory" / "recall_memory" with sub-keys. _MEMORY_TOOL_INSTRUCTIONS: dict[str, dict[str, str]] = { @@ -401,7 +424,7 @@ _TOOL_EXAMPLES["search_knowledge_base"] = """ - User: "Check my Obsidian notes for meeting notes" - Call: `search_knowledge_base(query="meeting notes", connectors_to_search=["OBSIDIAN_CONNECTOR"])` - User: "search me current usd to inr rate" - - Call: `search_knowledge_base(query="current USD to INR exchange rate", connectors_to_search=["LINKUP_API", "TAVILY_API", "SEARXNG_API", "BAIDU_SEARCH_API"])` + - Call: `web_search(query="current USD to INR exchange rate")` - Then answer using the returned live web results with citations. 
""" @@ -426,6 +449,16 @@ _TOOL_EXAMPLES["generate_podcast"] = """ - Then: `generate_podcast(source_content="Key insights about quantum computing from the knowledge base:\\n\\n[Comprehensive summary of all relevant search results with key facts, concepts, and findings]", podcast_title="Quantum Computing Explained")` """ +_TOOL_EXAMPLES["generate_video_presentation"] = """ +- User: "Give me a presentation about AI trends based on what we discussed" + - First search for relevant content, then call: `generate_video_presentation(source_content="Based on our conversation and search results: [detailed summary of chat + search findings]", video_title="AI Trends Presentation")` +- User: "Create slides summarizing this conversation" + - Call: `generate_video_presentation(source_content="Complete conversation summary:\\n\\nUser asked about [topic 1]:\\n[Your detailed response]\\n\\nUser then asked about [topic 2]:\\n[Your detailed response]\\n\\n[Continue for all exchanges in the conversation]", video_title="Conversation Summary")` +- User: "Make a video presentation about quantum computing" + - First search: `search_knowledge_base(query="quantum computing")` + - Then: `generate_video_presentation(source_content="Key insights about quantum computing from the knowledge base:\\n\\n[Comprehensive summary of all relevant search results with key facts, concepts, and findings]", video_title="Quantum Computing Explained")` +""" + _TOOL_EXAMPLES["generate_report"] = """ - User: "Generate a report about AI trends" - Call: `generate_report(topic="AI Trends Report", source_strategy="kb_search", search_queries=["AI trends recent developments", "artificial intelligence industry trends", "AI market growth and predictions"], report_style="detailed")` @@ -471,11 +504,23 @@ _TOOL_EXAMPLES["generate_image"] = """ - Step 2: `display_image(src="", alt="Bean Dream coffee shop logo", title="Generated Image")` """ +_TOOL_EXAMPLES["web_search"] = """ +- User: "What's the current USD to INR exchange 
rate?" + - Call: `web_search(query="current USD to INR exchange rate")` + - Then answer using the returned web results with citations. +- User: "What's the latest news about AI?" + - Call: `web_search(query="latest AI news today")` +- User: "What's the weather in New York?" + - Call: `web_search(query="weather New York today")` +""" + # All tool names that have prompt instructions (order matters for prompt readability) _ALL_TOOL_NAMES_ORDERED = [ "search_surfsense_docs", "search_knowledge_base", + "web_search", "generate_podcast", + "generate_video_presentation", "generate_report", "link_preview", "display_image", @@ -543,7 +588,7 @@ DISABLED TOOLS (by user): The following tools are available in SurfSense but have been disabled by the user for this session: {disabled_list}. You do NOT have access to these tools and MUST NOT claim you can use them. If the user asks about a capability provided by a disabled tool, let them know the relevant tool -is currently disabled and they can re-enable it from the tools menu (wrench icon) in the composer toolbar. +is currently disabled and they can re-enable it. 
""") parts.append("\n\n") @@ -595,11 +640,10 @@ The documents you receive are structured like this: -**Live web search results (URL chunk IDs):** +**Web search results (URL chunk IDs):** - TAVILY_API::Some Title::https://example.com/article - TAVILY_API + WEB_SEARCH <![CDATA[Some web search result]]> diff --git a/surfsense_backend/app/agents/new_chat/tools/__init__.py b/surfsense_backend/app/agents/new_chat/tools/__init__.py index 0a11951f0..5002e69bb 100644 --- a/surfsense_backend/app/agents/new_chat/tools/__init__.py +++ b/surfsense_backend/app/agents/new_chat/tools/__init__.py @@ -8,6 +8,7 @@ Available tools: - search_knowledge_base: Search the user's personal knowledge base - search_surfsense_docs: Search Surfsense documentation for usage help - generate_podcast: Generate audio podcasts from content +- generate_video_presentation: Generate video presentations with slides and narration - generate_image: Generate images from text descriptions using AI models - link_preview: Fetch rich previews for URLs - display_image: Display images in chat @@ -39,6 +40,7 @@ from .registry import ( from .scrape_webpage import create_scrape_webpage_tool from .search_surfsense_docs import create_search_surfsense_docs_tool from .user_memory import create_recall_memory_tool, create_save_memory_tool +from .video_presentation import create_generate_video_presentation_tool __all__ = [ # Registry @@ -51,6 +53,7 @@ __all__ = [ "create_display_image_tool", "create_generate_image_tool", "create_generate_podcast_tool", + "create_generate_video_presentation_tool", "create_link_preview_tool", "create_recall_memory_tool", "create_save_memory_tool", diff --git a/surfsense_backend/app/agents/new_chat/tools/knowledge_base.py b/surfsense_backend/app/agents/new_chat/tools/knowledge_base.py index 4596d5efd..a683b1c17 100644 --- a/surfsense_backend/app/agents/new_chat/tools/knowledge_base.py +++ b/surfsense_backend/app/agents/new_chat/tools/knowledge_base.py @@ -23,11 +23,10 @@ from app.db import 
shielded_async_session from app.services.connector_service import ConnectorService from app.utils.perf import get_perf_logger -# Connectors that call external live-search APIs (no local DB / embedding needed). -# These are never filtered by available_document_types. +# Connectors that call external live-search APIs. These are handled by the +# ``web_search`` tool and must be excluded from knowledge-base searches. _LIVE_SEARCH_CONNECTORS: set[str] = { "TAVILY_API", - "SEARXNG_API", "LINKUP_API", "BAIDU_SEARCH_API", } @@ -190,10 +189,6 @@ _ALL_CONNECTORS: list[str] = [ "GOOGLE_DRIVE_FILE", "DISCORD_CONNECTOR", "AIRTABLE_CONNECTOR", - "TAVILY_API", - "SEARXNG_API", - "LINKUP_API", - "BAIDU_SEARCH_API", "LUMA_CONNECTOR", "NOTE", "BOOKSTACK_CONNECTOR", @@ -227,10 +222,6 @@ CONNECTOR_DESCRIPTIONS: dict[str, str] = { "GOOGLE_DRIVE_FILE": "Google Drive files and documents (personal cloud storage)", "DISCORD_CONNECTOR": "Discord server conversations and shared content (personal community)", "AIRTABLE_CONNECTOR": "Airtable records, tables, and database content (personal data)", - "TAVILY_API": "Tavily web search API results (real-time web search)", - "SEARXNG_API": "SearxNG search API results (privacy-focused web search)", - "LINKUP_API": "Linkup search API results (web search)", - "BAIDU_SEARCH_API": "Baidu search API results (Chinese web search)", "LUMA_CONNECTOR": "Luma events and meetings", "WEBCRAWLER_CONNECTOR": "Webpages indexed by SurfSense (personally selected websites)", "CRAWLED_URL": "Webpages indexed by SurfSense (personally selected websites)", @@ -268,14 +259,15 @@ def _normalize_connectors( valid_set = ( set(available_connectors) if available_connectors else set(_ALL_CONNECTORS) ) + valid_set -= _LIVE_SEARCH_CONNECTORS if not connectors_to_search: - # Search all available connectors if none specified - return ( + base = ( list(available_connectors) if available_connectors else list(_ALL_CONNECTORS) ) + return [c for c in base if c not in 
_LIVE_SEARCH_CONNECTORS] normalized: list[str] = [] for raw in connectors_to_search: @@ -302,15 +294,14 @@ def _normalize_connectors( out.append(c) # Fallback to all available if nothing matched - return ( - out - if out - else ( + if not out: + base = ( list(available_connectors) if available_connectors else list(_ALL_CONNECTORS) ) - ) + return [c for c in base if c not in _LIVE_SEARCH_CONNECTORS] + return out # ============================================================================= @@ -479,7 +470,6 @@ def format_documents_for_context( # a numeric chunk_id (the numeric IDs are meaningless auto-incremented counters). live_search_connectors = { "TAVILY_API", - "SEARXNG_API", "LINKUP_API", "BAIDU_SEARCH_API", } @@ -623,13 +613,11 @@ async def search_knowledge_base_async( connectors = _normalize_connectors(connectors_to_search, available_connectors) - # --- Optimization 1: skip local connectors that have zero indexed documents --- + # --- Optimization 1: skip connectors that have zero indexed documents --- if available_document_types: doc_types_set = set(available_document_types) before_count = len(connectors) - connectors = [ - c for c in connectors if c in _LIVE_SEARCH_CONNECTORS or c in doc_types_set - ] + connectors = [c for c in connectors if c in doc_types_set] skipped = before_count - len(connectors) if skipped: perf.info( @@ -664,9 +652,7 @@ async def search_knowledge_base_async( "[kb_search] degenerate query %r detected - falling back to recency browse", query, ) - local_connectors = [c for c in connectors if c not in _LIVE_SEARCH_CONNECTORS] - if not local_connectors: - local_connectors = [None] # type: ignore[list-item] + browse_connectors = connectors if connectors else [None] # type: ignore[list-item] browse_results = await asyncio.gather( *[ @@ -677,7 +663,7 @@ async def search_knowledge_base_async( start_date=resolved_start_date, end_date=resolved_end_date, ) - for c in local_connectors + for c in browse_connectors ] ) for docs in browse_results: 
@@ -702,66 +688,20 @@ async def search_knowledge_base_async( ) return result - # Specs for live-search connectors (external APIs, no local DB/embedding). - live_connector_specs: dict[str, tuple[str, bool, bool, dict[str, Any]]] = { - "TAVILY_API": ("search_tavily", False, True, {}), - "SEARXNG_API": ("search_searxng", False, True, {}), - "LINKUP_API": ("search_linkup", False, False, {"mode": "standard"}), - "BAIDU_SEARCH_API": ("search_baidu", False, True, {}), - } - # --- Optimization 2: compute the query embedding once, share across all local searches --- - precomputed_embedding: list[float] | None = None - has_local_connectors = any(c not in _LIVE_SEARCH_CONNECTORS for c in connectors) - if has_local_connectors: - from app.config import config as app_config + from app.config import config as app_config - t_embed = time.perf_counter() - precomputed_embedding = app_config.embedding_model_instance.embed(query) - perf.info( - "[kb_search] shared embedding computed in %.3fs", - time.perf_counter() - t_embed, - ) + t_embed = time.perf_counter() + precomputed_embedding = app_config.embedding_model_instance.embed(query) + perf.info( + "[kb_search] shared embedding computed in %.3fs", + time.perf_counter() - t_embed, + ) max_parallel_searches = 4 semaphore = asyncio.Semaphore(max_parallel_searches) async def _search_one_connector(connector: str) -> list[dict[str, Any]]: - is_live = connector in _LIVE_SEARCH_CONNECTORS - - if is_live: - spec = live_connector_specs.get(connector) - if spec is None: - return [] - method_name, includes_date_range, includes_top_k, extra_kwargs = spec - kwargs: dict[str, Any] = { - "user_query": query, - "search_space_id": search_space_id, - **extra_kwargs, - } - if includes_top_k: - kwargs["top_k"] = top_k - if includes_date_range: - kwargs["start_date"] = resolved_start_date - kwargs["end_date"] = resolved_end_date - - try: - t_conn = time.perf_counter() - async with semaphore, shielded_async_session() as isolated_session: - svc = 
ConnectorService(isolated_session, search_space_id) - _, chunks = await getattr(svc, method_name)(**kwargs) - perf.info( - "[kb_search] connector=%s results=%d in %.3fs", - connector, - len(chunks), - time.perf_counter() - t_conn, - ) - return chunks - except Exception as e: - perf.warning("[kb_search] connector=%s FAILED: %s", connector, e) - return [] - - # --- Optimization 3: call _combined_rrf_search directly with shared embedding --- try: t_conn = time.perf_counter() async with semaphore, shielded_async_session() as isolated_session: @@ -967,7 +907,9 @@ Focus searches on these types for best results.""" # This is what the LLM sees when deciding whether/how to use the tool dynamic_description = f"""Search the user's personal knowledge base for relevant information. -Use this tool to find documents, notes, files, web pages, and other content that may help answer the user's question. +Use this tool to find documents, notes, files, web pages, and other content the user has indexed. +This searches ONLY local/indexed data (uploaded files, Notion, Slack, browser extension captures, etc.). +For real-time web search (current events, news, live data), use the `web_search` tool instead. IMPORTANT: - Always craft specific, descriptive search queries using natural language keywords. @@ -977,9 +919,6 @@ IMPORTANT: - If the user requests a specific source type (e.g. "my notes", "Slack messages"), pass `connectors_to_search=[...]` using the enums below. - If `connectors_to_search` is omitted/empty, the system will search broadly. - Only connectors that are enabled/configured for this search space are available.{doc_types_info} -- For real-time/public web queries (e.g., current exchange rates, stock prices, breaking news, weather), - explicitly include live web connectors in `connectors_to_search`, prioritizing: - ["LINKUP_API", "TAVILY_API", "SEARXNG_API", "BAIDU_SEARCH_API"]. 
## Available connector enums for `connectors_to_search` diff --git a/surfsense_backend/app/agents/new_chat/tools/podcast.py b/surfsense_backend/app/agents/new_chat/tools/podcast.py index 8ac537f9a..248a4f450 100644 --- a/surfsense_backend/app/agents/new_chat/tools/podcast.py +++ b/surfsense_backend/app/agents/new_chat/tools/podcast.py @@ -4,60 +4,15 @@ Podcast generation tool for the SurfSense agent. This module provides a factory function for creating the generate_podcast tool that submits a Celery task for background podcast generation. The frontend polls for completion and auto-updates when the podcast is ready. - -Duplicate request prevention: -- Only one podcast can be generated at a time per search space -- Uses Redis to track active podcast tasks -- Returns a friendly message if a podcast is already being generated """ from typing import Any -import redis from langchain_core.tools import tool from sqlalchemy.ext.asyncio import AsyncSession -from app.config import config from app.db import Podcast, PodcastStatus -# Redis connection for tracking active podcast tasks -# Defaults to the Celery broker when REDIS_APP_URL is not set -REDIS_URL = config.REDIS_APP_URL -_redis_client: redis.Redis | None = None - - -def get_redis_client() -> redis.Redis: - """Get or create Redis client for podcast task tracking.""" - global _redis_client - if _redis_client is None: - _redis_client = redis.from_url(REDIS_URL, decode_responses=True) - return _redis_client - - -def _redis_key(search_space_id: int) -> str: - return f"podcast:generating:{search_space_id}" - - -def get_generating_podcast_id(search_space_id: int) -> int | None: - """Get the podcast ID currently being generated for this search space.""" - try: - client = get_redis_client() - value = client.get(_redis_key(search_space_id)) - return int(value) if value else None - except Exception: - return None - - -def set_generating_podcast(search_space_id: int, podcast_id: int) -> None: - """Mark a podcast as currently 
generating for this search space.""" - try: - client = get_redis_client() - client.setex(_redis_key(search_space_id), 1800, str(podcast_id)) - except Exception as e: - print( - f"[generate_podcast] Warning: Could not set generating podcast in Redis: {e}" - ) - def create_generate_podcast_tool( search_space_id: int, @@ -109,18 +64,6 @@ def create_generate_podcast_tool( - message: Status message (or "error" field if status is failed) """ try: - generating_podcast_id = get_generating_podcast_id(search_space_id) - if generating_podcast_id: - print( - f"[generate_podcast] Blocked duplicate request. Generating podcast: {generating_podcast_id}" - ) - return { - "status": PodcastStatus.GENERATING.value, - "podcast_id": generating_podcast_id, - "title": podcast_title, - "message": "A podcast is already being generated. Please wait for it to complete.", - } - podcast = Podcast( title=podcast_title, status=PodcastStatus.PENDING, @@ -142,8 +85,6 @@ def create_generate_podcast_tool( user_prompt=user_prompt, ) - set_generating_podcast(search_space_id, podcast.id) - print(f"[generate_podcast] Created podcast {podcast.id}, task: {task.id}") return { diff --git a/surfsense_backend/app/agents/new_chat/tools/registry.py b/surfsense_backend/app/agents/new_chat/tools/registry.py index 030cbf239..4feff7d90 100644 --- a/surfsense_backend/app/agents/new_chat/tools/registry.py +++ b/surfsense_backend/app/agents/new_chat/tools/registry.py @@ -73,6 +73,8 @@ from .shared_memory import ( create_save_shared_memory_tool, ) from .user_memory import create_recall_memory_tool, create_save_memory_tool +from .video_presentation import create_generate_video_presentation_tool +from .web_search import create_web_search_tool # ============================================================================= # Tool Definition @@ -135,6 +137,17 @@ BUILTIN_TOOLS: list[ToolDefinition] = [ ), requires=["search_space_id", "db_session", "thread_id"], ), + # Video presentation generation tool + ToolDefinition( + 
name="generate_video_presentation", + description="Generate a video presentation with slides and narration from provided content", + factory=lambda deps: create_generate_video_presentation_tool( + search_space_id=deps["search_space_id"], + db_session=deps["db_session"], + thread_id=deps["thread_id"], + ), + requires=["search_space_id", "db_session", "thread_id"], + ), # Report generation tool (inline, short-lived sessions for DB ops) # Supports internal KB search via source_strategy so the agent doesn't # need to call search_knowledge_base separately before generating. @@ -186,7 +199,16 @@ BUILTIN_TOOLS: list[ToolDefinition] = [ ), requires=[], # firecrawl_api_key is optional ), - # Note: write_todos is now provided by TodoListMiddleware from deepagents + # Web search tool — real-time web search via SearXNG + user-configured engines + ToolDefinition( + name="web_search", + description="Search the web for real-time information using configured search engines", + factory=lambda deps: create_web_search_tool( + search_space_id=deps.get("search_space_id"), + available_connectors=deps.get("available_connectors"), + ), + requires=[], + ), # Surfsense documentation search tool ToolDefinition( name="search_surfsense_docs", diff --git a/surfsense_backend/app/agents/new_chat/tools/video_presentation.py b/surfsense_backend/app/agents/new_chat/tools/video_presentation.py new file mode 100644 index 000000000..a90e08ac3 --- /dev/null +++ b/surfsense_backend/app/agents/new_chat/tools/video_presentation.py @@ -0,0 +1,87 @@ +""" +Video presentation generation tool for the SurfSense agent. + +This module provides a factory function for creating the generate_video_presentation +tool that submits a Celery task for background video presentation generation. +The frontend polls for completion and auto-updates when the presentation is ready. 
+""" + +from typing import Any + +from langchain_core.tools import tool +from sqlalchemy.ext.asyncio import AsyncSession + +from app.db import VideoPresentation, VideoPresentationStatus + + +def create_generate_video_presentation_tool( + search_space_id: int, + db_session: AsyncSession, + thread_id: int | None = None, +): + """ + Factory function to create the generate_video_presentation tool with injected dependencies. + + Pre-creates video presentation record with pending status so the ID is available + immediately for frontend polling. + """ + + @tool + async def generate_video_presentation( + source_content: str, + video_title: str = "SurfSense Presentation", + user_prompt: str | None = None, + ) -> dict[str, Any]: + """Generate a video presentation from the provided content. + + Use this tool when the user asks to create a video, presentation, slides, or slide deck. + + Args: + source_content: The text content to turn into a presentation. + video_title: Title for the presentation (default: "SurfSense Presentation") + user_prompt: Optional style/tone instructions. + """ + try: + video_pres = VideoPresentation( + title=video_title, + status=VideoPresentationStatus.PENDING, + search_space_id=search_space_id, + thread_id=thread_id, + ) + db_session.add(video_pres) + await db_session.commit() + await db_session.refresh(video_pres) + + from app.tasks.celery_tasks.video_presentation_tasks import ( + generate_video_presentation_task, + ) + + task = generate_video_presentation_task.delay( + video_presentation_id=video_pres.id, + source_content=source_content, + search_space_id=search_space_id, + user_prompt=user_prompt, + ) + + print( + f"[generate_video_presentation] Created video presentation {video_pres.id}, task: {task.id}" + ) + + return { + "status": VideoPresentationStatus.PENDING.value, + "video_presentation_id": video_pres.id, + "title": video_title, + "message": "Video presentation generation started. 
This may take a few minutes.", + } + + except Exception as e: + error_message = str(e) + print(f"[generate_video_presentation] Error: {error_message}") + return { + "status": VideoPresentationStatus.FAILED.value, + "error": error_message, + "title": video_title, + "video_presentation_id": None, + } + + return generate_video_presentation diff --git a/surfsense_backend/app/agents/new_chat/tools/web_search.py b/surfsense_backend/app/agents/new_chat/tools/web_search.py new file mode 100644 index 000000000..c67db541c --- /dev/null +++ b/surfsense_backend/app/agents/new_chat/tools/web_search.py @@ -0,0 +1,247 @@ +""" +Web search tool for the SurfSense agent. + +Provides a unified tool for real-time web searches that dispatches to all +configured search engines: the platform SearXNG instance (always available) +plus any user-configured live-search connectors (Tavily, Linkup, Baidu). +""" + +import asyncio +import json +import time +from typing import Any + +from langchain_core.tools import StructuredTool +from pydantic import BaseModel, Field + +from app.db import shielded_async_session +from app.services.connector_service import ConnectorService +from app.utils.perf import get_perf_logger + +_LIVE_SEARCH_CONNECTORS: set[str] = { + "TAVILY_API", + "LINKUP_API", + "BAIDU_SEARCH_API", +} + +_LIVE_CONNECTOR_SPECS: dict[str, tuple[str, bool, bool, dict[str, Any]]] = { + "TAVILY_API": ("search_tavily", False, True, {}), + "LINKUP_API": ("search_linkup", False, False, {"mode": "standard"}), + "BAIDU_SEARCH_API": ("search_baidu", False, True, {}), +} + +_CONNECTOR_LABELS: dict[str, str] = { + "TAVILY_API": "Tavily", + "LINKUP_API": "Linkup", + "BAIDU_SEARCH_API": "Baidu", +} + + +class WebSearchInput(BaseModel): + """Input schema for the web_search tool.""" + + query: str = Field( + description="The search query to look up on the web. 
Use specific, descriptive terms.", + ) + top_k: int = Field( + default=10, + description="Number of results to retrieve (default: 10, max: 50).", + ) + + +def _format_web_results( + documents: list[dict[str, Any]], + *, + max_chars: int = 50_000, +) -> str: + """Format web search results into XML suitable for the LLM context.""" + if not documents: + return "No web search results found." + + parts: list[str] = [] + total_chars = 0 + + for doc in documents: + doc_info = doc.get("document") or {} + metadata = doc_info.get("metadata") or {} + title = doc_info.get("title") or "Web Result" + url = metadata.get("url") or "" + content = (doc.get("content") or "").strip() + source = metadata.get("document_type") or doc.get("source") or "WEB_SEARCH" + if not content: + continue + + metadata_json = json.dumps(metadata, ensure_ascii=False) + doc_xml = "\n".join( + [ + "", + "", + f" {source}", + f" <![CDATA[{title}]]>", + f" ", + f" ", + "", + "", + f" ", + "", + "", + "", + ] + ) + + if total_chars + len(doc_xml) > max_chars: + parts.append("") + break + + parts.append(doc_xml) + total_chars += len(doc_xml) + + return "\n".join(parts).strip() or "No web search results found." 
+ + +async def _search_live_connector( + connector: str, + query: str, + search_space_id: int, + top_k: int, + semaphore: asyncio.Semaphore, +) -> list[dict[str, Any]]: + """Dispatch a single live-search connector (Tavily / Linkup / Baidu).""" + perf = get_perf_logger() + spec = _LIVE_CONNECTOR_SPECS.get(connector) + if spec is None: + return [] + + method_name, _includes_date_range, includes_top_k, extra_kwargs = spec + kwargs: dict[str, Any] = { + "user_query": query, + "search_space_id": search_space_id, + **extra_kwargs, + } + if includes_top_k: + kwargs["top_k"] = top_k + + try: + t0 = time.perf_counter() + async with semaphore, shielded_async_session() as session: + svc = ConnectorService(session, search_space_id) + _, chunks = await getattr(svc, method_name)(**kwargs) + perf.info( + "[web_search] connector=%s results=%d in %.3fs", + connector, + len(chunks), + time.perf_counter() - t0, + ) + return chunks + except Exception as e: + perf.warning("[web_search] connector=%s FAILED: %s", connector, e) + return [] + + +def create_web_search_tool( + search_space_id: int | None = None, + available_connectors: list[str] | None = None, +) -> StructuredTool: + """Factory for the ``web_search`` tool. + + Dispatches in parallel to the platform SearXNG instance and any + user-configured live-search connectors (Tavily, Linkup, Baidu). + """ + active_live_connectors: list[str] = [] + if available_connectors: + active_live_connectors = [ + c for c in available_connectors if c in _LIVE_SEARCH_CONNECTORS + ] + + engine_names = ["SearXNG (platform default)"] + engine_names.extend(_CONNECTOR_LABELS.get(c, c) for c in active_live_connectors) + engines_summary = ", ".join(engine_names) + + description = ( + "Search the web for real-time information. 
" + "Use this for current events, news, prices, weather, public facts, or any " + "question that requires up-to-date information from the internet.\n\n" + f"Active search engines: {engines_summary}.\n" + "All configured engines are queried in parallel and results are merged." + ) + + _search_space_id = search_space_id + _active_live = active_live_connectors + + async def _web_search_impl(query: str, top_k: int = 10) -> str: + from app.services import web_search_service + + perf = get_perf_logger() + t0 = time.perf_counter() + clamped_top_k = min(max(1, top_k), 50) + + semaphore = asyncio.Semaphore(4) + tasks: list[asyncio.Task[list[dict[str, Any]]]] = [] + + if web_search_service.is_available(): + + async def _searxng() -> list[dict[str, Any]]: + async with semaphore: + _result_obj, docs = await web_search_service.search( + query=query, + top_k=clamped_top_k, + ) + return docs + + tasks.append(asyncio.ensure_future(_searxng())) + + if _search_space_id is not None: + for connector in _active_live: + tasks.append( + asyncio.ensure_future( + _search_live_connector( + connector=connector, + query=query, + search_space_id=_search_space_id, + top_k=clamped_top_k, + semaphore=semaphore, + ) + ) + ) + + if not tasks: + return "Web search is not available — no search engines are configured." 
+ + results_lists = await asyncio.gather(*tasks, return_exceptions=True) + + all_documents: list[dict[str, Any]] = [] + for result in results_lists: + if isinstance(result, BaseException): + perf.warning("[web_search] a search engine failed: %s", result) + continue + all_documents.extend(result) + + seen_urls: set[str] = set() + deduplicated: list[dict[str, Any]] = [] + for doc in all_documents: + url = ((doc.get("document") or {}).get("metadata") or {}).get("url", "") + if url and url in seen_urls: + continue + if url: + seen_urls.add(url) + deduplicated.append(doc) + + formatted = _format_web_results(deduplicated) + + perf.info( + "[web_search] query=%r engines=%d results=%d deduped=%d chars=%d in %.3fs", + query[:60], + len(tasks), + len(all_documents), + len(deduplicated), + len(formatted), + time.perf_counter() - t0, + ) + return formatted + + return StructuredTool( + name="web_search", + description=description, + coroutine=_web_search_impl, + args_schema=WebSearchInput, + ) diff --git a/surfsense_backend/app/agents/video_presentation/__init__.py b/surfsense_backend/app/agents/video_presentation/__init__.py new file mode 100644 index 000000000..caf885218 --- /dev/null +++ b/surfsense_backend/app/agents/video_presentation/__init__.py @@ -0,0 +1,10 @@ +"""Video Presentation LangGraph Agent. + +This module defines a graph for generating video presentations +from source content, similar to the podcaster agent but producing +slide-based video presentations with TTS narration. 
+""" + +from .graph import graph + +__all__ = ["graph"] diff --git a/surfsense_backend/app/agents/video_presentation/configuration.py b/surfsense_backend/app/agents/video_presentation/configuration.py new file mode 100644 index 000000000..18724a2ab --- /dev/null +++ b/surfsense_backend/app/agents/video_presentation/configuration.py @@ -0,0 +1,25 @@ +"""Define the configurable parameters for the video presentation agent.""" + +from __future__ import annotations + +from dataclasses import dataclass, fields + +from langchain_core.runnables import RunnableConfig + + +@dataclass(kw_only=True) +class Configuration: + """The configuration for the video presentation agent.""" + + video_title: str + search_space_id: int + user_prompt: str | None = None + + @classmethod + def from_runnable_config( + cls, config: RunnableConfig | None = None + ) -> Configuration: + """Create a Configuration instance from a RunnableConfig object.""" + configurable = (config.get("configurable") or {}) if config else {} + _fields = {f.name for f in fields(cls) if f.init} + return cls(**{k: v for k, v in configurable.items() if k in _fields}) diff --git a/surfsense_backend/app/agents/video_presentation/graph.py b/surfsense_backend/app/agents/video_presentation/graph.py new file mode 100644 index 000000000..1d87bcd76 --- /dev/null +++ b/surfsense_backend/app/agents/video_presentation/graph.py @@ -0,0 +1,39 @@ +from langgraph.graph import StateGraph + +from .configuration import Configuration +from .nodes import ( + assign_slide_themes, + create_presentation_slides, + create_slide_audio, + generate_slide_scene_codes, +) +from .state import State + + +def build_graph(): + workflow = StateGraph(State, config_schema=Configuration) + + workflow.add_node("create_presentation_slides", create_presentation_slides) + workflow.add_node("create_slide_audio", create_slide_audio) + workflow.add_node("assign_slide_themes", assign_slide_themes) + workflow.add_node("generate_slide_scene_codes", 
generate_slide_scene_codes) + + # Fan-out: after slides are parsed, run audio generation and theme + # assignment in parallel (themes only need slide metadata, not audio). + workflow.add_edge("__start__", "create_presentation_slides") + workflow.add_edge("create_presentation_slides", "create_slide_audio") + workflow.add_edge("create_presentation_slides", "assign_slide_themes") + + # Fan-in: scene code generation waits for both audio and themes. + workflow.add_edge("create_slide_audio", "generate_slide_scene_codes") + workflow.add_edge("assign_slide_themes", "generate_slide_scene_codes") + + workflow.add_edge("generate_slide_scene_codes", "__end__") + + graph = workflow.compile() + graph.name = "Surfsense Video Presentation" + + return graph + + +graph = build_graph() diff --git a/surfsense_backend/app/agents/video_presentation/nodes.py b/surfsense_backend/app/agents/video_presentation/nodes.py new file mode 100644 index 000000000..1b3d71e84 --- /dev/null +++ b/surfsense_backend/app/agents/video_presentation/nodes.py @@ -0,0 +1,580 @@ +import asyncio +import contextlib +import json +import math +import os +import shutil +import uuid +from pathlib import Path +from typing import Any + +from ffmpeg.asyncio import FFmpeg +from langchain_core.messages import HumanMessage, SystemMessage +from langchain_core.runnables import RunnableConfig +from litellm import aspeech + +from app.config import config as app_config +from app.services.kokoro_tts_service import get_kokoro_tts_service +from app.services.llm_service import get_agent_llm + +from .configuration import Configuration +from .prompts import ( + DEFAULT_DURATION_IN_FRAMES, + FPS, + REFINE_SCENE_SYSTEM_PROMPT, + REMOTION_SCENE_SYSTEM_PROMPT, + THEME_PRESETS, + build_scene_generation_user_prompt, + build_theme_assignment_user_prompt, + get_slide_generation_prompt, + get_theme_assignment_system_prompt, + pick_theme_and_mode_fallback, +) +from .state import ( + PresentationSlides, + SlideAudioResult, + SlideContent, + 
SlideSceneCode, + State, +) +from .utils import get_voice_for_provider + +MAX_REFINE_ATTEMPTS = 3 + + +async def create_presentation_slides( + state: State, config: RunnableConfig +) -> dict[str, Any]: + """Parse source content into structured presentation slides using LLM.""" + + configuration = Configuration.from_runnable_config(config) + search_space_id = configuration.search_space_id + user_prompt = configuration.user_prompt + + llm = await get_agent_llm(state.db_session, search_space_id) + if not llm: + error_message = f"No LLM configured for search space {search_space_id}" + print(error_message) + raise RuntimeError(error_message) + + prompt = get_slide_generation_prompt(user_prompt) + + messages = [ + SystemMessage(content=prompt), + HumanMessage( + content=f"{state.source_content}" + ), + ] + + llm_response = await llm.ainvoke(messages) + + try: + presentation = PresentationSlides.model_validate( + json.loads(llm_response.content) + ) + except (json.JSONDecodeError, ValueError) as e: + print(f"Direct JSON parsing failed, trying fallback approach: {e!s}") + + try: + content = llm_response.content + json_start = content.find("{") + json_end = content.rfind("}") + 1 + if json_start >= 0 and json_end > json_start: + json_str = content[json_start:json_end] + parsed_data = json.loads(json_str) + presentation = PresentationSlides.model_validate(parsed_data) + print("Successfully parsed presentation slides using fallback approach") + else: + error_message = f"Could not find valid JSON in LLM response. 
Raw response: {content}" + print(error_message) + raise ValueError(error_message) + + except (json.JSONDecodeError, ValueError) as e2: + error_message = f"Error parsing LLM response (fallback also failed): {e2!s}" + print(f"Error parsing LLM response: {e2!s}") + print(f"Raw response: {llm_response.content}") + raise + + return {"slides": presentation.slides} + + +async def create_slide_audio(state: State, config: RunnableConfig) -> dict[str, Any]: + """Generate TTS audio for each slide. + + Each slide's speaker_transcripts are generated as individual TTS chunks, + then concatenated with ffmpeg (matching the POC in RemotionTets/api/tts). + """ + + session_id = str(uuid.uuid4()) + temp_dir = Path("temp_audio") + temp_dir.mkdir(exist_ok=True) + output_dir = Path("video_presentation_audio") + output_dir.mkdir(exist_ok=True) + + slides = state.slides or [] + voice = get_voice_for_provider(app_config.TTS_SERVICE, speaker_id=0) + ext = "wav" if app_config.TTS_SERVICE == "local/kokoro" else "mp3" + + async def _generate_tts_chunk(text: str, chunk_path: str) -> str: + """Generate a single TTS chunk and write it to *chunk_path*.""" + if app_config.TTS_SERVICE == "local/kokoro": + kokoro_service = await get_kokoro_tts_service(lang_code="a") + await kokoro_service.generate_speech( + text=text, + voice=voice, + speed=1.0, + output_path=chunk_path, + ) + else: + kwargs: dict[str, Any] = { + "model": app_config.TTS_SERVICE, + "api_key": app_config.TTS_SERVICE_API_KEY, + "voice": voice, + "input": text, + "max_retries": 2, + "timeout": 600, + } + if app_config.TTS_SERVICE_API_BASE: + kwargs["api_base"] = app_config.TTS_SERVICE_API_BASE + + response = await aspeech(**kwargs) + with open(chunk_path, "wb") as f: + f.write(response.content) + + return chunk_path + + async def _concat_with_ffmpeg(chunk_paths: list[str], output_file: str) -> None: + """Concatenate multiple audio chunks into one file using async ffmpeg.""" + ffmpeg = FFmpeg().option("y") + for chunk in chunk_paths: + 
ffmpeg = ffmpeg.input(chunk) + + filter_parts = [f"[{i}:0]" for i in range(len(chunk_paths))] + filter_str = ( + "".join(filter_parts) + f"concat=n={len(chunk_paths)}:v=0:a=1[outa]" + ) + ffmpeg = ffmpeg.option("filter_complex", filter_str) + ffmpeg = ffmpeg.output(output_file, map="[outa]") + await ffmpeg.execute() + + async def generate_audio_for_slide(slide: SlideContent) -> SlideAudioResult: + has_transcripts = ( + slide.speaker_transcripts and len(slide.speaker_transcripts) > 0 + ) + + if not has_transcripts: + print( + f"Slide {slide.slide_number}: no speaker_transcripts, " + f"using default duration ({DEFAULT_DURATION_IN_FRAMES} frames)" + ) + return SlideAudioResult( + slide_number=slide.slide_number, + audio_file="", + duration_seconds=DEFAULT_DURATION_IN_FRAMES / FPS, + duration_in_frames=DEFAULT_DURATION_IN_FRAMES, + ) + + output_file = str(output_dir / f"{session_id}_slide_{slide.slide_number}.{ext}") + + chunk_paths: list[str] = [] + try: + chunk_paths = [ + str( + temp_dir + / f"{session_id}_slide_{slide.slide_number}_chunk_{i}.{ext}" + ) + for i in range(len(slide.speaker_transcripts)) + ] + + for i, text in enumerate(slide.speaker_transcripts): + print( + f" Slide {slide.slide_number} chunk {i + 1}/" + f"{len(slide.speaker_transcripts)}: " + f'"{text[:60]}..."' + ) + + await asyncio.gather( + *[ + _generate_tts_chunk(text, path) + for text, path in zip( + slide.speaker_transcripts, chunk_paths, strict=False + ) + ] + ) + + if len(chunk_paths) == 1: + shutil.move(chunk_paths[0], output_file) + else: + print( + f" Concatenating {len(chunk_paths)} chunks for slide " + f"{slide.slide_number} with ffmpeg" + ) + await _concat_with_ffmpeg(chunk_paths, output_file) + + duration_seconds = await _get_audio_duration(output_file) + duration_in_frames = math.ceil(duration_seconds * FPS) + + return SlideAudioResult( + slide_number=slide.slide_number, + audio_file=output_file, + duration_seconds=duration_seconds, + duration_in_frames=max(duration_in_frames, 
DEFAULT_DURATION_IN_FRAMES), + ) + + except Exception as e: + print(f"Error generating audio for slide {slide.slide_number}: {e!s}") + raise + finally: + for p in chunk_paths: + with contextlib.suppress(OSError): + os.remove(p) + + tasks = [generate_audio_for_slide(slide) for slide in slides] + audio_results = await asyncio.gather(*tasks) + + audio_results_sorted = sorted(audio_results, key=lambda r: r.slide_number) + + print( + f"Generated audio for {len(audio_results_sorted)} slides " + f"(total duration: {sum(r.duration_seconds for r in audio_results_sorted):.1f}s)" + ) + + return {"slide_audio_results": audio_results_sorted} + + +async def _get_audio_duration(file_path: str) -> float: + """Get audio duration in seconds using ffprobe (via python-ffmpeg). + + Falls back to file-size estimation if ffprobe fails. + """ + try: + import subprocess + + proc = await asyncio.create_subprocess_exec( + "ffprobe", + "-v", + "error", + "-show_entries", + "format=duration", + "-of", + "default=noprint_wrappers=1:nokey=1", + file_path, + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + ) + stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=10) + if proc.returncode == 0 and stdout.strip(): + return float(stdout.strip()) + except Exception as e: + print(f"ffprobe failed for {file_path}: {e!s}, using file-size estimation") + + try: + file_size = os.path.getsize(file_path) + if file_path.endswith(".wav"): + return file_size / (16000 * 2) + else: + return file_size / 16000 + except Exception: + return DEFAULT_DURATION_IN_FRAMES / FPS + + +async def _assign_themes_with_llm( + llm, slides: list[SlideContent] +) -> dict[int, tuple[str, str]]: + """Ask the LLM to assign a theme+mode to each slide in one call. + + Returns a dict mapping slide_number → (theme, mode). + Falls back to round-robin if the LLM response can't be parsed. 
+ """ + total = len(slides) + slide_summaries = [ + { + "slide_number": s.slide_number, + "title": s.title, + "subtitle": s.subtitle or "", + "background_explanation": s.background_explanation or "", + } + for s in slides + ] + + system = get_theme_assignment_system_prompt() + user = build_theme_assignment_user_prompt(slide_summaries) + + try: + response = await llm.ainvoke( + [ + SystemMessage(content=system), + HumanMessage(content=user), + ] + ) + + text = response.content.strip() + if text.startswith("```"): + lines = text.split("\n") + text = "\n".join( + line for line in lines if not line.strip().startswith("```") + ).strip() + + assignments = json.loads(text) + valid_themes = set(THEME_PRESETS) + result: dict[int, tuple[str, str]] = {} + for entry in assignments: + sn = entry.get("slide_number") + theme = entry.get("theme", "").upper() + mode = entry.get("mode", "dark").lower() + if sn and theme in valid_themes and mode in ("dark", "light"): + result[sn] = (theme, mode) + + if len(result) == total: + print( + "LLM theme assignment: " + + ", ".join(f"S{sn}={t}/{m}" for sn, (t, m) in sorted(result.items())) + ) + return result + + print( + f"LLM returned {len(result)}/{total} valid assignments, " + "filling gaps with fallback" + ) + for s in slides: + if s.slide_number not in result: + result[s.slide_number] = pick_theme_and_mode_fallback( + s.slide_number - 1, total + ) + return result + + except Exception as e: + print(f"LLM theme assignment failed ({e!s}), using fallback") + return { + s.slide_number: pick_theme_and_mode_fallback(s.slide_number - 1, total) + for s in slides + } + + +async def assign_slide_themes(state: State, config: RunnableConfig) -> dict[str, Any]: + """Assign a theme preset + dark/light mode to every slide via a single LLM call. + + Runs in parallel with audio generation since it only needs slide metadata. 
+ """ + configuration = Configuration.from_runnable_config(config) + search_space_id = configuration.search_space_id + + llm = await get_agent_llm(state.db_session, search_space_id) + if not llm: + raise RuntimeError(f"No LLM configured for search space {search_space_id}") + + slides = state.slides or [] + assignments = await _assign_themes_with_llm(llm, slides) + return {"slide_theme_assignments": assignments} + + +async def generate_slide_scene_codes( + state: State, config: RunnableConfig +) -> dict[str, Any]: + """Generate Remotion component code for each slide using LLM. + + Reads pre-assigned themes from state (produced by the parallel + assign_slide_themes node) and generates scene code concurrently. + """ + + configuration = Configuration.from_runnable_config(config) + search_space_id = configuration.search_space_id + + llm = await get_agent_llm(state.db_session, search_space_id) + if not llm: + raise RuntimeError(f"No LLM configured for search space {search_space_id}") + + slides = state.slides or [] + audio_results = state.slide_audio_results or [] + + audio_map: dict[int, SlideAudioResult] = {r.slide_number: r for r in audio_results} + total_slides = len(slides) + + theme_assignments = state.slide_theme_assignments or {} + + async def _generate_scene_for_slide(slide: SlideContent) -> SlideSceneCode: + audio = audio_map.get(slide.slide_number) + duration = audio.duration_in_frames if audio else DEFAULT_DURATION_IN_FRAMES + + theme, mode = theme_assignments.get( + slide.slide_number, + pick_theme_and_mode_fallback(slide.slide_number - 1, total_slides), + ) + + user_prompt = build_scene_generation_user_prompt( + slide_number=slide.slide_number, + total_slides=total_slides, + title=slide.title, + subtitle=slide.subtitle, + content_in_markdown=slide.content_in_markdown, + background_explanation=slide.background_explanation, + duration_in_frames=duration, + theme=theme, + mode=mode, + ) + + messages = [ + SystemMessage(content=REMOTION_SCENE_SYSTEM_PROMPT), + 
HumanMessage(content=user_prompt), + ] + + print( + f"Generating scene code for slide {slide.slide_number}/{total_slides}: " + f'"{slide.title}" ({duration} frames)' + ) + + llm_response = await llm.ainvoke(messages) + code, scene_title = _extract_code_and_title(llm_response.content) + + code = await _refine_if_needed(llm, code, slide.slide_number) + + print(f"Scene code ready for slide {slide.slide_number} ({len(code)} chars)") + + return SlideSceneCode( + slide_number=slide.slide_number, + code=code, + title=scene_title or slide.title, + ) + + scene_codes = list( + await asyncio.gather(*[_generate_scene_for_slide(s) for s in slides]) + ) + + return {"slide_scene_codes": scene_codes} + + +def _extract_code_and_title(content: str) -> tuple[str, str | None]: + """Extract code and optional title from LLM response. + + The LLM may return a JSON object like the POC's structured output: + { "code": "...", "title": "..." } + Or it may return raw code (with optional markdown fences). + + Returns (code, title) where title may be None. 
+ """ + text = content.strip() + + if text.startswith("{"): + try: + parsed = json.loads(text) + if isinstance(parsed, dict) and "code" in parsed: + return parsed["code"], parsed.get("title") + except (json.JSONDecodeError, ValueError): + pass + + json_start = text.find("{") + json_end = text.rfind("}") + 1 + if json_start >= 0 and json_end > json_start: + try: + parsed = json.loads(text[json_start:json_end]) + if isinstance(parsed, dict) and "code" in parsed: + return parsed["code"], parsed.get("title") + except (json.JSONDecodeError, ValueError): + pass + + code = text + if code.startswith("```"): + lines = code.split("\n") + start = 1 + end = len(lines) + for i in range(len(lines) - 1, 0, -1): + if lines[i].strip().startswith("```"): + end = i + break + code = "\n".join(lines[start:end]).strip() + + return code, None + + +async def _refine_if_needed(llm, code: str, slide_number: int) -> str: + """Attempt basic syntax validation and auto-repair via LLM if needed. + + Raises RuntimeError if the code is still invalid after MAX_REFINE_ATTEMPTS, + matching the POC's behavior where a failed slide aborts the pipeline. + """ + error = _basic_syntax_check(code) + if error is None: + return code + + for attempt in range(1, MAX_REFINE_ATTEMPTS + 1): + print( + f"Slide {slide_number}: syntax issue (attempt {attempt}/{MAX_REFINE_ATTEMPTS}): {error}" + ) + + messages = [ + SystemMessage(content=REFINE_SCENE_SYSTEM_PROMPT), + HumanMessage( + content=( + f"Here is the broken Remotion component code:\n\n{code}\n\n" + f"Compilation error:\n{error}\n\nFix the code." + ) + ), + ] + + response = await llm.ainvoke(messages) + code, _ = _extract_code_and_title(response.content) + + error = _basic_syntax_check(code) + if error is None: + print(f"Slide {slide_number}: fixed on attempt {attempt}") + return code + + raise RuntimeError( + f"Slide {slide_number} failed to compile after {MAX_REFINE_ATTEMPTS} " + f"refine attempts. 
Last error: {error}" + ) + + +def _basic_syntax_check(code: str) -> str | None: + """Run a lightweight syntax check on the generated code. + + Full Babel-based compilation happens on the frontend. This backend check + catches the most common LLM code-generation mistakes so the refine loop + can fix them before persisting. + + Returns an error description or None if the code looks valid. + """ + if not code or not code.strip(): + return "Empty code" + + if "export" not in code and "MyComposition" not in code: + return "Missing exported component (expected 'export const MyComposition')" + + brace_count = 0 + paren_count = 0 + bracket_count = 0 + for ch in code: + if ch == "{": + brace_count += 1 + elif ch == "}": + brace_count -= 1 + elif ch == "(": + paren_count += 1 + elif ch == ")": + paren_count -= 1 + elif ch == "[": + bracket_count += 1 + elif ch == "]": + bracket_count -= 1 + + if brace_count < 0: + return "Unmatched closing brace '}'" + if paren_count < 0: + return "Unmatched closing parenthesis ')'" + if bracket_count < 0: + return "Unmatched closing bracket ']'" + + if brace_count != 0: + return f"Unbalanced braces: {brace_count} unclosed" + if paren_count != 0: + return f"Unbalanced parentheses: {paren_count} unclosed" + if bracket_count != 0: + return f"Unbalanced brackets: {bracket_count} unclosed" + + if "useCurrentFrame" not in code: + return "Missing useCurrentFrame() — required for Remotion animations" + + if "AbsoluteFill" not in code: + return "Missing AbsoluteFill — required as the root layout component" + + return None diff --git a/surfsense_backend/app/agents/video_presentation/prompts.py b/surfsense_backend/app/agents/video_presentation/prompts.py new file mode 100644 index 000000000..5533bb01c --- /dev/null +++ b/surfsense_backend/app/agents/video_presentation/prompts.py @@ -0,0 +1,509 @@ +import datetime + +# TODO: move these to config file +MAX_SLIDES = 5 +FPS = 30 +DEFAULT_DURATION_IN_FRAMES = 300 + +THEME_PRESETS = [ + "TERRA", + "OCEAN", 
+ "SUNSET", + "EMERALD", + "ECLIPSE", + "ROSE", + "FROST", + "NEBULA", + "AURORA", + "CORAL", + "MIDNIGHT", + "AMBER", + "LAVENDER", + "STEEL", + "CITRUS", + "CHERRY", +] + +THEME_DESCRIPTIONS: dict[str, str] = { + "TERRA": "Warm earthy tones — terracotta, olive. Heritage, tradition, organic warmth.", + "OCEAN": "Cool oceanic depth — teal, coral accents. Calm, marine, fluid elegance.", + "SUNSET": "Vibrant warm energy — orange, purple. Passion, creativity, bold expression.", + "EMERALD": "Fresh natural life — green, mint. Growth, health, sustainability.", + "ECLIPSE": "Dramatic luxury — black, gold. Premium, power, prestige.", + "ROSE": "Soft elegance — dusty pink, mauve. Beauty, care, refined femininity.", + "FROST": "Crisp clarity — ice blue, silver. Tech, data, precision analytics.", + "NEBULA": "Cosmic mystery — magenta, deep purple. AI, innovation, cutting-edge future.", + "AURORA": "Ethereal northern lights — green-teal, violet. Mystical, transformative, wonder.", + "CORAL": "Tropical warmth — coral, turquoise. Inviting, lively, community.", + "MIDNIGHT": "Deep sophistication — navy, silver. Contemplative, trust, authority.", + "AMBER": "Rich honey warmth — amber, brown. Comfort, wisdom, organic richness.", + "LAVENDER": "Gentle dreaminess — purple, lilac. Calm, imaginative, serene.", + "STEEL": "Industrial strength — gray, steel blue. Modern professional, reliability.", + "CITRUS": "Bright optimism — yellow, lime. Energy, joy, fresh starts.", + "CHERRY": "Bold impact — deep red, dark. Power, urgency, passionate conviction.", +} + + +# --------------------------------------------------------------------------- +# LLM-based theme assignment (replaces keyword-based pick_theme_and_mode) +# --------------------------------------------------------------------------- + +THEME_ASSIGNMENT_SYSTEM_PROMPT = """You are a visual design director assigning color themes to presentation slides. 
+Given a list of slides, assign each slide a theme preset and color mode (dark or light). + +Available themes (name — description): +{theme_list} + +Rules: +1. Pick the theme that best matches each slide's mood, content, and visual direction. +2. Maximize visual variety — avoid repeating the same theme on consecutive slides. +3. Mix dark and light modes across the presentation for contrast and rhythm. +4. Opening slides often benefit from a bold dark theme; closing/summary slides can go either way. +5. The "background_explanation" field is the primary signal — it describes the intended mood and color direction. + +Return ONLY a JSON array (no markdown fences, no explanation): +[ + {{"slide_number": 1, "theme": "THEME_NAME", "mode": "dark"}}, + {{"slide_number": 2, "theme": "THEME_NAME", "mode": "light"}} +] +""".strip() + + +def build_theme_assignment_user_prompt( + slides: list[dict[str, str]], +) -> str: + """Build the user prompt for LLM theme assignment. + + *slides* is a list of dicts with keys: slide_number, title, subtitle, + background_explanation (mood). 
+ """ + lines = ["Assign a theme and mode to each of these slides:", ""] + for s in slides: + lines.append( + f'Slide {s["slide_number"]}: "{s["title"]}" ' + f'(subtitle: "{s.get("subtitle", "")}") — ' + f'Mood: "{s.get("background_explanation", "neutral")}"' + ) + return "\n".join(lines) + + +def get_theme_assignment_system_prompt() -> str: + """Return the theme assignment system prompt with the full theme list injected.""" + theme_list = "\n".join( + f"- {name}: {desc}" for name, desc in THEME_DESCRIPTIONS.items() + ) + return THEME_ASSIGNMENT_SYSTEM_PROMPT.format(theme_list=theme_list) + + +def pick_theme_and_mode_fallback( + slide_index: int, total_slides: int +) -> tuple[str, str]: + """Simple round-robin fallback when LLM theme assignment fails.""" + theme = THEME_PRESETS[slide_index % len(THEME_PRESETS)] + mode = "dark" if slide_index % 2 == 0 else "light" + if total_slides == 1: + mode = "dark" + return theme, mode + + +def get_slide_generation_prompt(user_prompt: str | None = None) -> str: + return f""" +Today's date: {datetime.datetime.now().strftime("%Y-%m-%d")} + +You are a content-to-slides converter. You receive raw source content (articles, notes, transcripts, +product descriptions, chat conversations, etc.) and break it into a sequence of presentation slides +for a video presentation with voiceover narration. + +{ + f''' +You **MUST** strictly adhere to the following user instruction while generating the slides: + +{user_prompt} + +''' + if user_prompt + else "" + } + + +- '': A block of text containing the information to be presented. This could be + research findings, an article summary, a detailed outline, user chat history, or any relevant + raw information. The content serves as the factual basis for the video presentation. 
+ + + +A JSON object containing the presentation slides: +{{ + "slides": [ + {{ + "slide_number": 1, + "title": "Concise slide title", + "subtitle": "One-line subtitle or tagline", + "content_in_markdown": "## Heading\\n- Bullet point 1\\n- **Bold text**\\n- Bullet point 3", + "speaker_transcripts": [ + "First narration sentence for this slide.", + "Second narration sentence expanding on the point.", + "Third sentence wrapping up this slide." + ], + "background_explanation": "Emotional mood and color direction for this slide" + }} + ] +}} + + + +=== SLIDE COUNT === + +Dynamically decide the number of slides between 1 and {MAX_SLIDES} (inclusive). +Base your decision entirely on the content's depth, richness, and how many distinct ideas it contains. +Thin or simple content should produce fewer slides; dense or multi-faceted content may use more. +Do NOT inflate or pad slides to reach { + MAX_SLIDES + } — only use what the content genuinely warrants. +Do NOT treat {MAX_SLIDES} as a target; it is a hard ceiling, not a goal. + +=== SLIDE STRUCTURE === + +- Each slide should cover ONE distinct key idea or section. +- Keep slides focused: 2-5 bullet points of content per slide max. +- The first slide should be a title/intro slide. +- The last slide should be a summary or closing slide ONLY if there are 3+ slides. + For 1-2 slides, skip the closing slide — just cover the content. +- Do NOT create a separate closing slide if its content would just repeat earlier slides. + +=== CONTENT FIELDS === + +- Write speaker_transcripts as if a human presenter is narrating — natural, conversational, 2-4 sentences per slide. + These will be converted to TTS audio, so write in a way that sounds great when spoken aloud. 
+- background_explanation should describe a visual style matching the slide's mood: + - Describe the emotional feel: "warm and organic", "dramatic and urgent", "clean and optimistic", + "technical and precise", "celebratory", "earthy and grounded", "cosmic and futuristic" + - Mention color direction: warm tones, cool tones, earth tones, neon accents, gold/black, etc. + - Vary the mood across slides — do NOT always say "dark blue gradient". +- content_in_markdown should use proper markdown: ## headings, **bold**, - bullets, etc. + +=== NARRATION QUALITY === + +- Speaker transcripts should explain the slide content in an engaging, presenter-like voice. +- Keep narration concise: 2-4 sentences per slide (targeting ~10-15 seconds of audio per slide). +- The narration should add context beyond what's on the slide — don't just read the bullets. +- Use natural language: contractions, conversational tone, occasional enthusiasm. + + + +Input: "Quantum computing uses quantum bits or qubits which can exist in multiple states simultaneously due to superposition." + +Output: +{{ + "slides": [ + {{ + "slide_number": 1, + "title": "Quantum Computing", + "subtitle": "Beyond Classical Bits", + "content_in_markdown": "## The Quantum Leap\\n- Classical computers use **bits** (0 or 1)\\n- Quantum computers use **qubits**\\n- Qubits leverage **superposition**", + "speaker_transcripts": [ + "Let's explore quantum computing, a technology that's fundamentally different from the computers we use every day.", + "While traditional computers work with bits that are either zero or one, quantum computers use something called qubits.", + "The magic of qubits is superposition — they can exist in multiple states at the same time." + ], + "background_explanation": "Cosmic and futuristic with deep purple and magenta tones, evoking the mystery of quantum mechanics" + }} + ] +}} + + +Transform the source material into well-structured presentation slides with engaging narration. 
+Ensure each slide has a clear visual mood and natural-sounding speaker transcripts. + +""" + + +# --------------------------------------------------------------------------- +# Remotion scene code generation prompt +# Ported from RemotionTets POC /api/generate system prompt +# --------------------------------------------------------------------------- + +REMOTION_SCENE_SYSTEM_PROMPT = """ +You are a Remotion component generator that creates cinematic, modern motion graphics. +Generate a single self-contained React component that uses Remotion. + +=== THEME PRESETS (pick ONE per slide — see user prompt for which to use) === + +Each slide MUST use a DIFFERENT preset. The user prompt will tell you which preset to use. +Use ALL colors from that preset — background, surface, text, accent, glow. Do NOT mix presets. + +TERRA (warm earth — terracotta + olive): + dark: bg #1C1510 surface #261E16 border #3D3024 text #E8DDD0 muted #9A8A78 accent #C2623D secondary #7D8C52 glow rgba(194,98,61,0.12) + light: bg #F7F0E8 surface #FFF8F0 border #DDD0BF text #2C1D0E muted #8A7A68 accent #B85430 secondary #6B7A42 glow rgba(184,84,48,0.08) + gradient-dark: radial-gradient(ellipse at 30% 80%, rgba(194,98,61,0.18), transparent 60%), linear-gradient(180deg, #1C1510, #261E16) + gradient-light: radial-gradient(ellipse at 70% 20%, rgba(107,122,66,0.12), transparent 55%), linear-gradient(180deg, #F7F0E8, #FFF8F0) + +OCEAN (cool depth — teal + coral): + dark: bg #0B1A1E surface #122428 border #1E3740 text #D5EAF0 muted #6A9AA8 accent #1DB6A8 secondary #E87461 glow rgba(29,182,168,0.12) + light: bg #F0F8FA surface #FFFFFF border #C8E0E8 text #0E2830 muted #5A8A98 accent #0EA69A secondary #D05F4E glow rgba(14,166,154,0.08) + gradient-dark: radial-gradient(ellipse at 80% 30%, rgba(29,182,168,0.20), transparent 55%), radial-gradient(circle at 20% 80%, rgba(232,116,97,0.10), transparent 50%), #0B1A1E + gradient-light: radial-gradient(ellipse at 20% 40%, rgba(14,166,154,0.10), transparent 55%), 
linear-gradient(180deg, #F0F8FA, #FFFFFF) + +SUNSET (warm energy — orange + purple): + dark: bg #1E130F surface #2A1B14 border #42291C text #F0DDD0 muted #A08878 accent #E86A20 secondary #A855C0 glow rgba(232,106,32,0.12) + light: bg #FFF5ED surface #FFFFFF border #EADAC8 text #2E1508 muted #907860 accent #D05A18 secondary #9045A8 glow rgba(208,90,24,0.08) + gradient-dark: linear-gradient(135deg, rgba(232,106,32,0.15) 0%, transparent 40%), radial-gradient(circle at 80% 70%, rgba(168,85,192,0.15), transparent 50%), #1E130F + gradient-light: linear-gradient(135deg, rgba(208,90,24,0.08) 0%, rgba(144,69,168,0.06) 100%), #FFF5ED + +EMERALD (fresh life — green + mint): + dark: bg #0B1E14 surface #12281A border #1E3C28 text #D0F0E0 muted #5EA880 accent #10B981 secondary #84CC16 glow rgba(16,185,129,0.12) + light: bg #F0FAF5 surface #FFFFFF border #C0E8D0 text #0E2C18 muted #489068 accent #059669 secondary #65A30D glow rgba(5,150,105,0.08) + gradient-dark: radial-gradient(ellipse at 50% 50%, rgba(16,185,129,0.18), transparent 60%), linear-gradient(180deg, #0B1E14, #12281A) + gradient-light: radial-gradient(ellipse at 60% 30%, rgba(101,163,13,0.10), transparent 55%), linear-gradient(180deg, #F0FAF5, #FFFFFF) + +ECLIPSE (dramatic — black + gold): + dark: bg #100C05 surface #1A1508 border #2E2510 text #D4B96A muted #8A7840 accent #E8B830 secondary #C09020 glow rgba(232,184,48,0.14) + light: bg #FAF6ED surface #FFFFFF border #E0D8C0 text #1A1408 muted #7A6818 accent #C09820 secondary #A08018 glow rgba(192,152,32,0.08) + gradient-dark: radial-gradient(circle at 50% 40%, rgba(232,184,48,0.20), transparent 50%), radial-gradient(ellipse at 50% 90%, rgba(192,144,32,0.08), transparent 50%), #100C05 + gradient-light: radial-gradient(circle at 50% 40%, rgba(192,152,32,0.10), transparent 55%), linear-gradient(180deg, #FAF6ED, #FFFFFF) + +ROSE (soft elegance — dusty pink + mauve): + dark: bg #1E1018 surface #281820 border #3D2830 text #F0D8E0 muted #A08090 accent #E4508C secondary 
#B06498 glow rgba(228,80,140,0.12) + light: bg #FDF2F5 surface #FFFFFF border #F0D0D8 text #2C1018 muted #906878 accent #D43D78 secondary #9A5080 glow rgba(212,61,120,0.08) + gradient-dark: radial-gradient(ellipse at 70% 30%, rgba(228,80,140,0.18), transparent 55%), radial-gradient(circle at 20% 80%, rgba(176,100,152,0.10), transparent 50%), #1E1018 + gradient-light: radial-gradient(ellipse at 30% 60%, rgba(212,61,120,0.08), transparent 55%), linear-gradient(180deg, #FDF2F5, #FFFFFF) + +FROST (crisp clarity — ice blue + silver): + dark: bg #0A1520 surface #101D2A border #1A3040 text #D0E5F5 muted #6090B0 accent #5AB4E8 secondary #8BA8C0 glow rgba(90,180,232,0.12) + light: bg #F0F6FC surface #FFFFFF border #C8D8E8 text #0C1820 muted #5080A0 accent #3A96D0 secondary #7090A8 glow rgba(58,150,208,0.08) + gradient-dark: radial-gradient(ellipse at 40% 20%, rgba(90,180,232,0.16), transparent 55%), linear-gradient(180deg, #0A1520, #101D2A) + gradient-light: radial-gradient(ellipse at 50% 50%, rgba(58,150,208,0.08), transparent 55%), linear-gradient(180deg, #F0F6FC, #FFFFFF) + +NEBULA (cosmic — magenta + deep purple): + dark: bg #150A1E surface #1E1028 border #351A48 text #E0D0F0 muted #8060A0 accent #C850E0 secondary #8030C0 glow rgba(200,80,224,0.14) + light: bg #F8F0FF surface #FFFFFF border #E0C8F0 text #1A0A24 muted #7050A0 accent #A840C0 secondary #6820A0 glow rgba(168,64,192,0.08) + gradient-dark: radial-gradient(circle at 60% 40%, rgba(200,80,224,0.18), transparent 50%), radial-gradient(ellipse at 30% 80%, rgba(128,48,192,0.12), transparent 50%), #150A1E + gradient-light: radial-gradient(circle at 40% 30%, rgba(168,64,192,0.10), transparent 55%), linear-gradient(180deg, #F8F0FF, #FFFFFF) + +AURORA (ethereal lights — green-teal + violet): + dark: bg #0A1A1A surface #102020 border #1A3838 text #D0F0F0 muted #60A0A0 accent #30D0B0 secondary #8040D0 glow rgba(48,208,176,0.12) + light: bg #F0FAF8 surface #FFFFFF border #C0E8E0 text #0A2020 muted #508080 accent #20B090 
secondary #6830B0 glow rgba(32,176,144,0.08) + gradient-dark: radial-gradient(ellipse at 30% 70%, rgba(48,208,176,0.18), transparent 55%), radial-gradient(circle at 70% 30%, rgba(128,64,208,0.12), transparent 50%), #0A1A1A + gradient-light: radial-gradient(ellipse at 50% 40%, rgba(32,176,144,0.10), transparent 55%), linear-gradient(180deg, #F0FAF8, #FFFFFF) + +CORAL (tropical warmth — coral + turquoise): + dark: bg #1E0F0F surface #281818 border #402828 text #F0D8D8 muted #A07070 accent #F06050 secondary #30B8B0 glow rgba(240,96,80,0.12) + light: bg #FFF5F3 surface #FFFFFF border #F0D0C8 text #2E1010 muted #906060 accent #E04838 secondary #20A098 glow rgba(224,72,56,0.08) + gradient-dark: radial-gradient(ellipse at 60% 60%, rgba(240,96,80,0.18), transparent 55%), radial-gradient(circle at 30% 30%, rgba(48,184,176,0.10), transparent 50%), #1E0F0F + gradient-light: radial-gradient(ellipse at 40% 50%, rgba(224,72,56,0.08), transparent 55%), linear-gradient(180deg, #FFF5F3, #FFFFFF) + +MIDNIGHT (deep sophistication — navy + silver): + dark: bg #080C18 surface #0E1420 border #1A2438 text #C8D8F0 muted #5070A0 accent #4080E0 secondary #A0B0D0 glow rgba(64,128,224,0.12) + light: bg #F0F2F8 surface #FFFFFF border #C8D0E0 text #101828 muted #506080 accent #3060C0 secondary #8090B0 glow rgba(48,96,192,0.08) + gradient-dark: radial-gradient(ellipse at 50% 30%, rgba(64,128,224,0.16), transparent 55%), linear-gradient(180deg, #080C18, #0E1420) + gradient-light: radial-gradient(ellipse at 50% 50%, rgba(48,96,192,0.08), transparent 55%), linear-gradient(180deg, #F0F2F8, #FFFFFF) + +AMBER (rich honey warmth — amber + brown): + dark: bg #1A1208 surface #221A0E border #3A2C18 text #F0E0C0 muted #A09060 accent #E0A020 secondary #C08030 glow rgba(224,160,32,0.12) + light: bg #FFF8E8 surface #FFFFFF border #E8D8B8 text #2A1C08 muted #907840 accent #C88810 secondary #A86820 glow rgba(200,136,16,0.08) + gradient-dark: radial-gradient(ellipse at 40% 60%, rgba(224,160,32,0.18), transparent 
55%), linear-gradient(180deg, #1A1208, #221A0E) + gradient-light: radial-gradient(ellipse at 60% 40%, rgba(200,136,16,0.10), transparent 55%), linear-gradient(180deg, #FFF8E8, #FFFFFF) + +LAVENDER (gentle dreaminess — purple + lilac): + dark: bg #14101E surface #1C1628 border #302840 text #E0D8F0 muted #8070A0 accent #A060E0 secondary #C090D0 glow rgba(160,96,224,0.12) + light: bg #F8F0FF surface #FFFFFF border #E0D0F0 text #1C1028 muted #706090 accent #8848C0 secondary #A878B8 glow rgba(136,72,192,0.08) + gradient-dark: radial-gradient(ellipse at 60% 40%, rgba(160,96,224,0.18), transparent 55%), radial-gradient(circle at 30% 70%, rgba(192,144,208,0.10), transparent 50%), #14101E + gradient-light: radial-gradient(ellipse at 40% 30%, rgba(136,72,192,0.10), transparent 55%), linear-gradient(180deg, #F8F0FF, #FFFFFF) + +STEEL (industrial strength — gray + steel blue): + dark: bg #101214 surface #181C20 border #282E38 text #D0D8E0 muted #708090 accent #5088B0 secondary #90A0B0 glow rgba(80,136,176,0.12) + light: bg #F2F4F6 surface #FFFFFF border #D0D8E0 text #181C24 muted #607080 accent #3870A0 secondary #708898 glow rgba(56,112,160,0.08) + gradient-dark: radial-gradient(ellipse at 50% 50%, rgba(80,136,176,0.14), transparent 55%), linear-gradient(180deg, #101214, #181C20) + gradient-light: radial-gradient(ellipse at 50% 40%, rgba(56,112,160,0.08), transparent 55%), linear-gradient(180deg, #F2F4F6, #FFFFFF) + +CITRUS (bright optimism — yellow + lime): + dark: bg #181808 surface #202010 border #383818 text #F0F0C0 muted #A0A060 accent #E8D020 secondary #90D030 glow rgba(232,208,32,0.12) + light: bg #FFFFF0 surface #FFFFFF border #E8E8C0 text #282808 muted #808040 accent #C8B010 secondary #70B020 glow rgba(200,176,16,0.08) + gradient-dark: radial-gradient(ellipse at 40% 40%, rgba(232,208,32,0.18), transparent 55%), radial-gradient(circle at 70% 70%, rgba(144,208,48,0.10), transparent 50%), #181808 + gradient-light: radial-gradient(ellipse at 50% 30%, 
rgba(200,176,16,0.10), transparent 55%), linear-gradient(180deg, #FFFFF0, #FFFFFF) + +CHERRY (bold impact — deep red + dark): + dark: bg #1A0808 surface #241010 border #401818 text #F0D0D0 muted #A06060 accent #D02030 secondary #E05060 glow rgba(208,32,48,0.14) + light: bg #FFF0F0 surface #FFFFFF border #F0C8C8 text #280808 muted #904848 accent #B01828 secondary #C83848 glow rgba(176,24,40,0.08) + gradient-dark: radial-gradient(ellipse at 50% 40%, rgba(208,32,48,0.20), transparent 50%), linear-gradient(180deg, #1A0808, #241010) + gradient-light: radial-gradient(ellipse at 50% 50%, rgba(176,24,40,0.10), transparent 55%), linear-gradient(180deg, #FFF0F0, #FFFFFF) + +=== SHARED TOKENS (use with any theme above) === + +SPACING: xs 8px, sm 16px, md 24px, lg 32px, xl 48px, 2xl 64px, 3xl 96px, 4xl 128px +TYPOGRAPHY: fontFamily "Inter, system-ui, -apple-system, sans-serif" + caption 14px/1.4, body 18px/1.6, subhead 24px/1.4, title 40px/1.2 w600, headline 64px/1.1 w700, display 96px/1.0 w800 + letterSpacing: tight "-0.02em", normal "0", wide "0.05em" +BORDER RADIUS: 12px (cards), 8px (buttons), 9999px (pills) + +=== VISUAL VARIETY (CRITICAL) === + +The user prompt assigns each slide a specific theme preset AND mode (dark/light). +You MUST use EXACTLY the assigned preset and mode. Additionally: + +1. Use the preset's gradient as the AbsoluteFill background. +2. Use the preset's accent/secondary colors for highlights, pill badges, and card accents. +3. Use the preset's glow value for all boxShadow effects. +4. LAYOUT VARIATION: Vary layout between slides: + - One slide: bold centered headline + subtle stat + - Another: two-column card layout + - Another: single large number or quote as hero + Do NOT use the same layout pattern for every slide. + +=== LAYOUT RULES (CRITICAL — elements must NEVER overlap) === + +The canvas is 1920x1080. You MUST use a SINGLE-LAYER layout. NO stacking, NO multiple AbsoluteFill layers. 
+ +STRUCTURE — every component must follow this exact pattern: + + {/* ALL content goes here as direct children in normal flow */} + + +ABSOLUTE RULES: +- Use exactly ONE AbsoluteFill as the root. Set its background color/gradient via its style prop. +- NEVER nest AbsoluteFill inside AbsoluteFill. +- NEVER use position "absolute" or position "fixed" on ANY element. +- NEVER use multiple layers or z-index. +- ALL elements must be in normal document flow inside the single root AbsoluteFill. + +SPACING: +- Root padding: 80px on all sides (safe area). +- Use flexDirection "column" with gap for vertical stacking, flexDirection "row" with gap for horizontal. +- Minimum gap between elements: 24px vertical, 32px horizontal. +- Text hierarchy gaps: headline→subheading 16px, subheading→body 12px, body→button 32px. +- Cards/panels: padding 32px-48px, borderRadius 12px. +- NEVER use margin to space siblings — always use the parent's gap property. + +=== DESIGN STYLE === + +- Premium aesthetic — use the exact colors from the assigned theme preset (do NOT invent your own) +- Background: use the preset's gradient-dark or gradient-light value directly as the AbsoluteFill's background +- Card/surface backgrounds: use the preset's surface color +- Text colors: use the preset's text, muted values +- Borders: use the preset's border color +- Glows: use the preset's glow value for all boxShadow — do NOT substitute other colors +- Generous whitespace — less is more, let elements breathe +- NO decorative background shapes, blurs, or overlapping ornaments + +=== REMOTION RULES === + +- Export the component as: export const MyComposition = () => { ... } +- Use useCurrentFrame() and useVideoConfig() from "remotion" +- Do NOT use Sequence +- Do NOT manually calculate animation timings or frame offsets + +=== ANIMATION (use the stagger() helper for ALL element animations) === + +A pre-built helper function called stagger() is available globally. 
+It handles enter, hold, and exit phases automatically — you MUST use it. + +Signature: + stagger(frame, fps, index, total) → { opacity: number, transform: string } + +Parameters: + frame — from useCurrentFrame() + fps — from useVideoConfig() + index — 0-based index of this element in the entrance order + total — total number of animated elements in the scene + +It returns a style object with opacity and transform that you spread onto the element. +Timing is handled for you: staggered spring entrances, ambient hold motion, and a graceful exit. + +Usage pattern: + const frame = useCurrentFrame(); + const { fps } = useVideoConfig(); + +
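<div style={stagger(frame, fps, 0, 4)}>Headline</div>
+  <div style={stagger(frame, fps, 1, 4)}>Subtitle</div>
+  <div style={stagger(frame, fps, 2, 4)}>Card</div>
+  <div style={stagger(frame, fps, 3, 4)}>Footer</div>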
+ +Rules: +- Count ALL animated elements in your scene and pass that count as the "total" parameter. +- Assign each element a sequential index starting from 0. +- You can merge stagger's return with additional styles: +
+- For non-animated static elements (backgrounds, borders), just use normal styles without stagger. +- You may still use spring() and interpolate() for EXTRA custom effects (e.g., a number counter, + color shift, or typewriter effect), but stagger() must drive all entrance/exit animations. + +=== AVAILABLE GLOBALS (injected at runtime, do NOT import anything else) === + +- React (available globally) +- AbsoluteFill, useCurrentFrame, useVideoConfig, spring, interpolate, Easing from "remotion" +- stagger(frame, fps, index, total) — animation helper described above + +=== CODE RULES === + +- Output ONLY the raw code, no markdown fences, no explanations +- Keep it fully self-contained, no external dependencies or images +- Use inline styles only (no CSS imports, no className) +- Target 1920x1080 resolution +- Every container must use display "flex" with explicit gap values +- NEVER use marginTop/marginBottom to space siblings — use the parent's gap instead +""".strip() + + +def build_scene_generation_user_prompt( + slide_number: int, + total_slides: int, + title: str, + subtitle: str, + content_in_markdown: str, + background_explanation: str, + duration_in_frames: int, + theme: str, + mode: str, +) -> str: + """Build the user prompt for generating a single slide's Remotion scene code. + + *theme* and *mode* are pre-assigned (by LLM or fallback) before this is called. 
+ """ + return "\n".join( + [ + "Create a cinematic, visually striking Remotion scene.", + f"The video is {duration_in_frames} frames at {FPS}fps ({duration_in_frames / FPS:.1f}s total).", + "", + f"This is slide {slide_number} of {total_slides} in the video.", + "", + f"=== ASSIGNED THEME: {theme} / {mode.upper()} mode ===", + f"You MUST use the {theme} preset in {mode} mode from the theme presets above.", + f"Use its exact background gradient (gradient-{mode}), surface, text, accent, secondary, border, and glow colors.", + "Do NOT substitute, invent, or default to blue/violet colors.", + "", + f'The scene should communicate this message: "{title} — {subtitle}"', + "", + "Key ideas to convey (use as creative inspiration, NOT literal text to dump on screen):", + content_in_markdown, + "", + "Pick only the 1-2 most impactful phrases or numbers to display as text.", + "", + f"Mood & tone: {background_explanation}", + ] + ) + + +REFINE_SCENE_SYSTEM_PROMPT = """ +You are a code repair assistant. You will receive a Remotion React component that failed to compile, +along with the exact error message from the Babel transpiler. + +Your job is to fix the code so it compiles and runs correctly. + +RULES: +- Output ONLY the fixed raw code as a string — no markdown fences, no explanations. +- Preserve the original intent, design, and animations as closely as possible. +- The component must be exported as: export const MyComposition = () => { ... } +- Only these globals are available at runtime (they are injected, not actually imported): + React, AbsoluteFill, useCurrentFrame, useVideoConfig, spring, interpolate, Easing, + stagger (a helper: stagger(frame, fps, index, total) → { opacity, transform }) +- Keep import statements at the top (they get stripped by the compiler) but do NOT import anything + other than "react" and "remotion". +- Use inline styles only (no CSS, no className). +- Common fixes: + - Mismatched braces/brackets in JSX style objects (e.g. 
}}, instead of }}>) + - Missing closing tags + - Trailing commas before > in JSX + - Undefined variables or typos + - Invalid JSX expressions +- After fixing, mentally walk through every brace pair { } and JSX tag to verify they match. +""".strip() diff --git a/surfsense_backend/app/agents/video_presentation/state.py b/surfsense_backend/app/agents/video_presentation/state.py new file mode 100644 index 000000000..adfedec48 --- /dev/null +++ b/surfsense_backend/app/agents/video_presentation/state.py @@ -0,0 +1,73 @@ +"""Define the state structures for the video presentation agent.""" + +from __future__ import annotations + +from dataclasses import dataclass + +from pydantic import BaseModel, Field +from sqlalchemy.ext.asyncio import AsyncSession + + +class SlideContent(BaseModel): + """Represents a single parsed slide from content analysis.""" + + slide_number: int = Field(..., description="1-based slide number") + title: str = Field(..., description="Concise slide title") + subtitle: str = Field(..., description="One-line subtitle or tagline") + content_in_markdown: str = Field( + ..., description="Slide body content formatted as markdown" + ) + speaker_transcripts: list[str] = Field( + ..., + description="2-4 short sentences a presenter would say while this slide is shown", + ) + background_explanation: str = Field( + ..., + description="Emotional mood and color direction for this slide", + ) + + +class PresentationSlides(BaseModel): + """Represents the full set of parsed slides from the LLM.""" + + slides: list[SlideContent] = Field( + ..., description="Ordered array of presentation slides" + ) + + +class SlideAudioResult(BaseModel): + """Audio generation result for a single slide.""" + + slide_number: int + audio_file: str = Field(..., description="Path to the per-slide audio file") + duration_seconds: float = Field(..., description="Audio duration in seconds") + duration_in_frames: int = Field( + ..., description="Audio duration in frames (at 30fps)" + ) + + 
+class SlideSceneCode(BaseModel): + """Generated Remotion component code for a single slide.""" + + slide_number: int + code: str = Field( + ..., description="Raw Remotion React component source code for this slide" + ) + title: str = Field(..., description="Short title for the composition") + + +@dataclass +class State: + """State for the video presentation agent graph. + + Pipeline: parse slides → (TTS audio ∥ theme assignment) → generate Remotion code + The frontend receives the slides + code + audio and handles compilation/rendering. + """ + + db_session: AsyncSession + source_content: str + + slides: list[SlideContent] | None = None + slide_audio_results: list[SlideAudioResult] | None = None + slide_theme_assignments: dict[int, tuple[str, str]] | None = None + slide_scene_codes: list[SlideSceneCode] | None = None diff --git a/surfsense_backend/app/agents/video_presentation/utils.py b/surfsense_backend/app/agents/video_presentation/utils.py new file mode 100644 index 000000000..58909e104 --- /dev/null +++ b/surfsense_backend/app/agents/video_presentation/utils.py @@ -0,0 +1,30 @@ +def get_voice_for_provider(provider: str, speaker_id: int = 0) -> dict | str: + """ + Get the appropriate voice configuration based on the TTS provider. + + Currently single-speaker only (speaker_id=0). Multi-speaker support + will be added in a future iteration. 
+ + Args: + provider: The TTS provider (e.g., "openai/tts-1", "vertex_ai/test") + speaker_id: The ID of the speaker (default 0, single speaker for now) + + Returns: + Voice configuration - string for OpenAI, dict for Vertex AI + """ + if provider == "local/kokoro": + return "af_heart" + + provider_type = ( + provider.split("/")[0].lower() if "/" in provider else provider.lower() + ) + + voices = { + "openai": "alloy", + "vertex_ai": { + "languageCode": "en-US", + "name": "en-US-Studio-O", + }, + "azure": "alloy", + } + return voices.get(provider_type, {}) diff --git a/surfsense_backend/app/app.py b/surfsense_backend/app/app.py index e6db5670e..bba2f1f3a 100644 --- a/surfsense_backend/app/app.py +++ b/surfsense_backend/app/app.py @@ -340,20 +340,17 @@ if config.NEXT_FRONTEND_URL: if www_url not in allowed_origins: allowed_origins.append(www_url) -# For local development, also allow common localhost origins -if not config.BACKEND_URL or ( - config.NEXT_FRONTEND_URL and "localhost" in config.NEXT_FRONTEND_URL -): - allowed_origins.extend( - [ - "http://localhost:3000", - "http://127.0.0.1:3000", - ] - ) +allowed_origins.extend( + [ # For local development and desktop app + "http://localhost:3000", + "http://127.0.0.1:3000", + ] +) app.add_middleware( CORSMiddleware, allow_origins=allowed_origins, + allow_origin_regex=r"^https?://(localhost|127\.0\.0\.1)(:\d+)?$", allow_credentials=True, allow_methods=["*"], # Allows all methods allow_headers=["*"], # Allows all headers diff --git a/surfsense_backend/app/celery_app.py b/surfsense_backend/app/celery_app.py index 62414775a..69e117747 100644 --- a/surfsense_backend/app/celery_app.py +++ b/surfsense_backend/app/celery_app.py @@ -77,6 +77,7 @@ celery_app = Celery( include=[ "app.tasks.celery_tasks.document_tasks", "app.tasks.celery_tasks.podcast_tasks", + "app.tasks.celery_tasks.video_presentation_tasks", "app.tasks.celery_tasks.connector_tasks", "app.tasks.celery_tasks.schedule_checker_task", 
"app.tasks.celery_tasks.document_reindex_tasks", diff --git a/surfsense_backend/app/config/__init__.py b/surfsense_backend/app/config/__init__.py index aaf77a54f..186936325 100644 --- a/surfsense_backend/app/config/__init__.py +++ b/surfsense_backend/app/config/__init__.py @@ -224,6 +224,9 @@ class Config: os.getenv("CONNECTOR_INDEXING_LOCK_TTL_SECONDS", str(8 * 60 * 60)) ) + # Platform web search (SearXNG) + SEARXNG_DEFAULT_HOST = os.getenv("SEARXNG_DEFAULT_HOST") + NEXT_FRONTEND_URL = os.getenv("NEXT_FRONTEND_URL") # Backend URL to override the http to https in the OAuth redirect URI BACKEND_URL = os.getenv("BACKEND_URL") diff --git a/surfsense_backend/app/config/global_llm_config.example.yaml b/surfsense_backend/app/config/global_llm_config.example.yaml index 0bb00c398..6ca3e95e3 100644 --- a/surfsense_backend/app/config/global_llm_config.example.yaml +++ b/surfsense_backend/app/config/global_llm_config.example.yaml @@ -183,6 +183,23 @@ global_llm_configs: use_default_system_instructions: true citations_enabled: true + # Example: MiniMax M2.5 - High-performance with 204K context window + - id: -8 + name: "Global MiniMax M2.5" + description: "MiniMax M2.5 with 204K context window and competitive pricing" + provider: "MINIMAX" + model_name: "MiniMax-M2.5" + api_key: "your-minimax-api-key-here" + api_base: "https://api.minimax.io/v1" + rpm: 60 + tpm: 100000 + litellm_params: + temperature: 1.0 # MiniMax requires temperature in (0.0, 1.0], cannot be 0 + max_tokens: 4000 + system_instructions: "" + use_default_system_instructions: true + citations_enabled: true + # ============================================================================= # Image Generation Configuration # ============================================================================= diff --git a/surfsense_backend/app/connectors/composio_gmail_connector.py b/surfsense_backend/app/connectors/composio_gmail_connector.py index 2a382f3b8..e675085db 100644 --- 
a/surfsense_backend/app/connectors/composio_gmail_connector.py +++ b/surfsense_backend/app/connectors/composio_gmail_connector.py @@ -463,7 +463,7 @@ async def _process_gmail_messages_phase2( "connector_id": connector_id, "source": "composio", } - safe_set_chunks(document, chunks) + await safe_set_chunks(session, document, chunks) document.updated_at = get_current_timestamp() document.status = DocumentStatus.ready() diff --git a/surfsense_backend/app/connectors/composio_google_calendar_connector.py b/surfsense_backend/app/connectors/composio_google_calendar_connector.py index 63bade873..6344f9f38 100644 --- a/surfsense_backend/app/connectors/composio_google_calendar_connector.py +++ b/surfsense_backend/app/connectors/composio_google_calendar_connector.py @@ -477,7 +477,7 @@ async def index_composio_google_calendar( "connector_id": connector_id, "source": "composio", } - safe_set_chunks(document, chunks) + await safe_set_chunks(session, document, chunks) document.updated_at = get_current_timestamp() document.status = DocumentStatus.ready() diff --git a/surfsense_backend/app/connectors/composio_google_drive_connector.py b/surfsense_backend/app/connectors/composio_google_drive_connector.py index c10edb7e9..30ce4a77b 100644 --- a/surfsense_backend/app/connectors/composio_google_drive_connector.py +++ b/surfsense_backend/app/connectors/composio_google_drive_connector.py @@ -1112,7 +1112,7 @@ async def _index_composio_drive_delta_sync( "connector_id": connector_id, "source": "composio", } - safe_set_chunks(document, chunks) + await safe_set_chunks(session, document, chunks) document.updated_at = get_current_timestamp() document.status = DocumentStatus.ready() @@ -1520,7 +1520,7 @@ async def _index_composio_drive_full_scan( "connector_id": connector_id, "source": "composio", } - safe_set_chunks(document, chunks) + await safe_set_chunks(session, document, chunks) document.updated_at = get_current_timestamp() document.status = DocumentStatus.ready() diff --git 
a/surfsense_backend/app/db.py b/surfsense_backend/app/db.py index 062b11b3a..2ce48c16d 100644 --- a/surfsense_backend/app/db.py +++ b/surfsense_backend/app/db.py @@ -103,6 +103,13 @@ class PodcastStatus(StrEnum): FAILED = "failed" +class VideoPresentationStatus(StrEnum): + PENDING = "pending" + GENERATING = "generating" + READY = "ready" + FAILED = "failed" + + class DocumentStatus: """ Helper class for document processing status (stored as JSONB). @@ -215,6 +222,7 @@ class LiteLLMProvider(StrEnum): COMETAPI = "COMETAPI" HUGGINGFACE = "HUGGINGFACE" GITHUB_MODELS = "GITHUB_MODELS" + MINIMAX = "MINIMAX" CUSTOM = "CUSTOM" @@ -336,6 +344,12 @@ class Permission(StrEnum): PODCASTS_UPDATE = "podcasts:update" PODCASTS_DELETE = "podcasts:delete" + # Video Presentations + VIDEO_PRESENTATIONS_CREATE = "video_presentations:create" + VIDEO_PRESENTATIONS_READ = "video_presentations:read" + VIDEO_PRESENTATIONS_UPDATE = "video_presentations:update" + VIDEO_PRESENTATIONS_DELETE = "video_presentations:delete" + # Image Generations IMAGE_GENERATIONS_CREATE = "image_generations:create" IMAGE_GENERATIONS_READ = "image_generations:read" @@ -402,6 +416,10 @@ DEFAULT_ROLE_PERMISSIONS = { Permission.PODCASTS_CREATE.value, Permission.PODCASTS_READ.value, Permission.PODCASTS_UPDATE.value, + # Video Presentations (no delete) + Permission.VIDEO_PRESENTATIONS_CREATE.value, + Permission.VIDEO_PRESENTATIONS_READ.value, + Permission.VIDEO_PRESENTATIONS_UPDATE.value, # Image Generations (create and read, no delete) Permission.IMAGE_GENERATIONS_CREATE.value, Permission.IMAGE_GENERATIONS_READ.value, @@ -434,6 +452,8 @@ DEFAULT_ROLE_PERMISSIONS = { Permission.LLM_CONFIGS_READ.value, # Podcasts (read only) Permission.PODCASTS_READ.value, + # Video Presentations (read only) + Permission.VIDEO_PRESENTATIONS_READ.value, # Image Generations (read only) Permission.IMAGE_GENERATIONS_READ.value, # Connectors (read only) @@ -1043,6 +1063,46 @@ class Podcast(BaseModel, TimestampMixin): thread = 
relationship("NewChatThread") +class VideoPresentation(BaseModel, TimestampMixin): + """Video presentation model for storing AI-generated video presentations. + + The slides JSONB stores per-slide data including Remotion component code, + audio file paths, and durations. The frontend compiles the code and renders + the video using Remotion Player. + """ + + __tablename__ = "video_presentations" + + title = Column(String(500), nullable=False) + slides = Column(JSONB, nullable=True) + scene_codes = Column(JSONB, nullable=True) + status = Column( + SQLAlchemyEnum( + VideoPresentationStatus, + name="video_presentation_status", + create_type=False, + values_callable=lambda x: [e.value for e in x], + ), + nullable=False, + default=VideoPresentationStatus.READY, + server_default="ready", + index=True, + ) + + search_space_id = Column( + Integer, ForeignKey("searchspaces.id", ondelete="CASCADE"), nullable=False + ) + search_space = relationship("SearchSpace", back_populates="video_presentations") + + thread_id = Column( + Integer, + ForeignKey("new_chat_threads.id", ondelete="SET NULL"), + nullable=True, + index=True, + ) + thread = relationship("NewChatThread") + + class Report(BaseModel, TimestampMixin): """Report model for storing generated Markdown reports.""" @@ -1227,6 +1287,12 @@ class SearchSpace(BaseModel, TimestampMixin): order_by="Podcast.id.desc()", cascade="all, delete-orphan", ) + video_presentations = relationship( + "VideoPresentation", + back_populates="search_space", + order_by="VideoPresentation.id.desc()", + cascade="all, delete-orphan", + ) reports = relationship( "Report", back_populates="search_space", diff --git a/surfsense_backend/app/retriever/chunks_hybrid_search.py b/surfsense_backend/app/retriever/chunks_hybrid_search.py index 5ab2964ca..d8b009655 100644 --- a/surfsense_backend/app/retriever/chunks_hybrid_search.py +++ b/surfsense_backend/app/retriever/chunks_hybrid_search.py @@ -1,3 +1,4 @@ +import asyncio import time from datetime import 
datetime @@ -49,7 +50,7 @@ class ChucksHybridSearchRetriever: # Get embedding for the query embedding_model = config.embedding_model_instance t_embed = time.perf_counter() - query_embedding = embedding_model.embed(query_text) + query_embedding = await asyncio.to_thread(embedding_model.embed, query_text) perf.debug( "[chunk_search] vector_search embedding in %.3fs", time.perf_counter() - t_embed, @@ -195,7 +196,7 @@ class ChucksHybridSearchRetriever: if query_embedding is None: embedding_model = config.embedding_model_instance t_embed = time.perf_counter() - query_embedding = embedding_model.embed(query_text) + query_embedding = await asyncio.to_thread(embedding_model.embed, query_text) perf.debug( "[chunk_search] hybrid_search embedding in %.3fs", time.perf_counter() - t_embed, diff --git a/surfsense_backend/app/routes/__init__.py b/surfsense_backend/app/routes/__init__.py index d7df2182a..66471b0ed 100644 --- a/surfsense_backend/app/routes/__init__.py +++ b/surfsense_backend/app/routes/__init__.py @@ -42,6 +42,7 @@ from .search_spaces_routes import router as search_spaces_router from .slack_add_connector_route import router as slack_add_connector_router from .surfsense_docs_routes import router as surfsense_docs_router from .teams_add_connector_route import router as teams_add_connector_router +from .video_presentations_routes import router as video_presentations_router from .youtube_routes import router as youtube_router router = APIRouter() @@ -55,6 +56,9 @@ router.include_router(new_chat_router) # Chat with assistant-ui persistence router.include_router(sandbox_router) # Sandbox file downloads (Daytona) router.include_router(chat_comments_router) router.include_router(podcasts_router) # Podcast task status and audio +router.include_router( + video_presentations_router +) # Video presentation status and streaming router.include_router(reports_router) # Report CRUD and multi-format export router.include_router(image_generation_router) # Image generation via 
litellm router.include_router(search_source_connectors_router) diff --git a/surfsense_backend/app/routes/public_chat_routes.py b/surfsense_backend/app/routes/public_chat_routes.py index 9afcbc188..e206bfd11 100644 --- a/surfsense_backend/app/routes/public_chat_routes.py +++ b/surfsense_backend/app/routes/public_chat_routes.py @@ -21,6 +21,7 @@ from app.services.public_chat_service import ( get_public_chat, get_snapshot_podcast, get_snapshot_report, + get_snapshot_video_presentation, ) from app.users import current_active_user @@ -117,6 +118,119 @@ async def stream_public_podcast( ) +@router.get("/{share_token}/video-presentations/{video_presentation_id}") +async def get_public_video_presentation( + share_token: str, + video_presentation_id: int, + session: AsyncSession = Depends(get_async_session), +): + """ + Get video presentation details from a public chat snapshot. + + No authentication required - the share_token provides access. + Returns slide data (with public audio URLs) and scene codes. + """ + vp_info = await get_snapshot_video_presentation( + session, share_token, video_presentation_id + ) + + if not vp_info: + raise HTTPException(status_code=404, detail="Video presentation not found") + + slides = vp_info.get("slides") or [] + public_slides = _replace_audio_paths_with_public_urls( + share_token, video_presentation_id, slides + ) + + return { + "id": vp_info.get("original_id"), + "title": vp_info.get("title"), + "status": "ready", + "slides": public_slides, + "scene_codes": vp_info.get("scene_codes"), + "slide_count": len(slides) if slides else None, + } + + +@router.get( + "/{share_token}/video-presentations/{video_presentation_id}/slides/{slide_number}/audio" +) +async def stream_public_slide_audio( + share_token: str, + video_presentation_id: int, + slide_number: int, + session: AsyncSession = Depends(get_async_session), +): + """ + Stream a slide's audio from a public chat snapshot. + + No authentication required - the share_token provides access. 
+ """ + from pathlib import Path + + vp_info = await get_snapshot_video_presentation( + session, share_token, video_presentation_id + ) + + if not vp_info: + raise HTTPException(status_code=404, detail="Video presentation not found") + + slides = vp_info.get("slides") or [] + slide_data = None + for s in slides: + if s.get("slide_number") == slide_number: + slide_data = s + break + + if not slide_data: + raise HTTPException(status_code=404, detail=f"Slide {slide_number} not found") + + file_path = slide_data.get("audio_file") + if not file_path or not os.path.isfile(file_path): + raise HTTPException(status_code=404, detail="Slide audio file not found") + + ext = Path(file_path).suffix.lower() + media_type = "audio/wav" if ext == ".wav" else "audio/mpeg" + + def iterfile(): + with open(file_path, mode="rb") as file_like: + yield from file_like + + return StreamingResponse( + iterfile(), + media_type=media_type, + headers={ + "Accept-Ranges": "bytes", + "Content-Disposition": f"inline; filename={Path(file_path).name}", + }, + ) + + +def _replace_audio_paths_with_public_urls( + share_token: str, + video_presentation_id: int, + slides: list[dict], +) -> list[dict]: + """Replace server-local audio_file paths with public streaming API URLs.""" + result = [] + for slide in slides: + slide_copy = dict(slide) + slide_number = slide_copy.get("slide_number") + audio_file = slide_copy.pop("audio_file", None) + + if audio_file and slide_number is not None: + slide_copy["audio_url"] = ( + f"/api/v1/public/{share_token}" + f"/video-presentations/{video_presentation_id}" + f"/slides/{slide_number}/audio" + ) + else: + slide_copy["audio_url"] = None + + result.append(slide_copy) + return result + + @router.get("/{share_token}/reports/{report_id}/content") async def get_public_report_content( share_token: str, diff --git a/surfsense_backend/app/routes/video_presentations_routes.py b/surfsense_backend/app/routes/video_presentations_routes.py new file mode 100644 index 
000000000..ed694b9bf --- /dev/null +++ b/surfsense_backend/app/routes/video_presentations_routes.py @@ -0,0 +1,242 @@ +""" +Video presentation routes for CRUD operations and per-slide audio streaming. + +These routes support the video presentation generation feature in new-chat. +Frontend polls GET /video-presentations/{id} to check status field. +When ready, the slides JSONB contains per-slide Remotion code and audio file paths. +The frontend compiles the Remotion code via Babel and renders with Remotion Player. +""" + +import os +from pathlib import Path + +from fastapi import APIRouter, Depends, HTTPException +from fastapi.responses import StreamingResponse +from sqlalchemy import select +from sqlalchemy.exc import SQLAlchemyError +from sqlalchemy.ext.asyncio import AsyncSession + +from app.db import ( + Permission, + SearchSpace, + SearchSpaceMembership, + User, + VideoPresentation, + get_async_session, +) +from app.schemas import VideoPresentationRead +from app.users import current_active_user +from app.utils.rbac import check_permission + +router = APIRouter() + + +@router.get("/video-presentations", response_model=list[VideoPresentationRead]) +async def read_video_presentations( + skip: int = 0, + limit: int = 100, + search_space_id: int | None = None, + session: AsyncSession = Depends(get_async_session), + user: User = Depends(current_active_user), +): + """ + List video presentations the user has access to. + Requires VIDEO_PRESENTATIONS_READ permission for the search space(s). 
+ """ + if skip < 0 or limit < 1: + raise HTTPException(status_code=400, detail="Invalid pagination parameters") + try: + if search_space_id is not None: + await check_permission( + session, + user, + search_space_id, + Permission.VIDEO_PRESENTATIONS_READ.value, + "You don't have permission to read video presentations in this search space", + ) + result = await session.execute( + select(VideoPresentation) + .filter(VideoPresentation.search_space_id == search_space_id) + .offset(skip) + .limit(limit) + ) + else: + result = await session.execute( + select(VideoPresentation) + .join(SearchSpace) + .join(SearchSpaceMembership) + .filter(SearchSpaceMembership.user_id == user.id) + .offset(skip) + .limit(limit) + ) + return [ + VideoPresentationRead.from_orm_with_slides(vp) + for vp in result.scalars().all() + ] + except HTTPException: + raise + except SQLAlchemyError: + raise HTTPException( + status_code=500, + detail="Database error occurred while fetching video presentations", + ) from None + + +@router.get( + "/video-presentations/{video_presentation_id}", + response_model=VideoPresentationRead, +) +async def read_video_presentation( + video_presentation_id: int, + session: AsyncSession = Depends(get_async_session), + user: User = Depends(current_active_user), +): + """ + Get a specific video presentation by ID. + Requires authentication with VIDEO_PRESENTATIONS_READ permission. 
+ + When status is "ready", the response includes: + - slides: parsed slide data with per-slide audio_url and durations + - scene_codes: Remotion component source code per slide + """ + try: + result = await session.execute( + select(VideoPresentation).filter( + VideoPresentation.id == video_presentation_id + ) + ) + video_pres = result.scalars().first() + + if not video_pres: + raise HTTPException(status_code=404, detail="Video presentation not found") + + await check_permission( + session, + user, + video_pres.search_space_id, + Permission.VIDEO_PRESENTATIONS_READ.value, + "You don't have permission to read video presentations in this search space", + ) + + return VideoPresentationRead.from_orm_with_slides(video_pres) + except HTTPException as he: + raise he + except SQLAlchemyError: + raise HTTPException( + status_code=500, + detail="Database error occurred while fetching video presentation", + ) from None + + +@router.delete("/video-presentations/{video_presentation_id}", response_model=dict) +async def delete_video_presentation( + video_presentation_id: int, + session: AsyncSession = Depends(get_async_session), + user: User = Depends(current_active_user), +): + """ + Delete a video presentation. + Requires VIDEO_PRESENTATIONS_DELETE permission for the search space. 
+ """ + try: + result = await session.execute( + select(VideoPresentation).filter( + VideoPresentation.id == video_presentation_id + ) + ) + db_video_pres = result.scalars().first() + + if not db_video_pres: + raise HTTPException(status_code=404, detail="Video presentation not found") + + await check_permission( + session, + user, + db_video_pres.search_space_id, + Permission.VIDEO_PRESENTATIONS_DELETE.value, + "You don't have permission to delete video presentations in this search space", + ) + + await session.delete(db_video_pres) + await session.commit() + return {"message": "Video presentation deleted successfully"} + except HTTPException as he: + raise he + except SQLAlchemyError: + await session.rollback() + raise HTTPException( + status_code=500, + detail="Database error occurred while deleting video presentation", + ) from None + + +@router.get("/video-presentations/{video_presentation_id}/slides/{slide_number}/audio") +async def stream_slide_audio( + video_presentation_id: int, + slide_number: int, + session: AsyncSession = Depends(get_async_session), + user: User = Depends(current_active_user), +): + """ + Stream the audio file for a specific slide in a video presentation. + The slide_number is 1-based. Audio path is read from the slides JSONB. 
+ """ + try: + result = await session.execute( + select(VideoPresentation).filter( + VideoPresentation.id == video_presentation_id + ) + ) + video_pres = result.scalars().first() + + if not video_pres: + raise HTTPException(status_code=404, detail="Video presentation not found") + + await check_permission( + session, + user, + video_pres.search_space_id, + Permission.VIDEO_PRESENTATIONS_READ.value, + "You don't have permission to access video presentations in this search space", + ) + + slides = video_pres.slides or [] + slide_data = None + for s in slides: + if s.get("slide_number") == slide_number: + slide_data = s + break + + if not slide_data: + raise HTTPException( + status_code=404, + detail=f"Slide {slide_number} not found", + ) + + file_path = slide_data.get("audio_file") + if not file_path or not os.path.isfile(file_path): + raise HTTPException(status_code=404, detail="Slide audio file not found") + + ext = Path(file_path).suffix.lower() + media_type = "audio/wav" if ext == ".wav" else "audio/mpeg" + + def iterfile(): + with open(file_path, mode="rb") as file_like: + yield from file_like + + return StreamingResponse( + iterfile(), + media_type=media_type, + headers={ + "Accept-Ranges": "bytes", + "Content-Disposition": f"inline; filename={Path(file_path).name}", + }, + ) + + except HTTPException as he: + raise he + except Exception as e: + raise HTTPException( + status_code=500, + detail=f"Error streaming slide audio: {e!s}", + ) from e diff --git a/surfsense_backend/app/schemas/__init__.py b/surfsense_backend/app/schemas/__init__.py index 7e3ba1936..11d3bfc06 100644 --- a/surfsense_backend/app/schemas/__init__.py +++ b/surfsense_backend/app/schemas/__init__.py @@ -101,6 +101,12 @@ from .search_space import ( SearchSpaceWithStats, ) from .users import UserCreate, UserRead, UserUpdate +from .video_presentations import ( + VideoPresentationBase, + VideoPresentationCreate, + VideoPresentationRead, + VideoPresentationUpdate, +) __all__ = [ # Chat schemas 
(assistant-ui integration) @@ -220,4 +226,9 @@ __all__ = [ "UserRead", "UserSearchSpaceAccess", "UserUpdate", + # Video Presentation schemas + "VideoPresentationBase", + "VideoPresentationCreate", + "VideoPresentationRead", + "VideoPresentationUpdate", ] diff --git a/surfsense_backend/app/schemas/search_space.py b/surfsense_backend/app/schemas/search_space.py index 729ff4e7d..054fe1465 100644 --- a/surfsense_backend/app/schemas/search_space.py +++ b/surfsense_backend/app/schemas/search_space.py @@ -12,13 +12,11 @@ class SearchSpaceBase(BaseModel): class SearchSpaceCreate(SearchSpaceBase): - # Optional on create, will use defaults if not provided citations_enabled: bool = True qna_custom_instructions: str | None = None class SearchSpaceUpdate(BaseModel): - # All fields optional on update - only send what you want to change name: str | None = None description: str | None = None citations_enabled: bool | None = None @@ -29,7 +27,6 @@ class SearchSpaceRead(SearchSpaceBase, IDModel, TimestampModel): id: int created_at: datetime user_id: uuid.UUID - # QnA configuration citations_enabled: bool qna_custom_instructions: str | None = None diff --git a/surfsense_backend/app/schemas/video_presentations.py b/surfsense_backend/app/schemas/video_presentations.py new file mode 100644 index 000000000..ec29147ef --- /dev/null +++ b/surfsense_backend/app/schemas/video_presentations.py @@ -0,0 +1,103 @@ +"""Video presentation schemas for API responses.""" + +from datetime import datetime +from enum import StrEnum +from typing import Any + +from pydantic import BaseModel + + +class VideoPresentationStatusEnum(StrEnum): + PENDING = "pending" + GENERATING = "generating" + READY = "ready" + FAILED = "failed" + + +class VideoPresentationBase(BaseModel): + """Base video presentation schema.""" + + title: str + slides: list[dict[str, Any]] | None = None + scene_codes: list[dict[str, Any]] | None = None + search_space_id: int + + +class VideoPresentationCreate(VideoPresentationBase): + 
"""Schema for creating a video presentation.""" + + pass + + +class VideoPresentationUpdate(BaseModel): + """Schema for updating a video presentation.""" + + title: str | None = None + slides: list[dict[str, Any]] | None = None + scene_codes: list[dict[str, Any]] | None = None + + +class VideoPresentationRead(VideoPresentationBase): + """Schema for reading a video presentation.""" + + id: int + status: VideoPresentationStatusEnum = VideoPresentationStatusEnum.READY + created_at: datetime + slide_count: int | None = None + + class Config: + from_attributes = True + + @classmethod + def from_orm_with_slides(cls, obj): + """Create VideoPresentationRead with slide_count computed. + + Replaces raw server file paths in `audio_file` with API streaming + URLs so the frontend can use them directly in Remotion