SurfSense Integration Architecture

System Topography

SurfSense uses a multi-part integration model that separates state syncing, background processing, and web presentation.

1. Data Sync Layer (Zero + Next.js + PostgreSQL)

  • Zero Cache (zero-cache): The real-time sync layer. It uses Client View Records (CVRs) to track what has already been sent to clients while synchronizing mutations.
  • Flow:
    • The Next.js frontend uses @rocicorp/zero to mutate or query data locally.
    • Zero Cache connects directly to PostgreSQL: it consumes replication changes from the upstream database (ZERO_UPSTREAM_DB) and keeps its client view records in ZERO_CVR_DB.
    • Zero Cache (port 4848) and the frontend API endpoints (/api/zero/query and /api/zero/mutate) coordinate syncing and conflict resolution.

2. General API / Action Layer (FastAPI + Next.js)

  • FastAPI Layer: For synchronous REST operations, authentication, and triggers (such as initiating a scrape), Next.js makes HTTP REST calls to the FastAPI backend on BACKEND_PORT (8929/8000).
  • The FastAPI layer accesses PostgreSQL directly using asyncpg and SQLAlchemy.
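
As a rough illustration of the asyncpg + SQLAlchemy pairing, the sketch below builds the kind of connection URL SQLAlchemy expects when the asyncpg driver is selected. The user, password, host, and database names are hypothetical placeholders, not values taken from SurfSense.

```python
from urllib.parse import quote_plus


def build_async_pg_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Return a SQLAlchemy URL that selects the asyncpg driver."""
    # Percent-encode the credentials so characters such as '@' or '/'
    # cannot be mistaken for URL delimiters.
    return (
        f"postgresql+asyncpg://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical values, for illustration only.
example_url = build_async_pg_url("surfsense", "p@ss/word", "db", 5432, "surfsense")
```

A URL of this shape is what would typically be passed to `sqlalchemy.ext.asyncio.create_async_engine`; how SurfSense actually assembles its connection string is not stated here.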

3. AI / RAG Processing Layer (SearXNG + Celery + FastAPI)

  • Search: FastAPI contacts SearXNG at http://searxng:8080 (Internal Docker network) for anonymized agent searches.
  • Offloaded Processing:
    • Web scraping and agent LLM jobs are handed off from FastAPI to Celery.
    • Celery uses Redis at redis:6379/0 as both its broker and its result backend.
    • A separate Celery worker process runs the LangGraph workflows, processing large web contexts and storing vector outputs in PostgreSQL via pgvector.
    • Celery Beat enqueues scheduled (cron-style) tasks, which the worker then picks up.
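
The addresses in this layer can be sketched as plain configuration helpers. The block below is a minimal sketch using only the standard library: it builds a SearXNG query URL and a Celery settings dictionary from the hosts named above. The function names are hypothetical, and the `format=json` parameter assumes JSON output is enabled in the SearXNG instance's settings.

```python
from urllib.parse import urlencode

SEARXNG_BASE_URL = "http://searxng:8080"  # internal Docker network address from the text
REDIS_URL = "redis://redis:6379/0"        # broker and result backend from the text


def searxng_search_url(query: str, base_url: str = SEARXNG_BASE_URL) -> str:
    """Build a SearXNG search URL (hypothetical helper)."""
    # Assumption: the instance allows the JSON output format.
    return f"{base_url}/search?{urlencode({'q': query, 'format': 'json'})}"


def celery_settings(redis_url: str = REDIS_URL) -> dict[str, str]:
    """Return Celery settings that point broker and results at the same Redis DB."""
    # broker_url and result_backend are Celery's lowercase setting names.
    return {"broker_url": redis_url, "result_backend": redis_url}
```

With Celery installed, a dictionary like this could be applied through `app.conf.update(**celery_settings())`. Whether SurfSense configures Celery this way or through environment variables is an assumption left open here.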

Component Integrations Summary

| Origin | Target | Protocol | Description |
| --- | --- | --- | --- |
| Next.js (Web) | Zero Cache | WebSocket / HTTP | Local-first State Sync API. |
| Next.js (Web) | FastAPI | HTTP/REST | Triggering synchronous jobs, user logic. |
| Zero Cache | PostgreSQL | TCP (PG Protocol) | Standard relational sync and conflict resolution via CVR. |
| FastAPI | PostgreSQL | TCP (PG Protocol) | AsyncPG DB Reads/Writes. |
| FastAPI | Redis | TCP | Celery queue pipelining / PubSub. |
| FastAPI | SearXNG | HTTP | Metasearch queries. |
| Celery Worker | Redis | TCP | Consuming asynchronous processing queues. |
| Celery Worker | PostgreSQL | TCP | Saving LangGraph context and PgVector embeddings. |
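
For impact analysis (for example, which components are affected when a target is unavailable), the integrations summarized here can be encoded as data. This is an illustrative sketch only: the `Link` type and `dependents_of` helper are hypothetical names, and the entries simply mirror the origin, target, and protocol values listed in this section.

```python
from typing import NamedTuple


class Link(NamedTuple):
    origin: str
    target: str
    protocol: str


LINKS = [
    Link("Next.js (Web)", "Zero Cache", "WebSocket / HTTP"),
    Link("Next.js (Web)", "FastAPI", "HTTP/REST"),
    Link("Zero Cache", "PostgreSQL", "TCP (PG Protocol)"),
    Link("FastAPI", "PostgreSQL", "TCP (PG Protocol)"),
    Link("FastAPI", "Redis", "TCP"),
    Link("FastAPI", "SearXNG", "HTTP"),
    Link("Celery Worker", "Redis", "TCP"),
    Link("Celery Worker", "PostgreSQL", "TCP"),
]


def dependents_of(target: str) -> list[str]:
    """Return the origins that connect to `target`, in listed order, without duplicates."""
    seen: list[str] = []
    for link in LINKS:
        if link.target == target and link.origin not in seen:
            seen.append(link.origin)
    return seen
```

Calling `dependents_of("Redis")` lists the components that talk to Redis directly, which is a quick way to see what a Redis outage would touch.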