SurfSense/surfsense_backend/tests/fixtures/sample.md
Anish Sarkar 41eb68663a feat: Add end-to-end tests for document upload pipeline and shared test utilities
- Introduced new test files for end-to-end testing of document uploads, including support for .txt, .md, and .pdf formats.
- Created shared fixtures and helper functions for authentication, document management, and cleanup.
- Added sample documents for testing purposes.
- Established a conftest.py file to provide reusable fixtures across test modules.
2026-02-25 16:39:45 +05:30

SurfSense Test Document

Overview

This is a sample Markdown document used for end-to-end testing of the manual document upload pipeline. It includes several Markdown formatting elements: headings, bulleted and numbered lists, a table, and a code block.

Key Features

  • Document upload and processing
  • Automatic chunking of content
  • Embedding generation for semantic search
  • Real-time status tracking via ElectricSQL
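To make the chunking feature concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration only: the function name `chunk_words` and the `size` and `overlap` parameters are hypothetical, and the real pipeline may chunk on semantic boundaries instead of word counts.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words.

    Hypothetical helper for illustration, not SurfSense's actual chunker.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once a window has reached the end of the text.
        if start + size >= len(words):
            break
    return chunks
```

For example, `chunk_words("a b c d e f g h i j", size=4, overlap=1)` yields three chunks, each repeating the last word of the previous one so that context is not lost at chunk boundaries.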

Technical Architecture

Backend Stack

The SurfSense backend is built with:

  1. FastAPI for the REST API
  2. PostgreSQL with pgvector for vector storage
  3. Celery with Redis for background task processing
  4. Docling/Unstructured for document parsing (ETL)

Processing Pipeline

Documents go through a multi-stage pipeline:

| Stage     | Description                               |
|-----------|-------------------------------------------|
| Upload    | File received via API endpoint            |
| Parsing   | Content extracted using ETL service       |
| Chunking  | Text split into semantic chunks           |
| Embedding | Vector representations generated          |
| Storage   | Chunks stored with embeddings in pgvector |
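The stage order above can be modeled as an ordered enumeration, which is one way a status tracker could report progress. This is a sketch under assumptions: the `Stage` enum and the `next_stage` helper are hypothetical names, not part of the SurfSense codebase.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Pipeline stages in processing order (hypothetical model of the table above)."""
    UPLOAD = "File received via API endpoint"
    PARSING = "Content extracted using ETL service"
    CHUNKING = "Text split into semantic chunks"
    EMBEDDING = "Vector representations generated"
    STORAGE = "Chunks stored with embeddings in pgvector"


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after the final stage."""
    order = list(Stage)  # Enum members iterate in definition order
    idx = order.index(current)
    return order[idx + 1] if idx + 1 < len(order) else None
```

A document would start at `Stage.UPLOAD` and advance with `next_stage` until it returns `None`, at which point processing is complete.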

Code Example

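async def process_document(file_path: str) -> Document:
    # Parse the uploaded file into plain text (ETL stage).
    content = extract_content(file_path)
    # Split the text into semantic chunks.
    chunks = create_chunks(content)
    # Generate vector representations of the chunks.
    embeddings = generate_embeddings(chunks)
    # Persist the chunks together with their embeddings.
    return store_document(chunks, embeddings)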
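The snippet above relies on helpers that are not defined in this document. A self-contained version might look like the following sketch, in which every helper body is a hypothetical stand-in: the parser just reads UTF-8 text, the chunker splits on blank lines, the "embeddings" are toy vectors derived from a SHA-256 digest rather than a real model, and storage is an in-memory dataclass instead of pgvector.

```python
import asyncio
import hashlib
from dataclasses import dataclass


@dataclass
class Document:
    chunks: list[str]
    embeddings: list[list[float]]


def extract_content(file_path: str) -> str:
    # Stand-in for the ETL step: read the file as plain UTF-8 text.
    with open(file_path, encoding="utf-8") as f:
        return f.read()


def create_chunks(content: str) -> list[str]:
    # Stand-in chunker: one chunk per non-empty paragraph.
    return [p.strip() for p in content.split("\n\n") if p.strip()]


def generate_embeddings(chunks: list[str], dim: int = 8) -> list[list[float]]:
    # Toy deterministic vectors from a hash digest; NOT a real embedding model.
    return [
        [b / 255 for b in hashlib.sha256(c.encode("utf-8")).digest()[:dim]]
        for c in chunks
    ]


def store_document(chunks: list[str], embeddings: list[list[float]]) -> Document:
    # Stand-in for persistence: keep everything in memory.
    return Document(chunks=chunks, embeddings=embeddings)


async def process_document(file_path: str) -> Document:
    # Run the blocking file read in a worker thread so the event loop stays free.
    content = await asyncio.to_thread(extract_content, file_path)
    chunks = create_chunks(content)
    embeddings = generate_embeddings(chunks)
    return store_document(chunks, embeddings)
```

It can be run with `asyncio.run(process_document("sample.md"))`, assuming a file of that name exists in the working directory. Offloading the file read with `asyncio.to_thread` is a design choice of this sketch, made so that declaring the function `async` has a practical effect.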

Conclusion

This document serves as a test fixture to validate the complete document processing pipeline from upload through to chunk creation and embedding storage.