# SurfSense Test Document

## Overview

This is a **sample markdown document** used for end-to-end testing of the manual document upload pipeline. It includes various markdown formatting elements.

## Key Features

- Document upload and processing
- Automatic chunking of content
- Embedding generation for semantic search
- Real-time status tracking via ElectricSQL

## Technical Architecture

### Backend Stack

The SurfSense backend is built with:

1. **FastAPI** for the REST API
2. **PostgreSQL** with pgvector for vector storage
3. **Celery** with Redis for background task processing
4. **Docling/Unstructured** for document parsing (ETL)

### Processing Pipeline

Documents go through a multi-stage pipeline:

| Stage | Description |
|-------|-------------|
| Upload | File received via API endpoint |
| Parsing | Content extracted using ETL service |
| Chunking | Text split into semantic chunks |
| Embedding | Vector representations generated |
| Storage | Chunks stored with embeddings in pgvector |

## Code Example

```python
async def process_document(file_path: str) -> Document:
    content = extract_content(file_path)
    chunks = create_chunks(content)
    embeddings = generate_embeddings(chunks)
    return store_document(chunks, embeddings)
```

## Conclusion

This document serves as a test fixture to validate the complete document processing pipeline from upload through to chunk creation and embedding storage.
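
The parsing, chunking, embedding, and storage stages named in the Processing Pipeline table can be sketched end to end with in-memory stand-ins. Everything below is an assumption made for illustration and not the actual SurfSense implementation: the `Document` dataclass is hypothetical, `extract_content` takes raw text instead of a file path, `create_chunks` uses fixed-size word windows in place of semantic chunking, `generate_embeddings` derives toy vectors from a SHA-256 digest in place of a real embedding model, and `store_document` keeps results in memory in place of pgvector.

```python
import asyncio
import hashlib
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical container; the real document model is not shown here."""
    chunks: list[str] = field(default_factory=list)
    embeddings: list[list[float]] = field(default_factory=list)


def extract_content(raw: str) -> str:
    # Stand-in for the ETL parsing stage: only normalizes whitespace.
    return " ".join(raw.split())


def create_chunks(content: str, max_words: int = 8) -> list[str]:
    # Stand-in for semantic chunking: fixed-size windows of max_words words.
    words = content.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def generate_embeddings(chunks: list[str], dim: int = 4) -> list[list[float]]:
    # Toy deterministic vectors from a SHA-256 digest, scaled to [0, 1].
    # This is NOT a semantic embedding; it only gives each chunk a vector.
    vectors = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk.encode("utf-8")).digest()
        vectors.append([byte / 255 for byte in digest[:dim]])
    return vectors


def store_document(chunks: list[str], embeddings: list[list[float]]) -> Document:
    # Stand-in for pgvector storage: keep chunks and vectors in memory.
    return Document(chunks=list(chunks), embeddings=list(embeddings))


async def process_document(raw: str) -> Document:
    # Same stage order as the pipeline table: parse, chunk, embed, store.
    content = extract_content(raw)
    chunks = create_chunks(content)
    embeddings = generate_embeddings(chunks)
    return store_document(chunks, embeddings)


if __name__ == "__main__":
    sample = "SurfSense test document used for end to end testing of the upload pipeline"
    doc = asyncio.run(process_document(sample))
    print(len(doc.chunks), "chunks stored")
```

Keeping each stage as a separate function mirrors the table, so a real parser, chunker, or embedding model could replace one stand-in without touching the others.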