mirror of
https://github.com/MODSetter/SurfSense.git
synced 2026-04-28 10:26:33 +02:00
- Enhanced the DocsPage component by adding table of content and popover configurations. - Removed the 'full' property from multiple MDX files to streamline documentation structure. - Updated meta.json to reflect new documentation organization and added a 'connectors' page.
---
title: Web Crawler
description: Crawl and index websites with SurfSense
---

# Web Crawler Connector

Crawl and index public websites to make them searchable.

## Prerequisites

- Firecrawl API key (see [Prerequisites](/docs))

## Setup

1. Navigate to your Search Space settings
2. Click **Add Connector**
3. Select **Web Crawler** from the list
4. Enter the URL(s) you want to crawl
5. Configure the crawl depth and any other settings

## What Gets Indexed

- Web page content
- Page titles and metadata
- Links and navigation
- Images and media (configurable)
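
To make the list above concrete, here is a minimal sketch of pulling those kinds of fields out of a single HTML page using only the Python standard library. It is an illustration, not the connector's actual extraction code: the `PageSummary` class and `summarize` function are hypothetical names, and the real crawler may extract and store these fields differently.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Hypothetical collector for the field types listed above."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}     # <meta name=... content=...> pairs
        self.links = []    # href values of <a> tags
        self.images = []   # src values of <img> tags
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"]] = attrs["content"]
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is accumulated here.
        if self._in_title:
            self.title += data


def summarize(html: str) -> PageSummary:
    """Parse one HTML document and return the collected fields."""
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    return parser
```

Calling `summarize(html)` on a fetched page returns an object whose `title`, `meta`, `links`, and `images` attributes roughly correspond to the bullet points above.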

## Configuration Options

- **Crawl Depth**: How many levels deep to crawl
- **Include/Exclude Patterns**: Filter which URLs to index
- **Rate Limiting**: Control crawl speed
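
The following sketch shows one plausible way the three options above could interact when deciding whether a URL is indexed. Everything here is an assumption made for illustration: the `CrawlConfig` field names, the glob-style pattern syntax, and the rule that exclude patterns win over include patterns are not taken from SurfSense or Firecrawl.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class CrawlConfig:
    # All field names are hypothetical, chosen for this illustration only.
    max_depth: int = 2
    include: list[str] = field(default_factory=list)  # empty list: include everything
    exclude: list[str] = field(default_factory=list)
    requests_per_second: float = 1.0

    def delay(self) -> float:
        """Seconds to wait between requests for the configured rate."""
        return 1.0 / self.requests_per_second


def should_index(url: str, depth: int, cfg: CrawlConfig) -> bool:
    """Apply depth, then exclude patterns, then include patterns."""
    if depth > cfg.max_depth:
        return False
    if any(fnmatchcase(url, pattern) for pattern in cfg.exclude):
        return False
    if cfg.include and not any(fnmatchcase(url, pattern) for pattern in cfg.include):
        return False
    return True
```

With `include=["https://example.com/docs/*"]` and `exclude=["*/private/*"]`, a page under `/docs/` is accepted unless its path contains `/private/` or it lies deeper than `max_depth`.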

## Sync Frequency

The Web Crawler connector supports scheduled re-crawling to keep your content up to date.
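
As a rough illustration of what scheduled re-crawling involves, the sketch below checks whether a source is due for another crawl given a fixed interval. The function names and the fixed-interval model are assumptions; the connector's actual scheduling mechanism is not described here.

```python
from datetime import datetime, timedelta, timezone


def next_crawl_at(last_crawl: datetime, interval: timedelta) -> datetime:
    """Earliest time at which the next crawl should start (hypothetical helper)."""
    return last_crawl + interval


def is_due(last_crawl: datetime, interval: timedelta, now: datetime) -> bool:
    """True once the configured interval has elapsed since the last crawl."""
    return now >= next_crawl_at(last_crawl, interval)


# Example: a daily schedule, checked 25 hours after the previous crawl.
last = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
due = is_due(last, timedelta(days=1), last + timedelta(hours=25))
```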