mirror of
https://github.com/MODSetter/SurfSense.git
synced 2026-05-04 13:22:41 +02:00
---
title: Web Crawler
description: Crawl and index websites with SurfSense
---

# Web Crawler Connector

Crawl and index public websites to make them searchable.

## Prerequisites

- Firecrawl API key (see [Prerequisites](/docs))

## Setup

1. Navigate to your Search Space settings
2. Click on **Add Connector**
3. Select **Web Crawler** from the list
4. Enter the URL(s) you want to crawl
5. Configure crawl depth and settings

## What Gets Indexed

- Web page content
- Page titles and metadata
- Links and navigation
- Images and media (configurable)
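
To make the list above concrete, here is a minimal sketch of pulling those four kinds of data out of a single HTML page using only Python's standard library. It is an illustration, not SurfSense's or Firecrawl's actual extraction code: the names `PageExtractor`, `extract_page`, and the `include_images` flag are hypothetical.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects title, metadata, links, images, and visible text from one page."""

    def __init__(self, include_images: bool = True) -> None:
        super().__init__()
        self.include_images = include_images  # hypothetical stand-in for the media option
        self.title = ""
        self.metadata: dict[str, str] = {}
        self.links: list[str] = []
        self.images: list[str] = []
        self.text_parts: list[str] = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and attr.get("name") and attr.get("content"):
            self.metadata[attr["name"]] = attr["content"]
        elif tag == "a" and attr.get("href"):
            self.links.append(attr["href"])
        elif tag == "img" and attr.get("src") and self.include_images:
            self.images.append(attr["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        elif not self._skip_depth:
            self.text_parts.append(text)


def extract_page(html: str, include_images: bool = True) -> dict:
    """Parse one HTML document and return the fields a crawler might index."""
    parser = PageExtractor(include_images)
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "metadata": parser.metadata,
        "links": parser.links,
        "images": parser.images,
        "text": " ".join(parser.text_parts),
    }
```

Calling `extract_page(html, include_images=False)` returns the same fields with an empty `images` list, which is one plausible reading of "configurable" for media.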

## Configuration Options

- **Crawl Depth**: How many levels of links to follow from each starting URL
- **Include/Exclude Patterns**: Filter which URLs are indexed
- **Rate Limiting**: Control how quickly pages are requested
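
The sketch below shows one way these three options could interact in a breadth-first crawl. It is not the connector's implementation. The function names, the glob-style pattern syntax, the `min_delay_s` parameter, and the in-memory `SITE` map standing in for real HTTP fetches are all assumptions made for illustration.

```python
import time
from collections import deque
from fnmatch import fnmatchcase
from typing import Callable, Sequence


def url_allowed(url: str, include: Sequence[str], exclude: Sequence[str]) -> bool:
    """Exclude patterns win; an empty include list allows every URL."""
    if any(fnmatchcase(url, pattern) for pattern in exclude):
        return False
    return not include or any(fnmatchcase(url, pattern) for pattern in include)


def crawl(
    start_url: str,
    get_links: Callable[[str], list[str]],
    max_depth: int = 1,
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
    min_delay_s: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Breadth-first crawl; returns URLs in the order they were visited."""
    visited: list[str] = []
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if not url_allowed(url, include, exclude):
            continue  # filtered pages are neither indexed nor followed
        if visited and min_delay_s > 0:
            sleep(min_delay_s)  # rate limiting: pause between page fetches
        visited.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: index the page, do not follow its links
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical site map used in place of real network requests.
SITE = {
    "https://example.com/": ["https://example.com/docs", "https://example.com/blog"],
    "https://example.com/docs": ["https://example.com/docs/setup", "https://example.com/"],
    "https://example.com/blog": ["https://example.com/blog/post"],
}


def fake_links(url: str) -> list[str]:
    return SITE.get(url, [])


shallow = crawl("https://example.com/", fake_links, max_depth=1)
docs_only = crawl("https://example.com/", fake_links, max_depth=2, exclude=["*/blog*"])
```

Here `shallow` stops one link away from the start page, while `docs_only` goes one level deeper but skips everything under `/blog`. In this sketch a page that fails the filter is also not followed. A real crawler might still follow its links, so treat that as a design choice rather than documented behavior.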

## Sync Frequency

The Web Crawler connector supports scheduled re-crawling to keep your content up to date.
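
As a rough illustration of what scheduled re-crawling involves, the following sketch decides whether a source is due for another crawl from its last crawl time and a fixed interval. The function names and the fixed-interval model are assumptions. The connector's actual scheduling options are not described here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_crawl_time(last_crawl: datetime, interval: timedelta) -> datetime:
    """Earliest moment at which the next scheduled crawl should run."""
    return last_crawl + interval


def is_recrawl_due(
    last_crawl: Optional[datetime],
    interval: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """A source that was never crawled is always due."""
    if last_crawl is None:
        return True
    current = now if now is not None else datetime.now(timezone.utc)
    return current >= next_crawl_time(last_crawl, interval)
```

Passing `now` explicitly keeps the check deterministic, which makes it easy to test without waiting for real time to pass.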