mirror of https://github.com/MODSetter/SurfSense.git, synced 2026-05-01 03:46:25 +02:00
feat: added atlassian docs
parent 3ecd4eb320
commit 761fa9162b
19 changed files with 194 additions and 249 deletions
@@ -3,36 +3,4 @@ title: Web Crawler
description: Crawl and index websites with SurfSense
---

# Web Crawler Connector

Crawl and index public websites to make them searchable.

## Prerequisites

- Firecrawl API key (see [Prerequisites](/docs))

## Setup

1. Navigate to your Search Space settings
2. Click on **Add Connector**
3. Select **Web Crawler** from the list
4. Enter the URL(s) you want to crawl
5. Configure crawl depth and settings

## What Gets Indexed

- Web page content
- Page titles and metadata
- Links and navigation
- Images and media (configurable)

## Configuration Options

- **Crawl Depth**: How many levels deep to crawl
- **Include/Exclude Patterns**: Filter which URLs to index
- **Rate Limiting**: Control crawl speed
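
The three options above can be pictured with a minimal Python sketch. This is not SurfSense's or Firecrawl's actual implementation: the `CrawlConfig` class, its field names, and the glob-style pattern syntax are all assumptions made for illustration.

```python
import fnmatch
from dataclasses import dataclass, field


@dataclass
class CrawlConfig:
    """Hypothetical settings mirroring the three options listed above."""
    max_depth: int = 2                                          # Crawl Depth
    include: list[str] = field(default_factory=lambda: ["*"])   # Include patterns (globs)
    exclude: list[str] = field(default_factory=list)            # Exclude patterns (globs)
    min_delay_s: float = 1.0                                    # Rate Limiting: gap between requests


def should_crawl(url: str, depth: int, cfg: CrawlConfig) -> bool:
    """Return True if the URL is within the depth limit and passes the filters."""
    if depth > cfg.max_depth:
        return False
    # Exclude patterns take priority over include patterns in this sketch.
    if any(fnmatch.fnmatchcase(url, pattern) for pattern in cfg.exclude):
        return False
    return any(fnmatch.fnmatchcase(url, pattern) for pattern in cfg.include)


def wait_time(last_request: float, now: float, cfg: CrawlConfig) -> float:
    """Seconds to wait before the next request so that min_delay_s is respected."""
    return max(0.0, cfg.min_delay_s - (now - last_request))
```

With `include=["https://example.com/docs/*"]` and `exclude=["*/private/*"]`, a page under `/docs/` at depth 1 would be crawled, while anything under a `private` path or deeper than `max_depth` would be skipped.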

## Sync Frequency

The Web Crawler connector supports scheduled re-crawling to keep your content up to date.
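
The text does not say how the schedule is evaluated. As a rough sketch only, a scheduler could decide whether a site is due for a re-crawl as follows; the `is_due` helper and the fixed-interval model are assumptions, not part of SurfSense.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_due(last_crawled: Optional[datetime], interval: timedelta, now: datetime) -> bool:
    """Return True when a re-crawl should run (hypothetical fixed-interval policy)."""
    if last_crawled is None:
        # Never crawled before, so crawl immediately.
        return True
    return now - last_crawled >= interval


# Example: a daily schedule checked 25 hours after the previous crawl.
previous = datetime(2026, 1, 1, 0, 0, tzinfo=timezone.utc)
check_at = previous + timedelta(hours=25)
due = is_due(previous, timedelta(days=1), check_at)
```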

# Documentation in progress