mirror of
https://github.com/MODSetter/SurfSense.git
synced 2026-05-01 20:03:30 +02:00
feat: update documentation structure and remove 'full' property from MDX files
- Enhanced the DocsPage component by adding table of contents and popover configurations.
- Removed the 'full' property from multiple MDX files to streamline documentation structure.
- Updated meta.json to reflect new documentation organization and added a 'connectors' page.
parent
929bc026e6
commit
ba54e1da06
23 changed files with 581 additions and 8 deletions
38
surfsense_web/content/docs/connectors/web-crawler.mdx
Normal file
@@ -0,0 +1,38 @@
---
title: Web Crawler
description: Crawl and index websites with SurfSense
---

# Web Crawler Connector

Crawl and index public websites to make them searchable.

## Prerequisites

- Firecrawl API key (see [Prerequisites](/docs))

## Setup

1. Navigate to your Search Space settings
2. Click **Add Connector**
3. Select **Web Crawler** from the list
4. Enter the URL(s) you want to crawl
5. Configure the crawl depth and other settings

## What Gets Indexed

- Web page content
- Page titles and metadata
- Links and navigation
- Images and media (configurable)

## Configuration Options

- **Crawl Depth**: How many levels deep to crawl
- **Include/Exclude Patterns**: Filter which URLs to index
- **Rate Limiting**: Control crawl speed

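To illustrate how crawl depth and include/exclude patterns could interact, the following Python sketch walks an in-memory link graph breadth-first. It is not the SurfSense or Firecrawl implementation: the function names (`is_allowed`, `plan_crawl`), the glob-style pattern syntax, and the example URLs are all assumptions made for this example.

```python
from collections import deque
from fnmatch import fnmatchcase


def is_allowed(url, include=(), exclude=()):
    """Hypothetical filter: exclude patterns win; an empty include list allows everything."""
    if any(fnmatchcase(url, pattern) for pattern in exclude):
        return False
    return not include or any(fnmatchcase(url, pattern) for pattern in include)


def plan_crawl(start_url, links, max_depth, include=(), exclude=()):
    """Return the URLs to index, visiting pages at most max_depth link hops from start_url.

    `links` is a stand-in for real fetching: it maps each URL to the URLs it links to.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    selected = []
    while queue:
        url, depth = queue.popleft()
        if is_allowed(url, include, exclude):
            selected.append(url)
        if depth == max_depth:
            continue  # depth limit reached: do not follow links from this page
        for linked in links.get(url, []):
            if linked not in seen:
                seen.add(linked)
                queue.append((linked, depth + 1))
    return selected


# Made-up site structure for the example.
links = {
    "https://example.com/": ["https://example.com/docs", "https://example.com/blog"],
    "https://example.com/docs": ["https://example.com/docs/setup"],
    "https://example.com/docs/setup": ["https://example.com/docs/setup/advanced"],
}

print(plan_crawl("https://example.com/", links, max_depth=2, exclude=["*/blog*"]))
# ['https://example.com/', 'https://example.com/docs', 'https://example.com/docs/setup']
```

In this sketch, excluded pages are still followed for links and only indexing is filtered. Whether the connector behaves the same way is not stated here.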
## Sync Frequency

The Web Crawler connector supports scheduled re-crawling to keep your content up to date.

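As a rough picture of what a re-crawl schedule involves, the Python sketch below decides whether a site is due for another crawl. The function names and the idea of a fixed interval are assumptions for illustration. The actual scheduling options are whatever the connector settings expose.

```python
from datetime import datetime, timedelta, timezone


def next_crawl_time(last_crawl: datetime, interval: timedelta) -> datetime:
    """Hypothetical helper: the earliest time the next scheduled crawl may start."""
    return last_crawl + interval


def is_recrawl_due(last_crawl: datetime, interval: timedelta, now: datetime) -> bool:
    """Return True once at least `interval` has passed since the last crawl."""
    return now >= next_crawl_time(last_crawl, interval)


last_crawl = datetime(2026, 5, 1, 8, 0, tzinfo=timezone.utc)
daily = timedelta(days=1)

print(is_recrawl_due(last_crawl, daily, now=datetime(2026, 5, 1, 20, 0, tzinfo=timezone.utc)))  # False
print(is_recrawl_due(last_crawl, daily, now=datetime(2026, 5, 2, 8, 0, tzinfo=timezone.utc)))   # True
```

Passing `now` explicitly rather than reading the clock inside the function keeps the check easy to test.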