feat: add allow_subdomains and allow_external_links to CrawlConfig

Crawls are same-origin by default. Enable allow_subdomains to follow
sibling/child subdomains (blog.example.com from example.com), or
allow_external_links for full cross-origin crawling.
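
As a rough illustration of how the two flags could gate link-following, here is a minimal sketch. The `LinkScope` struct, the `in_scope` function, and the precedence between the flags are hypothetical and not taken from this commit. Same-origin is approximated by an exact host match, and the seed's root domain is assumed to be computed elsewhere.

```rust
/// Hypothetical stand-in for the two new CrawlConfig flags.
/// Both default to false, i.e. same-origin crawling.
#[derive(Debug, Default, Clone, Copy)]
struct LinkScope {
    allow_subdomains: bool,
    allow_external_links: bool,
}

/// Decide whether a discovered link's host may be followed.
/// Assumption: `seed_root` is the seed's root domain (e.g. "example.com"),
/// already extracted by the caller.
fn in_scope(scope: LinkScope, seed_host: &str, seed_root: &str, link_host: &str) -> bool {
    // Full cross-origin crawling: every host is accepted.
    if scope.allow_external_links {
        return true;
    }
    // Default behaviour: only the seed host itself.
    if link_host.eq_ignore_ascii_case(seed_host) {
        return true;
    }
    // Sibling/child subdomains: the host is the root domain or ends with ".<root>".
    if scope.allow_subdomains {
        let link = link_host.to_ascii_lowercase();
        let root = seed_root.to_ascii_lowercase();
        return link == root || link.ends_with(&format!(".{root}"));
    }
    false
}
```

Matching on ".<root>" rather than a bare suffix keeps a look-alike host such as evil-example.com out of scope when only allow_subdomains is set.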

Root domain extraction uses a heuristic that handles two-part TLDs
(co.uk, com.au). Includes 5 unit tests for root_domain().
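
The heuristic described above could look roughly like the sketch below. It is not the code from this commit: the signature of `root_domain`, the `TWO_PART_TLDS` list, and the label-counting approach are assumptions. Only the two suffixes named in the message are listed.

```rust
/// Two-part suffixes known to this sketch (illustrative, not exhaustive).
const TWO_PART_TLDS: &[&str] = &["co.uk", "com.au"];

/// Return the root domain of `host` by keeping the last two labels,
/// or the last three when the host ends in a known two-part suffix.
fn root_domain(host: &str) -> String {
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    let labels: Vec<&str> = host.split('.').collect();
    let n = labels.len();

    // With fewer than three labels there is nothing to strip.
    if n < 3 {
        return labels.join(".");
    }

    // Check whether the final two labels form a two-part suffix.
    let last_two = labels[n - 2..].join(".");
    let keep = if TWO_PART_TLDS.iter().any(|tld| *tld == last_two.as_str()) {
        3
    } else {
        2
    };
    labels[n - keep..].join(".")
}
```

A fixed list like this will miss suffixes it does not know about. A lookup against the Public Suffix List would be the more complete alternative, at the cost of a dependency.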

Bump to 0.3.12.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Valerio 2026-04-14 19:33:06 +02:00
parent a4c351d5ae
commit 050b2ef463
7 changed files with 109 additions and 17 deletions

@@ -1218,6 +1218,8 @@ async fn run_crawl(cli: &Cli) -> Result<(), String> {
exclude_patterns,
progress_tx: Some(progress_tx),
cancel_flag: Some(Arc::clone(&cancel_flag)),
allow_subdomains: false,
allow_external_links: false,
};
// Load resume state if --crawl-state file exists