fix(core+server): guard markdown pipe slice + detect trustpilot/reddit verify walls

Valerio 2026-04-23 15:26:31 +02:00
parent 966981bc42
commit a5c3433372
2 changed files with 6 additions and 3 deletions


@@ -6,10 +6,11 @@ Format follows [Keep a Changelog](https://keepachangelog.com/).
 ## [0.5.6] — 2026-04-23
 ### Added
-- `FetchClient::fetch_smart(url)` applies per-site rescue logic and returns the same `FetchResult` shape as `fetch()`. Reddit URLs route to the `.json` API, and Akamai-style challenge pages trigger a homepage cookie warmup plus a retry. Makes `/v1/scrape` on Reddit populate markdown again.
+- `FetchClient::fetch_smart(url)` applies per-site rescue logic and returns the same `FetchResult` shape as `fetch()`. Reddit URLs route to the `.json` API with an identifiable bot `User-Agent`, and Akamai-style challenge pages trigger a homepage cookie warmup plus a retry. Makes `/v1/scrape` on Reddit populate markdown again.
 ### Fixed
 - Regression introduced in 0.5.4 where the production server's `/v1/scrape` bypassed the Reddit `.json` shortcut and Akamai cookie warmup that `fetch_and_extract` had been providing. Both helpers now live in `fetch_smart` and every caller path picks them up.
+- Panic in the markdown converter (`markdown.rs:925`) on single-pipe `|` lines. A `[1..len-1]` slice on a 1-char input triggered `begin <= end`. Guarded.
 ---
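
The changelog entry says Reddit URLs are routed to the `.json` API. The routing code itself is not part of this diff, so the following is only a rough sketch of what such a rewrite could look like. The function name `reddit_json_url` and its exact rules (host matching, trailing-slash handling, keeping the query string) are assumptions made for illustration, not the project's implementation. Setting the bot `User-Agent` header is left out.

```rust
/// Hypothetical helper (not taken from the commit): rewrite a Reddit page URL
/// to the matching `.json` endpoint. Returns `None` for any other host or for
/// URLs that are not http(s).
fn reddit_json_url(url: &str) -> Option<String> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;

    // Split off the query string or fragment so it can be re-attached after `.json`.
    let (location, suffix) = match rest.find(|c: char| c == '?' || c == '#') {
        Some(i) => rest.split_at(i),
        None => (rest, ""),
    };

    // Separate the host from the path (the path keeps its leading slash).
    let (host, path) = match location.find('/') {
        Some(i) => location.split_at(i),
        None => (location, ""),
    };
    if host != "reddit.com" && !host.ends_with(".reddit.com") {
        return None;
    }

    let path = path.trim_end_matches('/');
    if path.ends_with(".json") {
        // Already a JSON endpoint: only normalise the scheme.
        return Some(format!("https://{host}{path}{suffix}"));
    }
    if path.is_empty() {
        // Bare host: the front page listing.
        return Some(format!("https://{host}/.json{suffix}"));
    }
    Some(format!("https://{host}{path}.json{suffix}"))
}
```

Under these assumptions, a thread URL with a trailing slash gains a `.json` suffix in place of the slash, a query string such as `?sort=new` is preserved after the suffix, and non-Reddit hosts fall through to the regular fetch path by returning `None`.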


@@ -920,8 +920,10 @@ fn strip_markdown(md: &str) -> String {
     continue;
 }
-// Convert table data rows: strip leading/trailing pipes, replace inner pipes with tabs
-if trimmed.starts_with('|') && trimmed.ends_with('|') {
+// Convert table data rows: strip leading/trailing pipes, replace inner pipes with tabs.
+// Require at least 2 chars so the slice `[1..len-1]` stays non-empty on single-pipe rows
+// (which aren't real tables anyway); a lone `|` previously panicked at `begin <= end`.
+if trimmed.len() >= 2 && trimmed.starts_with('|') && trimmed.ends_with('|') {
     let inner = &trimmed[1..trimmed.len() - 1];
     let cells: Vec<&str> = inner.split('|').map(|c| c.trim()).collect();
     lines.push(cells.join("\t"));
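
To see why the length check matters, the guarded branch can be pulled out into a small standalone function. This is a minimal sketch: the name `table_row_to_tsv` and the `Option` return type are hypothetical and exist only so the behaviour can be exercised outside `strip_markdown`. The body mirrors the lines in the hunk above.

```rust
/// Hypothetical standalone version of the guarded branch: returns the
/// tab-joined cells of a table data row, or `None` if the line is not one.
fn table_row_to_tsv(line: &str) -> Option<String> {
    let trimmed = line.trim();
    // A lone `|` both starts and ends with a pipe. Without the length check
    // the range below would be `1..0`, and slicing with it panics.
    if trimmed.len() >= 2 && trimmed.starts_with('|') && trimmed.ends_with('|') {
        // Both pipes are single-byte ASCII, so these byte offsets always fall
        // on character boundaries.
        let inner = &trimmed[1..trimmed.len() - 1];
        let cells: Vec<&str> = inner.split('|').map(|c| c.trim()).collect();
        Some(cells.join("\t"))
    } else {
        None
    }
}
```

With this sketch, `"| a | b |"` becomes `"a\tb"` and a lone `"|"` is rejected instead of panicking. A two-character `"||"` passes the guard and produces an empty string, because the range `1..1` is a valid, empty slice.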