mirror of
https://github.com/0xMassi/webclaw.git
synced 2026-04-25 00:06:21 +02:00
fix(core+server): guard markdown pipe slice + detect trustpilot/reddit verify walls
parent 966981bc42
commit a5c3433372
2 changed files with 6 additions and 3 deletions
@@ -6,10 +6,11 @@ Format follows [Keep a Changelog](https://keepachangelog.com/).
 ## [0.5.6] — 2026-04-23
 
 ### Added
-- `FetchClient::fetch_smart(url)` applies per-site rescue logic and returns the same `FetchResult` shape as `fetch()`. Reddit URLs route to the `.json` API, and Akamai-style challenge pages trigger a homepage cookie warmup plus a retry. Makes `/v1/scrape` on Reddit populate markdown again.
+- `FetchClient::fetch_smart(url)` applies per-site rescue logic and returns the same `FetchResult` shape as `fetch()`. Reddit URLs route to the `.json` API with an identifiable bot `User-Agent`, and Akamai-style challenge pages trigger a homepage cookie warmup plus a retry. Makes `/v1/scrape` on Reddit populate markdown again.
 
 ### Fixed
 - Regression introduced in 0.5.4 where the production server's `/v1/scrape` bypassed the Reddit `.json` shortcut and Akamai cookie warmup that `fetch_and_extract` had been providing. Both helpers now live in `fetch_smart` and every caller path picks them up.
+- Panic in the markdown converter (`markdown.rs:925`) on single-pipe `|` lines. A `[1..len-1]` slice on a 1-char input triggered `begin <= end`. Guarded.
 
 ---
 
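The changelog entry says Reddit URLs are routed to the `.json` API. The commit does not show how `fetch_smart` performs that rewrite, so the following is only a minimal sketch of one way it might be done. The helper name `reddit_json_url` is hypothetical, and the host check and query-string handling are assumptions, not the project's actual logic.

```rust
/// Hypothetical helper (not from the webclaw source): map a Reddit page URL
/// to its `.json` variant. Returns `None` for non-Reddit URLs and for URLs
/// that already end in `.json`.
fn reddit_json_url(url: &str) -> Option<String> {
    // Split off the query string, if any, so `.json` is appended to the path.
    let (base, query) = match url.split_once('?') {
        Some((b, q)) => (b, Some(q)),
        None => (url, None),
    };

    // Only http(s) URLs are considered; anything else yields `None`.
    let rest = base
        .strip_prefix("https://")
        .or_else(|| base.strip_prefix("http://"))?;

    // The host is everything up to the first `/`.
    let host = rest.split('/').next().unwrap_or("");
    let is_reddit = host == "reddit.com" || host.ends_with(".reddit.com");
    if !is_reddit || base.ends_with(".json") {
        return None;
    }

    // Drop trailing slashes before appending the suffix, then restore the query.
    let mut out = format!("{}.json", base.trim_end_matches('/'));
    if let Some(q) = query {
        out.push('?');
        out.push_str(q);
    }
    Some(out)
}
```

Under these assumptions, `reddit_json_url("https://www.reddit.com/r/rust/comments/abc/title/")` yields the same path with `.json` appended, while a non-Reddit URL yields `None` and would fall through to the regular fetch path.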
@@ -920,8 +920,10 @@ fn strip_markdown(md: &str) -> String {
             continue;
         }
 
-        // Convert table data rows: strip leading/trailing pipes, replace inner pipes with tabs
-        if trimmed.starts_with('|') && trimmed.ends_with('|') {
+        // Convert table data rows: strip leading/trailing pipes, replace inner pipes with tabs.
+        // Require at least 2 chars so the slice `[1..len-1]` stays non-empty on single-pipe rows
+        // (which aren't real tables anyway); a lone `|` previously panicked at `begin <= end`.
+        if trimmed.len() >= 2 && trimmed.starts_with('|') && trimmed.ends_with('|') {
             let inner = &trimmed[1..trimmed.len() - 1];
             let cells: Vec<&str> = inner.split('|').map(|c| c.trim()).collect();
             lines.push(cells.join("\t"));
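The guard in this hunk can be exercised in isolation. The sketch below lifts the table-row branch into a standalone function. The name `table_row_to_tsv` and the `Option` return type are assumptions made for illustration; in the commit the logic sits inline in `strip_markdown` and pushes onto `lines`.

```rust
/// Hypothetical stand-in for the table-row branch of `strip_markdown`.
/// Returns the tab-joined cells, or `None` when the line is not a table row.
fn table_row_to_tsv(line: &str) -> Option<String> {
    let trimmed = line.trim();
    // The length check keeps `1..len - 1` a valid range. For a lone `|`
    // the range would be `1..0`, which panics when used to slice a `str`.
    if trimmed.len() >= 2 && trimmed.starts_with('|') && trimmed.ends_with('|') {
        let inner = &trimmed[1..trimmed.len() - 1];
        let cells: Vec<&str> = inner.split('|').map(|c| c.trim()).collect();
        Some(cells.join("\t"))
    } else {
        None
    }
}
```

A lone `|` now falls through to `None` instead of panicking. With exactly two characters (`||`) the slice is `1..1`, which is empty but in range, so the row converts to an empty string. An alternative that avoids index arithmetic altogether is `trimmed.strip_prefix('|').and_then(|s| s.strip_suffix('|'))`, which returns `None` for a lone `|` because nothing is left to strip a suffix from.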