Commit graph

159 commits

Author SHA1 Message Date
Valerio
8fe8bcb479 chore(ci): bump actions/checkout and artifact actions to v5
GitHub flagged checkout@v4 / upload-artifact@v4 / download-artifact@v4
as Node.js 20 actions, force-migrated to Node 24 on 2026-06-02. Bump
all nine references to v5 ahead of the deadline. The artifact steps are
v5-compatible: upload uses a unique matrix-target name and the download
step flattens subdirectories with find afterward.
2026-05-21 15:11:29 +02:00
Valerio
51260ae4e3 chore(release): record v0.6.4 version bump and changelog
The v0.6.4 tag shipped the API surface discovery module but the
release commit left the workspace version at 0.6.3 with no matching
changelog entry. Bump [workspace.package] to 0.6.4 and add the
[0.6.4] CHANGELOG section so the code matches the tag.
2026-05-21 12:58:47 +02:00
Valerio
fe567a6af1
feat(core): endpoints module for API surface extraction from HTML and JS (#47)
* feat(core): endpoints module — extract API surface from HTML + JS bundles

* fix(docker): source CA bundle from distroless instead of apt (fixes arm64 release build)

* fix(test): serialize env-mutating CloudClient tests to stop flaky CI

* feat(core): filter endpoint-extractor noise (invalid hosts, schema domains, bare paths)
2026-05-19 19:05:16 +02:00
Valerio
be8bcfebd9
fix: harden resource limits, path safety, and WASM build (#46)
Security audit follow-up across the workspace:

- webclaw-core: keep the crate WASM-safe. quickjs/rquickjs is now a
  cfg(not(wasm32)) target dependency and the extraction entry point uses
  a direct call on wasm instead of spawning a thread, so it builds and
  runs on wasm32 with or without default features.
- webclaw-core: bound the structured-data scrubber recursion (depth cap)
  so deeply nested attacker JSON-LD / __NEXT_DATA__ cannot exhaust the
  stack.
- webclaw-fetch: stream the response body with a running ceiling so a
  small highly compressed payload cannot inflate to gigabytes in memory;
  redact user:pass@ from proxy URLs before they reach error strings.
- webclaw-cli: contain output filenames inside the chosen directory
  (reject .. / absolute, drop traversal path segments), run --webhook
  URLs through the public-URL SSRF guard, clamp --watch-interval to >=1s,
  and make research slug truncation char-safe.
- webclaw-mcp: char-safe slug truncation (no multibyte slice panic).
- setup.sh / deploy/hetzner.sh: replace eval on read input with
  printf -v, and mask auth key / API token in console output.
- CI: enforce the wasm32 build invariant for webclaw-core.

Tests added for every behavioral change. Bump to 0.6.3 + CHANGELOG.
2026-05-19 17:03:52 +02:00
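Two of the hardening items above, the running ceiling on the response body and the char-safe slug truncation, can be sketched with the standard library alone. This is a minimal illustration and not the code from this commit: the names `read_capped` and `truncate_chars`, the 8 KiB buffer, and the error kind are all assumptions.

```rust
use std::io::{self, Read};

/// Hypothetical helper: read a body in chunks and stop as soon as the
/// running total would exceed `max_bytes`, so a small compressed payload
/// cannot inflate without bound in memory.
fn read_capped<R: Read>(mut reader: R, max_bytes: usize) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf = [0u8; 8192]; // chunk size is an arbitrary choice
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            return Ok(out); // end of stream, still under the ceiling
        }
        if out.len() + n > max_bytes {
            // Check before appending so the buffer never grows past the cap.
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "body exceeds size ceiling",
            ));
        }
        out.extend_from_slice(&buf[..n]);
    }
}

/// Hypothetical helper: keep at most `max_chars` characters of `s`.
/// Slicing at a byte offset taken from `char_indices` always lands on a
/// character boundary, so multibyte input cannot trigger a slice panic.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // already short enough
    }
}
```

Calling `read_capped(std::io::repeat(0), 1024)` returns an error instead of reading forever, and `truncate_chars("héllo-wörld", 4)` yields `"héll"`, where a naive `&s[..4]` would panic because byte 4 falls inside a two-byte character.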
Valerio
aab51bea91 docs: add workflow examples 2026-05-18 18:56:00 +02:00
Valerio
b75b768ec3 Update Quantum Proxies sponsor copy 2026-05-18 18:50:38 +02:00
Valerio
3fabdc1d02
fix: clean llm output noise
Port the valid PR #43 LLM cleanup fixes onto current main without stale branch regressions.

Includes comment-count link cleanup, bare numeric paragraph cleanup, pagination leftover cleanup, JSON-LD article body scrubbing, clearer CLI consent-wall warnings, and quieter parser logs by default.

Thanks to @devnen for the report and patch work.
2026-05-18 18:39:33 +02:00
Valerio
5eef8358b0 docs: update sponsor partner details 2026-05-18 13:09:02 +02:00
Valerio
7dfd62ec1d docs: add proxy-seller studio partner 2026-05-18 12:37:28 +02:00
Valerio
6d886c44f6 docs: enlarge studio partner banner 2026-05-18 12:27:11 +02:00
Valerio
8e3ad17428 docs: tighten studio partner layout 2026-05-18 12:23:19 +02:00
Valerio
7321549412 docs: add studio partner section 2026-05-18 12:17:34 +02:00
Valerio
72edb61881
Merge pull request #42 from jal-co/docs/add-community-plugins
docs: add community plugins section
2026-05-16 11:24:33 +02:00
Valerio
00d86a12bc docs: refine community plugin copy 2026-05-16 11:19:15 +02:00
Justin Levine
c8be5214f6
docs: add community plugins section with OpenClaw and Hermes integrations 2026-05-15 17:51:22 -07:00
Valerio
0ea189c5b2 fix(ci): pass repository to release cli
2026-05-12 12:28:14 +02:00
Valerio
a629534490
fix(security): prepare 0.6.1 hardening
Merge the 0.6.1 security hardening release candidate after local and CI verification.
2026-05-12 12:16:42 +02:00
Valerio
fd2e75d509 chore(fetch): satisfy clippy for resolver setup 2026-05-12 12:09:18 +02:00
Valerio
e2f89941ac chore(release): prepare 0.6.1 2026-05-12 12:06:06 +02:00
Valerio
307b4f980d fix(extractors): harden marketplace host matching 2026-05-12 12:03:43 +02:00
Valerio
dbf9ce08a6 fix(ci): scope release workflow token permissions 2026-05-12 12:00:47 +02:00
Valerio
3bcb288d13 fix(fetch): guard challenge detection before utf8 decoding 2026-05-12 12:00:47 +02:00
Valerio
a611ae26f3 fix(security): harden local fetch surfaces 2026-05-12 12:00:25 +02:00
Valerio
af96628dc9
Revise README for clarity and updated content
Updated the README to reflect changes in the project description, banner image size, and various content sections. Enhanced clarity on features and usage.
2026-05-10 22:44:57 +02:00
devnen
e8ca1417d6
Improve --format llm output quality (#37)
Improve LLM-format output for modern news and documentation pages.

- Filter noisy hydration and low-value page chrome structured data while preserving content-bearing Schema.org records
- Fix element/text spacing without detaching punctuation on docs, forums, and reference pages
- Remove common accessibility link chrome from LLM text and link labels
- Bump workspace version to 0.6.0 and update the changelog

Thanks to Nenad Oric (@devnen) for the original PR and contribution.
2026-05-10 15:11:12 +02:00
Valerio
7f75143954 docs: update hosted api trial copy
2026-05-06 17:16:35 +02:00
Valerio
e6a95f783d chore: bump version to 0.5.9
2026-05-06 11:42:09 +02:00
Valerio
a3aa4bce6f fix: support LLM provider compatibility options
Closes #36
2026-05-06 11:36:53 +02:00
Valerio
86183b11e4 docs: credit Windows release contribution
2026-05-05 11:44:07 +02:00
SURYANSH MISHRA
513b0e493e ci: add Windows release artifacts
Closes #34
2026-05-05 11:38:30 +02:00
Valerio
a1242a1c1d docs: credit README badge refresh 2026-05-05 11:18:58 +02:00
Justin Levine
a542e45768
docs: refresh README badges
Replace README badges with shieldcn-styled badges.
2026-05-05 11:17:21 +02:00
Valerio
615f326660 docs: update changelog for brand extraction
2026-05-04 21:52:49 +02:00
Valerio
72b8dbc285 fix: improve brand extraction signals 2026-05-04 21:25:07 +02:00
Valerio
1c9def2fde fix: validate self-host route URLs consistently 2026-05-04 14:30:06 +02:00
Valerio
eede2f6953 docs: credit SSRF report
2026-05-04 12:08:11 +02:00
Valerio
bdf81fe6bf fix: harden fetch URL validation 2026-05-04 11:50:57 +02:00
Valerio
23544f8fac docs(claude): note youtube.rs role and yt-dlp short-circuit in server
The webclaw-core youtube module produces structured markdown but no
transcript; document that and point at the production server's
youtube_transcript.rs short-circuit for the full YoutubeData + caption
text shape.
2026-05-03 21:17:23 +02:00
Valerio
923445f4a8 docs(readme): add h1 brand heading
The repo had no heading-level brand anchor, only a banner image and
an h3 slogan. Search engines indexing the README were missing the
canonical brand signal. The new h1 is what GitHub renders as the
title of the page and what Google co-ranks with webclaw.io.

Bumps workspace version to 0.5.7.
2026-04-30 11:47:02 +02:00
Valerio
0e6c7cdc97
Add GitHub Sponsors username to FUNDING.yml
Updated funding model with GitHub Sponsors username.
2026-04-27 13:18:22 +02:00
Valerio
5795c5c422 docs(readme): add star history chart
2026-04-26 17:55:22 +02:00
Valerio
4908367720 docs(readme): add hosted API callout above Get Started
Surface webclaw.io as a clear alternative path for visitors who want
the antibot, JS rendering, async jobs, search, and watches the OSS
server doesn't ship. Sits between the value-prop and the install
instructions so self-host stays the primary on-ramp.
2026-04-26 17:15:44 +02:00
Valerio
a5c3433372 fix(core+server): guard markdown pipe slice + detect trustpilot/reddit verify walls
2026-04-23 15:26:31 +02:00
Valerio
966981bc42 fix(fetch): send bot-identifying UA on reddit .json API to bypass browser UA block
2026-04-23 15:17:04 +02:00
Valerio
866fa88aa0 fix(fetch): reject HTML verification pages served at .json reddit URL 2026-04-23 15:06:35 +02:00
Valerio
b413d702b2 feat(fetch): add fetch_smart with Reddit + Akamai rescue paths, bump 0.5.6 2026-04-23 14:59:29 +02:00
Valerio
98a177dec4 feat(cli): expose safari-ios browser profile + bump to 0.5.5 2026-04-23 13:32:55 +02:00
Valerio
e1af2da509 docs(claude): drop sidecar references, mention ProductionFetcher 2026-04-23 13:25:23 +02:00
Valerio
2285c585b1 docs(changelog): simplify 0.5.4 entry 2026-04-23 13:01:02 +02:00
Valerio
b77767814a Bump to 0.5.4: SafariIos profile + Chrome fingerprint alignment + locale helper
- New BrowserProfile::SafariIos mapped to BrowserVariant::SafariIos26.
  Built on wreq_util::Emulation::SafariIos26 with 4 overrides (TLS
  extension order, HTTP/2 HEADERS priority, real Safari iOS 26 headers,
  gzip/deflate/br). Matches bogdanfinn safari_ios_26_0 JA3
  8d909525bd5bbb79f133d11cc05159fe exactly. Empirically 9/10 on
  immobiliare.it with country-it residential.

- BrowserProfile::Chrome aligned to bogdanfinn chrome_133: dropped
  MAX_CONCURRENT_STREAMS from H2 SETTINGS, priority weight 256,
  explicit extension_permutation, advertise h3 in ALPN and ALPS.
  JA3 43067709b025da334de1279a120f8e14, akamai_fp
  52d84b11737d980aef856699f885ca86. Fixes indeed.com and other
  Cloudflare-fronted sites.

- New locale module: accept_language_for_url / accept_language_for_tld.
  TLD to Accept-Language mapping, unknown TLDs default to en-US.
  DataDome geo-vs-locale cross-checks are now trivially satisfiable.

- wreq-util bumped 2.2.6 to 3.0.0-rc.10 for Emulation::SafariIos26.
2026-04-23 12:58:24 +02:00
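The locale item in the 0.5.4 entry describes a TLD to Accept-Language lookup with an en-US fallback. A rough sketch of that idea follows. The function names are prefixed with `sketch_` because the real signatures of accept_language_for_url / accept_language_for_tld are not shown in the log, and the table entries, header values, and simplified URL parsing (no IPv6 hosts) are assumptions made for illustration.

```rust
/// Fallback header value; the log only states that unknown TLDs default
/// to en-US, so the exact q-values here are an assumption.
const DEFAULT_ACCEPT_LANGUAGE: &str = "en-US,en;q=0.9";

/// Hypothetical table: map a TLD to an Accept-Language header value.
fn sketch_accept_language_for_tld(tld: &str) -> &'static str {
    match tld.to_ascii_lowercase().as_str() {
        "it" => "it-IT,it;q=0.9,en;q=0.8",
        "de" => "de-DE,de;q=0.9,en;q=0.8",
        "fr" => "fr-FR,fr;q=0.9,en;q=0.8",
        _ => DEFAULT_ACCEPT_LANGUAGE,
    }
}

/// Simplified host parsing: strip the scheme, path, userinfo, and port,
/// then take the last dot-separated label. IPv6 literals are not handled.
fn tld_of_url(url: &str) -> Option<&str> {
    let rest = url.split_once("://").map(|(_, r)| r).unwrap_or(url);
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    let host = authority
        .rsplit_once('@')
        .map(|(_, h)| h)
        .unwrap_or(authority);
    let host = host.split(':').next()?;
    let tld = host.rsplit('.').next()?;
    if tld.is_empty() {
        None
    } else {
        Some(tld)
    }
}

/// Hypothetical wrapper: pick the header value from the URL's TLD,
/// falling back to the default when no TLD can be extracted.
fn sketch_accept_language_for_url(url: &str) -> &'static str {
    match tld_of_url(url) {
        Some(tld) => sketch_accept_language_for_tld(tld),
        None => DEFAULT_ACCEPT_LANGUAGE,
    }
}
```

With this table, a request to `https://www.immobiliare.it/annunci` would carry an Italian-first header, while a URL on an unlisted TLD gets the en-US default.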