Commit graph

  • a5c3433372 fix(core+server): guard markdown pipe slice + detect trustpilot/reddit verify walls main v0.5.6 Valerio 2026-04-23 15:26:31 +02:00
  • 966981bc42 fix(fetch): send bot-identifying UA on reddit .json API to bypass browser UA block Valerio 2026-04-23 15:17:04 +02:00
  • 866fa88aa0 fix(fetch): reject HTML verification pages served at .json reddit URL Valerio 2026-04-23 15:06:35 +02:00
  • b413d702b2 feat(fetch): add fetch_smart with Reddit + Akamai rescue paths, bump 0.5.6 Valerio 2026-04-23 14:59:29 +02:00
  • 98a177dec4 feat(cli): expose safari-ios browser profile + bump to 0.5.5 v0.5.5 Valerio 2026-04-23 13:32:55 +02:00
  • e1af2da509 docs(claude): drop sidecar references, mention ProductionFetcher Valerio 2026-04-23 13:25:23 +02:00
  • 2285c585b1 docs(changelog): simplify 0.5.4 entry v0.5.4 Valerio 2026-04-23 13:01:02 +02:00
  • b77767814a Bump to 0.5.4: SafariIos profile + Chrome fingerprint alignment + locale helper Valerio 2026-04-23 12:58:24 +02:00
  • 4bf11d902f fix(mcp): vertical_scrape uses Firefox profile, not default Chrome v0.5.3 Valerio 2026-04-22 23:18:11 +02:00
  • 0daa2fec1a feat(cli+mcp): vertical extractor support (28 extractors discoverable + callable) v0.5.2 Valerio 2026-04-22 21:41:15 +02:00
  • 058493bc8f feat(fetch): Fetcher trait so vertical extractors work under any HTTP backend v0.5.1 Valerio 2026-04-22 21:17:50 +02:00
  • aaa5103504 docs(claude): fix stale primp references, document wreq + Fetcher trait Valerio 2026-04-22 21:11:18 +02:00
  • 2373162c81 chore: release v0.5.0 (28 vertical extractors + cloud integration) v0.5.0 Valerio 2026-04-22 20:59:01 +02:00
  • b2e7dbf365 fix(extractors): perfect-score follow-ups (trustpilot 2025 schema, amazon/etsy fallbacks, cloud docs) Valerio 2026-04-22 17:49:50 +02:00
  • e10066f527 fix(cloud): synthesize HTML from cloud response instead of requesting raw html Valerio 2026-04-22 17:24:50 +02:00
  • a53578e45c fix(extractors): detect AWS WAF verifying-connection page, add OG fallback to ecommerce_product Valerio 2026-04-22 17:07:31 +02:00
  • 7f5eb93b65 feat(extractors): wave 6b, etsy_listing + HTML fallbacks for substack/youtube Valerio 2026-04-22 16:44:51 +02:00
  • 8cc727c2f2 feat(extractors): wave 6a, 5 easy verticals (27 total) Valerio 2026-04-22 16:33:35 +02:00
  • d8c9274a9c feat(extractors): wave 5 — Amazon, eBay, Trustpilot via cloud fallback Valerio 2026-04-22 16:16:11 +02:00
  • 0ab891bd6b refactor(cloud): consolidate CloudClient + smart_fetch into webclaw-fetch Valerio 2026-04-22 16:05:44 +02:00
  • 0221c151dc feat(extractors): wave 4 — ecommerce (shopify + generic JSON-LD) Valerio 2026-04-22 15:36:01 +02:00
  • 3bb0a4bca0 feat(extractors): add LinkedIn + Instagram with profile-to-posts fan-out Valerio 2026-04-22 14:39:49 +02:00
  • b041f3cddd feat(extractors): wave 2 — 8 more verticals (14 total) Valerio 2026-04-22 14:20:21 +02:00
  • 86182ef28a fix(server): switch default browser profile to Firefox Valerio 2026-04-22 14:11:55 +02:00
  • 8ba7538c37 feat(extractors): add vertical extractors module + first 6 verticals Valerio 2026-04-22 14:11:43 +02:00
  • ccdb6d364b fix(ci): release workflow must include webclaw-server v0.4.0 Valerio 2026-04-22 12:44:14 +02:00
  • eff914e84f Merge pull request #31 from 0xMassi/feat/oss-webclaw-server Valerio 2026-04-22 12:30:23 +02:00
  • c7e5abea8f docs(changelog): v0.4.0 release notes (#26, #29, #30) Valerio 2026-04-22 12:25:44 +02:00
  • d71eebdacc fix(mcp): silence dead-code warning on tool_router field (closes #30) Valerio 2026-04-22 12:25:39 +02:00
  • d91ad9c1f4 feat(cli): add webclaw bench <url> subcommand (closes #26) Valerio 2026-04-22 12:25:29 +02:00
  • 2ba682adf3 feat(server): add OSS webclaw-server REST API binary (closes #29) Valerio 2026-04-22 12:25:11 +02:00
  • b4bfff120e fix(docker): entrypoint shim so child images with custom CMD work (#28) v0.3.19 Valerio 2026-04-17 15:57:47 +02:00
  • 3396dc8ce7 fix(docker): entrypoint shim so child images with custom CMD work Valerio 2026-04-17 15:53:34 +02:00
  • e27ee1f86f docs(benchmarks): reproducible 3-way comparison vs trafilatura + firecrawl (#25) Valerio 2026-04-17 14:46:19 +02:00
  • 6116d2b38c docs(benchmarks): reproducible 3-way comparison vs trafilatura + firecrawl Valerio 2026-04-17 14:42:22 +02:00
  • 0463b5e263 style: cargo fmt v0.3.18 Valerio 2026-04-17 12:03:22 +02:00
  • 7f0420bbf0 fix(core): UTF-8 char boundary panic in find_content_position (#16) (#24) Valerio 2026-04-17 12:02:52 +02:00
  • 12d938fabf fix(core): UTF-8 char boundary panic in find_content_position (#16) Valerio 2026-04-17 11:58:54 +02:00
  • 095ae5d4b1 polish(fetch,mcp): robots parser + firefox client cache + Acquire ordering (P3) (#23) v0.3.17 Valerio 2026-04-16 20:21:32 +02:00
  • 2214c83782 polish(fetch,mcp): robots parser + firefox client cache + Acquire ordering (P3) Valerio 2026-04-16 20:18:02 +02:00
  • d69c50a31d feat(fetch,llm): DoS hardening + glob validation + cleanup (P2) (#22) Valerio 2026-04-16 19:44:08 +02:00
  • 05a2e0eced chore: gitignore CLI research dumps, drop accidentally-tracked file Valerio 2026-04-16 19:40:16 +02:00
  • 98c30365a2 feat(fetch,llm): DoS hardening via response caps + glob validation (P2) Valerio 2026-04-16 19:39:37 +02:00
  • 7773c8af2a fix(fetch): surface semaphore-closed as typed error instead of panic (P1) (#21) Valerio 2026-04-16 19:20:26 +02:00
  • 8a0a6fd8ce fix(fetch): surface semaphore-closed as typed error instead of panic (P1) Valerio 2026-04-16 19:14:00 +02:00
  • 1352f48e05 fix(cli): close --on-change command injection via sh -c (P0) (#20) Valerio 2026-04-16 18:37:02 +02:00
  • 63e94af2e7 chore(brand): fix clippy 1.95 unnecessary_sort_by errors Valerio 2026-04-16 18:32:27 +02:00
  • 9645239fe8 fix(cli): close --on-change command injection via sh -c (P0) Valerio 2026-04-16 18:17:04 +02:00
  • 6316b1a6e7 fix: handle raw newlines in JSON-LD strings Valerio 2026-04-16 11:40:25 +02:00
  • 78e198a347 fix: use ENTRYPOINT instead of CMD in Dockerfiles for proper arg passthrough v0.3.13 Valerio 2026-04-14 20:24:26 +02:00
  • 050b2ef463 feat: add allow_subdomains and allow_external_links to CrawlConfig v0.3.12 Valerio 2026-04-14 19:33:06 +02:00
  • 2e4343905f fix(noxa-68r.4): replace qdrant-client with plain reqwest REST calls Jacob Magar 2026-04-12 08:48:47 -04:00
  • 5217b99601 feat(noxa-68r.8): daemon binary + README Jacob Magar 2026-04-12 07:23:33 -04:00
  • 9cde0d66ca feat(noxa-68r.7): filesystem watcher pipeline Jacob Magar 2026-04-12 07:21:37 -04:00
  • d66522b8ae feat(noxa-68r.6): factory and TOML config Jacob Magar 2026-04-12 07:18:53 -04:00
  • 20e880eea5 feat(noxa-68r.2,noxa-68r.3,noxa-68r.4): chunker, TEI provider, Qdrant store Jacob Magar 2026-04-12 07:17:57 -04:00
  • 62554b8f12 feat(noxa-68r.1): crate scaffold, traits, and core types Jacob Magar 2026-04-12 07:13:19 -04:00
  • adf4b6ba55 feat(llm): add Gemini CLI provider as primary; set qwen3.5:9b as default Ollama model Jacob Magar 2026-04-12 00:52:53 -04:00
  • 464eb1baec Merge pull request #2 from jmagar/feature/noxa-mcp-subcommand jmagar 2026-04-11 21:38:16 -04:00
  • a25103667e refactor: add noxa mcp subcommand Jacob Magar 2026-04-11 20:44:25 -04:00
  • 251979edfe perf(gemini-cli): skip MCP server startup via workspace settings override Jacob Magar 2026-04-11 20:23:33 -04:00
  • c018ea4a61 Sync env example with env contract Jacob Magar 2026-04-11 19:30:51 -04:00
  • 042feb7887 refactor(llm): move timing to dispatch layer; keep CLI eprintln Jacob Magar 2026-04-11 17:06:53 -04:00
  • 534855955b feat(cli): print LLM elapsed time to stderr Jacob Magar 2026-04-11 16:52:13 -04:00
  • b8fc1d7c75 chore: update Cargo.lock after adding serde to noxa-cli Jacob Magar 2026-04-11 12:35:49 -04:00
  • 1c112459bc fix: error on explicit missing config path; update env.example; add README config docs Jacob Magar 2026-04-11 12:35:21 -04:00
  • 10364416c1 chore: slim .env.example to secrets/URLs only Jacob Magar 2026-04-11 12:29:05 -04:00
  • 9acecba314 chore: add config.example.json and gitignore config.json Jacob Magar 2026-04-11 12:29:00 -04:00
  • f22051491f fix: use resolved config in raw_html selector guard; remove dead path_prefix fallback Jacob Magar 2026-04-11 12:28:29 -04:00
  • bac13fc1b5 feat: wire ResolvedConfig into main.rs via clap ValueSource Jacob Magar 2026-04-11 12:24:44 -04:00
  • e7583a5c51 docs: clarify doc comments on ResolvedConfig selector and raw_html fields Jacob Magar 2026-04-11 12:18:53 -04:00
  • 3bc6a9920b feat: add NoxaConfig and ResolvedConfig with load() Jacob Magar 2026-04-11 12:16:56 -04:00
  • cc1617a3a9 fix(gemini-cli): correct CLI invocation to match gemini v0.36 interface Jacob Magar 2026-04-11 12:16:21 -04:00
  • cfe455b752 feat: derive Deserialize on OutputFormat, Browser, PdfModeArg Jacob Magar 2026-04-11 12:13:25 -04:00
  • af304eda7f docs(noxa-9fw.4): describe gemini cli as primary llm backend Jacob Magar 2026-04-11 07:36:19 -04:00
  • 993fd6c45d feat(noxa-9fw.3): validate structured extraction output with one retry Jacob Magar 2026-04-11 07:34:58 -04:00
  • 420a1d7522 feat(noxa-9fw.2): make gemini cli the primary llm backend Jacob Magar 2026-04-11 07:32:24 -04:00
  • d800c37bfd feat(noxa-9fw.1): add gemini cli provider adapter Jacob Magar 2026-04-11 07:30:41 -04:00
  • 8674b60b4e chore: rebrand webclaw to noxa Jacob Magar 2026-04-11 00:10:38 -04:00
  • a4c351d5ae feat: add fallback sitemap paths for broader discovery v0.3.11 Valerio 2026-04-10 18:22:57 +02:00
  • 25b6282d5f style: fix rustfmt for 2-element delay array v0.3.10 Valerio 2026-04-10 17:21:53 +02:00
  • 954aabe3e8 perf: reduce fetch timeout to 12s and retries to 2 Valerio 2026-04-10 17:18:57 +02:00
  • 5ea646a332 fix: resolve clippy warnings from #14 (collapsible_if, manual_inspect) v0.3.9 Valerio 2026-04-04 15:28:59 +02:00
  • 3cf9dbaf2a chore: bump to 0.3.9, fix formatting from #14 Valerio 2026-04-04 15:24:17 +02:00
  • 87ecf4241f fix: layout tables, stack overflow, and noise filter (#14) Valerio 2026-04-04 15:20:08 +02:00
  • 70c67f2ed6 fix: prevent noise filter from swallowing content in malformed HTML devnen 2026-04-04 01:33:11 +02:00
  • 74bac87435 fix: prevent stack overflow on deeply nested HTML pages devnen 2026-04-03 23:45:19 +02:00
  • 95a6681b02 fix: detect layout tables and render as sections instead of markdown tables devnen 2026-04-03 22:24:35 +02:00
  • 1d2018c98e fix: MCP research saves to file, returns compact response v0.3.8 Valerio 2026-04-03 16:05:45 +02:00
  • f7cc0cc5cf feat: CLI --research flag + MCP cloud fallback + structured research output v0.3.7 Valerio 2026-04-03 14:04:04 +02:00
  • 344eea74d9 feat: structured data in markdown/LLM output + v0.3.6 v0.3.6 Valerio 2026-04-02 19:16:56 +02:00
  • b219fc3648 fix(ci): update all 4 Homebrew checksums after Docker build completes Valerio 2026-04-02 19:02:27 +02:00
  • 8d29382b25 feat: extract __NEXT_DATA__ into structured_data v0.3.5 Valerio 2026-04-02 16:04:51 +02:00
  • 4e81c3430d docs: update npm package license to AGPL-3.0 Valerio 2026-04-02 11:33:43 +02:00
  • c43da982c3 docs: update README license references from MIT to AGPL-3.0 Valerio 2026-04-02 11:28:40 +02:00
  • 84b2e6092e feat: SvelteKit data extraction + license change to AGPL-3.0 v0.3.4 Valerio 2026-04-01 20:37:56 +02:00
  • b4800e681c ci: fix aarch64 cross-compilation for BoringSSL (boring-sys2) v0.3.3 Valerio 2026-04-01 18:39:43 +02:00
  • a1b9a55048 chore: add SKILL.md to repo root for skills.sh discoverability Valerio 2026-04-01 18:27:17 +02:00
  • 124352e0b4 style: cargo fmt Valerio 2026-04-01 18:25:40 +02:00
  • 1a5d3d8aaf chore: remove reqwest_unstable rustflag (no longer needed) Valerio 2026-04-01 18:15:05 +02:00