webclaw/CONTRIBUTING.md
Valerio c99ec684fa Initial release: webclaw v0.1.0 — web content extraction for LLMs
CLI + MCP server for extracting clean, structured content from any URL.
6 Rust crates, 10 MCP tools, TLS fingerprinting, 5 output formats.

MIT Licensed | https://webclaw.io
2026-03-23 18:31:11 +01:00

Contributing to Webclaw

Thanks for your interest in contributing. This document covers the essentials.

Development Setup

  1. Install Rust 1.85 or later (the workspace uses edition 2024, which requires it):

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    
  2. Clone and build:

    git clone https://github.com/0xMassi/webclaw.git
    cd webclaw
    cargo build --release
    

    RUSTFLAGS are configured in .cargo/config.toml, so no manual flags are needed.

  3. Optional: run ./setup.sh for environment bootstrapping.

Running Tests

cargo test --workspace          # All crates
cargo test -p webclaw-core      # Single crate

Linting

cargo clippy --all -- -D warnings
cargo fmt --check --all

Both must pass cleanly before submitting a PR.

Code Style

  • Rust edition 2024, formatted with rustfmt (see rustfmt.toml, style_edition = "2024")
  • webclaw-core has zero network dependencies -- keep it WASM-safe
  • webclaw-llm uses plain reqwest, not the patched TLS variant
  • Prefer returning Result over panicking. Never call .unwrap() on untrusted input.
  • Doc comments on all public items. Explain why, not what.
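
The last two rules can be illustrated with a minimal sketch. The names here (`parse_page_limit`, `LimitError`, the 1..=1000 range) are hypothetical and are not taken from the webclaw codebase; they only show one way to handle an untrusted value without `.unwrap()`, with doc comments that explain the reasoning rather than restate the code.

```rust
use std::fmt;

/// Error returned when a user-supplied page limit cannot be used.
///
/// A dedicated type lets callers tell "not a number" apart from
/// "out of range" without parsing error strings.
#[derive(Debug, PartialEq, Eq)]
pub enum LimitError {
    NotANumber(String),
    OutOfRange(u32),
}

impl fmt::Display for LimitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LimitError::NotANumber(raw) => write!(f, "not a number: {raw:?}"),
            LimitError::OutOfRange(n) => write!(f, "limit {n} is outside 1..=1000"),
        }
    }
}

impl std::error::Error for LimitError {}

/// Parses a page limit that arrived from outside the process.
///
/// Returns an error instead of panicking because the value is untrusted:
/// a typo in a flag should produce a message, not a crash.
/// (Hypothetical helper; the range is an assumption for illustration.)
pub fn parse_page_limit(raw: &str) -> Result<u32, LimitError> {
    let n: u32 = raw
        .trim()
        .parse()
        .map_err(|_| LimitError::NotANumber(raw.to_string()))?;
    if (1..=1000).contains(&n) {
        Ok(n)
    } else {
        Err(LimitError::OutOfRange(n))
    }
}
```

A caller can then propagate the failure with `?` or match on the variant to print a targeted message, which a panic would not allow.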

Pull Request Process

  1. Fork the repository and create a feature branch:

    git checkout -b feat/my-feature
    
  2. Make your changes. Write tests for new functionality.

  3. Ensure all checks pass:

    cargo test --workspace
    cargo clippy --all -- -D warnings
    cargo fmt --check --all
    
  4. Push and open a pull request against main.

  5. PRs require review before merging. Keep changes focused -- one concern per PR.

Commit Messages

Follow Conventional Commits:

feat: add PDF table extraction
fix: handle malformed sitemap XML gracefully
refactor: simplify crawler BFS loop
docs: update MCP setup instructions
test: add glob_match edge cases
chore: bump dependencies

Use the imperative mood ("add", not "added"). Keep the subject under 72 characters. A body is optional but encouraged for non-trivial changes.
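
The mechanical parts of these rules can be checked with a few lines of code. The sketch below is illustrative only: `check_subject` and `SubjectError` are hypothetical names, and the accepted type list is assumed to be exactly the six types shown in the examples. Imperative mood is not checked, since that needs human judgment.

```rust
/// Commit types taken from the examples above (assumed to be the full set).
const TYPES: [&str; 6] = ["feat", "fix", "refactor", "docs", "test", "chore"];

#[derive(Debug, PartialEq, Eq)]
pub enum SubjectError {
    MissingSeparator,
    UnknownType(String),
    EmptyDescription,
    TooLong(usize),
}

/// Checks a commit subject against the mechanical rules:
/// `type: description`, a known type, and fewer than 72 characters.
pub fn check_subject(subject: &str) -> Result<(), SubjectError> {
    let (head, description) = subject
        .split_once(": ")
        .ok_or(SubjectError::MissingSeparator)?;

    // Conventional Commits allows an optional scope and a breaking-change
    // marker, e.g. "feat(core)!". Strip both before comparing the type.
    let without_bang = head.trim_end_matches('!');
    let ty = without_bang.split('(').next().unwrap_or(without_bang);
    if !TYPES.contains(&ty) {
        return Err(SubjectError::UnknownType(ty.to_string()));
    }

    if description.trim().is_empty() {
        return Err(SubjectError::EmptyDescription);
    }

    // Count characters, not bytes, so non-ASCII subjects are measured fairly.
    let len = subject.chars().count();
    if len >= 72 {
        return Err(SubjectError::TooLong(len));
    }
    Ok(())
}
```

A check like this could run from a local commit-msg hook; whether the project uses one is not stated here.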

Reporting Issues

  • Search existing issues before opening a new one
  • Include: Rust version, OS, steps to reproduce, expected vs actual behavior
  • For extraction bugs: include the URL (or HTML snippet) and the output format used
  • Security issues: email the maintainers directly instead of opening a public issue

Crate Boundaries

Changes that cross crate boundaries need extra care:

| Crate         | Network?      | Key constraint                   |
|---------------|---------------|----------------------------------|
| webclaw-core  | No            | Zero network deps, WASM-safe     |
| webclaw-fetch | Yes (Impit)   | Requires [patch.crates-io]       |
| webclaw-llm   | Yes (reqwest) | Plain reqwest, not Impit-patched |
| webclaw-pdf   | No            | Minimal, wraps pdf-extract       |
| webclaw-cli   | Yes           | Depends on all above             |
| webclaw-mcp   | Yes           | MCP server via rmcp              |
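
One way to picture the core/fetch split is a pure function that only ever sees already-fetched text, with networking hidden behind a trait owned by another layer. This is a sketch under assumptions, not the actual webclaw API: `extract_title`, `Fetch`, `FetchError`, `StaticFetcher`, and `title_of` are all hypothetical names, and the title search is a naive string match rather than real HTML parsing.

```rust
/// Stand-in for a core-style function: takes HTML that someone else
/// fetched and performs no I/O, so it can compile to WASM unchanged.
/// (Naive, case-sensitive substring search; for illustration only.)
pub fn extract_title(html: &str) -> Option<&str> {
    let start = html.find("<title>")? + "<title>".len();
    let end = html[start..].find("</title>")? + start;
    let title = html[start..end].trim();
    if title.is_empty() { None } else { Some(title) }
}

/// Error type for the hypothetical fetch layer.
#[derive(Debug)]
pub struct FetchError(pub String);

/// The network side lives behind a trait, so the pure code above never
/// needs to name an HTTP client.
pub trait Fetch {
    fn fetch(&self, url: &str) -> Result<String, FetchError>;
}

/// In-memory fetcher, useful in tests where no network is available.
pub struct StaticFetcher(pub &'static str);

impl Fetch for StaticFetcher {
    fn fetch(&self, _url: &str) -> Result<String, FetchError> {
        Ok(self.0.to_string())
    }
}

/// Glue of the kind a binary crate might own: fetch first, then hand
/// the text to the pure function.
pub fn title_of(fetcher: &impl Fetch, url: &str) -> Result<Option<String>, FetchError> {
    let html = fetcher.fetch(url)?;
    Ok(extract_title(&html).map(str::to_owned))
}
```

With this shape, a change to extraction logic is testable with plain strings, and a change to networking does not touch the WASM-safe code.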