CLI + MCP server for extracting clean, structured content from any URL. 6 Rust crates, 10 MCP tools, TLS fingerprinting, 5 output formats. MIT Licensed | https://webclaw.io
Contributing to Webclaw
Thanks for your interest in contributing. This document covers the essentials.
Development Setup
- Install Rust 1.85+ (edition 2024 required):

  ```sh
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  ```

- Clone and build:

  ```sh
  git clone https://github.com/0xMassi/webclaw.git
  cd webclaw
  cargo build --release
  ```

  `RUSTFLAGS` are configured in `.cargo/config.toml`, so no manual flags are needed.

- Optional: run `./setup.sh` for environment bootstrapping.
Running Tests
```sh
cargo test --workspace        # All crates
cargo test -p webclaw-core    # Single crate
```
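Both commands pick up unit tests written in the usual Rust layout. A minimal sketch, assuming a hypothetical helper `normalize_whitespace` that is not part of webclaw: the tests live in a `#[cfg(test)]` module next to the code they cover, so they are compiled only for `cargo test` and never ship in release builds.

```rust
/// Collapses runs of whitespace into single spaces and trims both ends.
///
/// Hypothetical helper, used here only to illustrate where tests go.
pub fn normalize_whitespace(input: &str) -> String {
    // `split_whitespace` yields no empty pieces, so leading, trailing,
    // and repeated whitespace all disappear before re-joining.
    input.split_whitespace().collect::<Vec<_>>().join(" ")
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn collapses_internal_runs() {
        assert_eq!(normalize_whitespace("a \n\t b"), "a b");
    }

    #[test]
    fn blank_input_yields_empty_string() {
        assert_eq!(normalize_whitespace(""), "");
        assert_eq!(normalize_whitespace("   \n "), "");
    }
}
```

Keeping tests beside the code lets them exercise private items too; cross-crate behavior is better covered by integration tests in a crate's `tests/` directory.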
Linting
```sh
cargo clippy --all -- -D warnings
cargo fmt --check --all
```
Both must pass cleanly before submitting a PR.
Code Style
- Rust edition 2024, formatted with `rustfmt` (see `rustfmt.toml`, `style_edition = "2024"`)
- `webclaw-core` has zero network dependencies; keep it WASM-safe
- `webclaw-llm` uses plain `reqwest`, not the patched TLS variant
- Prefer returning `Result` over panicking. No `.unwrap()` on untrusted input.
- Doc comments on all public items. Explain why, not what.
Pull Request Process
- Fork the repository and create a feature branch:

  ```sh
  git checkout -b feat/my-feature
  ```

- Make your changes. Write tests for new functionality.

- Ensure all checks pass:

  ```sh
  cargo test --workspace
  cargo clippy --all -- -D warnings
  cargo fmt --check --all
  ```

- Push and open a pull request against `main`.

- PRs require review before merging. Keep changes focused: one concern per PR.
Commit Messages
Follow Conventional Commits:
```
feat: add PDF table extraction
fix: handle malformed sitemap XML gracefully
refactor: simplify crawler BFS loop
docs: update MCP setup instructions
test: add glob_match edge cases
chore: bump dependencies
```
Use the imperative mood ("add", not "added"). Keep the subject under 72 characters. Body is optional but encouraged for non-trivial changes.
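To catch malformed subjects before pushing, a contributor could run a small check like the one below. It is a hypothetical helper, not a tool shipped with webclaw, and it only accepts the six types used in the examples above (Conventional Commits permits others, such as `perf` or `ci`). Imperative mood is left to human review.

```rust
/// Commit types used in this guide's examples. Conventional Commits also
/// allows others (e.g. `perf`, `ci`, `build`); extend the list as needed.
const ALLOWED_TYPES: &[&str] = &["feat", "fix", "refactor", "docs", "test", "chore"];

/// Subjects must stay under this many characters.
const MAX_SUBJECT_LEN: usize = 72;

/// Checks a subject of the form `type: description` or
/// `type(scope): description`.
///
/// Returns the first problem found as a human-readable message. Imperative
/// mood is not checked because that needs a reviewer.
pub fn check_subject(subject: &str) -> Result<(), String> {
    // Count chars, not bytes, so non-ASCII subjects are measured fairly.
    let len = subject.chars().count();
    if len >= MAX_SUBJECT_LEN {
        return Err(format!(
            "subject is {len} characters; keep it under {MAX_SUBJECT_LEN}"
        ));
    }

    let (head, description) = subject
        .split_once(": ")
        .ok_or_else(|| "expected `type: description`".to_string())?;

    // Drop an optional `(scope)` and a trailing `!` (breaking-change marker).
    let commit_type = head
        .split('(')
        .next()
        .unwrap_or(head)
        .trim_end_matches('!');
    if !ALLOWED_TYPES.contains(&commit_type) {
        return Err(format!("unknown commit type {commit_type:?}"));
    }

    if description.trim().is_empty() {
        return Err("description is empty".to_string());
    }
    Ok(())
}
```

A check like this fits naturally in a local `commit-msg` hook, where it runs before the commit is recorded rather than after review.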
Reporting Issues
- Search existing issues before opening a new one
- Include: Rust version, OS, steps to reproduce, expected vs actual behavior
- For extraction bugs: include the URL (or HTML snippet) and the output format used
- Security issues: email directly instead of opening a public issue
Crate Boundaries
Changes that cross crate boundaries need extra care:
| Crate | Network? | Key constraint |
|---|---|---|
| webclaw-core | No | Zero network deps, WASM-safe |
| webclaw-fetch | Yes (Impit) | Requires [patch.crates-io] |
| webclaw-llm | Yes (reqwest) | Plain reqwest, not Impit-patched |
| webclaw-pdf | No | Minimal, wraps pdf-extract |
| webclaw-cli | Yes | Depends on all above |
| webclaw-mcp | Yes | MCP server via rmcp |
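One way to read the `webclaw-core` row: extraction code takes HTML that is already in memory and returns data, while fetching lives in a crate that is allowed network dependencies. The sketch below illustrates that split under stated assumptions: `extract_title`, `Fetcher`, and `fetch_title` are hypothetical names, not webclaw's API, and the naive string search stands in for a real HTML parser.

```rust
/// Pure extraction: HTML in, data out.
///
/// Uses only string operations, no sockets, files, or async runtime, so
/// code of this shape also compiles for `wasm32` targets.
/// (Hypothetical; a real extractor would use an HTML parser.)
pub fn extract_title(html: &str) -> Option<String> {
    // ASCII lowercasing keeps byte offsets identical, so indices found in
    // `lower` are valid slice boundaries in `html`.
    let lower = html.to_ascii_lowercase();
    let start = lower.find("<title")?;
    let open_end = start + lower[start..].find('>')? + 1;
    let close = open_end + lower[open_end..].find("</title>")?;
    let title = html[open_end..close].trim();
    (!title.is_empty()).then(|| title.to_string())
}

/// Network access behind a trait, so pure code never depends on it.
pub trait Fetcher {
    fn fetch(&self, url: &str) -> Result<String, String>;
}

/// Glue that would live in a network-enabled crate (e.g. a CLI).
pub fn fetch_title(fetcher: &dyn Fetcher, url: &str) -> Result<Option<String>, String> {
    let html = fetcher.fetch(url)?;
    Ok(extract_title(&html))
}
```

Because the pure function only sees a `&str`, its tests need no network, and a stub `Fetcher` is enough to test the glue.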