mirror of https://github.com/0xMassi/webclaw.git
synced 2026-04-25 00:06:21 +02:00
docs: update CONTRIBUTING.md for v0.3.0 architecture
- Replace Impit/primp references with webclaw-tls
- Add architecture diagram showing crate layout + TLS repo
- Update crate boundaries table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent e3b0d0bd74
commit ee3c714aa9
1 changed file with 20 additions and 3 deletions
@@ -40,7 +40,7 @@ Both must pass cleanly before submitting a PR.
 
 - Rust edition 2024, formatted with `rustfmt` (see `rustfmt.toml`, `style_edition = "2024"`)
 - `webclaw-core` has zero network dependencies -- keep it WASM-safe
-- `webclaw-llm` uses plain `reqwest`, not the patched TLS variant
+- `webclaw-llm` uses plain `reqwest` — LLM APIs don't need TLS fingerprinting
 - Prefer returning `Result` over panicking. No `.unwrap()` on untrusted input.
 - Doc comments on all public items. Explain *why*, not *what*.
 
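
The guideline "Prefer returning `Result` over panicking. No `.unwrap()` on untrusted input." can be illustrated with a minimal sketch. The function `parse_content_length` and the `HeaderError` type are hypothetical names chosen for this example; they are not taken from the webclaw codebase.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type for this sketch; not part of webclaw.
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    /// The header value was empty or whitespace only.
    Empty,
    /// The header value was not a valid unsigned integer.
    NotANumber(ParseIntError),
}

impl fmt::Display for HeaderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            HeaderError::Empty => write!(f, "empty header value"),
            HeaderError::NotANumber(e) => write!(f, "invalid number: {e}"),
        }
    }
}

impl std::error::Error for HeaderError {}

/// Parses a `Content-Length` value received from a remote server.
///
/// The remote side controls this string, so a malformed value is an
/// expected condition and is reported as an error instead of a panic.
pub fn parse_content_length(raw: &str) -> Result<u64, HeaderError> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        return Err(HeaderError::Empty);
    }
    trimmed.parse::<u64>().map_err(HeaderError::NotANumber)
}
```

Callers can then decide whether a bad value should abort the request or be skipped, which a `.unwrap()` inside the parser would not allow.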
@@ -87,6 +87,23 @@ Body is optional but encouraged for non-trivial changes.
 - For extraction bugs: include the URL (or HTML snippet) and the output format used
 - Security issues: email directly instead of opening a public issue
 
+## Architecture
+
+```
+webclaw (this repo)
+├── crates/
+│   ├── webclaw-core/     # Pure extraction engine (HTML → markdown/json/text)
+│   ├── webclaw-fetch/    # HTTP client + crawler + sitemap + batch
+│   ├── webclaw-llm/      # LLM provider chain (Ollama → OpenAI → Anthropic)
+│   ├── webclaw-pdf/      # PDF text extraction
+│   ├── webclaw-cli/      # CLI binary
+│   └── webclaw-mcp/      # MCP server binary
+│
+└── [patch.crates-io]     # Points to webclaw-tls for TLS fingerprinting
+```
+
+TLS fingerprinting lives in a separate repo: [webclaw-tls](https://github.com/0xMassi/webclaw-tls). The `[patch.crates-io]` section in `Cargo.toml` overrides rustls, h2, hyper, hyper-util, and reqwest with our patched forks for browser-grade JA4 + HTTP/2 Akamai fingerprinting.
+
 ## Crate Boundaries
 
 Changes that cross crate boundaries need extra care:
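
The diagram in the hunk above describes `webclaw-llm` as a provider chain (Ollama → OpenAI → Anthropic). Assuming the arrows denote fallback order, the idea can be sketched as follows. The `Provider` trait, the `Stub` type, and `complete_with_fallback` are hypothetical and only illustrate the pattern; the crate's real interface may look quite different.

```rust
/// Hypothetical provider interface; webclaw-llm's real trait may differ.
pub trait Provider {
    fn name(&self) -> &str;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Tries each provider in order and returns the name and reply of the
/// first one that succeeds. If every provider fails, returns one error
/// message per provider, in the order they were tried.
pub fn complete_with_fallback(
    chain: &[Box<dyn Provider>],
    prompt: &str,
) -> Result<(String, String), Vec<String>> {
    let mut errors = Vec::new();
    for provider in chain {
        match provider.complete(prompt) {
            Ok(text) => return Ok((provider.name().to_string(), text)),
            Err(e) => errors.push(format!("{}: {}", provider.name(), e)),
        }
    }
    Err(errors)
}

/// Stand-in provider so the chain can be exercised without network access.
pub struct Stub {
    pub name: &'static str,
    /// `None` simulates an unreachable provider.
    pub reply: Option<&'static str>,
}

impl Provider for Stub {
    fn name(&self) -> &str {
        self.name
    }

    fn complete(&self, _prompt: &str) -> Result<String, String> {
        self.reply
            .map(str::to_string)
            .ok_or_else(|| "unavailable".to_string())
    }
}
```

Collecting every failure, instead of keeping only the last one, makes it easier to see why the whole chain was exhausted.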
@@ -94,8 +111,8 @@ Changes that cross crate boundaries need extra care:
 | Crate | Network? | Key constraint |
 |-------|----------|----------------|
 | webclaw-core | No | Zero network deps, WASM-safe |
-| webclaw-fetch | Yes (Impit) | Requires `[patch.crates-io]` |
-| webclaw-llm | Yes (reqwest) | Plain reqwest, not Impit-patched |
+| webclaw-fetch | Yes (webclaw-http) | Uses [webclaw-tls](https://github.com/0xMassi/webclaw-tls) for TLS fingerprinting |
+| webclaw-llm | Yes (reqwest) | Plain reqwest — LLM APIs don't need TLS fingerprinting |
 | webclaw-pdf | No | Minimal, wraps pdf-extract |
 | webclaw-cli | Yes | Depends on all above |
 | webclaw-mcp | Yes | MCP server via rmcp |
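
The `webclaw-core` row in the table above ("Zero network deps, WASM-safe") implies that extraction code receives HTML that has already been fetched and performs no I/O itself. A minimal sketch of a function in that style, assuming a hypothetical `extract_title` helper that is not part of the real crate:

```rust
/// Hypothetical core-style helper: works on HTML already in memory and
/// performs no I/O, so nothing here prevents a WASM build.
///
/// This is a deliberately naive scan. It does not handle `<title>` tags
/// that carry attributes, and a real extractor would use an HTML parser.
pub fn extract_title(html: &str) -> Option<&str> {
    // ASCII lowercasing keeps byte offsets identical to the original
    // string, so indices found here are valid for slicing `html`.
    let lower = html.to_ascii_lowercase();
    let open = lower.find("<title>")? + "<title>".len();
    let close = open + lower[open..].find("</title>")?;
    let title = html[open..close].trim();
    if title.is_empty() {
        None
    } else {
        Some(title)
    }
}
```

Under this split, fetching stays in `webclaw-fetch` and the result is handed to the core as a plain `&str`, which keeps the network-free constraint easy to check in review.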