Commit graph

1 commit

Author SHA1 Message Date
feder-cr
3d8ba0b82c feat: deterministic reCAPTCHA cookie pre-seed via Bayesian browsing history
Adds opt-in helper that auto-injects coherent cookie history into every
BrowserContext created via new_context(). Content is fully deterministic
from the persona seed so a given seed always presents the same cookies
across sessions.

Composition (per persona, all derived from seed):
  - 5 cookies on .google.com (NID, CONSENT, SOCS, _GRECAPTCHA, ENID).
    Excludes 1P_JAR which was deprecated by Google in 2022. CONSENT
    `lang+region` token derived from the persona's IANA timezone
    (Europe/Rome -> it+IT, America/* -> en+FX, etc.). NID prefix
    broadened to 100-540 to cover historical versions.
  - Per-site cookies on 13-25 "visited" everyday domains, sampled from a
    Bayesian network conditioned on gpu_class - workstation/high_end
    personas trend toward dev/tech sites, low_end/integrated_old trend
    toward shop/news/reference. Each site contributes 1-7 cookies based
    on a `cookie_profile` tag. Cookie pool includes _ga, _gid, _clck,
    _clsk, __cf_bm, OneTrust/CookieYes consent, _fbp (Facebook Pixel),
    _dc_gtm_<id> (Tag Manager helper), __hssrc (HubSpot helper).

API:
    Stealthfox(seed=42, prep_recaptcha=True)

No per-call configuration: visited-sites + cookie composition all derived
from the persona seed via the Bayesian sampler.

Gated server-side: forced False if profile_dir is set (persistent profile
owns its own state). All expiries capped to 395 days per Chrome/Firefox
400-day RFC 6265bis-15 limit.

Bayesian integration:
  - New `derive_browsing_history(gpu_class, rng)` in _fpforge/_sampler.py
    (parallel to `derive_font_prefs`).
  - New data files: browsing_pool.json (50 site entries) and
    cpt_browsing_given_class.json (per-class probabilities).
  - Profile dataclass exposes `browsing_history` field.
  - _recaptcha_seed.py consumes Profile.browsing_history; receives
    timezone separately to derive CONSENT lang+region.

Also drops a dead Chromium-only e2e test that always skipped on our
Firefox-only wrapper.

Test coverage: 29 unit tests covering composition, profile recipes
(minimal/ga_only/ga_cf/ga_consent/ga_consent_clarity), determinism,
Chrome 400-day cap, Playwright field requirements, CONSENT lang
mapping (IT/DE/US/default), helper-cookie probability distributions,
end-to-end with real fpforge Profile.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 20:01:26 -07:00