feat(detection): stealth-browser detection for botasaurus-class scrapers (0.4.0) #1

Merged: cport1 merged 4 commits into main from feat/stealth-browser-detection on Jul 1, 2026
cport1 (Contributor) commented on Jul 1, 2026

Summary

0.4.0 adds two layers of defense against botasaurus-class scrapers, both validated against real botasaurus with a live harness.

F1 — stealth fingerprint detection (contributing signals)

  • New stealth category (native-function lie detection), AudioContext scoring, and broadened software-WebGL detection.
  • False-positive fix surfaced by live testing: dropped the webdriver_configurable and chrome_runtime_missing heuristics, which fired on real Chrome. Real Chrome's score fell from 36% (challenge) to 26% (allow).
  • Honest finding: real botasaurus in browser mode still scores ~30% (allow). Fingerprinting alone does not stop it, which is expected, since it is built to defeat fingerprinting.

F4 — tripwire deception (the actual catch)

  • tripwire({ paths, prefixes, patterns, includeDefaults }) rule + honeytoken() helper. Deterministic, zero-FP: a request for a hidden honeypot path is automated by construction. Flows through the existing DENY→403→violation pipeline (no middleware changes).
  • Live-validated: the same botasaurus crawler that evaded F1 is BLOCKED by the tripwire (GET /__wd/… → 403); real Chrome loading the same page never requests the hidden link → zero FP.
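
The matching logic behind a rule like this can be sketched roughly as follows. This is a hypothetical illustration, not the code in rules/tripwire-rule.ts: the name `sketchTripwire`, the `Decision` type, the choice of `/__wd/` as a default prefix, and `includeDefaults` defaulting to true are all assumptions made for the example.

```typescript
// Hypothetical sketch of a tripwire matcher; the package's real rule may differ.
interface TripwireOptions {
  paths?: string[];          // exact honeypot paths
  prefixes?: string[];       // honeypot path prefixes
  patterns?: RegExp[];       // regexes tested against the path
  includeDefaults?: boolean; // assumed to default to true
}

type Decision = "ALLOW" | "DENY";

// Assumption: "/__wd/" is a built-in honeypot prefix (the PR only shows
// that a GET under /__wd/ was blocked).
const DEFAULT_PREFIXES = ["/__wd/"];

function sketchTripwire(
  options: TripwireOptions = {}
): (requestUrl: string) => Decision {
  const paths = new Set(options.paths ?? []);
  const prefixes = [
    ...(options.includeDefaults === false ? [] : DEFAULT_PREFIXES),
    ...(options.prefixes ?? []),
  ];
  const patterns = options.patterns ?? [];

  return (requestUrl) => {
    // Match on the path only, so a query string or fragment cannot dodge the rule.
    const path = requestUrl.split(/[?#]/, 1)[0] ?? "";
    const hit =
      paths.has(path) ||
      prefixes.some((prefix) => path.startsWith(prefix)) ||
      patterns.some((re) => {
        re.lastIndex = 0; // keep global/sticky regexes stateless across requests
        return re.test(path);
      });
    return hit ? "DENY" : "ALLOW";
  };
}
```

Under these assumptions, `sketchTripwire({ paths: ["/old-admin"] })` returns a function that denies `/old-admin` and anything under `/__wd/`, and allows everything else. No scoring is involved, which is why the rule is deterministic: a path either is a honeypot or is not.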

Tests: 98 pass. Harness: packages/webdecoy/harness/ (server + page + botasaurus scripts).

Publish: tag v0.4.0 after merge triggers the npm workflow.

cport1 added 4 commits June 30, 2026 20:06
Add native-function lie/tampering detection (new 'stealth' category), AudioContext scoring, and broaden software-WebGL detection to match the client's suspiciousRenderer set. Targets botasaurus browser-mode evasions that strip navigator.webdriver but leave patched natives.

- client: collect native-function integrity (_getLieDetection)
- detection: analyzeLies (stealth category), analyzeAudioContext
- weights: add stealth category (0.30) — 0 for clean browsers, no FP
- harness: signal-profile measurement (packages/webdecoy/harness)

Measured (synthetic fixtures): headful botasaurus 24% (allow) -> 52% (challenge), cloud 33% -> 57%; real Chrome unchanged at 6%. 58 detection tests pass. Fixtures are synthetic; live-botasaurus validation pending.
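
For readers unfamiliar with native-function lie detection, the core idea can be sketched as below. This is a minimal illustration and not the client's _getLieDetection or the server's analyzeLies: `looksNative` and `findPatchedNatives` are hypothetical names. The check relies on the fact that engines render built-in functions with a `[native code]` body, whereas a function replaced by a script exposes its own source unless the tool also patches `toString`.

```typescript
// Hypothetical sketch of a native-function integrity check.

// True when the function's source text ends in "{ [native code] }",
// which is how engines render built-in functions.
function looksNative(fn: Function): boolean {
  // Call the prototype method directly so an own `toString` override
  // on the function object itself is ignored.
  const source = Function.prototype.toString.call(fn);
  return /\{\s*\[native code\]\s*\}\s*$/.test(source);
}

// Returns the names of candidates that no longer look like built-ins.
function findPatchedNatives(candidates: Record<string, Function>): string[] {
  return Object.entries(candidates)
    .filter(([, fn]) => !looksNative(fn))
    .map(([name]) => name);
}
```

Note that a bound function also stringifies with `[native code]`, and a tool that patches `Function.prototype.toString` itself defeats this single check, so a real detector would presumably combine several such probes.
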
…l Chrome

Live-harness testing against real headful Chrome showed webdriver_configurable and chrome_runtime_missing fire on genuine browsers (Chrome's webdriver descriptor is configurable; chrome.runtime is absent on ordinary pages), scoring a real user at 36% (challenge). Removing them: real Chrome 36%->26% (allow), zero environmental detections; the new stealth detectors stayed clean.

Adds the live harness (server.ts + page.html + botasaurus_test.py) used to find this.
- Native-function lie detection (new 'stealth' category) + AudioContext scoring
- Broadened software/virtualized WebGL detection to the client's suspiciousRenderer set
- Fix: drop Playwright heuristics (webdriver_configurable, chrome_runtime_missing) that produced false positives on real Chrome
- Live harness (server + page + botasaurus script) for real-browser validation

Live-validated: real Chrome -> allow (env layer clean), request-mode -> challenge. 64 tests pass.
…eption)

Adds tripwire({paths,prefixes,patterns,includeDefaults}) and honeytoken() (a hidden decoy link + its tripwire path). Any request for a honeypot path is automated by construction, so it DENYs through the existing rule pipeline (403 + violation report) with no middleware changes. Detects intent, not fingerprint, so stealth tools like botasaurus cannot evade it.

Live-validated against real botasaurus: the crawler that walked past F1 fingerprinting (browser mode -> allow) is BLOCKED by the tripwire (GET /__wd/... -> 403). Real Chrome loading the same page never requests the hidden link -> zero false positive.

- rules/tripwire-rule.ts, rules/honeytoken.ts + exports; rules/tripwire.test.ts (13 tests)
- harness: tripwire wired into server, honeytoken injected into page, botasaurus_crawl_test.py
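
A honeytoken of the kind described (a hidden decoy link plus its tripwire path) might look roughly like this. It is a sketch under stated assumptions rather than the contents of rules/honeytoken.ts: `sketchHoneytoken`, the returned shape, the `/__wd/` prefix default, the link text, and the hiding attributes are all hypothetical, and `Math.random` stands in for whatever token source the package uses.

```typescript
// Hypothetical sketch of a honeytoken: a decoy path and a link humans never see.
interface HoneytokenSketch {
  path: string; // path to register with the tripwire rule
  html: string; // markup to inject into the served page
}

function sketchHoneytoken(prefix: string = "/__wd/"): HoneytokenSketch {
  // Illustration only: Math.random is not cryptographically secure.
  const token = Math.random().toString(36).slice(2, 10);
  const path = `${prefix}${token}`;
  // Hidden from rendering, assistive technology, and keyboard focus, so a
  // person using a real browser has no way to follow the link by accident.
  const html =
    `<a href="${path}" style="display:none" aria-hidden="true" ` +
    `tabindex="-1" rel="nofollow">archive</a>`;
  return { path, html };
}
```

A server would inject `html` into the page and pass `path` (or the shared prefix) to the tripwire rule. A crawler that extracts and follows every `href` requests the decoy; a browser that only renders the page does not.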
cport1 merged commit 998561c into main on Jul 1, 2026