feat(detection): stealth-browser detection for botasaurus-class scrapers (0.4.0) #1
Merged
Add native-function lie/tampering detection (new 'stealth' category), AudioContext scoring, and broaden software-WebGL detection to match the client's suspiciousRenderer set. Targets botasaurus browser-mode evasions that strip navigator.webdriver but leave patched natives.

- client: collect native-function integrity (_getLieDetection)
- detection: analyzeLies (stealth category), analyzeAudioContext
- weights: add stealth category (0.30); it scores 0 for clean browsers, so no FP
- harness: signal-profile measurement (packages/webdecoy/harness)

Measured (synthetic fixtures): headful botasaurus 24% (allow) -> 52% (challenge), cloud 33% -> 57%; real Chrome unchanged at 6%. 58 detection tests pass. Live-botasaurus validation pending.
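Native-function lie detection generally rests on one observation: an untouched built-in stringifies as `function name() { [native code] }` and has no own `prototype`, while a JavaScript replacement installed by a stealth patch usually fails one of those tests. A minimal sketch under that assumption; `looksNative`, `sourceOf` and `detectLies` are hypothetical names, not the client's `_getLieDetection`:

```typescript
// Hypothetical sketch of native-function lie detection; it is not the
// client's actual _getLieDetection implementation.

// Source text engines return for built-ins, e.g.
// "function max() { [native code] }" or "function get webdriver() { [native code] }".
const NATIVE_SOURCE = /^function\s*[^(]*\(\s*\)\s*\{\s*\[native code\]\s*\}$/;

// Stringify a function; some proxy-based patches throw here.
function sourceOf(fn: Function): string | null {
  try {
    return Function.prototype.toString.call(fn);
  } catch {
    return null;
  }
}

// True if `fn` still looks like an untouched built-in method or accessor.
function looksNative(fn: unknown): boolean {
  if (typeof fn !== "function") return false;
  const source = sourceOf(fn);
  if (source === null || !NATIVE_SOURCE.test(source)) return false;
  // Built-in methods have no own `prototype`; a `function () {}` replacement does.
  return !Object.prototype.hasOwnProperty.call(fn, "prototype");
}

// Names of supplied methods/accessors that fail the check. Constructors are
// out of scope because they legitimately own a `prototype`.
function detectLies(targets: Record<string, unknown>): string[] {
  return Object.entries(targets)
    .filter(([, fn]) => !looksNative(fn))
    .map(([name]) => name);
}
```

In a page, targets would be things like `HTMLCanvasElement.prototype.toDataURL` or the getter from `Object.getOwnPropertyDescriptor(Navigator.prototype, "webdriver")`. A patch that also spoofs `Function.prototype.toString` defeats the string test on its own, which is why lie detectors usually combine several independent probes.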
…l Chrome

Live-harness testing against real headful Chrome showed that webdriver_configurable and chrome_runtime_missing fire on genuine browsers (Chrome's webdriver descriptor is configurable; chrome.runtime is absent on ordinary pages), scoring a real user at 36% (challenge). Removing them moves real Chrome from 36% to 26% (allow) with zero environmental detections; the new stealth detectors stayed clean. Also adds the live harness (server.ts + page.html + botasaurus_test.py) used to find this.
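The webdriver_configurable misfire follows from how browsers define attributes: under Web IDL, a regular (non-`[LegacyUnforgeable]`) attribute such as `navigator.webdriver` is an accessor on the interface prototype with `configurable: true`. A minimal sketch that reproduces the false positive on a mock prototype, so no DOM is needed; `naiveWebdriverConfigurable` and `stockProto` are hypothetical names, not the removed heuristic's code:

```typescript
// Hypothetical reproduction of the dropped heuristic against a mock
// prototype defined the way Web IDL defines attributes (no DOM needed).

// The naive check: "a configurable webdriver descriptor means it was patched".
function naiveWebdriverConfigurable(proto: object): boolean {
  const desc = Object.getOwnPropertyDescriptor(proto, "webdriver");
  return desc !== undefined && desc.configurable === true;
}

// An unmodified browser: Web IDL readonly attributes are enumerable,
// configurable accessors on the interface prototype.
const stockProto: object = {};
Object.defineProperty(stockProto, "webdriver", {
  get: () => false,
  enumerable: true,
  configurable: true,
});

// The heuristic fires on the stock definition, i.e. on genuine Chrome.
const falsePositive = naiveWebdriverConfigurable(stockProto); // true
```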
- Native-function lie detection (new 'stealth' category) + AudioContext scoring
- Broadened software/virtualized WebGL detection to the client's suspiciousRenderer set
- Fix: drop playwright heuristics (webdriver_configurable, chrome_runtime_missing) that false-positived on real Chrome
- Live harness (server + page + botasaurus script) for real-browser validation

Live-validated: real Chrome -> allow (env layer clean), botasaurus request mode -> challenge. 64 tests pass.
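Software/virtualized WebGL detection of this kind usually compares the unmasked renderer string (read through the `WEBGL_debug_renderer_info` extension's `UNMASKED_RENDERER_WEBGL` parameter) against known software rasterizers and VM adapters. A minimal sketch; the token lists are illustrative and are not the client's actual suspiciousRenderer set, and `classifyRenderer` is hypothetical:

```typescript
// Illustrative tokens only; the client's real suspiciousRenderer set may differ.
const SOFTWARE_RENDERERS = ["swiftshader", "llvmpipe", "softpipe", "microsoft basic render driver"];
const VIRTUALIZED_RENDERERS = ["vmware", "virtualbox", "parallels"];

type RendererClass = "software" | "virtualized" | "unflagged";

// Classify an unmasked WebGL renderer string by case-insensitive substring match.
function classifyRenderer(renderer: string): RendererClass {
  const r = renderer.toLowerCase();
  if (SOFTWARE_RENDERERS.some((token) => r.includes(token))) return "software";
  if (VIRTUALIZED_RENDERERS.some((token) => r.includes(token))) return "virtualized";
  return "unflagged";
}
```

In a page, the input would come from `gl.getParameter(ext.UNMASKED_RENDERER_WEBGL)` after `gl.getExtension("WEBGL_debug_renderer_info")`. Software renderers can also show up on real machines (for example, Chrome can fall back to SwiftShader when the GPU is blocklisted), which fits treating this as a contributing signal rather than a verdict.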
…eption)
Adds tripwire({paths, prefixes, patterns, includeDefaults}) and honeytoken() (a hidden decoy link plus its tripwire path). Any request for a honeypot path is automated by construction, so the rule DENYs it through the existing rule pipeline (403 + violation report) with no middleware changes. Because it detects intent rather than fingerprint, stealth tools like botasaurus cannot evade it by spoofing their fingerprint.
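A request trips the rule if its path hits any configured source. A minimal sketch of that matching, assuming `paths` are exact matches, `prefixes` are `startsWith` matches, `patterns` are regular expressions, and `includeDefaults` adds a built-in list; `makeTripwireMatcher` and `DEFAULT_PATHS` are hypothetical, and the package's real defaults are not listed here:

```typescript
// Hypothetical matcher that mirrors the option names of tripwire({...});
// the real rule's types, defaults, and return value may differ.
interface TripwireOptions {
  paths?: string[]; // exact honeypot paths
  prefixes?: string[]; // any path under one of these prefixes
  patterns?: RegExp[]; // avoid the `g` flag: RegExp#test is stateful with it
  includeDefaults?: boolean; // also trap a built-in list of probe paths
}

// Illustrative defaults only (common scanner probes), not the package's list.
const DEFAULT_PATHS = ["/.env", "/.git/config", "/wp-login.php"];

// Builds a predicate over URL pathnames (query string already stripped).
function makeTripwireMatcher(opts: TripwireOptions = {}): (path: string) => boolean {
  const exact = new Set<string>([
    ...(opts.paths ?? []),
    ...(opts.includeDefaults ? DEFAULT_PATHS : []),
  ]);
  const prefixes = opts.prefixes ?? [];
  const patterns = opts.patterns ?? [];
  return (path) =>
    exact.has(path) ||
    prefixes.some((p) => path.startsWith(p)) ||
    patterns.some((re) => re.test(path));
}
```

For example, `makeTripwireMatcher({ prefixes: ["/__wd/"] })` traps every decoy URL under that prefix, so per-page random suffixes never need to be registered individually.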
Live-validated against real botasaurus: the crawler that walked past F1 fingerprinting (browser mode -> allow) is BLOCKED by the tripwire (GET /__wd/... -> 403). Real Chrome loading the same page never requests the hidden link, so there are zero false positives.
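Real Chrome never trips it because the decoy link is present in the HTML but hidden from people, so only a client that harvests `href`s from markup follows it. A minimal sketch of generating such a link, assuming a `/__wd/` prefix (taken from the log line above) and a random suffix; `makeHoneytoken` is hypothetical and the real `honeytoken()` markup may differ:

```typescript
// Hypothetical honeytoken generator; the real honeytoken() helper's
// signature and markup may differ.
interface Honeytoken {
  path: string; // the tripwire path this link points at
  html: string; // markup to inject into served pages
}

// `prefix` is assumed to be a trusted constant, so it is not HTML-escaped.
function makeHoneytoken(prefix = "/__wd/"): Honeytoken {
  // A per-render suffix varies the exact URL; a prefix-based tripwire still
  // matches every variant. The suffix only needs to vary, not resist
  // prediction, so Math.random is sufficient here.
  const suffix = Math.random().toString(36).slice(2, 10) + Date.now().toString(36);
  const path = prefix + suffix;
  // Hidden visually (display:none), from assistive tech (aria-hidden) and
  // from keyboard focus (tabindex=-1); an HTML-parsing crawler still sees href.
  const html =
    `<a href="${path}" style="display:none" aria-hidden="true" ` +
    `tabindex="-1" rel="nofollow">&nbsp;</a>`;
  return { path, html };
}
```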
- rules/tripwire-rule.ts, rules/honeytoken.ts + exports; rules/tripwire.test.ts (13 tests)
- harness: tripwire wired into server, honeytoken injected into page, botasaurus_crawl_test.py
Summary
0.4.0: two layers against botasaurus-class scrapers, both validated against real botasaurus via a live harness.

F1: stealth fingerprint detection (contributing signals)

- New `stealth` category (native-function lie detection) + AudioContext scoring + broadened software-WebGL.
- Dropped `webdriver_configurable`/`chrome_runtime_missing` (fired on real Chrome). Real Chrome 36%→26% (allow).

F4: tripwire deception (the actual catch)

- `tripwire({ paths, prefixes, patterns, includeDefaults })` rule + `honeytoken()` helper. Deterministic, zero-FP: a request for a hidden honeypot path is automated by construction. Flows through the existing DENY→403→violation pipeline (no middleware changes).
- Live-validated: the botasaurus crawler that passed F1 (browser mode → allow) is blocked (`GET /__wd/… → 403`); real Chrome loading the same page never requests the hidden link → zero FP.

Tests: 98 pass. Harness: `packages/webdecoy/harness/` (server + page + botasaurus scripts).

Publish: tag `v0.4.0` after merge; the tag triggers the npm workflow.
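The F4 claim that the tripwire needs no middleware changes holds if the existing pipeline already handles any rule's DENY the same way. A minimal sketch of such a pipeline under assumed types; `Verdict`, `Rule`, `runRules` and `tripwireRule` are all hypothetical, since the package's real rule interface is not shown in this PR description:

```typescript
// Hypothetical types; the package's real rule interface may differ.
interface Verdict {
  action: "DENY";
  rule: string;
  reason: string;
}

// A rule inspects the request path and returns a verdict, or null for "no objection".
type Rule = (path: string) => Verdict | null;

type Outcome =
  | { blocked: true; status: 403; violation: Verdict }
  | { blocked: false };

// First DENY wins: report the violation and answer 403; otherwise pass through.
function runRules(rules: Rule[], path: string, report: (v: Verdict) => void): Outcome {
  for (const rule of rules) {
    const verdict = rule(path);
    if (verdict !== null) {
      report(verdict);
      return { blocked: true, status: 403, violation: verdict };
    }
  }
  return { blocked: false };
}

// The tripwire plugs in as one more rule; nothing else in the pipeline changes.
const tripwireRule: Rule = (path) =>
  path.startsWith("/__wd/")
    ? { action: "DENY", rule: "tripwire", reason: `honeypot path ${path}` }
    : null;
```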