feat: AsyncPlasmateCrawlerStrategy — lightweight alternative to Playwright (no Chrome) by dbhurley · Pull Request #1906 · unclecode/crawl4ai

dbhurley · 2026-04-08T11:55:01Z

Summary

Adds AsyncPlasmateCrawlerStrategy — a drop-in alternative to AsyncPlaywrightCrawlerStrategy using Plasmate instead of Chrome.

Directly addresses:

- #1256, "[Bug]: Memory Leak on Repeated /md Requests via Docker (MacOS) — Container Crashes Randomly Over Time": memory leak and crashes in Docker caused by Chrome processes. Plasmate uses ~64MB RAM per session vs ~300MB for Chrome and leaves no persistent browser process.
- #1874, "feat: expose token usage in CrawlResult (#1745)": token usage tracking. Plasmate returns pre-processed content (text / markdown / SOM) instead of raw HTML, cutting upstream token counts by 10-100× before any LLM call.

What Plasmate is

Open-source Rust browser engine (Apache 2.0). Fetches pages and returns them as Structured Object Model (SOM) — a compact, semantically clean representation with nav, ads, cookie banners, and boilerplate stripped. Install: pip install plasmate.

Compression measured across 45 real sites: 17.7× average, 77× peak. Every token saved before the LLM is a direct cost reduction.
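
To make the cost claim concrete, here is a rough back-of-the-envelope sketch. The 17.7× average and 77× peak ratios are the measurements quoted above, and the 75,000-token raw page size is the average from the Comparison table in this description. The price per million input tokens and the helper name `estimate_savings` are assumptions made up for illustration; substitute your own provider's pricing.

```python
def estimate_savings(raw_tokens: int, compression: float, usd_per_million: float) -> dict:
    """Estimate per-page token and cost savings for a given compression ratio."""
    compressed_tokens = raw_tokens / compression
    saved_tokens = raw_tokens - compressed_tokens
    return {
        "compressed_tokens": round(compressed_tokens),
        "saved_tokens": round(saved_tokens),
        "saved_usd": saved_tokens / 1_000_000 * usd_per_million,
    }


# ASSUMPTION: 3 USD per million input tokens is a placeholder price, not a quoted rate.
HYPOTHETICAL_USD_PER_MILLION = 3.0

avg = estimate_savings(75_000, 17.7, HYPOTHETICAL_USD_PER_MILLION)
peak = estimate_savings(75_000, 77.0, HYPOTHETICAL_USD_PER_MILLION)

if __name__ == "__main__":
    print("average case:", avg)
    print("peak case:   ", peak)
```

Per-page savings look small in isolation, but they scale linearly with crawl volume, so the same function can be multiplied by the number of pages in a job to size the effect.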

Drop-in usage

import asyncio

from crawl4ai import AsyncWebCrawler
from crawl4ai.async_plasmate_strategy import AsyncPlasmateCrawlerStrategy

strategy = AsyncPlasmateCrawlerStrategy(
    output_format="markdown",   # one of: text | markdown | som | links
    timeout=30,
    fallback_to_playwright=True,  # retry with Playwright for JS-heavy SPAs
)

async def main():
    # `async with` is only valid inside a coroutine, so the crawl lives in main()
    async with AsyncWebCrawler(crawler_strategy=strategy) as crawler:
        result = await crawler.arun("https://docs.python.org/3/")
        print(result.markdown[:500])

asyncio.run(main())

What changed

| File | Change |
| --- | --- |
| `crawl4ai/async_plasmate_strategy.py` | New `AsyncPlasmateCrawlerStrategy` implementing `AsyncCrawlerStrategy` ABC |
| `crawl4ai/__init__.py` | Export `AsyncPlasmateCrawlerStrategy` |
| `tests/general/test_plasmate_strategy.py` | 20 unit tests (init, cmd building, crawl, fallback, concurrency) |

Comparison

| | AsyncPlaywrightCrawlerStrategy | AsyncPlasmateCrawlerStrategy |
| --- | --- | --- |
| RAM per session | ~300MB | ~64MB |
| Chrome required | Yes | No |
| Tokens per page (avg) | ~75,000 (raw HTML) | ~4,200 (SOM/text) |
| JS rendering | Yes | No (use `fallback_to_playwright=True`) |
| Install | `playwright install` (~300MB browser) | `pip install plasmate` |
| Persistent process | Yes (browser stays alive) | No (subprocess per fetch) |
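
The fallback behaviour in the "JS rendering" row can be pictured with a generic sketch like the one below. This is not the code from this PR: the description does not say what triggers the fallback, so the sketch assumes "the lightweight fetch raised or returned empty content". `fetch_with_fallback`, `static_fetch`, and `browser_fetch` are hypothetical stand-ins that do no real network I/O.

```python
import asyncio
from typing import Awaitable, Callable

Fetcher = Callable[[str], Awaitable[str]]


async def fetch_with_fallback(url: str, primary: Fetcher, fallback: Fetcher) -> str:
    """Try the lightweight fetcher first; use the fallback if it fails or returns nothing."""
    try:
        content = await primary(url)
        if content.strip():
            return content
    except Exception:
        # ASSUMPTION: any primary failure is treated as a reason to fall back.
        pass
    return await fallback(url)


async def static_fetch(url: str) -> str:
    # Hypothetical stand-in: pretends pages under /spa need JavaScript and come back empty.
    return "" if "/spa" in url else f"static content from {url}"


async def browser_fetch(url: str) -> str:
    # Hypothetical stand-in for a full browser render.
    return f"rendered content from {url}"


async def main() -> list:
    return await asyncio.gather(
        fetch_with_fallback("https://example.com/docs", static_fetch, browser_fetch),
        fetch_with_fallback("https://example.com/spa", static_fetch, browser_fetch),
    )


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

With this shape, static pages never pay the browser cost, and only the pages that come back empty are retried with the heavier path.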

Notes

- No breaking changes: existing AsyncPlaywrightCrawlerStrategy usage is untouched
- fallback_to_playwright=True makes it safe for mixed static/SPA crawls
- Subprocess runs in an asyncio executor, so it is fully non-blocking and safe for concurrent gather() calls
- Tested with Python 3.9+
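
For readers unfamiliar with the executor pattern named in the notes, here is a minimal standard-library sketch of running a blocking subprocess from async code. It is an illustration under stated assumptions, not the implementation in this PR: a child Python process stands in for the real fetcher command, and `run_blocking` and `fetch` are hypothetical names.

```python
import asyncio
import subprocess
import sys


def run_blocking(cmd: list, timeout: float) -> str:
    """Run a command synchronously and return its stdout without surrounding whitespace."""
    completed = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=True
    )
    return completed.stdout.strip()


async def fetch(label: str, timeout: float = 30) -> str:
    loop = asyncio.get_running_loop()
    # ASSUMPTION: placeholder command; a real strategy would invoke its fetcher CLI here.
    cmd = [sys.executable, "-c", f"print({label!r})"]
    # The default thread-pool executor keeps the event loop free while the child runs.
    return await loop.run_in_executor(None, run_blocking, cmd, timeout)


async def main() -> list:
    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(fetch(f"page-{i}") for i in range(3)))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because each call spawns its own short-lived process, no state is shared between concurrent fetches, which is what makes the pattern safe under `gather()`.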

…laywright

Closes unclecode#1256 (memory leak in Docker from Chrome)
Related to unclecode#1874 (token usage tracking)

Plasmate (https://github.com/plasmate-labs/plasmate) is an open-source Rust browser engine that replaces Chrome/Playwright for static pages. No browser process, ~64MB RAM vs ~300MB, 10-100x fewer tokens per page.

Changes:
- crawl4ai/async_plasmate_strategy.py: AsyncPlasmateCrawlerStrategy
  - Implements AsyncCrawlerStrategy ABC (drop-in replacement)
  - Supports output_format: text (default), markdown, som, links
  - Supports --selector, --header, --timeout flags
  - Optional fallback_to_playwright=True for JS-heavy SPAs
  - Subprocess runs in asyncio executor, safe for concurrent use
- crawl4ai/__init__.py: export AsyncPlasmateCrawlerStrategy
- tests/general/test_plasmate_strategy.py: 20 unit tests

Install: pip install plasmate

Usage:

    from crawl4ai import AsyncWebCrawler
    from crawl4ai.async_plasmate_strategy import AsyncPlasmateCrawlerStrategy

    strategy = AsyncPlasmateCrawlerStrategy(
        output_format="markdown",
        fallback_to_playwright=True,  # SPA safety net
    )

    async with AsyncWebCrawler(crawler_strategy=strategy) as crawler:
        result = await crawler.arun("https://docs.python.org/3/")

dbhurley wants to merge 1 commit into unclecode:develop from dbhurley:feat/plasmate-crawler-strategy
