Experimental DuckDB extension that wraps remote filesystems with an opt-in parallel read layer. IO concurrency is controlled independently of DuckDB's `PRAGMA threads`, letting you keep compute threads low while still saturating network bandwidth.
httpfs2 intercepts `Read(..., location)` calls on remote paths (S3, HTTP, etc.) and splits large reads into parallel sub-reads dispatched to a separate IO thread pool. DuckDB's `ExternalFileCache` and all other filesystem behavior stay intact.
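The splitting idea can be sketched in plain Python. This is illustrative only: `parallel_read`, `fetch`, and the range math are my assumptions about the approach, not the extension's actual C++ internals; the setting names in the comments map to the knobs documented below.

```python
from concurrent.futures import ThreadPoolExecutor

# Separate IO pool, sized independently of DuckDB's compute threads
# (mirrors httpfs2_parallel_read_io_threads).
IO_POOL = ThreadPoolExecutor(max_workers=16)

def parallel_read(read_range, fetch, chunk_size=16 * 2**20,
                  max_fanout=4, min_bytes=2 * 2**20):
    """Split one (offset, length) read into parallel sub-reads (sketch).

    `fetch(offset, length) -> bytes` stands in for a single remote
    range request (e.g. an S3 GET with a Range header).
    """
    offset, length = read_range
    if length < min_bytes or max_fanout <= 1:
        return fetch(offset, length)  # small read: not worth splitting

    # Number of sub-reads: bounded by the target chunk size and max fanout.
    n = min(max_fanout, max(1, -(-length // chunk_size)))  # ceil division
    base, rem = divmod(length, n)
    parts, pos = [], offset
    for i in range(n):
        size = base + (1 if i < rem else 0)
        parts.append((pos, size))
        pos += size

    # Dispatch sub-reads to the IO pool and stitch results back in order.
    futures = [IO_POOL.submit(fetch, off, sz) for off, sz in parts]
    return b"".join(f.result() for f in futures)
```

Because the pool is separate from the query engine's threads, a single DuckDB scan thread can keep several range requests in flight at once.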
```sql
LOAD httpfs;  -- or any remote FS extension
LOAD httpfs2;
SET httpfs2_parallel_read_enabled = true;
SET httpfs2_parallel_read_recommended_preset_enabled = true;
```

The recommended preset uses conservative defaults safe for real S3/R2/GCS: fanout=2, chunk=16 MiB, min_bytes=2 MiB.
| Setting | Default | Description |
|---|---|---|
| `httpfs2_parallel_read_enabled` | `false` | Enable parallel read splitting for remote files |
| `httpfs2_parallel_read_io_threads` | 16 | IO worker pool size (separate from DuckDB threads) |
| `httpfs2_parallel_read_chunk_size_bytes` | 16 MiB | Target chunk size per sub-read |
| `httpfs2_parallel_read_max_fanout` | 4 | Max parallel sub-reads per `Read()` call |
| `httpfs2_parallel_read_min_bytes` | 2 MiB | Minimum read size before splitting kicks in |
| `httpfs2_parallel_read_recommended_preset_enabled` | `false` | Override to safe defaults (fanout=2, chunk=16 MiB) |
| `httpfs2_parallel_read_dedicated_handles` | `false` | Open extra file handles per remote file to reduce httpfs contention |
| `httpfs2_parallel_read_auto_fanout_enabled` | `false` | Use S3 cost model to choose fanout per read |
| `httpfs2_parallel_read_auto_fanout_overhead_ms` | 500 | Penalty (ms) per extra sub-request in auto-fanout |
| `httpfs2_parallel_read_auto_fanout_overhead_adaptive_enabled` | `false` | Learn overhead from observed timings (EWMA) |
Use it when:
- Remote reads are the bottleneck, not CPU
- You want to cap DuckDB compute threads but still overlap IO
- You're latency/RTT-bound and need more in-flight range reads
Don't use it when:
- Queries are CPU-bound
- DuckDB already has enough threads to saturate the network
- Single core only (overhead tends to negate the benefit)
All benchmarks: RustFS (S3 API) via Docker, 512 MiB object, 128 MiB read calls, p50 over 3 repeats.
| Configuration | Best p50 MB/s | CPU % |
|---|---|---|
| Native DuckDB, threads=2 | 1,455 | 51% |
| httpfs2, io_threads=8, DuckDB threads=1 | 1,816 | 68% |
httpfs2 beats native threading by ~25% while keeping DuckDB at 1 thread.
Native DuckDB threads (httpfs2 off):
| threads | p50 MB/s |
|---|---|
| 1 | 349 |
| 2 | 422 |
| 4 | 239 |
httpfs2 (DuckDB threads=1, fanout=4, chunk=16MiB):
| io_threads | p50 MB/s |
|---|---|
| 1 | 365 |
| 2 | 391 |
| 4 | 275 |
Under real latency, native threads=2 wins against httpfs2 at threads=1. But combining httpfs2 with higher DuckDB thread counts helps:
| DuckDB threads | httpfs (baseline) | httpfs2 recommended | httpfs2 auto+adaptive |
|---|---|---|---|
| 1 | 243 MB/s | 213 (-12.6%) | 220 (-9.7%) |
| 2 | 241 MB/s | 284 (+18.0%) | 267 (+11.1%) |
| 4 | 228 MB/s | 257 (+12.7%) | 291 (+27.8%) |
- High per-read fanout (>=4) hurts under real latency. Keep `max_fanout=2` for production.
- The win comes from decoupling IO concurrency from compute threads, not from splitting individual reads aggressively.
- Best results are on 2+ cores with RTT-bound workloads at moderate DuckDB thread counts.
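The "auto+adaptive" column above corresponds to learning the per-sub-request overhead from observed timings instead of using the fixed 500 ms default. A minimal EWMA sketch of that idea (the smoothing factor and update rule are my assumptions; the extension's actual values are not documented here):

```python
class EwmaOverhead:
    """Exponentially weighted moving average of observed per-request overhead."""

    def __init__(self, initial_ms=500.0, alpha=0.2):
        self.value_ms = initial_ms
        self.alpha = alpha  # weight given to the newest observation

    def update(self, observed_ms):
        # Standard EWMA: new = alpha * observation + (1 - alpha) * old
        self.value_ms = self.alpha * observed_ms + (1 - self.alpha) * self.value_ms
        return self.value_ms
```

Repeated updates with low observed overhead drive the estimate down, which in turn lets a cost model like the one sketched earlier justify higher fanout on low-latency links.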
```sh
# Build Linux artifacts in Docker
docker run --rm -v "$PWD:/work" -w /work httpfs2-bench bash -lc \
  'bash scripts/docker_build_linux_artifacts.sh'

# Start local S3 (RustFS) + Toxiproxy
docker compose up -d rustfs
RUSTFS_TOXI_LATENCY_MS=50 bash scripts/rustfs_toxiproxy_setup.sh

# Run full matrix (default)
RUSTFS_TOXI_LATENCY_MS=50 bash scripts/rustfs_local_bench_toxiproxy.sh

# Run a specific sweep mode
HTTPFS2_BENCH_MODE=low-threads HTTPFS2_DUCKDB_THREADS_LIST="1 2 4" \
  RUSTFS_TOXI_LATENCY_MS=50 bash scripts/rustfs_local_bench_toxiproxy.sh

# Summarize results
uv run scripts/summarize_bench_ascii.py results.csv
uv run scripts/summarize_bench_ascii.py results.csv --compare
```

Available bench modes: `full`, `recommended`, `auto-adaptive`, `overhead-sweep`, `low-threads`, `single-scan`, `pool-tasks`, `pool-fanout`, `fanout-sweep`. See `scripts/docker_s3_bench.sh` for details.
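The "p50 over 3 repeats" aggregation used throughout the benchmarks is just a per-configuration median. A minimal sketch of that summarization step, assuming a CSV with `config` and `mbps` columns (the real `summarize_bench_ascii.py` and its column names may differ):

```python
import csv
import io
from statistics import median

def p50_by_config(csv_text):
    """Group throughput samples by configuration and take the median of each."""
    rows = csv.DictReader(io.StringIO(csv_text))
    by_cfg = {}
    for r in rows:
        by_cfg.setdefault(r["config"], []).append(float(r["mbps"]))
    return {cfg: median(samples) for cfg, samples in by_cfg.items()}
```

With three repeats per configuration, the median discards one outlier in either direction, which is why it is preferred over the mean for noisy network benchmarks.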
This project was inspired by and builds on ideas from:
- duck-read-cache-fs - DuckDB filesystem wrapper with read caching and parallel subrequests
- duckdb-diskcache - DuckDB disk cache extension with S3 cost model and merge-if-cheaper heuristics