
# httpfs2

Experimental DuckDB extension that wraps remote filesystems with an opt-in parallel read layer. IO concurrency is controlled independently from DuckDB's PRAGMA threads, letting you keep compute threads low while still saturating network bandwidth.

## How it works

httpfs2 intercepts `Read(..., location)` calls on remote paths (S3, HTTP, etc.) and splits large reads into parallel sub-reads dispatched to a separate IO thread pool. DuckDB's `ExternalFileCache` and all other filesystem behavior remain intact.
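The splitting logic can be sketched as follows. This is an illustrative Python model, not the extension's actual code; `fetch` and `read_range` are hypothetical stand-ins, and the constants mirror the settings listed below.

```python
# Illustrative model of httpfs2's read splitting (hypothetical code,
# not taken from the extension). One logical Read() becomes up to
# MAX_FANOUT range requests executed on a dedicated IO pool.
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 16 * 1024 * 1024   # httpfs2_parallel_read_chunk_size_bytes
MAX_FANOUT = 4                  # httpfs2_parallel_read_max_fanout
MIN_BYTES = 2 * 1024 * 1024     # httpfs2_parallel_read_min_bytes

# IO pool sized independently of DuckDB's compute threads
io_pool = ThreadPoolExecutor(max_workers=16)

def read_range(fetch, offset: int, length: int) -> bytes:
    """fetch(offset, length) performs a single remote range request."""
    if length < MIN_BYTES:
        return fetch(offset, length)            # small reads pass through
    # Number of sub-reads: one per chunk, capped by the fanout limit.
    n = min(MAX_FANOUT, max(1, -(-length // CHUNK_SIZE)))
    base, rem = divmod(length, n)
    futures, pos = [], offset
    for i in range(n):
        size = base + (1 if i < rem else 0)     # spread the remainder
        futures.append(io_pool.submit(fetch, pos, size))
        pos += size
    return b"".join(f.result() for f in futures)
```

With these defaults, a 128 MiB read would be capped at 4 sub-reads of 32 MiB each, while anything under 2 MiB passes through untouched.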

## Build

```sh
make
```

## Quick start

```sql
LOAD httpfs;   -- or any remote FS extension
LOAD httpfs2;

SET httpfs2_parallel_read_enabled = true;
SET httpfs2_parallel_read_recommended_preset_enabled = true;
```

The recommended preset applies conservative defaults that are safe for real S3/R2/GCS: `fanout=2`, `chunk=16 MiB`, `min_bytes=2 MiB`.

## Settings

| Setting | Default | Description |
|---|---|---|
| `httpfs2_parallel_read_enabled` | `false` | Enable parallel read splitting for remote files |
| `httpfs2_parallel_read_io_threads` | `16` | IO worker pool size (separate from DuckDB threads) |
| `httpfs2_parallel_read_chunk_size_bytes` | 16 MiB | Target chunk size per sub-read |
| `httpfs2_parallel_read_max_fanout` | `4` | Max parallel sub-reads per `Read()` call |
| `httpfs2_parallel_read_min_bytes` | 2 MiB | Minimum read size before splitting kicks in |
| `httpfs2_parallel_read_recommended_preset_enabled` | `false` | Override to safe defaults (`fanout=2`, `chunk=16 MiB`) |
| `httpfs2_parallel_read_dedicated_handles` | `false` | Open extra file handles per remote file to reduce httpfs contention |
| `httpfs2_parallel_read_auto_fanout_enabled` | `false` | Use S3 cost model to choose fanout per read |
| `httpfs2_parallel_read_auto_fanout_overhead_ms` | `500` | Penalty (ms) per extra sub-request in auto-fanout |
| `httpfs2_parallel_read_auto_fanout_overhead_adaptive_enabled` | `false` | Learn overhead from observed timings (EWMA) |
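The auto-fanout settings suggest a simple cost model: each extra sub-request divides the transfer time but adds a fixed latency penalty. A hypothetical Python sketch of such a model (an assumed form, not the extension's exact formula):

```python
# Hypothetical auto-fanout cost model (assumed form, not the
# extension's exact formula): pick the fanout that minimizes the
# estimated wall time, charging overhead_ms per extra sub-request.
def choose_fanout(read_bytes: int, bandwidth_bytes_per_s: float,
                  overhead_ms: float = 500.0, max_fanout: int = 4) -> int:
    def est_ms(n: int) -> float:
        # Transfer time shrinks with fanout; overhead grows with it.
        transfer_ms = read_bytes / (bandwidth_bytes_per_s * n) * 1000.0
        return transfer_ms + (n - 1) * overhead_ms
    return min(range(1, max_fanout + 1), key=est_ms)
```

Under this model, a 128 MiB read at 100 MB/s picks fanout 2 (splitting further costs more in per-request overhead than it saves in transfer time), while a 2 MiB read stays at fanout 1.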

## When to use httpfs2

Use it when:

- Remote reads are the bottleneck, not CPU
- You want to cap DuckDB compute threads but still overlap IO
- You're latency/RTT-bound and need more in-flight range reads

Don't use it when:

- Queries are CPU-bound
- DuckDB already has enough threads to saturate the network
- You only have a single core (the overhead tends to negate the benefit)

## Benchmark results

All benchmarks: RustFS (S3 API) via Docker, 512 MiB object, 128 MiB read calls, p50 over 3 repeats.

### Local S3 (no injected latency), 2 CPUs pinned

| Configuration | Best p50 MB/s | CPU % |
|---|---|---|
| Native DuckDB, `threads=2` | 1,455 | 51% |
| httpfs2 `io_threads=8`, DuckDB `threads=1` | 1,816 | 68% |

httpfs2 beats native threading by ~25% while keeping DuckDB at 1 thread.

### With 100 ms RTT (Toxiproxy: 50 ms up + 50 ms down), 2 CPUs pinned

Native DuckDB threads (httpfs2 off):

| threads | p50 MB/s |
|---|---|
| 1 | 349 |
| 2 | 422 |
| 4 | 239 |

httpfs2 (DuckDB `threads=1`, `fanout=4`, `chunk=16 MiB`):

| io_threads | p50 MB/s |
|---|---|
| 1 | 365 |
| 2 | 391 |
| 4 | 275 |

Under real latency, native `threads=2` beats httpfs2 with DuckDB at one thread. Combining httpfs2 with higher DuckDB thread counts, however, helps:

| DuckDB threads | httpfs (baseline) | httpfs2 recommended | httpfs2 auto+adaptive |
|---|---|---|---|
| 1 | 243 MB/s | 213 (-12.6%) | 220 (-9.7%) |
| 2 | 241 MB/s | 284 (+18.0%) | 267 (+11.1%) |
| 4 | 228 MB/s | 257 (+12.7%) | 291 (+27.8%) |

## Key takeaways

- High per-read fanout (>=4) hurts under real latency. Keep `max_fanout=2` for production.
- The win comes from decoupling IO concurrency from compute threads, not from splitting individual reads aggressively.
- Best results are on 2+ cores with RTT-bound workloads at moderate DuckDB thread counts.
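The auto+adaptive variant in the benchmark above corresponds to the adaptive overhead setting, which learns the per-sub-request penalty from observed timings. A minimal EWMA sketch of that idea (hypothetical; the extension's actual smoothing factor and update rule are not documented here):

```python
# Hypothetical EWMA estimator for the learned per-sub-request
# overhead (the extension's actual update rule may differ).
class OverheadEstimate:
    def __init__(self, initial_ms: float = 500.0, alpha: float = 0.2):
        self.ms = initial_ms   # current overhead estimate (ms)
        self.alpha = alpha     # weight given to each new sample

    def observe(self, sample_ms: float) -> float:
        # Exponentially weighted moving average update.
        self.ms = self.alpha * sample_ms + (1 - self.alpha) * self.ms
        return self.ms
```

Each observed sub-request cost nudges the estimate toward recent conditions, so the cost model's overhead term tracks the network rather than staying pinned at the static 500 ms default.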

## Benchmarking

```sh
# Build Linux artifacts in Docker
docker run --rm -v "$PWD:/work" -w /work httpfs2-bench bash -lc \
  'bash scripts/docker_build_linux_artifacts.sh'

# Start local S3 (RustFS) + Toxiproxy
docker compose up -d rustfs
RUSTFS_TOXI_LATENCY_MS=50 bash scripts/rustfs_toxiproxy_setup.sh

# Run full matrix (default)
RUSTFS_TOXI_LATENCY_MS=50 bash scripts/rustfs_local_bench_toxiproxy.sh

# Run a specific sweep mode
HTTPFS2_BENCH_MODE=low-threads HTTPFS2_DUCKDB_THREADS_LIST="1 2 4" \
  RUSTFS_TOXI_LATENCY_MS=50 bash scripts/rustfs_local_bench_toxiproxy.sh

# Summarize results
uv run scripts/summarize_bench_ascii.py results.csv
uv run scripts/summarize_bench_ascii.py results.csv --compare
```

Available bench modes: `full`, `recommended`, `auto-adaptive`, `overhead-sweep`, `low-threads`, `single-scan`, `pool-tasks`, `pool-fanout`, `fanout-sweep`. See `scripts/docker_s3_bench.sh` for details.

## Inspiration

This project was inspired by and builds on ideas from:

- **duck-read-cache-fs**: DuckDB filesystem wrapper with read caching and parallel subrequests
- **duckdb-diskcache**: DuckDB disk cache extension with an S3 cost model and merge-if-cheaper heuristics
