Optimize GFQL GPU hot paths and RAPIDS benchmarks#972

Open
lmeyerov wants to merge 18 commits into master from feat/gfql-rapids-gpu-regression-optimization

Conversation


lmeyerov (Contributor) commented on Mar 30, 2026

Summary

This PR advances the GFQL/RAPIDS GPU regression investigation in four ways:

  • optimizes the hot undirected single-hop GFQL traversal path in hop()
  • adds a narrow safe to_cugraph() fast path for contiguous zero-based integer IDs
  • adds official RAPIDS-image GPU benchmark tooling for direct to_cugraph(), GFQL pipeline, GFQL stage-breakdown, and timing-summary comparisons across RAPIDS/CUDA variants
  • tracks the upstream-facing RAPIDS string-ID graph-build regression in #977
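
The undirected single-hop idea above can be sketched in plain pandas (a hypothetical illustration, not the PR's actual `hop()` code): instead of first materializing a doubled forward-plus-reversed edge table and then matching seeds against it, match the seeds against each endpoint column directly and union the results.

```python
import pandas as pd

def undirected_single_hop(edges: pd.DataFrame, seeds: pd.Series,
                          src: str = 'src', dst: str = 'dst') -> pd.Series:
    """Sketch of the fast-path idea: probe each endpoint column with the
    seed set rather than building a doubled (forward + reversed) edge
    table first. Hypothetical helper, not the library implementation."""
    seed_set = seeds.drop_duplicates()
    fwd = edges.loc[edges[src].isin(seed_set), dst]  # seeds appearing as src
    rev = edges.loc[edges[dst].isin(seed_set), src]  # seeds appearing as dst
    return pd.concat([fwd, rev], ignore_index=True).drop_duplicates()

edges = pd.DataFrame({'src': [0, 1, 2], 'dst': [1, 2, 3]})
neighbors = undirected_single_hop(edges, pd.Series([1, 1]))  # duplicate seeds ok
```

The pair materialization the fast path avoids is the intermediate frame of `2 * len(edges)` rows that a naive undirected implementation concatenates before filtering.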

Kept code changes

  • graphistry/compute/hop.py
    • narrow undirected single-hop fast path to avoid doubled pair materialization
  • graphistry/plugins/cugraph.py
    • guarded auto-renumber=False when IDs are already signed integer, zero-based, and contiguous
    • cached fast-path decision reuse on repeated conversions
    • conservative rejection of null-bearing integer endpoint columns for the renumber=False fast path
  • new/expanded tests
    • graphistry/tests/compute/test_hop.py
    • graphistry/tests/plugins/test_cugraph.py

Benchmark tooling

Committed in this PR:

  • benchmarks/gfql/filter_pagerank/benchmark_to_cugraph_gpu.py
  • benchmarks/gfql/filter_pagerank/filter_pagerank_stage_breakdown_gpu.py
  • docker/benchmark-rapids-official-to-cugraph-gpu.sh
  • docker/benchmark-rapids-official-gfql-gpu.sh
    • supports CUDA_VARIANT
    • supports RUN_STAGE_BREAKDOWN=1
  • docker/test-rapids-official-local.sh
    • adds WITH_BENCHMARK=1 timing output as TIMING_JSON=...
  • docker/benchmark-rapids-official-matrix.sh
    • timing-summary wrapper over the official RAPIDS image matrix
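
As one usage sketch (the sample log contents here are hypothetical; the committed scripts emit the `TIMING_JSON=...` line when `WITH_BENCHMARK=1` is set), a wrapper can recover the timing payload from a captured run log:

```shell
# Hypothetical captured log from a WITH_BENCHMARK=1 run of
# docker/test-rapids-official-local.sh (contents invented for illustration).
log='starting tests
TIMING_JSON={"pipeline_total_median_s": 0.2417}
done'

# Strip the TIMING_JSON= prefix from the matching line to get raw JSON.
timing=$(printf '%s\n' "$log" | sed -n 's/^TIMING_JSON=//p')
echo "$timing"
```

Printing the payload on a single greppable line is what lets the matrix wrapper aggregate timings across RAPIDS/CUDA variants without parsing the rest of the test output.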

Follow-up issues

  • #977 Investigate RAPIDS 26.02 string-ID graph-build regression with pure cuDF/cuGraph repro
  • #978 Feature request: cache/reuse cuGraph graph projections across repeated GPU algorithm calls

Validation

Core validation on the kept optimization paths:

  • PYTHONPATH=. pytest -q graphistry/tests/plugins/test_cugraph.py graphistry/tests/compute/test_hop.py graphistry/tests/test_compute_hops.py graphistry/tests/compute/test_chain.py graphistry/tests/test_compute_chain.py
    • 196 passed, 29 skipped
  • ./bin/typecheck.sh
    • success
  • added amplification coverage for:
    • to_cugraph() explicit kwarg forwarding and ineligible-ID no-op behavior
    • hop() fallback-equivalence on duplicate seeds and numeric-ID topologies

GPU validation on the kept optimization paths:

  • hop() GPU stage-breakdown spot checks
    • 26.02-cuda13 / twitter
      • search1_median_s: 0.1121
      • pagerank_median_s: 0.0280
      • search2_median_s: 0.1043
      • pipeline_total_median_s: 0.2417
    • 25.02-cuda12 / twitter
      • search1_median_s: 0.1354
      • pagerank_median_s: 0.0322
      • search2_median_s: 0.1140
      • pipeline_total_median_s: 0.3199
  • to_cugraph() GPU conversion spot checks
    • 26.02-cuda13
      • synthetic_contiguous: total 0.0200, build 0.0123, expected_vertex_match=True
      • synthetic_offset: total 0.0220, build 0.0140, expected_vertex_match=True
      • synthetic_string_gplus_shape: total 0.0504, build 0.0466, expected_vertex_match=True
      • twitter: total 0.0516, build 0.0358, expected_vertex_match=True
    • 25.02-cuda12
      • synthetic_contiguous: total 0.0255, build 0.0151, expected_vertex_match=True
      • synthetic_offset: total 0.0349, build 0.0227, expected_vertex_match=True
      • twitter: total 0.0584, build 0.0382, expected_vertex_match=True

Benchmark-shell validation for the newly committed timing helpers:

  • bash -n docker/test-rapids-official-local.sh docker/benchmark-rapids-official-matrix.sh
    • success

CI is green on the last code-path validation head, and the benchmark-shell commit is now queued on top of it.

Findings

Accepted local wins:

  • hop() undirected single-hop fast path
    • gplus GPU pipeline time improved by about 40% on both RAPIDS 25.02 and 26.02
  • contiguous integer to_cugraph() fast path
    • real wins on contiguous integer graph families
  • 26.02-cuda13 is consistently better than 26.02-cuda12

Remaining regression story:

  • real GFQL gplus warm pipeline, 25.02-cuda12.8 -> 26.02-cuda13
    • pipeline_total: 1.7019s -> 2.1780s (+27.97%)
    • pagerank stage: 0.3240s -> 0.5108s (+57.65%)
  • direct to_cugraph()+pagerank, same workload
    • total: 0.8171s -> 1.3695s (+67.60%)
    • build: 0.7630s -> 1.3215s (+73.20%)
    • pagerank: 0.0537s -> 0.0487s (-9.31%)
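
The percentage deltas above follow the usual before/after convention; a one-line sketch (hypothetical helper name):

```python
def pct_delta(before_s: float, after_s: float) -> float:
    """Percent change between two timings (positive = slower on the newer stack)."""
    return (after_s / before_s - 1.0) * 100.0

# e.g. the direct to_cugraph()+pagerank total above:
pct_delta(0.8171, 1.3695)  # roughly +67.6
```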

Pure RAPIDS follow-up:

  • we now have a cuDF/cuGraph-only reproducer with no PyGraphistry imports
  • strongest small-ish repro shape:
    • synthetic_string_gplus_shape
    • 10,000,000 edges
    • 107,614 unique vertices
  • 25.02-cuda12.8 -> 26.02-cuda13
    • total: 0.1936s -> 0.3188s (+64.67%)
    • build: 0.1861s -> 0.3148s (+69.16%)
  • sparse integer control remained slightly faster on 26.02
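
A toy sketch of that repro shape (a hypothetical pandas/NumPy generator; the actual reproducer builds the full-size frame and hands it to cuDF/cuGraph with no PyGraphistry imports):

```python
import numpy as np
import pandas as pd

def synthetic_string_gplus_shape(n_edges: int, n_vertices: int,
                                 seed: int = 0) -> pd.DataFrame:
    """Edge list with the repro's shape: many edges over a much smaller
    string-ID vertex set, so graph build is renumber-dominated. The
    reported shape uses n_edges=10_000_000, n_vertices=107_614."""
    rng = np.random.default_rng(seed)
    ids = np.array([f'user_{i}' for i in range(n_vertices)])
    return pd.DataFrame({
        'src': ids[rng.integers(0, n_vertices, size=n_edges)],
        'dst': ids[rng.integers(0, n_vertices, size=n_edges)],
    })

# Toy scale for illustration; scale the parameters up for the real repro.
edges = synthetic_string_gplus_shape(n_edges=1_000, n_vertices=100)
```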

Known limits

  • there is no safe local sparse-ID pre-factorization to keep in this PR
  • the remaining large sparse string/object-ID renumber regression still appears upstream-facing in cuGraph graph build / renumbering
  • the cache-reuse idea is tracked separately as a follow-up feature request and is not bundled into this PR

```python
    return cast(Any, df)[[col]]

def _column_values(df: Any, col: str) -> Any:
    return cast(Any, df)[col]
```

move and cleanup

lmeyerov force-pushed the feat/gfql-rapids-gpu-regression-optimization branch 2 times, most recently from 8ca42fc to bb6bd69 on March 30, 2026 at 05:56
lmeyerov force-pushed the feat/gfql-rapids-gpu-regression-optimization branch from bb6bd69 to cf94b0f on March 30, 2026 at 06:16
