Optimize GFQL GPU hot paths and RAPIDS benchmarks#972

Open
lmeyerov wants to merge 18 commits into master from feat/gfql-rapids-gpu-regression-optimization

Conversation


lmeyerov (Contributor) commented on Mar 30, 2026

Summary

This PR advances the GFQL/RAPIDS GPU regression investigation in four ways:

  • optimizes the hot undirected single-hop GFQL traversal path in hop()
  • adds a narrow safe to_cugraph() fast path for contiguous zero-based integer IDs
  • adds official RAPIDS-image GPU benchmark tooling for direct to_cugraph(), GFQL pipeline, GFQL stage-breakdown, and timing-summary comparisons across RAPIDS/CUDA variants
  • tracks the upstream-facing RAPIDS string-ID graph-build regression in #977
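
The undirected single-hop idea above can be sketched in plain pandas (a hypothetical illustration, not the PR's actual `hop()` code): instead of first materializing a doubled forward-plus-reversed edge table and then matching seeds against it, match the seeds against each endpoint column directly and union the results.

```python
import pandas as pd

def undirected_single_hop(edges: pd.DataFrame, seeds: pd.Series,
                          src: str = 'src', dst: str = 'dst') -> pd.Series:
    """Sketch of the fast-path idea: probe each endpoint column with the
    seed set rather than building a doubled (forward + reversed) edge
    table first. Hypothetical helper, not the library implementation."""
    seed_set = seeds.drop_duplicates()
    fwd = edges.loc[edges[src].isin(seed_set), dst]  # seeds appearing as src
    rev = edges.loc[edges[dst].isin(seed_set), src]  # seeds appearing as dst
    return pd.concat([fwd, rev], ignore_index=True).drop_duplicates()

edges = pd.DataFrame({'src': [0, 1, 2], 'dst': [1, 2, 3]})
neighbors = undirected_single_hop(edges, pd.Series([1, 1]))  # duplicate seeds ok
```

The pair materialization the fast path avoids is the intermediate frame of `2 * len(edges)` rows that a naive undirected implementation concatenates before filtering.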

Kept code changes

  • graphistry/compute/hop.py
    • narrow undirected single-hop fast path to avoid doubled pair materialization
  • graphistry/plugins/cugraph.py
    • guarded auto-renumber=False when IDs are already signed integer, zero-based, and contiguous
    • cached fast-path decision reuse on repeated conversions
    • conservative rejection of null-bearing integer endpoint columns for the renumber=False fast path
  • new/expanded tests
    • graphistry/tests/compute/test_hop.py
    • graphistry/tests/plugins/test_cugraph.py

Benchmark tooling

Committed in this PR:

  • benchmarks/gfql/filter_pagerank/benchmark_to_cugraph_gpu.py
  • benchmarks/gfql/filter_pagerank/filter_pagerank_stage_breakdown_gpu.py
  • docker/benchmark-rapids-official-to-cugraph-gpu.sh
  • docker/benchmark-rapids-official-gfql-gpu.sh
    • supports CUDA_VARIANT
    • supports RUN_STAGE_BREAKDOWN=1
  • docker/test-rapids-official-local.sh
    • adds WITH_BENCHMARK=1 timing output as TIMING_JSON=...
  • docker/benchmark-rapids-official-matrix.sh
    • timing-summary wrapper over the official RAPIDS image matrix
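
As one usage sketch (the sample log contents here are hypothetical; the committed scripts emit the `TIMING_JSON=...` line when `WITH_BENCHMARK=1` is set), a wrapper can recover the timing payload from a captured run log:

```shell
# Hypothetical captured log from a WITH_BENCHMARK=1 run of
# docker/test-rapids-official-local.sh (contents invented for illustration).
log='starting tests
TIMING_JSON={"pipeline_total_median_s": 0.2417}
done'

# Strip the TIMING_JSON= prefix from the matching line to get raw JSON.
timing=$(printf '%s\n' "$log" | sed -n 's/^TIMING_JSON=//p')
echo "$timing"
```

Printing the payload on a single greppable line is what lets the matrix wrapper aggregate timings across RAPIDS/CUDA variants without parsing the rest of the test output.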

Follow-up issues

  • #977 Investigate RAPIDS 26.02 string-ID graph-build regression with pure cuDF/cuGraph repro
  • #978 Feature request: cache/reuse cuGraph graph projections across repeated GPU algorithm calls

Validation

Core validation on the kept optimization paths:

  • PYTHONPATH=. pytest -q graphistry/tests/plugins/test_cugraph.py graphistry/tests/compute/test_hop.py graphistry/tests/test_compute_hops.py graphistry/tests/compute/test_chain.py graphistry/tests/test_compute_chain.py
    • 196 passed, 29 skipped
  • ./bin/typecheck.sh
    • success
  • added amplification coverage for:
    • to_cugraph() explicit kwarg forwarding and ineligible-ID no-op behavior
    • hop() fallback-equivalence on duplicate seeds and numeric-ID topologies

GPU validation on the kept optimization paths:

  • hop() GPU stage-breakdown spot checks
    • 26.02-cuda13 / twitter
      • search1_median_s: 0.1121
      • pagerank_median_s: 0.0280
      • search2_median_s: 0.1043
      • pipeline_total_median_s: 0.2417
    • 25.02-cuda12 / twitter
      • search1_median_s: 0.1354
      • pagerank_median_s: 0.0322
      • search2_median_s: 0.1140
      • pipeline_total_median_s: 0.3199
  • to_cugraph() GPU conversion spot checks
    • 26.02-cuda13
      • synthetic_contiguous: total 0.0200, build 0.0123, expected_vertex_match=True
      • synthetic_offset: total 0.0220, build 0.0140, expected_vertex_match=True
      • synthetic_string_gplus_shape: total 0.0504, build 0.0466, expected_vertex_match=True
      • twitter: total 0.0516, build 0.0358, expected_vertex_match=True
    • 25.02-cuda12
      • synthetic_contiguous: total 0.0255, build 0.0151, expected_vertex_match=True
      • synthetic_offset: total 0.0349, build 0.0227, expected_vertex_match=True
      • twitter: total 0.0584, build 0.0382, expected_vertex_match=True

Benchmark-shell validation for the newly committed timing helpers:

  • bash -n docker/test-rapids-official-local.sh docker/benchmark-rapids-official-matrix.sh
    • success

CI is green on the last code-path validation head, and the benchmark-shell commit is now queued on top of it.

Findings

Accepted local wins:

  • hop() undirected single-hop fast path
    • gplus GPU pipeline time improved by about 40% on both RAPIDS 25.02 and 26.02
  • contiguous integer to_cugraph() fast path
    • real wins on contiguous integer graph families
  • 26.02-cuda13 is consistently better than 26.02-cuda12

Remaining regression story:

  • real GFQL gplus warm pipeline, 25.02-cuda12.8 -> 26.02-cuda13
    • pipeline_total: 1.7019s -> 2.1780s (+27.97%)
    • pagerank stage: 0.3240s -> 0.5108s (+57.65%)
  • direct to_cugraph()+pagerank, same workload
    • total: 0.8171s -> 1.3695s (+67.60%)
    • build: 0.7630s -> 1.3215s (+73.20%)
    • pagerank: 0.0537s -> 0.0487s (-9.31%)
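
The percentage deltas above follow the usual before/after convention; a one-line sketch (hypothetical helper name):

```python
def pct_delta(before_s: float, after_s: float) -> float:
    """Percent change between two timings (positive = slower on the newer stack)."""
    return (after_s / before_s - 1.0) * 100.0

# e.g. the direct to_cugraph()+pagerank total above:
pct_delta(0.8171, 1.3695)  # roughly +67.6
```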

Pure RAPIDS follow-up:

  • we now have a cuDF/cuGraph-only reproducer with no PyGraphistry imports
  • strongest small-ish repro shape:
    • synthetic_string_gplus_shape
    • 10,000,000 edges
    • 107,614 unique vertices
  • 25.02-cuda12.8 -> 26.02-cuda13
    • total: 0.1936s -> 0.3188s (+64.67%)
    • build: 0.1861s -> 0.3148s (+69.16%)
  • sparse integer control remained slightly faster on 26.02
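
A toy sketch of that repro shape (a hypothetical pandas/NumPy generator; the actual reproducer builds the full-size frame and hands it to cuDF/cuGraph with no PyGraphistry imports):

```python
import numpy as np
import pandas as pd

def synthetic_string_gplus_shape(n_edges: int, n_vertices: int,
                                 seed: int = 0) -> pd.DataFrame:
    """Edge list with the repro's shape: many edges over a much smaller
    string-ID vertex set, so graph build is renumber-dominated. The
    reported shape uses n_edges=10_000_000, n_vertices=107_614."""
    rng = np.random.default_rng(seed)
    ids = np.array([f'user_{i}' for i in range(n_vertices)])
    return pd.DataFrame({
        'src': ids[rng.integers(0, n_vertices, size=n_edges)],
        'dst': ids[rng.integers(0, n_vertices, size=n_edges)],
    })

# Toy scale for illustration; scale the parameters up for the real repro.
edges = synthetic_string_gplus_shape(n_edges=1_000, n_vertices=100)
```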

Known limits

  • there is no safe local sparse-ID pre-factorization to keep in this PR
  • the remaining large sparse string/object-ID renumber regression still appears upstream-facing in cuGraph graph build / renumbering
  • the cache-reuse idea is tracked separately as a follow-up feature request and is not bundled into this PR

```python
    return cast(Any, df)[[col]]

def _column_values(df: Any, col: str) -> Any:
    return cast(Any, df)[col]
```

move and cleanup

lmeyerov force-pushed the feat/gfql-rapids-gpu-regression-optimization branch 2 times, most recently from 8ca42fc to bb6bd69 on March 30, 2026 at 05:56
lmeyerov force-pushed the feat/gfql-rapids-gpu-regression-optimization branch from bb6bd69 to cf94b0f on March 30, 2026 at 06:16
