|
| 1 | +# **RFC-0021 GPU support using cuDF in C++ workers** |
| 2 | + |
| 3 | +### Proposers |
| 4 | + |
| 5 | +* Deepak Majeti, et al. (IBM)
| 6 | +* Zoltan Arnold Nagy, et al. (IBM Research Europe)
| 7 | +* Karthikeyan Natarajan, et. al. (NVIDIA) |
| 8 | + |
| 9 | +## Related Issues |
| 10 | + |
| 11 | +* [PR #25094](https://github.com/prestodb/presto/pull/25094): Enable Velox cuDF |
| 12 | +* [PR #26156](https://github.com/prestodb/presto/pull/26156): Add support for Velox cuDF options and CudfHiveConnector |
| 13 | + |
| 14 | +## Summary |
| 15 | + |
| 16 | +Enable C++ workers to execute queries on GPUs. |
| 17 | + |
| 18 | +## Background |
| 19 | + |
| 20 | +GPU hardware is now widely available, primarily due to demand from AI/ML use cases.
| 21 | +Over the years, GPU hardware has also gained advanced I/O capabilities.
| 22 | +New AI-adjacent data processing workflows are also being developed.
| 23 | + |
| 24 | +GPUs provide high compute and memory bandwidth, which can benefit operations such as |
| 25 | +joins, aggregations, string processing, etc. |
| 26 | + |
| 27 | + |
| 28 | +### Goals |
| 29 | +* Allow Presto queries to run on a single GPU or multiple GPUs. |
| 30 | +* A query runs either entirely on the CPU or entirely on the GPU. Hybrid execution is not a goal.
| 31 | +* Fall back to the CPU if the GPU lacks a required functionality.
| 32 | +* Execution should maximize utilization of available hardware, such as NVLink.
| 33 | + |
| 34 | +## Proposed Implementation |
| 35 | + |
| 36 | +Some of this work has been implemented in [Velox](https://github.com/facebookincubator/velox/tree/main/velox/experimental/cudf). |
| 37 | +The current implementation translates the CPU operators to the GPU operators via a DriverAdapter in Velox. |
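As a rough illustration of the operator translation described above, combined with the all-or-nothing behavior listed under Goals, the following Python sketch replaces every CPU operator in a pipeline with a GPU equivalent, or keeps the original CPU pipeline if any operator lacks one. It is not the Velox `DriverAdapter` API: the `Operator` class, the `GPU_EQUIVALENTS` table, the `adapt_pipeline` function, and all operator names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass

# Hypothetical mapping from CPU operator names to GPU operator names.
# These names are illustrative assumptions, not the actual Velox operators.
GPU_EQUIVALENTS = {
    "TableScan": "CudfTableScan",
    "HashJoin": "CudfHashJoin",
    "HashAggregation": "CudfHashAggregation",
}


@dataclass(frozen=True)
class Operator:
    name: str
    device: str  # "cpu" or "gpu"


def adapt_pipeline(pipeline):
    """Translate a CPU pipeline to GPU operators, or keep it on the CPU.

    The translation is all-or-nothing: if any operator has no GPU
    equivalent, the original CPU pipeline is returned unchanged.
    """
    if all(op.name in GPU_EQUIVALENTS for op in pipeline):
        return [Operator(GPU_EQUIVALENTS[op.name], "gpu") for op in pipeline]
    return list(pipeline)
```

Under these assumptions, a pipeline of `TableScan` and `HashJoin` would be fully translated, while a pipeline containing an operator missing from the table would stay on the CPU.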
| 38 | + |
| 39 | +NVIDIA's [blog](https://developer.nvidia.com/blog/accelerating-large-scale-data-analytics-with-gpu-native-velox-and-nvidia-cudf/)
| 40 | +has more details on the design and some early results. |
| 41 | + |
| 42 | +The [Extending Velox - GPU Acceleration with cuDF](https://velox-lib.io/blog/extending-velox-with-cudf) blog also covers the current implementation. |
| 43 | + |
| 44 | +On the Presto C++ side, the following registrations and configs have been added. |
| 45 | + |
| 46 | +* The CMake build option `PRESTO_ENABLE_CUDF` must be set. See the [build instructions](https://github.com/prestodb/presto/tree/master/presto-native-execution#nvidia-cudf-gpu-support).
| 47 | +* The Parquet file format is supported, and cudfHiveConnector is registered.
| 48 | +* S3 and local (Linux) filesystems are supported.
| 49 | +* cuDF [configs](https://facebookincubator.github.io/velox/configs.html#cudf-specific-configuration-experimental) can be
| 50 | +specified in `config.properties` and in catalog `.properties` files.
| 51 | + |
| 52 | +The work so far shows that GPUs can provide good price-performance. However, to make this support user-friendly and to improve price-performance further, the following improvements are in progress.
| 53 | + |
| 54 | +## Work in Progress |
| 55 | +* Add GPU plan nodes. |
| 56 | + * The driver adapter runs after the drivers/pipelines are built, which limits the adaptation.
| 57 | + * Allow efficient fallback to CPU. |
| 58 | +* GPU-GPU exchange using UCX (https://github.com/prestodb/presto/tree/ibm-research-preview). |
| 59 | +* Topology and hardware detection. |
| 60 | +* Metadata queries on CPU only. |
| 61 | +* Session parameter to filter workers. |
| 62 | +* Optimizer cost model to support GPUs. |
| 63 | + |
| 64 | +## Releases |
| 65 | +Presto C++ workers will be released with GPU support. |
| 66 | + |
| 67 | +## Test Plan |
| 68 | +Velox CI has a GPU runner sponsored by Meta. We need a similar runner for Presto.