Context
So far we've only spent time optimizing the GPU kernels themselves, not how to get the data to the GPU quickly. Prior art here is Onur's exploratory work. As a first step, I'll capture the current GPU data-loading state in Vortex, which is expected to be slow.
Benchmarking copy modes
Insight => Copy performance does not differ between pageable host memory and pinned host memory. The main difference is that copy calls can return immediately (C-style async) when the memory is pinned to physical addresses. WRITECOMBINED yields significantly slower copy performance for cuMemHostAlloc (see #7815 (comment)).
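A minimal sketch of the three host-allocation modes, using the CUDA runtime API rather than the driver-level cuMemHostAlloc; the buffer size is an arbitrary example:

```cuda
#include <cuda_runtime.h>
#include <cstdlib>

int main() {
    const size_t n = 256 << 20; // 256 MiB, arbitrary example size

    // Pageable: cudaMemcpyAsync silently stages through an internal pinned
    // buffer and behaves synchronously with respect to the host.
    void* pageable = malloc(n);

    // Pinned: DMA-able; async copies can return immediately and overlap compute.
    void* pinned;
    cudaMallocHost(&pinned, n);

    // Write-combined: fast CPU writes, very slow CPU reads.
    void* wc;
    cudaHostAlloc(&wc, n, cudaHostAllocWriteCombined);

    void* dev;
    cudaMalloc(&dev, n);

    cudaStream_t s;
    cudaStreamCreate(&s);
    cudaMemcpyAsync(dev, pinned, n, cudaMemcpyHostToDevice, s); // returns immediately
    cudaStreamSynchronize(s);

    cudaFree(dev);
    cudaFreeHost(pinned);
    cudaFreeHost(wc);
    free(pageable);
    cudaStreamDestroy(s);
    return 0;
}
```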
Benchmark perf between (see the timing sketch after this list):
- copies from host memory to the GPU
- NVMe to CPU
- object storage to the GPU
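For the host-to-GPU leg, CUDA events are the usual way to time stream-ordered copies; `time_h2d_ms` below is a hypothetical helper, not existing Vortex code:

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: time one host->device copy on a stream using CUDA
// events and return milliseconds. Divide bytes by this to get throughput.
float time_h2d_ms(void* dst, const void* src, size_t n, cudaStream_t s) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, s);
    cudaMemcpyAsync(dst, src, n, cudaMemcpyHostToDevice, s);
    cudaEventRecord(stop, s);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```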
GPU memory pooling
- GPU allocation is very expensive => buffer pool
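One option, assuming CUDA 11.2+, is the built-in stream-ordered pool rather than a hand-rolled one; raising its release threshold keeps freed memory cached in the pool instead of returning it to the OS:

```cuda
#include <cuda_runtime.h>

// Configure the device's default stream-ordered pool to cache freed memory
// (example threshold: 1 GiB) so repeated allocations skip the expensive
// cudaMalloc path.
void configure_pool(int device) {
    cudaMemPool_t pool;
    cudaDeviceGetDefaultMemPool(&pool, device);
    unsigned long long threshold = 1ull << 30; // 1 GiB, placeholder
    cudaMemPoolSetAttribute(pool, cudaMemPoolAttrReleaseThreshold, &threshold);
}

// Allocations then go through the pool instead of raw cudaMalloc:
//   cudaMallocAsync(&ptr, n, stream);
//   ...
//   cudaFreeAsync(ptr, stream);
```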
GPUDirect - NVMe
- DMA copy from NVMe to the GPU
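A hedged sketch of what the DMA path could look like with GPUDirect Storage (cuFile); the file path and the omitted error handling are placeholders:

```cuda
#include <cufile.h>
#include <fcntl.h>
#include <unistd.h>

// Read n bytes from an NVMe-backed file straight into device memory,
// bypassing the host bounce buffer. "/mnt/nvme/data.vortex" is a placeholder.
ssize_t read_direct(void* dev_buf, size_t n) {
    cuFileDriverOpen();

    int fd = open("/mnt/nvme/data.vortex", O_RDONLY | O_DIRECT);
    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;

    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);
    cuFileBufRegister(dev_buf, n, 0); // optional, avoids per-call registration

    ssize_t got = cuFileRead(handle, dev_buf, n, /*file_offset=*/0, /*buf_offset=*/0);

    cuFileBufDeregister(dev_buf);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
    return got;
}
```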
GPUDirect - Object Storage
- RDMA copy from object storage to the GPU
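The object-storage path is much less settled. A very rough sketch of just the GPUDirect RDMA building block (registering GPU memory with the NIC via ibverbs, assuming the nvidia-peermem module is loaded); the actual protocol to an object store would go through vendor SDKs:

```cuda
#include <infiniband/verbs.h>
#include <cuda_runtime.h>

// Register a cudaMalloc'd buffer with the NIC so a remote peer can RDMA-write
// into GPU memory directly; this works via the peer-memory interface when
// nvidia-peermem is loaded. Illustrative only.
struct ibv_mr* register_gpu_buffer(struct ibv_pd* pd, size_t n, void** out_buf) {
    void* dev_buf = nullptr;
    cudaMalloc(&dev_buf, n);
    *out_buf = dev_buf;
    return ibv_reg_mr(pd, dev_buf, n,
                      IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);
}
```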
Vortex Scan Integration
- run the best-performing copy mode for each source as part of the scan, bypassing host memory where possible
- come up with a decision mechanism for whether, and which, parts of the scan should run on the GPU (a hypothetical shape is sketched below)
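A hypothetical shape of that decision mechanism; all names and thresholds below are invented for illustration:

```cuda
enum class CopyPath { HostStaged, GpuDirectNvme, GpuDirectRdma };

CopyPath choose_path(bool on_local_nvme, bool on_object_store,
                     bool gds_available, bool rdma_available,
                     size_t segment_bytes) {
    // Tiny segments rarely amortize DMA setup cost; stage via pinned host memory.
    const size_t kDirectThreshold = 1 << 20; // 1 MiB, placeholder
    if (segment_bytes < kDirectThreshold) return CopyPath::HostStaged;
    if (on_object_store && rdma_available) return CopyPath::GpuDirectRdma;
    if (on_local_nvme && gds_available) return CopyPath::GpuDirectNvme;
    return CopyPath::HostStaged;
}
```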