Epic: GPU Data Loading #7712

@0ax1

Description

Context

So far we've only spent time optimizing the GPU kernels themselves, not how to get data to the GPU quickly. Prior art here is Onur's exploratory work in this area. As a first step, I'll capture the current GPU data loading state in Vortex, which is expected to be slow.

Benchmarking copy modes

Insight => Copy performance does not differ between pageable and pinned host memory. The main difference is that copy calls can return immediately (C-style async) when the memory is pinned to physical addresses. WRITECOMBINED yields significantly slower copy performance for cuMemHostAlloc (see #7815 (comment)).

Benchmark transfer performance between:

  • copies from host memory to GPU
  • NVMe to CPU
  • object storage to GPU

GPU memory pooling

  • GPU allocation is very expensive => buffer pool
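A minimal sketch of such a pool, assuming size-bucketed free lists; malloc stands in here for the expensive cuMemAlloc so the sketch runs without a GPU:

```cpp
#include <cstdlib>
#include <map>
#include <vector>

// Size-bucketed buffer pool: freed buffers go back to a per-size free list
// and are handed out again instead of hitting the allocator. Buffers live
// for the lifetime of the pool (they are never returned to the allocator
// in this sketch).
class BufferPool {
public:
    void* acquire(size_t bytes) {
        auto& free_list = free_lists_[bytes];
        if (!free_list.empty()) {
            void* buf = free_list.back();
            free_list.pop_back();
            return buf;  // reuse: no allocator call
        }
        return std::malloc(bytes);  // real impl: cuMemAlloc
    }

    void release(void* buf, size_t bytes) {
        free_lists_[bytes].push_back(buf);  // keep for reuse, don't free
    }

private:
    std::map<size_t, std::vector<void*>> free_lists_;
};
```

Releasing a buffer and re-acquiring the same size hands back the same pointer, so steady-state scans pay the allocation cost only on first use.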

GPUDirect - NVMe

  • DMA copy from NVMe to the GPU
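For contrast, the baseline that GPUDirect Storage removes is the host bounce buffer: read from the file into host memory, then copy to the device. A CPU-only sketch of that baseline (`bounce_read` is a hypothetical helper; the final copy stands in for cuMemcpyHtoD):

```cpp
#include <cstdio>
#include <vector>

// Conventional (non-GPUDirect) read path: NVMe -> host bounce buffer -> GPU.
// GPUDirect Storage eliminates the host hop by DMA-ing from NVMe straight
// into GPU memory.
std::vector<char> bounce_read(const char* path, size_t bytes) {
    std::vector<char> host(bytes), device;
    if (FILE* f = std::fopen(path, "rb")) {
        size_t n = std::fread(host.data(), 1, bytes, f);
        std::fclose(f);
        host.resize(n);
        device.assign(host.begin(), host.end());  // stand-in for cuMemcpyHtoD
    }
    return device;
}
```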

GPUDirect - Object Storage

  • RDMA copy from object storage to the GPU

Vortex Scan Integration

  • run the ideal respective modes as part of the scan, bypassing host memory where possible
  • come up with a decision mechanism for whether, and which, parts of the scan should run on the GPU
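One possible shape for that decision mechanism is a simple cost model: offload a scan stage only when its estimated GPU time plus the PCIe transfer cost beats the CPU estimate. All names and constants below are illustrative placeholders, not measured Vortex numbers:

```cpp
#include <cstddef>

// Hypothetical per-stage cost estimate for the GPU-offload decision.
struct StageEstimate {
    size_t input_bytes;    // data that must reach the GPU
    double cpu_secs;       // estimated CPU execution time
    double gpu_secs;       // estimated GPU kernel time
    bool   bypasses_host;  // e.g. a GPUDirect path is available
};

// Offload only when GPU compute plus transfer beats the CPU estimate.
// 25 GiB/s is an illustrative PCIe-gen4-ish figure, not a measured number.
bool run_on_gpu(const StageEstimate& s, double pcie_gibps = 25.0) {
    // The transfer term drops out if data streams directly to the GPU.
    double xfer = s.bypasses_host
        ? 0.0
        : s.input_bytes / (pcie_gibps * 1024.0 * 1024.0 * 1024.0);
    return s.gpu_secs + xfer < s.cpu_secs;
}
```

Stages where the transfer cost dominates the CPU estimate stay on the CPU; a real mechanism would fold in the copy-mode benchmarks above instead of a fixed bandwidth constant.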

Metadata

Labels

epic — Public roadmap umbrella for a major initiative, with work tracked in sub-issues.
