[Performance]: UVA vs UVM for CPU offloading on v0.8.4+ #17062

Open
@rajesh-s

Proposal to improve performance

This refers to the recent CPU offloading implementation in #15354 (v0.8.4+).

@youkaichao, is there any specific reason to pick UVA (cudaHostAlloc()) over UVM (cudaMallocManaged())?

  1. UVM goes further than UVA by managing data placement automatically, often using page-faulting hardware to migrate pages on demand. On systems like the GH200, this potentially brings additional benefits such as hardware-orchestrated, frequency-based migration.
  2. A key benefit of Unified Memory is that it simplifies the heterogeneous computing memory model by eliminating the need for deep copies when accessing structured data in GPU kernels. Source
  3. In several discussion threads, the larger access sizes involved in CPU offloading make UVM seem like the better approach compared to UVA. Source
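
To make the comparison concrete, here is a minimal standalone sketch of the two allocation paths. It is not vLLM's offloading code: the helper names (`sum_kernel`, `run_sum`, `sum_zero_copy`, `sum_managed`) and the buffer size are made up for illustration, and the comments about where the data ends up describe the general UVA/UVM model rather than measured behavior on the GH200.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                   cudaGetErrorString(err_));                         \
      std::exit(EXIT_FAILURE);                                        \
    }                                                                 \
  } while (0)

// Grid-stride reduction: each thread adds its share of `data` into `*out`.
__global__ void sum_kernel(const unsigned int* data, size_t n,
                           unsigned long long* out) {
  unsigned long long local = 0;
  size_t stride = (size_t)gridDim.x * blockDim.x;
  for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += stride) {
    local += data[i];
  }
  atomicAdd(out, local);
}

// Run the reduction over a device-visible pointer and return the total.
unsigned long long run_sum(const unsigned int* dev_view, size_t n) {
  unsigned long long* d_out = nullptr;
  CUDA_CHECK(cudaMalloc((void**)&d_out, sizeof(*d_out)));
  CUDA_CHECK(cudaMemset(d_out, 0, sizeof(*d_out)));
  sum_kernel<<<256, 256>>>(dev_view, n, d_out);
  CUDA_CHECK(cudaGetLastError());
  unsigned long long h_out = 0;
  // cudaMemcpy on the default stream waits for the kernel to finish.
  CUDA_CHECK(cudaMemcpy(&h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaFree(d_out));
  return h_out;
}

// UVA-style zero copy: pinned host memory mapped into the GPU address space.
// The pages stay in CPU memory; the kernel reads them over the CPU-GPU link.
unsigned long long sum_zero_copy(size_t n) {
  unsigned int* h_buf = nullptr;
  CUDA_CHECK(cudaHostAlloc((void**)&h_buf, n * sizeof(*h_buf),
                           cudaHostAllocMapped));
  for (size_t i = 0; i < n; ++i) h_buf[i] = 1;
  unsigned int* d_view = nullptr;
  CUDA_CHECK(cudaHostGetDevicePointer((void**)&d_view, h_buf, 0));
  unsigned long long total = run_sum(d_view, n);
  CUDA_CHECK(cudaFreeHost(h_buf));
  return total;
}

// UVM: managed memory, first touched on the CPU. On platforms with
// page-fault support the driver may migrate pages when the kernel reads them.
unsigned long long sum_managed(size_t n) {
  unsigned int* m_buf = nullptr;
  CUDA_CHECK(cudaMallocManaged((void**)&m_buf, n * sizeof(*m_buf),
                               cudaMemAttachGlobal));
  for (size_t i = 0; i < n; ++i) m_buf[i] = 1;
  unsigned long long total = run_sum(m_buf, n);
  CUDA_CHECK(cudaFree(m_buf));
  return total;
}

int main() {
  const size_t n = 1 << 20;  // illustrative size, not a tuned value
  std::printf("zero-copy (cudaHostAlloc):   %llu\n", sum_zero_copy(n));
  std::printf("managed (cudaMallocManaged): %llu\n", sum_managed(n));
  return 0;
}
```

Built with something like `nvcc uva_vs_uvm.cu -o uva_vs_uvm` (the file name is arbitrary), both paths are expected to report the same total. The difference of interest is only visible in a profiler: the first path should show no page migrations at all, while the second is where page-fault and migration activity would appear if the platform migrates on demand.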

While profiling vLLM v0.8.4 on a GH200 to assess the penalty of page migrations with CPU offloading, I noticed two things:

  1. cudaHostAlloc() calls were prevalent, but no page fault data was collected. Under UVA, these buffers were likely accessed directly from CPU memory, which could hurt GPU utilization.
  2. A high-utilization driver process called UVM GPU1 BH was running; its behavior is unexplained on the NVIDIA forums.

Going by this literature, if transparent offloading is the goal, cudaMallocManaged() seems preferable on platforms such as the GH200.

Report of performance regression

No response

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Labels: performance (Performance-related issues)