Skip to content

[AMDGPU]Add+GEP->GEP commit performance degradation on AMDGPU #94092

Open
@lfmeadow

Description

@lfmeadow

AMD have traced some significant (8-20%) slowdowns on several of AMD's rocFFT kernels to the commit mentioned in #78214 . The llvm commit is e13bed4.
Briefly, we have identified two problems:

  1. The commit causes simplifycfg to bail out for fear of creating too many PHI nodes (even though they don't really get created). This has cascade effects on the backend register allocation pipeline, causing the kernel to use many more Vector GPRs (VGPRS) on AMD CDNA architecture GPUs. This in turn drops occupancy with a corresponding decrease in performance. See simplifycfg-repro
  2. A more subtle effect occurs when the transformation results in more CSE opportunities. This ends up increasing the VGPR usage as well, similarly dropping the occupancy. The performance loss is less severe but is still in the 8% range. See regalloc-repro
    There are a lot of details on github as noted above. I will briefly summarize.
    For issue 1, simplifycfg bails with SINK: stopping here, too many PHIs would be created!. I hacked a tunable for simplifycfg as shown in the patch thus allowing 4 PHI nodes, and the resulting code regains the performance.
    For issue 2, there is no obvious smoking gun; as far as I can tell it just puts more pressure on the register allocation path to put partial address expressions in registers, with no obvious benefit.

Ideally the GEP rewrite commit would be withdrawn. I don't see any way to fix this otherwise.
I do not understand why simplifycfg is failing to restructure this code.

Thanks for your consideration

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions