
[AMDGPU] InstCombine results in performance drop in ROCm's rocRAND library on MI100 #104900

Closed
@vg0204

The instruction folding introduced in #94887 causes a performance drop in the rocRAND benchmark test case for the normal-double distribution with the mt19937 random number engine on an AMD MI100 device.

The expected result is:

benchmark_rocrand_generate
rocRAND: 300100 Runtime: 60241133 Device: AMD Instinct MI100

mt19937:
  normal-double:
      Throughput =  264.408 GB/s, Samples =   33.051 GSample/s, AvgTime (1 trial) =    3.782 ms, Time (all) =   75.641 ms, Size = 134217728

However, with the mentioned PR applied, throughput drops as follows:

benchmark_rocrand_generate
rocRAND: 300100 Runtime: 60241133 Device: AMD Instinct MI100

mt19937:
  normal-double:
      Throughput =  248.093 GB/s, Samples =   31.012 GSample/s, AvgTime (1 trial) =    4.031 ms, Time (all) =   80.615 ms, Size = 134217728
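
To put a number on the regression, the figures from the two runs above can be compared directly. The following is a minimal Python sketch that only redoes the arithmetic on the values reported in this issue; the dictionary keys and the `pct_change` helper are illustrative names, not part of rocRAND or its benchmark tool.

```python
# Figures copied from the two benchmark_rocrand_generate runs above
# (mt19937, normal-double, AMD Instinct MI100).
baseline = {
    "throughput_gbps": 264.408,
    "gsamples_per_s": 33.051,
    "avg_time_ms": 3.782,
    "total_time_ms": 75.641,
}
regressed = {
    "throughput_gbps": 248.093,
    "gsamples_per_s": 31.012,
    "avg_time_ms": 4.031,
    "total_time_ms": 80.615,
}


def pct_change(old: float, new: float) -> float:
    """Signed percentage change going from old to new."""
    return (new - old) / old * 100.0


# Relative change of every reported metric after the PR.
changes = {key: pct_change(baseline[key], regressed[key]) for key in baseline}

for key, delta in changes.items():
    print(f"{key:>16}: {delta:+.2f} %")

# Sanity check on the reported figures: each normal-double sample is an
# 8-byte double, so throughput should be about 8x the sample rate
# (small differences come from rounding in the printed values).
for run in (baseline, regressed):
    assert abs(run["throughput_gbps"] - 8 * run["gsamples_per_s"]) < 0.01
```

With these numbers, throughput and sample rate fall by about 6.2 %, while average and total run time rise by about 6.6 %.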