Closed
Description
The instruction folding done in #94887 causes a performance drop in the benchmark test case for the normal-double distribution with the mt19937 randomizer engine (on an AMD MI100 device).
The expected result is:
benchmark_rocrand_generate
rocRAND: 300100 Runtime: 60241133 Device: AMD Instinct MI100
mt19937:
normal-double:
Throughput = 264.408 GB/s, Samples = 33.051 GSample/s, AvgTime (1 trial) = 3.782 ms, Time (all) = 75.641 ms, Size = 134217728
However, the mentioned PR causes a drop in throughput, as follows:
benchmark_rocrand_generate
rocRAND: 300100 Runtime: 60241133 Device: AMD Instinct MI100
mt19937:
normal-double:
Throughput = 248.093 GB/s, Samples = 31.012 GSample/s, AvgTime (1 trial) = 4.031 ms, Time (all) = 80.615 ms, Size = 134217728
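
To put the two runs side by side, the relative change can be worked out directly from the reported figures. The snippet below is only a quick sketch: the variable names and the `relative_change` helper are made up for illustration, and the numbers are copied from the single-trial outputs above, so run-to-run noise is not accounted for. With these inputs it comes out to roughly a 6.2% drop in throughput and a 6.6% increase in average time.

```python
# Figures copied from the two benchmark outputs above (mt19937, normal-double, MI100).
baseline_gbps = 264.408   # throughput before #94887
regressed_gbps = 248.093  # throughput with #94887
baseline_ms = 3.782       # AvgTime (1 trial) before #94887
regressed_ms = 4.031      # AvgTime (1 trial) with #94887


def relative_change(before, after):
    """Hypothetical helper: signed relative change from `before` to `after`."""
    return (after - before) / before


# Throughput goes down, so negate to report the drop as a positive percentage.
throughput_drop_pct = -relative_change(baseline_gbps, regressed_gbps) * 100
time_increase_pct = relative_change(baseline_ms, regressed_ms) * 100

print(f"Throughput drop: {throughput_drop_pct:.2f}%")
print(f"AvgTime increase: {time_increase_pct:.2f}%")
```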