Open
Description
If we are rematerializing a wide instruction, we should try harder to rewrite it so that it defines only the lanes required at the use point. In the most basic case, this means narrowing an S_MOV_B64 to an S_MOV_B32 when the use reads only one 32-bit subregister:
%0:sreg_64 = S_MOV_B64 0
// Should rematerialize here to undef %0.sub0 = S_MOV_B32 0
S_NOP 0, implicit %0.sub0
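To make the intended rewrite concrete, here is a minimal sketch of the narrowing decision in plain C++, not LLVM code. `SubReg` and `narrowMovImm` are hypothetical names made up for illustration, and the sketch assumes sub0 is the low 32 bits and sub1 the high 32 bits of the 64-bit value. It only models picking the immediate for the narrowed move; it says nothing about how the hook would learn which subregister the use reads.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the subregister index on the use operand.
// These are NOT LLVM's real enumerators.
enum class SubReg { None, Sub0, Sub1 };

// Given the 64-bit immediate of the original wide move and the subregister
// the use actually reads, return the immediate for a 32-bit move that
// defines only that half. Returns nullopt when the use reads the full
// register, in which case the wide move has to be kept.
std::optional<uint32_t> narrowMovImm(uint64_t Imm64, SubReg Used) {
  switch (Used) {
  case SubReg::Sub0:
    return static_cast<uint32_t>(Imm64);       // low 32 bits
  case SubReg::Sub1:
    return static_cast<uint32_t>(Imm64 >> 32); // high 32 bits
  case SubReg::None:
    return std::nullopt;                       // full 64-bit read
  }
  return std::nullopt;
}
```

For the example above, `narrowMovImm(0, SubReg::Sub0)` yields 0, which corresponds to rematerializing `undef %0.sub0 = S_MOV_B32 0` at the use.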
0001-WIP-AMDGPU-Fold-64-bit-moves-into-32-bit-when-materi.patch
Attaching a WIP patch to start the investigation. I'm not sure the starting point is useful. We already try something similar for scalar loads, but I don't think the reMaterialize hook has enough context to see the uses here.