
AMDGPU should try to shrink 64-bit defs to 32-bit when rematerializing #128716

Open
@arsenm


If we are rematerializing a wide instruction, we should try harder to rewrite it so that it defines only the lanes actually required at the use point. In the most basic case, this means folding a 32-bit subregister use of an S_MOV_B64 into an S_MOV_B32:

  %0:sreg_64 = S_MOV_B64 0

  ; Should rematerialize here to: undef %0.sub0 = S_MOV_B32 0
  S_NOP 0, implicit %0.sub0
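A minimal C++ sketch of the decision this fold would make, modeled outside LLVM: given the 64-bit immediate and a mask of the 32-bit lanes the use reads, pick what to emit. `planRemat`, `RematPlan`, and the `LaneSub0`/`LaneSub1` masks are hypothetical names, not existing AMDGPU backend APIs, and the sketch ignores which immediates are actually encodable.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical lane masks for a 64-bit SGPR pair: bit 0 covers sub0 (low
// 32 bits), bit 1 covers sub1 (high 32 bits). Not LLVM's LaneBitmask.
constexpr unsigned LaneSub0 = 0x1;
constexpr unsigned LaneSub1 = 0x2;
constexpr unsigned LaneAll = LaneSub0 | LaneSub1;

// Hypothetical description of the instruction to emit at the use point.
struct RematPlan {
  std::string Opcode;  // e.g. "S_MOV_B32"
  std::string Def;     // destination operand as written in MIR
  uint64_t Imm;        // immediate to materialize
};

// Decide how to rematerialize "%0:sreg_64 = S_MOV_B64 Imm64" given the lanes
// the use actually reads. Ignores literal-encoding constraints.
std::optional<RematPlan> planRemat(uint64_t Imm64, unsigned UsedLanes) {
  assert((UsedLanes & ~LaneAll) == 0 && "only two 32-bit lanes exist");
  switch (UsedLanes) {
  case LaneSub0:
    // Only the low half is read: a 32-bit move of the low bits suffices.
    return RematPlan{"S_MOV_B32", "undef %0.sub0", Imm64 & 0xffffffffu};
  case LaneSub1:
    // Only the high half is read: move the high bits into sub1.
    return RematPlan{"S_MOV_B32", "undef %0.sub1", Imm64 >> 32};
  case LaneAll:
    // Both halves are live: keep the original 64-bit move.
    return RematPlan{"S_MOV_B64", "%0", Imm64};
  default:
    // No lanes are read: nothing to rematerialize.
    return std::nullopt;
  }
}
```

With `Imm64 = 0` and only `LaneSub0` read, this yields `undef %0.sub0 = S_MOV_B32 0`, matching the example above. A use of only sub1 would shrink the same way, taking the high 32 bits of the immediate.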

0001-WIP-AMDGPU-Fold-64-bit-moves-into-32-bit-when-materi.patch

Attaching a WIP patch to start the investigation. I'm not sure the starting point is useful: we already try something similar for scalar loads, but I don't think the reMaterialize hook has enough context to see the uses here.
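The missing context is, roughly, the union of lanes read by the uses the rematerialized def has to serve. A standalone sketch of that computation, assuming a simplified model where each use reads sub0, sub1, or the full pair (`SubRegIdx`, `lanesReadBy`, and `requiredLanes` are hypothetical and are not LLVM's `LaneBitmask` or `TargetRegisterInfo` APIs):

```cpp
#include <vector>

// Hypothetical model of how a use reads a 64-bit SGPR pair.
enum class SubRegIdx { Full, Sub0, Sub1 };

// Lane bits: bit 0 = sub0, bit 1 = sub1 (illustrative, not LLVM's LaneBitmask).
unsigned lanesReadBy(SubRegIdx Idx) {
  switch (Idx) {
  case SubRegIdx::Sub0:
    return 0x1;
  case SubRegIdx::Sub1:
    return 0x2;
  case SubRegIdx::Full:
    return 0x3;
  }
  return 0x3; // Unreachable; conservatively assume the full register.
}

// Union of lanes read by every use the rematerialized def must serve.
unsigned requiredLanes(const std::vector<SubRegIdx> &Uses) {
  unsigned Mask = 0;
  for (SubRegIdx Idx : Uses)
    Mask |= lanesReadBy(Idx);
  return Mask;
}
```

A result of `0x1` (only sub0) is the case where the 64-bit move could be shrunk to a 32-bit one; any use of the full register forces `0x3` and the wide def has to stay.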
