
AMDGPU generates v_cndmask/readfirstlane for uniform select #59869

Closed
@jayfoad


Test case:

define amdgpu_ps i32 @_amdgpu_ps_main(i32 inreg %arg) {
bb:
  %i = icmp eq i32 %arg, 0
  %i1 = zext i1 %i to i64
  %i2 = getelementptr i8, ptr addrspace(4) null, i64 %i1
  %i3 = load i32, ptr addrspace(4) %i2, align 8
  ret i32 %i3
}

If I compile with llc -march=amdgcn -mcpu=gfx900 I get:

_amdgpu_ps_main:                        ; @_amdgpu_ps_main
; %bb.0:                                ; %bb
	s_cmp_eq_u32 s0, 0
	s_cselect_b64 s[2:3], -1, 0
	v_cndmask_b32_e64 v0, 0, 1, s[2:3]
	s_mov_b32 s1, 0
	v_readfirstlane_b32 s0, v0
	s_load_dword s0, s[0:1], 0x0
	s_waitcnt lgkmcnt(0)
	; return to shader part epilog

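All of the computations are uniform (the only input, %arg, is an inreg argument and so arrives in an SGPR), so routing the select through a VGPR with v_cndmask_b32_e64 and then back to an SGPR with v_readfirstlane_b32 is unnecessary.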
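
For comparison, here is a rough sketch of what a purely scalar sequence could look like. It is hand-written, not actual llc output, and it assumes the same register assignment as above and that s_cselect_b32 can materialize the 0/1 value directly from SCC:

```
	s_cmp_eq_u32 s0, 0              ; SCC = (%arg == 0)
	s_cselect_b32 s0, 1, 0          ; s0 = SCC ? 1 : 0 (the zext'ed compare, kept in an SGPR)
	s_mov_b32 s1, 0                 ; high half of the 64-bit address
	s_load_dword s0, s[0:1], 0x0
	s_waitcnt lgkmcnt(0)
```

This would drop both VALU instructions and the 64-bit lane mask in s[2:3].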
