Skip to content

[AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) #131312

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 11 additions & 1 deletion llvm/include/llvm/Target/GlobalISel/Combine.td
Original file line number Diff line number Diff line change
Expand Up @@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule<
[{ return Helper.matchSextTruncSextLoad(*${d}); }]),
(apply [{ Helper.applySextTruncSextLoad(*${d}); }])>;

def sext_trunc_sextinreg : GICombineRule<
(defs root:$dst),
(match (G_SEXT_INREG $sir, $src, $width),
(G_TRUNC $trunc, $sir),
(G_SEXT $dst, $trunc),
[{ return (MRI.getType(${trunc}.getReg()).getScalarSizeInBits() >= ${width}.getImm()); }]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also need to check that $sir and $dst have the same width, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GIReplaceReg implicitly does it. It'll check canReplaceReg
There is one testcase that has different widths that shows it working

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a weird design. Surely all the checking of conditions should be done in the "match" part? I didn't even know that an "apply" function was allowed to fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apply should not be allowed to fail

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a related note, couldn't you split this whole combine into two independently useful parts:

  1. (sext (trunc x)) --> (sext_inreg x)
  2. (sext_inreg (sext_inreg x)) --> (sext_inreg x)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apply isn't allowed to fail. It's just that the presence of GIReplaceReg triggers emission of a canReplaceReg call during the matching portion of the match table rule.

On a related note, couldn't you split this whole combine into two independently useful parts:

Good idea, I can try that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a weird design. Surely all the checking of conditions should be done in the "match" part? I didn't even know that an "apply" function was allowed to fail.

... or maybe we should remove the distinction between the match and apply parts as previously discussed.

(apply (GIReplaceReg $dst, $sir))>;

def sext_inreg_of_load_matchdata : GIDefMatchData<"std::tuple<Register, unsigned>">;
def sext_inreg_of_load : GICombineRule<
(defs root:$root, sext_inreg_of_load_matchdata:$matchinfo),
Expand Down Expand Up @@ -1896,7 +1904,9 @@ def cast_of_cast_combines: GICombineGroup<[
sext_of_anyext,
anyext_of_anyext,
anyext_of_zext,
anyext_of_sext
anyext_of_sext,

sext_trunc_sextinreg
]>;

def cast_combines: GICombineGroup<[
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-postlegalizer-combiner -verify-machineinstrs %s -o - | FileCheck %s
# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s

---
name: trunc_s16_inreg_8
tracksRegLiveness: true
body: |
bb.0:
liveins: $vgpr0
; CHECK-LABEL: name: trunc_s16_inreg_8
; CHECK: liveins: $vgpr0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
%copy:_(s32) = COPY $vgpr0
%inreg:_(s32) = G_SEXT_INREG %copy, 8
%trunc:_(s16) = G_TRUNC %inreg
%sext:_(s32) = G_SEXT %trunc
$vgpr0 = COPY %sext
...

---
name: trunc_s16_inreg_16
tracksRegLiveness: true
body: |
bb.0:
liveins: $vgpr0
; CHECK-LABEL: name: trunc_s16_inreg_16
; CHECK: liveins: $vgpr0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
%copy:_(s32) = COPY $vgpr0
%inreg:_(s32) = G_SEXT_INREG %copy, 16
%trunc:_(s16) = G_TRUNC %inreg
%sext:_(s32) = G_SEXT %trunc
$vgpr0 = COPY %sext
...

---
name: trunc_s8_inreg_16
tracksRegLiveness: true
body: |
bb.0:
liveins: $vgpr0
; CHECK-LABEL: name: trunc_s8_inreg_16
; CHECK: liveins: $vgpr0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
; CHECK-NEXT: %sext:_(s32) = G_SEXT %trunc(s8)
; CHECK-NEXT: $vgpr0 = COPY %sext(s32)
%copy:_(s32) = COPY $vgpr0
%inreg:_(s32) = G_SEXT_INREG %copy, 16
%trunc:_(s8) = G_TRUNC %inreg
%sext:_(s32) = G_SEXT %trunc
$vgpr0 = COPY %sext
...

# TODO?: We could handle this by inserting a trunc, but I'm not sure how useful that'd be.
---
name: mismatching_types
tracksRegLiveness: true
body: |
bb.0:
liveins: $vgpr0
; CHECK-LABEL: name: mismatching_types
; CHECK: liveins: $vgpr0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
; CHECK-NEXT: %sext:_(s16) = G_SEXT %trunc(s8)
; CHECK-NEXT: %anyext:_(s32) = G_ANYEXT %sext(s16)
; CHECK-NEXT: $vgpr0 = COPY %anyext(s32)
%copy:_(s32) = COPY $vgpr0
%inreg:_(s32) = G_SEXT_INREG %copy, 8
%trunc:_(s8) = G_TRUNC %inreg
%sext:_(s16) = G_SEXT %trunc
%anyext:_(s32) = G_ANYEXT %sext
$vgpr0 = COPY %anyext
...
78 changes: 16 additions & 62 deletions llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.abs.ll
Original file line number Diff line number Diff line change
Expand Up @@ -197,33 +197,13 @@ define amdgpu_cs <4 x i32> @abs_vgpr_v4i32(<4 x i32> %arg) {
}

define amdgpu_cs <2 x i8> @abs_sgpr_v2i8(<2 x i8> inreg %arg) {
; GFX6-LABEL: abs_sgpr_v2i8:
; GFX6: ; %bb.0:
; GFX6-NEXT: s_sext_i32_i8 s0, s0
; GFX6-NEXT: s_sext_i32_i8 s1, s1
; GFX6-NEXT: s_abs_i32 s0, s0
; GFX6-NEXT: s_abs_i32 s1, s1
; GFX6-NEXT: ; return to shader part epilog
;
; GFX8-LABEL: abs_sgpr_v2i8:
; GFX8: ; %bb.0:
; GFX8-NEXT: s_sext_i32_i8 s0, s0
; GFX8-NEXT: s_sext_i32_i8 s1, s1
; GFX8-NEXT: s_sext_i32_i16 s0, s0
; GFX8-NEXT: s_sext_i32_i16 s1, s1
; GFX8-NEXT: s_abs_i32 s0, s0
; GFX8-NEXT: s_abs_i32 s1, s1
; GFX8-NEXT: ; return to shader part epilog
;
; GFX10-LABEL: abs_sgpr_v2i8:
; GFX10: ; %bb.0:
; GFX10-NEXT: s_sext_i32_i8 s0, s0
; GFX10-NEXT: s_sext_i32_i8 s1, s1
; GFX10-NEXT: s_sext_i32_i16 s0, s0
; GFX10-NEXT: s_sext_i32_i16 s1, s1
; GFX10-NEXT: s_abs_i32 s0, s0
; GFX10-NEXT: s_abs_i32 s1, s1
; GFX10-NEXT: ; return to shader part epilog
; GFX-LABEL: abs_sgpr_v2i8:
; GFX: ; %bb.0:
; GFX-NEXT: s_sext_i32_i8 s0, s0
; GFX-NEXT: s_sext_i32_i8 s1, s1
; GFX-NEXT: s_abs_i32 s0, s0
; GFX-NEXT: s_abs_i32 s1, s1
; GFX-NEXT: ; return to shader part epilog
%res = call <2 x i8> @llvm.abs.v2i8(<2 x i8> %arg, i1 false)
ret <2 x i8> %res
}
Expand Down Expand Up @@ -268,41 +248,15 @@ define amdgpu_cs <2 x i8> @abs_vgpr_v2i8(<2 x i8> %arg) {
}

define amdgpu_cs <3 x i8> @abs_sgpr_v3i8(<3 x i8> inreg %arg) {
; GFX6-LABEL: abs_sgpr_v3i8:
; GFX6: ; %bb.0:
; GFX6-NEXT: s_sext_i32_i8 s0, s0
; GFX6-NEXT: s_sext_i32_i8 s1, s1
; GFX6-NEXT: s_sext_i32_i8 s2, s2
; GFX6-NEXT: s_abs_i32 s0, s0
; GFX6-NEXT: s_abs_i32 s1, s1
; GFX6-NEXT: s_abs_i32 s2, s2
; GFX6-NEXT: ; return to shader part epilog
;
; GFX8-LABEL: abs_sgpr_v3i8:
; GFX8: ; %bb.0:
; GFX8-NEXT: s_sext_i32_i8 s0, s0
; GFX8-NEXT: s_sext_i32_i8 s1, s1
; GFX8-NEXT: s_sext_i32_i8 s2, s2
; GFX8-NEXT: s_sext_i32_i16 s0, s0
; GFX8-NEXT: s_sext_i32_i16 s1, s1
; GFX8-NEXT: s_sext_i32_i16 s2, s2
; GFX8-NEXT: s_abs_i32 s0, s0
; GFX8-NEXT: s_abs_i32 s1, s1
; GFX8-NEXT: s_abs_i32 s2, s2
; GFX8-NEXT: ; return to shader part epilog
;
; GFX10-LABEL: abs_sgpr_v3i8:
; GFX10: ; %bb.0:
; GFX10-NEXT: s_sext_i32_i8 s0, s0
; GFX10-NEXT: s_sext_i32_i8 s1, s1
; GFX10-NEXT: s_sext_i32_i8 s2, s2
; GFX10-NEXT: s_sext_i32_i16 s0, s0
; GFX10-NEXT: s_sext_i32_i16 s1, s1
; GFX10-NEXT: s_sext_i32_i16 s2, s2
; GFX10-NEXT: s_abs_i32 s0, s0
; GFX10-NEXT: s_abs_i32 s1, s1
; GFX10-NEXT: s_abs_i32 s2, s2
; GFX10-NEXT: ; return to shader part epilog
; GFX-LABEL: abs_sgpr_v3i8:
; GFX: ; %bb.0:
; GFX-NEXT: s_sext_i32_i8 s0, s0
; GFX-NEXT: s_sext_i32_i8 s1, s1
; GFX-NEXT: s_sext_i32_i8 s2, s2
; GFX-NEXT: s_abs_i32 s0, s0
; GFX-NEXT: s_abs_i32 s1, s1
; GFX-NEXT: s_abs_i32 s2, s2
; GFX-NEXT: ; return to shader part epilog
%res = call <3 x i8> @llvm.abs.v3i8(<3 x i8> %arg, i1 false)
ret <3 x i8> %res
}
Expand Down
Loading