Skip to content

[AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) #131312

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

Pierre-vh
Copy link
Contributor

This is a bit of an akward pattern that can come up as a result
of legalization and then widening of i16 operations to i32 in RegBankSelect
on AMDGPU.

This quick combine avoids redundant patterns like

s_sext_i32_i8 s0, s0
s_sext_i32_i16 s0, s0
s_ashr_i32 s0, s0, s1

With this the second sext is removed as it's redundant.

Copy link
Contributor Author

Pierre-vh commented Mar 14, 2025

@llvmbot
Copy link
Member

llvmbot commented Mar 14, 2025

@llvm/pr-subscribers-llvm-globalisel

Author: Pierre van Houtryve (Pierre-vh)

Changes

This is a bit of an akward pattern that can come up as a result
of legalization and then widening of i16 operations to i32 in RegBankSelect
on AMDGPU.

This quick combine avoids redundant patterns like

s_sext_i32_i8 s0, s0
s_sext_i32_i16 s0, s0
s_ashr_i32 s0, s0, s1

With this the second sext is removed as it's redundant.


Full diff: https://github.com/llvm/llvm-project/pull/131312.diff

3 Files Affected:

  • (modified) llvm/include/llvm/Target/GlobalISel/Combine.td (+11-1)
  • (added) llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir (+86)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.abs.ll (+16-62)
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 3590ab221ad44..9727b86b4be8b 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule<
          [{ return Helper.matchSextTruncSextLoad(*${d}); }]),
   (apply [{ Helper.applySextTruncSextLoad(*${d}); }])>;
 
+def sext_trunc_sextinreg : GICombineRule<
+  (defs root:$dst),
+  (match (G_SEXT_INREG $sir, $src, $width),
+         (G_TRUNC $trunc, $sir),
+         (G_SEXT $dst, $trunc),
+         [{ return (MRI.getType(${trunc}.getReg()).getScalarSizeInBits() >= ${width}.getImm()); }]),
+  (apply (GIReplaceReg $dst, $sir))>;
+
 def sext_inreg_of_load_matchdata : GIDefMatchData<"std::tuple<Register, unsigned>">;
 def sext_inreg_of_load : GICombineRule<
   (defs root:$root, sext_inreg_of_load_matchdata:$matchinfo),
@@ -1896,7 +1904,9 @@ def cast_of_cast_combines: GICombineGroup<[
   sext_of_anyext,
   anyext_of_anyext,
   anyext_of_zext,
-  anyext_of_sext
+  anyext_of_sext,
+
+  sext_trunc_sextinreg
 ]>;
 
 def cast_combines: GICombineGroup<[
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir
new file mode 100644
index 0000000000000..d41e5b172efc2
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir
@@ -0,0 +1,86 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-postlegalizer-combiner -verify-machineinstrs %s -o - | FileCheck %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: trunc_s16_inreg_8
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0
+    ; CHECK-LABEL: name: trunc_s16_inreg_8
+    ; CHECK: liveins: $vgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+    ; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+    %copy:_(s32) = COPY $vgpr0
+    %inreg:_(s32) = G_SEXT_INREG %copy, 8
+    %trunc:_(s16) = G_TRUNC %inreg
+    %sext:_(s32) = G_SEXT %trunc
+    $vgpr0 = COPY %sext
+...
+
+---
+name: trunc_s16_inreg_16
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0
+    ; CHECK-LABEL: name: trunc_s16_inreg_16
+    ; CHECK: liveins: $vgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
+    ; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+    %copy:_(s32) = COPY $vgpr0
+    %inreg:_(s32) = G_SEXT_INREG %copy, 16
+    %trunc:_(s16) = G_TRUNC %inreg
+    %sext:_(s32) = G_SEXT %trunc
+    $vgpr0 = COPY %sext
+...
+
+---
+name: trunc_s8_inreg_16
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0
+    ; CHECK-LABEL: name: trunc_s8_inreg_16
+    ; CHECK: liveins: $vgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
+    ; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
+    ; CHECK-NEXT: %sext:_(s32) = G_SEXT %trunc(s8)
+    ; CHECK-NEXT: $vgpr0 = COPY %sext(s32)
+    %copy:_(s32) = COPY $vgpr0
+    %inreg:_(s32) = G_SEXT_INREG %copy, 16
+    %trunc:_(s8) = G_TRUNC %inreg
+    %sext:_(s32) = G_SEXT %trunc
+    $vgpr0 = COPY %sext
+...
+
+# TODO?: We could handle this by inserting a trunc, but I'm not sure how useful that'd be.
+---
+name: mismatching_types
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0
+    ; CHECK-LABEL: name: mismatching_types
+    ; CHECK: liveins: $vgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+    ; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
+    ; CHECK-NEXT: %sext:_(s16) = G_SEXT %trunc(s8)
+    ; CHECK-NEXT: %anyext:_(s32) = G_ANYEXT %sext(s16)
+    ; CHECK-NEXT: $vgpr0 = COPY %anyext(s32)
+    %copy:_(s32) = COPY $vgpr0
+    %inreg:_(s32) = G_SEXT_INREG %copy, 8
+    %trunc:_(s8) = G_TRUNC %inreg
+    %sext:_(s16) = G_SEXT %trunc
+    %anyext:_(s32) = G_ANYEXT %sext
+    $vgpr0 = COPY %anyext
+...
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.abs.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.abs.ll
index 8c687d85ac24b..7ec27f47578c2 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.abs.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.abs.ll
@@ -197,33 +197,13 @@ define amdgpu_cs <4 x i32> @abs_vgpr_v4i32(<4 x i32> %arg) {
 }
 
 define amdgpu_cs <2 x i8> @abs_sgpr_v2i8(<2 x i8> inreg %arg) {
-; GFX6-LABEL: abs_sgpr_v2i8:
-; GFX6:       ; %bb.0:
-; GFX6-NEXT:    s_sext_i32_i8 s0, s0
-; GFX6-NEXT:    s_sext_i32_i8 s1, s1
-; GFX6-NEXT:    s_abs_i32 s0, s0
-; GFX6-NEXT:    s_abs_i32 s1, s1
-; GFX6-NEXT:    ; return to shader part epilog
-;
-; GFX8-LABEL: abs_sgpr_v2i8:
-; GFX8:       ; %bb.0:
-; GFX8-NEXT:    s_sext_i32_i8 s0, s0
-; GFX8-NEXT:    s_sext_i32_i8 s1, s1
-; GFX8-NEXT:    s_sext_i32_i16 s0, s0
-; GFX8-NEXT:    s_sext_i32_i16 s1, s1
-; GFX8-NEXT:    s_abs_i32 s0, s0
-; GFX8-NEXT:    s_abs_i32 s1, s1
-; GFX8-NEXT:    ; return to shader part epilog
-;
-; GFX10-LABEL: abs_sgpr_v2i8:
-; GFX10:       ; %bb.0:
-; GFX10-NEXT:    s_sext_i32_i8 s0, s0
-; GFX10-NEXT:    s_sext_i32_i8 s1, s1
-; GFX10-NEXT:    s_sext_i32_i16 s0, s0
-; GFX10-NEXT:    s_sext_i32_i16 s1, s1
-; GFX10-NEXT:    s_abs_i32 s0, s0
-; GFX10-NEXT:    s_abs_i32 s1, s1
-; GFX10-NEXT:    ; return to shader part epilog
+; GFX-LABEL: abs_sgpr_v2i8:
+; GFX:       ; %bb.0:
+; GFX-NEXT:    s_sext_i32_i8 s0, s0
+; GFX-NEXT:    s_sext_i32_i8 s1, s1
+; GFX-NEXT:    s_abs_i32 s0, s0
+; GFX-NEXT:    s_abs_i32 s1, s1
+; GFX-NEXT:    ; return to shader part epilog
   %res = call <2 x i8> @llvm.abs.v2i8(<2 x i8> %arg, i1 false)
   ret <2 x i8> %res
 }
@@ -268,41 +248,15 @@ define amdgpu_cs <2 x i8> @abs_vgpr_v2i8(<2 x i8> %arg) {
 }
 
 define amdgpu_cs <3 x i8> @abs_sgpr_v3i8(<3 x i8> inreg %arg) {
-; GFX6-LABEL: abs_sgpr_v3i8:
-; GFX6:       ; %bb.0:
-; GFX6-NEXT:    s_sext_i32_i8 s0, s0
-; GFX6-NEXT:    s_sext_i32_i8 s1, s1
-; GFX6-NEXT:    s_sext_i32_i8 s2, s2
-; GFX6-NEXT:    s_abs_i32 s0, s0
-; GFX6-NEXT:    s_abs_i32 s1, s1
-; GFX6-NEXT:    s_abs_i32 s2, s2
-; GFX6-NEXT:    ; return to shader part epilog
-;
-; GFX8-LABEL: abs_sgpr_v3i8:
-; GFX8:       ; %bb.0:
-; GFX8-NEXT:    s_sext_i32_i8 s0, s0
-; GFX8-NEXT:    s_sext_i32_i8 s1, s1
-; GFX8-NEXT:    s_sext_i32_i8 s2, s2
-; GFX8-NEXT:    s_sext_i32_i16 s0, s0
-; GFX8-NEXT:    s_sext_i32_i16 s1, s1
-; GFX8-NEXT:    s_sext_i32_i16 s2, s2
-; GFX8-NEXT:    s_abs_i32 s0, s0
-; GFX8-NEXT:    s_abs_i32 s1, s1
-; GFX8-NEXT:    s_abs_i32 s2, s2
-; GFX8-NEXT:    ; return to shader part epilog
-;
-; GFX10-LABEL: abs_sgpr_v3i8:
-; GFX10:       ; %bb.0:
-; GFX10-NEXT:    s_sext_i32_i8 s0, s0
-; GFX10-NEXT:    s_sext_i32_i8 s1, s1
-; GFX10-NEXT:    s_sext_i32_i8 s2, s2
-; GFX10-NEXT:    s_sext_i32_i16 s0, s0
-; GFX10-NEXT:    s_sext_i32_i16 s1, s1
-; GFX10-NEXT:    s_sext_i32_i16 s2, s2
-; GFX10-NEXT:    s_abs_i32 s0, s0
-; GFX10-NEXT:    s_abs_i32 s1, s1
-; GFX10-NEXT:    s_abs_i32 s2, s2
-; GFX10-NEXT:    ; return to shader part epilog
+; GFX-LABEL: abs_sgpr_v3i8:
+; GFX:       ; %bb.0:
+; GFX-NEXT:    s_sext_i32_i8 s0, s0
+; GFX-NEXT:    s_sext_i32_i8 s1, s1
+; GFX-NEXT:    s_sext_i32_i8 s2, s2
+; GFX-NEXT:    s_abs_i32 s0, s0
+; GFX-NEXT:    s_abs_i32 s1, s1
+; GFX-NEXT:    s_abs_i32 s2, s2
+; GFX-NEXT:    ; return to shader part epilog
   %res = call <3 x i8> @llvm.abs.v3i8(<3 x i8> %arg, i1 false)
   ret <3 x i8> %res
 }

@llvmbot
Copy link
Member

llvmbot commented Mar 14, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)

Changes

This is a bit of an akward pattern that can come up as a result
of legalization and then widening of i16 operations to i32 in RegBankSelect
on AMDGPU.

This quick combine avoids redundant patterns like

s_sext_i32_i8 s0, s0
s_sext_i32_i16 s0, s0
s_ashr_i32 s0, s0, s1

With this the second sext is removed as it's redundant.


Full diff: https://github.com/llvm/llvm-project/pull/131312.diff

3 Files Affected:

  • (modified) llvm/include/llvm/Target/GlobalISel/Combine.td (+11-1)
  • (added) llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir (+86)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.abs.ll (+16-62)
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 3590ab221ad44..9727b86b4be8b 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule<
          [{ return Helper.matchSextTruncSextLoad(*${d}); }]),
   (apply [{ Helper.applySextTruncSextLoad(*${d}); }])>;
 
+def sext_trunc_sextinreg : GICombineRule<
+  (defs root:$dst),
+  (match (G_SEXT_INREG $sir, $src, $width),
+         (G_TRUNC $trunc, $sir),
+         (G_SEXT $dst, $trunc),
+         [{ return (MRI.getType(${trunc}.getReg()).getScalarSizeInBits() >= ${width}.getImm()); }]),
+  (apply (GIReplaceReg $dst, $sir))>;
+
 def sext_inreg_of_load_matchdata : GIDefMatchData<"std::tuple<Register, unsigned>">;
 def sext_inreg_of_load : GICombineRule<
   (defs root:$root, sext_inreg_of_load_matchdata:$matchinfo),
@@ -1896,7 +1904,9 @@ def cast_of_cast_combines: GICombineGroup<[
   sext_of_anyext,
   anyext_of_anyext,
   anyext_of_zext,
-  anyext_of_sext
+  anyext_of_sext,
+
+  sext_trunc_sextinreg
 ]>;
 
 def cast_combines: GICombineGroup<[
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir
new file mode 100644
index 0000000000000..d41e5b172efc2
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir
@@ -0,0 +1,86 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-postlegalizer-combiner -verify-machineinstrs %s -o - | FileCheck %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: trunc_s16_inreg_8
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0
+    ; CHECK-LABEL: name: trunc_s16_inreg_8
+    ; CHECK: liveins: $vgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+    ; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+    %copy:_(s32) = COPY $vgpr0
+    %inreg:_(s32) = G_SEXT_INREG %copy, 8
+    %trunc:_(s16) = G_TRUNC %inreg
+    %sext:_(s32) = G_SEXT %trunc
+    $vgpr0 = COPY %sext
+...
+
+---
+name: trunc_s16_inreg_16
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0
+    ; CHECK-LABEL: name: trunc_s16_inreg_16
+    ; CHECK: liveins: $vgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
+    ; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+    %copy:_(s32) = COPY $vgpr0
+    %inreg:_(s32) = G_SEXT_INREG %copy, 16
+    %trunc:_(s16) = G_TRUNC %inreg
+    %sext:_(s32) = G_SEXT %trunc
+    $vgpr0 = COPY %sext
+...
+
+---
+name: trunc_s8_inreg_16
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0
+    ; CHECK-LABEL: name: trunc_s8_inreg_16
+    ; CHECK: liveins: $vgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
+    ; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
+    ; CHECK-NEXT: %sext:_(s32) = G_SEXT %trunc(s8)
+    ; CHECK-NEXT: $vgpr0 = COPY %sext(s32)
+    %copy:_(s32) = COPY $vgpr0
+    %inreg:_(s32) = G_SEXT_INREG %copy, 16
+    %trunc:_(s8) = G_TRUNC %inreg
+    %sext:_(s32) = G_SEXT %trunc
+    $vgpr0 = COPY %sext
+...
+
+# TODO?: We could handle this by inserting a trunc, but I'm not sure how useful that'd be.
+---
+name: mismatching_types
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0
+    ; CHECK-LABEL: name: mismatching_types
+    ; CHECK: liveins: $vgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+    ; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
+    ; CHECK-NEXT: %sext:_(s16) = G_SEXT %trunc(s8)
+    ; CHECK-NEXT: %anyext:_(s32) = G_ANYEXT %sext(s16)
+    ; CHECK-NEXT: $vgpr0 = COPY %anyext(s32)
+    %copy:_(s32) = COPY $vgpr0
+    %inreg:_(s32) = G_SEXT_INREG %copy, 8
+    %trunc:_(s8) = G_TRUNC %inreg
+    %sext:_(s16) = G_SEXT %trunc
+    %anyext:_(s32) = G_ANYEXT %sext
+    $vgpr0 = COPY %anyext
+...
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.abs.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.abs.ll
index 8c687d85ac24b..7ec27f47578c2 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.abs.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.abs.ll
@@ -197,33 +197,13 @@ define amdgpu_cs <4 x i32> @abs_vgpr_v4i32(<4 x i32> %arg) {
 }
 
 define amdgpu_cs <2 x i8> @abs_sgpr_v2i8(<2 x i8> inreg %arg) {
-; GFX6-LABEL: abs_sgpr_v2i8:
-; GFX6:       ; %bb.0:
-; GFX6-NEXT:    s_sext_i32_i8 s0, s0
-; GFX6-NEXT:    s_sext_i32_i8 s1, s1
-; GFX6-NEXT:    s_abs_i32 s0, s0
-; GFX6-NEXT:    s_abs_i32 s1, s1
-; GFX6-NEXT:    ; return to shader part epilog
-;
-; GFX8-LABEL: abs_sgpr_v2i8:
-; GFX8:       ; %bb.0:
-; GFX8-NEXT:    s_sext_i32_i8 s0, s0
-; GFX8-NEXT:    s_sext_i32_i8 s1, s1
-; GFX8-NEXT:    s_sext_i32_i16 s0, s0
-; GFX8-NEXT:    s_sext_i32_i16 s1, s1
-; GFX8-NEXT:    s_abs_i32 s0, s0
-; GFX8-NEXT:    s_abs_i32 s1, s1
-; GFX8-NEXT:    ; return to shader part epilog
-;
-; GFX10-LABEL: abs_sgpr_v2i8:
-; GFX10:       ; %bb.0:
-; GFX10-NEXT:    s_sext_i32_i8 s0, s0
-; GFX10-NEXT:    s_sext_i32_i8 s1, s1
-; GFX10-NEXT:    s_sext_i32_i16 s0, s0
-; GFX10-NEXT:    s_sext_i32_i16 s1, s1
-; GFX10-NEXT:    s_abs_i32 s0, s0
-; GFX10-NEXT:    s_abs_i32 s1, s1
-; GFX10-NEXT:    ; return to shader part epilog
+; GFX-LABEL: abs_sgpr_v2i8:
+; GFX:       ; %bb.0:
+; GFX-NEXT:    s_sext_i32_i8 s0, s0
+; GFX-NEXT:    s_sext_i32_i8 s1, s1
+; GFX-NEXT:    s_abs_i32 s0, s0
+; GFX-NEXT:    s_abs_i32 s1, s1
+; GFX-NEXT:    ; return to shader part epilog
   %res = call <2 x i8> @llvm.abs.v2i8(<2 x i8> %arg, i1 false)
   ret <2 x i8> %res
 }
@@ -268,41 +248,15 @@ define amdgpu_cs <2 x i8> @abs_vgpr_v2i8(<2 x i8> %arg) {
 }
 
 define amdgpu_cs <3 x i8> @abs_sgpr_v3i8(<3 x i8> inreg %arg) {
-; GFX6-LABEL: abs_sgpr_v3i8:
-; GFX6:       ; %bb.0:
-; GFX6-NEXT:    s_sext_i32_i8 s0, s0
-; GFX6-NEXT:    s_sext_i32_i8 s1, s1
-; GFX6-NEXT:    s_sext_i32_i8 s2, s2
-; GFX6-NEXT:    s_abs_i32 s0, s0
-; GFX6-NEXT:    s_abs_i32 s1, s1
-; GFX6-NEXT:    s_abs_i32 s2, s2
-; GFX6-NEXT:    ; return to shader part epilog
-;
-; GFX8-LABEL: abs_sgpr_v3i8:
-; GFX8:       ; %bb.0:
-; GFX8-NEXT:    s_sext_i32_i8 s0, s0
-; GFX8-NEXT:    s_sext_i32_i8 s1, s1
-; GFX8-NEXT:    s_sext_i32_i8 s2, s2
-; GFX8-NEXT:    s_sext_i32_i16 s0, s0
-; GFX8-NEXT:    s_sext_i32_i16 s1, s1
-; GFX8-NEXT:    s_sext_i32_i16 s2, s2
-; GFX8-NEXT:    s_abs_i32 s0, s0
-; GFX8-NEXT:    s_abs_i32 s1, s1
-; GFX8-NEXT:    s_abs_i32 s2, s2
-; GFX8-NEXT:    ; return to shader part epilog
-;
-; GFX10-LABEL: abs_sgpr_v3i8:
-; GFX10:       ; %bb.0:
-; GFX10-NEXT:    s_sext_i32_i8 s0, s0
-; GFX10-NEXT:    s_sext_i32_i8 s1, s1
-; GFX10-NEXT:    s_sext_i32_i8 s2, s2
-; GFX10-NEXT:    s_sext_i32_i16 s0, s0
-; GFX10-NEXT:    s_sext_i32_i16 s1, s1
-; GFX10-NEXT:    s_sext_i32_i16 s2, s2
-; GFX10-NEXT:    s_abs_i32 s0, s0
-; GFX10-NEXT:    s_abs_i32 s1, s1
-; GFX10-NEXT:    s_abs_i32 s2, s2
-; GFX10-NEXT:    ; return to shader part epilog
+; GFX-LABEL: abs_sgpr_v3i8:
+; GFX:       ; %bb.0:
+; GFX-NEXT:    s_sext_i32_i8 s0, s0
+; GFX-NEXT:    s_sext_i32_i8 s1, s1
+; GFX-NEXT:    s_sext_i32_i8 s2, s2
+; GFX-NEXT:    s_abs_i32 s0, s0
+; GFX-NEXT:    s_abs_i32 s1, s1
+; GFX-NEXT:    s_abs_i32 s2, s2
+; GFX-NEXT:    ; return to shader part epilog
   %res = call <3 x i8> @llvm.abs.v3i8(<3 x i8> %arg, i1 false)
   ret <3 x i8> %res
 }

(match (G_SEXT_INREG $sir, $src, $width),
(G_TRUNC $trunc, $sir),
(G_SEXT $dst, $trunc),
[{ return (MRI.getType(${trunc}.getReg()).getScalarSizeInBits() >= ${width}.getImm()); }]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also need to check that $sir and $dst have the same width, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GIReplaceReg implicitly does it. It'll check canReplaceReg
There is one testcase that has different widths that shows it working

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a weird design. Surely all the checking of conditions should be done in the "match" part? I didn't even know that an "apply" function was allowed to fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apply should not be allowed to fail

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a related note, couldn't you split this whole combine into two independently useful parts:

  1. (sext (trunc x)) --> (sext_inreg x)
  2. (sext_inreg (sext_inreg x)) --> (sext_inreg x)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apply isn't allowed to fail. It's just that the presence of GIReplaceReg triggers emission of a canReplaceReg call during the matching portion of the match table rule.

On a related note, couldn't you split this whole combine into two independently useful parts:

Good idea, I can try that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a weird design. Surely all the checking of conditions should be done in the "match" part? I didn't even know that an "apply" function was allowed to fail.

... or maybe we should remove the distinction between the match and apply parts as previously discussed.

@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/sifold-bitmasks branch from 2141239 to 65de524 Compare March 17, 2025 08:49
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/combine-sext-trunc-sextinreg branch from b9bf3f2 to 4751d38 Compare March 17, 2025 08:49
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/sifold-bitmasks branch from 65de524 to c18d66f Compare March 17, 2025 09:16
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/combine-sext-trunc-sextinreg branch from 4751d38 to 9fabf93 Compare March 17, 2025 09:17
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/sifold-bitmasks branch from c18d66f to 82443bc Compare March 17, 2025 09:30
This is a bit of an akward pattern that can come up as a result
of legalization and then widening of i16 operations to i32 in RegBankSelect
on AMDGPU.

This quick combine avoids redundant patterns like
```
s_sext_i32_i8 s0, s0
s_sext_i32_i16 s0, s0
s_ashr_i32 s0, s0, s1
```

With this the second sext is removed as it's redundant.
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/combine-sext-trunc-sextinreg branch from 9fabf93 to 782153a Compare March 17, 2025 09:31
@Pierre-vh Pierre-vh closed this Mar 17, 2025
@Pierre-vh Pierre-vh deleted the users/pierre-vh/combine-sext-trunc-sextinreg branch March 17, 2025 11:56
kraj pushed a commit to kraj/llvm-project that referenced this pull request Mar 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants