[AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) #131312

Pierre-vh · 2025-03-14T11:16:50Z

This is a bit of an awkward pattern that can come up as a result
of legalization and then widening of i16 operations to i32 in RegBankSelect
on AMDGPU.

This quick combine avoids redundant patterns like

s_sext_i32_i8 s0, s0
s_sext_i32_i16 s0, s0
s_ashr_i32 s0, s0, s1

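With this combine, the second sext (s_sext_i32_i16) is removed: the value is already sign-extended from 8 bits, so extending it again from 16 bits changes nothing.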
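
To see why the second sign extension is redundant, the three generic operations can be modelled on plain integers. The following is a minimal sketch, not code from this PR: the helper names (`sext_inreg`, `trunc`, `sext`, `chain`) are hypothetical, and a 32-bit source and destination register is assumed. It checks that the chain collapses to the inner G_SEXT_INREG when the truncated type is at least as wide as the sign-extension width, and that it does not collapse when the truncated type is narrower.

```python
def sext_inreg(x: int, width: int, bits: int = 32) -> int:
    """Model of G_SEXT_INREG: sign-extend the low `width` bits of x to `bits` bits."""
    low = x & ((1 << width) - 1)
    if low & (1 << (width - 1)):      # sign bit of the narrow value is set
        low -= 1 << width             # reinterpret it as a negative number
    return low & ((1 << bits) - 1)    # wrap back into an unsigned `bits`-wide value


def trunc(x: int, bits: int) -> int:
    """Model of G_TRUNC: keep only the low `bits` bits."""
    return x & ((1 << bits) - 1)


def sext(x: int, src_bits: int, dst_bits: int = 32) -> int:
    """Model of G_SEXT from a `src_bits`-wide value to `dst_bits` bits."""
    return sext_inreg(x, src_bits, dst_bits)


def chain(x: int, inreg_width: int, trunc_bits: int) -> int:
    """sext(trunc(sext_inreg(x, inreg_width))) with an s32 source and destination."""
    return sext(trunc(sext_inreg(x, inreg_width), trunc_bits), trunc_bits)


SAMPLES = [0x00, 0x7F, 0x80, 0xFF, 0x1234, 0x8000, 0xFFFF, 0xDEADBEEF]

# Truncated type (s16) is at least as wide as the sext_inreg width (8 or 16):
# the outer trunc + sext pair changes nothing.
for x in SAMPLES:
    assert chain(x, 8, 16) == sext_inreg(x, 8)
    assert chain(x, 16, 16) == sext_inreg(x, 16)

# Truncated type (s8) is narrower than the sext_inreg width (16):
# bits 8..15 are lost, so the chain cannot be replaced.
assert chain(0x1234, 16, 8) != sext_inreg(0x1234, 16)
```

The last assertion mirrors the `trunc_s8_inreg_16` test in the diff, where the combine is expected not to fire.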

Pierre-vh · 2025-03-14T11:17:05Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite.

llvmbot · 2025-03-14T11:19:24Z

@llvm/pr-subscribers-llvm-globalisel

Author: Pierre van Houtryve (Pierre-vh)

Full diff: https://github.com/llvm/llvm-project/pull/131312.diff

3 Files Affected:

(modified) llvm/include/llvm/Target/GlobalISel/Combine.td (+11-1)
(added) llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir (+86)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.abs.ll (+16-62)

diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 3590ab221ad44..9727b86b4be8b 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule<
          [{ return Helper.matchSextTruncSextLoad(*${d}); }]),
   (apply [{ Helper.applySextTruncSextLoad(*${d}); }])>;
 
+def sext_trunc_sextinreg : GICombineRule<
+  (defs root:$dst),
+  (match (G_SEXT_INREG $sir, $src, $width),
+         (G_TRUNC $trunc, $sir),
+         (G_SEXT $dst, $trunc),
+         [{ return (MRI.getType(${trunc}.getReg()).getScalarSizeInBits() >= ${width}.getImm()); }]),
+  (apply (GIReplaceReg $dst, $sir))>;
+
 def sext_inreg_of_load_matchdata : GIDefMatchData<"std::tuple<Register, unsigned>">;
 def sext_inreg_of_load : GICombineRule<
   (defs root:$root, sext_inreg_of_load_matchdata:$matchinfo),
@@ -1896,7 +1904,9 @@ def cast_of_cast_combines: GICombineGroup<[
   sext_of_anyext,
   anyext_of_anyext,
   anyext_of_zext,
-  anyext_of_sext
+  anyext_of_sext,
+
+  sext_trunc_sextinreg
 ]>;
 
 def cast_combines: GICombineGroup<[
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir
new file mode 100644
index 0000000000000..d41e5b172efc2
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir
@@ -0,0 +1,86 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-postlegalizer-combiner -verify-machineinstrs %s -o - | FileCheck %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: trunc_s16_inreg_8
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0
+    ; CHECK-LABEL: name: trunc_s16_inreg_8
+    ; CHECK: liveins: $vgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+    ; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+    %copy:_(s32) = COPY $vgpr0
+    %inreg:_(s32) = G_SEXT_INREG %copy, 8
+    %trunc:_(s16) = G_TRUNC %inreg
+    %sext:_(s32) = G_SEXT %trunc
+    $vgpr0 = COPY %sext
+...
+
+---
+name: trunc_s16_inreg_16
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0
+    ; CHECK-LABEL: name: trunc_s16_inreg_16
+    ; CHECK: liveins: $vgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
+    ; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+    %copy:_(s32) = COPY $vgpr0
+    %inreg:_(s32) = G_SEXT_INREG %copy, 16
+    %trunc:_(s16) = G_TRUNC %inreg
+    %sext:_(s32) = G_SEXT %trunc
+    $vgpr0 = COPY %sext
+...
+
+---
+name: trunc_s8_inreg_16
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0
+    ; CHECK-LABEL: name: trunc_s8_inreg_16
+    ; CHECK: liveins: $vgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
+    ; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
+    ; CHECK-NEXT: %sext:_(s32) = G_SEXT %trunc(s8)
+    ; CHECK-NEXT: $vgpr0 = COPY %sext(s32)
+    %copy:_(s32) = COPY $vgpr0
+    %inreg:_(s32) = G_SEXT_INREG %copy, 16
+    %trunc:_(s8) = G_TRUNC %inreg
+    %sext:_(s32) = G_SEXT %trunc
+    $vgpr0 = COPY %sext
+...
+
+# TODO?: We could handle this by inserting a trunc, but I'm not sure how useful that'd be.
+---
+name: mismatching_types
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $vgpr0
+    ; CHECK-LABEL: name: mismatching_types
+    ; CHECK: liveins: $vgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+    ; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
+    ; CHECK-NEXT: %sext:_(s16) = G_SEXT %trunc(s8)
+    ; CHECK-NEXT: %anyext:_(s32) = G_ANYEXT %sext(s16)
+    ; CHECK-NEXT: $vgpr0 = COPY %anyext(s32)
+    %copy:_(s32) = COPY $vgpr0
+    %inreg:_(s32) = G_SEXT_INREG %copy, 8
+    %trunc:_(s8) = G_TRUNC %inreg
+    %sext:_(s16) = G_SEXT %trunc
+    %anyext:_(s32) = G_ANYEXT %sext
+    $vgpr0 = COPY %anyext
+...
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.abs.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.abs.ll
index 8c687d85ac24b..7ec27f47578c2 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.abs.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.abs.ll
@@ -197,33 +197,13 @@ define amdgpu_cs <4 x i32> @abs_vgpr_v4i32(<4 x i32> %arg) {
 }
 
 define amdgpu_cs <2 x i8> @abs_sgpr_v2i8(<2 x i8> inreg %arg) {
-; GFX6-LABEL: abs_sgpr_v2i8:
-; GFX6:       ; %bb.0:
-; GFX6-NEXT:    s_sext_i32_i8 s0, s0
-; GFX6-NEXT:    s_sext_i32_i8 s1, s1
-; GFX6-NEXT:    s_abs_i32 s0, s0
-; GFX6-NEXT:    s_abs_i32 s1, s1
-; GFX6-NEXT:    ; return to shader part epilog
-;
-; GFX8-LABEL: abs_sgpr_v2i8:
-; GFX8:       ; %bb.0:
-; GFX8-NEXT:    s_sext_i32_i8 s0, s0
-; GFX8-NEXT:    s_sext_i32_i8 s1, s1
-; GFX8-NEXT:    s_sext_i32_i16 s0, s0
-; GFX8-NEXT:    s_sext_i32_i16 s1, s1
-; GFX8-NEXT:    s_abs_i32 s0, s0
-; GFX8-NEXT:    s_abs_i32 s1, s1
-; GFX8-NEXT:    ; return to shader part epilog
-;
-; GFX10-LABEL: abs_sgpr_v2i8:
-; GFX10:       ; %bb.0:
-; GFX10-NEXT:    s_sext_i32_i8 s0, s0
-; GFX10-NEXT:    s_sext_i32_i8 s1, s1
-; GFX10-NEXT:    s_sext_i32_i16 s0, s0
-; GFX10-NEXT:    s_sext_i32_i16 s1, s1
-; GFX10-NEXT:    s_abs_i32 s0, s0
-; GFX10-NEXT:    s_abs_i32 s1, s1
-; GFX10-NEXT:    ; return to shader part epilog
+; GFX-LABEL: abs_sgpr_v2i8:
+; GFX:       ; %bb.0:
+; GFX-NEXT:    s_sext_i32_i8 s0, s0
+; GFX-NEXT:    s_sext_i32_i8 s1, s1
+; GFX-NEXT:    s_abs_i32 s0, s0
+; GFX-NEXT:    s_abs_i32 s1, s1
+; GFX-NEXT:    ; return to shader part epilog
   %res = call <2 x i8> @llvm.abs.v2i8(<2 x i8> %arg, i1 false)
   ret <2 x i8> %res
 }
@@ -268,41 +248,15 @@ define amdgpu_cs <2 x i8> @abs_vgpr_v2i8(<2 x i8> %arg) {
 }
 
 define amdgpu_cs <3 x i8> @abs_sgpr_v3i8(<3 x i8> inreg %arg) {
-; GFX6-LABEL: abs_sgpr_v3i8:
-; GFX6:       ; %bb.0:
-; GFX6-NEXT:    s_sext_i32_i8 s0, s0
-; GFX6-NEXT:    s_sext_i32_i8 s1, s1
-; GFX6-NEXT:    s_sext_i32_i8 s2, s2
-; GFX6-NEXT:    s_abs_i32 s0, s0
-; GFX6-NEXT:    s_abs_i32 s1, s1
-; GFX6-NEXT:    s_abs_i32 s2, s2
-; GFX6-NEXT:    ; return to shader part epilog
-;
-; GFX8-LABEL: abs_sgpr_v3i8:
-; GFX8:       ; %bb.0:
-; GFX8-NEXT:    s_sext_i32_i8 s0, s0
-; GFX8-NEXT:    s_sext_i32_i8 s1, s1
-; GFX8-NEXT:    s_sext_i32_i8 s2, s2
-; GFX8-NEXT:    s_sext_i32_i16 s0, s0
-; GFX8-NEXT:    s_sext_i32_i16 s1, s1
-; GFX8-NEXT:    s_sext_i32_i16 s2, s2
-; GFX8-NEXT:    s_abs_i32 s0, s0
-; GFX8-NEXT:    s_abs_i32 s1, s1
-; GFX8-NEXT:    s_abs_i32 s2, s2
-; GFX8-NEXT:    ; return to shader part epilog
-;
-; GFX10-LABEL: abs_sgpr_v3i8:
-; GFX10:       ; %bb.0:
-; GFX10-NEXT:    s_sext_i32_i8 s0, s0
-; GFX10-NEXT:    s_sext_i32_i8 s1, s1
-; GFX10-NEXT:    s_sext_i32_i8 s2, s2
-; GFX10-NEXT:    s_sext_i32_i16 s0, s0
-; GFX10-NEXT:    s_sext_i32_i16 s1, s1
-; GFX10-NEXT:    s_sext_i32_i16 s2, s2
-; GFX10-NEXT:    s_abs_i32 s0, s0
-; GFX10-NEXT:    s_abs_i32 s1, s1
-; GFX10-NEXT:    s_abs_i32 s2, s2
-; GFX10-NEXT:    ; return to shader part epilog
+; GFX-LABEL: abs_sgpr_v3i8:
+; GFX:       ; %bb.0:
+; GFX-NEXT:    s_sext_i32_i8 s0, s0
+; GFX-NEXT:    s_sext_i32_i8 s1, s1
+; GFX-NEXT:    s_sext_i32_i8 s2, s2
+; GFX-NEXT:    s_abs_i32 s0, s0
+; GFX-NEXT:    s_abs_i32 s1, s1
+; GFX-NEXT:    s_abs_i32 s2, s2
+; GFX-NEXT:    ; return to shader part epilog
   %res = call <3 x i8> @llvm.abs.v3i8(<3 x i8> %arg, i1 false)
   ret <3 x i8> %res
 }

llvmbot · 2025-03-14T11:19:25Z

@llvm/pr-subscribers-backend-amdgpu

jayfoad · 2025-03-14T11:31:40Z

llvm/include/llvm/Target/GlobalISel/Combine.td

+  (match (G_SEXT_INREG $sir, $src, $width),
+         (G_TRUNC $trunc, $sir),
+         (G_SEXT $dst, $trunc),
+         [{ return (MRI.getType(${trunc}.getReg()).getScalarSizeInBits() >= ${width}.getImm()); }]),


You also need to check that $sir and $dst have the same width, right?

Pierre-vh · 2025-03-14

GIReplaceReg implicitly does it. It'll check canReplaceReg.
There is one testcase that has different widths that shows it working.

jayfoad · 2025-03-14

That seems like a weird design. Surely all the checking of conditions should be done in the "match" part? I didn't even know that an "apply" function was allowed to fail.

arsenm · 2025-03-14

apply should not be allowed to fail

jayfoad · 2025-03-14

On a related note, couldn't you split this whole combine into two independently useful parts:

(sext (trunc x)) --> (sext_inreg x)

(sext_inreg (sext_inreg x)) --> (sext_inreg x)

Pierre-vh · 2025-03-14

Apply isn't allowed to fail. It's just that the presence of GIReplaceReg triggers emission of a canReplaceReg call during the matching portion of the match table rule.

> On a related note, couldn't you split this whole combine into two independently useful parts:

Good idea, I can try that

jayfoad · 2025-03-14

> That seems like a weird design. Surely all the checking of conditions should be done in the "match" part? I didn't even know that an "apply" function was allowed to fail.

... or maybe we should remove the distinction between the match and apply parts as previously discussed.
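
The two smaller rewrites suggested in the review can be checked independently on plain integers. This is a hedged sketch rather than anything from the PR: the helper names are hypothetical, a 32-bit register is assumed, and the first rewrite is modelled only for the case where the G_SEXT result type equals the G_TRUNC source type. The thread does not spell out which width the merged `sext_inreg` keeps; the sketch assumes the smaller of the two, which the exhaustive loop below confirms for the sampled values.

```python
from itertools import product

BITS = 32
MASK = (1 << BITS) - 1


def to_signed(x: int, bits: int) -> int:
    """Interpret an unsigned `bits`-wide value as two's complement."""
    return x - (1 << bits) if (x >> (bits - 1)) & 1 else x


def trunc(x: int, bits: int) -> int:
    """Model of G_TRUNC: keep only the low `bits` bits."""
    return x & ((1 << bits) - 1)


def sext(x: int, src_bits: int) -> int:
    """Model of G_SEXT from a `src_bits`-wide value to a BITS-wide value."""
    return to_signed(x, src_bits) & MASK


def sext_inreg(x: int, width: int) -> int:
    """Model of G_SEXT_INREG as shift-left followed by arithmetic shift-right."""
    shift = BITS - width
    shifted = (x << shift) & MASK
    return (to_signed(shifted, BITS) >> shift) & MASK


SAMPLES = [0, 1, 0x7F, 0x80, 0xFF, 0x1234, 0x8000, 0xFFFF, 0x12345678, 0xDEADBEEF]
WIDTHS = (1, 8, 16, 24, 32)

# (sext (trunc x)) --> (sext_inreg x), using the truncated width.
for x, n in product(SAMPLES, (8, 16)):
    assert sext(trunc(x, n), n) == sext_inreg(x, n)

# (sext_inreg (sext_inreg x, a), b) --> (sext_inreg x, min(a, b)).
for x, a, b in product(SAMPLES, WIDTHS, WIDTHS):
    assert sext_inreg(sext_inreg(x, a), b) == sext_inreg(x, min(a, b))
```

Applying the first rewrite to the original pattern yields two nested `sext_inreg` operations, which the second rewrite then merges, so the pair covers the three-instruction chain this PR targeted.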

This was referenced Mar 14, 2025

[AMDGPU] Precommit si-fold-bitmask.mir #131310

Closed

[AMDGPU][SIFoldOperands] Fold some redundant bitmasks #131311

Closed

Pierre-vh requested review from arsenm, jayfoad and shiltian March 14, 2025 11:18

Pierre-vh marked this pull request as ready for review March 14, 2025 11:18

llvmbot added backend:AMDGPU llvm:globalisel labels Mar 14, 2025

Pierre-vh mentioned this pull request Mar 14, 2025

AMDGPUCodeGenPrepare should stop promoting uniform i16 values #64591

Closed

Pierre-vh force-pushed the users/pierre-vh/combine-sext-trunc-sextinreg branch from 3289b23 to b9bf3f2 Compare March 14, 2025 11:29

jayfoad reviewed Mar 14, 2025

Pierre-vh force-pushed the users/pierre-vh/sifold-bitmasks branch from 2141239 to 65de524 Compare March 17, 2025 08:49

Pierre-vh force-pushed the users/pierre-vh/combine-sext-trunc-sextinreg branch from b9bf3f2 to 4751d38 Compare March 17, 2025 08:49

Pierre-vh force-pushed the users/pierre-vh/sifold-bitmasks branch from 65de524 to c18d66f Compare March 17, 2025 09:16

Pierre-vh force-pushed the users/pierre-vh/combine-sext-trunc-sextinreg branch from 4751d38 to 9fabf93 Compare March 17, 2025 09:17

Pierre-vh force-pushed the users/pierre-vh/sifold-bitmasks branch from c18d66f to 82443bc Compare March 17, 2025 09:30

Pierre-vh force-pushed the users/pierre-vh/combine-sext-trunc-sextinreg branch from 9fabf93 to 782153a Compare March 17, 2025 09:31

Pierre-vh closed this Mar 17, 2025

Pierre-vh deleted the users/pierre-vh/combine-sext-trunc-sextinreg branch March 17, 2025 11:56

Pierre-vh mentioned this pull request Mar 17, 2025

[GlobalISel] Combine (sext (trunc x)) to (sext_inreg x) #131622

Merged

kraj pushed a commit to kraj/llvm-project that referenced this pull request Mar 17, 2025

[GlobalISel] Combine (sext (trunc x)) to (sext_inreg x)

4dcec18

Split from llvm#131312

Pierre-vh added a commit that referenced this pull request Mar 24, 2025

[GlobalISel] Combine (sext (trunc x)) to (sext_inreg x) (#131622)

c457c88

Split from #131312
