[AMDGPU] Precommit si-fold-bitmask.mir #131310

Conversation

Pierre-vh (Contributor)

No description provided.

Pierre-vh (Contributor, Author) commented Mar 14, 2025

@llvmbot (Member) commented Mar 14, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/131310.diff

1 File Affected:

  • (added) llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir (+429)
diff --git a/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir
new file mode 100644
index 0000000000000..1edf970591179
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir
@@ -0,0 +1,429 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -run-pass=si-fold-operands -verify-machineinstrs -o - %s | FileCheck --check-prefix=GCN %s
+
+# Test supported instructions
+
+---
+name: v_ashr_i32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+
+    ; GCN-LABEL: name: v_ashr_i32_e64__v_and_b32_e32
+    ; GCN: liveins: $vgpr0, $vgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+    ; GCN-NEXT: %ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec
+    ; GCN-NEXT: $vgpr0 = COPY %ret
+    %src:vgpr_32 = COPY $vgpr0
+    %shift:vgpr_32 = COPY $vgpr1
+    %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+    %ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec
+    $vgpr0 = COPY %ret
+...
+
+---
+name: v_lshr_b32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+
+    ; GCN-LABEL: name: v_lshr_b32_e64__v_and_b32_e32
+    ; GCN: liveins: $vgpr0, $vgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+    ; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec
+    ; GCN-NEXT: $vgpr0 = COPY %ret
+    %src:vgpr_32 = COPY $vgpr0
+    %shift:vgpr_32 = COPY $vgpr1
+    %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+    %ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec
+    $vgpr0 = COPY %ret
+...
+
+---
+name: v_lshr_b32_e32__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+
+    ; GCN-LABEL: name: v_lshr_b32_e32__v_and_b32_e32
+    ; GCN: liveins: $vgpr0, $vgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+    ; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec
+    ; GCN-NEXT: $vgpr0 = COPY %ret
+    %src:vgpr_32 = COPY $vgpr0
+    %shift:vgpr_32 = COPY $vgpr1
+    %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+    %ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec
+    $vgpr0 = COPY %ret
+...
+
+---
+name: v_lshl_b32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+
+    ; GCN-LABEL: name: v_lshl_b32_e64__v_and_b32_e32
+    ; GCN: liveins: $vgpr0, $vgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+    ; GCN-NEXT: %ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec
+    ; GCN-NEXT: $vgpr0 = COPY %ret
+    %src:vgpr_32 = COPY $vgpr0
+    %shift:vgpr_32 = COPY $vgpr1
+    %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+    %ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec
+    $vgpr0 = COPY %ret
+...
+
+---
+name: s_lshl_b32__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $sgpr0, $sgpr1
+
+    ; GCN-LABEL: name: s_lshl_b32__s_and_b32
+    ; GCN: liveins: $sgpr0, $sgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+    ; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr1
+    ; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+    ; GCN-NEXT: %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+    ; GCN-NEXT: $sgpr0 = COPY %ret
+    %src:sgpr_32 = COPY $sgpr0
+    %shift:sgpr_32 = COPY $sgpr1
+    %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+    %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+    $sgpr0 = COPY %ret
+...
+
+---
+name: s_lshr_b32__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $sgpr0, $sgpr1
+
+    ; GCN-LABEL: name: s_lshr_b32__s_and_b32
+    ; GCN: liveins: $sgpr0, $sgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+    ; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr1
+    ; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+    ; GCN-NEXT: %ret:sgpr_32 = S_LSHR_B32 %src, %shiftmask, implicit-def $scc
+    ; GCN-NEXT: $sgpr0 = COPY %ret
+    %src:sgpr_32 = COPY $sgpr0
+    %shift:sgpr_32 = COPY $sgpr1
+    %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+    %ret:sgpr_32 = S_LSHR_B32 %src, %shiftmask, implicit-def $scc
+    $sgpr0 = COPY %ret
+...
+
+---
+name: s_ashr_i32__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $sgpr0, $sgpr1
+
+    ; GCN-LABEL: name: s_ashr_i32__s_and_b32
+    ; GCN: liveins: $sgpr0, $sgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+    ; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr1
+    ; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+    ; GCN-NEXT: %ret:sgpr_32 = S_ASHR_I32 %src, %shiftmask, implicit-def $scc
+    ; GCN-NEXT: $sgpr0 = COPY %ret
+    %src:sgpr_32 = COPY $sgpr0
+    %shift:sgpr_32 = COPY $sgpr1
+    %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+    %ret:sgpr_32 = S_ASHR_I32 %src, %shiftmask, implicit-def $scc
+    $sgpr0 = COPY %ret
+...
+
+---
+name: s_lshl_b64__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $sgpr0_sgpr1, $sgpr2
+
+    ; GCN-LABEL: name: s_lshl_b64__s_and_b32
+    ; GCN: liveins: $sgpr0_sgpr1, $sgpr2
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:sgpr_64 = COPY $sgpr0_sgpr1
+    ; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr2
+    ; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 63, %shift, implicit-def $scc
+    ; GCN-NEXT: %ret:sgpr_64 = S_LSHL_B64 %src, %shiftmask, implicit-def $scc
+    ; GCN-NEXT: $sgpr0_sgpr1 = COPY %ret
+    %src:sgpr_64 = COPY $sgpr0_sgpr1
+    %shift:sgpr_32 = COPY $sgpr2
+    %shiftmask:sgpr_32 = S_AND_B32 63, %shift, implicit-def $scc
+    %ret:sgpr_64 = S_LSHL_B64 %src, %shiftmask, implicit-def $scc
+    $sgpr0_sgpr1 = COPY %ret
+...
+
+---
+name: s_lshr_b64__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $sgpr0_sgpr1, $sgpr2
+
+    ; GCN-LABEL: name: s_lshr_b64__s_and_b32
+    ; GCN: liveins: $sgpr0_sgpr1, $sgpr2
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:sgpr_64 = COPY $sgpr0_sgpr1
+    ; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr2
+    ; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 63, %shift, implicit-def $scc
+    ; GCN-NEXT: %ret:sgpr_64 = S_LSHR_B64 %src, %shiftmask, implicit-def $scc
+    ; GCN-NEXT: $sgpr0_sgpr1 = COPY %ret
+    %src:sgpr_64 = COPY $sgpr0_sgpr1
+    %shift:sgpr_32 = COPY $sgpr2
+    %shiftmask:sgpr_32 = S_AND_B32 63, %shift, implicit-def $scc
+    %ret:sgpr_64 = S_LSHR_B64 %src, %shiftmask, implicit-def $scc
+    $sgpr0_sgpr1 = COPY %ret
+...
+
+---
+name: s_ashr_i64__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $sgpr0_sgpr1, $sgpr2
+
+    ; GCN-LABEL: name: s_ashr_i64__s_and_b32
+    ; GCN: liveins: $sgpr0_sgpr1, $sgpr2
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:sgpr_64 = COPY $sgpr0_sgpr1
+    ; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr2
+    ; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 63, %shift, implicit-def $scc
+    ; GCN-NEXT: %ret:sgpr_64 = S_ASHR_I64 %src, %shiftmask, implicit-def $scc
+    ; GCN-NEXT: $sgpr0_sgpr1 = COPY %ret
+    %src:sgpr_64 = COPY $sgpr0_sgpr1
+    %shift:sgpr_32 = COPY $sgpr2
+    %shiftmask:sgpr_32 = S_AND_B32 63, %shift, implicit-def $scc
+    %ret:sgpr_64 = S_ASHR_I64 %src, %shiftmask, implicit-def $scc
+    $sgpr0_sgpr1 = COPY %ret
+...
+
+---
+name: v_lshlrev_b32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+
+    ; GCN-LABEL: name: v_lshlrev_b32_e64__v_and_b32_e32
+    ; GCN: liveins: $vgpr0, $vgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+    ; GCN-NEXT: %ret:vgpr_32 = V_LSHLREV_B32_e64 %shiftmask, %src, implicit $exec
+    ; GCN-NEXT: $vgpr0 = COPY %ret
+    %src:vgpr_32 = COPY $vgpr0
+    %shift:vgpr_32 = COPY $vgpr1
+    %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+    %ret:vgpr_32 = V_LSHLREV_B32_e64 %shiftmask, %src, implicit $exec
+    $vgpr0 = COPY %ret
+...
+
+---
+name: v_lshrrev_b32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+
+    ; GCN-LABEL: name: v_lshrrev_b32_e64__v_and_b32_e32
+    ; GCN: liveins: $vgpr0, $vgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+    ; GCN-NEXT: %ret:vgpr_32 = V_LSHRREV_B32_e64 %shiftmask, %src, implicit $exec
+    ; GCN-NEXT: $vgpr0 = COPY %ret
+    %src:vgpr_32 = COPY $vgpr0
+    %shift:vgpr_32 = COPY $vgpr1
+    %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+    %ret:vgpr_32 = V_LSHRREV_B32_e64 %shiftmask, %src, implicit $exec
+    $vgpr0 = COPY %ret
+...
+
+---
+name: v_ashrrev_i32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+
+    ; GCN-LABEL: name: v_ashrrev_i32_e64__v_and_b32_e32
+    ; GCN: liveins: $vgpr0, $vgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+    ; GCN-NEXT: %ret:vgpr_32 = V_ASHRREV_I32_e64 %shiftmask, %src, implicit $exec
+    ; GCN-NEXT: $vgpr0 = COPY %ret
+    %src:vgpr_32 = COPY $vgpr0
+    %shift:vgpr_32 = COPY $vgpr1
+    %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+    %ret:vgpr_32 = V_ASHRREV_I32_e64 %shiftmask, %src, implicit $exec
+    $vgpr0 = COPY %ret
+...
+
+# Test interesting cases
+
+---
+name: flipped_operands
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+
+    ; GCN-LABEL: name: flipped_operands
+    ; GCN: liveins: $vgpr0, $vgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+    ; GCN-NEXT: %ret:vgpr_32 = V_ASHR_I32_e64 %shiftmask, %src, implicit $exec
+    ; GCN-NEXT: $vgpr0 = COPY %ret
+    %src:vgpr_32 = COPY $vgpr0
+    %shift:vgpr_32 = COPY $vgpr1
+    %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+    %ret:vgpr_32 = V_ASHR_I32_e64 %shiftmask, %src, implicit $exec
+    $vgpr0 = COPY %ret
+...
+
+---
+name: flipped_operands_rev
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+
+    ; GCN-LABEL: name: flipped_operands_rev
+    ; GCN: liveins: $vgpr0, $vgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+    ; GCN-NEXT: %ret:vgpr_32 = V_ASHRREV_I32_e64 %src, %shiftmask, implicit $exec
+    ; GCN-NEXT: $vgpr0 = COPY %ret
+    %src:vgpr_32 = COPY $vgpr0
+    %shift:vgpr_32 = COPY $vgpr1
+    %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+    %ret:vgpr_32 = V_ASHRREV_I32_e64 %src, %shiftmask, implicit $exec
+    $vgpr0 = COPY %ret
+...
+
+# 30 = 11110 = doesn't cover all bits.
+---
+name: shift32_mask_too_small
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $sgpr0, $sgpr1
+
+    ; GCN-LABEL: name: shift32_mask_too_small
+    ; GCN: liveins: $sgpr0, $sgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+    ; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr1
+    ; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 30, %shift, implicit-def $scc
+    ; GCN-NEXT: %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+    ; GCN-NEXT: $sgpr0 = COPY %ret
+    %src:sgpr_32 = COPY $sgpr0
+    %shift:sgpr_32 = COPY $sgpr1
+    %shiftmask:sgpr_32 = S_AND_B32 30, %shift, implicit-def $scc
+    %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+    $sgpr0 = COPY %ret
+...
+
+# 90 = 1011010 = has holes, doesn't cover all low bits.
+---
+name: shift32_mask_has_holes
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $sgpr0, $sgpr1
+
+    ; GCN-LABEL: name: shift32_mask_has_holes
+    ; GCN: liveins: $sgpr0, $sgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+    ; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr1
+    ; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 90, %shift, implicit-def $scc
+    ; GCN-NEXT: %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+    ; GCN-NEXT: $sgpr0 = COPY %ret
+    %src:sgpr_32 = COPY $sgpr0
+    %shift:sgpr_32 = COPY $sgpr1
+    %shiftmask:sgpr_32 = S_AND_B32 90, %shift, implicit-def $scc
+    %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+    $sgpr0 = COPY %ret
+...
+
+# 30 = 11110 = doesn't cover all bits.
+---
+name: shift64_mask_too_small
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $sgpr0_sgpr1, $sgpr2
+
+    ; GCN-LABEL: name: shift64_mask_too_small
+    ; GCN: liveins: $sgpr0_sgpr1, $sgpr2
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:sgpr_64 = COPY $sgpr0_sgpr1
+    ; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr2
+    ; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 30, %shift, implicit-def $scc
+    ; GCN-NEXT: %ret:sgpr_64 = S_LSHR_B64 %src, %shiftmask, implicit-def $scc
+    ; GCN-NEXT: $sgpr0_sgpr1 = COPY %ret
+    %src:sgpr_64 = COPY $sgpr0_sgpr1
+    %shift:sgpr_32 = COPY $sgpr2
+    %shiftmask:sgpr_32 = S_AND_B32 30, %shift, implicit-def $scc
+    %ret:sgpr_64 = S_LSHR_B64 %src, %shiftmask, implicit-def $scc
+    $sgpr0_sgpr1 = COPY %ret
+...
+
+
+# 90 = 1011010 = has holes, doesn't cover all low bits.
+---
+name: shift64_mask_has_holes
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $sgpr0_sgpr1, $sgpr2
+
+    ; GCN-LABEL: name: shift64_mask_has_holes
+    ; GCN: liveins: $sgpr0_sgpr1, $sgpr2
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:sgpr_64 = COPY $sgpr0_sgpr1
+    ; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr2
+    ; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 90, %shift, implicit-def $scc
+    ; GCN-NEXT: %ret:sgpr_64 = S_LSHR_B64 %src, %shiftmask, implicit-def $scc
+    ; GCN-NEXT: $sgpr0_sgpr1 = COPY %ret
+    %src:sgpr_64 = COPY $sgpr0_sgpr1
+    %shift:sgpr_32 = COPY $sgpr2
+    %shiftmask:sgpr_32 = S_AND_B32 90, %shift, implicit-def $scc
+    %ret:sgpr_64 = S_LSHR_B64 %src, %shiftmask, implicit-def $scc
+    $sgpr0_sgpr1 = COPY %ret
+...

@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/precommit_bitmask_test branch from b87a9db to fcd5623 Compare March 14, 2025 11:29
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/bfx-on-s16 branch from ee917df to c30cc50 Compare March 14, 2025 11:29
@arsenm (Contributor) left a comment

Hopefully this is to deal with DAG regressions? I would hope GlobalISel doesn't need this.

@@ -0,0 +1,429 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -run-pass=si-fold-operands -verify-machineinstrs -o - %s | FileCheck --check-prefix=GCN %s

Suggested change
# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -run-pass=si-fold-operands -verify-machineinstrs -o - %s | FileCheck --check-prefix=GCN %s
# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -run-pass=si-fold-operands -o - %s | FileCheck --check-prefix=GCN %s

@Pierre-vh (Contributor, Author)

GlobalISel unfortunately needs it. We can end up with things like a G_LSHR with the shift amount being zext'd, and they're both lowered independently, so we end up with an s_and_b32 of the shift amount.

I tried fixing this elsewhere, but it's messy. I don't think we can do it before lowering unless we start replacing every G_ZEXT of a shift amount with a G_AND, but that feels clunky and, I think, introduces other issues: it forces the lowering a certain way and takes decisions away from ISel.

Doing it during ISel by looking through the zext is another option, but after spending a day trying to bend the DAG emitter to do what I want, I gave up on that. We have so many patterns using our cshl_32 (and friends) PatFrags that I couldn't please them all: I always either crashed the TableGen backend, got type inference issues, or got a pattern that never worked (e.g. one that matched an i64 shift amount for 32-bit shift ops).

@arsenm (Contributor) commented Mar 14, 2025

> GlobalISel unfortunately needs it. We can end up with things like a G_LSHR with the shift amount being zext'd, and they're both lowered independently, so we end up with an s_and_b32 of the shift amount.

It should always be combinable post-legalize / post-regbankselect. Things are strictly more difficult after selection.

@Pierre-vh (Contributor, Author)

> GlobalISel unfortunately needs it. We can end up with things like a G_LSHR with the shift amount being zext'd, and they're both lowered independently, so we end up with an s_and_b32 of the shift amount.

> It should always be combinable post-legalize / post-regbankselect. Things are strictly more difficult after selection.

The main issue I was having was with code that had <32-bit arguments in registers. We'd have:

%0(s32) = COPY $sgpr0
%1(s16) = G_TRUNC %0
%2(s32) = G_ZEXT %1

Then %2 is used as the shift amount. We can't eliminate the zext/trunc because the generic opcode has no mention of reading only the lower bits, AFAIK. I tried experimenting with multiple approaches, but I didn't find anything better than doing it in SIFoldOperands.

@arsenm (Contributor) commented Mar 14, 2025

> Then %2 is used as the shift amount. We can't eliminate the zext/trunc because the generic opcode has no mention of reading only the lower bits, AFAIK.

We can fold the clamp of the shift amount into the shift instruction during selection, as we know the instruction ignores the high bits. We do that in the DAG path already. I think it special-cases the and & (bitwidth - 1) pattern, which should form canonically. In principle it could do a general simplify-demanded-bits.
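
A minimal sketch of that special case (standalone C++, with an illustrative name; not the in-tree helper): these shifts only read the low log2(bitwidth) bits of the amount, so an AND on the amount is redundant exactly when the mask keeps all of those bits.

#include <cstdint>

// Assumes BitWidth is a power of two (32 or 64 here, matching the tests above).
static bool maskPreservesShiftAmt(uint64_t Mask, unsigned BitWidth) {
  uint64_t Needed = BitWidth - 1; // 31 = 0b11111 for 32-bit, 63 = 0b111111 for 64-bit
  return (Mask & Needed) == Needed;
}

// maskPreservesShiftAmt(65535, 32) -> true  (foldable, as in the tests above)
// maskPreservesShiftAmt(63, 64)    -> true
// maskPreservesShiftAmt(30, 32)    -> false (bit 0 cleared: mask too small)
// maskPreservesShiftAmt(90, 32)    -> false (holes in the low bits)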

@Pierre-vh (Contributor, Author)

> We can fold the clamp of the shift amount into the shift instruction during selection, as we know the instruction ignores the high bits. We do that in the DAG path already. I think it special-cases the and & (bitwidth - 1) pattern, which should form canonically. In principle it could do a general simplify-demanded-bits.

Where and how should that be implemented? I struggled with that. I tried adding a new special case in TableGen, but I just couldn't find the right way to do it.
Do I just add it in the C++ InstructionSelector before it checks the patterns?
Or should it be some kind of post-processing step after the shift has been selected, but before the G_ZEXT is selected?

@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/precommit_bitmask_test branch from fcd5623 to 6db5fe8 Compare March 17, 2025 08:49
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/bfx-on-s16 branch 2 times, most recently from 090fa3e to 2dc7126 Compare March 17, 2025 09:15
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/precommit_bitmask_test branch from 6db5fe8 to d4b257d Compare March 17, 2025 09:16
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/bfx-on-s16 branch from 2dc7126 to d65db02 Compare March 17, 2025 09:29
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/precommit_bitmask_test branch from d4b257d to 65d5012 Compare March 17, 2025 09:29
@arsenm (Contributor) commented Mar 17, 2025

> Where and how should that be implemented? I struggled with that. I tried adding a new special case in TableGen, but I just couldn't find the right way to do it. Do I just add it in the C++ InstructionSelector before it checks the patterns? Or should it be some kind of post-processing step after the shift has been selected, but before the G_ZEXT is selected?

It already exists as a complex pattern, isUnneededShiftMask. The combiners should be trying to get the clamping code into this form, which expects the and.

@Pierre-vh Pierre-vh closed this Mar 17, 2025
@Pierre-vh Pierre-vh deleted the users/pierre-vh/precommit_bitmask_test branch March 17, 2025 11:56
@Pierre-vh (Contributor, Author)

> Where and how should that be implemented? I struggled with that. I tried adding a new special case in TableGen, but I just couldn't find the right way to do it. Do I just add it in the C++ InstructionSelector before it checks the patterns? Or should it be some kind of post-processing step after the shift has been selected, but before the G_ZEXT is selected?

> It already exists as a complex pattern, isUnneededShiftMask. The combiners should be trying to get the clamping code into this form, which expects the and.

I tried it, but the DAG immediately transforms (and x, 0xFF) into a zext, and it seems pretty stubborn about it since it's a basic transform.
I don't mind trying to make it work a bit longer, but I could also just bring this back. What do you think?

@arsenm (Contributor) commented Mar 22, 2025

> I tried it, but the DAG immediately transforms (and x, 0xFF) into a zext, and it seems pretty stubborn about it since it's a basic transform.

Then isUnneededShiftMask should probably recognize more forms of the pattern. I'd assume this only forms the zext if it is legal.
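
As a rough sketch of what "recognizing more forms" could mean (the helper name and structure below are made up for illustration; this is not the in-tree isUnneededShiftMask), the clamp on the amount is a no-op both for an explicit AND mask and for a zext-of-trunc that still preserves every bit the shift reads:

#include "llvm/CodeGen/SelectionDAGNodes.h"
#include "llvm/Support/MathExtras.h"
using namespace llvm;

static bool isNoOpShiftAmtClamp(SDValue Amt, unsigned ShiftBitWidth) {
  unsigned AmtBitsRead = Log2_32(ShiftBitWidth); // 5 for 32-bit shifts, 6 for 64-bit
  if (Amt.getOpcode() == ISD::AND) {
    // The existing and & (bitwidth - 1) case: the mask must keep all read bits.
    if (auto *C = dyn_cast<ConstantSDNode>(Amt.getOperand(1)))
      return (C->getZExtValue() & (ShiftBitWidth - 1)) == ShiftBitWidth - 1;
    return false;
  }
  // zext(trunc x) only keeps the low bits of x; that is harmless for the shift
  // as long as the truncated type is at least as wide as the bits it reads.
  if (Amt.getOpcode() == ISD::ZERO_EXTEND &&
      Amt.getOperand(0).getOpcode() == ISD::TRUNCATE)
    return Amt.getOperand(0).getValueType().getScalarSizeInBits() >= AmtBitsRead;
  return false;
}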

@Pierre-vh (Contributor, Author)

> Then isUnneededShiftMask should probably recognize more forms of the pattern.

Yes, but with a cast operation it gets tricky. If I add (zext node:$src0) to the PatFrag, it doesn't work because the PatFrag is always used like this: (cshl_32 (i32 $src0)). As $src0 is inferred to be i32, it proceeds to infer the zext as a zext to i64, and then we get a type mismatch error.
I'm not sure how to proceed with this one; I need someone to help me figure out the right syntax.
