Skip to content

AMDGPU: Replace unused permlane inputs with poison instead of undef #131288

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

arsenm
Copy link
Contributor

@arsenm arsenm commented Mar 14, 2025

No description provided.

Copy link
Contributor Author

arsenm commented Mar 14, 2025

@llvmbot
Copy link
Member

llvmbot commented Mar 14, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/131288.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll (+4-4)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
index 26c48c19ebfd8..7cd97e95b0189 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
@@ -1115,7 +1115,7 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
   case Intrinsic::amdgcn_permlanex16_var: {
     // Discard vdst_in if it's not going to be read.
     Value *VDstIn = II.getArgOperand(0);
-    if (isa<UndefValue>(VDstIn))
+    if (isa<PoisonValue>(VDstIn))
       break;
 
     // FetchInvalid operand idx.
@@ -1134,7 +1134,7 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
     if (!FetchInvalid->getZExtValue() && !BoundCtrl->getZExtValue())
       break;
 
-    return IC.replaceOperand(II, 0, UndefValue::get(VDstIn->getType()));
+    return IC.replaceOperand(II, 0, PoisonValue::get(VDstIn->getType()));
   }
   case Intrinsic::amdgcn_permlane64:
   case Intrinsic::amdgcn_readfirstlane:
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
index 3e30b2a5b0911..99e6fc98588ae 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
@@ -3272,7 +3272,7 @@ define amdgpu_kernel void @permlane16(ptr addrspace(1) %out, i32 %src0, i32 %src
 
 define amdgpu_kernel void @permlane16_bound_ctrl(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {
 ; CHECK-LABEL: @permlane16_bound_ctrl(
-; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.permlane16.i32(i32 undef, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 false, i1 true)
+; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.permlane16.i32(i32 poison, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 false, i1 true)
 ; CHECK-NEXT:    store i32 [[RES]], ptr addrspace(1) [[OUT:%.*]], align 4
 ; CHECK-NEXT:    ret void
 ;
@@ -3283,7 +3283,7 @@ define amdgpu_kernel void @permlane16_bound_ctrl(ptr addrspace(1) %out, i32 %src
 
 define amdgpu_kernel void @permlane16_fetch_invalid_bound_ctrl(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {
 ; CHECK-LABEL: @permlane16_fetch_invalid_bound_ctrl(
-; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.permlane16.i32(i32 undef, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 true, i1 true)
+; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.permlane16.i32(i32 poison, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 true, i1 true)
 ; CHECK-NEXT:    store i32 [[RES]], ptr addrspace(1) [[OUT:%.*]], align 4
 ; CHECK-NEXT:    ret void
 ;
@@ -3311,7 +3311,7 @@ define amdgpu_kernel void @permlanex16(ptr addrspace(1) %out, i32 %src0, i32 %sr
 
 define amdgpu_kernel void @permlanex16_bound_ctrl(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {
 ; CHECK-LABEL: @permlanex16_bound_ctrl(
-; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.permlanex16.i32(i32 undef, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 false, i1 true)
+; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.permlanex16.i32(i32 poison, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 false, i1 true)
 ; CHECK-NEXT:    store i32 [[RES]], ptr addrspace(1) [[OUT:%.*]], align 4
 ; CHECK-NEXT:    ret void
 ;
@@ -3322,7 +3322,7 @@ define amdgpu_kernel void @permlanex16_bound_ctrl(ptr addrspace(1) %out, i32 %sr
 
 define amdgpu_kernel void @permlanex16_fetch_invalid_bound_ctrl(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {
 ; CHECK-LABEL: @permlanex16_fetch_invalid_bound_ctrl(
-; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.permlanex16.i32(i32 undef, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 true, i1 true)
+; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.permlanex16.i32(i32 poison, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 true, i1 true)
 ; CHECK-NEXT:    store i32 [[RES]], ptr addrspace(1) [[OUT:%.*]], align 4
 ; CHECK-NEXT:    ret void
 ;

@@ -1115,7 +1115,7 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
case Intrinsic::amdgcn_permlanex16_var: {
// Discard vdst_in if it's not going to be read.
Value *VDstIn = II.getArgOperand(0);
if (isa<UndefValue>(VDstIn))
if (isa<PoisonValue>(VDstIn))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how prevalent this intrinsic is, but I would maybe leave this one as-is to preserve compatibility with older IR. We can always do a search & replace later when we get rid of undef.
The rest LGTM, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly to stop infinite looping the combiner

@jayfoad
Copy link
Contributor

jayfoad commented Mar 14, 2025

Same kind of objection as #131287: as a general strategy, "replace unused inputs with poison" seems incompatible with "propagate poison from arguments to result". @nunoplopes any thoughts on this?

@arsenm
Copy link
Contributor Author

arsenm commented Mar 14, 2025

Same kind of objection as #131287: as a general strategy, "replace unused inputs with poison"

Repeating from the other review, but this is not the case. Poison does not unconditionally fold through intrinsics. This is specific to an operand for an intrinsic. It just happens in the common case, normal arithmetic-like intrinsics propagate poison in any inputs

@jayfoad
Copy link
Contributor

jayfoad commented Mar 14, 2025

Same kind of objection as #131287: as a general strategy, "replace unused inputs with poison"

Repeating from the other review, but this is not the case. Poison does not unconditionally fold through intrinsics. This is specific to an operand for an intrinsic. It just happens in the common case, normal arithmetic-like intrinsics propagate poison in any inputs

Then I think the poison-propagating rules for this intrinsic should be documented. They're not obvious and "it just does what the underlying instruction does" is no longer sufficient.

@arsenm
Copy link
Contributor Author

arsenm commented Mar 14, 2025

Then I think the poison-propagating rules for this intrinsic should be documented. They're not obvious and "it just does what the underlying instruction does" is no longer sufficient.

We're currently not propagating poison for these intrinsics, and this patch isn't changing any behavior. This is changing an implementation detail of the discard input that is already done

@jayfoad
Copy link
Contributor

jayfoad commented Mar 14, 2025

Then I think the poison-propagating rules for this intrinsic should be documented. They're not obvious and "it just does what the underlying instruction does" is no longer sufficient.

We're currently not propagating poison for these intrinsics, and this patch isn't changing any behavior. This is changing an implementation detail of the discard input that is already done

This patch is relying on an undocumented assumption that poison is not propagated for all arguments of the intrinsic.

But I won't insist, if you really want to commit it as-is.

@arsenm
Copy link
Contributor Author

arsenm commented Mar 14, 2025

This patch is relying on an undocumented assumption that poison is not propagated for all arguments of the intrinsic.

The only folding of anything for target intrinsics is done explicitly here in this function

Copy link
Contributor Author

arsenm commented Mar 18, 2025

Merge activity

  • Mar 18, 6:23 AM EDT: A user started a stack merge that includes this pull request via Graphite.
  • Mar 18, 6:34 AM EDT: Graphite rebased this pull request as part of a merge.
  • Mar 18, 6:37 AM EDT: A user merged this pull request with Graphite.

@arsenm arsenm force-pushed the users/arsenm/amdgpu/amdgpu-replace-unused-update-dpp-with-poison-not-undef branch from 4928c54 to cbc0b70 Compare March 18, 2025 10:31
Base automatically changed from users/arsenm/amdgpu/amdgpu-replace-unused-update-dpp-with-poison-not-undef to main March 18, 2025 10:34
@arsenm arsenm force-pushed the users/arsenm/amdgpu/amdgpu-replace-unused-permlane-input-with-poison-not-undef branch from b9480c7 to f44ea0e Compare March 18, 2025 10:34
@arsenm arsenm merged commit c180fc8 into main Mar 18, 2025
6 of 9 checks passed
@arsenm arsenm deleted the users/arsenm/amdgpu/amdgpu-replace-unused-permlane-input-with-poison-not-undef branch March 18, 2025 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants