-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[ValueTracking] Recognize LShr(UINT_MAX, Y) + 1
as a power-of-two
#91171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-analysis Author: Monad (YanWQ-monad) ChangesThere is a missed optimization in define i8 @<!-- -->known_power_of_two_rust_next_power_of_two(i8 %x, i8 %y) {
%2 = add i8 %x, -1
%3 = tail call i8 @<!-- -->llvm.ctlz.i8(i8 %2, i1 true)
%4 = lshr i8 -1, %3
%5 = add i8 %4, 1
%6 = icmp ugt i8 %x, 1
%p = select i1 %6, i8 %5, i8 1
%r = urem i8 %y, %p
ret i8 %r
} which is extracted from the Rust code fn func(x: usize, y: usize) -> usize {
let z = x.next_power_of_two();
y % z
} Here It could be generalized to recognizing Full diff: https://github.com/llvm/llvm-project/pull/91171.diff 2 Files Affected:
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 0dbb39d7c8ec46..33bedf7ad3d90d 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -2144,6 +2144,8 @@ bool isKnownToBeAPowerOfTwo(const Value *V, bool OrZero, unsigned Depth,
return OrZero || isKnownNonZero(I->getOperand(0), Q, Depth);
return false;
case Instruction::Add: {
+ unsigned BitWidth = V->getType()->getScalarSizeInBits();
+
// Adding a power-of-two or zero to the same power-of-two or zero yields
// either the original power-of-two, a larger power-of-two or zero.
const OverflowingBinaryOperator *VOBO = cast<OverflowingBinaryOperator>(V);
@@ -2158,7 +2160,6 @@ bool isKnownToBeAPowerOfTwo(const Value *V, bool OrZero, unsigned Depth,
isKnownToBeAPowerOfTwo(I->getOperand(0), OrZero, Depth, Q))
return true;
- unsigned BitWidth = V->getType()->getScalarSizeInBits();
KnownBits LHSBits(BitWidth);
computeKnownBits(I->getOperand(0), LHSBits, Depth, Q);
@@ -2173,6 +2174,13 @@ bool isKnownToBeAPowerOfTwo(const Value *V, bool OrZero, unsigned Depth,
if (OrZero || RHSBits.One.getBoolValue() || LHSBits.One.getBoolValue())
return true;
}
+
+ // LShr(UINT_MAX, Y) + 1 is a power of two (if nuw) or zero.
+ if (OrZero || Q.IIQ.hasNoUnsignedWrap(VOBO)) {
+ APInt UIntMax = APInt::getMaxValue(BitWidth);
+ if (match(I, m_Add(m_LShr(m_SpecificInt(UIntMax), m_Value()), m_One())))
+ return true;
+ }
return false;
}
case Instruction::Select:
diff --git a/llvm/test/Analysis/ValueTracking/known-power-of-two-urem.ll b/llvm/test/Analysis/ValueTracking/known-power-of-two-urem.ll
index 0fa8a74d250d25..20497cb57d03d0 100644
--- a/llvm/test/Analysis/ValueTracking/known-power-of-two-urem.ll
+++ b/llvm/test/Analysis/ValueTracking/known-power-of-two-urem.ll
@@ -387,3 +387,26 @@ for.end:
%r = phi i64 [ %sum, %for.body ]
ret i64 %r
}
+
+; https://alive2.llvm.org/ce/z/3QfEHm
+define i8 @known_power_of_two_rust_next_power_of_two(i8 %x, i8 %y) {
+; CHECK-LABEL: @known_power_of_two_rust_next_power_of_two(
+; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X:%.*]], -1
+; CHECK-NEXT: [[TMP2:%.*]] = tail call range(i8 0, 9) i8 @llvm.ctlz.i8(i8 [[TMP1]], i1 true)
+; CHECK-NEXT: [[TMP3:%.*]] = lshr i8 -1, [[TMP2]]
+; CHECK-NEXT: [[TMP4:%.*]] = icmp ugt i8 [[X]], 1
+; CHECK-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i8 [[TMP3]], i8 0
+; CHECK-NEXT: [[R:%.*]] = and i8 [[TMP5]], [[Y:%.*]]
+; CHECK-NEXT: ret i8 [[R]]
+;
+ %2 = add i8 %x, -1
+ %3 = tail call i8 @llvm.ctlz.i8(i8 %2, i1 true)
+ %4 = lshr i8 -1, %3
+ %5 = add i8 %4, 1
+ %6 = icmp ugt i8 %x, 1
+ %p = select i1 %6, i8 %5, i8 1
+ ; Rust's implementation of `%p = next_power_of_two(%x)`
+
+ %r = urem i8 %y, %p
+ ret i8 %r
+}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice find.
|
||
%r = urem i8 %y, %p | ||
ret i8 %r | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please also add a more targeted test of just the lshr + add pattern, and then test:
- The OrZero/nuw logic.
- Negative tests for wrong constants (-1, 1).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Nice catch!
BTW, can you please have a look at https://github.com/dtcxzyw/llvm-opt-benchmark/blob/03a9d18338e0213e304f67ea9cb18e4ae95f5d26/bench/regex-rs/optimized/11vfjke4utuj478u.ll#L7147-L7152? I guess we can handle this pattern in foldCttzCtlz
as well :)
Fold ``` llvm define i64 @src(i64 %50) { %52 = lshr i64 -1, %50 %53 = add i64 %52, 1 %54 = call i64 @llvm.cttz.i64(i64 %53, i1 false) ret i64 %54 } ``` to ``` llvm define i64 @tgt(i64 %50) { %52 = sub i64 64, %50 ret i64 %52 } ``` as #91171 (review) pointed out. Alive2 proof: https://alive2.llvm.org/ce/z/2aHfYa Note: the `ctlz` version of this pattern seems not exist in dtcxzyw's benchmark, so put it aside for now.
There is a missed optimization in
which is extracted from the Rust code
Here
%p
(a.k.az
) is semantically a power-of-two, soy urem p
can be optimized toy & (p - 1)
. (Alive2 proof: https://alive2.llvm.org/ce/z/H3zooY)It could be generalized to recognizing
LShr(UINT_MAX, Y) + 1
as a power-of-two, which is what this PR does.Alive2 proof: https://alive2.llvm.org/ce/z/zUPTbc