[ValueTracking] Recognize `LShr(UINT_MAX, Y) + 1` as a power-of-two #91171

YanWQ-monad · 2024-05-06T08:08:48Z

There is a missed optimization in

define i8 @known_power_of_two_rust_next_power_of_two(i8 %x, i8 %y) {
  %2 = add i8 %x, -1
  %3 = tail call i8 @llvm.ctlz.i8(i8 %2, i1 true)
  %4 = lshr i8 -1, %3
  %5 = add i8 %4, 1
  %6 = icmp ugt i8 %x, 1
  %p = select i1 %6, i8 %5, i8 1

  %r = urem i8 %y, %p
  ret i8 %r
}

which is extracted from the Rust code

fn func(x: usize, y: usize) -> usize {
    let z = x.next_power_of_two();
    y % z
}

Here %p (a.k.a z) is semantically a power-of-two, so y urem p can be optimized to y & (p - 1). (Alive2 proof: https://alive2.llvm.org/ce/z/H3zooY)

It could be generalized to recognizing LShr(UINT_MAX, Y) + 1 as a power-of-two, which is what this PR does.
Alive2 proof: https://alive2.llvm.org/ce/z/zUPTbc

llvmbot · 2024-05-06T08:09:17Z

@llvm/pr-subscribers-llvm-analysis

Author: Monad (YanWQ-monad)

Changes

There is a missed optimization in

define i8 @<!-- -->known_power_of_two_rust_next_power_of_two(i8 %x, i8 %y) {
  %2 = add i8 %x, -1
  %3 = tail call i8 @<!-- -->llvm.ctlz.i8(i8 %2, i1 true)
  %4 = lshr i8 -1, %3
  %5 = add i8 %4, 1
  %6 = icmp ugt i8 %x, 1
  %p = select i1 %6, i8 %5, i8 1

  %r = urem i8 %y, %p
  ret i8 %r
}

which is extracted from the Rust code

fn func(x: usize, y: usize) -&gt; usize {
    let z = x.next_power_of_two();
    y % z
}

Here %p (a.k.a z) is semantically a power-of-two, so y urem p can be optimized to y & (p - 1). (Alive2 proof: https://alive2.llvm.org/ce/z/H3zooY)

It could be generalized to recognizing LShr(UINT_MAX, Y) + 1 as a power-of-two, which is what this PR does.
Alive2 proof: https://alive2.llvm.org/ce/z/zUPTbc

Full diff: https://github.com/llvm/llvm-project/pull/91171.diff

2 Files Affected:

(modified) llvm/lib/Analysis/ValueTracking.cpp (+9-1)
(modified) llvm/test/Analysis/ValueTracking/known-power-of-two-urem.ll (+23)

diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 0dbb39d7c8ec46..33bedf7ad3d90d 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -2144,6 +2144,8 @@ bool isKnownToBeAPowerOfTwo(const Value *V, bool OrZero, unsigned Depth,
       return OrZero || isKnownNonZero(I->getOperand(0), Q, Depth);
     return false;
   case Instruction::Add: {
+    unsigned BitWidth = V->getType()->getScalarSizeInBits();
+
     // Adding a power-of-two or zero to the same power-of-two or zero yields
     // either the original power-of-two, a larger power-of-two or zero.
     const OverflowingBinaryOperator *VOBO = cast<OverflowingBinaryOperator>(V);
@@ -2158,7 +2160,6 @@ bool isKnownToBeAPowerOfTwo(const Value *V, bool OrZero, unsigned Depth,
           isKnownToBeAPowerOfTwo(I->getOperand(0), OrZero, Depth, Q))
         return true;
 
-      unsigned BitWidth = V->getType()->getScalarSizeInBits();
       KnownBits LHSBits(BitWidth);
       computeKnownBits(I->getOperand(0), LHSBits, Depth, Q);
 
@@ -2173,6 +2174,13 @@ bool isKnownToBeAPowerOfTwo(const Value *V, bool OrZero, unsigned Depth,
         if (OrZero || RHSBits.One.getBoolValue() || LHSBits.One.getBoolValue())
           return true;
     }
+
+    // LShr(UINT_MAX, Y) + 1 is a power of two (if nuw) or zero.
+    if (OrZero || Q.IIQ.hasNoUnsignedWrap(VOBO)) {
+      APInt UIntMax = APInt::getMaxValue(BitWidth);
+      if (match(I, m_Add(m_LShr(m_SpecificInt(UIntMax), m_Value()), m_One())))
+        return true;
+    }
     return false;
   }
   case Instruction::Select:
diff --git a/llvm/test/Analysis/ValueTracking/known-power-of-two-urem.ll b/llvm/test/Analysis/ValueTracking/known-power-of-two-urem.ll
index 0fa8a74d250d25..20497cb57d03d0 100644
--- a/llvm/test/Analysis/ValueTracking/known-power-of-two-urem.ll
+++ b/llvm/test/Analysis/ValueTracking/known-power-of-two-urem.ll
@@ -387,3 +387,26 @@ for.end:
   %r = phi i64 [ %sum, %for.body ]
   ret i64 %r
 }
+
+; https://alive2.llvm.org/ce/z/3QfEHm
+define i8 @known_power_of_two_rust_next_power_of_two(i8 %x, i8 %y) {
+; CHECK-LABEL: @known_power_of_two_rust_next_power_of_two(
+; CHECK-NEXT:    [[TMP1:%.*]] = add i8 [[X:%.*]], -1
+; CHECK-NEXT:    [[TMP2:%.*]] = tail call range(i8 0, 9) i8 @llvm.ctlz.i8(i8 [[TMP1]], i1 true)
+; CHECK-NEXT:    [[TMP3:%.*]] = lshr i8 -1, [[TMP2]]
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp ugt i8 [[X]], 1
+; CHECK-NEXT:    [[TMP5:%.*]] = select i1 [[TMP4]], i8 [[TMP3]], i8 0
+; CHECK-NEXT:    [[R:%.*]] = and i8 [[TMP5]], [[Y:%.*]]
+; CHECK-NEXT:    ret i8 [[R]]
+;
+  %2 = add i8 %x, -1
+  %3 = tail call i8 @llvm.ctlz.i8(i8 %2, i1 true)
+  %4 = lshr i8 -1, %3
+  %5 = add i8 %4, 1
+  %6 = icmp ugt i8 %x, 1
+  %p = select i1 %6, i8 %5, i8 1
+  ; Rust's implementation of `%p = next_power_of_two(%x)`
+
+  %r = urem i8 %y, %p
+  ret i8 %r
+}

nikic

Nice find.

llvm/lib/Analysis/ValueTracking.cpp

nikic · 2024-05-06T10:15:49Z

llvm/test/Analysis/ValueTracking/known-power-of-two-urem.ll

+
+  %r = urem i8 %y, %p
+  ret i8 %r
+}


Could you please also add a more targeted test of just the lshr + add pattern, and then test:

The OrZero/nuw logic.

Negative tests for wrong constants (-1, 1).

PR Link: llvm/llvm-project#91171

dtcxzyw

LGTM. Nice catch!

BTW, can you please have a look at https://github.com/dtcxzyw/llvm-opt-benchmark/blob/03a9d18338e0213e304f67ea9cb18e4ae95f5d26/bench/regex-rs/optimized/11vfjke4utuj478u.ll#L7147-L7152? I guess we can handle this pattern in foldCttzCtlz as well :)

Proof: https://alive2.llvm.org/ce/z/2aHfYa

@src

Fold ``` llvm define i64 @src(i64 %50) { %52 = lshr i64 -1, %50 %53 = add i64 %52, 1 %54 = call i64 @llvm.cttz.i64(i64 %53, i1 false) ret i64 %54 } ``` to ``` llvm define i64 @tgt(i64 %50) { %52 = sub i64 64, %50 ret i64 %52 } ``` as #91171 (review) pointed out. Alive2 proof: https://alive2.llvm.org/ce/z/2aHfYa Note: the `ctlz` version of this pattern seems not exist in dtcxzyw's benchmark, so put it aside for now.

YanWQ-monad added 2 commits May 6, 2024 15:43

[ValueTracking] Pre-commit: add test

497a79e

[ValueTracking] Recognize LShr(UINT_MAX, Y) + 1 as a power-of-two

bd1e9ba

YanWQ-monad requested a review from nikic as a code owner May 6, 2024 08:08

llvmbot added the llvm:analysis label May 6, 2024

nikic reviewed May 6, 2024

View reviewed changes

nikic requested a review from dtcxzyw May 6, 2024 10:16

YanWQ-monad added 2 commits May 6, 2024 18:55

[ValueTracking] Use m_AllOnes

3a64b85

[ValueTracking] Add more tests

b26bf5c

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request May 6, 2024

pre-commit: test PR91171

e366dc9

PR Link: llvm/llvm-project#91171

dtcxzyw mentioned this pull request May 6, 2024

pre-commit: test PR91171 dtcxzyw/llvm-opt-benchmark#582

Closed

dtcxzyw approved these changes May 6, 2024

View reviewed changes

YanWQ-monad mentioned this pull request May 6, 2024

[InstCombine] Fold cttz(lshr(-1, x) + 1) to width - x #91244

Merged

nikic approved these changes May 7, 2024

View reviewed changes

nikic merged commit fd0ffb7 into llvm:main May 7, 2024
4 checks passed

YanWQ-monad mentioned this pull request May 14, 2024

[ValueTracking] Missed optimization of modulo power-of-two #92074

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[ValueTracking] Recognize `LShr(UINT_MAX, Y) + 1` as a power-of-two #91171

[ValueTracking] Recognize `LShr(UINT_MAX, Y) + 1` as a power-of-two #91171

Uh oh!

YanWQ-monad commented May 6, 2024

Uh oh!

llvmbot commented May 6, 2024

Uh oh!

nikic left a comment

Uh oh!

Uh oh!

nikic May 6, 2024

Uh oh!

dtcxzyw left a comment

Uh oh!

Uh oh!

Uh oh!

[ValueTracking] Recognize LShr(UINT_MAX, Y) + 1 as a power-of-two #91171

[ValueTracking] Recognize LShr(UINT_MAX, Y) + 1 as a power-of-two #91171

Uh oh!

Conversation

YanWQ-monad commented May 6, 2024

Uh oh!

llvmbot commented May 6, 2024

Uh oh!

nikic left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

nikic May 6, 2024

Choose a reason for hiding this comment

Uh oh!

dtcxzyw left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

[ValueTracking] Recognize `LShr(UINT_MAX, Y) + 1` as a power-of-two #91171

[ValueTracking] Recognize `LShr(UINT_MAX, Y) + 1` as a power-of-two #91171