Skip to content

[ValueTracking] Recognize LShr(UINT_MAX, Y) + 1 as a power-of-two #91171

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 7, 2024

Conversation

YanWQ-monad
Copy link
Contributor

There is a missed optimization in

define i8 @known_power_of_two_rust_next_power_of_two(i8 %x, i8 %y) {
  %2 = add i8 %x, -1
  %3 = tail call i8 @llvm.ctlz.i8(i8 %2, i1 true)
  %4 = lshr i8 -1, %3
  %5 = add i8 %4, 1
  %6 = icmp ugt i8 %x, 1
  %p = select i1 %6, i8 %5, i8 1

  %r = urem i8 %y, %p
  ret i8 %r
}

which is extracted from the Rust code

fn func(x: usize, y: usize) -> usize {
    let z = x.next_power_of_two();
    y % z
}

Here %p (a.k.a z) is semantically a power-of-two, so y urem p can be optimized to y & (p - 1). (Alive2 proof: https://alive2.llvm.org/ce/z/H3zooY)


It could be generalized to recognizing LShr(UINT_MAX, Y) + 1 as a power-of-two, which is what this PR does.
Alive2 proof: https://alive2.llvm.org/ce/z/zUPTbc

@YanWQ-monad YanWQ-monad requested a review from nikic as a code owner May 6, 2024 08:08
@llvmbot
Copy link
Member

llvmbot commented May 6, 2024

@llvm/pr-subscribers-llvm-analysis

Author: Monad (YanWQ-monad)

Changes

There is a missed optimization in

define i8 @<!-- -->known_power_of_two_rust_next_power_of_two(i8 %x, i8 %y) {
  %2 = add i8 %x, -1
  %3 = tail call i8 @<!-- -->llvm.ctlz.i8(i8 %2, i1 true)
  %4 = lshr i8 -1, %3
  %5 = add i8 %4, 1
  %6 = icmp ugt i8 %x, 1
  %p = select i1 %6, i8 %5, i8 1

  %r = urem i8 %y, %p
  ret i8 %r
}

which is extracted from the Rust code

fn func(x: usize, y: usize) -&gt; usize {
    let z = x.next_power_of_two();
    y % z
}

Here %p (a.k.a z) is semantically a power-of-two, so y urem p can be optimized to y &amp; (p - 1). (Alive2 proof: https://alive2.llvm.org/ce/z/H3zooY)


It could be generalized to recognizing LShr(UINT_MAX, Y) + 1 as a power-of-two, which is what this PR does.
Alive2 proof: https://alive2.llvm.org/ce/z/zUPTbc


Full diff: https://github.com/llvm/llvm-project/pull/91171.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/ValueTracking.cpp (+9-1)
  • (modified) llvm/test/Analysis/ValueTracking/known-power-of-two-urem.ll (+23)
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 0dbb39d7c8ec46..33bedf7ad3d90d 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -2144,6 +2144,8 @@ bool isKnownToBeAPowerOfTwo(const Value *V, bool OrZero, unsigned Depth,
       return OrZero || isKnownNonZero(I->getOperand(0), Q, Depth);
     return false;
   case Instruction::Add: {
+    unsigned BitWidth = V->getType()->getScalarSizeInBits();
+
     // Adding a power-of-two or zero to the same power-of-two or zero yields
     // either the original power-of-two, a larger power-of-two or zero.
     const OverflowingBinaryOperator *VOBO = cast<OverflowingBinaryOperator>(V);
@@ -2158,7 +2160,6 @@ bool isKnownToBeAPowerOfTwo(const Value *V, bool OrZero, unsigned Depth,
           isKnownToBeAPowerOfTwo(I->getOperand(0), OrZero, Depth, Q))
         return true;
 
-      unsigned BitWidth = V->getType()->getScalarSizeInBits();
       KnownBits LHSBits(BitWidth);
       computeKnownBits(I->getOperand(0), LHSBits, Depth, Q);
 
@@ -2173,6 +2174,13 @@ bool isKnownToBeAPowerOfTwo(const Value *V, bool OrZero, unsigned Depth,
         if (OrZero || RHSBits.One.getBoolValue() || LHSBits.One.getBoolValue())
           return true;
     }
+
+    // LShr(UINT_MAX, Y) + 1 is a power of two (if nuw) or zero.
+    if (OrZero || Q.IIQ.hasNoUnsignedWrap(VOBO)) {
+      APInt UIntMax = APInt::getMaxValue(BitWidth);
+      if (match(I, m_Add(m_LShr(m_SpecificInt(UIntMax), m_Value()), m_One())))
+        return true;
+    }
     return false;
   }
   case Instruction::Select:
diff --git a/llvm/test/Analysis/ValueTracking/known-power-of-two-urem.ll b/llvm/test/Analysis/ValueTracking/known-power-of-two-urem.ll
index 0fa8a74d250d25..20497cb57d03d0 100644
--- a/llvm/test/Analysis/ValueTracking/known-power-of-two-urem.ll
+++ b/llvm/test/Analysis/ValueTracking/known-power-of-two-urem.ll
@@ -387,3 +387,26 @@ for.end:
   %r = phi i64 [ %sum, %for.body ]
   ret i64 %r
 }
+
+; https://alive2.llvm.org/ce/z/3QfEHm
+define i8 @known_power_of_two_rust_next_power_of_two(i8 %x, i8 %y) {
+; CHECK-LABEL: @known_power_of_two_rust_next_power_of_two(
+; CHECK-NEXT:    [[TMP1:%.*]] = add i8 [[X:%.*]], -1
+; CHECK-NEXT:    [[TMP2:%.*]] = tail call range(i8 0, 9) i8 @llvm.ctlz.i8(i8 [[TMP1]], i1 true)
+; CHECK-NEXT:    [[TMP3:%.*]] = lshr i8 -1, [[TMP2]]
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp ugt i8 [[X]], 1
+; CHECK-NEXT:    [[TMP5:%.*]] = select i1 [[TMP4]], i8 [[TMP3]], i8 0
+; CHECK-NEXT:    [[R:%.*]] = and i8 [[TMP5]], [[Y:%.*]]
+; CHECK-NEXT:    ret i8 [[R]]
+;
+  %2 = add i8 %x, -1
+  %3 = tail call i8 @llvm.ctlz.i8(i8 %2, i1 true)
+  %4 = lshr i8 -1, %3
+  %5 = add i8 %4, 1
+  %6 = icmp ugt i8 %x, 1
+  %p = select i1 %6, i8 %5, i8 1
+  ; Rust's implementation of `%p = next_power_of_two(%x)`
+
+  %r = urem i8 %y, %p
+  ret i8 %r
+}

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find.


%r = urem i8 %y, %p
ret i8 %r
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please also add a more targeted test of just the lshr + add pattern, and then test:

  • The OrZero/nuw logic.
  • Negative tests for wrong constants (-1, 1).

@nikic nikic requested a review from dtcxzyw May 6, 2024 10:16
dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request May 6, 2024
Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Nice catch!

BTW, can you please have a look at https://github.com/dtcxzyw/llvm-opt-benchmark/blob/03a9d18338e0213e304f67ea9cb18e4ae95f5d26/bench/regex-rs/optimized/11vfjke4utuj478u.ll#L7147-L7152? I guess we can handle this pattern in foldCttzCtlz as well :)

Proof: https://alive2.llvm.org/ce/z/2aHfYa

@nikic nikic merged commit fd0ffb7 into llvm:main May 7, 2024
4 checks passed
nikic pushed a commit that referenced this pull request May 7, 2024
Fold
``` llvm
define i64 @src(i64 %50) {
  %52 = lshr i64 -1, %50
  %53 = add i64 %52, 1
  %54 = call i64 @llvm.cttz.i64(i64 %53, i1 false)
  ret i64 %54
}
```
to
``` llvm
define i64 @tgt(i64 %50) {
  %52 = sub i64 64, %50
  ret i64 %52
}
```

as
#91171 (review)
pointed out.

Alive2 proof: https://alive2.llvm.org/ce/z/2aHfYa

Note: the `ctlz` version of this pattern seems not exist in dtcxzyw's
benchmark, so put it aside for now.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants