Skip to content

Reland "VectorUtils: mark xrint as trivially vectorizable" #71416

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 6, 2023

Conversation

artagnon
Copy link
Contributor

@artagnon artagnon commented Nov 6, 2023

With the recent change 98c90a1 (ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering), it is now possible for SLPVectorizer, LoopVectorize, and Scalarizer to operate on llvm.lrint and llvm.llrint, with vector codegen for the RISC-V target. Make a trivial change to VectorUtils, and update the corresponding tests.

A couple of important fixes have been landed since the original patch was landed and reverted, and it is now safe to re-land the patch: 5e1d81a (LegalizeIntegerTypes: implement PromoteIntRes for xrint) and fd887a3 (LegalizeVectorTypes: fix bug in widening of vec result in xrint). See also #71399, which proves that lrint and llrint will indeed produce vector codegen on RISC-V.

Fixes #55208.

With the recent change 98c90a1 (ISel: introduce vector ISD::LRINT,
ISD::LLRINT; custom RISCV lowering), it is now possible for
SLPVectorizer, LoopVectorize, and Scalarizer to operate on llvm.lrint
and llvm.llrint, with vector codegen for the RISC-V target. Make a
trivial change to VectorUtils, and update the corresponding tests.

A couple of important fixes have been landed since the original patch
was landed and reverted, and it is now safe to re-land the patch:
5e1d81a (LegalizeIntegerTypes: implement PromoteIntRes for xrint) and
fd887a3 (LegalizeVectorTypes: fix bug in widening of vec result in
xrint). See also llvm#71399 which proves that lrint and llrint will indeed
produce vector codegen on RISC-V.

Fixes llvm#55208.
@llvmbot
Copy link
Member

llvmbot commented Nov 6, 2023

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Ramkumar Ramachandra (artagnon)

Changes

With the recent change 98c90a1 (ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering), it is now possible for SLPVectorizer, LoopVectorize, and Scalarizer to operate on llvm.lrint and llvm.llrint, with vector codegen for the RISC-V target. Make a trivial change to VectorUtils, and update the corresponding tests.

A couple of important fixes have been landed since the original patch was landed and reverted, and it is now safe to re-land the patch: 5e1d81a (LegalizeIntegerTypes: implement PromoteIntRes for xrint) and fd887a3 (LegalizeVectorTypes: fix bug in widening of vec result in xrint). See also #71399, which proves that lrint and llrint will indeed produce vector codegen on RISC-V.

Fixes #55208.


Full diff: https://github.com/llvm/llvm-project/pull/71416.diff

4 Files Affected:

  • (modified) llvm/lib/Analysis/VectorUtils.cpp (+4)
  • (modified) llvm/test/Transforms/LoopVectorize/intrinsic.ll (+2-2)
  • (modified) llvm/test/Transforms/SLPVectorizer/RISCV/fround.ll (+30-135)
  • (modified) llvm/test/Transforms/Scalarizer/intrinsics.ll (+12-2)
diff --git a/llvm/lib/Analysis/VectorUtils.cpp b/llvm/lib/Analysis/VectorUtils.cpp
index be171867a2bf4d9..4f5db8b7aaf746f 100644
--- a/llvm/lib/Analysis/VectorUtils.cpp
+++ b/llvm/lib/Analysis/VectorUtils.cpp
@@ -91,6 +91,8 @@ bool llvm::isTriviallyVectorizable(Intrinsic::ID ID) {
   case Intrinsic::canonicalize:
   case Intrinsic::fptosi_sat:
   case Intrinsic::fptoui_sat:
+  case Intrinsic::lrint:
+  case Intrinsic::llrint:
     return true;
   default:
     return false;
@@ -122,6 +124,8 @@ bool llvm::isVectorIntrinsicWithOverloadTypeAtArg(Intrinsic::ID ID,
   switch (ID) {
   case Intrinsic::fptosi_sat:
   case Intrinsic::fptoui_sat:
+  case Intrinsic::lrint:
+  case Intrinsic::llrint:
     return OpdIdx == -1 || OpdIdx == 0;
   case Intrinsic::is_fpclass:
     return OpdIdx == 0;
diff --git a/llvm/test/Transforms/LoopVectorize/intrinsic.ll b/llvm/test/Transforms/LoopVectorize/intrinsic.ll
index 3e7754951de6a83..0f070347dd4efb6 100644
--- a/llvm/test/Transforms/LoopVectorize/intrinsic.ll
+++ b/llvm/test/Transforms/LoopVectorize/intrinsic.ll
@@ -1602,7 +1602,7 @@ declare i32 @llvm.lrint.i32.f32(float)
 
 define void @lrint_i32_f32(ptr %x, ptr %y, i64 %n) {
 ; CHECK-LABEL: @lrint_i32_f32(
-; CHECK-NOT: llvm.lrint.v4i32.v4f32
+; CHECK: llvm.lrint.v4i32.v4f32
 ; CHECK: ret void
 ;
 entry:
@@ -1628,7 +1628,7 @@ declare i64 @llvm.llrint.i64.f32(float)
 
 define void @llrint_i64_f32(ptr %x, ptr %y, i64 %n) {
 ; CHECK-LABEL: @llrint_i64_f32(
-; CHECK-NOT: llvm.llrint.v4i32.v4f32
+; CHECK: llvm.llrint.v4i64.v4f32
 ; CHECK: ret void
 ;
 entry:
diff --git a/llvm/test/Transforms/SLPVectorizer/RISCV/fround.ll b/llvm/test/Transforms/SLPVectorizer/RISCV/fround.ll
index e5c46b72a1c30fc..9cd0d5d321e2f60 100644
--- a/llvm/test/Transforms/SLPVectorizer/RISCV/fround.ll
+++ b/llvm/test/Transforms/SLPVectorizer/RISCV/fround.ll
@@ -34,13 +34,8 @@ define <2 x i32> @lrint_v2i32f32(ptr %a) {
 ; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = load <2 x float>, ptr [[A]], align 8
-; CHECK-NEXT:    [[VECEXT:%.*]] = extractelement <2 x float> [[TMP0]], i32 0
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT]])
-; CHECK-NEXT:    [[VECINS:%.*]] = insertelement <2 x i32> undef, i32 [[TMP1]], i32 0
-; CHECK-NEXT:    [[VECEXT_1:%.*]] = extractelement <2 x float> [[TMP0]], i32 1
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_1]])
-; CHECK-NEXT:    [[VECINS_1:%.*]] = insertelement <2 x i32> [[VECINS]], i32 [[TMP2]], i32 1
-; CHECK-NEXT:    ret <2 x i32> [[VECINS_1]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call <2 x i32> @llvm.lrint.v2i32.v2f32(<2 x float> [[TMP0]])
+; CHECK-NEXT:    ret <2 x i32> [[TMP1]]
 ;
 entry:
   %0 = load <2 x float>, ptr %a
@@ -58,19 +53,8 @@ define <4 x i32> @lrint_v4i32f32(ptr %a) {
 ; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
-; CHECK-NEXT:    [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT]])
-; CHECK-NEXT:    [[VECINS:%.*]] = insertelement <4 x i32> undef, i32 [[TMP1]], i32 0
-; CHECK-NEXT:    [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_1]])
-; CHECK-NEXT:    [[VECINS_1:%.*]] = insertelement <4 x i32> [[VECINS]], i32 [[TMP2]], i32 1
-; CHECK-NEXT:    [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
-; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_2]])
-; CHECK-NEXT:    [[VECINS_2:%.*]] = insertelement <4 x i32> [[VECINS_1]], i32 [[TMP3]], i32 2
-; CHECK-NEXT:    [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
-; CHECK-NEXT:    [[TMP4:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_3]])
-; CHECK-NEXT:    [[VECINS_3:%.*]] = insertelement <4 x i32> [[VECINS_2]], i32 [[TMP4]], i32 3
-; CHECK-NEXT:    ret <4 x i32> [[VECINS_3]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call <4 x i32> @llvm.lrint.v4i32.v4f32(<4 x float> [[TMP0]])
+; CHECK-NEXT:    ret <4 x i32> [[TMP1]]
 ;
 entry:
   %0 = load <4 x float>, ptr %a
@@ -94,31 +78,8 @@ define <8 x i32> @lrint_v8i32f32(ptr %a) {
 ; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = load <8 x float>, ptr [[A]], align 32
-; CHECK-NEXT:    [[VECEXT:%.*]] = extractelement <8 x float> [[TMP0]], i32 0
-; CHECK-NEXT:    [[TMP1:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT]])
-; CHECK-NEXT:    [[VECINS:%.*]] = insertelement <8 x i32> undef, i32 [[TMP1]], i32 0
-; CHECK-NEXT:    [[VECEXT_1:%.*]] = extractelement <8 x float> [[TMP0]], i32 1
-; CHECK-NEXT:    [[TMP2:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_1]])
-; CHECK-NEXT:    [[VECINS_1:%.*]] = insertelement <8 x i32> [[VECINS]], i32 [[TMP2]], i32 1
-; CHECK-NEXT:    [[VECEXT_2:%.*]] = extractelement <8 x float> [[TMP0]], i32 2
-; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_2]])
-; CHECK-NEXT:    [[VECINS_2:%.*]] = insertelement <8 x i32> [[VECINS_1]], i32 [[TMP3]], i32 2
-; CHECK-NEXT:    [[VECEXT_3:%.*]] = extractelement <8 x float> [[TMP0]], i32 3
-; CHECK-NEXT:    [[TMP4:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_3]])
-; CHECK-NEXT:    [[VECINS_3:%.*]] = insertelement <8 x i32> [[VECINS_2]], i32 [[TMP4]], i32 3
-; CHECK-NEXT:    [[VECEXT_4:%.*]] = extractelement <8 x float> [[TMP0]], i32 4
-; CHECK-NEXT:    [[TMP5:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_4]])
-; CHECK-NEXT:    [[VECINS_4:%.*]] = insertelement <8 x i32> [[VECINS_3]], i32 [[TMP5]], i32 4
-; CHECK-NEXT:    [[VECEXT_5:%.*]] = extractelement <8 x float> [[TMP0]], i32 5
-; CHECK-NEXT:    [[TMP6:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_5]])
-; CHECK-NEXT:    [[VECINS_5:%.*]] = insertelement <8 x i32> [[VECINS_4]], i32 [[TMP6]], i32 5
-; CHECK-NEXT:    [[VECEXT_6:%.*]] = extractelement <8 x float> [[TMP0]], i32 6
-; CHECK-NEXT:    [[TMP7:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_6]])
-; CHECK-NEXT:    [[VECINS_6:%.*]] = insertelement <8 x i32> [[VECINS_5]], i32 [[TMP7]], i32 6
-; CHECK-NEXT:    [[VECEXT_7:%.*]] = extractelement <8 x float> [[TMP0]], i32 7
-; CHECK-NEXT:    [[TMP8:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_7]])
-; CHECK-NEXT:    [[VECINS_7:%.*]] = insertelement <8 x i32> [[VECINS_6]], i32 [[TMP8]], i32 7
-; CHECK-NEXT:    ret <8 x i32> [[VECINS_7]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call <8 x i32> @llvm.lrint.v8i32.v8f32(<8 x float> [[TMP0]])
+; CHECK-NEXT:    ret <8 x i32> [[TMP1]]
 ;
 entry:
   %0 = load <8 x float>, ptr %a
@@ -154,13 +115,8 @@ define <2 x i64> @lrint_v2i64f32(ptr %a) {
 ; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = load <2 x float>, ptr [[A]], align 8
-; CHECK-NEXT:    [[VECEXT:%.*]] = extractelement <2 x float> [[TMP0]], i64 0
-; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT]])
-; CHECK-NEXT:    [[VECINS:%.*]] = insertelement <2 x i64> undef, i64 [[TMP1]], i64 0
-; CHECK-NEXT:    [[VECEXT_1:%.*]] = extractelement <2 x float> [[TMP0]], i64 1
-; CHECK-NEXT:    [[TMP2:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_1]])
-; CHECK-NEXT:    [[VECINS_1:%.*]] = insertelement <2 x i64> [[VECINS]], i64 [[TMP2]], i64 1
-; CHECK-NEXT:    ret <2 x i64> [[VECINS_1]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call <2 x i64> @llvm.lrint.v2i64.v2f32(<2 x float> [[TMP0]])
+; CHECK-NEXT:    ret <2 x i64> [[TMP1]]
 ;
 entry:
   %0 = load <2 x float>, ptr %a
@@ -178,19 +134,8 @@ define <4 x i64> @lrint_v4i64f32(ptr %a) {
 ; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
-; CHECK-NEXT:    [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i64 0
-; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT]])
-; CHECK-NEXT:    [[VECINS:%.*]] = insertelement <4 x i64> undef, i64 [[TMP1]], i64 0
-; CHECK-NEXT:    [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i64 1
-; CHECK-NEXT:    [[TMP2:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_1]])
-; CHECK-NEXT:    [[VECINS_1:%.*]] = insertelement <4 x i64> [[VECINS]], i64 [[TMP2]], i64 1
-; CHECK-NEXT:    [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i64 2
-; CHECK-NEXT:    [[TMP3:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_2]])
-; CHECK-NEXT:    [[VECINS_2:%.*]] = insertelement <4 x i64> [[VECINS_1]], i64 [[TMP3]], i64 2
-; CHECK-NEXT:    [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i64 3
-; CHECK-NEXT:    [[TMP4:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_3]])
-; CHECK-NEXT:    [[VECINS_3:%.*]] = insertelement <4 x i64> [[VECINS_2]], i64 [[TMP4]], i64 3
-; CHECK-NEXT:    ret <4 x i64> [[VECINS_3]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call <4 x i64> @llvm.lrint.v4i64.v4f32(<4 x float> [[TMP0]])
+; CHECK-NEXT:    ret <4 x i64> [[TMP1]]
 ;
 entry:
   %0 = load <4 x float>, ptr %a
@@ -214,31 +159,14 @@ define <8 x i64> @lrint_v8i64f32(ptr %a) {
 ; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = load <8 x float>, ptr [[A]], align 32
-; CHECK-NEXT:    [[VECEXT:%.*]] = extractelement <8 x float> [[TMP0]], i64 0
-; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT]])
-; CHECK-NEXT:    [[VECINS:%.*]] = insertelement <8 x i64> undef, i64 [[TMP1]], i64 0
-; CHECK-NEXT:    [[VECEXT_1:%.*]] = extractelement <8 x float> [[TMP0]], i64 1
-; CHECK-NEXT:    [[TMP2:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_1]])
-; CHECK-NEXT:    [[VECINS_1:%.*]] = insertelement <8 x i64> [[VECINS]], i64 [[TMP2]], i64 1
-; CHECK-NEXT:    [[VECEXT_2:%.*]] = extractelement <8 x float> [[TMP0]], i64 2
-; CHECK-NEXT:    [[TMP3:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_2]])
-; CHECK-NEXT:    [[VECINS_2:%.*]] = insertelement <8 x i64> [[VECINS_1]], i64 [[TMP3]], i64 2
-; CHECK-NEXT:    [[VECEXT_3:%.*]] = extractelement <8 x float> [[TMP0]], i64 3
-; CHECK-NEXT:    [[TMP4:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_3]])
-; CHECK-NEXT:    [[VECINS_3:%.*]] = insertelement <8 x i64> [[VECINS_2]], i64 [[TMP4]], i64 3
-; CHECK-NEXT:    [[VECEXT_4:%.*]] = extractelement <8 x float> [[TMP0]], i64 4
-; CHECK-NEXT:    [[TMP5:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_4]])
-; CHECK-NEXT:    [[VECINS_4:%.*]] = insertelement <8 x i64> [[VECINS_3]], i64 [[TMP5]], i64 4
-; CHECK-NEXT:    [[VECEXT_5:%.*]] = extractelement <8 x float> [[TMP0]], i64 5
-; CHECK-NEXT:    [[TMP6:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_5]])
-; CHECK-NEXT:    [[VECINS_5:%.*]] = insertelement <8 x i64> [[VECINS_4]], i64 [[TMP6]], i64 5
-; CHECK-NEXT:    [[VECEXT_6:%.*]] = extractelement <8 x float> [[TMP0]], i64 6
-; CHECK-NEXT:    [[TMP7:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_6]])
-; CHECK-NEXT:    [[VECINS_6:%.*]] = insertelement <8 x i64> [[VECINS_5]], i64 [[TMP7]], i64 6
-; CHECK-NEXT:    [[VECEXT_7:%.*]] = extractelement <8 x float> [[TMP0]], i64 7
-; CHECK-NEXT:    [[TMP8:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_7]])
-; CHECK-NEXT:    [[VECINS_7:%.*]] = insertelement <8 x i64> [[VECINS_6]], i64 [[TMP8]], i64 7
-; CHECK-NEXT:    ret <8 x i64> [[VECINS_7]]
+; CHECK-NEXT:    [[TMP1:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT:    [[TMP2:%.*]] = call <4 x i64> @llvm.lrint.v4i64.v4f32(<4 x float> [[TMP1]])
+; CHECK-NEXT:    [[TMP3:%.*]] = shufflevector <4 x i64> [[TMP2]], <4 x i64> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[TMP4:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT:    [[TMP5:%.*]] = call <4 x i64> @llvm.lrint.v4i64.v4f32(<4 x float> [[TMP4]])
+; CHECK-NEXT:    [[TMP6:%.*]] = shufflevector <4 x i64> [[TMP5]], <4 x i64> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[VECINS_71:%.*]] = shufflevector <8 x i64> [[TMP3]], <8 x i64> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
+; CHECK-NEXT:    ret <8 x i64> [[VECINS_71]]
 ;
 entry:
   %0 = load <8 x float>, ptr %a
@@ -274,13 +202,8 @@ define <2 x i64> @llrint_v2i64f32(ptr %a) {
 ; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = load <2 x float>, ptr [[A]], align 8
-; CHECK-NEXT:    [[VECEXT:%.*]] = extractelement <2 x float> [[TMP0]], i64 0
-; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT]])
-; CHECK-NEXT:    [[VECINS:%.*]] = insertelement <2 x i64> undef, i64 [[TMP1]], i64 0
-; CHECK-NEXT:    [[VECEXT_1:%.*]] = extractelement <2 x float> [[TMP0]], i64 1
-; CHECK-NEXT:    [[TMP2:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_1]])
-; CHECK-NEXT:    [[VECINS_1:%.*]] = insertelement <2 x i64> [[VECINS]], i64 [[TMP2]], i64 1
-; CHECK-NEXT:    ret <2 x i64> [[VECINS_1]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call <2 x i64> @llvm.llrint.v2i64.v2f32(<2 x float> [[TMP0]])
+; CHECK-NEXT:    ret <2 x i64> [[TMP1]]
 ;
 entry:
   %0 = load <2 x float>, ptr %a
@@ -298,19 +221,8 @@ define <4 x i64> @llrint_v4i64f32(ptr %a) {
 ; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
-; CHECK-NEXT:    [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i64 0
-; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT]])
-; CHECK-NEXT:    [[VECINS:%.*]] = insertelement <4 x i64> undef, i64 [[TMP1]], i64 0
-; CHECK-NEXT:    [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i64 1
-; CHECK-NEXT:    [[TMP2:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_1]])
-; CHECK-NEXT:    [[VECINS_1:%.*]] = insertelement <4 x i64> [[VECINS]], i64 [[TMP2]], i64 1
-; CHECK-NEXT:    [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i64 2
-; CHECK-NEXT:    [[TMP3:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_2]])
-; CHECK-NEXT:    [[VECINS_2:%.*]] = insertelement <4 x i64> [[VECINS_1]], i64 [[TMP3]], i64 2
-; CHECK-NEXT:    [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i64 3
-; CHECK-NEXT:    [[TMP4:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_3]])
-; CHECK-NEXT:    [[VECINS_3:%.*]] = insertelement <4 x i64> [[VECINS_2]], i64 [[TMP4]], i64 3
-; CHECK-NEXT:    ret <4 x i64> [[VECINS_3]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call <4 x i64> @llvm.llrint.v4i64.v4f32(<4 x float> [[TMP0]])
+; CHECK-NEXT:    ret <4 x i64> [[TMP1]]
 ;
 entry:
   %0 = load <4 x float>, ptr %a
@@ -334,31 +246,14 @@ define <8 x i64> @llrint_v8i64f32(ptr %a) {
 ; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TMP0:%.*]] = load <8 x float>, ptr [[A]], align 32
-; CHECK-NEXT:    [[VECEXT:%.*]] = extractelement <8 x float> [[TMP0]], i64 0
-; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT]])
-; CHECK-NEXT:    [[VECINS:%.*]] = insertelement <8 x i64> undef, i64 [[TMP1]], i64 0
-; CHECK-NEXT:    [[VECEXT_1:%.*]] = extractelement <8 x float> [[TMP0]], i64 1
-; CHECK-NEXT:    [[TMP2:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_1]])
-; CHECK-NEXT:    [[VECINS_1:%.*]] = insertelement <8 x i64> [[VECINS]], i64 [[TMP2]], i64 1
-; CHECK-NEXT:    [[VECEXT_2:%.*]] = extractelement <8 x float> [[TMP0]], i64 2
-; CHECK-NEXT:    [[TMP3:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_2]])
-; CHECK-NEXT:    [[VECINS_2:%.*]] = insertelement <8 x i64> [[VECINS_1]], i64 [[TMP3]], i64 2
-; CHECK-NEXT:    [[VECEXT_3:%.*]] = extractelement <8 x float> [[TMP0]], i64 3
-; CHECK-NEXT:    [[TMP4:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_3]])
-; CHECK-NEXT:    [[VECINS_3:%.*]] = insertelement <8 x i64> [[VECINS_2]], i64 [[TMP4]], i64 3
-; CHECK-NEXT:    [[VECEXT_4:%.*]] = extractelement <8 x float> [[TMP0]], i64 4
-; CHECK-NEXT:    [[TMP5:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_4]])
-; CHECK-NEXT:    [[VECINS_4:%.*]] = insertelement <8 x i64> [[VECINS_3]], i64 [[TMP5]], i64 4
-; CHECK-NEXT:    [[VECEXT_5:%.*]] = extractelement <8 x float> [[TMP0]], i64 5
-; CHECK-NEXT:    [[TMP6:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_5]])
-; CHECK-NEXT:    [[VECINS_5:%.*]] = insertelement <8 x i64> [[VECINS_4]], i64 [[TMP6]], i64 5
-; CHECK-NEXT:    [[VECEXT_6:%.*]] = extractelement <8 x float> [[TMP0]], i64 6
-; CHECK-NEXT:    [[TMP7:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_6]])
-; CHECK-NEXT:    [[VECINS_6:%.*]] = insertelement <8 x i64> [[VECINS_5]], i64 [[TMP7]], i64 6
-; CHECK-NEXT:    [[VECEXT_7:%.*]] = extractelement <8 x float> [[TMP0]], i64 7
-; CHECK-NEXT:    [[TMP8:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_7]])
-; CHECK-NEXT:    [[VECINS_7:%.*]] = insertelement <8 x i64> [[VECINS_6]], i64 [[TMP8]], i64 7
-; CHECK-NEXT:    ret <8 x i64> [[VECINS_7]]
+; CHECK-NEXT:    [[TMP1:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT:    [[TMP2:%.*]] = call <4 x i64> @llvm.llrint.v4i64.v4f32(<4 x float> [[TMP1]])
+; CHECK-NEXT:    [[TMP3:%.*]] = shufflevector <4 x i64> [[TMP2]], <4 x i64> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[TMP4:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT:    [[TMP5:%.*]] = call <4 x i64> @llvm.llrint.v4i64.v4f32(<4 x float> [[TMP4]])
+; CHECK-NEXT:    [[TMP6:%.*]] = shufflevector <4 x i64> [[TMP5]], <4 x i64> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT:    [[VECINS_71:%.*]] = shufflevector <8 x i64> [[TMP3]], <8 x i64> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
+; CHECK-NEXT:    ret <8 x i64> [[VECINS_71]]
 ;
 entry:
   %0 = load <8 x float>, ptr %a
diff --git a/llvm/test/Transforms/Scalarizer/intrinsics.ll b/llvm/test/Transforms/Scalarizer/intrinsics.ll
index f5e5be1f5998e00..cee44ef43426071 100644
--- a/llvm/test/Transforms/Scalarizer/intrinsics.ll
+++ b/llvm/test/Transforms/Scalarizer/intrinsics.ll
@@ -217,7 +217,12 @@ define <2 x i32> @scalarize_fptoui_sat(<2 x float> %x) #0 {
 
 define <2 x i32> @scalarize_lrint(<2 x float> %x) #0 {
 ; CHECK-LABEL: @scalarize_lrint(
-; CHECK-NEXT:    [[RND:%.*]] = call <2 x i32> @llvm.lrint.v2i32.v2f32(<2 x float> [[X:%.*]])
+; CHECK-NEXT:    [[X_I0:%.*]] = extractelement <2 x float> [[X:%.*]], i64 0
+; CHECK-NEXT:    [[RND_I0:%.*]] = call i32 @llvm.lrint.i32.f32(float [[X_I0]])
+; CHECK-NEXT:    [[X_I1:%.*]] = extractelement <2 x float> [[X]], i64 1
+; CHECK-NEXT:    [[RND_I1:%.*]] = call i32 @llvm.lrint.i32.f32(float [[X_I1]])
+; CHECK-NEXT:    [[RND_UPTO0:%.*]] = insertelement <2 x i32> poison, i32 [[RND_I0]], i64 0
+; CHECK-NEXT:    [[RND:%.*]] = insertelement <2 x i32> [[RND_UPTO0]], i32 [[RND_I1]], i64 1
 ; CHECK-NEXT:    ret <2 x i32> [[RND]]
 ;
   %rnd = call <2 x i32> @llvm.lrint.v2i32.v2f32(<2 x float> %x)
@@ -226,7 +231,12 @@ define <2 x i32> @scalarize_lrint(<2 x float> %x) #0 {
 
 define <2 x i32> @scalarize_llrint(<2 x float> %x) #0 {
 ; CHECK-LABEL: @scalarize_llrint(
-; CHECK-NEXT:    [[RND:%.*]] = call <2 x i32> @llvm.llrint.v2i32.v2f32(<2 x float> [[X:%.*]])
+; CHECK-NEXT:    [[X_I0:%.*]] = extractelement <2 x float> [[X:%.*]], i64 0
+; CHECK-NEXT:    [[RND_I0:%.*]] = call i32 @llvm.llrint.i32.f32(float [[X_I0]])
+; CHECK-NEXT:    [[X_I1:%.*]] = extractelement <2 x float> [[X]], i64 1
+; CHECK-NEXT:    [[RND_I1:%.*]] = call i32 @llvm.llrint.i32.f32(float [[X_I1]])
+; CHECK-NEXT:    [[RND_UPTO0:%.*]] = insertelement <2 x i32> poison, i32 [[RND_I0]], i64 0
+; CHECK-NEXT:    [[RND:%.*]] = insertelement <2 x i32> [[RND_UPTO0]], i32 [[RND_I1]], i64 1
 ; CHECK-NEXT:    ret <2 x i32> [[RND]]
 ;
   %rnd = call <2 x i32> @llvm.llrint.v2i32.v2f32(<2 x float> %x)

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@artagnon artagnon merged commit 2302e4c into llvm:main Nov 6, 2023
@artagnon artagnon deleted the vectorutils-lrint-reland branch November 6, 2023 18:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failure to vectorize @llvm.lrint.i64.f32
3 participants