-
Notifications
You must be signed in to change notification settings - Fork 13.5k
Reland "VectorUtils: mark xrint as trivially vectorizable" #71416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
With the recent change 98c90a1 (ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering), it is now possible for SLPVectorizer, LoopVectorize, and Scalarizer to operate on llvm.lrint and llvm.llrint, with vector codegen for the RISC-V target. Make a trivial change to VectorUtils, and update the corresponding tests. A couple of important fixes have been landed since the original patch was landed and reverted, and it is now safe to re-land the patch: 5e1d81a (LegalizeIntegerTypes: implement PromoteIntRes for xrint) and fd887a3 (LegalizeVectorTypes: fix bug in widening of vec result in xrint). See also llvm#71399 which proves that lrint and llrint will indeed produce vector codegen on RISC-V. Fixes llvm#55208.
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-transforms Author: Ramkumar Ramachandra (artagnon) ChangesWith the recent change 98c90a1 (ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering), it is now possible for SLPVectorizer, LoopVectorize, and Scalarizer to operate on llvm.lrint and llvm.llrint, with vector codegen for the RISC-V target. Make a trivial change to VectorUtils, and update the corresponding tests. A couple of important fixes have been landed since the original patch was landed and reverted, and it is now safe to re-land the patch: 5e1d81a (LegalizeIntegerTypes: implement PromoteIntRes for xrint) and fd887a3 (LegalizeVectorTypes: fix bug in widening of vec result in xrint). See also #71399, which proves that lrint and llrint will indeed produce vector codegen on RISC-V. Fixes #55208. Full diff: https://github.com/llvm/llvm-project/pull/71416.diff 4 Files Affected:
diff --git a/llvm/lib/Analysis/VectorUtils.cpp b/llvm/lib/Analysis/VectorUtils.cpp
index be171867a2bf4d9..4f5db8b7aaf746f 100644
--- a/llvm/lib/Analysis/VectorUtils.cpp
+++ b/llvm/lib/Analysis/VectorUtils.cpp
@@ -91,6 +91,8 @@ bool llvm::isTriviallyVectorizable(Intrinsic::ID ID) {
case Intrinsic::canonicalize:
case Intrinsic::fptosi_sat:
case Intrinsic::fptoui_sat:
+ case Intrinsic::lrint:
+ case Intrinsic::llrint:
return true;
default:
return false;
@@ -122,6 +124,8 @@ bool llvm::isVectorIntrinsicWithOverloadTypeAtArg(Intrinsic::ID ID,
switch (ID) {
case Intrinsic::fptosi_sat:
case Intrinsic::fptoui_sat:
+ case Intrinsic::lrint:
+ case Intrinsic::llrint:
return OpdIdx == -1 || OpdIdx == 0;
case Intrinsic::is_fpclass:
return OpdIdx == 0;
diff --git a/llvm/test/Transforms/LoopVectorize/intrinsic.ll b/llvm/test/Transforms/LoopVectorize/intrinsic.ll
index 3e7754951de6a83..0f070347dd4efb6 100644
--- a/llvm/test/Transforms/LoopVectorize/intrinsic.ll
+++ b/llvm/test/Transforms/LoopVectorize/intrinsic.ll
@@ -1602,7 +1602,7 @@ declare i32 @llvm.lrint.i32.f32(float)
define void @lrint_i32_f32(ptr %x, ptr %y, i64 %n) {
; CHECK-LABEL: @lrint_i32_f32(
-; CHECK-NOT: llvm.lrint.v4i32.v4f32
+; CHECK: llvm.lrint.v4i32.v4f32
; CHECK: ret void
;
entry:
@@ -1628,7 +1628,7 @@ declare i64 @llvm.llrint.i64.f32(float)
define void @llrint_i64_f32(ptr %x, ptr %y, i64 %n) {
; CHECK-LABEL: @llrint_i64_f32(
-; CHECK-NOT: llvm.llrint.v4i32.v4f32
+; CHECK: llvm.llrint.v4i64.v4f32
; CHECK: ret void
;
entry:
diff --git a/llvm/test/Transforms/SLPVectorizer/RISCV/fround.ll b/llvm/test/Transforms/SLPVectorizer/RISCV/fround.ll
index e5c46b72a1c30fc..9cd0d5d321e2f60 100644
--- a/llvm/test/Transforms/SLPVectorizer/RISCV/fround.ll
+++ b/llvm/test/Transforms/SLPVectorizer/RISCV/fround.ll
@@ -34,13 +34,8 @@ define <2 x i32> @lrint_v2i32f32(ptr %a) {
; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load <2 x float>, ptr [[A]], align 8
-; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <2 x float> [[TMP0]], i32 0
-; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT]])
-; CHECK-NEXT: [[VECINS:%.*]] = insertelement <2 x i32> undef, i32 [[TMP1]], i32 0
-; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <2 x float> [[TMP0]], i32 1
-; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_1]])
-; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <2 x i32> [[VECINS]], i32 [[TMP2]], i32 1
-; CHECK-NEXT: ret <2 x i32> [[VECINS_1]]
+; CHECK-NEXT: [[TMP1:%.*]] = call <2 x i32> @llvm.lrint.v2i32.v2f32(<2 x float> [[TMP0]])
+; CHECK-NEXT: ret <2 x i32> [[TMP1]]
;
entry:
%0 = load <2 x float>, ptr %a
@@ -58,19 +53,8 @@ define <4 x i32> @lrint_v4i32f32(ptr %a) {
; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
-; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
-; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT]])
-; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x i32> undef, i32 [[TMP1]], i32 0
-; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
-; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_1]])
-; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x i32> [[VECINS]], i32 [[TMP2]], i32 1
-; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
-; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_2]])
-; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x i32> [[VECINS_1]], i32 [[TMP3]], i32 2
-; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
-; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_3]])
-; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x i32> [[VECINS_2]], i32 [[TMP4]], i32 3
-; CHECK-NEXT: ret <4 x i32> [[VECINS_3]]
+; CHECK-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.lrint.v4i32.v4f32(<4 x float> [[TMP0]])
+; CHECK-NEXT: ret <4 x i32> [[TMP1]]
;
entry:
%0 = load <4 x float>, ptr %a
@@ -94,31 +78,8 @@ define <8 x i32> @lrint_v8i32f32(ptr %a) {
; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load <8 x float>, ptr [[A]], align 32
-; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <8 x float> [[TMP0]], i32 0
-; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT]])
-; CHECK-NEXT: [[VECINS:%.*]] = insertelement <8 x i32> undef, i32 [[TMP1]], i32 0
-; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <8 x float> [[TMP0]], i32 1
-; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_1]])
-; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <8 x i32> [[VECINS]], i32 [[TMP2]], i32 1
-; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <8 x float> [[TMP0]], i32 2
-; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_2]])
-; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <8 x i32> [[VECINS_1]], i32 [[TMP3]], i32 2
-; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <8 x float> [[TMP0]], i32 3
-; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_3]])
-; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <8 x i32> [[VECINS_2]], i32 [[TMP4]], i32 3
-; CHECK-NEXT: [[VECEXT_4:%.*]] = extractelement <8 x float> [[TMP0]], i32 4
-; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_4]])
-; CHECK-NEXT: [[VECINS_4:%.*]] = insertelement <8 x i32> [[VECINS_3]], i32 [[TMP5]], i32 4
-; CHECK-NEXT: [[VECEXT_5:%.*]] = extractelement <8 x float> [[TMP0]], i32 5
-; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_5]])
-; CHECK-NEXT: [[VECINS_5:%.*]] = insertelement <8 x i32> [[VECINS_4]], i32 [[TMP6]], i32 5
-; CHECK-NEXT: [[VECEXT_6:%.*]] = extractelement <8 x float> [[TMP0]], i32 6
-; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_6]])
-; CHECK-NEXT: [[VECINS_6:%.*]] = insertelement <8 x i32> [[VECINS_5]], i32 [[TMP7]], i32 6
-; CHECK-NEXT: [[VECEXT_7:%.*]] = extractelement <8 x float> [[TMP0]], i32 7
-; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.lrint.i32.f32(float [[VECEXT_7]])
-; CHECK-NEXT: [[VECINS_7:%.*]] = insertelement <8 x i32> [[VECINS_6]], i32 [[TMP8]], i32 7
-; CHECK-NEXT: ret <8 x i32> [[VECINS_7]]
+; CHECK-NEXT: [[TMP1:%.*]] = call <8 x i32> @llvm.lrint.v8i32.v8f32(<8 x float> [[TMP0]])
+; CHECK-NEXT: ret <8 x i32> [[TMP1]]
;
entry:
%0 = load <8 x float>, ptr %a
@@ -154,13 +115,8 @@ define <2 x i64> @lrint_v2i64f32(ptr %a) {
; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load <2 x float>, ptr [[A]], align 8
-; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <2 x float> [[TMP0]], i64 0
-; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT]])
-; CHECK-NEXT: [[VECINS:%.*]] = insertelement <2 x i64> undef, i64 [[TMP1]], i64 0
-; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <2 x float> [[TMP0]], i64 1
-; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_1]])
-; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <2 x i64> [[VECINS]], i64 [[TMP2]], i64 1
-; CHECK-NEXT: ret <2 x i64> [[VECINS_1]]
+; CHECK-NEXT: [[TMP1:%.*]] = call <2 x i64> @llvm.lrint.v2i64.v2f32(<2 x float> [[TMP0]])
+; CHECK-NEXT: ret <2 x i64> [[TMP1]]
;
entry:
%0 = load <2 x float>, ptr %a
@@ -178,19 +134,8 @@ define <4 x i64> @lrint_v4i64f32(ptr %a) {
; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
-; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i64 0
-; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT]])
-; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x i64> undef, i64 [[TMP1]], i64 0
-; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i64 1
-; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_1]])
-; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x i64> [[VECINS]], i64 [[TMP2]], i64 1
-; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i64 2
-; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_2]])
-; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x i64> [[VECINS_1]], i64 [[TMP3]], i64 2
-; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i64 3
-; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_3]])
-; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x i64> [[VECINS_2]], i64 [[TMP4]], i64 3
-; CHECK-NEXT: ret <4 x i64> [[VECINS_3]]
+; CHECK-NEXT: [[TMP1:%.*]] = call <4 x i64> @llvm.lrint.v4i64.v4f32(<4 x float> [[TMP0]])
+; CHECK-NEXT: ret <4 x i64> [[TMP1]]
;
entry:
%0 = load <4 x float>, ptr %a
@@ -214,31 +159,14 @@ define <8 x i64> @lrint_v8i64f32(ptr %a) {
; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load <8 x float>, ptr [[A]], align 32
-; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <8 x float> [[TMP0]], i64 0
-; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT]])
-; CHECK-NEXT: [[VECINS:%.*]] = insertelement <8 x i64> undef, i64 [[TMP1]], i64 0
-; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <8 x float> [[TMP0]], i64 1
-; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_1]])
-; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <8 x i64> [[VECINS]], i64 [[TMP2]], i64 1
-; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <8 x float> [[TMP0]], i64 2
-; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_2]])
-; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <8 x i64> [[VECINS_1]], i64 [[TMP3]], i64 2
-; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <8 x float> [[TMP0]], i64 3
-; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_3]])
-; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <8 x i64> [[VECINS_2]], i64 [[TMP4]], i64 3
-; CHECK-NEXT: [[VECEXT_4:%.*]] = extractelement <8 x float> [[TMP0]], i64 4
-; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_4]])
-; CHECK-NEXT: [[VECINS_4:%.*]] = insertelement <8 x i64> [[VECINS_3]], i64 [[TMP5]], i64 4
-; CHECK-NEXT: [[VECEXT_5:%.*]] = extractelement <8 x float> [[TMP0]], i64 5
-; CHECK-NEXT: [[TMP6:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_5]])
-; CHECK-NEXT: [[VECINS_5:%.*]] = insertelement <8 x i64> [[VECINS_4]], i64 [[TMP6]], i64 5
-; CHECK-NEXT: [[VECEXT_6:%.*]] = extractelement <8 x float> [[TMP0]], i64 6
-; CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_6]])
-; CHECK-NEXT: [[VECINS_6:%.*]] = insertelement <8 x i64> [[VECINS_5]], i64 [[TMP7]], i64 6
-; CHECK-NEXT: [[VECEXT_7:%.*]] = extractelement <8 x float> [[TMP0]], i64 7
-; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.lrint.i64.f32(float [[VECEXT_7]])
-; CHECK-NEXT: [[VECINS_7:%.*]] = insertelement <8 x i64> [[VECINS_6]], i64 [[TMP8]], i64 7
-; CHECK-NEXT: ret <8 x i64> [[VECINS_7]]
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: [[TMP2:%.*]] = call <4 x i64> @llvm.lrint.v4i64.v4f32(<4 x float> [[TMP1]])
+; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i64> [[TMP2]], <4 x i64> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT: [[TMP5:%.*]] = call <4 x i64> @llvm.lrint.v4i64.v4f32(<4 x float> [[TMP4]])
+; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i64> [[TMP5]], <4 x i64> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT: [[VECINS_71:%.*]] = shufflevector <8 x i64> [[TMP3]], <8 x i64> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
+; CHECK-NEXT: ret <8 x i64> [[VECINS_71]]
;
entry:
%0 = load <8 x float>, ptr %a
@@ -274,13 +202,8 @@ define <2 x i64> @llrint_v2i64f32(ptr %a) {
; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load <2 x float>, ptr [[A]], align 8
-; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <2 x float> [[TMP0]], i64 0
-; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT]])
-; CHECK-NEXT: [[VECINS:%.*]] = insertelement <2 x i64> undef, i64 [[TMP1]], i64 0
-; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <2 x float> [[TMP0]], i64 1
-; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_1]])
-; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <2 x i64> [[VECINS]], i64 [[TMP2]], i64 1
-; CHECK-NEXT: ret <2 x i64> [[VECINS_1]]
+; CHECK-NEXT: [[TMP1:%.*]] = call <2 x i64> @llvm.llrint.v2i64.v2f32(<2 x float> [[TMP0]])
+; CHECK-NEXT: ret <2 x i64> [[TMP1]]
;
entry:
%0 = load <2 x float>, ptr %a
@@ -298,19 +221,8 @@ define <4 x i64> @llrint_v4i64f32(ptr %a) {
; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
-; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i64 0
-; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT]])
-; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x i64> undef, i64 [[TMP1]], i64 0
-; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i64 1
-; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_1]])
-; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x i64> [[VECINS]], i64 [[TMP2]], i64 1
-; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i64 2
-; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_2]])
-; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x i64> [[VECINS_1]], i64 [[TMP3]], i64 2
-; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i64 3
-; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_3]])
-; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x i64> [[VECINS_2]], i64 [[TMP4]], i64 3
-; CHECK-NEXT: ret <4 x i64> [[VECINS_3]]
+; CHECK-NEXT: [[TMP1:%.*]] = call <4 x i64> @llvm.llrint.v4i64.v4f32(<4 x float> [[TMP0]])
+; CHECK-NEXT: ret <4 x i64> [[TMP1]]
;
entry:
%0 = load <4 x float>, ptr %a
@@ -334,31 +246,14 @@ define <8 x i64> @llrint_v8i64f32(ptr %a) {
; CHECK-SAME: ptr [[A:%.*]]) #[[ATTR0]] {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load <8 x float>, ptr [[A]], align 32
-; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <8 x float> [[TMP0]], i64 0
-; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT]])
-; CHECK-NEXT: [[VECINS:%.*]] = insertelement <8 x i64> undef, i64 [[TMP1]], i64 0
-; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <8 x float> [[TMP0]], i64 1
-; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_1]])
-; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <8 x i64> [[VECINS]], i64 [[TMP2]], i64 1
-; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <8 x float> [[TMP0]], i64 2
-; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_2]])
-; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <8 x i64> [[VECINS_1]], i64 [[TMP3]], i64 2
-; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <8 x float> [[TMP0]], i64 3
-; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_3]])
-; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <8 x i64> [[VECINS_2]], i64 [[TMP4]], i64 3
-; CHECK-NEXT: [[VECEXT_4:%.*]] = extractelement <8 x float> [[TMP0]], i64 4
-; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_4]])
-; CHECK-NEXT: [[VECINS_4:%.*]] = insertelement <8 x i64> [[VECINS_3]], i64 [[TMP5]], i64 4
-; CHECK-NEXT: [[VECEXT_5:%.*]] = extractelement <8 x float> [[TMP0]], i64 5
-; CHECK-NEXT: [[TMP6:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_5]])
-; CHECK-NEXT: [[VECINS_5:%.*]] = insertelement <8 x i64> [[VECINS_4]], i64 [[TMP6]], i64 5
-; CHECK-NEXT: [[VECEXT_6:%.*]] = extractelement <8 x float> [[TMP0]], i64 6
-; CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_6]])
-; CHECK-NEXT: [[VECINS_6:%.*]] = insertelement <8 x i64> [[VECINS_5]], i64 [[TMP7]], i64 6
-; CHECK-NEXT: [[VECEXT_7:%.*]] = extractelement <8 x float> [[TMP0]], i64 7
-; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.llrint.i64.f32(float [[VECEXT_7]])
-; CHECK-NEXT: [[VECINS_7:%.*]] = insertelement <8 x i64> [[VECINS_6]], i64 [[TMP8]], i64 7
-; CHECK-NEXT: ret <8 x i64> [[VECINS_7]]
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; CHECK-NEXT: [[TMP2:%.*]] = call <4 x i64> @llvm.llrint.v4i64.v4f32(<4 x float> [[TMP1]])
+; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i64> [[TMP2]], <4 x i64> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT: [[TMP5:%.*]] = call <4 x i64> @llvm.llrint.v4i64.v4f32(<4 x float> [[TMP4]])
+; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i64> [[TMP5]], <4 x i64> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
+; CHECK-NEXT: [[VECINS_71:%.*]] = shufflevector <8 x i64> [[TMP3]], <8 x i64> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
+; CHECK-NEXT: ret <8 x i64> [[VECINS_71]]
;
entry:
%0 = load <8 x float>, ptr %a
diff --git a/llvm/test/Transforms/Scalarizer/intrinsics.ll b/llvm/test/Transforms/Scalarizer/intrinsics.ll
index f5e5be1f5998e00..cee44ef43426071 100644
--- a/llvm/test/Transforms/Scalarizer/intrinsics.ll
+++ b/llvm/test/Transforms/Scalarizer/intrinsics.ll
@@ -217,7 +217,12 @@ define <2 x i32> @scalarize_fptoui_sat(<2 x float> %x) #0 {
define <2 x i32> @scalarize_lrint(<2 x float> %x) #0 {
; CHECK-LABEL: @scalarize_lrint(
-; CHECK-NEXT: [[RND:%.*]] = call <2 x i32> @llvm.lrint.v2i32.v2f32(<2 x float> [[X:%.*]])
+; CHECK-NEXT: [[X_I0:%.*]] = extractelement <2 x float> [[X:%.*]], i64 0
+; CHECK-NEXT: [[RND_I0:%.*]] = call i32 @llvm.lrint.i32.f32(float [[X_I0]])
+; CHECK-NEXT: [[X_I1:%.*]] = extractelement <2 x float> [[X]], i64 1
+; CHECK-NEXT: [[RND_I1:%.*]] = call i32 @llvm.lrint.i32.f32(float [[X_I1]])
+; CHECK-NEXT: [[RND_UPTO0:%.*]] = insertelement <2 x i32> poison, i32 [[RND_I0]], i64 0
+; CHECK-NEXT: [[RND:%.*]] = insertelement <2 x i32> [[RND_UPTO0]], i32 [[RND_I1]], i64 1
; CHECK-NEXT: ret <2 x i32> [[RND]]
;
%rnd = call <2 x i32> @llvm.lrint.v2i32.v2f32(<2 x float> %x)
@@ -226,7 +231,12 @@ define <2 x i32> @scalarize_lrint(<2 x float> %x) #0 {
define <2 x i32> @scalarize_llrint(<2 x float> %x) #0 {
; CHECK-LABEL: @scalarize_llrint(
-; CHECK-NEXT: [[RND:%.*]] = call <2 x i32> @llvm.llrint.v2i32.v2f32(<2 x float> [[X:%.*]])
+; CHECK-NEXT: [[X_I0:%.*]] = extractelement <2 x float> [[X:%.*]], i64 0
+; CHECK-NEXT: [[RND_I0:%.*]] = call i32 @llvm.llrint.i32.f32(float [[X_I0]])
+; CHECK-NEXT: [[X_I1:%.*]] = extractelement <2 x float> [[X]], i64 1
+; CHECK-NEXT: [[RND_I1:%.*]] = call i32 @llvm.llrint.i32.f32(float [[X_I1]])
+; CHECK-NEXT: [[RND_UPTO0:%.*]] = insertelement <2 x i32> poison, i32 [[RND_I0]], i64 0
+; CHECK-NEXT: [[RND:%.*]] = insertelement <2 x i32> [[RND_UPTO0]], i32 [[RND_I1]], i64 1
; CHECK-NEXT: ret <2 x i32> [[RND]]
;
%rnd = call <2 x i32> @llvm.llrint.v2i32.v2f32(<2 x float> %x)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
With the recent change 98c90a1 (ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering), it is now possible for SLPVectorizer, LoopVectorize, and Scalarizer to operate on llvm.lrint and llvm.llrint, with vector codegen for the RISC-V target. Make a trivial change to VectorUtils, and update the corresponding tests.
A couple of important fixes have been landed since the original patch was landed and reverted, and it is now safe to re-land the patch: 5e1d81a (LegalizeIntegerTypes: implement PromoteIntRes for xrint) and fd887a3 (LegalizeVectorTypes: fix bug in widening of vec result in xrint). See also #71399, which proves that lrint and llrint will indeed produce vector codegen on RISC-V.
Fixes #55208.