[SLP] Make getSameOpcode support interchangeable instructions. #127450
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,11 +5,19 @@ define i64 @test() { | |
; CHECK-LABEL: define i64 @test() { | ||
; CHECK-NEXT: [[ENTRY:.*:]] | ||
; CHECK-NEXT: [[OR54_I_I_6:%.*]] = or i32 0, 0 | ||
; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i32> poison, i32 [[OR54_I_I_6]], i32 8 | ||
; CHECK-NEXT: [[TMP1:%.*]] = call <16 x i32> @llvm.vector.insert.v16i32.v8i32(<16 x i32> [[TMP0]], <8 x i32> zeroinitializer, i64 0) | ||
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i32> [[TMP1]], <16 x i32> poison, <16 x i32> <i32 0, i32 0, i32 1, i32 1, i32 2, i32 2, i32 3, i32 3, i32 4, i32 4, i32 5, i32 5, i32 6, i32 7, i32 7, i32 8> | ||
; CHECK-NEXT: [[TMP3:%.*]] = zext <16 x i32> [[TMP2]] to <16 x i64> | ||
; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vector.reduce.or.v16i64(<16 x i64> [[TMP3]]) | ||
; CHECK-NEXT: [[CONV193_1_I_6:%.*]] = zext i32 [[OR54_I_I_6]] to i64 | ||
; CHECK-NEXT: [[CONV193_I_7:%.*]] = zext i32 0 to i64 | ||
; CHECK-NEXT: [[TMP0:%.*]] = call <4 x i64> @llvm.vector.extract.v4i64.v8i64(<8 x i64> zeroinitializer, i64 0) | ||
; CHECK-NEXT: [[RDX_OP:%.*]] = or <4 x i64> [[TMP0]], zeroinitializer | ||
; CHECK-NEXT: [[TMP1:%.*]] = call <8 x i64> @llvm.vector.insert.v8i64.v4i64(<8 x i64> zeroinitializer, <4 x i64> [[RDX_OP]], i64 0) | ||
; CHECK-NEXT: [[OP_RDX:%.*]] = call i64 @llvm.vector.reduce.or.v8i64(<8 x i64> [[TMP1]]) | ||

Regression?

Yes. We can get this without 4a08497 if we make

It would be good to estimate which one is better here and select the best solution.

Before this PR, SLP tries to vectorize a VL of size 12:

    %xor148.2.i.1 = xor i32 0, 0
    %xor148.2.i.2 = xor i32 0, 0
    %xor148.2.i.3 = xor i32 0, 0
    %xor148.2.i.4 = xor i32 0, 0
    %xor148.2.i.5 = xor i32 0, 0
    %xor148.2.i.6 = xor i32 0, 0
    %xor148.2.i.7 = xor i32 0, 0
    %or54.i.i.6 = or i32 %xor148.2.i.6, 0
    i32 poison
    i32 poison
    i32 poison

The code has a check like this

The VL is an alternate shuffle (xor and or), and SLP will try to combine VL[0] through VL[7]. (

This check needs to be fixed if it causes the regression.

But that should be in another PR. Even without this PR, SLP can still get the same result if

; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i64> poison, i64 [[OP_RDX]], i32 0 | ||
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i64> [[TMP2]], i64 [[CONV193_I_7]], i32 1 | ||
; CHECK-NEXT: [[TMP7:%.*]] = or <2 x i64> [[TMP3]], zeroinitializer | ||
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP7]], i32 0 | ||
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i64> [[TMP7]], i32 1 | ||
; CHECK-NEXT: [[OP_RDX3:%.*]] = or i64 [[TMP5]], [[TMP6]] | ||
; CHECK-NEXT: [[TMP4:%.*]] = or i64 [[OP_RDX3]], [[CONV193_1_I_6]] | ||
; CHECK-NEXT: ret i64 [[TMP4]] | ||
; | ||
entry: | ||
|
Regression?
see 29c8cff
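
For readers following the thread, here is a minimal, self-contained sketch of one way "interchangeable instructions" could be modelled. It assumes the term means binary operations with a constant right-hand side that can be re-expressed under another opcode without changing the result (for example, `or x, 0` as `xor x, 0`, or `shl x, 1` as `mul x, 2`). The names `Op`, `BinOp`, `rewriteAs`, and `commonOpcode` are hypothetical and are not the API of `getSameOpcode` or anything else in the SLP vectorizer.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical opcode set for illustration only; not LLVM's opcode enum.
enum class Op { Add, Sub, Mul, Shl, Or, Xor };

// A scalar i32 binary operation whose right-hand side is a constant.
struct BinOp {
  Op Opcode;
  uint32_t Rhs;
};

// Evaluate on a concrete left-hand value using wrapping 32-bit arithmetic.
// Shift amounts are masked here for safety; rewriteAs only converts shifts
// whose amount is below 32.
uint32_t eval(BinOp B, uint32_t X) {
  switch (B.Opcode) {
  case Op::Add: return X + B.Rhs;
  case Op::Sub: return X - B.Rhs;
  case Op::Mul: return X * B.Rhs;
  case Op::Shl: return X << (B.Rhs & 31u);
  case Op::Or:  return X | B.Rhs;
  case Op::Xor: return X ^ B.Rhs;
  }
  return X;
}

// x+0, x-0, x<<0, x|0, x^0 and x*1 all return x unchanged.
bool isIdentity(BinOp B) {
  return B.Opcode == Op::Mul ? B.Rhs == 1u : B.Rhs == 0u;
}

// Try to express B with opcode Target, adjusting the constant if needed.
std::optional<BinOp> rewriteAs(BinOp B, Op Target) {
  if (B.Opcode == Target)
    return B;
  if (isIdentity(B))
    return BinOp{Target, Target == Op::Mul ? 1u : 0u};
  // shl x, k  ->  mul x, 2^k
  if (B.Opcode == Op::Shl && Target == Op::Mul && B.Rhs < 32u)
    return BinOp{Op::Mul, uint32_t{1} << B.Rhs};
  // mul x, 2^k  ->  shl x, k
  if (B.Opcode == Op::Mul && Target == Op::Shl && B.Rhs != 0u &&
      (B.Rhs & (B.Rhs - 1u)) == 0u) {
    uint32_t K = 0;
    while ((uint32_t{1} << K) != B.Rhs)
      ++K;
    return BinOp{Op::Shl, K};
  }
  // add x, c  <->  sub x, -c (wrapping negation)
  if ((B.Opcode == Op::Add && Target == Op::Sub) ||
      (B.Opcode == Op::Sub && Target == Op::Add))
    return BinOp{Target, 0u - B.Rhs};
  return std::nullopt;
}

// Return the first opcode already present in VL that every element can be
// rewritten to, or nullopt if there is none (or VL is empty).
std::optional<Op> commonOpcode(const std::vector<BinOp> &VL) {
  for (const BinOp &Candidate : VL) {
    bool AllFit = true;
    for (const BinOp &B : VL) {
      if (!rewriteAs(B, Candidate.Opcode)) {
        AllFit = false;
        break;
      }
    }
    if (AllFit)
      return Candidate.Opcode;
  }
  return std::nullopt;
}
```

Under this model, a list shaped like the one in the thread (several `xor ..., 0` followed by one `or ..., 0`) has the common opcode `Op::Xor`, so it would not need to be treated as an alternate-opcode bundle. Whether the PR makes the same choice for that list is not stated in the thread.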