-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[clang][CodeGen] Generate follow-up metadata for loops in correct format #131985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-clang @llvm/pr-subscribers-llvm-transforms Author: Ryotaro Kasuga (kasuga-fj) ChangesWhen multiple pragma for loop transformations are specified, such as: #pragma clang loop vectorize(enable) unroll_count(8)
for (...) {
} The generated metadata would look like this:
For a loop with
However, the current implementation creates new LoopID like:
Therefore subsequent passes like LoopUnroll fails to recognize the attributes of this loop correctly. This patch fixes Full diff: https://github.com/llvm/llvm-project/pull/131985.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 84c08556f8a25..4a6105add953f 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -317,6 +317,35 @@ std::optional<MDNode *> llvm::makeFollowupLoopID(
HasAnyFollowup = true;
for (const MDOperand &Option : drop_begin(FollowupNode->operands())) {
+ // The followup metadata typically forms as follows:
+ //
+ // !0 = distinct !{!0, !1, !2}
+ // !1 = !{!"llvm.loop.distribute.enable", i1 true}
+ // !2 = !{!"llvm.loop.distribute.followup_all", !3}
+ // !3 = distinct !{!3, !4}
+ // !4 = !{!"llvm.loop.vectorize.enable", i1 true}
+ //
+ // If we push Option (!3 in this case) in MDs, the new metadata looks
+ // something like:
+ //
+ // !5 = distinct !{!5, !3}
+ //
+ // This doesn't contain !4, so the vectorization pass doesn't recognize
+ // this loop as vectorization enabled. To make the new metadata contain !4
+ // instead of !3, traverse all of Option's operands and push them into
+ // MDs if Option seems to be a LoopID.
+ if (auto *MDN = dyn_cast<MDNode>(Option)) {
+ // TODO: Is there a proper way to detect LoopID?
+ if (MDN->getNumOperands() > 1 && MDN->getOperand(0) == MDN) {
+ for (const MDOperand &NestedOption : drop_begin(MDN->operands())) {
+ MDs.push_back(NestedOption.get());
+ Changed = true;
+ }
+ continue;
+ }
+ }
+
+ // If Option does't seem to be a LoopID, push it as it is.
MDs.push_back(Option.get());
Changed = true;
}
diff --git a/llvm/test/Transforms/LoopVectorize/make-followup-loop-id.ll b/llvm/test/Transforms/LoopVectorize/make-followup-loop-id.ll
index fa5c206547a07..41f508e0a7641 100644
--- a/llvm/test/Transforms/LoopVectorize/make-followup-loop-id.ll
+++ b/llvm/test/Transforms/LoopVectorize/make-followup-loop-id.ll
@@ -11,10 +11,6 @@
; a[i] *= x;
; }
; }
-;
-; FIXME: Currently unrolling is not applied. This is because the new Loop ID
-; created after vectorization does not directly contain unroll metadata.
-; Unexpected nests have been created.
define void @f(ptr noundef captures(none) %a, float noundef %x) {
; CHECK-LABEL: define void @f(
; CHECK-SAME: ptr noundef captures(none) [[A:%.*]], float noundef [[X:%.*]]) {
@@ -25,14 +21,47 @@ define void @f(ptr noundef captures(none) %a, float noundef %x) {
; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer
; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
; CHECK: [[VECTOR_BODY]]:
-; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
-; CHECK-NEXT: [[INDEX_NEXT_6:%.*]] = add i64 [[INDEX]], 0
+; CHECK-NEXT: [[INDEX_NEXT_6:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX_NEXT_6]]
-; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds nuw float, ptr [[TMP14]], i32 0
-; CHECK-NEXT: [[WIDE_LOAD_7:%.*]] = load <4 x float>, ptr [[TMP2]], align 4
+; CHECK-NEXT: [[WIDE_LOAD_7:%.*]] = load <4 x float>, ptr [[TMP14]], align 4
; CHECK-NEXT: [[TMP15:%.*]] = fmul <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD_7]]
-; CHECK-NEXT: store <4 x float> [[TMP15]], ptr [[TMP2]], align 4
-; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
+; CHECK-NEXT: store <4 x float> [[TMP15]], ptr [[TMP14]], align 4
+; CHECK-NEXT: [[INDEX_NEXT1:%.*]] = add nuw nsw i64 [[INDEX_NEXT_6]], 4
+; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX_NEXT1]]
+; CHECK-NEXT: [[WIDE_LOAD_1:%.*]] = load <4 x float>, ptr [[TMP2]], align 4
+; CHECK-NEXT: [[TMP3:%.*]] = fmul <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD_1]]
+; CHECK-NEXT: store <4 x float> [[TMP3]], ptr [[TMP2]], align 4
+; CHECK-NEXT: [[INDEX_NEXT_1:%.*]] = add nuw nsw i64 [[INDEX_NEXT_6]], 8
+; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX_NEXT_1]]
+; CHECK-NEXT: [[WIDE_LOAD_2:%.*]] = load <4 x float>, ptr [[TMP16]], align 4
+; CHECK-NEXT: [[TMP5:%.*]] = fmul <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD_2]]
+; CHECK-NEXT: store <4 x float> [[TMP5]], ptr [[TMP16]], align 4
+; CHECK-NEXT: [[INDEX_NEXT_2:%.*]] = add nuw nsw i64 [[INDEX_NEXT_6]], 12
+; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX_NEXT_2]]
+; CHECK-NEXT: [[WIDE_LOAD_3:%.*]] = load <4 x float>, ptr [[TMP6]], align 4
+; CHECK-NEXT: [[TMP7:%.*]] = fmul <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD_3]]
+; CHECK-NEXT: store <4 x float> [[TMP7]], ptr [[TMP6]], align 4
+; CHECK-NEXT: [[INDEX_NEXT_3:%.*]] = add nuw nsw i64 [[INDEX_NEXT_6]], 16
+; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX_NEXT_3]]
+; CHECK-NEXT: [[WIDE_LOAD_4:%.*]] = load <4 x float>, ptr [[TMP8]], align 4
+; CHECK-NEXT: [[TMP9:%.*]] = fmul <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD_4]]
+; CHECK-NEXT: store <4 x float> [[TMP9]], ptr [[TMP8]], align 4
+; CHECK-NEXT: [[INDEX_NEXT_4:%.*]] = add nuw nsw i64 [[INDEX_NEXT_6]], 20
+; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX_NEXT_4]]
+; CHECK-NEXT: [[WIDE_LOAD_5:%.*]] = load <4 x float>, ptr [[TMP10]], align 4
+; CHECK-NEXT: [[TMP11:%.*]] = fmul <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD_5]]
+; CHECK-NEXT: store <4 x float> [[TMP11]], ptr [[TMP10]], align 4
+; CHECK-NEXT: [[INDEX_NEXT_5:%.*]] = add nuw nsw i64 [[INDEX_NEXT_6]], 24
+; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX_NEXT_5]]
+; CHECK-NEXT: [[WIDE_LOAD_6:%.*]] = load <4 x float>, ptr [[TMP12]], align 4
+; CHECK-NEXT: [[TMP13:%.*]] = fmul <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD_6]]
+; CHECK-NEXT: store <4 x float> [[TMP13]], ptr [[TMP12]], align 4
+; CHECK-NEXT: [[INDEX_NEXT_7:%.*]] = add nuw nsw i64 [[INDEX_NEXT_6]], 28
+; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX_NEXT_7]]
+; CHECK-NEXT: [[WIDE_LOAD_8:%.*]] = load <4 x float>, ptr [[TMP17]], align 4
+; CHECK-NEXT: [[TMP18:%.*]] = fmul <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD_8]]
+; CHECK-NEXT: store <4 x float> [[TMP18]], ptr [[TMP17]], align 4
+; CHECK-NEXT: [[INDEX_NEXT]] = add nuw nsw i64 [[INDEX_NEXT_6]], 32
; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
; CHECK-NEXT: br i1 [[TMP4]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; CHECK: [[MIDDLE_BLOCK]]:
@@ -41,14 +70,49 @@ define void @f(ptr noundef captures(none) %a, float noundef %x) {
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1024, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
; CHECK-NEXT: br label %[[FOR_BODY:.*]]
; CHECK: [[FOR_BODY]]:
-; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT_7:%.*]], %[[FOR_BODY]] ]
; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[IV]]
; CHECK-NEXT: [[LOAD:%.*]] = load float, ptr [[ARRAYIDX]], align 4
; CHECK-NEXT: [[MUL:%.*]] = fmul float [[X]], [[LOAD]]
; CHECK-NEXT: store float [[MUL]], ptr [[ARRAYIDX]], align 4
-; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
-; CHECK-NEXT: [[COMP:%.*]] = icmp eq i64 [[IV_NEXT]], 1024
-; CHECK-NEXT: br i1 [[COMP]], label %[[EXIT_LOOPEXIT:.*]], label %[[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK-NEXT: [[IV_NEXT:%.*]] = add nuw nsw i64 [[IV]], 1
+; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[IV_NEXT]]
+; CHECK-NEXT: [[LOAD_1:%.*]] = load float, ptr [[ARRAYIDX_1]], align 4
+; CHECK-NEXT: [[MUL_1:%.*]] = fmul float [[X]], [[LOAD_1]]
+; CHECK-NEXT: store float [[MUL_1]], ptr [[ARRAYIDX_1]], align 4
+; CHECK-NEXT: [[IV_NEXT_1:%.*]] = add nuw nsw i64 [[IV]], 2
+; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[IV_NEXT_1]]
+; CHECK-NEXT: [[LOAD_2:%.*]] = load float, ptr [[ARRAYIDX_2]], align 4
+; CHECK-NEXT: [[MUL_2:%.*]] = fmul float [[X]], [[LOAD_2]]
+; CHECK-NEXT: store float [[MUL_2]], ptr [[ARRAYIDX_2]], align 4
+; CHECK-NEXT: [[IV_NEXT_2:%.*]] = add nuw nsw i64 [[IV]], 3
+; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[IV_NEXT_2]]
+; CHECK-NEXT: [[LOAD_3:%.*]] = load float, ptr [[ARRAYIDX_3]], align 4
+; CHECK-NEXT: [[MUL_3:%.*]] = fmul float [[X]], [[LOAD_3]]
+; CHECK-NEXT: store float [[MUL_3]], ptr [[ARRAYIDX_3]], align 4
+; CHECK-NEXT: [[IV_NEXT_3:%.*]] = add nuw nsw i64 [[IV]], 4
+; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[IV_NEXT_3]]
+; CHECK-NEXT: [[LOAD_4:%.*]] = load float, ptr [[ARRAYIDX_4]], align 4
+; CHECK-NEXT: [[MUL_4:%.*]] = fmul float [[X]], [[LOAD_4]]
+; CHECK-NEXT: store float [[MUL_4]], ptr [[ARRAYIDX_4]], align 4
+; CHECK-NEXT: [[IV_NEXT_4:%.*]] = add nuw nsw i64 [[IV]], 5
+; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[IV_NEXT_4]]
+; CHECK-NEXT: [[LOAD_5:%.*]] = load float, ptr [[ARRAYIDX_5]], align 4
+; CHECK-NEXT: [[MUL_5:%.*]] = fmul float [[X]], [[LOAD_5]]
+; CHECK-NEXT: store float [[MUL_5]], ptr [[ARRAYIDX_5]], align 4
+; CHECK-NEXT: [[IV_NEXT_5:%.*]] = add nuw nsw i64 [[IV]], 6
+; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[IV_NEXT_5]]
+; CHECK-NEXT: [[LOAD_6:%.*]] = load float, ptr [[ARRAYIDX_6]], align 4
+; CHECK-NEXT: [[MUL_6:%.*]] = fmul float [[X]], [[LOAD_6]]
+; CHECK-NEXT: store float [[MUL_6]], ptr [[ARRAYIDX_6]], align 4
+; CHECK-NEXT: [[IV_NEXT_6:%.*]] = add nuw nsw i64 [[IV]], 7
+; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[IV_NEXT_6]]
+; CHECK-NEXT: [[LOAD_7:%.*]] = load float, ptr [[ARRAYIDX_7]], align 4
+; CHECK-NEXT: [[MUL_7:%.*]] = fmul float [[X]], [[LOAD_7]]
+; CHECK-NEXT: store float [[MUL_7]], ptr [[ARRAYIDX_7]], align 4
+; CHECK-NEXT: [[IV_NEXT_7]] = add nuw nsw i64 [[IV]], 8
+; CHECK-NEXT: [[COMP_7:%.*]] = icmp eq i64 [[IV_NEXT_7]], 1024
+; CHECK-NEXT: br i1 [[COMP_7]], label %[[EXIT_LOOPEXIT:.*]], label %[[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
; CHECK: [[EXIT_LOOPEXIT]]:
; CHECK-NEXT: br label %[[EXIT]]
; CHECK: [[EXIT]]:
@@ -78,10 +142,8 @@ exit:
!4 = !{!"llvm.loop.isvectorized"}
!5 = !{!"llvm.loop.unroll.count", i32 8}
;.
-; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META4:![0-9]+]]}
-; CHECK: [[META1]] = distinct !{[[META1]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
-; CHECK: [[META2]] = !{!"llvm.loop.isvectorized"}
-; CHECK: [[META3]] = !{!"llvm.loop.unroll.count", i32 8}
-; CHECK: [[META4]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META1]]}
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.isvectorized"}
+; CHECK: [[META2]] = !{!"llvm.loop.unroll.disable"}
+; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META1]], [[META2]]}
;.
|
// TODO: Is there a proper way to detect LoopID? | ||
if (MDN->getNumOperands() > 1 && MDN->getOperand(0) == MDN) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a very good implementation, but I couldn't think of a better way.
It may be better to force the followup metadata to have exactly one LoopID. As far as I've investigated, some tests (e.g., Transforms/LoopDistribute/followup.ll
) contain followup metadata with non-LoopID data, but I believe we can fix them easily. Also, if my understanding is correct, those which actually generated by the frontend seem to have exactly one LoopID. Would it make sense to set such a restriction?
Looks to me that the mistake is that the llvm-project/llvm/test/Transforms/LoopDistribute/followup.ll Lines 129 to 132 in 29925b7
Sorry, didn't notice that when reviewing the test case. Whatever created it should be fixed. Whether an MDNode is a LoopID cannot be detected reliably. Having the first entry refer to itself was a technique to avoid collapsing identical LoopIDs before we had |
Thanks for the background explanation! I wasn't sure which one was correct, but I see that having the properties is correct. Let me clarify the problem: We should fix the processes of generating followups in llvm-project/clang/test/CodeGenCXX/pragma-loop.cpp Lines 217 to 221 in 67a0113
|
Looks like the problem is here: llvm-project/clang/lib/CodeGen/CGLoopInfo.cpp Line 134 in 39ce995
It should add the properties, not the generated LoopID (that would be used if the #pragma was not the followup of another #pragma). Considering that it might be necessary to merge multiple such followup properties (eg. both of I wrote both sides, so I obviously wasn't consistent with that myself. If you fix it, could you fix it for wall |
Yes, of course, I was going to fix everything from the beginning (in that sense, I should have added more tests in #131337). Thanks also for the implementation advice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly the
create*Metadata
should only return a list of properties instead of an MDNode, with the MDNode only to be created when applying the properties to a Loop. However, the easier fix would probably be to extract the properties from the MDNode and append to thefollowup_*
MDNode, with the returned MDNode becoming garbage-collected at some point.
Fixed create*Metadata
to return a list of properties instead of an MDNode. I believe unnecessary MDNodes would not be generated.
MDNode::get(Ctx, {MDString::get(Ctx, "llvm.loop.pipeline.disable"), | ||
ConstantAsMetadata::get(ConstantInt::get( | ||
llvm::Type::getInt1Ty(Ctx), 1))})); | ||
LoopProperties = NewLoopProperties; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Different from this PR, but I think we should set HasUserTransforms
to true here (same for other create*Metadata
). If not, the user-specified `disable' attributes would not be generated properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is because by design pipeline
is always the last transformation (it is implemented in the back-end, after all the IR optimization passes, there can be no transformation after this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about other transformations, e.g., vectorization?
llvm-project/clang/lib/CodeGen/CGLoopInfo.cpp
Lines 213 to 220 in ce8febb
if (Enabled == false) { | |
NewLoopProperties.append(LoopProperties.begin(), LoopProperties.end()); | |
NewLoopProperties.push_back( | |
MDNode::get(Ctx, {MDString::get(Ctx, "llvm.loop.vectorize.enable"), | |
ConstantAsMetadata::get(ConstantInt::get( | |
llvm::Type::getInt1Ty(Ctx), 0))})); | |
LoopProperties = NewLoopProperties; | |
} |
I just looked for it and found an issue that might be caused by this.
#75839
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, great work!
MDNode::get(Ctx, {MDString::get(Ctx, "llvm.loop.pipeline.disable"), | ||
ConstantAsMetadata::get(ConstantInt::get( | ||
llvm::Type::getInt1Ty(Ctx), 1))})); | ||
LoopProperties = NewLoopProperties; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is because by design pipeline
is always the last transformation (it is implemented in the back-end, after all the IR optimization passes, there can be no transformation after this).
if (Enabled == false) { | ||
NewLoopProperties.append(LoopProperties.begin(), LoopProperties.end()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line was not executed when Enabled == std::nullop
, so llvm.mustprogress
([[MP]]
) from LoopProperties
is never added. Should have been added unconditionally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When Enabled == std::nullopt
, LoopProperties
was used as is, not NewProperties
. So I think the cause is elsewhere. Anyway, it's enough to know that llvm.mustprogress
should be appended unconditionally, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle, LoopVectorize should know that if the original loop had a progress guarantee, then the vectorized loop will as well, so it should add llvm.loop.must_progress
no matter what. I don't think it currently does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle, LoopVectorize should know that if the original loop had a progress guarantee, then the vectorized loop will as well, so it should add
llvm.loop.must_progress
no matter what.
I think this is completely correct. What I didn't understand is, why the followup metadata of LOOP_6
(FOLLOW_VECTOR_6
) didn't have llvm.mustprogress
before this patch, but now it (FOLLOWUP_VECTOR_3
) does. I investigated a little deeper and found the cause; FOLLOWUP_VECTOR_6
actually had mustprogress
(?!). That is, the test passed for both of the following directives.
// Original.
// CHECK: ![[AFTER_VECTOR_6]] = distinct !{![[AFTER_VECTOR_6]], ![[ISVECTORIZED:.*]], ![[UNROLL_8:.*]]}
// This was also fine.
// CHECK: ![[AFTER_VECTOR_6]] = distinct !{![[AFTER_VECTOR_6]], [[MP]], ![[ISVECTORIZED:.*]], ![[UNROLL_8:.*]]}
Maybe FileCheck has a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same thing seems to happen elsewhere, e.g. LOOP_6
actually has vectorize.enable
but is not included in the CHECK directive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is due to the regex .*
being too greedy, so e.g. ![[ISVECTORIZED:.*]]
consumes multiple metadata nodes. A better one would be [_a-zA-Z0-9.]+
. In principle ![[ISVECTORIZED]] = {!"llvm.loop.isvectorized"}
(or UNROLL_8
) should be verified somewhere, which would then fail if it matched more than one node.
IMHO there are lots of problems with FileCheck on LLVM-IR, and this is just one of them. Another one is that by default CHECK: pet store
will match carpet store
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is due to the regex .* being too greedy, so e.g. ![[ISVECTORIZED:.*]] consumes multiple metadata nodes.
That makes sense! I got it, thank you!
Another one is that by default
CHECK: pet store
will matchcarpet store
.
Ugh, that's a tricky problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review!
if (Enabled == false) { | ||
NewLoopProperties.append(LoopProperties.begin(), LoopProperties.end()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When Enabled == std::nullopt
, LoopProperties
was used as is, not NewProperties
. So I think the cause is elsewhere. Anyway, it's enough to know that llvm.mustprogress
should be appended unconditionally, thanks.
MDNode::get(Ctx, {MDString::get(Ctx, "llvm.loop.pipeline.disable"), | ||
ConstantAsMetadata::get(ConstantInt::get( | ||
llvm::Type::getInt1Ty(Ctx), 1))})); | ||
LoopProperties = NewLoopProperties; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about other transformations, e.g., vectorization?
llvm-project/clang/lib/CodeGen/CGLoopInfo.cpp
Lines 213 to 220 in ce8febb
if (Enabled == false) { | |
NewLoopProperties.append(LoopProperties.begin(), LoopProperties.end()); | |
NewLoopProperties.push_back( | |
MDNode::get(Ctx, {MDString::get(Ctx, "llvm.loop.vectorize.enable"), | |
ConstantAsMetadata::get(ConstantInt::get( | |
llvm::Type::getInt1Ty(Ctx), 0))})); | |
LoopProperties = NewLoopProperties; | |
} |
I just looked for it and found an issue that might be caused by this.
#75839
When pragma of loop transformations is specified, follow-up metadata for loops is generated after each transformation. On the LLVM side, follow-up metadata is expected to be a list of properties, such as the following: ``` !followup = !{!"llvm.loop.vectorize.followup_all", !mp, !isvectorized} !mp = !{!"llvm.loop.mustprogress"} !isvectorized = !{"llvm.loop.isvectorized"} ``` However, on the clang side, the generated metadata contains an MDNode that has those properties, as shown below: ``` !followup = !{!"llvm.loop.vectorize.followup_all", !loop_id} !loop_id = distinct !{!loop_id, !mp, !isvectorized} !mp = !{!"llvm.loop.mustprogress"} !isvectorized = !{"llvm.loop.isvectorized"} ``` According to the LangRef, the LLVM side is correct. (ref: https://llvm.org/docs/TransformMetadata.html#transformation-metadata-structure). Due to this inconsistency, follow-up metadata was not interpreted correctly, e.g., only one transformation is applied when multiple pragmas are used. This patch fixes clang side to emit followup metadata in correct format.
ce8febb
to
63dec28
Compare
pragma-loop.cpp contains tests for loop metadata generated via pragma directives. These tests were not working as (perhaps) expected. This is because the regex `.*` consumes multiple elements in metadata. For example, ``` ![[LOOP_9]] = distinct !{![[LOOP_9]], ![[WIDTH_8:.*]], ![[FIXED_VEC]], ...} ``` `[[WIDTH_8]]` would have been expected to match a node like `[[WIDTH_8]] = !{!"llvm.loop.vectorize.width", i32 8}`. However, since there is no check directive to verify the contents of `[[WIDTH_8]]`, `[[WIDTH_8:.*]]` consumed more than one element. There were other similar cases. This patch fixes this problem by not using regex matcher in the metadata contents. Instead, it uses string variables whose contents are validated elsewhere. Related: llvm#131985 (comment)
pragma-loop.cpp contains tests for loop metadata generated by pragma directives. These tests didn't work as (perhaps) expected. This is because the regex `.*` consumes multiple elements in the metadata. For example, there was a check directive like this. ``` // CHECK: ![[LOOP_9]] = distinct !{![[LOOP_9]], ![[WIDTH_8:.*]], ![[FIXED_VEC]], ...} ``` In the above case, `[[WIDTH_8]]` would have been expected to match a node like `[[WIDTH_8]] = !{!"llvm.loop.vectorize.width", i32 8}`. However, since there is no check directive to verify the contents of `[[WIDTH_8]]`, the regex `.*` consumed more than one element. There were other similar cases. This patch fixes the problem by not using regex matcher in the metadata content (except for follow-up metadata). Instead, it uses string variables whose contents are validated elsewhere. Related: #131985 (comment)
…#133707) pragma-loop.cpp contains tests for loop metadata generated by pragma directives. These tests didn't work as (perhaps) expected. This is because the regex `.*` consumes multiple elements in the metadata. For example, there was a check directive like this. ``` // CHECK: ![[LOOP_9]] = distinct !{![[LOOP_9]], ![[WIDTH_8:.*]], ![[FIXED_VEC]], ...} ``` In the above case, `[[WIDTH_8]]` would have been expected to match a node like `[[WIDTH_8]] = !{!"llvm.loop.vectorize.width", i32 8}`. However, since there is no check directive to verify the contents of `[[WIDTH_8]]`, the regex `.*` consumed more than one element. There were other similar cases. This patch fixes the problem by not using regex matcher in the metadata content (except for follow-up metadata). Instead, it uses string variables whose contents are validated elsewhere. Related: llvm/llvm-project#131985 (comment)
We're facing a regression after this one. We can't build https://github.com/dslarm/Financial-Services-Workload-Samples/tree/main/MonteCarloEuropeanOptions with
|
For more details on the above, the problem happens once the loop vectorizer transforms a loop with this metadata that's inside an openmp block. The dbg metadata becomes incorrectly attached and as a result verifyFunction returns a failure.
|
Thanks for sharing. I haven't looked at the details yet, but I simplified it as follows. #include <omp.h>
void g(float);
void f(int m) {
#pragma omp parallel
{
float v0 = 0.0;
#pragma omp simd reduction(+:v0)
#pragma unroll(4)
for(int i=0; i < m; i++)
{
v0 += i;
}
#pragma omp barrier
g(v0);
}
} godbold: https://godbolt.org/z/P3cW4To99 |
That's not supposed to work. I think in the past the unroll pragma was just dropped. If the debug info is the issue, there hasn't been a lot of care to get OpenMP emit correct debug info, see e.g. #110700. For this reason OpenMP 5.1 added new pragma: #pragma omp simd reduction(+:v0)
#pragma omp unroll partial(4) |
FWIW, I tried it and the following error occurred.
|
One thing that worries me is that before this commit, there was no ICE caused by compiling this snippet of code (regardless its validity or usefulness), and now there is. It would be great if it could print a normal error message (which would encourage my resistant colleagues to rewrite the code), rather than failing with a segfault which triggers the 'fix the compiler' demands. |
Printing an error message looks reasonable to me, but I think we first need to clarify what codes are "unsupported". |
When pragma of loop transformations is specified, follow-up metadata for loops is generated after each transformation. On the LLVM side, follow-up metadata is expected to be a list of properties, such as the following:
However, on the clang side, the generated metadata contains an MDNode that has those properties, as shown below:
According to the LangRef, the LLVM side is correct. Due to this inconsistency, follow-up metadata was not interpreted correctly, e.g., only one transformation is applied when multiple pragmas are used.
This patch fixes clang side to emit followup metadata in correct format.