Skip to content

[clang][CodeGen] Generate follow-up metadata for loops in correct format #131985

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 27, 2025

Conversation

kasuga-fj
Copy link
Contributor

@kasuga-fj kasuga-fj commented Mar 19, 2025

When pragma of loop transformations is specified, follow-up metadata for loops is generated after each transformation. On the LLVM side, follow-up metadata is expected to be a list of properties, such as the following:

!followup = !{!"llvm.loop.vectorize.followup_all", !mp, !isvectorized}
!mp = !{!"llvm.loop.mustprogress"}
!isvectorized = !{"llvm.loop.isvectorized"}

However, on the clang side, the generated metadata contains an MDNode that has those properties, as shown below:

!followup = !{!"llvm.loop.vectorize.followup_all", !loop_id}
!loop_id = distinct !{!loop_id, !mp, !isvectorized}
!mp = !{!"llvm.loop.mustprogress"}
!isvectorized = !{"llvm.loop.isvectorized"}

According to the LangRef, the LLVM side is correct. Due to this inconsistency, follow-up metadata was not interpreted correctly, e.g., only one transformation is applied when multiple pragmas are used.

This patch fixes clang side to emit followup metadata in correct format.

@llvmbot
Copy link
Member

llvmbot commented Mar 19, 2025

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-llvm-transforms

Author: Ryotaro Kasuga (kasuga-fj)

Changes

When multiple pragma for loop transformations are specified, such as:

#pragma clang loop vectorize(enable) unroll_count(8)
for (...) {
}

The generated metadata would look like this:

!0 = distinct !{!0, !1, !2}
!1 = !{"llvm.loop.vectorize.enable", i1 true}
!2 = !{"llvm.loop.vectorize.followup_all", !3}
!3 = distinct !{!3, !4, !5}
!4 = !{"llvm.loop.isvectorized"}
!5 = !{"llvm.loop.unroll_count", i32, 8}

For a loop with !0 as its LoopID, the new LoopID after vectorized should be like as below, so that we can know that this loop is already vectorized and should be unrolled with specified count:

!6 = distinct !{!6, !4, !5}

However, the current implementation creates new LoopID like:

!7 = distinct !{!7, !3}

Therefore subsequent passes like LoopUnroll fails to recognize the attributes of this loop correctly.

This patch fixes makeFollowupLoopID, which creates a new LoopID after each transformation. If the follow-up metadata (!3 in the above case) is a LoopID, the new LoopID will contain its operands (!4 and !5) instead of the metadata itself.


Full diff: https://github.com/llvm/llvm-project/pull/131985.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/LoopUtils.cpp (+29)
  • (modified) llvm/test/Transforms/LoopVectorize/make-followup-loop-id.ll (+82-20)
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 84c08556f8a25..4a6105add953f 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -317,6 +317,35 @@ std::optional<MDNode *> llvm::makeFollowupLoopID(
 
     HasAnyFollowup = true;
     for (const MDOperand &Option : drop_begin(FollowupNode->operands())) {
+      // The followup metadata typically forms as follows:
+      //
+      //   !0 = distinct !{!0, !1, !2}
+      //   !1 = !{!"llvm.loop.distribute.enable", i1 true}
+      //   !2 = !{!"llvm.loop.distribute.followup_all", !3}
+      //   !3 = distinct !{!3, !4}
+      //   !4 = !{!"llvm.loop.vectorize.enable", i1 true}
+      //
+      // If we push Option (!3 in this case) in MDs, the new metadata looks
+      // something like:
+      //
+      //   !5 = distinct !{!5, !3}
+      //
+      // This doesn't contain !4, so the vectorization pass doesn't recognize
+      // this loop as vectorization enabled. To make the new metadata contain !4
+      // instead of !3, traverse all of Option's operands and push them into
+      // MDs if Option seems to be a LoopID.
+      if (auto *MDN = dyn_cast<MDNode>(Option)) {
+        // TODO: Is there a proper way to detect LoopID?
+        if (MDN->getNumOperands() > 1 && MDN->getOperand(0) == MDN) {
+          for (const MDOperand &NestedOption : drop_begin(MDN->operands())) {
+            MDs.push_back(NestedOption.get());
+            Changed = true;
+          }
+          continue;
+        }
+      }
+
+      // If Option does't seem to be a LoopID, push it as it is.
       MDs.push_back(Option.get());
       Changed = true;
     }
diff --git a/llvm/test/Transforms/LoopVectorize/make-followup-loop-id.ll b/llvm/test/Transforms/LoopVectorize/make-followup-loop-id.ll
index fa5c206547a07..41f508e0a7641 100644
--- a/llvm/test/Transforms/LoopVectorize/make-followup-loop-id.ll
+++ b/llvm/test/Transforms/LoopVectorize/make-followup-loop-id.ll
@@ -11,10 +11,6 @@
 ;     a[i] *= x;
 ;   }
 ; }
-;
-; FIXME: Currently unrolling is not applied. This is because the new Loop ID
-; created after vectorization does not directly contain unroll metadata.
-; Unexpected nests have been created.
 define void @f(ptr noundef captures(none) %a, float noundef %x) {
 ; CHECK-LABEL: define void @f(
 ; CHECK-SAME: ptr noundef captures(none) [[A:%.*]], float noundef [[X:%.*]]) {
@@ -25,14 +21,47 @@ define void @f(ptr noundef captures(none) %a, float noundef %x) {
 ; CHECK-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer
 ; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
 ; CHECK:       [[VECTOR_BODY]]:
-; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
-; CHECK-NEXT:    [[INDEX_NEXT_6:%.*]] = add i64 [[INDEX]], 0
+; CHECK-NEXT:    [[INDEX_NEXT_6:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
 ; CHECK-NEXT:    [[TMP14:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX_NEXT_6]]
-; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds nuw float, ptr [[TMP14]], i32 0
-; CHECK-NEXT:    [[WIDE_LOAD_7:%.*]] = load <4 x float>, ptr [[TMP2]], align 4
+; CHECK-NEXT:    [[WIDE_LOAD_7:%.*]] = load <4 x float>, ptr [[TMP14]], align 4
 ; CHECK-NEXT:    [[TMP15:%.*]] = fmul <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD_7]]
-; CHECK-NEXT:    store <4 x float> [[TMP15]], ptr [[TMP2]], align 4
-; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
+; CHECK-NEXT:    store <4 x float> [[TMP15]], ptr [[TMP14]], align 4
+; CHECK-NEXT:    [[INDEX_NEXT1:%.*]] = add nuw nsw i64 [[INDEX_NEXT_6]], 4
+; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX_NEXT1]]
+; CHECK-NEXT:    [[WIDE_LOAD_1:%.*]] = load <4 x float>, ptr [[TMP2]], align 4
+; CHECK-NEXT:    [[TMP3:%.*]] = fmul <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD_1]]
+; CHECK-NEXT:    store <4 x float> [[TMP3]], ptr [[TMP2]], align 4
+; CHECK-NEXT:    [[INDEX_NEXT_1:%.*]] = add nuw nsw i64 [[INDEX_NEXT_6]], 8
+; CHECK-NEXT:    [[TMP16:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX_NEXT_1]]
+; CHECK-NEXT:    [[WIDE_LOAD_2:%.*]] = load <4 x float>, ptr [[TMP16]], align 4
+; CHECK-NEXT:    [[TMP5:%.*]] = fmul <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD_2]]
+; CHECK-NEXT:    store <4 x float> [[TMP5]], ptr [[TMP16]], align 4
+; CHECK-NEXT:    [[INDEX_NEXT_2:%.*]] = add nuw nsw i64 [[INDEX_NEXT_6]], 12
+; CHECK-NEXT:    [[TMP6:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX_NEXT_2]]
+; CHECK-NEXT:    [[WIDE_LOAD_3:%.*]] = load <4 x float>, ptr [[TMP6]], align 4
+; CHECK-NEXT:    [[TMP7:%.*]] = fmul <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD_3]]
+; CHECK-NEXT:    store <4 x float> [[TMP7]], ptr [[TMP6]], align 4
+; CHECK-NEXT:    [[INDEX_NEXT_3:%.*]] = add nuw nsw i64 [[INDEX_NEXT_6]], 16
+; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX_NEXT_3]]
+; CHECK-NEXT:    [[WIDE_LOAD_4:%.*]] = load <4 x float>, ptr [[TMP8]], align 4
+; CHECK-NEXT:    [[TMP9:%.*]] = fmul <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD_4]]
+; CHECK-NEXT:    store <4 x float> [[TMP9]], ptr [[TMP8]], align 4
+; CHECK-NEXT:    [[INDEX_NEXT_4:%.*]] = add nuw nsw i64 [[INDEX_NEXT_6]], 20
+; CHECK-NEXT:    [[TMP10:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX_NEXT_4]]
+; CHECK-NEXT:    [[WIDE_LOAD_5:%.*]] = load <4 x float>, ptr [[TMP10]], align 4
+; CHECK-NEXT:    [[TMP11:%.*]] = fmul <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD_5]]
+; CHECK-NEXT:    store <4 x float> [[TMP11]], ptr [[TMP10]], align 4
+; CHECK-NEXT:    [[INDEX_NEXT_5:%.*]] = add nuw nsw i64 [[INDEX_NEXT_6]], 24
+; CHECK-NEXT:    [[TMP12:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX_NEXT_5]]
+; CHECK-NEXT:    [[WIDE_LOAD_6:%.*]] = load <4 x float>, ptr [[TMP12]], align 4
+; CHECK-NEXT:    [[TMP13:%.*]] = fmul <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD_6]]
+; CHECK-NEXT:    store <4 x float> [[TMP13]], ptr [[TMP12]], align 4
+; CHECK-NEXT:    [[INDEX_NEXT_7:%.*]] = add nuw nsw i64 [[INDEX_NEXT_6]], 28
+; CHECK-NEXT:    [[TMP17:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX_NEXT_7]]
+; CHECK-NEXT:    [[WIDE_LOAD_8:%.*]] = load <4 x float>, ptr [[TMP17]], align 4
+; CHECK-NEXT:    [[TMP18:%.*]] = fmul <4 x float> [[BROADCAST_SPLAT]], [[WIDE_LOAD_8]]
+; CHECK-NEXT:    store <4 x float> [[TMP18]], ptr [[TMP17]], align 4
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw nsw i64 [[INDEX_NEXT_6]], 32
 ; CHECK-NEXT:    [[TMP4:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
 ; CHECK-NEXT:    br i1 [[TMP4]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
@@ -41,14 +70,49 @@ define void @f(ptr noundef captures(none) %a, float noundef %x) {
 ; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 1024, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_BODY]]:
-; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; CHECK-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT_7:%.*]], %[[FOR_BODY]] ]
 ; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[IV]]
 ; CHECK-NEXT:    [[LOAD:%.*]] = load float, ptr [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[MUL:%.*]] = fmul float [[X]], [[LOAD]]
 ; CHECK-NEXT:    store float [[MUL]], ptr [[ARRAYIDX]], align 4
-; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
-; CHECK-NEXT:    [[COMP:%.*]] = icmp eq i64 [[IV_NEXT]], 1024
-; CHECK-NEXT:    br i1 [[COMP]], label %[[EXIT_LOOPEXIT:.*]], label %[[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK-NEXT:    [[IV_NEXT:%.*]] = add nuw nsw i64 [[IV]], 1
+; CHECK-NEXT:    [[ARRAYIDX_1:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[IV_NEXT]]
+; CHECK-NEXT:    [[LOAD_1:%.*]] = load float, ptr [[ARRAYIDX_1]], align 4
+; CHECK-NEXT:    [[MUL_1:%.*]] = fmul float [[X]], [[LOAD_1]]
+; CHECK-NEXT:    store float [[MUL_1]], ptr [[ARRAYIDX_1]], align 4
+; CHECK-NEXT:    [[IV_NEXT_1:%.*]] = add nuw nsw i64 [[IV]], 2
+; CHECK-NEXT:    [[ARRAYIDX_2:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[IV_NEXT_1]]
+; CHECK-NEXT:    [[LOAD_2:%.*]] = load float, ptr [[ARRAYIDX_2]], align 4
+; CHECK-NEXT:    [[MUL_2:%.*]] = fmul float [[X]], [[LOAD_2]]
+; CHECK-NEXT:    store float [[MUL_2]], ptr [[ARRAYIDX_2]], align 4
+; CHECK-NEXT:    [[IV_NEXT_2:%.*]] = add nuw nsw i64 [[IV]], 3
+; CHECK-NEXT:    [[ARRAYIDX_3:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[IV_NEXT_2]]
+; CHECK-NEXT:    [[LOAD_3:%.*]] = load float, ptr [[ARRAYIDX_3]], align 4
+; CHECK-NEXT:    [[MUL_3:%.*]] = fmul float [[X]], [[LOAD_3]]
+; CHECK-NEXT:    store float [[MUL_3]], ptr [[ARRAYIDX_3]], align 4
+; CHECK-NEXT:    [[IV_NEXT_3:%.*]] = add nuw nsw i64 [[IV]], 4
+; CHECK-NEXT:    [[ARRAYIDX_4:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[IV_NEXT_3]]
+; CHECK-NEXT:    [[LOAD_4:%.*]] = load float, ptr [[ARRAYIDX_4]], align 4
+; CHECK-NEXT:    [[MUL_4:%.*]] = fmul float [[X]], [[LOAD_4]]
+; CHECK-NEXT:    store float [[MUL_4]], ptr [[ARRAYIDX_4]], align 4
+; CHECK-NEXT:    [[IV_NEXT_4:%.*]] = add nuw nsw i64 [[IV]], 5
+; CHECK-NEXT:    [[ARRAYIDX_5:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[IV_NEXT_4]]
+; CHECK-NEXT:    [[LOAD_5:%.*]] = load float, ptr [[ARRAYIDX_5]], align 4
+; CHECK-NEXT:    [[MUL_5:%.*]] = fmul float [[X]], [[LOAD_5]]
+; CHECK-NEXT:    store float [[MUL_5]], ptr [[ARRAYIDX_5]], align 4
+; CHECK-NEXT:    [[IV_NEXT_5:%.*]] = add nuw nsw i64 [[IV]], 6
+; CHECK-NEXT:    [[ARRAYIDX_6:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[IV_NEXT_5]]
+; CHECK-NEXT:    [[LOAD_6:%.*]] = load float, ptr [[ARRAYIDX_6]], align 4
+; CHECK-NEXT:    [[MUL_6:%.*]] = fmul float [[X]], [[LOAD_6]]
+; CHECK-NEXT:    store float [[MUL_6]], ptr [[ARRAYIDX_6]], align 4
+; CHECK-NEXT:    [[IV_NEXT_6:%.*]] = add nuw nsw i64 [[IV]], 7
+; CHECK-NEXT:    [[ARRAYIDX_7:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[IV_NEXT_6]]
+; CHECK-NEXT:    [[LOAD_7:%.*]] = load float, ptr [[ARRAYIDX_7]], align 4
+; CHECK-NEXT:    [[MUL_7:%.*]] = fmul float [[X]], [[LOAD_7]]
+; CHECK-NEXT:    store float [[MUL_7]], ptr [[ARRAYIDX_7]], align 4
+; CHECK-NEXT:    [[IV_NEXT_7]] = add nuw nsw i64 [[IV]], 8
+; CHECK-NEXT:    [[COMP_7:%.*]] = icmp eq i64 [[IV_NEXT_7]], 1024
+; CHECK-NEXT:    br i1 [[COMP_7]], label %[[EXIT_LOOPEXIT:.*]], label %[[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
 ; CHECK:       [[EXIT_LOOPEXIT]]:
 ; CHECK-NEXT:    br label %[[EXIT]]
 ; CHECK:       [[EXIT]]:
@@ -78,10 +142,8 @@ exit:
 !4 = !{!"llvm.loop.isvectorized"}
 !5 = !{!"llvm.loop.unroll.count", i32 8}
 ;.
-; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META4:![0-9]+]]}
-; CHECK: [[META1]] = distinct !{[[META1]], [[META2:![0-9]+]], [[META3:![0-9]+]]}
-; CHECK: [[META2]] = !{!"llvm.loop.isvectorized"}
-; CHECK: [[META3]] = !{!"llvm.loop.unroll.count", i32 8}
-; CHECK: [[META4]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META1]]}
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.isvectorized"}
+; CHECK: [[META2]] = !{!"llvm.loop.unroll.disable"}
+; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META1]], [[META2]]}
 ;.

Comment on lines 338 to 339
// TODO: Is there a proper way to detect LoopID?
if (MDN->getNumOperands() > 1 && MDN->getOperand(0) == MDN) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a very good implementation, but I couldn't think of a better way.

It may be better to force the followup metadata to have exactly one LoopID. As far as I've investigated, some tests (e.g., Transforms/LoopDistribute/followup.ll) contain followup metadata with non-LoopID data, but I believe we can fix them easily. Also, if my understanding is correct, those which actually generated by the frontend seem to have exactly one LoopID. Would it make sense to set such a restriction?

@kasuga-fj kasuga-fj requested a review from Meinersbur March 19, 2025 09:32
@Meinersbur
Copy link
Member

Meinersbur commented Mar 19, 2025

Looks to me that the mistake is that the followup_all contains the LoopID, while it should contain the individual properties. See e.g.

!2 = !{!"llvm.loop.distribute.followup_all", !{!"FollowupAll"}}
!3 = !{!"llvm.loop.distribute.followup_coincident", !{!"FollowupCoincident", i1 false}}
!4 = !{!"llvm.loop.distribute.followup_sequential", !{!"FollowupSequential", i32 8}}
!5 = !{!"llvm.loop.distribute.followup_fallback", !{!"FollowupFallback"}}
and the documentation at https://llvm.org/docs/TransformMetadata.html.

Sorry, didn't notice that when reviewing the test case. Whatever created it should be fixed. Whether an MDNode is a LoopID cannot be detected reliably. Having the first entry refer to itself was a technique to avoid collapsing identical LoopIDs before we had distinct. There might be other kinds of metadata that also still use this technique. For LoopID it is pointless, passes that duplicate code (e.g. inlining, LoopUnroll, ...) do not care about ensuring that each LoopID is distinct anyway. It should be more thought of as a bag of properties.

@kasuga-fj
Copy link
Contributor Author

kasuga-fj commented Mar 19, 2025

Thanks for the background explanation! I wasn't sure which one was correct, but I see that having the properties is correct.

Let me clarify the problem: We should fix the processes of generating followups in CGLoopInfo.cpp, right? It seems that they generate followup_* having a LoopID, not properties, e.g.,

// CHECK: ![[LOOP_3]] = distinct !{![[LOOP_3]], [[MP]], ![[INTERLEAVE_4:.*]], ![[VECTORIZE_ENABLE]], ![[FOLLOWUP_VECTOR_3:.*]]}
// CHECK: ![[FOLLOWUP_VECTOR_3]] = !{!"llvm.loop.vectorize.followup_all", ![[AFTER_VECTOR_3:.*]]}
// CHECK: ![[AFTER_VECTOR_3]] = distinct !{![[AFTER_VECTOR_3]], [[MP]], ![[ISVECTORIZED:.*]], ![[UNROLL_8:.*]]}
// CHECK: ![[ISVECTORIZED]] = !{!"llvm.loop.isvectorized"}
// CHECK: ![[UNROLL_8]] = !{!"llvm.loop.unroll.count", i32 8}

@Meinersbur
Copy link
Member

Meinersbur commented Mar 19, 2025

Looks like the problem is here:

Ctx, {MDString::get(Ctx, "llvm.loop.unroll.followup_all"), Followup}));

It should add the properties, not the generated LoopID (that would be used if the #pragma was not the followup of another #pragma). Considering that it might be necessary to merge multiple such followup properties (eg. both of followup_all and followup_vectorized should be added to the vectorized loop).

I wrote both sides, so I obviously wasn't consistent with that myself. If you fix it, could you fix it for wall followup_*? Possibly the create*Metadata should only return a list of properties instead of an MDNode, with the MDNode only to be created when applying the properties to a Loop. However, the easier fix would probably be to extract the properties from the MDNode and append to the followup_* MDNode, with the returned MDNode becoming garbage-collected at some point.

@kasuga-fj
Copy link
Contributor Author

If you fix it, could you fix it for wall followup_*?

Yes, of course, I was going to fix everything from the beginning (in that sense, I should have added more tests in #131337). Thanks also for the implementation advice.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Mar 26, 2025
@kasuga-fj kasuga-fj changed the title [LoopUtils] Fix metadata generated by makeFollowupLoopID [clang][CodeGen] Generate follow-up metadata for loops in correct format Mar 26, 2025
Copy link
Contributor Author

@kasuga-fj kasuga-fj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly the create*Metadata should only return a list of properties instead of an MDNode, with the MDNode only to be created when applying the properties to a Loop. However, the easier fix would probably be to extract the properties from the MDNode and append to the followup_* MDNode, with the returned MDNode becoming garbage-collected at some point.

Fixed create*Metadata to return a list of properties instead of an MDNode. I believe unnecessary MDNodes would not be generated.

MDNode::get(Ctx, {MDString::get(Ctx, "llvm.loop.pipeline.disable"),
ConstantAsMetadata::get(ConstantInt::get(
llvm::Type::getInt1Ty(Ctx), 1))}));
LoopProperties = NewLoopProperties;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different from this PR, but I think we should set HasUserTransforms to true here (same for other create*Metadata). If not, the user-specified `disable' attributes would not be generated properly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is because by design pipeline is always the last transformation (it is implemented in the back-end, after all the IR optimization passes, there can be no transformation after this).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about other transformations, e.g., vectorization?

if (Enabled == false) {
NewLoopProperties.append(LoopProperties.begin(), LoopProperties.end());
NewLoopProperties.push_back(
MDNode::get(Ctx, {MDString::get(Ctx, "llvm.loop.vectorize.enable"),
ConstantAsMetadata::get(ConstantInt::get(
llvm::Type::getInt1Ty(Ctx), 0))}));
LoopProperties = NewLoopProperties;
}

I just looked for it and found an issue that might be caused by this.
#75839

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great work!

MDNode::get(Ctx, {MDString::get(Ctx, "llvm.loop.pipeline.disable"),
ConstantAsMetadata::get(ConstantInt::get(
llvm::Type::getInt1Ty(Ctx), 1))}));
LoopProperties = NewLoopProperties;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is because by design pipeline is always the last transformation (it is implemented in the back-end, after all the IR optimization passes, there can be no transformation after this).

if (Enabled == false) {
NewLoopProperties.append(LoopProperties.begin(), LoopProperties.end());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line was not executed when Enabled == std::nullop, so llvm.mustprogress ([[MP]]) from LoopProperties is never added. Should have been added unconditionally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When Enabled == std::nullopt, LoopProperties was used as is, not NewProperties. So I think the cause is elsewhere. Anyway, it's enough to know that llvm.mustprogress should be appended unconditionally, thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle, LoopVectorize should know that if the original loop had a progress guarantee, then the vectorized loop will as well, so it should add llvm.loop.must_progress no matter what. I don't think it currently does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle, LoopVectorize should know that if the original loop had a progress guarantee, then the vectorized loop will as well, so it should add llvm.loop.must_progress no matter what.

I think this is completely correct. What I didn't understand is, why the followup metadata of LOOP_6 (FOLLOW_VECTOR_6) didn't have llvm.mustprogress before this patch, but now it (FOLLOWUP_VECTOR_3) does. I investigated a little deeper and found the cause; FOLLOWUP_VECTOR_6 actually had mustprogress (?!). That is, the test passed for both of the following directives.

// Original.
// CHECK: ![[AFTER_VECTOR_6]] = distinct !{![[AFTER_VECTOR_6]], ![[ISVECTORIZED:.*]], ![[UNROLL_8:.*]]}

// This was also fine.
// CHECK: ![[AFTER_VECTOR_6]] = distinct !{![[AFTER_VECTOR_6]], [[MP]], ![[ISVECTORIZED:.*]], ![[UNROLL_8:.*]]}

Maybe FileCheck has a problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same thing seems to happen elsewhere, e.g. LOOP_6 actually has vectorize.enable but is not included in the CHECK directive.

Copy link
Member

@Meinersbur Meinersbur Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is due to the regex .* being too greedy, so e.g. ![[ISVECTORIZED:.*]] consumes multiple metadata nodes. A better one would be [_a-zA-Z0-9.]+. In principle ![[ISVECTORIZED]] = {!"llvm.loop.isvectorized"} (or UNROLL_8) should be verified somewhere, which would then fail if it matched more than one node.

IMHO there are lots of problems with FileCheck on LLVM-IR, and this is just one of them. Another one is that by default CHECK: pet store will match carpet store.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is due to the regex .* being too greedy, so e.g. ![[ISVECTORIZED:.*]] consumes multiple metadata nodes.

That makes sense! I got it, thank you!

Another one is that by default CHECK: pet store will match carpet store.

Ugh, that's a tricky problem.

Copy link
Contributor Author

@kasuga-fj kasuga-fj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review!

if (Enabled == false) {
NewLoopProperties.append(LoopProperties.begin(), LoopProperties.end());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When Enabled == std::nullopt, LoopProperties was used as is, not NewProperties. So I think the cause is elsewhere. Anyway, it's enough to know that llvm.mustprogress should be appended unconditionally, thanks.

MDNode::get(Ctx, {MDString::get(Ctx, "llvm.loop.pipeline.disable"),
ConstantAsMetadata::get(ConstantInt::get(
llvm::Type::getInt1Ty(Ctx), 1))}));
LoopProperties = NewLoopProperties;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about other transformations, e.g., vectorization?

if (Enabled == false) {
NewLoopProperties.append(LoopProperties.begin(), LoopProperties.end());
NewLoopProperties.push_back(
MDNode::get(Ctx, {MDString::get(Ctx, "llvm.loop.vectorize.enable"),
ConstantAsMetadata::get(ConstantInt::get(
llvm::Type::getInt1Ty(Ctx), 0))}));
LoopProperties = NewLoopProperties;
}

I just looked for it and found an issue that might be caused by this.
#75839

When pragma of loop transformations is specified, follow-up metadata for
loops is generated after each transformation. On the LLVM side,
follow-up metadata is expected to be a list of properties, such as the
following:

```
!followup = !{!"llvm.loop.vectorize.followup_all", !mp, !isvectorized}
!mp = !{!"llvm.loop.mustprogress"}
!isvectorized = !{"llvm.loop.isvectorized"}
```

However, on the clang side, the generated metadata contains an MDNode
that has those properties, as shown below:

```
!followup = !{!"llvm.loop.vectorize.followup_all", !loop_id}
!loop_id = distinct !{!loop_id, !mp, !isvectorized}
!mp = !{!"llvm.loop.mustprogress"}
!isvectorized = !{"llvm.loop.isvectorized"}
```

According to the LangRef, the LLVM side is correct. (ref:
https://llvm.org/docs/TransformMetadata.html#transformation-metadata-structure).
Due to this inconsistency, follow-up metadata was not interpreted
correctly, e.g., only one transformation is applied when multiple
pragmas are used.

This patch fixes clang side to emit followup metadata in correct format.
@kasuga-fj kasuga-fj force-pushed the fix-make-followup-loopid branch from ce8febb to 63dec28 Compare March 27, 2025 05:50
@kasuga-fj kasuga-fj merged commit 6c56a84 into llvm:main Mar 27, 2025
11 checks passed
kasuga-fj added a commit to kasuga-fj/llvm-project that referenced this pull request Mar 31, 2025
pragma-loop.cpp contains tests for loop metadata generated via pragma
directives. These tests were not working as (perhaps) expected. This is
because the regex `.*` consumes multiple elements in metadata. For
example,

```
![[LOOP_9]] = distinct !{![[LOOP_9]], ![[WIDTH_8:.*]], ![[FIXED_VEC]], ...}
```

`[[WIDTH_8]]` would have been expected to match a node like
`[[WIDTH_8]] = !{!"llvm.loop.vectorize.width", i32 8}`. However, since
there is no check directive to verify the contents of `[[WIDTH_8]]`,
`[[WIDTH_8:.*]]` consumed more than one element. There were other
similar cases.

This patch fixes this problem by not using regex matcher in the metadata
contents. Instead, it uses string variables whose contents are validated
elsewhere.

Related:
llvm#131985 (comment)
kasuga-fj added a commit that referenced this pull request Apr 2, 2025
pragma-loop.cpp contains tests for loop metadata generated by pragma
directives. These tests didn't work as (perhaps) expected. This is
because the regex `.*` consumes multiple elements in the metadata. For
example, there was a check directive like this.

```
// CHECK: ![[LOOP_9]] = distinct !{![[LOOP_9]], ![[WIDTH_8:.*]], ![[FIXED_VEC]], ...}
```

In the above case, `[[WIDTH_8]]` would have been expected to match a
node like `[[WIDTH_8]] = !{!"llvm.loop.vectorize.width", i32 8}`.
However, since there is no check directive to verify the contents of
`[[WIDTH_8]]`, the regex `.*` consumed more than one element. There were
other similar cases.

This patch fixes the problem by not using regex matcher in the metadata
content (except for follow-up metadata). Instead, it uses string
variables whose contents are validated elsewhere.

Related:
#131985 (comment)
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 2, 2025
…#133707)

pragma-loop.cpp contains tests for loop metadata generated by pragma
directives. These tests didn't work as (perhaps) expected. This is
because the regex `.*` consumes multiple elements in the metadata. For
example, there was a check directive like this.

```
// CHECK: ![[LOOP_9]] = distinct !{![[LOOP_9]], ![[WIDTH_8:.*]], ![[FIXED_VEC]], ...}
```

In the above case, `[[WIDTH_8]]` would have been expected to match a
node like `[[WIDTH_8]] = !{!"llvm.loop.vectorize.width", i32 8}`.
However, since there is no check directive to verify the contents of
`[[WIDTH_8]]`, the regex `.*` consumed more than one element. There were
other similar cases.

This patch fixes the problem by not using regex matcher in the metadata
content (except for follow-up metadata). Instead, it uses string
variables whose contents are validated elsewhere.

Related:
llvm/llvm-project#131985 (comment)
@pawosm-arm
Copy link
Contributor

pawosm-arm commented May 14, 2025

We're facing a regression after this one. We can't build https://github.com/dslarm/Financial-Services-Workload-Samples/tree/main/MonteCarloEuropeanOptions with -g -fopenmp for AArch64 now:

Stack dump:
0.	Program arguments: /usr/bin/clang-21 -cc1 -triple aarch64-none-linux-gnu -emit-obj -dumpdir MonteCarloInsideBlockingDP- -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name MonteCarloInsideBlockingDP.cpp -mrelocation-model static -mframe-pointer=non-leaf -menable-no-infs -menable-no-nans -fapprox-func -funsafe-math-optimizations -fno-signed-zeros -mreassociate -freciprocal-math -ffp-contract=fast -fno-rounding-math -ffast-math -ffinite-math-only -complex-range=basic -mconstructor-aliases -funwind-tables=2 -enable-tlsdesc -target-cpu generic -target-feature +v8.1a -target-feature +crc -target-feature +fp-armv8 -target-feature +lse -target-feature +neon -target-feature +rdm -target-abi aapcs -tune-cpu neoverse-v1 -debug-info-kind=constructor -dwarf-version=5 -debugger-tuning=gdb -fdebug-compilation-dir=/opt/buildAgent/temp/buildTmp/suite_stage/arm-dps/llvm/dps-llvm/FinanceBuildTest_0d34fa41/MonteCarloEuropeanOptions -fcoverage-compilation-dir=/opt/buildAgent/temp/buildTmp/suite_stage/arm-dps/llvm/dps-llvm/FinanceBuildTest_0d34fa41/MonteCarloEuropeanOptions -resource-dir /usr/lib/clang/21 -I /opt/arm/armpl_24.10_flang-new/include -I/opt/arm/armpl_24.10_flang-new/include -internal-isystem /usr/bin/../lib/gcc/aarch64-linux-gnu/12/../../../../include/c++/12 -internal-isystem /usr/bin/../lib/gcc/aarch64-linux-gnu/12/../../../../include/aarch64-linux-gnu/c++/12 -internal-isystem /usr/bin/../lib/gcc/aarch64-linux-gnu/12/../../../../include/c++/12/backward -internal-isystem /usr/lib/clang/21/include -internal-isystem /usr/local/include -internal-isystem /usr/bin/../lib/gcc/aarch64-linux-gnu/12/../../../../aarch64-linux-gnu/include -internal-externc-isystem /usr/include/aarch64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -Ofast --std=c++14 -fdeprecated-macro -ferror-limit 19 -fopenmp -fno-signed-char -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fcxx-exceptions -fexceptions -vectorize-loops -vectorize-slp -target-feature +outline-atomics -target-feature -fmv -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o /opt/buildAgent/temp/buildTmp/MonteCarloInsideBlockingDP-0c7c07.o -x c++ MonteCarloInsideBlockingDP.cpp
1.	<eof> parser at end of file
2.	Code generation
 #0 0x0000aaaaad96ee6c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/bin/clang-21+0x2ecee6c)
 #1 0x0000aaaaad96ce38 llvm::sys::RunSignalHandlers() (/usr/bin/clang-21+0x2ecce38)
 #2 0x0000aaaaad96f6d0 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x000040000002d8f8 (linux-vdso.so.1+0x8f8)
 #4 0x0000aaaaae649ddc llvm::DIE::getUnitDie() const (/usr/bin/clang-21+0x3ba9ddc)
 #5 0x0000aaaaae652994 llvm::DwarfDebug::finalizeModuleInfo() (/usr/bin/clang-21+0x3bb2994)
 #6 0x0000aaaaae65315c llvm::DwarfDebug::endModule() (/usr/bin/clang-21+0x3bb315c)
 #7 0x0000aaaaae63aa78 llvm::AsmPrinter::doFinalization(llvm::Module&) (/usr/bin/clang-21+0x3b9aa78)
 #8 0x0000aaaaad54e250 llvm::FPPassManager::doFinalization(llvm::Module&) (/usr/bin/clang-21+0x2aae250)
 #9 0x0000aaaaad548930 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/usr/bin/clang-21+0x2aa8930)
#10 0x0000aaaaadf4cca4 clang::emitBackendOutput(clang::CompilerInstance&, clang::CodeGenOptions&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/usr/bin/clang-21+0x34acca4)
#11 0x0000aaaaadf5e28c clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/usr/bin/clang-21+0x34be28c)
#12 0x0000aaaaaf4aad58 clang::ParseAST(clang::Sema&, bool, bool) (/usr/bin/clang-21+0x4a0ad58)
#13 0x0000aaaaae3d46a0 clang::FrontendAction::Execute() (/usr/bin/clang-21+0x39346a0)
#14 0x0000aaaaae365610 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/usr/bin/clang-21+0x38c5610)
#15 0x0000aaaaae49f688 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/usr/bin/clang-21+0x39ff688)
#16 0x0000aaaaac98fb6c cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/usr/bin/clang-21+0x1eefb6c)
#17 0x0000aaaaac98d204 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#18 0x0000aaaaac98c2e4 clang_main(int, char**, llvm::ToolContext const&) (/usr/bin/clang-21+0x1eec2e4)
#19 0x0000aaaaac998544 main (/usr/bin/clang-21+0x1ef8544)
#20 0x00004000004573fc __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#21 0x00004000004574cc call_init ./csu/../csu/libc-start.c:128:20
#22 0x00004000004574cc __libc_start_main ./csu/../csu/libc-start.c:379:5
#23 0x0000aaaaac98afe4 _start (/usr/bin/clang-21+0x1eeafe4)
clang++: error: unable to execute command: Segmentation fault (core dumped)

@mrkajetanp
Copy link
Contributor

For more details on the above, the problem happens once the loop vectorizer transforms a loop with this metadata that's inside an openmp block. The dbg metadata becomes incorrectly attached and as a result verifyFunction returns a failure.

!dbg attachment points at wrong subprogram for function
!2094 = distinct !DISubprogram(name: "main.omp_outlined", scope: !2, file: !2, line: 207, type: !2095, scopeLine: 207, flags: DIFlagArtificial | DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !25, retainedNodes: !2102)
ptr @main.omp_outlined
  br i1 %53, label %middle.block, label %vector.body, !dbg !2327, !llvm.loop !2338
!2339 = !DILocation(line: 258, column: 5, scope: !2162)
!2162 = distinct !DILexicalBlock(scope: !2163, file: !2, line: 258, column: 5)
!2118 = distinct !DISubprogram(name: "main.omp_outlined_debug__", scope: !2, file: !2, line: 208, type: !2095, scopeLine: 208, flags: DIFlagArtificial | DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !25, retainedNodes: !2119)
clang-21: ~/llvm/llvm-project-release/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:11013: bool llvm::LoopVectorizePass::processLoop(Loop *): Assertion `!verifyFunction(*L->getHeader()->getParent(), &dbgs())' failed.

@kasuga-fj
Copy link
Contributor Author

Thanks for sharing. I haven't looked at the details yet, but I simplified it as follows.

#include <omp.h>

void g(float);

void f(int m) {
#pragma omp parallel
  {
    float v0 = 0.0;
#pragma omp simd reduction(+:v0)
#pragma unroll(4)
    for(int i=0; i < m; i++)
    {
      v0 += i;
    }

#pragma omp barrier
    g(v0);
  }
}

godbold: https://godbolt.org/z/P3cW4To99

@Meinersbur
Copy link
Member

#pragma omp simd reduction(+:v0)
#pragma unroll(4)

That's not supposed to work. I think in the past the unroll pragma was just dropped. If the debug info is the issue, there hasn't been a lot of care to get OpenMP emit correct debug info, see e.g. #110700.

For this reason OpenMP 5.1 added new pragma:

#pragma omp simd reduction(+:v0)
#pragma omp unroll partial(4)

@kasuga-fj
Copy link
Contributor Author

For this reason OpenMP 5.1 added new pragma:

#pragma omp simd reduction(+:v0)
#pragma omp unroll partial(4)

FWIW, I tried it and the following error occurred.

<source>:10:1: error: OpenMP constructs may not be nested inside a simd region except for ordered simd, simd, scan, or atomic directive
   10 | #pragma omp unroll partial(4)
      | ^

@pawosm-arm
Copy link
Contributor

pawosm-arm commented May 15, 2025

One thing that worries me is that before this commit, there was no ICE caused by compiling this snippet of code (regardless its validity or usefulness), and now there is. It would be great if it could print a normal error message (which would encourage my resistant colleagues to rewrite the code), rather than failing with a segfault which triggers the 'fix the compiler' demands.

@kasuga-fj
Copy link
Contributor Author

Printing an error message looks reasonable to me, but I think we first need to clarify what codes are "unsupported".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants