[CI] Disable flang from pre-commit tests when Flang files are not touched #93485

rengolin · 2024-05-27T16:57:35Z

Flang pre-commit tests are triggered when any file in LLVM or MLIR are touched. However, Flang is not yet achieved a stable CI, which have caused instability to many other completely unrelated PRs. This PR disables Flang tests on non-Flang changes on pre-commit, but does nothing to post-commit tests, which continue to run as is.

Once Flang becomes stable like Clang and MLIR we can re-enable it on pre-commit tests, following our current staged policy on CI, testing and reporting failures.

Flang tests are unstable on Windows pre-commit CI: https://discourse.llvm.org/t/flang-tests-are-extremely-slow-on-windows/78591/40 Potential timeout issue: https://discourse.llvm.org/t/flang-tests-are-extremely-slow-on-windows/78591/42 Since there doesn't seem to be interest in the Flang community to fix this, we disable. This should only be re-enabled once the stability issues are proven to be fixed.

Endilll

LGTM.
CC @lnihlen

kiranchandramohan

Requesting changes since the original issue reported was fixed.

Endilll · 2024-05-27T17:19:19Z

Flang tests are unstable on Windows pre-commit CI: https://discourse.llvm.org/t/flang-tests-are-extremely-slow-on-windows/78591/40

I think is should be removed from the description to avoid confusion. The current issue is OOM while building Flang on Windows, which is not the issue the Discourse thread was created for (worth a new thread?)

joker-eph · 2024-05-27T19:01:36Z

The problem IMO is more the disruption caused to other projects by flang failures.
So I wouldn’t just disable flang tests entirely, instead I would exclude flang from being built/tested on windows when no files from the flang directory was touched.
(I already argued this point to the clang folks who IMO incorrectly handled it)

AaronBallman · 2024-05-28T12:56:49Z

I do not think Flang is in a state where it makes sense to precommit test Flang because of changes to Clang or LLVM for any platform. At least within Clang, I'm unaware of precommit CI ever testing a change to Clang which catches a legitimate failure in Flang, but there have been plenty of instances of Flang being in an invalid state and causing precommit CI to "fail" for Clang reviews as a result.

That said, I think it makes a lot of sense for Flang to have precommit CI testing when Flang source changes.

tstellar · 2024-05-28T17:07:11Z

I agree with @joker-eph we should still build and test it when flang/ files are touched.

rengolin · 2024-05-29T11:45:58Z

I agree with @joker-eph we should still build and test it when flang/ files are touched.

Agree. What's the right incantation to make that work in our CI?

kiranchandramohan · 2024-05-29T12:30:27Z

I agree with @joker-eph we should still build and test it when flang/ files are touched.

Agree. What's the right incantation to make that work in our CI?

It will be great if you can write a new post in discourse before going ahead. The original issue regarding the previous post was fixed, the last patch in the sequence was #93295. The other issue discussed in the post was OOM. Do you still see the problems after #93329?

Windows is important for Flang and Flang is important for the Windows community as highlighted in https://discourse.llvm.org/t/the-you-in-eucatastrophe-the-story-of-building-scipy-with-flang/74768.

rengolin · 2024-05-29T15:25:46Z

It will be great if you can write a new post in discourse before going ahead. The original issue regarding the previous post was fixed, the last patch in the sequence was #93295. The other issue discussed in the post was OOM. Do you still see the problems after #93329?

This has nothing to do with the importance of Flang or Windows, only that we should not run tests when we don't touch those files. No need for an RFC here, this is just basic good CI practices.

The fact that those tests are unstable for over a year, for different issues, on Linux and Windows is just the trigger to do this in the first place, not the justification to do so.

rengolin · 2024-05-29T15:27:12Z

@tstellar if you have a quick way to get this done, feel free to create a new PR and close this one. Thanks!

Endilll · 2024-05-29T15:28:49Z

I agree with @joker-eph we should still build and test it when flang/ files are touched.

Agree. What's the right incantation to make that work in our CI?

You'd need to remove flang from compute-projects-to-test.

rengolin · 2024-05-29T15:38:48Z

I agree with @joker-eph we should still build and test it when flang/ files are touched.

Agree. What's the right incantation to make that work in our CI?

You'd need to remove flang from compute-projects-to-test.

Ack. I see it's on both llvm and mlir. Now, of course, flang depends on both, but it's also no officially in production, so random crashes are still expected until it's as stable and complete as clang.

I propose we disable these two dependencies for now, and let them be tested in post-commit checks. I'm assuming that, when flang itself changes in a PR, it will be tested regardless of these deps.

Endilll · 2024-05-29T15:39:30Z

I'm assuming that, when flang itself changes in a PR, it will be tested regardless of these deps.

That's correct.

AaronBallman · 2024-05-29T15:46:12Z

I propose we disable these two dependencies for now, and let them be tested in post-commit checks. I'm assuming that, when flang itself changes in a PR, it will be tested regardless of these deps.

I think this is a happy medium (Flang still gets precommit CI test coverage when Flang changes, but relies on post-commit CI test coverage when Clang, MLIR, or LLVM changes), so I'm in favor.

rengolin · 2024-05-29T16:29:31Z

I think is should be removed from the description to avoid confusion. The current issue is OOM while building Flang on Windows, which is not the issue the Discourse thread was created for (worth a new thread?)

Reworded the title and description of the PR to avoid confusion. Should be clear now.

@joker-eph @tstellar

kiranchandramohan · 2024-05-29T17:00:30Z

I have created a post in discourse. https://discourse.llvm.org/t/disable-flang-from-pre-commit-tests-when-flang-files-are-not-touched/79251

clementval · 2024-05-29T18:05:16Z

So is this disabling both linux and windows pre-commit testing?

joker-eph · 2024-05-29T18:07:42Z

Now, of course, flang depends on both, but it's also no officially in production, so random crashes are still expected until it's as stable and complete as clang.

This is a bit of a weird take to me: what does "officially in production" mean? Did MLIR ever got "officially in production" in the context of the LLVM project? The only thing I'm aware of is the notion of "experimental", which applies to backends (Flang was kind of in-between at the beginning because it was C++17 before LLVM was, but it's all aligned now).
There are degrees of maturity of course, but I haven't seen issues with the Flang-on-Linux CI so far (some frontend files are annoyingly heavy to compile!), or only very occasionally (flaky tests can happen, but not more than in LLVM itself or any other project)

joker-eph · 2024-05-29T18:08:50Z

So is this disabling both linux and windows pre-commit testing?

Yes but only on PR that don't touch flang directly. That is a PR that would only touch other projects (llvm/mlir/clang) used to to trigger testing of Flang since they can potentially break it.

clementval · 2024-05-29T18:10:25Z

So is this disabling both linux and windows pre-commit testing?

Yes but only on PR that don't touch flang directly. That is a PR that would only touch other projects (llvm/mlir/clang) used to to trigger testing of Flang since they can potentially break it.

It would be nice to keep flang testing on linux at least when MLIR is changed since it relies a lot on it.

rengolin · 2024-05-29T18:48:38Z

This is a bit of a weird take to me: what does "officially in production" mean? Did MLIR ever got "officially in production" in the context of the LLVM project? The only thing I'm aware of is the notion of "experimental", which applies to backends (Flang was kind of in-between at the beginning because it was C++17 before LLVM was, but it's all aligned now). There are degrees of maturity of course, but I haven't seen issues with the Flang-on-Linux CI so far (some frontend files are annoyingly heavy to compile!), or only very occasionally (flaky tests can happen, but not more than in LLVM itself or any other project)

Our main support policy is: https://llvm.org/docs/SupportPolicy.html I have reported at least three serious issues with Flang testing since last year, all of which took over a month to get fixed. This one is only the last example.

But, this PR is NOT about that at all. This is a tangent that we should not pursue. The core issue here is PRE-COMMIT versus POST-COMMIT testing.

I think we need to be serious about pre-commit CI. This issue was causing grief for a month to a lot of people who do not care about Flang or cannot fix the underlying issue. It will not make it less tested if it has post-commit testing, it will just have a slightly longer cycle to fix rarer problems.

rengolin · 2024-05-29T18:49:36Z

Yes but only on PR that don't touch flang directly. That is a PR that would only touch other projects (llvm/mlir/clang) used to to trigger testing of Flang since they can potentially break it.

It would be nice to keep flang testing on linux at least when MLIR is changed since it relies a lot on it.

Isn't is tested in post-commit bots?

joker-eph · 2024-05-29T18:57:49Z

Our main support policy is: https://llvm.org/docs/SupportPolicy.html I

Thanks, I don't see any mention here that makes me think that Flang isn't the core tier, do you read anything differently here?

This issue was causing grief for a month to a lot of people who do not care about Flang or cannot fix the underlying issue

By "this issue", do you mean the "OOM on Windows" issue?
If so I agree with you, and I agree that disabling flang as a dependency to fix this error is the right course of action.
But I'm still puzzled as of why we'd not just do this for Windows where the issue exists and leave the Linux flow as-is, mind elaborating on this?
(if there are flakiness issues on Linux that I'm not aware, please say so, of course I'd support disabling this on Linux if this was the case)

clementval · 2024-05-29T19:01:16Z

Yes but only on PR that don't touch flang directly. That is a PR that would only touch other projects (llvm/mlir/clang) used to to trigger testing of Flang since they can potentially break it.

It would be nice to keep flang testing on linux at least when MLIR is changed since it relies a lot on it.

Isn't is tested in post-commit bots?

Yes, I just don't see the problem to have it in linux pre-commit since it was reliable so far until now. If it's a problem of build time or load then it's a different issue.

rengolin · 2024-05-29T19:25:12Z

Isn't is tested in post-commit bots?

Yes, I just don't see the problem to have it in linux pre-commit since it was reliable so far until now. If it's a problem of build time or load then it's a different issue.

If there's a way to run on Linux but not Windows, then I don't mind. I just want to make sure we only add to pre-commit what we know is stable, but letting it run post-commit for a few months without crashes.

kiranchandramohan · 2024-05-29T19:41:41Z

This is a bit of a weird take to me: what does "officially in production" mean? Did MLIR ever got "officially in production" in the context of the LLVM project? The only thing I'm aware of is the notion of "experimental", which applies to backends (Flang was kind of in-between at the beginning because it was C++17 before LLVM was, but it's all aligned now). There are degrees of maturity of course, but I haven't seen issues with the Flang-on-Linux CI so far (some frontend files are annoyingly heavy to compile!), or only very occasionally (flaky tests can happen, but not more than in LLVM itself or any other project)

Our main support policy is: https://llvm.org/docs/SupportPolicy.html I have reported at least three serious issues with Flang testing since last year, all of which took over a month to get fixed. This one is only the last example.

This issue did take some time to fix because it was not reproducible on any of the Windows machines we tried. One of the past issues that was brought up (#77086) was a bot issue as pointed out by @joker-eph in #77086 (comment). I don't remember what the third issue was.

rengolin · 2024-05-29T19:51:18Z

This issue did take some time to fix because it was not reproducible on any of the Windows machines we tried. One of the past issues that was brought up (#77086) was a bot issue as pointed out by @joker-eph in #77086 (comment). I don't remember what the third issue was.

I can't remember either, tbh. This was last year, and IIRC on Linux.

Right now I don't mind if we just wait until it crashes again, but we need to be more responsive. If we get instability and it's hard to reproduce, then the tests must be taken down from pre-commit CI until it can be reproduced, fixed and shown stable.

This is the kind of cost that is non-trivial and it makes no sense for everyone to have to pay when only a few people can actually do anything about it.

joker-eph · 2024-05-29T20:05:42Z

Isn't is tested in post-commit bots?

Yes, I just don't see the problem to have it in linux pre-commit since it was reliable so far until now. If it's a problem of build time or load then it's a different issue.

If there's a way to run on Linux but not Windows, then I don't mind.

OK! Then what about: #93729 ?

clementval · 2024-05-29T20:11:03Z

Isn't is tested in post-commit bots?

Yes, I just don't see the problem to have it in linux pre-commit since it was reliable so far until now. If it's a problem of build time or load then it's a different issue.

If there's a way to run on Linux but not Windows, then I don't mind. I just want to make sure we only add to pre-commit what we know is stable, but letting it run post-commit for a few months without crashes.

Yeah that makes sense to enable pre-commit that are stable only. If we can keep linux that would be great. Even with pre-commit enabled it happened a couple of times that some patches were committed even with obvious failure in the pre commit CI.

rengolin · 2024-05-29T22:17:57Z

Even with pre-commit enabled it happened a couple of times that some patches were committed even with obvious failure in the pre commit CI.

This is what I'm trying to avoid. We can't really move to better CI if we keep asking people to "just ignore the failures if you don't understand".

clementval · 2024-05-29T22:30:25Z

Even with pre-commit enabled it happened a couple of times that some patches were committed even with obvious failure in the pre commit CI.

This is what I'm trying to avoid. We can't really move to better CI if we keep asking people to "just ignore the failures if you don't understand".

In these cases it was understandable failure that were breaking flang.

rengolin · 2024-05-30T10:13:21Z

Fixed in #93729

rengolin requested review from ldionne, Meinersbur, AaronBallman, Endilll, joker-eph and kparzysz May 27, 2024 16:57

ldionne approved these changes May 27, 2024

View reviewed changes

Endilll approved these changes May 27, 2024

View reviewed changes

kiranchandramohan requested changes May 27, 2024

View reviewed changes

AaronBallman requested review from tstellar and lnihlen May 28, 2024 12:51

Revert slow flang fix, remove depos for pre-ci for now

b468470

rengolin changed the title ~~[CI] Disable flang from Windows pre-commit tests~~ [CI] Disable flang from pre-commit tests when Flang files are not touched May 29, 2024

rengolin mentioned this pull request May 29, 2024

[CI] Disable Flang from pre-commit tests when Flang files are not touched on Windows Only #93729

Merged

rengolin closed this May 30, 2024

[CI] Disable flang from pre-commit tests when Flang files are not touched #93485

[CI] Disable flang from pre-commit tests when Flang files are not touched #93485

Uh oh!

Conversation

rengolin commented May 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Endilll left a comment

Choose a reason for hiding this comment

Uh oh!

kiranchandramohan left a comment

Choose a reason for hiding this comment

Uh oh!

Endilll commented May 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

joker-eph commented May 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AaronBallman commented May 28, 2024

Uh oh!

tstellar commented May 28, 2024

Uh oh!

rengolin commented May 29, 2024

Uh oh!

kiranchandramohan commented May 29, 2024

Uh oh!

rengolin commented May 29, 2024

Uh oh!

rengolin commented May 29, 2024

Uh oh!

Endilll commented May 29, 2024

Uh oh!

rengolin commented May 29, 2024

Uh oh!

Endilll commented May 29, 2024

Uh oh!

AaronBallman commented May 29, 2024

Uh oh!

rengolin commented May 29, 2024

Uh oh!

kiranchandramohan commented May 29, 2024

Uh oh!

clementval commented May 29, 2024

Uh oh!

joker-eph commented May 29, 2024

Uh oh!

joker-eph commented May 29, 2024

Uh oh!

clementval commented May 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rengolin commented May 29, 2024

Uh oh!

rengolin commented May 29, 2024

Uh oh!

joker-eph commented May 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

clementval commented May 29, 2024

Uh oh!

rengolin commented May 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kiranchandramohan commented May 29, 2024

Uh oh!

rengolin commented May 29, 2024

Uh oh!

joker-eph commented May 29, 2024

Uh oh!

clementval commented May 29, 2024

Uh oh!

rengolin commented May 29, 2024

Uh oh!

clementval commented May 29, 2024

Uh oh!

rengolin commented May 30, 2024

Uh oh!

Uh oh!

rengolin commented May 27, 2024 •

edited

Loading

Endilll commented May 27, 2024 •

edited

Loading

joker-eph commented May 27, 2024 •

edited

Loading

clementval commented May 29, 2024 •

edited

Loading

joker-eph commented May 29, 2024 •

edited

Loading

rengolin commented May 29, 2024 •

edited

Loading