-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[CI] Disable flang from pre-commit tests when Flang files are not touched #93485
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Flang tests are unstable on Windows pre-commit CI: https://discourse.llvm.org/t/flang-tests-are-extremely-slow-on-windows/78591/40 Potential timeout issue: https://discourse.llvm.org/t/flang-tests-are-extremely-slow-on-windows/78591/42 Since there doesn't seem to be interest in the Flang community to fix this, we disable. This should only be re-enabled once the stability issues are proven to be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
CC @lnihlen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting changes since the original issue reported was fixed.
I think is should be removed from the description to avoid confusion. The current issue is OOM while building Flang on Windows, which is not the issue the Discourse thread was created for (worth a new thread?) |
The problem IMO is more the disruption caused to other projects by flang failures. |
I do not think Flang is in a state where it makes sense to precommit test Flang because of changes to Clang or LLVM for any platform. At least within Clang, I'm unaware of precommit CI ever testing a change to Clang which catches a legitimate failure in Flang, but there have been plenty of instances of Flang being in an invalid state and causing precommit CI to "fail" for Clang reviews as a result. That said, I think it makes a lot of sense for Flang to have precommit CI testing when Flang source changes. |
I agree with @joker-eph we should still build and test it when flang/ files are touched. |
Agree. What's the right incantation to make that work in our CI? |
It will be great if you can write a new post in discourse before going ahead. The original issue regarding the previous post was fixed, the last patch in the sequence was #93295. The other issue discussed in the post was OOM. Do you still see the problems after #93329? Windows is important for Flang and Flang is important for the Windows community as highlighted in https://discourse.llvm.org/t/the-you-in-eucatastrophe-the-story-of-building-scipy-with-flang/74768. |
This has nothing to do with the importance of Flang or Windows, only that we should not run tests when we don't touch those files. No need for an RFC here, this is just basic good CI practices. The fact that those tests are unstable for over a year, for different issues, on Linux and Windows is just the trigger to do this in the first place, not the justification to do so. |
@tstellar if you have a quick way to get this done, feel free to create a new PR and close this one. Thanks! |
You'd need to remove |
Ack. I see it's on both I propose we disable these two dependencies for now, and let them be tested in post-commit checks. I'm assuming that, when flang itself changes in a PR, it will be tested regardless of these deps. |
That's correct. |
I think this is a happy medium (Flang still gets precommit CI test coverage when Flang changes, but relies on post-commit CI test coverage when Clang, MLIR, or LLVM changes), so I'm in favor. |
Reworded the title and description of the PR to avoid confusion. Should be clear now. |
I have created a post in discourse. https://discourse.llvm.org/t/disable-flang-from-pre-commit-tests-when-flang-files-are-not-touched/79251 |
So is this disabling both linux and windows pre-commit testing? |
This is a bit of a weird take to me: what does "officially in production" mean? Did MLIR ever got "officially in production" in the context of the LLVM project? The only thing I'm aware of is the notion of "experimental", which applies to backends (Flang was kind of in-between at the beginning because it was C++17 before LLVM was, but it's all aligned now). |
Yes but only on PR that don't touch flang directly. That is a PR that would only touch other projects (llvm/mlir/clang) used to to trigger testing of Flang since they can potentially break it. |
It would be nice to keep flang testing on linux at least when MLIR is changed since it relies a lot on it. |
Our main support policy is: https://llvm.org/docs/SupportPolicy.html I have reported at least three serious issues with Flang testing since last year, all of which took over a month to get fixed. This one is only the last example. But, this PR is NOT about that at all. This is a tangent that we should not pursue. The core issue here is PRE-COMMIT versus POST-COMMIT testing. I think we need to be serious about pre-commit CI. This issue was causing grief for a month to a lot of people who do not care about Flang or cannot fix the underlying issue. It will not make it less tested if it has post-commit testing, it will just have a slightly longer cycle to fix rarer problems. |
Isn't is tested in post-commit bots? |
Thanks, I don't see any mention here that makes me think that Flang isn't the core tier, do you read anything differently here?
By "this issue", do you mean the "OOM on Windows" issue? |
Yes, I just don't see the problem to have it in linux pre-commit since it was reliable so far until now. If it's a problem of build time or load then it's a different issue. |
If there's a way to run on Linux but not Windows, then I don't mind. I just want to make sure we only add to pre-commit what we know is stable, but letting it run post-commit for a few months without crashes. |
This issue did take some time to fix because it was not reproducible on any of the Windows machines we tried. One of the past issues that was brought up (#77086) was a bot issue as pointed out by @joker-eph in #77086 (comment). I don't remember what the third issue was. |
I can't remember either, tbh. This was last year, and IIRC on Linux. Right now I don't mind if we just wait until it crashes again, but we need to be more responsive. If we get instability and it's hard to reproduce, then the tests must be taken down from pre-commit CI until it can be reproduced, fixed and shown stable. This is the kind of cost that is non-trivial and it makes no sense for everyone to have to pay when only a few people can actually do anything about it. |
OK! Then what about: #93729 ? |
Yeah that makes sense to enable pre-commit that are stable only. If we can keep linux that would be great. Even with pre-commit enabled it happened a couple of times that some patches were committed even with obvious failure in the pre commit CI. |
This is what I'm trying to avoid. We can't really move to better CI if we keep asking people to "just ignore the failures if you don't understand". |
In these cases it was understandable failure that were breaking flang. |
Fixed in #93729 |
Flang pre-commit tests are triggered when any file in LLVM or MLIR are touched. However, Flang is not yet achieved a stable CI, which have caused instability to many other completely unrelated PRs. This PR disables Flang tests on non-Flang changes on pre-commit, but does nothing to post-commit tests, which continue to run as is.
Once Flang becomes stable like Clang and MLIR we can re-enable it on pre-commit tests, following our current staged policy on CI, testing and reporting failures.