Skip to content

Modify the localCache API to require an explicit commit on CachedFile… #136121

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Apr 22, 2025

Conversation

anjenner
Copy link
Contributor

…Stream.

CachedFileStream has previously performed the commit step in its destructor, but this means its only recourse for error handling is report_fatal_error. Modify this to add an explicit commit() method, and call this in the appropriate places with appropriate error handling for the location.

Currently the destructor of CacheStream gives an assert failure in Debug builds if commit() was not called. This will help track down any remaining uses of the API that assume the old destructior behaviour. In Release builds we fall back to the previous behaviour and call report_fatal_error if the commit fails.

This is version 2 of this PR, superseding reverted PR #115331 . I have incorporated a change to the testcase to make it more reliable on Windows, as well as two follow-up changes (df79000 and b0baa1d) that were also reverted when 115331 was reverted.

anjenner and others added 14 commits November 21, 2024 10:52
…Stream.

CachedFileStream has previously performed the commit step in its destructor,
but this means its only recourse for error handling is report_fatal_error.
Modify this to add an explicit commit() method, and call this in the
appropriate places with appropriate error handling for the location.

Currently the destructor of CacheStream gives an assert failure in Debug builds
if commit() was not called. This will help track down any remaining uses of the
API that assume the old destructior behaviour. In Release builds we fall back
to the previous behaviour and call report_fatal_error if the commit fails.
@anjenner anjenner requested a review from JDevlieghere as a code owner April 17, 2025 10:30
@anjenner anjenner merged commit c3f815b into llvm:main Apr 22, 2025
12 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Apr 22, 2025

LLVM Buildbot has detected a new failure on builder openmp-s390x-linux running on systemz-1 while building lldb,llvm at step 6 "test-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/88/builds/10746

Here is the relevant piece of the build log for the reference
Step 6 (test-openmp) failure: test (failure)
******************** TEST 'libomp :: tasking/issue-94260-2.c' FAILED ********************
Exit Code: -11

Command Output (stdout):
--
# RUN: at line 1
/home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/./bin/clang -fopenmp   -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test -L /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -fno-omit-frame-pointer -mbackchain -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/ompt /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/tasking/issue-94260-2.c -o /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp -lm -latomic && /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp
# executed command: /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/./bin/clang -fopenmp -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test -L /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -fno-omit-frame-pointer -mbackchain -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/ompt /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/tasking/issue-94260-2.c -o /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp -lm -latomic
# executed command: /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp
# note: command had no output on stdout or stderr
# error: command failed with exit status: -11

--

********************


searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Apr 23, 2025
…chedFile… (llvm#136121)"

breaks oclTstO0 and others

This reverts commit c3f815b.
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Apr 23, 2025
…it on CachedFile… (llvm#136121)""

This reverts commit cf261c6 and reinstates
commit c3f815b with changes to the comgr
caching code to use the new commit API.
@dyung
Copy link
Collaborator

dyung commented Apr 24, 2025

Maybe I'm missing something, but was either this change or the original one approved for submit by someone? I don't see any review/approval in this commit, and the original PR doesn't seem to have been approved.

@anjenner
Copy link
Contributor Author

The original PR (#115331) was approved and merged, but later reverted due an unstable test. The only difference in this version of the PR was a fix to make the test stable on Windows, so I thought it would be ok to merge by the approval of the original PR. Apologies if I was mistaken on this.

@dyung
Copy link
Collaborator

dyung commented Apr 24, 2025

The original PR (#115331) was approved and merged, but later reverted due an unstable test. The only difference in this version of the PR was a fix to make the test stable on Windows, so I thought it would be ok to merge by the approval of the original PR. Apologies if I was mistaken on this.

Who approved the original PR? Was the approval done offline? I do not see an approval in the original PR, just some feedback.

@anjenner
Copy link
Contributor Author

Hmm... perhaps I have misunderstood the LLVM contribution process here. I had assumed that once all the review feedback had been addressed, some time had passed to allow reviewers to comment on the modifications resulting from review feedback, and the "merge" button was available, then that was all that was needed. This was my takeaway from reading the documentation at https://www.llvm.org/docs/Contributing.html . But it sounds like you're saying that a separate approval step is necessary? How is that approval obtained? Where is that process documented?

@dyung
Copy link
Collaborator

dyung commented Apr 25, 2025

Hmm... perhaps I have misunderstood the LLVM contribution process here. I had assumed that once all the review feedback had been addressed, some time had passed to allow reviewers to comment on the modifications resulting from review feedback, and the "merge" button was available, then that was all that was needed. This was my takeaway from reading the documentation at https://www.llvm.org/docs/Contributing.html . But it sounds like you're saying that a separate approval step is necessary? How is that approval obtained? Where is that process documented?

My understanding of the process is you put up a pull request, ask for review/feedback, and then keep iterating until someone approves your PR at which point you can then submit the request. If after some time you haven't heard back, it is customary to "ping" your review to try and get it back on people's radars as many reviewers get a ton of emails and it may get lost in the noise.

The relevant passage from the document you cited I believe is the following:

If you have received no comments on your patch for a week, you can request a review by ‘ping’ing the GitHub PR with “Ping” in a comment. The common courtesy ‘ping’ rate is once a week. Please remember that you are asking for valuable time from other professional developers.

After your PR is approved, you can merge it.

The main reason I asked is that this patch came to my attention twice, once when it originally caused a lot of bot failures (I was the person who reverted it originally), and also this second time because it is causing problems with our downstream code (that is our problem to fix).

@anjenner
Copy link
Contributor Author

Ah, apologies for misunderstanding the process. I will do it the right way for this time. As for this patch, I'm guessing it would be better to leave it merged (unless it causes any other problems) to avoid excess churn - please let me know if you disagree. And apologies for the earlier bot failures and the problems that this is causing with your downstream code as well.

@teresajohnson
Copy link
Contributor

@dyung thanks for catching this, I haven't been able to watch closely and didn't realize no one approved.

@anjenner No worries, misunderstandings happen. I do think we need to revert this, however. See PR138203 - it sounds like this is causing issues with lifetimes and I'm concerned that there may be other lurking issues like that one. Can you revert this?

@anjenner
Copy link
Contributor Author

anjenner commented May 3, 2025

I'm away for the long weekend but can revert on Wednesday if nobody does so before then. However, I'm not sure how we can track down such lurking issues other than by leaving this unreverted and fixing issues as they pop up, since neither code review nor testcases caught #138194 . Do you have any recommendations other than just not making this API change and leaving the localCache API without proper error handling?

@teresajohnson
Copy link
Contributor

I'm away for the long weekend but can revert on Wednesday if nobody does so before then. However, I'm not sure how we can track down such lurking issues other than by leaving this unreverted and fixing issues as they pop up, since neither code review nor testcases caught #138194 . Do you have any recommendations other than just not making this API change and leaving the localCache API without proper error handling?

I've added some questions on that PR, after looking closely at the 2 changes I'm not really sure how this change produced or is related to the crash being fixed there.

IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
llvm#136121)

…Stream.

CachedFileStream has previously performed the commit step in its
destructor, but this means its only recourse for error handling is
report_fatal_error. Modify this to add an explicit commit() method, and
call this in the appropriate places with appropriate error handling for
the location.

Currently the destructor of CacheStream gives an assert failure in Debug
builds if commit() was not called. This will help track down any
remaining uses of the API that assume the old destructior behaviour. In
Release builds we fall back to the previous behaviour and call
report_fatal_error if the commit fails.

This is version 2 of this PR, superseding reverted PR
llvm#115331 . I have incorporated a
change to the testcase to make it more reliable on Windows, as well as
two follow-up changes
(llvm@df79000
and
llvm@b0baa1d)
that were also reverted when 115331 was reverted.

---------

Co-authored-by: Augie Fackler <[email protected]>
Co-authored-by: Vitaly Buka <[email protected]>
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
llvm#136121)

…Stream.

CachedFileStream has previously performed the commit step in its
destructor, but this means its only recourse for error handling is
report_fatal_error. Modify this to add an explicit commit() method, and
call this in the appropriate places with appropriate error handling for
the location.

Currently the destructor of CacheStream gives an assert failure in Debug
builds if commit() was not called. This will help track down any
remaining uses of the API that assume the old destructior behaviour. In
Release builds we fall back to the previous behaviour and call
report_fatal_error if the commit fails.

This is version 2 of this PR, superseding reverted PR
llvm#115331 . I have incorporated a
change to the testcase to make it more reliable on Windows, as well as
two follow-up changes
(llvm@df79000
and
llvm@b0baa1d)
that were also reverted when 115331 was reverted.

---------

Co-authored-by: Augie Fackler <[email protected]>
Co-authored-by: Vitaly Buka <[email protected]>
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
llvm#136121)

…Stream.

CachedFileStream has previously performed the commit step in its
destructor, but this means its only recourse for error handling is
report_fatal_error. Modify this to add an explicit commit() method, and
call this in the appropriate places with appropriate error handling for
the location.

Currently the destructor of CacheStream gives an assert failure in Debug
builds if commit() was not called. This will help track down any
remaining uses of the API that assume the old destructior behaviour. In
Release builds we fall back to the previous behaviour and call
report_fatal_error if the commit fails.

This is version 2 of this PR, superseding reverted PR
llvm#115331 . I have incorporated a
change to the testcase to make it more reliable on Windows, as well as
two follow-up changes
(llvm@df79000
and
llvm@b0baa1d)
that were also reverted when 115331 was reverted.

---------

Co-authored-by: Augie Fackler <[email protected]>
Co-authored-by: Vitaly Buka <[email protected]>
quic-akaryaki added a commit that referenced this pull request May 14, 2025
The `CacheStream::commit()` function (defined in Caching.cpp) deletes
the underlying raw stream. Some output streamers may hold a pointer
to it, which then will outlive the stream object.
In particular, MCAsmStreamer keeps the pointer to the raw stream
though a separate `formatted_raw_stream` object, which buffers data and
there is no path to explicitly flush this data. Before this change,
the buffered data was flushed during the MCAsmStreamer destructor.
After #136121, this happened after the `commit()` function is called.
Therefore, it caused a crash because the `formatted_raw_stream` object
tries to write the buffered data into a deleted raw stream. Even if
we don't delete the stream to avoid the crash, it would be too late
as the output stream cannot accept data after commit().

Fixes: #138194.
@quic-akaryaki
Copy link
Contributor

@anjenner : an alternative way to report error would be to pass a "diagnostic object" to CachedFileStream that would collect errors happening during calling its destructors, report them later, and fail the compilation if any. That might have been a less intrusive way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants