Skip to content

[MachineVerifier] Report errors from one thread at a time #111605

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 11, 2024

Conversation

ellishg
Copy link
Contributor

@ellishg ellishg commented Oct 8, 2024

Create the ReportedErrors class to track the number of reported errors during verification. The class will block reporting errors if some other thread is currently reporting an error.

I've encountered a case where there were many different verifications reporting errors at the same time on different threads. This ensures that we don't start printing the error from one case until we are completely done printing errors from other cases. Most of the time AbortOnError = true so we usually abort after reporting the first error.

Depends on #111602.

@ellishg ellishg marked this pull request as draft October 8, 2024 23:42
@ellishg ellishg marked this pull request as ready for review October 9, 2024 00:01
# RUN: not --crash llc -mtriple=aarch64 -o /dev/null -run-pass=none -verify-machineinstrs %s 2>&1 | FileCheck %s --implicit-check-not="Bad machine code"
# REQUIRES: aarch64-registered-target

# Check that errors are reported from only one function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior is really annoying if it's fixable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even before this patch, AbortOnError defaults to true which means we will abort after reporting the first error.

if (AbortOnErrors && FoundErrors)
report_fatal_error("Found "+Twine(FoundErrors)+" machine code errors.");

The problem I encountered is that if the report is super long, it's possible that the verifier would find another error and start print it before the program aborts. The output would be unreadable since two separate errors were printing at the same time.

This patch aims to fix that scenario by ensuring a single error gets reported at a time. Since we abort after the first error, this test checks that we only print errors from one function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the wording to be a bit more clear.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably fine, but I don't think we really need the verifier to be thread safe. Step 1 of debugging any issue would be to get a standalone reproducer which fails

We also should probably stop dumping the entire function on failure.

@ellishg
Copy link
Contributor Author

ellishg commented Oct 11, 2024

Probably fine, but I don't think we really need the verifier to be thread safe. Step 1 of debugging any issue would be to get a standalone reproducer which fails

I did find a reproducer, but because it included multiple failing modules, the error was scrambled, so I couldn't minimize the reproducer easily. I couldn't even tell which modules were failing until I introduced a lock in the reporting code.

We also should probably stop dumping the entire function on failure.

Makes sense to me, but that should probably be a separate PR.

@ellishg ellishg merged commit adaa603 into llvm:main Oct 11, 2024
6 of 8 checks passed
@ellishg ellishg deleted the verifier-threads branch October 11, 2024 18:53
DanielCChen pushed a commit to DanielCChen/llvm-project that referenced this pull request Oct 16, 2024
Create the `ReportedErrors` class to track the number of reported errors
during verification. The class will block reporting errors if some other
thread is currently reporting an error.

I've encountered a case where there were many different verifications
reporting errors at the same time on different threads. This ensures
that we don't start printing the error from one case until we are
completely done printing errors from other cases. Most of the time
`AbortOnError = true` so we usually abort after reporting the first
error.

Depends on llvm#111602.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants