-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[analyzer][Z3] Restore the original timeout of 15s #118291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Discussion here: https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520/15?u=szelethus The original patch, llvm#97298 introduced new timeouts backed by thorough testing and measurements to keep the running time of Z3 within reasonable limits. The measurements also showed that only certain reports and certain TUs were responsible for the poor performance of Z3 refutation. Unfortunately, it seems like that on machines with different characteristics (slower machines) the current timeouts don't just axe 0.01% of reports, but many more as well. Considering that timeouts are inherently nondeterministic as a cutoff point, this lead reports sets being vastly different on the same projects with the same configuration. The discussion link shows that all configurations introduced in the patch with their default values lead to severa nondeterminism of the analyzer. As we, and others use the analyzer as a gating tool for PRs, we should revert to the original defaults. We should respect that * There are still parts of the analyzer that are either proven or suspected to contain nondeterministic code (like pointer sets), * A 15s timeout is more likely to hit the same reports every time on a wider range of machines, but is still inherently nondeterministic, but an infinite timeout leads to the tool hanging, * If you measure the performance of the analyzer on your machines, you can and should achieve some speedup with little or no observable nondeterminism.
Your org has enabled the Graphite merge queue for merging into mainAdd the label “FP Bundles” to the PR and Graphite will automatically add it to the merge queue when it’s ready to merge. You must have a Graphite account and log in to Graphite in order to use the merge queue. Sign up using this link. |
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-static-analyzer-1 Author: Kristóf Umann (Szelethus) ChangesDiscussion here: The original patch, #97298 introduced new timeouts backed by thorough testing and measurements to keep the running time of Z3 within reasonable limits. The measurements also showed that only certain reports and certain TUs were responsible for the poor performance of Z3 refutation. Unfortunately, it seems like that on machines with different characteristics (slower machines) the current timeouts don't just axe 0.01% of reports, but many more as well. Considering that timeouts are inherently nondeterministic as a cutoff point, this lead reports sets being vastly different on the same projects with the same configuration. The discussion link shows that all configurations introduced in the patch with their default values lead to severa nondeterminism of the analyzer. As we, and others use the analyzer as a gating tool for PRs, we should revert to the original defaults. We should respect that
Full diff: https://github.com/llvm/llvm-project/pull/118291.diff 3 Files Affected:
diff --git a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
index 737bc8e86cfb6a..64fb11821a2656 100644
--- a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
+++ b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
@@ -189,20 +189,23 @@ ANALYZER_OPTION(
"crosscheck-with-z3-eqclass-timeout-threshold",
"Set a timeout for bug report equivalence classes in milliseconds. "
"If we exhaust this threshold, we will drop the bug report eqclass "
- "instead of doing more Z3 queries. Set 0 for no timeout.", 700)
+ "instead of doing more Z3 queries. On fast machines, 700 is a sane value. "
+ "Set 0 for no timeout.", 0)
ANALYZER_OPTION(
unsigned, Z3CrosscheckTimeoutThreshold,
"crosscheck-with-z3-timeout-threshold",
- "Set a timeout for individual Z3 queries in milliseconds. "
- "Set 0 for no timeout.", 300)
+ "Set a timeout for individual Z3 queries in milliseconds. On fast "
+ "machines, 400 is a sane value. "
+ "Set 0 for no timeout.", 15'000)
ANALYZER_OPTION(
unsigned, Z3CrosscheckRLimitThreshold,
"crosscheck-with-z3-rlimit-threshold",
- "Set the Z3 resource limit threshold. This sets a deterministic cutoff "
- "point for Z3 queries, as longer queries usually consume more resources. "
- "Set 0 for unlimited.", 400'000)
+ "Set the Z3 resource limit threshold. This sets a supposedly deterministic "
+ "cutoff point for Z3 queries, as longer queries usually consume more "
+ "resources. "
+ "Set 0 for unlimited.", 0)
ANALYZER_OPTION(bool, ShouldReportIssuesInMainSourceFile,
"report-in-main-source-file",
diff --git a/clang/test/Analysis/analyzer-config.c b/clang/test/Analysis/analyzer-config.c
index 8eb869bac46f8f..0f1314aae9db57 100644
--- a/clang/test/Analysis/analyzer-config.c
+++ b/clang/test/Analysis/analyzer-config.c
@@ -41,9 +41,9 @@
// CHECK-NEXT: cplusplus.Move:WarnOn = KnownsAndLocals
// CHECK-NEXT: cplusplus.SmartPtrModeling:ModelSmartPtrDereference = false
// CHECK-NEXT: crosscheck-with-z3 = false
-// CHECK-NEXT: crosscheck-with-z3-eqclass-timeout-threshold = 700
-// CHECK-NEXT: crosscheck-with-z3-rlimit-threshold = 400000
-// CHECK-NEXT: crosscheck-with-z3-timeout-threshold = 300
+// CHECK-NEXT: crosscheck-with-z3-eqclass-timeout-threshold = 0
+// CHECK-NEXT: crosscheck-with-z3-rlimit-threshold = 0
+// CHECK-NEXT: crosscheck-with-z3-timeout-threshold = 15000
// CHECK-NEXT: ctu-dir = ""
// CHECK-NEXT: ctu-import-cpp-threshold = 8
// CHECK-NEXT: ctu-import-threshold = 24
diff --git a/clang/unittests/StaticAnalyzer/Z3CrosscheckOracleTest.cpp b/clang/unittests/StaticAnalyzer/Z3CrosscheckOracleTest.cpp
index ef07e47ee911b2..a8cb2782c7b72f 100644
--- a/clang/unittests/StaticAnalyzer/Z3CrosscheckOracleTest.cpp
+++ b/clang/unittests/StaticAnalyzer/Z3CrosscheckOracleTest.cpp
@@ -38,8 +38,8 @@ static const AnalyzerOptions DefaultOpts = [] {
// Remember to update the tests in this file when these values change.
// Also update the doc comment of `interpretQueryResult`.
- assert(Config.Z3CrosscheckRLimitThreshold == 400'000);
- assert(Config.Z3CrosscheckTimeoutThreshold == 300_ms);
+ assert(Config.Z3CrosscheckRLimitThreshold == 0);
+ assert(Config.Z3CrosscheckTimeoutThreshold == 15'000_ms);
// Usually, when the timeout/rlimit threshold is reached, Z3 only slightly
// overshoots until it realizes that it overshoot and needs to back off.
// Consequently, the measured timeout should be fairly close to the threshold.
@@ -74,13 +74,13 @@ TEST_F(Z3CrosscheckOracleTest, SATWhenItGoesOverTime) {
}
TEST_F(Z3CrosscheckOracleTest, UNSATWhenItGoesOverTime) {
- ASSERT_EQ(RejectEQClass, interpretQueryResult({UNSAT, 310_ms, 1000_step}));
+ ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 310_ms, 1000_step}));
}
TEST_F(Z3CrosscheckOracleTest, RejectsTimeout) {
ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 25_ms, 1000_step}));
ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 25_ms, 1000_step}));
- ASSERT_EQ(RejectEQClass, interpretQueryResult({UNDEF, 310_ms, 1000_step}));
+ ASSERT_EQ(RejectReport, interpretQueryResult({UNDEF, 310_ms, 1000_step}));
}
TEST_F(Z3CrosscheckOracleTest, RejectsUNSATs) {
@@ -97,7 +97,7 @@ TEST_F(Z3CrosscheckOracleTest, RejectEQClassIfSpendsTooMuchTotalTime) {
// Simulate long queries, that barely doesn't trigger the timeout.
ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 290_ms, 1000_step}));
ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 290_ms, 1000_step}));
- ASSERT_EQ(RejectEQClass, interpretQueryResult({UNSAT, 290_ms, 1000_step}));
+ ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 290_ms, 1000_step}));
}
TEST_F(Z3CrosscheckOracleTest, SATWhenItSpendsTooMuchTotalTime) {
@@ -114,7 +114,7 @@ TEST_F(Z3CrosscheckOracleTest, RejectEQClassIfAttemptsManySmallQueries) {
ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 20_ms, 1000_step}));
}
// Do one more to trigger the heuristic.
- ASSERT_EQ(RejectEQClass, interpretQueryResult({UNSAT, 1_ms, 1000_step}));
+ ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 1_ms, 1000_step}));
}
TEST_F(Z3CrosscheckOracleTest, SATWhenIfAttemptsManySmallQueries) {
@@ -131,7 +131,7 @@ TEST_F(Z3CrosscheckOracleTest, SATWhenIfAttemptsManySmallQueries) {
TEST_F(Z3CrosscheckOracleTest, RejectEQClassIfExhaustsRLimit) {
ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 25_ms, 1000_step}));
ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 25_ms, 1000_step}));
- ASSERT_EQ(RejectEQClass, interpretQueryResult({UNDEF, 25_ms, 405'000_step}));
+ ASSERT_EQ(RejectReport, interpretQueryResult({UNDEF, 25_ms, 405'000_step}));
}
TEST_F(Z3CrosscheckOracleTest, SATWhenItExhaustsRLimit) {
|
I also have some just-in-case measurements cooking to confirm that we don't observe any nondeterminism with these values. Previous measurements of mine were made with all configs set to 0. Our downstream release goes off with the patch totally reverted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit which introduced the new Z3 timeout logic (#97298) was landed with my approval, but since then I realized that this was a mistake -- in particular the concrete fine-tuned default values provide a good fit for one particular computer speed and (as the experimental results show) behave badly on systems that are significantly slower than that. (This should have been obvious in hindsight, but I missed it.)
Based on this I support restoring the legacy 15 second timeout which is good as a sane default value and ensures that even the slow computers finish practically all non-hanging Z3 calculations.
Changing these ANALYZER_OPTION
s might be useful for power users who can measure the speed of their worker nodes and configure some sharper timeouts that work for them in particular; but even that is a questionable decision, because with sharp timeouts a temporary loss of processing power (e.g. a job that's started in parallel on the same machine) can cause inconsistent results.
Sanity check returned normal, no difference in between the report sets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no major concerns with this patch.
I proposed some wording changes, and I think we should test the configuration we had for the defaults we had prior to this patch to showcase that what is the expected behavior for the situations that motivated the Z3 oracle.
Thank you for proposing this.
We will have two fixtures: - DefaultZ3CrosscheckOracleTest (default options) - LimitedZ3CrosscheckOracleTest (eqclass timeout 700ms, 300ms and 400k rlimit per query, aka. the previous default) Each test is duplicated to showcase the different behavior in the two configs.
…hat they intended to test Some tests were dropped because they only make sense to test if the given configuration is not disabled.
I took the time to update the branch with my recommendations. tldr; I only made sure the tests actually test what they intended now that the default threshold has changed. |
Oh, I'm so sorry I left this as-is, thank you so much for picking up the slack! Yes, everything looks great! |
Discussion here:
https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520/15?u=szelethus
The original patch, #97298 introduced new timeouts backed by thorough testing and measurements to keep the running time of Z3 within reasonable limits. The measurements also showed that only certain reports and certain TUs were responsible for the poor performance of Z3 refutation.
Unfortunately, it seems like that on machines with different characteristics (slower machines) the current timeouts don't just axe 0.01% of reports, but many more as well. Considering that timeouts are inherently nondeterministic as a cutoff point, this lead reports sets being vastly different on the same projects with the same configuration. The discussion link shows that all configurations introduced in the patch with their default values lead to severa nondeterminism of the analyzer. As we, and others use the analyzer as a gating tool for PRs, we should revert to the original defaults.
We should respect that