[analyzer][Z3] Restore the original timeout of 15s #118291

Szelethus · 2024-12-02T13:29:36Z

Discussion here:
https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520/15?u=szelethus

The original patch, #97298 introduced new timeouts backed by thorough testing and measurements to keep the running time of Z3 within reasonable limits. The measurements also showed that only certain reports and certain TUs were responsible for the poor performance of Z3 refutation.

Unfortunately, it seems like that on machines with different characteristics (slower machines) the current timeouts don't just axe 0.01% of reports, but many more as well. Considering that timeouts are inherently nondeterministic as a cutoff point, this lead reports sets being vastly different on the same projects with the same configuration. The discussion link shows that all configurations introduced in the patch with their default values lead to severa nondeterminism of the analyzer. As we, and others use the analyzer as a gating tool for PRs, we should revert to the original defaults.

We should respect that

There are still parts of the analyzer that are either proven or suspected to contain nondeterministic code (like pointer sets),
A 15s timeout is more likely to hit the same reports every time on a wider range of machines, but is still inherently nondeterministic, but an infinite timeout leads to the tool hanging,
If you measure the performance of the analyzer on your machines, you can and should achieve some speedup with little or no observable nondeterminism.

Discussion here: https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520/15?u=szelethus The original patch, llvm#97298 introduced new timeouts backed by thorough testing and measurements to keep the running time of Z3 within reasonable limits. The measurements also showed that only certain reports and certain TUs were responsible for the poor performance of Z3 refutation. Unfortunately, it seems like that on machines with different characteristics (slower machines) the current timeouts don't just axe 0.01% of reports, but many more as well. Considering that timeouts are inherently nondeterministic as a cutoff point, this lead reports sets being vastly different on the same projects with the same configuration. The discussion link shows that all configurations introduced in the patch with their default values lead to severa nondeterminism of the analyzer. As we, and others use the analyzer as a gating tool for PRs, we should revert to the original defaults. We should respect that * There are still parts of the analyzer that are either proven or suspected to contain nondeterministic code (like pointer sets), * A 15s timeout is more likely to hit the same reports every time on a wider range of machines, but is still inherently nondeterministic, but an infinite timeout leads to the tool hanging, * If you measure the performance of the analyzer on your machines, you can and should achieve some speedup with little or no observable nondeterminism.

graphite-app · 2024-12-02T13:29:44Z

Your org has enabled the Graphite merge queue for merging into main

Add the label “FP Bundles” to the PR and Graphite will automatically add it to the merge queue when it’s ready to merge.

You must have a Graphite account and log in to Graphite in order to use the merge queue. Sign up using this link.

llvmbot · 2024-12-02T13:30:07Z

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Kristóf Umann (Szelethus)

Changes

Discussion here:
https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520/15?u=szelethus

The original patch, #97298 introduced new timeouts backed by thorough testing and measurements to keep the running time of Z3 within reasonable limits. The measurements also showed that only certain reports and certain TUs were responsible for the poor performance of Z3 refutation.

Unfortunately, it seems like that on machines with different characteristics (slower machines) the current timeouts don't just axe 0.01% of reports, but many more as well. Considering that timeouts are inherently nondeterministic as a cutoff point, this lead reports sets being vastly different on the same projects with the same configuration. The discussion link shows that all configurations introduced in the patch with their default values lead to severa nondeterminism of the analyzer. As we, and others use the analyzer as a gating tool for PRs, we should revert to the original defaults.

We should respect that

There are still parts of the analyzer that are either proven or suspected to contain nondeterministic code (like pointer sets),
A 15s timeout is more likely to hit the same reports every time on a wider range of machines, but is still inherently nondeterministic, but an infinite timeout leads to the tool hanging,
If you measure the performance of the analyzer on your machines, you can and should achieve some speedup with little or no observable nondeterminism.

Full diff: https://github.com/llvm/llvm-project/pull/118291.diff

3 Files Affected:

(modified) clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def (+9-6)
(modified) clang/test/Analysis/analyzer-config.c (+3-3)
(modified) clang/unittests/StaticAnalyzer/Z3CrosscheckOracleTest.cpp (+7-7)

diff --git a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
index 737bc8e86cfb6a..64fb11821a2656 100644
--- a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
+++ b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
@@ -189,20 +189,23 @@ ANALYZER_OPTION(
     "crosscheck-with-z3-eqclass-timeout-threshold",
     "Set a timeout for bug report equivalence classes in milliseconds. "
     "If we exhaust this threshold, we will drop the bug report eqclass "
-    "instead of doing more Z3 queries. Set 0 for no timeout.", 700)
+    "instead of doing more Z3 queries. On fast machines, 700 is a sane value. "
+    "Set 0 for no timeout.", 0)
 
 ANALYZER_OPTION(
     unsigned, Z3CrosscheckTimeoutThreshold,
     "crosscheck-with-z3-timeout-threshold",
-    "Set a timeout for individual Z3 queries in milliseconds. "
-    "Set 0 for no timeout.", 300)
+    "Set a timeout for individual Z3 queries in milliseconds. On fast "
+    "machines, 400 is a sane value. "
+    "Set 0 for no timeout.", 15'000)
 
 ANALYZER_OPTION(
     unsigned, Z3CrosscheckRLimitThreshold,
     "crosscheck-with-z3-rlimit-threshold",
-    "Set the Z3 resource limit threshold. This sets a deterministic cutoff "
-    "point for Z3 queries, as longer queries usually consume more resources. "
-    "Set 0 for unlimited.", 400'000)
+    "Set the Z3 resource limit threshold. This sets a supposedly deterministic "
+    "cutoff point for Z3 queries, as longer queries usually consume more "
+    "resources. "
+    "Set 0 for unlimited.", 0)
 
 ANALYZER_OPTION(bool, ShouldReportIssuesInMainSourceFile,
                 "report-in-main-source-file",
diff --git a/clang/test/Analysis/analyzer-config.c b/clang/test/Analysis/analyzer-config.c
index 8eb869bac46f8f..0f1314aae9db57 100644
--- a/clang/test/Analysis/analyzer-config.c
+++ b/clang/test/Analysis/analyzer-config.c
@@ -41,9 +41,9 @@
 // CHECK-NEXT: cplusplus.Move:WarnOn = KnownsAndLocals
 // CHECK-NEXT: cplusplus.SmartPtrModeling:ModelSmartPtrDereference = false
 // CHECK-NEXT: crosscheck-with-z3 = false
-// CHECK-NEXT: crosscheck-with-z3-eqclass-timeout-threshold = 700
-// CHECK-NEXT: crosscheck-with-z3-rlimit-threshold = 400000
-// CHECK-NEXT: crosscheck-with-z3-timeout-threshold = 300
+// CHECK-NEXT: crosscheck-with-z3-eqclass-timeout-threshold = 0
+// CHECK-NEXT: crosscheck-with-z3-rlimit-threshold = 0
+// CHECK-NEXT: crosscheck-with-z3-timeout-threshold = 15000
 // CHECK-NEXT: ctu-dir = ""
 // CHECK-NEXT: ctu-import-cpp-threshold = 8
 // CHECK-NEXT: ctu-import-threshold = 24
diff --git a/clang/unittests/StaticAnalyzer/Z3CrosscheckOracleTest.cpp b/clang/unittests/StaticAnalyzer/Z3CrosscheckOracleTest.cpp
index ef07e47ee911b2..a8cb2782c7b72f 100644
--- a/clang/unittests/StaticAnalyzer/Z3CrosscheckOracleTest.cpp
+++ b/clang/unittests/StaticAnalyzer/Z3CrosscheckOracleTest.cpp
@@ -38,8 +38,8 @@ static const AnalyzerOptions DefaultOpts = [] {
 
   // Remember to update the tests in this file when these values change.
   // Also update the doc comment of `interpretQueryResult`.
-  assert(Config.Z3CrosscheckRLimitThreshold == 400'000);
-  assert(Config.Z3CrosscheckTimeoutThreshold == 300_ms);
+  assert(Config.Z3CrosscheckRLimitThreshold == 0);
+  assert(Config.Z3CrosscheckTimeoutThreshold == 15'000_ms);
   // Usually, when the timeout/rlimit threshold is reached, Z3 only slightly
   // overshoots until it realizes that it overshoot and needs to back off.
   // Consequently, the measured timeout should be fairly close to the threshold.
@@ -74,13 +74,13 @@ TEST_F(Z3CrosscheckOracleTest, SATWhenItGoesOverTime) {
 }
 
 TEST_F(Z3CrosscheckOracleTest, UNSATWhenItGoesOverTime) {
-  ASSERT_EQ(RejectEQClass, interpretQueryResult({UNSAT, 310_ms, 1000_step}));
+  ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 310_ms, 1000_step}));
 }
 
 TEST_F(Z3CrosscheckOracleTest, RejectsTimeout) {
   ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 25_ms, 1000_step}));
   ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 25_ms, 1000_step}));
-  ASSERT_EQ(RejectEQClass, interpretQueryResult({UNDEF, 310_ms, 1000_step}));
+  ASSERT_EQ(RejectReport, interpretQueryResult({UNDEF, 310_ms, 1000_step}));
 }
 
 TEST_F(Z3CrosscheckOracleTest, RejectsUNSATs) {
@@ -97,7 +97,7 @@ TEST_F(Z3CrosscheckOracleTest, RejectEQClassIfSpendsTooMuchTotalTime) {
   // Simulate long queries, that barely doesn't trigger the timeout.
   ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 290_ms, 1000_step}));
   ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 290_ms, 1000_step}));
-  ASSERT_EQ(RejectEQClass, interpretQueryResult({UNSAT, 290_ms, 1000_step}));
+  ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 290_ms, 1000_step}));
 }
 
 TEST_F(Z3CrosscheckOracleTest, SATWhenItSpendsTooMuchTotalTime) {
@@ -114,7 +114,7 @@ TEST_F(Z3CrosscheckOracleTest, RejectEQClassIfAttemptsManySmallQueries) {
     ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 20_ms, 1000_step}));
   }
   // Do one more to trigger the heuristic.
-  ASSERT_EQ(RejectEQClass, interpretQueryResult({UNSAT, 1_ms, 1000_step}));
+  ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 1_ms, 1000_step}));
 }
 
 TEST_F(Z3CrosscheckOracleTest, SATWhenIfAttemptsManySmallQueries) {
@@ -131,7 +131,7 @@ TEST_F(Z3CrosscheckOracleTest, SATWhenIfAttemptsManySmallQueries) {
 TEST_F(Z3CrosscheckOracleTest, RejectEQClassIfExhaustsRLimit) {
   ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 25_ms, 1000_step}));
   ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 25_ms, 1000_step}));
-  ASSERT_EQ(RejectEQClass, interpretQueryResult({UNDEF, 25_ms, 405'000_step}));
+  ASSERT_EQ(RejectReport, interpretQueryResult({UNDEF, 25_ms, 405'000_step}));
 }
 
 TEST_F(Z3CrosscheckOracleTest, SATWhenItExhaustsRLimit) {

Szelethus · 2024-12-02T13:44:11Z

I also have some just-in-case measurements cooking to confirm that we don't observe any nondeterminism with these values. Previous measurements of mine were made with all configs set to 0. Our downstream release goes off with the patch totally reverted.

NagyDonat

The commit which introduced the new Z3 timeout logic (#97298) was landed with my approval, but since then I realized that this was a mistake -- in particular the concrete fine-tuned default values provide a good fit for one particular computer speed and (as the experimental results show) behave badly on systems that are significantly slower than that. (This should have been obvious in hindsight, but I missed it.)

Based on this I support restoring the legacy 15 second timeout which is good as a sane default value and ensures that even the slow computers finish practically all non-hanging Z3 calculations.

Changing these ANALYZER_OPTIONs might be useful for power users who can measure the speed of their worker nodes and configure some sharper timeouts that work for them in particular; but even that is a questionable decision, because with sharp timeouts a temporary loss of processing power (e.g. a job that's started in parallel on the same machine) can cause inconsistent results.

Szelethus · 2024-12-03T11:59:21Z

I also have some just-in-case measurements cooking to confirm that we don't observe any nondeterminism with these values. Previous measurements of mine were made with all configs set to 0. Our downstream release goes off with the patch totally reverted.

Sanity check returned normal, no difference in between the report sets.
https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=qtbase_v6.2.0_determinism_check15A_1&run=libwebm_libwebm-1.0.0.27_determinism_check15A_1&run=vim_v8.2.1920_determinism_check15A_1&newcheck=qtbase_v6.2.0_determinism_check15B_1&newcheck=libwebm_libwebm-1.0.0.27_determinism_check15B_1&newcheck=vim_v8.2.1920_determinism_check15B_1

steakhal

I have no major concerns with this patch.
I proposed some wording changes, and I think we should test the configuration we had for the defaults we had prior to this patch to showcase that what is the expected behavior for the situations that motivated the Z3 oracle.

Thank you for proposing this.

clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def

clang/unittests/StaticAnalyzer/Z3CrosscheckOracleTest.cpp

clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def

We will have two fixtures: - DefaultZ3CrosscheckOracleTest (default options) - LimitedZ3CrosscheckOracleTest (eqclass timeout 700ms, 300ms and 400k rlimit per query, aka. the previous default) Each test is duplicated to showcase the different behavior in the two configs.

…hat they intended to test Some tests were dropped because they only make sense to test if the given configuration is not disabled.

steakhal · 2024-12-05T13:16:19Z

I took the time to update the branch with my recommendations.
Please confirm if you agree with the current status.

tldr; I only made sure the tests actually test what they intended now that the default threshold has changed.
I also added a couple more tests where it made sense, along with a new test showcasing the original issue that motivated the presence of the now disabled thresholds.

Szelethus · 2024-12-13T13:30:59Z

Oh, I'm so sorry I left this as-is, thank you so much for picking up the slack! Yes, everything looks great!

Szelethus added the clang:static analyzer label Dec 2, 2024

Szelethus requested review from haoNoQ, steakhal, balazske, NagyDonat and isuckatcs December 2, 2024 13:29

llvmbot added the clang Clang issues not falling into any other category label Dec 2, 2024

isuckatcs approved these changes Dec 2, 2024

View reviewed changes

NagyDonat approved these changes Dec 2, 2024

View reviewed changes

steakhal requested changes Dec 3, 2024

View reviewed changes

steakhal added 4 commits December 5, 2024 13:03

NFC Reword some crosscheck config descriptions

d308764

Revisit the tests and ensure they have the right thresholds to test w…

453aafc

…hat they intended to test Some tests were dropped because they only make sense to test if the given configuration is not disabled.

Demonstrate the weakness of the default configuration

4b63bf7

steakhal approved these changes Dec 5, 2024

View reviewed changes

Szelethus merged commit ea8e328 into llvm:main Dec 13, 2024
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[analyzer][Z3] Restore the original timeout of 15s #118291

[analyzer][Z3] Restore the original timeout of 15s #118291

Uh oh!

Szelethus commented Dec 2, 2024

Uh oh!

graphite-app bot commented Dec 2, 2024

Uh oh!

llvmbot commented Dec 2, 2024 •

edited

Loading

Uh oh!

Szelethus commented Dec 2, 2024

Uh oh!

NagyDonat left a comment

Uh oh!

Szelethus commented Dec 3, 2024

Uh oh!

steakhal left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

steakhal commented Dec 5, 2024

Uh oh!

Szelethus commented Dec 13, 2024

Uh oh!

Uh oh!

Uh oh!

[analyzer][Z3] Restore the original timeout of 15s #118291

[analyzer][Z3] Restore the original timeout of 15s #118291

Uh oh!

Conversation

Szelethus commented Dec 2, 2024

Uh oh!

graphite-app bot commented Dec 2, 2024

Your org has enabled the Graphite merge queue for merging into main

Uh oh!

llvmbot commented Dec 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Szelethus commented Dec 2, 2024

Uh oh!

NagyDonat left a comment

Choose a reason for hiding this comment

Uh oh!

Szelethus commented Dec 3, 2024

Uh oh!

steakhal left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

steakhal commented Dec 5, 2024

Uh oh!

Szelethus commented Dec 13, 2024

Uh oh!

Uh oh!

Uh oh!

llvmbot commented Dec 2, 2024 •

edited

Loading