Skip to content

[analyzer][Z3] Restore the original timeout of 15s #118291

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Dec 13, 2024

Conversation

Szelethus
Copy link
Contributor

Discussion here:
https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520/15?u=szelethus

The original patch, #97298 introduced new timeouts backed by thorough testing and measurements to keep the running time of Z3 within reasonable limits. The measurements also showed that only certain reports and certain TUs were responsible for the poor performance of Z3 refutation.

Unfortunately, it seems like that on machines with different characteristics (slower machines) the current timeouts don't just axe 0.01% of reports, but many more as well. Considering that timeouts are inherently nondeterministic as a cutoff point, this lead reports sets being vastly different on the same projects with the same configuration. The discussion link shows that all configurations introduced in the patch with their default values lead to severa nondeterminism of the analyzer. As we, and others use the analyzer as a gating tool for PRs, we should revert to the original defaults.

We should respect that

  • There are still parts of the analyzer that are either proven or suspected to contain nondeterministic code (like pointer sets),
  • A 15s timeout is more likely to hit the same reports every time on a wider range of machines, but is still inherently nondeterministic, but an infinite timeout leads to the tool hanging,
  • If you measure the performance of the analyzer on your machines, you can and should achieve some speedup with little or no observable nondeterminism.

Discussion here:
https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520/15?u=szelethus

The original patch, llvm#97298 introduced new timeouts backed by thorough
testing and measurements to keep the running time of Z3 within
reasonable limits. The measurements also showed that only certain
reports and certain TUs were responsible for the poor performance of Z3
refutation.

Unfortunately, it seems like that on machines with different
characteristics (slower machines) the current timeouts don't just axe
0.01% of reports, but many more as well. Considering that timeouts are
inherently nondeterministic as a cutoff point, this lead reports sets
being vastly different on the same projects with the same configuration.
The discussion link shows that all configurations introduced in the
patch with their default values lead to severa nondeterminism of the
analyzer. As we, and others use the analyzer as a gating tool for PRs,
we should revert to the original defaults.

We should respect that
* There are still parts of the analyzer that are either proven or
  suspected to contain nondeterministic code (like pointer sets),
* A 15s timeout is more likely to hit the same reports every time on a
  wider range of machines, but is still inherently nondeterministic, but
  an infinite timeout leads to the tool hanging,
* If you measure the performance of the analyzer on your machines, you
  can and should achieve some speedup with little or no observable
  nondeterminism.
Copy link

graphite-app bot commented Dec 2, 2024

Your org has enabled the Graphite merge queue for merging into main

Add the label “FP Bundles” to the PR and Graphite will automatically add it to the merge queue when it’s ready to merge.

You must have a Graphite account and log in to Graphite in order to use the merge queue. Sign up using this link.

@llvmbot llvmbot added the clang Clang issues not falling into any other category label Dec 2, 2024
@llvmbot
Copy link
Member

llvmbot commented Dec 2, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Kristóf Umann (Szelethus)

Changes

Discussion here:
https://discourse.llvm.org/t/analyzer-rfc-taming-z3-query-times/79520/15?u=szelethus

The original patch, #97298 introduced new timeouts backed by thorough testing and measurements to keep the running time of Z3 within reasonable limits. The measurements also showed that only certain reports and certain TUs were responsible for the poor performance of Z3 refutation.

Unfortunately, it seems like that on machines with different characteristics (slower machines) the current timeouts don't just axe 0.01% of reports, but many more as well. Considering that timeouts are inherently nondeterministic as a cutoff point, this lead reports sets being vastly different on the same projects with the same configuration. The discussion link shows that all configurations introduced in the patch with their default values lead to severa nondeterminism of the analyzer. As we, and others use the analyzer as a gating tool for PRs, we should revert to the original defaults.

We should respect that

  • There are still parts of the analyzer that are either proven or suspected to contain nondeterministic code (like pointer sets),
  • A 15s timeout is more likely to hit the same reports every time on a wider range of machines, but is still inherently nondeterministic, but an infinite timeout leads to the tool hanging,
  • If you measure the performance of the analyzer on your machines, you can and should achieve some speedup with little or no observable nondeterminism.

Full diff: https://github.com/llvm/llvm-project/pull/118291.diff

3 Files Affected:

  • (modified) clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def (+9-6)
  • (modified) clang/test/Analysis/analyzer-config.c (+3-3)
  • (modified) clang/unittests/StaticAnalyzer/Z3CrosscheckOracleTest.cpp (+7-7)
diff --git a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
index 737bc8e86cfb6a..64fb11821a2656 100644
--- a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
+++ b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
@@ -189,20 +189,23 @@ ANALYZER_OPTION(
     "crosscheck-with-z3-eqclass-timeout-threshold",
     "Set a timeout for bug report equivalence classes in milliseconds. "
     "If we exhaust this threshold, we will drop the bug report eqclass "
-    "instead of doing more Z3 queries. Set 0 for no timeout.", 700)
+    "instead of doing more Z3 queries. On fast machines, 700 is a sane value. "
+    "Set 0 for no timeout.", 0)
 
 ANALYZER_OPTION(
     unsigned, Z3CrosscheckTimeoutThreshold,
     "crosscheck-with-z3-timeout-threshold",
-    "Set a timeout for individual Z3 queries in milliseconds. "
-    "Set 0 for no timeout.", 300)
+    "Set a timeout for individual Z3 queries in milliseconds. On fast "
+    "machines, 400 is a sane value. "
+    "Set 0 for no timeout.", 15'000)
 
 ANALYZER_OPTION(
     unsigned, Z3CrosscheckRLimitThreshold,
     "crosscheck-with-z3-rlimit-threshold",
-    "Set the Z3 resource limit threshold. This sets a deterministic cutoff "
-    "point for Z3 queries, as longer queries usually consume more resources. "
-    "Set 0 for unlimited.", 400'000)
+    "Set the Z3 resource limit threshold. This sets a supposedly deterministic "
+    "cutoff point for Z3 queries, as longer queries usually consume more "
+    "resources. "
+    "Set 0 for unlimited.", 0)
 
 ANALYZER_OPTION(bool, ShouldReportIssuesInMainSourceFile,
                 "report-in-main-source-file",
diff --git a/clang/test/Analysis/analyzer-config.c b/clang/test/Analysis/analyzer-config.c
index 8eb869bac46f8f..0f1314aae9db57 100644
--- a/clang/test/Analysis/analyzer-config.c
+++ b/clang/test/Analysis/analyzer-config.c
@@ -41,9 +41,9 @@
 // CHECK-NEXT: cplusplus.Move:WarnOn = KnownsAndLocals
 // CHECK-NEXT: cplusplus.SmartPtrModeling:ModelSmartPtrDereference = false
 // CHECK-NEXT: crosscheck-with-z3 = false
-// CHECK-NEXT: crosscheck-with-z3-eqclass-timeout-threshold = 700
-// CHECK-NEXT: crosscheck-with-z3-rlimit-threshold = 400000
-// CHECK-NEXT: crosscheck-with-z3-timeout-threshold = 300
+// CHECK-NEXT: crosscheck-with-z3-eqclass-timeout-threshold = 0
+// CHECK-NEXT: crosscheck-with-z3-rlimit-threshold = 0
+// CHECK-NEXT: crosscheck-with-z3-timeout-threshold = 15000
 // CHECK-NEXT: ctu-dir = ""
 // CHECK-NEXT: ctu-import-cpp-threshold = 8
 // CHECK-NEXT: ctu-import-threshold = 24
diff --git a/clang/unittests/StaticAnalyzer/Z3CrosscheckOracleTest.cpp b/clang/unittests/StaticAnalyzer/Z3CrosscheckOracleTest.cpp
index ef07e47ee911b2..a8cb2782c7b72f 100644
--- a/clang/unittests/StaticAnalyzer/Z3CrosscheckOracleTest.cpp
+++ b/clang/unittests/StaticAnalyzer/Z3CrosscheckOracleTest.cpp
@@ -38,8 +38,8 @@ static const AnalyzerOptions DefaultOpts = [] {
 
   // Remember to update the tests in this file when these values change.
   // Also update the doc comment of `interpretQueryResult`.
-  assert(Config.Z3CrosscheckRLimitThreshold == 400'000);
-  assert(Config.Z3CrosscheckTimeoutThreshold == 300_ms);
+  assert(Config.Z3CrosscheckRLimitThreshold == 0);
+  assert(Config.Z3CrosscheckTimeoutThreshold == 15'000_ms);
   // Usually, when the timeout/rlimit threshold is reached, Z3 only slightly
   // overshoots until it realizes that it overshoot and needs to back off.
   // Consequently, the measured timeout should be fairly close to the threshold.
@@ -74,13 +74,13 @@ TEST_F(Z3CrosscheckOracleTest, SATWhenItGoesOverTime) {
 }
 
 TEST_F(Z3CrosscheckOracleTest, UNSATWhenItGoesOverTime) {
-  ASSERT_EQ(RejectEQClass, interpretQueryResult({UNSAT, 310_ms, 1000_step}));
+  ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 310_ms, 1000_step}));
 }
 
 TEST_F(Z3CrosscheckOracleTest, RejectsTimeout) {
   ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 25_ms, 1000_step}));
   ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 25_ms, 1000_step}));
-  ASSERT_EQ(RejectEQClass, interpretQueryResult({UNDEF, 310_ms, 1000_step}));
+  ASSERT_EQ(RejectReport, interpretQueryResult({UNDEF, 310_ms, 1000_step}));
 }
 
 TEST_F(Z3CrosscheckOracleTest, RejectsUNSATs) {
@@ -97,7 +97,7 @@ TEST_F(Z3CrosscheckOracleTest, RejectEQClassIfSpendsTooMuchTotalTime) {
   // Simulate long queries, that barely doesn't trigger the timeout.
   ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 290_ms, 1000_step}));
   ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 290_ms, 1000_step}));
-  ASSERT_EQ(RejectEQClass, interpretQueryResult({UNSAT, 290_ms, 1000_step}));
+  ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 290_ms, 1000_step}));
 }
 
 TEST_F(Z3CrosscheckOracleTest, SATWhenItSpendsTooMuchTotalTime) {
@@ -114,7 +114,7 @@ TEST_F(Z3CrosscheckOracleTest, RejectEQClassIfAttemptsManySmallQueries) {
     ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 20_ms, 1000_step}));
   }
   // Do one more to trigger the heuristic.
-  ASSERT_EQ(RejectEQClass, interpretQueryResult({UNSAT, 1_ms, 1000_step}));
+  ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 1_ms, 1000_step}));
 }
 
 TEST_F(Z3CrosscheckOracleTest, SATWhenIfAttemptsManySmallQueries) {
@@ -131,7 +131,7 @@ TEST_F(Z3CrosscheckOracleTest, SATWhenIfAttemptsManySmallQueries) {
 TEST_F(Z3CrosscheckOracleTest, RejectEQClassIfExhaustsRLimit) {
   ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 25_ms, 1000_step}));
   ASSERT_EQ(RejectReport, interpretQueryResult({UNSAT, 25_ms, 1000_step}));
-  ASSERT_EQ(RejectEQClass, interpretQueryResult({UNDEF, 25_ms, 405'000_step}));
+  ASSERT_EQ(RejectReport, interpretQueryResult({UNDEF, 25_ms, 405'000_step}));
 }
 
 TEST_F(Z3CrosscheckOracleTest, SATWhenItExhaustsRLimit) {

@Szelethus
Copy link
Contributor Author

I also have some just-in-case measurements cooking to confirm that we don't observe any nondeterminism with these values. Previous measurements of mine were made with all configs set to 0. Our downstream release goes off with the patch totally reverted.

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit which introduced the new Z3 timeout logic (#97298) was landed with my approval, but since then I realized that this was a mistake -- in particular the concrete fine-tuned default values provide a good fit for one particular computer speed and (as the experimental results show) behave badly on systems that are significantly slower than that. (This should have been obvious in hindsight, but I missed it.)

Based on this I support restoring the legacy 15 second timeout which is good as a sane default value and ensures that even the slow computers finish practically all non-hanging Z3 calculations.

Changing these ANALYZER_OPTIONs might be useful for power users who can measure the speed of their worker nodes and configure some sharper timeouts that work for them in particular; but even that is a questionable decision, because with sharp timeouts a temporary loss of processing power (e.g. a job that's started in parallel on the same machine) can cause inconsistent results.

@Szelethus
Copy link
Contributor Author

I also have some just-in-case measurements cooking to confirm that we don't observe any nondeterminism with these values. Previous measurements of mine were made with all configs set to 0. Our downstream release goes off with the patch totally reverted.

Sanity check returned normal, no difference in between the report sets.
https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=qtbase_v6.2.0_determinism_check15A_1&run=libwebm_libwebm-1.0.0.27_determinism_check15A_1&run=vim_v8.2.1920_determinism_check15A_1&newcheck=qtbase_v6.2.0_determinism_check15B_1&newcheck=libwebm_libwebm-1.0.0.27_determinism_check15B_1&newcheck=vim_v8.2.1920_determinism_check15B_1

Copy link
Contributor

@steakhal steakhal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no major concerns with this patch.
I proposed some wording changes, and I think we should test the configuration we had for the defaults we had prior to this patch to showcase that what is the expected behavior for the situations that motivated the Z3 oracle.

Thank you for proposing this.

We will have two fixtures:
 - DefaultZ3CrosscheckOracleTest (default options)
 - LimitedZ3CrosscheckOracleTest
   (eqclass timeout 700ms, 300ms and 400k rlimit per query, aka. the previous default)

Each test is duplicated to showcase the different behavior in the two
configs.
…hat they intended to test

Some tests were dropped because they only make sense to test if the
given configuration is not disabled.
@steakhal
Copy link
Contributor

steakhal commented Dec 5, 2024

I took the time to update the branch with my recommendations.
Please confirm if you agree with the current status.

tldr; I only made sure the tests actually test what they intended now that the default threshold has changed.
I also added a couple more tests where it made sense, along with a new test showcasing the original issue that motivated the presence of the now disabled thresholds.

@Szelethus
Copy link
Contributor Author

Oh, I'm so sorry I left this as-is, thank you so much for picking up the slack! Yes, everything looks great!

@Szelethus Szelethus merged commit ea8e328 into llvm:main Dec 13, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants