-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Automodel: Apply negative characteristics only to endpoints of the right kind. #15326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automodel: Apply negative characteristics only to endpoints of the right kind. #15326
Conversation
This reflects the fact that these predicates also deal with source candidates.
`Files.list` has a taint step from its first argument to its result, so that first argument should not be considered a sink candidate (and it is not). However, due to a bug in `IsMaDTaintStepCharacteristic` it is also not considered a source candidate, which is wrong: as the example shows, if that argument is a call we do very much want to consider it as a source candidate.
…se when determining candidates. This prevents us from associating a sink candidate with a source type and vice versa. However, this does not fix the problem of negative characteristics for sink types excluding source candidates.
Instead of positively implying the negative sink type, negative sink characteristics now negatively imply all sink types (but not source types). This is simpler and sice we will never have a huge number of sink types it doesn't impact performance either. Changes to test results: - The call to `createDirectories` at `Test.java:87` is now correctly classified as a source candidate, having previously been erroneously excluded by a negative _sink_ characteristic. - The call to `compareTo` at `Test.java:48` is now erroneously classified as a source candidate; it should be suppressed by `IsSanitizerCharacteristic`, which is a negative sink characteristic, but should really be a negative source characteristic. - In framework mode, several endpoints are now erroneously classified as source candidates even though they have neutral models, because `NeutralModelCharacteristic` is currently only a negative sink characteristic and not a negative source characteristic.
In particular, `IsSanitizerCharacteristic` is a negative _source_ characteristic (not a negative sink characteristic), while `NeutralModelCharacteristic` is both. This eliminates the erroneous test results.
…e to take the extensibleType into account.
java/ql/automodel/src/AutomodelApplicationModeExtractNegativeExamples.ql
Fixed
Show fixed
Hide fixed
To do this PR review justice, I will block myself some focus time tomorrow. Am I blocking your progress? |
Not at all, this is not urgent in any way. Happy to pair on reviewing if it helps! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a fantastic PR. It seems like it manages to improve the quality of results, as well as simplifying code. I have some questions and comments, but none of them necessarily require any substantial follow-up work.
Thank you!
java/ql/automodel/src/AutomodelApplicationModeExtractNegativeExamples.ql
Outdated
Show resolved
Hide resolved
java/ql/automodel/src/AutomodelApplicationModeExtractNegativeExamples.ql
Outdated
Show resolved
Hide resolved
java/ql/automodel/src/AutomodelApplicationModeExtractNegativeExamples.ql
Show resolved
Hide resolved
java/ql/automodel/src/AutomodelApplicationModeExtractNegativeExamples.ql
Outdated
Show resolved
Hide resolved
java/ql/automodel/src/AutomodelFrameworkModeExtractNegativeExamples.ql
Outdated
Show resolved
Hide resolved
java/ql/automodel/src/AutomodelFrameworkModeExtractNegativeExamples.ql
Outdated
Show resolved
Hide resolved
...model/test/AutomodelFrameworkModeExtraction/AutomodelFrameworkModeExtractCandidates.expected
Outdated
Show resolved
Hide resolved
.../AutomodelApplicationModeExtraction/AutomodelApplicationModeExtractNegativeExamples.expected
Outdated
Show resolved
Hide resolved
Thank you very much for the review, @kaeluka! I have addressed your suggestions, please take a look. |
2eb9de6
to
8614d7b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perfect, LGTM! Thank you!
Previously, we would apply negative characteristics to both source and sink candidates. Some negative characteristics had ad-hoc code to only consider sinks, but this wasn't done consistently, and was at odds with the semantics implied by class names like
NotASinkCharacteristic
. This PR aims to fix that, which turns out to be a little more involved than I had initially hoped.The core of the PR is the third commit: endpoints are now associated with their potential endpoint types, and the logic for applying negative characteristics checks that candidates are only excluded based on negative characteristics for the right kind. So if we are looking for source candidates, we will no longer exclude an endpoint because there is a negative characteristic saying that this endpoint cannot be a sink candidate.
Of course this breaks a few things, which are fixed in follow-up commits, which make existing negative characteristics more precise and in particular nail down whether they should apply to source or sink candidates or both. To make this easier, I also got rid of negative sink types, which we previously used to signal that an endpoint shouldn't be a candidate for any sink type; there aren't very many sink types, so we can instead simply enumerate the sink types for which the endpoint shouldn't be a candidate.
I have tested these changes by running all six extraction queries on
apache/geode
. The following table shows the number of results and the query runtime, both before and after this change:As you can see, performance is largely unchanged. The number of positive examples is unchanged as well, and I have verified that the set of examples is indeed the same. The other queries yield different results.
For the application-mode queries it is unfortunately not easy to compare results because they do sampling, so even a very small change in actual results can yield big apparent differences. For the framework-mode queries, I looked at the 524 new results, and verified that they are due to parameters of
is
-style methods,exists
methods, and exception-related methods, which previously were spuriously suppressed. Correspondingly, these parameters are no longer considered negative examples, which explains the change of results for that query.Since this is a good example of the general impact of this PR, let me elaborate in slightly more detail.
We have a negative characteristic that says that any endpoint to do with a method with Boolean return type and a name starting with
is
should be suppressed. This stems from a time when we only looked for sink candidates, and the thinking was that arguments to such methods rarely if ever are sinks.But with source candidates the situation is different: while a parameter of such a method should not be a sink candidate, it does make sense to consider it as a source candidate (in framework mode). We still don't want the result to be a source candidate (since it's just a Boolean), though.
The new characteristic implements this reasoning, leading to fewer useless source candidates.