Skip to content

Automodel: Apply negative characteristics only to endpoints of the right kind. #15326

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jan 17, 2024

Conversation

max-schaefer
Copy link
Contributor

Previously, we would apply negative characteristics to both source and sink candidates. Some negative characteristics had ad-hoc code to only consider sinks, but this wasn't done consistently, and was at odds with the semantics implied by class names like NotASinkCharacteristic. This PR aims to fix that, which turns out to be a little more involved than I had initially hoped.

The core of the PR is the third commit: endpoints are now associated with their potential endpoint types, and the logic for applying negative characteristics checks that candidates are only excluded based on negative characteristics for the right kind. So if we are looking for source candidates, we will no longer exclude an endpoint because there is a negative characteristic saying that this endpoint cannot be a sink candidate.

Of course this breaks a few things, which are fixed in follow-up commits, which make existing negative characteristics more precise and in particular nail down whether they should apply to source or sink candidates or both. To make this easier, I also got rid of negative sink types, which we previously used to signal that an endpoint shouldn't be a candidate for any sink type; there aren't very many sink types, so we can instead simply enumerate the sink types for which the endpoint shouldn't be a candidate.

I have tested these changes by running all six extraction queries on apache/geode. The following table shows the number of results and the query runtime, both before and after this change:

query results before (runtime) results after (runtime)
application mode
  - candidates 34,435 (460s) 25,417 (406s)
  - positive examples 3165 (5s) 3165 (2s)
  - negative examples 586 (408s) 519 (186s)
framework mode
  - candidates 39,802 (10s) 40,326 (10s)
  - positive examples 0 (0s) 0 (0s)
  - negative examples 6742 (13s) 6215 (2s)

As you can see, performance is largely unchanged. The number of positive examples is unchanged as well, and I have verified that the set of examples is indeed the same. The other queries yield different results.

For the application-mode queries it is unfortunately not easy to compare results because they do sampling, so even a very small change in actual results can yield big apparent differences. For the framework-mode queries, I looked at the 524 new results, and verified that they are due to parameters of is-style methods, exists methods, and exception-related methods, which previously were spuriously suppressed. Correspondingly, these parameters are no longer considered negative examples, which explains the change of results for that query.

Since this is a good example of the general impact of this PR, let me elaborate in slightly more detail.

We have a negative characteristic that says that any endpoint to do with a method with Boolean return type and a name starting with is should be suppressed. This stems from a time when we only looked for sink candidates, and the thinking was that arguments to such methods rarely if ever are sinks.

But with source candidates the situation is different: while a parameter of such a method should not be a sink candidate, it does make sense to consider it as a source candidate (in framework mode). We still don't want the result to be a source candidate (since it's just a Boolean), though.

The new characteristic implements this reasoning, leading to fewer useless source candidates.

Max Schaefer added 15 commits January 11, 2024 11:20
This reflects the fact that these predicates also deal with source candidates.
`Files.list` has a taint step from its first argument to its result, so that first argument should not be considered a sink candidate (and it is not). However, due to a bug in `IsMaDTaintStepCharacteristic` it is also not considered a source candidate, which is wrong: as the example shows, if that argument is a call we do very much want to consider it as a source candidate.
…se when determining candidates.

This prevents us from associating a sink candidate with a source type and vice versa.

However, this does not fix the problem of negative characteristics for sink types excluding source candidates.
Instead of positively implying the negative sink type, negative sink characteristics now negatively imply all sink types (but not source types). This is simpler and sice we will never have a huge number of sink types it doesn't impact performance either.

Changes to test results:

- The call to `createDirectories` at `Test.java:87` is now correctly classified as a source candidate, having previously been erroneously excluded by a negative _sink_ characteristic.
- The call to `compareTo` at `Test.java:48` is now erroneously classified as a source candidate; it should be suppressed by `IsSanitizerCharacteristic`, which is a negative sink characteristic, but should really be a negative source characteristic.
- In framework mode, several endpoints are now erroneously classified as source candidates even though they have neutral models, because `NeutralModelCharacteristic` is currently only a negative sink characteristic and not a negative source characteristic.
In particular, `IsSanitizerCharacteristic` is a negative _source_ characteristic (not a negative sink characteristic), while `NeutralModelCharacteristic` is both.

This eliminates the erroneous test results.
…e to take the extensibleType into account.
@max-schaefer max-schaefer requested a review from kaeluka January 15, 2024 13:32
@max-schaefer max-schaefer requested a review from a team as a code owner January 15, 2024 13:32
@github-actions github-actions bot added the Java label Jan 15, 2024
@kaeluka
Copy link
Contributor

kaeluka commented Jan 16, 2024

To do this PR review justice, I will block myself some focus time tomorrow. Am I blocking your progress?

@max-schaefer
Copy link
Contributor Author

Not at all, this is not urgent in any way. Happy to pair on reviewing if it helps!

Copy link
Contributor

@kaeluka kaeluka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a fantastic PR. It seems like it manages to improve the quality of results, as well as simplifying code. I have some questions and comments, but none of them necessarily require any substantial follow-up work.

Thank you!

@max-schaefer
Copy link
Contributor Author

Thank you very much for the review, @kaeluka! I have addressed your suggestions, please take a look.

@max-schaefer max-schaefer force-pushed the max-schaefer/automodel-negative-sink-models branch from 2eb9de6 to 8614d7b Compare January 17, 2024 14:30
Copy link
Contributor

@kaeluka kaeluka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perfect, LGTM! Thank you!

@max-schaefer max-schaefer merged commit 3ae4848 into main Jan 17, 2024
@max-schaefer max-schaefer deleted the max-schaefer/automodel-negative-sink-models branch January 17, 2024 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants