Automodel: Apply negative characteristics only to endpoints of the right kind. #15326

max-schaefer · 2024-01-15T13:32:16Z

Previously, we would apply negative characteristics to both source and sink candidates. Some negative characteristics had ad-hoc code to only consider sinks, but this wasn't done consistently, and was at odds with the semantics implied by class names like NotASinkCharacteristic. This PR aims to fix that, which turns out to be a little more involved than I had initially hoped.

The core of the PR is the third commit: endpoints are now associated with their potential endpoint types, and the logic for applying negative characteristics checks that candidates are only excluded based on negative characteristics for the right kind. So if we are looking for source candidates, we will no longer exclude an endpoint because there is a negative characteristic saying that this endpoint cannot be a sink candidate.

Of course this breaks a few things, which are fixed in follow-up commits, which make existing negative characteristics more precise and in particular nail down whether they should apply to source or sink candidates or both. To make this easier, I also got rid of negative sink types, which we previously used to signal that an endpoint shouldn't be a candidate for any sink type; there aren't very many sink types, so we can instead simply enumerate the sink types for which the endpoint shouldn't be a candidate.

I have tested these changes by running all six extraction queries on apache/geode. The following table shows the number of results and the query runtime, both before and after this change:

query	results before (runtime)	results after (runtime)
application mode
- candidates	34,435 (460s)	25,417 (406s)
- positive examples	3165 (5s)	3165 (2s)
- negative examples	586 (408s)	519 (186s)
framework mode
- candidates	39,802 (10s)	40,326 (10s)
- positive examples	0 (0s)	0 (0s)
- negative examples	6742 (13s)	6215 (2s)

As you can see, performance is largely unchanged. The number of positive examples is unchanged as well, and I have verified that the set of examples is indeed the same. The other queries yield different results.

For the application-mode queries it is unfortunately not easy to compare results because they do sampling, so even a very small change in actual results can yield big apparent differences. For the framework-mode queries, I looked at the 524 new results, and verified that they are due to parameters of is-style methods, exists methods, and exception-related methods, which previously were spuriously suppressed. Correspondingly, these parameters are no longer considered negative examples, which explains the change of results for that query.

Since this is a good example of the general impact of this PR, let me elaborate in slightly more detail.

We have a negative characteristic that says that any endpoint to do with a method with Boolean return type and a name starting with is should be suppressed. This stems from a time when we only looked for sink candidates, and the thinking was that arguments to such methods rarely if ever are sinks.

But with source candidates the situation is different: while a parameter of such a method should not be a sink candidate, it does make sense to consider it as a source candidate (in framework mode). We still don't want the result to be a source candidate (since it's just a Boolean), though.

The new characteristic implements this reasoning, leading to fewer useless source candidates.

This reflects the fact that these predicates also deal with source candidates.

`Files.list` has a taint step from its first argument to its result, so that first argument should not be considered a sink candidate (and it is not). However, due to a bug in `IsMaDTaintStepCharacteristic` it is also not considered a source candidate, which is wrong: as the example shows, if that argument is a call we do very much want to consider it as a source candidate.

…se when determining candidates. This prevents us from associating a sink candidate with a source type and vice versa. However, this does not fix the problem of negative characteristics for sink types excluding source candidates.

Instead of positively implying the negative sink type, negative sink characteristics now negatively imply all sink types (but not source types). This is simpler and sice we will never have a huge number of sink types it doesn't impact performance either. Changes to test results: - The call to `createDirectories` at `Test.java:87` is now correctly classified as a source candidate, having previously been erroneously excluded by a negative _sink_ characteristic. - The call to `compareTo` at `Test.java:48` is now erroneously classified as a source candidate; it should be suppressed by `IsSanitizerCharacteristic`, which is a negative sink characteristic, but should really be a negative source characteristic. - In framework mode, several endpoints are now erroneously classified as source candidates even though they have neutral models, because `NeutralModelCharacteristic` is currently only a negative sink characteristic and not a negative source characteristic.

In particular, `IsSanitizerCharacteristic` is a negative _source_ characteristic (not a negative sink characteristic), while `NeutralModelCharacteristic` is both. This eliminates the erroneous test results.

…e to take the extensibleType into account.

java/ql/automodel/src/AutomodelApplicationModeExtractNegativeExamples.ql

java/ql/automodel/src/AutomodelFrameworkModeExtractNegativeExamples.ql

kaeluka · 2024-01-16T14:14:16Z

To do this PR review justice, I will block myself some focus time tomorrow. Am I blocking your progress?

max-schaefer · 2024-01-16T14:21:00Z

Not at all, this is not urgent in any way. Happy to pair on reviewing if it helps!

kaeluka

I think this is a fantastic PR. It seems like it manages to improve the quality of results, as well as simplifying code. I have some questions and comments, but none of them necessarily require any substantial follow-up work.

Thank you!

java/ql/automodel/src/AutomodelSharedCharacteristics.qll

java/ql/automodel/src/AutomodelApplicationModeExtractNegativeExamples.ql

java/ql/automodel/src/AutomodelFrameworkModeExtractNegativeExamples.ql

...model/test/AutomodelFrameworkModeExtraction/AutomodelFrameworkModeExtractCandidates.expected

.../AutomodelApplicationModeExtraction/AutomodelApplicationModeExtractNegativeExamples.expected

max-schaefer · 2024-01-17T14:27:51Z

Thank you very much for the review, @kaeluka! I have addressed your suggestions, please take a look.

kaeluka

perfect, LGTM! Thank you!

Max Schaefer added 15 commits January 11, 2024 11:20

Rename isSinkCandidate (and a related predicate) to isCandidate.

8e429bd

This reflects the fact that these predicates also deal with source candidates.

Drop a conjunct which is now spurious.

bcf4f4f

Properly distinguish negative source and sink characteristics.

6e9c90a

In particular, `IsSanitizerCharacteristic` is a negative _source_ characteristic (not a negative sink characteristic), while `NeutralModelCharacteristic` is both. This eliminates the erroneous test results.

Make Unexploitable*Characteristic more precise.

9f443d4

Share some code.

76b8430

Eliminate GetCallable modules and use getCallable instead.

06ba5ea

Extend negative characteristics for exceptions to source models.

ea26e21

Rename a predicate.

45ca301

Refactor to avoid bad join order.

bb63fcd

Some more performance refactoring.

919330f

Remove a few getExtensibleType checks which are now unnecessary.

68cf9ac

When checking whether an endpoint has already been modelled, make sur…

3befce9

…e to take the extensibleType into account.

max-schaefer requested a review from kaeluka January 15, 2024 13:32

max-schaefer requested a review from a team as a code owner January 15, 2024 13:32

github-actions bot added the Java label Jan 15, 2024

github-advanced-security bot found potential problems Jan 15, 2024

View reviewed changes

java/ql/automodel/src/AutomodelApplicationModeExtractNegativeExamples.ql Fixed Show fixed Hide fixed

java/ql/automodel/src/AutomodelFrameworkModeExtractNegativeExamples.ql Fixed Show fixed Hide fixed

Max Schaefer added 2 commits January 15, 2024 13:44

Autoformat.

fee4407

Fix omittable exists.

90a4552

kaeluka reviewed Jan 17, 2024

View reviewed changes

.../AutomodelApplicationModeExtraction/AutomodelApplicationModeExtractNegativeExamples.expected Outdated Show resolved Hide resolved

Address review feedback.

8614d7b

max-schaefer force-pushed the max-schaefer/automodel-negative-sink-models branch from 2eb9de6 to 8614d7b Compare January 17, 2024 14:30

kaeluka approved these changes Jan 17, 2024

View reviewed changes

max-schaefer merged commit 3ae4848 into main Jan 17, 2024

max-schaefer deleted the max-schaefer/automodel-negative-sink-models branch January 17, 2024 15:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Automodel: Apply negative characteristics only to endpoints of the right kind. #15326

Automodel: Apply negative characteristics only to endpoints of the right kind. #15326

max-schaefer commented Jan 15, 2024

kaeluka commented Jan 16, 2024

max-schaefer commented Jan 16, 2024

kaeluka left a comment

max-schaefer commented Jan 17, 2024

kaeluka left a comment

Automodel: Apply negative characteristics only to endpoints of the right kind. #15326

Automodel: Apply negative characteristics only to endpoints of the right kind. #15326

Conversation

max-schaefer commented Jan 15, 2024

kaeluka commented Jan 16, 2024

max-schaefer commented Jan 16, 2024

kaeluka left a comment

Choose a reason for hiding this comment

max-schaefer commented Jan 17, 2024

kaeluka left a comment

Choose a reason for hiding this comment