Skip to content

ATM: undo unsound performance optimizations #8470

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

esbena
Copy link
Contributor

@esbena esbena commented Mar 16, 2022

For an ordinary path-problem query, it is a requirement that at least one sink exists, otherwise there is nothing to alert on.
Thus the optimization with checking isSink(_, this, _) pruning in Configuration::hasFlow is sound.

For programs without any relevant sinks (which is likely to be the case when ATM is used), the Configuration::hasFlow calls will not hold due to the above pruning step.
The optimizations removed in this commit are thus unsound for programs without any relevant sinks.

Relevant pings: @henrymercer @adityasharad

Hopefully performance works out without this 🤞🏻🤞🏻.

Impact

This means that it now is possible to predict sinks in programs without any known sinks(!!!).

Testing

Testing this is presumably tricky at the moment due to the required presence of a mlmodel. So I include my own manual test case for completeness and posterity.

Example database sources

// danger_sink_call.js
danger(sink)
// flow_to_predicted_sink.ts
// carefully selected file content that presently predicts `req.params.id` to be a sink
import { Request } from "express";

module.exports = function retrieveBasket() {
  return (req: Request) => {
    models.Basket.findOne({
      where: { id: req.params.id }
    });
  };
};

Example query that produces no/some results without/with this PR.

import DataFlow::PathGraph
import experimental.adaptivethreatmodeling.TaintedPathATM
import experimental.adaptivethreatmodeling.EndpointFeatures
import experimental.adaptivethreatmodeling.EndpointScoring

from DataFlow::Node scored, EndpointType type, string basename, float scoreNum, string description
where
  description = type.getDescription() and
  ModelScoring::endpointScores(scored, type.getEncoding(), scoreNum) and
  scored.getFile().getBaseName() = basename and
  basename = ["danger_sink_call.js", "flow_to_predicted_sink.ts"]
select scored, basename, description, scoreNum

Confirming that hasFlow(...) works for the full ATM query without any known sinks

Run TaintedPathATM.ql.

For an ordinary path-problem query, it is a requirement that at least one sink exists, otherwise there is nothing to alert on.
Thus the optimization with checking `isSink(_, this, _)` pruning in `Configuration::hasFlow` is sound.

For programs without any relevant sinks (which is likely to be the case when ATM is used), the `Configuration::hasFlow` calls will not hold due to the above pruning step.
The optimizations removed in this commit are thus unsound for programs without any relevant sinks.
@esbena esbena added the Awaiting evaluation Do not merge yet, this PR is waiting for an evaluation to finish label Mar 16, 2022
@esbena esbena requested a review from a team March 16, 2022 21:22
@github-actions github-actions bot added the JS label Mar 16, 2022
@adityasharad
Copy link
Collaborator

I think this makes sense. The goal is to discover new sinks, so we cannot restrict the analysis to only consider known sinks.

@esbena
Copy link
Contributor Author

esbena commented May 23, 2022

Marking as draft due to many other in-flight ATM changes.

@esbena esbena marked this pull request as draft May 23, 2022 12:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Awaiting evaluation Do not merge yet, this PR is waiting for an evaluation to finish JS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants