Skip to content

Ruby: more type-tracking steps #10650

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 37 commits into from
Oct 5, 2022
Merged

Ruby: more type-tracking steps #10650

merged 37 commits into from
Oct 5, 2022

Conversation

asgerf
Copy link
Contributor

@asgerf asgerf commented Sep 30, 2022

Expands on the work in #10375 to add more type-tracking steps.

In particular, we are now able to generate type-tracking steps in more cases:

  • The input and/or output can now be mapped to a parameter or return of a callback/block.
  • WithContent and WithoutContent are now supported with precision that's more appropriate for type tracking.
  • When both input and output end with a content, we can now map it to a load-store step.

There are a couple of performance tweaks along the way, resulting in mostly neutral performance overall.

I found the easiest way to leverage the existing test suite was to add inline test that rely on type-tracking, and simply let the test output document the different between DataFlow::Configuration flow and type-tracking flow.

Evaluation shows mostly neutral performance and around 21,000 new call edges.

I triaged some of the changes in an earlier evaluation, and the only spurious call edge I could find was the call to delete! which I believe is ultimately because a yield call resolved to a callback coming from a different caller. (In JS, plain callbacks are not part of the static call graph and are handled specially in type-tracking - we could do the same in Ruby but it doesn't seem urgent.)

There’s still some work to be done, but given the performance-related commits in here, I’d prefer to merge sooner rather than later, so we don’t end up redoing or undoing each other’s performance work.

Edit: updated text to reflect final state of PR

@github-actions github-actions bot added the Ruby label Sep 30, 2022
@asgerf asgerf force-pushed the rb/summarize-more branch 3 times, most recently from 3c799e7 to d1d2fae Compare October 3, 2022 10:40
@asgerf asgerf force-pushed the rb/summarize-more branch 3 times, most recently from cc09b73 to 31378ec Compare October 3, 2022 12:25
@github-actions github-actions bot added the Python label Oct 3, 2022
@asgerf asgerf added the no-change-note-required This PR does not need a change note label Oct 3, 2022
@asgerf asgerf marked this pull request as ready for review October 3, 2022 14:25
@asgerf asgerf requested review from a team as code owners October 3, 2022 14:25
@asgerf asgerf force-pushed the rb/summarize-more branch from 4a20227 to 9485940 Compare October 4, 2022 09:15
@asgerf
Copy link
Contributor Author

asgerf commented Oct 4, 2022

Rebased to resolve conflict with #10664. Started another evaluation.

@asgerf
Copy link
Contributor Author

asgerf commented Oct 4, 2022

New evaluation

Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial review comments. We should be able to revert the Hash changes once #10686 is in.

@asgerf
Copy link
Contributor Author

asgerf commented Oct 5, 2022

Thanks for the review! I'll work on getting type-tracking to rely on the Argument[hash-splat]-based summary instead, to replace the hand-written steps for Hash.[].

@asgerf
Copy link
Contributor Author

asgerf commented Oct 5, 2022

New evaluation looks... unexpectedly good. We now gain 21,000 call edges at a very modest cost. Previously we only gained 2k call edges. I'll take some time to investigate exactly how the recent changes caused this change.

@hvitved
Copy link
Contributor

hvitved commented Oct 5, 2022

I had to specialise handling of the Hash.[] summary as it would otherwise generate hundreds of useless edges for each element in a hash literal (the number depends on the number of constants in the program). For example, on one project a literal { foo: bar } would map to 648 load-store steps from the foo: bar pair into the hash literal itself. The specialised version simply adds a store step directly from bar to the hash literal. I’ve added hooks so the specialisation happens in the model itself, not as a special case in the type-tracking library.

You may want to remove this from the description now.

Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this LGTM. Some comments.

@@ -381,7 +381,7 @@ private module Cached {
n instanceof SynthReturnNode
or
// Needed for stores in type tracking
TypeTrackerSpecific::postUpdateStoreStep(_, n, _)
TypeTrackerSpecific::storeStepIntoSourceNode(_, n, _)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I belive we should remove

not n instanceof SynthHashSplatParameterNode

above as well.

@@ -1,6 +1,7 @@
/** Provides flow summaries for the `Hash` class. */

private import codeql.ruby.AST
private import codeql.ruby.CFG as Cfg
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this import.

* of the standard library, and their implementations may not depend on API graphs.
* For query-specific steps, consider including the custom steps in the type-tracking predicate itself.
*/
class TypeTrackingStep extends TUnit {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not convinced that we should add this, especially as it is not used presently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine, let's remove it for now

nodeFrom = evaluateSummaryComponentLocal(call, input) and
nodeTo = evaluateSummaryComponentLocal(call, output)
nodeFrom = evaluateSummaryComponentStackLocal(callable, call, input) and
nodeTo = evaluateSummaryComponentStackLocal(callable, call, output)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a store step induced from a flow summary

input = Argument[0] and
output = Argument[1].Element[?]

and a call to the relevant method

foo(x, y)

doesn't this yield a store-step x -store-> y, while it should have been x -store-> [post] y?

The same for read-store steps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type tracking isn't supposed to be flow sensitive. The intent is for the store to ultimately target the local sources of y and flow from there to its uses.

There's some pre-existing code here about post-updates, which I don't really agree with, but for this PR I'm trying not to change too many things at once.

}

pragma[nomagic]
private predicate hasWithoutContentSummary(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work with compound WithoutContents (see #10691)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If at most one of those WithoutContent has an associated ContentFilter then yes, because the rest are ignored and treated as ordinary flow (in evaluateSummaryComponentStackLocal).

)
}

/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-paste error (mentions WithoutContent.

}

pragma[nomagic]
private predicate hasWithContentSummary(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as for hasWithoutContentSummary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, doesn't handle compound WithContent.

filter = getFilterFromWithoutContentStep(content) and
not isNonLocal(input.head()) and
not isNonLocal(output.head()) and
input != output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This restriction is super important, as otherwise use-use flow will ignore it (that's what prohibitsUseUseFlow is there for in data flow). Perhaps highlight that with a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type-tracking doesn't depend on prohibitsUseUseFlow. It's simply here to clean up some steps that would have no effect for type-tracking.

hvitved
hvitved previously approved these changes Oct 5, 2022
@asgerf
Copy link
Contributor Author

asgerf commented Oct 5, 2022

It seems the large number of extra call edges is a result of merging with @aibaars's PR #10559. The added summaries can now be used by type-tracking to great effect. 💪

@asgerf asgerf merged commit 387e575 into github:main Oct 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-change-note-required This PR does not need a change note Python Ruby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants