Ruby: type-tracking and API edges through simple library callables #10375

asgerf · 2022-09-12T08:09:43Z

This PR ensures that simple library callables whose callee does not depend on API graphs can be seen by type-tracking and API graphs. "Simple" in this case means it can be reduced to a store or load edge at the call site, though in the future we may want to add load-store and basic edges as well.

For example, a summary of the form `Argument[0].Element[0];ReturnValue` will be seen as:

a load step in type-tracking, reading the content set corresponding to `Element[0]`, and
an edge in the API graph labelled with the content corresponding to `Element[0]`.
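
To make the notion of a "simple" summary concrete, here is a minimal Python sketch. It is not the CodeQL implementation: the `Step` type and the `classify` function are hypothetical, and the sketch assumes the two-part `input;output` form used in the example above, with components separated by dots. A summary counts as simple when exactly one side carries a single content component.

```python
from typing import NamedTuple, Optional


class Step(NamedTuple):
    kind: str     # "load" or "store"
    source: str   # where the value comes from, e.g. "Argument[0]"
    target: str   # where the value ends up, e.g. "ReturnValue"
    content: str  # the content being read or written, e.g. "Element[0]"


def classify(summary: str) -> Optional[Step]:
    """Reduce an 'input;output' summary to a single load or store, if possible."""
    input_spec, output_spec = summary.split(";")
    in_parts = input_spec.split(".")
    out_parts = output_spec.split(".")
    # Content only on the input side: the callable reads it (load).
    if len(in_parts) == 2 and len(out_parts) == 1:
        return Step("load", in_parts[0], out_parts[0], in_parts[1])
    # Content only on the output side: the callable writes it (store).
    if len(in_parts) == 1 and len(out_parts) == 2:
        return Step("store", in_parts[0], out_parts[0], out_parts[1])
    # Load-store and basic edges are not handled in this sketch.
    return None


print(classify("Argument[0].Element[0];ReturnValue"))
```

Under these assumptions, a summary with content on both sides or on neither side returns `None`, which corresponds to the load-store and basic edges that the description leaves for future work.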

Ruby is the first language to use the string-based flow summary approach combined with type-tracking and API graphs, but the two systems hadn't been fully integrated. In particular, Ruby has so far been modelling the standard library using summaries that are invisible to type-tracking and API graphs, which leads to incompleteness compared to generating the flow edges directly. I believe this PR bridges the gap, so we can get the best of both worlds while modelling everything with flow summaries.

Evaluation shows 216 new call edges, at a modest performance cost. I've investigated the performance overhead a bit, but couldn't get it any lower than it is now - partly because we are just tracking more flow now.

I'm running an evaluation for Python as well, since some shared code was affected.

github-advanced-security

Found 46 potential problems in the proposed changes. Check the Files changed tab for more details.

hvitved

Looks great Asger! A few comments.

hvitved · 2022-09-14T08:06:42Z

ruby/ql/lib/codeql/ruby/ApiGraphs.qll

+     * Gets a node representing the `content` stored on the base object.
+     */
+    Node getContent(DataFlow::Content content) {
+      result = this.getASuccessor(Label::content(content))


Couldn't we use ContentSets in the API graph instead, and then let this predicate use getAReadContent when this and Node are use nodes, and getAStoreContent when they are def nodes?

getAReadContent can sometimes have a huge fan-out, which I would like to not have materialized in the graph itself.

It is done like this exactly to avoid the fan-out from getAReadContent.

Here's a breakdown of how the graph is generated and what ends up being matched:

Description	Edge label	Matched by
Use node reading a known element `n`	`TKnownElementContent(n)`	`Element[n]` and `Element[any]`
Def node storing a known element `n`	`TKnownElementContent(n)`	`Element[n]` and `Element[any]`
Use node reading an unknown element	`TUnknownElementContent`	`Element[?]` and `Element[any]`
Def node storing an unknown element	`TUnknownElementContent`	`Element[?]` and `Element[any]`

Element[n] does not match TUnknownElementContent, and the model must thus use Element[?,n] if this behaviour is wanted. This seems consistent with how existing Ruby models are required to mention Element[?] explicitly.
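
The matching rules in this table can be sketched as a small Python predicate. This is a toy model, not the CodeQL code: the label encoding (`("known", n)` / `("unknown", None)`) and the function names are assumptions made for illustration.

```python
import re

KNOWN = "known"
UNKNOWN = "unknown"


def parse_element_token(token):
    """Split e.g. 'Element[?,0]' into its comma-separated arguments ['?', '0']."""
    m = re.fullmatch(r"Element\[(.*)\]", token)
    if m is None:
        raise ValueError(f"not an Element token: {token!r}")
    return [arg.strip() for arg in m.group(1).split(",")]


def matches(token, label):
    """Does a model token match an edge label, per the table above?

    `label` is (KNOWN, n) for TKnownElementContent(n)
    or (UNKNOWN, None) for TUnknownElementContent.
    """
    kind, index = label
    for arg in parse_element_token(token):
        if arg == "any":
            return True  # Element[any] matches every element edge
        if arg == "?" and kind == UNKNOWN:
            return True  # Element[?] matches only the unknown edge
        if kind == KNOWN and arg == str(index):
            return True  # Element[n] matches only the known edge n
    return False


print(matches("Element[0]", (UNKNOWN, None)))    # Element[n] alone misses unknown elements
print(matches("Element[?,0]", (UNKNOWN, None)))  # the model has to mention ? explicitly
```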

I believe this is the best integration we can hope for without changing how ContentSet works.

I tried storing ContentSet on the edge label but it doesn't really work out nicely. getAReadContent() returns a Content, but the edges are labelled with a ContentSet so we have to map it back to a ContentSet to look up in the edge relation. To ensure an efficient join there we'd end up factoring out a relation indexed on the Content, which basically corresponds to the edge relation we're using now.
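
As a rough illustration of this design choice, the Python sketch below indexes edges by a single content and expands a content set only at lookup time, so the fan-out is never stored in the edge relation. The `ToyApiGraph` class and its methods are hypothetical and greatly simplified; the caller supplies the list of contents that a content set may read.

```python
from collections import defaultdict


class ToyApiGraph:
    def __init__(self):
        # (predecessor, content) -> successors; every label is a single content
        self._edges = defaultdict(set)

    def add_edge(self, pred, content, succ):
        self._edges[(pred, content)].add(succ)

    def successors(self, pred, read_contents):
        """Follow the edges for each content that a content set may read."""
        result = set()
        for content in read_contents:
            result |= self._edges.get((pred, content), set())
        return result

    def edge_count(self):
        return sum(len(succs) for succs in self._edges.values())


graph = ToyApiGraph()
graph.add_edge("arr", ("known", 0), "x")
graph.add_edge("arr", ("unknown", None), "y")

# A content set that reads element 0 or an unknown element is expanded here,
# at lookup time, while the stored relation keeps one edge per content.
print(graph.successors("arr", [("known", 0), ("unknown", None)]))
print(graph.edge_count())
```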

Thanks for the detailed explanation.

After merging with #10574 I believe the table is updated as follows:

the first two rows can also be matched by Element[n!]

the last two rows can also be matched by Element[k] for any constant k

Description	Edge label	Matched by
Use node reading a known element `n`	`TKnownElementContent(n)`	`Element[n]`, `Element[n!]`, and `Element[any]`
Def node storing a known element `n`	`TKnownElementContent(n)`	`Element[n]`, `Element[n!]`, and `Element[any]`
Use node reading an unknown element	`TUnknownElementContent`	`Element[k]` for any `k`, `Element[?]` and `Element[any]`
Def node storing an unknown element	`TUnknownElementContent`	`Element[k]` for any `k`, `Element[?]` and `Element[any]`

Previously it was the model's responsibility to mention Element[?] if unknown reads/stores should be found.

Now models are responsible for using Element[n!] if this behaviour is not wanted.

Again this seems consistent with how Ruby models work after #10574, so I'm cautiously optimistic that everything just works after this change.
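
The updated matching rules can be sketched in Python as follows. This is a toy model under the same caveats as any illustration here: the label encoding and the function names are assumptions, and only single-argument tokens are handled.

```python
import re


def element_arg(token):
    """Extract the argument of a single-argument token such as 'Element[0!]'."""
    m = re.fullmatch(r"Element\[(.+)\]", token)
    if m is None:
        raise ValueError(f"not an Element token: {token!r}")
    return m.group(1)


def matches_after_merge(token, label):
    """Matching per the updated table.

    `label` is ("known", n) for TKnownElementContent(n)
    or ("unknown", None) for TUnknownElementContent.
    """
    kind, index = label
    arg = element_arg(token)
    if arg == "any":
        return True                       # matches every element edge
    if arg == "?":
        return kind == "unknown"          # unknown edge only
    if arg.endswith("!"):
        # Element[n!]: exactly the known element n, never the unknown edge
        return kind == "known" and arg[:-1] == str(index)
    # Plain Element[k]: the known element k, plus the unknown edge
    return kind == "unknown" or arg == str(index)


print(matches_after_merge("Element[0]", ("unknown", None)))   # now matched by default
print(matches_after_merge("Element[0!]", ("unknown", None)))  # opt out with n!
```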

asgerf · 2022-09-22T14:08:13Z

Thanks for the review so far @hvitved.

I have rebased on top of the recent call graph changes, resolving a conflict with #10531, and moved the change to trackUseNode later in history to make it easier to evaluate in isolation.

The first new commit is Ruby: expand test case to reveal mismatching forward/backward flow.

asgerf · 2022-09-22T14:10:23Z

New evaluation

asgerf · 2022-09-26T12:33:16Z

We've got quite a few unresolved threads open. @hvitved could you 'mark as resolved' any conversations you feel have been addressed.

From my perspective the two outstanding items are:

trackUseNode pruning: Waiting for this evaluation.
summarySetterStep: Waiting for your feedback on whether to add boolean valuePreserving to the relevant predicates.

hvitved · 2022-09-26T19:28:28Z

> @hvitved could you 'mark as resolved' any conversations you feel have been addressed.

Done.

asgerf · 2022-09-27T11:54:10Z

Force-pushed to remove some revert commits that I had accidentally pushed as part of the investigation in this thread.

asgerf · 2022-09-28T08:50:33Z

Rebased to fix conflict in DataFlowDispatch.qll: the call to getACallSimple now appears in viableLibraryCallable.

asgerf · 2022-09-28T09:47:15Z

Merging with #10574 went smoother than expected, and at first glance it seems no real changes are necessary. I take that as an indicator that we're doing it right, namely type-trackers storing a ContentSet and API graph edges labelled with Content.

Running another evaluation, hoping it doesn't prove me wrong 😄

hvitved

Two last comments (sorry for not catching earlier).

asgerf · 2022-09-29T10:19:03Z

The evaluation shows a slightly increased cost for ruby/ruby after merging with #10574. I think part of the reason is that there are about twice as many content sets now, and ruby/ruby has a lot of content sets. I tried restricting the content sets we care about with this commit and although the tuple counts showed a great reduction, it did not seem to help. I'd prefer to move ahead with the PR as it is.

hvitved · 2022-09-29T11:29:29Z

> I tried restricting the content sets we care about with this commit and although the tuple counts showed a great reduction, it did not seem to help. I'd prefer to move ahead with the PR as it is.

I do like that commit, in particular because TStepSummary is cached. We could make the two branches even smaller via

private predicate compatibleStoreLoadContents(TypeTrackerContent storeContents, TypeTrackerContent loadContents) {
  basicStoreStep(_, _, storeContents) and
  basicLoadStep(_, _, loadContents) and
  compatibleContents(storeContents, loadContents)
}
...
    StoreStep(TypeTrackerContent content) { compatibleStoreLoadContents(content, _)  } or
    LoadStep(TypeTrackerContent content) { compatibleStoreLoadContents(_, content) } or

asgerf · 2022-09-29T13:58:05Z

I measured the size of StepSummary with each change on ruby/ruby:

Commit	Size
Baseline `dc03557`	4,111,472
Size restriction from asgerf@`1c4c058`	4,633
Additional size restriction from @hvitved's comment	4,428

Unfortunately neither version is compatible with TypeTracker.startInContent as a model might start type-tracking in a property that has a load in the database but not a store. The predicate isn't currently used in Ruby, but has uses in JS and Python.
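
The interaction between the size restriction and startInContent can be illustrated with a toy Python model. This is only a sketch under stated assumptions: contents are plain strings, `restrict` is a hypothetical stand-in for a predicate like `compatibleStoreLoadContents`, and string equality stands in for `compatibleContents`.

```python
def restrict(store_contents, load_contents, compatible):
    """Keep only the store and load contents that pair up with each other."""
    pairs = {
        (store, load)
        for store in store_contents
        for load in load_contents
        if compatible(store, load)
    }
    kept_stores = {store for store, _ in pairs}
    kept_loads = {load for _, load in pairs}
    return kept_stores, kept_loads


# "foo" is both stored and loaded in the database; "bar" is only ever loaded.
stores = {"foo"}
loads = {"foo", "bar"}
kept_stores, kept_loads = restrict(stores, loads, lambda s, l: s == l)

# "bar" has no matching store, so its load step is pruned. A type tracker that
# starts in content "bar" would still need that load step to make progress.
print(sorted(kept_stores), sorted(kept_loads))
```

In this model the pruning is sound only when every tracked content enters through a store step, which is the assumption that starting in a content violates.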

I've pushed another commit that makes the restriction on type-tracker compatible with startInContent and started another evaluation (not that I expect it to look better than last time).

hvitved

Thanks for your hard work, and patience, on this one!

asgerf · 2022-09-30T10:21:37Z

Evaluation looks similar to the last one.

The last Python evaluation shows a median slow-down of 1%, which is a bummer given that this PR does nothing for Python. We can't do language-dependent join order tweaks in shared code, and the best join order is slightly different since Ruby has ContentSet and Python doesn't.

asgerf · 2022-09-30T12:25:08Z

Another evaluation of Python including the latest changes shows the same result, and I got a 👍 from the Python team that we're ok with this.

asgerf added the no-change-note-required and Ruby labels Sep 12, 2022

github-actions bot added the Python label Sep 12, 2022

github-advanced-security bot found potential problems Sep 12, 2022

asgerf force-pushed the rb/summarize-loads-v2 branch from 9268c4b to e52553b Compare September 12, 2022 10:09

github-advanced-security bot found potential problems Sep 12, 2022

asgerf force-pushed the rb/summarize-loads-v2 branch from e52553b to 275b2a9 Compare September 12, 2022 10:55

github-advanced-security bot found potential problems Sep 12, 2022

asgerf force-pushed the rb/summarize-loads-v2 branch from d980c31 to 00bb877 Compare September 13, 2022 09:58

asgerf marked this pull request as ready for review September 13, 2022 13:47

asgerf requested review from a team as code owners September 13, 2022 13:47

hvitved requested changes Sep 14, 2022

calumgrant requested review from tausbn and yoff September 20, 2022 09:08

asgerf force-pushed the rb/summarize-loads-v2 branch 3 times, most recently from d5a06ad to 51ceef8 Compare September 22, 2022 12:23

github-advanced-security bot found potential problems Sep 22, 2022

hvitved reviewed Sep 23, 2022

hvitved reviewed Sep 23, 2022

asgerf force-pushed the rb/summarize-loads-v2 branch from 0275ad5 to 463c6ca Compare September 27, 2022 11:52

asgerf added 3 commits September 28, 2022 10:49

Ruby: Add getACallSimple and use it for arrays and hashes

53ef054

Ruby: use IPA type for type tracker contents

f1b99e8

fixup qldoc in OptionalTypeTrckerContent

Ruby: generate type-tracking steps from simple summary specs

cd9cddf

asgerf added 5 commits September 28, 2022 10:49

Ruby: revert trackUseNode to idiomatic type-tracking

665ee81

The optimizations done here now seem to backfire and cause more problems than they fix.

Ruby: remove unneeded qualified AST import

ce3665d

Ruby: remove unneeded import

14e384a

Ruby: move OptionalContentSet to TypeTrackerSpecific.qll

e1dfed0

Ruby: add missing qldoc

e56630a

asgerf force-pushed the rb/summarize-loads-v2 branch from 463c6ca to e56630a Compare September 28, 2022 08:49

hvitved mentioned this pull request Sep 28, 2022

Ruby: Fix spurious flow through reverse stores #10574

Merged

asgerf added 4 commits September 28, 2022 11:11

Merge branch 'main' into rb/summarize-loads-v2

ee7dea1

This only fixes superficial conflicts with github#10574 semantic conflicts will be addressed in later commits

Ruby: Update TypeTracker.expected

ce1c258

Ruby: update API graph inline test to match output

9716572

Ruby: expand on type-tracking test a bit

fea47c8

hvitved reviewed Sep 28, 2022

asgerf and others added 4 commits September 28, 2022 15:18

Ruby: mention TNoContentSet is only used by type-tracking

8704cce

Ruby: reuse argumentPositionMatch

76cab23

Ruby: Include With(out)Element in isElementBody

3af3772

Merge branch 'main' into rb/summarize-loads-v2

dc03557

asgerf added 3 commits September 29, 2022 14:10

Ruby: Restrict summaries and type trackers to relevant contents

f1de5a2

Ruby: ensure pruning works with startInContent

ae60b0a

Python: sync TypeTracker.qll

ed36f19

hvitved approved these changes Sep 29, 2022

asgerf merged commit 6e1914a into github:main Sep 30, 2022

asgerf mentioned this pull request Oct 3, 2022

Ruby: more type-tracking steps #10650

Merged

hvitved mentioned this pull request Oct 4, 2022

Ruby/Python: Cache more type tracking predicates #10664

Merged
