Skip to content

Data flow: Fix bad join-order when getAReadContent has large fan-in #10577

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 27, 2022

Conversation

hvitved
Copy link
Contributor

@hvitved hvitved commented Sep 26, 2022

Before (terminated before completion)

Evaluated relational algebra for predicate DataFlowImplForHttpClientLibraries#c536b619::store#5#fffff@e5ef07bh with tuple counts:
            151500     ~0%    {4} r1 = SCAN DataFlowImplCommon#4f8df883::Cached::store#4#ffff OUTPUT In.1, In.0, In.2, In.3
            150500     ~0%    {5} r2 = JOIN r1 WITH DataFlowImplCommon#4f8df883::Cached::MkTypedContent#fff_20#join_rhs ON FIRST 1 OUTPUT Lhs.1, Lhs.0, Lhs.2, Lhs.3, Rhs.1
            149500     ~0%    {5} r3 = JOIN r2 WITH num#DataFlowImplForHttpClientLibraries#c536b619::TNodeNormal#ff ON FIRST 1 OUTPUT Lhs.2, Lhs.1, Lhs.3, Lhs.4, Rhs.1
            148500     ~0%    {5} r4 = JOIN r3 WITH num#DataFlowImplForHttpClientLibraries#c536b619::TNodeNormal#ff ON FIRST 1 OUTPUT Lhs.3, Lhs.1, Lhs.2, Lhs.4, Rhs.1
        2003849000     ~0%    {5} r5 = JOIN r4 WITH DataFlowPublic#e1781e31::ContentSet::getAReadContent#0#dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4
         105066500  ~9036%    {5} r6 = JOIN r5 WITH project#DataFlowImplForHttpClientLibraries#c536b619::readSet#4#ffff ON FIRST 1 OUTPUT Lhs.3, Lhs.1, Lhs.4, Lhs.2, Rhs.1
                              return r6

After

Evaluated relational algebra for predicate DataFlowImplForHttpClientLibraries#c536b619::readProj#2#ff@302620cn with tuple counts:
        1461867  ~0%    {2} r1 = SCAN DataFlowPrivate#462ff392::Cached::TContent#f OUTPUT In.0, In.0
        3549054  ~1%    {2} r2 = JOIN r1 WITH DataFlowPublic#e1781e31::ContentSet::getAReadContent#0#dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1
        5772824  ~5%    {2} r3 = JOIN r2 WITH project#DataFlowImplForHttpClientLibraries#c536b619::readSet#4#ffff ON FIRST 1 OUTPUT Lhs.1, Rhs.1
                        return r3

Evaluated relational algebra for predicate DataFlowImplForHttpClientLibraries#c536b619::store#5#fffff@016cd9o1 with tuple counts:
         267905  ~0%    {4} r1 = SCAN DataFlowImplCommon#4f8df883::Cached::store#4#ffff OUTPUT In.1, In.0, In.2, In.3
         267905  ~0%    {5} r2 = JOIN r1 WITH DataFlowImplCommon#4f8df883::Cached::MkTypedContent#fff_20#join_rhs ON FIRST 1 OUTPUT Lhs.1, Lhs.0, Lhs.2, Lhs.3, Rhs.1
         267905  ~0%    {5} r3 = JOIN r2 WITH num#DataFlowImplForHttpClientLibraries#c536b619::TNodeNormal#ff ON FIRST 1 OUTPUT Lhs.2, Lhs.1, Lhs.3, Lhs.4, Rhs.1
         267905  ~0%    {5} r4 = JOIN r3 WITH num#DataFlowImplForHttpClientLibraries#c536b619::TNodeNormal#ff ON FIRST 1 OUTPUT Lhs.3, Lhs.1, Lhs.2, Lhs.4, Rhs.1
        2109240  ~0%    {5} r5 = JOIN r4 WITH DataFlowImplForHttpClientLibraries#c536b619::readProj#2#ff ON FIRST 1 OUTPUT Lhs.3, Lhs.1, Lhs.4, Lhs.2, Rhs.1
                        return r5

Discovered on #10574, where a new ContentSet kind is introduced, which has a large fan-in through getAReadContent (the special TUnknownElementContent() can be mapped to TKnownOrUnknownElementContent(c) via getAReadContent, for all KnownElementContent c).

Before (terminated before completion)
```
Evaluated relational algebra for predicate DataFlowImplForHttpClientLibraries#c536b619::store#5#fffff@e5ef07bh with tuple counts:
            151500     ~0%    {4} r1 = SCAN DataFlowImplCommon#4f8df883::Cached::store#4#ffff OUTPUT In.1, In.0, In.2, In.3
            150500     ~0%    {5} r2 = JOIN r1 WITH DataFlowImplCommon#4f8df883::Cached::MkTypedContent#fff_20#join_rhs ON FIRST 1 OUTPUT Lhs.1, Lhs.0, Lhs.2, Lhs.3, Rhs.1
            149500     ~0%    {5} r3 = JOIN r2 WITH num#DataFlowImplForHttpClientLibraries#c536b619::TNodeNormal#ff ON FIRST 1 OUTPUT Lhs.2, Lhs.1, Lhs.3, Lhs.4, Rhs.1
            148500     ~0%    {5} r4 = JOIN r3 WITH num#DataFlowImplForHttpClientLibraries#c536b619::TNodeNormal#ff ON FIRST 1 OUTPUT Lhs.3, Lhs.1, Lhs.2, Lhs.4, Rhs.1
        2003849000     ~0%    {5} r5 = JOIN r4 WITH DataFlowPublic#e1781e31::ContentSet::getAReadContent#0#dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4
         105066500  ~9036%    {5} r6 = JOIN r5 WITH project#DataFlowImplForHttpClientLibraries#c536b619::readSet#4#ffff ON FIRST 1 OUTPUT Lhs.3, Lhs.1, Lhs.4, Lhs.2, Rhs.1
                              return r6
```

After
```
Evaluated relational algebra for predicate DataFlowImplForHttpClientLibraries#c536b619::readProj#2#ff@302620cn with tuple counts:
        1461867  ~0%    {2} r1 = SCAN DataFlowPrivate#462ff392::Cached::TContent#f OUTPUT In.0, In.0
        3549054  ~1%    {2} r2 = JOIN r1 WITH DataFlowPublic#e1781e31::ContentSet::getAReadContent#0#dispred#ff_10#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1
        5772824  ~5%    {2} r3 = JOIN r2 WITH project#DataFlowImplForHttpClientLibraries#c536b619::readSet#4#ffff ON FIRST 1 OUTPUT Lhs.1, Rhs.1
                        return r3

Evaluated relational algebra for predicate DataFlowImplForHttpClientLibraries#c536b619::store#5#fffff@016cd9o1 with tuple counts:
         267905  ~0%    {4} r1 = SCAN DataFlowImplCommon#4f8df883::Cached::store#4#ffff OUTPUT In.1, In.0, In.2, In.3
         267905  ~0%    {5} r2 = JOIN r1 WITH DataFlowImplCommon#4f8df883::Cached::MkTypedContent#fff_20#join_rhs ON FIRST 1 OUTPUT Lhs.1, Lhs.0, Lhs.2, Lhs.3, Rhs.1
         267905  ~0%    {5} r3 = JOIN r2 WITH num#DataFlowImplForHttpClientLibraries#c536b619::TNodeNormal#ff ON FIRST 1 OUTPUT Lhs.2, Lhs.1, Lhs.3, Lhs.4, Rhs.1
         267905  ~0%    {5} r4 = JOIN r3 WITH num#DataFlowImplForHttpClientLibraries#c536b619::TNodeNormal#ff ON FIRST 1 OUTPUT Lhs.3, Lhs.1, Lhs.2, Lhs.4, Rhs.1
        2109240  ~0%    {5} r5 = JOIN r4 WITH DataFlowImplForHttpClientLibraries#c536b619::readProj#2#ff ON FIRST 1 OUTPUT Lhs.3, Lhs.1, Lhs.4, Lhs.2, Rhs.1
                        return r5
```
@hvitved hvitved added the no-change-note-required This PR does not need a change note label Sep 26, 2022
@hvitved hvitved marked this pull request as ready for review September 26, 2022 19:21
@hvitved hvitved requested review from a team as code owners September 26, 2022 19:21
michaelnebel
michaelnebel previously approved these changes Sep 27, 2022
Copy link
Contributor

@michaelnebel michaelnebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C# LGTM 👍

atorralba
atorralba previously approved these changes Sep 27, 2022
Copy link
Contributor

@atorralba atorralba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java looks plausible to me.

geoffw0
geoffw0 previously approved these changes Sep 27, 2022
Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++: Change seems reasonable. The DCA run looks OK (analysis performance most likely unchanged), though there don't seem to be the full set of nightly projects in the run.

@aschackmull
Copy link
Contributor

Java dca has apparently failed, but that's ok as Java currently don't use read-sets, and the change looks very safe.

@hvitved hvitved dismissed stale reviews from geoffw0, atorralba, and michaelnebel via 335e1a8 September 27, 2022 11:37
@hvitved hvitved merged commit df2b586 into github:main Sep 27, 2022
@hvitved hvitved deleted the dataflow/get-a-read-content-fan-in branch September 27, 2022 18:05
owen-mc added a commit to owen-mc/codeql that referenced this pull request Nov 29, 2022
owen-mc added a commit to owen-mc/codeql that referenced this pull request Nov 29, 2022
owen-mc added a commit to owen-mc/codeql that referenced this pull request Nov 29, 2022
owen-mc added a commit to owen-mc/codeql that referenced this pull request Nov 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants