-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Python: Fix bad join in function resolution #16885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: Fix bad join in function resolution #16885
Conversation
On a certain database, the evaluator was spending 17 seconds on a single iteration, which is usually a bad sign. Looking more closely, we find the following culprit: ``` (621s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i1#fb44303q after 17.8s: 79429084 ~0% {3} r1 = JOIN `TypeTrackingImpl::TypeTracker.start/0#dispred#f8047cd1` WITH `DataFlowPublic::Node.asExpr/0#dispred#2845197a` CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0 't', Rhs.1 59018875 ~7% {3} | JOIN WITH LocalSources::LocalSourceNode#2491029a ON FIRST 1 OUTPUT Lhs.2, Lhs.1 't', Lhs.0 'result' 7929 ~1% {3} | JOIN WITH `PoorMansFunctionResolution::lastDecoratorCall/1#152a5a8f_10#join_rhs` ON FIRST 1 OUTPUT Lhs.1 't', Rhs.1 'func', Lhs.2 'result' 277654 ~5% {3} r2 = JOIN `TypeTrackingImpl::TypeTracker.start/0#dispred#f8047cd1` WITH `Function::Function.getDefinition/0#dispred#6279b12c` CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0 't', Rhs.1 {3} | AND NOT `project#Function::Function.getADecorator/0#dispred#119184e3`(FIRST 1) 269725 ~0% {3} | SCAN OUTPUT In.2, In.1 't', In.0 'func' 270268 ~0% {3} | JOIN WITH `DataFlowPublic::Node.asExpr/0#dispred#2845197a_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1 'result', Lhs.1 't', Lhs.2 'func' 270268 ~7% {3} | JOIN WITH LocalSources::LocalSourceNode#2491029a ON FIRST 1 OUTPUT Lhs.1 't', Lhs.2 'func', Lhs.0 'result' 278197 ~5% {3} r3 = r1 UNION r2 return r3 ``` This is clearly silly. We're creating a type tracker start spot for _every expression_, and only then filtering them down to just final decorator calls. To fix this, I simply pushed the `.asExpr` into `lastDecoratorCall` and made that `pragma[nomagic]`. This resulted in a much more reasonable iteration: ``` (585s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i1#62ade299 after 2.1s: 61252093 ~0% {2} r1 = JOIN `TypeTrackingImpl::TypeTracker.start/0#dispred#f8047cd1` WITH LocalSources::LocalSourceNode#2491029a CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0 't' 7929 ~3% {3} | JOIN WITH `PoorMansFunctionResolution::lastDecoratorCall/1#152a5a8f_10#join_rhs` ON FIRST 1 OUTPUT Lhs.1 't', Rhs.1 'func', Lhs.0 'result' 277654 ~5% {3} r2 = JOIN `TypeTrackingImpl::TypeTracker.start/0#dispred#f8047cd1` WITH `Function::Function.getDefinition/0#dispred#6279b12c` CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0 't', Rhs.1 {3} | AND NOT `project#Function::Function.getADecorator/0#dispred#119184e3`(FIRST 1) 269725 ~0% {3} | SCAN OUTPUT In.2, In.1 't', In.0 'func' 270268 ~0% {3} | JOIN WITH `DataFlowPublic::Node.asExpr/0#dispred#2845197a_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1 'result', Lhs.1 't', Lhs.2 'func' 270268 ~5% {3} | JOIN WITH LocalSources::LocalSourceNode#2491029a ON FIRST 1 OUTPUT Lhs.1 't', Lhs.2 'func', Lhs.0 'result' 278197 ~5% {3} r3 = r1 UNION r2 return r3 ``` That's better, but can we do better still? Ideally, we'll join with `lastDecoratorCall` _first_, and only then consider `LocalSourceNode`s. Time to add some pragmas: ``` (6s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i1#049abau6 after 51ms: 247936 ~0% {3} r1 = SCAN TypeTrackingImpl::Cached::MkTypeTracker#ff50e2d8 OUTPUT In.1, In.0, In.2 't' 2 ~0% {3} | JOIN WITH `num#Option::Option<TypeTrackingImpl::TypeTrackingInput::Content>::TNone#364b4b8a` ON FIRST 1 OUTPUT Lhs.1, Lhs.2 't', _ {2} | REWRITE WITH Tmp.2 := false, TEST InOut.0 = Tmp.2 KEEPING 2 1 ~0% {1} | SCAN OUTPUT In.1 't' 7929 ~0% {3} r2 = JOIN r1 WITH `PoorMansFunctionResolution::lastDecoratorCall/1#152a5a8f` CARTESIAN PRODUCT OUTPUT Rhs.1 'result', Lhs.0 't', Rhs.0 7929 ~3% {3} | JOIN WITH LocalSources::LocalSourceNode#2491029a ON FIRST 1 OUTPUT Lhs.1 't', Lhs.2 'func', Lhs.0 'result' 277654 ~5% {3} r3 = JOIN r1 WITH `Function::Function.getDefinition/0#dispred#6279b12c` CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0 't', Rhs.1 {3} | AND NOT `project#Function::Function.getADecorator/0#dispred#119184e3`(FIRST 1) 269725 ~0% {3} | SCAN OUTPUT In.2, In.1 't', In.0 'func' 270268 ~0% {3} | JOIN WITH `DataFlowPublic::Node.asExpr/0#dispred#2845197a#fb_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1 'result', Lhs.1 't', Lhs.2 'func' 270268 ~5% {3} | JOIN WITH LocalSources::LocalSourceNode#2491029a ON FIRST 1 OUTPUT Lhs.1 't', Lhs.2 'func', Lhs.0 'result' 278197 ~5% {3} r4 = r2 UNION r3 return r4 ``` Perfect! (Here, the optimiser has for some reason decided to inline `t.start()`, but this is not important.) Iteration timings before: ``` (621s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i1#fb44303q after 17.8s: (623s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i2#fb44303q after 1.9s: (628s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i3#fb44303q after 5.2s: (629s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i4#fb44303q after 696ms: (629s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i5#fb44303q after 238ms: (630s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i6#fb44303q after 401ms: (630s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i7#fb44303q after 6ms: (630s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i8#fb44303q after 3ms: (630s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i9#fb44303q after 1ms: (630s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i10#fb44303q after 525ms: (631s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/1#c7f86fe2#fb/2@6a2459s7 after 518ms: (631s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/1#c7f86fe2#fb_10#join_rhs/2@eab64fk2 after 42ms: ``` And after: ``` (585s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i1#62ade299 after 2.1s: (585s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i2#62ade299 after 331ms: (590s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i3#62ade299 after 4.8s: (592s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i4#62ade299 after 1.9s: (593s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i5#62ade299 after 504ms: (593s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i6#62ade299 after 312ms: (593s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i7#62ade299 after 6ms: (593s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i8#62ade299 after 2ms: (593s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i9#62ade299 after 1ms: (593s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i10#62ade299 after 489ms: (594s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/1#c7f86fe2#fb/2@7cac46aj after 522ms: (594s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/1#c7f86fe2#fb_10#join_rhs/2@a15a372e after 41ms: ``` (Curiously, in the "before" run it appeared to run that tracker _twice_, with the same performance characteristics the second time around. This is not present in the "after" version.) And finally with the pragma: ``` (596s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i1#7ef50eer after 636ms: (596s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i2#7ef50eer after 180ms: (598s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i3#7ef50eer after 1.9s: (604s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i4#7ef50eer after 6.2s: (606s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i5#7ef50eer after 1.4s: (607s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i6#7ef50eer after 1.7s: (607s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i7#7ef50eer after 55ms: (607s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i8#7ef50eer after 8ms: (607s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i9#7ef50eer after 2ms: (608s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i10#7ef50eer after 619ms: (609s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/1#c7f86fe2#fb/2@4cb536id after 566ms: (609s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/1#c7f86fe2#fb_10#join_rhs/2@f0664dnr after 56ms: (714s) Tuple counts for m#PoorMansFunctionResolution::poorMansFunctionTracker/1#c7f86fe2#fb/1@fe47420j after 14ms: (718s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i1#769227t3 after 369ms: (719s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i2#769227t3 after 623ms: (724s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i3#769227t3 after 5.3s: (727s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i4#769227t3 after 2.8s: (727s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i5#769227t3 after 289ms: (727s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i6#769227t3 after 285ms: (727s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i7#769227t3 after 7ms: (727s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i8#769227t3 after 3ms: (727s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i9#769227t3 after 1ms: (728s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01/3@i10#769227t3 after 672ms: (731s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/2#75430e01_201#join_rhs/3@cacd816f after 2.5s: (732s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/1#c7f86fe2#fb/2@14c0cfsk after 80ms: (732s) Tuple counts for PoorMansFunctionResolution::poorMansFunctionTracker/1#c7f86fe2#fb_10#join_rhs/2@fc0f040d after 0ms: ``` Note the double evaluation again. This will need to be investigated further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great 👍
It's curious that the slowest iteration for the final version is 6.2s
while the slowest iteration for the first fix is only 4.8s
🤔
I agree, but I think the variance is also rather high. Observe that the slowest iteration in the strange reevaluation is |
On a certain database, the evaluator was spending 17 seconds on a single iteration, which is usually a bad sign.
Looking more closely, we find the following culprit:
This is clearly silly. We're creating a type tracker start spot for every expression, and only then filtering them down to just final decorator calls.
To fix this, I simply pushed the
.asExpr
intolastDecoratorCall
and made thatpragma[nomagic]
.This resulted in a much more reasonable iteration:
That's better, but can we do better still? Ideally, we'll join with
lastDecoratorCall
first, and only then considerLocalSourceNode
s.Time to add some pragmas:
Perfect! (Here, the optimiser has for some reason decided to inline
t.start()
, but this is not important.)Iteration timings before:
And after:
(Curiously, in the "before" run it appeared to run that tracker twice, with the same performance characteristics the second time around. This is not present in the "after" version.)
And finally with the pragma:
Note the double evaluation again. This will need to be investigated further.