Closed
@mroeschke looking at BaseWindow._apply, we're spending a lot of time in apply_along_axis in our asvs (results posted below). IIUC what's happening is that homogeneous_func is passed to BlockManager.apply, which iterates over Blocks, and then within that call homogeneous_func iterates over columns via apply_along_axis.
My intuition is that we should iterate over either blocks or columns, but not both. Is there a reason to do both?
Not sure about UDFs, but for mean/sum/... it seems like we'd have to edit the cython functions to iterate over columns there, right?
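To make the question concrete, here is a toy sketch of the two iteration shapes, not the actual pandas code. `rolling_mean_1d`, `apply_blockwise` and `apply_columnwise` are hypothetical names, and the "blocks" are just plain 2D NumPy arrays with columns along axis 1, which is an assumption made for illustration only.

```python
import numpy as np


def rolling_mean_1d(col, window=3):
    """Hypothetical 1D kernel: trailing mean, NaN until the window is full."""
    out = np.full(col.shape, np.nan)
    csum = np.cumsum(np.insert(col, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


def apply_blockwise(blocks, window=3):
    # Outer Python loop over blocks; np.apply_along_axis then runs a second
    # Python-level loop over the columns inside each block.
    return [np.apply_along_axis(rolling_mean_1d, 0, blk, window) for blk in blocks]


def apply_columnwise(columns, window=3):
    # Single loop: one kernel call per column, no apply_along_axis layer.
    return [rolling_mean_1d(col, window) for col in columns]


rng = np.random.default_rng(0)
blocks = [rng.random((10, 2)), rng.random((10, 3))]  # toy "blocks"
columns = [blk[:, i] for blk in blocks for i in range(blk.shape[1])]

by_block = np.hstack(apply_blockwise(blocks))
by_column = np.column_stack(apply_columnwise(columns))
```

Both paths call the kernel once per column and produce the same values; the blockwise path just adds the apply_along_axis bookkeeping on top of the block loop.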
from asv_bench.benchmarks.rolling import *
self = Engine()
self.setup("DataFrame", "float", sum, "cython", "mean")
%prun -s cumtime for n in range(1000): self.time_rolling_methods('DataFrame', 'float', sum, 'cython', 'mean')
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.325 0.325 {built-in method builtins.exec}
1 0.001 0.001 0.325 0.325 <string>:1(<module>)
1000 0.003 0.000 0.324 0.000 rolling.py:68(time_rolling_methods)
1000 0.002 0.000 0.302 0.000 rolling.py:1802(mean)
1000 0.002 0.000 0.299 0.000 rolling.py:1291(mean)
1000 0.002 0.000 0.295 0.000 rolling.py:479(_apply)
1000 0.007 0.000 0.291 0.000 rolling.py:408(_apply_blockwise)
1000 0.003 0.000 0.268 0.000 managers.py:276(apply)
1000 0.003 0.000 0.204 0.000 blocks.py:364(apply)
1000 0.002 0.000 0.165 0.000 rolling.py:425(hfunc)
1000 0.004 0.000 0.149 0.000 rolling.py:514(homogeneous_func)
14000/3000 0.013 0.000 0.140 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
1000 0.001 0.000 0.136 0.000 <__array_function__ internals>:2(apply_along_axis)
1000 0.015 0.000 0.134 0.000 shape_base.py:267(apply_along_axis)
1000 0.002 0.000 0.074 0.000 rolling.py:520(calc)
1000 0.010 0.000 0.058 0.000 managers.py:539(_combine)
1000 0.007 0.000 0.050 0.000 indexers.py:73(get_window_bounds)
# With ArrayManager so we only iterate over columns:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.198 0.198 {built-in method builtins.exec}
1 0.001 0.001 0.198 0.198 <string>:1(<module>)
1000 0.003 0.000 0.197 0.000 rolling.py:68(time_rolling_methods)
1000 0.002 0.000 0.174 0.000 rolling.py:1802(mean)
1000 0.002 0.000 0.171 0.000 rolling.py:1291(mean)
1000 0.002 0.000 0.167 0.000 rolling.py:479(_apply)
1000 0.007 0.000 0.163 0.000 rolling.py:408(_apply_blockwise)
1000 0.005 0.000 0.139 0.000 array_manager.py:194(apply)
1000 0.002 0.000 0.100 0.000 rolling.py:425(hfunc)
1000 0.004 0.000 0.086 0.000 rolling.py:514(homogeneous_func)
1000 0.002 0.000 0.071 0.000 rolling.py:520(calc)
1000 0.006 0.000 0.047 0.000 indexers.py:73(get_window_bounds)
2000 0.001 0.000 0.039 0.000 <__array_function__ internals>:2(clip)
7000/3000 0.003 0.000 0.038 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
2000 0.002 0.000 0.036 0.000 fromnumeric.py:2046(clip)
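For anyone who wants to collect a similar profile outside IPython, the `%prun -s cumtime` call can be approximated with the standard-library `cProfile` and `pstats` modules. This is a minimal sketch under stated assumptions: `workload` is a hypothetical toy stand-in (a per-column cumsum through apply_along_axis), not the asv benchmark, and `profile_by_cumtime` is an illustrative helper name.

```python
import cProfile
import io
import pstats

import numpy as np


def workload(values, n_iter=200):
    # Toy stand-in for the benchmark loop (assumption, not the asv code):
    # apply a 1D function to every column via apply_along_axis.
    for _ in range(n_iter):
        np.apply_along_axis(np.cumsum, 0, values)


def profile_by_cumtime(func, *args, top=10):
    """Profile func(*args) and return the report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumtime").print_stats(top)
    return buf.getvalue()


report = profile_by_cumtime(workload, np.random.default_rng(0).random((1000, 4)))
print(report)
```

Swapping `workload` for the real benchmark call should give a table in the same ncalls/tottime/cumtime layout as the ones above.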