Description
Running some asv benchmarks, I'm seeing changes in benchmarks that should be unaffected, yet those changes are consistent across runs.
Hunting this down, I'm replicating the weirdest benchmark (asv reports 1.27±0ms vs 789±3μs) in ipython:
import numpy as np
import pandas as pd

arr = np.ones(10**6, dtype="int8")
ser = pd.Series(arr)
func = ser.sum
%timeit func()
The first few runs matched the asv results really closely:
In [4]: %timeit func()
781 µs ± 552 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) # <-- branch
1.27 ms ± 1.82 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) # <-- master
But running it again just for kicks, the results flipped:
In [5]: %timeit func()
1.27 ms ± 5.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) # <-- branch
780 µs ± 4.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) # <-- master
In [5]: %timeit func()
1.26 ms ± 691 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) # <-- branch
1.26 ms ± 578 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) # <-- master
In [10]: %timeit func()
1.26 ms ± 818 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) # <-- branch
773 µs ± 200 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) # <-- master
Eventually the ~780µs number stopped showing up and got replaced by a mode between 895-907µs, but the 1.26-1.27ms mode hasn't moved a bit.
There's no obvious pattern as to which mode will show up in any given run (though they are positively auto-correlated). But in the asv results there must be something consistently splitting the modes between the two branches.
I get the same behavior even if I use the same branch, just different terminal windows.
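As a side note on methodology: to see the two modes outside of %timeit, here is a minimal sketch using the stdlib timeit module to collect many independent samples (my own illustration, reusing the setup above; not part of the original runs):

import timeit

import numpy as np
import pandas as pd

arr = np.ones(10**6, dtype="int8")
ser = pd.Series(arr)
func = ser.sum

# 50 independent samples of 100 calls each; with two timing modes the
# sorted per-call times split into two clusters instead of one tight band.
samples = timeit.repeat(func, number=100, repeat=50)
per_call_ms = sorted(t / 100 * 1e3 for t in samples)
print(per_call_ms[:5], per_call_ms[-5:])  # fastest vs slowest clusters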
Looking at this with %prun -s cumtime for i in range(1000): func()
points at {method 'reduce' of 'numpy.ufunc' objects}. First the fast (~780µs) mode:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.796 0.796 {built-in method builtins.exec}
1 0.001 0.001 0.796 0.796 <string>:1(<module>)
1000 0.001 0.000 0.795 0.001 generic.py:10747(sum)
1000 0.001 0.000 0.794 0.001 generic.py:10489(sum)
1000 0.002 0.000 0.793 0.001 generic.py:10454(_min_count_stat_function)
1000 0.004 0.000 0.791 0.001 series.py:4133(_reduce)
1000 0.004 0.000 0.778 0.001 nanops.py:82(_f)
1000 0.001 0.000 0.763 0.001 nanops.py:392(new_func)
1000 0.003 0.000 0.761 0.001 nanops.py:528(nansum)
1000 0.001 0.000 0.738 0.001 {method 'sum' of 'numpy.ndarray' objects}
1000 0.000 0.000 0.738 0.001 _methods.py:45(_sum)
1000 0.737 0.001 0.737 0.001 {method 'reduce' of 'numpy.ufunc' objects} # <--
1000 0.003 0.000 0.017 0.000 nanops.py:256(_get_values)
4000 0.005 0.000 0.010 0.000 _ufunc_config.py:32(seterr)
And the slow (~1.27ms) mode:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 1.292 1.292 {built-in method builtins.exec}
1 0.001 0.001 1.292 1.292 <string>:1(<module>)
1000 0.001 0.000 1.292 0.001 generic.py:10747(sum)
1000 0.001 0.000 1.291 0.001 generic.py:10489(sum)
1000 0.002 0.000 1.290 0.001 generic.py:10454(_min_count_stat_function)
1000 0.004 0.000 1.288 0.001 series.py:4133(_reduce)
1000 0.004 0.000 1.275 0.001 nanops.py:82(_f)
1000 0.001 0.000 1.260 0.001 nanops.py:392(new_func)
1000 0.002 0.000 1.258 0.001 nanops.py:528(nansum)
1000 0.001 0.000 1.235 0.001 {method 'sum' of 'numpy.ndarray' objects}
1000 0.000 0.000 1.234 0.001 _methods.py:45(_sum)
1000 1.234 0.001 1.234 0.001 {method 'reduce' of 'numpy.ufunc' objects} # <--
1000 0.003 0.000 0.018 0.000 nanops.py:256(_get_values)
4000 0.005 0.000 0.010 0.000 _ufunc_config.py:32(seterr)
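For what it's worth, the same profile can be reproduced outside IPython with cProfile; a sketch, assuming arr/ser/func are defined at module level as in the setup above:

import cProfile

# Equivalent of %prun -s cumtime: profile 1000 calls and sort by
# cumulative time. The statement string is executed in __main__'s
# namespace, so func must be defined there.
cProfile.run("for i in range(1000): func()", sort="cumtime")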
The total difference is 1.292s - 0.796s = 0.496s and the difference in the reduce
ufunc is 1.234s - 0.737s = 0.497s, which is pretty suggestive: essentially the entire slowdown is inside numpy's ufunc reduce, not in the pandas layers above it.
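Since the profiles put the whole difference in the ufunc reduce, a natural follow-up is to time the raw numpy call with pandas out of the loop. My reading of nanops.py is that nansum upcasts the integer accumulator to int64, so the closest raw equivalent would be (an assumption on my part, not verified in the runs above):

import numpy as np

arr = np.ones(10**6, dtype="int8")
# If this call alone shows the same ~780µs/1.27ms modes, the bimodality
# lives entirely in numpy's ufunc reduce, matching the profiles above.
%timeit arr.sum(dtype="int64")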