Description
Currently, https://pandas.pydata.org/pandas-docs/stable/contributing.html says:
Running the full test suite can take up to one hour and use up to 3GB of RAM.
On a standard laptop with 8GB RAM and 4 cores, this was more like 6.5h last night.
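For reference, the full run here is just pytest over the whole pandas package, optionally parallelized (a sketch only; the -n flag requires pytest-xdist):

# from the repository root, after building the extensions in-place
pytest pandas
# or, with pytest-xdist installed, spread over 4 workers:
pytest pandas -n 4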
I recently updated the ASV code (as recommended by contributing.html) with

pip install git+https://github.com/spacetelescope/asv

and it seems that in v0.4, ASV runs each commit/benchmark in two rounds, effectively doubling the runtime? (It may well be that I don't understand exactly what the rounds are supposed to do, but ASV ran much faster before.)
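If the second round is not considered worth it for local comparison runs, it looks like it could be dialled back per run via ASV's attribute override - a sketch only, assuming asv continuous accepts the same -a/--attribute override that asv run does, and that the attribute is indeed called rounds in this version (it may be named processes in others):

# hypothetical: limit each benchmark to a single round for this comparison run
asv continuous -f 1.1 upstream/master HEAD -a rounds=1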
Quite a lot of time is also spent on the environment builds, and I was wondering whether it wouldn't be possible to reuse the logic from python setup.py build_ext --inplace -j 4 to only cythonize the modules whose code has actually changed (probably more of an asv issue).
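For exploratory runs at least, a partial workaround (as I understand the ASV docs) is to benchmark the current in-place build directly and skip ASV's per-commit environment builds altogether:

# run only the selected benchmarks against the existing checkout/environment
# (no before/after comparison, but no environment build either)
asv dev -b "^(re)?index"
# (roughly asv run --python=same under the hood, if I read the docs right)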
Furthermore, the runs are annoyingly noisy. For example, after running asv continuous -f 1.1 upstream/master HEAD overnight, with nothing else running on the machine (all other applications closed), I got something like this:
before after ratio
[360e7271] [19c7c1f8]
<master> <unique_inverse_cython>
+ 1.25±0ms 93.8±0ms 75.00 frame_ctor.FromRecords.time_frame_from_records_generator(None)
+ 1.41±0ms 6.25±0.6ms 4.44 indexing.NumericSeriesIndexing.time_getitem_array(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
+ 14.1±0ms 62.5±8ms 4.44 indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'nonunique_monotonic_inc')
+ 1.88±0.08ms 5.21±0.4ms 2.78 reindex.DropDuplicates.time_frame_drop_dups_int(True)
+ 22.7±2μs 62.5±0μs 2.75 indexing.NumericSeriesIndexing.time_getitem_scalar(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
+ 4.62±0.3ms 12.5±0ms 2.70 index_object.Indexing.time_get_loc_non_unique('Float')
+ 1.41±0.2ms 3.47±0.9ms 2.47 index_object.Indexing.time_get_loc_non_unique_sorted('Int')
+ 273±20μs 625±0μs 2.29 indexing.NumericSeriesIndexing.time_ix_slice(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
+ 703±200μs 1.56±0ms 2.22 inference.NumericInferOps.time_subtract(<class 'numpy.uint16'>)
+ 938±60μs 1.59±0ms 1.70 indexing.NumericSeriesIndexing.time_ix_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+ 703±80μs 1.17±0ms 1.67 frame_methods.Iteration.time_iteritems_cached
+ 1.09±0.1ms 1.63±0.1ms 1.50 indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+ 1.09±0.08ms 1.62±0.1ms 1.48 indexing.NumericSeriesIndexing.time_loc_array(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
+ 938±0μs 1.35±0.1ms 1.44 indexing.NumericSeriesIndexing.time_ix_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
+ 1.09±0.08ms 1.56±0ms 1.43 inference.NumericInferOps.time_subtract(<class 'numpy.int16'>)
+ 938±0μs 1.30±0.1ms 1.39 indexing.NumericSeriesIndexing.time_ix_list_like(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'unique_monotonic_inc')
+ 141±0μs 194±20μs 1.38 indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+ 729±40μs 938±0μs 1.29 indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
[...]
However, I didn't trust the results because there were equally strong divergences in the other direction.
Upon rerunning asv continuous -f 1.1 upstream/master HEAD -b "^(re)?index", all those divergences vanished and were replaced by the following (together with other divergences):
before after ratio
[360e7271] [19c7c1f8]
<master> <unique_inverse_cython>
+ 3.12±0.2ms 15.6±2ms 5.00 index_object.Indexing.time_get_loc_non_unique_sorted('Float')
+ 20.3±1μs 93.8±0μs 4.62 indexing.NonNumericSeriesIndexing.time_getitem_scalar('datetime', 'nonunique_monotonic_inc')
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
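Increasing the number of samples per benchmark would presumably tighten the error bars somewhat, at the cost of yet more runtime - again only a sketch, assuming the -a/--attribute override is accepted here:

# hypothetical: take more timing samples per benchmark to reduce the noise
asv continuous -f 1.1 upstream/master HEAD -b "^(re)?index" -a repeat=10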
Workarounds aside, this points to a larger issue: letting people run the ASVs on their private machines is not the most rigorous approach - it is prone to bias (or even manipulation), and exposed to whatever else happens to be running on their machine at the time.
Finally, many of the divergences are not shown at all if ASV's internals deem the results too noisy - this is a general point to keep in mind because, IMO, it can mask real regressions simply due to noisy runs. I've opened airspeed-velocity/asv#752 for that.
Summing up, I think that:
- the asv section in the docs should be updated (at least concerning the estimated runtime)
- maybe consider pinning an ASV version?
- disable rounds if they are deemed unnecessary (or not worth the runtime trade-off)
- find ways to reduce build times if possible (esp. for exploratory runs with -b "some_regex")
- have an ASV job executed on a worker (e.g. Azure) that isn't triggered by default, but can be started by a core dev for PRs that need it (a rough sketch of what such a job would run is below). This should greatly improve the stability of the results (a very controlled environment with little background noise) and is also much more transparent.
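For illustration, the actual benchmark step of such a job would not need to be more than something like the following (the surrounding CI wiring is omitted, and the remote/branch names are placeholders):

# inside the pandas checkout, on a dedicated and otherwise idle worker
cd asv_bench
asv machine --yes                           # record machine info non-interactively
asv continuous -f 1.1 upstream/master HEAD  # build both commits, run the suite, report changes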