Description
Currently, https://pandas.pydata.org/pandas-docs/stable/contributing.html says:
Running the full test suite can take up to one hour and use up to 3GB of RAM.
On a standard laptop with 8GB RAM and 4 cores, this was more like 6.5h last night.
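For reference, the full run here is just pytest over the whole pandas package, optionally parallelized (a sketch only; the -n flag requires pytest-xdist):

# from the repository root, after building the extensions in-place
pytest pandas
# or, with pytest-xdist installed, spread over 4 workers:
pytest pandas -n 4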
I recently updated the ASV code (as recommended by contributing.html) with

pip install git+https://github.com/spacetelescope/asv

and it seems that in v0.4, ASV runs each commit/benchmark in two rounds, effectively doubling the runtime? (It may well be that I don't understand exactly what the rounds are supposed to do, but ASV ran much faster before.)
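If the second round is not considered worth it for local comparison runs, it looks like it could be dialled back per run via ASV's attribute override - a sketch only, assuming asv continuous accepts the same -a/--attribute override that asv run does, and that the attribute is indeed called rounds in this version (it may be named processes in others):

# hypothetical: limit each benchmark to a single round for this comparison run
asv continuous -f 1.1 upstream/master HEAD -a rounds=1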
Quite a lot of time is also spent on the environment builds, and I was wondering whether it wouldn't be possible to reuse the logic from python setup.py build_ext --inplace -j 4 to only cythonize the modules whose code has actually changed (probably more of an asv issue).
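For exploratory runs at least, a partial workaround (as I understand the ASV docs) is to benchmark the current in-place build directly and skip ASV's per-commit environment builds altogether:

# run only the selected benchmarks against the existing checkout/environment
# (no before/after comparison, but no environment build either)
asv dev -b "^(re)?index"
# (roughly asv run --python=same under the hood, if I read the docs right)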
Furthermore, the runs are annoyingly noisy. For example, after running asv continuous -f 1.1 upstream/master HEAD overnight, with nothing else running on the machine (all other applications closed), I got something like this:
before after ratio
[360e7271] [19c7c1f8]
<master> <unique_inverse_cython>
+ 1.25±0ms 93.8±0ms 75.00 frame_ctor.FromRecords.time_frame_from_records_generator(None)
+ 1.41±0ms 6.25±0.6ms 4.44 indexing.NumericSeriesIndexing.time_getitem_array(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
+ 14.1±0ms 62.5±8ms 4.44 indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'nonunique_monotonic_inc')
+ 1.88±0.08ms 5.21±0.4ms 2.78 reindex.DropDuplicates.time_frame_drop_dups_int(True)
+ 22.7±2μs 62.5±0μs 2.75 indexing.NumericSeriesIndexing.time_getitem_scalar(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
+ 4.62±0.3ms 12.5±0ms 2.70 index_object.Indexing.time_get_loc_non_unique('Float')
+ 1.41±0.2ms 3.47±0.9ms 2.47 index_object.Indexing.time_get_loc_non_unique_sorted('Int')
+ 273±20μs 625±0μs 2.29 indexing.NumericSeriesIndexing.time_ix_slice(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
+ 703±200μs 1.56±0ms 2.22 inference.NumericInferOps.time_subtract(<class 'numpy.uint16'>)
+ 938±60μs 1.59±0ms 1.70 indexing.NumericSeriesIndexing.time_ix_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+ 703±80μs 1.17±0ms 1.67 frame_methods.Iteration.time_iteritems_cached
+ 1.09±0.1ms 1.63±0.1ms 1.50 indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+ 1.09±0.08ms 1.62±0.1ms 1.48 indexing.NumericSeriesIndexing.time_loc_array(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
+ 938±0μs 1.35±0.1ms 1.44 indexing.NumericSeriesIndexing.time_ix_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
+ 1.09±0.08ms 1.56±0ms 1.43 inference.NumericInferOps.time_subtract(<class 'numpy.int16'>)
+ 938±0μs 1.30±0.1ms 1.39 indexing.NumericSeriesIndexing.time_ix_list_like(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'unique_monotonic_inc')
+ 141±0μs 194±20μs 1.38 indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
+ 729±40μs 938±0μs 1.29 indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
[...]
However, I didn't trust the results because there were equally strong divergences in the other direction.
Upon rerunning asv continuous -f 1.1 upstream/master HEAD -b "^(re)?index", all those divergences vanished and were replaced by the following (together with other divergences):
before after ratio
[360e7271] [19c7c1f8]
<master> <unique_inverse_cython>
+ 3.12±0.2ms 15.6±2ms 5.00 index_object.Indexing.time_get_loc_non_unique_sorted('Float')
+ 20.3±1μs 93.8±0μs 4.62 indexing.NonNumericSeriesIndexing.time_getitem_scalar('datetime', 'nonunique_monotonic_inc')
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
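Increasing the number of samples per benchmark would presumably tighten the error bars somewhat, at the cost of yet more runtime - again only a sketch, assuming the -a/--attribute override is accepted here:

# hypothetical: take more timing samples per benchmark to reduce the noise
asv continuous -f 1.1 upstream/master HEAD -b "^(re)?index" -a repeat=10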
Workarounds aside, this points to a larger issue: letting people run the ASVs on their private machines is not the most rigorous approach - it is prone to bias (or even manipulation), and exposed to whatever else happens to be running on their machine at the time.
Finally, many of the divergences are not shown at all if ASV's internals deem the results too noisy - this is a general point to keep in mind because, IMO, it can mask real regressions simply due to noisy runs. I've opened airspeed-velocity/asv#752 for that.
Summing up, I think that:
- the asv section in the docs should be updated (at least concerning the estimated runtime)
- maybe consider pinning an ASV version?
- disable rounds if they are deemed unnecessary (or not worth the runtime trade-off)
- find ways to reduce build times if possible (esp. for exploratory runs with -b "some_regex")
- have an ASV job executed on a worker (e.g. Azure) that isn't triggered by default, but can be started by a core dev for PRs that need it (a rough sketch of what such a job would run is below). This should greatly improve the stability of the results (a very controlled environment with little background noise) and is also much more transparent.
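For illustration, the actual benchmark step of such a job would not need to be more than something like the following (the surrounding CI wiring is omitted, and the remote/branch names are placeholders):

# inside the pandas checkout, on a dedicated and otherwise idle worker
cd asv_bench
asv machine --yes                           # record machine info non-interactively
asv continuous -f 1.1 upstream/master HEAD  # build both commits, run the suite, report changes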