Perf check on upcoming v0.11.0 vs 0.10.1 #3326

Closed
@ghost

Description

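Here are all the vbenches that differ by more than 15 percent,
taking the best of 5 runs. Each value is the HEAD timing divided by
the previous-version timing, so values above 1 are slowdowns.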

I hope to have this automated and realtime by the time the
next release comes around, along with bisection.

1st run

λ cat r1/report.txt 
Worse
getattr_dataframe_index              2.000000
frame_multi_and_no_ne                1.319728
series_constructor_ndarray           1.329545
ctor_index_array_string              1.651376
frame_wide_repr                     48.154345
groupby_sum_booleans                 1.152285
indexing_dataframe_boolean_rows      1.168337
series_getitem_scalar                1.862069
dataframe_getitem_scalar             1.214286
datamatrix_getitem_scalar            1.190476
concat_small_frames                  1.159975
series_align_left_monotonic          1.289482
reindex_daterange_backfill           1.190402
reindex_daterange_pad                1.185000
timeseries_large_lookup_value      235.037234
dtype: float64

Better
frame_multi_and_st               0.580615
frame_multi_and                  0.597748
frame_fancy_lookup               0.792336
frame_get_dtype_counts           0.000404
frame_fancy_lookup_all           0.797565
series_string_vector_slice       0.801726
frame_reindex_upcast             0.523595
frame_reindex_axis0              0.509034
groupby_first_float32            0.043276
groupby_last_float32             0.044075
groupby_transform                0.413400
indexing_dataframe_boolean_st    0.094886
indexing_dataframe_boolean       0.095269
frame_to_csv                     0.737868
frame_to_csv2                    0.121081
frame_to_csv_mixed               0.381196
write_csv_standard               0.198206
append_frame_single_mixed        0.805745
reindex_frame_level_align        0.790248
reindex_frame_level_reindex      0.787495

2nd run

Worse
frame_multi_and_no_ne                1.322615
series_constructor_ndarray           1.284091
ctor_index_array_string              1.557522
frame_wide_repr                     47.085119
indexing_dataframe_boolean_rows      1.158785
series_getitem_scalar                1.896552
dataframe_getitem_scalar             1.214286
datamatrix_getitem_scalar            1.190476
series_align_left_monotonic          1.293073
reindex_daterange_backfill           1.192585
reindex_daterange_pad                1.176850
frame_reindex_columns                1.231402
timeseries_large_lookup_value      370.207254
dtype: float64

Better
frame_multi_and_st               0.584512
frame_multi_and                  0.592406
frame_fancy_lookup               0.794964
frame_get_dtype_counts           0.000393
series_string_vector_slice       0.793987
frame_reindex_upcast             0.562601
frame_reindex_axis0              0.512411
groupby_first_float32            0.042893
groupby_last_float32             0.043322
groupby_transform                0.412138
indexing_dataframe_boolean_st    0.093816
indexing_dataframe_boolean       0.094429
frame_to_csv                     0.721259
frame_to_csv2                    0.120910
frame_to_csv_mixed               0.399276
write_csv_standard               0.197096
append_frame_single_mixed        0.795502
reindex_frame_level_align        0.787276
reindex_frame_level_reindex      0.786307
dtype: float64
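
The two runs mostly agree, but a few entries sit close to the 15 percent cutoff and only show up in one of them. A minimal sketch, not part of the original tooling, that diffs the two "Worse" lists above to separate stable regressions from borderline ones (the benchmark names are copied from the reports; the variable names are illustrative):

```python
# Benchmarks flagged "Worse" in each run, copied from the reports above.
run1_worse = {
    "getattr_dataframe_index", "frame_multi_and_no_ne",
    "series_constructor_ndarray", "ctor_index_array_string",
    "frame_wide_repr", "groupby_sum_booleans",
    "indexing_dataframe_boolean_rows", "series_getitem_scalar",
    "dataframe_getitem_scalar", "datamatrix_getitem_scalar",
    "concat_small_frames", "series_align_left_monotonic",
    "reindex_daterange_backfill", "reindex_daterange_pad",
    "timeseries_large_lookup_value",
}
run2_worse = {
    "frame_multi_and_no_ne", "series_constructor_ndarray",
    "ctor_index_array_string", "frame_wide_repr",
    "indexing_dataframe_boolean_rows", "series_getitem_scalar",
    "dataframe_getitem_scalar", "datamatrix_getitem_scalar",
    "series_align_left_monotonic", "reindex_daterange_backfill",
    "reindex_daterange_pad", "frame_reindex_columns",
    "timeseries_large_lookup_value",
}

# Flagged in both runs: likely real regressions.
stable = run1_worse & run2_worse
# Flagged in only one run: probably hovering around the threshold.
only_run1 = run1_worse - run2_worse
only_run2 = run2_worse - run1_worse

print("stable:", len(stable))
print("only in run 1:", sorted(only_run1))
print("only in run 2:", sorted(only_run2))
```

The entries that appear in only one run all have ratios between roughly 1.15 and 1.25 in the reports, with the exception of getattr_dataframe_index (2.0 in the first run, absent from the second).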


Until test_perf's compare mode gets validated (re: instability), this is the script used to produce the reports above:

#!/bin/bash

# profile current HEAD, against the commit
# specified on the command line

# assume you're running in a venv, and
# that upstream pandas is a git remote named
# "upstream"

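# abort if no commit was given; a bare "git reset --hard" would compare HEAD with itself
PREV_VER=${1:?missing commit to compare against}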
THRESH=0.15
NITER=5
UPSTREAM=upstream/master

git reset --hard $UPSTREAM
H1=$(git log --format="%h" -1)
python setup.py develop
./test_perf.sh -H -N $NITER -c 1 -d $PWD/HEAD-$H1.pickle "$2"

git reset --hard $PREV_VER
H2=$(git log --format="%h" -1)
git checkout $UPSTREAM vb_suite setup.py # bring back the updated suite and test_perf, and build cache
python setup.py develop
./test_perf.sh -H -N $NITER -c 1 -d $PWD/PREV-$H2.pickle "$2"

# back to master
git reset --hard $UPSTREAM
python setup.py develop

SCR=$(cat <<EOF
import pandas as pd
H=(pd.load("HEAD-$H1.pickle").min(1)/pd.load("PREV-$H2.pickle").min(1))
print "Worse"
print H[(H-1)>$THRESH]
print "\nBetter"
print H[(1-H)>$THRESH]
EOF
)
python -c "$SCR"
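
The embedded Python snippet takes the per-benchmark minimum over the iterations of each run, divides HEAD by the previous version, and keeps the ratios that are more than THRESH away from 1. A rough sketch of the same logic in plain Python 3, assuming the timings are available as dicts of lists. The `compare` helper and the `bench_*` timings are hypothetical and only illustrate the arithmetic; they are not measurements:

```python
def compare(head, prev, thresh=0.15):
    """Return (worse, better) dicts of best-of-N timing ratios HEAD/PREV."""
    # Only benchmarks present in both runs can be compared.
    common = head.keys() & prev.keys()
    # Best-of-N: use the minimum timing from each run.
    ratios = {name: min(head[name]) / min(prev[name]) for name in common}
    # Mirrors H[(H-1) > THRESH] and H[(1-H) > THRESH] in the script.
    worse = {n: r for n, r in ratios.items() if r - 1 > thresh}
    better = {n: r for n, r in ratios.items() if 1 - r > thresh}
    return worse, better


# Hypothetical timings, three iterations per benchmark.
head = {
    "bench_a": [2.0, 2.1, 2.4],
    "bench_b": [1.0, 1.05, 1.1],
    "bench_c": [0.5, 0.6, 0.55],
}
prev = {
    "bench_a": [1.0, 1.2, 1.1],
    "bench_b": [1.0, 1.0, 1.2],
    "bench_c": [1.0, 1.1, 1.3],
}

worse, better = compare(head, prev)
print("Worse", worse)
print("Better", better)
```

On recent pandas versions the script itself would need `pd.read_pickle` in place of `pd.load`, and `print(...)` calls instead of print statements.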
