pandas sort_values significantly slower on Python 3.5.2 vs. Python 2.7.12 #14103

Closed
@samlalwani

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
from time import time
import sys

df_data = pd.DataFrame(np.random.randint(0,int(1e6),int(20e6)), columns=['pop_id'])
df_data['PL_dB'] = 50 + np.random.random(df_data.shape[0]) * 100
df_data['Rx_dBm'] = 23 - df_data.PL_dB
df_data['noise_mW'] = (10.**(df_data.Rx_dBm / 10.)).astype('float32')

start = time()
df_data.sort_values(by=['pop_id', 'Rx_dBm'], ascending=[True, False], inplace=True)
df_data.reset_index(drop=True, inplace=True)

print("Sort took {:0.2f} seconds".format(time() - start))
print('Python version ' + sys.version)
print('pandas version ' + pd.__version__)
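
The script above times a single run on 20 million rows, so one measurement can be noisy and slow to repeat. A minimal sketch of a smaller, repeatable version of the same sort follows. The helper names `build_frame` and `time_sort`, the reduced row count, and the fixed seed are assumptions made for illustration and are not part of the original report; absolute timings will differ from the ones reported here.

```python
import timeit

import numpy as np
import pandas as pd


def build_frame(n_rows, n_ids, seed=0):
    """Build a smaller frame with the same columns as the example above (hypothetical helper)."""
    rng = np.random.RandomState(seed)  # fixed seed so both interpreters sort the same data
    df = pd.DataFrame({"pop_id": rng.randint(0, n_ids, n_rows)})
    df["PL_dB"] = 50 + rng.random_sample(n_rows) * 100
    df["Rx_dBm"] = 23 - df["PL_dB"]
    df["noise_mW"] = (10.0 ** (df["Rx_dBm"] / 10.0)).astype("float32")
    return df


def time_sort(df, repeats=3):
    """Return the best wall-clock time over `repeats` runs of the sort alone (hypothetical helper)."""
    def run():
        # Not in place, so every repeat sorts the same unsorted input.
        df.sort_values(by=["pop_id", "Rx_dBm"], ascending=[True, False])

    return min(timeit.repeat(run, number=1, repeat=repeats))


small = build_frame(n_rows=200000, n_ids=10000)
best = time_sort(small)
print("Best of 3: {:0.4f} seconds".format(best))
```

Running the same file under both interpreters and comparing the best-of-N times should show whether the gap persists at smaller sizes.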

output of pd.show_versions()

For Python 2.7

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 25.1.6
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.0
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None

For Python 3.5

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 25.1.6
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.0
statsmodels: None
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.4.1
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

Results with Python 2.7

Sort took 40.91 seconds
Python version 2.7.12 |Anaconda custom (64-bit)| (default, Jun 29 2016, 11:07:13) [MSC v.1500 64 bit (AMD64)]
pandas version 0.18.1

Results with Python 3.5

Sort took 81.30 seconds
Python version 3.5.2 |Continuum Analytics, Inc.| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
pandas version 0.18.1
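
For reference, the size of the gap can be checked directly from the two timings reported above. This is only arithmetic on the numbers in this issue; the variable names are illustrative.

```python
# Timings copied from the results above, in seconds.
py27_seconds = 40.91
py35_seconds = 81.30

# Ratio of the Python 3.5.2 time to the Python 2.7.12 time.
slowdown = py35_seconds / py27_seconds
print("Python 3.5.2 took {:0.2f}x as long as Python 2.7.12".format(slowdown))
```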

Labels: Duplicate Report, Performance
