
PERF: numpy function like np.max called on DataFrame significantly slower than df.max #46874

Closed
@auderson


Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this issue exists on the latest version of pandas.

  • I have confirmed this issue exists on the main branch of pandas.

Reproducible Example

related: #45099

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 100))
%%time
_ = df.max()

CPU times: user 913 µs, sys: 1.6 ms, total: 2.52 ms
Wall time: 1.81 ms

%%time
_ = np.max(df)

CPU times: user 13.8 ms, sys: 290 µs, total: 14.1 ms
Wall time: 12.8 ms
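
For anyone reproducing this outside IPython, here is a rough sketch of the same comparison using `timeit` instead of the `%%time` magic. The helper name `time_call` and the repeat counts are my own choices, not anything from pandas, and the absolute numbers will differ by machine and pandas version.

```python
import timeit
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 100))


def time_call(fn, number=200):
    """Best per-call time in seconds over three repeats (hypothetical helper)."""
    with warnings.catch_warnings():
        # Hide any FutureWarning so it does not flood the output.
        # The arguments to warnings.warn are still evaluated inside
        # pandas, so this does not remove the overhead being measured.
        warnings.simplefilter("ignore")
        return min(timeit.repeat(fn, number=number, repeat=3)) / number


t_method = time_call(lambda: df.max())
t_numpy = time_call(lambda: np.max(df))

print(f"df.max():   {t_method * 1e6:.1f} µs per call")
print(f"np.max(df): {t_numpy * 1e6:.1f} µs per call")
```

Taking the minimum over repeats should reduce noise compared with a single `%%time` run, which also includes one-off costs on the first call.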

%%pyinstrument
_ = np.max(df)

[image: pyinstrument profile of np.max(df)]

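Looks like this is triggered by the FutureWarning raised here (np.max passes axis=None explicitly, so this branch is hit on every call):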

pandas/pandas/core/generic.py

Lines 10674 to 10683 in 8980af7

if axis is None and level is None and self.ndim > 1:
    # user must have explicitly passed axis=None
    # GH#21597
    warnings.warn(
        f"In a future version, DataFrame.{name}(axis=None) will return a "
        f"scalar {name} over the entire DataFrame. To retain the old "
        f"behavior, use 'frame.{name}(axis=0)' or just 'frame.{name}()'",
        FutureWarning,
        stacklevel=find_stack_level(),
    )
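
In the meantime, passing the axis explicitly should sidestep the `axis is None` branch quoted above. A minimal sketch, assuming the goal is the column-wise maximum that `df.max()` returns; the variable names are only for illustration, and I have not measured how much each variant saves.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 100))

# Column-wise maximum via the DataFrame method (default axis=0).
expected = df.max()

# Passing axis=0 through NumPy means pandas never receives axis=None,
# so the warning branch is not taken.
via_axis = np.max(df, axis=0)

# Alternatively, reduce on the underlying ndarray and rebuild the Series;
# this skips the DataFrame method dispatch entirely.
via_ndarray = pd.Series(df.to_numpy().max(axis=0), index=df.columns)

assert via_axis.equals(expected)
assert via_ndarray.equals(expected)
```

Both variants give the same Series as `df.max()`, so they can be swapped in where `np.max(df)` was being used for a per-column reduction.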

Installed Versions

INSTALLED VERSIONS

commit : 06d2301
python : 3.9.7.final.0
python-bits : 64
OS : Linux
OS-release : 5.8.0-63-generic
Version : #71-Ubuntu SMP Tue Jul 13 15:59:12 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.1
numpy : 1.21.5
pytz : 2022.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.0.4
Cython : 0.29.28
pytest : 6.2.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : 1.0.2
psycopg2 : 2.9.3
jinja2 : 3.0.3
IPython : 8.1.1
pandas_datareader: None
bs4 : 4.10.0
bottleneck : None
fastparquet : None
fsspec : 2022.02.0
gcsfs : None
matplotlib : 3.5.1
numba : 0.55.1
numexpr : 2.8.0
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 7.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
sqlalchemy : 1.4.32
tables : 3.7.0
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

Prior Performance

No response

Labels: Needs Triage, Performance
