
Enhancingperf documentation updates #24807

Closed
@smason

Description

I was going through the enhancingperf document recently and realised that this statement:

Note: Loops like this would be extremely slow in Python, but in Cython looping over NumPy arrays is fast.

doesn't seem to be true any more. Specifically, I can implement apply_integrate_f in pure Python as:

import numpy as np

def apply_integrate_pyf(df):
    # integrate_f_typed is the Cython-typed integrand from the enhancingperf doc;
    # np.fromiter fills the result array straight from a generator, one row at a time
    return np.fromiter((
        integrate_f_typed(*x) for x in zip(df['a'], df['b'], df['N'])
    ), float, len(df))

and get basically the same performance:

  • apply_integrate_f takes 1.27 ms
  • apply_integrate_f_wrap checks disabled takes 856 µs
  • my apply_integrate_pyf version takes 1.13 ms

(all of the above were run on my machine using Jupyter's %timeit, i.e. the mean of 7 runs)
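
For anyone reproducing the timings, something along these lines should work in a Jupyter session (a sketch only; the DataFrame and the ndarray-taking signature of apply_integrate_f follow the enhancingperf document's examples, so treat the exact calls as assumptions):

import numpy as np
import pandas as pd

# example DataFrame along the lines of the one in the enhancingperf doc
df = pd.DataFrame({'a': np.random.randn(1000),
                   'b': np.random.randn(1000),
                   'N': np.random.randint(100, 1000, 1000)})

%timeit apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)       # Cython loop over ndarrays
%timeit apply_integrate_f_wrap(df['a'].values, df['b'].values, df['N'].values)  # bounds/wraparound checks disabled
%timeit apply_integrate_pyf(df)                                                 # np.fromiter version above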

This feels like a much nicer way of eliding the creation of all those Series objects.
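
For comparison, the baseline the document starts from pushes every row through DataFrame.apply, roughly like this (again a sketch, reusing integrate_f_typed from the document):

# a Series is constructed for every row just so the lambda can pull out three scalars
df.apply(lambda row: integrate_f_typed(row['a'], row['b'], row['N']), axis=1)

With the integrand already typed, a large part of the remaining cost here is that per-row Series construction, which the generator-plus-np.fromiter version avoids.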

I could submit a pull request if updating this document seems worthwhile; git blame suggests it has mostly been receiving cosmetic changes for the last 6 years or so.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 40.6.3
Cython: 0.29.2
numpy: 1.15.4
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
