I was going through the enhancingperf document recently and realised that this statement:

> Note: Loops like this would be extremely slow in Python, but in Cython looping over NumPy arrays is fast.

doesn't seem to be true any more. Specifically, I can implement `apply_integrate_f` as:
```python
def apply_integrate_pyf(df):
    return np.fromiter(
        (integrate_f_typed(*x) for x in zip(df['a'], df['b'], df['N'])),
        float, len(df))
```
and get basically the same performance:

- `apply_integrate_f` takes 1.27 ms
- `apply_integrate_f_wrap` with checks disabled takes 856 µs
- my `apply_integrate_pyf` version takes 1.13 ms

(all of the above run on my computer using Jupyter `%timeit`, i.e. the mean of 7 runs; a self-contained sketch of the setup follows below)
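For context, a minimal, self-contained sketch of the kind of setup being timed here (assumptions: the DataFrame mirrors the one in the enhancingperf docs, and the pure-Python `integrate_f` below merely stands in for the Cython-compiled `integrate_f_typed`, which isn't reproduced here, so absolute timings will differ):

```python
import numpy as np
import pandas as pd

# DataFrame along the lines of the one used in the enhancingperf docs
df = pd.DataFrame({
    "a": np.random.randn(1000),
    "b": np.random.randn(1000),
    "N": np.random.randint(100, 1000, 1000),
})

# Pure-Python stand-ins for the Cython f_typed / integrate_f_typed pair
def f(x):
    return x * (x - 1)

def integrate_f(a, b, N):
    s = 0.0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx

def apply_integrate_pyf(df, integrate=integrate_f):
    # np.fromiter fills the result array straight from a generator,
    # so no per-row Series objects are materialised along the way
    return np.fromiter(
        (integrate(*x) for x in zip(df['a'], df['b'], df['N'])),
        float, len(df))

result = apply_integrate_pyf(df)  # in Jupyter: %timeit apply_integrate_pyf(df)
```

Passing the real `integrate_f_typed` as the `integrate` argument (a hypothetical parameter added here only so the sketch runs without Cython) reproduces the comparison above.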
This feels like a much nicer way of eliding the creation of all those `Series` objects.
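For comparison, the obvious pure-pandas route goes through `DataFrame.apply` with `axis=1`, which constructs one `Series` per row; that is the overhead the generator/`np.fromiter` version avoids. A rough sketch of that variant, reusing the stand-in `integrate_f` from above (the function name here is made up for illustration):

```python
def apply_integrate_f_rowwise(df, integrate=integrate_f):
    # axis=1 hands each row to the lambda as a freshly constructed Series;
    # int() is needed because the row is upcast to a common (float) dtype
    return df.apply(
        lambda row: integrate(row['a'], row['b'], int(row['N'])),
        axis=1,
    )
```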
I could submit a pull request if updating this document seems worthwhile; `git blame` says it has mostly been receiving cosmetic changes for the last 6 years or so.
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 40.6.3
Cython: 0.29.2
numpy: 1.15.4
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None