Description
Using df.to_dict(orient="records") on large DataFrames is significantly slower in pandas 1.3.0 than in 1.2.5: in the profiles below, the same call takes about 15.8 s instead of 1.8 s, roughly a 9x slowdown.
Could you please advise on what might be causing this?
Test dataframe
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100823 entries, 0 to 262141
Data columns (total 13 columns):
 #   Column           Non-Null Count   Dtype
---  ------           --------------   -----
 0   Class            100823 non-null  object
 1   x                100823 non-null  float64
 2   y                100823 non-null  float64
 3   z                100823 non-null  float64
 4   rgb              100823 non-null  object
 5   distance         100823 non-null  float64
 6   treecluster      100820 non-null  float64
 7   normal           100823 non-null  object
 8   color            100823 non-null  object
 9   rgb_distance     100823 non-null  object
 10  responsibility   100820 non-null  object
 11  vp_codes         100820 non-null  float64
 12  rgb_treecluster  100823 non-null  object
dtypes: float64(6), object(7)
memory usage: 10.8+ MB
Simple timing test
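A minimal, hypothetical timing test: it times `to_dict(orient="records")` on a synthetic frame whose column names and dtypes follow the `df.info()` summary. The random float values, the short strings in the object columns, and the helper names `make_test_frame` and `time_to_dict` are all assumptions, since the real data is not shown. Running it once under each pandas version should show whether the gap reproduces.

```python
import time

import numpy as np
import pandas as pd

# Column names and dtypes copied from the df.info() summary; the values are
# hypothetical placeholders because the original data is not available.
COLUMNS = {
    "Class": object, "x": float, "y": float, "z": float, "rgb": object,
    "distance": float, "treecluster": float, "normal": object,
    "color": object, "rgb_distance": object, "responsibility": object,
    "vp_codes": float, "rgb_treecluster": object,
}


def make_test_frame(n_rows: int = 100_823, seed: int = 0) -> pd.DataFrame:
    """Build a synthetic frame with the same shape and dtypes as the report."""
    rng = np.random.default_rng(seed)
    labels = np.array(["a", "b", "c"], dtype=object)  # assumed object contents
    data = {}
    for name, kind in COLUMNS.items():
        if kind is float:
            data[name] = rng.random(n_rows)  # float64 column
        else:
            # fancy-indexing an object array keeps dtype=object
            data[name] = labels[rng.integers(0, len(labels), n_rows)]
    return pd.DataFrame(data)


def time_to_dict(df: pd.DataFrame, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        df.to_dict(orient="records")
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    frame = make_test_frame()
    print(f"pandas {pd.__version__}: {time_to_dict(frame):.3f} s")
```

Taking the best of several runs reduces noise from other processes; the absolute numbers will depend on the machine, so only the ratio between versions is meaningful.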
Profiling
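The report headers (`<string>:2(<module>)`, "Ordered by: internal time") suggest the profiles were collected with `cProfile` and sorted by `tottime`. A minimal sketch of producing a comparable report with the standard library; the helper name `profile_to_dict` and the placeholder frame are assumptions, not the original script.

```python
import cProfile
import io
import pstats

import numpy as np
import pandas as pd


def profile_to_dict(df: pd.DataFrame, top: int = 10) -> str:
    """Profile df.to_dict(orient="records") and return the pstats report."""
    profiler = cProfile.Profile()
    profiler.enable()
    df.to_dict(orient="records")
    profiler.disable()
    out = io.StringIO()
    # Sorting by "tottime" is what pstats reports as "Ordered by: internal time"
    pstats.Stats(profiler, stream=out).sort_stats("tottime").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    # Placeholder frame (not the reporter's data): 13 float columns plus one object column
    frame = pd.DataFrame(np.random.default_rng(0).random((100_823, 13)))
    frame["label"] = "a"
    print(profile_to_dict(frame))
```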
Pandas 1.2.5
5547864 function calls (5547672 primitive calls) in 1.791 seconds
Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1310699    0.568    0.000    0.831    0.000 cast.py:137(maybe_box_datetimelike)
  1411522    0.351    0.000    1.181    0.000 frame.py:1601(<genexpr>)
   100824    0.288    0.000    0.288    0.000 frame.py:1596(<genexpr>)
        1    0.280    0.280    1.759    1.759 frame.py:1600(<listcomp>)
  2621687    0.263    0.000    0.263    0.000 {built-in method builtins.isinstance}
        1    0.028    0.028    1.791    1.791 <string>:2(<module>)
   100823    0.010    0.000    0.010    0.000 {method 'items' of 'dict' objects}
     83/3    0.002    0.000    0.002    0.001 {built-in method _abc._abc_subclasscheck}
       13    0.000    0.000    0.001    0.000 indexing.py:782(_getitem_lowerdim)
       26    0.000    0.000    0.000    0.000 generic.py:5467(__setattr__)
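The call counts in the 1.2.5 profile line up exactly with the frame's shape (100,823 rows x 13 columns), which points at per-cell work as the dominant cost. A quick check of that arithmetic, assuming cProfile counts every resumption of a generator as a separate call; mapping `frame.py:1601` to the per-row generator and `frame.py:1596` to the per-frame one is my reading of the profile, not something the report confirms.

```python
# Frame shape from the df.info() summary (100823 rows x 13 columns)
n_rows, n_cols = 100_823, 13

# One per-cell boxing call for every value in the frame
n_cells = n_rows * n_cols
# Per-row generator over the row's items: resumed once per column to yield a
# value, plus once more when it is exhausted
inner_genexpr_calls = n_rows * (n_cols + 1)
# Outer generator over rows: one resume per row plus the final, exhausting one
outer_genexpr_calls = n_rows + 1

print(n_cells, inner_genexpr_calls, outer_genexpr_calls)
```

These give 1310699, 1411522 and 100824, matching the `ncalls` of `maybe_box_datetimelike` and the two `<genexpr>` rows.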
Pandas 1.3.0
35794233 function calls (35794206 primitive calls) in 15.844 seconds
Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1310699    2.127    0.000   11.738    0.000 common.py:1578(_is_dtype_type)
   806581    2.013    0.000    5.492    0.000 base.py:425(find)
  1310696    1.942    0.000    8.020    0.000 common.py:1744(pandas_dtype)
  2218073    1.415    0.000    1.740    0.000 base.py:208(construct_from_string)
 12703947    1.410    0.000    1.410    0.000 {built-in method builtins.isinstance}
  1310699    1.066    0.000   14.387    0.000 cast.py:173(maybe_box_native)
  1310699    0.962    0.000   12.927    0.000 common.py:996(is_datetime_or_timedelta_dtype)
  1411522    0.598    0.000   14.985    0.000 frame.py:1823(<genexpr>)
        1    0.498    0.498   15.815   15.815 frame.py:1822(<listcomp>)
  1310699    0.344    0.000    0.535    0.000 common.py:146(<lambda>)
   100824    0.313    0.000    0.313    0.000 frame.py:1818(<genexpr>)
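In the 1.3.0 profile the number of per-cell calls is unchanged (1,310,699 calls to `maybe_box_native`), but each call now costs about 11 µs cumulatively (14.387 s / 1,310,699) versus about 0.6 µs for `maybe_box_datetimelike` in 1.2.5 (0.831 s / 1,310,699). Most of that time appears to sit under `is_datetime_or_timedelta_dtype` (12.927 s cumulative), which goes through `_is_dtype_type`, `pandas_dtype` and `construct_from_string`, i.e. a full dtype lookup for every cell. One possible workaround is to build the records from `itertuples`, which skips that per-cell check. This is a hypothetical helper (`records_via_itertuples`), not a pandas API, and it has not been benchmarked on the original data. One known difference: objects stored inside object columns are passed through as-is, whereas `to_dict` also converts NumPy scalars found there to Python scalars.

```python
from typing import Any, Dict, List

import numpy as np
import pandas as pd


def records_via_itertuples(df: pd.DataFrame) -> List[Dict[Any, Any]]:
    """Hypothetical workaround: build orient="records" output without
    per-cell dtype checks.

    itertuples(index=False, name=None) yields plain tuples; for
    NumPy-backed columns the values already come out as Python scalars
    (float, int, bool). Objects in object columns are passed through unchanged.
    """
    columns = list(df.columns)
    return [dict(zip(columns, row)) for row in df.itertuples(index=False, name=None)]


if __name__ == "__main__":
    df = pd.DataFrame({
        "x": np.array([1.5, 2.5]),
        "n": np.array([1, 2], dtype="int64"),
        "label": ["a", "b"],
    })
    fast = records_via_itertuples(df)
    print(fast)
    # Compare against the built-in for this simple frame
    print(fast == df.to_dict(orient="records"))
```

The equality check is only a sanity test on a small, NaN-free frame; NaN values would compare unequal even when both outputs contain them.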