BUG: df.to_dict(orient="records") significantly slower in Pandas 1.3.0 #42352

Closed
@kyri-petrou

Description

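Using df.to_dict(orient="records") with large dataframes is significantly slower in pandas 1.3.0 than in 1.2.5: for the test dataframe below, the profiled call takes about 15.8 seconds in 1.3.0 versus about 1.8 seconds in 1.2.5.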

Could you please advise on what might be causing this?

Test dataframe

<class 'pandas.core.frame.DataFrame'>
Int64Index: 100823 entries, 0 to 262141
Data columns (total 13 columns):
 #   Column           Non-Null Count   Dtype  
---  ------           --------------   -----  
 0   Class            100823 non-null  object 
 1   x                100823 non-null  float64
 2   y                100823 non-null  float64
 3   z                100823 non-null  float64
 4   rgb              100823 non-null  object 
 5   distance         100823 non-null  float64
 6   treecluster      100820 non-null  float64
 7   normal           100823 non-null  object 
 8   color            100823 non-null  object 
 9   rgb_distance     100823 non-null  object 
 10  responsibility   100820 non-null  object 
 11  vp_codes         100820 non-null  float64
 12  rgb_treecluster  100823 non-null  object 
dtypes: float64(6), object(7)
memory usage: 10.8+ MB
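
The original data is not attached, so a frame with the same column names and dtype mix could be approximated as below. This is only a sketch: `make_test_frame` is a hypothetical helper, the values are random placeholders, and the three missing values and the non-contiguous index of the real frame are not reproduced.

```python
import numpy as np
import pandas as pd

# Column names and dtypes copied from the df.info() output above.
COLUMNS = [
    ("Class", "object"), ("x", "float64"), ("y", "float64"), ("z", "float64"),
    ("rgb", "object"), ("distance", "float64"), ("treecluster", "float64"),
    ("normal", "object"), ("color", "object"), ("rgb_distance", "object"),
    ("responsibility", "object"), ("vp_codes", "float64"),
    ("rgb_treecluster", "object"),
]


def make_test_frame(n_rows=100_823, seed=0):
    """Hypothetical stand-in for the frame in the issue (placeholder values)."""
    rng = np.random.default_rng(seed)
    data = {}
    for name, dtype in COLUMNS:
        if dtype == "float64":
            data[name] = rng.random(n_rows)
        else:
            # The real object columns' contents are unknown; use short strings.
            labels = rng.integers(0, 10, n_rows).astype(str)
            data[name] = pd.Series(labels, dtype=object)
    return pd.DataFrame(data)


df = make_test_frame(1_000)  # small size for a quick check
```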

Simple timing test

[image: simple timing test output]
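
A timing check along these lines could be run under each pandas version to compare them. It is a minimal sketch, not the exact test from the screenshot: `time_to_dict_records` is a hypothetical helper, and the small two-column frame is a placeholder for the real data.

```python
import time

import numpy as np
import pandas as pd


def time_to_dict_records(df, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        df.to_dict(orient="records")
        best = min(best, time.perf_counter() - start)
    return best


# Placeholder frame mixing a float64 column and an object column.
df = pd.DataFrame({
    "x": np.arange(1_000, dtype="float64"),
    "label": pd.Series(["a"] * 1_000, dtype=object),
})

elapsed = time_to_dict_records(df)
print(f"pandas {pd.__version__}: {elapsed:.4f} s")
```

Taking the best of several runs reduces noise from other processes, so the numbers from the two versions are easier to compare.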

Profiling

Pandas 1.2.5

         5547864 function calls (5547672 primitive calls) in 1.791 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1310699    0.568    0.000    0.831    0.000 cast.py:137(maybe_box_datetimelike)
  1411522    0.351    0.000    1.181    0.000 frame.py:1601(<genexpr>)
   100824    0.288    0.000    0.288    0.000 frame.py:1596(<genexpr>)
        1    0.280    0.280    1.759    1.759 frame.py:1600(<listcomp>)
  2621687    0.263    0.000    0.263    0.000 {built-in method builtins.isinstance}
        1    0.028    0.028    1.791    1.791 <string>:2(<module>)
   100823    0.010    0.000    0.010    0.000 {method 'items' of 'dict' objects}
     83/3    0.002    0.000    0.002    0.001 {built-in method _abc._abc_subclasscheck}
       13    0.000    0.000    0.001    0.000 indexing.py:782(_getitem_lowerdim)
       26    0.000    0.000    0.000    0.000 generic.py:5467(__setattr__)

Pandas 1.3.0

         35794233 function calls (35794206 primitive calls) in 15.844 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1310699    2.127    0.000   11.738    0.000 common.py:1578(_is_dtype_type)
   806581    2.013    0.000    5.492    0.000 base.py:425(find)
  1310696    1.942    0.000    8.020    0.000 common.py:1744(pandas_dtype)
  2218073    1.415    0.000    1.740    0.000 base.py:208(construct_from_string)
 12703947    1.410    0.000    1.410    0.000 {built-in method builtins.isinstance}
  1310699    1.066    0.000   14.387    0.000 cast.py:173(maybe_box_native)
  1310699    0.962    0.000   12.927    0.000 common.py:996(is_datetime_or_timedelta_dtype)
  1411522    0.598    0.000   14.985    0.000 frame.py:1823(<genexpr>)
        1    0.498    0.498   15.815   15.815 frame.py:1822(<listcomp>)
  1310699    0.344    0.000    0.535    0.000 common.py:146(<lambda>)
   100824    0.313    0.000    0.313    0.000 frame.py:1818(<genexpr>)
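
Profiles like the two above, sorted by internal time, could be collected with the standard-library `cProfile` and `pstats` modules. The sketch below assumes that approach; `profile_to_dict_records` is a hypothetical helper and the small frame is a placeholder, so the call counts will differ from those in the issue.

```python
import cProfile
import io
import pstats

import numpy as np
import pandas as pd


def profile_to_dict_records(df, top=10):
    """Profile one to_dict(orient="records") call and return the report text."""
    profiler = cProfile.Profile()
    profiler.enable()
    df.to_dict(orient="records")
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "tottime" is time spent inside each function, excluding callees;
    # pstats labels this ordering "internal time".
    stats.sort_stats("tottime").print_stats(top)
    return buffer.getvalue()


# Placeholder frame mixing a float64 column and an object column.
df = pd.DataFrame({
    "x": np.arange(1_000, dtype="float64"),
    "label": pd.Series(["a"] * 1_000, dtype=object),
})

report = profile_to_dict_records(df)
print(report)
```

Comparing the top rows of the report between versions shows which functions account for the extra time: in the 1.3.0 profile above, most of it is under `maybe_box_native` and the dtype checks it calls.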

Labels

Dtype Conversions: Unexpected or buggy dtype conversions
Performance: Memory or execution speed performance
Regression: Functionality that used to work in a prior pandas version
