Performance issue with pandas/core/common.py -> maybe_box_datetimelike #30520

Closed
@ivan-vasilev

Code Sample, a copy-pastable example if possible

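# Imports the function relies on (as used in pandas/core/common.py):
from datetime import datetime, timedelta

import numpy as np

from pandas._libs import tslibs

# The existing implementation is: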
def maybe_box_datetimelike(value):
    # turn a datetime like into a Timestamp/timedelta as needed

    if isinstance(value, (np.datetime64, datetime)):
        value = tslibs.Timestamp(value)
    elif isinstance(value, (np.timedelta64, timedelta)):
        value = tslibs.Timedelta(value)

    return value

# Proposed improvement:
def maybe_box_datetimelike(value):
    # turn a datetime like into a Timestamp/timedelta as needed

    if isinstance(value, (np.datetime64, datetime)) and not isinstance(value, tslibs.Timestamp):
        value = tslibs.Timestamp(value)
    elif isinstance(value, (np.timedelta64, timedelta)):
        value = tslibs.Timedelta(value)

    return value

Problem description

This function checks whether value is an instance of np.datetime64 or datetime and, if so, converts it into a tslibs.Timestamp. However, tslibs.Timestamp is itself a subclass of datetime. Therefore, even if value is already a tslibs.Timestamp, it is needlessly converted again. This has a large performance impact when working with large DataFrames that contain datetime objects. The issue could be fixed by changing the condition
if isinstance(value, (np.datetime64, datetime)):
to:
if isinstance(value, (np.datetime64, datetime)) and not isinstance(value, tslibs.Timestamp):
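To illustrate the point, here is a minimal sketch of the guarded check. It assumes the public aliases pd.Timestamp and pd.Timedelta in place of tslibs.Timestamp and tslibs.Timedelta, and the function name maybe_box_datetimelike_guarded is hypothetical, chosen only to keep it distinct from the pandas internal. It is not the pandas implementation, just a way to see the subclass relationship and the effect of the extra isinstance test.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd


def maybe_box_datetimelike_guarded(value):
    """Box datetime-likes, skipping values that are already Timestamps.

    Hypothetical stand-in for the proposed change; uses the public
    pd.Timestamp / pd.Timedelta names instead of the internal tslibs ones.
    """
    if isinstance(value, (np.datetime64, datetime)) and not isinstance(
        value, pd.Timestamp
    ):
        value = pd.Timestamp(value)
    elif isinstance(value, (np.timedelta64, timedelta)):
        value = pd.Timedelta(value)
    return value


ts = pd.Timestamp("2019-12-27")

# Timestamp subclasses datetime, so the unguarded check matches it too.
already_datetime = isinstance(ts, datetime)

# With the guard, an existing Timestamp is returned as the same object.
same_object = maybe_box_datetimelike_guarded(ts) is ts

# Plain datetime and np.datetime64 values are still boxed.
boxed_dt = maybe_box_datetimelike_guarded(datetime(2019, 12, 27))
boxed_np = maybe_box_datetimelike_guarded(np.datetime64("2019-12-27"))

print(already_datetime, same_object, type(boxed_dt).__name__)
```

To gauge the impact on a given workload, the two variants can be compared with the standard-library timeit module on a representative sample of values.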

Labels: Performance (memory or execution speed performance)
