API: to_datetime with integers/floats and format desired behavior? #55663

Open
@jbrockmendel

>>> pd.to_datetime([20231024])
DatetimeIndex(['1970-01-01 00:00:00.020231024'], dtype='datetime64[ns]', freq=None)

>>> pd.to_datetime([20231024], format="%Y%m%d")
DatetimeIndex(['2023-10-24'], dtype='datetime64[ns]', freq=None)

I'm trying to iron out differences between several datetime-parsing paths. array_to_datetime and array_strptime (which to_datetime goes through when a format is specified) have different behavior for ints/floats. array_strptime will cast ints and (non-nan) floats to str and then send those through the string parsing, so the int 20231024 gets treated like the string "20231024", as in the second example above. Without a format, the same int is treated as nanoseconds since the epoch, as in the first example.
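For anyone wanting to reproduce the discrepancy through the public API only, here is a minimal sketch. It assumes a pandas version in which both calls still behave as shown in the examples above; the variable names are just for illustration.

```python
import pandas as pd

values = [20231024]

# No format given: the integer is read as an offset from the Unix epoch
# in the default unit (nanoseconds).
as_epoch = pd.to_datetime(values)

# Format given: the integer is effectively parsed like the string "20231024".
as_ymd = pd.to_datetime(values, format="%Y%m%d")

# Same input, two very different timestamps.
print(as_epoch[0] == pd.Timestamp(20231024))     # epoch-nanosecond reading
print(as_ymd[0] == pd.Timestamp("2023-10-24"))   # calendar-date reading
```

The only thing that changes between the two calls is the presence of `format`, which is what makes the divergence easy to miss.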

We have 77 tests that hit this path in array_strptime: 2 of them in test_sql, 25 of them in test_stata (though #55642 will get rid of those), and the rest in test_to_datetime.

(Note also that array_to_datetime_with_unit has a now-deprecated behavior casting strings to floats!)

Some options

  1. Change nothing, the status quo is fine
  2. Tell users to explicitly cast their ints/floats to strings if that's what they want
  3. Move away from allowing floats/ints in either array_to_datetime or array_strptime; all-numeric cases get their own path (DatetimeIndex does this with a check for infer_dtype(data) == "integer") and push users to do something explicit with mixed-type cases before getting to to_datetime
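To make options 2 and 3 concrete, here is a rough user-side sketch of what "do something explicit" could look like. `to_datetime_explicit` and its parameters are hypothetical names invented for this illustration, not an existing or proposed pandas API; the only real calls used are `pd.to_datetime`, `pd.Index.astype` and `pandas.api.types.infer_dtype`.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def to_datetime_explicit(data, *, numeric_format=None, unit="ns"):
    """Hypothetical helper: route all-integer and all-string input explicitly."""
    kind = infer_dtype(data, skipna=True)
    if kind == "integer":
        if numeric_format is not None:
            # Option 2: the caller opts in to string parsing by casting first.
            as_str = pd.Index(data).astype(str)
            return pd.to_datetime(as_str, format=numeric_format)
        # All-numeric path: interpret as epoch offsets in a stated unit.
        return pd.to_datetime(data, unit=unit)
    if kind == "string":
        return pd.to_datetime(data)
    # Option 3: mixed-type input has to be resolved by the caller first.
    raise TypeError(f"cast to a single type before parsing, got {kind!r}")


ymd = to_datetime_explicit([20231024], numeric_format="%Y%m%d")
epoch = to_datetime_explicit([1698105600], unit="s")
```

Both calls should land on 2023-10-24, but the caller has stated which interpretation they want, and a list such as `[20231024, "2023-10-25"]` raises instead of being silently coerced.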

Labels: API Design, Constructors, Datetime