ENH: make infer_datetime_format strict #48596

Closed
@MarcoGorelli


to_datetime has an argument infer_datetime_format which, if set to True, will guess the format from the first non-NaN row.

People (users, but also core devs, e.g. here and here) expect that the format inferred from the first row will be applied to the rest of the Series, i.e. that the following two calls should behave the same:

pd.to_datetime(['01-31-2000', '20-01-2000'], infer_datetime_format=True)
pd.to_datetime(['01-31-2000', '20-01-2000'], format='%m-%d-%Y')

However, they don't: the latter raises, whilst the former switches format midway.

Although this is documented in the user guide, it's not what people expect.

Making this argument strict would align better with people's expectations, and would also simplify the codebase, as it would get rid of special-casing such as

if not infer_datetime_format:
    if errors == "raise":
        raise
    elif errors == "coerce":
        result = np.empty(arg.shape, dtype="M8[ns]")
        iresult = result.view("i8")
        iresult.fill(iNaT)
    else:
        result = arg
else:
    # Indicates to the caller to fallback to objects_to_datetime64ns
    return None

TL;DR: I'm suggesting that when using infer_datetime_format=True, the format detected from the first non-NaN value should be used to parse the rest of the Series, exactly as if the user had passed it to format=.
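
To make the proposed semantics concrete, here is a minimal standard-library sketch of "infer once, then parse strictly". It is not how pandas implements inference: `CANDIDATE_FORMATS`, `guess_format` and `parse_strict` are hypothetical names made up for illustration, and the candidate list is deliberately tiny.

```python
from datetime import datetime

# Hypothetical candidate formats, tried in order. Real format inference
# handles far more cases; this list is an assumption for illustration.
CANDIDATE_FORMATS = ("%m-%d-%Y", "%d-%m-%Y", "%Y-%m-%d")


def guess_format(value):
    """Return the first candidate format that parses `value`, else None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            continue
        return fmt
    return None


def parse_strict(values):
    """Infer the format from the first non-null value, then apply it to all."""
    first = next((v for v in values if v is not None), None)
    if first is None:
        # Nothing to infer from: every element is null.
        return [None for _ in values]
    fmt = guess_format(first)
    if fmt is None:
        raise ValueError(f"could not infer a format from {first!r}")
    # No fallback: a value that does not match the inferred format raises.
    return [None if v is None else datetime.strptime(v, fmt) for v in values]
```

Under these assumptions, `parse_strict(['01-31-2000', '20-01-2000'])` raises `ValueError`, because the second value does not match the `%m-%d-%Y` format inferred from the first. That is the same outcome as passing `format='%m-%d-%Y'` explicitly.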

This would be one step towards addressing #12585

@pandas-dev/pandas-core any thoughts here?


EDIT: I'm hoping that #48621 can supersede this

Labels: Datetime (Datetime data dtype), Needs Discussion (Requires discussion from core team before further action)