to_datetime returning numpy.datetime64 #31649

Closed
@ecwootten

Code Sample, a copy-pastable example if possible

This code:

>>> import pandas as pd
>>> df = pd.DataFrame({'date': ['Aug2020', 'November 2020']})
>>> df['parsed'] = df['date'].apply(pd.to_datetime)
>>> end = df.loc[df['parsed'].idxmax()]
>>> end['parsed'].replace(day=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.datetime64' object has no attribute 'replace'

worked in pandas 0.25.3, but raises as of 1.0.0.

I think there might be an issue with unboxing values when there are mixed types in the dataframe:

>>> df = pd.DataFrame({'date': ['Aug2020', 'November 2020']})
>>> new = (
...     df
...     .assign(
...         parsed=lambda x: x['date'].apply(pd.to_datetime),
...         parsed2 = lambda x: x['date'].apply(pd.to_datetime)
...     )
... )
>>> new['parsed'].iloc[0]
Timestamp('2020-08-01 00:00:00')
>>> new.iloc[0]['parsed']
numpy.datetime64('2020-08-01T00:00:00.000000000') # unboxed type
>>> new2 = new.drop(columns=['date'])
>>> new2['parsed'].iloc[0]
Timestamp('2020-08-01 00:00:00')
>>> new2.iloc[0]['parsed']
Timestamp('2020-08-01 00:00:00') # boxed type now that we've dropped the string column
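One way to probe the mixed-types hypothesis is to compare the dtype of the row Series in each case. The sketch below is only an illustration: it builds the datetime columns directly from ISO strings (an assumption made so that the snippet does not depend on how 'Aug2020' is parsed), and the names `mixed` and `uniform` are made up for the example.

```python
import pandas as pd

# Assumption: datetime columns built directly, not via .apply(pd.to_datetime)
parsed = pd.to_datetime(["2020-08-01", "2020-11-01"])

# Frame with a string column next to two datetime columns
mixed = pd.DataFrame(
    {"date": ["Aug2020", "November 2020"], "parsed": parsed, "parsed2": parsed}
)
# Same frame with only the datetime columns left
uniform = mixed.drop(columns=["date"])

# Selecting a row has to fit every column's value into one Series
mixed_row = mixed.iloc[0]      # strings and datetimes together: object dtype
uniform_row = uniform.iloc[0]  # datetimes only: a datetime64 dtype

print(mixed_row.dtype)
print(uniform_row.dtype)
```

If the row Series is object dtype only in the mixed case, the difference in scalar type would come from how values are stored in that object row, not from to_datetime itself.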

Problem description

Values parsed with to_datetime can "sometimes" come back as np.datetime64 instead of Timestamp, depending on how they are accessed.

np.datetime64 is not a valid return type for to_datetime (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html); the result should always be a datetimelike.

Expected Output

As in previous versions of Pandas:

>>> df = pd.DataFrame({'date': ['Aug2020', 'November 2020']})
>>> df['parsed'] = df['date'].apply(pd.to_datetime)
>>> end = df.loc[df['parsed'].idxmax()]
>>> end['parsed'].replace(day=2)
Timestamp('2020-11-02 00:00:00')
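Until the behaviour is restored, a possible workaround (a sketch, not a confirmed fix) is to select the column before the row, or to re-box the scalar with `pd.Timestamp`, which accepts both `np.datetime64` and `Timestamp` inputs. As an assumption for portability, the `parsed` column here is built from ISO strings instead of `.apply(pd.to_datetime)`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "date": ["Aug2020", "November 2020"],
        # Assumption: column built directly so the example is self-contained
        "parsed": pd.to_datetime(["2020-08-01", "2020-11-01"]),
    }
)

idx = df["parsed"].idxmax()

# Option 1: column first, then row, so the value comes from the datetime column
latest = df["parsed"].loc[idx]

# Option 2: keep the row-first access, but re-box whatever scalar comes back
row_value = df.loc[idx]["parsed"]
boxed = pd.Timestamp(row_value)

# Re-boxing also handles a raw numpy scalar like the one in the traceback
from_numpy = pd.Timestamp(np.datetime64("2020-11-01"))

print(latest.replace(day=2))
print(boxed.replace(day=2))
print(from_numpy.replace(day=2))
```

Option 2 is the more defensive of the two, since it does not rely on which scalar type the indexing path happens to return.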

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : None.None

pandas : 1.0.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 41.2.0
Cython : 0.29.14
pytest : 5.3.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : None

Labels

Datetime: Datetime data dtype
Indexing: Related to indexing on series/frames, not to indexes themselves
Regression: Functionality that used to work in a prior pandas version