Description
Code Sample, a copy-pastable example if possible
import pandas as pd
datetime_index = pd.DatetimeIndex(['2017-08-01', '2017-08-02'])
print(datetime_index)
# DatetimeIndex(['2017-08-01', '2017-08-02'], dtype='datetime64[ns]', freq=None)
print(datetime_index.dtype)
# datetime64[ns]
infered_dtype = pd.api.types.infer_dtype(datetime_index, skipna=True)
print(infered_dtype)
# datetime64
print(pd.Index(['2017-08-01', '2017-08-02'], dtype=infered_dtype))
# Traceback (most recent call last):
# File "/tmp/a.py", line 8, in <module>
# print(pd.Index(['2017-08-01', '2017-08-02'], dtype=infered_dtype))
# File "/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py", line 308, in __new__
# dtype=dtype, **kwargs)
# File "/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/datetimes.py", line 303, in __new__
# int_as_wall_time=True)
# File "/usr/local/lib/python3.7/dist-packages/pandas/core/arrays/datetimes.py", line 368, in _from_sequence
# ambiguous=ambiguous, int_as_wall_time=int_as_wall_time)
# File "/usr/local/lib/python3.7/dist-packages/pandas/core/arrays/datetimes.py", line 1706, in sequence_to_dt64ns
# dtype = _validate_dt64_dtype(dtype)
# File "/usr/local/lib/python3.7/dist-packages/pandas/core/arrays/datetimes.py", line 1993, in _validate_dt64_dtype
# .format(dtype=dtype))
# ValueError: Unexpected value for 'dtype': 'datetime64'. Must be 'datetime64[ns]' or DatetimeTZDtype'.
Problem description
With pandas 0.24.0rc1, we can't use the dtype inferred from a DatetimeIndex to convert an Index to a DatetimeIndex.
pyarrow uses this logic to convert Arrow objects to pandas objects.
FYI: Here is the related code:
- https://github.com/apache/arrow/blob/master/python/pyarrow/pandas_compat.py#L36
- https://github.com/apache/arrow/blob/master/python/pyarrow/pandas_compat.py#L114
- https://github.com/apache/arrow/blob/master/python/pyarrow/pandas_compat.py#L235
- https://github.com/apache/arrow/blob/master/python/pyarrow/pandas_compat.py#L789
FYI: Here is the resulting error in a pyarrow test: https://travis-ci.org/kszucs/crossbow/builds/478558634#L2724-L2788
#24478 introduces validation for datetime64, but pd.api.types.infer_dtype still returns 'datetime64' for a DatetimeIndex.
The following change fixes the problem, but I'm not sure whether this is a regression or whether pyarrow's usage is wrong.
diff --git a/pandas/_libs/lib.pyx b/pandas/_libs/lib.pyx
index 85eb6c342..6271d1204 100644
--- a/pandas/_libs/lib.pyx
+++ b/pandas/_libs/lib.pyx
@@ -928,7 +928,7 @@ _TYPE_MAP = {
'U': 'unicode' if PY2 else 'string',
'bool': 'boolean',
'b': 'boolean',
- 'datetime64[ns]': 'datetime64',
+ 'datetime64[ns]': 'datetime64[ns]',
'M': 'datetime64',
'timedelta64[ns]': 'timedelta64',
'm': 'timedelta64',
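For comparison, here is a minimal caller-side sketch that avoids the problem without patching pandas. It assumes the caller still has the original DatetimeIndex available, so it can pass the index's own `.dtype` to the constructor instead of the label returned by `infer_dtype`. The variable names `label`, `concrete_dtype`, and `rebuilt` are mine, for illustration only, and this is not necessarily how pyarrow should handle it.

```python
import pandas as pd

datetime_index = pd.DatetimeIndex(['2017-08-01', '2017-08-02'])

# infer_dtype returns a descriptive type label (a plain string);
# for a DatetimeIndex the label carries no unit.
label = pd.api.types.infer_dtype(datetime_index, skipna=True)
print(label)

# The index's own dtype is a concrete dtype that includes the unit,
# so it can be passed back to the Index constructor.
concrete_dtype = datetime_index.dtype
rebuilt = pd.Index(['2017-08-01', '2017-08-02'], dtype=concrete_dtype)
print(rebuilt)
```

The trade-off is that this only works where the concrete dtype is still at hand; code that stores only the `infer_dtype` label (for example in serialized metadata) would need to map the label back to a concrete dtype itself.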
Expected Output
import pandas as pd
datetime_index = pd.DatetimeIndex(['2017-08-01', '2017-08-02'])
print(datetime_index)
# DatetimeIndex(['2017-08-01', '2017-08-02'], dtype='datetime64[ns]', freq=None)
print(datetime_index.dtype)
# datetime64[ns]
infered_dtype = pd.api.types.infer_dtype(datetime_index, skipna=True)
print(infered_dtype)
# datetime64[ns]
print(pd.Index(['2017-08-01', '2017-08-02'], dtype=infered_dtype))
# DatetimeIndex(['2017-08-01', '2017-08-02'], dtype='datetime64[ns]', freq=None)
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.19.0-1-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: ja_JP.UTF-8
LOCALE: ja_JP.UTF-8
pandas: 0.24.0rc1
pytest: 3.10.1
pip: 18.1
setuptools: 40.6.2
Cython: 0.29.2
numpy: 1.16.0rc2
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.9
blosc: None
bottleneck: None
tables: 3.4.4
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None