Closed
Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd

pd.util.hash_array(pd.DatetimeIndex(['2018-10-28 01:20:00'], tz='Europe/Berlin'))
Output:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-389898c5af02> in <module>
----> 1 pd.util.hash_array(pd.DatetimeIndex(['2018-10-28 01:20:00'], tz='Europe/Berlin'))
~/.pyenv/versions/3.8.6/envs/beam/lib/python3.8/site-packages/pandas/core/util/hashing.py in hash_array(vals, encoding, hash_key, categorize)
255 return _hash_categorical(vals, encoding, hash_key)
256 elif is_extension_array_dtype(dtype):
--> 257 vals, _ = vals._values_for_factorize()
258 dtype = vals.dtype
259
AttributeError: 'DatetimeIndex' object has no attribute '_values_for_factorize'
Apparently datetime64[ns, Europe/Berlin] is an extension array dtype, but DatetimeIndex has no _values_for_factorize method. I've reproduced this on pandas 1.1.4, 1.2.4, and on master (503ce50).
Problem description
pd.util.hash_array works with other Indexes, including a timezone-naive DatetimeIndex, so it seems reasonable to expect it to work with a timezone-aware DatetimeIndex (or at least to raise a more informative error).
Expected Output
Output should be similar to that of a timezone-naive DatetimeIndex:
In [3]: pd.util.hash_array(pd.DatetimeIndex(['2018-10-28 01:20:00']))
Out[3]: array([3152239034440746192], dtype=uint64)
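
In the meantime, a possible workaround is sketched below. It is not a fix for hash_array itself, and it assumes that pd.util.hash_pandas_object accepts the Index directly and that hashing the underlying int64 values (idx.asi8, nanoseconds since the epoch in UTC) is acceptable for the use case. The variable names are only illustrative, and I have not checked whether the two options produce identical hashes.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2018-10-28 01:20:00"], tz="Europe/Berlin")

# Option 1: hash_pandas_object takes the Index itself and returns a
# uint64 Series with one hash per element.
hashed_index = pd.util.hash_pandas_object(idx, index=False)

# Option 2: hand hash_array a plain int64 ndarray of the underlying
# UTC-based integer timestamps instead of the timezone-aware Index.
hashed_i8 = pd.util.hash_array(idx.asi8)

print(hashed_index.to_numpy())
print(hashed_i8)
```

Note that both options hash the UTC instant rather than the local wall-clock time, so the time zone itself is not part of the hash.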