BUG: 1.3.0rc1 pd.util.hash_array on Index fails to access _values_for_factorize #42003

Closed
@TheNeuralBit

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas. (does not exist in 1.2.4, only seen in 1.3.0rc1 and master)

  • (optional) I have confirmed this bug exists on the master branch of pandas. (confirmed on 0b68d87)


Code Sample, a copy-pastable example

In [1]: import pandas as pd

In [2]: pd.util.hash_array(pd.DatetimeIndex(['2018-10-28 01:20:00'], tz='Europe/Berlin'))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-389898c5af02> in <module>
----> 1 pd.util.hash_array(pd.DatetimeIndex(['2018-10-28 01:20:00'], tz='Europe/Berlin'))

~/working_dir/pandas/pandas/core/util/hashing.py in hash_array(vals, encoding, hash_key, categorize)
    287     elif not isinstance(vals, np.ndarray):
    288         # i.e. ExtensionArray
--> 289         vals, _ = vals._values_for_factorize()
    290 
    291     return _hash_ndarray(vals, encoding, hash_key, categorize)

AttributeError: 'DatetimeIndex' object has no attribute '_values_for_factorize'

In [3]: pd.util.hash_array(pd.Index([1,2,3]))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-e66efa244441> in <module>
----> 1 pd.util.hash_array(pd.Index([1,2,3]))

~/working_dir/pandas/pandas/core/util/hashing.py in hash_array(vals, encoding, hash_key, categorize)
    287     elif not isinstance(vals, np.ndarray):
    288         # i.e. ExtensionArray
--> 289         vals, _ = vals._values_for_factorize()
    290 
    291     return _hash_ndarray(vals, encoding, hash_key, categorize)

AttributeError: 'Int64Index' object has no attribute '_values_for_factorize'

In [4]: pd.util.hash_array(pd.RangeIndex(1,3))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-af0900aae979> in <module>
----> 1 pd.util.hash_array(pd.RangeIndex(1,3))

~/working_dir/pandas/pandas/core/util/hashing.py in hash_array(vals, encoding, hash_key, categorize)
    287     elif not isinstance(vals, np.ndarray):
    288         # i.e. ExtensionArray
--> 289         vals, _ = vals._values_for_factorize()
    290 
    291     return _hash_ndarray(vals, encoding, hash_key, categorize)

AttributeError: 'RangeIndex' object has no attribute '_values_for_factorize'

Problem description

This issue looks similar to #41817, but that one is specific to DatetimeIndex with a tz defined, whereas this failure appears to occur for any Index instance.
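
One way to see why the `# i.e. ExtensionArray` branch in the traceback is reached: an Index is neither an `np.ndarray` nor an ExtensionArray, so an `isinstance(vals, np.ndarray)` check alone does not tell the two apart. A minimal sketch of that check, using only public pandas API (the variable names are illustrative, not taken from pandas internals):

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray

idx = pd.Index([1, 2, 3])

# An Index wraps its data; it is not itself an ndarray or an ExtensionArray.
is_ndarray = isinstance(idx, np.ndarray)
is_extension_array = isinstance(idx, ExtensionArray)

# The underlying data is reachable through public accessors.
values = idx.to_numpy()  # plain NumPy array
wrapped = idx.array      # ExtensionArray wrapper around the same data

print(is_ndarray, is_extension_array)
print(type(values).__name__, isinstance(wrapped, ExtensionArray))
```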

Expected Output

A hash of the input index, as in pandas < 1.3.0.
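
Until this is resolved, a possible workaround is to hand `hash_array` the underlying NumPy values instead of the Index itself, or to use `pd.util.hash_pandas_object`, which accepts an Index directly. The sketch below assumes plain integer indexes; `hash_index_values` is a hypothetical helper, not a pandas function, and the tz-aware DatetimeIndex case is not covered here because its NumPy conversion may differ.

```python
import numpy as np
import pandas as pd


def hash_index_values(index: pd.Index) -> np.ndarray:
    """Hypothetical helper: hash an Index by passing its NumPy values to hash_array."""
    return pd.util.hash_array(np.asarray(index))


int_hashes = hash_index_values(pd.Index([1, 2, 3]))
range_hashes = hash_index_values(pd.RangeIndex(1, 3))

# hash_pandas_object accepts an Index directly and returns a Series of uint64 hashes.
via_object = pd.util.hash_pandas_object(pd.Index([1, 2, 3]), index=False)

print(int_hashes.dtype, len(int_hashes), len(range_hashes))
```

Each element is hashed independently, so the hashes for `RangeIndex(1, 3)` should match the first two hashes of `Index([1, 2, 3])`.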

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : 2dd9e9b
python : 3.8.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.28-1rodete1-amd64
Version : #1 SMP Debian 5.10.28-1rodete1 (2021-04-30)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.0rc1
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.1
pip : 20.2.1
setuptools : 49.2.1
Cython : 0.29.22
pytest : 6.2.2
hypothesis : 6.4.0
sphinx : 3.5.1
blosc : 1.10.2
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.21.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.7
fastparquet : 0.5.0
gcsfs : 0.7.2
matplotlib : 3.3.4
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.6
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : 0.5.2
scipy : 1.6.1
sqlalchemy : 1.3.23
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.17.0
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.52.0
