
Regression in hash_pandas_object with hash_key=None and object dtype #30887

Closed
@TomAugspurger


On 1.0.0rc0, the following call raises an AttributeError:

In [7]: pd.util.hash_pandas_object(pd.Series(['a', 'b']), hash_key=None)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-00d1f153287e> in <module>
----> 1 pd.util.hash_pandas_object(pd.Series(['a', 'b']), hash_key=None)

~/sandbox/pandas/pandas/core/util/hashing.py in hash_pandas_object(obj, index, encoding, hash_key, categorize)
     93
     94     elif isinstance(obj, ABCSeries):
---> 95         h = hash_array(obj.values, encoding, hash_key, categorize).astype(
     96             "uint64", copy=False
     97         )

~/sandbox/pandas/pandas/core/util/hashing.py in hash_array(vals, encoding, hash_key, categorize)
    302             codes, categories = factorize(vals, sort=False)
    303             cat = Categorical(codes, Index(categories), ordered=False, fastpath=True)
--> 304             return _hash_categorical(cat, encoding, hash_key)
    305
    306         try:

~/sandbox/pandas/pandas/core/util/hashing.py in _hash_categorical(c, encoding, hash_key)
    221     # Convert ExtensionArrays to ndarrays
    222     values = np.asarray(c.categories.values)
--> 223     hashed = hash_array(values, encoding, hash_key, categorize=False)
    224
    225     # we have uint64, as we don't directly support missing values

~/sandbox/pandas/pandas/core/util/hashing.py in hash_array(vals, encoding, hash_key, categorize)
    305
    306         try:
--> 307             vals = hashing.hash_object_array(vals, hash_key, encoding)
    308         except TypeError:
    309             # we have mixed types

~/sandbox/pandas/pandas/_libs/hashing.pyx in pandas._libs.hashing.hash_object_array()

AttributeError: 'NoneType' object has no attribute 'encode'

On 0.25.3, the same call succeeds:

In [7]: pd.util.hash_pandas_object(pd.Series(['a', 'b']), hash_key=None)
Out[7]:
0     4578374827886788867
1    17338122309987883691
dtype: uint64
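A minimal regression check could compare the two calls directly. It assumes the intended behaviour is the 0.25.3 one, where hash_key=None is treated like the default key. The helper name check_none_key_matches_default is hypothetical and is not part of the pandas test suite.

```python
import pandas as pd
import pandas.testing as tm


def check_none_key_matches_default(obj):
    """Hypothetical helper: hash_key=None should match the default-key hashes."""
    expected = pd.util.hash_pandas_object(obj)  # default hash_key
    result = pd.util.hash_pandas_object(obj, hash_key=None)
    tm.assert_series_equal(result, expected)


# Object dtype: the case that raised AttributeError on 1.0.0rc0
check_none_key_matches_default(pd.Series(["a", "b"]))
# Numeric dtype, for comparison
check_none_key_matches_default(pd.Series([1, 2]))
```

On an affected version, the first call raises the AttributeError shown above instead of reaching the assertion.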

The error occurs only for object dtype.
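On an affected version, one possible workaround is to avoid passing hash_key=None at all and let pandas fall back to its default key. The wrapper below, hash_with_optional_key, is a hypothetical helper and not a pandas API. It forwards any other keyword arguments (index, encoding, categorize) unchanged.

```python
import pandas as pd


def hash_with_optional_key(obj, hash_key=None, **kwargs):
    """Hypothetical wrapper: omit hash_key when it is None so pandas uses its default."""
    if hash_key is None:
        # Don't forward None: for object dtype on 1.0.0rc0 it reaches
        # hash_object_array, which calls .encode() on the key
        return pd.util.hash_pandas_object(obj, **kwargs)
    return pd.util.hash_pandas_object(obj, hash_key=hash_key, **kwargs)


hashes = hash_with_optional_key(pd.Series(["a", "b"]), hash_key=None)
```

An explicit key still goes straight through to hash_pandas_object. For object dtype, it must encode to exactly 16 bytes.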

Labels: Regression (functionality that used to work in a prior pandas version), hashing (hash_pandas_object)