Unstable hashtable / duplicated algo for object dtype #27035

Open
@jorisvandenbossche

From a flaky test in geopandas, I observed the following behaviour:

In [1]: pd.__version__
Out[1]: '0.25.0.dev0+791.gf0919f272'

In [2]: from shapely.geometry import Point 

In [3]: a = np.array([Point(1, 1), Point(1, 1)], dtype=object) 

In [4]: pd.Series(a).duplicated()
Out[4]: 
0    False
1     True
dtype: bool

In [6]: print(pd.Series(a).duplicated()) 
   ...: print(pd.Series(a).duplicated())
0    False
1     True
dtype: bool
0    False
1    False
dtype: bool

So the result is not stable: the same call sometimes flags the second element as a duplicate and sometimes does not.
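To put a number on the flakiness, a small helper along the following lines could tally the distinct outputs over many repeated calls. This is only a sketch: `tally_results` is a hypothetical name introduced here for illustration, not something pandas provides.

```python
from collections import Counter
from typing import Callable, Iterable


def tally_results(fn: Callable[[], Iterable], n: int = 100) -> Counter:
    """Call fn n times and count how often each distinct result occurs."""
    counts = Counter()
    for _ in range(n):
        # Convert the result to a tuple so it can serve as a Counter key.
        counts[tuple(fn())] += 1
    return counts
```

For the case above it could be called as `tally_results(lambda: pd.Series(a).duplicated())`; a stable implementation should yield a single key.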

I am also not sure how the object hashtable works here (assuming duplicated uses the hashtable at all), since shapely Point objects are not hashable:

In [9]: pd.Series(a).unique()
...
TypeError: unhashable type: 'Point'
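For context, one common way a Python object ends up unhashable is by defining `__eq__` without `__hash__`. The sketch below shows this with `EqOnlyPoint`, a hypothetical stand-in class; whether shapely's Point is unhashable for this exact reason is an assumption, not something confirmed here.

```python
class EqOnlyPoint:
    """Hypothetical stand-in for a geometry type; not the shapely class."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    # Defining __eq__ without __hash__ makes Python set __hash__ to None.
    def __eq__(self, other):
        return isinstance(other, EqOnlyPoint) and (self.x, self.y) == (other.x, other.y)


p, q = EqOnlyPoint(1, 1), EqOnlyPoint(1, 1)
assert p == q  # equal by value, but distinct objects

try:
    hash(p)
except TypeError as exc:
    message = str(exc)  # same "unhashable type" wording as the traceback above
```

Two such objects compare equal but cannot be hashed, so any hash-based lookup has to either raise or fall back to something other than the value, such as object identity.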

Metadata


    Labels

    Bug, duplicated (duplicated, drop_duplicates)
