.str._validate should infer for Series, not raise for all-na Index #23163

Closed
@h-vetinari

While working on #22725, a couple of workarounds were necessary to correctly raise on wrong data types hiding behind object dtype, e.g.

>>> pd.Series([1, 2, 3], dtype=object).str.cat([1, 2, 3])

However, the .str accessor itself should already raise in __init__, or more precisely in the internal _validate method (this is closely related to #23011), i.e. instead of

>>> pd.Series([1,2,3], dtype=object).str
<pandas.core.strings.StringMethods object at 0x000002A4C70AE198>

it should be

>>> pd.Series([1,2,3], dtype=object).str
AttributeError

Interestingly, Index already infers correctly in .str._validate:

>>> pd.Index([1,2,3], dtype=object).str
AttributeError: Can only use .str accessor with string values (i.e. inferred_type is 'string', 'unicode' or 'mixed')

However, there is another nit I want to fix along with the inference for Series: an all-NA object Index (or Series) should not raise the AttributeError. Currently it does, because all-NA data gets inferred as float. There are legitimate cases where a selection of string data ends up all-NA (through alignment, for example), and if the dtype is object, this shouldn't fail.
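
To make the proposed behaviour concrete, here is a minimal pure-Python sketch of the validation logic. It is not the pandas implementation: `infer_kind`, `validate_str_accessor`, and the `ALLOWED` set are hypothetical names, and the inference is a toy stand-in that only mimics the idea of skipping NA values before deciding whether `.str` should be allowed.

```python
import math

# Assumption: kinds for which the accessor is allowed. "empty" here stands
# for "nothing left after dropping NA", i.e. the all-NA object case.
ALLOWED = {"string", "mixed", "empty"}


def is_na(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def infer_kind(values, skipna=True):
    """Toy stand-in for type inference on object-dtype data."""
    if skipna:
        values = [v for v in values if not is_na(v)]
    if not values:
        return "empty"
    kinds = set()
    for v in values:
        if isinstance(v, str):
            kinds.add("string")
        elif isinstance(v, bool):  # check bool before int (bool is an int)
            kinds.add("boolean")
        elif isinstance(v, int):
            kinds.add("integer")
        elif isinstance(v, float):
            kinds.add("floating")
        else:
            kinds.add("other")
    return kinds.pop() if len(kinds) == 1 else "mixed"


def validate_str_accessor(values):
    """Raise for non-string data, but let all-NA object data through."""
    kind = infer_kind(values, skipna=True)
    if kind not in ALLOWED:
        raise AttributeError(
            "Can only use .str accessor with string values "
            f"(inferred kind was {kind!r})"
        )
    return kind
```

In this sketch, `validate_str_accessor([1, 2, 3])` raises, while an all-NA input such as `[float("nan"), None]` passes. Calling `infer_kind` with `skipna=False` on the same all-NA input yields `"floating"`, which is the kind of result that causes the unwanted AttributeError described above.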

Edit: xref #9343 #13877

Labels: Bug, Strings
