Description
While working on #22725, a couple of workarounds were necessary to correctly raise on wrong data types hiding behind the object dtype, e.g.
>>> pd.Series([1, 2, 3], dtype=object).str.cat([1, 2, 3])
However, the .str accessor itself should already raise in __init__, or rather in the internal _validate method (this is closely related to #23011). That is, instead of
>>> pd.Series([1,2,3], dtype=object).str
<pandas.core.strings.StringMethods object at 0x000002A4C70AE198>
it should be
>>> pd.Series([1,2,3], dtype=object).str
AttributeError
Interestingly, Index already infers the type correctly in .str._validate:
>>> pd.Index([1,2,3], dtype=object).str
AttributeError: Can only use .str accessor with string values (i.e. inferred_type is 'string', 'unicode' or 'mixed')
However, there is another nit here that I want to fix at the same time as the inference for Series: an all-NA object Index (or Series) should not raise the AttributeError. Currently it does, because all-NA data gets inferred as float. There are legitimate cases where a selection of string data may be all-NA (e.g. after alignment), and if the dtype is object, this shouldn't fail.
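The proposed rule (raise on object data without strings, but let all-NA object data through) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the pandas implementation: str_accessor_allowed and validate_str_accessor are hypothetical names, the input is assumed to already have object dtype, missing values are simplified to None and float NaN, and "at least one non-NA value is a str" is only a loose stand-in for the 'string', 'unicode' and 'mixed' inferred types that pandas actually checks (its real inference is finer-grained).

```python
import math
from typing import Iterable


def _is_na(value: object) -> bool:
    # Simplified missing-value test: only None and float NaN count as NA
    # (pandas' own isna recognizes more, e.g. pd.NaT).
    return value is None or (isinstance(value, float) and math.isnan(value))


def str_accessor_allowed(values: Iterable[object]) -> bool:
    """Hypothetical check for object-dtype data behind a .str accessor."""
    non_na = [v for v in values if not _is_na(v)]
    if not non_na:
        # All-NA (or empty) object data: allowed under the proposal,
        # instead of being inferred as float and rejected.
        return True
    # Loose stand-in for the 'string'/'unicode'/'mixed' inferred types:
    # accept as long as at least one non-NA value is a str.
    return any(isinstance(v, str) for v in non_na)


def validate_str_accessor(values: Iterable[object]) -> None:
    """Hypothetical validator: raise for object data without string values."""
    if not str_accessor_allowed(values):
        raise AttributeError("Can only use .str accessor with string values")


if __name__ == "__main__":
    print(str_accessor_allowed(["a", None, "c"]))      # True
    print(str_accessor_allowed([None, float("nan")]))  # True: all-NA
    print(str_accessor_allowed([1, 2, 3]))             # False
```

The key design point is ordering: missing values are dropped and the all-NA case is handled before any type inference happens, so all-NA object data never gets the chance to be inferred as float.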