Description
This issue discusses a future deprecation of the `.str` accessor on object-dtype arrays.
It's probably blocked by making `Series(['a', 'b', 'c'])` infer `StringDtype` rather than object dtype.
In #29640, we split the implementation of the `.str` methods into two: one for the old object-dtype arrays and one for `StringArray`.
The `StringArray` implementation is much nicer, since we know the result dtype statically for most methods. We don't need to worry about the presence of NAs changing ints to floats or bools to object.
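A minimal sketch of the dtype difference, using only public pandas calls. On object dtype, the result dtype of `.str.len()` depends on whether any value is missing. On `StringDtype`, it is always the nullable `Int64`, and boolean methods return the nullable `boolean` dtype.

```python
import pandas as pd

# The same values backed by object dtype vs. StringDtype ("string").
obj = pd.Series(["a", "bb", None], dtype=object)
obj_no_na = pd.Series(["a", "bb"], dtype=object)
strs = pd.Series(["a", "bb", None], dtype="string")

# Object dtype: lengths are int64 without NAs, but a single NA forces float64.
obj_len = obj.str.len()
obj_no_na_len = obj_no_na.str.len()

# StringArray: the result dtype is fixed regardless of NAs (nullable Int64).
str_len = strs.str.len()

# Boolean-returning methods give the nullable "boolean" dtype on StringArray.
str_starts = strs.str.startswith("a")

print(obj_len.dtype, obj_no_na_len.dtype, str_len.dtype, str_starts.dtype)
```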
Given that it's nicer for users (faster, more predictable) and nicer for us (we can clean up old code), we should deprecate the `.str` accessor on object dtype.
There are a few complications. Certain `.str` methods aren't actually methods called on scalar strings:
- `.str.join` can work with a Series where each row is a list of strings.
- `.str.decode` works on bytes, not strings. It makes no sense on a `StringArray` (but we can pretty easily implement a `BytesArray`).
- `.str.get` is extremely flexible / complicated. It works on anything that implements `__getitem__` or `.get`. So it's maybe useful for strings, but also for nested Series storing lists / dicts.
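To illustrate the three cases above, here is how each method behaves on object-dtype Series whose rows are not strings. All calls are existing pandas methods, and the data is made up for illustration.

```python
import pandas as pd

# .str.join: each row is a list of strings, not a string.
lists = pd.Series([["a", "b"], ["c"]])
joined = lists.str.join("-")

# .str.decode: each row is bytes; decoding produces strings.
raw = pd.Series([b"spam", b"eggs"])
decoded = raw.str.decode("utf-8")

# .str.get: works via __getitem__ (lists) or .get (dicts).
firsts = pd.Series([[1, 2], [3, 4]]).str.get(0)
xs = pd.Series([{"x": 1}, {"x": 2}]).str.get("x")

print(joined.tolist(), decoded.tolist(), firsts.tolist(), xs.tolist())
```

None of these inputs would be valid data for a `StringArray`, which is why these methods don't fit a string-only `.str` accessor.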
For these, we may want separate accessors. Or we can keep the `.str` accessor and deprecate every method except for those.
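One way a separate accessor could look is sketched below. The accessor name `nested` and the class `NestedAccessor` are hypothetical and not part of pandas or this proposal. Only the registration function, `pd.api.extensions.register_series_accessor`, is real pandas API.

```python
import pandas as pd


# Hypothetical accessor: "nested" / NestedAccessor are illustrative names only.
@pd.api.extensions.register_series_accessor("nested")
class NestedAccessor:
    """Element access for Series whose rows are lists, tuples, or dicts."""

    def __init__(self, series: pd.Series) -> None:
        self._series = series

    def get(self, key, default=None):
        """Return row[key] for each row, or `default` when the lookup fails."""

        def _lookup(row):
            # Dicts (and other mappings with .get) use .get with a default.
            if isinstance(row, dict):
                return row.get(key, default)
            # Sequences fall back to __getitem__; failed lookups yield default.
            try:
                return row[key]
            except (IndexError, KeyError, TypeError):
                return default

        return self._series.map(_lookup)


firsts = pd.Series([[1, 2], [3, 4]]).nested.get(0)
xs = pd.Series([{"x": 1}, {"x": 2}]).nested.get("x")
```

The design choice here is that `.str` would stay string-only, while lookups into nested containers live on an accessor whose name says what the rows contain.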