DEPR: Deprecate .str accessor on object-dtype. #29710

Open
@TomAugspurger

This issue discusses a future deprecation of the .str accessor on object-dtype
Series. It's probably blocked on making Series(['a', 'b', 'c']) infer StringDtype
rather than object dtype.
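
To make the inference point concrete, here is a minimal sketch comparing what pandas infers for a plain list of strings with an explicit opt-in to the string extension dtype. The helper `uses_string_dtype` is a hypothetical name used only for illustration, and the inferred dtype depends on the installed pandas version, so no output is asserted for it.

```python
import pandas as pd

data = ["a", "b", "c"]

# Whatever the installed pandas infers when no dtype is given.
# The issue describes this as object dtype; the result is version dependent.
inferred = pd.Series(data)

# Explicit opt-in to the extension string dtype (assumes pandas >= 1.0).
explicit = pd.Series(data, dtype="string")


def uses_string_dtype(series):
    """Hypothetical helper: True if the Series is backed by StringDtype."""
    return isinstance(series.dtype, pd.StringDtype)


print(inferred.dtype, uses_string_dtype(inferred))
print(explicit.dtype, uses_string_dtype(explicit))
```

Until the default inference changes, users only get the StringArray code path when they ask for it, which is why the deprecation is likely blocked on that change.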

In #29640, we split the implementation of .str methods into two: one for the old
object-dtype arrays and one for StringArray. The StringArray implementation is much
nicer since we know the result dtype statically for most methods. We don't need to
worry about the presence of NAs changing int to float or bool to object.
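
A small sketch of the dtype instability described above, assuming pandas >= 1.0 (for the "string" dtype). The variable names are illustrative only. On object dtype the result dtype depends on whether a missing value is present; on the string dtype it does not.

```python
import pandas as pd

clean = ["a", "bb", "ccc"]
with_na = ["a", "bb", None]

# object dtype: a missing value changes int -> float and bool -> object.
obj_len_clean = pd.Series(clean, dtype=object).str.len()
obj_len_na = pd.Series(with_na, dtype=object).str.len()
obj_alpha_clean = pd.Series(clean, dtype=object).str.isalpha()
obj_alpha_na = pd.Series(with_na, dtype=object).str.isalpha()

# string dtype: nullable result dtypes, the same with or without NAs.
str_len_clean = pd.Series(clean, dtype="string").str.len()
str_len_na = pd.Series(with_na, dtype="string").str.len()
str_alpha_na = pd.Series(with_na, dtype="string").str.isalpha()

print(obj_len_clean.dtype, obj_len_na.dtype)      # int64 float64
print(obj_alpha_clean.dtype, obj_alpha_na.dtype)  # bool object
print(str_len_clean.dtype, str_len_na.dtype)      # Int64 Int64
print(str_alpha_na.dtype)                         # boolean
```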

Given that it's nicer for the user (faster, more predictable) and it's nicer for us
(clean up old code), we should deprecate the .str accessor on object dtype.

There are a few complications. Certain .str methods don't actually operate on
scalar strings.

  1. .str.join can work with Series where each row is a list of strings.
  2. .str.decode works on bytes, not strings. It makes no sense on a StringArray (but
    we can pretty easily implement a BytesArray)
  3. .str.get is extremely flexible / complicated. It works on anything that
    implements __getitem__ or .get. So it's maybe useful for strings, but
    also for nested Series storing lists / dicts.
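
The three cases above can be sketched as follows. This is an illustration on object-dtype Series, not a proposal for how replacement accessors would look; the variable names are made up for the example.

```python
import pandas as pd

# 1. .str.join: each row is a list of strings, not a string.
lists = pd.Series([["a", "b"], ["c", "d", "e"]], dtype=object)
joined = lists.str.join("-")

# 2. .str.decode: each row is bytes, not str.
raw = pd.Series([b"caf\xc3\xa9", b"abc"], dtype=object)
decoded = raw.str.decode("utf-8")

# 3. .str.get: works on dicts (via .get) and on anything indexable.
by_key = pd.Series([{"k": 1}, {"k": 2}], dtype=object).str.get("k")
by_pos = pd.Series([[10, 20], "xyz"], dtype=object).str.get(1)

print(joined.tolist())   # ['a-b', 'c-d-e']
print(decoded.tolist())  # ['café', 'abc']
print(by_key.tolist())   # [1, 2]
print(by_pos.tolist())   # [20, 'y']
```

None of these inputs could be stored in a StringArray, which is what makes them awkward for a .str accessor restricted to the string dtype.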

For these, we may want separate accessors. Or we can keep the .str accessor
and deprecate every method except for those.

Labels: Deprecate (Functionality to remove in pandas), Strings (String extension data type and string data)
