DEPR: list of lists in Series.str.cat #21950

Closed
@h-vetinari

The .str.cat method is the only one in the .str accessor that takes another Series as an argument, which makes it a bit of a special case (e.g. it had no index alignment until v0.23).
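
To make the alignment remark concrete, here is a minimal sketch, assuming pandas v0.23 or later (where `str.cat` accepts a `join` keyword). It contrasts a Series argument, which is matched on index labels, with its raw values, which are concatenated by position. The variable names are illustrative only.

```python
import pandas as pd

s = pd.Series(['a', 'b'], index=[0, 1])
t = pd.Series(['y', 'x'], index=[1, 0])  # same labels, different order

# Series argument with join='left': values are matched by index label
aligned = s.str.cat(t, join='left')

# Raw values carry no index, so concatenation is purely positional
positional = s.str.cat(t.values)

print(aligned.tolist())     # ['ax', 'by']
print(positional.tolist())  # ['ay', 'bx']
```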

It makes sense to support lists of objects which get concatenated sequentially, and lists of lists have been supported since at least v0.17, see https://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.Series.str.cat.html

When I wrote #20347, I tried very hard to keep the signature backwards-compatible and to keep the example from the v0.17-22 docs working:

>>> Series(['a', 'b']).str.cat([['x', 'y'], ['1', '2']], sep=',')
0    a,x,1
1    b,y,2
dtype: object

However, this added a lot of complexity, and I think it should be simplified, especially in light of @TomAugspurger's comment in #21894:

As a reminder, the plan is to have no new deprecations in 0.25.x and 1.0.0. So this [v0.24] is the last round of deprecations before 1.0.

My suggestion is to modify the allowed combinations (as of v0.23) as follows:

Type of "others"                        |  action  |  comment
---------------------------------------------------------------------
list-like of strings                    |   keep   |  as before; mimics behavior elsewhere,
                                                      cf.: pd.Series(range(3)) + [2,4,6]
Series                                  |   keep   |
np.ndarray (1-dim)                      |   keep   |
DataFrame                               |   keep   |  sequential concatenation
np.ndarray (2-dim)                      |   keep   |  sequential concatenation
list-like of
    Series/Index/np.ndarray (1-dim)     |   keep   |  sequential concatenation
list-like containing list-likes (1-dim)
    other than Series/Index/np.ndarray  |   DEPR   |  sequential concatenation
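
For illustration, the "keep" rows of the table already cover what a list of lists does today. The following is a minimal sketch, assuming pandas v0.23 or later and inner lists that each have the same length as the calling Series. It rewrites the v0.17-22 docs example in four equivalent ways; the names `cols` and `via_*` are made up for this example.

```python
import numpy as np
import pandas as pd

s = pd.Series(['a', 'b'])
cols = [['x', 'y'], ['1', '2']]  # the list of lists proposed for deprecation

# list-like of 1-dim np.ndarray
via_arrays = s.str.cat([np.array(c) for c in cols], sep=',')

# list-like of Series (sharing the caller's index, so alignment is a no-op)
via_series = s.str.cat([pd.Series(c, index=s.index) for c in cols], sep=',')

# DataFrame: each inner list becomes one column
via_frame = s.str.cat(pd.DataFrame(dict(enumerate(cols)), index=s.index), sep=',')

# 2-dim np.ndarray: rows must line up with s, hence the transpose
via_2d = s.str.cat(np.array(cols).T, sep=',')

print(via_arrays.tolist())  # ['a,x,1', 'b,y,2']
```

All four variants give the same result as the old list-of-lists call, so the deprecation would not remove any functionality, only one spelling of it.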

In other words, if the user wants sequential concatenation, there are many possibilities available, and list-of-lists does not have to be one of them, IMO. Post-deprecation, this would substantially simplify the code for str.cat._get_series_list, which is currently a bit complicated: https://github.com/pandas-dev/pandas/blob/v0.23.3/pandas/core/strings.py#L2089
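
To show how small the post-deprecation dispatch could be, here is a rough sketch of the table as a classifier. `classify_others` is a hypothetical helper written for this comment, not the actual `_get_series_list` and not part of pandas. Treating a bare Index like a Series is also an assumption, since the table does not list it separately.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like


def classify_others(others):
    """Hypothetical helper: return 'keep' or 'DEPR' following the table above."""
    # Series / DataFrame (and, by assumption, Index) are always kept
    if isinstance(others, (pd.Series, pd.Index, pd.DataFrame)):
        return 'keep'
    # np.ndarray is kept for 1-dim and 2-dim input
    if isinstance(others, np.ndarray):
        if others.ndim in (1, 2):
            return 'keep'
        raise TypeError('only 1-dim or 2-dim arrays are supported')
    if not is_list_like(others):
        raise TypeError('others must be list-like')
    others = list(others)
    # list-like of Series/Index/np.ndarray (1-dim)
    if all(isinstance(x, (pd.Series, pd.Index))
           or (isinstance(x, np.ndarray) and x.ndim == 1)
           for x in others):
        return 'keep'
    # flat list-like of scalars (e.g. strings)
    if all(not is_list_like(x) for x in others):
        return 'keep'
    # anything else contains a list-like other than Series/Index/np.ndarray
    return 'DEPR'


print(classify_others(['x', 'y']))                # keep
print(classify_others([['x', 'y'], ['1', '2']]))  # DEPR
```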

Finally, for completeness, the example from the v0.17-22 docs was removed in v0.23, but there are two examples in https://pandas.pydata.org/pandas-docs/stable/text.html#concatenating-a-series-and-many-objects-into-a-series that would fall under the deprecation I'm suggesting.


Labels: Deprecate, Strings
