Disable .str-accessor for byte data

This supersedes #22721.

Pandas is trying to straddle many different chasms, which leads to undesirable behaviour on the fringes. For the purpose of this issue, I'm talking mainly about:
1. supporting Python 2/3 (will be over soon...)
1. being largely based on numpy's type system

The first point causes the inconsistent handling of `str` vs. `bytes`. In Python 2, `str` *is* a byte string, so the Series concatenator must work with bytes there.
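In Python 3, `str` and `bytes` are distinct types that cannot be added to each other. The following minimal pure-Python sketch shows why an element-wise concatenator handles bytes without complaint. The helper `cat_elementwise` is hypothetical and is not a pandas function. It works as long as data and separator share a type, and fails only when the two are mixed:

```python
def cat_elementwise(left, right, sep):
    """Hypothetical element-wise concatenation, similar in spirit to Series.str.cat.

    It only relies on `+`, so bytes pass through exactly like str.
    """
    return [a + sep + b for a, b in zip(left, right)]


# str data with a str separator
print(cat_elementwise(["a", "b"], ["d", "e"], ","))       # ['a,d', 'b,e']
# bytes data with a bytes separator works just as well
print(cat_elementwise([b"a", b"b"], [b"d", b"e"], b","))  # [b'a,d', b'b,e']

# Mixing str and bytes raises TypeError in Python 3
try:
    cat_elementwise(["a"], [b"d"], ",")
except TypeError as exc:
    print("TypeError:", exc)
```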

Mostly because of the second point, there is no proper string dtype. Strings are simply hidden in the `object` dtype. I started #22721 as a side issue that came up while refactoring in #22725. Then I was told that:
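As a minimal sketch, a Series holding `str` and one holding `bytes` report the same `object` dtype. Only the Python types of the individual elements differ. The explicit `dtype=object` is an addition made here so that the example behaves the same on pandas versions that may infer a dedicated string dtype for `str` data by default:

```python
import pandas as pd

# One Series of str and one of bytes, both forced to object dtype
s_str = pd.Series(["a", "b", "c"], dtype=object)
s_bytes = pd.Series([b"a", b"b", b"c"], dtype=object)

# The dtype is identical, so a dtype check alone cannot tell them apart...
print(s_str.dtype, s_bytes.dtype)  # object object
# ...only the types of the elements themselves differ
print(type(s_str.iloc[0]), type(s_bytes.iloc[0]))  # <class 'str'> <class 'bytes'>
```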
> We do NOT handle bytes in `.str` if you want to add tests and raise, pls do so, but not going to 'make it work better'. It is amazingly confusing and causes all sorts of errors. We probably don't have explicit checks on this (though I *thought* that we always infer on the strings that must be string/unicode and *never* bytes).

However, bytes already work in practice. The `Series.str` accessor checks that it is called only on an `object` column, but there is not much more it can do, not least because inspecting every element of a Series would be expensive for large data. Consequently, `.str.cat` currently *does* work on bytes data, and easily at that:
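The extra check that disabling bytes would require can be sketched with `pandas.api.types.infer_dtype`, which inspects the values and returns `"bytes"` for all-bytes data. This is a hypothetical guard, not pandas' actual implementation. The function name `check_no_bytes` and the error wording are assumptions. It also shows the cost mentioned above: unlike the dtype check, inference may have to look at every element.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def check_no_bytes(series: pd.Series, method: str) -> None:
    """Hypothetical guard refusing a .str method on bytes data."""
    # Cheap check, comparable to the existing object-only restriction
    if series.dtype != object:
        raise TypeError("Can only use .str accessor with object dtype")
    # Expensive check: infer_dtype inspects the values (up to O(n))
    if infer_dtype(series, skipna=True) == "bytes":
        raise TypeError(
            f"Cannot use .str.{method} with values of inferred dtype 'bytes'"
        )


check_no_bytes(pd.Series(["a", "b"], dtype=object), "cat")  # passes silently
try:
    check_no_bytes(pd.Series([b"a", b"b"], dtype=object), "cat")
except TypeError as exc:
    print(exc)  # Cannot use .str.cat with values of inferred dtype 'bytes'
```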
```
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(np.array(list('abc'), 'S1').astype(object))
>>> t = pd.Series(np.array(list('def'), 'S1').astype(object))
>>> s.str.cat(t, sep=b'')
0    b'ad'
1    b'be'
2    b'cf'
dtype: object
>>> s.str.cat(t, sep=b',')
0    b'a,d'
1    b'b,e'
2    b'c,f'
dtype: object
```
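If `.str.cat` were disabled for bytes, the same result could still be obtained by decoding explicitly first. `Series.str.decode` and `Series.str.encode` are existing pandas methods. Treating the data as ASCII is an assumption of this sketch:

```python
import numpy as np
import pandas as pd

# Same bytes data as in the session above
s = pd.Series(np.array(list('abc'), 'S1').astype(object))
t = pd.Series(np.array(list('def'), 'S1').astype(object))

# Decode bytes to str explicitly (ASCII assumed), then concatenate as strings
joined = s.str.decode('ascii').str.cat(t.str.decode('ascii'), sep=',')
print(joined.tolist())  # ['a,d', 'b,e', 'c,f']

# Encode back if bytes are needed afterwards
print(joined.str.encode('ascii').tolist())  # [b'a,d', b'b,e', b'c,f']
```

Decoding makes the encoding an explicit choice of the caller, and `.str` then only ever operates on text.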

Long story short: this issue supersedes #22721 and should serve as a long-term goal to disable `.str` for bytes data once Python 2 gets dropped and/or there is a string dtype.

Disable .str-accessor for byte data #23011