API: astype("string") behavior

In https://github.com/pandas-dev/pandas/pull/39908#issuecomment-855889758, @jorisvandenbossche wrote:

---

I'm bringing the "deferred" storage mode lookup discussion for `StringDtype` (originally here: https://github.com/pandas-dev/pandas/pull/39908#discussion_r585573328) into the main thread, and trying to recap.

Currently, `pd.StringDtype()` (without specifying the storage) already looks up the global `mode.string_storage` option at construction time. In the default case, you get:

```python
>>> pd.StringDtype().storage
'python'
```

which also means that `pandas_dtype()` already "fully initializes" the string dtype:

```python
>>> pd.api.types.pandas_dtype("string")
string[python]
```
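A minimal sketch of that eager lookup, assuming pandas >= 1.3 (where the `mode.string_storage` option exists). The option is pinned with `pd.option_context` so the example does not depend on the installation's default; the storage is read when each dtype is constructed and stays on the dtype instance afterwards.

```python
import pandas as pd

# Pin the global option temporarily so the result does not depend on the default.
with pd.option_context("mode.string_storage", "python"):
    eager = pd.StringDtype()                     # no storage given: the option is read now
    alias = pd.api.types.pandas_dtype("string")  # the "string" alias resolves the same way

# Both dtypes kept the storage they were built with after the option was restored.
print(eager.storage, alias.storage)  # python python
```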

As a consequence, `astype("string")` will actually convert the values if your string dtype's storage doesn't match the global setting:

```python
>>> s = pd.Series(['a', 'b'], dtype=pd.StringDtype(storage="pyarrow"))
>>> s.dtype
string[pyarrow]
>>> s.astype("string").dtype
string[python]
```

However, I think it could make sense for `.astype("string")` to mean "ensure I have *a* string dtype", and thus not convert to another storage backend if I already had a string dtype to start with.
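User code that wants this "ensure *a* string dtype" meaning today can approximate it by casting only when the values do not already have a string dtype. `ensure_string_dtype` is a hypothetical helper, not a pandas API; this is a sketch assuming pandas >= 1.3, where `pd.StringDtype(storage=...)` is available.

```python
import pandas as pd


def ensure_string_dtype(s: pd.Series) -> pd.Series:
    """Hypothetical helper: cast to a string dtype only if `s` does not have one yet.

    Any existing StringDtype is kept whatever its storage backend, so a
    pyarrow-backed Series is not silently converted to the global default.
    Note: unlike astype, an already-string Series is returned as-is (no copy).
    """
    if isinstance(s.dtype, pd.StringDtype):
        return s
    return s.astype("string")


already = pd.Series(["a", "b"], dtype=pd.StringDtype(storage="python"))
print(ensure_string_dtype(already) is already)  # True
```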

We do something similar for `CategoricalDtype`: `"category"` means a categorical dtype with no categories, but `astype("category")` does not remove your categories; it preserves any existing categorical dtype as is.
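The categorical behavior described above can be checked directly. In this small illustration the unused category `"c"` makes the preservation visible: re-inferring categories from the values would give only `["a", "b"]`.

```python
import pandas as pd

# A categorical Series whose dtype carries an unused category "c".
cat_dtype = pd.CategoricalDtype(categories=["a", "b", "c"])
s = pd.Series(["a", "b"], dtype=cat_dtype)

# On its own, the "category" alias is a categorical dtype without categories...
print(pd.api.types.pandas_dtype("category").categories)  # None

# ...but astype("category") keeps the existing categories instead of re-inferring them.
print(list(s.astype("category").cat.categories))  # ['a', 'b', 'c']
```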

We could still make `astype("string")` behave the way I suggest by special-casing this in the `astype` implementations (as suggested in https://github.com/pandas-dev/pandas/pull/39908#discussion_r643867763), but I think that's something we would ideally avoid: wouldn't every `astype` implementation that accepts string dtype values as input then need to handle this case?
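To make the "deferred" alternative concrete, here is a toy model in plain Python. `ToyStringDtype`, `OPTIONS`, `toy_pandas_dtype` and `toy_astype` are hypothetical names, not pandas internals. The `"string"` alias maps to a dtype whose storage is still `None`, and the option is only consulted when a concrete storage is needed. This is a sketch of the idea under those assumptions, not how pandas implements (or would implement) it.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the global "mode.string_storage" option.
OPTIONS = {"string_storage": "python"}


@dataclass(frozen=True)
class ToyStringDtype:
    # None means "storage not chosen yet" (the deferred case).
    storage: Optional[str] = None

    def resolve(self) -> "ToyStringDtype":
        """Return a dtype with concrete storage, reading the option only if still unset."""
        if self.storage is not None:
            return self
        return ToyStringDtype(OPTIONS["string_storage"])

    def update_dtype(self, target: "ToyStringDtype") -> "ToyStringDtype":
        """Merge a requested dtype into this one: an unspecified storage keeps ours."""
        return self if target.storage is None else target


def toy_pandas_dtype(alias: str) -> ToyStringDtype:
    # Deferred: the alias maps to an unresolved dtype; no option lookup happens here.
    if alias == "string":
        return ToyStringDtype()
    raise TypeError(f"unsupported alias: {alias!r}")


def toy_astype(current: object, alias: str) -> ToyStringDtype:
    """Return the dtype that casting values of dtype `current` to `alias` would produce."""
    target = toy_pandas_dtype(alias)
    if isinstance(current, ToyStringDtype):
        # Already a string dtype: delegate the merge to the dtype, so the
        # "keep the existing storage" rule lives in one place.
        return current.update_dtype(target).resolve()
    # Non-string input: the storage is filled in from the option at cast time.
    return target.resolve()


print(toy_astype(ToyStringDtype("pyarrow"), "string"))  # ToyStringDtype(storage='pyarrow')
print(toy_astype("object", "string"))                   # ToyStringDtype(storage='python')
```

The design choice mirrors the categorical case: the alias describes an incomplete dtype, and the missing piece is filled in either from the existing dtype or, failing that, from the global option.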

API: astype("string") behavior #41856

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

API: astype("string") behavior #41856

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions