Change default string storage from "python" to "pyarrow" (if installed) for NA-variant of StringDtype #60287

Historically, the default value for the string storage (globally configurable through `pd.options.mode.string_storage`) of `StringDtype` was `"python"`, and users needed to explicitly ask for `"pyarrow"`. For example:

```python
>>> ser = pd.Series(["a", "b"], dtype="string")
>>> ser.dtype
string[python]
```

and this is still the behaviour on `main`.
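
For reference, here is a minimal sketch of how the storage can be pinned regardless of the default, either on the dtype itself or through the global option. It uses `"python"` so that it runs without pyarrow; passing `"pyarrow"` works the same way when pyarrow is installed. The variable names are only illustrative.

```python
import pandas as pd

# An explicit storage on the dtype bypasses the default entirely.
explicit = pd.Series(["a", "b"], dtype=pd.StringDtype(storage="python"))
print(explicit.dtype.storage)  # python

# The global option only applies to dtypes created without an explicit
# storage, such as the plain "string" alias.
with pd.option_context("mode.string_storage", "python"):
    via_option = pd.Series(["a", "b"], dtype="string")
print(via_option.dtype.storage)  # python
```

Code that relies on a particular storage can use either form to stay unaffected by a change of the default.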

For the new NaN-variant of `StringDtype`, however, we implemented a default string storage option of `"auto"`, meaning "use pyarrow if installed, otherwise use python". So on a system with pyarrow installed:

```python
>>> pd.options.future.infer_string = True
>>> ser = pd.Series(["a", "b"], dtype="str")
>>> ser.dtype.storage
'pyarrow'
```

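Essentially, we interpret the default `string_storage` setting of `"auto"` differently for the NaN and NA variants of the string dtype: for the NaN variant it resolves to `"pyarrow"` when pyarrow is installed, while for the NA variant it always resolves to `"python"`. You can see this in the code here: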

https://github.com/pandas-dev/pandas/blob/5f23aced2f97f2ed481deda4eaeeb049d6c7debe/pandas/core/arrays/string_.py#L152-L163

---

__Proposal__: I think it makes sense to also switch to `"pyarrow"` as the default string storage (if installed) for the nullable (NA-variant) `StringDtype`. This is a somewhat breaking change, although mostly for the dtype object itself: behaviour-wise, string operations should show hardly any difference between the two backends. I would therefore target this for 3.0 and document it properly in the whatsnew notes.


    if storage is None:
        if na_value is not libmissing.NA:
            storage = get_option("mode.string_storage")
            if storage == "auto":
                if HAS_PYARROW:
                    storage = "pyarrow"
                else:
                    storage = "python"
        else:
            storage = get_option("mode.string_storage")
            if storage == "auto":
                storage = "python"
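
To make the difference concrete, the branching above can be modelled as a small standalone function. This is only a sketch and not pandas code: `resolve_storage` and its parameters (including the `proposed` flag) are hypothetical names introduced here, and the proposed behaviour is assumed to be "resolve `"auto"` the same way for both variants".

```python
def resolve_storage(
    option: str,
    is_na_variant: bool,
    has_pyarrow: bool,
    proposed: bool = False,
) -> str:
    """Hypothetical model of how the "mode.string_storage" option is resolved."""
    # An explicit "python" or "pyarrow" setting is used as-is.
    if option != "auto":
        return option
    # Current behaviour: the NA variant always falls back to "python".
    if is_na_variant and not proposed:
        return "python"
    # NaN variant today, and both variants under the proposal.
    return "pyarrow" if has_pyarrow else "python"


for proposed in (False, True):
    for is_na_variant in (False, True):
        variant = "NA" if is_na_variant else "NaN"
        storage = resolve_storage("auto", is_na_variant, has_pyarrow=True, proposed=proposed)
        print(f"proposed={proposed!s:<5} variant={variant:<3} -> {storage}")
```

With pyarrow installed, the only combination that changes under the proposal is the NA variant, which moves from `"python"` to `"pyarrow"`. Without pyarrow, every combination still resolves to `"python"`.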
