API: return value of `.values` for Series with the future string dtype (numpy array vs extension array) #60301

Historically, the `.values` attribute returned a numpy array (except for categoricals). When we added more ExtensionArrays (EAs), for certain dtypes (e.g. tz-aware timestamps or periods) the EA could represent the underlying values more faithfully than a lossy conversion to numpy (e.g. for tz-aware timestamps we decided to return a numpy object dtype array instead of "datetime64[ns]" to not lose the timezone information). At that point, instead of "breaking" the behaviour of `.values`, we decided to add an `.array` attribute that always returns the EA.
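To make the historical split concrete, here is a small sketch using only a default int64 Series and a categorical Series (chosen because their behaviour is the least version-dependent). The variable names are just for illustration.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])                          # default int64, numpy-backed
cats = pd.Series(["a", "b", "a"], dtype="category")  # the historical exception

# .values: a numpy array for int64, but the Categorical itself for categoricals
values_types = (type(ints.values).__name__, type(cats.values).__name__)

# .array: an ExtensionArray in both cases (a thin wrapper around the ndarray for int64)
array_is_ea = (
    isinstance(ints.array, pd.api.extensions.ExtensionArray),
    isinstance(cats.array, pd.api.extensions.ExtensionArray),
)

print(values_types, array_is_ea)
```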

But for generic ExtensionArrays (external ones, or non-default EAs like the masked ones or the Arrow ones), `.values` has always returned the EA directly as well. So in those cases, there is no difference between `.values` and `.array`.
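As a quick illustration of that second group, a sketch with the masked `Int64` dtype (picked because it needs no optional dependencies):

```python
import numpy as np
import pandas as pd

masked = pd.Series([1, 2, None], dtype="Int64")  # masked (nullable) integer dtype

# .values hands back the extension array itself, not a numpy array
values_is_ea = isinstance(masked.values, pd.api.extensions.ExtensionArray)
values_is_ndarray = isinstance(masked.values, np.ndarray)

# .values and .array give the same kind of object for this dtype
same_type = type(masked.values) is type(masked.array)

print(values_is_ea, values_is_ndarray, same_type)
```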

Now to the point: with the new default `StringDtype`, the current behaviour is to return the EA for both `.values` and `.array`.

This makes it one of the breaking changes for users upgrading to pandas 3.0: for a column that is inferred as string data, `.values` no longer returns a numpy array.
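A rough sketch of what this looks like from user code. It uses the opt-in `"string"` dtype as a stand-in, on the assumption that it behaves like the new default with respect to `.values`; that choice is only so the snippet runs without enabling any future option. `Series.to_numpy()` is shown as one way to get a numpy array regardless of the dtype.

```python
import numpy as np
import pandas as pd

as_object = pd.Series(["a", "b"], dtype=object)    # what string data was inferred as so far
as_string = pd.Series(["a", "b"], dtype="string")  # stand-in for the new default (assumption)

# Code that assumes `.values` is an ndarray holds for object dtype only
object_values_is_ndarray = isinstance(as_object.values, np.ndarray)
string_values_is_ndarray = isinstance(as_string.values, np.ndarray)

# to_numpy() returns an ndarray in both cases
converted = as_string.to_numpy()
converted_is_ndarray = isinstance(converted, np.ndarray)

print(object_values_is_ndarray, string_values_is_ndarray, converted_is_ndarray)
```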

**Are we OK with this breaking change now?**  
Or, we could also decide to keep `.values` returning the numpy array, with `.array` returning the EA.

Of course, if we move to using EAs for all dtypes (which is being considered in the logical dtypes and missing values PDEP discussions), we would face this breaking change as well (or at least need to make a decision about it). But that could also be a reason not to do it for the string dtype yet, if we would change it for all dtypes later anyway.

cc @pandas-dev/pandas-core 


