ENH: Remove ArrowStringArray and StringDtype("pyarrow") #48469

Closed
@gsheni

Feature Type

  • Adding new functionality to pandas
  • Changing existing functionality in pandas
  • Removing existing functionality in pandas

Problem Description

I wish I could use pandas to create pyarrow-backed Series for strings.
I wish there was a single data type and a single extension array for strings (rather than two).

Currently, we have two pyarrow data types and arrays for strings:

  • StringDtype("pyarrow"), backed by arrays.ArrowStringArray
  • ArrowDtype(pa.string()), backed by arrays.ArrowExtensionArray

I propose we keep only ArrowDtype(pa.string()) and ArrowExtensionArray, and remove StringDtype("pyarrow") and ArrowStringArray.

Feature Description

import pyarrow as pa
import pandas as pd

# Spelling 1: StringDtype with pyarrow storage
series_str_arry = pd.Series(['red', 'blue', None], dtype="string[pyarrow]")

# Spelling 2: ArrowDtype wrapping pa.string()
string_ext_arry = pd.ArrowDtype(pa.string())
series_ext_arry = pd.Series(['red', 'blue', None], dtype=string_ext_arry)

# Proposed behavior: both spellings yield the same dtype and the same array type
assert series_str_arry.dtype == series_ext_arry.dtype
assert series_str_arry.dtype.construct_array_type() == series_ext_arry.dtype.construct_array_type()

Alternative Solutions

  • Keep both data types and arrays

Additional Context

Metadata

Labels

  • Arrow: pyarrow functionality
  • Deprecate: Functionality to remove in pandas
  • Enhancement
  • Needs Discussion: Requires discussion from core team before further action
  • Strings: String extension data type and string data
