Closed
Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
I wish I could use pandas to create pyarrow-backed Series for strings.
I wish there was a single data type and a single extension array for strings, rather than two.
Currently, we have two pyarrow data types and arrays for strings:
- StringDtype("pyarrow"), backed by arrays.ArrowStringArray
- ArrowDtype(pa.string()), backed by arrays.ArrowExtensionArray
I propose we use ArrowDtype(pa.string()) and ArrowExtensionArray.
Feature Description
import pyarrow as pa
import pandas as pd

# Series created through the "string[pyarrow]" alias
series_str_arry = pd.Series(['red', 'blue', None], dtype="string[pyarrow]")

# Series created through ArrowDtype(pa.string())
string_ext_arry = pd.ArrowDtype(pa.string())
series_ext_arry = pd.Series(['red', 'blue', None], dtype=string_ext_arry)

# Desired behavior: both spellings resolve to the same dtype and array type
assert series_str_arry.dtype == series_ext_arry.dtype
assert series_str_arry.dtype.construct_array_type() == series_ext_arry.dtype.construct_array_type()
Alternative Solutions
- Keep both data types and arrays