
API: Setting Arrow-backed dtypes by default #51433

Closed
@datapythonista


I've been using the new Arrow-backed dtypes, and I'm a bit confused about how the backend is chosen. One example:

>>> import pandas
>>> with pandas.option_context("mode.dtype_backend", "pyarrow"):
...     pandas.Series([1, 2, 3, 4])
... 
0    1
1    2
2    3
3    4
dtype: int64

Why is setting the dtype_backend to pyarrow not enough to use Arrow in the Series constructor when no dtype is specified?

Also, when using for example read_csv:

>>> import pandas
>>> pandas.read_csv('test.csv').dtypes
name    object
age      int64
dtype: object
>>> pandas.read_csv('test.csv', use_nullable_dtypes=True).dtypes
name    string[python]
age              Int64
dtype: object
>>> with pandas.option_context("mode.dtype_backend", "pyarrow"):
...     pandas.read_csv('test.csv').dtypes
... 
name    object
age      int64
dtype: object
>>> with pandas.option_context("mode.dtype_backend", "pyarrow"):
...     pandas.read_csv('test.csv', use_nullable_dtypes=True).dtypes
... 
name    string[pyarrow]
age      int64[pyarrow]
dtype: object

Again, why is it not enough for the user to set the backend to pyarrow to get Arrow dtypes? Why do they also need to pass use_nullable_dtypes? This is what we return, which doesn't make sense to me:

| | dtype_backend=None | dtype_backend=pyarrow |
|---|---|---|
| use_nullable_dtypes=False | NumPy | NumPy ??? |
| use_nullable_dtypes=True | Arrow+NumPy nullables | Arrow |

What I would expect:

| | dtype_backend=None | dtype_backend=pyarrow |
|---|---|---|
| use_nullable_dtypes=False | NumPy | Arrow |
| use_nullable_dtypes=True | Arrow eventually, Arrow+NumPy nullables for now | Arrow |
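To make the difference between the two tables explicit, here is a small pure-Python sketch that encodes them as functions and prints where they disagree. It does not call pandas; `observed_backend` and `expected_backend` are hypothetical helper names, and the returned strings are just the labels used in the tables.

```python
from typing import Optional

# Labels taken from the tables above (plain strings, not pandas objects).
NUMPY = "NumPy"
NULLABLE = "Arrow+NumPy nullables"
ARROW = "Arrow"


def observed_backend(dtype_backend: Optional[str], use_nullable_dtypes: bool) -> str:
    """Behavior reported in the first table (hypothetical helper)."""
    if not use_nullable_dtypes:
        # The option is ignored unless use_nullable_dtypes=True.
        return NUMPY
    return ARROW if dtype_backend == "pyarrow" else NULLABLE


def expected_backend(dtype_backend: Optional[str], use_nullable_dtypes: bool) -> str:
    """Behavior proposed in the second table (hypothetical helper)."""
    if dtype_backend == "pyarrow":
        # Setting the option alone should be enough to get Arrow dtypes.
        return ARROW
    return NULLABLE if use_nullable_dtypes else NUMPY


# Collect every combination where the two tables disagree.
mismatches = []
for backend in (None, "pyarrow"):
    for nullable in (False, True):
        obs = observed_backend(backend, nullable)
        exp = expected_backend(backend, nullable)
        if obs != exp:
            mismatches.append((backend, nullable))
        print(f"dtype_backend={backend}, use_nullable_dtypes={nullable}: "
              f"observed={obs}, expected={exp}")
```

The only combination in `mismatches` is the one marked `???` above: the option set to pyarrow with use_nullable_dtypes=False.
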

Sorry if I missed the discussion; maybe I'm just missing something. But I don't see the use case for a user explicitly saying they want Arrow types via the option and still getting NumPy-backed Series and DataFrames. Is this something that was agreed on, or did we just not make the changes needed for a more intuitive behavior?

CC: @mroeschke

Labels: API - Consistency, API Design, Arrow, Needs Discussion, Typing
