DISC: path to nullable-by-default integers and floats #58243

I've been giving some thought to how we can move towards nullable integer/bool dtypes by default (following the ice cream agreement last August).

Terminology note: I am using "nullable" to mean "supports some missing sentinel, without taking a stance on what that sentinel is or what semantics it has".

On the user end, I think it will need to be opt-in for a while. This can mirror the pyarrow-hybrid string future option. In the medium term, we can implement hybrid Integer/Boolean dtypes/EAs that use nan as their sentinel. This would minimize the behavior changes users see and avoid introducing mixed-propagation behavior. A subsequent deprecation cycle can then move to all-propagating semantics.
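For context, here is a rough sketch of the two behaviors that exist today. It only uses dtypes that already ship with pandas; the hybrid nan-sentinel dtype proposed above does not exist yet and is not shown. The variable names are just for illustration.

```python
import pandas as pd

# Default inference today: a missing value forces integers to float64,
# with nan as the sentinel.
default = pd.Series([1, 2, None])
print(default.dtype)  # float64

# Existing opt-in masked dtype: values stay integers, the sentinel is pd.NA.
masked = pd.Series([1, 2, None], dtype="Int64")
print(masked.dtype)  # Int64

# nan does not propagate through comparisons; pd.NA does.
print((default > 1).tolist())  # [False, True, False]
print((masked > 1).tolist())   # [False, True, <NA>]
```

The last two lines are the "mixed-propagation" difference: a nan-sentinel hybrid would presumably keep the first behavior while retaining an integer dtype.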

Open Questions
- Do we disallow numpy int/bool dtypes entirely?
- Lots of users have legacy code that says `dtype=np.int64`. Do we warn/raise, or map that to the future dtype (assuming the user has opted in)?
- Similarly, what if they do `df.dtypes == np.int64`?
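
To make the last two questions concrete, here is a small sketch of how these checks behave today with the existing masked `Int64` dtype, used here only as a stand-in for whatever the future default ends up being. `pd.api.types.is_integer_dtype` is shown as one dtype-agnostic check that already exists, not as a proposed resolution.

```python
import numpy as np
import pandas as pd

df_numpy = pd.DataFrame({"a": pd.Series([1, 2, 3], dtype=np.int64)})
df_masked = pd.DataFrame({"a": pd.Series([1, 2, 3], dtype="Int64")})

# The legacy equality check matches the numpy-backed column only.
numpy_matches = (df_numpy.dtypes == np.int64).tolist()
masked_matches = (df_masked.dtypes == np.int64).tolist()
print(numpy_matches)   # [True]
print(masked_matches)  # [False]

# A dtype-agnostic check accepts both.
print(pd.api.types.is_integer_dtype(df_numpy["a"].dtype))   # True
print(pd.api.types.is_integer_dtype(df_masked["a"].dtype))  # True
```

If `dtype=np.int64` were silently mapped to a new default, code relying on the equality check would start getting `False` without any warning.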

Now that I write that out, I'm talking myself into being strict on this front and avoiding headaches down the road.

Thoughts?

cc @jorisvandenbossche @phofl 

