DISC: path to nullable-by-default integers and floats #58243

Open
@jbrockmendel

I've been giving some thought to how we can move towards having nullable integer/bool dtypes by default (from the ice cream agreement last August).

Terminology note: I am using "nullable" to mean "supports some missing sentinel", without taking a stance on what that sentinel is or what semantics it has.

On the user end, I think it will need to be opt-in for a while. This can mirror the pyarrow-hybrid string future option. In the medium term, we can implement hybrid Integer/Boolean dtypes/EAs that use NaN as their sentinel. This will minimize the behavior changes users see and avoid introducing mixed-propagation behavior. A subsequent deprecation cycle can then move to all-propagating.
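For context, here is a minimal sketch of the behavior gap being discussed, using only dtypes that exist today. The masked `Int64` dtype stands in for a propagating nullable dtype; the NaN-backed hybrid dtype proposed above does not exist yet, so it is not shown. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# NumPy-backed integers have no missing sentinel, so introducing a gap
# (here via reindex) upcasts the whole Series to float64 with NaN.
s_numpy = pd.Series([1, 2, 3]).reindex([0, 1, 2, 3])
print(s_numpy.dtype)  # float64

# The existing masked Int64 dtype keeps integer values and stores pd.NA.
s_masked = pd.Series([1, 2, 3], dtype="Int64").reindex([0, 1, 2, 3])
print(s_masked.dtype)  # Int64

# Propagation differs: comparing against NaN yields False,
# while comparing against pd.NA propagates the missing value.
nan_result = s_numpy == 1   # last element is False
na_result = s_masked == 1   # last element is <NA>
print(nan_result.iloc[3], na_result.iloc[3])
```

Mixing these two propagation rules in one codebase is presumably the "mixed-propagation behavior" that a NaN-sentinel hybrid would avoid in the interim.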

Open Questions

  • Do we disallow numpy int/bool dtypes entirely?
  • Lots of users have legacy code that says dtype=np.int64. Do we warn/raise, or map that to the future dtype (assuming the user has opted in)?
  • Similarly, what should happen if they do df.dtypes == np.int64?
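To make the last two questions concrete, here is a rough sketch of how such legacy checks behave today against the existing masked `Int64` dtype. Using `Int64` as a stand-in for whatever the future default becomes is an assumption; the eventual dtype may compare differently.

```python
import numpy as np
import pandas as pd

# Today an explicit NumPy dtype request yields a NumPy-backed Series.
legacy = pd.Series([1, 2, 3], dtype=np.int64)
print(legacy.dtype == np.int64)  # True

# Stand-in (assumption) for a future nullable default: the masked Int64 dtype.
nullable = pd.Series([1, 2, None], dtype="Int64")
print(nullable.dtype == np.int64)  # False

# The legacy equality check silently stops matching the nullable column.
df = pd.DataFrame({"legacy": legacy, "nullable": nullable})
print(df.dtypes == np.int64)

# A dtype-agnostic check recognizes both columns as integer.
print(pd.api.types.is_integer_dtype(legacy.dtype))
print(pd.api.types.is_integer_dtype(nullable.dtype))
```

The silent `False` in the equality check is the kind of downstream headache that a strict warn/raise policy would surface early.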

Now that I write that out, I'm talking myself into being strict on this front and avoiding headaches down the road.

Thoughts?

cc @jorisvandenbossche @phofl

Labels: Bug, Needs Triage (issue that has not been reviewed by a pandas team member)