ENH: Can pandas.Series.str.len return a nullable pd.Int64Dtype rather than float64? #51948

Open
@chelsea-lin

Description

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

The documentation for [pandas.Series.str.len](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.len.html) says the return value is a Series or Index of int, but when the data contains missing values the method actually returns a float64 dtype, because the result has to hold NaN. However, float64 may lose precision. Ideally, could it return pd.Int64Dtype(), the nullable integer dtype, instead? The same request applies to pandas.Series.str.find and similar methods.
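A minimal sketch of the behavior described above, assuming an object-dtype Series (the `dtype=object` argument and the sample values are illustrative choices, not taken from the report):

```python
import pandas as pd

# Object-dtype Series containing one missing value (illustrative data).
s = pd.Series(["a", "bcd", None], dtype=object)

lengths = s.str.len()
print(lengths.dtype)  # float64: the missing entry is represented as NaN

# Without missing values, the same call yields a plain integer dtype.
no_missing = pd.Series(["a", "bcd"], dtype=object).str.len()
print(no_missing.dtype)  # int64
```

The float64 result appears only when a missing value is present, which is why the documented "Series or Index of int" holds for complete data but not in general.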

Feature Description

Add a use_nullable_dtype parameter to the pandas.Series.str.len() API so that it returns a result with dtype pd.Int64Dtype().

Alternative Solutions

Calling pandas.Series.str.len().astype(pd.Int64Dtype()) explicitly converts the result from float64, but it may lose precision.
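The cast mentioned above might look like the following sketch. The second variant, converting to the `"string"` extension dtype before calling `str.len`, is not part of the original report; it is included as one possible way to avoid the float64 intermediate, and the sample data is illustrative:

```python
import pandas as pd

s = pd.Series(["a", "bcd", None], dtype=object)

# Workaround from the issue: cast the float64 result to nullable Int64.
via_cast = s.str.len().astype(pd.Int64Dtype())
print(via_cast.dtype)  # Int64

# Possible alternative (an assumption, not from the report): use the
# "string" extension dtype, whose integer-returning string methods
# produce nullable Int64 without going through float64.
via_string = s.astype("string").str.len()
print(via_string.dtype)  # Int64
```

Both variants represent the missing entry as `pd.NA` rather than `NaN`.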

Additional Context

No response

Metadata

Assignees

No one assigned

    Labels

    Enhancement; NA - MaskedArrays (Related to pd.NA and nullable extension arrays); Strings (String extension data type and string data)
