BUG: Inconsistent dtype with GroupBy for StrDtype and all missing values #60810

Closed
@WillAyd

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"a": ["a"] * 3, "b": pd.Series([None] * 3, dtype=pd.StringDtype(na_value=np.nan))})
>>> df
   a    b
0  a  NaN
1  a  NaN
2  a  NaN
>>> df.groupby("a").sum()
   b
a   
a  0
>>> df.groupby("a").sum().dtypes
b    str
dtype: object
>>> df.groupby("a").min()
    b
a    
a NaN
>>> df.groupby("a").min().dtypes
b    float64
dtype: object

Issue Description

The return type of the sum reduction is partially discussed in #60229, but I didn't see anything for min.

Note that this discrepancy is the root cause of the test failure behind this xfail marker:

@pytest.mark.xfail(using_string_dtype(), reason="TODO(infer_string)")

@rhshadrach

Expected Behavior

I think in all cases here we should still be returning a str type.
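A minimal sketch of how this expectation could be checked, assuming it is enough to compare each reduction's result dtype with the input column's dtype. The helper names `reduction_dtypes` and `inconsistent_reductions` are hypothetical and not part of pandas.

```python
import pandas as pd


def reduction_dtypes(df, key, ops=("sum", "min")):
    """Map each groupby reduction name to the result dtype of every non-key column."""
    grouped = df.groupby(key)
    return {op: getattr(grouped, op)().dtypes.to_dict() for op in ops}


def inconsistent_reductions(df, key, ops=("sum", "min")):
    """Return (op, column) pairs whose result dtype differs from the input dtype."""
    observed = reduction_dtypes(df, key, ops)
    return [
        (op, col)
        for op, dtypes in observed.items()
        for col, dtype in dtypes.items()
        if dtype != df[col].dtype
    ]
```

Applied to the `df` from the reproducible example, `inconsistent_reductions(df, "a")` would return an empty list if both reductions kept the str dtype. Per the output above, min currently returns float64, so it would be flagged.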

Installed Versions

'3.0.0.dev0+1824.g8d6d29cac3.dirty'

Labels

Bug, Groupby, Strings (String extension data type and string data)
