ENH: consistent types in output of df.groupby #42795

Closed
@ikamensh

Is your feature request related to a problem?

The problem comes from the inconsistent types yielded by for values, match_df in df.groupby(names): if names is a list of length > 1, values is a tuple. Otherwise it is the bare element, which can be e.g. a string.

I had a bug where iterating over values went over the characters of a string; combined with zip, it silently discarded most of the field.
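A minimal sketch of that failure mode in plain Python, with no pandas involved. The names and values here (names, "Berlin") are made up for illustration; the scalar key stands in for what groupby yields when given a single-element list in the behavior described above.

```python
# Hypothetical column names and group key, chosen only for illustration.
names = ["city"]
key_as_scalar = "Berlin"    # what a single-element list yields in the reported behavior
key_as_tuple = ("Berlin",)  # what the proposal would yield instead

# zip pairs each name with one *character* of the string and stops at the
# shorter input, so everything after the first character is dropped.
broken = dict(zip(names, key_as_scalar))
intended = dict(zip(names, key_as_tuple))

print(broken)    # {'city': 'B'}
print(intended)  # {'city': 'Berlin'}
```

No exception is raised in the broken case, which is why the bug is easy to miss.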

Describe the solution you'd like

Whenever df.groupby is called with a list, output a tuple of the values found. Specifically, when called with a list of length 1, output a tuple of length 1 instead of an unwrapped element.

API breaking implications

This would change how existing code behaves.

Describe alternatives you've considered

Leave things as they are. Unfortunately, that means people need to write special-case ifs around groupby to get consistent output types.
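For reference, the special-case handling could look roughly like the sketch below. The helper name as_key_tuple is hypothetical, not part of pandas. It also assumes the grouped values are never tuples themselves, since a tuple-valued element would be indistinguishable from a multi-column key.

```python
def as_key_tuple(key):
    """Return key unchanged if it is already a tuple, else wrap it in a 1-tuple.

    Hypothetical helper, not a pandas API. Assumes group values are not tuples.
    """
    return key if isinstance(key, tuple) else (key,)


# Keys shaped like the ones groupby can yield: scalar or tuple.
normalized = [as_key_tuple(k) for k in [1, "xy", (1,), (1, 5)]]
print(normalized)  # [(1,), ('xy',), (1,), (1, 5)]

# With the key normalized, zipping against the column names is safe.
names = ["a"]
record = dict(zip(names, as_key_tuple("xy")))
print(record)  # {'a': 'xy'}
```

Inside a loop this would be applied as values = as_key_tuple(values) right after unpacking each item from df.groupby(names).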

Additional context

import pandas as pd


df = pd.DataFrame(columns=['a','b','c'], index=['x','y'])
df.loc['y'] = pd.Series({'a':1, 'b':5, 'c':2})
print(df)

values, _ = next(iter(df.groupby(['a', 'b'])))
print(type(values))  # <class 'tuple'>

values, _ = next(iter(df.groupby(['a'])))
print(type(values))  # <class 'int'>  # IMHO should be 'tuple'!

values, _ = next(iter(df.groupby('a')))
print(type(values))  # <class 'int'>
