Are multiple callable/dict groupers allowed in groupby? #22278

Open
@toobaz

Problem description

The docs for the DataFrame.groupby signature start with:

by : mapping, function, label, or list of labels
    Used to determine the groups for the groupby.
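For reference, here is a minimal sketch of the four documented forms of by, run against a small made-up frame (the column names key, flag and val and the index labels are illustrative only). A mapping or a function is applied to the index labels, while a label or a list of labels refers to columns:

```python
import pandas as pd

# Illustrative data: the column names and index labels are made up
df = pd.DataFrame(
    {"key": ["a", "b", "a", "b"], "flag": [True, True, False, False], "val": [1, 2, 3, 4]},
    index=["w", "x", "y", "z"],
)

# label: group by the values of one column
by_label = df.groupby("key")["val"].sum()

# mapping: each index label is looked up in the dict
by_mapping = df.groupby({"w": "g1", "x": "g1", "y": "g2", "z": "g2"})["val"].sum()

# function: called once per index label, its return value is the group key
by_function = df.groupby(lambda label: label in ("w", "y"))["val"].sum()

# list of labels: one grouping level per column
by_labels = df.groupby(["key", "flag"])["val"].sum()
```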

... but the code assumes that lists of mappings or functions can also be passed, and this is also tested, although with limited enthusiasm:

# this code path isn't used anywhere else

... and in a consistency test (so apparently that code path is used somewhere else after all):
grouped = wp.groupby([lambda x: x.month, lambda x: x.weekday()],
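The call above is cut off, so here is a rough DataFrame-based sketch of the same undocumented behaviour, assuming a pandas version where this code path still exists. The month/weekday lambdas come from the line above; the two-week DatetimeIndex, the val column and the is_weekend dict are made up for illustration. It also shows that a dict can be mixed into the same list as a callable:

```python
import pandas as pd

# Made-up data: two weeks of daily values starting on a Monday
idx = pd.date_range("2018-01-29", periods=14, freq="D")
df = pd.DataFrame({"val": range(14)}, index=idx)

# A list of callables: each one is applied to every index label and
# contributes one level of the resulting MultiIndex (month, weekday)
by_funcs = df.groupby([lambda x: x.month, lambda x: x.weekday()])["val"].sum()

# A dict (mapping) in the same list: its values are looked up by index label
is_weekend = {day: day.weekday() >= 5 for day in idx}
by_mixed = df.groupby([lambda x: x.month, is_weekend])["val"].sum()
```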

Expected Output

Either we disable/deprecate the possibility of passing lists of mappings, or we document it.

I guess the latter is the desired outcome, since the code does not support the feature "by chance". Still, I wanted to double-check with @pandas-dev/pandas-core because:

  • it is not a killer feature, as it is really easy to pass a single lambda that does the same job as a list of mappings (and more, such as applying different mappings to specific levels of the index)
  • removing it would allow us to simplify the code quite a bit (e.g. #22257, "get_group(...) fails for groupby(...) based on a function", wouldn't have happened)
  • it is probably not much used
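To illustrate the first point, here is a sketch (the dates, the val column and the is_weekend dict are made-up assumptions) comparing a list made of a callable and a dict with a single lambda returning a tuple. In my understanding both yield the same groups and sums; the type of the resulting index may differ (MultiIndex vs. an Index of tuples, depending on the pandas version), so the comparison goes through to_dict():

```python
import pandas as pd

# Made-up data: two weeks of daily values
idx = pd.date_range("2018-01-29", periods=14, freq="D")
df = pd.DataFrame({"val": range(14)}, index=idx)

# List of groupers: one callable plus one dict (mapping)
is_weekend = {day: day.weekday() >= 5 for day in idx}
by_list = df.groupby([lambda x: x.month, is_weekend])["val"].sum()

# The same grouping as a single callable returning a tuple per index label
by_single = df.groupby(lambda x: (x.month, x.weekday() >= 5))["val"].sum()
```

The single lambda is also free to treat each component differently, which is the "and more" part of the argument.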

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-6-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.24.0.dev0+437.g33d70efb5
pytest: 3.5.0
pip: 9.0.1
setuptools: 39.2.0
Cython: 0.28.4
numpy: 1.14.3
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.2.2.post1634.dev0+ge8120cf6d
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1
gcsfs: None

Metadata

Labels: Deprecate (Functionality to remove in pandas), Groupby