Groupby "negative dimensions are not allowed" error and bad key behaviour when there are NaN values. #9096

Closed
@jordeu

On a groupby with a composite key, if the product of the number of possible values of each key column is bigger than 2^63, calling len(grouped_data) raises ValueError "negative dimensions are not allowed".

A simple version to reproduce it:

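import pandas as pd

# 55109 is the smallest n for which n**4 exceeds 2**63
values = range(55109)
data = pd.DataFrame.from_dict({'a': values, 'b': values, 'c': values, 'd': values})
grouped = data.groupby(['a', 'b', 'c', 'd'])
len(grouped)  # ValueError: negative dimensions are not allowed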
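
To show why 55109 is the threshold in the snippet above, here is a minimal sketch of the arithmetic in plain Python. It assumes, as the 2^63 limit suggests but the report does not confirm, that the size of the combined key space ends up in a signed 64-bit integer somewhere inside groupby. The helpers `wrap_int64` and `key_space_size` are hypothetical names for illustration, not pandas functions.

```python
INT64_MAX = 2**63 - 1


def wrap_int64(x):
    """Reinterpret a Python int as a signed 64-bit value (two's complement)."""
    return (x + 2**63) % 2**64 - 2**63


def key_space_size(cardinalities):
    """Exact product of the number of distinct values in each key column."""
    size = 1
    for n in cardinalities:
        size *= n
    return size


# Four key columns with n distinct values each, as in the reproduction.
for n in (55108, 55109):
    exact = key_space_size([n] * 4)
    wrapped = wrap_int64(exact)
    print(n, "overflows int64:", exact > INT64_MAX, "wrapped negative:", wrapped < 0)
```

With 55108 values per column the product still fits in an int64. With 55109 it no longer does, and the wrapped value is negative, which would be consistent with an array being allocated with a negative size.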

A side effect of this error is that NaN values among the keys are not ignored as they normally would be: they are replaced with other values present in the index.

Here is a complete IPython notebook example to reproduce it:
http://nbviewer.ipython.org/gist/jordeu/cd86fc99f5f89451cf93
