Fix 'observed' kwarg not doing anything on SeriesGroupBy #26463
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,13 +36,14 @@ class providing the base-class of operations. | |
from pandas.api.types import ( | ||
is_datetime64_dtype, is_integer_dtype, is_object_dtype) | ||
import pandas.core.algorithms as algorithms | ||
from pandas.core.arrays import Categorical | ||
from pandas.core.base import ( | ||
DataError, GroupByError, PandasObject, SelectionMixin, SpecificationError) | ||
import pandas.core.common as com | ||
from pandas.core.frame import DataFrame | ||
from pandas.core.generic import NDFrame | ||
from pandas.core.groupby import base | ||
from pandas.core.index import Index, MultiIndex | ||
from pandas.core.index import CategoricalIndex, Index, MultiIndex | ||
from pandas.core.series import Series | ||
from pandas.core.sorting import get_group_index_sorter | ||
|
||
|
@@ -2301,6 +2302,69 @@ def tail(self, n=5): | |
mask = self._cumcount_array(ascending=False) < n | ||
return self._selected_obj[mask] | ||
|
||
def _reindex_output(self, result): | ||
""" | ||
If we have categorical groupers, then we want to make sure that | ||
[Review comment] Can you update the docstring with Parameters / Results; type things if you can. |
||
we have a fully reindex-output to the levels. These may have not | ||
participated in the groupings (e.g. may have all been | ||
nan groups); | ||
|
||
This can re-expand the output space | ||
""" | ||
|
||
# we need to re-expand the output space to accommodate all values | ||
# whether observed or not in the cartesian product of our groupers | ||
groupings = self.grouper.groupings | ||
if groupings is None: | ||
return result | ||
elif len(groupings) == 1: | ||
return result | ||
|
||
# if we only care about the observed values | ||
# we are done | ||
elif self.observed: | ||
return result | ||
|
||
# reindexing only applies to a Categorical grouper | ||
elif not any(isinstance(ping.grouper, (Categorical, CategoricalIndex)) | ||
for ping in groupings): | ||
return result | ||
|
||
levels_list = [ping.group_index for ping in groupings] | ||
index, _ = MultiIndex.from_product( | ||
levels_list, names=self.grouper.names).sortlevel() | ||
|
||
if self.as_index: | ||
d = {self.obj._get_axis_name(self.axis): index, 'copy': False} | ||
return result.reindex(**d) | ||
|
||
# GH 13204 | ||
# Here, the categorical in-axis groupers, which need to be fully | ||
# expanded, are columns in `result`. An idea is to do: | ||
# result = result.set_index(self.grouper.names) | ||
# .reindex(index).reset_index() | ||
# but special care has to be taken because of possible not-in-axis | ||
# groupers. | ||
# So, we manually select and drop the in-axis grouper columns, | ||
# reindex `result`, and then reset the in-axis grouper columns. | ||
|
||
# Select in-axis groupers | ||
in_axis_grps = ((i, ping.name) for (i, ping) | ||
in enumerate(groupings) if ping.in_axis) | ||
g_nums, g_names = zip(*in_axis_grps) | ||
|
||
result = result.drop(labels=list(g_names), axis=1) | ||
|
||
# Set a temp index and reindex (possibly expanding) | ||
result = result.set_index(self.grouper.result_index | ||
).reindex(index, copy=False) | ||
|
||
# Reset in-axis grouper columns | ||
# (using level numbers `g_nums` because level names may not be unique) | ||
result = result.reset_index(level=g_nums) | ||
|
||
return result.reset_index(drop=True) | ||
|
||
|
||
GroupBy._add_numeric_operations() | ||
|
||
|
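The re-expansion step in `_reindex_output` can be illustrated outside of pandas internals. The following is a minimal sketch, not the PR's code: the `observed` series and its values are hypothetical stand-ins for a grouped result, and only the public `MultiIndex.from_product`, `sortlevel`, and `reindex` calls are used.

```python
import pandas as pd

# Hypothetical grouped result: only three of the four (a, b)
# combinations were actually observed in the data.
observed = pd.Series(
    [3, 3, 4],
    index=pd.MultiIndex.from_tuples(
        [("one", "bar"), ("one", "foo"), ("two", "foo")],
        names=["a", "b"],
    ),
)

# Build the full cartesian product of the group levels and sort it,
# mirroring the from_product(...).sortlevel() call in the diff.
full_index, _ = pd.MultiIndex.from_product(
    [["one", "two"], ["bar", "foo"]], names=["a", "b"]
).sortlevel()

# Reindexing adds a row for the unobserved ("two", "bar") combination;
# with a plain reindex that row is filled with NaN.
expanded = observed.reindex(full_index)
```

Here `expanded` has one row per combination of levels, which is the shape the `as_index` branch of `_reindex_output` aims for when `observed` is false.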
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -76,3 +76,13 @@ def three_group(): | |
'D': np.random.randn(11), | ||
'E': np.random.randn(11), | ||
'F': np.random.randn(11)}) | ||
|
||
|
||
@pytest.fixture | ||
[Review comment] Can this be more generally used in groupby/test_categorical.py?
[Review comment] I would have to keep doing [...] I also preferred literal values to random ones for easier equality checks.
[Review comment] It's not about decreasing the number of fixtures.
[Review comment] Literal values are fine (you could just replace the random with fixed values). |
||
def df_cat(): | ||
df = DataFrame({'a': ['one', 'one', 'one', 'two'], | ||
'b': ['foo', 'foo', 'bar', 'foo'], | ||
'c': [1, 2, 3, 4]}) | ||
df['a'] = df['a'].astype('category') | ||
df['b'] = df['b'].astype('category') | ||
return df |
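As a rough illustration of the behavior this PR targets, the sketch below rebuilds the `df_cat` data inline and groups a single column, which produces a SeriesGroupBy. It assumes a pandas release that includes this fix; the variable names `only_observed` and `all_combos` are illustrative and do not come from the PR's tests.

```python
import pandas as pd

# Same literal data as the df_cat fixture.
df = pd.DataFrame({"a": ["one", "one", "one", "two"],
                   "b": ["foo", "foo", "bar", "foo"],
                   "c": [1, 2, 3, 4]})
df["a"] = df["a"].astype("category")
df["b"] = df["b"].astype("category")

# Selecting column "c" yields a SeriesGroupBy. With the fix in place,
# `observed` should control whether unobserved category combinations
# appear in the result index.
only_observed = df.groupby(["a", "b"], observed=True)["c"].sum()
all_combos = df.groupby(["a", "b"], observed=False)["c"].sum()

print(len(only_observed), len(all_combos))
```

The combination `("two", "bar")` never occurs in the data, so it should be present only in `all_combos`. The value reported for that row depends on the aggregation and the pandas version, so the sketch compares lengths rather than values.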