Bug 29764 groupby loses index name sometimes #33111

Closed
Changes from 5 commits
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -403,6 +403,7 @@ Groupby/resample/rolling
 - Bug in :meth:`GroupBy.apply` raises ``ValueError`` when the ``by`` axis is not sorted and has duplicates and the applied ``func`` does not mutate passed in objects (:issue:`30667`)
 - Bug in :meth:`DataFrameGroupby.transform` produces incorrect result with transformation functions (:issue:`30918`)
 - Bug in :meth:`DataFrame.groupby` and :meth:`Series.groupby` produces inconsistent type when aggregating Boolean series (:issue:`32894`)
+- Bug in :meth:`DataFrame.groupby` does not always maintain column index name for ``any``, ``all``, ``bfill``, ``ffill``, ``shift`` (:issue:`29764`)


 Reshaping
6 changes: 4 additions & 2 deletions pandas/core/groupby/generic.py
@@ -1687,8 +1687,9 @@ def _wrap_aggregated_output(
         -------
         DataFrame
         """
+        idx_name = output.pop("idx_name", None)
         indexed_output = {key.position: val for key, val in output.items()}
-        columns = Index(key.label for key in output)
+        columns = Index([key.label for key in output], name=idx_name)

         result = DataFrame(indexed_output)
         result.columns = columns
@@ -1720,8 +1721,9 @@ def _wrap_transformed_output(
         -------
         DataFrame
         """
+        idx_name = output.pop("idx_name", None)
         indexed_output = {key.position: val for key, val in output.items()}
-        columns = Index(key.label for key in output)
+        columns = Index([key.label for key in output], name=idx_name)

         result = DataFrame(indexed_output)
         result.columns = columns
5 changes: 4 additions & 1 deletion pandas/core/groupby/groupby.py
@@ -2235,8 +2235,11 @@ def _get_cythonized_result(
         grouper = self.grouper

         labels, _, ngroups = grouper.group_info
-        output: Dict[base.OutputKey, np.ndarray] = {}
+        output: Dict[base.OutputKey, np.ndarray, str:str] = {}
         base_func = getattr(libgroupby, how)
+        obj = self._selected_obj
+        if isinstance(obj, DataFrame):
Contributor

what are you doing here? this is very odd

@phofl (Member, Author), Apr 7, 2020

The call to self._wrap_aggregated_output(output) or self._wrap_transformed_output(output), which comes below the marked section, converts the arrays containing the results into a DataFrame (or a Series, but that is not relevant in our case). The name of the column index is already lost at that point, so I added the name to the output dictionary, which is passed as input to these functions. This is not relevant if the result is a Series, so I first check whether we have a DataFrame.

My first idea was to check whether the original object is of type DataFrameGroupBy, but I could not perform this check without importing DataFrameGroupBy at runtime. To avoid that import, I used the approach you see above.

Does this answer your question?

+            output["idx_name"] = getattr(getattr(obj, "columns"), "name")

         for idx, obj in enumerate(self._iterate_slices()):
             name = obj.name
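
The approach described in the thread above (stash the column index name in the output dictionary under a reserved key, then pop it in the wrapper before building the columns) can be sketched without pandas. This is only an illustration of the pattern, not the pandas implementation: `OutputKey` here is a hypothetical stand-in for the internal `base.OutputKey`, and `collect_output` and `wrap_output` are made-up names.

```python
from collections import namedtuple
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical stand-in for pandas' internal OutputKey (label + position).
OutputKey = namedtuple("OutputKey", ["label", "position"])


def collect_output(columns: List[str], columns_name: Optional[str]) -> Dict[Any, Any]:
    """Build a result dict and stash the column index name under a reserved key."""
    output: Dict[Any, Any] = {}
    # Side channel as in the diff: a plain string key next to OutputKey keys.
    output["idx_name"] = columns_name
    for position, label in enumerate(columns):
        # Placeholder value standing in for the computed result array.
        output[OutputKey(label=label, position=position)] = [True]
    return output


def wrap_output(output: Dict[Any, Any]) -> Tuple[List[str], Optional[str]]:
    """Pop the reserved key first so that only OutputKey entries remain."""
    idx_name = output.pop("idx_name", None)
    labels = [key.label for key in output]
    return labels, idx_name


labels, name = wrap_output(collect_output(["a", "b"], "idx"))
print(labels, name)
```

In this sketch, skipping the `pop` would make `key.label` fail on the string key, so every consumer of the dictionary has to know about the reserved entry.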
16 changes: 16 additions & 0 deletions pandas/tests/groupby/test_groupby.py
@@ -2057,3 +2057,19 @@ def test_groups_repr_truncates(max_seq_items, expected):

     result = df.groupby(np.array(df.a)).groups.__repr__()
     assert result == expected
+
+
+def test_groupby_column_index_name_lost():
+    # GH: 29764 groupby loses index sometimes
+    df = pd.DataFrame([[1]], columns=pd.Index(["a"], name="idx"))
+    result = df.groupby([1]).sum()
+    expected = pd.DataFrame([1], columns=pd.Index(["a"], name="idx"), index=[1])
+    tm.assert_frame_equal(result, expected)
+
+    result = df.groupby([1]).any()
+    expected = pd.DataFrame([True], columns=pd.Index(["a"], name="idx"), index=[1])
+    tm.assert_frame_equal(result, expected)
+
+    result = df.groupby([1]).shift()
+    expected = pd.DataFrame([np.nan], columns=pd.Index(["a"], name="idx"), index=[0])
+    tm.assert_frame_equal(result, expected)
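
To see which groupby methods keep the column index name on a given pandas installation, a small script along the following lines may help. It is a sketch that uses only public pandas API and assumes pandas is installed. The frame, the grouping key and the list of methods are illustrative choices based on the whatsnew entry. No expected output is shown for the other methods because the result depends on the pandas version.

```python
import pandas as pd

# Small frame whose column index carries a name, as in the test above.
df = pd.DataFrame(
    [[1, 0], [1, 1], [0, 1]],
    columns=pd.Index(["a", "b"], name="idx"),
)
# Group by an external key so that both columns stay in the result.
key = pd.Series(["x", "x", "y"])

names = {}
for how in ["sum", "any", "all", "ffill", "bfill", "shift"]:
    result = getattr(df.groupby(key), how)()
    names[how] = result.columns.name

for how, name in names.items():
    print(f"{how:>5}: columns.name = {name!r}")
```

Any method that reports `None` instead of `'idx'` has dropped the column index name.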