
ENH: Fixed DF.apply for functions returning a dict, #8735 #10740


Closed
wants to merge 2 commits into from
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v0.17.0.txt
@@ -218,6 +218,8 @@ Other enhancements

 - Support pickling of ``Period`` objects (:issue:`10439`)

+- ``DataFrame.apply`` will return a Series of dicts if the passed function returns a dict and ``reduce=True`` (:issue:`8735`).
+
 .. _whatsnew_0170.api:

 .. _whatsnew_0170.api_breaking:
15 changes: 9 additions & 6 deletions pandas/core/frame.py
@@ -3921,13 +3921,16 @@ def _apply_standard(self, func, axis, ignore_failures=False, reduce=True):
         # e.g. if we want to apply to a SparseFrame, then can't directly reduce
         if reduce:

-            try:

-                # the is the fast-path
-                values = self.values
-                dummy = Series(NA, index=self._get_axis(axis),
-                               dtype=values.dtype)
+            # the is the fast-path
+            values = self.values
+            # Create a dummy Series from an empty array
+            # Unlike filling with NA, this works for any dtype
+            index = self._get_axis(axis)
+            empty_arr = np.empty(len(index), dtype=values.dtype)
+            dummy = Series(empty_arr, index=self._get_axis(axis),
+                           dtype=values.dtype)
Contributor Author

I can't think of a test that will isolate this change, as the code is wrapped in DF._apply_standard, and the dummy array isn't returned. Part of the previous issue was that all exceptions were caught, so when it failed to create the dummy array, this branch silently failed. I took the dummy generation code outside of the try block, so if it fails, it will raise an exception in the one test I added. Otherwise, I'm not sure what else I can do to test it.
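
The silent failure described in this comment can be sketched in plain Python. This is an illustration only, not the pandas code: `make_dummy`, `reduce_fast`, `reduce_slow`, `apply_old` and `apply_new` are hypothetical names, and `int("nan")` stands in for a setup step that fails for integer data.

```python
def make_dummy(dtype):
    # Hypothetical setup step: float("nan") works, int("nan") raises ValueError.
    return dtype("nan")


def reduce_fast(rows, dummy):
    # Stand-in for the fast path; the dummy only has to exist.
    return sum(rows)


def reduce_slow(rows):
    # Stand-in for the fallback path.
    total = 0
    for value in rows:
        total += value
    return total


def apply_old(rows, dtype):
    # Setup lives inside the try block, so a setup failure is swallowed
    # and the fallback runs without any signal to the caller.
    try:
        dummy = make_dummy(dtype)
        return "fast", reduce_fast(rows, dummy)
    except Exception:
        return "slow", reduce_slow(rows)


def apply_new(rows, dtype):
    # Setup lives outside the try block, so a setup failure propagates.
    dummy = make_dummy(dtype)
    try:
        return "fast", reduce_fast(rows, dummy)
    except Exception:
        return "slow", reduce_slow(rows)


print(apply_old([1, 2, 3], float))  # fast path is taken
print(apply_old([1, 2, 3], int))    # setup failed, fallback taken silently
```

With the setup moved out, `apply_new([1, 2, 3], int)` raises `ValueError` instead of quietly falling back, which is the behavior the single added test would observe.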

Contributor

Well, something must have caused you to change it. What was that? The point is that we cannot make changes that are not tested.

Contributor Author

The problem is the previous code couldn't create an empty series of ints. I guess I could make this into a Series.empty_like class method and add tests for that, then replace this block with a single call to that method.

Contributor Author

Or I could fix Series(index=..., dtype=int) to return a series of 0's or something, but I would have to make sure there's a well-defined empty/zero value for any dtype.
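
As a rough analogy for why a NaN-filled dummy cannot be built for integer data while an arbitrary placeholder can, here is a standard-library sketch using `array` rather than pandas or NumPy. The helpers `nan_filled` and `placeholder` are hypothetical, and the zero fill is an assumption chosen only to keep the output deterministic.

```python
import math
from array import array


def nan_filled(n, typecode):
    # Old idea: fill the buffer with NaN. Only float typecodes can hold NaN.
    return array(typecode, [math.nan] * n)


def placeholder(n, typecode):
    # New idea: any valid contents will do, because the dummy is overwritten.
    # Zeros are used here purely so the result is predictable.
    return array(typecode, [0] * n)


print(nan_filled(3, "d"))   # works: 'd' is a C double
print(placeholder(3, "q"))  # works: 'q' is a signed 64-bit integer

try:
    nan_filled(3, "q")
except TypeError as exc:
    print("integer buffer rejected NaN:", exc)
```

The point carried over to the thread is only that "fill with NA" presupposes a dtype that can represent NA, whereas an uninitialized or zeroed buffer exists for every dtype.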

Contributor

Before you actually change anything, a test would be helpful.


+            try:
                 labels = self._get_agg_axis(axis)
                 result = lib.reduce(values, func, axis=axis, dummy=dummy,
                                     labels=labels)
2 changes: 1 addition & 1 deletion pandas/src/reduce.pyx
@@ -133,7 +133,7 @@ cdef class Reducer:
                 else:
                     res = self.f(chunk)

-                if hasattr(res,'values'):
+                if hasattr(res,'values') and isinstance(res.values, np.ndarray):
                     res = res.values
                 if i == 0:
                     result = _get_result_array(res,
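
The reason a bare `hasattr(res, 'values')` check misfires on dicts can be shown without pandas: a `dict` also has a `values` attribute, but it is a bound method, not an array. The sketch below uses hypothetical stand-ins (`ArrayLike` in place of `np.ndarray`, `SeriesLike` in place of `Series`) and hypothetical helpers `unwrap_old` and `unwrap_new` that mirror the two versions of the check.

```python
class ArrayLike:
    """Hypothetical stand-in for np.ndarray."""

    def __init__(self, data):
        self.data = list(data)


class SeriesLike:
    """Hypothetical stand-in for Series: .values holds an array."""

    def __init__(self, data):
        self.values = ArrayLike(data)


def unwrap_old(res):
    # Old check: anything exposing `.values` gets unwrapped.
    if hasattr(res, "values"):
        res = res.values
    return res


def unwrap_new(res):
    # New check: unwrap only when `.values` really is an array.
    if hasattr(res, "values") and isinstance(res.values, ArrayLike):
        res = res.values
    return res


d = {0: "foo", 1: "spam"}
s = SeriesLike([1, 2])

print(callable(unwrap_old(d)))                # True: the dict was replaced by its .values method
print(unwrap_new(d) is d)                     # True: the dict passes through untouched
print(isinstance(unwrap_new(s), ArrayLike))   # True: real array containers are still unwrapped
```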
19 changes: 19 additions & 0 deletions pandas/tests/test_frame.py
@@ -11255,6 +11255,25 @@ def test_apply_multi_index(self):
         res = s.apply(lambda x: Series({'min': min(x), 'max': max(x)}), 1)
         tm.assertIsInstance(res.index, MultiIndex)

+    def test_apply_dict(self):
+
+        # GH 8735
+        A = DataFrame([['foo', 'bar'], ['spam', 'eggs']])
+        A_dicts = pd.Series([dict([(0, 'foo'), (1, 'spam')]),
+                             dict([(0, 'bar'), (1, 'eggs')])])
+        B = DataFrame([[0, 1], [2, 3]])
+        B_dicts = pd.Series([dict([(0, 0), (1, 2)]), dict([(0, 1), (1, 3)])])
+        fn = lambda x: x.to_dict()
+
+        for df, dicts in [(A, A_dicts), (B, B_dicts)]:
+            reduce_true = df.apply(fn, reduce=True)
+            reduce_false = df.apply(fn, reduce=False)
+            reduce_none = df.apply(fn, reduce=None)
+
+            assert_series_equal(reduce_true, dicts)
+            assert_frame_equal(reduce_false, df)
+            assert_series_equal(reduce_none, dicts)
+
     def test_applymap(self):
         applied = self.frame.applymap(lambda x: x * 2)
         assert_frame_equal(applied, self.frame * 2)