Skip to content

BUG: preserve name in set_categories (#17509) #17517

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 36 commits into from
Sep 18, 2017

Conversation

Giftlin
Copy link
Contributor

@Giftlin Giftlin commented Sep 13, 2017

@@ -2068,7 +2068,7 @@ def _delegate_method(self, name, *args, **kwargs):
method = getattr(self.categorical, name)
res = method(*args, **kwargs)
if res is not None:
return Series(res, index=self.index)
return Series(res, index=self.index, name=self.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I misread the earlier. self is the instance of CategoricalAccessor, not the Series like I expected. So this approach won't work.

Copy link
Contributor

@TomAugspurger TomAugspurger Sep 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you modify

return CategoricalAccessor(data.values, data.index)
to get data.name. and then modify CategoricalAccessor.__init__ to accept a name. data may not have a name, so maybe getattr(data, 'name', None). Make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the signature should follow like we do in pandas.core.indexes.accessors, e.g. (data, index, name=None)

@TomAugspurger
Copy link
Contributor

Also, if you could add new tests to ensure the name is passed through, and a release note in doc/source/whatsnew/v0.21.0.txt under bug fixes.

@pep8speaks
Copy link

pep8speaks commented Sep 13, 2017

Hello @Giftlin! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on September 18, 2017 at 11:36 Hours UTC

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests!

@@ -2068,7 +2068,7 @@ def _delegate_method(self, name, *args, **kwargs):
method = getattr(self.categorical, name)
res = method(*args, **kwargs)
if res is not None:
return Series(res, index=self.index)
return Series(res, index=self.index, name=self.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the signature should follow like we do in pandas.core.indexes.accessors, e.g. (data, index, name=None)

@jreback jreback added the Categorical Categorical Data Type label Sep 13, 2017
@codecov
Copy link

codecov bot commented Sep 13, 2017

Codecov Report

Merging #17517 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17517      +/-   ##
==========================================
- Coverage   91.18%   91.16%   -0.02%     
==========================================
  Files         163      163              
  Lines       49543    49544       +1     
==========================================
- Hits        45177    45169       -8     
- Misses       4366     4375       +9
Flag Coverage Δ
#multiple 88.95% <100%> (ø) ⬆️
#single 40.21% <75%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/categorical.py 95.52% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.77% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f11bbf2...b817366. Read the comment docs.

@codecov
Copy link

codecov bot commented Sep 13, 2017

Codecov Report

Merging #17517 into master will decrease coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17517      +/-   ##
==========================================
- Coverage   91.25%    91.2%   -0.05%     
==========================================
  Files         163      163              
  Lines       49606    49625      +19     
==========================================
- Hits        45266    45261       -5     
- Misses       4340     4364      +24
Flag Coverage Δ
#multiple 88.99% <100%> (-0.03%) ⬇️
#single 40.19% <75%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/categorical.py 95.57% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.23% <0%> (-1.82%) ⬇️
pandas/tseries/offsets.py 97% <0%> (-0.18%) ⬇️
pandas/core/frame.py 97.77% <0%> (-0.1%) ⬇️
pandas/core/indexes/interval.py 93.57% <0%> (ø) ⬆️
pandas/core/indexes/datetimes.py 95.53% <0%> (ø) ⬆️
pandas/core/api.py 100% <0%> (ø) ⬆️
pandas/core/resample.py 96.17% <0%> (+0.01%) ⬆️
pandas/plotting/_core.py 82.73% <0%> (+0.03%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 328c7e1...ce78e6c. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Sep 14, 2017

contrib docs are here. pls add tests IN this PR. It should fail with the tests, then pass after the fix.

@jorisvandenbossche jorisvandenbossche changed the title FIX for set_categories issue #17509 BUG: preserve name in set_categories (#17509) Sep 15, 2017
@@ -566,6 +566,7 @@ Categorical
- Bug in the categorical constructor with empty values and categories causing
the ``.categories`` to be an empty ``Float64Index`` rather than an empty
``Index`` with object dtype (:issue:`17248`)
- Bug in preserving name in ``set_categories``. (:issue:`17509`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make this more descriptive? Maybe

Bug in categorical operations on Series like `Series.cat.set_categories` not preserving the original Series' name (:issue:`17509`)

@@ -57,6 +57,13 @@ def test_getitem_listlike(self):
expected = c[np.array([100000]).astype(np.int64)].codes
tm.assert_numpy_array_equal(result, expected)

def test_getname_category(self):
result = 'A'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result should be the result of the operation. So. result = s.cat.set_categories([1, 2, 3]).name. And then compare directly to the expected name. assert result == 'A'

s = s.cat.set_categories([1, 2, 3])
expected = s.astype('category').name
tm.assert_almost_equal(result, expected)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add additional tests for the other delegated methods that this should apply to? set_ordered, etc?

expected = 'A'
s = pd.Series([1, 2, 3], name='A').astype('category')
s = s.cat.set_categories([1, 2, 3])
result = s.astype('category').name
Copy link
Contributor

@TomAugspurger TomAugspurger Sep 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be result = s.cat.set_categories([1, 2, 3]).name so that we're directly testing the behavior that caused the bug.

@@ -566,7 +566,9 @@ Categorical
- Bug in the categorical constructor with empty values and categories causing
the ``.categories`` to be an empty ``Float64Index`` rather than an empty
``Index`` with object dtype (:issue:`17248`)

- Bug in categorical operations on Series like `Series.cat.set_categories`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it should be double backticks, not single

``Series.cat.set_categories``

@@ -57,6 +57,12 @@ def test_getitem_listlike(self):
expected = c[np.array([100000]).astype(np.int64)].codes
tm.assert_numpy_array_equal(result, expected)

def test_getname_category(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parametrize with all accessor methods

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use a lambda expression

@pytest.mark.parametrize("method",
[
   lambda x: x.cat.set_categories([1, 2,3 ]),
   lambda x: x.cat.reorder_categories([2,3, 1], ordered=True),
])
def test_getname_categorical_accessor(self, method):
     s = ....
    expected = 'A'
    result = method(s).name
    assert result == expected

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current test function is also fine only right? What is the difference? Performance?

@@ -566,7 +566,9 @@ Categorical
- Bug in the categorical constructor with empty values and categories causing
the ``.categories`` to be an empty ``Float64Index`` rather than an empty
``Index`` with object dtype (:issue:`17248`)

- Bug in categorical operations on Series like ``Series.cat.set_categories``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should instead refer to all Series.cat operations not preserving name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I mention all the functions along with set_categories here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -57,6 +57,24 @@ def test_getitem_listlike(self):
expected = c[np.array([100000]).astype(np.int64)].codes
tm.assert_numpy_array_equal(result, expected)

@pytest.mark.parametrize("method",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if this is passing linting I would write this like

@pytest.mark.parametrize(
    "method",
    [
        lambda .....
    ])

to get more indent

@jreback
Copy link
Contributor

jreback commented Sep 17, 2017

ok this looks fine to me (after fixing linting). ping on green.

@jreback jreback added the Bug label Sep 17, 2017
@jorisvandenbossche jorisvandenbossche added this to the 0.21.0 milestone Sep 18, 2017
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
Just a minor comment about how to link to a section in the docs.

@@ -566,6 +566,7 @@ Categorical
- Bug in the categorical constructor with empty values and categories causing
the ``.categories`` to be an empty ``Float64Index`` rather than an empty
``Index`` with object dtype (:issue:`17248`)
- Bug in categorical operations `Series.cat <http://pandas.pydata.org/pandas-docs/stable/categorical.html#working-with-categories>' not preserving the original Series' name (:issue:`17509`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sphinx has the ability to do local references (see http://www.sphinx-doc.org/en/stable/markup/inline.html#cross-referencing-arbitrary-locations), so you can do :ref:`Series.cat <categorical.cat>` instead of listing the actual full url.
You only need to add a label at the specific location (put .. _categorical.cat: on the line before the title):

Working with categories
-----------------------

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the expected change right?

screenshot_2017-09-18-13-25-56-574_com android browser

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes!

s = pd.Series([1, 2, 3], name='A').astype('category')
expected = 'A'
result = method(s).name
assert result == expected
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this, construct the expected Series directly from the constructor and call tm.assert_series_equal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But Series is not expected. Only the name is expected, which is a character

Copy link
Member

@gfyoung gfyoung Sep 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because you defined your test as such. The output of method is a Series, not a character. I'm saying that you should check the output of method instead of comparing its name attribute.

We want to make sure that nothing else changes and that the name parameter is preserved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gfyoung In this case asserting only the name is fine I think. Assuming the actual methods are already tested separately, this tests just asserts for all of them they preserve the name. If checking the actual resulting series, all of them would need a different result, so would just over-complicate the test.

@Giftlin
Copy link
Contributor Author

Giftlin commented Sep 18, 2017

@jreback
@TomAugspurger
checks have passed

@jorisvandenbossche jorisvandenbossche merged commit 9cc3333 into pandas-dev:master Sep 18, 2017
@jorisvandenbossche
Copy link
Member

@Giftlin Thanks!

(I added a small commit with some white-space clean-up. Not sure what went wrong, but the diff showed changes in whitespace on some lines. You might need to check the settings of your editor to ensure it only uses spaces for whitespace)

@Giftlin Giftlin deleted the patch-4 branch September 18, 2017 18:01
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Categorical Categorical Data Type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Series losts its name after set_categories
6 participants