-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: preserve name in set_categories (#17509) #17517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -2068,7 +2068,7 @@ def _delegate_method(self, name, *args, **kwargs): | |||
method = getattr(self.categorical, name) | |||
res = method(*args, **kwargs) | |||
if res is not None: | |||
return Series(res, index=self.index) | |||
return Series(res, index=self.index, name=self.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I misread the earlier. self
is the instance of CategoricalAccessor
, not the Series
like I expected. So this approach won't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you modify
pandas/pandas/core/categorical.py
Line 2078 in f11bbf2
return CategoricalAccessor(data.values, data.index) |
data.name
. and then modify CategoricalAccessor.__init__
to accept a name. data
may not have a name, so maybe getattr(data, 'name', None)
. Make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the signature should follow like we do in pandas.core.indexes.accessors
, e.g. (data, index, name=None)
Also, if you could add new tests to ensure the name is passed through, and a release note in doc/source/whatsnew/v0.21.0.txt under bug fixes. |
Hello @Giftlin! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on September 18, 2017 at 11:36 Hours UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests!
@@ -2068,7 +2068,7 @@ def _delegate_method(self, name, *args, **kwargs): | |||
method = getattr(self.categorical, name) | |||
res = method(*args, **kwargs) | |||
if res is not None: | |||
return Series(res, index=self.index) | |||
return Series(res, index=self.index, name=self.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the signature should follow like we do in pandas.core.indexes.accessors
, e.g. (data, index, name=None)
Codecov Report
@@ Coverage Diff @@
## master #17517 +/- ##
==========================================
- Coverage 91.18% 91.16% -0.02%
==========================================
Files 163 163
Lines 49543 49544 +1
==========================================
- Hits 45177 45169 -8
- Misses 4366 4375 +9
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #17517 +/- ##
==========================================
- Coverage 91.25% 91.2% -0.05%
==========================================
Files 163 163
Lines 49606 49625 +19
==========================================
- Hits 45266 45261 -5
- Misses 4340 4364 +24
Continue to review full report at Codecov.
|
contrib docs are here. pls add tests IN this PR. It should fail with the tests, then pass after the fix. |
- Bug in preserving name in set_categories. (:issue:`17509`)
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -566,6 +566,7 @@ Categorical | |||
- Bug in the categorical constructor with empty values and categories causing | |||
the ``.categories`` to be an empty ``Float64Index`` rather than an empty | |||
``Index`` with object dtype (:issue:`17248`) | |||
- Bug in preserving name in ``set_categories``. (:issue:`17509`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you make this more descriptive? Maybe
Bug in categorical operations on Series like `Series.cat.set_categories` not preserving the original Series' name (:issue:`17509`)
pandas/tests/test_categorical.py
Outdated
@@ -57,6 +57,13 @@ def test_getitem_listlike(self): | |||
expected = c[np.array([100000]).astype(np.int64)].codes | |||
tm.assert_numpy_array_equal(result, expected) | |||
|
|||
def test_getname_category(self): | |||
result = 'A' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
result
should be the result of the operation. So. result = s.cat.set_categories([1, 2, 3]).name
. And then compare directly to the expected name. assert result == 'A'
pandas/tests/test_categorical.py
Outdated
s = s.cat.set_categories([1, 2, 3]) | ||
expected = s.astype('category').name | ||
tm.assert_almost_equal(result, expected) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add additional tests for the other delegated methods that this should apply to? set_ordered
, etc?
pandas/tests/test_categorical.py
Outdated
expected = 'A' | ||
s = pd.Series([1, 2, 3], name='A').astype('category') | ||
s = s.cat.set_categories([1, 2, 3]) | ||
result = s.astype('category').name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be result = s.cat.set_categories([1, 2, 3]).name
so that we're directly testing the behavior that caused the bug.
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -566,7 +566,9 @@ Categorical | |||
- Bug in the categorical constructor with empty values and categories causing | |||
the ``.categories`` to be an empty ``Float64Index`` rather than an empty | |||
``Index`` with object dtype (:issue:`17248`) | |||
|
|||
- Bug in categorical operations on Series like `Series.cat.set_categories` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, it should be double backticks, not single
``Series.cat.set_categories``
pandas/tests/test_categorical.py
Outdated
@@ -57,6 +57,12 @@ def test_getitem_listlike(self): | |||
expected = c[np.array([100000]).astype(np.int64)].codes | |||
tm.assert_numpy_array_equal(result, expected) | |||
|
|||
def test_getname_category(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parametrize with all accessor methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use a lambda expression
@pytest.mark.parametrize("method",
[
lambda x: x.cat.set_categories([1, 2,3 ]),
lambda x: x.cat.reorder_categories([2,3, 1], ordered=True),
])
def test_getname_categorical_accessor(self, method):
s = ....
expected = 'A'
result = method(s).name
assert result == expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current test function is also fine only right? What is the difference? Performance?
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -566,7 +566,9 @@ Categorical | |||
- Bug in the categorical constructor with empty values and categories causing | |||
the ``.categories`` to be an empty ``Float64Index`` rather than an empty | |||
``Index`` with object dtype (:issue:`17248`) | |||
|
|||
- Bug in categorical operations on Series like ``Series.cat.set_categories`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should instead refer to all Series.cat
operations not preserving name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I mention all the functions along with set_categories here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just refer to http://pandas.pydata.org/pandas-docs/stable/categorical.html#working-with-categories
Lint error fix
Sorry for too many commits.
Lint error
pandas/tests/test_categorical.py
Outdated
@@ -57,6 +57,24 @@ def test_getitem_listlike(self): | |||
expected = c[np.array([100000]).astype(np.int64)].codes | |||
tm.assert_numpy_array_equal(result, expected) | |||
|
|||
@pytest.mark.parametrize("method", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if this is passing linting I would write this like
@pytest.mark.parametrize(
"method",
[
lambda .....
])
to get more indent
ok this looks fine to me (after fixing linting). ping on green. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Just a minor comment about how to link to a section in the docs.
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -566,6 +566,7 @@ Categorical | |||
- Bug in the categorical constructor with empty values and categories causing | |||
the ``.categories`` to be an empty ``Float64Index`` rather than an empty | |||
``Index`` with object dtype (:issue:`17248`) | |||
- Bug in categorical operations `Series.cat <http://pandas.pydata.org/pandas-docs/stable/categorical.html#working-with-categories>' not preserving the original Series' name (:issue:`17509`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sphinx has the ability to do local references (see http://www.sphinx-doc.org/en/stable/markup/inline.html#cross-referencing-arbitrary-locations), so you can do :ref:`Series.cat <categorical.cat>`
instead of listing the actual full url.
You only need to add a label at the specific location (put .. _categorical.cat:
on the line before the title):
pandas/doc/source/categorical.rst
Lines 148 to 150 in cbb090f
Working with categories | |
----------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes!
Update categorical.rst
What's new
pandas/tests/test_categorical.py
Outdated
s = pd.Series([1, 2, 3], name='A').astype('category') | ||
expected = 'A' | ||
result = method(s).name | ||
assert result == expected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of doing this, construct the expected Series
directly from the constructor and call tm.assert_series_equal
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But Series is not expected. Only the name is expected, which is a character
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's because you defined your test as such. The output of method
is a Series
, not a character. I'm saying that you should check the output of method
instead of comparing its name
attribute.
We want to make sure that nothing else changes and that the name
parameter is preserved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gfyoung In this case asserting only the name is fine I think. Assuming the actual methods are already tested separately, this tests just asserts for all of them they preserve the name. If checking the actual resulting series, all of them would need a different result, so would just over-complicate the test.
Update test_categorical.py
@jreback |
@Giftlin Thanks! (I added a small commit with some white-space clean-up. Not sure what went wrong, but the diff showed changes in whitespace on some lines. You might need to check the settings of your editor to ensure it only uses spaces for whitespace) |
git diff upstream/master -u -- "*.py" | flake8 --diff