-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Melting with not present column does not produce error #23575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Melting with not present column does not produce error #23575
Conversation
Hello @michaelsilverstein! Thanks for updating the PR.
Comment last updated on November 15, 2018 at 22:12 Hours UTC |
Codecov Report
@@ Coverage Diff @@
## master #23575 +/- ##
==========================================
+ Coverage 92.25% 92.29% +0.03%
==========================================
Files 161 161
Lines 51383 51500 +117
==========================================
+ Hits 47404 47531 +127
+ Misses 3979 3969 -10
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need tests and a release note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add some tests and a whatsnew entry
@jreback would you be able to point me to some documentation on how to properly document tests and a whatsnew entry? |
All the contributing docs are at http://pandas-docs.github.io/pandas-docs-travis/contributing.html release notes: http://pandas-docs.github.io/pandas-docs-travis/contributing.html#documenting-your-code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback I have added tests and a whatsnew entry
You may need to escape the `[` and `]` in the match.
…On Tue, Nov 13, 2018 at 4:30 PM Michael Silverstein < ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pandas/tests/reshape/test_melt.py
<#23575 (comment)>:
> @@ -661,3 +661,36 @@ def test_col_substring_of_stubname(self):
i=['node_id', 'A'],
j='time')
tm.assert_frame_equal(result, expected)
+
+ def test_melt_missing_columns(self):
+ # GH-23575
+ # This test is to ensure that pandas raises an error if melting is
+ # attempted with column names absent from the dataframe
+
+ # Generate data
+ df = pd.DataFrame(np.random.randn(5, 4), columns=list('abcd'))
+
+ # Try to melt with missing `value_vars` column name
+ with pytest.raises(KeyError, match="The following 'value_vars' are not"
Regex wasn't working and now when I try pytest with match equal to the
exact output of the error message (as before) I get this frustrating error:
AssertionError: Pattern 'The following 'value_vars' are not present in the
DataFrame: ['C']' not found in '"The following 'value_vars' are not present
in the DataFrame: ['C']"'
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#23575 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIn5i7rcFplFXCToXnbwsh-Nf_wxuks5uu0fygaJpZM4YVNGG>
.
|
# Conflicts: # doc/source/whatsnew/v0.24.0.rst
…v_melt_column_check # Conflicts: # doc/source/whatsnew/v0.24.0.rst
…v_melt_column_check # Conflicts: # doc/source/whatsnew/v0.24.0.rst
@jreback CircleCI and Azure have passed. Failure in TravisCI seems unrelated (@TomAugspurger agrees it seems). |
This failure is real: https://travis-ci.org/pandas-dev/pandas/jobs/455623394#L2875 It seems there's an issue with MultiIndex in the columns. I'm surprised there aren't tests for that outside of the docstring. |
I'll add the tests from the docstring explicitly |
I think this is the only error from TravisCI now? |
yeah this looks like a ci/code_checks.sh error, run it locally and see |
What is the proper order of imports? I had to import |
http://pandas-docs.github.io/pandas-docs-travis/contributing.html#import-formatting SO something like |
I think the failure from TravisCI passed here: |
@@ -24,6 +25,10 @@ | |||
def melt(frame, id_vars=None, value_vars=None, var_name=None, | |||
value_name='value', col_level=None): | |||
# TODO: what about the existing index? | |||
if isinstance(frame.columns, ABCMultiIndex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not especially familiar with melt and multi-index columns, but I don't think this is quite right.
It seems like you need to specify col_level
when you have a MI in the columns, so you should probably just be checks against frame.columns.levels[col_level]
when you have a MI.
However, it doesn't quite seem that a col_level
is required when there's a MI in the columns. The default of pd.melt(df)
seems to work, but any time I specified an id_vars
or value_vars
without col_level I get an uninformative error message. I'm not sure what's going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to provide col_level
for MI when only melting on one level like this example from the docstring (that I added a new test for):
pd.melt(df, col_level=0, id_vars=['A'], value_vars=['B'])
But you don't need to specify col_level
when using all levels of MI:
pd.melt(df, id_vars=[('A', 'D')], value_vars=[('B', 'E')])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All I am doing at L28 is gathering column names from all levels. There are other checks to make sure that melting is performed properly, this will just check to make sure that whatever you pass, it is in your df
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i think this is ok, can you provdie a comment on what is going on.
@@ -24,6 +25,10 @@ | |||
def melt(frame, id_vars=None, value_vars=None, var_name=None, | |||
value_name='value', col_level=None): | |||
# TODO: what about the existing index? | |||
if isinstance(frame.columns, ABCMultiIndex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i think this is ok, can you provdie a comment on what is going on.
KeyError, | ||
match=msg.format(Var='id_vars', | ||
Col="\\['not_here', 'or_there'\\]")): | ||
df.melt(['a', 'b', 'not_here', 'or_there'], ['c', 'd']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do an example with an MI and columns that are not in the top level of the MI, ideally try with and w/o col_level as well.
lgtm. ping on green. |
thanks! |
git diff upstream/master -u -- "*.py" | flake8 --diff