Melting with not present column does not produce error #23575
Merged: jreback merged 34 commits into pandas-dev:master from michaelsilverstein:dev_melt_column_check on Nov 21, 2018
Commits
855985d check for columns in dataframe (michaelsilverstein)
40fdb05 check for columns in dataframe (michaelsilverstein)
9670da2 check difference with Index; use {} str formatting (michaelsilverstein)
3ffc870 missing.any() (michaelsilverstein)
8139f78 started test (michaelsilverstein)
0a94650 added to whatsnew (michaelsilverstein)
d0f6d23 PEP criteria (michaelsilverstein)
6c76161 `missing.empty` to accommodate MultiIndex (michaelsilverstein)
ad3d926 rm `*` (michaelsilverstein)
e097a87 rm comment (michaelsilverstein)
5ff3a32 add test for id_var and multiple missing (michaelsilverstein)
fcbda15 reformat error statement; Value->KeyError (michaelsilverstein)
3175b34 simplified test (michaelsilverstein)
515fb9f Issue -> GH (michaelsilverstein)
c7d6fcf PEP criteria (michaelsilverstein)
5911cc3 PEP criteria (michaelsilverstein)
47ca7fc test not working now (michaelsilverstein)
d0ee9c5 regex compatible match (michaelsilverstein)
c75ab23 PEP criteria (michaelsilverstein)
32ed22c move test to TestMelt() class (michaelsilverstein)
e629b2a PEP (michaelsilverstein)
89de406 PEP (michaelsilverstein)
1d13f4a PEP (michaelsilverstein)
479b761 Merge branch 'master' into dev_melt_column_check (michaelsilverstein)
01e8d74 resolving conflicts (michaelsilverstein)
6762b21 Merge branch 'master' of https://github.com/pandas-dev/pandas into de… (michaelsilverstein)
eae7716 Merge branch 'master' of https://github.com/pandas-dev/pandas into de… (michaelsilverstein)
fba641f handle multiindex columns (michaelsilverstein)
06b7cdb test single var melt with multiindex (michaelsilverstein)
39c746b test single var melt with multiindex (michaelsilverstein)
af170e1 pep8 and index sorting (michaelsilverstein)
4c9bc9f rm extra description (michaelsilverstein)
c59d29f add comment (michaelsilverstein)
0db8838 add MI tests (michaelsilverstein)
@@ -101,6 +101,14 @@ def test_vars_work_with_multiindex(self):
        result = self.df1.melt(id_vars=[('A', 'a')], value_vars=[('B', 'b')])
        tm.assert_frame_equal(result, expected)

    def test_single_vars_work_with_multiindex(self):
        expected = DataFrame({
            'A': {0: 1.067683, 1: -1.321405, 2: -0.807333},
            'CAP': {0: 'B', 1: 'B', 2: 'B'},
            'value': {0: -1.110463, 1: 0.368915, 2: 0.08298}})
        result = self.df1.melt(['A'], ['B'], col_level=0)
        tm.assert_frame_equal(result, expected)

    def test_tuple_vars_fail_with_multiindex(self):
        # melt should fail with an informative error message if
        # the columns have a MultiIndex and a tuple is passed
@@ -233,6 +241,49 @@ def test_pandas_dtypes(self, col):
        expected.columns = ['klass', 'col', 'attribute', 'value']
        tm.assert_frame_equal(result, expected)

    def test_melt_missing_columns_raises(self):
        # GH-23575
        # This test is to ensure that pandas raises an error if melting is
        # attempted with column names absent from the dataframe

        # Generate data
        df = pd.DataFrame(np.random.randn(5, 4), columns=list('abcd'))

        # Try to melt with missing `value_vars` column name
        msg = "The following '{Var}' are not present in the DataFrame: {Col}"
        with pytest.raises(
                KeyError,
                match=msg.format(Var='value_vars', Col="\\['C'\\]")):
            df.melt(['a', 'b'], ['C', 'd'])

        # Try to melt with missing `id_vars` column name
        with pytest.raises(
                KeyError,
                match=msg.format(Var='id_vars', Col="\\['A'\\]")):
            df.melt(['A', 'b'], ['c', 'd'])

        # Multiple missing
        with pytest.raises(
                KeyError,
                match=msg.format(Var='id_vars',
                                 Col="\\['not_here', 'or_there'\\]")):
            df.melt(['a', 'b', 'not_here', 'or_there'], ['c', 'd'])
Review comment: Can you do an example with an MI and columns that are not in the top level of the MI? Ideally, try with and without `col_level` as well.

        # Multiindex melt fails if column is missing from multilevel melt
        multi = df.copy()
        multi.columns = [list('ABCD'), list('abcd')]
        with pytest.raises(
                KeyError,
                match=msg.format(Var='id_vars',
                                 Col="\\['E'\\]")):
            multi.melt([('E', 'a')], [('B', 'b')])
        # Multiindex fails if column is missing from single level melt
        with pytest.raises(
                KeyError,
                match=msg.format(Var='value_vars',
                                 Col="\\['F'\\]")):
            multi.melt(['A'], ['F'], col_level=0)


class TestLreshape(object):
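The behaviour that `test_melt_missing_columns_raises` pins down can also be tried outside the test suite. The following is a minimal sketch, assuming a pandas version that includes this change; the helper `try_melt` is hypothetical and exists only for illustration. The exact wording of the error message may differ between pandas releases, so the sketch only inspects the message for the missing label.

```python
import numpy as np
import pandas as pd

# Small frame with flat (non-MultiIndex) columns, as in the test above.
df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=list("abcd"))


def try_melt(id_vars, value_vars):
    """Hypothetical helper: return the KeyError message, or None on success."""
    try:
        df.melt(id_vars, value_vars)
    except KeyError as err:
        return err.args[0]
    return None


ok = try_melt(["a", "b"], ["c", "d"])         # every label exists
bad_value = try_melt(["a", "b"], ["C", "d"])  # 'C' is not a column
bad_id = try_melt(["A", "b"], ["c", "d"])     # 'A' is not a column

print(bad_value)
print(bad_id)
```

Before this change, a misspelled label could pass through `melt` without any error, which is the problem reported in the issue title.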
I'm not especially familiar with melt and multi-index columns, but I don't think this is quite right. It seems like you need to specify `col_level` when you have a MI in the columns, so you should probably just be checking against `frame.columns.levels[col_level]` when you have a MI.

However, it doesn't quite seem that a `col_level` is required when there's a MI in the columns. The default of `pd.melt(df)` seems to work, but any time I specified an `id_vars` or `value_vars` without `col_level` I got an uninformative error message. I'm not sure what's going on.
I think you need to provide `col_level` for MI when only melting on one level, like this example from the docstring (that I added a new test for): `pd.melt(df, col_level=0, id_vars=['A'], value_vars=['B'])`
But you don't need to specify `col_level` when using all levels of MI: `pd.melt(df, id_vars=[('A', 'D')], value_vars=[('B', 'E')])`
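The two call styles described in this comment can be compared side by side. This is a small sketch that assumes a frame shaped like the one in the `pd.melt` docstring (top level `A`, `B`, `C`; second level `D`, `E`, `F`); the variable names `single` and `full` are made up for illustration.

```python
import pandas as pd

# Frame with two column levels: top level A/B/C, second level D/E/F.
df = pd.DataFrame({"A": {0: "a", 1: "b", 2: "c"},
                   "B": {0: 1, 1: 3, 2: 5},
                   "C": {0: 2, 1: 4, 2: 6}})
df.columns = [list("ABC"), list("DEF")]

# Melting on one level only: labels are plain strings and col_level is given.
single = pd.melt(df, col_level=0, id_vars=["A"], value_vars=["B"])

# Melting on all levels: labels are full tuples and col_level is omitted.
full = pd.melt(df, id_vars=[("A", "D")], value_vars=[("B", "E")])

print(single)
print(full)
```

In both cases the `value` column holds the entries of column `B`; the full-tuple form additionally records one variable column per level.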
All I am doing at L28 is gathering column names from all levels. There are other checks to make sure that melting is performed properly; this will just check that whatever you pass is in your df.
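The idea of gathering labels from all levels and then checking membership can be sketched in plain Python. This is an illustration of the approach described in the comment, not the code merged in this PR; the function name `find_missing` and its arguments are hypothetical, and MultiIndex columns are modelled simply as tuples.

```python
def find_missing(requested, columns):
    """Return requested labels that appear on no level of `columns`.

    `columns` is a list of labels; a tuple stands in for a MultiIndex
    column, contributing one label per level.
    """
    # Flatten every level of every column into one set of known labels.
    known = set()
    for col in columns:
        if isinstance(col, tuple):
            known.update(col)
        else:
            known.add(col)

    # Collect each requested label (or tuple part) that is not known,
    # keeping first-seen order and skipping repeats.
    missing = []
    for label in requested:
        parts = label if isinstance(label, tuple) else (label,)
        for part in parts:
            if part not in known and part not in missing:
                missing.append(part)
    return missing


multi_columns = [("A", "a"), ("B", "b"), ("C", "c"), ("D", "d")]
print(find_missing([("E", "a")], multi_columns))
print(find_missing(["a", "not_here", "or_there"], list("abcd")))
```

Because the check only asks whether a label exists somewhere in the columns, it leaves questions such as whether `col_level` matches the labels to the other checks mentioned in the comment.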
Thoughts?
Yeah, I think this is OK. Can you provide a comment on what is going on?