-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Fix for DataFrame.hist() with by- and weights-keyword #11441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
will make the following work import pandas as pd d = {'one' : ['A', 'A', 'B', 'C'], 'two' : [4., 3., 2., 1.], 'three' : [10., 8., 5., 7.]} df = pd.DataFrame(d) df.hist('two', by='one', weights='three', bins=range(0, 10))
Because of my poor git skills, this is a continuation for upgraded to work with the newest pandas release I am at loss writing a test for this. @jreback I could not find the place you recommended. The place where weights are tested right now seems to be test_generic.py? However I am unsure of how to construct a test the best way, I have never written one, so pointers are very welcome! |
Also fix if NaN's make an entire group empty
… in the DataFrame
Note, I added the column 'weighst' to the supplied DataFrame if an array was supplied instead of the column name... Can this be done in a nicer way? I need it in the groupby logic. if a column is named 'weights' it will be overwritten... :( |
@Twizzledrizzle no worries about the git snafu, we've all been there. People are usually in the chat if you need help walking through git. I think Jeff was talking about the logic in For the tests you can look to see how similar tests are done in I'll take a look later, but it won't be until next week sometime. |
…nymore Also made the logic better when using weights without group by by aligning NaN's, and finding out if weights is supplied by column or by array.
Thanks @TomAugspurger I will take a look. A lot of commits right now, finding things while working through the travis errors... I hope the server does not crash! |
Yeah, we'll come up with a better way to solve that weights column. FYI you can run the tests locally, will be quicker than waiting on Travis. Make sure you have nose installed ( Run specific tests with |
I have python 2.7 and every tests seems to work fine. Why could python 3.4 fail on
Does not seem to have anything to do with my pull request? |
xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, | ||
sharey=False, figsize=None, layout=None, bins=10, **kwds): | ||
def hist_frame(data, column=None, weights=None, by=None, grid=True, | ||
xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't change the order of parameters, just add at th eend
need tests |
@Twizzledrizzle ignore that goggle multi failure....its an unreliable test on travis |
Uses dropna(subset=...) to delete where nan's over the columns supplied Also doing this in the beginning so we do not have to duplicate this logic
Made the code more readable now I think, I hope it looks better. Regarding tests, are we talking tests like this? In test_graphics_others.py? (Looks like some other tests are done one the hist functionality there). ax = _check_plot_works(df.hist, column='two', by='one', weights='three', bins=range(0, 10)) |
Oh also, as a side, the new function df.plot.hist() will miss a lot of this functionality. df.plot.hist() seem to have problems using the by- keyword. Also missing column keyword. I tried working with the code, but it was more complex than the one I have been editing. I could not get my head around making the by-keyword work. Should I open a discussion around this perhaps in a separate thread? |
subset_cols_drop_nan = [] | ||
if weights is not None: | ||
if isinstance(weights, np.ndarray): | ||
# weights supplied as an array instead of a part of the dataframe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this will not work if weights is a >1 dimensional ndarray... Need to think on this
@TomAugspurger status of this? |
I am awaiting with interest what will happen with before putting to much time in this pull request. |
@TomAugspurger status of this |
@TomAugspurger status? |
I think waiting on #8018, which will conflict with the case where |
status? |
@TomAugspurger can this be independent #8018 ? |
closing as stale |
will make the following work