WIP: Add value_counts() to DataFrame #5381
@@ -4297,6 +4297,55 @@ def mode(self, axis=0, numeric_only=False):
        f = lambda s: s.mode()
        return data.apply(f, axis=axis)

    def value_counts(self, axis=0, normalize=False, sort=True,
                     ascending=False, bins=None, numeric_only=False):
        """
        Returns DataFrame containing counts of unique values. The resulting
        DataFrame will be in descending order so that the first element is the
        most frequently-occurring element among *all* columns. Excludes NA
        values. Maintains order along axis (i.e., column/row)

        Parameters
        ----------
        axis : {0, 1, 'index', 'columns'} (default 0)
            0/'index' : get value_counts by column
Review comment: can you add
            1/'columns' : get value_counts by row
        normalize: boolean, default False
Review comment: space between 'normalize' and ':'
            If True then the Series returned will contain the relative
            frequencies of the unique values.
        sort : boolean, default True
            Sort by sum of counts across columns (if False, DataFrame will be
            sorted by union of all the unique values found)
        ascending : boolean, default False
            Sort in ascending order
        bins : integer or sequence of scalars, optional
            Rather than count values, group them into half-open bins, a
            convenience for pd.cut, only works with numeric data. If integer,
            then creates bins based upon overall max and overall min. If
            passed, assumes numeric_only.
        numeric_only : bool, default False
            only apply to numeric columns.

        Returns
        -------
        counts : DataFrame
        """
        data = self if not numeric_only else self._get_numeric_data()
        from pandas.tools.tile import _generate_bins
        if bins is not None and not com._is_sequence(bins):
            max_val = self.max().max()
            min_val = self.min().min()
            bins = _generate_bins(bins=bins, min_val=min_val, max_val=max_val)

        f = lambda s: s.value_counts(normalize=normalize, bins=bins)
        res = data.apply(f, axis=axis)

        if sort:
            order = res.sum(1).order(ascending=ascending).index
            res = res.reindex(order)

        return res

    def quantile(self, q=0.5, axis=0, numeric_only=True):
        """
        Return values at the given quantile over requested axis, a la
Review comment: Can you put the first sentence on a separate line (i.e., move "The resulting" to the next line)? When following the numpy docstring standard exactly, there should even be a blank line after the first sentence. This ensures that the summary in the API docs (http://pandas.pydata.org/pandas-docs/dev/api.html) is limited to that one sentence.
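The layout the reviewer asks for can be sketched as follows: the summary sentence sits alone on the first line, a blank line follows, and the longer description comes after it. The function name is hypothetical and the text reuses the wording of the diff's docstring; the standard-library `inspect.getdoc` is used only to check what ends up on the first line once indentation and leading blank lines are cleaned.

```python
import inspect


def value_counts_docstring_example():
    """
    Returns DataFrame containing counts of unique values.

    The resulting DataFrame will be in descending order so that the first
    element is the most frequently-occurring element among *all* columns.
    Excludes NA values. Maintains order along axis (i.e., column/row).
    """


lines = inspect.getdoc(value_counts_docstring_example).splitlines()
print(lines[0])  # the one-sentence summary
```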