Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample
import pandas as pd
# Test 1 - Group by with value counts returns a series
id_column = ["idA","idA","idA","idA","idA","idA","idB","idB","idB","idB","idB","idB","idB"]
test_bool_value = [True,False,False,False,False,False,True,True,False,False,False,True,True]
test_data = {"id":id_column, "bool_value":test_bool_value}
df = pd.DataFrame(data=test_data)
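# The lambda parameter is named "group" so it does not shadow the outer df
df_output = df.groupby("id").apply(lambda group: group["bool_value"].value_counts())
print(type(df_output))  # reported as a Series on pandas 1.3.0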
# Test 2 - Group by with value counts returns a dataframe (same input df shape with different values)
id_column = ["idA","idA","idA","idA","idA","idA","idB","idB","idB","idB","idB","idB","idB"]
test_bool_value = [True,True,True,True,False,False,True,True,False,False,False,True,True]
test_data = {"id":id_column, "bool_value":test_bool_value}
df = pd.DataFrame(data=test_data)
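# Same call as in Test 1; only the values in bool_value differ
df_output = df.groupby("id").apply(lambda group: group["bool_value"].value_counts())
print(type(df_output))  # reported as a DataFrame on pandas 1.3.0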
Problem description
When value_counts() is called on a column inside groupby().apply(), here to count the True/False values within each id group, the return type is sometimes a Series and sometimes a DataFrame. The type depends on the particular True/False values in the column, even though the shape of the input DataFrame is unchanged.
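One way to narrow this down is to look at the index of each per-group value_counts() result. value_counts() sorts by frequency, so the label order can differ from one group to the next. The sketch below prints that order for both tests; the helper name per_group_index is hypothetical and not part of pandas. My assumption, not checked against the pandas source, is that apply() builds a DataFrame when every group returns the same index and otherwise falls back to a concatenated Series.

```python
import pandas as pd

ids = ["idA"] * 6 + ["idB"] * 7
test1 = [True, False, False, False, False, False,
         True, True, False, False, False, True, True]
test2 = [True, True, True, True, False, False,
         True, True, False, False, False, True, True]


def per_group_index(bool_values):
    """Map each id to its value_counts() index labels, in order (hypothetical helper)."""
    frame = pd.DataFrame({"id": ids, "bool_value": bool_values})
    # Iterating a SeriesGroupBy yields (group key, Series) pairs
    return {
        key: list(group.value_counts().index)
        for key, group in frame.groupby("id")["bool_value"]
    }


order_test1 = per_group_index(test1)
order_test2 = per_group_index(test2)
print(order_test1)
print(order_test2)
```

In Test 1 the two groups list their labels in opposite order (False is the most frequent value in idA, True in idB), while in Test 2 both groups list True first. That lines up with Test 1 returning a Series and Test 2 returning a DataFrame.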
Expected Output
Both calls should return the same type: either <class 'pandas.core.frame.DataFrame'> or <class 'pandas.core.series.Series'>. Since value_counts() returns a Series for each group, I would expect the default output of the groupby operation to be a DataFrame.
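For anyone who needs a stable type in the meantime, one option is to skip apply() and call value_counts() on the grouped column directly, then reshape explicitly. This is a minimal sketch of a workaround, not a fix for the apply() behaviour; the helper name count_bools_per_id is hypothetical.

```python
import pandas as pd

ids = ["idA"] * 6 + ["idB"] * 7
test1 = [True, False, False, False, False, False,
         True, True, False, False, False, True, True]
test2 = [True, True, True, True, False, False,
         True, True, False, False, False, True, True]


def count_bools_per_id(bool_values):
    """Return (long Series, wide DataFrame) of True/False counts per id (hypothetical helper)."""
    frame = pd.DataFrame({"id": ids, "bool_value": bool_values})
    # SeriesGroupBy.value_counts gives a Series indexed by (id, bool_value)
    counts = frame.groupby("id")["bool_value"].value_counts()
    # unstack moves the innermost index level into columns;
    # fill_value=0 covers a label that is absent from a group
    table = counts.unstack(fill_value=0)
    return counts, table


counts1, table1 = count_bools_per_id(test1)
counts2, table2 = count_bools_per_id(test2)
print(type(counts1), type(counts2))
print(type(table1), type(table2))
```

With this approach the long form is a Series and the wide form is a DataFrame for both test inputs, so downstream code does not need to branch on the result type.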
Output of pd.show_versions()
INSTALLED VERSIONS
commit : f00ed8f
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.89+
Version : #1 SMP Sat Feb 13 19:45:14 PST 2021
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.2
setuptools : 49.6.0.post20200814
Cython : 0.29.21
pytest : 6.2.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.20.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.13
tables : None
tabulate : 0.8.7
xarray : None
xlrd : None
xlwt : None
numba : 0.50.1