Description
Code Sample, a copy-pastable example if possible
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', None, 'a', 'b', None], 'B': range(6)})
df.groupby('A').sum()    # output has two rows
df.groupby('A').first()  # output has two rows
df.groupby('A').head(1)  # output has three rows, one for the null group
df.groupby('A').nth(1)   # output has three rows, but one of them has the wrong value in the 'A' column
Problem description
My understanding is that when grouping on a column, pandas excludes null values (link). However, calling the head method on a GroupBy object returns a row for the null group. While I would generally support keeping null values across the board, this inconsistency contradicts the documentation. Looking to see whether similar methods behave the same way, I discovered that nth has an even worse bug: it also returns three rows, and one of them has the wrong value in the 'A' column.
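To make the inconsistency easier to check across pandas versions, here is a minimal sketch that counts the rows each method returns for the same frame. The names `row_counts` and `null_rows_in_head` are my own and not part of pandas, and I am deliberately not asserting what `head` or `nth` return, since that may differ between releases.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', None, 'a', 'b', None], 'B': range(6)})
grouped = df.groupby('A')

# Illustrative summary (not a pandas API): number of rows each method returns.
row_counts = {
    'sum': len(grouped.sum()),
    'first': len(grouped.first()),
    'head(1)': len(grouped.head(1)),
    'nth(1)': len(grouped.nth(1)),
}

# head() keeps the original columns, so rows from the null group can be
# spotted by looking for a missing value in 'A'.
null_rows_in_head = int(grouped.head(1)['A'].isna().sum())

print(row_counts)
print('rows from the null group in head(1):', null_rows_in_head)
```

If null keys were excluded consistently, I would expect every count to be 2 and `null_rows_in_head` to be 0.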
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.0
pytest: 3.4.2
pip: 10.0.1
setuptools: 38.4.0
Cython: None
numpy: 1.14.3
scipy: 1.1.0
pyarrow: 0.9.0
xarray: None
IPython: 6.2.1
sphinx: 1.7.3
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: 0.4.0
matplotlib: 2.1.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None