groupby first can return values not in group #9300

Closed
@meloncholy

I'm not at all familiar with pandas internals :) but I don't think this is expected behaviour.

from pandas import DataFrame

f3 = DataFrame(
    [
        [95820843523155097, 1, 'director', 1],
        [95820843523155098, 1, 'director', 2],
        [95820843523155099, 1, 'director', 3],
        [95820843523155100, 2, 'director', 4],
        [95820843523155101, 2, 'computer system management (director)', 5],
        [95820843523155102, 3, 'company director', 6],
        [95820843523155103, 3, 'office manager', 7]
    ],
    columns=['uid', 'cid', 'role', 'idx']
)

f3.dtypes
uid      int64
cid      int64
role    object
idx      int64
dtype: object

Observed behaviour

f3.groupby('cid').first()
                   uid              role  idx
cid
1    95820843523155104          director    1
2    95820843523155104          director    4
3    95820843523155104  company director    6

Every uid in the result is 95820843523155104, a value that does not appear anywhere in the original data. (This isn't always the case in larger sets; sometimes a returned uid does coincide with one from the original data.)
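The repeated value is consistent with the uid column passing through a 64-bit float at some point inside first(). That is an assumption, not something this report confirms. A float64 has a 53-bit significand, so integers of this magnitude (between 2**56 and 2**57) can only be represented in steps of 16. The minimal sketch below uses plain Python floats, which are IEEE 754 doubles, to show where the three expected uids land after such a round trip:

```python
# Expected first uid of each group, taken from the report above.
expected_uids = [95820843523155097, 95820843523155100, 95820843523155102]

# Assumption: the values are converted to float64 and back somewhere
# inside the aggregation. Python's float is an IEEE 754 double, so
# int(float(x)) reproduces that round trip without needing pandas.
round_tripped = [int(float(uid)) for uid in expected_uids]

for before, after in zip(expected_uids, round_tripped):
    print(before, "->", after)
```

All three inputs round to the same multiple of 16, which matches the uid shown in the observed output.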

Expected behaviour

f3.groupby('cid').apply(lambda g: g[:1])
                     uid              role  idx
cid
1   0  95820843523155097          director    1
2   3  95820843523155100          director    4
3   5  95820843523155102  company director    6

This is what I expected to happen (i.e. the uid matches the rest of the row).
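A possible workaround, sketched under the assumption that the goal is simply the first row of each group, is to select whole rows with head(1) instead of aggregating column by column. The name first_rows is introduced here for illustration only. Note that the two are not fully equivalent: first() returns the first non-null value in each column, whereas head(1) returns the first row as it stands.

```python
from pandas import DataFrame

f3 = DataFrame(
    [
        [95820843523155097, 1, 'director', 1],
        [95820843523155098, 1, 'director', 2],
        [95820843523155099, 1, 'director', 3],
        [95820843523155100, 2, 'director', 4],
        [95820843523155101, 2, 'computer system management (director)', 5],
        [95820843523155102, 3, 'company director', 6],
        [95820843523155103, 3, 'office manager', 7],
    ],
    columns=['uid', 'cid', 'role', 'idx'],
)

# head(1) picks the first row of every group without aggregating,
# so each uid stays paired with the rest of its original row.
first_rows = f3.groupby('cid').head(1).set_index('cid')
print(first_rows)
```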

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.2.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.15.2
nose: 1.3.4
Cython: 0.21.1
numpy: 1.8.2
scipy: 0.14.0
statsmodels: 0.6.1
IPython: 2.3.1
sphinx: None
patsy: 0.3.0
dateutil: 2.1
pytz: 2014.9
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.0
openpyxl: 2.0.2
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.6.4
lxml: 3.4.1
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.8
pymysql: None
psycopg2: 2.5.4 (dt dec pq3 ext)
