Taking first row from each group in groupby sometimes strips tzinfo #10668

Closed
@louispotok

xref #12898 (same fix)

(cf. http://stackoverflow.com/questions/31617084/how-to-have-groupby-first-not-remove-timezone-info-from-datetime-columns)
Take a DataFrame with a column of tz-aware datetime.datetime objects, group it by a different column, and return the first row from each group. Some ways of doing this leave the datetimes as they are; at least two others convert them to tz-naive pandas Timestamp objects in UTC.

In [1]: import pandas as pd

In [2]: import datetime

In [3]: import pytz

In [4]: dates = [datetime.datetime(2015,1,i,tzinfo=pytz.timezone('US/Pacific')) for i in range(1,5)]

In [5]: df = pd.DataFrame({'A': ['a','b']*2,'B': dates})

In [6]: df
Out[6]: 
   A                          B
0  a  2015-01-01 00:00:00-08:00
1  b  2015-01-02 00:00:00-08:00
2  a  2015-01-03 00:00:00-08:00
3  b  2015-01-04 00:00:00-08:00

In [7]: grouped = df.groupby('A') 

In [8]: grouped.nth(0) #B stays a datetime.datetime with timezone info
Out[8]: 
                           B
A                           
a  2015-01-01 00:00:00-08:00
b  2015-01-02 00:00:00-08:00

In [9]: grouped.head(1) #B stays a datetime.datetime with timezone info
Out[9]: 
                           B
0  2015-01-01 00:00:00-08:00
1  2015-01-02 00:00:00-08:00

In [10]: grouped.first() #B is naive pd.Timestamp in UTC
Out[10]: 
                    B
A                    
a 2015-01-01 08:00:00
b 2015-01-02 08:00:00

grouped.apply(lambda x: x.iloc[0]) apparently gives the same tz-naive result as .first().
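
Since head(1) keeps the timezone in the session above, one possible workaround is to take the first rows with head(1) and then index them by the group key, which gives the same shape as .first(). This is only a sketch under assumptions that are not in the report: the column is built with pd.date_range so that it has pandas' own tz-aware dtype (rather than datetime.datetime objects carrying a pytz tzinfo), and head(1) is assumed to return the grouping column along with the rest of the row, as it does in recent pandas versions.

```python
import pandas as pd

# Assumption: a tz-aware column built by pandas itself, not the
# datetime.datetime + pytz objects used in the report above.
dates = pd.date_range("2015-01-01", periods=4, freq="D", tz="US/Pacific")
df = pd.DataFrame({"A": ["a", "b"] * 2, "B": dates})

# head(1) returns the first original row of each group unchanged,
# so the timezone on B is preserved. Indexing by A afterwards
# mimics the layout that grouped.first() would produce.
first_rows = df.groupby("A").head(1).set_index("A")

print(first_rows)
print(first_rows["B"].dtype)

# For comparison only: whether first() keeps the timezone depends on
# the installed pandas version, so inspect the dtype rather than assume.
first_via_agg = df.groupby("A").first()
print(first_via_agg["B"].dtype)
```

Comparing the two printed dtypes is a quick way to check whether the installed pandas version is affected.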

Labels: Bug, Duplicate Report, Groupby, Testing, Timezones
