Closed
Description
xref #12898 (same fix)
(cf. http://stackoverflow.com/questions/31617084/how-to-have-groupby-first-not-remove-timezone-info-from-datetime-columns)
Take a DataFrame with a column of tz-aware datetime.datetime objects, group it by a different column, and return the first row from each group. Some ways of doing this leave the datetimes as they are, but at least two convert them to tz-naive pandas Timestamp objects.
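To make the two representations concrete, here is a stdlib-only sketch of the same instant as a tz-aware datetime and as a naive value in UTC. The fixed UTC-08:00 offset is an assumption standing in for US/Pacific in January; the names `pst`, `aware` and `naive_utc` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-08:00 offset stands in for US/Pacific in January (PST).
pst = timezone(timedelta(hours=-8))

# A tz-aware value: it carries its own offset.
aware = datetime(2015, 1, 1, tzinfo=pst)

# The same instant converted to UTC, then stripped of its tzinfo.
naive_utc = aware.astimezone(timezone.utc).replace(tzinfo=None)

print(aware.isoformat())      # 2015-01-01T00:00:00-08:00
print(naive_utc.isoformat())  # 2015-01-01T08:00:00
```

The naive value reads eight hours later on the clock and no longer records which zone it came from.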
In [1]: import pandas as pd
In [2]: import datetime
In [3]: import pytz
In [4]: dates = [datetime.datetime(2015,1,i,tzinfo=pytz.timezone('US/Pacific')) for i in range(1,5)]
In [5]: df = pd.DataFrame({'A': ['a','b']*2,'B': dates})
In [6]: df
Out[6]:
A B
0 a 2015-01-01 00:00:00-08:00
1 b 2015-01-02 00:00:00-08:00
2 a 2015-01-03 00:00:00-08:00
3 b 2015-01-04 00:00:00-08:00
In [7]: grouped = df.groupby('A')
In [8]: grouped.nth(0) #B stays a datetime.datetime with timezone info
Out[8]:
B
A
a 2015-01-01 00:00:00-08:00
b 2015-01-02 00:00:00-08:00
In [9]: grouped.head(1) #B stays a datetime.datetime with timezone
Out[9]:
B
0 2015-01-01 00:00:00-08:00
1 2015-01-02 00:00:00-08:00
In [10]: grouped.first() #B is a naive pd.Timestamp in UTC
Out[10]:
B
A
a 2015-01-01 08:00:00
b 2015-01-02 08:00:00
And apparently grouped.apply(lambda x: x.iloc[0]) does the same as .first().
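One possible workaround, offered as a sketch rather than as the fix for this issue, is to take the first row per group without going through an aggregation at all, for example with `drop_duplicates`. The example below assumes a fixed UTC-08:00 offset from the standard library in place of `pytz.timezone('US/Pacific')`; `pst` and `first_rows` are illustrative names.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

# Assumption: a fixed UTC-08:00 offset replaces pytz's US/Pacific zone here.
pst = timezone(timedelta(hours=-8))
dates = [datetime(2015, 1, i, tzinfo=pst) for i in range(1, 5)]
df = pd.DataFrame({'A': ['a', 'b'] * 2, 'B': dates})

# Keep the first row seen for each value of A, then index by A
# so the result has the same shape as grouped.first().
first_rows = df.drop_duplicates(subset='A', keep='first').set_index('A')

print(first_rows)
print(first_rows['B'].iloc[0].utcoffset())
```

Note that this is not a drop-in replacement: `.first()` returns the first non-null value per column and sorts by group key by default, whereas `drop_duplicates` keeps whole rows in their original order.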