Description
I ran into an issue where the behavior of .apply() changed from 0.16 to 0.17, causing different results on tz-aware data. Extracting the hour of day from datetimes gives different results for Series(x).apply(func) than for calling func(x) on each element directly. Below is a minimal example of the issue in 0.17. The behavior seems to be the same in 0.18, but different (though still not matching the map result) on master, also shown below.
On 0.17.1 and 0.18.0:
>>> import pandas as pd
>>> def hour_of_day(dt):
...     return dt.hour
...
>>> dt = pd.to_datetime(1462068217, unit='s')
>>> dt_localized = dt.tz_localize('UTC').tz_convert('US/Pacific')
>>> dt_list = [dt, dt_localized]
>>> apply_series = pd.Series(dt_list).apply(hour_of_day)
>>> map_series = pd.Series(map(hour_of_day, dt_list))
>>> print dt_list
[Timestamp('2016-05-01 02:03:37'), Timestamp('2016-04-30 19:03:37-0700', tz='US/Pacific')]
>>> print apply_series
0    2
1    2
dtype: int64
>>> print map_series
0     2
1    19
dtype: int64
>>> print apply_series - map_series
0     0
1   -17
dtype: int64
>>> pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.17.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.7.0
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.0
statsmodels: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.5
lxml: 3.6.0
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: 0.6.7.None
psycopg2: None
Jinja2: None
On 0.16.2 (what I expected):
# Same setup as above
>>> print apply_series
0     2
1    19
dtype: int64
>>> print map_series
0     2
1    19
dtype: int64
>>> print apply_series - map_series
0    0
1    0
dtype: int64
>>> pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.16.2
nose: None
Cython: None
numpy: 1.10.4
scipy: None
statsmodels: None
IPython: 4.2.0
sphinx: None
patsy: None
dateutil: 2.5.2
pytz: 2016.3
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
On master:
# Same setup as above
>>> print apply_series
0    19
1    19
dtype: int32
>>> print map_series
0     2
1    19
dtype: int64
>>> print apply_series - map_series
0    17
1     0
dtype: int64
>>> pd.show_versions()
INSTALLED VERSIONS
------------------
commit: 05e734ab171be0fda838c6b12839c38fa588da2c
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.0+203.g05e734a
nose: None
pip: 8.1.1
setuptools: 20.7.0
Cython: None
numpy: 1.10.4
scipy: None
statsmodels: None
xarray: None
IPython: 4.2.0
sphinx: None
patsy: None
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
Expected Output
I would expect the output to be [2, 19], as in 0.16, and matching map(f, data).
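
For anyone trying to narrow this down, here is a minimal sketch of a check that keeps the mixed naive/tz-aware values as plain objects before calling .apply(). It assumes (this is my guess, not something confirmed in this report) that the difference comes from how the Series constructor infers a datetime dtype for the mixed list. Passing dtype=object is only a diagnostic workaround, and the names apply_hours and map_hours are made up for this example.

```python
import pandas as pd


def hour_of_day(dt):
    return dt.hour


dt = pd.to_datetime(1462068217, unit='s')
dt_localized = dt.tz_localize('UTC').tz_convert('US/Pacific')
dt_list = [dt, dt_localized]

# Assumption: forcing object dtype stops the constructor from coercing the
# naive and tz-aware Timestamps into a single datetime dtype.
obj_series = pd.Series(dt_list, dtype=object)

# .apply() now receives each original Timestamp unchanged.
apply_hours = obj_series.apply(hour_of_day).tolist()

# Reference result: call the function on each element directly.
map_hours = [hour_of_day(x) for x in dt_list]

print(obj_series.dtype)
print(apply_hours, map_hours)
```

If the two lists agree with dtype=object but not with the default constructor, that would point at dtype inference rather than at .apply() itself.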