Description
Code Sample, a copy-pastable example if possible
```python
from datetime import datetime

import pandas as pd

foo = pd.DataFrame.from_records(
    [
        (datetime(2016, 1, 1), 'red', 'dark', 1, '8'),
        (datetime(2015, 1, 1), 'green', 'stormy', 2, '9'),
        (datetime(2014, 1, 1), 'blue', 'bright', 3, '10'),
        (datetime(2013, 1, 1), 'blue', 'calm', 4, 'potato'),
    ],
    columns=['observation', 'color', 'mood', 'intensity', 'score'])

# The dtype of 'score' in the result changes depending on which other
# columns are passed through the groupby
print(pd.concat(
    [
        foo.dtypes,
        foo.loc[:, ['observation', 'color', 'mood', 'intensity', 'score']]
           .groupby('color').apply(lambda g: g.iloc[0]).dtypes,
        foo.loc[:, ['color', 'mood', 'intensity', 'score']]
           .groupby('color').apply(lambda g: g.iloc[0]).dtypes,
    ],
    axis=1,
    keys=['original DF', 'w/ datetime', 'w/o datetime']))
```
Problem description
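When `groupby(...).apply(lambda g: g.iloc[0])` is run on a frame that includes a datetime column, the per-group rows are reassembled into a DataFrame in which object columns are cast to numeric where possible: here 'score' becomes int64. When the same frame is grouped without the datetime column, 'score' stays object.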
The presence of a datetime column should not affect unrelated columns. Doing no implicit type coercion seems (to me) like the safest option, especially in a language where "1" != 1. Either way, whether type coercion is applied to a column 'A' should not depend on the dtypes of the other columns.
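One detail that may help explain why any re-inference happens at all (an observation about the inputs, not a claim about where inside pandas the cast is done): each `g.iloc[0]` taken from a mixed-dtype frame is a single Series, and a Series holds only one dtype, so the row comes back as `object` and the original per-column dtypes are gone before the rows are stitched back together. The sketch below reuses the frame from the report, importing `datetime` directly instead of using `pd.datetime`.

```python
from datetime import datetime

import pandas as pd

# Same frame as in the code sample above.
foo = pd.DataFrame.from_records(
    [
        (datetime(2016, 1, 1), 'red', 'dark', 1, '8'),
        (datetime(2015, 1, 1), 'green', 'stormy', 2, '9'),
        (datetime(2014, 1, 1), 'blue', 'bright', 3, '10'),
        (datetime(2013, 1, 1), 'blue', 'calm', 4, 'potato'),
    ],
    columns=['observation', 'color', 'mood', 'intensity', 'score'])

# A single row of a mixed-dtype frame is one Series with one dtype,
# so the datetime, int and string values all end up in an object Series.
first_row = foo.iloc[0]
print(first_row.dtype)     # object

# The string in 'score' is still a string at this point; any later
# conversion to int64 has to come from re-inferring the reassembled frame.
print(repr(first_row['score']))  # '8'
```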
Issue #14423 reports a different problem with the same code.
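If all that is needed is the first row of each group, a possible workaround (a sketch under that assumption, not a fix for the underlying behaviour) is `GroupBy.head(1)`. It selects rows from the original frame instead of building one Series per group, so every column keeps its original dtype regardless of whether a datetime column is present.

```python
from datetime import datetime

import pandas as pd

foo = pd.DataFrame.from_records(
    [
        (datetime(2016, 1, 1), 'red', 'dark', 1, '8'),
        (datetime(2015, 1, 1), 'green', 'stormy', 2, '9'),
        (datetime(2014, 1, 1), 'blue', 'bright', 3, '10'),
        (datetime(2013, 1, 1), 'blue', 'calm', 4, 'potato'),
    ],
    columns=['observation', 'color', 'mood', 'intensity', 'score'])

# head(1) filters the original rows (first row per 'color'), so no
# per-row object Series is created and no dtype re-inference is needed.
first_per_color = foo.groupby('color').head(1)

print(first_per_color)
print(first_per_color.dtypes)  # same dtypes as foo.dtypes
```

Unlike the `apply` version, the result keeps the original row index and row order rather than being indexed by the sorted group keys; `first_per_color.set_index('color').sort_index()` gives a comparable layout if that is wanted.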
Expected Output
Current:

```
                original DF     w/ datetime  w/o datetime
color                object          object        object
intensity             int64           int64         int64
mood                 object          object        object
observation  datetime64[ns]  datetime64[ns]           NaN
score                object           int64        object
```

Expected (no coercion in either case):

```
                original DF     w/ datetime  w/o datetime
color                object          object        object
intensity             int64           int64         int64
mood                 object          object        object
observation  datetime64[ns]  datetime64[ns]           NaN
score                object          object        object
```

or (the same coercion in both cases):

```
                original DF     w/ datetime  w/o datetime
color                object          object        object
intensity             int64           int64         int64
mood                 object          object        object
observation  datetime64[ns]  datetime64[ns]           NaN
score                object           int64         int64
```
Output of pd.show_versions()
pandas: 0.19.1
nose: 1.3.1
pip: 1.5.4
setuptools: 3.3
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.3
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: 3.3.3
bs4: 4.2.1
html5lib: 0.999
httplib2: 0.8
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None