Description
It seems that the columns used in `sum` when applied to a one-row DataFrame depend on the values in the row instead of just the dtypes. Observe:
import pandas as pd
import numpy as np
# Frame with some non-numeric dtypes
df = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.Timestamp('2000-01-01')]})
# Only change here is that `d` is `NaT`
df2 = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.NaT]})
# This is just the first one twice
df3 = pd.concat([df, df])
# I'd expect all 3 to use the same columns in the reduction
df_sum = df.sum()
df2_sum = df2.sum()
df3_sum = df3.sum()
Loading that in an IPython session:
In [1]: df_sum
Out[1]:
a 1
b 1.1
c foo
d 2000-01-01 00:00:00
dtype: object
In [2]: df2_sum
Out[2]:
a 1.0
b 1.1
dtype: float64
In [3]: df3_sum
Out[3]:
a 2.0
b 2.2
dtype: float64
In [4]: pd.__version__
Out[4]: u'0.18.1'
In [5]: np.__version__
Out[5]: '1.11.1'
I'd expect all 3 to use only the columns `['a', 'b']`, as these are the only numeric columns. Strangely, `_get_numeric_data` does return just `['a', 'b']` in all cases, so it's not that.
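For comparison, here is a minimal sketch of the behavior I'd expect, assuming that passing `numeric_only=True` explicitly restricts the reduction to the numeric columns regardless of the values in the row. It uses the public `select_dtypes(include='number')` in place of the private `_get_numeric_data`; the `frames`, `numeric_cols`, and `sums` names are only for illustration. This is a possible workaround, not an explanation of the default behavior shown above.

```python
import pandas as pd

# Same three frames as in the report above
df = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.Timestamp('2000-01-01')]})
df2 = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.NaT]})
df3 = pd.concat([df, df])

frames = {'df': df, 'df2': df2, 'df3': df3}

# Numeric columns according to the dtypes alone (public API)
numeric_cols = {
    name: list(frame.select_dtypes(include='number').columns)
    for name, frame in frames.items()
}

# Ask for a numeric-only reduction explicitly instead of relying on the default
sums = {name: frame.sum(numeric_only=True) for name, frame in frames.items()}

for name in frames:
    print(name, numeric_cols[name], list(sums[name].index))
```

If the assumption holds, each frame reports the same columns in both lists, which is what I'd expect the default `sum()` to do as well.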