
BUG: Columns used in sum on 1 row dataframe depend on values instead of dtype #13912

Closed
@jcrist

Description


When `sum` is applied to a one-row DataFrame, the set of columns it reduces over seems to depend on the values in that row rather than only on the column dtypes. Observe:

```python
import pandas as pd
import numpy as np

# Frame with some non-numeric dtypes
df = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.Timestamp('2000-01-01')]})
# Only change here is that `d` is `NaT`
df2 = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.NaT]})
# This is just the first one twice
df3 = pd.concat([df, df])

# I'd expect all 3 to use the same columns in the reduction
df_sum = df.sum()
df2_sum = df2.sum()
df3_sum = df3.sum()
```

Running that in an IPython session:

```
In [1]: df_sum
Out[1]:
a                      1
b                    1.1
c                    foo
d    2000-01-01 00:00:00
dtype: object

In [2]: df2_sum
Out[2]:
a    1.0
b    1.1
dtype: float64

In [3]: df3_sum
Out[3]:
a    2.0
b    2.2
dtype: float64

In [4]: pd.__version__
Out[4]: u'0.18.1'

In [5]: np.__version__
Out[5]: '1.11.1'
```

I'd expect all three to use only the columns `['a', 'b']`, since these are the only numeric columns. Strangely, `_get_numeric_data` does return just `['a', 'b']` in all cases, so the column selection there is not the cause.
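To make the expectation concrete, here is a minimal sketch that selects columns purely by dtype before reducing, so the result cannot depend on the row values. It assumes a pandas version where `select_dtypes` is available; `numeric_sum` is a hypothetical helper name used only for illustration, not part of pandas. It also checks that `df` and `df2` have identical dtypes, which is why a value-dependent result looks like a bug.

```python
import numpy as np
import pandas as pd


def numeric_sum(frame):
    # Hypothetical helper: choose columns by dtype only (np.number covers
    # int and float, but not object or datetime64), then sum them.
    return frame.select_dtypes(include=[np.number]).sum()


df = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.Timestamp('2000-01-01')]})
df2 = pd.DataFrame({'a': [1], 'b': [1.1], 'c': ['foo'], 'd': [pd.NaT]})
df3 = pd.concat([df, df])

# Same dtypes in df and df2: the only difference is a value (NaT vs a Timestamp).
print(df.dtypes.equals(df2.dtypes))

# Dtype-based selection yields the same columns for all three frames.
for frame in (df, df2, df3):
    print(list(numeric_sum(frame).index))
```

Passing `numeric_only=True` to `sum` is another way to request a dtype-based column selection explicitly.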
