Bug? Replacing NaN values based on a condition. #8669

Closed
@ozhogin

I have a DataFrame with two columns, A and B. Wherever a value in B is larger than the corresponding value in A, I want to replace it with the value from A. I used to do this with df.B[df.B > df.A] = df.A, but after a recent pandas upgrade this chained assignment started giving a SettingWithCopyWarning. The official documentation recommends using .loc instead.

So I switched to df.loc[df.B > df.A, 'B'] = df.A, and it all works fine unless every value in column B is NaN. Then something weird happens:

In [1]: df = pd.DataFrame({'A': [1, 2, 3],'B': [np.NaN, np.NaN, np.NaN]})

In [2]: df
Out[2]: 
   A   B
0  1 NaN
1  2 NaN
2  3 NaN

In [3]: df.loc[df.B > df.A, 'B'] = df.A

In [4]: df
Out[4]: 
   A                    B
0  1 -9223372036854775808
1  2 -9223372036854775808
2  3 -9223372036854775808
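
For reference, the number that shows up in column B is not arbitrary: it is the smallest value a signed 64-bit integer can hold. The short check below confirms that. It hints that B may have been converted from float64 to an integer dtype somewhere during the assignment, but that is a guess and not something confirmed here.

```python
import numpy as np

# Value that appeared in column B after the .loc assignment.
observed = -9223372036854775808

# Smallest representable signed 64-bit integer.
int64_min = np.iinfo(np.int64).min

print(observed == int64_min)   # True
print(observed == -(2 ** 63))  # True
```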

Now, if at least one of B's elements satisfies the condition (is larger than A), it all works fine:

In [1]: df = pd.DataFrame({'A': [1, 2, 3],'B': [np.NaN, 4, np.NaN]})

In [2]: df
Out[2]: 
   A   B
0  1 NaN
1  2   4
2  3 NaN

In [3]: df.loc[df.B > df.A, 'B'] = df.A

In [4]: df
Out[4]: 
   A   B
0  1 NaN
1  2   2
2  3 NaN

But if none of B's elements satisfy the condition, all NaNs get replaced with -9223372036854775808:

In [1]: df = pd.DataFrame({'A':[1,2,3],'B':[np.NaN,1,np.NaN]})

In [2]: df
Out[2]: 
   A   B
0  1 NaN
1  2   1
2  3 NaN

In [3]: df.loc[df.B > df.A, 'B'] = df.A

In [4]: df
Out[4]: 
   A                    B
0  1 -9223372036854775808
1  2                    1
2  3 -9223372036854775808
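
One way to narrow this down is to compare the dtype of column B before and after the assignment. The sketch below does that on a copy of the frame; the helper name `b_dtype_before_after` is made up for illustration, and the output will depend on the pandas version in use, so no expected result is shown.

```python
import numpy as np
import pandas as pd


def b_dtype_before_after(df):
    """Run the .loc assignment on a copy and return B's dtype before and after."""
    work = df.copy()
    before = work["B"].dtype
    # Same assignment as in the report: cap B at A wherever B > A.
    work.loc[work["B"] > work["A"], "B"] = work["A"]
    after = work["B"].dtype
    return before, after


# Frame from the last example: no element of B is larger than A.
df = pd.DataFrame({"A": [1, 2, 3], "B": [np.nan, 1, np.nan]})
before, after = b_dtype_before_after(df)
print("B dtype before:", before)
print("B dtype after: ", after)
```

If the two dtypes differ, the assignment changed the column's type even though no row matched the condition.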

Am I doing something wrong, or is this a bug?

pandas: 0.15.0
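
A possible way to sidestep the `.loc` assignment is to express the same replacement with `Series.mask`, which swaps in values from A only where the condition is True and leaves everything else, including NaN, alone. This is a minimal sketch written against a recent pandas release and not checked on 0.15.0; the helper name `cap_b_at_a` is hypothetical.

```python
import numpy as np
import pandas as pd


def cap_b_at_a(df):
    """Return a copy of df where B is replaced by A wherever B > A."""
    out = df.copy()
    # NaN > x evaluates to False, so rows where B is NaN are left untouched.
    out["B"] = out["B"].mask(out["B"] > out["A"], out["A"])
    return out


# The three frames from the report.
all_nan = pd.DataFrame({"A": [1, 2, 3], "B": [np.nan, np.nan, np.nan]})
one_hit = pd.DataFrame({"A": [1, 2, 3], "B": [np.nan, 4, np.nan]})
no_hit = pd.DataFrame({"A": [1, 2, 3], "B": [np.nan, 1, np.nan]})

for frame in (all_nan, one_hit, no_hit):
    print(cap_b_at_a(frame))
```

Because the result is built from the float column B rather than assigned into it row by row, B should stay float64 in all three cases.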

Labels: Bug; Dtype Conversions (unexpected or buggy dtype conversions); Indexing (related to indexing on series/frames, not to indexes themselves)
