SparseDataFrame.fillna() doesn't fill all NaNs #16112

Closed
@kernc

Description

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd
>>> naneye = np.eye(3)
>>> naneye[1:, 0] = np.nan
>>> naneye
array([[  1.,   0.,   0.],
       [ nan,   1.,   0.],
       [ nan,   0.,   1.]])

>>> import scipy.sparse
>>> spm = scipy.sparse.csr_matrix(naneye)
>>> spm
<3x3 sparse matrix of type '<class 'numpy.float64'>'
	with 5 stored elements in Compressed Sparse Row format>

>>> spm.nnz  # diag + 2 NaN
5

>>> sdf = pd.SparseDataFrame(spm, default_fill_value=0)
>>> sdf
     0    1    2
0  1.0  0.0  0.0
1  NaN  1.0  0.0
2  NaN  0.0  1.0

This is fine, since I want the implied zeros¹ of scipy matrices to remain zeros and have explicit missing data remain missing.

¹ spm.mean(0) == matrix([[ nan, 0.3333, 0.3333]])
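
To show where the footnote's numbers come from, here is a small pure-Python sketch of a column mean over a sparse layout. It is not scipy's implementation. The `stored` dict and the `column_means` helper are hypothetical names used only for illustration. Entries that are not stored count as zeros, while the stored NaNs propagate into the sum.

```python
import math

nan = float("nan")

# Hypothetical stand-in for the 5 stored entries of the 3x3 example:
# (row, col) -> value. Everything not listed is an implied zero.
stored = {
    (0, 0): 1.0,
    (1, 0): nan,
    (1, 1): 1.0,
    (2, 0): nan,
    (2, 2): 1.0,
}


def column_means(stored, shape):
    """Mean of each column, treating unstored entries as zeros."""
    n_rows, n_cols = shape
    sums = [0.0] * n_cols
    for (_, col), value in stored.items():
        sums[col] += value  # a stored NaN makes the whole column sum NaN
    return [s / n_rows for s in sums]


means = column_means(stored, (3, 3))
print(means)
```

The first column comes out as NaN because its missing values are stored explicitly. The other two columns average one stored 1.0 with two implied zeros.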

However, if I don't specify a fill value:

>>> sdf = pd.SparseDataFrame(spm)
>>> sdf
     0    1    2
0  1.0  NaN  NaN
1  NaN  1.0  NaN
2  NaN  NaN  1.0

>>> sdf.fillna(-1)  # huh??
     0    1    2
0  1.0 -1.0 -1.0
1  NaN  1.0 -1.0
2  NaN -1.0  1.0

>>> sdf.fillna(-1).fillna(-2)  # at least there's a pattern to it :)
     0    1    2
0  1.0 -1.0 -1.0
1 -2.0  1.0 -1.0
2 -2.0 -1.0  1.0

>>> sdf.fillna(-1).__class__
pandas.core.sparse.frame.SparseDataFrame

Problem description

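On sdf.fillna(-1), I'd expect all NaNs to be filled. Instead, only the implied (unstored) entries are filled, while the two NaNs stored explicitly in the scipy matrix stay NaN until a second fillna() call. I don't know whether it's a bug or a feature, but it certainly is strange.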
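
The expected behaviour can be sketched in plain Python. This is only a toy model of a sparse column (explicitly stored values plus one fill value for everything else), not how pandas implements it. The helpers `fillna_sparse` and `to_dense` are hypothetical. The point is that a single fill should replace NaN in both places: the stored values and the fill value.

```python
import math

nan = float("nan")


def fillna_sparse(stored, fill_value, value):
    """Replace NaN in the stored values AND in the fill value."""
    new_stored = {
        pos: (value if math.isnan(v) else v) for pos, v in stored.items()
    }
    new_fill = value if math.isnan(fill_value) else fill_value
    return new_stored, new_fill


def to_dense(stored, fill_value, n):
    """Expand the toy sparse column into a plain list of length n."""
    return [stored.get(i, fill_value) for i in range(n)]


# Column 0 of the example: all three entries are stored (1, NaN, NaN).
col0 = {0: 1.0, 1: nan, 2: nan}
# Column 1: only the diagonal entry is stored; the rest is the NaN fill value.
col1 = {1: 1.0}

filled0 = to_dense(*fillna_sparse(col0, nan, -1.0), 3)
filled1 = to_dense(*fillna_sparse(col1, nan, -1.0), 3)
print(filled0)  # [1.0, -1.0, -1.0]
print(filled1)  # [-1.0, 1.0, -1.0]
```

In this model one pass is enough, whereas the output above needs fillna(-1).fillna(-2) to reach every NaN.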

xref: #15533

Output of pd.show_versions()

pandas v0.20.0rc1 844013b
