Description
Code Sample, a copy-pastable example if possible
>>> import numpy as np
>>> import pandas as pd
>>> naneye = np.eye(3)
>>> naneye[1:, 0] = np.nan
>>> naneye
array([[  1.,   0.,   0.],
       [ nan,   1.,   0.],
       [ nan,   0.,   1.]])
>>> import scipy.sparse
>>> spm = scipy.sparse.csr_matrix(naneye)
>>> spm
<3x3 sparse matrix of type '<class 'numpy.float64'>'
with 5 stored elements in Compressed Sparse Row format>
>>> spm.nnz # diag + 2 NaN
5
>>> sdf = pd.SparseDataFrame(spm, default_fill_value=0)
>>> sdf
     0    1    2
0  1.0  0.0  0.0
1  NaN  1.0  0.0
2  NaN  0.0  1.0
This is fine, since I want the implied zeros¹ of scipy matrices to remain zeros and have explicit missing data remain missing.
¹ spm.mean(0) == matrix([[ nan, 0.3333, 0.3333]])
However, if I don't specify a fill value:
>>> sdf = pd.SparseDataFrame(spm)
>>> sdf
     0    1    2
0  1.0  NaN  NaN
1  NaN  1.0  NaN
2  NaN  NaN  1.0
>>> sdf.fillna(-1) # huh??
     0    1    2
0  1.0 -1.0 -1.0
1  NaN  1.0 -1.0
2  NaN -1.0  1.0
>>> sdf.fillna(-1).fillna(-2) # at least there's a pattern to it :)
     0    1    2
0  1.0 -1.0 -1.0
1 -2.0  1.0 -1.0
2 -2.0 -1.0  1.0
>>> sdf.fillna(-1).__class__
pandas.core.sparse.frame.SparseDataFrame
Problem description
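On sdf.fillna(-1), I'd expect all NaNs to be filled, both the NaNs explicitly stored in the scipy matrix and the implied entries that show up as NaN because of the default fill value. Instead, the first call fills only the implied entries, and the stored NaNs are only filled by a second call. I don't know whether it's a bug or a feature, but it is certainly strange.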
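To make the expectation concrete, here is a minimal pure-Python sketch of the semantics I would expect. ToySparseColumn and its attributes are hypothetical names made up for illustration; this is not how pandas implements SparseDataFrame, only a toy model of a column that has explicitly stored values plus a fill value for everything else. The point is that a single fillna call should replace NaN in both places.

```python
import math


class ToySparseColumn:
    """Hypothetical toy model of a sparse column (not pandas' implementation)."""

    def __init__(self, length, stored, fill_value=math.nan):
        self.length = length
        self.stored = dict(stored)    # position -> explicitly stored value
        self.fill_value = fill_value  # value implied at every other position

    def to_dense(self):
        return [self.stored.get(i, self.fill_value) for i in range(self.length)]

    def fillna(self, value):
        def fill(x):
            return value if isinstance(x, float) and math.isnan(x) else x

        # Expected behavior: fill NaN among the stored values AND in the fill value.
        return ToySparseColumn(
            self.length,
            {i: fill(v) for i, v in self.stored.items()},
            fill(self.fill_value),
        )


# Column 0 of the example: 1.0 at row 0, explicit NaN stored at rows 1 and 2.
col0 = ToySparseColumn(3, {0: 1.0, 1: math.nan, 2: math.nan})
# Column 1: only the diagonal 1.0 is stored; rows 0 and 2 are implied.
col1 = ToySparseColumn(3, {1: 1.0})

filled0 = col0.fillna(-1.0).to_dense()
filled1 = col1.fillna(-1.0).to_dense()
print(filled0)  # [1.0, -1.0, -1.0]
print(filled1)  # [-1.0, 1.0, -1.0]
```

In this toy model one fillna(-1.0) leaves no NaN behind, whereas the output above suggests the real call only touched the implied entries on the first pass.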
xref: #15533
Output of pd.show_versions()
pandas v0.20.0rc1 844013b