Description
Code Sample, a copy-pastable example if possible
import numpy as np
import pandas as pd
X = pd.SparseDataFrame([[0,1], [0,0]], default_fill_value=0.0)
## Good behaviour
X.loc[0].to_numpy()
# array([0., 1.])
X.loc[[0]].to_numpy()
# array([[0., 1.]])
X.iloc[0].to_numpy()
# array([0., 1.])
## Bad behaviour
X.iloc[[0]].to_numpy()
# array([[nan, 1]], dtype=object)
X.loc[[True, False]].to_numpy()
# array([[nan, 1]], dtype=object)
Problem description
Indexing a SparseDataFrame with iloc and more than a single row number should return the same result as indexing the same rows with loc and the corresponding indices. Instead, iloc drops the fill_value of any column with no non-zero entries.
Expected Output
All commands should return array([0., 1.]) (allowing for differences between 1-D and 2-D output). The last two (iloc with fancy indexing, and loc with boolean indexing) instead return array([nan, 1.]).
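The expectation above can be phrased as a small consistency check. This is only a sketch under assumptions: SparseDataFrame was removed in pandas 1.0, so the frame here is a regular DataFrame whose columns are pd.arrays.SparseArray with fill_value=0.0, standing in for the object in the report. The helper name select_row_zero is hypothetical, and the check does not reproduce the 0.24.1 bug itself; it only states what a correct result looks like.

```python
import numpy as np
import pandas as pd

# Assumption: sparse columns in a regular DataFrame stand in for the
# SparseDataFrame from the report. Column 0 holds only fill values.
X = pd.DataFrame({
    0: pd.arrays.SparseArray([0.0, 0.0], fill_value=0.0),
    1: pd.arrays.SparseArray([1.0, 0.0], fill_value=0.0),
})


def select_row_zero(frame):
    """Hypothetical helper: pick row 0 via each list-style access path."""
    return {
        "loc_list": frame.loc[[0]].to_numpy(),
        "iloc_list": frame.iloc[[0]].to_numpy(),
        "loc_mask": frame.loc[[True, False]].to_numpy(),
    }


results = select_row_zero(X)
expected = np.array([[0.0, 1.0]])

for name, values in results.items():
    # Every path should keep the fill value (0.0), not replace it with NaN.
    same = np.array_equal(np.asarray(values, dtype=float), expected)
    print(name, values.tolist(), "OK" if same else "MISMATCH")
```

If any path prints MISMATCH, the fill value was lost during row selection, which is the symptom described in this report.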
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-17763-Microsoft
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.1
pytest: None
pip: 18.0
setuptools: 40.2.0
Cython: 0.29
numpy: 1.15.1
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.1.1
sphinx: 1.6.7
patsy: None
dateutil: 2.7.3
pytz: 2018.3
blosc: None
bottleneck: None
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None