Description
Code Sample, a copy-pastable example if possible
import pandas as pd

data = {"ID1": [1, 1, 1, 2, 2],
        "ID2": [1001, 1001, 1002, 1001, 1002],
        "ID3": [1, 2, 1, 1, 2],
        "Value": [1, 2, 9, 3, 4]}
df = pd.DataFrame(data).set_index(["ID1", "ID2", "ID3"])
desired_rows = ((1, 1001, 1), (1, 1001, 2), (2, 1002, 2))  # the rows to be extracted
print(df)
Out[3]:
              Value
ID1 ID2  ID3
1   1001 1        1
         2        2
    1002 1        9
2   1001 1        3
    1002 2        4
Problem description
Extracting the desired rows with loc does not work as expected here:
a single row is returned instead of the three requested rows:
In [5]: df.loc[desired_rows, :]
Out[5]:
              Value
ID1 ID2  ID3
1   1001 2        2
Expected Output
One solution would be to convert the tuple to a list internally,
because a list of indices works correctly:
In [6]: df.loc[list(desired_rows), :]
Out[6]:
              Value
ID1 ID2  ID3
1   1001 1        1
         2        2
2   1002 2        4
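Until loc handles this itself, the conversion can be done on the caller's side. The following is only a minimal sketch: `select_rows` is a hypothetical helper name (not part of pandas) that turns any iterable of complete index keys into a list before handing it to loc. It also shows `MultiIndex.isin` as an alternative that I believe gives the same rows as a boolean mask.

```python
import pandas as pd

data = {"ID1": [1, 1, 1, 2, 2],
        "ID2": [1001, 1001, 1002, 1001, 1002],
        "ID3": [1, 2, 1, 1, 2],
        "Value": [1, 2, 9, 3, 4]}
df = pd.DataFrame(data).set_index(["ID1", "ID2", "ID3"])
desired_rows = ((1, 1001, 1), (1, 1001, 2), (2, 1002, 2))


def select_rows(frame, keys):
    """Hypothetical helper: select complete MultiIndex keys.

    `keys` may be any iterable of tuples; it is converted to a list so
    that loc treats each tuple as one full row key.
    """
    return frame.loc[list(keys), :]


result = select_rows(df, desired_rows)
print(result)

# Alternative: build a boolean mask from the index itself.
mask = df.index.isin(list(desired_rows))
result_via_mask = df[mask]
print(result_via_mask)
```

The mask variant keeps the rows in the order of the original index, whereas the loc variant is driven by the list of keys; for this example both orders coincide.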
Another solution is to raise an error if a tuple of indices is provided
as the row indexer of loc, in order to prevent unexpected results.
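To illustrate what such an error could look like from the user's point of view, here is a rough sketch of a wrapper that rejects a tuple of tuples as the row indexer. `strict_loc` is a hypothetical name, and the check is an assumption about what "a tuple of indices" should mean; it is a user-level stand-in, not a proposal for how pandas would implement this internally.

```python
import pandas as pd

data = {"ID1": [1, 1, 1, 2, 2],
        "ID2": [1001, 1001, 1002, 1001, 1002],
        "ID3": [1, 2, 1, 1, 2],
        "Value": [1, 2, 9, 3, 4]}
df = pd.DataFrame(data).set_index(["ID1", "ID2", "ID3"])
desired_rows = ((1, 1001, 1), (1, 1001, 2), (2, 1002, 2))


def strict_loc(frame, rows):
    """Hypothetical wrapper around loc that refuses a tuple of tuples.

    A non-empty tuple whose elements are all tuples is ambiguous as a
    row indexer, so it is rejected instead of being passed through.
    """
    if isinstance(rows, tuple) and rows and all(isinstance(r, tuple) for r in rows):
        raise TypeError(
            "Row indexer is a tuple of tuples; pass a list of tuples "
            "to select several complete index keys."
        )
    return frame.loc[rows, :]


try:
    strict_loc(df, desired_rows)
    raised = False
except TypeError as exc:
    raised = True
    print("Rejected:", exc)

# A list of tuples is passed through to loc unchanged.
selected = strict_loc(df, list(desired_rows))
print(selected)
```

Note that this sketch deliberately rejects every tuple of tuples, which may be stricter than desired if nested tuples are meant to be valid per-level indexers.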
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.8.0-58-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.2
pytest: None
pip: 9.0.1
setuptools: 36.0.1
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None