Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random(size=(2, 3)),
                  index=['A', 'B'],
                  columns=[0, 1, 2])
print(df.info())
<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, A to B
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   0       2 non-null      float64
 1   1       2 non-null      float64
 2   2       2 non-null      float64
dtypes: float64(3)
memory usage: 64.0+ bytes
None
print(df.loc[:, [1, 2]])
          1         2
A  0.950714  0.731994
B  0.156019  0.155995
print(df.loc[:, [1, 2]].info())
Traceback (most recent call last):
  File ".../.pyenv/versions/dm/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1032, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1039, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "some_file.py", line 32, in <module>
    print(df.loc[:, [1, 2]].info())
  File ".../.pyenv/versions/dm/lib/python3.8/site-packages/pandas/core/frame.py", line 2589, in info
    return DataFrameInfo(
  File ".../.pyenv/versions/dm/lib/python3.8/site-packages/pandas/io/formats/info.py", line 250, in info
    self._verbose_repr(lines, ids, dtypes, show_counts)
  File ".../.pyenv/versions/dm/lib/python3.8/site-packages/pandas/io/formats/info.py", line 335, in _verbose_repr
    dtype = dtypes[i]
  File ".../.pyenv/versions/dm/lib/python3.8/site-packages/pandas/core/series.py", line 882, in __getitem__
    return self._get_value(key)
  File ".../.pyenv/versions/dm/lib/python3.8/site-packages/pandas/core/series.py", line 989, in _get_value
    loc = self.index.get_loc(label)
  File ".../.pyenv/versions/dm/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: 0
Process finished with exit code 1
Problem description
When the column names are integers, calling .info() after selecting a subset of the columns with .loc[:, [...]] raises the KeyError shown above.
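The traceback ends at `dtype = dtypes[i]` in `_verbose_repr`, which suggests (this is my reading of the traceback, not a confirmed diagnosis) that a positional counter is being used as a label on a Series indexed by the column names. A minimal sketch of that lookup in isolation, where `label_lookup_failed` and `first_dtype` are illustrative names only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random(size=(2, 3)),
                  index=['A', 'B'],
                  columns=[0, 1, 2])

# The dtypes of the subset form a Series indexed by the column labels 1 and 2.
dtypes = df.loc[:, [1, 2]].dtypes

try:
    # With an integer index, [] is a label-based lookup, and no column is labelled 0.
    dtypes[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Positional access does not depend on the labels.
first_dtype = dtypes.iloc[0]

print(label_lookup_failed)
print(first_dtype)
```

With string column names the same `dtypes[0]` lookup cannot be matched as a label, which would be consistent with the string conversion avoiding the error.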
Converting the column names to str avoids the error:
df.columns = df.columns.astype(str)
print(df.loc[:, ['1', '2']].info())
<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, A to B
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   1       2 non-null      float64
 1   2       2 non-null      float64
dtypes: float64(2)
memory usage: 48.0+ bytes
None
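If renaming the columns of the original frame is not desirable, the same workaround can be applied to a copy. A rough sketch, assuming a hypothetical helper `info_with_str_columns` (not part of pandas) that returns the summary as a string via the `buf` argument of `.info()`:

```python
import io

import numpy as np
import pandas as pd


def info_with_str_columns(frame):
    """Hypothetical helper: run .info() on a copy whose column labels are strings."""
    renamed = frame.copy()
    renamed.columns = renamed.columns.astype(str)
    buf = io.StringIO()
    renamed.info(buf=buf)  # write the summary into the buffer instead of stdout
    return buf.getvalue()


df = pd.DataFrame(np.random.random(size=(2, 3)),
                  index=['A', 'B'],
                  columns=[0, 1, 2])

summary = info_with_str_columns(df.loc[:, [1, 2]])
print(summary)
```

The original `df` keeps its integer column names; only the temporary copy is relabelled.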
Expected Output
The standard .info() summary of the DataFrame, as printed in the string-column example above.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : db08276
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-51-generic
Version : #56-Ubuntu SMP Mon Oct 5 14:28:49 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.3
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.3
setuptools : 50.3.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 0.8.3
fastparquet : 0.4.1
gcsfs : 0.7.1
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 1.0.1
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.51.2
None