BUG: with integer column labels, .info() throws KeyError after column subsetting with .loc[] #37245

Closed
@stefan-jansen

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random(size=(2, 3)),
                  index=['A', 'B'],
                  columns=[0, 1, 2])

print(df.info())
<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, A to B
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       2 non-null      float64
 1   1       2 non-null      float64
 2   2       2 non-null      float64
dtypes: float64(3)
memory usage: 64.0+ bytes
None

print(df.loc[:, [1, 2]])
          1         2
A  0.950714  0.731994
B  0.156019  0.155995

print(df.loc[:, [1, 2]].info())
Traceback (most recent call last):
  File ".../.pyenv/versions/dm/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1032, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1039, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "some_file.py", line 32, in <module>
    print(df.loc[:, [1, 2]].info())
  File ".../.pyenv/versions/dm/lib/python3.8/site-packages/pandas/core/frame.py", line 2589, in info
    return DataFrameInfo(
  File ".../.pyenv/versions/dm/lib/python3.8/site-packages/pandas/io/formats/info.py", line 250, in info
    self._verbose_repr(lines, ids, dtypes, show_counts)
  File ".../.pyenv/versions/dm/lib/python3.8/site-packages/pandas/io/formats/info.py", line 335, in _verbose_repr
    dtype = dtypes[i]
  File ".../.pyenv/versions/dm/lib/python3.8/site-packages/pandas/core/series.py", line 882, in __getitem__
    return self._get_value(key)
  File ".../.pyenv/versions/dm/lib/python3.8/site-packages/pandas/core/series.py", line 989, in _get_value
    loc = self.index.get_loc(label)
  File ".../.pyenv/versions/dm/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: 0

Process finished with exit code 1

Problem description

When column labels are integers, calling .info() on a column subset selected with .loc[:, [...]] raises the KeyError shown above.
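
Judging from the traceback, the failing line is `dtype = dtypes[i]` in `info.py`, where `i` looks like a positional counter and `dtypes` is a Series indexed by the column labels. If that reading is right, `[]` on an integer-labelled Series performs a label lookup, so position 0 is looked up as label 0, which no longer exists after subsetting to `[1, 2]`. The full frame would then work only because its labels `[0, 1, 2]` happen to match the positions. A minimal sketch of that behaviour, using a stand-in Series (the string values are placeholders, not real dtype objects):

```python
import pandas as pd

# Stand-in for df.loc[:, [1, 2]].dtypes: a Series whose index holds
# the integer column labels 1 and 2 (assumption based on the traceback).
dtypes = pd.Series(["float64", "float64"], index=[1, 2])

try:
    dtypes[0]  # label-based lookup: there is no label 0
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Positional access does not depend on the labels.
first_by_position = dtypes.iloc[0]

print(lookup_failed)
print(first_by_position)
```

This only illustrates the suspected mechanism; it is not taken from the pandas source.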

Converting the column labels to str avoids the error:

df.columns = df.columns.astype(str)
print(df.loc[:, ['1', '2']].info())

<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, A to B
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   1       2 non-null      float64
 1   2       2 non-null      float64
dtypes: float64(2)
memory usage: 48.0+ bytes
None
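
If the integer labels need to stay on the original frame, the same workaround can be applied to a temporary copy. The sketch below assumes that string labels avoid the error, as shown above; the helper name `info_with_str_columns` is hypothetical and not part of pandas.

```python
import io

import numpy as np
import pandas as pd


def info_with_str_columns(frame: pd.DataFrame) -> str:
    """Return the .info() summary of a copy whose column labels are strings."""
    buf = io.StringIO()
    # rename() returns a new frame, so the caller's integer labels are untouched.
    frame.rename(columns=str).info(buf=buf)
    return buf.getvalue()


df = pd.DataFrame(np.random.random(size=(2, 3)),
                  index=['A', 'B'],
                  columns=[0, 1, 2])

subset = df.loc[:, [1, 2]]
summary = info_with_str_columns(subset)
print(summary)
```

The summary lists the columns under their string names, while `subset.columns` still holds the integers 1 and 2.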

Expected Output

Standard .info() DataFrame summary.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : db08276
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-51-generic
Version : #56-Ubuntu SMP Mon Oct 5 14:28:49 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.3
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.3
setuptools : 50.3.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 0.8.3
fastparquet : 0.4.1
gcsfs : 0.7.1
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 1.0.1
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.51.2

Labels

Indexing: Related to indexing on series/frames, not to indexes themselves
Regression: Functionality that used to work in a prior pandas version
