Pandas 0.22.0 does not raise KeyError for misspelled column with .drop_duplicates() #19726

Closed
@aktivkohle

Description

I tested two versions of pandas side by side with exactly the same code. 0.19.2 behaves as expected, but 0.22.0 does what I describe below. I will probably switch back to 0.19.2 for now. I am using Python 3.6.4.

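import pandas as pd

df = pd.DataFrame({"A": [34, 12, 78, 84, 26], "B": [54, 87, 35, 81, 87], "C": [56, 78, 0, 14, 13], "D": [0, 87, 72, 87, 14], "E": [78, 12, 31, 0, 34]})

# 'b' is a misspelling of 'B': 0.19.2 raises KeyError here, 0.22.0 does not
print(df.drop_duplicates(['b', 'D']))
# The remaining calls use valid column names
print(df.drop_duplicates(['B', 'D']))
print(df.drop_duplicates(['B']))
print(df.drop_duplicates(['D']))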

Problem description

I noticed the problem while working with a much larger DataFrame: pandas neither warned me nor raised a KeyError when I misspelled a column name.
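As a stopgap on an affected version, the column names could be checked by hand before calling .drop_duplicates(). The sketch below is only one possible workaround, not anything pandas provides: `drop_duplicates_checked` is a hypothetical helper name, and it assumes the subset is either a single string label or a list of labels.

```python
import pandas as pd


def drop_duplicates_checked(df, subset, **kwargs):
    """Hypothetical helper: fail loudly on unknown column names."""
    # Accept a single string label or a list of labels.
    labels = [subset] if isinstance(subset, str) else list(subset)
    # Collect every label that is not an actual column of the frame.
    missing = [label for label in labels if label not in df.columns]
    if missing:
        raise KeyError(missing)
    # All labels exist, so delegate to the regular pandas method.
    return df.drop_duplicates(subset=labels, **kwargs)


df = pd.DataFrame({"A": [34, 12, 78, 84, 26], "B": [54, 87, 35, 81, 87],
                   "C": [56, 78, 0, 14, 13], "D": [0, 87, 72, 87, 14],
                   "E": [78, 12, 31, 0, 34]})

try:
    drop_duplicates_checked(df, ["b", "D"])  # misspelled 'B'
except KeyError as exc:
    print("KeyError:", exc)

print(drop_duplicates_checked(df, ["B"]))
```

With this helper the misspelled call raises a KeyError regardless of the pandas version, while valid subsets behave like a plain .drop_duplicates() call.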

Expected Output

Pandas 0.19.2 gives the output below. Pandas 0.22.0 raises no KeyError for the first print statement; it just runs.

KeyError: 'b'

    A   B   C   D   E
0  34  54  56   0  78
1  12  87  78  87  12
2  78  35   0  72  31
3  84  81  14  87   0
4  26  87  13  14  34

    A   B   C   D   E
0  34  54  56   0  78
1  12  87  78  87  12
2  78  35   0  72  31
3  84  81  14  87   0

    A   B   C   D   E
0  34  54  56   0  78
1  12  87  78  87  12
2  78  35   0  72  31
4  26  87  13  14  34

Output of pd.show_versions()

Below is the output for the pandas version that shows the problem (0.22.0).

INSTALLED VERSIONS

commit: None

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.4.0
Cython: None
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Labels: Regression, good first issue