Description
I tested two versions of pandas side by side with exactly the same code. 0.19.2 behaves as expected, but 0.22.0 behaves as described below. I will probably switch back to 0.19.2 for now. I am using Python 3.6.4.
import pandas as pd
df = pd.DataFrame({"A":[34,12,78,84,26], "B":[54,87,35,81,87], "C":[56,78,0,14,13], "D":[0,87,72,87,14], "E":[78,12,31,0,34]})
# 'b' is not a column (misspelling of 'B'); 0.19.2 raises KeyError here
print(df.drop_duplicates(['b','D']))
print(df.drop_duplicates(['B','D']))
print(df.drop_duplicates(['B']))
print(df.drop_duplicates(['D']))
Problem description
I became aware of the problem while working with a much larger DataFrame: drop_duplicates neither warned me nor raised a KeyError when I misspelled a column name.
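Until the behavior is settled, one possible workaround is to validate the column names before calling drop_duplicates. The sketch below is only an illustration: `drop_duplicates_strict` is a hypothetical helper name, not part of pandas, and it assumes `subset` is passed as a list of column labels.

```python
import pandas as pd


def drop_duplicates_strict(df, subset, **kwargs):
    """Hypothetical helper (not a pandas API): raise KeyError for unknown
    column labels, then defer to DataFrame.drop_duplicates."""
    missing = [col for col in subset if col not in df.columns]
    if missing:
        raise KeyError(missing)
    return df.drop_duplicates(subset=subset, **kwargs)


df = pd.DataFrame({"A": [34, 12, 78, 84, 26],
                   "B": [54, 87, 35, 81, 87],
                   "C": [56, 78, 0, 14, 13],
                   "D": [0, 87, 72, 87, 14],
                   "E": [78, 12, 31, 0, 34]})

try:
    # 'b' is misspelled, so the helper raises before pandas is called
    drop_duplicates_strict(df, ["b", "D"])
except KeyError as err:
    print("KeyError:", err)

# Correctly spelled columns pass straight through to pandas
print(drop_duplicates_strict(df, ["B", "D"]))
```

The check happens before pandas is involved, so it should behave the same on 0.19.2 and 0.22.0.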
Expected Output
pandas 0.19.2 raises a KeyError for the first print statement and gives the output below for the other three. pandas 0.22.0 raises no KeyError for the first print statement; it just runs.
KeyError: 'b'
    A   B   C   D   E
0  34  54  56   0  78
1  12  87  78  87  12
2  78  35   0  72  31
3  84  81  14  87   0
4  26  87  13  14  34
    A   B   C   D   E
0  34  54  56   0  78
1  12  87  78  87  12
2  78  35   0  72  31
3  84  81  14  87   0
    A   B   C   D   E
0  34  54  56   0  78
1  12  87  78  87  12
2  78  35   0  72  31
4  26  87  13  14  34
Output of pd.show_versions()
Below is the output for the pandas version that shows the problem (0.22.0).
INSTALLED VERSIONS
commit: None
pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.4.0
Cython: None
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None