Closed
Description
I'm using pandas 0.17.1
import pandas as pd
pd.__version__
Out:'0.17.1'
When column names are duplicated
cols = ['A', 'A', 'B']
with open('pandas.csv', 'w') as f:
    f.write('1,2,3')
we can still load the dataframe
pd.read_csv('pandas.csv',
            header=None,
            names=cols,
            )
with explainable behaviour
Out:
   A  A  B
0  2  2  3
Then we might want to load some of the columns with the Python engine
pd.read_csv('pandas.csv',
            engine='python',
            header=None,
            names=cols,
            usecols=cols
            )
and get a different but still explainable result
Out:
   A  B
0  1  3
But then we switch back to the C engine
pd.read_csv('pandas.csv',
            engine='c',
            header=None,
            names=cols,
            usecols=cols
            )
and get the following
Out:
   A  A    B
0  2  2  NaN
This result:
(a) differs from the Python engine's result (which is not good in my opinion)
(b) looks like a bug
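A possible way to sidestep the discrepancy is to make the names unique before passing them to `read_csv`, so that `names` and `usecols` map to columns unambiguously. The sketch below assumes that renaming the repeated columns is acceptable. `dedupe_names` is a hypothetical helper, not part of pandas, and its `.1`, `.2` suffix scheme is only an illustrative choice. It reads from an in-memory buffer instead of `pandas.csv` to stay self-contained, and I have not checked it against 0.17.1 specifically.

```python
import io
from collections import Counter

import pandas as pd


def dedupe_names(names):
    """Hypothetical helper: suffix repeated names with .1, .2, ...

    Note: this does not guard against a collision with a name that
    already looks like 'A.1' in the input list.
    """
    seen = Counter()
    unique = []
    for name in names:
        count = seen[name]
        unique.append(name if count == 0 else '{}.{}'.format(name, count))
        seen[name] += 1
    return unique


cols = dedupe_names(['A', 'A', 'B'])

# Read the same single-row data with both engines using the unique names.
frames = {
    engine: pd.read_csv(io.StringIO('1,2,3'),
                        engine=engine,
                        header=None,
                        names=cols,
                        usecols=cols)
    for engine in ('c', 'python')
}

for engine, frame in frames.items():
    print(engine)
    print(frame)
```

With unique names, both engines should pick the same three columns, which makes it easy to compare them with `frames['c'].equals(frames['python'])`.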