Tested on pandas 0.13.0, 0.13.1 and 0.14.0rc1:
from io import StringIO  # on Python 2: from StringIO import StringIO
import pandas as pd
# this is correct
print(pd.DataFrame([[0, 1, 2], [3, 4, 5]], columns=["a", "b", "a"]))
# and this is fine as well
# (although the columns are renamed to a, b, a.1)
print(pd.read_csv(StringIO("a,b,a\n0,1,2\n3,4,5")))
# but this is not correct
print(pd.read_csv(StringIO("0,1,2\n3,4,5"), names=["a", "b", "a"]))
The last one returns:
   a  b  a
0  2  1  2
1  5  4  5
The first "a" column holds the values of the second "a" column (2 and 5) instead of its own (0 and 3).
I would expect all three calls to return the same DataFrame. I noticed this when reading a CSV file whose header was stored in a separate file (and contained a duplicated column name). BTW, is there a better way to do this than reading the header file first and passing the result to the 'names' parameter of read_csv?
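One possible approach to the last question, sketched under assumptions (the file contents below are hypothetical stand-ins held in StringIO objects; with real files you would pass open file handles or paths instead): read the header line with the standard `csv` module, which keeps duplicate names exactly as written, parse the data with `header=None`, and assign the names to `df.columns` afterwards. This avoids `names=` entirely. Assigning duplicate labels to `columns` is allowed, and newer pandas versions reject duplicate `names` in read_csv with a ValueError instead of returning wrong data.

```python
import csv
from io import StringIO

import pandas as pd

# Hypothetical stand-ins for the separate header file and the data file.
header_file = StringIO("a,b,a\n")
data_file = StringIO("0,1,2\n3,4,5\n")

# csv.reader returns the header fields verbatim, duplicates included.
names = next(csv.reader(header_file))

# header=None: every line is data; pandas labels the columns 0, 1, 2.
df = pd.read_csv(data_file, header=None)

# Guard against a header file that does not match the data.
if len(names) != df.shape[1]:
    raise ValueError(
        "header has %d names but data has %d columns" % (len(names), df.shape[1])
    )

# Assigning duplicate labels after parsing keeps each column's own values.
df.columns = names

# Compare against the DataFrame constructor example from the report.
expected = pd.DataFrame([[0, 1, 2], [3, 4, 5]], columns=["a", "b", "a"])
print(df.equals(expected))  # True
print(df)
```

The length check is optional, but without it a mismatched header file would fail later with a less descriptive error when the labels are assigned.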