Skip to content

Bug in read_csv with duplicated column names #7160

Closed
@rafaljozefowicz

Description

@rafaljozefowicz

Tested on 0.13.0, 0.13.1 and 0.14.0rc1:

from StringIO import StringIO
import pandas as pd

# this is correct
print(pd.DataFrame([[0, 1, 2], [3, 4, 5]], columns=["a", "b", "a"]))
# and this is fine as well
# (although, changing the column names to a,b,a.1)
print(pd.read_csv(StringIO("a,b,a\n0,1,2\n3,4,5")))
# but this is not correct
print(pd.read_csv(StringIO("0,1,2\n3,4,5"), names=["a", "b", "a"]))

The last one returns:

Out[5]: 
   a  b  a
0  2  1  2
1  5  4  5

I would expect all 3 methods to return the same DataFrame. I noticed this when I wanted to read csv file that had a separate file with a header (and a duplicated column in it). BTW is there a better way to do it than to read the header file first and pass the output into 'names' parameter of read_csv?

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions