Bug in read_csv with duplicated column names

Tested on 0.13.0, 0.13.1 and 0.14.0rc1:

``` python
from StringIO import StringIO
import pandas as pd

# Correct: the duplicated column name is kept as given.
print(pd.DataFrame([[0, 1, 2], [3, 4, 5]], columns=["a", "b", "a"]))
# Also fine, although the column names are changed to a, b, a.1.
print(pd.read_csv(StringIO("a,b,a\n0,1,2\n3,4,5")))
# Not correct: the first "a" column ends up with the values of the last column.
print(pd.read_csv(StringIO("0,1,2\n3,4,5"), names=["a", "b", "a"]))
```

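The last one returns the following, where the first `a` column contains 2, 5 (the values of the third column) instead of 0, 3: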

```
Out[5]: 
   a  b  a
0  2  1  2
1  5  4  5
```

I would expect all three methods to return the same DataFrame. I noticed this when I wanted to read a CSV file whose header is stored in a separate file (with a duplicated column name in it). By the way, is there a better way to do this than reading the header file first and passing the result to the `names` parameter of `read_csv`?
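
One possible workaround, as a rough sketch rather than a fix: make the names from the header file unique before passing them to `names`, using the same `a, b, a.1` pattern that `read_csv` produces in the second call above. The helpers `read_header` and `mangle_duplicates` are hypothetical names made up for this example, not part of pandas, and the sketch assumes no column is already literally called something like `a.1`.

```python
import csv
from collections import Counter
from io import StringIO


def read_header(handle):
    """Return the first row of a CSV file-like object as a list of names."""
    return next(csv.reader(handle))


def mangle_duplicates(names):
    """Suffix repeated names as name.1, name.2, ... (hypothetical helper)."""
    seen = Counter()
    unique = []
    for name in names:
        count = seen[name]
        # The first occurrence keeps its name; later ones get a numeric suffix.
        unique.append(name if count == 0 else "{}.{}".format(name, count))
        seen[name] += 1
    return unique


# Stand-in for the separate header file.
header = read_header(StringIO("a,b,a\n"))
unique_names = mangle_duplicates(header)
print(unique_names)
```

The resulting `unique_names` could then be passed as `names=unique_names` to `read_csv`, and the original labels restored afterwards with `df.columns = header` if the duplicated names are really wanted.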

