Reading from Excel mangles columns

As far as I can tell, both methods of reading an Excel sheet (through ExcelFile and read_excel) do not have the option to avoid mangling duplicate columns, which exists for read_csv (https://github.com/pydata/pandas/issues/3468).

I have an Excel file that looks like this:

```
  foo  foo  bar  bar  baz  baz
    A    B    A    B    A    B
  123  456  789   12  345  678
  901  234  567  890  123  456
  789   12  345  678  901  234
```

I initially encountered this problem while trying to find a workaround for not being able to specify a multirow-header (#4679). Reading in this Excel file with header = 0 (default) mangles the column names:

```
>>> df = pd.read_excel("dupe_cols.xlsx", header = 0)
>>> print df
   foo foo.1  bar bar.1  baz baz.1
0    A     B    A     B    A     B
1  123   456  789    12  345   678
2  901   234  567   890  123   456
3  789    12  345   678  901   234
```

Unlike `read_csv`, `read_excel` does not have the option to avoid mangling duplicate columns (using ExcelFile.parse works the same as far as I can see). Specifying header = None in `read_excel` and then assigning the column names to the first row will effectively allow you to avoid mangling the column names. You could also read in the Excel sheet with header = None, save a .csv with header = False and index = False, then read that csv and specify mangle_dupe_cols = False to get the dataframe you want. (Incidentally, this also allows you to specify multiple rows as the header, which is the behavior I was originally trying to emulate.) 


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reading from Excel mangles columns #10523

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Reading from Excel mangles columns #10523

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions