Skip to content

Pandas inconsistenly handles identically named columns in csv export and merging #3468

Closed
@darindillon

Description

@darindillon

Using pandas 0.10.1

Pandas allows creating a dataframe with two columns with the same name. (I disagree that it should be allowed, but it is allowed, so OK). However, it doesn't handle that correctly in several cases.
Pandas ought to either completely disallow duplicate named columns or handle them everywhere. But it shouldn't handle them in some cases but not others.

Problem 1: Round-trip to a CSV
Dump the dataframe to a CSV and then read it back. Even though duplicate columns are supposed to be legal, Pandas won't allow that in the CSV import/export.

df = pandas.DataFrame([[1,2]], columns=['a','a'])
#Note df has two identically named columns now
df.to_csv('foo.csv')
df2 = pandas.DataFrame.from_csv('foo.csv')
#Note df2 does NOT have the same columns anymore. the "a" was renamed "a.1"
#At the least, you ought to get some warning that dumping to a CSV will 
#lose column names.

Problem 2: Merging to a dataframe with dup columns does not work

df = pandas.DataFrame([[1,2,3]], columns=['a','a','b'])
df.merge(df, how='left', on='b')
#Result: Very misleading error message
# edit: reindexing error

I'd be ok with almost any solution (disallowing duplicate named columns, giving you a warning when you do it, handling it correctly in the merge and csv read, etc). But it should be consistently one way or the other.

Metadata

Metadata

Assignees

No one assigned

    Labels

    EnhancementIO CSVread_csv, to_csvIO DataIO issues that don't fit into a more specific label

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions