to_csv with lists of strings and unicode encoding produces wrong output #10813

Closed
@tdszyman

If I have a dataframe with cells containing lists of strings (or unicode strings), then these lists are broken when I use to_csv() with the encoding parameter set. The error does not occur if the encoding is not set.

Here is an example (using pandas version 0.16.2):

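import pandas as pd

df = pd.DataFrame.from_records(
    [('Mary S.',['Detroit, MI','New York, NY']),
     ('John U.',[u'Atlanta, GA',u'Paris, France'])],
    columns=['name','residences'])
# No encoding argument: list cells are written correctly.
df.to_csv('ascii.csv')
# Explicit encoding: list cells lose their inner quotes.
df.to_csv('utf8.csv',encoding='utf-8')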

The ascii-encoded CSV file is fine. (contents of 'ascii.csv' below)

,name,residences
0,Mary S.,"['Detroit, MI', 'New York, NY']"
1,John U.,"[u'Atlanta, GA', u'Paris, France']"

But in the UTF-8 CSV file, the strings within the lists are written without their quotes. (contents of 'utf8.csv' below)

,name,residences
0,Mary S.,"[Detroit, MI, New York, NY]"
1,John U.,"[Atlanta, GA, Paris, France]"

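This makes the original data impossible to recover. For example, if I load this file using read_csv(), the relevant cells come back as plain strings, and because the inner quotes are gone, the commas inside each item can no longer be told apart from the commas separating items, so the cells cannot be accurately recast as lists.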
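To illustrate the recovery problem, here is a minimal sketch using only the standard library. The two cell strings are copied from the first data row of each file above; `try_parse` is a hypothetical helper name introduced for illustration and is not part of pandas.

```python
import ast

# Cell text as written without an encoding argument (inner quotes intact).
good_cell = "['Detroit, MI', 'New York, NY']"
# Cell text as written with encoding='utf-8' (inner quotes lost).
bad_cell = "[Detroit, MI, New York, NY]"


def try_parse(cell):
    """Return the parsed Python literal, or None if the text is not one."""
    try:
        return ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return None


# The quoted form parses back into the original two-item list.
recovered = try_parse(good_cell)

# The unquoted form is not a valid Python literal, so parsing fails.
broken = try_parse(bad_cell)

# Splitting on commas by hand yields four fragments instead of two items.
naive_split = [item.strip() for item in bad_cell.strip("[]").split(",")]

print(recovered)
print(broken)
print(naive_split)
```

The quoted cell round-trips through `ast.literal_eval`, while the unquoted one can only be split on commas, which breaks each "City, State" item in two.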

The behavior is the same with encoding='utf-16'; I have not checked any other encodings.

Labels: IO CSV (read_csv, to_csv); Output-Formatting (__repr__ of pandas objects, to_string)
