Closed
Description
If I have a dataframe with cells containing lists of strings (or unicode strings), then these lists are broken when I use to_csv() with the encoding parameter set. The error does not occur if the encoding is not set.
Here is an example (using pandas version 0.16.2):
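import pandas as pd

df = pd.DataFrame.from_records(
    [('Mary S.', ['Detroit, MI', 'New York, NY']),
     ('John U.', [u'Atlanta, GA', u'Paris, France'])],
    columns=['name', 'residences'])
df.to_csv('ascii.csv')                   # no encoding set: list elements keep their quotes
df.to_csv('utf8.csv', encoding='utf-8')  # encoding set: quotes inside the lists are dropped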
The ascii-encoded CSV file is fine. (contents of 'ascii.csv' below)
,name,residences
0,Mary S.,"['Detroit, MI', 'New York, NY']"
1,John U.,"[u'Atlanta, GA', u'Paris, France']"
But the unicode CSV file fails to quote the strings within the lists. (contents of 'utf8.csv' below)
,name,residences
0,Mary S.,"[Detroit, MI, New York, NY]"
1,John U.,"[Atlanta, GA, Paris, France]"
This results in the data being impossible to recover. For example, if I load this file using read_csv(), the relevant cells are treated as strings, and cannot be accurately recast as lists.
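To illustrate the point, here is a minimal sketch that tries to turn the cell strings back into lists with the standard library's ast.literal_eval. The helper name try_parse_list is hypothetical, and the cell values are copied from the two files shown above rather than loaded through read_csv().

```python
import ast


def try_parse_list(cell):
    """Return the list encoded in a CSV cell, or None if it cannot be parsed."""
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        # Unquoted elements are not valid Python literals.
        return None
    return value if isinstance(value, list) else None


# Cell contents as they appear in 'ascii.csv' and 'utf8.csv'.
ascii_cell = "['Detroit, MI', 'New York, NY']"
utf8_cell = "[Detroit, MI, New York, NY]"

recovered = try_parse_list(ascii_cell)  # the quoted form parses back to a list
broken = try_parse_list(utf8_cell)      # the unquoted form cannot be parsed

# Splitting on commas does not help: the commas inside each residence
# are indistinguishable from the commas separating list elements.
naive = utf8_cell.strip("[]").split(", ")
```

The quoted cell from 'ascii.csv' comes back as the original two-element list, while the unquoted cell from 'utf8.csv' cannot be parsed, and a naive comma split yields four fragments instead of two residences.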
The behavior is the same using encoding='utf-16', but I didn't check any other encodings.
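One possible way to sidestep the problem, sketched here as an assumption rather than a confirmed fix, is to serialize each list to a JSON string before writing, so the CSV cell no longer depends on how the list is converted to text. The sketch below uses only the standard library's csv and json modules (Python 3) to show the round trip; the rows variable simply mirrors the example data.

```python
import csv
import io
import json

# Same data as the example dataframe above.
rows = [
    ("Mary S.", ["Detroit, MI", "New York, NY"]),
    ("John U.", ["Atlanta, GA", "Paris, France"]),
]

# Write each list as a JSON string; the csv module quotes the cell as needed.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "residences"])
for name, residences in rows:
    writer.writerow([name, json.dumps(residences)])

# Read the data back and decode the JSON cells into lists again.
buffer.seek(0)
reader = csv.reader(buffer)
header = next(reader)
restored = [(name, json.loads(cell)) for name, cell in reader]
```

With a dataframe, the same idea would presumably be to apply json.dumps to the 'residences' column before calling to_csv() and json.loads after read_csv(); I have not verified this against pandas 0.16.2.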