Description
Code Sample
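import pandas as pd

s = '\ud800'  # a lone high surrogate, which is not valid on its own in UTF-8
srs = pd.Series()
srs.loc[0] = s
srs.to_csv('testcase.csv')  # raises UnicodeEncodeError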
Stack trace:
---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-50-769583baba38> in <module>()
4 srs = pd.Series()
5 srs.loc[ 0 ] = s
----> 6 srs.to_csv('testcase.csv')
/opt/conda/lib/python3.6/site-packages/pandas/core/series.py in to_csv(self, path, index, sep, na_rep, float_format, header, index_label, mode, encoding, compression, date_format, decimal)
3779 index_label=index_label, mode=mode,
3780 encoding=encoding, compression=compression,
-> 3781 date_format=date_format, decimal=decimal)
3782 if path is None:
3783 return result
/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, tupleize_cols, date_format, doublequote, escapechar, decimal)
1743 doublequote=doublequote,
1744 escapechar=escapechar, decimal=decimal)
-> 1745 formatter.save()
1746
1747 if path_or_buf is None:
/opt/conda/lib/python3.6/site-packages/pandas/io/formats/csvs.py in save(self)
169 self.writer = UnicodeWriter(f, **writer_kwargs)
170
--> 171 self._save()
172
173 finally:
/opt/conda/lib/python3.6/site-packages/pandas/io/formats/csvs.py in _save(self)
284 break
285
--> 286 self._save_chunk(start_i, end_i)
287
288 def _save_chunk(self, start_i, end_i):
/opt/conda/lib/python3.6/site-packages/pandas/io/formats/csvs.py in _save_chunk(self, start_i, end_i)
311
312 libwriters.write_csv_rows(self.data, ix, self.nlevels,
--> 313 self.cols, self.writer)
pandas/_libs/writers.pyx in pandas._libs.writers.write_csv_rows()
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 2: surrogates not allowed
Problem description
The presence of Unicode surrogates in a DataFrame (or Series) causes .to_csv() to raise a UnicodeEncodeError. The same problem has already been fixed in .to_hdf() by accepting an errors= argument, which lets the caller choose the surrogatepass or surrogateescape error handler.
See the original bug report and the PR that fixed it.
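To illustrate what an errors= argument would change, here is a minimal sketch using only the standard library, no pandas. It assumes that .to_csv() ultimately writes rows through a csv writer onto a UTF-8 text handle, which is what the traceback above suggests; the in-memory buffer and the variable names are mine, purely for illustration.

```python
import csv
import io

s = "\ud800"  # the lone high surrogate from the code sample above

# With the default "strict" handler, encoding fails just like in the traceback.
try:
    s.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# "surrogatepass" encodes the surrogate as-is and can decode it back.
raw = s.encode("utf-8", errors="surrogatepass")
restored = raw.decode("utf-8", errors="surrogatepass")

# "surrogateescape" only covers U+DC80..U+DCFF when encoding,
# so it does not help with this particular character.
try:
    s.encode("utf-8", errors="surrogateescape")
    escape_ok = True
except UnicodeEncodeError:
    escape_ok = False

# A text handle opened with errors="surrogatepass" accepts the row from csv.writer.
buf = io.BytesIO()
with io.TextIOWrapper(buf, encoding="utf-8", errors="surrogatepass", newline="") as f:
    csv.writer(f).writerow([0, s])
    f.flush()
    written = buf.getvalue()  # read before the wrapper closes the buffer
```

So for the character in this report, surrogatepass is the handler that would actually make the write succeed. A possible workaround until .to_csv() accepts errors= might be to open the file yourself with errors="surrogatepass" and pass the handle in, though I have not verified that against pandas 0.23.4.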
Expected Output
No error.
Output of pd.show_versions()
I forgot to grab this before the end of my workshop, and I have since destroyed the cloud instance. Sorry. I think it was Python 3.6 and pandas 0.23.4.