Closed
Right now, if you want to use an encoding, you have to make sure that every field containing non-ASCII characters is already in that encoding. Maybe this is okay, but I'm working with a lot of data and it is cumbersome to be doing these checks constantly. For example:
from StringIO import StringIO  # Python 2
import pandas

# '\xc3\x9f' is the UTF-8 byte sequence for 'ß'
df = pandas.read_table(StringIO('Ki\xc3\x9fwetter, Wolfgang;Ki\xc3\x9fwetter, Wolfgang'),
                       sep=";", header=None)
# the second column has to be decoded to unicode by hand before writing
df["X.1"] = df["X.1"].apply(lambda x: x.decode('utf-8'))
df.to_csv("blah.csv", encoding="utf-8")
The question is: is this something the user should have to worry about, or could a "safe_encode" be provided instead (similar in idea to #1804)?
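For illustration, a "safe_encode" of the kind asked about might look like the sketch below. The name `safe_decode` is hypothetical, not a pandas API; the idea is that each cell is decoded to unicode if it is a byte string, so that a single `encoding=` on write suffices. In a DataFrame it would be applied per-column with `.apply` (or `.applymap` over the whole frame) before calling `to_csv`:

```python
def safe_decode(value, encoding="utf-8"):
    """Hypothetical helper: decode byte strings to unicode with the given
    encoding; pass every other value through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# per-cell behaviour (Python 3 syntax): bytes are decoded, str is untouched
print(safe_decode(b'Ki\xc3\x9fwetter'))  # UTF-8 bytes for 'Kißwetter'
print(safe_decode('Wolfgang'))           # already unicode, passed through
print(safe_decode(42))                   # non-string values are left alone
```

With a helper like this the manual `df["X.1"].apply(lambda x: x.decode('utf-8'))` step in the example above would no longer be the caller's responsibility.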