
Encoding non-ascii characters in to_csv with encodings #1966

Closed
@jseabold

Description


Right now, if you want to use an encoding with to_csv, you have to make sure that every field containing non-ASCII characters has already been decoded from that encoding into unicode. Maybe this is okay, but I'm working with a lot of data and doing these checks constantly is cumbersome. For example:

from StringIO import StringIO
import pandas

df = pandas.read_table(StringIO('Ki\xc3\x9fwetter, Wolfgang;Ki\xc3\x9fwetter, Wolfgang'), sep=";", header=None)
df["X.1"] = df["X.1"].apply(lambda x : x.decode('utf-8'))
df.to_csv("blah.csv", encoding="utf-8")

The question is: should the user be worrying about this, or could a "safe_encode" be used instead, similar in idea to #1804?
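
For illustration, something along these lines is what I have in mind. This is only a sketch of the idea, not an existing pandas API, and the helper name decode_object_columns is made up here: walk the object columns and decode any byte strings before writing, so to_csv can encode everything uniformly.

import pandas

def decode_object_columns(df, encoding='utf-8'):
    # Return a copy of df with byte strings in object columns decoded to
    # unicode, so that to_csv(encoding=...) can encode them consistently.
    out = df.copy()
    for col in out.columns:
        if out[col].dtype == object:
            out[col] = out[col].apply(
                lambda x: x.decode(encoding) if isinstance(x, bytes) else x)
    return out

df = decode_object_columns(df, encoding='utf-8')
df.to_csv("blah.csv", encoding="utf-8")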
