DataFrame.to_msgpack unexpectedly defaults to latin-1 encoding #12170

Closed
@rspeer

I am using Python 3.

I tried saving a DataFrame with Unicode labels using the .to_msgpack method. I didn't specify an encoding, because I assumed it would default to UTF-8, which is Python's default encoding in my locale (en_US.UTF-8) and a sensible encoding to use in general.

Instead, it tried to encode labels in Latin-1, which failed. Latin-1 seems like a strangely antiquated default to use in modern code.

I can work around it by passing the encoding='utf-8' option, but it would be helpful if UTF-8 were the default, as it is in other Python I/O.
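
A minimal sketch of the failure mode, using only the standard library rather than pandas itself. The label value and the `try_encode` helper are hypothetical and chosen for illustration; they are not taken from the original data or from the pandas source.

```python
# Hypothetical label containing characters outside the Latin-1 range.
label = "日本語"


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the text cannot be represented."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# Latin-1 only covers code points U+0000 through U+00FF, so this fails.
latin1_result = try_encode(label, "latin-1")

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_result = try_encode(label, "utf-8")

print(latin1_result)
print(utf8_result.decode("utf-8") == label)
```

Any label with a character above U+00FF hits the same problem under a Latin-1 default, which is why passing encoding='utf-8' to .to_msgpack avoids the error.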

Here's my version information:

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.0.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-51-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 1.5.4
setuptools: 2.2
Cython: 0.23.3
numpy: 1.10.4
scipy: 0.16.1
statsmodels: 0.6.1
IPython: 4.0.0
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 0.7.3
lxml: None
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
Jinja2: None
