ERR: validate encoding on to_stata #15723

Closed
@ozak

Description

It seems pandas on Python 3.5 mishandles the encoding argument of to_stata. For example, the following generates a corrupt output file:

import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.array([1,2,3,4]), columns=['var1'])
df1.to_stata('corrupt.dta', write_index=False, encoding='utf8')

while

df1.to_stata('not-corrupt.dta', write_index=False)

generates a correct file. I imagine this is due to the encoding option being treated differently in Python 2 and Python 3, which breaks compatibility of scripts across Python versions. It would be nice if to_stata ignored this option on Python 3, unless the error is caused by something else.
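
One way the validation requested in the title could look is sketched below. It is not the pandas implementation: the helper name validate_stata_encoding is hypothetical, and it assumes that the legacy .dta formats only store strings as ASCII or Latin-1, so anything else (such as utf8) should be rejected up front instead of producing a corrupt file.

```python
import codecs

# Assumption: legacy .dta files can only hold ASCII or Latin-1 strings.
# Canonical codec names are used so aliases such as "latin1" also match.
_ALLOWED = {codecs.lookup(name).name for name in ("ascii", "latin-1")}


def validate_stata_encoding(encoding=None):
    """Hypothetical helper: return the canonical codec name or raise ValueError."""
    if encoding is None:
        # Assumed default when the caller does not pass an encoding.
        return codecs.lookup("latin-1").name
    try:
        canonical = codecs.lookup(encoding).name
    except LookupError:
        raise ValueError("Unknown encoding: {!r}".format(encoding)) from None
    if canonical not in _ALLOWED:
        raise ValueError(
            "Encoding {!r} is not supported; use 'ascii' or 'latin-1'".format(encoding)
        )
    return canonical
```

A script could call validate_stata_encoding('utf8') before df1.to_stata(...) and get a ValueError on both Python 2-style and Python 3-style code paths, rather than a file that Stata cannot read.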

    Labels

    Error Reporting (Incorrect or improved errors from pandas)
    IO Stata (read_stata, to_stata)
    Unicode (Unicode strings)
