Skip to content

df.to_stata fails when a column of type object contains only None #23572

Closed
@kylebarron

Description

@kylebarron

Code Sample, a copy-pastable example if possible

import pandas as pd
df = pd.DataFrame({'a': ['a', None]})
df.to_stata('test.dta')
df = pd.DataFrame({'a': [None, 'a']})
df.to_stata('test.dta')
df = pd.DataFrame({'a': [None, None]})
df.to_stata('test.dta')
# ValueError: Writing general object arrays is not supported

Problem description

The df.to_stata() method writes columns containing None without error when there is at least one string value in the column, but fails if the column contains only None. It's unclear what data type to write a column of None as, so maybe that's why this isn't supported? I would propose that a column with values of only None be written as str1 with empty strings.

I came across this error because I read in a Parquet file with pd.read_parquet() and was unable to write the file to Stata format. In the Parquet schema, the column had type BYTE_ARRAY UTF8, but since the column had only missing values, it was read into Pandas as only None.

Expected Output

Stata file written to disk with missing values for the column with None.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-38-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.3.2
pip: 18.0
setuptools: 40.0.0
Cython: 0.28.3
numpy: 1.14.2
scipy: 1.0.0
pyarrow: 0.11.1
xarray: None
IPython: 6.5.0
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.1.14
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None

Metadata

Metadata

Assignees

No one assigned

    Labels

    Error ReportingIncorrect or improved errors from pandasIO Stataread_stata, to_stata

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions