Description
Code Sample, a copy-pastable example if possible
from io import StringIO
import pandas as pd
import tempfile

s = StringIO()
# TemporaryFile fails for other reasons, but seems to be covered in pandas-dev/pandas#21471
with tempfile.NamedTemporaryFile('w+') as f:
    for df in (pd.DataFrame({'a': [1, 2]}), pd.DataFrame({'a': [3, 4]})):
        # Keeping the default mode='w' for all cases
        df.to_csv(f, index=False, header=False, mode='w')
        df.to_csv(s, index=False, header=False, mode='w')
    f.seek(0)
    s.seek(0)
    print("File:\n{}".format(f.read()))
    print("StringIO:\n{}".format(s.read()))
Output:
File:
3
4
StringIO:
1
2
3
4
Problem description
Pandas to_csv doesn't truncate a StringIO object when writing with mode='w', which is inconsistent with its own behavior for normal files and with the stdlib meaning of w.
While appending is probably the more common intent when writing multiple files like this, I had tests that operated on StringIO objects and didn't explicitly set mode, so the append behavior passed the tests while the actual behavior with files truncated.
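A possible workaround on the caller's side, as a minimal sketch: empty the buffer by hand before each write so that it ends up matching the truncating behavior seen with the file. The helper name reset_buffer is hypothetical (it is not part of pandas), and the s.write call is only a stand-in for df.to_csv(s, index=False, header=False) so the snippet runs with the standard library alone.

```python
from io import StringIO


def reset_buffer(buf):
    """Empty an in-memory text buffer so the next write starts from scratch.

    Hypothetical helper, not a pandas API.
    """
    buf.seek(0)      # move the stream position back to the start
    buf.truncate(0)  # drop everything previously written
    return buf


s = StringIO()
for rows in ("1\n2\n", "3\n4\n"):
    reset_buffer(s)
    # Stand-in for: df.to_csv(s, index=False, header=False)
    s.write(rows)

result = s.getvalue()
print("StringIO:\n{}".format(result))
```

This only sidesteps the inconsistency in test code; it does not change what to_csv itself does with mode='w'.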
Expected Output
File:
3
4
StringIO:
3
4
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.1
pytest: 3.6.3
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.14.5
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.5
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.1.18
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.8.1
s3fs: None
fastparquet: None
pandas_gbq: 0.5.0
pandas_datareader: None