Commit f146632

jbman223 authored and jreback committed
BUG: Fix pd.NA na_rep truncated in to_csv (#30146)
1 parent 56b6561 commit f146632

3 files changed: +13 -5 lines changed

doc/source/whatsnew/v1.0.0.rst

Lines changed: 2 additions & 2 deletions
@@ -56,7 +56,7 @@ Dedicated string data type
 ^^^^^^^^^^^^^^^^^^^^^^^^^^

 We've added :class:`StringDtype`, an extension type dedicated to string data.
-Previously, strings were typically stored in object-dtype NumPy arrays.
+Previously, strings were typically stored in object-dtype NumPy arrays. (:issue:`29975`)

 .. warning::

@@ -985,7 +985,7 @@ Other
 - Bug in :meth:`Series.count` raises if use_inf_as_na is enabled (:issue:`29478`)
 - Bug in :class:`Index` where a non-hashable name could be set without raising ``TypeError`` (:issue:`29069`)
 - Bug in :class:`DataFrame` constructor when passing a 2D ``ndarray`` and an extension dtype (:issue:`12513`)
--
+- Bug in :meth:`DataFrame.to_csv` when supplied a series with a ``dtype="string"`` and a ``na_rep``, the ``na_rep`` was being truncated to 2 characters. (:issue:`29975`)

 .. _whatsnew_1000.contributors:

pandas/core/internals/blocks.py

Lines changed: 3 additions & 3 deletions
@@ -657,9 +657,9 @@ def to_native_types(self, slicer=None, na_rep="nan", quoting=None, **kwargs):
         if slicer is not None:
             values = values[:, slicer]
         mask = isna(values)
+        itemsize = writers.word_len(na_rep)

-        if not self.is_object and not quoting:
-            itemsize = writers.word_len(na_rep)
+        if not self.is_object and not quoting and itemsize:
             values = values.astype(f"<U{itemsize}")
         else:
             values = np.array(values, dtype="object")
@@ -1773,11 +1773,11 @@ def to_native_types(self, slicer=None, na_rep="nan", quoting=None, **kwargs):
         mask = isna(values)

         try:
-            values = values.astype(str)
             values[mask] = na_rep
         except Exception:
             # eg SparseArray does not support setitem, needs to be converted to ndarray
             return super().to_native_types(slicer, na_rep, quoting, **kwargs)
+        values = values.astype(str)

         # we are expected to return a 2-d ndarray
         return values.reshape(1, len(values))
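
The second hunk above moves `values.astype(str)` after the `na_rep` assignment. A minimal sketch of why that ordering can matter, using plain NumPy arrays rather than the pandas extension arrays the real code operates on (the array contents and variable names here are illustrative assumptions, not taken from the commit): a fixed-width unicode array silently truncates any string longer than its item width.

```python
import numpy as np

na_rep = "ZZZZZ"
mask = np.array([False, True, False])

# Convert first, then fill: the str dtype is sized to the longest
# existing element (1 character), so the longer na_rep is cut short.
converted_first = np.array(["a", "b", "c"], dtype=object).astype(str)
converted_first[mask] = na_rep

# Fill first, then convert: the object array holds na_rep in full,
# and the str dtype is sized afterwards to fit it.
filled_first = np.array(["a", "b", "c"], dtype=object)
filled_first[mask] = na_rep
filled_first = filled_first.astype(str)

print(converted_first.tolist())
print(filled_first.tolist())
```

This only demonstrates the general NumPy truncation behaviour; the exact width observed in the original bug report depends on the pandas internals involved.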

pandas/tests/io/formats/test_to_csv.py

Lines changed: 8 additions & 0 deletions
@@ -205,6 +205,14 @@ def test_to_csv_na_rep(self):
         assert df.set_index("a").to_csv(na_rep="_") == expected
         assert df.set_index(["a", "b"]).to_csv(na_rep="_") == expected

+        # GH 29975
+        # Make sure full na_rep shows up when a dtype is provided
+        csv = pd.Series(["a", pd.NA, "c"]).to_csv(na_rep="ZZZZZ")
+        expected = tm.convert_rows_list_to_csv_str([",0", "0,a", "1,ZZZZZ", "2,c"])
+        assert expected == csv
+        csv = pd.Series(["a", pd.NA, "c"], dtype="string").to_csv(na_rep="ZZZZZ")
+        assert expected == csv
+
     def test_to_csv_date_format(self):
         # GH 10209
         df_sec = DataFrame({"A": pd.date_range("20130101", periods=5, freq="s")})
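
The added test can also be tried outside the pandas test suite. The sketch below assumes a pandas version that includes this fix and replaces the internal `tm.convert_rows_list_to_csv_str` helper with `splitlines()`, so the comparison does not depend on the platform's line terminator; the variable names are illustrative.

```python
import pandas as pd

expected_rows = [",0", "0,a", "1,ZZZZZ", "2,c"]

# Object-dtype series: the missing value is written as the full na_rep.
object_csv = pd.Series(["a", pd.NA, "c"]).to_csv(na_rep="ZZZZZ")

# String-dtype series: the case reported in GH 29975.
string_csv = pd.Series(["a", pd.NA, "c"], dtype="string").to_csv(na_rep="ZZZZZ")

# Compare row by row, ignoring the line terminator.
print(object_csv.splitlines() == expected_rows)
print(string_csv.splitlines() == expected_rows)
```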
