-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
read_excel with dtype=str converts empty cells to np.nan #20429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 24 commits
dd53df8
6f771fb
37f00ad
f194b70
eb8f4c5
ac6a409
6994bb0
40a563f
9858259
5f71a99
0a93b60
f296f9a
7c0af1f
f0fd0a7
61e0519
9fdac27
ddb904f
694849d
5ba95a1
d3ceec3
ea1d73a
c1376a5
3103811
7d5f6b2
478d08d
edb26d7
c3ab9cb
69f6c95
97a345a
8b2fb0b
c9f5120
fab0b27
571d5c4
0712392
47bc105
3740dfe
7d453bb
bcd739d
7341cd1
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -984,6 +984,8 @@ I/O | |
- Bug in :meth:`pandas.io.stata.StataReader.value_labels` raising an ``AttributeError`` when called on very old files. Now returns an empty dict (:issue:`19417`) | ||
- Bug in :func:`read_pickle` when unpickling objects with :class:`TimedeltaIndex` or :class:`Float64Index` created with pandas prior to version 0.20 (:issue:`19939`) | ||
- Bug in :meth:`pandas.io.json.json_normalize` where subrecords are not properly normalized if any subrecords values are NoneType (:issue:`20030`) | ||
- Bug in :func:`read_excel` and :class:`TextReader` now turn np.nan to empty string when dtype=str. They used to turn np.nan to 'nan' (:issue `20377`) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps phrase this as "Bug in read_excel, old behavior, new behavior". Right now it reads a bit like turning NaN to the empty string was the bug. TextReader isn't part of the public API so this link won't work (and it'll cause a warning in the doc build.) Double backticks around Double backticsk around Double backticks around There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should also mention that this affected parsing CSVs with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger : yes |
||
|
||
|
||
Plotting | ||
^^^^^^^^ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -465,7 +465,8 @@ cpdef ndarray[object] astype_unicode(ndarray arr): | |
for i in range(n): | ||
# we can use the unsafe version because we know `result` is mutable | ||
# since it was created from `np.empty` | ||
util.set_value_at_unsafe(result, i, unicode(arr[i])) | ||
arr_i = arr[i] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is arr_i in the cdef? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. d'oh! |
||
util.set_value_at_unsafe(result, i, unicode(arr_i) if arr_i is not np.nan else '') | ||
|
||
return result | ||
|
||
|
@@ -478,7 +479,8 @@ cpdef ndarray[object] astype_str(ndarray arr): | |
for i in range(n): | ||
# we can use the unsafe version because we know `result` is mutable | ||
# since it was created from `np.empty` | ||
util.set_value_at_unsafe(result, i, str(arr[i])) | ||
arr_i = arr[i] | ||
util.set_value_at_unsafe(result, i, str(arr_i) if arr_i is not np.nan else '') | ||
|
||
return result | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -361,6 +361,35 @@ def test_reader_dtype(self, ext): | |
with pytest.raises(ValueError): | ||
actual = self.get_exceldf(basename, ext, dtype={'d': 'int64'}) | ||
|
||
def test_reader_dtype_str(self, ext): | ||
# GH 20377 | ||
basename = 'testdtype' | ||
actual = self.get_exceldf(basename, ext) | ||
|
||
expected = DataFrame({ | ||
'a': [1, 2, 3, 4], | ||
'b': [2.5, 3.5, 4.5, 5.5], | ||
'c': [1, 2, 3, 4], | ||
'd': [1.0, 2.0, np.nan, 4.0]}).reindex( | ||
columns=['a', 'b', 'c', 'd']) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. just specify |
||
|
||
tm.assert_frame_equal(actual, expected) | ||
|
||
actual = self.get_exceldf(basename, ext, | ||
dtype={'a': 'float64', | ||
'b': 'float32', | ||
'c': str, | ||
'd': str}) | ||
|
||
expected['a'] = expected['a'].astype('float64') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. move this higher (by the expected), you can simply construct things directly by using e.g. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean here.
First of all, this is copy-paste from the previous test, which was added for #8212 Do you mean to do expected = DataFrame({'a': Series([1,2,3,4], dtype='float64'),
'b': Series([2.5,3.5,4.5,5.5], dtype='float32'),
...}) ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes exactly. If you need things like '001', then just do it that way, e.g. |
||
expected['b'] = expected['b'].astype('float32') | ||
expected['c'] = ['001', '002', '003', '004'] | ||
expected['d'] = ['1', '2', '', '4'] | ||
tm.assert_frame_equal(actual, expected) | ||
|
||
with pytest.raises(ValueError): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What's the error message? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
which is the same as in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
What do you mean? |
||
actual = self.get_exceldf(basename, ext, dtype={'d': 'int64'}) | ||
|
||
def test_reading_all_sheets(self, ext): | ||
# Test reading all sheetnames by setting sheetname to None, | ||
# Ensure a dict is returned. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -162,6 +162,16 @@ def test_to_csv_compression(self, compression): | |
index_col=0, | ||
squeeze=True)) | ||
|
||
def test_from_csv_dtype_str(self): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
# GH20377 | ||
s = Series([1, 2, np.nan, 4], index=['A', 'B', 'C', 'D'], | ||
name='X') | ||
with ensure_clean() as filename: | ||
s.to_csv(filename, header=True) | ||
rs = pd.read_csv(filename, dtype=str) | ||
expected = Series(['1.0', '2.0', '', '4.0'], name=s.name) | ||
assert_series_equal(rs.X, expected) | ||
|
||
|
||
class TestSeriesIO(TestData): | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are including some other changes here, pls rebase on master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not mine. I deleted it by mistake and added it back.
You can check master here https://github.com/pandas-dev/pandas/blob/master/doc/source/whatsnew/v0.23.0.txt#L985
However, even after rebasing, I keep getting this conflict
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you rebased off master and resolved the conflicts in the rebase then it should be ok. Did you fetch the current master before rebasing?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i did now