Skip to content

BUG: Fix replace for different dtype equal value #34878

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 15 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -339,6 +339,7 @@ Groupby/resample/rolling
- Bug in :meth:`DataFrameGroupby.tshift` failing to raise ``ValueError`` when a frequency cannot be inferred for the index of a group (:issue:`35937`)
- Bug in :meth:`DataFrame.groupby` does not always maintain column index name for ``any``, ``all``, ``bfill``, ``ffill``, ``shift`` (:issue:`29764`)
- Bug in :meth:`DataFrameGroupBy.apply` raising error with ``np.nan`` group(s) when ``dropna=False`` (:issue:`35889`)
- Bug where :meth:`DataFrame.replace` and :meth:`Series.replace` would fail when replacement value was of a different dtype but equal to values in the original object (:issue:`34871`).
-

Reshaping
Expand Down
10 changes: 7 additions & 3 deletions pandas/core/internals/blocks.py
Original file line number Diff line number Diff line change
Expand Up @@ -714,9 +714,13 @@ def replace(
# retry
if not self._can_hold_element(to_replace):
if not isinstance(to_replace, list):
if inplace:
return [self]
return [self.copy()]
return self.astype(object).replace(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we only want to do this if these are different types, not generally. please show asv's for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the if not self._can_hold_element(to_replace) check is making sure we have different types.

Not seeing any performance regressions from the change (at least in the replace benchmarks):

(pandas-dev) > asv continuous -f 1.1 upstream/master replace-diff-dtype -b ^replace
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt.
·· Installing 1e014777 <replace-diff-dtype> into conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
· Running 12 total benchmarks (2 commits * 1 environments * 6 benchmarks)
[  0.00%] · For pandas commit 9114b4bc <master> (round 1/2):
[  0.00%] ·· Building for conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[  0.00%] ·· Benchmarking conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[  4.17%] ··· Running (replace.Convert.time_replace--)....
[ 20.83%] ··· Running (replace.ReplaceList.time_replace_list--)..
[ 25.00%] · For pandas commit 1e014777 <replace-diff-dtype> (round 1/2):
[ 25.00%] ·· Building for conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 25.00%] ·· Benchmarking conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 29.17%] ··· Running (replace.Convert.time_replace--)....
[ 45.83%] ··· Running (replace.ReplaceList.time_replace_list--)..
[ 50.00%] · For pandas commit 1e014777 <replace-diff-dtype> (round 2/2):
[ 50.00%] ·· Benchmarking conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 54.17%] ··· replace.Convert.time_replace                                                                                                                                                                 ok
[ 54.17%] ··· ============= ============ ============
              --                   replace_data      
              ------------- -------------------------
               constructor   Timestamp    Timedelta  
              ============= ============ ============
                DataFrame     98.6±3ms    95.2±0.5ms 
                  Series     74.0±0.3ms    74.6±1ms  
              ============= ============ ============

[ 58.33%] ··· replace.FillNa.time_fillna                                                                                                                                                                   ok
[ 58.33%] ··· ========= =============
               inplace               
              --------- -------------
                 True    1.04±0.01ms 
                False     3.01±0.3ms 
              ========= =============

[ 62.50%] ··· replace.FillNa.time_replace                                                                                                                                                                  ok
[ 62.50%] ··· ========= =============
               inplace               
              --------- -------------
                 True    1.10±0.02ms 
                False    4.02±0.07ms 
              ========= =============

[ 66.67%] ··· replace.ReplaceDict.time_replace_series                                                                                                                                                      ok
[ 66.67%] ··· ========= ============
               inplace              
              --------- ------------
                 True    4.27±0.02s 
                False    4.24±0.04s 
              ========= ============

[ 70.83%] ··· replace.ReplaceList.time_replace_list                                                                                                                                                        ok
[ 70.83%] ··· ========= ============
               inplace              
              --------- ------------
                 True    51.6±0.6μs 
                False     405±5ms   
              ========= ============

[ 75.00%] ··· replace.ReplaceList.time_replace_list_one_match                                                                                                                                              ok
[ 75.00%] ··· ========= ==========
               inplace            
              --------- ----------
                 True    89.2±3ms 
                False    467±4ms  
              ========= ==========

[ 75.00%] · For pandas commit 9114b4bc <master> (round 2/2):
[ 75.00%] ·· Building for conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 75.00%] ·· Benchmarking conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 79.17%] ··· replace.Convert.time_replace                                                                                                                                                                 ok
[ 79.17%] ··· ============= =========== ===========
              --                  replace_data     
              ------------- -----------------------
               constructor   Timestamp   Timedelta 
              ============= =========== ===========
                DataFrame     102±4ms     103±3ms  
                  Series      80.0±4ms    83.1±4ms 
              ============= =========== ===========

[ 83.33%] ··· replace.FillNa.time_fillna                                                                                                                                                                   ok
[ 83.33%] ··· ========= =============
               inplace               
              --------- -------------
                 True    1.10±0.04ms 
                False     3.04±0.1ms 
              ========= =============

[ 87.50%] ··· replace.FillNa.time_replace                                                                                                                                                                  ok
[ 87.50%] ··· ========= =============
               inplace               
              --------- -------------
                 True    1.20±0.04ms 
                False     4.26±0.2ms 
              ========= =============

[ 91.67%] ··· replace.ReplaceDict.time_replace_series                                                                                                                                                      ok
[ 91.67%] ··· ========= ============
               inplace              
              --------- ------------
                 True    4.26±0.01s 
                False    4.24±0.01s 
              ========= ============

[ 95.83%] ··· replace.ReplaceList.time_replace_list                                                                                                                                                        ok
[ 95.83%] ··· ========= ==========
               inplace            
              --------- ----------
                 True    49.6±1μs 
                False    412±6ms  
              ========= ==========

[100.00%] ··· replace.ReplaceList.time_replace_list_one_match                                                                                                                                              ok
[100.00%] ··· ========= ==========
               inplace            
              --------- ----------
                 True    89.3±2ms 
                False    464±30ms 
              ========= ==========


BENCHMARKS NOT SIGNIFICANTLY CHANGED.

to_replace=to_replace,
value=value,
inplace=inplace,
regex=regex,
convert=convert,
)

to_replace = [x for x in to_replace if self._can_hold_element(x)]
if not len(to_replace):
Expand Down
7 changes: 7 additions & 0 deletions pandas/tests/frame/methods/test_replace.py
Original file line number Diff line number Diff line change
Expand Up @@ -1492,6 +1492,13 @@ def test_replace_with_duplicate_columns(self, replacement):

tm.assert_frame_equal(result, expected)

def test_replace_equal_value_different_dtype(self):
# https://github.com/pandas-dev/pandas/issues/34871
df = pd.DataFrame([1, 2, 3])
result = df.replace(1.0, 0)
expected = pd.DataFrame([0, 2, 3])
tm.assert_frame_equal(result, expected)

@pytest.mark.xfail(
reason="replace() changes dtype from period to object, see GH34871", strict=True
)
Expand Down
7 changes: 7 additions & 0 deletions pandas/tests/series/methods/test_replace.py
Original file line number Diff line number Diff line change
Expand Up @@ -442,6 +442,13 @@ def test_replace_extension_other(self):
ser = pd.Series(pd.array([1, 2, 3], dtype="Int64"))
ser.replace("", "") # no exception

def test_replace_equal_value_different_dtype(self):
# https://github.com/pandas-dev/pandas/issues/34871
ser = pd.Series([1, 2, 3])
result = ser.replace(1.0, 0)
expected = pd.Series([0, 2, 3])
tm.assert_series_equal(result, expected)

def test_replace_with_compiled_regex(self):
# https://github.com/pandas-dev/pandas/issues/35680
s = pd.Series(["a", "b", "c"])
Expand Down