DataFrame.replace(dict) has weird behaviour in some cases #5338

Closed
@jankatins

Description

import pandas as pd
df = pd.DataFrame({"color":[1,2,3,4]})
print(df)
   color
0      1
1      2
2      3
3      4
print(df.replace({"color":{"1":"2","3":"4",}})) # works but shouldn't?
  color
0     2
1     2
2     4
3     4
print(df.replace({"color":{"1":"2","2":"3","3":"4","4":"5"}})) # strange
  color
0     2
1     4
2     3
3     5
print(df.replace({"color":{1:"2",2:"3",3:"4",4:"5"}})) # works by replacing each cell once
  color
0     2
1     3
2     4
3     5

df = pd.DataFrame({"color":["1","2","3","4"]})
print(df)
  color
0     1
1     2
2     3
3     4
print(df.replace({"color":{"1":"2","3":"4",}})) # works
  color
0     2
1     2
2     4
3     4
print(df.replace({"color":{"1":"2","2":"3","3":"4","4":"5"}})) # does not work: some cells are replaced twice
  color
0     3
1     3
2     5
3     5
print(df.replace({"color":{1:"2",2:"3",3:"4",4:"5"}})) # works as expected: shouldn't replace anything!
  color
0     1
1     2
2     3
3     4

So, my expected behaviour would be:

  • don't replace a cell if the type of the cell does not match the type of the key (as is the case when a string cell is matched against an int key)
  • if a value of a cell is replaced, the cell shouldn't be replaced a second time in the same replace call
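
The two expectations above can be sketched in plain Python. This is only an illustration of the semantics I would expect, not how pandas implements replace; the helper name replace_once is made up for this example. A plain dict lookup already gives both properties: a str key never equals an int cell, and each cell is looked up exactly once, so a replaced value is never fed back into the mapping.

```python
def replace_once(values, mapping):
    """Return a new list where each cell is replaced at most once.

    Hypothetical helper for illustration only. A cell is replaced only if
    it is equal to a key of `mapping`; "1" and 1 are different keys.
    """
    return [mapping.get(cell, cell) for cell in values]


chain = {"1": "2", "2": "3", "3": "4", "4": "5"}

# String cells, string keys: every cell moves exactly one step.
print(replace_once(["1", "2", "3", "4"], chain))  # ['2', '3', '4', '5']

# Int cells, string keys: nothing matches, so nothing is replaced.
print(replace_once([1, 2, 3, 4], chain))  # [1, 2, 3, 4]
```

On a pandas column the same lookup could be applied with Series.map, for example df["color"].map(lambda cell: mapping.get(cell, cell)). One caveat of the dict approach: Python treats 1, 1.0 and True as the same dict key, so "type must match" only holds as far as Python equality goes.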

I found the problem when I tried to map string values to colors and got blown-up color values: for example, {"3":"#123456","4":"#000000"} should not convert "3" into "#123#00000056".
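
The blown-up value can be reproduced outside pandas under one assumption: that the replacements are applied one after another as substring substitutions, so the "4" inside the freshly inserted "#123456" is itself replaced. Whether this is what pandas does internally is a guess; the sketch below (with the made-up helper chained_substring_replace) only shows that this mechanism yields exactly the value I saw.

```python
def chained_substring_replace(value, pairs):
    """Apply each (old, new) pair in order as a substring replacement.

    Hypothetical helper: it mimics the suspected faulty behaviour,
    where the output of one replacement is fed into the next one.
    """
    for old, new in pairs:
        value = value.replace(old, new)
    return value


pairs = [("3", "#123456"), ("4", "#000000")]

# "3" -> "#123456", then the "4" inside it -> "#000000"
print(chained_substring_replace("3", pairs))  # #123#00000056

# Expected behaviour: a single whole-cell lookup, no second pass
print(dict(pairs).get("3", "3"))  # #123456
```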

Edit: inserted the string cell cases and my expected behaviour, and deleted the initial comments which had these examples.

Labels: API Design, Bug, Missing-data