
BUG: to_numeric casts floats incorrectly #36502

Closed
@dcsaba89

Description

  • [ X ] I have checked that this issue has not already been reported.

  • [ X ] I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

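import pandas as pd

# Column A holds floats written as strings
df = pd.DataFrame({'A': ['0.09', '0.63', '0.121909', '0.117863']})
df['B'] = pd.to_numeric(df['A'])   # conversion under test
df['C'] = df['A'].astype(float)    # reference conversion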

Problem description

When a DataFrame column containing floats as strings is passed to pd.to_numeric, some of the strings are converted to a float that differs in the last digits from the result of astype(float).

In the above case df['B'] and df['C'] should have exactly the same values, but they differ in the trailing decimals:

df['B'] = (0, 0.09) (1, 0.63) (2, 0.12190899999999999) (3, 0.11786300000000001)
df['C'] = (0, 0.09) (1, 0.63) (2, 0.121909) (3, 0.117863)

df['C'] is correct but df['B'] is incorrect.
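
To put a number on "differ in some decimals", the values printed above can be compared at the bit level. The sketch below uses only the standard library and the float values copied from the output above; the helper names `bits` and `ulp_distance` are made up for this illustration and are not part of pandas. It assumes the printed values are exact round-trip reprs of the stored doubles.

```python
import struct

def bits(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of x as a signed integer."""
    return struct.unpack('<q', struct.pack('<d', x))[0]

def ulp_distance(a: float, b: float) -> int:
    """Signed number of representable doubles between b and a.

    Only valid for finite inputs of the same sign, which holds here.
    """
    return bits(a) - bits(b)

# (value reported for df['B'], value reported for df['C'])
reported = [
    (0.09, 0.09),
    (0.63, 0.63),
    (0.12190899999999999, 0.121909),
    (0.11786300000000001, 0.117863),
]

for b, c in reported:
    print(f"C={c!r}  B={b!r}  B-C={ulp_distance(b, c)} ULP")
```

Under that assumption, the two mismatching rows are each exactly one unit in the last place away from the float(str) result, one below and one above, which points at the rounding of the final bit in the string parser rather than at a larger precision loss.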

The above issue does not occur when the same input is passed as a list:
w = pd.to_numeric(['0.09', '0.63', '0.121909', '0.117863'])

[0.09 0.63 0.121909 0.117863]
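
A small check along these lines may help others confirm whether their installation is affected. It is only a sketch: it treats Python's built-in float() as the reference, and `mismatches` is a hypothetical helper written for this report. Whether any mismatch shows up is expected to depend on the pandas version and platform (the report above comes from pandas 1.1.2 on 32-bit Python for Windows).

```python
import pandas as pd

strings = ['0.09', '0.63', '0.121909', '0.117863']

# Reference: Python's own string-to-float conversion
expected = [float(s) for s in strings]

# Same input, once wrapped in a Series and once as a plain list
from_series = pd.to_numeric(pd.Series(strings)).tolist()
from_list = pd.to_numeric(strings).tolist()

def mismatches(values):
    """Return (string, converted value) pairs that differ from float(string)."""
    return [(s, v) for s, v, e in zip(strings, values, expected) if v != e]

print('Series input:', mismatches(from_series))
print('list input:  ', mismatches(from_list))
```

An empty list for both inputs means the conversions agree with float() on that setup.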


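Expected Output

pd.to_numeric(df['A']) should give the same values as df['A'].astype(float):

df['B'] = (0, 0.09) (1, 0.63) (2, 0.121909) (3, 0.117863)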

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 2a7d332
python : 3.8.5.final.0
python-bits : 32
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.1.2
