
BUG: Unexpected change of behavior on DataFrame type float32 between pandas versions. #46552

Open
@carlosan1708

Description


Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np
import json

def check_error(x, r):
    df = pd.DataFrame(data=x, dtype="float32")
    for i in df.index:
        for j in range(len(list(df.iloc[i]))):
            # This is where the versions differ: pandas 1.3.5 converts the str to float32 here,
            # while pandas 1.4.0 no longer does, and the DataFrame keeps its original value.
            df.iloc[i][j] = r[j][int(df.iloc[i][j])]
    return df


x = [[0, 0.], [1., 0.], [0., 1.], [1., 1.]]
r = np.array([["10", "20"], ["50", "40"]])

result = check_error(x, r).to_json()
result_json = json.loads(result)
## Pandas 1.3.5
if pd.__version__=="1.3.5":
    expected_json_pandas_1_3_5 = json.loads("""
    {"0":{"0":10.0,"1":20.0,"2":10.0,"3":20.0},"1":{"0":50.0,"1":50.0,"2":40.0,"3":40.0}}
    """)
    print(sorted(result_json.items()) == sorted(expected_json_pandas_1_3_5.items()))
else:
    expected_json_pandas_1_4_0 = json.loads("""
    {"0": {"0": 0.0, "1": 1.0, "2": 0.0, "3": 1.0}, "1": {"0": 0.0, "1": 0.0, "2": 1.0, "3": 1.0}}
    """)
    ## Pandas 1.4.0
    print(sorted(result_json.items()) == sorted(expected_json_pandas_1_4_0.items()))
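The reproducer writes through chained indexing (`df.iloc[i][j] = ...`). That call assigns into the row Series returned by `df.iloc[i]`, not into `df` directly, so whether the write reaches `df` depends on pandas internals. The sketch below is a version-independent way to write the same loop. It assumes the intent is to store the looked-up strings as float32 numbers. `fill_from_lookup_iat` is a hypothetical name. It uses a single positional indexer (`.iat[i, j]`) and converts each string explicitly with `np.float32`, so it does not rely on implicit str-to-float coercion.

```python
import numpy as np
import pandas as pd


def fill_from_lookup_iat(x, r):
    """Hypothetical helper: replace each 0/1 cell in column j with r[j][cell] as float32."""
    df = pd.DataFrame(data=x, dtype="float32")
    for i in range(df.shape[0]):
        for j in range(df.shape[1]):
            # Read the current cell and use it as an index into row j of the lookup table.
            key = int(df.iat[i, j])
            # One indexer call writes into df itself (no intermediate row Series),
            # and the str -> float32 conversion is explicit instead of implicit.
            df.iat[i, j] = np.float32(r[j][key])
    return df


x = [[0, 0.], [1., 0.], [0., 1.], [1., 1.]]
r = np.array([["10", "20"], ["50", "40"]])
print(fill_from_lookup_iat(x, r).to_json())
```

Each cell is read before it is overwritten, and no other cell depends on it, so updating `df` in place inside the loop is safe here.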

Issue Description

There is a change of behavior that is not mentioned in the documentation, and it could break existing libraries: assigning values into a DataFrame with dtype float32 no longer behaves the same way. I haven't checked whether DataFrames of other dtypes are affected in the same way.

In short, when assigning into a float32 DataFrame, pandas 1.3.5 and 1.4.0 behave differently. In 1.3.5 the str value is converted to float32 and stored. In 1.4.0 this conversion no longer happens, and the assignment silently has no effect on the DataFrame. No error is raised either, so, as in the example, the same code produces DataFrames with different contents depending on the pandas version.
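Another way to sidestep per-cell assignment entirely is to convert the string lookup table to float32 once and build the result with NumPy fancy indexing. This is a minimal sketch, assuming (as in the reproducer) that every cell holds 0 or 1 and that row `j` of `r` is the lookup table for column `j`. `fill_from_lookup_vectorized` is a hypothetical name. It should give the same values that pandas 1.3.5 produced, independent of how a given pandas version handles chained assignment.

```python
import numpy as np
import pandas as pd


def fill_from_lookup_vectorized(x, r):
    """Hypothetical helper: result[i, j] = float32(r[j][x[i][j]]), no per-cell writes."""
    df = pd.DataFrame(data=x, dtype="float32")
    # Explicit, one-time str -> float32 conversion of the lookup table.
    lookup = np.asarray(r).astype(np.float32)
    # Cell values (0/1) become integer indices into the lookup table.
    idx = df.to_numpy().astype(np.intp)
    # Column numbers with shape (1, ncols); broadcasting against idx (nrows, ncols)
    # gives values[i, j] = lookup[j, idx[i, j]].
    cols = np.arange(df.shape[1])[np.newaxis, :]
    values = lookup[cols, idx]
    return pd.DataFrame(values, index=df.index, columns=df.columns)
```

Building a new frame from a float32 array keeps the float32 dtype, and no value is ever written into an intermediate object.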

Expected Behavior

The change should be called out in the 1.4.0 documentation, or an error should be raised to alert users about incompatible types during assignment. It may also be worth checking whether DataFrames of other dtypes are affected.
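Until pandas raises an error itself, calling code could enforce the check explicitly. The sketch below assumes that "incorrect type" means any value that is not a real number. `set_numeric_cell` is a hypothetical helper, not a pandas API. It refuses strings instead of relying on implicit coercion and performs the write with a single `.iat` call.

```python
import numbers

import numpy as np
import pandas as pd


def set_numeric_cell(df, i, j, value):
    """Hypothetical helper: positional write that rejects non-numeric values."""
    # Python ints/floats and NumPy integer/floating scalars pass; str (including np.str_) does not.
    if not isinstance(value, (numbers.Real, np.integer, np.floating)):
        raise TypeError(
            f"refusing to store {type(value).__name__} {value!r} in column {df.columns[j]!r}"
        )
    df.iat[i, j] = value


df = pd.DataFrame([[0.0, 0.0]], dtype="float32")
set_numeric_cell(df, 0, 1, np.float32("40"))  # explicit conversion: accepted
# set_numeric_cell(df, 0, 0, "10")            # would raise TypeError
```

With this check, the str values from the reproducer fail loudly on every pandas version instead of being silently converted or silently dropped.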

Installed Versions

Pandas 1.3.5

Pandas 1.4.0

Pandas 1.4.1
