BUG: Parameter converters when using the read function. #59026

Open
@Thanaraklee

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np  # needed for np.nan below
import pandas as pd

def a_cleaning(value: object) -> object:
    if isinstance(value, str):
        return value.replace(',','')
    else:
        return value

data = pd.DataFrame({
    'A':['1,200',np.nan,'400','200',np.nan]
})
data.to_csv('data.csv',index=False)

# converters
df = pd.read_csv('data.csv',
                  converters={
                      'A': a_cleaning
                  })
print('converters:')
display(df)

# apply
print('apply:')
df = pd.read_csv('data.csv')
df['A'] = df['A'].apply(a_cleaning)
display(df)

Issue Description

When I pass my function through converters, the missing values come back as '' (empty strings). When I read the file without converters and then call the same function through apply, the missing values stay NaN. I would like to understand why the two paths give different results.

My function:

def a_cleaning(value: object) -> object:
    if isinstance(value, str):
        return value.replace(',','')
    else:
        return value

My Dataframe:

data = pd.DataFrame({
    'A':['1,200',np.nan,'400','200',np.nan]
})
data.to_csv('data.csv',index=False)

My code when using converters:

df = pd.read_csv('data.csv',
                  converters={
                      'A': a_cleaning
                  })
display(df)

Result:
[image: screenshot of the resulting DataFrame]

My code when using apply:

df = pd.read_csv('data.csv')
df['A'] = df['A'].apply(a_cleaning)
display(df)

Result:
[image: screenshot of the resulting DataFrame]

Why are the results different?
I'm not sure if this issue will occur with other read functions. I've only tested it with read_csv so far.
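
One way to narrow this down is to record what the converter actually receives. The sketch below is not part of the original report: recording_converter and seen are hypothetical names, and it reads from an in-memory buffer instead of data.csv. If the recorded values are all strings, including '' for the missing rows, that would suggest the converter is called on the raw field text before any missing-value detection, which would explain the difference from apply. That explanation is an assumption drawn from the output, not a statement about pandas internals.

```python
import io

import numpy as np
import pandas as pd

data = pd.DataFrame({'A': ['1,200', np.nan, '400', '200', np.nan]})
# With no path argument, to_csv returns the CSV text as a string.
csv_text = data.to_csv(index=False)

seen = []  # hypothetical: collects every value handed to the converter


def recording_converter(value):
    """Pass the value through unchanged, but remember what was received."""
    seen.append(value)
    return value


df = pd.read_csv(io.StringIO(csv_text), converters={'A': recording_converter})

# Show the raw values and their types as the converter saw them.
print([(repr(v), type(v).__name__) for v in seen])
print(df['A'].isna().tolist())
```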

Expected Behavior

It should produce the same result as using apply.
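
Until the behavior is clarified, a possible workaround is a converter that maps empty fields to NaN itself. This is a minimal sketch, assuming the converter receives '' for missing fields as observed above; a_cleaning_nan_aware is a hypothetical variant of a_cleaning, not an existing pandas helper.

```python
import io

import numpy as np
import pandas as pd


def a_cleaning_nan_aware(value: object) -> object:
    """Hypothetical variant of a_cleaning that turns empty fields into NaN."""
    if isinstance(value, str):
        cleaned = value.replace(',', '')
        # Assumption: a missing field arrives here as an empty string.
        return np.nan if cleaned == '' else cleaned
    return value


data = pd.DataFrame({'A': ['1,200', np.nan, '400', '200', np.nan]})
csv_text = data.to_csv(index=False)  # CSV text as a string, no file needed

df = pd.read_csv(io.StringIO(csv_text), converters={'A': a_cleaning_nan_aware})
print(df)
print(df['A'].isna().tolist())
```

If the end goal is numeric values rather than cleaned strings, the thousands parameter of read_csv (for example thousands=',') may be a simpler option, though I have not compared it here.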

Installed Versions

INSTALLED VERSIONS

commit : d9cdd2e
python : 3.10.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.133+
Version : #1 SMP Tue Dec 19 13:14:11 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : POSIX
LANG : C.UTF-8
LOCALE : None.None

pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 69.0.3
pip : 23.3.2
Cython : 3.0.8
pytest : 8.2.1
hypothesis : None
sphinx : None
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 5.2.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.20.0
pandas_datareader : 0.10.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.3.1
gcsfs : 2024.3.1
matplotlib : 3.7.5
numba : 0.59.1
numexpr : 2.10.0
odfpy : None
openpyxl : 3.1.3
pandas_gbq : None
pyarrow : 14.0.2
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : 2024.3.1
scipy : 1.11.4
sqlalchemy : 2.0.25
tables : 3.9.2
tabulate : 0.9.0
xarray : 2024.5.0
xlrd : None
zstandard : 0.19.0
tzdata : 2024.1
qtpy : None
pyqt5 : None

Metadata

Labels

  • Bug
  • IO CSV (read_csv, to_csv)
  • Needs Triage (Issue that has not been reviewed by a pandas team member)
