BUG: Error message in read_csv misleading when using decimal="," #59299

Open
@behrenhoff

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

from io import BytesIO
import pandas as pd

file_good = b"a;b\na;1,20\nb;22,3\n"
file_bad = file_good + b"c;1.234,56\n"
file_ints = b"a;b\na;1\nb;1.234,56\n"

# OK
df1 = pd.read_csv(BytesIO(file_good), sep=";", decimal=",", dtype={"b": float})

# NOT OK!
# raises: ValueError: could not convert string to float: '1,20'
# should raise: ValueError: could not convert string to float: '1.234,56'
df2 = pd.read_csv(BytesIO(file_bad), sep=";", decimal=",", dtype={"b": float})

# OK, correctly raises ValueError: could not convert string to float: '1.234,56'
df3 = pd.read_csv(BytesIO(file_ints), sep=";", decimal=",", dtype={"b": float})

Issue Description

When reading a CSV file with a comma as the decimal separator but without specifying the thousands separator, the error message from read_csv is misleading: it reports the wrong value.

In the example, someone added a number with a "." as thousands separator in file_bad. If previous rows contain correct numbers with a comma as decimal separator, the later line with the "." causes a ValueError that is reported for an earlier, valid value ('1,20') instead of the offending one. This should never happen: the offending line is the new one with "." in it. Interestingly, if none of the previous values contain a decimal comma (case file_ints), the error message is correct.

In my case, I had a CSV file with 600k lines. The real error was in line 550k, while the ValueError pointed me to a line somewhere around 1k.
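Until the message is fixed, one way to locate the real offending rows in a large file is to read the column as strings and test each value separately. The following is only a rough sketch: `find_unparseable` is a hypothetical helper (not part of pandas), and the reported line numbers assume a single header line with no blank lines or multi-line quoted fields in the file.

```python
from io import BytesIO

import pandas as pd


def find_unparseable(source, column, sep=";", decimal=","):
    """Return (file line number, raw value) for values that are not valid floats.

    Hypothetical helper for illustration only; not a pandas API.
    """
    # Read the column as plain strings so nothing is converted yet.
    raw = pd.read_csv(source, sep=sep, dtype=str, usecols=[column])[column]
    # Swap the decimal separator for "." and let to_numeric flag failures as NaN.
    normalized = raw.str.replace(decimal, ".", regex=False)
    parsed = pd.to_numeric(normalized, errors="coerce")
    # Keep only values that were present but could not be parsed.
    bad = raw[parsed.isna() & raw.notna()]
    # +2: one for the header line, one to switch from 0-based to 1-based.
    return [(int(idx) + 2, value) for idx, value in bad.items()]


file_bad = b"a;b\na;1,20\nb;22,3\nc;1.234,56\n"
offenders = find_unparseable(BytesIO(file_bad), "b")
print(offenders)
```

For the 600k-line file this would have pointed directly at the rows containing a "." instead of at the first value with a decimal comma.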

My issue was quickly solved by adding thousands=".", but it took me several minutes to find the offending line. This can be hard in large CSV files, which is why correct ValueErrors are important.
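For reference, a minimal sketch of that workaround applied to the data from file_bad above. It assumes that every "." in the column really is a thousands separator:

```python
from io import BytesIO

import pandas as pd

# Same content as file_bad in the reproducible example.
file_bad = b"a;b\na;1,20\nb;22,3\nc;1.234,56\n"

# thousands="." tells the parser to drop "." before converting to float.
df = pd.read_csv(
    BytesIO(file_bad),
    sep=";",
    decimal=",",
    thousands=".",
    dtype={"b": float},
)
print(df)
```

With both separators declared, the third row is read as a float instead of raising.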

Expected Behavior

file_bad should raise ValueError: could not convert string to float: '1.234,56', as in case 3 (file_ints).

Installed Versions

Replace this line with the output of pd.show_versions()

Labels

Bug; Error Reporting (Incorrect or improved errors from pandas); IO CSV (read_csv, to_csv)
