
BUG: read_csv does not raise UnicodeDecodeError on non utf-8 characters #39450

Closed
@DrGFreeman

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.
    Bug exists in 1.2.1, not in <=1.2.0

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

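from pathlib import Path

import pandas as pd

file = Path("non-utf8.csv")
file.write_bytes(b"\xe4\na\n1")  # b"\xe4" is "ä" in Latin-1/cp1252 but is not valid UTF-8

df = pd.read_csv(file)  # pandas <= 1.2.0 raises UnicodeDecodeError here; 1.2.1 does not
print(df)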

Output:

   �
0  a
1  1

Problem description

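In pandas 1.2.1, reading a CSV file that contains bytes which are not valid UTF-8 does not raise a UnicodeDecodeError. Instead, the invalid byte is silently turned into the replacement character (�), as the column header in the output above shows. In pandas 1.2.0 and earlier, a UnicodeDecodeError is raised, which lets application code catch and handle the exception.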
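The two behaviors can be illustrated with the standard library alone. The sketch below is a hedged illustration, not pandas' actual code path: it decodes the reproducer's bytes with strict UTF-8 decoding, which raises UnicodeDecodeError like pandas 1.2.0, and with `errors="replace"`, which produces a U+FFFD first line matching the `�` header in the 1.2.1 output. That pandas 1.2.1 decodes with a replacing error handler internally is an assumption suggested by the output, not something confirmed in this issue. The helper names `decode_strict` and `decode_replace` are hypothetical.

```python
# Same bytes as the reproducer: 0xE4 is "ä" in Latin-1/cp1252, but in UTF-8
# it starts a 3-byte sequence, and the following b"\n" is not a valid
# continuation byte, so the input is not valid UTF-8.
data = b"\xe4\na\n1"


def decode_strict(raw: bytes) -> str:
    """Hypothetical helper: decode as UTF-8, raising on invalid bytes."""
    return raw.decode("utf-8")  # errors="strict" is the default


def decode_replace(raw: bytes) -> str:
    """Hypothetical helper: decode as UTF-8, substituting U+FFFD."""
    return raw.decode("utf-8", errors="replace")


try:
    decode_strict(data)
except UnicodeDecodeError as exc:
    # Behavior the reporter expects (pandas <= 1.2.0).
    print("strict decoding failed:", exc.reason)

# Behavior resembling pandas 1.2.1: the first "line" (the CSV header)
# becomes the replacement character instead of raising.
print("replace decoding:", decode_replace(data).splitlines())
```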

Expected Output

Expect a UnicodeDecodeError to be raised.
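Until the behavior is settled, application code that depends on the exception could validate the encoding itself before handing the data to pandas. The following is a minimal sketch under that assumption; `read_utf8_text` is a hypothetical helper, not part of pandas. It reads the raw bytes and decodes them strictly, so invalid UTF-8 raises UnicodeDecodeError regardless of the pandas version.

```python
import tempfile
from pathlib import Path


def read_utf8_text(path: Path) -> str:
    """Hypothetical helper: return the file contents as text, raising
    UnicodeDecodeError on any byte sequence that is not valid UTF-8."""
    # bytes.decode uses errors="strict" by default, so invalid input raises
    # instead of being replaced with U+FFFD.
    return path.read_bytes().decode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    bad = Path(tmp) / "non-utf8.csv"
    bad.write_bytes(b"\xe4\na\n1")  # same bytes as the reproducer
    try:
        read_utf8_text(bad)
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first invalid byte.
        print(f"{bad.name}: not valid UTF-8 at byte {exc.start}")
```

If the check passes, the validated text can be parsed from memory, for example with `pd.read_csv(io.StringIO(text))`, since `read_csv` accepts file-like objects. The trade-off is that the whole file is read into memory once before parsing.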

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : 9d598a5e1eee26df95b3910e3f2934890d062caa
python           : 3.8.5.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
Version          : 10.0.17763
machine          : AMD64
processor        : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : French_Canada.1252

pandas           : 1.2.1
numpy            : 1.19.2
pytz             : 2020.5
dateutil         : 2.8.1
pip              : 20.3.3
setuptools       : 52.0.0.post20210125
Cython           : None
pytest           : 6.2.2
hypothesis       : None
sphinx           : 3.4.3
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.19.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : 0.8.3
fastparquet      : None
gcsfs            : None
matplotlib       : 3.3.2
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 0.15.1
pyxlsb           : None
s3fs             : None
scipy            : 1.5.2
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : 0.51.2

Metadata

Labels

Bug, IO CSV (read_csv, to_csv), IO Data (IO issues that don't fit into a more specific label), Needs Discussion (requires discussion from core team before further action)
