pandas 1.0.0: read_csv() is broken when the file is opened with open(buffering=0) #31575

Closed
@paihu

Description

Code Sample

import os
import tempfile

import pandas

# Write a small Shift-JIS encoded CSV to a temporary file.
fname = ""
with tempfile.NamedTemporaryFile(delete=False, mode="w+", encoding="shift-jis") as f:
    f.write("てすと\nbar")
    fname = f.name
print(fname)

try:
    # Case 1: text-mode handle that already decodes Shift-JIS.
    with open(fname, mode="r", encoding="shift-jis") as f:
        result = pandas.read_csv(f)
        print("read shift-jis")
        print(result)

    # Case 2: text-mode Shift-JIS handle, but read_csv is given encoding="utf-8".
    with open(fname, mode="r", encoding="shift-jis") as f:
        result = pandas.read_csv(f, encoding="utf-8")
        print("open shift-jis file and read_csv with encoding: utf-8")
        print(result)

    # Case 3: buffered binary handle; read_csv does the decoding.
    with open(fname, mode="rb") as f:
        result = pandas.read_csv(f, encoding="shift-jis")
        print("open binary with buffered and read_csv with encoding: shift-jis")
        print(result)

    # Case 4: unbuffered binary handle (a RawIOBase); the case reported as broken.
    with open(fname, mode="rb", buffering=0) as f:
        result = pandas.read_csv(f, encoding="shift-jis")
        print("open binary without buffered and read_csv with encoding: shift-jis")
        print(result)
except Exception as e:
    print(e)

os.unlink(fname)

Problem description

With pandas 1.0.0, this sample does not work. With pandas 0.25.3, it works fine.

When the file is opened with the buffering=0 option, f is a RawIOBase object. In this case the encoding option seems to be ignored.

https://github.com/pandas-dev/pandas/pull/30771/files#diff-0335ae9037e4eb4747749a9f94cffd32R641

https://github.com/pandas-dev/pandas/pull/30771/files#diff-777d7549579ddc0c6e67596ad87e0d27R1879
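
The difference between the buffered and unbuffered binary cases can be checked with the standard library alone, without pandas. The following is a minimal sketch; the helper name `describe_handle` is hypothetical and not part of pandas or of the code sample. It only shows which kind of object `open()` returns for each `buffering` setting, not how pandas treats it.

```python
import io
import os
import tempfile


def describe_handle(f):
    """Hypothetical helper: classify a file object by its io base class."""
    if isinstance(f, io.RawIOBase):
        return "raw"
    if isinstance(f, io.BufferedIOBase):
        return "buffered"
    if isinstance(f, io.TextIOBase):
        return "text"
    return "unknown"


# Create a small Shift-JIS file, as in the code sample.
with tempfile.NamedTemporaryFile(delete=False, mode="w+", encoding="shift-jis") as tmp:
    tmp.write("てすと\nbar")
    fname = tmp.name

try:
    # Default buffering: open() returns a buffered reader.
    with open(fname, mode="rb") as f:
        buffered_type = type(f).__name__
        buffered_kind = describe_handle(f)

    # buffering=0: open() returns the raw stream itself.
    with open(fname, mode="rb", buffering=0) as f:
        raw_type = type(f).__name__
        raw_kind = describe_handle(f)
finally:
    os.unlink(fname)

print(buffered_type, buffered_kind)  # BufferedReader buffered
print(raw_type, raw_kind)            # FileIO raw
```

So the only difference between the last two cases in the sample is that read_csv receives an `io.FileIO` (a `RawIOBase`) instead of an `io.BufferedReader`.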

Expected Output

/tmp/tmpxxxxxxxxxxx
read shift-jis
   てすと
0  bar
open shift-jis file and read_csv with encoding: utf-8
   てすと
0  bar
open binary with buffered and read_csv with encoding: shift-jis
   てすと
0  bar
open binary without buffered and read_csv with encoding: shift-jis
   てすと
0  bar
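
A possible workaround, sketched here under assumptions and not verified against pandas 1.0.0, is to wrap the unbuffered handle in a text stream before passing it on, so that the decoding no longer depends on the encoding option. The helper name `as_text_stream` is hypothetical. The sketch uses only the standard library and checks that the wrapped handle decodes the Shift-JIS content correctly.

```python
import io
import os
import tempfile


def as_text_stream(raw, encoding):
    """Hypothetical helper: wrap an unbuffered binary handle as a text stream.

    io.BufferedReader adds buffering on top of the raw stream, and
    io.TextIOWrapper decodes the buffered bytes with the given encoding.
    """
    return io.TextIOWrapper(io.BufferedReader(raw), encoding=encoding)


# Create a small Shift-JIS file, as in the code sample.
with tempfile.NamedTemporaryFile(delete=False, mode="w+", encoding="shift-jis") as tmp:
    tmp.write("てすと\nbar")
    fname = tmp.name

try:
    # Open without buffering, then wrap; closing the wrapper also closes `raw`.
    with open(fname, mode="rb", buffering=0) as raw:
        with as_text_stream(raw, "shift-jis") as text:
            decoded = text.read()
finally:
    os.unlink(fname)

print(decoded.splitlines())
```

With such a wrapper, `pandas.read_csv(text)` would receive a text-mode handle, the same kind of object as in the first case of the code sample. Whether that avoids the problem on pandas 1.0.0 is an assumption.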

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.4.0-18362-Microsoft
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : ja_JP.UTF-8
LOCALE : ja_JP.UTF-8

pandas : 1.0.0
numpy : 1.17.2
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3.1
setuptools : 41.4.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

Labels

Bug, IO CSV (read_csv, to_csv), Regression (functionality that used to work in a prior pandas version)
