Description
Code Sample
import os
import pandas
import tempfile

fname = ""
with tempfile.NamedTemporaryFile(delete=False, mode="w+", encoding="shift-jis") as f:
    f.write("てすと\nbar")
    fname = f.name
print(fname)
try:
    with open(fname, mode="r", encoding="shift-jis") as f:
        result = pandas.read_csv(f)
        print("read shift-jis")
        print(result)
    with open(fname, mode="r", encoding="shift-jis") as f:
        result = pandas.read_csv(f, encoding="utf-8")
        print("open shift-jis file and read_csv with encoding: utf-8")
        print(result)
    with open(fname, mode="rb") as f:
        result = pandas.read_csv(f, encoding="shift-jis")
        print("open binary with buffered and read_csv with encoding: shift-jis")
        print(result)
    with open(fname, mode="rb", buffering=0) as f:
        result = pandas.read_csv(f, encoding="shift-jis")
        print("open binary without burrered and read_csv with encoding: shift-jis")
        print(result)
except Exception as e:
    print(e)
os.unlink(fname)
Problem description
With pandas 1.0.0, this sample does not work. With pandas 0.25.3, the same sample works fine.
When the file is opened with the buffering=0 option, f is a RawIOBase object. In this case, the encoding option seems to be ignored. The relevant changes appear to be:
https://github.com/pandas-dev/pandas/pull/30771/files#diff-0335ae9037e4eb4747749a9f94cffd32R641
https://github.com/pandas-dev/pandas/pull/30771/files#diff-777d7549579ddc0c6e67596ad87e0d27R1879
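To illustrate the distinction described above, here is a minimal standard-library sketch. It shows that a handle opened with buffering=0 is a RawIOBase rather than a BufferedIOBase, and one possible way to wrap it so that decoding happens before any parser sees the data. The helper name open_as_text is hypothetical and not part of pandas or of the original report.

```python
import io
import os
import tempfile


def open_as_text(path, encoding):
    """Hypothetical helper: open unbuffered, then wrap to get decoded text."""
    raw = open(path, mode="rb", buffering=0)   # io.FileIO, a RawIOBase
    buffered = io.BufferedReader(raw)          # add the buffering layer back
    return io.TextIOWrapper(buffered, encoding=encoding, newline="")


# Write the same Shift-JIS sample used in the report.
with tempfile.NamedTemporaryFile(delete=False, mode="w+", encoding="shift-jis") as tmp:
    tmp.write("てすと\nbar")
    fname = tmp.name

try:
    # buffering=0 yields a raw handle, not a buffered one.
    with open(fname, mode="rb", buffering=0) as raw:
        raw_is_raw = isinstance(raw, io.RawIOBase)
        raw_is_buffered = isinstance(raw, io.BufferedIOBase)
    # The default binary mode yields a buffered handle.
    with open(fname, mode="rb") as buf:
        buf_is_buffered = isinstance(buf, io.BufferedIOBase)
    # Closing the text wrapper also closes the underlying handles.
    with open_as_text(fname, "shift-jis") as text:
        lines = text.read().splitlines()
finally:
    os.unlink(fname)

print(raw_is_raw, raw_is_buffered, buf_is_buffered)
print(lines)
```

The wrapped text handle could be passed to pandas.read_csv in place of the raw one, much like the first case in the code sample. Whether this avoids the failure on pandas 1.0.0 has not been verified here.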
Expected Output
/tmp/tmpxxxxxxxxxxx
read shift-jis
てすと
0 bar
open shift-jis file and read_csv with encoding: utf-8
てすと
0 bar
open binary with buffered and read_csv with encoding: shift-jis
てすと
0 bar
open binary without burrered and read_csv with encoding: shift-jis
てすと
0 bar
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.4.0-18362-Microsoft
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : ja_JP.UTF-8
LOCALE : ja_JP.UTF-8
pandas : 1.0.0
numpy : 1.17.2
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3.1
setuptools : 41.4.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None