Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
When xlrd is installed but is an "older" version (e.g. 1.1.0):
pandas.read_excel("file.xlsx", engine="openpyxl")
raises
AttributeError: module 'xlrd' has no attribute '__version__'
This specifically occurs in pandas/io/excel/_base.py:336
Problem description
This bug occurs regardless of the engine argument because pandas.read_excel() always checks xlrd.__version__ if any version of xlrd is installed. Older versions of xlrd expose __VERSION__ instead of __version__, which results in the AttributeError.
This exception does not occur and everything behaves as expected if:
- xlrd is not installed
- A "newer" version of xlrd is installed (one that has the __version__ attribute)
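
For anyone who wants to see the failure mode without installing an old xlrd, here is a minimal sketch. It does not import xlrd or pandas; `fake_old_xlrd`, `fake_new_xlrd`, and `read_version_unconditionally` are hypothetical stand-ins that only mimic the attribute names described above, and the version string "2.0.1" is just an example value.

```python
import types

# Hypothetical stand-in for an older xlrd release: only __VERSION__ is defined.
fake_old_xlrd = types.ModuleType("xlrd")
fake_old_xlrd.__VERSION__ = "1.1.0"

# Hypothetical stand-in for a newer release that defines __version__.
fake_new_xlrd = types.ModuleType("xlrd")
fake_new_xlrd.__version__ = "2.0.1"


def read_version_unconditionally(module):
    # Mimics the failing pattern: the attribute is read with no fallback.
    return module.__version__


# The newer stand-in works as expected.
new_version = read_version_unconditionally(fake_new_xlrd)

# The older stand-in raises AttributeError, as in the traceback above.
try:
    read_version_unconditionally(fake_old_xlrd)
    old_error = None
except AttributeError as exc:
    old_error = str(exc)

print(new_version)
print(old_error)
```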
Expected Output
According to the documentation,
Changed in version 1.2.0: When engine=None, the following logic will be used to determine the engine:
- If path_or_buffer is an OpenDocument format (.odf, .ods, .odt), then odf will be used.
- Otherwise if path_or_buffer is an xls format, xlrd will be used.
- Otherwise if openpyxl is installed, then openpyxl will be used.
- Otherwise if xlrd >= 2.0 is installed, a ValueError will be raised.
- Otherwise xlrd will be used and a FutureWarning will be raised. This case will raise a ValueError in a future version of pandas.
Based on this, I believe the intended behavior is:
- read_excel() should not check the xlrd version if engine is "openpyxl"
- If read_excel() does need to know the version of xlrd, it should try both __version__ and __VERSION__
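
The two points above could be sketched roughly as follows. This is not the pandas implementation and not a proposed patch; `get_module_version` and `should_check_xlrd_version` are hypothetical helper names, and the rule "only look at xlrd when engine is None or "xlrd"" is my assumption about when the version is actually needed.

```python
import types
from typing import Optional


def get_module_version(module) -> Optional[str]:
    """Return the module's version string, trying both attribute spellings."""
    for attr in ("__version__", "__VERSION__"):
        version = getattr(module, attr, None)
        if version is not None:
            return str(version)
    # Neither attribute exists; let the caller decide how to handle that.
    return None


def should_check_xlrd_version(engine: Optional[str]) -> bool:
    """Assumption: the xlrd version only matters when xlrd might be used."""
    return engine is None or engine == "xlrd"


# Usage with a stand-in module (the real xlrd is not imported here).
fake_old_xlrd = types.ModuleType("xlrd")
fake_old_xlrd.__VERSION__ = "1.1.0"

if should_check_xlrd_version("openpyxl"):
    print(get_module_version(fake_old_xlrd))
else:
    print("engine is openpyxl, xlrd version not inspected")
```

Using getattr with a default avoids the AttributeError entirely, and returning None leaves it to the caller whether a missing version should be an error or simply treated as "old".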
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 3e89b4c
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1062.4.1.el7.x86_64
Version : #1 SMP Wed Sep 25 09:42:57 EDT 2019
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US
LOCALE : en_US.ISO8859-1
pandas : 1.2.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.3.3
setuptools : 40.8.0
Cython : None
pytest : 6.0.1
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : 3.6.1
tabulate : 0.8.7
xarray : None
xlrd : 1.1.0
xlwt : None
numba : None