BUG: read_excel() fails when checking __version__ of older xlrd versions #38955

Closed
@kcharlie2

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


When xlrd is installed but is an "older" version (e.g. 1.1.0):

pandas.read_excel("file.xlsx", engine="openpyxl")

raises

AttributeError: module 'xlrd' has no attribute '__version__'

The exception is raised at pandas/io/excel/_base.py:336.

Problem description

This bug occurs regardless of the engine argument because pandas.read_excel() always checks xlrd.__version__ if any version of xlrd is installed. Older versions of xlrd expose __VERSION__ instead of __version__, which results in the AttributeError.

This exception does not occur and everything behaves as expected if:

  • xlrd is not installed
  • A "newer" version of xlrd is installed (that has the __version__ attribute)
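The failure mode can be illustrated without installing an old xlrd. The following is a minimal sketch, not pandas code: `fake_old_xlrd` and `read_version_unguarded` are hypothetical names, and the stand-in module only mimics the one relevant trait of xlrd 1.1.0 (it has `__VERSION__` but no `__version__`).

```python
import types

# Hypothetical stand-in for xlrd 1.1.0: only the upper-case attribute exists.
fake_old_xlrd = types.ModuleType("xlrd")
fake_old_xlrd.__VERSION__ = "1.1.0"


def read_version_unguarded(module):
    # Unguarded attribute access, similar in spirit to the check described above.
    return module.__version__


try:
    read_version_unguarded(fake_old_xlrd)
    failed = False
except AttributeError as exc:
    # The lookup fails because the module never defined __version__.
    failed = True
    print(type(exc).__name__, exc)
```

Because the lookup happens before any engine-specific work, the error would surface even when the caller asked for a different engine.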

Expected Output

According to the documentation,

Changed in version 1.2.0: When engine=None, the following logic will be used to determine the engine:
    
- If path_or_buffer is an OpenDocument format (.odf, .ods, .odt), then odf will be used.
- Otherwise if path_or_buffer is an xls format, xlrd will be used.
- Otherwise if openpyxl is installed, then openpyxl will be used.
- Otherwise if xlrd >= 2.0 is installed, a ValueError will be raised.
- Otherwise xlrd will be used and a FutureWarning will be raised. This case will raise a ValueError in a future version of pandas.
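The documented rules above can be sketched as a small decision function. This is an illustration of the quoted documentation only, not the pandas implementation: `pick_engine`, its parameters, and the warning and error messages are all assumptions made for the example, and the case where xlrd is missing entirely is not covered by the quoted list, so the sketch simply falls through to the last rule.

```python
import warnings
from typing import Optional

ODF_EXTENSIONS = {"odf", "ods", "odt"}


def pick_engine(ext: str, openpyxl_installed: bool, xlrd_version: Optional[str]) -> str:
    """Hypothetical helper mirroring the documented engine=None rules."""
    ext = ext.lower().lstrip(".")
    if ext in ODF_EXTENSIONS:
        return "odf"
    if ext == "xls":
        return "xlrd"
    if openpyxl_installed:
        return "openpyxl"
    # Only at this point does the xlrd version matter.
    if xlrd_version is not None and int(xlrd_version.split(".")[0]) >= 2:
        raise ValueError("xlrd >= 2.0 cannot be used for this file; install openpyxl")
    warnings.warn(
        "Falling back to xlrd; this will raise in a future version",
        FutureWarning,
        stacklevel=2,
    )
    return "xlrd"
```

In this sketch the xlrd version is consulted only in the fourth rule, so a call that resolves to odf, xls, or openpyxl never depends on how xlrd spells its version attribute.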

Based on this, I believe the intended behavior is:

  • read_excel() should not check the xlrd version if engine is "openpyxl"
  • If read_excel() does need to know the version of xlrd, it should try both __version__ and __VERSION__
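The second suggestion could look roughly like the sketch below. `get_module_version` is a hypothetical helper, not an existing pandas function, and the two module objects are stand-ins built with `types.ModuleType` so the example runs without xlrd installed.

```python
import types
from typing import Optional


def get_module_version(module) -> Optional[str]:
    """Return the module's version, trying both attribute spellings."""
    for attr in ("__version__", "__VERSION__"):
        version = getattr(module, attr, None)
        if version is not None:
            return str(version)
    # Neither attribute exists; let the caller decide what to do.
    return None


# Stand-ins for an older and a newer xlrd (assumed attribute layouts).
old_xlrd = types.ModuleType("xlrd")
old_xlrd.__VERSION__ = "1.1.0"

new_xlrd = types.ModuleType("xlrd")
new_xlrd.__version__ = "2.0.1"

print(get_module_version(old_xlrd))
print(get_module_version(new_xlrd))
```

Returning None instead of raising keeps the lookup safe to call even when the version is not needed for the selected engine.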

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 3e89b4c
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1062.4.1.el7.x86_64
Version : #1 SMP Wed Sep 25 09:42:57 EDT 2019
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US
LOCALE : en_US.ISO8859-1

pandas : 1.2.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.3.3
setuptools : 40.8.0
Cython : None
pytest : 6.0.1
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : 3.6.1
tabulate : 0.8.7
xarray : None
xlrd : 1.1.0
xlwt : None
numba : None
