Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd
pd.read_excel("foo.xlsx")
Example (empty) excel file: https://github.com/pandas-dev/pandas/files/6635871/foo.xlsx
Problem description
Running the code example emits an error message that could be very helpful:
ImportError: Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd.
However, as of version 2, the xlrd library only supports xls files, not xlsx (#38410). This instruction is thus misleading: running pip install xlrd and then rerunning the code still fails:
ValueError: Your version of xlrd is 2.0.1. In xlrd >= 2.0, only the xls format is supported. Install openpyxl instead.
The correct/final resolution is thus to run pip install openpyxl. It's a bit of a papercut to go through several steps here, especially as I suspect xlsx files are more common than xls these days (at least, it's been ~15-20 years since .xlsx was released according to https://en.wikipedia.org/wiki/Office_Open_XML).
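For anyone landing on the second error, a quick check along these lines shows whether the installed xlrd can still read .xlsx. This is only a sketch built on the "xlrd >= 2.0 is xls-only" rule quoted above; the helper names are made up for illustration and are not part of pandas or xlrd.

```python
from importlib import metadata


def xlrd_reads_xlsx(version):
    """Return True if this xlrd version string predates 2.0.

    Assumption: the rule is exactly the one in the error message above,
    i.e. xlrd >= 2.0 supports only the xls format.
    """
    major = int(version.split(".")[0])
    return major < 2


def installed_xlrd_version():
    """Return the installed xlrd version string, or None if xlrd is absent."""
    try:
        return metadata.version("xlrd")
    except metadata.PackageNotFoundError:
        return None


version = installed_xlrd_version()
if version is None:
    print("xlrd is not installed")
else:
    print(f"xlrd {version} reads .xlsx: {xlrd_reads_xlsx(version)}")
```

With the xlrd 2.0.1 that pip installs here, the check comes back False, which is why openpyxl is needed for foo.xlsx.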
Expected Output
It would be nice if the original error message made it clearer what was going on. For instance, options might be:
1. detect the file type/extension and reference the appropriate library, i.e. for .xlsx:
   ImportError: Missing optional dependency 'openpyxl'. Install openpyxl for .xlsx Excel support. Use pip or conda to install openpyxl.
2. mention both libraries so the human can make the decision:
   ImportError: Missing optional dependency 'xlrd' (for xls) 'openpyxl' (for xlsx). Install the appropriate one of these for Excel support. Use pip or conda to install xlrd or openpyxl.
3. suggest installing xlrd>=1.0.0,<2.0.0
(I personally feel that 1 would be the nicest.)
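To make the first option concrete, here is a rough sketch of extension-based detection. It is not how pandas is implemented: the READER_FOR_EXTENSION table, the missing_reader_hint function and its is_installed parameter are all hypothetical names chosen for this example, and the mapping covers only the two formats discussed in this issue.

```python
import importlib.util
from pathlib import Path

# Hypothetical mapping for illustration; not pandas' internal table.
READER_FOR_EXTENSION = {".xls": "xlrd", ".xlsx": "openpyxl"}


def _is_importable(package):
    """Return True if `package` can be found on the import path."""
    return importlib.util.find_spec(package) is not None


def missing_reader_hint(path, is_installed=_is_importable):
    """Return an install hint for the reader `path` needs, or None.

    None means either the extension is not in the mapping or the
    matching reader library is already importable.
    """
    ext = Path(path).suffix.lower()
    package = READER_FOR_EXTENSION.get(ext)
    if package is None or is_installed(package):
        return None
    return (
        f"Missing optional dependency '{package}'. "
        f"Install {package} for {ext} Excel support. "
        f"Use pip or conda to install {package}."
    )


# Simulate an environment with no reader installed.
print(missing_reader_hint("foo.xlsx", is_installed=lambda name: False))
```

The is_installed hook is only there so the behaviour can be exercised without uninstalling anything; by default the function checks the real environment.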
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 2cb9652
python : 3.8.7.final.0
python-bits : 64
OS : Darwin
OS-release : 20.2.0
Version : Darwin Kernel Version 20.2.0: Wed Dec 2 20:40:21 PST 2020; root:xnu-7195.60.75~1/RELEASE_ARM64_T8101
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_AU.UTF-8
LOCALE : en_AU.UTF-8
pandas : 1.2.4
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.1
pip : 20.2.3
setuptools : 49.2.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None
After pip install xlrd, it reports: xlrd: 2.0.1.