Description
Is your feature request related to a problem?
When your io
parameter for pd.read_excel()
is an openpyxl.Workbook object, the function should assume that engine='openpyxl'
. Right now, if you don't specify the engine, you will get a ValueError: Invalid file path or buffer object type: <class 'openpyxl.workbook.workbook.Workbook'>
. However, if you explicitly pass the engine, it will load fine.
It seems like passing in the specific workbook object would only ever be done if you want it to read it with that engine, so we should just assume that engine.
The current documentation for read_excel()
specifies that you can use an xlrd.Book
, and that works fine without an explicit engine
. It doesn't say that you can use an openpyxl.Workbook
at all, but it does work when you declare the engine
.
This has come up as a small inconvenience for me a few times. I have a script where I am pulling some data out of Excel that is not a table at all, and so I'm using opening it as an openpyxl.Workbook
and using some of the tools from that library, and then also want to pull another table using pandas. I want to do this without opening the workbook multiple times, so I use the same workbook to pass into pd.read_excel()
. It then throws and error and I say "Oh yea, I need the engine."
Describe the solution you'd like
One should be able to do
import pandas as pd
import openpyxl
wb = openpyxl.load_Workbook(filepath,...)
df = pd.read_excel(wb,...) # and not include engine
wb.close()
This should work the same as just df = pd.read_excel(filepath,...)
Describe alternatives you've considered
Alternatives seem to be (1) explicitly pass engine
or (2) don't use openpyxl.Workbook
objects.
Potential Solution
# starting line 1466 or so of pandas.io.excel_base.py
if isinstance(path_or_buffer, openpyxl.Workbook):
if engine == "openpyxl":
pass
elif engine is None:
engine = "openpyxl"
else:
raise ValueError(f"Excel object is openpyxl.Workbook, but engine is {engine}")