Skip to content

ENH: read_excel should assume engine when passing openpyxl.Workbook object #46352

Open
@scotscotmcc

Description

@scotscotmcc

Is your feature request related to a problem?

When your io parameter for pd.read_excel() is an openpyxl.Workbook object, the function should assume that engine='openpyxl'. Right now, if you don't specify the engine, you will get a ValueError: Invalid file path or buffer object type: <class 'openpyxl.workbook.workbook.Workbook'>. However, if you explicitly pass the engine, it will load fine.

It seems like passing in the specific workbook object would only ever be done if you want it to read it with that engine, so we should just assume that engine.

The current documentation for read_excel() specifies that you can use an xlrd.Book, and that works fine without an explicit engine. It doesn't say that you can use an openpyxl.Workbook at all, but it does work when you declare the engine.

This has come up as a small inconvenience for me a few times. I have a script where I am pulling some data out of Excel that is not a table at all, and so I'm using opening it as an openpyxl.Workbook and using some of the tools from that library, and then also want to pull another table using pandas. I want to do this without opening the workbook multiple times, so I use the same workbook to pass into pd.read_excel(). It then throws and error and I say "Oh yea, I need the engine."

Describe the solution you'd like

One should be able to do

import pandas as pd
import openpyxl

wb = openpyxl.load_Workbook(filepath,...)
df = pd.read_excel(wb,...) # and not include engine
wb.close()

This should work the same as just df = pd.read_excel(filepath,...)

Describe alternatives you've considered

Alternatives seem to be (1) explicitly pass engine or (2) don't use openpyxl.Workbook objects.

Potential Solution

# starting line 1466 or so of pandas.io.excel_base.py
if isinstance(path_or_buffer, openpyxl.Workbook):
    if engine == "openpyxl":
        pass
    elif engine is None:
        engine = "openpyxl"
    else:
        raise ValueError(f"Excel object is openpyxl.Workbook, but engine is {engine}")

Metadata

Metadata

Assignees

No one assigned

    Labels

    EnhancementIO Excelread_excel, to_excelNeeds DiscussionRequires discussion from core team before further action

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions