DEPR: ExcelFile.parse #58247

Open
@rhshadrach

Description

It seems to me we should either fix ExcelFile.parse or deprecate it entirely, and I lean toward the latter. pandas originally started out with just ExcelFile but now has the top-level read_excel. The signatures started the same, but now read_excel has gained and modified parameters that have not been added/changed in ExcelFile.parse. For example:

  • ExcelFile.parse lacks a dtype parameter
  • ExcelFile.parse has a **kwds argument that is passed on to pandas internals with no documentation on what can be included. Invalid arguments are just ignored (e.g. BUG: xl.parse index_col ignoring skiprows #50953)
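One way to see the drift between the two signatures on a given installation is to compare them with `inspect`. This is only a rough sketch: the variable names are illustrative, and the result depends on the installed pandas version, so no expected output is shown.

```python
import inspect

import pandas as pd

# Parameter names declared by each entry point.
read_excel_params = set(inspect.signature(pd.read_excel).parameters)
parse_sig = inspect.signature(pd.ExcelFile.parse)
parse_params = set(parse_sig.parameters)

# Named parameters of read_excel that parse does not declare explicitly.
# Some (such as "io") belong to the ExcelFile constructor instead; the
# rest are candidates for signature drift.
only_in_read_excel = sorted(read_excel_params - parse_params)
print(only_in_read_excel)

# Whether parse accepts a catch-all **kwds on this pandas version.
has_var_keyword = any(
    p.kind is inspect.Parameter.VAR_KEYWORD
    for p in parse_sig.parameters.values()
)
print(has_var_keyword)
```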

It appears to me that pd.ExcelFile(...).parse(...) offers no advantage over pd.read_excel(pd.ExcelFile(...)), and so rather than fixing parse we can deprecate it and make it internal.
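To illustrate the claimed equivalence, here is a minimal sketch that reads the same sheet both ways and compares the results. The file path, sheet name, and column names are made up for the example, an Excel engine such as openpyxl is assumed to be installed, and `parse` may emit a deprecation warning on pandas versions where this proposal has been adopted.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "equivalence.xlsx")
pd.DataFrame(np.arange(12).reshape(4, 3), columns=["a", "b", "c"]).to_excel(
    path, sheet_name="data", index=False
)

with pd.ExcelFile(path) as xl:
    # The method under discussion.
    via_parse = xl.parse(sheet_name="data")
    # The proposed replacement: pass the open ExcelFile to read_excel.
    via_read_excel = pd.read_excel(xl, sheet_name="data")

# Raises AssertionError if the two frames differ.
pd.testing.assert_frame_equal(via_parse, via_read_excel)
```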

Edit: I no longer think deprecating ExcelFile entirely as mentioned below is a good option. See #58247 (comment).

Another option is to deprecate ExcelFile entirely. The one thing ExcelFile still provides that isn't available elsewhere is to get the underlying book or sheet_names without reading the entire file.

import numpy as np
import pandas as pd

# Write a workbook with 10 sheets of 100x100 zeros.
df = pd.DataFrame(np.zeros((100, 100)))
with pd.ExcelWriter("test.xlsx") as writer:
    for e in range(10):
        df.to_excel(writer, sheet_name=str(e))

%timeit pd.ExcelFile("test.xlsx").sheet_names
# 14.1 ms ± 76 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pd.read_excel("test.xlsx", sheet_name=None)
# 411 ms ± 2.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

One can somewhat work around this by using nrows, but it's clunky.

%timeit pd.read_excel("test.xlsx", sheet_name=None, nrows=0).keys()
# 57.3 ms ± 257 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
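For context, the access pattern that keeps ExcelFile useful might look like the sketch below: open the workbook once, inspect `sheet_names`, then load only the sheets of interest through `read_excel`. The sheet names and the `raw_` prefix are hypothetical, and an Excel engine such as openpyxl is assumed to be installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical workbook with three small sheets.
path = os.path.join(tempfile.mkdtemp(), "sheets.xlsx")
with pd.ExcelWriter(path) as writer:
    for name in ["summary", "raw_1", "raw_2"]:
        pd.DataFrame(np.zeros((5, 2))).to_excel(writer, sheet_name=name)

with pd.ExcelFile(path) as xl:
    # Listing the sheets does not parse any sheet into a DataFrame.
    names = xl.sheet_names
    wanted = [n for n in names if n.startswith("raw_")]
    # A list of sheet names returns a dict mapping name to DataFrame.
    frames = pd.read_excel(xl, sheet_name=wanted, index_col=0)

print(names)
print(sorted(frames))
```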

Metadata

    Labels

    Deprecate (Functionality to remove in pandas)
    IO Excel (read_excel, to_excel)
    Needs Discussion (Requires discussion from core team before further action)
