DEPR: ExcelFile.parse

It seems to me we should either fix `ExcelFile.parse` or deprecate it entirely, and I lean toward the latter. pandas originally had only `ExcelFile` but now also has the top-level `read_excel`. The two signatures started out the same, but `read_excel` has since gained and modified parameters that were never added to or changed in `ExcelFile.parse`. For example:

  - `ExcelFile.parse` lacks a `dtype` parameter
  - `ExcelFile.parse` has a `**kwds` argument that is passed on to pandas internals, with no documentation of what it may contain. Invalid arguments are silently ignored (e.g. #50953)

It appears to me that `pd.ExcelFile(...).parse(...)` offers no advantage over `pd.read_excel(pd.ExcelFile(...))`, and so rather than fixing `parse` we can deprecate it and make it internal.
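To make the comparison concrete, here is a minimal sketch of the two call styles side by side. The file name, sheet name, and sample data are made up for illustration, and it assumes an Excel engine such as `openpyxl` is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample workbook, written to a temporary directory.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
df.to_excel(path, sheet_name="data", index=False)

with pd.ExcelFile(path) as xls:
    # The method under discussion.
    via_parse = xls.parse("data")
    # The proposed replacement: pass the open ExcelFile to read_excel.
    via_read_excel = pd.read_excel(xls, sheet_name="data")
    # read_excel accepts dtype, which this issue notes parse lacks.
    as_str = pd.read_excel(xls, sheet_name="data", dtype={"a": str})
```

Both calls reuse the same open `ExcelFile`, so switching from `parse` to `read_excel` should not require reopening the workbook.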

Edit: I no longer think deprecating `ExcelFile` entirely as mentioned below is a good option. See https://github.com/pandas-dev/pandas/issues/58247#issuecomment-2067632583.

Another option is to deprecate `ExcelFile` entirely. The one thing `ExcelFile` still provides that isn't available elsewhere is access to the underlying `book` or `sheet_names` without reading the entire file.

```
import numpy as np
import pandas as pd

# Write a workbook with 10 sheets of 100x100 zeros
df = pd.DataFrame(np.zeros((100, 100)))
with pd.ExcelWriter("test.xlsx") as writer:
    for e in range(10):
        df.to_excel(writer, sheet_name=str(e))

# %timeit is an IPython magic; run these lines in IPython or Jupyter
%timeit pd.ExcelFile("test.xlsx").sheet_names
# 14.1 ms ± 76 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pd.read_excel("test.xlsx", sheet_name=None)
# 411 ms ± 2.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

One can somewhat work around this by using `nrows`, but it's clunky.

```
%timeit pd.read_excel("test.xlsx", sheet_name=None, nrows=0).keys()
# 57.3 ms ± 257 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
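For reference, a small sketch checking that the `nrows=0` workaround yields the same sheet names as `ExcelFile.sheet_names`. It uses a smaller, hypothetical workbook in a temporary directory and assumes an Excel engine such as `openpyxl` is installed; it does not measure timing.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical small workbook with three sheets named "0", "1", "2".
path = os.path.join(tempfile.mkdtemp(), "sheets.xlsx")
df = pd.DataFrame(np.zeros((5, 5)))
with pd.ExcelWriter(path) as writer:
    for e in range(3):
        df.to_excel(writer, sheet_name=str(e))

# Sheet names via ExcelFile, without parsing any sheet into a DataFrame.
with pd.ExcelFile(path) as xls:
    names_from_excelfile = xls.sheet_names

# Sheet names via the workaround: read zero data rows from every sheet
# and take the keys of the resulting dict.
names_from_nrows = list(pd.read_excel(path, sheet_name=None, nrows=0).keys())
```

The workaround still builds an empty DataFrame for every sheet, which is likely part of why it is slower than reading `sheet_names` directly.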
