
TYP/DOC: flavor parameter with incorrect type hint in read_html #55059

Closed
@matheusfelipeog

Description


Problem

The flavor parameter of the read_html(...) function has an incorrect type hint.

Currently, the type hint is an optional str. However, according to the io.html documentation, it is also possible to pass a list-like:

The lxml backend will raise an error on a failed parse if that is the only parser you provide. 
If you only have a single parser you can provide just a string, but it is considered good practice
to pass a list with one string if, for example, the function expects a sequence of strings. You may use:

>>> dfs = pd.read_html(url, "Metcalf Bank", index_col=0, flavor=["lxml"])

Or you could pass flavor='lxml' without a list:

>>> dfs = pd.read_html(url, "Metcalf Bank", index_col=0, flavor="lxml")

However, if you have bs4 and html5lib installed and pass None or ['lxml', 'bs4'] then the parse will
most likely succeed. Note that as soon as a parse succeeds, the function will return.

>>> dfs = pd.read_html(url, "Metcalf Bank", index_col=0, flavor=["lxml", "bs4"])
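The fallback behaviour described in the quoted docs (try each parser in order, stop at the first success) can be sketched roughly as follows. This is a simplified illustration, not pandas' actual implementation; ParseError and parse_with_fallback are hypothetical names, and the parsers are stand-in callables.

```python
from typing import Callable, List, Optional, Sequence


class ParseError(Exception):
    """Hypothetical error a parser raises when it cannot parse the input."""


def parse_with_fallback(
    html: str, parsers: Sequence[Callable[[str], List[str]]]
) -> List[str]:
    """Try each parser in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for parse in parsers:
        try:
            # Return immediately on success: later parsers are never tried.
            return parse(html)
        except ParseError as err:
            # Remember the failure and move on to the next parser.
            last_error = err
    # Every parser failed (or none were given): surface the last error.
    raise ParseError("no parser could handle the input") from last_error
```

With a single parser in the sequence, its failure propagates directly, which mirrors the docs' note that lxml raises on a failed parse when it is the only parser provided.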

Internally, read_html(...) converts the passed value to a tuple in both cases.
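A minimal sketch of that normalization step, assuming None defaults to trying lxml and then bs4 (as the quoted docs suggest) and that the accepted names are "lxml", "html5lib" and "bs4". normalize_flavor and VALID_FLAVORS are hypothetical names, not the pandas internals.

```python
from typing import Sequence, Tuple, Union

# Assumed set of accepted flavor names.
VALID_FLAVORS = ("lxml", "html5lib", "bs4")


def normalize_flavor(flavor: Union[str, Sequence[str], None]) -> Tuple[str, ...]:
    """Turn None, a bare string, or a list-like of strings into a tuple."""
    if flavor is None:
        # Assumed default: try lxml first, then fall back to bs4.
        flavors: Tuple[str, ...] = ("lxml", "bs4")
    elif isinstance(flavor, str):
        # Checked before the generic case, because a str is itself a Sequence.
        flavors = (flavor,)
    else:
        flavors = tuple(flavor)

    if not all(isinstance(f, str) for f in flavors):
        raise TypeError("flavor must be a string or a sequence of strings")
    unknown = set(flavors) - set(VALID_FLAVORS)
    if unknown:
        raise ValueError(f"unknown flavor(s): {sorted(unknown)}")
    return flavors
```

Both flavor="lxml" and flavor=["lxml"] end up as ("lxml",), which is why the runtime already accepts a list-like even though the annotation does not.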

With this incorrect type hint, type checkers report an error when a list-like is passed:

[Screenshot: type checker error when passing a list to flavor]

Solution

We can change the type hint to accept either a str or a Sequence[str], with None still allowed as the default:

flavor: str | Sequence[str] | None = None
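To illustrate the proposed annotation, here is a hypothetical stand-in function (read_html_stub is not a pandas API) whose flavor parameter uses exactly that hint. Every call style from the docs should then type-check. The from __future__ import keeps the X | Y syntax valid on Python versions older than 3.10.

```python
from __future__ import annotations

from typing import Sequence


def read_html_stub(io: str, flavor: str | Sequence[str] | None = None) -> tuple[str, ...]:
    """Hypothetical stand-in carrying the proposed annotation for flavor."""
    if flavor is None:
        # Assumed default order, taken from the docs: lxml, then bs4.
        return ("lxml", "bs4")
    if isinstance(flavor, str):
        # A bare string means a single parser.
        return (flavor,)
    return tuple(flavor)


# All of these are accepted by a type checker under the proposed hint.
read_html_stub("page.html", flavor="lxml")
read_html_stub("page.html", flavor=["lxml"])
read_html_stub("page.html", flavor=["lxml", "bs4"])
read_html_stub("page.html")
```

Since str is itself a Sequence[str], a checker would accept a bare string even against Sequence[str] alone. Keeping str in the union documents the intent, and at runtime the isinstance(flavor, str) check must come before any generic sequence handling.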

I can resolve this by opening a PR that updates the type hints, docstrings, and documentation. Just let me know if you agree with this solution.

Metadata


Assignees

No one assigned

    Labels

    Docs, Typing (type annotations, mypy/pyright type checking)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
