Closed
Description
Problem
The flavor parameter has the incorrect type hint in the read_html(...) function.
Currently, the type hint is an optional str. However, according to the documentation in io.html it is possible to pass a list-like:
The lxml backend will raise an error on a failed parse if that is the only parser you provide.
If you only have a single parser you can provide just a string, but it is considered good practice
to pass a list with one string if, for example, the function expects a sequence of strings. You may use:
>>> dfs = pd.read_html(url, "Metcalf Bank", index_col=0, flavor=["lxml"])
Or you could pass flavor='lxml' without a list:
>>> dfs = pd.read_html(url, "Metcalf Bank", index_col=0, flavor="lxml")
However, if you have bs4 and html5lib installed and pass None or ['lxml', 'bs4'] then the parse will
most likely succeed. Note that as soon as a parse succeeds, the function will return.
>>> dfs = pd.read_html(url, "Metcalf Bank", index_col=0, flavor=["lxml", "bs4"])
Internally read_html(...) converts the passed value to a tuple in both cases.
With this incorrect type hint, type checkers report an error when a list-like is passed:
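A minimal sketch of that situation, using a hypothetical stand-in function (read_html_stub is not pandas code; it only mirrors the current annotation of flavor). A checker such as mypy would be expected to flag the second call, even though it runs fine:

```python
from __future__ import annotations


def read_html_stub(io: str, flavor: str | None = None) -> tuple[str, ...]:
    """Hypothetical stand-in that mirrors the current annotation of ``flavor``."""
    if flavor is None:
        # Assumed default, based on the documentation quoted above.
        return ("lxml", "bs4")
    if isinstance(flavor, str):
        return (flavor,)
    # Reached at runtime for list-likes, although the annotation forbids them.
    return tuple(flavor)


# Matches the annotation, so a type checker accepts it.
single = read_html_stub("page.html", flavor="lxml")

# Works at runtime, but a static checker is expected to reject the argument
# type, because a list of str is not compatible with "str | None".
several = read_html_stub("page.html", flavor=["lxml", "bs4"])
```

The exact wording of the diagnostic depends on the type checker and its version.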
Solution
We can set the type hint to a str or a Sequence[str], both of which are optional:
flavor: str | Sequence[str] | None = None
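To illustrate how the proposed hint lines up with the tuple conversion described above, here is a rough sketch. The helper name normalize_flavor and the default used for None are assumptions made for illustration, not the actual pandas implementation:

```python
from __future__ import annotations

from collections.abc import Sequence


def normalize_flavor(flavor: str | Sequence[str] | None = None) -> tuple[str, ...]:
    """Hypothetical helper: accept the proposed types and return a tuple."""
    if flavor is None:
        # Assumed default order, taken from the documentation quoted above.
        return ("lxml", "bs4")
    if isinstance(flavor, str):
        # Wrap a bare string so it is not iterated character by character.
        return (flavor,)
    return tuple(flavor)


print(normalize_flavor("lxml"))
print(normalize_flavor(["lxml", "bs4"]))
```

With this annotation, both the string form and the list form should pass static type checking, which matches what the documentation already describes.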
I can resolve this and create a PR by updating the type hints, docstrings and documentation. Just let me know if you agree with my solution.