Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
from io import StringIO
table = """<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th>col0</th>
<th>col1</th>
<th>col2</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1.0</td>
<td>a</td>
</tr>
<tr>
<td>2</td>
<td>2.5</td>
<td>b</td>
</tr>
<tr>
<td>3</td>
<td>5.0</td>
<td>c</td>
</tr>
</tbody>
</table>
"""
buf = StringIO()
buf.write(table)
buf.seek(0)
pd.read_html(buf, flavor="bs4")
Issue Description
beautifulsoup4 version 4.13.0b2 breaks this example, with the following exception being raised:
Traceback (most recent call last):
File "/Users/clm/dev/astropy-project/coordinated/astropy/bugs/16251/t.py", line 34, in <module>
pd.read_html(buf, flavor="bs4")
File "/Users/clm/.pyenv/versions/astropy.dev/lib/python3.12/site-packages/pandas/io/html.py", line 1213, in read_html
return _parse(
^^^^^^^
File "/Users/clm/.pyenv/versions/astropy.dev/lib/python3.12/site-packages/pandas/io/html.py", line 972, in _parse
tables = p.parse_tables()
^^^^^^^^^^^^^^^^
File "/Users/clm/.pyenv/versions/astropy.dev/lib/python3.12/site-packages/pandas/io/html.py", line 242, in parse_tables
tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/clm/.pyenv/versions/astropy.dev/lib/python3.12/site-packages/pandas/io/html.py", line 594, in _parse_tables
element_name = self._strainer.name
^^^^^^^^^^^^^^^^^^^
AttributeError: 'SoupStrainer' object has no attribute 'name'
I'm not sure whether it should be addressed in pandas or in bs4.
For context, this was discovered while testing astropy.
xref: astropy/astropy#16251
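The traceback shows the failure comes from reading `self._strainer.name` directly. If this were handled on the pandas side, one possible guard is to fall back to a fixed element name when the attribute is missing. The sketch below uses hypothetical stand-in classes, not the real bs4 `SoupStrainer`. `OldStyleStrainer`, `NewStyleStrainer`, and `element_name_for` are illustrative names only, and the `"table"` default is an assumption based on `read_html` looking for `<table>` elements. It is not the actual pandas or bs4 code.

```python
class OldStyleStrainer:
    """Hypothetical stand-in for a strainer that exposes ``name``."""

    def __init__(self, name):
        self.name = name


class NewStyleStrainer:
    """Hypothetical stand-in for a strainer that has no ``name`` attribute."""

    def __init__(self, name):
        # Deliberately stored under a different attribute name.
        self._wanted = name


def element_name_for(strainer, default="table"):
    """Return ``strainer.name`` if it exists, otherwise ``default``."""
    return getattr(strainer, "name", default)


# Direct attribute access fails on the stand-in without ``name``,
# mirroring the AttributeError in the traceback above.
try:
    NewStyleStrainer("table").name
except AttributeError as exc:
    direct_access_error = str(exc)

# The guarded lookup works for both stand-ins.
old_result = element_name_for(OldStyleStrainer("table"))
new_result = element_name_for(NewStyleStrainer("table"))
print(old_result, new_result)
```

Whether a fallback like this belongs in pandas, or whether bs4 should keep exposing the attribute, is the open question raised above.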
Expected Behavior
No exception is raised.
Installed Versions
INSTALLED VERSIONS
commit : 4241ba5
python : 3.12.2.final.0
python-bits : 64
OS : Darwin
OS-release : 23.4.0
Version : Darwin Kernel Version 23.4.0: Fri Mar 15 00:12:49 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6020
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 3.0.0.dev0+644.g4241ba5e1
numpy : 1.26.0
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 69.1.0
pip : 24.0
Cython : None
pytest : 8.1.1
hypothesis : 6.98.9
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.22.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.13.0b2
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.3
numba : None
numexpr : None
odfpy : None
openpyxl : None
pyarrow : 15.0.2
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.12.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None