Description
I presume the problem is that the data is parsed first and the header row is selected only afterwards. When a column's dtype is numeric, the cell that should become the column name is not a valid number, so it becomes NaN.
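To illustrate the order-of-operations effect I am guessing at, here is a minimal pandas-free sketch. It is not the actual `read_html` code path; `to_number` and the variable names are hypothetical helpers made up for this illustration.

```python
import math


def to_number(cell):
    """Return the cell as a float, or NaN if it is not a valid number."""
    try:
        return float(cell)
    except ValueError:
        return math.nan


# The "Year" column of data2 as raw strings: header cell first, then the data.
year_column = ["Year", "1944"]

# Presumed order: convert the whole column first, then take row 0 as the name.
converted = [to_number(cell) for cell in year_column]
name_if_parsed_first = converted[0]  # NaN, because "Year" is not a number

# Alternative order: take the name first, then convert only the remaining cells.
name_if_header_first = year_column[0]
values = [to_number(cell) for cell in year_column[1:]]
```

With the first ordering the column name is lost before the header is ever selected; with the second it survives as the string "Year".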
Sample data:
import io
import pandas as pd

data1 = io.StringIO(u'''<table>
<thead>
<tr>
<th>Country</th>
<th>Municipality</th>
<th>Year</th>
</tr>
</thead>
<tbody>
<tr>
<td>Ukraine</td>
<th>Odessa</th>
<td>1944</td>
</tr>
</tbody>
</table>''')
data2 = io.StringIO(u'''
<table>
<tbody>
<tr>
<th>Country</th>
<th>Municipality</th>
<th>Year</th>
</tr>
<tr>
<td>Ukraine</td>
<th>Odessa</th>
<td>1944</td>
</tr>
</tbody>
</table>''')
Output:
>>> pd.read_html(data1)[0]
   Country Municipality  Year
0  Ukraine       Odessa  1944
>>> pd.read_html(data2, header=0)[0]
0  Country Municipality   NaN
1  Ukraine       Odessa  1944