Would it be possible to add position information (line and column) to text nodes, or at least make it available to the tree builder? I implemented a minimal proof of concept that attaches the position to each token and passes it along to the DOM tree builder. It produces the following result:
import html5lib

html = '<div>&amp;<p>b<span>c</span></p> cab</div>'
parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
doc = parser.parse(html)

def parse(n):
    # Walk the tree depth-first and print each node's position.
    for c in n.childNodes:
        # 'sourcepos' is the attribute added by the proof of concept.
        if hasattr(c, 'sourcepos'):
            print(c.sourcepos, c)
        parse(c)

parse(doc)
None <DOM Element: head at 0x10bbed0d0>
None <DOM Element: body at 0x10bbed1f0>
(1, 5) <DOM Element: div at 0x10bbfb790>
(1, 10) <DOM Text node "'&'">
(1, 13) <DOM Element: p at 0x10bbfb820>
(1, 14) <DOM Text node "'b'">
(1, 20) <DOM Element: span at 0x10bbfb8b0>
(1, 21) <DOM Text node "'c'">
(1, 33) <DOM Text node "' '">
(1, 36) <DOM Text node "'cab'">
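
For illustration only, here is a minimal sketch of the bookkeeping needed to turn a character offset in the input into a (line, column) pair. `PositionIndex` and its `position` method are hypothetical names made up for this example. They are not part of html5lib and not the code from my proof of concept. The sketch assumes 1-based lines and 0-based columns counted in characters.

```python
from bisect import bisect_right


class PositionIndex:
    """Hypothetical helper (not html5lib API): maps offsets to (line, column)."""

    def __init__(self, text):
        # Offsets at which each line starts; line 1 always starts at offset 0.
        self.line_starts = [0]
        for i, ch in enumerate(text):
            if ch == "\n":
                self.line_starts.append(i + 1)

    def position(self, offset):
        # The number of line starts <= offset is the 1-based line number.
        line = bisect_right(self.line_starts, offset)
        # The column is the distance from the start of that line.
        column = offset - self.line_starts[line - 1]
        return (line, column)


index = PositionIndex('<div>&amp;<p>b<span>c</span></p> cab</div>')
print(index.position(5))   # offset just past '<div>'
print(index.position(10))  # offset just past '&amp;'
```

On this single-line input the two calls print `(1, 5)` and `(1, 10)`. Those match the values shown for the `div` element and the `&` text node if the positions are taken at the end of each token, which is what the output above suggests.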
I would be willing to implement it.