Add position information for text nodes #533

Open
@corynezin

Would it be possible to add position information (line and column) to text nodes, or at least to make it available to the tree builder? I implemented a minimal proof of concept that attaches the position to each token and passes it along to the DOM tree builder, which gives the following result:

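import html5lib

html = '<div>&amp;<p>b<span>c</span></p> cab</div>'

parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
doc = parser.parse(html)


def print_positions(node):
    # Walk the tree depth-first. `sourcepos` is set by the proof-of-concept
    # patch, so nodes from an unpatched html5lib will not have it.
    for child in node.childNodes:
        if hasattr(child, 'sourcepos'):
            print(child.sourcepos, child)
        print_positions(child)


print_positions(doc)

Output:
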
None <DOM Element: head at 0x10bbed0d0>
None <DOM Element: body at 0x10bbed1f0>
(1, 5) <DOM Element: div at 0x10bbfb790>
(1, 10) <DOM Text node "'&'">
(1, 13) <DOM Element: p at 0x10bbfb820>
(1, 14) <DOM Text node "'b'">
(1, 20) <DOM Element: span at 0x10bbfb8b0>
(1, 21) <DOM Text node "'c'">
(1, 33) <DOM Text node "' '">
(1, 36) <DOM Text node "'cab'">

I would be willing to implement it.
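
For reference, here is a minimal sketch of how a (line, column) pair like the ones in the output above could be derived from a character offset into the input. It is not the proof of concept itself and not part of html5lib: `LineIndex` and `position` are hypothetical names, and the convention (1-based line, column equal to the number of characters consumed on that line) is only inferred from the output shown above.

```python
from bisect import bisect_right


class LineIndex:
    """Hypothetical helper (not an html5lib API): maps offsets to (line, column)."""

    def __init__(self, text):
        # Offsets at which each line starts; line 1 always starts at offset 0.
        self._starts = [0]
        for i, ch in enumerate(text):
            if ch == "\n":
                self._starts.append(i + 1)

    def position(self, offset):
        # Number of line starts at or before `offset` gives the 1-based line.
        line = bisect_right(self._starts, offset)
        # Column counts the characters consumed since the start of that line.
        column = offset - self._starts[line - 1]
        return line, column


html = '<div>&amp;<p>b<span>c</span></p> cab</div>'
index = LineIndex(html)
print(index.position(5))   # offset just after '<div>'
print(index.position(10))  # offset just after '&amp;'
```

Under these assumptions, a tokenizer that knows how many characters it has consumed when it emits a token could look up the position once per token instead of tracking line and column character by character.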
