
DOM bogusly ending up with adjacent text nodes given a character reference #208

Open
@gsnedders

Description

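import html5lib

# &thinsp; is a named character reference for U+2009 (thin space)
string = u"<p>name&thinsp;: value</p>"
dom = html5lib.parse(string, treebuilder="dom")
# Count the child nodes of the first <p> element
print(len(dom.getElementsByTagName("p")[0].childNodes))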

This prints 3: the text is split into three adjacent text nodes around the \u2009 produced by the character reference. Per spec, we should have one child text node, I'm pretty sure. This only shows up with the dom treebuilder because etree/lxml can't have adjacent text nodes in their data model.
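Until the treebuilder is fixed, one possible workaround is to merge the adjacent text nodes after parsing with the standard DOM `normalize()` method. The sketch below is only an illustration, not the fix for html5lib itself: it builds the three-node shape by hand with `xml.dom.minidom` (the helper `build_split_paragraph` is hypothetical) so that it runs without html5lib installed, and it assumes the parsed tree is a minidom tree.

```python
from xml.dom import minidom


def build_split_paragraph():
    """Build a <p> with three adjacent text nodes, mimicking the reported output.

    Hypothetical helper for illustration; it is not part of html5lib.
    """
    doc = minidom.Document()
    p = doc.createElement("p")
    doc.appendChild(p)
    # minidom's appendChild does not merge adjacent text nodes
    for chunk in (u"name", u"\u2009", u": value"):
        p.appendChild(doc.createTextNode(chunk))
    return p


p = build_split_paragraph()
before = len(p.childNodes)       # three adjacent text nodes, as in the report

p.normalize()                    # standard DOM call: merges adjacent text nodes
after = len(p.childNodes)
merged_text = p.firstChild.data  # the single remaining text node
```

With a tree returned by `html5lib.parse(string, treebuilder="dom")`, the same `normalize()` call on the `<p>` element (or on the whole document) should collapse the split text into one node.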
