etree lxml.etree walker can't serialize full documents #345

Open
@cjerdonek

The etree tree walker with implementation=lxml.etree fails when passed a full HTML document (an object of type lxml.etree._ElementTree).

To reproduce:

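import html5lib
import lxml.etree
from html5lib.serializer import HTMLSerializer


def serialize(element, treebuilder, implementation=None):
    # Look up the walker class for the given tree type and walk the parsed tree.
    walker_cls = html5lib.getTreeWalker(treebuilder, implementation=implementation)
    walker = walker_cls(element)
    serializer = HTMLSerializer(omit_optional_tags=False)
    html = serializer.render(walker)
    print(html)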

html = """<!DOCTYPE html>
<html>
<head>
    <title>foo</title>
</head>
<body>
    <p>a</p><p>b</p>
</body>
</html>
"""

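# Parse with the lxml tree builder; the result is an lxml.etree._ElementTree.
builder = html5lib.getTreeBuilder('lxml')
parser = html5lib.HTMLParser(builder, namespaceHTMLElements=False)
element = parser.parse(html)

# The dedicated lxml walker handles the full document.
serialize(element, 'lxml')
# The generic etree walker backed by lxml.etree raises AttributeError.
serialize(element, 'etree', implementation=lxml.etree)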

The last line fails with the following error:

Traceback (most recent call last):
  File "test-html5lib.py", line 98, in <module>
    serialize(element, 'etree', implementation=lxml.etree)
  File "test-html5lib.py", line 79, in serialize
    html = serializer.render(walker)
  File "/.../python3.6/site-packages/html5lib/serializer.py", line 323, in render
    return "".join(list(self.serialize(treewalker)))
  File "/.../python3.6/site-packages/html5lib/serializer.py", line 209, in serialize
    for token in treewalker:
  File "/.../python3.6/site-packages/html5lib/treewalkers/base.py", line 128, in __iter__
    firstChild = self.getFirstChild(currentNode)
  File "/.../python3.6/site-packages/html5lib/treewalkers/etree.py", line 88, in getFirstChild
    if element.text:
AttributeError: 'lxml.etree._ElementTree' object has no attribute 'text'

The walker should probably call root = element.getroot() first, so that it operates on the root element rather than on the _ElementTree wrapper. This seems to be on the same wavelength as the issue with treewalkers/etree.py that I described in this comment: #338 (comment)
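As a rough illustration of the getroot() idea, here is a minimal sketch of the unwrapping step. It is not html5lib code: unwrap_root is a hypothetical helper name, and the standard library's xml.etree.ElementTree stands in for lxml.etree here, on the assumption that both expose a tree wrapper that has getroot() but no text attribute.

```python
import xml.etree.ElementTree as ET


def unwrap_root(node):
    """Return the root element if `node` is a whole-document wrapper.

    Hypothetical helper, not part of html5lib: anything that lacks a
    `tag` attribute but offers `getroot()` is treated as a tree wrapper.
    """
    if not hasattr(node, "tag") and hasattr(node, "getroot"):
        return node.getroot()
    return node


# Stand-in for the parsed document: a stdlib ElementTree wraps its root
# element and has no `text` attribute of its own.
root = ET.fromstring(
    "<html><head><title>foo</title></head>"
    "<body><p>a</p><p>b</p></body></html>"
)
tree = ET.ElementTree(root)

# Wrappers are unwrapped; plain elements pass through unchanged.
assert unwrap_root(tree) is root
assert unwrap_root(root) is root
```

A caller could apply the same step before constructing the walker, for example walker_cls(unwrap_root(element)). Whether that is enough for the lxml-backed etree walker is untested here, and starting from the root element would presumably omit the doctype from the serialized output.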
