Closed
Description
>>> html5lib.serializer.serialize(html5lib.parse('<p>&nbsp;</p>'))
'<p>\xa0'
At the moment, parsing and then serialising a document converts entities into the literal special characters they stand for, including things like #00, and there is no way to pass additional entities through to xml.sax.saxutils.escape.
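For reference, here is a small sketch of what passing extra entities to `xml.sax.saxutils.escape` looks like outside the serialiser. The `EXTRA_ENTITIES` mapping is my own illustration, not something html5lib defines.

```python
from xml.sax.saxutils import escape

# Illustrative mapping (an assumption, not part of html5lib):
# extra replacements applied after the default &, <, > escaping.
EXTRA_ENTITIES = {"\xa0": "&nbsp;"}

text = "a\xa0b & c"

# Default behaviour: only &, < and > are escaped, U+00A0 is left as-is.
default_escaped = escape(text)

# With the extra mapping, the non-breaking space becomes a named entity.
custom_escaped = escape(text, EXTRA_ENTITIES)

print(repr(default_escaped))
print(repr(custom_escaped))
```

The second argument is applied after the built-in replacements, so the `&` in `&nbsp;` is not escaped a second time.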
I looked into subclassing the serialiser but the escaping happens in the middle of the serialize() method at:
https://github.com/html5lib/html5lib-python/blob/master/html5lib/serializer/htmlserializer.py#L223
Perhaps the class should define an entities dict, so that the standard HTML5 entities and special characters can be passed through, or do the escaping via a class method that can be overridden?
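Roughly what I have in mind, as a sketch only. `SketchSerializer`, `escape_text`, `serialize_text` and `NbspSerializer` are hypothetical names for illustration and are not html5lib's actual classes or methods.

```python
from xml.sax.saxutils import escape


class SketchSerializer:
    """Hypothetical stand-in for a serialiser; not html5lib's real class."""

    # Characters to emit as entities; subclasses may extend this dict.
    entities = {}

    def escape_text(self, text):
        # Overridable hook: default escaping plus the class-level entities.
        return escape(text, self.entities)

    def serialize_text(self, text):
        # The serialiser calls the hook instead of escaping inline.
        return self.escape_text(text)


class NbspSerializer(SketchSerializer):
    # Subclass that keeps non-breaking spaces as a named entity.
    entities = {"\xa0": "&nbsp;"}


print(repr(SketchSerializer().serialize_text("a\xa0<b")))
print(repr(NbspSerializer().serialize_text("a\xa0<b")))
```

Either half would be enough on its own: the dict covers the common case of extra entities, while the method lets a subclass replace the escaping entirely.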