Preserve order of attributes on serialization #37

Closed
@gsnedders

From Google Code #153:

Reported by @fantasai, Jun 1, 2010

What steps will reproduce the problem?
Parse an XHTML file containing attributes in unsorted order with lxml and reserialize.

What is the expected output? What do you see instead?
Expect no change.
Got attributes in alphabetical order, which makes the source harder to read (the original order was chosen to optimize readability, e.g. listing the fixed-length rel="stylesheet" before the variable-length href="..."). This also makes diffs harder to understand, since there are a lot of unnecessary changes to the source output.

Ideally, html5lib would remember the order of attributes and reserialize in that order. lxml does remember the order, so removing the attrs.sort() line in htmlserializer.py is adequate to fix the problem for serializing an lxml tree.
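To make the effect of that sort concrete, here is a minimal sketch. `serialize_start_tag` and its `sort_attrs` flag are hypothetical names made up for illustration, not html5lib's actual API; the sketch only assumes that attributes arrive as (name, value) pairs in document order.

```python
from html import escape


def serialize_start_tag(name, attrs, sort_attrs=True):
    """Hypothetical helper (not html5lib's API): render a start tag.

    attrs is an iterable of (name, value) pairs in document order.
    """
    items = list(attrs)
    if sort_attrs:
        # Stand-in for the attrs.sort() call discussed above.
        items.sort()
    rendered = "".join(
        ' {}="{}"'.format(key, escape(value, quote=True)) for key, value in items
    )
    return "<{}{}>".format(name, rendered)


# Attributes in the order the author wrote them.
attrs = [("rel", "stylesheet"), ("href", "style.css")]

sorted_tag = serialize_start_tag("link", attrs)
preserved_tag = serialize_start_tag("link", attrs, sort_attrs=False)

print(sorted_tag)     # href moves ahead of rel
print(preserved_tag)  # source order is kept
```

With the sort in place the `href` attribute is emitted before `rel`; without it the tag round-trips in the order it was written.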

Jul 20, 2010 geoffers
AFAIK the reason for the sort being there is so that there is a guaranteed order even when a tree-builder with no guaranteed order is being used.
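A small sketch of that guarantee, assuming a tree-builder that may hand back the same attributes in any order. `attribute_string` is a hypothetical helper, not part of html5lib, and the shuffle only simulates an unordered tree-builder.

```python
import random


def attribute_string(attrs):
    """Hypothetical helper: join (name, value) pairs after sorting by name."""
    return " ".join('{}="{}"'.format(key, value) for key, value in sorted(attrs))


pairs = [("rel", "stylesheet"), ("href", "style.css"), ("media", "screen")]

# Simulate a tree-builder that yields the same attributes in another order.
shuffled = pairs[:]
random.Random(0).shuffle(shuffled)

canonical = attribute_string(pairs)

# Sorting makes the output independent of the order the attributes arrived in.
assert attribute_string(shuffled) == canonical
```

Whatever order the attributes arrive in, the sorted output is identical, which is the consistency the sort buys at the cost of losing the source order.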

May 22, 2011 geoffers
There's no real way to fix this without relying upon defined-to-be-undefined behaviour in CPython/lxml, and as such I'm reluctant to do so. lxml says attributes are given in an arbitrary order, and they are stored in a dict, whose order CPython does not guarantee. (lxml does always insert attributes into the dict in document order, and in practice dicts are ordered by insertion order, so it does actually work… for now, at least.)

Yes, we could go against both the lxml and CPython documentation and rely upon the ordering, but if either ever changes its behaviour, html5lib could start serializing the same lxml parse tree in random ways, and I'd much rather go for the definitely-consistent route we have now.
