Currently the `re` module supports only simple character set syntax. A future version may support extended syntax with nested sets and set operations. Unfortunately, that syntax is not fully compatible with the current one: in particular, an opening bracket `[` inside a character set would start a nested set. The html5lib code contains a regular expression that will break if the new syntax is accepted:
ascii_punctuation_re = re.compile("[\u0009-\u000D\u0020-\u002F\u003A-\u0040\u005B-\u0060\u007B-\u007E]")
It would be good to guard the code against this possible future breakage. Adding a backslash before `[` is enough: replace `\u005B` with `\u005C\u005B`, `\\\u005B` or `\\[`.
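As a rough check that the proposed escape does not change behaviour under the current `re` syntax, the sketch below compares the characters matched by the existing pattern and by a guarded variant that uses the `\\\u005B` spelling. The names `original`, `guarded` and `matched_chars` are illustrative only and do not come from html5lib.

```python
import re

# Existing pattern: the "[" (U+005B) is unescaped inside the character set.
original = "[\u0009-\u000D\u0020-\u002F\u003A-\u0040\u005B-\u0060\u007B-\u007E]"

# Guarded variant: "\\" in a non-raw literal is a single backslash,
# so the regex engine sees an escaped "[" as the start of the range.
guarded = "[\u0009-\u000D\u0020-\u002F\u003A-\u0040\\\u005B-\u0060\u007B-\u007E]"


def matched_chars(pattern):
    """Return the set of ASCII characters that the pattern matches."""
    regex = re.compile(pattern)
    return {chr(cp) for cp in range(0x80) if regex.fullmatch(chr(cp))}


# Both spellings should accept exactly the same characters today.
same = matched_chars(original) == matched_chars(guarded)
print(same)
```

If the two sets are equal, the escaped form is a drop-in replacement for the current pattern while no longer relying on `[` being a plain literal inside a set.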
See Python issue: https://bugs.python.org/issue30349.