Incompatibility with possible future regular expression syntax #347

Closed
@serhiy-storchaka

Currently the re module supports only simple set syntax. It is possible that in the future it will support an extended syntax with nested sets and set operations. Unfortunately, that syntax is not fully compatible with the current one. In particular, an open bracket '[' inside a character set would start a nested set. The html5lib code contains a regular expression that would break if the new syntax is accepted.

ascii_punctuation_re = re.compile("[\u0009-\u000D\u0020-\u002F\u003A-\u0040\u005B-\u0060\u007B-\u007E]")

It would be good to guard the code against possible future breakage. It is enough to add a backslash before [, that is, to replace \u005B with \u005C\u005B, \\\u005B or \\[.
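As a minimal sketch of the suggested change (not the actual html5lib patch), the snippet below checks that escaping the bracket does not alter which characters the set matches under the current syntax. The names ORIGINAL, GUARDED and matched_chars are hypothetical and exist only for this illustration; the check is limited to the ASCII range, since every range in the pattern lies within it.

```python
import re

# Pattern as quoted above: U+005B ('[') is a bare literal inside the set.
ORIGINAL = "[\u0009-\u000D\u0020-\u002F\u003A-\u0040\u005B-\u0060\u007B-\u007E]"

# The three suggested spellings all yield the same two characters in a
# regular (non-raw) string literal: a backslash followed by '['.
assert "\u005C\u005B" == "\\\u005B" == "\\["

# Same pattern with the bracket escaped (hypothetical name, for illustration).
GUARDED = "[\u0009-\u000D\u0020-\u002F\u003A-\u0040\\[-\u0060\u007B-\u007E]"


def matched_chars(pattern, limit=0x80):
    """Return the set of characters below `limit` that the pattern matches."""
    regex = re.compile(pattern)
    return {chr(cp) for cp in range(limit) if regex.fullmatch(chr(cp))}


original_set = matched_chars(ORIGINAL)
guarded_set = matched_chars(GUARDED)

# Both spellings should accept exactly the same characters today.
assert original_set == guarded_set
```

Because the escaped form is already valid in the current syntax, the change can be made ahead of any decision on the extended syntax.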

See Python issue: https://bugs.python.org/issue30349.
