Refactor codecs error handlers to use _PyUnicodeError_GetParams and extract complex logic into separate functions #129173

Closed
@picnixz

Description

Feature or enhancement

Proposal:

I want to refactor the codec error handlers in Python/codecs.c to use _PyUnicodeError_GetParams. Some handlers will be refactored as part of #126004, but others have no known problems (namely, the ignore, namereplace, surrogateescape, and surrogatepass handlers do not suffer from crashes, or at least I wasn't able to make them crash easily).
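For readers less familiar with the four handlers named above, here is a minimal sketch of their Python-visible behavior. It only exercises the public `errors=` argument of `str.encode` and `bytes.decode`; it does not touch the C implementation or `_PyUnicodeError_GetParams`, and the sample strings are illustrative choices rather than anything from the issue.

```python
# Behavior of the four handlers mentioned above, seen from Python code.
# The inputs below are illustrative assumptions, not taken from the issue.

# "ignore": characters that cannot be encoded are silently dropped.
ignored = "caf\u00e9".encode("ascii", errors="ignore")

# "namereplace": unencodable characters become \N{...} escapes (encoding only).
named = "caf\u00e9".encode("ascii", errors="namereplace")

# "surrogateescape": undecodable bytes map to lone surrogates U+DC80..U+DCFF,
# so the original bytes can be recovered when encoding again.
raw = b"caf\xe9"
escaped = raw.decode("utf-8", errors="surrogateescape")
roundtrip = escaped.encode("utf-8", errors="surrogateescape")

# "surrogatepass": lone surrogates are encoded and decoded as-is
# instead of raising an error.
passed = "\ud800".encode("utf-8", errors="surrogatepass")
restored = passed.decode("utf-8", errors="surrogatepass")

print(ignored, named, escaped.encode("unicode_escape"), roundtrip, passed)
```

Each of these handlers is implemented in Python/codecs.c, which is the code the refactoring targets.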

In addition, I plan to split each handler into separate functions, instead of 2 or 3 big blocks of code that each handle a specific exception type. To that end, I will introduce the following helper macros:

#define _PyIsUnicodeEncodeError(EXC)    \
    PyObject_TypeCheck(EXC, (PyTypeObject *)PyExc_UnicodeEncodeError)
#define _PyIsUnicodeDecodeError(EXC)    \
    PyObject_TypeCheck(EXC, (PyTypeObject *)PyExc_UnicodeDecodeError)
#define _PyIsUnicodeTranslateError(EXC) \
    PyObject_TypeCheck(EXC, (PyTypeObject *)PyExc_UnicodeTranslateError)
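The shape of the planned split can be sketched at the Python level with a custom error handler. This is only an analogy under stated assumptions: the handler name `demo_split` and the helper functions are hypothetical, the `isinstance` checks stand in for the `_PyIsUnicode*Error` macros, and the actual C functions may be organized differently.

```python
import codecs

# Hypothetical helpers: one small function per exception type, mirroring the
# idea of extracting each branch of a handler into its own function.

def _demo_handle_encode(exc):
    # Replace each unencodable character with "?" and resume after the span.
    return ("?" * (exc.end - exc.start), exc.end)

def _demo_handle_decode(exc):
    # Replace the undecodable bytes with U+FFFD and resume after the span.
    return ("\ufffd", exc.end)

def _demo_handle_translate(exc):
    # Replace each untranslatable character with "?".
    return ("?" * (exc.end - exc.start), exc.end)

def demo_split_handler(exc):
    # Thin dispatcher: the type checks play the role of the helper macros.
    if isinstance(exc, UnicodeEncodeError):
        return _demo_handle_encode(exc)
    if isinstance(exc, UnicodeDecodeError):
        return _demo_handle_decode(exc)
    if isinstance(exc, UnicodeTranslateError):
        return _demo_handle_translate(exc)
    raise TypeError(f"don't know how to handle {type(exc).__name__}")

# "demo_split" is a made-up handler name used only for this sketch.
codecs.register_error("demo_split", demo_split_handler)

encoded = "caf\u00e9".encode("ascii", errors="demo_split")
decoded = b"caf\xe9".decode("ascii", errors="demo_split")
print(encoded, decoded.encode("unicode_escape"))
```

With this layout the dispatcher stays short, and each per-exception function can be read and reviewed on its own.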

For handlers that need to be fixed, I will first fix them in place, without refactoring. Afterwards, I will refactor them and extract the relevant parts of the code into functions. That way, the diff will be easier to follow (I've observed that a diff doing both is much harder to read, so I will revert that part in the existing PRs; EDIT: actually, no PR does both the fixes and the split...).

I'm creating this issue to track the progress of the refactoring, assuming no problems come up.

cc @vstinner @encukou

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response
