Implement ability to toggle inplace activation operations in the implemented architectures #6699

Open
@javierbg

Description

@javierbg

🚀 The feature

A lot of the available architectures like AlexNet, EfficientNet, ResNet, DenseNet, etc. are implemented with inplace activation operations, e.g.:

nn.ReLU(inplace=True)

This is done in the name of saving memory (#807)

This could be toggled using a parameter for the model-building functions.
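As a rough sketch of what threading such a parameter through a builder could look like (the `make_conv_block` helper and the `inplace_activations` name are illustrative assumptions, not torchvision's actual API):

```python
import torch.nn as nn

def make_conv_block(in_ch: int, out_ch: int, inplace_activations: bool = True) -> nn.Sequential:
    # Hypothetical builder: the proposed flag is forwarded to every
    # activation it constructs, defaulting to the current behavior.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=inplace_activations),
    )

block = make_conv_block(3, 8, inplace_activations=False)
```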

Motivation, pitch

Often, this presents a problem when trying to register full backward hooks, as you get errors like the following:

UserWarning: Output 0 of BackwardHookFunctionBackward is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is deprecated and will be forbidden starting version 1.6. You can remove this warning by cloning the output of the custom Function. (Triggered internally at /pytorch/torch/csrc/autograd/variable.cpp:547.)

This is annoying when trying to use libraries like Captum, which need to attach these hooks in layer attribution methods like LRP:

Model cannot contain any in-place nonlinear submodules; these are not supported by the register_full_backward_hook PyTorch API starting from PyTorch v1.9.

Adding a parameter (something like inplace_activations, defaulting to True) would maintain compatibility with old code while providing an option for anyone who needs it.

Alternatives

Currently, my only option is to copy the model code in question (e.g. ResNet) and substitute every inplace=True with inplace=False. This approach works, but it does not scale when trying to benchmark as many architectures as possible.
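A lighter-weight workaround, sketched below, is to rewrite the activations of an already-built model instead of copying its source. This is an assumption on my part rather than a supported torchvision mechanism: it mutates the model and only handles nn.ReLU, so other inplace activations (e.g. nn.SiLU, nn.Hardswish in some architectures) would need the same treatment.

```python
import torch.nn as nn

def disable_inplace(module: nn.Module) -> nn.Module:
    # Recursively replace every inplace ReLU with an out-of-place one,
    # leaving all other submodules untouched.
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU) and child.inplace:
            setattr(module, name, nn.ReLU(inplace=False))
        else:
            disable_inplace(child)
    return module
```

For example, `disable_inplace(torchvision.models.resnet18())` should leave no inplace ReLUs behind, after which register_full_backward_hook can be attached without the view+inplace warning.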

There may be other implementation approaches that libraries like Captum could take in order to make these methods work with inplace operations. Nonetheless, the proposed solution appears simpler and would allow many already-implemented methods to work out of the box with the standard implementations in torchvision.

Additional context

I'm willing to implement this myself, but I'm having trouble setting up a torchvision development environment in Arch. I might need additional help for that.

cc @datumbox
