Closed
Description
In this tutorial here it says in the comment that "# We don't need gradients on to do reporting". From what I understand the train flag only affects layers such as dropout and batch-normalization. Does it also affect gradient calculations, or is this comment wrong?
cc @suraj813