
Bug in Spatial Transformer Network? #1117

Open
@theRealSuperMario

Description


Hi,

I am opening this issue because I noticed weird behavior in the spatial transformer networks implementation when the input is normalised with `transforms.Normalize((0.1307,), (0.3081,))`.

I summarized my findings here. In short, what is happening is that when the input is normalised and then fed to the STN, the `F.grid_sample` call adds zero-padding; however, the normalisation changes the background value from 0 to -mean/std, so the padded regions no longer match the background.
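A minimal sketch of the mismatch, assuming the MNIST normalisation constants above (the zoomed-out affine matrix is just an illustrative choice):

```python
import torch
import torch.nn.functional as F

mean, std = 0.1307, 0.3081

# An all-background MNIST image after Normalize: raw 0 -> (0 - mean) / std
img = torch.full((1, 1, 28, 28), (0.0 - mean) / std)

# A zoomed-out affine transform, so the sampling grid leaves the input
theta = torch.tensor([[[1.5, 0.0, 0.0], [0.0, 1.5, 0.0]]])
grid = F.affine_grid(theta, img.size(), align_corners=False)
out = F.grid_sample(img, grid, align_corners=False)  # pads with zeros

print(img[0, 0, 0, 0].item())  # the normalised background, about -0.4242
print(out[0, 0, 0, 0].item())  # 0.0 from zero-padding, not the background
```

The padded corners end up at 0 while the true background sits at about -0.4242, which gives the localization network an easy but wrong signal.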

This causes the STN to collapse very early and never learn the correct transformation. You can already see this in the example code (https://pytorch.org/tutorials/intermediate/spatial_transformer_tutorial.html): the learnt transformation zooms OUT instead of zooming IN on the digits. For the original 28 x 28 images this is not such a big problem; however, when you move on to cluttered MNIST, as in the original publication, the difference is huge. Once again, please have a look here.
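One possible workaround is to shift the input so that the background really is zero before calling `F.grid_sample`, and shift back afterwards; then the padded regions match the background. This is a sketch, not the tutorial's code, and `stn_sample` is a hypothetical helper name:

```python
import torch
import torch.nn.functional as F

mean, std = 0.1307, 0.3081
bg = (0.0 - mean) / std  # value a raw-0 background pixel takes after Normalize

def stn_sample(x, theta):
    # Subtract the background so zero-padding is indistinguishable from it,
    # sample, then add the background back.
    grid = F.affine_grid(theta, x.size(), align_corners=False)
    return F.grid_sample(x - bg, grid, align_corners=False) + bg

# An all-background image stays all-background even where padding occurs
x = torch.full((1, 1, 28, 28), bg)
theta = torch.tensor([[[1.5, 0.0, 0.0], [0.0, 1.5, 0.0]]])
out = stn_sample(x, theta)
```

Alternatively, newer PyTorch versions let you pick `padding_mode='border'` in `F.grid_sample`, which can also avoid the artificial zero border, depending on the image content.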

I think the tutorial for the STN should be updated and should also include the cluttered MNIST example, because that is what drives the point home. I would volunteer to do so if given permission to go ahead.

Unfortunately, most other implementations I was able to find on the web also have this bug.

cc @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen
