GeneralizedRCNNTransform outputs images and targets with different shapes #6213

### 🐛 Describe the bug

I would expect the shapes of the output images and targets to be consistent with each other (that is, to have the same height and width).

However, `torchvision.models.detection.transform.GeneralizedRCNNTransform` calls `self.batch_images` at the end of its `forward` function on the images only, so the output images and targets usually do not have the same shapes.

Example:
```python
import torch
from torchvision.models.detection.transform import GeneralizedRCNNTransform

# Initialize the transform (min_size=800, max_size=1333)
image_mean = [0.485, 0.456, 0.406]
image_std = [0.229, 0.224, 0.225]
T = GeneralizedRCNNTransform(800, 1333, image_mean, image_std)

# Initialize one image and its target
images = [torch.randn([3, 1000, 900])]
targets = [{'boxes': torch.tensor([[0, 0, 1000, 900]]), 'masks': torch.randn([1, 1000, 900]).byte()}]

# Run the transform
images, targets = T(images, targets)

print(images.tensors.shape)
print(images.image_sizes)
print(targets[0]['boxes'])
print(targets[0]['masks'].shape)
```

Output:

```
torch.Size([1, 3, 896, 800])
[(888, 800)]
tensor([[  0.0000,   0.0000, 888.8889, 799.2000]])
torch.Size([1, 888, 800])
```

The default `size_divisible` parameter in `self.batch_images` is 32. Since 888 is not divisible by 32, the image is padded from (888, 800) to (896, 800). However, `'boxes'`, `'masks'`, and even `images.image_sizes` are left unchanged.
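To make the arithmetic concrete, here is a minimal sketch of how the masks could be padded to the same height and width as the batched image tensor. The helper name `pad_masks_to_batch` is hypothetical (it is not part of torchvision), and it assumes a single-image batch in which the padding is zeros added on the bottom and right, rounding each side up to the next multiple of `size_divisible`. For a batch of several images the padded size would instead come from the largest image in the batch.

```python
import math

import torch
import torch.nn.functional as F


def pad_masks_to_batch(masks: torch.Tensor, size_divisible: int = 32) -> torch.Tensor:
    """Hypothetical helper: zero-pad (N, H, W) masks on the bottom/right so that
    H and W become the next multiples of size_divisible."""
    h, w = masks.shape[-2:]
    padded_h = math.ceil(h / size_divisible) * size_divisible
    padded_w = math.ceil(w / size_divisible) * size_divisible
    # F.pad expects (left, right, top, bottom) for the last two dimensions.
    return F.pad(masks, (0, padded_w - w, 0, padded_h - h))


# Same mask shape as in the output above: 888 rounds up to 896, 800 stays 800.
masks = torch.zeros(1, 888, 800, dtype=torch.uint8)
padded = pad_masks_to_batch(masks)
print(padded.shape)  # torch.Size([1, 896, 800])
```

Under the same bottom/right padding assumption, box coordinates would not need to change, because the origin of the image stays in the top-left corner.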

### Versions

Collecting environment information...
PyTorch version: 1.11.0
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.8.8 (default, Apr 13 2021, 19:58:26)  [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-5.11.0-37-generic-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 2070
Nvidia driver version: 470.57.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.20.1
[pip3] numpydoc==1.1.0
[pip3] pytorch-lightning==1.3.3
[pip3] torch==1.11.0
[pip3] torch-summary==1.4.5
[pip3] torchaudio==0.11.0
[pip3] torchmetrics==0.5.1
[pip3] torchvision==0.12.0
[pip3] vit-pytorch==0.22.0
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               11.3.1               h2bc3f7f_2  
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.2.0           h06a4308_296  
[conda] mkl-service               2.3.0            py38h27cfd23_1  
[conda] mkl_fft                   1.3.0            py38h42c9631_2  
[conda] mkl_random                1.2.1            py38ha9443f7_2  
[conda] numpy                     1.20.1           py38h93e21f0_0  
[conda] numpy-base                1.20.1           py38h7d8b39e_0  
[conda] numpydoc                  1.1.0              pyhd3eb1b0_1  
[conda] pytorch                   1.11.0          py3.8_cuda11.3_cudnn8.2.0_0    pytorch
[conda] pytorch-lightning         1.3.3                    pypi_0    pypi
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torch                     1.11.0                   pypi_0    pypi
[conda] torch-summary             1.4.5                    pypi_0    pypi
[conda] torchaudio                0.11.0               py38_cu113    pytorch
[conda] torchmetrics              0.5.1                    pypi_0    pypi
[conda] torchvision               0.12.0                   pypi_0    pypi
[conda] vit-pytorch               0.22.0                   pypi_0    pypi


cc @datumbox @vfdev-5 @YosuaMichael
