GeneralizedRCNNTransform output different shapes of images and targets #6213

Open
@lpq97lpq

🐛 Describe the bug

I would expect the output images and targets to have consistent shapes (that is, the same height and width).

However, since torchvision.models.detection.transform.GeneralizedRCNNTransform calls self.batch_images at the end of its forward function on the images only, the output images and targets usually do not have the same shape.

Example:

import torch
from torchvision.models.detection.transform import GeneralizedRCNNTransform

# Initialize the transform (min_size=800, max_size=1333)
image_mean = [0.485, 0.456, 0.406]
image_std = [0.229, 0.224, 0.225]
T = GeneralizedRCNNTransform(800, 1333, image_mean, image_std)

# Initialize one image (C, H, W) and its target
images = [torch.randn([3, 1000, 900])]
targets = [{'boxes': torch.tensor([[0, 0, 1000, 900]]), 'masks': torch.randn([1, 1000, 900]).byte()}]

# Run the transform and inspect the output shapes
images, targets = T(images, targets)

print(images.tensors.shape)
print(images.image_sizes)
print(targets[0]['boxes'])
print(targets[0]['masks'].shape)

Output:

torch.Size([1, 3, 896, 800])
[(888, 800)]
tensor([[ 0.0000, 0.0000, 888.8889, 799.2000]])
torch.Size([1, 888, 800])

The default size_divisible parameter of self.batch_images is 32. Since 888 is not divisible by 32, the image is padded from (888, 800) to (896, 800). However, 'boxes', 'masks', and even images.image_sizes are left unchanged.
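The padding arithmetic can be reproduced without torchvision. The following is a minimal sketch, assuming a single-image batch (with several images, the per-dimension maximum over the batch would be taken first). The helper names padded_size and pad_amounts are hypothetical and are not part of torchvision.

```python
import math


def padded_size(height, width, size_divisible=32):
    """Round each spatial dimension up to the next multiple of size_divisible.

    Hypothetical helper: mirrors the rounding described above for a
    single-image batch; it is not a torchvision function.
    """
    padded_h = int(math.ceil(height / size_divisible) * size_divisible)
    padded_w = int(math.ceil(width / size_divisible) * size_divisible)
    return padded_h, padded_w


def pad_amounts(height, width, size_divisible=32):
    """Return (extra_rows, extra_cols) needed to reach the padded size."""
    padded_h, padded_w = padded_size(height, width, size_divisible)
    return padded_h - height, padded_w - width


resized = (888, 800)  # images.image_sizes[0] from the output above
print(padded_size(*resized))  # (896, 800)
print(pad_amounts(*resized))  # (8, 0)
```

As a possible workaround, assuming the extra rows and columns are added at the bottom and right, the masks could be zero-padded by the same amounts, for example with torch.nn.functional.pad(masks, (0, extra_cols, 0, extra_rows)), so that they match images.tensors.shape[-2:].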

Versions

Collecting environment information...
PyTorch version: 1.11.0
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.8.8 (default, Apr 13 2021, 19:58:26) [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-5.11.0-37-generic-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 2070
Nvidia driver version: 470.57.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.20.1
[pip3] numpydoc==1.1.0
[pip3] pytorch-lightning==1.3.3
[pip3] torch==1.11.0
[pip3] torch-summary==1.4.5
[pip3] torchaudio==0.11.0
[pip3] torchmetrics==0.5.1
[pip3] torchvision==0.12.0
[pip3] vit-pytorch==0.22.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h2bc3f7f_2
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.2.0 h06a4308_296
[conda] mkl-service 2.3.0 py38h27cfd23_1
[conda] mkl_fft 1.3.0 py38h42c9631_2
[conda] mkl_random 1.2.1 py38ha9443f7_2
[conda] numpy 1.20.1 py38h93e21f0_0
[conda] numpy-base 1.20.1 py38h7d8b39e_0
[conda] numpydoc 1.1.0 pyhd3eb1b0_1
[conda] pytorch 1.11.0 py3.8_cuda11.3_cudnn8.2.0_0 pytorch
[conda] pytorch-lightning 1.3.3 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch 1.11.0 pypi_0 pypi
[conda] torch-summary 1.4.5 pypi_0 pypi
[conda] torchaudio 0.11.0 py38_cu113 pytorch
[conda] torchmetrics 0.5.1 pypi_0 pypi
[conda] torchvision 0.12.0 pypi_0 pypi
[conda] vit-pytorch 0.22.0 pypi_0 pypi

cc @datumbox @vfdev-5 @YosuaMichael
