Bug Description
When compiling the T5-base network (https://huggingface.co/t5-base), the following error is encountered:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
To Reproduce
Steps to reproduce the behavior:
- Run torch_tensorrt.compile with the t5-base model as input, using fp32 precision.
- Choose two fixed-size inputs of shape [1, 128] and [1, 128], and enable truncate_long_and_double with a 12 GB workspace.
- Pass in model keyword args to disable attention and hidden state outputs
- Run inference using the compiled model on two sample inputs.
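The steps above can be sketched roughly as follows. This is a reconstruction from the report, not the exact script used: the helper name build_inputs, the int32 dtype, the vocabulary bound, and the T5Model.from_pretrained keyword arguments are assumptions; the compilation itself is guarded so the sketch only attempts it on a machine with CUDA and torch_tensorrt available.

```python
import importlib.util
import torch

def build_inputs():
    # Two fixed-size [1, 128] integer inputs (encoder and decoder ids);
    # dtype and vocab bound are assumptions, not from the report
    input_ids = torch.randint(0, 32000, (1, 128), dtype=torch.int32)
    decoder_input_ids = torch.randint(0, 32000, (1, 128), dtype=torch.int32)
    return input_ids, decoder_input_ids

def compile_t5():
    import torch_tensorrt
    from transformers import T5Model

    # Keyword args disable attention/hidden-state outputs so the
    # traced graph returns plain tensors
    model = T5Model.from_pretrained(
        "t5-base",
        torchscript=True,
        output_attentions=False,
        output_hidden_states=False,
    ).eval().cuda()

    input_ids, decoder_input_ids = build_inputs()
    trt_model = torch_tensorrt.compile(
        model,
        inputs=[
            torch_tensorrt.Input(shape=(1, 128), dtype=torch.int32),
            torch_tensorrt.Input(shape=(1, 128), dtype=torch.int32),
        ],
        enabled_precisions={torch.float32},  # fp32 precision
        truncate_long_and_double=True,
        workspace_size=12 << 30,             # 12 GB workspace
    )
    # Inference on the two sample inputs is where the RuntimeError surfaces
    return trt_model(input_ids.cuda(), decoder_input_ids.cuda())

# Only attempt compilation when the environment supports it
if torch.cuda.is_available() and importlib.util.find_spec("torch_tensorrt"):
    compile_t5()
```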
Expected behavior
The model should compile successfully with Torch-TRT. In particular, internal device-mismatch issues should either be surfaced with a warning at compile time or otherwise not cause errors.
Environment
- Torch-TensorRT Version: 1.4.0.dev0+f43be5b6
- PyTorch Version: 1.14.0.dev20221114+cu116
- CPU Architecture: Intel Xeon CPU
- OS: Ubuntu 20.04
- How you installed PyTorch: pip
- Build command you used:
python setup.py develop
- Are you using local sources or building from archives: local
- Python version: 3.8.13
- CUDA version: 11.6
Additional context
The problem seems related to #1416, which was intended to address device-mismatch issues of this sort. Since this case is not caught by that PR, it likely arises in a different area, for example from an internal computation in a Torch block.
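For illustration, the kind of internal computation that produces this exact error is an index tensor created inside forward without an explicit device: torch.arange defaults to CPU, so feeding it to index_select on a CUDA input raises the device-mismatch RuntimeError. The Gather module below is a minimal hypothetical example of the pattern (it is not taken from the T5 or Torch-TensorRT source), along with the usual fix of pinning the index to the input's device:

```python
import torch

class Gather(torch.nn.Module):
    """Selects all columns of x via index_select.

    With fix_device=False the index is implicitly created on CPU, which
    works for CPU inputs but would raise
    "Expected all tensors to be on the same device ..." for CUDA inputs.
    """

    def __init__(self, fix_device: bool):
        super().__init__()
        self.fix_device = fix_device

    def forward(self, x):
        if self.fix_device:
            # Fix: create the index on the same device as the input
            idx = torch.arange(x.size(1), device=x.device)
        else:
            # Bug pattern: index tensor defaults to CPU
            idx = torch.arange(x.size(1))
        return torch.index_select(x, 1, idx)

# On CPU both variants run; on CUDA only the fixed one would
out = Gather(fix_device=True)(torch.randn(1, 128))
assert out.shape == (1, 128)
```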