Bug Description
I tried to use Torch-TensorRT to speed up BERT model inference, but ran into the following errors:
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [32,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [33,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [34,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [35,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [36,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [37,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [38,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [39,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [40,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [41,0,0] Assertion srcIndex < srcSelectDimSize failed.
CUDA initialization failure with error: 710. Please check your CUDA installation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
Segmentation fault (core dumped)
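For context, the `srcIndex < srcSelectDimSize` assertion fires when an index-select kernel receives an index at or beyond the size of the dimension being indexed. In a BERT model that most likely points at one of the embedding lookups. The following is a minimal sketch, not part of the failing script, of a check for ids that fall outside an embedding table. The helper names `_flatten` and `out_of_range`, the constant `TYPE_VOCAB_SIZE`, and the sample ids are hypothetical; the real table sizes should be read from `model.config` (for example `type_vocab_size`).

```python
from typing import Iterable, List


def _flatten(values) -> Iterable[int]:
    """Yield ints from a nested list, or from any object exposing .tolist()."""
    if hasattr(values, "tolist"):  # e.g. a torch tensor or numpy array
        values = values.tolist()
    if isinstance(values, (list, tuple)):
        for v in values:
            yield from _flatten(v)
    else:
        yield int(values)


def out_of_range(ids, table_size: int) -> List[int]:
    """Return the sorted distinct ids that fall outside [0, table_size)."""
    return sorted({i for i in _flatten(ids) if i < 0 or i >= table_size})


# Hypothetical size for illustration; use model.config.type_vocab_size in practice.
TYPE_VOCAB_SIZE = 2

# Hypothetical segment ids containing one invalid value (5).
segment_ids = [[0, 0, 1], [0, 5, 1]]
print(out_of_range(segment_ids, TYPE_VOCAB_SIZE))  # prints [5]
```

Running the same check on each of the three inputs against its own table size would show which lookup, if any, receives an invalid index.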
To Reproduce
from transformers import BertModel, BertTokenizer, BertConfig
import numpy as np
import torch
import torch_tensorrt
import time
print("VERSION:", torch_tensorrt.__version__)
# Creating a dummy input
test_batchsz = 128
tokens_tensor = torch.ones((test_batchsz, 20)).to(torch.int32).cuda()
segments_tensors = torch.zeros((test_batchsz, 20)).to(torch.int32).cuda()
mask_tensors = torch.ones((test_batchsz, 20)).to(torch.int32).cuda()
model = BertModel.from_pretrained("bert-base-chinese", torchscript=True)
torch_script_module = torch.jit.trace(model.eval().cuda(), (tokens_tensor, mask_tensors, segments_tensors))
trt_ts_module = torch_tensorrt.compile(
    torch_script_module.float(),
    inputs=[
        torch_tensorrt.Input(shape=[test_batchsz, 20], dtype=torch.int32),
        torch_tensorrt.Input(shape=[test_batchsz, 20], dtype=torch.int32),
        torch_tensorrt.Input(shape=[test_batchsz, 20], dtype=torch.int32),
    ],
    enabled_precisions={torch.float},
    workspace_size=2000000000,
    truncate_long_and_double=True,
)
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
- Torch-TensorRT Version (e.g. 1.0.0): 1.2.0
- PyTorch Version (e.g. 1.0): 1.12.1
- CPU Architecture:
- OS (e.g., Linux): Linux
- How you installed PyTorch (conda, pip, libtorch, source): pip
- Build command you used (if compiling from source):
- Are you using local sources or building from archives:
- Python version: 3.8
- CUDA version: 11.6
- GPU models and configuration:
- Any other relevant information: