
🐛 [Bug] Encountered CUDA 710 error when applying Torch-TensorRT to BERT #1418


Bug Description

I wanted to use Torch-TensorRT to speed up BERT model inference, but ran into the following errors:

../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [32,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [33,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [34,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [35,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [36,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [37,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [38,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [39,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [40,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [41,0,0] Assertion srcIndex < srcSelectDimSize failed.

CUDA initialization failure with error: 710. Please check your CUDA installation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
Segmentation fault (core dumped)
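
For context, CUDA error 710 is cudaErrorAssert, i.e. the device-side assertion printed above, and the indexSelectLargeIndex assertion usually means an index fed to an embedding/index_select lookup is out of range for that dimension. Below is a minimal sanity check I would run in a fresh process (a sketch only; it assumes the same dummy inputs and model as the repro that follows, and sets CUDA_LAUNCH_BLOCKING=1 so the failing kernel is reported at its call site):

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA is initialized

import torch
from transformers import BertModel

# Same dummy batch as in the repro below
tokens_tensor = torch.ones((128, 20)).to(torch.int32).cuda()

model = BertModel.from_pretrained("bert-base-chinese", torchscript=True).eval().cuda()

# Every index handed to an embedding lookup must be < num_embeddings
assert int(tokens_tensor.max()) < model.config.vocab_size, "token id out of range"
assert tokens_tensor.shape[1] <= model.config.max_position_embeddings, "sequence too long"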

To Reproduce

from transformers import BertModel, BertTokenizer, BertConfig
import numpy as np
import torch
import torch_tensorrt
import time

print("VERSION:", torch_tensorrt.__version__)

# Create dummy int32 GPU inputs: token ids, token type ids, attention mask
test_batchsz = 128
tokens_tensor = torch.ones((test_batchsz, 20)).to(torch.int32).cuda()
segments_tensors = torch.zeros((test_batchsz, 20)).to(torch.int32).cuda()
mask_tensors = torch.ones((test_batchsz, 20)).to(torch.int32).cuda()

# Trace the model in eval mode on the GPU with the dummy inputs
model = BertModel.from_pretrained("bert-base-chinese", torchscript=True)
torch_script_module = torch.jit.trace(model.eval().cuda(), (tokens_tensor, mask_tensors, segments_tensors))

# Compile the traced module with Torch-TensorRT using static int32 inputs and FP32 precision
trt_ts_module = torch_tensorrt.compile(
    torch_script_module.float(),
    inputs=[
        torch_tensorrt.Input(shape=[test_batchsz, 20], dtype=torch.int32),
        torch_tensorrt.Input(shape=[test_batchsz, 20], dtype=torch.int32),
        torch_tensorrt.Input(shape=[test_batchsz, 20], dtype=torch.int32),
    ],
    enabled_precisions={torch.float},
    workspace_size=2000000000,
    truncate_long_and_double=True,
)
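
The script stops at the compile call, which is where the error above appears. For completeness, this is roughly how I intended to exercise the compiled module afterwards and compare it against the TorchScript baseline (a sketch; it assumes the compiled module keeps the traced output structure, so index 0 is the last hidden state, and it is why time is imported above):

# Run both modules on the same dummy batch and compare outputs / latency
with torch.no_grad():
    ref_out = torch_script_module(tokens_tensor, mask_tensors, segments_tensors)[0]
    trt_out = trt_ts_module(tokens_tensor, mask_tensors, segments_tensors)[0]
print("max abs diff:", (ref_out - trt_out).abs().max().item())

torch.cuda.synchronize()
start = time.time()
for _ in range(10):
    trt_ts_module(tokens_tensor, mask_tensors, segments_tensors)
torch.cuda.synchronize()
print("avg Torch-TensorRT latency (s):", (time.time() - start) / 10)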

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): 1.2.0
  • PyTorch Version (e.g. 1.0): 1.12.1
  • CPU Architecture:
  • OS (e.g., Linux): Linux
  • How you installed PyTorch (conda, pip, libtorch, source): pip
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives:
  • Python version: 3.8
  • CUDA version: 11.6
  • GPU models and configuration:
  • Any other relevant information:
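
On the note above about turning on debug messages: this is roughly how I enable them (assuming the torch_tensorrt.logging API below is available in 1.2; the debug output includes the build information mentioned here):

import torch_tensorrt

# Assumed API: raise the reportable log level so compilation prints build/debug info
torch_tensorrt.logging.set_reportable_log_level(torch_tensorrt.logging.Level.Debug)
print("VERSION:", torch_tensorrt.__version__)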

Additional context
