Description
After updating the video card drivers, CUDA, torch, and torchvision, the error below occurs when training with behavioral_cloning. Note that the error only started after this update.
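For reference, the exact versions in the updated environment can be captured with a short snippet like the following (a minimal sketch using only standard torch attributes; run it inside the `mlagents` conda env):

```python
import torch

# Print the version info relevant to this report (all standard torch APIs).
print("torch:", torch.__version__)
print("CUDA (torch build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("GPU:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")
```

Full console output from the run: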
```
[INFO] CarParking. Step: 10000. Time Elapsed: 17.855 s. Mean Reward: -5.293. Std of Reward: 7.679. Training.
[INFO] CarParking. Step: 20000. Time Elapsed: 21.741 s. Mean Reward: -3.358. Std of Reward: 5.096. Training.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [1,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [2,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [3,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [4,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [6,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [7,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [8,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [9,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [10,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [11,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [12,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [13,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [14,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [15,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [17,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [19,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [20,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [21,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [23,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [24,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [26,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [28,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [29,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\native\cuda\ScatterGatherKernel.cu:367: block: [0,0,0], thread: [31,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
Traceback (most recent call last):
  File "D:\anaconda3\envs\mlagents\lib\site-packages\mlagents\trainers\trainer_controller.py", line 175, in start_learning
    n_steps = self.advance(env_manager)
  File "D:\anaconda3\envs\mlagents\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "D:\anaconda3\envs\mlagents\lib\site-packages\mlagents\trainers\trainer_controller.py", line 250, in advance
    trainer.advance()
  File "D:\anaconda3\envs\mlagents\lib\site-packages\mlagents\trainers\trainer\rl_trainer.py", line 302, in advance
    if self._update_policy():
  File "D:\anaconda3\envs\mlagents\lib\site-packages\mlagents\trainers\trainer\on_policy_trainer.py", line 111, in _update_policy
    update_stats = self.optimizer.bc_module.update()
  File "D:\anaconda3\envs\mlagents\lib\site-packages\mlagents\trainers\torch_entities\components\bc\module.py", line 95, in update
    run_out = self._update_batch(mini_batch_demo, self.n_sequences)
  File "D:\anaconda3\envs\mlagents\lib\site-packages\mlagents\trainers\torch_entities\components\bc\module.py", line 184, in _update_batch
    self.optimizer.step()
  File "D:\anaconda3\envs\mlagents\lib\site-packages\torch\optim\optimizer.py", line 504, in wrapper
    out = func(*args, **kwargs)
  File "D:\anaconda3\envs\mlagents\lib\site-packages\torch\optim\optimizer.py", line 79, in _use_grad
    ret = func(self, *args, **kwargs)
  File "D:\anaconda3\envs\mlagents\lib\site-packages\torch\optim\adam.py", line 237, in step
    has_complex = self._init_group(
  File "D:\anaconda3\envs\mlagents\lib\site-packages\torch\optim\adam.py", line 174, in _init_group
    else torch.tensor(0.0, dtype=_get_scalar_dtype())
  File "D:\anaconda3\envs\mlagents\lib\site-packages\torch\utils\_device.py", line 104, in __torch_function__
    return func(*args, **kwargs)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "D:\anaconda3\envs\mlagents\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\anaconda3\envs\mlagents\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\anaconda3\envs\mlagents\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "D:\anaconda3\envs\mlagents\lib\site-packages\mlagents\trainers\learn.py", line 270, in main
    run_cli(parse_command_line())
  File "D:\anaconda3\envs\mlagents\lib\site-packages\mlagents\trainers\learn.py", line 266, in run_cli
    run_training(run_seed, options, num_areas)
  File "D:\anaconda3\envs\mlagents\lib\site-packages\mlagents\trainers\learn.py", line 138, in run_training
    tc.start_learning(env_manager)
  File "D:\anaconda3\envs\mlagents\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "D:\anaconda3\envs\mlagents\lib\site-packages\mlagents\trainers\trainer_controller.py", line 200, in start_learning
    self._save_models()
  File "D:\anaconda3\envs\mlagents\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "D:\anaconda3\envs\mlagents\lib\site-packages\mlagents\trainers\trainer_controller.py", line 80, in _save_models
    self.trainers[brain_name].save_model()
  File "D:\anaconda3\envs\mlagents\lib\site-packages\mlagents\trainers\trainer\rl_trainer.py", line 172, in save_model
    model_checkpoint = self._checkpoint()
  File "D:\anaconda3\envs\mlagents\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "D:\anaconda3\envs\mlagents\lib\site-packages\mlagents\trainers\trainer\rl_trainer.py", line 144, in _checkpoint
    export_path, auxillary_paths = self.model_saver.save_checkpoint(
  File "D:\anaconda3\envs\mlagents\lib\site-packages\mlagents\trainers\model_saver\torch_model_saver.py", line 58, in save_checkpoint
    torch.save(state_dict, f"{checkpoint_path}.pt")
  File "D:\anaconda3\envs\mlagents\lib\site-packages\torch\serialization.py", line 965, in save
    _save(
  File "D:\anaconda3\envs\mlagents\lib\site-packages\torch\serialization.py", line 1264, in _save
    storage = storage.cpu()
  File "D:\anaconda3\envs\mlagents\lib\site-packages\torch\storage.py", line 262, in cpu
    return torch.UntypedStorage(self.size()).copy_(self, False)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
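For what it's worth, the first `ScatterGatherKernel.cu` assertion appears to be the actual failure; the later errors in `optimizer.step()` and the checkpoint save just trip over the already-poisoned CUDA context. Below is a minimal sketch of the kind of call that fires this exact assert (this is not the ML-Agents code path, just an illustration of an out-of-bounds gather index), combined with the `CUDA_LAUNCH_BLOCKING=1` setting the error message itself suggests:

```python
import os

# Force synchronous kernel launches so the failing op is reported at its
# call site, as the error message suggests. Must be set before torch
# initializes CUDA.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

# Illustration only: a gather index outside the source tensor's bounds
# trips the same `idx_dim >= 0 && idx_dim < index_size` device-side assert.
src = torch.zeros(4, device="cuda")
bad_index = torch.tensor([4], device="cuda")  # valid indices are 0..3
torch.gather(src, 0, bad_index)  # -> "CUDA error: device-side assert triggered"
```

When launching through the CLI, the same effect comes from setting the variable in the shell before running mlagents-learn (e.g. `set CUDA_LAUNCH_BLOCKING=1` in cmd), which should make the reported stack point at the op with the bad index rather than at `optimizer.step()` or `torch.save`.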