Description
Would the kernel call in lltm_cuda_forward
in the tutorial tutorials/advanced_source/cpp_extension.rst
fail on multi-GPU systems if the inputs are not on the default device, i.e., device 0?
To my understanding, some "magic" takes care of setting the right device context when we add functionality to PyTorch via custom kernels, see here.
However, it seems that this machinery is not used in the tutorial.
Explicit usage of at::OptionalDeviceGuard in the tutorial should resolve the issue (?).
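
For illustration, a minimal sketch of what I have in mind, assuming the function signature from the tutorial; using at::device_of to obtain the inputs' device is my assumption, and the rest of the body is the tutorial's code, elided here:

```cpp
#include <torch/extension.h>

std::vector<torch::Tensor> lltm_cuda_forward(
    torch::Tensor input,
    torch::Tensor weights,
    torch::Tensor bias,
    torch::Tensor old_h,
    torch::Tensor old_cell) {
  // Pin the current CUDA device to the device the inputs live on for
  // the lifetime of this scope, so the kernel launch below targets
  // that GPU instead of the default device 0.
  const at::OptionalDeviceGuard device_guard(at::device_of(input));

  // ... rest of the tutorial's implementation (gate computation and
  // kernel launch) unchanged ...
}
```

Since the guard is RAII-based, the previously active device is restored automatically when the function returns, so callers on other devices are unaffected.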