Description
The tutorial gives the following example of using packed_accessor64 to access tensor elements efficiently. However, I still do not understand how to use packed_accessor64 in a complete program. Could anyone give me a more specific example? Thanks.
__global__ void packed_accessor_kernel(
    PackedTensorAccessor64<float, 2> foo,
    float* trace) {
  int i = threadIdx.x;
  gpuAtomicAdd(trace, foo[i][i]);
}
torch::Tensor foo = torch::rand({12, 12});
// assert foo is 2-dimensional and holds floats.
auto foo_a = foo.packed_accessor64<float,2>();
float trace = 0;
packed_accessor_kernel<<<1, 12>>>(foo_a, &trace);
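For reference, here is a minimal sketch of what a complete, standalone version of the snippet above might look like. It assumes a CUDA-enabled libtorch build and a `.cu` file compiled with nvcc. Unlike the quoted snippet, it creates `foo` on the GPU and keeps the accumulator in a one-element CUDA tensor, because a kernel cannot dereference host memory such as `&trace` of a stack variable or the data of a CPU tensor. It also uses CUDA's built-in `atomicAdd` in place of `gpuAtomicAdd`, only to avoid depending on an internal ATen header. The `main` function, the bounds check, and the comparison against `torch::trace` are illustrative additions, not part of the tutorial.

```cuda
#include <torch/torch.h>
#include <cuda_runtime.h>
#include <iostream>

// Each thread adds one diagonal element of `foo` to the float at `trace`.
// `trace` must point to device memory.
__global__ void packed_accessor_kernel(
    torch::PackedTensorAccessor64<float, 2> foo,
    float* trace) {
  const int64_t i = threadIdx.x;
  // Guard in case the launch uses more threads than the diagonal has entries.
  if (i < foo.size(0) && i < foo.size(1)) {
    atomicAdd(trace, foo[i][i]);
  }
}

int main() {
  if (!torch::cuda::is_available()) {
    std::cerr << "CUDA is not available\n";
    return 1;
  }

  // The tensor must live on the GPU for the kernel to read it.
  torch::Tensor foo = torch::rand(
      {12, 12}, torch::device(torch::kCUDA).dtype(torch::kFloat32));
  // One-element accumulator in device memory, initialized to zero.
  torch::Tensor trace = torch::zeros({1}, foo.options());

  // The packed accessor copies sizes and strides by value, so it can be
  // passed directly as a kernel argument.
  auto foo_a = foo.packed_accessor64<float, 2>();

  packed_accessor_kernel<<<1, 12>>>(foo_a, trace.data_ptr<float>());

  // Check for launch errors, then wait for the kernel to finish.
  cudaError_t err = cudaGetLastError();
  if (err == cudaSuccess) {
    err = cudaDeviceSynchronize();
  }
  if (err != cudaSuccess) {
    std::cerr << "CUDA error: " << cudaGetErrorString(err) << "\n";
    return 1;
  }

  // Compare against the library result; the two values should agree up to
  // floating-point rounding, since atomic adds run in no fixed order.
  std::cout << "kernel trace: " << trace.item<float>() << "\n"
            << "torch::trace: " << torch::trace(foo).item<float>() << "\n";
  return 0;
}
```

Inside the kernel the accessor is indexed like a plain 2-D array (`foo[i][i]`), and `foo.size(d)` gives the extent of dimension `d`. If the element type or number of dimensions passed to `packed_accessor64` does not match the tensor, the call should fail on the host before any kernel is launched.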
cc @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen