(Libtorch) How to use packed_accessor64 to access tensor elements in CUDA?

The [tutorial](https://pytorch.org/cppdocs/notes/tensor_basics.html#cuda-accessors) gives the following example of using _packed_accessor64_ to access tensor elements efficiently in a CUDA kernel. However, I still do not understand how to use _packed_accessor64_ in practice. Can anyone give me a more specific example? Thanks.
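```
#include <torch/torch.h>
#include <ATen/cuda/Atomic.cuh>  // gpuAtomicAdd

__global__ void packed_accessor_kernel(
    torch::PackedTensorAccessor64<float, 2> foo,
    float* trace) {
  int i = threadIdx.x;
  // Each thread adds one diagonal element to the running trace.
  gpuAtomicAdd(trace, foo[i][i]);
}

// The tensor must live on the GPU so the kernel can read its memory.
torch::Tensor foo = torch::rand({12, 12}, torch::device(torch::kCUDA));

// assert foo is 2-dimensional and holds floats.
auto foo_a = foo.packed_accessor64<float, 2>();
// The output must be device memory too, not a host stack variable.
torch::Tensor trace = torch::zeros({1}, torch::device(torch::kCUDA));

packed_accessor_kernel<<<1, 12>>>(foo_a, trace.data_ptr<float>());
```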
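For a more specific example, here is a minimal sketch of how the pieces might fit together in a single `.cu` file. It is not taken from the tutorial. It assumes the file is compiled with `nvcc` and linked against a CUDA build of libtorch, and the names `trace_kernel` and `diagonal_trace` are hypothetical, chosen only for illustration. The tensor and the accumulator both live on the GPU, and the accessor is passed to the kernel by value.

```cuda
#include <torch/torch.h>
#include <ATen/cuda/Atomic.cuh>       // gpuAtomicAdd
#include <ATen/cuda/CUDAContext.h>    // at::cuda::getCurrentCUDAStream
#include <c10/cuda/CUDAException.h>   // C10_CUDA_KERNEL_LAUNCH_CHECK
#include <algorithm>
#include <cstdint>

// Hypothetical kernel: each thread adds one diagonal element to *trace.
// The accessor carries the data pointer, sizes, and strides by value,
// so mat[i][i] works inside device code for strided tensors as well.
__global__ void trace_kernel(
    torch::PackedTensorAccessor64<float, 2> mat,
    float* trace) {
  const int64_t i =
      blockIdx.x * static_cast<int64_t>(blockDim.x) + threadIdx.x;
  // Guard against threads that fall outside the diagonal.
  if (i < mat.size(0) && i < mat.size(1)) {
    gpuAtomicAdd(trace, mat[i][i]);
  }
}

// Hypothetical host wrapper: validates the input, launches the kernel,
// and copies the scalar result back to the host.
float diagonal_trace(const torch::Tensor& input) {
  TORCH_CHECK(input.is_cuda(), "input must be a CUDA tensor");
  TORCH_CHECK(input.dim() == 2, "input must be 2-dimensional");
  TORCH_CHECK(input.scalar_type() == torch::kFloat,
              "input must hold floats");

  // The accumulator is a one-element tensor on the same device.
  torch::Tensor trace = torch::zeros({1}, input.options());

  const int64_t n = std::min(input.size(0), input.size(1));
  if (n == 0) {
    return 0.0f;
  }
  const int threads = 128;
  const int blocks = static_cast<int>((n + threads - 1) / threads);

  // Launch on the stream libtorch is currently using.
  trace_kernel<<<blocks, threads, 0, at::cuda::getCurrentCUDAStream()>>>(
      input.packed_accessor64<float, 2>(), trace.data_ptr<float>());
  C10_CUDA_KERNEL_LAUNCH_CHECK();

  // item<float>() waits for the kernel and copies the value to the host.
  return trace.item<float>();
}
```

With these definitions, calling `diagonal_trace(foo)` on `torch::Tensor foo = torch::rand({12, 12}, torch::device(torch::kCUDA));` should agree with `foo.trace().item<float>()` up to floating-point rounding. The accessor does not own any memory, so the tensor it was created from has to stay alive until the kernel has finished.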

cc @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen

(Libtorch)How to use packed_accessor64 to access tensor elements in CUDA? #1539