Description
We are trying to add three parameters to the 6x16-aarch64-neonfp16arith-cortex-a75.S.in template, which in turn changes the generated f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a75.S kernel.
There are 3 parameters:
- size_t index,
- size_t tile,
- void* w_head
We made the following changes to the code:
- Added a new function, xnn_compute_gemm_fp16, to operator-run.c, modeled on the existing xnn_compute_gemm function.
- Added a bool flag, is_fp16_kernel, to the reshape_fully_connected_nc function in fully-connected-nc.c. If the flag is true, we set
fully_connected_op->compute[0].task_2d_tile_2d = (pthreadpool_task_2d_tile_2d_t) xnn_compute_gemm_fp16;
We then pass the index, tile and w_head parameters through xnn_compute_gemm_fp16 to the f16-gemm-6x16-minmax-asm-aarch64-neonfp16arith-cortex-a75.S kernel.
- In the assembly kernel, we load the parameters with the following code:
LDR x8, [sp, 8] //load index
MOV x28, x8
LDR x8, [sp, 16] //load tile
MOV x19, x8
LDR x8, [sp, 24] //load w_head
MOV x21, x8
- We also tried the LDP instruction, but the result was the same.
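For context, here is a minimal sketch of where we would expect the AArch64 calling convention (AAPCS64) to place each argument when the kernel is entered. It rests on several assumptions that may not match the real code: that the three new parameters are appended after the existing GEMM arguments (taken here to be the ten arguments mr through params), that every argument is an 8-byte integer or pointer, and that the offsets are relative to sp before the kernel's prologue pushes anything. The names arg_stack_offset, kArgNames and print_arg_locations are hypothetical and are not part of XNNPACK.

```c
#include <stddef.h>
#include <stdio.h>

/* AAPCS64 passes the first eight integer/pointer arguments in x0..x7. */
#define NUM_GP_ARG_REGS 8

/*
 * Hypothetical helper, not part of XNNPACK: returns the byte offset from sp
 * (as seen at function entry, before any prologue push) of the argument at
 * zero-based position arg_index, or -1 if it is passed in a register.
 * Assumes every argument is an 8-byte integer or pointer.
 */
static long arg_stack_offset(size_t arg_index) {
  if (arg_index < NUM_GP_ARG_REGS) {
    return -1;
  }
  return (long) ((arg_index - NUM_GP_ARG_REGS) * 8);
}

/* Assumed argument order: the existing GEMM arguments, then our three. */
static const char* const kArgNames[] = {
  "mr", "nc", "kc", "a", "a_stride", "w", "c", "cm_stride",
  "cn_stride", "params", "index", "tile", "w_head",
};

/* Prints one line per argument: its name and its register or stack slot. */
static void print_arg_locations(FILE* out) {
  const size_t count = sizeof(kArgNames) / sizeof(kArgNames[0]);
  for (size_t i = 0; i < count; i++) {
    const long offset = arg_stack_offset(i);
    if (offset < 0) {
      fprintf(out, "%-10s x%zu\n", kArgNames[i], i);
    } else {
      fprintf(out, "%-10s [sp, %ld]\n", kArgNames[i], offset);
    }
  }
}
```

Calling print_arg_locations(stdout) lists a register or stack slot for every argument. Under the assumptions above, index, tile and w_head would sit at [sp, 16], [sp, 24] and [sp, 32] on entry, and each offset would grow by the number of bytes the prologue pushes before the loads run.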
With the above modifications, the three parameters do not arrive in the kernel with the correct values, so we suspect that something is missing.
Any help would be appreciated. Thank you very much for your time.