BPF: Ensure __sync_fetch_and_add() always generate atomic_fetch_add insn #101428
Conversation
Thanks for the quick fix! The reporter's first name in the commit message should be "Peilin".
Peilen Ye reported an issue ([1]) where for __sync_fetch_and_add(...) without a return value, like

  __sync_fetch_and_add(&foo, 1);

the llvm BPF backend generates a locked insn, e.g.

  lock *(u32 *)(r1 + 0) += r2

If __sync_fetch_and_add(...) returns a value, like

  res = __sync_fetch_and_add(&foo, 1);

the llvm BPF backend generates

  r2 = atomic_fetch_add((u32 *)(r1 + 0), r2)

But 'lock *(u32 *)(r1 + 0) += r2' caused a problem in the jit since a proper barrier is not inserted based on __sync_fetch_and_add() semantics.

The above discrepancy is due to commit [2], which tries to maintain backward compatibility since, before commit [2], __sync_fetch_and_add(...) generated a lock insn in the BPF backend. Based on the discussion in [1], it is now time to fix the above discrepancy so we can have proper barrier support in the jit. llvm patch [3] made sure that __sync_fetch_and_add(...) always generates atomic_fetch_add(...) insns. Now 'lock *(u32 *)(r1 + 0) += r2' can only be generated by inline asm. The same holds for __sync_fetch_and_and(), __sync_fetch_and_or() and __sync_fetch_and_xor().

But the change in [3] caused an arena_atomics selftest failure:

  test_arena_atomics:PASS:arena atomics skeleton open 0 nsec
  libbpf: prog 'and': BPF program load failed: Permission denied
  libbpf: prog 'and': -- BEGIN PROG LOAD LOG --
  arg#0 reference type('UNKNOWN ') size cannot be determined: -22
  0: R1=ctx() R10=fp0
  ; if (pid != (bpf_get_current_pid_tgid() >> 32)) @ arena_atomics.c:87
  0: (18) r1 = 0xffffc90000064000 ; R1_w=map_value(map=arena_at.bss,ks=4,vs=4)
  2: (61) r6 = *(u32 *)(r1 +0) ; R1_w=map_value(map=arena_at.bss,ks=4,vs=4) R6_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
  3: (85) call bpf_get_current_pid_tgid#14 ; R0_w=scalar()
  4: (77) r0 >>= 32 ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
  5: (5d) if r0 != r6 goto pc+11 ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R6_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0x)
  ; __sync_fetch_and_and(&and64_value, 0x011ull << 32); @ arena_atomics.c:91
  6: (18) r1 = 0x100000000060 ; R1_w=scalar()
  8: (bf) r1 = addr_space_cast(r1, 0, 1) ; R1_w=arena
  9: (18) r2 = 0x1100000000 ; R2_w=0x1100000000
  11: (db) r2 = atomic64_fetch_and((u64 *)(r1 +0), r2)
  BPF_ATOMIC stores into R1 arena is not allowed
  processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
  -- END PROG LOAD LOG --
  libbpf: prog 'and': failed to load: -13
  libbpf: failed to load object 'arena_atomics'
  libbpf: failed to load BPF skeleton 'arena_atomics': -13
  test_arena_atomics:FAIL:arena atomics skeleton load unexpected error: -13 (errno 13)
  #3 arena_atomics:FAIL

The reason for the failure is [4], where atomic{64,}_fetch_{and,or,xor}() are not allowed for arena addresses. Without llvm patch [3], the compiler generates a 'lock ...' insn and everything works fine.

This patch fixes the problem by using inline asm. Instead of the __sync_fetch_and_{and,or,xor}() functions, inline asm with a 'lock' insn is used, and it works with or without [3].

[1] https://lore.kernel.org/bpf/[email protected]/T/#mb68d67bc8f39e35a0c3db52468b9de59b79f021f
[2] llvm/llvm-project@286daaf
[3] llvm/llvm-project#101428
[4] d503a04 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT")

Signed-off-by: Yonghong Song <[email protected]>
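For reference, a minimal sketch of the kind of inline-asm replacement the commit message describes, assuming the 'lock ... &=' asm form accepted by the BPF backend; the wrapper name and signature are illustrative, not the actual selftest code:

/* Hedged sketch: emit the locked BPF and-insn directly via inline asm so
 * that no atomic64_fetch_and() insn is generated for the arena address.
 * Name (arena_and64) and prototype are illustrative only. */
static inline void arena_and64(unsigned long *p, unsigned long v)
{
        asm volatile("lock *(u64 *)(%[p] + 0) &= %[v]"
                     :
                     : [p]"r"(p), [v]"r"(v)
                     : "memory");
}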
The change itself looks fine to me; I have a nitpick regarding disassembler behaviour for cpu v3.
Nitpick, the CHECKs could be simplified a bit, e.g.:
; RUN: llc < %s -march=bpfel -verify-machineinstrs -show-mc-encoding | FileCheck %s
; RUN: llc < %s -march=bpfel -verify-machineinstrs -show-mc-encoding -mcpu=v3 | FileCheck --check-prefix=CHECK-V3 %s
; CHECK-LABEL: test_load_add_32
; CHECK: r2 = atomic_fetch_add((u32 *)(r1 + 0), r2)
; CHECK-V3: w2 = atomic_fetch_add((u32 *)(r1 + 0), w2)
; CHECK: encoding: [0xc3,0x21,0x00,0x00,0x01,0x00,0x00,0x00]
define void @test_load_add_32(ptr %p, i32 zeroext %v) {
entry:
atomicrmw add ptr %p, i32 %v seq_cst
ret void
}
; CHECK-LABEL: test_load_add_64
; CHECK: r2 = atomic_fetch_add((u64 *)(r1 + 0), r2)
; CHECK: encoding: [0xdb,0x21,0x00,0x00,0x01,0x00,0x00,0x00]
define void @test_load_add_64(ptr %p, i64 zeroext %v) {
entry:
atomicrmw add ptr %p, i64 %v seq_cst
ret void
}
On the other hand, it looks like we should change the disassembler to avoid the discrepancy between:
; CHECK: r2 = atomic_fetch_add((u32 *)(r1 + 0), r2)
; CHECK-V3: w2 = atomic_fetch_add((u32 *)(r1 + 0), w2)
since the instruction encoding is the same.
Thanks. Indeed, I simplified the CHECKs a little bit.
As for the difference between
; CHECK: r2 = atomic_fetch_add((u32 *)(r1 + 0), r2)
; CHECK-V3: w2 = atomic_fetch_add((u32 *)(r1 + 0), w2)
I think it is okay, since this is asm output from llc.
Do you mind if I try to change this behaviour for both llc and disasm? (in a separate pr)
Do you mind if I try to change this behaviour for both llc and disasm? (in a separate pr)
I don't know what code change you intend to make, but you can go ahead and send a patch so we can discuss the change.
Peilen Ye reported an issue ([1]) where for __sync_fetch_and_add(...) without a return value, like

  __sync_fetch_and_add(&foo, 1);

the llvm BPF backend generates a locked insn, e.g.

  lock *(u32 *)(r1 + 0) += r2

If __sync_fetch_and_add(...) returns a value, like

  res = __sync_fetch_and_add(&foo, 1);

the llvm BPF backend generates

  r2 = atomic_fetch_add((u32 *)(r1 + 0), r2)

But 'lock *(u32 *)(r1 + 0) += r2' caused a problem in the jit since a proper barrier is not inserted based on __sync_fetch_and_add() semantics.

The above discrepancy is due to commit [2], which tries to maintain backward compatibility since, before commit [2], __sync_fetch_and_add(...) generated a lock insn in the BPF backend. Based on the discussion in [1], it is now time to fix the above discrepancy so we can have proper barrier support in the jit. llvm patch [3] made sure that __sync_fetch_and_add(...) always generates atomic_fetch_add(...) insns. Now 'lock *(u32 *)(r1 + 0) += r2' can only be generated by inline asm. The same holds for __sync_fetch_and_and(), __sync_fetch_and_or() and __sync_fetch_and_xor().

But the change in [3] caused an arena_atomics selftest failure:

  test_arena_atomics:PASS:arena atomics skeleton open 0 nsec
  libbpf: prog 'and': BPF program load failed: Permission denied
  libbpf: prog 'and': -- BEGIN PROG LOAD LOG --
  arg#0 reference type('UNKNOWN ') size cannot be determined: -22
  0: R1=ctx() R10=fp0
  ; if (pid != (bpf_get_current_pid_tgid() >> 32)) @ arena_atomics.c:87
  0: (18) r1 = 0xffffc90000064000 ; R1_w=map_value(map=arena_at.bss,ks=4,vs=4)
  2: (61) r6 = *(u32 *)(r1 +0) ; R1_w=map_value(map=arena_at.bss,ks=4,vs=4) R6_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
  3: (85) call bpf_get_current_pid_tgid#14 ; R0_w=scalar()
  4: (77) r0 >>= 32 ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
  5: (5d) if r0 != r6 goto pc+11 ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R6_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0x)
  ; __sync_fetch_and_and(&and64_value, 0x011ull << 32); @ arena_atomics.c:91
  6: (18) r1 = 0x100000000060 ; R1_w=scalar()
  8: (bf) r1 = addr_space_cast(r1, 0, 1) ; R1_w=arena
  9: (18) r2 = 0x1100000000 ; R2_w=0x1100000000
  11: (db) r2 = atomic64_fetch_and((u64 *)(r1 +0), r2)
  BPF_ATOMIC stores into R1 arena is not allowed
  processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
  -- END PROG LOAD LOG --
  libbpf: prog 'and': failed to load: -13
  libbpf: failed to load object 'arena_atomics'
  libbpf: failed to load BPF skeleton 'arena_atomics': -13
  test_arena_atomics:FAIL:arena atomics skeleton load unexpected error: -13 (errno 13)
  #3 arena_atomics:FAIL

The reason for the failure is [4], where atomic{64,}_fetch_{and,or,xor}() are not allowed for arena addresses. Without llvm patch [3], the compiler generates a 'lock ...' insn and everything works fine.

This patch fixes the problem by using inline asm. Instead of the __sync_fetch_and_{and,or,xor}() functions, inline asm with a 'lock' insn is used, and it works with or without [3].

Note that the three bpf programs ('and', 'or' and 'xor') are guarded with __BPF_FEATURE_ADDR_SPACE_CAST as well, to ensure a compilation failure for llvm <= 18 versions.

Note that for llvm <= 18, where addr_space_cast is not supported, all arena_atomics subtests are skipped with the below message:

  test_arena_atomics:SKIP:no ENABLE_ATOMICS_TESTS or no addr_space_cast support in clang
  #3 arena_atomics:SKIP

[1] https://lore.kernel.org/bpf/[email protected]/T/#mb68d67bc8f39e35a0c3db52468b9de59b79f021f
[2] llvm/llvm-project@286daaf
[3] llvm/llvm-project#101428
[4] d503a04 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT")

Signed-off-by: Yonghong Song <[email protected]>
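A sketch of the guard shape described above; the identifiers and the wrapper inside the guard are illustrative, not the actual selftest code:

/* Hedged sketch: only build the inline-asm based programs when both the
 * atomics tests and addr_space_cast support are available. */
#if defined(ENABLE_ATOMICS_TESTS) && defined(__BPF_FEATURE_ADDR_SPACE_CAST)
static inline void arena_xor64(unsigned long *p, unsigned long v)
{
        asm volatile("lock *(u64 *)(%[p] + 0) ^= %[v]"
                     :
                     : [p]"r"(p), [v]"r"(v)
                     : "memory");
}
#endif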
@yonghong-song this seems to be a breaking change for folks still targeting …
@brycekahle Yes, this is a deliberate choice. See the upstream discussion in …. The following is a quote from Alexei in that discussion:

  "Right. We did it for backward compat. Older llvm was …"
@yonghong-song what is the best place for me to voice my concerns? We ship object files to our Datadog customers that target a wide range of kernels, going much further back than 5.12. The total customer host install count is 400,000+. We do not have the luxury of only deploying eBPF on internal infrastructure where we have total control of the kernel version used. We also intentionally limit ourselves to ….

At the very least, there should be a non-inline-assembly way for us to explicitly use the …
@brycekahle
You can use inline asm like in …. cc @4ast for more comments.
@yonghong-song we cannot use …
Hi @brycekahle, could you please elaborate a bit on the issues with inline assembly and …?

$ cat test3.c
void foo(unsigned long *p, unsigned long v)
{
asm volatile("lock *(u64 *)(%[p] + 0) += %[v];" :: [p]"r"(p), [v]"r"(v));
}
$ clang -mcpu=v1 -O2 --target=bpf -S -emit-llvm -o - test3.c | llc -march=bpf -mcpu=v1 -o - -filetype=asm
.text
.file "test3.c"
.globl foo # -- Begin function foo
.p2align 3
.type foo,@function
foo: # @foo
.Lfoo$local:
.type .Lfoo$local,@function
# %bb.0:
#APP
lock *(u64 *)(r1 + 0) += r2
#NO_APP
exit
.Lfunc_end0:
.size foo, .Lfunc_end0-foo
.size .Lfoo$local, .Lfunc_end0-foo
                                        # -- End function
@eddyz87 You are still using …
Well, that is a very strange arrangement; I'm not sure there is documentation stating that this is a supported use case. I remember several architecture-specific checks in the clang front end. Anyway, a simple example still works:
|
It is explicitly called out in the kernel documentation I linked and quoted.
Oh, sorry, my bad. But you stated that there is some issue with the usage of inline assembly and …
It turns out that what breaks inline assembly with the default target is trying to use the BPF registers by name. Since the example you gave does not do that, it does actually compile. I still think that generating v3 instructions when v1 is explicitly asked for is a giant footgun that is going to trip up a lot of folks. Could you apply this new behavior only when v3 (or later) is used?
For the asm body or for the asm parameters? I was able to get an error from …
Agree that this is a footgun. I think we should report an error when …
So the intrinsic is optional.
Compiling with clang for the default architecture and then running llc with the bpf target works in some cases but not always; e.g., the x86_64 target may generate constructs (e.g., jump tables) that the bpf backend will not support. In the early days, when CO-RE was not available, we did use clang with the default architecture and llc with the bpf target, but that is complicated and needs access to kernel internal headers. Now that CO-RE is available, the recommendation is to use 'clang --target=bpf ...' directly, which is also what we do in the bpf selftests.
We avoid this by passing …
This is the use case I'm talking about.
datadog is not the only user of

  (void)__sync_fetch_and_add(&counter, ...);

For (void)__sync_fetch_and_add() and -mcpu=v3 we should generate a fetch_and_add() insn.
Sounds good, I will fix the issue in llvm20.
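For reference, a minimal C sketch of the usage pattern under discussion; the names (counter, bump_counter) are illustrative, and the per-cpu-version behavior in the comment reflects the plan described above and in the follow-up commits below:

/* Hedged sketch of the '(void)__sync_fetch_and_add()' pattern. Per the
 * follow-up change, the intent is:
 *   -mcpu=v1/v2: keep generating  lock *(u64 *)(rX + 0) += rY
 *   -mcpu=v3/v4: generate         rY = atomic_fetch_add((u64 *)(rX + 0), rY)
 */
static unsigned long counter;

void bump_counter(unsigned long delta)
{
        (void)__sync_fetch_and_add(&counter, delta);
}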
…ch_add insn (llvm#101428)"

This reverts commit c566769. See discussion in [1].

Currently, with -mcpu=v1/v2, an atomic_fetch_add() insn is generated for (void)__sync_fetch_and_add(...). This breaks backward compatibility, since there is code that runs on old systems (< 5.12) which do not support atomic_fetch_add(). Now let us revert the previous commit ([1]) and add additional logic in the next patch to ensure that for (void)__sync_fetch_and_add(...):

  v1/v2: generate locked add insn
  >= v3: generate atomic_fetch_add insn

[1] llvm#101428
Before llvm20, (void)__sync_fetch_and_add(...) always generates locked xadd insns. In the linux kernel upstream discussion [1], it was found that for the arm64 architecture, the original semantics of (void)__sync_fetch_and_add(...), i.e., __atomic_fetch_add(...), is preferred in order for the jit to emit proper native barrier insns.

In llvm commits [2] and [3], (void)__sync_fetch_and_add(...) will generate the following insns:
  - for cpu v1/v2: locked xadd insns to keep backward compatibility
  - for cpu v3/v4: __atomic_fetch_add() insns

To ensure proper barrier semantics for (void)__sync_fetch_and_add(...), cpu v3/v4 is recommended. This patch enables cpu=v3 as the default cpu version. For users wanting to use cpu v1, -mcpu=v1 needs to be explicitly added to the clang/llc command line.

[1] https://lore.kernel.org/bpf/[email protected]/T/#mb68d67bc8f39e35a0c3db52468b9de59b79f021f
[2] llvm#101428
[3] llvm#106494
Peilen Ye reported an issue ([1]) where for __sync_fetch_and_add(...) without return value like
__sync_fetch_and_add(&foo, 1);
llvm BPF backend generates locked insn e.g.
lock *(u32 *)(r1 + 0) += r2
If __sync_fetch_and_add(...) returns a value like
res = __sync_fetch_and_add(&foo, 1);
llvm BPF backend generates like
r2 = atomic_fetch_add((u32 *)(r1 + 0), r2)
The above generation of 'lock *(u32 *)(r1 + 0) += r2' caused a problem in the jit since a proper barrier is not inserted.
The above discrepancy is due to commit [2], which tries to maintain backward compatibility since, before commit [2], __sync_fetch_and_add(...) generated a lock insn in the BPF backend.
Based on the discussion in [1], it is now time to fix the above discrepancy so we can have proper barrier support in the jit. This patch makes sure that __sync_fetch_and_add(...) always generates atomic_fetch_add(...) insns. Now 'lock *(u32 *)(r1 + 0) += r2' can only be generated by inline asm. I also removed the whole BPFMIChecking.cpp file, whose original purpose was to detect and issue errors if XADD{W,D,W32} may return a value that is used subsequently. Since the XADD{W,D,W32} insns are now inline-asm only, such error detection is no longer needed.
[1] https://lore.kernel.org/bpf/[email protected]/T/#mb68d67bc8f39e35a0c3db52468b9de59b79f021f
[2] 286daaf
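As an illustration of the two source forms contrasted in the commit message above; the function and variable names are hypothetical:

/* Hedged sketch: with this patch both forms below lower to an
 * atomic_fetch_add(...) insn; previously the result-unused form produced
 * 'lock *(u32 *)(rX + 0) += rY'. */
static unsigned int foo;

unsigned int example(void)
{
        __sync_fetch_and_add(&foo, 1);          /* result unused */
        return __sync_fetch_and_add(&foo, 1);   /* result used   */
}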