Description
A standard primitive in network BPF programs is checking that data + expected_length <= data_end
, bailing out if this condition doesn't hold. For example, revalidate_data
in Cilium checks that the packet is long enough to contain L2 and L3 headers.
The issue happens when revalidate_data
is called multiple times within a complex graph of branching. Attached is the minimal example of reproduction. Every part seems to be important: putting the body of the inline function lb4_extract_tuple
directly into handle_ipv4
produces good bytecode, removing seemingly useless conditions that never happen (ret == TC_ACT_SHOT
or ret == TC_ACT_REDIRECT
) also produces good bytecode, but the whole thing together produces buggy bytecode.
Compilation and testing command line:
$ clang -O2 -g -target bpf -std=gnu99 -nostdinc -Wall -Wextra -Werror -c custom.c -o custom.o
# bpftool prog load custom.o /sys/fs/bpf/test type sk_skb
# rm -v /sys/fs/bpf/test
Verifier output:
libbpf: prog 'tail_handle_ipv4_from_netdev': BPF program load failed: Permission denied
libbpf: prog 'tail_handle_ipv4_from_netdev': -- BEGIN PROG LOAD LOG --
0: R1=ctx(off=0,imm=0) R10=fp0
; int tail_handle_ipv4_from_netdev(struct __sk_buff *ctx)
0: (bf) r6 = r1 ; R1=ctx(off=0,imm=0) R6_w=ctx(off=0,imm=0)
1: (18) r0 = 0xffffff7a ; R0_w=4294967162
; return (void *)(unsigned long)ctx->data_end;
3: (61) r2 = *(u32 *)(r6 +80) ; R2_w=pkt_end(off=0,imm=0) R6_w=ctx(off=0,imm=0)
; return (void *)(unsigned long)ctx->data;
4: (61) r1 = *(u32 *)(r6 +76) ; R1_w=pkt(off=0,r=0,imm=0) R6_w=ctx(off=0,imm=0)
; if (data + tot_len > data_end)
5: (bf) r3 = r1 ; R1_w=pkt(off=0,r=0,imm=0) R3_w=pkt(off=0,r=0,imm=0)
6: (07) r3 += 34 ; R3_w=pkt(off=34,r=0,imm=0)
; if (data + tot_len > data_end)
7: (2d) if r3 > r2 goto pc+49 ; R2_w=pkt_end(off=0,imm=0) R3_w=pkt(off=34,r=34,imm=0)
; if (!(ctx->tc_index & TC_INDEX_F_SKIP_NODEPORT)) {
8: (61) r4 = *(u32 *)(r6 +44) ; R4_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R6_w=ctx(off=0,imm=0)
; if (!(ctx->tc_index & TC_INDEX_F_SKIP_NODEPORT)) {
9: (57) r4 &= 4 ; R4_w=scalar(umax=4,var_off=(0x0; 0x4))
10: (bf) r3 = r1 ; R1_w=pkt(off=0,r=34,imm=0) R3_w=pkt(off=0,r=34,imm=0)
; if (!(ctx->tc_index & TC_INDEX_F_SKIP_NODEPORT)) {
11: (55) if r4 != 0x0 goto pc+35 47: R0_w=4294967162 R1_w=pkt(off=0,r=34,imm=0) R2_w=pkt_end(off=0,imm=0) R3_w=pkt(off=0,r=34,imm=0) R4_w=scalar(umax=4,var_off=(0x0; 0x4)) R6_w=ctx(off=0,imm=0) R10=fp0
; if (data + tot_len > data_end)
47: (07) r3 += 34 ; R3_w=pkt(off=34,r=34,imm=0)
;
48: (2d) if r3 > r2 goto pc+8 ; R2_w=pkt_end(off=0,imm=0) R3_w=pkt(off=34,r=34,imm=0)
; return (void *)(unsigned long)ctx->data_end;
49: (67) r2 <<= 32
R2 pointer arithmetic on pkt_end prohibited
processed 45 insns (limit 1000000) max_states_per_insn 1 total_states 4 peak_states 4 mark_read 2
-- END PROG LOAD LOG --
libbpf: prog 'tail_handle_ipv4_from_netdev': failed to load: -13
libbpf: failed to load object '/home/max/projects/cilium/cilium/bpf/tests/custom.o'
Error: failed to load object file
As can be observed with Godbolt, Clang generated the following transformation on pkt
and pkt_end
pointers (ctx->data
and ctx->data_end
) for one of revalidate_data
calls:
r2 <<= 32
r2 >>= 32
r1 <<= 32
r1 >>= 32
The other revalidate_data
calls don't have this zero-extension sequence, but the one that produces it fails to pass the verifier.
A similar issue has been reported recently (#59908, where this pattern of bitshifts was produced under other circumstances), but my case is more severe, because these useless instructions also make the verifier reject the program.