[RFC][BPF] Support Jump Table #133856
Conversation
@aspsk As we discussed in LSFMMBPF, here is the implementation for llvm jump table support. Please take a look and try libbpf/kernel implementations. Let me know if you hit any issues.
Don't bother. x86 is doing it to save a byte in encoding. This technique doesn't apply to the bpf isa.
let isIndirectBranch = 1 in {
  def JX : JMP_IND<BPF_JA, "gotox", [(brind i64:$dst)]>;
}
nice to see how it should be done, I just had hardcoded it in my test branch: aspsk@98773c6
Thanks @yonghong-song! I will test this, match with the verification part, and post my results in this PR
@@ -65,10 +65,11 @@ BPFTargetLowering::BPFTargetLowering(const TargetMachine &TM,
    setOperationAction(ISD::BR_CC, MVT::i64, Custom);
    setOperationAction(ISD::BR_JT, MVT::Other, Expand);
    setOperationAction(ISD::BRIND, MVT::Other, Expand);
So, this does remove restriction to not produce indirect jumps?
Is there a way to control if we want to generate indirect jumps "in general" vs., say, "only for large switches"? (Or even only for a particular switch?)
So, this does remove restriction to not produce indirect jumps?
Yes, we do not want to expand 'brind'; rather, we will do pattern matching on 'brind'.
Is there a way to control if we want to generate indirect jumps "in general" vs., say, "only for large switches"? (Or even only for a particular switch?)
Good point. Let me do some experiments with a flag for this. I am not sure whether I could do 'only for a particular switch', but I will do some investigation. Hopefully I can find a solution for that.
I added an option to control how many cases a switch statement needs before a jump table is used. The default is 4 cases, but you can change it with an additional clang option. For example, to require a minimum of 6 cases:
clang ... -mllvm -bpf-min-jump-table-entries=6
I checked other targets; there is no control for a specific switch. So I think we do not need that for now.
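As an illustration (hypothetical code, not taken from the PR or its selftests), a switch like the one below has four cases, so under the default -bpf-min-jump-table-entries=4 it is a jump-table candidate, while raising the threshold (e.g. =6) would keep it as compare-and-branch:

```c
/* Hypothetical example. When built with
 *   clang --target=bpf -mcpu=v3 -O2 -mllvm -bpf-min-jump-table-entries=4
 * this patch would be eligible to lower the switch through a jump table;
 * with -bpf-min-jump-table-entries=6 it would not, since there are
 * only four cases. Compiled natively, it behaves the same either way. */
static int classify(int op)
{
    switch (op) {
    case 0: return 10;
    case 1: return 20;
    case 2: return 30;
    case 3: return 40;
    default: return -1;   /* out-of-range ops fall through here */
    }
}
```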
Awesome, thanks!
@yonghong-song could you please elaborate on this? How exactly does one classify those into per-table chunks?
The below is an example for test_tc_tunnel.bpf.o with
The above .rodata is what you really care about. You can also find that all .rodata relocations happen in the decap and .text sections.
You then need to go through sections 'decap' and '.text' for their .rodata relocations.
It corresponds to insn 7 (0x38/8 = 7).
In the above, 'r3 = 0x80' means the relocation starts at offset 0x80 of the .rodata section. You need to scan ALL such relocations in the .text and decap sections, and with that you can sort by the start of each relocation. After that, you will be able to calculate each relocation's size. Once you have calculated each relocation size (for the .rodata section), you need to check whether a particular relocation is for a gotox or for something else. So you need to scan backwards. For example:
You find a gotox insn with target r2; then you go back and find 'r2 = *(u64 *)(r2 + 0x0)', then 'r2 += r3', and then 'r2 = 0x140 ll'. The above code pattern is generated by llvm and should be generally true for the jump table implementation. With that, you can be certain that the table for this particular gotox is at offset 0x140 of the .rodata section. The size of the table is already calculated by the previous mechanism of scanning all .rodata relocations in the .text and decap sections.
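The backward scan described above can be sketched as follows. This is a hypothetical model (the insn representation and all names are made up, not libbpf's), assuming the llvm-generated pattern: ld_imm64 of the table base, add of the scaled index, load of the target address, then gotox.

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified instruction model (hypothetical, not struct bpf_insn).
 * For OP_GOTOX, 'dst' holds the register being jumped through. */
enum op { OP_LD_IMM64, OP_ADD_REG, OP_LD_MEM, OP_GOTOX, OP_OTHER };

struct insn {
    enum op  op;
    int      dst;   /* destination register */
    uint64_t imm;   /* for OP_LD_IMM64: offset into .rodata */
};

/* Walk backwards from insns[gotox_idx] through the insns that write the
 * gotox register, until the ld_imm64 that loads the jump-table base is
 * found. Returns its .rodata offset, or UINT64_MAX if the llvm pattern
 * (ld_imm64 / add / load / gotox) is not matched. */
static uint64_t find_jt_offset(const struct insn *insns, size_t gotox_idx)
{
    int reg = insns[gotox_idx].dst;
    for (size_t i = gotox_idx; i-- > 0; ) {
        const struct insn *in = &insns[i];
        if (in->dst != reg)
            continue;              /* does not touch the jump register */
        if (in->op == OP_LD_IMM64)
            return in->imm;        /* found the table base */
        if (in->op != OP_ADD_REG && in->op != OP_LD_MEM)
            return UINT64_MAX;     /* pattern broken */
    }
    return UINT64_MAX;
}
```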
I am looking into how to automate this properly (I have a really hacky PoC test working with this version of llvm and my custom test). It looks simpler with explicit jump tables (when I take an address of a label and store it in an array), because then I can just push values to a custom section. Will post updates here.
I found an llvm option
This way, you just need to scan the related code section. As long as it matches one of the jump tables (.rodata relocation, offset also matching), you do not need to care about gotox at all in libbpf.
There is one test failure, like below:
The reason should be due to my unconditionally enabling
Thanks @yonghong-song, that size/offset section is really useful! This looks sufficient for me to continue with a PoC.
Unfortunately, I do, this is required for verification. For indirect jumps to work, two things should be verified:
The So, in order to construct a verifiable program, libbpf should:
(Haven't checked yet for real, but this looks to be enough for "custom", e.g., user-defined, jump tables to work. Just declare it as
You are right. Verification does need to connect the jump table map and the gotox insn.
Backtracking certainly works. But maybe there is an alternative that does not require backtracking.
Your user-defined jump table may work. But it would be great if we could just allow the current common switch statements, for code cleanliness and developer productivity.
Right, this is exactly what I meant by "backtracking". Looks like for
Yes, libbpf does not need to do the verifier's work. The range analysis should be done in the verifier.
Hi @yonghong-song! I was trying different switch variants; simple ones work like magic, so we're definitely going in the right direction. One simple case fails for me, though. Namely, in the example below LLVM generates an unreachable instruction. Could you take a look please? An example source program is
Then the object file looks like
Now, the jump table is
And the check
makes sure that And this makes the instruction
unreachable.
I suspect it won't be easy to avoid this on the llvm side. Probably better to teach the verifier to ignore those.
Ok, thanks, will do this for now
Update. I have a patch for kernel + libbpf which uses this LLVM and which passes all my new selftests + all (but one) standard bpf selftests which are compiled to use
So far only one selftest fails (
✅ With the latest revision this PR passed the C/C++ code formatter.
Thanks for the update. When trying your above example
I found a problem and just added another commit to fix it. The issue is due to the llvm machine-sink pass. The implementation is similar to X86 (X86InstrInfo::getJumpTableIndex()). See the top commit (commit 4) for more details.
Thanks @yonghong-song! I will test your latest changes over this weekend. (The
Do we need to modify the ASMParser also?
llvm-project/llvm/lib/Target/BPF/AsmParser/BPFAsmParser.cpp
Lines 228 to 233 in f2e62cf
static bool isValidIdAtStart(StringRef Name) {
  return StringSwitch<bool>(Name.lower())
      .Case("if", true)
      .Case("call", true)
      .Case("callx", true)
      .Case("goto", true)
Right, need to add gotox as well. Will fix. Thanks!
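For reference, this is a self-contained C stand-in for the fixed keyword check (the real isValidIdAtStart() is C++ using llvm::StringSwitch, and its keyword list continues beyond the excerpt shown): "gotox" joins the set of identifiers that may legally start a BPF asm statement.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch, not the LLVM source: lowercase the identifier
 * and check it against the known statement-starting keywords,
 * including the new 'gotox'. */
static bool is_valid_id_at_start(const char *name)
{
    static const char *keywords[] = { "if", "call", "callx", "goto", "gotox" };
    char lower[16];
    size_t n = strlen(name);

    if (n >= sizeof(lower))
        return false;                       /* too long to be a keyword */
    for (size_t i = 0; i <= n; i++)         /* copy including the NUL */
        lower[i] = (char)tolower((unsigned char)name[i]);
    for (size_t i = 0; i < sizeof(keywords) / sizeof(keywords[0]); i++)
        if (strcmp(lower, keywords[i]) == 0)
            return true;
    return false;
}
```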
NOTE: We probably need cpu v5 or other flags to enable this feature. We can add it later when necessary.

This patch adds jump table support. A new insn 'gotox <reg>' is added to allow a goto through a register. The register represents an address in the current section. The following is a concrete example with bpf selftest progs/user_ringbuf_success.c.

Compilation command line to generate the .s file:
=============================================

clang -g -Wall -Werror -D__TARGET_ARCH_x86 -mlittle-endian \
  -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/include \
  -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf \
  -I/home/yhs/work/bpf-next/tools/include/uapi \
  -I/home/yhs/work/bpf-next/tools/testing/selftests/usr/include -std=gnu11 \
  -fno-strict-aliasing -Wno-compare-distinct-pointer-types \
  -idirafter /home/yhs/work/llvm-project/llvm/build.21/Release/lib/clang/21/include \
  -idirafter /usr/local/include -idirafter /usr/include \
  -DENABLE_ATOMICS_TESTS -O2 -S progs/user_ringbuf_success.c \
  -o /home/yhs/work/bpf-next/tools/testing/selftests/bpf/user_ringbuf_success.bpf.o.s \
  --target=bpf -mcpu=v3

The related assembly:

read_protocol_msg:
        ...
        r3 <<= 3
        r1 = .LJTI1_0 ll
        r1 += r3
        r1 = *(u64 *)(r1 + 0)
        gotox r1
LBB1_4:
        r1 = *(u64 *)(r0 + 8)
        goto LBB1_5
LBB1_7:
        r1 = *(u64 *)(r0 + 8)
        goto LBB1_8
LBB1_9:
        w1 = *(u32 *)(r0 + 8)
        r1 <<= 32
        r1 s>>= 32
        r2 = kern_mutated ll
        r3 = *(u64 *)(r2 + 0)
        r3 *= r1
        *(u64 *)(r2 + 0) = r3
        goto LBB1_11
LBB1_6:
        w1 = *(u32 *)(r0 + 8)
        r1 <<= 32
        r1 s>>= 32
LBB1_5:
        ...
        .section .rodata,"a",@progbits
        .p2align 3, 0x0
.LJTI1_0:
        .quad LBB1_4
        .quad LBB1_6
        .quad LBB1_7
        .quad LBB1_9
...
publish_next_kern_msg:
        ...
        r6 <<= 3
        r1 = .LJTI6_0 ll
        r1 += r6
        r1 = *(u64 *)(r1 + 0)
        gotox r1
LBB6_3:
        ...
LBB6_5:
        ...
LBB6_6:
        ...
LBB6_4:
        ...
        .section .rodata,"a",@progbits
        .p2align 3, 0x0
.LJTI6_0:
        .quad LBB6_3
        .quad LBB6_4
        .quad LBB6_5
        .quad LBB6_6

Now let us look at the .o file
==========================

clang -g -Wall -Werror -D__TARGET_ARCH_x86 -mlittle-endian \
  -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/include \
  -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf \
  -I/home/yhs/work/bpf-next/tools/include/uapi \
  -I/home/yhs/work/bpf-next/tools/testing/selftests/usr/include \
  -std=gnu11 -fno-strict-aliasing -Wno-compare-distinct-pointer-types \
  -idirafter /home/yhs/work/llvm-project/llvm/build.21/Release/lib/clang/21/include \
  -idirafter /usr/local/include -idirafter /usr/include -DENABLE_ATOMICS_TESTS \
  -O2 -c progs/user_ringbuf_success.c \
  -o /home/yhs/work/bpf-next/tools/testing/selftests/bpf/user_ringbuf_success.bpf.o \
  --target=bpf -mcpu=v3

In the obj file, all .rodata sections are merged together. So we have

$ llvm-readelf -x '.rodata' user_ringbuf_success.bpf.o
Hex dump of section '.rodata':
0x00000000 a8020000 00000000 10030000 00000000 ................
0x00000010 b8020000 00000000 c8020000 00000000 ................
0x00000020 40040000 00000000 18050000 00000000 @...............
0x00000030 88040000 00000000 d0040000 00000000 ................
0x00000040 44726169 6e207265 7475726e 65643a20 Drain returned:
0x00000050 256c640a 00556e65 78706563 7465646c %ld..Unexpectedl
0x00000060 79206661 696c6564 20746f20 67657420 y failed to get
0x00000070 6d73670a 00556e72 65636f67 6e697a65 msg..Unrecognize
0x00000080 64206f70 2025640a 00256c75 20213d20 d op %d..%lu !=
0x00000090 256c750a 00627066 5f64796e 7074725f %lu..bpf_dynptr_
0x000000a0 72656164 28292066 61696c65 643a2025 read() failed: %
0x000000b0 640a0055 6e657870 65637465 646c7920 d..Unexpectedly
0x000000c0 6661696c 65642074 6f206765 74207361 failed to get sa
0x000000d0 6d706c65 0a00                       mple..

Let us look at the insns. Some annotation explains the details.

$ llvm-objdump -Sr user_ringbuf_success.bpf.o
....
Disassembly of section .text:

0000000000000000 <read_protocol_msg>:
; msg = bpf_dynptr_data(dynptr, 0, sizeof(*msg));
       0: b4 02 00 00 00 00 00 00 w2 = 0x0
       1: b4 03 00 00 10 00 00 00 w3 = 0x10
       2: 85 00 00 00 cb 00 00 00 call 0xcb
...
0000000000000268 <handle_sample_msg>:
; switch (msg->msg_op) {
      77: 61 13 00 00 00 00 00 00 w3 = *(u32 *)(r1 + 0x0)
      78: 26 03 1c 00 03 00 00 00 if w3 > 0x3 goto +0x1c <handle_sample_msg+0xf0>
      79: 67 03 00 00 03 00 00 00 r3 <<= 0x3
      80: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
                0000000000000280: R_BPF_64_64 .rodata
    <=== r2 will be the address of .rodata with offset 0.
    <=== look at the first 32 bytes of .rodata:
           0x00000000 a8020000 00000000 10030000 00000000 ................
           0x00000010 b8020000 00000000 c8020000 00000000 ................
         The four actual addresses are
           0x2a8: insn idx 0x2a8/8 = 85
           0x310: insn idx 0x310/8 = 98
           0x2b8: insn idx 0x2b8/8 = 87
           0x2c8: insn idx 0x2c8/8 = 89
      82: 0f 32 00 00 00 00 00 00 r2 += r3
      83: 79 22 00 00 00 00 00 00 r2 = *(u64 *)(r2 + 0x0)
      84: 0d 02 00 00 00 00 00 00 gotox r2
    <=== So eventually gotox will go to the insn idx in this section.
; kern_mutated += msg->operand_64;
      85: 79 11 08 00 00 00 00 00 r1 = *(u64 *)(r1 + 0x8)
      86: 05 00 0e 00 00 00 00 00 goto +0xe <handle_sample_msg+0xc0>
; kern_mutated *= msg->operand_64;
      87: 79 11 08 00 00 00 00 00 r1 = *(u64 *)(r1 + 0x8)
      88: 05 00 03 00 00 00 00 00 goto +0x3 <handle_sample_msg+0x78>
; kern_mutated *= msg->operand_32;
      89: 61 11 08 00 00 00 00 00 w1 = *(u32 *)(r1 + 0x8)
      90: 67 01 00 00 20 00 00 00 r1 <<= 0x20
      91: c7 01 00 00 20 00 00 00 r1 s>>= 0x20
      92: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0x0 ll
...
00000000000003a0 <publish_next_kern_msg>:
; {
     116: bc 16 00 00 00 00 00 00 w6 = w1
; msg = bpf_ringbuf_reserve(&kernel_ringbuf, sizeof(*msg), 0);
     117: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll
                00000000000003a8: R_BPF_64_64 kernel_ringbuf
     119: b7 02 00 00 10 00 00 00 r2 = 0x10
     120: b7 03 00 00 00 00 00 00 r3 = 0x0
     121: 85 00 00 00 83 00 00 00 call 0x83
; if (!msg) {
     122: 55 00 06 00 00 00 00 00 if r0 != 0x0 goto +0x6 <publish_next_kern_msg+0x68>
; err = 4;
     123: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll
                00000000000003d8: R_BPF_64_64 err
     125: b4 02 00 00 04 00 00 00 w2 = 0x4
     126: 63 21 00 00 00 00 00 00 *(u32 *)(r1 + 0x0) = w2
     127: b4 00 00 00 01 00 00 00 w0 = 0x1
; return 1;
     128: 05 00 31 00 00 00 00 00 goto +0x31 <publish_next_kern_msg+0x1f0>
; switch (index % TEST_MSG_OP_NUM_OPS) {
     129: 54 06 00 00 03 00 00 00 w6 &= 0x3
     130: 67 06 00 00 03 00 00 00 r6 <<= 0x3
     131: 18 01 00 00 20 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x20 ll
                0000000000000418: R_BPF_64_64 .rodata
    <=== r1 will be the address of .rodata with offset 0x20.
    <=== look at .rodata bytes at offset 0x20:
           0x00000020 40040000 00000000 18050000 00000000 @...............
           0x00000030 88040000 00000000 d0040000 00000000 ................
         The four actual addresses are
           0x440: insn idx 0x440/8 = 136
           0x518: insn idx 0x518/8 = 163
           0x488: insn idx 0x488/8 = 145
           0x4d0: insn idx 0x4d0/8 = 154
     133: 0f 61 00 00 00 00 00 00 r1 += r6
     134: 79 11 00 00 00 00 00 00 r1 = *(u64 *)(r1 + 0x0)
     135: 0d 01 00 00 00 00 00 00 gotox r1
    <=== So eventually gotox will go to the insn idx in this section.
     136: b4 01 00 00 00 00 00 00 w1 = 0x0
; msg->msg_op = TEST_MSG_OP_INC64;
     137: 63 10 00 00 00 00 00 00 *(u32 *)(r0 + 0x0) = w1
     138: b7 01 00 00 04 00 00 00 r1 = 0x4
; msg->operand_64 = operand_64;
     139: 7b 10 08 00 00 00 00 00 *(u64 *)(r0 + 0x8) = r1
; expected_user_mutated += operand_64;
     140: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll
                0000000000000460: R_BPF_64_64 expected_user_mutated
     142: 79 11 00 00 00 00 00 00 r1 = *(u64 *)(r1 + 0x0)
     143: 07 01 00 00 04 00 00 00 r1 += 0x4
; break;
     144: 05 00 1a 00 00 00 00 00 goto +0x1a <publish_next_kern_msg+0x1b8>
     145: b4 01 00 00 02 00 00 00 w1 = 0x2
; msg->msg_op = TEST_MSG_OP_MUL64;
...

There are a few things worth discussing. First, in the above, it is hard to find the jump table size for a particular relocation ('R_BPF_64_64 .rodata + <offset>'). One approach is to scan through the whole elf file so you can find all '.rodata + <offset>' relocations. For example, here we have

  .rodata + 0
  .rodata + 0x20
  .rodata + 0x40
  .rodata + 0x55
  .rodata + 0x75
  .rodata + 0x89
  .rodata + 0x95
  .rodata + 0xb3

With the above information, the size of each sub-rodata can be found easily.

An option -bpf-min-jump-table-entries is implemented to control the minimum number of entries needed to use a jump table on BPF. The default value is 4, but it can be changed with the following clang option

  clang ... -mllvm -bpf-min-jump-table-entries=6

where the number of jump table cases needs to be >= 6 in order to use a jump table.
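The sizing trick described above can be sketched as follows (a hypothetical helper, not libbpf code): once the '.rodata + <offset>' relocation targets are collected and sorted, each sub-object ends where the next one begins, or at the end of the section for the last one.

```c
#include <stdint.h>
#include <stddef.h>

/* Given the sorted list of relocation start offsets into .rodata and the
 * section size, bound the size of the i-th sub-object: it runs up to the
 * next relocation start (or the section end). The offsets and the 0xd6
 * section size used in the test come from the discussion above. */
static uint64_t subobj_size(const uint64_t *sorted_offs, size_t n,
                            size_t i, uint64_t sec_size)
{
    uint64_t end = (i + 1 < n) ? sorted_offs[i + 1] : sec_size;
    return end - sorted_offs[i];
}
```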
For example,

[ 6] .rodata                    PROGBITS       0000000000000000 000740 0000d6 00   A  0   0  8
[ 7] .rel.rodata                REL            0000000000000000 003860 000080 10   I 39   6  8
[ 8] .llvm_jump_table_sizes     LLVM_JT_SIZES  0000000000000000 000816 000010 00      0   0  1
[ 9] .rel.llvm_jump_table_sizes REL            0000000000000000 0038e0 000010 10   I 39   8  8
...
[14] .llvm_jump_table_sizes     LLVM_JT_SIZES  0000000000000000 000958 000010 00      0   0  1
[15] .rel.llvm_jump_table_sizes REL            0000000000000000 003970 000010 10   I 39  14  8

With llvm-readelf, dump sections 8 and 14:

$ llvm-readelf -x 8 user_ringbuf_success.bpf.o
Hex dump of section '.llvm_jump_table_sizes':
0x00000000 00000000 00000000 04000000 00000000 ................
$ llvm-readelf -x 14 user_ringbuf_success.bpf.o
Hex dump of section '.llvm_jump_table_sizes':
0x00000000 20000000 00000000 04000000 00000000  ...............

You can see there are two jump tables:
  jump table 1: offset 0, size 4 (4 labels)
  jump table 2: offset 0x20, size 4 (4 labels)

Checking sections 9 and 15, we can find the corresponding relocations:

Relocation section '.rel.llvm_jump_table_sizes' at offset 0x38e0 contains 1 entries:
    Offset             Info             Type             Symbol's Value   Symbol's Name
0000000000000000  0000000a00000002 R_BPF_64_ABS64   0000000000000000 .rodata

Relocation section '.rel.llvm_jump_table_sizes' at offset 0x3970 contains 1 entries:
    Offset             Info             Type             Symbol's Value   Symbol's Name
0000000000000000  0000000a00000002 R_BPF_64_ABS64   0000000000000000 .rodata

and confirmed that the relocation is against '.rodata'. Dump the .rodata section:

0x00000000 a8000000 00000000 10010000 00000000 ................
0x00000010 b8000000 00000000 c8000000 00000000 ................
0x00000020 28040000 00000000 00050000 00000000 (...............
0x00000030 70040000 00000000 b8040000 00000000 p...............
0x00000040 44726169 6e207265 7475726e 65643a20 Drain returned:

So we can get the two jump tables:

.rodata offset 0, # of labels 4:
0x00000000 a8000000 00000000 10010000 00000000 ................
0x00000010 b8000000 00000000 c8000000 00000000 ................

.rodata offset 0x20, # of labels 4:
0x00000020 28040000 00000000 00050000 00000000 (...............
0x00000030 70040000 00000000 b8040000 00000000 p...............

This way, you just need to scan the related code section. As long as it matches one of the jump tables (.rodata relocation, offset also matching), you do not need to care about gotox at all in libbpf.
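The dumps above decode mechanically: each .llvm_jump_table_sizes entry is a pair of little-endian u64 values, the table's offset (relocated against .rodata here) and its number of entries. A minimal sketch (struct and helper names are made up):

```c
#include <stdint.h>
#include <stddef.h>

/* One decoded .llvm_jump_table_sizes entry: table offset + label count. */
struct jt_size {
    uint64_t offset;
    uint64_t count;
};

/* Read an unaligned little-endian u64, byte by byte. */
static uint64_t rd_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Decode one 16-byte entry as dumped by llvm-readelf above. */
static struct jt_size parse_jt_size(const uint8_t *entry)
{
    struct jt_size s = { rd_le64(entry), rd_le64(entry + 8) };
    return s;
}
```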
The implementation is similar to the function getJumpTableIndex() in X86InstrInfo.cpp. For the following example:

struct simple_ctx {
        int x;
        int y;
        int z;
};

int ret_user, ret_user2;
void bar(void);

int foo(struct simple_ctx *ctx, struct simple_ctx *ctx2)
{
        switch (ctx->x) {
        case 1: ret_user = 8; break;
        case 6: ret_user = 3; break;
        case 2: ret_user = 4; break;
        case 31: ret_user = 5; break;
        default: ret_user = 19; break;
        }

        bar();

        switch (ctx2->x) {
        case 0: ret_user2 = 8; break;
        case 7: ret_user2 = 3; break;
        case 9: ret_user2 = 4; break;
        case 31: ret_user2 = 5; break;
        default: ret_user2 = 29; break;
        }

        return 0;
}

Before the machine-sink pass:

Jump Tables:
%jump-table.0: %bb.5 %bb.2 %bb.4 %bb.4 %bb.4 %bb.1 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.3
%jump-table.1: %bb.10 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.6 %bb.9 %bb.7 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.8

Machine-level IR:

bb.0.entry:
  successors: %bb.4(0x0ccccccb), %bb.11(0x73333335); %bb.4(10.00%), %bb.11(90.00%)
  liveins: $r1, $r2
  %3:gpr = COPY $r2
  %2:gpr = COPY $r1
  %4:gpr32 = MOV_ri_32 8
  %6:gpr32 = LDW32 %2:gpr, 0 :: (load (s32) from %ir.ctx, !tbaa !3)
  %7:gpr32 = ADD_ri_32 %6:gpr32(tied-def 0), -1
  %5:gpr = MOV_32_64 %7:gpr32
  JUGT_ri_32 %7:gpr32, 30, %bb.4

bb.11.entry:
; predecessors: %bb.0
  successors: %bb.5(0x1c71c71c), %bb.2(0x1c71c71c), %bb.4(0x0e38e38e), %bb.1(0x1c71c71c), %bb.3(0x1c71c71c); %bb.5(22.22%), %bb.2(22.22%), %bb.4(11.11%), %bb.1(22.22%), %bb.3(22.22%)
  %8:gpr = SLL_ri %5:gpr(tied-def 0), 3
  %9:gpr = LD_imm64 %jump-table.0
  %10:gpr = ADD_rr %9:gpr(tied-def 0), killed %8:gpr
  %11:gpr = LDD killed %10:gpr, 0 :: (load (s64) from jump-table)
  JX killed %11:gpr

bb.1.sw.bb1:
; predecessors: %bb.11
  successors: %bb.5(0x80000000); %bb.5(100.00%)
  %14:gpr32 = MOV_ri_32 3
  JMP %bb.5

bb.2.sw.bb2:
; predecessors: %bb.11
  successors: %bb.5(0x80000000); %bb.5(100.00%)
  %13:gpr32 = MOV_ri_32 4
  JMP %bb.5

bb.3.sw.bb3:
; predecessors: %bb.11
  successors: %bb.5(0x80000000); %bb.5(100.00%)
  %12:gpr32 = MOV_ri_32 5
  JMP %bb.5

bb.4.sw.default:
; predecessors: %bb.11, %bb.0
  successors: %bb.5(0x80000000); %bb.5(100.00%)
  %15:gpr32 = MOV_ri_32 19

bb.5.sw.epilog:
  ...

After the machine-sink pass:

Jump Tables:
%jump-table.0: %bb.13 %bb.2 %bb.4 %bb.4 %bb.4 %bb.1 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.4 %bb.3
%jump-table.1: %bb.14 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.6 %bb.9 %bb.7 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.9 %bb.8

Machine-level IR:

bb.0.entry:
  successors: %bb.4(0x0ccccccb), %bb.11(0x73333335); %bb.4(10.00%), %bb.11(90.00%)
  liveins: $r1, $r2
  %3:gpr = COPY $r2
  %2:gpr = COPY $r1
  %6:gpr32 = LDW32 %2:gpr, 0 :: (load (s32) from %ir.ctx, !tbaa !3)
  %7:gpr32 = ADD_ri_32 %6:gpr32(tied-def 0), -1
  JUGT_ri_32 %7:gpr32, 30, %bb.4

bb.11.entry:
; predecessors: %bb.0
  successors: %bb.13(0x1c71c71c), %bb.2(0x1c71c71c), %bb.4(0x0e38e38e), %bb.1(0x1c71c71c), %bb.3(0x1c71c71c); %bb.13(22.22%), %bb.2(22.22%), %bb.4(11.11%), %bb.1(22.22%), %bb.3(22.22%)
  %5:gpr = MOV_32_64 %7:gpr32
  %8:gpr = SLL_ri %5:gpr(tied-def 0), 3
  %9:gpr = LD_imm64 %jump-table.0
  %10:gpr = ADD_rr %9:gpr(tied-def 0), killed %8:gpr
  %11:gpr = LDD killed %10:gpr, 0 :: (load (s64) from jump-table)
  JX killed %11:gpr

bb.13:
; predecessors: %bb.11
  successors: %bb.5(0x80000000); %bb.5(100.00%)
  %4:gpr32 = MOV_ri_32 8
  JMP %bb.5

bb.1.sw.bb1:
; predecessors: %bb.11
  successors: %bb.5(0x80000000); %bb.5(100.00%)
  %14:gpr32 = MOV_ri_32 3
  JMP %bb.5

bb.2.sw.bb2:
; predecessors: %bb.11
  successors: %bb.5(0x80000000); %bb.5(100.00%)
  %13:gpr32 = MOV_ri_32 4
  JMP %bb.5

bb.3.sw.bb3:
; predecessors: %bb.11
  successors: %bb.5(0x80000000); %bb.5(100.00%)
  %12:gpr32 = MOV_ri_32 5
  JMP %bb.5
bb.4.sw.default:
; predecessors: %bb.11, %bb.0
  successors: %bb.5(0x80000000); %bb.5(100.00%)
  %15:gpr32 = MOV_ri_32 19

bb.5.sw.epilog:

Before the machine-sink pass, '%4:gpr32 = MOV_ri_32 8' is in the entry block, so there is no switch-branch block for value 8. But the machine-sink pass later removed '%4:gpr32 = MOV_ri_32 8' from the entry block and added back a switch-branch block (bb.13) for value 8. Such a transformation requires adjusting the jump table. This commit implements the backend callback function getJumpTableIndex() so the jump table can be properly updated.
Also remove gotol from the middle of an asm insn.