-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[BPF] expand mem intrinsics (memcpy, memmove, memset) #97648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Declare that we do not support these "RTLIB" symbols. Thus PreISel pass could handle it. Also removes `warn-call.ll` because we will generate loop for memintrins. Fixes: #93700
Hi @inclyc , I tried this patch with BPF Linux kernel selftests and this triggered a verification failure with
After this change it looks as follows:
Basically, fully unrolled version was replaced by loop. BPFTargetLowering::BPFTargetLowering(const TargetMachine &TM,
const BPFSubtarget &STI)
: TargetLowering(TM) {
...
MaxStoresPerMemset = MaxStoresPerMemsetOptSize = 0;
MaxStoresPerMemcpy = MaxStoresPerMemcpyOptSize = 0;
MaxStoresPerMemmove = MaxStoresPerMemmoveOptSize = 0;
...
} Because of the way kernel BPF verifier works, the unrolled version is preferable to the loop (verifier traces execution, so processing a loop would take more instruction budget). Is it possible to make @yonghong-song , fyi. |
Hi @eddyz87, Thanks for testing this PR in the kernel! ❤️
I'd like to figure out how to, and I kindly ask if "memset" is unrolled then it is friendly to the verifier, what should we do if the memset is called with dynamic length? (Expand it, or report error?) |
In certain cases verifier would be able to handle dynamic length.
In essence, verifier traces (probably, it is correct to say that it does abstract interpretation) for each execution path in the program, and tries to trim traces that reach equivalent states. This logic makes loops a pain point. For program A verifier would trace path Added: and there is a certain budget for maximal number instructions that verifier would trace, before giving up. |
If the user has a call to memcpy which cannot be simply unrolled, the current behavior is to issue an error. For example,
With current compiler (llvm18), we will have
But with this patch, I see
basically memcpy is 'inlined' by the compiler. This is probably not what we want for test1.c since memcpy call is from user. In such cases, we would like user to explicitly write the loop to have maximum performance (e.g., load/store with u64 and remaining with u8 etc). Maybe somehow we should prevent memcpy generation if it cannot be fully unrolled later (meaning no loops)? We have some backend hooks at IR level, maybe we can leverage them? |
@yonghong-song , the memcpy is inserted by
Given that there are many builtins that we probably would like to disable, second option seems better. There is also the following thing used by /// Options to disable Loop Idiom Recognize, which can be shared with other
/// passes.
struct DisableLIRP {
/// When true, the entire pass is disabled.
static bool All;
/// When true, Memset is disabled.
static bool Memset;
/// When true, Memcpy is disabled.
static bool Memcpy;
}; But this would only solve issue with loop idioms pass. |
Thanks @eddyz87 I think the following change is good enough for now.
Basically, LoopIdiomRecognizePass exposed DisableLIRP::All (for memcpy/memset) so other pass can disable this transformation. This should prevent original source code from generating a library func which later is inlined again. Note that bpf program does not like library functions since it prevents verifier from doing its work. |
Hi @yonghong-song , Do you think it is a nice option to add some mem intrinsics into the kernel? Thus we can combine & generate such intrinsics, and verifier can easily recognize it (if it is just a |
You probably mean adding some kernel kfunc functions (similar to what you mentioned intrinsic functions, e.g., memcpy/memset etc.). Currently, there is no plan for this as the verifier is able to handle fully-unrolled loop or actual loop pretty well. There is no special processing here. Introducing kfunc needs additional verifier change and there are no obvious benefit. We do not have use case for memcpy/memset kfunc yet. |
Declare that we do not support these "RTLIB" symbols. Thus PreISel pass could handle it.
Also removes
warn-call.ll
because we will generate loop for memintrins.Fixes: #93700