-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[C] Disable use of NRVO #101038
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[C] Disable use of NRVO #101038
Conversation
Applying NRVO changes the observable behavior of the abstract machine, which makes it an invalid optimization to apply in C. This disables the functionality in C. This is the only code path where we set the NRVO state for a local variable in C (the other code paths are in coroutines and template instantiation, neither of which are available in C). Fixes llvm#100902
@llvm/pr-subscribers-clang Author: Aaron Ballman (AaronBallman) ChangesApplying NRVO changes the observable behavior of the abstract machine, which makes it an invalid optimization to apply in C. This disables the functionality in C. This is the only code path where we set the NRVO state for a local variable in C (the other code paths are in coroutines and template instantiation, neither of which are available in C). Fixes #100902 Full diff: https://github.com/llvm/llvm-project/pull/101038.diff 3 Files Affected:
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index ddad083571eb1..b0a8c55835868 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -156,6 +156,12 @@ Bug Fixes to Compiler Builtins
Bug Fixes to Attribute Support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Bug Fixes to C Support
+^^^^^^^^^^^^^^^^^^^^^^
+
+- No longer applying Named Return Value Optimization (NRVO) in C as it is a
+ non-conforming optimization in C. (#GH100902)
+
Bug Fixes to C++ Support
^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp
index 694a754646f27..0697d12d2fe0d 100644
--- a/clang/lib/Sema/SemaDecl.cpp
+++ b/clang/lib/Sema/SemaDecl.cpp
@@ -2176,7 +2176,9 @@ static void CheckPoppedLabel(LabelDecl *L, Sema &S,
}
void Sema::ActOnPopScope(SourceLocation Loc, Scope *S) {
- S->applyNRVO();
+ // NRVO is only valid in C++, not in C.
+ if (getLangOpts().CPlusPlus)
+ S->applyNRVO();
if (S->decl_empty()) return;
assert((S->getFlags() & (Scope::DeclScope | Scope::TemplateParamScope)) &&
diff --git a/clang/test/CodeGen/nrvo.c b/clang/test/CodeGen/nrvo.c
new file mode 100644
index 0000000000000..39796ce2144e0
--- /dev/null
+++ b/clang/test/CodeGen/nrvo.c
@@ -0,0 +1,75 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple i386-unknown-unknown -emit-llvm -disable-llvm-passes -o - %s | FileCheck --check-prefixes=C %s
+// RUN: %clang_cc1 -triple i386-unknown-unknown -emit-llvm -disable-llvm-passes -o - -x c++ %s | FileCheck --check-prefixes=CXX %s
+
+// NRVO is not allowed in C as it is in C++, so validate that this results in
+// use of NRVO in C++, but not in C.
+typedef struct {
+ int i, j;
+ double a, b;
+} S;
+
+int result;
+
+// C-LABEL: define dso_local void @test(
+// C-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_S:%.*]]) align 4 [[AGG_RESULT:%.*]], ptr noundef [[Q:%.*]]) #[[ATTR0:[0-9]+]] {
+// C-NEXT: [[ENTRY:.*:]]
+// C-NEXT: [[RESULT_PTR:%.*]] = alloca ptr, align 4
+// C-NEXT: [[Q_ADDR:%.*]] = alloca ptr, align 4
+// C-NEXT: [[S:%.*]] = alloca [[STRUCT_S]], align 4
+// C-NEXT: store ptr [[AGG_RESULT]], ptr [[RESULT_PTR]], align 4
+// C-NEXT: store ptr [[Q]], ptr [[Q_ADDR]], align 4
+// C-NEXT: call void @llvm.memset.p0.i32(ptr align 4 [[S]], i8 0, i32 24, i1 false)
+// C-NEXT: [[TMP0:%.*]] = load ptr, ptr [[Q_ADDR]], align 4
+// C-NEXT: [[CMP:%.*]] = icmp eq ptr [[S]], [[TMP0]]
+// C-NEXT: [[CONV:%.*]] = zext i1 [[CMP]] to i32
+// C-NEXT: store i32 [[CONV]], ptr @result, align 4
+// C-NEXT: call void @llvm.memcpy.p0.p0.i32(ptr align 4 [[AGG_RESULT]], ptr align 4 [[S]], i32 24, i1 false)
+// C-NEXT: ret void
+//
+// CXX-LABEL: define dso_local void @_Z4testP1S(
+// CXX-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_S:%.*]]) align 4 [[AGG_RESULT:%.*]], ptr noundef [[Q:%.*]]) #[[ATTR0:[0-9]+]] {
+// CXX-NEXT: [[ENTRY:.*:]]
+// CXX-NEXT: [[RESULT_PTR:%.*]] = alloca ptr, align 4
+// CXX-NEXT: [[Q_ADDR:%.*]] = alloca ptr, align 4
+// CXX-NEXT: store ptr [[AGG_RESULT]], ptr [[RESULT_PTR]], align 4
+// CXX-NEXT: store ptr [[Q]], ptr [[Q_ADDR]], align 4
+// CXX-NEXT: call void @llvm.memset.p0.i32(ptr align 4 [[AGG_RESULT]], i8 0, i32 24, i1 false)
+// CXX-NEXT: [[TMP0:%.*]] = load ptr, ptr [[Q_ADDR]], align 4
+// CXX-NEXT: [[CMP:%.*]] = icmp eq ptr [[AGG_RESULT]], [[TMP0]]
+// CXX-NEXT: [[CONV:%.*]] = zext i1 [[CMP]] to i32
+// CXX-NEXT: store i32 [[CONV]], ptr @result, align 4
+// CXX-NEXT: ret void
+//
+S test(S* q) {
+ S s = {0, 0, 0, 0};
+ result = &s == q;
+ return s;
+}
+
+// C-LABEL: define dso_local i32 @main(
+// C-SAME: ) #[[ATTR0]] {
+// C-NEXT: [[ENTRY:.*:]]
+// C-NEXT: [[RETVAL:%.*]] = alloca i32, align 4
+// C-NEXT: [[T:%.*]] = alloca [[STRUCT_S:%.*]], align 4
+// C-NEXT: store i32 0, ptr [[RETVAL]], align 4
+// C-NEXT: call void @test(ptr dead_on_unwind writable sret([[STRUCT_S]]) align 4 [[T]], ptr noundef [[T]])
+// C-NEXT: [[TMP0:%.*]] = load i32, ptr @result, align 4
+// C-NEXT: ret i32 [[TMP0]]
+//
+// CXX-LABEL: define dso_local noundef i32 @main(
+// CXX-SAME: ) #[[ATTR2:[0-9]+]] {
+// CXX-NEXT: [[ENTRY:.*:]]
+// CXX-NEXT: [[RETVAL:%.*]] = alloca i32, align 4
+// CXX-NEXT: [[T:%.*]] = alloca [[STRUCT_S:%.*]], align 4
+// CXX-NEXT: store i32 0, ptr [[RETVAL]], align 4
+// CXX-NEXT: call void @_Z4testP1S(ptr dead_on_unwind writable sret([[STRUCT_S]]) align 4 [[T]], ptr noundef [[T]])
+// CXX-NEXT: [[TMP0:%.*]] = load i32, ptr @result, align 4
+// CXX-NEXT: ret i32 [[TMP0]]
+//
+int main(void)
+{
+ S t = test(&t);
+ return result;
+}
+
|
One question I had is: this is an ABI breaking change for us, isn't it? Do we need/want ABI tags in that case? Should I call it out as a potentially breaking change? |
NRVO vs not-NRVO is not an ABI difference, just a behavior difference -- it doesn't change the calling convention, and different choices here are link-compatible (in C++ we can make different choices for different emissions of the same inline function even). This is also not a potentially breaking change in the sense that it was always an optional optimization. But it's breaking in the sense that it's an observable behavior change in our default mode. Instead of forcing this off for C compilations, how would you feel about instead changing the default for |
I'll get on top of fixing the other failing test cases, but I'd appreciate confirmation that we agree with the direction this patch is heading first. |
I think that's a reasonable language dialect for users, and agree that the name should change. Thanks for the suggestion! |
It doesn't seem like elide-constructors is wired to control this behavior. Seems to be wired only to the StaticAnalyzer presently. edit: nvm, there are two flags with similar names, ElideConstructors and ShouldElideConstructors. Regardless, the decision where to apply NRVO is currently centralized in SemaStmt. I think "Sema::getNamedReturnInfo(const VarDecl *VD)" would be the best place to apply the effects of this flag. |
I agree that the C standard does not appear to permit the result in Ted's example. There are three ways to fix this:
(1) also requires disabling copy elision in some cases: struct S { int x, y; char make_it_huge[1024]; };
struct S foo(struct S *alias) {
return (struct S) { .x = 5, .y = (alias->x = 7, 0) };
}
void test() {
struct S var = foo(&var); // var.x must contain 5
} I put "when aliasing is possible" in (1), but I can't see any reasonable way to make this decision locally in the callee because essentially any pointer could alias the return value. (1) really has to be done unconditionally. It is tractable to implement (2) with a simple local escape analysis of the variable's initializer. This would not be tractable if the initialized variable were a global variable, but, fortunately, C does not permit global variables to be initialized with non-constant expressions. (3) would presumably take the form of either permitting objects to overlap in certain situations, the same way that C++ permits copy elision, or just generally weakening object overlap rules during initialization. Richard, I'm sorry to contradict you, but Aaron is correct to guess that this is ABI-affecting: ABIs should and sometimes do specify whether an indirect return address is allowed to be aliased. For example, the x86_64 psABI is quite clear:
So on x86_64, we are clearly required to at least do (2). I don't think we want different rules on different targets unless we're forced to. If (2) is implemented correctly, I believe there's no way to observe NRVO, so we don't also need to implement (1). |
We already have some code for detecting variable inits that access the variable; see isAccessedBy() in CGDecl.cpp. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't merge this PR unless we need an immediate fix, which seems unlikely given how longstanding this behavior is.
Agreed, it's definitely not in shape for committing. As for needing an immediate fix, what surprised me was that two different people ran into this same issue within a few days of each other (#100902 and #100868) |
Note that we're forced to override that ABI rule in C++ sometimes — I believe we're now required (since C++11?) to elide a copy/move when initializing a normal object with a return value, and even if we weren't, it'd clearly be unreasonable compiler behavior to do an extra non-trivial operation here. I don't think the psABI intends that rule to have that effect anyway. This does raise a language interoperation question, though, related to the tractability note I made about (2) above:
C++ does have some restrictions on accessing objects that are being initialized through other names. It's possible that they're strong enough to satisfy the ABI rule here through a sort of reverse of the normal analysis: basically, any program that would violate the ABI rule is actually incorrect because of the semantic restriction. If not, I think we may need to force copies of trivial return types in the caller in C++. We can consider that separately, though. |
I'm a bit less clear on what to do now. It sounds like nobody is opposed to a language dialect flag (default off) so users can retain the existing behavior. But it also sounds like perhaps the correct approach to this is to modify codegen and not the NRVO markings when building the AST. (Then again, wouldn't we want to drop that bit from the AST in C anyway because we wouldn't want AST consumers to presume that NRVO is valid, such as one-off AST matchers for use in clang-tidy, etc?) |
I agree it's strange choice the effects of elide-constructor are applied on the CodeGen side, by ignoring the AST, instead of on the AST directly. |
My argument above is that the x86-64 psABI requires us to not pass I think the approximate shape of the correct fix here is to: |
From my testing in compiler explorer, in C++ mode GCC and Clang always produce programs that return 1 as far back as we can test. So I think this supports the notion that this should be a dialect, and we have to wire defaults differently in MSVC mode as well. |
My claim was narrowly that the choice to perform or not perform NRVO for an indirect return is not an ABI difference, which I'm not yet sure this contradicts. But you make a good point -- we shouldn't be turning off NRVO if it's only causing issues because we're also not correctly implementing an ABI rule on the caller side, and should instead fix Clang to obey the ABI rule. Do we support ABIs that don't have a rule like x86_64's? |
I doubt anyone is intentionally depending on equality here.
I suspect it isn't legal to actually access the object through another pointer while it's being constructed, but that doesn't help if the user is just doing a pointer equality comparison. And C++17 rules forbid a copy on the caller side: "guaranteed copy elision" means semantically, there is no copy. |
The difference wrt the gnu::cleanup behavior looks plausible: #100868 |
I accept your pedantry. 🫡
The ARM and ARM64 AAPCSs specify that "The memory to be used for the result may be modified at any point during the function call" and "The callee may modify the result memory block at any point during the execution of the subroutine", respectively. I would interpret both to have a similar overall effect as the x86-64 psABI:
I checked a handful of other major ABIs (i386, MIPS, and PPC), and they don't say anything directly about it. Typically, the absence of a statement must be interpreted as a lack of a guarantee. In this case, however, I would argue it is more plausible to infer that the ABI authors did not consider the possibility of passing a pointer to aliased memory for the return value, and therefore we should interpret their silence as an implicit statement that compilers are not permitted to do that. There are plausible arguments for the reverse ABI rule from x86-64 here. For example, since trivial types don't distinguish initialization and assignment, an ABI rule that required indirect return values to be initialized as the final step in the callee would permit simple assignments in the caller (e.g. |
Ah, good point.
An extra copy is permitted for trivially-copyable types; otherwise we'd be required to pass and return all classes indirectly. So we're not forbidden from adding an extra copy here if we decide it's necessary. |
Oh, right. |
That seems like a reasonable argument. However, in my testing, a lot of C compilers (including GCC, Clang, ICC) pass a pointer to aliased memory for cases like Should we try to convince at least the GCC folks that they're getting this wrong too?
For what it's worth, (1) appears to be what GCC and ICC do, unconditionally. (And MSVC appears to make an unconditional copy in both the caller and the callee.) |
It does feel like a wider conversation is necessary, yeah. The fact that everybody is doing this even on x86_64 where it's pretty incontrovertibly forbidden seems significant. |
Hi, I opened #100868. As a user without knowledge of LLVM internals, I think a finer-grained option like |
I think |
CC @pinskia for awareness |
Note I think the C++ issue with NRVO is recorded as https://cplusplus.github.io/CWG/issues/2868.html . |
Applying NRVO changes the observable behavior of the abstract machine, which makes it an invalid optimization to apply in C. This disables the functionality in C.
This is the only code path where we set the NRVO state for a local variable in C (the other code paths are in coroutines and template instantiation, neither of which are available in C).
Fixes #100902