Skip to content

[C] Disable use of NRVO #101038

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

AaronBallman
Copy link
Collaborator

Applying NRVO changes the observable behavior of the abstract machine, which makes it an invalid optimization to apply in C. This disables the functionality in C.

This is the only code path where we set the NRVO state for a local variable in C (the other code paths are in coroutines and template instantiation, neither of which are available in C).

Fixes #100902

Applying NRVO changes the observable behavior of the abstract machine,
which makes it an invalid optimization to apply in C. This disables the
functionality in C.

This is the only code path where we set the NRVO state for a local
variable in C (the other code paths are in coroutines and template
instantiation, neither of which are available in C).

Fixes llvm#100902
@AaronBallman AaronBallman added c clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Jul 29, 2024
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Jul 29, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 29, 2024

@llvm/pr-subscribers-clang

Author: Aaron Ballman (AaronBallman)

Changes

Applying NRVO changes the observable behavior of the abstract machine, which makes it an invalid optimization to apply in C. This disables the functionality in C.

This is the only code path where we set the NRVO state for a local variable in C (the other code paths are in coroutines and template instantiation, neither of which are available in C).

Fixes #100902


Full diff: https://github.com/llvm/llvm-project/pull/101038.diff

3 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+6)
  • (modified) clang/lib/Sema/SemaDecl.cpp (+3-1)
  • (added) clang/test/CodeGen/nrvo.c (+75)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index ddad083571eb1..b0a8c55835868 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -156,6 +156,12 @@ Bug Fixes to Compiler Builtins
 Bug Fixes to Attribute Support
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
+Bug Fixes to C Support
+^^^^^^^^^^^^^^^^^^^^^^
+
+- No longer applying Named Return Value Optimization (NRVO) in C as it is a
+  non-conforming optimization in C. (#GH100902)
+
 Bug Fixes to C++ Support
 ^^^^^^^^^^^^^^^^^^^^^^^^
 
diff --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp
index 694a754646f27..0697d12d2fe0d 100644
--- a/clang/lib/Sema/SemaDecl.cpp
+++ b/clang/lib/Sema/SemaDecl.cpp
@@ -2176,7 +2176,9 @@ static void CheckPoppedLabel(LabelDecl *L, Sema &S,
 }
 
 void Sema::ActOnPopScope(SourceLocation Loc, Scope *S) {
-  S->applyNRVO();
+  // NRVO is only valid in C++, not in C.
+  if (getLangOpts().CPlusPlus)
+    S->applyNRVO();
 
   if (S->decl_empty()) return;
   assert((S->getFlags() & (Scope::DeclScope | Scope::TemplateParamScope)) &&
diff --git a/clang/test/CodeGen/nrvo.c b/clang/test/CodeGen/nrvo.c
new file mode 100644
index 0000000000000..39796ce2144e0
--- /dev/null
+++ b/clang/test/CodeGen/nrvo.c
@@ -0,0 +1,75 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 -triple i386-unknown-unknown -emit-llvm -disable-llvm-passes -o - %s | FileCheck --check-prefixes=C %s
+// RUN: %clang_cc1 -triple i386-unknown-unknown -emit-llvm -disable-llvm-passes -o - -x c++ %s | FileCheck --check-prefixes=CXX %s
+
+// NRVO is not allowed in C as it is in C++, so validate that this results in
+// use of NRVO in C++, but not in C.
+typedef struct {
+    int i, j;
+    double a, b;
+} S;
+
+int result;
+
+// C-LABEL: define dso_local void @test(
+// C-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_S:%.*]]) align 4 [[AGG_RESULT:%.*]], ptr noundef [[Q:%.*]]) #[[ATTR0:[0-9]+]] {
+// C-NEXT:  [[ENTRY:.*:]]
+// C-NEXT:    [[RESULT_PTR:%.*]] = alloca ptr, align 4
+// C-NEXT:    [[Q_ADDR:%.*]] = alloca ptr, align 4
+// C-NEXT:    [[S:%.*]] = alloca [[STRUCT_S]], align 4
+// C-NEXT:    store ptr [[AGG_RESULT]], ptr [[RESULT_PTR]], align 4
+// C-NEXT:    store ptr [[Q]], ptr [[Q_ADDR]], align 4
+// C-NEXT:    call void @llvm.memset.p0.i32(ptr align 4 [[S]], i8 0, i32 24, i1 false)
+// C-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[Q_ADDR]], align 4
+// C-NEXT:    [[CMP:%.*]] = icmp eq ptr [[S]], [[TMP0]]
+// C-NEXT:    [[CONV:%.*]] = zext i1 [[CMP]] to i32
+// C-NEXT:    store i32 [[CONV]], ptr @result, align 4
+// C-NEXT:    call void @llvm.memcpy.p0.p0.i32(ptr align 4 [[AGG_RESULT]], ptr align 4 [[S]], i32 24, i1 false)
+// C-NEXT:    ret void
+//
+// CXX-LABEL: define dso_local void @_Z4testP1S(
+// CXX-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_S:%.*]]) align 4 [[AGG_RESULT:%.*]], ptr noundef [[Q:%.*]]) #[[ATTR0:[0-9]+]] {
+// CXX-NEXT:  [[ENTRY:.*:]]
+// CXX-NEXT:    [[RESULT_PTR:%.*]] = alloca ptr, align 4
+// CXX-NEXT:    [[Q_ADDR:%.*]] = alloca ptr, align 4
+// CXX-NEXT:    store ptr [[AGG_RESULT]], ptr [[RESULT_PTR]], align 4
+// CXX-NEXT:    store ptr [[Q]], ptr [[Q_ADDR]], align 4
+// CXX-NEXT:    call void @llvm.memset.p0.i32(ptr align 4 [[AGG_RESULT]], i8 0, i32 24, i1 false)
+// CXX-NEXT:    [[TMP0:%.*]] = load ptr, ptr [[Q_ADDR]], align 4
+// CXX-NEXT:    [[CMP:%.*]] = icmp eq ptr [[AGG_RESULT]], [[TMP0]]
+// CXX-NEXT:    [[CONV:%.*]] = zext i1 [[CMP]] to i32
+// CXX-NEXT:    store i32 [[CONV]], ptr @result, align 4
+// CXX-NEXT:    ret void
+//
+S test(S* q) {
+    S s = {0, 0, 0, 0};
+    result = &s == q;
+    return s;
+}
+
+// C-LABEL: define dso_local i32 @main(
+// C-SAME: ) #[[ATTR0]] {
+// C-NEXT:  [[ENTRY:.*:]]
+// C-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4
+// C-NEXT:    [[T:%.*]] = alloca [[STRUCT_S:%.*]], align 4
+// C-NEXT:    store i32 0, ptr [[RETVAL]], align 4
+// C-NEXT:    call void @test(ptr dead_on_unwind writable sret([[STRUCT_S]]) align 4 [[T]], ptr noundef [[T]])
+// C-NEXT:    [[TMP0:%.*]] = load i32, ptr @result, align 4
+// C-NEXT:    ret i32 [[TMP0]]
+//
+// CXX-LABEL: define dso_local noundef i32 @main(
+// CXX-SAME: ) #[[ATTR2:[0-9]+]] {
+// CXX-NEXT:  [[ENTRY:.*:]]
+// CXX-NEXT:    [[RETVAL:%.*]] = alloca i32, align 4
+// CXX-NEXT:    [[T:%.*]] = alloca [[STRUCT_S:%.*]], align 4
+// CXX-NEXT:    store i32 0, ptr [[RETVAL]], align 4
+// CXX-NEXT:    call void @_Z4testP1S(ptr dead_on_unwind writable sret([[STRUCT_S]]) align 4 [[T]], ptr noundef [[T]])
+// CXX-NEXT:    [[TMP0:%.*]] = load i32, ptr @result, align 4
+// CXX-NEXT:    ret i32 [[TMP0]]
+//
+int main(void)
+{
+  S t = test(&t);
+  return result;
+}
+

@AaronBallman
Copy link
Collaborator Author

One question I had is: this is an ABI breaking change for us, isn't it? Do we need/want ABI tags in that case? Should I call it out as a potentially breaking change?

@zygoloid
Copy link
Collaborator

NRVO vs not-NRVO is not an ABI difference, just a behavior difference -- it doesn't change the calling convention, and different choices here are link-compatible (in C++ we can make different choices for different emissions of the same inline function even).

This is also not a potentially breaking change in the sense that it was always an optional optimization. But it's breaking in the sense that it's an observable behavior change in our default mode.

Instead of forcing this off for C compilations, how would you feel about instead changing the default for -felide-constructors to be off for C, so people can still re-enable the optimization to avoid performance / stack usage regressions? (Weird flag name for C, I know; maybe we should also think about adding an alias such as -felide-return-copies.)

@AaronBallman
Copy link
Collaborator Author

I'll get on top of fixing the other failing test cases, but I'd appreciate confirmation that we agree with the direction this patch is heading first.

@AaronBallman
Copy link
Collaborator Author

Instead of forcing this off for C compilations, how would you feel about instead changing the default for -felide-constructors to be off for C, so people can still re-enable the optimization to avoid performance / stack usage regressions? (Weird flag name for C, I know; maybe we should also think about adding an alias such as -felide-return-copies.)

I think that's a reasonable language dialect for users, and agree that the name should change. Thanks for the suggestion!

@mizvekov
Copy link
Contributor

mizvekov commented Jul 29, 2024

It doesn't seem like elide-constructors is wired to control this behavior. Seems to be wired only to the StaticAnalyzer presently. edit: nvm, there are two flags with similar names, ElideConstructors and ShouldElideConstructors.

Regardless, the decision where to apply NRVO is currently centralized in SemaStmt.

I think "Sema::getNamedReturnInfo(const VarDecl *VD)" would be the best place to apply the effects of this flag.

@rjmccall
Copy link
Contributor

rjmccall commented Jul 29, 2024

I agree that the C standard does not appear to permit the result in Ted's example. There are three ways to fix this:

  1. Disable NRVO in the callee when aliasing is possible.
  2. Force a copy in the caller when aliasing is possible.
  3. Convince the committee that they should change the standard to permit these simple optimizations.

(1) also requires disabling copy elision in some cases:

struct S { int x, y; char make_it_huge[1024]; };
struct S foo(struct S *alias) {
  return (struct S) { .x = 5, .y = (alias->x = 7, 0) };
}
void test() {
  struct S var = foo(&var); // var.x must contain 5
}

I put "when aliasing is possible" in (1), but I can't see any reasonable way to make this decision locally in the callee because essentially any pointer could alias the return value. (1) really has to be done unconditionally.

It is tractable to implement (2) with a simple local escape analysis of the variable's initializer. This would not be tractable if the initialized variable were a global variable, but, fortunately, C does not permit global variables to be initialized with non-constant expressions.

(3) would presumably take the form of either permitting objects to overlap in certain situations, the same way that C++ permits copy elision, or just generally weakening object overlap rules during initialization.

Richard, I'm sorry to contradict you, but Aaron is correct to guess that this is ABI-affecting: ABIs should and sometimes do specify whether an indirect return address is allowed to be aliased. For example, the x86_64 psABI is quite clear:

If the type has class MEMORY, then the caller provides space for the return value and passes the address of this storage in %rdi as if it were the first argument to the function. In effect, this address becomes a “hidden” first argument. This storage must not overlap any data visible to the callee through other names than this argument.

So on x86_64, we are clearly required to at least do (2). I don't think we want different rules on different targets unless we're forced to.

If (2) is implemented correctly, I believe there's no way to observe NRVO, so we don't also need to implement (1).

@efriedma-quic
Copy link
Collaborator

We already have some code for detecting variable inits that access the variable; see isAccessedBy() in CGDecl.cpp.

Copy link
Contributor

@rjmccall rjmccall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't merge this PR unless we need an immediate fix, which seems unlikely given how longstanding this behavior is.

@AaronBallman
Copy link
Collaborator Author

We shouldn't merge this PR unless we need an immediate fix, which seems unlikely given how longstanding this behavior is.

Agreed, it's definitely not in shape for committing. As for needing an immediate fix, what surprised me was that two different people ran into this same issue within a few days of each other (#100902 and #100868)

@rjmccall
Copy link
Contributor

Note that we're forced to override that ABI rule in C++ sometimes — I believe we're now required (since C++11?) to elide a copy/move when initializing a normal object with a return value, and even if we weren't, it'd clearly be unreasonable compiler behavior to do an extra non-trivial operation here. I don't think the psABI intends that rule to have that effect anyway.

This does raise a language interoperation question, though, related to the tractability note I made about (2) above:

  • You can call a function to initialize a global variable in C++.
  • That function can be implemented in C, in which case it will expect the aliasing rule to still apply to its return value slot.
  • We cannot do any sort of local analysis on a global variable to prove that the variable is not accessed during its initializer.

C++ does have some restrictions on accessing objects that are being initialized through other names. It's possible that they're strong enough to satisfy the ABI rule here through a sort of reverse of the normal analysis: basically, any program that would violate the ABI rule is actually incorrect because of the semantic restriction. If not, I think we may need to force copies of trivial return types in the caller in C++. We can consider that separately, though.

@AaronBallman
Copy link
Collaborator Author

I'm a bit less clear on what to do now. It sounds like nobody is opposed to a language dialect flag (default off) so users can retain the existing behavior. But it also sounds like perhaps the correct approach to this is to modify codegen and not the NRVO markings when building the AST. (Then again, wouldn't we want to drop that bit from the AST in C anyway because we wouldn't want AST consumers to presume that NRVO is valid, such as one-off AST matchers for use in clang-tidy, etc?)

@mizvekov
Copy link
Contributor

I agree it's strange choice the effects of elide-constructor are applied on the CodeGen side, by ignoring the AST, instead of on the AST directly.

@rjmccall
Copy link
Contributor

rjmccall commented Jul 29, 2024

My argument above is that the x86-64 psABI requires us to not pass &t as the indirect return address. Following that rule fixes the problem here even if we keep doing NRVO, so we can completely drop any changes to NRVO and focus on implementing the psABI rule.

I think the approximate shape of the correct fix here is to:
(1) Use the function Eli pointed out to check whether the initializer refers to the variable being initialized.
(2) Propagate that information down in CodeGen when we're emitting into an address.
(3) When emitting a call into an address, if the return value is trivially copyable and the initialized address is potentially aliased, emit into a temporary and then copy.

@mizvekov
Copy link
Contributor

mizvekov commented Jul 29, 2024

From my testing in compiler explorer, in C++ mode GCC and Clang always produce programs that return 1 as far back as we can test.
ICC and MSVC do otherwise, they return 0.

So I think this supports the notion that this should be a dialect, and we have to wire defaults differently in MSVC mode as well.

@zygoloid
Copy link
Collaborator

Richard, I'm sorry to contradict you, but Aaron is correct to guess that this is ABI-affecting: ABIs should and sometimes do specify whether an indirect return address is allowed to be aliased.

My claim was narrowly that the choice to perform or not perform NRVO for an indirect return is not an ABI difference, which I'm not yet sure this contradicts. But you make a good point -- we shouldn't be turning off NRVO if it's only causing issues because we're also not correctly implementing an ABI rule on the caller side, and should instead fix Clang to obey the ABI rule.

Do we support ABIs that don't have a rule like x86_64's?

@efriedma-quic
Copy link
Collaborator

So I think this supports the notion that this should be a dialect, and we have to wire defaults differently in MSVC mode as well.

I doubt anyone is intentionally depending on equality here.

C++ does have some restrictions on accessing objects that are being initialized through other names. It's possible that they're strong enough to satisfy the ABI rule here through a sort of reverse of the normal analysis: basically, any program that would violate the ABI rule is actually incorrect because of the semantic restriction. If not, I think we may need to force copies of trivial return types in the caller in C++. We can consider that separately, though.

I suspect it isn't legal to actually access the object through another pointer while it's being constructed, but that doesn't help if the user is just doing a pointer equality comparison. And C++17 rules forbid a copy on the caller side: "guaranteed copy elision" means semantically, there is no copy.

@mizvekov
Copy link
Contributor

I doubt anyone is intentionally depending on equality here.

The difference wrt the gnu::cleanup behavior looks plausible: #100868

@rjmccall
Copy link
Contributor

rjmccall commented Jul 29, 2024

My claim was narrowly that the choice to perform or not perform NRVO for an indirect return is not an ABI difference, which I'm not yet sure this contradicts.

I accept your pedantry. 🫡

Do we support ABIs that don't have a rule like x86_64's?

The ARM and ARM64 AAPCSs specify that "The memory to be used for the result may be modified at any point during the function call" and "The callee may modify the result memory block at any point during the execution of the subroutine", respectively. I would interpret both to have a similar overall effect as the x86-64 psABI:

  • the function is not required to only assign to the memory immediately before returning
  • therefore doing earlier assignments must not have a semantic impact
  • therefore the caller must not pass memory that is legal to otherwise access during the call

I checked a handful of other major ABIs (i386, MIPS, and PPC), and they don't say anything directly about it. Typically, the absence of a statement must be interpreted as a lack of a guarantee. In this case, however, I would argue it is more plausible to infer that the ABI authors did not consider the possibility of passing a pointer to aliased memory for the return value, and therefore we should interpret their silence as an implicit statement that compilers are not permitted to do that.

There are plausible arguments for the reverse ABI rule from x86-64 here. For example, since trivial types don't distinguish initialization and assignment, an ABI rule that required indirect return values to be initialized as the final step in the callee would permit simple assignments in the caller (e.g. x = foo();) to be implemented by passing &x as the indirect return pointer. But a compiler couldn't even think about doing that without a strong, explicit guarantee in the ABI that functions can't interleave the initialization of the return value with arbitrary expression evaluation. (It's also a trade-off, because it forces the compiler to emit complex return value expressions (like a compound literal) into temporaries that it will copy at the end of the function.) So there's really no reason for an ABI to intentionally try to split the difference here — it's just putting itself into the worst of all worlds.

@rjmccall
Copy link
Contributor

rjmccall commented Jul 29, 2024

C++ does have some restrictions on accessing objects that are being initialized through other names. It's possible that they're strong enough to satisfy the ABI rule here through a sort of reverse of the normal analysis: basically, any program that would violate the ABI rule is actually incorrect because of the semantic restriction. If not, I think we may need to force copies of trivial return types in the caller in C++. We can consider that separately, though.

I suspect it isn't legal to actually access the object through another pointer while it's being constructed, but that doesn't help if the user is just doing a pointer equality comparison.

Ah, good point.

And C++17 rules forbid a copy on the caller side: "guaranteed copy elision" means semantically, there is no copy.

An extra copy is permitted for trivially-copyable types; otherwise we'd be required to pass and return all classes indirectly. So we're not forbidden from adding an extra copy here if we decide it's necessary.

@efriedma-quic
Copy link
Collaborator

And C++17 rules forbid a copy on the caller side: "guaranteed copy elision" means semantically, there is no copy.

An extra copy is permitted for trivially-copyable types; otherwise we'd be required to pass and return all classes indirectly. So we're not forbidden from adding an extra copy here if we decide it's necessary.

Oh, right.

@zygoloid
Copy link
Collaborator

Typically, the absence of a statement must be interpreted as a lack of a guarantee. In this case, however, I would argue it is more plausible to infer that the ABI authors did not consider the possibility of passing a pointer to aliased memory for the return value, and therefore we should interpret their silence as an implicit statement that compilers are not permitted to do that.

That seems like a reasonable argument. However, in my testing, a lot of C compilers (including GCC, Clang, ICC) pass a pointer to aliased memory for cases like struct X x = f(&x); in practice, and Clang seems to be the only one to do NRVO. A non-aliasing return slot plus NRVO seems to be ABI-conforming (at least for the ABIs that specify a non-aliasing return slot) and probably the best choice for performance, but it also seems to be the choice that is maximally incompatible with everyone else in practice :)

Should we try to convince at least the GCC folks that they're getting this wrong too?

(1) also requires disabling copy elision in some cases:

struct S { int x, y; char make_it_huge[1024]; };
struct S foo(struct S *alias) {
  return (struct S) { .x = 5, .y = (alias->x = 7, 0) };
}
void test() {
  struct S var = foo(&var); // var.x must contain 5
}

I put "when aliasing is possible" in (1), but I can't see any reasonable way to make this decision locally in the callee because essentially any pointer could alias the return value. (1) really has to be done unconditionally.

For what it's worth, (1) appears to be what GCC and ICC do, unconditionally. (And MSVC appears to make an unconditional copy in both the caller and the callee.)

@rjmccall
Copy link
Contributor

Should we try to convince at least the GCC folks that they're getting this wrong too?

It does feel like a wider conversation is necessary, yeah. The fact that everybody is doing this even on x86_64 where it's pretty incontrovertibly forbidden seems significant.

@fuhsnn
Copy link

fuhsnn commented Jul 30, 2024

Hi, I opened #100868. As a user without knowledge of LLVM internals, I think a finer-grained option like -felide-constructors=unaliased, which only apply RVO on unaliased objects, would be useful for both C++ and C. Maybe as default in C to prevent #100902, and unaliased cases that are under C's as-if rule will not suffer regression.

@zygoloid
Copy link
Collaborator

zygoloid commented Jul 30, 2024

I think __attribute__((cleanup)) on a variable should disable NRVO on that variable, in both C and in C++. GCC's behavior is to ignore the cleanup attribute entirely if NRVO is applied to the variable, which seems like a flawed approach, but performing NRVO and doing something different than GCC also seems problematic, as demonstrated in #100868. Given that, I think that #100868 and #100902 should probably be handled independently.

@AaronBallman
Copy link
Collaborator Author

Should we try to convince at least the GCC folks that they're getting this wrong too?

It does feel like a wider conversation is necessary, yeah. The fact that everybody is doing this even on x86_64 where it's pretty incontrovertibly forbidden seems significant.

CC @pinskia for awareness

@AaronBallman
Copy link
Collaborator Author

Given that, I think that #100868 and #100902 should probably be handled independently.

I've reopened #100868 and removed the duplicate label for it.

@pinskia
Copy link

pinskia commented Aug 1, 2024

Note I think the C++ issue with NRVO is recorded as https://cplusplus.github.io/CWG/issues/2868.html .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[clang] NRVO used in C mode
8 participants