-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[libcxx] Allow string to use SSO in constant evaluation. #66576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Previously, libcxx forced all strings created during constant evaluation to point to allocated memory. That was apparently done due to implementation difficultites, but it turns out not to be necessary: this patch shows that it is feasible to use SSO strings during constant evaluation. It also simplifies the implementation. However, I'm not convinced that this is a good idea. It is currently an error in C++ for a pointer created during constant-evaluation-time to attempt to escape into the binary and become runtime-allocated memory. Thus, the existing string implementation has the property that NO constant-evaluation-created string object can escape to runtime. It is always an error. On the other hand, once we permit SSO strings at constant-evaluation-time, then "short enough" strings will be permitted to escape to runtime, while longer strings will produce an error. Thus, whether code will successfully compile now depends on whether the text is smaller than the SSO capacity. Given that the maximum SSO string length is an unspecified internal implementation detail which differs between implementations or platforms, having it become an important part of the API seems unfortunate. It is a new way to inadvertently write non-portable code, and to write difficult-to-modify code. If you depend on constexpr string initialization for a string that initially fits, it may be tricky to later modify your code to no longer depend on constexpr string initialization when it turns out you need a slightly longer string, or to port the code to a different implementation. That said, the other implementations already permit this, and don't seem to have any inclination to change. So, perhaps it's best to just follow suit. Currently, libstdc++ and MSVC allow constant-initialized strings to escape to runtime if they're 15 bytes or less (excluding the trailing NUL) -- except that libstdc++ does allow it for function-locals, only globals. With this patch, libc++ will permit such strings up to 22 bytes on 64-bit platforms, and up to 10 bytes on 32-bit platforms.
@llvm/pr-subscribers-libcxx ChangesPreviously, libcxx forced all strings created during constant evaluation to point to allocated memory. That was apparently done due to implementation difficultites, but it turns out not to be necessary: this patch shows that it is feasible to use SSO strings during constant evaluation. It also simplifies the implementation. However, I'm not convinced that this is a good idea. It is currently an error in C++ for a pointer created during constant-evaluation-time to attempt to escape into the binary and become runtime-allocated memory. Thus, the existing string implementation has the property that NO constant-evaluation-created string object can escape to runtime. It is always an error. On the other hand, once we permit SSO strings at constant-evaluation-time, then "short enough" strings will be permitted to escape to runtime, while longer strings will produce an error. Thus, whether code will successfully compile now depends on whether the text is smaller than the SSO capacity. Given that the maximum SSO string length is an unspecified internal implementation detail which differs between implementations or platforms, having it become an important part of the API seems unfortunate. It is a new way to inadvertently write non-portable code, and to write difficult-to-modify code. If you depend on constexpr string initialization for a string that initially fits, it may be tricky to later modify your code to no longer depend on constexpr string initialization when it turns out you need a slightly longer string, or to port the code to a different implementation. That said, the other implementations already permit this, and don't seem to have any inclination to change. So, perhaps it's best to just follow suit. Currently, libstdc++ and MSVC allow constant-initialized strings to escape to runtime if they're 15 bytes or less (excluding the trailing NUL) -- except that libstdc++ does allow it for function-locals, only globals. With this patch, libc++ will permit such strings up to 22 bytes on 64-bit platforms, and up to 10 bytes on 32-bit platforms.Full diff: https://github.com/llvm/llvm-project/pull/66576.diff 1 Files Affected:
diff --git a/libcxx/include/string b/libcxx/include/string
index 4b96273698166dd..349c892b9243ddb 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -830,8 +830,8 @@ private:
{
union
{
- __long __l;
__short __s;
+ __long __l;
__raw __r;
};
};
@@ -1729,8 +1729,10 @@ private:
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
bool __is_long() const _NOEXCEPT {
- if (__libcpp_is_constant_evaluated())
- return true;
+ if (__libcpp_is_constant_evaluated()) {
+ if (__builtin_constant_p(__r_.first().__l.__is_long_))
+ return __r_.first().__l.__is_long_;
+ }
return __r_.first().__s.__is_long_;
}
@@ -1748,24 +1750,11 @@ private:
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __default_init() {
__r_.first() = __rep();
- if (__libcpp_is_constant_evaluated()) {
- size_type __sz = __recommend(0) + 1;
- pointer __ptr = __alloc_traits::allocate(__alloc(), __sz);
- __begin_lifetime(__ptr, __sz);
- __set_long_pointer(__ptr);
- __set_long_cap(__sz);
- __set_long_size(0);
- }
- }
-
- _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __deallocate_constexpr() {
- if (__libcpp_is_constant_evaluated() && __get_pointer() != nullptr)
- __alloc_traits::deallocate(__alloc(), __get_pointer(), __get_long_cap());
}
_LIBCPP_CONSTEXPR _LIBCPP_HIDE_FROM_ABI static bool __fits_in_sso(size_type __sz) {
// SSO is disabled during constant evaluation because `__is_long` isn't constexpr friendly
- return !__libcpp_is_constant_evaluated() && (__sz < __min_cap);
+ return (__sz < __min_cap);
}
template <class _Iterator, class _Sentinel>
@@ -1877,10 +1866,7 @@ private:
size_type __recommend(size_type __s) _NOEXCEPT
{
if (__s < __min_cap) {
- if (__libcpp_is_constant_evaluated())
- return static_cast<size_type>(__min_cap);
- else
- return static_cast<size_type>(__min_cap) - 1;
+ return static_cast<size_type>(__min_cap) - 1;
}
size_type __guess = __align_it<sizeof(value_type) < __alignment ?
__alignment/sizeof(value_type) : 1 > (__s+1) - 1;
@@ -1969,7 +1955,8 @@ private:
allocator_type __a = __str.__alloc();
auto __allocation = std::__allocate_at_least(__a, __str.__get_long_cap());
__begin_lifetime(__allocation.ptr, __allocation.count);
- __alloc_traits::deallocate(__alloc(), __get_long_pointer(), __get_long_cap());
+ if (__is_long())
+ __alloc_traits::deallocate(__alloc(), __get_long_pointer(), __get_long_cap());
__alloc() = std::move(__a);
__set_long_pointer(__allocation.ptr);
__set_long_cap(__allocation.count);
@@ -2020,7 +2007,7 @@ private:
_LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string& __assign_external(const value_type* __s, size_type __n);
// Assigns the value in __s, guaranteed to be __n < __min_cap in length.
- inline basic_string& __assign_short(const value_type* __s, size_type __n) {
+ inline _LIBCPP_CONSTEXPR_SINCE_CXX20 basic_string& __assign_short(const value_type* __s, size_type __n) {
pointer __p = __is_long()
? (__set_long_size(__n), __get_long_pointer())
: (__set_short_size(__n), __get_short_pointer());
@@ -2334,7 +2321,7 @@ basic_string<_CharT, _Traits, _Allocator>::__grow_by_and_replace
if (__sec_cp_sz != 0)
traits_type::copy(std::__to_address(__p) + __n_copy + __n_add,
std::__to_address(__old_p) + __n_copy + __n_del, __sec_cp_sz);
- if (__old_cap+1 != __min_cap || __libcpp_is_constant_evaluated())
+ if (__old_cap+1 != __min_cap)
__alloc_traits::deallocate(__alloc(), __old_p, __old_cap+1);
__set_long_pointer(__p);
__set_long_cap(__allocation.count);
@@ -2374,7 +2361,7 @@ basic_string<_CharT, _Traits, _Allocator>::__grow_by(size_type __old_cap, size_t
traits_type::copy(std::__to_address(__p) + __n_copy + __n_add,
std::__to_address(__old_p) + __n_copy + __n_del,
__sec_cp_sz);
- if (__libcpp_is_constant_evaluated() || __old_cap + 1 != __min_cap)
+ if (__old_cap + 1 != __min_cap)
__alloc_traits::deallocate(__alloc(), __old_p, __old_cap + 1);
__set_long_pointer(__p);
__set_long_cap(__allocation.count);
@@ -2537,12 +2524,8 @@ basic_string<_CharT, _Traits, _Allocator>::__move_assign(basic_string& __str, tr
}
__move_assign_alloc(__str);
__r_.first() = __str.__r_.first();
- if (__libcpp_is_constant_evaluated()) {
- __str.__default_init();
- } else {
- __str.__set_short_size(0);
- traits_type::assign(__str.__get_short_pointer()[0], value_type());
- }
+ __str.__set_short_size(0);
+ traits_type::assign(__str.__get_short_pointer()[0], value_type());
}
#endif
@@ -2828,13 +2811,6 @@ basic_string<_CharT, _Traits, _Allocator>::insert(size_type __pos, const value_t
if (__pos > __sz)
__throw_out_of_range();
size_type __cap = capacity();
- if (__libcpp_is_constant_evaluated()) {
- if (__cap - __sz >= __n)
- __grow_by_and_replace(__cap, 0, __sz, __pos, 0, __n, __s);
- else
- __grow_by_and_replace(__cap, __sz + __n - __cap, __sz, __pos, 0, __n, __s);
- return *this;
- }
if (__cap - __sz >= __n)
{
if (__n)
@@ -2843,7 +2819,7 @@ basic_string<_CharT, _Traits, _Allocator>::insert(size_type __pos, const value_t
size_type __n_move = __sz - __pos;
if (__n_move != 0)
{
- if (__p + __pos <= __s && __s < __p + __sz)
+ if (std::__is_pointer_in_range(__p + __pos, __p + __sz, __s))
__s += __n;
traits_type::move(__p + __pos + __n, __p + __pos, __n_move);
}
|
Is there a test to demonstrate the new behavior? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I really like this patch. I understand that this could be a portability problem, but given that libstdc++ and the MSVC STL already allow this, I don't think that's something we introduce. Maybe we could add a warning for this case. I think that would address any portability concerns. @AaronBallman @cor3ntin @erichkeane do you have any idea if that would be feasible?
I have not written one at the moment, no. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks quite good. I'd like a libc++-specific test for something simple like constinit std::string my_str = "";
and a short paragraph in libcxx/docs/UsingLibcxx.rst
about this. Maybe something at the bottom about unspecified behaviour which we define.
I'm not quite sure the question here, |
That's a great idea!
We want a warning if an object of a particular type (std::basic_string) is initialized at constant-evaluation-time, and then the resulting object escapes and is embedded into the resulting binary for use at runtime. The warning is justified for std::basic_string, because escaping the object to runtime is only possible (at least so far), when it doesn't point to any external memory allocations (and also it must not point to itself, when it's a local constinit variable) -- which are all internal implementation details. Whether a string meets these constraints is thus implementation-defined, and in practice varies based on platform and standard-lib implementation, and of course length of the string. We could make a generic type attribute to diagnose this, or perhaps just hard-code it for std::basic_string. |
This looks like a good motivation for both Exposing the diagnostic engine to C++ (LLVM) and P2758 Emitting messages at compile time. And of course, folks are still working on allocation propagation which would make the need for a warning here moot. I'm not sure how easy it would be for us to track the uses of basic strings in context that are potentially evaluated and not constant evaluated. For example what about constexpr functions that themselves are only used at compile time? |
I'm not sure how that would help. The problem here is that we have to detect whether the string is promoted to static storage, which AFAIK isn't possible to detect in this case. |
I think the warning should be triggered only if
For a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine but I agree we should have a compat warning for this and I would like to see more testing as well.
libcxx/test/libcxx/strings/basic.string/string.cons/constinit_sso_string.compile.pass.cpp
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % nit.
libcxx/test/libcxx/strings/basic.string/string.cons/constinit_sso_string.compile.pass.cpp
Outdated
Show resolved
Hide resolved
I just want to slide in to say that I really like this patch from the point of view of I agree it could cause a portability trap to enable this, but I think the right way to go is to have a nice Clang diagnostic that says "oops, strings are can't portably be initialized at compile-time" -- not to make our Thanks for working on that @jyknight! |
@jyknight Is there anything you need in order to make progress on this? If so, please let us know and we'll try to help. I think this patch is pretty much ready to go based on @philnik777 's comment above. |
I don't think I will be able to work on that the discussed Clang diagnostics anytime soon. Unless someone else is volunteering to implement it, that may not be implemented at all. So, I'd like to hear explicit consensus is that folks are generally OK to submit this without such a diagnostic. |
Oh, I might have misread or misunderstood the discussion, but I didn't have in mind that we would block this patch from making progress on having a Clang diagnostic. Did other people in this conversation have that expectation? Edit: In other words, I think we should ship this and then adding a warning is a nice-to-have followup to reduce the likelihood of people falling into a portability trap. |
I didn't consider that blocking either. IMO that's actually out-of-scope for this patch. |
I think that's a reasonable path forward. |
Looks like the debug/hardened/safe mode tests are failing libcxx/std/utilities/template.bitset/bitset.members/to_string.pass.cpp because the constant evaluations for
now both hit the constant evaluation step limit. With the previous code, it would work until length 80 (after creating a get_test_cases<80> function with similar patterns), but with the new impl, 63 is the max in those modes. I'm not sure whether we should care? I've fixed the test by splitting up the test function. Sample failure from https://buildkite.com/llvm-project/libcxx-ci/builds/30568:
|
@jyknight I don't think we have to worry much about that. The step count is quite arbitrary (maybe something like AST nodes?) and doesn't seem to have anything to do with the actual cost or compile time of the program. e.g. a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for working on this @jyknight!
I'm merging now since it looks like everyone's happy with the PR and there are no outstanding comments. |
llvm/llvm-project#66576 makes std::string's ctor constexpr if it's short enough to fit in the short-string optimization. This allows the compiler to realize that these are unused. Deleting them is necessary to be able to update libc++. (Relying on std::string's ctor to be constexpr for storage isn't allowed per standard, see discussion on the pull request above.) Bug: none Change-Id: I860e1c67a7c15342901814dafbab262672c0d45a Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4935335 Reviewed-by: Hans Wennborg <[email protected]> Auto-Submit: Nico Weber <[email protected]> Commit-Queue: Nico Weber <[email protected]> Owners-Override: Nico Weber <[email protected]> Cr-Commit-Position: refs/heads/main@{#1208958}
llvm/llvm-project#66576 makes std::string's ctor constexpr if it's short enough to fit in the short-string optimization. This allows the compiler to realize that these are unused. Deleting them is necessary to be able to update libc++. (Relying on std::string's ctor to be constexpr for storage isn't allowed per standard, see discussion on the pull request above.) Bug: none Change-Id: I860e1c67a7c15342901814dafbab262672c0d45a Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4935335 Reviewed-by: Hans Wennborg <[email protected]> Auto-Submit: Nico Weber <[email protected]> Commit-Queue: Nico Weber <[email protected]> Owners-Override: Nico Weber <[email protected]> Cr-Commit-Position: refs/heads/main@{#1208958} NOKEYCHECK=True GitOrigin-RevId: 879cb86187c04e9e941c03174765e5030a5a9cba
llvm/llvm-project#66576 makes std::string's ctor constexpr if it's short enough to fit in the short-string optimization. This allows the compiler to realize that these are unused. Deleting them is necessary to be able to update libc++. (Relying on std::string's ctor to be constexpr for storage isn't allowed per standard, see discussion on the pull request above.) Bug: none Change-Id: I860e1c67a7c15342901814dafbab262672c0d45a Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4935335 Reviewed-by: Hans Wennborg <[email protected]> Auto-Submit: Nico Weber <[email protected]> Commit-Queue: Nico Weber <[email protected]> Owners-Override: Nico Weber <[email protected]> Cr-Commit-Position: refs/heads/main@{#1208958} CrOS-Libchrome-Original-Commit: 879cb86187c04e9e941c03174765e5030a5a9cba
Previously, libcxx forced all strings created during constant evaluation to point to allocated memory. That was done due to implementation difficultites, but it turns out not to be necessary. This patch permits the use of SSO strings during constant evaluation, and also simplifies the implementation.
This does have a downside in terms of enabling users to accidentally write non-portable code, however, which I've documented in UsingLibcxx.rst.
In particular, whether
constinit std::string x = "...";
will successfully compile now depends on whether the string is smaller than the SSO capacity -- in libc++, up to 22 bytes on 64-bit platforms, and up to 10 bytes on 32-bit platforms. By comparison, libstdc++ and MSVC have an SSO capacity of 15 bytes, except that in libstdc++, constant-initialized strings cannot be used as function-locals because the object contains a pointer to itself.