Skip to content

Expose more string APIs #15075

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 14 commits into
base: master
Choose a base branch
from
Draft

Conversation

morrisonlevi
Copy link
Contributor

@morrisonlevi morrisonlevi commented Jul 23, 2024

The main motivation here is to increase the amount of ZEND_API functions for working with zend_string *. Rust extensions cannot easily access these static zend_always_inline functions. So this makes them available while also still forcing them to be inline in C.

The secondary motivation is to shrink the size of the PHP binaries. It turns out that because many of these functions are forced inline, they actually bloat the binary. Consider code with a structure like this:

    if (UNEXPECTED(...)) {
        ...
        zend_string_release(...);
        return FAILURE;
    }

In cases like this, you do not actually want to inline zend_string_release. If you replace static zend_always_inline with inline and turn on -Werror=inline, GCC will tell you this with an error message like:

error: inlining failed in call to 'zend_string_release': call is unlikely and code size would grow [-Werror=inline]

So, for some functions we want an inline version and a non-inline version. Sometimes we name a thing x and the inline version of it x_inline. However, these are widely used APIs. So I left the inline version alone and added _outline for the non-inline version. I'm open to a better naming scheme.

Lastly, -Werror=inline has a lot of failures, as you might guess. This PR addresses a number of failures that fall into these classes that have nothing to do with strings, but cutting down the number of error messages helped me iterate over the patch:

  1. The function uses setjmp, so it cannot be inline. The inline gets removed.
  2. The function is never inlined. For instance, if it has 3 usage sites, and the compiler fails to inline all 3, then I remove inline from this function.

For these cases, the compiler was already not inlining the function. There shouldn't be any ill effect from this group of changes.


So what does the size look like after these changes? This is on Linux with GCC 14.1 in a docker container on an Apple M1 (aarch64) and configured with CFLAGS="-O2 -g1 -fno-omit-frame-pointer".

# master
# ls -l sapi/cli/php
-rwxr-xr-x 1 root root 26496064 Jul 29 17:02 sapi/cli/php

# this PR
# ls -l sapi/cli/php
-rwxr-xr-x 1 root root 26483576 Jul 29 19:46 sapi/cli/php

It saves 12,488 bytes, despite adding 25 ZEND_API functions.

Performance should not be affected, but there may be a slight improvement (some unexpected code paths become smaller, which can have cache effects). If there is a regression, it would mean that an UNEXPECTED (whether implicit or explicit) code path is actually taken quite often in the benchmark, and I'd like to know more information.

@javiereguiluz
Copy link
Contributor

Nice contribution! Thanks.

About suggestions related to the naming scheme, I don't like outline because that word is not the opposite of inline, so it might be confusing. If you are open to slightly larger variants:

  • *_no_inline
  • *_noinline
  • *_never_inline
  • *_noninline
  • *_non_inline
  • *_non_inlined
  • *_not_inline
  • *_not_inlined

This is what other projects do:

@iluuu1994
Copy link
Member

iluuu1994 commented Jul 23, 2024

I agree with @javiereguiluz about the word outline. I did not understand its meaning until seeing that this is related to avoiding inlining. I do wonder if it might not be better to just stop forcing inlining of these functions and instead use inline. That would force inlining most of the time, while allowing the compiler to allow not doing so in .cold sections and unlikely branches.

That said, I do not know the exact heuristics used in compilers to determine if some function is inlined, and they are likely complex.

@morrisonlevi
Copy link
Contributor Author

morrisonlevi commented Jul 23, 2024

I do wonder if it might not be better to just stop forcing inlining of these functions and instead use inline. That would force inlining most of the time, while allowing the compiler to allow not doing so in .cold sections and unlikely branches.

I would be in favor of that personally for the cases where I added _outline variants. The PR would probably need a more rigorous performance evaluation, though. There were a few cases which I did not use the outline version because GCC believed it was unlikely, but based on my reading of the code, I was not so sure. Removing the always inline would mean these would then not get inlined.

One bothersome bit about that though is that -Winline is going to whine about those cases where it doesn't inline. With how the PR is written right now, it does not whine. Sadly, you can't just do a pragma diagnostic push -Winline or whatever around the inline function--the warning is generated from the usage site, not the declaration/definition.

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with removal of the inline qualifier in ext/spl and ext/session if you want to merge those changes already.

@morrisonlevi
Copy link
Contributor Author

I've expanded the number of ZEND_API functions to 25 and renamed *_outline to *_noinline. I've updated the size numbers in the PR description (still smaller, just barely less so because I added more functions).

@morrisonlevi
Copy link
Contributor Author

morrisonlevi commented Jul 30, 2024

Could someone confirm that the symbols show up as expected on Windows? I don't know how to check this for that platform. If anyone knows if we check for existence of certain symbols in CI pipelines, I could add something there.

@iluuu1994
Copy link
Member

To entertain the idea of extern inline: It seems like neither GCC nor Clang handle this optimally. I wonder if it's worth reporting.

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define IS_STR_INTERNED (1 << 0)
#define IS_STR_PERSISTENT (1 << 1)
#define EXPECTED(condition) __builtin_expect(!!(condition), 1)
#define UNEXPECTED(condition) __builtin_expect(!!(condition), 0)
#define pefree(ptr, persistent) ((persistent)?free(ptr):efree(ptr))

extern void efree(void *);

typedef struct {
    int32_t rc;
    int32_t flags;
    size_t len;
    char val[1];
} zend_string;

extern inline void zend_string_release(zend_string *s) {
    if (!(s->flags & IS_STR_INTERNED) && --s->rc == 0) {
        pefree(s, s->flags & IS_STR_PERSISTENT);
    }
}

 void test1(zend_string *s) {
    zend_string_release(s);
}

__attribute__((cold)) void test2(zend_string *s) {
    zend_string_release(s);
}

void test3(zend_string *s, bool cond) {
    if (UNEXPECTED(cond)) {
        zend_string_release(s);
    }
}

void test4(zend_string *s, bool cond) {
    if (EXPECTED(cond)) {
        zend_string_release(s);
    }
}

@morrisonlevi
Copy link
Contributor Author

morrisonlevi commented Jul 30, 2024

I would have hoped for this matrix:

  • test1: Inlined
  • test2: Not inlined
  • test3: Not inlined
  • test4: Inlined

Because it's consistent with expected/unexpected code paths. It's interesting that GCC and Clang aren't consistent with each other.

But this further solidifies the value in having both an inlined func and one that isn't, at least for some cases.

@iluuu1994
Copy link
Member

I would have hoped for this matrix:

Agreed.

But this further solidifies the value in having both an inlined func and one that isn't, at least for some cases.

It might be worth discussing with the GCC/Clang folks first, before committing to a big amount of code churn. If GCC doesn't want to adjust it behavior, I won't object to this change. The Clang behavior is less problematic. Inlining within cold functions is less problematic because they likely live in .cold sections. However, unlikely branches only get moved to .cold sections after some size threshold, so they can increase instruction cache pressure.

@iluuu1994
Copy link
Member

To entertain the extern inline idea, I created a PoC:

diff --git a/Zend/zend_string.c b/Zend/zend_string.c
index 7a00171d2c..d753582614 100644
--- a/Zend/zend_string.c
+++ b/Zend/zend_string.c
@@ -528,3 +528,7 @@ size_t strlcat (char *__restrict dest, const char *restrict src, size_t n)
 	return result;
 }
 #endif
+
+extern inline void zend_string_release(zend_string *s);
+extern inline uint32_t zval_gc_flags(uint32_t gc_type_info);
+extern inline uint32_t zend_gc_delref(zend_refcounted_h *p);
diff --git a/Zend/zend_string.h b/Zend/zend_string.h
index b62875a6ec..26a834692a 100644
--- a/Zend/zend_string.h
+++ b/Zend/zend_string.h
@@ -341,7 +341,7 @@ static zend_always_inline void zend_string_efree(zend_string *s)
 	efree(s);
 }
 
-static zend_always_inline void zend_string_release(zend_string *s)
+ZEND_API inline void zend_string_release(zend_string *s)
 {
 	if (!ZSTR_IS_INTERNED(s)) {
 		if (GC_DELREF(s) == 0) {
diff --git a/Zend/zend_types.h b/Zend/zend_types.h
index d1db25a8f2..85afd4316b 100644
--- a/Zend/zend_types.h
+++ b/Zend/zend_types.h
@@ -742,7 +742,7 @@ static zend_always_inline uint8_t zval_gc_type(uint32_t gc_type_info) {
 	return (gc_type_info & GC_TYPE_MASK);
 }
 
-static zend_always_inline uint32_t zval_gc_flags(uint32_t gc_type_info) {
+ZEND_API inline uint32_t zval_gc_flags(uint32_t gc_type_info) {
 	return (gc_type_info >> GC_FLAGS_SHIFT) & (GC_FLAGS_MASK >> GC_FLAGS_SHIFT);
 }
 
@@ -1339,7 +1339,7 @@ static zend_always_inline void zend_gc_try_delref(zend_refcounted_h *p) {
 	}
 }
 
-static zend_always_inline uint32_t zend_gc_delref(zend_refcounted_h *p) {
+ZEND_API inline uint32_t zend_gc_delref(zend_refcounted_h *p) {
 	ZEND_ASSERT(p->refcount > 0);
 	ZEND_RC_MOD_CHECK(p);
 	return --(p->refcount);

However, it barely showed any savings (64 bytes). I didn't benchmark it, but since the size didn't change much I wouldn't expect there to be a significant change. I created a gcc report here that suggests being more conservative about inlining in unlikely branches. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116191

@morrisonlevi
Copy link
Contributor Author

Are we okay with me separating this into two PRs? One of them would just make a bunch of functions ZEND_API, which will increase size, and isn't blocked on the GCC discussion. And the second PR would reduce the size back down whenever the GCC discussion is done and we can decide what approach to take (hope for GCC fix in the future, or do things manually, or some combo).

I want to have the APIs available to Rust extensions in PHP 8.4. The shrinking stuff is more of "well I'm making binaries bigger, and if there are complaints, it's probably going to be that, so let's be proactive in reducing it down."

@iluuu1994
Copy link
Member

iluuu1994 commented Aug 5, 2024

@morrisonlevi Would my patch here also work? It guarantees that the symbol with the given name is available, but it just also allows other compilation units to inline if they wish. It seems like you should be able to rely from your Rust source then that zend_string_release() is available. That would avoid the name split. For C++, it might be a different story (extern inline does not exist in C++).

@morrisonlevi
Copy link
Contributor Author

morrisonlevi commented Aug 7, 2024

The patch works but I'd rather not touch it without benchmarks. As I mentioned before, there were cases where the compiler thought it would be beneficial to not inline but it looked to me to be on a not-cold path, so I didn't switch those to use the noinline variants.

That doesn't mean it's not okay, we could perhaps make those choices depending on what comes up in the GCC discussion. For now I'm just trying to decouple such tinkerings so the extern APIs can be available in PHP 8.4.

I was looking around and it seems there are other options, like writing this in a header:

ZEND_API void zend_string_release(zend_string *s);
ZEND_API inline void zend_string_release(zend_string *s) { /* ... */ }

Not quite sure how this stacks up in C and C++ yet. Preliminary investigation says that in C with clang it gets inlined but there is still a function emitted even without extern-inline. Not sure what other side effects it might have. Edit: it seems to emit it in every translation unit. Not what we want, though the linkers I tried didn't care.

@iluuu1994
Copy link
Member

I was looking around and it seems there are other options, like writing this in a header:

ZEND_API void zend_string_release(zend_string *s);
ZEND_API inline void zend_string_release(zend_string *s) { /* ... */ }

I don't think that's enough. You need extern inline for it to get generated and exported with the given symbol name. The extern inline needs to go in a specific .c file, otherwise the compiler can't know which compilation unit should declare it.

The patch works but I'd rather not touch it without benchmarks.

That's fair. If this is simply about making sure zend_string_release is available in some compilation unit, you could still keep the always_inline in the .h file, which will always inline it but then also declare it in the compilation unit.

https://godbolt.org/z/GvqzshMfc

Anyway, I don't object to your approach either. But once we introduce the _noinline variants there's no going back, so let's discuss it first. I'd be interested in the opinion of some other people. /cc @arnaud-lb @nielsdos

@nielsdos
Copy link
Member

nielsdos commented Aug 8, 2024

About the name "outline": at least this term was familiar to me because I saw it in the compilers course I took as the opposite of inlining. AFAIK this is an existing term. Anyway, I don't care too much about the name, just pointing out that it's not that strange. "noinline" is fine by me.

I'm not sure what approach is best honestly, I've seen the _noinline approach in other projects, but indeed once we add that we can't really go back. I don't object to either one though.

@arnaud-lb
Copy link
Member

I like the extern inline approach as it solves the problem of exposing inline functions with less verbosity. As it appears to be useable with always_inline, there should be no impact on php-src. However, there may be pitfalls I'm not aware of.

I don't object to either approach.

@morrisonlevi
Copy link
Contributor Author

morrisonlevi commented Aug 9, 2024

BTW, here is an example where GCC wouldn't inline zend_string_release_ex if it wasn't zend_always_inline, that I think is possibly a "hot" path:

zend_string_release_ex(lc_name, 0);

This file has a few places it wouldn't inline.

@morrisonlevi
Copy link
Contributor Author

morrisonlevi commented Aug 13, 2024

Can anyone check Windows that the new symbols exist with the correct visibility level? That's my only other concern other than ZEND_API inline vs dedicated zend_always_inline/_noinline versions for the relevant functions.

@TimWolla
Copy link
Member

@cmb69 👆 ?

@cmb69
Copy link
Member

cmb69 commented Aug 13, 2024

Can anyone check Windows that the new symbols exist with the correct visibility level?

$ dumpbin /exports x64\Release_TS\php8ts.dll | grep noinline
       1639  666 00156A60 smart_str_free_noinline = smart_str_free_noinline
       4013  FAC 001587E0 zend_string_efree_noinline = zend_string_efree_noinline
       4019  FB2 001587C0 zend_string_free_noinline = zend_string_free_noinline
       4029  FBC 00158820 zend_string_release_ex_noinline = zend_string_release_ex_noinline
       4030  FBD 001587F0 zend_string_release_noinline = zend_string_release_noinline
       4058  FD9 0014FD00 zend_tmp_string_release_noinline = zend_tmp_string_release_noinline

I think you can also add this line (dumpbin /exports x64\Release_TS\php8ts.dll | grep noinline) as a step to push.yml, so in case of further modifications to the PR you can check that in the workflow logs.

@cmb69
Copy link
Member

cmb69 commented Aug 14, 2024

Oops, dumpbin is not in the PATH, and it might be a bit fiddly to get it there. So I've checked locally:

$ dumpbin /exports x64\Release_TS\php8ts.dll | grep zend_string_addref
       4006  FA5 00005CF0 zend_string_addref = zend_string_addref

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants