Skip to content

Make internal run_time_cache a persistent allocation #15040

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 6, 2024

Conversation

bwoebi
Copy link
Member

@bwoebi bwoebi commented Jul 20, 2024

We also add zend_map_ptr_static, so that we do not incur the overhead of constantly recreating the internal run_time_cache pointers on each request when observers are enabled.

Additionally it saves a bit of zeroing of the map_ptrs on each request, independently of whether observers are actually enabled or not, given that they're below the zend_map_ptr_static_size threshold now.

Comparing instruction counts, it shows about 0.1% improvement without observers.

This mechanism might be extended for mutable_data of internal classes too. And possibly also for preloaded code.

@bwoebi bwoebi force-pushed the optimize-internal-rt-cache branch 5 times, most recently from 522803f to 818f6d5 Compare July 22, 2024 02:35
@bwoebi bwoebi marked this pull request as ready for review July 22, 2024 15:47
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of the uintptr_t casts being moved to intptr_t?

Copy link
Member Author

@bwoebi bwoebi Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Negative offsets with respect to CG(map_ptr_base). (Still positive compared to map_ptr_real_base).
But having negative offsets for the static ones allows allocating positive offsets for non-static map_ptrs during startup as well.

Comment on lines +1999 to +2001
ZEND_API size_t zend_map_ptr_static_size;
ZEND_API size_t zend_map_ptr_static_last;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason these are mutable globals as opposed to thread-locals?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the point - they're supposed to be allocated fully during startup. And startup is the same across all threads.

Copy link
Member

@iluuu1994 iluuu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any issues otherwise.

Copy link
Member

@arnaud-lb arnaud-lb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

As the internal runtime cache is never re-allocated after startup due to this change, it looks like that we could store a true pointer in ->run_time_cache instead of an offset, at least in non-ZTS builds?

@bwoebi
Copy link
Member Author

bwoebi commented Jul 23, 2024

@arnaud-lb Yes, in NTS that's possible.

@staabm
Copy link
Contributor

staabm commented Jul 23, 2024

is this change meant to improve performance? if so, can the diff be measured?

@bwoebi bwoebi force-pushed the optimize-internal-rt-cache branch 2 times, most recently from 07a4cce to 477314f Compare July 24, 2024 01:11
@bwoebi bwoebi requested a review from nielsdos as a code owner July 24, 2024 01:11
@bwoebi bwoebi force-pushed the optimize-internal-rt-cache branch 3 times, most recently from a28a5f1 to 671d06d Compare July 24, 2024 02:39
@dstogov
Copy link
Member

dstogov commented Jul 24, 2024

I have benchmarked this patch on Symfony Demo with callgrind (with and without observer), using the following commands:

ZEND_DONT_UNLOAD_MODULES=1 valgrind --tool=callgrind --separate-recs=1 --dump-instr=yes --cache-sim=no sapi/cgi/php-cgi -d opcache.jit=0 -T2,100 /home/dmitry/php/community_tests/symfony_demo/public/index.php > /dev/null
ZEND_DONT_UNLOAD_MODULES=1 valgrind --tool=callgrind --separate-recs=1 --dump-instr=yes --cache-sim=no sapi/cgi/php-cgi -d opcache.jit=0 -d zend_test.observer.enabled=1 -d zend_test.observer.show_output=0 -T2,100 /home/dmitry/php/community_tests/symfony_demo/public/index.php > /dev/null

master without observer:  860,901,434
master with observer:     944,021,946 (observer makes ~9% slowdown)
patched without observer: 859,001,591 (patch makes ~0.2% improvement)
patched with observer:    930,449,148 (patch makes ~1.5% improvement, now observer makes ~8% slowdown)

So the patch makes some performance improvement for observer (eliminates call to zend_init_internal_run_time_cache() on each request) and "as a side effect" also makes slight improvement for normal execution (keeps "static" part of map_ptr space unchanged between requests).

It would be great to saw this explanation in PR comments...

Comment on lines +5395 to +5400
#if !ZTS
ZEND_MAP_PTR(zend_ffi_new_fn.run_time_cache) = ZEND_MAP_PTR(((zend_internal_function *)zend_hash_str_find_ptr(&zend_ffi_ce->function_table, "new", sizeof("new")-1))->run_time_cache);
ZEND_MAP_PTR(zend_ffi_cast_fn.run_time_cache) = ZEND_MAP_PTR(((zend_internal_function *)zend_hash_str_find_ptr(&zend_ffi_ce->function_table, "cast", sizeof("cast")-1))->run_time_cache);
ZEND_MAP_PTR(zend_ffi_type_fn.run_time_cache) = ZEND_MAP_PTR(((zend_internal_function *)zend_hash_str_find_ptr(&zend_ffi_ce->function_table, "type", sizeof("type")-1))->run_time_cache);
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this?
Why this is done only for non-ZTS build?
Will we have any problems with ZTS build?

Please use ZEND_MAP_PTR_INIT instead of ZEND_MAP_PTR() on left size.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using MAP_PTR_INIT for everything where I assign a value, but in this case I want to copy whatever binary data is, thus I found using ZEND_MAP_PTR directly more fitting and symmetric with the right side.

Copy link
Member Author

@bwoebi bwoebi Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, only need it in NTS, as in ZTS this uses a proper map_ptr which is set when the function is created and thus existing here.
In NTS the pointer is initialized later in the startup sequence and thus needs to be copied back.

Comment on lines -2955 to +2965
ZEND_MAP_PTR_NEW(internal_function->run_time_cache);
#if ZTS
ZEND_MAP_PTR_NEW_STATIC(internal_function->run_time_cache);
#else
ZEND_MAP_PTR_INIT(internal_function->run_time_cache, NULL);
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why ZTS makes difference?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need per-thread pointers for ZTS (run-time cache is local to each request). But for NTS we can, as a slight optimization, just skip the indirect lookup via CG(map_ptr_base).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth adding a comment here in code.

Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch looks more or less good (I may miss some edge cases in implementation)
Please, answer my questions and make minor clean up.

You don't see any problems for Windows, where different workers have different copies of internal functions?

@bwoebi
Copy link
Member Author

bwoebi commented Jul 24, 2024

@dstogov This is not problematic on Windows, as this is fully local to each process. There's zero interaction with shared memory here.

We also add zend_map_ptr_static, so that we do not incur the overhead of constantly recreating the internal run_time_cache pointers on each request.
This mechanism might be extended for mutable_data of internal classes too.
@bwoebi bwoebi force-pushed the optimize-internal-rt-cache branch from 671d06d to 87f038e Compare September 6, 2024 23:02
@bwoebi bwoebi merged commit 25d7616 into php:master Sep 6, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants