Skip to content

random: Pass algorithm and state together as php_random_algo_with_state #13350

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 25, 2024

Conversation

TimWolla
Copy link
Member

@TimWolla TimWolla commented Feb 7, 2024

Since 162e1dc, the php_random_status struct contains just a single void*, resulting in needless indirection when accessing the engine state and thus decreasing readability because of the additional non-meaningful ->state references / the local helper variables.

There is also a small, but measurable performance benefit:

<?php
$e = new Random\Engine\Xoshiro256StarStar(0);
$r = new Random\Randomizer($e);

for ($i = 0; $i < 15; $i++)
	var_dump(strlen($r->getBytes(100000000)));

goes from roughly 3.85s down to 3.60s.

The names of the status variables have not yet been touched to keep the diff small. They will be renamed to the more appropriate state in a follow-up cleanup commit.

Since 162e1dc, the `php_random_status` struct
contains just a single `void*`, resulting in needless indirection when
accessing the engine state and thus decreasing readability because of the
additional non-meaningful `->state` references / the local helper variables.

There is also a small, but measurable performance benefit:

    <?php
    $e = new Random\Engine\Xoshiro256StarStar(0);
    $r = new Random\Randomizer($e);

    for ($i = 0; $i < 15; $i++)
    	var_dump(strlen($r->getBytes(100000000)));

goes from roughly 3.85s down to 3.60s.

The names of the `status` variables have not yet been touched to keep the diff
small. They will be renamed to the more appropriate `state` in a follow-up
cleanup commit.
Copy link
Member

@nielsdos nielsdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good to me.

Copy link
Member

@bukka bukka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with void * is that it makes things more error prone as you can add there anything by mistake. So from the API caller, it makes actually thinks more error prone. I understand why you want to make this change from the internal PoV but not sure if this is an ideal API design. Personally I try to stay away from void * if possible and use it just as the last resort. In this case it might make sense as you want to probably make it generic but maybe having a typed wrapper around it is not that bad and might catch some mistakes.

@TimWolla
Copy link
Member Author

TimWolla commented Feb 8, 2024

@bukka The engine state is stored in a void* before and after this change, so there already is no type safety. This PR just gets rid of the now-useless single-member struct php_random_status:

typedef struct _php_random_status_ {
	void *state;
} php_random_status;

In fact I've increased type safety where possible in previous PRs leading up to this change, e.g. in 0d18f41.

As a possible follow-up, I plan to remove the seed member from the struct php_random_algo and instead add well-typed engine specific seed functions, because seeding an engine generically does not appear super useful to me (Mt19937 ignores the upper half of the uint64_t seed for example).

typedef struct _php_random_algo {
	const size_t state_size;
	void (*seed)(void *status, uint64_t seed);
	php_random_result (*generate)(void *status);
	zend_long (*range)(void *status, zend_long min, zend_long max);
	bool (*serialize)(void *status, HashTable *data);
	bool (*unserialize)(void *status, HashTable *data);
} php_random_algo;

@bukka
Copy link
Member

bukka commented Feb 8, 2024

I meant type safety from the API caller PoV. It means after this change I could pass any pointer to php_array_pick_keys instead of php_random_default_status() result and it won't comlain...

@TimWolla
Copy link
Member Author

TimWolla commented Feb 8, 2024

I see. Yes, making those functions generic is an explicit design goal.

As you specifically mentioned php_random_default_status() - A medium to long-term goal of mine is moving away from a “blessed” global instance of an engine, towards independent instances for each use case.

The global Mersenne Twister is a constant source of problems, especially since its state is visible towards and controllable by userland. php_combined_lcg() as the other global RNG has the same problem that its state is visible towards userland (and it also is a terrible RNG in general).

You could fit quite a few copies of the 256-bit xoshiro256** or 128-bit PcgOneseq128XslRr64 state into the 2.5 kB of global Mersenne Twister 😄

see also: #13290 (comment)

@TimWolla
Copy link
Member Author

TimWolla commented Feb 8, 2024

I could imagine introducing a new

struct {
 const php_random_algo *algo;
 void *state;
}

that allows passing the algorithm and associated state as a single non-pointer parameter, but I'm not sure if this would improve type safety in a measurable way? What do you think?

@bukka
Copy link
Member

bukka commented Feb 8, 2024

Maybe instead of void* you could create some dummy structure for the state that will be as a whole casted to the actual state by the engines. The actual struct would not be used (it could have just contain one char array element for example) but it would keep the caller type safety as the caller would have to operate with specific pointer...

@TimWolla
Copy link
Member Author

Sorry for the delay in getting back to this, was busy with other stuff.

I've thought about the suggestion and made the change locally and I don't believe moving away from the void * to some opaque php_random_state * really improves safety. C will just accept the explicit cast from whatever pointer you have and you're back to square one, just with additional noise. The extra casts wouldn't necessarily be limited to ext/random either. It would also affect users of the code if they stack allocate the engine state or embed the state structure within another structure (e.g. the module globals) to avoid the extra heap allocation. Doing would be especially beneficial for php_random_status_state_pcgoneseq128xslrr64, which is just twice the size of a pointer.

My suggestion of adding the struct in #13350 (comment) on the other hand feels like a value-add to me, because it allows one to store and pass the algorithm + associated state in an atomic fashion. The struct will even be passed as a register pair, making it zero-cost from a performance perspective.

@bukka
Copy link
Member

bukka commented Feb 18, 2024

Your suggestion is fine. I though that you want just pass state on its own and don't need algo but if its useful then cool. My suggestion was for user facing API only so it would be just partial improvement.

@TimWolla TimWolla changed the title random: Remove php_random_status random: Pass algorithm and state together as php_random_algo_with_state Feb 19, 2024
@TimWolla TimWolla requested review from nielsdos and bukka February 19, 2024 19:11
@TimWolla
Copy link
Member Author

I've now updated the code. The user-facing APIs (php_random_gammasection_*, php_random_range*, php_binary_string_shuffle, php_array_data_shuffle, php_array_pick_keys) now take a single php_random_algo_with_state struct by value. I've also added a php_random_default_engine() method for the folks that want a simple way to migrate to the new API, without migrating off the global Mt19937 instance.

The function pointers within the php_random_algo struct still take void *, but these are part the low-level API and not usually expected to be called outside of ext/random.

@TimWolla
Copy link
Member Author

@zeriyoshi Please add an explicit approval / request for change once you had time to look into this!

Copy link
Contributor

@zeriyoshi zeriyoshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simple and elegant!
Sorry for late review!

@TimWolla TimWolla merged commit 79133df into php:master Feb 25, 2024
@TimWolla TimWolla deleted the random-state branch February 25, 2024 19:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants