Skip to content

Optimize SplObjectStorage native read/write/isset dimension handlers #7695

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

TysonAndre
Copy link
Contributor

This makes reading/writing with $splObjectStorage[$offset] shorthand twice as
fast as it was previously, and a bit faster than offsetGet/offsetSet instead of
(previously) much slower.

There have been no attempts at adding these read/write handlers in the commit history of this file.

The shorthand syntax is also more concise than calling offsetGet/offsetSet directly.

Related to GH-7690

A synthetic benchmark that sequentially writes then sequentially reads data in a SplObjectStorage with 100000 entries:

  • Time to construct a SplObjectStorage instance instead of a subclass - I expect that to be unaffected
  • Reads/writes to SplObjectStorage (or subclasses that don't override offsetGet/offsetSet/offsetExists/getHash) - I expect that to be faster, like the benchmarks
  • Magic array dimension read/write shorthand syntax is much faster
  • Subclasses that do override those ArrayAccess methods may be slightly slower than before, but are uncommon overall (and were already slow)
<?php

class Example {
}

function bench_spl_object_storage_short(int $n) {
    $m = new SplObjectStorage();
    $values = [];
    for ($i = 0; $i < $n; $i++) {
        $x = new Example();
        $m[$x] = $i;
        $values[] = $x;
    }
    $total = 0;
    foreach ($values as $x) {
        $total += $m[$x];
    }
    return $total;
}

function bench_spl_object_storage(int $n) {
    $m = new SplObjectStorage();
    $values = [];
    for ($i = 0; $i < $n; $i++) {
        $x = new Example();
        $m->offsetSet($x, $i);
        $values[] = $x;
    }
    $total = 0;
    foreach ($values as $x) {
        $total += $m->offsetGet($x);
    }
    return $total;
}

function bench_weak_map(int $n) {
    $m = new WeakMap();
    $values = [];
    for ($i = 0; $i < $n; $i++) {
        $x = new Example();
        $m[$x] = $i;
        $values[] = $x;
    }
    $total = 0;
    foreach ($values as $x) {
        $total += $m[$x];
    }
    return $total;
}

function main(int $n, int $iterations) {
    error_reporting(E_ALL & ~E_DEPRECATED);
    ini_set('memory_limit', '1G');
    foreach (['bench_spl_object_storage', 'bench_spl_object_storage_short', 'bench_weak_map'] as $f) {
        gc_collect_cycles();
        $t1 = hrtime(true);
        for ($i = 0; $i < $iterations; $i++) {
            $total = $f($n);
        }
        $t2 = hrtime(true);
        printf("%30s: n=%d iterations=%d create+read time=%.6f result=%d\n", $f, $n, $iterations, ($t2 - $t1)/1e9, $total);
    }
    echo "\n";
}

for ($i = 0; $i < 10; $i++) {
    main(100000, 10);
}

Before work on WeakMap

      bench_spl_object_storage: n=100000 iterations=10 create+read time=0.137906 result=4999950000
bench_spl_object_storage_short: n=100000 iterations=10 create+read time=0.231022 result=4999950000
                bench_weak_map: n=100000 iterations=10 create+read time=0.360986 result=4999950000

With this PR, bench_spl_object_storage_short is much faster than before:

      bench_spl_object_storage: n=100000 iterations=10 create+read time=0.135253 result=4999950000
bench_spl_object_storage_short: n=100000 iterations=10 create+read time=0.113202 result=4999950000

With GH-7690 and related work on WeakMap, create+read for WeakMap is faster than before

                bench_weak_map: n=100000 iterations=10 create+read time=0.150795 result=4999950000

It's safer to call zval_ptr_dtor on a copy of the zval (ZVAL_COPY_VALUE) AFTER updating the SplObjectStorage entry in case the entry is moved or updated by the destructor (e.g. __destruct deletes the entry while it's being updated, causing that pointer to be invalidated) - this is a bug unrelated to this change that is unlikely in typical code.

@TysonAndre TysonAndre changed the title Optimize SplObjectStorage native read/write dimension handlers Optimize SplObjectStorage native read/write/isset dimension handlers Nov 27, 2021
@TysonAndre TysonAndre force-pushed the spl-object-storage-optimize branch from 784e1a5 to 37383a6 Compare November 27, 2021 17:22
@TysonAndre
Copy link
Contributor Author

The build failure is spurious; the review comment on the internal representation of flags was addressed

@TysonAndre TysonAndre force-pushed the spl-object-storage-optimize branch from 37383a6 to 7e59d4a Compare November 28, 2021 15:22
@SVGAnimate
Copy link
Contributor

SVGAnimate commented Nov 28, 2021

Hello ,
I also believe we should use zend_object_alloc() instead of emalloc()
@see https://github.com/php/php-src/blob/master/ext/spl/spl_observer.c#L197
@see https://github.com/php/php-src/search?q=zend_object_alloc
Thanks,

@TysonAndre
Copy link
Contributor Author

TysonAndre commented Nov 28, 2021

I also believe we should use zend_object_alloc() instead of emalloc()

They'd have the same computed size. You use zend_object_alloc when you don't know the size of the internal (C) data in the struct before the zend_object embedded in the internal data - not knowing makes zend_object_alloc a bit less efficient.

This is properly accounting for the amount of memory for the number of PHP properties in the subclasses

@nikic
Copy link
Member

nikic commented Nov 28, 2021

The count() change looks good, feel free to land separately.

@nikic
Copy link
Member

nikic commented Nov 28, 2021

... though you probably want to update the implementation in array.c as well.

TysonAndre added a commit that referenced this pull request Nov 28, 2021
This makes reading/writing with `$splObjectStorage[$offset]` shorthand twice as
fast as it was previously, and a bit faster than offsetGet/offsetSet instead of
(previously) much slower.

Call destructor after overriding old SplObjectStorage entry.
Previously, it was called before, which was possibly unsafe
if the destructor had side effects.

Add tests.

Related to phpGH-7690
@TysonAndre TysonAndre force-pushed the spl-object-storage-optimize branch from 85ee3f6 to 13f46bf Compare November 28, 2021 19:18
spl_object_storage_free_hash(intern, &key);
ZEND_ASSERT(key.key);
bool found = zend_hash_exists(&intern->storage, key.key);
zend_string_release_ex(key.key, 0);
Copy link
Contributor

@morrisonlevi morrisonlevi Nov 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case any other reviewers wonder: the spl_object_storage_free_hash goes away because we know that we have key.key, so we can release the string directly.

Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any serious problems

Comment on lines 483 to 484
ZEND_ASSERT(Z_TYPE(element->inf) != IS_REFERENCE);
ZVAL_COPY(rv, &element->inf);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why element of the storage can't be a reference. May be this correct. May be it's safer to use ZVAL_COPY_DEREF()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe it can be a reference - it's because you can't create a reference with offsetSet, offsetGet, and offsetExists in the zend_std_read_dimension code.

In many places it was using ZVAL_COPY.

Attempting to do so resulted in Notice: Indirect modification of overloaded element of SplObjectStorage has no effect in php shell code on line 1 before/after this change


Separately, I think there might be a bug in the __unserialize implementation when a value is a reference
The handlers I've added are based on zend_std_read_dimension and zend_std_write_dimension. #7695 (comment)

Copy link
Contributor Author

@TysonAndre TysonAndre Nov 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

php > echo PHP_VERSION;
7.4.19-dev
php > $s = new SplObjectStorage();
php > $y = 1;
php > $x = [new stdClass(), &$y];
php > var_dump($s->__unserialize([$x, []]));
NULL
php > var_dump($s);
object(SplObjectStorage)#1 (1) {
  ["storage":"SplObjectStorage":private]=>
  array(1) {
    ["000000001f5767c4000000002919f006"]=>
    array(2) {
      ["obj"]=>
      object(stdClass)#2 (0) {
      }
      ["inf"]=>
      &int(1)
    }
  }
}

The entry for inf is a reference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Figuring out what to do for the legacy Serializable::unserialize interface is a different change - if something was serialized that referred to that index, subsequent objects being unserialized could refer back to it.

The only way I can think of to add a reference to a value of an entry of https://www.php.net/splobjectstorage is through manually constructed serializable data, so it's probably fine to convert references to non-references in __unserialize in this case

SplObjectStorage->unserialize may be a different cause of references
for malformed inputs, so continue checking

In internally used methods, convert references to non-references if they're
found.
@TysonAndre TysonAndre force-pushed the spl-object-storage-optimize branch from 620d63c to 9f69bf3 Compare November 29, 2021 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants