RFC: Add final class Collections\Deque to PHP #7500
IMO, if you aren't going to put "spl" in the name somewhere, then please don't put it in the |
Any name ideas for an extension? E.g. 'std'/'standard'/'nds'? Move to core? Obviously, can't use pecl names that already exist such as 'ds'. Should I create an extension per data structure?
Still, I don't know if it's a good idea. std also has 'ArrayObject' and various iterables and interfaces that aren't prefixed with Spl already. Also, this would inconvenience users trying to find data structures in the manual if it was split up among many extensions - a developer would see SplDoublyLinkedList exists and not know there was a more efficient Deque, though a "See also" section in the overview would help.
EDIT: Overall, I believe that a programming language having multiple standard libraries ("Standard PHP Library") for generic data structures would lead to more confusion than it's worth and be regarded as a poor design choice in the future, especially if the original one would likely never be deprecated.
Theoretically you can just add it to |
I would be curious to see benchmarks with the ds deque also - that is already implemented in C within the PHP internal API and has been released and tested for years. My feeling there was that the API and behavior for deque and vector are so similar that you could just implement vector as a circular buffer and avoid the responsibility of choice. The benchmarks between deque and vector for all operations should be marginal and in all cases insignificant. |
Updated. This is a tiny bit faster.
Note that PHP 8.2's packed arrays use half the memory of packed arrays in PHP <= 8.1 and would outperform Vector, so the use case for Vectors is less compelling. There's still the argument for
There's the restriction that those Deque implementations need the capacity to be a power of 2 but Vector doesn't. In special cases, Vector could use less memory (e.g. if Vector::map() doesn't need subsequent resizing in the most common cases) |
@TysonAndre Is there any progress on this RFC? |
ext/standard/tests/general_functions/phpinfo.phpt is just failing because of the commit message being in the environment variable.
I had found a few things I'd wanted to change, and some that were brought up in https://externals.io/message/116100#116214
It was getting too close to a feature freeze, and I was concerned that it'd be too rushed, or that there wouldn't be time to adjust the API in followup RFCs if needed, but I think that's it. I'd also wanted to make sure that memory usage would be consistently low (and at the same time free of edge cases) for new data structures for handlers of get_properties and get_properties_for (even after calling var_export/debug_zval_dump) - in PHP 8.3-dev it's now straightforward to do that, and the patches have been merged into 8.3-dev for months. #8044 #8046 (etc) So I need to
This has lower memory usage and better performance than SplDoublyLinkedList for push/pop operations.

The API is as follows:

```php
namespace Collections;

/**
 * A double-ended queue (Typically abbreviated as Deque, pronounced "deck", like "cheque")
 * represented internally as a circular buffer.
 *
 * This has much lower memory usage than SplDoublyLinkedList or its subclasses (SplQueue, SplStack),
 * and operations are significantly faster than SplDoublyLinkedList.
 *
 * See https://en.wikipedia.org/wiki/Double-ended_queue
 *
 * This supports amortized constant time pushing and popping onto the front (i.e. start, first)
 * or back (i.e. end, last) of the Deque.
 *
 * Method naming is based on https://www.php.net/spldoublylinkedlist
 * and on array_push/pop/unshift/shift/ and array_key_first/array_key_last.
 */
final class Deque implements \IteratorAggregate, \Countable, \JsonSerializable, \ArrayAccess
{
    /** Construct the Deque from the values of the Traversable/array, ignoring keys */
    public function __construct(iterable $iterator = []) {}

    /**
     * Returns an iterator that accounts for calls to shift/unshift tracking the position of the start of the Deque.
     * Calls to shift/unshift will do the following:
     * - Increase/Decrease the value returned by the iterator's key()
     *   by the number of elements added/removed to/from the start of the Deque.
     *   (`$deque[$iteratorKey] === $iteratorValue` at the time the key and value are returned).
     * - Repeated calls to shift will cause valid() to return false if the iterator's
     *   position ends up before the start of the Deque at the time iteration resumes.
     * - They will not cause the remaining values to be iterated over more than once or skipped.
     */
    public function getIterator(): \InternalIterator {}

    /** Returns the number of elements in the Deque. */
    public function count(): int {}

    /** Returns true if there are 0 elements in the Deque. */
    public function isEmpty(): bool {}

    /** Removes all elements from the Deque. */
    public function clear(): void {}

    public function __serialize(): array {}
    public function __unserialize(array $data): void {}

    /** Construct the Deque from the values of the array, ignoring keys */
    public static function __set_state(array $array): Deque {}

    /** Appends value(s) to the end of the Deque. */
    public function push(mixed ...$values): void {}

    /** Prepends value(s) to the start of the Deque. */
    public function unshift(mixed ...$values): void {}

    /**
     * Pops a value from the end of the Deque.
     * @throws \UnderflowException if the Deque is empty
     */
    public function pop(): mixed {}

    /**
     * Shifts a value from the start of the Deque.
     * @throws \UnderflowException if the Deque is empty
     */
    public function shift(): mixed {}

    /**
     * Peeks at the value at the start of the Deque.
     * @throws \UnderflowException if the Deque is empty
     */
    public function first(): mixed {}

    /**
     * Peeks at the value at the end of the Deque.
     * @throws \UnderflowException if the Deque is empty
     */
    public function last(): mixed {}

    /**
     * Returns a list of the elements from the start to the end.
     */
    public function toArray(): array {}

    /**
     * Insert 0 or more values at the given offset of the Deque.
     * @throws \OutOfBoundsException if the value of $offset is not within the bounds of this Deque.
     */
    public function insert(int $offset, mixed ...$values): void {}

    // Offsets must be mixed for compatibility with ArrayAccess
    /**
     * Returns the value at offset (int)$offset (relative to the start of the Deque)
     * @throws \OutOfBoundsException if the value of (int)$offset is not within the bounds of this Deque
     */
    public function offsetGet(mixed $offset): mixed {}

    /**
     * Returns true if `0 <= (int)$offset && (int)$offset < $this->count()`.
     */
    public function offsetExists(mixed $offset): bool {}

    /**
     * Sets the value at offset $offset (relative to the start of the Deque) to $value
     * @throws \OutOfBoundsException if the value of (int)$offset is not within the bounds of this Deque
     */
    public function offsetSet(mixed $offset, mixed $value): void {}

    /**
     * Removes the value at (int)$offset from the deque.
     * @throws \OutOfBoundsException if the value of (int)$offset is not within the bounds of this Deque.
     */
    public function offsetUnset(mixed $offset): void {}

    /**
     * This is JSON serialized as a JSON array with elements from the start to the end.
     */
    public function jsonSerialize(): array {}
}
```

Earlier work on the implementation can be found at https://github.com/TysonAndre/pecl-teds (though `Teds\Deque` hasn't been updated with new names yet)

This was originally based on spl_fixedarray.c and previous work I did on an RFC.

Notable features of `Deque`

- Significantly lower memory usage and better performance than `SplDoublyLinkedList`
- Amortized constant time operations for push/pop/unshift/shift.
- Reclaims memory when roughly a quarter of the capacity is used, unlike array, which never releases allocated capacity
  https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html

  > One problem with the current implementation is that arData never shrinks
  > (unless explicitly told to). So if you create an array with a few million
  > elements and remove them afterwards, the array will still take a lot of
  > memory. We should probably half the arData size if utilization falls below a
  > certain level.

  For long-running applications when the maximum count of Deque is larger than the average count, this may be a concern.
- Adds functionality that cannot be implemented nearly as efficiently in an array. For example, prepending a single element to an array (and making it first in iteration order) with `array_unshift` would take linear time, because all elements in the array would need to be moved to make room for the first one
- Support `$deque[] = $element`, like ArrayObject.
- Having this functionality in php itself rather than a third party extension would encourage wider adoption of this
These operations are constant-time. Unlike array_shift/array_unshift, they aren't actually shifting values in the representation, and the new names are more self-explanatory and commonly used in other Deque implementations.
Hey there, @TysonAndre . How are you doing? Are there any updates on this RFC? I stumbled into a problem today where a better data structure (compared to PHP arrays) would be helpful so I remembered this. Btw, thank you for all the effort you have put into this. <3 |
RFC: https://wiki.php.net/rfc/deque
Online WebAssembly Demo: https://tysonandre.github.io/php-rfc-demo/deque (outdated version of this)
Also see the `final class Vector` proposal: #7488
Discussion: Adding `final class Deque` to PHP
This proposes to add the class `final class Deque` to PHP. From the Wikipedia article for the Double-Ended Queue:
This has lower memory usage and better performance than SplDoublyLinkedList for push/pop/shift/unshift operations.
The API is as follows:
Earlier work on the implementation can be found at https://github.com/TysonAndre/pecl-teds (though `Teds\Deque` hasn't been updated with new names yet)
This was originally based on spl_fixedarray.c and previous work I did on an RFC.
Notable features of `Deque`
- Significantly lower memory usage and better performance than `SplDoublyLinkedList`
- Amortized constant time operations for push/pop/unshift/shift.
- Reclaims memory when roughly a quarter of the capacity is used, unlike array, which never releases allocated capacity
  https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html (the article predates php 8.2-dev's memory optimization for packed arrays)
  For long-running applications when the maximum count of Deque is larger than the average count, this may be a concern.
- Adds functionality that cannot be implemented nearly as efficiently in an array. For example, prepending a single element to an array (and making it first in iteration order) with `array_unshift` would take linear time, because all elements in the array would need to be moved to make room for the first one
- Support `$deque[] = $element`, like ArrayObject.
- Having this functionality in php itself rather than a third party extension would encourage wider adoption of this
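To make the "amortized constant time" and "reclaims memory at roughly a quarter of the capacity" points concrete, here is a minimal userland sketch of a circular buffer. It is only an illustration: the class name `RingDeque`, its `capacity()` helper, the minimum capacity of 4 and the exact shrink rule are assumptions made for this example, not the C implementation or the thresholds used by the proposed `Collections\Deque`.

```php
<?php
// Hypothetical illustration of a circular buffer; not the RFC's implementation.
final class RingDeque
{
    private array $buf;         // backing storage with exactly $capacity slots
    private int $capacity = 4;  // kept at a power of 2 so "& mask" wraps indexes
    private int $head = 0;      // slot holding the first element
    private int $size = 0;      // number of stored elements

    public function __construct()
    {
        $this->buf = array_fill(0, $this->capacity, null);
    }

    public function count(): int { return $this->size; }
    public function capacity(): int { return $this->capacity; }

    public function push(mixed $value): void
    {
        $this->growIfFull();
        $this->buf[($this->head + $this->size) & ($this->capacity - 1)] = $value;
        $this->size++;
    }

    public function unshift(mixed $value): void
    {
        $this->growIfFull();
        // Moving the head back one slot (with wrap-around) avoids moving elements.
        $this->head = ($this->head - 1) & ($this->capacity - 1);
        $this->buf[$this->head] = $value;
        $this->size++;
    }

    public function shift(): mixed
    {
        if ($this->size === 0) {
            throw new UnderflowException('Cannot shift from an empty deque');
        }
        $value = $this->buf[$this->head];
        $this->buf[$this->head] = null;
        $this->head = ($this->head + 1) & ($this->capacity - 1);
        $this->size--;
        $this->shrinkIfSparse();
        return $value;
    }

    public function pop(): mixed
    {
        if ($this->size === 0) {
            throw new UnderflowException('Cannot pop from an empty deque');
        }
        $index = ($this->head + $this->size - 1) & ($this->capacity - 1);
        $value = $this->buf[$index];
        $this->buf[$index] = null;
        $this->size--;
        $this->shrinkIfSparse();
        return $value;
    }

    private function growIfFull(): void
    {
        if ($this->size === $this->capacity) {
            $this->resize($this->capacity * 2);
        }
    }

    private function shrinkIfSparse(): void
    {
        // Assumed rule: halve the buffer once at most a quarter of it is in use.
        if ($this->capacity > 4 && $this->size <= intdiv($this->capacity, 4)) {
            $this->resize(intdiv($this->capacity, 2));
        }
    }

    private function resize(int $newCapacity): void
    {
        // Copy the elements in logical order so the new buffer starts at slot 0.
        $new = array_fill(0, $newCapacity, null);
        for ($i = 0; $i < $this->size; $i++) {
            $new[$i] = $this->buf[($this->head + $i) & ($this->capacity - 1)];
        }
        $this->buf = $new;
        $this->capacity = $newCapacity;
        $this->head = 0;
    }
}
```

Because a resize only happens after the element count has doubled or dropped to a quarter of the capacity, the linear copy in `resize()` is spread over many constant-time operations, which is where the amortized bound comes from.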
Backwards incompatible changes:
Userland code declaring a class named `\Collections\Deque` would cause a compile error due to this class now being declared internally.
Benchmark
Benchmarks were re-run on November 9th, 2022 with opcache enabled in an NTS non-debug build with default cflags (-O2)
Two cycles of appending n values then shifting them from the front
Note that it is possible to have constant time removal from the front of a PHP `array` efficiently (as long as `key` stays at the front of the array), but it is not possible to have constant time prepending (`unshift`) to the front of an array. `array_unshift` is a linear time operation (takes time proportional to the current array size) - **this benchmark avoids `array_unshift` and other pitfalls for array**. So `unshift` is not benchmarked.
Because there's a second cycle, array becomes an associative array and uses more memory than a packed array (https://www.npopov.com/2014/12/22/PHPs-new-hashtable-implementation.html).
`memory_get_usage` is not counting the memory overhead of tracking the allocations of a lot of small objects, so the memory usage of SplDoublyLinkedList is under-reported (negligibly with the default `emalloc` allocator). SplQueue is a subclass of SplDoublyLinkedList and I expect it would have the same performance.
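As a rough illustration of the array behavior described above (this is not the benchmark source, which is not reproduced here), the sketch below drains a plain array from the front by reading `key()` and unsetting that entry, then contrasts it with `array_unshift`. It assumes the array's internal pointer starts at the first element and moves to the next remaining element when the current one is unset; the variable names are made up for the example.

```php
<?php
// Sketch only: uses a plain array as a FIFO queue via its internal pointer.
$queue = [];
foreach (['a', 'b', 'c', 'd'] as $job) {
    $queue[] = $job;            // appending is amortized constant time
}

$drained = [];
while (($front = key($queue)) !== null) {
    $drained[] = $queue[$front];
    unset($queue[$front]);      // removes the front entry without re-indexing
}

// By contrast, array_unshift() re-indexes the existing elements on every call,
// so building a list this way costs time proportional to its size per call.
$prepended = [];
foreach (['a', 'b', 'c'] as $value) {
    array_unshift($prepended, $value);
}
```

Note that the keys of `$queue` keep growing across cycles, since removed keys are never reused.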
Tests run on November 9, 2022 with php 8.3-dev
Only appending to a Deque and reading elements without removal
Note that the proposed `Deque` as well as the existing `SplDoublyLinkedList`/`SplStack` are expected to perform equally well at unshifting to (adding to) or shifting from (removing from) the front of the Collection (compared to adding/removing the back of a Collection)
`Deque` is more efficient than other object data structures in the SPL at this benchmark, but is less efficient than `array` after array optimizations for #7491 were merged into PHP 8.2
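For readers unfamiliar with the method names, the short sketch below shows the four end operations on the existing `SplDoublyLinkedList`, which the RFC says its naming is based on. `Collections\Deque` is only proposed here, so the example deliberately uses the class that ships with PHP today; the variable names are illustrative.

```php
<?php
$list = new SplDoublyLinkedList();
$list->push('b');           // add to the back
$list->push('c');           // add to the back
$list->unshift('a');        // add to the front

$first = $list->shift();    // remove from the front
$last = $list->pop();       // remove from the back
$remaining = count($list);  // only 'b' is left
```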
Caveats for comparison with Ds\Deque from ds PECL:
`phpize; ./configure; make install` (installing as a shared extension reflects the default way to install PECL modules, e.g. what `pecl install` or an OS package manager would do or what copying the windows DLL would do), not statically compiled into php