|
| 1 | +# zval |
| 2 | + |
| 3 | +PHP is a dynamic language. As such, a variable can typically contain a value of any type, and the type of the variable |
| 4 | +may even change during the execution of the program. Under the hood, this is implemented through the `zval` struct. It |
| 5 | +is one of the most important data structures in php-src. It is essentially a "tagged union", meaning it consists of an |
| 6 | +integer tag, representing the type of the variable, and a union for the value itself. Let's look at the value first. |
| 7 | + |
| 8 | +## `zend_value` |
| 9 | + |
| 10 | +```c |
| 11 | +typedef union _zend_value { |
| 12 | + zend_long lval; /* long value, i.e. int. */ |
| 13 | + double dval; /* double value, i.e. float. */ |
| 14 | + zend_refcounted *counted; |
| 15 | + zend_string *str; |
| 16 | + zend_array *arr; |
| 17 | + zend_object *obj; |
| 18 | + zend_resource *res; |
| 19 | + zend_reference *ref; |
| 20 | + // Less important for now. |
| 21 | + zend_ast_ref *ast; |
| 22 | + zval *zv; |
| 23 | + void *ptr; |
| 24 | + zend_class_entry *ce; |
| 25 | + zend_function *func; |
| 26 | + struct { |
| 27 | + uint32_t w1; |
| 28 | + uint32_t w2; |
| 29 | + } ww; |
| 30 | +} zend_value; |
| 31 | +``` |
| 32 | + |
| 33 | +A C union is a data type that is big enough to hold the biggest of its members. As such, it can hold exactly one of its |
| 34 | +members at a time. For example, `zend_value` may store the `lval` member, or the `dval` member, but never both at the |
| 35 | +same time. Remembering exactly _which_ member is stored is our job. That's what the `zval` types are for. |
| 36 | + |
| 37 | +If you are a PHP developer, the top members should sound pretty familiar, with the exception of `counted`. `counted` |
| 38 | +refers to any of the values that use [reference counting]() to determine the lifetime of a value. This includes strings, |
| 39 | +arrays, objects, resources and references. All of these will be discussed in their own chapters. You may be thinking |
| 40 | +that some values are missing, most notably `null` and `bool`. These values don't hold any auxiliary data, but consist |
| 41 | +solely of the `zval` type. |
| 42 | + |
| 43 | +## `zval` types |
| 44 | + |
| 45 | +```c |
| 46 | +#define IS_UNDEF 0 /* A variable that was never written to. */ |
| 47 | +#define IS_NULL 1 |
| 48 | +#define IS_FALSE 2 |
| 49 | +#define IS_TRUE 3 |
| 50 | +#define IS_LONG 4 /* An integer value. */ |
| 51 | +#define IS_DOUBLE 5 /* A floating point value. */ |
| 52 | +#define IS_STRING 6 |
| 53 | +#define IS_ARRAY 7 |
| 54 | +#define IS_OBJECT 8 |
| 55 | +#define IS_RESOURCE 9 |
| 56 | +#define IS_REFERENCE 10 |
| 57 | +``` |
| 58 | + |
| 59 | +These simple integers determine what value is currently stored in `zend_value`. Together, the value and the tag make up |
| 60 | +the `zval`, along with some other fields. Note how `IS_NULL`, `IS_FALSE` and `IS_TRUE` are actually `zval` types. This |
| 61 | +explains why they are absent from `zend_value`. |
| 62 | + |
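The tagged-union idea can be sketched in a few lines of standalone C. This is a simplified model, not php-src code: the `mini_*` names and the `MINI_IS_*` constants are hypothetical stand-ins for illustration, and only two value members are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring a few of the IS_* constants. */
enum { MINI_IS_NULL = 1, MINI_IS_FALSE = 2, MINI_IS_TRUE = 3,
       MINI_IS_LONG = 4, MINI_IS_DOUBLE = 5 };

typedef union {
    int64_t lval;
    double  dval;
} mini_value;

typedef struct {
    mini_value value; /* only meaningful for some tags */
    uint8_t    type;  /* remembers which member, if any, is active */
} mini_zval;

static mini_zval mini_long(int64_t n) {
    mini_zval zv;
    zv.value.lval = n;
    zv.type = MINI_IS_LONG;
    return zv;
}

/* Booleans carry no payload: the tag alone is the value. */
static mini_zval mini_bool(int b) {
    mini_zval zv;
    zv.value.lval = 0; /* unused; zeroed only to keep the struct initialized */
    zv.type = b ? MINI_IS_TRUE : MINI_IS_FALSE;
    return zv;
}

/* Always consult the tag before touching the union. */
static double mini_to_double(const mini_zval *zv) {
    switch (zv->type) {
        case MINI_IS_LONG:   return (double) zv->value.lval;
        case MINI_IS_DOUBLE: return zv->value.dval;
        case MINI_IS_TRUE:   return 1.0;
        default:             return 0.0; /* null and false */
    }
}

static void mini_demo(void) {
    mini_zval n = mini_long(42);
    assert(mini_to_double(&n) == 42.0);
}
```

Reading `value.dval` from a `mini_zval` tagged `MINI_IS_LONG` would compile without complaint, which is exactly why every access goes through the tag first.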
| 63 | +Finally, here's what the `zval` struct actually looks like. This may look intimidating at first. Don't worry, we'll go |
| 64 | +over it step by step. |
| 65 | + |
| 66 | +```c |
| 67 | +typedef struct _zval_struct zval; |
| 68 | + |
| 69 | +struct _zval_struct { |
| 70 | + zend_value value; |
| 71 | + union { |
| 72 | + uint32_t type_info; |
| 73 | + struct { |
| 74 | + ZEND_ENDIAN_LOHI_3( |
| 75 | + uint8_t type, /* active type */ |
| 76 | + uint8_t type_flags, |
| 77 | + union { |
| 78 | + uint16_t extra; /* not further specified */ |
| 79 | + } u) |
| 80 | + } v; |
| 81 | + } u1; |
| 82 | + union { |
| 83 | + uint32_t next; /* hash collision chain */ |
| 84 | + uint32_t cache_slot; /* cache slot (for RECV_INIT) */ |
| 85 | + uint32_t opline_num; /* opline number (for FAST_CALL) */ |
| 86 | + uint32_t lineno; /* line number (for ast nodes) */ |
| 87 | + uint32_t num_args; /* arguments number for EX(This) */ |
| 88 | + uint32_t fe_pos; /* foreach position */ |
| 89 | + uint32_t fe_iter_idx; /* foreach iterator index */ |
| 90 | + uint32_t guard; /* recursion and single property guard */ |
| 91 | + uint32_t constant_flags; /* constant flags */ |
| 92 | + uint32_t extra; /* not further specified */ |
| 93 | + } u2; |
| 94 | +}; |
| 95 | +``` |
| 96 | + |
| 97 | +`zval.value` reserves space for the actual variable data, if the type requires any. |
| 98 | + |
| 99 | +`zval.u1` stores the type of the variable. This refers to the `IS_*` constants above. You may be wondering why this is a |
| 100 | +`union`. In short, this field is used not only for the `IS_*` constants, but also some other flags. The entire |
| 101 | +`type_info` consists of 4 bytes. `zval.u1.v.type`, the lowest byte, is used for the `IS_*` constants. |
| 102 | +`zval.u1.v.type_flags` is used for the `IS_TYPE_REFCOUNTED` and `IS_TYPE_COLLECTABLE` flags. They will be discussed |
| 103 | +within the [reference counting]() chapter. `zval.u1.v.u.extra` (containing the useless `u` union) is currently only used |
| 104 | +for the `IS_STATIC_VAR_UNINITIALIZED` flag, which is somewhat of a fringe case we won't get into here. So, |
| 105 | +`zval.u1.type_info` and `zval.u1.v` are essentially two ways to access the same data. The `ZEND_ENDIAN_LOHI_3` macro is |
| 106 | +used to guarantee ordering of bytes across big- and little-endian architectures. |
| 107 | + |
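To make the layout of `type_info` more concrete, here is a minimal sketch that packs and unpacks the three parts with shifts and masks instead of a union. The `MINI_*` names are hypothetical, and the placement of `type_flags` in the second byte and `extra` in the upper 16 bits is an assumption derived from the field order in the struct above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants; the MINI_ prefix marks them as stand-ins. */
#define MINI_IS_STRING        6u
#define MINI_FLAG_REFCOUNTED  (1u << 0)
#define MINI_TYPE_MASK        0xffu
#define MINI_TYPE_FLAGS_SHIFT 8
#define MINI_EXTRA_SHIFT      16

/* Combine the three parts into one 32-bit word, lowest byte first. */
static uint32_t mini_pack(uint8_t type, uint8_t type_flags, uint16_t extra) {
    return (uint32_t) type
         | ((uint32_t) type_flags << MINI_TYPE_FLAGS_SHIFT)
         | ((uint32_t) extra << MINI_EXTRA_SHIFT);
}

static uint8_t mini_type(uint32_t type_info) {
    return (uint8_t) (type_info & MINI_TYPE_MASK);
}

static uint8_t mini_type_flags(uint32_t type_info) {
    return (uint8_t) ((type_info >> MINI_TYPE_FLAGS_SHIFT) & 0xffu);
}

static uint16_t mini_extra(uint32_t type_info) {
    return (uint16_t) (type_info >> MINI_EXTRA_SHIFT);
}

static void mini_demo(void) {
    uint32_t ti = mini_pack(MINI_IS_STRING, MINI_FLAG_REFCOUNTED, 0);
    assert(mini_type(ti) == MINI_IS_STRING);
    assert(mini_type_flags(ti) == MINI_FLAG_REFCOUNTED);
}
```

Because type and flags live in one word, a single 32-bit store or comparison can set or test both at once, which is one plausible reason for exposing the combined `type_info` view.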
| 108 | +If you're familiar with C, you'll know that the compiler likes to add padding to structures with "odd" sizes. It does |
| 109 | +that because the CPU can work with some offsets more efficiently than others. Ignoring the `zval.u2` field for a second, |
| 110 | +our struct would be 12 bytes in total, 8 coming from `zval.value` and 4 from `zval.u1`. A compiler on a 64-bit |
| 111 | +architecture will generally bump this to 16 bytes by adding 4 bytes of useless padding. If this padding is added anyway, |
| 112 | +we might as well make use of it. `zval.u2` is often unoccupied, but provides 4 additional bytes to be used in various |
| 113 | +contexts. How exactly the value is used depends on the use case, but it's important to remember that it may only be used |
| 114 | +for one of them at a time. |
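The padding argument can be checked with `sizeof` and `offsetof`. The sketch below uses hypothetical stand-in structs (`without_u2`, `with_u2`) rather than the real `zval`; the exact amount of tail padding depends on the platform ABI, so the helper reports it instead of assuming a value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for zend_value: 8 bytes because of the double and int64_t. */
typedef union {
    int64_t lval;
    double  dval;
    void   *ptr;
} mini_value;

/* 8 + 4 bytes of payload; the compiler may append tail padding. */
typedef struct {
    mini_value value;
    uint32_t   type_info;
} without_u2;

/* Same layout, with the would-be padding turned into a usable field. */
typedef struct {
    mini_value value;
    uint32_t   type_info;
    uint32_t   u2;
} with_u2;

/* Bytes the compiler added after type_info (typically 4 on 64-bit ABIs). */
static size_t mini_tail_padding(void) {
    return sizeof(without_u2) - (sizeof(mini_value) + sizeof(uint32_t));
}

static void mini_demo(void) {
    /* The extra field starts right after type_info. */
    assert(offsetof(with_u2, u2) == 12);
    assert(sizeof(with_u2) >= sizeof(without_u2));
}
```

On ABIs where `mini_tail_padding()` returns 4, both structs have the same size, so the `u2` field costs nothing.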
| 115 | + |
| 116 | +## Macros |
| 117 | + |
| 118 | +The fields in `zval` should never be accessed directly. Instead, there is a plethora of macros to access them, |
| 119 | +concealing some of the implementation details of the `zval` struct. For many macros, there's a `_P`-suffixed variant |
| 120 | +that performs the same operation on a pointer to the given `zval`. |
| 121 | +
|
| 122 | +| Macro | Description | |
| 123 | +| ----------------------- | --------------------------------------------------------------------------------------- | |
| 124 | +| `Z_TYPE[_P]` | Access the `zval.u1.v.type` part of the type flags, containing the `IS_*` type. | |
| 125 | +| `Z_LVAL[_P]` | Access the underlying `int` value. | |
| 126 | +| `Z_DVAL[_P]` | Access the underlying `float` value. | |
| 127 | +| `Z_STR[_P]` | Access the underlying `zend_string` pointer. | |
| 128 | +| `Z_STRVAL[_P]` | Access the string's raw `char *` pointer. | |
| 129 | +| `Z_STRLEN[_P]` | Access the string's length. | |
| 130 | +| `ZVAL_COPY_VALUE(t, s)` | Copy one `zval` to another, including type and value. | |
| 131 | +| `ZVAL_COPY(t, s)` | Same as `ZVAL_COPY_VALUE`, but if the value is reference counted, increase the counter. | |
| 132 | + |
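The relationship between a macro and its `_P` variant can be illustrated with a small standalone model. Everything prefixed `MINI_` or `mini_` below is hypothetical and only imitates the shape of the real macros; the real definitions live in php-src and cover many more cases.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    union { int64_t lval; double dval; } value;
    uint32_t type_info;
} mini_zval;

#define MINI_IS_LONG 4

/* Base variants take the struct itself ... */
#define MINI_Z_TYPE_INFO(zv)  ((zv).type_info)
#define MINI_Z_LVAL(zv)       ((zv).value.lval)

/* ... and the _P variants dereference a pointer, then reuse the base form. */
#define MINI_Z_TYPE_INFO_P(p) MINI_Z_TYPE_INFO(*(p))
#define MINI_Z_LVAL_P(p)      MINI_Z_LVAL(*(p))

/* Store an integer: set the payload and the tag together. */
#define MINI_ZVAL_LONG(p, n) do {            \
        MINI_Z_LVAL_P(p) = (n);              \
        MINI_Z_TYPE_INFO_P(p) = MINI_IS_LONG; \
    } while (0)

/* Copy payload and tag, without touching any reference counter. */
#define MINI_ZVAL_COPY_VALUE(t, s) do {      \
        (t)->value = (s)->value;             \
        (t)->type_info = (s)->type_info;     \
    } while (0)

static void mini_demo(void) {
    mini_zval a, b;
    MINI_ZVAL_LONG(&a, 7);
    MINI_ZVAL_COPY_VALUE(&b, &a);
    assert(MINI_Z_LVAL(b) == 7);
}
```

Routing every access through macros like these is what lets the struct layout change without touching the calling code.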
| 133 | +<!-- FIXME: There are many more. --> |
| 134 | + |
| 135 | +## Other `zval` types |
| 136 | + |
| 137 | +`zval`s are sometimes used internally with types that don't exist in userland. |
| 138 | + |
| 139 | +```c |
| 140 | +#define IS_CONSTANT_AST 11 |
| 141 | +#define IS_INDIRECT 12 |
| 142 | +#define IS_PTR 13 |
| 143 | +#define IS_ALIAS_PTR 14 |
| 144 | +#define _IS_ERROR 15 |
| 145 | +``` |
| 146 | + |
| 147 | +`IS_CONSTANT_AST` is used to represent constant values (the right hand side of `const`, property/parameter initializers, |
| 148 | +etc.) before they are evaluated. The evaluation of a constant expression is not always possible during compilation, |
| 149 | +because they may contain references to values only available at runtime. Until that evaluation is possible, the |
| 150 | +constants contain the AST of the expression rather than the concrete values. Check the [parser]() chapter for more |
| 151 | +information on ASTs. When this flag is set, the `zval.value.ast` union member is set accordingly. |
| 152 | + |
| 153 | +`IS_INDIRECT` indicates that the `zval.value.zv` member is populated. This field stores a pointer to some other `zval`. |
| 154 | +This type is mainly used in two situations, namely for intermediate values between `FETCH` and `ASSIGN` instructions, |
| 155 | +and for the sharing of variables in the symbol table. |
| 156 | + |
| 157 | +<!-- TODO: The above should be described in more detail somewhere else. --> |
| 158 | + |
| 159 | +`IS_PTR` is used for pointers to arbitrary data. Most commonly, this type is used internally for `HashTable`, as |
| 160 | +`HashTable` may only store `zval` values. For example, `EG(class_table)` represents the class table, which is a hash map |
| 161 | +of class names to the corresponding `zend_class_entry`, representing the class. The same goes for functions and many |
| 162 | +other data types. `IS_ALIAS_PTR` is used for class aliases registered via `class_alias`. Essentially, it just allows |
| 163 | +differentiating between entries in the class table that are aliases and those that are actual classes. Otherwise, it is |
| 164 | +the same as `IS_PTR`. Arbitrary data is accessed through `zval.value.ptr` and cast to the correct type depending on |
| 165 | +context. If `ptr` stores a class or function, the `zval.value.ce` or `zval.value.func` fields may be used, respectively. |
| 166 | + |
| 167 | +`_IS_ERROR` is used as an error value for some [object handlers](). It is described in more detail in its own chapter. |
| 168 | + |
| 169 | +```c |
| 170 | +/* Fake types used only for type hinting. |
| 171 | + * These are allowed to overlap with the types below. */ |
| 172 | +#define IS_CALLABLE 12 |
| 173 | +#define IS_ITERABLE 13 |
| 174 | +#define IS_VOID 14 |
| 175 | +#define IS_STATIC 15 |
| 176 | +#define IS_MIXED 16 |
| 177 | +#define IS_NEVER 17 |
| 178 | + |
| 179 | +/* used for casts */ |
| 180 | +#define _IS_BOOL 18 |
| 181 | +#define _IS_NUMBER 19 |
| 182 | +``` |
| 183 | + |
| 184 | +These flags are never actually stored in `zval.u1`. They are used for type hinting and in the [object handler]() API. |
| 185 | + |
| 186 | +This only leaves the `zval.value.ww` field. In short, this field is used on 32-bit platforms when copying data from one |
| 187 | +`zval` to another. Normally, `zval.value.counted` is copied as a generic value, no matter what the actual underlying |
| 188 | +type is. `zend_value` always consists of 8 bytes due to the `double` field. On 32-bit platforms, however, pointers consist |
| 189 | +of only 4 bytes. Because we would otherwise miss the other 4 bytes, they are copied manually using `z->value.ww.w2 = _w2;`. |
| 190 | +This happens in the `ZVAL_COPY_VALUE_EX` macro, so you won't ever have to care about this. |