Commit 59050fe

[skip ci] Add page for zval
1 parent 8a20f11 commit 59050fe

2 files changed: +191 -1 lines changed

docs/src/SUMMARY.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 - [Overview]()
 - [Startup/shutdown]()
 - [Data structures]()
-- [zval]()
+- [zval](./core/data-structures/zval.md)
 - [zend_string]()
 - [zend_array]()
 - [zend_object]()

docs/src/core/data-structures/zval.md

Lines changed: 190 additions & 0 deletions
@@ -0,0 +1,190 @@
# zval

PHP is a dynamic language. As such, a variable can typically contain a value of any type, and the type of the variable
may even change during the execution of the program. Under the hood, this is implemented through the `zval` struct. It
is one of the most important data structures in php-src. It is essentially a "tagged union", meaning it consists of an
integer tag, representing the type of the variable, and a union for the value itself. Let's look at the value first.

## `zend_value`

```c
typedef union _zend_value {
    zend_long         lval; /* long value, i.e. int. */
    double            dval; /* double value, i.e. float. */
    zend_refcounted  *counted;
    zend_string      *str;
    zend_array       *arr;
    zend_object      *obj;
    zend_resource    *res;
    zend_reference   *ref;
    // Less important for now.
    zend_ast_ref     *ast;
    zval             *zv;
    void             *ptr;
    zend_class_entry *ce;
    zend_function    *func;
    struct {
        uint32_t w1;
        uint32_t w2;
    } ww;
} zend_value;
```

A C union is a data type that is just big enough to hold the largest of its members, and all members share the same
storage. As such, it can hold exactly one of its members at a time. For example, `zend_value` may store the `lval`
member, or the `dval` member, but never both at the same time. Remembering exactly _which_ member is stored is our job.
That's what the `zval` types are for.

If you are a PHP developer, the top members should sound pretty familiar, with the exception of `counted`. `counted`
refers to any of the values that use [reference counting]() to determine the lifetime of a value. This includes strings,
arrays, objects, resources and references. All of these will be discussed in their own chapters. You may be thinking
that some values are missing, most notably `null` and `bool`. These values don't hold any auxiliary data, but consist
solely of the `zval` type.
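To make the idea of a tagged union concrete, here is a minimal sketch in plain C. It does not use php-src: the `demo_*` names and the `DEMO_*` tags are hypothetical stand-ins, chosen only to show how a tag tells the reader which union member is valid, and how `true`, `false` and `null` can live in the tag alone.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tags, loosely modelled on the idea of a type tag. */
enum demo_type { DEMO_NULL, DEMO_FALSE, DEMO_TRUE, DEMO_LONG, DEMO_DOUBLE };

/* Simplified stand-in for zend_value: only one member is valid at a time. */
typedef union {
    int64_t lval;
    double  dval;
} demo_value;

/* Simplified stand-in for zval: the tag records which member is active. */
typedef struct {
    demo_value value;
    uint8_t    type;
} demo_zval;

static demo_zval demo_make_long(int64_t n) {
    demo_zval z;
    z.value.lval = n;
    z.type = DEMO_LONG;
    return z;
}

static demo_zval demo_make_bool(int b) {
    demo_zval z;
    z.value.lval = 0; /* Unused: the tag alone carries the information. */
    z.type = b ? DEMO_TRUE : DEMO_FALSE;
    return z;
}

/* Convert to double, consulting the tag before touching the union. */
static double demo_to_double(const demo_zval *z) {
    switch (z->type) {
        case DEMO_LONG:   return (double) z->value.lval;
        case DEMO_DOUBLE: return z->value.dval;
        case DEMO_TRUE:   return 1.0;
        default:          return 0.0; /* DEMO_NULL and DEMO_FALSE */
    }
}
```

Reading `value.dval` from a value created by `demo_make_long` would compile, but it would reinterpret the integer's bytes as a float. The tag check in `demo_to_double` is what prevents that.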

## `zval` types

```c
#define IS_UNDEF     0 /* A variable that was never written to. */
#define IS_NULL      1
#define IS_FALSE     2
#define IS_TRUE      3
#define IS_LONG      4 /* An integer value. */
#define IS_DOUBLE    5 /* A floating point value. */
#define IS_STRING    6
#define IS_ARRAY     7
#define IS_OBJECT    8
#define IS_RESOURCE  9
#define IS_REFERENCE 10
```

These simple integers determine what value is currently stored in `zend_value`. Together, the value and the tag make up
the `zval`, along with some other fields. Note how `IS_NULL`, `IS_FALSE` and `IS_TRUE` are actually `zval` types. This
explains why they are absent from `zend_value`.

Finally, here's what the `zval` struct actually looks like. This may look intimidating at first. Don't worry, we'll go
over it step by step.

```c
typedef struct _zval_struct zval;

struct _zval_struct {
    zend_value value;
    union {
        uint32_t type_info;
        struct {
            ZEND_ENDIAN_LOHI_3(
                uint8_t type, /* active type */
                uint8_t type_flags,
                union {
                    uint16_t extra; /* not further specified */
                } u)
        } v;
    } u1;
    union {
        uint32_t next;           /* hash collision chain */
        uint32_t cache_slot;     /* cache slot (for RECV_INIT) */
        uint32_t opline_num;     /* opline number (for FAST_CALL) */
        uint32_t lineno;         /* line number (for ast nodes) */
        uint32_t num_args;       /* arguments number for EX(This) */
        uint32_t fe_pos;         /* foreach position */
        uint32_t fe_iter_idx;    /* foreach iterator index */
        uint32_t guard;          /* recursion and single property guard */
        uint32_t constant_flags; /* constant flags */
        uint32_t extra;          /* not further specified */
    } u2;
};
```

`zval.value` reserves space for the actual variable data, if the type requires any.

`zval.u1` stores the type of the variable. This refers to the `IS_*` constants above. You may be wondering why this is a
`union`. In short, this field is used not only for the `IS_*` constants, but also for some other flags. The entire
`type_info` consists of 4 bytes. `zval.u1.v.type`, the lowest byte, is used for the `IS_*` constants.
`zval.u1.v.type_flags` is used for the `IS_TYPE_REFCOUNTED` and `IS_TYPE_COLLECTABLE` flags. They will be discussed
within the [reference counting]() chapter. `zval.u1.v.u.extra` (wrapped in the otherwise redundant `u` union) is
currently only used for the `IS_STATIC_VAR_UNINITIALIZED` flag, which is somewhat of an edge case we won't get into
here. So, `zval.u1.type_info` and `zval.u1.v` are essentially two ways to access the same data. The
`ZEND_ENDIAN_LOHI_3` macro is used to guarantee ordering of bytes across big- and little-endian architectures.
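The "two ways to access the same data" point can be sketched in standalone C. This is a simplified illustration, not the php-src definition: `demo_u1`, `demo_pack` and `DEMO_LOHI_3` are hypothetical names, the nested `u` union is flattened into a plain `extra` field, and the byte-order check assumes a compiler that defines `__BYTE_ORDER__` (GCC and Clang do), falling back to little-endian otherwise.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ZEND_ENDIAN_LOHI_3: declares three fields so that
 * "lo" always overlaps the least significant bytes of the 32-bit word. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
# define DEMO_LOHI_3(lo, mi, hi) hi; mi; lo;
#else
# define DEMO_LOHI_3(lo, mi, hi) lo; mi; hi;
#endif

/* Simplified stand-in for zval.u1. */
typedef union {
    uint32_t type_info;
    struct {
        DEMO_LOHI_3(
            uint8_t  type,
            uint8_t  type_flags,
            uint16_t extra)
    } v;
} demo_u1;

/* Pack type, flags and extra into one 32-bit word: type in the lowest byte,
 * flags in the next byte, extra in the upper two bytes. */
static uint32_t demo_pack(uint8_t type, uint8_t flags, uint16_t extra) {
    return (uint32_t) type | ((uint32_t) flags << 8) | ((uint32_t) extra << 16);
}
```

Writing `type_info` once sets all three fields at the same time, while `v.type` reads or writes only the lowest byte. Without the field reordering, the same struct on a big-endian machine would place `type` in the most significant byte instead.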

If you're familiar with C, you'll know that the compiler likes to add padding to structures with "odd" sizes. It does
that because the CPU can work with some offsets more efficiently than others. Ignoring the `zval.u2` field for a second,
our struct would be 12 bytes in total, 8 coming from `zval.value` and 4 from `zval.u1`. A compiler on a 64-bit
architecture will generally bump this to 16 bytes by adding 4 bytes of useless padding. If this padding is added anyway,
we might as well make use of it. `zval.u2` is often unoccupied, but provides 4 additional bytes to be used in various
contexts. How exactly the value is used depends on the use case, but it's important to remember that it may only be used
for one of them at a time.
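The padding argument can be checked with `sizeof`. The sketch below uses hypothetical `demo_*` structs rather than the real `zval`, and the 4 bytes of padding it reports are an assumption about typical 64-bit ABIs (such as x86-64 or AArch64), not something the C standard guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for zend_value: 8 bytes, usually 8-byte aligned. */
typedef union { int64_t lval; double dval; void *ptr; } demo_value;

/* Without u2: 8 + 4 = 12 bytes of actual fields. */
typedef struct {
    demo_value value;
    uint32_t   u1;
} demo_zval_no_u2;

/* With u2: 8 + 4 + 4 = 16 bytes of actual fields. */
typedef struct {
    demo_value value;
    uint32_t   u1;
    uint32_t   u2;
} demo_zval_with_u2;

/* Bytes the compiler added on top of the declared fields. */
static size_t demo_padding_without_u2(void) {
    return sizeof(demo_zval_no_u2) - (sizeof(demo_value) + sizeof(uint32_t));
}

static size_t demo_padding_with_u2(void) {
    return sizeof(demo_zval_with_u2) - (sizeof(demo_value) + 2 * sizeof(uint32_t));
}
```

On such a platform both structs have the same size, so `u2` costs nothing extra: it occupies the bytes that would otherwise be padding.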

## Macros

The fields in `zval` should never be accessed directly. Instead, there is a plethora of macros to access them,
concealing some of the implementation details of the `zval` struct. For many macros, there's a `_P`-suffixed variant
that performs the same operation on a pointer to the given `zval`.

| Macro                   | Description                                                                             |
| ----------------------- | --------------------------------------------------------------------------------------- |
| `Z_TYPE[_P]`            | Access the `zval.u1.v.type` part of the type info, containing the `IS_*` type.          |
| `Z_LVAL[_P]`            | Access the underlying `int` value.                                                      |
| `Z_DVAL[_P]`            | Access the underlying `float` value.                                                    |
| `Z_STR[_P]`             | Access the underlying `zend_string` pointer.                                            |
| `Z_STRVAL[_P]`          | Access the string's raw `char *` pointer.                                               |
| `Z_STRLEN[_P]`          | Access the string's length.                                                             |
| `ZVAL_COPY_VALUE(t, s)` | Copy one `zval` to another, including type and value.                                   |
| `ZVAL_COPY(t, s)`       | Same as `ZVAL_COPY_VALUE`, but if the value is reference counted, increase the counter. |

<!-- FIXME: There are many more. -->
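To illustrate the pattern behind the value and `_P` variants, and the difference between the two copy macros, here is a rough standalone sketch. Everything prefixed with `DEMO_` or `demo_` is hypothetical and deliberately simplified; these are not the php-src definitions, and the refcount handling only covers a single made-up string type.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins; not the php-src definitions. */
#define DEMO_IS_LONG   4
#define DEMO_IS_STRING 6

typedef struct { uint32_t refcount; const char *val; } demo_string;

typedef struct {
    union { int64_t lval; demo_string *str; } value;
    uint8_t type;
} demo_zval;

/* The plain variants take the struct itself... */
#define DEMO_Z_TYPE(zv) ((zv).type)
#define DEMO_Z_LVAL(zv) ((zv).value.lval)
#define DEMO_Z_STR(zv)  ((zv).value.str)
/* ...and the _P variants dereference a pointer, then defer to the above. */
#define DEMO_Z_TYPE_P(zv_p) DEMO_Z_TYPE(*(zv_p))
#define DEMO_Z_LVAL_P(zv_p) DEMO_Z_LVAL(*(zv_p))
#define DEMO_Z_STR_P(zv_p)  DEMO_Z_STR(*(zv_p))

/* In the spirit of ZVAL_COPY_VALUE: copy type and value, nothing else. */
static void demo_copy_value(demo_zval *dst, const demo_zval *src) {
    *dst = *src;
}

/* In the spirit of ZVAL_COPY: also bump the counter of refcounted values. */
static void demo_copy(demo_zval *dst, const demo_zval *src) {
    demo_copy_value(dst, src);
    if (DEMO_Z_TYPE_P(dst) == DEMO_IS_STRING) {
        DEMO_Z_STR_P(dst)->refcount++;
    }
}
```

Because callers only go through the accessors, the layout of `demo_zval` could change without touching any code that uses it, which is the motivation the text gives for the real macros.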

## Other `zval` types

`zval`s are sometimes used internally with types that don't exist in userland.

```c
#define IS_CONSTANT_AST 11
#define IS_INDIRECT     12
#define IS_PTR          13
#define IS_ALIAS_PTR    14
#define _IS_ERROR       15
```

`IS_CONSTANT_AST` is used to represent constant values (the right hand side of `const`, property/parameter initializers,
etc.) before they are evaluated. The evaluation of a constant expression is not always possible during compilation,
because it may contain references to values only available at runtime. Until that evaluation is possible, the
constants contain the AST of the expression rather than the concrete values. Check the [parser]() chapter for more
information on ASTs. When this type is set, the `zval.value.ast` union member is set accordingly.

`IS_INDIRECT` indicates that the `zval.value.zv` member is populated. This field stores a pointer to some other `zval`.
This type is mainly used in two situations, namely for intermediate values between `FETCH` and `ASSIGN` instructions,
and for the sharing of variables in the symbol table.

<!-- TODO: The above should be described in more detail somewhere else. -->
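As a small illustration of what "a pointer to some other `zval`" means in practice, the sketch below follows one level of indirection on a simplified struct. The `demo_*` names are hypothetical and the helper is not a php-src function; only the numeric value 12 is taken from the list above.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_IS_LONG     4
#define DEMO_IS_INDIRECT 12

/* Simplified stand-in for zval that can point at another instance. */
typedef struct demo_zval demo_zval;
struct demo_zval {
    union { int64_t lval; demo_zval *zv; } value;
    uint8_t type;
};

/* Follow a single level of indirection, if there is one. */
static demo_zval *demo_deindirect(demo_zval *z) {
    if (z->type == DEMO_IS_INDIRECT) {
        return z->value.zv;
    }
    return z;
}
```

A write through the result of `demo_deindirect` lands in the target slot, so two places that refer to the same target observe the same value. That is the sharing behaviour the text describes for the symbol table.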

`IS_PTR` is used for pointers to arbitrary data. Most commonly, this type is used internally for `HashTable`, as
`HashTable` may only store `zval` values. For example, `EG(class_table)` represents the class table, which is a hash map
of class names to the corresponding `zend_class_entry`, representing the class. The same goes for functions and many
other data types. `IS_ALIAS_PTR` is used for class aliases registered via `class_alias`. Essentially, it just allows
differentiating between entries in the class table that are aliases and those that are actual classes. Otherwise, it is
the same as `IS_PTR`. Arbitrary data is accessed through `zval.value.ptr`, and cast to the correct type depending on
context. If `ptr` stores a class or function, the `zval.value.ce` or `zval.value.func` fields may be used, respectively.
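The `IS_PTR` idea, wrapping an arbitrary pointer so that it fits into a container that only stores `zval`s, can be sketched as follows. The types and helpers are hypothetical simplifications; in particular, `demo_class_entry` has nothing in common with the real `zend_class_entry` beyond playing the same role in the example.

```c
#include <assert.h>

#define DEMO_IS_PTR 13

/* Hypothetical payload type standing in for a class entry. */
typedef struct { const char *name; } demo_class_entry;

/* Simplified stand-in for zval with a generic and a typed pointer member. */
typedef struct {
    union { void *ptr; demo_class_entry *ce; } value;
    unsigned char type;
} demo_zval;

/* Wrap an arbitrary pointer so it can live in a zval-only container. */
static demo_zval demo_wrap_ptr(void *p) {
    demo_zval z;
    z.value.ptr = p;
    z.type = DEMO_IS_PTR;
    return z;
}

/* The caller knows from context that this slot holds a class entry, so it
 * reads the typed member instead of casting value.ptr by hand. */
static const char *demo_class_name(const demo_zval *z) {
    assert(z->type == DEMO_IS_PTR);
    return z->value.ce->name;
}
```

Nothing in the wrapped value records what `ptr` points to. As with the real type, correctness depends entirely on the context in which the value is read.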

`_IS_ERROR` is used as an error value for some [object handlers](). It is described in more detail in its own chapter.

```c
/* Fake types used only for type hinting.
 * These are allowed to overlap with the types below. */
#define IS_CALLABLE 12
#define IS_ITERABLE 13
#define IS_VOID     14
#define IS_STATIC   15
#define IS_MIXED    16
#define IS_NEVER    17

/* used for casts */
#define _IS_BOOL    18
#define _IS_NUMBER  19
```

These flags are never actually stored in `zval.u1`, which is why some of their values can overlap with the internal
types listed earlier. They are used for type hinting and in the [object handler]() API.

This only leaves the `zval.value.ww` field. In short, this field is used on 32-bit platforms when copying data from one
`zval` to another. Normally, `zval.value.counted` is copied as a generic value, no matter what the actual underlying
type is. `zend_value` always consists of 8 bytes due to the `double` field. On 32-bit platforms, however, pointers
consist of only 4 bytes. Because we would otherwise miss the other 4 bytes, they are copied manually using
`z->value.ww.w2 = _w2;`. This happens in the `ZVAL_COPY_VALUE_EX` macro, so you won't ever have to care about this.
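The reason for the two 32-bit words can be illustrated with a simplified union. This sketch is not the `ZVAL_COPY_VALUE_EX` macro: `demo_value` and `demo_copy_words` are hypothetical, and the helper copies both words explicitly instead of copying a pointer plus the second word.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for zend_value. */
typedef union {
    double dval;                             /* always 8 bytes */
    void  *ptr;                              /* only 4 bytes on a 32-bit platform */
    struct { uint32_t w1; uint32_t w2; } ww; /* both halves, addressable separately */
} demo_value;

/* Copy the value as two explicit 32-bit words, so that all 8 bytes are
 * transferred even where a pointer-sized copy would only move 4. */
static void demo_copy_words(demo_value *dst, const demo_value *src) {
    uint32_t w1 = src->ww.w1;
    uint32_t w2 = src->ww.w2;
    dst->ww.w1 = w1;
    dst->ww.w2 = w2;
}
```

A `double` stored in the source survives the copy because both halves are moved. Copying only `ptr` on a 32-bit platform would leave the second word of the destination untouched.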