Description
I haven't properly profiled anything. Throwing this idea here to see what people think.
Consider this fairly common pattern in Python:
for i in range(len(x)):
    ...
Bytecode snippet:
6 GET_ITER
>> 8 FOR_ITER 2 (to 14)
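For context, that snippet comes from disassembling a function like the one below (exact offsets and oparg values vary across CPython versions):

```python
import dis

def f(x):
    for i in range(len(x)):
        pass

# Disassemble to see the GET_ITER / FOR_ITER pair driving the loop.
dis.dis(f)
```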
Anecdotally, some users are surprised at how much overhead this has. In most simple for loops, users intend the range object to behave like the equivalent `for (int i = 0; i < x; i++)` loop in C. The range object is created and then thrown away immediately.
In those cases, calling `next(range_iterator)` in `FOR_ITER` is unnecessary overhead. We can unbox this into a simple `PyLong` object and use `PyLong_Add` on it directly. This could be a `FOR_ITER_RANGE` opcode.
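To illustrate the intent (the real change would live in the C eval loop, and `FOR_ITER_RANGE` is hypothetical), here is the transformation sketched at the Python level:

```python
def range_loop_generic(n):
    # The generic path: GET_ITER creates a range_iterator, and FOR_ITER
    # calls its tp_iternext slot on every pass.
    total = 0
    for i in range(n):
        total += i
    return total

def range_loop_unboxed(n):
    # What a hypothetical FOR_ITER_RANGE would do in spirit: skip the
    # iterator object entirely and bump a bare counter, like C's
    # for (int i = 0; i < n; i++).
    total = 0
    i = 0
    while i < n:
        total += i
        i += 1  # plain add, no iterator protocol
    return total
```

At the Python level the two are semantically identical; the win would come from doing the counter bump in C without any iterator calls.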
This will have to be extremely conservative. We can only apply it to range objects with a reference count of 1 (i.e., used only by the for loop), and they must be the builtin `range`, not some monkeypatched version. Without the compiler's help, the following would be very dangerous to optimize:
x = iter(range(10))
for _ in x:
    next(x)
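Running that snippet shows why sole ownership of the iterator matters: the loop header and the loop body interleave `next()` calls on the same iterator, so an optimizer that assumed it owned the range would compute the wrong sequence:

```python
x = iter(range(10))
loop_values, body_values = [], []
for v in x:
    loop_values.append(v)
    body_values.append(next(x))  # consumes from the same iterator

# The two consumers interleave: the loop sees the evens, the body the odds.
print(loop_values)  # [0, 2, 4, 6, 8]
print(body_values)  # [1, 3, 5, 7, 9]
```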
We could also do the same with lists and tuples (`FOR_ITER_LIST` and `FOR_ITER_TUPLE`), using the native `PyList_GetItem`/`PyTuple_GetItem` instead of the iterator protocol. But I'm not sure how common something like `for x in (1, 2, 3)` is, so maybe those aren't worth it (they're also much harder to roll back if the optimization breaks halfway).
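A Python-level sketch of that idea (the C version would guard on the exact type and fetch items with `PyList_GET_ITEM`/`PyTuple_GET_ITEM`; the function name here is just illustrative):

```python
def iterate_by_index(seq):
    # Hypothetical FOR_ITER_LIST / FOR_ITER_TUPLE in spirit: for exact
    # list/tuple instances, fetch items by index rather than through the
    # iterator protocol.
    out = []
    if type(seq) in (list, tuple):  # exact builtin types only
        for idx in range(len(seq)):
            out.append(seq[idx])
    else:
        # Generic fallback: subclasses and arbitrary iterables still go
        # through the normal iterator protocol.
        for item in seq:
            out.append(item)
    return out
```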
`FOR_ITER` isn't a common instruction, but I think it matters when it is used, because it tends to appear in loop-heavy code.
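Since I haven't profiled, here's a rough way to probe the loop overhead with `timeit`. Caveat: a Python-level `while` with a plain counter is usually *slower* than `for`/`range`, since the unboxing win only materializes in C; this just puts numbers on the loops as written, on whatever machine and CPython version you run it:

```python
import timeit

n = 10_000

# Range-based for loop: GET_ITER + FOR_ITER driving a range_iterator.
t_for = timeit.timeit("for i in range(n): pass", globals={"n": n}, number=200)

# Hand-unboxed counter loop, all at the Python level.
t_while = timeit.timeit(
    "i = 0\nwhile i < n:\n    i += 1", globals={"n": n}, number=200
)

print(f"for/range: {t_for:.4f}s  while/counter: {t_while:.4f}s")
```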
What do y'all think?