Feature or enhancement
Proposal:
Currently, a C call is executed roughly as follows in the specializing interpreter:
(py_func->m_ml->ml_meth)(...args)
This has two sources of overhead:
1. The double pointer lookup, while cheap, is still overhead we can remove.
2. The JIT cannot inline the function without PGO (profile-guided optimization), which the JIT currently does not have and will probably never have.
We can optimize it to the following in the JIT:
PyCFunction cfunc = LOOKUP_TABLE[1..n]; // Via replicate(n)
DEOPT_IF(cfunc != py_func->m_ml->ml_meth);
cfunc(...args);
LOOKUP_TABLE will be populated with common C functions that we know Python code uses.
This removes the second source of overhead (the inlining barrier), allowing the JIT to inline and optimize these calls.
If we want, there is an even more aggressive optimization: burn the C function directly into the JIT code and call it, which also removes the first source of overhead (the double pointer lookup). However, I don't think this can be done without breaking unusual code that sets ml_meth dynamically, so I would be more cautious with that.
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
No response