Skip to content

SAVE_OPLINE before allocating memory #12648

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

morrisonlevi
Copy link
Contributor

If the engine OOMs at these locations, then they may sigsegv instead of doing a fatal error. Additionally, extensions that look at oplines may also crash.

Usually, the opline will either be null, or be a non-null pointer to a valid (but wrong) opline. So usually, we do not observe crashes.

This is still in draft because we found these through reading opcode handlers, and I want to try to modify the engine to help me find additional cases where it's wrong.

If the engine OOMs at these locations, then they may sigsegv instead
of doing a fatal error. Additionally, extensions that look at oplines
may also crash.
@iluuu1994
Copy link
Member

iluuu1994 commented Nov 10, 2023

Related: #10004

I believe opline should ever be an invalid pointer, so can't extensions check for NULL? Oplines may indeed be outdated, but requiring SAVE_OPLINE in all locations essentially disables ZEND_VM_IP_GLOBAL_REG.

@nielsdos
Copy link
Member

nielsdos commented Nov 10, 2023

then they may sigsegv instead of doing a fatal error.

Where do they segfault? Might be worth doing something similar like in #10003 instead of doing a heavy-weight fix like this.

@morrisonlevi
Copy link
Contributor Author

morrisonlevi commented Nov 10, 2023

They segfault because the opline hasn't been saved in all versions, and in at least some situations, this crashes because the opline is invalid rather than null. For instance, run the .phpt in this other commit on PHP 7.4 and you can see it sigsegvs instead of OOMs:

26c7c82

The opline is sometimes null, which is why zend_get_executed_lineno was changed to look for null pointers, but honestly that doesn't seem like a fix, rather a guard-rail. In the above case, the opline was uninitialized memory. The Datadog allocation profiler also checks for non-null pointers, and it hit this issue, which prompted further investigations which have found some of these SAVE_OPLINE fixes already e.g. ec54ffa. However, that fix is incomplete; there's a zend_array_dup above the SAVE_OPLINE.

Note that this PR doesn't call SAVE_OPLINE everywhere. There are at least some paths which do not allocate such as various JMP instructions as well as fast-paths for many math opcodes. So it does not completely undo this optimization. Knowing what I know now, I would rather take the performance hit and always initialize the opline like we did on PHP 7.3... but that's not my call.

The reason Datadog is finding these now is because we have an allocation profiler which takes stack samples based on a memory allocation rate. If you just stick to the engine, you have to OOM at very specific spots, which is vastly harder to do. So this is a bigger issue for extensions than core, but it's still an issue for core.

@iluuu1994
Copy link
Member

iluuu1994 commented Nov 10, 2023

@morrisonlevi I don't think ec54ffa tries to fix this OOMs but destructors. Destuctors need an updated opline because otherwise, if they throw, we're looking for try/catch boundaries in the wrong position.

I don't understand when/how an opline could be invalid. When a function is first executed, EX(opline) is set. Unless that pointer is corrupted the opline should not go away. Is there potentially something else going on?

If we consider OOM an error that needs an updated opline, almost every handler will need to add SAVE_OPLINE. It's also incredibly hard to understand whether some function may allocate memory somewhere internally, or remembering that when allocating some memory (even if temporary) in a function, and that function is used somewhere in the VM (even indirectly), it needs to add a SAVE_OPLINE. This seems impossible to uphold.

If we came to the conclusion that this is necessary, it would be better to move away from the global IP register so that we don't need SAVE_OPLINE and _CHECK_EXCEPTION at all. Hopefully that's not necessary.

@morrisonlevi
Copy link
Contributor Author

morrisonlevi commented Nov 11, 2023

I don't understand when/how an opline could be invalid. When a function is first executed, EX(opline) is set. Unless that pointer is corrupted the opline should not go away. Is there potentially something else going on?

Any time you hit out-of-memory, you are going to get an error which checks for the line number. In at least some cases in the past, the opline has not been initialized yet. I really don't think we've fixed all these. For example, I would not be surprised based on this patch if we could make this occur with observers: the engine initializes the run_time_cache__ptr, hit oom while doing so, and it goes poof. Calling the observers was guarded, but not the allocation of the run time cache.

Saving the opline for every allocation seems simpler than doing something more nuanced. I agree it seems impossible, but PHP 8.1 goes into security fixes mode soon...

@iluuu1994
Copy link
Member

iluuu1994 commented Nov 11, 2023

As OOM is a fatal error, we don't need opline for anything but line numbers. Inaccurate line numbers is a trade-off we were willing to accept. It would be helpful if you could provide a concrete reproducer, it's hard to justify a sledgehammer approach without fully understanding the problem.

Saving the opline for every allocation seems simpler than doing something more nuanced.

This would likely be quite a big performance regression. I don't know by how much, but it could probably be tested by disabling the global IP register feature.

@morrisonlevi
Copy link
Contributor Author

It would be helpful if you could provide a concrete reproducer, it's hard to justify a sledgehammer approach without fully understanding the problem.

Yes, I am working on that. Making reproducers is pretty hard, I think I'll have to provide an extension.

Saving the opline for every allocation seems simpler than doing something more nuanced.

This would likely be quite a big performance regression. I don't know by how much, but it could probably be tested by disabling the global IP register feature.

Having a PR so we can measure the performance overhead compared to disabling the optimization can help us know which approach is better (for performance, anyway).

Note that most opcodes which do allocations already do save the opline. This PR added roughly 10 checks, and moved one to happen earlier. If this is the extent of the needed changes (we'll see, still need to provide more data, I know), then I think we'll be fine.

I opened the PR early but in draft in this case because I really hope to get whatever fixes are necessary into PHP 8.1, which has a narrow window. I wanted to have these discussions quickly so there's actually a chance of merging in time.

@iluuu1994
Copy link
Member

Took a sec ^^ But I could find at least a handful that were missed.

  • ZEND_ROPE_END
  • ZEND_FETCH_CONSTANT
  • ZEND_INIT_ARRAY
  • ZEND_BEGIN_SILENCE
  • ZEND_BIND_GLOBAL
  • ZEND_BIND_LEXICAL
  • ZEND_BIND_INIT_STATIC_OR_JMP (>8.1)
  • ZEND_FETCH_GLOBALS
  • ZEND_MAKE_REF

You're right that surprisingly many handlers already do SAVE_OPLINE (as in the vast majority).

@morrisonlevi
Copy link
Contributor Author

We're still working on this, but with the announcement that PHP 8.1 is closed for fixes, we're not scrambling quite so hard. It's still a priority but we'll slow down and really focus on figuring out methods to expose which ones are stability issues and which are just stale. For instance, I'm pretty sure ZEND_BIND_STATIC is a stability issue but ZEND_ROPE_END is probably just stale (init or other rope operations probably already initialized the opline).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants