Remote heap feng shui / heap spraying protection #14304
Conversation
Zend/zend_alloc.c
Outdated
#ifndef ZEND_MM_HEAP_PROTECTION
# define ZEND_MM_HEAP_PROTECTION 1 /* protect heap against corruptions */
#endif
#ifndef ZEND_MM_HEAP_SPRAYING_PROTECTION
# define ZEND_MM_HEAP_SPRAYING_PROTECTION 1 /* protect against remote heap
                                               spraying or heap feng chui via
                                               environment / user input */
#endif
Is it worth having several knobs for those options? Having them all under ZEND_MM_HEAP_PROTECTION would be alright, I think.
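One hypothetical way to do that (purely illustrative, not part of the patch) would be to default the spraying knob to the main one, so a single define controls both:

#ifndef ZEND_MM_HEAP_PROTECTION
# define ZEND_MM_HEAP_PROTECTION 1 /* protect heap against corruptions */
#endif
#ifndef ZEND_MM_HEAP_SPRAYING_PROTECTION
/* assumed consolidation: follow the main protection knob by default */
# define ZEND_MM_HEAP_SPRAYING_PROTECTION ZEND_MM_HEAP_PROTECTION
#endif

Builds could still override ZEND_MM_HEAP_SPRAYING_PROTECTION explicitly if needed.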
Zend/zend_alloc.c
Outdated
#endif
#ifndef ZEND_MM_HEAP_SPRAYING_PROTECTION
# define ZEND_MM_HEAP_SPRAYING_PROTECTION 1 /* protect against remote heap
                                               spraying or heap feng chui via
Suggested change:
- spraying or heap feng chui via
+ spraying or heap feng shui via
#if ZEND_MM_HEAP_SPRAYING_PROTECTION

# define ZEND_MM_ZONES 2
Suggested change:
- # define ZEND_MM_ZONES 2
+ # define ZEND_MM_ZONES 2 /* one zone for trusted data, one for user-controlled ones.*/
I'm hesitating on this one, as I feel that how zones are used belongs to upper abstraction levels (that's also why I've defined ZEND_MM_ZONE_INPUT elsewhere). In any case I agree that a comment would help.
Fair, but I also agree that a comment would be nice :)
2M * 100 workers = waste of 200MB
This consumes an additional 2MiB of VMA per worker, but I expect the overhead of actually committed memory to be much less than that in practice, as not all pages of the extra chunk are touched.
For example, at the end of a symfony-demo request, we have this:
base: VmRSS: 65772 kB
mm-zones: VmRSS: 66020 kB
Diff: 249 kB
With USE_ZEND_ALLOC_HUGE_PAGES=1, the RSS actually increases by 2MiB, but this is to be expected with huge pages, and it is not enabled by default.
Zend/zend_alloc.c
Outdated
ZEND_API void zend_mm_input_begin(void)
{
#if ZEND_MM_HEAP_SPRAYING_PROTECTION
	AG(use_input_zone)++;
	AG(mm_heap)->zone_free_slot = ZEND_MM_ZONE_FREE_SLOT(AG(mm_heap), ZEND_MM_ZONE_INPUT);
#endif
}

ZEND_API void zend_mm_input_end(void)
{
#if ZEND_MM_HEAP_SPRAYING_PROTECTION
	AG(use_input_zone)--;
	if (!AG(use_input_zone)) {
		AG(mm_heap)->zone_free_slot = ZEND_MM_ZONE_FREE_SLOT(AG(mm_heap), ZEND_MM_ZONE_DEFAULT);
	}
#endif
}

ZEND_API bool zend_mm_check_in_input(void)
{
#if ZEND_MM_HEAP_SPRAYING_PROTECTION
	return AG(use_input_zone);
#else
	return true;
#endif
}
I think those should be named zend_mm_userinput_…
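For context, a call site would presumably bracket request-input parsing with these functions, roughly like this (hypothetical usage sketch, not taken from the patch):

	zend_mm_input_begin();
	/* ... parse the query string, POST body and cookies into the superglobals,
	 * so the resulting strings and hashtable buckets land in the input zone ... */
	zend_mm_input_end();

The counter in AG(use_input_zone) lets such sections nest safely, since the default zone is only restored when the outermost section ends.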
Zend/zend_alloc.c
Outdated
@@ -2603,6 +2734,7 @@ ZEND_API size_t ZEND_FASTCALL _zend_mm_block_size(zend_mm_heap *heap, void *ptr

typedef struct _zend_alloc_globals {
	zend_mm_heap *mm_heap;
	int use_input_zone;
Why a signed type?
That's not necessary indeed 👍
The Linux kernel has saturating types, based on PAX_REFCOUNT, to prevent wrap-around, but I think it's overkill here: an attacker shouldn't™ be able to overflow a size_t, or something is seriously wrong.
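For reference, a PAX_REFCOUNT-style saturating counter in plain C would look roughly like this (illustrative sketch only; as noted above, it is likely overkill here):

#include <stdint.h>

/* Saturating increment: once the counter reaches its maximum it stays there
 * instead of wrapping around to 0. */
static inline void saturating_inc(uint32_t *counter)
{
	if (*counter != UINT32_MAX) {
		(*counter)++;
	}
}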
Now that #14054 was merged, can you please rebase this one, so we can get it landed?
@arnaud-lb ping :)
Sorry, I didn't see your previous comment. I will get back to this PR soon.
Isolate the request environment / input in separate chunks to make it more difficult to remotely control the layout of the heap.
If this lands, please add a note about the increase in memory requirements to UPGRADING, under Other Changes.
At first glance, I don't like this. What kind of attacks will this complicate?
Zend/zend_alloc.c
Outdated
struct _zend_mm_chunk {
	zend_mm_heap       *heap;
#if ZEND_MM_HEAP_PROTECTION
	zend_mm_free_slot **zone_free_slot;
#endif
	zend_mm_chunk      *next;
	zend_mm_chunk      *prev;
	uint32_t            free_pages;         /* number of free pages */
	uint32_t            free_tail;          /* number of free pages at the end of chunk */
	uint32_t            num;
	char                reserve[64 - (sizeof(void*) * 3 + sizeof(uint32_t) * 3)];
	zend_mm_heap        heap_slot;          /* used only in main chunk */
#if ZEND_MM_HEAP_PROTECTION
	zend_mm_zone       *zone;
#endif
	zend_mm_page_map    free_map;           /* 512 bits or 64 bytes */
	zend_mm_page_info   map[ZEND_MM_PAGES]; /* 2 KB = 512 * 4 */
};
The layout of MM structures was carefully designed to be CPU cache friendly. The most useful data fields were aligned to a CPU cache line boundary (see the reserve field). This change breaks this layout.
I'm not sure if your previous changes to zend_mm_heap didn't break it already.
Thank you, I wasn't sure about the purpose of the reserve field. I've adjusted its size, and now the overhead is down to +0.04% according to Valgrind.
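To illustrate the cache-line argument with a standalone toy (not the real zend_mm_chunk; names are only indicative), the reserve field pads the hot header fields so that they fill exactly one 64-byte cache line:

#include <stdint.h>

/* Toy model of the chunk header layout: three pointers, three 32-bit counters,
 * then padding so that the hot fields end exactly at the 64-byte boundary. */
struct toy_chunk_header {
	void     *heap;
	void     *next;
	void     *prev;
	uint32_t  free_pages;
	uint32_t  free_tail;
	uint32_t  num;
	char      reserve[64 - (sizeof(void*) * 3 + sizeof(uint32_t) * 3)];
	/* colder data (heap_slot, free_map, page map, ...) starts on the next cache line */
};

_Static_assert(sizeof(struct toy_chunk_header) == 64,
               "hot chunk header fields should fill exactly one cache line");

Adding a pointer such as zone_free_slot to the header shrinks the required reserve by sizeof(void*), which is presumably the size adjustment mentioned above.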
- zend_mm_set_next_free_slot(heap, bin_num, p, heap->free_slot[bin_num]);
- heap->free_slot[bin_num] = p;
+ zend_mm_set_next_free_slot(heap, bin_num, p, ZEND_MM_FREE_SLOT_EX(heap, chunk, bin_num));
+ ZEND_MM_FREE_SLOT_EX(heap, chunk, bin_num) = p;
This introduces a regression for each deallocation.
This adds an additional fetch of chunk->zone_free_slot, but this field is just next to chunk->heap, and is likely in cache as we fetch it here:

Lines 2701 to 2702 in 53df3ae

	ZEND_MM_CHECK(chunk->heap == AG(mm_heap), "zend_mm_heap corrupted"); \
	zend_mm_free_small(AG(mm_heap), ptr, _num); \
Also, benchmark results show very little overhead (0% wall time, 0.04% valgrind)
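To make the discussion concrete, ZEND_MM_FREE_SLOT_EX presumably selects the freelist of the zone that owns the chunk; a guess at what such a macro could expand to (an assumption for illustration, not the actual patch):

#if ZEND_MM_HEAP_SPRAYING_PROTECTION
/* free into the freelists of the zone owning the chunk; zone_free_slot sits
 * right next to chunk->heap, hence the cache-locality argument above */
# define ZEND_MM_FREE_SLOT_EX(heap, chunk, bin_num) \
	((chunk)->zone_free_slot[(bin_num)])
#else
/* without zones this is simply the former heap-wide freelist access */
# define ZEND_MM_FREE_SLOT_EX(heap, chunk, bin_num) \
	((heap)->free_slot[(bin_num)])
#endif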
An attacker can precisely control the layout of the heap by sending GET or POST parameters (for example, via a carefully crafted query string). By isolating the user inputs in separate chunks, we prevent grooming the application heap via GET/POST/etc. This is useful in the context of a remote attacker with no ability to execute arbitrary code. As the overhead is very small (0% wall time / 0.04% valgrind on the symfony-demo benchmark), I believe this is worth it.
@arnaud-lb didn't you already add the protection against heap buffer overflow/underflow through the shadow pointers? Do you have the source of the exploit used in the presentation? Personally, I think that the better approach would be filtering input data and just stopping the request in case of unexpected/dangerous data detection (e.g. GET/POST with duplicate names).
Yes, but that's not the only way to exploit an out-of-bounds write. If an attacker can arrange the heap to overwrite the first bytes of an arbitrary block, there are other things they can attack. Protecting against freelist corruption specifically was worth it because it's an easy and powerful target, but there are other targets that are made practicable by arranging the layout of the heap.
I've tried to find it, but without success.
This will prevent controlling the order of elements in the freelist, but this is not absolutely necessary to control block placement in the heap. For instance, an attacker can control the order of runs in a chunk (without unsetting a GET/POST element), as well as how much they are filled, so they can arrange for a block of size N allocated by the application to be at the end of a run, and just before another run of their choice.
@cfreal could you please share the sources of the exploit used in your presentation? (better to email them to my and @arnaud-lb's public GitHub email addresses)
Also, do not abuse the userinput zone, as it may introduce a new attack surface. For example, if there is a potential vulnerability in the unserialize process and unserialize-related operations are added to the userinput zone, an attacker could use an HTTP request to arrange an ideal memory layout without needing to account for PHP internals.
Unserialize already provides enough control to an attacker, so this wouldn't change much :D But more seriously, partitioning memory by type (whether primitive types/sizes, or usage types) is part of #14083's roadmap :)
@dstogov did you have time to take a look at this?
@arnaud-lb please discuss this with @nielsdos and @iluuu1994 and make a decision.
This isolates remotely-controlled allocations (user input / request environment) in separate chunks, to make it more difficult for a remote attacker to precisely control the layout of the heap before the application starts, or to spray the heap with specific content. The general idea is similar to https://lwn.net/Articles/965837/ (not necessarily the implementation).
Design:
The heap is split into two zones, each with its own set of freelists (free_slot) and chunks, so that allocations in one zone do not impact the layout of the other. There is a notion of a current zone, from which all allocations are made. The current zone is switched to the input zone before handling the request's input, and switched back to the default zone after input handling.
A new field, heap.zone_free_slot, points to the freelist of the current zone. The only change required in allocation operations is to refer to that instead of heap.free_slot. Similarly, in free operations we find the zone_free_slot in the chunk.
realloc() allocates new blocks in the current zone (if truncation or extension is required), regardless of the original zone in which the block was allocated, so that it's not possible to profit from the layout of the original zone.
It is valid to switch zones at any time, so we can switch zones before/after handling JIT superglobals, for instance.
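As a self-contained illustration of this design (a toy model with made-up names, not PHP's actual allocator code), the sketch below shows the two per-zone freelist sets, the current-zone pointer, and the nesting counter used when switching:

#include <stdio.h>

#define TOY_BINS 4

enum { TOY_ZONE_DEFAULT = 0, TOY_ZONE_INPUT = 1, TOY_ZONES = 2 };

typedef struct toy_free_slot { struct toy_free_slot *next; } toy_free_slot;

typedef struct {
	toy_free_slot *free_slot[TOY_BINS];   /* per-zone small-bin freelists */
} toy_zone;

typedef struct {
	toy_zone        zones[TOY_ZONES];
	toy_free_slot **zone_free_slot;       /* freelists of the current zone */
	int             use_input_zone;       /* nesting counter, as in the patch */
} toy_heap;

static void toy_input_begin(toy_heap *h)
{
	h->use_input_zone++;
	h->zone_free_slot = h->zones[TOY_ZONE_INPUT].free_slot;
}

static void toy_input_end(toy_heap *h)
{
	if (--h->use_input_zone == 0) {
		h->zone_free_slot = h->zones[TOY_ZONE_DEFAULT].free_slot;
	}
}

/* Allocation fast path: identical to a single-zone allocator, except that it
 * reads zone_free_slot instead of a heap-wide free_slot array. */
static toy_free_slot *toy_alloc_small(toy_heap *h, int bin)
{
	toy_free_slot *p = h->zone_free_slot[bin];
	if (p) {
		h->zone_free_slot[bin] = p->next;
	}
	return p; /* a real allocator would carve a new run from the zone's chunks here */
}

int main(void)
{
	toy_heap h = {0};
	h.zone_free_slot = h.zones[TOY_ZONE_DEFAULT].free_slot;

	toy_input_begin(&h);          /* request input handling starts */
	(void)toy_alloc_small(&h, 0); /* would be served from the input zone */
	toy_input_end(&h);            /* back to the default zone */

	printf("back to default zone: %d\n",
	       h.zone_free_slot == h.zones[TOY_ZONE_DEFAULT].free_slot);
	return 0;
}

The property this models is that allocations made between the begin/end calls can only reuse or create free blocks of the input zone, so request input cannot shape the default zone's layout.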
Future scope would be to activate the input zone in more places (e.g. when parsing JSON, during unserialize, etc.), or to create more zones for various purposes.
Time overhead is very small (~+0% wall time, +0.13% valgrind).
The minimal memory usage is now 4MiB (two chunks) instead of 2MiB, but this does not necessarily translate to that much of an increase in RSS.
Related: #14083