
Remote heap feng shui / heap spraying protection #14304


Open · arnaud-lb wants to merge 8 commits into master

Conversation

@arnaud-lb arnaud-lb (Member) commented May 23, 2024

This isolates remotely controlled allocations (user input / request environment) in separate chunks, making it more difficult for a remote attacker to precisely control the layout of the heap before the application starts, or to spray the heap with specific content. The general idea is similar to https://lwn.net/Articles/965837/ (not necessarily the implementation).

Design:

The heap is split into two zones, each with its own set of freelists (free_slot) and chunks, so that allocations in one zone do not affect the layout of the other. There is a notion of a current zone, from which all allocations are made. The current zone is switched to the input zone before handling a request, and switched back to the default zone after input handling.

A new field, heap.zone_free_slot, points to the freelist of the current zone. The only change required in allocation operations is to refer to that instead of heap.free_slot. Similarly, free operations find the zone_free_slot through the chunk.
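
As a minimal sketch of that indirection (simplified declarations and invented names such as ZONES, BINS, alloc_small and set_zone; the real code in Zend/zend_alloc.c differs in detail):

#define ZONES 2   /* default + input */
#define BINS  30  /* small size classes; the actual count differs */

typedef struct free_slot { struct free_slot *next; } free_slot;

typedef struct heap {
	free_slot  *free_slot[ZONES][BINS]; /* one freelist set per zone */
	free_slot **zone_free_slot;         /* freelists of the current zone */
} heap;

/* Allocation reads through zone_free_slot instead of a single
 * heap-wide freelist array. */
static void *alloc_small(heap *h, int bin)
{
	free_slot *p = h->zone_free_slot[bin];
	if (p) {
		h->zone_free_slot[bin] = p->next;
		return p;
	}
	return NULL; /* would carve a new run from the current zone's chunks */
}

/* Switching zones is just repointing zone_free_slot. */
static void set_zone(heap *h, int zone)
{
	h->zone_free_slot = h->free_slot[zone];
}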

realloc() allocates new blocks in the current zone (if truncation or extension is required), regardless of the zone in which the block was originally allocated, so that it is not possible to profit from the layout of the original zone.

It is valid to switch zones at any time, so we can switch zones before/after handling JIT superglobals, for instance.
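
For instance (a sketch using the zend_mm_input_begin()/zend_mm_input_end() helpers added by this PR; php_hash_environment() is just an assumed call site for illustration):

zend_mm_input_begin();   /* current zone := input zone */
php_hash_environment();  /* $_GET/$_POST/... allocated in isolated chunks */
zend_mm_input_end();     /* current zone := default zone */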

Future scope would be to activate the input zone in more places (e.g. when parsing JSON, during unserialize, etc.), or to create more zones for various purposes.

Time overhead is very small (~+0% wall time, +0.13% valgrind).

The minimal memory usage is now 4MiB (two chunks) instead of 2MiB, but this does not necessarily translate into that much of an increase in RSS.

Related: #14083

Comment on lines 148 to 154
#ifndef ZEND_MM_HEAP_PROTECTION
# define ZEND_MM_HEAP_PROTECTION 1          /* protect heap against corruptions */
#endif
#ifndef ZEND_MM_HEAP_SPRAYING_PROTECTION
# define ZEND_MM_HEAP_SPRAYING_PROTECTION 1 /* protect against remote heap
                                               spraying or heap feng chui via
                                               environment / user input */
#endif
Contributor

Is it worth having several knobs for those options? Having them all under ZEND_MM_HEAP_PROTECTION would be alright, I think.

#endif
#ifndef ZEND_MM_HEAP_SPRAYING_PROTECTION
# define ZEND_MM_HEAP_SPRAYING_PROTECTION 1 /* protect against remote heap
                                               spraying or heap feng chui via
Contributor

Suggested change:
- spraying or heap feng chui via
+ spraying or heap feng shui via

See https://en.wikipedia.org/wiki/Feng_shui


#if ZEND_MM_HEAP_SPRAYING_PROTECTION

# define ZEND_MM_ZONES 2
Contributor

Suggested change:
- # define ZEND_MM_ZONES 2
+ # define ZEND_MM_ZONES 2 /* one zone for trusted data, one for user-controlled ones. */

Member Author

I'm hesitant about this one, as I feel that how zones are used belongs to higher abstraction levels (that's also why I've defined ZEND_MM_ZONE_INPUT elsewhere). In any case, I agree that a comment would help.

Contributor

Fair, but I also agree that a comment would be nice :)

Member

2M * 100 workers = a waste of 200MB

Member Author

This consumes an additional 2MiB of VMA per worker, but I expect the overhead of actually committed memory to be much less than that in practice, as not all pages of the extra chunk are touched.

For example, at the end of a symfony-demo request, we have this:

base:     VmRSS: 65772 kB
mm-zones: VmRSS: 66020 kB

Diff: 248 kB

With USE_ZEND_ALLOC_HUGE_PAGES=1 the RSS actually increases by 2MiB, but that is to be expected with huge pages, which are not enabled by default.

Comment on lines 2803 to 2790
ZEND_API void zend_mm_input_begin(void)
{
#if ZEND_MM_HEAP_SPRAYING_PROTECTION
	AG(use_input_zone)++;
	AG(mm_heap)->zone_free_slot = ZEND_MM_ZONE_FREE_SLOT(AG(mm_heap), ZEND_MM_ZONE_INPUT);
#endif
}

ZEND_API void zend_mm_input_end(void)
{
#if ZEND_MM_HEAP_SPRAYING_PROTECTION
	AG(use_input_zone)--;
	if (!AG(use_input_zone)) {
		AG(mm_heap)->zone_free_slot = ZEND_MM_ZONE_FREE_SLOT(AG(mm_heap), ZEND_MM_ZONE_DEFAULT);
	}
#endif
}

ZEND_API bool zend_mm_check_in_input(void)
{
#if ZEND_MM_HEAP_SPRAYING_PROTECTION
	return AG(use_input_zone);
#else
	return true;
#endif
}
Contributor

I think those should be named zend_mm_userinput_…

@@ -2603,6 +2734,7 @@ ZEND_API size_t ZEND_FASTCALL _zend_mm_block_size(zend_mm_heap *heap, void *ptr

typedef struct _zend_alloc_globals {
	zend_mm_heap *mm_heap;
	int use_input_zone;
Contributor

Why a signed type?

Member Author

That's not necessary indeed 👍

Contributor

The Linux kernel has saturating types, based on PAX_REFCOUNT, to prevent wrap-around, but I think it's overkill here: an attacker shouldn't™ be able to overflow a size_t, or something is seriously wrong.

@arnaud-lb arnaud-lb changed the title from “Remote heap feng chui / heap spraying protection” to “Remote heap feng shui / heap spraying protection” on May 29, 2024
@jvoisin jvoisin (Contributor) commented Jul 10, 2024

Now that #14054 was merged, can you please rebase this one, so we can get it landed?

@jvoisin jvoisin (Contributor) commented Nov 4, 2024

@arnaud-lb ping :)

@arnaud-lb arnaud-lb (Member Author)

Sorry, I didn't see your previous comment. I will get back to this PR soon.

Isolate request environment / input in separate chunks to make it more
difficult to remotely control the layout of the heap.
@arnaud-lb arnaud-lb marked this pull request as ready for review November 7, 2024 14:00
@Girgias Girgias (Member) left a comment

If this lands, please add a note about the increase in memory requirements to UPGRADING, under other changes.

@dstogov dstogov (Member) left a comment

At first look, I don't like this. What kind of attacks will this complicate?

Comment on lines 371 to 388
struct _zend_mm_chunk {
	zend_mm_heap       *heap;
#if ZEND_MM_HEAP_PROTECTION
	zend_mm_free_slot **zone_free_slot;
#endif
	zend_mm_chunk      *next;
	zend_mm_chunk      *prev;
	uint32_t            free_pages;          /* number of free pages */
	uint32_t            free_tail;           /* number of free pages at the end of chunk */
	uint32_t            num;
	char                reserve[64 - (sizeof(void*) * 3 + sizeof(uint32_t) * 3)];
	zend_mm_heap        heap_slot;           /* used only in main chunk */
#if ZEND_MM_HEAP_PROTECTION
	zend_mm_zone       *zone;
#endif
	zend_mm_page_map    free_map;            /* 512 bits or 64 bytes */
	zend_mm_page_info   map[ZEND_MM_PAGES];  /* 2 KB = 512 * 4 */
};
Member

The layout of the MM structures was carefully designed to be CPU cache friendly. The most frequently used data fields were aligned to a CPU cache line boundary (see the reserve field). This change breaks that layout.

I'm not sure if your previous changes to zend_mm_heap didn't break it already.

Member Author

Thank you, I wasn't sure about the purpose of the reserve field. I've adjusted its size, and now the overhead is down to +0.04% according to Valgrind.
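
For reference, a sketch of that adjustment (hypothetical struct, not the PR's exact code): with the extra pointer, the reserve padding must subtract four pointers instead of three so that the chunk header still fills exactly one 64-byte cache line:

#include <stdint.h>

struct chunk_header {
	void     *heap;
	void     *zone_free_slot; /* field added by this PR */
	void     *next;
	void     *prev;
	uint32_t  free_pages;
	uint32_t  free_tail;
	uint32_t  num;
	/* was sizeof(void*) * 3 before zone_free_slot was added */
	char      reserve[64 - (sizeof(void*) * 4 + sizeof(uint32_t) * 3)];
};

_Static_assert(sizeof(struct chunk_header) == 64,
               "header must fill exactly one cache line");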


Comment on lines -1435 to +1523
- zend_mm_set_next_free_slot(heap, bin_num, p, heap->free_slot[bin_num]);
- heap->free_slot[bin_num] = p;
+ zend_mm_set_next_free_slot(heap, bin_num, p, ZEND_MM_FREE_SLOT_EX(heap, chunk, bin_num));
+ ZEND_MM_FREE_SLOT_EX(heap, chunk, bin_num) = p;
Member

This introduces a regression for each deallocation.

@arnaud-lb arnaud-lb (Member Author) commented Nov 11, 2024

This adds an additional fetch of chunk->zone_free_slot, but this field sits right next to chunk->heap and is likely already in cache, as we fetch chunk->heap just before:

php-src/Zend/zend_alloc.c, lines 2701 to 2702 (at 53df3ae):

ZEND_MM_CHECK(chunk->heap == AG(mm_heap), "zend_mm_heap corrupted"); \
zend_mm_free_small(AG(mm_heap), ptr, _num); \

Also, benchmark results show very little overhead (0% wall time, 0.04% valgrind).
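
For context, a plausible expansion of that helper (a sketch consistent with the design description above, not necessarily the PR's exact macro): free operations resolve the freelists of the zone the block was allocated in through the chunk's back-pointer, rather than through the heap's current zone:

#define ZEND_MM_FREE_SLOT_EX(heap, chunk, bin_num) \
	((chunk)->zone_free_slot[(bin_num)])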

@arnaud-lb arnaud-lb (Member Author) commented Nov 11, 2024

@dstogov

What kind of attacks will this complicate?

An attacker can precisely control the layout of the heap by sending GET or POST parameters. For example, the query string ?a=aaa...&b=bbb...&c=ccc...&b= allocates 4 blocks of arbitrary size, frees the second one, and puts it at the top of the freelist. So the attacker is able to precisely control what is allocated where during application execution, and what is next to it in memory. This greatly facilitates the exploitation of all kinds of memory safety issues. See https://www.youtube.com/watch?v=-FXvUe0tySM&t=1233s (starting at 18:28) for example.
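
To make the mechanics concrete, here is a toy LIFO freelist for a single size class (illustrative only, not the Zend allocator) showing why the freed block is handed straight back to the next same-size allocation:

#include <assert.h>
#include <stddef.h>

typedef struct slot { struct slot *next; } slot;

static slot *free_head;
static union block { slot s; char bytes[64]; } arena[8];
static size_t used;

static void *toy_alloc(void)
{
	if (free_head) {               /* reuse the most recently freed block */
		slot *p = free_head;
		free_head = p->next;
		return p;
	}
	return &arena[used++];         /* otherwise carve a fresh block */
}

static void toy_free(void *p)
{
	((slot *)p)->next = free_head; /* push onto the freelist head */
	free_head = p;
}

int main(void)
{
	void *a = toy_alloc();         /* ?a=aaa... */
	void *b = toy_alloc();         /* &b=bbb... */
	void *c = toy_alloc();         /* &c=ccc... */
	toy_free(b);                   /* the duplicate &b= frees the second block */
	assert(toy_alloc() == b);      /* the application's next allocation lands there */
	(void)a; (void)c;
	return 0;
}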

By isolating the user inputs in separate chunks, we prevent grooming the application heap via GET/POST/etc. This is useful in the context of a remote attacker with no ability to execute arbitrary code.

As the overhead is very small (0% wall time / 0.04% valgrind on the symfony-demo benchmark), I believe this is worth it.

@dstogov dstogov (Member) commented Nov 11, 2024

@arnaud-lb didn't you already add the protection against heap buffer overflow/underflow through the shadow pointers?
I thought it shouldn't be possible to corrupt the free-list now.
Why do we need this patch on top of shadow pointers?

do you have the source of the exploit used in the presentation?

Personally, I think that a better approach would be to filter input data and just stop the request when unexpected/dangerous data is detected (e.g. GET/POST with duplicate names).

@arnaud-lb arnaud-lb (Member Author)

@dstogov

didn't you already add the protection against heap buffer overflow/underflow through the shadow pointers?

Yes, but that's not the only way to exploit an out-of-bounds write. If an attacker can arrange the heap so as to overwrite the first bytes of an arbitrary block, there are other things they can attack. Protecting against freelist corruption specifically was worth it because it's an easy and powerful target, but there are other targets that are made practical by arranging the layout of the heap.

do you have the source of the exploit used in the presentation?

I've tried to find it, but without success.

Personally, I think that a better approach would be to filter input data and just stop the request when unexpected/dangerous data is detected (e.g. GET/POST with duplicate names).

This would prevent controlling the order of elements in the freelist, but that is not strictly necessary to control block placement in the heap. For instance, an attacker can control the order of runs in a chunk (without unsetting a GET/POST element), as well as how full they are, so they can arrange for a block of size N allocated by the application to be at the end of a run, just before another run of their choice.

@dstogov dstogov (Member) commented Nov 11, 2024

do you have the source of the exploit used in the presentation?

I've tried to find it, but without success.

@cfreal could you please share the sources of the exploit used in your presentation? (Best to email them to my and @arnaud-lb's public GitHub email addresses.)

@jvoisin jvoisin (Contributor) commented Nov 11, 2024

Similar exploits are available here, with a detailed write-up about the heap shaping here

@cfreal cfreal commented Nov 11, 2024

What @jvoisin said. Relevant section here. I can provide the adminer exploit as well tomorrow if you find it useful.

@dstogov dstogov (Member) commented Nov 12, 2024

@jvoisin @cfreal thank you very much. I'll need to play with this.

@m4p1e m4p1e commented Nov 18, 2024

Future scope would be to activate the input zone in more places (e.g. when parsing JSON, during unserialize, etc.), or to create more zones for various purposes.

Also, do not abuse the userinput zone, as it may introduce a new attack surface. For example, if there is a potential vulnerability in the unserialize process and unserialize-related operations are added to the userinput zone, an attacker could use an HTTP request to arrange an ideal memory layout without needing to account for PHP internals.

@jvoisin jvoisin (Contributor) commented Nov 18, 2024

Also, do not abuse the userinput zone, as it may introduce a new attack surface. For example, if there is a potential vulnerability in the unserialize process and unserialize-related operations are added to the userinput zone, an attacker could use an HTTP request to arrange an ideal memory layout without needing to account for PHP internals.

unserialize() already gives an attacker plenty of control, so this wouldn't change much :D But more seriously, partitioning memory by type (whether primitive types/sizes or usage types) is part of #14083's roadmap :)

@jvoisin jvoisin (Contributor) commented Apr 8, 2025

@dstogov did you have time to take a look at this?

@dstogov dstogov (Member) commented Apr 14, 2025

@arnaud-lb please discuss this with @nielsdos and @iluuu1994 and make a decision.
I don't like this over-complication, but I also don't like being a blocker.
