Flying under the radar with split processing
Intro
At the beginning of my recent article - https://hackingiscool.pl/breaking-out-from-stripped-tokens-using-process-injection/ - I mentioned that it laid the groundwork for something I was about to publish soonish. This is it.
The idea started roaming my mind back in 2019, when I began working as an incident responder. It resurfaced very recently, when I was developing the process injection POC linked above against PostgreSQL and considered using DripLoader (https://github.com/xuanxuan0/DripLoader) to evade EDR. Eventually I did something else, although similar. This is about detection evasion.
In advance, I would like to provide a little disclaimer. I am not claiming this is anything super inventive and, honestly, I do not believe that what I am going to describe has not already been used by malware developers. I just haven't seen it, probably because I don't analyze malicious software on a daily basis. Also, I don't read every single blog post and GitHub repository, although I try to keep up with interesting stuff to the best of my abilities. So, if you have written about something similar or have seen this somewhere, please let me know. I'll be happy to add an update and link relevant resources, to make this content more accurate and useful.
DripLoader
If you are not familiar with DripLoader, I recommend reading the main page of its GitHub repository (linked above).
For completeness, I will paraphrase here what DripLoader is, and also elaborate on the entire concept myself. It is an evasive approach to process injection. It boils down to the assumption that every single step of the process injection sequence (opening a handle, allocating memory, writing to it, making it executable, starting a remote thread), when seen as an isolated event, does not meet the scoring threshold of malicious behavior (because there would be way too many false positives). What the EDR is looking for is a sequence of events that, summed up together, exceeds the score required to accurately rate an activity as malicious. So the goal is to trick the EDR into not seeing those events as a correlated series of steps. This is achieved by introducing artificial delays between the steps. If those delays are long enough, by the time the next event occurs, the EDR will already have forgotten about the previous one and will once again treat the current event as an isolated one. That way, even though the entire process injection sequence is eventually performed, the EDR never catches the big picture. It misses the forest for the trees. This assumption is based on the fact that an EDR sensor processes a large volume of events and cannot track the history of every process indefinitely. So if base events are separated by sufficiently long delays, they do not get correlated and rated as malicious activity.
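To make the principle concrete, here is a toy model of time-window correlation (all numbers are made up for illustration, not taken from any real product): events from one process are only summed together if they fall within the sensor's correlation window, so delays longer than the window make each event look isolated.

```python
# Toy model of time-window correlation. Events are only counted together
# if they land within WINDOW seconds of each other; DripLoader-style delays
# push each event outside the window of the previous one.
WINDOW = 60           # hypothetical correlation window, in seconds
EVENTS_TO_ALERT = 3   # hypothetical count needed within one window to alert

def alerted(timestamps):
    """True if any WINDOW-long span contains EVENTS_TO_ALERT events."""
    for start in timestamps:
        in_window = [t for t in timestamps if start <= t < start + WINDOW]
        if len(in_window) >= EVENTS_TO_ALERT:
            return True
    return False

burst   = [0, 1, 2, 3]         # rapid injection sequence: gets correlated
dripped = [0, 120, 240, 360]   # DripLoader-style delays: each event isolated
print(alerted(burst), alerted(dripped))  # True False
```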
Simple versus complex rules
In security monitoring in general, we can group detection rules as:
- single event-based (simple, triggered immediately by one base event),
- multi event-based (looking for a specific sequence of base events).
So, the simple ones trigger alerts immediately in response to just one occurrence of the relevant base event - such as administrative user creation or an attempt to dump lsass.exe memory.
The second group contains everything else: correlation rules looking for a specific sequence of base events, or simply for a minimum threshold of particular base events being reached - failed logons, potentially suspicious commands, use of potentially suspicious functions, abnormal volumes of network traffic, abnormally high numbers of file operations, etc. Since multiple base events are involved, their occurrences must be measured over time. This task is much more complex and expensive than using simple rules.
The principle demonstrated by DripLoader applies to most score/threshold-based detections. Since minimum count thresholds and scores need to be met within a given time window for a detection to fire, once an attacker knows what those thresholds are and how they are calculated, it becomes fairly easy to stay under them.
For example, let's take a brute-force/password-spraying detection/prevention mechanism based on thresholds. If we know that the threshold is 10 failed logons per hour, and we assume that under normal conditions regular noise produces around 2 failed logons per hour, we can run an automated attack that does not exceed 7 attempts per hour. So-called low and slow.
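The arithmetic behind this pacing can be expressed as a tiny hypothetical helper (the function name and the safety margin are my own additions for illustration):

```python
# Hypothetical helper: given a detection threshold and the expected ambient
# noise, compute the attack rate that stays just under the alerting limit.

def max_attempts_per_window(threshold: int, background_noise: int,
                            safety_margin: int = 1) -> int:
    """Attempts we can make per time window without the combined count
    (ours plus ambient noise) reaching the threshold."""
    return max(0, threshold - background_noise - safety_margin)

# 10 failed logons/hour trigger an alert; ~2 ambient failures expected.
print(max_attempts_per_window(10, 2))  # -> 7
```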
Separating events by source instead of time
Now, how much further we can go depends on how the threshold is calculated. If it is tracked separately for every source IP address, we can split our dictionary into X chunks and run X agents from different source IP addresses. None of them would exceed the threshold, while collectively we would be performing X*7 attempts per hour. This way both time windows and source IP addresses are used to stay under the alerting thresholds.
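A minimal sketch of the dictionary split (the round-robin scheme and names are mine; any chunking strategy works as long as each agent stays under its per-source budget):

```python
# Sketch: split a password dictionary into X chunks, one per source IP/agent,
# so each agent's failed-logon count stays under the per-source threshold
# while the combined attempt rate is X times higher.

def split_dictionary(words, agents):
    """Round-robin the wordlist across the given number of agents."""
    chunks = [[] for _ in range(agents)]
    for i, word in enumerate(words):
        chunks[i % agents].append(word)
    return chunks

wordlist = [f"password{i}" for i in range(21)]
chunks = split_dictionary(wordlist, 3)
print([len(c) for c in chunks])  # -> [7, 7, 7]
```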
The same goes for static executable analysis as well as dynamic process analysis. In dynamic process analysis, in addition to keeping the activity low-frequency, if we distribute it across multiple processes, the typical thresholds will apply to each of them individually. So we take a sequence of events that would trigger a rule if it originated from one process, and we split it so that every single event comes from a different source process. None of those processes ever exceeds the alert threshold, despite their collective activity being just as malicious as if it were conducted from a single process. In static binary analysis, when our malicious code is distributed across 10 different executables/DLLs, again none of them individually reaches a score sufficient to rate it as malware, because each contains too few references to functions deemed suspicious.
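The per-process variant can be modeled the same way as the time-window one. Below is a toy per-source scoring engine (event names, scores, and the threshold are invented for illustration, not taken from any real EDR): each base event adds to the score of the process that produced it, and splitting the same sequence across several PIDs keeps every individual score below the limit.

```python
# Toy model of per-source scoring: each event raises the score of its source
# process only; an alert fires when one process reaches SCORE_THRESHOLD.
from collections import defaultdict

SCORE_THRESHOLD = 5
EVENT_SCORES = {"open_handle": 1, "alloc_rw": 2,
                "write_mem": 2, "create_thread": 2}

def alerts(events):
    """events: list of (source_pid, event_name); returns PIDs that alerted."""
    scores = defaultdict(int)
    flagged = set()
    for pid, name in events:
        scores[pid] += EVENT_SCORES[name]
        if scores[pid] >= SCORE_THRESHOLD:
            flagged.add(pid)
    return flagged

sequence = ["open_handle", "alloc_rw", "write_mem", "create_thread"]
# Whole sequence from one process: cumulative score 7 -> alert.
print(alerts([(1000, e) for e in sequence]))
# Same sequence, one event per process: every score stays under 5 -> silence.
print(alerts([(1000 + i, e) for i, e in enumerate(sequence)]))
```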
First POC
I implemented this approach when building my POC against an older x86 version (9.2) of PostgreSQL. The process injection logic was split between several source processes (instead of using time delays DripLoader-style):
1. Start one process, obtain the PID of the target process by its name, save the PID into a file and exit.
2. Start another process, which reads the PID from the file created in step 1, opens a handle to the target process, allocates a writable memory section in it, saves its address in a file, closes the handle and exits.
3. Start another process, which opens a handle to the target process, reads the address of the memory section from the file created in step 2, generates the shellcode and writes it into the newly allocated section, closes the handle and exits.
4. Start another process, which opens a handle to the target process, reads the address of the memory section from the file created in step 2, replaces the writable flag on it with readable+executable flags, closes the handle, exits.
5. Start another process, which opens a handle to the target process, reads the address of the memory section from the file created in step 2, starts a remote thread with entry point at the beginning of that memory section, closes the handle and exits.
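The steps above hand state to each other through files. Here is a cross-platform sketch of that handoff pattern; the actual Win32 calls are only indicated in comments, and the PID, address, and file names are placeholders of my own, so this illustrates the state flow rather than performing any injection.

```python
# Sketch of the file-based handoff between the split injection steps.
# Real Win32 work (OpenProcess, VirtualAllocEx, WriteProcessMemory,
# VirtualProtectEx, CreateRemoteThread) is shown only as comments.
import os, tempfile

workdir = tempfile.mkdtemp()
pid_file = os.path.join(workdir, "pid.txt")
addr_file = os.path.join(workdir, "addr.txt")

def step1_find_pid():
    # Real version: enumerate processes and find the target by name.
    with open(pid_file, "w") as f:
        f.write("4242")            # hypothetical target PID

def step2_alloc():
    pid = int(open(pid_file).read())
    # Real version: OpenProcess(pid) + VirtualAllocEx(PAGE_READWRITE),
    # then persist the returned address for the next process.
    with open(addr_file, "w") as f:
        f.write(hex(0x1f0000))     # hypothetical section address

def step3_write():
    addr = int(open(addr_file).read(), 16)
    # Real version: WriteProcessMemory(handle, addr, shellcode, ...)

def step4_protect():
    addr = int(open(addr_file).read(), 16)
    # Real version: VirtualProtectEx(handle, addr, PAGE_EXECUTE_READ)

def step5_thread():
    addr = int(open(addr_file).read(), 16)
    # Real version: CreateRemoteThread(handle, start_address=addr)

# In the POC each step runs as a separate short-lived process; calling them
# in order here just demonstrates how the state survives between them.
for step in (step1_find_pid, step2_alloc, step3_write,
             step4_protect, step5_thread):
    step()
print(open(addr_file).read())  # the address persisted across all steps
```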
None of the processes reaches the minimum score to be deemed malicious. And it works on the static analysis level as well. Since all the processes come from different executables, none of the files alone is flagged as malware by machine learning algorithms. None of those executables contains the full set - or even half - of the function calls that occur in typical process injection. Instead, there is only one such call in each, which is too little to generate alerts without an unacceptably high ratio of false positives. This can be achieved both by splitting the logic between separate standalone exe files and by using one exe file as a carrier and moving the logic into separate DLLs. In both cases, a new process needs to be spawned for every step. The first approach, with separate executables, should be more effective in case the EDR calculates/aggregates the scores not only per PID, but also per image name/checksum.
This is exactly what I implemented for the PostgreSQL code injection POC and successfully used in production without getting caught. This approach was additionally supported by the design of the service itself, because every time a new connection is made to the database, the service creates a new legitimate postgres.exe child process to serve it. And if we operate from a DBA level and abuse stored procedures to load arbitrary DLLs (or use file operations), that is the process we can execute code from.
So, in the case of attacking PostgreSQL, the goal was to inject the shellcode into pg_ctl.exe as it had the SeImpersonatePrivilege in its primary token. To split the injection sequence between separate processes, every step was implemented in a separate DLL file, which was loaded in a separate connection session, as the design of the service itself supported this approach.
Process injection was implemented in 5 separate DLLs (pgext1-5.dll) deployed in C:\ProgramData\Comms, the shellcode loaded into pg_ctl.exe triggered LoadLibrary() on C:\ProgramData\Comms\pged.dll, which in turn created a text file C:\ProgramData\Comms\NETSERVICE_POC.txt.
Before starting:
Then steps, from 1 to 5:
Temporary files used by individual processes to share information were stored in the data\ directory of the PostgreSQL Program Files installation directory.
For debugging purposes and for better visibility of what goes behind the scenes, all actions are logged into a file named ext.log.
Second POC
Some EDRs calculate suspicion scores in a more sophisticated way, for instance for entire groups of processes, by tracking parent-child relationships. So, if we simply split our malicious activity between child processes of our control process, it is easier for an EDR to correlate them. But there are ways to make this type of tracking more challenging as well.
And I am not talking about parent process ID spoofing. Instead, we can use native Windows services that create processes for us, such as Task Scheduler, Windows Installer, WMI, and COM.
I decided to give it a go with private COM servers.
It is possible for a non-administrative user to register COM servers under their own HKCU\Software\Classes\CLSID, including out-of-process servers. What I implemented is not an actual COM application, as it does not expose any interfaces or call any methods that way. The only function COM serves in this scenario is to break the parent-child relationship, so that the controlling process (in this example called "client.exe") is not seen as the parent of the worker processes (in this example called "server.exe").
The client supports two commands: deploy and inject.
Running it with the deploy command simply registers the server.exe as a COM server with an arbitrary CLSID by creating a corresponding key in HKEY_CURRENT_USER\Software\Classes\CLSID\{B0A7E410-1F7C-4B1A-8E0B-0A0C6D7E1234}\LocalServer32, pointing to the full path of the executable:
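The resulting registration could be expressed as a .reg fragment along these lines (the friendly name and the server path are hypothetical placeholders; only the CLSID matches the one used in the POC):

```reg
Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Classes\CLSID\{B0A7E410-1F7C-4B1A-8E0B-0A0C6D7E1234}]
@="SplitProcServer"

[HKEY_CURRENT_USER\Software\Classes\CLSID\{B0A7E410-1F7C-4B1A-8E0B-0A0C6D7E1234}\LocalServer32]
@="C:\\Users\\Public\\server.exe"
```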
So, after deployment it looks like this:
This is sufficient to get that process executed when a client invokes the corresponding CLSID via CoCreateInstance(). Instead of client.exe, the parent process will be the "Host Process For Windows Services" - the svchost.exe instance with the command line "C:\Windows\system32\svchost.exe -k DcomLaunch -p".
In this POC I split the entire process into the following six steps:
1. Obtaining the PID of the target process.
2. Allocating writable memory in it.
3. Generating the shellcode (dynamic generation is required as the current addresses of LoadLibraryA() and ExitThread() need to be resolved; since kernel32.dll is mapped at the same base address in every process within a given boot session, resolving them in our own process yields addresses that are also valid in the target).
4. Writing the shellcode into the newly allocated section.
5. Replacing the writable permission of the section with read+execute.
6. Starting a new remote thread.
The inject command in client.exe takes one additional argument: the name of the target process to inject into.
Also, this version does not open the process handles with PROCESS_ALL_ACCESS, but instead only uses the minimum access mask required to perform the given operation (e.g. PROCESS_VM_OPERATION).
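For reference, the per-API minimum access rights can be laid out as follows. The constant values come from winnt.h, and the per-call requirements follow the MSDN documentation of each API; the STEP_MASKS mapping itself is my own illustrative summary, not code from the POC.

```python
# Minimum process access rights per injection step, instead of
# PROCESS_ALL_ACCESS. Constant values as defined in winnt.h.
PROCESS_CREATE_THREAD     = 0x0002
PROCESS_VM_OPERATION      = 0x0008
PROCESS_VM_READ           = 0x0010
PROCESS_VM_WRITE          = 0x0020
PROCESS_QUERY_INFORMATION = 0x0400

STEP_MASKS = {
    "VirtualAllocEx":     PROCESS_VM_OPERATION,
    "WriteProcessMemory": PROCESS_VM_WRITE | PROCESS_VM_OPERATION,
    "VirtualProtectEx":   PROCESS_VM_OPERATION,
    "CreateRemoteThread": (PROCESS_CREATE_THREAD | PROCESS_QUERY_INFORMATION |
                           PROCESS_VM_OPERATION | PROCESS_VM_WRITE |
                           PROCESS_VM_READ),
}
for api, mask in STEP_MASKS.items():
    print(f"{api}: {mask:#06x}")
```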
Every time server.exe is spawned, depending on the step currently requested from it, it loads a different DLL that implements the relevant logic.
Also, contrary to the PostgreSQL POC, this one was made for x64 targets.
In this simple demo, all the paths are hardcoded and point to C:\Users\Public. The shellcode makes the target process call LoadLibrary("C:\Users\Public\get_basic_info.dll").
The DLL creates (or appends) a text file in C:\Users\Public\split_proc.poc.txt and simply appends the current process's image path, command line and user information followed by a newline character, just to confirm the process into which it was injected. For debugging purposes and for better visibility of what goes behind the scenes, all actions are logged into a file named split.log.
Filter set in Procmon before running:
How it started (the target was notepad.exe, intentionally opened on the TODO.txt file from C:\Users\Public):
How it went:
The split_proc.poc.txt was created from the get_basic_info.dll loaded into the only running instance of notepad.exe, as confirmed by the contents of the file:
BTW, this version will not work when run from an elevated token. For an elevated caller, the HKCU hive is not searched when resolving CLSIDs invoked via COM, which I suspect is a countermeasure against UAC bypasses. So the COM version, if it is to be run elevated, requires setup under the corresponding HKLM key instead.
Test results
Functional
Both versions were tested on Windows 10. Additionally, the COM version was tested on Windows 11, and the PostgreSQL x86 version was tested on Windows Server 2016. Functionally they worked everywhere.
Evasive
I decided not to name the specific EDR products I have tested this against; all I want to say is that, at the time of writing, I expect this to work against some of them. Since detection capabilities are constantly improving and can change at any time, it is best to test this POC yourself against the product you use.