[HITCON2022 - Checker] How to reverse a metamorphic windows kernel driver statically - 0poss
This is a long write-up. It’s not particularly technical; I just wanted to show how I reverse-engineered this challenge fully statically using Binary Ninja.
The archive contains two files, check_drv.sys
and checker.exe
, so we can already guess that the .exe
is a user-mode application making requests (spoiler: IRPs) to the .sys
kernel driver for flag verification.
Since we can make the (educated) guess that most of the code is inside the driver, let’s take a quick look inside checker.exe
. Quick tip if you’re looking for the call to the main
function inside the _start
in a Portable Executable :
It’s here.
Single click on sub_140001070
and press y
. Change the int64_t sub_140001070()
into int32_t main(int32_t argc, char** argv)
(it will both change the type and rename it). After a little renaming, we get the following function :
It seems that checker.exe
sends a 0x222080
control code to the hitcon_checker
device… and that’s it. It doesn’t take input and doesn’t send any buffer to the device. It only prints “wrong” if the driver returns 0
in the output buffer, “correct” otherwise. So I guess we’ll have to check the driver itself and see what it’s all about.
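As a side note, the control code can be unpacked using the standard CTL_CODE layout (device type in bits 16-31, access in bits 14-15, function in bits 2-13, method in bits 0-1). Here is a minimal Python sketch of that; the helper name decode_ioctl is made up for illustration and is not part of the challenge :

```python
# Decode a Windows I/O control code into its CTL_CODE fields.
# Layout: (DeviceType << 16) | (Access << 14) | (Function << 2) | Method

METHODS = {
    0: "METHOD_BUFFERED",
    1: "METHOD_IN_DIRECT",
    2: "METHOD_OUT_DIRECT",
    3: "METHOD_NEITHER",
}


def decode_ioctl(code: int) -> dict:
    """Split a 32-bit control code into its four fields (hypothetical helper)."""
    return {
        "device_type": (code >> 16) & 0xFFFF,
        "access": (code >> 14) & 0x3,
        "function": (code >> 2) & 0xFFF,
        "method": METHODS[code & 0x3],
    }


fields = decode_ioctl(0x222080)
for name, value in fields.items():
    print(name, hex(value) if isinstance(value, int) else value)
```

For 0x222080 this gives device type 0x22 (FILE_DEVICE_UNKNOWN), access 0 (FILE_ANY_ACCESS), function 0x820 and METHOD_BUFFERED.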
Open check_drv.sys
in your favorite disassembler. In case you’re wondering where the call to DriverEntry
is :
It’s here.
And it’s ugly.
There’s a bunch of additional XORs that we’ll get to much later, but first let us make more sense of the variables of the function.
Before re-typing and re-naming uint64_t sub_140001b50(void* arg1)
to NTSTATUS DriverEntry(DRIVER_OBJECT* DriverObject, UNICODE_STRING *RegistryPath)
, we need to actually define these types. Easy stuff with Binary Ninja, just download the Windows kernel headers (for example from ‣), click “Types” in the left sidebar, click anywhere in the types panel and press i
before writing the following :
    -isystem /home/osef/Documents/winsdk-10/Include/10.0.16299.0/km -isystem /home/osef/Documents/winsdk-10/Include/10.0.16299.0/km/crt -isystem /home/osef/Documents/winsdk-10/Include/10.0.16299.0/shared -D_AMD64_
Remember to change the path to the headers at the bottom. If you’re using Linux, you’ll need to manually create two or three symlinks inside the header directories because of inconsistent letter cases in some includes. Binary Ninja will recursively define the requested types. I don’t know how to tell it to import ALL types from wdm.h
, hence the dummy typedefs.
NOW we can rename uint64_t sub_140001b50(void* arg1)
to NTSTATUS DriverEntry(DRIVER_OBJECT* DriverObject, UNICODE_STRING *RegistryPath)
, and the function makes slightly more sense :
Using the y
keybinding, we can directly change sub_140001000
‘s name and type to void DriverUnload(DRIVER_OBJECT *DriverObject)
.
Taking a look at sub_140001110
:
That’s the function creating the hitcon_checker
device (with a comfy symlink to it). Using the y
keybinding, do the following changes :
Before | After |
---|---|
int64_t sub_140001110(void* arg1) | NTSTATUS CreateHitconDevice(DRIVER_OBJECT* DriverObject) |
data_140003158 | UNICODE_STRING us_device_symlink |
data_140003148 | UNICODE_STRING us_device_name |
data_140003140 | DEVICE_OBJECT *HitconDevice |
Here’s the result :
Inside sub_140001040
We have a lot of “unused” stack variables, so these are probably the fields of a mistyped structure. Either way, there’s a call to [ObRegisterCallbacks](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-obregistercallbacks)
so that’s confirmed. One variable, var_60
, is assigned PsProcessType
, meaning that the driver is registering a callback to handle operations on processes. Let’s change var_40
‘s type to [OB_CALLBACK_REGISTRATION](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/ns-wdm-_ob_callback_registration)
and its name to ObCallbackRegistration
. The function is now slightly more readable :
But we still have some “unused” stack variables so let’s do the following changes :
Before | After |
---|---|
int64_t (* const var_60)() | OB_OPERATION_REGISTRATION ObOperationRegistration |
int128_t var_18 | UNICODE_STRING us_altitude |
int128_t zmm0 | UNICODE_STRING us_altitude2 |
int64_t var_88 | int64_t Context[0x5] (the RegistrationContext is driver-dependent, this is just a guess) |
int64_t rax | NTSTATUS ret |
data_140003168 | PVOID RegistrationHandle |
int64_t sub_140001430(int64_t, int32_t*) | OB_PREOP_CALLBACK_STATUS PreProcCallback(PVOID, POB_PRE_OPERATION_INFORMATION PreOpInformation) |
NTSTATUS sub_140001040() | NTSTATUS RegisterCallback() |
And the function is now much more readable :
Of course all of this wasn’t needed ; but still, it’s good practice.
Let’s get into the PreProcCallback
function. To make it a little easier to read, we can override Windows’s typedef ULONG OB_OPERATION
with the following type :
    typedef enum _OB_OPERATION {
Change int32_t rax_2
and int32_t* rcx_2
to OB_OPERATION op
and OB_PRE_CREATE_HANDLE_INFORMATION* params
, respectively, to get :
By looking into data_140003170
, we see that it’s a 0x10000
(65536
)-long uint8_t
array. Since PIDs on Windows usually stay (in practice, though nothing guarantees it) in the range 0
to 65535
, we can be pretty confident when re-typing and renaming data_140003170
to uint8_t PIDArray[0x10000]
. So this little function checks if the process that triggered the callback has an entry in PIDArray
that is not 0
and declines further handle creation and duplication iff the originally requested access rights have the lowest bit set to 1
. To this day, I don’t know what the purpose of this function is (probably just preventing processes that already opened a handle to the device from creating other handles to it, but I don’t get what this achieves). If you know more about this, please tell me.
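One fact that may help here : for process handles, the lowest bit of the access mask is PROCESS_TERMINATE (0x0001). The sketch below decodes a process access mask and models the condition described above. The function names and the pid_array argument are hypothetical stand-ins, and the model only mirrors the reading of the decompilation given above, not the driver’s exact code :

```python
# A few well-known process access rights (values from the Windows SDK).
PROCESS_ACCESS_BITS = {
    0x0001: "PROCESS_TERMINATE",
    0x0002: "PROCESS_CREATE_THREAD",
    0x0008: "PROCESS_VM_OPERATION",
    0x0010: "PROCESS_VM_READ",
    0x0020: "PROCESS_VM_WRITE",
    0x0040: "PROCESS_DUP_HANDLE",
    0x0400: "PROCESS_QUERY_INFORMATION",
}


def decode_process_access(mask: int) -> list:
    """Return the names of the known rights set in `mask`."""
    return [name for bit, name in PROCESS_ACCESS_BITS.items() if mask & bit]


def callback_would_act(pid_array: bytes, pid: int, desired_access: int) -> bool:
    """Hypothetical model: non-zero PIDArray entry AND lowest access bit set."""
    return pid_array[pid] != 0 and (desired_access & 0x1) != 0


# Sample data, assumed for illustration only.
pid_array = bytearray(0x10000)
pid_array[1234] = 1

print(decode_process_access(0x0011))                # rights with bit 0 set
print(callback_would_act(pid_array, 1234, 0x0011))  # marked PID, bit 0 set
print(callback_would_act(pid_array, 1234, 0x0010))  # marked PID, bit 0 clear
```

If that reading is right, the callback would be about handles that are allowed to terminate a process, but take it as a guess rather than a confirmed explanation.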
After inserting a few comments, the DriverEntry
is now a little bit cleaner (I used High Level IL to hide the boring casts) :
There’s still some stuff to resolve.
Let’s take a look inside the sub_1400011b0
, the function that is supposed to handle those IRPs.
It really isn’t big and there’s a cute little switch with 9 cases. Before looking at the code, let’s change its type and name to NTSTATUS DispatchXXX(DEVICE_OBJECT* DeviceObject, IRP* Irp)
(how do I know ? Because I do).
Inside the function, Binary Ninja incorrectly identifies (I’m pretty sure) the function at 0x140001ed0
as being the Concurrency::details::VirtualProcessor::GetExecutingContext
function from the C++ concurrency runtime. If we look at its content, it only loads the qword
at offset 0xb8
from the input pointer. At offset 0xb8
of the IRP
structure resides a union
with a struct _IO_STACK_LOCATION* CurrentStackLocation
and a ULONG PacketType
. Since it loads a qword
, and this qword
is later used as a pointer for dereferencing several values, we can guess that the union
will consist of the CurrentStackLocation
field at runtime. So Concurrency::details::VirtualProcessor::GetExecutingContext
is in fact [IoGetCurrentIrpStackLocation](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-iogetcurrentirpstacklocation)
. Let’s fix this.
Let’s also fix the PsGetCurrentProcessId
(because I forgot to do it when going over PreProcCallback
) so that it doesn’t take any argument (because it doesn’t).
To speed up a little, here’s what we have :
and the end of the function is :
In a nutshell, if after all the cases have been executed, the flag
global array starts with “hitcon”, we win (remember the end of the main
function in the user mode application ?).
Before jumping into the myterious_bunch_of_xors
function, we need to finish unraveling the DriverEntry
.
Remember :
Calling MmGetPhysicalAddress
with a virtual address as argument (e.g. the address of flag
) returns the physical address that it maps to. MmMapIoSpace
does the “contrary” ; it takes a physical address range as input and creates a new virtual mapping to it. This means that after the call to MmMapIoSpace
, there will be at least two virtual pages that are mapped to the same physical memory region.
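To get a feel for this aliasing without a kernel debugger, here is a user-mode analogy in Python : two separate memory mappings of the same temporary file play the role of the two virtual mappings of one physical region. This is only an analogy (it has nothing to do with MmMapIoSpace itself), and the variable names are made up :

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE

with tempfile.TemporaryFile() as backing:
    # The file stands in for the shared physical page.
    backing.write(b"\x00" * PAGE)
    backing.flush()

    # Two independent views of the same backing storage.
    view_a = mmap.mmap(backing.fileno(), PAGE)
    view_b = mmap.mmap(backing.fileno(), PAGE)

    # A write through the first view...
    view_a[0:6] = b"hitcon"
    # ...is visible through the second one.
    seen_through_b = bytes(view_b[0:6])
    print(seen_through_b)

    view_a.close()
    view_b.close()
```

The point is simply that writing through one alias changes what the other alias reads, which is what lets the driver modify flag and its own code through the second mapping.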
In this case, the driver maps 0x1000
(4096
) bytes of new virtual memory to the same physical region that flag
was mapped onto. It then stores the first address of this new mapping inside data_140013170
, and does the same thing for the PreProcCallback
function and data_140013188
.
data_140013178
also maps onto the same physical memory region as flag + 0x30
and data_140013180
onto the same physical memory region as PreProcCallback + 0x700
, namely sub_140001b30
:
Very simple function. We’ll call it dec
as in “decryption” !
Since we know that data_140013170
always points to flag
, let’s patch the binary, and do the same for data_140013178
, data_140013188
and data_140013180
, shall we ?
If you look at all the code refs to data_140013170
, these are all (except for the first one) mov rbx, qword [rel data_140013170]
, so they just load the address of flag
into rbx
. We’re going to replace them by an lea rbx, qword [rel flag]
.
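At the byte level, the swap is small : mov rbx, qword [rip+disp32] encodes as 48 8B 1D followed by the displacement, and lea rbx, [rip+disp32] as 48 8D 1D followed by the displacement, both 7 bytes long. Here is a standalone Python sketch of computing the replacement bytes. It is not the Binary Ninja script used in this write-up, and the instruction and flag addresses in the example are assumed values chosen only for illustration :

```python
import struct

MOV_RBX_RIPREL = bytes([0x48, 0x8B, 0x1D])  # mov rbx, qword [rip + disp32]
LEA_RBX_RIPREL = bytes([0x48, 0x8D, 0x1D])  # lea rbx, [rip + disp32]
INSN_LEN = 7                                # 3 opcode/ModRM bytes + 4 disp bytes


def riprel_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Address referenced by a 7-byte RIP-relative instruction."""
    (disp,) = struct.unpack("<i", insn_bytes[3:7])
    return insn_addr + INSN_LEN + disp


def mov_to_lea(insn_addr: int, insn_bytes: bytes, new_target: int) -> bytes:
    """Build `lea rbx, [rel new_target]` to overwrite a `mov rbx, [rel ptr]`."""
    if len(insn_bytes) != INSN_LEN or insn_bytes[:3] != MOV_RBX_RIPREL:
        raise ValueError("not a `mov rbx, qword [rip+disp32]`")
    # RIP-relative displacements are measured from the end of the instruction.
    disp = new_target - (insn_addr + INSN_LEN)
    return LEA_RBX_RIPREL + struct.pack("<i", disp)


# Hypothetical example values (not taken from the real binary).
insn_addr = 0x140001500
flag_addr = 0x140003000
original = MOV_RBX_RIPREL + struct.pack("<i", 0x140013170 - (insn_addr + INSN_LEN))

patched = mov_to_lea(insn_addr, original, flag_addr)
print(patched.hex(), hex(riprel_target(insn_addr, patched)))
```

Because both encodings have the same length, the patch can be written in place without shifting any surrounding code.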
    # /!\ Very important when messing around with the Binary Ninja API /!\
Let’s turn this into a function to apply this to the other virtual address duplicates :
Wanna see a magic trick ? Go inside the myterious_bunch_of_xors
function :
Now run the script and watch all these indirections just…
… huh ? Oh no ! Binary Ninja’s optimisations automatically replace the bytes from the dec
function by their value, despite them being changed at runtime.
I went on Binary Ninja’s official Slack and the very same day I’m writing this, a guy had the same problem :
There were some nice answers, like this one :
The only real solution was this :
I’ve tried several but with no success. I asked how to do it, I’m still without answers :/
We’re going to have to do this “virtual address unraveling” for all four of them, except dec
. Let’s comment this out :
    # patch_vaddr_instructions(proccbk_addr + 0x700, 0x140013180)
So we’ll have to stick to
And
This. It could be better.
Since it seems to be used as an array of XOR keys in myterious_bunch_of_xors
, let’s change data_140003030
to uint8_t key_array[0x100]
. Why 0x100
? I just selected everything from 0x140003030
to the next data label and Binja kindly displays the size of the selection in bytes at the bottom right corner :
So it seems that the driver is self-modifying. The only way it’s able to achieve this without triggering a fault is by setting the bit 16 of the CR0
register to 0
in sub_140001490
and re-setting it to 1
later in sub_1400014b0
:
What this achieves is that it allows code running in ring 0 to write to read-only pages. So let’s call these two functions DisableWP
and EnableWP
.
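The bit arithmetic behind these two helpers is tiny. Here is a Python sketch of it, operating on a plain integer rather than the real register ; the sample CR0 value is just a plausible long-mode value used as input, not something read from the driver :

```python
CR0_WP = 1 << 16  # Write Protect bit of CR0


def disable_wp(cr0: int) -> int:
    """Clear CR0.WP so ring 0 can write to read-only pages."""
    return cr0 & ~CR0_WP


def enable_wp(cr0: int) -> int:
    """Set CR0.WP again to restore normal write protection."""
    return cr0 | CR0_WP


cr0 = 0x80050033  # sample value with WP set (assumed for illustration)
unprotected = disable_wp(cr0)
restored = enable_wp(unprotected)
print(hex(unprotected), hex(restored))
```

Only bit 16 changes between the two states ; every other CR0 flag is left untouched.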
Actually solving the challenge
Back into myterious_bunch_of_xors
. The function takes an int32_t
as argument that seems to be the offset in key_array
from which we’ll XOR the dec
function, and, by looking at the code in the switch cases of DispatchXXX
this argument can only be one of [0x00, 0x20, 0x40, 0x60, 0x80, 0xa0, 0xc0, 0xe0]
, depending on the value of the IRP that was sent. At this moment, I just made the educated guess that the IRPs must be sent in a specific order to keep the dec
coherent.
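Before touching the real bytes, here is a small standalone model of the idea : XOR a buffer with a slice of key_array starting at one of the eight possible offsets. The key bytes and the “routine” below are placeholders (the real ones live in the driver), and how many bytes each case actually XORs is left as the length of the buffer passed in, since that detail is an assumption here :

```python
KEY_STRIDE = 0x20
CANDIDATE_OFFSETS = list(range(0x00, 0x100, KEY_STRIDE))  # 0x00, 0x20, ..., 0xe0


def xor_with_key(code: bytes, key_array: bytes, offset: int) -> bytes:
    """XOR `code` with key_array[offset : offset + len(code)]."""
    if offset + len(code) > len(key_array):
        raise ValueError("key slice would run past the end of key_array")
    return bytes(b ^ key_array[offset + i] for i, b in enumerate(code))


# Placeholder data, NOT the real key or the real dec routine.
key_array = bytes((i * 7 + 3) & 0xFF for i in range(0x100))
routine = b"\x90" * 8 + b"\xc3"  # toy "function": nops followed by ret

encrypted = xor_with_key(routine, key_array, 0xE0)
decrypted = xor_with_key(encrypted, key_array, 0xE0)

print([hex(o) for o in CANDIDATE_OFFSETS])
print(decrypted == routine)
```

XORing twice with the same slice gives back the original bytes, so trying a candidate offset and undoing it costs nothing, which is what makes the trial-and-error approach below practical.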
Let’s write a function to do the decryption.
    # /!\ Very important when messing around with the Binary Ninja API /!\
This script produces this output :
    Offset 0x0
None of these routines make sense… except the last one !
    edx = zx.d(cl)
It takes the first argument cl
and basically returns cl << 3 | cl >> 5
. That’s right ! The guess is confirmed !
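In Python terms, this routine is an 8-bit rotate left by 3. Here is a small sketch of it together with its inverse ; the names rol8 and ror8 are mine, and the explicit & 0xFF is needed because Python integers are not limited to 8 bits the way cl is :

```python
def rol8(value: int, count: int) -> int:
    """Rotate an 8-bit value left by `count` bits."""
    count %= 8
    value &= 0xFF
    return ((value << count) | (value >> (8 - count))) & 0xFF


def ror8(value: int, count: int) -> int:
    """Rotate an 8-bit value right by `count` bits (inverse of rol8)."""
    return rol8(value, (8 - count % 8) % 8)


print(hex(rol8(0x81, 3)))           # cl << 3 | cl >> 5, truncated to 8 bits
print(hex(ror8(rol8(0x81, 3), 3)))  # the rotation is reversible
```

Since a rotation loses no bits, each such routine can be undone, which matters when going back from the expected bytes to the flag.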
Let’s just add
    offsets = []
right before the for
loop and run the script again :
    Offset 0x0
Now only the routine decrypted with the switch case 0x20
(offset 0x40
) makes sense !
Let’s repeat this process and at the end we have
    offsets += [ 0xe0 ]
Which yields the following dec
routines :
    // Offset 0xe0
Let’s flag this then :
    def dec(c):
:3