[HITCON2022 - Checker] How to reverse a metamorphic windows kernel driver statically - 0poss

This is a long write-up. It’s not particularly technical ; I just wanted to show how I reverse-engineered this challenge fully statically using Binary Ninja.

The archive contains two files, check_drv.sys and checker.exe, so we can already guess that the .exe is a user-mode application making requests (spoiler : IRPs) to the .sys kernel driver for flag verification.

Since we can make the (educated) guess that most of the code is inside the driver, let’s take a quick look inside checker.exe. Quick tip if you’re looking for the call to the main function inside the _start in a Portable Executable :

It’s here.

Single click on sub_140001070 and press y. Change the int64_t sub_140001070() into int32_t main(int32_t argc, char** argv) (it will both change the type and rename it). After a little renaming, we get the following function :

It seems that checker.exe sends a 0x222080 control code to the hitcon_checker device… and that’s it. It doesn’t take any input and doesn’t send any buffer to the device. It only prints “wrong” if the driver returns 0 in the output buffer, and “correct” otherwise. So I guess we’ll have to check the driver itself and see what it’s all about.
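As a side note, the control code can be split into its fields by hand. Here is a minimal sketch, assuming the standard CTL_CODE bit layout (device type in bits 16-31, access in bits 14-15, function in bits 2-13, method in bits 0-1) ; decode_ioctl is a helper name made up for illustration :

```python
def decode_ioctl(code):
    """Split a 32-bit Windows I/O control code into its CTL_CODE fields."""
    return {
        "device_type": (code >> 16) & 0xFFFF,  # 0x22 is FILE_DEVICE_UNKNOWN
        "access": (code >> 14) & 0x3,          # 0 is FILE_ANY_ACCESS
        "function": (code >> 2) & 0xFFF,       # driver-defined function number
        "method": code & 0x3,                  # 0 is METHOD_BUFFERED
    }

fields = decode_ioctl(0x222080)
print({name: hex(value) for name, value in fields.items()})
```

With that layout, 0x222080 comes out as device type 0x22, function 0x820, METHOD_BUFFERED and FILE_ANY_ACCESS.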

Open check_drv.sys in your favorite disassembler. In case you’re wondering where the call to DriverEntry is :

It’s here.

And it’s ugly.

There’s a bunch of additional XORs that we’ll get to much later, but first let us make more sense of the function’s variables.

Before re-typing and re-naming uint64_t sub_140001b50(void* arg1) to NTSTATUS DriverEntry(DRIVER_OBJECT* DriverObject, UNICODE_STRING *RegistryPath), we need to actually define these types. Easy stuff with Binary Ninja, just download the Windows kernel headers (for example from ‣), click “Types” in the left sidebar, click anywhere in the types panel and press i before writing the following :

#include <wdm.h>

typedef DEVICE_OBJECT lol1;
typedef DRIVER_OBJECT lol2;
typedef IRP lol3;
typedef UNICODE_STRING lol4;
typedef OB_CALLBACK_REGISTRATION lol5;
typedef OB_OPERATION_REGISTRATION lol6;

-isystem /home/osef/Documents/winsdk-10/Include/10.0.16299.0/km -isystem /home/osef/Documents/winsdk-10/Include/10.0.16299.0/km/crt -isystem /home/osef/Documents/winsdk-10/Include/10.0.16299.0/shared -D_AMD64_

Remember to change the path to the headers at the bottom. If you’re using Linux, you’ll need to manually create two or three symlinks inside the header directories because of inconsistent letter cases in some includes. Binary Ninja will recursively define the requested types. I don’t know how to tell it to import ALL types from wdm.h, hence the dummy typedefs.

NOW we can rename uint64_t sub_140001b50(void* arg1) to NTSTATUS DriverEntry(DRIVER_OBJECT* DriverObject, UNICODE_STRING *RegistryPath), and the function makes slightly more sense :

Using the y keybinding, we can directly change sub_140001000’s name and type to void DriverUnload(DRIVER_OBJECT *DriverObject).

Taking a look at sub_140001110 :

That’s the function creating the hitcon_checker device (with a comfy symlink to it). Using the y keybinding, make the following changes :

int64_t sub_140001110(void* arg1)	NTSTATUS CreateHitconDevice(DRIVER_OBJECT* DriverObject)
data_140003158	UNICODE_STRING us_device_symlink
data_140003148	UNICODE_STRING us_device_name
data_140003140	DEVICE_OBJECT *HitconDevice

Here’s the result :

Inside sub_140001040

We have a lot of “unused” stack variables, so these are probably a mistyped structure. There’s a call to [ObRegisterCallbacks](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-obregistercallbacks), which takes a pointer to a registration structure, so that confirms it. One variable, var_60, is assigned PsProcessType, meaning that the driver is registering a callback to handle operations on processes. Let’s change var_40’s type to [OB_CALLBACK_REGISTRATION](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/ns-wdm-_ob_callback_registration) and its name to ObCallbackRegistration. The function is now slightly more readable :

But we still have some “unused” stack variables so let’s do the following changes :

int64_t (* const var_60)()	OB_OPERATION_REGISTRATION ObOperationRegistration
int128_t var_18	UNICODE_STRING us_altitude
int128_t zmm0	UNICODE_STRING us_altitude2
int64_t var_88	int64_t Context[0x5] (the RegistrationContext is driver-dependent, this is just a guess)
int64_t rax	NTSTATUS ret
data_140003168	PVOID RegistrationHandle
int64_t sub_140001430(int64_t, int32_t*)	OB_PREOP_CALLBACK_STATUS PreProcCallback(PVOID, POB_PRE_OPERATION_INFORMATION PreOpInformation)
NTSTATUS sub_140001040()	NTSTATUS RegisterCallback()

And the function is now much more readable :

Of course all of this wasn’t needed ; but still, it’s good practice.

Let’s get into the PreProcCallback function. To make it a little easier to read, we can override Windows’s typedef ULONG OB_OPERATION with the following type :

typedef enum _OB_OPERATION {
	OB_OPERATION_HANDLE_CREATE = 1,
	OB_OPERATION_HANDLE_DUPLICATE = 2
} OB_OPERATION;

Change int32_t rax_2 and int32_t* rcx_2 to OB_OPERATION op and OB_PRE_CREATE_HANDLE_INFORMATION* params, respectively, to get :

By looking into data_140003170, we see that it’s a 0x10000 (65536)-byte uint8_t array. Since PIDs on Windows typically stay below 65536, we can be pretty confident when re-typing and renaming data_140003170 to uint8_t PIDArray[0x10000]. So this little function checks whether the process that triggered the callback has a non-zero entry in PIDArray, and declines further handle creation and duplication iff the originally requested access rights have the lowest bit set to 1 (for process handles, that bit is PROCESS_TERMINATE). To this day, I don’t know what the purpose of this function is (probably just preventing processes that already opened a handle to the device from creating other handles to it, but I don’t get what this achieves). If you know more about this, please tell me.

After inserting a few comments, the DriverEntry is now a little bit cleaner (I used High Level IL to hide the boring casts) :

There’s still some stuff to resolve.

Let’s take a look inside sub_1400011b0, the function that is supposed to handle those IRPs.

It really isn’t big and there’s a cute little switch with 9 cases. Before looking at the code, let’s change its type and name to NTSTATUS DispatchXXX(DEVICE_OBJECT* DeviceObject, IRP* Irp) (how do I know ? Because I do).

Inside the function, Binary Ninja incorrectly identifies (I’m pretty sure) the function at 0x140001ed0 as being the Concurrency::details::VirtualProcessor::GetExecutingContext function from the C++ concurrency runtime. If we look at its content, it only loads the qword at offset 0xb8 from the input pointer. At offset 0xb8 of the IRP structure resides a union with a struct _IO_STACK_LOCATION* CurrentStackLocation and a ULONG PacketType. Since it loads a qword, and this qword is later used as a pointer for dereferencing several values, we can guess that the union holds the CurrentStackLocation field at runtime. So Concurrency::details::VirtualProcessor::GetExecutingContext is in fact [IoGetCurrentIrpStackLocation](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-iogetcurrentirpstacklocation). Let’s fix this.
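To make the offset argument concrete, here is a toy model (plain Python, no Binary Ninja needed) of what the misidentified function does. The buffer size and the pointer value are made up ; only the 0xb8 offset comes from the analysis above.

```python
import struct

# Offset of the CurrentStackLocation / PacketType union, as quoted in the text.
CURRENT_STACK_LOCATION_OFFSET = 0xB8

def io_get_current_irp_stack_location(irp_bytes):
    """Return the little-endian qword stored at offset 0xb8 of a raw IRP image."""
    (pointer,) = struct.unpack_from("<Q", irp_bytes, CURRENT_STACK_LOCATION_OFFSET)
    return pointer

# Hypothetical IRP image: 0x100 zero bytes with a fake kernel pointer at 0xb8.
fake_irp = bytearray(0x100)
fake_stack_location = 0xFFFF800012345678
struct.pack_into("<Q", fake_irp, CURRENT_STACK_LOCATION_OFFSET, fake_stack_location)

print(hex(io_get_current_irp_stack_location(fake_irp)))
```

A 4-byte PacketType read would only see the low half of that value, which is why the qword load points at the pointer member of the union.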

Let’s also fix the PsGetCurrentProcessId (because I forgot to do it when going over PreProcCallback) so that it doesn’t take any argument (because it doesn’t).

To speed up a little, here’s what we have :

and the end of the function is :

In a nutshell, if after all the cases have been executed, the flag global array starts with “hitcon”, we win (remember the end of the main function in the user mode application ?).

Before jumping into the myterious_bunch_of_xors function, we need to finish unraveling the DriverEntry.

Remember :

Calling MmGetPhysicalAddress with a virtual address as argument (e.g. the address of flag) returns the physical address that it maps to. MmMapIoSpace does the “opposite” ; it takes a physical address range as input and creates a new virtual mapping to that physical address range. This means that after the call to MmMapIoSpace, there will be at least two virtual pages that are mapped to the same physical memory region.

In this case, the driver maps 0x1000 (4096) bytes of new virtual memory to the same physical region that flag was mapped onto. It then stores the first address of this new mapping inside data_140013170, and does the same thing for the PreProcCallback function and data_140013188.
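If the aliasing feels abstract, here is a loose user-mode analogy in plain Python : one bytearray plays the physical page and two memoryviews play the two virtual mappings. This is only an illustration of the idea, not of how the kernel APIs work, and all names are made up.

```python
# One "physical" page and two "virtual" views onto it.
physical_page = bytearray(0x1000)

original_view = memoryview(physical_page)   # flag, seen through its original mapping
alias_view = memoryview(physical_page)      # what the new mapping would give us
alias_plus_0x30 = alias_view[0x30:]         # a second alias, like the one for flag + 0x30

# Writes through the aliases...
alias_view[0] = 0x68
alias_plus_0x30[0] = 0xAA

# ...are visible through the original mapping, because the backing memory is shared.
print(hex(original_view[0]), hex(original_view[0x30]))
```

This is the property the driver relies on : code that only ever touches the aliases still modifies flag and the callback's code.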

data_140013178 also maps onto the same physical memory region as flag + 0x30 and data_140013180 onto the same physical memory region as PreProcCallback + 0x700, namely sub_140001b30 :

Very simple function. We’ll call it dec as in “decryption” !

Since we know that data_140013170 always points to flag, let’s patch the binary, and do the same for data_140013178, data_140013188 and data_140013180, shall we ?

If you look at all the code refs to data_140013170, all of them (except the first one) are mov rbx, qword [rel data_140013170], so they just load the address of flag into rbx. We’re going to replace them with lea rbx, qword [rel flag].

from binaryninja import LowLevelILOperation  # already in scope in Binary Ninja's scripting console

# /!\ Very important when messing around with the Binary Ninja API /!\
bv.begin_undo_actions()

flag_addr = 0x140003000
# Get all code refs to `data_140013170`.
for r in bv.get_data_var_at(0x140013170).code_refs:
    inst = r.llil

    # Make sure we're patching the right instructions
    if inst.operation == LowLevelILOperation.LLIL_SET_REG and inst.dest.name == 'rbx':
        delta = flag_addr - inst.address

        # `lea rbx, qword [rel flag]` is 7 bytes long.
        asm = f"lea rbx, [rip+{delta - 7:#x}]"
        print("Patching", hex(inst.address), str(inst), "with", asm)
        asm = bv.arch.assemble(asm)
        bv.write(inst.address, asm)

# /!\ Very important when messing around with the Binary Ninja API /!\
bv.commit_undo_actions()
print("Done")

Let’s turn this into a function to apply this to the other virtual address duplicates :


# /!\ Very important when messing around with the Binary Ninja API /!\
bv.begin_undo_actions()

def patch_vaddr_instructions(original_addr, vaddr_ptr):
    for r in bv.get_data_var_at(vaddr_ptr).code_refs:
        inst = r.llil
        if inst.operation == LowLevelILOperation.LLIL_SET_REG:
            delta = original_addr - inst.address

            # `lea reg, qword [rel pos]` is 7 bytes long.
            asm = f"lea {inst.dest.name}, [rip+{delta - 7:#x}]"
            print("Patching", hex(inst.address), str(inst), "with", asm)
            asm = bv.arch.assemble(asm)
            bv.write(inst.address, asm)

flag_addr = bv.get_symbol_by_raw_name("flag").address
patch_vaddr_instructions(flag_addr, 0x140013170)
patch_vaddr_instructions(flag_addr + 0x30, 0x140013178)

proccbk_addr = bv.get_symbol_by_raw_name("PreProcCallback").address
patch_vaddr_instructions(proccbk_addr + 0x700, 0x140013180)
patch_vaddr_instructions(proccbk_addr, 0x140013188)

# /!\ Very important when messing around with the Binary Ninja API /!\
bv.commit_undo_actions()
print("Done")
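Why can this patch be done in place ? A sketch of the arithmetic, assuming the usual x86-64 encodings : mov rbx, [rip+disp32] is 48 8b 1d followed by a 4-byte displacement, and lea rbx, [rip+disp32] is 48 8d 1d followed by the same, so both are 7 bytes long and the displacement is relative to the end of the instruction. The instruction addresses below (0x140001200 and 0x140001a00) are hypothetical, picked only for the example.

```python
import struct

LEA_RBX_RIP = bytes.fromhex("488d1d")  # REX.W, opcode 8D, ModRM 0x1d (rbx, [rip+disp32])
INSTRUCTION_LENGTH = 7                 # 3 opcode/ModRM bytes + 4 displacement bytes

def encode_lea_rbx_rip(inst_addr, target_addr):
    """Encode `lea rbx, [rip+disp]` placed at inst_addr and pointing at target_addr."""
    # rip already points past the instruction when the displacement is applied.
    disp = target_addr - (inst_addr + INSTRUCTION_LENGTH)
    # "<i" is a signed 32-bit little-endian value, so backward references work too.
    return LEA_RBX_RIP + struct.pack("<i", disp)

# Forward reference: hypothetical instruction address, flag at 0x140003000.
forward = encode_lea_rbx_rip(0x140001200, 0x140003000)
# Backward reference: hypothetical instruction address, PreProcCallback at 0x140001430.
backward = encode_lea_rbx_rip(0x140001a00, 0x140001430)

print(forward.hex(), backward.hex())
```

Since only one opcode byte and the displacement differ from the original mov, no surrounding instruction has to move.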

Wanna see a magic trick ? Go inside the myterious_bunch_of_xors function :

Now run the script and watch all these indirections just…

… huh ? Oh no ! Binary Ninja’s optimisations automatically replace reads of the dec function’s bytes with their static values, despite those bytes being changed at runtime.

I went on Binary Ninja’s official Slack and the very same day I’m writing this, a guy had the same problem :

There were some nice answers, like this one :

The only real solution was this :

I’ve tried several things, but with no success. I asked how to do it and I’m still without an answer :/

We’re going to have to do this “virtual address unraveling” for all four of them, except dec. Let’s comment this out :

# patch_vaddr_instructions(proccbk_addr + 0x700, 0x140013180)

So we’ll have to stick to

Untitled

And

Untitled

This. It could be better.

Since it seems to be used as an array of XOR keys in myterious_bunch_of_xors, let’s change data_140003030 to uint8_t key_array[0x100]. Why 0x100 ? I just selected everything from 0x140003030 to the next data label and Binja kindly displays the size of the selection in bytes at the bottom right corner :

So it seems that the driver is self-modifying. The only way it’s able to achieve this without triggering a fault is by clearing bit 16 (WP, Write Protect) of the CR0 register in sub_140001490 and setting it back to 1 later in sub_1400014b0 :

What this achieves is that it allows code running in ring 0 to write to read-only pages. So let’s call these two functions DisableWP and EnableWP.
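For reference, the bit twiddling these two functions perform boils down to the following sketch. The CR0 value used here is only an illustrative one (paging, protection and WP enabled), not something read from the challenge, and the helper names simply mirror the names chosen above.

```python
CR0_WP = 1 << 16  # Write Protect: when set, ring 0 also honours read-only pages

def disable_wp(cr0):
    """Clear CR0.WP, leaving every other bit untouched."""
    return cr0 & ~CR0_WP

def enable_wp(cr0):
    """Set CR0.WP again."""
    return cr0 | CR0_WP

cr0 = 0x80050033  # illustrative value with WP set
unprotected = disable_wp(cr0)
restored = enable_wp(unprotected)

print(hex(unprotected), hex(restored))
```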

Actually solving the challenge

Back into myterious_bunch_of_xors. The function takes an int32_t argument that seems to be the offset in key_array from which we’ll XOR the dec function and, by looking at the code in the switch cases of DispatchXXX, this argument can only be one of [0x00, 0x20, 0x40, 0x60, 0x80, 0xa0, 0xc0, 0xe0], depending on the IRP that was sent. At this point, I just made the educated guess that the IRPs must be sent in a specific order so that dec decrypts to coherent code at every step.
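A quick sanity check on those offsets, assuming that each offset selects two consecutive 16-byte keys (one at offset, one at offset + 16) : the eight offsets then tile the 0x100-byte key_array exactly, which fits the size found earlier. key_halves is a made-up helper for the illustration.

```python
KEY_ARRAY_SIZE = 0x100
KEY_LENGTH = 0x10

# The eight values the argument can take, according to the switch cases.
offsets = list(range(0x00, 0xE0 + 0x20, 0x20))

def key_halves(offset):
    """Byte index ranges of the two 16-byte keys assumed to belong to one offset."""
    first = range(offset, offset + KEY_LENGTH)
    second = range(offset + KEY_LENGTH, offset + 2 * KEY_LENGTH)
    return first, second

# Every byte of key_array should be used exactly once.
covered = sorted(i for off in offsets for half in key_halves(off) for i in half)

print([hex(off) for off in offsets], covered == list(range(KEY_ARRAY_SIZE)))
```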

Let’s write a function to do the decryption.

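from binaryninja import Transform, binaryview  # already in scope in Binary Ninja's scripting console

# /!\ Very important when messing around with the Binary Ninja API /!\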
bv.begin_undo_actions()

# Get those addresses.
dec_addr = bv.get_symbol_by_raw_name("dec").address
proccbk_addr = bv.get_symbol_by_raw_name("PreProcCallback").address
key_array_addr = bv.get_symbol_by_raw_name("key_array").address

# Disassemble and lift to LLIL.
def disass(buf):
    new_bv = binaryview.BinaryView.new(buf)
    new_bv.add_function(0, plat=bv.platform)
    return '\n'.join(map(str,
        new_bv.get_function_at(0).low_level_il.instructions))

# Do what `DriverEntry` does on `dec`, basically.
def do_initial_decryption():
    buf = Transform['XOR'].encode(bv.read(dec_addr, 16), {'key': bv.read(proccbk_addr, 16)})
    buf = Transform['XOR'].encode(buf, {'key': bv.read(proccbk_addr + 16, 16)})
    return buf

# Do what the first and last parts of `myterious_bunch_of_xors` do,
# but only the first part (or only the last one, depending on the offset).
# (The last part is the same as the first but with `offset := offset + 16`.)
def do_decrypt(buf, offset):
    assert(len(buf) == 16)
    buf = Transform['XOR'].encode(buf,
        {'key' : bv.read(key_array_addr + offset, 16)})
    return buf

dec = do_initial_decryption()

for offset in range(0, 0xe0 + 0x20, 0x20):
    print("Offset " + hex(offset),
        disass(do_decrypt(dec, offset)),
        sep='\n')
    print("="*10)

# /!\ Very important when messing around with the Binary Ninja API /!\
bv.commit_undo_actions()

This script produces this output :

Offset 0x0
temp0.d = ecx
ecx = eax
eax = temp0.d
ebp = (rdi + 0x2e988591).d
if (flag:s != flag:o) then 5 else 6 @ 0x9
jump(0x3c)
temp0.d = [rcx - 0x6d2d9af5].d
[rcx - 0x6d2d9af5].d = edi
edi = temp0.d
undefined
==========
Offset 0x20
rax = pop
st0 = st0 f* float.t([rcx + 0x499be7f0].d)
if (not(flag:z) && flag:s == flag:o) then 3 else 4 @ 0xd
jump(0x6e858a3d)
edx = edx ^ eax
undefined
==========
Offset 0x40
undefined
==========
Offset 0x60
push(rcx)
eax = sbb.d(eax, -0x61b75970, flag:c)
rbx = pop
jump(0x1a1b5d87)
==========
Offset 0x80
temp0.b = bl
bl = bl u>> cl
flag:c = unimplemented
[ffffffffaa687d9a].d = sbb.d([ffffffffaa687d9a].d, edx, flag:c)
r13 = r13 + rdi
undefined
==========
Offset 0xa0
esi = edi * 0x2fcf7038
undefined
==========
Offset 0xc0
al = sbb.b(al, 0x40, flag:c)
temp0.d = ecx
ecx = eax
eax = temp0.d
temp0.b = [rsp + (rax << 1) + 0x252aff20].b
[rsp + (rax << 1) + 0x252aff20].b = al
al = temp0.b
undefined
==========
Offset 0xe0
edx = zx.d(cl)
eax = edx
dl = dl << 3
eax = eax u>> 5
al = al | dl
<return> jump(pop)
==========

None of these routines makes sense… except the last one !

edx = zx.d(cl)
eax = edx
dl = dl << 3
eax = eax u>> 5
al = al | dl
<return> jump(pop)

It takes the first argument cl and basically returns cl << 3 | cl >> 5, i.e. an 8-bit rotate left by 3. That’s right ! The guess is confirmed !
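In Python terms, that routine is a byte rotation. A small sketch (rol8 is a name chosen for the example) :

```python
def rol8(value, count):
    """Rotate an 8-bit value left by count bits."""
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF

# Same computation as the decrypted routine, written out for one byte.
sample = 0x81
print(hex(rol8(sample, 3)), hex(((sample << 3) | (sample >> 5)) & 0xFF))
```

The & 0xFF mask stands in for the fact that the routine works on the 8-bit registers al and dl.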

Let’s just add

offsets = []

offsets += [ 0xe0 ]
# Do the first and last part of `myterious_bunch_of_xors`
dec = do_decrypt(do_decrypt(dec, offsets[-1]), offsets[-1] + 0x10)

right before the for loop and run the script again :

Offset 0x0
temp2.d = edx
temp3.d = eax
temp0.d = divs.dp.d(temp2.d:temp3.d, [rcx - 0x4c].d)
temp4.d = edx
temp5.d = eax
temp1.d = mods.dp.d(temp4.d:temp5.d, [rcx - 0x4c].d)
eax = temp0.d
edx = temp1.d
undefined
==========
Offset 0x20
__out_immb_oeax(0xa0, eax, rflags.d)
rdi = pop
undefined
==========
Offset 0x40
cl = cl ^ 0x26
eax = zx.d(cl)
<return> jump(pop)
==========
Offset 0x60
__out_dx_al(dx, al, rflags.d)
undefined
==========
Offset 0x80
temp0, rdi = __insd(rdi, dx, rflags.d)
[rdi].d = temp0.d
temp0.d = ecx
ecx = eax
eax = temp0.d
undefined
==========
Offset 0xa0
undefined
==========
Offset 0xc0
[8df7fc620fa3473a].d = eax
dl = sbb.b(dl, [r10].b, flag:c)
rip = __int1()
push(rax)
rbp = pop
undefined
==========
Offset 0xe0
al = 0xcc
undefined
==========

Now only the routine decrypted with the switch case 0x20 (offset 0x40) makes sense !

Let’s repeat this process and at the end we have

offsets += [ 0xe0 ]
dec = do_decrypt(do_decrypt(dec, offsets[-1]), offsets[-1] + 0x10)

offsets += [ 0x40 ]
dec = do_decrypt(do_decrypt(dec, offsets[-1]), offsets[-1] + 0x10)

offsets += [ 0xc0 ]
dec = do_decrypt(do_decrypt(dec, offsets[-1]), offsets[-1] + 0x10)

offsets += [ 0x00 ]
dec = do_decrypt(do_decrypt(dec, offsets[-1]), offsets[-1] + 0x10)

offsets += [ 0x20 ]
dec = do_decrypt(do_decrypt(dec, offsets[-1]), offsets[-1] + 0x10)

offsets += [ 0x80 ]
dec = do_decrypt(do_decrypt(dec, offsets[-1]), offsets[-1] + 0x10)

offsets += [ 0x60 ]
dec = do_decrypt(do_decrypt(dec, offsets[-1]), offsets[-1] + 0x10)

offsets += [ 0xa0 ]
dec = do_decrypt(do_decrypt(dec, offsets[-1]), offsets[-1] + 0x10)
# No decryption makes sense after this ...

Which yields the following dec routines :

// Offset 0xe0
edx = zx.d(cl)
eax = edx
dl = dl << 3
eax = eax u>> 5
al = al | dl
<return> jump(pop)

// Offset 0x40
cl = cl ^ 0x26
eax = zx.d(cl)
<return> jump(pop)

// Offset 0xc0
edx = zx.d(cl)
eax = edx
dl = dl << 4
eax = eax u>> 4
al = al | dl
<return> jump(pop)

// Offset 0x0
eax = (rcx + 0x37).d
<return> jump(pop)

// Offset 0x20
eax = (rcx + 0x7b).d
<return> jump(pop)

// Offset 0x80
edx = zx.d(cl)
eax = edx
dl = dl << 7
eax = eax u>> 1
al = al | dl
<return> jump(pop)

// Offset 0x60
eax = zx.d(cl)
eax = eax * 0xad
<return> jump(pop)

// Offset 0xa0
edx = zx.d(cl)
eax = edx
dl = dl << 2
eax = eax u>> 6
al = al | dl
<return> jump(pop)
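I did the search above by eye, but the same repeat-and-inspect loop could be automated. Here is a self-contained toy model of it in plain Python. Everything in it is an assumption made for the sake of the example : the routines are random bytes ending with a ret (0xc3) and 0xcc padding, the keys are generated so that exactly one order is coherent, and looks_valid is a crude stand-in for “the disassembly makes sense”.

```python
import random

BLOCK = 16

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def looks_valid(blob):
    """Toy check: a ret (0xc3) followed by at least 4 bytes of 0xcc padding."""
    stripped = blob.rstrip(b"\xcc")
    return stripped.endswith(b"\xc3") and len(blob) - len(stripped) >= 4

def random_bytes(rng, n):
    return bytes(rng.randrange(256) for _ in range(n))

def build_toy(order, rng):
    """Create an initial state and per-offset (first, second) keys coherent only in `order`."""
    initial = state = random_bytes(rng, BLOCK)
    keys = {}
    for off in order:
        routine = (random_bytes(rng, rng.randint(2, 8)) + b"\xc3").ljust(BLOCK, b"\xcc")
        first = xor(state, routine)        # first key turns the state into the routine
        second = random_bytes(rng, BLOCK)  # second key scrambles it again
        keys[off] = (first, second)
        state = xor(routine, second)
    return initial, keys

def recover_order(initial, keys):
    """Greedy search: at each step keep the only offset that decrypts to valid code."""
    state, order, remaining = initial, [], set(keys)
    while remaining:
        hits = [off for off in sorted(remaining)
                if looks_valid(xor(state, keys[off][0]))]
        if len(hits) != 1:
            break  # ambiguous or dead end: a human has to look
        off = hits[0]
        order.append(off)
        remaining.remove(off)
        state = xor(xor(state, keys[off][0]), keys[off][1])
    return order

true_order = [0xE0, 0x40, 0xC0, 0x00, 0x20, 0x80, 0x60, 0xA0]
initial, keys = build_toy(true_order, random.Random(0))
recovered = recover_order(initial, keys)
print([hex(off) for off in recovered])
```

On the real driver the validity check would have to be something smarter, for example checking that the lifted routine ends with a return and contains no undefined instruction.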

Let’s flag this then :

def dec(c):
    c = ((c << 3) | (c >> 5)) & 0xFF
    c = c ^ 0x26
    c = ((c << 4) | (c >> 4)) & 0xFF
    c = (c + 0x37) & 0xFF
    c = (c + 0x7b) & 0xFF
    c = ((c << 7) | (c >> 1)) & 0xFF
    c = (c * 0xad) & 0xFF
    c = ((c << 2) | (c >> 6)) & 0xFF
    return c

enc_flag = bv.get_data_var_at(bv.get_symbol_by_raw_name("flag").address)
flag = ''.join(map(lambda c: chr(dec(c)), enc_flag.value))
print(flag)
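As a cross-check that does not need Binary Ninja, every step of dec is invertible on bytes (rotations, XOR, additions, and a multiplication by the odd constant 0xad), so the whole chain is a bijection. The sketch below re-declares dec with a rol8 helper and builds the inverse ; enc, rol8, ror8 and INV_AD are names made up for the example, and 0x25 is the inverse of 0xad modulo 256.

```python
def rol8(c, n):
    """Rotate the byte c left by n bits (1 <= n <= 7)."""
    return ((c << n) | (c >> (8 - n))) & 0xFF

def ror8(c, n):
    """Rotate the byte c right by n bits (1 <= n <= 7)."""
    return rol8(c, 8 - n)

def dec(c):
    # Same eight steps as the recovered routines, in the recovered order.
    c = rol8(c, 3)
    c ^= 0x26
    c = rol8(c, 4)
    c = (c + 0x37) & 0xFF
    c = (c + 0x7B) & 0xFF
    c = rol8(c, 7)
    c = (c * 0xAD) & 0xFF
    c = rol8(c, 2)
    return c

INV_AD = 0x25  # 0xad * 0x25 = 6401 = 25 * 256 + 1

def enc(c):
    # Undo each step of dec, last step first.
    c = ror8(c, 2)
    c = (c * INV_AD) & 0xFF
    c = ror8(c, 7)
    c = (c - 0x7B) & 0xFF
    c = (c - 0x37) & 0xFF
    c = ror8(c, 4)
    c ^= 0x26
    c = ror8(c, 3)
    return c

round_trip_ok = all(enc(dec(c)) == c for c in range(256))
is_bijection = sorted(dec(c) for c in range(256)) == list(range(256))
print(round_trip_ok, is_bijection)
```

A round trip like this is a cheap way to catch a typo in one of the constants before blaming the recovered order.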
