‘De-virtualization’ challenge from MalwareTech

Amey Chavan
6 min read · Oct 16, 2022

The de-virtualization challenge from MalwareTech shows an interesting way malware can try to hinder reverse engineering: by implementing a custom virtual machine (VM) that runs its own custom bytecode!

In the previous article (called ‘Shellcode’ challenges from MalwareTech) we explored & extracted flags from a couple of Shellcode challenges. This challenge is different: it manipulates some pre-defined bytes by performing various operations on them.

For this challenge, it’s not required to execute the provided binary & the use of any debugger/dumper is not allowed. It should be solved via static analysis only.

In the downloaded challenge ZIP, the first file is ram.bin, a copy of the VM’s RAM that contains the encrypted flag & some bytecode to decrypt it. We don’t need this .bin file because it’s possible to extract the flag using the provided executable, vm1.exe, & some Python scripting. 🙂

The downloaded challenge ZIP contains two input files.

Analysis of ‘vm1.exe’

  • After opening vm1.exe in IDA, we select the start() function, which is the main entry point of this executable:
Picture 1.1: The start() entry function of vm1.exe executable.
  • We’ll follow the decompiled code view called “Pseudocode-A” most of the time because it’s helpful to explain/understand the general flow. But sometimes we may need the assembly view called “IDA View-A”.
  • In “Pseudocode-A” of picture 1.1, after some local variable declarations, line 7 appears to initialize an MD5 hash. Line 8 calls GetProcessHeap(), which returns a handle to the default heap of the calling process; it is stored in ProcessHeap. Line 9 calls HeapAlloc() to allocate a block of 0x1FB (i.e., 507) bytes from that heap. IDA shows the pointer to this allocated memory being stored in the dword_40423C variable.
  • Many of the variable names are not very meaningful, so we do a quick renaming (highlighted with red boxes):
Picture 1.2: Before & after renaming the variables to make them more meaningful.
  • As marked in picture 1.2, the dword_40423C variable is now renamed to heap_memory_block & unk_404040 is now renamed to unknown_507_bytes.
  • Also in picture 1.2, line 9 allocates the block of 0x1FB (i.e., 507) bytes from the heap & line 10 calls memcpy() to copy 0x1FB (i.e., 507) bytes from the location labelled unknown_507_bytes into that block.
  • The byte values stored at unknown_507_bytes, shown in hexadecimal, will be used later. Those bytes are:
Picture 1.3: All 0x1FB (i.e., 507) bytes at unknown_507_bytes.
  • The second file, ram.bin, contains a similar sequence of bytes when opened.
  • Going back to the pseudocode flow in picture 1.2, line 11 calls the sub_4022E0() function. Let’s examine its decompiled view:
Picture 1.4: sub_4022E0() function definition.
  • In picture 1.4, the left side is the “Default Decompiled Code” with IDA’s not-so-meaningful variable names. The right side is the “Renamed Code”, where we renamed a few variables to be more meaningful; red boxes highlight what changed from the default.
  • Following the “Renamed Code” in picture 1.4, there is a single do-while loop that works on heap_memory_block (from picture 1.2). On each iteration it reads bytes from heap_memory_block using idx1 or idx2 as the index & passes them to sub_402270(). The value returned by that call decides whether the loop continues or terminates.
  • To understand what sub_402270() really does, we should check its definition:
Picture 1.5: sub_402270() function definition.
  • Again, in picture 1.5, the right side is the “Renamed Code”, where we renamed a few variables to be more meaningful. Red boxes highlight what changed from the default.
  • Following the logic of the “Renamed Code” in picture 1.5, there is a single switch statement whose expression is the a1 parameter.
  • There are 3 cases in total plus a default one. Case 1 simply assigns parameter a3 to heap_memory_block[a2]. Case 2 reads the byte at heap_memory_block[a2] & stores it in byte_404240. Case 3 XORs the byte at heap_memory_block[a2] with the value in byte_404240 & stores the result back in heap_memory_block[a2].
  • Each of these cases breaks & the function then returns 1, so the do-while loop in sub_4022E0() (picture 1.4) keeps iterating. The default case returns 0, which terminates the do-while loop.
  • To find out how the bytes copied from unknown_507_bytes (picture 1.2) are changed by these calls, we need to re-implement the logic of the two functions, sub_4022E0() & sub_402270().
  • Let’s create a Python script & start by defining the sequence of bytes as unknown_507_bytes:
Picture 1.6: Sequence of bytes defined by unknown_507_bytes.
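Since the screenshot is not reproduced here, the following is a minimal sketch of one way to turn a hex dump copied from IDA into a Python byte sequence. The hex string is a short hypothetical placeholder & not the real challenge data; the actual 0x1FB bytes have to be copied from unknown_507_bytes in IDA. The names hex_dump & EXPECTED_SIZE are ours, chosen for illustration.

```python
# Hypothetical placeholder dump: NOT the real challenge bytes.
# Paste the hex values copied from IDA's hex view here instead.
hex_dump = """
DE AD BE EF 00 01 02 03
10 20 30 40
"""

# bytes.fromhex() skips ASCII whitespace (Python 3.7+), so line breaks are fine.
unknown_507_bytes = bytes.fromhex(hex_dump)

EXPECTED_SIZE = 0x1FB  # 507, the size passed to HeapAlloc() & memcpy()

print(f"{len(unknown_507_bytes)} bytes loaded")
if len(unknown_507_bytes) != EXPECTED_SIZE:
    print(f"warning: expected {EXPECTED_SIZE} bytes, the dump looks incomplete")
```

The size check is a cheap guard against a truncated copy & paste, which would otherwise only show up later as an IndexError or a garbled result.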
  • As seen in picture 1.5, the definition of sub_402270() refers to heap_memory_block & byte_404240, yet neither is a function parameter or a local variable. So, we treat them as globals & define them as:
Picture 1.7: Defining heap_memory_block & byte_404240 as global.
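As a rough sketch of what such definitions could look like (the screenshot is not reproduced here), the HeapAlloc() + memcpy() pair can be modelled as a mutable copy of the byte sequence. The 16 stand-in bytes are hypothetical, & initializing byte_404240 to 0 is our assumption.

```python
# Hypothetical stand-in for the 0x1FB bytes copied out of IDA.
unknown_507_bytes = bytes(range(16))

# HeapAlloc() + memcpy() in the binary: a separate, writable copy.
heap_memory_block = bytearray(unknown_507_bytes)

# byte_404240 is a single global byte that the VM reads & writes.
# Starting it at 0 is an assumption made for this sketch.
byte_404240 = 0

# Writing to the copy leaves the original bytes untouched,
# which lets us compare "before" & "after" later on.
heap_memory_block[0] ^= 0xFF
print(unknown_507_bytes[0], heap_memory_block[0])  # prints: 0 255
```

Using bytearray rather than bytes matters here because bytes objects are immutable, while the VM needs to write into its memory.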
  • The sub_402270() function (which uses a switch statement) can be defined as follows (here with simple if-else conditions):
Picture 1.8: Definition for sub_402270() function.
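In case the screenshot is unavailable, here is a minimal sketch of the three cases described above, written as if-else conditions. The 8-byte memory & the demo calls are hypothetical toy values, not taken from the challenge.

```python
heap_memory_block = bytearray(8)  # hypothetical tiny RAM, not the real 507 bytes
byte_404240 = 0                   # the single global byte used by cases 2 & 3


def sub_402270(a1: int, a2: int, a3: int) -> int:
    """Execute one VM operation; return 1 to continue, 0 to stop."""
    global byte_404240
    if a1 == 1:    # case 1: store a3 at heap_memory_block[a2]
        heap_memory_block[a2] = a3
    elif a1 == 2:  # case 2: load heap_memory_block[a2] into byte_404240
        byte_404240 = heap_memory_block[a2]
    elif a1 == 3:  # case 3: XOR heap_memory_block[a2] with byte_404240
        heap_memory_block[a2] ^= byte_404240
    else:          # default: tell the caller to stop
        return 0
    return 1


# Toy demo with made-up values.
sub_402270(1, 0, 0x55)  # memory[0] = 0x55
sub_402270(1, 1, 0x13)  # memory[1] = 0x13
sub_402270(2, 0, 0)     # byte_404240 = memory[0]
sub_402270(3, 1, 0)     # memory[1] ^= byte_404240
print(hex(heap_memory_block[1]))  # 0x13 ^ 0x55 -> prints 0x46
```

The global statement is needed only for byte_404240, because it is reassigned; heap_memory_block is modified in place, so no declaration is required for it.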
  • Similarly, the sub_4022E0() function (which uses a do-while loop) can be defined as:
Picture 1.9: Definition for sub_4022E0() function.
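Without the screenshot, the loop can be sketched roughly as below. Two things are assumptions on our side & should be checked against the decompiled code: that the three values passed to sub_402270() are read from consecutive bytes, & where the first of them sits (the hypothetical code_offset parameter). To keep the sketch self-contained, the handler is passed in as the execute argument & replaced by a stub that only records what it receives.

```python
from typing import Callable


def sub_4022E0(ram: bytearray, code_offset: int,
               execute: Callable[[int, int, int], int]) -> None:
    """Read 3 bytes at a time from ram & run them until execute() returns 0."""
    idx = code_offset  # ASSUMPTION: the real start offset must come from IDA
    while True:        # emulates do-while: the body runs at least once
        a1, a2, a3 = ram[idx], ram[idx + 1], ram[idx + 2]
        idx += 3
        if not execute(a1, a2, a3):
            break


# Stub handler: records each call & stops on anything other than 1, 2 or 3.
trace = []


def record(a1: int, a2: int, a3: int) -> int:
    trace.append((a1, a2, a3))
    return 1 if a1 in (1, 2, 3) else 0


# Hypothetical toy memory: 4 data bytes followed by three 3-byte groups.
toy_ram = bytearray([0, 0, 0, 0,  1, 0, 0x41,  2, 0, 0,  0xFF, 0, 0])
sub_4022E0(toy_ram, 4, record)
print(trace)  # prints: [(1, 0, 65), (2, 0, 0), (255, 0, 0)]
```

Python has no do-while, so `while True` with a `break` at the end of the body is the usual way to keep the "run once, then test" behaviour.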
  • For the sake of observation (in picture 1.9), we print both the original & the changed heap_memory_block. Now let’s call sub_4022E0(), run the script & check the updates to heap_memory_block:
Picture 1.10: Calling sub_4022E0() function.
Picture 1.11: Original vs. Changed bytes of heap_memory_block.
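Comparing two long byte dumps by eye is error-prone, so a small helper that lists only the positions that differ can help. This is a sketch with hypothetical "before" & "after" values; changed_bytes() is our own helper name, not something from the binary.

```python
def changed_bytes(original: bytes, changed: bytes):
    """Return (offset, old, new) for every position where the buffers differ."""
    return [(i, old, new)
            for i, (old, new) in enumerate(zip(original, changed))
            if old != new]


original = bytes([0x13, 0x1A, 0x00, 0x7F])  # hypothetical "before" bytes
changed = bytes([0x46, 0x4C, 0x00, 0x7F])   # hypothetical "after" bytes

for offset, old, new in changed_bytes(original, changed):
    print(f"offset {offset:#06x}: {old:#04x} -> {new:#04x}")
```

In the real script, the two arguments would be unknown_507_bytes & heap_memory_block after the VM loop has finished.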
  • From picture 1.11, it’s clear that various bytes have changed & that many of the first bytes now fall in the ASCII range of letters. So, let’s try converting the bytes of heap_memory_block to their character representation:
Picture 1.12: Convert bytes from heap_memory_block to characters & print them.
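One possible way to do this conversion is sketched below: decode the run of printable ASCII bytes at the start of the buffer & stop at the first byte outside that range. The buffer contents are a hypothetical example, not the real challenge output, & leading_text() is our own helper name.

```python
def leading_text(buffer: bytes) -> str:
    """Decode the run of printable ASCII bytes at the start of buffer."""
    chars = []
    for value in buffer:
        if not 0x20 <= value <= 0x7E:  # stop at the first non-printable byte
            break
        chars.append(chr(value))
    return "".join(chars)


# Hypothetical post-execution memory, NOT the real challenge output.
heap_memory_block = bytearray(b"toy-flag\x00\x01\x02\x03")
print(leading_text(heap_memory_block))  # prints: toy-flag
```

Stopping at the first non-printable byte keeps the bytecode & other non-text data that follow in memory out of the printed result.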
  • Putting together pictures 1.6 to 1.12, the complete Python script so far looks like:
Picture 1.13: Complete Python script.
  • Now when we run this script (from picture 1.13), it prints:
Resulting flag printed…! 🤯
  • We captured the flag! The challenge page accepts it on submission:
Captured the flag…! 🚀

And we finished the challenge! As an overview, we first statically analyzed the vm1.exe executable in IDA by going through its flow & functions. Then we replicated the two functions of interest in Python to check & understand how the pre-defined bytes get changed. In the end, we converted those bytes to their character representation & got the challenge flag… 🙂

Thank you for being here throughout this process; hopefully you enjoyed it. Special thanks to MalwareTech for providing these challenges. Please consider sharing, following & subscribing to get notified of future writeups… 😊

Reverse Engineering is an art… The more we explore, the more our curiosity grows!

Amey Chavan

Passionate about programming, Software Engineering & gaming... 😃 GitHub/LinkedIn/Twitter: apchavan