Most variants of storm worms are packed with the custom packer "Tibs". Tibs packer is a polymorphic packer, which also has the capability of anti-emulation. Recently we encountered a wave of stormcodec8.exe. When we carried on an analysis for this variant, we found some interesting features in it. Below are some highlights of our analysis.
Similar to the publicly-released packers (UPX, FSG, etc.), a custom packer has a data structure to store the packer information, as well. Such a data structure is essential to a packer. It stores the information which the packer stub can use to reform its original shape. For example, where to locate the encrypted data, what's the length of the encrypted data, what are the parameters for the decompression algorithm, how to reach the OEP (original entry point), and so on. Of course such a data structure is always hidden in the massive disassembly codes. Sometimes it's not easy to locate it. But once you get it, you control the packer :).
In this variant of Tibs, the packer data structure is relatively simple. First, it gets the base of the data structure:
Let's do a snapshot from the address 0x00401331, which is the base of the packer data structure. Through analyzing the disassembly codes, we will get two important members of the structure. One is the pointer to the encrypted data (0x00401341), and the other one is the size of the encrypted data (0x00401345).
This packer then comes to the anti-emulation part after a few junk codes, which is the same as the majority of variants of storm worms. In the very beginning, the storm worms always used FPU and MMX instructions to break AV emulators. After that, fake API (uncommonly used APIs which the emulators may not support) calls in the decryption loop are widely used. But this variant uses quite a different way. Let's take a look at its disassembly code in IDA Pro:
It looks like a common loop, right? But when you look into the codes carefully, you may find something strange. The counter is initiated with 0x7f2d, as mentioned above, it is actually the size of the encrypted data (Part 1). And when you look at the loop, in part 2 of the picture, you will see that the counter assigned its value to eax, and then increases by 1 each time. In part 3, that's the test condition, you may have noticed that the control flow will jump out of the loop only when the register eax overflows! Let's calculate how many loops will occur in this part. Certainly it's not a complicated mathematics problem. We can easily to get the result: 0xFFFFFFFF - 0x7F2D + 0x1 = 0xFFFF80d3 = 4,294,934,739 !
So what do 4,294,934,739 loops mean to an emulator? As is known to all, most emulators have a predefined maximum timeout or maximum operation count for the sake of performance and robustness. 4,294,934,739 loops will most probably make the emulator stop before reaching the decryption loop, not to mention the real malicious code.
After going through the anti-emulation part, we reach the decryption codes. Tibs packers usually use TEA (Tiny Encryption Algorithm) or TEA + UPX against cryption analysis. The following picture shows a part of the decryption routine in this variant:
We can use a PEiD plugin called Krypto ANALyzer (KANAL) to detect whether a standard algorithm is employed by this packer. After scanning the decryption routine by using KANAL, we get the following results, the constant "0x9E3779B9" is used in some standard compression/decompression algorithms.
You probably know that the standard TEA operates on 64-bit blocks and uses a 128-bit key (maybe four 32-bit keys is more accurate). After we analyzed this decryption algorithm, we found that it differs from TEA in some aspects. For example, it operates on 32-bit blocks. That is to say, each time a double word data will be decrypted.
There are also four obvious 32-bit keys in this algorithm, but unlike TEA, only one of them is selected each time. Actually, the counter for the decryption loop is used to decide which key will be used. And in this algorithm, the latest 32-bit decrypted data will participate in the decryption process of the next 32-bit data. This makes it much more complicated than the standard TEA. Although there are many differences between these two algorithms, their frameworks and the operations on the encrypted data are similar. So, I guess it's a modified version of TEA.
Then, what is the encryption data after it gets decrypted? Actually, they are a stub and an incomplete PE image in memory. The stub will conduct the processes to rebuild a complete PE image in the heap based on the embedded PE image. First, the stub will allocate a buffer for the new image. After that it will copy the headers, reform the sections, fix the import and relocation table, and finally jump to the entry point of the new PE image, starting to run it. The following picture shows the beginning part of the stub:
Compared to most publicly-released packers, custom packers are usually more sophisticated. In this case, headers (including the section header) are not encrypted. They will be copied to the heap directly. But all the sections in the embedded PE image are encrypted. So, in order to reform the sections, another decryption is needed when rebuilding the sections. Let's take a look at part of the routine for decrypting the PE sections:
After the headers and all the sections are ready, the stub codes finally fix the import table and relocation table. And at this point, the PE image in the heap is a complete one and then stub codes will transfer the control to it.
Now we can summarize the whole process of this packer with following flow chart:
We notice that in this case the original malware is embedded. It means that the malware is not combined with the packer stub very compactly to some extent. So it's probably another clue that the storm worm gang are using their tools to produce polymorphic Tibs packers.
Security Researcher: Xiaodong Tan