Morphine 3.5 Part 2

Analysis of anti-debugging features


A while back, I posted a paper on unpacking Morphine 3.5 on Tuts4you. I succeeded in unpacking the target thanks to Kao, but I had not studied the packer's anti-debugging features. I had planned on revisiting this at some point, and I've finally decided to do that today. The goal is not only to understand targets that implement these anti-debugging features, but also to understand why I wasn't able to debug the target with OllyDbg at the time.

As it turns out, there are quite a few anti-debugging features, some of which are targeted not at Olly but at other software debuggers. It's worth noting that the source code for Morphine is readily available as well. The anti-debugging methods we'll focus on are the BeingDebugged flag, the OllyDBG window class search, RDTSC, and the CRC check:

BeingDebugged Flag

Perhaps one of the most commonly used functions for anti-debugging is Kernel32.IsDebuggerPresent(). This function, as well as its bypass, is well documented by previous reversers. However, I'll still go over it since it wasn't that obvious to me as a beginner.

IsDebuggerPresent checks the PEB (Process Environment Block) for the flag at offset 0x2 (BeingDebugged). Olly, like other debuggers, has a nifty way of accessing the PEB through its command line plugin. If we go to plugins->Command Line and type

dump fs:[30]+2

This will bring up the BeingDebugged flag in the dump window. Ideally this should be 0, so you can edit->fill with zeros to change that value. This is effectively what plugins such as OllyAdvanced do for you. It's also worth noting what this syntax actually means.

The fs: portion refers to the FS segment register, whose base points to the TIB (Thread Information Block).

[30] reads the pointer stored at offset 0x30 of the TIB, which is the address of the PEB.

+2 is the offset into the PEB structure where the BeingDebugged flag is stored.

If you're curious, more information is available on the web about all of the different information the PEB stores.
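To make the offset concrete, here is a minimal, portable sketch in C. It does not touch the real PEB: FakePebHead is a hypothetical struct that only mirrors the first three one-byte fields of the 32-bit PEB, and the two helper names are made up for illustration. The functions operate on whatever buffer you hand them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the first bytes of the 32-bit PEB.
   Only the fields up to BeingDebugged are modelled here. */
typedef struct {
    uint8_t InheritedAddressSpace;    /* offset 0x0 */
    uint8_t ReadImageFileExecOptions; /* offset 0x1 */
    uint8_t BeingDebugged;            /* offset 0x2 */
} FakePebHead;

/* Returns non-zero when the byte at PEB+2 is set. */
int being_debugged(const void *peb)
{
    const uint8_t *p = (const uint8_t *)peb;
    return p[offsetof(FakePebHead, BeingDebugged)] != 0;
}

/* What filling the byte with zeros in the dump window amounts to. */
void clear_being_debugged(void *peb)
{
    uint8_t *p = (uint8_t *)peb;
    p[offsetof(FakePebHead, BeingDebugged)] = 0;
}
```

Calling being_debugged() on a buffer whose third byte is 1 reports a debugger; after clear_being_debugged() it no longer does, which is the same effect as zeroing the flag by hand.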

OllyDBG Window Class Search

If we set a breakpoint from the command line with bp Kernel32.IsDebuggerPresent() we'll land on the first check that Morphine does to find Olly. If we follow it out to the return and re-analyze the code, we'll see this portion of disassembly:

005113E4   .  85C0          TEST EAX,EAX
005113E6   .  0F85 22020000 JNZ UnPackMe.0051160E
...
0051141B   .  8B85 78FEFFFF MOV EAX,DWORD PTR SS:[EBP-188]      ; user32.FindWindowA
00511421   .  C785 FDFDFFFF>MOV DWORD PTR SS:[EBP-203],594C4C4F ; "OLLY"
0051142B   .  C785 01FEFFFF>MOV DWORD PTR SS:[EBP-1FF],474244   ; "DBG"
00511435   .  89EB          MOV EBX,EBP
00511437   .  81EB 03020000 SUB EBX,203
0051143D   .  6A 00         PUSH 0                              ; lpWindowName = NULL
0051143F   .  53            PUSH EBX                            ; lpClassName = "OLLYDBG"
00511440   .  FFD0          CALL EAX                            ; Call user32.FindWindowA
00511442   .  85C0          TEST EAX,EAX                        ; EAX holds window handle (HWND) if found
00511444   .  0F85 C4010000 JNZ UnPackMe.0051160E               ; badboy

I've intentionally skipped over the first check as it doesn't apply to Olly 1.x. If we look at the comments, it's quite clear what Morphine is doing here: it calls FindWindowA looking for a window with the class name "OLLYDBG". I'm not sure if there's a plugin available to bypass this, since you can't rename Olly without also hex editing your plugins. The easiest solution here is to use a custom Olly that has already been renamed, like FOFF. You can also change EAX to 0 before the jump, or flip ZF, to bypass this; however, you'd have to do this every time you restart the target.
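The class name never appears as a plain string in the listing because it is built on the stack from two DWORD immediates. A small sketch of that construction follows. The helper names are my own, and the bytes are written out explicitly in little-endian order so the result doesn't depend on the host machine.

```c
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value byte by byte in little-endian order, the way
   MOV DWORD PTR [mem],imm32 lays it out on x86. */
static void store_le32(unsigned char *dst, uint32_t value)
{
    dst[0] = (unsigned char)(value & 0xFF);
    dst[1] = (unsigned char)((value >> 8) & 0xFF);
    dst[2] = (unsigned char)((value >> 16) & 0xFF);
    dst[3] = (unsigned char)((value >> 24) & 0xFF);
}

/* Rebuild the class name from the two immediates in the listing.
   out must have room for 8 bytes (7 characters plus the terminator). */
void build_class_name(char out[8])
{
    unsigned char buf[8];
    store_le32(buf, 0x594C4C4Fu);     /* written to [EBP-203] */
    store_le32(buf + 4, 0x00474244u); /* written to [EBP-1FF] */
    memcpy(out, buf, sizeof buf);
}
```

The top byte of the second immediate is zero, so it doubles as the string terminator and the buffer reads as "OLLYDBG".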

RDTSC

This check is quite interesting because I hadn't noticed it in x32dbg. Let's refer to the following disassembly from Morphine:

005115B3   > \0F31          RDTSC
005115B5   .  8985 FDFDFFFF MOV DWORD PTR SS:[EBP-203],EAX
005115BB   .  0F31          RDTSC
005115BD   .  01D8          ADD EAX,EBX
005115BF   .  8985 01FEFFFF MOV DWORD PTR SS:[EBP-1FF],EAX
005115C5   .  0F31          RDTSC
005115C7   .  29D8          SUB EAX,EBX
005115C9   .  8985 05FEFFFF MOV DWORD PTR SS:[EBP-1FB],EAX

The RDTSC instruction reads the CPU's 64-bit time-stamp counter, placing the high-order 32 bits into EDX and the low-order 32 bits into EAX. The idea here is to take multiple timestamps to determine whether or not the module is being debugged. If the time between instructions is above a certain threshold, it's reasonable to assume the module is being single stepped through a debugger. OllyAdvanced provides a method to bypass this using the Anti-RDTSC option. I personally haven't studied what this plugin option does behind the scenes exactly, though it does stop the debugger from getting caught on the instruction. Alternatively, you can generally set a breakpoint after the instructions and run to it to bypass this. In targets where, unlike this example, the timestamps are followed by comparisons, you could also patch the routine out entirely with NOPs.
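As a rough illustration of the general timing idea (not of Morphine's code, which only stores the values in the snippet above), a check of this kind could look like the sketch below. The function names and the threshold are hypothetical. The register values are passed in as plain integers so the example stays portable.

```c
#include <stdint.h>

/* RDTSC returns the counter split across EDX (high half) and EAX (low half);
   this joins the two halves into one 64-bit value. */
uint64_t combine_tsc(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Hypothetical check: report "stepped" when more than `threshold` cycles
   passed between two readings. The threshold is an assumed tuning value. */
int looks_stepped(uint64_t first, uint64_t second, uint64_t threshold)
{
    return (second - first) > threshold;
}
```

Two readings taken a few hundred cycles apart stay under any sensible threshold, while a pause at a breakpoint or a single step between them adds millions of cycles.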

CRC

A CRC check tests the integrity of a memory region or module by hashing its bytes and comparing the result against a known value. The specific check for Morphine is located here:

005115D4   $  58            POP EAX
005115D5   .  89C3          MOV EBX,EAX                     ; mostly all setup for crc
005115D7   .  83C0 31       ADD EAX,31                      ; start address, size of byte array, etc
005115DA   .  81EB F9010000 SUB EBX,1F9
005115E0   .  89C2          MOV EDX,EAX
005115E2   .  29DA          SUB EDX,EBX
005115E4   .  89DE          MOV ESI,EBX
005115E6   .  89D1          MOV ECX,EDX
005115E8   .  31DB          XOR EBX,EBX
005115EA   .  31D2          XOR EDX,EDX
005115EC   .  B8 01000000   MOV EAX,1
005115F1   >  0FB61E        MOVZX EBX,BYTE PTR DS:[ESI]    ; this is where it begins the hashing
005115F4   .  46            INC ESI
005115F5   .  01C8          ADD EAX,ECX
005115F7   .  01D8          ADD EAX,EBX
005115F9   .  31D8          XOR EAX,EBX
005115FB   .  31C2          XOR EDX,EAX
005115FD   .  49            DEC ECX
005115FE   .^ 75 F1         JNZ SHORT UnPackMe.005115F1
00511600   .  3B95 64FEFFFF CMP EDX,DWORD PTR SS:[EBP-19C]   ; [EBP-19C] = 0x0003B173 (bytes 73 B1 03 00 in memory)
00511606   .  75 06         JNZ SHORT UnPackMe.0051160E      ; badboy

This function is designed to check for patches or changes in memory. If, for example, you have software breakpoints set inside the checked region when you reach this function, you will fail the check. If you have modified the memory with patches, you will fail the check. The real question is, which portion of memory is the packer running a checksum on? The answer is quite clever: it covers 0x5113DB up to, but not including, 0x511605 (the last byte read is at 0x511604), which includes the entirety of the packer's anti-debugging routines right through this very check. This also explains why modifying the PUSHAD instruction to RET, as mentioned in the unpacking tutorial, will not work until after the CRC check. You may be wondering how to narrow down that exact number of bytes. Luckily the answer is very simple, and perhaps better explained in the form of comments:

005115F1   >  0FB61E        MOVZX EBX,BYTE PTR DS:[ESI]     ; load the current byte
005115F4   .  46            INC ESI                         ; advance to the next byte for the next loop
005115F5   .  01C8          ADD EAX,ECX                     ; checksum logic
005115F7   .  01D8          ADD EAX,EBX
005115F9   .  31D8          XOR EAX,EBX
005115FB   .  31C2          XOR EDX,EAX
005115FD   .  49            DEC ECX                          ; number of bytes left to read
005115FE   .^ 75 F1         JNZ SHORT UnPackMe.005115F1      ; loop until ECX = 0
00511600   .  3B95 64FEFFFF CMP EDX,DWORD PTR SS:[EBP-19C]   ; [EBP-19C] = 0x0003B173 (bytes 73 B1 03 00)
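The setup instructions before the loop can be replayed as plain arithmetic to confirm the range. This is a small sketch with names of my own choosing. It assumes POP EAX receives 0x5115D4, the return address pushed by the CALL placed directly in front of it.

```c
#include <stdint.h>

typedef struct {
    uint32_t start; /* first byte hashed (ends up in ESI) */
    uint32_t end;   /* one past the last byte hashed */
    uint32_t size;  /* loop counter (ends up in ECX) */
} CheckedRange;

/* ret_addr is the value POP EAX retrieves (assumed to be 0x5115D4 here). */
CheckedRange checked_range(uint32_t ret_addr)
{
    CheckedRange r;
    r.end = ret_addr + 0x31;    /* ADD EAX,31  */
    r.start = ret_addr - 0x1F9; /* SUB EBX,1F9 */
    r.size = r.end - r.start;   /* SUB EDX,EBX */
    return r;
}
```

With 0x5115D4 as input this gives a start of 0x5113DB, an end of 0x511605, and a size of 0x22A, so the last byte actually read sits at 0x511604.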

ECX holds the number of bytes we'll read (0x22A), so you just have to add that to the starting address (0x5113DB) to get the end of the range. Initially, I told myself I wasn't going to reverse this crc function, but as it turns out, it's not a regular crc32 hash. Luckily, the routine is small enough to reverse rather quickly, and an Olly plugin I had written made copying the bytes from the dump into a C friendly array very easy. I also wanted a way to verify all of this so that I truly had confidence in my findings. Thus, here's the crc algorithm:

#include <stdio.h>

int main() {

  unsigned int crc_table[0x22A] = {
    0x60, 0x8B, 0x85, 0x84, 0xFE, 0xFF, 0xFF, 0xFF, 0xD0, 0x85, 
0xC0, 0x0F, 0x85, 0x22, 0x02, 0x00, 0x00, 0x8B, 0x85, 0x78, 
0xFE, 0xFF, 0xFF, 0xC7, 0x85, 0xFD, 0xFD, 0xFF, 0xFF, 0x46, 
0x5A, 0x50, 0x38, 0xC7, 0x85, 0x01, 0xFE, 0xFF, 0xFF, 0x38, 
0x00, 0x00, 0x00, 0x89, 0xEB, 0x81, 0xEB, 0x03, 0x02, 0x00, 
0x00, 0x6A, 0x00, 0x53, 0xFF, 0xD0, 0x85, 0xC0, 0x0F, 0x85, 
0xF3, 0x01, 0x00, 0x00, 0x8B, 0x85, 0x78, 0xFE, 0xFF, 0xFF, 
0xC7, 0x85, 0xFD, 0xFD, 0xFF, 0xFF, 0x4F, 0x4C, 0x4C, 0x59, 
0xC7, 0x85, 0x01, 0xFE, 0xFF, 0xFF, 0x44, 0x42, 0x47, 0x00, 
0x89, 0xEB, 0x81, 0xEB, 0x03, 0x02, 0x00, 0x00, 0x6A, 0x00, 
0x53, 0xFF, 0xD0, 0x85, 0xC0, 0x0F, 0x85, 0xC4, 0x01, 0x00, 
0x00, 0x8B, 0x85, 0x78, 0xFE, 0xFF, 0xFF, 0xC7, 0x85, 0xFD, 
0xFD, 0xFF, 0xFF, 0x41, 0x00, 0x00, 0x00, 0x89, 0xEB, 0x81, 
0xEB, 0x03, 0x02, 0x00, 0x00, 0x6A, 0x00, 0x53, 0xFF, 0xD0, 
0x85, 0xC0, 0x74, 0x75, 0x89, 0xC1, 0x89, 0xC6, 0x8B, 0x85, 
0x6C, 0xFE, 0xFF, 0xFF, 0x51, 0xFF, 0xD0, 0x85, 0xC0, 0x74, 
0x64, 0x40, 0x89, 0xC2, 0x8B, 0x85, 0x74, 0xFE, 0xFF, 0xFF, 
0x89, 0xEB, 0x81, 0xEB, 0x03, 0x02, 0x00, 0x00, 0x52, 0x53, 
0x56, 0xFF, 0xD0, 0x85, 0xC0, 0x74, 0x4A, 0x8B, 0x95, 0xFD, 
0xFD, 0xFF, 0xFF, 0xBB, 0x47, 0x6F, 0x42, 0x75, 0x39, 0xD3, 
0x75, 0x3B, 0x8B, 0x95, 0x01, 0xFE, 0xFF, 0xFF, 0xBB, 0x67, 
0x20, 0x44, 0x65, 0x39, 0xD3, 0x75, 0x2C, 0x8B, 0x95, 0x05, 
0xFE, 0xFF, 0xFF, 0xBB, 0x62, 0x75, 0x67, 0x67, 0x39, 0xD3, 
0x75, 0x1D, 0xC7, 0x85, 0x0B, 0xFE, 0xFF, 0xFF, 0x00, 0x00, 
0x00, 0x00, 0x8B, 0x95, 0x09, 0xFE, 0xFF, 0xFF, 0xBB, 0x65, 
0x72, 0x00, 0x00, 0x39, 0xD3, 0x0F, 0x84, 0x2E, 0x01, 0x00, 
0x00, 0x64, 0x8B, 0x05, 0x18, 0x00, 0x00, 0x00, 0x89, 0xEB, 
0x81, 0xC3, 0x03, 0x02, 0x00, 0x00, 0x8B, 0x40, 0x30, 0x31, 
0xC9, 0x89, 0xCB, 0x41, 0x29, 0xCB, 0x4E, 0x01, 0xF3, 0x56, 
0x0F, 0xB6, 0x40, 0x02, 0x5E, 0x85, 0xC0, 0x0F, 0x85, 0x04, 
0x01, 0x00, 0x00, 0xC7, 0x85, 0xFD, 0xFD, 0xFF, 0xFF, 0x5C, 
0x5C, 0x2E, 0x5C, 0xC7, 0x85, 0x01, 0xFE, 0xFF, 0xFF, 0x53, 
0x49, 0x43, 0x45, 0xC7, 0x85, 0x05, 0xFE, 0xFF, 0xFF, 0x00, 
0x00, 0x00, 0x00, 0x89, 0xEB, 0x81, 0xEB, 0x03, 0x02, 0x00, 
0x00, 0x8B, 0x85, 0x80, 0xFE, 0xFF, 0xFF, 0x6A, 0x00, 0x68, 
0x80, 0x00, 0x00, 0x00, 0x6A, 0x03, 0x6A, 0x00, 0x6A, 0x03, 
0x68, 0x00, 0x00, 0x00, 0xC0, 0x53, 0xFF, 0xD0, 0x3D, 0xFF, 
0xFF, 0xFF, 0xFF, 0x74, 0x0E, 0x50, 0x8B, 0x85, 0x7C, 0xFE, 
0xFF, 0xFF, 0xFF, 0xD0, 0xE9, 0xAE, 0x00, 0x00, 0x00, 0xC7, 
0x85, 0xFD, 0xFD, 0xFF, 0xFF, 0x5C, 0x5C, 0x2E, 0x5C, 0xC7, 
0x85, 0x01, 0xFE, 0xFF, 0xFF, 0x4E, 0x54, 0x49, 0x43, 0xC7, 
0x85, 0x05, 0xFE, 0xFF, 0xFF, 0x45, 0x00, 0x00, 0x00, 0x89, 
0xEB, 0x81, 0xEB, 0x03, 0x02, 0x00, 0x00, 0x8B, 0x85, 0x80, 
0xFE, 0xFF, 0xFF, 0x6A, 0x00, 0x68, 0x80, 0x00, 0x00, 0x00, 
0x6A, 0x03, 0x6A, 0x00, 0x6A, 0x03, 0x68, 0x00, 0x00, 0x00, 
0xC0, 0x53, 0xFF, 0xD0, 0x3D, 0xFF, 0xFF, 0xFF, 0xFF, 0x74, 
0x0B, 0x50, 0x8B, 0x85, 0x7C, 0xFE, 0xFF, 0xFF, 0xFF, 0xD0, 
0xEB, 0x5B, 0x0F, 0x31, 0x89, 0x85, 0xFD, 0xFD, 0xFF, 0xFF, 
0x0F, 0x31, 0x01, 0xD8, 0x89, 0x85, 0x01, 0xFE, 0xFF, 0xFF, 
0x0F, 0x31, 0x29, 0xD8, 0x89, 0x85, 0x05, 0xFE, 0xFF, 0xFF, 
0xE8, 0x00, 0x00, 0x00, 0x00, 0x58, 0x89, 0xC3, 0x83, 0xC0, 
0x31, 0x81, 0xEB, 0xF9, 0x01, 0x00, 0x00, 0x89, 0xC2, 0x29, 
0xDA, 0x89, 0xDE, 0x89, 0xD1, 0x31, 0xDB, 0x31, 0xD2, 0xB8, 
0x01, 0x00, 0x00, 0x00, 0x0F, 0xB6, 0x1E, 0x46, 0x01, 0xC8, 
0x01, 0xD8, 0x31, 0xD8, 0x31, 0xC2, 0x49, 0x75, 0xF1, 0x3B, 
0x95, 0x64, 0xFE, 0xFF  /* 0x22A bytes in total; the final 0xFF of the CMP (at 0x511605) is not hashed */
  };
  unsigned int hash = 0;           // EDX: running checksum
  unsigned int byte = 0;           // EBX: current byte value from array
  unsigned int eax = 1;            // EAX: accumulator, seeded with 1
  unsigned int bytes_left = 0x22A; // ECX: bytes left to read

  // rough, but it'll do
  for (unsigned int i = 0; i < 0x22A; i++) {
    byte = crc_table[i];
    eax += bytes_left;
    eax += byte;
    eax ^= byte;
    hash ^= eax;
    bytes_left--;
  }
  printf("%X\n", hash); // 0x3B173
  return 0;
}
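To see why a software breakpoint inside the checked region trips the comparison, the same loop can be wrapped in a function and run over a toy buffer. This is a sketch only. The function name is hypothetical, and the four bytes are just the first four bytes of the region, so the values below are not Morphine's real checksum.

```c
#include <stddef.h>
#include <stdint.h>

/* Same add/xor loop as above, over an arbitrary buffer. */
uint32_t region_checksum(const unsigned char *data, size_t len)
{
    uint32_t eax = 1;             /* accumulator, seeded with 1 */
    uint32_t hash = 0;            /* running result (EDX) */
    uint32_t left = (uint32_t)len; /* bytes remaining (ECX) */

    for (size_t i = 0; i < len; i++) {
        uint32_t byte = data[i];
        eax += left;
        eax += byte;
        eax ^= byte;
        hash ^= eax;
        left--;
    }
    return hash;
}
```

For the bytes 60 8B 85 84 this returns 0x1C. Replacing the first byte with 0xCC, which is what a software breakpoint on that instruction writes, gives 0x3C instead, so the comparison would fail. The sum is simple enough that not every possible byte change is guaranteed to alter it, but this one does.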

That wraps up the write-up of this packer. In the future, I'll study the implementation of the hardware based RDTSC solution that OllyAdvanced uses so that I can provide more information on it as well.