January 18, 2013

Encrypted Self Modifying Code


Introduction

This is a short article on writing self modifying code. I will show some simple code and discuss a way to make it behave differently by replacing one part of the code section with another.

After this, we will use a simple XOR to encrypt a piece of code and add it to our existing program. Using the technique described first, we will end up with self modifying code that decrypts and executes the encrypted piece.

If this doesn’t sound clear, keep reading for a better understanding. Basic knowledge of assembly is required in order to follow the examples. All the assembly code shown can be assembled using the GNU Assembler (gas).

Reasoning

Why write self modifying code? Well, there is more than one reason, but I believe the main one is obfuscation. Self modifying code can be hard to debug and understand, which makes this technique very useful for malware writers.

This technique can also be used to trick anti-virus software into not detecting the malware.

Based on the scenario described above, I believe it is important to understand a little bit of this technique if you want to do some malware analysis.

Our first Self Modifying Code (SMC)

The idea is simple: we have a piece of code at some address that gets executed. After that, we change the code at this address and jump to it, so that something else is executed. More or less like this:

Address 0x1000:

will_change:
    mov $0x100, %eax
    add $0x02,  %eax 
    mov %eax,   %ecx 
    
    call modify_code
    jmp will_change

What will happen is that the ‘modify_code’ procedure will change the opcodes starting at the address of the ‘will_change’ label. When we jump to the ‘will_change’ label again, the code that gets executed will no longer be an addition to the %eax register, but whatever else we want.

Enough talk, let’s look at some code:

# Assemble and link with:
# as -o smc1.o smc1.s
# ld -o smc1 smc1.o

.section .data
    hello_1: .asciz "Hello number 1!"
    bye: .asciz "Adios!"

.section .text
.globl _start
# Align the code to a page boundary
.align 4096

_start:
    # Make memory starting at the address '_start' writable
    # using syscall sys_mprotect
    mov  $125, %eax
    movl $_start, %ebx

    # We will make everything from _start to new_code_end writable
    movl $(new_code_end - _start), %ecx
    movl $7, %edx
    int  $0x80

loop:
    # Say hello!
    xor  %eax, %eax
    movl $4, %eax        # syscall write
    movl $1, %ebx        # to stdout
    movl $hello_1, %ecx
    movl $15, %edx       # The length of the hello_1 string (without the NUL)
    int $0x80

    # This is where the magic happens. We will replace everything
    # starting from the address at the label 'loop' with the code
    # starting at the label 'new_code'            
    movl $(new_code_end - new_code), %ecx
    movl $new_code, %esi
    movl $loop, %edi
    rep movsb            # Copy %ecx bytes from (%esi) to (%edi)

    jmp loop

new_code:
    # Say Adios!
    movl $4, %eax
    movl $1, %ebx
    movl $bye, %ecx
    movl $6, %edx        # The length of the bye string (without the NUL)
    int  $0x80

    # The infinite loop is over!
    movl $1, %eax        # syscall exit
    xor %ebx, %ebx
    int $0x80
new_code_end:

I believe the code and its comments are pretty self explanatory. One thing that might need an explanation is why the code is aligned to 4096 bytes.

In Linux (at least on x86 and x86-64) memory is managed in 4 KB blocks called pages. The pages containing code are, by default, not writable.

When a process tries to write to a page it is not allowed to write to, the kernel sends it a SIGSEGV, causing our application to abort. For this reason, we need to use ‘mprotect’ to make the pages where the code will be rewritten writable. As one can see in the man pages, the address passed to ‘mprotect’ must be aligned to a page boundary (otherwise the call fails with EINVAL), and the pages must be mapped into your process. This is the reason for the alignment: it guarantees that ‘_start’ sits at the beginning of a page, so we can hand its address straight to ‘mprotect’.
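If the code you want to patch does not start on a page boundary, the usual alternative is to round the address down before calling ‘mprotect’. Here is a minimal sketch in C of that arithmetic. The helper names page_floor and make_rwx are mine, made up for illustration, and whether the kernel actually grants rwx depends on the system's security policy:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round addr down to the start of the page that contains it.
 * page_size must be a power of two (4096 on the systems discussed here). */
uintptr_t page_floor(uintptr_t addr, uintptr_t page_size)
{
    return addr & ~(page_size - 1);
}

/* Hypothetical helper: ask for rwx on every page touched by
 * [addr, addr + len). Returns whatever mprotect returns
 * (0 on success, -1 on error with errno set). */
int make_rwx(void *addr, size_t len)
{
    uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = page_floor((uintptr_t)addr, page_size);

    /* Grow the length by the distance we moved the start back. */
    size_t span = ((uintptr_t)addr - start) + len;

    /* PROT_READ | PROT_WRITE | PROT_EXEC is the value 7 used in the assembly. */
    return mprotect((void *)start, span, PROT_READ | PROT_WRITE | PROT_EXEC);
}
```

With a 4096 byte page, page_floor maps 0x4010b7 to 0x401000, and an address that is already aligned is left untouched.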

If you didn’t understand it to this point, I recommend that you re-read the code and experiment a little bit with it. Comment out the line ‘rep movsb’ to see the results. Try removing the ‘.align 4096’ directive and see how the behaviour changes.

Adding some encryption!

The first part showed a really simple example of self modifying code. It is not too useful, as it is pretty simple to figure out what is happening and how it is happening. After all, the code used to replace the existing one is there in the clear for everyone to read. This is where some encryption comes into the game, in order to make it harder for the analyst to figure out what the code is doing.

Let’s have a look at the code we will be injecting into our original code:

.section .text
.globl _start
_start:
    mov $4, %eax
    mov $1, %ebx
    .byte 0xE8, 0x04, 0x00, 0x00, 0x00  # call next
txt:
    .byte 0x41, 0x42, 0x43, 0x0A
next: 
    popq %rcx                           # Address of our string popped into %rcx
    movl $4, %edx
    int  $0x80

    mov $1, %eax
    xor %ebx, %ebx
    int $0x80

This code will print the string “ABC\n” and execute an exit(0).

A couple of tricks are used here that deserve some discussion, so let me do that.

The first trick is in the line

    .byte 0xE8, 0x04, 0x00, 0x00, 0x00  # call next

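Here I am hard coding the call instruction as raw bytes. 0xE8 is the opcode of a near relative call, and the four bytes that follow are a 32 bit little-endian displacement, counted from the end of the call instruction. So this is equivalent to:

    call next

because ‘next’ sits exactly 4 bytes (the length of our string) past the end of the call.

The call pushes its return address, that is, the RIP (or EIP on a 32 bit processor) of whatever follows the call, onto the stack. If you look closely, the address being pushed is exactly the address of the ‘txt’ label! So right now we have the address of our string on the stack!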
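To make the arithmetic concrete, here is a small sketch in C that computes both addresses from the raw bytes. The function names are mine, for illustration only; the address 0x4000ba used in the note below the code is where objdump places this call in the listing shown further down.

```c
#include <stdint.h>

/* Decode the 32 bit little-endian displacement stored in p[0..3]. */
int32_t rel32(const unsigned char *p)
{
    return (int32_t)((uint32_t)p[0]
                   | (uint32_t)p[1] << 8
                   | (uint32_t)p[2] << 16
                   | (uint32_t)p[3] << 24);
}

/* An E8 call is 5 bytes long, so the return address it pushes
 * is the address of the call itself plus 5. */
uint64_t call_return_address(uint64_t insn_addr)
{
    return insn_addr + 5;
}

/* The call target is the return address plus the signed displacement.
 * insn points at the E8 byte; the displacement starts at insn[1]. */
uint64_t call_target(uint64_t insn_addr, const unsigned char *insn)
{
    return call_return_address(insn_addr) + (uint64_t)(int64_t)rel32(insn + 1);
}
```

For the bytes e8 04 00 00 00 located at 0x4000ba, the return address is 0x4000bf (the ‘txt’ label) and the target is 0x4000c3 (the ‘next’ label).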

The other trick here is to have the string in the middle of the code section, disguised as code!

    txt: .byte 0x41, 0x42, 0x43, 0x0A

That is the string “ABC\n”!

$ as -o smc2.o smc2.s && ld -o smc2 smc2.o
$ ./smc2
ABC

Looking at this code under objdump we get:

$ objdump -d smc2

smc2:     file format elf64-x86-64


Disassembly of section .text:

00000000004000b0 <_start>:
  4000b0:   b8 04 00 00 00          mov    $0x4,%eax
  4000b5:   bb 01 00 00 00          mov    $0x1,%ebx
  4000ba:   e8 04 00 00 00          callq  4000c3 <next>

00000000004000bf <txt>:
  4000bf:   41                      rex.B
  4000c0:   42                      rex.X
  4000c1:   43 0a 59 ba             rex.XB or -0x46(%r9),%bl

00000000004000c3 <next>:
  4000c3:   59                      pop    %rcx
  4000c4:   ba 04 00 00 00          mov    $0x4,%edx
  4000c9:   cd 80                   int    $0x80
  4000cb:   b8 01 00 00 00          mov    $0x1,%eax
  4000d0:   31 db                   xor    %ebx,%ebx
  4000d2:   cd 80                   int    $0x80

As you can see, the disassembly under <txt> is complete garbage: objdump is trying to decode our string as if it were instructions.

So our opcode listing is as follows:

b8 04 00 00 00
bb 01 00 00 00
e8 04 00 00 00
41 42 43 0a
59
ba 04 00 00 00
cd 80
b8 01 00 00 00
31 db
cd 80

What we have to do now is encrypt our opcodes and then insert them into our binary, where they will be decrypted and executed.

For this example, I chose to encrypt the opcodes with an XOR using the key 0xE3; this means I take each one of the bytes and XOR it with 0xE3.

You can use the following code to do this:

#include <stdio.h>

#define KEY 0xE3

int main(void) {
    size_t i;
    /* The opcodes from the listing above, in order */
    unsigned char code[] = "\xb8\x04\x00\x00\x00\xbb\x01\x00\x00\x00\xe8\x04\x00\x00"
                           "\x00\x41\x42\x43\x0a\x59\xba\x04\x00\x00\x00\xcd\x80\xb8"
                           "\x01\x00\x00\x00\x31\xdb\xcd\x80";

    /* sizeof(code) - 1 skips the NUL terminator added by the string literal */
    for (i = 0; i < sizeof(code) - 1; i++) {
        printf("%02x ", (unsigned)(code[i] ^ KEY));
    }
    printf("\n");

    return 0;
}

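The result is as follows (the program prints everything on a single line; I regrouped the bytes here, one instruction per line, to match the opcode listing above):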

5b e7 e3 e3 e3 
58 e2 e3 e3 e3
0b e7 e3 e3 e3
a2 a1 a0 e9
ba
59 e7 e3 e3 e3
2e 63 
5b e2 e3 e3 e3
d2 38
2e 63
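A nice property of XOR is that applying the same key twice gives back the original byte, so the routine that encrypts is also the routine that decrypts. A minimal sketch in C (the name xor_buf is mine, just for illustration):

```c
#include <stddef.h>

/* XOR every byte of buf with key, in place.
 * Calling it a second time with the same key undoes the first call. */
void xor_buf(unsigned char *buf, size_t len, unsigned char key)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] ^= key;
    }
}
```

For example, running it with the key 0xE3 over the bytes b8 04 00 cd 80 gives 5b e7 e3 2e 63, and running it again over that result gives back b8 04 00 cd 80. This is why the decryption side only needs to XOR each byte with 0xE3 once more.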

Great! So let’s write some code to use this:

.section .data

# Our encrypted code goes here.
code:
    .byte 0x5b, 0xe7, 0xe3, 0xe3, 0xe3 
    .byte 0x58, 0xe2, 0xe3, 0xe3, 0xe3
    .byte 0x0b, 0xe7, 0xe3, 0xe3, 0xe3
    .byte 0xa2, 0xa1, 0xa0, 0xe9
    .byte 0xba
    .byte 0x59, 0xe7, 0xe3, 0xe3, 0xe3
    .byte 0x2e, 0x63 
    .byte 0x5b, 0xe2, 0xe3, 0xe3, 0xe3
    .byte 0xd2, 0x38
    .byte 0x2e, 0x63

hello: .asciz "Hello number 1!"

.section .text
.globl _start
.align 4096

_start:

    # Make memory rwx, using syscall sys_mprotect
    movq $125, %rax
    movq $_start, %rbx

    # We will make everything from _start to the_end rwx
    movq $(the_end - _start), %rcx
    movq $7, %rdx
    int  $0x80
    

    call decrypt

evil:

    movq $0x04,  %rax
    movq $0x01,  %rbx
    movq $hello, %rcx
    movq $0x0f,  %rdx
    int  $0x80
    
    movq $36, %rcx               # 36 is the length of our code
    movq $code, %rsi
    movq $evil, %rdi
    rep  movsb
    
    jmp evil

decrypt:
    xor %rcx, %rcx
    
loop_decrypt:
    mov $0xE3, %dl
    movb code(,%rcx), %al
    xorb %al, %dl
    movb %dl, code(,%rcx)
    
    inc %rcx
    cmp $36, %rcx                # 36 is the length of our code
    jne loop_decrypt

the_end:    
    ret

The flow of this code is pretty simple. We make our code page rwx, decrypt each byte of the encrypted code in place, and then apply the technique shown earlier: overwrite the existing code with the new one and jump to it.

Of course this is a pretty simple way to do it and it is also really easy for someone looking at the disassembly to figure out the encryption key and algorithm.
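To show how little protection a one byte XOR key gives, here is a sketch in C of what an analyst could do without even reading the decryption loop: try all 256 keys and keep the first one whose output contains the bytes cd 80 (int $0x80). The function name recover_key and the choice of cd 80 as the pattern to search for are my own assumptions for this example; a real analysis would look for whatever byte sequences are expected in the target.

```c
#include <stddef.h>

/* Try every one byte key and return the first one that makes the
 * sequence cd 80 (int $0x80) appear in the decrypted data.
 * Returns -1 if no key produces it. */
int recover_key(const unsigned char *enc, size_t len)
{
    for (int key = 0; key < 256; key++) {
        for (size_t i = 0; i + 1 < len; i++) {
            if ((enc[i] ^ key) == 0xCD && (enc[i + 1] ^ key) == 0x80) {
                return key;
            }
        }
    }
    return -1;
}
```

Fed with the 36 encrypted bytes stored under the ‘code’ label, this returns 0xE3.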

The idea here is to simply give the reader an idea of what can be achieved with the technique and not the best way to apply it.

References

http://asm.sourceforge.net/articles/smc.html

Some viruses source codes :P