January 11, 2013

Hooking syscalls in the Linux Kernel


It is time again to start hacking the kernel a little bit. To bring back some memories, and after stealing the suggestion from a friend, I decided to try hooking syscalls in the Linux kernel 3 series on the x64 architecture.

The code presented below has been tested on kernel version 3.2.0 and appears to work fine.

So let’s have a look at the code; I will discuss some parts of it afterwards.

#include <linux/module.h>
#include <linux/init.h>
#include <linux/types.h>
#include <asm/uaccess.h>
#include <asm/cacheflush.h>
#include <linux/syscalls.h>
#include <linux/delay.h>    // loops_per_jiffy

#define CR0_WP 0x00010000   // Write Protect Bit (CR0:16)

/* Just so we do not taint the kernel */
MODULE_LICENSE("GPL");

void **syscall_table;
unsigned long **find_sys_call_table(void);

long (*orig_sys_open)(const char __user *filename, int flags, int mode);

unsigned long **find_sys_call_table() {
    
    unsigned long ptr;
    unsigned long *p;

    for (ptr = (unsigned long)sys_close;
         ptr < (unsigned long)&loops_per_jiffy;
         ptr += sizeof(void *)) {
             
        p = (unsigned long *)ptr;

        if (p[__NR_close] == (unsigned long)sys_close) {
            printk(KERN_DEBUG "Found the sys_call_table!!!\n");
            return (unsigned long **)p;
        }
    }
    
    return NULL;
}

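long my_sys_open(const char __user *filename, int flags, int mode) {
    long ret;
    char kname[256];    /* kernel-side copy of the user-space string */

    ret = orig_sys_open(filename, flags, mode);

    /* A __user pointer must not be passed to printk's %s: copy it first */
    if (strncpy_from_user(kname, filename, sizeof(kname) - 1) > 0) {
        kname[sizeof(kname) - 1] = '\0';
        printk(KERN_DEBUG "file %s has been opened with mode %d\n", kname, mode);
    }

    return ret;
}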

static int __init syscall_init(void)
{
    int ret;
    unsigned long addr;
    unsigned long cr0;
  
    syscall_table = (void **)find_sys_call_table();

    if (!syscall_table) {
        printk(KERN_DEBUG "Cannot find the system call address\n"); 
        return -ENOENT;    /* a bare -1 would be reported as -EPERM */
    }

    cr0 = read_cr0();
    write_cr0(cr0 & ~CR0_WP);

    addr = (unsigned long)syscall_table;
    ret = set_memory_rw(PAGE_ALIGN(addr) - PAGE_SIZE, 3);
    if(ret) {
        printk(KERN_DEBUG "Cannot set the memory to rw (%d) at addr %16lX\n", ret, PAGE_ALIGN(addr) - PAGE_SIZE);
    } else {
        printk(KERN_DEBUG "3 pages set to rw\n");
    }
    
    orig_sys_open = syscall_table[__NR_open];
    syscall_table[__NR_open] = my_sys_open;

    write_cr0(cr0);
  
    return 0;
}

static void __exit syscall_release(void)
{
    unsigned long cr0;
    
    cr0 = read_cr0();
    write_cr0(cr0 & ~CR0_WP);
    
    syscall_table[__NR_open] = orig_sys_open;
    
    write_cr0(cr0);
}

module_init(syscall_init);
module_exit(syscall_release);

Let’s first have a look at the find_sys_call_table function.

unsigned long **find_sys_call_table() {
    
    unsigned long ptr;
    unsigned long *p;

    for (ptr = (unsigned long)sys_close;
         ptr < (unsigned long)&loops_per_jiffy;
         ptr += sizeof(void *)) {
             
        p = (unsigned long *)ptr;

        if (p[__NR_close] == (unsigned long)sys_close) {
            printk(KERN_DEBUG "Found the sys_call_table!!!\n");
            return (unsigned long **)p;
        }
    }
    
    return NULL;
}

What we are doing here is looking for the address of the sys_call_table symbol. A candidate address p is accepted when the slot p[__NR_close] holds the address of sys_close, which is exactly what the real table contains at that index. With this table in hand we can overwrite its entries so that different functions are called in place of the expected syscalls.

The for loop starts at the address of the function sys_close and runs up to the address of the symbol loops_per_jiffy, trying to find the sys_call_table symbol. I chose these two as start and end points based on my System.map file (located in /boot/), which has these entries:
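To see the scanning idea in isolation, here is a small userspace sketch; it is not kernel code. It assumes a plain array standing in for kernel memory, a stand-in function fake_close and a made-up slot index FAKE_NR_CLOSE. None of these names come from the module; they are only there for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_NR_CLOSE 3   /* hypothetical slot index, stands in for __NR_close */

/* Stand-in for sys_close; only its address matters here. */
int fake_close(int fd) { return fd; }

/*
 * Walk `words` pointer-sized slots starting at `mem` and return the first
 * position p for which p[FAKE_NR_CLOSE] holds `target`, or NULL if none does.
 * The loop bound keeps p[FAKE_NR_CLOSE] inside the buffer.
 */
uintptr_t *find_table(uintptr_t *mem, size_t words, uintptr_t target)
{
    size_t i;

    assert(mem != NULL);
    for (i = 0; i + FAKE_NR_CLOSE < words; i++) {
        if (mem[i + FAKE_NR_CLOSE] == target)
            return &mem[i];
    }
    return NULL;
}
```

Placing the address of fake_close at slot 10 + FAKE_NR_CLOSE of a zeroed array makes find_table return the address of slot 10. Keep in mind that matching on a single entry is a heuristic: any other word in the scanned range that happens to hold the same value would be accepted as well.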

ffffffff81175dc0 T sys_close
ffffffff81801300 R sys_call_table
ffffffff81c0f3a0 D loops_per_jiffy

As we can see, the address of the sys_call_table is between sys_close and loops_per_jiffy. I believe this holds for most kernels in the 3.x line, which is why I chose these two addresses as the bounds of the search.

The letter R before sys_call_table means that the symbol lives in a read-only data section, so we cannot simply take the address of the syscall table and modify it directly.
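Since the ordering of these three symbols is only something I believe to be true, it is worth checking on your own kernel. The following is a rough userspace sketch of such a check, assuming the usual System.map layout of one "address type name" entry per line. The function names lookup_symbol and table_in_scan_range are mine and purely illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Search System.map-style text ("<hex address> <type> <name>" per line)
 * for `name`. On success store the address in *addr and return 1;
 * otherwise return 0.
 */
int lookup_symbol(const char *map_text, const char *name,
                  unsigned long long *addr)
{
    const char *line = map_text;

    assert(map_text != NULL && name != NULL && addr != NULL);
    while (*line != '\0') {
        unsigned long long value;
        char type;
        char sym[128];

        if (sscanf(line, "%llx %c %127s", &value, &type, sym) == 3 &&
            strcmp(sym, name) == 0) {
            *addr = value;
            return 1;
        }
        line = strchr(line, '\n');
        if (line == NULL)
            break;
        line++;   /* step past the newline to the next entry */
    }
    return 0;
}

/* Return 1 if sys_call_table lies strictly between the two scan bounds. */
int table_in_scan_range(const char *map_text)
{
    unsigned long long start, table, end;

    if (!lookup_symbol(map_text, "sys_close", &start) ||
        !lookup_symbol(map_text, "sys_call_table", &table) ||
        !lookup_symbol(map_text, "loops_per_jiffy", &end))
        return 0;
    return start < table && table < end;
}
```

Fed with the three entries shown above, table_in_scan_range returns 1. To test a real system you would read the System.map of the running kernel into a buffer and pass that buffer instead.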

static int __init syscall_init(void)
{
    int ret;
    unsigned long addr;
    unsigned long cr0;
  
    syscall_table = (void **)find_sys_call_table();

    ...
    
    cr0 = read_cr0();
    write_cr0(cr0 & ~CR0_WP);

    addr = (unsigned long)syscall_table;
    
    ...
    
    orig_sys_open = syscall_table[__NR_open];
    syscall_table[__NR_open] = my_sys_open;

    write_cr0(cr0);
  
    return 0;
}

As the sys_call_table is read-only, we need a way to write to it. To do that, we clear the Write-Protect bit in cr0. The cr0 register is a control register in the Intel architecture that contains a flag called WP on bit 16 (bit count starts at 0). When this flag is set to 1, the CPU refuses writes to read-only pages even from kernel (ring 0) code; when it is 0, the kernel can write to those pages regardless of their page protection. So we clear the flag before modifying the table, and the full listing also calls set_memory_rw on the pages around the sys_call_table to mark them writable. Note that at the end of the function I set the cr0 register back to its original value for good measure, so that write protection is enforced by the CPU again.
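The bit manipulation itself can be tried out in userspace, where read_cr0 and write_cr0 are of course not available. The sketch below only models the arithmetic on a plain integer. The helper names clear_wp, wp_is_set and wp_round_trip are hypothetical, and the sample value 0x80050033 mentioned afterwards is just an illustrative cr0 value, not something read from my machine.

```c
#include <assert.h>

#define CR0_WP 0x00010000UL   /* Write Protect bit (CR0 bit 16) */

/* Return `cr0` with the WP bit cleared; all other bits are untouched. */
unsigned long clear_wp(unsigned long cr0)
{
    return cr0 & ~CR0_WP;
}

/* Return non-zero if the WP bit is set in `cr0`. */
int wp_is_set(unsigned long cr0)
{
    return (cr0 & CR0_WP) != 0;
}

/* Model of the module's sequence: save, clear WP, "write", restore. */
unsigned long wp_round_trip(unsigned long cr0)
{
    unsigned long saved = cr0;   /* what read_cr0() returned */

    cr0 = clear_wp(cr0);         /* write_cr0(cr0 & ~CR0_WP) */
    assert(!wp_is_set(cr0));     /* the protected writes would go here */
    return saved;                /* write_cr0(saved) */
}
```

With the sample value 0x80050033, clear_wp yields 0x80040033: only bit 16 changes. Restoring the saved value rather than setting WP unconditionally means the register ends up exactly as it was found.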

At this point, we simply change the sys_call_table entry we want so that it points to a function of ours that has THE SAME prototype as the original syscall at that index. Note that I am keeping the address of the original syscall saved, as we will probably want to call it inside our new function and we will need to put it back in the table when we remove this module.

For a list of the prototypes of each syscall, have a look at the file include/linux/syscalls.h in the source of the Linux kernel. The list of syscall indexes (for x64) can be found in the file /usr/include/x86_64-linux-gnu/asm/unistd_64.h on my system.
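The save, replace and restore pattern is easier to experiment with outside the kernel. Here is a minimal userspace sketch of it using an ordinary array of function pointers. Everything in it (table, SLOT_OPEN, real_open, hooked_open and so on) is a made-up stand-in and not part of the module.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*handler_t)(long arg);   /* every slot shares one prototype */

#define SLOT_OPEN 2                    /* hypothetical index, like __NR_open */

long real_open(long arg) { return arg + 1; }   /* stand-in for the original */

handler_t table[4] = { NULL, NULL, real_open, NULL };
handler_t saved_open;                  /* plays the role of orig_sys_open */
int hook_calls;                        /* how many times the hook ran */

/* Replacement with the same prototype: count the call, then delegate. */
long hooked_open(long arg)
{
    hook_calls++;
    return saved_open(arg);
}

/* Save the original entry, then point the slot at the hook. */
void install_hook(void)
{
    assert(table[SLOT_OPEN] != hooked_open);   /* avoid saving the hook itself */
    saved_open = table[SLOT_OPEN];
    table[SLOT_OPEN] = hooked_open;
}

/* Put the original entry back, as the module's exit function does. */
void remove_hook(void)
{
    assert(saved_open != NULL);
    table[SLOT_OPEN] = saved_open;
}
```

After install_hook, a call through table[SLOT_OPEN] still returns what real_open returns, but hook_calls goes up by one. After remove_hook the slot holds real_open again. Saving the original before overwriting the slot is the important part: without it the hook could not delegate, and the entry could not be restored later.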

After all this is done we just need to make sure that when our module is removed we set back the correct address for the overwritten entry in the sys_call_table.

static void __exit syscall_release(void)
{
    unsigned long cr0;

    /* Make the sys_call_table writable again */
    cr0 = read_cr0();
    write_cr0(cr0 & ~CR0_WP);

    /* Restore the old pointer */
    syscall_table[__NR_open] = orig_sys_open;

    /* Bring cr0 back to what it was */
    write_cr0(cr0);
}

This is the first step towards building a rootkit or a syscall proxy. So have fun and do not abuse the knowledge :P

The code with the makefile can be found at my GitHub

References

http://www.gadgetweb.de/linux/40-how-to-hijacking-the-syscall-table-on-latest-26x-kernel-systems.html

http://kerneltrap.org/mailarchive/linux-kernel/2008/1/25/612014

http://badishi.com/kernel-writing-to-read-only-memory/