lab2-system calls


Published: 2021-03-29

Updated: 2021-04-03


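In the previous lab we used system calls to implement a few utilities.

In this lab we will add some new system calls to xv6, which will help us understand how system calls work and will expose some of the kernel's internals.

Before starting, read chapter 2 of the xv6 book, sections 4.3 and 4.4, and the following source files: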

The user-space code for system calls is in user/user.h and user/usys.pl.
The kernel-space code is kernel/syscall.h, kernel/syscall.c.
The process-related code is kernel/proc.h and kernel/proc.c.

Switch to the syscall branch to start the lab.

$ git fetch
$ git checkout syscall
$ make clean

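If you run the grading program with make grade now, you will see that the grading script cannot run trace and sysinfotest. Our job is to add the two system calls they need.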

Before we start

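We need to take a close look at how xv6 boots. The last section of chapter 2 of the book already explains the boot process; here we walk through it alongside the actual code.

When a RISC-V computer powers on, it initializes itself and runs a boot loader stored in ROM, which loads the xv6 kernel into memory. Then, in machine mode, the CPU starts executing xv6 at _entry (kernel/entry.S:6). RISC-V starts with paging hardware disabled, so virtual addresses map directly to physical addresses.

The kernel is loaded at physical address 0x80000000. It is not placed at 0x0 because the address range 0x0:0x80000000 contains I/O devices.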

		# kernel/entry.S
		# qemu -kernel loads the kernel at 0x80000000
        # and causes each CPU to jump there.
        # kernel.ld causes the following code to
        # be placed at 0x80000000.
.section .text
_entry:
	# set up a stack for C.
        # stack0 is declared in start.c,
        # with a 4096-byte stack per CPU.
        # sp = stack0 + (hartid * 4096)
        la sp, stack0
        li a0, 1024*4
	csrr a1, mhartid
        addi a1, a1, 1
        mul a0, a0, a1
        add sp, sp, a0
	# jump to start() in start.c
        call start
spin:
        j spin

The instructions at _entry set up a stack so that xv6 can run C code. xv6 declares space for an initial stack, stack0, in start.c (kernel/start.c:11):

// entry.S needs one stack per CPU.
__attribute__ ((aligned (16))) char stack0[4096 * NCPU];

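The code at _entry loads the stack pointer register sp with the address stack0 + 4096 (for hart 0; each further hart gets the next 4096-byte slice). This is the top of the stack, because stacks in xv6 grow downward. Now that the kernel has a stack, _entry calls into C code at start (kernel/start.c:21).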
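The stack arithmetic in _entry is easier to check in C. The following is a minimal sketch, not xv6 code: initial_sp is a made-up helper and STACK_SIZE is a name introduced here for the 4096-byte per-CPU stack. It reproduces the computation sp = stack0 + (hartid + 1) * 4096 that the assembly performs.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU 8            // maximum number of harts, as in kernel/param.h
#define STACK_SIZE 4096   // bytes of boot stack per hart (name is an assumption)

// Stand-in for the stack0 array declared in start.c.
static char stack0[STACK_SIZE * NCPU];

// Mirrors the arithmetic in _entry: sp = stack0 + (hartid + 1) * 4096.
// The "+ 1" makes sp point at the high end of the hart's slice, because
// the stack grows toward lower addresses.
char *
initial_sp(int hartid)
{
  assert(hartid >= 0 && hartid < NCPU);
  return stack0 + (uintptr_t)(hartid + 1) * STACK_SIZE;
}
```

For hart 0 this yields stack0 + 4096, and hart 1 starts one slice higher, at stack0 + 8192.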

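The function start performs some configuration that is only allowed in machine mode and then switches to supervisor mode. To enter supervisor mode, RISC-V provides the mret instruction. This instruction is most often used to return from a previous call from supervisor mode into machine mode. start is not returning from such a call; instead it sets things up as if there had been one: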

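it sets the previous privilege mode to supervisor in the register mstatus
it sets the return address to main by writing main's address into the register mepc
it disables virtual address translation by writing 0 into the page-table register satp
it delegates all interrupts and exceptions to supervisor mode (S mode)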

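Before switching to S mode, start does one more thing: it programs the clock chip to generate timer interrupts. It then "returns" to supervisor mode by executing mret, which causes the program counter to change to main (kernel/main.c:11), because w_mepc((uint64)main) has already set the address that mret returns to.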

// entry.S jumps here in machine mode on stack0.
void
start()
{
  // set M Previous Privilege mode to Supervisor, for mret.
  unsigned long x = r_mstatus();
  x &= ~MSTATUS_MPP_MASK;
  x |= MSTATUS_MPP_S;
  w_mstatus(x);

  // set M Exception Program Counter to main, for mret.
  // requires gcc -mcmodel=medany
  w_mepc((uint64)main);

  // disable paging for now.
  w_satp(0);

  // delegate all interrupts and exceptions to supervisor mode.
  w_medeleg(0xffff);
  w_mideleg(0xffff);
  w_sie(r_sie() | SIE_SEIE | SIE_STIE | SIE_SSIE);

  // ask for clock interrupts.
  timerinit();

  // keep each CPU's hartid in its tp register, for cpuid().
  int id = r_mhartid();
  w_tp(id);

  // switch to supervisor mode and jump to main().
  asm volatile("mret");
}
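The first step of start(), selecting supervisor mode as the "previous" privilege level, is plain bit manipulation on mstatus. The sketch below isolates it so it can be tried on an ordinary machine. The MSTATUS_MPP_* values are written as I recall them from kernel/riscv.h (the MPP field occupies bits 11 and 12), and with_mpp_supervisor is a hypothetical helper, not an xv6 function.

```c
#include <assert.h>
#include <stdint.h>

// MPP ("machine previous privilege") field of mstatus, bits 11-12.
#define MSTATUS_MPP_MASK ((uint64_t)3 << 11)
#define MSTATUS_MPP_M    ((uint64_t)3 << 11)  // machine
#define MSTATUS_MPP_S    ((uint64_t)1 << 11)  // supervisor
#define MSTATUS_MPP_U    ((uint64_t)0 << 11)  // user

// Hypothetical helper: the same clear-then-set sequence that start()
// applies to the value it reads with r_mstatus().
uint64_t
with_mpp_supervisor(uint64_t mstatus)
{
  mstatus &= ~MSTATUS_MPP_MASK;  // clear both MPP bits
  mstatus |= MSTATUS_MPP_S;      // select supervisor mode
  assert((mstatus & MSTATUS_MPP_MASK) == MSTATUS_MPP_S);
  return mstatus;
}
```

Clearing the field first matters: the CPU is in machine mode at this point, so simply OR-ing in MSTATUS_MPP_S on top of a field that already holds the machine-mode pattern would leave it unchanged.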

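Next, main runs. After main initializes several devices and subsystems, it creates the first process by calling userinit (kernel/proc.c:212). The first process executes a small program written in RISC-V assembly, initcode.S (user/initcode.S:1), which re-enters the kernel by invoking the exec system call. exec replaces the memory and registers of the current process with a new program, /init. Once the kernel has completed exec, it returns to user space in the /init process. init (user/init.c:15) creates a console device file and opens it as file descriptors 0, 1, and 2. It then loops forever, starting a shell and reaping zombie processes. The system is up.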

// start() jumps here in supervisor mode on all CPUs.
void
main()
{
  if(cpuid() == 0){
    consoleinit();
    printfinit();
    printf("\n");
    printf("xv6 kernel is booting\n");
    printf("\n");
    kinit();         // physical page allocator
    kvminit();       // create kernel page table
    kvminithart();   // turn on paging
    procinit();      // process table
    trapinit();      // trap vectors
    trapinithart();  // install kernel trap vector
    plicinit();      // set up interrupt controller
    plicinithart();  // ask PLIC for device interrupts
    binit();         // buffer cache
    iinit();         // inode cache
    fileinit();      // file table
    virtio_disk_init(); // emulated hard disk
    userinit();      // first user process
    __sync_synchronize();
    started = 1;
  } else {
    while(started == 0)
      ;
    __sync_synchronize();
    printf("hart %d starting\n", cpuid());
    kvminithart();    // turn on paging
    trapinithart();   // install kernel trap vector
    plicinithart();   // ask PLIC for device interrupts
  }

  scheduler();        
}

1. System call tracing (moderate)

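In this assignment you will add a system call tracing feature that may help you when debugging later labs. You will create a new trace system call that takes one argument, an integer mask, whose bits specify which system calls to trace. For example, to trace the fork system call, a program calls trace(1 << SYS_fork), where SYS_fork is a system call number defined in kernel/syscall.h.

// System call numbers
#define SYS_fork    1

You have to modify the xv6 kernel to print a line when each traced system call is about to return. The line should contain the process id, the name of the system call, and the return value; you do not need to print the system call arguments. The trace system call should enable tracing for the process that calls it and for any children it subsequently forks, but should not affect other processes.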

The lab provides a user-level program, trace, that runs another program with tracing enabled (see user/trace.c). When you are done, you should see output like this:

$ trace 32 grep hello README
3: syscall read -> 1023
3: syscall read -> 966
3: syscall read -> 70
3: syscall read -> 0
$
$ trace 2147483647 grep hello README
4: syscall trace -> 0
4: syscall exec -> 3
4: syscall open -> 3
4: syscall read -> 1023
4: syscall read -> 966
4: syscall read -> 70
4: syscall read -> 0
4: syscall close -> 0
$
$ grep hello README
$
$ trace 2 usertests forkforkfork
usertests starting
test forkforkfork: 407: syscall fork -> 408
408: syscall fork -> 409
409: syscall fork -> 410
410: syscall fork -> 411
409: syscall fork -> 412
410: syscall fork -> 413
409: syscall fork -> 414
411: syscall fork -> 415
...
$

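In the first example, 32 is 2 to the power 5, that is 1 << SYS_read, because kernel/syscall.h contains #define SYS_read 5. trace therefore traces only the read system calls made by grep.

In the second example, the low 31 bits of 2147483647 are all 1, so every system call is traced.

In the third example the program is not traced, so no trace output is printed.

In the fourth example, the fork system calls of all the descendants of the forkforkfork test in usertests are traced.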
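The way a mask value selects system calls can be checked with a few lines of C. This is only an illustration: mask_selects is a hypothetical helper, and the three SYS_* numbers are copied from kernel/syscall.h.

```c
#include <assert.h>

// A few system call numbers from kernel/syscall.h.
#define SYS_fork  1
#define SYS_read  5
#define SYS_close 21

// Hypothetical helper: returns 1 if bit `num` of `mask` is set, i.e. if
// trace(mask) asks for system call number `num` to be traced.
int
mask_selects(int mask, int num)
{
  assert(num >= 0 && num < 31);
  return (int)(((unsigned)mask >> num) & 1u);
}
```

mask_selects(32, SYS_read) is 1 while mask_selects(32, SYS_fork) is 0, which is why the first example prints only read lines. With 2147483647 every bit from 0 to 30 is set, so every system call number is selected.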

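Some hints:

Add $U/_trace to UPROGS in the Makefile.
Run make qemu and you will see that the compiler cannot compile user/trace.c, because the user-space declarations for the system call do not exist yet: add a prototype for the system call to user/user.h, a stub to user/usys.pl, and a syscall number to kernel/syscall.h. The Makefile invokes the perl script user/usys.pl, which produces user/usys.S, the actual system call stubs, which use the RISC-V ecall instruction to transition to the kernel. Once you have fixed the compilation issues, trace 32 grep hello README still fails, because you have not implemented the system call in the kernel yet.
Add a sys_trace() function in kernel/sysproc.c that implements the new system call by remembering its argument in a new variable in the proc structure (see kernel/proc.h). The functions to retrieve system call arguments from user space are in kernel/syscall.c, and you can see examples of their use in kernel/sysproc.c.
Modify fork() (see kernel/proc.c) to copy the trace mask from the parent to the child process.
Modify the syscall() function in kernel/syscall.c to print the trace output. You will need to add an array of syscall names to index into.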

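Solution

First we need to understand how a system call flows through the system.

user/usys.pl generates the assembly file usys.S. Each stub loads the system call number into register a7 and then executes ecall to trap into the kernel. As the code below shows, we therefore need to define SYS_{name} as the system call number.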
```
sub entry {
    my $name = shift;
    print ".global $name\n";
    print "${name}:\n";
    print " li a7, SYS_${name}\n";
    print " ecall\n";
    print " ret\n";
}
```
- So add the system call number to kernel/syscall.h:
```
#define SYS_trace  22
```
- Also add an entry to user/usys.pl:
```
entry("trace");
```
Add a declaration of the trace function to user/user.h:
```
int trace(int);
```
Add a byte array, mask, to the proc structure in kernel/proc.h to hold the trace mask:
```
char mask[23];               // Trace mask
```

Implement the system call itself in kernel/sysproc.c:

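// trace: record which system calls the calling process wants traced.
uint64
sys_trace(void)
{
  int n;
  if (argint(0, &n) < 0)
    return -1;
  struct proc *p = myproc();
  char *mask = p->mask;
  // Clear any mask left over from an earlier call or an earlier process.
  memset(mask, 0, sizeof(p->mask));
  // Slot i holds '1' if bit i of n is set, i.e. if syscall i is traced.
  int i = 0;
  while (i < 23 && n > 0) {
    mask[i++] = (n & 1) ? '1' : '0';
    n >>= 1;
  }
  return 0;
}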

Declare sys_trace in kernel/syscall.c:
```
extern uint64 sys_trace(void);
```
Also add sys_trace to the syscalls array of function pointers in kernel/syscall.c:
```
[SYS_trace]   sys_trace,
```
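To see why a new system call needs both a number and a table entry, the dispatch step can be imitated in an ordinary user-space program. This is a simplified model, not the kernel's code: fake_getpid, fake_trace, handlers, and dispatch are names invented for the sketch, and the real syscall() reads the number from the trapframe's a7 and stores the result in a0.

```c
#include <assert.h>
#include <stdint.h>

#define NELEM(x) (sizeof(x) / sizeof((x)[0]))

#define SYS_getpid 11
#define SYS_trace  22

// Stand-ins for kernel handlers; the real ones live in kernel/sysproc.c.
static uint64_t fake_getpid(void) { return 3; }
static uint64_t fake_trace(void)  { return 0; }

// Table indexed by system call number, built with designated initializers
// in the same spirit as the syscalls[] array in kernel/syscall.c.
static uint64_t (*handlers[])(void) = {
  [SYS_getpid] = fake_getpid,
  [SYS_trace]  = fake_trace,
};

// The highest designated index fixes the table size.
static_assert(NELEM(handlers) == SYS_trace + 1, "table ends at SYS_trace");

// Simplified model of syscall(): `a7` carries the call number and the
// return value is what the kernel would store in register a0.
uint64_t
dispatch(int a7)
{
  if (a7 > 0 && a7 < (int)NELEM(handlers) && handlers[a7])
    return handlers[a7]();
  return (uint64_t)-1;  // unknown system call
}
```

Numbers without an entry, such as 1 in this reduced table, hit a null pointer in the array and are rejected, which is what happens to a stub whose number was defined but whose handler was never registered.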

Add an array to kernel/syscall.c that holds the system call names:

static char *syscall_names[23] = 
  {
    "",
    "fork",
    "exit",
    "wait",
    "pipe",
    "read",
    "kill",
    "exec",
    "fstat",
    "chdir",
    "dup",
    "getpid",
    "sbrk",
    "sleep",
    "uptime",
    "open",
    "write",
    "mknod",
    "unlink",
    "link",
    "mkdir",
    "close",
    "trace"
  };

In syscall() in kernel/syscall.c, print the trace line required by the assignment, based on the mask:

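// A non-empty mask means trace() was called; slot num says whether
// this particular system call is traced.
if (p->mask[0] != 0 && num > 0 && num < (int)sizeof(p->mask) && p->mask[num] == '1') {
    printf("%d: syscall %s -> %d\n",
           p->pid, syscall_names[num], p->trapframe->a0);
}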

In fork() in kernel/proc.c, the child copies the parent's mask:
```
// Copy every slot; safestrcpy would drop the last one to make room for a NUL.
memmove(np->mask, p->mask, sizeof(p->mask));
```
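The character array works, but the mask argument is already a bit set, so it can also be stored as a plain integer. The sketch below shows that alternative design in isolation; it is not what this write-up implements. The tracemask field and the three helper functions are hypothetical, and struct proc is reduced to the two fields the example needs.

```c
#include <assert.h>

// Minimal stand-in for the kernel's struct proc.
struct proc {
  int pid;
  int tracemask;   // hypothetical field: bit n set means "trace syscall n"
};

// What sys_trace() would do with this representation.
void
set_trace(struct proc *p, int mask)
{
  p->tracemask = mask;   // overwrites any earlier mask in one step
}

// What fork() would do: the child inherits the parent's mask.
void
inherit_trace(struct proc *child, const struct proc *parent)
{
  child->tracemask = parent->tracemask;
}

// What syscall() would check before printing a trace line.
int
should_trace(const struct proc *p, int num)
{
  assert(num > 0 && num < 31);
  return (int)(((unsigned)p->tracemask >> num) & 1u);
}
```

With an integer there is no terminator to worry about and copying the mask in fork() is a single assignment, at the cost of limiting the mask to the width of an int.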
The grading results are as follows:

(base) zsc@BerryLap:~/xv6-labs-2020$ ./grade-lab-syscall trace
make: 'kernel/kernel' is up to date.
== Test trace 32 grep == trace 32 grep: OK (1.2s)
== Test trace all grep == trace all grep: OK (0.8s)
== Test trace nothing == trace nothing: OK (1.0s)
== Test trace children == trace children: OK (9.8s)
(base) zsc@BerryLap:~/xv6-labs-2020$

2. Sysinfo (moderate)

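In this assignment you will add a system call, sysinfo, that collects information about the running system. The system call takes one argument: a pointer to a struct sysinfo (see kernel/sysinfo.h). The kernel should fill in the fields of this struct: the freemem field should be set to the number of bytes of free memory, and the nproc field should be set to the number of processes whose state is not UNUSED. The lab provides a test program, sysinfotest; you pass this assignment if it prints "sysinfotest: OK".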

Some hints:

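Add $U/_sysinfotest to UPROGS in the Makefile.
Run make qemu; user/sysinfotest.c will fail to compile. Add the sysinfo system call, following the same steps as in the previous assignment. To declare the prototype for sysinfo() in user/user.h you need to predeclare the existence of struct sysinfo: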
```
struct sysinfo;
int sysinfo(struct sysinfo *);
```
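- Once the compilation issues are fixed, run sysinfotest; it will fail because the system call has not been implemented in the kernel yet.
sysinfo needs to copy a struct sysinfo back to user space; see sys_fstat() (kernel/sysfile.c) and filestat() (kernel/file.c) for examples of how to do that using copyout().
To collect the amount of free memory, add a function to kernel/kalloc.c.
To collect the number of processes, add a function to kernel/proc.c.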

Solution

kernel/sysinfo.h already defines the sysinfo structure:

struct sysinfo {
  uint64 freemem;   // amount of free memory (bytes)
  uint64 nproc;     // number of process
};

Add $U/_sysinfo\ to UPROGS in the Makefile, for the sysinfo user program created at the end of this section.
Add a system call number to kernel/syscall.h:
```
#define SYS_sysinfo 23
```
Add an entry to user/usys.pl: entry("sysinfo");
Declare sys_sysinfo in kernel/syscall.c:
```
extern uint64 sys_sysinfo(void);
```
- and add it to the syscalls array:
```
[SYS_sysinfo] sys_sysinfo,
```
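- Also add its name to the syscall_names[MASK_SIZE] array of system call names. The array must now have at least 24 elements so that index 23 is valid.

Add the function prototype to user/user.h: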

struct sysinfo;
int sysinfo(struct sysinfo*);

In kernel/sysproc.c, add #include "sysinfo.h" and the system call implementation:

uint64
sys_sysinfo(void)
{
  struct sysinfo info;
  uint64 addr;
  struct proc *p = myproc();
  if (argaddr(0, &addr) < 0) {
    return -1;
  }
  info.freemem = freemem_size();
  info.nproc = proc_num();
  // copyout() returns 0 on success and -1 on failure.
  if (copyout(p->pagetable, addr, (char *)&info, sizeof(info)) < 0) {
    return -1;
  }
  return 0;
}
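The kernel cannot simply write through the pointer it receives from user space, because that address is only meaningful in the process's own page table; copyout() performs the translation and reports failure for invalid addresses. The toy model below captures just the return convention (0 on success, -1 on failure). It is a sketch under heavy simplification: fake_uspace and fake_copyout are invented names, and a flat buffer stands in for the page-table walk that the real copyout() does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct sysinfo {
  uint64_t freemem;  // amount of free memory (bytes)
  uint64_t nproc;    // number of processes
};

// Toy model of a process's user memory: a flat buffer of `size` bytes.
struct fake_uspace {
  unsigned char *mem;
  uint64_t size;
};

// Toy counterpart of copyout(): copy len bytes to "user address" dstva.
// Returns 0 on success and -1 if the range is not inside user memory.
int
fake_copyout(struct fake_uspace *u, uint64_t dstva, const void *src, uint64_t len)
{
  assert(u != 0 && src != 0);
  if (dstva > u->size || len > u->size - dstva)
    return -1;
  memcpy(u->mem + dstva, src, len);
  return 0;
}
```

Because success is reported as 0, the caller should test the result with < 0; comparing it against the number of bytes copied would treat every successful call as an error.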

sys_sysinfo() relies on two helper functions, freemem_size() and proc_num(), which are implemented next.

Add the function declarations to kernel/defs.h:

int freemem_size(void);
int proc_num(void);

Implement proc_num in kernel/proc.c:

// Count the process table slots that are in use.
int
proc_num(void)
{
  struct proc *p;
  int num = 0;
  for (p = proc; p < &proc[NPROC]; p++) {
    if (p->state != UNUSED) {
      num++;
    }
  }
  return num;
}

proc here is the array of struct proc declared near the top of this file; the function simply walks the whole array.
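The same scan can be tried outside the kernel. In this sketch the process table is reduced to its state field; the enum values follow kernel/proc.h as I recall it from the 2020 version of xv6, and count_used is a hypothetical name for the stand-alone version of proc_num.

```c
#include <assert.h>

#define NPROC 64  // size of the process table, as in kernel/param.h

// Process states, assumed to match kernel/proc.h in xv6-labs-2020.
enum procstate { UNUSED, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

// Reduced process structure: only the field the count depends on.
struct proc {
  enum procstate state;
};

// Zero-initialized, so every slot starts out UNUSED.
static struct proc proc[NPROC];

// Count the slots whose state is anything other than UNUSED.
int
count_used(void)
{
  int n = 0;
  for (struct proc *p = proc; p < &proc[NPROC]; p++) {
    if (p->state != UNUSED)
      n++;
  }
  assert(n <= NPROC);
  return n;
}
```

Note that sleeping and zombie processes are counted too, since the assignment asks for every process whose state is not UNUSED.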

Implement freemem_size in kernel/kalloc.c:

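// Return the amount of free physical memory in bytes.
int
freemem_size(void)
{
  struct run *r;
  int num = 0;
  // Hold the allocator lock so the free list cannot change while we walk it.
  acquire(&kmem.lock);
  for (r = kmem.freelist; r; r = r->next) {
    num++;
  }
  release(&kmem.lock);

  return num * PGSIZE;
}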
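The free-memory figure is simply the number of nodes on the allocator's free list multiplied by the page size. The following stand-alone sketch shows that calculation on a hand-built list; free_bytes is a hypothetical name, and locking is left out because nothing else touches the list here.

```c
#include <assert.h>
#include <stddef.h>

#define PGSIZE 4096  // bytes per page, as in kernel/riscv.h

// Each free page stores a pointer to the next free page in its first bytes.
struct run {
  struct run *next;
};

// Count the nodes on a free list and convert the count to bytes.
long
free_bytes(const struct run *freelist)
{
  long pages = 0;
  for (const struct run *r = freelist; r != NULL; r = r->next)
    pages++;
  assert(pages >= 0);
  return pages * PGSIZE;
}
```

As a sanity check on the numbers, the 133382144 bytes reported in the test run at the end of this post correspond to 133382144 / 4096 = 32564 free pages.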

Create a new file, sysinfo.c, in the user directory with the following contents:

#include "kernel/param.h"
#include "kernel/types.h"
#include "user/user.h"
#include "kernel/sysinfo.h"

int
main(int argc, char *argv[])
{
    // This program takes no arguments.
    if (argc != 1) {
        fprintf(2, "usage: %s (takes no arguments)\n", argv[0]);
        exit(1);
    }

    struct sysinfo info;
    sysinfo(&info);
    // xv6's printf %d expects an int, so narrow the uint64 fields explicitly.
    printf("free space:%d, used process num:%d\n", (int)info.freemem, (int)info.nproc);
    exit(0);
}

The test output is as follows:

(base) zsc@BerryLap:~/xv6-labs-2020$ make qemu
qemu-system-riscv64 -machine virt -bios none -kernel kernel/kernel -m 128M -smp 3 -nographic -drive file=fs.img,if=none,format=raw,id=x0 -device virtio-blk-device,drive=x0,bus=virtio-mmio-bus.0

xv6 kernel is booting

hart 2 starting
hart 1 starting
init: starting sh
$ sysinfo
free space:133382144, used process num:3
$ QEMU: Terminated
(base) zsc@BerryLap:~/xv6-labs-2020$