x86-32 Legacy System Calls Using Interrupts

Legacy Linux system calls on x86-32 are interrupt-based; the kernel registers its system call handler for interrupt vector 0x80

To make a system call, a program places the system call number in the EAX register and executes INT 0x80 to raise the interrupt. Up to six arguments are passed in the EBX, ECX, EDX, ESI, EDI, and EBP registers, in that order, and the result is returned in EAX
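As a minimal sketch of this convention, assuming a 32-bit x86 Linux build (for example with gcc -m32) and GCC-style inline assembly, the hypothetical helper legacy_write() below places the system call number in EAX and the first three arguments in EBX, ECX, and EDX before executing INT 0x80. The helper name and the fallback branch for other targets are illustrative additions, not part of any kernel or libc interface:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: write(fd, buf, len) through the legacy INT 0x80 gate. */
static long legacy_write(int fd, const void *buf, size_t len) {
#if defined(__i386__) && defined(__linux__)
    long ret;
    __asm__ volatile ("int $0x80"
        : "=a"(ret)                    /* result comes back in EAX */
        : "a"(SYS_write),              /* EAX = system call number */
          "b"(fd), "c"(buf), "d"(len)  /* EBX, ECX, EDX = arguments 1 to 3 */
        : "memory");
    return ret;                        /* negative errno value on failure */
#else
    /* Not a 32-bit x86 build: fall back to the libc syscall() wrapper. */
    return syscall(SYS_write, fd, buf, len);
#endif
}
```

On the INT 0x80 path a failure is reported as a negative errno value in EAX, whereas the libc fallback returns -1 and sets errno, so callers of this sketch should treat any negative result as an error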

To return from the system call, Linux kernel handlers use the IRET instruction, which returns program control from an interrupt handler to the interrupted procedure

These interrupt-based system calls are still supported for compatibility, but they have largely been superseded by faster mechanisms because of the overhead of interrupt handling

x86-64 Fast System Calls

The x86-64 architecture provides two instructions, SYSCALL and SYSRET, specifically for making system calls, which are faster than using interrupts

The SYSCALL instruction invokes an OS system call handler in kernel mode by loading RIP from the IA32_LSTAR model-specific register (MSR), after saving the address of the instruction following SYSCALL into the RCX register. It also saves RFLAGS into the R11 register

For the kernel to receive incoming system calls, it therefore stores the address of its system call entry code in the IA32_LSTAR MSR. The system call number is specified in the RAX register, and up to six arguments are passed in the RDI, RSI, RDX, R10, R8, and R9 registers. The result is returned in RAX

The SYSRET instruction is the companion to SYSCALL, returning from an OS system call handler to user code in user mode. It does this by loading RIP from the RCX register and RFLAGS from the R11 register
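The register usage described in this section can be sketched from user space as follows, assuming x86-64 Linux and GCC-style inline assembly. The helper name raw_syscall3() is hypothetical, and the fallback branch for other targets exists only so the sketch compiles elsewhere. RCX and R11 are listed as clobbered because SYSCALL overwrites them with the return address and the saved RFLAGS:

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue a system call with up to three arguments. */
static long raw_syscall3(long nr, long a1, long a2, long a3) {
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile ("syscall"
        : "=a"(ret)                   /* result comes back in RAX */
        : "a"(nr),                    /* RAX = system call number */
          "D"(a1), "S"(a2), "d"(a3)   /* RDI, RSI, RDX = arguments 1 to 3 */
        : "rcx", "r11", "memory");    /* SYSCALL overwrites RCX and R11 */
    return ret;                       /* negative errno value on failure */
#else
    /* Not an x86-64 Linux build: fall back to the libc syscall() wrapper. */
    return syscall(nr, a1, a2, a3);
#endif
}
```

For example, raw_syscall3(SYS_getpid, 0, 0, 0) should return the same value as getpid(). Arguments four to six would go in R10, R8, and R9, which this three-argument sketch leaves out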

Virtual Dynamic Shared Object (vDSO)

Some system calls only need to read a small amount of information from the kernel, and for these the full transition into kernel mode and back is a significant overhead. The Linux vDSO is an ELF shared library that is part of the kernel but mapped into the address space of a user program to speed up some of these read-only system calls. Programs can call the functions it provides without making a system call, thus avoiding the performance penalty of switching to kernel mode

Linux exposes four system calls through the vDSO: clock_gettime(), gettimeofday(), time(), and getcpu()
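The two paths can be compared with a short sketch, assuming a 64-bit Linux system with glibc, where the libc clock_gettime() wrapper typically resolves to the vDSO implementation while syscall(SYS_clock_gettime, ...) always enters the kernel. The helper name read_clock_both_ways() is hypothetical:

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: read CLOCK_MONOTONIC once through the libc wrapper
 * (usually served by the vDSO) and once through a real system call.
 * Returns 0 if both reads succeed, -1 otherwise. */
static int read_clock_both_ways(struct timespec *via_libc,
                                struct timespec *via_syscall) {
    if (clock_gettime(CLOCK_MONOTONIC, via_libc) != 0)
        return -1;
    /* Assumes the kernel and libc agree on the struct timespec layout,
     * which holds on 64-bit Linux. */
    if (syscall(SYS_clock_gettime, CLOCK_MONOTONIC, via_syscall) != 0)
        return -1;
    return 0;
}
```

Both calls report the same clock, so the second reading should never be earlier than the first. Calling each path in a loop and timing it is a simple way to observe the cost that the vDSO avoids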

Locating the vDSO

Because of address space layout randomization (ASLR), the vDSO is mapped at a random address at program startup. The kernel passes the address of the start of the vDSO's linker-generated ELF header to the program in the AT_SYSINFO_EHDR entry of the auxiliary vector. Once that header is located, user programs can parse the ELF object and call the functions in it as needed
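A minimal sketch of the lookup step, assuming Linux with a libc that provides getauxval() (glibc 2.16 or later, or musl). The helper names find_vdso_base() and looks_like_elf() are hypothetical, and the sketch stops at checking the ELF magic bytes rather than parsing the symbol table:

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h>

/* Hypothetical helper: return the address of the vDSO's ELF header,
 * or NULL if the kernel did not provide one (getauxval() returns 0). */
static const unsigned char *find_vdso_base(void) {
    unsigned long base = getauxval(AT_SYSINFO_EHDR);
    return (const unsigned char *)base;
}

/* Hypothetical helper: check for the "\177ELF" magic at the given address. */
static int looks_like_elf(const unsigned char *p) {
    return p != NULL && memcmp(p, ELFMAG, SELFMAG) == 0;
}
```

From the returned address a program would go on to read the ELF header, walk the program headers to find the dynamic section, and resolve symbols such as the exported clock_gettime() implementation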

References