x86-32 Legacy System Calls Using Interrupts
Legacy Linux system calls on x86-32 are interrupt-based: the kernel registers interrupt vector 0x80 for its system call handler.
To make a system call, a program places the system call number in the EAX register and the arguments in the remaining general-purpose registers (on Linux: EBX, ECX, EDX, ESI, EDI, and EBP, in that order), then executes INT 0x80 to raise the interrupt.
To return from the system call, the Linux kernel handler uses the IRET instruction, which returns program control from an interrupt handler to the interrupted procedure.
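As a rough illustration of the sequence above, the sketch below calls getpid through INT 0x80 using GCC-style inline assembly. The helper name legacy_getpid is made up for this example. The interrupt is only emitted on 32-bit x86 builds; on any other target the function falls back to the libc syscall() wrapper, so treat it as a sketch under those assumptions rather than a portable implementation.

```c
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: ask the kernel for our PID via the legacy gate. */
long legacy_getpid(void)
{
#if defined(__i386__)
    long ret;
    /* EAX carries the system call number in and the return value out.
     * getpid takes no arguments, so no other registers are loaded. */
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)
                      : "a"(SYS_getpid)
                      : "memory");
    return ret;
#else
    /* Assumption: not a 32-bit x86 build, so use the libc wrapper. */
    return syscall(SYS_getpid);
#endif
}
```

Comparing the result with getpid() from libc is a quick way to check that the call reached the kernel.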
Interrupt-based system calls have largely been replaced by faster mechanisms because of the performance penalty that interrupts incur, although Linux still supports INT 0x80 for compatibility.
x86-64 Fast System Calls
The x86-64 architecture provides two instructions, SYSCALL and SYSRET, specifically for making system calls; they are faster than using interrupts.
The SYSCALL instruction invokes an OS system call handler in kernel mode: it saves the address of the instruction following SYSCALL into the RCX register, saves RFLAGS into the R11 register, and then loads RIP from the IA32_LSTAR MSR.
In other words, to receive incoming system calls, the kernel stores the address of its system call entry code in the IA32_LSTAR MSR. The system call number is specified in the RAX register, and the arguments for the system call are placed in a subset of the general-purpose registers (on Linux: RDI, RSI, RDX, R10, R8, and R9).
The SYSRET instruction is the companion to SYSCALL, returning from an OS system call handler to user code in user mode. It does this by loading RIP from RCX and RFLAGS from R11.
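The register traffic described in this section can be sketched with GCC-style inline assembly. The helper raw_syscall3 is a hypothetical name for this example, and it assumes the Linux x86-64 convention of passing the first three arguments in RDI, RSI, and RDX. RCX and R11 are listed as clobbered because SYSCALL overwrites them with the return address and the saved RFLAGS. On other targets the sketch falls back to the libc syscall() wrapper.

```c
#include <sys/syscall.h>   /* SYS_getpid, SYS_write */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: issue a system call with up to three arguments. */
long raw_syscall3(long nr, long a1, long a2, long a3)
{
#if defined(__x86_64__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)               /* RAX: return value   */
                      : "a"(nr),                /* RAX: call number    */
                        "D"(a1),                /* RDI: argument 1     */
                        "S"(a2),                /* RSI: argument 2     */
                        "d"(a3)                 /* RDX: argument 3     */
                      : "rcx", "r11", "memory"); /* overwritten by SYSCALL */
    return ret;   /* on failure this is a negative errno value */
#else
    /* Assumption: not an x86-64 build; libc returns -1 and sets errno. */
    return syscall(nr, a1, a2, a3);
#endif
}
```

For example, raw_syscall3(SYS_getpid, 0, 0, 0) should return the same value as getpid(). Note that the raw instruction reports errors as a negative return value, whereas the libc wrapper translates them into -1 plus errno.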
Virtual Dynamic Shared Object (vDSO)
Some system calls only need to read a small amount of information from the kernel, and for these the full machinery of a switch to kernel mode is significant overhead. The Linux vDSO is an ELF shared library that is provided by the kernel but mapped into the address space of a user program to speed up some of these read-only system calls. The vDSO allows a program to call the kernel-provided functions without making an actual system call, which eliminates the performance penalty of entering and leaving kernel mode.
On x86-64, Linux exposes four system calls through the vDSO: clock_gettime(), gettimeofday(), time(), and getcpu().
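To make the distinction concrete, the sketch below reads the same clock in two ways: through the libc clock_gettime() wrapper, which on typical glibc systems is served by the vDSO, and through syscall(), which forces a real transition into the kernel. The function names are invented for this example. It assumes a 64-bit Linux target, where the kernel's timespec layout matches the libc one. Whether the first path actually stays in user space depends on the libc, kernel, and clock source in use.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_clock_gettime */
#include <time.h>          /* clock_gettime(), struct timespec */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: libc wrapper, usually answered by the vDSO. */
int time_via_libc(struct timespec *ts)
{
    return clock_gettime(CLOCK_MONOTONIC, ts);
}

/* Hypothetical helper: bypass the vDSO and enter the kernel. */
int time_via_syscall(struct timespec *ts)
{
    return (int)syscall(SYS_clock_gettime, CLOCK_MONOTONIC, ts);
}
```

Calling each function in a tight loop and timing the two loops is a simple way to observe the cost that the vDSO avoids.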
Locating the vDSO
Because of ASLR, the vDSO is mapped at a random address at program startup. The kernel passes the address of the vDSO's ELF header to the program in the AT_SYSINFO_EHDR entry of the auxiliary vector; the vDSO itself is an ordinary ELF image produced by a linker. Once that header is located, user programs can parse the ELF object and call the functions in it as needed.
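A minimal sketch of the lookup step, assuming glibc's getauxval() is available (the function name vdso_elf_check is made up for this example): it fetches AT_SYSINFO_EHDR and checks that the mapping starts with the ELF magic bytes. Parsing the symbol table to find individual functions is left out.

```c
#include <elf.h>        /* ELFMAG, SELFMAG */
#include <string.h>     /* memcmp() */
#include <sys/auxv.h>   /* getauxval(), AT_SYSINFO_EHDR */

/* Hypothetical helper.
 * Returns  1 if the vDSO mapping starts with the ELF magic,
 *          0 if it does not,
 *         -1 if the auxiliary vector reports no vDSO. */
int vdso_elf_check(void)
{
    unsigned long base = getauxval(AT_SYSINFO_EHDR);
    if (base == 0)
        return -1;  /* entry absent, e.g. vDSO disabled */
    return memcmp((const void *)base, ELFMAG, SELFMAG) == 0 ? 1 : 0;
}
```

From here a program would interpret the bytes at that address as an ELF header and walk the dynamic symbol table to resolve the exported functions.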