Compare-and-Swap (CAS)
Compare-And-Swap (CAS) is an atomic instruction used to achieve synchronization: it compares the contents of a memory location with a given value and, if so, changes it to a new value. It returns a flag indicating whether it changed the value
CAS behaves like this function, but atomically:
CAS is implemented using special hardware instructions provided by the CPU
- On x86-64 CAS is supported directly by the
CMPXCHG
instruction - On ARM64 and RISC-V, CAS can be implemented using LL/SC instructions
For example, on RISC-V:
Double-Width CAS (DW-CAS)
Double-Width Compare-and-Swap (DW-CAS) is a CPU instruction performing a CAS operation on a memory location that’s double the native word size. For example, it gives you 128-bit atomic lock-free variables on a 64-bit CPU
On x86-64, DW-CAS is supported directly by the CMPXCHG16B
instruction
Fetch-and-Add
Fetch-and-Add (FAA) is an atomic instruction used to increment the contents of a memory location by a specified value and return the original value
FAA is implemented using special hardware instructions provided by the CPU:
- On x86-64, FAA is implemented by the
XADD
instruction - On RISC-V, FAA is implemented using the
AMOADD
instruction, while on ARM64, it is implemented using theLDADDAL
instruction
FAA can be implemented using CAS on architectures without FAA support:
Load-Link/Store-Conditional
Load-Link (LL) (also known as Load-Reserved (LR)) and Store-Conditional (SC) are a pair of atomic instructions used to achieve synchronization:
- Load-Link (LL) returns the current value of a memory location
- Store-Conditional (SC) writes to a memory location only if (1) it was the memory location used in the last LL instruction, and (2) that location has not been changed since the LL
Real implementations of LL/SC do not always succeed even if there are no concurrent updates to the memory location in question. Any exceptional events between the two operations, such as a context switch, another LL, or even another load or store operation, will cause the SC to spuriously fail
Typically, CPUs track the LL address at a cache-line granularity, such that any modification to any portion of the cache line (whether via another core’s SC or merely by an ordinary store) is sufficient to cause the SC to fail
LL/SC are supported by both ARM64 ldxr/stxr
and RISC-V lr/sc
instructions
Double-Width LL/SC
Double-Width LL/SC are LL/SC that act on double words. For example on RISC-V these instructions are lr.d
and sc.d
Compared to CAS
If any updates have occurred, the SC is guaranteed to fail, even if the value read by the LL has since been restored. As such, an LL/SC pair is stronger than a read followed by a CAS, which will not detect updates if the old value has been restored, thus eliminating the ABA problem
One advantage of CAS is that it guarantees that some hardware thread eventually makes progress, whereas an LR/SC atomic sequence could livelock indefinitely on some systems, due to spurious fails. To avoid this concern, RISC-V added an architectural guarantee of livelock freedom for certain LR/SC sequences