it's essential to decrement the stack pointer before writing to new
stack space, rather than afterwards. otherwise there is a race
condition during which asynchronous code (signals) could clobber the
data being stored.
it may be possible to optimize the code further using stwu, but I
wanted to avoid making any changes to the actual stack layout in this
commit. further improvements can be made separately if desired.