fix arm atomic store and generate simpler/less-bloated/faster code