xchg is slow on Athlons, so swap the two values with 3 xors instead
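
For reference, the three-xor swap is the sequence "xor a,b; xor b,a; xor a,b". Below is a minimal sketch of the same identity in C, not the code this change touches. The helper name xor_swap and the 32-bit operand width are assumptions made for illustration, and a compiler is free to emit something other than three xor instructions for it.

```c
#include <stdint.h>

/* Hypothetical helper (name and 32-bit width are illustrative assumptions):
   swap two values with three XORs and no temporary. */
static void xor_swap(uint32_t *a, uint32_t *b)
{
    if (a == b)
        return;   /* same object: x ^ x == 0 would wipe the value */
    *a ^= *b;     /* a = a0 ^ b0 */
    *b ^= *a;     /* b = b0 ^ (a0 ^ b0) = a0 */
    *a ^= *b;     /* a = (a0 ^ b0) ^ a0 = b0 */
}
```

The aliasing guard only matters for the pointer version; with two distinct registers the three xors are enough on their own.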