use atomic decrement rather than cas in pthread_exit thread count
[musl] / include / memory.h
#include <string.h>