optimize cond waiter move using atomic swap instead of CAS loop
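The diff is not shown here, so this is only a minimal sketch of the general technique, not this change's actual code. It assumes the waiter being moved sits in a single atomic word that is unconditionally cleared when the waiter is taken. `WaiterSlot`, `take_cas_loop`, `take_swap` and `EMPTY` are hypothetical names chosen for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical sentinel meaning "no waiter parked in this slot".
pub const EMPTY: usize = 0;

/// Hypothetical slot holding an opaque waiter token (e.g. a thread id or a
/// pointer cast to usize). Illustrative only, not taken from the commit.
pub struct WaiterSlot {
    token: AtomicUsize,
}

impl WaiterSlot {
    pub fn new(token: usize) -> Self {
        WaiterSlot { token: AtomicUsize::new(token) }
    }

    /// Before: load the current value, then try to replace it with EMPTY via
    /// compare-exchange. If another thread changed the slot in between (or the
    /// weak CAS fails spuriously), retry with the freshly observed value.
    pub fn take_cas_loop(&self) -> usize {
        let mut current = self.token.load(Ordering::Relaxed);
        loop {
            match self.token.compare_exchange_weak(
                current,
                EMPTY,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(prev) => return prev,
                // Lost a race or failed spuriously: retry with the new value.
                Err(actual) => current = actual,
            }
        }
    }

    /// After: one atomic swap stores EMPTY and returns the previous value in a
    /// single read-modify-write, with no retry path.
    pub fn take_swap(&self) -> usize {
        self.token.swap(EMPTY, Ordering::AcqRel)
    }
}
```

Both functions return the previous token and leave the slot EMPTY. The swap form is only a drop-in replacement because the value stored does not depend on the value read; if the move had to be conditional on the old value, a compare-exchange would still be required.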