The implementation used to generate 32-bit bts/btr/btc for 8- and 16-bit
atomics. The 32-bit forms access four bytes of memory, more than the object
occupies, which could result in alignment and access violations and possibly
data corruption. 32- and 64-bit atomics are unaffected.
This commit fixes the operand width for 16-bit atomics. There are no 8-bit
bts/btr/btc instructions, so 8-bit atomics now use the generic
implementation based on or/and/xor instructions.
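For illustration only, here is a minimal C11 sketch of what a width-correct
or/and/xor fallback for 8-bit bit operations can look like. It assumes the
operation's contract is "set, reset or complement one bit and return its
previous value", as bts/btr/btc do via the carry flag. The helper names are
hypothetical and are not the functions changed by this commit. The compiler
chooses the actual instruction sequence.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: bit operations on an 8-bit atomic that only ever
   access one byte, using fetch_or/fetch_and/fetch_xor instead of a wider
   bts/btr/btc. `bit` is assumed to be 0..7; masking keeps the shift in range. */

static bool atomic_bit_test_and_set_u8(_Atomic uint8_t *p, unsigned bit) {
    uint8_t mask = (uint8_t)(1u << (bit & 7u));
    /* The returned old value reports whether the bit was already set. */
    return (atomic_fetch_or(p, mask) & mask) != 0;
}

static bool atomic_bit_test_and_reset_u8(_Atomic uint8_t *p, unsigned bit) {
    uint8_t mask = (uint8_t)(1u << (bit & 7u));
    /* AND with the inverted mask clears only the selected bit. */
    return (atomic_fetch_and(p, (uint8_t)~mask) & mask) != 0;
}

static bool atomic_bit_test_and_complement_u8(_Atomic uint8_t *p, unsigned bit) {
    uint8_t mask = (uint8_t)(1u << (bit & 7u));
    /* XOR with the mask flips only the selected bit. */
    return (atomic_fetch_xor(p, mask) & mask) != 0;
}
```

Each helper reads and writes exactly one byte, so unlike a 32-bit bts on
the same address it cannot reach memory beyond the object.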