* [x86][avx2] add _mm256_shuffle{hi,lo}_epi16
* [x86][avx2] add _mm256_{insert,extract}i128_si256
* [x86][avx2] add remaining permute intrinsics