* Add _mm_loadh_pi
* Add doctest for _mm_loadh_pi
* Add _mm_loadl_pi
* Add _mm_load_ss
* Add _mm_load1_ps and _mm_load_ps1
* Add _mm_load_ps and _mm_loadu_ps
* Add _mm_loadr_ps
* Replace _mm_loadu_ps TODO with explanation
* Tweak expected instructions for _mm_loadl_pi and _mm_loadh_pi on x86
* Try fixing i586 test crash
* Targets i586/i686 generate different code for _mm_loadh_pi
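
For reference, a minimal sketch of how some of the load intrinsics listed above behave, assuming they are reached through `std::arch::x86_64` with the usual Intel semantics. The names `Aligned`, `lanes`, and `load_variants` are hypothetical helpers for illustration and are not part of this change. `_mm_loadl_pi` and `_mm_loadh_pi` are left out because they may not be available in a given toolchain. On non-x86_64 targets a scalar stand-in returns the lane layout the intrinsics are expected to produce.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: a 16-byte aligned buffer, since `_mm_loadr_ps`
/// (like `_mm_load_ps`) requires an aligned pointer.
#[repr(align(16))]
struct Aligned([f32; 4]);

/// Hypothetical helper: copy the four lanes of a vector into an array.
#[cfg(target_arch = "x86_64")]
fn lanes(v: __m128) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    // Unaligned store, so `out` needs no special alignment.
    unsafe { _mm_storeu_ps(out.as_mut_ptr(), v) };
    out
}

/// Returns the lanes produced by, in order:
/// `_mm_load_ss`, `_mm_load1_ps`, `_mm_loadu_ps`, `_mm_loadr_ps`.
#[cfg(target_arch = "x86_64")]
fn load_variants(data: &Aligned) -> [[f32; 4]; 4] {
    let p = data.0.as_ptr();
    // SSE is part of the x86_64 baseline, and `p` is valid and 16-byte aligned.
    unsafe {
        [
            lanes(_mm_load_ss(p)),  // lowest lane only, upper lanes zeroed
            lanes(_mm_load1_ps(p)), // first element broadcast to all lanes
            lanes(_mm_loadu_ps(p)), // four elements, no alignment required
            lanes(_mm_loadr_ps(p)), // four elements in reverse order
        ]
    }
}

/// Scalar stand-in for other targets (an assumption of this sketch, so the
/// example still builds there); it mirrors the lane layouts described above.
#[cfg(not(target_arch = "x86_64"))]
fn load_variants(data: &Aligned) -> [[f32; 4]; 4] {
    let [a, b, c, d] = data.0;
    [[a, 0.0, 0.0, 0.0], [a; 4], [a, b, c, d], [d, c, b, a]]
}
```

Calling `load_variants(&Aligned([1.0, 2.0, 3.0, 4.0]))` gives one array per intrinsic, which makes the difference between the single-lane, broadcast, plain, and reversed loads easy to compare.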