* Add _mm_cmp*_ps variants (SSE)
* Add _mm_comi{eq,lt,le,gt,ge,neq}_ss intrinsics (SSE)
* Add _mm_ucomi*_ss intrinsics (SSE)
The _mm_ucomi*_ss intrinsics all compile down to the same x86
instruction, UCOMISS, whereas the _mm_comi*_ss intrinsics compile down
to COMISS. Both sets of intrinsics return exactly the same results; they
differ only in exception handling: COMISS signals an invalid-operation
exception for any NaN operand (quiet or signaling), UCOMISS only for a
signaling NaN. I therefore added a single test case that checks the
different effects of _mm_comieq_ss and _mm_ucomieq_ss on the MXCSR
register (read with _mm_getcsr). Together with the tests that check that
the right instruction is emitted, no further tests are needed for the
other variants.
* Avoid constant-folding test case
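
To illustrate the claim that both intrinsic families return the same
results, here is a minimal sketch, assuming an x86_64 target. The helper
names eq_both and lt_both are hypothetical and not part of this change.
Passing the inputs through std::hint::black_box is one common way to
keep the optimizer from folding such a comparison at compile time; it is
an assumption that the test case in this change does something similar.

```rust
use std::arch::x86_64::{
    _mm_comieq_ss, _mm_comilt_ss, _mm_set_ss, _mm_ucomieq_ss, _mm_ucomilt_ss,
};
use std::hint::black_box;

/// Hypothetical helper: compares the lowest lanes for equality and returns
/// (result of the COMISS-based intrinsic, result of the UCOMISS-based one).
#[allow(unused_unsafe)]
pub fn eq_both(a: f32, b: f32) -> (i32, i32) {
    // black_box asks the optimizer (best effort) to treat the inputs as
    // unknown, so the comparison is not folded into a constant.
    let (a, b) = (black_box(a), black_box(b));
    unsafe {
        let (va, vb) = (_mm_set_ss(a), _mm_set_ss(b));
        (_mm_comieq_ss(va, vb), _mm_ucomieq_ss(va, vb))
    }
}

/// Hypothetical helper: same as `eq_both`, but for the less-than comparison.
#[allow(unused_unsafe)]
pub fn lt_both(a: f32, b: f32) -> (i32, i32) {
    let (a, b) = (black_box(a), black_box(b));
    unsafe {
        let (va, vb) = (_mm_set_ss(a), _mm_set_ss(b));
        (_mm_comilt_ss(va, vb), _mm_ucomilt_ss(va, vb))
    }
}
```

For ordered (non-NaN) inputs the two elements of each returned tuple
should always agree, e.g. eq_both(1.0, 1.0) yields (1, 1). The sketch
deliberately does not inspect MXCSR; it only covers the return values.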