Ooh, I am very interested in MIT’s MADDNESS! From the looks of it, they're smartly using AVX2’s permute instructions (vpermq / vpermd) directly as a LUT in the quantization step, which looks super promising! SSSE3’s pshufb is also a possibility, though their package only supports AVX2.
This is probably similar to 8-bit quantization, where you shift the range (min, max) to (0, 255) or (-128, 127). The difference now, if I understood this correctly, is that the integer multiply-add (i.e. SSSE3's pmaddubsw) is removed and replaced with a LUT lookup plus an addition? I need to read their research more.
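
To make the comparison concrete, here is a rough pure-Python sketch of the two ideas. It is a generic product-quantization-style toy, not MADDNESS's actual encoding function or their package's API. Every name (`quantize_u8`, `build_luts`, `encode`, `lut_dot`) and the tiny prototype tables are made up for illustration. The point is only that once the tables are built, the dot product needs one lookup and one add per subspace, with no multiplies.

```python
def quantize_u8(xs):
    """Affine-map floats from [min, max] onto integers in [0, 255]."""
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant input
    return [round((x - lo) / scale) for x in xs], scale, lo


def build_luts(prototypes, b):
    """For each subspace, precompute dot(prototype, b_sub) for every prototype."""
    luts, start = [], 0
    for protos in prototypes:
        d = len(protos[0])
        b_sub = b[start:start + d]
        luts.append([sum(p * w for p, w in zip(proto, b_sub)) for proto in protos])
        start += d
    return luts


def encode(a, prototypes):
    """Replace each subvector of a with the index of its nearest prototype.

    Assumption: plain nearest-neighbour search, used here only to keep the
    toy short. It is not how MADDNESS encodes its inputs.
    """
    codes, start = [], 0
    for protos in prototypes:
        d = len(protos[0])
        a_sub = a[start:start + d]

        def dist(k):
            return sum((x - y) ** 2 for x, y in zip(a_sub, protos[k]))

        codes.append(min(range(len(protos)), key=dist))
        start += d
    return codes


def lut_dot(codes, luts):
    """Approximate dot(a, b): one table lookup and one add per subspace."""
    return sum(lut[k] for lut, k in zip(luts, codes))


# Hypothetical toy data: two 2-dimensional subspaces, four prototypes each.
prototypes = [
    [[0, 0], [1, 0], [0, 1], [1, 1]],
    [[0, 0], [2, 0], [0, 2], [2, 2]],
]
b = [3, 5, 7, 11]
a = [1, 0, 2, 2]  # chosen to coincide with prototypes, so the result is exact

luts = build_luts(prototypes, b)
codes = encode(a, prototypes)
approx = lut_dot(codes, luts)
exact = sum(x * y for x, y in zip(a, b))
```

With real data the subvectors will not land exactly on a prototype, so `approx` is only an estimate of `exact`. A SIMD version would presumably hold each small table in a register and do the lookup with a shuffle such as pshufb instead of Python list indexing.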