It’s obviously a huge leap forward in speed and context size, but the slight degradation in accuracy compared to normal GPT-4 makes me wonder: could OpenAI have taken notes from the open-source community and quantized the model for speed and scalability?
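For anyone unfamiliar with what I mean by quantization: in the open-source world it usually means storing weights at lower precision (e.g. int8 instead of float32), trading a small amount of accuracy for big memory and throughput wins. A minimal sketch of symmetric int8 weight quantization with NumPy (the matrix here is just random data standing in for real weights):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)  # stand-in weight matrix

# Symmetric int8 quantization: map [-max|w|, max|w|] onto [-127, 127].
scale = np.abs(w).max() / 127.0
q = np.round(w / scale).astype(np.int8)   # 4x smaller than float32
w_hat = q.astype(np.float32) * scale      # dequantize to compare

err = np.abs(w - w_hat).max()
print(err)  # small but nonzero: this is the accuracy cost of the compression
```

That tiny reconstruction error, accumulated across billions of weights, is exactly the kind of thing that could show up as the slight accuracy dip people are noticing.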
I know it’s a bit of a random topic, but I’m curious what you all think. This is all just speculation on my part, of course.