It looks like GPT-4-32k is rolling out

Not really. Models that run in less-than-quadratic time are starting to be developed. This one uses good ole' CNNs. I can't attest to its performance or maturity, but this nut should be cracked soon. BTW, even basic RNNs already have an unbounded context window, but they underperform transformers.
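To make the RNN point concrete, here is a toy Python sketch of my own. It is not taken from any particular model. It uses a scalar Elman-style RNN whose weights `w_h` and `w_x` are arbitrary assumed values, and the helper names are hypothetical. The RNN consumes any sequence length with one update per input and a fixed-size state, while full self-attention scores n² pairs. It also illustrates one intuition (not the whole story) for why plain RNNs lag: the influence of an early input on the final state fades roughly geometrically.

```python
import math

# Toy scalar Elman-style RNN. Weights and helper names are hypothetical,
# chosen only for illustration; real RNNs use vectors and learned weights.
def rnn_final_state(xs, w_h=0.5, w_x=1.0):
    """Fold h_t = tanh(w_h * h_{t-1} + w_x * x_t) over the whole sequence.

    One update per input (linear time), and the carried state is a single
    float no matter how long xs is, so nothing caps the window length.
    """
    h = 0.0
    for x in xs:
        h = math.tanh(w_h * h + w_x * x)
    return h

def attention_pairs(n):
    """Full self-attention scores every position against every position."""
    return n * n

def first_input_influence(n, delta=1.0):
    """How much the final state changes when only the first of n inputs changes."""
    base = [0.0] * n
    bumped = [delta] + [0.0] * (n - 1)
    return abs(rnn_final_state(bumped) - rnn_final_state(base))

if __name__ == "__main__":
    for n in (1, 10, 60):
        print(n, "RNN updates:", n,
              "attention pairs:", attention_pairs(n),
              "first-input influence:", first_input_influence(n))
```

With `w_h=0.5`, each step shrinks the leftover signal by at least half, so by n=60 the first input barely registers. Gated RNNs such as LSTMs were designed to slow this fading, but the basic trade-off is the same.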