It looks like GPT-4-32k is rolling out

@theevildays What were you hoping to use 32k for?

Realize that these transformer/attention-head architectures scale quadratically in time and memory with the sequence length (the whole context, not just the output tokens), so a 32k window is expensive to serve. It's more of a technology bottleneck, and these servers don't grow on trees!
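To see where the quadratic cost comes from, here's a minimal sketch of vanilla (naive) scaled dot-product attention in NumPy; the illustrative names (`naive_attention`, `n`, `d`) are my own, not from any particular model. The score matrix is n×n, so doubling the context length quadruples its size.

```python
import numpy as np

def naive_attention(q, k, v):
    # q, k, v: (n, d) arrays. The scores matrix below is (n, n),
    # so compute and memory grow quadratically with sequence length n.
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ v                               # (n, d)

n, d = 8, 4
rng = np.random.default_rng(0)
out = naive_attention(rng.standard_normal((n, d)),
                      rng.standard_normal((n, d)),
                      rng.standard_normal((n, d)))
print(out.shape)  # (8, 4)
```

At n = 32,768 that score matrix alone has over a billion entries per head per layer, which is why long-context serving is so hardware-hungry.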