Gemini Pro came with a 1 million token context

…but will that help unless the models can actually be trained with larger contexts? My understanding is that, due to the quadratic complexity of attention, models are not being trained with large contexts, so even if we have a large context window, models will not be able to use it well. Any thoughts on this?
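As a rough illustration of the quadratic bottleneck (a minimal NumPy sketch, not anyone's actual training code): the softmax attention score matrix is n × n, so both memory and compute grow with the square of the sequence length.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Standard scaled dot-product attention.

    The score matrix is (n, n), so memory and compute grow
    quadratically with sequence length n -- the bottleneck
    the question above refers to.
    """
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)          # (n, n) -- O(n^2) memory
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                      # O(n^2 * d) compute

# Doubling the context length quadruples the score matrix:
for n in (1_000, 2_000, 4_000):
    print(n, "tokens ->", n * n * 4 / 1e9, "GB for one fp32 score matrix")
```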

I have been using it since its preview release, and I am shocked by how superior it is. Its 1M context window, along with its multimodal capabilities, really allows you to do work that simply cannot be done with ChatGPT. For example, I gave it a lecture video and asked it to write a transcript of the part where a particular concept is explained, and it did so without any problems. I gave it an entire book and asked it to spit out a theorem, and again, it did so without any problems. The only two limitations I see (for now) are: 1) you cannot create GPTs (by which I mean agents that can readily access a set of files without re-uploading them every time); 2) it cannot format mathematics. But I guess these things will come.


Apply enough “optimization” (as with gpt-4-turbo), and it seems that what you get from attention masking on a limited number of attention layers is a lot more like “token RAG” than actual “context”…
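To make the “token RAG” intuition concrete, here is a minimal, hypothetical NumPy sketch of one such trick, a causal sliding-window mask. The window size and the masking scheme are assumptions for illustration, since OpenAI’s actual changes are undocumented:

```python
import numpy as np

def windowed_attention(Q, K, V, window=4):
    """Attention where each token only sees the previous `window` tokens.

    A simplified, hypothetical form of "attention masking on limited
    attention layers": tokens outside the window are masked out, so
    distant context only matters if something retrieval-like surfaces
    it -- hence "token RAG" rather than true full context.
    """
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    # Causal sliding-window mask: row i sees columns (i - window, i].
    idx = np.arange(n)
    mask = (idx[None, :] <= idx[:, None]) & (idx[None, :] > idx[:, None] - window)
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```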


On the other hand, there are OpenAI’s GPTs, which I have really stopped using because of their degraded performance and unreliability. I just ran a test with my GPT: 9 out of 10 times it ignored the custom instructions. The prompt is simply flooded by the knowledge files. I am aware this is a known issue, but it makes me wonder what the point of GPTs with knowledge files is if they completely zero out performance.
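For anyone who wants to reproduce a test like this: GPTs themselves aren’t scriptable, but you can roughly approximate the setup over the API by packing a knowledge-file-sized blob of text next to the custom instruction and checking adherence across repeated runs. A sketch with the OpenAI Python SDK; the file name, model choice, instruction, and adherence check are all placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical stand-in for a GPT's knowledge files: a large blob of
# text placed in the same system prompt as the custom instruction.
knowledge = open("knowledge_dump.txt").read()
instruction = "Always answer in exactly three bullet points."

followed = 0
for _ in range(10):
    resp = client.chat.completions.create(
        model="gpt-4-turbo",  # assumption: substitute your model
        messages=[
            {"role": "system", "content": instruction + "\n\n" + knowledge},
            {"role": "user", "content": "Summarize the key ideas."},
        ],
    )
    text = resp.choices[0].message.content
    # Crude adherence check: did we get exactly three bullet lines?
    bullets = [l for l in text.splitlines() if l.strip().startswith(("-", "*", "•"))]
    followed += (len(bullets) == 3)

print(f"Instruction followed in {followed}/10 runs")
```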

What do you use instead of that? Is it an open GPT or a Bedrock Agent?

Is this optimization something similar to kernelized attention or an approximation?
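For reference, kernelized (linear) attention, e.g. Katharopoulos et al. (2020), replaces the softmax with a feature map so the matrix products can be reassociated and the n × n matrix never materializes. A minimal NumPy sketch of the idea, not a claim about what OpenAI actually does:

```python
import numpy as np

def linear_attention(Q, K, V):
    # Positive feature map phi(x) = elu(x) + 1, as in "Transformers are RNNs".
    def phi(x):
        return np.where(x > 0, x + 1.0, np.exp(x))

    Qp, Kp = phi(Q), phi(K)          # (n, d) each
    KV = Kp.T @ V                    # (d, d_v): no n x n matrix is ever built
    Z = Qp @ Kp.sum(axis=0)          # per-row normalizer, shape (n,)
    return (Qp @ KV) / Z[:, None]    # O(n * d * d_v) instead of O(n^2 * d)
```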

OpenAI doesn’t document the proprietary changes to their model that make input context cost 1/6 the price of gpt-4-32k and token generation go faster. You only get to see the predictable side effects: instructions and data being ignored.