I think it’s a good idea to keep the attention economy and the principle of glance/process separation in mind.
The attention economy
Models can only “see” so much all at once. While some 120,000 tokens (~100,000 words) can nominally be loaded into the “context”, all of that context is only a candidate for attention. Let’s call the “things” the model can attend to “concepts”.
Depending on the nature of the context (how complicated it is, how repetitive it is, how spatially clustered it is, how convoluted it is, what the content is about, what the response will be about), the number of concepts the model can “keep in mind” varies.
If you have a prompt that can “highlight” a small number of compressible passages (or ideally, just one) in your context that can inform the output, you will typically get excellent results.
If your instruction is kind of vague and everything in the context has some level of applicability to the output, you will often see that the model will simply “miss” or “ignore” things that, in your opinion, it shouldn’t.
TL;DR: If the model has the capacity to “grok-at-a-glance” the concept of a typo, and you only have a handful in an otherwise well-written text, it might perform excellently. If you have 100 typos randomly distributed over a long text, it’s gonna struggle.
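If you want a rough feel for how much of that attention budget a given text occupies, count tokens before you prompt. Here is a minimal sketch, assuming the tiktoken library and its cl100k_base encoding (the file name is just a placeholder):

```python
# Rough token budgeting before prompting.
# Assumes the tiktoken library; cl100k_base is one common encoding,
# swap in whatever matches your model.
import tiktoken

def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Return how many tokens `text` occupies under the given encoding."""
    enc = tiktoken.get_encoding(encoding_name)
    return len(enc.encode(text))

with open("draft.txt", encoding="utf-8") as f:  # hypothetical input file
    draft = f.read()

print(count_tokens(draft), "tokens")  # compare against the model's context window
```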
Glance/Process separation
I briefly alluded to “highlighting” - when you load your context, the model doesn’t really “read” anything. Instead, it just “looks” at the end of your context (your query or instruction, typically) and then pulls in potentially relevant information to generate the next handful of tokens. And every generated token can only be informed by what can be “seen at a glance” in your context.
[Image: a wooden shape sorter cube with various colored geometric blocks fitting into corresponding cutouts]
What color is the star block? We can tell that at a glance.
What shape is the red block?
You could say trapezoid and triangle, but perhaps even you might forget the hexagon in the back.
Models typically (I assume OpenAI’s do too, although we don’t know exactly how they work because they’re not open source) sort of annotate the entire context along a lot of dimensions: roundness, corners, concavity, color, perhaps, in this case. If your “instruction” can be encoded into one particular embedding that recalls one specific thing, then you win.
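To make that intuition concrete (as an analogy only, not a claim about how attention works internally), you can reproduce the effect with embeddings: a sharp instruction lands close to exactly one chunk of your context, while a vague one sits at a similar distance from everything. A rough sketch, assuming the OpenAI Python SDK’s embeddings endpoint; the chunks and queries are stand-ins:

```python
# Analogy only: a crisp instruction embeds close to one specific chunk,
# a vague one is roughly equidistant from all of them.
# Assumes the OpenAI Python SDK (openai>=1.0) and an API key in the environment.
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> list[list[float]]:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

chunks = [
    "the oval in the bottom left",
    "the star in the bottom right",
    "the flower in the top left",
]  # stand-ins for passages in your context
chunk_vecs = embed(chunks)

for query in ("What shape is the blue object in the bottom right corner?",
              "Tell me about the blocks."):
    qv = embed([query])[0]
    scores = sorted((cosine(qv, cv), c) for cv, c in zip(chunk_vecs, chunks))
    print(query, "->", scores[-1])  # the sharp query should win by a clear margin
```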
One strategy to achieve this is to chunk your task so that it can be solved at a glance.
What’s the shape of the blue object in the bottom left corner? Oval
What’s the shape of the blue object in the bottom right corner? Star
What’s the shape of the blue object in the top left corner? Flower
What’s the shape of the blue object in the top right corner? Hallucination.
Now, with proofreading, you typically won’t have to deal with hallucinations because the task is fairly straightforward. That said, you should always give your model an “easy out” - an option to respond in the negative of whatever you’re asking.
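Concretely, for proofreading that might look like splitting the text into small chunks and asking about one chunk at a time, with an explicit “nothing to report” option baked into the prompt. A minimal sketch, assuming the OpenAI Python SDK’s chat completions interface; the model name, chunk size, and prompt wording are all placeholders you’d tune:

```python
# Chunked proofreading with an explicit "easy out".
# Assumes the OpenAI Python SDK (openai>=1.0); model name and chunk size are illustrative.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Proofread the following passage. List every typo or grammatical error "
    "with a suggested fix. If there are none, reply exactly: NO ISSUES.\n\n{chunk}"
)

def chunks_of(text: str, max_chars: int = 1500) -> list[str]:
    """Naive paragraph-based chunking; keep each chunk small enough to grok at a glance."""
    out, buf = [], ""
    for paragraph in text.split("\n\n"):
        if buf and len(buf) + len(paragraph) > max_chars:
            out.append(buf)
            buf = ""
        buf += paragraph + "\n\n"
    if buf:
        out.append(buf)
    return out

def proofread(text: str, model: str = "gpt-4o") -> list[str]:
    findings = []
    for chunk in chunks_of(text):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT.format(chunk=chunk)}],
        )
        answer = resp.choices[0].message.content.strip()
        if answer != "NO ISSUES":  # the "easy out" keeps clean chunks quiet
            findings.append(answer)
    return findings
```

Each call only has to “see” a handful of paragraphs, so the typo concept stays graspable at a glance; stitching the per-chunk findings back together is the “process” part.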
But the TL;DR here is: if it can’t be solved at a glance, you must solve it as a process.
HTH and good luck!