Proofreading a large amount of text (~10,000 words) does not work well

Hello,

I am interested in the proofreading capability of GPT LLM models. While using the gpt-4 model via the API, I noticed that proofreading “small” documents (e.g. <= 5,000 words) works as expected, but for “larger” documents (~10,000 words), proofreading does not work well. Specifically, I wanted to be able to correct a typo in a sentence (e.g. “except” vs “expect”). The gpt-4 model was able to fix this for small documents, but not for large documents. Similar behaviour was observed for chatgpt-4o as well.

Any thoughts on why proofreading does not work well on large documents? Also, any suggestions as to how to fix this for large documents? I guess you can break up a large document into smaller chunks and proofread each chunk separately, but I was wondering if there is a better way.

I used a very simple user prompt: e.g. “Proofread the following text.”

As a side note, if you ask the model to “Check only typos”, then the model can detect typos for large documents as well.

Thank you!

3 Likes

One thing to note is that we pay by the token. From this perspective the solutions are equivalent.

2 Likes

Hi, I think it would be better for your project to use Gemini 1.5 Pro, because that model can work with 2M tokens of context in a chat.

2 Likes

I think it’s a good idea to keep the attention economy and the principle of glance/process separation in mind.

The attention economy

Models can only “see” so much at once. While some 120,000 tokens (~100,000 words) can nominally be loaded into the “context”, all of that context is only a candidate for attention. Let’s call the “things” the model can attend to “concepts”.

Depending on the nature of the context (how complicated it is, how repetitive it is, how spatially clustered it is, how convoluted it is, what the content is about, what the response will be about), you will see a variable number of concepts the model can “keep in mind”.

If you have a prompt that can “highlight” a small number of compressible passages (or ideally, just one) in your context that can inform the output, you will typically get excellent results.

If your instruction is kind of vague and everything in the context has some level of applicability to the output, you will often see that the model will simply “miss” or “ignore” things that, in your opinion, it shouldn’t.

TL;DR: If the model has the capacity to “grok-at-a-glance” the concept of a typo, and you only have a handful in an otherwise well-written text, it might perform excellently. If you have 100 typos randomly distributed over a long text it’s gonna struggle.

Glance/Process separation

I briefly alluded to “highlighting” - when you load your context, the model doesn’t really “read” anything. Instead, it just “looks” at the end of your context (your query, or instruction, typically) and then pulls in potentially relevant information to generate the next handful of tokens. And every generated token can only be informed by what can be “seen” “at a glance” in your context.

[Image: a wooden shape sorter cube with various colored geometric blocks fitting into corresponding cutouts]

What color is the star block? We can tell that at a glance.
What shape is the red block? :person_shrugging: You could say trapezoid and triangle, but perhaps even you might forget the hexagon in the back.

Models typically (I assume OpenAI’s do too, although we don’t know exactly how they work because they’re not open source) sort of annotate the entire context along a lot of dimensions: roundness, corners, concavity, and color, perhaps, in this case. If your “instruction” can be encoded into one particular embedding that recalls one specific thing, then you win.

One strategy to achieve this is to chunk your task so that it can be solved at a glance.

What’s the shape of the blue object in the bottom left corner? Oval
What’s the shape of the blue object in the bottom right corner? Star
What’s the shape of the blue object in the top left corner? Flower
What’s the shape of the blue object in the top right corner? Hallucination.

Now, with proofreading, you typically won’t have to deal with hallucinations because it’s fairly straightforward. That said, you should always give your model an “easy out” - an option to respond in the negative of whatever you’re asking.
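For instance, a minimal sketch of such an “easy out” (the exact wording here is an assumption, not a benchmarked prompt) could look like this:

```python
# Sketch of a proofreading prompt with an explicit "easy out".
# The wording is illustrative only; tune it for your own data.
PROOFREAD_PROMPT = (
    "Proofread the following text and return the corrected version only. "
    "If there is nothing to fix, reply with exactly: NO CHANGES."
)
```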

But the TL;DR here is: if it can’t be solved at a glance, you must solve it as a process.

HTH and good luck!

4 Likes

If that’s just for proofreading, you can split the input text into 3-5 paragraph chunks and process them all in parallel (rough sketch below)… But if you’re doing something else while proofreading, then maybe some more details about your app would help me help you.
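Something like the following sketch, using the OpenAI Python SDK and a thread pool, shows the idea; the chunk size, model name, and prompt wording are assumptions you would tune for your own documents:

```python
# Sketch: split a long document into small chunks and proofread each chunk
# in parallel. Chunk size, model name, and prompt wording are assumptions.
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Proofread the following text and return the corrected version only. "
    "If there is nothing to fix, reply with exactly: NO CHANGES."
)

def chunk_by_paragraphs(text: str, paragraphs_per_chunk: int = 4) -> list[str]:
    """Group consecutive paragraphs (separated by blank lines) into small chunks."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    return [
        "\n\n".join(paragraphs[i : i + paragraphs_per_chunk])
        for i in range(0, len(paragraphs), paragraphs_per_chunk)
    ]

def proofread_chunk(chunk: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: substitute whichever model you use
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": chunk},
        ],
    )
    reply = response.choices[0].message.content
    # Keep the original chunk when the model reports nothing to fix.
    return chunk if reply.strip() == "NO CHANGES" else reply

def proofread_document(text: str) -> str:
    chunks = chunk_by_paragraphs(text)
    with ThreadPoolExecutor(max_workers=8) as pool:
        corrected = list(pool.map(proofread_chunk, chunks))
    return "\n\n".join(corrected)
```

Keeping each chunk to a few paragraphs keeps every request well within the range where the model handles typos reliably, and the “NO CHANGES” convention makes it cheap to see which chunks needed no edits.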

3 Likes