I’m working on a Python script that calls the OpenAI API with gpt-3.5-turbo-1106 to deliver condensed versions of 700-1,000-word text batches. I’ve tried different approaches, but GPT fails to deliver the version I’m looking for. I expect a condensed version of 350-500 words, but the version GPT usually delivers is significantly shorter, at about 10% of the length of the original text. I’ve tested multiple variables, including:
different prompts (e.g. “generate a shorter version”, “regenerate the following text in a shorter form”, “generate a summary of”, “regenerate a version of the following text that’s 300 to 500 words long”)
different token limits (anything from 100 to 4,000)
different temperatures (from 0.1 to 0.9)
It keeps failing to deliver a properly condensed version, while ChatGPT is more successful at these text manipulations.
Any advice on how to solve this?
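Here is roughly what the call looks like (a minimal sketch; the prompt wording and parameter values are representative, not my exact script):

```python
# Minimal sketch of the condensing call (prompt wording and parameter
# values are illustrative, not the exact production script).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def condense(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        temperature=0.3,
        max_tokens=1000,
        messages=[
            {"role": "system", "content": "You condense long texts."},
            {
                "role": "user",
                "content": (
                    "Regenerate a version of the following text that is "
                    "350 to 500 words long:\n\n" + text
                ),
            },
        ],
    )
    return response.choices[0].message.content
```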
Because LLMs operate on tokens rather than whole words, it can be difficult for them to count words correctly. You might try asking for a number of sentences instead, or giving the model a one-shot example with the exact output length you want.
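For example, something along these lines (a hypothetical sketch; the sentence count, example texts, and prompt wording are placeholders, not a tested recipe):

```python
# Sketch: constrain the output by sentence count and give the model a
# one-shot demonstration of the target length. The example texts below
# are placeholders you would fill in with a real pair.
from openai import OpenAI

client = OpenAI()

EXAMPLE_INPUT = "..."    # a ~800-word passage you already have
EXAMPLE_OUTPUT = "..."   # a hand-written ~400-word condensed version


def condense(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        temperature=0.3,
        messages=[
            {
                "role": "system",
                "content": (
                    "Condense the user's text into 20 to 25 sentences, "
                    "keeping all key points."
                ),
            },
            # One-shot demonstration of the desired output length.
            {"role": "user", "content": EXAMPLE_INPUT},
            {"role": "assistant", "content": EXAMPLE_OUTPUT},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content
```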
This seems to be the same issue I described in my post from earlier today: essentially, the new turbo models have gotten lazy, and their responses are too short and missing data in many cases. Interesting to see that this applies not only to extraction but also to summarization.
Can a mod confirm that the team is aware of this issue?
The fine-tuning of the AI model is very much a conscious decision by the OpenAI developers. Summarization is a somewhat unnatural language task that takes a lot of training, and that training includes the output length produced when a “summary” is requested.