Getting 400 Bad Request on GPT-3.5 for long prompts

Hi OpenAI team,

We are using the createChatCompletion endpoint and are experiencing odd behavior with long messages on gpt-3.5-turbo.

Example 1, which gets a 400 Bad Request:
{ messages: [ { role: 'user', content: ’ \n’ + ' You are a Content generation assistant.\n' + ‘\n’ + " Here is the user userInput constant between triple '. \n" + " ‘’‘As temperatures shift, animals move to different regions and pathogens have more opportunities to jump between hosts. Could the next pandemic be fuelled by an unstable climate? When temperatures rise, everything changes and disease arrives. As the thick ice melts and the seas and the air warm, so new life arrives in Arctic waters. Minke, bottlenose, fin and sperm whales are heading north, even as grizzly bears, white-tailed deer, coyotes and other animals and birds expand their range into boreal forests to the south. But the geography of disease is also changing as novel pathogens affecting plants, animals and humans increase their range. New beetles are heading north and devastating Siberian forests, Alaskan mammals are struggling as new ticks arrive and human habitations in northern Norway are infested by new insects. In Alaska, where winter warming has increased by nearly 4C in 60 years, the whole ecosystem is undergoing change. The sea ice is breaking up earlier than it used to, causing changes in the amount of phytoplankton – the minute organisms that drift around in water currents – at the bottom of the food chain. This has knock-on effects on fish and bird populations. Lakes are changing size, marine heat waves are becoming more frequent and more intense, and mammals must seek new food sources. The author Edward Struzik reports a plethora of deadly and debilitating diseases striking reindeer in Scandinavia and Russia, musk oxen on Banks and Victoria islands in Arctic Canada and polar bears and seals off the coast of Alaska. \n" + 'New pathogens are turning up everywhere. They may be strange pests in Malawian maize fields, a novel fungal infection appearing in the ear of a Japanese woman, an unidentified insect killing trees in Russia, or a new bacterium shrivelling the fruit of lemon trees in Florida. I called Daniel Brooks, a bacteriologist at the Harold W Manter Laboratory of Parasitology at the University of Nebraska State Museum. He and his colleagues studies show that with global heating, pathogens are jumping more frequently from one host to another. He argues that climate will soon be fuelling pandemics and we can expect a succession of unpredictable human, animal and plant disease outbreaks. We rarely have an idea where the next [pathogen] will pop up. All we know is that they are better at finding us than we have been finding them. Brooks places climate change and disease in historical and ecological context. When the climate is stable, he says, species tend to be isolated and specialised. But when it is volatile, as it is now becoming, environments change and this creates opportunities for new species to colonise and evolve. \n' + “Throughout the past 15,000 years, advances in civilisation – agriculture, domestication, the flourishing of cities and globalisation – have all been accompanied by increasing disease risk. But never before has the human population been larger, living at such high densities and so hyperconnected. We are now approaching a storm of spiralling disease risks. Past pandemics, he says, largely arose at times of climate shifts that destabilised human populations and forced migration, famines and conflict. They then subsided with the return of climate stability. 
So long as climate change continues to stir up the biosphere, pathogens will continue to move between humans, farmed animals and wildlife. The planet is a minefield of evolutionary accidents waiting to happen. Now we are in a period of accelerating climate activity, we should expect more emerging diseases. It is these emerging diseases, made more likely by climate change, which then exacerbate poverty, famine, drought, conflict and migration. Climate change, he emphasised, does not produce disease itself; rather, it is a multiplier, increasing and speeding up the threat of diseases emerging. When temperatures rise or fall, when the rains are heavier or a drought lasts longer, then the conditions for life change and the insects, bats or ticks that often carry the pathogens of diseases like malaria, Rift Valley fever, cholera and dengue are likely to move geographically. Far from being linked definitively to a single host, as was thought for many years, Brooks and colleagues are showing that viruses and bacteria can and do switch hosts frequently, with the result that new diseases will inevitably emerge or old ones flare up more easily with climate change. They suggest that we have a crisis of emerging infectious disease, with known pathogens becoming more dangerous than before, appearing somewhere new or infecting new hosts, changing both the movement of pathogens and the impact they have on humans and animals. Other work suggests that climate and land use changes over the coming decades will together produce many new opportunities for viruses to jump between species and cross to humans. At the moment, says Colin Carlson at the Center for Global Health Science and Security and the Verena Institute at Georgetown University, some 10,000 virus species have the capacity to infect us, but the vast majority circulate silently and unrecognised in wild mammals. His models simulate how 3,139 species will share viruses and create new spillover risk hotspots. In the warmer, physically degraded world that we are creating, species will naturally converge in new combinations at high altitude, in places of high biodiversity and where human populations are concentrated, most likely in Asia and Africa. Mammals like bats will encounter other mammals for the first time, potentially sharing thousands of viruses. This ecological transition may already be underway, he says. The implications of research like that of Brooks and Carlson are enormous, says Canadian-French ecologist Timothée Poisot. It shows that there isn’t a ‘climate’ future and a ‘pandemics’ future, there’s just a future made of synergistic effects. When it comes to warming, the ecological process goes roughly like this: the temperature rises, which affects rainfall and humidity, wind patterns and the amount of sunshine; this increases the severity of cyclones, floods, droughts and heatwaves, which in turn affect the survival, dispersal and reproduction of pathogens that are linked to outbreaks of cholera, typhoid, filariasis, leptospirosis, malaria, yellow fever and a host of other diseases. Changes in a warming world are throwing up many medical surprises. In 2016, the far past returned to northern Russia with an anthrax outbreak that killed a 12-year-old boy and infected dozens of semi-nomadic reindeer herders and their animals. 
A heatwave with temperatures reaching unprecedented mid-30s Celsius (mid-90s Fahrenheit) had part-melted the permafrost and exposed a reindeer corpse that had died of the disease many years before and been buried barely a metre deep in the frozen ground. Ninety people were taken to hospital. Until then, it had barely occurred to me that a warming climate might disinter dormant but still active pathogens. Viruses and bacteria can live for millennia when frozen, and no one has much idea what else may be entombed in icy soils and glaciers to be released with heating. We have not had long to wait for more evidence. In the past five years, the European Space Agency has found hundreds of ancient microorganisms resistant to antibiotics in the Alaskan permafrost, and Chinese hunters have isolated 33 ancient and previously unknown viruses dating back at least 11,000 years inside Tibetan glaciers and ice samples in Greenland and elsewhere. There will surely be many more similar discoveries.\n” + "''' \n" + ‘\n’ + " You need to generate content based on the provided input between triple '. \n" + ’ \n’ + '\n' + ’ Final Steps:\n’ + ' - Make sure that the generated text is in the same language as the userInput or translate the generated Content in the same language as the userInput.\n' + ’ - Make sure that the generated text does not change the person perspective in which the userInput was written.\n’ + ' - Remove any keywords such as "Title", "Introduction", "Headline", "Description", "Section", "Conclusion", "CTA" from the generated text.\n' + ’ \n’ + ' Return the generated Text \n' + ‘\t\n’ } ], model: 'gpt-3.5-turbo', max_tokens: 2500, n: 1, temperature: 0.7, user: 'xxxxxxxxx'}

The same message without the final-steps instructions in the prompt works OK.
Other, longer texts also fail with 400 Bad Request even after removing the final steps.

Could you help us with this issue, please?

Might be a smart quote problem? I’d check there first.

Many thanks for your answer, PaulBellow. That was my first assumption too, but it is not the root cause. We build all the prompts in the same way, we only have problems with the longest ones, and these longest prompts work with gpt-4.

Hi, did you find a solution for the issues that you posted about?

Please explain the issues you are inquiring about, to provide some assurance you aren’t just a bot filling the forum with nonsense:

  • What symptom are you personally experiencing?
  • What prompting and code are you executing to cause the issue?
  • What have you tried so far to solve your problem?

There have been too many new users here “taking a survey” of old posts.


We have been fine-tuning GPT-3.5 with very good results. When we use it to translate huge files, all of a sudden we get a totally empty response for a given segment; even if we re-prompt or include it in a different payload, it will still return “”.

It works fine with GPT-4-turbo, but we have a lot of plain-vanilla cases where a fine-tuned GPT-3.5 performs perfectly.

Does that make sense?

That certainly does make sense as an issue. You don’t describe getting the 400 Bad Request this topic is about, though, which is caused by malformed messages sent to the AI.

gpt-3.5-turbo has had two completely new versions since this forum topic was made, and it seems the ability of the AI to see a large amount of data at once has decreased.

I think, though, that we need to look at what the AI is writing rather than what it is comprehending.

Is it really writing quotes you don’t expect? Those can be demoted by using the logit_bias API parameter with a negative value (after looking up the actual token numbers for the quote characters, alone and in pairs).
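
As a rough sketch of that logit_bias idea, assuming the openai-node v3 SDK from the first post (the token IDs below are placeholders you would replace after looking them up with a tokenizer such as tiktoken):

// Sketch only: demoting quote tokens with logit_bias (openai-node v3 SDK).
// The token IDs below are PLACEHOLDERS -- look up the real IDs for your
// quote characters, alone and in pairs, before using this.
const { Configuration, OpenAIApi } = require('openai');

const openai = new OpenAIApi(
  new Configuration({ apiKey: process.env.OPENAI_API_KEY })
);

async function main() {
  const response = await openai.createChatCompletion({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: 'Summarize the provided article.' }],
    // logit_bias maps token ID -> bias in [-100, 100]; negative values demote.
    logit_bias: {
      1234: -50, // PLACEHOLDER: token ID for a lone quote character
      5678: -50, // PLACEHOLDER: token ID for a doubled quote
    },
  });
  console.log(response.data.choices[0].message.content);
}

main();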

The 1106 versions in particular were trained with some problems in handling Unicode characters, such as accented Latin-alphabet letters.

However, if you are getting nothing back from the AI, that might also be the AI deciding to emit a “stop sequence” instead of responding. The quality of the fine-tune can affect token generation in that way.

I’d try the same inputs on plain gpt-3.5-turbo-0125 (cheaper) and gpt-3.5-turbo-0613 (more capable), and see whether they also abort the output prematurely, or how close they get to performing the task, adding more system message if it needs alignment.

Unfortunately, there is no way to see the logprobs that would show what else the AI had a chance of producing if all it emits is the “stop”. You can verify that completion_tokens in the usage stats returned by the API is 0 to confirm that you are indeed getting nothing.
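
Something like this harness would run that comparison and check the usage stats in one pass (a sketch only, again assuming the openai-node v3 SDK from the first post; the fine-tune name is a placeholder):

// Sketch only: run one problem segment through several models and report
// completion_tokens, to distinguish "nothing at all" from whitespace output.
const { Configuration, OpenAIApi } = require('openai');

const openai = new OpenAIApi(
  new Configuration({ apiKey: process.env.OPENAI_API_KEY })
);

const models = [
  'gpt-3.5-turbo-0125',                // cheaper
  'gpt-3.5-turbo-0613',                // more capable
  'ft:gpt-3.5-turbo:your-org::abc123', // PLACEHOLDER: your fine-tune
];

async function main() {
  const segment = 'A problem segment that previously came back empty.';
  for (const model of models) {
    const res = await openai.createChatCompletion({
      model,
      messages: [
        { role: 'system', content: 'You translate text faithfully.' },
        { role: 'user', content: segment },
      ],
    });
    const { completion_tokens } = res.data.usage;
    // completion_tokens === 0 means the model emitted its stop sequence
    // immediately -- truly nothing, rather than spaces or quotes.
    console.log(model, '->', completion_tokens, 'completion tokens');
  }
}

main();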

Hope I’ve given you some ideas. With a fine-tune, there is no way for others to replicate what happens for you.

Thank you very much. It gives us some things to test out.

To clarify, we get back something like 50 blank spaces:

"                                                  "

I don’t know if that might be caused by something different, or whether there are other things to do to eliminate the issue?

Bajillion spaces? But not the quotes themselves?

"                                                  "

That can come not just from fine-tuning gone goofy, but also from using the JSON response mode API parameter with poor JSON specifications (which makes the AI either write JSON…or go crazy). It can also come from trying to get the output of a later model via a tool call, which internally uses JSON mode.
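
For reference, JSON mode is the response_format parameter, and it has to be paired with explicit JSON instructions in the messages; here is a sketch against the raw REST endpoint (the model and schema are just examples):

// Sketch only: requesting JSON mode via the raw REST endpoint. JSON mode
// forces valid-JSON output but does NOT invent a schema for you -- without
// clear instructions in the messages the model can emit degenerate output.
async function main() {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'gpt-3.5-turbo-0125',
      response_format: { type: 'json_object' }, // JSON mode
      messages: [
        {
          role: 'system',
          content: 'Reply only with a JSON object: {"translation": "..."}',
        },
        { role: 'user', content: 'Translate to French: good morning' },
      ],
    }),
  });
  const data = await res.json();
  console.log(data.choices[0].message.content);
}

main();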

Sequences of spaces have unique token numbers depending on how many the AI writes at once. You can look at the “bytes” field of the logprobs, see how many spaces each token holds, and then obtain the token numbers to demote, while investigating how close the AI comes to starting the output in a correct manner.
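
Here is a sketch of that inspection, against the raw REST endpoint since the logprobs fields postdate the v3 SDK (the fine-tune name is a placeholder):

// Sketch only: inspect logprobs "bytes" to find space-run tokens. A token
// whose bytes are all 0x20 is a run of N spaces; its token number (looked
// up in a tokenizer) is what you would demote with logit_bias.
async function main() {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'ft:gpt-3.5-turbo:your-org::abc123', // PLACEHOLDER fine-tune
      messages: [{ role: 'user', content: 'A segment that comes back blank.' }],
      logprobs: true,
      top_logprobs: 5,
    }),
  });
  const data = await res.json();
  for (const tok of data.choices[0].logprobs.content) {
    if (tok.bytes && tok.bytes.length && tok.bytes.every((b) => b === 0x20)) {
      console.log(
        `space-run token: ${tok.bytes.length} spaces, logprob ${tok.logprob}`
      );
    }
  }
}

main();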

(Of course you can read the context carefully and see if there’s any reason why the AI would actually think spaces are acceptable as a response…)
