Issue with AI-generated text: Concatenated words without spaces

miryam.ychen · July 14, 2023, 10:53am

I am using openAI 3.5 to generate content. Lately I am getting outputs with a lot of concatenated words with no spaces. Literally some full sentences/paragraphs with all the words concatenated, with no spaces. Plus some grammar problems as well. Does anyone have a solution to similar problems? Thank you in advance.

miryam.ychen · July 14, 2023, 11:01am

it’s been happening in the last few weeks consistently. It started happening maybe 1 out of 10 times, and now it happens pretty much every single time.

smuzani · July 14, 2023, 11:46am

Could you post a sample of what caused it to generate this content? I haven’t seen it do this, but often things like this are related to the conversation.

Does it still happen if you start a new conversation as opposed to continuing an old one?

miryam.ychen · July 14, 2023, 11:56am

Sorry it’s not about chatgpt, which I have not seen this happen neither. The problem I am having is with the content generated using OpenAI API calls. The prompt used is quite simple, for example: You are a professional content writer, write an article about Summer. Use 6 paragraphs with 150 words in each.

EricGT · July 14, 2023, 12:01pm

Kindly ensure that you pay close attention to the category and tags you have selected.

The category Prompting leads us to understand you are talking about a ChatGPT prompt.
The tag chatgpt further enforces this idea.

I would suggest considering making adjustments to both of them in order to better align with your chosen topic.

_j · July 14, 2023, 7:38pm

The solution is likely to specify and reduce the temperature parameter. Temperature can cause non-ideal token generation. Start with 0.0 and work up from there to 0.6 if you need varied outputs to the same inputs. Reduced quality of generation (quantization?) already now gives you a simulation of higher temperature.

A few examples in earlier context is all it takes to cause unbreakable “training” with the way gpt-3.5 is now paying more attention to prior biases.

More training on code and function calling may have also increased the weight given to the next token possibility being a word without the preceding space included in its token, such as functionName that includes camelCase.

daniel.yakubov · July 14, 2023, 8:23pm

On top of the most recent reply about temperature, you may also want to reduce top_p - if it is 1 it can lead to the consideration of highly improbable next tokens in outputs

miryam.ychen · July 17, 2023, 8:26am

Yes, I should have used the API Tag here, but the category is correct, as it is related to prompt engineering.

miryam.ychen · July 17, 2023, 8:30am

Thank you for the info, I have tried with different temp, I do notice that with lower temperature it happens less often, but it still happens. When I started the project with GPT 3.5 (back in April-May) it rarely happened, now it is just constantly happening. However, using the same prompts and keeping all the other parameters constant - changing only the gpt model to gpt-4, this problem disappears… feels like I am forced to upgrade the model now.

miryam.ychen · July 17, 2023, 8:31am

Thank you for the info, I’ll try it

Topic		Replies	Views
Incoherent Phrases, Missing Words, Language Switching API gpt-4	4	983	December 26, 2023
Spacing between generated word API	2	871	February 14, 2022
Open AI APIs responses becoming random Community gpt-4 , api	3	935	April 28, 2024
2 extra spaces at the end of each line in the AI output API	1	704	February 25, 2025
GPT-4o generated gibberish: How to prevent going forward? Bugs api , gpt-4o	5	249	November 26, 2024

Issue with AI-generated text: Concatenated words without spaces

Related topics