Soft Token output limits and worsening performance

I’ve noticed a general trend of poor performance on the pretrained models for anything longer than 750-1k input tokens.
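For reference, here is roughly how I measure the input side (a minimal sketch using tiktoken; the model name and the ~750-1k threshold are just my working examples, not anything official):

```python
# Minimal sketch of how I measure input length before sending a request.
# Assumes the tiktoken package; "gpt-3.5-turbo" is only an example model name.
import tiktoken

def count_tokens(text: str, model: str = "gpt-3.5-turbo") -> int:
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(text))

prompt = "...your long input here..."
print(count_tokens(prompt))  # quality seems to degrade once this passes ~750-1k
```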

This includes a:

Decrease in output quality
Decrease in performance on RAG tasks
Poor-quality outputs for anything over 500 tokens in length
Output instability/volatility

This has also been true of the ChatGPT web interface.

The issues seem consistent across GPT-3.5 and GPT-4.

To be blunt, this issue is extremely detrimental. At the same time that the application has become faster and more responsive, there has been a steep drop in output quality.

I have had API-based tasks mistakenly omit up to 75% of the provided information.
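To put a number on that, this is the kind of rough check I mean (a simplified sketch; the fact list and the exact-substring matching are hypothetical illustrations, not my actual test harness):

```python
# Simplified sketch of how I estimate how much of the provided information
# survives into the output. The facts below are hypothetical examples.
def coverage(facts: list[str], output: str) -> float:
    """Fraction of provided facts that appear verbatim in the model output."""
    if not facts:
        return 1.0
    hits = sum(1 for fact in facts if fact.lower() in output.lower())
    return hits / len(facts)

facts = ["order #1234", "ships 2023-09-01", "net 30 terms"]  # hypothetical inputs
model_output = "...response text from the API call..."
print(f"coverage: {coverage(facts, model_output):.0%}")  # I've seen this land near 25%
```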

It is not acceptable to have a 16k/4k model behave like a 2k/1k model.

I’ve tried dozens of prompts, and the only ones that work with any consistency are injection-type prompts, which are not a reliable option.

This may be a dealbreaker. There are ways around this, like chunking the data and writing tests to check for information loss across different prompts, but they cost time and money.
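For anyone weighing that workaround, this is roughly what the chunking side looks like (a sketch, again assuming tiktoken; the 750-token chunk size and model name are assumptions, not recommendations):

```python
# Rough sketch of the chunking workaround: split the input into token-bounded
# chunks and run each one through its own completion call.
import tiktoken

def chunk_by_tokens(text: str, max_tokens: int = 750,
                    model: str = "gpt-3.5-turbo") -> list[str]:
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(text)
    return [enc.decode(tokens[i:i + max_tokens])
            for i in range(0, len(tokens), max_tokens)]

# The per-chunk outputs then get stitched back together and re-checked for
# information loss with something like the coverage() test above.
```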

Model retraining is expensive, and even my colleagues at large-cap companies have limited ability to fine-tune.