What can be done to avoid the answers generated by the openai-langchain model getting truncated? The total token length is 4097; however, the input alone is over 3000 tokens, so the output (kept below 1000 tokens) either gets cut off or the request fails with an error that the total context length was exceeded.
Is it possible to reduce the input context length by adjusting the chunk size, or are there any other parameters or workarounds possible here?
“openai-langchain model”? There is nothing called that.
You can review the OpenAI models page, and see which models have a larger context length. Most work on the chat completions endpoint, to which langchain can be adapted.
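For instance, here’s a rough sketch of pointing the chain at one of the larger-context chat models. I’m assuming the classic `langchain.chat_models` import path (newer releases expose the same class from `langchain_openai`), and gpt-3.5-turbo-16k is just one example of a larger-context option:

```python
# A minimal sketch, not your exact setup: swap the LLM for a chat model
# with a larger context window so a ~3000-token prompt plus a ~1000-token
# answer fits comfortably.
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(
    model_name="gpt-3.5-turbo-16k",  # ~16k context window vs. 4,097
    max_tokens=1024,                 # still leaves room for a long answer
)
```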
By “openai-langchain model”, what I meant is using OpenAI for embeddings and LangChain for Q&A.
You are still using a language model for generation when you talk about 3000 tokens in / 1000 tokens out. It likely has a name like gpt-3.5-turbo-instruct, which only works on the completions endpoint and may have come from adapting some old code.
Langchain is something to understand thoroughly before using it, even for simple tasks. Like Assistants, it can run iteratively under the hood and empty your account balance.
I’m using gpt-3.5-turbo-instruct. I need lengthy answers, so I’ve set the max_tokens parameter to 1024.
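For reference, this is roughly the configuration I mean, assuming LangChain’s classic completions wrapper (`langchain.llms.OpenAI`):

```python
# A minimal sketch of the setup described above.
# gpt-3.5-turbo-instruct has a 4,097-token window shared by the prompt
# and the completion; max_tokens only caps the completion.
from langchain.llms import OpenAI

llm = OpenAI(
    model_name="gpt-3.5-turbo-instruct",
    max_tokens=1024,
)
```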
Please note that max_tokens is the maximum number of tokens for the output.
This means that with max_tokens = 1024, any response longer than 1024 tokens will be cut off at that limit.
Why not count the input tokens beforehand and keep the input plus the output within 4096 tokens?
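A minimal sketch of that check with tiktoken, using the 4,097-token window you mentioned (I believe gpt-3.5-turbo-instruct uses the cl100k_base encoding, but treat that as my assumption):

```python
# Count prompt tokens and shrink the completion budget so that
# prompt + completion stays inside the 4,097-token window.
import tiktoken

CONTEXT_WINDOW = 4097
DESIRED_OUTPUT = 1024

enc = tiktoken.get_encoding("cl100k_base")  # encoding assumed for gpt-3.5-turbo-instruct

def completion_budget(prompt: str) -> int:
    """Return how many completion tokens still fit after this prompt."""
    prompt_tokens = len(enc.encode(prompt))
    return max(0, min(DESIRED_OUTPUT, CONTEXT_WINDOW - prompt_tokens))
```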
Yes, that’s what I needed to know: how to limit the input context length so the output length is unaffected.
You cannot avoid truncation of the output completion by adjusting the chunk size or other parameters; those only change the input.
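If the goal is just to shrink the input so the full max_tokens budget stays available, the chunking side is where that happens. A rough sketch, assuming a RecursiveCharacterTextSplitter-based retrieval pipeline (the chunk_size, chunk_overlap, and k values are illustrative, not recommendations):

```python
# Smaller chunks and fewer retrieved chunks mean a shorter prompt,
# leaving more of the context window for the answer.
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

docs = [Document(page_content="...your source text...")]  # placeholder

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # characters per chunk
    chunk_overlap=50,
)
chunks = splitter.split_documents(docs)

# When wiring the retriever, also cap how many chunks get injected:
# retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
```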
It’s quick to ask the Langchain bot about Langchain.
Set it up to use gpt-3.5-turbo and ask, “Can I devise a way to make the completion of gpt-3.5-turbo-instruct fit into the specified length?”
I can’t guarantee the results, though.