What can be done to avoid the answers generated by the openai-langchain model getting truncated? The total token length is 4097; however, the input alone is over 3000 tokens, so the output (kept below 1000 tokens) either gets cut off or the request fails with an error that the total context length was exceeded.
Is it possible to reduce the input context length by adjusting the chunk size, or are there any other parameters or workarounds possible here?
“openai-langchain model”? There is nothing called that.
You can review the OpenAI models page, and see which models have a larger context length. Most work on the chat completions endpoint, to which langchain can be adapted.
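For instance, here’s a rough sketch of pointing the chain at one of the larger-context chat models. I’m assuming the classic `langchain.chat_models` import path (newer releases expose the same class from `langchain_openai`), and gpt-3.5-turbo-16k is just one example of a larger-context option:

```python
# A minimal sketch, not your exact setup: swap the LLM for a chat model
# with a larger context window so a ~3000-token prompt plus a ~1000-token
# answer fits comfortably.
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(
    model_name="gpt-3.5-turbo-16k",  # ~16k context window vs. 4,097
    max_tokens=1024,                 # still leaves room for a long answer
)
```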
By “openai-langchain model”, what I meant is using OpenAI for embeddings and LangChain for Q&A.
You are still using a language model for generation when you talk about 3000 tokens in / 1000 tokens out. It likely has a name like gpt-3.5-turbo-instruct, which only works on the completions endpoint and may have come from adapting some old code.
Langchain is something to understand thoroughly before using it, even for simple tasks. Like Assistants, it can run iteratively under the hood and empty your account balance.
I’m using gpt-3.5-turbo-instruct. I need lengthy answers, so I’ve set the max_tokens parameter to 1024.
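For reference, this is roughly the configuration I mean, assuming LangChain’s classic completions wrapper (`langchain.llms.OpenAI`):

```python
# A minimal sketch of the setup described above.
# gpt-3.5-turbo-instruct has a 4,097-token window shared by the prompt
# and the completion; max_tokens only caps the completion.
from langchain.llms import OpenAI

llm = OpenAI(
    model_name="gpt-3.5-turbo-instruct",
    max_tokens=1024,
)
```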
Please note that max_tokens is the maximum number of tokens for the output.
This means that with max_tokens = 1024, any response longer than 1024 tokens will be cut off at that limit.
Why not count the input tokens beforehand and keep the input plus the output within 4096 tokens?
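A minimal sketch of that check with tiktoken, using the 4,097-token window you mentioned (I believe gpt-3.5-turbo-instruct uses the cl100k_base encoding, but treat that as my assumption):

```python
# Count prompt tokens and shrink the completion budget so that
# prompt + completion stays inside the 4,097-token window.
import tiktoken

CONTEXT_WINDOW = 4097
DESIRED_OUTPUT = 1024

enc = tiktoken.get_encoding("cl100k_base")  # encoding assumed for gpt-3.5-turbo-instruct

def completion_budget(prompt: str) -> int:
    """Return how many completion tokens still fit after this prompt."""
    prompt_tokens = len(enc.encode(prompt))
    return max(0, min(DESIRED_OUTPUT, CONTEXT_WINDOW - prompt_tokens))
```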
Yes, that’s what I needed to know: how to limit the input context length so the output length is unaffected.
You cannot avoid truncation of the output completion by adjusting the chunk size or other parameters; those only change the input.
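If the goal is just to shrink the input so the full max_tokens budget stays available, the chunking side is where that happens. A rough sketch, assuming a RecursiveCharacterTextSplitter-based retrieval pipeline (the chunk_size, chunk_overlap, and k values are illustrative, not recommendations):

```python
# Smaller chunks and fewer retrieved chunks mean a shorter prompt,
# leaving more of the context window for the answer.
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

docs = [Document(page_content="...your source text...")]  # placeholder

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # characters per chunk
    chunk_overlap=50,
)
chunks = splitter.split_documents(docs)

# When wiring the retriever, also cap how many chunks get injected:
# retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
```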
It’s quick to ask the Langchain bot about Langchain.
Set it up to use gpt-3.5-turbo and ask, “Can I devise a way to make the completion of gpt-3.5-turbo-instruct fit into the specified length?”
I can’t guarantee the results, though.