How to get past finish_reason: length — max token alternatives

Hello guys, I am trying to print transactions from a PDF in JSON format (I converted the PDF to text with a third-party library). The transactions cover one month. The transactions are printed correctly, but I am getting a truncated response, due to which I am unable to parse the response JSON. The reason is "finish_reason": "length". I read the docs and found that the maximum token length is 4096.
But I need all of the transaction data. The model I am using is gpt-3.5-turbo-0125.

Here is the code for more clarification. Any leads will help a lot.

messages: [
  {
    role: 'system',
    content: 'You assist with converting unstructured data to a structured key-value JSON format from bank statements. The bank statement keys will be date, payment_type, description, paid_out and paid_in. Strictly return the response in JSON without any extra text at the beginning or end.'
  },
  {
    role: 'user',
    content: `List all the transactions from the following bank statement. The response should be in structured JSON format with the keys 'date', 'payment_method', 'description', 'paid_out', and 'paid_in': \n ${bankPdfData.text}`
  }
],
model: 'gpt-3.5-turbo-0125',
temperature: 0.1,

As the maximum token length cannot be changed, you need to make adjustments on your end by splitting the statements into smaller chunks and submit each chunk to the API. You can then aggregate the individual API responses back to arrive at the full list of transactions.
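As a rough sketch of the chunk-and-aggregate approach (the chunking by lines and the helper names here are my own assumptions, not part of your code):

```javascript
// Split the statement text into chunks of at most maxLines lines,
// so each chunk's transactions fit within the model's output limit.
function chunkByLines(text, maxLines) {
  const lines = text.split('\n');
  const chunks = [];
  for (let i = 0; i < lines.length; i += maxLines) {
    chunks.push(lines.slice(i, i + maxLines).join('\n'));
  }
  return chunks;
}

// Merge the per-chunk JSON responses back into one transaction list.
// Assumes each API response is a JSON string containing an array.
function mergeTransactions(responses) {
  return responses.flatMap((r) => JSON.parse(r));
}
```

You would send each chunk through your existing prompt, collect the responses, and pass them to `mergeTransactions` to rebuild the full list.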

So I have to subdivide my input, hit the OpenAI API two or three times, and then spread each response into my array?


I would run the API calls concurrently for greater efficiency and then just combine the results into your final JSON with all transactions.
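The concurrent version might look something like this (`fetchTransactions` stands in for your actual OpenAI API call and is a hypothetical helper):

```javascript
// Fire off all chunk requests at once with Promise.all and merge
// the resulting per-chunk transaction arrays into one flat list.
async function collectAllTransactions(chunks, fetchTransactions) {
  const responses = await Promise.all(chunks.map((c) => fetchTransactions(c)));
  return responses.flat();
}
```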


The maximum size of the response is set by your own API use of the parameter max_tokens.

Set that to 100, and the maximum of the model doesn’t matter, as you have specified your own maximum response; the output is truncated at token 100 with a “length” finish reason.

Set that to 6000, and the maximum of the model matters, and you’ll get an API error.
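In practice you can check for truncation before parsing; this small helper is a sketch based on the Chat Completions response shape:

```javascript
// Returns true when the completion was cut off by the token limit,
// so the caller can retry with a smaller chunk instead of failing
// on JSON.parse of a truncated string.
function isTruncated(completion) {
  return completion.choices[0].finish_reason === 'length';
}
```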

You should not develop an application that relies on more than 600-800 response tokens from the gpt-4-turbo models anyway. The AI has been pretrained to produce output shorter than the maximum, and it will undermine your application by shrinking the language portions and wrapping up the output prematurely. The finish reason is then “stop”: the AI terminated the output itself.

In this particular case I believe gpt-3.5-turbo was used. I don’t think we should generalize the 600-800 response-token rule, as the models can be steered toward producing longer output. That said, I agree that one should be conservative about what’s achievable.