Conversation output too long and cut off in the returned response

Hello,

I’m developing a B2B wellness app that begins by asking users several onboarding questions. Based on their responses, I make an API call to an assistant to generate a customized meal plan. I’ve carefully designed the prompt to return the plan in a structured JSON format, and this part of the process works well.

However, I’m facing an issue with the length of the output. I need to generate a 7-day meal plan with at least 3 meals per day. Each meal includes an array of foods, macronutrients, and a 3-step recipe. As you can imagine, this produces a very long output. When I check the token length of the full plan with the OpenAI tokenizer, it consistently comes in below 5,900 tokens.

Is this length limitation due to the model, the context window, or something in my approach? If my use case needs to be reworked, what would you suggest? I’m open to ways to streamline this process.

Thank you!

Maybe the “finish_reason” field in the response would give you an idea of what went wrong?
https://platform.openai.com/docs/api-reference/chat/object
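
For example, with the Chat Completions endpoint you can inspect it like this (a minimal sketch; the model name and prompt are placeholders, not your actual setup):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

response = client.chat.completions.create(
    model="gpt-4-turbo",  # placeholder; use the model you already call
    messages=[{"role": "user", "content": "Generate a 7-day meal plan as JSON."}],
)

choice = response.choices[0]
if choice.finish_reason == "length":
    # "length" means generation stopped because the output-token limit was hit,
    # so the returned content is truncated
    print("Output was cut off by the token limit.")
```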

Will it be returned by the Assistants API? I’m using the Assistants API.

Hi - in terms of output length, the latest models are generally limited to a maximum of 4,096 output tokens. In practice, however, getting output consistently up to 4k tokens is pretty difficult; scenarios where you reach 1-3k tokens are more likely.

To deal with this, it’s best to break the task down into smaller, manageable steps (e.g. have the Assistant return the meal plan for only 1 or 2 days at a time) and then, if necessary, concatenate the outputs into the full 7-day meal plan at the end.
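
Here’s a rough sketch of that loop-and-merge pattern, shown with the Chat Completions endpoint for brevity (the same idea works with the Assistants API); `generate_day_plan`, the model name, and the prompts are illustrative placeholders:

```python
import json

from openai import OpenAI

client = OpenAI()

def generate_day_plan(day: int) -> dict:
    """Request a single day's plan so each response stays well under the limit."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # placeholder; use whichever model you already call
        response_format={"type": "json_object"},  # keeps the reply valid JSON
        messages=[
            {"role": "system", "content": "You return a one-day meal plan as a JSON object."},
            {"role": "user", "content": f"Day {day}: at least 3 meals, each with foods, "
                                        "macronutrients, and a 3-step recipe."},
        ],
    )
    return json.loads(response.choices[0].message.content)

# Concatenate the per-day outputs into the full 7-day plan.
full_plan = {"days": [generate_day_plan(day) for day in range(1, 8)]}
print(json.dumps(full_plan, indent=2))
```

This trades one large call for seven small ones, but each response comfortably fits within the output-token limit and you can validate each day's JSON before merging.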
