Response limitations on Assistant and possible alternatives for structured data

Hey there,

I’m still a freshman when it comes to the OpenAI API, so I may just be missing something obvious.

I’m trying to build an email parser using n8n and the Assistants API. I got the prompting set up and working fine, and the results look amazing.

Unfortunately, the Assistant is truncating the response due to some limitation (I’m not sure which one). The emails aren’t really huge, and my intended response is structured JSON data. It just contains a small list of products (around 50) with some quantity and meta information. Nothing much.
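For reference, the kind of output I’m after looks roughly like this (the field names and values here are just made-up examples, not my actual schema):

```python
# Hypothetical sketch of the structured output I want back.
# Field names and values are illustrative only; the real emails
# contain around 50 product entries like these.
expected_output = {
    "products": [
        {"sku": "A-1001", "name": "Widget, blue", "quantity": 12},
        {"sku": "B-2040", "name": "Gasket set", "quantity": 3},
        # ... roughly 50 entries in total
    ],
    "meta": {
        "sender": "orders@example.com",
        "subject": "PO #4471",
    },
}
```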

Is there a better way to build something like this without the response limitations? I thought the limits for Assistants would be much higher, so I’m a bit confused about that.

The latest AI models are trained more than ever to limit their output. They get very reluctant after about 700 tokens, almost as if an “end this conversation” signal had been baked into the model at that point.

The end result is that even extensive writing requests tend to come back with individual items made briefer and the output wrapped up prematurely.

You might look at the n8n tool you describe. Most third-party tools are smart enough not to use OpenAI’s Assistants. If it is using Chat Completions, a `max_tokens` setting on the response length can cut off the output mid-sentence.
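As a sketch (assuming the standard Chat Completions response shape), you can detect that kind of cut-off by checking the `finish_reason` on the choice: the API reports `"length"` when the `max_tokens` limit was hit before the model finished.

```python
# Sketch: detecting a max_tokens cut-off in a Chat Completions response.
# Assumes the standard response shape; a finish_reason of "length" means
# the model ran into the max_tokens limit before completing its answer.
def was_truncated(response: dict) -> bool:
    return response["choices"][0]["finish_reason"] == "length"

# Illustrative response fragments (trimmed to the relevant fields):
cut_off = {
    "choices": [
        {"finish_reason": "length",
         "message": {"content": '{"products": [{"name": "Widg'}}
    ]
}
complete = {
    "choices": [
        {"finish_reason": "stop",
         "message": {"content": '{"products": []}'}}
    ]
}
```

If `was_truncated(...)` comes back true, the usual fixes are raising `max_tokens` or retrying with a smaller input chunk.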

Truncation in the middle of output is not a parameter we have available with Assistants. The only thing that can make it stop abruptly is getting the AI to write something that looks copyrighted (and unlike streaming Chat Completions, where you can see where the output was cut off, on Assistants it’s possible you instead get nothing).

There is always a way out… Consider that you can use more than one Assistant if you structure your task(s) so that each Assistant performs a small (but precise) portion of the whole task you planned. For example, you could create one Assistant to do the first third of the task, then compile the pieces somewhere so you can later wrap everything up and paste the entire thing into your output.

I would highly recommend you take a look at “function calling” in order to do this. Functions are a super tool for stepping into the thread to perform things the Assistant still gets confused doing… Of course this is a more complicated way to reach your goal, but I also struggled a bit (and still am, with some long instructions/tasks), and this was the solution I found to work best. Hope you can make it! Cheers!
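To give an idea of what that looks like: below is a minimal sketch of a function-calling tool definition for this kind of product extraction. The function name and fields are illustrative, not from any real spec; the surrounding `{"type": "function", ...}` shape is the standard OpenAI tools format.

```python
# Sketch of a function-calling tool definition for extracting products
# from one chunk of an email. The name "record_products" and the
# property names are hypothetical; adapt them to your own data.
extract_products_tool = {
    "type": "function",
    "function": {
        "name": "record_products",
        "description": "Record the products found in one chunk of the email.",
        "parameters": {
            "type": "object",
            "properties": {
                "products": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "quantity": {"type": "integer"},
                        },
                        "required": ["name", "quantity"],
                    },
                }
            },
            "required": ["products"],
        },
    },
}
```

Because the model fills in the function arguments as JSON matching this schema, each Assistant’s partial result arrives in a structured form you can merge, instead of free text that may get cut off mid-sentence.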