Assistant's truncation/optimization of context window is not good!

I am encountering problems with the assistant API’s automatic optimization of the context window. I am sending 7k tokens consisting of code in the first message and it gets truncated almost in half, so that only the first half is in GPT’s awareness. When I ask anything about the code afterwards, it only knows what’s going on in the first part of the code, and is completely unaware of the remaining code. I know they said they’re using the same optimization for assistants that they use for ChatGPT, but this is not good at all, I expected for the optimization to kick in when the context window is almost full, not when it’s empty (using 128k tokens context btw).

I used chat completion instead of assistants up until now and it seems that I need to go back to it. Unless there’s a way to explicitly set a specific message from a thread to never be truncated or modified…

Do people that use data retrieval where they need to give big chucks of text in order for gpt to answer, face the same problem ?


Are you approaching this prior error threshold in your message?

1 validation error for Request body → content ensure this value has at most 32768 characters (type=value_error.any_str.max_length; limit_value=32768)

The output of the model is limited to 4k tokens, thus there is no getting a response of that size back all at once.

No, it’s about its awareness of the given message with the user role. When it responds it simply says that it doesn’t know of what piece of code I am talking about. But it does seem to know if I ask something about the code that is in the first 3-4k tokens.

BTW, you need to explicitly use gpt-4-1106-preview throughout, as the only common AI model with that 128k context length and a chance of understanding and producing useful output based on it.

If one wanted to reproduce this symptom, are you trying to use or is the AI invoking code interpreter, or are you just asking about the code? Describe exactly the “I am sending 7k tokens consisting of code”

Clarification: is the “code” all message text, or is it a file attachment to a user message?

Attachments are chunked and accessed by the demand-based retrieval function.

Yes, I am using gpt-4-1106-preview .

I have a long string of code like this:

package main

func main() {
	err := godotenv.Load()
	if err != nil {
			"err": err,
		}).Error("Can't load config from .env. Problem with .env, or the server is in production environment.")

	config := config.ApiEnvConfig{
		Env:     strings.ToUpper(os.Getenv("ENV")),
		Port:    os.Getenv("PORT"),
		Version: os.Getenv("VERSION"),

except is 7k tokens long. I create a message with the role of user and instruct the assistant to remember the code since I’ll ask questions on it afterwards (I also added a thorough instruction when creating the assistant, just like in docs). Then, my next message that I add to the thread is something like: “What does the main() function do?”, and it answers it well. Then I ask another question, about a function that is at the end of the code text - “What does the function httpResponse() do ?”, and it answers with, literally: " The functionhttpFunction does not exist in the project codebase you’ve provided. …". I have played with it and found out that it is not aware of anything from the second half of the code, even better, if I ask something about some code that is somewhat in the middle of the whole text, it starts to explain, giving me the first few lines of that piece of code and then says: “The rest of the code has been cut off”, literally not knowing how that specific block of code ends.

My guess is that the optimization algorithm for the context window is triggered by such a long message, and it can’t help itself but truncate it. Also I have not had this problem with chatCompletion

Was this ever solved?? I’m getting the same issue.

Nope. I switched to ChatCompletion. And for other tasks I use Claude altogether.

1 Like

Thanks. I was thinking about both ChatCompletion and Claude.

1 Like