Concerning: Increasing Gibberish Output from Assistants API

This is becoming more and more of an issue: users are contacting me saying there is something wrong with the outputs they're getting from my app. I'm starting to see it myself in testing. I'll get something like this:

Would you like expanded interaction drafts built further against Jane–1. ‘Challenging (Equestrian Motions)’–Over rospy setup post Style-choice/ map modeling?

or even something like:

  • Practical Step: Note where the sample activates locations in your neighborhood. For you, channels likely trigger Section 11 (Community Vision/Networking mode) early/ Are passion solidified equally much yet irrelevant undo narrative therapy recognizable fully integrated João-Max tunes-weight-near stone butalignedfrationsancerslates

The only common thread I can see so far is that it's related to Messages that called file_search in the background, though I'm not 100% sure of that yet.

These failures are not streaming issues on my end, and they can be confirmed by retrieving the messages directly with the Retrieve Message endpoint:

(this is the CLI response)

\n\n---\n\nWould you like expanded interaction drafts built further against Jane–1. ‘Challenging (Equestrian Motions)’–Over rospy setup post Style-choice/ map modeling?",
        "annotations": [
          {
            "type": "file_citation",
            "text": "【8:16†source】",
            "start_index": 247,
            "end_index": 260,
            "file_citation": {
              "file_id": "file-LEJrXYaq4kVcxt3mmJJyQN"
            }
          }
        ]
      }
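For anyone who wants to reproduce that check, here's a minimal sketch using the openai Python SDK (the thread/message IDs are placeholders) that retrieves a message and prints its text plus any file_citation annotations, which also makes it easy to test the file_search correlation mentioned above:

from openai import OpenAI

client = OpenAI()

# Placeholder IDs -- substitute the thread/message you want to inspect.
message = client.beta.threads.messages.retrieve(
    message_id="msg_abc123",
    thread_id="thread_abc123",
)

# A message's content is a list of blocks; text blocks carry the value
# plus any annotations (e.g. file_citation entries left by file_search).
for block in message.content:
    if block.type == "text":
        print(block.text.value)
        for ann in block.text.annotations:
            if ann.type == "file_citation":
                print(ann.text, ann.file_citation.file_id)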

I was hoping there would be some love for the Assistants API yesterday, but alas, it looks like not until next year…but maybe there's a chance to fix this before everyone disappears for the holidays??

Temperature?


1.0

The only reason I bring it up here in the forums is that I haven't changed anything recently that would affect this. These runs have been working consistently for almost a year now, and only recently have I started to see gibberish output myself and receive complaints about it.

What's the model you're using, @Jim?

gpt-4o-2024-11-20

I guess that is the one thing I've updated in the last month or so, but I'm hoping it isn't the culprit, as my users (and I) LOVE this model.

How many tokens are you trying to get out?

With that model (and 4o in general), I've noticed that with really long outputs (4k+ tokens), it occasionally starts to go sideways toward the end.

Is everything mangled/nonsense, or just the end?
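If length is the issue, one cheap test is to cap the run's output and see whether shorter generations still degrade at the end. A sketch under the same SDK assumption (the IDs and the cap value are placeholders):

from openai import OpenAI

client = OpenAI()

# Placeholder IDs; max_completion_tokens caps the run's output so you can
# check whether shorter generations still go sideways at the tail.
run = client.beta.threads.runs.create_and_poll(
    thread_id="thread_abc123",
    assistant_id="asst_abc123",
    max_completion_tokens=1024,
)
print(run.status)  # "incomplete" if the cap was hit before the model finished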


There is the occasional out-of-place word, but it does appear to be more towards the end of the output.

This is the last run it showed up in:

"usage": {
    "prompt_tokens": 76302,
    "completion_tokens": 508,
    "total_tokens": 76810,
    "prompt_token_details": {
      "cached_tokens": 59776
    }
  }
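For reference, that usage block comes off the Run object once the run completes; a minimal sketch of pulling it, again with placeholder IDs:

from openai import OpenAI

client = OpenAI()

# Placeholder IDs -- usage is populated once the run completes.
run = client.beta.threads.runs.retrieve(
    run_id="run_abc123",
    thread_id="thread_abc123",
)
print(run.usage.prompt_tokens, run.usage.completion_tokens, run.usage.total_tokens)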

Yeah, I like that model as well. I would not, however, use it at temperature = 1.

If the task requires accuracy, I prefer a temperature of 0, and 0.33 for useful creativity.
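If you'd rather experiment per run than change the assistant itself, sampling settings (and even the model) can be overridden when the run is created; a sketch with the same placeholder IDs:

from openai import OpenAI

client = OpenAI()

# Placeholder IDs; run-level temperature/model override the assistant's defaults.
run = client.beta.threads.runs.create_and_poll(
    thread_id="thread_abc123",
    assistant_id="asst_abc123",
    model="gpt-4o-2024-11-20",
    temperature=0.33,
)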


Interesting.

This is a creative writing app balanced with some tasks that do require more accurate results.

I'll experiment with these two settings across the spectrum and see what happens (I'll hold off on marking this as the solution until I've got some more concrete data back).

Thanks!


No worries. Please come back and let us know if you get it sorted or not.

0.33 might be too low for creative writing, though; I find myself using the 0.7 to 0.9 range a lot.

Good luck!