Since 2024-Nov-16, Assistants API returning 'server_error'

As initially reported here - and as requested by @vb - I am opening a new thread for an issue with constantly failing runs that had worked with an Assistant until last weekend.

Now, within 3 to 5 seconds, a server error is returned every single time (without exception!) - like:

{
  "id": "run_2upIUIw4aPXmooG1zukz0Jsl",
  "object": "thread.run",
  "created_at": 1731924886,
  "assistant_id": "asst_O2DO4Mt86W149U4C0o6TNMtU",
  "thread_id": "thread_sVTzwtg1H4d7U023XkYc7leG",
  "status": "failed",
  "started_at": 1731924886,
  "expires_at": null,
  "cancelled_at": null,
  "failed_at": 1731924888,
  "completed_at": null,
  "required_action": null,
  "last_error": {
    "code": "server_error",
    "message": "Sorry, something went wrong."
  },
  "model": "gpt-4o-mini",
  "instructions": "You are ...",
  "tools": [
    {
      "type": "file_search",
      "file_search": {
        "ranking_options": {
          "ranker": "default_2024_08_21",
          "score_threshold": 0.0
        }
      }
    }
  ],
  "tool_resources": {},
  "metadata": {},
  "temperature": 0.15,
  "top_p": 1.0,
  "max_completion_tokens": null,
  "max_prompt_tokens": null,
  "truncation_strategy": {
    "type": "auto",
    "last_messages": null
  },
  "incomplete_details": null,
  "usage": {
    "prompt_tokens": 1339,
    "completion_tokens": 28,
    "total_tokens": 1367,
    "prompt_token_details": {
      "cached_tokens": 0
    }
  },
  "response_format": {
    "type": "text"
  },
  "tool_choice": "auto",
  "parallel_tool_calls": true
}

This happens after the following calls succeed:

  1. POST https://api.openai.com/v1/threads
  2. POST https://api.openai.com/v1/threads/thread_sVTzwtg1H4d7U023XkYc7leG/messages
  3. POST https://api.openai.com/v1/threads/thread_sVTzwtg1H4d7U023XkYc7leG/runs

The run call (step 3) initially responds WITHOUT error:
{
  "id": "run_2upIUIw4aPXmooG1zukz0Jsl",
  "object": "thread.run",
  "created_at": 1731924886,
  "assistant_id": "asst_O2DO4Mt86W149U4C0o6TNMtU",
  "thread_id": "thread_sVTzwtg1H4d7U023XkYc7leG",
  "status": "queued",
  "started_at": null,
  "expires_at": 1731925486,
  "cancelled_at": null,
  "failed_at": null,
  "completed_at": null,
  "required_action": null,
  "last_error": null,
  "model": "gpt-4o-mini",
  "instructions": "You are ...",
  "tools": [
    {
      "type": "file_search",
      "file_search": {
        "ranking_options": {
          "ranker": "default_2024_08_21",
          "score_threshold": 0.0
        }
      }
    }
  ],
  "tool_resources": {},
  "metadata": {},
  "temperature": 0.15,
  "top_p": 1.0,
  "max_completion_tokens": null,
  "max_prompt_tokens": null,
  "truncation_strategy": {
    "type": "auto",
    "last_messages": null
  },
  "incomplete_details": null,
  "usage": null,
  "response_format": {
    "type": "text"
  },
  "tool_choice": "auto",
  "parallel_tool_calls": true
}

Then, after polling the run status twice, the error shown above is returned.
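For reproduction, here is a minimal sketch of that exact flow with the Python SDK (non-streaming, simple polling; the assistant ID and message text are placeholders):

# Sketch of the failing flow (openai Python SDK v1.x, non-streaming).
# The assistant ID and user message are placeholders.
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. POST /v1/threads
thread = client.beta.threads.create()

# 2. POST /v1/threads/{thread_id}/messages
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="A project question that should hit the vector store...",
)

# 3. POST /v1/threads/{thread_id}/runs (stream property not set)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id="asst_XXXX")

# Poll the run status; after roughly two polls the run flips to "failed"
# with last_error.code == "server_error".
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

print(run.status, run.last_error)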

Working with the same prompting in the Playground, with the same assistant_id, the assistant provides a response.
But the Playground works with streaming ON, whilst my app does not (the stream property is not set).

I have already tried setting up a new assistant, but the behaviour is the same.

THANK YOU for looking into it! :pray:

Update: it seems that a server-side fix has healed the issue. I cannot see the error occurring currently - I will keep an eye on it and update.

I am facing the same issue. Please could you explain what you mean by “server-side fix”? Did you do something to make that happen?

Thanks in advance!

He’s talking about the error event that is sometimes sent, which gives you a request_id to submit. Unfortunately, as you’ve probably seen, sometimes you get that, and sometimes you just get “Sorry, something went wrong.” It’s totally random.

And don’t get your hopes up once it starts working. My current “fix” is to remove all tools and file_search, which SEEMS to work at the moment - but I’ve been bitten by this before, where I think some change I make on my end fixes things… and then it fails 20 minutes later.

And all of this was working 100% across the board just seven days ago!

I am seeing this now, and it hasn’t been an issue before as far as I am aware. I am using the Playground and a copy of the assistant that has no tools EXCEPT file search turned on. Our production versions of these assistants are all critically hosed at the moment. Just submitted a bug ticket.

I tried the version with just file_search, but that eventually failed on me too.

Spent the day setting up production so that if it fails 3 times in a row, I automatically switch the assistant_id to a “dumbed-down” version – i.e., no tools and no file_search.

It kind of defeats the purpose of using the Assistants API but at least users won’t see errors with every request.

After an hour it resets back to the “best” assistant and starts the whole process over again.
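In outline, the failover logic looks like this (a simplified sketch; the assistant IDs are placeholders, and the run itself happens in my actual app code):

# Simplified failover sketch. Assistant IDs are placeholders; record_result()
# is fed the run object my app gets back from each create-run-and-poll cycle.
import time

BEST = "asst_best_placeholder"          # full tools + file_search
FALLBACK = "asst_fallback_placeholder"  # no tools, no file_search

class AssistantFailover:
    MAX_FAILURES = 3
    RESET_AFTER = 3600  # seconds: retry the "best" assistant after an hour

    def __init__(self):
        self.active = BEST
        self.failures = 0
        self.demoted_at = None

    def pick_assistant(self) -> str:
        # After an hour on the fallback, reset and try the best assistant again.
        if self.demoted_at and time.time() - self.demoted_at > self.RESET_AFTER:
            self.active, self.failures, self.demoted_at = BEST, 0, None
        return self.active

    def record_result(self, run) -> None:
        failed = (run.status == "failed" and run.last_error
                  and run.last_error.code == "server_error")
        if failed:
            self.failures += 1
            if self.failures >= self.MAX_FAILURES and self.active == BEST:
                self.active, self.demoted_at = FALLBACK, time.time()
        else:
            self.failures = 0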

Yeah, I think that is a good course of action. I was similarly thinking of instituting something like this, but man, that really stinks. It’s one thing when the status JSON can trigger a warning, but this seems to be happening “silently” as far as the OpenAI status page is concerned.

One more thing, @jim - I just looked at what you are doing with your app and I love it! Very cool.

Thanks so much, I REALLY appreciate that.

I’ll let you know when it actually starts working again. :laughing:

Update: while the previous "server_error" / "Sorry, something went wrong." issue does not reoccur for me:

I see a more severe but hidden issue: the assigned RAG (vector store, for file_search) is no longer being used for inference with the Assistants API:

  • neither when using playground
  • nor when using the API
  • not even if a dedicated vector store is additionally assigned to the thread (or assigned to override the assistant’s)

Of course, the input and context used were the same as those that previously (before last weekend) did use the vector store (file embeddings).

FYI: it is easy to tell whether file_search was successfully used: the response then contains values in the annotations array inside the message with the assistant role, e.g.:

  "thread_id": "thread_abc123",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": {
        "value": "Hi! How can I help you today?",
        "annotations": []
      }
    }
  ],
  "assistant_id": "asst_abc123",

Now this array is always empty - so file_search was not performed and/or is not working successfully :frowning:
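A quick programmatic check for this (a sketch with the Python SDK; the thread ID is a placeholder):

# Sketch: inspect the newest assistant message for file_search citations.
# The thread ID is a placeholder.
from openai import OpenAI

client = OpenAI()

messages = client.beta.threads.messages.list(thread_id="thread_abc123", limit=10)
for msg in messages.data:  # newest first
    if msg.role != "assistant":
        continue
    for part in msg.content:
        if part.type == "text" and part.text.annotations:
            print("file_search citations present:", part.text.annotations)
        elif part.type == "text":
            print("annotations empty - no file citations in this reply")
    break  # only check the newest assistant message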

After forwarding this issue to OpenAI, I just received a message confirming that a fix has been deployed, and the situation should already be improved.
Please share your feedback here.

I hope this helps!

Thank you for looking into this and for forwarding.
The issue is unchanged:

  • whilst runs no longer fail (that issue has been gone for >24 hrs)
  • file_search is still not using the specified vector store, neither in the Playground nor via the API

“Works for me”.

(something that no AI can answer without more knowledge)

Perhaps provide more information: the age of the vector store, whether recreating another one helps return results, etc.

The AI in an Assistant has a real problem automatically using file_search (or myfiles_browser, as the tool is named for all models except gpt-4o). The tool describes the files as having been uploaded by a user; you have to produce a high-quality re-description to get good automatic searching - even reproducing the tool text yourself in the instructions as an override - or have the user state “I uploaded files to search”.

@_j I know perfectly well what RAG is and how it works, and how a configured vector_store (with attached files) can be attached to an assistant and/or attached to a given thread for a specific run (to override what is configured at the assistant config level).

Not sure what you mean by “works for me”…:

  • do you mean that you currently DO GET assistant responses whose retrieved messages have a populated annotations array?
  • or do you just mean that you know how it works in general, but cannot comment on what changed since this weekend for an existing app/setup?

My post was to report a bug - not to ask a how-to question, nor to ask how to engage with gpt-4o via the API… :wink:

You show an AI saying: Hi! How can I help you today?

As if you just said hello to it.

And then you wonder why the AI didn’t invoke the file search feature?

Or do you mean to show that you get no annotations even for a useful user query, like “search my uploaded files for more OpenAI facts”?

If you are using beta:v2 and file_search, quoting a selection of text with output as annotation is no longer a feature. (There is useless text still left in the tool description for the AI though.)

If you want to see if file search was employed, you would retrieve the run steps of a run.

Or report the stimulus that produces this topic’s server error as a run status on your account - also something where deeper run-step inspection is possible.
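A sketch of that run-step inspection with the Python SDK (thread and run IDs are placeholders):

# Sketch: list the run steps to see whether a file_search tool call was
# actually emitted. Thread and run IDs are placeholders.
from openai import OpenAI

client = OpenAI()

steps = client.beta.threads.runs.steps.list(
    thread_id="thread_abc123",
    run_id="run_abc123",
)
for step in steps.data:
    if step.type == "tool_calls":
        for call in step.step_details.tool_calls:
            print("tool invoked:", call.type)  # "file_search" when the tool ran
    elif step.type == "message_creation":
        print("message created:", step.step_details.message_creation.message_id)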

A stealth update to an AI model - writing garbage to internal tools, to the wrong tools, and to the wrong output methods - may cause an error. Reduce the assistant’s top_p and temperature settings to compensate for an AI of the same name that seems only to spiral downwards in capability.

Thank you, but I know pretty well what I am doing. And I am not sending any “Hi AI…” prompts, but well-defined information from projects that the curated files will have dozens of matches for…

As said, I have used this pattern and app for a while; it worked before, every time.

Now it fails completely, even with manual, faked prompts that match many embeddings from the vector store.

In what way? That the AI never emits to the tool, producing hallucinated garbage? Or do you get an error?

You can file a bug report about tools not being used as instructed or as expected - but that is an AI that is at a complete loss as to how to operate with the provided tool placement. Such a problem report would look like this for gpt-4o (by alias).

First, show everybody how this operates:

gpt-4o tool's guidance - 752+ tokens
## file_search

// Tool for browsing the files uploaded by the user. To use this tool, set the recipient of your message as `to=file_search.msearch`.
// Parts of the documents uploaded by users will be automatically included in the conversation. Only use this tool when the relevant parts don't contain the necessary information to fulfill the user's request.
// Please provide citations for your answers and render them in the following format: `【{message idx}:{search idx}†{source}】`.
// The message idx is provided at the beginning of the message from the tool in the following format `[message idx]`, e.g. [3].
// The search index should be extracted from the search results, e.g. # 【13†Paris†4f4915f6-2a0b-4eb5-85d1-352e00c125bb】refers to the 13th search result, which comes from a document titled "Paris" with ID 4f4915f6-2a0b-4eb5-85d1-352e00c125bb.
// For this example, a valid citation would be ` `.
// All 3 parts of the citation are REQUIRED.

namespace file_search {

// Issues multiple queries to a search over the file(s) uploaded by the user and displays the results.
// You can issue up to five queries to the msearch command at a time. However, you should only issue multiple queries when the user's question needs to be decomposed / rewritten to find different facts.
// In other scenarios, prefer providing a single, well-designed query. Avoid short queries that are extremely broad and will return unrelated results.
// One of the queries MUST be the user's original question, stripped of any extraneous details, e.g. instructions or unnecessary context. However, you must fill in relevant context from the rest of the conversation to make the question complete. E.g. "What was their age?" => "What was Kevin's age?" because the preceding conversation makes it clear that the user is talking about Kevin.
// Here are some examples of how to use the msearch command:
// User: What was the GDP of France and Italy in the 1970s? => {"queries": ["What was the GDP of France and Italy in the 1970s?", "france gdp 1970", "italy gdp 1970"]} # User's question is copied over.
// User: What does the report say about the GPT4 performance on MMLU? => {"queries": ["What does the report say about the GPT4 performance on MMLU?"]}
// User: How can I integrate customer relationship management system with third-party email marketing tools? => {"queries": ["How can I integrate customer relationship management system with third-party email marketing tools?", "customer management system marketing integration"]}
// User: What are the best practices for data security and privacy for our cloud storage services? => {"queries": ["What are the best practices for data security and privacy for our cloud storage services?"]}
// User: What was the average P/E ratio for APPL in Q4 2023? The P/E ratio is calculated by dividing the market value price per share by the company's earnings per share (EPS).  => {"queries": ["What was the average P/E ratio for APPL in Q4 2023?"]} # Instructions are removed from the user's question.
// REMEMBER: One of the queries MUST be the user's original question, stripped of any extraneous details, but with ambiguous references resolved using context from the conversation. It MUST be a complete sentence.

type msearch = (_: {
queries?: string[],
}) => any;

} // namespace file_search

or

tool provided to all other AI models
# Tools

## myfiles_browser

You have the tool `myfiles_browser` with these functions:
`msearch(queries: list[str])` Issues multiple queries to a search over the file(s) uploaded in the current conversation and displays the results.
please render in this format: `【{message idx}†{link text}】`

Tool for browsing the files uploaded by the user.

Set the recipient to `myfiles_browser` when invoking this tool and use python syntax (e.g. msearch(['query'])). "Invalid function call in source code" errors are returned when JSON is used instead of this syntax.

Parts of the documents uploaded by users will be automatically included in the conversation. Only use this tool, when the relevant parts don't contain the necessary information to fulfill the user's request.

Think carefully about how the information you find relates to the user's request. Respond as soon as you find information that clearly answers the request.

You can issue up to five queries to the msearch command at a time. However, you should only issue multiple queries when the user's question needs to be decomposed to find different facts. In other scenarios, prefer providing a single, well-designed query. Avoid single word queries that are extremely broad and will return unrelated results.


Here are some examples of how to use the msearch command:
User: What was the GDP of France and Italy in the 1970s? => msearch(["france gdp 1970", "italy gdp 1970"])
User: What does the report say about the GPT4 performance on MMLU? => msearch(["GPT4 MMLU performance"])
User: How can I integrate customer relationship management system with third-party email marketing tools? => msearch(["customer management system marketing integration"])
User: What are the best practices for data security and privacy for our cloud storage services? => msearch(["cloud storage security and privacy"])

Then, the model failing to follow those instructions and producing fabrication in its output by not emitting a tool-call token:


Enhancement:

An input pattern that would exploit the phrasing of the file_search tool against a reluctant AI - following tool instructions that contain no hint of the uploaded contents, and no indication that the vector store is an AI Assistant’s knowledge skill (and not simply user-uploaded files):

“I’ve uploaded files as additional knowledge. Please consult with them before responding. When using OpenAI’s Assistants API endpoint, how will the AI return particular citations or annotations back to the API developer from the vector store text when it receives file search results after using the internal file search tool?”

Run steps: Invocation

Run steps: ranker results

That’s a demonstration of file_search being called vs not.

Then, my prompt about “my files” being satisfied establishes that I am the uploader of the “file_search” files – complete authority over your documentation.


You can get higher satisfaction by ditching the whole Assistants platform and placing your own injected RAG automatically, based on the user’s input message context (without any multi-turn tool invocation by an AI following someone else’s instructions) - on Chat Completions.
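In outline, that Chat Completions pattern might look like this (a sketch only; the in-memory chunk list and the embedding model choice are my assumptions, not a prescribed stack):

# Sketch of DIY RAG on Chat Completions: embed the user message, rank your
# own chunks, and inject the top matches into the system message yourself.
# The in-memory corpus and model names are illustrative placeholders.
import numpy as np
from openai import OpenAI

client = OpenAI()
EMBED_MODEL = "text-embedding-3-small"

chunks = ["...pre-chunked document text...", "...more chunks..."]  # placeholder corpus
chunk_vecs = np.array(
    [e.embedding for e in client.embeddings.create(model=EMBED_MODEL, input=chunks).data]
)

def answer(user_msg: str, top_k: int = 4) -> str:
    q = np.array(
        client.embeddings.create(model=EMBED_MODEL, input=[user_msg]).data[0].embedding
    )
    # cosine-similarity ranking of all chunks against the user message
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    context = "\n\n".join(chunks[i] for i in sims.argsort()[::-1][:top_k])
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using this knowledge:\n\n{context}"},
            {"role": "user", "content": user_msg},
        ],
    )
    return resp.choices[0].message.content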

Thank you @_j for providing these relevant insights into how file_search is being implemented in the background.

With that in mind, I was at least able to confirm that my files/vector_store are still accessible at all in my Assistant usage.

However, my previous approach, with the prompting embedded in my existing app, hasn’t worked anymore since the weekend. Apparently something has changed in the way the backend interprets the prompting for identifying relevant context and the imperative to leverage file_search.

For now, and while I attempt to revise my app-embedded prompting to ensure that file_search is used:

I have these questions for you, @_j:

  1. Why are the “System instructions” that are configured in a given Assistant NOT USED when a new thread is created through the API for that assistant_id?
    • the “System instructions” are where I was instructing the model on how and when to use my files for any given run…
  2. Why is the file_search tool used as instructed only when I include my “System instructions” content in the user-role message content for a thread…?
    • so far I believed that the “System instructions” are applied to a new thread without them having to be repeated in the user-role message (in the Playground)

The API reference is not exactly clear about what to include in API calls that leverage an existing, configured Assistant (see the sketch after this list):

  • should properties be included only to override the config?
  • is it required to read the assistant config via the API and re-insert each relevant config property into the API calls (e.g. the vector_store reference, although it is already included in the config via the UI)?
  • is the config completely overwritten if only a single property is injected via the API?
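For reference, the calling pattern in question looks like this (a sketch; IDs are placeholders), showing the two places a vector store can be attached:

# Sketch: the two attachment points for a vector store (IDs are placeholders).
from openai import OpenAI

client = OpenAI()

# Option A: rely on the vector store configured on the assistant itself.
thread_a = client.beta.threads.create()

# Option B: additionally attach a dedicated vector store to the thread.
thread_b = client.beta.threads.create(
    tool_resources={"file_search": {"vector_store_ids": ["vs_XXXXXXXX"]}}
)

# The run itself references only the assistant; my understanding is that
# properties left unset here fall back to the assistant config rather than
# overwriting it.
run = client.beta.threads.runs.create(
    thread_id=thread_b.id,
    assistant_id="asst_XXXXXXXX",
)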

Attached is an example showing that not a single file was successfully retrieved and referenced/used when I continue to use my previous prompting.


Assistant “Instructions” take the place of (and are) a system-role message placed at the start of a conversation (still a system message internally). They define the assistant’s behavior, and you also have a field additional_instructions whose text can be appended to them on a per-run basis.
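In call form, that looks something like this (a sketch; IDs and the instruction texts are placeholders):

# Sketch: per-run instruction handling (IDs and texts are placeholders).
from openai import OpenAI

client = OpenAI()

run = client.beta.threads.runs.create(
    thread_id="thread_abc123",
    assistant_id="asst_abc123",
    # instructions=... would REPLACE the assistant's configured
    # "System instructions" for this run only; omit it to keep them.
    # additional_instructions is APPENDED after the assistant's instructions:
    additional_instructions="Always use file_search on the attached vector store before answering.",
)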

When you create a thread, there is no assistant associated with the thread ID. You can choose whichever assistant ID to run the thread against each time you start a run. You can choose whether an expert in Rust programming with new documentation, or Cleopatra, answers the same thread (and an adorable squirrel cartoon Assistant will quickly drop out of character to answer your programming questions anyway, as the model is only good at acting like ChatGPT).

You should just be able to select an assistant and, if the previous conversation isn’t too confusing, get that new behavior with new tools - in theory. The thread maintains tool calls and returns that may no longer be available or representative of the new Assistant, plus other internal state that would make such a mid-conversation switch difficult.

GPT-4o is just terrible at following procedural step-by-step instructions beyond a chat-like initial input - especially with an input context length that can grow beyond the size of the original GPT-4’s context window with just one data return from Assistants’ file search, with documents that are a distraction. You can use the ranker threshold so that fewer chunks are returned and any second question doesn’t have its instructions lost among 16,000 other tokens.
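For example, a sketch of tightening the ranker on an existing assistant (the specific threshold and result cap here are just starting points to tune, not recommendations):

# Sketch: make file_search return fewer, better chunks.
# The assistant ID is a placeholder; threshold and cap are tuning knobs.
from openai import OpenAI

client = OpenAI()

client.beta.assistants.update(
    assistant_id="asst_abc123",  # placeholder
    tools=[{
        "type": "file_search",
        "file_search": {
            "max_num_results": 8,  # fewer than the default chunk count
            "ranking_options": {
                "ranker": "auto",
                "score_threshold": 0.5,  # drop weakly-matching chunks
            },
        },
    }],
)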

You can see that the internal instructions for the file search tools simply are not oriented to the documents being part of built-in knowledge. You’ll basically have to give overriding instructions, now that you know the tool name, and also tell the AI Assistant the contents of the vector store and the requirement to use it, so it can make informed searches.

The model is also the reason the whole ChatGPT GPT idea dropped to “give up on it” quality as soon as OpenAI switched to that model. It is the 1%-computational-cost solution to needing to shut off ChatGPT Plus signups for over a month after the release of Assistants and GPTs. You still have a few variants of gpt-4-turbo to experiment with.