Fluctuating Relevance Scores in OpenAI Vector Store — Bug or Intended Behavior?

Hey everyone,

I’ve recently been running into something strange with OpenAI’s vector store behavior, and I wanted to see if anyone else has noticed it or has insight into what’s going on.

The Issue

When I upload documents to a vector store (using OpenAI’s file_search/vector_store flow), I immediately try querying them with questions that should match directly. But the relevance scores I get back are oddly low — like 0.03, 0.028, or similarly tiny values — even when the document is clearly the best match.

The really weird part? When I ran the exact same query the next day, the score jumped to 0.9+. Then the day after, it was back to 0.03. It’s inconsistent, almost as if some background indexing or reranking process is changing or failing intermittently.

Test Case

To be sure, I ran a controlled test:

  • Created a brand new vector store through the API.
  • Uploaded a single document (a simple text file about a random topic).
  • Asked a clear question that the document directly answers.

Despite it being the only document in the store, I still got a relevance score of around 0.02. But other times, this same test yields 0.99.
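In case it helps anyone reproduce this, here’s roughly the flow I used as a minimal sketch. It assumes a recent openai Python SDK (on older versions the vector store methods live under `client.beta.vector_stores`), and the file name and query text are just placeholders.

```python
# Minimal sketch of the controlled test above.
# Assumes a recent openai Python SDK; on older versions these methods
# live under client.beta.vector_stores instead of client.vector_stores.
from openai import OpenAI

client = OpenAI()

# 1. Create a brand new vector store.
store = client.vector_stores.create(name="score-fluctuation-test")

# 2. Upload a single text file and wait for it to finish processing.
with open("random_topic.txt", "rb") as f:  # placeholder file
    client.vector_stores.files.upload_and_poll(vector_store_id=store.id, file=f)

# 3. Ask a question the document directly answers and print the scores.
results = client.vector_stores.search(
    vector_store_id=store.id,
    query="A clear question the document directly answers",  # placeholder query
)
for r in results.data:
    print(r.filename, r.score)  # sometimes ~0.02, other runs ~0.99
```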

I first started noticing this right after the major OpenAI outage a few days ago (when OpenAI was down for hours). Ever since then, these fluctuations have been frequent. Could something have broken in the underlying embedding or reranking pipeline?

What I’m Wondering

  • Is anyone else seeing this inconsistency in relevance scores from vector store queries?
  • Does OpenAI do any delayed processing or background optimization after initial upload that would affect results?
  • Is this just a glitch post-outage, or is this an intended behavior we should account for?

Any thoughts, similar experiences, or official responses would be super helpful. This is making reliable vector-based retrieval kind of shaky for production apps right now.

Thanks!
— Dominic

7 Likes

We are using the Responses API with file search. We retrieve the scores from the search results to show each score next to its citation when it is output in the response. Today we noticed that these scores have changed dramatically: we used to get scores between 0.5 and 0.9, but now the highest scores we see are around 0.03. What gives? I’ve been scouring the web and these forums to learn what has happened and am coming up empty. Can anyone shed some light on this for us?
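For context, this is roughly how we pull the scores out. It’s a hedged sketch rather than our exact code; the model name, vector store ID, and question are placeholders.

```python
# Rough sketch of reading per-citation scores from a Responses API call
# with file_search. Model name and vector store ID are placeholders.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o-mini",
    input="What does the uploaded document say about this topic?",
    tools=[{"type": "file_search", "vector_store_ids": ["vs_123"]}],
    include=["file_search_call.results"],  # return the search results (with scores) in the output
)

# Each file_search call in the output carries the retrieved chunks and their
# relevance scores; we display the score next to the corresponding citation.
for item in response.output:
    if item.type == "file_search_call" and item.results:
        for result in item.results:
            print(result.filename, result.score)  # used to be 0.5-0.9, now topping out around 0.03
```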

3 Likes

Hi @dlovric2, it’s great to see you back.

Just a quick question: you’re referencing ChatGPT, but are you actually using the API?

3 Likes

Apologies for the confusion - I am referring to the creation and usage of the vector_store through the API. I just edited the post.

2 Likes

I see my post was merged into this one. I’ll be following. For us, the problem started yesterday afternoon, 6/17/2025 around 5:30-6:30p central. Once the low scores started, they stayed that way for us. We didn’t experience the inconsistency the OP reported.

2 Likes

Thank you for your understanding!
I saw the two similar, new reports within a single day and decided to merge the topics before informing staff about your issues.

Hope this will be resolved soon!

4 Likes

Some additional feedback: we are also seeing this issue when using the search function within a vector store. The max score from a recent sample query was 0.0322, while a useful result for our use case would typically be > 0.5. At first I assumed it was something on our end, but after reproducing the issue with a new vector store and freshly uploaded data, I suspect it’s not.
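For reference, the sample query was along these lines (a sketch only; the vector store ID and query text are placeholders):

```python
# Sketch of the sample query against the vector store search endpoint.
# The vector store ID and query are placeholders.
from openai import OpenAI

client = OpenAI()

results = client.vector_stores.search(
    vector_store_id="vs_abc123",
    query="a question that should clearly match the uploaded data",
    max_num_results=5,
)

scores = [r.score for r in results.data]
print(max(scores) if scores else None)  # recently ~0.0322; a useful match for us is usually > 0.5
```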

1 Like

Thanks for sharing this detailed breakdown. If you can send this to [my email], we can triage it to the best-placed team to speak to this.

1 Like

The scoring appears to have returned to normal.

Are there any explanations as to what happened?

1 Like