Vector Store association endpoint error

The file-retrieval issue has been reported in a previous post, but it is one part of a multi-faceted set of vector store problems that have continued for days. Other symptoms are delays in updating server objects on upload (breaking SDK polling), incomplete file listings, file attachments that do not delete, a missing endpoint for a platform site service, and, plainly, Assistants that cannot answer because the file search tool returns an error instead of retrieving information and loading tokens.


I’d like you to carefully read this announcement:

https://openai.com/index/introducing-company-knowledge/

If OpenAI is going to compete directly in your business vertical, why would they continue to provide a quality (yet generic, and unconfigurable as a tool) service for you to wrap? Quality has been degraded across multiple endpoints, with unfortunate timing that correlates directly with that announcement. Is it to disrupt your reputation?


This image is a comparison, as of today, of the benchmarks of two competing AI embeddings models at their full dimensionality (embeddings being the underlying AI power behind a vector store's semantic search). The red one reached GA last month. The orange one was released in January 2024.

MTEB Score, Multilingual, v2 Mean (task)

MRL Dimension  gemini-embedding-001  embeddinggemma-300m  3-large  3-small
3072           68.37                 -                    58.93    -
2048           68.16                 -                    -        -
1536           68.17                 -                    -        54.00
768            67.99                 61.15                -        -
512            67.55                 60.71                ?        -
256            66.19                 59.68                ?        -
128            63.31                 58.23                -        -

OpenAI’s vector store semantic search product employs only 256 Matryoshka dimensions of the 3-large embeddings model, whose score shown above is for the full 3072 dimensions. Against that, you can run open-weight EmbeddingGemma in under 200MB of RAM and be competitive: the Q8_0 quant (768d) scores 60.93. Its knowledge extends to August 2024 instead of September 2021, so it could even classify “ChatGPT” into a similar space as “OpenAI”.
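For anyone unfamiliar with what "256 Matryoshka dimensions" means in practice, here is a minimal sketch of the usual approach: keep the leading components of the full vector and re-normalize to unit length before comparing. The function names and the toy 8-dimension vectors are made up for illustration, and this is not a claim about how OpenAI's vector store does it internally.

```python
import math


def truncate_embedding(vec, dims):
    """Keep the first `dims` components, then rescale to unit length."""
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 8-dimension vectors standing in for real 3072-dimension embeddings
# (hypothetical values, chosen only to make the script runnable).
doc = [0.6, 0.5, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1]
query = [0.5, 0.6, 0.3, 0.4, -0.2, 0.1, 0.2, -0.1]

full_score = cosine(doc, query)
short_score = cosine(truncate_embedding(doc, 4), truncate_embedding(query, 4))
print(f"full: {full_score:.3f}  truncated: {short_score:.3f}")
```

The truncated vectors drop whatever information sat in the trailing components, which is why benchmark scores fall as the dimension count shrinks.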


With no MTEB v2 benchmarks at lower dimensionality, here’s how text-embedding-3-large scales on MTEB v1, as published by OpenAI in 2024, for interpretation:

                    3-large  3-large  3-large
Embedding size      256      1024     3072
Average MTEB score  62.0     64.1     64.52
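To put those MTEB v1 figures in proportion, a quick sketch of the arithmetic: each reduced size's score as a fraction of the full 3072-dimension score. The dictionary simply restates the numbers quoted above, and the helper name is my own. MTEB v1 and v2 are different benchmarks, so treat the ratios as a rough guide rather than something that transfers directly.

```python
# MTEB v1 averages for text-embedding-3-large, as quoted in this post.
mteb_v1_3_large = {256: 62.0, 1024: 64.1, 3072: 64.52}


def retained_fraction(score, full_score):
    """Fraction of the full-dimension score kept at a reduced size."""
    return score / full_score


full = mteb_v1_3_large[3072]
for dims, score in sorted(mteb_v1_3_large.items()):
    share = retained_fraction(score, full)
    print(f"{dims:>5}d: {score:.2f} ({share:.1%} of the 3072d score)")
```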


Update: I notice that the AI model used for vector stores has been downgraded from its earlier “256 dimensions of 3-large” (along with an upgrade to a per-use fee). The documentation now says text-embedding-3-small (and doesn’t state any dimension reduction).