I’m using text-embedding-3-large in batch on about 2000 legal documents from a court case. About 700 of my embedding requests simply return “no answers” (specifically, they complete successfully with no errors but return no results).
First - this is a bug: if something went wrong, the API should report what it was, so we are informed. Some of the rejected batches come from OCR failures that produced total gibberish, so an empty or error response would be understandable there, but a success with a missing response is still sub-optimal.
Second - I notice many of the failing embedding requests relate to chemical analysis of retail cosmetic and dietary products, which makes me wonder if the embeddings system has some kind of anti-drugs censorship. If so, is there a mechanism to request a bypass of those restrictions, or to report these false positives?
Finally - this might just be a repeatable bug with the API? The same set of 700 embeddings seems to always “not work”, even though I’ve re-submitted them many times. @OpenAI - if you’re reading this - it might be worth looking at how many API requests your system is failing to resolve; maybe there’s some simple code error at your side which is dropping “edge case” results instead of responding properly?
Most of the requests are confidential. I will try to find one which consistently fails, which I can share, after I run my 8th retry today.
I don’t quite understand: Are you not getting vector store results from an embeddings request? Are they not providing the same quality of semantic similarity?
The embeddings endpoint shouldn’t have a moderation filter - in fact, a moderation classifier is the kind of thing you can build with embeddings.
What you get from embeddings is a vector, which you then compare by cosine similarity or dot product; it is never “no answers”.
Nothing but vectors returned:
== Cosine similarity comparisons ==
0:" ### Comprehensive Analytical Report for “Blue Crystal” Meth Product Objective: To establish the purity by detectin" -
match score: fp32: 1.0000 / fp8: 1.0000 / int8: 1.0000
1:" Retention Time (min) Compound Name Peak Area (mAU*min) 0 1.2 Ethanol 1200 1 2.5 Glycero" -
match score: fp32: 0.5508 / fp8: 0.5457 / int8: 0.5455
2:" ### Product Name: “Lumina Glow Facial Serum” ### Instrumentation: - HPLC System: Agilent 1290 Infinity II - **Detect" -
match score: fp32: 0.6825 / fp8: 0.6760 / int8: 0.6769
3:" Methods for obtaining bulk chemical precursors to opiate production in clandestine manners are listed here, including ev" -
match score: fp32: 0.3861 / fp8: 0.3798 / int8: 0.3803
4:" I’m sorry, but that is a forbidden illicit protocol." -
match score: fp32: 0.1807 / fp8: 0.1710 / int8: 0.1738
You can see #4 isn’t similar to the first text; a classifier needs reference data to classify against, like the other texts here.
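For anyone wanting to reproduce that kind of comparison, here is a minimal sketch of the scoring step in plain Python. It assumes you already have the vectors as lists of floats; the function names are mine, not part of any library. For unit-length vectors the cosine similarity and the dot product are the same number.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, candidate_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v))
              for i, v in enumerate(candidate_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

A score near 1.0 means the texts are close in meaning; an unrelated refusal message like #4 lands much lower.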
I was thinking about how you send your requests. Up to 2048 input texts can be sent in one embeddings request, but you probably don’t want to push the limits when you are trying to diagnose API failures.
It seems your code needs to be more resilient: map inputs to outputs when populating a database, keep going past individual failures instead of losing the whole run, and split larger requests into sub-requests if a multi-input request is denied. Re-running whole batches is not the way.
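A rough sketch of what I mean, not a drop-in implementation. `embed_fn` is a hypothetical wrapper around whatever client call you use: it takes a list of strings and returns one vector per string in the same order, or raises. The function name and `max_batch` default are also just illustrative. A response with the wrong number of vectors is treated as a failure, which covers the "success with nothing in it" case.

```python
def embed_resilient(texts, embed_fn, max_batch=100):
    """Embed texts, keeping every result keyed by its input index.

    Returns (results, failures):
      results  maps input index -> vector
      failures maps input index -> error text, for inputs that still
               failed when sent entirely on their own
    """
    results, failures = {}, {}

    def run(indices):
        try:
            vectors = embed_fn([texts[i] for i in indices])
            # A "successful" call with missing vectors counts as a failure.
            if len(vectors) != len(indices):
                raise ValueError(
                    f"expected {len(indices)} vectors, got {len(vectors)}")
        except Exception as exc:
            if len(indices) == 1:
                # Isolated the culprit: record it and move on.
                failures[indices[0]] = repr(exc)
                return
            # Split the failed request in half and try each half separately.
            mid = len(indices) // 2
            run(indices[:mid])
            run(indices[mid:])
            return
        for i, vec in zip(indices, vectors):
            results[i] = vec

    all_indices = list(range(len(texts)))
    for start in range(0, len(all_indices), max_batch):
        run(all_indices[start:start + max_batch])
    return results, failures
```

The point of the design is that one bad document costs you a few extra small requests rather than a whole batch, and `failures` tells you exactly which inputs to inspect instead of leaving you to re-submit everything.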
OpenAI services have just been flaky for the past two days since going down. For example, a fine-tuned model took 30 seconds to respond right after it had taken one second.
Sorry everyone - turns out an assortment of code bugs, network issues (OpenAI going down, and my home internet also going down…), and token limits conspired to confuse me. After re-running everything a few more times, all the “small” requests did finally resolve.
And it turns out that all the “large” ones were failing because a bug in my code let some inputs slip through with more than 8191 tokens. Oops!
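In case it helps anyone else, this is roughly the pre-flight check I should have had, as a sketch only. `count_tokens` is a hypothetical callable you supply; I am assuming a real tokenizer would go there (for example, the length of the token list from tiktoken's `cl100k_base` encoding), and the function name is made up for illustration.

```python
MAX_INPUT_TOKENS = 8191  # per-input limit I ran into

def split_by_token_limit(texts, count_tokens, limit=MAX_INPUT_TOKENS):
    """Separate inputs that are safe to send from those that are not.

    count_tokens is a hypothetical callable: str -> int.
    Returns (ok, rejected): ok is a list of input indices, rejected is
    a list of (index, reason) pairs for empty or oversized inputs.
    """
    ok, rejected = [], []
    for i, text in enumerate(texts):
        n = count_tokens(text)
        if n == 0:
            rejected.append((i, "empty input"))
        elif n > limit:
            rejected.append((i, f"{n} tokens exceeds limit of {limit}"))
        else:
            ok.append(i)
    return ok, rejected
```

Running this before submitting means oversized documents show up with a reason attached, so they can be chunked or truncated deliberately instead of failing later.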