We’ve got an AI chatbot built using OpenAI, and we’re currently using text-embedding-ada-002 as our embeddings model. A couple of days ago a much better embeddings model was released. The reason I was particularly interested is that, among other things, it can reduce the number of dimensions from 1,536 to as few as 512.
For us, reducing dimensions would be very valuable, since we’re running our own SQLite-based database adapter with a vector similarity search (VSS) plugin built on FAISS. Reducing the number of dimensions from 1,536 to 512 would significantly shrink the memory footprint, allowing us to handle much more data without adding more RAM.
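Some back-of-the-envelope math on why this matters, assuming float32 vectors (4 bytes per dimension), which is what a flat FAISS index stores; index and SQLite overhead come on top of this:

```python
# Rough memory footprint of a flat (exact) vector index:
# one float32 (4 bytes) per dimension per record.

def index_bytes(records: int, dims: int, bytes_per_float: int = 4) -> int:
    return records * dims * bytes_per_float

# ada-002 produces 1,536-dim vectors; the new models can be shortened to 512.
print(index_bytes(10_000, 1536) / 1024**2)  # ~58.6 MB
print(index_bytes(10_000, 512) / 1024**2)   # ~19.5 MB
```

So at 10,000 records the raw vector data alone drops by roughly a factor of three.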
However, is it really better, or at least equally good?
I would love to hear from somebody having practical experience with this …
Thank you for the great answer, although it didn’t really answer my question. I see this part: “So my preliminary conclusion is that when you can afford it performance-wise, using all the available dimensions is still very favorable”, which I guess concludes no.
Our industry is AI chatbots for customer support and sales navigation/suggestions. Most of our clients have 500 or fewer records; a few have up to 10,000 or 20,000, but those are the edge cases. We still need to support them, though.
We typically extract multiple records as we create the context we send to OpenAI. I’d love an answer along the lines of “No, don’t change” or “Yes, change, the quality is still (almost) the same”.
You’re running into RAM issues with 500 records? That’s like 10 megabytes, give or take, with 2,000 dims.
Nope, but we’re running into RAM issues once we start seeing 10,000+ records. We’re deploying into Kubernetes, and we’re trying to be as conservative as possible with resources, so the default deployment uses 400 MB of RAM. I’d love to be able to use that amount of RAM for 30,000+ records, but this is not possible now …
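To make the 400 MB budget concrete, here is a quick capacity estimate, assuming float32 vectors and (hypothetically) reserving half the pod for the application and SQLite itself:

```python
# How many records fit in a given RAM budget for the raw vectors alone,
# assuming float32 (4 bytes per dimension). The 200 MB split is an
# illustrative assumption, not a measured figure.

def max_records(budget_bytes: int, dims: int, bytes_per_float: int = 4) -> int:
    return budget_bytes // (dims * bytes_per_float)

budget = 200 * 1024**2  # 200 MB of the 400 MB pod reserved for vectors
print(max_records(budget, 1536))  # ~34,000 records at 1,536 dims
print(max_records(budget, 512))   # ~102,000 records at 512 dims
```

Under those assumptions the 30,000+ record target only fits comfortably at the reduced dimension count.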
Without training and/or smaller embeddings, this is not possible, unfortunately. If the new 512-dimension embeddings give us a 1 to 3 percent quality loss, I wouldn’t mind that much. But if it fundamentally changes the lookups into our DB for the worse, that would be a big no-no …
Note that we’re using SQLite, so the application itself shares RAM with the database …
Single deployment/single POD, single process deployment, containing “everything” …