Offline Embedding Options

I’m seeking advice from the community on any options they might be aware of for generating embeddings without needing to call a cloud service. This is for Vectra, my local vector DB project, and is related to a question I got from a user. It looks like TensorFlow might be an option, but I’m wondering if there are other options, and if anyone in the community can comment on the quality of the TensorFlow embeddings for semantic search compared to either OpenAI’s or HF’s embeddings?

Offline embeddings are interesting not only as a cost-savings measure but also for search over private data, where you don’t want any data leaking to an external cloud.

I saw this posted a while back:


The issue you will hit, though, is that it requires a reasonable amount of compute. Hosting models is also not trivial: you need monitoring, an API, etc.


Look at this:


Considering the current cost of embeddings, I think the depreciation on the hardware you’d run local ones on would be greater than the cost incurred by using the API.

Agreed. Especially in the cloud, it’s way more expensive to use an EC2 instance with a GPU. :smiling_face_with_tear:

If I am using an embedding approach like this:

what am I doing here? Am I hitting the OpenAI API or not? Because in the code I have not specified any endpoint, just the model.

Can someone please explain?

Yes, you will be hitting the ada endpoint when you embed if you specify text-embedding-ada-002.
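To make the "no endpoint in my code" part concrete: the client library fills the URL in for you from the model name. Here is a rough sketch of what happens under the hood, using only the standard library (the real SDK adds retries, error types, etc.; an `OPENAI_API_KEY` environment variable is assumed to hold your key):

```python
# Sketch of the HTTP request the OpenAI client makes for you. The model
# name is just a field in the request body; the endpoint URL is supplied
# by the library, which is why you never see it in your own code.
import json
import os
import urllib.request

def embed(text: str) -> list:
    req = urllib.request.Request(
        "https://api.openai.com/v1/embeddings",  # the implicit endpoint
        data=json.dumps(
            {"model": "text-embedding-ada-002", "input": text}
        ).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        # ada-002 returns a 1536-dimensional float vector
        return json.load(resp)["data"][0]["embedding"]
```

So yes: every call goes over the network to OpenAI; nothing runs locally except tokenization helpers.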

Suppose I have question-answer pairs as text data, and now I want to convert them to embeddings like this:

embedding_model = "text-embedding-ada-002"
embedding_encoding = "cl100k_base"

What am I doing here? If I am using cl100k_base, does that mean I am hitting the ada endpoint to convert the text data into embeddings?

And one other question: can I save the embedded data in SQL Server or not?

And how would the query search work? Because my embedded data has the answers in vector form, but the question is in text.
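On the query-search part of the question above: at query time you embed the incoming question with the same model you used for the answers, then rank the stored vectors by cosine similarity. A minimal sketch of that pattern, using SQLite as a stand-in for SQL Server and tiny made-up 3-d vectors in place of real 1536-d ada-002 embeddings (all names here are illustrative):

```python
# Store each answer's embedding as JSON text in an ordinary SQL column,
# then scan and rank by cosine similarity in application code.
import json
import math
import sqlite3

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (text TEXT, embedding TEXT)")

# Toy 3-d vectors standing in for real embeddings from the same model.
rows = [
    ("Paris is the capital of France.", [0.9, 0.1, 0.0]),
    ("Mitochondria are the powerhouse of the cell.", [0.0, 0.2, 0.9]),
]
for text, vec in rows:
    conn.execute("INSERT INTO answers VALUES (?, ?)", (text, json.dumps(vec)))

def search(question_embedding, top_k=1):
    # Embed the question first (not shown), then compare against every row.
    scored = [
        (cosine_similarity(question_embedding, json.loads(emb)), text)
        for text, emb in conn.execute("SELECT text, embedding FROM answers")
    ]
    return sorted(scored, reverse=True)[:top_k]

# Pretend this is the embedding of "What is the capital of France?"
best = search([0.8, 0.2, 0.1])
```

A full table scan like this is fine for small datasets; past a few hundred thousand rows you would want a vector index (which is what Vectra and dedicated vector DBs provide).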

Try Sentence Transformers.

You might want to start with one of the many pretrained models, e.g. all-MiniLM-L6-v2, which is lightweight (just ~80 MB), fast, and yields good results.
