About the usage of ChatGPT Embedding

klcogluberk · March 9, 2023, 12:36pm

Similar to the example in the link, I’m using embeddings with the gpt-3.5-turbo model. I created the embeddings and they work, but every user query is run through the embeddings. For example, when the query is “How are you”, the model can’t answer it properly because the query has nothing to do with the embedded content. For a query like “How are you”, the model should respond in the standard way, without the query being sent to the embeddings. How can I ensure that only appropriate queries are answered with embeddings?

wfhbrian · March 9, 2023, 2:02pm

This is where “stacking” AI components becomes useful.

You’re already stacking ChatGPT and Embeddings. The next model in your stack could be a classifier that determines whether your system should query the Embeddings or just answer as ChatGPT.
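
One simple stand-in for such a classifier is a similarity threshold: if no stored chunk is close enough to the query, skip retrieval and let the chat model answer on its own. The sketch below is only an illustration of that idea, not a trained classifier. The `route` function, the toy 3-dimensional vectors, and the 0.8 threshold are all assumptions; real embedding vectors are much longer and the threshold would need tuning on your own data.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def route(query_vec, doc_vecs, threshold=0.8):
    """Return "embeddings" if any stored chunk is similar enough, else "chat".

    The threshold is a placeholder value, not a recommended setting.
    """
    best = max((cosine(query_vec, d) for d in doc_vecs), default=0.0)
    return "embeddings" if best >= threshold else "chat"

# Toy vectors stand in for real embedding vectors of the stored chunks.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

print(route([0.9, 0.1, 0.0], docs))  # query close to the first chunk
print(route([0.0, 0.0, 1.0], docs))  # query unrelated to every chunk
```

A query routed to "chat" would go to the model without any retrieved context, which is the behaviour asked for with “How are you”.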

klcogluberk · March 9, 2023, 2:10pm

Is there a guide or documentation you can share that shows how to do what you describe?

d.rojas53 · March 25, 2023, 6:33pm

Guys, a quick question. I have a very large inventory database (15,000+ products, each with 100+ tags or characteristics). I’m trying to chunk the DB per product tag (e.g. price), but once a chunk is retrieved by cosine similarity it is too big to put into the prompt. What can I do in this situation? If I split the tag information of the 10K products into multiple chunks, then for a query about a price analysis that covers all products, every chunk would be relevant, all of them would be brought into the prompt, and they would exceed the maximum of 4092 tokens by far. What do you recommend?

anon10827405 · March 25, 2023, 6:47pm

You may want to consider using bag-of-words alongside semantic embeddings as your dataset is mainly keyword-focused.

For some reason I can’t wrap my head around your question. It’s not wrong, I just am not too experienced in dealing with large databases like this.

Why are you not embedding each individual item and then using metadata for sorting? If I understand correctly, you have bunched your products together by their tag?
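
A rough sketch of what per-item embeddings with metadata could look like: each product gets its own vector, structured fields such as price stay as plain metadata, and a filter narrows the candidates before the similarity ranking. The records, the 2-dimensional vectors, and the `search` helper are made up for illustration and are not taken from any particular vector database.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical records: one embedding per product, metadata kept alongside.
items = [
    {"name": "desk lamp", "price": 35.0, "vec": [0.9, 0.1]},
    {"name": "floor lamp", "price": 80.0, "vec": [0.8, 0.2]},
    {"name": "office chair", "price": 120.0, "vec": [0.1, 0.9]},
]

def search(query_vec, items, min_price=None, top_k=2):
    """Filter on metadata first, then rank the survivors by similarity."""
    candidates = [it for it in items
                  if min_price is None or it["price"] >= min_price]
    ranked = sorted(candidates,
                    key=lambda it: cosine(query_vec, it["vec"]),
                    reverse=True)
    return [it["name"] for it in ranked[:top_k]]

print(search([1.0, 0.0], items))                # similarity only
print(search([1.0, 0.0], items, min_price=50))  # price filter, then similarity
```

Because `top_k` caps how many items come back, the amount of text going into the prompt stays bounded no matter how many products match the filter.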

d.rojas53 · March 25, 2023, 7:01pm

For example, if the question is “name me a list of all the products with a price of US$50 or more” and more than 4,000 products have that characteristic, the relevant embeddings would bring the names of those 4,000 products into the prompt, exceeding the token limit of 4,097 by far. So the problem is that if the DB is big enough, the relevant results brought into the prompt will exceed the token limit.

d.rojas53 · March 25, 2023, 7:04pm

Exactly, I created the chunks that are finally embedded by separating the products by characteristics (types).

anon10827405 · March 25, 2023, 7:09pm

I see. Where does this information come from? Some sort of database?
What will you do when the price changes? Update the embedding chunk?

Have you considered trying to directly connect the database which stores and updates these values?
Not only would it cut out the middle-man, it would mean that you don’t need to update, and re-embed whole chunks of data each time something is changed.
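
As a minimal sketch of the direct-database idea, assuming the inventory lives in a SQL table: a question like “products priced at US$50 or more” can be answered with an aggregate query, so only a short summary goes into the prompt instead of thousands of product names. The `products` table, its columns, and the `price_summary` helper are hypothetical, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3

# In-memory SQLite database standing in for the real inventory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("desk lamp", 35.0), ("floor lamp", 80.0), ("office chair", 120.0)],
)

def price_summary(conn, min_price):
    """Aggregate in the database so the prompt only receives a few numbers."""
    count, low, high, avg = conn.execute(
        "SELECT COUNT(*), MIN(price), MAX(price), AVG(price) "
        "FROM products WHERE price >= ?",
        (min_price,),
    ).fetchone()
    return {"count": count, "min": low, "max": high, "avg": avg}

summary = price_summary(conn, 50)
# A short text like this is what would be placed into the prompt.
prompt_context = (
    f"{summary['count']} products cost US$50 or more "
    f"(min {summary['min']}, max {summary['max']}, average {summary['avg']})."
)
print(prompt_context)
```

The summary stays the same size whether 3 or 4,000 products match, and a price change only needs an update in the database, with nothing to re-embed.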

If there is a database that already exists, and is relatively clean and not too nested, you may want to try:

Regarding your embeddings, I would think the best option would be to either reduce the chunk size, or possibly you could use bag-of-words to find the exact product once it’s retrieved.

Also, ChatGPT plugins might be exactly what you’re looking for:

jay9 · August 18, 2023, 9:24pm

Are you suggesting that he doesn’t use embeddings or a vector DB at all? I’m curious what the tradeoffs would be in this case.

anon10827405 · August 18, 2023, 10:26pm

It really depends on what your use-case is. I think using embeddings as a pointer is great. Obviously if you have data that changes constantly then storing and updating the data isn’t ideal and also breaks the single source of truth principle if you already are using a database.
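
To illustrate the “embeddings as a pointer” idea, here is a small sketch under made-up data: the vector index stores only an ID next to each vector, and the actual product details are read from the database at query time. The `index` list, the `database` dict (a stand-in for a real database), and the `lookup` helper are all assumptions for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# The vector index holds only IDs and vectors (toy values), no product data.
index = [("sku-1", [0.9, 0.1]), ("sku-2", [0.1, 0.9])]

# Stand-in for the live database, which stays the single source of truth.
database = {
    "sku-1": {"name": "desk lamp", "price": 35.0},
    "sku-2": {"name": "office chair", "price": 120.0},
}

def lookup(query_vec):
    """Find the nearest vector, then follow its ID into the database."""
    best_id, _ = max(index, key=lambda entry: cosine(query_vec, entry[1]))
    return database[best_id]

# A price change touches only the database; the index is left untouched.
database["sku-1"]["price"] = 29.0
print(lookup([1.0, 0.0]))
```

The tradeoff is an extra database read per query in exchange for never serving a stale copy of the data from the vector store.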

Weaviate has been making some moves though. What’s your use-case?
