Can we improve the embedded data?

Klassdev · August 8, 2023, 4:46pm

Hi There,

I have a requirement to give my business users a way to validate and correct the embedded data through a user interface.

Use case:
As an admin of the system, I ask the model a question that is answered from the embedded data (similarity search). If I find that the answer is not right, I would like an option to correct that answer and save it.

Expected result:

The saved question and answer should be written back to the embeddings.
Whenever the same question is asked again, the saved answer takes priority over the embedded material.

I know this can be done with fine-tuning, but since I am using embeddings here, I am looking for a solution built around embeddings only.

Looking forward to hearing from all the experts here.

Thanks

jochenschultz · August 8, 2023, 5:49pm

Maybe you can do some manipulations on the embedded data and then embed it again on a fresh model. What kind of data is it?

curt.kennedy · August 8, 2023, 6:54pm

You can train another neural network on your embeddings to map it to the correct answer … but lots of work to get there.

Otherwise, look into keyword-based correlation algorithms. Embeddings capture meaning, and if there are a bunch of keywords without meaning, you need to use a keyword-based algorithm instead.

Also, if the user input is super vague, you might need a fine-tune to intercept this and ask the user to be more specific so your embedding (or keyword) search is meaningful.

A hybrid of embedding and keyword search is also an option. Just need more details on what you are searching over. Also chunk sizes, etc.
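
To make the hybrid idea concrete, here is a minimal sketch of blending an embedding score with a keyword score. Everything in it is an assumption for illustration: the three-dimensional vectors are made-up stand-ins for real embeddings, `keyword_overlap` is a deliberately simple token-overlap measure (a real system would more likely use something like BM25), and the `alpha` weight of 0.5 is arbitrary.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_overlap(query, text):
    """Fraction of query tokens that also appear in the text."""
    q = set(re.findall(r"\w+", query.lower()))
    t = set(re.findall(r"\w+", text.lower()))
    return len(q & t) / len(q) if q else 0.0


def hybrid_rank(query, query_vec, chunks, alpha=0.5):
    """Rank (text, vector) chunks by alpha * cosine + (1 - alpha) * keyword overlap."""
    scored = [
        (alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_overlap(query, text), text)
        for text, vec in chunks
    ]
    return sorted(scored, reverse=True)


# Hypothetical chunks with toy vectors standing in for real embeddings.
chunks = [
    ("Reset your password from the account settings page.", [0.9, 0.1, 0.0]),
    ("Error code E1234 means the license key has expired.", [0.2, 0.8, 0.1]),
]
query = "What is E1234?"
query_vec = [0.6, 0.5, 0.1]

ranked = hybrid_rank(query, query_vec, chunks, alpha=0.5)
best_text = ranked[0][1]
```

With these toy numbers, the embedding score alone slightly prefers the password chunk, while the blended score puts the E1234 chunk first because the keyword matches exactly. That is the case described above: a code or identifier carries little semantic meaning, so a keyword signal helps.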

jochenschultz · August 8, 2023, 7:47pm

Maybe also add some filters on top after you retrieve the data, like a bad-word filter. Not the best solution, though.

Klassdev · August 8, 2023, 8:13pm

hi,

Thanks for responding.
This is a chatbot that uses PDF and FAQ content, stored as embeddings, as its source. The issue is that the embedding search is not always accurate, and the bot sometimes responds with random answers.

I would like to keep an option open where we can add accurate answers for the most common questions, and each new answer takes priority over the actual text stored in the embeddings.

Hope this clarifies
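
One way the requirement above could be sketched, without fine-tuning, is a separate store of admin-corrected answers that is checked before the normal document search. This is only an illustration under stated assumptions: `CuratedAnswers`, `answer`, and `document_search` are hypothetical names, the vectors are toy stand-ins for embeddings of the questions, and the 0.95 similarity threshold is an arbitrary starting point that would need tuning.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class CuratedAnswers:
    """Admin-corrected answers, keyed by the embedding of the question."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold  # assumed cutoff for "same question"
        self.entries = []  # list of (question_vector, answer)

    def save(self, question_vec, answer_text):
        self.entries.append((list(question_vec), answer_text))

    def lookup(self, query_vec):
        """Return the closest saved answer, or None if nothing is close enough."""
        best_score, best_answer = 0.0, None
        for vec, answer_text in self.entries:
            score = cosine(query_vec, vec)
            if score > best_score:
                best_score, best_answer = score, answer_text
        return best_answer if best_score >= self.threshold else None


def answer(query_vec, curated, fallback_search):
    """Prefer a curated answer; otherwise fall back to the document search."""
    corrected = curated.lookup(query_vec)
    return corrected if corrected is not None else fallback_search(query_vec)


def document_search(query_vec):
    # Placeholder for the existing PDF/FAQ similarity search.
    return "answer built from PDF/FAQ chunks"


curated = CuratedAnswers(threshold=0.95)
curated.save([0.1, 0.9, 0.2], "Refunds are processed within 5 business days.")

same_question = answer([0.1, 0.9, 0.2], curated, document_search)
other_question = answer([0.9, 0.1, 0.0], curated, document_search)
```

Here `same_question` gets the saved correction and `other_question` falls through to the regular search. Keeping the corrections in their own store, rather than mixing them into the document chunks, is what gives them priority: they are consulted first and only used when the match is very close.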
