Near real-time feedback to provide new output

G’day everyone,

I want to use my customer's feedback on a recommendation to provide new recommendations based on it, in near real time.

Imagine I provide a list of 5 recommendations and my customer tags 2 of them as ‘bad’. I want to send this back to OpenAI so I receive more recommendations, while the model understands that I sent two ‘bad’ examples for this customer.

The way I thought about doing this is by fine-tuning a model. However, I am not trying to generalize across my whole customer base; I am trying to be specific to a single customer.

To work around this, I thought about providing the customer identification in the prompt, so my fine-tuned model will understand this specificity.

Do you think this would work? I would like to hear your thoughts on the solution:

  1. Provide recommendations to my customer.
  2. Collect good and bad examples.
  3. Fine-tune the model with something like: “For my customer (#client_identification), the recommendation {recommendation} is bad.” / “For my customer (#client_identification), the recommendation {recommendation} is good.” (a sketch of such training records follows below)
  4. Request new recommendations from the model, different from the ones already rated during the tuning phase.
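
For illustration, here is a minimal sketch of what those training records could look like in the legacy prompt/completion fine-tuning format. The prompt wording, the customer-id scheme, and the helper are my own assumptions, not a tested recipe:

```python
# Sketch only: per-customer feedback expressed as legacy prompt/completion
# fine-tuning records (JSONL). Field names follow the old fine-tunes API;
# the wording and customer-id format are illustrative assumptions.
import json

def feedback_record(client_id: str, recommendation: str, label: str) -> str:
    return json.dumps({
        "prompt": f"For customer #{client_id}, the recommendation '{recommendation}' is ->",
        "completion": f" {label}",  # legacy fine-tunes expect a leading space in completions
    })

with open("feedback.jsonl", "w") as f:
    f.write(feedback_record("1234", "Top restaurants", "bad") + "\n")
    f.write(feedback_record("1234", "Enjoy beaches all day", "good") + "\n")
```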

Thanks, everyone! As I evolve this idea, I will certainly post some updates!


Hi @dmirandaalves, and welcome to the community!

I’m certainly not an expert in these matters, but I fear any attempt to fine-tune based on customer feedback has some risks. Imagine users mislabeling a response that was actually good as bad. Now your model is compromised.

I have a hunch that feedback loops and their influence on the solution must exist separately from the model itself.


Hey @bill.french !!! Thanks for the answer!

I’m certainly not an expert in these matters, but I fear any attempt to fine-tune based on customer feedback has some risks. Imagine users mislabeling a response that was actually good as bad. Now your model is compromised.

I completely agree about the risks… maybe an option would be to ‘deprecate’ old suggestions after a while (for example, based on the last n evaluations)?

I have a hunch that feedback loops and their influence on the solution must exist separately from the model itself.

I was thinking about the possibility of including the previous feedback in the prompt when generating the ‘next’ recommendations… but I am afraid the prompt size limit will not be enough for my requests. Any thoughts on this?

It would be something like: Recommend a product based on my interests that are {interest 1}, {interest 2} and {interest 3}. Notice that the following interests are not what I want: {interest 4}, {interest 5}. Also, notice that the following interests are marked as good examples: {interest 6} and {interest 7}
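
For what it's worth, a minimal sketch of assembling that prompt from stored feedback lists (the helper name and the example lists are placeholders, not from the thread):

```python
# Hypothetical helper that turns stored per-customer feedback into the prompt above.
def build_prompt(interests: list[str], bad: list[str], good: list[str]) -> str:
    return (
        f"Recommend a product based on my interests: {', '.join(interests)}. "
        f"Notice that the following interests are not what I want: {', '.join(bad)}. "
        f"Also, notice that the following interests are marked as good examples: {', '.join(good)}."
    )

print(build_prompt(["beaches", "museums"], ["top restaurants", "nightlife"], ["hiking", "local food"]))
```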

Thanks again, and I have to say I’m glad to finally find people to discuss this topic with! :slight_smile:


Yeah, you could ‘untrain’ the model, I guess. I was thinking either you manually vet nominated changes before training, or establish customer personas as a means to separate bot behaviors. I think this kinda leans into that approach.

Hey guys, what is the exact use case? How do you define what is good and what is bad? How does your customer know what is good and what is bad for them? Answering those questions may just make your problem disappear.

Hey @sergeliatko! Thanks for joining in! It will be great to hear your ideas.

Imagine something like booking.com. I will recommend trip ideas to you:

1- Enjoy beaches all day [LIKE] [DISLIKE]
2- Top restaurants [LIKE] [DISLIKE]
3- Museums [LIKE] [DISLIKE]

So, if you tag option “2 - Top restaurants” as a dislike, I want to replace top restaurants with another example for this specific customer… and I want to make sure that the model will not recommend another top restaurant to this specific customer. Did you get the point? After my customer’s dislike, the list will look like:

1- same as previous: Enjoy beaches all day [LIKE] [DISLIKE]
2- NEW RECOMMENDATION: Hiking [LIKE] [DISLIKE]
3- same as previous: Museums [LIKE] [DISLIKE]

And so it goes… I do not want the model to learn that the option tagged as a dislike is bad for everyone - only for this specific customer.

Please let me know if you got the use case; I would love to hear your suggestions.

Hey @bill.french, thanks again for your input

I am starting to think that the personas idea might help if the prompt carries the previous feedback evaluations… The downside is that I will be constrained by the prompt size limit, right?

I genuinely doubt whether fine-tuning would solve the problem, or whether it is only good for structuring the answer rather than adding new knowledge (in this case, the feedback of a specific customer).

I would attach a “preferences” table to each user with at least 2 columns: Subject, User_Score

The subject would contain the category (restaurants, museums, etc.) and the score would be a number, say between 0 and 100, representing how much the user likes the subject.

Personally, I would also add at least one more column, “Guessed_Score”, to hold the AI-predicted user score for the subject.

Then, before building my prompt, I would sort the table by preference and get the subjects in order from high to low (I would use the user score in the prompt as well, but maybe not a must-have). I would cut the list into 3 groups, with something like the following (sketched in code below):

Highly preferred: category_1 / 100, category_5 / 80 etc.
No special preference: category_2 / 50, category_3 / 35
Skip: category_4 / 15, category_6 / 0

Then, if the user updates their preferences during the conversation, I would rebuild the whole master prompt (the seed system messages, if using the chat API) with the updated info. If there is no way to take out the messages related to the “bad” subjects, I would add an instruction-update message for the chatbot saying something like: “Starting from this point, the new user preferences are (add them after); adjust the replies accordingly.”

I would store the table in a database for retrieval.
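
A minimal sketch of that bucketing, assuming scores from 0 to 100; the 70/30 thresholds are my own arbitrary choice:

```python
# Turn a per-user preferences table (subject -> score) into the three-bucket
# prompt section sketched above. Thresholds are assumptions; tune as needed.
def preference_summary(preferences: dict[str, int]) -> str:
    ranked = sorted(preferences.items(), key=lambda kv: kv[1], reverse=True)
    high = [f"{s} / {v}" for s, v in ranked if v >= 70]
    neutral = [f"{s} / {v}" for s, v in ranked if 30 <= v < 70]
    skip = [f"{s} / {v}" for s, v in ranked if v < 30]
    return (
        f"Highly preferred: {', '.join(high)}\n"
        f"No special preference: {', '.join(neutral)}\n"
        f"Skip: {', '.join(skip)}"
    )

print(preference_summary({"restaurants": 15, "museums": 80, "beaches": 100, "hiking": 50}))
```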

Fine-tuning answers on user preferences? Please explain what you understand by fine-tuning. What does model fine-tuning mean to you, and why do you need it in a chatbot?

Hey @sergeliatko

I would attach a “preferences” table to each user with at least 2 columns: Subject, User_Score

I think this is a great way to go (and pretty easy to implement)… It is definitely worth testing. I will try it and come back here soon to update you all on the results!

Fine-tuning answers on user preferences? Please explain what you understand by fine-tuning. What does model fine-tuning mean to you, and why do you need it in a chatbot?

To explain this, I will try to discuss the pain point regarding the first solution:

Imagine a customer has 500 great categories I could tag in the prompt as ‘preferred categories’. The same customer also has 300 ‘bad’ categories I would love to tell OpenAI to avoid. In that case, all this info about this specific customer probably would not fit in the prompt request.

What I thought is: instead of adding this info to the prompt request, it would be good to give my model the knowledge about each customer. This way, I can make sure the recommendations consider all the previously provided ‘good’ and ‘bad’ classifications for each customer. The idea of fine-tuning revolves around this. How does it sound to you?

I am not sure if fine-tuning would help me in this use case - I would love to hear your thoughts.

If this is something we cannot overcome, another option would be to consider only the last N feedback items in each group (highly preferred, no special preference, skip), for example (a tiny sketch below)… or maybe to summarize the categories better so I have fewer to put into the prompt. Just food for thought.
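
A tiny sketch of the last-N idea, assuming a bounded deque per group (my own construction, not from the thread):

```python
# Keep only the most recent N feedback items per preference group;
# older entries drop off automatically once maxlen is reached.
from collections import deque

N = 50  # arbitrary cap per group
last_n_feedback = {group: deque(maxlen=N)
                   for group in ("highly preferred", "no special preference", "skip")}

last_n_feedback["skip"].append("top restaurants")
```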

Thanks again for the comments. It helped me a lot.

One thought is to have them fill out a questionnaire, then embed the things they like and correlate this against your offerings.

Example:

BACON ← LIKE
CHEESE ← LIKE
BANANA ← DISLIKE
GRAPES ← LIKE
APPLES ← DISLIKE

Then embed “BACON CHEESE GRAPES” and correlate it with your database containing embeddings of your products. Embeddings are the real-time solution; fine-tunes are static. (A quick sketch follows.)
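
A quick sketch of that embed-and-correlate step, using the OpenAI Python SDK (the product list is a toy stand-in for a real catalog):

```python
# Embed the concatenated LIKE answers and rank products by dot product.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(resp.data[0].embedding)

products = ["bacon double cheeseburger", "banana split", "cheese and grape platter"]
product_vectors = {p: embed(p) for p in products}  # in practice, precompute and store these

likes_vector = embed("BACON CHEESE GRAPES")  # the concatenated LIKE answers

# ada-002 vectors are unit length, so the dot product equals the cosine similarity
scores = {p: float(np.dot(likes_vector, v)) for p, v in product_vectors.items()}
print(max(scores, key=scores.get))  # best-matching product
```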


Hey @curt.kennedy thanks for the answer!

It seems a good way to go; however, the issue is: in your example, we have only three words to embed, right? Imagine that I might have more than five thousand words to embed for a specific customer… I would not be able to put all of that in a prompt, right?

So my question would be: where should I provide this ‘embedded info’ so the output has good context?

I’d love to hear your thoughts again

That was just a toy example. Feel free to embed larger, more descriptive things to increase the information embedded, and improve the search results.

So

GRAPES ← LIKE

Embed “Vine-ripened super special Spanish grapes of variety X that belong to the genus … blah, blah, blah”

You can trade off between concatenating the entire set, then embedding, then querying, OR embedding each item → querying → curating the results across each query.

If you do the latter, and your descriptions are fixed, you don’t need to embed each time; just look up the predefined embedding vector. But you lose the “mix” and whatever interesting matches the mix might bring. So think about it, and see what works best.

@curt.kennedy one question: when you say ‘to embed’, do you mean putting it inside the prompt request or using the embeddings endpoint?

If you are talking about the OpenAI embeddings endpoint, do you believe it will be possible to relate this piece of information to a specific customer?

This question is relevant to me because, in the prompt case (even though the token limit restricts the solution), I could handle this per customer, right?

Thanks again for sharing your thoughts with @sergeliatko and @bill.french

So “embedding” really means both, let me explain …

The process: use text-embedding-ada-002 via the API to create a vector out of your “LIKE” information, concatenated or one at a time. Then take this output, a vector of 1536 dimensions, and compute its dot product with all of the previously embedded product offerings in a real-time database (either in-memory from a file, or a fancy vector database like Pinecone or Redis). Then retrieve your top N hits that are closest (max dot products), feed the text behind those closest dot products into a prompt, and ask your favorite GPT LLM to answer or create a recommendation based on the related text you just retrieved.

So, you need: the embedding endpoint, DB search, and prompt + GPT to generate the answer.

You can use other embedding engines too. But most here use the ada-002 one from the API. Oh, and dot products are sufficient since unit vectors come out of ada-002, but otherwise you may have to take actual vector distances (either Euclidean or Manhattan).
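
A sketch of the last step of that pipeline, feeding the retrieved top-N texts into a chat prompt (the model choice and wording are my assumptions; the embedding and dot-product search are sketched in the earlier snippet):

```python
# Build a recommendation from the texts behind the closest dot products.
from openai import OpenAI

client = OpenAI()

def recommend(likes_text: str, top_hits: list[str]) -> str:
    context = "\n".join(f"- {t}" for t in top_hits)
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You create recommendations from the given catalog items."},
            {"role": "user", "content": f"The customer likes: {likes_text}\n"
                                        f"Closest catalog items:\n{context}\n"
                                        "Recommend something based only on these items."},
        ],
    )
    return resp.choices[0].message.content

print(recommend("BACON CHEESE GRAPES", ["bacon double cheeseburger", "cheese and grape platter"]))
```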

I would say you have 2 separate issues:

  1. Match the suggested items by category to the user’s preferred categories
  2. Construct “suggestion text” based on suggested items

I wouldn’t use OpenAI for task #1, but rather turn to resources about personalizing the user experience based on knowledge about the user (how do Amazon and Google do it?).

Task 2 is perfect for Open AI:

  1. Do your homework to find the items you want to suggest to the user; add them as context to your prompt
  2. Add user info as context to the prompt (need to decide on how detailed it should be)
  3. Add summary of the previous conversation as context to the prompt
  4. Add instructions to your bot
  5. get bot reply
  6. validate/sanitize the reply
  7. show it to user

If fine-tuning (maybe not necessary in your case), the whole prompt (steps 1-4) must be in the “prompt” field, and the bot reply you show in step 7 will be your “completion”. (A sketch of assembling steps 1-4 follows.)
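
Here is a sketch of assembling steps 1-4 as seed messages for the chat API (the role layout and wording are my own assumptions):

```python
# Assemble the master prompt: items context (1), user info (2),
# conversation summary (3), and bot instructions (4).
def build_messages(items: list[str], user_info: str, summary: str) -> list[dict]:
    return [
        {"role": "system", "content": "You are a travel recommendation assistant."},  # 4. instructions
        {"role": "system", "content": "Candidate items to suggest:\n" + "\n".join(items)},  # 1. items
        {"role": "system", "content": f"User profile:\n{user_info}"},  # 2. user info
        {"role": "system", "content": f"Conversation so far:\n{summary}"},  # 3. summary
        {"role": "user", "content": "Suggest something for my next trip."},
    ]
```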

I don’t think this will fit, as ideally the vector for user preferences would be a numerically indexed array where the keys are hashes of the categories and the values are floats of how strongly the user prefers them.

And the suggestion vector would be the same array, but the values would hold the cosine similarity between the suggested category and the category in the key.

Having this structure, you would compare oranges with oranges and get the result you need. So the cosine similarity between the user preference vector and the suggestion vector in this form would reflect how likely the user is to love your suggestion.

I don’t think a simple ada embedding will give you this, but I have not tested it. Personally, I would look into custom embedding models for this.
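
For concreteness, a small sketch of that comparison under my own assumptions: both vectors are keyed by the same fixed category list, so the comparison stays oranges-with-oranges:

```python
import numpy as np

categories = ["restaurants", "museums", "beaches", "hiking"]

# How strongly the user prefers each category (0..1), same key order as `categories`.
user_pref = np.array([0.10, 0.80, 1.00, 0.50])

# Cosine similarity of one suggestion to each category (e.g., from embeddings).
suggestion = np.array([0.05, 0.20, 0.90, 0.70])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Closer to 1 means the suggestion lines up with what the user loves.
print(cosine(user_pref, suggestion))
```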

Applying user preferences as weights is conceptually similar to applying weighted keywords, right? Isn’t that what sparse vectors accomplish? For example, using BM25 (the default for Elasticsearch, I believe) or SPLADE? These can already be combined with dense vectors such as ada-002.
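
A rough sketch of one way to blend sparse and dense scores, assuming BM25 scores from the rank_bm25 package and dense scores from ada-002 dot products; the blend weight alpha is arbitrary:

```python
import numpy as np
from rank_bm25 import BM25Okapi  # pip install rank_bm25

docs = ["top restaurants in town", "museum day pass", "beach resort package"]
bm25 = BM25Okapi([d.split() for d in docs])

def hybrid_scores(query: str, dense_scores: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    sparse = np.array(bm25.get_scores(query.split()))
    if sparse.max() > 0:
        sparse = sparse / sparse.max()  # normalize sparse scores to [0, 1]
    return alpha * dense_scores + (1 - alpha) * sparse

# dense_scores would come from dot products with ada-002 embeddings of `docs`
print(hybrid_scores("beaches and museums", np.array([0.2, 0.6, 0.9])))
```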

Thanks for explaining your point, @curt.kennedy! I will think about it… The only remaining question I have is how well the model will perform for a specific customer given their feedback. When I run some tests, I will come back here to share what I discover.


Hey @sergeliatko!

I completely agree with this perspective; however, I think the only fragile point is:

  1. Add summary of the previous conversation as context to the prompt

I believe it is not a blocker to do it that way, but I feel I will lose opportunities to improve my recommendations when I summarize previous feedback because of the token limit on the prompt…

To be honest, I am not completely sure about the smartest way to work around this (the prompt size limit), but the embeddings endpoint seems the closest thing to handling it.

I am also not sure the model will be smart enough to understand I am referring to a specific customer. If not, the only option that comes to mind is designing a model per customer (so I can make sure the output will not merge feedback from other customers). The problem is that this approach is definitely not scalable…

Thanks again for all the comments; I will always be happy to hear your reflections. I am pretty confident I can put an MVP into production, based on our conversations, in the next few days.

Hey @RonaldGRuckus, thanks for replying to us!

Applying user preferences as weights is conceptually similar to applying weighted keywords, right?

I feel it is! Maybe the only additional challenge I am facing is the customization to each customer’s preferences.

Isn’t that what sparse vectors accomplish? For example, using BM25 (the default for Elasticsearch, I believe) or SPLADE? These can already be combined with dense vectors such as ada-002.

I haven’t thought about this, and to be honest, I am not familiar enough with BM25 to discuss it… I will spend some time studying so I can form a point of view - although it would be great for me to build a solution purely with OpenAI. It would be great if you could explain the combination idea a bit more.

Thanks again,